PharmaSUG 2015 - Paper DV07

Getting Sankey with Bar Charts

Shane Rosanbalm, Rho, Inc., Chapel Hill, NC

ABSTRACT

In this paper we present a SAS® macro for depicting change over time in a stacked bar chart with Sankey-style overlays.

Imagine a clinical trial in which subject disease severity is tracked over time. The disease severity has valid values of 0, 1, 2, and 3. The time points are baseline, 12 months, 30 months, and 60 months.

A straightforward way to represent this data would be with a vertically-oriented stacked bar chart. Visit would be used as the x-axis variable. Disease severity would be used to form the groups in the stacked bars. The y-axis measure would be percent of subjects in each group.

Figure 1. A Longitudinal Bar Chart
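
As a rough sketch of the plain longitudinal bar chart described above (not the author's code), the percentages can be computed with PROC FREQ and drawn with a stacked VBAR in PROC SGPLOT. The dataset name toy, its variable names, and its values are hypothetical and exist only for illustration.

```sas
/* Hypothetical toy data: one record per subject and visit */
data toy;
   input subject visit severity @@;
   datalines;
1 0 0  1 12 1  1 30 1  1 60 2
2 0 1  2 12 1  2 30 2  2 60 3
3 0 0  3 12 0  3 30 1  3 60 1
4 0 2  4 12 3  4 30 3  4 60 3
;
run;

/* PCT_ROW is the percent of subjects at each severity level within a visit */
proc freq data=toy noprint;
   tables visit*severity / out=pct outpct;
run;

/* Vertically oriented stacked bar chart: visit on the x-axis, severity as the stack */
proc sgplot data=pct;
   vbar visit / response=pct_row group=severity groupdisplay=stack;
   xaxis label="Visit (months)";
   yaxis label="Percent of subjects";
run;
```

A chart of this kind shows how large each severity group is at each visit, but not where the subjects in each group came from.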

This type of data visualization allows us to see the change within each group over time. However, the data visualization does not allow us to see which group of subjects is driving these changes. For instance, if the number of subjects at severity level 1 were to increase from baseline to month 12, how do we know whether the new subjects are coming from group 0, 2, or 3?

Sankey diagrams provide a visual depiction of the magnitude of flow between nodes in a network. If we think of the groups in a stacked bar chart as these nodes, then Sankey-style overlays can be used to show how the subjects flow from one severity level to another over time. In this paper we will present just such a data visualization.

SANKEY DIAGRAMS

Sankey diagrams provide a visual depiction of the magnitude of flow between nodes in a network. Consider the following example.

Figure 2. A Sankey Diagram of Energy Supply and Consumption

In this Sankey diagram the nodes and links represent energy. The nodes on the left represent energy sources and the nodes on the right represent energy consumers. The nodes and links are drawn in proportion to the amount of energy. For instance, there are two links flowing from the coal node. The upper link is very thin whereas the lower link is much thicker. This indicates that most coal is used for electricity generation, with only a very small amount used directly by consumers.

Sankey diagrams can be used to represent quantities other than energy. Consider the following example taken from the world of auto racing.

Figure 3. A Sankey Diagram of Points Distribution in Auto Racing

In this Sankey diagram the nodes and links represent driver points in a racing series. Several things are immediately obvious based on this diagram. We can see from the sizes of the nodes on the left that each race awards the same number of points. We can see from the sizes of the nodes on the right that Red Bull is the most successful team. We can see that Lewis Hamilton has earned points in every race, and yet he does not have the most points.

SOME DATA TO WORK WITH

The dataset used to generate the Sankey bar charts in this paper is a so-called vertical dataset.

Display 1. A Vertical Dataset

This term comes from the fact that the multiple outcomes for each subject are stored on separate records (i.e., vertically). Some datasets are so-called horizontal datasets, in which there is one record per subject and separate variables for each visit. Given the current prominence of the CDISC ADaM standards within the pharma industry, the Sankey bar chart macros assume a vertical dataset as the source.
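
For readers starting from a horizontal dataset, the following is a minimal sketch of one way to reshape it into the vertical layout the macros expect. The dataset and variable names (horiz, vert, sev_bl, and so on) are hypothetical, and coding baseline as -1 is an assumption based on the xvarord values used in the macro calls in this paper.

```sas
/* Hypothetical horizontal dataset: one record per subject, one variable per visit */
data horiz;
   input subject sev_bl sev_m12 sev_m30 sev_m60;
   datalines;
1 0 1 1 2
2 1 1 2 3
3 0 0 1 1
;
run;

/* Reshape to vertical: one record per subject and visit */
data vert (keep=subject visit severity);
   set horiz;
   array sev[4] sev_bl sev_m12 sev_m30 sev_m60;
   array mon[4] _temporary_ (-1 12 30 60);   /* visit codes; -1 is assumed to mean baseline */
   do i = 1 to 4;
      visit    = mon[i];
      severity = sev[i];
      output;
   end;
run;
```

With three subjects and four visits, the resulting dataset has twelve records.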

Appendix 1 contains the data step code used to generate the above dataset.
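
Appendix 1 is not reproduced in this section. As a stand-in, the sketch below simulates a vertical dataset with the same variable names used in the macro calls (subject, visit, riskfactors). The dataset name dummy_sketch, the number of subjects, the seed, and the probabilities are all assumptions for illustration and are not the author's values.

```sas
data dummy_sketch (keep=subject visit riskfactors);
   call streaminit(2015);                       /* fixed seed so the sketch is reproducible */
   array mon[4] _temporary_ (-1 12 30 60);      /* visit codes; -1 is assumed to mean baseline */
   do subject = 1 to 100;
      do i = 1 to 4;
         visit = mon[i];
         /* rand('table') with three probabilities returns 1 to 4;   */
         /* the fourth category gets the remaining probability (0.1). */
         /* Subtracting 1 shifts the result to the range 0 to 3.      */
         riskfactors = rand('table', 0.4, 0.3, 0.2) - 1;
         output;
      end;
   end;
run;
```

Because each visit is drawn independently here, the simulated transitions between severity levels are purely random; real trial data would show more persistence within subjects.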

SANKEY BAR CHARTS

Sankey bar charts are produced using a set of three SAS macros. There is an outer container macro, %SankeyBarChart, which calls two helper macros.

  • The first of these helper macros is called %RawToSankey. This macro converts a vertical dataset (i.e., one record per subject and visit) into two summary datasets. The first summary dataset is for the Sankey nodes (i.e., the bar segments) and the second summary dataset is for the Sankey links (i.e., the connectors).

  • The second of these helper macros is called %Sankey. This macro uses the above summary datasets to produce the Sankey bar chart by way of the SGPLOT procedure.

THE FIRST HELPER MACRO

The %RawToSankey helper macro has 4 required parameters and 2 optional parameters.

Parameter  Description                                  Required?
---------  -------------------------------------------  ---------
data       Vertical dataset to be converted             Yes
subject    Subject identifier                           Yes
yvar       Categorical y-axis variable                  Yes
xvar       Categorical x-axis variable                  Yes
yvarord    Sort order for y-axis values.                No
           E.g., yvarord=%str(red rum, george)
           (default is equivalent to ORDER=DATA)
xvarord    Sort order for x-axis values.                No
           E.g., xvarord=%str(pink plum, fred)
           (default is equivalent to ORDER=DATA)

Table 1. Parameters for %RawToSankey Macro

Using the aforementioned vertical dataset, a typical call to %RawToSankey might appear as follows.

%rawtosankey
   (data=dummy
   ,subject=subject
   ,yvar=riskfactors
   ,xvar=visit
   ,yvarord=%str(0, 1, 2, 3)
   ,xvarord=%str(-1, 12, 30, 60)
   );

The macro takes the vertical dataset at left and creates two summary datasets: nodes and links.

Display 2. Converting a Vertical Dataset into Sankey-ready Datasets (Nodes and Links)
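
The internals of %RawToSankey are not shown in this section. The following is only a sketch of the kind of summarization the text describes: counts per visit and severity level for the nodes, and counts of transitions between adjacent visits for the links. The dataset and column names (toy, nodes_sketch, links_sketch, visit_from, and so on) are hypothetical and are not the macro's actual output structure.

```sas
/* Hypothetical toy data: one record per subject and visit */
data toy;
   input subject visit severity @@;
   datalines;
1 -1 0  1 12 1  1 30 1  1 60 2
2 -1 1  2 12 1  2 30 2  2 60 3
3 -1 0  3 12 0  3 30 1  3 60 1
4 -1 2  4 12 3  4 30 3  4 60 3
;
run;

proc sql;
   /* Nodes: one row per visit and severity level (the bar segments) */
   create table nodes_sketch as
   select visit, severity, count(*) as n
   from toy
   group by visit, severity
   order by visit, severity;

   /* Links: one row per observed transition between adjacent visits (the connectors) */
   create table links_sketch as
   select a.visit as visit_from, a.severity as sev_from,
          b.visit as visit_to,   b.severity as sev_to,
          count(*) as n
   from toy as a inner join toy as b
     on a.subject = b.subject
   /* keep only pairs where b is the next visit after a */
   where b.visit = (select min(c.visit) from toy as c where c.visit > a.visit)
   group by a.visit, a.severity, b.visit, b.severity
   order by visit_from, sev_from, sev_to;
quit;
```

In this sketch a subject who is missing the next visit simply contributes no link for that interval; how the actual macro treats missing visits is not stated here.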

THE CONTAINER MACRO

The %SankeyBarChart macro is nothing more than a container for the two helper macros. As such, its parameter list is simply the sum of the parameter lists for the helper macros.
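
The container pattern can be sketched with stub macros. The three %demo_ macros below are hypothetical stand-ins, not the actual source code: the stubs only write a note to the log, and only a subset of the parameters is carried through.

```sas
/* Stub for the data manipulation step (stand-in for %RawToSankey) */
%macro demo_rawtosankey(data=, subject=, yvar=, xvar=);
   %put NOTE: [stub] summarizing &data by &xvar and &yvar into nodes and links.;
%mend demo_rawtosankey;

/* Stub for the plotting step (stand-in for %Sankey) */
%macro demo_sankey(barwidth=0.25, interpol=cosine, percents=yes);
   %put NOTE: [stub] plotting with barwidth=&barwidth interpol=&interpol percents=&percents;
%mend demo_sankey;

/* Container: accepts the combined parameter list and passes each part along */
%macro demo_sankeybarchart(data=, subject=, yvar=, xvar=,
                           barwidth=0.25, interpol=cosine, percents=yes);
   %demo_rawtosankey(data=&data, subject=&subject, yvar=&yvar, xvar=&xvar)
   %demo_sankey(barwidth=&barwidth, interpol=&interpol, percents=&percents)
%mend demo_sankeybarchart;

%demo_sankeybarchart(data=dummy, subject=subject, yvar=riskfactors, xvar=visit);
```

One benefit of this arrangement is that either helper can be called on its own, for example to inspect the summary datasets before plotting.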

Parameter    Description                                        Required?
-----------  -------------------------------------------------  ---------
data         Vertical dataset to be converted                   Yes
subject      Subject identifier                                 Yes
yvar         Categorical y-axis variable                        Yes
xvar         Categorical x-axis variable                        Yes
yvarord      Sort order for y-axis values.                      No
             E.g., yvarord=%str(red rum, george)
             (default is equivalent to ORDER=DATA)
xvarord      Sort order for x-axis values.                      No
             E.g., xvarord=%str(pink plum, fred)
             (default is equivalent to ORDER=DATA)
colorlist    A space-separated list of colors: one per y-value. No
             E.g., colorlist=red vlio cxb2df8a
             (default is qualitative Brewer palette)
barwidth     Width of bars.                                     No
             Valid values are from 0-
             (default is 0.25)
xfmt         Format for x-axis                                  No
legendtitle  Text for legend title                              No
interpol     Method of interpolating between bars.              No
             Valid values are cosine, linear
             (default is cosine)
percents     Show percents inside each bar.                     No
             Valid values are yes, no
             (default is yes)

Table 3. Parameters for %SankeyBarChart Macro
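
To illustrate the difference between the two interpol settings, the sketch below compares a straight-line path with a standard cosine easing curve between two heights. The formula is the textbook cosine interpolation and is an assumption about what interpol=cosine does; the macro's actual computation is not shown in this section. The values of y0 and y1 are hypothetical.

```sas
data curve;
   y0 = 20;  y1 = 60;              /* hypothetical link edge heights at two adjacent bars */
   pi = constant('pi');
   do i = 0 to 20;
      t = i / 20;                  /* position between the bars, from 0 to 1 */
      linear = y0 + (y1 - y0) * t;
      cosine = y0 + (y1 - y0) * (1 - cos(pi * t)) / 2;
      output;
   end;
run;

/* Overlay the two paths to compare their shapes */
proc sgplot data=curve;
   series x=t y=linear;
   series x=t y=cosine;
   xaxis label="Position between adjacent bars";
   yaxis label="Vertical position of link edge";
run;
```

Both paths start at y0 and end at y1; the cosine path leaves and arrives horizontally, which gives the S-shaped connectors typical of Sankey diagrams.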

A typical call to %SankeyBarChart might appear as follows.

%sankeybarchart
   (data=dummy
   ,subject=subject
   ,yvar=riskfactors
   ,xvar=visit
   ,yvarord=%str(0, 1, 2, 3)
   ,xvarord=%str(-1, 12, 30, 60)
   ,barwidth=0.
   ,xfmt=xfmt.
   ,legendtitle=%str(# of Risk Factors)
   );

Figure 6. A Bar Chart with Sankey-style Overlays
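
The call above passes xfmt=xfmt., but the definition of that format is not shown in this section. A user-defined format along the following lines would need to exist before the macro is called. The labels here are assumptions, including the reading of -1 as the baseline visit.

```sas
/* Hypothetical definition of the x-axis format referenced as xfmt. */
proc format;
   value xfmt
      -1 = "Baseline"
      12 = "Month 12"
      30 = "Month 30"
      60 = "Month 60";
run;

/* Quick check: write the formatted visit values to the log */
data _null_;
   do visit = -1, 12, 30, 60;
      put visit= visit xfmt.;
   end;
run;
```

Keeping the visit variable numeric and applying a format lets the xvarord sort order refer to the raw values while the axis shows readable labels.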

CONCLUSION

Sankey bar charts are enhancements of longitudinal bar charts that add Sankey-style overlays between the bars at adjacent time points. These overlays illustrate which groups are driving changes in the bars over time, providing deeper insight into the data being visualized. The SAS macros presented in this paper can be used to assist in the creation of Sankey bar charts.

MACRO SOURCE CODE

The Sankey bar chart macro source code is available for direct download at graphics.rhoworld.com/tools/sankeybarchart. Alternatively, send email requests to graphics@rhoworld.com.

RECOMMENDED READING

  • Graphically Speaking

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Name: Shane Rosanbalm
Enterprise: Rho, Inc
Address: 6330 Quadrangle Drive
City, State ZIP: Chapel Hill, NC 27517
Work Phone: 919-595-
E-mail: shane_rosanbalm@rhoworld.com
Web: graphics.rhoworld.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.