





This paper presents a SAS macro for creating longitudinal bar charts with Sankey-style overlays. The macro allows users to visualize changes in groups over time and identify the origin of new groups. Sankey diagrams provide a visual representation of the flow between nodes in a network, making them an ideal addition to longitudinal bar charts. The paper includes examples of Sankey diagrams and longitudinal bar charts, as well as instructions on how to use the macro with a vertical dataset.
PharmaSUG 2015 - Paper DV
In this paper we present a SAS® macro for depicting change over time in a stacked bar chart with Sankey-style overlays.
Imagine a clinical trial in which subject disease severity is tracked over time. The disease severity has valid values of 0, 1, 2, and 3. The time points are baseline, 12 months, 30 months, and 60 months.
A straightforward way to represent this data would be with a vertically-oriented stacked bar chart. Visit would be used as the x-axis variable. Disease severity would be used to form the groups in the stacked bars. The y-axis measure would be percent of subjects in each group.
Figure 1. A Longitudinal Bar Chart
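A chart of the kind shown in Figure 1 can be sketched with PROC FREQ and PROC SGPLOT. This is a minimal illustration, not the code used in the paper: the dataset toy and the variables subject, visit, and severity are hypothetical stand-ins, and only two visits are included.

```sas
/* Hypothetical toy data: one record per subject and visit */
data toy;
   input subject visit severity;
   datalines;
1 0 0
1 12 1
2 0 1
2 12 1
3 0 2
3 12 1
4 0 3
4 12 3
;
run;

/* Percent of subjects at each severity level within each visit */
proc freq data=toy noprint;
   tables visit*severity / outpct out=pct;
run;

/* PCT_ROW is the percent within visit, so each stacked bar sums to 100 */
proc sgplot data=pct;
   vbar visit / response=pct_row group=severity groupdisplay=stack;
   yaxis label="Percent of subjects";
run;
```

Pre-computing the percents keeps the denominator explicit: visit is the row variable in the TABLES request, so PCT_ROW is the percent of subjects within each visit.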
This type of data visualization allows us to see the change within each group over time. However, the data visualization does not allow us to see which group of subjects is driving these changes. For instance, if the number of subjects at severity level 1 were to increase from baseline to month 12, how do we know whether the new subjects are coming from group 0, 2, or 3?
Sankey diagrams provide a visual depiction of the magnitude of flow between nodes in a network. If we think of the groups in a stacked bar chart as these nodes, then Sankey-style overlays can be used to show how the subjects flow from one severity level to another over time. In this paper we will present just such a data visualization.
Sankey diagrams provide a visual depiction of the magnitude of flow between nodes in a network. Consider the following example.
Figure 2. A Sankey Diagram of Energy Supply and Consumption
In this Sankey diagram the nodes and links represent energy. The nodes on the left represent energy sources and the nodes on the right represent energy consumers. The nodes and links are drawn in proportion to the amount of energy. For instance, there are two links flowing from the coal node. The upper link is very thin whereas the lower link is much thicker. This indicates that most coal is used for electricity generation, with only a very small amount used directly by consumers.
Sankey diagrams can be used to represent quantities other than energy. Consider the following example taken from the world of auto racing.
Figure 3. A Sankey Diagram of Points Distribution in Auto Racing
In this Sankey diagram the nodes and links represent driver points in a racing series. Several things are immediately obvious based on this diagram. We can see from the sizes of the nodes on the left that each race awards the same number of points. We can see from the sizes of the nodes on the right that Red Bull is the most successful team. We can see that Lewis Hamilton has earned points in every race, and yet he does not have the most points.
The dataset used to generate the Sankey bar charts in this paper is a so-called vertical dataset.
Display 1. A Vertical Dataset
This term comes from the fact that the multiple outcomes for each subject are stored on separate records (i.e., vertically). Some datasets are so-called horizontal datasets, in which there is one record per subject and separate variables for each visit. Given the current prominence of the CDISC ADaM standards within the pharma industry, the Sankey bar chart macros assume a vertical dataset as the source.
Appendix 1 contains the data step code used to generate the above dataset.
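For readers whose source data is horizontal, the conversion to a vertical dataset can be sketched with a DATA step and arrays. This is an illustration only: the dataset wide and the variables sev0, sev12, sev30, and sev60 are hypothetical, and the visit values are assumed to be month numbers.

```sas
/* Hypothetical horizontal dataset: one record per subject */
data wide;
   input subject sev0 sev12 sev30 sev60;
   datalines;
1 0 1 1 2
2 1 1 0 0
3 2 1 1 1
;
run;

/* Write one output record per subject and visit */
data vertical (keep=subject visit severity);
   set wide;
   array sev[4] sev0 sev12 sev30 sev60;
   array months[4] _temporary_ (0 12 30 60);
   do i = 1 to 4;
      visit = months[i];      /* visit identifier taken from the lookup array */
      severity = sev[i];      /* outcome recorded at that visit */
      output;
   end;
run;
```

The resulting dataset has the one-record-per-subject-and-visit shape that the macros expect as input.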
Sankey bar charts are produced using a set of three SAS macros. There is an outer container macro, %SankeyBarChart, which calls two helper macros.
The first of these helper macros is called %RawToSankey. This macro converts a vertical dataset (i.e., one record per subject and visit) into two summary datasets. The first summary dataset is for the Sankey nodes (i.e., the bar segments) and the second summary dataset is for the Sankey links (i.e., the connectors).
The second of these helper macros is called %Sankey. This macro uses the above summary datasets to produce the Sankey bar chart by way of the SGPLOT procedure.
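The idea behind the two summary datasets can be sketched as follows. This is not the %RawToSankey source: the dataset names toy, nodes_sketch, and links_sketch and the variables prev_visit and prev_sev are hypothetical, and the sketch assumes every subject has a record at every visit.

```sas
/* Hypothetical vertical data: one record per subject and visit */
data toy;
   input subject visit severity;
   datalines;
1 0 0
1 12 1
2 0 1
2 12 1
3 0 2
3 12 1
4 0 3
4 12 3
;
run;

proc sort data=toy out=sorted;
   by subject visit;
run;

/* Pair each record with the same subject's previous visit */
data pairs;
   set sorted;
   by subject;
   prev_visit = lag(visit);      /* LAG runs on every record, as it must */
   prev_sev   = lag(severity);
   if not first.subject then output;
run;

proc sql;
   /* Nodes: number of subjects in each bar segment */
   create table nodes_sketch as
   select visit, severity, count(*) as n
   from sorted
   group by visit, severity;

   /* Links: number of subjects flowing between adjacent segments */
   create table links_sketch as
   select prev_visit, prev_sev, visit, severity, count(*) as n
   from pairs
   group by prev_visit, prev_sev, visit, severity;
quit;
```

If a subject were missing a visit, this sketch would link two non-adjacent visits, so real data would need an explicit rule for gaps.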
The %RawToSankey helper macro has 4 required parameters and 2 optional parameters.
Parameter    Description                                           Required?
data         Vertical dataset to be converted                      Yes
subject      Subject identifier                                    Yes
yvar         Categorical y-axis variable                           Yes
xvar         Categorical x-axis variable                           Yes
yvarord      Sort order for y-axis values                          No
             E.g., yvarord=%str(red rum, george)
             (default is equivalent to ORDER=DATA)
xvarord      Sort order for x-axis values                          No
             E.g., xvarord=%str(pink plum, fred)
             (default is equivalent to ORDER=DATA)
Table 1. Parameters for %RawToSankey Macro
Using the aforementioned vertical dataset, a typical call to %RawToSankey might appear as follows.
%rawtosankey
   (data=dummy
   ,subject=subject
   ,yvar=riskfactors
   ,xvar=visit
   ,yvarord=%str(0, 1, 2, 3)
   ,xvarord=%str(-1, 12, 30, 60)
   );
The macro takes the vertical dataset and creates two summary datasets, nodes and links, as shown in Display 2.
Display 2. Converting a Vertical Dataset into Sankey-ready Datasets Nodes and Links
The %SankeyBarChart macro is simply a container for the two helper macros. As such, its parameter list is the combination of the parameter lists for the helper macros.
Parameter    Description                                           Required?
data         Vertical dataset to be converted                      Yes
subject      Subject identifier                                    Yes
yvar         Categorical y-axis variable                           Yes
xvar         Categorical x-axis variable                           Yes
yvarord      Sort order for y-axis values                          No
             E.g., yvarord=%str(red rum, george)
             (default is equivalent to ORDER=DATA)
xvarord      Sort order for x-axis values                          No
             E.g., xvarord=%str(pink plum, fred)
             (default is equivalent to ORDER=DATA)
colorlist    A space-separated list of colors: one per y-value     No
             E.g., colorlist=red vlio cxb2df8a
             (default is qualitative Brewer palette)
barwidth     Width of bars                                         No
             Valid values are from 0- (default is 0.25)
xfmt         Format for x-axis                                     No
legendtitle  Text for legend title                                 No
interpol     Method of interpolating between bars                  No
             Valid values are cosine, linear (default is cosine)
percents     Show percents inside each bar                         No
             Valid values are yes, no (default is yes)
Table 3. Parameters for %SankeyBarChart Macro
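To give a feel for the difference between the two interpol values, the sketch below draws one link edge both ways. It rests on an assumption about what the options mean, namely a straight line versus a half-period cosine ease between two bars; the macro's actual computation may differ. The dataset curve and all of its variables are hypothetical.

```sas
/* Hypothetical link edge: rises from y0 at bar x0 to y1 at bar x1 */
data curve;
   x0 = 1; y0 = 20;
   x1 = 2; y1 = 65;
   pi = constant('pi');
   do step = 0 to 20;
      t = step / 20;                                    /* 0 at left bar, 1 at right bar */
      x = x0 + t * (x1 - x0);
      y_linear = y0 + t * (y1 - y0);                    /* straight connector */
      y_cosine = y0 + (1 - cos(pi * t)) / 2 * (y1 - y0); /* S-shaped connector */
      output;
   end;
run;

proc sgplot data=curve;
   series x=x y=y_linear / legendlabel="linear";
   series x=x y=y_cosine / legendlabel="cosine";
run;
```

Under this assumption the cosine curve has zero slope at both ends, so it leaves and enters the bars horizontally, which is the look typically associated with Sankey links.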
A typical call to %SankeyBarChart might appear as follows.
%sankeybarchart
   (data=dummy
   ,subject=subject
   ,yvar=riskfactors
   ,xvar=visit
   ,yvarord=%str(0, 1, 2, 3)
   ,xvarord=%str(-1, 12, 30, 60)
   ,barwidth=0.
   ,xfmt=xfmt.
   ,legendtitle=%str(# of Risk Factors)
   );
Figure 6. A Bar Chart with Sankey-style Overlays
Sankey bar charts are enhancements of longitudinal bar charts that add Sankey-style overlays between the bars at adjacent time points. These overlays illustrate which groups are driving changes in the bars over time, providing deeper insight into the data being visualized. The SAS macros presented in this paper can be used to assist in the creation of Sankey bar charts.
The Sankey bar chart macro source code is available for direct download at graphics.rhoworld.com/tools/sankeybarchart. Alternatively, send email requests to graphics@rhoworld.com.
Graphically Speaking
Your comments and questions are valued and encouraged. Contact the author at:
Name: Shane Rosanbalm
Enterprise: Rho, Inc
Address: 6330 Quadrangle Drive
City, State ZIP: Chapel Hill, NC 27517
Work Phone: 919-595-
E-mail: shane_rosanbalm@rhoworld.com
Web: graphics.rhoworld.com
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.