Late to the Party
I’ve known about Makeover Monday for years, but never quite found the motivation to participate. Frankly, I don’t see myself as having the artistic design skills necessary to do great visualizations. Or, to use a more “growth-mindset” phrasing, I haven’t put effort into developing design skills. Also, in my work at Cigna most of the time I use Tableau for data discovery rather than for dashboard development, so I’ve gotten away without having developed Tableau design skills.
But I had the idea of creating a course around the Makeover Monday datasets. The idea would be to demonstrate analyzing the datasets and creating a visual, but more importantly to get students to submit their own work and give and receive feedback. Michelle’s comments on resonate with me, “I had been building visualizations for many years at work, but I was not satisfied with how they looked or the information they conveyed/stories they were telling. “
Makeover Monday is on an indefinite hiatus, however the datasets from over 300 weeks are available, and you can get inspiration from other people’s work by looking through submissions on the submission tracker:
The Bloomberg article with the original viz has the title The Cereal Industry Had a Very Weird Year, basically showing that after a period of decline, cereal manufacturers were surprised by a sudden surge of demand related to COVID in March of 2020. “Cereal” in the first visual actually includes things like rice, flour, and pasta in addition to breakfast cereal, but there are other breakfast cereal-specific charts later in the article.
So my first question was whether other food categories showed a similar spike. It turns out they did.
I created a crosstab view with Product Sub-Category and Month, filtering on 2020:
A quick table calculation shows the % increase in March 2020:
All sub-categories saw large increases in March 2020. Fats and oils showed the same % increase as cereals. Notice also that there was a decrease in April, but expenditures didn’t fall back to the same level they were at in February… for example Cereals spiked 26.94% in March, then fell 12.61% in February. All sub-categories followed that pattern.
I put together a workbook with some more analysis, writing down insights in the captions.
Creating a Barbell Chart
Using the submission tracker to look at past submissions, I found a nice one by Aeiyuni Husna:
I like this one because for a few reasons:
- The viz highlights the magnitude of the increase, not only from February to March but from March 2019 to 2020.
- The annotation makes it clear why the increase occurred
- The animation (which you can see if you click the link) and buttons are nice way to look at the different categories. Compare Alcohol to the other sub-categories.
But there are some limitations:
- Filtering out the data before 2019 means that you can’t see just how unprecedented this spike is.
- The x-axis does not include zero, which visually exaggerates the magnitude of the change.
- The scale is unclear… what does 15K mean? There is nothing on the chart to indicate
- You can look at individual categories using the buttons, but there’s no way to see the overall trend
Producing a Similar Visualization
Aeiyuni used a barbell chart, but it has a number of other names: dumbell chart, DNA chart, gap chart, barbell chart, and connected dot plot. Tableau has a tutorial on it, but I found this more helpful. Following the instructions, I came up with this:
I multiplied the “Millions of dollars” measure by 1,000,000 so to create a field called “Dollars” and set the default number format to billions.
That’s a start, but I’d like it to look better. Some improvements:
- The line (bar) of the barbell should go behind the dot (bell).
- Get rid of the right axis, and most of the axis lines
- Change the color scheme. 2020 should pop out.
- Add an attractive title.
- Improve the appearance of the annotation
The measure that comes first on the row shelf goes to the front, so by reordering we can get the dots to come in front
Get rid of the gridlines
Format > Borders then set the Row Divider and Column Divider to None
Change the color scheme. 2020 should pop out.
After overwritting my tps file by copying the file to \Documents\My Tableau Repository, I restarted Tableau. The new colors show up here:
I went with Magma 20.
Improve the appearance of the annotation
Aeiyuni’s approach of creating a an “annotation” was to add a text object to the dashboard, then use two blank objects with a background color to create the lines from the text object to the chart. by adding a text object and
about following I set an (invisible) reference line above my chart to add space for the annotation.
This extra space gave me room for the annotation. I did a rounded border and made the lines fairly thick:
Hiding the left axis
With a dual axis chart, if you try to hide an axis you end up hiding both. To hide just the right axis, set the tick marks to None:
Victuals is apparently the correct term for “food and drink,” though perhaps I should have just written “Buy for Food and Drink” instead. Here’s the link to the dashboard.
This is by no means Iron Viz awesome, but I think it’s progress.
This post will explain three ways to calculate the median in Tableau. Let’s get started!
Method 1: Use the MEDIAN function
First create a new calculation by selecting Analysis > Create Calculated Field:
Next write using the syntax MEDIAN(expression). The expression should be just the name of the field for which you’d like the median. In my example, I will calculate the median exam score.
Median is an aggregation function, so summarizes the exam scores.
You can see that I get the same value if I used the Median function in Excel:
Method 2: Use Percentile
The median is another term for the 50th percentile, so one way to calculate the median to calculate the 50th percentile, as shown here.
Method 3: Create a Box-and-Whisker Plot
When you create a box plot, Tableau will calculate the median. First uncheck aggregate measures:
Add the field you’d like to find the median for to the viz.
Then select Box-and-Whisker on the show-me page:
When you mouse over the box-and-whisker plot, you’ll see the median value:
A continuous axis in Tableau will default to include 0 and will adjust automatically the range based on the minimum and maximum values in the visualization. However, changing the axis range can be done easily by right-clicking on the axis and selecting edit axis:
After selecting Edit Axis, you’ll see this menu with numerous options including the option to change the axis range.
For example, I can keep the start at 0 but increase the top of the range:
This won’t work if you’re using a discrete field (shown with a blue pill) for the axis. Discrete fields don’t actually create axes, they create headers. Each unique value of the discrete field will create a separate header. In the example below, since the State is discrete I don’t get the option to Edit Axis.
Tableau has a Percentile function:
And will return the percentile corresponding to number for the given expression. The Percentile function acts as an aggregate calculation because it takes a set of values as an input and returns a single value. In this case, Percentile takes a set of values and returns the specified percentile for those values.
It’s easier to understand with an example, of course!
This calculation will return the 75th percentile for the Exam Scores. Here it is in action:
- The 50th percentile for the exam score is 62… so about half the students scored higher and half scored lower.
- The 75th percentile is 81, so about a quarter of students scored higher.
- The 0th percentile would give the lowest score on the exam and the 100th percentile would give the highest score
We can add the Gender dimension to the view and see how the percentiles differ:
The Percentile Table Calculation
The Percentile function returns the specified percentile for the underlying data at current level of detail in the view.
Tableau also has a percentile table calculation. This percentile is calculated based on the values in the visualization, rather than the underlying data in the visualization. Here’s what it looks like:
Notice that the data is aggregated with SUM(Exam Scores). The Percentile table calculation operates on the values after they have been aggregated in the view.
Hope this was helpful. Post a comment if you have any questions.
Suppose you want multiple measures in your view. No problem.
But suppose you want multiple calculations on those multiple measures, and you want the multiple calculations to all be in the same row. That’s what this post was discussing. The person wanted a view with a different measure in each row and a different calculation in each column. The view should look something like this:
Easy in Excel, but not so easy in Tableau! You can easily have multiple rows with multiple measures, but how to have this while also doing different calculations in each column? The only way I can see to do it is to first reshape your data. Your data probably looks like this:
But instead it should look like this, so that each metric gets its own row.
Once you have it reshaped, you create a couple calculations in Tableau:
And then create a view like this:
The practice tests and courses here deal with Tableau Desktop, but that’s only one of Tableau’s software offerings! Please check out my post https://blog.udemy.com/what-is-tableau-used-for/ that gives an overview of Tableau Online, Tableau Server, Tableau Prep and more!
When you update to Tableau 2020.2, you’ll see this on the data source view:
If you didn’t read the announcement from Tableau, you’ll think, “relate? what ever happened to join?”
Joins are still available in Tableau 2020.2, you just double-click to get back to the view you’re used to where you can create joins and unions:
Tableau says that relationships have mostly superseded blends. Relationships allow you to combine also avoid data duplication but are more flexible.
|Combines columns or rows?||Match on key field duplicates data?||Limitations||Tableau Documentation|
|Relationship||Combines columns||No||Relationships cannot be formed between tables from data sources published to Tableau Server or Tableau Online.||Relationships|
|Join||Combines columns||Yes||Cross database joins will not work with all data sources||Joins|
|Blend||Combines columns||No||Fields from the secondary data source cannot be brought in to the view unless there is a match on the key field. This makes a blend somewhat comparable to a left join, since data from the primary data source is always brought into the view even if there is no match to the secondary source.
Blends are only able to combine two tables, a primary and secondary data source.
More information on limitations of blending here here
|Union||Combines rows||No, since unions do not use key fields.||Data sources used in the union must come from the same connection||Unions|
Join Vs Union
Joins combine columns from one table with columns from another table. Here’s an example of how a join looks in Tableau:
And here’s what the data looks like after the two tables have been combined using the join. Notice that the first three columns come from the Book table, while the next three columns come from the Award table.
Unions combine rows from one table with rows from another table. Here’s an example of how a union looks in Tableau:
And here is what the data looks like when you combine Sales Q1 and Sales Q2 with a union. Notice the three of the rows come from Sales Q1 and three of the rows come from Sales Q2
Join Vs Relationship
When tables are joined, the result table has a row each time the join criteria is met. For example, if we have two tables, one with transaction data showing purchases:
and a second table with Monthly Sales targets, with one row per month:
Let’s see how these are combined in Tableau, first using a relationship. I use datetrunc so that 1/14/2022 becomes 1/1/2022 so that it matches to the value in the Monthly Sales target table.
Here’s my view in Tableau:
Now try the same but with a join. Notice that I have the option of Inner, Left, Right, and Full Outer, which I didn’t have when I did the relationship.
Also notice the data preview area. The values of Sales Amt Target are replicated. The values from the sales targets are replicated for each row of the purchase data.
If I don’t do anything to handle this duplication, and simply apply a sum to the Sales Amt Target field, I end up with inflated values for Sales Target:
So this illustrates a couple of the key differences between joins and relationships:
- Joins give the option of different join types – inner, outer, left, and right – which handle nonmatches differently
- Joins produce a row each time the join criteria is met, which can cause duplication. Relationships do not produce duplication.
You can download the Join vs Relationship workbook here: https://public.tableau.com/app/profile/lukas.halim/viz/JoinVsRelationship/Join