The question asks for the greatest increase from Adj Open to Adj Close.

So it means that Adj Close should be higher. Hence the difference is Close – Open.

This is great, thanks so much for the extremely helpful practice questions!

A few pieces of feedback

– question 28 (the Pareto chart) – this is extremely ambiguous – bar and trend lines vs. bar and line chart? That’s probably an unnecessarily tricksy question

– question 30 – based on the statistical significance, consuming few fruits and vegetables isn’t related (sig > 0.5), could clarify that you want to identify the positive relationship independent of statistical significance (and FYI for the answers document, smokers is completely unrelated given that trend line and p value)

– explanation of r^2 and p values in the videos aren’t quite right (r^2 is amount of y explained by x, and p is a measure of likelihood that the value would have been observed by chance, rather than being about whether it would hold with further data)

Overall though, amazing resource, and I will recommend this to anyone going for the CA exam!

Thank you for your feedback, Beth!

28 – Pareto charts were recently removed from the exam guide, so I’m going to switch this to a question about bullet charts

30 – the question didn’t ask about statistical significance, though I can see how you could say that if the p-value is high then there is no association. Perhaps I should clarify this or update the question… it does seem this question causes problems for many people.

When you say that p-value is a measure of the likelihood that the value would be observed by chance, you are implying that the data you are looking at is a sampling of some larger data set (called the population). So, the point is that if you get to see all the data then the p-value is a measure of the likelihood that the association observed holds. Statistics is a method for using a sample to make statements about a population. The population is the “further data.”

For question 8,

My numbers seem to be different than what’s shown in the answer guide. For example, in area code 203, my Sum Sales is 2743 while the exam guide shows 7934.

7,934 is the value you will see before adding the filter on Espresso.

Once you filter on Espresso, area code 203 shows 2743.

Using the Players sheet on the Little League data, what percent of players on the Lions scored between 5 and 10 runs?

Answer Provided: No Answer Provided

Correct Answer: 31.82%

– Could you please explain question # 6? I have run it over and over, and I keep getting 18.18% (4/22 total players)

Hi

for Q6, to get the answer,

1. create a calculation field: [runs]>=5 and [runs]<=10

2. drag baseball team, the new field and countd of player id to the view.

3. use quick table calculation to show the % of total

thanks

Iris

I am still getting

False – 81.82

and

True – 18.18

Hi Suhas,

Please make sure that you’ve set the calculation correctly (>= 5 and <= 10, rather than > 5 and < 10).

Thanks,

Narcis

Hi Ben, when you include the 10 (… and [Runs] <= 10) you get the correct result.

I have verified in the Excel sheet as well; the 3 players you are missing have 10 runs:

Name | Runs | Baseball Team
Jacqueline Gomez | 10 | Lions
Donna Rogers | 9 | Lions
Theresa Armstrong | 6 | Lions
Cynthia Robinson | 6 | Lions
Pamela Clark | 10 | Lions
Wanda Robertson | 10 | Lions
Michael Ramirez | 9 | Lions
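
To make the effect of the boundary concrete, here is a small Python sketch using the seven rows listed above. The team size of 22 is an assumption taken from the "4/22" figure earlier in the thread, and the helper name `percent_of_team` is made up for illustration; this is not how Tableau computes the result internally.

```python
# Rows copied from the thread: Lions players with 5 to 10 runs (inclusive).
lions_in_range = [
    ("Jacqueline Gomez", 10),
    ("Donna Rogers", 9),
    ("Theresa Armstrong", 6),
    ("Cynthia Robinson", 6),
    ("Pamela Clark", 10),
    ("Wanda Robertson", 10),
    ("Michael Ramirez", 9),
]

# Assumption: the Lions have 22 players in total (from "4/22" in the thread).
TOTAL_LIONS = 22


def percent_of_team(players, condition):
    """Percent of the whole team whose run count satisfies `condition`."""
    matched = [name for name, runs in players if condition(runs)]
    return round(100 * len(matched) / TOTAL_LIONS, 2)


# Inclusive upper bound, like [runs] >= 5 and [runs] <= 10.
inclusive = percent_of_team(lions_in_range, lambda r: 5 <= r <= 10)

# Exclusive upper bound, like [runs] >= 5 and [runs] < 10.
exclusive_top = percent_of_team(lions_in_range, lambda r: 5 <= r < 10)

print(inclusive, exclusive_top)
```

The inclusive version counts all seven players, while dropping the three players with exactly 10 runs leaves four, which matches the two percentages discussed in this thread.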

Hi Lukas,

For the 1st question, the solution key says this:

“Hold down the control key to keep both dates selected, then select “Keep Only””

However, the Tableau exam won’t allow us to use the Control key, will it? In that case, how do we select multiple, non-adjacent data points without using the Control key?

Thanks,

Vijaytha

You can bring up the on-screen keyboard to use special keys during the exam. I don’t remember whether control (or shift, if you’re selecting a bunch of marks that are next to each other) was a problem on the virtual machine. I know escape was a problem… it would kick you out of the virtual machine.

For associate exam 2, question 5 asks for BETWEEN 5 and 10. The solution uses 5, 6, 7, 8, 9, 10. Does Tableau’s definition of BETWEEN include the ends of the range? I was expecting to use 6, 7, 8, 9. Technically speaking, “5” does not fall BETWEEN 5 and 10 🙂

Hi

Usually, “between” includes both end values.

Thanks,

Iris

Hi Lukas,

Thank you for the great exams so far. Regarding question #10, I see in your solution doc you have used the following formula to create the % of total smokers from the survey:

avg (if [Smoker] = TRUE then 1 else 0 end)

Unfortunately, I believe this is wrong because it doesn’t give you the % of smokers from the total smokers, but the averages.

The solution I used was to create a calculated field [SMOKERS] after pivoting the data for smokers as you did in question 9.

if smokers? = TRUE then 1 else 0 end

Then, just using this field [SMOKERS], calculate the % of total with the built-in functions.

In my case:

1960 – 23.04% (while you have – 37.70%)

1970 – 21.21% (while you have – 34.70%)

1990 – 17.36% (while you have – 28.40%)

and this results in having as a right answer 1970, not 1990.

Please, feel free to correct me if I am wrong.

Best,

Kiril

I got the same answer as you Kiril so would like an answer / explanation too if possible.

Hi,

I am not sure I follow your approach very well, but I will simplify the approach from the solution guide. The formula avg (if [Smoker] = TRUE then 1 else 0 end) calculates the average of our condition: 1 if the respondent is a smoker, 0 otherwise. For example, say we have 5 people, 2 of them smokers and the other 3 non-smokers. The result of our calculation will be 2/5 = 0.4, which means that 40% of the survey respondents were smokers. This is how the calculation works, and its output is the correct one.

Hope this helps,

Narcis
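
For anyone who wants to check the arithmetic outside Tableau, here is a minimal Python sketch. The first part uses the five-person example from the reply above. The second part uses made-up survey data (the `survey` dictionary is purely hypothetical) to show one possible reason a "% of total" table calculation gives different numbers: it may be answering a different question, namely each year's share of all smokers.

```python
from statistics import mean

# Example from the reply: five respondents, two of them smokers.
is_smoker = [True, True, False, False, False]

# Averaging a 1/0 indicator gives the proportion of smokers.
share = mean(1 if s else 0 for s in is_smoker)
print(share)

# Hypothetical data, for illustration only: responses grouped by year.
survey = {
    1960: [True, True, False, False],
    1990: [True, False, False, False, False, False],
}

# Smoking rate within each year (what the avg formula measures).
rate_within_year = {
    year: mean(1 if s else 0 for s in responses)
    for year, responses in survey.items()
}

# Each year's share of all smokers (what a percent of total measures).
total_smokers = sum(sum(responses) for responses in survey.values())
share_of_all_smokers = {
    year: sum(responses) / total_smokers for year, responses in survey.items()
}

print(rate_within_year)
print(share_of_all_smokers)
```

The two dictionaries differ because the denominators differ: respondents in that year in the first case, all smokers across years in the second.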

Hi

After downloading the dataset, scroll down to see the next button.

Thanks,

Iris

Hi

After downloading the dataset, please scroll further down and you will see the NEXT button on your right side.

Thanks,

Iris

For # 15,

15. Answer this question using the Order sheet from the SuperStore data. Create a scatterplot showing the sum of Profit on the Y-axis and sum of Sales on the x-axis for each Customer. Add a linear trendline. What is the function?

Correct Answer: 0.097578*Sales + 162.386

I thought with scatter plots we needed to uncheck the Aggregate Measures. Do we not need to do this because we added the Customer Name to the canvas?

Thank you.

Hi,

The question asked about the function between profit and sales fixed at the level of each customer. So in this case, it will include total profit/sales generated by each customer. So unchecking [aggregate measures] is not needed.

Iris
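
As a rough illustration of why the measures stay aggregated, here is a Python sketch with made-up order rows (the customer names and amounts are hypothetical). Each customer becomes a single point whose coordinates are that customer's summed Sales and summed Profit.

```python
from collections import defaultdict

# Hypothetical order rows: (customer, sales, profit).
orders = [
    ("Customer A", 100.0, 10.0),
    ("Customer A", 50.0, -5.0),
    ("Customer B", 200.0, 40.0),
]

# Sum sales and profit per customer, as SUM(Sales) and SUM(Profit)
# would with Customer on the detail level.
totals = defaultdict(lambda: [0.0, 0.0])
for customer, sales, profit in orders:
    totals[customer][0] += sales
    totals[customer][1] += profit

# One scatter point per customer: (sum of sales, sum of profit).
points = {customer: (s, p) for customer, (s, p) in totals.items()}
print(points)
```

Disaggregating would instead plot one point per order row, which is three points here rather than two.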

Hi ,

Could you please elaborate on the answer/solution for Question 24?

It is not clear.

Thanks

Hello Vasantha,

The question requests to create a histogram using the Adj High price with a bin size of 5 and after that, we should find out which bin has the highest adjusted average volume. The solution implies all these steps – creating the histogram with the Adj High Price field, and after that, we can add the average Adj Volume to color the bins and see which one has the highest volume.

Hope this helps,

Narcis
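
The same steps can be sketched in Python. The rows below are invented for illustration (they are not the Coca Cola data), and the variable names are assumptions; the point is only to show binning with a size of 5 followed by an average per bin.

```python
import math
from statistics import mean

# Hypothetical rows: (adj_high_price, adj_volume).
rows = [(3.2, 100), (4.9, 300), (7.5, 900), (12.0, 400), (14.1, 200)]

BIN_SIZE = 5

# Assign each row to the bin that starts at the nearest lower multiple of 5.
volumes_by_bin = {}
for adj_high, adj_volume in rows:
    bin_start = math.floor(adj_high / BIN_SIZE) * BIN_SIZE
    volumes_by_bin.setdefault(bin_start, []).append(adj_volume)

# Average volume per bin, then pick the bin with the highest average.
avg_volume_by_bin = {b: mean(v) for b, v in volumes_by_bin.items()}
top_bin = max(avg_volume_by_bin, key=avg_volume_by_bin.get)

print(avg_volume_by_bin, top_bin)
```

Note that the bin with the most rows is not necessarily the bin with the highest average volume, which is why the average is added on color rather than read off the bar heights.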

Hi,

For Question 18, A trendline using an exponential model type will be fit using exponential regression.

Why is this false?

Hi Erin,

As mentioned in the solution guide, although the trend line may be exponential, this doesn’t exclude the possibility of the model being fit with linear regression.

Hope this helps,

Narcis
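
One way to see how an exponential curve can come out of a linear regression: take the natural log of the y values, fit an ordinary straight line, and transform back. The Python sketch below does this on made-up points that lie exactly on y = 2·e^(0.5x). It illustrates the general technique only; whether a given tool fits its exponential trend line this way is something to confirm in that tool's documentation.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data lying exactly on y = 2 * exp(0.5 * x).
xs = [0, 1, 2, 3, 4]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Linear regression on ln(y): ln(y) = ln(a) + b * x.
slope, intercept = fit_line(xs, [math.log(y) for y in ys])

# Transform back to the exponential form y = a * exp(b * x).
a, b = math.exp(intercept), slope
print(a, b)
```

The regression itself is linear; only the response variable was transformed before fitting.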

Hi Lukas and Team,

Thank you for providing such enriching questions to prepare us for our exams. I had a query on Question no. 36. Although the question says avg. sales, the solution uses sum (sales)/countd(Customer ID). Am I missing something? My understanding is that avg sales should use avg. Can you, or anyone on the team, please help me understand where I am going wrong?

Regards

Hi Shekhar,

Not having “average” mentioned in the name doesn’t make it a wrong calculation. Average means total sum divided by the count of the dimension we are trying to calculate the average on – in this case, Average sales by customer is the total sum of sales divided by the number of distinct customers.

Hope this makes it clear.

Thanks,

Narcis

On Question no. 36, I calculated avg sales per customer with this LOD calculation: {fixed [customer Id]: avg([sales])}. For Connecticut, it is 519. avg{fixed [customer Id]: sum([sales])} gives me 522. I am confused.

Hello Sharon,

The calculation avg{fixed [customer Id]: sum([sales])} will produce the same result as in the solution guide because it is the correct calculation of the average sales per customer – you first need to calculate the sum of sales per customer and after that find the average of those sales.

The calculation {fixed [customer Id]: avg([sales])} is not the one we are looking for because it is calculating first the average sales per customer and after that is calculating the average value of those averages per customer.

For example, if customer 1 has 2 orders of $500 and $20, the sum of sales would be $520 and the average sales for that customer would be $260. Using each one of those values in another calculation would give different results.

Thank you,

Narcis
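
Here is a small Python sketch of the difference. Customer C1 is the example from the reply above; customer C2 and its single order are made up so that there is more than one customer to average over. This is a simplified model of the idea, not a reproduction of how Tableau evaluates LOD expressions.

```python
from statistics import mean

# (customer_id, sales) order rows. C1 comes from the example above;
# C2 is a hypothetical second customer added for illustration.
orders = [("C1", 500.0), ("C1", 20.0), ("C2", 100.0)]

# Group the order amounts by customer.
sales_by_customer = {}
for customer_id, sales in orders:
    sales_by_customer.setdefault(customer_id, []).append(sales)

# Like avg{fixed [customer Id]: sum([sales])}: average of per-customer totals.
avg_of_sums = mean(sum(v) for v in sales_by_customer.values())

# Like sum(sales)/countd(Customer ID): total sales over distinct customers.
sum_over_countd = sum(s for _, s in orders) / len(sales_by_customer)

# Averaging per-customer averages instead answers a different question.
avg_of_avgs = mean(mean(v) for v in sales_by_customer.values())

print(avg_of_sums, sum_over_countd, avg_of_avgs)
```

The first two agree because both divide total sales by the number of distinct customers; the third shrinks C1's contribution from 520 to 260 before averaging.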

Hi, Lukas,

For question No. 4, Since the question is to find the most common customer first name, I just simply count Customer First Name after Splitting customer name. I don’t understand why the distinct count of Customer ID is used in the solution.

Thank you for your help

Hi Sharon,

We are using Countd of Customer ID because there might be multiple orders from the same customer. This way we avoid counting the same customer multiple times.

Hope this helps,

Narcis
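
A quick Python sketch with invented rows (the IDs and names are hypothetical) shows how the two counts can disagree when one customer has several orders.

```python
from collections import Counter

# Hypothetical order rows: (customer_id, first_name).
orders = [
    ("C1", "Anna"),
    ("C1", "Anna"),
    ("C1", "Anna"),
    ("C2", "Ben"),
    ("C3", "Ben"),
]

# Counting rows: one customer with many orders inflates the name's count.
rows_per_name = Counter(name for _, name in orders)

# Counting distinct customers: each (customer_id, first_name) pair once.
customers_per_name = Counter(name for _, name in set(orders))

print(rows_per_name.most_common(1))
print(customers_per_name.most_common(1))
```

By row count the most common name is Anna, but only one customer is called Anna, while two distinct customers are called Ben.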

Hi,

I think that question 12 has different data for me:

Using the Coca Cola data, combine the “Price Archive” and “Price” worksheets, which date had the greatest increase between the Adjusted Open price (Adj Open) and the Adjusted Close Price (Adj Open)

A. May 2009

B. August 1998

C. June 1970

D. June 1971

My full data shows that in August 1998 there was Adj open 22.3810 and Adj close 20.0202 with difference of 2.36075. Did I miss something?

Thanks,

Daniel

Or now realizing it maybe wasn’t absolute difference, but difference in real numbers and hence 0.84 > -2.21?

Hi Daniel,

In August 1998 the difference between Adj Close and Adj Open is negative. The question asks for the month that had the greatest increase, which means we must select the largest signed value, not the largest absolute difference.

Hope this makes it clear,

Narcis
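
To illustrate the distinction in Python: the August 1998 prices below are the ones quoted in this thread, while the second entry is a made-up month with a small gain, included only so there is something to compare against.

```python
# (adj_open, adj_close) per month.
months = {
    "August 1998": (22.3810, 20.0202),        # values quoted in the thread
    "Hypothetical month": (10.00, 10.84),     # invented for illustration
}

# Signed change: positive means the price rose from open to close.
change = {m: close - open_ for m, (open_, close) in months.items()}

# Greatest increase = largest signed change.
greatest_increase = max(change, key=change.get)

# Largest absolute difference would also pick up big drops.
largest_abs_change = max(change, key=lambda m: abs(change[m]))

print(change)
print(greatest_increase, largest_abs_change)
```

August 1998 has the larger absolute move, but because it is a drop it cannot be the greatest increase.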

Hi

Seems that there’s a typo on question 12:

12. (NEW) HANDS-ON Question

Using the Coca Cola data, combine the “Price Archive” and “Price” worksheets, which date had the greatest increase between the Adjusted Open price (Adj Open) and the Adjusted Close Price (Adj Open)

Hi

In question 31 the trend line formula is: Obesity (% of pop) = .441593 * Smokers (% of pop) + .153193

and we are asked to find obesity (% of pop). Now, if we substitute 1 for smokers (% of pop), we should get

.441593(1) + .153193=0.594786

Please, how did you arrive at 0.44%?

Thanks

Hello Loui,

If you set the smokers percentage to 1, the result will be 0.594786. If you set it to 2, the result will be 1.036379 (higher by 0.441593 than the value for a smokers percentage of 1), and for 3 the result will be 1.477972 (higher by 0.441593 than the value for 2). This is why the answer is 0.44%: an increase of 1% in the smokers percentage is associated with an increase of 0.44% in the obese population.

Hope this makes it clear.

Thanks,

Narcis

I have the same question as Loui. After I read your notes, I still can’t understand why it is 0.44%. And I can’t relate your example “If you change the smokers percent with 2 the result will be 1.036379 (higher with 0.441593 than the value for smokers percentage 1), for smokers percent 3 the result will be 1.477972 (higher with 0.441593 than percentage smokers 2). ” to this question. The question is asking about a 1% change.

Hi Narcis,

Thanks for your reply about Q30. I am still confused about why a 1% increase in the smoker percentage is associated with an increase of 0.44% in the obese population.

In my opinion, if we plug 1% into the equation:

0.441593*1%+0.153193=0.15760893

And what is the relationship between the cases for 2% and 3% that you listed? Can you please clarify more?

Thank you!

Hello,

The equation is Obesity (% of pop) = .441593 * Smokers (% of pop) + .153193. Think of Smokers (% of pop) as the variable.

We would have the following cases:

Smokers = 1% then the equation would be .441593 * 1 + .153193 = 0.594786

Smokers = 2% then the equation would be .441593 * 2 + .153193 = 1.036379

Smokers = 3% then the equation would be .441593 * 3 + .153193 = 1.477972

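Now, if we compare the result of the equation in all cases, we can see that for each increase of 1 unit in the Smokers variable, the final result increases by 0.441593. The intercept (.153193) cancels out when you take the difference, so the change depends only on the slope.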

Hope this makes it clearer.

Narcis
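
The same comparison can be checked with a few lines of Python. The slope and intercept are the ones from the trend line formula quoted in this thread; the function name `predicted_obesity` is just a label chosen for this sketch.

```python
# Coefficients from the trend line quoted in the thread.
SLOPE = 0.441593
INTERCEPT = 0.153193


def predicted_obesity(smokers_pct):
    """Predicted Obesity (% of pop) for a given Smokers (% of pop)."""
    return SLOPE * smokers_pct + INTERCEPT


# Predictions for smokers = 1, 2 and 3 (percent of population).
values = [predicted_obesity(s) for s in (1, 2, 3)]

# Difference between consecutive predictions: the intercept cancels out.
steps = [later - earlier for earlier, later in zip(values, values[1:])]

print(values)
print(steps)
```

Each step equals the slope (up to floating-point rounding), which is why a 1-point increase in smokers is associated with roughly a 0.44-point increase in predicted obesity, regardless of the starting value.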

Hi Lukas,

Can you please help me to combine the 2 data in Q10. I have done all the steps as per your solution. I have changed ‘Year’ to numeric. However, when I want to connect them, the newly created split field ‘Year’ does not appear in the edit relationship window and thus I am stuck and cannot connect. Has anyone else faced this problem? Please, help.

Thanks,

Zinnia

Hi Zinnia,

Can you please share a screenshot of your issue?

The solution guide uses a blend to combine the data, so make sure that you are not trying to use a join or the new relationship method – https://interworks.com/blog/2020/04/21/comparing-tableaus-new-relationships-blends-joins/

Thanks,

Narcis