Get certified in Microsoft Fabric—for free! For a limited time, the Microsoft Fabric Community team will be offering free DP-600 exam vouchers. Prepare now
Hello everyone,
I'm studying measures through a classic rock top500 ranking which I found on Kaggle.
For my purpose I intend to select three different columns: Music, YearRanking, Top500.
Music contains all the music from this database;
YearRanking contains all the years, from 2015 to 2022, that the interview was applied;
Top500 contains a ranking that goes from 0 to 500, where 0 means that nobody rated that song and 500 means that this song was rated, but it's in the last position.
Ok, so the problem I'm facing is:
I want to filter this 3 columns to find the 5 most rated song, from 1 to 5, where ALL these songs must be presents in 2015, 2016, 2017, 2018, 2019, 2020, 2021 AND 2022.
How can I solve that?
I tried to get the closest I can, using this measure:
TopMusic = CALCULATE(COUNTROWS(classic_rock_playlist), classic_rock_playlist[Top500]<>0, classic_rock_playlist]<6)
But when I use this as a solution, the logic applied here is: Find the songs that is rated from 1 to 5 in any year from 2015 to 2022.
But what I want is to find the songs that is rated from 1 to 5 in all the years.
Thanks in advance.
Solved! Go to Solution.
@DanielaAmadeuPr It's not a trivial problem to solve. Your filter clause is the problem with your CALCULATE and you really shouldn't be using CALCULATE as a beginner. CALCULATE is an incredibly complex function. With your filter clause you are always going to be in the position of getting music that appears in the top 5 for one or more years but not all years. That's the problem you have to solve for. So, to explain the code:
TopMusicTable =
/*
First, get a table for all music that appears in the top 5 and group that by Year and a unique Key in this case since there are multiple songs named One. The Key is simply a concatenation of artist and music with a | character in between. Use TOCSV(__Table) in the RETURN to visualize the table that is returned. You can use a Card visual for this.
*/
VAR __Table =
SUMMARIZE(
FILTER('classic_rock_playlist',[Top500]<>0 && [Top500]<6),
[YearRanking],
[Key]
)
/*
This simply counts the distinct Year values so now you know how many years you are dealing with
*/
VAR __NumYears = COUNTROWS(DISTINCT('classic_rock_playlist'[YearRanking]))
/*
/*
Next, we add a column called __NumYears to our base table. This column calculates how times that music appears in the base table (__Table). It does this by getting the Key adn then counting the rows in the base table where the Key matches. Again, use TOCSV(__Table1) as the RETURN value to visualize this table in a Card visual for example.
*/
VAR __Table1 =
ADDCOLUMNS(__Table, "__NumYears",
VAR __Music = [Key]
VAR __Result = COUNTROWS(FILTER(__Table,[Key] = __Music))
RETURN
__Result
)
/*
Now all we have to do is to filter the table with the additional column (__Table1) where the __NumYears column matches our __NumYears VAR that we created, meaning that the song appeared in all years because they match. We only want the key column as a return value so we use SELEECTCOLUMNS for that and just we only want distinct values so we use DISTINCT as well.
*/
VAR __Result = DISTINCT(SELECTCOLUMNS(FILTER(__Table1, [__NumYears] = __NumYears),"Key",[Key]))
RETURN
__Result
@DanielaAmadeuPr I'm thinking something like this:
Measure =
VAR __Table =
SUMMARIZE(
FILTER('classic_rock_playlist',[Top500]>0 && [Top500]<6)
[YearRanking],
[Music]
)
VAR __NumYears = COUNTROWS(DISTINCT('classic_rock_playlist'[YearRanking]))
VAR __Table1 =
ADDCOLUMNS(
__Table,
"__NumYears" =
VAR __Music = [Music]
VAR __Result = COUNTROWS(FILTER(__Table,[Music] = __Music))
RETURN
__Result
)
VAR __Result = COUNTROWS(FILTER(__Table1, [__NumYears] = __NumYears))
RETURN
__Result
Hello, @Greg_Deckler
First of all, thanks for helping me.
I tried your code as a solution, but it's only filtering the musics that are in some of the years.
Here is your code:
TopMusic =
VAR __Table =
SUMMARIZE(
FILTER('classic_rock_playlist',[Top500]<>0 && [Top500]<6),
[YearRanking],
[Music]
)
VAR __NumYears = COUNTROWS(DISTINCT('classic_rock_playlist'[YearRanking]))
VAR __Table1 =
ADDCOLUMNS(__Table, "NumYears",
VAR __Music = [Music]
VAR __Result = COUNTROWS(FILTER(__Table,[Music] = __Music))
RETURN
__Result
)
VAR __Result = COUNTROWS(FILTER(__Table1, __NumYears = __NumYears))
RETURN
__Result
And here is the bar chart:
As you can see in the purple bar, it's a value that occurs only in 2022, but it's present when it should not be present.
😞
@DanielaAmadeuPr Yeah, kind of flying blind without the dataset, is it this one?
Classic Rock - Top 500 songs | Kaggle
If not, which dataset are you using?
Yes!
@DanielaAmadeuPr OK, here you go. PBIX is attached below signature.
TopMusic =
VAR __Table =
SUMMARIZE(
FILTER('classic_rock_playlist',[Top500]<>0 && [Top500]<6),
[YearRanking],
[Music]
)
VAR __NumYears = COUNTROWS(DISTINCT('classic_rock_playlist'[YearRanking]))
VAR __Table1 =
ADDCOLUMNS(__Table, "__NumYears",
VAR __Music = [Music]
VAR __Result = COUNTROWS(FILTER(__Table,[Music] = __Music))
RETURN
__Result
)
VAR __Result = COUNTROWS(DISTINCT(SELECTCOLUMNS(FILTER(__Table1, [__NumYears] = __NumYears),"__Music",[Music])))
RETURN
__Result
Hello, @Greg_Deckler
Apologies for the delay.
I went back to studying the problem, and came very close to the solution. My only impediment has been replacing SELECTEDVALUE for another measure that fits better.
Here's my new measure:
@DanielaAmadeuPr I don't think that SELECTEDVALUE is your issue. See updated PBIX file that presents your visual. PBIX is attached below signature.
@Greg_Deckler I see.
Ok, it works better. But as a beginner, is there an easier way to solve that?
I can study this solution of course, but my initial idea was to be capable of solving this task in a way that I can understand.
If it's not possible ok, I can handle that. The solution you proposed is incredible and I gonna study it.
Thanks for being so patient.
@DanielaAmadeuPr It's not a trivial problem to solve. Your filter clause is the problem with your CALCULATE and you really shouldn't be using CALCULATE as a beginner. CALCULATE is an incredibly complex function. With your filter clause you are always going to be in the position of getting music that appears in the top 5 for one or more years but not all years. That's the problem you have to solve for. So, to explain the code:
TopMusicTable =
/*
First, get a table for all music that appears in the top 5 and group that by Year and a unique Key in this case since there are multiple songs named One. The Key is simply a concatenation of artist and music with a | character in between. Use TOCSV(__Table) in the RETURN to visualize the table that is returned. You can use a Card visual for this.
*/
VAR __Table =
SUMMARIZE(
FILTER('classic_rock_playlist',[Top500]<>0 && [Top500]<6),
[YearRanking],
[Key]
)
/*
This simply counts the distinct Year values so now you know how many years you are dealing with
*/
VAR __NumYears = COUNTROWS(DISTINCT('classic_rock_playlist'[YearRanking]))
/*
/*
Next, we add a column called __NumYears to our base table. This column calculates how times that music appears in the base table (__Table). It does this by getting the Key adn then counting the rows in the base table where the Key matches. Again, use TOCSV(__Table1) as the RETURN value to visualize this table in a Card visual for example.
*/
VAR __Table1 =
ADDCOLUMNS(__Table, "__NumYears",
VAR __Music = [Key]
VAR __Result = COUNTROWS(FILTER(__Table,[Key] = __Music))
RETURN
__Result
)
/*
Now all we have to do is to filter the table with the additional column (__Table1) where the __NumYears column matches our __NumYears VAR that we created, meaning that the song appeared in all years because they match. We only want the key column as a return value so we use SELEECTCOLUMNS for that and just we only want distinct values so we use DISTINCT as well.
*/
VAR __Result = DISTINCT(SELECTCOLUMNS(FILTER(__Table1, [__NumYears] = __NumYears),"Key",[Key]))
RETURN
__Result
Hello @Greg_Deckler
I was wondering... if I can't use CALCULATE as a beginner, what should I use instead?
Thanks
@DanielaAmadeuPr Use this approach (below). Also, my new channel DAX For Humans is all about this technique although it's very early in the series so far.
With your table solution, I now have a new problem: I need to filter the Blank songs generated by your table. I don't see it as an effective solution.
You could do this, updated PBIX see page 2 attached.
Rankings Measure =
VAR __Table =
SUMMARIZE(
FILTER(ALL('classic_rock_playlist'),[Top500]<>0 && [Top500]<6),
[YearRanking],
[Key]
)
VAR __NumYears = COUNTROWS(DISTINCT(ALL('classic_rock_playlist'[YearRanking])))
VAR __Table1 =
ADDCOLUMNS(__Table, "__NumYears",
VAR __Music = [Key]
VAR __Result = COUNTROWS(FILTER(__Table,[Key] = __Music))
RETURN
__Result
)
VAR __TopTable = DISTINCT(SELECTCOLUMNS(FILTER(__Table1, [__NumYears] = __NumYears),"Key",[Key]))
VAR __Key = MAX('classic_rock_playlist'[Key])
VAR __Result = IF(__Key IN SELECTCOLUMNS(__TopTable,"__Key",[Key]),MAX('classic_rock_playlist'[Top500]), BLANK())
RETURN
__Result
Thanks a lot for explaining even the tiny details. I'll keep studying DAX and CALCULATE even more.
Hey, @Greg_Deckler
Thanks again for helping me.
The requirements is:
The values of the ranking must be from 1 to 5;
The songs that are from 1 to 5 must be present in all years from 2015 until 2022.
There are only three songs that follow those rules:
Stairway to heaven, One, and Bohemian Rhapsody.
Thanks again!!!
@DanielaAmadeuPr Right, the PBIX file does exactly that and comes out with an answer of 3. You can even revise the measure to CONCATENATEX them up so you get the actual songs back. Here's the CONCATENATEX version:
TopMusic2 =
VAR __Table =
SUMMARIZE(
FILTER('classic_rock_playlist',[Top500]<>0 && [Top500]<6),
[YearRanking],
[Music]
)
VAR __NumYears = COUNTROWS(DISTINCT('classic_rock_playlist'[YearRanking]))
VAR __Table1 =
ADDCOLUMNS(__Table, "__NumYears",
VAR __Music = [Music]
VAR __Result = COUNTROWS(FILTER(__Table,[Music] = __Music))
RETURN
__Result
)
VAR __Result = CONCATENATEX(DISTINCT(SELECTCOLUMNS(FILTER(__Table1, [__NumYears] = __NumYears),"__Music",[Music])),[__Music],",")
RETURN
__Result
what context do you have for the measure?
try like:
TopMusic =
CALCULATE(
COUNTROWS(classic_rock_playlist),
classic_rock_playlist[Top500]<>0,
classic_rock_playlist[Top500]<6,
classic_rock_playlist[YearRanking]>=2015,
classic_rock_playlist[YearRanking]<=2022
)
Hello @FreemanZ ,
I was trying exactly this as a solution. But the return I get is the same: I can see all the musics ranking from 1 to 5 that are in one year or another.
Thanks.
Check out the October 2024 Power BI update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.
User | Count |
---|---|
115 | |
112 | |
105 | |
95 | |
58 |
User | Count |
---|---|
174 | |
147 | |
136 | |
102 | |
82 |