Power BI is turning 10, and we’re marking the occasion with a special community challenge. Use your creativity to tell a story, uncover trends, or highlight something unexpected.
Get startedJoin us for an expert-led overview of the tools and concepts you'll need to become a Certified Power BI Data Analyst and pass exam PL-300. Register now.
Hello all,
I imagine this is quite a basic question given the standard of users on here. I am in my first week of using power BI and although I have solved so many problems using google, I am not sure I am using the right language for this one as I cant find anything that seems to relate.
I am working with an example data set from Kaggle that explores amazon prime videos. I have successfully followed along with a tutorial to build a basic dashboard but that was mainly focused on the design concepts and didnt really do much to clean the data. So I am now trying to improve on this on my own and have run into a problem.
The data set lists titles on amazon prime. Some of the columns have multiple categories stored in list format. For example, country and listed in columns both often have multiple values sepreated by a comma where the title is available in more than one country and fits more than one genre. I want to clean my data and structure it in a way that I could filter by an individual country. So for example the top title that appears in Afghanistan and France would appear if either filter option was selected.
My instict is that I probably need to do some sort of text to columns cleansing then unpivot the data somehow so there is a row per category (i.e Line 1 - Title 1 Afghanistan, Line 2 - Title 2 France) but I am not really sure how to go about that or if it is actually a useful thing to do.
I got the data set HERE
If anyone understands the problem and would be happy to try and explain to me I would be so grateful!
Solved! Go to Solution.
Hi @lottie - You are definitely on the right track. You should convert the comma separated list into a Power Query list and then expand the values to new rows, like the example below. You'll need to do that for genres and countries. Then you will have multiple tables in your model: Movies, MoviesGenre, MoviesCountry. Movies will have a one to many relationship with MovieGenre and a one to many relationship with MovieCountry. Depending on the other tables in your model, you may or may not need to also add a Country dimension table and a Genre dimension table, which would relate to MovieCountry and MovieGenre respectively. You would then add a DAX measure to return the value needed. Please let me know if this gets you what you are needing or if you need additional assistanance.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8s0vy0w1VNJRckwuyczPU4rVwRDTUXBMKUvNKyktSiUgraPgW1lcklpUqRQbCwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Title = _t, Genre = _t]),
#"Replaced Value" = Table.ReplaceValue(Source,each [Genre], each Text.Split ( [Genre], ", "),Replacer.ReplaceValue,{"Genre"}),
#"Expanded Column1" = Table.ExpandListColumn(#"Replaced Value", "Genre")
in
#"Expanded Column1"
Here is an example of a data model that has a similar requirement. This is a data model for calendar events where each event can have one or more associated contacts and one or more associated categories.
If this post helps to answer your questions, please consider marking it as a solution so others can find it more quickly when faced with a similar challenge.
Proud to be a Microsoft Fabric Super User
@jennratten I have now managed to create the list and expand the rows into two new tables as you suggested. I have established the one to many connections and have been able to improve upon the bar chart from the original dashboard as you can see below.
Just curious though, in what scenario would I need bridging tables as per your example?
Original
Improved
Amazing! I will have to work through that step by step and see if I can make it work tomorrow. Thanks for your help 🙂
Hi @lottie - You are definitely on the right track. You should convert the comma separated list into a Power Query list and then expand the values to new rows, like the example below. You'll need to do that for genres and countries. Then you will have multiple tables in your model: Movies, MoviesGenre, MoviesCountry. Movies will have a one to many relationship with MovieGenre and a one to many relationship with MovieCountry. Depending on the other tables in your model, you may or may not need to also add a Country dimension table and a Genre dimension table, which would relate to MovieCountry and MovieGenre respectively. You would then add a DAX measure to return the value needed. Please let me know if this gets you what you are needing or if you need additional assistanance.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8s0vy0w1VNJRckwuyczPU4rVwRDTUXBMKUvNKyktSiUgraPgW1lcklpUqRQbCwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Title = _t, Genre = _t]),
#"Replaced Value" = Table.ReplaceValue(Source,each [Genre], each Text.Split ( [Genre], ", "),Replacer.ReplaceValue,{"Genre"}),
#"Expanded Column1" = Table.ExpandListColumn(#"Replaced Value", "Genre")
in
#"Expanded Column1"
Here is an example of a data model that has a similar requirement. This is a data model for calendar events where each event can have one or more associated contacts and one or more associated categories.
If this post helps to answer your questions, please consider marking it as a solution so others can find it more quickly when faced with a similar challenge.
Proud to be a Microsoft Fabric Super User
This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.
Check out the June 2025 Power BI update to learn about new features.
User | Count |
---|---|
15 | |
11 | |
8 | |
8 | |
7 |
User | Count |
---|---|
15 | |
13 | |
9 | |
7 | |
6 |