Check your eligibility for this 50% exam voucher offer and join us for free live learning sessions to get prepared for Exam DP-700.
Get StartedDon't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.
Hi,
I am playing with some IMDB files that can be downloaded from https://datasets.imdbws.com.
One of the files (title.basics.tsv.gz) is a GZip file and contains basic information about each movie.
I use the Web connector to download and Binary.Decompress to unzip the file (since I will download many files, I have created a function for this):
(FileUrl) =>
let
Source = Web.Contents(FileUrl),
Decompress = Binary.Decompress(Source, Compression.GZip)
in
Decompress
Now, one of the columns, "genres", is a multi-valued column. Leaving out some columns, it looks like this:
Let's call this table "Movies"
Now, I want to create a new table, which maps between each movie in the table and its genres, that table would look like this:
Let's call this table "MovieGenres". I would then create a one-to-many relationship between Movie and MovieGenres.
Now, I know one way to do this: Create a copy of the "Movies" query, rename it "MovieGenres", and split the "genre" column to rows.
However, that seems like a very inefficient way of doing things - Power Query will download, unzip and transform the file twice - once for the Movies query, once for the MovieGenres query. And since the file is almost 800 MB in size, it takes several minutes on my laptop.
So, is there a more efficient way to do this, which doesn't involve downloading and unzipping the file twice?
Thanks!
Hi @gunman ,
I started with this table:
I ended with this table by splitting the column and unpivoting:
using this:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUrMy8xNLMnMz9NRKM7ILypRitWJVjICSpSnFpekFgGFi/JzE/OSU8ESxkCJ5Pzc1JRKHYXcSpCCSqXYWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}, {"Column2", type text}}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Column2", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Column2.1", "Column2.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column2.1", type text}, {"Column2.2", type text}}),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Changed Type1", {"Column1"}, "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Columns",{"Attribute"})
in
#"Removed Columns"
Is that what you are looking to do?
Let me know if you have any questions.
If this solves your issues, please mark it as the solution, so that others can find it easily. Kudos 👍are nice too.
Nathaniel
Proud to be a Super User!
It seems this way you can use your original table without creating a new one. I just did not include the movie title.
Let me know if you have any questions.
If this solves your issues, please mark it as the solution, so that others can find it easily. Kudos 👍are nice too.
Nathaniel
Proud to be a Super User!
I am not sure what your proposed solution is, but your "
Source = Table.FromRows...
is equivalent to my
(FileUrl) =>
let
Source = Web.Contents(FileUrl),
Decompress = Binary.Decompress(Source, Compression.GZip)
in
Decompress
and both have to appear in two different queries in order to produce two different tables (Movies and MovieGenres). So, I don't see how this solves the problem.
BTW, you don't have to unpivot etc. If you split the multi-valued column to rows directly, the code is simplified (and is more flexible, since your code assumes there are only 2 different values):
The code in that case becomes:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUrMy8xNLMnMz9NRKM7ILypRitWJVjICSpSnFpekFgGFi/JzE/OSU8ESxkCJ5Pzc1JRKHYXcSpCCSqXYWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}, {"Column2", type text}}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"Column2", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column2"),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column2", type text}})
in
#"Changed Type1"
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Prices go up Feb. 11th.
Check out the January 2025 Power BI update to learn about new features in Reporting, Modeling, and Data Connectivity.
User | Count |
---|---|
14 | |
13 | |
12 | |
12 | |
12 |