Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Don't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.

Reply
gunman
New Member

Efficient way to create lookup table from multi-value column

Hi,

I am playing with some IMDB files that can be downloaded from https://datasets.imdbws.com.

One of the files (title.basics.tsv.gz) is a GZip file and contains basic information about each movie.

 

I use the Web connector to download and Binary.Decompress to unzip the file (since I will download many files, I have created a function for this):

 

(FileUrl) => 
let
    Source = Web.Contents(FileUrl),
    Decompress = Binary.Decompress(Source, Compression.GZip)
in
    Decompress

 

 

Now, one of the columns, "genres", is a multi-valued column. Leaving out some columns, it looks like this:

gunman_0-1672951215316.png

Let's call this table "Movies"

Now, I want to create a new table, which maps between each movie in the table and its genres, that table would look like this:

gunman_1-1672951437697.png

Let's call this table "MovieGenres". I would then create a one-to-many relationship between Movie and MovieGenres.

 

Now, I know one way to do this: Create a copy of the "Movies" query, rename it "MovieGenres", and split the "genre" column to rows.

However, that seems like a very inefficient way of doing things - Power Query will download, unzip and transform the file twice - once for the Movies query, once for the MovieGenres query. And since the file is almost 800 MB in size, it takes several minutes on my laptop.

 

So, is there a more efficient way to do this, which doesn't involve downloading and unzipping the file twice?

 

Thanks!

 

 

3 REPLIES 3
Nathaniel_C
Community Champion
Community Champion

Hi @gunman ,
I started with this table:

Nathaniel_C_0-1672958644806.png

 

I ended with this table by splitting the column and unpivoting:

Nathaniel_C_1-1672958748918.png

 

using this:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUrMy8xNLMnMz9NRKM7ILypRitWJVjICSpSnFpekFgGFi/JzE/OSU8ESxkCJ5Pzc1JRKHYXcSpCCSqXYWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}, {"Column2", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Column2", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Column2.1", "Column2.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column2.1", type text}, {"Column2.2", type text}}),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Changed Type1", {"Column1"}, "Attribute", "Value"),
    #"Removed Columns" = Table.RemoveColumns(#"Unpivoted Columns",{"Attribute"})
in
    #"Removed Columns"

Is that what you are looking to do?


Let me know if you have any questions.

If this solves your issues, please mark it as the solution, so that others can find it easily. Kudos 👍are nice too.
Nathaniel





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




It seems this way you can use your original table without creating a new one. I just did not include the movie title.


Let me know if you have any questions.

If this solves your issues, please mark it as the solution, so that others can find it easily. Kudos 👍are nice too.
Nathaniel





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




I am not sure what your proposed solution is, but your "

Source = Table.FromRows...

is equivalent to my 

(FileUrl) => 
let
    Source = Web.Contents(FileUrl),
    Decompress = Binary.Decompress(Source, Compression.GZip)
in
    Decompress

and both have to appear in two different queries in order to produce two different tables (Movies and MovieGenres). So, I don't see how this solves the problem.

 

BTW, you don't have to unpivot etc. If you split the multi-valued column to rows directly, the code is simplified (and is more flexible, since your code assumes there are only 2 different values):

gunman_0-1673297027724.png

The code in that case becomes:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUrMy8xNLMnMz9NRKM7ILypRitWJVjICSpSnFpekFgGFi/JzE/OSU8ESxkCJ5Pzc1JRKHYXcSpCCSqXYWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}, {"Column2", type text}}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"Column2", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column2"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column2", type text}})
in
    #"Changed Type1"

 

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Prices go up Feb. 11th.

Jan25PBI_Carousel

Power BI Monthly Update - January 2025

Check out the January 2025 Power BI update to learn about new features in Reporting, Modeling, and Data Connectivity.

Jan NL Carousel

Fabric Community Update - January 2025

Find out what's new and trending in the Fabric community.