Efficient way to create lookup table from multi-va...

gunman · ‎01-05-2023

Hi,

I am playing with some IMDB files that can be downloaded from https://datasets.imdbws.com.

One of the files (title.basics.tsv.gz) is a GZip file and contains basic information about each movie.

I use the Web connector to download and Binary.Decompress to unzip the file (since I will download many files, I have created a function for this):

(FileUrl) => 
let
    Source = Web.Contents(FileUrl),
    Decompress = Binary.Decompress(Source, Compression.GZip)
in
    Decompress

Now, one of the columns, "genres", is a multi-valued column. Leaving out some columns, it looks like this:

Let's call this table "Movies"

Now, I want to create a new table, which maps between each movie in the table and its genres, that table would look like this:

Let's call this table "MovieGenres". I would then create a one-to-many relationship between Movie and MovieGenres.

Now, I know one way to do this: Create a copy of the "Movies" query, rename it "MovieGenres", and split the "genre" column to rows.

However, that seems like a very inefficient way of doing things - Power Query will download, unzip and transform the file twice - once for the Movies query, once for the MovieGenres query. And since the file is almost 800 MB in size, it takes several minutes on my laptop.

So, is there a more efficient way to do this, which doesn't involve downloading and unzipping the file twice?

Thanks!

Nathaniel_C · ‎01-05-2023

Hi @gunman ,
I started with this table:

I ended with this table by splitting the column and unpivoting:

using this:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUrMy8xNLMnMz9NRKM7ILypRitWJVjICSpSnFpekFgGFi/JzE/OSU8ESxkCJ5Pzc1JRKHYXcSpCCSqXYWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}, {"Column2", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Column2", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Column2.1", "Column2.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column2.1", type text}, {"Column2.2", type text}}),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Changed Type1", {"Column1"}, "Attribute", "Value"),
    #"Removed Columns" = Table.RemoveColumns(#"Unpivoted Columns",{"Attribute"})
in
    #"Removed Columns"

Is that what you are looking to do?

Let me know if you have any questions.

If this solves your issues, please mark it as the solution, so that others can find it easily. Kudos 👍are nice too.
Nathaniel

Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

Nathaniel_C · ‎01-05-2023

It seems this way you can use your original table without creating a new one. I just did not include the movie title.

Let me know if you have any questions.

If this solves your issues, please mark it as the solution, so that others can find it easily. Kudos 👍are nice too.
Nathaniel

Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

gunman · ‎01-09-2023

I am not sure what your proposed solution is, but your "

Source = Table.FromRows...

is equivalent to my

(FileUrl) => 
let
    Source = Web.Contents(FileUrl),
    Decompress = Binary.Decompress(Source, Compression.GZip)
in
    Decompress

and both have to appear in two different queries in order to produce two different tables (Movies and MovieGenres). So, I don't see how this solves the problem.

BTW, you don't have to unpivot etc. If you split the multi-valued column to rows directly, the code is simplified (and is more flexible, since your code assumes there are only 2 different values):

The code in that case becomes:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUrMy8xNLMnMz9NRKM7ILypRitWJVjICSpSnFpekFgGFi/JzE/OSU8ESxkCJ5Pzc1JRKHYXcSpCCSqXYWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}, {"Column2", type text}}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"Column2", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column2"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column2", type text}})
in
    #"Changed Type1"

Efficient way to create lookup table from multi-value column

Helpful resources

Join our Fabric User Panel

Power BI Monthly Update - June 2025

Fabric Community Update - June 2025

Join the #PBI10 DataViz contest

Efficient way to create lookup table from multi-value column

Helpful resources

Join our Fabric User Panel

Power BI Monthly Update - June 2025

Fabric Community Update - June 2025