
dpc_development
Helper III

Power Query: Merging on virtual table from starting CSV takes too long and repeats code execution

I have a starting CSV that contains a SELECT DISTINCT of all the dimension columns of my transaction data. The columns are hierarchical, so I am trying to split that further into three tables based on column distribution and frequency of reporting level required.

 

My code is similar to the following: I create an index on the distinct combinations of the first set of columns and merge it back into the original master. After repeating this twice for two other sets of columns, I merge the original master ID to the transaction table's master ID, yielding three dimension foreign keys for a better-defined star schema.

 

let
    Source = Folder.Files(#"Folder Path"),
    ReadCSV = Table.AddColumn(Source, "CSV", each
                    Table.PromoteHeaders(
                                            Csv.Document([Content], [Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]), [PromoteAllScalars=true]
                                        )
                ),
    KeepCSV = Table.SelectColumns(ReadCSV,{"CSV"}),
    ExpandedCSV = Table.ExpandTableColumn(
                            KeepCSV, "CSV",
                            {"list", "of", "columns"},
                            {"List", "Of", "Columns"}
                        ),
    ChangedType = Table.TransformColumnTypes(ExpandedCSV,{{"List", type text}, {"Of", type text}, {"Columns", type text}}),
	
	/* The EventScenarios - ExpandedES steps are repeated similarly two more times, with other column sets */
    EventScenarios = let
                        ESSource = Table.Group(ChangedType, {"Event", "Group", "Scenario"}, {{"ValueColumn", each List.Max([ValueColumn])}}),
                        ESSorted = Table.Sort(ESSource, {{"Event", Order.Ascending}}),
                        ESFinal = Table.AddIndexColumn(ESSorted, "Id", 1, 1, Int64.Type)
                     in
                        ESFinal,
    MergedES = Table.NestedJoin(ChangedType, {"Event", "Group", "Scenario"}, EventScenarios, {"Event", "Group", "Scenario"}, "Event Scenarios", JoinKind.LeftOuter),
    ExpandedES = Table.ExpandTableColumn(MergedES, "Event Scenarios", {"Id"}, {"Event Scenario Id"})
in
    ExpandedES

 

The code above works. However, Power Query doesn't appear to cache the earlier merge results or the inner generation of distinct rows; each downstream step seems to re-evaluate them.

 

The original file is only 65 MB (uncompressed), but the loaded data grows to around 570 MB.

 

Since the dimensions are hierarchical and each subsequent SELECT DISTINCT includes the previously generated master table's ID, Power Query appears to re-process the previously generated tables every time. I tried wrapping them in Table.Buffer, but that only seemed to make things worse.
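For what it's worth, the usual guidance is to buffer only the small distinct table, not the large source, so each of the three merges reads the buffered copy instead of re-evaluating the grouping chain. A hedged sketch against the snippet above (same placeholder column names; assuming the dimension table fits comfortably in memory):

```
// Sketch: buffer the small dimension table once so later merges against
// ChangedType reuse it instead of recomputing the distinct rows each time.
EventScenarios = Table.Buffer(
    Table.AddIndexColumn(
        Table.Sort(
            Table.Distinct(Table.SelectColumns(ChangedType, {"Event", "Group", "Scenario"})),
            {{"Event", Order.Ascending}}
        ),
        "Id", 1, 1, Int64.Type
    )
),
MergedES = Table.NestedJoin(ChangedType, {"Event", "Group", "Scenario"},
                            EventScenarios, {"Event", "Group", "Scenario"},
                            "Event Scenarios", JoinKind.LeftOuter)
```

Buffering ChangedType itself (the large table) is what tends to make things worse, since the whole expanded dataset is then pinned in memory for every evaluation.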

 

Can someone suggest how I might improve the loading process?

6 REPLIES
Anonymous
Not applicable

Hi @dpc_development ,

 

Power Query automatically detects what connector to use based on the first file found in the list. 

After selecting Transform data in the Combine files dialog box, you'll be taken back to the Power Query Editor in the query that you initially created from the connection to the local folder. The output query now contains the source file name in the left-most column, along with the data from each of the source files in the remaining columns.

Please review the entire process according to this document to see if it can improve your situation.

Combine CSV files 

 

Hope it helps,


Community Support Team _ Caitlyn

If this post helps, then please consider accepting it as the solution to help other members find it more quickly.

 

 

lbendlin
Super User

Do you have to do the merge in Power Query or could the data model do the work for you?

 

There are some really good articles on Table.NestedJoin and its pitfalls.
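One commonly cited pitfall is that Table.NestedJoin followed by Table.ExpandTableColumn carries a nested table column through the query, which can hurt performance, whereas Table.Join produces a flat result and accepts an algorithm hint. A hedged sketch using the same placeholder names as the question (Table.Join keeps all columns from both inputs, so shared names must be renamed first):

```
// Sketch: flat join instead of NestedJoin + ExpandTableColumn.
// Renames avoid column-name conflicts between the two inputs.
ESRenamed  = Table.RenameColumns(EventScenarios,
                 {{"Event", "ES Event"}, {"Group", "ES Group"}, {"Scenario", "ES Scenario"}}),
MergedES   = Table.Join(ChangedType, {"Event", "Group", "Scenario"},
                        ESRenamed, {"ES Event", "ES Group", "ES Scenario"},
                        JoinKind.LeftOuter, JoinAlgorithm.LeftHash),
ExpandedES = Table.RemoveColumns(MergedES, {"ES Event", "ES Group", "ES Scenario"})
```

Whether LeftHash (or another JoinAlgorithm option) helps depends on the relative sizes of the two tables, so it is worth testing against the default.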

@lbendlin I would prefer it happen in Power Query itself. When you say Data Model, do you mean in the primary data source location? I do not have access to that, only the denormalised CSV output.

No, I mean the Power BI side: after you load the queries, you can relate the tables in the data model view.

Not sure how I'd use a join in the data model, though. I am basically normalising the flattened CSV output: taking a distinct of some table columns into a separate table, generating IDs, and replacing those columns in the flattened table with the generated ID.

I assume the Join needs to be done in Power Query itself, in this scenario.

" I am basically normalising the flattened CSV output"

 

Yes, I understand what you are trying to do. But do you really have to? Are you familiar with the concept of expanded tables?  

 

How bad is your memory pressure?  Maybe you have a solution that is looking for a problem. What would happen if you consume the CSV as is, and then let the data model do the work (even if the table is denormalized, in-memory hierarchies will be fast).
