The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredCompete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.
Hi,
I need some assistance figuring out how to join multiple tables along a single shared column. I understand how to join static tables that I can see and have the structure for. The trouble I am having is a more complicated scenario.
The data I am accessing is downloaded monthly from an API as a zip folder, and the data itself is saved in bunch of separate csv files, each file contains different columns from the data set in different numbers. Every month the number of files changes, and all the file names change. I need to rebuild all of these separate tables from the csv files into a single table using the FilingID column as the shared index.
So far I have been saving some sample data in a SharePoint folder and am able to load all of the tables from each csv file through a custom function. The problem is I do not know the Power Query syntax for performing a join on tables that are created from invoking custom functions.
let
Source = SharePoint.Files("https://_.sharepoint.com", [ApiVersion = 15]),
#"Filtered Rows" = Table.SelectRows(Source, each ([Folder Path] = "https://_.sharepoint.com/sites/dev/ODC/MayRaw/Filing_Data_20250501_202505311/")),
#"Filtered Hidden Files1" = Table.SelectRows(#"Filtered Rows", each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", List.Distinct(List.Combine(List.Transform (#"Removed Other Columns1"[Transform File], each Table.ColumnNames(_)))))
in
#"Expanded Table Column1"
I believe where the join needs to happen is at the #"Expanded Table Column1" step, but I'm not certain about that.
Can anyone help me out? Thanks in advance
Solved! Go to Solution.
Hi @jrrs
Try this
let
Source = SharePoint.Files("https://_.sharepoint.com", [ApiVersion = 15]),
FilteredRows = Table.SelectRows(Source, each [Folder Path] = "https://_.sharepoint.com/sites/dev/ODC/MayRaw/Filing_Data_20250501_202505311/"),
FilteredHiddenFiles = Table.SelectRows(FilteredRows, each [Attributes]?[Hidden]? <> true),
InvokedTransform = Table.AddColumn(FilteredHiddenFiles, "Transform File", each #"Transform File"([Content])),
RenamedColumns = Table.RenameColumns(InvokedTransform, {"Name", "Source.Name"}),
RemovedOtherColumns = Table.SelectColumns(RenamedColumns, {"Source.Name", "Transform File"}),
// Extract the list of tables
TableList = RemovedOtherColumns[Transform File],
// Merge all tables on FilingID using List.Accumulate
MergedTable = List.Accumulate(
TableList,
TableList{0}, // Start with the first table
(state, current) => Table.Join(state, "FilingID", current, "FilingID", JoinKind.FullOuter)
)
in
MergedTable
I hope this helps, please give a thumbs up and mark as solved if it does, thanks!
Hi @jrrs
I've recreated the issue with a list of tables. You can replace these with the end result of your other query but for testing this might be helpful:
let
Table1 = Table.FromRows( //Generate fake data 1
{
{1, "afa", "UK"},
{2, "abc", "da"}
},
{"FilingID", "StuffA", "StuffB"}
),
Table2 = Table.FromRows(//Generate fake data 2
{
{1, "x", "y"},
{3, "z", "w"}
},
{"FilingID", "StuffE", "StuffF"}
),
Table3 = Table.FromRows( //Generate fake data 3
{
{2, "x", "y"},
{1, "z", "w"}
},
{"FilingID", "StuffC", "StuffD"}
),
AllTables = {Table1, Table2, Table3} //Combine fake tables, replace this with the output of your Folder join
,
FirstTable = List.First(AllTables), //freturn the first table everything else will join to
RemainingTables = List.Skip(AllTables, 1), //Ignore the first row of the table
IndexedTables = List.Zip({RemainingTables, List.Numbers(1, List.Count(RemainingTables))}), //Index the column names using a count
CombinedTable = List.Accumulate( //loop over the tables running a function on each
IndexedTables, //see above
FirstTable, //see above
(state, pair) => //two parameters to pass in
let
nextTable = pair{0}, //return name
index = pair{1}, //return the index number
renamedTable = Table.RenameColumns(nextTable, {"FilingID", "FilingID" & Text.From(index)}) //rename the column so no clash when merging (Filing1, Filing2 etc)
in
Table.Join(state, "FilingID", renamedTable, "FilingID" & Text.From(index), JoinKind.FullOuter) //Full outer join so every column frm every table returned
),
//optionally loop through and create a combined filing index column
//if every ID is represented in every file this is no nedded
FilingIDColumns = List.Select(Table.ColumnNames(CombinedTable), each Text.Contains(_, "FilingID")),
// Use the above list of names to add a new column, remove the empty rows
AddCombinedColumn = Table.AddColumn(CombinedTable, "CombinedFilingID", each
List.RemoveNulls(List.Transform(FilingIDColumns, (col) => Record.Field(_, col)))
),
// Take the list of column names and return a single column
FlattenedColumn = Table.TransformColumns(AddCombinedColumn, {"CombinedFilingID", each if List.Count(_) > 0 then _{0} else null})
//Use the previously created list of names to delete the old columns
, DeleteColumns = Table.RemoveColumns(FlattenedColumn, FilingIDColumns),
//reorder to put the CombinedFilingID at the start but first remove the CombinedFilingID column then add it to the start
#"Reordered Columns" = Table.ReorderColumns(DeleteColumns, List.Combine({{"CombinedFilingID"}, List.RemoveItems(Table.ColumnNames( DeleteColumns),{"CombinedFilingID"})}))
in
#"Reordered Columns"
If you are happy with this answer please mark as a solution for others to find !
Kudos are always appreciated! Check out our free Power BI video courses.
Hi @jrrs
We can add a line to ensure FilingID isn't duplicated as your error points to this. Strange I know, but let's force the code to figure it out rather than checking through files to find the problem! If we add this before list.accumulate, we shouldn't bump into the value=3 error, but if we do, you'll need to change the type of your problem column to allow numbers and letters.
This is the line of code we will add, it will ensure all tables have only one FilingID column:
TableList = List.Transform(RemovedOtherColumns[Transform File], each Table.Distinct(_, {"FilingID"}))
When added to your code before list.accumulate, it will look like this:
let
Source = SharePoint.Files("https://_.sharepoint.com", [ApiVersion = 15]),
FilteredRows = Table.SelectRows(Source, each [Folder Path] = "https://_.sharepoint.com/sites/dev/ODC/MayRaw/Filing_Data_20250501_202505311/"),
FilteredHiddenFiles = Table.SelectRows(FilteredRows, each [Attributes]?[Hidden]? <> true),
InvokedTransform = Table.AddColumn(FilteredHiddenFiles, "Transform File", each #"Transform File"([Content])),
RenamedColumns = Table.RenameColumns(InvokedTransform, {"Name", "Source.Name"}),
RemovedOtherColumns = Table.SelectColumns(RenamedColumns, {"Source.Name", "Transform File"}),
// Extract and deduplicate each table in the list
TableList = List.Transform(RemovedOtherColumns[Transform File], each Table.Distinct(_, {"FilingID"})),
// Merge all tables on FilingID using List.Accumulate
MergedTable = List.Accumulate(
TableList,
TableList{0}, // Start with the first table
(state, current) => Table.Join(state, "FilingID", current, "FilingID", JoinKind.FullOuter)
)
in
MergedTable
Hope it works this time!
Thank you, wardy, but I am still receiving errors about duplicate column names even after your suggested changes. In samwise's suggestion, he had hardcoded tables and the result table was exactly what I am looking for, but as soon as I try to use it with the csv tables, I start getting the same duplicate column errors.
I'm not sure what to do to troubleshoot. Maybe it's in the custom function?
Actually, I think that's exactly it. Right after I typed the previous response I went to check. I thought the function was the most basic of basic function to grab the table and return it, but the editor added a set column number to thqat function and the column count is supposed to b dynamic, so once I deleted that it stopped giving the duplicate column names error.
I believe the code is working correctly now, thank you for your help!
I really appreciate everyone's help, but I'm having a little trouble getting the code to work at the List.Accumulate step in each of the suggestions.
The first error I get is:
Expression.Error: A join operation cannot result in a table with duplicate column names ("FilingID").
Details:
[Type]
So I tried changing the Table.Join to Table.NestedJoin, but then I get the error:
Expression.Error: We cannot convert the value 3 to type Text.
Details:
Value=3
Type=[Type]
I'm not sure what this means. I thought maybe it was in my files but there isn't a 3 anywhere, and I even replaced the test files with new ones, and when I click on "Go to error" it takes me to this screen.
Is the "Value 3" that accumulator function or something else? Any ideas?
Hi @jrrs ,
Thank you for reaching out to the Microsoft Community Forum.
Please follow below steps.
1. Convert your list of CSV tables into a list of Name, Table pairs.
2. Iterative Table.Join them on FillingID.
Please refer below M code.
let
Source = SharePoint.Files("https://_.sharepoint.com", [ApiVersion = 15]),
FilteredRows = Table.SelectRows(Source, each [Folder Path] = "https://_.sharepoint.com/sites/dev/ODC/MayRaw/Filing_Data_20250501_202505311/"),
FilteredHidden = Table.SelectRows(Filtered Rows, each [Attributes]?[Hidden]? <> true),
Invoked = Table.AddColumn(FilteredHidden, "Data", each #"Transform File"([Content])),
OnlyNameAndData = Table.SelectColumns(Invoked, {"Name", "Data"}),
TablesList = List.Transform(OnlyNameAndData[Data], each Table.TransformColumnTypes(_, {{"FilingID", type text}})),
JoinOnFilingID = (t1 as table, t2 as table) =>
Table.Join(t1, "FilingID", t2, "FilingID", JoinKind.FullOuter),
CombinedTable = List.Accumulate(TablesList, TablesList{0}, (state, current) => JoinOnFilingID(state, current))
in
CombinedTable
Regards,
Dinesh
Hi @jrrs
I've recreated the issue with a list of tables. You can replace these with the end result of your other query but for testing this might be helpful:
let
Table1 = Table.FromRows( //Generate fake data 1
{
{1, "afa", "UK"},
{2, "abc", "da"}
},
{"FilingID", "StuffA", "StuffB"}
),
Table2 = Table.FromRows(//Generate fake data 2
{
{1, "x", "y"},
{3, "z", "w"}
},
{"FilingID", "StuffE", "StuffF"}
),
Table3 = Table.FromRows( //Generate fake data 3
{
{2, "x", "y"},
{1, "z", "w"}
},
{"FilingID", "StuffC", "StuffD"}
),
AllTables = {Table1, Table2, Table3} //Combine fake tables, replace this with the output of your Folder join
,
FirstTable = List.First(AllTables), //freturn the first table everything else will join to
RemainingTables = List.Skip(AllTables, 1), //Ignore the first row of the table
IndexedTables = List.Zip({RemainingTables, List.Numbers(1, List.Count(RemainingTables))}), //Index the column names using a count
CombinedTable = List.Accumulate( //loop over the tables running a function on each
IndexedTables, //see above
FirstTable, //see above
(state, pair) => //two parameters to pass in
let
nextTable = pair{0}, //return name
index = pair{1}, //return the index number
renamedTable = Table.RenameColumns(nextTable, {"FilingID", "FilingID" & Text.From(index)}) //rename the column so no clash when merging (Filing1, Filing2 etc)
in
Table.Join(state, "FilingID", renamedTable, "FilingID" & Text.From(index), JoinKind.FullOuter) //Full outer join so every column frm every table returned
),
//optionally loop through and create a combined filing index column
//if every ID is represented in every file this is no nedded
FilingIDColumns = List.Select(Table.ColumnNames(CombinedTable), each Text.Contains(_, "FilingID")),
// Use the above list of names to add a new column, remove the empty rows
AddCombinedColumn = Table.AddColumn(CombinedTable, "CombinedFilingID", each
List.RemoveNulls(List.Transform(FilingIDColumns, (col) => Record.Field(_, col)))
),
// Take the list of column names and return a single column
FlattenedColumn = Table.TransformColumns(AddCombinedColumn, {"CombinedFilingID", each if List.Count(_) > 0 then _{0} else null})
//Use the previously created list of names to delete the old columns
, DeleteColumns = Table.RemoveColumns(FlattenedColumn, FilingIDColumns),
//reorder to put the CombinedFilingID at the start but first remove the CombinedFilingID column then add it to the start
#"Reordered Columns" = Table.ReorderColumns(DeleteColumns, List.Combine({{"CombinedFilingID"}, List.RemoveItems(Table.ColumnNames( DeleteColumns),{"CombinedFilingID"})}))
in
#"Reordered Columns"
If you are happy with this answer please mark as a solution for others to find !
Kudos are always appreciated! Check out our free Power BI video courses.
Hi @jrrs did you try this code? It replaces the FilingId in each table with its own name before joining.
If you are happy with this answer please mark as a solution for others to find !
Kudos are always appreciated! Check out our free Power BI video courses.
I did, but each time I am getting an error at the Join or the Accumlate. When I use the hardcoded sample tables, your code did precisely what I need, but when I try to do it using tables read from csv files, I get errors about duplicate column names.
The error I received from your suggestion was:
Expression.Error: A join operation cannot result in a table with duplicate column names ("").
Details:
[Type]
I'm pretty new to this language and I don't know what I should do to track down the cause
Thank you, Sam, I seem to have found the issue. In the custom function the editor added a set column count even though the files have different numbers of columns, and once I removed that it seems your code is working correctly.
Thank you for your help!
Hi @jrrs
Try this
let
Source = SharePoint.Files("https://_.sharepoint.com", [ApiVersion = 15]),
FilteredRows = Table.SelectRows(Source, each [Folder Path] = "https://_.sharepoint.com/sites/dev/ODC/MayRaw/Filing_Data_20250501_202505311/"),
FilteredHiddenFiles = Table.SelectRows(FilteredRows, each [Attributes]?[Hidden]? <> true),
InvokedTransform = Table.AddColumn(FilteredHiddenFiles, "Transform File", each #"Transform File"([Content])),
RenamedColumns = Table.RenameColumns(InvokedTransform, {"Name", "Source.Name"}),
RemovedOtherColumns = Table.SelectColumns(RenamedColumns, {"Source.Name", "Transform File"}),
// Extract the list of tables
TableList = RemovedOtherColumns[Transform File],
// Merge all tables on FilingID using List.Accumulate
MergedTable = List.Accumulate(
TableList,
TableList{0}, // Start with the first table
(state, current) => Table.Join(state, "FilingID", current, "FilingID", JoinKind.FullOuter)
)
in
MergedTable
I hope this helps, please give a thumbs up and mark as solved if it does, thanks!
Very nice, managed that in fewer lines than I did!
If you are happy with this answer please mark as a solution for others to find !
Kudos are always appreciated! Check out our free Power BI video courses.