The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
Hello everyone,
I would like to combine several tables from the same file and keep only certain columns.
The problem is that the column headings are not always in the same place.
I have attached a sample file with the desired result.
Can you please help me?
Thank you in advance.
Best regards
Hi @Mederic
I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions.
Thank you.
Here is a little different approach.
First, create a custom function to extract the relevant data from your table:
This extracts the relevant column, starting at the location of the header
We will apply the special handling for the "Document" column in the main query.
//rename "fnExtractCol"
(rawTable as table, colHeader as text)=>
[a=Table.ToColumns(rawTable),
b=List.FindText(a,colHeader){0},
col=List.Skip(b,List.PositionOf(b,colHeader)+1)][col]
And the Main Query
Use List.Accumulate and the custom function to process each table and combine them
let
//Changew next lines to create a list of all your tables
//to be used in the List.Accumulate function
Source1 = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Source2 = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
Source3 = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
Tbls = List.Accumulate(
{Source1,Source2,Source3},
#table({},{}),
(s,cur)=> Table.Combine({s,
[a=fnExtractCol(cur,"Document No"),
b={List.First(a,2)},
c=fnExtractCol(cur,"Order"),
d=fnExtractCol(cur,"Amount"),
e=Table.FromColumns({b,c,d},
type table[Document No=Int64.Type, Order=text,Amount=Currency.Type]),
f=Table.FillDown(e,{"Document No"})][f]}))
in
Tbls
From your data in your most recent sample file:
Good evening everyone,
I have carefully reviewed the various solutions.
It seems to me that we are not using the same data model (start of Data).
And it is not easy to explain when the source is a .pdf file. So please accept my apologies.
I have recreated a new file here that is representative of my personal file.
I hope this will be much clearer.
Thank you in advance.
Best regards,
The data that I was using is the data that YOU supplied (your Excel file that you linked to).
Your latest Excel sheet is slightly different from what you have previously supplied. And only minor changes are required in the code I supplied for it to produce your desired result (on your Result tab) given your six tables on your Template Invoice 536 tab.
So I don't really understand your issue.
The relevant differences between the data YOU supplied this time and the data YOU supplied previously has to do with the extra empty cells in the Order and Amount Columns, as well as the different capitalization of Document No vs Document no; both issues easily taken care of with minor code alterations.
Also, one of your tables has no data for Order and Amount. so we need to check for the error and return an empty table if that is the case.
let
//Changew next lines to create a list of all your tables
//to be used in the List.Accumulate function
Source1 = Excel.CurrentWorkbook(){[Name="Tableau1"]}[Content],
Source2 = Excel.CurrentWorkbook(){[Name="Tableau2"]}[Content],
Source3 = Excel.CurrentWorkbook(){[Name="Tableau3"]}[Content],
Source4 = Excel.CurrentWorkbook(){[Name="Tableau4"]}[Content],
Source5 = Excel.CurrentWorkbook(){[Name="Tableau5"]}[Content],
Source6 = Excel.CurrentWorkbook(){[Name="Tableau6"]}[Content],
Tbls = List.Accumulate(
{Source1,Source2,Source3,Source4,Source5, Source6},
#table({},{}),
(s,cur)=> Table.Combine({s,
[a=fnExtractCol(cur,"Document no"),
b={List.First(a,2)},
c=fnExtractCol(cur,"Order"),
d=fnExtractCol(cur,"Amount"),
e=Table.FromColumns({b,c,d},
type table[Document No=Int64.Type, Order=text,Amount=Currency.Type]),
f=try Table.FillDown(e,{"Document No"}) otherwise #table({},{}) ][f]}))
in
Tbls
Of course, I also wonder about the process you use to get the PDF files into the six tables that you show, and wonder if there might be some improvement at that level.
Thanks again,
I put the tables in an Excel file because I couldn't add the PDF invoice.
Indeed the import process is different.
I don't want to charge any pages from the PDF (tables) into the tabs, but rather the result directly.
I was inspired by this video for the process.
I don't understand the language, but it allowed me to get the result that I put in my previous example files.
Best regards,
Hello @ronrsnfld,
Thank you very much for your reply,
This solution with a function is a good idea.
I'll test it out this evening and get back to you
Have a nice day
Best regards
Hello @MasonMA ,
Thank you again for your help.
I'm not sure I can apply some of your steps to my object of Tables without expanding them.
I'will using a Text parameter and adding it directly to my Source.
I've tested it on several invoices and "Document No" always appears in column 2 and on the 3rd line.
I'll use this method even if it's not ideal.
Have a good day.
Best regards,
Hello @Mederic
Could you share your M code?
I got confused again. In your message yesterday, it was mentioned Document No. - 'This information is sometimes in column 2 or 3.'
However in your last message, '"Document No" always appears in column 2 and on the 3rd line.'
If you could review my initial message, the logic is cleaning the tables until your have no Null or columns with unrelated info Before you use Table.Comnine and then transform your Document No. column from there.
However , I wouldn't be able to validate them unless I see your PDF files. Anyways, It's great you have another solution.
Happy to chat on platform.
The problem is that the column headings are not always in the same place.
That is not a problem. Use Table.Combine or the & symbol, it will match the columns from all tables no matter the column order.
Hi @Mederic
I would use below strategies if in your situation.
1. Create one reusable Function for data cleaning as below and apply this function on each table
(table) =>
let
Source = table,
HeaderRow = List.First(List.Select(Table.ToRows(Source), each List.Contains(_, "Document No"))),
HeaderRowIndex = List.PositionOf(Table.ToRows(Source), HeaderRow),
PromoteHeaders = Table.PromoteHeaders(Table.Skip(Source, HeaderRowIndex)),
RemoveNulls = Table.SelectRows(PromoteHeaders, each [Document No] <> null and [Document No] <> ""),
SelectColumns = Table.SelectColumns(RemoveNulls, {"Document No", "Order", "Amount"}, MissingField.UseNull)
in
SelectColumns
2. Use Table.Combine or 'Append Queries as New' from UI to combine all your tables.
3. Add a custom column that returns the Document No only for rows where it's a number, and blanks for name rows
4. Fill down the Document No so each order row gets matched with the correct document.
5. Remove rows where both Order and Amount are blank.
6. Move the No column to the left as your new Document No column. Optionally, you can also remove the original Document No column if it’s no longer needed.
You can also wrap Steps 2–5 into another function if you're repeating this logic across multiple datasets.
In case you would like to refer to this file, please use this link
Hope this helps:)
Thank you for your replies,
Masson, thank you for your clear explanations and this solution.
But Ibendlin's comment made me realise that my file was not exactly the same as the one in my first message.
And sorry for the mistake.
Here is a new file and screenshots of what I would like to do in a specific step, namely adding the column number to the location shown in the screenshot.
Thanks in advance
Best regards
If you need one solution ready for use on your file, please share some real sample data from each table.
Also on your picture, what does it mean by making 'Invoice No.' dynamic? I'm not sure i understand your logic.
Thank @MasonMA you for your reply.
With "make the column dynamic", I meant "automatic" . As we can see, I added a hard-coded formula.
Regarding a real file:
- the source comes from a .pdf file with tables and pages.
- Each table in my example file corresponds to the pages of the .pdf file.
- Each page contains 10 to 11 columns and approxi. 70 rows.
- I have between 5 and 10 pages per invoice.
I want to combine the pages of a single invoice at a time, not of multiple invoices
I have managed to achieve what I wanted, but I am stuck on the "Invoice No" ("Document No") column because, This information is sometimes in column 2 or 3.
In my fictitious file, I have placed "Invoice No" in column 2 or 4 for illustration purposes.
I hope that was clear.
Thank you in advance.
Best regards
Hello @Mederic
I was having the same challenge handling your Document No. when i was playing with your mock file, that's why i created one additional column for these Document No. and use this new Column as a replacement of your old Document No. column (If you see my Step 3 and 4), then filling them down so that right Document No. can be attached to the right Orders.
In my proposal, if you identified that Document No. only appear on Column 2 and 3, you may just need to update AddColumn by creating an inner block (of course you can further tweak the code and make it even more dynamic if you are not sure which Columns have Document No.)
AddInvoiceNoColumn = Table.AddColumn(Source, "Document No", each
let
colsToCheck = { [Column2], [Column3]},
foundDocNo =
List.First(
List.Select(colsToCheck, each Value.Is(Value.FromText(Text.Trim(_)), Int64.Type)),
null
)
in
foundDocNo
)
Hope this gives you some ideas:)