Join us at FabCon Atlanta from March 16 - 20, 2026, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.
Register now!To celebrate FabCon Vienna, we are offering 50% off select exams. Ends October 3rd. Request your discount now.
Hi,
I have some invoices in PDF and would like to see if the data can be tabulised in the way I want.
So, I bring PDF files to Power query and would like to tidy the appended table up.
Unfortunately, the values are not always aligned in the same columns.
Amonst the messy table, I need only values for the Customer PO#, PO Item #, Part #, and Quantity as well as tax invoice number as listed in "Result" tab.
I have already highlighted the values I would like to transfer in yellow in "Invoice" Tab, raw data extracted from PDF.
https://limewire.com/d/UfvBI#k4d0bWtA0w
Is it something we can do using Power query?
Hi @dogbural ,
Thanks for posting in Microsoft Fabric Community.
Thanks @DataNinja777 for outlining the general approach using Power Query - that’s exactly the right direction.
I created a sample Power BI file where the invoice, PO, line, and quantity details are extracted into a structured table. You can download it and click Transform Data to review the applied steps and final output. Please tweak the steps as needed to fit your source data.
Hope this helps. Please reach out for further assistance.
Thank you.
Hi @dogbural ,
Yes, you can absolutely do this using Power Query. It's the perfect tool for transforming messy, semi-structured data like the kind extracted from PDF invoices into a clean, usable table.
The general strategy is to create a new, dedicated column for each piece of information you need: the Tax Invoice number, Customer PO#, PO Item #, Part #, and Quantity. You will use Power Query's functions to scan your existing columns, find the rows that contain the labels for your data (like "Customer PO#"), and then pull the corresponding values into your new, clean columns.
You can start this process by adding a custom column for each field. Go to the Add Column tab and select Custom Column. To get the Tax Invoice number, you can use a formula like the one below. This M code checks if a cell in Column1 contains the text "Tax Invoice" and, if it does, it returns the value from the adjacent column, Column2. You will need to adjust these column names to match your specific table.
if Text.Contains([Column1], "Tax Invoice") then [Column2] else null
You will then repeat this process for the other header-level information. For example, to extract the Customer PO number, you would create another custom column using a similar formula, again adjusting the column names as needed.
if Text.Contains([Column1], "Customer PO#") then [Column2] else null
After creating all the new columns for your data, you will notice that the values only appear on single rows, with the rest of the cells being null. To fix this, you need to carry those values down to all the relevant line items. Select the new columns you just created, navigate to the Transform tab, click on the Fill button, and then select Down. This action will populate the null cells with the value from the first populated cell above, correctly associating each line item with its parent invoice and PO number.
The final step is to clean up the table. Now that all your required data is organized in its own columns, you can remove the rows you no longer need. A good way to do this is to filter the "Part #" or "Quantity" column to show only the rows that are not null. This will effectively remove all the original header and miscellaneous text rows. Once that's done, you can simply right-click and remove the original, messy columns from the PDF import, leaving you with the perfectly structured table you need.
Best regards,