I have a PDF document of, say, 100 pages, where 50 pages contain information I want to extract into Excel format. The information I want is the client name and their address.
The position of the text I want is roughly the same on each page. When I get data from the PDF, the formatting gets strange and the location is no longer the same on each tab representing a page.
Is there an alternative solution? Is Power Query the best way to do this?
The key to working with PDFs and Power Query is to make sure that, before you expand the table column, you sort the tables in descending order by number of columns. So once you have the table column, add a column that uses
each Table.ColumnCount([Name of Table Column])
Now you can sort on that new column in descending order, so the table with the most columns comes first. If possible, make that first PDF your example file. Now when you expand the table column, all of the columns should maintain their order, and none of the column names will be missing.
--Nate
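For readers who prefer to see the steps above as a query, here is a minimal sketch in M. It is not taken from the original reply: the file path is hypothetical, and it assumes the PDF is loaded with `Pdf.Tables`, which returns the nested tables in a column named `Data`. It also assumes the nested tables use column names (such as `Column1`, `Column2`) that do not collide with the outer columns.

```powerquery
let
    // Hypothetical path; replace with the location of your own PDF.
    Source = Pdf.Tables(File.Contents("C:\Data\Clients.pdf")),

    // Add a column holding the number of columns in each nested table.
    AddedCount = Table.AddColumn(
        Source,
        "ColumnCount",
        each Table.ColumnCount([Data]),
        Int64.Type
    ),

    // Sort so the widest table comes first.
    Sorted = Table.Sort(AddedCount, {{"ColumnCount", Order.Descending}}),

    // Take the column names from the widest table (now the first row)
    // so that no column is dropped when expanding.
    ColumnNames = Table.ColumnNames(Sorted{0}[Data]),

    // Expand every nested table using that full list of column names.
    Expanded = Table.ExpandTableColumn(Sorted, "Data", ColumnNames)
in
    Expanded
```

Tables with fewer columns than the widest one should simply show null in the columns they lack. If your query names the table column something other than `Data`, substitute that name in both places.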
I believe PQ is one of the best tools for extracting info from a PDF. Could you please share some screenshots of your query?