Syndicate_Admin
Administrator

Power Query - Extracting Data from a PDF

I have a PDF document of, say, 100 pages, where 50 pages contain information I want to extract into Excel. The information I want is the client name and their address.

 

The position of the text I want is roughly the same on each page. When I get data from the PDF, the formatting gets strange and the location is no longer the same on each tab representing a page.

 

Is there an alternative solution? Is Power Query the best way to do this?

3 REPLIES
watkinnc
Super User

The key to working with PDFs in Power Query is to make sure that, before you expand the table column, you have sorted the tables in descending order by number of columns. So once you have the table column, add a custom column that uses

each Table.ColumnCount([Name of Table Column])


Now you can sort that column in descending order, so the table with the most columns comes first. If possible, make that first PDF your example file. Now when you expand the table column, all of the columns should keep their order, and none of the column names will be missing.
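For anyone who wants the whole sequence in one place, here is a rough sketch of those steps as a single query. It is untested against your file and makes a few assumptions: the file path is made up, the step names are my own, and it relies on the nested tables sitting in the [Data] column that Pdf.Tables normally returns. The Table.Buffer call is an extra I added to pin the sort order before the expand; it is not part of the steps described above.

```powerquery
let
    // Hypothetical path; point this at your own PDF.
    Source = Pdf.Tables(File.Contents("C:\Data\clients.pdf"), [Implementation = "1.3"]),

    // Pdf.Tables returns one row per detected table or page,
    // with the nested table held in the [Data] column.
    WithCount = Table.AddColumn(
        Source,
        "ColumnCount",
        each Table.ColumnCount([Data]),
        Int64.Type
    ),

    // Put the widest table first. Buffering is an added precaution so the
    // sorted order is fixed before the next step reads row 0.
    Sorted = Table.Buffer(
        Table.Sort(WithCount, {{"ColumnCount", Order.Descending}})
    ),

    // Take the column list from the widest table so no column is dropped
    // when the nested tables are expanded.
    WidestColumns = Table.ColumnNames(Sorted{0}[Data]),
    Expanded = Table.ExpandTableColumn(Sorted, "Data", WidestColumns)
in
    Expanded
```

Rows that came from narrower tables will show null in the columns they do not have, so you will probably still want a filter step afterwards to keep only the pages you care about.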

 

--Nate


I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!
Einomi
Resolver II

Hi @Syndicate_Admin 

 

I believe PQ is one of the best tools for extracting info from a PDF. Could you please share some screenshots of your query?

I've attached an example of a PDF I am working with. I've removed some text and highlighted those areas in yellow to indicate there was text there previously. I'd like to pick up the beneficiary name, the TFN, and the input for the tax items, e.g. A, A1, A2, etc.
