TameshDoobay
Frequent Visitor

Extract Table from multiple tables from multiple pdfs

Hi,

 

Right now I have some PDFs which have multiple tables within each PDF.

 

I need to extract a single table from each of the PDFs. 

 

The way I'm attempting or suggesting to do it is by:

1. Import the PDF data

2. In Power Query, I expand the data. At this point I have five columns: Source.Name, Id, Name, Kind and Data. The separate tables are within the Data column.

3. This is where I'm confused. My goal is to extract the tables that have these headers: ITEM NO., PART NUMBER, DESCRIPTION, QTY.

So, should I create a column that somehow checks the tables and gives a true/false value based on if the table contains those headers? How would I do that?

Thanks

 

1 ACCEPTED SOLUTION
NaveenGandhi
Super User

@TameshDoobay 

The attached file has a solution that should work. In my case the tables came without headers, so I needed a Promote Headers step. I am not sure about yours, so I made one version with promoted headers and another without.

Basically, I created a filter column that flags which tables match the column names in a ColNames list I created, and then expanded the matching tables.
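For readers without the attachment, here is a minimal sketch of that idea in M. It is not the attached file itself. The `Expanded` sample table, the `Normalize` helper and the exact header spellings (for example `QTY` without a trailing period) are assumptions for illustration; replace `Expanded` with your own expanded query.

```powerquery
let
    // Headers the target table must contain (spelling assumed from the question).
    ColNames = {"ITEM NO.", "PART NUMBER", "DESCRIPTION", "QTY"},

    // Hypothetical stand-in for the expanded PDF query: one row per extracted table,
    // with the nested table in the Data column.
    Expanded = #table(
        {"Source.Name", "Id", "Data"},
        {
            // Headers already in place.
            {"a.pdf", "Table001", #table(ColNames, {{1, "P-100", "Bracket", 2}})},
            // Headers still sitting in the first row, so they need promoting.
            {"b.pdf", "Table001", #table(
                {"Column1", "Column2", "Column3", "Column4"},
                {{"ITEM NO.", "PART NUMBER", "DESCRIPTION", "QTY"}, {1, "P-200", "Washer", 8}})},
            // Unrelated table that should be dropped.
            {"a.pdf", "Table002", #table({"REV", "DATE"}, {{"A", "2024-01-01"}})}
        }
    ),

    // Promote the first row to headers only when the table does not already match.
    Normalize = (t as table) as table =>
        if List.ContainsAll(Table.ColumnNames(t), ColNames) then t
        else if Table.IsEmpty(t) then t
        else Table.PromoteHeaders(t, [PromoteAllScalars = true]),

    Normalized = Table.TransformColumns(Expanded, {{"Data", Normalize}}),

    // Keep only rows whose nested table has every required column name.
    Kept = Table.SelectRows(
        Normalized,
        each List.ContainsAll(Table.ColumnNames([Data]), ColNames)
    ),

    // Expand the matching tables into regular columns.
    Result = Table.ExpandTableColumn(Kept, "Data", ColNames)
in
    Result
```

Note that the comparison is case-sensitive and exact, so stray spaces or different capitalisation in the PDF headers would make a table fail the check.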

NaveenGandhi_0-1721244004945.png

 


Let me know if you have any questions.

3 REPLIES
NaveenGandhi
Super User

Hi @TameshDoobay 

Can you share a sample file and screenshots so I can understand the problem better?

Regards,
NG

Unfortunately I can't share a file due to business restrictions. However, I've created a sample file and will share screenshots.

It looks like this when I import the data:

TameshDoobay_1-1721240527803.png

Then, I expand the Content Column, and it looks like this:

TameshDoobay_2-1721240600870.png

From here, I know that I want to only keep Table001. It's the only table that contains the headers ITEM NO., PART NUMBER, DESCRIPTION and QTY.

TameshDoobay_3-1721240738868.png

Basically, I want some kind of advanced filtering capability to find and "keep" these tables.

Keep in mind, I need to do this en masse, i.e. with multiple PDFs loaded in.
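One way the multi-PDF case could be sketched in M, assuming the PDFs sit in a single folder and the wanted tables already carry their headers. The folder path is hypothetical, and the header spellings are assumed from the description above.

```powerquery
let
    // Hypothetical folder path; point this at the folder holding the PDFs.
    FolderPath = "C:\PDFs",
    ColNames = {"ITEM NO.", "PART NUMBER", "DESCRIPTION", "QTY"},

    Files = Folder.Files(FolderPath),
    PdfFiles = Table.SelectRows(Files, each Text.Lower([Extension]) = ".pdf"),

    // Keep the file name and binary content; rename Name so it does not
    // clash with the Name column returned by Pdf.Tables.
    FileNames = Table.RenameColumns(
        Table.SelectColumns(PdfFiles, {"Name", "Content"}),
        {{"Name", "Source.Name"}}
    ),

    // Extract every table and page object from each PDF.
    WithTables = Table.AddColumn(FileNames, "Tables", each Pdf.Tables([Content])),
    Expanded = Table.ExpandTableColumn(
        Table.RemoveColumns(WithTables, {"Content"}),
        "Tables", {"Id", "Name", "Kind", "Data"}
    ),

    // Keep only detected tables whose column names include all required headers.
    Kept = Table.SelectRows(
        Expanded,
        each [Kind] = "Table"
            and List.ContainsAll(Table.ColumnNames([Data]), ColNames)
    ),

    // Stack the matching tables from all files into one result.
    Result = Table.ExpandTableColumn(Kept, "Data", ColNames)
in
    Result
```

Source.Name stays on every row, so each line of the combined result can still be traced back to its PDF.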

Thanks,

Tamesh
