Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Compete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.

Reply
SCC1990
New Member

Importing PDFs With Structured & Misc Data To Multiple Table (500+ files)

First time power query user so still figuring it out and guessing this is an easy question. Happy to do the research, just looking for which terms are should be going after to find the right tutorials or videos.

 

I am trying to import 500+ PDF reports that are 2 pages each. Long ago these were actually produced by someone else in excel, printed to a PDF, and then overwrote the excel file to make the next daily report. Not how I would have done it for obvious reasons but here we are. The PDFs are daily reports of production progress and have some misc fields like date, shift, weather, 3 fields for general type notes for various categories at the end, and then 2 tables on page 1 (that may or may not be fully filled out depending on the day) and 1 table on page 2.  Screen shots attached (can't share the actual files).

 

My end goal is actually to clean this all up and import into airtable and link a bunch of records together. The first table on page 1 is a list of our crew doing installing for the day, the second table is a list (partially pre-populated with defaults and some entries that vary) of the tools they used that day. This seems to show up in PQ in the navigator (as far as I've gotten) "OK", but it sees these two separate data sets as one table. I will need to separate these. The table on the second page also seems to show up in the navigator ok. The date and weather are picked up in their own table but the 3 misc notes fields are only picked up in "page", so there's a lot to delete to just leave those.

 

What I am hoping to figure out how to do across all these files is create 4 final tables:

First table will be one row per file and have fields for report date, shift (day/night), weather, and the 3 misc notes fields. The lead field will be report date + shift to create a unique key for that report (later linking in airtable)

Table1.jpg

Second table will be one row for every crew member on a report, so if a report has 3 members that report will add 3 lines to the table. Next report has 5 guys so it adds 5 more now we're at 8, etc. This one will have fields for the report date + shift (from the previous table single record so I can link them), the crew members name, and their role that day.

Table2.jpg

Third table is the same approach as table 2 but for the tools table. Lead field is the report date + shift (for linking) followed by the tool type. If there's 5 tools on a report, it adds 5 rows to the table.

Table3.jpg

Fourth table is from page 2 and is for the work items. See the screenshot, but it would have the lead field as report date + shift followed by the 7 fields per row from the table.Table4.jpgPage1.jpgPage2.jpg

 

 

 

Since this was setup and implemented before my time, the forms are basically setup as if they would be handwritten so the table doesn't dynamically get bigger or smaller depending on the entries for the day, it just might have blanks (or end up with a second report for the day if then ran over). 

 

Being totally new to power query, just looking for what terms/processes I need to familiarize myself with to make this happen (assuming its cleanly doable). I've messed around with one or two all-table PDFs and importing those is straight forward enough, maybe fixing the headers or adding another column. The part I'm a bit lost of what to search for is the separating, transforming, and distributing of the data from one file into 4 different destinations. 

 

Any point in the right direction, terms to search for, tutorial links for similar process, etc are much appreciated.

 

4 REPLIES 4
Omid_Motamedise
Super User
Super User

Can you provide the sample file?

If my answer helped solve your issue, please consider marking it as the accepted solution. It helps others in the community find answers faster—and keeps the community growing stronger!
You can also check out my YouTube channel for tutorials, tips, and real-world solutions in Power Query with the following link
https://youtube.com/@omidbi?si=96Bo-ZsSwOx0Z36h
johnt75
Super User
Super User

You'll need 4 separate queries, each opening and processing the same file, to produce the 4 different tables you want. In each query go through the processing and transformation steps necessary to get just what you want for that table and delete everything else.

When you've got all of the queries working on 1 file you will probably want to turn the queries into functions. This would allow you to create new queries where you open e.g. a sharepoint folder which holds all of the reports you want to process, and create columns by calling your functions. Again, you would I think need to do this 4 times, one for each table.

Multiple queries was the key (I think), thank you. Having never done it before, I was stuck on going all-in-one so to speak, but I've now got 4 separate queries going that appear to pull the data into 4 separate tabs/tables in excel exactly as I want (3 for 4).

 

The last one I am trying to transform I have like below, rows 1, 4, and 6 have what I want to be the column headers and rows 2-3, 5, and 7 have the corresponding data that would go in those columns. Still working on that transform and the only issue I see is it appears that when importing, because the text in the "QUALITY CONTROL ACTIONS IMPLEMENTED" section was more than one row of text in the PDF, it didn't just run long in one row but put the wrapped portion on line 2 into another row/cell (hence 2 and 3 instead of just 2). Will need to figure out how to dynamically deal with that with all 3 sections since the notes vary from day to day.

Notes.jpg

 

Unfortunately can't share the actual files since they are "technically" sensitive (functionally, not at all) and currently don't have the old template they were using to create these in the first place to make a dummy PDF. Appreciate the input

Anonymous
Not applicable

Hi @SCC1990 

What is your final output you want, and i have a question that did your title position is fixed? can you provide some more sample data and the output you want so that can provide more suggestion for you.

 

Best Regards!

Yolo Zhu

 

Helpful resources

Announcements
July 2025 community update carousel

Fabric Community Update - July 2025

Find out what's new and trending in the Fabric community.

July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.

Top Solution Authors