

mmace1
Impactful Individual

Folder of PDFs - importing and appending tables, but double quotes sometimes create an extra column

I have a folder full of PDFs. I'm importing and appending the tables from those PDFs, but double quotes (") are causing additional columns to be generated.

Background: 

From every file in the folder, I want to extract [Data Table 1] and [Data Table 2], shown in the screenshot below.

 

[Image: Actual PDF screenshot.png]

 

When Power Query interprets the file:

 

  • Data Table 1: Comes through fine - with headers
  • Junk: a table I'm ignoring
  • Data Table 2: Comes through fine (though as a separate table from Data Table 1) - without headers

 

Currently, I'm importing the folder of PDFs once, then referencing that query twice using the built-in functionality:

 

[Image: Built in functionality.png]

 

One reference picks up [Data Table 1], and the other picks up [Data Table 2].
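The two-reference setup can be modelled roughly as follows: each file yields its tables in the same order, and each reference takes the table at one position from every file and appends the results. This is a minimal Python sketch, not Power Query M; the positions (0 for [Data Table 1], 1 for the junk table, 2 for [Data Table 2]) and the helper name `collect_table` are assumptions for illustration. It also shows why every file must produce the same table layout for the append to line up.

```python
from typing import List

Row = List[str]
Table = List[Row]


def collect_table(files: List[List[Table]], position: int) -> Table:
    """Hypothetical helper: take the table at `position` from every file
    and append (concatenate) their rows into one table."""
    combined: Table = []
    for tables in files:
        # Assumes every file returns its tables in the same order:
        # [Data Table 1, Junk, Data Table 2].
        combined.extend(tables[position])
    return combined


# Two hypothetical files, each parsed into three tables.
file_a = [[["Item", "Qty"], ["A", "1"]], [["junk"]], [["X", "10"]]]
file_b = [[["Item", "Qty"], ["B", "2"]], [["junk"]], [["Y", "20"]]]

data_table_2 = collect_table([file_a, file_b], 2)
print(data_table_2)  # [['X', '10'], ['Y', '20']]
```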

 

The issue is that the first column of [Data Table 1] or [Data Table 2] sometimes contains a ", which Power Query then interprets as a column break. So far it has only occurred in [Data Table 2], but I'm sure a " will show up in [Data Table 1] eventually.

 

For example, here is what Power Query returns when no " is present:

[Image: Normal Table 2.JPG]

 

And here is one with a ": an additional column is generated.

 

[Image: Problem Table 2.JPG]

The additional column naturally breaks the Append step.


How can I work around this? I was thinking of:

  • Telling Power Query to ignore the " character before it interprets the table
  • Telling Power Query to swap the " character for something else before it interprets the table
  • Telling Power Query "If you detect 5 columns, drop the 2nd column"
  • Instead of using the built-in [Content] expansion (image above), doing this manually with Pdf.Tables in a way that achieves one of the above
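The third idea ("if you detect 5 columns, drop the 2nd column") amounts to normalizing each table's width before the append. Below is a minimal Python sketch of that rule, assuming a normal table has 4 columns and the stray column always lands in 2nd position; `drop_extra_column` and `append_normalized` are hypothetical names, and this is not Power Query code.

```python
from typing import List

Row = List[str]
Table = List[Row]


def drop_extra_column(table: Table, expected_cols: int, extra_index: int = 1) -> Table:
    """Hypothetical helper for the 'if you detect 5 columns, drop the 2nd
    column' rule. extra_index is 0-based, so 1 means the 2nd column."""
    if not table:
        return table
    width = len(table[0])
    if width == expected_cols:
        return table  # normal table: leave untouched
    if width != expected_cols + 1:
        # Anything other than exactly one extra column is unexpected; fail loudly.
        raise ValueError(f"expected {expected_cols} or {expected_cols + 1} columns, got {width}")
    return [row[:extra_index] + row[extra_index + 1:] for row in table]


def append_normalized(tables: List[Table], expected_cols: int) -> Table:
    """Normalize each table's width first, then append them."""
    combined: Table = []
    for table in tables:
        combined.extend(drop_extra_column(table, expected_cols))
    return combined


normal = [["A", "1", "2", "3"]]
problem = [["B", "split", "4", "5", "6"]]  # a " produced one extra column
print(append_normalized([normal, problem], expected_cols=4))
# [['A', '1', '2', '3'], ['B', '4', '5', '6']]
```

If the split-off column holds part of the first column's text rather than junk, merging it back (for example `row[0] + row[1]`) would keep that text instead of discarding it. In M, a comparable per-file check could likely be built from `Table.ColumnCount` and `Table.RemoveColumns` inside a custom function applied to each table before the append, though that is untested here.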

 

I'm not sure which approach is best for keeping that extra column from breaking the Append step (nor how to do any of the above!).

1 REPLY
Arash_Bhz
Frequent Visitor

Hi, 
In Power Query, load one of the correct PDFs (one without quotation marks) and Ctrl+click the headers of all the columns you want to keep. Then right-click and select Remove Other Columns. This will remove any extra columns added in the future that are not in your selected set of columns. I hope this helps.
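The Remove Other Columns step (which the editor writes as a `Table.SelectColumns` call) keeps columns by name. A minimal Python sketch of that idea, with hypothetical column names `Item` and `Qty` and a hypothetical helper `remove_other_columns`: each table keeps only the chosen names before the tables are appended. One assumption worth checking: this may only help if the extra column arrives under a new name while the wanted values stay under their original names; if the split shifts values one position to the right (for example in headerless columns named Column1, Column2, … by position), the kept columns could end up holding the wrong values.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def remove_other_columns(table: List[Record], keep: List[str]) -> List[Record]:
    """Keep only the named columns, in the given order.
    A kept column missing from a record comes back as None."""
    return [{name: record.get(name) for name in keep} for record in table]


keep = ["Item", "Qty"]
clean = [{"Item": "A", "Qty": "1"}]
with_extra = [{"Item": "B", "Qty": "2", "Column3": "stray text"}]

appended = remove_other_columns(clean, keep) + remove_other_columns(with_extra, keep)
print(appended)
# [{'Item': 'A', 'Qty': '1'}, {'Item': 'B', 'Qty': '2'}]
```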
