PaulDBrown
Community Champion

Extracting multiple PDF files from a folder

Good evening!

I need to extract data from multiple PDF files (I'm trying to use the Folder connector), and each file has a different number of pages. My limited knowledge of M code is all the more evident in this process: I'm stuck on the "sample file" step, which asks me to select a "page" to transform. Basically, I can't work out how to import all the pages from each file so that I can access all the relevant data.
I've attached a zip file with three sample PDFs (each with a different number of pages). I need access to all the pages in each file to be able to work on the transformations.
Any guidance will be a huge help!

Many thanks for your time.

Best,

Paul.





Did I answer your question? Mark my post as a solution!
In doing so, you are also helping me. Thank you!

Proud to be a Super User!
Paul on LinkedIn.

nidheshtiwari
Frequent Visitor

Hi Paul,

Did you resolve the above problem? There are 28 tables in your first file, so I just want to confirm whether all the PDF files have the same number of tables and the same structure. Also, do you want to extract all 28 tables or a specific one?

Thanks

tmhwk
New Member

Hi all

Just joining the conversation now. Has a solution been found for multiple PDFs with multiple pages? I have exactly the same situation and am looking for a solution.

Anonymous
Not applicable

You could always delete the sample query, and also delete the invocation of the sample query in your main query. That leaves you with just a list of the tables, which you can then select individually in other queries and transform separately.

--Nate
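A minimal sketch of Nate's suggestion. It assumes the folder path from Paul's post and that only `.pdf` files should be read (the extension filter is an added assumption). The idea is to skip the sample-based Transform File function and call `Pdf.Tables` directly on each file. That call yields one row per detected page or table, so every page of every file is available.

The `Id` and `Data` columns appear in the generated code above. The `Kind` column ("Page" or "Table") is what the PDF connector's navigator exposes, so check it against your own preview.

```m
let
    // Folder path taken from Paul's post; point this at your own folder
    Source = Folder.Files("D:\OneDrive - In2-Action.com\Biniarbolla\Informes MB\Estructura corta"),
    #"Filtered Hidden Files" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
    // Assumption: only PDF files in the folder should be read
    #"PDF Only" = Table.SelectRows(#"Filtered Hidden Files", each Text.Lower([Extension]) = ".pdf"),
    // Call Pdf.Tables per file instead of the sample-based "Transform File" function;
    // each result is a navigation table with one row per detected page or table
    #"Added PDF Tables" = Table.AddColumn(#"PDF Only", "PDF", each Pdf.Tables([Content], [Implementation="1.3"])),
    #"Kept Columns" = Table.SelectColumns(#"Added PDF Tables", {"Name", "PDF"}),
    #"Renamed Columns" = Table.RenameColumns(#"Kept Columns", {"Name", "Source.Name"}),
    // One row per page/table per file: Id (e.g. "Page001"), Kind ("Page" or "Table"), Data (the content)
    #"Expanded PDF" = Table.ExpandTableColumn(#"Renamed Columns", "PDF", {"Id", "Kind", "Data"})
in
    #"Expanded PDF"
```

From here you can filter on `Kind` or `Id` and transform the `Data` tables separately, as described above. Because nothing depends on the sample file, files with different page counts are all listed in full.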

@Anonymous Thanks for the suggestion, but I'm not sure what you mean. The source data is 34 PDF files with at least half a dozen pages each, where rows of text I don't need are mixed in with data I need to transform.

This is the interface I get when I select the Folder connector. I need to select a "page" to access the sample file:

[Screenshot: Load from folder]

which produces the following code in the Transform File function:

= (Parameter1 as binary) => let
    Source = Pdf.Tables(Parameter1, [Implementation="1.3"]),
    Page001 = Source{[Id="Page001"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Page001,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type text}})
in
    #"Changed Type"

where the code is set to "Page001" (the first page). The final query that loads the data begins with this code:

Source = Folder.Files("D:\OneDrive - In2-Action.com\Biniarbolla\Informes MB\Estructura corta"),
    #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type text}}),

This only loads the first page for each file. How can I change it to load every page from each file?

Many thanks!

Hi, Paul! Try this .pbix file as an example.
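The .pbix itself isn't shown here. One common approach, sketched below, is not necessarily what that file does. It changes the Transform File function so that it keeps every page instead of only `Page001`, then stacks the pages into one table.

It assumes that rows with `Kind = "Table"` (tables the connector detected) can be dropped because they overlap with page content. The `Page` and `Tagged` column names are hypothetical and were chosen only for this sketch.

```m
(Parameter1 as binary) => let
    Source = Pdf.Tables(Parameter1, [Implementation="1.3"]),
    // Keep every page rather than only Page001; "Table" rows are detected tables,
    // assumed here to overlap with the page content
    Pages = Table.SelectRows(Source, each [Kind] = "Page"),
    // Hypothetical "Page" column: tag each page's rows with its Id (e.g. "Page001")
    Tagged = Table.AddColumn(Pages, "Tagged", (page) => Table.AddColumn(page[Data], "Page", each page[Id])),
    // Stack all pages; Table.Combine fills columns missing from a page with null
    Combined = Table.Combine(Tagged[Tagged])
in
    Combined
```

The main query's expand step still takes its column list from `Table.ColumnNames(#"Transform File"(#"Sample File"))`. If other files produce pages with more columns than the sample file, those extra columns are not expanded, so list the needed columns explicitly there. Type changes can be re-added after `Combined` once the final column set is known.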
