I'm trying to get data from PDF files. The first page of each file is laid out slightly differently from the other pages. First I filtered to the first page and did the transformations in the sample transformation query. Then, from the page-filter step, I used "Add as New Query", selected the other pages, and did their transformations.
Finally, I appended both to the initial sample query to form my data.
But when I run this query against a folder with multiple PDFs, the data for pages other than the first is repeated for the other PDFs, i.e. the first PDF's data is repeated.
Can you please guide me on what went wrong, or how to solve this?
Let me know if this helps.
@Sahir_Maharaj why did you post one message in 10 replies?
@technolog I apologize if my previous responses caused any confusion. However, I wanted to clarify that my action to post one message in multiple replies was intentional and made with the aim of making it easier for the recipient to read and respond.
By breaking it up into smaller pieces, I aimed to provide a better user experience and make it more manageable for the recipient to process the information.
It seems to me that information is much easier to absorb when it is presented in one message😊
I understand that this may not be the preferred method for everyone.
Thank you for your input as I am open to learning new ways of presenting my suggestions.
I will consider your feedback for future interactions 😊
Here are the general steps:
Here is some example M code to get you started:
let
    // Step 1: Get the list of PDF files
    Source = Folder.Files("C:\Path\To\PDF\Folder"),
    PDFFiles = Table.SelectRows(Source, each Text.Lower([Extension]) = ".pdf"),
    // Step 2: Put each file's binary in a "Contents" column
    // (Binary.Combine over a one-item list returns that file's content unchanged)
    CombinePDFs = Table.AddColumn(PDFFiles, "Contents", each Binary.Combine({[Content]})),
    // Step 3: Extract tables and pages from each PDF file
    // (the second argument of Pdf.Tables is an optional options record, not the file name)
    ExtractTables = Table.AddColumn(CombinePDFs, "Tables", each Pdf.Tables([Contents])),
    // Pdf.Tables returns the columns Id, Name, Kind and Data; Name is skipped here
    // because the folder listing already has a Name column
    ExpandedTables = Table.ExpandTableColumn(ExtractTables, "Tables", {"Id", "Kind", "Data"}, {"Id", "Kind", "Data"}),
    // Step 4: Clean and reshape data as needed
    // ...
    // Combine all tables into a single table
    CombinedTables = Table.Combine(ExpandedTables[Data])
in
    CombinedTables
The data is not structured as tables in the PDF. In my case the first page's structure is a little different from the other pages. I did transformations to make both the same, but when appending to the 'transform sample' query, the other pages repeat the first PDF's data.
What might I have done wrong?
This code assumes that all of the PDF files in the folder have tables on their pages.
If some of the files do not have tables or have different structures, you may need to add additional logic to handle those cases.
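As a rough sketch of what that additional logic could look like when the first page is laid out differently from the rest: the query below keeps everything inside one function that takes the file's binary as its parameter, so no step can fall back to a sample file. The names TransformPdf, TransformFirst and TransformOther are hypothetical, the two transform functions are placeholders that return their input unchanged, and the sketch assumes every PDF has at least one row with Kind = "Page" and that sorting by Id puts the first page first.

```powerquery
let
    // Hypothetical per-file function: every step derives from the "pdf" parameter,
    // never from a sample-file query, so each file yields its own rows
    TransformPdf = (pdf as binary) as table =>
        let
            Items = Pdf.Tables(pdf),
            // Keep only whole-page entries; assumes sorting by Id gives page order
            Pages = Table.Sort(Table.SelectRows(Items, each [Kind] = "Page"), {{"Id", Order.Ascending}}),
            // Placeholder transformations: replace with the steps each layout needs
            TransformFirst = (t as table) as table => t,
            TransformOther = (t as table) as table => t,
            // First page gets its own treatment; assumes at least one page exists
            FirstPage = TransformFirst(Pages{0}[Data]),
            // Every remaining page goes through the second transformation
            OtherPages = List.Transform(Table.Skip(Pages, 1)[Data], TransformOther),
            Result = Table.Combine({FirstPage} & OtherPages)
        in
            Result,

    Source = Folder.Files("C:\Path\To\PDF\Folder"),
    PDFFiles = Table.SelectRows(Source, each Text.Lower([Extension]) = ".pdf"),
    // Invoke the function once per file, passing that file's own binary
    PerFile = Table.AddColumn(PDFFiles, "Data", each TransformPdf([Content])),
    Combined = Table.Combine(PerFile[Data])
in
    Combined
```

The point of this layout is that both page transformations live in the same function and read from the same parameter. A query created separately from the sample query may keep pointing at the sample file, which would be consistent with the first PDF's data showing up for every file.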
4. Use any necessary transformations to clean and reshape the data.
3. Use the "Pdf.Tables" function to extract the tables from the combined binary column.
2. Use the "Binary.Combine" function to combine the contents of all the PDF files into a single binary column.
To solve this, you can try creating a fully dynamic query that can handle the varying structure of the PDF files. One way to do this is by using the "Combine binaries" (Binary.Combine) function to combine the PDF files into a single binary column.
Hello @NJamigos,
It sounds like the issue may be that your queries are not fully dynamic and are therefore not handling the varying structure of the PDF files in the folder. When you append the queries from the different PDF files, the queries are likely still referring to the first file's structure, causing the data to be repeated.