Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
NJamigos
Frequent Visitor

Transform multiple sample page / query

I'm trying get data from pdf files. Its first page data is slightly different from other pages. First I filtered 1st page and did transformation in sample transformation query. After that from there with add to new query, from filter page step I selected other pages and did the transformations..

Finally I Append both to initial sample query to form my data.. 

 

But when I run this query with a folder with multiple pdf...other than first page, data is getting repeated for other pdfs.. Ie. First pdf data getting repeated..

 

Can you please guide what was wrong.. Or how to solve this

 

@amitchandak @olgad 

16 REPLIES 16
Sahir_Maharaj
Super User
Super User

Let me know if this helps.


Did I answer your question? Mark my post as a solution, this will help others!

If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂

Kind Regards,
Sahir Maharaj
Data Scientist | Data Engineer | Data Analyst | AI Engineer
P.S. Want me to build your Power BI solution?
➤ Lets connect on LinkedIn: Join my network of 13K+ professionals
➤ Join my free newsletter: Data Driven: From 0 to 100
➤ Website: https://sahirmaharaj.com
➤ Email: sahir@sahirmaharaj.com
➤ Want me to build your Power BI solution? Lets chat about how I can assist!
➤ Join my Medium community of 30K readers! Sharing my knowledge about data science and artificial intelligence
➤ Explore my latest project (350K+ views): Wordlit.net
➤ 100+ FREE Power BI Themes: Download Now
LinkedIn Top Voice in Artificial Intelligence, Data Science and Machine Learning

@technolog I apologize  if my previous responses caused any confusion. However, I wanted to clarify that my action to post one message in multiple replies was intentional and made with the aim of making it easier for the recipient to read and respond.

 

By breaking it up into smaller pieces, I aimed to provide a better user experience and make it more manageable for the recipient to process the information.


Did I answer your question? Mark my post as a solution, this will help others!

If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂

Kind Regards,
Sahir Maharaj
Data Scientist | Data Engineer | Data Analyst | AI Engineer
P.S. Want me to build your Power BI solution?
➤ Lets connect on LinkedIn: Join my network of 13K+ professionals
➤ Join my free newsletter: Data Driven: From 0 to 100
➤ Website: https://sahirmaharaj.com
➤ Email: sahir@sahirmaharaj.com
➤ Want me to build your Power BI solution? Lets chat about how I can assist!
➤ Join my Medium community of 30K readers! Sharing my knowledge about data science and artificial intelligence
➤ Explore my latest project (350K+ views): Wordlit.net
➤ 100+ FREE Power BI Themes: Download Now
LinkedIn Top Voice in Artificial Intelligence, Data Science and Machine Learning

It seems to me that information is much easier to absorb when it is presented in one message😊

I understand that this may not be the preferred method for everyone. 

 

Thank you for your input as I am open to learning new ways of presenting my suggestions.‌

 

I will consider your feedback for future interactions 😊


Did I answer your question? Mark my post as a solution, this will help others!

If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂

Kind Regards,
Sahir Maharaj
Data Scientist | Data Engineer | Data Analyst | AI Engineer
P.S. Want me to build your Power BI solution?
➤ Lets connect on LinkedIn: Join my network of 13K+ professionals
➤ Join my free newsletter: Data Driven: From 0 to 100
➤ Website: https://sahirmaharaj.com
➤ Email: sahir@sahirmaharaj.com
➤ Want me to build your Power BI solution? Lets chat about how I can assist!
➤ Join my Medium community of 30K readers! Sharing my knowledge about data science and artificial intelligence
➤ Explore my latest project (350K+ views): Wordlit.net
➤ 100+ FREE Power BI Themes: Download Now
LinkedIn Top Voice in Artificial Intelligence, Data Science and Machine Learning
Sahir_Maharaj
Super User
Super User

Here are the general steps:


Did I answer your question? Mark my post as a solution, this will help others!

If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂

Kind Regards,
Sahir Maharaj
Data Scientist | Data Engineer | Data Analyst | AI Engineer
P.S. Want me to build your Power BI solution?
➤ Lets connect on LinkedIn: Join my network of 13K+ professionals
➤ Join my free newsletter: Data Driven: From 0 to 100
➤ Website: https://sahirmaharaj.com
➤ Email: sahir@sahirmaharaj.com
➤ Want me to build your Power BI solution? Lets chat about how I can assist!
➤ Join my Medium community of 30K readers! Sharing my knowledge about data science and artificial intelligence
➤ Explore my latest project (350K+ views): Wordlit.net
➤ 100+ FREE Power BI Themes: Download Now
LinkedIn Top Voice in Artificial Intelligence, Data Science and Machine Learning

Here is an example M code to get you started:

 


Did I answer your question? Mark my post as a solution, this will help others!

If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂

Kind Regards,
Sahir Maharaj
Data Scientist | Data Engineer | Data Analyst | AI Engineer
P.S. Want me to build your Power BI solution?
➤ Lets connect on LinkedIn: Join my network of 13K+ professionals
➤ Join my free newsletter: Data Driven: From 0 to 100
➤ Website: https://sahirmaharaj.com
➤ Email: sahir@sahirmaharaj.com
➤ Want me to build your Power BI solution? Lets chat about how I can assist!
➤ Join my Medium community of 30K readers! Sharing my knowledge about data science and artificial intelligence
➤ Explore my latest project (350K+ views): Wordlit.net
➤ 100+ FREE Power BI Themes: Download Now
LinkedIn Top Voice in Artificial Intelligence, Data Science and Machine Learning

let
    // Step 1: Get list of PDF files
    Source = Folder.Files("C:\Path\To\PDF\Folder"),
    PDFFiles = Table.SelectRows(Source, each [Extension] = ".pdf"),

    // Step 2: Combine PDF files into single binary column
    CombinePDFs = Table.AddColumn(PDFFiles, "Contents", each Binary.Combine({[Content]})),

    // Step 3: Extract tables from PDF files
    ExtractTables = Table.AddColumn(CombinePDFs, "Tables", each Pdf.Tables([Contents], [Name])),
    ExpandedTables = Table.ExpandTableColumn(ExtractTables, "Tables", {"Data", "Columns"}, {"Data", "Columns"}),

    // Step 4: Clean and reshape data as needed
    // ...

    // Combine all tables into a single table
    CombinedTables = Table.Combine(ExpandedTables[Data])
in
    CombinedTables

Did I answer your question? Mark my post as a solution, this will help others!

If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂

Kind Regards,
Sahir Maharaj
Data Scientist | Data Engineer | Data Analyst | AI Engineer
P.S. Want me to build your Power BI solution?
➤ Lets connect on LinkedIn: Join my network of 13K+ professionals
➤ Join my free newsletter: Data Driven: From 0 to 100
➤ Website: https://sahirmaharaj.com
➤ Email: sahir@sahirmaharaj.com
➤ Want me to build your Power BI solution? Lets chat about how I can assist!
➤ Join my Medium community of 30K readers! Sharing my knowledge about data science and artificial intelligence
➤ Explore my latest project (350K+ views): Wordlit.net
➤ 100+ FREE Power BI Themes: Download Now
LinkedIn Top Voice in Artificial Intelligence, Data Science and Machine Learning

Data is not structured as tables in pdf. In my case first page structure is little different from other pages. I did transformation to make both same but when appending to 'transform sample '.. Other pages is repeating first pdf data.

 

What might I have done wrong? 

This code assumes that all of the PDF files in the folder have tables on their pages.

 

If some of the files do not have tables or have different structures, you may need to add additional logic to handle those cases.


Did I answer your question? Mark my post as a solution, this will help others!

If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂

Kind Regards,
Sahir Maharaj
Data Scientist | Data Engineer | Data Analyst | AI Engineer
P.S. Want me to build your Power BI solution?
➤ Lets connect on LinkedIn: Join my network of 13K+ professionals
➤ Join my free newsletter: Data Driven: From 0 to 100
➤ Website: https://sahirmaharaj.com
➤ Email: sahir@sahirmaharaj.com
➤ Want me to build your Power BI solution? Lets chat about how I can assist!
➤ Join my Medium community of 30K readers! Sharing my knowledge about data science and artificial intelligence
➤ Explore my latest project (350K+ views): Wordlit.net
➤ 100+ FREE Power BI Themes: Download Now
LinkedIn Top Voice in Artificial Intelligence, Data Science and Machine Learning

4. Use any necessary transformations to clean and reshape the data.


Did I answer your question? Mark my post as a solution, this will help others!

If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂

Kind Regards,
Sahir Maharaj
Data Scientist | Data Engineer | Data Analyst | AI Engineer
P.S. Want me to build your Power BI solution?
➤ Lets connect on LinkedIn: Join my network of 13K+ professionals
➤ Join my free newsletter: Data Driven: From 0 to 100
➤ Website: https://sahirmaharaj.com
➤ Email: sahir@sahirmaharaj.com
➤ Want me to build your Power BI solution? Lets chat about how I can assist!
➤ Join my Medium community of 30K readers! Sharing my knowledge about data science and artificial intelligence
➤ Explore my latest project (350K+ views): Wordlit.net
➤ 100+ FREE Power BI Themes: Download Now
LinkedIn Top Voice in Artificial Intelligence, Data Science and Machine Learning

3. Use the "Pdf.Tables" function to extract the tables from the combined binary column.


Did I answer your question? Mark my post as a solution, this will help others!

If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂

Kind Regards,
Sahir Maharaj
Data Scientist | Data Engineer | Data Analyst | AI Engineer
P.S. Want me to build your Power BI solution?
➤ Lets connect on LinkedIn: Join my network of 13K+ professionals
➤ Join my free newsletter: Data Driven: From 0 to 100
➤ Website: https://sahirmaharaj.com
➤ Email: sahir@sahirmaharaj.com
➤ Want me to build your Power BI solution? Lets chat about how I can assist!
➤ Join my Medium community of 30K readers! Sharing my knowledge about data science and artificial intelligence
➤ Explore my latest project (350K+ views): Wordlit.net
➤ 100+ FREE Power BI Themes: Download Now
LinkedIn Top Voice in Artificial Intelligence, Data Science and Machine Learning

2. Use the "Binary.Combine" function to combine the contents of all the PDF files into a single binary column.


Did I answer your question? Mark my post as a solution, this will help others!

If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂

Kind Regards,
Sahir Maharaj
Data Scientist | Data Engineer | Data Analyst | AI Engineer
P.S. Want me to build your Power BI solution?
➤ Lets connect on LinkedIn: Join my network of 13K+ professionals
➤ Join my free newsletter: Data Driven: From 0 to 100
➤ Website: https://sahirmaharaj.com
➤ Email: sahir@sahirmaharaj.com
➤ Want me to build your Power BI solution? Lets chat about how I can assist!
➤ Join my Medium community of 30K readers! Sharing my knowledge about data science and artificial intelligence
➤ Explore my latest project (350K+ views): Wordlit.net
➤ 100+ FREE Power BI Themes: Download Now
LinkedIn Top Voice in Artificial Intelligence, Data Science and Machine Learning

  1. Use the "Folder.Files" function to get a list of all the PDF files in the folder.

Did I answer your question? Mark my post as a solution, this will help others!

If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂

Kind Regards,
Sahir Maharaj
Data Scientist | Data Engineer | Data Analyst | AI Engineer
P.S. Want me to build your Power BI solution?
➤ Lets connect on LinkedIn: Join my network of 13K+ professionals
➤ Join my free newsletter: Data Driven: From 0 to 100
➤ Website: https://sahirmaharaj.com
➤ Email: sahir@sahirmaharaj.com
➤ Want me to build your Power BI solution? Lets chat about how I can assist!
➤ Join my Medium community of 30K readers! Sharing my knowledge about data science and artificial intelligence
➤ Explore my latest project (350K+ views): Wordlit.net
➤ 100+ FREE Power BI Themes: Download Now
LinkedIn Top Voice in Artificial Intelligence, Data Science and Machine Learning
Sahir_Maharaj
Super User
Super User

To solve this, you can try creating a fully dynamic query that can handle the varying structure of the PDF files. One way to do this is by using the "Combine binaries (binary.Combine)" function to combine the PDF files into a single binary column.

 


Did I answer your question? Mark my post as a solution, this will help others!

If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂

Kind Regards,
Sahir Maharaj
Data Scientist | Data Engineer | Data Analyst | AI Engineer
P.S. Want me to build your Power BI solution?
➤ Lets connect on LinkedIn: Join my network of 13K+ professionals
➤ Join my free newsletter: Data Driven: From 0 to 100
➤ Website: https://sahirmaharaj.com
➤ Email: sahir@sahirmaharaj.com
➤ Want me to build your Power BI solution? Lets chat about how I can assist!
➤ Join my Medium community of 30K readers! Sharing my knowledge about data science and artificial intelligence
➤ Explore my latest project (350K+ views): Wordlit.net
➤ 100+ FREE Power BI Themes: Download Now
LinkedIn Top Voice in Artificial Intelligence, Data Science and Machine Learning
Sahir_Maharaj
Super User
Super User

Hello @NJamigos,

 

It sounds like the issue may be that your queries are not fully dynamic and are therefore not handling the varying structure of the PDF files in the folder. When you append the queries from the different PDF files, the queries are likely still referring to the first file's structure, causing the data to be repeated.


Did I answer your question? Mark my post as a solution, this will help others!

If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂

Kind Regards,
Sahir Maharaj
Data Scientist | Data Engineer | Data Analyst | AI Engineer
P.S. Want me to build your Power BI solution?
➤ Lets connect on LinkedIn: Join my network of 13K+ professionals
➤ Join my free newsletter: Data Driven: From 0 to 100
➤ Website: https://sahirmaharaj.com
➤ Email: sahir@sahirmaharaj.com
➤ Want me to build your Power BI solution? Lets chat about how I can assist!
➤ Join my Medium community of 30K readers! Sharing my knowledge about data science and artificial intelligence
➤ Explore my latest project (350K+ views): Wordlit.net
➤ 100+ FREE Power BI Themes: Download Now
LinkedIn Top Voice in Artificial Intelligence, Data Science and Machine Learning

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.