Transform multiple sample page / query

NJamigos · ‎02-12-2023

I'm trying to get data from PDF files. In each file, the first page's data is slightly different from the other pages. First, I filtered to the first page and did its transformations in the Transform Sample File query. Then, from the filter-page step, I used "Add as New Query", selected the other pages, and did their transformations there.

Finally, I appended both in the initial sample query to form my data.

But when I run this query on a folder with multiple PDFs, the data from pages other than the first is repeated for the other PDFs, i.e. the first PDF's data gets repeated.

Can you please tell me what went wrong, or how to solve this?

@amitchandak @olgad

Sahir_Maharaj · ‎02-12-2023

Let me know if this helps.

Did I answer your question? Mark my post as a solution, this will help others!

If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂

Kind Regards,
Sahir Maharaj
Data Scientist | Data Engineer | Data Analyst | AI Engineer
P.S. Want me to build your Power BI solution? (Yes, its FREE!)

➤ Lets connect on LinkedIn: Join my network of 15K+ professionals
➤ Join my free newsletter: Data Driven: From 0 to 100
➤ Website: https://sahirmaharaj.com
➤ About: https://sahirmaharaj.com/about.html
➤ Email: sahir@sahirmaharaj.com
➤ Want me to build your Power BI solution? Lets chat about how I can assist!
➤ Join my Medium community of 30K readers! Sharing my knowledge about data science and artificial intelligence
➤ Explore my latest project (350K+ views): Wordlit.net
➤ 100+ FREE Power BI Themes: Download Now

LinkedIn Top Voice in Artificial Intelligence, Data Science and Machine Learning

technolog · ‎02-13-2023

@Sahir_Maharaj why did you post one message as 10 replies?

____________

⭐️ Fabric Group Channel

⭐️ Microsoft Fabric Community

Please join the Power BI UX/UI User Group if you need help with dashboard design and usability

Join to Data Governance User Group
Join to DENEB and Power BI Enthusiasts User Group
Join to Data Fabric Best Practices User Group
Subscribe to my medium blog

Sahir_Maharaj · ‎02-13-2023

@technolog I apologize if my previous responses caused any confusion. However, I want to clarify that posting one message as multiple replies was intentional, with the aim of making it easier for the recipient to read and respond.

By breaking it up into smaller pieces, I aimed to provide a better user experience and make it more manageable for the recipient to process the information.

technolog · ‎02-13-2023

It seems to me that information is much easier to absorb when it is presented in one message😊

Sahir_Maharaj · ‎02-13-2023

I understand that this may not be the preferred method for everyone.

Thank you for your input; I am open to learning new ways of presenting my suggestions.

I will consider your feedback for future interactions 😊

Sahir_Maharaj · ‎02-12-2023

Here are the general steps:

Sahir_Maharaj · ‎02-12-2023

Here is an example M code to get you started:

Sahir_Maharaj · ‎02-12-2023

let
    // Step 1: Get list of PDF files (replace the placeholder path with your folder)
    Source = Folder.Files("C:\Path\To\PDF\Folder"),
    PDFFiles = Table.SelectRows(Source, each Text.Lower([Extension]) = ".pdf"),

    // Step 2: Add each file's binary as a "Contents" column
    // (Binary.Combine on a one-item list returns that file's binary unchanged)
    CombinePDFs = Table.AddColumn(PDFFiles, "Contents", each Binary.Combine({[Content]})),

    // Step 3: Extract tables from each PDF file
    // (Pdf.Tables takes the binary plus an optional options record, so the file name is not passed)
    ExtractTables = Table.AddColumn(CombinePDFs, "Tables", each Pdf.Tables([Contents])),
    // Pdf.Tables returns the columns Id, Name, Kind and Data; only Kind and Data are expanded here
    ExpandedTables = Table.ExpandTableColumn(ExtractTables, "Tables", {"Kind", "Data"}, {"Kind", "Data"}),

    // Step 4: Clean and reshape data as needed
    // ...

    // Combine all tables into a single table
    CombinedTables = Table.Combine(ExpandedTables[Data])
in
    CombinedTables

NJamigos · ‎02-12-2023

The data in the PDFs is not structured as tables. In my case, the first page's structure is a little different from the other pages. I transformed both so they match, but when I append them in 'Transform Sample', the other pages repeat the first PDF's data.

What might I have done wrong?

Sahir_Maharaj · ‎02-12-2023

This code assumes that all of the PDF files in the folder have tables on their pages.

If some of the files do not have tables or have different structures, you may need to add additional logic to handle those cases.
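
As a rough sketch of that additional logic, one option might be to fall back to whole pages when no tables are detected in a file. This assumes Pdf.Tables returns a Kind column with the values "Table" and "Page"; the helper name ExtractData is hypothetical and only for illustration.

```powerquery
let
    // Hypothetical helper: prefer detected tables, fall back to whole pages.
    ExtractData = (pdf as binary) as list =>
        let
            Source = Pdf.Tables(pdf),
            // Assumption: Kind is "Table" for detected tables and "Page" for full pages.
            Tables = Table.SelectRows(Source, each [Kind] = "Table"),
            Pages = Table.SelectRows(Source, each [Kind] = "Page"),
            // Use the page-level data only when no table was detected in this file.
            Chosen = if Table.IsEmpty(Tables) then Pages else Tables
        in
            // Return the list of nested tables for this one file.
            Chosen[Data]
in
    ExtractData
```

The function could then be called per row, for example with Table.AddColumn(PDFFiles, "Data", each ExtractData([Content])), so each file is read from its own binary.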

Sahir_Maharaj · ‎02-12-2023

4. Use any necessary transformations to clean and reshape the data.

Sahir_Maharaj · ‎02-12-2023

3. Use the "Pdf.Tables" function to extract the tables from the combined binary column.

Sahir_Maharaj · ‎02-12-2023

2. Use the "Binary.Combine" function to combine the contents of all the PDF files into a single binary column.

Sahir_Maharaj · ‎02-12-2023

1. Use the "Folder.Files" function to get a list of all the PDF files in the folder.

Sahir_Maharaj · ‎02-12-2023

To solve this, you can try creating a fully dynamic query that can handle the varying structure of the PDF files. One way to do this is by using the "Binary.Combine" function to combine the PDF files into a single binary column.

Sahir_Maharaj · ‎02-12-2023

Hello @NJamigos,

It sounds like the issue may be that your queries are not fully dynamic and are therefore not handling the varying structure of the PDF files in the folder. When you append the queries from the different PDF files, the queries are likely still referring to the first file's structure, causing the data to be repeated.
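
One way to keep the queries from pointing back at the first file might be to handle both page layouts inside a single function that receives each file's binary. The sketch below is only an illustration under some assumptions: TransformPdf is a hypothetical name, the folder path is a placeholder, Pdf.Tables is assumed to return page rows with Kind = "Page" in page order, and each file is assumed to have at least two pages. The real first-page and other-page cleanup steps would replace the placeholder comments.

```powerquery
let
    // Hypothetical helper: reads ONE PDF binary and returns its rows.
    // Both page layouts are handled inside the function, so no step
    // refers back to the sample file.
    TransformPdf = (pdf as binary) as table =>
        let
            Source = Pdf.Tables(pdf),
            Pages = Table.SelectRows(Source, each [Kind] = "Page"),
            // Assumption: the first row returned is page 1.
            FirstPage = Table.FirstN(Pages, 1),
            OtherPages = Table.Skip(Pages, 1),
            // Placeholder: apply the first-page cleanup steps here.
            FirstPageData = Table.Combine(FirstPage[Data]),
            // Placeholder: apply the other-page cleanup steps here so that
            // both branches end with the same column names.
            OtherPagesData = Table.Combine(OtherPages[Data]),
            Result = Table.Combine({FirstPageData, OtherPagesData})
        in
            Result,

    // Placeholder path: replace with the real folder.
    Files = Folder.Files("C:\Path\To\PDF\Folder"),
    PdfFiles = Table.SelectRows(Files, each Text.Lower([Extension]) = ".pdf"),
    // Invoke the function once per file, passing that file's own binary.
    WithData = Table.AddColumn(PdfFiles, "Data", each TransformPdf([Content])),
    Combined = Table.Combine(WithData[Data])
in
    Combined
```

Because the append happens inside the function, every file goes through both branches with its own content, rather than the second branch being evaluated once against the sample file.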
