Check your eligibility for this 50% exam voucher offer and join us for free live learning sessions to get prepared for Exam DP-700.
Get StartedDon't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.
I have a PDF file that I am trying to convert into Excel. It's got multiple data records, with each record having this format.
How can I convert each of these 'clusters' into a single row in Excel? Right now I have been:
1. Converting the pdf into Excel using the utility (Get Data - From File - PDF, Selecting all the tables and 'transforming' them)
2. Transposing the data in each table and creating a header row.
3. Loading into Excel using append so it's all in 1 worksheet.
4. Copying the range into a table.
5. MANUALLY moving the second row of data to be on the same row as the first.
6. Doing LOTS of cleanup.
My original file is 1,400 records so it's really not sustainable for me to do manually. A smaller sample file of 20 records took me 30 minutes to get to a usable format.
Thank you!
Solved! Go to Solution.
Hi @evonneflood1 ,
According to your description, converting pdf files to excel format the method you mentioned is indeed a viable path. But because the data in your pdf file is not displayed in a uniform format, you'd better clean the data after importing it into excel. You can use m code to go through this process. Of course you can also go through the power query in some of the buttons to achieve. In short, excel provides a conversion button, but still need you to complete the process of data cleaning.
You can refer to the following documents to get more steps.
How to Convert PDF to Excel using Excel Power Query - Data Cycle Analytics
How to Import PDF Files into Excel with Power Query - Excel Campus
Best regards,
Albert He
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly
Hi @evonneflood1 ,
According to your description, converting pdf files to excel format the method you mentioned is indeed a viable path. But because the data in your pdf file is not displayed in a uniform format, you'd better clean the data after importing it into excel. You can use m code to go through this process. Of course you can also go through the power query in some of the buttons to achieve. In short, excel provides a conversion button, but still need you to complete the process of data cleaning.
You can refer to the following documents to get more steps.
How to Convert PDF to Excel using Excel Power Query - Data Cycle Analytics
How to Import PDF Files into Excel with Power Query - Excel Campus
Best regards,
Albert He
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly
With pdfs, the key is to make sure that you have the table column sorted descending by the number of columns. This is the only way that you will keep all columns, and their order, when you combine the tables.
When you have your tables in a column, add a column using Table.ColumnCount([TableColumn]), and then sort that column descending. This will solve a lot of you issues with transforming pdfs.
--Nate
hi @evonneflood1 . could you please send me three pages of this file in PDF? I want to test some solutions, and if one of them works correctly, I’ll explain it here.
How can I get the sample file to you? There isn't the option here to attach it.
Thank you!