Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
joshpgeorge
Regular Visitor

NEED HELP PLEASE- PDF Report with varying number of pages and tables

Trying to use power query to import tables from an invoice in PDF format to an excel format. The data on the first page (PDF) has a consistent tabular format. The data on the following pages (PDF) is in tabular format, but doesn't line up with the first. This is not an issue. The problem is that the report can be 2-50 pages and each page imports as its own table, even though the format is consistent from page 2 on. Does anyone know how to write a power query that I can use on this report regardless of the number of pages/tables I am trying to import?

 

Newbie.

1 ACCEPTED SOLUTION

Hi @joshpgeorge 

 

For the 3rd query, you can first filter on "Page" in Kind column, then keep the bottom 1 row. 

let
    Source = Pdf.Tables(File.Contents("C:\Users\Admin\Desktop\Data.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page")),
    #"Kept Last Rows" = Table.LastN(#"Filtered Rows", 1)
in
    #"Kept Last Rows"

 

For the 2nd query, you can first filter on "Page" in Kind column, then remove the first row and last row.

let
    Source = Pdf.Tables(File.Contents("C:\Users\Admin\Desktop\Data.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page") and ([Id] <> "Page001")),
    #"Removed Bottom Rows" = Table.RemoveLastN(#"Filtered Rows",1)
in
    #"Removed Bottom Rows"

 

Under Home tab, Keep Rows and Remove Rows can also be used to filter rows.

080602.jpg

 

Best Regards,
Community Support Team _ Jing

View solution in original post

11 REPLIES 11
Anonymous
Not applicable

Table.RemoveLastN(Table, 1)

 

or to Select last row,

 

Table.LastN(Table, 1)

 

--Nate

v-jingzhang
Community Support
Community Support

Hi @joshpgeorge 

 

When you connect to the PDF file, do not select any table or page in the Navigator window. Instead, right click on the file name and select Transform Data. This will bring in a table with all tables and pages in this file, which is consistent with the result of Source step when you select a table or a page.

080303.jpg

 

Then filter out "Table" in Kind column and filter out the first page. You can connect to the first page in another query if you want to have it. 

 

Remove other columns except for the Data column and expand Data column. Promote the first row as headers.

 

In this way, you can combine all pages from page 2 to the last page into a table no matter how many pages you have in the file. All these operations can be done by using the user interface.

let
    Source = Pdf.Tables(File.Contents("C:\Users\Admin\Desktop\Data.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page") and ([Id] <> "Page001")),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Id", "Name", "Kind"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns", "Data", {"Column1", "Column2", "Column3", "Column4", "Column5"}, {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4", "Data.Column5"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Expanded Data", [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Item #", type text}, {"Month", type date}, {"Total#(lf)Purchase", Int64.Type}, {"Ending#(lf)Inventory", Int64.Type}, {"Turnover", Int64.Type}})
in
    #"Changed Type"

 

Let me know if you have any questions.

 

Regards,
Community Support Team _ Jing
If this post helps, please Accept it as the solution to help other members find it.

This is almost perfect except when the last page contains data in a different table format. How do I select the "last" page regardless of page number and alternately how do I exclude the "last" page.

@joshpgeorge 

Insert a step before expanding Data column to remove the last row. Go to Remove Rows > Remove Bottom Rows.

080501.jpg

Jing

I need to remove the last Page from the 2nd Query and then I need another (3rd) Query to look at ONLY the last Page. Here is what I have in my first and second Queries so far-

 

1st- 

let Source = Pdf.Tables(File.Contents("C:\Users\JG\OneDrive - BRInc\Desktop_Home\Custom Reports JG\Email_NewAgreeRpt.pdf"), [Implementation="1.3"]),
#"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page")and ([Id] = "Page001")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Id", "Name", "Kind"}),
#"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns", "Data", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8"}, {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4", "Data.Column5", "Data.Column6", "Data.Column7", "Data.Column8"}),
#"Promoted Headers" = Table.PromoteHeaders(#"Expanded Data", [PromoteAllScalars=true]),
#"Filtered Rows1" = Table.SelectRows(#"Promoted Headers", each ([Column3] = "-")),
#"Reordered Columns" = Table.ReorderColumns(#"Filtered Rows1",{"Column3", "Column4", "Column5", "Column6", "Column7", "Column1", "Column2", "Column8"}),
#"Removed Other Columns" = Table.SelectColumns(#"Reordered Columns",{"Column1", "Column2", "Column8"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Column1", "Total"}})
in
#"Renamed Columns"

 

2nd 

let Source = Pdf.Tables(File.Contents("C:\Users\JG\OneDrive - BRInc\Desktop_Home\Custom Reports JG\Email_NewAgreeRpt.pdf"), [Implementation="1.3"]),#"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page") and ([Id] <> "Page001")), #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Id", "Name", "Kind"}), #"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns", "Data", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8"}, {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4", "Data.Column5", "Data.Column6", "Data.Column7", "Data.Column8"}), #"Promoted Headers" = Table.PromoteHeaders(#"Expanded Data", [PromoteAllScalars=true]),
#"Filtered Rows1" = Table.SelectRows(#"Promoted Headers", each ([Column3] = "-")),
#"Reordered Columns" = Table.ReorderColumns(#"Filtered Rows1",{"Column3", "Column4", "Column5", "Column6", "Column7", "Column1", "Column2", "Column8"}),
#"Removed Other Columns" = Table.SelectColumns(#"Reordered Columns",{"Column1", "Column2", "Column8"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Column1", "Total"}})
in
#"Renamed Columns"

 

Hi @joshpgeorge 

 

For the 3rd query, you can first filter on "Page" in Kind column, then keep the bottom 1 row. 

let
    Source = Pdf.Tables(File.Contents("C:\Users\Admin\Desktop\Data.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page")),
    #"Kept Last Rows" = Table.LastN(#"Filtered Rows", 1)
in
    #"Kept Last Rows"

 

For the 2nd query, you can first filter on "Page" in Kind column, then remove the first row and last row.

let
    Source = Pdf.Tables(File.Contents("C:\Users\Admin\Desktop\Data.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page") and ([Id] <> "Page001")),
    #"Removed Bottom Rows" = Table.RemoveLastN(#"Filtered Rows",1)
in
    #"Removed Bottom Rows"

 

Under Home tab, Keep Rows and Remove Rows can also be used to filter rows.

080602.jpg

 

Best Regards,
Community Support Team _ Jing

This report is SUCH A PAIN. Here is a new problem. I get a new report each day and the tables/pages are not consistent, only the data is. I want to keep columns containing the words, "Delivery Driver", any US City and State data, and any numbers containing a 2 digit decimal (example- 1.34 or 234.12). Is this type of query possible?

@joshpgeorge 

I would suggest that you create a new topic in this forum as the new problem is different from the original one in this topic. And also provide some sample data and expected output in the new topic. Currently it's difficult to reply whether it's possible or not because I don't know what the expected output should be like. So please create a new topic to work on the new problem. Thank you for understanding.

 

Jing

Anonymous
Not applicable

The page approach is correct (filter to just the pages, filter out the tables). But with PDFs, I find that you should also sort by column count descending before expanding the tables. This will prevent missing and/or null columns. Columns =Table.AddColumn(TableOrStepName, "Count", each Table.ColumnCount(PdfTableColumn)

Then sort, then expand. You can always re-sort by the page number column. 
--Nate

Following Jakinta's suggestion and that worked to import the pages. Now since I normally use power query (instead of Advanced Editor) to do the import, my only option is to select the tables and that leads me back to my original problem. This is where I am.

 

let
Source = Pdf.Tables(File.Contents("C:\Users\Joshua G\OneDrive - Bestway \Desktop\Indd\20210.pdf")), #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page"))
in
#"Filtered Rows"

Jakinta
Solution Sage
Solution Sage

Does this help?

 

let
    Source = Pdf.Tables(File.Contents("YourPDFfilepath")),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page"))
in
    #"Filtered Rows"

 

Helpful resources

Announcements
August Power BI Update Carousel

Power BI Monthly Update - August 2025

Check out the August 2025 Power BI update to learn about new features.

August 2025 community update carousel

Fabric Community Update - August 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors