Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more

Reply
Mederic
Post Patron
Post Patron

Extract range in a pdf

Hello,

I'm working on a single pdf file.
Once I've finished all the transformation steps, I'd like to turn it into a function
However, I'm having trouble selecting the range in the table
I've added an "Index" column but then I don't know how to continue.

 

I would like to keep dynamically the range from "Column1" to the "Index" column according to the interval :
- if Column1 = "INVOICE N°" 
- if Column 5 = "Puissance :" (Row 20-1=Row 19) 
In this example, il want tu keep the range Row 15 to 19 and delete the rest (see screenshot 3)

 

I don't know where to start with all the possibilities: List.PositionOf, Text.Contains....etc

Thanks in advance for your help

Best regards

let
    Source = File.Contents(AllPath[Column1]{0}),
    Custom1 = Pdf.Tables(Source),
    #"Filtered Rows" = Table.SelectRows(Custom1, each ([Kind] = "Page")),
    Custom2 = Table.TransformColumns(#"Filtered Rows", {"Data", each Table.AddIndexColumn(_, "Index", 1,1) })


in
    Custom2

 

Step_1.jpgStep_2.jpgStep_3.jpg

5 REPLIES 5
Mederic
Post Patron
Post Patron

Hello,

after a headache, I got around the problem of my previous message,
Taking "Table003" instead of the "Page001", it worked for the 23 pdf, even though I would have liked to use "Page001"
I also simplified the code M (see below)

One question please, is it possible to combine the 2 lines "Custom3" and "Custom2" in one ?

Otherwise, I have not quite finished yet, I still have 4 columns to add but it is more complicated there
Thanks in advance

Best regards

 

(file_binary)=>
let
    //Source = File.Contents(AllPath[Column1]{0}),
    Custom1 = Pdf.Tables(file_binary),
    #"Filtered Rows3" = Table.SelectRows(Custom1, each ([Id] = "Table003")),
    Custom3 = Table.TransformColumns(#"Filtered Rows3", {"Data", each Table.PromoteHeaders(_)}),
    Custom2 = Table.TransformColumns(Custom3, {"Data", each Table.SelectRows(_, each ([#"Unit Price"] <> null and [#"Unit Price"] <> "" ))}),
    #"Removed Other Columns" = Table.SelectColumns(Custom2,{"Data"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Other Columns", "Data", {"Reference", "Designation", "Qty/Time", "Unit Price", "Amount excl. taxes"}, {"Reference", "Designation", "Qty/Time", "Unit Price", "Amount excl. taxes"})
in
    #"Expanded Data"

 

Table003.png

Mederic
Post Patron
Post Patron

Hello everyone,
I haven't made any progress since my last message but I'm not giving up hope 🤗.
Today I converted the code into a function and applied it to the 23 Binary files (pdf)

It works very well for this part of the code except for 3 or 4 pdf files,
The data is shifted to the right with an almost empty column1
My code returns an empty table for these 3 or 4 files
I don't know how to correct this

Thank you in advance for your help
Best Regards


decalage_pdf.jpg

Anonymous
Not applicable

Hi @Mederic 

You can try to use Table.RemoveMatchingRows() function.

e.g 

= Table.AddColumn(#"Changed Type1", "Custom", each if [Column3]="apple" then Table.RemoveMatchingRows(#"Changed Type1",{[Index=1]},"Index") else null)

vxinruzhumsft_0-1696299756603.png

 

You can refer to the following link about the function.

Table.RemoveMatchingRows - PowerQuery M | Microsoft Learn

 

Best Regards!

Yolo Zhu

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hello @Anonymous ,

Thank you for your message and for your help,
Last night, I got the result in screenshot with the code below
It seems to work even if I would have liked to see other more elegant solutions.
I still have to add the last columns but I'm currently Table.

Best Regards

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type any}, {"Column9", type text}, {"Column10", type any}, {"Column11", type text}, {"Column12", type text}, {"Column13", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
    RecordList = Table.AddColumn(#"Added Index", "Custom", each Record.ToList(_)),
    ListContains = Table.AddColumn(RecordList, "Custom.1", each List.ContainsAny([Custom], {"Reference"})),
    #"Filtered Rows" = Table.SelectRows(ListContains, each ([Custom.1] = true)),
    StartPointRows = #"Filtered Rows"{0}[Index],
    RefersChangedType = #"Changed Type",
    #"Removed Top Rows" = Table.Skip(RefersChangedType,StartPointRows-1),
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true]),
    #"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"Reference", type text}, {"Designation", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Qty/Time", type text}, {"Column7", type text}, {"Column8", type any}, {"Unit Price", Int64.Type}, {"Column10", type any}, {"", type text}, {"Amount excl. taxes", Int64.Type}, {"Column13", Int64.Type}}),
    #"Filtered Rows1" = Table.SelectRows(#"Changed Type1", each ([Unit Price] <> null)),
    SelectColumns = #"Filtered Rows1"[[Reference],[Designation],[#"Qty/Time"],[#"Unit Price"],[#"Amount excl. taxes"]]
in
    SelectColumns

 

Resultat_Partiel.png

Mederic
Post Patron
Post Patron

Hello,

I know it's not easy, especially with images,
I enclose a similar Excel file of the pdf table and a new capture corresponding to the final result.

https://we.tl/t-6h9tTuYXBM 

Thank you in advance
Best regards

 

Output Pdf.png

Helpful resources

Announcements
Power BI DataViz World Championships

Power BI Dataviz World Championships

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!

December 2025 Power BI Update Carousel

Power BI Monthly Update - December 2025

Check out the December 2025 Power BI Holiday Recap!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Solution Authors