Extract range in a pdf

Mederic · ‎10-01-2023

Hello,

I'm working on a single pdf file.
Once I've finished all the transformation steps, I'd like to turn it into a function
However, I'm having trouble selecting the range in the table
I've added an "Index" column but then I don't know how to continue.

I would like to keep dynamically the range from "Column1" to the "Index" column according to the interval :
- if Column1 = "INVOICE N°"
- if Column 5 = "Puissance :" (Row 20-1=Row 19)
In this example, il want tu keep the range Row 15 to 19 and delete the rest (see screenshot 3)

I don't know where to start with all the possibilities: List.PositionOf, Text.Contains....etc

Thanks in advance for your help

Best regards

let
    Source = File.Contents(AllPath[Column1]{0}),
    Custom1 = Pdf.Tables(Source),
    #"Filtered Rows" = Table.SelectRows(Custom1, each ([Kind] = "Page")),
    Custom2 = Table.TransformColumns(#"Filtered Rows", {"Data", each Table.AddIndexColumn(_, "Index", 1,1) })


in
    Custom2

Mederic · ‎10-03-2023

Hello,

after a headache, I got around the problem of my previous message,
Taking "Table003" instead of the "Page001", it worked for the 23 pdf, even though I would have liked to use "Page001"
I also simplified the code M (see below)

One question please, is it possible to combine the 2 lines "Custom3" and "Custom2" in one ?

Otherwise, I have not quite finished yet, I still have 4 columns to add but it is more complicated there
Thanks in advance

Best regards

(file_binary)=>
let
    //Source = File.Contents(AllPath[Column1]{0}),
    Custom1 = Pdf.Tables(file_binary),
    #"Filtered Rows3" = Table.SelectRows(Custom1, each ([Id] = "Table003")),
    Custom3 = Table.TransformColumns(#"Filtered Rows3", {"Data", each Table.PromoteHeaders(_)}),
    Custom2 = Table.TransformColumns(Custom3, {"Data", each Table.SelectRows(_, each ([#"Unit Price"] <> null and [#"Unit Price"] <> "" ))}),
    #"Removed Other Columns" = Table.SelectColumns(Custom2,{"Data"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Other Columns", "Data", {"Reference", "Designation", "Qty/Time", "Unit Price", "Amount excl. taxes"}, {"Reference", "Designation", "Qty/Time", "Unit Price", "Amount excl. taxes"})
in
    #"Expanded Data"

Mederic · ‎10-03-2023

Hello everyone,
I haven't made any progress since my last message but I'm not giving up hope 🤗.
Today I converted the code into a function and applied it to the 23 Binary files (pdf)

It works very well for this part of the code except for 3 or 4 pdf files,
The data is shifted to the right with an almost empty column1
My code returns an empty table for these 3 or 4 files
I don't know how to correct this

Thank you in advance for your help
Best Regards

Anonymous · ‎10-02-2023

Hi @Mederic

You can try to use Table.RemoveMatchingRows() function.

e.g

= Table.AddColumn(#"Changed Type1", "Custom", each if [Column3]="apple" then Table.RemoveMatchingRows(#"Changed Type1",{[Index=1]},"Index") else null)

You can refer to the following link about the function.

Table.RemoveMatchingRows - PowerQuery M | Microsoft Learn

Best Regards!

Yolo Zhu

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Mederic · ‎10-03-2023

Hello @Anonymous ,

Thank you for your message and for your help,
Last night, I got the result in screenshot with the code below
It seems to work even if I would have liked to see other more elegant solutions.
I still have to add the last columns but I'm currently Table.

Best Regards

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type any}, {"Column9", type text}, {"Column10", type any}, {"Column11", type text}, {"Column12", type text}, {"Column13", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
    RecordList = Table.AddColumn(#"Added Index", "Custom", each Record.ToList(_)),
    ListContains = Table.AddColumn(RecordList, "Custom.1", each List.ContainsAny([Custom], {"Reference"})),
    #"Filtered Rows" = Table.SelectRows(ListContains, each ([Custom.1] = true)),
    StartPointRows = #"Filtered Rows"{0}[Index],
    RefersChangedType = #"Changed Type",
    #"Removed Top Rows" = Table.Skip(RefersChangedType,StartPointRows-1),
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true]),
    #"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"Reference", type text}, {"Designation", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Qty/Time", type text}, {"Column7", type text}, {"Column8", type any}, {"Unit Price", Int64.Type}, {"Column10", type any}, {"", type text}, {"Amount excl. taxes", Int64.Type}, {"Column13", Int64.Type}}),
    #"Filtered Rows1" = Table.SelectRows(#"Changed Type1", each ([Unit Price] <> null)),
    SelectColumns = #"Filtered Rows1"[[Reference],[Designation],[#"Qty/Time"],[#"Unit Price"],[#"Amount excl. taxes"]]
in
    SelectColumns

Mederic · ‎10-02-2023

Hello,

I know it's not easy, especially with images,
I enclose a similar Excel file of the pdf table and a new capture corresponding to the final result.

https://we.tl/t-6h9tTuYXBM

Thank you in advance
Best regards

Extract range in a pdf

Helpful resources

Power BI Monthly Update - September 2025

FabCon Atlanta 2026

FabCon is coming to Atlanta

Extract range in a pdf

Helpful resources

Power BI Monthly Update - September 2025

FabCon Atlanta 2026