Solved: Re: Filtering Column A with a text value from colu...

TimAUpp · ‎02-03-2023

Hi!

I'm very new to power query and I am having some issues with dynamic values!

Context:
I have a Word Document (converted to PDF) that contains several tables. I want to bring a specific table to power query. Doing this is no problem, and automating what table is chosen is not a problem either (since I know the Table name of interest beforehand). My problem however, lies in my attempt to futureproof the power query code in the event that the table would change name!

What my current code is doing: It filters away the "page" results and am left with only the Table results. Then expand the data and add a conditional column, if the row contains a certain word (only present in one table) the Table name (Id) is added to the row. My issue is the next step.

Problem:

I can't find a way to reference the row with the TableID of interest and filter the Id column so only that table is present.

Code:

let
    Source = Pdf.Tables(File.Contents("XYZ.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each not Text.StartsWith([Id], "Page")),
    #"Expanded Data" = Table.ExpandTableColumn(#"Filtered Rows", "Data", {"Column1", "Column2", "Column3", "Column4"}, {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4"}),
    #"Added Conditional Column" = Table.AddColumn(#"Expanded Data", "Custom", each if [Data.Column2] = "XYZ" then [Id] else null),
    #"Table of Interest" = Table.SelectRows ( #"Added Conditional Column", ??? ))
in
    #"Table of Interest"

If I need to format anything differently, or need to clarify anything just let me know!

If there's a smarter way to do it feel free to tell me!

BA_Pete · ‎02-03-2023

Ok, cool.

I don't think we need a conditional column. The following query turns this:

...into this:

In Power Query, create a new blank query, open Advanced Editor from the Home tab, then paste all of this over the default code so you can see each step in action:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCklMykk1MDBU0lFSitVB4Ts6OaMLubi6IQsZoekC8SMiowgpQeYbA/nuHp7oQkqxsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Id = _t, Data.Column2 = _t]),
    repBlankNull = Table.ReplaceValue(Source,"",null,Replacer.ReplaceValue,{"Data.Column2"}),
    groupIdAllRows = Table.Group(repBlankNull, {"Id"}, {{"data", each _, type table [Id=nullable text, Data.Column2=nullable text]}}),
    filterTargetTable = Table.SelectRows(groupIdAllRows, each List.Contains([data][Data.Column2], "XYZ")),
    expandDataCol = Table.ExpandTableColumn(filterTargetTable, "data", {"Data.Column2"}, {"Data.Column2"})
in
    expandDataCol

--Summary--

repBlankNull - You can ignore this step, it just swaps blank values for nulls.

groupIdAllRows - Group the table in [Id], but use the All Rows aggregator to keep all the other rows in nested tables.

filterTargetTable - Evaluates the [Data.Column2] column in each nested table to see if it contains "XYZ", and keeps the [Id] row if any of them do.

expandDataCol - Just expand back out any of the nested columns that you need.

Pete

Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!

View solution in original post

BA_Pete · ‎02-03-2023

Hi @TimAUpp ,

I may be oversimplifying your scenario here but, following your current query logic, it looks like you can just filter on the [Data.Column2] value:

let
    Source = Pdf.Tables(File.Contents("XYZ.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each not Text.StartsWith([Id], "Page")),
    #"Expanded Data" = Table.ExpandTableColumn(#"Filtered Rows", "Data", {"Column1", "Column2", "Column3", "Column4"}, {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4"}),

    #"Table of Interest" =
        Table.SelectRows( #"Expanded Data", each [Data.Column2] = "XYZ")

in
    #"Table of Interest"

Pete

Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!

TimAUpp · ‎02-03-2023

Hi @BA_Pete !

That works, but right now my table of interest is "Table002". I want to avoid the scenario where the PDF is updated and another table gets added before Table002 so current Table002 --> becomes Table003. Then I'd have to alter the Power Query and that's what I am trying to avoid!

If any more clarification is needed please let me know!

Kind regards

TimAUpp · ‎02-03-2023

This is what my table looks like for further clarification!

BA_Pete · ‎02-03-2023

I'm struggling to follow. My suggestion just filtered your table on [Data.Column2] = "XYZ", which was in logical consistency to your original query.

In the absence of any further information, I assumed that Table002[Data.Column2] = "XYZ" was unique, and remained true even if the table is updated to Table003 (to become Table003[Data.Column2] = "XYZ" and is still unique).

What other information is there available to identify the exact row that you want, besides [Data.Column2]?

Are you actually asking for the file name to be brought into the results table to be matched against?

*EDIT* -------------------------------

Hold on, I think I might get what you're on about here.

Do you want to identify the the Table[Id] that has [Data.Column2] = "XYZ", then filter the query for ALL rows that contain that Table[Id]?

---------------------------------------

Pete

Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!

TimAUpp · ‎02-03-2023

Thanks for your persistence in trying to help when I am not being very clear!

What you mention in your edit is what I am trying to achieve, the Added Conditional Column gives me the Table[Id] I want to filter the [Id] column by.

BA_Pete · ‎02-03-2023

Ok, cool.

I don't think we need a conditional column. The following query turns this:

...into this:

In Power Query, create a new blank query, open Advanced Editor from the Home tab, then paste all of this over the default code so you can see each step in action:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCklMykk1MDBU0lFSitVB4Ts6OaMLubi6IQsZoekC8SMiowgpQeYbA/nuHp7oQkqxsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Id = _t, Data.Column2 = _t]),
    repBlankNull = Table.ReplaceValue(Source,"",null,Replacer.ReplaceValue,{"Data.Column2"}),
    groupIdAllRows = Table.Group(repBlankNull, {"Id"}, {{"data", each _, type table [Id=nullable text, Data.Column2=nullable text]}}),
    filterTargetTable = Table.SelectRows(groupIdAllRows, each List.Contains([data][Data.Column2], "XYZ")),
    expandDataCol = Table.ExpandTableColumn(filterTargetTable, "data", {"Data.Column2"}, {"Data.Column2"})
in
    expandDataCol

--Summary--

repBlankNull - You can ignore this step, it just swaps blank values for nulls.

groupIdAllRows - Group the table in [Id], but use the All Rows aggregator to keep all the other rows in nested tables.

filterTargetTable - Evaluates the [Data.Column2] column in each nested table to see if it contains "XYZ", and keeps the [Id] row if any of them do.

expandDataCol - Just expand back out any of the nested columns that you need.

Pete

Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!

TimAUpp · ‎02-03-2023

Thank you very much for your detailed explanation and very fast help!

The example you provided is very educational! Now I just need some time to apply this to my own table 😄

Again, thank you very much for your help!

Filtering Column A with a text value from column B

Helpful resources

Power BI Monthly Update - November 2025

Fabric Data Days

FabCon Atlanta 2026

FabCon is coming to Atlanta

Filtering Column A with a text value from column B

Helpful resources

Power BI Monthly Update - November 2025

Fabric Data Days

FabCon Atlanta 2026