Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
TimAUpp
New Member

Filtering Column A with a text value from column B

Hi! 

I'm very new to power query and I am having some issues with dynamic values!

 

Context:
I have a Word Document (converted to PDF) that contains several tables. I want to bring a specific table to power query. Doing this is no problem, and automating what table is chosen is not a problem either (since I know the Table name of interest beforehand). My problem however, lies in my attempt to futureproof the power query code in the event that the table would change name! 


What my current code is doing: It filters away the "page" results and am left with only the Table results. Then expand the data and add a conditional column, if the row contains a certain word (only present in one table) the Table name (Id) is added to the row. My issue is the next step.


Problem:

I can't find a way to reference the row with the TableID of interest and filter the Id column so only that table is present.

 

Code: 

let
    Source = Pdf.Tables(File.Contents("XYZ.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each not Text.StartsWith([Id], "Page")),
    #"Expanded Data" = Table.ExpandTableColumn(#"Filtered Rows", "Data", {"Column1", "Column2", "Column3", "Column4"}, {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4"}),
    #"Added Conditional Column" = Table.AddColumn(#"Expanded Data", "Custom", each if [Data.Column2] = "XYZ" then [Id] else null),
    #"Table of Interest" = Table.SelectRows ( #"Added Conditional Column", ??? ))
in
    #"Table of Interest"

 

 

If I need to format anything differently,  or need to clarify anything just let me know!

If there's a smarter way to do it feel free to tell me!

 

1 ACCEPTED SOLUTION

 

Ok, cool.

I don't think we need a conditional column. The following query turns this:

BA_Pete_0-1675424400557.png

 

...into this:

BA_Pete_1-1675424425374.png

 

In Power Query, create a new blank query, open Advanced Editor from the Home tab, then paste all of this over the default code so you can see each step in action:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCklMykk1MDBU0lFSitVB4Ts6OaMLubi6IQsZoekC8SMiowgpQeYbA/nuHp7oQkqxsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Id = _t, Data.Column2 = _t]),
    repBlankNull = Table.ReplaceValue(Source,"",null,Replacer.ReplaceValue,{"Data.Column2"}),
    groupIdAllRows = Table.Group(repBlankNull, {"Id"}, {{"data", each _, type table [Id=nullable text, Data.Column2=nullable text]}}),
    filterTargetTable = Table.SelectRows(groupIdAllRows, each List.Contains([data][Data.Column2], "XYZ")),
    expandDataCol = Table.ExpandTableColumn(filterTargetTable, "data", {"Data.Column2"}, {"Data.Column2"})
in
    expandDataCol

 

--Summary--

repBlankNull - You can ignore this step, it just swaps blank values for nulls.

groupIdAllRows - Group the table in [Id], but use the All Rows aggregator to keep all the other rows in nested tables.

filterTargetTable - Evaluates the [Data.Column2] column in each nested table to see if it contains "XYZ", and keeps the [Id] row if any of them do.

expandDataCol - Just expand back out any of the nested columns that you need.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




View solution in original post

7 REPLIES 7
BA_Pete
Super User
Super User

Hi @TimAUpp ,

 

I may be oversimplifying your scenario here but, following your current query logic, it looks like you can just filter on the [Data.Column2] value:

let
    Source = Pdf.Tables(File.Contents("XYZ.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each not Text.StartsWith([Id], "Page")),
    #"Expanded Data" = Table.ExpandTableColumn(#"Filtered Rows", "Data", {"Column1", "Column2", "Column3", "Column4"}, {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4"}),

    #"Table of Interest" =
        Table.SelectRows( #"Expanded Data", each [Data.Column2] = "XYZ")

in
    #"Table of Interest"

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Hi @BA_Pete !

 

That works, but right now my table of interest is "Table002". I want to avoid the scenario where the PDF is updated and another table gets added before Table002 so current Table002 --> becomes Table003. Then I'd have to alter the Power Query and that's what I am trying to avoid! 

If any more clarification is needed please let me know! 

Kind regards 

1BBEF586-F42A-41A5-B9D5-1AEF7B2D7B38.png

This is what my table looks like for further clarification!

 

I'm struggling to follow. My suggestion just filtered your table on [Data.Column2] = "XYZ", which was in logical consistency to your original query.

In the absence of any further information, I assumed that Table002[Data.Column2] = "XYZ" was unique, and remained true even if the table is updated to Table003 (to become Table003[Data.Column2] = "XYZ" and is still unique).

What other information is there available to identify the exact row that you want, besides [Data.Column2]?

Are you actually asking for the file name to be brought into the results table to be matched against?

 

*EDIT* -------------------------------

Hold on, I think I might get what you're on about here.

Do you want to identify the the Table[Id] that has [Data.Column2] = "XYZ", then filter the query for ALL rows that contain that Table[Id]?

---------------------------------------

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Thanks for your persistence in trying to help when I am not being very clear! 

What you mention in your edit is what I am trying to achieve, the Added Conditional Column gives me the Table[Id] I want to filter the [Id] column by.

 

Ok, cool.

I don't think we need a conditional column. The following query turns this:

BA_Pete_0-1675424400557.png

 

...into this:

BA_Pete_1-1675424425374.png

 

In Power Query, create a new blank query, open Advanced Editor from the Home tab, then paste all of this over the default code so you can see each step in action:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCklMykk1MDBU0lFSitVB4Ts6OaMLubi6IQsZoekC8SMiowgpQeYbA/nuHp7oQkqxsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Id = _t, Data.Column2 = _t]),
    repBlankNull = Table.ReplaceValue(Source,"",null,Replacer.ReplaceValue,{"Data.Column2"}),
    groupIdAllRows = Table.Group(repBlankNull, {"Id"}, {{"data", each _, type table [Id=nullable text, Data.Column2=nullable text]}}),
    filterTargetTable = Table.SelectRows(groupIdAllRows, each List.Contains([data][Data.Column2], "XYZ")),
    expandDataCol = Table.ExpandTableColumn(filterTargetTable, "data", {"Data.Column2"}, {"Data.Column2"})
in
    expandDataCol

 

--Summary--

repBlankNull - You can ignore this step, it just swaps blank values for nulls.

groupIdAllRows - Group the table in [Id], but use the All Rows aggregator to keep all the other rows in nested tables.

filterTargetTable - Evaluates the [Data.Column2] column in each nested table to see if it contains "XYZ", and keeps the [Id] row if any of them do.

expandDataCol - Just expand back out any of the nested columns that you need.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Thank you very much for your detailed explanation and very fast help!

 

The example you provided is very educational! Now I just need some time to apply this to my own table 😄

 

Again, thank you very much for your help! 

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors