The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hi!
I'm very new to power query and I am having some issues with dynamic values!
Context:
I have a Word Document (converted to PDF) that contains several tables. I want to bring a specific table to power query. Doing this is no problem, and automating what table is chosen is not a problem either (since I know the Table name of interest beforehand). My problem however, lies in my attempt to futureproof the power query code in the event that the table would change name!
What my current code is doing: It filters away the "page" results and am left with only the Table results. Then expand the data and add a conditional column, if the row contains a certain word (only present in one table) the Table name (Id) is added to the row. My issue is the next step.
Problem:
I can't find a way to reference the row with the TableID of interest and filter the Id column so only that table is present.
Code:
let
Source = Pdf.Tables(File.Contents("XYZ.pdf"), [Implementation="1.3"]),
#"Filtered Rows" = Table.SelectRows(Source, each not Text.StartsWith([Id], "Page")),
#"Expanded Data" = Table.ExpandTableColumn(#"Filtered Rows", "Data", {"Column1", "Column2", "Column3", "Column4"}, {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4"}),
#"Added Conditional Column" = Table.AddColumn(#"Expanded Data", "Custom", each if [Data.Column2] = "XYZ" then [Id] else null),
#"Table of Interest" = Table.SelectRows ( #"Added Conditional Column", ??? ))
in
#"Table of Interest"
If I need to format anything differently, or need to clarify anything just let me know!
If there's a smarter way to do it feel free to tell me!
Solved! Go to Solution.
Ok, cool.
I don't think we need a conditional column. The following query turns this:
...into this:
In Power Query, create a new blank query, open Advanced Editor from the Home tab, then paste all of this over the default code so you can see each step in action:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCklMykk1MDBU0lFSitVB4Ts6OaMLubi6IQsZoekC8SMiowgpQeYbA/nuHp7oQkqxsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Id = _t, Data.Column2 = _t]),
repBlankNull = Table.ReplaceValue(Source,"",null,Replacer.ReplaceValue,{"Data.Column2"}),
groupIdAllRows = Table.Group(repBlankNull, {"Id"}, {{"data", each _, type table [Id=nullable text, Data.Column2=nullable text]}}),
filterTargetTable = Table.SelectRows(groupIdAllRows, each List.Contains([data][Data.Column2], "XYZ")),
expandDataCol = Table.ExpandTableColumn(filterTargetTable, "data", {"Data.Column2"}, {"Data.Column2"})
in
expandDataCol
--Summary--
repBlankNull - You can ignore this step, it just swaps blank values for nulls.
groupIdAllRows - Group the table in [Id], but use the All Rows aggregator to keep all the other rows in nested tables.
filterTargetTable - Evaluates the [Data.Column2] column in each nested table to see if it contains "XYZ", and keeps the [Id] row if any of them do.
expandDataCol - Just expand back out any of the nested columns that you need.
Pete
Proud to be a Datanaut!
Hi @TimAUpp ,
I may be oversimplifying your scenario here but, following your current query logic, it looks like you can just filter on the [Data.Column2] value:
let
Source = Pdf.Tables(File.Contents("XYZ.pdf"), [Implementation="1.3"]),
#"Filtered Rows" = Table.SelectRows(Source, each not Text.StartsWith([Id], "Page")),
#"Expanded Data" = Table.ExpandTableColumn(#"Filtered Rows", "Data", {"Column1", "Column2", "Column3", "Column4"}, {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4"}),
#"Table of Interest" =
Table.SelectRows( #"Expanded Data", each [Data.Column2] = "XYZ")
in
#"Table of Interest"
Pete
Proud to be a Datanaut!
Hi @BA_Pete !
That works, but right now my table of interest is "Table002". I want to avoid the scenario where the PDF is updated and another table gets added before Table002 so current Table002 --> becomes Table003. Then I'd have to alter the Power Query and that's what I am trying to avoid!
If any more clarification is needed please let me know!
Kind regards
This is what my table looks like for further clarification!
I'm struggling to follow. My suggestion just filtered your table on [Data.Column2] = "XYZ", which was in logical consistency to your original query.
In the absence of any further information, I assumed that Table002[Data.Column2] = "XYZ" was unique, and remained true even if the table is updated to Table003 (to become Table003[Data.Column2] = "XYZ" and is still unique).
What other information is there available to identify the exact row that you want, besides [Data.Column2]?
Are you actually asking for the file name to be brought into the results table to be matched against?
*EDIT* -------------------------------
Hold on, I think I might get what you're on about here.
Do you want to identify the the Table[Id] that has [Data.Column2] = "XYZ", then filter the query for ALL rows that contain that Table[Id]?
---------------------------------------
Pete
Proud to be a Datanaut!
Thanks for your persistence in trying to help when I am not being very clear!
What you mention in your edit is what I am trying to achieve, the Added Conditional Column gives me the Table[Id] I want to filter the [Id] column by.
Ok, cool.
I don't think we need a conditional column. The following query turns this:
...into this:
In Power Query, create a new blank query, open Advanced Editor from the Home tab, then paste all of this over the default code so you can see each step in action:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCklMykk1MDBU0lFSitVB4Ts6OaMLubi6IQsZoekC8SMiowgpQeYbA/nuHp7oQkqxsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Id = _t, Data.Column2 = _t]),
repBlankNull = Table.ReplaceValue(Source,"",null,Replacer.ReplaceValue,{"Data.Column2"}),
groupIdAllRows = Table.Group(repBlankNull, {"Id"}, {{"data", each _, type table [Id=nullable text, Data.Column2=nullable text]}}),
filterTargetTable = Table.SelectRows(groupIdAllRows, each List.Contains([data][Data.Column2], "XYZ")),
expandDataCol = Table.ExpandTableColumn(filterTargetTable, "data", {"Data.Column2"}, {"Data.Column2"})
in
expandDataCol
--Summary--
repBlankNull - You can ignore this step, it just swaps blank values for nulls.
groupIdAllRows - Group the table in [Id], but use the All Rows aggregator to keep all the other rows in nested tables.
filterTargetTable - Evaluates the [Data.Column2] column in each nested table to see if it contains "XYZ", and keeps the [Id] row if any of them do.
expandDataCol - Just expand back out any of the nested columns that you need.
Pete
Proud to be a Datanaut!
Thank you very much for your detailed explanation and very fast help!
The example you provided is very educational! Now I just need some time to apply this to my own table 😄
Again, thank you very much for your help!
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.