Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I have an extremely large text file that I need to extract useful information from. I wrote an example of what it looks like below. Basically I want to extract data to keep the green fields so that Column1 has the Protocol Code row and Column2 has the subseqent row starting with DEPARTMENT.
The Protocol code is always 2 rows down from the line that starts with PROTOCOL ID, but the Department line is not always in the same relative position (but it always starts with DEPARTMENT:). Example:
-
junk
more junk
more junk
PROTOCOL ID:
junk
[PROTOCOL CODE]
more junk
more junk
more junk
more junk
DEPARTMENT: [Department Name]
more junk
more junk
junk of varying lines
I've been at this all day and it seems impossible to write a powerquery that I can use with any export like this.
let
Source = Text.FromBinary(File.Contents("C:[MY PATH]"), 1252),
#"Split by Line" = Text.Split(Source, "#(cr)#(lf)"),
#"Converted to Table" = Table.FromList(#"Split by Line", Splitter.SplitTextByDelimiter("#(tab)", QuoteStyle.None), {"Column1", "Column2", "Column3", "Column4"}),
#"Filtered Rows1" = Table.SelectRows(#"Converted to Table", each Text.StartsWith([Column1], "PROTOCOL ID:")),
#"Added Custom" = Table.AddColumn(#"Filtered Rows1", "Custom", each try #"Converted to Table"{[Index]+2}[Column1] otherwise null),
#"Filtered Rows2" = Table.SelectRows(#"Converted to Table", each Text.StartsWith([Column1], "DEPARTMENT")),
#"Appended Queries" = Table.Combine({#"Filtered Rows1", #"Custom"}),
#"Merged Queries" = Table.Combine({#"Appended Queries", #"Filtered Rows2"})
in
#"Merged Queries"
Ideas? Thank you!!
Solved! Go to Solution.
Oh, you are trying to combine the protocol and department lines into a single row. Instead of using Table.Combine, use
Table.FromColumns({#"Added Custom"[Custom], #"Filtered Rows2"[Column2]})
The original code above gave me issues trying to split the text file; there's no good delimiter I can see. SO I tried this below, both with and without the duplicate/rename column steps.
But this gives me rows that JUST start with "PROTOCOL ID" at the end. It's not returning the value from 2 rows down. I can see the other rows that start with DEPARTMENT when I clock on the #Filtered Rows2 step, but the merged query step ends up just showing me the same lines I see in the #FilteredRows1 step. Both columns remain the same in all steps.
I have isolated the two columns that I want and can see them in different steps, but just need one to appear in Column1 and the other to appear in Column2.
I feel like I'm really close!!
let Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\[PATH], null, null, 1252)}), #"Duplicated Column" = Table.DuplicateColumn(Source, "Column1", "Column1 - Copy"),
#"Renamed Columns" = Table.RenameColumns(#"Duplicated Column",{{"Column1 - Copy", "Column2"}}), #"Filtered Rows1" = Table.SelectRows(#"Renamed Columns", each Text.StartsWith([Column1], "PROTOCOL")),
#"Added Custom" = Table.AddColumn(#"Filtered Rows1", "Custom", each try #"Renamed Columns"{[Index]+2}[Column1] otherwise null),
#"Filtered Rows2" = Table.SelectRows(#"Renamed Columns", each Text.StartsWith([Column2], "DEPARTMENT")),
#"Merged Queries" = Table.Combine({#"Added Custom", #"Filtered Rows2"})
in
#"Merged Queries"
Try if this helps.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WyirNy1aK1YlWys0vSlXAwQsI8g/xd/b3UfB0sQILwGWi4VLO/i6usQRNwsdzcQ1wDArxdfULsVJQiHZJLUgsKslNzStR8EvMTSVsNIihkJ+mUJZYVJmZl66Qk5mXWqwUGwsA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column = _t]),
Custom = Table.AddColumn(Source, "Custom", each let i= List.PositionOf(Source[Column], "PROTOCOL ID:") in if [Column]= "PROTOCOL ID:" then Source[Column]{i+2} else if Text.StartsWith ([Column], "DEPARTMENT") then Text.AfterDelimiter ([Column], "DEPARTMENT: ") else null ),
Filtered = Table.SelectRows(Custom, each ([Custom] <> null)),
FINAL = Table.TransformColumns(Filtered, {{"Column", each Text.BeforeDelimiter(_, ":"), type text}})
in
FINAL
Oh, you are trying to combine the protocol and department lines into a single row. Instead of using Table.Combine, use
Table.FromColumns({#"Added Custom"[Custom], #"Filtered Rows2"[Column2]})
Yes!!! Thank you!!
The only problem I see is at the end. The #"Appended Queries" step isn't needed. And the merged queries should combine #"Added Custom" with #"Filtered Rows2"
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.