Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now

Reply
LegoTelephone
Regular Visitor

Transforming a messy text export

I have an extremely large text file that I need to extract useful information from. I wrote an example of what it looks like below. Basically I want to extract data to keep the green fields so that Column1 has the Protocol Code row and Column2 has the subseqent row starting with DEPARTMENT.

The Protocol code is always 2 rows down from the line that starts with PROTOCOL ID, but the Department line is not always in the same relative position (but it always starts with DEPARTMENT:). Example:
-

junk

more junk

more junk

PROTOCOL ID:

junk

[PROTOCOL CODE]
more junk

more junk

more junk

more junk

DEPARTMENT:  [Department Name]

more junk

more junk

junk of varying lines


I've been at this all day and it seems impossible to write a powerquery that I can use with any export like this. 

 

let
    Source = Text.FromBinary(File.Contents("C:[MY PATH]"), 1252),
    #"Split by Line" = Text.Split(Source, "#(cr)#(lf)"),
    #"Converted to Table" = Table.FromList(#"Split by Line", Splitter.SplitTextByDelimiter("#(tab)", QuoteStyle.None), {"Column1", "Column2", "Column3", "Column4"}),
    #"Filtered Rows1" = Table.SelectRows(#"Converted to Table", each Text.StartsWith([Column1], "PROTOCOL ID:")),
    #"Added Custom" = Table.AddColumn(#"Filtered Rows1", "Custom", each try #"Converted to Table"{[Index]+2}[Column1] otherwise null),
    #"Filtered Rows2" = Table.SelectRows(#"Converted to Table", each Text.StartsWith([Column1], "DEPARTMENT")),
    #"Appended Queries" = Table.Combine({#"Filtered Rows1", #"Custom"}),
    #"Merged Queries" = Table.Combine({#"Appended Queries", #"Filtered Rows2"})
in
    #"Merged Queries"

 

Ideas? Thank you!!

1 ACCEPTED SOLUTION

Oh, you are trying to combine the protocol and department lines into a single row. Instead of using Table.Combine, use 

Table.FromColumns({#"Added Custom"[Custom], #"Filtered Rows2"[Column2]})

 

View solution in original post

5 REPLIES 5
LegoTelephone
Regular Visitor

The original code above gave me issues trying to split the text file; there's no good delimiter I can see. SO I tried this below, both with and without the duplicate/rename column steps.
But this gives me rows that JUST start with "PROTOCOL ID" at the end. It's not returning the value from 2 rows down. I can see the other rows that start with DEPARTMENT when I clock on the #Filtered Rows2 step, but the merged query step ends up just showing me the same lines  I see in the #FilteredRows1 step.  Both columns remain the same in all steps.
I have isolated the two columns that I want and can see them in different steps, but just need one to appear in Column1 and the other to appear in Column2.
I feel like I'm really close!!

 

 

let Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\[PATH], null, null, 1252)}), #"Duplicated Column" = Table.DuplicateColumn(Source, "Column1", "Column1 - Copy"), 
#"Renamed Columns" = Table.RenameColumns(#"Duplicated Column",{{"Column1 - Copy", "Column2"}}), #"Filtered Rows1" = Table.SelectRows(#"Renamed Columns", each Text.StartsWith([Column1], "PROTOCOL")), 
#"Added Custom" = Table.AddColumn(#"Filtered Rows1", "Custom", each try #"Renamed Columns"{[Index]+2}[Column1] otherwise null), 
#"Filtered Rows2" = Table.SelectRows(#"Renamed Columns", each Text.StartsWith([Column2], "DEPARTMENT")), 
#"Merged Queries" = Table.Combine({#"Added Custom", #"Filtered Rows2"}) 
in 
#"Merged Queries"

 

 

Try if this helps.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WyirNy1aK1YlWys0vSlXAwQsI8g/xd/b3UfB0sQILwGWi4VLO/i6usQRNwsdzcQ1wDArxdfULsVJQiHZJLUgsKslNzStR8EvMTSVsNIihkJ+mUJZYVJmZl66Qk5mXWqwUGwsA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column = _t]),
    Custom = Table.AddColumn(Source, "Custom", each let i= List.PositionOf(Source[Column], "PROTOCOL ID:") in if [Column]= "PROTOCOL ID:" then Source[Column]{i+2} else if Text.StartsWith ([Column], "DEPARTMENT") then Text.AfterDelimiter ([Column], "DEPARTMENT:  ") else null  ),
    Filtered = Table.SelectRows(Custom, each ([Custom] <> null)),
    FINAL = Table.TransformColumns(Filtered, {{"Column", each Text.BeforeDelimiter(_, ":"), type text}})
in
    FINAL

Oh, you are trying to combine the protocol and department lines into a single row. Instead of using Table.Combine, use 

Table.FromColumns({#"Added Custom"[Custom], #"Filtered Rows2"[Column2]})

 

Yes!!! Thank you!!

artemus
Microsoft Employee
Microsoft Employee

The only problem I see is at the end. The #"Appended Queries" step isn't needed. And the merged queries should combine #"Added Custom" with #"Filtered Rows2"

Helpful resources

Announcements
Fabric Data Days Carousel

Fabric Data Days

Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!

October Power BI Update Carousel

Power BI Monthly Update - October 2025

Check out the October 2025 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.