Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
LegoTelephone
Regular Visitor

Transforming a messy text export

I have an extremely large text file that I need to extract useful information from. I wrote an example of what it looks like below. Basically I want to extract data to keep the green fields so that Column1 has the Protocol Code row and Column2 has the subseqent row starting with DEPARTMENT.

The Protocol code is always 2 rows down from the line that starts with PROTOCOL ID, but the Department line is not always in the same relative position (but it always starts with DEPARTMENT:). Example:
-

junk

more junk

more junk

PROTOCOL ID:

junk

[PROTOCOL CODE]
more junk

more junk

more junk

more junk

DEPARTMENT:  [Department Name]

more junk

more junk

junk of varying lines


I've been at this all day and it seems impossible to write a powerquery that I can use with any export like this. 

 

let
    Source = Text.FromBinary(File.Contents("C:[MY PATH]"), 1252),
    #"Split by Line" = Text.Split(Source, "#(cr)#(lf)"),
    #"Converted to Table" = Table.FromList(#"Split by Line", Splitter.SplitTextByDelimiter("#(tab)", QuoteStyle.None), {"Column1", "Column2", "Column3", "Column4"}),
    #"Filtered Rows1" = Table.SelectRows(#"Converted to Table", each Text.StartsWith([Column1], "PROTOCOL ID:")),
    #"Added Custom" = Table.AddColumn(#"Filtered Rows1", "Custom", each try #"Converted to Table"{[Index]+2}[Column1] otherwise null),
    #"Filtered Rows2" = Table.SelectRows(#"Converted to Table", each Text.StartsWith([Column1], "DEPARTMENT")),
    #"Appended Queries" = Table.Combine({#"Filtered Rows1", #"Custom"}),
    #"Merged Queries" = Table.Combine({#"Appended Queries", #"Filtered Rows2"})
in
    #"Merged Queries"

 

Ideas? Thank you!!

1 ACCEPTED SOLUTION

Oh, you are trying to combine the protocol and department lines into a single row. Instead of using Table.Combine, use 

Table.FromColumns({#"Added Custom"[Custom], #"Filtered Rows2"[Column2]})

 

View solution in original post

5 REPLIES 5
LegoTelephone
Regular Visitor

The original code above gave me issues trying to split the text file; there's no good delimiter I can see. SO I tried this below, both with and without the duplicate/rename column steps.
But this gives me rows that JUST start with "PROTOCOL ID" at the end. It's not returning the value from 2 rows down. I can see the other rows that start with DEPARTMENT when I clock on the #Filtered Rows2 step, but the merged query step ends up just showing me the same lines  I see in the #FilteredRows1 step.  Both columns remain the same in all steps.
I have isolated the two columns that I want and can see them in different steps, but just need one to appear in Column1 and the other to appear in Column2.
I feel like I'm really close!!

 

 

let Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\[PATH], null, null, 1252)}), #"Duplicated Column" = Table.DuplicateColumn(Source, "Column1", "Column1 - Copy"), 
#"Renamed Columns" = Table.RenameColumns(#"Duplicated Column",{{"Column1 - Copy", "Column2"}}), #"Filtered Rows1" = Table.SelectRows(#"Renamed Columns", each Text.StartsWith([Column1], "PROTOCOL")), 
#"Added Custom" = Table.AddColumn(#"Filtered Rows1", "Custom", each try #"Renamed Columns"{[Index]+2}[Column1] otherwise null), 
#"Filtered Rows2" = Table.SelectRows(#"Renamed Columns", each Text.StartsWith([Column2], "DEPARTMENT")), 
#"Merged Queries" = Table.Combine({#"Added Custom", #"Filtered Rows2"}) 
in 
#"Merged Queries"

 

 

Try if this helps.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WyirNy1aK1YlWys0vSlXAwQsI8g/xd/b3UfB0sQILwGWi4VLO/i6usQRNwsdzcQ1wDArxdfULsVJQiHZJLUgsKslNzStR8EvMTSVsNIihkJ+mUJZYVJmZl66Qk5mXWqwUGwsA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column = _t]),
    Custom = Table.AddColumn(Source, "Custom", each let i= List.PositionOf(Source[Column], "PROTOCOL ID:") in if [Column]= "PROTOCOL ID:" then Source[Column]{i+2} else if Text.StartsWith ([Column], "DEPARTMENT") then Text.AfterDelimiter ([Column], "DEPARTMENT:  ") else null  ),
    Filtered = Table.SelectRows(Custom, each ([Custom] <> null)),
    FINAL = Table.TransformColumns(Filtered, {{"Column", each Text.BeforeDelimiter(_, ":"), type text}})
in
    FINAL

Oh, you are trying to combine the protocol and department lines into a single row. Instead of using Table.Combine, use 

Table.FromColumns({#"Added Custom"[Custom], #"Filtered Rows2"[Column2]})

 

Yes!!! Thank you!!

artemus
Employee
Employee

The only problem I see is at the end. The #"Appended Queries" step isn't needed. And the merged queries should combine #"Added Custom" with #"Filtered Rows2"

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors