How to import not homogeneous text file?

cgkas · ‎12-12-2022

Good day,

May someone could help me with this complex problem [at least for me 🙂 ]. I'm using Power Query in Excel, but the rules would be almost the same in Power Bi.

I ´m trying to import this not homogeneous text file that has upper part with one format and the lower part with other format.

Upper data appears in blocks that begins with parameter Az until Gz and values are to the right after the "=" sign in same line. Within each block [Az to Gz] are sub blocks separated with an empty line. So when tabulate the values of those sub blocks should go in different lines.

For the lower part of the data with second format (a mix of first format and new format), the blocks of data begins with parameter "Bz" until RRCO. Then comes some consecutive lines arranged in 4 columns.

I have the idea to import it separating by spaces or by "=" sign, but since the file has the second format in lower part, separating by "=" wouldn't separate the second format in lower part

My expected output is like this:

My input file is like this:

Thanks in advance for any help.

SOME TEXT SOME TEXT SOME TEXT
SOME TEXT
RETCODE = 0

                        Az = 1
                        Bz = 5
                        Cz = 3
						 
                        Dz = 0
                        Ez = 0
						 
                        Dz = 4
                        Ez = 1
                        Gz = 1


blah blah


---    ENDBLOCK

SOME TEXT
RETCODE = 0
                        Az = 3
                        Bz = 2
                        Cz = 8
                        Dz = 6
                        Ez = 9
                        Fz = 3



blah blah

blah blah

---    ENDBLOCK

SOME TEXT
RETCODE = 0
                        Az = 7
                        Bz = 2
                        Cz = 2
                        Dz = 9
                        Ez = 0
						 
                        Dz = 1
                        Ez = 4
						 
                        Dz = 7
                        Ez = 0
                        Fz = 1
						 
                        Dz = 1
                        Gz = 8


blah blah

blah blah

---    ENDBLOCK
%%XYZ HGGYT:%%

BLA BLAH = 0 

                            Bz = 2
                            Cz = MAIN-332
                           DRM = CTRRES
                          RRCO = DLAM
 
blah blahblah blahblah blahblah blah
-----------------------------------------------
 
RDX         FCD         MBZ          HGW                        
PRM         <NULL>      XJN          bbv.uj002.pkl706.svfdata   
PRM         ALLO        STT          bbv.uj003.pkl706.svfdata   
PRM         <NULL>      BNN          bbv.uj410.pkl310.svfdata   


blah blah

---    ENDBLOCK
%%XYZ HGGYT:%%

BLA BLAH = 0 

                            Bz = 5
                            Cz = GOPAG
                           DRM = BRI5T
                          RRCO = TALD
 
blah blahblah blahblah blahblah blah
-----------------------------------------------
 
RDX         FCD         MBZ          HGW  
Rdx         Fcd         Mbz          bbv.123k.svfdata
REJ         PRMRJC      YW           bbv.322h7.svfdata
PRM         <NULL>      G4           bbv.119.svfdata

cgkas · ‎12-13-2022

Thanks for your answer. I was able to import the file and remove garbage and empty lines. Now I have a table containing only rows I need, but I'm stuck in how to manipulate it in order to get the desired output.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("zZVdT8IwFIb/SkOyO0f2wYeYeLGtY4D7IKXGAXLBQEOECxPFmP16186tMLZ2RIyehHJWnr49PX3J5vPGJPBsgO0Qg5Kssbg6IOgTsrEVQBvcAoU+0wFUhBEnnMpFTIK0uYhFEF28G4zzqqoQO65XOJVqiaX4Z3MYwoZot9wAMhTmZVmmsj403cC6Yz/w+1+1t8GaVoXQ1mvi1l+LW9URt6rHRfqFO+b2q2TqN7rXvUj3+AgUt+Ysz/INaTNbi6X4x7fFf7Z+wf4/KNxhPryEMyQpnM7AwHGm+EaSGGu6Bkg+A3I0IC6cRA0bkKBW8IyhL+u6kIbIS2gLI2RPBCxCVpCw0DW8lDxuAyfJmnNOHGyBYJgX0bdgnnvmjFU3cB6qCqci4+SgWTzuFUVf+feuS7OndDYc+WxNFH009y+KojVft7uu0mm+fTyvl+/LMjnDdYMsn2B8KqLXECmvyfRPamqpCpHTk6+C3J97k/9uJUG96QRjw6lnTBMN27ieL7Hhwv/qyxRffzJ8tWZ4FDOc3LGq6dvsbr9fJaMcSDyDRlaaTw8tT1bqmrbpHi0VW8xpFURUtcckFl8=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Filtered Rows" = Table.SelectRows(Source, 
                         each 
                            Text.StartsWith([Column1], " ") or
                            //List.Count(Text.Split([Column1]," ")) > 4 
                            Text.Contains([Column1], "svfdata")
                        ),
    RemoveMultiSpaces = Table.TransformColumns(#"Filtered Rows", {{"Column1", each Text.Combine(List.Select(Text.SplitAny(_, " "), each _ <> "")," "), type text}}),
    #"Filtered Rows1" = Table.SelectRows(RemoveMultiSpaces, each ([Column1] <> "")),
    #"Replaced Value" = Table.ReplaceValue(#"Filtered Rows1"," = ","@",Replacer.ReplaceText,{"Column1"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value"," ","@",Replacer.ReplaceText,{"Column1"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value1", "Column1", Splitter.SplitTextByDelimiter("@", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4"})
in
    #"Split Column by Delimiter"

This is my current output.

v-rzhou-msft · ‎12-12-2022

Hi @cgkas ,

As far as I know, Power BI does not support to load text file with unhomogeneous structure as you requirement.

Here I suggest you to try "Extract Table Using Examples" function in Text connector.

Function:

You may refer to this video to learn more details: Import Text/CSV using Examples in Power BI

Best Regards,
Rico Zhou

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

How to import not homogeneous text file?

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly