Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
scoutmedic
Frequent Visitor

Split Column Help Needed

I'm working with a company which captures their routine inspection findings in an excel table. I'm importing the table into Power BI and need to split a column to display individual findings. The company is not willing to change how they capture their findings and I'm at a loss for splitting the column. Here is some sample data for one specific site:

 

LocationDate StartDeficency #FindingProcedure
New York1/1/202411) First Finding. Examples include:
a. first example (SOP-001)
b. second example (SOP-002)
c. third example (Guidance Manual)
i.xxx
ii. Yyyyy
iii. zzzzzzz
SOP-001
SOP-002
Guidance Manual
New York1/1/202422) Second Finding. Examples include:
a. first example (SOP-001)
b. second example (SOP-002)
SOP-001
SOP-002
New York1/1/202433) Third finding. (Form SOP-004)Form SOP-004

 

In the findings column, the goal is that each item (e.g. first example, second example, etc.) will split into a separate row and the procedure # will populate in the procedure field. Stripping the leading indicators (a, b, etc.) and procedure from the Finding column would be nice as well. The result should look like this:

 

LocationDate StartDeficency #FindingProcedure
New York1/1/20241First Finding. Examples include: 
New York1/1/20241first example (SOP-001)SOP-001
New York1/1/20241second example (SOP-002)SOP-002
New York1/1/20241third example (Guidance Manual)
i.xxx
ii. Yyyyy
iii. Zzzzzzz
Guidance Manual
New York1/1/20242Second Finding. Examples include: 
New York1/1/20242first example (SOP-001)SOP-001
New York1/1/20242second example (SOP-002)SOP-002
New York1/1/20243Third finding. (Form SOP-004)Form SOP-004

 

I'm open to using whatever method will accoplish the task. 

 

NOTE: Cross-posted in Power Query, but open to best solution.

1 ACCEPTED SOLUTION
tackytechtom
Super User
Super User

Hi @scoutmedic ,

 

How about this solution in Power Query?

Before:

tackytechtom_0-1730495470359.png

 

After:

tackytechtom_1-1730495507136.png

 

Here the code in Power Query M that you can paste into the advanced editor (if you do not know, how to exactly do this, please check out this quick walkthrough)

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nVDLDoIwEPyVTU82wfLQk3f15CPBC7EcKlRthMXwiOjX2wImhIgHN5l2NzO7ne3xSLbyAUGW34hFXNu1Pcebm9SAwkrlRalPjBVeGCxrkd4TWYDCKKliueAoGJwbkWw5mPi7/dRxdDPHE4NCRhnGQ9YzbMSgvKq8R64rFQuMJGwEViIxIsXquja3YhA8dTS5Ll5taJ/dgxy72RwHc0hoje3pGVDwW5fji8L/m/Lyi8cflmYGFA7N15w/jiarLE+h7Z5TLenXJAzf", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Location = _t, #"Date Start" = _t, #"Deficency #" = _t, Finding = _t, Procedure = _t]),
    #"Added Finding" = Table.AddColumn(Source, "Finding_New", each Text.Trim(Text.AfterDelimiter([Finding], ") "))),
    #"Removed Columns" = Table.RemoveColumns(#"Added Finding",{"Finding"}),
    #"Split Column by Delimiter a." = Table.SplitColumn(#"Removed Columns", "Finding_New", Splitter.SplitTextByDelimiter("a. ", QuoteStyle.Csv), {"Finding", "Finding.a"}),
    #"Split Column by Delimiter b." = Table.SplitColumn(#"Split Column by Delimiter a.", "Finding.a", Splitter.SplitTextByDelimiter("b. ", QuoteStyle.Csv), {"Finding.a", "Finding.b"}),
    #"Split Column by Delimiter c." = Table.SplitColumn(#"Split Column by Delimiter b.", "Finding.b", Splitter.SplitTextByDelimiter("c. ", QuoteStyle.Csv), {"Finding.b", "Finding.c"}),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Split Column by Delimiter c.", {"Location", "Date Start", "Deficency #", "Procedure"}, "Attribute", "Finding_new"),
    #"Added Procedure_new" = Table.AddColumn(#"Unpivoted Columns", "Procedure_new", each if Text.Contains([Finding_new], "(") and Text.Contains([Finding_new], ")") then
    Text.Middle(
        [Finding_new],
        Text.PositionOf([Finding_new], "(") + 1,
        Text.PositionOf([Finding_new], ")") - Text.PositionOf([Finding_new], "(") - 1
    )
  else
    null),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Procedure_new",{"Location", "Date Start", "Deficency #", "Finding_new", "Procedure_new"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Finding_new", "Finding"}, {"Procedure_new", "Procedure"}})
in
    #"Renamed Columns"

 

Note, the solution above is not very dynamic and you would need to add additional steps for the other alphabetical bullets like d., e., f. etc. This approach however has its limitations as soon as you get the to the letter i. Additionally, I made a few other assumption based on the data provided. My guts tell me that this approach won't take you all the way since your real data probably has some stuff in it which this solution does not take into consideration (yet). Still, it might get you closer 🙂

 

Hope this helps!

 

/Tom
https://www.tackytech.blog/
https://www.instagram.com/tackytechtom/



Did I answer your question➡️ Please, mark my post as a solution ✔️

Also happily accepting Kudos 🙂

Feel free to connect with me on LinkedIn! linkedIn

#proudtobeasuperuser 

View solution in original post

7 REPLIES 7
danextian
Super User
Super User

Would have been a simple case if there was only one delimiter but with two or more what I would  to is create list of delimiters. In your case that would be letter a. to probably z. (probably but it is better to have the list.

danextian_0-1730524980462.png

In the code above, you can change the ending delimiter to another letter. And the rest will be splitting the two columns by their respective delimiters then recombining them so the splits align (same row)

danextian_2-1730525202017.png

 

Below is a sample M code. You can see at each every applied step the transformations.

let
    Delimiters = let 
letters = {"a".."e"}
in 
List.Transform(letters, each _ &"."),

Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("XU7LCsIwEPyVJScLGtPgybt68gGeatNDTKIG61baBqNf77YqSId9zDLD7uY527gHZFV9ZWMm0imFFHJGQ9plAktfNy1VtB7PHBZR3+6la8CjKYN1c4Waw6k3uY8Go/12NxEiTRQeOTTOVGiHoiTRcGgvvv7TVsFbjcbBWmPQJXk8jzFS8xyyJ6GjxA+vHvSiwu+1H5EKB2sUsqJ4Aw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Location = _t, Date = _t, Deficency = _t, Finding = _t, Procedure = _t]),

#"Changed Type" = Table.TransformColumnTypes(Source,{{"Location", type text}, {"Date", type date}, {"Deficency", Int64.Type}, {"Finding", type text}, {"Procedure", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Finding_Split", each Splitter.SplitTextByAnyDelimiter(Delimiters, QuoteStyle.Csv)([Finding])),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "Procedure_Split", each Text.Split([Procedure], "#(lf)")),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"Finding", "Procedure"}),
    #"Combine the splits as a single table" = Table.AddColumn(#"Removed Columns", "Split_Combined", each Table.FromColumns({[Finding_Split],[Procedure_Split]}, {"Finding", "Procedure"})),
    #"Removed Columns1" = Table.RemoveColumns(#"Combine the splits as a single table",{"Finding_Split", "Procedure_Split"}),
    #"Expanded Split_Combined" = Table.ExpandTableColumn(#"Removed Columns1", "Split_Combined", {"Finding", "Procedure"}, {"Finding", "Procedure"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Split_Combined", each [Finding] <> null and [Finding] <> ""),
    #"Added Custom3" = Table.AddColumn(#"Filtered Rows", "Finding2", each let 
BeforeDelimiter = Text.BeforeDelimiter([Finding],")"),
firstlinecheck =  try Number.From(BeforeDelimiter) otherwise null
in if firstlinecheck = null then [Finding] else Text.AfterDelimiter([Finding], ")")),
    #"Trimmed Text" = Table.TransformColumns(#"Added Custom3",{{"Finding2", Text.Trim, type text}}),
    #"Removed Columns2" = Table.RemoveColumns(#"Trimmed Text",{"Finding"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns2",{"Location", "Date", "Deficency", "Finding2", "Procedure"}),
    #"Renamed Columns" = Table.RenameColumns(#"Reordered Columns",{{"Finding2", "Finding"}})
in
    #"Renamed Columns"

You can also add an index column prior to splitting to identify which original row the final/split rows belong to.

 










Did I answer your question? Mark my post as a solution!


Proud to be a Super User!









"Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand."
Need Power BI consultation, get in touch with me on LinkedIn or hire me on UpWork.
Learn with me on YouTube @DAXJutsu or follow my page on Facebook @DAXJutsuPBI.
ThxAlot
Super User
Super User

Just one line of code for the extraction,

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("tU9NC4JAEP0rg6cWbNXNU/fs1AfYRdTDpmst6Rir0tavb1cJIqhbA2/mwZvHm0lTZytukLTq4rhO4AUe81loqQWBSKquNx1LiScKK82bay06kFjUQymWGXIK1bgkJg1m8W4/9/2AZHik0ImixfJTZEYsKPRnqd609SBLjoWADceB12ZHUq21GZJCcjdlqeGPqZzc/XY+syAQT+l/uP9H9sKCwGF8rnpFz6JWNTDZQ2PPnw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Location = _t, #"Date Start" = _t, Deficency = _t, #"# Finding" = _t]),

    #"Run Python script" = Python.Execute("import re#(lf)dataset['Procedure'] = dataset['# Finding'].apply(lambda string: '\n'.join([m.group() for m in re.finditer(r'(?<=\()[^)]+', string)]))",[dataset=Source]),

    dataset = #"Run Python script"{[Name="dataset"]}[Value]
in
    dataset

 

 

ThxAlot_0-1730507697990.png



Expertise = List.Accumulate(


        {Days as from Today},


        {Skills and Knowledge},


        (Current, Everyday) => Current & Day.LeanAndPractise(Everyday)


)



scoutmedic
Frequent Visitor

Thanks all. I'm still new to PBI and am still in the one-step fix-all mindset. Makes complete sense to split the delimiters up into separate steps.

 

@tackytechtom replacing the "i" with an asterisk or another bulletpoint option would address that limitation. Yes, there are multiple other fields I'd need to manage, but this was my biggest hurdle to overcome.

 

I'll do some testing next week to confirm these will work. I'm also open to any other suggestions as well.

tackytechtom
Super User
Super User

Hi @scoutmedic ,

 

How about this solution in Power Query?

Before:

tackytechtom_0-1730495470359.png

 

After:

tackytechtom_1-1730495507136.png

 

Here the code in Power Query M that you can paste into the advanced editor (if you do not know, how to exactly do this, please check out this quick walkthrough)

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nVDLDoIwEPyVTU82wfLQk3f15CPBC7EcKlRthMXwiOjX2wImhIgHN5l2NzO7ne3xSLbyAUGW34hFXNu1Pcebm9SAwkrlRalPjBVeGCxrkd4TWYDCKKliueAoGJwbkWw5mPi7/dRxdDPHE4NCRhnGQ9YzbMSgvKq8R64rFQuMJGwEViIxIsXquja3YhA8dTS5Ll5taJ/dgxy72RwHc0hoje3pGVDwW5fji8L/m/Lyi8cflmYGFA7N15w/jiarLE+h7Z5TLenXJAzf", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Location = _t, #"Date Start" = _t, #"Deficency #" = _t, Finding = _t, Procedure = _t]),
    #"Added Finding" = Table.AddColumn(Source, "Finding_New", each Text.Trim(Text.AfterDelimiter([Finding], ") "))),
    #"Removed Columns" = Table.RemoveColumns(#"Added Finding",{"Finding"}),
    #"Split Column by Delimiter a." = Table.SplitColumn(#"Removed Columns", "Finding_New", Splitter.SplitTextByDelimiter("a. ", QuoteStyle.Csv), {"Finding", "Finding.a"}),
    #"Split Column by Delimiter b." = Table.SplitColumn(#"Split Column by Delimiter a.", "Finding.a", Splitter.SplitTextByDelimiter("b. ", QuoteStyle.Csv), {"Finding.a", "Finding.b"}),
    #"Split Column by Delimiter c." = Table.SplitColumn(#"Split Column by Delimiter b.", "Finding.b", Splitter.SplitTextByDelimiter("c. ", QuoteStyle.Csv), {"Finding.b", "Finding.c"}),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Split Column by Delimiter c.", {"Location", "Date Start", "Deficency #", "Procedure"}, "Attribute", "Finding_new"),
    #"Added Procedure_new" = Table.AddColumn(#"Unpivoted Columns", "Procedure_new", each if Text.Contains([Finding_new], "(") and Text.Contains([Finding_new], ")") then
    Text.Middle(
        [Finding_new],
        Text.PositionOf([Finding_new], "(") + 1,
        Text.PositionOf([Finding_new], ")") - Text.PositionOf([Finding_new], "(") - 1
    )
  else
    null),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Procedure_new",{"Location", "Date Start", "Deficency #", "Finding_new", "Procedure_new"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Finding_new", "Finding"}, {"Procedure_new", "Procedure"}})
in
    #"Renamed Columns"

 

Note, the solution above is not very dynamic and you would need to add additional steps for the other alphabetical bullets like d., e., f. etc. This approach however has its limitations as soon as you get the to the letter i. Additionally, I made a few other assumption based on the data provided. My guts tell me that this approach won't take you all the way since your real data probably has some stuff in it which this solution does not take into consideration (yet). Still, it might get you closer 🙂

 

Hope this helps!

 

/Tom
https://www.tackytech.blog/
https://www.instagram.com/tackytechtom/



Did I answer your question➡️ Please, mark my post as a solution ✔️

Also happily accepting Kudos 🙂

Feel free to connect with me on LinkedIn! linkedIn

#proudtobeasuperuser 

OktayPamuk80
Resolver II
Resolver II

Hi,

You can use following, just for testing create a blank query in power Query and put this within the advaned editor:

 

let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("tVDLDoIwEPyVTU+QYHnIybt48pHghQCHCkUbYTE8Ivr1toAJIerNTaadZrbT2YYh2fE7BGV1JQaxTdt0LMdVVEEHT1R1I1dMBZ4prDtW3HJeg8Akb1O+ipBRyPomPmig+fvDwrJsPcIThZonJaZz0ZFiQqG5iGqibVqRMkw4bBm2LJc9gnZdJzdBIXjIUlTy51Ay4vhWhKNvhDMTEhvfRnQUdPCHhH+Y8WO+H4GWCjoc+1/J3nk0r6wKGO67ynN6JnH8Ag==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Location = _t, #"Date Start" = _t, Deficency = _t, #"# Finding" = _t, Procedure = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Location", type text}, {"Date Start", type date}, {"Deficency", Int64.Type}, {"# Finding", type text}, {"Procedure", type text}}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"# Finding", Splitter.SplitTextByDelimiter("#(lf)", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "# Finding"),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"# Finding", type text}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Procedure"}),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Removed Columns", "# Finding", Splitter.SplitTextByDelimiter(" (", QuoteStyle.Csv), {"# Finding.1", "# Finding.2"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"# Finding.1", type text}, {"# Finding.2", type text}}),
#"Replaced Value" = Table.ReplaceValue(#"Changed Type2",")","",Replacer.ReplaceText,{"# Finding.2"}),
#"Renamed Columns" = Table.RenameColumns(#"Replaced Value",{{"# Finding.2", "Procedure"}})
in
#"Renamed Columns"

What is does:

1) I use split by delimiter and use linefeed as delimiter (At the bottom of the dialog, expand Advanced options and check Split using special characters)

2) You can delete Procedure, as you can get this information from the new column by splitting by delimiter " ("

3) with the new column, you get replace ")" with nothing and you have your result

Regards,

Oktay

 

Did I answer your question? Then please mark my post as the solution.

If I helped you, click on the Thumbs Up to give Kudos.

 

@OktayPamuk80 that might work, but I would need to keep the "i", "ii", and "iii" with their finding. Thoughts?

Yes, I didn't notice that one. However, glad Tom's solution worked out.

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.