Solved: Split Column Help Needed

scoutmedic · ‎11-01-2024

I'm working with a company which captures their routine inspection findings in an excel table. I'm importing the table into Power BI and need to split a column to display individual findings. The company is not willing to change how they capture their findings and I'm at a loss for splitting the column. Here is some sample data for one specific site:

Location	Date Start	Deficency #	Finding	Procedure
New York	1/1/2024	1	1) First Finding. Examples include: a. first example (SOP-001) b. second example (SOP-002) c. third example (Guidance Manual) i.xxx ii. Yyyyy iii. zzzzzzz	SOP-001 SOP-002 Guidance Manual
New York	1/1/2024	2	2) Second Finding. Examples include: a. first example (SOP-001) b. second example (SOP-002)	SOP-001 SOP-002
New York	1/1/2024	3	3) Third finding. (Form SOP-004)	Form SOP-004

In the findings column, the goal is that each item (e.g. first example, second example, etc.) will split into a separate row and the procedure # will populate in the procedure field. Stripping the leading indicators (a, b, etc.) and procedure from the Finding column would be nice as well. The result should look like this:

Location	Date Start	Deficency #	Finding	Procedure
New York	1/1/2024	1	First Finding. Examples include:
New York	1/1/2024	1	first example (SOP-001)	SOP-001
New York	1/1/2024	1	second example (SOP-002)	SOP-002
New York	1/1/2024	1	third example (Guidance Manual) i.xxx ii. Yyyyy iii. Zzzzzzz	Guidance Manual
New York	1/1/2024	2	Second Finding. Examples include:
New York	1/1/2024	2	first example (SOP-001)	SOP-001
New York	1/1/2024	2	second example (SOP-002)	SOP-002
New York	1/1/2024	3	Third finding. (Form SOP-004)	Form SOP-004

I'm open to using whatever method will accoplish the task.

NOTE: Cross-posted in Power Query, but open to best solution.

tackytechtom · ‎11-01-2024

Hi @scoutmedic ,

How about this solution in Power Query?

Before:

After:

Here the code in Power Query M that you can paste into the advanced editor (if you do not know, how to exactly do this, please check out this quick walkthrough)

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nVDLDoIwEPyVTU82wfLQk3f15CPBC7EcKlRthMXwiOjX2wImhIgHN5l2NzO7ne3xSLbyAUGW34hFXNu1Pcebm9SAwkrlRalPjBVeGCxrkd4TWYDCKKliueAoGJwbkWw5mPi7/dRxdDPHE4NCRhnGQ9YzbMSgvKq8R64rFQuMJGwEViIxIsXquja3YhA8dTS5Ll5taJ/dgxy72RwHc0hoje3pGVDwW5fji8L/m/Lyi8cflmYGFA7N15w/jiarLE+h7Z5TLenXJAzf", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Location = _t, #"Date Start" = _t, #"Deficency #" = _t, Finding = _t, Procedure = _t]),
    #"Added Finding" = Table.AddColumn(Source, "Finding_New", each Text.Trim(Text.AfterDelimiter([Finding], ") "))),
    #"Removed Columns" = Table.RemoveColumns(#"Added Finding",{"Finding"}),
    #"Split Column by Delimiter a." = Table.SplitColumn(#"Removed Columns", "Finding_New", Splitter.SplitTextByDelimiter("a. ", QuoteStyle.Csv), {"Finding", "Finding.a"}),
    #"Split Column by Delimiter b." = Table.SplitColumn(#"Split Column by Delimiter a.", "Finding.a", Splitter.SplitTextByDelimiter("b. ", QuoteStyle.Csv), {"Finding.a", "Finding.b"}),
    #"Split Column by Delimiter c." = Table.SplitColumn(#"Split Column by Delimiter b.", "Finding.b", Splitter.SplitTextByDelimiter("c. ", QuoteStyle.Csv), {"Finding.b", "Finding.c"}),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Split Column by Delimiter c.", {"Location", "Date Start", "Deficency #", "Procedure"}, "Attribute", "Finding_new"),
    #"Added Procedure_new" = Table.AddColumn(#"Unpivoted Columns", "Procedure_new", each if Text.Contains([Finding_new], "(") and Text.Contains([Finding_new], ")") then
    Text.Middle(
        [Finding_new],
        Text.PositionOf([Finding_new], "(") + 1,
        Text.PositionOf([Finding_new], ")") - Text.PositionOf([Finding_new], "(") - 1
    )
  else
    null),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Procedure_new",{"Location", "Date Start", "Deficency #", "Finding_new", "Procedure_new"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Finding_new", "Finding"}, {"Procedure_new", "Procedure"}})
in
    #"Renamed Columns"

Note, the solution above is not very dynamic and you would need to add additional steps for the other alphabetical bullets like d., e., f. etc. This approach however has its limitations as soon as you get the to the letter i. Additionally, I made a few other assumption based on the data provided. My guts tell me that this approach won't take you all the way since your real data probably has some stuff in it which this solution does not take into consideration (yet). Still, it might get you closer 🙂

Hope this helps!

/Tom
https://www.tackytech.blog/
https://www.instagram.com/tackytechtom/

Did I answer your question❓➡️ Please, mark my post as a solution ✔️

Also happily accepting Kudos 🙂

Feel free to connect with me on LinkedIn!

#proudtobeasuperuser

View solution in original post

danextian · ‎11-01-2024

Would have been a simple case if there was only one delimiter but with two or more what I would to is create list of delimiters. In your case that would be letter a. to probably z. (probably but it is better to have the list.

In the code above, you can change the ending delimiter to another letter. And the rest will be splitting the two columns by their respective delimiters then recombining them so the splits align (same row)

Below is a sample M code. You can see at each every applied step the transformations.

let
    Delimiters = let 
letters = {"a".."e"}
in 
List.Transform(letters, each _ &"."),

Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("XU7LCsIwEPyVJScLGtPgybt68gGeatNDTKIG61baBqNf77YqSId9zDLD7uY527gHZFV9ZWMm0imFFHJGQ9plAktfNy1VtB7PHBZR3+6la8CjKYN1c4Waw6k3uY8Go/12NxEiTRQeOTTOVGiHoiTRcGgvvv7TVsFbjcbBWmPQJXk8jzFS8xyyJ6GjxA+vHvSiwu+1H5EKB2sUsqJ4Aw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Location = _t, Date = _t, Deficency = _t, Finding = _t, Procedure = _t]),

#"Changed Type" = Table.TransformColumnTypes(Source,{{"Location", type text}, {"Date", type date}, {"Deficency", Int64.Type}, {"Finding", type text}, {"Procedure", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Finding_Split", each Splitter.SplitTextByAnyDelimiter(Delimiters, QuoteStyle.Csv)([Finding])),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "Procedure_Split", each Text.Split([Procedure], "#(lf)")),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"Finding", "Procedure"}),
    #"Combine the splits as a single table" = Table.AddColumn(#"Removed Columns", "Split_Combined", each Table.FromColumns({[Finding_Split],[Procedure_Split]}, {"Finding", "Procedure"})),
    #"Removed Columns1" = Table.RemoveColumns(#"Combine the splits as a single table",{"Finding_Split", "Procedure_Split"}),
    #"Expanded Split_Combined" = Table.ExpandTableColumn(#"Removed Columns1", "Split_Combined", {"Finding", "Procedure"}, {"Finding", "Procedure"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Split_Combined", each [Finding] <> null and [Finding] <> ""),
    #"Added Custom3" = Table.AddColumn(#"Filtered Rows", "Finding2", each let 
BeforeDelimiter = Text.BeforeDelimiter([Finding],")"),
firstlinecheck =  try Number.From(BeforeDelimiter) otherwise null
in if firstlinecheck = null then [Finding] else Text.AfterDelimiter([Finding], ")")),
    #"Trimmed Text" = Table.TransformColumns(#"Added Custom3",{{"Finding2", Text.Trim, type text}}),
    #"Removed Columns2" = Table.RemoveColumns(#"Trimmed Text",{"Finding"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns2",{"Location", "Date", "Deficency", "Finding2", "Procedure"}),
    #"Renamed Columns" = Table.RenameColumns(#"Reordered Columns",{{"Finding2", "Finding"}})
in
    #"Renamed Columns"

You can also add an index column prior to splitting to identify which original row the final/split rows belong to.

Dane Belarmino | Microsoft MVP | Proud to be a Super User!

Did I answer your question? Mark my post as a solution!

"Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand."
Need Power BI consultation, get in touch with me on LinkedIn or hire me on UpWork.
Learn with me on YouTube @DAXJutsu or follow my page on Facebook @DAXJutsuPBI.

ThxAlot · ‎11-01-2024

Just one line of code for the extraction,

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("tU9NC4JAEP0rg6cWbNXNU/fs1AfYRdTDpmst6Rir0tavb1cJIqhbA2/mwZvHm0lTZytukLTq4rhO4AUe81loqQWBSKquNx1LiScKK82bay06kFjUQymWGXIK1bgkJg1m8W4/9/2AZHik0ImixfJTZEYsKPRnqd609SBLjoWADceB12ZHUq21GZJCcjdlqeGPqZzc/XY+syAQT+l/uP9H9sKCwGF8rnpFz6JWNTDZQ2PPnw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Location = _t, #"Date Start" = _t, Deficency = _t, #"# Finding" = _t]),

    #"Run Python script" = Python.Execute("import re#(lf)dataset['Procedure'] = dataset['# Finding'].apply(lambda string: '\n'.join([m.group() for m in re.finditer(r'(?<=\()[^)]+', string)]))",[dataset=Source]),

    dataset = #"Run Python script"{[Name="dataset"]}[Value]
in
    dataset

Expertise = List.Accumulate(

{Days as from Today},

{Skills and Knowledge},

(Current, Everyday) => Current & Day.LearnAndPractise(Everyday)

)

scoutmedic · ‎11-01-2024

Thanks all. I'm still new to PBI and am still in the one-step fix-all mindset. Makes complete sense to split the delimiters up into separate steps.

@tackytechtom replacing the "i" with an asterisk or another bulletpoint option would address that limitation. Yes, there are multiple other fields I'd need to manage, but this was my biggest hurdle to overcome.

I'll do some testing next week to confirm these will work. I'm also open to any other suggestions as well.

tackytechtom · ‎11-01-2024

Hi @scoutmedic ,

How about this solution in Power Query?

Before:

After:

Here the code in Power Query M that you can paste into the advanced editor (if you do not know, how to exactly do this, please check out this quick walkthrough)

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nVDLDoIwEPyVTU82wfLQk3f15CPBC7EcKlRthMXwiOjX2wImhIgHN5l2NzO7ne3xSLbyAUGW34hFXNu1Pcebm9SAwkrlRalPjBVeGCxrkd4TWYDCKKliueAoGJwbkWw5mPi7/dRxdDPHE4NCRhnGQ9YzbMSgvKq8R64rFQuMJGwEViIxIsXquja3YhA8dTS5Ll5taJ/dgxy72RwHc0hoje3pGVDwW5fji8L/m/Lyi8cflmYGFA7N15w/jiarLE+h7Z5TLenXJAzf", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Location = _t, #"Date Start" = _t, #"Deficency #" = _t, Finding = _t, Procedure = _t]),
    #"Added Finding" = Table.AddColumn(Source, "Finding_New", each Text.Trim(Text.AfterDelimiter([Finding], ") "))),
    #"Removed Columns" = Table.RemoveColumns(#"Added Finding",{"Finding"}),
    #"Split Column by Delimiter a." = Table.SplitColumn(#"Removed Columns", "Finding_New", Splitter.SplitTextByDelimiter("a. ", QuoteStyle.Csv), {"Finding", "Finding.a"}),
    #"Split Column by Delimiter b." = Table.SplitColumn(#"Split Column by Delimiter a.", "Finding.a", Splitter.SplitTextByDelimiter("b. ", QuoteStyle.Csv), {"Finding.a", "Finding.b"}),
    #"Split Column by Delimiter c." = Table.SplitColumn(#"Split Column by Delimiter b.", "Finding.b", Splitter.SplitTextByDelimiter("c. ", QuoteStyle.Csv), {"Finding.b", "Finding.c"}),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Split Column by Delimiter c.", {"Location", "Date Start", "Deficency #", "Procedure"}, "Attribute", "Finding_new"),
    #"Added Procedure_new" = Table.AddColumn(#"Unpivoted Columns", "Procedure_new", each if Text.Contains([Finding_new], "(") and Text.Contains([Finding_new], ")") then
    Text.Middle(
        [Finding_new],
        Text.PositionOf([Finding_new], "(") + 1,
        Text.PositionOf([Finding_new], ")") - Text.PositionOf([Finding_new], "(") - 1
    )
  else
    null),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Procedure_new",{"Location", "Date Start", "Deficency #", "Finding_new", "Procedure_new"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Finding_new", "Finding"}, {"Procedure_new", "Procedure"}})
in
    #"Renamed Columns"

Note, the solution above is not very dynamic and you would need to add additional steps for the other alphabetical bullets like d., e., f. etc. This approach however has its limitations as soon as you get the to the letter i. Additionally, I made a few other assumption based on the data provided. My guts tell me that this approach won't take you all the way since your real data probably has some stuff in it which this solution does not take into consideration (yet). Still, it might get you closer 🙂

Hope this helps!

/Tom
https://www.tackytech.blog/
https://www.instagram.com/tackytechtom/

Did I answer your question❓➡️ Please, mark my post as a solution ✔️

Also happily accepting Kudos 🙂

Feel free to connect with me on LinkedIn!

#proudtobeasuperuser

OktayPamuk80 · ‎11-01-2024

Hi,

You can use following, just for testing create a blank query in power Query and put this within the advaned editor:

let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("tVDLDoIwEPyVTU+QYHnIybt48pHghQCHCkUbYTE8Ivr1toAJIerNTaadZrbT2YYh2fE7BGV1JQaxTdt0LMdVVEEHT1R1I1dMBZ4prDtW3HJeg8Akb1O+ipBRyPomPmig+fvDwrJsPcIThZonJaZz0ZFiQqG5iGqibVqRMkw4bBm2LJc9gnZdJzdBIXjIUlTy51Ay4vhWhKNvhDMTEhvfRnQUdPCHhH+Y8WO+H4GWCjoc+1/J3nk0r6wKGO67ynN6JnH8Ag==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Location = _t, #"Date Start" = _t, Deficency = _t, #"# Finding" = _t, Procedure = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Location", type text}, {"Date Start", type date}, {"Deficency", Int64.Type}, {"# Finding", type text}, {"Procedure", type text}}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"# Finding", Splitter.SplitTextByDelimiter("#(lf)", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "# Finding"),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"# Finding", type text}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Procedure"}),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Removed Columns", "# Finding", Splitter.SplitTextByDelimiter(" (", QuoteStyle.Csv), {"# Finding.1", "# Finding.2"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"# Finding.1", type text}, {"# Finding.2", type text}}),
#"Replaced Value" = Table.ReplaceValue(#"Changed Type2",")","",Replacer.ReplaceText,{"# Finding.2"}),
#"Renamed Columns" = Table.RenameColumns(#"Replaced Value",{{"# Finding.2", "Procedure"}})
in
#"Renamed Columns"

What is does:

1) I use split by delimiter and use linefeed as delimiter (At the bottom of the dialog, expand Advanced options and check Split using special characters)

2) You can delete Procedure, as you can get this information from the new column by splitting by delimiter " ("

3) with the new column, you get replace ")" with nothing and you have your result

Regards,

Oktay

Did I answer your question? Then please mark my post as the solution.

If I helped you, click on the Thumbs Up to give Kudos.

scoutmedic · ‎11-07-2024

@OktayPamuk80 that might work, but I would need to keep the "i", "ii", and "iii" with their finding. Thoughts?

OktayPamuk80 · ‎11-07-2024

Yes, I didn't notice that one. However, glad Tom's solution worked out.

Split Column Help Needed

Helpful resources

Power BI Dataviz World Championships

Power BI Monthly Update - December 2025

FabCon Atlanta 2026

How to Get Your Question Answered Quickly

FabCon is coming to Atlanta