Starting December 3, join live sessions with database experts and the Microsoft product team to learn just how easy it is to get started
Learn moreGet certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now
I'm working with a company which captures their routine inspection findings in an excel table. I'm importing the table into Power BI and need to split a column to display individual findings. The company is not willing to change how they capture their findings and I'm at a loss for splitting the column. Here is some sample data for one specific site:
Location | Date Start | Deficency # | Finding | Procedure |
New York | 1/1/2024 | 1 | 1) First Finding. Examples include: a. first example (SOP-001) b. second example (SOP-002) c. third example (Guidance Manual) i.xxx ii. Yyyyy iii. zzzzzzz | SOP-001 SOP-002 Guidance Manual |
New York | 1/1/2024 | 2 | 2) Second Finding. Examples include: a. first example (SOP-001) b. second example (SOP-002) | SOP-001 SOP-002 |
New York | 1/1/2024 | 3 | 3) Third finding. (Form SOP-004) | Form SOP-004 |
In the findings column, the goal is that each item (e.g. first example, second example, etc.) will split into a separate row and the procedure # will populate in the procedure field. Stripping the leading indicators (a, b, etc.) and procedure from the Finding column would be nice as well. The result should look like this:
Location | Date Start | Deficency # | Finding | Procedure |
New York | 1/1/2024 | 1 | First Finding. Examples include: | |
New York | 1/1/2024 | 1 | first example (SOP-001) | SOP-001 |
New York | 1/1/2024 | 1 | second example (SOP-002) | SOP-002 |
New York | 1/1/2024 | 1 | third example (Guidance Manual) i.xxx ii. Yyyyy iii. Zzzzzzz | Guidance Manual |
New York | 1/1/2024 | 2 | Second Finding. Examples include: | |
New York | 1/1/2024 | 2 | first example (SOP-001) | SOP-001 |
New York | 1/1/2024 | 2 | second example (SOP-002) | SOP-002 |
New York | 1/1/2024 | 3 | Third finding. (Form SOP-004) | Form SOP-004 |
I'm open to using whatever method will accoplish the task.
NOTE: Cross-posted in Power Query, but open to best solution.
Solved! Go to Solution.
Hi @scoutmedic ,
How about this solution in Power Query?
Before:
After:
Here the code in Power Query M that you can paste into the advanced editor (if you do not know, how to exactly do this, please check out this quick walkthrough)
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nVDLDoIwEPyVTU82wfLQk3f15CPBC7EcKlRthMXwiOjX2wImhIgHN5l2NzO7ne3xSLbyAUGW34hFXNu1Pcebm9SAwkrlRalPjBVeGCxrkd4TWYDCKKliueAoGJwbkWw5mPi7/dRxdDPHE4NCRhnGQ9YzbMSgvKq8R64rFQuMJGwEViIxIsXquja3YhA8dTS5Ll5taJ/dgxy72RwHc0hoje3pGVDwW5fji8L/m/Lyi8cflmYGFA7N15w/jiarLE+h7Z5TLenXJAzf", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Location = _t, #"Date Start" = _t, #"Deficency #" = _t, Finding = _t, Procedure = _t]), #"Added Finding" = Table.AddColumn(Source, "Finding_New", each Text.Trim(Text.AfterDelimiter([Finding], ") "))), #"Removed Columns" = Table.RemoveColumns(#"Added Finding",{"Finding"}), #"Split Column by Delimiter a." = Table.SplitColumn(#"Removed Columns", "Finding_New", Splitter.SplitTextByDelimiter("a. ", QuoteStyle.Csv), {"Finding", "Finding.a"}), #"Split Column by Delimiter b." = Table.SplitColumn(#"Split Column by Delimiter a.", "Finding.a", Splitter.SplitTextByDelimiter("b. ", QuoteStyle.Csv), {"Finding.a", "Finding.b"}), #"Split Column by Delimiter c." = Table.SplitColumn(#"Split Column by Delimiter b.", "Finding.b", Splitter.SplitTextByDelimiter("c. ", QuoteStyle.Csv), {"Finding.b", "Finding.c"}), #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Split Column by Delimiter c.", {"Location", "Date Start", "Deficency #", "Procedure"}, "Attribute", "Finding_new"), #"Added Procedure_new" = Table.AddColumn(#"Unpivoted Columns", "Procedure_new", each if Text.Contains([Finding_new], "(") and Text.Contains([Finding_new], ")") then Text.Middle( [Finding_new], Text.PositionOf([Finding_new], "(") + 1, Text.PositionOf([Finding_new], ")") - Text.PositionOf([Finding_new], "(") - 1 ) else null), #"Removed Other Columns" = Table.SelectColumns(#"Added Procedure_new",{"Location", "Date Start", "Deficency #", "Finding_new", "Procedure_new"}), #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Finding_new", "Finding"}, {"Procedure_new", "Procedure"}}) in #"Renamed Columns"
Note, the solution above is not very dynamic and you would need to add additional steps for the other alphabetical bullets like d., e., f. etc. This approach however has its limitations as soon as you get the to the letter i. Additionally, I made a few other assumption based on the data provided. My guts tell me that this approach won't take you all the way since your real data probably has some stuff in it which this solution does not take into consideration (yet). Still, it might get you closer 🙂
Hope this helps!
/Tom
https://www.tackytech.blog/
https://www.instagram.com/tackytechtom/
Did I answer your question❓➡️ Please, mark my post as a solution ✔️ |
Also happily accepting Kudos 🙂 |
Feel free to connect with me on LinkedIn! |
#proudtobeasuperuser |
Would have been a simple case if there was only one delimiter but with two or more what I would to is create list of delimiters. In your case that would be letter a. to probably z. (probably but it is better to have the list.
In the code above, you can change the ending delimiter to another letter. And the rest will be splitting the two columns by their respective delimiters then recombining them so the splits align (same row)
Below is a sample M code. You can see at each every applied step the transformations.
let
Delimiters = let
letters = {"a".."e"}
in
List.Transform(letters, each _ &"."),
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("XU7LCsIwEPyVJScLGtPgybt68gGeatNDTKIG61baBqNf77YqSId9zDLD7uY527gHZFV9ZWMm0imFFHJGQ9plAktfNy1VtB7PHBZR3+6la8CjKYN1c4Waw6k3uY8Go/12NxEiTRQeOTTOVGiHoiTRcGgvvv7TVsFbjcbBWmPQJXk8jzFS8xyyJ6GjxA+vHvSiwu+1H5EKB2sUsqJ4Aw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Location = _t, Date = _t, Deficency = _t, Finding = _t, Procedure = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Location", type text}, {"Date", type date}, {"Deficency", Int64.Type}, {"Finding", type text}, {"Procedure", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Finding_Split", each Splitter.SplitTextByAnyDelimiter(Delimiters, QuoteStyle.Csv)([Finding])),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Procedure_Split", each Text.Split([Procedure], "#(lf)")),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"Finding", "Procedure"}),
#"Combine the splits as a single table" = Table.AddColumn(#"Removed Columns", "Split_Combined", each Table.FromColumns({[Finding_Split],[Procedure_Split]}, {"Finding", "Procedure"})),
#"Removed Columns1" = Table.RemoveColumns(#"Combine the splits as a single table",{"Finding_Split", "Procedure_Split"}),
#"Expanded Split_Combined" = Table.ExpandTableColumn(#"Removed Columns1", "Split_Combined", {"Finding", "Procedure"}, {"Finding", "Procedure"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded Split_Combined", each [Finding] <> null and [Finding] <> ""),
#"Added Custom3" = Table.AddColumn(#"Filtered Rows", "Finding2", each let
BeforeDelimiter = Text.BeforeDelimiter([Finding],")"),
firstlinecheck = try Number.From(BeforeDelimiter) otherwise null
in if firstlinecheck = null then [Finding] else Text.AfterDelimiter([Finding], ")")),
#"Trimmed Text" = Table.TransformColumns(#"Added Custom3",{{"Finding2", Text.Trim, type text}}),
#"Removed Columns2" = Table.RemoveColumns(#"Trimmed Text",{"Finding"}),
#"Reordered Columns" = Table.ReorderColumns(#"Removed Columns2",{"Location", "Date", "Deficency", "Finding2", "Procedure"}),
#"Renamed Columns" = Table.RenameColumns(#"Reordered Columns",{{"Finding2", "Finding"}})
in
#"Renamed Columns"
You can also add an index column prior to splitting to identify which original row the final/split rows belong to.
Proud to be a Super User!
Just one line of code for the extraction,
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("tU9NC4JAEP0rg6cWbNXNU/fs1AfYRdTDpmst6Rir0tavb1cJIqhbA2/mwZvHm0lTZytukLTq4rhO4AUe81loqQWBSKquNx1LiScKK82bay06kFjUQymWGXIK1bgkJg1m8W4/9/2AZHik0ImixfJTZEYsKPRnqd609SBLjoWADceB12ZHUq21GZJCcjdlqeGPqZzc/XY+syAQT+l/uP9H9sKCwGF8rnpFz6JWNTDZQ2PPnw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Location = _t, #"Date Start" = _t, Deficency = _t, #"# Finding" = _t]),
#"Run Python script" = Python.Execute("import re#(lf)dataset['Procedure'] = dataset['# Finding'].apply(lambda string: '\n'.join([m.group() for m in re.finditer(r'(?<=\()[^)]+', string)]))",[dataset=Source]),
dataset = #"Run Python script"{[Name="dataset"]}[Value]
in
dataset
Expertise = List.Accumulate( {Days as from Today}, {Skills and Knowledge}, (Current, Everyday) => Current & Day.LeanAndPractise(Everyday) ) |
Thanks all. I'm still new to PBI and am still in the one-step fix-all mindset. Makes complete sense to split the delimiters up into separate steps.
@tackytechtom replacing the "i" with an asterisk or another bulletpoint option would address that limitation. Yes, there are multiple other fields I'd need to manage, but this was my biggest hurdle to overcome.
I'll do some testing next week to confirm these will work. I'm also open to any other suggestions as well.
Hi @scoutmedic ,
How about this solution in Power Query?
Before:
After:
Here the code in Power Query M that you can paste into the advanced editor (if you do not know, how to exactly do this, please check out this quick walkthrough)
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nVDLDoIwEPyVTU82wfLQk3f15CPBC7EcKlRthMXwiOjX2wImhIgHN5l2NzO7ne3xSLbyAUGW34hFXNu1Pcebm9SAwkrlRalPjBVeGCxrkd4TWYDCKKliueAoGJwbkWw5mPi7/dRxdDPHE4NCRhnGQ9YzbMSgvKq8R64rFQuMJGwEViIxIsXquja3YhA8dTS5Ll5taJ/dgxy72RwHc0hoje3pGVDwW5fji8L/m/Lyi8cflmYGFA7N15w/jiarLE+h7Z5TLenXJAzf", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Location = _t, #"Date Start" = _t, #"Deficency #" = _t, Finding = _t, Procedure = _t]), #"Added Finding" = Table.AddColumn(Source, "Finding_New", each Text.Trim(Text.AfterDelimiter([Finding], ") "))), #"Removed Columns" = Table.RemoveColumns(#"Added Finding",{"Finding"}), #"Split Column by Delimiter a." = Table.SplitColumn(#"Removed Columns", "Finding_New", Splitter.SplitTextByDelimiter("a. ", QuoteStyle.Csv), {"Finding", "Finding.a"}), #"Split Column by Delimiter b." = Table.SplitColumn(#"Split Column by Delimiter a.", "Finding.a", Splitter.SplitTextByDelimiter("b. ", QuoteStyle.Csv), {"Finding.a", "Finding.b"}), #"Split Column by Delimiter c." = Table.SplitColumn(#"Split Column by Delimiter b.", "Finding.b", Splitter.SplitTextByDelimiter("c. ", QuoteStyle.Csv), {"Finding.b", "Finding.c"}), #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Split Column by Delimiter c.", {"Location", "Date Start", "Deficency #", "Procedure"}, "Attribute", "Finding_new"), #"Added Procedure_new" = Table.AddColumn(#"Unpivoted Columns", "Procedure_new", each if Text.Contains([Finding_new], "(") and Text.Contains([Finding_new], ")") then Text.Middle( [Finding_new], Text.PositionOf([Finding_new], "(") + 1, Text.PositionOf([Finding_new], ")") - Text.PositionOf([Finding_new], "(") - 1 ) else null), #"Removed Other Columns" = Table.SelectColumns(#"Added Procedure_new",{"Location", "Date Start", "Deficency #", "Finding_new", "Procedure_new"}), #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Finding_new", "Finding"}, {"Procedure_new", "Procedure"}}) in #"Renamed Columns"
Note, the solution above is not very dynamic and you would need to add additional steps for the other alphabetical bullets like d., e., f. etc. This approach however has its limitations as soon as you get the to the letter i. Additionally, I made a few other assumption based on the data provided. My guts tell me that this approach won't take you all the way since your real data probably has some stuff in it which this solution does not take into consideration (yet). Still, it might get you closer 🙂
Hope this helps!
/Tom
https://www.tackytech.blog/
https://www.instagram.com/tackytechtom/
Did I answer your question❓➡️ Please, mark my post as a solution ✔️ |
Also happily accepting Kudos 🙂 |
Feel free to connect with me on LinkedIn! |
#proudtobeasuperuser |
Hi,
You can use following, just for testing create a blank query in power Query and put this within the advaned editor:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("tVDLDoIwEPyVTU+QYHnIybt48pHghQCHCkUbYTE8Ivr1toAJIerNTaadZrbT2YYh2fE7BGV1JQaxTdt0LMdVVEEHT1R1I1dMBZ4prDtW3HJeg8Akb1O+ipBRyPomPmig+fvDwrJsPcIThZonJaZz0ZFiQqG5iGqibVqRMkw4bBm2LJc9gnZdJzdBIXjIUlTy51Ay4vhWhKNvhDMTEhvfRnQUdPCHhH+Y8WO+H4GWCjoc+1/J3nk0r6wKGO67ynN6JnH8Ag==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Location = _t, #"Date Start" = _t, Deficency = _t, #"# Finding" = _t, Procedure = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Location", type text}, {"Date Start", type date}, {"Deficency", Int64.Type}, {"# Finding", type text}, {"Procedure", type text}}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"# Finding", Splitter.SplitTextByDelimiter("#(lf)", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "# Finding"),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"# Finding", type text}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Procedure"}),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Removed Columns", "# Finding", Splitter.SplitTextByDelimiter(" (", QuoteStyle.Csv), {"# Finding.1", "# Finding.2"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"# Finding.1", type text}, {"# Finding.2", type text}}),
#"Replaced Value" = Table.ReplaceValue(#"Changed Type2",")","",Replacer.ReplaceText,{"# Finding.2"}),
#"Renamed Columns" = Table.RenameColumns(#"Replaced Value",{{"# Finding.2", "Procedure"}})
in
#"Renamed Columns"
What is does:
1) I use split by delimiter and use linefeed as delimiter (At the bottom of the dialog, expand Advanced options and check Split using special characters)
2) You can delete Procedure, as you can get this information from the new column by splitting by delimiter " ("
3) with the new column, you get replace ")" with nothing and you have your result
Regards,
Oktay
Did I answer your question? Then please mark my post as the solution.
If I helped you, click on the Thumbs Up to give Kudos.
@OktayPamuk80 that might work, but I would need to keep the "i", "ii", and "iii" with their finding. Thoughts?
Yes, I didn't notice that one. However, glad Tom's solution worked out.
Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.
User | Count |
---|---|
94 | |
92 | |
83 | |
71 | |
49 |
User | Count |
---|---|
143 | |
120 | |
110 | |
59 | |
57 |