Solved: Split column when date/time value occurs

jbetkowskawbd · ‎09-19-2024

Hello,

I have table where I have id column and all notes made for this ID with date when note was created. Notes are separated by blank line (enter).

My goal is to split this comment column, but spliting by line (native funkction in PowerBI power query) is too much, because comments can contain enters so then one comment is splitted also.

Is there any way to split anytime and only when date/time occurs?

This is what I have:

id	work notes
101	2024-05-08 14:42:24 - Jon Snow (Work notes) lowering priority 2024-05-08 13:47:01 - Daenerys Targaryen (Work notes) I want that dragon 2024-05-08 12:44:11 - Robert Baratheon (Work notes) Task is created by King of the Seven Kingdoms of Westeros
102	2023-12-05 14:50:19 - XYZ (Work notes) Resolving incident 2023-12-04 19:22:55 - ABC (Work notes) incident was acknowledged by XYZ 2023-12-04 19:22:47 - Service Account (Work notes) 2023-12-04 19:22:47 - Service Account (Work notes) Task is created by King of the Seven Kingdoms of Westeros 2023-12-04 19:22:46 - System (Work notes) Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. 2023-12-04 19:21:51 - System (Work notes) Task is created by System Administrator

This is what I want to achieve:

id	2024-05-08 14:42:24 - Jon Snow (Work notes) lowering priority
101	2024-05-08 13:47:01 - Daenerys Targaryen (Work notes) I want that dragon
101	2024-05-08 12:44:11 - Robert Baratheon (Work notes) Task is created by King of the Seven Kingdoms of Westeros
101	2023-12-05 14:50:19 - XYZ (Work notes) Resolving incident
102	2023-12-04 19:22:55 - ABC (Work notes) incident was acknowledged by XYZ
102	2023-12-04 19:22:47 - Service Account (Work notes)
102	2023-12-04 19:22:47 - Service Account (Work notes) Task is created by King of the Seven Kingdoms of Westeros
102	2023-12-04 19:22:46 - System (Work notes) Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
102	2023-12-04 19:21:51 - System (Work notes) Task is created by System Administrator

BeaBF · ‎09-19-2024

@jbetkowskawbd I've tried to adjust the code:

let
// Load your initial source data
Source = Table.FromRows(
Json.Document(
Binary.Decompress(
Binary.FromText("pZPJUttAEIZfpcunpMpOSUIOQTcIOWQ5ASmSUByamcZ0MZoWsxj8NnmWPBk9Nk5RoJADJ8uzfP+i1tnZpK7qyXTSVE07q+az6gPUbdc2XdPCDL6Ih2Mvt/DmVMI1eEkU307Op2cTveLklgL7BQyBJXBabTe2v4+ZO12721W1Mg+RPIVVhBMMCwwr8qP0z3CLPkG6wgQ24EL8S/ima9uuLvgjuaCQ4AADpiuScfgJxmvgCCYQJrJwsYKvJYlcqiDBMS3VVVmx0seyekoxUZC4BtRVs5HfmdWNOiiNzauu3lP5Hz9/jSoeURS3VOKf38DesCWfRgJtiC3Ue13TdPO5EvcPPo4StxTtKQKaa31Njuxik0Zt/Jfe7ir9mMKSDcG+MZIVNqb0esLr+n5k4d9W3hcrK73Vjzr4JkF3eIi5BytOAkROgD2lKRjxkUyilAOg5YGjKebIsW5GtWsFiHPsxYLyB728Lp9tyZsTOLxQPFDaoAl6XHgEdHyT8d1T798TkOdepaDn8qDhGfsp3GStyEtMIVugOwqGEybWGc7OYW9kI1QOceQivFbgQQ8Doebo1aJs8qhyeiZ9WBQwJwIOWX1ummAPgYZAV+StftGpLCzF5UHVSd1pD0AxEhh2btufxs1wmRes36cv/mDAoH9y+Kv66c7QkCiXrrUoMQbJ6HGTB7aYykXNNgRZz7FWXepUbZPdgKUNnYRLNoxgKVIou7244gZLbawlxYfyc/8s65MRqbt5/fKIjAzpw+l9W16UvhdMEibn5/c=", BinaryEncoding.Base64),
Compression.Deflate)
),
let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [id = _t, #"work notes" = _t]
),

// Change data types of columns
#"Changed Type" = Table.TransformColumnTypes(Source,{{"id", Int64.Type}, {"work notes", type text}}),

// Add an index column to help with splitting later (optional)
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),

// Define a function to split based on the date-time pattern 'YYYY-MM-DD HH:MM:SS'
SplitByDateTime = (text as text) =>
let
// Split the text only when a full date-time stamp (YYYY-MM-DD HH:MM:SS) is found
splitText = Text.Split(text, "202") // This will split at any instance of '202'
in
List.Transform(splitText, each if Text.StartsWith(_, "4-") then "202" & _ else _),

// Apply the splitting function to the "work notes" column
#"Split Work Notes" = Table.TransformColumns(#"Added Index", {"work notes", each SplitByDateTime(_), type list}),

// Expand the split column into separate rows
#"Expanded Work Notes" = Table.ExpandListColumn(#"Split Work Notes", "work notes"),

// Trim any leading or trailing whitespace from the work notes
#"Trimmed Work Notes" = Table.TransformColumns(#"Expanded Work Notes", {{"work notes", Text.Trim, type text}}),

// Fill down the ID column to ensure every note has the correct ID
#"Filled Down ID" = Table.FillDown(#"Trimmed Work Notes", {"id"}),

// Remove the index column (optional)
#"Removed Index" = Table.RemoveColumns(#"Filled Down ID", {"Index"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Index", each [work notes] <> null and [work notes] <> "")

in
#"Filtered Rows"

if it's correct, please accept my answer as solution to help other users!

Thx,

BBF

View solution in original post

dufoq3 · ‎09-19-2024

Hi @jbetkowskawbd, different approach:

Output:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nZNLU9swEMe/yk5O7UzCxMYpxTcoPfRxAjq0JRwWaQk7SFqjRyDfvqsYZtpMeuFmy9rf/yH5+nrSzJvJdNLO2242X8zmH6Hp+q7t2w5m8FUCXAR5gndXEh8gSKb0fhmcPFHksIIhskTOm2VYhr8Jh3131M8bJZwhBYqbBJcYVxg3FHZYX+AJQ4Z8jxlsxJWEXVjbd13fVNi53FLMcIoR8z3JLuoS0wNwAhMJM1m43cC36lLuFE9wQWtVrytWfKqrV5QyRUmTm2ntoR17OJw1rWrXHhbzvjlW4Z+/fu9onVMSt6504GDYUsgvvsfxDprjvm37xULHT04/7Yy/zmj4BGgetGNHdjWaVrW9rO5IWRcU12wIToyRooB/uW+benNxy7Bf8kOV3Ogev6P0XaKu8ZCKBytOIiTOgJ7yFIyERCZTLhHQ8sDJVBPkWD8mtWUFiEvyYkHJgw5va2RbE5UMDm8VD5RHNIHHVUBAx48FD6rTHxkosFc8eK4PGozRT+GxaPwgKcdigZ4pGs6YWa9YcQ69kRFeN3HiKral8qCbgVC9e7UlYwZVy1u5s0rFkgk4FvUzJuYAkYZI9xSs/ka5LqzFlUEVSR1pXqCUCAw799qTxipwV1asv0monmDAqC8lqtLnZ0NDplLb1CrEGCSjG00Z2GKuI5pkiLK9c1pmLUxVTXED1ux6pndsGMFSoli/enHVB9aSWCtJL/UWf7DnyJt+0fzvyPdcrpd9J7YegnaOWeLk5uYP", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [id = _t, #"work notes" = _t]),
    Ad_Split = Table.AddColumn(Source, "Split", each Text.SplitAny([work notes], "#(lf)#(cr)"), type list),
    ExpandedSplit = Table.ExpandListColumn(Ad_Split, "Split"),
    RemovedColumns = Table.RemoveColumns(ExpandedSplit,{"work notes"}),
    Ad_DateTime = Table.AddColumn(RemovedColumns, "DateTime", each try DateTime.From(Text.BeforeDelimiter([Split], " ", 1)) otherwise null, type text),
    GroupedRows = Table.Group(Ad_DateTime, {"id", "DateTime"}, {{"Work Notes", each Text.Combine(Table.RemoveLastN(_, (x)=> Text.Trim(x[Split]) = "")[Split], "#(lf)"), type text}}, GroupKind.Local,
        (x,y)=> Number.From( y[id] <> x[id] or y[DateTime] is datetime ) ),
    RemovedColumns1 = Table.RemoveColumns(GroupedRows,{"DateTime"})
in
    RemovedColumns1

Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

View solution in original post

dufoq3 · ‎09-19-2024

Hi @jbetkowskawbd, different approach:

Output:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nZNLU9swEMe/yk5O7UzCxMYpxTcoPfRxAjq0JRwWaQk7SFqjRyDfvqsYZtpMeuFmy9rf/yH5+nrSzJvJdNLO2242X8zmH6Hp+q7t2w5m8FUCXAR5gndXEh8gSKb0fhmcPFHksIIhskTOm2VYhr8Jh3131M8bJZwhBYqbBJcYVxg3FHZYX+AJQ4Z8jxlsxJWEXVjbd13fVNi53FLMcIoR8z3JLuoS0wNwAhMJM1m43cC36lLuFE9wQWtVrytWfKqrV5QyRUmTm2ntoR17OJw1rWrXHhbzvjlW4Z+/fu9onVMSt6504GDYUsgvvsfxDprjvm37xULHT04/7Yy/zmj4BGgetGNHdjWaVrW9rO5IWRcU12wIToyRooB/uW+benNxy7Bf8kOV3Ogev6P0XaKu8ZCKBytOIiTOgJ7yFIyERCZTLhHQ8sDJVBPkWD8mtWUFiEvyYkHJgw5va2RbE5UMDm8VD5RHNIHHVUBAx48FD6rTHxkosFc8eK4PGozRT+GxaPwgKcdigZ4pGs6YWa9YcQ69kRFeN3HiKral8qCbgVC9e7UlYwZVy1u5s0rFkgk4FvUzJuYAkYZI9xSs/ka5LqzFlUEVSR1pXqCUCAw799qTxipwV1asv0monmDAqC8lqtLnZ0NDplLb1CrEGCSjG00Z2GKuI5pkiLK9c1pmLUxVTXED1ux6pndsGMFSoli/enHVB9aSWCtJL/UWf7DnyJt+0fzvyPdcrpd9J7YegnaOWeLk5uYP", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [id = _t, #"work notes" = _t]),
    Ad_Split = Table.AddColumn(Source, "Split", each Text.SplitAny([work notes], "#(lf)#(cr)"), type list),
    ExpandedSplit = Table.ExpandListColumn(Ad_Split, "Split"),
    RemovedColumns = Table.RemoveColumns(ExpandedSplit,{"work notes"}),
    Ad_DateTime = Table.AddColumn(RemovedColumns, "DateTime", each try DateTime.From(Text.BeforeDelimiter([Split], " ", 1)) otherwise null, type text),
    GroupedRows = Table.Group(Ad_DateTime, {"id", "DateTime"}, {{"Work Notes", each Text.Combine(Table.RemoveLastN(_, (x)=> Text.Trim(x[Split]) = "")[Split], "#(lf)"), type text}}, GroupKind.Local,
        (x,y)=> Number.From( y[id] <> x[id] or y[DateTime] is datetime ) ),
    RemovedColumns1 = Table.RemoveColumns(GroupedRows,{"DateTime"})
in
    RemovedColumns1

Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

jbetkowskawbd · ‎09-19-2024

You are awesome, thank you!

dufoq3 · ‎09-19-2024

You're welcome, enjoy 😉

Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

BeaBF · ‎09-19-2024

@jbetkowskawbd I've achieved something similar i think, try with this M code, then we can adjust it:

let
// Load your initial source data
Source = Table.FromRows(
Json.Document(
Binary.Decompress(
Binary.FromText("pZPJUttAEIZfpcunpMpOSUIOQTcIOWQ5ASmSUByamcZ0MZoWsxj8NnmWPBk9Nk5RoJADJ8uzfP+i1tnZpK7qyXTSVE07q+az6gPUbdc2XdPCDL6Ih2Mvt/DmVMI1eEkU307Op2cTveLklgL7BQyBJXBabTe2v4+ZO12721W1Mg+RPIVVhBMMCwwr8qP0z3CLPkG6wgQ24EL8S/ima9uuLvgjuaCQ4AADpiuScfgJxmvgCCYQJrJwsYKvJYlcqiDBMS3VVVmx0seyekoxUZC4BtRVs5HfmdWNOiiNzauu3lP5Hz9/jSoeURS3VOKf38DesCWfRgJtiC3Ue13TdPO5EvcPPo4StxTtKQKaa31Njuxik0Zt/Jfe7ir9mMKSDcG+MZIVNqb0esLr+n5k4d9W3hcrK73Vjzr4JkF3eIi5BytOAkROgD2lKRjxkUyilAOg5YGjKebIsW5GtWsFiHPsxYLyB728Lp9tyZsTOLxQPFDaoAl6XHgEdHyT8d1T798TkOdepaDn8qDhGfsp3GStyEtMIVugOwqGEybWGc7OYW9kI1QOceQivFbgQQ8Doebo1aJs8qhyeiZ9WBQwJwIOWX1ummAPgYZAV+StftGpLCzF5UHVSd1pD0AxEhh2btufxs1wmRes36cv/mDAoH9y+Kv66c7QkCiXrrUoMQbJ6HGTB7aYykXNNgRZz7FWXepUbZPdgKUNnYRLNoxgKVIou7244gZLbawlxYfyc/8s65MRqbt5/fKIjAzpw+l9W16UvhdMEibn5/c=", BinaryEncoding.Base64),
Compression.Deflate)
),
let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [id = _t, #"work notes" = _t]
),

// Change data types of columns
#"Changed Type" = Table.TransformColumnTypes(Source,{{"id", Int64.Type}, {"work notes", type text}}),

// Add an index column to help with splitting later (optional)
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),

// Define a function to split based on the date-time pattern
SplitByDateTime = (text as text) =>
let
// Split the text based on the date-time pattern
splitText = Text.Split(text, "202")
in
List.Transform(splitText, each "202" & Text.Trim(_)),

// Apply the splitting function to the "work notes" column
#"Split Work Notes" = Table.TransformColumns(#"Added Index", {"work notes", each SplitByDateTime(_), type list}),

// Expand the split column into separate rows
#"Expanded Work Notes" = Table.ExpandListColumn(#"Split Work Notes", "work notes"),

// Trim any leading or trailing whitespace from the work notes
#"Trimmed Work Notes" = Table.TransformColumns(#"Expanded Work Notes", {{"work notes", Text.Trim, type text}}),

// Fill down the ID column to ensure every note has the correct ID
#"Filled Down ID" = Table.FillDown(#"Trimmed Work Notes", {"id"}),

// Remove the index column (optional)
#"Removed Index" = Table.RemoveColumns(#"Filled Down ID", {"Index"})

in
#"Removed Index"

BBF

jbetkowskawbd · ‎09-19-2024

Hello,

thank you, solution is more or less sufficient and already helped me a lot.

The only one detail just for your information that comments can be also email with its details including date and time of sending so everytime note is an email it recognized that 202 part and is splitting email also. However. as I said this solution already helped me a lot in analyzing data.

BeaBF · ‎09-19-2024