Don't miss your chance to take the Fabric Data Engineer (DP-600) exam for FREE! Find out how by attending the DP-600 session on April 23rd (pacific time), live or on-demand.
Learn moreNext up in the FabCon + SQLCon recap series: The roadmap for Microsoft SQL and Maximizing Developer experiences in Fabric. All sessions are available on-demand after the live show. Register now
Hi,
I've connected to Excel as source data to export webchat transcripts from customers.
Each new line within the transcript doesn't actually have a character that I can see:
I've managed to extract the operator's name, customer name and been able to extract the first response from the customer to explain why they are getting in touch:
= Table.AddColumn(#"Inserted Text Before Delimiter", "Customer_enquiry", each Text.BetweenDelimiters([Transcript],"- " & [Customer_name] & " - "," - " & [Operator_name]))
What I'm finding is that sometimes the customer will press enter multiple times so what I'm receiving is a timestamp of each carriage.
What I'd like is the message as a whole without the timestamp and the customer's name:
"I have some questions
I've moved in a new pr
How can I register for t..."
I've managed to remove the last timestamp before the agent responds by extracting between delimiters using #(lf) but this is mid-string.
Would doing a find [customer_name] & replace with blank be sufficient? A blanket removal of numbers isn't possible.
Thanks for any help.
Hi @Anonymous ,
Based on your description, it seems you've made good progress with extracting certain elements using `Text.BetweenDelimiters`. To address the issue of multiple timestamps caused by the customer pressing enter multiple times, we can follow a few steps to clean up the data.
1.If the messages are split across multiple rows, you may want to combine them into a single text block for each conversation. You can use the `Group By` feature in Power Query to group messages by conversation ID (or a similar identifier), and then use the `Text.Combine` function to concatenate the messages into a single text field.
2.To remove timestamps, you can use the `Text.Select` function to keep only the characters that are not part of the timestamp pattern. For example, if your timestamps are always in the format `[HH:MM AM/PM]`, you can create a custom column that excludes these patterns. Alternatively, if the timestamp format is not consistent, you might need to use a combination of `Text.Start`, `Text.End`, `Text.Middle`, `Text.Length`, and `Text.PositionOf` functions to dynamically locate and remove the timestamps.
Here is the sample M code you can try
let
Source = Excel.CurrentWorkbook(){[Name="Table"]}[Content],
GroupedMessages = Table.Group(Source, {"ConversationID"}, {"CombinedMessage", each Text.Combine([MessageColumn], " "), type text}),
RemovedTimestamps = Table.TransformColumns(GroupedMessages, {"CombinedMessage", each Text.Select(_, {"a".."z", "A".."Z", "0".."9", " ", ".", ",", "!", "?"}), type text}),
CleanedText = Table.TransformColumns(RemovedTimestamps, {"CombinedMessage", each Text.Trim(Text.Clean(_)), type text})
in
CleanedText
Best regards,
Albert He
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly
If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.
A new Power BI DataViz World Championship is coming this June! Don't miss out on submitting your entry.
Share feedback directly with Fabric product managers, participate in targeted research studies and influence the Fabric roadmap.
| User | Count |
|---|---|
| 48 | |
| 43 | |
| 39 | |
| 19 | |
| 17 |
| User | Count |
|---|---|
| 68 | |
| 63 | |
| 31 | |
| 30 | |
| 23 |