Join us for an expert-led overview of the tools and concepts you'll need to pass exam PL-300. The first session starts on June 11th. See you there!
Get registeredPower BI is turning 10! Let’s celebrate together with dataviz contests, interactive sessions, and giveaways. Register now.
Hi,
I've connected to Excel as source data to export webchat transcripts from customers.
Each new line within the transcript doesn't actually have a character that I can see:
I've managed to extract the operator's name, customer name and been able to extract the first response from the customer to explain why they are getting in touch:
= Table.AddColumn(#"Inserted Text Before Delimiter", "Customer_enquiry", each Text.BetweenDelimiters([Transcript],"- " & [Customer_name] & " - "," - " & [Operator_name]))
What I'm finding is that sometimes the customer will press enter multiple times so what I'm receiving is a timestamp of each carriage.
What I'd like is the message as a whole without the timestamp and the customer's name:
"I have some questions
I've moved in a new pr
How can I register for t..."
I've managed to remove the last timestamp before the agent responds by extracting between delimiters using #(lf) but this is mid-string.
Would doing a find [customer_name] & replace with blank be sufficient? A blanket removal of numbers isn't possible.
Thanks for any help.
Hi @Anonymous ,
Based on your description, it seems you've made good progress with extracting certain elements using `Text.BetweenDelimiters`. To address the issue of multiple timestamps caused by the customer pressing enter multiple times, we can follow a few steps to clean up the data.
1.If the messages are split across multiple rows, you may want to combine them into a single text block for each conversation. You can use the `Group By` feature in Power Query to group messages by conversation ID (or a similar identifier), and then use the `Text.Combine` function to concatenate the messages into a single text field.
2.To remove timestamps, you can use the `Text.Select` function to keep only the characters that are not part of the timestamp pattern. For example, if your timestamps are always in the format `[HH:MM AM/PM]`, you can create a custom column that excludes these patterns. Alternatively, if the timestamp format is not consistent, you might need to use a combination of `Text.Start`, `Text.End`, `Text.Middle`, `Text.Length`, and `Text.PositionOf` functions to dynamically locate and remove the timestamps.
Here is the sample M code you can try
let
Source = Excel.CurrentWorkbook(){[Name="Table"]}[Content],
GroupedMessages = Table.Group(Source, {"ConversationID"}, {"CombinedMessage", each Text.Combine([MessageColumn], " "), type text}),
RemovedTimestamps = Table.TransformColumns(GroupedMessages, {"CombinedMessage", each Text.Select(_, {"a".."z", "A".."Z", "0".."9", " ", ".", ",", "!", "?"}), type text}),
CleanedText = Table.TransformColumns(RemovedTimestamps, {"CombinedMessage", each Text.Trim(Text.Clean(_)), type text})
in
CleanedText
Best regards,
Albert He
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly
User | Count |
---|---|
84 | |
79 | |
69 | |
46 | |
43 |
User | Count |
---|---|
106 | |
50 | |
49 | |
40 | |
39 |