Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more.
Get startedGrow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.
Hi,
I have a table that I download from a Salesforce report. One of the columns is the Saleforce case number and another one ('Defect')is a free text which may contain the string 'ABC-number', where number is 1-99999. There may be some other text before the string and/or after.
I wrote Python script that extracts the required string via a regexp and discards all other text, and puts it in a new column.
This works fine in most cases. However, in a few rows of the table, the text contains newlines, like line 3 here:
Serial | Case number | Defect | Comment |
1 | 1234 | ABC-123 | no problem, ACBDEF-123 |
2 | 1235 | ABC-123,XYZ-123 | no newline |
3 | 1277 | ABC-201 DEF-777 | 1 newline |
4 | 1345 | ABC-786 | no newline |
When the text in Defect column contains one or more newline characters, the dataframe that is provided to the Python script looks like this:
Case number | Defect | Comment |
1277 | ABC-201 | 'Defect' column was truncated... |
DEF-777 | null | ...Text after newline created this new dataframe row, with invalid data in 'Case Number' column and null in all other columns |
1345 | ABC-786 | Good row, not corrupted |
My guess is that there is a bug in the Power BI code that converts the table data to a Pandas dataframe and it barfs on newline characters.
Has anyone seen that?
Thanks,
Amir
Solved! Go to Solution.
The response from Microsoft is that Power BI was not designed to handle embedded newlines - see here:
So the only solution is to "sanitize" the data before invoking Python or R.
To do that, use the Table.ReplaceValue() in M Query, see below. All these actions can be invoked from the GUI.
#"Duplicated Column" = Table.DuplicateColumn(#"some-previous-step", "Defect", "Defect - Copy")
// Santize the new column
#"Replaced Value" = Table.ReplaceValue(#"Duplicated Column","#(lf)","___",Replacer.ReplaceText,{"Defect-Copy"}),#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","#(cr)","___",Replacer.ReplaceText,{"Defect-Copy"}),
// Must remove the original bad column
#"Removed Columns1" = Table.RemoveColumns(#"Replaced Value1",{"Defect"}),
// Rename the new column
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns1",{{"Defect-Copy", "DefectSanitized"}})
The response from Microsoft is that Power BI was not designed to handle embedded newlines - see here:
So the only solution is to "sanitize" the data before invoking Python or R.
To do that, use the Table.ReplaceValue() in M Query, see below. All these actions can be invoked from the GUI.
#"Duplicated Column" = Table.DuplicateColumn(#"some-previous-step", "Defect", "Defect - Copy")
// Santize the new column
#"Replaced Value" = Table.ReplaceValue(#"Duplicated Column","#(lf)","___",Replacer.ReplaceText,{"Defect-Copy"}),#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","#(cr)","___",Replacer.ReplaceText,{"Defect-Copy"}),
// Must remove the original bad column
#"Removed Columns1" = Table.RemoveColumns(#"Replaced Value1",{"Defect"}),
// Rename the new column
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns1",{{"Defect-Copy", "DefectSanitized"}})
To add an interesting (and relevant) data point:
I fed the same table into an R script, using this simple-minded script (which does not change any data):
new_ds <- dataset
new_ds$DefectCopy <- new_ds$Defect
And the resulting dataset is similar:
1. The field 'Defect' now contains only the data before the first newline
2. New "dummy" rows are added after this row, each containing the part from the previous newline to the next.
3. In these "dummy" rows, all other columns, including 'Defect', are empty
So this adds weight to my theory that the Power Bi code that parses the table and generates a Python/R dataframe cannot handle newlines embedded in text columns.
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Ask questions in Eventhouse and KQL, Eventstream, and Reflex.
User | Count |
---|---|
40 | |
26 | |
22 | |
21 | |
16 |