Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more

Reply
Anonymous
Not applicable

Pandas dataframe passed to Python script is corrupted

Hi,

I have a table that I download from a Salesforce report. One of the columns is the Saleforce case number and another one ('Defect')is a free text which may contain the string 'ABC-number', where number is 1-99999. There may be some other text before the string and/or after.

I wrote Python script that extracts the required string via a regexp and discards all other text, and puts it in a new column. 

This works fine in most cases. However, in a few rows of the table, the text contains newlines, like line 3 here:

SerialCase numberDefectComment
11234ABC-123no problem, ACBDEF-123
21235ABC-123,XYZ-123no newline
31277

ABC-201

DEF-777

1 newline
41345

ABC-786

no newline

 

When the text in Defect column contains one or more newline characters, the dataframe that is provided to the Python script looks like this:

Case numberDefectComment
1277ABC-201'Defect' column was truncated...
DEF-777null...Text after newline created this new dataframe row, with invalid data in 'Case Number' column and null in all other columns
1345ABC-786Good row, not corrupted

 

My guess is that there is a bug in the Power BI code that converts the table data to a Pandas dataframe and it barfs on newline characters.

 

Has anyone seen that?

 

Thanks,

Amir

1 ACCEPTED SOLUTION
Anonymous
Not applicable

The response from Microsoft is that Power BI was not designed to handle embedded newlines - see here:

https://community.powerbi.com/t5/Issues/Python-pandas-R-dataframes-passed-to-Python-R-script-are/idc....

 

So the only solution is to "sanitize" the data before invoking Python or R.

To do that, use the Table.ReplaceValue() in M Query, see below. All these actions can be invoked from the GUI.

 

#"Duplicated Column" = Table.DuplicateColumn(#"some-previous-step", "Defect", "Defect - Copy")
// Santize the new column
#"Replaced Value" = Table.ReplaceValue(#"Duplicated Column","#(lf)","___",Replacer.ReplaceText,{"Defect-Copy"}),#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","#(cr)","___",Replacer.ReplaceText,{"Defect-Copy"}),
// Must remove the original bad column
#"Removed Columns1" = Table.RemoveColumns(#"Replaced Value1",{"Defect"}),
// Rename the new column
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns1",{{"Defect-Copy", "DefectSanitized"}})

View solution in original post

3 REPLIES 3
Anonymous
Not applicable

The response from Microsoft is that Power BI was not designed to handle embedded newlines - see here:

https://community.powerbi.com/t5/Issues/Python-pandas-R-dataframes-passed-to-Python-R-script-are/idc....

 

So the only solution is to "sanitize" the data before invoking Python or R.

To do that, use the Table.ReplaceValue() in M Query, see below. All these actions can be invoked from the GUI.

 

#"Duplicated Column" = Table.DuplicateColumn(#"some-previous-step", "Defect", "Defect - Copy")
// Santize the new column
#"Replaced Value" = Table.ReplaceValue(#"Duplicated Column","#(lf)","___",Replacer.ReplaceText,{"Defect-Copy"}),#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","#(cr)","___",Replacer.ReplaceText,{"Defect-Copy"}),
// Must remove the original bad column
#"Removed Columns1" = Table.RemoveColumns(#"Replaced Value1",{"Defect"}),
// Rename the new column
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns1",{{"Defect-Copy", "DefectSanitized"}})
Anonymous
Not applicable

I have submitted an issue, here 

Anonymous
Not applicable

To add an interesting (and relevant) data point:

I fed the same table into an R script, using this simple-minded script (which does not change any data):

 

new_ds <- dataset
new_ds$DefectCopy <- new_ds$Defect

 

And the resulting dataset is similar:

1. The field 'Defect' now contains only the data before the first newline

2. New "dummy" rows are added after this row, each containing the part from the previous newline to the next.

3. In these "dummy" rows, all other columns, including 'Defect', are empty

 

So this adds weight to my theory that the Power Bi code that parses the table and generates a Python/R dataframe cannot handle newlines embedded in text columns.

Helpful resources

Announcements
Power BI DataViz World Championships

Power BI Dataviz World Championships

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!

December 2025 Power BI Update Carousel

Power BI Monthly Update - December 2025

Check out the December 2025 Power BI Holiday Recap!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.