Join us at FabCon Atlanta from March 16 - 20, 2026, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.
Register now!
The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more
I have a column that needs some advanced parsing, which seems to be best done with Python. However, another column that I am not touching in Python is being altered. Can anyone explain what is going on under the hood?
# 'dataset' holds the input data for this script
import pandas as pd

ds = dataset
# Optional house number, optional direction prefix (N/E/S/W), road name, optional parenthetical suffix
pattern = r'(\d+\s+)?([NnEeSsWw](\.\s|\s|\.))?([^\(]*)(\s?\(.*\))?'
# Column 3 of the extract result (the fourth capture group) holds the road name
ds['ROAD_NAME_CLEAN'] = ds['ROAD_NAME'].str.extract(pattern)[3]
ds['REFERENCE_ROAD_NAME_CLEAN'] = ds['REFERENCE_ROAD_NAME'].str.extract(pattern)[3]
dataset = ds
Before:
After:
@maiios - Sorry, what is being affected that shouldn't be? It's hard to compare the pictures.
Sorry... CENSUS_TRACT is changing from a string to a floating-point number, even though the column is still typed as a string. CENSUS_TRACT is supposed to be a six-digit number, but the added decimal and the dropped leading zeros cause issues.
I could reformat the string, but I want to understand why it's happening.
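One plausible explanation, offered as an assumption rather than something confirmed in this thread: if the table reaches the script through a CSV round trip, pandas infers column types when it reads the file back, so a column of digit-only strings is parsed as numbers, and any blank value pushes it to float. The sketch below reproduces that behavior outside Power BI. The sample rows, the `restore_tract` helper, and the six-character width are hypothetical and for illustration only.

```python
import io

import pandas as pd

# Hypothetical sample rows; CENSUS_TRACT holds six-digit strings, one of them missing.
original = pd.DataFrame({
    "ROAD_NAME": ["100 N. Main St (north)", "Elm Ave", "5 W Oak Rd"],
    "CENSUS_TRACT": ["001234", None, "045600"],
})

# Simulate a CSV round trip (assumed here to be how the data reaches the script).
csv_text = original.to_csv(index=False)

# Default type inference: digit-only text becomes numeric, and the blank forces float.
inferred = pd.read_csv(io.StringIO(csv_text))

# For comparison: an explicit string dtype on read keeps the leading zeros.
preserved = pd.read_csv(io.StringIO(csv_text), dtype={"CENSUS_TRACT": str})


def restore_tract(value, width=6):
    """Hypothetical helper: turn an inferred float back into a zero-padded string."""
    if pd.isna(value):
        return None
    return str(int(value)).zfill(width)


# Repair the inferred column after the fact, leaving missing values missing.
restored = inferred["CENSUS_TRACT"].map(restore_tract)

print(inferred["CENSUS_TRACT"].dtype)
print(preserved["CENSUS_TRACT"].tolist())
print(restored.tolist())
```

If the dtype cannot be controlled at read time, a helper like `restore_tract` applied inside the script, or setting the column type back to Text in the query step that follows, would be the kind of reformatting mentioned above.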
@dm-p You have any Python skillz??