Learn from the best! Meet the four finalists headed to the FINALS of the Power BI Dataviz World Championships! Register now
I have a column that needs some advanced parsing, that seems to be best done with Python. However, there is another column that I am not touching in Python that is being altered. Can anyone explain what is going on under the hood?
# 'dataset' holds the input data for this script
import pandas as pd
ds = dataset
ds['ROAD_NAME_CLEAN'] = ds['ROAD_NAME'].str.extract(r'(\d+\s+)?([NnEeSsWw](\.\s|\s|\.))?([^\(]*)(\s?\(.*\))?')[3]
ds['REFERENCE_ROAD_NAME_CLEAN'] = ds['REFERENCE_ROAD_NAME'].str.extract(r'(\d+\s+)?([NnEeSsWw](\.\s|\s|\.))?([^\(]*)(\s?\(.*\))?')[3]
dataset = ds
Before:
After:
@maiios - Sorry, what is being affected that shouldn't? It's hard to compare the pictures.
Sorry... the CENSUS_TRACT is chaning from a string to a floating point, even though the column is still a string. Basically, the CENSUS_TRACT is supposed to be a six digit number, but the decimal, and dropping the leading zeros causes issues.
I could reformat the string, but I want to understand why its happening.
@dm-p You have any Python skillz??
| User | Count |
|---|---|
| 13 | |
| 9 | |
| 6 | |
| 5 | |
| 5 |