Re: Python step is changing unrelated columns

maiios · ‎09-09-2020

I have a column that needs some advanced parsing, that seems to be best done with Python. However, there is another column that I am not touching in Python that is being altered. Can anyone explain what is going on under the hood?

# 'dataset' holds the input data for this script
import pandas as pd
ds = dataset
ds['ROAD_NAME_CLEAN'] = ds['ROAD_NAME'].str.extract(r'(\d+\s+)?([NnEeSsWw](\.\s|\s|\.))?([^\(]*)(\s?\(.*\))?')[3]
ds['REFERENCE_ROAD_NAME_CLEAN'] = ds['REFERENCE_ROAD_NAME'].str.extract(r'(\d+\s+)?([NnEeSsWw](\.\s|\s|\.))?([^\(]*)(\s?\(.*\))?')[3]
dataset = ds

Before:

After:

Greg_Deckler · ‎09-09-2020

@maiios - Sorry, what is being affected that shouldn't? It's hard to compare the pictures.

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: DAX For Humans

DAX is easy, CALCULATE makes DAX hard...

maiios · ‎09-09-2020

Sorry... the CENSUS_TRACT is chaning from a string to a floating point, even though the column is still a string. Basically, the CENSUS_TRACT is supposed to be a six digit number, but the decimal, and dropping the leading zeros causes issues.

I could reformat the string, but I want to understand why its happening.

Greg_Deckler · ‎09-11-2020

@dm-p You have any Python skillz??

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: DAX For Humans

DAX is easy, CALCULATE makes DAX hard...