Solved: Automating Data Wrangler output

ccornelia · 02-21-2024

Hi,

Cornelia here. I am really excited to participate in this Hackathon!😁

I am trying to use Data Wrangler in order to generate a summary of a dataset.

I can save the CSV or the generated Python code by clicking the corresponding buttons, but I need to automate this code or CSV generation inside a data pipeline. Is there a possibility or a workaround to use Data Wrangler in this way?

Thanks!

Anonymous · 02-21-2024

Hi Cornelia,

If I understand you correctly, you'd like to describe a dataset automatically/programmatically, the way Data Wrangler does in Python notebooks. I'm not aware of an API for using Data Wrangler programmatically.

What I can suggest is either using the DataFrame description capabilities in Spark:

https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.describe.html

or generating the description directly in the dataset (assuming we're talking about Power BI datasets, i.e. semantic models) using M code:

https://learn.microsoft.com/en-us/powerquery-m/table-schema

https://learn.microsoft.com/en-us/powerquery-m/table-profile

Hope this helps!


ccornelia · 02-22-2024

Thank you for your help! 😁

Even though Data Wrangler seems pretty good at generating Python code for describing data, I ended up creating some custom PySpark functions, which are easier to modify and automate.
