Hi,
Cornelia here. I am really excited to participate in this Hackathon!😁
I am trying to use Data Wrangler in order to generate a summary of a dataset.
I can save the CSV or the generated Python code by clicking the corresponding buttons, but I need to automate this code or CSV generation inside a data pipeline. Is there a way, or a workaround, to use Data Wrangler like this?
Thanks!
Hi Cornelia,
If I understand you correctly, you'd like to describe a dataset automatically/programmatically, the way Data Wrangler does in Python notebooks. I don't know of an API for using Data Wrangler programmatically.
What I can suggest is either using the DataFrame description capabilities in Spark:
https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.describe.html
or generating the description directly in the dataset (assuming we're talking about Power BI datasets, i.e. semantic models) using Power Query M, for example with the Table.Schema and Table.Profile functions:
https://learn.microsoft.com/en-us/powerquery-m/table-schema
https://learn.microsoft.com/en-us/powerquery-m/table-profile
Hope this helps!
Thank you for your help! 😁
Even though Data Wrangler is pretty good at generating Python code for describing data, I ended up writing some custom PySpark functions, which are easier to modify and automate.
Once you have the Python code generated by Data Wrangler, you can put it in a notebook cell and then, from the notebook, add the notebook to a data pipeline and schedule the pipeline run (Run > Add to pipeline in Microsoft Fabric notebooks).