ccornelia
Frequent Visitor

Automating Data Wrangler output

Hi, 

Cornelia here. I am really excited to participate in this Hackathon!😁

 

I am trying to use Data Wrangler to generate a summary of a dataset.

I can save the CSV or the generated Python code by clicking specific buttons, but I need to automate this code or CSV generation inside a data pipeline. Is there such a possibility, or a workaround to use Data Wrangler in this way?

 

Thanks!

1 ACCEPTED SOLUTION
alxdean
Advocate V

Hi Cornelia, 

If I understand you correctly, you'd like to describe a dataset automatically/programmatically, the way Data Wrangler does in Python notebooks. I'm not aware of an API for using Data Wrangler programmatically.

What I can suggest is either using the DataFrame description capabilities in Spark (see the sketch after these links):

https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.DataFrame.describe.html

or generating the description directly in the dataset (assuming we're talking about Power BI datasets, i.e. semantic models) using M code:

https://learn.microsoft.com/en-us/powerquery-m/table-schema

https://learn.microsoft.com/en-us/powerquery-m/table-profile
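
For the Spark route, here is a minimal sketch (in a Fabric notebook a `spark` session is available by default; the table name is just a placeholder):

```python
# Load a table from the attached Lakehouse (table name is a placeholder)
df = spark.read.table("my_dataset")

# describe() computes count, mean, stddev, min, and max for numeric columns
df.describe().show()

# summary() is similar but also supports percentiles
df.summary("count", "mean", "min", "25%", "50%", "75%", "max").show()
```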

 

Hope this helps!

 


3 REPLIES
ccornelia
Frequent Visitor

Thank you for your help! 😁

 

Even though Data Wrangler seems pretty good at generating Python code for describing data, I ended up creating some custom PySpark functions; they are easier to modify and automate.
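
For anyone finding this later, here is a rough sketch of the kind of custom PySpark summary function I mean; the statistics and names are illustrative, not my exact code:

```python
from functools import reduce
from pyspark.sql import DataFrame
from pyspark.sql import functions as F

def summarize(df: DataFrame) -> DataFrame:
    """Return one row per column with non-null, null, and distinct counts."""
    per_column = [
        df.select(
            F.lit(c).alias("column"),
            F.count(F.col(c)).alias("non_null_count"),
            F.sum(F.col(c).isNull().cast("int")).alias("null_count"),
            F.countDistinct(F.col(c)).alias("distinct_count"),
        )
        for c in df.columns
    ]
    # Stack the one-row frames into a single summary table
    return reduce(DataFrame.unionByName, per_column)
```

Because it is plain PySpark, the output can be written to the Lakehouse and the notebook scheduled from a pipeline like any other code.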

imejiauseche
Employee

Once you have the Python code generated by Data Wrangler, you can put it in a notebook cell and then, from the notebook, add it to a data pipeline and schedule runs of the notebook.
Run add to pipeline in Microsoft Fabric notebooks
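
In other words, the generated code becomes an ordinary notebook cell, roughly like this (the paths and the clean_data wrapper are placeholders for whatever Data Wrangler actually emits):

```python
import pandas as pd

# Placeholder for the code Data Wrangler generates; paste the real
# operations here after exporting them from the Data Wrangler UI.
def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    return df.describe(include="all").reset_index()

# Hypothetical Lakehouse file paths
df = pd.read_csv("/lakehouse/default/Files/input/my_dataset.csv")
clean_data(df).to_csv("/lakehouse/default/Files/output/summary.csv", index=False)
```

Each pipeline run then regenerates the summary CSV without any manual clicks.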

 
