This time we’re going bigger than ever. Fabric, Power BI, SQL, AI and more. We're covering it all. You won't want to miss it.
Learn moreDid you hear? There's a new SQL AI Developer certification (DP-800). Start preparing now and be one of the first to get certified. Register now
In today’s world, cleaning and transforming data is incredibly important. Clean data means removing errors, duplicates, and irrelevant information to ensure its correct and complete, making the data more trustworthy for analysis. Transforming data means changing it into a format that is easy to use and analyze. Together, these steps improve data quality, leading to better insights and smarter decisions. Without clean and well-organized data, businesses might make poor choices based on faulty information, leading to missed opportunities. Investing in good data cleaning and transformation processes is crucial for any business that wants to use data to its advantage.
Cleaning and transforming data ensure accuracy and usability, enhancing decision-making. Reliable data leads to better insights and avoids missed opportunities in business.
Contoso Retailers, a fictitious retail company, is dedicated to applying data analysis techniques on their data to track sales trends and gain insights to implement new strategies based on those trends. They aim to make sure that they enhance the data quality by cleaning and transforming their data to prepare it for analysis, machine learning model training and visualization to gain insights on employee and sales performance over a period of time.
To achieve this, you can use Copilot for Data Factory to assist you in ingesting data, create new transformations on existing queries to add new columns for further analysis, clean data to remove empty values and create new queries with new synthetic data.
Before you can clean and transform data, you need to create a Dataflow Gen 2 that will assist you in ingesting your data. You can learn more about Dataflows and how to ingest data by referring to this training.
Add a column 'Gross Revenue' that is a product of 'UnitPrice' and 'OrderQty', the result is rounded to two decimal places.
Enhancing_data_quality_with_Copilot_for_Data_Factory
Example prompt: Add a column 'Discount Value' that is a product of 'Gross Revenue' and 'UnitPriceDiscount', the result is rounded to two decimal places.
Note: It’s important to transform your data based on your scenario and requirements.
When using Copilot for Data Factory or any other Copilots, it is important to remember it is Copilot not Autopilot which means you still have control over how you transform your data. For this requirement, we manually create a new query to update it later.
Example prompt: Add a step to the query to add date values to the 'DateKey' column starting from 1/1/2012 to 12/31/2013.
Example prompt: Add 1 to 'Year' if the 'Date' month is greater than or equal to 7 and keep as is if the 'Date' month is less than or equal to 6. Store the result in a new column 'Fiscal Year' and take the last two digits of the result combine them with the prefix 'FY'.
Example prompt: Add a column 'Quarter' with values as 'Q1' when the 'Month' is equal to 7 or 8 or 9, 'Q2' when the 'Month' is equal to 10 or 11 or 12, 'Q3' when the 'Month' is equal to 1 or 2 or 3, 'Q4' when the 'Month' is equal to 4 or 5 or 6.
Enhancing_data_quality_with_Copilot_for_Data_Factory
There’s more that you can do for a retail/sales scenario with Copilot for Data Factory, we provided a baseline of how you would approach data transformations and cleaning using Copilot. This can be applied to other scenarios; you would have to be mindful what transformations your scenario requires.
You can find a deep dive of using Copilot for Data Factory in our Copilot Learning Hub for Data professional's tutorial.
Resources
Start your Copilot learning journey
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.