Data plays an important role in decision making, and organizations that can harness its power gain a competitive edge in the market. However, data alone is not enough. To extract value from data, organizations need to build end-to-end data analysis pipelines that handle the entire data lifecycle, from ingestion and cleaning through transformation, visualization, and analysis. In this blog post, we will explore why end-to-end data analysis is important for organizations and how it can help them achieve their goals.
Contoso Cuisines, a fictional company that sells food products around the world, wants to analyze their revenue performance across their different products, the performance of their employees, and the impact of the number of days it takes to ship a product on revenue. To do this, they need to build an end-to-end data analysis pipeline that handles the following steps:

1. Ingest the source data into a Lakehouse.
2. Clean and transform the data in a Notebook.
3. Model the data, including measures such as revenue per order.
4. Visualize and analyze the results.
By following these steps, Contoso Cuisines can leverage their data to gain a better understanding of their business performance and identify opportunities for improvement or growth.
The foundation of Microsoft Fabric is the Lakehouse, which is built on top of the OneLake scalable storage layer and uses the Apache Spark and SQL compute engines for big data processing. A Lakehouse is a unified platform that combines the flexible, scalable storage of a data lake with the query and analysis capabilities of a data warehouse.
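You can see both sides of this combination from a Fabric notebook. The sketch below assumes the northwind_lkh Lakehouse and its Order_Details table used later in this post (the OrderID column is an assumption based on the standard Northwind schema): the same OneLake-backed Delta table can be read with the Spark DataFrame API, lake-style, or queried with SQL, warehouse-style.

# Read the Lakehouse table directly with the Spark DataFrame API
df = spark.read.table("northwind_lkh.Order_Details")

# Query the same Delta table with SQL, as you would a warehouse
top_orders = spark.sql(
    "SELECT OrderID, COUNT(*) AS line_items "
    "FROM northwind_lkh.Order_Details "
    "GROUP BY OrderID ORDER BY line_items DESC LIMIT 10"
)
display(top_orders)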
Note: Make sure that the Data Destination is set to your Lakehouse for all of your tables/queries, as this is what sends the data into your Lakehouse once you publish the Dataflow.
Now that you have ingested your data, you need to do a bit of cleaning before visualizing it, so that your analysis for this scenario is accurate.
To achieve this, you can use Notebooks by following these steps:
# Load Data
df_order_details = spark.sql("SELECT * FROM northwind_lkh.Order_Details LIMIT 1000")
# Display Dataframe summary
display(df_order_details, summary=True)
# Convert the Spark DataFrame to pandas for cleaning
df_order_details_pandas = df_order_details.toPandas()

def clean_data(df_order_details_pandas):
    # Drop rows with missing data in column: 'ShippedDate'
    df_order_details_pandas = df_order_details_pandas.dropna(subset=['ShippedDate'])
    return df_order_details_pandas

# Clean a copy so the original DataFrame stays intact
df_order_details_clean = clean_data(df_order_details_pandas.copy())
display(df_order_details_clean)
Note: Dropping the rows with missing ShippedDate values automatically removes the missing Days to Ship values as well, because the two columns are related.
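If you want to verify this, a quick check like the one below (a sketch against the pandas DataFrame from the previous step) confirms that no missing values remain:

# Sanity check: count remaining missing values per column
# (ShippedDate and Days to Ship should both show 0 after cleaning)
print(df_order_details_clean.isna().sum())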
Once that’s done, you can use the new, cleaned DataFrame to create a table in your Lakehouse containing the cleaned data. You can achieve this by running the following script in a new code cell.
# Write the cleaned data back to the Lakehouse as a Delta table
table_name = "Order_Details_clean"
sparkDF = spark.createDataFrame(df_order_details_clean)
sparkDF.write.mode("overwrite").format("delta").save(f"Tables/{table_name}")
print(f"Spark dataframe saved to delta table: {table_name}")
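As an optional check (a sketch, assuming the same notebook session), you can read the new Delta table back to confirm it was written:

# Read the saved Delta table back from the Lakehouse and report its size
df_check = spark.read.format("delta").load(f"Tables/{table_name}")
print(f"Rows in {table_name}: {df_check.count()}")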
An additional requirement for this scenario is to ensure that the company is able to see the revenue generated per order made.
To achieve this, you will need a measure, which you can add by following these steps; a sketch of the calculation the measure encodes is shown after the note below.
Note: you have to switch from the Data view to the Model view to achieve this.
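The measure itself is defined in the model (typically as a DAX expression), but the calculation it encodes can be sketched in PySpark. The snippet below assumes the standard Northwind Order Details columns (OrderID, UnitPrice, Quantity, Discount), which is an assumption about your copy of the data: revenue per line is UnitPrice * Quantity * (1 - Discount), summed per order.

from pyspark.sql import functions as F

# Hypothetical sketch of the revenue-per-order logic the measure would encode
revenue_per_order = (
    spark.read.table("northwind_lkh.Order_Details_clean")
    .withColumn("LineRevenue",
                F.col("UnitPrice") * F.col("Quantity") * (1 - F.col("Discount")))
    .groupBy("OrderID")
    .agg(F.sum("LineRevenue").alias("Revenue"))
)
display(revenue_per_order)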
End-to-end data analysis can help organizations achieve their strategic goals, such as increasing revenue, reducing costs, improving efficiency, enhancing customer satisfaction, gaining a competitive edge, and innovating new products and services. By building end-to-end data analysis pipelines, organizations can treat their data as a strategic asset and a source of competitive advantage.
Stay tuned for Part 2!!
Now that you have gone through an example of building a simple end-to-end solution with Fabric, you’re tasked with: