Hello,
I'm working for a consultancy company where we've been using Synapse for a while and have developed a set of notebooks to create and maintain a lakehouse following the medallion architecture. This means our codebase contains business-logic notebooks (SQL queries that define the silver and gold layer tables) as well as generic notebooks that extract, transform (based on the query), and load the data to the silver and gold layers.
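To make that concrete, the generic transform-and-load step looks roughly like this (a minimal sketch; the function and table names are illustrative, not our actual code):

# Minimal sketch of our generic ETL pattern (names are illustrative).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def load_to_layer(business_query: str, target_table: str, mode: str = "overwrite") -> None:
    """Run a business-logic SparkSQL query and persist the result as a lakehouse table."""
    df = spark.sql(business_query)  # transform: the query carries the business logic
    df.write.format("delta").mode(mode).saveAsTable(target_table)  # load into silver/gold

# Example: a silver-layer table defined purely by a query against bronze.
silver_query = """
SELECT id, CAST(amount AS DECIMAL(18, 2)) AS amount
FROM bronze_sales
WHERE amount IS NOT NULL
"""
load_to_layer(silver_query, "silver_sales")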
With the move to Fabric approaching, I'm exploring how we should implement our current code into Fabric.
According to Microsoft documentation: "The Microsoft Fabric notebook is a primary code item for developing Apache Spark jobs and machine learning experiments."
https://learn.microsoft.com/en-us/fabric/data-engineering/how-to-use-notebook
And from earlier posts on this forum:
https://community.fabric.microsoft.com/t5/Data-Engineering/Understanding-the-Use-Cases-for-Spark-Job...
The short answer seems to be: notebooks are for prototyping, and Spark job definitions are for production loads.
Based on what I've read so far, I'm considering moving all the generic code from the notebooks into Spark job definitions and structuring it into functions. This would be a very welcome upgrade from notebooks, as it would make the code more maintainable and easier to test.
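As I understand it, a Spark job definition would wrap that generic code in a plain Python entry point, roughly like this (a sketch only; the module layout and parameter names are assumptions on my side):

# main.py for a Spark job definition (illustrative sketch).
import argparse
from pyspark.sql import SparkSession

def run_etl(spark: SparkSession, query: str, target_table: str) -> None:
    """Generic transform-and-load step, now an importable and unit-testable function."""
    spark.sql(query).write.format("delta").mode("overwrite").saveAsTable(target_table)

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--query", required=True)
    parser.add_argument("--target-table", required=True)
    args = parser.parse_args()

    spark = SparkSession.builder.getOrCreate()
    run_etl(spark, args.query, args.target_table)

if __name__ == "__main__":
    main()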
However, I still don't have a clear picture of how we can create a development cycle that allows us to transition from notebooks to Spark jobs for our business logic. As mentioned before, our business logic consists of SparkSQL queries in a notebook that, for example, query the bronze layer and transform the data into a structure for the silver layer. This notebook passes the query to a generic notebook, which performs the ETL.
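Concretely, today's hand-off looks something like this (a sketch; the notebook name and parameters are made up, and I'm assuming Fabric's built-in notebookutils, which corresponds to mssparkutils in Synapse):

# Business-logic notebook (sketch): defines the query and delegates the ETL
# to a generic notebook via notebookutils (mssparkutils in Synapse).
silver_customers_query = """
SELECT customer_id, TRIM(name) AS name, country
FROM bronze_customers
WHERE customer_id IS NOT NULL
"""

notebookutils.notebook.run(
    "Generic_ETL_Notebook",   # hypothetical generic notebook
    600,                      # timeout in seconds
    {"query": silver_customers_query, "target_table": "silver_customers"},
)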
An advantage of this method is that it provides developers with a lot of flexibility in developing and testing queries. Once the query is ready for production, we simply add this notebook to our nightly run. Another advantage of using notebooks is that updating the query after release is as simple as opening the notebook and making edits. Of course, we are using branches and a DTAP environment for development, but the use of notebooks offers a flexible way of developing business logic.
So I'm curious about Microsoft's view on the development of business logic with Spark job definitions. Is there a practical workflow for our use case?
A few scenarios that come to mind are:
Looking forward to your suggestions!
Got an additional response through another Fabric representative at Microsoft.
He stated that there is no difference in performance or monitoring capabilities; the choice between the two should be based on the use case. So we're going for a combination of notebooks for the business logic, with supporting functions developed through the VS Code extension for Fabric.
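In practice that means the supporting functions live in our own Python library (built and published from VS Code as an environment library), and the business-logic notebooks just import them; roughly like this (the package and function names are hypothetical):

# Business-logic notebook (sketch): only the SparkSQL query lives here;
# the generic ETL helpers come from our own library (hypothetical package name).
from lakehouse_etl import load_to_layer

gold_revenue_query = """
SELECT c.country, SUM(s.amount) AS total_revenue
FROM silver_sales s
JOIN silver_customers c ON s.customer_id = c.customer_id
GROUP BY c.country
"""

load_to_layer(gold_revenue_query, target_table="gold_revenue_by_country")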
Hi @Broeks,
If you want to develop notebooks locally in VS Code, you need to restore the environment and built-in libraries, and be aware of the limitations on library usage in the VS Code extension:
VS Code extension overview - Microsoft Fabric | Microsoft Learn
Regards,
Xiaoxin Sheng
Thanks for your response!
Looking at the differences between Spark job definitions and notebooks as described in the post Solved: Re: Understanding the Use Cases for Spark Job Defi... - Microsoft Fabric Community, notebooks could have a downside in performance and/or monitoring/logging capabilities. Do these differences still exist, given that notebooks seem to be under constant development? And if so, can you clarify them?
And are there scenarios I'm forgetting?
For example a view in the lakehouse with our business logic, that can be accessed and processed by spark.
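Something along these lines (a sketch; the view and table names are made up):

# Sketch: keep the business logic as a view in the lakehouse,
# then let the generic Spark load step materialise it.
spark.sql("""
CREATE OR REPLACE VIEW silver_sales_v AS
SELECT id, CAST(amount AS DECIMAL(18, 2)) AS amount
FROM bronze_sales
WHERE amount IS NOT NULL
""")

spark.table("silver_sales_v").write.format("delta").mode("overwrite").saveAsTable("silver_sales")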
Hi @Broeks,
I think the second one would be better; a Spark job definition is not well suited to handling these logic operations. Collecting this processing and functionality into libraries will make it easier to manage and reuse.
Regards,
Xiaoxin Sheng