To get started, let's prepare our environment. To enable communication between Databricks and Fabric, the first step is to create an Azure Databricks Premium tier resource, and the second step is to ensure two things on our cluster:
1) Use an “unrestricted” or “power user compute” policy.
2) Make sure that Databricks can pass our credentials through to Spark (credential passthrough).
This can be enabled in the advanced options.
NOTE: I won’t go into further detail about cluster creation. I’ll leave the rest of the cluster options for you to explore, or assume you’re already familiar with them if you’re reading this post. If you can’t see these settings, make sure to turn off the "Simple form".
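For reference, the credential passthrough toggle in the advanced options corresponds roughly to a Spark configuration like the one below. This is a sketch, not the exact recipe: the key and where you set it (UI toggle vs. the cluster's Spark config box) may depend on your Databricks runtime.

# Assumed cluster Spark config that enables Azure AD credential passthrough
spark.databricks.passthrough.enabled true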
Once our cluster is created, we’re going to create a notebook and start reading data in Fabric.
We’ll achieve this using ABFS (Azure Blob Filesystem), an open storage addressing scheme whose driver is included in Azure Databricks.
The address is composed of something like the following string:
oneLakePath ='abfss://myWorkspaceId@onelake.dfs.fabric.microsoft.com/myLakehouse.lakehouse/Files/'
Knowing this path, we can start working as usual.
Let’s look at a simple notebook that reads a Parquet file from a Fabric Lakehouse.
Thanks to the cluster configuration, the process is as simple as a spark.read:
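Here’s a minimal sketch, assuming a Parquet file called sample_data.parquet sitting under Files/ in the Lakehouse (the workspace, lakehouse, and file names are placeholders to replace with your own):

# Build the OneLake path to the Lakehouse Files area
oneLakePath = 'abfss://myWorkspaceId@onelake.dfs.fabric.microsoft.com/myLakehouse.lakehouse/Files/'

# Read the Parquet file through the ABFS driver; spark is already available in a Databricks notebook
df = spark.read.parquet(oneLakePath + 'sample_data.parquet')
df.show(5)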
Writing is just as simple.
After cleaning up the unnecessary columns, a simple [DataFrame].write gives us a clean silver table. We then go to Fabric and find it in our Lakehouse.
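A minimal sketch of that write, assuming a couple of hypothetical columns to drop and an illustrative table name (silver_sales); writing Delta under the Lakehouse Tables/ area is what makes the result show up as a table in Fabric:

# Drop columns we no longer need for the silver layer (column names are illustrative)
df_silver = df.drop('raw_payload', 'ingestion_date_raw')

# Write the cleaned DataFrame as a Delta table under the Lakehouse Tables area
tablesPath = 'abfss://myWorkspaceId@onelake.dfs.fabric.microsoft.com/myLakehouse.lakehouse/Tables/'
df_silver.write.format('delta').mode('overwrite').save(tablesPath + 'silver_sales')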
This concludes our Databricks processing against Fabric’s Lakehouse, but not the article. We haven’t yet covered Fabric’s other type of storage on this blog, but let’s mention what’s relevant to this post.
Fabric Warehouses are also built on this next-generation lake structure. Their main difference is that they offer a 100% SQL-based user experience, as if we were working in a traditional database; behind the scenes, however, we’ll still find Delta, acting as a Spark catalog or metastore.
The path should look something like this:
path_dw = "abfss://WorkspaceName@onelake.dfs.fabric.microsoft.com/WarehouseName.Datawarehouse/Tables/dbo/"
Since Fabric aims to store Delta-format content both in its Lakehouse Spark catalog (Tables) and in its Warehouse, we’ll read it as shown in the following example:
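A minimal sketch, assuming a Warehouse table called myTable in the dbo schema (the workspace, warehouse, and table names are placeholders):

# Warehouse tables are stored as Delta under Tables/<schema>/, so we read them with the delta format
path_dw = "abfss://WorkspaceName@onelake.dfs.fabric.microsoft.com/WarehouseName.Datawarehouse/Tables/dbo/"
df_wh = spark.read.format('delta').load(path_dw + 'myTable')
df_wh.show(5)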
Now this does conclude our article, showing how we can use Databricks to work with Fabric’s storage options.
Original post from LaDataWeb, in Spanish.