Hi Community,
In Microsoft Fabric I have a lakehouse that holds the raw data landed from other sources. Now I want to build aggregate tables on top of this data in a warehouse using a notebook.
The lakehouse and the warehouse are in the same Fabric workspace.
I want to perform a full load (truncate and load) operation in the warehouse using the notebook. How do I do that?
Can anyone explain how to connect to the warehouse in Fabric from a notebook and write data to it?
Hi @Shanthan118 ,
Thanks for the reply from @AlexanderPowBI .
Yes, you can build an aggregate table in the warehouse on top of the lakehouse data, but once created, the aggregate table belongs to the warehouse, so it is not visible in the SQL analytics endpoint of the lakehouse.
To schedule data transfer and keep the aggregate table updated, you can use Dataflow Gen2 in Microsoft Fabric.
After selecting the Lakehouse in Get Data, you can perform the aggregation, set the destination to the Warehouse, and configure a scheduled refresh after publishing, so that when your Lakehouse is updated, the aggregate table is refreshed as well.
Best Regards,
Yang
Community Support Team
If any post helps, please consider accepting it as the solution to help other members find it more quickly.
If I have misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!
Hi @Shanthan118 ,
Thanks for the reply from @frithjof_v / @AlexanderPowBI .
Transferring data from a Lakehouse to a Warehouse using a notebook can be a little difficult, so here is a more convenient approach.
Once you're in the warehouse in Fabric, select the place I've highlighted.
Select the Lakehouse you want to connect to.
Now it's time to use your connected Lakehouse.
If you have any other questions please feel free to contact me.
Best Regards,
Yang
Community Support Team
If any post helps, please consider accepting it as the solution to help other members find it more quickly.
If I have misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!
Hi @v-huijiey-msft
First of all, thanks for the solution.
From the solution you gave, what I understood is that I can connect the lakehouse that has the required tables directly to the warehouse and build aggregate tables on top of that warehouse.
Is my understanding correct?
If we create an aggregate table, can this table be accessed from the Lakehouse SQL endpoint?
Also, how can we schedule the data transfer to keep the aggregate tables updated? And will changes in the lakehouse data flow through to the warehouse? For example, suppose I have 20 records when I first connect and later the record count grows to 40; will all 40 records be available in the warehouse table or not?
Just a heads-up that TRUNCATE is not a supported operation as of now:
https://learn.microsoft.com/en-us/fabric/data-warehouse/tsql-surface-area
For your question, I have used a stored procedure and scheduled it in a pipeline for this data movement. However, you can also do it along the lines sketched below.
//Alexander
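To illustrate the idea, here is a minimal sketch of a full load that avoids TRUNCATE by using DELETE and then rebuilds the aggregate from the lakehouse table with a cross-database INSERT ... SELECT. It assumes you already have a pyodbc connection to the warehouse SQL endpoint (for example via the service principal code later in this thread), that the lakehouse and warehouse sit in the same workspace so the lakehouse table can be referenced with three-part naming, and that all object names (dbo.agg_sales, MyLakehouse.dbo.sales, the columns) are placeholders.

def full_load_aggregate(conn):
    # Full load without TRUNCATE: empty the target table with DELETE,
    # then rebuild it from the lakehouse table via three-part naming.
    # All table and column names here are placeholders.
    cursor = conn.cursor()
    cursor.execute("DELETE FROM dbo.agg_sales;")
    cursor.execute("""
        INSERT INTO dbo.agg_sales (product_id, total_amount)
        SELECT product_id, SUM(amount)
        FROM MyLakehouse.dbo.sales
        GROUP BY product_id;
    """)
    conn.commit()  # pyodbc connections are not autocommit by default
    cursor.close()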
Hi @AlexanderPowBI
Thanks for the solution,
Can you please provide a script or the steps to connect to the warehouse from Fabric using the service principal and the SQL connection string with pyodbc?
Hi,
Here is one example I have used.
First, create a service principal and make sure it has the right permissions and workspace/warehouse access.
Below is example code I have used to get access using a service principal (I followed this article: https://debruyn.dev/2023/connect-to-fabric-lakehouses-warehouses-from-python-code/). You can modify this and use it in a Fabric notebook. It is also a good idea to store the service principal secret in Key Vault and retrieve it from there.
//Alexander
import logging
import os
import struct
from itertools import chain, repeat

import pyodbc
from azure.identity import ClientSecretCredential

def get_database_connection():
    try:
        tenant_id = os.getenv('TENANT_ID')
        client_id = os.getenv('CLIENT_ID')      # Service principal application (client) ID
        client_secret = os.getenv('SP_SECRET')  # Service principal secret
        sql_endpoint = os.getenv('YOUR_SQL_ENDPOINT')
        database = os.getenv('DATABASE_NAME')

        credential = ClientSecretCredential(tenant_id, client_id, client_secret)
        token_object = credential.get_token("https://database.windows.net//.default")  # Retrieve an access token valid to connect to SQL databases
        token_as_bytes = bytes(token_object.token, "UTF-8")  # Convert the token to a UTF-8 byte string
        encoded_bytes = bytes(chain.from_iterable(zip(token_as_bytes, repeat(0))))  # Encode the bytes to a Windows byte string
        token_bytes = struct.pack("<i", len(encoded_bytes)) + encoded_bytes  # Package the token into a bytes object

        connection_string = f"Driver={{ODBC Driver 18 for SQL Server}};Server={sql_endpoint},1433;Database={database};Encrypt=Yes;TrustServerCertificate=No"
        attrs_before = {1256: token_bytes}  # 1256 = SQL_COPT_SS_ACCESS_TOKEN, passes the access token to the driver
        return pyodbc.connect(connection_string, attrs_before=attrs_before)
    except Exception as e:
        logging.error(f"Error in get_database_connection: {str(e)}")
        raise
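For example, a minimal way to use this connection from a Fabric notebook could look like the sketch below. Whereas the earlier sketch pushes the aggregation into the warehouse with a cross-database query, here the aggregation is done in Spark and the result is written through pyodbc; agg_df, the target table dbo.agg_sales and its columns are placeholder names.

# Minimal usage sketch (placeholder names throughout): compute the aggregate in Spark,
# then do a full load of the warehouse table through the pyodbc connection above.
agg_df = spark.sql("""
    SELECT product_id, SUM(amount) AS total_amount
    FROM sales
    GROUP BY product_id
""")  # 'spark' is the session available in a Fabric notebook; 'sales' is a placeholder lakehouse table

conn = get_database_connection()
cursor = conn.cursor()

cursor.execute("DELETE FROM dbo.agg_sales;")  # DELETE instead of TRUNCATE, which is not supported
cursor.executemany(
    "INSERT INTO dbo.agg_sales (product_id, total_amount) VALUES (?, ?);",
    [(row["product_id"], row["total_amount"]) for row in agg_df.collect()],  # collect() only for small aggregates
)

conn.commit()
cursor.close()
conn.close()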
I don't think it's possible to write data to a Fabric Data Warehouse by using a notebook.
I think you can only use T-SQL (scripts, stored procedures) inside the Data Warehouse interface, or use a Data Pipeline or Dataflow Gen2.
However, I am interested if anyone knows how to write data to a Fabric Data Warehouse by using a notebook.