In Microsoft Fabric's 'Synapse Data Engineering' experience we can create a Notebook that uses Python (more specifically the PySpark library), which we can use to process large volumes of data.
How can I programmatically, in a Notebook, connect to an SAP HANA database and fetch, for example, one table to insert into OneLake?
Here are some resources that I found, but I still feel they are not quite what I want.
1. Efficient Data Ingestion with Spark and Microsoft Fabric Notebook | by Satya Nerurkar | Medium
(This one, for example, connects to Azure Blob Storage rather than directly to the SAP database.)
2. Use Apache Spark in Microsoft Fabric - Training | Microsoft Learn
(There's also this, but it's a generic introduction to Apache Spark, not specifically about connecting to SAP HANA.)
3. Accessing SAP HANA Data lake Relational Engine using Apache Spark | SAP Tutorials
(This one seems to be closer to what I was expecting to do; however, Step 1 seems unnecessary in my case, while Step 2 seems to be the actual code that makes the connection to the database.)
If you could provide any more resources or code that simply connects to the SAP HANA database, fetches a table, and then uploads it to OneLake, I would appreciate it.
Thank you in advance.
Solved! Go to Solution.
Hi @SergioSurfer,
Perhaps you can try using JDBC with the SAP HANA driver (ngdbc.jar) to connect to the SAP HANA data source:
from pyspark.sql import SparkSession

# Initialize the Spark session (in a Fabric notebook, `spark` already exists)
spark = SparkSession.builder \
    .appName("SAP HANA") \
    .getOrCreate()

# SAP HANA connection properties; the SAP JDBC driver (ngdbc.jar) must be
# available to the Spark cluster for this driver class to resolve
hana_url = "jdbc:sap://<SAP_HANA_HOST>:<PORT>"
hana_properties = {
    "user": "<USERNAME>",
    "password": "<PASSWORD>",
    "driver": "com.sap.db.jdbc.Driver"
}

# Load a table from SAP HANA into a Spark DataFrame
df = spark.read.jdbc(url=hana_url, table="<TABLE_NAME>", properties=hana_properties)

# Show the data
df.show()
Regards,
Xiaoxin Sheng
Hi @SergioSurfer,
Is this something that could work for you:
Do note that this applies to the cloud version of SAP HANA. If you are using the on-premises version, you should first install the Fabric on-premises data gateway. Using the gateway from a notebook is not supported yet, so you will have to use a Copy activity in a data pipeline instead.
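One extra note on the cloud version: SAP HANA Cloud only accepts TLS connections (on port 443), so the JDBC URL needs the driver's encryption options. A small, hypothetical helper for building such a URL (the host is a placeholder):

```python
def hana_cloud_jdbc_url(host: str, port: int = 443) -> str:
    # encrypt and validateCertificate are standard SAP JDBC driver options;
    # SAP HANA Cloud rejects unencrypted connections.
    return f"jdbc:sap://{host}:{port}/?encrypt=true&validateCertificate=true"

url = hana_cloud_jdbc_url("<your-instance>.hanacloud.ondemand.com")
print(url)
```

The resulting URL can be passed as hana_url in the JDBC snippet above, with the rest of the properties unchanged.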