I am trying to connect to and read a SharePoint Excel file from a notebook using PySpark, but I cannot find any tutorial that shows how to do this properly.
Any help would be appreciated.
I just want to connect to and read an Excel file stored in SharePoint, so I don't have to import it with a dataflow and create a table, or import it into the lakehouse and refresh the dataflow every time I make a change.
Solved! Go to Solution.
The solution requires going into the Azure portal, creating an app registration, and granting it read permissions for Microsoft Graph and SharePoint (for example, the Sites.Read.All application permission in Microsoft Graph).
From that app registration you take the tenant ID, client ID, and client secret.
Once that's done, you can run the following code:
import requests
import pandas as pd
from io import BytesIO

# Authentication details
tenant_id = "enter-your-tenant-id"
client_id = "enter-your-client-id"
client_secret = "enter-your-client-secret"
sharepoint_domain = "enter-your-sharepoint-domain"  # e.g. contoso.sharepoint.com
site_name = "enter-your-site"
file_path = "path/to/excel.xlsx"  # Path relative to the drive root (no leading slash)

# Step 1: Get an access token using the Microsoft identity platform OAuth 2.0 client credentials flow
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
token_data = {
    "grant_type": "client_credentials",
    "client_id": client_id,
    "client_secret": client_secret,
    "scope": "https://graph.microsoft.com/.default"
}
response = requests.post(token_url, data=token_data)
response.raise_for_status()  # Raise an error if the request fails
access_token = response.json().get("access_token")

# Debugging: print a token preview to make sure it's not empty
print("Access token received:", access_token[:50], "...")

# Step 2: Retrieve the SharePoint site ID
headers = {"Authorization": f"Bearer {access_token}"}
site_url = f"https://graph.microsoft.com/v1.0/sites/{sharepoint_domain}:/sites/{site_name}"
response = requests.get(site_url, headers=headers)
response.raise_for_status()
site_id = response.json()["id"]
print(f"Retrieved site ID: {site_id}")

# Step 3: Retrieve the file content from SharePoint using the Graph API
file_url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drive/root:/{file_path}:/content"
response = requests.get(file_url, headers=headers)
if response.status_code != 200:
    raise Exception(f"Failed to download file: {response.text}")
print("File successfully downloaded from SharePoint!")

# Step 4: Read the desired sheet from the downloaded Excel file
xls = BytesIO(response.content)  # Wrap the response content in an in-memory file
df = pd.read_excel(xls, sheet_name="your-sheet-name")
print("Excel fetched successfully!\n")
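If you also want to work with the result in PySpark or persist it in the Lakehouse (avoiding the dataflow refresh mentioned in the question), a minimal sketch is below; the table name sharepoint_excel is just an example, and it assumes the notebook has a default Lakehouse attached:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Convert the pandas DataFrame from the previous step into a Spark DataFrame
spark_df = spark.createDataFrame(df)

# Save it as a Delta table in the attached Lakehouse (example table name)
spark_df.write.mode("overwrite").saveAsTable("sharepoint_excel")

Rerunning the notebook then re-reads the file straight from SharePoint, so there is no dataflow to refresh.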
Hi @CrhIT ,
Thanks for reaching out to the Microsoft fabric community forum.
As @nilendraFabric suggested, have you had an opportunity to review the information provided? Please feel free to contact us if you have any further questions. If my response has addressed your query, please accept it as a solution and give it a 'Kudos' so other members can easily find it.
Thank you.
Hello @CrhIT ,
I wanted to inquire if you had the chance to review the provided information. Please do not hesitate to reach out if you have any additional questions. If the issue is resolved, kindly accept it as a solution and give a 'Kudos' so other members can easily find it.
Thank you.
Hi @CrhIT ,
May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will help other community members with similar problems solve them faster.
Thank you.
Hello @CrhIT
Here is some very basic code. It is not production ready, since the secrets are in plain text, but it's good for trying out the logic.
import os
import requests

# Authentication details
tenant_id = "your_tenant_id"
client_id = "your_client_id"
client_secret = "your_client_secret"
sharepoint_site = "https://yourtenant.sharepoint.com/sites/yoursite"
file_path = "/Shared Documents/yourfile.xlsx"

# Step 1: Get an access token
# The token audience must match the API being called. This example calls the
# SharePoint REST API, so request a SharePoint-scoped token rather than a Graph one.
# (Note: SharePoint app-only access may require certificate credentials instead of a
# client secret; if this step is rejected, the Graph-based approach above works with a secret.)
auth_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
auth_data = {
    "grant_type": "client_credentials",
    "client_id": client_id,
    "client_secret": client_secret,
    "scope": "https://yourtenant.sharepoint.com/.default"
}
response = requests.post(auth_url, data=auth_data)
if response.status_code != 200:
    raise Exception(f"Failed to authenticate: {response.text}")
access_token = response.json().get("access_token")

# Step 2: Download the file from SharePoint
headers = {"Authorization": f"Bearer {access_token}"}
file_url = f"{sharepoint_site}/_api/web/getfilebyserverrelativeurl('{file_path}')/$value"
response = requests.get(file_url, headers=headers)
if response.status_code != 200:
    raise Exception(f"Failed to download file: {response.text}")

# Step 3: Save the file to OneLake through the default Lakehouse mount
# The default Lakehouse is mounted at /lakehouse/default in the notebook session,
# so the content can be written with ordinary binary file I/O
# (decoding the .xlsx bytes as text would corrupt the file).
one_lake_path = "/lakehouse/default/Files/subfolder/yourfile.xlsx"
os.makedirs(os.path.dirname(one_lake_path), exist_ok=True)  # Create the subfolder if it doesn't exist
with open(one_lake_path, "wb") as f:
    f.write(response.content)

print(f"File downloaded and saved to OneLake at: {one_lake_path}")
I haven't tried it myself. Once the data is written to Fabric, read it back with pandas:

import pandas as pd

# Path to the file saved in the Lakehouse (via the local mount used above)
file_path = "/lakehouse/default/Files/subfolder/yourfile.xlsx"
# Read the Excel file using pandas
df = pd.read_excel(file_path)
# Display the first few rows of the data
print(df.head())
See if this works.
Thanks
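To double-check that the download step actually landed the file in the Lakehouse before reading it back, you can list the folder. A small sketch, assuming the default Lakehouse is attached and the subfolder name used above:

from notebookutils import mssparkutils

# List the contents of the subfolder under the default Lakehouse's Files area
for item in mssparkutils.fs.ls("Files/subfolder"):
    print(item.name, item.size)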