Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Join us at FabCon Vienna from September 15-18, 2025, for the ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM. Get registered

Reply
CrhIT
Frequent Visitor

How to read sharepoint excel with Notebooks using PySpark

I was trying to connect and read a sharepoint excel file using notebooks with PySpark, but I cannot find any tutorial that ables me to perform this accurately.

 

Please any help would be appreciate it.

It's just conenct and read an excel file in sharepoint so I don't have to import it using dataflow and create a table or importing it on the lakehouse and refresh the dataflow everytime i do a change.

1 ACCEPTED SOLUTION
CrhIT
Frequent Visitor

The solution needs to be done by entering to azure portal, create an app, give permitions of reader in mivcrosfot graph and sharepoint.

 

From that app you will take tenant id, client secret, client id.

When that's done you can run the following code:

 

# Authentication details
tenant_id = "enter-your-tenant-id"
client_id = "enter-your-client-id"
client_secret = "enter-your-client-secret"
sharepoint_domain = "enter-your-sharepoint-domain"
site_name = "enter-your-site"
file_path = "/path/to/excel.xlsx"  # Corrected path

# Step 1: Get an access token using Microsoft Identity OAuth2.0
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
token_data = {
    "grant_type": "client_credentials",
    "client_id": "client_id",
    "client_secret": "client_secret",
    "scope": "https://graph.microsoft.com/.default"
}

response = requests.post(token_url, data=token_data)
response.raise_for_status()  # Raise error if request fails
access_token = response.json().get("access_token")

# Debugging: Print token preview to ensure it's not empty
print(" Access Token Received:", access_token[:50], "...")

# Step 2: Retrieve the SharePoint Site ID
headers = {"Authorization": f"Bearer {access_token}"}
site_url = f"https://graph.microsoft.com/v1.0/sites/{sharepoint_domain}:/sites/{site_name}"
response = requests.get(site_url, headers=headers)
response.raise_for_status()
site_id = response.json()["id"]
print(f" Retrieved Site ID: {site_id}")

# Step 3: Retrieve the File Content from SharePoint using Graph API
file_url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drive/root:/{file_path}:/content"
response = requests.get(file_url, headers=headers)

if response.status_code != 200:
    raise Exception(f" Failed to download file: {response.text}")

print(" File successfully downloaded from SharePoint!")

# Step 4: Read the specific sheet "LH_Mapping" from the downloaded Excel file
xls = BytesIO(response.content)  # Convert the response content into an in-memory file
df = pd.read_excel(xls, sheet_name="sheet-name") # Trinitec Share Point PLUs
print(' Excel fetched succesfully!\n')

 



View solution in original post

5 REPLIES 5
CrhIT
Frequent Visitor

The solution needs to be done by entering to azure portal, create an app, give permitions of reader in mivcrosfot graph and sharepoint.

 

From that app you will take tenant id, client secret, client id.

When that's done you can run the following code:

 

# Authentication details
tenant_id = "enter-your-tenant-id"
client_id = "enter-your-client-id"
client_secret = "enter-your-client-secret"
sharepoint_domain = "enter-your-sharepoint-domain"
site_name = "enter-your-site"
file_path = "/path/to/excel.xlsx"  # Corrected path

# Step 1: Get an access token using Microsoft Identity OAuth2.0
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
token_data = {
    "grant_type": "client_credentials",
    "client_id": "client_id",
    "client_secret": "client_secret",
    "scope": "https://graph.microsoft.com/.default"
}

response = requests.post(token_url, data=token_data)
response.raise_for_status()  # Raise error if request fails
access_token = response.json().get("access_token")

# Debugging: Print token preview to ensure it's not empty
print(" Access Token Received:", access_token[:50], "...")

# Step 2: Retrieve the SharePoint Site ID
headers = {"Authorization": f"Bearer {access_token}"}
site_url = f"https://graph.microsoft.com/v1.0/sites/{sharepoint_domain}:/sites/{site_name}"
response = requests.get(site_url, headers=headers)
response.raise_for_status()
site_id = response.json()["id"]
print(f" Retrieved Site ID: {site_id}")

# Step 3: Retrieve the File Content from SharePoint using Graph API
file_url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drive/root:/{file_path}:/content"
response = requests.get(file_url, headers=headers)

if response.status_code != 200:
    raise Exception(f" Failed to download file: {response.text}")

print(" File successfully downloaded from SharePoint!")

# Step 4: Read the specific sheet "LH_Mapping" from the downloaded Excel file
xls = BytesIO(response.content)  # Convert the response content into an in-memory file
df = pd.read_excel(xls, sheet_name="sheet-name") # Trinitec Share Point PLUs
print(' Excel fetched succesfully!\n')

 



v-tsaipranay
Community Support
Community Support

Hi @CrhIT ,

 

Thanks for reaching out to the Microsoft fabric community forum.

 

As @nilendraFabric suggested, have you got an opportunity to review the information provided. Please feel free to contact us if you have any further questions. If my response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.


Thank you.

Hello @CrhIT ,


I wanted to inquire if you had the chance to review the provided information. Please do not hesitate to reach out if you have any additional questions. If the issue is resolved, kindly accept it as a solution and give a 'Kudos' so other members can easily find it.

Thank you.

Hi @CrhIT ,

 

May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will be helpful for other community members who have similar problems to solve it faster.

 

Thank you.

nilendraFabric
Community Champion
Community Champion

Hello @CrhIT 

Here is very basic code, not production ready as secrets are in plain text , but good to try the logic 

 

import requests
from notebookutils import mssparkutils

 

# Authentication details
tenant_id = "your_tenant_id"
client_id = "your_client_id"
client_secret = "your_client_secret"
sharepoint_site = "https://yourtenant.sharepoint.com/sites/yoursite"
file_path = "/Shared Documents/yourfile.xlsx"

 

 

# Step 1: Get access token
auth_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
auth_data = {
"grant_type": "client_credentials",
"client_id": client_id,
"client_secret": client_secret,
"scope": "https://graph.microsoft.com/.default"
}

 

 

response = requests.post(auth_url, data=auth_data)
if response.status_code != 200:
raise Exception(f"Failed to authenticate: {response.text}")

 

 

access_token = response.json().get("access_token")

 

 

# Step 2: Download file from SharePoint
headers = {"Authorization": f"Bearer {access_token}"}
file_url = f"{sharepoint_site}/_api/web/getfilebyserverrelativeurl('{file_path}')/$value"

response = requests.get(file_url, headers=headers)
if response.status_code != 200:
raise Exception(f"Failed to download file: {response.text}")

 

 

# Step 3: Save file to OneLake using notebookutils
# Ensure the path includes a subfolder under /Files
one_lake_path = "/lakehouse/default/Files/subfolder/yourfile.xlsx" # Path relative to Lakehouse Files folder

# Create the subfolder if it doesn't exist
mssparkutils.fs.mkdirs("/lakehouse/default/Files/subfolder")

 

 

# Write binary content directly to OneLake
mssparkutils.fs.put(one_lake_path, response.content.decode("latin1"), overwrite=True) # Decode binary content as a string
print(f"File downloaded and saved to OneLake at: {one_lake_path}")

 

I haven't tried it by my own. Once data is written in Fabric read it using Pandas

from pyspark.sql import SparkSession


file_path = "abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>/Files/yourfile.csv"

# Read the Excel file using Pandas df = pd.read_excel(file_path)

# Display the first few rows of the data

print(df.head())

see if this works.

Thanks 

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

May FBC25 Carousel

Fabric Monthly Update - May 2025

Check out the May 2025 Fabric update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.