Solved: How to read sharepoint excel with Notebooks using ...

CrhIT · ‎01-23-2025

I was trying to connect and read a sharepoint excel file using notebooks with PySpark, but I cannot find any tutorial that ables me to perform this accurately.

Please any help would be appreciate it.

It's just conenct and read an excel file in sharepoint so I don't have to import it using dataflow and create a table or importing it on the lakehouse and refresh the dataflow everytime i do a change.

CrhIT · ‎02-05-2025

The solution needs to be done by entering to azure portal, create an app, give permitions of reader in mivcrosfot graph and sharepoint.

From that app you will take tenant id, client secret, client id.

When that's done you can run the following code:

# Authentication details
tenant_id = "enter-your-tenant-id"
client_id = "enter-your-client-id"
client_secret = "enter-your-client-secret"
sharepoint_domain = "enter-your-sharepoint-domain"
site_name = "enter-your-site"
file_path = "/path/to/excel.xlsx"  # Corrected path

# Step 1: Get an access token using Microsoft Identity OAuth2.0
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
token_data = {
    "grant_type": "client_credentials",
    "client_id": "client_id",
    "client_secret": "client_secret",
    "scope": "https://graph.microsoft.com/.default"
}

response = requests.post(token_url, data=token_data)
response.raise_for_status()  # Raise error if request fails
access_token = response.json().get("access_token")

# Debugging: Print token preview to ensure it's not empty
print(" Access Token Received:", access_token[:50], "...")

# Step 2: Retrieve the SharePoint Site ID
headers = {"Authorization": f"Bearer {access_token}"}
site_url = f"https://graph.microsoft.com/v1.0/sites/{sharepoint_domain}:/sites/{site_name}"
response = requests.get(site_url, headers=headers)
response.raise_for_status()
site_id = response.json()["id"]
print(f" Retrieved Site ID: {site_id}")

# Step 3: Retrieve the File Content from SharePoint using Graph API
file_url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drive/root:/{file_path}:/content"
response = requests.get(file_url, headers=headers)

if response.status_code != 200:
    raise Exception(f" Failed to download file: {response.text}")

print(" File successfully downloaded from SharePoint!")

# Step 4: Read the specific sheet "LH_Mapping" from the downloaded Excel file
xls = BytesIO(response.content)  # Convert the response content into an in-memory file
df = pd.read_excel(xls, sheet_name="sheet-name") # Trinitec Share Point PLUs
print(' Excel fetched succesfully!\n')

View solution in original post

CrhIT · ‎02-05-2025

The solution needs to be done by entering to azure portal, create an app, give permitions of reader in mivcrosfot graph and sharepoint.

From that app you will take tenant id, client secret, client id.

When that's done you can run the following code:

# Authentication details
tenant_id = "enter-your-tenant-id"
client_id = "enter-your-client-id"
client_secret = "enter-your-client-secret"
sharepoint_domain = "enter-your-sharepoint-domain"
site_name = "enter-your-site"
file_path = "/path/to/excel.xlsx"  # Corrected path

# Step 1: Get an access token using Microsoft Identity OAuth2.0
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
token_data = {
    "grant_type": "client_credentials",
    "client_id": "client_id",
    "client_secret": "client_secret",
    "scope": "https://graph.microsoft.com/.default"
}

response = requests.post(token_url, data=token_data)
response.raise_for_status()  # Raise error if request fails
access_token = response.json().get("access_token")

# Debugging: Print token preview to ensure it's not empty
print(" Access Token Received:", access_token[:50], "...")

# Step 2: Retrieve the SharePoint Site ID
headers = {"Authorization": f"Bearer {access_token}"}
site_url = f"https://graph.microsoft.com/v1.0/sites/{sharepoint_domain}:/sites/{site_name}"
response = requests.get(site_url, headers=headers)
response.raise_for_status()
site_id = response.json()["id"]
print(f" Retrieved Site ID: {site_id}")

# Step 3: Retrieve the File Content from SharePoint using Graph API
file_url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drive/root:/{file_path}:/content"
response = requests.get(file_url, headers=headers)

if response.status_code != 200:
    raise Exception(f" Failed to download file: {response.text}")

print(" File successfully downloaded from SharePoint!")

# Step 4: Read the specific sheet "LH_Mapping" from the downloaded Excel file
xls = BytesIO(response.content)  # Convert the response content into an in-memory file
df = pd.read_excel(xls, sheet_name="sheet-name") # Trinitec Share Point PLUs
print(' Excel fetched succesfully!\n')

v-tsaipranay · ‎01-26-2025

Hi @CrhIT ,

Thanks for reaching out to the Microsoft fabric community forum.

As @nilendraFabric suggested, have you got an opportunity to review the information provided. Please feel free to contact us if you have any further questions. If my response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.

Thank you.

v-tsaipranay · ‎01-30-2025

Hello @CrhIT ,

I wanted to inquire if you had the chance to review the provided information. Please do not hesitate to reach out if you have any additional questions. If the issue is resolved, kindly accept it as a solution and give a 'Kudos' so other members can easily find it.

Thank you.

v-tsaipranay · ‎02-02-2025

Hi @CrhIT ,

May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will be helpful for other community members who have similar problems to solve it faster.

Thank you.

nilendraFabric · ‎01-23-2025

Hello @CrhIT

Here is very basic code, not production ready as secrets are in plain text , but good to try the logic

import requests
from notebookutils import mssparkutils

# Authentication details
tenant_id = "your_tenant_id"
client_id = "your_client_id"
client_secret = "your_client_secret"
sharepoint_site = "https://yourtenant.sharepoint.com/sites/yoursite"
file_path = "/Shared Documents/yourfile.xlsx"

# Step 1: Get access token
auth_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
auth_data = {
"grant_type": "client_credentials",
"client_id": client_id,
"client_secret": client_secret,
"scope": "https://graph.microsoft.com/.default"
}

response = requests.post(auth_url, data=auth_data)
if response.status_code != 200:
raise Exception(f"Failed to authenticate: {response.text}")

access_token = response.json().get("access_token")

# Step 2: Download file from SharePoint
headers = {"Authorization": f"Bearer {access_token}"}
file_url = f"{sharepoint_site}/_api/web/getfilebyserverrelativeurl('{file_path}')/$value"

response = requests.get(file_url, headers=headers)
if response.status_code != 200:
raise Exception(f"Failed to download file: {response.text}")

# Step 3: Save file to OneLake using notebookutils
# Ensure the path includes a subfolder under /Files
one_lake_path = "/lakehouse/default/Files/subfolder/yourfile.xlsx" # Path relative to Lakehouse Files folder

# Create the subfolder if it doesn't exist
mssparkutils.fs.mkdirs("/lakehouse/default/Files/subfolder")

# Write binary content directly to OneLake
mssparkutils.fs.put(one_lake_path, response.content.decode("latin1"), overwrite=True) # Decode binary content as a string
print(f"File downloaded and saved to OneLake at: {one_lake_path}")

I haven't tried it by my own. Once data is written in Fabric read it using Pandas

from pyspark.sql import SparkSession

file_path = "abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>/Files/yourfile.csv"

# Read the Excel file using Pandas df = pd.read_excel(file_path)

# Display the first few rows of the data

print(df.head())

see if this works.

Thanks

How to read sharepoint excel with Notebooks using PySpark

Helpful resources

Fabric Monthly Update - July 2025

Fabric Community Update - July 2025

Join us at FabCon Vienna from September 15-18, 2025

How to read sharepoint excel with Notebooks using PySpark

Helpful resources

Fabric Monthly Update - July 2025

Fabric Community Update - July 2025