VickyDev18
Advocate II

How to load an Excel file from SharePoint into a Lakehouse Delta table using Notebooks

I have a file located in SharePoint and I want to load its contents into a Lakehouse. The file can have multiple sheets, so I need to load each sheet into a separate table. 

 

I couldn't find any reference article that provided the steps to do this. 

 

Even if I upload the file into the Lakehouse itself and just try to read it, I can't get this to work:

df = (spark.read.format("com.crealytics.spark.excel")
    .option("location", "Files/Test.xlsx")
    .option("useHeader", "true")
    .option("sheetName", "test_sheet")
    .load()
    )

display(df)

 

It gives an error "[DATA_SOURCE_NOT_FOUND] Failed to find the data source: com.crealytics.spark.excel."

 

Note: I know I can do this using Dataflows; I just want to know how I would do it using Notebooks. 

1 ACCEPTED SOLUTION
HimanshuS-msft
Microsoft Employee

Hello @VickyDev18 
I can answer the part about reading the Excel file from a Lakehouse: I tried the piece of code below and it worked. I am using pandas here.

import pandas as pd
# Load data into a pandas DataFrame from "/lakehouse/default/" + "Files/test2/Book1.xlsx"
df = pd.read_excel("/lakehouse/default/" + "Files/test2/Book1.xlsx")
display(df)
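
If you also need each sheet loaded into its own Delta table (as the original question asks), a minimal sketch extending this pandas approach could look like the following; the file path and the table-naming rule are illustrative assumptions, not tested against your workspace.

import pandas as pd

# sheet_name=None reads every sheet into a dict of {sheet_name: DataFrame}.
# The path below is a hypothetical example.
sheets = pd.read_excel("/lakehouse/default/Files/Test.xlsx", sheet_name=None)

for sheet_name, pdf in sheets.items():
    # Convert each pandas DataFrame to Spark and save it as its own Delta table.
    sdf = spark.createDataFrame(pdf)
    table_name = sheet_name.lower().replace(" ", "_")  # sanitize the sheet name
    sdf.write.mode("overwrite").format("delta").saveAsTable(table_name)

Note that spark.createDataFrame can be picky about mixed-type or all-null columns, so you may need to cast columns explicitly before saving.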

 


HTH 
Himanshu 




6 REPLIES
sruthiramana
Frequent Visitor

Hello, I am working on a similar issue. I am able to read multiple sheets using pandas as mentioned above when the file is in the Lakehouse, but is there an automated way to pick the file up from SharePoint using a pipeline so that it can then be read in a notebook? Please let me know if you have found a solution to this problem.

 

Thanks,

Sruthi

@sruthiramana I think a good starting point for your need could be this article: Automating Data Ingestion from SharePoint to Azure Lakehouse using PySpark and Microsoft Graph API.

It took me several weeks to develop an automated, general-purpose solution to ingest SharePoint files into a Lakehouse suitable for an enterprise environment, and the above article was my starting point.

The main challenge was permissions: SharePoint and Fabric are two different and unconnected worlds, and there are no shortcuts!

To develop this solution you need to be familiar with app registration in Entra ID and with the Microsoft Graph API (I had to learn how to work with Postman to debug my code), and you need the support of a SharePoint tenant admin and an Azure tenant admin to grant the proper permissions to your "automated solution".

And last, but certainly not least, another big issue was that all the official Microsoft documentation on the topic (as of December 2024) was contradictory or outdated.

Below is the rationale for the solution I developed; if you need further information, just let me know.

 

Copy files from SharePoint to Lakehouse

How to copy all files contained in a SharePoint Online folder into a Fabric Lakehouse folder

Use Case

Sometimes users rely on custom classifications for departmental or product data analysis, or for other specific investigations, that integrate or complete the official company classifications defined in the source data systems. These custom definitions change frequently and are managed directly by end users, typically in spreadsheets that define semantic dictionaries. These dictionaries usually live on SharePoint, and the data engineer's task is to make these additional classifications available in the EDW (Enterprise Data Warehouse).

Goal

Automatically ingest business users' files stored in SharePoint to obtain a staging copy that is independent of and detached from the source system. Depending on the data process requirements, files can be extracted at regular intervals, such as once a day, or every time a file is modified.

Solution

Develop a general-purpose Fabric notebook able to programmatically copy the full content of any SharePoint Online folder into any Lakehouse folder when called by a pipeline activity.
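
As a rough illustration of that pattern, the notebook can expose its configuration through a cell toggled as a "parameters" cell, whose defaults the calling pipeline's Notebook activity overrides at run time via its base parameters. All names below are hypothetical placeholders:

# Cell toggled as "parameters" in the Fabric notebook UI; these defaults are
# overridden by the calling pipeline's base parameters at run time.
# All names are hypothetical placeholders.
sharepoint_site = "contoso.sharepoint.com/sites/finance"
sharepoint_folder = "dictionaries"
lakehouse_folder = "Files/sharepoint_stage"
keyvault_secret_name = "sp-app-secret"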

Note

A similar but simpler solution already exists for Data Factory and Synapse Analytics, but since in Fabric today (October 2024) the required connectors cannot be parameterized to allow a general-purpose solution, we decided to handle the entire process with a Spark notebook.

Requirements and constraints

  1. SharePoint Online is not an Azure component, so you need to authenticate properly to access it from Fabric. In addition, since we access SharePoint using an app and not a user (delegated authentication), we need to set up non-interactive authentication (app identity authentication) using an app identity registered in Entra ID.
  2. Today you can access SharePoint using the SharePoint REST API with ACS (Azure Access Control Service) authentication, or using the Microsoft Graph REST API with AD (Azure Active Directory) authentication (ref. Upgrading SharePoint applications from Azure Access Control Service to Azure Active Directory | Micr...). ACS was discontinued in 2018, so we decided to develop this solution with Microsoft Graph and AD authentication (a sketch of this flow follows the list).
  3. Fabric connections today (October 2024) cannot manage secrets using Key Vault, so we decided to develop this solution as a Spark notebook, which can properly read secrets stored in Key Vault using the pySpark notebookutils module.
  4. The Fabric connectors required for this solution today (October 2024) cannot be parameterized, so we developed this general-purpose solution as a Fabric pySpark notebook, passing the configuration to the notebook from the calling pipeline as pipeline parameters.
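
To make the flow concrete, here is a minimal sketch of the token acquisition, folder listing, and download steps. Every ID, vault name, and path is a hypothetical placeholder, and error handling and Graph result paging are omitted:

import os
import requests

# Read the app secret from Key Vault via notebookutils (built into the
# Fabric Spark runtime); vault URL and secret name are hypothetical.
client_secret = notebookutils.credentials.getSecret(
    "https://my-keyvault.vault.azure.net/", "sp-app-secret")

TENANT_ID = "<tenant-guid>"    # hypothetical Entra ID tenant
CLIENT_ID = "<app-client-id>"  # hypothetical app registration

# 1. Acquire an app-only token for Microsoft Graph (client credentials flow).
token = requests.post(
    f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
    data={"client_id": CLIENT_ID, "client_secret": client_secret,
          "scope": "https://graph.microsoft.com/.default",
          "grant_type": "client_credentials"},
).json()["access_token"]
headers = {"Authorization": f"Bearer {token}"}

# 2. List the children of a SharePoint folder in the site's default drive
# (site id and folder path are hypothetical).
SITE_ID = "<site-id>"
items = requests.get(
    f"https://graph.microsoft.com/v1.0/sites/{SITE_ID}/drive/root:/dictionaries:/children",
    headers=headers,
).json().get("value", [])

# 3. Download each file unchanged into the Lakehouse Files area.
target_dir = "/lakehouse/default/Files/sharepoint_stage"
os.makedirs(target_dir, exist_ok=True)
for item in items:
    if "file" not in item:  # skip subfolders
        continue
    # @microsoft.graph.downloadUrl is a short-lived, pre-authenticated URL.
    content = requests.get(item["@microsoft.graph.downloadUrl"]).content
    with open(f"{target_dir}/{item['name']}", "wb") as f:
        f.write(content)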

Hi,
I'm able to read the files and folders in the notebook. How can I download a file directly, in its original format, into the Lakehouse Files folders?

Are we able to write to the Excel document in this way as well?

Thanks. The pandas option worked!

I'm curious to know why the Spark option isn't working, though.

The "Load data > Spark" option in the UI menu does nothing when using an .xlsx file either; it works for CSV.
