I have a file located in SharePoint and I want to load its contents into a Lakehouse. The file can have multiple sheets, so I need to load each sheet into a separate table.
I couldn't find any reference article that provides the steps to do this.
Even if I upload the file into the Lakehouse itself and simply try to read it, I can't get this to work:
df = (spark.read.format("com.crealytics.spark.excel")
.option("location", "Files/Test.xlsx")
.option("useHeader", "true")
.option("sheetName", "test_sheet")
.load()
)
display(df)
It gives an error "[DATA_SOURCE_NOT_FOUND] Failed to find the data source: com.crealytics.spark.excel."
Note: I know I can do this using Dataflows. I just want to know how I would do it using Notebooks.
Hello @VickyDev18
I can answer the part about reading the Excel file from a lakehouse. I tried the piece of code below and it worked. I am using pandas here.
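Something like this (a minimal sketch; the file path, sheet name, and table name are placeholders taken from the question, and it assumes the default lakehouse is attached to the notebook):

import pandas as pd

# Read one sheet through the Lakehouse local mount point
df = pd.read_excel("/lakehouse/default/Files/Test.xlsx", sheet_name="test_sheet")

# Convert to a Spark DataFrame and save it as a Lakehouse table
spark.createDataFrame(df).write.mode("overwrite").saveAsTable("test_sheet")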
HTH
Himanshu
Hello, I am working on a similar issue. I am able to read multiple sheets using pandas, as shown below, if the file is in the lakehouse, but is there an automated way to pick the file up from SharePoint using a pipeline so it can then be read in a notebook? Please let me know if you have found a solution to this problem.
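For reference, this is the multi-sheet pandas pattern (a minimal sketch; the file path is a placeholder, and sheet names are assumed to be valid table names):

import pandas as pd

# sheet_name=None returns a dict of {sheet name: DataFrame} covering every sheet
sheets = pd.read_excel("/lakehouse/default/Files/Test.xlsx", sheet_name=None)

for name, pdf in sheets.items():
    # One Lakehouse table per sheet; sanitize names that are not valid identifiers
    spark.createDataFrame(pdf).write.mode("overwrite").saveAsTable(name)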
Thanks,
Sruthi
@sruthiramana I think a good starting point for your need could be this article: Automating Data Ingestion from SharePoint to Azure Lakehouse using PySpark and Microsoft Graph API.
It took me several weeks to develop an automated, general-purpose solution for ingesting SharePoint files into a lakehouse that is suitable for an enterprise environment, and the above article was my starting point.
The main challenge was permissions: SharePoint and Fabric are two different and unconnected worlds; there are no shortcuts!
To develop this solution you need to be familiar with app registration in Entra ID and with the Microsoft Graph API (I had to learn to work with Postman to debug my code), and you need the support of a SharePoint tenant admin and an Azure tenant admin to grant the proper permissions to your "automated solution".
And last, but certainly not least, another big issue was that the official Microsoft documentation on the topic (as of December 2024) is contradictory or outdated.
Below is the rationale of the solution I developed; if you need further information, just let me know.
Copy files from Sharepoint to Lakehouse
How to copy all files contained in a SharePoint Online folder into a Fabric Lakehouse folder
Sometimes users maintain custom classifications for departmental or product data analysis, or for other specific investigations, that integrate or complete the official company classifications defined in the source data systems. These custom definitions are subject to frequent changes managed directly by end users, typically in spreadsheets that define semantic dictionaries. These dictionaries usually live on SharePoint, and the data engineer's task is to make these additional classifications available in the EDW (Enterprise Data Warehouse).
The idea is to automatically ingest business users' files stored in SharePoint so there is a staging copy independent of and detached from the source system. Depending on the data process requirements, files can be extracted at regular intervals (such as once a day) or every time a file is modified.
This meant developing a general-purpose Fabric notebook able to programmatically copy the full content of any SharePoint Online folder into any Lakehouse folder when called by a pipeline activity.
A similar but simpler solution had already been developed for Data Factory and Synapse Analytics, but since in Fabric today (October 2024) the required connectors cannot be parameterized to allow a general-purpose solution, we decided to handle the entire process with a Spark notebook.
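In outline, the notebook authenticates as an Entra ID app registration and walks the SharePoint folder through the Graph API. A minimal sketch of that flow (all IDs, site and folder names are placeholders; it assumes an app registration with the Sites.Read.All application permission, and a real solution needs paging, error handling, and proper secret management):

import requests
import msal

# Placeholders: a tenant admin must grant the app its Graph permissions
TENANT_ID = "<tenant-id>"
CLIENT_ID = "<client-id>"
CLIENT_SECRET = "<client-secret>"

app = msal.ConfidentialClientApplication(
    CLIENT_ID,
    authority=f"https://login.microsoftonline.com/{TENANT_ID}",
    client_credential=CLIENT_SECRET,
)
token = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])
headers = {"Authorization": f"Bearer {token['access_token']}"}

# Resolve the site, then list the files in the target folder (names are placeholders)
site = requests.get(
    "https://graph.microsoft.com/v1.0/sites/contoso.sharepoint.com:/sites/MySite",
    headers=headers,
).json()
items = requests.get(
    f"https://graph.microsoft.com/v1.0/sites/{site['id']}/drive/root:/MyFolder:/children",
    headers=headers,
).json()

# Copy each file as-is into the Lakehouse Files area via the local mount
for item in items.get("value", []):
    if "file" in item:
        data = requests.get(item["@microsoft.graph.downloadUrl"]).content
        with open(f"/lakehouse/default/Files/{item['name']}", "wb") as f:
            f.write(data)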
Hi,
I'm able to read the files and folders in the notebook. How can I download a file, in its original format, into the Lakehouse Files folder?
Are we able to write to the Excel document this way as well?
Thanks. Pandas option worked!
I'm curious to know why the Spark option isn't working, though.
The "Load data > Spark" option in the UI menu also does nothing for an xlsx file; it works for CSV.
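For what it's worth, the [DATA_SOURCE_NOT_FOUND] error indicates that the com.crealytics spark-excel connector is simply not bundled with the Fabric Spark runtime, so it has to be installed (for example as a Maven library in an environment attached to the notebook) before spark.read can resolve the format. A rough sketch, assuming the session allows pulling the package via spark.jars.packages; the version coordinate is an assumption, and option names differ across spark-excel releases:

%%configure -f
{ "conf": { "spark.jars.packages": "com.crealytics:spark-excel_2.12:3.4.1_0.20.3" } }

# In a later cell, the reader should now resolve the format
df = (spark.read.format("com.crealytics.spark.excel")
      .option("header", "true")                  # newer releases use "header"
      .option("dataAddress", "'test_sheet'!A1")  # sheet name + start cell
      .load("Files/Test.xlsx"))
display(df)

Note that %%configure must run in its own cell at the start of the session.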