FataiSanni

Automate SharePoint to Fabric Lakehouse Data Sync with Python

If you're working with files stored in SharePoint and need to regularly sync them to Microsoft Fabric Lakehouse, you have a few options. While Dataflow Gen2 provides a UI-driven approach for connecting to SharePoint data sources, it has limitations: it can't handle certain file types, may struggle with complex folder structures, and doesn't always support the flexibility needed for custom ETL logic.

 

What if you needed more control? A code-based solution that could download any file type from SharePoint, apply custom transformations, and load them into your Lakehouse with a single notebook run?

I've built an open-source PySpark notebook that does exactly that. In this post, I'll walk you through the solution, explain how it works, and show you how to get it running in your environment.

 

What It Does

This notebook automatically:

  • Downloads Excel files from SharePoint using sharing links
  • Writes them directly to your Fabric Lakehouse using ABFS paths
  • Overwrites existing files to keep data fresh
  • Handles batch processing of multiple files
  • Provides detailed logging and error handling

Perfect for teams that need regular data syncs without manual intervention.
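Under the hood, the core step for each file is simply "download via the sharing link, then copy the bytes into OneLake." Here is a minimal sketch of that step, assuming the requests library for the HTTP download and mssparkutils (available inside Fabric notebooks) for the copy to the Lakehouse ABFS path. The helper name sync_file and the download=1 query parameter are illustrative assumptions, not the notebook's exact code.

import requests
from notebookutils import mssparkutils  # available inside Fabric notebooks

def sync_file(sharing_link, lakehouse_abfs_path, lakehouse_name):
    # Many SharePoint/OneDrive sharing links return the raw file when a
    # download=1 query parameter is appended; adjust if your tenant differs.
    separator = "&" if "?" in sharing_link else "?"
    response = requests.get(f"{sharing_link}{separator}download=1", timeout=60)
    response.raise_for_status()

    # Stage the bytes locally, then copy the staged file into the Lakehouse
    # Files area via its ABFS path (file:/ -> abfss:// copy).
    local_path = f"/tmp/{lakehouse_name}"
    with open(local_path, "wb") as f:
        f.write(response.content)
    mssparkutils.fs.cp(f"file:{local_path}", f"{lakehouse_abfs_path}/{lakehouse_name}")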

 

Quick Start

1. Create SharePoint Sharing Links

For each Excel file:

  • Right-click → Share → "Anyone with the link can edit"
  • Copy the link

2. Configure the Notebook

Update Cell 3 with your paths and file details:

# Your Lakehouse path
lakehouse_abfs_path = "abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Files/data"

# Your files
source_files = [
    {
        "url": "SHAREPOINT_FILE_URL",
        "sharing_link": "PASTE_SHARING_LINK_HERE",
        "lakehouse_name": "sales_data.xlsx",
        "description": "Monthly sales report"
    }
]
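For context, the batch processing over source_files might look roughly like this. It is a hedged sketch rather than the notebook's exact cell: it reuses the hypothetical sync_file helper from the earlier snippet and plain logging, while the actual notebook provides more detailed logging and error handling.

import logging

logging.basicConfig(level=logging.INFO)

# Process every configured file; one failure should not stop the whole batch.
for item in source_files:
    try:
        sync_file(item["sharing_link"], lakehouse_abfs_path, item["lakehouse_name"])
        logging.info("Synced %s (%s)", item["lakehouse_name"], item["description"])
    except Exception as exc:
        logging.error("Failed to sync %s: %s", item["lakehouse_name"], exc)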

3. Run and Schedule

  • Upload the notebook to your Fabric workspace
  • Attach it to your Lakehouse
  • Run all cells
  • Optional: Schedule it in a Fabric Pipeline for automated refreshes

That's it! Your data will sync automatically.

 

Schedule It for Hands-Off Operation

Create a Fabric Pipeline, add a Notebook activity, and configure a schedule trigger (hourly, daily, etc.). Now your SharePoint data automatically flows into your Lakehouse without any manual work.

 

Security Note

The current implementation uses "Anyone with the link can edit" for simplicity. For production environments, I recommend implementing Azure App Registration with client credentials for proper authentication. The README includes guidance on this approach, and contributions to add native authentication support are welcome!
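If you go the App Registration route, the client-credentials flow is straightforward with MSAL. The sketch below is illustrative only: the tenant ID, client ID, secret, site ID, and file path are placeholders, and the README's guidance may differ. It acquires an app-only token and downloads a file through the Microsoft Graph API instead of relying on an anonymous sharing link.

import msal
import requests

# Placeholders - supply your own App Registration details, ideally from a secret store.
TENANT_ID = "YOUR_TENANT_ID"
CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"

app = msal.ConfidentialClientApplication(
    CLIENT_ID,
    authority=f"https://login.microsoftonline.com/{TENANT_ID}",
    client_credential=CLIENT_SECRET,
)
token = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])
if "access_token" not in token:
    raise RuntimeError(f"Token request failed: {token.get('error_description')}")

# Example Graph call: download a drive item by path (site ID and file path are placeholders).
graph_url = (
    "https://graph.microsoft.com/v1.0/sites/YOUR_SITE_ID"
    "/drive/root:/Reports/sales_data.xlsx:/content"
)
response = requests.get(
    graph_url,
    headers={"Authorization": f"Bearer {token['access_token']}"},
    timeout=60,
)
response.raise_for_status()  # response.content now holds the Excel file bytes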

 

Get the Code

The notebook is available on GitHub (https://github.com/dyfatai/SharePoint-To-MicrosoftFabric-Lakehouse-Notebook) with complete documentation, configuration examples, and troubleshooting tips.


Are you automating SharePoint to Lakehouse data flows? What's your approach? Drop a comment below; I'd love to hear how you're solving this challenge!

Comments

Good

This is actually a great one to try out. Thank you for sharing @FataiSanni 

This is actually great @FataiSanni 

I'll look into this, thank you for this 

Great explanation

You should investigate the new SharePoint shortcut to Lakehouse; it would be beneficial.