FataiSanni

Automate SharePoint to Fabric Lakehouse Data Sync with Python

If you're working with files stored in SharePoint and need to regularly sync them to Microsoft Fabric Lakehouse, you have a few options. While Dataflow Gen2 provides a UI-driven approach for connecting to SharePoint data sources, it has limitations: it can't handle certain file types, may struggle with complex folder structures, and doesn't always support the flexibility needed for custom ETL logic.

 

What if you needed more control? A code-based solution that could download any file type from SharePoint, apply custom transformations, and load them into your Lakehouse with a single notebook run?

I've built an open-source PySpark notebook that does exactly that. In this post, I'll walk you through the solution, explain how it works, and show you how to get it running in your environment.

 

What It Does

This notebook automatically:

  • Downloads Excel files from SharePoint using sharing links
  • Writes them directly to your Fabric Lakehouse using ABFS paths
  • Overwrites existing files to keep data fresh
  • Handles batch processing of multiple files
  • Provides detailed logging and error handling

Perfect for teams that need regular data syncs without manual intervention.
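Under the hood, the core step for each file is simply "download via the sharing link, then copy the bytes into OneLake." Here is a minimal sketch of that step, assuming the requests library for the HTTP download and mssparkutils (available inside Fabric notebooks) for the copy to the Lakehouse ABFS path. The helper name sync_file and the download=1 query parameter are illustrative assumptions, not the notebook's exact code.

import requests
from notebookutils import mssparkutils  # available inside Fabric notebooks

def sync_file(sharing_link, lakehouse_abfs_path, lakehouse_name):
    # Many SharePoint/OneDrive sharing links return the raw file when a
    # download=1 query parameter is appended; adjust if your tenant differs.
    separator = "&" if "?" in sharing_link else "?"
    response = requests.get(f"{sharing_link}{separator}download=1", timeout=60)
    response.raise_for_status()

    # Stage the bytes locally, then copy the staged file into the Lakehouse
    # Files area via its ABFS path (file:/ -> abfss:// copy).
    local_path = f"/tmp/{lakehouse_name}"
    with open(local_path, "wb") as f:
        f.write(response.content)
    mssparkutils.fs.cp(f"file:{local_path}", f"{lakehouse_abfs_path}/{lakehouse_name}")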

 

Quick Start

1. Create SharePoint Sharing Links

For each Excel file:

  • Right-click → Share → "Anyone with the link can edit"
  • Copy the link

2. Configure the Notebook

Update Cell 3 with your paths and file details:

# Your Lakehouse path
lakehouse_abfs_path = "abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Files/data"

# Your files
source_files = [
    {
        "url": "SHAREPOINT_FILE_URL",
        "sharing_link": "PASTE_SHARING_LINK_HERE",
        "lakehouse_name": "sales_data.xlsx",
        "description": "Monthly sales report"
    }
]
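For context, the batch processing over source_files might look roughly like this. It is a hedged sketch rather than the notebook's exact cell: it reuses the hypothetical sync_file helper from the earlier snippet and plain logging, while the actual notebook provides more detailed logging and error handling.

import logging

logging.basicConfig(level=logging.INFO)

# Process every configured file; one failure should not stop the whole batch.
for item in source_files:
    try:
        sync_file(item["sharing_link"], lakehouse_abfs_path, item["lakehouse_name"])
        logging.info("Synced %s (%s)", item["lakehouse_name"], item["description"])
    except Exception as exc:
        logging.error("Failed to sync %s: %s", item["lakehouse_name"], exc)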

3. Run and Schedule

  • Upload the notebook to your Fabric workspace
  • Attach it to your Lakehouse
  • Run all cells
  • Optional: Schedule it in a Fabric Pipeline for automated refreshes

That's it! Your data will sync automatically.

 

Schedule It for Hands-Off Operation

Create a Fabric Pipeline, add a Notebook activity, and configure a schedule trigger (hourly, daily, etc.). Now your SharePoint data automatically flows into your Lakehouse without any manual work.

 

Security Note

The current implementation uses "Anyone with the link can edit" for simplicity. For production environments, I recommend implementing Azure App Registration with client credentials for proper authentication. The README includes guidance on this approach, and contributions to add native authentication support are welcome!
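If you go the App Registration route, the client-credentials flow is straightforward with MSAL. The sketch below is illustrative only: the tenant ID, client ID, secret, site ID, and file path are placeholders, and the README's guidance may differ. It acquires an app-only token and downloads a file through the Microsoft Graph API instead of relying on an anonymous sharing link.

import msal
import requests

# Placeholders - supply your own App Registration details, ideally from a secret store.
TENANT_ID = "YOUR_TENANT_ID"
CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"

app = msal.ConfidentialClientApplication(
    CLIENT_ID,
    authority=f"https://login.microsoftonline.com/{TENANT_ID}",
    client_credential=CLIENT_SECRET,
)
token = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])
if "access_token" not in token:
    raise RuntimeError(f"Token request failed: {token.get('error_description')}")

# Example Graph call: download a drive item by path (site ID and file path are placeholders).
graph_url = (
    "https://graph.microsoft.com/v1.0/sites/YOUR_SITE_ID"
    "/drive/root:/Reports/sales_data.xlsx:/content"
)
response = requests.get(
    graph_url,
    headers={"Authorization": f"Bearer {token['access_token']}"},
    timeout=60,
)
response.raise_for_status()  # response.content now holds the Excel file bytes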

 

Get the Code

The notebook is available on GitHub (https://github.com/dyfatai/SharePoint-To-MicrosoftFabric-Lakehouse-Notebook) with complete documentation, configuration examples, and troubleshooting tips.


Are you automating SharePoint to Lakehouse data flows? What's your approach? Drop a comment below; I'd love to hear how you're solving this challenge!

Comments

Good

This is actually a great one to try out. Thank you for sharing @FataiSanni 

This is actually great @FataiSanni 

I'll look into this, thank you for this 

Great explanation

You should investigate the new SharePoint shortcut to Lakehouse; it would be beneficial.