If you work with files stored in SharePoint and need to sync them regularly to a Microsoft Fabric Lakehouse, you have a few options. Dataflow Gen2 offers a UI-driven way to connect to SharePoint data sources, but it has limitations: it can't handle certain file types, may struggle with complex folder structures, and doesn't always offer the flexibility needed for custom ETL logic.
What if you needed more control? A code-based solution that could download any file type from SharePoint, apply custom transformations, and load them into your Lakehouse with a single notebook run?
I've built an open-source PySpark notebook that does exactly that. In this post, I'll walk you through the solution, explain how it works, and show you how to get it running in your environment.
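This notebook automatically downloads the files you list from SharePoint and writes them to your Lakehouse in a single run. That makes it a good fit for teams that need regular data syncs without manual intervention.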
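How the notebook fetches each file isn't spelled out in this post, so what follows is only a sketch under an assumption: that it downloads through the "Anyone with the link" sharing URL mentioned later, by adding a `download=1` query parameter. That is a widely used SharePoint convention for requesting the raw file rather than the web viewer, not a documented contract. The helper names `to_direct_download_url` and `download_bytes` are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def to_direct_download_url(sharing_link: str) -> str:
    """Return the sharing link with download=1 in its query string.

    Assumption: SharePoint returns the raw file bytes for an
    "Anyone with the link" URL when download=1 is present.
    """
    parts = urlsplit(sharing_link)
    # Keep existing parameters (such as e=...) but replace any download value
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "download"
    ]
    query.append(("download", "1"))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )


def download_bytes(sharing_link: str, timeout: float = 60.0) -> bytes:
    """Fetch the file content; urlopen follows SharePoint's redirects."""
    with urlopen(to_direct_download_url(sharing_link), timeout=timeout) as resp:
        return resp.read()
```

In a Fabric notebook with a default Lakehouse attached, the returned bytes could then be written to a path under the local mount `/lakehouse/default/Files`.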
For each Excel file:
Update Cell 3 with your paths and file details:
```python
# Your Lakehouse path
lakehouse_abfs_path = "abfss://workspace@onelake.dfs.fabric.microsoft.com/lakehouse.Lakehouse/Files/data"

# Your files
source_files = [
    {
        "url": "SHAREPOINT_FILE_URL",
        "sharing_link": "PASTE_SHARING_LINK_HERE",
        "lakehouse_name": "sales_data.xlsx",
        "description": "Monthly sales report"
    }
]
```

That's it! Run the notebook and your files are copied into the Lakehouse.
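To make the mapping from `source_files` entries to Lakehouse paths concrete, here is a hypothetical helper, `plan_copies`, that validates each entry and builds its destination by joining `lakehouse_abfs_path` with `lakehouse_name`. The field names come from the configuration; the specific checks are illustrative assumptions, not the notebook's actual validation.

```python
from typing import Dict, List, Tuple


def plan_copies(
    lakehouse_abfs_path: str, source_files: List[Dict[str, str]]
) -> List[Tuple[str, str]]:
    """Pair each sharing link with its destination path in the Lakehouse.

    Hypothetical helper: the checks below are illustrative assumptions,
    not the notebook's actual validation logic.
    """
    base = lakehouse_abfs_path.rstrip("/")
    plan = []
    seen = set()
    for i, entry in enumerate(source_files):
        link = entry.get("sharing_link", "")
        name = entry.get("lakehouse_name", "").strip("/")
        # Catch entries still holding placeholders such as PASTE_SHARING_LINK_HERE
        if not link.startswith("https://"):
            raise ValueError(f"source_files[{i}]: sharing_link is not an https URL")
        if not name:
            raise ValueError(f"source_files[{i}]: lakehouse_name is empty")
        # Two entries writing the same file would silently overwrite each other
        if name in seen:
            raise ValueError(f"source_files[{i}]: duplicate lakehouse_name {name!r}")
        seen.add(name)
        plan.append((link, f"{base}/{name}"))
    return plan
```

Running a check like this against the configuration before any downloads start means a forgotten placeholder fails fast instead of partway through a sync.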
To keep the data in sync, create a Fabric pipeline, add a Notebook activity that runs this notebook, and configure a schedule trigger (hourly, daily, and so on). Your SharePoint data then flows into your Lakehouse without any manual work.
For simplicity, the current implementation relies on SharePoint sharing links set to "Anyone with the link can edit", which means anyone who obtains a link can also modify the file. For production environments, I recommend using an Azure App Registration with client credentials for proper authentication. The README includes guidance on this approach, and contributions that add native authentication support are welcome!
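As a rough illustration of the App Registration route (not the README's or the notebook's code), a file behind a sharing URL can be fetched through Microsoft Graph's `/shares/{encoded-url}/driveItem/content` endpoint, which accepts the sharing URL encoded as unpadded base64url with a `u!` prefix. This sketch assumes an app registration with a client secret that has been granted an application permission such as `Sites.Read.All` with admin consent, plus the `msal` and `requests` packages; whether app-only access resolves a given link depends on your tenant, so treat it as a starting point. The function names are hypothetical.

```python
import base64

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def encode_sharing_url(sharing_url: str) -> str:
    """Encode a sharing URL as a Graph share ID: u! + unpadded base64url."""
    encoded = base64.urlsafe_b64encode(sharing_url.encode("utf-8")).decode("ascii")
    return "u!" + encoded.rstrip("=")


def content_url(sharing_url: str) -> str:
    """Graph URL that returns the raw bytes of the shared file."""
    return f"{GRAPH_ROOT}/shares/{encode_sharing_url(sharing_url)}/driveItem/content"


def download_with_app(
    sharing_url: str, tenant_id: str, client_id: str, client_secret: str
) -> bytes:
    """Acquire an app-only token with MSAL, then download via Graph."""
    # Imported here so the helpers above work without these packages installed
    import msal
    import requests

    app = msal.ConfidentialClientApplication(
        client_id,
        authority=f"https://login.microsoftonline.com/{tenant_id}",
        client_credential=client_secret,
    )
    result = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])
    if "access_token" not in result:
        raise RuntimeError(result.get("error_description", "token request failed"))
    resp = requests.get(
        content_url(sharing_url),
        headers={"Authorization": f"Bearer {result['access_token']}"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.content
```

In a real deployment the client secret would come from a secret store such as Azure Key Vault rather than being hard-coded in the notebook.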
The notebook is available on GitHub (https://github.com/dyfatai/SharePoint-To-MicrosoftFabric-Lakehouse-Notebook) with complete documentation, configuration examples, and troubleshooting tips.
Are you automating SharePoint-to-Lakehouse data flows? What's your approach? Drop a comment below; I'd love to hear how you're solving this challenge!