Data Engineering Community Blog

ghanchiasif

Organizations often rely on SharePoint as a lightweight data exchange layer—teams upload CSVs or Excel files that are later consumed for reporting and analytics. While convenient, this pattern frequently leads to manual ingestion, inconsistent schemas, delayed refreshes, and downstream data quality issues.

To address this gap, we’re excited to opensource FabricSharePointCopy utility, a framework that provides a standardized, metadatadriven way to ingest files from SharePoint into Microsoft Fabric Lakehouse tables, with builtin validation and automation.

The project is now available on GitHub:

https://github.com/microsoft/FabricSharePointCopy

What is FabricSharePointCopy?

FabricSharePointCopy is a utility framework for seamlessly transferring files from SharePoint into Microsoft Fabric managed tables, enabling structured, Lakehouseready data for downstream analytics and reporting.

The framework focuses on:

Standardizing ingestion from SharePoint

Enforcing data quality before publish

Reducing manual intervention

Making curated data quickly available to Fabric consumers

It is designed to be generic, reusable, and extensible, rather than tied to any single business domain.

Why We Built This

While Microsoft Fabric provides powerful analytics capabilities, filebased ingestion from SharePoint often requires custom, oneoff solutions:

Pipelines that only run on schedules

Manual schema fixes after ingestion

Silent failures when files change unexpectedly

Inconsistent naming and table structures

FabricSharePointCopy addresses these challenges by introducing a metadatadriven ingestion layer that reacts to file changes and enforces validation before data reaches curated tables.

How the Framework Works

At a high level, FabricSharePointCopy continuously watches configured SharePoint folders and triggers ingestion whenever a file is created or updated.

Endtoend flow:

Detect change – A new file upload or modification is detected for a configured SharePoint folder.

Register the file – File metadata (name, path, modified time, size) is captured to drive processing.

Validate (DQ gate) – Metadatadriven data quality checks run before publish (schema, required columns, thresholds, sheet rules).

Ingest & transform – CSV or Excel files are read and processed based on configured load type.

Publish to Fabric – Curated tables are updated in the Fabric Lakehouse and made available for consumption.

Notify on failure – If validation fails, the framework sends a notification with the failure reason.

This ensures only validated, structured data reaches downstream analytics.

Supported File Formats and Load Types

FabricSharePointCopy supports common business file formats and ingestion patterns out of the box:

File formats

CSV

Excel (including multisheet files, skip rows, and skip columns)

Load types

Full load

Delta load

Custom load logic

All behavior is driven through metadata rather than hardcoded logic.

BuiltIn Data Quality (DQ)

A key design principle of FabricSharePointCopy is fail fast on bad data.

Before any data is published:

Schema checks ensure expected columns exist

Required fields can be enforced as nonnull

Rowcount thresholds can be applied

Sheet selection rules are validated

When validation fails, the framework stops processing and notifies the relevant owner, preventing corrupted or incomplete data from flowing downstream.

Standardized Naming with Flexibility

To keep curated data easy to discover, the framework applies a consistent naming convention for Silverlayer tables:

Silver_{Folder}_{FileName}

This default can be overridden using metadata when needed, allowing teams to balance clarity, consistency, and customization.

Designed for Microsoft Fabric

FabricSharePointCopy is built specifically for Microsoft Fabric Lakehouse architectures:

Works with OneLake paths

Produces managed tables ready for Direct Lake and downstream analytics

Aligns with Fabric notebooks and pipelinebased orchestration

Prerequisites and setup details are documented in the GitHub README, including Fabric workspace requirements, SharePoint access, and Lakehouse shortcuts.

Open Source and Extensible

We’ve released FabricSharePointCopy as an opensource project under the MIT license, making it easy for teams to:

Adopt the framework asis

Extend validation logic

Add custom transformations

Integrate with their own notification or monitoring systems

Who Is This For?

FabricSharePointCopy is useful for:

Data teams ingesting operational files from SharePoint

Analytics engineers standardizing filebased ingestion

Fabric users looking for near realtime availability of curated data

Teams aiming to reduce manual data fixes and rework

Contributors
@ghanchiasif, @kranthimeda, @swapnil09

Murtaza_Ghafoor

Org Apps introduce the ability to create multiple apps from one workspace, helping organizations deliver customized analytics experiences without duplicating content or creating unnecessary workspaces.

Srisakthi · ‎02-26-2026

Stop copying data—start copying insights: Zero-copy access in Azure Databricks is here.

AnujPandey · ‎02-24-2026

#Databricks #fabric #OneLake #Azure #DataPlatform

Ilgar_Zarbali · ‎02-23-2026

2026 - Meetup Covers.png

One of the most frequent questions I receive from DP-600 learners and data professionals is:

“How do we properly ingest and transform data in Microsoft Fabric?”

Murtaza_Ghafoor · ‎02-19-2026

Discover how Microsoft Fabric and Dataverse can work together without copying data, giving you real-time insights and faster app development.

sivak_microsoft · ‎02-17-2026

A production-ready pattern for ingesting data from multiple Azure Data Explorer (Kusto) databases into Microsoft Fabric Lakehouse using workspace identity, smart refresh logic, and parallel execution.

pallavi_r · ‎02-09-2026

Discover Direct Lake in Microsoft Fabric, query Delta tables straight from OneLake with no data duplication while achieving low latency, high performance analytics.

Understand how snapshot isolation, incremental framing, and optimized Delta table design enable consistent, up-to-date, and scalable reporting for enterprise-grade Power BI solutions.

Murtaza_Ghafoor · ‎02-03-2026

Fabric notebooks help teams work faster, collaborate better, and build reliable data solutions using the Lakehouse. They are simple to use but powerful enough for real-world data workloads.

Mauro89 · ‎01-27-2026

The Confusion Ends Here

Working with Microsoft Fabric? Then it's only a matter of time before encountering the acronym "UDF"—and wondering what it really means. Is it a Power BI thing? Data Engineering? The answer is: it's both.

The good news: once the distinction is clear, choosing the right UDF becomes intuitive. And more importantly, understanding both reveals how Fabric's workloads are designed to work together seamlessly.

What Makes UDFs Worth Understanding

Both User Defined Functions (in Power BI) and User Data Functions (in Data Engineering) embody the same software engineering principle: modularity and the DRY principle—Don't Repeat Yourself. Yet they solve completely different problems.

Power BI's UDFs let analysts encode business logic once and reuse it across every dashboard and report. Data Engineering's UDFs enable data engineers to write transformations once and apply them wherever data needs to be processed. In both cases, the benefit is the same: one source of truth, no duplicated code, and centralized maintenance.

It's the difference between building consistent analytical metrics and processing data at scale—and why organizations need both.

Dive Deeper

Curious about how to leverage both? Ready to architect Fabric solutions that follow software engineering best practices?

NHariGouthami · ‎01-22-2026

What if your Power BI report could teach your AI Data Agent how to answer questions correctly?
In this article, I show how .pbip files become a knowledge base, Power BI DAX becomes ground truth, and Fabric Data Agents turn into self‑learning, production‑ready analytics assistants—with automated accuracy testing and continuous improvement.

AparnaRamakris · ‎01-12-2026

Why maintain separate batch pipelines in Fabric? Spark Structured Streaming combined with foreachBatch lets you handle backfills and daily loads without breaking your flow. Batch meets streaming inside OneLake.

pallavi_r · ‎12-30-2025

Traditional Gold tables can struggle as business logic evolves over time. Analytics lineage becomes harder to trace, governance more complex, and maintaining consistent metrics across reports increasingly challenging. Materialized Lake Views in Microsoft Fabric provide a SQL-based, reusable consumption layer that delivers Gold-level performance while remaining closely aligned with the Silver layer.

AparnaRamakris · ‎12-29-2025

Over the last few years, I’ve had the opportunity to build data platforms from scratch using both Microsoft Fabric and Databricks—sometimes as competing options, and increasingly as complementary pieces of the same architecture.

Fabric and Databricks are not chasing the same outcomes and using one to “replace” the other is usually the wrong starting question. This post is not about feature checklists. It’s about how these platforms behave in real-world architectures, why Fabric often wins on speed and coherence, and why Databricks continues to lead when Spark depth and governance precision really matter.

AparnaRamakris · ‎12-17-2025

The blog is to explore the Materialized Lake View Available in Microsoft Fabric ,its implementation and real time implementation challenges .Please note the feature is in preview and may not be recommended for production workloads as of the date of writing this content.

Ilgar_Zarbali · ‎12-16-2025

Meetup Covers.png

This article is based on official Microsoft Fabric documentation and practical learning resources provided by Microsoft. To move beyond theory and demonstrate real implementation, I also followed a hands-on Lakehouse lab published by Microsoft Learning. The lab walks through core concepts such as creating a lakehouse, ingesting data, and exploring it using different Fabric experiences.

If you would like to explore the same step-by-step exercise used in this article and in my demonstration, you can access the lab here:

Lab

techies · ‎11-17-2025

This article explains how Microsoft Fabric integrates with Moodle LMS REST API to create a scalable and reliable learning analytics ecosystem. We will walk through API integration, ingestion, lakehouse storage, Spark optimization, and automated pipelines: the foundation required to operationalize LMS analytics at an enterprise level.

FataiSanni · ‎11-13-2025

If you're working with files stored in SharePoint and need to regularly sync them to Microsoft Fabric Lakehouse, you have a few options. While Dataflow Gen2 provides a UI-driven approach for connecting to SharePoint data sources, it has limitations, it can't handle certain file types, may struggle with complex folder structures, and doesn't always support the flexibility needed for custom ETL logic.

What if you needed more control? A code-based solution that could download any file type from SharePoint, apply custom transformations, and load them into your Lakehouse with a single notebook run?

I've built an open-source PySpark notebook that does exactly that. In this post, I'll walk you through the solution, explain how it works, and show you how to get it running in your environment.

Rufyda · ‎09-21-2025

When organizations work with Microsoft Fabric, one of the most attractive features is the ability to create shortcuts to external storage systems such as AWS S3.
A shortcut gives you the convenience of accessing external data as if it were already part of OneLake, without the need to copy or duplicate files.

But here’s the catch: while shortcuts simplify connectivity, they don’t eliminate one of the biggest hidden costs in cloud analytics — data transfer fees.

How Shortcuts Work

A Fabric shortcut is essentially a pointer to the data. When you query parquet files in S3 through Fabric, the compute engine (running in Azure) must fetch the bytes from AWS. This means the data is leaving AWS, and every gigabyte transferred counts as egress traffic.

So even though the files aren’t duplicated inside Fabric storage, AWS still charges you for every read that crosses into Azure.

The Cost of Reading 200 GB Daily

Let’s consider a realistic example:

Your S3 bucket contains about 200 GB of parquet files.

These files are refreshed daily, and your Fabric semantic model needs a daily refresh.

That means 200 GB per day × 30 days = ~6 TB per month.

Based on typical AWS S3 data transfer rates (around $0.09 per GB for the first 10 TB), you’re looking at:

6,000 GB × $0.09 ≈ $540 per month in AWS egress charges.

That’s before considering Fabric compute costs.

Why Shortcuts Don’t Reduce Egress Fees

It’s important to understand that shortcuts don’t magically reduce data transfer charges. They prevent duplication of storage, but the actual bytes must still move from AWS to Azure every time you run a query or refresh your model.

So, if you’re reading the full 200 GB daily, you’ll pay egress fees as if you were downloading the data each day.

Strategies to Optimize Costs

The good news is that you don’t have to accept those fees at face value. There are practical ways to bring them down:

Initial Full Copy + Incremental Loads
Do one large migration of your dataset into OneLake (or Azure Data Lake). After that, only copy the new or updated files each day. This reduces daily transfers to just the delta, which is usually far smaller than the entire dataset.

Partitioning and Predicate Pushdown
Structure your parquet files by date or partition keys. Ensure your queries are selective so that Fabric only reads what’s necessary instead of scanning all 200 GB.

Push Changes from AWS
Instead of letting Fabric pull data every day, configure S3 event triggers (with Lambda or DataSync) to push only the new files into Azure as they arrive.

Compression and Column Pruning
Since parquet is columnar, make sure your reports only pull the columns that are actually needed. This reduces the amount of data read — and the egress bill.

Evaluate Long-Term Data Residency

If your workload is permanent and heavy, it may be more cost-effective to migrate the dataset fully into Azure and avoid continuous cross-cloud transfers.

Fabric shortcuts offer a great way to connect to S3 without moving data right away, but they don’t avoid AWS data transfer charges. If you access large volumes of S3 data every day, costs can add up quickly.

The most effective approach is usually to copy once, then refresh incrementally, while designing your data to minimize unnecessary reads. That way, you get the best of both worlds: the convenience of Fabric integration and a controlled cloud bill.

NHariGouthami · ‎09-09-2025

Spark JSON read and backfill optimizations

uzuntasgokberk · ‎09-02-2025

Scrape currency data from Hurriyet/Doviz with Python BeautifulSoup and store it in Microsoft Fabric Lakehouse or Warehouse step by step.

mk_sunitha · ‎09-02-2025

This guide walks you through building a front-end application using React, Apollo Client, and TypeScript, integrated with a GraphQL API hosted in Microsoft Fabric. It highlights how to integrate and configure useful local tools like auto-completion and code generation, focusing on delivering an intuitive and seamless developer experience with GraphQL.

NHariGouthami · ‎08-12-2025

🚀 Build a Fabric Data Agent in Minutes with GitHub Copilot Agent Mode

Discover how to supercharge your data workflows using GitHub Copilot Agent Mode in VS Code. Learn how to explore schemas, generate AI instructions, and create example queries—all in under an hour. If you're working with Microsoft Fabric, this fast and intuitive method is a game-changer for building conversational analytics agents.

charlyS · ‎07-14-2025

Resume your capacity ➜ Run pipelines ➜ Refresh Power BI datasets ➜ Suspend capacity — all automated thanks to PowerShell and Azure Automation !

Ilgar_Zarbali · ‎07-02-2025

Microsoft Fabric revolutionizes data architecture by offering a unified platform that integrates Power BI, data science, real-time analytics, and more. At the heart of this ecosystem is the Lakehouse, a powerful, flexible, and scalable storage layer tailored for modern data engineering workflows.

In this article, we explore how Lakehouses work in Microsoft Fabric, how to set one up, and how they serve as the foundation for managing both files and structured data—all without the traditional complexity of data platforms.

mabdollahi · ‎06-19-2025

💡Ever wondered how to bring AI into your data engineering workflows in Microsoft Fabric?

In my latest hands-on project, I show how to automate Sentiment Analysis on customer feedback using:

Microsoft Fabric Lakehouse
PySpark notebooks
Azure OpenAI (GPT-4)
Fabric Data Pipelines
Power BI for real-time insights

This solution is fully integrated, no external services required, and takes just 10 minutes to set up.

📖Check out the full blog and video tutorial to see it in action:

#MicrosoftFabric #AzureOpenAI #DataEngineering #SentimentAnalysis #PowerBI #PySpark

jehebr1 · ‎06-19-2025

Explore a more native and streamlined alternative to detect and anonymize PII data. With just a few lines of code, Fabric’s built-in AI functions like ai.extract and ai.generate_response allow you to identify and redact PII directly within your data pipelines - no external libraries required.

Anusha_M · ‎05-08-2025

Discover the Variable Library in Microsoft Fabric, designed to empower users to define and manage variables at the workspace level. Seamlessly integrate across various workspace items, including data pipelines, notebooks, and Lakehouse shortcuts. This feature addresses several pain points and enhances the overall user experience within Fabric.

Ayush_Tiwari · ‎05-06-2025

In the ever-evolving landscape of data management, optimizing storage and access times is paramount. This article delves into the innovative V-Order feature in Fabric, a game-changer for data read times and storage efficiency. Discover how V-Order's write-time optimization technique enhances performance, reduces costs, and transforms data operations. Join us as we explore the technical intricacies, benefits, and real-world applications of V-Order, and learn how it can revolutionize your data management strategies.

Anonymous · ‎04-25-2025

Learn how to connect cross Tenant Azure Data Factory (and other services) to Fabric.

Find articles, guides, information and community news

FabricSharepointCopy – An open-source utility to ingest Sharepoint Data into Microsoft Fabric

Traditional Apps in Microsoft Fabric vs Org Apps (Preview Feature)

Guide to implement Zero-copy access to OneLake data in Azure Databricks

Zero-Copy Access to One ake Data in Azure Databricks

Mastering Data Ingestion with Microsoft Fabric Pipelines

Microsoft Fabric + Dataverse Zero-Copy Integration

Config-Driven Kusto to Fabric Lakehouse Ingestion: One Pipeline, Many Sources

Designing for Direct Lake: Architecture, Storage Strategy, and Performance in Microsoft Fabric

Why Fabric Notebooks Are Useful in a Lakehouse

Understanding UDF in Microsoft Fabric: Two Concepts, One Acronym

The Confusion Ends Here

What Makes UDFs Worth Understanding

Dive Deeper

Self‑Learning Fabric Data Agent Using .pbip + SDK

Batch at Streaming Speed: Simplifying File Processing with Spark Structured Streaming

Reinventing the Gold Layer Using Materialized Lake Views

Unified Analytics, Two Roads: Comparing Fabric’s Lakehouse and Databricks’ Spark‑First Stack

Materialized Lake Views -Case Study -Using NYC Taxi Data

Lakehouse in Microsoft Fabric: One Platform, Many Analytics Possibilities

Building a Scalable Learning Analytics Architecture: Integrating Moodle LMS with Microsoft Fabric

Automate SharePoint to Fabric Lakehouse Data Sync with Python

Using Fabric Shortcuts to AWS S3: What You Need to Know About Data Transfer Costs

🚀 From 60 Minutes to 30 Minutes: Optimizing Spark JSON Processing & Delta Lake Backfill

Scrape Currency Data with Python and Load into MS Fabric

Build apps with Fabric API for GraphQL with code generation, intelliSense in VS Code

Fabric Data Agent with GitHub Copilot Agent Mode

A Cost-Effective Fabric Solution Driven by Azure Automation

Microsoft Fabric Lakehouses: The Engine Room of Data Engineering

AI-Powered Sentiment Analysis in Microsoft Fabric with Azure OpenAI

PII Detection and Redaction with Fabric AI Functions

Transform Configuration Management with Fabric Variable Library

Unlock the power of V-Order: Revolutionize Data Read times and storage efficiency

Azure cross tenant access to Fabric Warehouse and Lakeshouse

Helpful resources

Get Fabric certified for FREE!

Find articles, guides, information and community news

The Confusion Ends Here

What Makes UDFs Worth Understanding

Dive Deeper

Helpful resources