gemcityzach
Helper IV

Building a file-based data mart strategy?

Hey folks,

 

I'm trying to figure out the best solution for my use case. There's no easy way to get direct connections to any of our databases, so roughly 90% of my source data arrives as daily CSV files delivered via email. Think of these as daily snapshots from 30 data sources, totaling fewer than 500k records per day across all 30 data sets.

 

We can no longer easily manage this with SharePoint feeding Power BI. We need to move to something more automated and sustainable for our transformations and calculations.

 

What would be the best Fabric solution to manage the ingestion and transformation of all of this data, so that we can do the transformations in Python and publish the cleansed data to Power BI or a T-SQL endpoint?

 

I was thinking of a Fabric data warehouse on Premium capacity. We would stage the files daily into a Fabric lakehouse parent folder with 30 sub-folders, one per source, and each day's file would be appended into the matching sub-folder. Those appends could be handled by a Power Automate flow that grabs the attachments from our mailbox and writes them into the correct folder. A rough sketch of the layout is below.
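To make that concrete, here is a minimal sketch of the landing layout and a sanity check I'd run in a notebook. The folder convention (Files/landing/<source>/<date>.csv) and the source names are placeholders, and I'm assuming the mssparkutils filesystem helpers that Fabric notebooks expose:

# Hypothetical landing layout in the lakehouse Files area:
#   Files/landing/<source_name>/<yyyy-MM-dd>.csv  (one drop per source per day)
from notebookutils import mssparkutils

sources = ["sales", "inventory", "shipments"]  # placeholders for the 30 data sets

for src in sources:
    path = f"Files/landing/{src}"
    files = mssparkutils.fs.ls(path)  # list the accumulated daily drops
    print(src, len(files), "files staged")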

 

Then we would use either Dataflows or notebooks to connect to the lakehouse folders and ingest the data into 30 different tables in the data warehouse, and from there use notebooks to transform and load into 30 silver or gold tables? Something like the sketch below.
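Roughly what I picture one of those notebooks doing, per source. This is only a sketch: the table and folder names are placeholders, and I'm assuming the notebook has a default lakehouse attached so the relative Files/ path and saveAsTable calls resolve against it:

# Bronze: ingest the accumulated daily CSVs for one source into a Delta table
df_raw = (
    spark.read
         .option("header", True)
         .option("inferSchema", True)
         .csv("Files/landing/sales/")  # placeholder source folder
)
df_raw.write.mode("overwrite").format("delta").saveAsTable("bronze_sales")

# Silver: Python/PySpark transformations, then publish a cleansed table
from pyspark.sql import functions as F

df_silver = (
    spark.table("bronze_sales")
         .dropDuplicates()
         .withColumn("load_date", F.current_date())
)
df_silver.write.mode("overwrite").format("delta").saveAsTable("silver_sales")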

 

 

3 REPLIES
lbendlin
Super User

"We can no longer easily manage this with SharePoint feeding Power BI"

Can you please elaborate? With sufficient quota, a SharePoint document library seems to work well for us, especially when you can avoid enumerating folder contents across the whole site.

 

A Fabric Data Lake would be an obvious choice, but it is based on Parquet (Delta Lake) files, and the general guidance is to avoid the "many small files" scenario that you have; Delta Lake prefers large files, in the gigabyte range.
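If you do end up with lots of small daily files, you can compact the resulting Delta tables after loading. A minimal sketch, assuming a Fabric Spark notebook where Delta's OPTIMIZE and VACUUM commands are available (the table name is a placeholder):

# Rewrite a Delta table's many small Parquet files into fewer, larger ones
spark.sql("OPTIMIZE silver_sales")

# Optionally remove old, no-longer-referenced files after compaction
spark.sql("VACUUM silver_sales")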

 

If you can run a Fabric trial, test it out and see what the performance is like, including the transforms from CSV to Parquet (which are built in, I think). Something like the timing sketch below would give you a feel for it.
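A rough way to measure that conversion during a trial (the path and table name are placeholders, and this assumes a notebook with a default lakehouse attached):

import time

# Time a CSV -> Delta (Parquet) conversion for one source folder
start = time.time()
df = spark.read.option("header", True).csv("Files/landing/sales/")  # placeholder path
df.write.mode("overwrite").format("delta").saveAsTable("perf_test_sales")
print(f"Wrote {df.count()} rows in {time.time() - start:.1f} s")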

Are you bringing the data directly from SharePoint into a Fabric data warehouse as individual tables like I described? If that were possible, and we could then do the transformations in the data warehouse, that would work.

"Are you bringing the data directly from SharePoint into a Fabric data warehouse as individual tables like I described?"

We are not there yet - we are using SharePoint as our data lake for now. But with Data Factory you can bring the data over if you need to.
