Hey folks,
I'm trying to figure out what the best solution is going to be for my use case. There's no easy way to get direct connections to any of our databases, so for roughly 90% of my source data, everything arrives as daily CSV files delivered via email. Think of these as daily snapshots from 30 data sources, totaling fewer than 500k records per day across all 30 data sets.
We can no longer easily manage this with SharePoint feeding Power BI. We need to move to something more automated and sustainable for our transformations and calculations.
What would be the best Fabric solution to manage the ingestion and transformation of all of this data, so that we can do the transformations in Python and publish the cleansed data to Power BI or a T-SQL endpoint?
I was thinking of a Fabric data warehouse on Premium capacity. We stage the files daily into a Fabric lakehouse parent folder with 30 sub-folders; each sub-folder would receive the daily files as appends. Those appends could be handled by a Power Automate flow that grabs the attachments out of our mailbox and writes them into the correct folder.
Then we use either a Dataflow or notebooks to connect to the lakehouse folders and ingest the data into 30 different tables in the data warehouse, and from there use notebooks to transform and load into 30 silver or gold tables?
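For what it's worth, here is a minimal sketch of what the notebook ingest step could look like for one of the 30 folders, assuming a default lakehouse is attached to the notebook and the Power Automate flow drops CSVs into Files/bronze/<source_name>/. The folder layout, source name, and table names are illustrative, not anything Fabric prescribes.

```python
# Sketch of a Fabric notebook ingest step for one source folder.
# Assumes a default lakehouse is attached, so relative "Files/..." paths resolve.
from pyspark.sql import functions as F

source_name = "source_01"                        # one of the 30 sub-folders (illustrative)
csv_path = f"Files/bronze/{source_name}/*.csv"   # daily CSV drops from the Power Automate flow

df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(csv_path)
    .withColumn("ingested_at", F.current_timestamp())  # track when each row landed
)

# Append into a bronze Delta table; downstream notebooks can build the
# 30 silver/gold tables from these bronze tables.
df.write.format("delta").mode("append").saveAsTable(f"bronze_{source_name}")
```

The same pattern could be looped over all 30 sub-folders, or parameterized and scheduled from a pipeline.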
"We can no longer easily manage this with SharePoint feeding Power BI."
Can you please elaborate? With a sufficient quota, a SharePoint Document Library seems to work well for us, especially if you avoid enumerating folder contents across the whole site.
A Fabric Lakehouse would be an obvious choice, but its tables are based on Parquet (Delta Lake) files, and the general guidance is to avoid the "many small files" scenario that you have; large files in the gigabyte range are preferred.
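One way to mitigate the small-files issue after many small daily appends is to compact the Delta table periodically. A rough sketch, assuming a hypothetical bronze table name like the one above:

```python
# Compact a Delta table that has accumulated many small files from daily appends.
# OPTIMIZE rewrites the data into fewer, larger Parquet files.
spark.sql("OPTIMIZE bronze_source_01")

# Optionally clean up the old, superseded files once nothing reads prior versions
# (the default Delta retention check still applies).
spark.sql("VACUUM bronze_source_01")
```

This could be run on a schedule (e.g., weekly) from a notebook or pipeline rather than after every daily load.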
If you can run a Fabric trial, you should test it out and see what the performance is like, including the transforms from CSV to Parquet (which I believe are built in).
Are you bringing the data directly from SharePoint into a Fabric data warehouse, into individual tables like I described? If that were possible, then we could do the transformations in the data warehouse, and that would work.
"Are you bringing the data directly from SharePoint into a Fabric data warehouse, into individual tables like I described?"
We are not there yet - we are using SharePoint as our Data Lake for now. But with Data Factory you can bring the data over if you need to.