Hi Brains Trust,
We are currently setting up a Data Warehouse in MS Fabric and were wondering what the best practice, for both performance and usability, would be for taking a weekly archive of a large amount of data.
The team needs to be able to access 104 weeks of archives (2 years) to do certain comparisons, reports, forecast accuracy checks, etc.
So we have 4 sets of data we need to archive:
* LPE ExFactory - Week Cut Over
* LPE ExFactory - Month Cut Over
* LPE Scan - Week Cut Over
* LPE Scan - Month Cut Over
ExFactory and Scan are totally different from each other, but the Week/Month cut-overs from the same source have overlapping data (e.g. in the image below). (Note: it's about 8 years of data in total.)
Currently, we have taken the absolute easiest approach: we have a dataflow that is executed once a week and does a full copy of the entire table into the archive table, adding an archive date.
So over 2 years, the archive table will hold 104 snapshots of the source table.
This is currently taking about 3 hours for each table (run concurrently).
I'm sure this is very inefficient, both for the amount of data being archived and for the process being used.
I've had thoughts about archiving only the forecast portion and then using something complex in the semantic model to merge, but I'm also aware that if I'm trying to merge 104 archives to the corresponding actuals in the SM, this might also take a long time to refresh. (Note: the SM only needs to be refreshed once a day.)
I've also thought about using notebooks, which I believe are more efficient than Dataflows, but my coding days are a little behind me, so it might take a while to get my head around setting these up.
Thanks in advance for any thoughts, assistance, or links to helpful resources.
UPDATE -
As my SQL is fairly good, I decided to just do a simple test, thinking another option would be to write a stored proc to do this instead of a Dataflow.
INSERT INTO Test_LPE_Archive
SELECT '2025/05/01', *
FROM [LPE ExFactory Month]
This is basically the behaviour I want from the dataflow, and it took only 22 seconds to run, so I don't know why my dataflow is taking around 3 hours to do this.
there are 6.5
Update 2
Not sure if this is the best option, especially for usability in the SM, but for now I've moved away from Dataflows, set up a T-SQL notebook, and just run the required inserts. This is taking just 1 minute to run everything 🙂
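For anyone wanting to do the same, the notebook cell is essentially just one dated insert per table. A minimal sketch (the archive table names here are illustrative, not my exact ones):

-- Capture one archive date so all four snapshots share it
DECLARE @ArchiveDate date = CAST(GETDATE() AS date);

INSERT INTO LPE_Archive_ExFactory_Week
SELECT @ArchiveDate, * FROM [LPE ExFactory Week];

INSERT INTO LPE_Archive_ExFactory_Month
SELECT @ArchiveDate, * FROM [LPE ExFactory Month];

INSERT INTO LPE_Archive_Scan_Week
SELECT @ArchiveDate, * FROM [LPE Scan Week];

INSERT INTO LPE_Archive_Scan_Month
SELECT @ArchiveDate, * FROM [LPE Scan Month];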
Hi @Byzza,
You're on the right path with T-SQL notebooks; they're faster and give you more control than Dataflows, especially for large-scale archiving. To make your process efficient, maintainable, and scalable over 8 years of data, here's a refined approach:
Split Forecasts and Actuals
Actuals don’t change, so archive them once in a permanent history table. Forecasts change weekly, so only archive the forecast data each week, adding an ArchiveDate. This avoids copying static data and keeps storage lean.
One-time actuals archive:
INSERT INTO LPE_Actuals_History (...)
SELECT ... FROM LPE_ExFactory_Week WHERE Actual IS NOT NULL
UNION ALL
SELECT ... FROM LPE_ExFactory_Month WHERE Actual IS NOT NULL;
Weekly forecast snapshot:
INSERT INTO LPE_Forecast_Archive
SELECT GETDATE(), Week, Month, 'Week', Forecast, 'ExFactory' FROM LPE_ExFactory_Week WHERE Forecast IS NOT NULL
UNION ALL
SELECT GETDATE(), Week, Month, 'Month', Forecast, 'ExFactory' FROM LPE_ExFactory_Month WHERE Forecast IS NOT NULL;
Automate with Notebooks and Pipelines
Continue using T-SQL notebooks for performance. Schedule them weekly with a Fabric pipeline. Add a logging table to track row counts and run times:
INSERT INTO Archive_Log
SELECT GETDATE(), 'LPE_Forecast_Archive', @@ROWCOUNT;
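If it helps, here is one possible shape for that log table (a sketch only; column names are illustrative, adjust to whatever you want to track):

-- Illustrative logging table for archive runs
CREATE TABLE Archive_Log (
    RunTimestamp datetime2(3) NOT NULL,  -- when the logging insert ran
    TargetTable  varchar(128) NOT NULL,  -- which archive table was loaded
    RowsInserted int          NOT NULL   -- @@ROWCOUNT from the preceding INSERT
);

Just note that @@ROWCOUNT reflects the immediately preceding statement, so run the logging INSERT straight after the archive INSERT.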
Use Delta Tables and Partitioning
Store your forecast archive in a Fabric Lakehouse as a Delta table and partition it by ArchiveDate. This improves performance and keeps queries efficient as data grows.
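As a sketch only (Spark SQL from a notebook attached to the Lakehouse; the table name and columns are illustrative and would need to match your forecast archive schema):

-- Illustrative Delta table partitioned by snapshot date
CREATE TABLE IF NOT EXISTS LPE_Forecast_Archive_LH (
    ArchiveDate DATE,
    FiscalWeek  INT,
    FiscalMonth INT,
    Source      STRING,
    Forecast    DOUBLE
)
USING DELTA
PARTITIONED BY (ArchiveDate);

With weekly snapshots, partitioning by ArchiveDate gives roughly 52 partitions per year, which keeps single-snapshot and "latest snapshot" queries cheap.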
Power BI Setup
In your Semantic Model, create separate fact tables for Actuals and Forecasts. Use a shared Dim_Date table. Add a slicer for ArchiveDate so users can choose which forecast to compare against actuals. Set up incremental refresh to only update recent data.
Forecast Selected =
VAR SelectedArchive = SELECTEDVALUE ( 'Dim_ArchiveDate'[ArchiveDate] )
RETURN
    CALCULATE (
        SUM ( 'Fact_Forecast_Archive'[Forecast] ),
        'Fact_Forecast_Archive'[ArchiveDate] = SelectedArchive
    )
Plan for Growth
After two years, consider summarizing older forecast data (e.g., monthly averages) and moving it to cold storage. You can also archive actuals by year if needed, and monitor storage growth through your log table.
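If you do roll older snapshots up, the summarization can stay simple, for example something like this (table and column names are illustrative):

-- Illustrative roll-up of forecast snapshots older than two years to monthly averages
INSERT INTO LPE_Forecast_Archive_Summary
SELECT
    DATEFROMPARTS(YEAR(ArchiveDate), MONTH(ArchiveDate), 1) AS ArchiveMonth,
    Month,
    Source,
    AVG(Forecast) AS AvgForecast
FROM LPE_Forecast_Archive
WHERE ArchiveDate < DATEADD(year, -2, CAST(GETDATE() AS date))
GROUP BY DATEFROMPARTS(YEAR(ArchiveDate), MONTH(ArchiveDate), 1), Month, Source;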
If this solution worked for you, kindly mark it as the Accepted Solution and feel free to give Kudos; it would be much appreciated!
Thank you.
Hi,
Thanks for this detailed reply; that sounds like a great path to take, and I'll try to implement it when I have some more time.