Hello,
I'm working with Microsoft Fabric and Snowflake, and I'm using the Mirroring feature to replicate data from Snowflake into OneLake. However, the data replication process runs continuously throughout the day, which is driving up my Snowflake costs.
My Snowflake tables reload completely once a day, and I would like to set up the mirroring process in Microsoft Fabric to only trigger after the data refresh is complete in Snowflake. This way, I can avoid unnecessary replication throughout the day and reduce costs.
Is there any way to schedule the mirroring refresh to only run once a day after my Snowflake tables have reloaded? Ideally, I would like to set it up so that the mirroring is aligned with my Snowflake data refresh schedule.
Any guidance or best practices on how to configure this would be greatly appreciated!
Thanks in advance
Hi @August_Stidberg ,
Thank you for reaching out to the Microsoft Fabric Community Forum. I understand you want to control data replication from Snowflake to Microsoft Fabric's OneLake to reduce costs, since continuous mirroring is increasing your Snowflake usage.
While Fabric’s mirroring feature doesn’t currently support built-in scheduling, you can achieve this by setting up a Microsoft Fabric Data Pipeline. This approach allows you to configure a pipeline that extracts data from Snowflake and loads it into OneLake, scheduled to run once per day, ideally timed to start after your Snowflake tables finish their daily refresh.
For example, if your data refresh completes around 3 AM, you can schedule the pipeline to trigger at 3:30 AM to ensure the data is ready. We recommend running the pipeline manually first to test the setup and verify the data is loaded correctly.
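If you would rather trigger the pipeline programmatically right after your Snowflake load finishes (instead of relying on a fixed clock time), a rough sketch using the Fabric REST API's run-on-demand job endpoint could look like the following. The workspace and pipeline IDs and the token handling are placeholders, so please verify the endpoint against the current Fabric REST API reference before relying on it.

```python
import os
import requests

# Placeholders - replace with your own workspace and pipeline (item) IDs.
WORKSPACE_ID = "<workspace-guid>"
PIPELINE_ID = "<pipeline-item-guid>"

# A Microsoft Entra access token with Fabric API permissions, e.g. obtained
# via the azure-identity library or `az account get-access-token`.
TOKEN = os.environ["FABRIC_TOKEN"]

# Run-on-demand job endpoint for a Data Pipeline item.
url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/items/{PIPELINE_ID}/jobs/instances?jobType=Pipeline"
)

resp = requests.post(url, headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()

# A 202 response means the run was accepted; the Location header points to
# the job instance you can poll for completion.
print(resp.status_code, resp.headers.get("Location"))
```

You could call something like this from whatever process finishes your Snowflake reload (for example, the last step of your load job or an external scheduler), so the copy into OneLake starts only when fresh data is actually available.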
Additionally, keep an eye on performance metrics in both Fabric and Snowflake to ensure the setup is cost-efficient, adjusting the schedule if your Snowflake refresh timing changes. For more advanced optimization, you could explore Snowflake Streams to capture only new or changed data after the refresh, minimizing replication further.
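If you want to experiment with that idea, here is a minimal sketch using the snowflake-connector-python package. The table and stream names (SALES, SALES_STREAM) are made up for illustration, and keep in mind that a full truncate-and-reload will mark every row as changed, so a stream mainly helps if your reload pattern is incremental.

```python
import snowflake.connector

# Connection parameters are placeholders - fill in your own account details.
conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    warehouse="<warehouse>",
    database="<database>",
    schema="<schema>",
)

cur = conn.cursor()

# Create a stream that tracks inserts/updates/deletes on the base table.
cur.execute("CREATE STREAM IF NOT EXISTS SALES_STREAM ON TABLE SALES")

# Reading from the stream returns only rows changed since the stream was
# last consumed inside a DML statement.
cur.execute("SELECT COUNT(*) FROM SALES_STREAM")
changed_rows = cur.fetchone()[0]
print(f"Rows changed since last consumption: {changed_rows}")

cur.close()
conn.close()
```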
This setup will help you avoid unnecessary data transfers, cut down on Snowflake costs, and keep your OneLake data up to date with your daily Snowflake refresh.
I hope these suggestions give you a good starting point. If you need any further assistance, feel free to reach out.
If this post helps, please give it Kudos and consider accepting it as a solution to help other members find it more quickly.
Thank you.
This didn't fully solve my problem. I found a workable approach using Fabric Mirroring's REST API to start and stop mirroring of my tables: when the tables finished loading in Snowflake, I started the mirroring, and when the mirroring status showed as completed in Fabric, I stopped it. However, I didn't go through with the full approach, because I realized it still wouldn't solve my issue due to a Direct Lake limitation on row counts. On an F128 capacity, Direct Lake supports queries up to 3 billion rows within a 50GB memory limit, and usually handles about 280 million rows across 25 columns efficiently.
Thanks for your update. Using the Fabric Mirroring REST API to control replication timing is a great approach.
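In case it helps others, a rough sketch of that start / poll / stop pattern using the mirrored database REST endpoints could look like the following. The IDs and token handling are placeholders, and the endpoint paths, response fields, and status values are taken from the public API reference at the time of writing, so please verify them against the current documentation.

```python
import os
import time
import requests

WORKSPACE_ID = "<workspace-guid>"            # placeholder
MIRRORED_DB_ID = "<mirrored-database-guid>"  # placeholder
TOKEN = os.environ["FABRIC_TOKEN"]           # Microsoft Entra token for the Fabric API

BASE = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/mirroredDatabases/{MIRRORED_DB_ID}"
)
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# 1. Start mirroring once the Snowflake reload has finished.
requests.post(f"{BASE}/startMirroring", headers=HEADERS).raise_for_status()

# 2. Poll per-table status until every table is past its initial snapshot.
#    The response shape and the "Replicating" status value are assumptions
#    based on the docs - adapt the completion check to your own definition
#    of "done" (for example, each table's last sync timestamp).
for attempt in range(60):  # poll for up to ~1 hour
    resp = requests.post(f"{BASE}/getTablesMirroringStatus", headers=HEADERS)
    resp.raise_for_status()
    tables = resp.json().get("data", [])
    print([(t.get("sourceTableName"), t.get("status")) for t in tables])
    if tables and all(t.get("status") == "Replicating" for t in tables):
        break
    time.sleep(60)

# 3. Stop mirroring so replication does not keep the Snowflake warehouse busy all day.
requests.post(f"{BASE}/stopMirroring", headers=HEADERS).raise_for_status()
```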
Regarding your Direct Lake performance issue on F128 capacity, the key is optimizing data size and query efficiency. Since Direct Lake has a 3 billion row / 50GB memory limit, performance can degrade with very large datasets.
To improve efficiency, consider partitioning tables (e.g., by date) and using incremental mirroring with Snowflake Streams to avoid full table replication.
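As a concrete illustration of the partitioning idea, one option is to maintain a curated, date-partitioned Delta copy of the largest tables in a Lakehouse and point the Direct Lake model at that copy. A minimal PySpark sketch is below; the table and column names (mirrored_sales, sale_date, sales_partitioned) are hypothetical.

```python
from pyspark.sql import SparkSession

# In a Fabric notebook a SparkSession already exists; this is for completeness.
spark = SparkSession.builder.getOrCreate()

# Hypothetical source: the mirrored Snowflake table exposed through the Lakehouse.
df = spark.read.table("mirrored_sales")

# Write a curated Delta copy partitioned by date so queries that filter on
# sale_date only touch the relevant partitions.
(
    df.write.format("delta")
    .mode("overwrite")
    .partitionBy("sale_date")
    .saveAsTable("sales_partitioned")
)
```

Keep the partition column's cardinality moderate (for example, a date rather than a full timestamp) so you don't end up with a large number of very small files.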
Additionally, combining Direct Lake with Import mode for frequently queried data can enhance performance. Ensure your Power BI queries use aggregations to reduce memory consumption and monitor F128 resource usage to avoid exceeding capacity limits. If performance issues persist, scaling to F256 or optimizing query structures may be necessary.
If this post helps, please give it Kudos and consider accepting it as a solution to help other members find it more quickly.
Thank you.
Hi @August_Stidberg ,
May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will help other community members with similar problems find the answer faster.
Thank you.
Hi @August_Stidberg ,
I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions. If my response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.
Thank you.
Hello @August_Stidberg ,
I wanted to follow up on our previous suggestions regarding the issue. We would love to hear back from you to ensure we can assist you further.
If my response has addressed your query, please accept it as a solution and give a ‘Kudos’ so other members can easily find it. Please let us know if there’s anything else we can do to help.
Thank you.