Hi, does anyone have an idea of when the "Copy data" activity might work for on-prem data sources (in our case SQL Server)? I know we can use Gen2 dataflows for this - but they are very slow. Really hoping copy data is significantly faster.
Hi all, quick updates on this topic.
1. Per the Fabric roadmap ( https://learn.microsoft.com/en-us/fabric/release-plan/data-factory#opdg ), the ability to "fast" copy from on-prem to the Fabric service is estimated for Q1 2024. Note, however, that this talks about requiring the gateway servers to be "behind a VNet". No clue exactly what this means or how difficult it is to set up.
2. During Ignite, Microsoft announced database mirroring ( https://blog.fabric.microsoft.com/en-us/blog/introducing-mirroring-in-microsoft-fabric?ft=All ), a kind of next-gen change data capture. On-prem SQL Server is not supported yet but is one of the next databases they'll be targeting.
Hope this helps,
Hi, just for clarity: I think the above note about requiring that the gateway servers be "behind a VNet" is referenced in the Fabric link above:
"This feature will enable data pipelines to use Fabric data gateways to access data that is on-premises and behind a VNet." I have emphasized the "and" because I think this should be an OR, i.e. these are two different scenarios where gateways can be used.
@Dhicks that would be awesome if true. I'm a simple BI guy and don't really understand VNets, but just glancing through the docs I could feel the heartburn of having to get this approved by security.
Appreciate the insights!
Datamarts are your friend here - it sounds odd, but they are a great tool in the Fabric toolbox.
Create a Datamart and load your data from your on-prem SQL Server into the Datamart.
Then, in a pipeline, use a copy activity to move the data from the Datamart (which appears as an Azure SQL Server) to the target Warehouse.
It works really well, and we use it extensively as Gen2 Dataflows do not have the reliability or performance that we need.
@BryanCarmichael - curious how do you get your data into the datamart performantly? I thought those used dataflows, which is still a huge performance issue for us. I need to move just over a Terabyte nightly. Understand once it's in a datamart it might be fast, but not sure how to get the data in there quickly to start with.
Hi - We use dataflows to load to datamarts - they are reasonably fast, though they do have a 100 GB limit.
Do you need to load all of that terabyte of data each day? Could you not just do a delta load to the datamart?
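To illustrate the delta-load idea above: a common pattern is a watermark query, where each nightly run only pulls rows changed since the last successful load. A minimal sketch in Python (the table and column names here are hypothetical, not from the thread):

```python
def build_delta_query(table: str, watermark_col: str, last_watermark: str) -> str:
    """Build an incremental extract query that only pulls rows changed
    since the last successful load (watermark pattern)."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_col} > '{last_watermark}'"
    )

# Hypothetical example: only rows modified since the last nightly run
query = build_delta_query("dbo.Sales", "ModifiedDate", "2023-11-20 01:00:00")
print(query)
# → SELECT * FROM dbo.Sales WHERE ModifiedDate > '2023-11-20 01:00:00'
```

In practice you would persist the watermark (e.g. the max `ModifiedDate` seen) after each successful run and feed it into the next one; this usually shrinks a terabyte-scale full load down to whatever actually changed that day.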
We are working on enabling the on-premises data gateway (OPDG) for the Copy activity, to provide the same capabilities in Fabric data pipelines as exist today in ADF with the self-hosted integration runtime (SHIR). The plan is to bring this capability to Fabric very soon, during public preview.
hi @makromer , just wanted to touch base and see if there are any updates on when the fast copy capability to move data from on-prem SQL Server to Fabric might arrive. Coming soon?
What sort of transfer rates are you getting in Gen2 dataflows? Mine just took an hour to load a 2 GB table from on-prem SQL. Trying to work out if it's our environment or just how it is.
FYI @bcdobbs - the 2 GB/hr seems roughly in line with what we get for stuff going through the gateway server. That's with a 3-gateway cluster, but I haven't spent any time getting fancy with things like trying to partition the data (via incremental refresh) to get multiple loads running in parallel.
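The partitioning idea mentioned above (splitting a table so multiple loads run in parallel) can be sketched as simple range-chunking on a numeric key. This is a generic illustration, not anything Fabric-specific; the key range is hypothetical:

```python
def partition_ranges(min_id: int, max_id: int, partitions: int):
    """Split a numeric key range into roughly equal chunks so each
    partition can be loaded by a separate refresh/copy in parallel."""
    total = max_id - min_id + 1
    size = -(-total // partitions)  # ceiling division
    ranges = []
    start = min_id
    while start <= max_id:
        end = min(start + size - 1, max_id)
        ranges.append((start, end))
        start = end + 1
    return ranges

# Hypothetical 1M-row key range split across 4 parallel loads
print(partition_ranges(1, 1_000_000, 4))
# → [(1, 250000), (250001, 500000), (500001, 750000), (750001, 1000000)]
```

Each (lower, upper) pair would become a WHERE clause for one of the parallel loads; with a 3-gateway cluster, running three or four such chunks concurrently is where the throughput gain would come from.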
Hope this helps,
I'll send you some info once we get the firewalls opened. What I can say is this - if I look at Gen1 dataflow performance, it's nowhere even close to what the copy activity is supposed to do. Based on this documentation ( https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance ), I should be able to move our entire DW over in under an hour. That's orders of magnitude faster than what Gen1 dataflows feel like.
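For a back-of-envelope sense of why the gap matters, here is the arithmetic: at the ~2 GB/hr gateway rate reported earlier in this thread, a 1 TB nightly load is hopeless, while even a modest bulk-copy rate changes the picture entirely. The 60 GB/hr figure below is a hypothetical illustration, not a documented copy-activity benchmark:

```python
def transfer_hours(size_gb: float, rate_gb_per_hr: float) -> float:
    """Estimate transfer time in hours for a given data size and throughput."""
    return size_gb / rate_gb_per_hr

# 1 TB (1024 GB) nightly load:
print(transfer_hours(1024, 2))   # → 512.0 hours at the ~2 GB/hr gateway rate
print(transfer_hours(1024, 60))  # ≈ 17.1 hours at a hypothetical 60 GB/hr
```

The point is simply that the dataflow-through-gateway rate is not in the same league as bulk copy; the exact copy-activity numbers depend on source, sink, and parallelism, as the linked performance guide describes.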
Hope this helps!
Hey Scott! Saw your comment in another thread about supporting on-prem connectivity via pipelines - this is a top priority to bring to Fabric as part of the team's path to General Availability. There will also be some improvements to Dataflows to help with ingestion (leveraging the same underlying copy activity / fast copy infrastructure as pipelines). There are still a few items to resolve to get us there, but you can help the team out by voting over on Fabric ideas so they can best prioritize the necessary work: https://aka.ms/fabricideas
Fabric has now gone to General Availability - do you have any idea of the timeframe for supporting the Copy data activity in pipelines through an on-premises data gateway?
Hi @AlexPowers ,
Thank you for this statement, as it makes me feel much better: "this is a top priority to bring to Fabric as part of the team's path to General Availability."
As of today, is there any news on when on-prem connectivity via pipelines will become available?
hi @AlexPowers , just wanted to follow up and see if there's been any progress on using pipelines and the copy activity to bring on-prem SQL Server data into Fabric. It's been a couple of months, hoping we're getting close. Gen2 dataflow performance is just too slow for the amounts of data we're hoping to move to Fabric.
Do you know if the copy data activity in pipelines to on-prem systems is available yet? And if not, are there any timescales as to when it will be?
Hi @AlexPowers , appreciate the reply - thank you very much! I'll definitely vote in the Fabric ideas area, but I'll also throw out there: without this capability, you're really limiting your customer base to those already buying Azure from you, and it doesn't help attract new customers who (like us) can't easily move our on-prem SQL Server to the cloud.
I appreciate the response - thank you!