We are attempting to drive our data ingestion via SQL Server CDC to support near-real-time reporting. Generic mirroring seemed dicey because the high watermarks in various tables are unreliable, among other reasons. The generic Fabric streaming solution also became dicey because large legacy DBs would be locked for extended, and often unacceptable, periods during snapshot reads. What we've developed so far is a custom Debezium instance writing to Event Hubs, followed by a Fabric Spark job that reads the change events and translates them into the appropriate Delta tables, using a high watermark based on the CDC event.
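A minimal sketch of the core step described above: collapsing a batch of change events into one net change per key and advancing a high watermark. It is written in plain Python rather than Spark so the logic is visible. It assumes the standard Debezium JSON envelope (`op`, `before`, `after`, `source`) with the SQL Server connector's `commit_lsn`, `change_lsn`, and `event_serial_no` source fields, a single-column primary key, and that tombstone records were already filtered out. `collapse`, `event_order`, and `lsn_key` are hypothetical helper names. In a real job the net result would feed a Delta `MERGE`, and the watermark would be persisted alongside it.

```python
from typing import Any, Dict, List, Optional, Tuple

Event = Dict[str, Any]
Order = Tuple[Tuple[int, ...], Tuple[int, ...], int]


def lsn_key(lsn: Optional[str]) -> Tuple[int, ...]:
    """Parse an LSN like '00000027:00000758:0003' into (0x27, 0x758, 0x3)."""
    if not lsn:
        return (-1,)  # a missing LSN sorts before any real one
    return tuple(int(part, 16) for part in lsn.split(":"))


def event_order(event: Event) -> Order:
    """Sort key: commit LSN, then change LSN, then event serial number."""
    src = event["source"]
    return (
        lsn_key(src.get("commit_lsn")),
        lsn_key(src.get("change_lsn")),
        int(src.get("event_serial_no") or 0),
    )


def collapse(
    events: List[Event], key_field: str, watermark: Optional[Order] = None
) -> Tuple[Dict[Any, Tuple[str, Dict[str, Any]]], Optional[Order]]:
    """Keep only the newest event per primary key and return the new watermark.

    Events at or below `watermark` were applied by an earlier batch and are
    skipped, so replaying the same batch is harmless.
    """
    net: Dict[Any, Tuple[str, Dict[str, Any]]] = {}
    for ev in sorted(events, key=event_order):
        order = event_order(ev)
        if watermark is not None and order <= watermark:
            continue  # already applied (or a duplicate delivery)
        if ev["op"] == "d":
            row = ev["before"]
            net[row[key_field]] = ("delete", row)
        else:  # "c" insert, "u" update, "r" snapshot read
            row = ev["after"]
            net[row[key_field]] = ("upsert", row)
        watermark = order
    return net, watermark
```

Using the full ordering tuple as the watermark, rather than the commit LSN alone, means a transaction split across two batches is not partially skipped on the next run.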
All well and good. Scaling is where this begins to seem shaky. We have several hundred client DBs in Azure, and we are also incorporating on-prem DBs from other parts of our company. So...
Hi @PhilBrown ,
I’d encourage you to submit your detailed feedback and ideas via Microsoft's official feedback channels, such as Microsoft Fabric Ideas.
Feedback submitted there is often reviewed by the product teams and can lead to meaningful improvements.
Thanks,
Prashanth Are
MS Fabric community support
May I ask whether you have resolved this issue? If so, please mark the helpful reply and accept it as the solution, so that other community members with similar problems can find it faster.
If we don’t hear back, we’ll go ahead and close this thread. For any further discussions or questions, please start a new thread in the Microsoft Fabric Community Forum; we’ll be happy to assist.
Thank you for being part of the Microsoft Fabric Community.
Hi @PhilBrown,
As we have not heard back since our last conversation, could you please share your thoughts on the suggestions below? Did they help, or are you facing any other challenges?
Thanks,
Prashanth Are
MS Fabric community support
Hi @PhilBrown,
As we have not heard back since our last conversation, could you please share your thoughts on the suggestions below? Did they help, or are you facing any other challenges?
@lbendlin, Thanks for your inputs in this topic.
Thanks,
Prashanth Are
MS Fabric community support
Hi @PhilBrown ,
Try the steps below for your use case and let me know if they help.
Thanks,
Prashanth Are
MS Fabric community support
#1 Five minutes seems to be the sweet/sour spot for spinning up/tearing down Spark sessions. If you need processing more often than every 5 minutes, keeping the session alive is more economical; if less often, the spin-up/tear-down cost is more palatable. It's similar to the start/stop feature of some cars.
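As a rough illustration of the trade-off in #1, here is a hypothetical cost model (not taken from Fabric's billing documentation): a cold-started session consumes startup time plus work time on every run, while a warm session consumes the whole interval. The function names and the "compute-minutes" unit are assumptions for illustration only. With the 2 to 4 minute startup mentioned in this thread and a minute or two of work, the break-even lands near the 5-minute mark.

```python
def compute_minutes_per_interval(interval_min: float, startup_min: float, work_min: float) -> dict:
    """Compute-minutes consumed per processing interval under two strategies."""
    if interval_min <= 0:
        raise ValueError("interval_min must be positive")
    return {
        # Cold: start a session, do the work, shut it down again.
        "spin_up_per_run": startup_min + work_min,
        # Warm: the session stays up for the whole interval (or longer,
        # if the work itself takes longer than the interval).
        "keep_alive": max(interval_min, work_min),
    }


def cheaper_strategy(interval_min: float, startup_min: float, work_min: float) -> str:
    """Pick the strategy with fewer compute-minutes (ties go to keep-alive)."""
    costs = compute_minutes_per_interval(interval_min, startup_min, work_min)
    if costs["keep_alive"] <= costs["spin_up_per_run"]:
        return "keep_alive"
    return "spin_up_per_run"


def break_even_interval(startup_min: float, work_min: float) -> float:
    """Interval at which both strategies cost the same (assumes work fits in the interval)."""
    return startup_min + work_min


if __name__ == "__main__":
    for interval in (2, 5, 15, 60):
        print(interval, cheaper_strategy(interval, startup_min=3, work_min=2))
```

The model ignores latency; keeping the session alive also removes the startup delay from every batch, which matters when the goal is near-real-time reporting rather than cost alone.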
#3 Not sure what you mean by micro-batching, but generally Delta Lake prefers larger files, around the 1 GB mark. Lots of small files make for messy maintenance. So while CDC is nice for the data source, it is not necessarily good for Fabric.
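To put the small-file concern in #3 into numbers, a back-of-the-envelope sketch. It assumes each write to a table produces a fixed number of new data files (`files_per_commit`, hypothetical; a real `MERGE` may write more, depending on partitioning) and uses the roughly 1 GB target from the post. The table count and daily change volume in the usage note are made-up inputs, not measurements.

```python
import math
from dataclasses import dataclass


@dataclass
class SmallFileEstimate:
    files_per_day: int           # new data files written across all tables
    avg_file_mb: float           # average size of one of those files
    files_after_compaction: int  # files per day if compacted to the target size


def estimate(tables: int, write_interval_min: float, mb_per_table_per_day: float,
             files_per_commit: int = 1, target_file_mb: float = 1024.0) -> SmallFileEstimate:
    """Estimate daily file counts for frequent CDC writes versus compacted output."""
    if write_interval_min <= 0:
        raise ValueError("write_interval_min must be positive")
    commits_per_table = (24 * 60) / write_interval_min
    files_per_table = commits_per_table * files_per_commit
    avg_mb = mb_per_table_per_day / files_per_table
    # At least one file per table per day, otherwise ceil(volume / target size).
    compacted_per_table = max(1, math.ceil(mb_per_table_per_day / target_file_mb))
    return SmallFileEstimate(
        files_per_day=round(tables * files_per_table),
        avg_file_mb=avg_mb,
        files_after_compaction=tables * compacted_per_table,
    )
```

For example, 300 tables each written every 2 minutes with 500 MB of changes per day gives 216,000 files of well under 1 MB each, versus 300 files after compaction, which is why periodic compaction (for example Delta's `OPTIMIZE`) tends to accompany frequent CDC writes.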
#5 Whatever you do in Fabric will end up in Delta Lake, whether you want it or not.
Thanks for the reply, Prashanth. A couple of notes while I research:
In general, there always seems to be some minor to significant latency when spinning up batch jobs, often adding 2 to 4 minutes to any notebook, pipeline execution, or Spark job. I was attempting to avoid this accumulated latency with something closer to a long-running job, but that may be better suited to Azure Functions or a standalone service.
Regarding points 2 and 3, my existing Spark job already does this effectively, caching changes in memory and batching table upserts/deletes every 2-5 minutes.
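The in-memory caching with periodic upserts described above can be sketched as a small buffer that flushes when the oldest buffered event reaches a maximum age or the buffer reaches a maximum size. `MicroBatcher` and its parameters are hypothetical names, and the `flush` callback stands in for whatever writes the batch to Delta. The clock is injectable so the behaviour can be tested without waiting.

```python
import time
from typing import Any, Callable, List, Optional


class MicroBatcher:
    """Buffer change events in memory and flush by age or by size."""

    def __init__(self, flush: Callable[[List[Any]], None], max_age_s: float = 120.0,
                 max_events: int = 50_000, clock: Callable[[], float] = time.monotonic):
        self._flush = flush
        self._max_age_s = max_age_s
        self._max_events = max_events
        self._clock = clock
        self._buffer: List[Any] = []
        self._opened_at: Optional[float] = None  # time the oldest buffered event arrived

    def add(self, event: Any) -> bool:
        """Buffer one event; return True if this call triggered a flush."""
        if self._opened_at is None:
            self._opened_at = self._clock()
        self._buffer.append(event)
        return self.maybe_flush()

    def maybe_flush(self) -> bool:
        """Flush if the batch is old or big enough. Also call this on an idle timer."""
        if not self._buffer:
            return False
        age = self._clock() - self._opened_at
        if age >= self._max_age_s or len(self._buffer) >= self._max_events:
            self.flush_now()
            return True
        return False

    def flush_now(self) -> None:
        """Hand the batch to the writer; if the writer raises, the buffer is kept for a retry."""
        if self._buffer:
            self._flush(self._buffer)
            self._buffer = []
            self._opened_at = None
```

In a streaming consumer, source offsets (for example Event Hubs checkpoints) would typically be committed only after `flush_now` succeeds, so a crash replays the batch instead of losing it; the watermark logic in the sketch earlier in this thread makes such replays harmless.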
On #4, there are configuration settings in Debezium that prevent full table locks during the snapshot while still providing acceptable concurrency in the majority of cases. They just aren't exposed in the generic Eventstream setup in Fabric, which in my opinion is probably a big miss.
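One Debezium setting that fits this description is the SQL Server connector's `snapshot.isolation.mode` property (values such as `snapshot` or `read_committed`), which controls how much locking the initial snapshot performs. The post doesn't say which settings were used, so treat this as an illustration; check the Debezium documentation for the exact locking behaviour of each mode. The sketch below builds a Debezium 2.x style connector config as a Python dict. The helper name, host, database names, and credential placeholders are hypothetical. The Event Hubs Kafka-endpoint authentication settings are deliberately omitted.

```python
from typing import Dict, List

# Values accepted by the Debezium SQL Server connector's snapshot.isolation.mode.
ISOLATION_MODES = {"read_uncommitted", "read_committed", "repeatable_read", "snapshot", "exclusive"}


def sqlserver_connector_config(topic_prefix: str, host: str, databases: List[str],
                               bootstrap_servers: str,
                               isolation_mode: str = "snapshot") -> Dict[str, str]:
    """Build a Debezium 2.x SQL Server connector config (hypothetical values)."""
    if isolation_mode not in ISOLATION_MODES:
        raise ValueError(f"unsupported snapshot.isolation.mode: {isolation_mode}")
    if not databases:
        raise ValueError("at least one database is required")
    return {
        "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
        "database.hostname": host,
        "database.port": "1433",
        "database.user": "<cdc-user>",          # placeholder: supply from a secret store
        "database.password": "<cdc-password>",  # placeholder
        "database.names": ",".join(databases),  # one connector can cover several DBs on a host
        "topic.prefix": topic_prefix,
        # Limits table locking during the initial snapshot; "snapshot" requires
        # ALLOW_SNAPSHOT_ISOLATION to be enabled on each database.
        "snapshot.isolation.mode": isolation_mode,
        "schema.history.internal.kafka.bootstrap.servers": bootstrap_servers,
        "schema.history.internal.kafka.topic": f"{topic_prefix}.schema-history",
    }
```

Generating configs like this from an inventory of client databases is one way to manage several hundred sources; each dict could then be wrapped as `{"name": ..., "config": ...}` and submitted to the Kafka Connect REST API.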