Hi Team,
I need to ingest data from an on-premises SQL Server into Fabric. I am able to do this using Dataflows, but my requirement here is to use a Notebook.
Is there any way or code to do that? Please let me know.
TIA
The best way to do that is to create a Pipeline and use the Copy Data activity for the ingestion (it is more performant than Notebooks). Then, once you have the raw data in your Workspace, use notebooks to transform the data and load the processed result into your final destination.
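For the transform-and-load step, a notebook could look something like this rough sketch (the paths, columns, and table name are all made up for illustration):

```python
# Sketch of the transform step in a Fabric notebook (PySpark).
# Assumes the pipeline's Copy activity already landed raw CSV files in the
# lakehouse "Files" area; `spark` is the session Fabric notebooks provide.
from pyspark.sql import functions as F

raw_df = (
    spark.read
    .option("header", "true")
    .csv("Files/raw/sales/")  # hypothetical landing folder
)

cleaned_df = (
    raw_df
    .withColumn("order_date", F.to_date("order_date"))  # example type fix
    .dropDuplicates(["order_id"])                        # example dedup key
)

# Write the processed data as a Delta table in the lakehouse.
cleaned_df.write.mode("overwrite").format("delta").saveAsTable("sales_clean")
```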
Hello @sudhav
I am in agreement with @DennesTorres. It seems like the bottleneck at this time is the data gateway.
Can you try uploading the data to the cloud and then running the dataflow (pointing to the cloud location)? You never mentioned what kind of data you have. Is it SQL or some kind of files? If it's files, you can give AzCopy a try.
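For example, here is an untested sketch that calls AzCopy from Python; the workspace, lakehouse, and local path are placeholders, AzCopy must be installed, and you need to sign in first with azcopy login:

```python
# Untested sketch: upload local CSV exports to the lakehouse "Files" area
# via AzCopy. All names below are placeholders.
import subprocess

onelake_url = (
    "https://onelake.dfs.fabric.microsoft.com/"
    "MyWorkspace/MyLakehouse.Lakehouse/Files/landing/"  # hypothetical target
)

subprocess.run(
    [
        "azcopy", "copy",
        r"C:\exports\*.csv",  # hypothetical local export folder
        onelake_url,
        "--trusted-microsoft-suffixes=onelake.dfs.fabric.microsoft.com",
    ],
    check=True,
)
```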
@sudhav Try connecting with Vengatesh; he mentioned in his video that it may be possible via Synapse Spark.
I am unsure of his solution, but it's a good idea to chat with him as he seems to be an MS employee.
https://www.reddit.com/user/VengateshP/
https://www.youtube.com/watch?v=ulspQ4Rb_mY
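For what it's worth, a direct Spark JDBC read only works if the SQL Server is network-reachable from the Fabric Spark cluster (notebooks can't route through the on-premises data gateway). In that case it might look roughly like this; the server, database, table, and credentials are placeholders:

```python
# Hedged sketch: direct JDBC read from Spark, assuming network line of sight
# to the SQL Server. All connection details below are placeholders.
jdbc_url = "jdbc:sqlserver://myserver.example.com:1433;databaseName=SalesDB"

df = (
    spark.read
    .format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.Orders")   # hypothetical source table
    .option("user", "fabric_reader")   # use a secret store in practice
    .option("password", "<password>")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load()
)

df.write.mode("overwrite").format("delta").saveAsTable("orders_raw")
```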
Hi,
I don't think the notebooks on Fabric can access data through the data gateway. The pipelines can't either; only the dataflows can.
I'm curious: wouldn't it be enough to bring the data into the Fabric environment using a dataflow and later transform it using a notebook?
Kind Regards,
Dennes
Hi, I agree with you that dataflows are enough, but they are very slow, and my customer is asking me to do it another way because of the huge volume of data on-premises. The dataflows often fail while publishing, and we are also hitting performance issues.
Hi,
I don't think the dataflows are the ones to blame here, although this is a very specific experience and it would be interesting to hear from someone at Microsoft.
Dataflows are linked to the Fabric capacity just like the notebooks, so the performance should not differ this much; it should be only a matter of choice - I believe so.
Anyway, the bottleneck in this case doesn't seem to be the dataflow execution, but the transfer through the data gateway, which notebooks don't support yet.
First, I would suggest analysing the volume of data: would it be possible to make an incremental load to reduce it?
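Just to illustrate the incremental idea, a rough watermark-based sketch (the table, column, and staging path are invented, and it assumes the new rows already land somewhere the notebook can read):

```python
# Illustration only: append rows newer than the highest watermark already
# loaded. Table, column, and path names are invented.
from pyspark.sql import functions as F

target = "orders"

# Highest watermark already loaded (first run loads everything).
if spark.catalog.tableExists(target):
    high_water = spark.table(target).agg(F.max("modified_at")).first()[0]
else:
    high_water = None

new_rows = spark.read.parquet("Files/staging/orders/")  # hypothetical staging
if high_water is not None:
    new_rows = new_rows.filter(F.col("modified_at") > F.lit(high_water))

new_rows.write.mode("append").format("delta").saveAsTable(target)
```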
Second, I would consider alternative ways. What if you build some automation on-premises, maybe with SSIS, to generate files and upload these files to the "Files" area of a lakehouse, then start in the cloud from that point?
At the moment you may be tied down by the lack of notebook support for the data gateway. In the future, you may need a support ticket to ask about the best option and the performance issues.
Kind Regards,
Dennes
Thanks for your explanation here, Dennes. This issue has been bothering me as well these days.
Now I have to think about using a dataflow to ingest the source data instead of a Notebook, since notebooks don't support the data gateway.
I would like to add one point: sometimes there can be hundreds or thousands of source tables, and I need to manually configure the destination one by one in the dataflow, as below.
In addition, a notebook would be much easier to maintain if anything needs to change, like the data source connection string, database tables, etc.
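For instance, a rough loop like this could replace the one-by-one destination configuration (the table list, landing folders, and naming convention are all assumptions):

```python
# Sketch: drive the load from a table list instead of configuring each
# destination by hand. List, paths, and naming are assumptions.
tables = ["dbo.Customers", "dbo.Orders", "dbo.OrderLines"]  # hypothetical

for source_table in tables:
    target_name = source_table.split(".")[-1].lower()

    df = (
        spark.read
        .option("header", "true")
        .csv(f"Files/raw/{target_name}/")  # one landing folder per table
    )

    # Same destination logic for every table - one place to change later.
    df.write.mode("overwrite").format("delta").saveAsTable(target_name)
```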