Hi
Not necessarily a pipeline question, but what would be the best way to ingest a 1.5 TB database into Fabric?
I feel like a notebook into a KQL DB would be the most optimised approach. I am going to run through the tutorials for the likes of the taxi data, but I am wondering if anyone has had any experience importing this scale of data into Fabric.
Thanks
Hi @Housden996
Thanks for using Fabric Community.
Ingesting large-scale data into Microsoft Fabric can be achieved through several methods, each with its own strengths depending on your specific use case.
Here are some options:
COPY (Transact-SQL): This method offers flexible, high-throughput data ingestion from an external Azure storage account. You can use the COPY statement as part of your existing ETL/ELT logic in Transact-SQL code.
Data Pipelines: Pipelines offer a code-free or low-code experience for data ingestion. Using pipelines, you can orchestrate robust workflows for a full Extract, Transform, Load (ETL) experience that includes activities to help prepare the destination environment, run custom Transact-SQL statements, perform lookups, or copy data from a source to a destination.
Dataflows: An alternative to pipelines, dataflows enable easy data preparation, cleaning, and transformation using a code-free experience.
Cross-Warehouse Ingestion: Data ingestion from sources within the workspace is also possible. This scenario might be needed when you want to create a new table from a subset of another table, or as a result of joining tables across the warehouse and the lakehouse.
For large-scale data ingestion, consider using data pipelines or the COPY statement for the highest possible ingestion throughput. You can also use Dataflow Gen2, which provides a low-code interface for ingesting data from hundreds of data sources and transforming it with 300+ data transformations.
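To make the notebook route from the original question concrete, here is a minimal PySpark sketch of the staged-load pattern. It assumes the source database has already been exported to Parquet files in an Azure storage location the workspace identity can read; the path and table name are placeholders, not anything Fabric-specific.

```python
from pyspark.sql import SparkSession

# A SparkSession named `spark` already exists in a Fabric notebook;
# getOrCreate() simply returns it.
spark = SparkSession.builder.getOrCreate()

# Hypothetical path: assumes the source data has already been exported to
# Parquet files in an Azure storage account (or OneLake) that the workspace
# identity is allowed to read.
staged_path = "abfss://<container>@<account>.dfs.core.windows.net/export/"

# Read the staged files and land them as a Delta table in the Lakehouse
# attached to the notebook; from there the data is available to the SQL
# endpoint, pipelines, and Power BI.
df = spark.read.parquet(staged_path)
df.write.mode("overwrite").format("delta").saveAsTable("staged_data")
```

The same write pattern applies whether the files were staged by a pipeline Copy activity or by the source system's own export tooling.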
For more information, refer to these links:
https://www.youtube.com/watch?v=sXZkrFtN5oc
Ingesting data into the warehouse - Microsoft Fabric | Microsoft Learn
https://www.linkedin.com/pulse/mastering-data-loading-microsoft-fabric-comprehensive-machado-ph-d-/
https://community.fabric.microsoft.com/t5/Desktop/Need-help-on-handling-the-large-volume-of-data/td-...
https://www.linkedin.com/pulse/data-ingestion-patterns-microsoft-fabric-sankha-chakraborty-zei4c/
Hope this helps. Please let me know if you have any further questions.
If it is in Azure, the new Mirroring integration may simplify things for you: Mirroring - Microsoft Fabric | Microsoft Learn
1. This is a MySQL server hosted on the client side.
2. Unsure about how many tables; they didn't let me know.
3. A clone for the time being; it could be sunsetted once migrated over to MongoDB.
Unfortunately, I don't think I can mirror this type of DB.
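Since the source is a client-hosted MySQL server that cannot be mirrored, one possible route is a partitioned JDBC read from a Fabric notebook. This is only a sketch: it assumes network connectivity from the Fabric Spark pool to the MySQL host, the MySQL JDBC connector being available on the pool, and hypothetical table, key, and credential values.

```python
from pyspark.sql import SparkSession

# A SparkSession is already provided in a Fabric notebook; getOrCreate()
# simply returns it.
spark = SparkSession.builder.getOrCreate()

# Hypothetical connection details -- replace with the client's real host,
# database name, and credentials.
jdbc_url = "jdbc:mysql://<host>:3306/<database>"
connection_props = {
    "user": "<user>",
    "password": "<password>",
    "driver": "com.mysql.cj.jdbc.Driver",  # requires the MySQL connector jar on the pool
}

# Partitioned read: Spark issues numPartitions parallel queries, splitting
# the numeric key between lowerBound and upperBound. Parallelism is what
# makes a 1.5 TB pull practical; the table and key names are placeholders.
df = spark.read.jdbc(
    url=jdbc_url,
    table="source_table",
    column="id",
    lowerBound=1,
    upperBound=500_000_000,
    numPartitions=64,
    properties=connection_props,
)

# Land the data as a Delta table in the Lakehouse attached to the notebook.
df.write.mode("overwrite").format("delta").saveAsTable("source_table_raw")
```

Loading table by table this way also leaves everything as Delta tables in the Lakehouse, which keeps the notebook and ML work mentioned later in the thread straightforward.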
Hi @Housden996
We haven't heard back from you since the last response and were just checking in to see if your query has been resolved. If not, please reply with more details and we will try to help.
Thanks
Thanks, and apologies for the delay.
I will be going down the cloning route. How the potential client currently uses Tableau Server is a bit strange, as they don't really use it for reporting, more just for working with the data, so the likes of notebooks and the ML side of things are more useful to them.
I have looked into the copy feature, and this seems more cost-effective for the client!