I hope this message finds you well. I am currently working on a project that involves importing a large dataset into a Fabric notebook and loading it into a DataFrame. However, I am unsure of the best approach to handle the dataset in chunks within Fabric.
Could you please provide guidance on how to effectively split the dataset into manageable chunks for import? Additionally, any tips on transferring these chunks into a DataFrame within the notebook would be greatly appreciated.
Thank you for your assistance.
Hi @js15,
Thank you for reaching out to the Microsoft Fabric Forum Community.
To handle importing large datasets into Microsoft Fabric notebooks efficiently, please consider the following steps:
To prevent memory issues and ensure efficient processing, consider breaking down the dataset into smaller chunks. Many tools, such as pandas, support reading data in chunks, allowing only manageable portions to be loaded into memory at a time.
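For example, here is a minimal sketch using pandas' chunksize option; the file path and column name are hypothetical placeholders for your own data:

```python
import pandas as pd

# Hypothetical path to a CSV file in the attached lakehouse; replace with your own file.
file_path = "/lakehouse/default/Files/large_dataset.csv"

# With chunksize, read_csv returns an iterator of DataFrames instead of
# loading the whole file into memory at once.
for chunk in pd.read_csv(file_path, chunksize=100_000):
    # Process each 100,000-row chunk here; "amount" is a placeholder column.
    print(len(chunk[chunk["amount"] > 0]))
```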
After processing each chunk, you can either handle the chunks separately or merge them into a single DataFrame, depending on your analytical needs. This can be done with pandas' concat function or other appropriate methods.
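A rough sketch of that pattern, again with a placeholder path and placeholder column names:

```python
import pandas as pd

file_path = "/lakehouse/default/Files/large_dataset.csv"  # placeholder path

processed_chunks = []
for chunk in pd.read_csv(file_path, chunksize=100_000):
    # Keep only the columns needed downstream to reduce memory per chunk.
    processed_chunks.append(chunk[["id", "amount"]])  # placeholder columns

# Merge the processed chunks into a single DataFrame.
df = pd.concat(processed_chunks, ignore_index=True)
print(df.shape)
```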
If the dataset is very large, parallel processing can expedite the import and processing phases.
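One way to do this in a Fabric notebook is to let the built-in Spark runtime read and process the file across executors, then convert only the reduced result to pandas. This sketch assumes a PySpark notebook where the `spark` session is pre-created and uses a placeholder path and column name:

```python
# `spark` is the SparkSession that Fabric PySpark notebooks create for you.
spark_df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/large_dataset.csv")  # placeholder lakehouse-relative path
)

# Do the heavy filtering/aggregation in Spark (distributed), then bring a
# bounded result back to pandas for local analysis.
result = spark_df.groupBy("category").count()  # "category" is a placeholder column
pdf = result.toPandas()
print(pdf.head())
```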
When handling large datasets, it is crucial to monitor memory usage. Adjust the chunk size based on your available memory to avoid overflow.
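As a rough rule of thumb, you can estimate a suitable chunk size from a small sample of the file; this sketch uses a placeholder path and a ~500 MB per-chunk target that you should tune to your session's available memory:

```python
import pandas as pd

file_path = "/lakehouse/default/Files/large_dataset.csv"  # placeholder path

# Read a small sample to estimate memory per row.
sample = pd.read_csv(file_path, nrows=10_000)
bytes_per_row = sample.memory_usage(deep=True).sum() / len(sample)

# Size chunks to roughly 500 MB each; adjust to the memory available in your session.
target_chunk_bytes = 500 * 1024 * 1024
chunk_size = max(1, int(target_chunk_bytes / bytes_per_row))
print(f"Estimated rows per chunk: {chunk_size}")
```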
If this post helps, please give it a 'Kudos' and consider accepting it as a solution to help other members find it more quickly.
Thank you.
Hi @js15,
I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions. If my response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.
Thank you.
Yes, that is great! Thank you.
Thank you for your assistance. However, it appears that the chunk size is not being recognized. Could you please advise on the next steps I should take?
Hi @js15,
Thank you for the clarification. It appears that the following thread, which has already been resolved, is similar to your issue:
Could you please confirm if your issue has been resolved? If so, kindly mark the helpful reply and accept it as the solution. This will assist other community members in resolving similar problems more quickly.
Thank you.