
js15
Helper I

Guidance Needed for Importing Large Datasets into Microsoft Fabric Notebooks

I hope this message finds you well. I am currently working on a project that involves importing a large dataset into a Fabric notebook and subsequently transferring it into a DataFrame. However, I am unsure of the best approach to handle the dataset in chunks within Fabric.

Could you please provide guidance on how to effectively split the dataset into manageable chunks for import? Additionally, any tips on transferring these chunks into a DataFrame within the notebook would be greatly appreciated.

Thank you for your assistance.

1 ACCEPTED SOLUTION
v-saisrao-msft
Community Support

Hi @js15,

Thank you for reaching out to the Microsoft Fabric Forum Community.

 

Regarding importing large datasets into Microsoft Fabric notebooks: to handle such datasets efficiently, please follow the steps below.

To prevent memory issues and ensure efficient processing, consider breaking the dataset into smaller chunks. Many tools, such as pandas, support reading data in chunks, so only a manageable portion is loaded into memory at a time.

  • CSV files: Use the chunksize parameter of pandas read_csv to read the dataset in segments. This keeps memory usage low by working with smaller portions of the data at a time.
  • Parquet files: pandas read_parquet loads the whole file at once, so for chunked reading use pyarrow, which can stream the file in record batches and is optimized for large datasets. A sketch of both approaches follows this list.
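A minimal sketch of both approaches, assuming the file sits under the default Lakehouse Files area (the paths and the 100,000-row chunk size are placeholders to adjust for your own data):

```python
# Sketch: read a large file in manageable chunks inside a Fabric notebook.
# Paths and chunk sizes are illustrative placeholders.
import pandas as pd
import pyarrow.parquet as pq

# CSV: read_csv with chunksize returns an iterator of DataFrames.
csv_chunks = []
for chunk in pd.read_csv("/lakehouse/default/Files/large_dataset.csv", chunksize=100_000):
    # Filter or transform each chunk here before keeping it.
    csv_chunks.append(chunk)

# Parquet: pyarrow can stream record batches instead of loading the whole file.
parquet_file = pq.ParquetFile("/lakehouse/default/Files/large_dataset.parquet")
parquet_chunks = []
for batch in parquet_file.iter_batches(batch_size=100_000):
    parquet_chunks.append(batch.to_pandas())
```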

After processing each chunk, you can either handle the chunks separately or merge them into a single DataFrame, depending on your analytical needs. This can be done with the pandas concat function or another appropriate method.
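Continuing the sketch above, merging the collected chunks might look like this:

```python
# Merge the per-chunk DataFrames collected above into a single DataFrame.
import pandas as pd

combined_df = pd.concat(csv_chunks, ignore_index=True)
print(combined_df.shape)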

If the dataset is very large, parallel processing can expedite the import and processing phases.
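Since Fabric notebooks also provide a Spark session, one common way to parallelize the read is to let Spark do the distributed work and only convert the reduced result to pandas if it fits in driver memory. A sketch under those assumptions (the path and the filter column are hypothetical examples):

```python
# Sketch: let Spark read and filter the data in parallel across executors,
# then convert only the reduced result to pandas.
df_spark = spark.read.parquet("Files/large_dataset.parquet")   # placeholder path
df_filtered = df_spark.filter(df_spark["year"] == 2024)        # hypothetical column and filter
pdf = df_filtered.toPandas()                                   # only if the filtered data fits in memory
```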

When handling large datasets, it is crucial to monitor memory usage. Adjust the chunk size based on the memory available to avoid out-of-memory errors.
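One rough way to pick a chunk size from available memory, assuming the psutil package is available in your notebook environment (the per-row estimate and safety factor are illustrative, not prescriptive):

```python
# Rough heuristic for sizing chunks from available memory.
import psutil

available_bytes = psutil.virtual_memory().available
approx_bytes_per_row = 500      # rough guess; measure a sample of your data instead
safety_factor = 0.1             # use only ~10% of free memory per chunk

chunk_rows = int(available_bytes * safety_factor / approx_bytes_per_row)
print(f"Suggested chunksize: {chunk_rows:,} rows")
```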

 

If this post helps, then please give us ‘Kudos’ and consider accepting it as a solution to help other members find it more quickly.

 

Thank you. 

 


5 REPLIES
v-saisrao-msft
Community Support

Hi @js15,

 

I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions. If my response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.


Thank you.

Yes, that is great! Thank you.

 

js15
Helper I

Thank you for your assistance. However, it appears that the chunksize parameter is not being recognized. Could you please advise on the next steps I should take?

Hi @js15,

 

Thank you for the clarification. It appears that the following thread, which has already been resolved, is similar to your issue:

https://community.fabric.microsoft.com/t5/Service/Chunksize-and-num-row-parameter-not-found-in-Fabri...

Could you please confirm if your issue has been resolved? If so, kindly mark the helpful reply and accept it as the solution. This will assist other community members in resolving similar problems more quickly.

 

Thank you.

 

 

