pbix
Helper III

Loading data from an ADLS Delta table into a Spark DataFrame is slow

Hi there,

I'm seeing slow read times when loading data from Delta tables into DataFrames using PySpark in Synapse notebooks. This excludes the time it takes for the Spark cluster to spin up.

The Delta table I'm loading from is relatively small, roughly 1 million rows, yet it takes about 30 seconds to load these rows into a DataFrame. Compared to SQL Server, this is very slow.

The simple syntax I'm using is:

df = spark.read.format("delta").load(deltasource).select("field1", "field2", "field3")

The data consists of text codes and dates, and the table is not especially wide.

I'm not doing any processing on this DataFrame yet; I'm just loading it.
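One thing worth checking when measuring this: Spark evaluates DataFrames lazily, so a line like `spark.read.format("delta").load(...).select(...)` mostly builds a query plan (it reads the Delta transaction log to resolve the schema), and the rows are only scanned when an action such as `count()`, `collect()` or `display()` runs. Below is a minimal sketch for timing the action on its own, assuming `df` is the DataFrame from the snippet above; `time_action` is a hypothetical helper for illustration, not part of PySpark.

```python
import time
from typing import Any, Callable, Tuple


def time_action(action: Callable[[], Any]) -> Tuple[Any, float]:
    """Run a zero-argument callable and return (result, elapsed_seconds).

    Hypothetical helper: pass a Spark action such as df.count so the timing
    covers the actual read rather than just building the query plan.
    """
    start = time.perf_counter()
    result = action()
    elapsed = time.perf_counter() - start
    return result, elapsed


# Usage inside the Synapse notebook (assumes `spark` and `deltasource`
# already exist, as in the snippet above):
#
#   df = spark.read.format("delta").load(deltasource).select("field1", "field2", "field3")
#   rows, seconds = time_action(df.count)  # forces Spark to execute the read
#   print(f"{rows} rows in {seconds:.1f} s")
```

Because `count()` lets Spark skip reading most column values, a closer measure of a full read of the selected columns is `time_action(lambda: df.write.format("noop").mode("overwrite").save())`, which uses the `noop` sink available in Spark 3.x and discards the rows after reading them.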

Are there any likely reasons why loading the DataFrame is so slow? Synapse Serverless is also much faster at loading this dataset.

Thank you.
ananthkrishna99
Frequent Visitor

Hi @pbix,

Where are you executing this query: in Fabric or in Synapse? If it's in Synapse, what Spark pool size are you using to run the notebook?

If you are using Fabric, what type of environment is it: trial or dedicated capacity? If it's dedicated capacity, what are the SKU size and node size? If it's a trial, what node size are you using?

In Synapse Serverless, how did you test it? Was it simply a SELECT * FROM table, or some other way?
