omkar3888
New Member

Prediction results ingestion takes a long time

Hello Community Users,

I was trying to develop an ML model in Fabric using a PySpark notebook. I have also registered a DecisionTreeClassifier model with a TfidfVectorizer. But after using the registered model for prediction, ingesting the predictions takes a very long time: ingesting 10K records took 25 minutes, and my prediction DataFrame has 3 million records. I followed every step mentioned in the Fabric GitHub repo. The model is not running across the entire cluster; only a single node is working on the process. Can somebody explain why this is happening? Am I missing any step?

1 ACCEPTED SOLUTION
v-dineshya
Community Support

Hi @omkar3888 ,

Thank you for reaching out to the Microsoft Community Forum.

 

If you wrapped your scikit‑learn DecisionTreeClassifier + TfidfVectorizer pipeline in a regular Python UDF (per-row), Spark has to shuttle each row across the JVM <-> Python boundary, which is extremely slow and often collapses execution to effectively one core/node. Regular UDFs operate one value at a time and suffer from serialization overhead; vectorized Pandas UDFs via Arrow process batches, dramatically reducing that overhead.
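The per-row cost is easy to demonstrate even outside Spark: calling predict once per record on a scikit-learn TF-IDF + tree pipeline is dramatically slower than a single vectorized call over the whole batch, which is the same effect a scalar Python UDF produces inside Spark. A minimal local sketch (the toy model and data here are illustrative, not the poster's actual pipeline):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy text-classification pipeline: TF-IDF features into a decision tree
texts = ["good product", "bad service", "great value", "poor quality"] * 250
labels = [1, 0, 1, 0] * 250
pipe = make_pipeline(TfidfVectorizer(), DecisionTreeClassifier())
pipe.fit(texts, labels)

# Per-row scoring (what a scalar UDF effectively does): one call per record
slow_preds = [pipe.predict([t])[0] for t in texts]

# Batch scoring (what pandas UDFs / MLFlowTransformer do): one vectorized call
fast_preds = pipe.predict(texts)

assert list(fast_preds) == slow_preds  # identical results, far fewer calls
```

Timing the two loops on real data makes the gap obvious; the batch path also maps directly onto how Arrow moves whole column batches across the JVM/Python boundary.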


Fabric’s default/starter pool can run single-node sessions, with the driver and executor co-located. If your workspace, pool, or environment is set to a minimum of one node, you will observe exactly one node working regardless of data size. Consider a custom Spark pool with a higher node count and appropriate executor/core settings.


If you collect to the driver (e.g. df.toPandas()) before writing, or write with features like Optimize Write and V‑Order left untuned, the write phase can dominate. Optimize Write introduces an extra shuffle to produce larger files; Microsoft’s docs note it can add roughly 15% to write times on average, in exchange for much faster reads. Misconfigured writes or small-file patterns can explode latency.
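A rough sketch of that write-path tuning, assuming a Fabric notebook where the built-in spark session is predefined (the property names follow the Synapse/Fabric Delta docs; verify them against your runtime before relying on them):

```python
# Delta write-path session configs (Synapse/Fabric property names - verify
# against your runtime's documentation)
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")
# Target bin size for compacted files (default ~1 GB; check units in the docs)
spark.conf.set("spark.microsoft.delta.optimizeWrite.binSize", "1073741824")

# Write predictions straight from the distributed DataFrame - no collect()/toPandas()
pred_df.write.format("delta").mode("overwrite").saveAsTable("lakehouse_predictions")
```

`pred_df` and the table name are placeholders; the point is that the DataFrame is written by the executors in parallel, never funnelled through the driver.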

 

Spark best practices not enabled: if Adaptive Query Execution (AQE), proper partitioning, and column pruning aren’t in place, you can get skewed partitions and inefficient shuffles. Fabric’s Spark basics guide calls these out as default tuning steps.
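The AQE and skew settings are standard Spark session configs (usually on by default in recent runtimes, but worth confirming); the snippet below assumes a notebook with the built-in spark session, and the column names are purely illustrative:

```python
# Adaptive Query Execution: re-plans shuffles at runtime and splits skewed partitions
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")

# Prune to only the columns the model actually needs before scoring
scoring_df = input_df.select("id", "text_col")  # illustrative column names
```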

 

Please try the following to fix the issue.

 

1. Use Fabric’s scalable batch scoring instead of manual UDFs. If you registered the model in Fabric (MLflow), call it with the PREDICT path:

 

The MLFlowTransformer API (SynapseML) scores a Spark DataFrame with cluster-distributed execution and no per-row Python overhead. Please refer to the sample code:


from synapse.ml.predict import MLFlowTransformer

model = MLFlowTransformer(
    inputCols=["text_col"],      # columns your model expects (adjust)
    outputCol="prediction",
    modelName="YourModelName",   # your registered model's name
    modelVersion=1,
)
pred_df = model.transform(input_df)

 

Note: This approach is built for Fabric and scales out across executors.

 

2. Check cluster settings:

Workspace --> Spark Compute: is the node count > 1? Is autoscale enabled appropriately?
Environment --> Compute: are driver/executor cores and memory tuned?


3. Inspect the Spark UI/monitoring view for task/executor usage, Python UDF time, shuffle hotspots, and write stages.

 

4. Enable the basics: AQE on, DataFrame APIs (not RDD), prune columns before prediction, and avoid wide rows.

 

5. Write path: ensure you don’t collect() before writing. Use the distributed .write.format("delta") and tune Optimize Write/V‑Order per workload.


Note: Register your model with MLflow in Fabric. Use MLFlowTransformer or predict_batch_udf for distributed scoring. Write predictions directly to Delta; test with Optimize Write on/off and adjust bin size for your latency target. Verify multi-node compute and monitor the run. This typically moves throughput from hundreds of rows/sec (scalar UDF) to tens/hundreds of thousands of rows/sec (batch/Arrow), and ensures all executors participate.

 

Please refer below links.

Model scoring with PREDICT - Microsoft Fabric | Microsoft Learn

Lakehouse and Delta Tables - Microsoft Fabric | Microsoft Learn

Tutorial: Perform batch scoring and save predictions - Microsoft Fabric | Microsoft Learn

Workspace administration settings in Microsoft Fabric - Microsoft Fabric | Microsoft Learn

Compute Management in Fabric Environments - Microsoft Fabric | Microsoft Learn

Spark Basics - Microsoft Fabric | Microsoft Learn

Tutorial: Train and register machine learning models - Microsoft Fabric | Microsoft Learn

Delta Lake table optimization and V-Order - Microsoft Fabric | Microsoft Learn

Using optimize write on Apache Spark to produce more efficient tables - Azure Synapse Analytics | Mi...

 

I hope this information helps. Please do let us know if you have any further queries.

 

Regards,

Dinesh


3 REPLIES 3

Hi @omkar3888 ,

We haven’t heard back from you on the last response and were just checking to see if you have a resolution yet. If you have any further queries, do let us know.

 

Regards,

Dinesh

