roshneematlani
Microsoft Employee

R Model training takes forever in Fabric Notebook

Hello,

 

I use a GLM model in R in Databricks, which usually takes 3-4 hours to train. I am trying to replicate the same in a Fabric notebook with SparkR, and it takes forever to run; the session gets disconnected every time. Is it a known issue that SparkR in Fabric does not support GLM models, or is it just very slow? I tried increasing the capacity of the Fabric subscription, but it did not help. Any suggestions?

 

Thanks

1 ACCEPTED SOLUTION

Hi @roshneematlani,

 

Currently there is no single official wiki page that explicitly calls out the limitation. However, the official Spark MLlib and SparkR documentation lists the supported GLM algorithms and parameters, and bias-reduced methods such as Jeffreys/Firth are not part of that supported surface area. Please refer to the documents below:
https://spark.apache.org/docs/latest/ml-classification-regression

https://spark.apache.org/docs/latest/api/R/articles/sparkr-vignettes

 

Thanks and regards,

Anjan Kumar Chippa

 

8 REPLIES
roshneematlani
Microsoft Employee

My current model is as follows:

    library(brglm2)  # provides the "brglmFit" fitting method

    fit <- glm(
      formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,
      data    = train_df,
      family  = binomial,
      weights = weights,
      method  = "brglmFit",       # bias-reduced fitting
      type    = "MPL_Jeffreys",   # Jeffreys-prior penalized likelihood
      a       = 0.1,
      control = list(maxit = 5000, trace = TRUE, slowit = 0.1)
    )

Based on my research, using method = "brglmFit" with type = "MPL_Jeffreys" in Fabric SparkR is very slow, and no equivalent is available in Spark MLlib.

Is there any other alternative in Fabric?

  • Train brglmFit model locally (CPU/GPU)
  • Save model coefficients
  • Load them into:
    • Django backend
    • Or inference-only service
Fabric is then used for:
  • Data ingestion
  • Feature prep
  • Analytics
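
The "save coefficients, then load them into an inference-only service" step can be sketched roughly as below. This is a minimal illustration, not a method confirmed anywhere in this thread: it assumes the coefficients were exported from R as a JSON object keyed by term name (for example from coef(fit)), and the function names and coefficient values are hypothetical.

```python
import json
import math


def load_coefficients(json_text):
    """Parse {"(Intercept)": b0, "x1": b1, ...} into (intercept, weights)."""
    coefs = json.loads(json_text)
    intercept = coefs.pop("(Intercept)", 0.0)
    return intercept, coefs


def predict_proba(intercept, weights, row):
    """Apply the inverse logit to the linear predictor for one row."""
    eta = intercept + sum(beta * row[name] for name, beta in weights.items())
    # Numerically stable form of the logistic function.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Hypothetical values standing in for coefficients exported from R.
saved = '{"(Intercept)": -1.0, "x1": 0.5, "x2": 2.0}'
intercept, weights = load_coefficients(saved)
p = predict_proba(intercept, weights, {"x1": 2.0, "x2": 0.0})
```

Because scoring a binomial GLM with a logit link only needs the coefficient vector, the bias-reduced fit itself never has to run inside the service.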

Hi @roshneematlani,

 

Yes, you are correct: brglmFit with MPL_Jeffreys has no equivalent in Spark MLlib, which is why the workload performs poorly in Fabric SparkR. These models are strictly single-node and run on the Spark driver, which is why long-running training sessions disconnect in Fabric.

There is no equivalent or better alternative inside Fabric today.

If Jeffreys-prior bias reduction is required, the recommended approach is to train the model outside Fabric using native R and then deploy the trained coefficients for scoring inside Fabric.
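
For context, the objective behind type = "MPL_Jeffreys" can be written roughly as follows. This is a sketch based on my reading of the brglm2 documentation, not an authoritative statement of its implementation; the symbols (m_i for the prior weights, pi_i for the fitted probabilities) are notation introduced here.

```latex
\ell^{*}(\beta) = \ell(\beta) + a \,\log\det i(\beta),
\qquad
i(\beta) = X^{\top} W(\beta)\, X,
\qquad
W(\beta) = \operatorname{diag}\bigl\{\, m_i \,\pi_i(\beta)\,\bigl(1 - \pi_i(\beta)\bigr) \bigr\}
```

Here a is the power of the Jeffreys prior (0.1 in the posted call; a = 1/2 gives Firth's penalty for logistic regression). The penalty depends on the log-determinant of the Fisher information, which is not something the L2 regParam in Spark MLlib's GLM can express.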

Thanks and regards,

Anjan Kumar Chippa

Hi @roshneematlani,

 

As we haven’t heard back from you, we wanted to follow up and check whether the solution I provided resolved the issue. Please let us know if you need any further assistance.

 

Thanks and regards,

Anjan Kumar Chippa

Do we have any official wiki or any other resource which states this limitation?

 

Thanks

Roshnee

Hi @roshneematlani,

 

We wanted to follow up and check whether the solution I provided resolved the issue. Please let us know if you need any further assistance.

 

Thanks and regards,

Anjan Kumar Chippa

v-achippa
Community Support

Hi @roshneematlani,

 

Thank you for reaching out to Microsoft Fabric Community.

 

This is not a known issue with GLM support in Fabric. While SparkR is supported in Fabric, GLM training via glm() in SparkR is driver-bound and not distributed. Increasing Fabric capacity only adds executor resources, which does not improve SparkR GLM performance.

In Fabric notebooks, long-running SparkR jobs can be very slow and may hit notebook session or driver limits, which causes the disconnections you are seeing. This behaviour differs from Databricks, where SparkR clusters allow larger and more persistent drivers.

For large datasets or multi-hour training, the recommended approach is to use the Spark MLlib GLM (via sparklyr or PySpark), which is fully distributed and supported in Fabric. SparkR is best suited to smaller or exploratory workloads.
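
A distributed binomial GLM in PySpark might look roughly like the sketch below. It assumes a Spark DataFrame with the columns y, weights and x1 to x10 from the formula posted in this thread; the helper name fit_spark_glm and the maxIter value are illustrative assumptions. Note that this is a plain GLM: it does not apply the Jeffreys-prior bias reduction of brglmFit.

```python
# Feature columns x1..x10, matching the formula posted in this thread.
FEATURES = [f"x{i}" for i in range(1, 11)]

# Keyword arguments for pyspark.ml.regression.GeneralizedLinearRegression.
GLM_PARAMS = {
    "family": "binomial",
    "link": "logit",
    "labelCol": "y",
    "weightCol": "weights",
    "featuresCol": "features",
    "maxIter": 100,  # assumption; Spark's default is 25
}


def fit_spark_glm(train_sdf):
    """Fit a distributed binomial GLM on a Spark DataFrame."""
    # Imported lazily so the settings above can be inspected without Spark.
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import GeneralizedLinearRegression

    # MLlib expects the predictors packed into a single vector column.
    assembler = VectorAssembler(
        inputCols=FEATURES, outputCol=GLM_PARAMS["featuresCol"]
    )
    glm = GeneralizedLinearRegression(**GLM_PARAMS)
    return glm.fit(assembler.transform(train_sdf))
```

In a Fabric notebook this would be called as model = fit_spark_glm(train_sdf), after which model.coefficients and model.intercept hold the fitted values.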

 

Thanks and regards,

Anjan Kumar Chippa
