Hello,
I use a GLM model in R in Databricks, where model training usually takes 3-4 hours. I am trying to replicate the same in a Fabric notebook with SparkR, and it is taking forever to run; every time, the session gets disconnected. Is it a known issue that SparkR in Fabric does not support GLM models, or is it just very slow? I tried increasing the capacity of the Fabric subscription, but it is not helping in any way. Any suggestions on this?
Thanks
My current model is as follows:

fit <- glm(
  formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,
  data = train_df,
  family = binomial,
  weights = weights,
  method = "brglmFit",  # from the brglm2 package
  type = "MPL_Jeffreys",
  a = 0.1,
  control = list(maxit = 5000, trace = TRUE, slowit = 0.1)
)
Based on my research, using
method = "brglmFit"
type = "MPL_Jeffreys"
in Fabric SparkR is very slow, and an equivalent is NOT available in Spark MLlib.
Is there any other alternative for Fabric?
- Train the brglmFit model locally (CPU/GPU)
- Save the model coefficients
- Load them into:
  - a Django backend, or
  - an inference-only service

Fabric is then used for:
- Data ingestion
- Feature prep
- Analytics
Hi @roshneematlani,
Yes, you are correct: brglmFit with MPL_Jeffreys has no equivalent in Spark MLlib, which is why the workload performs poorly in Fabric SparkR. These models are strictly single-node and run on the Spark driver; because of that, long-running training disconnects in Fabric.
There is no equivalent or better alternative inside Fabric today.
If Jeffreys-prior bias reduction is required, the recommended approach is to train the model outside Fabric using native R and then deploy the trained coefficients for scoring inside Fabric.
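As a minimal sketch of the scoring step inside Fabric: export the fitted coefficients from R (for example with write.csv(coef(fit), "coefs.csv")) and apply the logistic inverse link in plain Python. The coefficient names and values below are illustrative, not taken from the actual model.

```python
import math

# Illustrative coefficients, standing in for values exported from a locally
# trained brglmFit model (e.g. via write.csv(coef(fit), "coefs.csv") in R).
coefs = {"(Intercept)": -1.2, "x1": 0.8, "x2": -0.3}

def predict_proba(row):
    """Score one observation with the logistic inverse link 1 / (1 + exp(-eta))."""
    eta = coefs["(Intercept)"]
    for name, value in row.items():
        eta += coefs.get(name, 0.0) * value
    return 1.0 / (1.0 + math.exp(-eta))

p = predict_proba({"x1": 1.0, "x2": 2.0})  # eta = -1.2 + 0.8 - 0.6 = -1.0
```

Because only the coefficients cross the boundary, the slow brglmFit training never has to run on the Fabric driver.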
Thanks and regards,
Anjan Kumar Chippa
Hi @roshneematlani,
As we haven't heard back from you, we wanted to kindly follow up and check whether the provided solution resolved the issue. Please let us know if you need any further assistance.
Thanks and regards,
Anjan Kumar Chippa
Do we have any official wiki or other resource that states this limitation?
Thanks
Roshnee
Hi @roshneematlani,
Currently, there is no single official wiki page that explicitly calls out this limitation. However, the official Spark MLlib and SparkR documentation shows the set of supported GLM algorithms and parameters, and bias-reduced methods such as Jeffreys/Firth are not part of that supported surface area. Please refer to the documents below:
https://spark.apache.org/docs/latest/ml-classification-regression
https://spark.apache.org/docs/latest/api/R/articles/sparkr-vignettes
Thanks and regards,
Anjan Kumar Chippa
Hi @roshneematlani,
We wanted to kindly follow up and check whether the provided solution resolved the issue. Please let us know if you need any further assistance.
Thanks and regards,
Anjan Kumar Chippa
Hi @roshneematlani,
Thank you for reaching out to Microsoft Fabric Community.
This is not a known issue with GLM support in Fabric. While SparkR is supported in Fabric, GLM training via glm() in SparkR is driver-bound and not distributed. Increasing Fabric capacity only adds executor resources, which does not improve SparkR GLM performance.
In Fabric notebooks, long-running SparkR jobs can be very slow and may hit notebook-session or driver limits, which causes the disconnections you are seeing. This behaviour differs from Databricks, where SparkR clusters allow larger and more persistent drivers.
For large datasets or multi-hour training, the recommended approach is to use Spark MLlib GLM (via sparklyr or PySpark), which is fully distributed and supported in Fabric. SparkR is best for smaller or exploratory workloads.
Thanks and regards,
Anjan Kumar Chippa