erenorbey

Serve real-time predictions seamlessly with ML model endpoints

Fabric offers a wide variety of data-science capabilities, from automated machine learning with FLAML to batch inferencing with the SynapseML PREDICT function. We’re pleased to announce that ML models can now serve real-time predictions from secure, scalable, and easy-to-use online endpoints. In addition to generating batch predictions in Spark, you can use endpoints to bring the predictive power of your ML models to other Fabric solutions and custom applications.

ML models in Fabric can now serve real-time predictions from secure, fully managed, and easy-to-use online endpoints.

ML model endpoints expand the reach of your data-science solutions while drastically simplifying the deployment process; let’s take a closer look.

Real-time serving with a single call or click

In Fabric, endpoints are available as built-in properties of most ML models, requiring no setup to kick off fully managed deployments. Models have dedicated endpoints for individual versions and a customizable default endpoint, which serves predictions from a version that you can choose and change. You can activate endpoints with a single call to our REST API or a single click in the Fabric interface; we’ll handle the rest.

You can activate real-time endpoints for an ML model version with a single call to our REST API or a single click in the Fabric interface.
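As a rough sketch of what a single activation call might look like from Python, the snippet below builds (but does not send) an authenticated POST request using only the standard library. The base URL, the route shape, and the helper name `build_activation_request` are assumptions made for illustration, not the documented API; the API reference is the place to confirm the actual route and any required payload.

```python
import urllib.request

# Assumed base URL for illustration; confirm it against the API reference.
API_BASE = "https://api.fabric.microsoft.com/v1"


def build_activation_request(workspace_id: str, model_id: str,
                             version: str, token: str) -> urllib.request.Request:
    """Build a POST request that would activate one model version's endpoint.

    The route below is hypothetical; only its overall shape is illustrative.
    """
    url = (f"{API_BASE}/workspaces/{workspace_id}/mlmodels/{model_id}"
           f"/endpoint/versions/{version}/activate")
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder identifiers; substitute your own workspace, model, and token.
req = build_activation_request("my-workspace-id", "my-model-id", "1", "my-token")
print(req.get_method(), req.full_url)
# Sending is left to the caller, e.g. urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the URL and headers before any Fabric capacity is touched.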

Auto-scaling enabled out of the box

Behind the scenes, Fabric manages the container infrastructure that hosts your model, dynamically adjusting the resources allocated to each endpoint based on incoming traffic. During periods without traffic, we’ll automatically scale resource usage down to zero, saving you Fabric capacity. You can customize this behavior, along with other endpoint settings, programmatically through our API or from your model’s settings in the Fabric interface.

The settings for your ML model now include endpoint properties, including a user-configurable default version and an auto-sleep toggle for each active version.

Sample predictions for sanity testing

Before serving predictions to other Fabric experiences or custom applications, you can preview sample predictions without leaving the product. A low-code interface enables instant testing, letting you key in requests with form fields or a JSON editor and examine responses in real time.

Before putting your ML model endpoints into production, you can preview sample predictions from a low-code interface without leaving the product.
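For readers who prefer the JSON editor to form fields, here is a minimal sketch of how a request body of scalar feature values might be assembled and validated in Python before pasting it in. The `inputs` field name and the helper `build_score_payload` are hypothetical, chosen only for this example; the real request schema is defined by your model's signature and the API reference.

```python
import json


def build_score_payload(rows: list[list[float]]) -> str:
    """Serialize rows of scalar feature values into a JSON request body.

    The "inputs" key is a hypothetical field name used for this sketch.
    """
    for row in rows:
        for value in row:
            # bool is a subclass of int, so it is rejected explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"expected a scalar number, got {value!r}")
    return json.dumps({"inputs": rows})


# Two example rows of made-up feature values.
payload = build_score_payload([[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]])
print(payload)
```

Rejecting non-scalar values up front mirrors the scalar-based schema prerequisite and catches malformed rows before a request is ever sent.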

Next steps

Learn more about ML model endpoints in our documentation, “Serve real-time predictions with ML model endpoints (Preview),” or in the API reference. Before getting started, please note a few prerequisites:

  • Your administrator must enable the tenant switch for ML model endpoints in the Fabric admin portal before you can use the feature.
  • To support real-time endpoints, your ML model must be registered with a scalar-based schema and must not depend on private or internal packages.

We can’t wait for you to try out ML model endpoints in Fabric. Let us know what you think by submitting feedback on Fabric Ideas or joining the conversation on the Fabric Community.