Automated ML integration with Power BI dataflows allows training and applying Binary Prediction, General Classification and Regression models. The ML models are internally represented as specially marked dataflow entities. I’ll describe how the ML-related entities are defined in the M language and how they can be edited using the Power Query editor.
The following diagram illustrates the entities generated during the ML process and the dependencies between them:
As the diagram shows, training an ML model creates two additional entities besides the model itself: a training data entity and a testing data entity.
Let's take an example of training a Binary Prediction model named “CustomerChurnModel” on an entity named “Customers” with selected columns “City”, “State”, “CreditCardBalance”, “WebEngagementScore” and “CustomerStatus”. The training data entity will get created with the following M definition:
    let
        Source = Customers,
        #"Selected columns" = Table.SelectColumns(Source, {"City", "State", "CreditCardBalance", "WebEngagementScore", "CustomerStatus"}),
        #"Removed nulls" = Table.SelectRows(#"Selected columns", each [CustomerStatus] <> null),
        #"Sampled input" = AI.SampleStratifiedWithHoldout("CustomerStatus", Table.RowCount(#"Removed nulls"), #"Removed nulls")
    in
        #"Sampled input"
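Note that the training entity does two things:
It removes rows where the label column (“CustomerStatus”) is null.
It samples the remaining rows with AI.SampleStratifiedWithHoldout, which returns a table with the same columns as the input table plus a Boolean “__IsTraining__” column indicating whether a given record should be used for training (true) or testing (false).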
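To make the role of the “__IsTraining__” flag concrete, here is a minimal sketch of how a sampled table can be split into its training and testing parts. The inline table, its values and the logical type chosen for “CustomerStatus” are made up for illustration; a real training data entity is produced by AI.SampleStratifiedWithHoldout on the AI workload.

```powerquery
let
    // Hypothetical stand-in for the sampled training data entity; all values are invented.
    Sampled = #table(
        type table [City = text, CustomerStatus = logical, __IsTraining__ = logical],
        {
            {"Seattle", true, true},
            {"Redmond", false, true},
            {"Bellevue", true, false}
        }
    ),
    // Rows flagged true are the ones used for training.
    TrainingRows = Table.SelectRows(Sampled, each [__IsTraining__] = true),
    // Rows flagged false are held out for testing.
    TestingRows = Table.SelectRows(Sampled, each [__IsTraining__] = false),
    // Report how many rows fall on each side of the split.
    Summary = [Training = Table.RowCount(TrainingRows), Testing = Table.RowCount(TestingRows)]
in
    Summary
```

With the three sample rows above, the query returns a record with Training = 2 and Testing = 1.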
The model entity is created with the following M definition:
    let
        Source = #"CustomerChurnModel Training Data",
        #"Invoked TrainPrediction" = AIInsights.Contents(){[Key = "AI.Execute"]}[Data]("AI.TrainPrediction", "Regular", [labelColumnName = "CustomerStatus", data = Source]),
        #"Selected training schema columns" = Table.SelectColumns(#"Invoked TrainPrediction", {"TrainingId", "Model", "Stats", "GlobalExplanation", "TrainingSchema", "TrainingAUC", "LabelColumn"})
    in
        #"Selected training schema columns"
It uses the training data entity as input and invokes the “AI.TrainPrediction” transform on the AI workload of your premium capacity. Note that the invocation could be written in a simpler form:
#"Invoked TrainPrediction" = AIInsights.Contents(){[Key = "AI.TrainPredictiontexttable"]}[Data]("CustomerStatus", Source)
AIInsights.Contents() returns a table of the AI transforms supported by the AI workload, with columns such as Name, Data (the actual M function invoking the transform) and Key (the unique identifier of the transform). The list includes transforms generated for the Azure ML services you have access to and for Cognitive Services. The “AI.Execute” transform used in the query generated by the “Add ML model” wizard is a wrapper that calls other transforms with their parameters passed in a record. Passing parameters this way lets new optional parameters be added to existing transforms without breaking existing models whose queries were generated without values for those parameters.
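As a rough sketch of how that table could be explored in the Power Query editor, the query below lists the available transforms by name and key. It only works where the AI workload is enabled, and the "AI." key prefix used in the filter is an assumption based on the two keys shown in this post (“AI.Execute” and “AI.TrainPredictiontexttable”), not a documented rule.

```powerquery
let
    // Table of transforms exposed by the AI workload (requires the workload to be enabled).
    Transforms = AIInsights.Contents(),
    // Keep the descriptive columns; the Data column holds the M functions themselves.
    NamesAndKeys = Table.SelectColumns(Transforms, {"Name", "Key"}),
    // Assumption: the AutoML transforms have text keys that start with "AI.".
    AiTransforms = Table.SelectRows(NamesAndKeys, each Text.StartsWith([Key], "AI."))
in
    AiTransforms
```

Browsing this list is one way to find the key of a transform before invoking it through its Data function.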
The model entity created by the AI.TrainPrediction transform is currently a table with a single record. This table also contains “Global explanations” for the model. The report works on top of these explanations. In the future there may be multiple records corresponding to versions of the model.
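Since the model entity is an ordinary table with one record, its fields can be read like any other table. The sketch below assumes the model entity is named CustomerChurnModel, as in this example, and that it has already been refreshed so the record exists; the field names come from the column list in the model entity definition.

```powerquery
let
    Source = CustomerChurnModel,
    // The model table currently holds a single record, so row 0 is the trained model.
    ModelRecord = Source{0},
    // Pick a few of the columns selected in the model entity definition.
    Summary = Record.SelectFields(ModelRecord, {"TrainingId", "LabelColumn", "TrainingAUC"})
in
    Summary
```

If multiple model versions are stored in the future, the fixed index 0 would need to be replaced by an explicit selection of the desired record.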
[Figure: sample output of AI.TrainPrediction]
The testing data entity allows you to explore the records used by AutoML for model evaluation and hyperparameter tuning. For the above scenario its generated M definition would be:
    let
        Source = #"CustomerChurnModel Training Data",
        #"Filtered rows" = Table.SelectRows(Source, each ([__IsTraining__] = false)),
        #"Invoked Scoring" = CustomerChurnModel.Score(#"Filtered rows", "CustomerChurnModelOutput", 0.5)
    in
        #"Invoked Scoring"
It uses the training data entity as input and selects rows where the “__IsTraining__” column added during sampling is set to false. It then applies the scoring function defined as a separate query:
    let
        ApplyScoringFunction = (inputQuery as table, newColumn as text, decisionThreshold as number) =>
            let
                MlModel = CustomerChurnModel,
                MlModelJson = try Text.FromBinary(Json.FromValue(MlModel{0})) otherwise "InvalidModel",
                Source = inputQuery,
                SelectedBaseEntityColumns = {"City", "State", "CreditCardBalance", "WebEngagementScore", "CustomerStatus"},
                InputRowCount = Table.RowCount(Source),
                InputTableType = Value.Type(Source),
                SelectedColumnsTypes = List.Transform(SelectedBaseEntityColumns, each Type.TableColumn(InputTableType, _)),
                ScoringFunction =
                    let
                        ScoringFunctionScalarType = type function (row as record) as any,
                        VectorizedScoringFunction = (input as table) =>
                            let
                                ExpandedColumns = Table.ExpandRecordColumn(input, "row", SelectedBaseEntityColumns),
                                ExpandedColumnsWithTypes = Table.TransformColumnTypes(ExpandedColumns, List.Zip({SelectedBaseEntityColumns, SelectedColumnsTypes})),
                                ErrorList = List.Repeat({[Output = null]}, InputRowCount),
                                Result =
                                    if MlModelJson <> "InvalidModel" then
                                        (try Table.ToRecords(AIInsights.Contents(){[Key = "AI.Execute"]}[Data]("AI.ScorePrediction", "Vectorized", [data = ExpandedColumns, scoreParameters = MlModelJson])) otherwise ErrorList)
                                    else
                                        ErrorList
                            in
                                Result,
                        ScalarVectorScoringFunction = Function.ScalarVector(ScoringFunctionScalarType, VectorizedScoringFunction)
                    in
                        ScalarVectorScoringFunction,
                AddScoringColumn = Table.AddColumn(Source, newColumn, each ScoringFunction(_)),
                ExpandResultColumns = Table.ExpandRecordColumn(AddScoringColumn, newColumn, {"PredictionScore", "PredictionExplanation"}, {Text.Combine({newColumn, "PredictionScore"}, "."), Text.Combine({newColumn, "PredictionExplanation"}, ".")}),
                LabeledOutput = Table.AddColumn(ExpandResultColumns, Text.Combine({newColumn, "Outcome"}, "."), each Record.Field(_, Text.Combine({newColumn, "PredictionScore"}, ".")) >= decisionThreshold * 100),
                ReplacedErrors = Table.ReplaceErrorValues(LabeledOutput, {{Text.Combine({newColumn, "Outcome"}, "."), null}, {Text.Combine({newColumn, "PredictionScore"}, "."), null}, {Text.Combine({newColumn, "PredictionExplanation"}, "."), null}}),
                TransformTypes = Table.TransformColumnTypes(ReplacedErrors, {{Text.Combine({newColumn, "Outcome"}, "."), type logical}, {Text.Combine({newColumn, "PredictionScore"}, "."), type text}, {Text.Combine({newColumn, "PredictionExplanation"}, "."), type text}})
            in
                TransformTypes
    in
        ApplyScoringFunction
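Note that the decision threshold value passed by the generated testing data entity is 0.5. The scoring function invokes the “AI.ScorePrediction” transform on the AI workload of your premium capacity, passing the ML model and the input table. The output gains three new columns, each prefixed with the name passed as newColumn: Outcome (Boolean), PredictionScore and PredictionExplanation. Because Outcome is computed as PredictionScore >= decisionThreshold * 100, a threshold of 0.5 marks every row with a score of 50 or higher as true.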
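The thresholding step can be tried in isolation, without calling the AI workload. The sketch below repeats the comparison used in the LabeledOutput step on a small inline table; the customer names and scores are hypothetical, and the scores are assumed to be on a 0 to 100 scale, as the multiplication by 100 in the scoring function suggests.

```powerquery
let
    // Hypothetical prediction scores on a 0-100 scale.
    Scored = #table(
        type table [Customer = text, PredictionScore = number],
        {{"A", 35}, {"B", 50}, {"C", 82}}
    ),
    // Same default threshold that the generated testing data entity passes.
    decisionThreshold = 0.5,
    // Same comparison as the LabeledOutput step of the scoring function.
    WithOutcome = Table.AddColumn(Scored, "Outcome", each [PredictionScore] >= decisionThreshold * 100, type logical)
in
    WithOutcome
```

Here customer A gets Outcome = false, while B and C get Outcome = true, since the comparison is inclusive at a score of exactly 50.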
The “Add ML model” wizard creates the entities described above in the dataflow, but it’s only during refresh that the training actually occurs and the resulting model is materialized in the model entity in the dataflow store (Azure Data Lake). After training, the model can be applied to another entity with a matching schema.
Applying the trained ML model to an entity, say “NewCustomers”, creates another entity whose name has the word “enriched” and the ML model name appended, e.g. “NewCustomers enriched CustomerChurnModel”. The definition of such an enriched entity would be:
    let
        Source = NewCustomers,
        #"Invoked CustomerChurnModel.Score" = CustomerChurnModel.Score(Source, "CustomerChurnModel", 0.5)
    in
        #"Invoked CustomerChurnModel.Score"
The enriched entity definition uses the same scoring function as the test data entity.
Please note: The application of the model to an entity also occurs during dataflow refresh when the enriched entity is materialized in the dataflow store.
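Because the enriched entity is plain M, it can be edited in the Power Query editor, for example to use a stricter decision threshold. The following is a sketch only: the 0.7 threshold, the "Churn" column prefix and the final filter step are illustrative choices, not something the wizard generates, and it assumes “NewCustomers” has the same selected columns as the training entity.

```powerquery
let
    Source = NewCustomers,
    // Illustrative threshold: scores of 70 or higher are labeled true.
    Scored = CustomerChurnModel.Score(Source, "Churn", 0.7),
    // The scoring function names its output columns "<newColumn>.Outcome" and so on.
    PredictedTrue = Table.SelectRows(Scored, each [#"Churn.Outcome"] = true)
in
    PredictedTrue
```

As with the generated definition, the scoring only runs when the dataflow is refreshed.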
Marek Rycharski | Principal Software Engineer at Microsoft Power BI (Artificial Intelligence) team