I ran a Power BI machine learning model on a dataset containing 200,000 rows. However, the model training report displayed only 20,000+ sampled rows. Does this imply that only 20,000 rows were used for model training? Is there documentation available that explains the concept of sampled rows? Thanks.
That's the point of ML. If you train it on the entire dataset, there is nothing left for it to predict for you. You train it on a subset, and then you check how well its predictions match the rest of the data.
As for the percentage and sampling method, that may be described in a whitepaper somewhere, or it may be considered proprietary.
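To make the "train on a subset, check against the rest" idea concrete, here is a minimal sketch in plain Python. It is not Power BI's actual procedure: the 80/20 split, the synthetic rows, and the helper name `train_test_split` are all assumptions chosen for illustration.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle rows and split them into a training part and a held-out part."""
    rng = random.Random(seed)
    shuffled = rows[:]            # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data: 1,000 (feature, label) pairs, label is 1 when feature > 0.5.
data_rng = random.Random(42)
rows = [(x, int(x > 0.5)) for x in (data_rng.random() for _ in range(1000))]

train, test = train_test_split(rows)

# "Train" a trivial model on the training rows only:
# the decision threshold is the midpoint between the two class means.
mean_0 = sum(x for x, y in train if y == 0) / sum(1 for _, y in train if y == 0)
mean_1 = sum(x for x, y in train if y == 1) / sum(1 for _, y in train if y == 1)
threshold = (mean_0 + mean_1) / 2

# Evaluate on rows the model never saw during training.
accuracy = sum(int(x > threshold) == y for x, y in test) / len(test)
print(f"train rows: {len(train)}, test rows: {len(test)}, accuracy: {accuracy:.3f}")
```

The accuracy figure is only meaningful because the test rows were held back; scoring the model on its own training rows would overstate how well it generalizes.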
Hi @happyclick,
When creating and training machine learning models in Power BI, the platform typically uses a subset of the entire dataset for training. The rows in this subset are called sampled rows.
The purpose of sampling is to make the training process more efficient, especially when working with large datasets. By using a smaller subset, models can be trained faster without sacrificing much accuracy.
However, it is important to recognize that the sampled rows represent only a portion of the data, not the entire dataset.
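As a rough illustration of why a sample can stand in for the full data, the sketch below draws 20,000 rows at random from 200,000 synthetic values and compares a summary statistic. The row counts mirror the numbers in the question; the uniform random sampling and the synthetic column are assumptions, not a description of how Power BI selects its sampled rows.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical numeric column with 200,000 rows.
full = [rng.gauss(50.0, 10.0) for _ in range(200_000)]

# Draw 20,000 rows uniformly at random, without replacement.
sample = rng.sample(full, 20_000)

full_mean = statistics.fmean(full)
sample_mean = statistics.fmean(sample)
print(f"full mean:   {full_mean:.3f}")
print(f"sample mean: {sample_mean:.3f}")
print(f"difference:  {abs(full_mean - sample_mean):.3f}")
```

A random sample of this size usually tracks the full dataset closely on simple statistics, which is why training on it tends to cost little accuracy. Rare categories or outliers can still be underrepresented in a sample.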
In your case, you mentioned that your model training report shows 20,000+ sampled rows out of a total dataset of 200,000 rows.
Taken at face value, this indicates that the model was trained on that sample of the data (the sampled rows), not on all 200,000 rows.
The actual training process involves using these sample rows to construct an internal representation of the model (e.g., weights and biases) based on the features and labels in the dataset.
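The idea of building an internal representation from sampled rows can be sketched with a one-feature linear model fitted by gradient descent. This is a generic illustration only: the synthetic rows, the learning rate, and the choice of a linear model are assumptions, and Power BI's own training algorithms are not shown here.

```python
import random

rng = random.Random(1)

# Hypothetical sampled rows: label = 3 * feature + 2, plus a little noise.
rows = [(x, 3.0 * x + 2.0 + rng.gauss(0.0, 0.1))
        for x in (rng.uniform(-1.0, 1.0) for _ in range(500))]

w, b = 0.0, 0.0          # the model's internal parameters (weight and bias)
learning_rate = 0.1

for _ in range(500):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in rows) / len(rows)
    grad_b = sum(2 * (w * x + b - y) for x, y in rows) / len(rows)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned weight: {w:.3f}, learned bias: {b:.3f}")
```

After training, only `w` and `b` are kept. The rows themselves are not part of the model, which is why the model can later score rows it never saw.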
The official tutorial linked below may help:
Tutorial: Build a machine learning model in Power BI - Power BI | Microsoft Learn
Best Regards,
Xianda Tang
If this post helps, please consider accepting it as the solution so that other members can find it more quickly.