happyclick
New Member

Does Power BI machine learning only incorporate a portion of the data through sampled rows?

I ran a Power BI machine learning model on a dataset containing 200,000 rows. However, the model training report showed only 20,000+ sampled rows. Does this mean that only 20,000 rows were used for model training? Is there documentation that explains the concept of sampled rows? Thanks.

2 ACCEPTED SOLUTIONS
lbendlin
Super User

That's the point of ML. If you train it on the entire dataset, then it has nothing left to predict for you. You train it on a subset, and then you check how well it matches the entire set.

As for the percentage and sampling method, that may be described in a whitepaper somewhere, or it may be considered IP.
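To illustrate the train-on-a-subset idea, here is a minimal sketch in plain Python. It is not Power BI's actual implementation; the function name `holdout_split`, the 20% holdout fraction, and the fixed seed are all assumptions made for the example.

```python
import random

def holdout_split(rows, holdout_fraction=0.2, seed=0):
    """Shuffle rows and split them into (train, holdout) lists.

    holdout_fraction and seed are illustrative assumptions, not
    values taken from Power BI.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    # The first n_holdout shuffled rows are kept aside for evaluation;
    # the remainder is what a model would be trained on.
    return shuffled[n_holdout:], shuffled[:n_holdout]

train, holdout = holdout_split(range(1000))
print(len(train), len(holdout))  # 800 200
```

A model would be fitted on `train` only, and its predictions would then be compared against the known labels in `holdout`.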


v-xiandat-msft
Community Support

Hi @happyclick ,

When creating and training machine learning models in Power BI, the platform typically uses a subset of the entire dataset for training. The rows in this subset are called sampled rows.
The purpose of sampling is to make the training process more efficient, especially when working with large datasets. By using a smaller subset, models can be trained faster without sacrificing much accuracy.
However, it is important to recognize that the sampled rows represent only a portion of the data, not the entire dataset.

In your case, the model training report shows 20,000+ sampled rows out of a total dataset of 200,000 rows.
This indicates that the model was trained on that subset (the sampled rows) rather than on all 200,000 rows.
The training process uses these sampled rows to construct the model's internal representation (e.g., weights and biases) from the features and labels in the dataset.
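As a rough illustration of what drawing sampled rows from a larger dataset can look like, here is a short Python sketch. How Power BI actually selects its sample is not stated in this thread, so uniform random sampling, the helper name `sample_rows`, and the exact sample size of 20,000 are assumptions for the example only.

```python
import random

def sample_rows(rows, sample_size, seed=0):
    """Return a uniform random sample of rows without replacement.

    Uniform sampling is an assumption; Power BI's real sampling
    method is not described in this thread.
    """
    rng = random.Random(seed)
    rows = list(rows)
    if sample_size >= len(rows):
        # Small datasets are returned whole; nothing to sample.
        return rows
    return rng.sample(rows, sample_size)

dataset = range(200_000)                 # stand-in for the 200,000-row dataset
sampled = sample_rows(dataset, 20_000)   # illustrative sample size
print(len(sampled) / len(dataset))       # 0.1
```

With these numbers, the sampled rows make up one tenth of the dataset, which is in line with the 20,000+ figure shown in the training report.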

The official tutorial below may help:

Tutorial: Build a machine learning model in Power BI - Power BI | Microsoft Learn

Best Regards,

Xianda Tang

If this post helps, please consider accepting it as the solution to help other members find it more quickly.

