Anonymous
Not applicable

Percentage data driven random sample

I've searched this group and found similar questions, but none close enough to answer the following.
 
Process:
We have a QC process that needs to sample closed changes at fixed percentages (20% and 100%, for ease of discussion) based on LCAB LCCB_SYS_ID. The sample will be exported to Excel until we can enable the premium features needed within PowerApps to write back to the source tables.
 
The idea is to filter by the type of change (Projects, SOPS, LowRisk Changes), then group the changes by LCCB. Each LCCB type has a predetermined sample rate, such as 20% or 100%. The number of changes needed to satisfy that rate is calculated as a measure.
 
Logic:
1) Imported all changes into a query and performed all data cleanup operations: 01 - Change - ServiceNowChangeRequests: Closed
2) Added an index to be used as part of the random number calculation to ensure unique values
3) Added a random buffer calculation to be used for sample selection
4) Aliased the first table to create additional work and confusion.
Actually, this was done to give the work a name and to store any measures or logic specific to the type of change: SOPDerived = '01 - Change - ServiceNowChangeRequests: Closed'
5) Created a measure that returns the rounded-up number of changes required for the percentage defined for the type/group, reflecting the page-level slicers
The percentage measure was built as: 20% = ROUNDUP(CALCULATE(COUNT('SOPDerived'[CHANGE_NUMBER])*.20),0)
 
6) Leveraged the 20% measure to get a value (the September value is 477 * 20% = 95.4, rounded up to 96)
96 changes would need to be sampled.
 
My initial thought was to take the calculated measure value (96) and feed it to a TOPN calculation over the random column's values. I have not gotten that to work, due to my ignorance, a scalar value error, or not using the correct type of calculation.
1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @Anonymous ,

Based on your description, it seems you are looking to create a sampling process that selects a fixed percentage of records from a dataset, grouped by a specific field (`LCCB_SYS_ID`), and that this process is influenced by slicers on your report page.

 

Please try the steps below:

1. Data Preparation:
- Import your dataset into Power BI and perform any necessary data cleanup.
- Add an index column to ensure each row has a unique identifier.
- Add a column with a random number to be used for random sampling.
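If the random column is added in DAX rather than in the query editor, a minimal sketch could be a calculated column like the one below. This is an assumption about where the column is created (the original post builds it in the query); the name RandomColumn simply matches the later steps.

```dax
-- Calculated column on 'SOPDerived' (sketch; the original post builds this column in the query instead).
-- RAND() returns a number >= 0 and < 1, evaluated once per row.
-- The values are recomputed when the column is recalculated, typically on refresh,
-- so the resulting sample is not stable across refreshes.
RandomColumn = RAND ()
```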

 

2. Creating Measures:
- Create a measure to calculate the number of changes required for sampling based on the percentage defined for each group. For example:

SampleSize = ROUNDUP(CALCULATE(COUNT('SOPDerived'[CHANGE_NUMBER]) * 0.20), 0)

- Ensure that this measure updates correctly based on the page slicers.
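Because each LCCB type has its own sample rate, the fixed 0.20 can be replaced by a per-group lookup. The sketch below assumes a hypothetical column 'SOPDerived'[SampleRate] holding each row's group rate (for example 0.20 or 1.00); that column is not part of the original model and would need to be added.

```dax
-- Assumption: 'SOPDerived'[SampleRate] is a hypothetical column with the rate for each LCCB group.
SampleSizeByLCCB =
SUMX (
    VALUES ( 'SOPDerived'[LCCB_SYS_ID] ),                                 -- one iteration per visible LCCB group
    VAR GroupRate = CALCULATE ( MAX ( 'SOPDerived'[SampleRate] ) )        -- rate for the current group
    VAR GroupCount = CALCULATE ( COUNT ( 'SOPDerived'[CHANGE_NUMBER] ) )  -- changes in the current group
    RETURN
        ROUNDUP ( GroupCount * GroupRate, 0 )                             -- round up per group, then sum
)
```

Rounding is applied per group before summing, so the total can be slightly larger than rounding the overall count once.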

 

3. Sampling Logic:
- To select the top N items based on the random number column, you can use the `TOPN` function. Its first argument is a scalar expression, so a measure such as the sample size can supply N. However, `TOPN` returns a table, so it cannot be the result of a measure by itself. Instead, you can use it in a calculated table, or wrap it in an aggregation inside a measure.
- Here's an example of how you might create a calculated table that takes the top N items based on your sample size measure:

SampledChanges = 
     VAR SampleSize = [SampleSize] -- This is your measure from step 2
     RETURN
     TOPN(
         SampleSize,
         ALL('SOPDerived'),
         'SOPDerived'[RandomColumn], -- This is the column with random numbers
         ASC
     )

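- Note that a calculated table is evaluated when the model is refreshed, not when a slicer changes. The measure is therefore computed without any slicer filters, and the resulting table stays the same until the next refresh.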
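For a sample that follows the page slicers, one alternative is a flag measure used as a visual-level filter, since measures are evaluated in the visual's filter context. The following is a sketch, not a tested implementation: the measure name IsSampled is hypothetical, and it assumes [SampleSize] and 'SOPDerived'[RandomColumn] exist as described in the steps above and that the random values are unique per row.

```dax
-- Hypothetical flag measure: 1 when the current change falls inside the random sample, otherwise 0.
IsSampled =
VAR CurrentRandom =
    SELECTEDVALUE ( 'SOPDerived'[RandomColumn] )                 -- random value of the row being evaluated
VAR TargetSize =
    CALCULATE ( [SampleSize], ALLSELECTED ( 'SOPDerived' ) )     -- sample size for the whole slicer selection
VAR RankInSelection =
    COUNTROWS (
        FILTER (
            ALLSELECTED ( 'SOPDerived' ),
            'SOPDerived'[RandomColumn] <= CurrentRandom          -- rows with an equal or smaller random value
        )
    )
RETURN
    IF ( NOT ISBLANK ( CurrentRandom ) && RankInSelection <= TargetSize, 1, 0 )
```

To use it, place CHANGE_NUMBER in a table visual and add IsSampled to that visual's filters with the condition "is 1". This version samples across the whole selection; sampling within each LCCB group would need the group kept in the filter context for both the count and the rank.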

 

4. Export to Excel:
- Once you have your sampled data, you can export it to Excel by using the "Export data" option available in Power BI visuals.

 

Please note that the calculated table approach may have performance implications depending on the size of your dataset. If you encounter performance issues, consider optimizing your model or sampling within the query editor before loading the data into the model.

 

Best regards,
Community Support Team_Binbin Yu
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.
