Solved: Percentage data driven random sample

Anonymous · 01-03-2024

I've searched this group and found similar questions, but none close enough to answer the following.

Process:

We have a QC process that needs to sample closed changes at fixed percentages (20% and 100%, for ease of discussion), based on LCAB LCCB_SYS_ID. The sample will be exported to Excel until we can enable the premium features needed within PowerApps to write back to the source tables.

The idea is to filter by the type of change (Projects, SOPS, LowRisk Changes), then group the changes by LCCB. Each LCCB type has a predetermined sample rate, such as 20% or 100%. The number of changes needed to satisfy that rate is calculated as a measure.

Logic:

1) Import all changes into a query and perform all data cleanup operations there: 01 - Change - ServiceNowChangeRequests: Closed

2) Added an index to be used as part of the random number calculation to ensure unique values

3) Added a random buffer calculation to be used for sample selection

4) Alias the first table to create additional work and confusion.

Actually done to give the work a name and store any measure or logic that might be specific to the type of change. SOPDerived = '01 - Change - ServiceNowChangeRequests: Closed'

5) Created a measure that returns the rounded-up number of changes required by the percentage defined for the type/group, and that reflects the page-level slicers

Percentage measure was built as 20% = ROUNDUP(CALCULATE(COUNT('SOPDerived'[CHANGE_NUMBER])*.20),0)

6) Leverage the count of the 20% Measure to get a value (September value is 477 * 20% = 95.4 rounded up to 96)

96 Changes would need to be sampled
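
As a quick check of the arithmetic above, the same rounding can be reproduced in a standalone DAX query (for example in DAX query view or DAX Studio). This is only a sketch: the literal 477 is the September count quoted above, and the column labels are made up for illustration.

```dax
-- Standalone check of the rounding; no model tables are referenced.
EVALUATE
ROW (
    "Exact 20 percent", 477 * 0.20,                 -- 95.4
    "Sample size", ROUNDUP ( 477 * 0.20, 0 )        -- rounds up to 96
)
```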

My initial thought was to grab the calculated measure value (96) and feed it to a TOPN calculation over the random column's values. That has not worked, whether due to my ignorance, a scalar value error, or not using the correct type of calculation.

Anonymous · 01-09-2024

Hi @Anonymous ,

Based on your description, it seems you are looking to create a sampling process that selects a fixed percentage of records from a dataset, grouped by a specific field (`LCCB_SYS_ID`), and that this process is influenced by slicers on your report page.

Please try the steps below:

1. Data Preparation:
- Import your dataset into Power BI and perform any necessary data cleanup.
- Add an index column to ensure each row has a unique identifier.
- Add a column with a random number to be used for random sampling.
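
If the random number is added in DAX rather than in the query editor, a calculated column is one way to do it. A minimal sketch, assuming the column is added to 'SOPDerived' and named RandomColumn to match the later example:

```dax
RandomColumn =
    -- RAND() returns a number greater than or equal to 0 and less than 1.
    -- In a calculated column it is evaluated once per row and is
    -- recomputed whenever the column is recalculated (e.g. on data refresh),
    -- so the sample changes with each refresh.
    RAND ()
```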

2. Creating Measures:
- Create a measure to calculate the number of changes required for sampling based on the percentage defined for each group. For example:

SampleSize = ROUNDUP(CALCULATE(COUNT('SOPDerived'[CHANGE_NUMBER]) * 0.20), 0)

- Ensure that this measure updates correctly based on the page slicers.
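
One way to check how the measure behaves per group is to evaluate it for each `LCCB_SYS_ID` in a DAX query. This is a sketch that assumes the [SampleSize] measure above exists in the model; the column labels are illustrative only.

```dax
-- Lists each LCCB with its closed-change count and the resulting sample size.
EVALUATE
SUMMARIZECOLUMNS (
    'SOPDerived'[LCCB_SYS_ID],
    "Closed changes", COUNT ( 'SOPDerived'[CHANGE_NUMBER] ),
    "Sample size", [SampleSize]
)
```

Because the rounding is applied per group here, the group-level sample sizes can add up to more than the rounded total for the whole table.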

3. Sampling Logic:
- To select the top N items based on the random number column, you can use the `TOPN` function in combination with a filter or calculated table. Note that `TOPN` returns a table, so it cannot be the final result of a measure, which must return a single scalar value. The measure from step 2 can still supply the N argument. Use a calculated table (or a calculated column that flags the sampled rows) to hold the result.
- Here's an example of how you might create a calculated table that takes the top N items based on your sample size measure:

SampledChanges = 
     VAR SampleSize = [SampleSize] -- This is your measure from step 2
     RETURN
     TOPN(
         SampleSize,
         ALL('SOPDerived'),
         'SOPDerived'[RandomColumn], -- This is the column with random numbers
         ASC
     )

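- Keep in mind that a calculated table is evaluated when the model is refreshed, not when a report is queried, so it does not respond to page slicers, and the measure it references is evaluated without any slicer selections. If the sample has to follow the slicers, the `TOPN` logic needs to live in a measure instead.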
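
The `TOPN` example above samples across the whole table at a single rate. Since the question groups changes by `LCCB_SYS_ID`, a per-group variant might look like the sketch below. It is untested against the real model and rests on assumptions: the table name SampledChangesByLCCB is made up, the 0.20 rate is a placeholder for whatever rate is defined per group, and 'SOPDerived'[RandomColumn] is assumed to have no ties within a group.

```dax
SampledChangesByLCCB =
    -- Keep, within each LCCB_SYS_ID, the rows with the smallest random values.
    FILTER (
        'SOPDerived',
        VAR CurrentGroup = 'SOPDerived'[LCCB_SYS_ID]
        VAR CurrentRandom = 'SOPDerived'[RandomColumn]
        -- All rows that share the current row's LCCB
        VAR GroupRows =
            FILTER (
                ALL ( 'SOPDerived' ),
                'SOPDerived'[LCCB_SYS_ID] = CurrentGroup
            )
        -- Placeholder rate (assumption); swap in the group's own rate, e.g. 1.00
        VAR SampleRate = 0.20
        VAR GroupSampleSize =
            ROUNDUP ( COUNTROWS ( GroupRows ) * SampleRate, 0 )
        -- Position of the current row when its group is ordered by random value
        VAR RankInGroup =
            COUNTROWS (
                FILTER ( GroupRows, 'SOPDerived'[RandomColumn] <= CurrentRandom )
            )
        RETURN
            RankInGroup <= GroupSampleSize
    )
```

This compares every row with the other rows in its group, so it may be slow on large tables. Like any calculated table, it is rebuilt on refresh rather than in response to slicers.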

4. Export to Excel:
- Once you have your sampled data, you can export it to Excel by using the "Export data" option available in Power BI visuals.

Please note that the calculated table approach may have performance implications depending on the size of your dataset. If you encounter performance issues, consider optimizing your model or sampling within the query editor before loading the data into the model.

Best regards,
Community Support Team_Binbin Yu
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.

Anonymous · 01-09-2024