Reply
killer85
Regular Visitor
Partially syndicated - Outbound

Generating Dynamic Random Sample in Power BI with Custom Constraints

Hello Power BI Community,

I hope you’re all doing well!
I’m currently working on a project where I need to generate a dynamic random sample from a dataset of approximately 4,000 to 5,000 rows, with the dataset size varying weekly. The user needs to be able to choose a sample size from predefined percentages (4%, 5%, 6%, 7%, 8%, or 9%) directly within the Power BI report.

Here’s a more detailed description of the scenario:

  • Dynamic Sample Selection: Based on the user’s selection, the percentage of rows to be displayed in the report should change accordingly. For example, if the user selects 5%, only 5% of the total rows should be shown in the report. It could be 5%, 6%, 7%, 8 and 9%.
  • Region Constraint: The dataset contains two regions, and I need to ensure that the first region is represented by at least 5% of the random sample.
  • Collaborator Distribution: After the sample is selected, I need to split the rows between two collaborators according to a 40% / 60% distribution. Considering the fact as well that one region is fully going to only one collaborator.
  • Interactive Report: The user should be able to see the selected sample in real-time on the report screen, and the sample size (percentage) should be reflected dynamically based on their choice.

What I’ve tried so far:
I’ve experimented with various methods to achieve this dynamic sampling and have found that the only way that currently works is by creating specific tables for each case (e.g., one for 5%, one for 6%, etc.). I then compile them and disclose this table in the report, depending on the user’s selection. While this method works, it feels a bit cumbersome, and I’d like to explore to see your solution.

My specific questions:

  • How can I create a random sample that dynamically changes the number of rows displayed in the report based on the user’s selected percentage (4%, 5%, 6%, 7%, 8%, or 9%)?

I would really appreciate any insights, techniques, or recommendations you might have to improve this solution. Thank you in advance for your help!

Best regards,

1 ACCEPTED SOLUTION

Here is a sample setup, assuming even distribution. 

 

lbendlin_0-1735688375740.png

Table:

 

 

Samples = ADDCOLUMNS(GENERATESERIES(1,5000),"Rand",RAND())

 

 

Filter Measure:

 

 

Include = if(sum(Samples[Rand])<=[Sample size Value],1,0)

 

 

If you need it more precise, use percentilex.:

Include = if(sum(Samples[Rand])<=PERCENTILEX.INC(all(Samples[Rand]),[Rand],[Sample size Value]),1,0)

Row count measure:

 

 

Included Rows = COUNTROWS(FILTER(Samples,[Include]=1))

 

 

 

View solution in original post

6 REPLIES 6
killer85
Regular Visitor

Syndicated - Outbound

Hi Ibendlim,

Firstly, thanks for your explanations but for me it seems to cover only the part regarding the random selection which is not the tricky part.
Regarding the second part, for which I tried to let the possibility to my user to choose its percentage of lines selected based on a filter or slicer. I still don't know how to manage it with your propsal of using simple filter or PERCENTILEX.
Would you please help me by clarifying it?

Thank you,

How can I create a random sample that dynamically changes the number of rows displayed in the report based on the user’s selected percentage (4%, 5%, 6%, 7%, 8%, or 9%)?

So for the case when the user selects 5% you would then only accept the rows that fall in the .05 percentile. and so on.

So I create a slicer with the parameter, and I retrieve the value chosen by the user by using SELECTEDVALUE?
The problem is that at this stage, when I do this, the number of rows displayed in my table does not change. It's as if the data is static. 
Just as a reminder of my need, I have to disclose all the lines that are part of the 5% of my selection. My request is not to disclose only the value in which the 5% value is.
May I ask you to be more precise on how to use this percentileX because I think I miss something there.

Here is a sample setup, assuming even distribution. 

 

lbendlin_0-1735688375740.png

Table:

 

 

Samples = ADDCOLUMNS(GENERATESERIES(1,5000),"Rand",RAND())

 

 

Filter Measure:

 

 

Include = if(sum(Samples[Rand])<=[Sample size Value],1,0)

 

 

If you need it more precise, use percentilex.:

Include = if(sum(Samples[Rand])<=PERCENTILEX.INC(all(Samples[Rand]),[Rand],[Sample size Value]),1,0)

Row count measure:

 

 

Included Rows = COUNTROWS(FILTER(Samples,[Include]=1))

 

 

 

Hi Ibendlin,
First I should start by wishing all the best for this new year 2025.
Thank you for your very clear explanations. I still don’t understand why I didn’t get it before, as it seems quite easy to set up.
Thanks for your support.

lbendlin
Super User
Super User

Syndicated - Outbound

In Power Query:

 

- add a random value column

- sort by the new column

- add an index column 

 

(NOTE:  Sorting in Power Query is very expensive - I am only proposing it because of your low number of rows)

 

That's all you need.  Now in Power BI you can use a simple filter or PERCENTILEX on the index column.

avatar user

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

FebPBI_Carousel

Power BI Monthly Update - February 2025

Check out the February 2025 Power BI update to learn about new features.

Feb2025 NL Carousel

Fabric Community Update - February 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors (Last Month)
Top Kudoed Authors (Last Month)