Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Find articles, guides, information and community news

Most Recent
v-yiruan-msft
Community Support
Community Support

Scenario

Sometimes, we need to pick some data as a sample to measure the overall situation randomly. For example, a factory produces multiple products, and each product needs to pick out a certain amount of samples to measure the overall pass rate. Moreover, in the selection process, it is also necessary to exclude the factor of human choice. So, how to use Power BI to realize this requirement?

Next, let us take a concrete look at different scenarios and needs.

Note:

In this blog, we only discuss how to sample, not the subsequent process of judging whether it is qualified or not.

Here is a sample table: Table1. And you can find others in the attached .xlsx file. 

Product

Product ID

Production Date

Metrics

Product A

1001

2022/3/12

1.37

Product A

1002

2022/3/12

1.69

Product A

1003

2022/3/13

2.23

Product A

1004

2022/3/14

1.98

Product B

2001

2022/3/12

2.01

Product B

2002

2022/3/13

3.04

Product B

2003

2022/3/13

2.65

Product B

2004

2022/3/14

1.75

Product B

2005

2022/3/14

2.23

Product C

3001

2022/3/13

4.11

Product C

3002

2022/3/13

3.51

Product C

3003

2022/3/13

3.66

Product C

3004

2022/3/14

2.35

Product C

3005

2022/3/15

4.01

Product C

3006

2022/3/16

2.98

 

Part1: Randomly select a few rows of records from the entire table

In Power BI, there is a DAX function named SAMPLE. It will return a sample of N rows from the specified table. And if OrderBy arguments are pro...

Here are examples:

EX1: Return a calculated table in which 3 samples are randomly selected from Table1.

Table1_Sample_1 =

SAMPLE (

    3,                  /* The number of rows to return. */

     Table1,         /* Any DAX expression that returns a table of data from where to extract the 'n' sample rows. */

     1                  /* Since the minimum argument count for SAMPLE function is 3, put any scalar DAX expression. You can also specify a column. */

     )

EX2: Return a calculated table in which 3 samples are randomly selected from Table1 and sorted by one specific column.

Table1_Sample_2 =

SAMPLE (

    3,

     Table1,

     [Metrics]         /* Sort by [Value] column. */

     )

In addition, from my limited testing, I find that SAMPLE function will not update the result returned after refresh if data has not been updated. Here is the test result:

Then how do we get different results after each refresh?

Let’s consider using RAND function. It works the same as in Excel, returning a random number greater than or equal to 0 a...

There are two methods.

a. Combined with SUMMARIZE, TOPN and ADDCOLUMNS functions to create a calculated table.

Table1_RAND =

SUMMARIZE (

    TOPN ( 3, ADDCOLUMNS ( Table1, "Rand", RAND () ), [Rand] ),

    [Product],

    [Product ID],

    [Production Date],

    [Metrics]

)

b. Create a calculated column in the original table and just return the filtered values in Table visual.

Note: To avoid affecting the test results above, we copy the Table1 below, generate Table2, and then create a new computed column in Table2.

Rand = RAND()

We can see that after each refresh, the returned rows are different.

 

Now, we know how to get sample rows from one table. Then, how do we get the same number of rows for each category? Let’s go to the next part.

 

Part2: Randomly select records with the same number of rows by category from the entire table

a. Based on my test, class-by-class sampling cannot be achieved by SAMPLE function in this scenario. For example:

Table1_Sample_3 =

VAR ProductNum_ =

    DISTINCTCOUNT ( Table1[Product] )

RETURN

    SAMPLE ( ProductNum_ * 2, Table1, [Product] )

b. Then we use RANK function to create measures based on the Rand column in Part1 and then use “Filters on this visual” feature.Rank_Table2 =

RANKX (

    ALLEXCEPT ( Table2, Table2[Category] ),

    CALCULATE ( SUM ( Table2[Rand] ) ),

    ,

    DESC,

    DENSE

)

So far, we have implemented sampling by category. Then, if some categories have fewer records and some categories have more records. Then how do we sample it? Let’s go to Part3.

 

Part3: Randomly select records with different rows by category from the entire table

 It is, in fact, very simple that we only need to combine the SWITCH function and add a judgment condition and then use “Filters on this visual” feature. For exam...

RankFilter_Table2 =
VAR Count_ =
    CALCULATE ( COUNTROWS ( Table2 ), ALLEXCEPT ( Table2, Table2[Product] )
VAR Threshold_ =
    SWITCH ( TRUE (), Count_ >= 8, 4, Count_ >= 6, 3, Count_ >= 4, 2, )
RETURN
    IF ( [Rank_Table2] <= Threshold_, )

 

If you want to get more details, please check the attached .pbix file.

Hope this blog can help you.

 

 

Author: Icey Zhang

Reviewer: Kerry Wang & Ula Huang

yingyinr_8-1647408887666.gif

yingyinr_7-1647408793068.gif

yingyinr_4-1647408701580.png

yingyinr_3-1647408626593.gif

yingyinr_2-1647408363640.gif

yingyinr_3-1647408085579.gif

yingyinr_1-1647407908935.png

yingyinr_0-1647407908930.png

Read more...

Helpful resources

Join Blog
Interested in blogging for the community? Let us know.