Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Carlotta_Fabris
Frequent Visitor

Improve Performances on matrix and histogram

Hi everyone!

I have a dataset of responses to different incidents, categorized by a variety of variables like incident type, unit type etc.

In the dataset I have different types of responses, like at scene time, travel time, turnout time etc.

I would like to build a histogram and a matrix with different bins and the frequency of those bins for the different responses. For example if I want to measure Travel time, I will have all the possible bins and then the number of travel times that fall under those bins.

The huge problem that I am facing is that bin size should also be dynamic.

 

I was able to build a report with different parameters and this is the result:

Carlotta_Fabris_0-1709047518802.png

The problem is that when I change the bin size, it takes more than 10 seconds to update the histogram and the matrix.
Below I will share the model and the formula I am using.
The model is only one big table and I know that maybe I could try to transform it in a star schema model to improve performances, but I wanted to understand if you have other ideas on how I could improve this.

 

Carlotta_Fabris_1-1709047866614.png

 

 

 

 

Responses_Frequency =
SWITCH (
    SELECTEDVALUE ( Select_Measure[Measures Fields] ),
    "'Data'[Response Seconds]",
        VAR _BinSize_Start =
            SELECTEDVALUE ( Parameter[Parameter] )
        VAR _BinSize_End = ( _BinSize_Start + 'Bin Size'[Bin Size Value] ) - 1
        VAR _ResponsesCount =
            COUNT ( Data[Response Seconds] )
        VAR _ResponsesNr =
            CALCULATE (
                COUNT ( Data[Response Seconds] ),
                FILTER (
                    ALL ( Data[Response Seconds] ),
                    Data[Response Seconds] >= _BinSize_Start
                        && Data[Response Seconds] <= _BinSize_End
                )
            )
        VAR _ResponsesNrAll =
            IF ( ISINSCOPE ( Parameter[Bins] ), _ResponsesNr, _ResponsesCount )
        RETURN
            _ResponsesNrAll,
    "'Data'[Call Processing Seconds]",
        VAR _BinSize_Start =
            SELECTEDVALUE ( Parameter[Parameter] )
        VAR _BinSize_End = ( _BinSize_Start + 'Bin Size'[Bin Size Value] ) - 1
        VAR _ResponsesCount =
            COUNT ( Data[Call Processing Seconds] )
        VAR _ResponsesNr =
            CALCULATE (
                COUNT ( Data[Call Processing Seconds] ),
                FILTER (
                    ALL ( Data[Call Processing Seconds] ),
                    Data[Call Processing Seconds] >= _BinSize_Start
                        && Data[Call Processing Seconds] <= _BinSize_End
                )
            )
        VAR _ResponsesNrAll =
            IF ( ISINSCOPE ( Parameter[Bins] ), _ResponsesNr, _ResponsesCount )
        RETURN
            _ResponsesNrAll,
    "'Data'[Turnout Seconds]",
        VAR _BinSize_Start =
            SELECTEDVALUE ( Parameter[Parameter] )
        VAR _BinSize_End = ( _BinSize_Start + 'Bin Size'[Bin Size Value] ) - 1
        VAR _ResponsesCount =
            COUNT ( Data[Turnout Seconds] )
        VAR _ResponsesNr =
            CALCULATE (
                COUNT ( Data[Turnout Seconds] ),
                FILTER (
                    ALL ( Data[Turnout Seconds] ),
                    Data[Turnout Seconds] >= _BinSize_Start
                        && Data[Turnout Seconds] <= _BinSize_End
                )
            )
        VAR _ResponsesNrAll =
            IF ( ISINSCOPE ( Parameter[Bins] ), _ResponsesNr, _ResponsesCount )
        RETURN
            _ResponsesNrAll,
    "'Data'[Travel Seconds]",
        VAR _BinSize_Start =
            SELECTEDVALUE ( Parameter[Parameter] )
        VAR _BinSize_End = ( _BinSize_Start + 'Bin Size'[Bin Size Value] ) - 1
        VAR _ResponsesCount =
            COUNT ( Data[Travel Seconds] )
        VAR _ResponsesNr =
            CALCULATE (
                COUNT ( Data[Travel Seconds] ),
                FILTER (
                    ALL ( Data[Travel Seconds] ),
                    Data[Travel Seconds] >= _BinSize_Start
                        && Data[Travel Seconds] <= _BinSize_End
                )
            )
        VAR _ResponsesNrAll =
            IF ( ISINSCOPE ( Parameter[Bins] ), _ResponsesNr, _ResponsesCount )
        RETURN
            _ResponsesNrAll,
    "'Data'[Total Reflex Seconds]",
        VAR _BinSize_Start =
            SELECTEDVALUE ( Parameter[Parameter] )
        VAR _BinSize_End = ( _BinSize_Start + 'Bin Size'[Bin Size Value] ) - 1
        VAR _ResponsesCount =
            COUNT ( Data[Total Reflex Seconds] )
        VAR _ResponsesNr =
            CALCULATE (
                COUNT ( Data[Total Reflex Seconds] ),
                FILTER (
                    ALL ( Data[Total Reflex Seconds] ),
                    Data[Total Reflex Seconds] >= _BinSize_Start
                        && Data[Total Reflex Seconds] <= _BinSize_End
                )
            )
        VAR _ResponsesNrAll =
            IF ( ISINSCOPE ( Parameter[Bins] ), _ResponsesNr, _ResponsesCount )
        RETURN
            _ResponsesNrAll,
    "'Data'[At scene Seconds]",
        VAR _BinSize_Start =
            SELECTEDVALUE ( Parameter[Parameter] )
        VAR _BinSize_End = ( _BinSize_Start + 'Bin Size'[Bin Size Value] ) - 1
        VAR _ResponsesCount =
            COUNT ( Data[At scene Seconds] )
        VAR _ResponsesNr =
            CALCULATE (
                COUNT ( Data[At scene Seconds] ),
                FILTER (
                    ALL ( Data[At scene Seconds] ),
                    Data[At scene Seconds] >= _BinSize_Start
                        && Data[At scene Seconds] <= _BinSize_End
                )
            )
        VAR _ResponsesNrAll =
            IF ( ISINSCOPE ( Parameter[Bins] ), _ResponsesNr, _ResponsesCount )
        RETURN
            _ResponsesNrAll,
    "'Data'[Transport Travel Seconds]",
        VAR _BinSize_Start =
            SELECTEDVALUE ( Parameter[Parameter] )
        VAR _BinSize_End = ( _BinSize_Start + 'Bin Size'[Bin Size Value] ) - 1
        VAR _ResponsesCount =
            COUNT ( Data[Transport Travel Seconds] )
        VAR _ResponsesNr =
            CALCULATE (
                COUNT ( Data[Transport Travel Seconds] ),
                FILTER (
                    ALL ( Data[Transport Travel Seconds] ),
                    Data[Transport Travel Seconds] >= _BinSize_Start
                        && Data[Transport Travel Seconds] <= _BinSize_End
                )
            )
        VAR _ResponsesNrAll =
            IF ( ISINSCOPE ( Parameter[Bins] ), _ResponsesNr, _ResponsesCount )
        RETURN
            _ResponsesNrAll
)

 

 

 

 

 

A possible solution I was thinking is to ceil the values of the different responses and then use those columns in the histogram and matrix. 
Doing this values greater than 00:00:00 will be ceiled for example to 00:00:05 that is my smaller bin, but I would like it to be contained in side the bin that goes from o to 5 seconds, not in the one that goes from 5 to 10.

 

I also tried to create a table with all the bins and create a relationship with the responses. In this case I have two problems, the total is not correct and the second is that the performances did not improve much.

 

Do you have ideas on how to improve the performances?

If you need more details, I will be happy to give it to you.

 

Thank you,
Carlotta.

2 REPLIES 2
v-yilong-msft
Community Support
Community Support

Hi @Carlotta_Fabris ,

Creating histograms and matrices with dynamic bin sizes can indeed be challenging, especially when dealing with large datasets. Here are my answers.

 

Ensure that your dataset is properly indexed to speed up query performance. Consider pre-aggregating data where possible. For example, if you have a large number of records, aggregate them into broader time intervals (e.g., minutes instead of seconds) before creating histograms.

 

Your idea of ceiling values is a good one. However, ensure that the ceiling values align with the desired bin boundaries. For instance, if your smallest bin is 00:00:05, make sure that values greater than 00:00:00 are rounded up to 00:00:05, not 00:00:10.

 

Transforming your model into a star schema might improve performance by reducing the complexity of queries. Consider creating separate fact tables for different response types (e.g., scene time, travel time) and dimension tables for incident type, unit type, etc. Use relationships between fact and dimension tables to create efficient queries.

 

There may be other aspects that could be optimized, but these are the directions I can think of.

 

 

 

Best Regards

Yilong Zhou

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

 

Thank you for your suggestions.

After a lot of trial and error, I discovered that the problem was caused by the measure that calculates the cumulative frequency.
Here it is:

VAR _BinSize_Start = 0
VAR _BinSize_End = ((_BinSize_Start + SELECTEDVALUE(Parameter[Parameter])) + 'Bin Size'[Bin Size Value]) -1
VAR _ResponsesCount = COUNT(Data[Response Seconds])
VAR _ResponsesNr = CALCULATE(COUNT(Data[Response Seconds]), FILTER(ALL(Data[Response Seconds]),Data[Response Seconds] >= _BinSize_Start && Data[Response Seconds] <= _BinSize_End))
VAR _ResponsesNrAll = 
  IF(ISINSCOPE(Parameter[Bins]),
     _ResponsesNr,
     _ResponsesCount
  )
RETURN _ResponsesNrAll

It basically takes as start the 0 bin and thencounts all the reponses that are within 0 and the end of the bin.
The end of the bin is always updated taking the current bin + the size of the bin -1. This way we always go from 0 to the end of the next bin.

Do you have any suggestion on how to make this faster?

Thanks,
Carlotta.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.