Hello, I'm confused about the high-density sampling (HDS) algorithm. I've got a line chart with 166 different series. The documentation on high-density line sampling says that the maximum number of series that can be displayed is 60. It then describes how 60 representative series are selected if - as in my case - the actual total is higher.
https://docs.microsoft.com/en-us/power-bi/desktop-high-density-sampling
Specifically:
The algorithm creates as many bins as possible to create the greatest granularity for the visual. Within each bin, the algorithm finds the minimum and maximum data value, to ensure that important and significant values (for example, outliers) are captured and displayed in the visual.
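As a rough illustration of the quoted passage only (not Power BI's actual implementation, which is not public in this thread), here is a minimal Python sketch of min/max sampling per bin. The function name `minmax_bin_sample`, the equal-width binning along x, and the sample data are all assumptions made for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y)


def minmax_bin_sample(points: List[Point], n_bins: int) -> List[Point]:
    """Hypothetical helper: split the x-range into n_bins equal-width bins
    and keep only the lowest and highest y-value point from each bin."""
    if not points or n_bins < 1:
        return []
    xs = [x for x, _ in points]
    lo, hi = min(xs), max(xs)
    # Fall back to a width of 1.0 if all x-values are identical.
    width = (hi - lo) / n_bins or 1.0
    bins: List[List[Point]] = [[] for _ in range(n_bins)]
    for p in points:
        # Clamp so the right-most point lands in the last bin.
        idx = min(int((p[0] - lo) / width), n_bins - 1)
        bins[idx].append(p)
    sampled: List[Point] = []
    for b in bins:
        if not b:
            continue
        low = min(b, key=lambda p: p[1])
        high = max(b, key=lambda p: p[1])
        # A set avoids emitting the same point twice when min == max.
        sampled.extend(sorted({low, high}))
    return sampled


# Made-up data: a flat line with a single spike at x = 37.
points = [(float(i), 0.0) for i in range(100)]
points[37] = (37.0, 50.0)
result = minmax_bin_sample(points, n_bins=10)
```

Under these assumptions the spike at x = 37 survives the sampling, because it is the maximum of its bin. Note that this kind of sampling works on points within a series; it says nothing about which of many series are kept.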
Below are two screenshots, first one with HDS On, second with HDS Off:
Based on these screenshots, it appears as if HDS is being applied as indicated in documentation. However ... it turns out that - at least in my use case - outliers at the top end are not represented at all but left out altogether when HDS is On (I used targeted filtering to eliminate some series and leave the outliers in).
I've tried getting my head round the information in the 'Considerations and limitations' section to understand if this is intended behaviour of HDS but am getting confused because of the points below, which appear to suggest the outliers are excluded because alphabetically they appear after the 60th series, but to me this would defeat the point of HDS altogether.
In any case it seems to me this is undesirable behaviour from HDS but can anyone explain why the outliers are not included by HDS?
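The alphabetical-cutoff reading of the limitations can be sketched as follows. This is only my hypothesis expressed as code, not confirmed Power BI behaviour; the function names, the series names and the values are made up for illustration, and only the limit of 60 comes from the documentation.

```python
from typing import Dict, List

Series = Dict[str, List[float]]


def cap_series_alphabetically(series: Series, max_series: int = 60) -> Series:
    """Hypothetical: keep only the first max_series series by name order."""
    kept = sorted(series)[:max_series]
    return {name: series[name] for name in kept}


def steepest(series: Series) -> str:
    """Name of the series with the largest rise from first to last value."""
    return max(series, key=lambda name: series[name][-1] - series[name][0])


# 166 made-up series; "Location 150" rises far faster than the rest.
all_series: Series = {
    f"Location {i:03d}": [0.0, float(i % 7)] for i in range(1, 167)
}
all_series["Location 150"] = [0.0, 100.0]

kept_series = cap_series_alphabetically(all_series)
outlier = steepest(all_series)
outlier_kept = outlier in kept_series
```

If the cap really worked like this, the steepest series would be dropped purely because of where its name sorts, which is what I seem to be seeing.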
Many thanks, Bastiaan
Hi @BastiaanBrak,
"In any case it seems to me this is undesirable behaviour from HDS but can anyone explain why the outliers are not included by HDS?"
What is your desired output? Which outliers were you referring to?
Regards,
Yuliana Gu
Hi Yuliana @v-yulgu-msft
Ideally, my desired output is for all 166 series to be visible. If that option is not available, I'd be content with what High Density Sampling is purported to do, i.e. "ensure that important and significant values (for example, outliers) are captured and displayed in the visual" but in my use case HDS does not work as described.
As you can see in the screenshot on the right (High Density Sampling = OFF), there are five series, three in the pink area and two in the blue area, that are not present when High Density Sampling = ON (screenshot on the left). I would argue that at least three of them represent "important and significant values", since they rise faster than the series that HDS does include.
Thanks and hope this helps, Bastiaan
Anyone??
Update: you can see the high density sampling error as described above in action in this web report:
https://ahdb.org.uk/bgmec
Specifically: the location in the south-west of England, which has been omitted from the graph by the high density sampling algorithm, represents the time-series with the steepest slope (click the location on the map or select 'South West England' from the Region drop down to verify) so should NOT have been omitted.
Can you confirm this has been raised as a glitch now @v-yulgu-msft ?
Hope it's OK to tag in some of the contributors listed on the HDS documentation article? @DavidIseminger
@lcasey