Interpreting Key Influencer Visual
Hi all,
Hoping someone can help me interpret the Key Influencer visual below, created in Power BI Desktop (Nov. 2019 release):
- There are many other influential subcategories with a higher volume of projects AND a larger proportion of successful projects (like "Festivals" for instance, which is listed at only 1.89x)
- Based on Microsoft documentation, the 2.77x factor should be calculated as the ratio of % Successful for Literary Journals (48.33%) compared to the average (36.06%), which is clearly not the case here.
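To make the mismatch concrete, here is the arithmetic from the bullet above as a quick check. It uses only the figures quoted in this post; the variable names are just for illustration.

```python
# Figures quoted in the question above.
pct_success_literary_journals = 48.33  # % Successful for Literary Journals
pct_success_average = 36.06            # average % Successful

# Simple ratio of the two percentages.
expected_factor = pct_success_literary_journals / pct_success_average
reported_factor = 2.77                 # factor shown by the visual

print(f"ratio of percentages: {expected_factor:.2f}x")
print(f"factor in the visual: {reported_factor:.2f}x")
```

The simple ratio comes out at roughly 1.34x, well short of the 2.77x shown.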
I understand that this is driven by an underlying regression model and that statistical significance likely plays some role here, but I just can't wrap my head around these results...
Thoughts?
Well, this is how I look at it (maybe I am wrong, and if so please correct me). I don't know the math behind this, but this is what I learned about statements like 'is x times more likely to have y in case of z'. Statistically, the model controls for the other variables, so that the comparison is truly based on the specific variable you are looking at (in this case, subcategory).
Think of it as this example, a table of festivals:
FestivalName | Visitors | Weather | Revenue |
---|---|---|---|
A | 1000 | Sun | 100.000 |
B | 1100 | Sun | 120.000 |
C | 20000 | Rain | 1.000.000 |
D | 500 | Sun | 60.000 |
E | 18000 | Rain | 950.000 |
If you don't look at the whole dataset (all columns), you might conclude 'You get many more visitors when it rains'. Obviously this is wrong: festivals C and E might just be very large festivals that draw more visitors even when it rains. So you take the Revenue into account and correct for that. A better conclusion is then "You have more revenue per visitor when it is sunny", which suddenly makes sense. But you can't see that by looking at absolute numbers or subsets of the data.
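As a rough sketch of that comparison, the snippet below groups the festival table by weather and computes both the average visitors and the revenue per visitor. It assumes the dots in the Revenue column are thousands separators (so 100.000 means 100,000); the variable names are only for illustration.

```python
from collections import defaultdict

# Festival table from above: (name, visitors, weather, revenue).
# Assumption: "100.000" is read as 100,000 (dot as thousands separator).
festivals = [
    ("A", 1000, "Sun", 100_000),
    ("B", 1100, "Sun", 120_000),
    ("C", 20000, "Rain", 1_000_000),
    ("D", 500, "Sun", 60_000),
    ("E", 18000, "Rain", 950_000),
]

# Accumulate totals per weather type.
totals = defaultdict(lambda: {"festivals": 0, "visitors": 0, "revenue": 0})
for _name, visitors, weather, revenue in festivals:
    totals[weather]["festivals"] += 1
    totals[weather]["visitors"] += visitors
    totals[weather]["revenue"] += revenue

# Derive the two metrics discussed in the text.
summary = {
    weather: {
        "avg_visitors": t["visitors"] / t["festivals"],
        "revenue_per_visitor": t["revenue"] / t["visitors"],
    }
    for weather, t in totals.items()
}

for weather, metrics in summary.items():
    print(
        f"{weather}: avg visitors = {metrics['avg_visitors']:.0f}, "
        f"revenue per visitor = {metrics['revenue_per_visitor']:.2f}"
    )
```

Rain wins on average visitors, but Sun wins on revenue per visitor (about 107.69 versus 51.32), so the two views of the same table point in opposite directions.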
So, when you say 'with a higher volume of projects AND a larger proportion of successful projects', that doesn't yet mean that this gives a higher chance of success, if other factors (combined across all the other subcategories) are playing a role.
I hope this makes sense, it's late where I am 😉
Kind regards
Djerro123
The Key Influencer visual does balanced down-sampling to 10K rows, i.e., if you have 130K rows with a TRUE target and 140K rows with a FALSE target, it will take only 5K TRUE and 5K FALSE. This is because of performance concerns for an interactive visual and platform limitations (it cannot afford to train ML models on too large an input dataset).
They adjust the lift with positiveRatio = subsetTrue/totalTrue and a corresponding negativeRatio. Even so, the lift is still likely to show a discrepancy, because the sample is only a subset of the total population.
This applies in particular to sparse features: e.g., your subcategory column may have a lot of distinct categories, and Literary Journals is one of them with only a small portion of the input, so the sampling may not work that well.
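To illustrate why a sparse category can come out noisy, here is a small simulation. It is only a sketch and not the visual's actual implementation. It assumes that subsetTrue means the number of TRUE rows kept in the sample, so positiveRatio is the share of TRUE rows that survive sampling (and likewise for negativeRatio). The category counts (60 TRUE, 40 FALSE) are made up.

```python
import random

TOTAL_TRUE, TOTAL_FALSE = 130_000, 140_000  # class sizes from the example above
SAMPLE_PER_CLASS = 5_000                    # balanced sample: 5K TRUE + 5K FALSE
CAT_TRUE, CAT_FALSE = 60, 40                # hypothetical sparse category (60% TRUE)

# Assumed reading of the ratios: share of each class kept by the sample.
POSITIVE_RATIO = SAMPLE_PER_CLASS / TOTAL_TRUE
NEGATIVE_RATIO = SAMPLE_PER_CLASS / TOTAL_FALSE


def reweighted_rate(sampled_true, sampled_false):
    """Scale sampled counts back to population size, then take the TRUE share."""
    est_true = sampled_true / POSITIVE_RATIO
    est_false = sampled_false / NEGATIVE_RATIO
    total = est_true + est_false
    return est_true / total if total else None


def sampled_category_counts(rng):
    """Draw the balanced sample and count how many category rows survive."""
    # Treat the first CAT_TRUE (CAT_FALSE) row ids of each class as the category.
    kept_true = sum(i < CAT_TRUE for i in rng.sample(range(TOTAL_TRUE), SAMPLE_PER_CLASS))
    kept_false = sum(i < CAT_FALSE for i in rng.sample(range(TOTAL_FALSE), SAMPLE_PER_CLASS))
    return kept_true, kept_false


estimates = []
for seed in range(20):
    rate = reweighted_rate(*sampled_category_counts(random.Random(seed)))
    if rate is not None:  # the category can vanish from the sample entirely
        estimates.append(rate)

true_rate = CAT_TRUE / (CAT_TRUE + CAT_FALSE)
print(f"true rate in full data: {true_rate:.2f}")
if estimates:
    print(f"estimates across samples: {min(estimates):.2f} to {max(estimates):.2f}")
```

With these made-up counts, only about two TRUE rows and one or two FALSE rows of the category are expected to land in each sample, so the reweighted estimate can swing widely from one draw to the next even though the reweighting itself is unbiased.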