Solved: Re: Decision Trees in Power BI

freswood · ‎11-08-2017

Hi all, has anybody tried to use the Decision Tree custom visual? Whenever I try to use it (even with < 150k rows as input), the result is completely different from the output I get in RStudio. Even when using exactly the same input and the same settings, the percentages in the top node are completely different. E.g. in one scenario I had a true/false predictive variable, and in RStudio it was 98/02 while in Power BI it was 52/48.

I'm curious to know whether anyone else has had this experience. Thanks for your help!

thekkm13 · ‎08-23-2018

Hey @freswood @Regulate,

I came across this same issue today.

In case you're still looking for a solution, here's what I have figured out so far:

Every time you associate a target variable and select a few input variables, Power BI internally creates a data-table from your original set that contains only the fields you have selected.

It then goes and deletes all duplicate rows.

Assume your original input data is a 5x5 table like:

	Distinct Var A	Input B	Input C	Input D	Target
1	a	1	2	1	TRUE
2	b	1	2	2	TRUE
3	c	2	1	1	FALSE
4	d	2	3	1	TRUE
5	e	0	1	1	FALSE

At this point, you have 60% True, 40% False.

Let's say you set your target and pick only Input D as the input variable.

Your internal data-table being used for the decision tree looks like:

Input D	Target
1	TRUE
2	TRUE
1	FALSE

Now, you have 66.67% True, 33.33% False.

Let's say you now use Input C as an additional input variable.
Your new internal data-table for the decision tree looks like:

Input C	Input D	Target
2	1	TRUE
2	2	TRUE
1	1	FALSE
3	1	TRUE

Following along, you're now at 75% True, 25% False.

This explains why the values in your root node keep changing every time you modify the input variables.
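The arithmetic above can be reproduced with a minimal Python sketch. It only mimics the behavior described in this post (project onto the selected columns, then drop duplicate rows); it is not the visual's actual code, and the names `COLUMNS`, `ROWS` and `root_split` are made up for illustration.

```python
from collections import Counter

# Sample rows from the post: (Distinct Var A, Input B, Input C, Input D, Target)
COLUMNS = ("A", "B", "C", "D", "Target")
ROWS = [
    ("a", 1, 2, 1, True),
    ("b", 1, 2, 2, True),
    ("c", 2, 1, 1, False),
    ("d", 2, 3, 1, True),
    ("e", 0, 1, 1, False),
]

def root_split(selected, dedupe=True):
    """Return (% TRUE, % FALSE) at the root after keeping only the selected inputs.

    Assumption: duplicates are dropped after projection, as observed in the post.
    """
    idx = [COLUMNS.index(c) for c in selected] + [COLUMNS.index("Target")]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    if dedupe:
        # dict.fromkeys drops duplicate rows while keeping first-seen order
        projected = list(dict.fromkeys(projected))
    counts = Counter(row[-1] for row in projected)
    total = len(projected)
    return (round(100 * counts[True] / total, 2),
            round(100 * counts[False] / total, 2))

print(root_split(["D"], dedupe=False))  # full data: 60/40
print(root_split(["D"]))                # only Input D, duplicates removed
print(root_split(["C", "D"]))           # Input C and Input D, duplicates removed
```

With `dedupe=False` the split stays at 60/40 no matter which inputs are chosen, which is what you would expect from the raw data.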

I haven't yet been able to figure out a way to stop this odd behavior (it may be as designed, but I don't like it).

As a workaround, I'm considering including a variable that is unique for each record. Ideally, the tree should never split on that variable, and it will ensure that all records of your data are considered, since there won't be any duplicates.
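Here is a rough Python sketch of why the unique column should help, assuming the de-duplication works the way I described. It is not tested against the visual itself, and `distinct_projection` and `true_share` are just illustrative helper names.

```python
# Same five sample rows: (Distinct Var A, Input B, Input C, Input D, Target)
COLUMNS = ("A", "B", "C", "D", "Target")
ROWS = [
    ("a", 1, 2, 1, True),
    ("b", 1, 2, 2, True),
    ("c", 2, 1, 1, False),
    ("d", 2, 3, 1, True),
    ("e", 0, 1, 1, False),
]

def distinct_projection(selected):
    """Keep only the selected columns plus Target, then drop duplicate rows."""
    idx = [COLUMNS.index(c) for c in selected + ["Target"]]
    return list(dict.fromkeys(tuple(row[i] for i in idx) for row in ROWS))

def true_share(rows):
    """Percentage of rows whose Target (last field) is TRUE."""
    return 100 * sum(1 for r in rows if r[-1] is True) / len(rows)

without_key = distinct_projection(["D"])      # duplicates collapse
with_key = distinct_projection(["A", "D"])    # unique column A keeps every row

print(len(without_key), len(with_key))
print(true_share(with_key))
```

Because column A differs on every record, no two projected rows are identical, so all five rows survive and the root node goes back to the 60/40 split of the original data. Whether the tree then ignores the unique column in its splits is something I still need to check in the visual.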

Let me know if this makes sense/works. Cheers!

KK


Regulate · ‎06-07-2018

I'm having the same problem. In fact, the percentages in the top node change drastically when I try different input variables. But shouldn't the top node be the same regardless of the input variables?

I have a True/False target variable, and the split in the data is obviously always the same (about 50/50), yet the percentages in the top node change when I add or remove input variables. They are sometimes 2%/98%, sometimes 40%/60%, or anything in between. They never match the data itself, or if they do, it is by chance.

Any help? How can I trust that the decision tree involves all the data if the percentage in the top node changes all the time?

freswood · ‎08-26-2018

Thanks @thekkm13 for figuring this out! As you can see this post has been open for a long time, so it's great to finally have an answer. Please do let us know if you end up trying out the workaround 🙂 Thanks again.

v-yuezhe-msft · ‎11-10-2017

@freswood,

Could you please share sample data of your table?

Regards,
Lydia

Community Support Team _ Lydia Zhang
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.

Nabil · ‎12-20-2017

Hello,

Regarding the last question, we had the same issue with our data: the percentage is wrong compared to what we find in the source. In our example, the share of defects is around 24% in the raw file, but in Power BI, after loading the data, it is closer to 47%. Can you explain where the problem might be?

I tried to attach the file, but that is not possible on this platform.

Could you give me your professional e-mail address so that I can send the file?

Sincerely,

Nabil

freswood · ‎11-12-2017

Hi Lydia, unfortunately not because the data is commercially sensitive. However I'm hoping that perhaps other people have had similar experiences.
