EricM
Helper I

SQL? DAX? Machine Learning? or what?

I have a little problem for which I am wondering what would be the right tool...

 

I have several versions of a dataset.

Each version contains between 500,000 rows and a million.

 

Each dataset is organized along three hierarchies:

  • Time (Year/Quarter/Month)
  • Geography (Region/Country)
  • Attributes (Level 1, Level 2, Level 3, Level 4, etc)

I would like to compare 2 versions of the dataset and spot whether there are differences of more than X percent at any level.

 

I could do that in SQL but it would be quite slow...

I'd have to loop through all potential combinations of the three hierarchies, etc... 

Feasible but tedious and slow.

 

I could also do this in DAX, but I might also have to loop through all possible hierarchy levels.

Feels as tedious even if probably quicker.

 

What about machine learning?

I don't have experience of it but it "feels" like it should be able to do that sort of thing.

 

Or should I use an entirely different approach?

 

 

Thanks

 

Eric

1 ACCEPTED SOLUTION
v-sihou-msft
Microsoft Employee

@EricM

 

In this scenario, since your two datasets have the same metadata and the same number of rows, you can build a relationship between the two tables and add a calculated column that flags whether the fact data differs.

 

IfDifferent = IF(Table1[Column]=RELATED(Table2[Column]),0,1)

 

Then sum that column and divide by the total row count. To calculate it at different levels, use ALLEXCEPT() as the filter so the calculation is grouped at the level you want.

 

Diff Pct On Year = 
CALCULATE(SUM(Table1[IfDifferent]),ALLEXCEPT(Table1,Table1[Year]))
/
CALCULATE(COUNTROWS(Table1),ALLEXCEPT(Table1,Table1[Year]))

Diff Pct On Month = 
CALCULATE(SUM(Table1[IfDifferent]),ALLEXCEPT(Table1,Table1[Month]))
/
CALCULATE(COUNTROWS(Table1),ALLEXCEPT(Table1,Table1[Month]))
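
To make the logic of these measures easier to check outside Power BI, here is a minimal Python sketch of the same idea: flag each row whose value differs between the two versions, then divide the flagged count by the row count per Year or per Month. The sample rows, the row key (standing in for the relationship between the two tables) and the function name are all assumptions made up for illustration, not part of the original data model.

```python
from collections import defaultdict

# Hypothetical sample rows: (key, year, month, value). The key identifies the
# same fact row in both versions, standing in for the table relationship.
version1 = [
    ("r1", 2015, "Jan", 100.0),
    ("r2", 2015, "Feb", 200.0),
    ("r3", 2016, "Jan", 300.0),
    ("r4", 2016, "Feb", 400.0),
]
# Second version as a lookup from key to value (only r2 has changed).
version2 = {"r1": 100.0, "r2": 250.0, "r3": 300.0, "r4": 400.0}


def diff_fraction_by(rows, other, level_index):
    """Return {level value: fraction of rows whose value differs}."""
    changed = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        key, value = row[0], row[3]
        level = row[level_index]
        total[level] += 1
        # Same test as the IfDifferent column: count 1 when the values differ.
        if other.get(key) != value:
            changed[level] += 1
    return {level: changed[level] / total[level] for level in total}


diff_by_year = diff_fraction_by(version1, version2, 1)   # group on Year
diff_by_month = diff_fraction_by(version1, version2, 2)  # group on Month
print(diff_by_year)
print(diff_by_month)
```

Note that, like ALLEXCEPT(Table1,Table1[Month]), grouping on Month alone pools the same month across all years.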

Regards,

 


2 REPLIES 2

Greg_Deckler
Community Champion

This could fall under machine learning as "Anomaly Detection", but I would think that you could create a measure that calculates your % and then throw that into a matrix with your hierarchy. You could potentially use the new Top N filter to view the highest % differences. Could you supply some mock data and the expected result?
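
As a rough illustration of the "% measure in a matrix, then Top N" idea, the Python sketch below totals both versions at every combination of hierarchy levels in a single pass over the rows, computes the relative change per cell, and keeps the cells that move by more than a threshold. The column layout, the sample rows, the shortened hierarchies and the function names are assumptions for illustration only; treating a cell with no baseline as an infinite change is also just one possible choice.

```python
from collections import defaultdict
from itertools import product

# Hypothetical column layout: (year, quarter, region, country, level1, value).
# Each hierarchy lists the column indexes used at each depth; () means "all".
TIME = [(), (0,), (0, 1)]   # all time, Year, Year/Quarter
GEO = [(), (2,), (2, 3)]    # all geography, Region, Region/Country
ATTR = [(), (4,)]           # all attributes, Level 1


def rollup(rows):
    """Sum the value at every combination of hierarchy levels in one pass."""
    groupings = list(product(TIME, GEO, ATTR))
    totals = defaultdict(float)
    for row in rows:
        for t, g, a in groupings:
            cols = t + g + a
            totals[(cols, tuple(row[c] for c in cols))] += row[-1]
    return totals


def flag_differences(old_rows, new_rows, threshold, top_n=None):
    """Return (pct change, cell) pairs above the threshold, largest first."""
    old, new = rollup(old_rows), rollup(new_rows)
    flagged = []
    for key in old.keys() | new.keys():
        base, current = old.get(key, 0.0), new.get(key, 0.0)
        if base == 0.0:
            # No baseline to compare against: treat any value as infinite change.
            pct = float("inf") if current != 0.0 else 0.0
        else:
            pct = abs(current - base) / abs(base)
        if pct > threshold:
            flagged.append((pct, key))
    flagged.sort(key=lambda item: (-item[0], item[1]))
    return flagged[:top_n]


old_version = [
    (2015, "Q1", "EU", "FR", "A", 100.0),
    (2015, "Q1", "EU", "DE", "A", 100.0),
    (2015, "Q2", "NA", "US", "B", 200.0),
]
new_version = [
    (2015, "Q1", "EU", "FR", "A", 100.0),
    (2015, "Q1", "EU", "DE", "A", 130.0),
    (2015, "Q2", "NA", "US", "B", 200.0),
]

flagged = flag_differences(old_version, new_version, threshold=0.10)
for pct, (cols, values) in flagged[:3]:
    print(f"{pct:.0%} change at columns {cols} = {values}")
```

With these shortened hierarchies there are only 3 x 3 x 2 = 18 groupings, so every row is visited once and added to 18 running totals; no separate query per combination is needed.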



