Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
twowaysyellow
Frequent Visitor

How to prepare and model an unusual fact table for cohort analysis?

As a complete beginner, I know this might be an ambitious use case to follow, but it is nonetheless the one we need to get to :-).

We would like to measure and compare retention rates based on referral codes. I have found some examples online on how to make cohort analyses. Still, I’m unsure how to model our data, as our business is not a traditional subscription or eCommerce business.

The best way to illustrate our situation is by comparing us to an auction site.

Our “fact” table data from our production system is structured by transactions where each line item looks like this.

orderID   sellerUserID   buyerUserID   amount   date
11011202022-12-24 

 

We have another table with user info like this.

userID   creationDate   referralCode
102022-12-01ref01 

 

From looking at examples online, I gather it would be better for us to have a fact table that is a little bit different. Something like:

transactionID   transactionType   orderID   userID   amount   date
1sell110202022-12-24
2buy111202022-12-24 

 

The next thing I would start working on is learning how to convert our existing fact table in Power BI to the one proposed.

Am I on the right track, or should I think differently about this?

1 ACCEPTED SOLUTION

IMO you should no to this direction, but the answer will be like this.

Direct answer

From this:

bolfri_0-1671911794499.png

 

To this:

bolfri_1-1671911807223.png

Power Query M:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTI0ABEglpEBmDAy0jU00jUyUYqNBQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"orderID   " = _t, #"sellerUserID   " = _t, #"buyerUserID   " = _t, #"amount   " = _t, date = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"orderID   ", Int64.Type}, {"sellerUserID   ", Int64.Type}, {"buyerUserID   ", Int64.Type}, {"amount   ", Int64.Type}, {"date", type date}}),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"orderID   ", "amount   ", "date"}, "transactionType", "UserID"),
    #"Detect Sell" = Table.ReplaceValue(#"Unpivoted Columns","sellerUserID","sell",Replacer.ReplaceText,{"transactionType"}),
    #"Detect Buy" = Table.ReplaceValue(#"Detect Sell","buyerUserID","buy",Replacer.ReplaceText,{"transactionType"})
in
    #"Detect Buy"

If you will go with this scenario you will end up with 2n rows at the end, which may be a problem in the future due to fact that you can expect a lot of transactions, right? 🙂

Recommended solution

Split information from your oryginal data info fact_transaction and fact_user_to_transaction tables.

 

Data preparation

fact_transaction

bolfri_2-1671912292105.png

let
    #"Removed Other Columns" = Table.SelectColumns(Sample,{"orderID   ", "date", "amount   "})
in
    #"Removed Other Columns"

fact_user_to_transaction

bolfri_3-1671912315556.png

let
    #"Removed Other Columns" = Table.SelectColumns(Sample,{"orderID   ", "sellerUserID   ", "buyerUserID   "}),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Removed Other Columns", {"orderID   "}, "transactionType", "userID"),
    #"Detect sell transactions" = Table.ReplaceValue(#"Unpivoted Columns","sellerUserID","sell",Replacer.ReplaceText,{"transactionType"}),
    #"Detect buy transactions" = Table.ReplaceValue(#"Detect sell transactions","buyerUserID","buy",Replacer.ReplaceText,{"transactionType"})
in
    #"Detect buy transactions"

dim_user

bolfri_4-1671912327350.png

Note that I've added a customer with ID 11.

 

Relationship

bolfri_5-1671912499541.png

 

It will be easier for you to track the logic and write down measures like number of transactions, numer for transactions for each user, numer of transactions where user was a buyer or maybe KPI: share buyer / user role. In this model it will be just easy and it saves a lot of unnessesary rows.

PBIX file: https://we.tl/t-t98gEAuuDF

 

 





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




View solution in original post

6 REPLIES 6
bolfri
Super User
Super User

transactionID   transactionType   orderID   userID   amount   date
1sell110202022-12-24
2buy1111202022-12-24

Can you explain why row 2 has transactionID with number 2 and why the orderID is 11? User ID is clear for me, but I dont understand this 2 fields?

What I mean here is that you are thying to get unpivoted information about transaction so it orderID suppose to be 1 not 11, right? But still - what is transactionID?

 





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




Sorry. You are right. This is a mistake. It should of course be orderID = 1.

 

I will correct it in the original post.

Thanks for pointing it out.

And for the transactionID? What it is comming from? 🙂 I can help you with transformation but this is the missing part.





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




Thanks for your interest. Actually the transactionID is not coming form anywhere as it is not something we have today. The example with the transactionID, is the end result I'm thinking we should end up with.

 

The two first examples is how our data looks today. The third example, you are are refering to, is what I "think" should be the end result :-).

 

Hope it makes sense.

IMO you should no to this direction, but the answer will be like this.

Direct answer

From this:

bolfri_0-1671911794499.png

 

To this:

bolfri_1-1671911807223.png

Power Query M:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTI0ABEglpEBmDAy0jU00jUyUYqNBQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"orderID   " = _t, #"sellerUserID   " = _t, #"buyerUserID   " = _t, #"amount   " = _t, date = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"orderID   ", Int64.Type}, {"sellerUserID   ", Int64.Type}, {"buyerUserID   ", Int64.Type}, {"amount   ", Int64.Type}, {"date", type date}}),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"orderID   ", "amount   ", "date"}, "transactionType", "UserID"),
    #"Detect Sell" = Table.ReplaceValue(#"Unpivoted Columns","sellerUserID","sell",Replacer.ReplaceText,{"transactionType"}),
    #"Detect Buy" = Table.ReplaceValue(#"Detect Sell","buyerUserID","buy",Replacer.ReplaceText,{"transactionType"})
in
    #"Detect Buy"

If you will go with this scenario you will end up with 2n rows at the end, which may be a problem in the future due to fact that you can expect a lot of transactions, right? 🙂

Recommended solution

Split information from your oryginal data info fact_transaction and fact_user_to_transaction tables.

 

Data preparation

fact_transaction

bolfri_2-1671912292105.png

let
    #"Removed Other Columns" = Table.SelectColumns(Sample,{"orderID   ", "date", "amount   "})
in
    #"Removed Other Columns"

fact_user_to_transaction

bolfri_3-1671912315556.png

let
    #"Removed Other Columns" = Table.SelectColumns(Sample,{"orderID   ", "sellerUserID   ", "buyerUserID   "}),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Removed Other Columns", {"orderID   "}, "transactionType", "userID"),
    #"Detect sell transactions" = Table.ReplaceValue(#"Unpivoted Columns","sellerUserID","sell",Replacer.ReplaceText,{"transactionType"}),
    #"Detect buy transactions" = Table.ReplaceValue(#"Detect sell transactions","buyerUserID","buy",Replacer.ReplaceText,{"transactionType"})
in
    #"Detect buy transactions"

dim_user

bolfri_4-1671912327350.png

Note that I've added a customer with ID 11.

 

Relationship

bolfri_5-1671912499541.png

 

It will be easier for you to track the logic and write down measures like number of transactions, numer for transactions for each user, numer of transactions where user was a buyer or maybe KPI: share buyer / user role. In this model it will be just easy and it saves a lot of unnessesary rows.

PBIX file: https://we.tl/t-t98gEAuuDF

 

 





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




Woow thanks a lot. This is much more than I expected. A solution and advice to do improve. Thanks a lot. I will study your examples with great care.

Helpful resources

Announcements
July 2024 Power BI Update

Power BI Monthly Update - July 2024

Check out the July 2024 Power BI update to learn about new features.

July Newsletter

Fabric Community Update - July 2024

Find out what's new and trending in the Fabric Community.