Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
nwrozyna
New Member

How to model soccer data ?

Hi all,

I've got a short data modeling questions for a soccer dataset. Tables look like this:


Matches:

matchID | homeID | awayID | ...

 

Teams:

teamID | Name | ...

 

Events:

eventID | matchID | Event


Now, to my question:

The events table contains all tracked events (eg goal or yellow card). Based on this, I am calculating some stats. Now, I want to create a dashboard with a slicer to select the team. Based on the slicer all games with the team being involved are shown and stats are calculated. 
But how do I link the "teams" table to two columns in the "matches" table? 
One could say: unoivot the data and you've got two entries per match, one for home and one for away. But if so, how to link the ticker table then other than n:m?

 

happy to hear your advice!

nico 

9 REPLIES 9
nwrozyna
New Member

Thanks! Got until that point. This is what the real dataset actually looks like. Now two scenarios that I struggle with:


Scenario 1)

Select a team slicer filtering dimension table "Teams" with e g Liverpool to show only games (incl. results and some game metadata) that Liverpool is playing (either home or away). How to deal with it other than changing the data structure of the matches table into something like that:

 

 matchID | homeOrAway | teamID

m10001 | home | t1

m10001 | away | t2

 

With this I can filter the teamID column to let only games appear where t2 (Liverpool) is playing. 
What's best practice when it comes to storing other match metadata like result, date, location? I could store it into another table with matchID as column with unique value (1*n relationship). 

Scenario 2:

I want to browse some key stats out of the events table. But only for a specific game Liverpool was playing. I can only achieve this by building an n:m relationship between the newly created unpivoted matches table and the events table (via matchID). 

What's your suggestion on this?

aj1973
Community Champion
Community Champion

My suggestion is that you send me your PBIX file so you can save time for and for you.

My guessing is that you are trying to build a Dataset off your guessing 🙂 

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

Sure! The only thing a made up is reducing the dataset the minimum required and switch from local league handball to public soccer 😀

 

Please see attached the dataset with english headers ...

My issue with this is that the datamodel somehow is bad practice because different way of filtering cannot be accomplished with just a lean data table setup.

Remark: this is considered to be a proof of concept and not the final dataset. that's why power query steps are a little messy 😉 

 

Datasat: https://1drv.ms/u/s!AhmA5aJ6VboGhOswCkcY0CS-PJ_TrQ?e=gTLS8B

 

aj1973
Community Champion
Community Champion

First, Hi Germany 😉 and you can name it Football instead of soccer, am originally from Tunisia 😀.

Second, thank you for sharing your file because it's gonna be easier for me to help you.

Now, I have to say that your Model is not quite optimised 

aj1973_0-1667223531896.png

You need to look to obtain a Star Schema with correct Relationships. A Star Shema will make life easier to develop a good report.

So what is the issue you are facing? How can I help you? I still didn't understand what's your point! Can we do this step by step and show me by screenshots the desired Outcome?

 

 

 

 

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

Alright, thanks Amine!

Actually, I have really issues in describing the problem that exists in my head. I know about the star schema and it worked for almost every other data model I have built, but just for this one I really struggle ...

 

Maybe I rephrase my question and I go one step back again:

1) I have raw data as provided in FT_Ticker and FT_Games (or maybe it's a dimension table - let's see). All of the rest is done by myself during ETL.

2) What would you propose if I want to answer the two questions in one datamodel:

-> Please show me all games for one specific team?

-> If I have one match identified, how can I see the liveticker for this match?

 

I guess by answering those two questions you'll face my challenge.

 

Nico

 

 

aj1973
Community Champion
Community Champion

OK, To begin with here is a good model

aj1973_0-1667338839339.png

To answer your first question the model allows to select a team a see the matches and the event for each player.

aj1973_1-1667339291295.png

 

HOWEVER your model and Tables are missing a very important Fileds which are the "Dates". All supposedly Fact tables FT_Games , FT_Games-unpivot and FT_Ticker are all missing column dates, Like when did this match occure, when did this player have a yellow card... also to see the ticker in Live mode when the match is on you would need another type of connection to the FT_Ticker, like Direct Query mode. Last but not least the FT_Ticker table needs to contain a column for the GameID in order to know "This Player had an Event in what Game at what Date". These columns must existe somewhere, keep digging my friend.

Hope this is your first step to glory 🙂

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

aj1973
Community Champion
Community Champion

Hi Nico,

Can you share the three Tables please!

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

Thanks for the quick reply. I don't have the real data with me because I'm on the go. but basically it's like this:

 

7EE8A877-5CD0-4FAC-AE1F-33F3978DAA0A.png

aj1973
Community Champion
Community Champion

Hi @nwrozyna 

You should have another table with the PlayerID, PlayerName and PlayerTeam. Like Salah is a Liverpool Player or Sterling is Man City...

Then in event Table you would add a Column for Event name like (Yellow Card, Red Card,Goal,Sub...) and another column with the name of the player or even better his ID.

aj1973_0-1667129107342.png

I am sure these infos are available somewhere in Google.

Good luck

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

Helpful resources

Announcements
LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.