Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Power BI is turning 10! Let’s celebrate together with dataviz contests, interactive sessions, and giveaways. Register now.

Reply
Anonymous
Not applicable

How to design and build data sets of large amount of records

I am new to Power BI.  Please excuse any terminology not consistent with Power BI.  I have been placed in charge of building dashboards for a large amount of data.  I am trying to determine how to design the data sets to be used by Power BI.

 

As background, in the below example, I have built some aggregate tables for use in MS SQL reporting.  A quick list of aggregate tables show these tables, which are equivalent to levels in reporting:

 

1. Region - additional categories - Month for 6 metrics

2. Market - Region -  additional categories - Month for 6 metrics

3. Company - Market - Region -  additional categories - Month for 6 metrics

4. Then the Contact table and fact tables

 

This works well in a SQL world.  How do these "levels" get translated into a Power BI world?

 

Here is our data.  Our basic facts are in 3 tables of 40 million, 5 million, and 1 million.  The fact records hold information about Contacts and three different types of activities by date for the last 36 months.  Our Contacts belong to Companies, which are categorized by Market, and then by Region.  The hierarchy is like this:

 

 

   Region - 10

   Market - 55

   Company - 6,000

   Contacts - 150,000

 

Our users typically analyze each month fact counts, by region, market and company.  the users want to interact with the dashboards to:

 

1.  Compare 6 metrics based on counts by Region.

2.  Then drill to Markets within any one Region

3.  Then compare Companies, usually showing the top 20 Companies for a given month.  Here we need a selector or filter to allow us to pick a month, and show the top 20 companies for that month. 

4.  then the users want to click on a Company and show the Contacts for that Company for that Month.

 

Note: realising how much data there is, we are considering limiting the dashboards to the last three months only.  This would limit the facts to about 9 million records

 

 

What I am not sure of is this:

- Should I build a data set for Region and Market, and then one for each Market and the Companies in each Market?

- And how do I handle Company/Contact data?  

 

Or in genera, how do others determine from large data sets, what levels, like region, market, company, to build data sets?

 

Thanks for any advice

 

This post is intended to give me the starting point to map out the data sets I need to build.

1 ACCEPTED SOLUTION
v-juanli-msft
Community Support
Community Support

Hi @Anonymous

1.  Compare 6 metrics based on counts by Region.

I don't understand what "Compare 6 metrics" means, please give me an example.

counts by Region can be achived by a measure

counts by region = CALCULATE(COUNT('fact'[contacts]),ALLEXCEPT(contact,contact[region]))

Before i create a relationship between table "contact" and "fact" based on "contact" column

1.png

 

2.  Then drill to Markets within any one Region

Add "Region","Market","Company" columns in the row field of the matrix visual, the measure in the value field.

measure->

count change with hierarchy = COUNT('fact'[contacts])

2.png

Then drill down from" region" to "market" level.

reference:

Using drill down with the Matrix visual

Drill mode in a visualization in Power BI

 

3.  Then compare Companies, usually showing the top 20 Companies for a given month.  Here we need a selector or filter to allow us to pick a month, and show the top 20 companies for that month. 

Create a rank measure, then add this measure to the visual level filter to show items when value is less than or equal 20.

counts by companies = CALCULATE(COUNT('fact'[contacts]),ALLEXCEPT(contact,contact[companies]))

rank = RANKX(ALLSELECTED(contact),[counts by companies],,DESC,Dense)

3.png

 

To keep other visuals on the dashboard not affected by the slicer, plase edit interactions as below

 

4.png

Visualization interactions in a Power BI report

 

4.  then the users want to click on a Company and show the Contacts for that Company for that Month.

Based on step3, add column "company" to the slicer, then add [counts by companies] in a card visual

5.png

 

Best Regards

Maggie

 

View solution in original post

4 REPLIES 4
v-juanli-msft
Community Support
Community Support

Hi @Anonymous

1.  Compare 6 metrics based on counts by Region.

I don't understand what "Compare 6 metrics" means, please give me an example.

counts by Region can be achived by a measure

counts by region = CALCULATE(COUNT('fact'[contacts]),ALLEXCEPT(contact,contact[region]))

Before i create a relationship between table "contact" and "fact" based on "contact" column

1.png

 

2.  Then drill to Markets within any one Region

Add "Region","Market","Company" columns in the row field of the matrix visual, the measure in the value field.

measure->

count change with hierarchy = COUNT('fact'[contacts])

2.png

Then drill down from" region" to "market" level.

reference:

Using drill down with the Matrix visual

Drill mode in a visualization in Power BI

 

3.  Then compare Companies, usually showing the top 20 Companies for a given month.  Here we need a selector or filter to allow us to pick a month, and show the top 20 companies for that month. 

Create a rank measure, then add this measure to the visual level filter to show items when value is less than or equal 20.

counts by companies = CALCULATE(COUNT('fact'[contacts]),ALLEXCEPT(contact,contact[companies]))

rank = RANKX(ALLSELECTED(contact),[counts by companies],,DESC,Dense)

3.png

 

To keep other visuals on the dashboard not affected by the slicer, plase edit interactions as below

 

4.png

Visualization interactions in a Power BI report

 

4.  then the users want to click on a Company and show the Contacts for that Company for that Month.

Based on step3, add column "company" to the slicer, then add [counts by companies] in a card visual

5.png

 

Best Regards

Maggie

 

Anonymous
Not applicable

Thanks for your post.

 

What I am trying to figure out is this: 

 

- Do I load all 46 million fact records into a pbix?  or

 

- Do I break up the data in some fashion for faster loading dashboards?  Maybe by region by market or by market by company?

 

 

 

 

 

 

Hi @Anonymous

When you load all 46 million fact records into a pbix, it may slow down the performance.

If you need all the data, after import, you could follow these tips to improve the performance.

 

if you need only some data, you could use parameter to connect data source and add parameter to filter data.

For more details, please read through this article.

https://www.red-gate.com/simple-talk/sql/bi/power-bi-introduction-working-with-parameters-in-power-bi-desktop-part-4/

 

 

Best Regards

Maggie

 

 

 

Anonymous
Not applicable

My question involves how to split data sets?  My original post may have been complicated, but I am looking for advice on how to approach splitting my data to suuport our dashboarding needs.

 

How have others approach data set splitting?

 

 

 

 

Helpful resources

Announcements
June 2025 Power BI Update Carousel

Power BI Monthly Update - June 2025

Check out the June 2025 Power BI update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.