jessegorter
Helper I

large datasets - incremental refresh

I was wondering: if I have a P1 capacity and set up incremental refresh, can the total dataset size grow beyond the 25 GB max as long as the incremental refresh itself does not consume more than 25 GB?

Is there no upper limit then?

Or do you need to go to a higher capacity once your dataset is 25 GB?


10 REPLIES
jessegorter
Helper I

Thanks all! I didn't know the surrogate key could also add such a footprint.

Having said that, the DirectQuery SQL generated by Power BI didn't seem very smart either: for a distinct count DAX measure, it just sent a "select 2 billion rows from the SQL table" kind of query 🙂
Is there a way to influence the generated DirectQuery query based on the DAX?

solvisig
Advocate III

My understanding is this:

  1. Your dataset can grow beyond 25 GB.
    1. The incremental refresh size depends on your capacity settings. If the memory consumption during a refresh reaches about half of your capacity, you can start to see performance issues and should therefore scale up the capacity (see the Power Query sketch after this list for what the incremental refresh filter itself looks like).
  2. The dataset upper limit is 400 GB.
  3. No need to go to a higher capacity when your dataset grows beyond 25 GB, as long as the incremental refresh partitions are not consuming that much memory and causing performance issues.
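
For reference, here is a minimal Power Query (M) sketch of the filter pattern that incremental refresh is built on. The server, database, table, and OrderDate column names are made up for illustration; RangeStart and RangeEnd are the two DateTime parameters Power BI expects you to define before configuring the incremental refresh policy on the table.

```
let
    // Assumed source - replace with your own server and database.
    Source = Sql.Database("myserver.database.windows.net", "SalesDW"),
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],

    // Keep only the rows inside the current refresh window. The service
    // substitutes RangeStart/RangeEnd per partition, so each refresh only
    // reads its own slice; the filter should fold to the source query.
    FilteredRows = Table.SelectRows(
        FactSales,
        each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
    )
in
    FilteredRows
```

The partitions created from this policy are what keep the memory needed per refresh small, even when the overall model is much larger than any single refresh window.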

 

 

Hey @solvisig ,

 

unfortunately the dataset size cannot outgrow the available memory of the capacity, which for a P1 is 25 GB.

This article https://docs.microsoft.com/en-us/power-bi/enterprise/service-premium-large-models has a Considerations and limitations section; unfortunately it is not stated explicitly there, but bullet point 5 is the relevant one here (next to my personal experience).

Regards,
Tom




"Refreshing large datasets: Datasets that are close to half the size of the capacity size (for example, a 12-GB dataset on a 25-GB capacity size) may exceed the available memory during refreshes. Using the enhanced refresh REST API or the XMLA endpoint, you can perform fine grained data refreshes, so that the memory needed by the refresh can be minimized to fit within your capacity's size."

I don't think this is referring to storage. Only memory on the capacity processing the dataset.

In the subscription comparison table you can see the model size limit (i.e., the dataset size limit).

[Screenshot: subscription comparison table showing the model size limit (dataset size limit)]

 

Hey @jessegorter ,

 

it's simply not possible for a dataset to outgrow the available memory of your P1 capacity.

The table provided in the latest post by @solvisig indicates a 400 GB dataset size limit. This is absolutely true - but for a P5 capacity.

This article What is Power BI Premium Gen2? - Power BI | Microsoft Docs shows a table with the dataset size limits per capacity.

I assume that you have moved your P1 capacity to the new Gen 2 architecture (simply by toggling a button). One of the benefits of Gen 2 is that each artifact can grow to the available memory.

Regarding your initial question about your P1 capacity: the dataset cannot outgrow the available memory. Memory is of course not the same as storage; storage is 100 TB per capacity.

Sometimes I get asked why there is so much storage in comparison to the memory.

The answer is simple: the underlying assumption of Power BI Premium (Gen 1 and Gen 2) is that there are many datasets, not just one. Power BI juggles which datasets are loaded into the available memory.

 

Regards,

Tom 




Thank you!

It makes sense that it cannot grow beyond the capacity limit; I was confused by some of the documentation. I had a large fact table of around 1 billion rows, and it seems I could not even fit that into a P5 without incremental refresh.

Hey @jessegorter,

 

it's difficult to estimate the required memory for a fact table with a billion rows; this is due to all the compression that happens in the in-memory, column-oriented world of the Tabular model. This article will get you started: Inside VertiPaq - Compress for success - Data Mozart (data-mozart.com)

 

If the origin of the fact table is relational, it is likely that there is a column with a unique identifier for each row inside the fact table. Most of the time this unique identifier is of no use, but it can add a tremendous memory footprint inside Tabular models.
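
To make that concrete, here is a minimal Power Query (M) sketch of dropping such a column before it is ever loaded. The table name FactSales and the key column SalesRowID are placeholders, not anything from this thread; the point is simply that a per-row unique identifier never reaches VertiPaq, so it never has to be compressed and stored.

```
let
    // Assumed source - replace with your own server and database.
    Source = Sql.Database("myserver.database.windows.net", "SalesDW"),
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],

    // "SalesRowID" stands in for whatever unique row identifier the source
    // table carries. Removing it here means the column is never imported and
    // cannot inflate the in-memory size of the Tabular model.
    RemovedKey = Table.RemoveColumns(FactSales, {"SalesRowID"})
in
    RemovedKey
```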

 

The most problematic issue with large datasets is the initial load. I doubt that a fact table with a billion (10^9) rows requires a P5.

Fact tables with that number of rows require some thinking, e.g. consider using measures instead of pre-computed columns that create a lot of unique values. Such columns will burn your memory in seconds 🙂

 

Good luck,

Tom



TomMartens
Super User

Hey @jessegorter ,

 

the dataset cannot outgrow the available memory. This article provides a table with the limits that apply to all the capacities: What is Microsoft Power BI Premium? - Power BI | Microsoft Docs

 

Regards,
Tom




The OP is referring to the large dataset format:

https://docs.microsoft.com/en-us/power-bi/enterprise/service-premium-large-models

 

Hey @jessegorter ,

 

as @otravers already mentioned, with the large dataset storage format the dataset size limit can grow to the size of the capacity's memory; for this reason I adjusted my initial reply. I changed "... dataset size limit ..." to "... available memory ..."

Regards,

Tom



