We currently have a Power BI on-premises gateway hosted on an AWS EC2 t2.large, which has 2 vCPUs and 8 GB of RAM.
However, when refreshing one report we are having memory issues.
One table has 57 million records in it.
Does anyone have any experience sizing the gateway on EC2, before I decide to move to a t2.xlarge (4 vCPUs, 16 GB of RAM)?
@Anonymous,
I think the issue is with system specs. 2 vCPUs and 8 GB of RAM is very, very low when dealing with large datasets. Rather than the table's row count, how much total data are you dealing with, in GB?
We have clients with 50GB+ datasets, and we're using Azure Databricks with 15 vCPUs and 56GB of RAM.
I had a support ticket open with Microsoft about this, but their solutions didn't work.
In the end I increased the memory on the server to 16 GB and it works fine now - it peaks at around 8.3 GB.
We are now using t2.xlarge.
Some interesting observations about the gateway from debugging the problem.
I would say the process is split into three steps:
a) get the data via the ODBC connector
b) mash up the data using the mashup container
c) send the data to the Power BI service
In our case step b) is the memory-intensive part, and it doesn't handle running out of memory very elegantly.
I would also recommend putting the gateway as near to the data source as possible to avoid network issues, as the volume of data received (step a) is far greater than the volume of data sent (step c).
Hi everyone, I've had time to sleep on this and have reached a different solution.
Our data is in Redshift, and we started out using the Redshift connector (no Power BI gateway required). However, there are limitations on how you can use this connector, i.e. it only connects to a whole table.
Yes, you can apply basic filters afterwards, and it seems to take these into account on the data refresh.
Anyway, we need more control than this, i.e. the ability to join tables in our data request.
So we moved to ODBC and reluctantly set up a Power BI gateway.
This all works, except for the performance on large datasets.
So I have now tried a hybrid approach: tables where we take all or almost all of the data come through the Redshift connector, and tables where we need more control come through the ODBC connector (a rough sketch of both query styles is at the end of this post). This takes a lot of strain off the gateway.
We will probably still resize the gateway to 16 GB, given the comments above, and I have a support ticket open to look into whether there are bugs. I do believe there are, because, for instance, it is passing the same queries to Redshift twice.
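To make the hybrid split concrete, here is a rough Power Query (M) sketch of the two query styles. The server, DSN, schema, table, and column names are placeholders I have assumed, and the navigation steps generated by the Redshift connector may differ slightly in practice.

    // Query 1 - "take the whole table" via the native Amazon Redshift connector
    // (no gateway required; basic filters can still be applied afterwards)
    let
        Source = AmazonRedshift.Database("mycluster.example.eu-west-1.redshift.amazonaws.com:5439", "analytics"),
        Schema = Source{[Name = "public"]}[Data],
        Events = Schema{[Name = "events"]}[Data]
    in
        Events

    // Query 2 - "needs more control" via ODBC through the gateway,
    // pushing the join down to Redshift as a native query
    let
        OrdersJoined = Odbc.Query(
            "dsn=RedshiftDSN",
            "select o.order_id, o.order_date, o.amount, c.segment
               from public.orders o
               join public.customers c on c.customer_id = o.customer_id
              where o.order_date >= '2019-01-01'"
        )
    in
        OrdersJoined

Keeping the heavy joins in the native query means Redshift does that work instead of the mashup container on the gateway machine.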
I am now more inclined to say that this is a bug in the on-premises gateway.
I have noticed that duplicate queries are being sent to the Redshift database.
I have updated to the latest version of the gateway (14.16.6745.2), but to no avail.
I have tried setting MashupDefaultPoolContainerMaxCount = 1, but there is no change.
The error I am getting is
Exception of type 'System.OutOfMemoryException' was thrown.
when trying to import a largish table.
This one particular report is only 175 MB. I haven't heard of Azure Databricks before - is it possible to install a Power BI on-premises gateway on there?
I wasn't referring to the PBIX file - because of compression, there isn't a direct ratio between the amount of data and the size of the file. I was referring to the amount of data you're pulling in from your data source, in GB.
Databricks is not a Power BI service, it's a data processing platform. I was just using it as an example of the specs we deal with.
Apologies.
The main table we are processing is 9 GB in Redshift, and most of its columns are encoded, so I guess, depending on how these things work, it could be larger by the time it reaches the Power BI gateway.
To make the import more manageable I load it in 10 separate imports and then append these inside Power BI (roughly as sketched below) - not sure if this is a good idea or not.
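For what it's worth, the append can be expressed in Power Query (M) roughly as below. The DSN, table, key column, and the modulo-based slicing are my assumptions for illustration, not necessarily how the partitions were actually defined.

    // Hypothetical partitioned import: each slice is its own query against Redshift,
    // and the slices are then appended into a single table.
    let
        Part1 = Odbc.Query("dsn=RedshiftDSN", "select * from public.big_table where mod(id, 10) = 0"),
        Part2 = Odbc.Query("dsn=RedshiftDSN", "select * from public.big_table where mod(id, 10) = 1"),
        // ...Part3 through Part10 follow the same pattern...
        Combined = Table.Combine({Part1, Part2})
    in
        Combined

Whether this actually lowers peak memory depends on how the mashup engine evaluates the combined query, so it's worth watching the gateway's memory use during a refresh.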
@Anonymous I agree with @RMDNA - any heavy lifting at all just crushes the gateway with the import methods being used. We pretty much always recommend at least 16 GB as a starting point, but I can easily see scaling higher, as said.