Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Compete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.

Reply
SaiK
Helper I
Helper I

100 GB data on AWS Workspace

I have approx 100GB dataset in Power BI. I am importing this data from MySQL DB.An issue here is am unable to import the entire data because of the size.How do I workabout this issue?
I am also using AWS Workspace so I have memory and space restrictions. I want to publish the 100GB data to power bi service. What should be my approach? How do I go about this?

2 ACCEPTED SOLUTIONS

Pre-aggregate means to summarize the data using GROUP BY instead of loading all rows. For example instead of loading by transaction id or transaction lines, group the rows by a certain column.





Dane Belarmino | Microsoft MVP | Proud to be a Super User!

Did I answer your question? Mark my post as a solution!


"Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand."
Need Power BI consultation, get in touch with me on LinkedIn or hire me on UpWork.
Learn with me on YouTube @DAXJutsu or follow my page on Facebook @DAXJutsuPBI.

View solution in original post

Ohh yess, that is something I could do. Instead of applying the measures after importing shall apply GROUP BY before importing. Thanks a lot!!

View solution in original post

17 REPLIES 17
MFelix
Super User
Super User

Hi @SaiK ,

 

Not sure of what is the granularity of your data but believe that the best option is not to do import mode but to do Direct Query mode, that way the dataset will be kept to the meta data instead of copying the data to Power BI, this is a litle bit slowrr because it will do a query to your MySQL DB when you run a report but allows to minimize the size of the data in Power BI service.


Regards

Miguel Félix


Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

Check out my blog: Power BI em Português



I am importing a table from the database. It has 8 columns. I did try the DIrectQuery method with the help of MariaDB connector however it was too slow. Even applying measures was taking more than 5 minutes.

What is the number of rows you are working with?


Regards

Miguel Félix


Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

Check out my blog: Power BI em Português



More than 213M rows.

Hi @SaiK,

 

With this level of granularity, the import mode will bring issues like the size and also the refresh. 

 

You need to do use of direct query or summarre your data.

 

Depending on the final result you may want to take advantage of aggregation tables, dynamic filtering or reducing your data. 


Regards

Miguel Félix


Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

Check out my blog: Power BI em Português



The thing is as of now I am working with 2months of data, which is almost 150M records. So 6 months data will be really huge and I did try Direct Query earlier, but the performance was really bad. Do you think making use of stored procedures and indexes during the time of import would help?

Do you really need those 150M records? Do you need all the columns? Can't they be pre-aggregated?





Dane Belarmino | Microsoft MVP | Proud to be a Super User!

Did I answer your question? Mark my post as a solution!


"Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand."
Need Power BI consultation, get in touch with me on LinkedIn or hire me on UpWork.
Learn with me on YouTube @DAXJutsu or follow my page on Facebook @DAXJutsuPBI.

Well I have 5 columns and I need to display those 5 columns with six months data to the user. Currently I am working on 2 months data and its 150M rows. So once I start with six months data it will be easily above 300M. Also what do you mean by pre aggregation? 
Also another issue is am working on Amazon workspace so I am not able to save such huge data after loading it because of memory constraints in AWS. 

Pre-aggregate means to summarize the data using GROUP BY instead of loading all rows. For example instead of loading by transaction id or transaction lines, group the rows by a certain column.





Dane Belarmino | Microsoft MVP | Proud to be a Super User!

Did I answer your question? Mark my post as a solution!


"Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand."
Need Power BI consultation, get in touch with me on LinkedIn or hire me on UpWork.
Learn with me on YouTube @DAXJutsu or follow my page on Facebook @DAXJutsuPBI.

Hello,
An update on the same:
I tried various aggregations and simplifications to my query, however have been getting this error: MySQL: Fatal error encountered during command execution. 
I have tried indexing the columns, using window functions instead of group by ,etc. But only the basic queries like SELECT * FROM table works. The rest of the queries with aggregation or partitioning ,etc fails.

Hi @SaiK,

 

Have you tried to execute a stored procedure to create a new table with the aggregation data  and then get the SELECT * from that new table?


Regards

Miguel Félix


Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

Check out my blog: Power BI em Português



Hi Felix,
After having a discussion with my team, turns out we will have to build a ETL pipeline to extract the data, peform some transformations on it prolly on DBT and then load the data to Power BI.

Did you create a view in the database?





Dane Belarmino | Microsoft MVP | Proud to be a Super User!

Did I answer your question? Mark my post as a solution!


"Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand."
Need Power BI consultation, get in touch with me on LinkedIn or hire me on UpWork.
Learn with me on YouTube @DAXJutsu or follow my page on Facebook @DAXJutsuPBI.

No. Should I create one? How should I go about this issue? Any bit of complexity I add to my code and it gives me the error

 

This is a sample script:

CREATE VIEW ProductCategoryView AS
SELECT 
    p.ProductID,
    p.ProductName,
    p.CategoryID,
    c.CategoryName
FROM 
    Product p
INNER JOIN 
    Category c
ON 
    p.CategoryID = c.CategoryID;

Note: Different databases will have different syntaxes so this

 

This wil create ProductCategoryView view and instead of connecting to the tables, you connect to this instead in power query.





Dane Belarmino | Microsoft MVP | Proud to be a Super User!

Did I answer your question? Mark my post as a solution!


"Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand."
Need Power BI consultation, get in touch with me on LinkedIn or hire me on UpWork.
Learn with me on YouTube @DAXJutsu or follow my page on Facebook @DAXJutsuPBI.

I did try creating the view as you said, even that failed and gave me the same error. I have also changed the query timeout settings to limitless(0). 

Ohh yess, that is something I could do. Instead of applying the measures after importing shall apply GROUP BY before importing. Thanks a lot!!

Helpful resources

Announcements
August Power BI Update Carousel

Power BI Monthly Update - August 2025

Check out the August 2025 Power BI update to learn about new features.

August 2025 community update carousel

Fabric Community Update - August 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors