I have an approx. 100 GB dataset in Power BI, which I am importing from a MySQL DB. The issue is that I am unable to import the entire dataset because of its size. How do I work around this?
I am also using an AWS Workspace, so I have memory and disk restrictions. I want to publish the 100 GB of data to the Power BI service. What should my approach be? How do I go about this?
Hi @SaiK ,
I'm not sure what the granularity of your data is, but I believe the best option is to use DirectQuery mode instead of import mode. That way the dataset is kept to metadata only instead of copying the data into Power BI. This is a little slower, because it runs a query against your MySQL DB whenever you run a report, but it minimizes the size of the data in the Power BI service.
Regards
Miguel Félix
Proud to be a Super User!
Check out my blog: Power BI em Português

I am importing a table from the database. It has 8 columns. I did try the DirectQuery method with the help of the MariaDB connector, however it was too slow. Even applying measures was taking more than 5 minutes.
What is the number of rows you are working with?
Regards
Miguel Félix
Proud to be a Super User!
Check out my blog: Power BI em Português

More than 213M rows.
Hi @SaiK,
With this level of granularity, import mode will bring issues with both size and refresh.
You need to use DirectQuery or summarize your data.
Depending on the final result, you may want to take advantage of aggregation tables, dynamic filtering, or reducing your data.
Regards
Miguel Félix
Proud to be a Super User!
Check out my blog: Power BI em Português

The thing is, as of now I am working with 2 months of data, which is almost 150M records, so 6 months of data will be really huge. I did try DirectQuery earlier, but the performance was really bad. Do you think making use of stored procedures and indexes at import time would help?
Do you really need those 150M records? Do you need all the columns? Can't they be pre-aggregated?
Well, I have 5 columns and I need to display those 5 columns with six months of data to the user. Currently I am working with 2 months of data and it's 150M rows, so once I start with six months it will easily be above 300M. Also, what do you mean by pre-aggregation?
Another issue is that I'm working on an Amazon Workspace, so I am not able to save such huge data after loading it, because of memory constraints in AWS.
Pre-aggregate means summarizing the data with GROUP BY instead of loading all rows. For example, instead of loading by transaction ID or transaction lines, group the rows by a certain column.
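As a rough illustration (the table and column names below are made up, not from your schema), a pre-aggregation query could look like this:

```sql
-- Hypothetical example: roll millions of transaction rows up to
-- one row per product per day before Power BI ever sees the data.
SELECT
    product_id,                       -- assumed grouping column
    DATE(transaction_ts) AS txn_date, -- day-level granularity
    SUM(amount)          AS total_amount,
    COUNT(*)             AS txn_count
FROM transactions
GROUP BY product_id, DATE(transaction_ts);
```

If your report only needs daily totals, a query like this can shrink 150M rows to a few hundred thousand, which imports comfortably.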
Hello,
An update on the same:
I tried various aggregations and simplifications in my query, but I have been getting this error: MySQL: Fatal error encountered during command execution.
I have tried indexing the columns, using window functions instead of GROUP BY, etc. Only basic queries like SELECT * FROM table work; queries with aggregation, partitioning, etc. fail.
Hi @SaiK,
Have you tried executing a stored procedure that creates a new table with the aggregated data, and then running SELECT * from that new table?
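A minimal MySQL sketch of that idea (all table, column, and procedure names here are illustrative assumptions, not from the thread):

```sql
-- Hypothetical stored procedure that materializes the aggregation
-- into a real table, so Power BI only ever runs a plain SELECT.
DELIMITER //
CREATE PROCEDURE refresh_sales_summary()
BEGIN
    DROP TABLE IF EXISTS sales_summary;
    CREATE TABLE sales_summary AS
    SELECT
        product_id,
        DATE(transaction_ts) AS txn_date,
        SUM(amount)          AS total_amount
    FROM transactions
    GROUP BY product_id, DATE(transaction_ts);
END //
DELIMITER ;

CALL refresh_sales_summary();
-- In Power Query, connect with: SELECT * FROM sales_summary;
```

This moves the expensive aggregation out of the refresh path: the procedure runs on the server on your schedule, and the import only scans the small summary table.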
Regards
Miguel Félix
Proud to be a Super User!
Check out my blog: Power BI em Português

Hi Félix,
After having a discussion with my team, it turns out we will have to build an ETL pipeline to extract the data, perform some transformations on it, probably in dbt, and then load the data into Power BI.
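For reference, a dbt model is just a SQL SELECT in a file; a minimal sketch of such a transformation (the model name, source, and columns are assumptions for illustration) might be:

```sql
-- models/sales_summary.sql (hypothetical dbt model)
-- dbt materializes this SELECT as a table in the warehouse;
-- Power BI then imports the already-summarized result.
{{ config(materialized='table') }}

SELECT
    product_id,
    DATE(transaction_ts) AS txn_date,
    SUM(amount)          AS total_amount
FROM {{ source('mysql', 'transactions') }}
GROUP BY product_id, DATE(transaction_ts)
```

The design idea is the same as the stored-procedure approach: keep the heavy GROUP BY in the database layer and let Power BI import only the summarized table.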
Did you create a view in the database?
No. Should I create one? How should I go about this issue? Any bit of complexity I add to my query gives me the error.
This is a sample script:
CREATE VIEW ProductCategoryView AS
SELECT
    p.ProductID,
    p.ProductName,
    p.CategoryID,
    c.CategoryName
FROM Product p
INNER JOIN Category c
    ON p.CategoryID = c.CategoryID;
Note: Different databases have different syntaxes, so this may need small adjustments for MySQL.
This will create the ProductCategoryView view; instead of connecting to the tables, you connect to this view in Power Query.
I did try creating the view as you suggested; even that failed and gave me the same error. I have also changed the query timeout setting to limitless (0).
Ohh yes, that is something I could do. Instead of applying the measures after importing, I shall apply GROUP BY before importing. Thanks a lot!!