Hello everyone!
We are working with big datasets, and performance is degrading very quickly, most probably because our workflow isn't optimized. I'll describe our workflow.
We use big TXT files as data sources and import them into Power BI Desktop. These TXT files are stored in OneDrive, and we use direct links to them. As data is updated, we create a new TXT file, open the Power BI file, change the link, and refresh the data. This mostly works by appending the new data to the existing tables (and performing some transformations), so the tables inside the .pbix file grow quickly, and some of them now have more than 4M rows. This makes the files so big that a desktop PC struggles to work with them, even with 32GB of RAM.
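For reference, each of these append queries looks roughly like the sketch below (the OneDrive links, delimiter and step names are just placeholders, not our exact code):

```
let
    // Placeholder OneDrive direct-download links; the real model has several of these
    OldExtract = Csv.Document(Web.Contents("https://onedrive.example/old_extract.txt"), [Delimiter = "#(tab)", Encoding = 65001]),
    NewExtract = Csv.Document(Web.Contents("https://onedrive.example/new_extract.txt"), [Delimiter = "#(tab)", Encoding = 65001]),
    OldTable   = Table.PromoteHeaders(OldExtract, [PromoteAllScalars = true]),
    NewTable   = Table.PromoteHeaders(NewExtract, [PromoteAllScalars = true]),
    // New data gets appended to the existing rows; the other transformations run after this step
    Combined   = Table.Combine({OldTable, NewTable})
in
    Combined
```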
So I wanted to ask what would be the best way to improve performance. I would normally think that using a SQL Server instead of importing the TXT files into Power BI Desktop should help, but since we are also performing several transformations and using the imported TXT data to populate other tables, I'm not sure whether just storing everything in a SQL Server would work (I've never tried it). I guess I would need to import the tables into Power BI anyway to run the needed transformations, which would defeat the purpose of having a workable desktop file.
I couldn't find any documentation on ways to improve a scenario like this one (maybe I missed it), so here I am with an SOS call. Let me know if I need to clarify anything, and as always, any help will be great.
Thanks in advance!
Seb
Solved!
@sebasj Well, your other option is to try to eliminate columns in your imported data that you are not using. Focus on high-cardinality columns (lots of unique values), since those are what increase the size of your data model the most. For example, if you have columns for "Units" and "Cost Per Unit" as well as "Total Cost", consider eliminating the "Total Cost" column and using a measure for that calculation instead. This will save on your data model size.
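As a rough Power Query sketch (the file path, delimiter and column names below are just examples, not your actual data), dropping a derived column like that is a single step:

```
let
    Source   = Csv.Document(File.Contents("C:\Data\sales.txt"), [Delimiter = "#(tab)", Encoding = 65001]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Drop the derived, high-cardinality column before it reaches the model;
    // recreate it afterwards as a measure (e.g. a SUMX of Units * Cost Per Unit in DAX)
    Slimmed  = Table.RemoveColumns(Promoted, {"Total Cost"})
in
    Slimmed
```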
The other thing you can do that can sometimes reduce a model by leaps and bounds is to turn off Auto Time Intelligence. If you have lots of date columns in your data, a separate date table gets created for each one, and that can blow up your data model tremendously.
In terms of migrating from text files to SQL: you can import the data into SQL, and then you just need to modify the query in your semantic model to point to the SQL Server for the data instead of the text file. You can do this one table at a time. The easiest approach is to create a brand new query that connects to your data in SQL Server and transforms it the way you want. Then use the Advanced Editor to copy all of the M code. Now open your original query in the Advanced Editor and replace everything with the copied code. Save the query, delete the temporary query you created, then save and apply. Your original table that retrieved data from the text file will now get it from SQL Server, and nothing in your model will break.
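The pasted-in query ends up looking something like this (the server, database, table and column names below are placeholders for whatever you set up in SQL):

```
let
    // Only the source changes; the query name and downstream steps stay the same
    Source    = Sql.Database("your-sql-server", "YourDatabase"),
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // Any transformations that used to run on the text file follow here, unchanged
    Renamed   = Table.RenameColumns(FactSales, {{"cost_per_unit", "Cost Per Unit"}})
in
    Renamed
```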
Well, while this is marked as solved, it sadly isn't; we are still testing.
In the meantime, if someone knows whether there's a way to easily change a model from Import to DirectQuery, that would be great. I searched the forum but didn't find anything meaningful; the threads I found are from a couple of years ago and most say that you almost need to start from scratch.
Perhaps there's now an easier way that I'm not aware of, because the model is pretty complex and redoing it would take some serious time.
Thanks again!
Seb
@sebasj If you put the data into SQL Server, then you would want to perform the transformations using SQL views if at all possible, in order to avoid doing the transformations in Power Query and thus using local resources. If you do need to perform transformations in Power Query, then you would want to ensure that they fold back to SQL Server. You can check this by right-clicking each step and making sure that View Native Query is available. Again, this pushes processing back to the SQL Server instead of your local machine.
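As a rough sketch of what a folding query can look like (the server, database, view and column names here are assumptions, not anything in your actual model):

```
let
    // Point the query at a view that already does the heavy transformations in SQL
    Source   = Sql.Database("your-sql-server", "YourDatabase"),
    CleanTbl = Source{[Schema = "dbo", Item = "vw_SalesClean"]}[Data],
    // These steps should fold: right-click them in Power Query and
    // "View Native Query" should be enabled
    KeepCols = Table.SelectColumns(CleanTbl, {"OrderDate", "ProductKey", "Units", "Cost Per Unit"}),
    Recent   = Table.SelectRows(KeepCols, each [OrderDate] >= #date(2022, 1, 1))
in
    Recent
```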
The other option that using SQL Server enables is that you could potentially use DirectQuery instead of Import. This will generally be slower, but if your local machine can't handle the size of the data model, then it would be an option, since with DirectQuery you wouldn't have a local data model and thus it wouldn't use up your RAM, for example.
Greg, thanks a lot for the prompt reply! That's exactly what's happening: it's eating all the RAM. I tested the files on my home rig, which has 64GB of RAM, and they work fine, but with 32GB and Windows 11 it runs out of memory and I found no way to fix it. My guess is also that memory management isn't the same between Windows 10 and 11, because the Windows 10 machines with 32GB of RAM can still update these files, while the same file on a Windows 11 PC can't be handled and needs 64GB. But this is a different story.
As far as I could read, DirectQuery has some limitations, so I guess I would migrate the model to SQL. There are some things that aren't really clear to me, so I'm guessing a little bit. First of all, if I go for the first option (migrating the tables to SQL), I guess I'll need to recreate all the relationships between the tables (I could do that using SQL Server Management Studio, with foreign keys of course). Then I assume that connecting Power BI to the server would retrieve the tables and the relationships. Am I correct here? The model is actually quite complex (I guess that adds to it eating all the RAM). If there's any way or any tool that migrates the current tables and relationships to SQL, that would be great. Second, and mainly for future updates, I'm not sure how to perform the data updates. I'm guessing it would just be importing and appending to the tables (running the transformations in SQL where needed) and updating the views that run those transformations.
Is this correct so far? Would this help in having a workable file, or would it still import the data into the .pbix file? My doubt is because you first mentioned that, in this approach, the only thing I'd need to take into account is whether I still need to run transformations in Power Query, ensuring they fold back to SQL Server. Nevertheless, you later mentioned that with DirectQuery instead of Import I wouldn't have a local data model. So, does the former still create a local data model? Because I guess we would then run into issues again because of the huge file size.
Thanks again!
Seb
@Greg_Deckler Thanks a lot for the input and sorry for the late reply. We've been setting up a local machine and running some tests.
We've found that as long as we use the Import method instead of DirectQuery, the file size increases as usual. We suspected as much, since we basically just changed the source: instead of retrieving the tables from CSV files stored in our OneDrive, it retrieves them from the SQL server. Nevertheless, it still imports and copies all the data into the PBIX file, so the memory issues still happen.
We're checking the DirectQuery method, which actually worked fine initially and kept the file size to a minimum since it didn't import all the data. But we see there are some limitations, particularly regarding the maximum number of rows that can be retrieved in a single query, which is 1M. Some of our files have more than that (actually there's one with >6M rows), so we're not sure if DirectQuery would work.
I'm not sure if either of these would work. A third approach I saw, but haven't tried, is deploying SSAS, but I'm not sure if that would solve the issue.
Thanks again, and sorry for the late reply; the delay was basically the time it took to deploy a testing environment and start the tests.
Seb