Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
I'm developing a dashboard using Apache Hive as the backend storage (using an ODBC driver) and I've noticed something strange.
When I add a datasource, Power BI runs the query 3 times for each refresh. Some of these queries are quite complex on huge datasets, with many joins that can take > 15 minutes to complete. Having Power BI run the exact same query 3 times is causing issues.
What is Power BI doing, and is there any way to get it to not run the same query multiple times on each refresh?
I'm on Power BI version 2.47.4766.542 64-bit (June 2017)
I had the same problem with reading local excel files into Power Query. This helped me somehow:
https://docs.microsoft.com/en-us/power-query/multiple-queries
Same issue here. Running mysql connector and noticing multiple threads per single datasource. This causes huge performance issues especially when you have complex queries.
I'm seeing this as well. I built an API that works perfectly fine in a browser, but when the call is executed as a web query it executes many times. I've seen > 10 queries being executed before the preview dialog pops up!
This seems to be related to the query execution time as well. If PBI does not get an answer, it simply retries the query (every 15 secs or so). But some of my queries take more time than that and you can imagine what happens to my databases when the same heavy query is re-executed all the time.
I analyzed the actual requests and they are all the same, so I cannot 'filter' based on request info. I also tried to rewrite the query from select * from table to select 'all column names' from table. That is (of course) faster, but not always feasible (huge nr of columns).
Any ideas?
Hi there - I have the exact same issue and have tried to search for a solution, but it doesn't really seem that there is one, so I made a work-around which seems to work, but it does involve a few steps.
In your backend:
- make a unique constraint on your tables primary_key (or another useful column) and a column with timestamp (or create a column if your don't have one), this ensures that the database only accepts one insert at a time.
In powerBI:
- make a dateTime column. If your query runs once a day simply take date with latest hour, in this way all the entries will be the same even if it takes a few minutes from insert to insert. powerBI will now throw an error because the database now has a unique constraint, but the error will only come at the second insert.
I use R to send data from powerBI to SQL andto avoid the error I have wrapped the insert_data into a tryCatch - see below.
# 'dataset' holds the input data for this script
# Install nescessaary package(s)
library(DBI)
# Connect to database
con <- dbConnect(odbc::odbc(), "odbc_writer", timeout = 10)
# Begin session
dbBegin(con)
# Skip to next if error is encounted (this WILL happen because poewrBI tries to send the data multiple times and the database table has been set up to be unique).
for (i in 1:10) {
skip_to_next <- FALSE
# Append data to table - and skip by error
tryCatch(dbAppendTable(con, "table", dataset, row.names = NULL), error = function(e) { skip_to_next <<- TRUE})
if(skip_to_next) { next }
}
# Commit what has been written
dbCommit(con)
# Disconnect data connection
dbDisconnect(con)
Hi Microsoft team !
What about this really important issue ?
It is not acceptable from customers not to have any answer or solution for this old issue, which is really problematic for us as several users run queries from PowerBi Desktop to our datasources (like Hive) and this become now a real performance issue for our platform.
We really need a solution from your team. Quickly.
Hi,
I am having the same problem with the connection to Big Query.
I have just received our bill for last month and have noticed that the queries I have run via Power BI are running 2 or 3 times and each run is costing the business as it seems it is saying they are new queries each time so it is not using the cached data from a previous run.
Has anyone had any responses on why this is happening and if so any recommendations to stop it
Hi @Anonymous ! I didn't have any answer from Microsoft Power Bi Team unfortunately.
This problem is really critical and it is causing extra costs for your company and I don't understand why they are not answering us.
Several tickets are opened about the same issue but nobody moves ...
Hi,
I know we are getting extra costs as well.
I have raised a ticket via our helpdesk to Microsoft so hopefully if we keep bombarding them we might get somewhere.
If I hear anything will let you know if you could do the same
Hi,
I have been having an email conversation with the Power BI support team and first they said this was bug but now they are saying it is by design.
I have pushed it back to them to explain why they think this is acceptable
Will update when I know anything
@Anonymous Thank you for all those informations.
From my point of view, no answer from the Microsoft PowerBI Team.
Hi,
I have had numerous email converstaions and an actual call last Friday abou this.
In my case we are using Big Query and apparently by design PBI will run a query twice once for metadat and then for the actual data.
However the problem comes when you use an ODBC driver to connect ( I am using Simba as advised by Google) this then multiples the running of a query up by anothe two times so in all at least 4 times the query will run, inmy case it runs a query anything up to 6 times .
I ran a test and the query runs multiple times if you make amends in power query , just openeing the PBIX file. Just seting up the PBIX adding a conditional column in PQ aloing with a custom column closing reopening and refreshing in total a simple query ran 22 times in all...not good.
We have pushed back to t hem now to sort out, theu did suggest to use there Big Query API but the problem is it runs the whole tables in and even if you set up views in Big Query it brings them in stages so back to multiple queries.
Its back with them now 😞
@Anonymous do you have any updates on this? We are running into a similar issue but ours is with an Azure Managed Instance.
We are having the same problem. We run a complicated query with multiple joins. We're connecting to a Vertica DB running on AWS. We connect via ODBC. Running the query multiple times (concurrently!) means that the data base runs out of memory. It also costs us money unnecessarily.
Hello BIDrone, did you find a way to avoid this?
I am having the same major issue.
¿Did you find a way to solve this?
The repeated running of the same query is a pain that affects multiple sources.
I raised an idea:
If you feel strongly, vote for it.
Is this expected behavior for any ODBC connection? I am running into this issue with a Redbrick database also.
This is simply unacceptable. This is the definition of death by a thousand cuts. We have over 60 dashboards spread across 3 servers and that's still not enough to off load the issue this is creating. If a dashboard is pulling in 5 tables thats 10 connections and executions to pull one dashboard.
In the examply below I ran a trace on the dashboard refresh and you can clearly see the query running twice with a connection reset inbetween. It's impossible to negate the negative effect of this many connections even to multiple servers.
The Problem occurs when Power BI issues a Hive Query something like 'SELECT * FROM DATE' but if imported as the table it uses only one connection. Hope this helps
Thanks
Hi,
This is because of you connection is established directly to your dataset. Microsoft recomends to build tabular model, so that your data will be stored in olap engine and will throw results alone to your PBI Reports.
I'm not sure I fully understand. The way I'm pulling data (and I bet the others in the tread are) is to hook up via ODBC to hive and run an import on a table or a query that results in a "tablular" dataset. As far as Power BI should be concerned, it's the same output as hooking up to an oracle table using ODBC, or any other database.
Is the tabular model something different then an output that looks like a table (column/rows)?