Solved: Slow Query on Gateway but Fast on Database

dave4667 · ‎12-02-2025

Hello,

Our company has a server dedicated to the Power BI On-Premises Data Gateway, with 20 vCPUs and 64 GB of RAM. We have a problem where the Gateway runs refresh queries very slowly, especially against Hive tables. One semantic model queries Hive tables, and when I click refresh on it, the refresh can take up to 15 minutes even with a query limit of 50 rows. When we check from the Hive side, the query takes only 3 seconds. On the Gateway server, CPU usage never goes above 15% and memory usage rarely goes above 30%. So where is the bottleneck, and how do I troubleshoot this? The problem occurs only with Hive; we do not see it with Oracle, PostgreSQL, or MySQL databases.

Thank You

Zanqueta · ‎12-02-2025

Hi @dave4667,

This behaviour is relatively common when working with Hive via the On-Premises Data Gateway, and it is usually not related to CPU or memory usage but rather to network latency, data serialisation, and lack of efficient query folding. Let us analyse the main points:

Why is the query fast in Hive but slow through the Gateway?

The Gateway does not execute the query directly: it acts as a proxy, receiving data from Hive and sending it to the Power BI service.
Even if the query returns only 50 rows, the ODBC/JDBC layer can add overhead, for example extra metadata exchanged with Hive.
Hive ODBC/JDBC driver performance: Some drivers are not optimised for small operations and can have high initialisation latency.
Partial or no query folding: If Power Query applies transformations that are not fully folded to Hive, the Gateway may process part of the logic locally.
Network between Gateway and Hive: Even with sufficient CPU and RAM, bandwidth or latency can be the bottleneck.

How to diagnose and improve

Check the connection mode:
- Are you using Import or DirectQuery?
- In Import mode, the Gateway must load the data into the Power BI service, which can be slow if compression or conversion is involved.
Enable Query Diagnostics in Power BI Desktop:
- This will show whether transformations are being folded to Hive or processed locally.
- https://learn.microsoft.com/power-query/query-diagnostics
Test with a simple query in Power Query:
- For example, SELECT * FROM your_table LIMIT 50 with no transformations (your_table is a placeholder). If it remains slow, the issue is likely with the driver or the network.
Update the Hive ODBC/JDBC driver:
- Older drivers often have performance issues.
Review Gateway settings for timeouts and concurrent connections:
- Check the gateway's timeout and concurrency limits, and raise them if refreshes against Hive appear to be queued or throttled.
Check compression and serialisation:
- Hive tables may be stored as Avro/Parquet, but the results reach the Gateway through the ODBC/JDBC driver as row sets, which the Gateway then converts internally; this conversion can be slow.
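
To separate driver start-up, query execution, and data transfer when running the simple test query, it can help to time each stage on its own. The following is a minimal sketch, not a Gateway feature: `time_stages` and `slowest_stage` are hypothetical helper names, and the three demo stages are stand-ins that only sleep. On the Gateway machine you would swap them for real calls through your Hive ODBC driver (open the connection, execute the LIMIT 50 query, fetch the rows).

```python
import time
from typing import Callable, Dict, List, Tuple


def time_stages(stages: List[Tuple[str, Callable[[], object]]]) -> Dict[str, float]:
    """Run each stage in order and return its wall-clock duration in seconds."""
    timings: Dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        stage()
        timings[name] = time.perf_counter() - start
    return timings


def slowest_stage(timings: Dict[str, float]) -> str:
    """Return the name of the stage that took the longest."""
    return max(timings, key=timings.get)


if __name__ == "__main__":
    # Stand-in stages (assumption): replace each lambda with the real call
    # made through your Hive driver on the Gateway machine.
    demo = [
        ("connect", lambda: time.sleep(0.05)),
        ("execute", lambda: time.sleep(0.01)),
        ("fetch", lambda: time.sleep(0.02)),
    ]
    result = time_stages(demo)
    for name, seconds in result.items():
        print(f"{name}: {seconds:.3f}s")
    print("slowest:", slowest_stage(result))
```

If "connect" dominates, the driver or authentication handshake is the first suspect; if "fetch" dominates for only 50 rows, look at the network or at how much data is really being pulled.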

Best practices

Where possible, create optimised views in Hive to reduce transformations in Power Query.
Consider using DirectQuery to avoid large data transfers, but test whether performance improves.
Install the Gateway as close as possible to the Hive cluster to minimise network latency.
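
To get a rough idea of the network latency between the Gateway machine and the Hive cluster, you could time a few plain TCP connections. This is only a sketch under assumptions: `tcp_connect_times` and `summarize` are hypothetical helper names, the connect time approximates a single network round trip and says nothing about driver or query overhead, and the demo probes a local listener so that it runs anywhere. In practice you would pass your own HiveServer2 host and port (commonly 10000, but check your cluster).

```python
import socket
import statistics
import time
from typing import Dict, List


def tcp_connect_times(host: str, port: int, attempts: int = 5,
                      timeout: float = 3.0) -> List[float]:
    """Open and close a TCP connection several times; return durations in ms."""
    samples: List[float] = []
    for _ in range(attempts):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it straight away
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples: List[float]) -> Dict[str, float]:
    """Return min, median, and max of the samples in milliseconds."""
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Local listener used only so the sketch runs anywhere (assumption);
    # replace host and port with your HiveServer2 endpoint.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(16)
    host, port = server.getsockname()
    print(summarize(tcp_connect_times(host, port)))
    server.close()
```

Run it from the Gateway server itself, since that is the path the refresh traffic takes.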

If this response was helpful in any way, I’d gladly accept a 👍, much like the joy of seeing a DAX measure work first time without needing another FILTER.

Please mark it as the correct solution. It helps other community members find their way faster (and saves them from another endless loop 🌀).

v-priyankata · ‎12-08-2025

Hi @dave4667

Thank you for reaching out to the Microsoft Fabric Forum Community.

@R1k91 @Zanqueta Thank you for your input.

I hope the information provided by users was helpful. If you still have questions, please don't hesitate to reach out to the community.

v-priyankata · ‎12-11-2025

Hi @dave4667

Hope everything’s going smoothly on your end. I wanted to check whether the issue has been resolved. If you have any other issues, please reach out to the community.

R1k91 · ‎12-04-2025

What kind of Power Query has been implemented in the semantic model?

I mean, is it a zero/low-transformation Power Query, with all the transformations happening on the Hive side, or are there many transformations?
In the second case, have you checked whether the transformations are folding to the source?

If they are not folding, the transformations happen on the gateway side, and they can be slow.
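
To illustrate why folding matters even with a 50-row limit, here is a toy model in Python. It is not how the mashup engine works internally: `FakeSource`, `folded_limit`, and `local_sort_then_limit` are hypothetical names, and the second function models a worst case where a non-streaming step (a sort here) runs on the gateway before the row limit, so the whole table has to be transferred first.

```python
from typing import Iterator, List, Optional


class FakeSource:
    """Stand-in for a remote table that counts the rows it sends out."""

    def __init__(self, total_rows: int) -> None:
        self.total_rows = total_rows
        self.rows_sent = 0

    def scan(self, limit: Optional[int] = None) -> Iterator[int]:
        """Yield rows; `limit` models a LIMIT evaluated at the source."""
        count = self.total_rows if limit is None else min(limit, self.total_rows)
        for row in range(count):
            self.rows_sent += 1
            yield row


def folded_limit(source: FakeSource, n: int) -> List[int]:
    """The limit is pushed to the source, so only n rows are transferred."""
    return list(source.scan(limit=n))


def local_sort_then_limit(source: FakeSource, n: int) -> List[int]:
    """Nothing folds: every row is transferred, then sorted and cut locally."""
    all_rows = list(source.scan())
    return sorted(all_rows)[:n]


if __name__ == "__main__":
    folded = FakeSource(100_000)
    folded_limit(folded, 50)
    print("folded, rows transferred:", folded.rows_sent)      # 50

    unfolded = FakeSource(100_000)
    local_sort_then_limit(unfolded, 50)
    print("not folded, rows transferred:", unfolded.rows_sent)  # 100000
```

Both calls return the same 50 rows, but the second one moves the entire table to the gateway first, which is the kind of gap that can turn a 3-second query into a refresh lasting minutes.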

--
Riccardo Perico
BI Architect @ Lucient Italia | Microsoft MVP

Blog | GitHub

If this post helps, then please consider accepting it as the solution to help other members find it more quickly.
