HamidBee
Impactful Individual

Best Practice for Import vs. DirectQuery Mode in Power BI Reports with Dimensional and Fact Tables

Hi All,

 

I'm delving into Power BI reporting and came across a scenario in a tutorial (applicable to Azure Synapse but relevant to Power BI in general) where DirectQuery mode was used for all tables in a data model, including both dimensional and fact tables. This raised a question in my mind about the optimal approach when dealing with different types of tables in Power BI reports.

 

Wouldn't it make more sense to use Import mode for dimensional tables and reserve DirectQuery mode only for the fact tables? I'm thinking this could potentially optimize the performance and efficiency of the reports, especially considering the typically smaller size of dimensional tables and the larger volume of data in fact tables.

 

I'd appreciate any insights or experiences on this matter. Is there a general best practice in such scenarios, or does it depend heavily on the specific context of the data and the reporting needs?

 

https://microsoftlearning.github.io/DP-500-Azure-Data-Analyst/Instructions/labs/04-create-a-star-sch...

 

Thank you for your guidance!

5 REPLIES
lbendlin
Super User

You need to distinguish between "external" (to Azure) data sources and datasets (now called semantic models). Import mode is not something your report users ever experience; they get their data from the semantic model, which behaves like a DirectQuery data source (or from Delta Lake in the future). Only if and when you have a DirectQuery connection to an "external" (on-prem, for example) data source will the end user see a performance impact. That also means you should only consider DirectQuery mode if your data source can actually handle the load (i.e., being bombarded with gazillions of queries initiated by users interacting with your report).

 

Read about the concept of aggregations (where there are arguments pro and con about which side should be Import mode and which should be DirectQuery) and the concept of incremental refresh with a "hot" DirectQuery partition.
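
To make the aggregation idea concrete, here is a minimal sketch (Python/pandas, with made-up table and column names) of the mechanism: a small pre-aggregated table answers queries whose grain it covers, and only finer-grained queries fall through to the large fact source, which is where DirectQuery would come in.

```python
import pandas as pd

# Hypothetical fact table standing in for a large DirectQuery source.
fact_sales = pd.DataFrame({
    "DateKey":     [20240101, 20240101, 20240102, 20240102],
    "ProductKey":  [1, 2, 1, 2],
    "CustomerKey": [10, 11, 10, 12],
    "Amount":      [100.0, 250.0, 75.0, 300.0],
})

# Small Import-mode aggregation table: pre-summarized at Date/Product grain.
agg_sales = (fact_sales
             .groupby(["DateKey", "ProductKey"], as_index=False)["Amount"]
             .sum())

def total_amount(group_by_cols):
    """Answer from the small aggregation when its grain suffices,
    otherwise 'fall through' to the big fact table (the DirectQuery hit)."""
    if set(group_by_cols) <= {"DateKey", "ProductKey"}:
        source, hit = agg_sales, "aggregation (Import)"
    else:
        source, hit = fact_sales, "fact table (DirectQuery)"
    result = source.groupby(list(group_by_cols), as_index=False)["Amount"].sum()
    print(f"answered from: {hit}")
    return result

total_amount(["DateKey"])                 # served by the small agg table
total_amount(["DateKey", "CustomerKey"])  # needs detail -> hits the fact table
```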

 

So my answer would be a definitive "It depends." 

HamidBee
Impactful Individual

Hi @lbendlin,

Thank you for sharing your insights. I have some follow-up questions to gain a further understanding:

1. You mentioned Delta Lake; are you referring to the Lakehouse in Fabric or something else?

 

2. Is it safe to say that as long as the dimension tables in our star schema are small we can opt for Import mode? One exception comes to mind (this happened to me in a previous report I was building): I imported data and used Power Query to normalize it, which resulted in duplicating and referencing tables. I got away with it because the dataset was small, but I can't imagine what would have happened if I were working with 30GB+ of data. I guess it would have been fine if I had imported the table as a dimension table. I suppose this goes back to what you were saying about "it depends". A minimal sketch of the kind of normalization I mean is below.
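
Here is that sketch in pandas (table and column names are invented): factoring a flat imported table into a dimension table plus a slim fact table that references it.

```python
import pandas as pd

# Hypothetical flat table as it might arrive from an import.
flat = pd.DataFrame({
    "OrderID":     [1, 2, 3, 4],
    "ProductName": ["Bike", "Helmet", "Bike", "Gloves"],
    "ListPrice":   [500.0, 40.0, 500.0, 25.0],
    "Quantity":    [1, 2, 1, 3],
})

# Factor the repeating product attributes out into a dimension table...
dim_product = (flat[["ProductName", "ListPrice"]]
               .drop_duplicates()
               .reset_index(drop=True))
dim_product["ProductKey"] = dim_product.index

# ...and keep a slim fact table that references it by surrogate key.
fact_order = (flat
              .merge(dim_product, on=["ProductName", "ListPrice"])
              [["OrderID", "ProductKey", "Quantity"]])

print(dim_product)
print(fact_order)
```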

 

Thank you.

1. I am referring to the future data storage format:

Lakehouse and Delta tables - Microsoft Fabric | Microsoft Learn

2. There is a soft limit on the cardinality of the key column in table relationships. Anything above 50K will start to give you problems, both in composite data models and inside a semantic model. This is mostly caused by the key column values being enumerated verbatim, which can lead to extremely large query texts (not even talking about data here!) and severe performance degradation.
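
To see why verbatim enumeration hurts, here is a back-of-envelope sketch in plain Python (the key format and counts are hypothetical) that estimates how large the generated IN (...) list alone becomes as key cardinality grows:

```python
# Rough sketch of why enumerating key values verbatim inflates query text.
# The key format and counts below are illustrative assumptions only.

def in_list_size(n_keys, key_example="2024-01-01T00:00:00"):
    """Approximate size of a generated "WHERE key IN (...)" clause
    when every key value is spelled out literally."""
    per_key = len(f"'{key_example}',")  # quoted value plus separator
    return n_keys * per_key

for n in (50_000, 500_000, 1_300_000):
    size = in_list_size(n)
    print(f"{n:>9,} keys -> ~{size / 1_000_000:.1f} MB of query text")
```

Even tens of megabytes per query compounds quickly, since a report page typically fires a separate query for each visual a user interacts with.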

 

DirectQuery mode against an on-premises data source should only be used if that source is:

- too large (TB range)

- queried very infrequently and randomly

- fast-changing (note I say changing, not inserting)

 

In all other scenarios Import mode is preferred. Just keep in mind that one person's (let's call him "Mr. OmPrem") Import mode is another person's (let's call him "Mr. Wade") semantic model. The report users don't care about this distinction.

HamidBee
Impactful Individual

Thanks for sharing. Regarding what you mentioned here:


@lbendlin wrote:

2. There is a soft limit on the cardinality of the key column in table relationships. Anything above 50K will start to give you problems, both in composite data models and inside a semantic model. This is mostly caused by the key column values being enumerated verbatim, which can lead to extremely large query texts (not even talking about data here!) and severe performance degradation.


Please correct me if I'm wrong, but are you saying this soft limit would apply to both DirectQuery and Import mode? Furthermore, what would be the workaround when handling data with such high cardinality?

I will try to avoid DirectQuery moving forward unless I'm dealing with one of the three scenarios that you listed:


@lbendlin wrote:

DirectQuery mode against an on-premises data source should only be used if that source is:

- too large (TB range)

- queried very infrequently and randomly

- fast-changing (note I say changing, not inserting)


I noticed you put an emphasis on on-premises data. If the data source were stored in the cloud, would you make a distinction here, or would the same general rule apply?

Thanks in advance. 

are you saying this soft limit would apply to both DirectQuery and Import mode? Furthermore, what would be the workaround when handling data with such high cardinality?

 

The soft limit applies to the semantic model (which, by definition, is DirectQuery only).

 

There is no easy solution. Let's assume you have a DateTime field as the primary key: you could consider splitting that into two tables, one with dates and one with date fractions. That would still give you 86,400 rows in the second table (assuming second-level granularity), but you might be able to get away with it.
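
A minimal pandas sketch of that split (column names are hypothetical): the single second-granularity DateTime key becomes a Date key plus a second-of-day key, so no relationship column carries the full cardinality.

```python
import pandas as pd

# Hypothetical fact table keyed by a second-granularity DateTime:
# one year at one row per second would mean ~31.5M distinct key values.
fact = pd.DataFrame({
    "EventDateTime": pd.to_datetime([
        "2024-01-01 08:15:30", "2024-01-01 08:15:31", "2024-01-02 23:59:59",
    ]),
    "Value": [1.0, 2.0, 3.0],
})

# Split the single high-cardinality key into two low-cardinality keys.
fact["DateKey"] = fact["EventDateTime"].dt.normalize()        # <= 366/yr distinct
fact["SecondOfDay"] = (fact["EventDateTime"].dt.hour * 3600
                       + fact["EventDateTime"].dt.minute * 60
                       + fact["EventDateTime"].dt.second)      # <= 86,400 distinct

# Two small dimension tables replace one enormous DateTime dimension.
dim_date = pd.DataFrame({"DateKey": pd.date_range("2024-01-01", "2024-12-31")})
dim_time = pd.DataFrame({"SecondOfDay": range(86_400)})        # 24*60*60 rows

print(fact[["DateKey", "SecondOfDay", "Value"]])
print(len(dim_date), "date rows +", len(dim_time), "time rows")
```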

 

You may be able to find compromises, but there's no guarantee. I have had cases where I had to abandon a project entirely: a dimension with 1.3M distinct key values, added to a composite model, produced DirectQuery queries larger than 6GB (JUST THE QUERY TEXT). That was simply not sustainable.

 
