Re: Lakehouse and semantic model/report on separate capacities

frithjof_v · 03-06-2024

Hi,

I am wondering whether it is possible to have the Lakehouse and a custom Direct Lake semantic model (created by clicking the New semantic model button) in separate capacities.

I read in this blog that a Direct Lake semantic model can be created in a different workspace from the Lakehouse. Will it also work if the Direct Lake semantic model is created in a different capacity?

Let's say my Lakehouse is in Capacity A, and the Direct Lake semantic model is in Capacity B.

Suppose Capacity A has been overloaded and is in throttling or rejection mode. Can I have the Direct Lake semantic model (and/or the Power BI report) on Capacity B, and would that mean Power BI Direct Lake mode keeps working even while Capacity A (where the Lakehouse resides) is in throttling or rejection mode?

(What I am really trying to find is a good way to keep the end-user experience (interactive usage) from being affected by potential throttling/rejection caused by data engineering activities.)

Thank you!

Anonymous · 03-07-2024

Hi @frithjof_v ,

We have an update from the internal team:

A shortcut looks like a good option for this. See the Microsoft Learn article "OneLake capacity consumption example" (Microsoft Fabric). The OneLake transaction activity would count against the capacity that is being accessed. Without shortcuts, the OneLake transactions would always count against the owning capacity.

Today, a Direct Lake semantic model is tied to a Lakehouse or Warehouse, but shortcuts in the Lakehouse can be used to bring in data that lives in another workspace.

And yes, this means that the reports and Direct Lake models in a remote workspace can continue to work even if the capacity for the source lakehouse is paused or is in rejection mode.

Hope this is helpful. Please let me know in case of further queries.

frithjof_v · 03-07-2024

Thank you, @Anonymous, for a fast and enlightening answer!

I will consider using shortcuts.

However, another option I am wondering about, which would be easier to set up and maintain, is to create a Power BI report (not a semantic model) in a workspace on Capacity B and connect this report to the semantic model on Capacity A via a live connection.

Will Capacity B (where the report resides) or Capacity A (where the semantic model and Lakehouse reside) be regarded as the consuming capacity in this case with regard to throttling? (See this doc, which distinguishes between the terms producing capacity and consuming capacity.)

When a user reads the report, will the transactions be counted against Capacity B or Capacity A?

Another question: I am happily surprised to learn that shortcuts can access data from a paused capacity.

This documentation says that data in a paused capacity is inaccessible.

At the same time, this documentation says that a shortcut can access data from a paused capacity, which is great.

If I understand correctly, a shortcut created in Capacity B could access data which resides in Capacity A, even if Capacity A is paused.

Is it only shortcuts which have a special ability to access data from a paused capacity?

Or can other types of Fabric/Power BI workloads also access data from a paused capacity (without using a shortcut)?

This documentation says that if a OneLake artifact (I guess that means OneLake data) on Capacity A is being consumed by Capacity B, then only the throttling state of the consuming Capacity B matters. In my mind, that implies all the transactions when accessing OneLake data would be counted against the consuming capacity. This documentation doesn't distinguish between using and not using a shortcut.

However, this documentation more strongly implies that OneLake transactions are counted against the owning (aka producing) capacity when its data is accessed, unless I use shortcuts, in which case the OneLake transactions are counted against the capacity where the shortcut is created.

I guess what confuses me is that the documentation regarding throttling says that only the throttling state of the consuming capacity matters.

However, the documentation regarding OneLake transactions implies that OneLake transactions are counted against the owning (producing) capacity, unless using shortcuts.

Does this mean that, even if I can access the data on a throttled Capacity A from a non-throttled Capacity B, the OneLake transactions will be counted against Capacity A and add to the overload on Capacity A?

(I understand that shortcuts change the logic here: if I use shortcuts, the OneLake transactions are counted against the consuming capacity, which is nice.

But I am asking about the case where I am not using shortcuts, and I have a Power BI report in Capacity B (not throttled) while the semantic model and Lakehouse are in Capacity A (in throttling or rejection state).

Or similarly, if I have a notebook in Capacity B that queries data from Capacity A.)

Thank you!

Anonymous · 03-11-2024

Hi @frithjof_v ,

Response from the internal team:

Power BI reports consume no capacity (paginated reports are another story). Reports send queries to semantic models, and it's the semantic models that consume capacity. So the capacity associated with the semantic model will incur all the charges.

If your Lakehouse and semantic model are in different capacities, are you using the data in the Lakehouse in your semantic model? If so, you would need to create a shortcut in CapacityB to the Lakehouse in CapacityA for the semantic model to access the data.

Regarding your question about throttled capacities: ideally, if CapacityA is throttled but CapacityB is not, your semantic model should work fine, because transactions are billed to the calling capacity. However, we uncovered a bug on our side for this scenario and are triaging it at the moment. I will share an update once a known issue is posted.

"However, the documentation regarding OneLake transactions implies that OneLake transactions are counted against the owning (producing) capacity, unless using shortcuts." This is correct.
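To make the attribution rules in this reply concrete, here is a minimal Python sketch. It is a hypothetical model, not a Fabric API: the `Access` class and both functions are illustrative names, and the logic only restates what is said in this thread (reports are not charged, the semantic model's capacity is; OneLake transactions go to the owning capacity unless the read goes through a shortcut, in which case they go to the capacity where the shortcut was created). It covers neither compute charges nor throttling behavior, including the bug mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """A read of Lakehouse data (hypothetical model, not a Fabric API)."""

    owning_capacity: str   # capacity hosting the workspace that owns the data
    calling_capacity: str  # capacity hosting the item that issues the read
    via_shortcut: bool     # True if the read goes through a OneLake shortcut
                           # assumed to be created on the calling side


def onelake_billed_capacity(access: Access) -> str:
    """Capacity charged for OneLake transactions, as described in this thread."""
    if access.via_shortcut:
        # Shortcut: charged to the capacity where the shortcut was created
        # (assumed here to be the calling capacity).
        return access.calling_capacity
    # No shortcut: charged to the owning (producing) capacity.
    return access.owning_capacity


def report_query_billed_capacity(report_capacity: str, model_capacity: str) -> str:
    """Capacity charged when a (non-paginated) Power BI report is viewed."""
    # Reports consume no capacity themselves, so report_capacity is ignored;
    # the semantic model that answers the report's queries is charged.
    return model_capacity


# Live-connected report on B, semantic model and Lakehouse on A:
print(report_query_billed_capacity("B", "A"))             # A
# Notebook on B reading Lakehouse data on A without a shortcut:
print(onelake_billed_capacity(Access("A", "B", False)))  # A
# The same read through a shortcut created on B:
print(onelake_billed_capacity(Access("A", "B", True)))   # B
```

In other words, under these rules the only way to move OneLake transaction charges off Capacity A is a shortcut on Capacity B.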

Hope this is helpful. Please let me know in case of further queries.

frithjof_v · 03-11-2024

Thank you @Anonymous,

This is very clarifying 😃

Now I know that the consumption from reading a Power BI report is counted against the semantic model, which is valuable knowledge for me 😃 (I was hoping it would be counted against the report, because then I could deploy the report to multiple capacities to isolate the load for different user groups, but that isn't important now.)

I'm still struggling to understand something which seems like a contradiction to me, but I'm probably missing something:

"Further, to your question regarding throttled capacities, ideally, if CapacityA is throttled but CapacityB is not, as transactions are billed to the calling capacity, your semantic model should work fine. However, we did uncover a bug on our side for this and are triaging it atm. I will share once a known issue is posted for this.

However, the documentation regarding OneLake transactions implies that OneLake transactions are counted against the owning (producing) capacity, unless using shortcuts. - this is correct"

I'm confused: are transactions billed (counted) against the calling or the owning capacity? Is there a distinction here between OneLake transactions and compute transactions, such that OneLake transactions are always counted against the owning capacity and compute transactions are always counted against the calling capacity?

(Assume I'm not using shortcuts; if I were, I understand that OneLake transactions would be counted against the capacity where the shortcut was created.)

If my capacity A is throttled and capacity B is not throttled:

Will calls from capacity B to data owned by capacity A add OneLake transactions to capacity A and thus further increase the overload on capacity A?
Will the response time for the data to reach capacity B be slower because A is throttling?

(Assume I'm not using shortcuts in this case)

I am thinking my Fabric capacity may be vulnerable: if some developer inadvertently starts consuming too many CUs, the experience can become slow for everyone once consumption exceeds what smoothing can absorb. So I am considering isolating different user groups by assigning them different capacities.

I don't think it's possible to specify, for example, that 20% of a single capacity should be reserved for Power BI interactive use while other workloads can only use the remaining 80%. If I understand correctly, to separate them I need to create multiple capacities.

All the users share the same pool on a single capacity, so I am wondering whether creating multiple capacities would be a relevant strategy, and I am interested in how the interaction between capacities works.

Thank you for your help and the great information you provide! 😃

Anonymous · 03-13-2024

Hello @frithjof_v ,

At this time we are waiting for a response from the internal team.
We will update you once we hear back from them.
Appreciate your patience.

Anonymous · 03-26-2024

Hi @frithjof_v ,

Apologies for the delayed response from our end.
Response from the internal team:

I don't believe this is possible as asked. However, you could do your data engineering in Workspace A, attached to Lakehouse A, then create shortcuts in Lakehouse B (in Workspace B) that point to Lakehouse A, and host the semantic model in Workspace B. Using different workspaces allows different capacities to be used.

Hope this is helpful.
Thank you

Anonymous · 03-27-2024

Hello @frithjof_v

We haven’t heard back from you since our last response and wanted to check whether it answered your query.
If not, please respond with more details and we will try to help.

Anonymous · 03-29-2024

Hi @frithjof_v ,

We haven’t heard back from you since our last response and wanted to check whether it answered your query.
If not, please respond with more details and we will try to help.
