
Enable OneLake Availability on the Follower Eventhouse

I think enabling OneLake availability directly from a follower Eventhouse would be a valuable feature. This idea builds on another proposal I shared earlier that currently appears to be stuck in “Needs clarification.”

 

In our architecture, we treat the primary Eventhouse strictly as ingestion-only, while the follower Eventhouse is used exclusively for downstream consumption.

 

Ideally, business users should have access only to the follower Eventhouse, never the primary. However, today when we enable OneLake availability and create a shortcut to a Lakehouse, users must be granted access to the primary Eventhouse in order to see the table. This breaks our intended segregation model and introduces a risk, since business users can now directly query the primary Eventhouse—something we explicitly want to avoid.

 

While the follower Eventhouse is read-only, it is already the surface area exposed to downstream consumers. Allowing OneLake availability to be enabled from the follower Eventhouse would preserve this separation of concerns and eliminate the need to grant any access to the primary.

 

This would also empower downstream users to enable OneLake availability and create their own shortcuts, rather than requiring the development team to configure it on the primary Eventhouse and manage narrowly scoped access to individual primary tables.

Overall, this change would better support secure, ingestion-vs-consumption–aligned architectures and reduce unnecessary operational overhead.

Status: Needs Votes

We have not received many requests for OneLake availability on the follower; in most cases, users prefer the main location. I will gather votes and feedback before deciding how to proceed, and I will add this to my investigation area list.

Thanks 

Comments
YSD
Microsoft Employee
Status changed to: New

Thanks for your input!

Tzvia
Microsoft Employee
Status changed to: Need Clarification

I want to make sure I understand your setup correctly.

Are you creating the Follower database on Eventhouse?
If yes, then what is the additional need for view‑only access to OneLake? Normally, the Follower already provides read‑only access to the data of the Eventhouse, so it’s not clear why OneLake access is also required.

matthewlopesdev
Regular Visitor

Hey @Tzvia, yes, I am creating the follower database, and for real-time use cases the business partners use the follower Eventhouse. But when they want to blend that data with batch data for batch use cases, we enable OneLake availability and shortcut the table to a Lakehouse for them to consume from there.

 

When we enable OneLake availability, we have to enable it on the primary Eventhouse table. For the business partners to see the shortcut, they must have access to the primary Eventhouse table. That goes against our design: we want the primary to be used only for ingestion, not downstream consumption, so we do not want business partners seeing the primary Eventhouse at all.

 

The follower does provide read-only access within an Eventhouse, but OneLake availability lets us shortcut the data to a Lakehouse. For batch use cases like notebooks, we do not want the business partners reading from the Eventhouse, because that is very expensive when we can just shortcut to a Lakehouse using OneLake availability.
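
To illustrate the batch path, here is a minimal sketch of how a notebook might read the shortcut table from the Lakehouse instead of querying the Eventhouse. The workspace, Lakehouse, and table names are hypothetical, and the path layout assumes a Lakehouse without schemas (a schema-enabled Lakehouse would have an extra folder such as dbo under Tables). It also assumes the data is exposed in Delta format, which is my understanding of how OneLake availability publishes tables.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_table_path(workspace: str, lakehouse: str, table: str) -> str:
    """Build the ABFS path of a Lakehouse table (or table shortcut) in OneLake.

    Assumes a Lakehouse without schemas; all names passed in are placeholders.
    """
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Tables/{table}"


def read_shortcut_table(spark, path: str):
    """Read the shortcut table as Delta, so no query is sent to the Eventhouse.

    `spark` is the SparkSession that a Fabric notebook provides.
    """
    return spark.read.format("delta").load(path)


# Hypothetical names, for illustration only.
path = onelake_table_path("ConsumptionWorkspace", "BatchLakehouse", "DeviceEvents")
print(path)
# In a notebook: df = read_shortcut_table(spark, path)
```

Because the notebook reads the Delta files directly, the read should not consume Eventhouse compute, which is the whole point of the shortcut for batch workloads.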

 

Does this clear up any questions you have?

matthewlopesdev
Regular Visitor

@Tzvia To provide more context:

 

When you read from an Eventhouse into a notebook, it can cause a large memory spike because the read runs a .export to blob storage. So, for any notebook reading Eventhouse data, we want to enable OneLake availability and shortcut the table to the Lakehouse, so that the notebook reads from the Lakehouse instead of the Eventhouse. The export to blob requires a lot of memory and in turn scales the Eventhouse up, consuming more CUs than necessary. I worked with the PG on this previously, and the information I was given is as follows:

 

There are three settings for the "readMode" option in the Spark connector:

 

ForceSingleMode: Queries are sent directly to the engine for evaluation, and the results are returned to Spark.

 

ForceDistributedMode: Performs a .export command that exports the results to a Parquet file, and then reads the Parquet file back into a Spark DataFrame.

 

Default: The connector first tries to estimate the row count of the query. If that estimate times out, or the query itself times out, it assumes the query will hit query limits and falls back to ForceDistributedMode. This is what the Spark connector does when no readMode is specified.

 

If we want to take control and tell Spark not to use distributed mode, we can set .option("readMode","ForceSingleMode"), but the read will then fail if the query times out or hits query limits.
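
For reference, a minimal sketch of how a consumer would have to set this in a notebook. The format string and the option names other than readMode (kustoCluster, kustoDatabase, kustoQuery, accessToken) are what I understand the Kusto Spark connector in Fabric to use, so please verify them against the connector documentation. The helper functions themselves are hypothetical and only for illustration.

```python
from typing import Dict, Optional

# Format name of the Kusto Spark connector as used in Fabric/Synapse notebooks
# (assumption: check against the connector docs for your runtime).
KUSTO_FORMAT = "com.microsoft.kusto.spark.synapse.datasource"

# The two explicit modes described above; omitting readMode gives the default.
VALID_READ_MODES = {"ForceSingleMode", "ForceDistributedMode"}


def kusto_read_options(cluster_uri: str, database: str, query: str,
                       access_token: str,
                       read_mode: Optional[str] = None) -> Dict[str, str]:
    """Collect reader options; readMode is added only when explicitly requested."""
    if read_mode is not None and read_mode not in VALID_READ_MODES:
        raise ValueError(f"Unknown readMode: {read_mode}")
    options = {
        "kustoCluster": cluster_uri,
        "kustoDatabase": database,
        "kustoQuery": query,
        "accessToken": access_token,
    }
    if read_mode is not None:
        options["readMode"] = read_mode
    return options


def read_eventhouse(spark, options: Dict[str, str]):
    """Apply the options to a Spark reader and load the query result."""
    reader = spark.read.format(KUSTO_FORMAT)
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load()


# Placeholder values; a real notebook would supply its own URI, query, and token.
opts = kusto_read_options("https://<cluster-uri>", "FollowerDb",
                          "Events | take 100", "<token>",
                          read_mode="ForceSingleMode")
# In a notebook: df = read_eventhouse(spark, opts)
```

Every consumer would need to remember to pass read_mode="ForceSingleMode" on each read, and the read still fails if the query times out or hits query limits.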

 

Also, we cannot force downstream consumers to always hard-code this option in order to avoid the export to blob that causes the memory spike.

Tzvia
Microsoft Employee
Status changed to: Needs Votes

We have not received many requests for OneLake availability on the follower; in most cases, users prefer the main location. I will gather votes and feedback before deciding how to proceed, and I will add this to my investigation area list.

Thanks 

matthewlopesdev
Regular Visitor
@Tzvia, that is completely understandable, thank you!