Automated export for Eventhouse data to Lakehouse without primary table access

Revision:

Allow OneLake Availability to be enabled on the follower database, so that business users can turn it on there and consumers inherit permissions from the follower rather than from the primary.

 

 

Problem:

Currently, downstream consumers who need to query Eventhouse data via Lakehouse shortcuts must have read access to the primary Eventhouse tables. This creates two main challenges:

  1. Security / Governance: Users can access the primary Eventhouse, which may not be desirable.
  2. Operational Overhead: Teams must schedule incremental .export commands for each table to provide Lakehouse access, which is cumbersome and difficult to manage at scale.
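To illustrate the per-table overhead described in point 2, here is a minimal sketch of what a scheduler has to generate on every run. It assumes one external table per source table (the `_ext` suffix is a hypothetical naming convention), a stored per-table watermark, simple table names that need no quoting, and that the ingestion-time policy is enabled so `ingestion_time()` is available. The helper names `build_export_command` and `plan_exports` are made up for this example and are not part of any SDK.

```python
from datetime import datetime, timezone


def build_export_command(table: str, external_table: str,
                         since: datetime, until: datetime) -> str:
    """Build a KQL `.export` command for rows ingested in [since, until).

    Both datetimes are expected to be timezone-aware.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    lo = since.astimezone(timezone.utc).strftime(fmt)
    hi = until.astimezone(timezone.utc).strftime(fmt)
    return (
        f".export async to table {external_table} <| "
        f"{table} "
        f"| where ingestion_time() >= datetime({lo}) "
        f"and ingestion_time() < datetime({hi})"
    )


def plan_exports(watermarks: dict[str, datetime], now: datetime) -> list[str]:
    """Return one export command per table, from its last watermark up to `now`.

    `watermarks` maps a table name to the upper bound of its previous export.
    The `<table>_ext` external table name is an assumption for this sketch.
    """
    return [
        build_export_command(table, f"{table}_ext", since, now)
        for table, since in sorted(watermarks.items())
    ]
```

Every table adds one more command to run, one more watermark to persist, and one more external table to maintain, which is the management burden this idea aims to remove.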

 

Proposed Solution:

Introduce a feature that allows Eventhouse to automatically export data to Lakehouse shortcuts without requiring users to have read access to the primary tables.

Benefits:

  • Downstream consumers can query Lakehouse shortcuts without impacting Eventhouse compute or hitting query limits.
  • Eliminates the need to schedule per-table exports manually.
  • Maintains strict security and governance by preventing direct access to the primary Eventhouse tables.
  • Supports both full and incremental exports, keeping Lakehouse data fresh and up-to-date.
Status: Need Clarification

Thanks for the input.

 

Trying to understand the scenario better - what would be the reason for doing this?
Eventhouse data and OneLake Availability are considered one logical copy with the same access level and retention. You can query the data in OneLake without using the Eventhouse compute for historical analysis, while querying the Eventhouse directly for time-series/real-time analytics.

If you would like to manage the data in the lake separately, then copying it outside Eventhouse might be a better way to think about it.

Comments
YSD
Microsoft Employee

Thanks for your input!

anshulsharma
Microsoft Employee
Status changed to: Need Clarification


matthewlopesdev
Regular Visitor

@anshulsharma Thanks for the follow-up, I am happy to clarify! In my scenario, I intentionally segregate ingestion from downstream consumption. The primary Eventhouse is treated strictly as a protected ingestion surface. No analytical or ad-hoc queries are permitted there. All downstream consumption is expected to occur on follower databases or via a Lakehouse shortcut.

 

The challenge with OneLake Availability today is that consumers of a Lakehouse shortcut must still be granted read access to the corresponding tables in the primary Eventhouse. Even if users are instructed to only query via the Lakehouse, this effectively gives them the ability to query the primary Eventhouse, which violates our governance model.

 

Ad-hoc or poorly optimized queries against the primary Eventhouse can:

• Consume significant memory

• Trigger unnecessary scale-out

• Introduce ingestion instability or operational risk

 

While OneLake Availability is logically the same data, the access boundary still matters operationally.

 

To clarify, I’m not asking for a traditional export or data duplication. I want Eventhouse to keep managing the data just as OneLake Availability does today. What I am looking for is a OneLake Availability pattern where:

• Lakehouse consumers do not require read access to the primary Eventhouse tables

• The primary Eventhouse remains a strictly protected ingestion layer

 

Maybe this could be achieved by allowing OneLake Availability to be enabled on the follower DB, so the downstream consumer doesn't need access to the primary. They would only need access to the follower, which they will already have.

 

This would enable a clean producer/consumer separation while still leveraging the Eventhouse table as the single logical copy without duplicating the data. Does this better explain the governance and operational concern I am trying to solve?