transposed
New Member

AWS S3 Shortcut to Fabric Lakehouse: Massive Compute Spike

Hi there, we've been given an S3 bucket with data we need for building reports. Rather than replicating the data, we chose to use Fabric shortcuts to populate a lakehouse and serve the reports from that lakehouse.

 

The shortcut works, but it somehow uses massive amounts of compute: in one instance (not even over a whole day) it used more than 330,000% (no typo) of our F8 capacity.
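For a sense of scale, here is a rough conversion of that figure into capacity-unit seconds. It assumes an F8 SKU provides 8 CUs and that the percentage is measured against a single 30-second timepoint, the way the Fabric Capacity Metrics app reports utilization; both are assumptions about how the number was read, not something stated in the post.

```python
def cu_seconds_from_utilization(sku_cus: int, utilization_pct: float,
                                window_s: int = 30) -> float:
    """Convert a utilization percentage over one window into CU-seconds.

    Assumption: utilization is reported against the CU-seconds the SKU
    provides in a single window (30 s by default).
    """
    capacity_per_window = sku_cus * window_s  # CU-seconds available per window
    return capacity_per_window * utilization_pct / 100


used = cu_seconds_from_utilization(8, 330_000)
print(used)              # 792000.0 CU-seconds
print(used / 8 / 3600)   # 27.5 -> hours of the full F8 capacity
```

If the figure really was for one timepoint, it corresponds to roughly 27.5 hours of the entire F8 capacity's output, which is far beyond what scanning ~10 GB once should cost and points towards many repeated reads.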

 

The S3 source isn't large at all, probably around 10 GB, and we've already switched caching on, set to weekly. So we're really not sure why there's such a spike. Going 300% over I could understand, and we'd probably just need a higher SKU.

 

Any help appreciated. Thanks in advance! 

1 ACCEPTED SOLUTION
Tamanchu
Impactful Individual

Hi @transposed,

Interesting scenario. Honestly, I don’t think this is necessarily a Fabric bug. It looks more like a mismatch between the shortcut usage pattern and a BI serving workload.

A OneLake shortcut to AWS S3 is essentially a virtual reference to external storage, not a fully optimized native Delta table physically stored inside OneLake.

So even if the source is only ~10 GB, the compute impact can still become very large depending on:

  • The report design
  • The number of visuals
  • Concurrent users
  • Refresh frequency
  • Query parallelism
  • The file structure in S3

Power BI reports can generate many simultaneous queries behind the scenes, and with external shortcuts those queries are usually much less optimized than queries against native Delta tables stored directly in Fabric.

That’s why you can sometimes see surprisingly large CU spikes even on relatively small datasets.
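A back-of-envelope sketch of why query volume, not data size, can dominate. The model and the numbers are hypothetical, and it assumes the worst case where every visual on a page issues its own query on every interaction, with nothing served from a result cache.

```python
def estimated_queries(visuals_per_page: int, concurrent_users: int,
                      interactions_per_user: int) -> int:
    """Worst-case count of visual queries over a reporting window (hypothetical model)."""
    # Assumption: each interaction (page load, slicer change, cross-filter)
    # re-queries every visual on the page and nothing comes from cache.
    return visuals_per_page * concurrent_users * interactions_per_user


# Hypothetical: 12 visuals, 25 users, 40 interactions each over a morning.
print(estimated_queries(12, 25, 40))  # 12000
```

The point is only that the query count grows multiplicatively with visuals, users, and interactions while the source stays ~10 GB; if each of those queries has to read through the shortcut, the cost scales with the query count rather than the data size.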

Caching helps, but a weekly cache alone usually doesn’t protect you from repeated interactive report queries between refresh windows.

What I would personally recommend as the long-term architecture:

  • Use the S3 shortcut only as the ingestion/landing layer
  • Materialize the data into native Delta tables inside the Lakehouse (Notebook, Copy Job, Dataflow Gen2, etc.)
  • Point Power BI to the Delta tables instead of directly to the shortcut-backed data
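A minimal PySpark sketch of the materialization step, assuming it runs in a Fabric notebook (where `spark` is already defined) and that the shortcut was created under `Files/` and exposes Parquet files. The function name, the shortcut path `Files/s3_sales/`, and the table name `sales` are hypothetical placeholders.

```python
def materialize_shortcut(spark, shortcut_path: str, table_name: str) -> None:
    """Copy Parquet data exposed through a OneLake shortcut into a managed Delta table.

    `spark` is the notebook's SparkSession; `shortcut_path` and `table_name`
    are hypothetical placeholders for your own shortcut and table.
    """
    df = spark.read.parquet(shortcut_path)  # one read through the S3 shortcut
    (
        df.write.format("delta")
        .mode("overwrite")        # simple full reload on each run
        .saveAsTable(table_name)  # managed Delta table in the Lakehouse
    )


# In a Fabric notebook (hypothetical shortcut "s3_sales"):
# materialize_shortcut(spark, "Files/s3_sales/", "sales")
```

Scheduled at whatever cadence the S3 data actually changes, this moves the external read to once per load instead of once per report query, and the Power BI model then points at the `sales` Delta table. A full `overwrite` is the simplest option; a larger or append-only source might call for an incremental load instead.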

Once the data is materialized in Delta format inside Fabric, you benefit from:

  • V-Order optimization
  • Better column pruning
  • Local OneLake optimization
  • Reduced external reads
  • Much lower CU consumption overall

If the shortcut must remain the serving layer for business reasons, I would at least try:

  • More frequent caching
  • Reducing the number of visuals per page
  • Avoiding highly interactive Direct Lake patterns directly on shortcut-backed data
  • Validating whether the S3 files are properly partitioned Parquet files
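For the last point, a small stdlib sketch that summarizes an S3 object listing: how many objects are Parquet, how many of those are small, and how many sit under Hive-style `key=value` partition folders. It takes a list of `(key, size_in_bytes)` pairs, which you could build from the `Key` and `Size` fields returned by boto3's `list_objects_v2` paginator; the function name and the 8 MB "small file" threshold are illustrative assumptions, not Fabric guidance.

```python
def summarize_layout(objects: list[tuple[str, int]],
                     small_file_bytes: int = 8 * 1024 * 1024) -> dict:
    """Summarize (key, size_in_bytes) pairs listed from an S3 prefix."""
    parquet = [(k, s) for k, s in objects if k.endswith(".parquet")]
    # Files whose size falls under the (assumed) small-file threshold
    small = sum(1 for _, s in parquet if s < small_file_bytes)
    # Hive-style partition folders look like .../year=2024/month=05/part-0.parquet
    partitioned = sum(
        1 for k, _ in parquet if any("=" in part for part in k.split("/")[:-1])
    )
    total = sum(s for _, s in parquet)
    return {
        "parquet_files": len(parquet),
        "non_parquet_files": len(objects) - len(parquet),
        "small_files": small,
        "hive_partitioned_files": partitioned,
        "avg_mb": total / len(parquet) / (1024 * 1024) if parquet else 0.0,
    }


listing = [
    ("sales/year=2024/part-0.parquet", 512_000),
    ("sales/year=2024/part-1.parquet", 200_000_000),
    ("sales/_SUCCESS", 0),
]
print(summarize_layout(listing))
```

A prefix dominated by many small files or non-Parquet objects would be worth fixing at the source, since it can multiply the number of external reads each query over the shortcut needs.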

So my feeling is that the spike itself is probably the symptom of repeated external query execution rather than raw data size alone.

 

Hope this helps you narrow it down 🙂
If you find the exact root cause or which optimization helped the most, feel free to share it here; this is a really useful scenario for the community.


transposed
New Member

Thank you Tamanchu for the detailed response! Appreciate it!

 

Your response does help us narrow down the alternatives we should pursue. We probably won't be able to pinpoint the cause of that spike, since I don't think we'd find a partner who could trace that operation to diagnose what happened, so we'll most likely just work around the issue.

 

Thanks again!
