visheshjain
Impactful Individual

Fetching AWS Athena data into Fabric

Hello everyone,

 

I have all my SQL tables in AWS Athena.

 

Is there some way we can get all those tables into Fabric?

 

I am looking to mirror/shortcut the Athena data into Fabric.

 

Thank you,

Vishesh Jain

Did I answer your question?
If yes, then please mark my post as a solution!

Thank you,
Vishesh Jain

Proud to be a Super User!



14 REPLIES
visheshjain
Impactful Individual

Hello,

 

I was not able to get the data from Athena directly into Fabric; however, I was able to load the data from S3 into a Lakehouse and create a Lakehouse table from it using the SQL endpoint.

 

I created a report from this Lakehouse table and published it to the service, but here is the issue.

 

I added new data to S3, and when I refresh the semantic model in the service, the new data does not show up.

 

Could anyone please help me with this?

 

Thank you,

Did I answer your question?
If yes, then please mark my post as a solution!

Thank you,
Vishesh Jain

Proud to be a Super User!



Hi @visheshjain,

Thanks for the detailed follow-up. The issue you are facing is likely due to how the data is being loaded from S3 into the Lakehouse. Fabric doesn't automatically sync new files added to S3 unless you've built a process, such as a Dataflow Gen2, Notebook, or Pipeline, to handle that refresh regularly.

You can set up a recurring Dataflow Gen2 or Notebook in Fabric that reads from your S3 path and updates the Lakehouse table, which makes sure the new S3 data is picked up before the semantic model refresh runs. If you're using COPY INTO or reading Parquet/CSV files directly from S3, you'll need to rerun that logic whenever new files are added; see the sketch below.
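For illustration, here's a minimal notebook sketch of that load step. It assumes the S3 data is Parquet and is exposed through an S3 shortcut named "sales_s3" under the Lakehouse Files section (the shortcut name and the table name "sales" are placeholders for your setup):

# Minimal sketch, run inside a Fabric notebook attached to the Lakehouse.
# "sales_s3" (an S3 shortcut under Files/) and the table name "sales"
# are hypothetical; adjust the path, format, and table name to your setup.

# `spark` is predefined in Fabric notebooks.
df = spark.read.parquet("Files/sales_s3/")

# Overwrite the Lakehouse table so both old and newly added S3 files are included.
df.write.format("delta").mode("overwrite").saveAsTable("sales")

Because the read re-lists the shortcut folder on every run, rerunning or scheduling this notebook picks up any files added to S3 since the last load.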

Once the Lakehouse table is refreshed with the latest data from S3, the semantic model refresh in the Power BI Service will pick up the changes correctly.

 

Best Regards,

Hammad.

Hello @v-mdharahman,

 

Could you please help me out with how to design the solution?

 

I want to design the solution in such a way that after new data is loaded into S3, when the user refreshes the dataset in the Power BI service, all the data, new and old, is fetched from S3 and the report is updated.

 

Thank you,

Did I answer your question?
If yes, then please mark my post as a solution!

Thank you,
Vishesh Jain

Proud to be a Super User!



Hi @visheshjain,

To automatically reflect the latest S3 data whenever the dataset is refreshed in the Power BI Service, you'll need to set up a Fabric pipeline or scheduled process that refreshes the Lakehouse table before the semantic model refresh runs.

If you use a Fabric pipeline to orchestrate the flow, first create a Notebook or Dataflow Gen2 with logic to load all data from your S3 folder into a Lakehouse table. In a notebook you can read the files with spark.read.parquet() or spark.read.csv() and write them out with saveAsTable() or write.format("delta").save(), similar to the sketch in my earlier reply; in a Warehouse, COPY INTO serves the same purpose.

Now create a Fabric pipeline with two activities: the first runs the Notebook/Dataflow that loads data from S3 into the Lakehouse, and the second refreshes the semantic model (dataset) using the semantic model refresh activity. This ensures the Lakehouse is updated with fresh S3 data before the report's semantic model is refreshed.

Lastly, schedule the pipeline or trigger it with external tools. You can schedule it to run at regular intervals, or use APIs/automation to run it when new files are added to S3 (e.g., from AWS Lambda if you want near-real-time sync).
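As a rough illustration of the Lambda route: since triggering the Fabric pipeline itself from outside may not be possible, one pattern is an S3-triggered Lambda that instead calls the long-standing Power BI REST API to request a semantic model refresh (the Lakehouse load would still run on the pipeline's own schedule). The workspace/dataset IDs and the token acquisition below are placeholders, not a definitive implementation:

# Hedged sketch of an AWS Lambda handler fired by an S3 "object created" event.
# It requests a semantic model refresh via the Power BI REST API.
# GROUP_ID, DATASET_ID, and get_aad_token() are placeholders for your setup.
import json
import urllib.request

GROUP_ID = "<workspace-guid>"   # placeholder
DATASET_ID = "<dataset-guid>"   # placeholder

def get_aad_token():
    # Placeholder: acquire an Azure AD token for a service principal
    # that has been granted access to the workspace (e.g., via MSAL).
    raise NotImplementedError

def lambda_handler(event, context):
    url = (f"https://api.powerbi.com/v1.0/myorg/groups/{GROUP_ID}"
           f"/datasets/{DATASET_ID}/refreshes")
    req = urllib.request.Request(
        url,
        data=json.dumps({"notifyOption": "NoNotification"}).encode(),
        headers={"Authorization": f"Bearer {get_aad_token()}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # 202 Accepted on success
        return {"statusCode": resp.status}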

 

If I've misunderstood your needs or you still have problems, please feel free to let us know.

Best Regards,
Hammad.

Hi @v-mdharahman,

 

Notebooks do not seem to be a viable option, as I do not know how to trigger them when a refresh is initiated from the user's side, and running so many notebooks will only create unnecessary load on the capacity.

 

Can you provide more information? I doubt that a Lambda in AWS can trigger a dataflow/pipeline in Fabric.

 

Thank you for responding!

Did I answer your question?
If yes, then please mark my post as a solution!

Thank you,
Vishesh Jain

Proud to be a Super User!



Hi @visheshjain,

You're absolutely right: triggering a Fabric Notebook for every semantic model refresh could put unnecessary load on your capacity, and currently there's no native way to trigger a Dataflow Gen2 or Pipeline directly when a user clicks "Refresh" on a Power BI dataset.

Also, as you pointed out, AWS Lambda cannot directly trigger Fabric Pipelines, since there's no public webhook or API for Fabric pipelines today. So instead of writing S3 data into a Lakehouse table and creating a semantic model on top of that table, you can connect your semantic model directly to the files in S3 using a Dataflow Gen2 that reads from S3 and lands data in a Lakehouse folder.

Then, you build your semantic model on top of that folder-based Lakehouse table.

 

Best Regards,

Hammad

Hello @v-mdharahman,

 

Yes I could do that but here is a problem with that.

 

I have created joins in Athena with other tables, and if I bring the data in directly from S3 I will have to perform those joins in Power Query, which everyone knows is an expensive operation for a table of 6 million rows and counting, and that without incremental refresh working, since Power Query cannot fold the query.

 

If you have any other solution/suggestion I am all ears.

 

Thank you,

Did I answer your question?
If yes, then please mark my post as a solution!

Thank you,
Vishesh Jain

Proud to be a Super User!



v-mdharahman
Community Support

Hi @visheshjain,

Thanks for reaching out to the Microsoft fabric community forum.

At the moment, Microsoft Fabric doesn’t provide a native connector to directly pull tables from AWS Athena into a Lakehouse or Warehouse. While Power BI Desktop does have an Athena connector that lets you import data for reporting, this connector isn’t currently available in Fabric Dataflows or Pipelines.

However, there’s a supported workaround using Amazon S3. You can export your Athena query results to S3 (in formats like CSV or Parquet), and then use the Amazon S3 connector in Fabric to bring that data into your Lakehouse.
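For reference, here's a hedged sketch of that export step from the AWS side, using boto3 and Athena's UNLOAD statement to write Parquet to S3 (the bucket, database, and query below are placeholders for your environment):

# Hedged sketch: export an Athena query result to S3 as Parquet using boto3.
# Bucket names, database, region, and query are placeholders.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

resp = athena.start_query_execution(
    QueryString=(
        "UNLOAD (SELECT * FROM my_table) "
        "TO 's3://my-bucket/fabric-export/my_table/' "
        "WITH (format = 'PARQUET')"
    ),
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)

# Poll until the query finishes before pointing Fabric at the exported files.
while True:
    state = athena.get_query_execution(
        QueryExecutionId=resp["QueryExecutionId"]
    )["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

The s3://my-bucket/fabric-export/my_table/ folder can then be read into the Lakehouse with the Amazon S3 connector or an S3 shortcut.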

 

If a direct Athena connector is important for your use case, as already recommended by @lbendlin, try submitting an idea in the Ideas Forum so the product team can prioritize it.

 

I would also like to take a moment to thank @suparnababu8 and @lbendlin for actively participating in the forum and for the solutions you've been sharing. Your contributions make a real difference.

 

If I've misunderstood your needs or you still have problems, please feel free to let us know.

Best Regards,
Hammad.
Community Support Team

Hi @visheshjain,

As we haven't heard back from you, I'm just following up on our previous message. I'd like to confirm whether you've successfully resolved this issue or need further help. If yes, you are welcome to share your workaround and mark it as a solution so that other users can benefit as well. If you find a reply particularly helpful, you can also mark it as a solution.

And if you're still looking for guidance, feel free to give us an update, we’re here for you.

 

Best Regards,

Hammad.

Hi @visheshjain,
Hope everything's going smoothly on your end. We haven't heard back from you, so I wanted to check whether the issue got sorted.
Still stuck? No worries, just drop us a message and we can jump back in on the issue.

 

Best Regards,

Hammad.

Hi @v-mdharahman 

 

Please read my previous response!

 

Thank you,

Did I answer your question?
If yes, then please mark my post as a solution!

Thank you,
Vishesh Jain

Proud to be a Super User!



suparnababu8
Super User

Hi @visheshjain 

 

There is no direct connector available in Fabric to pull tables from AWS Athena as of now. Currently, the only supported AWS connector is the S3 bucket connector.

 

If you want an Athena connector, please submit your idea here: Fabric Ideas - Microsoft Fabric Community

 

Thank you!

 

Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

lbendlin
Super User

There is a standard AWS Athena connector in Power BI.  Are you looking for something else (shortcuts or mirroring)?

Hi @lbendlin,

 

Yes, I am looking to mirror the tables. The data is in S3, and the tables have been created on it in Athena.

 

Do you know if having the data in Fabric will reduce refresh times or not?

Currently the gateway takes about 45-50 mins for a 6 million row dataset.

 

Thank you!

Vishesh Jain

Did I answer your question?
If yes, then please mark my post as a solution!

Thank you,
Vishesh Jain

Proud to be a Super User!


