Good day,
I need help with a reporting strategy for internal users of our product service data. We have two AWS regions, and I was asked to stay as AWS-native as possible in terms of tools.
So my issue is that I can't just set up a small reporting database in one region with ETL (Glue etc.); it would need to be replicated to the other region.
Thinking out loud here:
1. Could I use the ETL to copy data to Power BI or Azure as a final dataset, meaning only the ETL resides in both regions/AWS accounts?
2. But this would not be an AWS solution, and I think Azure is ruled out.
3. Could I push it to Power BI? But then there is the security concern of data residing in the Power BI service.
I just don't know how I would set up an environment where I can offer a self-service platform to my internal clients without duplicating data to multiple regions, meaning the data management will become complex.
Does anybody have ideas I can explore? I thought of pushing a case for the data to reside in Azure, since it partners best with Power BI and we wouldn't need an EC2 gateway or the AWS VPC security issues. But I have a strong feeling Azure will be a no-go, as AWS is the future plan for the org, with Azure kept only for IT management.
But could I build a case around the data being under 1 TB, or is there an AWS architecture + Power BI solution?
I will be managing and developing the space for champion users to do self-service; going from AWS to Azure is the blocker.
Please help, any ideas?
Regards
Hey @icassiem,
Looking at your multi-region AWS setup with Power BI requirements, here are some practical approaches to consider:
Option 1: AWS-Native with Power BI Gateway
Set up your reporting database and ETL in your primary AWS region, then use Power BI's on-premises data gateway (it can run on an EC2 instance) to connect.
For the secondary region, replicate only critical data using AWS DMS or cross-region replication. This keeps most infrastructure AWS-native while enabling Power BI connectivity.
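As a rough illustration of the DMS piece, here is a minimal boto3 sketch. It assumes the replication instance and both endpoints already exist; every ARN and identifier below is a hypothetical placeholder, not something from your environment:

```python
# Minimal boto3 sketch for Option 1's cross-region copy via AWS DMS.
# Assumes the replication instance and source/target endpoints already
# exist; all ARNs and names below are hypothetical placeholders.
import json
import boto3

dms = boto3.client("dms", region_name="eu-west-1")

# Replicate only the critical reporting tables (here: a "reporting" schema).
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "reporting-tables",
        "object-locator": {"schema-name": "reporting", "table-name": "%"},
        "rule-action": "include",
    }]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="reporting-db-to-secondary-region",
    SourceEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:SOURCE",    # placeholder
    TargetEndpointArn="arn:aws:dms:eu-central-1:123456789012:endpoint:TARGET", # placeholder
    ReplicationInstanceArn="arn:aws:dms:eu-west-1:123456789012:rep:INSTANCE",  # placeholder
    MigrationType="full-load-and-cdc",  # initial full copy, then ongoing change capture
    TableMappings=json.dumps(table_mappings),
)
```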
Option 2: Centralized Data Lake Approach
Use S3 as your central data repository with cross-region replication. Build your ETL processes in both regions but have them write to the same S3 bucket structure.
Power BI can then query the data through Athena (via the Amazon Athena connector or ODBC driver), reducing the need for separate databases in each region.
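For the replication side, a sketch of enabling S3 cross-region replication follows. Bucket names and the IAM role are placeholders I've made up, and versioning must already be enabled on both buckets:

```python
# Sketch: S3 cross-region replication for Option 2, so both regions see
# the same curated datasets. Bucket names and the IAM role are
# hypothetical; versioning must already be enabled on both buckets.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="reporting-lake-primary",  # placeholder source bucket (primary region)
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # placeholder
        "Rules": [{
            "ID": "replicate-curated-data",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {"Prefix": "curated/"},  # replicate only the curated layer
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                "Bucket": "arn:aws:s3:::reporting-lake-secondary",  # placeholder
            },
        }],
    },
)
```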
Option 3: Hybrid Push Strategy
Since you mentioned the data is under 1 TB, consider using AWS Glue to transform the data and push final datasets to the Power BI service via its REST API. This eliminates the gateway requirement and reduces VPC complexity.
Set up the push process in both regions targeting the same Power BI workspace.
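A minimal sketch of the push call, assuming a push dataset and table already exist in the workspace; the token, IDs, and rows below are placeholders:

```python
# Sketch for Option 3: a Glue Python job pushing a final dataset into a
# Power BI push dataset via the REST API. The dataset and table must
# already exist as a push dataset; token, IDs, and rows are placeholders.
import requests

ACCESS_TOKEN = "<AAD token for the Power BI API>"  # e.g. via an Azure AD app + MSAL
DATASET_ID = "<push-dataset-id>"                   # placeholder
TABLE = "ProductServiceFacts"                      # placeholder table name

rows = [  # in practice: the output of your Glue transform
    {"client": "client-a", "metric": "uptime", "value": 99.7},
    {"client": "client-b", "metric": "uptime", "value": 98.9},
]

resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/datasets/{DATASET_ID}/tables/{TABLE}/rows",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={"rows": rows},
    timeout=30,
)
resp.raise_for_status()
```

Keep in mind that push datasets have API rate and row limits, so batch the POSTs and reserve this path for small, final reporting tables rather than raw data.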
Building Your Azure Case:
For executive buy-in, emphasize that Power BI's strength lies in its integration with Azure services. Point out that keeping data in Azure eliminates the EC2-hosted gateway and the AWS VPC security issues you mentioned.
The 1 TB figure actually works in your favor: it's small enough that Azure storage costs are minimal compared to the savings in operational complexity.
Would any of these directions align with your organizational constraints?
@v-karpurapud, @jaineshp, @rohit1991, thank you so much for sharing your knowledge.
I am about to start with the strategy, and I want to present three options.
Azure & data warehouse environments like Redshift, Snowflake are not an option, our organisation spent a lot of money migrating to aws over the past 2 years
i need something better than what i currently have of an job schedules > glue > lambda > pyspark api loop > s3 > athena data lake > (no views or manipulation) as i need to prove why we need to move for a centralised self service space where users access the managed dataviews only for writing powerbi reports
if athena remains the best option, how do i expand and improve as i need a job management, exception hanlding/notification. was hoping there was something better than GUi glue, i also dont see the athena data view enviornment. Also note i know it is under 1tb at the moment but what if in 2 years it grows to 2TB for the Hybrid Push Strategy
Please guide me, please provide the 3 best options and in its order you highly advise please, with some detail and as to why/benefits links etc, even roughly cost brackets. the reason why i struggle with glue is that we use lambda's which i dont want to use or pyspark scripting as i need to then use a vendor which we do now and my space is small and dont need his dependency, i want to insert api and loop per table config example to get client api data for all clients and the a data management to manipulate the data, so do i go rds postgress or mysql or ? i would like snowflake or redshift but my datasize does not justify that cost as the purpose is purely internal management reporting on product with roughly 20 clients, data is small but i need to be able to build bussiness logic datasets with sql or ?
Please help. I want to lay the three options out in a comparison table with my recommended priority, so that I can present the most robust solution to my CTO with an open mind. My sources are RDS and a product API per client (looped); I need alerts, and I need views to manipulate the data and track failures.
Sorry, I know I am asking a lot, but searching the net on my own I will come nowhere close to what's best, and I really want to add value in my new organisation.