Good day,
I need help with a reporting strategy for internal users of our product service data. We have two AWS regions, and I was asked to stay as AWS-native as possible in terms of tools.
So my issue is that I can't just set up a small reporting database in one region with ETL (Glue etc.); it would need to be replicated to the other region.
Thinking out loud here:
1. Could I use the ETL to copy data to Power BI or Azure as a final dataset, meaning only the ETL resides in both regions/AWS accounts?
2. But this would not be an AWS solution, and I think Azure is ruled out.
3. Could I push it to Power BI? But then there is the security concern of data residing in the Power BI service.
I just don't know how I would set up an environment where I can offer a self-service platform to my internal clients without duplicating data to multiple regions, meaning the data management will become complex.
Does anybody have ideas I can explore? I thought of pushing a case for the data to reside in Azure, since it partners best with Power BI and we wouldn't need an EC2 gateway or the AWS VPC security issues. But I have a strong feeling Azure will be a no-go, as AWS is the future plan for the org, with Azure kept only for IT management.
But could I build a case around the data being under 1 TB, or is there an AWS architecture + Power BI solution?
I will be managing and developing the space for champion users to do self-service; going from AWS to Azure is the blocker.
Please help, any ideas?
Regards
Hey @icassiem,
Looking at your multi-region AWS setup with Power BI requirements, here are some practical approaches to consider:
Option 1: AWS-Native with Power BI Gateway
Set up your reporting database and ETL in your primary AWS region, then use Power BI's on-premises data gateway (it can run on an EC2 instance) to connect.
For the secondary region, replicate only critical data using AWS DMS or cross-region replication. This keeps most infrastructure AWS-native while enabling Power BI connectivity.
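As a rough illustration of the DMS piece, here is a minimal boto3 sketch. It assumes the replication instance and both endpoints already exist; every ARN and identifier below is a hypothetical placeholder, not something from your environment:

```python
# Minimal boto3 sketch for Option 1's cross-region copy via AWS DMS.
# Assumes the replication instance and source/target endpoints already
# exist; all ARNs and names below are hypothetical placeholders.
import json
import boto3

dms = boto3.client("dms", region_name="eu-west-1")

# Replicate only the critical reporting tables (here: a "reporting" schema).
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "reporting-tables",
        "object-locator": {"schema-name": "reporting", "table-name": "%"},
        "rule-action": "include",
    }]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="reporting-db-to-secondary-region",
    SourceEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:SOURCE",    # placeholder
    TargetEndpointArn="arn:aws:dms:eu-central-1:123456789012:endpoint:TARGET", # placeholder
    ReplicationInstanceArn="arn:aws:dms:eu-west-1:123456789012:rep:INSTANCE",  # placeholder
    MigrationType="full-load-and-cdc",  # initial full copy, then ongoing change capture
    TableMappings=json.dumps(table_mappings),
)
```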
Option 2: Centralized Data Lake Approach
Use S3 as your central data repository with cross-region replication. Build your ETL processes in both regions but have them write to the same S3 bucket structure.
Power BI can then query the data through Athena (via the Amazon Athena connector or ODBC driver), reducing the need for separate databases in each region.
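For the replication side, a sketch of enabling S3 cross-region replication follows. Bucket names and the IAM role are placeholders I've made up, and versioning must already be enabled on both buckets:

```python
# Sketch: S3 cross-region replication for Option 2, so both regions see
# the same curated datasets. Bucket names and the IAM role are
# hypothetical; versioning must already be enabled on both buckets.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="reporting-lake-primary",  # placeholder source bucket (primary region)
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # placeholder
        "Rules": [{
            "ID": "replicate-curated-data",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {"Prefix": "curated/"},  # replicate only the curated layer
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                "Bucket": "arn:aws:s3:::reporting-lake-secondary",  # placeholder
            },
        }],
    },
)
```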
Option 3: Hybrid Push Strategy
Since you mentioned the data is under 1 TB, consider using AWS Glue to transform the data and push final datasets to the Power BI service via its REST API. This eliminates the gateway requirement and reduces VPC complexity.
Set up the push process in both regions targeting the same Power BI workspace.
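A minimal sketch of the push call, assuming a push dataset and table already exist in the workspace; the token, IDs, and rows below are placeholders:

```python
# Sketch for Option 3: a Glue Python job pushing a final dataset into a
# Power BI push dataset via the REST API. The dataset and table must
# already exist as a push dataset; token, IDs, and rows are placeholders.
import requests

ACCESS_TOKEN = "<AAD token for the Power BI API>"  # e.g. via an Azure AD app + MSAL
DATASET_ID = "<push-dataset-id>"                   # placeholder
TABLE = "ProductServiceFacts"                      # placeholder table name

rows = [  # in practice: the output of your Glue transform
    {"client": "client-a", "metric": "uptime", "value": 99.7},
    {"client": "client-b", "metric": "uptime", "value": 98.9},
]

resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/datasets/{DATASET_ID}/tables/{TABLE}/rows",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={"rows": rows},
    timeout=30,
)
resp.raise_for_status()
```

Keep in mind that push datasets have API rate and row limits, so batch the POSTs and reserve this path for small, final reporting tables rather than raw data.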
Building Your Azure Case:
For executive buy-in, emphasize that Power BI's strength lies in its integration with Azure services. Point out that keeping data in Azure eliminates the EC2-hosted gateway and the AWS VPC security issues you mentioned.
The 1 TB figure actually works in your favor: it's small enough that Azure storage costs are minimal compared to the savings in operational complexity.
Would any of these directions align with your organizational constraints?
@v-karpurapud, @jaineshp, @rohit1991, thank you so much for sharing your knowledge.
I am about to start with the strategy, and I want to present three options.
Azure & data warehouse environments like Redshift, Snowflake are not an option, our organisation spent a lot of money migrating to aws over the past 2 years
i need something better than what i currently have of an job schedules > glue > lambda > pyspark api loop > s3 > athena data lake > (no views or manipulation) as i need to prove why we need to move for a centralised self service space where users access the managed dataviews only for writing powerbi reports
if athena remains the best option, how do i expand and improve as i need a job management, exception hanlding/notification. was hoping there was something better than GUi glue, i also dont see the athena data view enviornment. Also note i know it is under 1tb at the moment but what if in 2 years it grows to 2TB for the Hybrid Push Strategy
Please guide me, please provide the 3 best options and in its order you highly advise please, with some detail and as to why/benefits links etc, even roughly cost brackets. the reason why i struggle with glue is that we use lambda's which i dont want to use or pyspark scripting as i need to then use a vendor which we do now and my space is small and dont need his dependency, i want to insert api and loop per table config example to get client api data for all clients and the a data management to manipulate the data, so do i go rds postgress or mysql or ? i would like snowflake or redshift but my datasize does not justify that cost as the purpose is purely internal management reporting on product with roughly 20 clients, data is small but i need to be able to build bussiness logic datasets with sql or ?
Please help. I want to lay the three options out in a comparison table with my recommended priority, so that I can present the most robust solution to my CTO with an open mind. My sources are RDS and a product API per client (looped); I need alerts, and I need views to manipulate the data and track failures.
Sorry, I know I am asking a lot, but searching the net on my own I will come nowhere close to what's best, and I really want to add value in my new organisation.