icassiem
Post Prodigy

Architecture Help for Internal Reporting Using Power BI

Good day,

 

I need help with a reporting strategy for internal users of our product service data. We have two AWS regions, and I have been asked to stay as AWS-native as possible in terms of tools.

 

So I have the issue that I can't simply set up a small reporting database in one region with Glue ETL etc.; it would need to be replicated for the other region.

Thinking out loud here:

1. Could I use the ETL to copy data to Power BI or Azure as a final dataset, meaning only the ETL resides in both regions/AWS accounts?

2. But this would not be an AWS solution, as I think Azure is ruled out.

3. Could I push it to Power BI directly? But that raises a security concern about data residing in the Power BI service.

 

I just don't know how I would set up an environment where I can offer a self-service platform to my internal clients without duplicating it across multiple regions, which would make the data management complex.

 

Does anybody have ideas I can explore? I thought about pushing a case for the data to reside in Azure, since it pairs best with Power BI and we wouldn't need an EC2 gateway or the AWS VPC security issues, but I have a strong feeling Azure will be a no-go, as AWS is the org's future plan, with Azure used only for IT management.

But can I build a case for data under 1TB, or is there an AWS architecture / Power BI solution?

I will be managing and developing the space for champion users to do self-service; having to use Azure from AWS is a blocker.

 

Please help, any ideas?

 

Regards

19 REPLIES
jaineshp
Impactful Individual

Hey @icassiem,

Thanks for the detailed follow-up — it's great to see how committed you are to designing a future-proof and self-service reporting solution. Based on your constraints (AWS-only, under 1TB currently, Glue/S3/Athena as baseline, avoidance of Lambda, preference for job control and managed views), here are three practical options, prioritized by suitability.

Option 1: Athena-Based Architecture with Step Functions and Enhanced Monitoring (Recommended)

  • Build on your current Glue → S3 → Athena structure, but introduce Step Functions to orchestrate ETL, API loops per client, and manage job state.

  • Replace Lambda with Glue Python Shell Jobs, giving you more control and flexibility.

  • Use Athena Views to define and expose only managed, consumable data sets for PowerBI reporting.

  • Set up SNS and CloudWatch Alarms for failure alerts, retries, and visibility.

  • Enables a fully serverless, cost-efficient architecture with strong observability.

  • Approximate Cost: Low — primarily based on data scanned in Athena and Glue job runtime.
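
To make this option concrete, here is a minimal sketch (my own illustration, not part of the original suggestion) of what the Step Functions definition could look like in Python/boto3: a Map state loops over the client list, each iteration runs a hypothetical Glue Python Shell job, and any failure publishes to a hypothetical SNS topic. Job, topic, role and account identifiers are placeholders.

```python
import json
import boto3

# Hypothetical names: replace with your own Glue job, SNS topic and IAM role.
definition = {
    "StartAt": "PerClientLoop",
    "States": {
        "PerClientLoop": {
            "Type": "Map",
            "ItemsPath": "$.clients",        # e.g. {"clients": [{"id": "client-a"}, ...]}
            "MaxConcurrency": 5,
            "Iterator": {
                "StartAt": "RunClientEtl",
                "States": {
                    "RunClientEtl": {
                        "Type": "Task",
                        "Resource": "arn:aws:states:::glue:startJobRun.sync",
                        "Parameters": {
                            "JobName": "client-api-etl",           # Glue Python Shell job
                            "Arguments": {"--client_id.$": "$.id"}
                        },
                        "Retry": [{"ErrorEquals": ["States.ALL"], "MaxAttempts": 2}],
                        "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
                        "End": True
                    },
                    "NotifyFailure": {
                        "Type": "Task",
                        "Resource": "arn:aws:states:::sns:publish",
                        "Parameters": {
                            "TopicArn": "arn:aws:sns:eu-west-1:123456789012:etl-alerts",
                            "Message.$": "$"
                        },
                        "End": True
                    }
                }
            },
            "End": True
        }
    }
}

boto3.client("stepfunctions").create_state_machine(
    name="internal-reporting-etl",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/stepfunctions-etl-role",
)
```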

 Option 2: Introduce RDS PostgreSQL as a Curated SQL Layer

  • Use RDS (PostgreSQL) as a lightweight, centralized database to hold curated datasets after transformation.

  • PowerBI connects via on-premises data gateway (can run on EC2 or container).

  • Step Functions or Glue handle ingestion and transformation, then write to RDS.

  • Ideal for building re-usable views and joining multiple source tables (e.g., client-level API data).

  • Approximate Cost: Moderate — small RDS instance (~$80/month) + gateway maintenance.
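
As a rough sketch only (the pandas/SQLAlchemy stack, schema and connection details below are assumptions, not something prescribed in this thread), the curated RDS layer could be loaded and exposed to Power BI roughly like this:

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Placeholder connection string for the curated RDS PostgreSQL instance;
# assumes a 'curated' schema already exists.
engine = create_engine("postgresql+psycopg2://etl_user:***@reporting-rds.internal:5432/reporting")

# Stand-ins for DataFrames produced by the Glue / Step Functions transform step.
clients = pd.DataFrame({"client_id": [1, 2], "client_name": ["Acme", "Globex"]})
usage = pd.DataFrame({"client_id": [1, 2], "month": ["2025-07", "2025-07"], "calls": [120, 80]})

clients.to_sql("dim_client", engine, schema="curated", if_exists="replace", index=False)
usage.to_sql("fact_usage", engine, schema="curated", if_exists="replace", index=False)

# A managed, re-usable view that Power BI users query via the gateway.
with engine.begin() as conn:
    conn.execute(text("""
        CREATE OR REPLACE VIEW curated.v_client_usage AS
        SELECT c.client_name, u.month, u.calls
        FROM curated.fact_usage u
        JOIN curated.dim_client c USING (client_id);
    """))
```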

  Option 3: Hybrid Push to PowerBI Service via API (Best for Controlled Volumes)

  • Push final, transformed datasets directly into PowerBI via the PowerBI REST API — skipping the need for a gateway.

  • Glue or Step Functions can perform transformations and trigger API calls.

  • Suitable for small, predefined models and eliminates VPC/networking complexity.

  • Less ideal for audit logging, recovery, or user-driven exploration.

  • Approximate Cost: Very low infra cost — pay per execution/API call only.
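
A hedged sketch of the push approach, assuming an Azure AD access token and an already-created Power BI push dataset (the token, IDs and table name are placeholders); a Glue job or Step Functions task could post the final rows like this:

```python
import requests

# Placeholders: obtain an Azure AD access token for the Power BI REST API
# (e.g. via a service principal) and use the IDs of an existing push dataset.
access_token = "<AAD_ACCESS_TOKEN>"
dataset_id = "<PUSH_DATASET_ID>"
table_name = "ClientUsage"

rows = [
    {"client_id": 1, "month": "2025-07", "calls": 120},
    {"client_id": 2, "month": "2025-07", "calls": 80},
]

resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/datasets/{dataset_id}/tables/{table_name}/rows",
    headers={"Authorization": f"Bearer {access_token}"},
    json={"rows": rows},
    timeout=30,
)
resp.raise_for_status()  # push datasets have row/volume limits, so keep batches small
```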

You're clearly thinking strategically, and this approach will definitely bring long-term benefits to your data consumers.



Best Regards,
Jainesh Poojara | Power BI Developer

@v-karpurapud, @jaineshp, @rohit1991 Thank you so much for sharing your knowledge.

I am about to start with the strategy, and I want to present three options.

Azure and data warehouse environments like Redshift or Snowflake are not an option; our organisation has spent a lot of money migrating to AWS over the past two years.

I need something better than what I currently have: job schedules > Glue > Lambda > PySpark API loop > S3 > Athena data lake > (no views or manipulation). I need to prove why we should move to a centralised self-service space where users access only the managed data views for writing Power BI reports.

 

If Athena remains the best option, how do I expand and improve it? I need job management and exception handling/notification, and I was hoping there was something better than GUI Glue; I also don't see an Athena data view environment. Also note that I know it is under 1TB at the moment, but what if in two years it grows to 2TB for the Hybrid Push Strategy?

 

Please guide me: provide the three best options, in the order you would advise, with some detail on why (benefits, links, even rough cost brackets). The reason I struggle with Glue is that we use Lambdas, which I don't want, or PySpark scripting, which means relying on a vendor as we do now; my space is small and I don't need that dependency. I want to configure an API and loop per table (for example, to get client API data for all clients) and then a data-management layer to manipulate the data. So do I go RDS PostgreSQL, or MySQL, or something else? I would like Snowflake or Redshift, but my data size does not justify that cost, as the purpose is purely internal management reporting on the product with roughly 20 clients. The data is small, but I need to be able to build business-logic datasets with SQL or similar.

 

Please help. I want to lay the three options out in a table to compare them and provide my recommended priority, so that I can show my CTO I am being open-minded and presenting the most robust solution. My sources are RDS and a product API per client (loop); I need alerts, and I need views to manipulate the data and track failures.

 

Sorry, I know I am asking a lot, but by searching the net alone I will come nowhere close to what's best, and I really want to add value in my new organization.

jaineshp
Impactful Individual

Hey @icassiem,

Thank you for the kind recognition - always happy to contribute to our community's success!

Best Regards,
Jainesh Poojara | Power BI Developer

@jaineshp would your earlier reply be your response to my latest reply/request (30 min ago)? Please advise, brainstorm and scrutinise my thinking, as I will start putting the three options in place. I'd prefer links to back my case and show that my homework has been done.

jaineshp
Impactful Individual

Hey @icassiem,

Thank you for expanding on the requirements—it really helps clarify the situation you're working with. Here's how I'd refine and validate the three options further, based on your latest inputs:

Confirmation: Yes — My Previous Reply Stands, With Some Refinements

Your thinking is very sound — especially your push for:

  • Central SQL-based data views

  • Alerting and monitoring

  • Avoiding PySpark/Lambda-heavy dependencies

  • Building a scalable, centralized data layer

That said, below is a revised breakdown with some scrutiny, suggestions, and extra links to strengthen your case when pitching to leadership.

Option 1: AWS Step Functions + RDS (PostgreSQL) + S3 + Athena

Still my top recommendation based on cost, flexibility, and simplicity


What works well:

  • RDS PostgreSQL gives you clean, view-based modeling with SQL (great for BI self-service)

  • Step Functions offers job orchestration without coding everything from scratch

  • You can still archive to S3 and query via Athena when needed

  • Very low Lambda dependency — just use them if absolutely necessary as small wrappers

  • SNS + CloudWatch for failure alerts and visibility

 Scrutiny:

  • Step Functions state machine debugging can get tricky for large pipelines — use visual workflows & logging

  • RDS scaling should be monitored, especially with concurrent Power BI users (add read replicas if needed)

 Resources to back it up:

Step Functions Intro

Using RDS PostgreSQL for BI

Athena CTAS & Views 
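
For illustration only, a managed Athena view of the kind described above could be created from Python/boto3 like this; the database, table and results-bucket names are hypothetical:

```python
import boto3

athena = boto3.client("athena", region_name="eu-west-1")  # placeholder region

create_view_sql = """
CREATE OR REPLACE VIEW reporting.v_client_usage AS
SELECT client_id, date_trunc('month', event_date) AS month, count(*) AS calls
FROM raw.client_events            -- Glue-catalogued table over S3
GROUP BY client_id, date_trunc('month', event_date)
"""

athena.start_query_execution(
    QueryString=create_view_sql,
    QueryExecutionContext={"Database": "reporting"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},  # placeholder bucket
)
```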


Option 2: MWAA (Managed Airflow) + RDS + S3

Use if orchestration complexity will increase soon (e.g. parallel APIs, retries, conditional workflows)

 Pros:

  • Airflow gives you granular DAG-level visibility, retries, logging

  • You can modularize your logic and control retries per task

  • Can be extended later to trigger ML models or external systems

  • Still SQL-centric with PostgreSQL as core transformation layer

 Watch out for:

  • MWAA is more expensive to run even for small loads

  • Setup time is more than Step Functions, and operational overhead is real

  • Might feel like "overkill" unless workflows become more complex than they are now

 

Resources:

MWAA on AWS

Airflow DAG Concepts 

Airflow + RDS use case 
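
For reference, a minimal Airflow DAG sketch of the kind MWAA would run for this option; the client list, callables and schedule are placeholders, and the task bodies are left as stubs:

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

CLIENTS = ["client_a", "client_b"]  # hypothetical client list / config table

def fetch_client_api(client_id, **_):
    # would call the product API for one client and land the result in the staging S3 bucket
    ...

def load_to_postgres(**_):
    # would transform the staged files and write curated tables/views to RDS PostgreSQL
    ...

with DAG(
    dag_id="internal_reporting_etl",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    load = PythonOperator(task_id="load_to_postgres", python_callable=load_to_postgres)
    for client in CLIENTS:
        PythonOperator(
            task_id=f"fetch_{client}",
            python_callable=fetch_client_api,
            op_kwargs={"client_id": client},
        ) >> load
```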

Option 3: Athena-Only with Better Scheduling + Metadata Layer

Viable if budget is the main blocker, and you don’t anticipate high data transformation needs

 Strengths:

  • Zero infra — Athena on S3 is very cheap and scalable

  • You can define business views using SQL (via CTAS or Views)

  • Simple scheduling with EventBridge or Step Functions + Lambda/Python jobs

 Weaknesses:

  • No central orchestration: managing failure, retries, dependencies becomes painful

  • Not ideal for layered business logic across tables (joins can get slow at >2TB)

  • Lack of state or metadata tracking without extra tooling (e.g., no “which client failed last?” out of the box)
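
As an example of the "simple scheduling" piece of this option, a hedged boto3 sketch (rule name, ARNs and schedule are placeholders) that has EventBridge start a nightly Step Functions execution to refresh the Athena views:

```python
import json
import boto3

events = boto3.client("events", region_name="eu-west-1")  # placeholder region

events.put_rule(
    Name="nightly-athena-refresh",
    ScheduleExpression="cron(0 2 * * ? *)",   # 02:00 UTC every day
    State="ENABLED",
)

events.put_targets(
    Rule="nightly-athena-refresh",
    Targets=[{
        "Id": "refresh-state-machine",
        "Arn": "arn:aws:states:eu-west-1:123456789012:stateMachine:athena-view-refresh",
        "RoleArn": "arn:aws:iam::123456789012:role/eventbridge-invoke-sfn",
        "Input": json.dumps({"views": ["reporting.v_client_usage"]}),
    }],
)
```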

 



Best Regards,
Jainesh Poojara | Power BI Developer

@jaineshp, @v-karpurapud, @rohit1991, apologies for a simple question: are there no additional perspectives on my updated response and @jaineshp's answer yesterday?

I want to compile the best three options and be 200% confident that everything has been covered and the best solutions provided. I'm putting my neck out here, and I would be implementing the architecture too, so I really need to be sure I build a futuristic and modern architecture, one that I can be proud of, that I'm comfortable meets the objective, and that delivers visible change/improvements for stakeholders.

@jaineshp Option 1 (AWS Step Functions + RDS (PostgreSQL) + S3 + Athena) is very similar to the current legacy of AWS Step Functions > Glue > Lambda > crawlers > S3 > Athena > EC2 gateway.
The semantic layer of views would presumably sit in PostgreSQL, but job management similar to SQL Server Agent is missing, where all jobs and tasks are managed with alerts etc. I miss SSIS/Data Factory 🙂 I thought there was a tool integrated in AWS (dbt, for example, I don't know) that could manage the whole space of scheduling/orchestration > ETL > data management > building the semantic layer > Power BI connection.

Option 1 also misses the key factor of the two regions that we want to centralize, denormalizing the data from the client API into various DB stages for self-service, so that we don't duplicate the environment per region.
Sorry for asking again.

Hi @icassiem 


Thank you for getting back to us. To design a modern, scalable, and AWS-native internal reporting platform that eliminates current architectural pain points and supports centralized, self-service BI with Power BI, the most suitable approach is to build a centralized, SQL-driven architecture with simplified orchestration, data modeling, and governance.

Option 1: MWAA + RDS (PostgreSQL) + S3 + Athena — Future-Ready Architecture


This architecture presents a fully centralized, SQL-driven, and scalable solution designed for long-term growth and operational efficiency. AWS Managed Workflows for Apache Airflow (MWAA) is leveraged to orchestrate API ingestion, data transformation, retries, and alerting, offering complete visibility and control through DAGs. Raw data is stored in Amazon S3, while business logic is modeled using PostgreSQL views, which can be directly consumed by Power BI for reporting and analytics.

This approach eliminates the need for complex Lambda-based logic, supports multi-region data handling via S3 replication, and ensures centralized job management and failure tracking. It is particularly well-suited for organizations anticipating data growth beyond 2TB and seeking robust orchestration, SQL-based modeling, and future scalability.
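
To illustrate the multi-region handling mentioned above, a minimal sketch (bucket names, account ID and role are placeholders, and both buckets must already have versioning enabled) of configuring S3 cross-region replication from a secondary-region bucket into the central reporting bucket:

```python
import boto3

s3 = boto3.client("s3", region_name="eu-central-1")  # secondary/source region (placeholder)

s3.put_bucket_replication(
    Bucket="product-data-eu-central-1",               # source bucket in the second region
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-crr-role",
        "Rules": [{
            "ID": "replicate-to-central-reporting",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {"Prefix": "raw/"},
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::central-reporting-eu-west-1"},
        }],
    },
)
```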

 
Option 2: Step Functions + RDS (PostgreSQL) + S3 + Athena — Lightweight & Cost-Effective Architecture

 

This architecture offers a streamlined and cost-efficient solution for orchestrating data pipelines using AWS Step Functions. It enables lightweight orchestration of client API loops, data transformations, and alerting through CloudWatch and SNS, while storing raw data in Amazon S3. Business logic is modeled using PostgreSQL views, which can be directly accessed by Power BI for reporting and analytics.

The solution provides visual workflows, built-in error handling, and scheduling capabilities with significantly lower operational overhead compared to MWAA, making it ideal for smaller teams or simpler workloads. It supports centralized, SQL-driven modeling without the complexity of Airflow, and is well-suited for organizations prioritizing simplicity, low-code orchestration, and cost efficiency.
Estimated Monthly Cost: Approximately $150–$250

 Option 3: Athena + Glue DataBrew + EventBridge — Minimal & Budget-Friendly Architecture

 

This architecture is designed for teams managing small datasets with minimal orchestration requirements, offering a highly cost-effective and simplified solution. AWS Glue DataBrew enables no-code data preparation, while SQL-based modeling is achieved through Athena Views or CTAS (Create Table As Select). Scheduling is handled via EventBridge or optionally Step Functions, depending on workflow complexity. Although this setup does not provide full orchestration or built-in job monitoring, it is well-suited for use cases where simplicity, low maintenance, and budget constraints are key considerations. Manual intervention may be required for failure tracking and metadata management, but the overall operational overhead remains minimal.
Estimated Monthly Cost: Less than $100

I hope this information is useful. If you need more assistance, please provide additional details.

Regards,
Karpurapu D,
Microsoft Fabric Community Support Team.



 

@v-karpurapud, @jaineshp, @rohit1991 Thank you very much, and apologies for the silly questions here; I need to ensure I highlight the pros/cons when presenting.

 

@v-karpurapud 

Option 1: MWAA + RDS (PostgreSQL) + S3 + Athena — Future-Ready Architecture

Option 2: Step Functions + RDS (PostgreSQL) + S3 + Athena — Lightweight & Cost-Effective Architecture

Option 3: Athena + Glue DataBrew + EventBridge — Minimal & Budget-Friendly Architecture
Questions

A. Is MWAA both the orchestrator and the ETL that extracts from the sources (RDS MySQL/PostgreSQL, API, S3, MongoDB)? How would it handle the data manipulation we used PySpark/Python for (denormalise, API loop, manipulate, etc.), and does this eliminate Glue?

B. The raw data is stored in S3 and the logic in PostgreSQL views: does this mean that PostgreSQL, and not Athena, connects directly to S3? What about unstructured data etc.? I thought PostgreSQL only worked on relational data structures.

C. If the data is stored in S3, I would not be able to store snapshot data over time, correct, meaning it will be driven/refreshed from source only?

@jaineshp I looked into MWAA some time back; the reason is that I was looking for a tool with many options packed in. As @v-karpurapud mentions, it can do API loops/data manipulation etc., which is why I mentioned dbt as an example only ("I don't know"): then I wouldn't need different places to track ETL (Glue), Step Functions for orchestration, and still PySpark/Python to transform/manipulate/loop the Lambda API calls, while the job scheduling stays very simple and basic.

 

Also, the elements we are missing are the centralisation of the two regions and the gateway: we don't want to replicate the environment in both regions. Plus, is the EC2/gateway within the VPC needed? The prod product resides in the same VPC, hence security was not keen.

 

Apologies, please help and provide a bit more detail. I received some more clarity yesterday: it must be a native AWS self-service environment, possibly on a transformed DB/S3 storage for data snapshots (or not, a con if not), with a semantic layer of views, because the objective might in future expand to sharing data with clients via API and product reporting. The current objective is a draft strategy for internal PBI reporting, but I know the stakeholders' expectations, and there is already misalignment. I want to ensure it is simple (to develop/manage, with alerts etc.), cost-effective (+/- $1,000 monthly target), modern (design, data flow, tools etc.) and yet expandable, like the data storage of PostgreSQL and the MWAA orchestration/ETL. I think the cost is roughly $150-200 monthly, but if this is the key orchestration + ETL + data manipulation, it is essentially half the data flow, so I think I can justify it; it must, however, make my life much easier as I implement, further develop and maintain it.


Me thinking out loud here 🙂
I want to narrow it down to the two best recommended options: one with S3 storage for internal reporting on current data only, with overwrites (no data snapshots, tracking over time or benchmarking data), and the other a more upscale one with a DB store etc.?

 

I think the challenge, outside of the tools etc., is the simplicity of development/management combined with data centralization.

@jaineshp, @v-karpurapud, @rohit1991 Hi, any update on my last reply above, please?

For security, do I set up a separate account within the one region, but outside of the product VPC, so that I can run MWAA/Glue across regions into the one database? That would be easier and lower risk, and it would answer the security officers' push-back about exposing the DB: users could connect to the DB and the semantic layer of views from their PCs, without needing to connect to an EC2 instance.

Also, can MWAA/Glue do the ingestion/access, since it's mostly API/RDS across the regions and accounts, into the new account in the one region, outside of the prod/source data?

 

If I keep it in the one account, the same as the prod product, and in one region, I can't expose the DB and have to let users connect via the EC2 instance, which is not really ideal for a self-service centralised environment.

Please help.

Hi @icassiem 

 

The cleanest way to meet your security requirements and still centralize reporting is to run everything in a dedicated AWS account set up specifically for analytics. This account would sit in one region and host all of your reporting infrastructure: MWAA or Fargate to handle orchestration and API ingestion, a central S3 bucket for raw and transformed data, and an Aurora or RDS instance to hold your curated datasets and semantic views. Keeping this separate from your production accounts isolates any reporting activity from operational systems, reduces security risk, and gives you a clear, compliant boundary for data access.

Data from both regions can flow into this account without replicating the entire environment. In a push model, small extract jobs in each source account write directly into the reporting S3 bucket using a cross-account IAM role that only allows that one action. In a pull model, MWAA or Fargate in the reporting account uses STS to assume a read-only role in the source accounts and fetch from APIs, RDS replicas, or S3. Either way, all transformation, denormalization, and business logic happen in the reporting account, so you’re no longer relying on Lambda/PySpark or juggling jobs across multiple regions.
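
A hedged sketch of the pull model described above (the role ARN, bucket names and keys are placeholders): the job in the reporting account assumes a read-only role in a source account and copies an extract into the central staging bucket.

```python
import boto3

# Assume a read-only role that the source (production) account has trusted
# the reporting account to use.
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::111111111111:role/reporting-readonly",  # role in the source account
    RoleSessionName="reporting-pull",
)["Credentials"]

source_s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
    region_name="eu-central-1",
)

# Copy one extract from the production bucket into the reporting account's staging bucket.
obj = source_s3.get_object(Bucket="prod-extracts-eu-central-1", Key="exports/usage.parquet")
boto3.client("s3").put_object(
    Bucket="analytics-staging-bucket",          # bucket owned by the reporting account
    Key="raw/usage.parquet",
    Body=obj["Body"].read(),
)
```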

Analysts would connect to the curated database in a way that satisfies security without forcing an EC2 gateway. The easiest path is AWS Client VPN, which gives them private network access to the reporting VPC and supports corporate authentication, or PrivateLink if you prefer to route through your existing corporate network. The database stays private, fully inside AWS, and only authorized users can reach it. This keeps the design AWS-native, simple to manage, cost-controlled, and flexible enough to expand later with features like historical snapshots or client-facing APIs when the business is ready.

Regards,
Karpurapu D,
Microsoft Fabric Community Support Team.

 

@v-karpurapud Thank you for your patience

Please check whether I understand it correctly, as I have a few questions. The proposed architecture flow is:

 

1. A separate "Analytics" AWS account, apart from the Product account, in the one main region

2. Set up read access from the Analytics account to prod in both regions (RDS, API, S3)

3. MWAA or Fargate for both job orchestration and ingestion, with no need for Glue (cross-region RDS, API, S3)?

       > Which do you recommend for the simplest/best GUI development?

       > Can this do loops of API calls with parameters?

4. Destination of the ETL from the sources = staging S3 bucket

5. Data transformation into (Aurora or RDS)

      > What is your preference on the DB for the denormalised/user-access layer that works best for PBI?

      > How would I automate/schedule the transformation from the S3 staging area to the semantic layer of (Aurora or RDS) tables? Could I schedule SQL scripts/procs on (Aurora or RDS) to read the staging S3 data and then write to a table, or schedule a tool/dbt etc.? (See the sketch after this list.)

     > I think Aurora and RDS can both be serverless; I'm just thinking which one offers the best SQL transformation jobs, because my senior does not want views/a DW etc.

6. User connectivity and report development directly against the DB, not via EC2

       > No EC2 connection for DB connectivity
       > Set up security in the Analytics account with AWS Client VPN or PrivateLink

       > Which one do you recommend for users connecting from their desktops to RDS/Aurora?

7. Don't we need an on-premises gateway on an EC2 instance?
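
Regarding question 5, a minimal sketch of one way this could work, assuming RDS/Aurora PostgreSQL with the aws_s3 extension; connection details, schemas and the S3 location are placeholders, and the import would be scheduled from the orchestrator (MWAA/Step Functions):

```python
# Hypothetical sketch: run this on a schedule to pull a curated CSV from the staging
# bucket into an RDS/Aurora PostgreSQL table, then refresh a reporting view.
import psycopg2

conn = psycopg2.connect(host="analytics-db.example.internal", dbname="reporting",
                        user="etl_user", password="***")  # placeholder connection details
with conn, conn.cursor() as cur:
    # One-off setup, needs appropriate privileges (typically done once by an admin).
    cur.execute("CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;")
    cur.execute("""
        SELECT aws_s3.table_import_from_s3(
            'staging.client_usage',                          -- target table
            'client_id, metric_date, usage_count',           -- column list
            '(format csv, header true)',
            aws_commons.create_s3_uri('analytics-staging-bucket',
                                      'curated/client_usage.csv',
                                      'eu-west-1'));
    """)
    cur.execute("""
        CREATE OR REPLACE VIEW semantic.v_client_usage AS
        SELECT client_id, metric_date, SUM(usage_count) AS usage_count
        FROM staging.client_usage
        GROUP BY client_id, metric_date;
    """)
```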

 

Regards

v-karpurapud
Community Support

Hi @icassiem 

Thank you for reaching out to the Microsoft Fabric Community Forum.


As recommended by @rohit1991 and @jaineshp, the optimal solution for a secure, scalable, AWS-native internal reporting architecture is to implement a Centralized S3 Data Lake governed by AWS Lake Formation. By enabling S3 Cross-Region Replication, data from multiple AWS regions can be consolidated into a single source of truth, eliminating manual duplication and simplifying data management.
 

AWS Lake Formation will provide fine-grained access controls at the database, table, and column levels, ensuring secure, governed data access for Power BI users while reducing IAM complexity. To simplify current ETL pipelines, AWS Glue DataBrew offers a no-code, visual interface for data transformation and cleansing, making ETL processes manageable for a single developer with automated, repeatable workflows.
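
For illustration, a minimal hedged example (the principal ARN, database, table and column names are placeholders) of the kind of column-level grant Lake Formation enables for a Power BI analysts' role:

```python
import boto3

lf = boto3.client("lakeformation", region_name="eu-west-1")  # placeholder region

# Grant the BI analysts' role SELECT on only the columns they need in one curated table.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/powerbi-analysts"},
    Resource={
        "TableWithColumns": {
            "DatabaseName": "reporting",
            "Name": "client_usage",
            "ColumnNames": ["client_id", "month", "calls"],
        }
    },
    Permissions=["SELECT"],
)
```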
 

For business logic modeling, Athena SQL Views can serve as a semantic layer for immediate needs. If performance and concurrency demands grow, this can be scaled to Amazon Redshift Serverless for a high-performance semantic model without infrastructure overhead.
 

Power BI will connect directly to AWS data sources via Athena ODBC or Redshift connectors, eliminating the need for EC2-hosted gateways or data movement into Azure. Given your data volume of under 1TB, Import Mode in Power BI remains a viable option to enhance report performance while ensuring data remains within AWS.
 

This architecture simplifies ETL, strengthens data governance, enables secure self-service BI, and aligns with your organization’s AWS-first strategy, providing a future-proof, cost-effective solution for scalable internal reporting.

Regards,
Karpurapu D,
Microsoft Fabric Community Support Team.


Hey @v-karpurapud,

Thank you for the kind recognition - always happy to contribute to our community's success!

Best Regards,
Jainesh Poojara | Power BI Developer

icassiem
Post Prodigy

@rohit1991 @jaineshp  Wow, Thank You

Let me complete my scorecard this week, have a few things to implement, I did not expect such good help so quickly 🙂 

 

Please help me build this business case, where I present all three options with pros and cons and then recommend one of them. Can you help with this in the meantime, please?

 

So my current environment is a data lake using Glue off S3 and Athena, but the cost is heavy for a small footprint, and self-service is not great for users as I need to transform the data. I don't like Glue, and I am alone, so I don't want heavy-coding solutions; I need job management and alerts etc.

I am starting on this in full swing on Monday, please bear with me.

Hi @icassiem ,

 

Yes, you can keep everything AWS-native without Azure as the final data layer:

  • Keep your reporting data in S3 (not Azure) and use AWS Glue/ETL to update only what’s needed.

  • Connect Power BI directly to S3 using Athena (ODBC or built-in connector). Power BI just reads from AWS; there is no need to move data into Azure.

  • Security is handled at the AWS level (IAM/Lake Formation). Only metadata goes to Power BI.

This avoids duplicating data across clouds, keeps everything compliant, and still allows for self-service reporting in Power BI.



But this is very similar to the current environment (one region only) of Glue to S3 and Athena connected to Power BI. I am trying not only to consolidate the two regions' data but also to improve on it: Glue was code-heavy with little data manipulation, and for the Lambda API call loops etc. I am using a third party, which we need to move away from. In other words, we are moving on from a five-year-old design that is expensive, and I need to prove why we are changing for the better. Being a SQL guy, I would have a semantic layer of views with business logic off a DW for users to drag and drop, so I need something similar, not just dropping data into Glue; I want a layer that is user-ready.

rohit1991
Super User

Hi @icassiem ,

 

Option 1: AWS-Native with Power BI Gateway (Most AWS-Native). Keep your reporting DB and ETL in your primary AWS region. Use the Power BI on-premises data gateway (it can be hosted on an EC2 instance) to connect Power BI to your AWS database. For the second AWS region, only replicate the critical data using AWS DMS or cross-region ETL, rather than duplicating everything. This keeps most of your stack AWS-native and minimizes data management headaches.

Option 2: Central Data Lake (S3). Use an S3 bucket as your “single source of truth” with cross-region replication. Have both regions’ ETL jobs push data to the same S3 structure. Power BI can connect directly to it via connectors (ODBC, Athena, or third-party connectors), so you don’t need a separate database per region. This makes self-service much easier, especially if you automate permissions and folder structure.

Option 3: Azure Hybrid (Best for Tight Power BI Integration). If you want the easiest Power BI experience and don’t mind being a little less AWS-pure, you can have your ETL jobs push the data to Azure Data Lake or Azure SQL DB. Power BI “just works” with Azure data sources: no gateway or VPC fuss. For under 1TB, Azure storage is pretty cheap, and you eliminate a ton of maintenance (no EC2, no gateway, no cross-region VPC headaches). Great for organizations where reporting is more important than strict cloud purity.


jaineshp
Impactful Individual

Hey @icassiem,

Looking at your multi-region AWS setup with PowerBI requirements, here are some practical approaches to consider:

Option 1: AWS-Native with PowerBI Gateway Set up your reporting database and ETL in your primary AWS region, then use PowerBI's on-premises data gateway (can run on EC2) to connect.

For the secondary region, replicate only critical data using AWS DMS or cross-region replication. This keeps most infrastructure AWS-native while enabling PowerBI connectivity.


Option 2: Centralized Data Lake Approach
Use S3 as your central data repository with cross-region replication. Build your ETL processes in both regions but have them write to the same S3 bucket structure.


PowerBI can connect directly to S3 via appropriate connectors, reducing the need for separate databases in each region.


Option 3: Hybrid Push Strategy
Since you mentioned under 1TB of data, consider using AWS Glue to transform and push final datasets to PowerBI's cloud service via API. This eliminates the gateway requirement and reduces VPC complexity.


Set up the push process in both regions targeting the same PowerBI workspace.


Building Your Azure Case:
For executive buy-in, emphasize that PowerBI's strength lies in its integration with Azure services. Point out that keeping data in Azure eliminates:

  • EC2 gateway maintenance overhead
  • VPC security complexities
  • Cross-region data sync challenges
  • Licensing costs for gateway infrastructure

The 1TB limit actually works in your favor - it's small enough that Azure storage costs are minimal compared to the operational complexity savings.

Would any of these directions align with your organizational constraints?


