Solved: Data Environment Options Follow-Up

icassiem · ‎06-10-2026

Good day,

I previously requested assistance with designing a small data platform for portfolio reporting:

Solved: Data Environment Options - Microsoft Fabric Community

My source environment spans 2 AWS regions and uses a frontend report export API (RDS PostgreSQL + S3 + MongoDB).

The API applies the data logic; it is a Java-developed API over S3. These product data export APIs are my primary ingest source, with the prod sources (RDS + S3 + Mongo) as a secondary option, accessed with a service account. I need to go small scale, as I will try to motivate for funding, but I need a data store, ingest, compute/transform, orchestration, an API loop, and Power BI to read. I received a boost when I was told that the reports and data need to be reusable, which encourages me towards a store + semantic SQL views, but my budget is low. The AI narratives are only a side benefit, not the core purpose.

I have a current report that uses an Azure Function costing 150 USD. I thought I could now decommission this report and push for an F2 license at 280 USD, with Fabric + Python ingest and orchestration for Power BI DirectQuery reporting. I think I could also use SQL Server Management Studio to query the data and build up reporting over time. Can I do this and still be compliant (GDPR), meaning the raw data resides in region but the aggregates are in Fabric? Are there storage limitations, etc.?

Like @Natarajan_M pointed out, this is the architecture I could use:

-------

F2 (~$262/month) is your entire stack in one purchase:

Fabric Lakehouse — your storage (replaces needing Azure SQL or Dataverse)
Fabric Notebooks — your ETL (Python script that loops the API per client weekly, writes to Lakehouse)
Fabric Pipelines — your scheduler (automates the notebook runs)
Direct Lake — Power BI reads from Lakehouse with no import refresh needed
Copilot — AI narratives, natural language Q&A, DAX generation

You don't need Dataverse, Azure SQL, or to migrate off AWS. Just read from the product API and store results in Fabric Lakehouse.
Architecture:
Product API → Fabric Notebook (loops per client) → Lakehouse → Power BI + Copilot

-------

I want to design, motivate and foolproof this for my CTO. Does anyone challenge this, and is it my best approach? It remains low cost and covers full storage over time. How do storage, ingest and consumption add on top of the F2 cost? I can't afford F3, as I think it's roughly 600 USD. They would be happy with manual refresh, but I'm trying to build a small data platform. With this I could possibly push to 400 USD if I motivate well, because I am reusing existing cost that is wasted. But I need to be sure that Python can loop over these AWS client-facing APIs, or query the regions' S3 + RDS, and orchestrate and transform for aggregated data manipulation in Python; then SQL to build semantic views; and then consume, or even compute some ML forecasting (FC) in the future as an added benefit, like the AI Copilot.

Please help: is this a good method, and the strongest one? My Power BI reports plug into the Fabric SQL views. Remember that I need to aggregate on ingest for compliance, and I need to motivate why Fabric, because we are an AWS product org. My angle is the easy integration with Power BI, and that I could write scripts across the Fabric store, combining both regions' data as consolidated. My prod AWS analytics account environment was decommissioned due to cost, but in my new role of consulting and client-facing reporting I need to show what our product API can do and also aid our consulting team, hence narratives, forecasting, etc.

Please help if you have any suggestions, improvements, better ideas, documents, links, ERDs, motivation, etc. to help me.

Thank You

v-dineshya · ‎06-12-2026

Hi @icassiem ,

1. ERD / Architecture Example.

AWS (API / S3 / RDS / Mongo)
|
Fabric Notebook (Python ingest + transform, remove PII)
|
Lakehouse Tables (Silver Layer only)
|
SQL Views (Semantic Layer)
|
Power BI (Direct Lake / DirectQuery)

Example tables: fact_usage_metrics, dim_client, dim_product and dim_date
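
To make the layering above concrete, here is a minimal sketch of a semantic view over two of the example tables. It uses Python's built-in sqlite3 as a stand-in for the Lakehouse SQL endpoint, which is an assumption made only so the example runs anywhere; the columns (usage_date, events), the view name vw_client_usage and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the Lakehouse SQL endpoint (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_client (client_id INTEGER PRIMARY KEY, client_name TEXT);
CREATE TABLE fact_usage_metrics (client_id INTEGER, usage_date TEXT, events INTEGER);

INSERT INTO dim_client VALUES (1, 'Client A'), (2, 'Client B');
INSERT INTO fact_usage_metrics VALUES
    (1, '2026-06-01', 10),
    (1, '2026-06-02', 5),
    (2, '2026-06-01', 7);

-- The semantic layer is a view over the stored tables, so it adds no storage.
CREATE VIEW vw_client_usage AS
SELECT c.client_name, SUM(f.events) AS total_events
FROM fact_usage_metrics AS f
JOIN dim_client AS c ON c.client_id = f.client_id
GROUP BY c.client_name;
""")

rows = conn.execute(
    "SELECT client_name, total_events FROM vw_client_usage ORDER BY client_name"
).fetchall()
print(rows)
```

The same join-and-aggregate shape should carry over to a T-SQL view on the Fabric SQL endpoint, though the exact DDL there may differ.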

2. Motivation + Risks: Replace the Azure Function ($150) and potential DB / ETL tools with a single Fabric capacity.

Note: The benefits are reusable data for reports, a centralized semantic layer, and data shared across consulting + clients. Risks you must call out: capacity limits, as F2 is small compute (2 CUs) and can slow down with too many users or large joins.

Solution: Pre-aggregate data and keep models small.

3. Medallion — Do you need it?

Try the architecture below.

API --> Python (clean + aggregate + remove PII) --> Silver tables
|
SQL Views (Gold)

Note: No Bronze (saves cost), Silver = your storage, and Gold = SQL views (no extra storage).
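
As a rough sketch of the "clean + aggregate + remove PII" step that feeds the Silver tables, the snippet below drops PII fields and sums a measure per client and week. The field names (email, full_name, amount, week) are hypothetical and would need to match what the export API actually returns.

```python
from collections import defaultdict

# Hypothetical PII field names; adjust to the real API payload.
PII_FIELDS = {"email", "full_name"}

def strip_pii(record):
    """Return a copy of the record without the PII fields."""
    return {k: v for k, v in record.items() if k not in PII_FIELDS}

def aggregate_by_client_week(records):
    """Sum 'amount' per (client_id, week) so no row-level detail is stored."""
    totals = defaultdict(float)
    for r in records:
        totals[(r["client_id"], r["week"])] += r["amount"]
    return [
        {"client_id": c, "week": w, "total_amount": t}
        for (c, w), t in sorted(totals.items())
    ]

raw = [
    {"client_id": "c1", "week": "2026-W23", "amount": 10.0, "email": "a@example.com", "full_name": "A"},
    {"client_id": "c1", "week": "2026-W23", "amount": 2.5, "email": "b@example.com", "full_name": "B"},
    {"client_id": "c2", "week": "2026-W23", "amount": 4.0, "email": "c@example.com", "full_name": "C"},
]
silver_rows = aggregate_by_client_week(strip_pii(r) for r in raw)
print(silver_rows)
```

Only silver_rows would be written to the Lakehouse in this sketch; the raw list never leaves the notebook session.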

4. Orchestration: The best option is Fabric Pipelines. You can schedule notebook runs and handle retries, at no extra cost (included in F2).

5. Query using SSMS: Yes, this is supported. Please follow the steps below.

Go to Lakehouse
Open SQL Endpoint
Copy connection string
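Connect via SSMS using Microsoft Entra ID (formerly AAD) authentication; SQL authentication is not supported on the Fabric SQL endpoint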

Note: Fabric exposes a T-SQL endpoint over the Lakehouse.

6. Dev Access + Power BI Connectivity:

Development: You can use the Fabric web portal (primary) and VS Code (optional, for notebooks); Git integration is available.

Power BI: Use Direct Lake to connect to the Lakehouse tables; no gateway is needed.

Note: A gateway is only needed if you connect to on-prem data.

7. Why NOT AWS RDS:

AWS RDS --> Operational DB, not analytics
Power BI + RDS --> Security risk (prod exposure)
Scaling --> Expensive + manual
Transformation --> No built-in ETL

Note: Fabric provides a separate analytics layer with no impact on prod, built-in ETL + BI, and lower total cost.

8. Copilot usage: It works in Power BI (narratives, DAX, Q&A) and in Fabric notebooks (code assist).

Note: It is not full pipeline automation and does not replace ETL. It is an “enhancement layer, not core architecture”.

9. Python vs SQL roles: Python handles API ingest, transformation, aggregation and PII removal. SQL handles views, joins and the semantic layer.

10. Ingestion Pattern (overwrite vs CDC): The simple approach (recommended for F2) is to overwrite or append, using timestamps. A better approach is incremental loads (by date) with partitioned tables.

Note: CDC is overkill for your case and adds complexity. You do NOT need Azure Data Factory. No extra infrastructure is needed: everything is SaaS, with no hidden infra costs.

11. Backup Plan (Power BI only approach): Yes, you can, but there is no central data store, API calls are repeated, and it is neither reusable nor scalable.

Note: It is acceptable only for a very small MVP. A Power BI-only approach creates datasets and is not a reusable data platform.
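
The incremental-by-date idea from point 10 can be sketched as a simple watermark filter. This is a minimal illustration, assuming each row carries a hypothetical report_date field and that the last loaded date is stored somewhere between runs (for example in a small control table).

```python
from datetime import date

def select_new_rows(rows, last_loaded):
    """Keep only rows whose 'report_date' is after the stored watermark."""
    return [r for r in rows if r["report_date"] > last_loaded]

def next_watermark(rows, last_loaded):
    """Advance the watermark to the newest date seen, or keep it if nothing is new."""
    return max((r["report_date"] for r in rows), default=last_loaded)

api_rows = [
    {"report_date": date(2026, 6, 1), "value": 1},
    {"report_date": date(2026, 6, 8), "value": 2},
    {"report_date": date(2026, 6, 15), "value": 3},
]
watermark = date(2026, 6, 1)  # loaded up to and including this date last time

new_rows = select_new_rows(api_rows, watermark)
watermark = next_watermark(new_rows, watermark)
print(len(new_rows), watermark)
```

Appending only new_rows on each run keeps the work done on F2 small, at the cost of having to persist the watermark reliably.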

I hope this information helps. Please do let us know if you have any further queries.

Regards,

Dinesh


v-dineshya · ‎06-11-2026

Hi @icassiem ,

Thank you for reaching out to the Microsoft Community Forum.

Yes, your proposed Fabric architecture is one of the best low-cost options available today and 100% achievable on F2 capacity.

Regarding your queries, please see below.

Can Python in Fabric call AWS APIs + loop clients?

Yes, Fabric Notebooks run Python and can call REST APIs, loop through tenants/clients, handle JSON/CSV responses and write to Lakehouse tables. This replaces your Azure Function entirely.
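
A minimal sketch of such a per-client loop, using only the Python standard library, might look like the following. The endpoint path /clients/{client_id}/report, the bearer-token header and the row fields are hypothetical, since the thread does not describe the actual export API; fake_fetch is a stub that replaces the network call so the loop can be run as-is.

```python
import json
from urllib.request import Request, urlopen

def fetch_report(base_url, client_id, token):
    """Call a hypothetical export endpoint for one client and parse the JSON body."""
    req = Request(
        f"{base_url}/clients/{client_id}/report",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)

def ingest_all(client_ids, fetch):
    """Loop over clients, tag each row with its client, and collect failures."""
    rows, failed = [], []
    for client_id in client_ids:
        try:
            for row in fetch(client_id):
                rows.append({**row, "client_id": client_id})
        except Exception as exc:  # one failing client should not stop the run
            failed.append((client_id, str(exc)))
    return rows, failed

# Stub standing in for the real API call, so the example runs offline.
def fake_fetch(client_id):
    if client_id == "c2":
        raise RuntimeError("timeout")
    return [{"metric": "logins", "value": 3}]

rows, failed = ingest_all(["c1", "c2"], fake_fetch)
print(rows, failed)
```

In a real notebook, fetch would wrap fetch_report with the base URL and service-account token, and rows would then be written to a Lakehouse table.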

Can it read AWS S3 / RDS if needed?

Yes, it reads via REST APIs, JDBC for PostgreSQL and S3 connectors.

Can you build SQL semantic views?

Yes, Fabric Lakehouse exposes a SQL endpoint where you can create views, aggregate data and build reporting models. This becomes your “semantic layer”.

Power BI integration?

Yes, Power BI can connect to the Lakehouse (Direct Lake or DirectQuery).

GDPR & Compliance?

This is compliant if you do not store raw PII in Fabric, you mask/anonymize during ingest, and you define a retention policy and purpose limitation.

Note: Fabric is used only for aggregated and reporting-level data, not raw personal data. That aligns with GDPR principles.
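
One way to mask identifiers during ingest is keyed hashing, sketched below with the standard hmac module. The field name user_email and the key are placeholders. Note that keyed hashing is pseudonymisation rather than full anonymisation, so whether it is sufficient for your case is a question for your compliance team.

```python
import hashlib
import hmac

def pseudonymize(value, secret_key):
    """Return a keyed SHA-256 digest of the value (stable for a given key)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def mask_record(record, pii_fields, secret_key):
    """Replace the listed PII fields with their pseudonyms; keep other fields as-is."""
    return {
        k: pseudonymize(str(v), secret_key) if k in pii_fields else v
        for k, v in record.items()
    }

# Placeholder key; in practice it would come from a secret store, not the notebook.
key = b"replace-with-a-secret-kept-outside-the-notebook"
masked = mask_record({"user_email": "a@example.com", "logins": 4}, {"user_email"}, key)
print(masked)
```

Because the digest is stable for a given key, the masked value can still be used to count distinct users or join tables without exposing the original identifier.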

Architecture Improvements:

1. Layer your Lakehouse properly, even in a small setup: Bronze (raw API response, temporary), Silver (cleaned + structured) and Gold (aggregated reporting tables). Then delete Bronze data if needed (for GDPR). Power BI reads from Gold.

2. Use incremental ingestion: instead of a full reload, only pull changed data per run.

3. Add simple orchestration via Pipelines: schedule the notebook weekly/daily and add retry logic for API failures.

4. Keep datasets small: because you are on F2, avoid huge joins and pre-aggregate early.
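
The retry logic mentioned in point 3 can also live inside the notebook itself. Below is a minimal sketch of retry with exponential backoff; the function name call_with_retry, its defaults and the flaky stub are illustrative assumptions, and the sleep function is injectable so the example runs without waiting.

```python
import time

def call_with_retry(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**n, then retry.

    The exception from the final attempt is re-raised to the caller.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)

# Stub API call that fails twice and then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

delays = []
result = call_with_retry(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)
```

Pipeline-level retries and in-code retries can be combined: the in-code version handles brief API hiccups, while the pipeline retry covers a whole failed run.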

Please refer to the links below.

Lakehouse end-to-end scenario: overview and architecture - Microsoft Fabric | Microsoft Learn

What is a lakehouse? - Microsoft Fabric | Microsoft Learn

How to use notebooks - Microsoft Fabric | Microsoft Learn

Direct Lake overview - Microsoft Fabric | Microsoft Learn

Microsoft Fabric - Pricing | Microsoft Azure

Governance and compliance in Microsoft Fabric - Microsoft Fabric | Microsoft Learn

Standards compliance in Microsoft Fabric - Microsoft Fabric | Microsoft Learn

I hope this information helps. Please do let us know if you have any further queries.

Regards,

Dinesh

icassiem · ‎06-11-2026

@v-dineshya Thank You very much

1. Do you have ERD, environment and solution architecture examples, or supporting links and docs, please?

2. What motivation or risks do I need to cover for my proposal?

3. I probably can't do the medallion architecture because of storage, as I guess it adds costs. Maybe go straight into a Silver layer, where the Python pipeline aggregates and removes PII data? Maybe only Silver, with the SQL views as the Gold semantic layer: it's not storage, but a business-transformed or pre-access layer?

4. Orchestration?

5. How do I query with SSMS?

6. How would I access Fabric from my PC for dev, and does Power BI need a gateway to connect to the SQL views?

7. Have I covered all angles? Is there nothing else, like a DB in Azure? They will ask why not an RDS DB in AWS, but that raises Power BI security issues with prod RDS, and means a DB on the prod product environment, etc.

8. Thereafter I can use Copilot in Power BI for narratives. Does it allow me to use Copilot elsewhere in my data management, ingest or transform?

9. So everything is via Python, correct? The ingest, store and transform, with SQL only for views and queries?

10. Would my ingest be an overwrite, or is there a CDC manner? Can I use Data Factory with Python, or is that an additional cost? And if PC development needs to deploy to a server, are those all additional costs?

Sorry for all the questions, I'm really trying to be sure I cover all angles, etc.

Please help

icassiem · ‎06-11-2026

I don't think it's a big ask, motivating from 170 USD to 263 USD for the storage and for doing internal reporting with F2. The client essay narratives from the ask, to be distributed, and the summaries are good bonuses, so this is my first-prize pitch.

But worst case, could I do this in Power BI? I know I can schedule a Python refresh in the service, but could Power Query loop over the per-client API to do the ingest and publish the extracts as datasets, for me to then do the reporting from the datasets, almost like data marts with Power BI as the store?

"just thinking out loud"

Please Help
