
Hussein_charif
Helper IV

Getting real-time data for large datasets

Hi,

I have just recently started using/learning Microsoft Fabric. I am using a Business Central API to pull data into a dataflow, from the dataflow into a lakehouse, and then from the lakehouse into a warehouse using data pipelines.

All the tables load into my dataflow very quickly, except the fact table, which contains a very large amount of data (300M+ rows).

As for the data pipeline that moves the data from my lakehouse to the warehouse, it takes roughly 5 minutes.

 

I researched and found the following options:
For my dataflows, I could apply an incremental refresh to load only new or updated data. That way the dataflow doesn't have to reload everything, which should make fetching the data much faster.
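In a Fabric dataflow, incremental refresh is configured through the dataflow's settings rather than written by hand, but conceptually it filters the source on a date/watermark column so only a recent window is reloaded. A minimal sketch of that idea (Python, with hypothetical column names, not the actual dataflow mechanism):

```python
from datetime import datetime, timedelta

def incremental_window(now, days_to_refresh=1):
    """Return the [start, end) window an incremental refresh reloads;
    rows outside the window are kept from the previous load."""
    return now - timedelta(days=days_to_refresh), now

# Hypothetical fact rows with a "modified" watermark column.
rows = [
    {"id": 1, "modified": datetime(2025, 6, 1)},
    {"id": 2, "modified": datetime(2025, 6, 30, 8, 0)},
]

start, end = incremental_window(datetime(2025, 6, 30, 12, 0))
to_reload = [r for r in rows if start <= r["modified"] < end]
print([r["id"] for r in to_reload])  # only the recently modified row
```

The key requirement is a reliable "last modified" (or date) column in the source, since the refresh only touches rows whose watermark falls inside the window.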

 

For the data pipeline, I found something called "data pipeline streaming" (if I am not mistaken), which would supposedly make the step where my pipeline moves data from the lakehouse to the warehouse almost instant and continuous.

 

Can anyone please provide clear, detailed steps to apply these techniques, or suggest better options if there are any?

1 ACCEPTED SOLUTION

Consider dropping the dataflow idea and using incremental refresh for your semantic model instead. You can combine that with a "today" DirectQuery partition.

 

Advanced incremental refresh and real-time data with the XMLA endpoint in Power BI - Power BI | Micr...
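For reference, the XMLA approach in that article comes down to sending a TMSL "refresh" command that reprocesses only the hot partition, leaving the historical partitions untouched. A minimal sketch of building such a command in Python (the database, table, and partition names here are hypothetical):

```python
import json

def partition_refresh_command(database, table, partition):
    """Build a TMSL 'refresh' command targeting a single partition,
    so the historical partitions are not reprocessed."""
    return {
        "refresh": {
            "type": "full",
            "objects": [
                {"database": database, "table": table, "partition": partition}
            ],
        }
    }

# Hypothetical names for a daily hot partition on the fact table.
cmd = partition_refresh_command("SalesModel", "FactSales", "Today")
print(json.dumps(cmd, indent=2))
```

The command itself would be sent over the workspace's XMLA endpoint (for example from SQL Server Management Studio or a tool such as Tabular Editor); the sketch only shows the shape of the command, not the connection.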


6 REPLIES
v-priyankata
Community Support

Hi @Hussein_charif 
I hope this information is helpful. Please let me know if you have any further questions or if you'd like to discuss this further. If this answers your question, please Accept it as a solution and give it a 'Kudos' so others can find it easily.
Thank you.

v-priyankata
Community Support

Hi @Hussein_charif 
I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions. If my response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.
Thank you.

v-priyankata
Community Support

Hi @Hussein_charif 

May I ask if you have resolved this issue? If so, please mark the helpful reply and accept it as the solution. This will help other community members with similar problems find the answer faster.

Thank you.

lbendlin
Super User

300M rows is considered small. You won't gain much performance from incremental refresh.

 

You state your update takes 5 minutes. Is that not fast enough? What would be your expectation?

Hi, sorry for not clarifying further: the data is 300M rows per month, and I was told the client wants it as fast as 30 seconds to 1 minute.

