brft
Frequent Visitor

Data getting lost between sequential dataflows

To increase performance when analyzing a large on-prem Excel file, I split a single dataflow into three sequential dataflows. Performance has increased substantially, but data is missing from one particular table and I can't understand why.

 

My lineage:

[Image: dataflow lineage view, brft_0-1658847852761.png]

Staging generates 3 tables. One of these is called "Notes", and it is the nearly raw data from one tab of the Excel file. This dataflow runs without issue, and I consistently see the following information in the refresh history:

Requested on: 26-07-2022 16:35
Dataflow name: File Staging Dataflow
Dataflow refresh status: Completed
Table name: Notes
Partition name: FullRefreshPolicyPartition
Refresh status: Completed
Start time: 26-07-2022 16:35
End time: 26-07-2022 16:35
Duration: 00:00:09.9120
Rows processed, Bytes processed (KB): 2459170 (the two values run together in the pasted text)
Max commit (KB): NA
Processor Time: NA
Wait time: 00:00:00.1180
Compute engine: Cached
Error: NA

 

The next dataflow takes "Notes" as a linked table, then does some transformations, resulting in a new table called "Clean Notes".  In the query editor, the results of the last step look a bit like this.  Clearly there are multiple rows of data.

[Image: query editor preview of the last step of "Clean Notes", brft_1-1658848073838.png]

After saving that dataflow, I can refresh either dataflow 2 (data cleaning) alone or dataflows 1 and 2 together. In either case, I see the following result for the output of the "Clean Notes" table. Only 1 row is processed (and indeed the resulting table has only 1 row). At the same time, the max commit is 421164 KB, which seems quite large.

Requested on: 26-07-2022 16:35
Dataflow name: Data Cleaning
Dataflow refresh status: Completed
Table name: Clean Notes
Partition name: FullRefreshPolicyPartition
Refresh status: Completed
Start time: 26-07-2022 16:35
End time: 26-07-2022 16:35
Duration: 00:00:19.4270
Rows processed: 1
Bytes processed (KB): 19
Max commit (KB): 421164
Processor Time: 00:00:18.8440
Wait time: 00:00:00.0510
Compute engine: Cached + folded
Error: NA
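
For anyone wanting to spot this pattern (very few rows processed alongside a large max commit) across many tables, here is a minimal Python sketch. It assumes the refresh history has been saved as CSV text with column names matching the headers above; the function name `flag_suspect_tables`, the thresholds, and every sample row except "Clean Notes" are hypothetical and only there for illustration.

```python
import csv
import io


def flag_suspect_tables(history_csv, max_rows=1, min_commit_kb=100_000):
    """Return table names with few rows processed but a large max commit.

    The thresholds are arbitrary assumptions, not values from Power BI.
    """
    suspects = []
    for row in csv.DictReader(io.StringIO(history_csv)):
        try:
            rows = int(row["Rows processed"])
            commit_kb = int(row["Max commit (KB)"])
        except ValueError:
            # Values such as "NA" cannot be compared, so skip the row.
            continue
        if rows <= max_rows and commit_kb >= min_commit_kb:
            suspects.append(row["Table name"])
    return suspects


# "Clean Notes" uses the figures from the refresh history above;
# the other two rows are made up to show the non-flagged cases.
sample = """Table name,Rows processed,Max commit (KB)
Clean Notes,1,421164
Hypothetical Healthy Table,250,9000
Hypothetical Table Without Metrics,10,NA
"""

print(flag_suspect_tables(sample))
```

This only highlights candidates for a closer look; it says nothing about why the rows went missing.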

 

Can anyone help me understand, and correct, why the query editor shows data but refreshing the dataflow does not write that data to the resulting table? I have experimented with changes to the compute engine, but with no obvious change. Incremental refresh is disabled for all dataflows. I can provide more information if I've left anything important out of the explanation. This is my first time implementing linked tables across multiple dataflows, so it could end up being something quite simple, but I've been bashing my head into this wall for several hours now.

 

 

 

3 REPLIES
brft
Frequent Visitor

One additional note--if I disable the compute engine for all three dataflows, the duration increases from about 1 minute to about 8 minutes. Tolerable in this case, but it makes me very doubtful about using the compute engine in the future. That being said, the problem does seem quite clearly related to some caching of data. I would understand, I suppose, if a key column remained constant and changes in other columns were missed in the refresh, but missing an update from 1 row to hundreds of rows seems quite odd. I don't have any keys defined in the flow here--would that help at all?

Hello - are any of your tables referencing the disabled table "Clean Details"?

No. Also no issues with this table. Earlier today I was having a problem where a column that was clearly present in the first dataflow was not detected in dataflow 2. Eventually I moved a bit of transformation from dataflow 2 to dataflow 1, which I think triggered a change to the schema and emptied the cache. I'm not exactly worried about that since it's fixed now, but I suppose it has the same root cause.
