Hello everyone,
I have a Dataflow Gen2 that copies data from a Dataverse table (D365) to a Lakehouse table.
The Dataverse table has 70,037 rows.
In the dataflow development screen I can see the same number of rows.
However, when I go to the resulting table, I only have 18,468 rows.
And if I re-run it, the number isn't the same.
In the end, the run always reports a success...
After some research I tried to deactivate ‘Fast Copy’, but it doesn't change anything...
Have you ever encountered this problem?
For information, the table is large (over 300 columns), but that shouldn't be the reason, especially as the run reports success.
Thanks in advance for your reply,
Vivien
I'd still recommend raising the support ticket so an engineer can take a closer look at your scenario and get the traces for it.
Would you happen to have a way in which we can reproduce the issue?
I initially tried to create a Dataflow with the sources you mentioned but was never able to get the same result that you describe. It could be something quite specific about your query and/or your sources, so a deeper investigation through the support route would be the best place to get a much closer look.
Before creating a ticket, I wanted to see if the community had already encountered this problem, as it does not seem specific to my environment.
Yes, to reproduce the problem, I think you just need to:
With the first code, some rows are lost, while with the second code, all rows are transferred.
In fact, as shown in this screenshot, the first code does not transfer all the rows because it cannot read them all.
It transfers every row it manages to read, but since it cannot read everything, not everything is transferred.
Could you confirm what code is being passed for each of those queries? The screenshot uses references that aren't visible (specifically Dataverse_Tablename).
I specified this in my reply here (dbo.Account).
Thanks for the info!
I'd recommend removing Staging from your queries if it's enabled. I would also recommend removing the added custom column step to see if you get the same issue and to isolate the variables, as you've mentioned that the issue could be strictly with the Value.NativeQuery usage.
There's a situation where the native query might not give you the same data that you get through the navigator of a connector. However, the difference would be at the table schema shown in the query and not the number of rows. This is because the connector provides projections or reference columns that have either "record" or "table" values which are not supported to be loaded to a Lakehouse, but they'll show up as columns in your query. A native query wouldn't "project" or provide those column references.
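As a rough sketch of what that looks like in M (assuming the Dataverse connector's CommonDataService.Database function; the environment URL and table name are placeholders), you could list and drop those reference columns from a navigator-based query like this:

let
    // Placeholder environment URL; replace with your own Dataverse environment.
    Source = CommonDataService.Database("yourorg.crm.dynamics.com"),
    // Navigator-style access to the account table through the dbo (TDS) schema.
    Account = Source{[Schema = "dbo", Item = "account"]}[Data],
    // Table.Schema reports the kind of each column; "record" and "table" kinds
    // are the projected reference columns that can't be loaded into a Lakehouse.
    ReferenceColumns = Table.SelectRows(Table.Schema(Account), each [Kind] = "record" or [Kind] = "table")[Name],
    // Remove them so the remaining columns can be written to the Lakehouse table.
    Result = Table.RemoveColumns(Account, ReferenceColumns)
in
    Result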
I finished doing some tests and I can see a behavior that I'd like to explain.
When we create a connection to a source, we may need to read multiple rows or make multiple calls to get the data that we need. The activity statistics may show the total of all the rows that were read to perform the activity, but that doesn't directly mean it was the output of the query that ran inside the activity.
Both of the attempts that I made show this behavior:
But they write the same number of rows, which is the number of rows that your query, in theory, should have.
You can run two validations to make sure that your query evaluates to the expected number of rows:
If both of the queries you've created are exactly equivalent to each other and they yield the same number of rows within the Power Query Editor (you can validate this through the process stated in #1 using "Count rows"), then this is something we would like to understand better and it should be raised as a support ticket so an engineer can take a closer look. With a support ticket we can get to the bottom of your scenario and determine why one query loads X rows whereas the other loads Y.
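For the row-count check in #1, a throwaway query along these lines (a minimal sketch; QueryWithNativeQuery and QueryWithNavigation are placeholders for your two queries) shows both evaluated counts side by side in the Power Query Editor:

let
    // Row count of the Value.NativeQuery-based query (placeholder name).
    CountNativeQuery = Table.RowCount(QueryWithNativeQuery),
    // Row count of the navigator-based query (placeholder name).
    CountNavigation = Table.RowCount(QueryWithNavigation),
    // Returning a record makes it easy to compare both numbers at a glance.
    Result = [NativeQuery = CountNativeQuery, Navigation = CountNavigation]
in
    Result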
@miguel,
Thank you very much for your feedback and testing.
I can confirm that the Staging feature is not enabled.
When I count the rows, I get more than 70,000 rows.
I did another test. I took only the 111 columns I need (out of 350).
And in this case, it works:
This confirms that it is Value.NativeQuery that causes problems when there are too many columns/too much data.
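For reference, the working variant keeps Value.NativeQuery but lists only the needed columns in the SELECT, roughly like this (the environment URL and column names below are only illustrative, not my actual 111 columns):

let
    // Placeholder environment URL.
    Source = CommonDataService.Database("yourorg.crm.dynamics.com"),
    // Listing the needed columns explicitly instead of SELECT *.
    Narrow = Value.NativeQuery(Source, "SELECT accountid, name, createdon FROM dbo.account")
in
    Narrow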
Can you reproduce a test on your side with a volume (rows and columns) equivalent to what I have on my side?
Thank you in advance for your feedback.
Vivien
I don't have an environment that matches that criteria, but it would help tremendously if our engineering team can capture some traces for your Dataflow. They wouldn't need access to the environment, but just be able to capture the traces from your account / dataflow with explicit consent and do some deeper investigations. Definitely do feel free to raise the support ticket if you have a chance and our team can get to the bottom of it.
In theory a "SELECT * FROM dbo.Account" is not exactly the same query plan that shows up for the query without the Value.NativeQuery, so the team might start by analyzing that first.
Hello,
I have just made some progress in investigating the issue.
Here are two Dataflow Gen2 queries (in the first one, Dataverse_Tablename = ‘account’).
With the first one, I lose rows as explained above, but not with the second one.
This seems to be caused by the use of the Value.NativeQuery step.
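For clarity (the exact code is in the screenshots), the two queries follow these patterns, with a placeholder environment URL:

// Query 1: native query variant, which loses rows.
let
    Dataverse_Tablename = "account",
    Source = CommonDataService.Database("yourorg.crm.dynamics.com"),
    Data = Value.NativeQuery(Source, "SELECT * FROM dbo." & Dataverse_Tablename)
in
    Data

// Query 2: navigator-based variant, which transfers all rows.
let
    Source = CommonDataService.Database("yourorg.crm.dynamics.com"),
    Data = Source{[Schema = "dbo", Item = "account"]}[Data]
in
    Data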
Does that give you any ideas? Have you ever encountered this problem before?
Thanks in advance,
Have a nice day,
Vivien
Hello @vivien57
First, can you try selecting only a few columns?
Then, to check whether the issue comes from Dataverse or from the Fabric side, could you try executing it in a Dataflow Gen1 and check whether you retrieve all the data in your Power BI report?
Keep us informed about the results.
Hello,
Thank you for your feedback.
By randomly removing 87 of 366 columns, I managed to read and write 32,152 rows out of 70,037.
By randomly removing 151 of 366 columns, I managed to read and write 54,922 rows out of 70,037.
By randomly removing 364 of 366 columns, I managed to read and write 70,037 rows out of 70,037.
So there seems to be a link between the number of columns and this problem.
These tests were done on an F2 capacity; I also tried an F64 capacity, which doesn't solve the problem.
Do you have any idea what the problem is? It seems to come from Dataflow Gen2...
Thanks in advance,
Vivien
Hello @v-saisrao-msft ,
No the issue is not resolved. I have tested again, and it's the same.
Thanks in advance,
Vivien
Hi @vivien57,
May I know if the issue has been resolved? If not, please raise a support ticket as suggested by @miguel, so that the backend team can investigate the issue further.
Below is the link to raise the support ticket:
How to create a Fabric and Power BI Support ticket - Power BI | Microsoft Learn
Thank you.
Would you mind raising a support ticket so an engineer can take a closer look at your scenario and collect all the information needed to troubleshoot? Below is the link to raise the support ticket:
Hi @vivien57,
Please see the thread below, which outlines an issue similar to yours. In this case, the data preview in Dataflow Gen2 displays correctly, but some rows are missing in the Lakehouse after publishing.
Solved: Dataflow Gen2 to Lakehouse - missing rows - Microsoft Fabric Community
Thank you.
Hello,
I am on vacation. I will do more tests when I get back in mid-August.
Vivien