Dataflow Join Bug
I have a JSON file with many nested arrays that I'm trying to flatten into a single table using a Dataflow Gen2. I have a main table (table0) that contains some metadata about each record and an expandable object. I added a few steps to filter out unwanted records from table0.
I referenced table0 to make two more tables (table1 and table2), expanding a different nested array in each but keeping the id field, which is the unique record identifier. I then left joined table1 to table2. As soon as I expanded table2 to add its fields to table1, 81 records disappeared from table1 (the left side of the join) and were replaced by the same number of new records, all of which had been filtered out of table0 (the source for both table1 and table2). I compared the datasets immediately before and after expanding the fields and confirmed that this step introduces the bad data. Is this a known issue, or has anyone seen anything like it before? It's making dataflows unusable.
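For reference, the intended logic can be sketched outside of Dataflow Gen2 in plain Python. This is only an illustration of the expected result, not the dataflow's actual implementation; the sample records and the field names `status`, `items`, and `tags` are hypothetical, and only `id`, table0, table1, and table2 come from the description above.

```python
# Hypothetical records; field names other than "id" are illustrative only.
records = [
    {"id": 1, "status": "active", "items": [{"sku": "a"}, {"sku": "b"}], "tags": [{"tag": "x"}]},
    {"id": 2, "status": "deleted", "items": [{"sku": "c"}], "tags": [{"tag": "y"}]},
    {"id": 3, "status": "active", "items": [{"sku": "d"}], "tags": []},
]

def expand(rows, field):
    """Return one row per element of the nested array, keeping the id."""
    return [{"id": r["id"], **elem} for r in rows for elem in r[field]]

def left_join(left, right, key):
    """Left outer join of two lists of dicts on a shared key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    out = []
    for l_row in left:
        matches = index.get(l_row[key])
        if matches:
            out.extend({**l_row, **m} for m in matches)
        else:
            # No match: keep the left row unchanged (right-side fields absent).
            out.append(dict(l_row))
    return out

# table0: source table with unwanted records filtered out.
table0 = [r for r in records if r["status"] == "active"]
# table1 and table2: each expands a different nested array of table0.
table1 = expand(table0, "items")
table2 = expand(table0, "tags")
joined = left_join(table1, table2, "id")
```

In this sketch every id in `joined` comes from `table1`, so a record filtered out of `table0` (id 2 here) cannot appear in the result. That is the property the dataflow output violated.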
Thanks for the reply @v-yilong-msft. I double-checked the tables and joins and couldn't find anything suspicious. I also did not find anything relevant on the known issues site. My guess is that it has something to do with the way the system parses the JSON, possibly (mis)handling empty arrays (even though I am actively filtering them out). So I've abandoned trying to parse the file with a dataflow and am switching to a Spark notebook instead.
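To illustrate why empty arrays are worth watching when expanding, here is a minimal plain-Python sketch. It only shows how the two common expansion behaviors differ in row counts; it does not confirm that this was the cause of the problem. The `expand` helper, its `keep_empty` flag, and the `vals` field are hypothetical.

```python
def expand(rows, field, keep_empty=False):
    """Expand a nested array into one row per element.

    With keep_empty=False, records whose array is empty or missing vanish.
    With keep_empty=True, they survive as a single row with a None value.
    """
    out = []
    for r in rows:
        elems = r.get(field) or []
        if not elems and keep_empty:
            out.append({"id": r["id"], field: None})
        for e in elems:
            out.append({"id": r["id"], field: e})
    return out

rows = [{"id": 1, "vals": [10, 20]}, {"id": 2, "vals": []}, {"id": 3}]
inner = expand(rows, "vals")                    # ids 2 and 3 are dropped
outer = expand(rows, "vals", keep_empty=True)   # ids 2 and 3 are kept as None
```

In a Spark notebook, the same distinction exists between `pyspark.sql.functions.explode`, which drops rows with null or empty arrays, and `explode_outer`, which keeps them with a null value.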
Hi @pace-jp,
The behavior you describe, where records that were filtered out reappear once you expand fields after the join, is unusual.
I suggest the following checks:
1. Ensure that the join conditions are correctly specified and that there are no unintended matches causing the reintroduction of filtered records.
2. Verify that the data in table0, table1, and table2 is consistent and that the filtering steps are correctly applied before the join operation.
3. The issue might be related to how the expansion step is handled. Double-check the configuration of this step to ensure it doesn't inadvertently reintroduce filtered records.
4. You can also look at this document about dataflow known issues below:
Fabric known issues - Microsoft Fabric | Microsoft Learn
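The checks above can be made concrete by comparing the set of ids before and after the suspect step. The following is a minimal sketch in plain Python, assuming both tables have been exported as lists of records; the `compare_ids` helper and the sample data are hypothetical.

```python
def compare_ids(before, after, key="id"):
    """Report ids that a step dropped and ids it introduced."""
    ids_before = {row[key] for row in before}
    ids_after = {row[key] for row in after}
    return {
        "dropped": sorted(ids_before - ids_after),
        "introduced": sorted(ids_after - ids_before),
    }

# Hypothetical snapshots taken right before and right after the expand step.
before_expand = [{"id": 1}, {"id": 2}, {"id": 3}]
after_expand = [{"id": 1}, {"id": 3}, {"id": 9}]
report = compare_ids(before_expand, after_expand)
```

For a left join followed by an expand, both lists in the report should be empty. Any id under `introduced` can then be checked against the records that were filtered out of table0.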
Best Regards
Yilong Zhou
If this post helps, please consider accepting it as the solution so that other members can find it more quickly.
