yash14vashistha
Helper III

Microsoft Fabric Lakehouse shows blank data

Hello All!

 

I have created a Dataflow Gen2 dataflow to bring 900 JSON files from Azure Blob Storage into a Lakehouse. The dataflow ran and completed in 44 seconds, which seemed like a very short time to me. I could see that the table was created, but when I navigated into it I could not see any data -

[screenshot yash14vashistha_0-1736382695385.png: the new Lakehouse table shows no data]

 

Moving forward with the process, I loaded this data into a semantic model I had created, which already contains two other datasets brought in from the same blob storage; however, those only brought in one and two files respectively. I have also created a Power BI report connected to the model, in which I built a table with the data coming from the 900 files. However, that table came out blank. I am not sure where I am going wrong.

 

Any help will be greatly appreciated. 


14 REPLIES
Anonymous
Not applicable

Hi @yash14vashistha 

F16 capacity might not be sufficient for efficiently loading and processing 45 GB of data in Dataflow Gen2 at once.
The F16 capacity provides 16 Capacity Units (CUs), which may not offer the necessary compute power for handling such a large dataset in a single run.
To ensure smoother processing, we recommend upgrading to the F64 capacity, which provides 64 CUs and is better suited for larger, more demanding workloads.
For more information, please refer to the following link:
Understand the metrics app compute page - Microsoft Fabric | Microsoft Learn

If this solution helps, please consider giving us Kudos and accepting it as the solution so that it may assist other members in the community.

Thank you.

Anonymous
Not applicable

Hi @yash14vashistha 
I wanted to check if you had the opportunity to review the information provided. Please feel free to contact us if you have any further questions. If my response has addressed your query, please accept it as a solution and give a 'Kudos' so other members can easily find it.
Thank you.

Anonymous
Not applicable

Hi @yash14vashistha 
Thank you for reaching out to the Microsoft Fabric community forum.

Could you please confirm whether your query has been resolved? If it has, kindly mark the helpful response, give kudos, and accept it as the solution. This will assist other community members in resolving similar issues more efficiently.
Best regards,
shamili.v

Hi @Anonymous , 

Unfortunately, my query still stands, as we could not figure out why the dataflow errors out when there is enough capacity. I am having to manually break the data into chunks and use smaller dataflows, which is consuming a lot of my time.

govindarajan_d
Super User

Hi @yash14vashistha,

 

Can you show the M query expression in DF g2? I believe the JSON files are not being properly transformed when written to the table. Did you check the schema of the output table in DF g2 when configuring the destination?

Hi @govindarajan_d ,

Thanks for your help! I am seeing an error when I run the Power Query - 

[screenshot yash14vashistha_0-1736444201471.png: Power Query evaluation error]

 

A bit of background: the blob is connected to Fabric using a VNet gateway. More information in these links -

Create virtual network (VNet) data gateways | Microsoft Learn

Use VNet data gateways with Dataflow Gen2 in Fabric | Microsoft Learn

I believe that the gateway is running out of memory but I do not know how to fix this.

 

Please see the M query expression for the query I am bringing in from the blob storage that contains the MGDC Files dataset JSON files. I am only bringing in 900 files from the blob container, which contains 9,500 files. I am using a helper query to expand the hierarchy of the JSON files. Note that I used the exact same query (minus the filtering down to 900 files) when I brought in 2 files. I did check the schema of the output and it seemed fine, again the same as the other 2 queries I ran earlier.

 

Query for 900 Files

let
    Source = AzureStorage.Blobs("https://abc.blob.core.windows.net/"),
    #"Navigation 1" = Source{[Name = "dataset"]}[Data],
    #"Inserted first characters" = Table.AddColumn(#"Navigation 1", "First characters", each Text.Start([Name], 10), type text),
    #"Filtered rows" = Table.SelectRows(#"Inserted first characters", each ([Name] <> "metadata" and [Name] <> "metadata/JobMetadata" and [Name] <> "metadata/JobMetadata/part-00000-5923a466-66b9-477c-9961-fa5d93c955f5-c000.json" and [Name] <> "metadata/JobMetadata/part-00000-6dc9fc0d-7465-4e7e-919c-61084ce0a9a7-c000.json" and [Name] <> "metadata/JobMetadata/part-00000-9347ec14-1fc2-42d2-8901-c3580cd096b5-c000.json" and [Name] <> "metadata/JobMetadata/part-00000-97913f6c-035d-4206-a797-0c5918c657c6-c000.json" and [Name] <> "part-00000-14915d58-2280-45ad-950b-0e12ed81a720-c000.json" and [Name] <> "part-00000-1c74c82d-6013-496d-a07d-d97c718f94ea-c000.json" and [Name] <> "part-00000-37e778da-4e0c-44ea-a944-9e883b08740e-c000.json" and [Name] <> "part-00000-50acc675-7694-4d64-8231-275e457e6f4b-c000.json")),
    #"Split column by delimiter 1" = Table.SplitColumn(#"Filtered rows", "First characters", Splitter.SplitTextByDelimiter("-"), {"First characters.1", "First characters.2"}),
    #"Changed column type" = Table.TransformColumnTypes(#"Split column by delimiter 1", {{"First characters.2", Int64.Type}}),
    #"Filtered rows 1" = Table.SelectRows(#"Changed column type", each [First characters.2] >= 4 and [First characters.2] <= 903),
    #"Removed columns" = Table.RemoveColumns(#"Filtered rows 1", {"First characters.2", "First characters.1"}),
    #"Invoked custom function" = Table.AddColumn(#"Removed columns", "Invoked custom function", each Query([Content])),
    #"Expanded Invoked custom function" = Table.ExpandTableColumn(#"Invoked custom function", "Invoked custom function", {"ptenant", "SiteId", "WebId", "ListId", "ItemId", "ScopeId", "SiteUrl", "DirName", "FileName", "Extension", "WebTemplateId", "ListServerTemplate", "IsLabelEncrypted", "MajorVersion", "MinorVersion", "SensitivityLabelInfo.DisplayName", "SensitivityLabelInfo.Id", "SizeInBytes", "SizeInBytesWithVersions", "TimeCreated", "TimeLastModified", "Author.Email", "Author.Name", "ModifiedBy.Email", "ModifiedBy.Name", "SnapshotDate", "Operation", "QuickXorHash"}, {"ptenant", "SiteId", "WebId", "ListId", "ItemId", "ScopeId", "SiteUrl", "DirName", "FileName", "Extension.1", "WebTemplateId", "ListServerTemplate", "IsLabelEncrypted", "MajorVersion", "MinorVersion", "SensitivityLabelInfo.DisplayName", "SensitivityLabelInfo.Id", "SizeInBytes", "SizeInBytesWithVersions", "TimeCreated", "TimeLastModified", "Author.Email", "Author.Name", "ModifiedBy.Email", "ModifiedBy.Name", "SnapshotDate", "Operation", "QuickXorHash"}),
    Custom = Table.TransformColumnTypes(#"Expanded Invoked custom function", {{"ptenant", type text}, {"SiteId", type text}, {"WebId", type text}, {"ListId", type text}, {"ItemId", type text}, {"ScopeId", type text}, {"SiteUrl", type text}, {"DirName", type text}, {"FileName", type text}, {"Extension.1", type text}, {"WebTemplateId", type number}, {"ListServerTemplate", type text}, {"IsLabelEncrypted", type logical}, {"MajorVersion", type number}, {"MinorVersion", type number}, {"SensitivityLabelInfo.DisplayName", type text}, {"SensitivityLabelInfo.Id", type text}, {"SizeInBytes", type number}, {"SizeInBytesWithVersions", type number}, {"TimeCreated", type datetime}, {"TimeLastModified", type datetime}, {"Author.Email", type text}, {"Author.Name", type text}, {"ModifiedBy.Email", type text}, {"ModifiedBy.Name", type text}, {"SnapshotDate", type datetime}, {"Operation", type text}, {"QuickXorHash", type text}})
in
    Custom

 

Helper query to expand the hierarchy of the JSON files

let
    Source = (FilesData as binary) =>
        let
            Source = Table.FromColumns({Lines.FromBinary(FilesData, null, null)}),
            #"Transformed Column" = Table.TransformColumns(Source, {"Column1", Json.Document}),
            #"Expanded Column1" = Table.ExpandRecordColumn(#"Transformed Column", "Column1", {"ptenant", "SiteId", "WebId", "ListId", "ItemId", "ScopeId", "SiteUrl", "DirName", "FileName", "Extension", "WebTemplateId", "ListServerTemplate", "IsLabelEncrypted", "MajorVersion", "MinorVersion", "SensitivityLabelInfo", "SizeInBytes", "SizeInBytesWithVersions", "TimeCreated", "TimeLastModified", "Author", "ModifiedBy", "SnapshotDate", "Operation", "QuickXorHash"}, {"ptenant", "SiteId", "WebId", "ListId", "ItemId", "ScopeId", "SiteUrl", "DirName", "FileName", "Extension", "WebTemplateId", "ListServerTemplate", "IsLabelEncrypted", "MajorVersion", "MinorVersion", "SensitivityLabelInfo", "SizeInBytes", "SizeInBytesWithVersions", "TimeCreated", "TimeLastModified", "Author", "ModifiedBy", "SnapshotDate", "Operation", "QuickXorHash"}),
            #"Expanded Author" = Table.ExpandRecordColumn(#"Expanded Column1", "Author", {"Email", "Name"}, {"Author.Email", "Author.Name"}),
            #"Expanded ModifiedBy" = Table.ExpandRecordColumn(#"Expanded Author", "ModifiedBy", {"Email", "Name"}, {"ModifiedBy.Email", "ModifiedBy.Name"}),
            #"Expanded SensitivityLabelInfo" = Table.ExpandRecordColumn(#"Expanded ModifiedBy", "SensitivityLabelInfo", {"DisplayName", "Id"}, {"SensitivityLabelInfo.DisplayName", "SensitivityLabelInfo.Id"}),
            #"Changed Type" = Table.TransformColumnTypes(#"Expanded SensitivityLabelInfo", {{"TimeCreated", type datetime}, {"ptenant", type text}, {"SiteId", type text}, {"WebId", type text}, {"ListId", type text}, {"ItemId", type text}, {"ScopeId", type text}, {"SiteUrl", type text}, {"DirName", type text}, {"FileName", type text}, {"Extension", type text}, {"Operation", type text}, {"WebTemplateId", Int64.Type}})
        in
            #"Changed Type"
in
    Source

 

Hi @yash14vashistha,

 

You might want to remove the URL in the AzureStorage.Blobs. 

 

From the error message I see, the cache is overloaded. What capacity are you running, and on average how large are these files?

 

Also, Dataflow Gen2 does not produce the most optimized code for running transformations. Can you split the DF g2 into two dataflows: one that only copies the data to the lakehouse, and another that actually does the transformation on top of the files in the lakehouse?

Hi @govindarajan_d ,

 

Thanks for your quick response!

The client has an F16 license, or so they tell me. The average file size is about 50 MB, which amounts to 900 * 50 MB = 45 GB in total.

 

If I do not run the helper query, the JSON files are not expanded correctly and loading them throws an error, so the data will not even load into the Lakehouse table. This is why I need the helper query.

Hi @yash14vashistha,

 

45 GB of data? Hmm, F16 should be OK to handle that amount of data, but I would suggest trying to scale the capacity up to F64 or higher while the dataflow is running and seeing if that works.

 

Also, see if loading just 50-60 files works. If it does, then it might be a capacity limitation issue and you might have to scale up during the processing time. You can also check the Capacity Metrics app to see if there is any throttling.

 

As I said before, DF g2 does not produce the most optimized code for GUI-built transformations. You can try adding the Azure Storage container as a shortcut in the Lakehouse and then using Spark for the JSON transformations. That way it would be more performant.
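For illustration, here is a minimal PySpark sketch of that approach. It assumes the blob container has been added to the Lakehouse as a shortcut named dataset under Files, and that the code runs in a Fabric notebook attached to that Lakehouse; both are assumptions for the example, not details confirmed in this thread.

# `spark` is the SparkSession a Fabric notebook provides automatically.
# The part files are JSON Lines (one record per line), which is what the
# helper query's Lines.FromBinary + Json.Document handling implies, so the
# default line-delimited JSON reader is used here.
raw = spark.read.json("Files/dataset/part-*.json")

# Quick sanity checks before writing anywhere: inferred schema and row count.
raw.printSchema()
print(raw.count())

The schema and row count check is just to confirm the files themselves contain rows before moving the transformation into Spark.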

Thanks @govindarajan_d 

Could it also be the ports/memory associated with the VNet when we first set it up in Azure? I ask because I am using a VNet gateway to connect to the blob storage. Also, the error message I received was this -

Files900_WriteToDataDestination: There was a problem refreshing the dataflow: 'Couldn't refresh the entity because of an issue with the mashup document MashupException.Error: The evaluation reached the allowed cache entry size limit. Try increasing the allowed cache size. Details: GatewayObjectId: 3f5b77e3-7116-44aa-87eb-b01e4ac664c8'. Error code: Mashup Exception Error. (Request ID: 1f501f08-3c5e-4234-947a-cc803e289d66).

Hi @yash14vashistha

 

Can you try loading just one file instead of all 900 files and see if you get an error? 

Hey @govindarajan_d ,

I had previously created 2 dataflows, one with 1 file and another with 2 files, and both of them ran perfectly fine.

I was also able to have the client run the Fabric Capacity Metrics app, and it seems that the dataflow ran well below the CU limit. This would mean that the dataflow was not overburdening the capacity.

[screenshot yash14vashistha_0-1736535131260.png: Fabric Capacity Metrics app showing the dataflow's CU usage well below the limit]

 

Any help/ideas will be greatly appreciated.

 

Hi @yash14vashistha,

 

If you can load the data with a smaller number of files, then I don't think it is a network restriction issue. As I said earlier, DF g2 is not well optimized in my experience, so it would be best to try building 2 dataflows: one that brings all the data into the lakehouse as raw JSON files, and another that actually transforms the data.

 

If you have experience with Spark, I also recommend using that, as it is more efficient.
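For reference, a rough PySpark sketch of that transform step, mirroring the expand-and-type steps in the helper query above. The column names come from the M query earlier in the thread; the shortcut path Files/dataset and the output table name Files900 are assumed for illustration only.

from pyspark.sql import functions as F

# `spark` is the SparkSession a Fabric notebook provides automatically.
# Read the raw JSON Lines part files through the Lakehouse shortcut (assumed path).
raw = spark.read.json("Files/dataset/part-*.json")

# Flatten the nested records, mirroring the ExpandRecordColumn steps in the helper query.
# Dots are replaced with underscores because Delta table column names cannot contain dots.
flat = raw.select(
    "ptenant", "SiteId", "WebId", "ListId", "ItemId", "ScopeId", "SiteUrl",
    "DirName", "FileName", "Extension", "WebTemplateId", "ListServerTemplate",
    "IsLabelEncrypted", "MajorVersion", "MinorVersion",
    F.col("SensitivityLabelInfo.DisplayName").alias("SensitivityLabelInfo_DisplayName"),
    F.col("SensitivityLabelInfo.Id").alias("SensitivityLabelInfo_Id"),
    "SizeInBytes", "SizeInBytesWithVersions",
    F.col("TimeCreated").cast("timestamp").alias("TimeCreated"),
    F.col("TimeLastModified").cast("timestamp").alias("TimeLastModified"),
    F.col("Author.Email").alias("Author_Email"),
    F.col("Author.Name").alias("Author_Name"),
    F.col("ModifiedBy.Email").alias("ModifiedBy_Email"),
    F.col("ModifiedBy.Name").alias("ModifiedBy_Name"),
    F.col("SnapshotDate").cast("timestamp").alias("SnapshotDate"),
    "Operation", "QuickXorHash",
)

# Write the result as a Lakehouse Delta table.
flat.write.mode("overwrite").format("delta").saveAsTable("Files900")

Writing with saveAsTable should make the result appear as a regular Lakehouse table that the semantic model can pick up.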

I see. Thanks @govindarajan_d 
