
jbrooi
Helper I

Understanding data refresh and dataset sizes: Pro workspace vs Fabric, with web connector

Hi! 

 

I have several datasets/semantic models that use web connectors. They get data from an API, and I paginated all the queries. I created the datasets in Power BI Desktop, published them to the service in a Pro workspace, and set up a scheduled refresh.
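For context, the pagination follows the usual page-by-page loop. A minimal sketch of that pattern in Python (the real queries are written in Power Query M; the endpoint, page parameter, and page size below are hypothetical):

import requests

BASE_URL = "https://example.com/api/items"  # hypothetical endpoint
PAGE_SIZE = 100                             # hypothetical page size

def fetch_all_pages():
    """Request pages until the API returns an empty page."""
    rows, page = [], 1
    while True:
        resp = requests.get(BASE_URL, params={"page": page, "pageSize": PAGE_SIZE})
        resp.raise_for_status()
        batch = resp.json()
        if not batch:      # empty page: no more data
            break
        rows.extend(batch)
        page += 1
    return rows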

Refresh errors occurred, pointing to specific queries, and I always engineered solutions assuming the errors came from badly written queries and/or an API that doesn't perform very well.

 

In one case I built a new .pbix for a dataset; the new one performs okay, while the older one isn't refreshing at all. The size is the same though: both are 3 MB when I check the dataset storage size in the service.

 

I've tested whether the data refresh would perform better in a Fabric trial workspace. It runs successfully there as well, but it takes the same amount of time. That makes sense, I guess, since the biggest factor is the API.

The strange thing is that the dataset size in the Pro workspace is 3 MB, yet in Fabric it is 14 MB, for the exact same file. Can someone help me understand?

 

The only difference I see is that 'Large semantic model storage format' is turned on for the dataset in the Fabric workspace.

Thanks. 

1 ACCEPTED SOLUTION
ibarrau
Super User

Hi. I don't think there is a doc specifying the difference between the engine behind shared capacity and the engine behind dedicated capacity with large semantic model storage format enabled.

If you are sure the amount of data is the same and the Fabric one hasn't added more data, then my guess would be that a dedicated capacity hosts a more complex semantic model with the full set of Analysis Services features, like XMLA endpoints, which let you run much more complex operations. That would increase the data model size.

I hope that helps,


If this post helps, then please consider accepting it as the solution to help other members find it more quickly.

Happy to help!

LaDataWeb Blog
