Re: Scanner Results missing detailed metadata for ...

AMD0791 · ‎05-13-2024

I have a PowerShell script that I run manually from time to time to pull results from the Scanner API and save them as JSON files. I then feed those JSON files to a Dataflow and load the results into Fabric Lakehouse tables.

I last ran the script in February, for 132 workspaces (the script breaks them down into smaller lists to send to the API). I ended up with 2 files that were 53MB and 14MB.
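For anyone wanting to reproduce the batching step, here is a minimal sketch in Python (not the PowerShell script from this post, which isn't shown). It assumes the documented limit of 100 workspace IDs per getInfo request and the `{"workspaces": [...]}` request body; the function names and the `ws-N` IDs are made up for illustration.

```python
from typing import Iterator, List

# Assumption: a single getInfo call accepts at most 100 workspace IDs.
MAX_WORKSPACES_PER_REQUEST = 100


def chunk_workspaces(
    workspace_ids: List[str], size: int = MAX_WORKSPACES_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield successive batches of at most `size` workspace IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(workspace_ids), size):
        yield workspace_ids[start:start + size]


def build_request_bodies(workspace_ids: List[str]) -> List[dict]:
    """Build one JSON-serializable request body per batch."""
    return [{"workspaces": batch} for batch in chunk_workspaces(workspace_ids)]


# Hypothetical IDs standing in for real workspace GUIDs.
example_ids = [f"ws-{i}" for i in range(132)]
example_bodies = build_request_bodies(example_ids)
```

With 132 IDs this gives two batches (100 and 32), so each batch would produce its own scan and its own result file.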

I ran it today for 140 workspaces, and ended up with 2 files that were 20MB and 3MB.

It looks like all the detailed metadata for the dataset tables is missing in the new files.
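One way to confirm that the size drop comes from missing table metadata is to diff an old result file against a new one. The sketch below is in Python and assumes the scan result layout of a top-level `workspaces` array whose items contain a `datasets` array, each dataset optionally carrying `tables`. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple

DatasetKey = Tuple[str, str]  # (workspace id, dataset id)


def table_counts(scan: dict) -> Dict[DatasetKey, int]:
    """Map each (workspace id, dataset id) to the number of tables returned."""
    counts: Dict[DatasetKey, int] = {}
    for ws in scan.get("workspaces", []):
        for ds in ws.get("datasets", []):
            key = (ws.get("id", ""), ds.get("id", ""))
            # A missing or null "tables" property counts as zero tables.
            counts[key] = len(ds.get("tables") or [])
    return counts


def lost_tables(old_scan: dict, new_scan: dict) -> List[DatasetKey]:
    """Datasets that had tables in the old scan but have none in the new one."""
    old = table_counts(old_scan)
    new = table_counts(new_scan)
    return sorted(key for key, n in old.items() if n > 0 and new.get(key) == 0)
```

Load each file with `json.load` and pass the two dictionaries to `lost_tables`. Datasets that exist in only one of the scans are ignored, so workspaces added since the earlier run don't skew the result.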

This is the query being run by the PowerShell script:

https://api.powerbi.com/v1.0/myorg/admin/workspaces/getInfo?lineage=True&datasourceDetails=True&data...

The tenant settings mentioned in this article are enabled for a security group and I am a member of that security group.

Set up metadata scanning in an organization - Microsoft Fabric | Microsoft Learn

Acroustique · ‎05-16-2025

Any luck finding a solution @AMD0791 ?

I have a similar issue where the scan only returns the datasourceUsages (and datasourceInstanceId) for a limited number of datasets, without rhyme or reason...

AMD0791 · ‎05-13-2024

After a little bit of investigation, it looks like the latest scan got detailed metadata for some datasets but not others in the same workspace.
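A quick way to list the affected workspaces is to walk a saved result file and report every workspace that mixes datasets with and without tables. This is a rough Python sketch under the same layout assumption (`workspaces` > `datasets` > `tables`). It also reads `schemaRetrievalError`, a dataset property that I believe the scan result can include; treat that field name as an assumption and check it against your own files.

```python
from typing import Dict, List, Optional, Tuple


def summarize_mixed_workspaces(scan: dict) -> Dict[str, dict]:
    """Report workspaces where only some datasets came back with tables."""
    report: Dict[str, dict] = {}
    for ws in scan.get("workspaces", []):
        with_tables: List[str] = []
        without_tables: List[Tuple[str, Optional[str]]] = []
        for ds in ws.get("datasets", []):
            name = ds.get("name", ds.get("id", "<unknown>"))
            if ds.get("tables"):
                with_tables.append(name)
            else:
                # Assumed field: may explain why the schema was not returned.
                without_tables.append((name, ds.get("schemaRetrievalError")))
        # Keep only workspaces that show the inconsistent behaviour.
        if with_tables and without_tables:
            report[ws.get("name", ws.get("id", "<unknown>"))] = {
                "with_tables": with_tables,
                "without_tables": without_tables,
            }
    return report
```

If the error value is populated for the blank datasets, it may point to a cause other than the refresh or republish limitation.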

I'm wondering if I'm running into the limitations spelled out in this article, but that doesn't make sense because the metadata is blank on datasets that have been refreshing daily: Run metadata scanning - Microsoft Fabric | Microsoft Learn

From the article:

"semantic models that haven't been refreshed or republished will be returned in API responses but without their subartifact information and expressions. For example, semantic model name and lineage are included in the response, but not the semantic model's table and column names.
semantic models containing only DirectQuery tables will return subartifact metadata only if some sort of action has been taken on the semantic model, such as someone building a report on top of it, someone viewing a report based on it, etc."

How recently does the dataset need to be republished to get the metadata? If it has to be very recent, this limitation would make it very difficult to get an initial full scan of metadata. And in this case, I'm getting metadata for datasets last published a year ago, and not getting metadata for a dataset republished in March.

The workspaces I'm scanning are Premium workspaces, so they shouldn't be subject to the size limits.