Re: pbip - Great progress, but where is this heade...

andyclap · 08-07-2023

I'm primarily a software developer, and I understand how modern VCS (i.e. git) makes it possible to manage change across complex projects.

As such, I've been following several attempts to enable proper VCS integration for Power BI reports by serializing them to a text format that allows diff & merge.

Starting with the original Python pbit extractor, and then Mathias Thierbach's excellent pbi-tools (https://github.com/pbi-tools/pbi-tools), there have been various attempts, unfortunately complicated by the underlying serialization formats.

Now that we have the pbip format, we're getting closer to enabling complex change in Power BI, i.e. diff & merge, which is great!

The team has done excellent work on stabilizing IDs and removing derived elements that just add noise to the diff.

However, right now there are a few show-stoppers that, if resolved, would finish the job and take Power BI report development to the next level:

* Single-file approach:
    * A complex report likely contains many tabs/pages.
        * Having all of these pages in one file means that file churns constantly.
        * To paraphrase the SRP: "A file should have only one reason to change."
    * Likewise, a complex report likely contains many tables/data sources.
        * (And similarly, though slightly less complex, the datasetDiagramLayout.)
    * I see a big benefit in splitting report.json by section and model.bim by expression.

* Serialization of complex JSON structures into strings:
    * It's simply infeasible to diff & merge single-line string values as complex as these.
        * The visual properties in each visual's config property are around 2K characters long each, and are almost impossible to diff & merge.
        * Bookmarks in report.json's /config property are unmanageable: the config line in one of my reports is 600K characters long, and it expands to 61K lines of JSON when reserialized! That is big enough to warrant splitting it into one file per bookmark, let alone serializing it properly.
    * Ideally these would be serialized as proper subdocument properties.

* Minor things:
    * JSON arrays are ordered, so you don't need ordinals: if you reorder anything, every node has a change.
    * Why is there a blank first line in multi-line measure expressions?
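If we end up building this ourselves, splitting report.json into one file per page could look roughly like the following. It's a minimal sketch, assuming report.json has a top-level `sections` list whose items carry a stable `name` id; the `_report.json` stub, the `sectionOrder` key and both function names are hypothetical choices for illustration, not part of the pbip format, and whether Power BI Desktop accepts the rejoined file is untested here.

```python
import json
from pathlib import Path
from typing import Any

STUB_NAME = "_report.json"  # hypothetical file for the non-page properties
ORDER_KEY = "sectionOrder"  # hypothetical key recording page order


def _write_json(path: Path, data: Any) -> None:
    # Pretty-print with a trailing newline so each property sits on its own line.
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")


def split_sections(report: dict, out_dir: Path) -> None:
    """Write each page in report["sections"] to <name>.json, plus a stub for the rest."""
    out_dir.mkdir(parents=True, exist_ok=True)
    sections = report.get("sections", [])
    stub = {key: value for key, value in report.items() if key != "sections"}
    # Page order lives only in the stub, so reordering pages touches one small file.
    stub[ORDER_KEY] = [section["name"] for section in sections]
    for section in sections:
        _write_json(out_dir / f"{section['name']}.json", section)
    _write_json(out_dir / STUB_NAME, stub)


def join_sections(out_dir: Path) -> dict:
    """Inverse of split_sections(): rebuild a single report dict from the files."""
    stub = json.loads((out_dir / STUB_NAME).read_text(encoding="utf-8"))
    order = stub.pop(ORDER_KEY)
    sections = [
        json.loads((out_dir / f"{name}.json").read_text(encoding="utf-8"))
        for name in order
    ]
    return {**stub, "sections": sections}
```

Naming files by the stable section id rather than the display name means renaming a page changes only that page's file, which is the "one reason to change" idea applied to the repo layout.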
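For the stringified JSON problem, a tool could expand those properties into real nested JSON before committing and collapse them again afterwards. This is a minimal sketch, assuming any property whose string value parses as a JSON object or array was originally stringified; the `@json` key suffix that marks expanded values is a hypothetical convention, and re-stringifying with compact separators is an assumption, so byte-identical output isn't guaranteed.

```python
import json
from typing import Any

SUFFIX = "@json"  # hypothetical marker for keys whose value was expanded from a string


def expand(node: Any) -> Any:
    """Recursively turn JSON-in-a-string property values into nested JSON."""
    if isinstance(node, list):
        return [expand(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if isinstance(value, str) and value[:1] in ("{", "["):
            try:
                parsed = json.loads(value)
            except json.JSONDecodeError:
                parsed = None  # not JSON after all; keep the plain string
            if isinstance(parsed, (dict, list)):
                out[key + SUFFIX] = expand(parsed)  # nested strings get expanded too
                continue
        out[key] = expand(value)
    return out


def collapse(node: Any) -> Any:
    """Inverse of expand(): re-stringify every property marked with SUFFIX."""
    if isinstance(node, list):
        return [collapse(item) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if key.endswith(SUFFIX):
            # Compact separators; the exact formatting Power BI expects is an assumption.
            out[key[: -len(SUFFIX)]] = json.dumps(
                collapse(value), separators=(",", ":"), ensure_ascii=False
            )
        else:
            out[key] = collapse(value)
    return out
```

Written out with `json.dumps(expand(report), indent=2)`, a single huge config line becomes ordinary multi-line JSON that git can diff line by line.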
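For the ordinals point, a tool could drop ordinals that merely repeat an item's array index and put them back when re-serializing. A minimal sketch, assuming the redundant property is literally named `ordinal` and that the caller knows which lists (for example `sections`) need ordinals restored; both function names are hypothetical.

```python
from typing import Any


def strip_ordinals(node: Any) -> Any:
    """Recursively drop "ordinal" keys that merely repeat an item's array index.

    An ordinal that disagrees with the item's position is kept, so no
    information is lost.
    """
    if isinstance(node, dict):
        return {key: strip_ordinals(value) for key, value in node.items()}
    if not isinstance(node, list):
        return node
    result = []
    for index, item in enumerate(node):
        item = strip_ordinals(item)
        ordinal = item.get("ordinal") if isinstance(item, dict) else None
        # type() check avoids treating True or 1.0 as the integer index 1.
        if type(ordinal) is int and ordinal == index:
            item = {key: value for key, value in item.items() if key != "ordinal"}
        result.append(item)
    return result


def restore_ordinals(items: list) -> list:
    """Re-derive "ordinal" from array position for a list known to need it.

    Any ordinal that strip_ordinals() kept (because it disagreed) wins.
    """
    return [{"ordinal": index, **item} for index, item in enumerate(items)]
```

With ordinals derived from position, moving one item no longer rewrites a property on every sibling.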

My actual question is: are these likely to be addressed by the Power BI team? Is there an open roadmap for this?

Or would it be worth investing time to build further transformation tooling ourselves, processing the pbip files and re-serializing them to a format that is closer to the goal of clean and precise diff & merge?