How To Convert a Power BI Dataset To a Power BI Da...

Nolock · ‎02-02-2020

This post is about a tool which converts a Power BI dataset to a Power BI Dataflow. I have analyzed the internals of PBIT files and Power BI Dataflow JSON files in depth and created a PowerShell script which converts any PBIT into Power BI Dataflow JSON.

TL; DR

https://github.com/nolockcz/PowerPlatform/tree/master/PBIT%20to%20Dataflow
However, I personally recommend reading the article once before you use it in your project.

Research

I have been searching for a conversion tool for a long time. The only solution I have found was a manual conversion like in this blog post of @MattAllington or this post of Reza Rad.
Also, I have recently studied the internals of the PBIT/PBIX file and I have tried to extract the maximum of it. And then there was only one step further to analyze the structure of a Power BI Dataflow JSON file.

Motivation

I wanted to have a script which does all the repetitive work for me. I do not like the kind of assembly-line-work in IT!

High-level description

The mighty tool I am talking about is absolutely no magic. It parses Power Query queries, their names, Power Query Editor groups, and some additional properties from a PBIT file. Then, it transforms all the parsed information into a form which is used by Power BI Dataflows. It is a JSON file used for import/export of dataflows. An example of such a file follows:

{
    "name": "Migration Test",
    "description": "",
    "version": "1.0",
    "culture": "de-DE",
    "modifiedTime": "2019-12-04T10:15:10.9208101+00:00",
    "pbi:mashup": {
        "fastCombine": false,
        "allowNativeQueries": false,
        "queriesMetadata": {
            "Test Tabelle 1": {
                "queryId": "c366b94f-cf0b-4cf2-badd-8cae128741a2",
                "queryName": "Test Tabelle 1",
                "queryGroupId": "96d1e7ce-8815-4582-9e5e-592fb7ac51cd",
                "loadEnabled": true
            },
            "Test Tabelle 2": {
                "queryId": "2b7d7845-06ba-4348-ab7d-107905d8e40a",
                "queryName": "Test Tabelle 2",
                "loadEnabled": true
            }
        },
        "document": "section Section1;\r\nshared #\"Test Tabelle 1\" = let\r\n  Quelle = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText(\"i45WMlSK1YlWMgKTxkqxsQA=\", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#\"Spalte 1\" = _t]),\r\n  #\"Geänderter Spaltentyp\" = Table.TransformColumnTypes(Quelle, {{\"Spalte 1\", Int64.Type}})\r\nin\r\n  #\"Geänderter Spaltentyp\";\r\nshared #\"Test Tabelle 2\" = let\r\n  Quelle = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText(\"i45WMlSK1YlWMgKTxkqxsQA=\", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#\"Spalte 1\" = _t]),\r\n  #\"Geänderter Spaltentyp\" = Table.TransformColumnTypes(Quelle, {{\"Spalte 1\", Int64.Type}})\r\nin\r\n  #\"Geänderter Spaltentyp\";\r\n"
    },
    "annotations": [
        {
            "name": "pbi:QueryGroups",
            "value": "[{\"id\":\"96d1e7ce-8815-4582-9e5e-592fb7ac51cd\",\"name\":\"a\",\"description\":null,\"parentId\":null,\"order\":0}]"
        }
    ],
    "entities": [
        {
            "$type": "LocalEntity",
            "name": "Test Tabelle 1",
            "description": "",
            "pbi:refreshPolicy": {
                "$type": "FullRefreshPolicy",
                "location": "Test%20Tabelle%201.csv"
            },
            "attributes": [
                {
                    "name": "Spalte 1",
                    "dataType": "int64"
                }
            ]
        },
        {
            "$type": "LocalEntity",
            "name": "Test Tabelle 2",
            "description": "",
            "pbi:refreshPolicy": {
                "$type": "FullRefreshPolicy",
                "location": "Test%20Tabelle%202.csv"
            },
            "attributes": [
                {
                    "name": "Spalte 1",
                    "dataType": "int64"
                }
            ]
        }
    ]
}

Low-level description

The low-level description is the PowerShell code itself. I have documented every single line and I hope it is understandable for everybody. You can download it on https://github.com/nolockcz/PowerPlatform/tree/master/PBIT%20to%20Dataflow.

In this project, I use the files DataMashup and DataModelSchema. When you open the file DataMashup, you only see some binary text.

But when you scroll to the right you see there is an XML object. That is the part of the file I am interested in. It contains all the Power Query queries and their properties.

The second file, DataModelSchema, is a JSON file. It contains all tables and their columns which are loaded into the tabular model.

There are also columns properties but many of them, like summarizeBy or Format, are important for the Power BI model but not for a dataflow. The only important property is the data type of a column. The rest can be ignored.

How to use the script

The script is written in PowerShell 5.1. There are a plenty of functions defined at the beginning. The start of the execution is in the end of the script.

But first, navigate to the directory where your PBIT file is stored. Then go to the end of the script and change the variable $fileName to the name of your PBIT file. The output file will be generated in the same directory with a name of your PBIT file + “.json”. You can change the name if needed, too. The last line is the call of the function GenerateMigrationString. Its return value is then saved to the output file.

# name of the input PBIT file
$fileName = "BaseIT Dataset v1.2.pbit"

# name of the output JSON file
$jsonOutputFileName = $fileName + ".json"

# generate the migration string from a PBIT file
GenerateMigrationString($fileName) | Out-File $jsonOutputFileName -Encoding utf8

The last step is an import into Power BI Dataflows as you can see in the following screenshot.

I have tested the code with a huge dataset having over 300 complex queries in its ETL process.

And the working result in Power BI Dataflows:

Limitations

I would like to describe some limitations of Power BI source files and Power BI Dataflows.

Group names and group hierarchy

While analyzing the structure of a PBIT/PBIX file, I found out that I can parse a group ID of a Power Query Group, but not its name. Moreover, I could not read the hierarchy of groups.

These both properties are stored encrypted in the file DataMashup, as you can see on the following screenshot.

I have tried to decode it with a Base64 decoder, but I got only a binary object. My next idea was to check if it is an encoded table like in Power Query Enter Data Explained. Also not working. If somebody has an idea, how to decode and interpret the group names and the group hierarchy, please let me know.

The exact order of properties

There were some stumbling stones during the development. One of them is an order of properties. Do not ask me why, but sometimes the order of properties in the dataflow JSON import file plays a role. If you do not keep the exact order, the import file is rejected by Power BI Dataflow. I worked with objects which are serialized to JSON. At the beginning, I did not know how to force the JSON serializer to generate properties in an exact order. The solution was using the Add-Member method.

#shared

Do you know the record #shared? It contains all built-in and custom functions and all your custom queries. More about that for example here. The problem is this record works in Power BI Desktop only and cannot be used in Power BI Service. The PowerShell script ignores all queries containing the keyword #shared and writes a warning like “WARNING: The query 'Record Table' uses the record #shared. This is not allowed in Power BI Dataflows and the query won't be migrated.”

Multiline comments in Power BI Dataflows

The Power BI Dataflows do not support multiline comments at the time of writing the article. There is already an official issue and the bug will be fixed in the near future.

Final words

I have a dataset containing an ETL process with more than 300 queries. If I wanted to migrate this dataset manually into Power BI Dataflows, it would take hours or even days. Thanks to this script, the job is done in minutes. And every single next dataset, too 😊

ImkeF · ‎03-11-2020

Hi Michal (@Nolock),

this is super aweseome!!

Thanks so much for sharing!

/Imke

FireFighter1017 · ‎09-11-2020

That looks pretty cool. Every thought of doing it in Python or R so we can use it within PowerBI Desktop itself? Both are powerfull languages to process huge xml files and convert them into something else. they both have parsing packages that allows you to query XML tags.

That sound like a little side project I could do.

Nolock · ‎09-11-2020

Hello @FireFighter1017,

well, I'd rather wait for the GA of the new enhanced metadata model because my method, unfortunately, won't work with it any more.

FireFighter1017 · ‎09-14-2020

Oh... we'll wait and see then!

Let me know if you need help with R. I'm not so familiar with Python but I've been programming with R for 2-3 years now.

GJA · ‎09-24-2020

Hi,

It's a very good job, thank you for the sharing.

Sadry · ‎09-24-2020

Hi, thank you for your work it is exactly what I am looking for.

However, I get an empty json file when I run the script.

Do you have an idea of what I shoul do to correct that ?

Thx

GJA · ‎09-24-2020

@Sadry I had the same problem and I deleted the folders which grouped my queries. Now without the folders it works

Sadry · ‎09-24-2020

@GJA I tried and even without the folder it doesn't work.

So I tried by deleting parameters but in that case, the DataMashup file in the PBIT file was missing and the script returned an error.

Si finally I tried without the folders and with a useless parameter just to be able to have the DataMashup file.

Unfortunately, it's the same result, a empty JSON file.

GJA · ‎09-24-2020

@Sadry I think it's because of the new metadata. I faced the same issue and spent all my day on it but I have no solution right now sorry.

Nolock · ‎09-24-2020

Hello @Sadry,

the script won't work with the new enhanced metadata model, unfortunately.

Sadry · ‎09-25-2020

Hi @Nolock thanks a lot anyway.

I found another way which is longer but did the job:

Copy and paste the queries from the power bi desktop to the dataflow.

I just finished the migration, everything working good.

Thak you again.

JBartlett_DMU · ‎01-15-2021

I'm getting the following error when I try to run your script against My-Dataset.pbit:

Get-Content: C:\AzureDevOps\Power-BI\Migration PBIT to Dataflow.ps1:109:32

Line |
109  | … leContent = Get-Content -Path "$zipDirectory\DataMashup" -Encoding UT …
     |               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | Cannot find path 'C:\AzureDevOps\Power-BI\My-Dataset\DataMashup' because it does not exist.

Any ideas what might be going on?

Nolock · ‎02-11-2021

Hello @JBartlett_DMU,

when I wrote this article one year ago, nobody knew that there will come a new dataset metadata format: https://docs.microsoft.com/en-us/power-bi/connect-data/desktop-enhanced-dataset-metadata

The code works only with the old format, unfortunately.

But there is a new, easier method which you can use:

Select all queries in the Power Query Editor in Power BI Desktop.
Copy them with Ctrl+C.
Go to a blank Power BI Dataflow.
Insert all queries with Ctrl+V.

That#s all. This is much more comfortable, isn't it?

Best regards,

Nolock

JBartlett_DMU · ‎03-11-2021

Hi @Nolock,

Somehow, I never received the email notification that you had replied to my question, and in the mean time, I happened to stumble upon the solution that you suggested. Yes indeed, that is much more comfortable, I agree!

Thanks for taking the time to reply, and I hope you have a nice day!

James

pescadicto · ‎07-07-2021

Hi @Nolock,

I'm trying to follow these steps:

Select all queries in the Power Query Editor in Power BI Desktop.
Copy them with Ctrl+C.
Go to a blank Power BI Dataflow.
Insert all queries with Ctrl+V.

How can I perform "Go to a blank Power BI Dataflow" in Power BI Desktop?

Thanks!

José

UPDATE

I found it: just copy the querys in Desktop and paste them in the Power BI service in a browser. Now it seems obvious.