Solved: Re: SQL Statement to query a Power BI Dataflow

Anonymous · ‎11-15-2021

Hello everyone

I've looked everywhere for an answer to this, but I couldn't find a working example.

I have a very large Power BI Dataflow set up, with all Sales history over 10 years. My report, which I build in PBI Desktop, only looks at the Sales of the current year.

Right now, using the Power BI Dataflow connector, I can't find where to write a SQL statement like "SELECT order_id FROM Sales WHERE Sales_date = 2021".

The M code in the Advanced Editor is this:

let
    Source = PowerBI.Dataflows([]),
    #"xxxxxxxxxxxxxxxx" = Source{[workspaceId="xxxxxxxxxxxxxxxx"]}[Data],
    #"yyyyyyyyyyyyyyyy" = #"xxxxxxxxxxxxxxxx"{[dataflowId="yyyyyyyyyyyyyyyy"]}[Data],
    #"Sales" = #"yyyyyyyyyyyyyyyy"{[entity="Sales"]}[Data]
in
    #"Sales"

Right now, PBI is pulling the whole history and all the columns of the Sales Orders table, and it takes almost 1 hour to perform the join in the subsequent steps.

Isn't there a way to amend the M code to pass a SQL statement that limits the amount of data transferred?

Thanks a lot,

A

AlexisOlson · ‎11-16-2021

You should be able to add a filter step in your query editor to select just 2021 data. I don't think the dataflow necessarily uses SQL but that shouldn't matter.

Your query will look something like this with the new step.

let
    Source = PowerBI.Dataflows([]),
    #"xxxxxxxxxxxxxxxx" = Source{[workspaceId="xxxxxxxxxxxxxxxx"]}[Data],
    #"yyyyyyyyyyyyyyyy" = #"xxxxxxxxxxxxxxxx"{[dataflowId="yyyyyyyyyyyyyyyy"]}[Data],
    #"Sales" = #"yyyyyyyyyyyyyyyy"{[entity="Sales"]}[Data],
    #"Filtered Rows" = Table.SelectRows(#"Sales", each [sales_date] >= #date(2021, 1, 1))
in
    #"Filtered Rows"
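
Since the question also mentions pulling all the columns, a variation on the query above could bound the filter to a single year and keep only the columns the report needs. This is a minimal sketch, not a tested query: the column names order_id and sales_date are taken from the question and may differ in the actual entity, sales_date is assumed to be typed as a date, and the shortened step names are only for readability. Whether these steps actually reduce the data read from the dataflow depends on whether the connector can push them down to the source, which may not hold for every dataflow setup.

```powerquery
let
    Source = PowerBI.Dataflows([]),
    Workspace = Source{[workspaceId="xxxxxxxxxxxxxxxx"]}[Data],
    Dataflow = Workspace{[dataflowId="yyyyyyyyyyyyyyyy"]}[Data],
    Sales = Dataflow{[entity="Sales"]}[Data],
    // Keep only rows dated within 2021 (assumes sales_date is a date column)
    FilteredRows = Table.SelectRows(
        Sales,
        each [sales_date] >= #date(2021, 1, 1) and [sales_date] < #date(2022, 1, 1)
    ),
    // Keep only the columns the report needs (column names are assumptions)
    SelectedColumns = Table.SelectColumns(FilteredRows, {"order_id", "sales_date"})
in
    SelectedColumns
```

Putting the filter and the column selection before any merge steps means the later joins work on the reduced table rather than on the full 10-year history.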


FireFighter1017 · ‎01-08-2025

To run SQL statements, you need a database engine to execute them.

A Dataflow does not store its data in a SQL database. You can see this when you export your Dataflow to a JSON file and look at the "ppdf:outputFileFormat" tag.

Dataflow Gen1 uses CSV files.

Dataflow Gen2 uses Apache Parquet files.

If you can figure out a way to connect to the files generated by Gen2 Dataflows, you can run SQL statements on the Parquet files using Apache Spark SQL.
