Hello everyone,
I've looked everywhere for an answer to this but couldn't find a working example.
I have a very large Power BI Dataflow set up, with all sales history over 10 years. My report, which I build in Power BI Desktop, only needs the sales for the current year.
Right now, using the Power BI Dataflow connector, I can't find a place to write a SQL statement like "SELECT order_id FROM Sales WHERE Sales_date = 2021".
The M code in the Advanced Editor is this:
let
    Source = PowerBI.Dataflows([]),
    #"xxxxxxxxxxxxxxxx" = Source{[workspaceId="xxxxxxxxxxxxxxxx"]}[Data],
    #"yyyyyyyyyyyyyyyy" = #"xxxxxxxxxxxxxxxx"{[dataflowId="yyyyyyyyyyyyyyyy"]}[Data],
    #"Sales" = #"yyyyyyyyyyyyyyyy"{[entity="Sales"]}[Data]
in
    #"Sales"
Right now, Power BI is pulling the whole history and all the columns of the Sales table, and the joins in the subsequent steps take almost an hour.
Isn't there a way to amend the M code to pass a SQL statement and limit the amount of data transferred?
Thanks a lot,
A
You should be able to add a filter step in the Query Editor to select just the 2021 data. I don't think the dataflow necessarily uses SQL, but that shouldn't matter.
Your query will look something like this with the new step:
let
    Source = PowerBI.Dataflows([]),
    #"xxxxxxxxxxxxxxxx" = Source{[workspaceId="xxxxxxxxxxxxxxxx"]}[Data],
    #"yyyyyyyyyyyyyyyy" = #"xxxxxxxxxxxxxxxx"{[dataflowId="yyyyyyyyyyyyyyyy"]}[Data],
    #"Sales" = #"yyyyyyyyyyyyyyyy"{[entity="Sales"]}[Data],
#"Filtered Rows" = Table.SelectRows(#"Sales", each [sales_date] >= #date(2021, 1, 1))
in
#"Filtered Rows"
To run SQL statements, you need a database engine to execute them, and a Dataflow does not store its data in a SQL database. You can see that when you export your Dataflow to a JSON file and look at the tag "ppdf:outputFileFormat":
Dataflow Gen1 uses CSV files.
Dataflow Gen2 uses Apache Parquet files.
If you can figure out a way to connect to the files generated by Gen2 Dataflows, you can run SQL statements on the Parquet files using Apache Spark SQL, as sketched below.
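For illustration, here is a minimal PySpark sketch of that idea. The file path and the assumption that you can copy the dataflow's Parquet output somewhere Spark can read it are hypothetical:

from pyspark.sql import SparkSession

# Start a local Spark session
spark = SparkSession.builder.appName("dataflow-parquet-sql").getOrCreate()

# Hypothetical path to the Parquet files produced by the Gen2 dataflow
sales = spark.read.parquet("/data/dataflows/Sales/*.parquet")

# Register the DataFrame as a SQL view so it can be queried like a table
sales.createOrReplaceTempView("Sales")

# The SQL from the original question, now running against Parquet
spark.sql("SELECT order_id FROM Sales WHERE year(sales_date) = 2021").show()

This only helps if you can actually reach the underlying storage, so treat it as a workaround rather than a supported connection path.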
Thanks @amitchandak
Does that mean it's impossible to send a complex query with JOINs etc. to be executed on the Dataflow side before the data is sent back to Power BI?
@Anonymous , you can use an M parameter, but that will only limit the data, and it can only be changed in the deployment pipeline. A sketch of that approach follows.
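As a minimal sketch, assuming a number parameter named YearFilter has been created under Manage Parameters (the parameter name is illustrative):

let
    Source = PowerBI.Dataflows([]),
    #"xxxxxxxxxxxxxxxx" = Source{[workspaceId="xxxxxxxxxxxxxxxx"]}[Data],
    #"yyyyyyyyyyyyyyyy" = #"xxxxxxxxxxxxxxxx"{[dataflowId="yyyyyyyyyyyyyyyy"]}[Data],
    #"Sales" = #"yyyyyyyyyyyyyyyy"{[entity="Sales"]}[Data],
    // YearFilter is the hypothetical parameter; change it per deployment stage
    #"Filtered Rows" = Table.SelectRows(#"Sales", each Date.Year([sales_date]) = YearFilter)
in
    #"Filtered Rows"

The filter itself still runs in Power Query; the parameter just makes the year easy to change without editing the query.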