Microsoft is giving away 50,000 FREE Microsoft Certification exam vouchers!
Enter the sweepstakes now!Preparing for a certification exam? Ask exam experts all your questions on May 15th. Register now.
I have a huge problem and I can't seem to find a solution.
I'm implementing some machine learning algorithms through a Python script inside a PowerQuery query in order to analyze a csv of data.
I need to chain the output (alone) of this query (original query) to another query (derivate query) for further analysis, and I need to have both the original and derivate queries as tables in a report.
The problem is that, the original query's script, uses some stochastic functions, (random sampling and stochastic ML models) so the results of the query change at every execution and are never the same.
When I try to reference another query on the result of the script, the two queries don't have the same results, probably because the derivate query, doesn't simply copy the values of the original query but re-executes the original query too, and this doesn't allow me to use them both inside a report since they're presenting 2 different sets of results.
Is there a way to reference only the output of a query, or to treat a query as a View or static Table, and avoid re-executing the whole set of instrunctions and scripts inside the original query?
An easy example that anybody can replicate, here I generate a random dataframe with this python code;
import numpy as np
import pandas as pd
X=pd.DataFrame(np.random.randint(0,10,(3,3)))
As you can see, if I reference the scripted query, the derivate table is different because it re-executes the script and doesn't simply copy the values.
@PhilipTreacy @Anonymous
Yesterday I saw that if we use @PhilipTreacy's solution with the [dataset= .... ] argument in the Python.Execute then it stops working and the generated numbers are new in every step within the same query . Obviously we do not need that argument here but I was rather surprised at the big change. Any thoughts?
let
Initial = #table({"Col1"},{{1}}),
Source = Python.Execute("import numpy as np#(lf)import pandas as pd#(lf)#(lf)X=pd.DataFrame(np.random.randint(0,10,(3,3)))", [dataset=#"Initial"]),
X1 = Source{[Name="X"]}[Value],
#"Changed Type" = Table.TransformColumnTypes(X1,{{"0", Int64.Type}, {"1", Int64.Type}, {"2", Int64.Type}}),
Py = Table.FromColumns(Table.ToColumns(#"Changed Type"))
in
Py
Please mark the question solved when done and consider giving a thumbs up if posts are helpful.
Contact me privately for support with any larger-scale BI needs, tutoring, etc.
Cheers
I've been using PowerBI for only 3 weeks, I'm far from being an expert, but my guess is, when we load data from powerquery it probably has some in-built system that updates the source, while when we do it from Python it goes out of reach.
Cause at this point the only reason I can see of why it re-executes the query is that it's for reloading the data source, in Python maybe we've bypassed that.
Still, in the first example I generated the data inside the script, and it re-executed the whole thing anyway.
Hi @Anonymous
So reading the csv in with Python and then using this fixes the issue?
Py = Table.FromColumns(Table.ToColumns(#"Changed Type"))
Regards
Phil
Proud to be a Super User!
If I read the csv inside the python script, it actually works even without the
Py = Table.FromColumns(Table.ToColumns(#"Changed Type"))
Your example though made me think that the problem was probably in the dataset argument of the script:
let
Source = "",
#"Run Python script" = Python.Execute("# 'dataset' holds the input data for this script#(lf)import numpy as np#(lf)import pandas as pd#(lf)#(lf)df=pd.read_csv(r""C:\...\file.csv"",sep=';')#(lf)X=df.sample(n=100)"),
X = #"Run Python script"{[Name="X"]}[Value],
#"Changed Type" = Table.TransformColumnTypes(X,{{"CustomerID", type text}, {"Assignments", Int64.Type}, {"Recency", Int64.Type}, {"Frequency", Int64.Type}, {"Monetary", Int64.Type}})
in
#"Changed Type"
With this code I can apparently reference the same random sample on different secondary queries.
Hi @AlB
Not sure, maybe to do with lazy evaluation? After the copy of the Python output table is created then used as the query result, PBI/PQ figures it doesn't need to refer to any previous steps to provide that data, so the Python code isn't re-executed.
Cheers
Phil
Proud to be a Super User!
That's interesting. Any idea why it works doing it like that?
Please mark the question solved when done and consider giving a thumbs up if posts are helpful.
Contact me privately for support with any larger-scale BI needs, tutoring, etc.
Cheers
Hi @Anonymous
This worked for me. Create a copy of the Python results table and use that as the query result.
let
Source = Python.Execute("import numpy as np#(lf)import pandas as pd#(lf)#(lf)X=pd.DataFrame(np.random.randint(0,10,(3,3)))"),
X1 = Source{[Name="X"]}[Value],
#"Changed Type" = Table.TransformColumnTypes(X1,{{"0", Int64.Type}, {"1", Int64.Type}, {"2", Int64.Type}}),
Py = Table.FromColumns(Table.ToColumns(#"Changed Type"))
in
Py
You can then reference the output of this query without re-executing the Python script.
Regards
Phil
If I answered your question please mark my post as the solution.
If my answer helped solve your problem, give it a kudos by clicking on the Thumbs Up.
Proud to be a Super User!
Thank you very much, this solved partially the problem, it does indeed work in this example case.
But then I tried to apply it to the real case, in which I load a csv of data through PowerQuery, and thus the Source step reads a csv and in the python script step there's a "[dataset=Source]" argument, in this case the problem persists even if I uncheck the "Enable load" option, it re-executes the original query and all the stochastic functions give different results.
Source = Csv.Document(File.Contents("C:\....\file.csv"),[Delimiter=",", Columns=9, Encoding=1252, QuoteStyle=QuoteStyle.None]),
...
#"Run Python script" = Python.Execute("# 'dataset' holds the input data for this script#(lf)#(lf)import pandas as pd#(lf)import numpy as np#(lf)#(lf)DF=dataset.sample(n=100)",[dataset=#"Changed Type"]),
But I also noticed that I can bypass this by reading the csv directly inside the python script (pd.read_csv()) instead of setting it as the source through PowerQuery, leaving the Source blank, (Source='') and having no [dataset=Source] argument in the python script.
Source = "",
#"Run Python script" = Python.Execute("# 'dataset' holds the input data for this script#(lf)import numpy as np#(lf)import pandas as pd#(lf)#(lf)df=pd.read_csv(r""C:\...\file.csv"",sep=';')#(lf)X=df.sample(n=100)"),
I haven't tried this suggestion, but here is one other configuration to explore. You could generate both of your tables within the same query (i.e., as two separate variables then in the last step have them in Table form in separate columns or rows). Disable load on that table, but then reference it two new queries where you can separately expand each table. If you uncheck parallel loading and refresh all, you may get your desired result.
Regards,
Pat
To learn more about Power BI, follow me on Twitter or subscribe on YouTube.
Hi @Anonymous
Interesting. I'm afraid it won't work with the reference to the original query. The Python script will get executed again. Check this out: https://social.technet.microsoft.com/Forums/en-US/8d5ee632-fdff-4ba2-b150-bb3591f955fb/queries-evaluation-chain
You could:
1. Execute the Python script outside of PQ, instead of having PQ invoke it directly and just read the static values into PQ or
2. Work with both tables (and do all the processing) in the same query. Something like:
let
Source = "",
#"Run Python script" = Python.Execute("import numpy as np#(lf)import pandas as pd#(lf)#(lf)X=pd.DataFrame(np.random.randint(0,10,(3,3)))#(lf)"),
#"Expanded Value" = Table.ExpandTableColumn(#"Run Python script", "Value", {"0", "1", "2"}, {"Col0", "Col1", "Col2"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Value",{"Name"}),
Table1_ = #"Removed Columns",
Table2_ = Table1_
in
Table2_
You can work on Table1_ and Table2_ from the last step on, within the same query
Perhaps these knowledgeable people have other ideas @ImkeF @edhans @Jimmy801 @mahoneypat @PhilipTreacy
Please mark the question solved when done and consider giving a thumbs up if posts are helpful.
Contact me privately for support with any larger-scale BI needs, tutoring, etc.
Cheers
Check out the April 2025 Power BI update to learn about new features.
Explore and share Fabric Notebooks to boost Power BI insights in the new community notebooks gallery.
User | Count |
---|---|
76 | |
72 | |
69 | |
48 | |
40 |
User | Count |
---|---|
61 | |
41 | |
33 | |
31 | |
28 |