Hi,
I am new to Spark SQL, and I am wondering about the differences between these two ways to use SQL in Fabric Notebooks:
-- Method A -- (spark.sql)
-- Method B -- (%%sql magic)
Is Method A the only one that can be used when working with DataFrames?
Is Method A better suited when you need to store the outputs of the query in a dataframe, and perform further transformations on the outputs in subsequent steps?
Method B - is this method more suitable for standalone commands (create lakehouse table, drop lakehouse table, etc.) and/or for doing ad-hoc queries?
Is it possible to store the outputs of this kind of query in a dataframe or variable in the notebook, and use them in subsequent steps? Or do we need to use Method A (spark.sql) in that case?
Are there performance differences between these two methods?
Are both methods called Spark SQL?
Do these two methods have different names, so they can be referred to separately in conversations and chats?
I am curious to learn more about the differences between these two methods.
Thank you! 😀
Thank you @Anonymous,
Your reply is really helpful and clarifying! 😀
I have a further query:
I prefer the simplicity of using the %%sql magic.
I think it's easier to write Spark SQL code in a %%sql magic cell than in a PySpark cell (spark.sql-method).
(However, I realized there are some ways to make it easier to write SQL inside the spark.sql function; see my next reply.)
There is one thing I am curious about in your reply, because it seems like an attractive option:
"Method B (%%sql Magic)
Purpose: This method is a magic command used within Fabric Notebooks to execute Spark SQL queries directly within the notebook cells. It's more concise and interactive.
DataFrames: While it doesn't directly create DataFrames, you can capture the output using the display() function or assign the query to a variable like any other Python expression."
How can I capture the output by using the display() function or assigning the query to a variable?
Let's say I have this %%sql magic code in a notebook cell. How can I capture its output for use in subsequent cells?
%%sql
SELECT
    SalesOrderNumber,
    CustomerName,
    OrderDate,
    Item,
    Quantity,
    UnitPrice,
    Tax,
    Year,
    AVG(UnitPrice) OVER (PARTITION BY CustomerName) AS AvgUnitPricePerCustomer
FROM orders
ORDER BY OrderDate DESC;
One method I found, which seems to work for me, is to use the %%sql magic cell to create a temporary view, which I can then reference in other notebook cells. However, it requires me to add another cell with spark.sql if I want to use the temporary view in further transformations.
I can do something like this to create a temporary view:
%%sql
CREATE OR REPLACE TEMPORARY VIEW vw_temporary AS
SELECT
    SalesOrderNumber,
    CustomerName,
    OrderDate,
    Item,
    Quantity,
    UnitPrice,
    Tax,
    Year,
    AVG(UnitPrice) OVER (PARTITION BY CustomerName) AS AvgUnitPricePerCustomer
FROM orders
ORDER BY OrderDate DESC;
Then I can reference the temporary view inside a spark.sql() function in a PySpark cell, and easily assign it to a dataframe:
df = spark.sql("SELECT * FROM vw_temporary")
display(df)
Currently, this is the only way I have found to use the outputs from %%sql magic code in subsequent steps. It works fine, but it requires two notebook cells (one for the %%sql magic, and another to assign the view to a dataframe).
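As a sketch of a single-cell alternative (assuming the orders table from above), the view can also be registered from PySpark, so both steps fit in one cell:
# One PySpark cell: run the query, keep the DataFrame,
# and register the temp view from Python in the same step.
df = spark.sql("SELECT * FROM orders")
df.createOrReplaceTempView("vw_temporary")
display(df)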
Thank you 😀
I just learned that enclosing the code inside triple quotes """ """ also works if I want to write multi-line SQL in a PySpark cell (spark.sql-method) 😊
I'm new to this, so I didn't know that before. It is a lot simpler than adding a backslash \ at the end of each line. (Unless someone tells me there is a keyboard shortcut to add \ to a code block... EDIT: I just realized it's possible to hold the ALT key and click at multiple locations in the code with the mouse, then insert the backslash \ at all of them.)
This works:
df = spark.sql("""
SELECT *
FROM orders
""")
display(df)
This also works, because a backslash at the end of a line inside a string literal continues the string. Note that the newline itself is removed, so it's safest to keep a space before each backslash so the SQL tokens stay separated:
df = spark.sql("\
SELECT * \
FROM orders \
")
display(df)
This will not work, because a plain double-quoted string cannot span multiple lines (Python raises a SyntaxError):
df = spark.sql("
SELECT *
FROM orders
")
display(df)
Hi @frithjof_v ,
I apologize for the confusion caused by my earlier statement about capturing the output using the display() function or assigning the query to a variable.
As you said, creating a temporary view and using it from a PySpark cell is the only way.
Unfortunately, you cannot directly mix %%sql and PySpark code within a single cell in Fabric Notebooks.
Using triple quotes (""" """) is the recommended way to write multi-line SQL code within the spark.sql() function, for better readability and easier line breaks.
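As a side note, triple quotes also combine nicely with ordinary Python f-strings when you want to parameterize a query. A small sketch, assuming the Year column from your earlier query (be careful with untrusted input):
# Hypothetical parameter: filter orders by a Python variable via an f-string.
min_year = 2023
df = spark.sql(f"""
    SELECT *
    FROM orders
    WHERE Year >= {min_year}
""")
display(df)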
Hope this is helpful. Please let me know in case of further queries.
Thank you @Anonymous,
I am getting a clearer understanding of what is possible with both methods ☺️
Hi @frithjof_v ,
I hope we have answered your query. If not, please reply and we will follow up with more details to help.
@frithjof_v @Anonymous
What are the implications for memory usage when using PySpark with a dataframe and a temp view as opposed to %%sql in a Notebook?
In other words, say your view contains billions of rows. I am assuming that on the backend this somehow stays on disk and not in memory.
What about when you try to load this view into a dataframe? Does the Python engine manage this intelligently, or will you get an out-of-memory error? Or is it the notebook that handles this? Or...?
I don't have enough knowledge about this to answer, and I am curious about the same.
I can write down my assumptions here, and I hope to be corrected if I'm wrong. These are only assumptions, since this is brand new to me, but I have started reading a little and have some theories at this point.
I am hoping @Anonymous can answer 😀
Hi @Element115 ,
As @frithjof_v said -
Regardless of whether you use a DataFrame, temporary view, or %%sql magic, the underlying data itself resides on disk. Spark doesn't keep the entire dataset in memory by default.
When you create a DataFrame from data, or a temporary view from a DataFrame/query, Spark reads the data from disk into memory only as needed to perform the specific operation (e.g., schema inference, initial partitioning); it doesn't load the entire dataset into memory at once.
As you perform operations (filtering, joining, aggregation) on the DataFrame or query results in a temporary view, Spark might shuffle data around or create additional intermediate datasets in memory (depending on the complexity of the operation). The amount of memory used depends on the specific operation and the size of the data being processed in that step.
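A small sketch to illustrate this lazy behaviour, assuming the orders table from the earlier posts:
# Defining a transformation only builds a logical plan; no data is read yet.
df = spark.sql("SELECT * FROM orders WHERE Quantity > 10")

# An action such as count() triggers execution; Spark streams the needed
# partitions through memory rather than loading everything at once.
print(df.count())

# Keeping data in memory is opt-in: without an explicit cache()/persist(),
# Spark does not pin the full dataset in memory between actions.
df.cache()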
Hi @frithjof_v ,
We haven't heard from you since the last response and wanted to check whether we answered your query. If not, please reply and we will follow up with more details to help.
Hi @frithjof_v ,
Thanks for using Fabric Community.
I'd be glad to explain the differences between Method A (spark.sql) and Method B (%%sql magic) in Spark SQL for Fabric Notebooks:
Method A (spark.sql)
Purpose: This is the programmatic route: you pass a Spark SQL query as a Python string to the spark.sql() function inside a PySpark cell.
DataFrames: The function returns a regular DataFrame, so you can assign the result to a variable and chain further transformations in subsequent steps.
Method B (%%sql Magic)
Purpose: This method is a magic command used within Fabric Notebooks to execute Spark SQL queries directly within the notebook cells. It's more concise and interactive.
DataFrames: While it doesn't directly create DataFrames, you can capture the output using the display() function or assign the query to a variable like any other Python expression.
Performance:
There are generally minimal performance differences between the two methods. Both translate your queries into optimized distributed Spark operations. However, complex logic within Method A code might have a slight overhead compared to simpler %%sql queries.
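One way to see this for yourself (a sketch, assuming an orders table): both methods expose the physical plan, and it comes from the same Catalyst optimizer.
# Method A: inspect the plan from the returned DataFrame.
spark.sql("SELECT CustomerName, SUM(Quantity) FROM orders GROUP BY CustomerName").explain()

# Method B: the equivalent, in its own cell, would be:
# %%sql
# EXPLAIN SELECT CustomerName, SUM(Quantity) FROM orders GROUP BY CustomerName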
Naming and Terminology:
Both methods run Spark SQL, so the query language is the same. To tell them apart in conversation, Method A is usually called the spark.sql() function (part of the PySpark API), while Method B is called the %%sql magic command (a notebook cell magic).
In Summary:
Choose Method A (spark.sql) when you need to:
Store the query output in a DataFrame and perform further transformations in subsequent steps.
Mix SQL with PySpark code and other Python logic.
Use Method B (%%sql) for:
Quick ad-hoc queries and interactive exploration.
Standalone commands such as creating or dropping lakehouse tables.
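To make the summary concrete, here is a minimal sketch of both methods (the %%sql part is shown as comments, since the two cannot share a cell):
# Method A (spark.sql): the result is a DataFrame you can keep transforming.
df = spark.sql("SELECT * FROM orders")
df_filtered = df.filter(df.Quantity > 10)
display(df_filtered)

# Method B (%%sql): a separate cell, handy for ad-hoc queries and DDL, e.g.:
# %%sql
# SELECT * FROM orders LIMIT 10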
Hope this is helpful. Please let me know in case of further queries.