pbi_taken
Helper I

Write SQL query when the source is a Lakehouse table in Dataflow Gen2

Hi,

 

I'm wondering if it's possible to write a SQL query to filter data in Dataflow Gen2 when the source is a table from a Lakehouse in Fabric.

Any good ideas on how to solve this?

Thanks!

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @pbi_taken ,

 

You can write SQL queries to filter the tables in a Lakehouse.
First, create a Dataflow Gen2 and select the Lakehouse as its destination, then publish it. Next, open that Lakehouse, create a new notebook, and write a SQL query to filter the tables. If you want to save the filtered results, use a PySpark cell.


 

%%sql
SELECT *
FROM lakehouse_table
WHERE column_name = 'desired_value'

 


%%pyspark
# Read the original table
df = spark.read.table("original_table_name")

# Apply the filter
filtered_df = df.filter(df["column_name"] == "desired_value")

# Save the filtered data as a new table (use a new table name:
# overwriting the same table you are reading from can fail in Spark)
filtered_df.write.mode("overwrite").saveAsTable("filtered_table_name")
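
If you would rather stay in SQL for the save step as well, Spark SQL also supports CREATE TABLE ... AS SELECT on Lakehouse (Delta) tables, so the filter and the save can be done in one statement run through spark.sql(). A minimal sketch, using the same placeholder table and column names as above and assuming the default Delta table format:

%%pyspark
# Filter and save in a single step with a CTAS statement.
# All table and column names here are placeholders.
spark.sql("""
    CREATE OR REPLACE TABLE filtered_table_name AS
    SELECT *
    FROM original_table_name
    WHERE column_name = 'desired_value'
""")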


Best Regards,

Neeko Tang

If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.


2 REPLIES 2

Poojara_D12
Super User

Hi @pbi_taken 

Yes, you can filter data in a Dataflow Gen2 using SQL when working with a Lakehouse table in Fabric. Here's how you can approach it:

Steps to Achieve SQL Filtering:

  1. Use SQL Queries in Dataflow Gen2:

    • Dataflow Gen2 can work with many different data sources; if your Lakehouse table is available in the dataflow, you can use SQL-based transformations to filter and manipulate the data.
    • To filter data from a Lakehouse table, you can either write SQL queries directly, if the source supports SQL, or use the SQL-based transformation within the Dataflow Gen2 interface.
  2. Create a Dataflow:

    • In Fabric, navigate to the Dataflow Gen2 section.
    • Add a Lakehouse as a data source.
    • Select the table from the Lakehouse that you want to query.
  3. SQL Query Transformation:

    • A Fabric Lakehouse exposes a SQL analytics endpoint, so you can connect to it as a SQL-based source (much like SQL Server or Azure Synapse) and use the SQL transformation in the dataflow to filter data.
    • In the dataflow editor, use the SQL query or DirectQuery option to write your SQL filter condition.
  4. Custom SQL Filter:

    • You can use a SQL WHERE clause to filter your data within the query itself. For example:

 

SELECT *
FROM lakehouse_table
WHERE [column_name] = 'desired_value'

 

  5. Pushdown Filters:

    • Some sources support pushdown filters, where the SQL query is pushed directly to the source system (the Lakehouse in this case), which improves performance.
    • Make sure that your Lakehouse source supports pushdown, which should automatically apply the filter at the source (see the sketch after this list).
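
To check whether a filter is actually pushed down when reading a Lakehouse table from Spark, you can inspect the physical query plan. A minimal PySpark sketch for a notebook cell, reusing the placeholder names lakehouse_table and column_name from the examples above:

%%pyspark
# Read the table with a filter applied; names are placeholders.
df = spark.read.table("lakehouse_table").filter("column_name = 'desired_value'")

# Print the physical plan; a filter applied at the source
# typically shows up as "PushedFilters: [...]" in the scan node.
df.explain()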

Recommendations:

  • Use DirectQuery if supported by your Lakehouse setup, as it allows SQL filtering directly in the source system and avoids pulling unnecessary data.
  • Write SQL within the dataflow query editor to filter the dataset before it is loaded into Power BI or other destinations (see the example query in step 4 above).


Did I answer your question? If so, please mark my post as a solution; this will help others!
If my response(s) assisted you in any way, don't forget to drop me a "Kudos" 🙂

Kind Regards,
Poojara - Proud to be a Super User
Data Analyst | MSBI Developer | Power BI Consultant
Consider subscribing to my YouTube channel for beginner/advanced concepts: https://youtube.com/@biconcepts?si=04iw9SYI2HN80HKS
