Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

View all the Fabric Data Days sessions on demand. View schedule

SachinNandanwar

SQL's APPLY Clause in PowerBI DAX ?

The aritcle was originally published on : www.azureguru.net

Link to the original article :  link

 

If you have ever worked with any top RDBMS’s like SQL Server or Oracle, the chances are that at some point you must have encountered the APPLY clause especially when dealing with complex queries involving table-valued functions or correlating sub queries.

 

SQL’s CROSS APPLY and OUTER APPLY operators offer powerful ways to evaluate and join rows from one table through a table-valued function or subquery that enables more dynamic and flexible queries than traditional JOIN clauses.

 

Here is a typical output of an APPLY clause

 

Untitled.png

 

You might ask — isn’t a SQL APPLY CLAUSE similar to a SQL CROSS JOIN?

 

The answer is Yes and No.

 

Yes: because the CROSS APPLY clause can produce a Cartesian-like result when used with a table-valued function or subquery where each row from the left table is combined with matching rows from the right side.

 

No: because unlike a CROSS JOIN, the APPLY operator allows you to pass values from the left table into the right side (usually a UDF or correlated subquery). This means the right side can be dynamically evaluated for each row of the left input—something not possible with a CROSS JOIN.

 

This article is more inclined towards replicating the APPLY clause behavior in DAX, so we won’t delve too deeply into all the intricate details of the APPLY clause.

 

Now, let’s take a use case.

 

Suppose we need to get a list of TOP N customers for each Product based on Sales Amount.

 

You probably might be aware of the TOP N clause in DAX. But using only the TOPN function will give us the overall TOP N customers but not product-wise TOP N customers.

 

EVALUATE
    TOPN (
        2,
        ALLSELECTED ( Customer[Name] ),
        [Sales Amount]
    )

Untitled.png

 

To fetch product-wise TOP N customers based on Sales Amount we will have to use the GENERATE function in combination with the TOPN function.

 

You can find details of the GENERATE function here.

 

GENERATE basically creates a Cartesian product between two tables which effectively exhibits SQL CROSS JOIN behavior. Using it in combination with TOPN, the GENERATE function behaves exactly like SQL’s APPLY clause.

 

EVALUATE
SELECTCOLUMNS (
    GENERATE (
        'Product',
        TOPN (
            2,
            ALLSELECTED ( Customer[Name] ),
            [Sales Amount]
        )
    ),
    "Product Name", 'Product'[Product Name],
    "Product Key",  'Product'[ProductKey],
    "Sales Amount", [Sales Amount]
)

The equivalent SQL query would look like this:

SELECT [productkey],
       [sales amount]
FROM   sales
       CROSS APPLY (
            SELECT TOP 2 [name]
            FROM   customer
            WHERE  customer.customerkey = sales.customerkey
        );

The above query is not the exact equivalent of our DAX expression, given the missing ORDER BY clause and missing join across the sales and customer tables.

 

Back to our DAX expression:

 

The issue with the earlier version is that it behaves more like SQL’s OUTER APPLY instead of CROSS APPLY.

 

With the underlying data, the DAX expression returns blank entries for products without any sales, and the number of rows varies according to the number of customers.

 

Untitled.png

For example, ProductKey 1719 has no buyers, so the number of returned rows equals the number of customers.

 

Note: In DAX.do, not all rows may be displayed due to limitations.

 

Untitled.png

We can add a FILTER to limit results to a specific Product Name:

 

EVALUATE
FILTER (
    SELECTCOLUMNS (
        GENERATE (
            'Product',
            TOPN (
                2,
                ALLSELECTED ( Customer[Name] ),
                [Sales Amount],
                DESC
            )
        ),
        'Product'[Product Name],
        "Customer Name", Customer[Name],
        "Sales Amount",  [Sales Amount]
    ),
    [Product Name] = "MGS Dal of Honor Airborne M150"
)

Untitled.png

 

As mentioned earlier, this expression behaves similar to OUTER APPLY, which is undesirable.

 

One way to fix this is by filtering out blank Sales Amount values:

EVALUATE
FILTER (
    SELECTCOLUMNS (
        GENERATE (
            'Product',
            TOPN ( 2, ALLSELECTED ( Customer[Name] ), [Sales Amount], DESC )
        ),
        "Customer Name", Customer[Name],
        "Product Name", 'Product'[Product Name],
        "Sales Amount",  [Sales Amount]
    ),
    NOT ISBLANK ( [Sales Amount] )
)

 

Untitled.png

 

But beware the above approach is a performance disaster as we are not controlling the filter behavior during the expression evaluation by limiting the unqualified number of rows being returned.

 

A better approach is to introduce SUMMARIZECOLUMNS inside the TOPN:

 

EVALUATE
SELECTCOLUMNS (
    GENERATE (
        'Product',
        TOPN (
            2,
            SUMMARIZECOLUMNS ( Customer[Name], "Sales Amount", [Sales Amount] ),
            [Sales Amount],
            DESC
        )
    ),
    "Customer",      Customer[Name],
    "Product Name",  'Product'[Product Name],
    "Sales Amount",  [Sales Amount]
)

The approach is to first get a summarized list of Sales Amount per customer, then fetch the top 2 product-wise Sales Amount for each customer.

 

Untitled.png

 

For a sharp eyed reader it might come as a puzzle to see that we haven’t used Product Name attribute in SUMMARIZEOLUMNS.

 

EVALUATE
SUMMARIZECOLUMNS ( Customer[Name], "Sales Amount", [Sales Amount] )

If we summarize the Sales Amount for a Customer we get an aggregated Sales Amount for all customers without the product wise Sales Amount.

 

Untitled.png

 

but then why we haven’t used it our expression (marked in red below) and still get the correct totals ?

 

Untitled.png

 

In SQL APPLY clause, if both the outer and inner queries produce columns with the same name, then we have to use aliases to disambiguate them or else it will raise a ambiguous column name error. If we use aliases then they wouldn’t error out.

 

So looking at our DAX expression, we already have a Product Name referenced (marked in green above).

 

We could also create an alias in our DAX expression through ADDCOLUMNS but the expression looks very convoluted

 

EVALUATE
SELECTCOLUMNS (
    GENERATE (
        'Product',
        TOPN (
            2,
            ADDCOLUMNS (
                SUMMARIZECOLUMNS ( Customer[Name], "Sales Amount", [Sales Amount] ),
                "Inner ProductName", 'Product'[Product Name]
            ),
            [Sales Amount],
            DESC
        )
    ),
    "Customer",     Customer[Name],
    "Product Name", [Inner ProductName],
    "Sales Amount", [Sales Amount]
)

 

Untitled.png

 

This version uses an alias (Inner ProductName), but makes the expression more complex.

 

So for better readability and maintenance we can stick to our approach through SUMMARIZECOLUMNS without aliases.

 

EVALUATE
SELECTCOLUMNS (
    GENERATE (
        'Product',
        TOPN (
            2,
            SUMMARIZECOLUMNS ( Customer[Name], "Sales Amount", [Sales Amount] ),
            [Sales Amount],
            DESC
        )
    ),
    "Customer",      Customer[Name],
    "Product Name",  'Product'[Product Name],
    "Sales Amount",  [Sales Amount]
)

Conclusion

In this article, we replicated SQL’s APPLY clause behavior in DAX and highlighted important points to avoid performance issues and achieve the expected output.

 

Thanks for reading !!!

Comments

This is an excellent deep dive, especially the way you connected SQL’s APPLY behavior with DAX concepts like GENERATE, SUMMARIZECOLUMNS, and TOPN.

 

It’s rare to see such a clear explanation of how CROSS APPLY logic can be translated into a functional DAX pattern, while also calling out the performance pitfalls that many practitioners overlook.

 

Great work, definitely bookmarking this one for future reference

Really good article on DAX using SQL