Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get Fabric Certified for FREE during Fabric Data Days. Don't miss your chance! Request now

Reply
Anonymous
Not applicable

DAX script is running slow for large dataset scenario.(1 million records)

Hi team, 

 

I have a datset with 1 million rows, which are loaded into PowerBI  by ODBC(MongoDB).  Below is some sample data.

 

I want to extract each element in multi-vlaue column "C_DS" into a new table , and then stats the value count and create a pie cart by creating a new measurement.

Here is the two Dax scripts I use:

1. Extract each element

 

 

 

C_DS_label = 

VAR SplitByCharacter = ","
VAR Table0 =
    SELECTCOLUMNS(
    ADDCOLUMNS (
        GENERATE (
            ROW ( "Text", CONCATENATEX(SampleData,SampleData[C_DS],",") ),
            VAR TokenCount =
                PATHLENGTH ( SUBSTITUTE ( [Text], SplitByCharacter, "|" ) )
            RETURN
                GENERATESERIES ( 1, TokenCount )
        ),
        "Word", PATHITEM ( SUBSTITUTE ( [Text], SplitByCharacter, "|" ), [Value] )
    ),
    "Word",[Word])
RETURN
    SUMMARIZE(Table0,[Word])

 

 

 

 

2. stats the value count

 

 

 

Count_C_DS =
var curr = SELECTEDVALUE(C_DS_label[Word])
return
calculate(COUNTROWS('SampleData'), FILTER(ALL('SampleData'[C_DS]), PATHCONTAINS(SUBSTITUTE ( SampleData[C_DS], ",", "|" ) , curr) ))

 

 

 

 

But I found that  the first DAX script(extract each element) is running very very ....very slow. It has run for 3 hours for 1 million records and hasn't been ended yet.

 

can any expert help take a look and advice how to improve the DAX script?

 

Thanks,

Cherie

1 ACCEPTED SOLUTION

Great @Anonymous 
Please use

C_DS_label =
DISTINCT (
    SELECTCOLUMNS (
        GENERATE (
            FILTER ( VALUES ( SampleData[C_DS] ), SampleData[C_DS] <> BLANK () ),
            VAR NumofWords =
                PATHLENGTH ( SUBSTITUTE ( [C_DS], ",", "|" ) )
            VAR NumSeries =
                GENERATESERIES ( 1, NumofWords )
            RETURN
                ADDCOLUMNS (
                    NumSeries,
                    "@Word", PATHITEM ( SUBSTITUTE ( [C_DS], ",", "|" ), [Value] )
                )
        ),
        "Word", [@Word]
    )
)

View solution in original post

4 REPLIES 4
tamerj1
Super User
Super User

 Hi @Anonymous 
Please try 

C_DS_label =  
SELECTCOLUMNS (
    GENERATE (
        FILTER ( VALUES ( SampleData[C_DS] ), SampleData[C_DS] <> BLANK () ),
        VAR NumofWords = PATHLENGTH ( SUBSTITUTE ( [C_DS], ",", "|" ) )
        VAR NumSeries = GENERATESERIES ( 1, NumofWords )
        RETURN
            ADDCOLUMNS ( 
                NumSeries,
                "@Word", PATHITEM ( SUBSTITUTE ( [C_DS], ",", "|" ), [Value] )
            )
    ),
    "Word", [@Word]
)

1.png

Anonymous
Not applicable

Hi @tamerj1 ,

 

Thank you for the advice. I tried, and it's very fast to extract the values. But the extracted values looks like duplicated, and not unique . What I expect is to extract the disctinct unique values from the C_DS column.

Here is the actual result in powerBI Desktop.

cheriemilk_0-1653381259530.png

 

Expected Result I want:

Word
Profile_Criteria
Keyword
Faceted_Criteria
Backgroud_Criteria
Rating_Criteria

 

Great @Anonymous 
Please use

C_DS_label =
DISTINCT (
    SELECTCOLUMNS (
        GENERATE (
            FILTER ( VALUES ( SampleData[C_DS] ), SampleData[C_DS] <> BLANK () ),
            VAR NumofWords =
                PATHLENGTH ( SUBSTITUTE ( [C_DS], ",", "|" ) )
            VAR NumSeries =
                GENERATESERIES ( 1, NumofWords )
            RETURN
                ADDCOLUMNS (
                    NumSeries,
                    "@Word", PATHITEM ( SUBSTITUTE ( [C_DS], ",", "|" ), [Value] )
                )
        ),
        "Word", [@Word]
    )
)
Anonymous
Not applicable

@tamerj1  thanks, works perfect.

Helpful resources

Announcements
Fabric Data Days Carousel

Fabric Data Days

Advance your Data & AI career with 50 days of live learning, contests, hands-on challenges, study groups & certifications and more!

October Power BI Update Carousel

Power BI Monthly Update - October 2025

Check out the October 2025 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Solution Authors