I want to programmatically fetch the storage mode and size of a semantic model from a Fabric notebook. Can anyone point me to a REST API or Python module that can do this?
Solved! Go to Solution.
Hi @akashparthe,
Thanks for the reply from lbendlin.
The link lbendlin provided was very helpful. I ran a test based on it, and it successfully retrieved the sizes of semantic models across different workspaces.
import sempy.fabric as fabric
import pandas as pd
import datetime

def analyze_model_size(drop_default_datasets=True):
    """
    Author : Sandeep Pawar | fabric.guru | 02-29-2024
    This function scans the active premium workspaces in the entire tenant and calculates the total column size
    of premium datasets (Import and Direct Lake). The purpose of the function is to get the total column size and
    identify size reduction opportunities by calculating the percentage of size occupied by calculated tables,
    calculated columns, auto date/time tables, columns with the Double data type, and column hierarchy size.
    Note that user hierarchy and relationship sizes are not included in the memory size calculation.

    args:
        drop_default_datasets : set to False to keep default datasets in the final dataframe. Default is True.
    """
    def get_ds_size(workspace, dataset):
        all_columns = fabric.list_columns(workspace=workspace, dataset=dataset, extended=True)
        size = sum(all_columns["Total Size"])
        calc_cols = sum(all_columns[all_columns.Type == "Calculated"]["Total Size"])
        calc_tables = sum(all_columns.query('Type == "CalculatedTableColumn"')["Total Size"])
        auto_date = sum(all_columns[all_columns['Table Name'].str.startswith(("DateTableTemplate_", "LocalDateTable_"))]["Total Size"])
        float_cols = sum(all_columns.query('`Data Type` == "Double"')["Total Size"])
        hierarchy_size = sum(all_columns["Hierarchy Size"])  # column hierarchy size
        num_tables = all_columns["Table Name"].nunique()
        return size, calc_cols, calc_tables, auto_date, float_cols, hierarchy_size, num_tables

    # Get active capacities only
    active_capacities = fabric.list_capacities().query('State == "Active"')

    # Premium and Fabric workspaces
    ws = fabric.list_workspaces().query('`Is On Dedicated Capacity` == True')
    premium_workspaces = ws[ws['Capacity Id'].isin(list(active_capacities.Id))]

    datasets = pd.concat([fabric.list_datasets(ws).assign(workspace=ws) for ws in premium_workspaces['Name']], ignore_index=True)

    col_list = ['total_columnsize_MB', 'pct_size_calculated_cols', 'pct_size_calculated_tables',
                'pct_size_autodatetime', 'pct_size_floats', 'pct_hierarchy_size', 'number_of_tables']

    catalog = datasets[["workspace", "Dataset Name"]].copy().assign(date=datetime.date.today())
    catalog[col_list] = pd.NA

    for i, row in catalog.iterrows():
        try:
            size, calc_cols, calc_tables, auto_date, float_cols, hierarchy_size, num_tables = get_ds_size(row["workspace"], row["Dataset Name"])
            catalog.loc[i, col_list] = [
                round(size / (1024**2), 1), round(100 * (calc_cols / size), 1), round(100 * (calc_tables / size), 1),
                round(100 * (auto_date / size), 1), round(100 * (float_cols / size), 1), round(100 * (hierarchy_size / size), 1), int(num_tables)
            ]
        # Exception to handle default datasets which do not have an XMLA endpoint
        except Exception:
            continue

    for col in col_list:
        catalog[col] = pd.to_numeric(catalog[col], errors='coerce')

    if drop_default_datasets:
        # Default datasets will show NaN values. To include them, set drop_default_datasets=False
        catalog.dropna(inplace=True)

    catalog.sort_values(by=['total_columnsize_MB'], ascending=False, inplace=True)
    catalog.reset_index(drop=True, inplace=True)
    return catalog

df = analyze_model_size()
display(df)
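If you only need the size of one specific semantic model rather than a tenant-wide scan, a smaller sketch built on the same fabric.list_columns call should be enough. The workspace and dataset names below are placeholders, so adjust them to your environment:

import sempy.fabric as fabric

# Placeholders - replace with your own workspace and semantic model names
workspace_name = "My Workspace"
dataset_name = "My Semantic Model"

# extended=True adds the Vertipaq statistics, including "Total Size" per column
cols = fabric.list_columns(workspace=workspace_name, dataset=dataset_name, extended=True)

total_size_mb = round(cols["Total Size"].sum() / (1024**2), 1)
print(f"{dataset_name}: {total_size_mb} MB total column size")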
As for the storage mode, I haven't found a way to get it programmatically. As a workaround, you can click "Model layout" in the SQL endpoint and select the corresponding table to see its storage mode.
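If you do want to experiment with a programmatic route for the storage mode as well, one idea (untested here, and assuming fabric.list_partitions is available in your version of semantic link) is to read the Mode of each table's partitions, since the partition mode reflects the table's storage mode:

import sempy.fabric as fabric

# Placeholders - replace with your own workspace and semantic model names
workspace_name = "My Workspace"
dataset_name = "My Semantic Model"

# Assumption: fabric.list_partitions returns one row per partition with a "Mode"
# column (e.g. Import, DirectQuery, Dual, DirectLake); verify in your environment.
partitions = fabric.list_partitions(dataset=dataset_name, workspace=workspace_name)

storage_modes = partitions[["Table Name", "Mode"]].drop_duplicates()
display(storage_modes)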
If you have any other questions, please feel free to contact me.
Best Regards,
Yang
Community Support Team
If any post here helps, please consider accepting it as the solution to help other members find it more quickly.
If I have misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!