Feature Request: Shared / Common Key Dictionary En...

rif-kiamil · ‎09-22-2025

What you would like to accomplish:

Introduce catalog-level “join groups” in Fabric/SQL to enable multiple columns across different tables (e.g., fact and dimension tables) to share a consistent dictionary encoding for join keys. This would ensure that identical string or UUID values are assigned the same internal integer code, making joins and group-bys on these keys significantly faster and more memory-efficient.

DuckDB Feature Request I made: Shared / Common Dictionary Encoding for Join Keys includes more details on the motivation and potent...https://issuetracker.google.com/issues/442600668

One important detail to highlight is that typical dictionary encoding is local to a column or file, ...

How this might work:

Allow explicit declaration of “join groups” at the dataset or project level, so that specific columns across tables share the same dictionary domain.
Example of a possible SQL API:
```
-- Define a join group to align dictionary codes across tables
CREATE JOIN GROUP customer_id_group (
  fact_customer_daily.customer_id,
  fact_customer_orders_daily.customer_id
);

-- Optionally rebuild or refresh the dictionary
ALTER JOIN GROUP customer_id_group REBUILD;
```
Internally, Fabric/SQL would persist and reuse a common dictionary for the grouped columns, ensuring that equivalent string/UUID values map to identical encoded integer IDs across tables and partitions.
If applicable, reasons why alternative solutions are not sufficient:
- Manual mapping tables: Today, teams often maintain surrogate key lookup tables to replace string/UUID identifiers with integer IDs. This adds complexity, operational overhead, and data engineering effort.
- Per-table dictionary encoding (as in Parquet/ORC): While these formats support dictionaries, they are scoped to row groups or files, and don’t provide cross-table consistency.
  Other information (workarounds you have tried, documentation consulted, etc):
  - Current practice in many warehouses (includingFabric/SQL) is to maintain mapping tables for surrogate integer keys derived from string or UUID identifiers. While effective, this approach requires extra ETL complexity and maintenance.
  - Implementing catalog-level shared dictionaries inFabric/SQL could provide operational simplicity and significant performance improvements for users with large, join-heavy analytic workloads.
  - That said, I also want to acknowledge that it's entirely possible systems like Fabric/SQL may already be implementing something along these lines internally, and as an external user, I wouldn’t necessarily have visibility into whether or how that’s happening.

Feature Request: Shared / Common Key Dictionary Encoding for Join Keys

What you would like to accomplish:

How this might work:

If applicable, reasons why alternative solutions are not sufficient:

Other information (workarounds you have tried, documentation consulted, etc):

default use original column name as prefix

Put back the Other Contacts check box for dataflow...

Need Feature to get Failure notification if a Data...

View Dataflow parameters from pipeline when Worksp...

Better Data Migration Assistant For identifying Da...

Join us in Barcelona for FabCon and SQLCon, September 28 - October 1, 2026.

Feature Request: Shared / Common Key Dictionary Encoding for Join Keys

What you would like to accomplish:

How this might work:

If applicable, reasons why alternative solutions are not sufficient:

Other information (workarounds you have tried, documentation consulted, etc):

default use original column name as prefix

Put back the Other Contacts check box for dataflow...

Need Feature to get Failure notification if a Data...

View Dataflow parameters from pipeline when Worksp...

Better Data Migration Assistant For identifying Da...