Solved: How is your team handling fuzzy duplicates, outliers, and missing values in Fabric?

SachiP · ‎08-05-2025

We’re building a tool to automate the worst parts of real-world data cleaning, especially for teams working in Fabric and Power BI.
Common headaches we hear from data teams:

Fuzzy duplicates across merged sources (different spellings, casing, etc.)
Outliers that skew dashboards and break models
Missing values that kill calculated columns or ML prep

We’ve built patterns to automate:

Dynamic outlier detection (beyond simple Z-scores)
Smart missing value imputation (context-aware)
Fuzzy matching + deduplication across joins
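
For readers who want something concrete, here is a minimal sketch of the three ideas in plain Python, for example in a Fabric notebook. It is not the tool described in this post. It assumes a median/MAD modified z-score for outliers (with the conventional 3.5 cutoff), a group median for "context-aware" imputation, and `difflib` string similarity for fuzzy matching. The function names, the sample data, and the 0.85 similarity cutoff are all hypothetical choices for illustration.

```python
from difflib import SequenceMatcher
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold.

    Uses the median and the median absolute deviation (MAD), which are far
    less distorted by the outliers themselves than mean and standard deviation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


def impute_by_group(rows, group_key, value_key):
    """Fill None values with the median of the same group.

    Falls back to the overall median when a group has no known values.
    Returns new row dicts; the input rows are not modified.
    """
    known = [r[value_key] for r in rows if r[value_key] is not None]
    overall = median(known) if known else None
    groups = {}
    for r in rows:
        if r[value_key] is not None:
            groups.setdefault(r[group_key], []).append(r[value_key])
    filled = []
    for r in rows:
        if r[value_key] is None:
            vals = groups.get(r[group_key])
            r = {**r, value_key: median(vals) if vals else overall}
        filled.append(r)
    return filled


def normalize(name):
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(name.lower().split())


def fuzzy_dedupe(names, cutoff=0.85):
    """Keep the first name of each group of near-identical spellings."""
    kept = []
    for name in names:
        key = normalize(name)
        if not any(SequenceMatcher(None, key, normalize(k)).ratio() >= cutoff
                   for k in kept):
            kept.append(name)
    return kept


# Hypothetical sample data, only to show the calls.
sales = [10, 12, 11, 13, 12, 11, 500]
rows = [
    {"region": "EU", "sales": 100},
    {"region": "EU", "sales": None},
    {"region": "EU", "sales": 120},
    {"region": "US", "sales": None},
    {"region": "US", "sales": 300},
]
customers = ["Contoso Ltd", "contoso ltd", "Contoso Ltd.", "Fabrikam Inc"]

outlier_idx = mad_outliers(sales)
filled = impute_by_group(rows, "region", "sales")
unique_customers = fuzzy_dedupe(customers)
```

The pairwise comparison in `fuzzy_dedupe` is quadratic in the number of names, so on larger tables you would likely block on a cheap key (first letter, postcode, and so on) before comparing.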

👉 Curious: how is your team currently solving these?
Is it mostly manual, or are you using any automated tools?

Would love to hear what’s working, or what’s still painful.

@FabricPlatformForums

v-pgoloju · ‎08-06-2025

Hi @SachiP,

So far, I've been using a manual process to handle fuzzy duplicates, outliers, and missing values in Power BI.

Thanks & Regards,

Prasanna Kumar


SachiP · ‎08-07-2025

Great, thanks! How long does it take you to clean a dataset?
