Dear Microsoft Fabric Community,
I’m a strong advocate of Power Query and the Power BI ecosystem. Its capabilities in shaping, merging, and transforming data are impressive. However, after working extensively with complex, real-world datasets, I’ve encountered some critical limitations when it comes to automated data cleaning — especially at scale.
In one recent use case, my dataset contained:
Fuzzy duplicates (e.g., “Tom Clark” vs. “Tom Clarke”)
Outliers (e.g., Age = 999, or negative income values)
Missing values across dozens of columns
While Power Query could help detect issues, resolving them was highly manual and time-consuming. Key limitations I observed:
No native support for dynamic median/mode-based imputation
No column-wide outlier handling (e.g., using IQR)
No dedicated fuzzy deduplication step (fuzzy matching exists for merges and grouping, but it has to be set up by hand for each column)
Manual replacements must be done per column
When facing hundreds of messy columns, this process quickly becomes unmanageable.
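To make the column-wide idea concrete, here is a minimal Python sketch of median/mode imputation combined with IQR-based outlier handling. It is not the R-based tool discussed in this post. The function names (`impute_column`, `iqr_outlier_mask`, `clean_table`), the dict-of-lists table layout, the sample data, and the choice to treat outliers as missing before imputing are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median, quantiles


def is_number(v):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def impute_column(values):
    """Fill None entries: median for numeric columns, mode for everything else."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to derive a fill value from
    if all(is_number(v) for v in present):
        fill = median(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def iqr_outlier_mask(values, k=1.5):
    """True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return [False] * len(values)
    q1, _, q3 = quantiles(present, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v is not None and not (low <= v <= high) for v in values]


def clean_table(table, k=1.5):
    """Apply the same rules to every column of a {name: [values]} table."""
    cleaned = {}
    for name, values in table.items():
        present = [v for v in values if v is not None]
        if present and all(is_number(v) for v in present):
            # Assumption: outliers are treated as missing, then imputed
            mask = iqr_outlier_mask(values, k)
            values = [None if bad else v for v, bad in zip(values, mask)]
        cleaned[name] = impute_column(values)
    return cleaned


# Hypothetical sample data echoing the issues described above
table = {
    "age": [34, 29, None, 999, 41, 38, 45, 31],
    "city": ["Leeds", None, "Leeds", "York", "Leeds", None, "York", "Leeds"],
}
cleaned = clean_table(table)
```

Because the rules are keyed on each column's content rather than its name, the same call covers two columns or two hundred. Whether an outlier should be replaced, capped, or only flagged is a domain decision this sketch does not settle.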
To address these challenges, I developed an automated data cleaning layer using R, integrated within Power BI. The solution provides:
This tool doesn’t replace Power Query — it extends its capabilities. It empowers analysts to work efficiently with complex, inconsistent data while maintaining full control and auditability.
I believe Power BI and Microsoft Fabric could benefit from native support for:
Dynamic column-wise missing value imputation (mean/median/mode)
Built-in outlier handling logic (e.g., via IQR or z-score)
Fuzzy duplicate detection with adjustable similarity thresholds
Such enhancements would dramatically reduce the manual burden on analysts and accelerate time to insight.
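As an illustration of what fuzzy duplicate detection with an adjustable similarity threshold could look like, here is a minimal Python sketch built on the standard library's `difflib.SequenceMatcher`. It is an assumption-laden example, not the R tool mentioned above and not a proposal for how a native feature would be implemented. The function names and the sample names list are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Similarity in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.casefold().strip(), b.casefold().strip()).ratio()


def fuzzy_duplicate_pairs(names, threshold=0.9):
    """Return (index_a, index_b, score) for every pair at or above the threshold."""
    pairs = []
    # Pairwise comparison is quadratic in the number of rows, which is
    # acceptable for a sketch but would need blocking on large tables.
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, round(score, 3)))
    return pairs


# Hypothetical sample data, including the example pair from this post
names = ["Tom Clark", "Tom Clarke", "Ann Lee", "tom clark "]

loose = fuzzy_duplicate_pairs(names, threshold=0.9)    # catches Clark vs. Clarke
strict = fuzzy_duplicate_pairs(names, threshold=0.99)  # only near-exact matches
```

Lowering the threshold surfaces more candidate pairs for review, while raising it keeps only matches that differ by case or whitespace. The pairs are returned rather than merged automatically so that an analyst can still decide which record to keep.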
I would love to share this tool and discuss how it might align with the vision of Microsoft Fabric and Power BI.
Thank you.