Dear Microsoft Fabric Community, I’m a strong advocate of Power Query and the Power BI ecosystem. Its capabilities in shaping, merging, and transforming data are impressive. However, after working extensively with complex, real-world datasets, I’ve encountered some critical limitations when it comes to automated data cleaning — especially at scale. In one recent use case, my dataset contained: Fuzzy duplicates (e.g., “Tom Clark” vs. “Tom Clarke”) Outliers (e.g., Age = 999, or negative income values) Missing values across dozens of columns While Power Query could help detect issues, resolving them was highly manual and time-consuming. Key limitations I observed: No native support for dynamic median/mode-based imputation No column-wide outlier handling (e.g., using IQR) Fuzzy duplicate detection is not natively supported Manual replacements must be done per column When facing hundreds of messy columns, this process quickly becomes unmanageable. My Solution To address these challenges, I developed an automated data cleaning layer using R, integrated within Power BI. The solution provides: Outlier detection and handling using IQR logic On-the-fly imputation of missing values: – Median for numeric columns – Mode for categorical columns Fuzzy duplicate detection and removal using string similarity No hardcoding required — fully dataset-agnostic Runs natively inside Power BI using the R script visual or Power Query R steps Full transparency — changes are logged step by step This tool doesn’t replace Power Query — it extends its capabilities. It empowers analysts to work efficiently with complex, inconsistent data while maintaining full control and auditability. Suggested Feature Enhancement I believe Power BI and Microsoft Fabric could benefit from native support for: Dynamic column-wise missing value imputation (mean/median/mode) Built-in outlier handling logic (e.g., via IQR or z-score) Fuzzy duplicate detection with adjustable similarity thresholds Such enhancements would dramatically reduce the manual burden on analysts and accelerate time to insight. I would love to share this tool and discuss how it might align with the vision of Microsoft Fabric and Power BI. Thank You
... View more