The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Imagine the following scenario: You have migrated a report from an old system to the great Power BI and a tester reports a bug in the new report. But the tester does not describe the exact cause, they just export a CSV file from both systems and say that the numbers are not the same. And it is your task to find out what the problem is and, of course, you have to fix it. The first step is to uncover the differences. Earlier, I used Notepad++ or Visual Studio Code for comparing CSV files but nowadays I use Power Query for that!
There are many tools which compare two files line by line like Notepad++, Visual Studio Code or many others. But they are designed to compare code lines and not to search for differences in huge CSV files. You also have to sort both CSV files before you start comparing them. That all is a lot of work and the result is unreliable. Let’s take a look at a Power Query solution.
What I suggest is the following: I import both CSV files into Power Query Editor in Excel, merge them by their keys and compare all corresponding columns.
First, you have to import both files in Power Query in Excel. Open the ribbon Data and click on From Text/CSV in the left upper corner. After you choose a file, the following window gets opened.
Do not click on the button Load but use the button Edit instead. It takes you to Power Query Editor.
Next, you import the second file you want to compare. As you can see in the screenshot below, there are two options how to do that. Either you right-click in the space under the previous query or you use the ribbon Home and click on the New Source.
After the import of the second file you should see two queries.
The next step is merging these two files. I recommend using Merge Queries as New because we will create a new query and our source files will stay unchanged. You can later merge these source files in different ways if needed.
In the Merge mask, you can define what kind of JOIN should be used. In this scenario, the full outer join is the recommended choice. It guarantees that you see the not matching records of both files.
I often rename the columns in a form like on the following screenshot. It improves the readability of the result.
One of the last steps is creating a new custom column which compares corresponding old and new columns. The new column called IsAnyDifference equals True if there is any difference between Old.Name and New.Name OR between Old.Count and New.Count.
… and some examples of the new column. As you can see, I do not care about the data type of the new custom column.
As the last step, select only rows which have IsAnyDifference = True.
After you have applied the filter, you only see rows which contain some differences between the old and the new file.
How simple, right?
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.