Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

selimovd

Combine multiple tables with UNION / UNION ALL in SQL Server, APPEND in Power Query or UNION in DAX

If you want to combine several tables, you have to ask yourself whether you should do this already in the data source such as SQL Server, in the processing of the data (ETL) in Power Query or in Power BI, and, above all, where exactly there is a difference.

Data transformation in the data source or in the frontend?

A question that comes up right at the beginning is, where should the union of the tables take place?
In most cases, it is better to do this at the beginning of the data transformation. Ideally, the union would take place in SQL Server.

 

However, it may not be possible to do this in SQL Server, either because the data is not yet available there, since it comes from different sources, such as an SQL table and a CSV file. In this case, the next level, here Power Query within Power BI or the new dataflows, would be a good choice.

 

Then there is the last possibility to combine the data directly in Power BI with DAX. Again, there are different scenarios why this could happen in the Power BI data model. For example, if a table is calculated in the DAX and is not yet available in Power Query. Usually, you would do the append less in the frontend and more in the data source or power query.

Example data for the 3 scenarios

As an example, we look at the target values for different locations. Let's say that the target values come from different systems and so from different tables or files, we have to merge them again for the analysis.

 

Our first table looks like this:

t1.png

 

We want to combine this with the table of our second location to get a complete table of all locations. Note that we have one column more in the second table because here several regions belong to the location or to a common cluster:

 

t2.png

 

Important for this scenario is the last record from the cluster "Central Middle Earth". The employee "1089 Frodo B." actually is from the region of our first table, but during the year he was working in the region of "Rohan"/"Minas Tirith", so the manager of that region entered a target value there. But since he is back now in "The Shire", the manager there has also added him to his system. Accordingly, we have here a double record.

 

For our test series, we need a third table, here we will use the target values of the location "Moria":

t3.png

 

We are going to test each version on 6 criteria:

  1. Syntax to combine the tables
  2. Handling of duplicate entries
  3. Behavior in case of an unequal amount of columns
  4. Behavior in case of deviating column names
  5. Behavior in case of a different order of the columns
  6. Behavior in case of more than 2 tables


1. UNION / UNION ALL in SQL Server

Let's start with the merge in SQL Server.

A. Syntax to combine the tables in SQL Server

The syntax is quite simple, we combine the select queries of the individual tables with a UNION or UNION ALL:

 

 

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetShire]

UNION

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetCentralMiddleEarth]

 

 

As a result we receive the combined tables:

1.png

 

B. Handling of duplicate entries in SQL Server

If you paid attention, the duplicate entry of the employee "Frodo B." is missing in the example above. This is because the "UNION" command removes duplicate values.

If we want to have the duplicate entries as well, we have to use "UNION ALL". The syntax is similar to the example above:

 

 

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetShire]

UNION ALL

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetCentralMiddleEarth]

 

This way we also receive the duplicate record:

2.png

 


C. Behavior in case of unequal amount of columns in SQL Server

In SQL Server we reach a limit here because the number of columns must always be identical. Otherwise, we run into an error if we try to integrate the cluster from the second table, for instance:

 

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetShire]

UNION ALL

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Cluster]
  ,[Year]
  ,[Target]
FROM [TargetCentralMiddleEarth]

 

Here we encounter the following error:

 

Msg 205, Level 16, State 1, Line 2
All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.

 

An additional column would have to be caught in the remaining queries (NULL AS [Cluster]).

D. Behavior in case of different deviating column names in SQL Server

Different column names are not a problem. The column names of the first query are always used. The following queries must simply have the correct number of columns:

 

SELECT [EmpID] AS [EmployeeID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetShire]

UNION ALL

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Cluster]
  ,[Year]
  ,[Target]
FROM [TargetCentralMiddleEarth]

 

Note that the column name of the first column adapts to the name of the first select:

3.png

 


E. Behavior in case of unequal order of the columns in SQL Server

As mentioned, the SQL Server follows the order of the columns of the first select. The column names of the following queries are ignored. If the order of the columns in the following query changes, the values are used in the order they appear in the select, as long as the file types match:

 

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetShire]

UNION ALL

SELECT [LocationID]
  ,[Name]
  ,[EmpID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetCentralMiddleEarth]

 

The values of the LocationID and the EmpID appear mixed in the columns:4.png

 If we put a column with text and a column with numerical values on the corresponding positions, we would run into a conversion error:

 

Msg 245, Level 16, State 1, Line 2
Conversion failed when converting the varchar value 'Frodo B.' to data type int.

 

 


F. Behavior in case of more than 2 tables in SQL Server

An unlimited number of tables can be combined. For this purpose, additional UNION or UNION ALL statements simply have to be lined up one after the other:

 

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetShire]

UNION ALL

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetCentralMiddleEarth]

UNION ALL

SELECT [EmpID]
  ,[Name]
  ,[LocationID]
  ,[Location]
  ,[Year]
  ,[Target]
FROM [TargetMoria]

 

 Each additional table is simply added below the existing ones:

5.png

 


2. APPEND in Power Query

In Power Query the behavior is slightly different than in SQL Server. Here the columns are combined only by the names and not by order, which results in other possibilities.

Due to the visual way Power Query works, it would be a little unusual to use the programmatic syntax in M (= the query language of Power Query). I will therefore focus on creating the query with the graphical interface.

A. Syntax to combine the tables in Power Query

To create the query, the first selected table is marked and combined using the append function:

6.png

  Then you only need to select the second table:

7.png

After that, the combination is already done. In Power Query further transformations could be done:

8.png

 


B. Handling of duplicate entries in Power Query

Duplicates are not automatically removed by the append. Those still exist:

9.png

However, these can be filtered out in a next step using the Remove Duplicates function:

10.png

 Afterwards the duplicate value is removed:

11.png

 



C. Behavior in case of unequal amount of columns in Power Query

As already mentioned, the append in Power Query is using the column names. Columns with the same name are combined, columns missing from a table are simply left empty:

12.png

 



D. Behavior in case of deviating column names in Power Query

We see the same behavior with column names. If the name changes, the content will not be combined.

As in the previous example, in the first table the column "EmpID" was renamed to "EmployeeID", in the second table the name is kept:

13.png

 

 

E. Behavior in case of different order of the columns in Power Query

The order does not matter at all in Power Query, because the append happens exclusively on the column name.

The append of the following two tables...

14.png

 

15.png

 

... is handled by Power Query without problems:

 

16.png



F. Behavior in case of more than 2 tables in Power Query

It is also possible to combine several tables. With the possibility of combining more than 2 tables, all listed tables can be combined:

17.png

 

18.png

 

3. UNION in DAX

A new table can be created in DAX in the Modeling Tab by clicking "New Table":

19.png

A big difference to the other two approaches is certainly that the UNION command in DAX combines existing tables from the data model. This means that although the base tables can be hidden from the end user, they are still loaded into the data model. That means we will take up more space in the data model and in the file size as well as a longer preparation of the data model during the initialization.

A. Syntax to combine the tables in DAX

The syntax is kept relatively simple here as well. With the UNION function the mentioned tables are simply put together:

 

A Union = UNION('A TargetShire', 'A TargetCentralMiddleEarth')

 

As a result we get the combined table directly in Power BI: 

20.png

 

B. Handling of duplicate entries in DAX

As just seen in the first scenario, duplicate entries are not being filtered out:

21.png

 

 The UNION in DAX thus corresponds to a UNION ALL in SQL or an APPEND in Power Query. If you want to eliminate the additional entry, this can be achieved with the DISTINCT function:

 

A Union Distinct = DISTINCT( UNION( 'A TargetShire', 'A TargetCentralMiddleEarth' ) )

 

22.png

 

C. Behavior in case of unequal amount of columns in DAX

Like SQL, DAX also requires the identical number of columns. In scenario C, we unfortunately cannot connect the two tables with an unequal number of columns:

 

Each table argument of 'UNION' must have the same number of columns.

 

23.png

Here the table with fewer columns would have to be manipulated with a table function in DAX as a workaround.


D. Behavior in case of deviating column names in DAX

Also the naming of the columns behaves like in SQL Server. The column names of the first table are simply used:

24.png

The column caption is changed accordingly if the order of the tables is changed at the UNION:

25.png

 

 

E. Behavior in case of different order of the columns in DAX

As in SQL Server, only the order of the columns matters in DAX. So as soon as the columns are not in the same order, the contents will be mixed:

26.png

 

 

F. Behavior in case of more than 2 tables in DAX

The syntax of UNION requires at least 2 tables to combine. But any number of tables can be combined. These can easily be added as another argument to the function. All tables must have an identical structure in order to achieve the desired result:

 

F Union = UNION( 'F TargetShire', 'F TargetCentralMiddleEarthDAX', 'F TargetMoria' )

 

27.png

 

 Conclusion

Combining several tables to one large table is possible in all 3 ways. As we have seen, the behavior of UNION in SQL Server and UNION in DAX within Power BI is very similar. Here tables with the same number of columns are placed directly under each other. The APPEND function in Power Query takes a different approach and combines the tables by column names.

 

To get an overview here is the summary in tabular form:

28.png

 

To come back to the question remains which command you should use. As written in the introduction this depends strongly on the context. All three approaches have a different behavior. Depending on the initial situation, each developer must decide for himself what makes the most sense.

 

This post was originally published on:

https://whatthefact.bi/power-bi/dax/combine-multiple-tables-with-union-union-all-in-sql-server-appen...

 

Best regards

Denis Selimovic

Blog: WhatTheFact.bi
LinkedIn: Denis Selimovic
Twitter: @WhatTheFact_BI

Comments