Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
teaguejs
Helper II
Helper II

Comparing Two Different Datasets for missing values

Hi,

 

I have two different datasets/tables in Power BI. One that is a Power BI dataset and the other is a Google BigQuery direct query.

 

Lets call them table_a and table_b. I would like to compare what values in Column A in table_a are missing from Column A in table_b.

 

Any tips on the easiest way to go about doing this?

1 ACCEPTED SOLUTION
tamerj1
Super User
Super User

Hi @teaguejs 
You can create a new calculated table 

Missing From b =
EXCEPT ( VALUES ( Table_a[Column A] ), VALUES ( Table_b[Column A] ) )

Or you can create a measure, for example for a card visual 

Missing From b =
CONCATENATEX (
    EXCEPT ( VALUES ( Table_a[Column A] ), VALUES ( Table_b[Column A] ) ),
    [Column A],
    UNICHAR ( 10 )
)

View solution in original post

13 REPLIES 13
tamerj1
Super User
Super User

Hi @teaguejs 
You can create a new calculated table 

Missing From b =
EXCEPT ( VALUES ( Table_a[Column A] ), VALUES ( Table_b[Column A] ) )

Or you can create a measure, for example for a card visual 

Missing From b =
CONCATENATEX (
    EXCEPT ( VALUES ( Table_a[Column A] ), VALUES ( Table_b[Column A] ) ),
    [Column A],
    UNICHAR ( 10 )
)

I used the first suggestion of creating a new calculated table and that worked, thank you!

I tired your first calculated table suggestion and got the following error:

 

Error Message:
MdxScript(Model) (59, 38) Calculation error in measure 'CMDB_Active'[Missing from b]: A table of multiple values was supplied where a single value was expected.

 

Hi @teaguejs 
For option 1 please make sure you are creating a calculated table not measure

1.png

If you want to create a measure please use option 2

Hi @teaguejs 
Have check this? ☝️

 

rsbin
Super User
Super User

@teaguejs 

I think using the LOOKUPVALUE function should do the trick.

Create a new Calculated Column in your table_a.  The syntax should be something like this:

 

ColumnA_fromTable_b = LOOKUPVALUE( table_b[Column A],table_b[Column A], [Column A] )

 

From Microsoft documentation:

 

LOOKUPVALUE(
    <result_columnName>,
    <search_columnName>,
    <search_value>
    [, <search2_columnName>, <search2_value>]…
    [, <alternateResult>]
)

 

Hope this works for you.

Regards,

 

 

I think the naming convention I used for the columns and tables is a bit confusing so let me use the actual names.

 

Within the table named CMDB_Active and the column within that table called ip_address, I am trying to see what values are contained in that column that aren't in a separate table called it_current from the column ip_address_rapid7

 

Let me know if that makes sense

@teaguejs ,

It would be helpful if you can you paste a small sample of data (just several records) from each table into a reply.  Paste it as a table, not a picture.

 

@teaguejs ,

I created two sample tables:

rsbin_1-1660682174275.png

 

ip_address_rapid7

b
c
d

 

This is the result

rsbin_0-1660681983341.png

So the values "a", "e" and "f" are not in your second table called it_current.

Hope this helps.

Another error, I used the following to create a new column in Table A:

Missing from Rapid7 = LOOKUPVALUE(it_current[ip_address_rapid7],it_current[ip_address_rapid7], [ip_address])

Error: The column 'CMDB_Active'[Missing from Rapid7] cannot be pushed to the remote data source and cannot be used in this scenario.

@teaguejs ,

I have never seen this error message before.  I did a quick google search using the phrase "cannot be pushed to the remote data source and cannot be used in this scenario".

Are you using DirectQuery or a composite model?

If this is the case, we may have to try something different.  Suggest you search the same and to see if you can find an answer.  I am stumped at the moment.

 

Yeah unfortunately it is direct query so I will have to do some other searching, thanks for your help though!

@teaguejs ,

May not be the cleanest solution, but are you able to create a new Table summarizing on

'CMDB_Active[ip_address].  Then add your new column to this new calculated table.

Due to the size of your dataset, it might degrade the performance of your model, but at least you can see if it gets you the results you are looking for.  Then work on the performance if it is an issue.

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.