cancel
Showing results for 
Search instead for 
Did you mean: 

Fabric is Generally Available. Browse Fabric Presentations. Work towards your Fabric certification with the Cloud Skills Challenge.

Reply
teaguejs
Helper II
Helper II

Comparing Two Different Datasets for missing values

Hi,

 

I have two different datasets/tables in Power BI. One that is a Power BI dataset and the other is a Google BigQuery direct query.

 

Lets call them table_a and table_b. I would like to compare what values in Column A in table_a are missing from Column A in table_b.

 

Any tips on the easiest way to go about doing this?

1 ACCEPTED SOLUTION
tamerj1
Super User
Super User

Hi @teaguejs 
You can create a new calculated table 

Missing From b =
EXCEPT ( VALUES ( Table_a[Column A] ), VALUES ( Table_b[Column A] ) )

Or you can create a measure, for example for a card visual 

Missing From b =
CONCATENATEX (
    EXCEPT ( VALUES ( Table_a[Column A] ), VALUES ( Table_b[Column A] ) ),
    [Column A],
    UNICHAR ( 10 )
)

View solution in original post

13 REPLIES 13
tamerj1
Super User
Super User

Hi @teaguejs 
You can create a new calculated table 

Missing From b =
EXCEPT ( VALUES ( Table_a[Column A] ), VALUES ( Table_b[Column A] ) )

Or you can create a measure, for example for a card visual 

Missing From b =
CONCATENATEX (
    EXCEPT ( VALUES ( Table_a[Column A] ), VALUES ( Table_b[Column A] ) ),
    [Column A],
    UNICHAR ( 10 )
)

I used the first suggestion of creating a new calculated table and that worked, thank you!

I tired your first calculated table suggestion and got the following error:

 

Error Message:
MdxScript(Model) (59, 38) Calculation error in measure 'CMDB_Active'[Missing from b]: A table of multiple values was supplied where a single value was expected.

 

Hi @teaguejs 
For option 1 please make sure you are creating a calculated table not measure

1.png

If you want to create a measure please use option 2

Hi @teaguejs 
Have check this? ☝️

 

rsbin
Super User
Super User

@teaguejs 

I think using the LOOKUPVALUE function should do the trick.

Create a new Calculated Column in your table_a.  The syntax should be something like this:

 

ColumnA_fromTable_b = LOOKUPVALUE( table_b[Column A],table_b[Column A], [Column A] )

 

From Microsoft documentation:

 

LOOKUPVALUE(
    <result_columnName>,
    <search_columnName>,
    <search_value>
    [, <search2_columnName>, <search2_value>]…
    [, <alternateResult>]
)

 

Hope this works for you.

Regards,

 

 

I think the naming convention I used for the columns and tables is a bit confusing so let me use the actual names.

 

Within the table named CMDB_Active and the column within that table called ip_address, I am trying to see what values are contained in that column that aren't in a separate table called it_current from the column ip_address_rapid7

 

Let me know if that makes sense

@teaguejs ,

It would be helpful if you can you paste a small sample of data (just several records) from each table into a reply.  Paste it as a table, not a picture.

 

@teaguejs ,

I created two sample tables:

rsbin_1-1660682174275.png

 

ip_address_rapid7

b
c
d

 

This is the result

rsbin_0-1660681983341.png

So the values "a", "e" and "f" are not in your second table called it_current.

Hope this helps.

Another error, I used the following to create a new column in Table A:

Missing from Rapid7 = LOOKUPVALUE(it_current[ip_address_rapid7],it_current[ip_address_rapid7], [ip_address])

Error: The column 'CMDB_Active'[Missing from Rapid7] cannot be pushed to the remote data source and cannot be used in this scenario.

@teaguejs ,

I have never seen this error message before.  I did a quick google search using the phrase "cannot be pushed to the remote data source and cannot be used in this scenario".

Are you using DirectQuery or a composite model?

If this is the case, we may have to try something different.  Suggest you search the same and to see if you can find an answer.  I am stumped at the moment.

 

Yeah unfortunately it is direct query so I will have to do some other searching, thanks for your help though!

@teaguejs ,

May not be the cleanest solution, but are you able to create a new Table summarizing on

'CMDB_Active[ip_address].  Then add your new column to this new calculated table.

Due to the size of your dataset, it might degrade the performance of your model, but at least you can see if it gets you the results you are looking for.  Then work on the performance if it is an issue.

Helpful resources

Announcements
PBI November 2023 Update Carousel

Power BI Monthly Update - November 2023

Check out the November 2023 Power BI update to learn about new features.

Community News

Fabric Community News unified experience

Read the latest Fabric Community announcements, including updates on Power BI, Synapse, Data Factory and Data Activator.

Dashboard in a day with date

Exclusive opportunity for Women!

Join us for a free, hands-on Microsoft workshop led by women trainers for women where you will learn how to build a Dashboard in a Day!

Power BI Fabric Summit Carousel

The largest Power BI and Fabric virtual conference

130+ sessions, 130+ speakers, Product managers, MVPs, and experts. All about Power BI and Fabric. Attend online or watch the recordings.

Top Solution Authors
Top Kudoed Authors