Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Compete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.

Reply
sebasj
Frequent Visitor

Lookup nearest date but not later

Hello everyone! Been peeking the group for a long timem but now I need to make a question because I'm not being able to solve this myself. 

 

I have to find with the IDs in Table 1 the closest date in Table 2 to date in Table 1 that isn't later than date in Table 1. Once that's found, I need to bring the data in Location column in table 2 to Table 1

 

Here's the example:

 

Table 1 (date is in dd/mm/yyyy)

IDDateLocation
101/03/2021 
205/03/2021 
308/05/2021 
417/06/2021 

 

Table 2

IDDateLocation
101/02/20213
129/02/20215
104/04/20217
204/03/202123
207/07/20214
203/02/20214
207/03/202142
324/06/20218
304/04/202115
426/07/202134

 

So the result would be:

 

Table 1

IDDateLocation
101/03/20215
205/03/202123
308/05/202115
417/06/2021<null>

 

This is because:

- For ID1 closest date in Table 2 to 01/03/2021 that is not later, is 29/02/2021 which corresponds to Location 5

- For ID2 closest date in Table 2 to 05/03/2021 that is not AFTER that date, is 04/03/2021 which corresponds to Location 23

- For ID3 closest date in Table 2 to 08/05/2021 that is not later is 04/04/2021, which corresponds to Location 15

- For ID4 there's no close date that isn't later than 17/06/2021 so it's null

 

I've seen this post but while it worked, it also brings dates later than the one in Table 1. 

 

Any help will be greatly appreciated, thanks a lot!

 

Best regards,

Seb

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @sebasj 

I think I find the reason why you get wrong result. 

Let's see Date 2020/08/28 and ID 162614.33 in Table 1 in your sample. In Table 2, we can find the nearest date is 2020/08/28, it contains 2571 and 2570 two results. So in Query 1 max function will only show the max location 2571 , and in Query 2 sum function will show 2571+2570 = 5141. We couldn't expand the table by calculated column. Power Query in Power Query Editor will be a good way to achieve your goal.

1. Merge Table 1 and Table 2 by ID column.

1.png

2. Expand Table 2 by Date and Location columns.

3. Add a Conditional column as below, to let Table2.Date <=Date in Table1 show 1, or show 0.2.png

4. Remove 0 from Conditional Column by filter.

5. Group By with ID and Date column to show Max Date. And expand all rows to show Location you need.

3.png

Finally result is as below.

4.png

Best Regards,
Rico Zhou

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

11 REPLIES 11
Anonymous
Not applicable

Hi @sebasj 

I think you just need to build a calculated column by filter and earlier function in "Table 1"

Code:

Location =
CALCULATE (
    MAX ( 'Table 2'[Location] ),
    FILTER (
        'Table 2',
        'Table 2'[ID] = EARLIER ( 'Table 1'[ID] )
            && 'Table 2'[Date] <= EARLIER ( 'Table 1'[Date] )
    )
)

Result is as below.

1.png

 

Best Regards,
Rico Zhou

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Thanks a lot for the work!

I tried it and it appears to be doing exactly what I was looking for (if you're able to, I would like to ask you how this works, though it seems self explanatory there are some parts I don't quite get).

My datasets are big (>3M rows) so I'm going to to test, but so far it seems to be working exactly as expected. As soon as I end testing, if everything goes fine I'll mark the answer as accepted solution. In the meantime, if you're kind enough to explain me how it works I'll be really grateful. Thanks again!

Anonymous
Not applicable

Hi @sebasj 

I find there is some problem in my old code and I update my code as below. This is correct one.

Location =
VAR _MAXDate =
    CALCULATE (
        MAX ( 'Table 2'[Date] ),
        FILTER (
            'Table 2',
            'Table 2'[ID] = EARLIER ( 'Table 1'[ID] )
                && 'Table 2'[Date] <= EARLIER ( 'Table 1'[Date] )
        )
    )
RETURN
    CALCULATE (
        SUM ( 'Table 2'[Location] ),
        FILTER (
            'Table 2',
            'Table 2'[ID] = EARLIER ( 'Table 1'[ID] )
                && 'Table 2'[Date] = _MAXDate
        )
    )

I will explain my code to you. 

You want to calculate Location from 'Table 2' by filter of ID and Date. So we can understand the logic of the code through requirements. Normally, the code format looks like CaLculate(Expression,Filter(Table)). 

Expression is the value you want to get or calculate from, so we add 'Table 2'[Location] into this place. Your Location column looks like in number type, so we need to use sum function to get it. We use max function to get other types of value, just like "Table 2"[Date] in _MaxDate. Earlier function is to get value in current row, so we can know that we let 'Table 2'[ID] = current 'Table 1'[ID]. Then we need to get the last date in "Table 2"before current "Table 1" date, so we use the same logic to get _MaxDate. Var function is hypothesis function, we can use the result in it in below code.

Then we get result.

 

Best Regards,
Rico Zhou

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hello @Anonymous and sorry for the delay in my answer! I've tested the formula with three different datasets. 

This last one seems to have some kind of issue, very weird: for some specific cases (less than 100 in approx 2 million), it certainly brings a very weird result.

Let me show you in detail, perhaps you're able to find what could be causing the issue. To test if this was working OK, I modified the query to bring date instead of location on another column. So, I compared location and dates brought by your first solution, and the results from the current one

The results are the following (I'm going to just show a couple of IDs, not the one hundred)

 

Table1.DateTable1.IDTable2.Date(Query1)Table2.Location(Query1)Table2.Location(Query2)Table2.Date(Date)
23/09/2020162614.3328/08/20202571514127/04/2141
29/09/2020128742.4728/08/20201399246527/04/2141
23/03/2021128621.4819/03/202183561269607/06/2142
20/10/2020102873.7226/07/202012224420/02/2141
23/02/2021102873.7226/07/202012224420/02/2141
22/10/2020102873.6726/07/20202420/02/2141
29/09/2020102873.7326/07/202012625220/02/2141
27/10/2020102873.6026/07/20205711420/02/2141
11/12/20202379.6011/12/20203301660223/11/2141
11/12/2020202996.6011/12/20203301660223/11/2141
26/07/2020213211.6826/07/20201220/02/2141
26/07/2020103663.6826/07/20201220/02/2141
26/07/2020102873.6726/07/20202420/02/2141
26/07/202057086.6726/07/20202420/02/2141
26/07/202057086.6026/07/20205711420/02/2141
26/07/202057086.7226/07/202012224420/02/2141
26/07/202057086.7326/07/202012625220/02/2141
26/07/2020102873.6026/07/20205711420/02/2141
26/07/2020102873.7326/07/202012625220/02/2141
26/07/2020102873.7226/07/202012224420/02/2141
28/08/2020139316.3928/08/20202035223227/04/2141
28/08/2020139316.3928/08/20202035223227/04/2141
28/08/2020162614.3328/08/20202571514127/04/2141

 

As you can see there's something weird going on with dates and location. So, I went into Table 2 and checked these IDs:

 

Table2.LocationTable2.DateTable2.ID
330111/12/2020202996.60
330111/12/20202379.60
330111/12/2020202996.60
330111/12/20202379.60
126/07/2020103663.68
126/07/2020213211.68
226/07/202057086.67
226/07/2020102873.67
5726/07/202057086.60
5726/07/2020102873.60
12626/07/2020102873.73
12626/07/202057086.73
12226/07/202057086.72
12226/07/2020102873.72
19728/08/2020139316.39
203528/08/2020139316.39
257028/08/2020162614.33
257128/08/2020162614.33
106628/08/2020128742.47
139928/08/2020128742.47
354511/02/2021139316.39
434019/03/2021128621.48
835619/03/2021128621.48
613421/07/20218409.49
316121/07/20218409.49
274928/08/2020128621.48
12226/07/2020102873.72
12226/07/202057086.72
126/07/2020213211.68
126/07/2020103663.68
226/07/202057086.67
226/07/2020102873.67
5726/07/2020102873.60
5726/07/202057086.60
12626/07/202057086.73
12626/07/2020102873.73

 

The only thing I find in common is that they all are duplicate IDs (repeated). Seems this last query does something weird with the dates, and also adds up the location codes for repeated IDs, could it be?

Again, thanks a lot for your help, sorry for the delay, it took me awhile to test it!

 

PS: I just saw how the table looks on the post and it looks very weird. I'm trying to edit it. Unfortunately I don't know how to add a file.

sebasj
Frequent Visitor

@Anonymous 

 

I'm sure the issue is within this line

RETURN
    CALCULATE (
        SUM ( 'Table 2'[Location] ),

When it finds two occurrences, it adds the location values and also the dates it seems. I tried to fix it but I have been unable to. 

In your explanation you mention "Expression is the value you want to get or calculate from, so we add 'Table 2'[Location] into this place. Your Location column looks like in number type, so we need to use sum function to get it" which is correct, it is number type, but I dont know how else it could work without summing them in the output. Any help will be greatly appreciated!

 

Anonymous
Not applicable

Hi @sebasj 

I think I find the reason why you get wrong result. 

Let's see Date 2020/08/28 and ID 162614.33 in Table 1 in your sample. In Table 2, we can find the nearest date is 2020/08/28, it contains 2571 and 2570 two results. So in Query 1 max function will only show the max location 2571 , and in Query 2 sum function will show 2571+2570 = 5141. We couldn't expand the table by calculated column. Power Query in Power Query Editor will be a good way to achieve your goal.

1. Merge Table 1 and Table 2 by ID column.

1.png

2. Expand Table 2 by Date and Location columns.

3. Add a Conditional column as below, to let Table2.Date <=Date in Table1 show 1, or show 0.2.png

4. Remove 0 from Conditional Column by filter.

5. Group By with ID and Date column to show Max Date. And expand all rows to show Location you need.

3.png

Finally result is as below.

4.png

Best Regards,
Rico Zhou

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

I still haven't answered because I'm testing on different datasets, but the solution seems to be working.

Let me finish testings and ASAP I'll mark the solution (a couple of days I think at most, have a deadline on me).

Thanks a lot!

Anonymous
Not applicable

Hi @sebasj 

Could you tell me if your problem has been solved? If it is, kindly Accept it as the solution. More people will benefit from it. Or you are still confused about it, please provide me with more details about your table and your problem or share me with your pbix file from your Onedrive for Business.

 

Best Regards,

Rico Zhou

@Anonymous With a little tweaking, this worked out perfectly (or apparently it seems, so far all the tests I've ran against a sample gave a correct result) so I'm accepting as a solution.

Again as I mentioned before, I'm thankful for the time you took to help me to solve this issue. I have a question though: would there be any other method in DAX instead of SUM that could allow expanding and grabbing the correct data? I'm not asking because I need it since this solution works perfectly, I'm asking just to learn more.

Thanks and best regards!

Sebastian

Ashish_Mathur
Super User
Super User

Hi,

This calculated column in Table1 works

=LOOKUPVALUE(Table2[Location],Table2[Date],CALCULATE(MAX(Table2[Date]),FILTER(Table2,Table2[ID]=EARLIER(Table1[ID])&&Table2[Date]<=EARLIER(Table1[Date]))),[ID],Table1[ID])

Hope this helps.

Untitled.png


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

In principle this works, but I've found two issues:

- The first one is that it expects to have unique values (at least that's the error it's giving me), while unfortunately both tables can have more than one occurrence of the ID, what varies is the date (the ID is not an unique identifier per row). That was my bad, I forgot to mention that 

- The second one is that it seems to have some issues when the dataset is too big. The datasets I'm dealing with have >3 million rows by approx. 60 columns and it took a good half an hour to give me a result (which was the error i mentioned above). This was also my bad, I also forgot to mention that. 

I tried to fix it but my knowledge isn't enough to make it work. 

Helpful resources

Announcements
July 2025 community update carousel

Fabric Community Update - July 2025

Find out what's new and trending in the Fabric community.

July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.