The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Hi All,
I have a table of crime stats and a table of CCTV sites, see example tables below. I'd like to map crimes around each CCTV site, for example within 500 m of each site as shown below (although not with the site labels, that was just for clarification),
I can apply the haversine calculation (as shown be Phil Seamark in image below). I assume the best option is to add a calculated column to the crime table. I tried below, but I get a range of errors. I'll also need to add criteria to select the closest CCTV site in case there are two with 500 m, but thought I'd focus on the first part.
Any thoughts on my approach or my calculated column?
CCTV Installaton | |||
Camera ID | Long | Lat | Installation Year |
201 | -36.9722 | 174.7867 | 2019 |
202 | -36.9651 | 174.795 | 2017 |
203 | -36.9665 | 174.8085 | 2018 |
204 | -36.9519 | 174.8385 | 2020 |
205 | -36.9656 | 174.8156 | 2019 |
Crime Stats | |||||
ID | Long | Lat | Crime Type | Year | Calculated Column |
1 | 174.7902527 | -36.97949982 | Property | 2015 | |
2 | 174.7902527 | -36.97949982 | Injury | 2021 | 203 |
3 | 174.8324585 | -36.96062088 | Cyber | 2015 | |
4 | 174.8324585 | -36.96062088 | Injury | 2017 | |
5 | 174.8353424 | -36.93717957 | Injury | 2020 | |
6 | 174.8353424 | -36.93717957 | Injury | 2016 | 201 |
7 | 174.8353424 | -36.93717957 | Cyber | 2020 | |
8 | 174.8505554 | -36.9417305 | Cyber | 2019 | |
9 | 174.8505554 | -36.9417305 | Cyber | 2018 | |
10 | 174.8505554 | -36.9417305 | Property | 2018 | |
11 | 174.7889252 | -37.00097275 | Injury | 2020 | 205 |
12 | 174.7889252 | -37.00097275 | Property | 2016 | |
13 | 174.7889252 | -37.00097275 | Property | 2018 | |
14 | 174.8352814 | -36.93664932 | Injury | 2019 | |
15 | 174.7895508 | -36.97363281 | Injury | 2018 | 202 |
16 | 174.7895508 | -36.97363281 | Property | 2016 | 202 |
17 | 174.7895508 | -36.97363281 | Property | 2017 | |
18 | 174.8317566 | -36.96046066 | Injury | 2022 | |
19 | 174.8317566 | -36.96046066 | Cyber | 2021 | |
20 | 174.8399353 | -36.94024658 | Injury | 2017 |
see message 6
Ahh, yes I did miss message 6. With large data sets, I'm getting some unusual banding when I specify distance of 200 m (see image below). Also, the haversine formula that you used gives quite different distance results. At least a factor of 2 different. I thought they would be similar. I checked the original formula by measuring a distance of Google Maps, and it gave me a matching result within 1 decimal place.
I refactored the formula to its more common format - see if that is better. I also included a simpler version that gives the same result.
DC2 =
var a = summarize(CCTV,CCTV[Camera ID],CCTV[Lat],CCTV[Long],"Dist",12742000 * asin(sqrt(power(sin(RADIANS(CCTV[Lat] - Crimes[Lat]) / 2), 2) + power(sin(RADIANS(CCTV[Long] - Crimes[Long]) / 2), 2) * cos(RADIANS(CCTV[Lat])) * cos(RADIANS(Crimes[Lat])))))
var b = TOPN(1,a,[Dist],ASC)
return CONCATENATEX(b,[Dist])
DC3 =
var a = summarize(CCTV,CCTV[Camera ID],CCTV[Lat],CCTV[Long],"Dist",6371000 * acos(sin(RADIANS(CCTV[Lat]))*sin(RADIANS(Crimes[Lat]))+cos(RADIANS(CCTV[Lat]))*cos(RADIANS(Crimes[Lat]))*cos(RADIANS(CCTV[Long]-Crimes[Long]))))
var b = TOPN(1,a,[Dist],ASC)
return CONCATENATEX(b,[Dist])
Your banding is likely caused by picking a column from the wrong table.
Sorry for the delay responding. I've been testing the above. I was getting some spurious results, so I stuck with the original haversine formula. That combined with your "brut force it" method and a distance slicer worked. So, thanks very much for your help.
Ideally I would like to simplify the code so I only have one calculted column that returns the CCTV ID when less than 500 m away. So if anyone has any suggestions to refine the code it would be greatfully received. Otherwise I'll keep trying to figure out some options.
that returns the CCTV ID when less than 500 m away.
You can only decide that after you have computed the distance.So there is no advantage in throwing away the result. Imagine you then want to say "less than 1000m" etc. Keep the actual values.
One thing you can consider is to ditch the Haversine and do the simple flat earth computation based on the average meters per degree for your latitude and longitude area. Much easier to compute.
1. what are the errors?
2. wouldn't you want this as a measure, rather than a calculated column?
Hi Ibendin,
It's not recognising the references for longitude / latitude for each table. Even for the Crime[Latitude] and Crime[Longitude] is seems to be wanting a measure. I put a screen shot below. I think it's something to do with the fact that the tables are unrelated and I need to do a row by row operation.
I would have thought a calculated column would be better for a row by row operation. Also, it can process the information during the refresh which means it should respond quicker (my basic understanding). But obviously my DAX isn't right.
Basically I just want to add a CCTV site in the calculated column if it's within a certain radius of the crime location. In my head it makes sense, but I'm not sure how to get it to work with unrelated tables.
You can combine both tables. Add a Location Type identifier to distinguish between cameras and crimes.
Hi Ibendin,
thanks for the example. I'm less worried about displaying the CCTV location and more interested in selecting the crimes within a certain radius based on the haversine calculation. Eventually I want to campare crime rates within a radius of CCTV locations vs crime rates beyond that radius to see if there is a difference. I think to do that, I'll need to have a calculated column in the crime table that adds a CCTV site if its within a radius. Any thoughts.
Oh, and thanks re the headers.
Cheers Andrew
I would brute force it , calculate the distance from a crime location to all camera locations and then pick the closest. Storing that distance will probably be useful too.
see attached for a possible implementation. Note that DAX has a RADIANS function.
Note that the Haversine distance is rather meaningless in an urban environment.
Lol @ "brute force it". I'm open to anything. although for rural locations the nearest CCTV location may be miles away. But I could then filter them out. So how would "brute force it" look? Would it be a calculated column added to the Crime table?
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
User | Count |
---|---|
109 | |
78 | |
71 | |
52 | |
50 |
User | Count |
---|---|
123 | |
119 | |
76 | |
64 | |
60 |