Calculate distance of closest point using latitude/longitude en masse?

I'm trying to figure out, on a theoretical basis, how to find the closest points to a single point. All of these points (75,000+ points) are mapped out with latitude and longitude. The Haversine formula for distance is what I'm using to find the distance between two points, but does this formula scale on the fly?
I'm using a web front end + SQL Server backend. I can't even begin to imagine how to do this... find all distances on the fly then sort them based on distance? Again, I'm wondering if this scales to as many points as I have.

This article goes into detail about this exact subject.
The main optimization is using an index on lat/long. If you know that there must be a point within some distance of your query, you can use that known distance to only check points within that box. Because of the index, the database does a range scan.
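To make the range-scan idea concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the article's code: a list sorted by latitude stands in for the database index, and the function name and the box half-widths (given in degrees) are hypothetical.

```python
import bisect


def candidates_in_box(points_by_lat, lat, lon, dlat_deg, dlon_deg):
    """Return the points inside the lat/lon box centred on (lat, lon).

    points_by_lat must be a list of (lat, lon) tuples sorted by latitude;
    the sorted list plays the role of the index on the latitude column.
    """
    # Range scan: binary-search both latitude bounds instead of visiting every row.
    lo = bisect.bisect_left(points_by_lat, (lat - dlat_deg, float("-inf")))
    hi = bisect.bisect_right(points_by_lat, (lat + dlat_deg, float("inf")))
    # Only rows inside the latitude band are checked against the longitude bounds.
    return [p for p in points_by_lat[lo:hi] if abs(p[1] - lon) <= dlon_deg]
```

Only the few candidates that survive the box test then need an exact distance calculation. The sketch ignores the wrap-around at ±180 degrees of longitude.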

I assume you have lat and long in an SQL database.
The simplest approach to a problem like this is to compute all distances and sort by them.
SELECT (haversine formula) as distance
FROM table
ORDER BY distance ASC
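The same compute-everything-then-sort approach can be sketched outside the database. The Python below is a minimal illustration; the mean Earth radius of 6371 km and the function names are assumptions, not part of the answer.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def sorted_by_distance(points, lat, lon):
    """Return (distance_km, (lat, lon)) pairs for every point, nearest first."""
    return sorted(
        (haversine_km(lat, lon, p_lat, p_lon), (p_lat, p_lon))
        for p_lat, p_lon in points
    )
```

This is one linear pass plus a sort, so the cost grows with the number of points rather than with the number of points near the query.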

Related

DataStructure algorithm for pathfinding in a pointcloud

I was unsure how to phrase the title. I am searching for good data structure / algorithm combination for the following problem:
I have about 20000 objects, each containing a set of values (integers) and an XYZ position (doubles). I want to find the object that best fits the following conditions:
1. Needs to be within a maximum distance from a given starting point
2. From a given starting point, it has to be reachable by requiring no "hopping" (i.e. "travelling" from one object with position XYZ to another object's position) with a distance greater than a given threshold
3. It has to have the maximum value (of a given position in the set of integers)
At first I thought about graph theory and pathfinding, but it does not seem to fit well. I have no distinct edges, so I would have to link every point with every other point and use the distance as the edge weight. This would result in a lot(!) of edges. The second problem is that pathfinding (if I am not mistaken) only takes one criterion (usually distance or cost) into account. But I would need multiple criteria (distance, hop limit, int value).
Any thoughts and also any good libraries to solve this well?

How to match SURF interest points to a database of images

I am using the SURF algorithm in C# (OpenSurf) to get a list of interest points from an image. Each of these interest points contains a vector of descriptors, an x coordinate (int), a y coordinate (int), the scale (float) and the orientation (float).
Now, I want to compare the interest points from one image to a list of images in a database, which also have a list of interest points, to find the most similar image. That is: [Image(I.P.)] COMPARETO [List of Images(I.P.)]. => Best match. Comparing the images on an individual basis yields unsatisfactory results.
When searching Stack Overflow or other sites, the best solution I have found is to build a FLANN index while at the same time keeping track of where the interest points come from. But before implementing it, I have some questions which puzzle me:
1) When matching images based on their SURF interest points, an algorithm I have found does the matching by comparing their distance (x1,y1->x2,y2) with each other and finding the image with the lowest total distance. Are the descriptors or orientation never used when comparing interest points?
2) If the descriptors are used, then how do I compare them? I can't figure out how to compare X vectors of 64 points (1 image) with Y vectors of 64 points (several images) using an indexed tree.
I would really appreciate some help. All the places I have searched or API I found, only support matching one picture to another, but not to match one picture effectively to a list of pictures.

There are multiple things here.
In order to know whether two images are (almost) equal, you have to find the homographic projection between the two that results in a minimal error between the projected feature locations. Brute-forcing that is possible but not efficient, so a trick is to assume that similar images tend to have their feature locations in the same spot as well (give or take a bit). For example, when stitching images, the images to stitch are usually taken from only a slightly different angle and/or location; even if not, the distances will likely grow ("proportionally") with the difference in orientation.
This means that you can - as a broad phase - select candidate images by finding k pairs of points with minimum spatial distance (the k nearest neighbors) between all pairs of images and perform homography only on these points. Only then you compare the projected point-pairwise spatial distance and sort the images by said distance; the lowest distance implies the best possible match (given the circumstances).
If I'm not mistaken, the descriptors are oriented by the strongest angle in the angle histogram. That means you may also decide to take the Euclidean (L2) distance of the 64- or 128-dimensional feature descriptors directly to obtain the actual feature-space similarity of two given features and perform homography on the best k candidates. (You will not compare the scale at which the descriptors were found though, because that would defeat the purpose of scale invariance.)
Both options are time-consuming and depend directly on the number of images and features; in other words: stupid idea.
Approximate Nearest Neighbors
A neat trick is to not use actual distances at all, but approximate distances instead. In other words, you want an approximate nearest neighbor algorithm, and FLANN (although not for .NET) would be one of them.
One key point here is the projection search algorithm. It works like this:
Assume you want to compare the descriptors in 64-dimensional feature space. You generate a random 64-dimensional vector and normalize it, resulting in an arbitrary unit vector in feature space; let's call it A. Now (during indexing) you form the dot product of each descriptor with this vector. This projects each 64-d vector onto A, resulting in a single real number a_n. (This value a_n represents the distance of the descriptor along A in relation to A's origin.)
This image, which I borrowed from this answer on CrossValidated regarding PCA, demonstrates it visually; think of the rotation as the result of different random choices of A, where the red dots correspond to the projections (and thus the scalars a_n). The red lines show the error you make by using that approach; this is what makes the search approximate.
You will need A again for search, so you store it. You also keep track of each projected value a_n and the descriptor it came from; furthermore you align each a_n (with a link to its descriptor) in a list, sorted by a_n.
To clarify using another image from here, we're interested in the location of the projected points along the axis A:
The values a_0 .. a_3 of the 4 projected points in the image are approximately sqrt(0.5²+1.5²)=1.58, sqrt(0.4²+1.1²)=1.17, -0.84 and -0.95, corresponding to their distance to A's origin.
If you now want to find similar images, you do the same: Project each descriptor onto A, resulting in a scalar q (query). Now you go to the position of q in the list and take the k surrounding entries. These are your approximate nearest neighbors. Now take the feature-space distance of these k values and sort by lowest distance - the top ones are your best candidates.
Coming back to the last picture, assume the topmost point is our query. Its projection is 1.58 and its approximate nearest neighbor (of the four projected points) is the one at 1.17. They're not really close in feature space, but given that we just compared two 64-dimensional vectors using only two values, it's not that bad either.
You can see the limits of this approach: similar projections do not require the original vectors to be close at all, so this will of course result in rather creative matches. To accommodate this, you simply generate more base vectors B, C, etc. - say n of them - and keep a separate list for each. Take the k best matches on all of them, sort that list of k*n 64-dimensional vectors by their Euclidean distance to the query vector, perform homography on the best ones and select the one with the lowest projection error.
The neat part about this is that if you have n (random, normalized) projection axes and want to search in 64-dimensional space, you are simply multiplying each descriptor by an n x 64 matrix, resulting in n scalars.
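The projection search described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not FLANN's implementation: the class and parameter names are hypothetical, descriptors are plain lists of floats, and only the standard library is used.

```python
import bisect
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def unit_vector(dim, rng):
    """Random direction in feature space, normalized to length 1."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


class ProjectionIndex:
    def __init__(self, descriptors, n_axes=8, seed=0):
        rng = random.Random(seed)
        self.descriptors = descriptors
        # The axes A, B, C, ... are stored because the query needs them again.
        self.axes = [unit_vector(len(descriptors[0]), rng) for _ in range(n_axes)]
        # One list per axis of (projection a_n, descriptor index), sorted by a_n.
        self.lists = [
            sorted((dot(d, axis), i) for i, d in enumerate(descriptors))
            for axis in self.axes
        ]

    def query(self, q, k=3):
        """Return candidate descriptor indices, best feature-space match first."""
        candidates = set()
        for axis, entries in zip(self.axes, self.lists):
            pos = bisect.bisect_left(entries, (dot(q, axis), -1))
            # Take the entries surrounding the query's position on this axis.
            for _, i in entries[max(0, pos - k):pos + k]:
                candidates.add(i)
        # Re-rank the (at most 2*k*n) candidates by true Euclidean distance.
        return sorted(candidates, key=lambda i: l2(q, self.descriptors[i]))
```

To map the result back to images, each descriptor index would additionally have to be linked to the image it came from; that bookkeeping is left out here.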

I am pretty sure that the distance is calculated between the descriptors and not their coordinates (x,y). You can directly compare only one descriptor against another. I propose the following possible solution (surely not the optimal one):
For each descriptor in the query image, find the top-k nearest neighbors in your dataset; then take all the top-k lists and find the image that occurs most often in them.
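A minimal Python sketch of this voting scheme follows. It uses a brute-force top-k search rather than an index, and the function name and the (image_id, descriptor) layout of the database are assumptions made for the example.

```python
import heapq
import math
from collections import Counter


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_image(query_descriptors, database, k=2):
    """database is a list of (image_id, descriptor) pairs from all stored images."""
    votes = Counter()
    for q in query_descriptors:
        # Top-k nearest stored descriptors for this query descriptor (brute force).
        nearest = heapq.nsmallest(k, database, key=lambda entry: l2(q, entry[1]))
        # Each neighbor casts one vote for the image it came from.
        votes.update(image_id for image_id, _ in nearest)
    return votes.most_common(1)[0][0]
```

For a large database the brute-force search inside the loop would be replaced by an approximate nearest neighbor index; the voting step stays the same.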

Latitude/Longitude Math Question (Distance Between Two Coordinates)

I currently have a database query that calculates the distance from a "home" location to every store in the database. I'm calculating the flight distance using this formula.
http://www.movable-type.co.uk/scripts/latlong.html
Then I'm ordering them and displaying them. I discovered that a much better way to do it is to search only for stores whose latitude/longitude are within a range.
For example, instead of calculating the distance to each store in the database (over 30000), only select the ones with lat/longs within a certain range and calculate the distance to those.
Right now I'm trying to find out how to calculate the actual bounds. The distance has to be below 5km. So I divide 5 by 100 and cap the latitude and longitude by those amounts.
SELECT storeid, storedescription, address, city, storebannerdescription, lat, lon,
ROUND (gc_dist (lat, lon, 43.758152, -79.746639), 1) AS distance
FROM storelatlon
WHERE ((lat-43.758152) < 0.05 AND (lat -43.758152) > -0.05) AND ( (lon-(-79.746639)) < 0.05 AND (lon-(-79.746639)) > -0.05)
ORDER BY Distance

This method could work, but depending on how widely distributed your stores are, you may need to refine your bounds a bit, because as you go up in latitude, a given distance spans more degrees of longitude. You may get too many or too few points if you just filter by 0.05 deg at all latitudes.
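As a rough illustration of how the bounds depend on latitude, the Python sketch below converts a radius in kilometres into half-widths in degrees. The figure of 111.32 km per degree of latitude is an approximation and the function name is hypothetical.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude (assumed constant)


def box_half_widths_deg(lat_deg, radius_km):
    """Return (dlat, dlon) in degrees for a box that contains a circle of radius_km."""
    dlat = radius_km / KM_PER_DEG_LAT
    # A degree of longitude shrinks by cos(latitude), so more degrees are needed.
    dlon = radius_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat_deg)))
    return dlat, dlon
```

For the query's home location (latitude 43.758152) and 5 km this gives roughly 0.045 degrees of latitude and 0.062 degrees of longitude, so a flat 0.05 cap is slightly too generous north-south and too tight east-west.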

I assume that you are using PostgreSQL based on your use of gc_dist. As long as you have PostGIS installed, then you can perform the desired logic using built-in functionality.
You will have to prepare your table for this:
First, store your location as a PostGIS point. I suggest using WGS-84 for geographic calculations.
After you do that, then you can query for all points within any given distance from your desired location using ST_Contains and ST_Buffer, where ST_Buffer wraps the center point that you want to find neighbors for (-79.746639, 43.758152) in long/lat order.
I have never used PostgreSQL for this, but I have done this with Oracle. It appears to take the same setup for PostgreSQL as it did for Oracle:
A. Tell the database that you will be creating a geometric-based column. Found syntax here.
SELECT AddGeometryColumn('schema_name', 'table_name', 'desired_column_name', 4326,
'POINT', 2);
Above, 4326 is a constant "SRID" denoting WGS-84.
B. Add an index to the column. Found syntax here (this page seems very useful).
CREATE INDEX desired_index_name
ON table_name
USING gist(desired_column_name);
C. Add your data.
INSERT INTO table_name (desired_column_name)
VALUES('SRID=4326;POINT(-79.746639 43.758152)');
(I am curious if the above string needs to be contained by ST_GeomFromText based on other examples)
or (I found here)
INSERT INTO table_name (desired_column_name)
VALUES (ST_SetSRID(ST_MakePoint(-79.746639, 43.758152), 4326)); -- ST_SetSRID tags the point as WGS-84; ST_Transform would fail on a point with no SRID
Note: I assume it's (X, Y), which means Longitude and Latitude ordering.
D. Query your data.
SELECT * FROM table_name
WHERE
ST_Contains(
ST_Buffer(
ST_SetSRID(ST_MakePoint(-79.746639, 43.758152), 4326)::geography,
5000)::geometry, -- radius in meters (5 km); meters apply only because the point is cast to geography
desired_column_name);

C# find all Latitude and Longitude within a mile

Given a lat and long value, is there any way of finding all lat and longs that are within a specified distance? I have a db table of lat and long values which are locations of let's say street lamps, given a lat long pair how could I find all those that are within a particular distance?
I guess drawing a circle from the starting point and finding all lat and longs contained would be the best way however, I don't have the skills to do this. I am a c# developer by trade but need a few pointers in the whole geocoding world.

You could use the Haversine formula (see @tdammers' answer) to calculate the distance between each point (Lat, Long) in your table and the given point. You will have to iterate over the entire collection in order to evaluate each point individually.
Or, if you are using SQL Server 2008, then geospatial support is built-in. Each record would store the location as a geography type (possibly in addition to two discrete columns to hold Latitude and Longitude, if it's easier to have those values broken out), and then you can construct a simple SQL query:
DECLARE @Point geography = 'POINT(-83.12345 45.12345)' -- Note: Long Lat ordering required when using WKT
SELECT *
FROM tblStreetLamps
WHERE location.STDistance(@Point) < 1 * 1609.344 -- Note: 1 mile converted to meters
Another similar possibility is to bring the SQL Spatial types into your .NET application. The redistributable is found here: http://www.microsoft.com/downloads/en/details.aspx?FamilyID=CEB4346F-657F-4D28-83F5-AAE0C5C83D52 (under Microsoft® System CLR Types for SQL Server® 2008 R2).
Then, the querying can be done via LINQ. Note: This saves you from implementing the Haversine by yourself, otherwise the process of querying would be the same.
var yourLocation = SqlGeography.Point(Latitude, Longitude, 4326);
var query = from fac in FacilityList
            let distance = SqlGeography
                .Point(fac.Lat, fac.Lon, 4326)
                .STDistance(yourLocation)
                .Value
            where distance < 1 * 1609.344
            orderby distance
            select fac;
return query.Distinct().ToList();

The haversine formula gives you the distance between two lat/lon points, in whatever unit you use for the Earth's radius (meters, say; converting to miles is trivial). From there, you can probably find the reverse...

I'm a little late for answering this, but I came up with a trick years ago to do essentially the same for satellite fields of view.
There are two points on earth where you exactly know the latitude and longitude of every point a given distance from your location. Those points are the North and South poles. So let’s put the point you want at North pole. One nautical mile away is the circle of longitudes with latitude 90 degrees minus 1 minute, or 90 – 1/60 degrees = 89.9833 degrees North latitude, since 1 minute of arc = 1 nautical mile.
Now that you have the locus of longitudes one nautical mile from the pole at latitude 89.9833, you essentially rotate the earth until the lat/long you want is where the pole used to be. This process is called “The Rotation of the Map Graticules”. The math for this is straightforward once you’ve thought about the equations for a while. I have them buried somewhere, so I can’t get to the code easily; however, the process, with the equations, is in John Snyder’s book “Map Projections: A Working Manual”. You can get the pdf free at http://pubs.usgs.gov/pp/1395/report.pdf. The explanation is on pages 29 – 32.
Chuck Gantz
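The circle of points at a fixed distance can also be generated directly. The Python sketch below is not the graticule rotation from Snyder's book; it uses the standard spherical destination-point formula, which traces the same circle by stepping through bearings. The mean Earth radius of 6371 km and the function names are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def destination(lat_deg, lon_deg, bearing_deg, distance_km):
    """Point reached from (lat, lon) after travelling distance_km on the given bearing."""
    phi1 = math.radians(lat_deg)
    lam1 = math.radians(lon_deg)
    theta = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    phi2 = math.asin(
        math.sin(phi1) * math.cos(delta)
        + math.cos(phi1) * math.sin(delta) * math.cos(theta)
    )
    lam2 = lam1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2),
    )
    # Normalize the longitude to the range [-180, 180).
    lon2 = (math.degrees(lam2) + 540.0) % 360.0 - 180.0
    return math.degrees(phi2), lon2


def circle_points(lat_deg, lon_deg, distance_km, n=36):
    """n points on the circle of radius distance_km around (lat, lon)."""
    return [destination(lat_deg, lon_deg, 360.0 * i / n, distance_km) for i in range(n)]
```

Starting from the North pole with a distance of 1.852 km (one nautical mile), every bearing lands at a latitude of about 89.9833 degrees, which matches the polar circle described above.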

Some time ago I was solving the problem of how to get POIs along a road. I made use of a quadtree, which means dividing the whole area into cells and subcells recursively. Each POI belongs to only one cell. With these cells you can easily do a coarse calculation at cell level first and then search only the cells that intersect your area of interest. It's more of a game development technique, but it can be used here as well. Here is something about it on Wikipedia:
http://en.wikipedia.org/wiki/Quadtree
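A minimal point quadtree can be sketched in Python as follows. It is an illustration, not the code used for the POI problem: the class name, the cell capacity and the depth limit are assumptions, and x/y can be read as longitude/latitude.

```python
class QuadTree:
    """Point quadtree over [x0, x1) x [y0, y1); each point lives in exactly one cell."""

    def __init__(self, x0, y0, x1, y1, capacity=4, depth=0, max_depth=16):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth  # stops endless splitting on duplicate points
        self.points = []
        self.children = None  # four sub-cells once this cell has been split

    def insert(self, x, y):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= x < x1 and y0 <= y < y1):
            return False  # point lies outside this cell
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((x, y))
                return True
            self._split()
        return any(child.insert(x, y) for child in self.children)

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.children = [
            QuadTree(x0, y0, mx, my, *args), QuadTree(mx, y0, x1, my, *args),
            QuadTree(x0, my, mx, y1, *args), QuadTree(mx, my, x1, y1, *args),
        ]
        # Push the points stored here down into the new sub-cells.
        for px, py in self.points:
            any(child.insert(px, py) for child in self.children)
        self.points = []

    def query(self, qx0, qy0, qx1, qy1):
        """Return all points inside the query rectangle (bounds inclusive)."""
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or qx0 >= x1 or qy1 < y0 or qy0 >= y1:
            return []  # cell does not intersect the query, skip the whole subtree
        found = [(x, y) for x, y in self.points if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        if self.children is not None:
            for child in self.children:
                found.extend(child.query(qx0, qy0, qx1, qy1))
        return found
```

The intersection test at the top of `query` is what provides the saving: whole cells that do not touch the search rectangle are discarded without looking at their points.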

How to get the closest significant population centre from a latitude and longitude?

I'm currently trying to reverse geocode a series of lat/long co-ordinates using the Virtual Earth/Bing Maps web services. Whilst I can successfully retrieve an address for the positions I also need to be able to retrieve the closest significant population centre for the position so I can display a heading and distance to the centre of the nearest town/city/metropolis etc. This is for cases where the location is travelling between locations e.g. on a highway/motorway.
Has anyone out there got any ideas how to do this as I've been banging my head against it for a few days now and I've gotten nowhere!
Cheers in advance...

I think it is safe to assume that the nearest city is always quite close compared with the size of the Earth, so you can treat the surface as flat and use simple Pythagorean geometry.
Suppose you are at (lat0, long0) and a trial city is at (lat1, long1).
Horizontal (EW) distance is roughly
d_ew = (long1 - long0) * cos(lat0)
This is multiplied by cos(lat0) to account for longitude lines getting closer together at high latitude.
Vertical (NS) distance is easier
d_ns = (lat1 - lat0)
So the distance between the two points is
d = sqrt(d_ew * d_ew + d_ns * d_ns)
You can refine this method for more exacting tasks, but this should be good enough for the nearest city.
For comparing distances, it will be fine to compare d squared, which means you can omit the sqrt operation.
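The approximation above can be sketched in Python as follows. The function names and the (name, lat, lon) layout of the city list are assumptions; note that the latitude has to be converted to radians before taking the cosine, and that the result is in squared degrees, which is enough for comparing candidates.

```python
import math


def approx_distance_sq(lat0, lon0, lat1, lon1):
    """Squared flat-earth distance, in squared degrees of latitude."""
    # Scale the east-west difference by cos(lat0); cos() expects radians.
    d_ew = (lon1 - lon0) * math.cos(math.radians(lat0))
    d_ns = lat1 - lat0
    return d_ew * d_ew + d_ns * d_ns


def nearest_city(lat0, lon0, cities):
    """cities is an iterable of (name, lat, lon); returns the closest entry."""
    # Comparing squared distances gives the same ordering, so sqrt is skipped.
    return min(cities, key=lambda c: approx_distance_sq(lat0, lon0, c[1], c[2]))
```

The sketch does not handle positions on opposite sides of the ±180 degree meridian, where the raw longitude difference is misleading.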

Try using the Wikipedia location service, documented here:
http://www.geonames.org/export/wikipedia-webservice.html

It sounds like you're looking for a database of latitudes and longitudes of major cities, so you can calculate distances.
This is a link to a page giving a few dozen, world-wide.
There may be others, most likely US-centric (but that may be what you want).

I would do the following:
table 1:
T_CityPopulation
Fields:
CityTownInfo,Population,LonLat
Then compute the distance between your current LonLat and each record in table 1, using a threshold value to ignore towns/cities more than x miles away. Then sort the results by Population.
EDIT:
Even if you don't want to maintain a table, the data has to be stored somewhere. I think that if you maintain it yourself, at least you have control over it, rather than relying on another service.
