Hopefully a quick question. I have an IEnumerable of type Position where Position is defined as follows:
public class Position
{
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}
What I need to do is quickly sort through the IEnumerable to find elements that fall within a certain distance of each other. The elements in the IEnumerable are not populated in any specific order, but at any one time I need to be able to compute which of its members fall within x kilometers of each other.
Now, I already have a Haversine implementation and for argument's sake, we can say it's called GetDistance and has the following signature:
double GetDistance(Position one, Position two);
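For context, a typical Haversine implementation matching that signature looks something like this (an illustrative sketch only; my real implementation may differ in its details):

static double GetDistance(Position one, Position two)
{
    const double EarthRadiusKm = 6371.0;
    double dLat = ToRadians(two.Latitude - one.Latitude);
    double dLon = ToRadians(two.Longitude - one.Longitude);
    double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2) +
               Math.Cos(ToRadians(one.Latitude)) * Math.Cos(ToRadians(two.Latitude)) *
               Math.Sin(dLon / 2) * Math.Sin(dLon / 2);
    return EarthRadiusKm * 2 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1 - a));
}

static double ToRadians(double degrees) => degrees * Math.PI / 180.0;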
I've got a few ideas, but none of them feel particularly efficient to my mind. To give a little more info, it's unlikely the IEnumerable will ever hold more than 10,000 items at any one time.
What I'd like to arrive at is something, perhaps an extension method, that lets me invoke it to return an IEnumerable which contains just the subset from the original collection which meets the criteria, for example:
OriginalEnumerable.GetMembersCloserThan(kilometers: 100);
Any help, much appreciated.
EDIT: For clarity, consider that the IEnumerable I want to search describes a circle with radius r. Its members are coordinates within that circle. I'm trying to determine which members (points) are within a given proximity of each other.
Something like this? Assuming GetDistance is available.
public static IEnumerable<Position> GetMembersCloserThan(this IEnumerable<Position> positions, double maxDistance)
{
    return positions.Where(a => positions.Any(b => a != b && GetDistance(a, b) < maxDistance));
}
Edit I see now you are also interested in performance. The above is not particularly fast, though it is not horribly slow either, since the math for comparing distances is pretty simple. Let me know whether it satisfies your requirements.
Edit 2 This one is much faster: it won't retest past failures, and it automatically adds a match to the success list.
public static IEnumerable<Position> GetMembersCloserThan(this IEnumerable<Position> positions, double maxDistance)
{
    // A HashSet avoids duplicate results when a position matches more than one neighbour.
    var closePositions = new HashSet<Position>();
    var testablePositions = positions.ToList();

    foreach (Position position in positions)
    {
        // Skip this one, it has already been matched.
        if (closePositions.Contains(position))
            continue;

        bool isClose = false;
        foreach (Position testAgainstPosition in testablePositions)
        {
            if (position == testAgainstPosition)
                continue;

            if (GetDistance(position, testAgainstPosition) < maxDistance)
            {
                // Both the position and the tested position pass.
                closePositions.Add(position);
                closePositions.Add(testAgainstPosition);
                isClose = true;
                break;
            }
        }

        // Don't test against this position in the future; it has already been checked.
        if (!isClose)
        {
            testablePositions.Remove(position);
        }
    }

    return closePositions;
}
If you need more performance: put the items in a List sorted by latitude.
To calculate your desired set of locations, iterate over that list. For the distance check you only need to consider items whose latitude differs by at most 100 km, which is roughly 0.9 degrees; that means you can walk back item by item until the latitude difference alone exceeds that bound. (Latitude, unlike longitude, does not wrap around, so the ends of the list need no special handling.)
Mark all items (or yield return them) that are closer than 100 km and move on.
Although I cannot quantify the expense, the sorting should amortize well for large sets. It may perform badly if most points lie at similar latitudes; if that is an issue, use a 2D dictionary with rounded coordinates as keys.
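A minimal sketch of that sweep, assuming the Position class and GetDistance from the question (one degree of latitude is about 111 km, so 100 km corresponds to roughly 0.9 degrees):

public static IEnumerable<Position> GetMembersCloserThanSorted(IEnumerable<Position> positions, double maxKm)
{
    const double KmPerDegreeLatitude = 111.32;  // approximate
    double maxLatDelta = maxKm / KmPerDegreeLatitude;

    var sorted = positions.OrderBy(p => p.Latitude).ToList();
    var matched = new HashSet<Position>();

    for (int i = 0; i < sorted.Count; i++)
    {
        // Walk backwards only while the latitude difference alone
        // could still allow a distance under maxKm.
        for (int j = i - 1; j >= 0; j--)
        {
            if (sorted[i].Latitude - sorted[j].Latitude > maxLatDelta)
                break;
            if (GetDistance(sorted[i], sorted[j]) < maxKm)
            {
                matched.Add(sorted[i]);
                matched.Add(sorted[j]);
            }
        }
    }
    return matched;
}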
I have a bipartite graph that's quite large (~200 vertices per part, usually with 20,000 or more edges in between), and I'm trying to find a Minimum Vertex Cover in it because I'm looking for an assignment between the vertices of the two parts.
According to Koenig's theorem, there is such a cover with the same size as the cardinality of a Maximum Matching (https://en.wikipedia.org/wiki/K%C5%91nig%27s_theorem_(graph_theory)).
I have implemented the Hopcroft Karp algorithm which gives me a Maximum Matching. If needed, I can provide my implementation of that, but I doubt that's where my problem is.
What's the actual problem?
I suspect my implementation, taken from the constructive proof in the Wikipedia article above (https://en.wikipedia.org/wiki/K%C5%91nig%27s_theorem_(graph_theory)#Constructive_proof), has an error in it, but after several hours of checking I have been unable to find the bug: while the Hopcroft-Karp algorithm finds a maximum matching with 192 edges, the Minimum Vertex Cover routine returns 200 vertices. Since this is a bipartite graph, the theorem says these numbers should be equal. Maybe you can help me out and tell me where my mistake is. Thanks in advance!
(Students and Projects are my two types of vertices in the bipartite graph.)
internal static List<Vertex> FindMinimumVertexCover(IReadOnlyList<Edge> matching, IReadOnlyList<Vertex> studentVertices, IReadOnlyList<Vertex> projectVertices)
{
    var unmatchedStudentNodes = studentVertices.Except(matching.Select(e => e.GetStudentVertex())).ToList();
    var visitedVertices = new List<Vertex>();
    var edgeComparer = new EdgeComparer();

    foreach (var unmatchedStudentNode in unmatchedStudentNodes)
    {
        visitedVertices = visitedVertices.Union(FindAlternatingNodes(matching, unmatchedStudentNode, visitedVertices, edgeComparer)).ToList();
    }

    visitedVertices = unmatchedStudentNodes.Union(visitedVertices).ToList();

    return studentVertices.Except(visitedVertices).Union(projectVertices.Intersect(visitedVertices)).ToList();
}

private static List<Vertex> FindAlternatingNodes(IReadOnlyList<Edge> matching, Vertex initialVertex, List<Vertex> visitedVertices, EdgeComparer edgeComparer)
{
    if (visitedVertices.Contains(initialVertex))
        return Enumerable.Empty<Vertex>().ToList();
    visitedVertices.Add(initialVertex);

    List<Edge> unmatchedEdges = initialVertex.Edges.Except(matching, edgeComparer).ToList();
    foreach (Edge unmatchedEdge in unmatchedEdges)
    {
        Vertex visitedVertex = unmatchedEdge.GetProjectVertex();
        Edge matchedEdge = matching.SingleOrDefault(e => e.GetProjectVertex().Equals(visitedVertex));
        if (matchedEdge != default(Edge))
        {
            visitedVertices.Add(visitedVertex);
            visitedVertex = matchedEdge.GetStudentVertex();
            visitedVertices = visitedVertices.Union(FindAlternatingNodes(matching, visitedVertex, visitedVertices, edgeComparer)).ToList();
        }
    }
    return visitedVertices;
}

class EdgeComparer : IEqualityComparer<Edge>
{
    public bool Equals(Edge x, Edge y)
    {
        if (Object.ReferenceEquals(x, y))
            return true;
        if (x is null || y is null)
            return false;
        return Object.ReferenceEquals(x.GetStudentVertex(), y.GetStudentVertex()) && Object.ReferenceEquals(x.GetProjectVertex(), y.GetProjectVertex());
    }

    public int GetHashCode(Edge edge)
    {
        return (Student: edge.GetStudentVertex(), Project: edge.GetProjectVertex()).GetHashCode();
    }
}
I have now found the problem. I want to thank @David Eisenstat, who suggested repeatedly generating small random graphs to test against.
The problem was in my implementation of the Vertex class.
Every time I create an instance of the Edge class, I add that Edge to the corresponding vertices as well (so effectively there are three references to each edge). Calling the outer algorithm again (which calls the method above) only recreated the edge list but left the old references in the vertices intact. Subsequent calls therefore didn't start fresh, and the Minimum Vertex Cover found edges that no longer existed in the graph (specifically in the List<Edge> unmatchedEdges = initialVertex.Edges.Except(matching, edgeComparer).ToList(); line).
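A sketch of the kind of reset that fixes it (assuming the vertices expose a mutable Edges list, as mine do):

// Clear the stale Edge references the vertices still hold from the
// previous run before recreating the Edge instances for the current run.
foreach (var vertex in studentVertices.Concat(projectVertices))
{
    vertex.Edges.Clear();
}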
I have created my own class 'Geolocation', which just holds the double fields 'longitude' and 'latitude'.
I have a List<Geolocation> and another static Geolocation object. I need to loop through the list and analyse which of these locations is closest to the static location. By 'closest' I mean real-life distance, so I think this would be called a geospatial analysis.
My own pseudo-code idea is along these lines:
Geolocation[] closestPositions = new Geolocation[2];
for (int i = 0; i < TrainStations.AllStations.Count; i++)
{
    Geolocation A = TrainStations.AllStations[i];
    Geolocation B = DataHandler.UserCurrentPosition;
    if (A and B are closer than what is stored in closestPositions)
    {
        closestPositions[0] = A;
        closestPositions[1] = B;
    }
}
By the end of this loop I would be left with the two closest points (more specifically, which train station in the city is closest to the user's current position).
However, I'm really not sure whether my pseudo-code would be efficient, and I don't know how to do the actual analysis (get the distance from A to B and measure which is shortest).
Rather than creating my own class 'Geolocation', I should have used the built-in System.Device.Location.GeoCoordinate class.
https://msdn.microsoft.com/en-us/library/system.device.location.geocoordinate(v=vs.110).aspx
This class has a function 'GetDistanceTo(Geocoordinate)' which "Returns the distance between the latitude and longitude coordinates that are specified by this GeoCoordinate and another specified GeoCoordinate."
I can use this in my loop, using GeoCoordinate objects, to analyse which points are closest.
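A minimal sketch of that loop (assuming TrainStations.AllStations now holds GeoCoordinate values and DataHandler.UserCurrentPosition is a GeoCoordinate; GetDistanceTo returns metres):

using System.Device.Location;

GeoCoordinate user = DataHandler.UserCurrentPosition;
GeoCoordinate closest = null;
double closestDistance = double.MaxValue;

foreach (GeoCoordinate station in TrainStations.AllStations)
{
    double distance = user.GetDistanceTo(station);
    if (distance < closestDistance)
    {
        closestDistance = distance;
        closest = station;
    }
}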
Note: I define a 'Jagged Multidimensional' array specifically as jmArray[][,].
I'm trying to wrap my head around the use of this type of array in order to hold simple coordinates, i.e. a pair of integers. This array will be used to create a 3x3 grid, so in my head I'm seeing:
jmArray[N][X,Y]
Where N is the number of the grid slice, and X,Y are the coordinates of the slice. So:
jmArray[2][3,2]
Would mean that slice 2 lies at coordinate 3,2.
I've been trying to assign values to each slice, but I'm stuck somewhere...
jmArray[0][,] = new int[1,2] {{3,3}};
A little help in understanding how to do this properly would be nice.
Unless I'm misunderstanding you, a simpler way to do this would be to create a dictionary keyed by 3-tuples.
var space = new Dictionary<Tuple<int, int, int>, TPointValue>();
// Fill up space with some points
space[Tuple.Create(3,3,1)] = new TPointValue(42);
// Retrieve point from 3d space
TPointValue point3_3_1 = space[Tuple.Create(3,3,1)];
I'll concede that in its current form this approach makes retrieval of planes or basis lines cumbersome and inefficient compared to jagged arrays, although it does make assignment and retrieval of points very efficient.
However: if you were to wrap this data structure in a class of your own that provides methods for accessing planes, lines, etc., you could very easily and efficiently calculate the keys required to obtain any set of points beforehand, e.g. those within a plane/line/polygon, and then access those points very efficiently.
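A rough sketch of such a wrapper (the names Space3D and TValue are made up; only the idea matters):

class Space3D<TValue>
{
    private readonly Dictionary<Tuple<int, int, int>, TValue> points =
        new Dictionary<Tuple<int, int, int>, TValue>();

    public TValue this[int x, int y, int z]
    {
        get { return points[Tuple.Create(x, y, z)]; }
        set { points[Tuple.Create(x, y, z)] = value; }
    }

    // All points currently set in the plane z = constant of a size-by-size grid.
    public IEnumerable<TValue> GetZPlane(int z, int size)
    {
        for (int x = 0; x < size; x++)
            for (int y = 0; y < size; y++)
            {
                TValue value;
                if (points.TryGetValue(Tuple.Create(x, y, z), out value))
                    yield return value;
            }
    }
}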
PS: Note that the value at a point need not be some fancy type like TPointValue; it could be just a string or a float or whatever you like.
You can achieve it like this:
int[][,] jmArray = new int[3][,];
jmArray[0] = new int[1,2] {{3,3}};
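Filling in the whole 3x3 grid slice by slice then looks like this (coordinate values are illustrative):

int[][,] jmArray = new int[3][,];
jmArray[0] = new int[1, 2] { { 3, 3 } };  // slice 0 lies at coordinate 3,3
jmArray[1] = new int[1, 2] { { 1, 2 } };  // slice 1 lies at coordinate 1,2
jmArray[2] = new int[1, 2] { { 3, 2 } };  // slice 2 lies at coordinate 3,2

int x = jmArray[2][0, 0];  // 3
int y = jmArray[2][0, 1];  // 2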
Instead of a complicated array, a simple class with meaningful names might work better:
class Slice
{
    public int X { get; set; }
    public int Y { get; set; }

    public Slice()
    {
    }

    public Slice(int x, int y)
    {
        X = x;
        Y = y;
    }
}

Slice[] slices = new Slice[9];
The index of the array will be the position of the slice.
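For example (illustrative values):

slices[2] = new Slice(3, 2);  // slice 2 lies at coordinate (3, 2)
Console.WriteLine(slices[2].X + "," + slices[2].Y);  // prints 3,2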
I found this algorithm here.
I have a problem: I can't seem to understand how to set up and pass my heuristic function.
static public Path<TNode> AStar<TNode>(TNode start, TNode destination,
    Func<TNode, TNode, double> distance,
    Func<TNode, double> estimate) where TNode : IHasNeighbours<TNode>
{
    var closed = new HashSet<TNode>();
    var queue = new PriorityQueue<double, Path<TNode>>();
    queue.Enqueue(0, new Path<TNode>(start));

    while (!queue.IsEmpty)
    {
        var path = queue.Dequeue();
        if (closed.Contains(path.LastStep))
            continue;
        if (path.LastStep.Equals(destination))
            return path;
        closed.Add(path.LastStep);

        foreach (TNode n in path.LastStep.Neighbours)
        {
            double d = distance(path.LastStep, n);
            var newPath = path.AddStep(n, d);
            queue.Enqueue(newPath.TotalCost + estimate(n), newPath);
        }
    }
    return null;
}
As you can see, it accepts two functions: a distance function and an estimate function.
Using the Manhattan distance heuristic, I need to take two parameters. Do I need to modify his source and change it to accept two parameters of type TNode so I can pass a Manhattan estimate to it? That would mean the fourth parameter looks like this:
Func<TNode, TNode, double> estimate) where TNode : IHasNeighbours<TNode>
and the estimate call changes to:
queue.Enqueue(newPath.TotalCost + estimate(n, path.LastStep), newPath);
My Manhattan function is:
private float manhattanHeuristic(Vector3 newNode, Vector3 end)
{
    return Math.Abs(newNode.X - end.X) + Math.Abs(newNode.Y - end.Y);
}
Good question. I agree that the article was confusing. I've updated it to address your question.
First, to answer the question you asked: should you modify the code given to take a different function? If you want, sure, but you certainly don't have to. My advice is to pass the function that the algorithm wants, because that's the function that it needs. Why pass information that the algorithm doesn't need?
How to do that?
The A* algorithm I give takes two functions.
The first function gives the exact distance between two given neighbouring nodes.
The second function gives the estimated distance between a given node and the destination node.
It is the second function that you don't have.
If you have a function that gives the estimated distance between two given nodes and you need a function that gives the estimated distance between a given node and the destination node then just build that function:
Func<Node, Node, double> estimatedDistanceBetweenTwoNodes = whatever;
Func<Node, double> estimatedDistanceToDestination = n => estimatedDistanceBetweenTwoNodes(n, destination);
And you're done. Now you have the function you need.
This technique of turning a two-parameter function into a one-parameter function by fixing one of the parameters to a certain value is called "partial function application" and it is extremely common in functional programming.
Is that all clear?
Now on to the second and much more serious problem. As I described in my articles, the correct operation of the algorithm is predicated on the estimation function being conservative. Can you guarantee that the Manhattan distance never overestimates? That seems unlikely. If there is a "diagonal" street anywhere in the grid then the Manhattan distance overestimates the optimal distance between two points, and the A* algorithm will not find it. Most people use the Euclidean distance (aka the L2 norm) for the A* algorithm because the shortest distance between two points is by definition not an overestimate. Why are you using the Manhattan distance? I am very confused as to why you think this is a good idea.
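For example, an admissible estimate built the same way (Node here is a hypothetical type exposing X and Y coordinates):

// Straight-line distance never overestimates, so it is a safe estimate for A*.
Func<Node, double> estimate = n =>
    Math.Sqrt(Math.Pow(n.X - destination.X, 2) + Math.Pow(n.Y - destination.Y, 2));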
Yes, you'd need to modify the code, as there is no possibility to fit an estimate method in there with two TNode parameters.
I'm working in an application (C#) that applies some readability formulas to a text, like Gunning-Fog, Precise SMOG, Flesh-Kincaid.
Now, I need to implement the Fry-based Grade formula in my program. I understand the formula's logic: pretty much, you take three 100-word samples, calculate the average sentences per 100 words and the average syllables per 100 words, and then you use a graph to plot the values.
Here is a more detailed explanation of how this formula works.
I already have the averages, but I have no idea how to tell my program to "go check the graph, plot the values, and give me a level." I don't have to show the graph to the user; I only have to show him the level.
I was thinking that maybe I can have all the values in memory, divided into levels, for example:
Level 1: values whose sentence average is between 10.0 and 25+, and whose syllable average is between 108 and 132.
Level 2: values whose sentence average is between 7.7 and 10.0, and... so on.
But the problem is that, so far, the only place I have found the values that define a level is the graph itself, and they aren't very precise. If I take the values from the graph as described above, my level estimates will be too imprecise, and the Fry-based Grade will not be accurate.
So, maybe one of you knows of a place where I can find exact values for the different levels of the Fry-based Grade, or can help me think of a way to work around this.
Thanks
Well, I'm not sure this is the most efficient solution, nor the best one, but at least it does the job.
I gave up on the idea of having a mathematical formula to get the levels; maybe there is such a formula, but I couldn't find it.
So I took the Fry graph with all the levels, painted each level a different color, and then loaded the image in my program using:
Bitmap image = new Bitmap(@"C:\FryGraph.png");
Color color = image.GetPixel(x, y);
As you can see, after loading the image I use the GetPixel method to get the color at the specified coordinates. I had to do some conversion to find the pixel that corresponds to a given value on the graph, since the graph's scale does not map one-to-one onto the image's pixels.
In the end, I compare the color returned by GetPixel to determine the Fry readability level of the text.
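In outline, the lookup was something like this sketch (the calibration constants are made up and depend entirely on the particular image; the graph's y-axis is also non-linear, so the real conversion is fiddlier):

static Color LookUpLevelColor(Bitmap graph, double avgSyllables, double avgSentences)
{
    // Hypothetical calibration: pixel origin of the graph and per-unit scales.
    const int originPxX = 40, originPxY = 460;
    const double pxPerSyllable = 5.0;   // x-axis starts at 108 syllables per 100 words
    const double pxPerSentence = 15.0;  // y-axis, treated as linear for illustration

    int px = originPxX + (int)((avgSyllables - 108) * pxPerSyllable);
    int py = originPxY - (int)((avgSentences - 2.0) * pxPerSentence);
    return graph.GetPixel(px, py);
}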
I hope this may be of help to someone who faces the same problem.
Cheers.
You simply need to determine the formula for the graph. That is, a formula that accepts the number of sentences and number of syllables, and returns the level.
If you can't find the formula, you can determine it yourself. Estimate the linear equation for each of the lines on the graph. Also estimate the 'out-of-bounds' areas in the 'long words' and 'long sentences' areas.
Now, for each point, just determine the region in which it resides: which lines it is above and which lines it is below. This is fairly simple algebra; unfortunately, this is the best link I can find to describe how to do that.
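For instance, the above/below test for a single line y = m*x + b is one comparison:

// A point (px, py) lies above the line y = m*x + b exactly when py > m*px + b.
static bool IsAbove(double m, double b, double px, double py)
{
    return py > m * px + b;
}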
I have made a first pass at solving this, which I thought I would share in case someone else is looking sometime in the future. I built on the answer above and created a generic list of linear equations that one can use to determine an approximate grade level. First, I had to correct the values to make them more linear. This does not take the invalid areas into account, but I may revisit that.
The equation class:
public class GradeLineEquation
{
    // Using the form y = m*x + b,
    // i.e. y = Slope * x + yIntercept.
    public int GradeLevel { get; set; }
    public float Slope { get; set; }
    public float yIntercept { get; set; }

    public float GetYGivenX(float x)
    {
        return (Slope * x) + yIntercept;
    }

    public GradeLineEquation(int gradelevel, float slope, float yintercept)
    {
        this.GradeLevel = gradelevel;
        this.Slope = slope;
        this.yIntercept = yintercept;
    }
}
Here is the FryCalculator:
public class FryCalculator
{
    // This class normalizes the plot on the Fry readability graph the same way a
    // person would: by picking relative points on each axis to plot against the
    // grade lines, even though the y-axis is non-linear and neither axis starts at 0.
    private List<GradeLineEquation> linedefs = new List<GradeLineEquation>();

    public FryCalculator()
    {
        LoadLevelEquations();
    }

    private void LoadLevelEquations()
    {
        // Load the estimated linear equation for each line:
        // grade level, slope, and y-intercept.
        linedefs.Add(new NLPTest.GradeLineEquation(1, (float)0.5, (float)22.5));
        linedefs.Add(new NLPTest.GradeLineEquation(2, (float)0.5, (float)20.5));
        linedefs.Add(new NLPTest.GradeLineEquation(3, (float)0.6, (float)17.4));
        linedefs.Add(new NLPTest.GradeLineEquation(4, (float)0.6, (float)15.4));
        linedefs.Add(new NLPTest.GradeLineEquation(5, (float)0.625, (float)13.125));
        linedefs.Add(new NLPTest.GradeLineEquation(6, (float)0.833, (float)7.333));
        linedefs.Add(new NLPTest.GradeLineEquation(7, (float)1.05, (float)-1.15));
        linedefs.Add(new NLPTest.GradeLineEquation(8, (float)1.25, (float)-8.75));
        linedefs.Add(new NLPTest.GradeLineEquation(9, (float)1.75, (float)-24.25));
        linedefs.Add(new NLPTest.GradeLineEquation(10, (float)2, (float)-35));
        linedefs.Add(new NLPTest.GradeLineEquation(11, (float)2, (float)-40));
        linedefs.Add(new NLPTest.GradeLineEquation(12, (float)2.5, (float)-58.5));
        linedefs.Add(new NLPTest.GradeLineEquation(13, (float)3.5, (float)-93));
        linedefs.Add(new NLPTest.GradeLineEquation(14, (float)5.5, (float)-163));
    }

    public int GetGradeLevel(float avgSylls, float avgSentences)
    {
        // First normalize the given values to Cartesian positions on the graph.
        float x = NormalizeX(avgSylls);
        float y = NormalizeY(avgSentences);
        // Given x, find the first grade-level equation that produces a lower y at that x.
        GradeLineEquation match = linedefs.Find(a => a.GetYGivenX(x) < y);
        // Guard against a point below every line (Find returns null there).
        return match != null ? match.GradeLevel : linedefs[linedefs.Count - 1].GradeLevel;
    }

    private float NormalizeY(float avgSentenceCount)
    {
        float result = 0;
        int lower = -1;
        int upper = -1;
        // The list of y-axis line intervals.
        List<double> intervals = new List<double> { 2.0, 2.5, 3.0, 3.3, 3.5, 3.6, 3.7, 3.8, 4.0, 4.2, 4.3, 4.5, 4.8, 5.0, 5.2, 5.6, 5.9, 6.3, 6.7, 7.1, 7.7, 8.3, 9.1, 10.0, 11.1, 12.5, 14.3, 16.7, 20.0, 25.0 };
        // Find the last line lower than or equal to the number we have.
        lower = intervals.FindLastIndex(a => ((double)avgSentenceCount) >= a);
        // If we are not over the top or exactly on a line, grab the next higher line value.
        if (lower > -1 && lower < intervals.Count - 1 && ((float)intervals[lower] != avgSentenceCount))
            upper = lower + 1;
        // Set the integer portion of the result.
        result = (float)lower;
        // If we have an upper limit, calculate the fraction above the lower line
        // (to two decimal places) and add it to the result.
        if (upper != -1)
            result += (float)Math.Round((avgSentenceCount - intervals[lower]) / (intervals[upper] - intervals[lower]), 2);
        return result;
    }

    private float NormalizeX(float avgSyllableCount)
    {
        // The x-axis is much simpler: subtract 108 and divide by 2
        // to get the x position relative to a 0 origin.
        return (avgSyllableCount - 108) / 2;
    }
}
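Hypothetical usage, with averages taken from three 100-word samples:

var calculator = new FryCalculator();
// e.g. an average of 141 syllables and 6.3 sentences per 100 words
int grade = calculator.GetGradeLevel(141.0f, 6.3f);
Console.WriteLine("Approximate Fry grade level: " + grade);  // 7 with the equations above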