I have a bipartite graph that's quite large (~200 vertices per part, usually with 20,000 or more edges in between), and I'm trying to find a Minimum Vertex Cover in it because I'm looking for an assignment between the vertices of the two parts.
According to Kőnig's theorem, there is such a cover whose size equals the cardinality of a Maximum Matching (https://en.wikipedia.org/wiki/K%C5%91nig%27s_theorem_(graph_theory)).
I have implemented the Hopcroft–Karp algorithm, which gives me a Maximum Matching. If needed, I can provide my implementation of that, but I doubt that's where my problem is.
What's the actual problem?
I suspect my implementation, taken from the constructive proof in the Wikipedia article above (https://en.wikipedia.org/wiki/K%C5%91nig%27s_theorem_(graph_theory)#Constructive_proof), has an error in it, but after several hours of checking it I am unable to find the cause of the bug: while Hopcroft–Karp finds a maximum matching with 192 edges, the Minimum Vertex Cover returns 200 vertices. As this is a bipartite graph, these numbers shouldn't differ (because of the theorem). Maybe you can help me out and tell me where my mistake is. Thanks in advance!
(Students and Projects are the two types of vertices in my bipartite graph.)
internal static List<Vertex> FindMinimumVertexCover(IReadOnlyList<Edge> matching, IReadOnlyList<Vertex> studentVertices, IReadOnlyList<Vertex> projectVertices)
{
    // Students left unmatched by the maximum matching
    var unmatchedStudentNodes = studentVertices.Except(matching.Select(e => e.GetStudentVertex())).ToList();
    var visitedVertices = new List<Vertex>();
    var edgeComparer = new EdgeComparer();

    // Collect every vertex reachable from an unmatched student via alternating paths
    foreach (var unmatchedStudentNode in unmatchedStudentNodes)
    {
        visitedVertices = visitedVertices.Union(FindAlternatingNodes(matching, unmatchedStudentNode, visitedVertices, edgeComparer)).ToList();
    }
    visitedVertices = unmatchedStudentNodes.Union(visitedVertices).ToList();

    // Cover = (students not visited) union (projects visited), per Kőnig's construction
    return studentVertices.Except(visitedVertices).Union(projectVertices.Intersect(visitedVertices)).ToList();
}
private static List<Vertex> FindAlternatingNodes(IReadOnlyList<Edge> matching, Vertex initialVertex, List<Vertex> visitedVertices, EdgeComparer edgeComparer)
{
    if (visitedVertices.Contains(initialVertex))
        return Enumerable.Empty<Vertex>().ToList();
    visitedVertices.Add(initialVertex);

    // Follow non-matching edges from the student side...
    List<Edge> unmatchedEdges = initialVertex.Edges.Except(matching, edgeComparer).ToList();
    foreach (Edge unmatchedEdge in unmatchedEdges)
    {
        Vertex visitedVertex = unmatchedEdge.GetProjectVertex();
        // ...and the matching edge back from the project side, if there is one
        Edge matchedEdge = matching.SingleOrDefault(e => e.GetProjectVertex().Equals(visitedVertex));
        if (matchedEdge != default(Edge))
        {
            visitedVertices.Add(visitedVertex);
            visitedVertex = matchedEdge.GetStudentVertex();
            visitedVertices = visitedVertices.Union(FindAlternatingNodes(matching, visitedVertex, visitedVertices, edgeComparer)).ToList();
        }
    }
    return visitedVertices;
}
class EdgeComparer : IEqualityComparer<Edge>
{
    public bool Equals(Edge x, Edge y)
    {
        if (Object.ReferenceEquals(x, y))
            return true;
        if (x is null || y is null)
            return false;
        return Object.ReferenceEquals(x.GetStudentVertex(), y.GetStudentVertex()) && Object.ReferenceEquals(x.GetProjectVertex(), y.GetProjectVertex());
    }

    public int GetHashCode(Edge edge)
    {
        return (Student: edge.GetStudentVertex(), Project: edge.GetProjectVertex()).GetHashCode();
    }
}
I now found the problem. I want to thank @David Eisenstat, who suggested repeatedly generating small random graphs.
The problem was something in my implementation of the Vertex class.
Every time I create an instance of the Edge class, I add that Edge to its two endpoint vertices as well, so I effectively hold three references to each edge. Calling the outer algorithm again (which calls the method above) recreated the edge list, but left the old references inside the vertices intact. Subsequent calls therefore didn't start fresh, and the Minimum Vertex Cover followed edges that no longer existed in the graph (namely in the List<Edge> unmatchedEdges = initialVertex.Edges.Except(matching, edgeComparer).ToList(); line).
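In case it helps anyone: the practical fix is to reset the adjacency lists before rebuilding the graph. Roughly (a sketch; Vertex.Edges is the property used in the code above, and I'm assuming the outer algorithm holds both vertex lists):

foreach (var vertex in studentVertices.Concat(projectVertices))
{
    // drop the Edge references left over from the previous run
    vertex.Edges.Clear();
}
// ...then recreate the Edge instances, which re-register themselves
// with their two endpoint vertices as before.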
I'm creating a kind of 2D map where each coordinate has a Tile class attached to it.
The Tile class stores its own coordinate plus some other values that are accessed later, and I'd like the map to have no size limitations. The problem is that I want to read the values inside some tile on that map. For example: I'm at coord (25,30) and I want to know a bool value inside the Tile class of each adjacent tile.
If I use an array, I'd probably have the fastest way to look up a tile: a 2D array whose two indexes are the x and y coords respectively, so I only index that coordinate when reading a tile's values. But then the map has a size limit.
If I use a List, the map has no size limit, but I can't look up a coordinate directly, so I'd need to go through each created tile with a foreach loop and check whether the coord stored inside that tile is the one I'm looking for.
My current solution is a second List holding only the coordinates, filled as I create each tile, so an index in the coordinate list matches the same index in the tile list. When I need a tile, I do CoordinateList.Contains(coordinate), and if that's true, I use the index of that coordinate as the index to look in the tile List.
I want a faster way to look up a tile, without a size limitation.
So far, with the tile List I got around 3,200 ms each time I checked the whole map (about 2,000 tiles in the list).
With the mapCoord List I got around 1,500 ms (around 2,000 tiles and coords).
And with the array I was getting a pretty fast response (never measured it, but well under 1 second for sure), since I never had to scan the whole array, only look at a certain index.
Examples for easier understanding of my problem:
note1: It does not fill the whole array.
note2: It wouldn't always be rectangular.
int size = 50;
Tile[,] mapArray = new Tile[size, size];
List<Tile> mapList = new List<Tile>();
List<Vector2Int> mapCoord = new List<Vector2Int>();

void CreateMap()
{
    for (int x = 0; x < size; x++)
    {
        for (int y = 0; y < size; y++)
        {
            if (x > 2 && y > 3) // deliberately leaves part of the map unfilled
            {
                mapArray[x, y] = new Tile(new Vector2Int(x, y), false, 32);
                mapList.Add(new Tile(new Vector2Int(x, y), false, 32));
                mapCoord.Add(new Vector2Int(x, y));
            }
        }
    }
}
So if I were to look up a tile in the array, I'd just index it by coord, since the tile coord is the same as the array index, but it has a size limit.
And if I were to look up the tile in the list, I'd need a foreach loop like this, which is pretty bad for performance:
Tile desiredTile = null;
foreach (Tile tile in mapList)
{
    if (tile.Coord == desiredCoord)
        desiredTile = tile;
}
And the best way so far is checking the mapCoord list like this:
if (mapCoord.Contains(desiredCoord))
{
    desiredTile = mapList[mapCoord.IndexOf(desiredCoord)];
}
Look up "sparse array" as a way to do this. One possible implementation is a Dictionary where the key is effectively a tuple of two ints (x and y). If the game starts with a standard boundary (say +/- 100 of the starting point) you could mix and match, a 200x200 array and the dictionary beyond that. You can also get creative by having multiple rectangular regions, each as an array.
If your total address space fits into a short integer (+/- 32k), then you could do something like this:
[StructLayout(LayoutKind.Explicit)]
struct IntXY
{
    [FieldOffset(0)] Int16 X;
    [FieldOffset(2)] Int16 Y;
    [FieldOffset(0)] UInt32 AsAnUnsignedInteger;

    public override int GetHashCode()
    {
        return AsAnUnsignedInteger.GetHashCode();
    }
}
and use that as the key in your Dictionary (using LayoutKind.Explicit makes this effectively the same as a C/C++ union: the X and Y shorts take up the same combined 32 bits as the unsigned int). It's probably cheaper than a Tuple<int, int> (though you'd probably want to test my guess).
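One caveat with that sketch: a struct used as a dictionary key should also override Equals (ideally via IEquatable<IntXY>), otherwise the default ValueType.Equals path gives back some of the savings. Something like this, added to the struct above:

public bool Equals(IntXY other)
{
    // two keys are equal iff their overlaid 32-bit representations match
    return AsAnUnsignedInteger == other.AsAnUnsignedInteger;
}

public override bool Equals(object obj)
{
    return obj is IntXY other && Equals(other);
}

(and declare the struct as struct IntXY : IEquatable<IntXY> so Dictionary<IntXY, Tile> picks up the generic overload without boxing).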
When calculating the Minkowski difference of two convex polygons, I can simply find the set of vertices and use a gift-wrapping algorithm to find the convex hull (see below).
However, for concave polygons the convex hull algorithm is not appropriate, as it will give me false positives on collisions. Is there a way I can adapt my code to easily find the correct expansion using the points already generated?
public List<Vector2D> MinkowskiDifference(Polygon other)
{
    List<Vector2D> vList = new List<Vector2D>();
    foreach (Vector2D vT in this.vertices)
    {
        foreach (Vector2D vO in other.vertices)
        {
            vList.Add((vT + this.position) - (vO + other.position));
        }
    }
    return vList;
}
public static Polygon ConvexHull(List<Vector2D> vList)
{
    List<Vector2D> S = vList;
    List<Vector2D> P = new List<Vector2D>();
    Vector2D endpoint;
    Vector2D pointOnHull = LeftMostPoint(S); // guaranteed to lie on the hull
    int i = 0;
    do
    {
        P.Add(pointOnHull);
        endpoint = S[0];
        for (int j = 1; j < S.Count; j++)
        {
            // pick the candidate furthest counter-clockwise of the current edge
            if (endpoint == pointOnHull || RelPosition(P[i], endpoint, S[j]) == -1)
            {
                endpoint = S[j];
            }
        }
        i++;
        pointOnHull = endpoint;
    } while (endpoint != P[0]);
    return new Polygon(P);
}
The usual method is to decompose the concave polygon into convex pieces, and iterate pairwise between the convex pieces of each polygon looking for a collision. If one of the polygons is too big to do this naively (made up of hundreds of convex pieces), you can add each piece to a broad phase.
Note that if you're doing something like GJK, you don't explicitly construct the Minkowski difference as a polygon. Rather, you walk around it implicitly by finding "support" vertices on it that are furthest along a given direction. Because the Minkowski difference is linearly separable, you can do this for each polygon in isolation then find their difference.
The idea can be a little hard to grok, but see e.g. http://www.dyn4j.org/2010/04/gjk-distance-closest-points/
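A minimal sketch of that support-function idea, reusing your Vector2D/Polygon types and assuming a Vector2D.Dot helper plus the usual operator overloads exist (and remember GJK itself assumes convexity, so this still applies per convex piece):

public Vector2D Support(Polygon other, Vector2D direction)
{
    // furthest vertex of this polygon along 'direction', in world space
    Vector2D a = this.vertices
        .Select(v => v + this.position)
        .OrderBy(v => Vector2D.Dot(v, direction))
        .Last();
    // furthest vertex of the other polygon along the opposite direction
    Vector2D b = other.vertices
        .Select(v => v + other.position)
        .OrderBy(v => Vector2D.Dot(v, direction))
        .First();
    // their difference is a support point of the Minkowski difference
    return a - b;
}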
I implemented the QuickHull code found on the following page:
http://www.ahristov.com/tutorial/geometry-games/convex-hull.html
The algorithm returns the correct points of the convex hull, but not in the correct trigonometric (counter-clockwise) order. Since the points are in no meaningful order, I cannot use them to draw the lines and thus the hull itself.
For example, when I run the algorithm with the following points
(2,5) (9,2) (1,8) (0,5) (3,3)
The correct order I want them returned in is:
(0,5) (1,8) (9,2) (3,3)
Instead the quick hull algorithm returns them like this:
(1,8) (0,5) (3,3) (9,2)
Can anyone please help me?
If it's not possible to modify the algorithm to return them in the correct order, you can compute the centroid of the returned points (add them all up and divide by the count; the centroid of a convex hull always lies inside the hull), then calculate the angle from the centroid to each point like this:
point.angle = atan2(point.y - centroid.y, point.x - centroid.x);
then sort the list of points based on the angles.
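In C#, against the same Point type used in the code below, a sketch of that sort (LINQ's Average and OrderBy do the work):

static List<Point> SortCounterClockwise(List<Point> hull)
{
    // centroid of the hull points; always inside a convex hull
    double cx = hull.Average(p => (double)p.X);
    double cy = hull.Average(p => (double)p.Y);
    // sort by angle around the centroid; Atan2 increases counter-clockwise
    return hull.OrderBy(p => Math.Atan2(p.Y - cy, p.X - cx)).ToList();
}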
Also, this part of your C# code doesn't match the Java:
// Recursively proceed with new sets
HullSplit(minPt, farthestPt, leftSetMinPt, ref hull);
HullSplit(maxPt, farthestPt, leftSetMaxPt, ref hull);
// should be:
// HullSplit(farthestPt, maxPt, leftSetMaxPt, ref hull);
Java is:
hullSet(A,P,leftSetAP,hull);
hullSet(P,B,leftSetPB,hull);
Also, you have effectively reversed the signs of your point-to-line tests compared to the Java:
public int pointLocation(Point A, Point B, Point P) {
    int cp1 = (B.x - A.x) * (P.y - A.y) - (B.y - A.y) * (P.x - A.x);
    return (cp1 > 0) ? 1 : -1;
}
if (pointLocation(A,B,p) == -1) // tests for negative
if (pointLocation(A,P,M)==1) { // tests for positive
if (pointLocation(P,B,M)==1) { // tests for positive
C#:
private static bool IsAboveLine(Point a, Point b, Point pt)
{
    var result = ((b.X - a.X) * (pt.Y - a.Y))
               - ((b.Y - a.Y) * (pt.X - a.X));
    return result > 0;
}
if (IsAboveLine(minPt, maxPt, pt)) // tests for positive
if (!IsAboveLine(minPt, farthestPt, set.ElementAt(i))) // tests for negative
if (!IsAboveLine(farthestPt, maxPt, set.ElementAt(i))) // tests for negative
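So a literal port would flip those three C# tests to mirror the Java (a sketch, keeping the names above):

if (!IsAboveLine(minPt, maxPt, pt))                     // Java: pointLocation(A,B,p) == -1
if (IsAboveLine(minPt, farthestPt, set.ElementAt(i)))   // Java: pointLocation(A,P,M) == 1
if (IsAboveLine(farthestPt, maxPt, set.ElementAt(i)))   // Java: pointLocation(P,B,M) == 1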
Hopefully a quick question. I have an IEnumerable of type Position where Position is defined as follows:
public class Position {
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}
What I need to do is quickly sort through the IEnumerable to find elements that fall within a certain distance of each other. The elements in the IEnumerable do not get populated in any specific order, but at any one time I need to be able to compute which members of the IEnumerable fall within x kilometers of each other.
Now, I already have a Haversine implementation and for argument's sake, we can say it's called GetDistance and has the following signature:
double GetDistance(Position one, Position two);
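(For reference, a standard Haversine implementation with that signature might look like the following sketch, assuming kilometers and a mean Earth radius of 6371 km:)

static double GetDistance(Position one, Position two)
{
    const double earthRadiusKm = 6371.0;
    double dLat = ToRadians(two.Latitude - one.Latitude);
    double dLon = ToRadians(two.Longitude - one.Longitude);
    double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
             + Math.Cos(ToRadians(one.Latitude)) * Math.Cos(ToRadians(two.Latitude))
             * Math.Sin(dLon / 2) * Math.Sin(dLon / 2);
    return earthRadiusKm * 2 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1 - a));
}

static double ToRadians(double degrees) => degrees * Math.PI / 180.0;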
I've got a few ideas, but none of them feel particularly efficient to my mind. To give a little more info, it's unlikely the IEnumerable will ever hold more than 10,000 items at any one time.
What I'd like to arrive at is something, perhaps an extension method, that returns an IEnumerable containing just the subset of the original collection that meets the criteria, for example:
OriginalEnumerable.GetMembersCloserThan(kilometers: 100);
Any help, much appreciated.
EDIT: For clarity, consider that the IEnumerable I want to search describes a circle with radius r; its members are coordinates within that circle. I'm trying to determine which members (points) are within a given proximity of each other.
Something like this? Assuming GetDistance is available.
public static IEnumerable<Position> GetMembersCloserThan(this IEnumerable<Position> positions, double maxDistance)
{
    return positions.Where(a => positions.Any(b => a != b && GetDistance(a, b) < maxDistance));
}
Edit: I see now you are also interested in performance. The above is not particularly fast, though it is not horribly slow either, since the math for comparing distances is pretty simple. Let me know if it satisfies your requirements.
Edit 2: This one is much faster. It won't re-test positions that already failed, and a successful match adds both positions to the result at once:
public static IEnumerable<Position> GetMembersCloserThan(this IEnumerable<Position> positions, double maxDistance)
{
    List<Position> closePositions = new List<Position>();
    List<Position> testablePositions = positions.ToList();
    foreach (Position position in positions)
    {
        // Skip this one, it has already been matched.
        if (closePositions.Contains(position))
            continue;
        bool isClose = false;
        foreach (Position testAgainstPosition in testablePositions)
        {
            if (position == testAgainstPosition)
                continue;
            if (GetDistance(position, testAgainstPosition) < maxDistance)
            {
                // Both the position and the tested position pass.
                closePositions.Add(position);
                if (!closePositions.Contains(testAgainstPosition))
                    closePositions.Add(testAgainstPosition);
                isClose = true;
                break;
            }
        }
        // Don't test against this position in the future, it was already checked.
        if (!isClose)
        {
            testablePositions.Remove(position);
        }
    }
    return closePositions;
}
If you need more performance: put the items in a list sorted by latitude.
To calculate your desired set of locations, iterate over that list. For the distance calculation, you then only need to consider items that differ by at most 100 km in latitude, which means you can walk back item by item until the latitude difference exceeds 100 km. (You may need to wrap around the end of the list.)
Mark all items (or yield return them) that are closer than 100 km and move on. A sketch follows below.
Although I cannot quantify the expense, the sorting should amortize for large sets. It may also perform badly if most points lie at a similar latitude; if that is an issue, use a 2D dictionary with rounded coordinates as keys.
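Here is a sketch of that idea, reusing the Position type and GetDistance from the question (the 111 km-per-degree figure is the usual approximation for latitude; wrap-around handling is omitted):

public static IEnumerable<Position> GetMembersCloserThan(this IEnumerable<Position> positions, double maxDistanceKm)
{
    List<Position> sorted = positions.OrderBy(p => p.Latitude).ToList();
    double latWindow = maxDistanceKm / 111.0; // degrees of latitude spanned by maxDistanceKm
    var result = new HashSet<Position>();
    for (int i = 0; i < sorted.Count; i++)
    {
        for (int j = i + 1; j < sorted.Count; j++)
        {
            // once the latitude gap alone exceeds the window, no later item can match
            if (sorted[j].Latitude - sorted[i].Latitude > latWindow)
                break;
            if (GetDistance(sorted[i], sorted[j]) < maxDistanceKm)
            {
                result.Add(sorted[i]);
                result.Add(sorted[j]);
            }
        }
    }
    return result;
}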
I'm working on an application (C#) that applies some readability formulas to a text, like Gunning-Fog, Precise SMOG, and Flesch-Kincaid.
Now I need to implement the Fry-based Grade formula in my program. I understand the formula's logic: you take three 100-word samples, calculate the average number of sentences per 100 words and of syllables per 100 words, and then use a graph to plot the values.
Here is a more detailed explanation of how this formula works.
I already have the averages, but I have no idea how to tell my program to "go check the graph, plot the values, and give me a level." I don't have to show the graph to the user, I only have to show him the level.
I was thinking that maybe I can have all the values in memory, divided into levels, for example:
Level 1: values whose sentence average is between 10.0 and 25+, and whose syllable average is between 108 and 132.
Level 2: values whose sentence average is between 7.7 and 10.0, and... so on.
But the problem is that so far, the only place I have found the values that define a level is the graph itself, and reading them off the graph isn't very accurate. So if I take the approach above with values read from the graph, my level estimations will be too imprecise, and the Fry-based Grade will not be accurate.
So maybe one of you knows of a place where I can find exact values for the different levels of the Fry-based Grade, or can help me think of a way to work around this.
Thanks
Well, I'm not sure this is the most efficient solution, nor the best one, but at least it does the job.
I gave up on the idea of having a mathematical formula to get the levels; maybe there is such a formula, but I couldn't find it.
So I took Fry's graph with all the levels, painted each level a different color, and then loaded the image in my program using:
Bitmap image = new Bitmap(@"C:\FryGraph.png");
Color levelColor = image.GetPixel(x, y);
As you can see, after loading the image I use the GetPixel method to get the color at the specified coordinates. I had to do some conversion to get the equivalent pixels for a given value on the graph, since the scale of the graph doesn't map one-to-one onto the pixels of the image.
In the end, I compare the color returned by GetPixel to see which Fry readability level the text falls in.
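The coordinate conversion can be a simple linear map once you know where the plot area sits in the image. A sketch with made-up calibration values (you'd measure the real offsets and scales from your own FryGraph.png):

// hypothetical calibration: pixel position of the graph origin and
// pixels per graph unit along each axis, measured from the image
const int originXpx = 40, originYpx = 460;
const double pxPerSyllable = 5.0, pxPerSentence = 18.0;

int px = originXpx + (int)((avgSyllables - 108) * pxPerSyllable);  // x-axis starts at 108 syllables
int py = originYpx - (int)((avgSentences - 2.0) * pxPerSentence);  // y-axis starts at 2.0 sentences
Color levelColor = image.GetPixel(px, py);

Note the y-axis on the real Fry graph is not linear, so in practice you may need a lookup table of tick positions rather than a single scale factor.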
I hope this may be of help to anyone who faces the same problem.
Cheers.
You simply need to determine the formula for the graph; that is, a formula that accepts the number of sentences and the number of syllables, and returns the level.
If you can't find the formula, you can determine it yourself: estimate the linear equation for each of the lines on the graph, and also estimate the 'out-of-bounds' areas in the 'long words' and 'long sentences' regions.
Now for each point, just determine the region in which it resides: which lines it is above and which lines it is below. This is fairly simple algebra; unfortunately this is the best link I can find to describe how to do that.
I have made a first pass at solving this, which I thought I would share in case someone else is looking sometime in the future. I built on the answer above and created a generic list of linear equations that can be used to determine an approximate grade level. First I had to correct the values to make them more linear. This does not take the invalid areas into account, but I may revisit that.
The equation class:
public class GradeLineEquation
{
    // using the form y = mx + b,
    // i.e. y = Slope * x + yIntercept
    public int GradeLevel { get; set; }
    public float Slope { get; set; }
    public float yIntercept { get; set; }

    public float GetYGivenX(float x)
    {
        return (Slope * x) + yIntercept;
    }

    public GradeLineEquation(int gradelevel, float slope, float yintercept)
    {
        this.GradeLevel = gradelevel;
        this.Slope = slope;
        this.yIntercept = yintercept;
    }
}
Here is the FryCalculator:
public class FryCalculator
{
    // This class normalizes the plot on the Fry readability graph the same way a person would:
    // by choosing points on the graph based on values, even though the y-axis is non-linear and
    // neither axis starts at 0. It just picks a relative point on each axis to plot against the
    // intercepts of the zero- and infinite-slope lines.
    private List<GradeLineEquation> linedefs = new List<GradeLineEquation>();

    public FryCalculator()
    {
        LoadLevelEquations();
    }

    private void LoadLevelEquations()
    {
        // load the estimated linear equations for each line with the
        // grade level, slope, and y-intercept
        linedefs.Add(new NLPTest.GradeLineEquation(1, (float)0.5, (float)22.5));
        linedefs.Add(new NLPTest.GradeLineEquation(2, (float)0.5, (float)20.5));
        linedefs.Add(new NLPTest.GradeLineEquation(3, (float)0.6, (float)17.4));
        linedefs.Add(new NLPTest.GradeLineEquation(4, (float)0.6, (float)15.4));
        linedefs.Add(new NLPTest.GradeLineEquation(5, (float)0.625, (float)13.125));
        linedefs.Add(new NLPTest.GradeLineEquation(6, (float)0.833, (float)7.333));
        linedefs.Add(new NLPTest.GradeLineEquation(7, (float)1.05, (float)-1.15));
        linedefs.Add(new NLPTest.GradeLineEquation(8, (float)1.25, (float)-8.75));
        linedefs.Add(new NLPTest.GradeLineEquation(9, (float)1.75, (float)-24.25));
        linedefs.Add(new NLPTest.GradeLineEquation(10, (float)2, (float)-35));
        linedefs.Add(new NLPTest.GradeLineEquation(11, (float)2, (float)-40));
        linedefs.Add(new NLPTest.GradeLineEquation(12, (float)2.5, (float)-58.5));
        linedefs.Add(new NLPTest.GradeLineEquation(13, (float)3.5, (float)-93));
        linedefs.Add(new NLPTest.GradeLineEquation(14, (float)5.5, (float)-163));
    }
    public int GetGradeLevel(float avgSylls, float avgSentences)
    {
        // first normalize the given values to Cartesian positions on the graph
        float x = NormalizeX(avgSylls);
        float y = NormalizeY(avgSentences);
        // given x, find the first grade-level equation that produces a lower y at that x
        // (Find returns null when no line qualifies, so extreme inputs may need a guard)
        return linedefs.Find(a => a.GetYGivenX(x) < y).GradeLevel;
    }
    private float NormalizeY(float avgSentenceCount)
    {
        float result = 0;
        int lower = -1;
        int upper = -1;
        // the y-axis line intervals as printed on the graph
        List<double> intervals = new List<double> { 2.0, 2.5, 3.0, 3.3, 3.5, 3.6, 3.7, 3.8, 4.0, 4.2, 4.3, 4.5, 4.8, 5.0, 5.2, 5.6, 5.9, 6.3, 6.7, 7.1, 7.7, 8.3, 9.1, 10.0, 11.1, 12.5, 14.3, 16.7, 20.0, 25.0 };
        // find the last line lower than or equal to the number we have
        lower = intervals.FindLastIndex(a => ((double)avgSentenceCount) >= a);
        // if we are not over the top or exactly on a line, grab the next higher line value
        if (lower > -1 && lower < intervals.Count - 1 && ((float)intervals[lower] != avgSentenceCount))
            upper = lower + 1;
        // set the integer portion of the response
        result = (float)lower;
        // if we have an upper limit, add the fraction of the way from the lower to the upper line (to two decimal places)
        if (upper != -1)
            result += (float)Math.Round((avgSentenceCount - intervals[lower]) / (intervals[upper] - intervals[lower]), 2);
        return result;
    }
    private float NormalizeX(float avgSyllableCount)
    {
        // the x-axis is much simpler: subtract 108 and divide by 2 to get the x position relative to a 0 origin
        return (avgSyllableCount - 108) / 2;
    }
}
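For completeness, usage is then just (a sketch; the averages come from your three 100-word samples):

FryCalculator calc = new FryCalculator();
// e.g. 141 syllables and 6.3 sentences per 100 words
int grade = calc.GetGradeLevel(141f, 6.3f);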