I'm trying to write an algorithm that takes a list of points visited along an edge, plus a list of unvisited edges (pairs of points) that make up the rest of the object, and searches the unvisited edges for a path that completes the edge (that is, connects the start to the end). I currently have:
public static int PolygonSearch(Point start, Point end, List<Point> visitedPoints, List<Point[]> unvisitedEdges)
{
int count = 0;
for (int i = unvisitedEdges.Count - 1; i > -1; i--)
{
Point[] line = unvisitedEdges[i];
if (((Equal(line[0], start) && Equal(line[1], end))
|| (Equal(line[1], start) && Equal(line[0], end)))
&& visitedPoints.Count > 2)
{
return count + 1;
}
else if (Equal(start, line[0]))
{
unvisitedEdges.RemoveAt(i);
count += PolygonSearch(line[1], end, visitedPoints, unvisitedEdges);
}
else if (Equal(start, line[1]))
{
unvisitedEdges.RemoveAt(i);
count += PolygonSearch(line[0], end, visitedPoints, unvisitedEdges);
}
}
return count;
}
(start and end are the current start and end points of the line.)
The obvious problem here is the removal, which messes up the outer loops, but I'm not sure how to correct for it. I tried creating a new list each time, but that didn't work. (I haven't implemented a way to return the path yet, only to count the valid ones.)
Any help fixing this would be greatly appreciated.
To avoid removing an object, you can mark it as 'removed' and then ignore it whenever it is marked.
The following uses a flag called Visited; when an edge is 'removed', Visited is set to true.
I haven't tested this, but it should give you a general idea of what to do:
public class Edge // a class rather than a struct, so unvisitedEdges[i].Visited = true updates the stored edge
{
public Point[] Points;
public bool Visited; // defaults to false
}
public static int PolygonSearch(Point start, Point end, List<Point> visitedPoints, List<Edge> unvisitedEdges)
{
int count = 0;
for (int i = unvisitedEdges.Count - 1; i > -1; i--)
{
Edge line = unvisitedEdges[i];
if (((Equal(line.Points[0], start) && Equal(line.Points[1], end))
|| (Equal(line.Points[1], start) && Equal(line.Points[0], end)))
&& visitedPoints.Count > 2
&& line.Visited == false)
{
return count + 1;
}
else if (!line.Visited && Equal(start, line.Points[0]))
{
unvisitedEdges[i].Visited = true;
count += PolygonSearch(line.Points[1], end, visitedPoints, unvisitedEdges);
}
else if (!line.Visited && Equal(start, line.Points[1]))
{
unvisitedEdges[i].Visited = true;
count += PolygonSearch(line.Points[0], end, visitedPoints, unvisitedEdges);
}
}
return count;
}
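A minimal self-contained sketch of the same flag idea, under my own assumptions: points are plain (int X, int Y) tuples, edges live in an array, and the names `PathCounter`, `CountPaths` and `used` are illustrative rather than taken from the code above. Unlike the snippet above, it clears the flag on the way back out of the recursion, so an edge skipped by one path is still available to the next.

```csharp
using System;

public static class PathCounter
{
    // Counts the paths from start to end that use each edge at most once.
    // used[i] marks edge i as taken by the path currently being explored.
    public static int CountPaths((int X, int Y) start, (int X, int Y) end,
                                 ((int X, int Y) A, (int X, int Y) B)[] edges, bool[] used)
    {
        if (start == end)
            return 1;                       // reached the target: one complete path

        int count = 0;
        for (int i = 0; i < edges.Length; i++)
        {
            if (used[i]) continue;          // "removed" edges are skipped, not deleted

            (int X, int Y) next;
            if (edges[i].A == start) next = edges[i].B;
            else if (edges[i].B == start) next = edges[i].A;
            else continue;                  // edge does not touch the current point

            used[i] = true;                 // take the edge
            count += CountPaths(next, end, edges, used);
            used[i] = false;                // backtrack: hand it back to other paths
        }
        return count;
    }
}
```

Because nothing is ever removed from the collection, the loop index stays valid no matter how deep the recursion goes.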
Related
Today I decided to learn how the Quicksort algorithm works. I studied some examples and watched a few YouTube videos on the subject.
After that I decided to try writing it myself, using some pointers from a YouTube channel that I find extremely helpful.
When I run the program, however, I get "Process is terminated due to StackOverflowException" and I can't figure out why.
I would appreciate it if anyone could help me.
private static void QuickSort(int[] numbers, int beginning, int end)
{
int pivotLoc = 0;
if(beginning < end)
{
Partition(numbers, beginning, end, pivotLoc);
QuickSort(numbers, beginning, pivotLoc - 1);
QuickSort(numbers, pivotLoc + 1, end);
}
}
private static void Partition(int[] numbers, int beginning, int end,int pivotLoc)
{
int left = beginning;
int right = end;
int temp;
pivotLoc = left;
while (true)
{
while(numbers[pivotLoc] <= numbers[right] && pivotLoc != right)
{
right--;
}
if(pivotLoc == right)
{
break;
}
else if(numbers[pivotLoc] > numbers[right])
{
temp = numbers[right];
numbers[right] = numbers[pivotLoc];
numbers[pivotLoc] = temp;
pivotLoc = right;
}
while (numbers[pivotLoc] >= numbers[left] && pivotLoc != left)
{
left++;
}
if(pivotLoc == left)
{
break;
}
else if(numbers[pivotLoc] < numbers[left])
{
temp = numbers[left];
numbers[left] = numbers[pivotLoc];
numbers[pivotLoc] = temp;
pivotLoc = left;
}
}
}
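As far as I can tell, one cause is visible in the signatures: `int` arguments are passed by value in C#, so the `pivotLoc` assigned inside `Partition` never reaches `QuickSort`, whose own copy stays 0. The second recursive call is then always `QuickSort(numbers, 1, end)`, which never shrinks. The sketch below keeps the same two-sided scan but returns the pivot position instead (`ref` or `out` would work too). The class name `QuickSortSketch` and the `Swap` helper are my own additions.

```csharp
using System;

public static class QuickSortSketch
{
    public static void QuickSort(int[] numbers, int beginning, int end)
    {
        if (beginning < end)
        {
            // Partition reports where the pivot ended up via its return value.
            int pivotLoc = Partition(numbers, beginning, end);
            QuickSort(numbers, beginning, pivotLoc - 1);
            QuickSort(numbers, pivotLoc + 1, end);
        }
    }

    // Same scan as the question's Partition; only the signature changed.
    private static int Partition(int[] numbers, int beginning, int end)
    {
        int left = beginning, right = end, pivotLoc = beginning;
        while (true)
        {
            // Walk in from the right until something smaller than the pivot is found.
            while (numbers[pivotLoc] <= numbers[right] && pivotLoc != right) right--;
            if (pivotLoc == right) break;
            Swap(numbers, pivotLoc, right);
            pivotLoc = right;

            // Walk in from the left until something larger than the pivot is found.
            while (numbers[pivotLoc] >= numbers[left] && pivotLoc != left) left++;
            if (pivotLoc == left) break;
            Swap(numbers, pivotLoc, left);
            pivotLoc = left;
        }
        return pivotLoc;
    }

    private static void Swap(int[] numbers, int i, int j)
    {
        int temp = numbers[i];
        numbers[i] = numbers[j];
        numbers[j] = temp;
    }
}
```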
I was studying search algorithms and wanted to solve the missionaries and cannibals problem in order to practice. However, my code never provides a solution. At first, I thought this was because I had recurring states, causing an infinite loop, so I added a state history to make sure states weren't being repeated. However, this still has not worked.
Below is the code I have written. I am using vectors to represent the states of the missionaries, cannibals and the boat and the children of the nodes get added if they pass a check that checks if the move is within the range (0,0,0) and (3,3,1).
I have tried stepping through the code but since the tree is fairly large I can only keep track of so many things, so I have a hard time seeing the fault in my code.
This was written in Visual Studio as a console program.
Vector3 class
public class Vector3
{
public int m;
public int c;
public int b;
public Vector3(int M, int C, int B)
{
m = M;
c = C;
b = B;
}
public override bool Equals(System.Object obj)
{
if (obj == null)
return false;
Vector3 p = obj as Vector3;
if ((System.Object)p == null)
return false;
return (m == p.m) && (c == p.c) && (b == p.b);
}
}
Node class
public class Node
{
public Vector3 State;
public Node(Vector3 st)
{
State = st;
}
}
My Program.cs
class Program
{
static void Main(string[] args)
{
Program p = new Program();
p.DFS(new Node(new Vector3(3, 3, 1)));
Console.ReadKey();
}
List<Vector3> History = new List<Vector3>();
Vector3[] Operators = new Vector3[]
{
new Vector3(1,0,1),
new Vector3(2,0,1),
new Vector3(0,1,1),
new Vector3(0,2,1),
new Vector3(1,1,1),
};
public bool TryMove(Vector3 current, Vector3 toApply, bool substract)
{
if (substract)
{
if (current.c - toApply.c < 0 || current.m - toApply.m < 0 || current.b - toApply.b < 0 || (current.c - toApply.c) > (current.m - toApply.m))
{
return false;
}
else return true;
}
else
{
if (current.c + toApply.c > 3 || current.m + toApply.m > 3 || current.b + toApply.b > 1 || (current.c + toApply.c) > (current.m + toApply.m))
{
return false;
}
else return true;
}
}
public void DFS(Node n)
{
Stack<Node> stack = new Stack<Node>();
stack.Push(n);
while (stack.Count > 0)
{
Node curNode = stack.Pop();
if (History.Contains(curNode.State))
{
}
else
{
History.Add(curNode.State);
if (curNode.State == new Vector3(0, 0, 0))
{
Console.WriteLine("Solution found.");
return;
}
else
{
if (curNode.State.b == 0) //Boat is across the river
{
for (int x = 0; x < 5; x++)
{
if (TryMove(curNode.State, Operators[x], false))
{
stack.Push(new Node(new Vector3(curNode.State.m + Operators[x].m, curNode.State.c + Operators[x].c, curNode.State.b + Operators[x].b)));
}
}
}
else //Boat == 1
{
for (int x = 0; x < 5; x++)
{
if (TryMove(curNode.State, Operators[x], true))
{
stack.Push(new Node(new Vector3(curNode.State.m - Operators[x].m, curNode.State.c - Operators[x].c, curNode.State.b - Operators[x].b)));
}
}
}
}
}
}
Console.WriteLine("No solution found.");
return;
}
}
My code keeps hitting the 'No solution found' block. When I remove the history I keep infinite looping between states (3,3,1) and (2,2,1) and get an OutOfMemoryException at the 2 gigabyte mark, so I'm not even sure about keeping track of history anymore.
What steps should I take in order to implement the DFS in the context of the problem correctly, given the code I provided above?
The search loop itself is fine. The problem is that you used the == operator in the line if (curNode.State == new Vector3(0, 0, 0)). In C#, == on a class compares references by default, so comparing against a freshly created object is always false. Either use curNode.State.Equals(new Vector3(0, 0, 0)) or overload the == operator to call your Equals method.
See MSDN Guidelines on custom comparison in C#.
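A sketch of the second option, assuming the same three-field `Vector3` as in the question. The `GetHashCode` override and the particular hash formula are my additions: the compiler warns when `Equals` or `==` is overridden without it, and hash-based collections depend on it.

```csharp
using System;

public class Vector3
{
    public int m, c, b;

    public Vector3(int M, int C, int B) { m = M; c = C; b = B; }

    public override bool Equals(object obj)
    {
        Vector3 p = obj as Vector3;
        return !ReferenceEquals(p, null) && m == p.m && c == p.c && b == p.b;
    }

    // Equal objects must produce equal hash codes (needed by HashSet and Dictionary).
    public override int GetHashCode() { return (m * 31 + c) * 31 + b; }

    public static bool operator ==(Vector3 left, Vector3 right)
    {
        if (ReferenceEquals(left, right)) return true;    // same object, or both null
        if (ReferenceEquals(left, null)) return false;    // only the left side is null
        return left.Equals(right);                        // value comparison
    }

    public static bool operator !=(Vector3 left, Vector3 right) { return !(left == right); }
}
```

With these overloads in place, `curNode.State == new Vector3(0, 0, 0)` compares the three fields rather than the references.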
I am trying to create a 2D cave generation system. When I run the program I get a "System.StackOverflowException" after an object tries to create a new object of its own class.
My cave generator works like this:
I create a map that contains the IDs (integers) of the different cell types (such as wall, water or empty space).
First of all, my "Map" class creates a map filled with walls, and then creates a "Miner" object in the centre of the map. The Miner digs through the map and makes caves. The problem is that I want to create more miners, so the Miner that is digging the map creates another Miner. However, when I do this, I get a "System.StackOverflowException".
How do I go about tracking down the cause of the stack overflow in my program?
Here is my miner code:
Miner.cs
public class Miner
{
Random rand = new Random();
public string state { get; set; }
public int x { get; set; }
public int y { get; set; }
public Map map { get; set; }
public int minersCount;
public Miner(Map map, string state, int x, int y)
{
this.map = map;
this.state = state;
this.x = x;
this.y = y;
minersCount++;
if (state == "Active")
{
StartDigging();
}
}
bool IsOutOfBounds(int x, int y)
{
if (x == 0 || y == 0)
{
return true;
}
else if (x > map.mapWidth - 2 || y > map.mapHeight - 2)
{
return true;
}
return false;
}
bool IsLastMiner()
{
if (minersCount == 1)
{
return true;
}
else
{
return false;
}
}
public void StartDigging()
{
if (state == "Active")
{
int dir = 0;
bool needStop = false;
int ID = -1;
while (!needStop && !IsOutOfBounds(x, y))
{
while (dir == 0)
{
dir = ChooseDirection();
}
if (!AroundIsNothing())
{
while (ID == -1)
{
ID = GetIDFromDirection(dir);
}
}
else
{
if (!IsLastMiner())
{
needStop = true;
}
}
if (ID == 1)
{
DigToDirection(dir);
dir = 0;
}
if (ID == 0 && IsLastMiner())
{
MoveToDirection(dir);
dir = 0;
}
TryToCreateNewMiner();
}
if (needStop)
{
state = "Deactive";
}
}
}
public void TryToCreateNewMiner()
{
if (RandomPercent(8))
{
Miner newMiner = new Miner(map, "Active", x, y);
}
else
{
return;
}
}
bool AroundIsNothing()
{
if (map.map[x + 1, y] == 0 && map.map[x, y + 1] == 0 &&
map.map[x - 1, y] == 0 && map.map[x, y - 1] == 0)
{
return true;
}
else
{
return false;
}
}
void MoveToDirection(int dir)
{
if (dir == 1)
{
x = x + 1;
}
else if (dir == 2)
{
y = y + 1;
}
else if (dir == 3)
{
x = x - 1;
}
else if (dir == 4)
{
y = y - 1;
}
}
void DigToDirection(int dir)
{
if (dir == 1)
{
map.map[x + 1, y] = 0;
x = x + 1;
}
else if (dir == 2)
{
map.map[x, y + 1] = 0;
y = y + 1;
}
else if (dir == 3)
{
map.map[x - 1, y] = 0;
x = x - 1;
}
else if (dir == 4)
{
map.map[x, y - 1] = 0;
y = y - 1;
}
}
int GetIDFromDirection(int dir)
{
if (dir == 1)
{
return map.map[x + 1, y];
}
else if (dir == 2)
{
return map.map[x, y + 1];
}
else if (dir == 3)
{
return map.map[x - 1, y];
}
else if (dir == 4)
{
return map.map[x, y - 1];
}
else
{
return -1;
}
}
int ChooseDirection()
{
return rand.Next(1, 5);
}
bool RandomPercent(int percent)
{
if (percent >= rand.Next(1, 101))
{
return true;
}
return false;
}
}
Whilst you can get StackOverflowExceptions by creating too many really large objects on the stack, it usually happens because your code has got into a state where it is calling the same chain of functions over and over again. So, to track down the cause in your code, the best starting point is to determine where your code calls itself.
Your Miner class consists of several functions, most of which are trivial: they don't call anything else in the class. Whilst these functions may contribute to the state that triggers the problem, they aren't part of the non-terminating call loop:
bool IsOutOfBounds(int x, int y)
bool IsLastMiner()
bool AroundIsNothing()
void MoveToDirection(int dir)
void DigToDirection(int dir)
int GetIDFromDirection(int dir)
int ChooseDirection()
bool RandomPercent(int percent)
This leaves your remaining three functions:
public Miner(Map map, string state, int x, int y) // Called by TryToCreateNewMiner
public void StartDigging() // Called by constructor
// Contains main digging loop
public void TryToCreateNewMiner() // Called by StartDigging
These three functions form a calling loop, so if the branching logic in the functions is incorrect it could cause a non-terminating loop and hence a stack overflow.
So, looking at the branching logic in each of these functions:
Miner
The constructor has only one branch, based on whether the state is "Active". It is always active, since that's how the object is always created, so the constructor will always call StartDigging. This suggests the state isn't being handled correctly, although it's possible that you're going to use it for something else in the future...
As an aside, it's generally considered bad practice to do a lot of processing in an object's constructor beyond what is required to create the object. All of your processing happens in the constructor, which feels wrong.
TryToCreateNewMiner
This has one branch: 8% of the time it creates a new miner by calling the constructor. So over any 10 calls to TryToCreateNewMiner there is a better-than-even chance (1 - 0.92^10, about 57%) that at least one succeeds. The new miner starts in the same position as the parent object (x and y aren't changed).
StartDigging
There's a fair bit of branching in this method. The part we are mainly interested in is the set of conditions around the call to TryToCreateNewMiner. Let's look at the branches:
if(state=="Active")
This is currently a redundant check (it's always active).
while (!needStop && !IsOutOfBounds(x, y)) {
The first part of this termination clause is never triggered. needStop is only ever set to true if(!IsLastMiner). Since minersCount is always 1, it's always the last miner, so needStop is never triggered. The way you are using minersCount suggests that you think it is shared between instances of Miner, which it isn't. If that is your intention you may want to read up on static variables.
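If a shared count is what you're after, a static field is one way to get it. A stripped-down sketch: this `Miner` has none of the digging logic, and `Retire` is a hypothetical method I've added so the count can go back down when a miner stops.

```csharp
using System;

public class Miner
{
    // static: a single counter shared by every Miner instance,
    // instead of one separate counter per object.
    public static int MinersCount;

    public Miner()
    {
        MinersCount++;          // every construction bumps the shared total
    }

    // Hypothetical: call this when a miner is deactivated.
    public void Retire()
    {
        MinersCount--;
    }

    public static bool IsLastMiner()
    {
        return MinersCount == 1;
    }
}
```

With an instance field, each new object starts its own counter at 0 and increments it to 1, which is why the original `IsLastMiner` always returns true.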
The second part of the termination clause is the only way out of the loop and that is triggered if either x or y reaches the edge of the map.
while(dir==0)
This loop never needs more than one pass: ChooseDirection returns rand.Next(1, 5), which is always a number from 1 to 4, so dir is non-zero after the first call and a plain if would do.
if(!AroundIsNothing())
This checks whether the positions that the Miner can move into are all set to 0. If they are not, GetIDFromDirection is called. This is key. If the Miner is currently surrounded by 0, ID will not be set; it will keep its previous value. For a Miner that has just been created, that value is -1 (we know this can happen, because every Miner is created at the location of the Miner that creates it).
The last two checks, if(ID==1) and if(ID==0 && IsLastMiner()), guard the code that moves the Miner (by calling either dig or move). So, if ID is neither 0 nor 1 at this point, the Miner will not move. This is a problem because it happens immediately before the call to TryToCreateNewMiner: if the program ever gets into this situation, it is stuck in a loop where the Miner isn't moving and keeps trying to create new Miners in the same position. 8% of the time this succeeds, creating a new Miner in the same position, which performs the same checks and gets into the same loop, again not moving and trying to create a new Miner. So it goes until the stack runs out of space and the program crashes.
You need to take a look at your termination clauses and the way you're handling ID. You probably don't want the Miner to just stop doing anything if it gets completely surrounded by 0.
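One way to take the recursion out of the picture entirely is to keep the active miners in a queue and advance them one step at a time from a single loop. This is only a sketch under my own assumptions, not a drop-in replacement: `CaveSketch`, `Dig`, the `maxSteps` budget and the simplified rule "dig one random step, retire at the border" are all hypothetical, and the 8% spawn chance is the only number taken from the question.

```csharp
using System;
using System.Collections.Generic;

public static class CaveSketch
{
    // Returns a width x height grid where 1 = wall and 0 = dug-out cell.
    public static int[,] Dig(int width, int height, int maxSteps, int seed)
    {
        var map = new int[width, height];
        for (int x = 0; x < width; x++)
            for (int y = 0; y < height; y++)
                map[x, y] = 1;                            // start with solid wall

        var rand = new Random(seed);                      // one shared generator
        var miners = new Queue<(int X, int Y)>();
        miners.Enqueue((width / 2, height / 2));          // first miner in the centre
        map[width / 2, height / 2] = 0;

        int[] dx = { 1, 0, -1, 0 };
        int[] dy = { 0, 1, 0, -1 };

        // The step budget guarantees termination; no call ever nests.
        for (int step = 0; step < maxSteps && miners.Count > 0; step++)
        {
            var (x, y) = miners.Dequeue();
            int dir = rand.Next(0, 4);
            int nx = x + dx[dir], ny = y + dy[dir];

            // A miner that would step onto the border retires instead.
            if (nx <= 0 || ny <= 0 || nx >= width - 1 || ny >= height - 1)
                continue;

            map[nx, ny] = 0;                              // dig and move
            miners.Enqueue((nx, ny));                     // same miner, next turn

            if (rand.Next(1, 101) <= 8)                   // 8% chance to spawn
                miners.Enqueue((nx, ny));
        }
        return map;
    }
}
```

Spawning a miner here just adds an entry to the queue, so however many miners appear, the call stack never grows.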
I am trying this problem using dynamic programming
Problem:
Given a meeting room and a list of intervals (each representing a meeting), e.g.:
interval 1: 1.00-2.00
interval 2: 2.00-4.00
interval 3: 14.00-16.00
...
etc.
Question:
How do I schedule the meetings to maximize room utilization, such that no two meetings overlap?
Attempted solution
Below is my initial attempt in C# (treating it as a modified Knapsack problem with constraints). However, I had difficulty getting the correct result.
bool ContainsOverlapped(List<Interval> list)
{
var sortedList = list.OrderBy(x => x.Start).ToList();
for (int i = 0; i < sortedList.Count; i++)
{
for (int j = i + 1; j < sortedList.Count; j++)
{
if (sortedList[i].IsOverlap(sortedList[j]))
return true;
}
}
return false;
}
public bool Optimize(List<Interval> intervals, int limit, List<Interval> itemSoFar){
if (intervals == null || intervals.Count == 0)
return true; //no more choice
if (Sum(itemSoFar) > limit) //over limit
return false;
var arrInterval = intervals.ToArray();
//try all choices
for (int i = 0; i < arrInterval.Length; i++){
List<Interval> remaining = new List<Interval>();
for (int j = i + 1; j < arrInterval.Length; j++) {
remaining.Add(arrInterval[j]);
}
var partialChoice = new List<Interval>();
partialChoice.AddRange(itemSoFar);
partialChoice.Add(arrInterval[i]);
//should not schedule overlap
if (ContainsOverlapped(partialChoice))
partialChoice.Remove(arrInterval[i]);
if (Optimize(remaining, limit, partialChoice))
return true;
else
partialChoice.Remove(arrInterval[i]); //undo
}
//try all solution
return false;
}
public class Interval
{
public bool IsOverlap(Interval other)
{
return (other.Start < this.Start && this.Start < other.End) || //other < this
(this.Start < other.Start && other.End < this.End) || // this covers other
(other.Start < this.Start && this.End < other.End) || // other covers this
(this.Start < other.Start && other.Start < this.End); //this < other
}
public override bool Equals(object obj){
var i = (Interval)obj;
return base.Equals(obj) && i.Start == this.Start && i.End == this.End;
}
public int Start { get; set; }
public int End { get; set; }
public Interval(int start, int end){
Start = start;
End = end;
}
public int Duration{
get{
return End - Start;
}
}
}
Edit 1
Room utilization = amount of time the room is occupied. Sorry for confusion.
Edit 2
For simplicity: each interval's duration is an integer, and start/end times fall on whole hours (1, 2, 3, ..., 24).
I'm not sure how you are relating this to a knapsack problem. To me it looks more like a maximum-weight independent set problem (the complement of vertex cover) on the graph of overlapping intervals.
First sort the intervals as per their start times and form a graph representation in the form of adjacency matrix or list.
The vertices shall be the interval numbers. There shall be an edge between two vertices if the corresponding intervals overlap with each other. Also, each vertex shall be associated with a value equal to the interval's duration.
The problem then becomes choosing the independent vertices in such a way that the total value is maximum.
This can be done through dynamic programming. The recurrence relation for each vertex shall be as follows:
V[i] = max{ V[j] | j < i and i->j is an edge,
V[k] + value[i] | k < i and there is no edge between i and k }
Base Case V[1] = value[1]
Note:
The vertices should be numbered in increasing order of their start times. Then if there are three vertices:
i < j < k, and if there is no edge between vertex i and vertex j, then there cannot be any edge between vertex i and vertex k.
A good approach is to create a class that handles this for you.
First I create a helper class for storing intervals:
public class FromToDateTime
{
private DateTime _start;
public DateTime Start
{
get
{
return _start;
}
set
{
_start = value;
}
}
private DateTime _end;
public DateTime End
{
get
{
return _end;
}
set
{
_end = value;
}
}
public FromToDateTime(DateTime start, DateTime end)
{
Start = start;
End = end;
}
}
And then here is the class Room, which holds all the intervals and has a method "addInterval" that returns true if the interval is OK and was added, and false if it was not.
By the way, I took the overlap check from here: Algorithm to detect overlapping periods
public class Room
{
private List<FromToDateTime> _intervals;
public List<FromToDateTime> Intervals
{
get
{
return _intervals;
}
set
{
_intervals = value;
}
}
public Room()
{
Intervals = new List<FromToDateTime>();
}
public bool addInterval(FromToDateTime newInterval)
{
foreach (FromToDateTime interval in Intervals)
{
if (newInterval.Start < interval.End && interval.Start < newInterval.End)
{
return false;
}
}
Intervals.Add(newInterval);
return true;
}
}
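A short usage sketch, with compact stand-ins for the two classes so that it compiles on its own. The date 2024-01-01 and the `RoomDemo` wrapper are arbitrary choices of mine. Note that meetings are accepted in arrival order, so this guards against overlaps rather than optimizing utilization.

```csharp
using System;
using System.Collections.Generic;

public class FromToDateTime
{
    public DateTime Start { get; set; }
    public DateTime End { get; set; }
    public FromToDateTime(DateTime start, DateTime end) { Start = start; End = end; }
}

public class Room
{
    public List<FromToDateTime> Intervals { get; set; } = new List<FromToDateTime>();

    // Adds the interval unless it overlaps one that is already booked.
    public bool addInterval(FromToDateTime newInterval)
    {
        foreach (FromToDateTime interval in Intervals)
        {
            if (newInterval.Start < interval.End && interval.Start < newInterval.End)
                return false;
        }
        Intervals.Add(newInterval);
        return true;
    }
}

public static class RoomDemo
{
    public static bool[] Run()
    {
        var room = new Room();
        DateTime day = new DateTime(2024, 1, 1);   // arbitrary example date
        return new[]
        {
            room.addInterval(new FromToDateTime(day.AddHours(1), day.AddHours(2))), // accepted
            room.addInterval(new FromToDateTime(day.AddHours(2), day.AddHours(4))), // accepted: touching ends do not overlap
            room.addInterval(new FromToDateTime(day.AddHours(3), day.AddHours(5)))  // rejected: overlaps 2-4
        };
    }
}
```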
The more general problem (with multiple meeting rooms) is indeed NP-hard, and is known as the interval scheduling problem. The single-room case, however, can be solved optimally.
Optimal solution for 1-d problem with one classroom:
For the 1-d problem, choosing the (still valid) earliest deadline first solves the problem optimally.
Proof: by induction. The base case is the empty problem: the algorithm optimally solves a problem with zero meetings.
The induction hypothesis is that the algorithm solves every problem with k < n meetings optimally.
The step: given a problem with n meetings, choose the one with the earliest deadline, and remove all meetings that become invalid after choosing it. Let the chosen earliest-deadline task be T.
You get a new problem of smaller size, and by invoking the algorithm on the remainder you get the optimal solution for it (induction hypothesis).
Now note that, given that optimal solution, you can add at most one of the discarded tasks: either T or one other discarded task, since all of them overlap T (otherwise they wouldn't have been discarded). So you can add at most one of the discarded tasks, which is exactly what the suggested algorithm does.
Conclusion: For 1 meeting room, this algorithm is optimal.
QED
high level pseudo code of the solution:
findOptimal(list<tasks>):
    res = [] //empty list
    sort(list) //according to deadline/meeting end
    while (list.IsEmpty() == false):
        res = res.append(list.first())
        end = list.first().endTime()
        //remove the chosen meeting and all overlaps with it
        while (list.IsEmpty() == false and list.first().startTime() < end):
            list.removeFirst()
    return res
Clarification: This answer assumes "Room Utilization" means maximize number of meetings placed in the room.
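For reference, a compact C# sketch of the same earliest-end-first idea, assuming meetings are plain (Start, End) integer tuples. `GreedyScheduler` and `MaxMeetings` are names I've made up. Rather than deleting from the list, it remembers the end of the last accepted meeting and skips anything that starts before it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GreedyScheduler
{
    // Returns a largest possible set of pairwise non-overlapping meetings.
    public static List<(int Start, int End)> MaxMeetings(IEnumerable<(int Start, int End)> meetings)
    {
        var result = new List<(int Start, int End)>();
        int lastEnd = int.MinValue;

        // Earliest end time first.
        foreach (var m in meetings.OrderBy(x => x.End))
        {
            if (m.Start >= lastEnd)     // compatible with everything chosen so far
            {
                result.Add(m);
                lastEnd = m.End;
            }
        }
        return result;
    }
}
```

As with the pseudocode, this maximizes the number of meetings, not the total occupied time.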
Thanks all, here is my solution based on this Princeton note on dynamic programming.
Algorithm:
1) Sort all events by end time.
2) For each event, find p[n] - the latest event (by end time) which does not overlap with it.
3) Compute the optimization values: choose the best between including/not including the event.
Optimize(n) {
    opt[0] = 0;
    for j = 1 to n {
        opt[j] = max(length(j) + opt[p(j)], opt[j-1]);
    }
}
The complete source-code:
namespace CommonProblems.Algorithm.DynamicProgramming {
public class Scheduler {
#region init & test
public List<Event> _events { get; set; }
public List<Event> Init() {
if (_events == null) {
_events = new List<Event>();
_events.Add(new Event(8, 11));
_events.Add(new Event(6, 10));
_events.Add(new Event(5, 9));
_events.Add(new Event(3, 8));
_events.Add(new Event(4, 7));
_events.Add(new Event(0, 6));
_events.Add(new Event(3, 5));
_events.Add(new Event(1, 4));
}
return _events;
}
public void DemoOptimize() {
this.Init();
this.DynamicOptimize(this._events);
}
#endregion
#region Dynamic Programming
public void DynamicOptimize(List<Event> events) {
events.Add(new Event(0, 0));
events = events.SortByEndTime();
int[] eventIndexes = getCompatibleEvent(events);
int[] utilization = getBestUtilization(events, eventIndexes);
List<Event> schedule = getOptimizeSchedule(events, events.Count - 1, utilization, eventIndexes);
foreach (var e in schedule) {
Console.WriteLine("Event: [{0}- {1}]", e.Start, e.End);
}
}
/*
Algo to get optimization value:
1) Sort all events by end time, give each of them an index.
2) For each event, find p[n] - the latest event (by end time) which does not overlap with it.
3) Compute the optimization values: choose the best between including/not including the event.
Optimize(n) {
opt(0) = 0;
for j = 1 to n-th {
opt(j) = max(length(j) + opt[p(j)], opt[j-1]);
}
display opt();
}
*/
int[] getBestUtilization(List<Event> sortedEvents, int[] compatibleEvents) {
int[] optimal = new int[sortedEvents.Count];
int n = optimal.Length;
optimal[0] = 0;
for (int j = 1; j < n; j++) {
var thisEvent = sortedEvents[j];
//pick between 2 choices:
optimal[j] = Math.Max(thisEvent.Duration + optimal[compatibleEvents[j]], //Include this event
optimal[j - 1]); //Not include
}
return optimal;
}
/*
Show the optimized events:
sortedEvents: events sorted by end time.
index: event index to start with.
optimal: optimal[n] = the optimized schedule at n-th event.
compatibleEvents: compatibleEvents[n] = the latest event before n-th
*/
List<Event> getOptimizeSchedule(List<Event> sortedEvents, int index, int[] optimal, int[] compatibleEvents) {
List<Event> output = new List<Event>();
if (index == 0) {
//base case: no more event
return output;
}
//it's better to choose this event
else if (sortedEvents[index].Duration + optimal[compatibleEvents[index]] >= optimal[index]) {
output.Add(sortedEvents[index]);
//recursive go back
output.AddRange(getOptimizeSchedule(sortedEvents, compatibleEvents[index], optimal, compatibleEvents));
return output;
}
//it's better NOT choose this event
else {
output.AddRange(getOptimizeSchedule(sortedEvents, index - 1, optimal, compatibleEvents));
return output;
}
}
//compatibleEvents[n] = the latest event which do not overlap with n-th.
int[] getCompatibleEvent(List<Event> sortedEvents) {
int[] compatibleEvents = new int[sortedEvents.Count];
for (int i = 0; i < sortedEvents.Count; i++) {
for (int j = 0; j <= i; j++) {
if (!sortedEvents[j].IsOverlap(sortedEvents[i])) {
compatibleEvents[i] = j;
}
}
}
return compatibleEvents;
}
#endregion
}
public class Event {
public int EventId { get; set; }
public bool IsOverlap(Event other) {
return !(this.End <= other.Start ||
this.Start >= other.End);
}
public override bool Equals(object obj) {
var i = (Event)obj;
return base.Equals(obj) && i.Start == this.Start && i.End == this.End;
}
public int Start { get; set; }
public int End { get; set; }
public Event(int start, int end) {
Start = start;
End = end;
}
public int Duration {
get {
return End - Start;
}
}
}
public static class ListExtension {
public static bool ContainsOverlapped(this List<Event> list) {
var sortedList = list.OrderBy(x => x.Start).ToList();
for (int i = 0; i < sortedList.Count; i++) {
for (int j = i + 1; j < sortedList.Count; j++) {
if (sortedList[i].IsOverlap(sortedList[j]))
return true;
}
}
return false;
}
public static List<Event> SortByEndTime(this List<Event> events) {
if (events == null) return new List<Event>();
return events.OrderBy(x => x.End).ToList();
}
}
}
Is there any simple algorithm to determine the likelihood that 2 names represent the same person?
I'm not asking for something at the level that a Customs department might be using. Just a simple algorithm that would tell me whether 'James T. Clark' is most likely the same name as 'J. Thomas Clark' or 'James Clerk'.
If there is an algorithm in C# that would be great, but I can translate from any language.
Sounds like you're looking for a phonetic algorithm, such as Soundex, NYSIIS, or Double Metaphone. The first is actually what several government departments use, and is trivial to implement (with many implementations readily available). The second is a slightly more complicated and more precise version of the first. The last works with some non-English names and alphabets.
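To give an idea of how small Soundex is, here is a sketch of the common American variant in C#. The class and method names (`Phonetic`, `Soundex`, `Code`) are mine, and it assumes plain ASCII letters; anything else is treated like a vowel.

```csharp
using System;
using System.Text;

public static class Phonetic
{
    // Soundex digit for a letter: '0' for vowels and y, '-' for h and w.
    private static char Code(char c)
    {
        switch (char.ToUpperInvariant(c))
        {
            case 'B': case 'F': case 'P': case 'V': return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z': return '2';
            case 'D': case 'T': return '3';
            case 'L': return '4';
            case 'M': case 'N': return '5';
            case 'R': return '6';
            case 'H': case 'W': return '-';
            default: return '0';
        }
    }

    // Returns the first letter followed by three digits, padded with zeros.
    public static string Soundex(string name)
    {
        var sb = new StringBuilder();
        char last = '0';
        foreach (char ch in name)
        {
            if (!char.IsLetter(ch)) continue;          // skip spaces, dots, hyphens
            char code = Code(ch);
            if (sb.Length == 0)
            {
                sb.Append(char.ToUpperInvariant(ch));  // keep the first letter as-is
                last = code;
            }
            else if (code != '-')                      // h and w are ignored completely
            {
                // Append consonant codes, collapsing runs of the same code.
                if (code != '0' && code != last) sb.Append(code);
                last = code;                           // a vowel resets the run
            }
            if (sb.Length == 4) break;
        }
        return sb.ToString().PadRight(4, '0');
    }
}
```

With this, 'Clark' and 'Clerk' both encode to C462, so the surnames in the question would count as a phonetic match. First names and initials would still need separate handling.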
Levenshtein distance is a definition of distance between two arbitrary strings. It gives you a distance of 0 between identical strings and non-zero between different strings, which might also be useful if you decide to make a custom algorithm.
Levenshtein is close, although maybe not exactly what you want.
I faced a similar problem and tried Levenshtein distance first, but it did not work well for me. I came up with an algorithm that gives you a "similarity" value between two strings (a higher value means more similar strings, "1" for identical strings). This value is not very meaningful by itself (if not "1", it is always 0.5 or less), but it works quite well when you combine it with the Hungarian algorithm to find matching pairs from two lists of strings.
Use like this:
PartialStringComparer cmp = new PartialStringComparer();
tbResult.Text = cmp.Compare(textBox1.Text, textBox2.Text).ToString();
The code behind:
public class SubstringRange {
string masterString;
public string MasterString {
get { return masterString; }
set { masterString = value; }
}
int start;
public int Start {
get { return start; }
set { start = value; }
}
int end;
public int End {
get { return end; }
set { end = value; }
}
public int Length {
get { return End - Start; }
set { End = Start + value;}
}
public bool IsValid {
get { return MasterString.Length >= End && End >= Start && Start >= 0; }
}
public string Contents {
get {
if(IsValid) {
return MasterString.Substring(Start, Length);
} else {
return "";
}
}
}
public bool OverlapsRange(SubstringRange range) {
return !(End < range.Start || Start > range.End);
}
public bool ContainsRange(SubstringRange range) {
return range.Start >= Start && range.End <= End;
}
public bool ExpandTo(string newContents) {
if(MasterString.Substring(Start).StartsWith(newContents, StringComparison.InvariantCultureIgnoreCase) && newContents.Length > Length) {
Length = newContents.Length;
return true;
} else {
return false;
}
}
}
public class SubstringRangeList: List<SubstringRange> {
string masterString;
public string MasterString {
get { return masterString; }
set { masterString = value; }
}
public SubstringRangeList(string masterString) {
this.MasterString = masterString;
}
public SubstringRange FindString(string s){
foreach(SubstringRange r in this){
if(r.Contents.Equals(s, StringComparison.InvariantCultureIgnoreCase))
return r;
}
return null;
}
public SubstringRange FindSubstring(string s){
foreach(SubstringRange r in this){
if(r.Contents.StartsWith(s, StringComparison.InvariantCultureIgnoreCase))
return r;
}
return null;
}
public bool ContainsRange(SubstringRange range) {
foreach(SubstringRange r in this) {
if(r.ContainsRange(range))
return true;
}
return false;
}
public bool AddSubstring(string substring) {
bool result = false;
foreach(SubstringRange r in this) {
if(r.ExpandTo(substring)) {
result = true;
}
}
if(FindSubstring(substring) == null) {
bool patternfound = true;
int start = 0;
while(patternfound){
patternfound = false;
start = MasterString.IndexOf(substring, start, StringComparison.InvariantCultureIgnoreCase);
patternfound = start != -1;
if(patternfound) {
SubstringRange r = new SubstringRange();
r.MasterString = this.MasterString;
r.Start = start++;
r.Length = substring.Length;
if(!ContainsRange(r)) {
this.Add(r);
result = true;
}
}
}
}
return result;
}
private static bool SubstringRangeMoreThanOneChar(SubstringRange range) {
return range.Length > 1;
}
public float Weight {
get {
if(MasterString.Length == 0 || Count == 0)
return 0;
float numerator = 0;
int denominator = 0;
foreach(SubstringRange r in this.FindAll(SubstringRangeMoreThanOneChar)) {
numerator += r.Length;
denominator++;
}
if(denominator == 0)
return 0;
return numerator / denominator / MasterString.Length;
}
}
public void RemoveOverlappingRanges() {
SubstringRangeList l = new SubstringRangeList(this.MasterString);
l.AddRange(this);//create a copy of this list
foreach(SubstringRange r in l) {
if(this.Contains(r) && this.ContainsRange(r)) {
Remove(r);//try to remove the range
if(!ContainsRange(r)) {//see if the list still contains "superset" of this range
Add(r);//if not, add it back
}
}
}
}
public void AddStringToCompare(string s) {
for(int start = 0; start < s.Length; start++) {
for(int len = 1; start + len <= s.Length; len++) {
string part = s.Substring(start, len);
if(!AddSubstring(part))
break;
}
}
RemoveOverlappingRanges();
}
}
public class PartialStringComparer {
public float Compare(string s1, string s2) {
SubstringRangeList srl1 = new SubstringRangeList(s1);
srl1.AddStringToCompare(s2);
SubstringRangeList srl2 = new SubstringRangeList(s2);
srl2.AddStringToCompare(s1);
return (srl1.Weight + srl2.Weight) / 2;
}
}
The Levenshtein distance implementation is much simpler (adapted from http://www.merriampark.com/ld.htm):
public class Distance {
/// <summary>
/// Compute Levenshtein distance
/// </summary>
/// <param name="s">String 1</param>
/// <param name="t">String 2</param>
/// <returns>Distance between the two strings.
/// The larger the number, the bigger the difference.
/// </returns>
public static int LD(string s, string t) {
int n = s.Length; //length of s
int m = t.Length; //length of t
int[,] d = new int[n + 1, m + 1]; // matrix
int cost; // cost
// Step 1
if(n == 0) return m;
if(m == 0) return n;
// Step 2
for(int i = 0; i <= n; i++) d[i, 0] = i;
for(int j = 0; j <= m; j++) d[0, j] = j;
// Step 3
for(int i = 1; i <= n; i++) {
//Step 4
for(int j = 1; j <= m; j++) {
// Step 5
cost = (t[j - 1] == s[i - 1] ? 0 : 1);
// Step 6
d[i, j] = System.Math.Min(System.Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost);
}
}
// Step 7
return d[n, m];
}
}
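A raw distance is hard to compare across names of different lengths, so one option is to scale it into a 0..1 score. The sketch below is my own arrangement, not part of the adapted code: `NameSimilarity` and `Similarity` are hypothetical names, the distance is recomputed with two rows instead of a full matrix, and dividing by the longer length is just one of several possible normalizations.

```csharp
using System;

public static class NameSimilarity
{
    // Standard Levenshtein distance, keeping only two rows of the matrix.
    public static int Levenshtein(string s, string t)
    {
        var prev = new int[t.Length + 1];
        var curr = new int[t.Length + 1];
        for (int j = 0; j <= t.Length; j++) prev[j] = j;

        for (int i = 1; i <= s.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            var tmp = prev; prev = curr; curr = tmp;   // current row becomes previous
        }
        return prev[t.Length];
    }

    // 1.0 for identical strings (ignoring case and outer whitespace),
    // 0.0 when the distance equals the longer length.
    public static double Similarity(string a, string b)
    {
        a = a.Trim().ToLowerInvariant();
        b = b.Trim().ToLowerInvariant();
        int longest = Math.Max(a.Length, b.Length);
        return longest == 0 ? 1.0 : 1.0 - (double)Levenshtein(a, b) / longest;
    }
}
```

Any cut-off for "probably the same person" would have to be tuned on real data. This handles 'Clark' vs 'Clerk' well but not initials such as 'J.' for 'James'.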
I doubt there is, considering even the Customs Department doesn't seem to have a satisfactory answer...
If there is a solution to this problem I seriously doubt it's a part of core C#. Off the top of my head, it would require a database of first, middle and last name frequencies, as well as account for initials, as in your example. This is fairly complex logic that relies on a database of information.
I second Levenshtein distance. What language do you want it in? I was able to find an implementation in C# on CodeProject pretty easily.
In an application I worked on, the Last name field was considered reliable.
So we presented all the records with the same last name to the user.
User could sort by the other fields to look for similar names.
This solution was good enough to greatly reduce the issue of users creating duplicate records.
Basically looks like the issue will require human judgement.