Related
I am trying to spawn the neighbours of my path finder but when doing so the original path (path) blue also becomes covered up. I have tried switching of the order of spawn but this does not seem to help the problem. Quite new to Unity and would appreciate any insight into how I might go about spawning the path the neighbours without losing the original blue path.
Obtaining path neighbours; spawning the path + neighbours
for (int i = 1; i < path.Count; i++)
{
var pathCellPosition = path[i];
pathCellPosition.Position = new Vector3(pathCellPosition.Position.x, pathPrefab.transform.position.y, pathCellPosition.Position.z);
List<Node> neighbours = GetNeighbours(pathCellPosition);
var p = Instantiate(pathPrefab, pathCellPosition.Position, Quaternion.identity, PathCells);
foreach (Node n in neighbours)
{
neighbourPrefab.GetComponentInChildren<Text>().text = n.Cost.ToString();
Instantiate(neighbourPrefab, n.Position, Quaternion.identity, p.transform);
}
}
Original path
Path getting covered by neighbours
Implemented suggestions fixing part of the problem
The tiles around now seem to appear but when applying the same solution to implement neighbours of neighbours it seems to cause an overlap with the original neighbouring nodes.
Code changes:
HashSet<Node> visitedNodes = new HashSet<Node>();
for (int i = 1; i < path.Count; i++)
{
var pathCellPosition = path[i];
visitedNodes.Add(path[i]);
pathCellPosition.Position = new Vector3(pathCellPosition.Position.x, pathPrefab.transform.position.y, pathCellPosition.Position.z);
List<Node> neighbours = GetNeighbours(path[i]);
int fCost = DistanceFromStart(pathCellPosition) + HCostDistanceFromEndNode(pathCellPosition) + pathCellPosition.Cost;
pathPrefab.GetComponentInChildren<Text>().text = fCost.ToString();
var p = Instantiate(pathPrefab, pathCellPosition.Position, Quaternion.identity, PathCells);
for(int x = 0; x < neighbours.Count; x++)
{
var node = neighbours[x];
if (visitedNodes.Contains(node))
{
continue;
}
visitedNodes.Add(node);
fCost = DistanceFromStart(node) + HCostDistanceFromEndNode(node) + node.Cost;
neighbourPrefab.GetComponentInChildren<Text>().text = fCost.ToString();
Instantiate(neighbourPrefab, node.Position, Quaternion.identity, p.transform);
List<Node> NeighBourOfNeighbourNodes = GetNeighbours(node);
if(NeighBourOfNeighbourNodes.Count > 0)
{
for (int y = 0; y < NeighBourOfNeighbourNodes.Count; y++)
{
var neighbourNode = NeighBourOfNeighbourNodes[y];
if (visitedNodes.Contains(neighbourNode))
{
continue;
}
visitedNodes.Add(neighbourNode);
fCost = DistanceFromStart(neighbourNode) + HCostDistanceFromEndNode(neighbourNode) + neighbourNode.Cost;
nofn.GetComponentInChildren<Text>().text = fCost.ToString();
Instantiate(nofn, neighbourNode.Position, Quaternion.identity, p.transform);
}
}
}
If I don't draw neighbour of neighbours, this looks right. But if I do neighbour of neighbours the purple ones don't appear as much as they should.
Before:
After:
A typical simple solution is to check if the node is part of the path before adding. A HashSet can be good for this, but you will need something that can be reliably compared.
Perhaps something like this:
var visitedNodes = new HashSet<Node>(path);
for (int i = 1; i < path.Count; i++)
{
var pathCellPosition = path[i];
pathCellPosition.Position = new Vector3(pathCellPosition.Position.x, pathPrefab.transform.position.y, pathCellPosition.Position.z);
List<Node> neighbours = GetNeighbours(pathCellPosition);
var p = Instantiate(pathPrefab, pathCellPosition.Position, Quaternion.identity, PathCells);
foreach (Node n in neighbours)
{
if(visitedNodes.Contains(n))
continue;
visitedNodes.Add(n);
neighbourPrefab.GetComponentInChildren<Text>().text = n.Cost.ToString();
Instantiate(neighbourPrefab, n.Position, Quaternion.identity, p.transform);
}
}
You can also use the same approach to ensure a neighbour is not added twice.
If you want multiple levels from the node you might need to do something like a breath first search, ideally keeping track of the distance from the original path. The following generic code should traverse the graph in a breadth first manner, just use the path as the first parameter, and the GetNeighbours method for the second.
public static IEnumerable<(T Node, int Distance)> GetNeighborsWithDistance<T>(IEnumerable<T> self, Func<T, IEnumerable<T>> selector)
{
var stack = new Queue<(T Node, int Distance)>();
var visited = new HashSet<T>();
foreach (var node in self)
{
stack.Enqueue((node, 0));
}
while (stack.Count > 0)
{
var current = stack.Dequeue();
if(visited.Contains(current.Node))
continue;
yield return current;
visited.Add(current.Node);
foreach (var child in selector(current.Node))
{
stack.Enqueue((child, current.Distance+ 1));
}
}
}
This should be guaranteed to return nodes in order of distance from the original path, so just do .TakeWhile(p => p.Distance < DesiredDistanceFromOriginalPath) to get as many nodes as you want.
I have a list of people List<Person> who I need to produce groups of combinations according to a couple of conditions. This may be easiest explained with an example. Let's say I have N = 19 people.
List<Person> people = new List<Person>(){A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S};
I am given as input a PreferredGroupSize. In this case, PreferredGroupSize = 5;.
I need as output combinations grouped by this PreferredGroupSize and +/-1 for remaining members lastGroupSizes.
For 19, I need all combinations of people with 3 groups of size 5 and a single group of size 4.
Using some modulus operations, I have already calculated the number of groups I need numGroups, as well as how many numNormalSizeGroups (i.e., number of PreferredGroupSize groups) and how many numOddSizeGroups.
This is calculated as follows:
//if(mod > (PreferredGroupSize)/2.0f)) make small groups.
//else make large groups.
float val1 = N % PreferredGroupSize;
float val2 = PreferredGroupSize / 2.0f;
if (N % PreferredGroupSize == 0)
{
lastGroupSizes = PreferredGroupSize;
numOddSizeGroups = 0;
numNormalSizeGroups = N / PreferredGroupSize;
}
else if (val1 > val2)
{
lastGroupSizes = PreferredGroupSize - 1;
numOddSizeGroups = PreferredGroupSize - N % PreferredGroupSize;
numGroups = (int)MathF.Ceiling(((float)N) / PreferredGroupSize);
numNormalSizeGroups = numGroups - numOddSizeGroups;
}
else
{
lastGroupSizes = PreferredGroupSize + 1;
numOddSizeGroups = N % PreferredGroupSize;
numGroups = (int)MathF.Floor(N / PreferredGroupSize);
numNormalSizeGroups = numGroups - numOddSizeGroups;
}
The issue I'm having is I do not know how to combine the groups to form valid combinations. Valid combinations should contain no duplicate members, the number of groups should be numGroups, and the total members should be N.
These will later be added to something like a Dictionary<List<Person>, float> Groups
So, for example, this would be a valid combination:
[A B C D E], 0.55
[F G H I J], 0.72
[K L M N O], 0.74
[P Q R S], 0.75
Each row is a List<Person>, followed by a float Score that represents the compatibility of the group. Higher is better. The floats aren't so important. What is important is that only valid combinations should be created.
This would be an invalid combination due to a repeated member:
[A B C D E], 0.55
[F B H I J], 0.77
[K L M N O], 0.74
[P Q R S], 0.75
This would be an invalid combination due to the wrong total number of people in the groups:
[A B C D], 0.63
[F G H I J], 0.72
[K L M N O], 0.74
[P Q R S], 0.75
If either condition is violated, it should be not be added to the list of valid groups.
Right now, I do the following:
Produce all combinations of people of size PreferredGroupSize. nCr = 19 choose 5
Produce all combinations of people of size lastGroupSizes. nCr = 19 choose 4
Calculate a group Score for all combinations produced by 1 and 2 and add the combinations and Score to the Groups Dictionary.
Produce all combinations of Groups of size numGroups.
This is where the major complexity comes into play, and why I want to avoid producing invalid groups:
nCr = ((19 choose 5) + (19 choose 4)) choose numGroups
Iterate over those using a foreach loop, looking for valid groups, adding only valid groups to a List<Person> ValidGroups.
Break out of the loop after a given M groups have been calculated. I have sorted Groups by the Score so that higher scoring groups are more likely to be found first.
Process the valid groups.
I skip Step 1 if numNormalSizeGroups is 0, and I skip Step 2 if numOddSizeGroups is 0.
There are a ton of invalid groups created this way, and the nCr is very large.
I believe there should be a better way to do this, but I have not yet found a way. Perhaps sharing how combinations are created may be helpful in assisting:
//make combinations of a certain length
public static IEnumerable<IEnumerable<T>> GetCombinations<T>(IEnumerable<T> items, int count)
{
int i = 0;
foreach (var item in items)
{
if (count == 1)
yield return new T[] { item };
else
{
foreach (var result in GetCombinations(items.Skip(i + 1), count - 1))
yield return new T[] { item }.Concat(result);
}
++i;
}
}
I attribute this code to here: Unique combinations of list
At the source, the code is listed as creating permutations instead of combinations. I have run this on many lists, and it produces combinations.
I'll also run through a somewhat minimal version of the current algorithm a bit with code:
//1. Produce all combinations of people of size PreferredGroupSize.
//2. Produce all combinations of people of size lastGroupSizes.
//Skip Step 1 if numNormalSizeGroups is 0
//Skip Step 2 if numOddSizeGroups is 0
bool calculatedNormal = false;
bool calculatedOdd = false;
while(!calculatedNormal || !calculatedOdd)
{
int gSize = 0;
if(!calculatedNormal)
{
calculatedNormal = true;
if(numNormalSizeGroups>0)
gSize = PreferredGroupSize;
else
continue;
}
else if(!calculatedOdd)
{
calculatedOdd = true;
if(numOddSizeGroups>0)
gSize = lastGroupSizes;
else
break;
}
//Calculate a group Score for all combinations produced by 1 and 2.
//Add the combinations and Score to the Groups Dictionary.
var combos = GetCombinations(people, gSize);
float groupScore = 0;
List<Person> curGroup;
//for purposes of this example, generate pseudorandom float:
Random rand = new Random();
foreach(var combo in combos)
{
//groupScore = CalculateGroupScore(combo);
groupScore = (float)rand.NextDouble();
curGroup = new List<Person>();
foreach(Person person in combo)
curGroup.Add(person);
Groups.Add(curGroup, groupScore);
}
}
//I have sorted Groups by the Score
//so that higher scoring groups are more
//likely to be found first.
Groups = Groups.OrderByDescending(x => x.Value).ToDictionary(x => x.Key, x => x.Value);
//4. Produce all combinations of Groups of size numGroups.
var AllGroupCombos = GetCombinations(Groups, numGroups);
int M = 10000;
List<Person> gList = new List<Person>();
List<Person> ValidGroups = new List<Person>();
int curPeopleInGroup = 0;
bool addGroup = true;
//5. Iterate over those using a foreach loop
foreach(var combo in AllGroupCombos)
{
curPeopleInGroup = 0;
addGroup = true;
gList.Clear();
//Look for valid groups
foreach(KeyValuePair<List<Person>, float> keyValuePair in combo)
{
foreach(Person person in keyValuePair.Key)
{
if(!gList.Contains(person)
{
gList.Add(person);
++curPeopleInGroup;
}
else
{
addGroup = false;
break;
}
}
if(!addGroup)
break;
}
//Add only valid groups to a List<Person> ValidGroups.
if(!addGroup || curPeopleInGroup != N)
continue;
var comboList = combo.ToList();
ValidGroups.Add(comboList);
//6. Break out of the loop after a given M
//groups have been calculated.
if(ValidGroups.Count() >= M)
break;
}
I hope this was helpful, and that someone will be able to assist. :)
I am trying to solve polygon partition problem using graham scan
below is the problem statement and input and expected output
I have implemented graham scan logic but i need to display output
according to the sets formed and display all the points with indexes;
The first line will contain a single integer ().
The next lines will contain two integers (), denoting a point with coordinates . Note that points are not guaranteed to be distinct.
SAMPLE INPUT
3
0 0
0 1
1 0
On the first line, output a single integer (), the number of sets in your partition.
On the next lines, print the elements of the partition. On the -th of these lines, first print the integer (), and then indices of points forming the -th set. Points are numbered starting from . Each index from to must appear in exactly one set.
SAMPLE OUTPUT
1
3 1 2 3
below is the code implemented according to the output mentioned above,
how to print for more sets?
private static void convexHull(Point[] points, int N, int set)
{
int index = 0;
List<int> result = new List<int>();
// There must be at least 3 points
if (N < 3) return;
// Initialize Result
MyStack<Point> hull = new MyStack<Point>(N);
// Find the leftmost point
int l = 0;
for (int i = 1; i < N; i++)
if (points[i].x < points[l].x)
l = i;
// Start from leftmost point, keep moving
// counterclockwise until reach the start point
// again. This loop runs O(h) times where h is
// number of points in result or output.
int p = l, q;
do
{
// Add current point to result
hull.push(points[p]);
// Search for a point 'q' such that
// orientation(p, x, q) is counterclockwise
// for all points 'x'. The idea is to keep
// track of last visited most counterclock-
// wise point in q. If any point 'i' is more
// counterclock-wise than q, then update q.
q = (p + 1) % N;
for (int i = 0; i < N; i++)
{
// If i is more counterclockwise than
// current q, then update q
if (orientation(points[p], points[i], points[q])
== 2)
q = i;
}
// Now q is the most counterclockwise with
// respect to p. Set p as q for next iteration,
// so that q is added to result 'hull'
p = q;
} while (p != l); // While we don't come to first
set += 1;
var ele = hull.GetAllStackElements();
foreach (Point pt in ele)
{
index += 1;
}
Console.WriteLine(set);
Console.Write(string.Format("{0} ", index));
for (int s = 1; s <= index; s++)
{
Console.Write(string.Format("{0} ", s));
}
Console.Write(Environment.NewLine);
}
Problem: Given an input array of integers of size n, and a query array of integers of size k, find the smallest window of input array that contains all the elements of query array and also in the same order.
I have tried below approach.
int[] inputArray = new int[] { 2, 5, 2, 8, 0, 1, 4, 7 };
int[] queryArray = new int[] { 2, 1, 7 };
Will find the position of all query array element in inputArray.
public static void SmallestWindow(int[] inputArray, int[] queryArray)
{
Dictionary<int, HashSet<int>> dict = new Dictionary<int, HashSet<int>>();
int index = 0;
foreach (int i in queryArray)
{
HashSet<int> hash = new HashSet<int>();
foreach (int j in inputArray)
{
index++;
if (i == j)
hash.Add(index);
}
dict.Add(i, hash);
index = 0;
}
// Need to perform action in above dictionary.??
}
I got following dictionary
int 2--> position {1, 3}
int 1 --> position {6}
int 7 --> position {8}
Now I want to perform following step to findout minimum window
Compare int 2 position to int 1 position. As (6-3) < (6-1)..So I will store 3, 6 in a hashmap.
Will compare the position of int 1 and int 7 same like above.
I cannot understand how I will compare two consecutive value of a dictionary. Please help.
The algorithm:
For each element in the query array, store in a map M (V → (I,P)), V is the element, I is an index into the input array, P is the position in the query array. (The index into the input array for some P is the largest such that query[0..P] is a subsequence of input[I..curr])
Iterate through the array.
If the value is the first term in the query array: Store the current index as I.
Else: Store the value of the index of the previous element in the query array, e.g. M[currVal].I = M[query[M[currVal].P-1]].I.
If the value is the last term: Check if [I..curr] is a new best.
Complexity
The complexity of this is O(N), where N is the size of the input array.
N.B.
This code expects that no elements are repeated in the query array. To cater for this, we can use a map M (V → listOf((I,P))). This is O(NhC(Q)), where hC(Q) is the count of the mode for the query array..
Even better would be to use M (V → listOf((linkedList(I), P))). Where repeated elements occur consecutively in the query array, we use a linked list. Updating those values then becomes O(1). The complexity is then O(NhC(D(Q))), where D(Q) is Q with consecutive terms merged.
Implementation
Sample java implementation is available here. This does not work for repeated elements in the query array, nor do error checking, etc.
I don't see how using HashSet and Dictionary will help you in this. Were I faced with this problem, I'd go about it quite differently.
One way to do it (not the most efficient way) is shown below. This code makes the assumption that queryArray contains at least two items.
int FindInArray(int[] a, int start, int value)
{
for (int i = start; i < a.Length; ++i)
{
if (a[i] == value)
return i;
}
return -1;
}
struct Pair
{
int first;
int last;
}
List<Pair> foundPairs = new List<Pair>();
int startPos = 0;
bool found = true;
while (found)
{
found = false;
// find next occurrence of queryArray[0] in inputArray
startPos = FindInArray(inputArray, startPos, queryArray[0]);
if (startPos == -1)
{
// no more occurrences of the first item
break;
}
Pair p = new Pair();
p.first = startPos;
++startPos;
int nextPos = startPos;
// now find occurrences of remaining items
for (int i = 1; i < queryArray.Length; ++i)
{
nextPos = FindInArray(inputArray, nextPos, queryArray[i]);
if (nextPos == -1)
{
break; // didn't find it
}
else
{
p.last = nextPos++;
found = (i == queryArray.Length-1);
}
}
if (found)
{
foundPairs.Add(p);
}
}
// At this point, the foundPairs list contains the (start, end) of all
// sublists that contain the items in order.
// You can then iterate through that list, subtract (last-first), and take
// the item that has the smallest value. That will be the shortest sublist
// that matches the criteria.
With some work, this could be made more efficient. For example, if 'queryArray' contains [1, 2, 3] and inputArray contains [1, 7, 4, 9, 1, 3, 6, 4, 1, 8, 2, 3], the above code will find three matches (starting at positions 0, 4, and 8). Slightly smarter code could determine that when the 1 at position 4 is found, since no 2 was found prior to it, that any sequence starting at the first position would be longer than the sequence starting at position 4, and therefore short-circuit the first sequence and start over at the new position. That complicates the code a bit, though.
You want not a HashSet but a (sorted) tree or array as the value in the dictionary; the dictionary contains mappings from values you find in the input array to the (sorted) list of indices where that value appears.
Then you do the following
Look up the first entry in the query. Pick the lowest index where it appears.
Look up the second entry; pick the lowest entry greater than the index of the first.
Look up the third; pick the lowest greater than the second. (Etc.)
When you reach the last entry in the query, (1 + last index - first index) is the size of the smallest match.
Now pick the second index of the first query, repeat, etc.
Pick the smallest match found from any of the starting indices.
(Note that the "lowest entry greater" is an operation supplied with sorted trees, or can be found via binary search on a sorted array.)
The complexity of this is approximately O(M*n*log n) where M is the length of the query and n is the average number of indices at which a given value appears in the input array. You can modify the strategy by picking that query array value that appears least often for the starting point and going up and down from there; if there are k of those entries (k <= n) then the complexity is O(M*k*log n).
After you got all the positions(indexes) in the inputArray:
2 --> position {0,2} // note: I change them to 0-based array
1 --> position {5,6} // I suppose it's {5,6} to make it more complex, in your code it's only {5}
7 --> position {7}
I use a recursion to get all possible paths. [0->5->7] [0->6->7] [2->5->7] [2->6->7]. The total is 2*2*1=4 possible paths. Obviously the one who has Min(Last-First) is the shortest path(smallest window), those numbers in the middle of the path don't matter. Here comes the code.
struct Pair
{
public int Number; // the number in queryArray
public int[] Indexes; // the positions of the number
}
static List<int[]> results = new List<int[]>(); //store all possible paths
static Stack<int> currResult = new Stack<int>(); // the container of current path
static int[] inputArray, queryArray;
static Pair[] pairs;
After the data structures, here is the Main.
inputArray = new int[] { 2, 7, 1, 5, 2, 8, 0, 1, 4, 7 }; //my test case
queryArray = new int[] { 2, 1, 7 };
pairs = (from n in queryArray
select new Pair { Number = n, Indexes = inputArray.FindAllIndexes(i => i == n) }).ToArray();
Go(0);
FindAllIndexes is an extension method to help find all the indexes.
public static int[] FindAllIndexes<T>(this IEnumerable<T> source, Func<T,bool> predicate)
{
//do necessary check here, then
Queue<int> indexes = new Queue<int>();
for (int i = 0;i<source.Count();i++)
if (predicate(source.ElementAt(i))) indexes.Enqueue(i);
return indexes.ToArray();
}
The recursion method:
static void Go(int depth)
{
if (depth == pairs.Length)
{
results.Add(currResult.Reverse().ToArray());
}
else
{
var indexes = pairs[depth].Indexes;
for (int i = 0; i < indexes.Length; i++)
{
if (depth == 0 || indexes[i] > currResult.Last())
{
currResult.Push(indexes[i]);
Go(depth + 1);
currResult.Pop();
}
}
}
}
At last, a loop of results can find the Min(Last-First) result(shortest window).
Algorithm:
get all indexes into the inputArray
of all queryArray values
order them ascending by index
using each index (x) as a starting
point find the first higher index
(y) such that the segment
inputArray[x-y] contains all
queryArray values
keep only those segments that have the queryArray items in order
order the segments by their lengths,
ascending
c# implementation:
First get all indexes into the inputArray of all queryArray values and order them ascending by index.
public static int[] SmallestWindow(int[] inputArray, int[] queryArray)
{
var indexed = queryArray
.SelectMany(x => inputArray
.Select((y, i) => new
{
Value = y,
Index = i
})
.Where(y => y.Value == x))
.OrderBy(x => x.Index)
.ToList();
Next, using each index (x) as a starting point find the first higher index (y) such that the segment inputArray[x-y] contains all queryArray values.
var segments = indexed
.Select(x =>
{
var unique = new HashSet<int>();
return new
{
Item = x,
Followers = indexed
.Where(y => y.Index >= x.Index)
.TakeWhile(y => unique.Count != queryArray.Length)
.Select(y =>
{
unique.Add(y.Value);
return y;
})
.ToList(),
IsComplete = unique.Count == queryArray.Length
};
})
.Where(x => x.IsComplete);
Now keep only those segments that have the queryArray items in order.
var queryIndexed = segments
.Select(x => x.Followers.Select(y => new
{
QIndex = Array.IndexOf(queryArray, y.Value),
y.Index,
y.Value
}).ToArray());
var queryOrdered = queryIndexed
.Where(item =>
{
var qindex = item.Select(x => x.QIndex).ToList();
bool changed;
do
{
changed = false;
for (int i = 1; i < qindex.Count; i++)
{
if (qindex[i] <= qindex[i - 1])
{
qindex.RemoveAt(i);
changed = true;
}
}
} while (changed);
return qindex.Count == queryArray.Length;
});
Finally, order the segments by their lengths, ascending. The first segment in the result is the smallest window into inputArray that contains all queryArray values in the order of queryArray.
var result = queryOrdered
.Select(x => new[]
{
x.First().Index,
x.Last().Index
})
.OrderBy(x => x[1] - x[0]);
var best = result.FirstOrDefault();
return best;
}
test it with
public void Test()
{
var inputArray = new[] { 2, 1, 5, 6, 8, 1, 8, 6, 2, 9, 2, 9, 1, 2 };
var queryArray = new[] { 6, 1, 2 };
var result = SmallestWindow(inputArray, queryArray);
if (result == null)
{
Console.WriteLine("no matching window");
}
else
{
Console.WriteLine("Smallest window is indexes " + result[0] + " to " + result[1]);
}
}
output:
Smallest window is indexes 3 to 8
Thank you everyone for your inputs. I have changed my code a bit and find it working. Though it might not be very efficient but I'm happy to solve using my head :). Please give your feedback
Here is my Pair class with having number and position as variable
public class Pair
{
public int Number;
public List<int> Position;
}
Here is a method which will return the list of all Pairs.
public static Pair[] GetIndex(int[] inputArray, int[] query)
{
Pair[] pairList = new Pair[query.Length];
int pairIndex = 0;
foreach (int i in query)
{
Pair pair = new Pair();
int index = 0;
pair.Position = new List<int>();
foreach (int j in inputArray)
{
if (i == j)
{
pair.Position.Add(index);
}
index++;
}
pair.Number = i;
pairList[pairIndex] = pair;
pairIndex++;
}
return pairList;
}
Here is the line of code in Main method
Pair[] pairs = NewCollection.GetIndex(array, intQuery);
List<int> minWindow = new List<int>();
for (int i = 0; i <pairs.Length - 1; i++)
{
List<int> first = pairs[i].Position;
List<int> second = pairs[i + 1].Position;
int? temp = null;
int? temp1 = null;
foreach(int m in first)
{
foreach (int n in second)
{
if (n > m)
{
temp = m;
temp1 = n;
}
}
}
if (temp.HasValue && temp1.HasValue)
{
if (!minWindow.Contains((int)temp))
minWindow.Add((int)temp);
if (!minWindow.Contains((int)temp1))
minWindow.Add((int)temp1);
}
else
{
Console.WriteLine(" Bad Query array");
minWindow.Clear();
break;
}
}
if(minWindow.Count > 0)
{
Console.WriteLine("Minimum Window is :");
foreach(int i in minWindow)
{
Console.WriteLine(i + " ");
}
}
It is worth noting that this problem is related to the longest common subsequence problem, so coming up with algorithms that run in better than O(n^2) time in the general case with duplicates would be challenging.
Just in case someone is interested in C++ implementation with O(nlog(k))
void findMinWindow(const vector<int>& input, const vector<int>& query) {
map<int, int> qtree;
for(vector<int>::const_iterator itr=query.begin(); itr!=query.end(); itr++) {
qtree[*itr] = 0;
}
int first_ptr=0;
int begin_ptr=0;
int index1 = 0;
int queptr = 0;
int flip = 0;
while(true) {
//check if value is in query
if(qtree.find(input[index1]) != qtree.end()) {
int x = qtree[input[index1]];
if(0 == x) {
flip++;
}
qtree[input[index1]] = ++x;
}
//remove all nodes that are not required and
//yet satisfy the all query condition.
while(query.size() == flip) {
//done nothing more
if(queptr == input.size()) {
break;
}
//check if queptr is pointing to node in the query
if(qtree.find(input[queptr]) != qtree.end()) {
int y = qtree[input[queptr]];
//more nodes and the queue is pointing to deleteable node
//condense the nodes
if(y > 1) {
qtree[input[queptr]] = --y;
queptr++;
} else {
//cant condense more just keep that memory
if((!first_ptr && !begin_ptr) ||
((first_ptr-begin_ptr)>(index1-queptr))) {
first_ptr=index1;
begin_ptr=queptr;
}
break;
}
} else {
queptr++;
}
}
index1++;
if(index1==input.size()) {
break;
}
}
cout<<"["<<begin_ptr<<" - "<<first_ptr<<"]"<<endl;
}
here the main for calling it.
#include <iostream>
#include <vector>
#include <map>
using namespace std;
int main() {
vector<int> input;
input.push_back(2);
input.push_back(5);
input.push_back(2);
input.push_back(8);
input.push_back(0);
input.push_back(1);
input.push_back(4);
input.push_back(7);
vector<int> query1;
query1.push_back(2);
query1.push_back(8);
query1.push_back(0);
vector<int> query2;
query2.push_back(2);
query2.push_back(1);
query2.push_back(7);
vector<int> query3;
query3.push_back(1);
query3.push_back(4);
findMinWindow(input, query1);
findMinWindow(input, query2);
findMinWindow(input, query3);
}
In developing search for a site I am building, I decided to go the cheap and quick way and use Microsoft Sql Server's Full Text Search engine instead of something more robust like Lucene.Net.
One of the features I would like to have, though, is google-esque relevant document snippets. I quickly found determining "relevant" snippets is more difficult than I realized.
I want to choose snippets based on search term density in the found text. So, essentially, I need to find the most search term dense passage in the text. Where a passage is some arbitrary number of characters (say 200 -- but it really doesn't matter).
My first thought is to use .IndexOf() in a loop and build an array of term distances (subtract the index of the found term from the previously found term), then ... what? Add up any two, any three, any four, any five, sequential array elements and use the one with the smallest sum (hence, the smallest distance between search terms).
That seems messy.
Is there an established, better, or more obvious way to do this than what I have come up with?
Although it is implemented in Java, you can see one approach for that problem here:
http://rcrezende.blogspot.com/2010/08/smallest-relevant-text-snippet-for.html
I know this thread is way old, but I gave this a try last week and it was a pain in the back side. This is far from perfect, but this is what I came up with.
The snippet generator:
public static string SelectKeywordSnippets(string StringToSnip, string[] Keywords, int SnippetLength)
{
string snippedString = "";
List<int> keywordLocations = new List<int>();
//Get the locations of all keywords
for (int i = 0; i < Keywords.Count(); i++)
keywordLocations.AddRange(SharedTools.IndexOfAll(StringToSnip, Keywords[i], StringComparison.CurrentCultureIgnoreCase));
//Sort locations
keywordLocations.Sort();
//Remove locations which are closer to each other than the SnippetLength
if (keywordLocations.Count > 1)
{
bool found = true;
while (found)
{
found = false;
for (int i = keywordLocations.Count - 1; i > 0; i--)
if (keywordLocations[i] - keywordLocations[i - 1] < SnippetLength / 2)
{
keywordLocations[i - 1] = (keywordLocations[i] + keywordLocations[i - 1]) / 2;
keywordLocations.RemoveAt(i);
found = true;
}
}
}
//Make the snippets
if (keywordLocations.Count > 0 && keywordLocations[0] - SnippetLength / 2 > 0)
snippedString = "... ";
foreach (int i in keywordLocations)
{
int stringStart = Math.Max(0, i - SnippetLength / 2);
int stringEnd = Math.Min(i + SnippetLength / 2, StringToSnip.Length);
int stringLength = Math.Min(stringEnd - stringStart, StringToSnip.Length - stringStart);
snippedString += StringToSnip.Substring(stringStart, stringLength);
if (stringEnd < StringToSnip.Length) snippedString += " ... ";
if (snippedString.Length > 200) break;
}
return snippedString;
}
The function which will find the index of all keywords in the sample text
private static List<int> IndexOfAll(string haystack, string needle, StringComparison Comparison)
{
int pos;
int offset = 0;
int length = needle.Length;
List<int> positions = new List<int>();
while ((pos = haystack.IndexOf(needle, offset, Comparison)) != -1)
{
positions.Add(pos);
offset = pos + length;
}
return positions;
}
It's a bit clumsy in its execution. The way it works is by finding the position of all keywords in the string. Then checking that no keywords are closer to each other than the desired snippet length, so that snippets won't overlap (that's where it's a bit iffy...). And then grabs substrings of the desired length centered around the position of the keywords and stitches the whole thing together.
I know this is years late, but posting just in case it might help somebody coming across this question.
public class Highlighter
{
private class Packet
{
public string Sentence;
public double Density;
public int Offset;
}
public static string FindSnippet(string text, string query, int maxLength)
{
if (maxLength < 0)
{
throw new ArgumentException("maxLength");
}
var words = query.Split(' ').Where(w => !string.IsNullOrWhiteSpace(w)).Select(word => word.ToLower()).ToLookup(s => s);
var sentences = text.Split('.');
var i = 0;
var packets = sentences.Select(sentence => new Packet
{
Sentence = sentence,
Density = ComputeDensity(words, sentence),
Offset = i++
}).OrderByDescending(packet => packet.Density);
var list = new SortedList<int, string>();
int length = 0;
foreach (var packet in packets)
{
if (length >= maxLength || packet.Density == 0)
{
break;
}
string sentence = packet.Sentence;
list.Add(packet.Offset, sentence.Substring(0, Math.Min(sentence.Length, maxLength - length)));
length += packet.Sentence.Length;
}
var sb = new List<string>();
int previous = -1;
foreach (var item in list)
{
var offset = item.Key;
var sentence = item.Value;
if (previous != -1 && offset - previous != 1)
{
sb.Add(".");
}
previous = offset;
sb.Add(Highlight(sentence, words));
}
return String.Join(".", sb);
}
private static string Highlight(string sentence, ILookup<string, string> words)
{
var sb = new List<string>();
var ff = true;
foreach (var word in sentence.Split(' '))
{
var token = word.ToLower();
if (ff && words.Contains(token))
{
sb.Add("[[HIGHLIGHT]]");
ff = !ff;
}
if (!ff && !string.IsNullOrWhiteSpace(token) && !words.Contains(token))
{
sb.Add("[[ENDHIGHLIGHT]]");
ff = !ff;
}
sb.Add(word);
}
if (!ff)
{
sb.Add("[[ENDHIGHLIGHT]]");
}
return String.Join(" ", sb);
}
private static double ComputeDensity(ILookup<string, string> words, string sentence)
{
if (string.IsNullOrEmpty(sentence) || words.Count == 0)
{
return 0;
}
int numerator = 0;
int denominator = 0;
foreach(var word in sentence.Split(' ').Select(w => w.ToLower()))
{
if (words.Contains(word))
{
numerator++;
}
denominator++;
}
if (denominator != 0)
{
return (double)numerator / denominator;
}
else
{
return 0;
}
}
}
Example:
highlight "Optic flow is defined as the change of structured light in the image, e.g. on the retina or the camera’s sensor, due to a relative motion between the eyeball or camera and the scene. Further definitions from the literature highlight different properties of optic flow" "optic flow"
Output:
[[HIGHLIGHT]] Optic flow [[ENDHIGHLIGHT]] is defined as the change of structured
light in the image, e... Further definitions from the literature highlight diff
erent properties of [[HIGHLIGHT]] optic flow [[ENDHIGHLIGHT]]
Well, here's the hacked together version I made using the algorithm I described above. I don't think it is all that great. It uses three (count em, three!) loops an array and two lists. But, well, it is better than nothing. I also hardcoded the maximum length instead of turning it into a parameter.
private static string FindRelevantSnippets(string infoText, string[] searchTerms)
{
List<int> termLocations = new List<int>();
foreach (string term in searchTerms)
{
int termStart = infoText.IndexOf(term);
while (termStart > 0)
{
termLocations.Add(termStart);
termStart = infoText.IndexOf(term, termStart + 1);
}
}
if (termLocations.Count == 0)
{
if (infoText.Length > 250)
return infoText.Substring(0, 250);
else
return infoText;
}
termLocations.Sort();
List<int> termDistances = new List<int>();
for (int i = 0; i < termLocations.Count; i++)
{
if (i == 0)
{
termDistances.Add(0);
continue;
}
termDistances.Add(termLocations[i] - termLocations[i - 1]);
}
int smallestSum = int.MaxValue;
int smallestSumIndex = 0;
for (int i = 0; i < termDistances.Count; i++)
{
int sum = termDistances.Skip(i).Take(5).Sum();
if (sum < smallestSum)
{
smallestSum = sum;
smallestSumIndex = i;
}
}
int start = Math.Max(termLocations[smallestSumIndex] - 128, 0);
int len = Math.Min(smallestSum, infoText.Length - start);
len = Math.Min(len, 250);
return infoText.Substring(start, len);
}
Some improvements I could think of would be to return multiple "snippets" with a shorter length that add up to the longer length -- this way multiple parts of the document can be sampled.
This is a nice problem :)
I think I'd create an index vector: For each word, create an entry 1 if search term or otherwise 0. Then find the i such that sum(indexvector[i:i+maxlength]) is maximized.
This can actually be done rather efficiently. Start with the number of searchterms in the first maxlength words. then, as you move on, decrease your counter if indexvector[i]=1 (i.e. your about to lose that search term as you increase i) and increase it if indexvector[i+maxlength+1]=1. As you go, keep track of the i with the highest counter value.
Once you got your favourite i, you can still do finetuning like see if you can reduce the actual size without compromising your counter, e.g. in order to find sentence boundaries or whatever. Or like picking the right i of a number of is with equivalent counter values.
Not sure if this is a better approach than yours - it's a different one.
You might also want to check out this paper on the topic, which comes with yet-another baseline: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.72.4357&rep=rep1&type=pdf
I took another approach, perhaps it will help someone...
First it searches if it word appears in my case with IgnoreCase (you change this of course yourself).
Then I create a list of Regex matches on each separators and search for the first occurrence of the word (allowing partial case insensitive matches).
From that index, I get the 10 matches in front and behind the word, which makes the snippet.
public static string GetSnippet(string text, string word)
{
if (text.IndexOf(word, StringComparison.InvariantCultureIgnoreCase) == -1)
{
return "";
}
var matches = new Regex(#"\b(\S+)\s?", RegexOptions.Singleline | RegexOptions.Compiled).Matches(text);
var p = -1;
for (var i = 0; i < matches.Count; i++)
{
if (matches[i].Value.IndexOf(word, StringComparison.InvariantCultureIgnoreCase) != -1)
{
p = i;
break;
}
}
if (p == -1) return "";
var snippet = "";
for (var x = Math.Max(p - 10, 0); x < p + 10; x++)
{
snippet += matches[x].Value + " ";
}
return snippet;
}
If you use CONTAINSTABLE you will get a RANK back , this is in essence a density value - higher the RANK value, the higher the density. This way, you just run a query to get the results you want and dont have to result to massaging the data when its returned.
Wrote a function to do this just now. You want to pass in:
Inputs:
Document text
This is the full text of the document you're taking a snippet from. Most likely you will want to strip out any BBCode/HTML from this document.
Original query
The string the user entered as their search
Snippet length
Length of the snippet you wish to display.
Return Value:
Start index of the document text to take the snippet from. To get the snippet simply do documentText.Substring(returnValue, snippetLength). This has the advantage that you know if the snippet is take from the start/end/middle so you can add some decoration like ... if you wish at the snippet start/end.
Performance
A resolution set to 1 will find the best snippet but moves the window along 1 char at a time. Set this value higher to speed up execution.
Tweaks
You can work out score however you want. In this example I've done Math.pow(wordLength, 2) to favour longer words.
private static int GetSnippetStartPoint(string documentText, string originalQuery, int snippetLength)
{
// Normalise document text
documentText = documentText.Trim();
if (string.IsNullOrWhiteSpace(documentText)) return 0;
// Return 0 if entire doc fits in snippet
if (documentText.Length <= snippetLength) return 0;
// Break query down into words
var wordsInQuery = new HashSet<string>();
{
var queryWords = originalQuery.Split(' ');
foreach (var word in queryWords)
{
var normalisedWord = word.Trim().ToLower();
if (string.IsNullOrWhiteSpace(normalisedWord)) continue;
if (wordsInQuery.Contains(normalisedWord)) continue;
wordsInQuery.Add(normalisedWord);
}
}
// Create moving window to get maximum trues
var windowStart = 0;
double maxScore = 0;
var maxWindowStart = 0;
// Higher number less accurate but faster
const int resolution = 5;
while (true)
{
var text = documentText.Substring(windowStart, snippetLength);
// Get score of this chunk
// This isn't perfect, as window moves in steps of resolution first and last words will be partial.
// Could probably be improved to iterate words and not characters.
var words = text.Split(' ').Select(c => c.Trim().ToLower());
double score = 0;
foreach (var word in words)
{
if (wordsInQuery.Contains(word))
{
// The longer the word, the more important.
// Can simply replace with score += 1 for simpler model.
score += Math.Pow(word.Length, 2);
}
}
if (score > maxScore)
{
maxScore = score;
maxWindowStart = windowStart;
}
// Setup next iteration
windowStart += resolution;
// Window end passed document end
if (windowStart + snippetLength >= documentText.Length)
{
break;
}
}
return maxWindowStart;
}
Lots more you can add to this, for example instead of comparing exact words perhaps you might want to try comparing the SOUNDEX where you weight soundex matches less than exact matches.