The best way I can explain it is with an example:
You are visiting a shop with $2000, and your goal is to have $0 left at the end of your trip.
You do not know how many items are going to be available, nor how much they cost.
Say that there are currently 3 items costing $1000, $750, $500.
(The point is to calculate all possible solutions, not the most efficient one.)
You can spend $2000, which means:
You can buy the $1000 item 0, 1 or 2 times.
You can buy the $750 item 0, 1 or 2 times.
You can buy the $500 item 0, 1, 2, 3 or 4 times.
At the end I need to have all the solutions; in this case they are:
2*$1000
1*$1000 and 2*$500
2*$750 and 1*$500
4*$500
Side note: duplicate solutions are not allowed; for example, these two count as the same solution:
1*$1000 and 2*$500
2*$500 and 1*$1000
This is what I tried:
You first call this function using
goalmoney = Convert.ToInt32(goalMoneyTextBox.Text);
totalmoney = Convert.ToInt32(totalMoneyTextBox.Text);
int[] list = new int[usingListBox.Items.Count];
Calculate(0, currentmoney, list);
The function:
public void Calculate(int level, int money, int[] list)
{
    string item = usingListBox.Items[level].ToString();
    int cost = ItemDict[item];
    for (int i = 0; i <= (totalmoney / cost); i++)
    {
        // note: this copies the reference, not the array, so every level shares one array
        int[] templist = list;
        int tempmoney = money - (cost * i);
        templist[level] = i;
        if (tempmoney == goalmoney)
        {
            resultsFound++;
        }
        if (level < usingListBox.Items.Count - 1 && tempmoney != goalmoney)
            Calculate(level + 1, tempmoney, templist);
    }
}
Your problem can be reduced to a well-known mathematical problem called the Frobenius equation, which is closely related to the well-known coin problem. Suppose you have N items, where the i-th item costs c[i], and you need to spend exactly S$. Then you need to find all non-negative integer solutions (or determine that there are none) of the equation
c[1]*n[1] + c[2]*n[2] + ... + c[N]*n[N] = S
where the n[i] are the unknowns and each n[i] is the number of items of the i-th type that you buy.
This equation can be solved in various ways. The following function allSolutions (I suppose it can be simplified further) finds all solutions of a given equation:
// these imports go at the top of the file; both methods belong in a class
import java.util.ArrayList;
import java.util.List;

public static List<int[]> allSolutions(int[] system, int total) {
    ArrayList<int[]> all = new ArrayList<>();
    int[] solution = new int[system.length]; // initialized with zeros
    int pointer = system.length - 1, temp;
    out:
    while (true) {
        do { // the following loop can be optimized by calculating the remainder
            ++solution[pointer];
        } while ((temp = total(system, solution)) < total);
        if (temp == total && pointer != 0)
            all.add(solution.clone());
        do {
            if (pointer == 0) {
                if (temp == total) // don't lose the last solution!
                    all.add(solution.clone());
                break out;
            }
            for (int i = pointer; i < system.length; ++i)
                solution[i] = 0;
            ++solution[--pointer];
        } while ((temp = total(system, solution)) > total);
        pointer = system.length - 1;
        if (temp == total)
            all.add(solution.clone());
    }
    return all;
}

public static int total(int[] system, int[] solution) {
    int total = 0;
    for (int i = 0; i < system.length; ++i)
        total += system[i] * solution[i];
    return total;
}
In the above code, system is the array of coefficients c[i] and total is S. There is an obvious restriction: system must not contain any zero elements (a zero coefficient leads to an infinite number of solutions). A slight modification of the above code removes this restriction.
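For comparison, a recursive enumeration of the same equation can be sketched in Java. This is not the author's code, and the class name `FrobeniusSolver` is hypothetical.

It works as follows:
- It fixes how many units of the first item to buy.
- It then recurses on the remaining items with the remaining amount.
- Because each item type is decided at exactly one recursion level, every combination is produced exactly once. Duplicates such as "1*$1000 and 2*$500" vs "2*$500 and 1*$1000" therefore cannot arise.

Like `allSolutions`, it assumes every coefficient is positive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists every non-negative integer solution of
// c[0]*n[0] + c[1]*n[1] + ... + c[N-1]*n[N-1] == total, assuming all c[i] > 0.
final class FrobeniusSolver {
    static List<int[]> allSolutions(int[] c, int total) {
        List<int[]> results = new ArrayList<>();
        search(c, 0, total, new int[c.length], results);
        return results;
    }

    private static void search(int[] c, int index, int remaining, int[] counts, List<int[]> results) {
        if (index == c.length) {
            // Every item type has a count; keep the combination only if nothing is left over.
            if (remaining == 0) {
                results.add(counts.clone());
            }
            return;
        }
        // Buy item 'index' 0, 1, 2, ... times while it still fits into the remaining amount.
        for (int k = 0; k * c[index] <= remaining; k++) {
            counts[index] = k;
            search(c, index + 1, remaining - k * c[index], counts, results);
        }
        counts[index] = 0; // leave the shared array clean for the caller
    }
}
```

With the question's prices, `FrobeniusSolver.allSolutions(new int[] {1000, 750, 500}, 2000)` returns the counts {0, 0, 4}, {0, 2, 1}, {1, 0, 2} and {2, 0, 0}. These correspond to:
- 4*$500
- 2*$750 and 1*$500
- 1*$1000 and 2*$500
- 2*$1000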
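Assuming you have a class Product which exposes a property called Price, this is one way to do it. Note that it collects combinations whose total is at most availableMoney. Also, after each completed cycle it restarts from the next product on the top layer, so it does not list every combination (with the question's prices it never produces 1*$1000 and 2*$500):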
using System.Collections.Generic;
using System.Linq;

public List<List<Product>> GetAffordableCombinations(double availableMoney, List<Product> availableProducts)
{
    List<Product> sortedProducts = availableProducts.OrderByDescending(p => p.Price).ToList();
    // we have to cycle through the list multiple times while keeping track of the current
    // position in each subsequent cycle; we use a list of integers to save these positions
    List<int> layerPointer = new List<int>();
    layerPointer.Add(0);
    int currentLayer = 0;
    List<List<Product>> affordableCombinations = new List<List<Product>>();
    List<Product> tempList = new List<Product>();
    // when we have gone through all products on the top layer, we're done
    while (layerPointer[0] < sortedProducts.Count)
    {
        // take the product at the current position on the current layer
        var currentProduct = sortedProducts[layerPointer[currentLayer]];
        var currentSum = tempList.Sum(p => p.Price);
        if ((currentSum + currentProduct.Price) <= availableMoney)
        {
            // if the sum doesn't exceed our maximum, we add that product to a temp list
            tempList.Add(currentProduct);
            // then we advance to the next layer
            currentLayer++;
            // if it doesn't exist, we create it and set the 'start product' on that layer
            // to the current product of the current layer
            if (currentLayer >= layerPointer.Count)
                layerPointer.Add(layerPointer[currentLayer - 1]);
        }
        else
        {
            // if the sum would exceed our maximum, we move to the next product on the current layer
            layerPointer[currentLayer]++;
            if (layerPointer[currentLayer] >= sortedProducts.Count)
            {
                // if we've reached the end of the list on the current layer,
                // there are no more cheaper products to add, and this cycle is complete,
                // so we add the list we have so far to the possible combinations
                affordableCombinations.Add(tempList);
                tempList = new List<Product>();
                // move to the next product on the top layer
                layerPointer[0]++;
                currentLayer = 0;
                // set the current product on each subsequent layer to the current one of the top layer
                for (int i = 1; i < layerPointer.Count; i++)
                {
                    layerPointer[i] = layerPointer[0];
                }
            }
        }
    }
    return affordableCombinations;
}
I have a method that needs to do a calculation based on the length of an array. I am using the .Length property for the calculation, but the method is doing the arithmetic with the maximum length of the array (which I have declared as 10). This is the loop I am using to get data from the user. I know it isn't the ideal way to sort array data, but this is for a homework assignment, and it revolves around using the .Split method correctly (which isn't the problem I'm having).
for (int i = 0; i < MAX; i++)
{
    Console.Write("Enter a name and a score for player #{0}: ", (i + 1));
    string input = Console.ReadLine();
    if (input == "")
    {
        // If nothing is entered, it will break the loop.
        break;
    }
    // Splits the user data into 2 arrays (integer and string).
    string[] separateInput = input.Split();
    name[i] = separateInput[0];
    score[i] = int.Parse(separateInput[1]);
}
Here is the method I am using to calculate the average score:
static void CalculateScores(int[] score)
{
    int sum = 0;
    int average = 0;
    for (int i = 0; i < score.Length; i++)
    {
        sum += score[i];
        average = sum / score.Length;
    }
    Console.WriteLine("The average score was {0}", average);
}
I am calling the method like this:
CalculateScores(score);
Edit: my arrays are declared as:
int[] score = new int[MAX]; //MAX == 10.
string[] name = new string[MAX];
The CalculateScores method is doing the math as though score.Length is always 10, no matter how many different combinations of scores I input to the console. I can't figure out if it's because my loop to gather input has been done incorrectly, or my CalculateScores method is flawed. Thanks in advance.
Edit: to clarify, I am just confused about why I can't get the correct value out of CalculateScores.
Length always represents the size of the array: if you instantiated it with 10 elements, it will always be 10, regardless of how many items you've filled in.
There are lots of ways of solving your problem, but I'd go with the simple one of not using length in your calculation, but rather just storing the number of items in a separate variable:
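int numItems = 0;
for (int i = 0; i < MAX; i++)
{
    Console.Write("Enter a name and a score for player #{0}: ", (i + 1));
    string input = Console.ReadLine();
    if (input == "")
    {
        break; // if nothing is entered, it will break the loop
    }
    numItems++;
    ...
}
static void CalculateScores(int[] score, int numItems)
{
    // don't use Length at all; only the first numItems slots were filled
    int sum = 0;
    for (int i = 0; i < numItems; i++) sum += score[i];
    Console.WriteLine("The average score was {0}", numItems > 0 ? sum / numItems : 0);
}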
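The same point in Java, as a minimal sketch with a hypothetical `ScoreStats.average` helper. The array keeps its fixed capacity (`scores.length` stays at `MAX`), so the average has to be taken over the `count` slots that were actually filled. The helper returns a `double`; that is a choice made here rather than something the question requires.

```java
// Hypothetical helper: averages only the first 'count' entries of a fixed-size array.
final class ScoreStats {
    static double average(int[] scores, int count) {
        if (count < 0 || count > scores.length) {
            throw new IllegalArgumentException("count must be between 0 and scores.length");
        }
        if (count == 0) {
            return 0.0; // nothing was entered; avoid dividing by zero
        }
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += scores[i];
        }
        return (double) sum / count;
    }
}
```

Take `int[] scores = new int[10]` with only 90, 70 and 80 entered. `ScoreStats.average(scores, 3)` gives 80.0. Dividing the same sum by `scores.length` in integer arithmetic would give 24.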
Arrays are generally used for fixed-size data, so the Length property reflects how many items the array can hold rather than how many elements you have stored in it. The simplest fix is to use a List<T>, which is designed for variable-length data, instead.
// A nice abstraction to hold the scores instead of two separate arrays.
public class ScoreKeeper
{
    public string Name { get; set; }
    public int Score { get; set; }
}
var scores = new List<ScoreKeeper>();
for (int i = 0; i < MAX; i++)
{
    Console.Write("Enter a name and a score for player #{0}: ", (i + 1));
    string input = Console.ReadLine();
    if (input == "")
    {
        // If nothing is entered, it will break the loop.
        break;
    }
    // Splits the user input into a name and a score.
    string[] separateInput = input.Split();
    scores.Add(new ScoreKeeper { Name = separateInput[0], Score = int.Parse(separateInput[1]) });
}
static void CalculateScores(ICollection<ScoreKeeper> scores)
{
    // We take advantage of LINQ (System.Linq) here by gathering all the scores and
    // taking their average; Average() throws on an empty sequence, hence the Count check.
    var average = scores.Count == 0 ? 0 : scores.Select(s => s.Score).Average();
    Console.WriteLine("The average score was {0}", average);
}
Checking manually:
int sum = 0;
int average = 0;
int length = 0;
for (int i = 0; i < MAX; i++)
{
    // unused slots of a string array are null, not ""
    if (!string.IsNullOrEmpty(name[i]))
    {
        sum += score[i];
        length = i + 1;
    }
}
if (length > 0)
    average = sum / length;
Console.WriteLine("The average score was {0}", average);
I have come up with the code below, but it doesn't handle all cases, e.g.:
An array consisting of all 0s
An array with negative values (this is a bit tricky, since we are looking for a product and two negative ints give a positive value)
public static int LargestProduct(int[] arr)
{
    // returning arr[0] if it has only one element
    if (arr.Length == 1) return arr[0];
    int product = 1;
    int maxProduct = Int32.MinValue;
    for (int i = 0; i < arr.Length; i++)
    {
        // this block stores the largest product so far when it finds a 0
        if (arr[i] == 0)
        {
            if (maxProduct < product)
            {
                maxProduct = product;
            }
            product = 1;
        }
        else
        {
            product *= arr[i];
        }
    }
    if (maxProduct > product)
        return maxProduct;
    else
        return product;
}
How can I handle the above cases, or correct the code? Any suggestions are welcome.
I am basing my answer on the assumption that if you have more than one element in the array, you want to multiply at least 2 contiguous integers, e.g. for the array {-1, 15} the output you want is -15 and not 15.
The problem we need to solve is to look at all possible multiplication combinations and find the maximum product among them.
The total number of products in an array of n integers is nC2 = n(n-1)/2: with 2 elements there is 1 multiplication combination, with 3 there are 3, with 4 there are 6, and so on.
For each number in the incoming array, we multiply it with all the products that ended at the previous element (and with the previous element itself), keeping track of the maximum product so far. Once we have done this for all the elements, we are left with the maximum product.
This should work for negatives and zeros.
public static long LargestProduct(int[] arr)
{
    if (arr.Length == 1)
        return arr[0];
    long lastNumber = 1; // long, so lastNumber * item is computed without int overflow
    List<long> latestProducts = new List<long>();
    long maxProduct = Int64.MinValue;
    for (int i = 0; i < arr.Length; i++)
    {
        var item = arr[i];
        var latest = lastNumber * item;
        var temp = new long[latestProducts.Count];
        latestProducts.CopyTo(temp);
        latestProducts.Clear();
        // extend every product that ended at the previous element by the current item
        foreach (var p in temp)
        {
            var product = p * item;
            if (product > maxProduct)
                maxProduct = product;
            latestProducts.Add(product);
        }
        // the pair (previous element, current element) starts a new product
        if (i != 0)
        {
            if (latest > maxProduct)
                maxProduct = latest;
            latestProducts.Add(latest);
        }
        lastNumber = item;
    }
    return maxProduct;
}
If you want the maximum product to also include single elements of the array (i.e. {-1, 15} should return 15), compare the max product with the element currently being processed. That gives the right result when a single element is the maximum.
This can be achieved by adding the following code inside the for loop at the end.
if (item > maxProduct)
    maxProduct = item;
Your problem has two parts. Break them down and it becomes easier to solve.
1) Find all contiguous subsets.
Since your source sequence can contain negative values, you can't make any value judgments until you've found each subset, as a negative can later be "cancelled" by another. So let the first phase only find the subsets.
An example of how you might do this is the following code
// will contain all contiguous subsets
var sequences = new List<Tuple<bool, List<int>>>();
// build subsets
foreach (int item in source)
{
    var deadCopies = new List<Tuple<bool, List<int>>>();
    foreach (var record in sequences.Where(r => r.Item1 && !r.Item2.Contains(0)))
    {
        // make a copy that is "dead"
        var deadCopy = new Tuple<bool, List<int>>(false, record.Item2.ToList());
        deadCopies.Add(deadCopy);
        record.Item2.Add(item);
    }
    sequences.Add(new Tuple<bool, List<int>>(true, new List<int> { item }));
    sequences.AddRange(deadCopies);
}
In the above code, I'm building all my contiguous subsets, while taking the liberty of not adding anything to a given subset that already has a 0 value. You can omit that particular behavior if you wish.
2) Calculate each subset's product and compare that to a max value.
Once you have found all of your qualifying subsets, the next part is easy.
// find subset with highest product
int maxProduct = int.MinValue;
IEnumerable<int> maxSequence = Enumerable.Empty<int>();
foreach (var record in sequences)
{
    int product = record.Item2.Aggregate((a, b) => a * b);
    if (product > maxProduct)
    {
        maxProduct = product;
        maxSequence = record.Item2;
    }
}
Add whatever logic you wish to restrict the length of the original source, the subset candidates or the product values. For example, you may want to enforce a minimum length on either, or decide whether a subset product of 0 is acceptable when a non-zero product is available.
Also, I make no claims as to the performance of the code; it is merely meant to illustrate breaking the problem down into its parts.
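The two phases can also be expressed without materializing the subsets, using two nested loops. The outer loop picks where a contiguous subset starts. The inner loop extends it one element at a time while keeping a running product that is compared to the max.

This is an O(n²) Java sketch with a hypothetical class name. Unlike the first answer above, it also counts single-element subsets, so {-1, 15} gives 15. It uses `long`, which can still overflow on long inputs.

```java
// Hypothetical brute-force reference: largest product over all non-empty contiguous sub-arrays.
final class BruteForceProduct {
    static long largestProduct(int[] a) {
        if (a.length == 0) {
            throw new IllegalArgumentException("array must not be empty");
        }
        long best = Long.MIN_VALUE;
        for (int start = 0; start < a.length; start++) {
            long product = 1;
            for (int end = start; end < a.length; end++) {
                product *= a[end];              // running product of a[start..end]
                best = Math.max(best, product); // compare each subset's product to the max so far
            }
        }
        return best;
    }
}
```

It is handy as a reference when testing a faster solution. For example, on {6, -3, 2, 0, 3, -2, -4, -2, 4, 5} the best subset is {-4, -2, 4, 5}, with product 160.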
I think you should track two products at the same time; they will differ in sign.
For the case where all values are zero, you can check at the end whether maxProduct is still Int32.MinValue (assuming Int32.MinValue is not a possible result).
My variant:
int maxProduct = Int32.MinValue;
int? productWithPositiveStart = null;
int? productWithNegativeStart = null;
for (int i = 0; i < arr.Length; i++)
{
    if (arr[i] == 0)
    {
        // a zero ends both running products
        productWithPositiveStart = null;
        productWithNegativeStart = null;
    }
    else
    {
        if (arr[i] > 0 && productWithPositiveStart == null)
        {
            productWithPositiveStart = arr[i];
        }
        else if (productWithPositiveStart != null)
        {
            productWithPositiveStart *= arr[i];
            maxProduct = Math.Max(maxProduct, productWithPositiveStart.Value);
        }
        if (arr[i] < 0 && productWithNegativeStart == null)
        {
            productWithNegativeStart = arr[i];
        }
        else if (productWithNegativeStart != null)
        {
            productWithNegativeStart *= arr[i];
            maxProduct = Math.Max(maxProduct, productWithNegativeStart.Value);
        }
        maxProduct = Math.Max(arr[i], maxProduct);
    }
}
if (maxProduct == Int32.MinValue)
{
    maxProduct = 0;
}
At a high level, your current algorithm splits the array upon a 0 and returns the largest contiguous product of these sub-arrays. Any further iterations will be on the process of finding the largest contiguous product of a sub-array where no elements are 0.
To take into account negative numbers, we obviously first need to test if the product of one of these sub-arrays is negative, and take some special action if it is.
A negative result comes from an odd number of negative values, so we need to remove one of these negative values to make the result positive again. To do this, we remove one of two things, whichever leaves the higher product:
- all elements up to and including the first negative number, or
- the last negative number and all elements after it.
To take into account an array of all 0s, simply use 0 as your starting maxProduct. If the array is a single negative value, your special-case handling of a single element means that value is returned. Otherwise, there will always be a positive sub-sequence product, or else the array contains a 0 and 0 is the correct answer anyway.
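The approach described above might be sketched in Java as follows. The class name is hypothetical, and `long` arithmetic is assumed to be wide enough.

The steps are:
1. Split the array at each 0.
2. Keep the product of a zero-free segment if it is positive.
3. Otherwise, drop either the prefix up to and including the first negative number, or the suffix starting at the last negative number, whichever leaves the larger product.

Because every non-zero integer has magnitude at least 1, a longer zero-free stretch never has a smaller absolute product. That is why only whole segments, or segments with one end trimmed, need to be considered.

```java
// Hypothetical sketch of the "split at zeros, then drop one negative" approach.
final class ZeroSplitProduct {
    static long largestProduct(int[] a) {
        if (a.length == 0) {
            throw new IllegalArgumentException("array must not be empty");
        }
        if (a.length == 1) {
            return a[0]; // special case: a single (possibly negative) element
        }
        long best = 0; // with 2+ elements, either a 0 or a positive product is always reachable
        int start = 0;
        for (int i = 0; i <= a.length; i++) {
            // A 0, or the end of the array, closes the zero-free segment a[start..i).
            if (i == a.length || a[i] == 0) {
                best = Math.max(best, bestInSegment(a, start, i));
                start = i + 1;
            }
        }
        return best;
    }

    // Best product inside the zero-free segment a[lo..hi), or Long.MIN_VALUE if it is empty.
    private static long bestInSegment(int[] a, int lo, int hi) {
        if (lo >= hi) {
            return Long.MIN_VALUE;
        }
        long whole = product(a, lo, hi);
        if (whole > 0) {
            return whole;
        }
        int firstNeg = -1;
        int lastNeg = -1;
        for (int i = lo; i < hi; i++) {
            if (a[i] < 0) {
                if (firstNeg == -1) {
                    firstNeg = i;
                }
                lastNeg = i;
            }
        }
        // Odd number of negatives: drop the prefix through the first negative,
        // or the suffix from the last negative, whichever leaves the larger product.
        long best = whole; // only kept if the segment is a single negative number
        if (firstNeg + 1 < hi) {
            best = Math.max(best, product(a, firstNeg + 1, hi));
        }
        if (lo < lastNeg) {
            best = Math.max(best, product(a, lo, lastNeg));
        }
        return best;
    }

    private static long product(int[] a, int lo, int hi) {
        long p = 1;
        for (int i = lo; i < hi; i++) {
            p *= a[i];
        }
        return p;
    }
}
```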
It can be done in O(N). It is based on a simple idea: track the minimum (minCurrent) and maximum (maxCurrent) product of a sub-array ending at index i. This can easily be adapted to handle inputs like {0,0,-2,0}, {-2,-3,-8} or {0,0} (as written, getMaxProduct never returns a value below 1).
a[] = {6, -3, 2, 0, 3, -2, -4, -2, 4, 5}
The algorithm (for the above array a it returns 160, the product of the sub-array {-4, -2, 4, 5}):
private static int getMaxProduct(int[] a) {
    if (a.length == 0) {
        throw new IllegalArgumentException();
    }
    int minCurrent = 1, maxCurrent = 1, max = Integer.MIN_VALUE;
    for (int current : a) {
        if (current > 0) {
            maxCurrent = maxCurrent * current;
            minCurrent = Math.min(minCurrent * current, 1);
        } else if (current == 0) {
            maxCurrent = 1;
            minCurrent = 1;
        } else {
            int x = maxCurrent;
            maxCurrent = Math.max(minCurrent * current, 1);
            minCurrent = x * current;
        }
        if (max < maxCurrent) {
            max = maxCurrent;
        }
    }
    return max;
}
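One way to make the change mentioned above:
- Seed both trackers with a[0] instead of 1.
- At every step, take the maximum and minimum of the current element, max·current and min·current.

This lets a lone negative number or a 0 be the result. A Java sketch of that variant follows; the class and method names are hypothetical, and `long` is used to delay overflow.

```java
// Hypothetical variant of getMaxProduct that also handles inputs such as
// {0, 0, -2, 0}, {-2, -3, -8} and {0, 0}.
final class MaxProductSubarray {
    static long maxProduct(int[] a) {
        if (a.length == 0) {
            throw new IllegalArgumentException("array must not be empty");
        }
        long maxEndingHere = a[0]; // largest product of a sub-array ending at the current index
        long minEndingHere = a[0]; // smallest (most negative) product ending at the current index
        long best = a[0];
        for (int i = 1; i < a.length; i++) {
            long x = a[i];
            long fromMax = maxEndingHere * x;
            long fromMin = minEndingHere * x; // a negative x turns the minimum into the maximum
            maxEndingHere = Math.max(x, Math.max(fromMax, fromMin));
            minEndingHere = Math.min(x, Math.min(fromMax, fromMin));
            best = Math.max(best, maxEndingHere);
        }
        return best;
    }
}
```

On the sample array it still returns 160. For the other inputs it returns:
- 0 for {0, 0, -2, 0} and {0, 0}
- 24 for {-2, -3, -8}
- -2 for {-2}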
I have the following numbers, stored as strings.
My actual number is 1234567890123456789.
From this I have to separate it as s=12, s1=6789, s3=3456789012345,
with the remaining digits as I said.
I would like to add them as follows:
1+3, 2+4, 6+5, 7+6, 8+7, 9+8, such that the output is
4613579012345
Any help, please?
public static string CombineNumbers(string number1, string number2)
{
    int length = number1.Length > number2.Length ? number1.Length : number2.Length;
    string returnValue = string.Empty;
    for (int i = 0; i < length; i++)
    {
        // a missing digit (past the end of the shorter string) counts as 0
        int n1 = i >= number1.Length ? 0 : int.Parse(number1.Substring(i, 1));
        int n2 = i >= number2.Length ? 0 : int.Parse(number2.Substring(i, 1));
        int sum = n1 + n2;
        // keep only the last digit of the sum
        returnValue += sum < 10 ? sum : sum - 10;
    }
    return returnValue;
}
This sounds an awful lot like a homework problem, so I'm not giving code. Just think about what you need to do. You are saying that you need to take the first character off the front of two strings, parse them to ints, and add them together. Finally, take the result of the addition and append it to the end of a new string. If you write code that follows that path, it should work out fine.
EDIT: As Ralph pointed out, you'll also need to check for overflows. I didn't notice that when I started typing. That shouldn't be too hard, though, since you're starting with two one-digit numbers: if the sum is greater than 9, just subtract 10 to bring it down to a single digit.
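A Java sketch of the steps described above; the class and method names are hypothetical. It makes these assumptions, inferred from the question's pairs 1+3, 2+4, 6+5, ...:
- The first operand is s followed by s1 ("126789"), and the second is s3.
- Digits are paired from the left.
- Only the last digit of each sum is kept.
- The unmatched tail of the longer string is copied unchanged.

```java
// Hypothetical helper: adds two digit strings position by position from the left,
// keeps only the last digit of each sum (6 + 5 = 11 -> 1), and appends the
// unmatched tail of the longer string unchanged.
final class DigitwiseSum {
    static String combine(String first, String second) {
        StringBuilder result = new StringBuilder();
        int common = Math.min(first.length(), second.length());
        for (int i = 0; i < common; i++) {
            int a = Character.digit(first.charAt(i), 10);
            int b = Character.digit(second.charAt(i), 10);
            if (a < 0 || b < 0) {
                throw new IllegalArgumentException("non-digit character at position " + i);
            }
            result.append((a + b) % 10); // same as subtracting 10 when the sum is greater than 9
        }
        // Copy the remaining digits of whichever string is longer.
        String longer = first.length() >= second.length() ? first : second;
        result.append(longer, common, longer.length());
        return result.toString();
    }
}
```

`DigitwiseSum.combine("126789", "3456789012345")` produces "4613579012345", the output the question asks for.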
How about this LINQish solution:
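private string SumIt(string first, string second)
{
    IEnumerable<char> left = first;
    IEnumerable<char> right = second;
    var sb = new StringBuilder();
    // Zip stops at the end of the shorter string
    var query = left.Zip(right, (l, r) => new { Left = l, Right = r })
        .Select(chars => new { Left = int.Parse(chars.Left.ToString()),
                               Right = int.Parse(chars.Right.ToString()) })
        .Select(numbers => (numbers.Left + numbers.Right) % 10);
    foreach (var number in query)
    {
        sb.Append(number);
    }
    // append the unmatched tail of the longer string, as the expected output requires
    sb.Append(first.Length > second.Length ? first.Substring(second.Length) : second.Substring(first.Length));
    return sb.ToString();
}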
Tried something:
public static string NumAdd(int iOne, int iTwo)
{
    char[] strOne = iOne.ToString().ToCharArray();
    char[] strTwo = iTwo.ToString().ToCharArray();
    string strReturn = string.Empty;
    for (int i = 0; i < strOne.Length; i++)
    {
        int iFirst = 0;
        if (int.TryParse(strOne[i].ToString(), out iFirst))
        {
            int iSecond = 0;
            if (int.TryParse(strTwo[i].ToString(), out iSecond))
            {
                // % 10 keeps only the last digit (6 + 5 = 11 -> 1)
                strReturn += ((iFirst + iSecond) % 10).ToString();
            }
        }
        // last one, add the remaining string
        if (i + 1 == strOne.Length)
        {
            strReturn += iTwo.ToString().Substring(i + 1);
            break;
        }
    }
    return strReturn;
}
You should call it like this:
string strBla = NumAdd(12345, 123456789);
This function works only if the first number has no more digits than the second one, but it should show you the general approach.
In other words, you want to add two numbers, treating the shorter number as if it had zeros on its right until it has the same number of digits as the longer number.
It sounds like the problem at this point is simply a matter of finding out what power of 10 you need to multiply the shorter number by to reach the number of digits of the longer number.
In developing search for a site I am building, I decided to go the cheap and quick way and use Microsoft SQL Server's Full-Text Search engine instead of something more robust like Lucene.Net.
One of the features I would like to have, though, is Google-esque relevant document snippets. I quickly found that determining "relevant" snippets is more difficult than I realized.
I want to choose snippets based on search-term density in the found text. So, essentially, I need to find the most search-term-dense passage in the text, where a passage is some arbitrary number of characters (say 200, but it really doesn't matter).
My first thought is to use .IndexOf() in a loop and build an array of term distances (subtract the index of the found term from the previously found term), then ... what? Add up any two, any three, any four, any five, sequential array elements and use the one with the smallest sum (hence, the smallest distance between search terms).
That seems messy.
Is there an established, better, or more obvious way to do this than what I have come up with?
Although it is implemented in Java, you can see one approach for that problem here:
http://rcrezende.blogspot.com/2010/08/smallest-relevant-text-snippet-for.html
I know this thread is way old, but I gave this a try last week and it was a pain in the back side. This is far from perfect, but this is what I came up with.
The snippet generator:
public static string SelectKeywordSnippets(string StringToSnip, string[] Keywords, int SnippetLength)
{
    string snippedString = "";
    List<int> keywordLocations = new List<int>();
    // Get the locations of all keywords
    for (int i = 0; i < Keywords.Length; i++)
        keywordLocations.AddRange(SharedTools.IndexOfAll(StringToSnip, Keywords[i], StringComparison.CurrentCultureIgnoreCase));
    // Sort locations
    keywordLocations.Sort();
    // Merge locations that are closer to each other than half the SnippetLength
    if (keywordLocations.Count > 1)
    {
        bool found = true;
        while (found)
        {
            found = false;
            for (int i = keywordLocations.Count - 1; i > 0; i--)
                if (keywordLocations[i] - keywordLocations[i - 1] < SnippetLength / 2)
                {
                    keywordLocations[i - 1] = (keywordLocations[i] + keywordLocations[i - 1]) / 2;
                    keywordLocations.RemoveAt(i);
                    found = true;
                }
        }
    }
    // Make the snippets
    if (keywordLocations.Count > 0 && keywordLocations[0] - SnippetLength / 2 > 0)
        snippedString = "... ";
    foreach (int i in keywordLocations)
    {
        int stringStart = Math.Max(0, i - SnippetLength / 2);
        int stringEnd = Math.Min(i + SnippetLength / 2, StringToSnip.Length);
        int stringLength = Math.Min(stringEnd - stringStart, StringToSnip.Length - stringStart);
        snippedString += StringToSnip.Substring(stringStart, stringLength);
        if (stringEnd < StringToSnip.Length) snippedString += " ... ";
        if (snippedString.Length > 200) break;
    }
    return snippedString;
}
The function which finds the index of every occurrence of a keyword in the text:
public static List<int> IndexOfAll(string haystack, string needle, StringComparison Comparison)
{
    int pos;
    int offset = 0;
    int length = needle.Length;
    List<int> positions = new List<int>();
    if (length == 0) return positions; // an empty needle would otherwise loop forever
    while ((pos = haystack.IndexOf(needle, offset, Comparison)) != -1)
    {
        positions.Add(pos);
        offset = pos + length;
    }
    return positions;
}
It's a bit clumsy in its execution. It works in three steps:
1. It finds the positions of all keywords in the string.
2. It merges any keywords that are closer to each other than half the desired snippet length, so that snippets won't overlap (that's where it's a bit iffy...).
3. It grabs substrings of the desired length centered on the keyword positions and stitches them together.
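The two core steps of that answer could look like this in Java: finding every keyword position, and merging positions that are too close together. This is a sketch with hypothetical names. `minGap` plays the role of `SnippetLength / 2`, and the case-insensitive search uses `String.regionMatches` rather than a culture-aware comparison.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java sketch of IndexOfAll plus the "merge nearby positions" loop.
final class KeywordWindows {
    // Start indexes of non-overlapping, case-insensitive occurrences of needle.
    static List<Integer> indexOfAll(String haystack, String needle) {
        List<Integer> positions = new ArrayList<>();
        if (needle.isEmpty()) {
            return positions; // an empty needle would match everywhere
        }
        int i = 0;
        while (i + needle.length() <= haystack.length()) {
            if (haystack.regionMatches(true, i, needle, 0, needle.length())) {
                positions.add(i);
                i += needle.length(); // continue after the match, like offset = pos + length
            } else {
                i++;
            }
        }
        return positions;
    }

    // Repeatedly replaces neighbours closer than minGap by their midpoint until none are left.
    static List<Integer> mergeClose(List<Integer> sortedPositions, int minGap) {
        List<Integer> p = new ArrayList<>(sortedPositions);
        boolean merged = true;
        while (merged) {
            merged = false;
            for (int i = p.size() - 1; i > 0; i--) {
                if (p.get(i) - p.get(i - 1) < minGap) {
                    p.set(i - 1, (p.get(i) + p.get(i - 1)) / 2);
                    p.remove(i); // remove(int index): drops the right-hand neighbour
                    merged = true;
                }
            }
        }
        return p;
    }
}
```

For example, `mergeClose(List.of(10, 20, 300), 50)` yields [15, 300]. The first two hits are only 10 apart, so they are replaced by their midpoint.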
I know this is years late, but posting just in case it might help somebody coming across this question.
public class Highlighter
{
    private class Packet
    {
        public string Sentence;
        public double Density;
        public int Offset;
    }
    public static string FindSnippet(string text, string query, int maxLength)
    {
        if (maxLength < 0)
        {
            throw new ArgumentException("maxLength");
        }
        var words = query.Split(' ').Where(w => !string.IsNullOrWhiteSpace(w)).Select(word => word.ToLower()).ToLookup(s => s);
        var sentences = text.Split('.');
        var i = 0;
        var packets = sentences.Select(sentence => new Packet
        {
            Sentence = sentence,
            Density = ComputeDensity(words, sentence),
            Offset = i++
        }).OrderByDescending(packet => packet.Density);
        var list = new SortedList<int, string>();
        int length = 0;
        foreach (var packet in packets)
        {
            if (length >= maxLength || packet.Density == 0)
            {
                break;
            }
            string sentence = packet.Sentence;
            list.Add(packet.Offset, sentence.Substring(0, Math.Min(sentence.Length, maxLength - length)));
            length += packet.Sentence.Length;
        }
        var sb = new List<string>();
        int previous = -1;
        foreach (var item in list)
        {
            var offset = item.Key;
            var sentence = item.Value;
            if (previous != -1 && offset - previous != 1)
            {
                sb.Add(".");
            }
            previous = offset;
            sb.Add(Highlight(sentence, words));
        }
        return String.Join(".", sb);
    }
    private static string Highlight(string sentence, ILookup<string, string> words)
    {
        var sb = new List<string>();
        var ff = true;
        foreach (var word in sentence.Split(' '))
        {
            var token = word.ToLower();
            if (ff && words.Contains(token))
            {
                sb.Add("[[HIGHLIGHT]]");
                ff = !ff;
            }
            if (!ff && !string.IsNullOrWhiteSpace(token) && !words.Contains(token))
            {
                sb.Add("[[ENDHIGHLIGHT]]");
                ff = !ff;
            }
            sb.Add(word);
        }
        if (!ff)
        {
            sb.Add("[[ENDHIGHLIGHT]]");
        }
        return String.Join(" ", sb);
    }
    private static double ComputeDensity(ILookup<string, string> words, string sentence)
    {
        if (string.IsNullOrEmpty(sentence) || words.Count == 0)
        {
            return 0;
        }
        int numerator = 0;
        int denominator = 0;
        foreach (var word in sentence.Split(' ').Select(w => w.ToLower()))
        {
            if (words.Contains(word))
            {
                numerator++;
            }
            denominator++;
        }
        if (denominator != 0)
        {
            return (double)numerator / denominator;
        }
        else
        {
            return 0;
        }
    }
}
Example:
highlight "Optic flow is defined as the change of structured light in the image, e.g. on the retina or the camera’s sensor, due to a relative motion between the eyeball or camera and the scene. Further definitions from the literature highlight different properties of optic flow" "optic flow"
Output:
[[HIGHLIGHT]] Optic flow [[ENDHIGHLIGHT]] is defined as the change of structured light in the image, e... Further definitions from the literature highlight different properties of [[HIGHLIGHT]] optic flow [[ENDHIGHLIGHT]]
Well, here's the hacked-together version I made using the algorithm I described above. I don't think it is all that great: it uses three (count 'em, three!) loops, an array, and two lists. But, well, it is better than nothing. I also hardcoded the maximum length instead of turning it into a parameter.
private static string FindRelevantSnippets(string infoText, string[] searchTerms)
{
    List<int> termLocations = new List<int>();
    foreach (string term in searchTerms)
    {
        if (string.IsNullOrEmpty(term)) continue; // an empty term matches everywhere
        int termStart = infoText.IndexOf(term);
        // >= 0 so that a term at the very start of the text is not skipped
        while (termStart >= 0)
        {
            termLocations.Add(termStart);
            termStart = infoText.IndexOf(term, termStart + 1);
        }
    }
    if (termLocations.Count == 0)
    {
        if (infoText.Length > 250)
            return infoText.Substring(0, 250);
        else
            return infoText;
    }
    termLocations.Sort();
    List<int> termDistances = new List<int>();
    for (int i = 0; i < termLocations.Count; i++)
    {
        if (i == 0)
        {
            termDistances.Add(0);
            continue;
        }
        termDistances.Add(termLocations[i] - termLocations[i - 1]);
    }
    int smallestSum = int.MaxValue;
    int smallestSumIndex = 0;
    for (int i = 0; i < termDistances.Count; i++)
    {
        int sum = termDistances.Skip(i).Take(5).Sum();
        if (sum < smallestSum)
        {
            smallestSum = sum;
            smallestSumIndex = i;
        }
    }
    int start = Math.Max(termLocations[smallestSumIndex] - 128, 0);
    int len = Math.Min(smallestSum, infoText.Length - start);
    len = Math.Min(len, 250);
    return infoText.Substring(start, len);
}
One improvement I can think of would be to return multiple shorter "snippets" that add up to the longer length, so that multiple parts of the document can be sampled.
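The "smallest sum of consecutive distances" step from the question can be simplified. The distances telescope, so the sum of the k - 1 gaps between hit i and hit i + k - 1 is just positions[i + k - 1] - positions[i].

A Java sketch built on that observation follows. The names are hypothetical, and it assumes the positions are already sorted. Unlike the `Skip(i).Take(5)` loop above, it only considers complete runs of k hits.

```java
// Hypothetical helper: index i of the tightest run of k consecutive term hits.
final class DensestRun {
    static int tightestRunStart(int[] sortedPositions, int k) {
        if (k < 1 || sortedPositions.length < k) {
            throw new IllegalArgumentException("need k >= 1 and at least k positions");
        }
        int bestStart = 0;
        int bestSpan = Integer.MAX_VALUE;
        for (int i = 0; i + k - 1 < sortedPositions.length; i++) {
            // The k - 1 gaps between hit i and hit i + k - 1 sum (telescope) to this span.
            int span = sortedPositions[i + k - 1] - sortedPositions[i];
            if (span < bestSpan) {
                bestSpan = span;
                bestStart = i;
            }
        }
        return bestStart;
    }
}
```

For hits at {5, 100, 110, 118, 400} and k = 3, it returns 1, because hits 100, 110 and 118 span only 18 characters.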
This is a nice problem :)
I think I'd create an index vector: for each word, create an entry that is 1 if it is a search term and 0 otherwise. Then find the i such that sum(indexvector[i:i+maxlength]) is maximized.
This can actually be done rather efficiently:
1. Start with the number of search terms in the first maxlength words.
2. As you move on, decrease your counter if indexvector[i]=1 (i.e. you're about to lose that search term as you increase i), and increase it if indexvector[i+maxlength]=1.
3. As you go, keep track of the i with the highest counter value.
Once you have your favourite i, you can still do some fine-tuning. For example, check whether you can reduce the actual size without compromising your counter (e.g. to find sentence boundaries), or pick the right i among several with equal counter values.
Not sure if this is a better approach than yours - it's a different one.
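The sliding counter from that answer can be sketched in Java. The class and method names are hypothetical. The search terms are assumed to be stored in lower case, and words are compared after lower-casing.

When the window moves from i - 1 to i, the word at i - 1 leaves and the word at i + maxLength - 1 enters, so each step costs O(1).

```java
import java.util.Set;

// Hypothetical sketch of the sliding-window counter: returns the start index i
// that maximises the number of search terms in words[i .. i + maxLength).
final class TermWindow {
    static int bestWindowStart(String[] words, Set<String> searchTerms, int maxLength) {
        if (maxLength < 1) {
            throw new IllegalArgumentException("maxLength must be positive");
        }
        int n = words.length;
        int window = Math.min(maxLength, n);
        // Index vector: 1 if the word is a search term, otherwise 0.
        int[] isTerm = new int[n];
        for (int i = 0; i < n; i++) {
            isTerm[i] = searchTerms.contains(words[i].toLowerCase()) ? 1 : 0;
        }
        int count = 0;
        for (int i = 0; i < window; i++) {
            count += isTerm[i]; // search terms in the first window
        }
        int best = count;
        int bestStart = 0;
        for (int i = 1; i + window <= n; i++) {
            count -= isTerm[i - 1];          // word leaving the window on the left
            count += isTerm[i + window - 1]; // word entering the window on the right
            if (count > best) {
                best = count;
                bestStart = i;
            }
        }
        return bestStart;
    }
}
```

Ties go to the earliest window. The fine-tuning mentioned above (sentence boundaries, choosing among equal counts) would sit on top of this.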
You might also want to check out this paper on the topic, which comes with yet-another baseline: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.72.4357&rep=rep1&type=pdf
I took another approach, perhaps it will help someone...
First it checks whether the word appears at all, in my case ignoring case (you can of course change this yourself).
Then I create a list of Regex matches, one per word (each match is a run of non-whitespace characters), and search for the first match that contains the word (allowing partial, case-insensitive matches).
From that index, I take about 10 matches on each side of the word, which make up the snippet.
public static string GetSnippet(string text, string word)
{
    if (text.IndexOf(word, StringComparison.InvariantCultureIgnoreCase) == -1)
    {
        return "";
    }
    // each match is one word plus an optional trailing whitespace character
    var matches = new Regex(@"\b(\S+)\s?", RegexOptions.Singleline | RegexOptions.Compiled).Matches(text);
    var p = -1;
    for (var i = 0; i < matches.Count; i++)
    {
        if (matches[i].Value.IndexOf(word, StringComparison.InvariantCultureIgnoreCase) != -1)
        {
            p = i;
            break;
        }
    }
    if (p == -1) return "";
    var snippet = "";
    // Math.Min keeps x inside the match collection when the word is near the end of the text
    for (var x = Math.Max(p - 10, 0); x < Math.Min(p + 10, matches.Count); x++)
    {
        snippet += matches[x].Value + " ";
    }
    return snippet;
}
If you use CONTAINSTABLE you will get a RANK back, which is in essence a density value: the higher the RANK value, the higher the density. This way, you just run a query to get the results you want and don't have to resort to massaging the data when it's returned.
Wrote a function to do this just now. You want to pass in:
Inputs:
Document text
This is the full text of the document you're taking a snippet from. Most likely you will want to strip out any BBCode/HTML from this document.
Original query
The string the user entered as their search
Snippet length
Length of the snippet you wish to display.
Return Value:
Start index of the document text to take the snippet from. To get the snippet, simply do documentText.Substring(returnValue, snippetLength). This has the advantage that you know whether the snippet is taken from the start, end or middle, so you can add some decoration like ... at the snippet start/end if you wish.
Performance
Setting resolution to 1 will find the best snippet, but moves the window along 1 character at a time. Set this value higher to speed up execution.
Tweaks
You can work out the score however you want. In this example I've used Math.Pow(wordLength, 2) to favour longer words.
private static int GetSnippetStartPoint(string documentText, string originalQuery, int snippetLength)
{
    // Normalise document text
    documentText = documentText.Trim();
    if (string.IsNullOrWhiteSpace(documentText)) return 0;
    // Return 0 if entire doc fits in snippet
    if (documentText.Length <= snippetLength) return 0;
    // Break query down into words
    var wordsInQuery = new HashSet<string>();
    {
        var queryWords = originalQuery.Split(' ');
        foreach (var word in queryWords)
        {
            var normalisedWord = word.Trim().ToLower();
            if (string.IsNullOrWhiteSpace(normalisedWord)) continue;
            if (wordsInQuery.Contains(normalisedWord)) continue;
            wordsInQuery.Add(normalisedWord);
        }
    }
    // Create moving window to get maximum trues
    var windowStart = 0;
    double maxScore = 0;
    var maxWindowStart = 0;
    // Higher number less accurate but faster
    const int resolution = 5;
    while (true)
    {
        var text = documentText.Substring(windowStart, snippetLength);
        // Get score of this chunk
        // This isn't perfect, as window moves in steps of resolution first and last words will be partial.
        // Could probably be improved to iterate words and not characters.
        var words = text.Split(' ').Select(c => c.Trim().ToLower());
        double score = 0;
        foreach (var word in words)
        {
            if (wordsInQuery.Contains(word))
            {
                // The longer the word, the more important.
                // Can simply replace with score += 1 for simpler model.
                score += Math.Pow(word.Length, 2);
            }
        }
        if (score > maxScore)
        {
            maxScore = score;
            maxWindowStart = windowStart;
        }
        // Setup next iteration
        windowStart += resolution;
        // Window end passed document end
        if (windowStart + snippetLength >= documentText.Length)
        {
            break;
        }
    }
    return maxWindowStart;
}
There is a lot more you can add to this. For example, instead of comparing exact words, you might want to try comparing SOUNDEX codes, weighting soundex matches less than exact matches.