Count occurrence of whole word in a string - c#

I want to find the number of occurrences of a particular word in a string.
I have searched online and found many answers like
Stack Overflow Answer
Answer from DotNetPerl
But none of them gave me an accurate result.
What I want is:
Input:
I have asked the question in StackOverflow. Therefore i can expect answer here.
Output for "The" keyword:
The keyword count: 2
Note: It should not consider "The" from "Therefore" in a sentence.
Basically I want to match whole word and get the count.

Try like this
var searchText=" the ";
var input="I have asked the question in StackOverflow. Therefore i can expect answer here.";
var arr=input.Split(new char[]{' ','.'});
var count=Array.FindAll(arr, s => s.Equals(searchText.Trim())).Length;
Console.WriteLine(count);
DOTNETFIDDLE
EDIT
For your Search Sentence
var sentence ="I have asked the question in StackOverflow. Therefore i can expect answer here.";
var searchText="have asked";
char [] split=new char[]{',',' ','.'};
var splitSentence=sentence.ToLower().Split(split);
var splitText=searchText.ToLower().Split(split);
Console.WriteLine("Search Sentence {0}",splitSentence.Length);
Console.WriteLine("Search Text {0}",splitText.Length);
var count=0;
for(var i=0;i<splitSentence.Length;i++){
if(splitSentence[i]==splitText[0]){
var index=i;
var found=true;
var j=0;
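for( j=0;j<splitText.Length;j++){
// Stop if the sentence runs out before the whole search text is matched.
if(index>=splitSentence.Length || splitSentence[index++]!=splitText[j])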
{
found=false;
break;
}
}
if(found){
Console.WriteLine("Index J {0} ",j);
count++;
i= index >i ? index-1 : i;
}
}
}
Console.WriteLine("Total found {0} substring",count);
DOTNETFIDDLE

A possible solution would be using Regex:
var count = Regex.Matches(input.ToLower(), String.Format(@"\b{0}\b", "the")).Count;
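The one-liner above can be wrapped in a small helper. This is only a sketch: the class and method names (WholeWordRegex, Count) are hypothetical, and it assumes the search term starts and ends with a word character, since \b only matches at the edge of a word character. Regex.Escape is added so that a term containing regex metacharacters is treated literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordRegex
{
    // Counts case-insensitive whole-word matches of `word` in `input`.
    public static int Count(string input, string word)
    {
        if (string.IsNullOrEmpty(input) || string.IsNullOrEmpty(word))
            return 0;

        // The verbatim string (@"...") keeps \b as a regex word boundary
        // instead of the C# backspace escape.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(input, pattern, RegexOptions.IgnoreCase).Count;
    }
}
```

Called as WholeWordRegex.Count(input, "the"), it skips the "The" inside "Therefore".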

try Like this (Way 1)
string SpecificWord = " the ";
string sentence = "I have asked the question in StackOverflow. Therefore i can expect answer here.";
int count = 0;
foreach (Match match in Regex.Matches(sentence, SpecificWord, RegexOptions.IgnoreCase))
{
count++;
}
Console.WriteLine("{0}" + " Found " + "{1}" + " Times", SpecificWord, count);
or Like this (Way 2)
string SpecificWord = " the ";
string sentence = "I have asked the question in StackOverflow. Therefore i can expect answer here.";
int WordPlace = sentence.IndexOf(SpecificWord);
Console.WriteLine(sentence);
int TimesRep;
for (TimesRep = 0; WordPlace > -1; TimesRep++)
{
sentence = (sentence.Substring(0, WordPlace) +sentence.Substring(WordPlace +SpecificWord.Length)).Replace("  ", " ");
WordPlace = sentence.IndexOf(SpecificWord);
}
Console.WriteLine("this word Found " + TimesRep + " time");

You can use a while loop: search for the index of the first occurrence, then continue searching from the position just after the found index, and increment a counter at the end of each iteration. The loop runs until the index is -1.
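The loop described above could look roughly like this. It is a sketch, not a definitive implementation: the names (IndexOfCounter, CountWholeWord) are hypothetical, and the whole-word check (no letter or digit directly before or after the hit) is an assumption added to satisfy the question's requirement.

```csharp
using System;

public static class IndexOfCounter
{
    // Counts case-insensitive occurrences of `word` that are not embedded
    // in a longer run of letters or digits.
    public static int CountWholeWord(string text, string word)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(word))
            return 0;

        int count = 0;
        int index = text.IndexOf(word, 0, StringComparison.OrdinalIgnoreCase);
        while (index != -1)
        {
            int end = index + word.Length;
            bool startOk = index == 0 || !char.IsLetterOrDigit(text[index - 1]);
            bool endOk = end == text.Length || !char.IsLetterOrDigit(text[end]);
            if (startOk && endOk)
                count++;

            // Resume the search one character past the current hit.
            index = text.IndexOf(word, index + 1, StringComparison.OrdinalIgnoreCase);
        }
        return count;
    }
}
```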

Well, the problem is not as simple as you may think; there are many issues that should be taken care of, such as punctuation, letter case, and how word boundaries are identified.
However, using the N-gram concept, I provide the following solution:
1- Identify how many words are in the key. Call it N.
2- Extract all sequences of N consecutive words (N-grams) from the text.
3- Count the occurrences of the key among the N-grams.
string text = "I have asked the question in StackOverflow. Therefore i can expect answer here.";
string key = "the question";
int gram = key.Split(' ').Count();
var parts = text.Split(' ');
List<string> n_grams = new List<string>();
for (int i = 0; i < parts.Count(); i++)
{
if (i <= parts.Count() - gram)
{
string sequence = "";
for (int j = 0; j < gram; j++)
{
sequence += parts[i + j] + " ";
}
if (sequence.Length > 0)
sequence = sequence.Remove(sequence.Length - 1, 1);
n_grams.Add(sequence);
}
}
// The result
int count = n_grams.Count(p => p == key);
For example, for the key "the question", and treating a single space as the word boundary, the following bi-grams are extracted:
I have
have asked
asked the
the question
question in
in StackOverflow.
StackOverflow. Therefore
Therefore i
i can
can expect
expect answer
answer here.
and the number of times "the question" appears in the text is now obvious: 1
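The punctuation and letter-case issues mentioned at the start of this answer can be folded into the same N-gram idea. The following is a sketch under assumptions: the separator list is an arbitrary choice, and the names (NGramCounter, CountPhrase, Tokenize) are hypothetical.

```csharp
using System;
using System.Linq;

public static class NGramCounter
{
    // Assumed set of word separators; extend it as needed.
    private static readonly char[] Separators = { ' ', '.', ',', ';', ':', '!', '?' };

    // Counts how often the words of `key` appear consecutively in `text`.
    public static int CountPhrase(string text, string key)
    {
        string[] words = Tokenize(text);
        string[] keyWords = Tokenize(key);
        if (keyWords.Length == 0 || words.Length < keyWords.Length)
            return 0;

        int count = 0;
        for (int i = 0; i <= words.Length - keyWords.Length; i++)
        {
            // Compare the N-gram starting at i with the key, word by word.
            if (words.Skip(i).Take(keyWords.Length).SequenceEqual(keyWords))
                count++;
        }
        return count;
    }

    // Lower-cases the input and splits it, dropping empty entries.
    private static string[] Tokenize(string s)
    {
        return (s ?? string.Empty)
            .ToLowerInvariant()
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Comparing the N-grams in place avoids building the intermediate list of joined strings.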

This solution should work wherever the string is:
var str = "I have asked the question in StackOverflow. Therefore i can expect answer here.";
var numMatches = Regex.Matches(str.ToUpper(), "THE")
.Cast<Match>()
.Count(match =>
(match.Index == 0 || str[match.Index - 1] == ' ') &&
(match.Index + match.Length == str.Length ||
!Regex.IsMatch(
str[match.Index + match.Length].ToString(),
"[a-zA-Z]")));
.NET Fiddle

string input = "I have asked the question in StackOverflow. Therefore i can expect answer here.";
string pattern = @"\bthe\b";
var matches = Regex.Matches(input, pattern, RegexOptions.IgnoreCase);
Console.WriteLine(matches.Count);
See Regex Anchors - "\b".

Try like this
string Text = "I have asked the question in StackOverflow. Therefore i can expect answer here.";
Text = Text.ToLower();
Dictionary<string, int> frequencies = null;
frequencies = new Dictionary<string, int>();
string[] words = Regex.Split(Text, "\\W+");
foreach (string word in words)
{
if (frequencies.ContainsKey(word))
{
frequencies[word] += 1;
}
else
{
frequencies[word] = 1;
}
}
foreach (KeyValuePair<string, int> entry in frequencies)
{
string word = entry.Key;
int frequency = entry.Value;
Response.Write(word.ToString() + "," + frequency.ToString()+"<br/>");
}
And to search for a specific word, try this:
string Text = "I have asked the question in StackOverflow. Therefore the i can expect answer here.";
Text = Text.ToLower();
string searchtext = "the";
searchtext = searchtext.ToLower();
int count = 0;
string[] words = Regex.Split(Text, "\\W+");
foreach (string word in words)
{
if (searchtext.Equals(word))
{
count = count + 1;
}
else
{
}
}
Response.Write(count);

There are many ways to count occurrences of a whole word in a string.
E.g.
First:
string name = "pappu kumar sdffnsd sdfnsdkfbsdf sdfjnsd fsdjkn fsdfsd sdfsd pappu kumar";
var res= name.Contains("pappu kumar");
var splitval = name.Split(new[] { "pappu kumar" }, StringSplitOptions.None).Length-1;
Second:
var r = Regex.Matches(name, "pappu kumar").Count;

What about this (it seems more efficient than the other solutions):
public static int CountOccurences(string haystack, string needle)
{
return (haystack.Length - haystack.Replace(needle, string.Empty).Length) / needle.Length;
}

Try this; it works for structured data as well.
var splitStr = inputStr.Split(' ');
int result_count = splitStr.Count(str => str.Contains("userName"));

Related

C#: Need to split a string into a string[] and keeping the delimiter (also a string) at the beginning of the string

I think I am too dumb to solve this problem...
I have some formulas which need to be "translated" from one syntax to another.
Let's say I have a formula that goes like that (it's a simple one, others have many "Ceilings" in it):
string formulaString = "If([Param1] = 0, 1, Ceiling([Param2] / 0.55) * [Param3])";
I need to replace "Ceiling()" with "Ceiling(; 1)" (basically, insert "; 1" before the ")").
My attempt is to split the formulaString at "Ceiling(" so I am able to iterate through the string array and insert my string at the correct index (counting every "(" and ")" to get the right index).
What I have so far:
//splits correct, but loses "CEILING("
string[] parts = formulaString.Split(new[] { "CEILING(" }, StringSplitOptions.None);
//splits almost correct, "CEILING(" is in another group
string[] parts = Regex.Split(formulaString, @"(CEILING\()");
//splits almost every letter
string[] parts = Regex.Split(formulaString, @"(?=[(CEILING\()])");
When everything is done, I concat the string so I have my complete formula again.
What do I have to set as Regex pattern to achieve this sample? (Or any other method that will help me)
part1 = "If([Param1] = 0, 1, ";
part2 = "Ceiling([Param2] / 0.55) * [Param3])";
//part3 = next "CEILING(" in a longer formula and so on...
As I mention in a comment, you almost got it: (?=Ceiling). This is incomplete for your use case unfortunately.
I need to replace "Ceiling()" with "Ceiling(; 1)" (basically, insert "; 1" before the ")").
Depending on your regex engine (for example JS) this works:
string[] parts = Regex.Split(formulaString, @"(?<=Ceiling\([^)]*(?=\)))");
string modifiedFormula = String.Join("; 1", parts);
The regex
(?<=Ceiling\([^)]*(?=\)))
(?<= ) Positive lookbehind
Ceiling\( Search for literal "Ceiling("
[^)] Match any char which is not ")" ..
* .. 0 or more times
(?=\)) Positive lookahead for ")", effectively making us stop before the ")"
This regex is a zero-width assertion, therefore nothing is lost and it will cut your strings before the last ")" in every "Ceiling()".
This solution would break whenever you have nested "Ceiling()". Then your only solution would be writing your own parser for the same reasons why you can't parse markup with regex.
Regex.Replace(formulaString, @"(?<=Ceiling\()(.*?)(?=\))","$1; 1");
Note: This will not work for nested "Ceilings", but it does for Ceiling(). It will also not work for Ceiling(AnotherFunc(x)). For that you need something like:
Regex.Replace(formulaString, @"(?<=Ceiling\()((.*\((?>[^()]+|(?1))*\))*|[^\)]*)(\))","$1; 1$3");
but I could not get that to work with .NET, only in JavaScript.
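.NET has no (?1) recursion, but it does have balancing groups, which can match balanced parentheses. The following is a sketch, not a tested drop-in: the names (CeilingRewriter, AddSecondArgument) are hypothetical. It should handle Ceiling(AnotherFunc(x)), but a Ceiling nested inside another Ceiling is consumed by the outer match, so the inner call would not be rewritten in the same pass.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CeilingRewriter
{
    // "open" is pushed on every "(" and popped on every ")" inside the body,
    // so the body can only end where the parentheses are balanced.
    private static readonly Regex CeilingCall = new Regex(
        @"Ceiling\((?<body>(?:[^()]|(?<open>\()|(?<-open>\)))*(?(open)(?!)))\)");

    // Inserts "; 1" before the closing parenthesis of each matched Ceiling call.
    public static string AddSecondArgument(string formula)
    {
        return CeilingCall.Replace(formula, "Ceiling(${body}; 1)");
    }
}
```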
This is my solution:
private string ConvertCeiling(string formula)
{
int ceilingsCount = formula.CountOccurences("Ceiling(");
int startIndex = 0;
int bracketCounter;
for (int i = 0; i < ceilingsCount; i++)
{
startIndex = formula.IndexOf("Ceiling(", startIndex);
bracketCounter = 0;
for (int j = 0; j < formula.Length; j++)
{
if (j < startIndex) continue;
var c = formula[j];
if (c == '(')
{
bracketCounter++;
}
if (c == ')')
{
bracketCounter--;
if (bracketCounter == 0)
{
// found end
formula = formula.Insert(j, "; 1");
startIndex++;
break;
}
}
}
}
return formula;
}
And CountOccurence:
public static int CountOccurences(this string value, string parameter)
{
int counter = 0;
int startIndex = 0;
int indexOfCeiling;
do
{
indexOfCeiling = value.IndexOf(parameter, startIndex);
if (indexOfCeiling < 0)
{
break;
}
else
{
startIndex = indexOfCeiling + 1;
counter++;
}
} while (true);
return counter;
}

Counting the number of times words in an array that are within a sentence

I am writing code that will count the number of times each word in an array appears as well as the total of number of all the words appearing. I have managed to create an array which allows the user to add in a number of words they wish to check. However, I am struggling to find a way to count the number of times each word is within the sentence individually (I can only get it working for the first element of the array).
I have tried a for loop that when completed it will move to the next element in the array but it does not start for loop again for the next element but ends the code block.
int occurences = 0;
string[] words = new string[_wordCount];
for (int i = 0; i < words.Length; i++)
{
Console.WriteLine("Type in the censored words you wish to be counted: ");
words[i] = Console.ReadLine();
if (_sentence.Contains(words[i]))
{
occurences++;
}
if (i > words.Length)
{
i++;
}
}
Console.WriteLine("Number of censored word occurences: " + occurences);
return occurences;
You simply need to calculate each word occurrence separately like this:
int i, j, woccurences, occurences = 0;
string[] words = new string[_wordCount];
string[] details = _sentence.Split(' ');
for (i = 0; i < words.Length; i++)
{
Console.WriteLine("Type in the censored words you wish to be counted: ");
words[i] = Console.ReadLine();
// Count this word's occurrences separately, starting from zero.
woccurences = 0;
for (j = 0; j < details.Length; j++)
if (details[j] == words[i])
woccurences++;
Console.WriteLine("Number of censored word occurences: " + woccurences);
occurences += woccurences;
}
Console.WriteLine("Number of total censored words occurences: " + occurences);
return occurences;
Details
In each cycle of the loop, you count the number of occurrences of one word, starting from zero, using woccurences and print it out; then you add this value to the total number of occurrences.
If the sentence consists of words separated by spaces, you can use string.Split to turn it into an array that you can then iterate through.
var sentenceArray = _sentence.Split(new Char [] {' '});
And then loop through sentence array within your main words loop.
I don't see where _sentence or _wordCount are being set, but let me know if this is what you have in mind.
Caveat: I'm assuming here the user will input each word with a single space between words when prompted. You'll probably want to handle the case where user uses too much space, commas, etc.
static int GetWords()
{
int occurences = 0;
int _wordCount = 0;
string _sentence = "The quick brown fox jumped over the lazy dog.";
string[] words = new string[_sentence.Length];
Console.WriteLine("Type in the censored words you wish to be counted: ");
string censoredWordString = Console.ReadLine();
string[] censoredWords = censoredWordString.Split(' ');
for (int i = 0; i < censoredWords.Length; i++)
{
if (_sentence.Contains(censoredWords[i]))
{
occurences++;
}
}
Console.WriteLine("Number of censored word occurences: " + occurences);
return occurences;
}
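The caveat about extra spaces and commas can be handled by splitting on several separators and dropping empty entries. This is a sketch under assumptions: the separator list is illustrative and the names (CensoredWordCounter, CountEach) are hypothetical. Unlike Contains, it compares whole words and counts every occurrence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CensoredWordCounter
{
    // Assumed separators for both the sentence and the user's input.
    private static readonly char[] Separators = { ' ', ',', '.', ';', '!', '?', '\t' };

    // Returns how many times each censored word appears as a whole word.
    public static Dictionary<string, int> CountEach(string sentence, string censoredInput)
    {
        string[] sentenceWords =
            sentence.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        foreach (string censored in
                 censoredInput.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
        {
            counts[censored] = sentenceWords.Count(
                w => string.Equals(w, censored, StringComparison.OrdinalIgnoreCase));
        }
        return counts;
    }
}
```

Summing the dictionary's values gives the total number of censored-word occurrences.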
You can try this:
int occurences = 0;
string sentence = "This is a test sentence. This sentence is test. This sentence do nothing.";
var sentenceWords = new string(sentence.Where(c => !char.IsPunctuation(c)).ToArray()).Split(' ');
var wordsFound = new Dictionary<string, int>();
Console.WriteLine("Sentence = " + sentence);
while ( true)
{
Console.WriteLine(Environment.NewLine);
Console.WriteLine("Type in a censored word you wish to be counted (enter empty to end): ");
string input = Console.ReadLine();
if ( input == "" ) break;
int count = sentenceWords.Count(word => word.ToLower() == input.ToLower());
if ( count == 0 )
{
Console.WriteLine("Can't find \"" + input + "\".");
}
else
{
Console.WriteLine("Found " + count + " occurences of \"" + input + "\".");
if ( !wordsFound.ContainsKey(input) )
wordsFound.Add(input, count);
occurences += count;
}
}
Console.WriteLine(Environment.NewLine);
Console.WriteLine("Number of total censored words occurences: " + occurences);
foreach ( var item in wordsFound)
Console.WriteLine(" " + item.Key + ": " + item.Value);
Console.ReadKey();
You can achieve that only using LINQ
int totalWords = 0;
var sentence = "Don't cry because it's over, smile because it happened";
sentence.ToLower().Split(' ').GroupBy(x => x).ToList().ForEach(x=> {
totalWords += x.Count();
Console.WriteLine($"{x.Key}: {x.Count()}");
});
Console.WriteLine($"Total words: {totalWords}");
First, we use ToLower() so we can ignore differences between lower and upper case.
Then we split on spaces with .Split(' ').
Now we group by each word with GroupBy(x => x).
We need to call ToList() on the resulting groups (IGrouping<TKey, TElement>) so we can iterate over them with ForEach.
Finally, we print the results, using Key, which refers to the grouped word, and Count() to get the number of items in the group.
Personally, I'd use a dictionary. Specifically with using a <key, value> of string and int.
int occurances = 0;
string[] words = new string[_wordCount];
var results = new Dictionary<string, int>();
var splitSentence = _sentence.Split(' ').ToArray();
for(int i = 0; i < words.Length; i++)
{
Console.WriteLine("Type in the censored words you wish to be counted: ");
words[i] = Console.ReadLine();
if(_sentence.Contains(words[i]))
{
if(!results.ContainsKey(words[i]))
{
results.Add(words[i], 0);
}
for(var j = 0; j < splitSentence.Length; j++)
{
if(splitSentence[j] == words[i])
{
results[words[i]]++;
occurances++;
}
}
}
}
For a dictionary, the first 'parameter' (the key) must be unique, so whenever a word reappears you only need to check whether that key already exists; if it does, you just add to the count (the value in the key/value pair).
Using LINQ and Regular Expressions:
var sentence = "now is the time for all good men to come, to the aid of their country.";
var words = new[] { "time", "to" };
var wordsHS = words.ToHashSet();
var wordRE = new Regex(@"\w+", RegexOptions.Compiled);
var wordCounts = wordRE.Matches(sentence).Cast<Match>().Select(m => m.Value)
.Where(w => wordsHS.Contains(w))
.GroupBy(w => w)
.Select(wg => new { Word = wg.Key, Count = wg.Count() })
.ToList();
var total = wordCounts.Sum(wc => wc.Count);

C# More intuitive way to split a string into tokens?

I have a method which takes in a string, which contains various characters, but I'm only concerned about underscores '_' and dollar signs '$'. I want to split up the string into tokens by underscores as each piece b/w the underscores contains important information.
However, if a $ is contained in an area between underscores, then a token should be created from the last occurrence of an underscore to the end (ignoring any underscores in this last section).
Example
input: Hello_To_The$Great_World
expected tokens: Hello, To, The$Great_World
Question
I have a solution below, but I'm wondering is there a cleaner/more intuitive way of doing this than what I have below?
var aTokens = new List<string>();
var aPos = 0;
for (var aNum = 0; aNum < item.Length; aNum++)
{
if (aNum == item.Length - 1)
{
aTokens.Add(item.Substring(aPos, item.Length - aPos));
break;
}
if (item[aNum] == '$')
{
aTokens.Add(item.Substring(aPos, item.Length - aPos));
break;
}
if (item[aNum] == '_')
{
aTokens.Add(item.Substring(aPos, aNum - aPos));
aPos = aNum + 1;
}
}
You can split string by _ not having $ before them.
For that you can use the following regex:
(?<!\$.*)_
Sample code:
string input = "Hello_To_The$Great_World";
string[] output = Regex.Split(input, @"(?<!\$.*)_");
You also can do the task without regex and without loops, but with the help of 2 splits:
string input = "Hello_To_The$Great_World";
string[] temp = input.Split(new[] { '$' }, 2);
string[] output = temp[0].Split('_');
if (temp.Length > 1)
output[output.Length - 1] = output[output.Length - 1] + "$" + temp[1];
This method is not efficient or clean, but it gives you a general idea of how to do this:
Split your string into tokens
Find the index of the first string to contain $
Return a new array with the first n tokens and the final token is the remaining strings concatenated.
It's probably more useful to take advantage of IEnumerable or do things over a for loop instead of all this Array.Copy stuff... but you get the gist of it.
private string[] SomeMethod(string arg)
{
var strings = arg.Split(new[] { '_' });
var indexedValue = strings.Select((v, i) => new { Value = v, Index = i }).FirstOrDefault(x => x.Value.Contains("$"));
if (indexedValue != null)
{
var count = indexedValue.Index + 1;
string[] final = new string[count];
Array.Copy(strings, 0, final, 0, indexedValue.Index);
final[indexedValue.Index] = String.Join("_", strings, indexedValue.Index, strings.Length - indexedValue.Index);
return final;
}
return strings;
}
Here's my version (loops are so last year...)
const char dollar = '$';
const char underscore = '_';
var item = "Hello_To_The$Great_World";
var aTokens = new List<string>();
int dollarIndex = item.IndexOf(dollar);
if (dollarIndex >= 0)
{
int lastUnderscoreIndex = item.LastIndexOf(underscore, dollarIndex);
if (lastUnderscoreIndex >= 0)
{
aTokens.AddRange(item.Substring(0, lastUnderscoreIndex).Split(underscore));
aTokens.Add(item.Substring(lastUnderscoreIndex + 1));
}
else
{
aTokens.Add(item);
}
}
else
{
aTokens.AddRange(item.Split(underscore));
}
Edit:
I should have added, cleaner/more intuitive is very subjective, as you have found out by the variety of answers provided. From a maintainability point of view, it's much more important that the method you write to do the parsing is unit tested!
It's also an interesting exercise to test the performance of the various methods posted here - it quickly becomes apparent that your original version is much faster than using regular expressions! (Although in a real life situation, it's probably quite unlikely that the performance of this method will make any difference to your application!)
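As a rough illustration of the unit-testing point, here is a framework-free check. It is only a sketch: Tokenize is a hypothetical helper wrapping the two-split approach from an earlier answer, and the extra test inputs are assumptions about the desired behaviour rather than requirements stated in the question.

```csharp
using System;
using System.Linq;

public static class TokenizerChecks
{
    // Hypothetical helper: split at the first '$', then split the head on '_'.
    public static string[] Tokenize(string input)
    {
        string[] parts = input.Split(new[] { '$' }, 2);
        string[] tokens = parts[0].Split('_');
        if (parts.Length > 1)
            tokens[tokens.Length - 1] += "$" + parts[1];
        return tokens;
    }

    // Throws if any case produces unexpected tokens.
    public static void Run()
    {
        Check("Hello_To_The$Great_World", "Hello", "To", "The$Great_World");
        Check("No_Dollar_Here", "No", "Dollar", "Here");
        Check("$Leading_Dollar", "$Leading_Dollar");
    }

    private static void Check(string input, params string[] expected)
    {
        string[] actual = Tokenize(input);
        if (!actual.SequenceEqual(expected))
            throw new InvalidOperationException(
                "Tokenize(\"" + input + "\") returned: " + string.Join(", ", actual));
    }
}
```

The same cases can be moved into a test framework, and Tokenize swapped for any of the other implementations in this thread.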

Iterating backwards through an char array after finding a known word

I've got a project I'm working on in C#. I've got two char arrays: one is a sentence and one is a word. I have to iterate through the sentence array until I find a word that matches the word that was turned into a char array. What I'm wondering is: once I find the word, how do I iterate backwards through the sentence array, from the point where I found the word, by the same length as the word array?
Code :
String wordString = "(Four)";
String sentenceString = "Maybe the fowl of Uruguay repeaters (Four) will be found";
char[] wordArray = wordString.ToCharArray();
List<String> words = sentenceString.Split(' ').ToList<string>();
//This would be the part where I iterate through sentence
foreach (string sentence in words)
{
//Here I would need to find where the string of (Four) and trim it and see if it equals the wordString.
if (sentence.Contains(wordString))
{
//At this point I would need to go back the length of wordString which happens to be four places but I'm not sure how to do this. And for each word I go back in the sentence I need to capture that in another string array.
}
}
I don't know if I'm being clear enough on this, but if I'm not, please feel free to ask. Thank you in advance. Also, what this should return is "fowl of Uruguay repeaters". So basically the use case is: for the number of letters in the parentheses, the logic should return the same number of words before the word in parentheses.
How are you? I have a few questions about this exercise.
If the word (Four) is at the beginning, should it return nothing, or the whole string?
Since the length of "four" is 4, imagine the word appears as the second word of the sentence: should it return just the first word, or 4 words including the (Four) word itself?
My solution is the laziest one; I just saw your question and decided to help.
My solution returns all the words before (Four) if the length is greater than the number of words before (Four).
My solution returns an empty string if the word (Four) is at the beginning.
My solution returns as many words as the length of (Four) (4) before the (Four) word.
ONCE AGAIN, IT IS NOT MY BEST APPROACH.
See the code below:
string wordString = "(Four)";
string sentenceString = "Maybe the fowl of Uruguay repeaters (Four) will be found";
//Additionally you can add a split option to remove the empty words in the Split call below
//Just in case there are more space in sentence.
string[] splitedword = sentenceString.Split(' ');
int tempBackposition = 0;
int finalposition = 0;
for (int i = 0; i < splitedword.Length; i++)
{
if (splitedword[i].Contains(wordString))
{
finalposition = i;
break;
}
}
tempBackposition = finalposition - wordString.Replace("(","").Replace(")","").Length;
string output = "";
tempBackposition= tempBackposition<0?0:tempBackposition;
for (int i = tempBackposition; i < finalposition; i++)
{
output += splitedword[i] + " ";
}
Console.WriteLine(output);
Console.ReadLine();
If it's not what you want, can you answer my questions at the top, or help me understand where it's wrong?
int i ;
string outputString = (i=sentenceString.IndexOf(wordString))<0 ?
sentenceString : sentenceString.Substring(0,i) ;
var wordString = "(Four)";
int wordStringInt = 4; // Just do switch case to convert your string to int
var sentenceString = "Maybe the fowl of Uruguay repeaters (Four) will be found";
var sentenceStringArray = sentenceString.Split(' ').ToList();
int wordStringIndexInArray = sentenceStringArray.IndexOf(wordString);
var stringOutPut = "";
// Only proceed when enough words precede the matched word.
if (wordStringIndexInArray >= wordStringInt)
{
// Walk backwards from the word just before "(Four)", prepending each word.
for (int k = 1; k <= wordStringInt; k++)
{
stringOutPut = sentenceStringArray[wordStringIndexInArray - k] + " " + stringOutPut;
}
}
What you are matching is kind of complex, so for a more general solution you could use regular expressions.
First we declare what we are searching for:
string word = "(Four)";
string sentence = "Maybe the fowl of Uruguay repeaters (Four) will be found";
We will then search for the words in this string using regular expressions. Since we don't want to match whitespace, and we need to know where each match actually starts and we need to know the word inside the parenthesis we tell it that we optionally want opening and ending parenthesis, but we also want the contents of those as a match:
var words = Regex.Matches(sentence, @"[\p{Ps}]*(?<Content>[\w]+)[\p{Pe}]*").Cast<Match>().ToList();
[\p{Ps}] means we want opening punctuation (such as (, [ and {), while the * indicates zero or more.
Followed is a sub-capture called Content (specified by ?<Content>) with one or more word characters.
At the end we specify that we want zero or more ending punctuation.
We then need to find the word in the list of matches:
var item = words.Single(x => x.Value == word);
Then we need to find this item's index:
int index = words.IndexOf(item);
At this point we just need to know the length of the contents:
var length = item.Groups["Content"].Length;
This length we use to go back in the string 4 words
var start = words.Skip(index - length).First();
And now we have everything we need:
var result = sentence.Substring(start.Index, item.Index - start.Index);
Result should contain fowl of Uruguay repeaters.
edit: It may be a lot simpler to just figure out the count from the word rather than from the content. In that case the complete code should be the following:
string word = "(Four)";
string sentence = "Maybe the fowl of Uruguay repeaters (Four) will be found";
var wordMatch = Regex.Match(word, @"[\p{Ps}]*(?<Content>[\w]+)[\p{Pe}]*");
var length = wordMatch.Groups["Content"].Length;
var words = Regex.Matches(sentence, @"\S+").Cast<Match>().ToList();
var item = words.Single(x => x.Value == word);
int index = words.IndexOf(item);
var start = words.Skip(index - length).First();
var result = sentence.Substring(start.Index, item.Index - start.Index);
\S+ in this case means "match one or more non-whitespace character".
You should try something like the following, which uses Array.Copy after it finds the number word. You will still have to implement the ConvertToNum function correctly (it is hardcoded in for now), but this should be a quick and easy solution.
string[] GetWords()
{
string sentenceString = "Maybe the fowl of Uruguay repeaters (Four) will be found";
string[] words = sentenceString.Split();
int num = 0;
int i; // scope of i should remain outside the for loop
for (i = 0; i < words.Length; i++)
{
string word = words[i];
if (word.StartsWith("(") && word.EndsWith(")"))
{
// Strip both the leading "(" and the trailing ")".
num = ConvertToNum(word.Substring(1, word.Length - 2));
// converted the number word we found, so we break
break;
}
}
if (num == 0)
{
// no number word was found in the string - return empty array
return new string[0];
}
// do some extra checking if number word exceeds number of previous words
int startIndex = i - num;
// if it does - just start from index 0
startIndex = startIndex < 0 ? 0 : startIndex;
int length = i - startIndex;
string[] output = new string[length];
Array.Copy(words, startIndex, output, 0, length);
return output;
}
// Convert the number word to an integer
int ConvertToNum(string numberStr)
{
return 4; // you should implement this method correctly
}
See - Convert words (string) to Int, for help implementing the ConvertToNum solution. Obviously it could be simplified depending on the range of numbers you expect to deal with.
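For a small, fixed range of numbers, ConvertToNum could be backed by a lookup table. This is a sketch under assumptions: the range one to ten is arbitrary, the class name NumberWords is hypothetical, and returning 0 for unknown words is chosen to match the "no number word found" check in GetWords above.

```csharp
using System;
using System.Collections.Generic;

public static class NumberWords
{
    // Case-insensitive lookup for the number words this sketch supports.
    private static readonly Dictionary<string, int> Values =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
        {
            { "one", 1 }, { "two", 2 }, { "three", 3 }, { "four", 4 }, { "five", 5 },
            { "six", 6 }, { "seven", 7 }, { "eight", 8 }, { "nine", 9 }, { "ten", 10 }
        };

    // Returns the numeric value of a number word, or 0 if it is not recognised.
    // Surrounding parentheses and spaces are tolerated.
    public static int ConvertToNum(string numberStr)
    {
        string key = (numberStr ?? string.Empty).Trim('(', ')', ' ');
        int value;
        return Values.TryGetValue(key, out value) ? value : 0;
    }
}
```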
Here is my solution. I'm not using regex at all, for the sake of easy understanding:
static void Main() {
var wordString = "(Four)";
int wordStringLength = wordString.Replace("(","").Replace(")","").Length;
//4, because i'm assuming '(' and ')' doesn't count.
var sentenceString = "Maybe the fowl of Uruguay repeaters (Four) will be found";
//Transform into a list of words, ToList() to future use of IndexOf Method
var sentenceStringWords = sentenceString.Split(' ').ToList();
//Find the index of the word in the list of words
int wordIndex = sentenceStringWords.IndexOf(wordString);
//Get a subrange from the original list of words, going back as many words as the length of the word (in this case 4),
var wordsToConcat = sentenceStringWords.GetRange(wordIndex-wordStringLength, wordStringLength);
//Finally concat the output;
var outPut = string.Join(" ", wordsToConcat);
//Output: fowl of Uruguay repeaters
}
I have a solution for you:
string wordToMatch = "(Four)";
string sentence = "Maybe the fowl of Uruguay repeaters (Four) will be found";
if (sentence.Contains(wordToMatch))
{
int length = wordToMatch.Trim(new[] { '(', ')' }).Length;
int indexOfMatchedWord = sentence.IndexOf(wordToMatch);
string subString1 = sentence.Substring(0, indexOfMatchedWord);
string[] words = subString1.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
var reversed = words.Reverse().Take(length);
string result = string.Join(" ", reversed.Reverse());
Console.WriteLine(result);
Console.ReadLine();
}
This can likely be improved regarding performance, but I have a feeling you don't care about that. Make sure you are using 'System.Linq'.
I assumed empty returns when input is incomplete, feel free to correct me on that. Wasn't 100% clear in your post how this should be handled.
private string getPartialSentence(string sentence, string word)
{
if (string.IsNullOrEmpty(sentence) || string.IsNullOrEmpty(word))
return string.Empty;
int locationInSentence = sentence.IndexOf(word, StringComparison.Ordinal);
if (locationInSentence == -1)
return string.Empty;
string partialSentence = sentence.Substring(0, locationInSentence);
string[] words = partialSentence.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries);
int nbWordsRequired = word.Replace("(", "").Replace(")", "").Length;
if (words.Count() >= nbWordsRequired)
return String.Join(" ", words.Skip(words.Count() - nbWordsRequired));
return String.Join(" ", words);
}
I went with using an enumeration and associated dictionary to pair the "(Four)" type strings with their integer values. You could just as easily (and maybe easier) go with a switch statement using
case "(Four)": currentNumber = 4; break;
I feel like the enum allows a little more flexibility though.
public enum NumberVerb
{
one = 1,
two = 2,
three = 3,
four = 4,
five = 5,
six = 6,
seven = 7,
eight = 8,
nine = 9,
ten = 10,
};
public static Dictionary<string, NumberVerb> m_Dictionary
{
get
{
Dictionary<string, NumberVerb> temp = new Dictionary<string, NumberVerb>();
temp.Add("(one)", NumberVerb.one);
temp.Add("(two)", NumberVerb.two);
temp.Add("(three)", NumberVerb.three);
temp.Add("(four)", NumberVerb.four);
temp.Add("(five)", NumberVerb.five);
temp.Add("(six)", NumberVerb.six);
temp.Add("(seven)", NumberVerb.seven);
temp.Add("(eight)", NumberVerb.eight);
temp.Add("(nine)", NumberVerb.nine);
temp.Add("(ten)", NumberVerb.ten);
return temp;
}
}
static void Main(string[] args)
{
string resultPhrase = "";
// Get the sentance that will be searched.
Console.WriteLine("Please enter the starting sentance:");
Console.WriteLine("(don't forget your keyword: ie '(Four)')");
string sentance = Console.ReadLine();
// Get the search word.
Console.WriteLine("Please enter the search keyword:");
string keyword = Console.ReadLine();
// Set the associated number of words to backwards-iterate.
int currentNumber = -1;
try
{
currentNumber = (int)m_Dictionary[keyword.ToLower()];
}
catch(KeyNotFoundException ex)
{
Console.WriteLine("The provided keyword was not found in the dictionary.");
}
// Search the sentance string for the keyword, and get the starting index.
Console.WriteLine("Searching for phrase...");
string[] words = sentance.Split(' ');
int searchResultIndex = -1;
for (int i = 0; (searchResultIndex == -1 && i < words.Length); i++)
{
if (words[i].Equals(keyword))
{
searchResultIndex = i;
}
}
// Handle the search results.
if (searchResultIndex == -1)
{
resultPhrase = "The keyword was not found.";
}
else if (searchResultIndex < currentNumber)
{
// Check the array boundaries with the given indexes.
resultPhrase = "Error: Out of bounds!";
}
else
{
// Get the preceding words.
for (int i = 0; i < currentNumber; i++)
{
resultPhrase = string.Format(" {0}{1}", words[searchResultIndex - 1 - i], resultPhrase);
}
}
// Display the preceding words.
Console.WriteLine(resultPhrase.Trim());
// Exit.
Console.ReadLine();
}

String Between Function?

Is there a way to get the string between, let's say, quotes (")?
The problem with using IndexOf and Substring is that it gets the first " and the last ", but not each pair. Like
"Hello" "WHY ARE" "WWWWWEEEEEE"
It will get
Hello" "WHY ARE" "WWWWWEEEEEE
I want it to get to array > Hello, WHY ARE, WWWWWEEEEEE
Is there any way to do this?
Something like this?
StringCollection resultList = new StringCollection();
try
{
Regex regexObj = new Regex("\"([^\"]+)\"");
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success)
{
resultList.Add(matchResult.Groups[1].Value);
matchResult = matchResult.NextMatch();
}
}
catch (ArgumentException ex)
{
// Syntax error in the regular expression
}
If subjectString was "Hello" "WHY ARE" "WWWWWEEEEEE", that should give you a list containing:
Hello
WHY ARE
WWWWWEEEEEE
A more compact example which uses the static Regex class instead, and just writes the matches to console instead of adding to a collection:
var subject = "\"Hello\" \"WHY ARE\" \"WWWWWEEEEEE\"";
var match = Regex.Match(subject, "\"([^\"]+)\"");
while (match.Success)
{
Console.WriteLine(match.Groups[1].Value);
match = match.NextMatch();
}
string s = "\"Hello\" \"WHY ARE\" \"WWWWWEEEEEE\"";
string[] words = s.Split('"');
// words is now ["", "Hello", " ", "WHY ARE", " ", "WWWWWEEEEEE", ""]
If you don't want the empty and whitespace-only strings, you can split by "\" \"" instead, in which case you will get ["\"Hello", "WHY ARE", "WWWWWEEEEEE\""].
On the other hand, using regular expressions could be the best solution for what you want. I'm not a C# expert, so I can't give the code from the top of my head, but this is the regex you'll want to use: "(.*?)"
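The regex suggested above can be applied in C# roughly as follows. This is a sketch: the class and method names (QuotedStrings, Extract) are hypothetical, and it assumes the quoted values contain no escaped quotes or line breaks.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedStrings
{
    // Returns the text between each pair of double quotes, in order.
    public static string[] Extract(string input)
    {
        return Regex.Matches(input, "\"(.*?)\"")
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value) // group 1 excludes the quotes
                    .ToArray();
    }
}
```

The lazy quantifier (.*?) makes each match stop at the nearest closing quote instead of running to the last quote in the string.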
You can also use the String.IndexOf(char value, int startIndex) method which has, as its parameter says, a start index from which the scan is started.
int start = 0;
do
{
int i1 = s.IndexOf('=', start);
if (i1 < 0) break;
int i2 = s.IndexOf('=', i1 + 1);
if (i2 < 0) break;
// Skip the opening delimiter so only the text between the pair is returned.
yield return s.Substring(i1 + 1, i2 - i1 - 1);
start = i2 + 1;
}
while (start < s.Length);
string s = "\"Hello\" \"WHY ARE\" \"WWWWWEEEEEE\"";
s = s.Replace("\" \"", "!*!"); // custom separator
s = s.Replace("\"", string.Empty);
string[] words = s.Split(new[] { "!*!" }, StringSplitOptions.None);
Should do the trick,
Kindness,
Dan
