how to find text in a string in c# - c#

I am learning Dotnet c# on my own.
how to find whether a given text exists or not in a string and if exists, how to find count of times the word has got repeated in that string. even if the word is misspelled, how to find it and print that the word is misspelled?
we can do this with collections or linq in c# but here i used string class and used contains method but iam struck after that.
if we can do this with help of linq, how?
because linq works with collections, Right?
you need a list in order to play with linq.
but here we are playing with string(paragraph).
how linq can be used find a word in paragraph?
kindly help.
here is what i have tried so far.
string str = "Education is a ray of light in the darkness. It certainly is a hope for a good life. Eudcation is a basic right of every Human on this Planet. To deny this right is evil. Uneducated youth is the worst thing for Humanity. Above all, the governments of all countries must ensure to spread Education";
for(int i = 0; i < i++)
if (str.Contains("Education") == true)
{
Console.WriteLine("found");
}
else
{
Console.WriteLine("not found");
}

You can make a string a string[] by splitting it by a character/string. Then you can use LINQ:
if(str.Split().Contains("makes"))
{
// note that the default Split without arguments also includes tabs and new-lines
}
If you don't care whether it is a word or just a sub-string, you can use str.Contains("makes") directly.
If you want to compare in a case insensitive way, use the overload of Contains:
if(str.Split().Contains("makes", StringComparer.InvariantCultureIgnoreCase)){}

string str = "money makes many makes things";
var strArray = str.Split(" ");
var count = strArray.Count(x => x == "makes");

the simplest way is to use Split extension to split the string into an array of words.
here is an example :
var words = str.Split(' ');
if(words.Length > 0)
{
foreach(var word in words)
{
if(word.IndexOf("makes", StringComparison.InvariantCultureIgnoreCase) != -1)
{
Console.WriteLine("found");
}
else
{
Console.WriteLine("not found");
}
}
}
Now, since you just want the count of number word occurrences, you can use LINQ to do that in a single line like this :
var totalOccurrences = str.Split(' ').Count(x=> x.IndexOf("makes", StringComparison.InvariantCultureIgnoreCase) != -1);
Note that StringComparison.InvariantCultureIgnoreCase is required if you want a case-insensitive comparison.

Related

How to search for specific char in an array and how to then manipulate that index in c#

Okay, so I'm creating a hangman game and everything functions so far, including what I'm TRYING to do in the question.
But it feels like there is a much more efficient method of obtaining the char that is also easier to manipulate the index.
protected static void alphabetSelector(string activeWordAlphabet)
{
char[] activeWord = activeWordAlphabet.ToCharArray();
string activeWordString = new string(activeWord);
Console.WriteLine("If you'd like to guess a letter, enter the letter. \n
If you'd like to guess the word, please type in the word. --- testing answer{0}",
activeWordString);
//Console.WriteLine("For Testing Purposes ONLY");
String chosenLetter = Console.ReadLine();
//Char[] letterFinder = Array.FindAll(activeWord, s => s.Equals(chosenLetter));
//string activeWordString = new string(activeWord);
foreach (char letter in activeWord);
{
if(activeWordString.Contains(chosenLetter))
{
Console.WriteLine("{0}", activeWordString);
Console.ReadLine();
}
else
{
Console.WriteLine("errrr...wrong!");
Console.ReadLine();
}
}
}
I have broken up the code in some areas to prevent the reader from having to scroll sideways. If this is bothersome, please let me know and I'll leave it in the future.
So this code will successfully print out the 'word' whenever I select the correct letter from the random word (I have the console print the actual word so that I can test it successfully each time). It will also print 'wrong' when I choose a letter NOT in the string.
But I feel like I should be able to use the
Array.FindAll(activeWord, ...)
functionality or some other way. But every time I try and reorder the arguments, it gives me all kinds of different errors and tells me to redo my arguments.
So, if you can look at this and find an easier method of searching the actual array for the user-selected 'letter', please help!! Even if it's not using the Array.FindAll method!!
Edit
Okay, it seems like there's some confusion with what I've done and why I've done it.
I'm ONLY printing the word inside that 'if' statement to test and make sure that the foreach{if{}} will actually work to find the char inside the string. But I ultimately need to be able to provide a placeholder for a char that is successfully found, as well as being able to 'cross out' the letter (from the alphabet list not shown here).
It's hangman - surely you guys know what I'm needing it to do. It has to keep track of which letters are left in the word, which letters have been chosen, as well as which letters are left in the entire alphabet.
I'm a 4-day old newb when it comes to programming, so please. . . I'm only doing what I know to do and when I get errors, I comment things out and write more until I find something that works.
Take a look at this demo I put together for you: https://dotnetfiddle.net/eP9TQM
I'd suggest creating a second string for the display string. Use a StringBuilder, and you can replace the characters in it at specific indices while creating the fewest number of stringobjects in the process.
string word = "your word or phrase here";
//Initialize a new StringBuilder that will display the word with placeholders.
StringBuilder display = new StringBuilder(word.Length); //You know the display word is the same length as the original word
display.Append('-', word.Length); //Fill it with placeholders.
So now you have your phrase/word, and a string builder full of characters that need to be discovered.
Go ahead and convert the display StringBuilder to a string that you can check on each pass to see if it equals your word:
var displayString = display.ToString();
//Loop until the display string is equal to the word
while (!displayString.Equals(word))
{
//Inside here your logic will follow.
}
So you are basically looping until the person answers here. You could of course go back and add logic to limit the number of attempts, or whatever you desire as an alternate exit strategy.
Inside this logic, you will check if they guessed a letter or a word based on how many characters they entered.
If they guessed a word, the logic is simple. Check if the guessed word is the same as the hidden word. If it is, then you break the loop and they are done. Otherwise, guessing loops back around.
If they guessed a letter, the logic is pretty straightforward, but more involved.
First get the character they guessed, just because it may be easier to work with this way.
char guess = input[0];
Now, look over the word for instances of that character:
//Look for instances of the character in the word.
for (int i = 0; i < word.Length; ++i)
{
//If the current index in the word matches their guess, then update the display.
if (char.ToUpperInvariant(word[i]) == char.ToUpperInvariant(guess))
display[i] = word[i];
}
The comments above should explain the idea here.
Update your displayString at the bottom of the loop so that it will check against the hidden word again:
displayString = display.ToString();
That's really all you need to do here. No fancy Linq needed.
Ok your code is really confusing, even with your edit.
First, why these 2 lines of code since activeWordAlphabet is a string :
char[] activeWord = activeWordAlphabet.ToCharArray();
string activeWordString = new string(activeWord);
Then you do your foreach.
For the word "FooBar", if the player types 'F', you will print
FooBar
FooBar
FooBar
FooBar
FooBar
FooBar
How does this help you in anything?
I think you have to review your algorithm. The string type have the function you need
int chosenLetterPosition = activeWord.IndexOf(chosenLetter, alreadyFoundPosition)
alreadyFoundPosition is an int from where the function will search the letter
IndexOf() returns -1 if the letter is not find or a positive number.
You can save this position with your letter in a dictionary to use it again as your new 'alreadyFoundPosition' if the chosenLetter is already in the dictionary
This is my answer. Because I don't have a lot of tasks today :)
class Letter
{
public bool ischosen { get; set; }
public char value { get; set; }
}
class LetterList
{
public LetterList(string word)
{
_lst = new List<Letter>();
word.ToList().ForEach(x => _lst.Add(new Letter() { value = x }));
}
public bool FindLetter(char letter)
{
var search = _lst.Where(x => x.value == letter).ToList();
search.ForEach(x=>x.ischosen=true);
return search.Count > 0 ? true : false;
}
public string NotChosen()
{
var res = "";
_lst.Where(x => !x.ischosen).ToList().ForEach(x => { res += x.value; });
return res;
}
List<Letter> _lst;
}
How to use
var abc = new LetterList("abcdefghijklmnopqrstuvwxyz");
var answer = new LetterList("myanswer");
Console.WriteLine("This my question. Why? write your answer please");
char x = Console.ReadLine()[0];
if (answer.FindLetter(x))
{
Console.WriteLine("you are right!");
}
else
{
Console.WriteLine("fail");
}
abc.FindLetter(x);
Console.WriteLine("not chosen abc:{0} answer:{1}", abc.NotChosen(), answer.NotChosen());
At least we used to play this game like that when i was a child.

Reversal and removing of duplicates in a sentence

I am preparing for a interview question.One of the question is to revert a sentence. Such as "its a awesome day" to "day awesome a its. After this,they asked if there is duplication, can you remove the duplication such as "I am good, Is he good" to "good he is, am I".
for reversal of the sentence i have written following method
public static string reversesentence(string one)
{
StringBuilder builder = new StringBuilder();
string[] split = one.Split(' ');
for (int i = split.Length-1; i >= 0; i--)
{
builder.Append(split[i]);
builder.Append(" ");
}
return builder.ToString();
}
But i am not getting ideas on removing of duplication.Can i get some help here.
This works:
public static string reversesentence(string one)
{
Regex reg = new Regex("\\w+");
bool isFirst = true;
var usedWords = new HashSet<String>(StringComparer.InvariantCultureIgnoreCase);
return String.Join("", one.Split(' ').Reverse().Select((w => {
var trimmedWord = reg.Match(w).Value;
if (trimmedWord != null) {
var wasFirst = isFirst;
isFirst = false;
if (usedWords.Contains(trimmedWord)) //Is it duplicate?
return w.Replace(trimmedWord, ""); //Remove the duplicate phrase but keep punctuation
usedWords.Add(trimmedWord);
if (!wasFirst) //If it's the first word, don't add a leading space
return " " + w;
return w;
}
return null;
})));
}
Basically, we decide if it's distinct based on the word without punctuation. If it already exists, just return the punctuation. If it doesn't exist, print out the whole word including punctuation.
Punctuation also removes the space in your example, which is why we can't just do String.Join(" ", ...) (otherwise the result would be good he Is , am I instead of good he Is, am I
Test:
reversesentence("I am good, Is he good").Dump();
Result:
good he Is, am I
For plain reversal:
String.Join(" ", text.Split(' ').Reverse())
For reversal with duplicate removal:
String.Join(" ", text.Split(' ').Reverse().Distinct())
Both work fine for strings containing just spaces as the separator. When you introduce the , then problem becomes more difficult. So much so that you need to specify how it should be handled. For example, should "I am good, Is he good" become "good he Is am I" or "good he Is , am I"? Your example in the question changes the case of "Is" and groups the "," with it too. That seems wrong to me.
The other answer points to using abstractions but interviewers usually want to see implementation.
For the reversal, the usual trick is to reverse the sentence first and then reverse each word as you travel from left to right. A space will you tell you that you have reached the end of a word. (See Programming Interviews Exposed for a solution to this or just google it. This used to be a VERY popular interview question). Your approach works but is frowned upon because you are using extra space (O(n)).
For removing duplicates, if you're only working with ASCII, you can do the following:
bool[] seenChars = new bool[128];
var sb = new StringBuilder();
foreach(char c in stringOne)
{
if(!seenChars[c]){
seenChars[c] = true;
sb.Append(c);
}
}
return sb.ToString();
The idea is to use the value of the char as an index in the array to tell you whether you've seen this character before or not. With this approach, you will be using O(1) space!
Edit: If you want to de-duplicate words, you probably want to use a HashSet and skip adding it if it already exists.
try this
string sentence = "I am good, Is he good";
var words = sentence.Split(new char[]{' ',','}).Distinct(StringComparer.CurrentCultureIgnoreCase);
var stringBuilder = new StringBuilder();
foreach(var item in words)
{
stringBuilder.Append(item);
stringBuilder.Append(" ");
}
Console.Write(stringBuilder);
Console.ReadLine();

Extracting values from a string in C#

I have the following string which i would like to retrieve some values from:
============================
Control 127232:
map #;-
============================
Control 127235:
map $;NULL
============================
Control 127236:
I want to take only the Control . Hence is there a way to retrieve from that string above into an array containing like [127232, 127235, 127236]?
One way of achieving this is with regular expressions, which does introduce some complexity but will give the answer you want with a little LINQ for good measure.
Start with a regular expression to capture, within a group, the data you want:
var regex = new Regex(#"Control\s+(\d+):");
This will look for the literal string "Control" followed by one or more whitespace characters, followed by one or more numbers (within a capture group) followed by a literal string ":".
Then capture matches from your input using the regular expression defined above:
var matches = regex.Matches(inputString);
Then, using a bit of LINQ you can turn this to an array
var arr = matches.OfType<Match>()
.Select(m => long.Parse(m.Groups[1].Value))
.ToArray();
now arr is an array of long's containing just the numbers.
Live example here: http://rextester.com/rundotnet?code=ZCMH97137
try this (assuming your string is named s and each line is made with \n):
List<string> ret = new List<string>();
foreach (string t in s.Split('\n').Where(p => p.StartsWith("Control")))
ret.Add(t.Replace("Control ", "").Replace(":", ""));
ret.Add(...) part is not elegant, but works...
EDITED:
If you want an array use string[] arr = ret.ToArray();
SYNOPSYS:
I see you're really a newbie, so I try to explain:
s.Split('\n') creates a string[] (every line in your string)
.Where(...) part extracts from the array only strings starting with Control
foreach part navigates through returned array taking one string at a time
t.Replace(..) cuts unwanted string out
ret.Add(...) finally adds searched items into returning list
Off the top of my head try this (it's quick and dirty), assuming the text you want to search is in the variable 'text':
List<string> numbers = System.Text.RegularExpressions.Regex.Split(text, "[^\\d+]").ToList();
numbers.RemoveAll(item => item == "");
The first line splits out all the numbers into separate items in a list, it also splits out lots of empty strings, the second line removes the empty strings leaving you with a list of the three numbers. if you want to convert that back to an array just add the following line to the end:
var numberArray = numbers.ToArray();
Yes, the way exists. I can't recall a simple way for It, but string is to be parsed for extracting this values. Algorithm of it is next:
Find a word "Control" in string and its end
Find a group of digits after the word
Extract number by int.parse or TryParse
If not the end of the string - goto to step one
realizing of this algorithm is almost primitive..)
This is simplest implementation (your string is str):
int i, number, index = 0;
while ((index = str.IndexOf(':', index)) != -1)
{
i = index - 1;
while (i >= 0 && char.IsDigit(str[i])) i--;
if (++i < index)
{
number = int.Parse(str.Substring(i, index - i));
Console.WriteLine("Number: " + number);
}
index ++;
}
Using LINQ for such a little operation is doubtful.

String to Sequence of Tokens

I'm parsing command sequence strings and need to convert each string into a string[] that will contain command tokens in the order that they're read.
The reason being is that these sequences are stored in a database to instruct a protocol client to carry out a certain prescribed sequence for individual distant applications. There are special tokens in these strings that I need to add to the string[] by themselves because they don't represent data being transmitted; instead they indicate blocking pauses.
The sequences do not contain delimiters. There can be any amount of special tokens found anywhere in a command sequence which is why I can't simply parse the strings with regex. Also, all of these special commands within the sequence are wrapped with ${}
Here's an example of the data that I need to parse into tokens (P1 indicates blocking pause for one second):
"some data to transmit${P1}more data here"
Resulting array should look like this:
{ "some data to transmit", "${P1}", "more data here" }
I would think LINQ could help with this, but I'm not so sure. The only solution I can come up with would be to loop through each character until a $ is found and then detect if a special pause command is available and then parse the sequence from there using indexes.
One option is to use Regex.Split(str, #"(\${.*?})") and ignore the empty strings that you get when you have two special tokens next to each other.
Perhaps Regex.Split(str, #"(\${.*?})").Where(s => s != "") is what you want.
Alright, so as was mentioned in the comments, I suggest you read about lexers. They have the power to do everything and more of what you described.
Since your requirements are so simple, I'll say that it is not too difficult to write the lexer by hand. Here's some pseudocode that could do it.
IEnumerable<string> tokenize(string str) {
var result = new List<string>();
int pos = -1;
int state = 0;
int temp = -1;
while( ++pos < str.Length ) {
switch(state) {
case 0:
if( str[pos] == "$" ) { state = 1; temp = pos; }
break;
case 1:
if( str[pos] == "{" ) { state = 2; } else { state = 0; }
break;
case 2:
if( str[pos] == "}" } {
state = 0;
result.Add( str.Substring(0, temp) );
result.Add( str.Substring(temp, pos) );
str = str.Substring(pos);
pos = -1;
}
break;
}
}
if( str != "" ) {
result.Add(str);
}
return result;
}
Or something like that. I usually get the parameters of Substring wrong on the first try, but that's the general idea.
You can get a much more powerful (and easier to read) lexer by using something like ANTLR.
Using a little bit of Gabe's suggestion, I've come up with a solution that does exactly what I was looking to do:
string tokenPattern = #"(\${\w{1,4}})";
string cmdSequence = "${P}test${P}${P}test${P}${Cr}";
string[] tokenized = (from token in Regex.Split(cmdSequence, tokenPattern)
where token != string.Empty
select token).ToArray();
With the command sequence in the above example, the array contains this:
{ "${P}", "test", "${P}", "${P}", "test", "${P}", "${Cr}"}

.NET String parsing performance improvement - Possible Code Smell

The code below is designed to take a string in and remove any of a set of arbitrary words that are considered non-essential to a search phrase.
I didn't write the code, but need to incorporate it into something else. It works, and that's good, but it just feels wrong to me. However, I can't seem to get my head outside the box that this method has created to think of another approach.
Maybe I'm just making it more complicated than it needs to be, but I feel like this might be cleaner with a different technique, perhaps by using LINQ.
I would welcome any suggestions; including the suggestion that I'm over thinking it and that the existing code is perfectly clear, concise and performant.
So, here's the code:
private string RemoveNonEssentialWords(string phrase)
{
//This array is being created manually for demo purposes. In production code it's passed in from elsewhere.
string[] nonessentials = {"left", "right", "acute", "chronic", "excessive", "extensive",
"upper", "lower", "complete", "partial", "subacute", "severe",
"moderate", "total", "small", "large", "minor", "multiple", "early",
"major", "bilateral", "progressive"};
int index = -1;
for (int i = 0; i < nonessentials.Length; i++)
{
index = phrase.ToLower().IndexOf(nonessentials[i]);
while (index >= 0)
{
phrase = phrase.Remove(index, nonessentials[i].Length);
phrase = phrase.Trim().Replace(" ", " ");
index = phrase.IndexOf(nonessentials[i]);
}
}
return phrase;
}
Thanks in advance for your help.
Cheers,
Steve
This appears to be an algorithm for removing stop words from a search phrase.
Here's one thought: If this is in fact being used for a search, do you need the resulting phrase to be a perfect representation of the original (with all original whitespace intact), but with stop words removed, or can it be "close enough" so that the results are still effectively the same?
One approach would be to tokenize the phrase (using the approach of your choice - could be a regex, I'll use a simple split) and then reassemble it with the stop words removed. Example:
public static string RemoveStopWords(string phrase, IEnumerable<string> stop)
{
var tokens = Tokenize(phrase);
var filteredTokens = tokens.Where(s => !stop.Contains(s));
return string.Join(" ", filteredTokens.ToArray());
}
public static IEnumerable<string> Tokenize(string phrase)
{
return string.Split(phrase, ' ');
// Or use a regex, such as:
// return Regex.Split(phrase, #"\W+");
}
This won't give you exactly the same result, but I'll bet that it's close enough and it will definitely run a lot more efficiently. Actual search engines use an approach similar to this, since everything is indexed and searched at the word level, not the character level.
I guess your code is not doing what you want it to do anyway. "moderated" would be converted to "d" if I'm right. To get a good solution you have to specify your requirements a bit more detailed. I would probably use Replace or regular expressions.
I would use a regular expression (created inside the function) for this task. I think it would be capable of doing all the processing at once without having to make multiple passes through the string or having to create multiple intermediate strings.
private string RemoveNonEssentialWords(string phrase)
{
return Regex.Replace(phrase, // input
#"\b(" + String.Join("|", nonessentials) + #")\b", // pattern
"", // replacement
RegexOptions.IgnoreCase)
.Replace(" ", " ");
}
The \b at the beginning and end of the pattern makes sure that the match is on a boundary between alphanumeric and non-alphanumeric characters. In other words, it will not match just part of the word, like your sample code does.
Yeah, that smells.
I like little state machines for parsing, they can be self-contained inside a method using lists of delegates, looping through the characters in the input and sending each one through the state functions (which I have return the next state function based on the examined character).
For performance I would flush out whole words to a string builder after I've hit a separating character and checked the word against the list (might use a hash set for that)
I would create A Hash table of Removed words parse each word if in the hash remove it only one time through the array and I believe that creating a has table is O(n).
How does this look?
foreach (string nonEssent in nonessentials)
{
phrase.Replace(nonEssent, String.Empty);
}
phrase.Replace(" ", " ");
If you want to go the Regex route, you could do it like this. If you're going for speed it's worth a try and you can compare/contrast with other methods:
Start by creating a Regex from the array input. Something like:
var regexString = "\\b(" + string.Join("|", nonessentials) + ")\\b";
That will result in something like:
\b(left|right|chronic)\b
Then create a Regex object to do the find/replace:
System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(regexString, System.Text.RegularExpressions.RegexOptions.IgnoreCase);
Then you can just do a Replace like so:
string fixedPhrase = regex.Replace(phrase, "");

Categories