Ignoring case sensitivity - c#

I have to count how many times each word appears in a given input text.
Where I'm stuck: differences in character casing should be ignored.
For example, "You are here.You you" should produce the output:
are=1
here=1
You=3
What I've done:
string text = "You are here.You you";
IDictionary<string, int> wordsCount = new SortedDictionary<string, int>();
string[] words = text.Split(' ',',','.','-','!');
foreach (string word in words)
{
int count = 1;
if (wordsCount.ContainsKey(word))
count = wordsCount[word] + 1;
wordsCount[word] = count;
}
var items = from pair in wordsCount
orderby pair.Value ascending
select pair;
foreach (var p in items)
{
Console.WriteLine("{0} -> {1}", p.Key, p.Value);
}
Is there a way to do this without manually checking every word in the given text? For example, if I have a very long paragraph, I don't want to check each word with a specific method.

A C-style loop that lowercases the characters one by one will not work here: C# strings are immutable and are not '\0'-terminated.
As text is a string, just do:
text = text.ToLower();
just before the string[] words = text.Split(' ',',','.','-','!'); line (the keys will then all be lowercase).
And then enjoy!

How about linq?
var text = "You are here.You you";
var words = text.Split(' ', ',', '.', '-', '!');
words
.GroupBy(word => word.ToLowerInvariant())
.OrderByDescending(group => group.Count())
.ToList()
.ForEach(g=> Console.WriteLine(g.Key + "=" + g.Count()));
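Another option, sketched below, is to leave the words untouched and make the dictionary itself case-insensitive by passing it a StringComparer. This is only a minimal sketch: the WordCounter class and its Count method are hypothetical names, and the separator list is copied from the question. Because a Dictionary keeps the key that was inserted first, the spelling of the first occurrence ("You" in the example) is the one that gets printed.

```csharp
using System;
using System.Collections.Generic;

public static class WordCounter
{
    // Hypothetical helper: counts words so that "You" and "you" share one entry.
    public static Dictionary<string, int> Count(string text)
    {
        char[] separators = { ' ', ',', '.', '-', '!' };

        // The comparer makes key lookups case-insensitive.
        // The first spelling that is inserted stays as the stored key.
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        foreach (string word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            int current;
            counts.TryGetValue(word, out current); // current is 0 when the word is new
            counts[word] = current + 1;
        }
        return counts;
    }
}
```

Calling WordCounter.Count("You are here.You you") and looping over the result with Console.WriteLine("{0}={1}", pair.Key, pair.Value) is one way to print the counts; sort the pairs first if a particular order is required.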

Related

How to find the words on the left and right of the searched word

How can I find the words to the left and right of a selected word in a string? For example, I have a string:
string input = "all our vidphone lines here are trapped. they recirculate the call to other offices within the building";
var word = new List<string> { "other", "they", "all" };
if (word.Any(input.Contains))
{
//and here I want find left and right word from found words
}
So in the desired result, each found word must be reported separately, and the output should look like this:
Found: all
Left: (NONE)
Right: our
Found: they
Left: trapped.
Right: recirculate
Found: other
Left: to
Right: offices
Split the input string
String[] haystack = input.Split(' ');
For each word in the query, do the search on haystack
foreach (var w in word) {
for (int i = 0; i < haystack.Length; i++) {
if (w == haystack[i]) {
// print w
// left is haystack[i-1] when i > 0, if i == 0 it's None
// right is haystack[i+1] when i < haystack.Length-1, if i == haystack.Length-1 it's None
}
}
}
Working example: https://ideone.com/hLry3u
string input = "all our vidphone lines here are trapped. they recirculate the call to other offices within the building";
var queryList = new List<string> { "other", "they", "all", "building" };
string[] stack = input.Split(' ').Select(s => s.Trim())
.Where(s => s != string.Empty)
.ToArray();
foreach (var word in queryList)
{
for (int i = 0; i < stack.Length; i++)
{
if (word != stack[i]) continue;
Console.WriteLine($"Found: {word}");
Console.WriteLine(i > 0 ? $"Left: {stack[i-1]}" : "Left: (NONE)");
Console.WriteLine(i < stack.Length - 1 ? $"Right: {stack[i+1]}" : "Right: (NONE)");
Console.WriteLine();
}
}
Console.ReadLine();
Alternatively you can use
string[] stack = Regex.Split(input, @"\s+");
instead of
string[] stack = input.Split(' ').Select(s => s.Trim())
.Where(s => s != string.Empty)
.ToArray();
Which one you pick depends on how much you like regular expressions.
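The answers above compare tokens exactly, so a query such as "trapped" would not match the token "trapped." in the sample sentence. A minimal sketch of one way around that, assuming it is acceptable to ignore trailing punctuation and casing when comparing (both are assumptions, not something the question asks for). NeighbourFinder and Find are hypothetical names, and the value tuples need C# 7 or later.

```csharp
using System;
using System.Collections.Generic;

public static class NeighbourFinder
{
    // Hypothetical helper: returns one (Found, Left, Right) entry per occurrence of each query word.
    public static List<(string Found, string Left, string Right)> Find(
        string input, IEnumerable<string> queries)
    {
        string[] tokens = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var results = new List<(string Found, string Left, string Right)>();

        foreach (string query in queries)
        {
            for (int i = 0; i < tokens.Length; i++)
            {
                // Assumption: trailing punctuation and casing are ignored for the comparison only.
                string bare = tokens[i].TrimEnd('.', ',', '!', '?', ';', ':');
                if (!string.Equals(bare, query, StringComparison.OrdinalIgnoreCase)) continue;

                // The neighbours are reported exactly as they appear in the input.
                string left = i > 0 ? tokens[i - 1] : "(NONE)";
                string right = i < tokens.Length - 1 ? tokens[i + 1] : "(NONE)";
                results.Add((query, left, right));
            }
        }
        return results;
    }
}
```

Each returned tuple can then be printed in the "Found / Left / Right" layout from the question.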

Separate words from a string in C#

So I'm working on a program for a university degree. The first requirement was to show the number of times each letter of the alphabet appears in a string. Now, to develop this program further, I would like to show all the words that are in the string in a list. Here is the code that I currently have.
public void occurances()
{
string sentence;
Console.WriteLine("\n");
Console.WriteLine("Please enter a random sentence and press enter");
Console.WriteLine("\n");
var occurances = new Dictionary<char, int>();
var words = occurances;
//a for each loop, and within it, the char variable is a assigned named "characters"
//The value "characters" will represent all the characters in the sentence string.
sentence = Console.ReadLine();
foreach (char characters in sentence)
{
//if the sentence contains characters
if (occurances.ContainsKey(characters))
//add 1 to the value of occurances
occurances[characters] = occurances[characters] + 1;
//otherwise keep the occurnaces value as 1
else
occurances[characters] = 1;
}
foreach (var entry in occurances)
{
//write onto the screen in position 0 and 1, where 0 will contain the entry key
// and 1 will contain the amount of times the entry has been entered
Console.WriteLine("{0}: {1}", entry.Key, entry.Value);
}
//Pause
Console.ReadLine();
}
For 1st Requirement:
var charGroups = sentence.GroupBy(x => x).OrderByDescending(x => x.Count());
For 2nd Requirement:
How to: Count Occurrences of a Word in a String (LINQ)
I think the easiest way would be this:
var WordList = YourString.Split(' ').ToList(); // Making the list of words
var CharArray = YourString.ToCharArray(); // Counting letters
var q = from x in CharArray
group x by x into g
let count = g.Count()
orderby count descending
select new {Value = g.Key, Count = count};
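Splitting on a single space leaves punctuation attached to the words ("enter," or "sentence."). A minimal sketch of a splitter that avoids this, assuming a word is any run of letters, digits, or apostrophes (that definition is an assumption, and SentenceWords and Split are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SentenceWords
{
    // Hypothetical helper: returns the words of a sentence in order, without punctuation.
    public static List<string> Split(string sentence)
    {
        var words = new List<string>();
        var current = new StringBuilder();

        foreach (char c in sentence)
        {
            if (char.IsLetterOrDigit(c) || c == '\'')
            {
                // Still inside a word: keep collecting characters.
                current.Append(c);
            }
            else if (current.Length > 0)
            {
                // A separator ends the current word.
                words.Add(current.ToString());
                current.Clear();
            }
        }

        // Do not lose the last word when the sentence does not end with a separator.
        if (current.Length > 0) words.Add(current.ToString());
        return words;
    }
}
```

The resulting list can be printed with a foreach loop right after the letter counts in the occurances() method.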

Occurrence of elements in the file with C# and Dictionary

I have a file as
outlook temperature Humidity Windy PlayTennis
sunny hot high false N
sunny hot high true N
overcast hot high false P
rain mild high false P
rain cool normal false P
rain cool normal true N
I want to find the number of occurrences of each element, e.g.
sunny: 2
rain: 3
overcast:1
hot: 3
and so on
My code is:
string file = openFileDialog1.FileName;
var text1 = File.ReadAllLines(file);
StringBuilder str = new StringBuilder();
string[] lines = File.ReadAllLines(file);
string[] nonempty=lines.Where(s => s.Trim(' ')!="")
.Select(s => Regex.Replace(s, @"\s+", " ")).ToArray();
string[] colheader = null;
if (nonempty.Length > 0)
colheader = nonempty[0].Split();
else
return;
var linevalue = nonempty.Skip(1).Select(l => l.Split());
int colcount = colheader.Length;
Dictionary<string, string> colvalue = new Dictionary<string, string>();
for (int i = 0; i < colcount; i++)
{
int k = 0;
foreach (string[] values in linevalue)
{
if(! colvalue.ContainsKey(values[i]))
{
colvalue.Add(values[i],colheader[i]);
}
label2.Text = label2.Text + k.ToString();
}
}
foreach (KeyValuePair<string, string> pair in colvalue)
{
label1.Text += pair.Key+ "\n";
}
Output I get here is
sunny
overcast
rain
hot
mild
cool
N
P
true
false
I also want to find the number of occurrences, which I am unable to get. Can you please help me out here?
This LINQ query will return Dictionary<string, int> which will contain each word in file as key, and word's occurrences as value:
var occurences = File.ReadAllLines(file).Skip(1) // skip titles line
.SelectMany(l => l.Split(new []{' '}, StringSplitOptions.RemoveEmptyEntries))
.GroupBy(w => w)
.ToDictionary(g => g.Key, g => g.Count());
Usage of dictionary:
int sunnyOccurences = occurences["sunny"];
foreach(var pair in occurences)
label1.Text += String.Format("{0}: {1}\n", pair.Key, pair.Value);
It seems to me that you are implementing a simple tag cloud. I have used non-generic collections, but you can replace them with generic ones, e.g. replace the Hashtable with a Dictionary.
Follow this code:
Hashtable tagCloud = new Hashtable();
ArrayList frequency = new ArrayList();
Read from a file and store the lines in an array
string[] lines = File.ReadAllLines("file.txt");
//use the specific delimiter
char[] delimiter = new char[] { ' ' };
StringBuilder buffer = new StringBuilder();
foreach (string line in lines)
{
if (line.ToString().Length != 0)
{
buffer.Append((" " + line.Trim()));
}
}
string[] words = buffer.ToString().Trim().Split(delimiter);
Storing occurrence of each word.
List<string> listOfWords = new List<string>(words);
foreach (string i in listOfWords)
{
int c = 0;
foreach (string j in words)
{
if (i.Equals(j))
c++;
}
frequency.Add(c);
}
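Store them as key-value pairs. The key is the word and the value is its number of occurrences.
for (int i = 0; i < listOfWords.Count; i++)
{
//use dictionary here
//listOfWords still contains duplicates, and Add throws on an existing key
if (!tagCloud.ContainsKey(listOfWords[i]))
tagCloud.Add(listOfWords[i], (int)frequency[i]);
}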
If all you want is the keyword and a count of how many times they appear in the file, then lazyberezovsky's solution is about as elegant of a solution as you will find. But if you need to do any other metrics on the file's data, then I would load the file into a collection that keeps your other metadata intact.
Something simple like:
var forecasts = File.ReadAllLines(file).Skip(1) // skip the header row
.Select(line => line.Split(new []{' '}, StringSplitOptions.RemoveEmptyEntries)) // split the line into an array of strings
.Select (f =>
new
{
Outlook = f[0],
Temperature = f[1],
Humidity = f[2],
Windy = f[3],
PlayTennis = f[4]
});
will give you an IEnumerable<> of an anonymous type that has properties that can be queried.
For example if you wanted to see how many times "sunny" occurred in the Outlook then you could just use LINQ to do this:
var count = forecasts.Count( f => f.Outlook == "sunny");
Or if you just wanted the list of all outlooks you could write:
var outlooks = forecasts.Select(f => f.Outlook).Distinct();
Where this is useful is when you want to do more complicated queries like "How many rainy cool days are there?"
var count = forecasts.Count (f => f.Outlook == "rain" && f.Temperature == "cool");
Again if you just want all words and their occurrence count, then this is overkill.
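A middle ground between the two approaches is to count values per column, so that "high" under Humidity is never mixed up with the same word in another column. The following is only a sketch under assumptions: the first non-empty line is taken to be the header row, values are whitespace-separated, and ColumnCounter and CountPerColumn are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ColumnCounter
{
    // Hypothetical helper: maps each column header to a dictionary of value -> occurrence count.
    public static Dictionary<string, Dictionary<string, int>> CountPerColumn(IEnumerable<string> lines)
    {
        // Drop blank lines and split every remaining line on any whitespace.
        List<string[]> rows = lines
            .Where(l => !string.IsNullOrWhiteSpace(l))
            .Select(l => l.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            .ToList();

        var result = new Dictionary<string, Dictionary<string, int>>();
        if (rows.Count == 0) return result;

        string[] headers = rows[0];
        for (int col = 0; col < headers.Length; col++)
        {
            // Group the data rows by their value in this column and count each group.
            result[headers[col]] = rows.Skip(1)
                .Where(r => col < r.Length) // ignore rows that are too short
                .GroupBy(r => r[col])
                .ToDictionary(g => g.Key, g => g.Count());
        }
        return result;
    }
}
```

It could be called as ColumnCounter.CountPerColumn(File.ReadAllLines(file)), after which result["outlook"]["sunny"] holds the count for sunny days.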

Splitting a string into a string array?

I am facing a problem while executing a SQL query in C#. The query throws an error when the string contains more than 1000 entries in the IN clause. The string has more than 1000 substrings, each separated by ','.
I want to split the string into an array of strings, each containing 999 entries separated by ','.
or
How can I find the nth occurrence of ',' in a string?
Pull the string from SQL Server using some utility code like
string strResult = String.Empty;
using (SqlCommand cmd = new SqlCommand())
{
cmd.Connection = conn;
cmd.CommandText = strSQL;
strResult = cmd.ExecuteScalar().ToString();
}
Get the returned string from SQL Server
Split the string on the ','
string[] strResultArr = strResult.Split(',');
then, to get the nth string that is separated by ',' (I think this is what you mean by "How can I find the nth occurrence of ',' in a string"), use
int n = someInt;
string nthEntry = strResultArr[n - 1];
I hope this helps.
You could use a regular expression and the Index property of the Match class:
// Long string of 2000 elements, separated by ','
var s = String.Join(",", Enumerable.Range(0,2000).Select (e => e.ToString()));
// find all ',' and use the '.Index' property to find each position in the string
// n is zero-based: n = 0 finds the first occurrence, and so on
int n = 0;
var nth_position = Regex.Matches(s, ",")[n].Index;
To create an array of strings of your required size, you could split your string, use LINQ's GroupBy to partition the result, and then join the resulting groups together:
var result = s.Split(',').Select((x, i) => new {Group = i/1000, Value = x})
.GroupBy(item => item.Group, g => g.Value)
.Select(g => String.Join(",", g));
result now contains two strings, each with 1000 comma-separated elements.
How's this:
int groupSize = 1000;
string[] parts = s.Split(',');
int numGroups = parts.Length / groupSize + (parts.Length % groupSize != 0 ? 1 : 0);
List<string[]> Groups = new List<string[]>();
for (int i = 0; i < numGroups; i++)
{
Groups.Add(parts.Skip(i * groupSize).Take(groupSize).ToArray());
}
Maybe something like this:
string line = "1,2,3,4";
var splitted = line.Split(new[] {','}).Select((x, i) => new {
Element = x,
Index = i
})
.GroupBy(x => x.Index / 1000)
.Select(x => x.Select(y => y.Element).ToList())
.ToList();
After this you should just String.Join each IList<string>.
//initial string of 10000 entries divided by commas
string s = string.Join(", ", Enumerable.Range(0, 10000));
//an array of entries, from the original string
var ss = s.Split(',');
//auxiliary index
int index = 0;
//divide into groups by 1000 entries
var words = ss.GroupBy(w =>
{
try
{
return index / 1000;
}
finally
{
++index;
}
})//join groups into "words"
.Select(g => string.Join(",", g));
//print each word
foreach (var word in words)
Console.WriteLine(word);
Or you may find the indices in the string and split it into substrings afterwards:
string s = string.Join(", ", Enumerable.Range(0, 100));
int index = 0;
var indeces =
Enumerable.Range(0, s.Length - 1).Where(i =>
{
if (s[i] == ',')
{
if (index < 9)
++index;
else
{
index = 0;
return true;
}
}
return false;
}).ToList();
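int start = 0;
foreach (int i in indeces)
{
// print the entries between the previous split point and this comma
Console.WriteLine(s.Substring(start, i - start).Trim());
start = i + 1; // skip the comma itself
}
// print the entries after the last split point
Console.WriteLine(s.Substring(start).Trim());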
However, I would consider whether it is possible to work with the entries before they are combined into one string, and whether the query that needs such a large list in its IN clause can be avoided altogether.
string foo = "a,b,c";
string [] foos = foo.Split(new char [] {','});
foreach(var item in foos)
{
Console.WriteLine(item);
}
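The question asks for pieces of at most 999 entries that can each go into their own IN clause. A minimal sketch of that, using the string.Join overload that takes a start index and a count so no LINQ is needed; InClauseChunker and Chunk are hypothetical names, and the default of 999 is taken from the question.

```csharp
using System;
using System.Collections.Generic;

public static class InClauseChunker
{
    // Hypothetical helper: splits "a,b,c,..." into strings of at most maxPerChunk entries each.
    public static List<string> Chunk(string csv, int maxPerChunk = 999)
    {
        if (maxPerChunk <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxPerChunk));

        string[] parts = csv.Split(',');
        var chunks = new List<string>();

        for (int start = 0; start < parts.Length; start += maxPerChunk)
        {
            // The last chunk may hold fewer entries than maxPerChunk.
            int count = Math.Min(maxPerChunk, parts.Length - start);
            chunks.Add(string.Join(",", parts, start, count));
        }
        return chunks;
    }
}
```

Each returned string can then be used for one IN (...) list, with the separate conditions combined by OR or run as separate queries.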

Find NOT matching characters in a string with regex?

I'm able to check whether a string contains invalid characters:
Regex r = new Regex("[^A-Z]$");
string myString = "SOMEString";
if (r.IsMatch(myString))
{
Console.WriteLine("invalid string!");
}
That works fine. But what if I would like to print out every invalid character in this string? In the example SOMEString, the invalid chars are t, r, i, n, g. Any ideas?
Use LINQ. The following gives you an array of the 5 characters that match the regex, i.e. the invalid ones.
char[] myCharacterArray = myString.Where(c => r.IsMatch(c.ToString())).ToArray();
foreach (char c in myCharacterArray)
{
Console.WriteLine(c);
}
Output will be:
t
r
i
n
g
EDIT:
It looks like you want to treat all lowercase characters as invalid. You may try:
char[] myCharacterArray2 = myString
.Where(c => ((int)c) >= 97 && ((int)c) <= 122)
.ToArray();
In your example the regex would succeed on one character since it's looking for the last character if it isn't uppercase, and your string has such a character.
The regex should be changed to Regex r = new Regex("[^A-Z]");.
(updated following @Chris's comments)
However, for your purpose the regex is actually what you want - just use Matches.
e.g.:
foreach (Match item in r.Matches(myString))
{
Console.WriteLine(item.ToString() + " is invalid");
}
Or, if you want one line:
string str = "";
foreach (Match item in r.Matches(myString))
{
str += item.ToString() + ", ";
}
// drop the trailing ", " before printing
Console.WriteLine(str.TrimEnd(',', ' ') + " are invalid");
Try with this:
char[] list = new char[5];
Regex r = new Regex("[^A-Z]*$");
string myString = "SOMEString";
foreach (Match match in r.Matches(myString))
{
list = match.Value.ToCharArray();
break;
}
string str = "invalid chars are ";
foreach (char ch in list)
{
str += ch + ", ";
}
Console.Write(str);
OUTPUT: invalid chars are t, r, i, n, g
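For a simple range such as A-Z, the same result can also be obtained without a regex. A minimal sketch, assuming "valid" means an uppercase ASCII letter from 'A' to 'Z' as in the question's pattern; InvalidCharFinder and Find are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class InvalidCharFinder
{
    // Hypothetical helper: returns every character outside 'A'..'Z', in order of appearance.
    public static List<char> Find(string input)
    {
        var invalid = new List<char>();
        foreach (char c in input)
        {
            // Anything that is not an uppercase ASCII letter counts as invalid here.
            if (c < 'A' || c > 'Z')
                invalid.Add(c);
        }
        return invalid;
    }
}
```

Joining the result with string.Join(", ", InvalidCharFinder.Find(myString)) gives a comma-separated list with no trailing separator.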
