Splitting text and putting it into dictionary - c#

I have a text of about 600 words and I'm supposed to remove all quotation marks, numbers (years, dates, ...), digits, and so on. Only words should remain, and I have to put them into a dictionary.
I tried to go through the text with a foreach loop, get the first letter of each word and save the word in a list. I also split every row into words.
e.g.:
You are pretty.
You
are
pretty
The problem is that some words in a row are still treated as the same word although they shouldn't be. I've tried to fix it but couldn't find a solution.
public Dictionary<string, int> words = new Dictionary<string, int>();
public Dictionary<char, List<string>> firstletter = new Dictionary<char, List<string>>();
public Aufgabe(string filename)
{
string filler = "ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜ";
foreach (char f in filler)
{
firstletter[f] = new List<string>();
}
Load(filename);
}
public void Load(string filename)
{
List<string> w = new List<string>();
StreamReader r = new StreamReader(filename);
while (!r.EndOfStream)
{
string row = r.ReadLine();
string[] parts = row.Split(' ');
string[] sonderzeichen = new string[] { "#", ",", ".", ";", "'", "1", "2", "3", "4", "5", "6", "7", "8", "9", "0", "(", ")", "{",
"}", "!", "?", "/", "\"", "&", "+", "-", "–" };
string[] list = new string[parts.Length];
for (int i = 0; i < parts.Length; i++)
{
string a = parts[i];
foreach (string s in sonderzeichen)
{
if (s != "-")
{
a = a.Replace(s, string.Empty);
}
else
{
if (a.Length == 1)
{
a = string.Empty;
}
}
}
list[i] = a;
}
parts = list;
foreach (string a in parts)
{
if (words.ContainsKey(a))
{
words[a] += 1;
}
else
{
words.Add(a, 1);
}
string b = a.ToUpper();
if (b == "")
continue;
List<string> letter = firstletter[b[0]];
if (!letter.Contains(a))
{
letter.Add(a);
}
}
}
}

There are some things missing in the other answers:
No validation is done to check if the text is a word
Comparison should not be case-sensitive (i.e. spain, Spain and SPAIN should be considered the same word)
My solution:
StringComparer comparer = StringComparer.OrdinalIgnoreCase;
string text = "The 'rain' in spain falls mainly on the plain. 07 November 2018 20:02:07 - 20180520 I said the Plain in SPAIN. 12345";
var dictionary = Regex.Split(text, @"\W+")
.Where(IsValidWord)
.GroupBy(m => m, comparer)
.ToDictionary(m => m.Key, m => m.Count(), comparer);
Method IsValidWord:
// logic to validate word goes here
private static bool IsValidWord(string text)
{
double value;
bool isNumeric = double.TryParse(text, out value);
// add more validation rules here
return !isNumeric;
}
EDIT
I noticed in your code that you have a Dictionary with the words grouped by first letter. This can be achieved like this (using the previous dictionary):
var lettersDictionary = dictionary.Keys.GroupBy(x => x.Substring(0, 1),
(alphabet, subList) => new {
Alphabet = alphabet,
SubList = subList.OrderBy(x => x, comparer).ToList()
})
.ToDictionary(m => m.Alphabet, m => m.SubList, comparer);
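The first-letter grouping described above can also be sketched as a small helper. This is only an illustration: the class name FirstLetterIndex and the method Build are hypothetical, and the sketch assumes that empty strings should be skipped and that the key is the upper-cased first character.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstLetterIndex
{
    // Groups words by their upper-cased first character.
    // Empty strings are skipped so that w[0] can never throw.
    public static Dictionary<char, List<string>> Build(IEnumerable<string> words)
    {
        return words
            .Where(w => !string.IsNullOrEmpty(w))
            .GroupBy(w => char.ToUpperInvariant(w[0]))
            .ToDictionary(
                g => g.Key,
                // Keep one spelling per word, ignoring case, sorted ignoring case.
                g => g.Distinct(StringComparer.OrdinalIgnoreCase)
                      .OrderBy(w => w, StringComparer.OrdinalIgnoreCase)
                      .ToList());
    }
}
```

Calling FirstLetterIndex.Build(dictionary.Keys) would give one list per first letter that actually occurs; letters with no words simply have no entry.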

You can just split with a regex, then use LINQ to create your dictionary:
var dictionary = Regex.Split(text, @"\W+")
.GroupBy(m => m, StringComparer.OrdinalIgnoreCase) // Case-insensitive
.ToDictionary(m => m.Key, m => m.Count());
UPDATE
In applying to your example code, your task class could become something like this to build both dictionaries (and to consider case insensitive):
public class Aufgabe
{
const string ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜ";
public Dictionary<string, int> words;
public Dictionary<char, List<string>> firstletter;
public Aufgabe(string filename)
{
var text = File.ReadAllText(filename);
words = Regex.Split(text, @"\W+")
.Where(m => m.Length > 0) // Regex.Split can yield empty strings at the ends
.GroupBy(m => m, StringComparer.OrdinalIgnoreCase)
.ToDictionary(m => m.Key, m => m.Count());
firstletter = ALPHABET.ToDictionary(a => a, // First-letter key
a => words.Keys.Where(m => a == char.ToUpper(m[0])).ToList()); // Words
}
}

Here is one way with Regex, note that case sensitivity has not been addressed
var text = "The 'rain' in spain falls mainly on the plain. I said the plain in spain";
var result = new Dictionary<string,string>();
Regex.Matches(text, @"[^\s]+")
.OfType<Match>()
.Select(m => Regex.Replace(m.Value, @"\W", string.Empty))
.ToList()
.ForEach(word =>
{
if (!result.ContainsKey(word))
result.Add(word, word);
});
result

This is almost certainly a job for regular expressions. Splitting on \W+ (one or more non-word characters) breaks your input string into words, where a word is any sequence of letters, digits, or underscores. See the documentation.
string sentence = "You are pretty. State-of-the-art.";
string[] words = Regex.Split(sentence, @"\W+");
foreach (string word in words)
{
if (word != "")
{
Console.WriteLine(word);
}
}
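Since \W+ still lets digits through, one alternative is to match what should be kept rather than split on what should be dropped. The following is a minimal sketch, assuming a "word" is a run of Unicode letters (\p{L}+); the class name WordCounter and the method CountWords are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordCounter
{
    // Counts words case-insensitively. A word is assumed to be a run of
    // Unicode letters, so digits, quotes and punctuation never match.
    public static Dictionary<string, int> CountWords(string text)
    {
        return Regex.Matches(text, @"\p{L}+")
            .Cast<Match>()
            .Select(m => m.Value)
            .GroupBy(w => w, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
    }
}
```

Because matching never produces empty strings, no extra filter is needed. One limitation of this assumption: contractions and hyphenated terms are broken at the apostrophe or hyphen, so "State-of-the-art" is counted as four words.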

Related

C# Replace All Vowels in Generic List

How do I replace all vowels in list with a space? Following code does not seem to be working.
List<string> Instruments = new List<string>();
Instruments.Add("cello");
Instruments.Add("guitar");
Instruments.Add("violin");
Instruments.Add("double bass");
string vowels = "a e i o u y";
Instruments.ForEach(w=>vowels = vowels.Replace(w,""));
Expected Result:
cll
gtr
vln
Try this
var vowels = new List<char> {'a','e','i','o','u','y'};
var result = new List<string>();
Instruments.ForEach(w => result.Add(new string(w.Select(x => vowels.Any(y => y == x) ? ' ' : x).ToArray())));
You should use .Select if you need to change the values in a collection:
List<string> Instruments = new List<string>();
Instruments.Add("cello");
Instruments.Add("guitar");
Instruments.Add("violin");
Instruments.Add("double bass");
var regex = new Regex("[aeiou]", RegexOptions.IgnoreCase);
var withoutVowels = from instr in Instruments
select regex.Replace(instr, string.Empty);
foreach (var item in withoutVowels)
{
Console.WriteLine(item);
}
A little quick and dirty but this works for me.
List<string> Instruments = new List<string>();
var newList = new List<String>();
Instruments.Add("cello");
Instruments.Add("guitar");
Instruments.Add("violin");
Instruments.Add("double bass");
List<string> vowels = new List<string> { "a", "e", "i", "o", "u", "y" };
Instruments.ForEach(w =>
{
var temp = w;
vowels.ForEach(v =>
{
temp = temp.Replace(v, "");
});
newList.Add(temp);
});
newList.ForEach(w => Console.WriteLine(w));
You could try the same with Regex:
public void ReplaceAllVowels()
{
List<string> Instruments = new List<string>();
Instruments.Add("cello");
Instruments.Add("guitar");
Instruments.Add("violin");
Instruments.Add("double bass");
var pattern = new Regex("[aeiouy]");
var lst = Instruments.Select(i => pattern.Replace(i, "")).ToList();
foreach (var item in lst)
{
Console.WriteLine(item);
}
}
Instruments holds a collection of strings (words). Your .ForEach iterates through that collection, with each element as w, but your call to .Replace doesn't affect those elements; it uses them as arguments in an operation on the string vowels.
See the String.Replace MSDN documentation.
As such, you also need to iterate through your vowels string and use each vowel in w.Replace(v, " "), where w is a word in Instruments and v is a vowel in vowels:
So, it should be Instruments.ForEach(w => { foreach (var vowelChar in vowels.Split(' ')) { w = w.Replace(vowelChar, " "); } });
Note: @mjwills pointed out the other issue with this operation in the comments: the assignment to w won't persist here. So you'll need to create a new List<string> in some fashion to persist it (either declare it beforehand and add to it during iteration, or use the LINQ .ToList<T> extension of IEnumerable<T>).
However, that is inefficient, since you're essentially creating a string[] from the vowels string using .Split on each iteration of Instruments.
Instead, you should define your vowels as a char[] to avoid that operation: var vowels = new char[] {'a','e','i','o','u', 'y'};
List<string> Instruments = new List<string>();
Instruments.Add("cello");
Instruments.Add("guitar");
Instruments.Add("violin");
Instruments.Add("double bass");
char[] vowels = new char[] {'a','e','i','o','u','y'};
Instruments.ForEach(w => {
foreach (char v in vowels) {
w = w.Replace(v, ' ');
}
});
EDIT: assignments to w inside the iteration don't persist in Instruments, so you need to create a new List<string> for your results.
List<string> results = new List<string>();
Instruments.ForEach(w => {
foreach (char v in vowels) {
w = w.Replace(v, ' ');
}
// add the word once, after all vowels have been replaced
results.Add(w);
});
results.ForEach(w => Console.WriteLine(w));
This works nicely for me:
List<string> Instruments = new List<string>();
Instruments.Add("cello");
Instruments.Add("guitar");
Instruments.Add("violin");
Instruments.Add("double bass");
string vowels = "aeiouy";
var results = vowels.Aggregate(Instruments,
(i, v) => i.Select(x => x.Replace(v.ToString(), "")).ToList());
I get:
cll
gtr
vln
dbl bss
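For comparison, the same result can be produced in a single pass per word by filtering characters against a set of vowels. This is a sketch only: VowelStripper and Strip are hypothetical names, and treating y and the upper-case vowels as vowels is an assumption based on the lists used in the answers above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VowelStripper
{
    // Assumption: y counts as a vowel, and upper-case vowels are removed too.
    private static readonly HashSet<char> Vowels = new HashSet<char>("aeiouyAEIOUY");

    // Returns a new list; the input collection is left unchanged.
    public static List<string> Strip(IEnumerable<string> words)
    {
        return words
            .Select(w => new string(w.Where(c => !Vowels.Contains(c)).ToArray()))
            .ToList();
    }
}
```

To replace vowels with a space instead of removing them, the inner expression could map each character with Select(c => Vowels.Contains(c) ? ' ' : c) rather than filtering with Where.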

How to find the duplicates in the given string in c#

I want to find the duplicates in a given string. I tried it for collections and it works fine, but I don't know how to do it for a string.
Here is the code I tried for collections:
string name = "this is a a program program";
string[] arr = name.Split(' ');
var myList = new List<string>();
var duplicates = new List<string>();
foreach(string res in arr)
{
if (!myList.Contains(res))
{
myList.Add(res);
}
else
{
duplicates.Add(res);
}
}
foreach(string result in duplicates)
{
Console.WriteLine(result);
}
Console.ReadLine();
But I want to find the duplicates in the string below and store them in an array. How do I do that?
e.g. string aa = "elements";
In the above string I want to find the duplicate characters and store them in an array.
Can anyone help me?
Linq solution:
string name = "this is a a program program";
String[] result = name.Split(' ')
.GroupBy(word => word)
.Where(chunk => chunk.Count() > 1)
.Select(chunk => chunk.Key)
.ToArray();
Console.Write(String.Join(Environment.NewLine, result));
The same principle works for duplicate characters within a string:
String source = "elements";
Char[] result = source
.GroupBy(c => c)
.Where(chunk => chunk.Count() > 1)
.Select(chunk => chunk.Key)
.ToArray();
// result = ['e']
Console.Write(String.Join(Environment.NewLine, result));
string name = "elements";
var myList = new List<char>();
var duplicates = new List<char>();
foreach (char res in name)
{
if (!myList.Contains(res))
{
myList.Add(res);
}
else if (!duplicates.Contains(res))
{
duplicates.Add(res);
}
}
foreach (char result in duplicates)
{
Console.WriteLine(result);
}
Console.ReadLine();
A string is a sequence of chars, so you can use your collection approach.
But I would recommend a typed HashSet. Just load it with the string and you'll get the set of chars without duplicates. (HashSet<T> does not guarantee any particular order, so don't rely on insertion order being preserved.)
Take a look:
string s = "aaabbcdaaee";
HashSet<char> hash = new HashSet<char>(s);
HashSet<char> hashDup = new HashSet<char>();
foreach (var c in s)
if (hash.Contains(c))
hash.Remove(c);
else
hashDup.Add(c);
foreach (var x in hashDup)
Console.WriteLine(x);
Console.ReadKey();
Instead of a List<> I'd use a HashSet<> because it doesn't allow duplicates and Add returns false in that case. It's more efficient. I'd also use a Dictionary<TKey,TValue> instead of the list to track the count of each char:
string text = "elements";
var duplicates = new HashSet<char>();
var duplicateCounts = new Dictionary<char, int>();
foreach (char c in text)
{
int charCount = 0;
bool isDuplicate = duplicateCounts.TryGetValue(c, out charCount);
duplicateCounts[c] = ++charCount;
if (isDuplicate)
duplicates.Add(c);
}
Now you have all unique duplicate chars in the HashSet and the count of each unique char in the dictionary. In this example the set only contains e because it appears three times in the string.
So you could output it in the following way:
foreach(char dup in duplicates)
Console.WriteLine("Duplicate char {0} appears {1} times in the text."
, dup
, duplicateCounts[dup]);
For what it's worth, here's a LINQ one-liner which also creates a Dictionary that only contains the duplicate chars and their count:
Dictionary<char, int> duplicateCounts = text
.GroupBy(c => c)
.Where(g => g.Count() > 1)
.ToDictionary(g => g.Key, g => g.Count());
I've shown it as second approach because you should first understand the standard way.
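The remark that HashSet.Add returns false for an element that is already present can be used directly to detect duplicates. A minimal sketch follows; DuplicateFinder and FindDuplicateChars are illustrative names, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateFinder
{
    // Returns each character that occurs more than once,
    // in the order in which its second occurrence is seen.
    public static char[] FindDuplicateChars(string text)
    {
        var seen = new HashSet<char>();
        var duplicates = new List<char>();
        foreach (char c in text)
        {
            // Add returns false when c is already in the set.
            if (!seen.Add(c) && !duplicates.Contains(c))
                duplicates.Add(c);
        }
        return duplicates.ToArray();
    }
}
```

A List<char> is used for the result so that the output order is well defined; for long inputs with many distinct duplicates a second HashSet<char> would make the Contains check cheaper.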
string name = "this is a a program program";
var arr = name.Split(' ').ToArray();
var dup = arr.Where(p => arr.Count(q => q == p) > 1).Select(p => p);
HashSet<string> hash = new HashSet<string>(dup);
string duplicate = string.Join(" ", hash);
You can do this through LINQ:
string name = "this is a a program program";
var d = name.Split(' ').GroupBy(x => x).Select(y => new { word = y.Key, Wordcount = y.Count() }).Where(z => z.Wordcount > 1).ToList();
Use LINQ to group values:
public static IEnumerable<T> GetDuplicates<T>(this IEnumerable<T> list)
{
return list.GroupBy(item => item).SelectMany(group => group.Skip(1));
}
public static bool HasDuplicates<T>(this IEnumerable<T> list)
{
return list.GetDuplicates().Any();
}
Then you use these extensions like this:
var list = new List<string> { "a", "b", "b", "c" };
var duplicatedValues = list.GetDuplicates();

How do I print output in a certain way

I have written this code using LINQ query
static public void BracesRule(String input)
{
//Regex for Braces
string BracesRegex = @"\{|\}";
Dictionary<string, string> dictionaryofBraces = new Dictionary<string, string>()
{
//{"String", StringRegex},
//{"Integer", IntegerRegex },
//{"Comment", CommentRegex},
//{"Keyword", KeywordRegex},
//{"Datatype", DataTypeRegex },
//{"Not included in language", WordRegex },
//{"Identifier", IdentifierRegex },
//{"Parenthesis", ParenthesisRegex },
{"Brace", BracesRegex },
//{"Square Bracket", ArrayBracketRegex },
//{"Puncuation Mark", PuncuationRegex },
//{"Relational Expression", RelationalExpressionRegex },
//{"Arithmetic Operator", ArthimeticOperatorRegex },
//{"Whitespace", WhitespaceRegex }
};
var matches = dictionaryofBraces.SelectMany(a => Regex.Matches(input, a.Value)
.Cast<Match>()
.Select(b =>
new
{
Index = b.Index,
Value = b.Value,
Token = a.Key
}))
.OrderBy(a => a.Index).ToList();
for (int i = 0; i < matches.Count; i++)
{
if (i + 1 < matches.Count)
{
int firstEndPos = (matches[i].Index + matches[i].Value.Length);
if (firstEndPos > matches[(i + 1)].Index)
{
matches.RemoveAt(i + 1);
i--;
}
}
}
foreach (var match in matches)
{
Console.WriteLine(match);
}
}
Its output looks something like this:
{Index=0, Value= {, Token=Brace}
But I want the output to look like this:
{ BRACE
One possibility would be to modify the anonymous object - create the string from the Key(=Brace) and the Value(={ or }):
string input = "ali{}";
//Regex for Braces
string BracesRegex = @"\{|\}";
Dictionary<string, string> dictionaryofBraces = new Dictionary<string, string>()
{
{"Brace", BracesRegex }
};
var matches = dictionaryofBraces.SelectMany(a => Regex.Matches(input, a.Value)
.Cast<Match>()
.Select(b => String.Format("{0} {1}", b.Value, a.Key.ToUpper())))
.OrderBy(a => a).ToList();
foreach (var match in matches)
{
Console.WriteLine(match);
}
The output is as desired:
{ BRACE
} BRACE
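Note that the answer above sorts the formatted strings themselves, so the output order follows the text of each line rather than the position of the brace in the input. A sketch that keeps input order is shown below, assuming only the single brace pattern is needed; BraceTokenizer and Tokenize are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BraceTokenizer
{
    // Formats every brace as "<brace> BRACE".
    // Regex.Matches returns matches left to right, so no sorting is needed
    // to keep the tokens in the order they appear in the input.
    public static List<string> Tokenize(string input)
    {
        return Regex.Matches(input, @"[{}]")
            .Cast<Match>()
            .Select(m => m.Value + " BRACE")
            .ToList();
    }
}
```

With several token patterns, as in the commented-out dictionary entries of the question, the matches would still need to be ordered by Index before being formatted.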

Split string to Dictionary<string, int>

I have a string like this: "content;123 contents;456 contentss;789 " etc.
I would like to split this string to get a Dictionary, but I don't know how to do it. I tried to split the string, but I only got a List.
The content (before the semicolon) is always a unique string.
After the semicolon, there is always a number, up to the next space.
The number is always an int (no floats needed).
Could someone help me, please?
You can use the following LINQ expression:
"content;123 contents;456 contentss;789"
.Split(' ')
.Select(x => x.Split(';'))
.ToDictionary(x => x[0], x => int.Parse(x[1]));
string input = "content1;123 content2;456 content3;789";
var dict = Regex.Matches(input, @"(\S+?);(\d+)").Cast<Match>()
.ToDictionary(m => m.Groups[1].Value, m => int.Parse(m.Groups[2].Value));
You can do something like this:
string value = "content;123 contents;456 contentss;789";
Dictionary<string, int> data = new Dictionary<string,int>();
foreach(string line in value.Split(' '))
{
string[] values = line.Split(';');
if (!data.ContainsKey(values[0]))
{
data.Add(values[0], Convert.ToInt32(values[1]));
}
}
var myList = "content1;123 content2;456 content3;789";
var myDictionary = myList.Split(' ').Select(pair => pair.Split(';')).ToDictionary(splitPair => splitPair[0], splitPair => int.Parse(splitPair[1]));
static void Main(string[] args)
{
string content = "content;123 contents;456 contentss;789";
Dictionary<string, int> result = new Dictionary<string, int>();
content.Split(' ').ToList().ForEach(x =>
{
var items = x.Split(';');
result.Add(items[0], int.Parse(items[1]));
});
foreach(var item in result)
{
Console.WriteLine("{0} -> {1}" , item.Key, item.Value);
}
}
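The string in the question ends with a space, so Split(' ') produces an empty last entry, and the split-based answers above would then fail when indexing the second part. A more tolerant sketch is shown below; PairParser and Parse are illustrative names, and skipping malformed pairs (rather than throwing) is an assumption about the desired behaviour.

```csharp
using System;
using System.Collections.Generic;

public static class PairParser
{
    // Parses "key;number" pairs separated by spaces.
    // Empty entries and pairs without a valid integer are skipped.
    public static Dictionary<string, int> Parse(string input)
    {
        var result = new Dictionary<string, int>();
        string[] pairs = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string pair in pairs)
        {
            string[] parts = pair.Split(';');
            int number;
            if (parts.Length == 2 && int.TryParse(parts[1], out number))
                result[parts[0]] = number; // last value wins on a repeated key
        }
        return result;
    }
}
```

If malformed input should be reported instead of ignored, int.Parse and Dictionary.Add can be used so that an exception is raised.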

How to split string into a dictionary

I have this string
string sx="(colorIndex=3)(font.family=Helvetica)(font.bold=1)";
and am splitting it with
string [] ss=sx.Split(new char[] { '(', ')' },
StringSplitOptions.RemoveEmptyEntries);
Instead of that, how could I split the result into a Dictionary<string,string>? The
resulting dictionary should look like:
Key Value
colorIndex 3
font.family Helvetica
font.bold 1
It can be done using LINQ ToDictionary() extension method:
string s1 = "(colorIndex=3)(font.family=Helvicta)(font.bold=1)";
string[] t = s1.Split(new[] { '(', ')' }, StringSplitOptions.RemoveEmptyEntries);
Dictionary<string, string> dictionary =
t.ToDictionary(s => s.Split('=')[0], s => s.Split('=')[1]);
EDIT: The same result can be achieved without splitting twice:
Dictionary<string, string> dictionary =
t.Select(item => item.Split('=')).ToDictionary(s => s[0], s => s[1]);
There may be more efficient ways, but this should work:
string sx = "(colorIndex=3)(font.family=Helvicta)(font.bold=1)";
var items = sx.Split(new[] { '(', ')' }, StringSplitOptions.RemoveEmptyEntries)
.Select(s => s.Split(new[] { '=' }));
Dictionary<string, string> dict = new Dictionary<string, string>();
foreach (var item in items)
{
dict.Add(item[0], item[1]);
}
Randal Schwartz has a rule of thumb: use split when you know what you want to throw away or regular expressions when you know what you want to keep.
You know what you want to keep:
string sx="(colorIndex=3)(font.family=Helvetica)(font.bold=1)";
Regex pattern = new Regex(@"\((?<name>.+?)=(?<value>.+?)\)");
var d = new Dictionary<string,string>();
foreach (Match m in pattern.Matches(sx))
d.Add(m.Groups["name"].Value, m.Groups["value"].Value);
With a little effort, you can do it with ToDictionary:
var d = Enumerable.ToDictionary(
Enumerable.Cast<Match>(pattern.Matches(sx)),
m => m.Groups["name"].Value,
m => m.Groups["value"].Value);
Not sure whether this looks nicer:
var d = Enumerable.Cast<Match>(pattern.Matches(sx)).
ToDictionary(m => m.Groups["name"].Value,
m => m.Groups["value"].Value);
string sx = "(colorIndex=3)(font.family=Helvetica)(font.bold=1)";
var dict = sx.Split(new[] { '(', ')' }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Split('='))
.ToDictionary(x => x[0], y => y[1]);
var dict = (from x in s1.Split(new[] { '(', ')' }, StringSplitOptions.RemoveEmptyEntries)
select new { s = x.Split('=') }).ToDictionary(x => x.s[0], x => x.s[1]);
Often used for http query splitting.
Usage: Dictionary<string, string> dict = stringToDictionary("userid=abc&password=xyz&retain=false");
public static Dictionary<string, string> stringToDictionary(string line, char stringSplit = '&', char keyValueSplit = '=')
{
return line.Split(new[] { stringSplit }, StringSplitOptions.RemoveEmptyEntries).Select(s => s.Split(new[] { keyValueSplit })).ToDictionary(x => x[0], y => y[1]);
}
You can try
string sx = "(colorIndex=3)(font.family=Helvetica)(font.bold=1)";
var keyValuePairs = sx.Split(new[] { '(', ')' }, StringSplitOptions.RemoveEmptyEntries)
.Select(v => v.Split('='))
.ToDictionary(v => v.First(), v => v.Last());
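The split-based answers above assume that every item contains exactly one '='. If a value may itself contain '=', limiting the split to two parts keeps the value intact. A minimal sketch, with the hypothetical names KeyValueParser and Parse, and assuming that items without any '=' should simply be ignored:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyValueParser
{
    // Parses "(key=value)(key=value)..." into a dictionary.
    // Split with a count of 2 cuts only at the first '=',
    // so "expr=a=b" yields key "expr" and value "a=b".
    public static Dictionary<string, string> Parse(string input)
    {
        return input
            .Split(new[] { '(', ')' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(item => item.Split(new[] { '=' }, 2))
            .Where(parts => parts.Length == 2) // ignore items without '='
            .ToDictionary(parts => parts[0], parts => parts[1]);
    }
}
```

ToDictionary still throws on a repeated key; whether that is desirable depends on the input.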
You could do this with regular expressions:
string sx = "(colorIndex=3)(font.family=Helvetica)(font.bold=1)";
Dictionary<string,string> dic = new Dictionary<string,string>();
Regex re = new Regex(@"\(([^=]+)=([^=]+)\)");
foreach(Match m in re.Matches(sx))
{
dic.Add(m.Groups[1].Value, m.Groups[2].Value);
}
// extract values, to prove correctness of function
foreach(var s in dic)
Console.WriteLine("{0}={1}", s.Key, s.Value);
I am just putting this here for reference...
For ASP.net, if you want to parse a string from the client side into a dictionary this is handy...
Create a JSON string on the client side either like this:
var args = "{'A':'1','B':'2','C':'" + varForC + "'}";
or like this:
var args = JSON.stringify({ 'A':1, 'B':2, 'C':varForC });
or even like this:
var obj = {};
obj.A = 1;
obj.B = 2;
obj.C = varForC;
var args = JSON.stringify(obj);
pass it to the server...
then parse it on the server side like this:
JavaScriptSerializer jss = new JavaScriptSerializer();
Dictionary<String, String> dict = jss.Deserialize<Dictionary<String, String>>(args);
JavaScriptSerializer is in the System.Web.Script.Serialization namespace (System.Web.Extensions assembly).
