How to split IEnumerable<string> to groups by separator? - c#

I have an IEnumerable<string> that represents a text file.
The text file has this structure:
Number of group ( int )
WordOfGroup1 (string)
WordOfGroup2
WordOfGroupN
EmptyLine
Number of group ( int )
WordOfGroup1 (string)
etc.
I need to build a Dictionary<firstWordOfGroup (string), allWordsInGroup (List<string>)> from this text.
How can I do that in linear time?

Try the algorithm below. This will add a group of words to the dictionary whenever it comes across an empty line.
List<string> input = new List<string>()
{
    "1",
    "wordOfGroup11",
    "wordOfGroup12",
    "wordOfGroup1N",
    "\n",
    "2",
    "wordOfGroup21",
    "wordOfGroup22",
    "\n"
};
Dictionary<string, List<string>> result = new Dictionary<string, List<string>>();
string firstWordOfGroup = "";
List<string> allWordsInGroup = new List<string>();
foreach (string line in input)
{
    if (int.TryParse(line, out _))
    {
        // Start a fresh list for each group. Clearing one shared list would
        // also empty the lists already stored in the dictionary.
        allWordsInGroup = new List<string>();
        continue;
    }
    // I don't know what "EmptyLine" means, so any blank or whitespace-only line counts as a separator
    if (string.IsNullOrWhiteSpace(line))
    {
        result.Add(firstWordOfGroup, allWordsInGroup);
    }
    else
    {
        if (allWordsInGroup.Count == 0)
        {
            firstWordOfGroup = line;
        }
        allWordsInGroup.Add(line);
    }
}
Also note that if your groups can have the same first word (e.g. both starting with "WordOfGroup1"), you should use a List<KeyValuePair<string, List<string>>> instead, because a dictionary cannot store duplicate keys.
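As a rough sketch of that duplicate-friendly variant: the helper below makes one pass over the lines, so it runs in linear time. The names GroupSplitter and SplitIntoGroups are made up for illustration, and the code assumes that the first line of every group is its number and that a blank line ends the group.

```csharp
using System;
using System.Collections.Generic;

public static class GroupSplitter
{
    // Hypothetical helper: returns one (firstWord, words) pair per group.
    // Duplicate first words are allowed because the result is a list, not a dictionary.
    public static List<KeyValuePair<string, List<string>>> SplitIntoGroups(IEnumerable<string> lines)
    {
        var groups = new List<KeyValuePair<string, List<string>>>();
        List<string> current = null;

        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
            {
                // A blank line closes the current group, if there is one.
                Flush(groups, current);
                current = null;
            }
            else if (current == null)
            {
                // Assumption: the first line of a group is its number, so it is skipped.
                current = new List<string>();
            }
            else
            {
                current.Add(line);
            }
        }

        // The last group may not be followed by a blank line.
        Flush(groups, current);
        return groups;
    }

    private static void Flush(List<KeyValuePair<string, List<string>>> groups, List<string> current)
    {
        if (current != null && current.Count > 0)
        {
            groups.Add(new KeyValuePair<string, List<string>>(current[0], current));
        }
    }
}
```

Calling GroupSplitter.SplitIntoGroups(File.ReadLines(path)) would stream the file without loading it all into memory first.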

Related

Remove duplicate combination of numbers in C# from csv

I'm trying to remove the duplicate combination from a csv file.
I tried using Distinct but it seems to stay the same.
string path;
string newcsvpath = @"C:\Documents and Settings\MrGrimm\Desktop\clean.csv";
OpenFileDialog openfileDial = new OpenFileDialog();
if (openfileDial.ShowDialog() == DialogResult.OK)
{
path = openfileDial.FileName;
var lines = File.ReadLines(path);
var grouped = lines.GroupBy(line => string.Join(", ", line.Split(',').Distinct())).ToArray();
var unique = grouped.Select(g => g.First());
var buffer = new StringBuilder();
foreach (var name in unique)
{
string value = name;
buffer.AppendLine(value);
}
File.WriteAllText(newcsvpath ,buffer.ToString());
label5.Text = "Complete";
}
For example, I have a combination of
{ 1,1,1,1,1,1,1,1 } { 1,1,1,1,1,1,1,2 }
{ 2,1,1,1,1,1,1,1 } { 1,1,1,2,1,1,1,1 }
The output should be
{ 1,1,1,1,1,1,1,1 }
{ 2,1,1,1,1,1,1,1 }
From your example, it seems that you want to treat each line as a sequence of numbers, and that you consider two lines equal if one sequence is a permutation of the other.
So from reading your file, you have:
var lines = new[]
{
"1,1,1,1,1,1,1,1",
"1,1,1,1,1,1,1,2",
"2,1,1,1,1,1,1,1",
"1,1,1,2,1,1,1,1"
};
Now let's convert it to an array of number sequences:
var linesAsNumberSequences = lines.Select(line => line.Split(',')
.Select(int.Parse)
.ToArray())
.ToArray();
Or better, since we are not interested in permutations, we can sort the numbers in the sequences immediately:
var linesAsSortedNumberSequences = lines.Select(line => line.Split(',')
.Select(int.Parse)
.OrderBy(number => number)
.ToArray())
.ToArray();
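When using Distinct on this, we have to pass a comparer that considers two arrays equal if they contain the same elements in the same order. Let's use the one from this SO question (IEnumerableComparer<T> is not a framework type, so you need to add it to your project):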
var result = linesAsSortedNumberSequences.Distinct(new IEnumerableComparer<int>());
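Since the linked comparer is not reproduced here, the following is only a sketch of what such a comparer could look like. SequenceComparer<T> and PermutationDeduper are hypothetical names, not the types from the linked question; the only framework pieces used are IEqualityComparer<T>, Enumerable.SequenceEqual and Enumerable.Distinct.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two arrays are equal when they hold equal elements in the same order.
public sealed class SequenceComparer<T> : IEqualityComparer<T[]>
{
    public bool Equals(T[] x, T[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(T[] obj)
    {
        if (obj == null) return 0;
        unchecked
        {
            // Combine element hashes so that equal sequences get equal hash codes.
            int hash = 17;
            foreach (T item in obj)
            {
                hash = hash * 31 + EqualityComparer<T>.Default.GetHashCode(item);
            }
            return hash;
        }
    }
}

public static class PermutationDeduper
{
    // Sorts the numbers on each line, then keeps one line per distinct sorted sequence.
    public static List<string> DistinctUpToPermutation(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Split(',').Select(s => int.Parse(s)).OrderBy(n => n).ToArray())
            .Distinct(new SequenceComparer<int>())
            .Select(numbers => string.Join(",", numbers))
            .ToList();
    }
}
```

Note that this sketch returns the sorted form of each line, so a line such as 2,1,1,1,1,1,1,1 comes back as 1,1,1,1,1,1,1,2. If the original text of the first matching line must be preserved, group the original lines by their sorted sequence instead and take the first line of each group.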
Try this:
HashSet<string> record = new HashSet<string>();
foreach (DataRow row in dtCSV.Rows)
{
    StringBuilder textEditor = new StringBuilder();
    foreach (string col in columns)
    {
        textEditor.AppendFormat("[{0}={1}]", col, row[col].ToString());
    }
    if (!record.Add(textEditor.ToString()))
    {
        // Add returned false: this row duplicates an earlier one, so skip or remove it here.
    }
}

How to do cascade splitting with C# Linq - multiple foreach split

These are the separators I want to split the string on, one after another:
List<string> lstsplitWord = new List<string> { ",", "=", "،", "أو", "او", "/", "." };
I have written it like this, but I assume there must be a more elegant LINQ solution:
foreach(var part1 in srSplitPart.Split(',')) {
foreach(var part2 in part1.Split('=')) {
foreach(var part3 in part2.Split('،')) {
foreach(var part4 in part3.func_Split_By_String("أو")) {
foreach(var part5 in part4.func_Split_By_String("او")) {
foreach(var part6 in part5.Split('/')) {
foreach(var part7 in part6.Split('.')) {
if (part7.Length < 3)
continue;
string srTrans = part7.FixArabic().func_Special_Trim();
srTemp.AppendLine($"{srTitle} > {srTrans} \t {irTransLevel}");
irTransLevel++;
}
}
}
}
}
}
}
C# .net 4.6.2
special split function
public static List<string> func_Split_By_String(this string Sentence, string srReplace)
{
return Sentence.Split(new string[] { srReplace }, StringSplitOptions.None).ToList();
}
You can just iteratively split every element to smaller parts in a given order:
string originalString = ...;
List<string> separators = new List<string> { ",", "=", "،", "أو", "او", "/", "." };
string[] result = new[] { originalString };
foreach (var separator in separators)
{
result = result.SelectMany(x => x.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries)).ToArray();
}
result = result
.Where(x => x.Length >= 3)
.Select(x => x.FixArabic().func_Special_Trim())
.ToArray();
foreach (var item in result)
{
    // item is already fixed and trimmed, so it takes the place of srTrans
    srTemp.AppendLine($"{srTitle} > {item} \t {irTransLevel}");
    irTransLevel++;
}
At the beginning, the array contains only your original string.
After the first foreach iteration, the array contains the original string split on ",".
After the second iteration, every comma-separated part has been split on "=".
This repeats until the result array contains only strings that have been split on all the given separators. The code then applies the Length >= 3 condition, FixArabic() and func_Special_Trim().
Update: I have just realized that, as long as no separator contains or overlaps another (which holds for this list), applying the separators one at a time in a given order should produce the same string array as applying them all at once.
So, actually, you can just do:
string originalString = ...;
string[] separators = new[] { ",", "=", "،", "أو", "او", "/", "." };
string[] result = originalString
.Split(separators, StringSplitOptions.RemoveEmptyEntries)
.Where(x => x.Length >= 3)
.Select(x => x.FixArabic().func_Special_Trim())
.ToArray();
foreach (var item in result)
{
    // item is already fixed and trimmed, so it takes the place of srTrans
    srTemp.AppendLine($"{srTitle} > {item} \t {irTransLevel}");
    irTransLevel++;
}
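To check that the iterative and the single-call approaches agree on your own data, a small comparison sketch like the one below may help. CascadeSplit, Iterative and SinglePass are illustrative names only; both methods rely on nothing beyond String.Split and LINQ.

```csharp
using System;
using System.Linq;

public static class CascadeSplit
{
    // Splits on one separator at a time, in the order given.
    public static string[] Iterative(string text, string[] separators)
    {
        string[] parts = { text };
        foreach (string separator in separators)
        {
            parts = parts
                .SelectMany(p => p.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries))
                .ToArray();
        }
        return parts;
    }

    // Splits on all separators in a single call.
    public static string[] SinglePass(string text, string[] separators)
    {
        return text.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Comparing the two results with SequenceEqual on a few representative inputs is a cheap way to confirm the equivalence before dropping the loop. If one separator were a substring of another, the two approaches could diverge, so the check is worth repeating whenever the separator list changes.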

Split a string based on multiple delimiters specified by user

Updated: Thank you for the answer, but I disagree that my question is answered by another thread. "Multiple delimiters" and "Multi-Character delimiters" are 2 different questions.
This is my code so far:
List<string> delimiters = new List<string>();
List<string> data = new List<string>
{
"Car|cBlue,Mazda~Model|m3",
//More data
};
string userInput = "";
int i = 1;
//The user can enter a maximum of 5 delimiters
while (userInput != "go" && i <= 5)
{
userInput = Console.ReadLine();
delimiters.Add(userInput);
i++;
}
foreach (string delimiter in delimiters)
{
foreach (string s in data)
{
//This split is not working
//string output[] = s.Split(delimiter);
}
}
So, if the user enters "|c" and "~", the expected output is: "Car", "Blue,Mazda", "Model|m3"
If the user enters "|c", "|m", and ",", then the expected output will be: "Car", "Blue", "Mazda~Model", "3"
Add the user input into the List delimiters.
string data = "Car|cBlue,Mazda~Model|m3";
List<string> delimiters = new List<string>();
delimiters.Add("|c");//Change this to user input
delimiters.Add("|m");//change this to user input
string[] parts = data.Split(delimiters.ToArray(), StringSplitOptions.RemoveEmptyEntries);
foreach (string item in parts)
{
Console.WriteLine(item);
}
String.Split has an overload that does exactly that - you just need to convert your List<string> to a string[] :
string input = "Car|cBlue,Mazda~Model|m3";
List<string> delims = new List<string> {"|c", "~"};
string[] out1 = input.Split(delims.ToArray(),StringSplitOptions.None);
//output:
// Car
// Blue,Mazda
// Model|m3
delims = new List<string> {"|c", "|m", ","};
string[] out2 = input.Split(delims.ToArray(),StringSplitOptions.None);
//output:
// Car
// Blue
// Mazda~Model
// 3
You can use SelectMany to collect the results from all the data strings, and the ToArray() method to turn the delimiters list into an array:
var result = data.SelectMany(s => s.Split(delimiters.ToArray(), StringSplitOptions.None));
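Putting the pieces together, here is a hedged sketch of the whole flow. DelimiterSplitter, CollectDelimiters and SplitAll are hypothetical names. One detail from the question is assumed to be unintended: its input loop also stores the sentinel "go" as a delimiter, so this version stops before adding it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DelimiterSplitter
{
    // Collects up to maxCount delimiters, stopping at the sentinel "go" without storing it.
    public static List<string> CollectDelimiters(IEnumerable<string> entered, int maxCount = 5)
    {
        return entered
            .TakeWhile(s => s != "go")
            .Where(s => !string.IsNullOrEmpty(s))
            .Take(maxCount)
            .ToList();
    }

    // Splits every data string on all delimiters at once and flattens the results.
    public static List<string> SplitAll(IEnumerable<string> data, IEnumerable<string> delimiters)
    {
        string[] separators = delimiters.ToArray();
        return data
            .SelectMany(s => s.Split(separators, StringSplitOptions.None))
            .ToList();
    }
}
```

In a console program the entered sequence could come from repeated Console.ReadLine() calls; passing the lines in as a parameter keeps the helper easy to test.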

Dictionary with multiple keys isn't working as expected

I'm making a console program in which multiple words are mapped to a category in a dictionary called keyLookup. I use if statements on the looked-up key to call Console.WriteLine("stuff"), but it only works when the key and the value in the dictionary are the same, and I don't know why. I've been experimenting with lists, foreach loops and some variables to figure out what I've done wrong, but the behaviour hasn't changed and it still doesn't work the way I want.
Also, if I type a word in Console.ReadLine() that isn't in my dictionary, the whole thing crashes. I don't want that, and I'm not sure why it happens, since at some point it didn't. My mathFunction dictionary works exactly how I want my keyLookup dictionary to work. I think the difference is in how I use a list to cross-reference keyLookup.
class MainClass
{
public static string Line;
static string foundKey;
public static void Main (string[] args)
{
while (true)
{
if (Line == null)
{Console.WriteLine ("Enter Input"); }
WordChecker ();
}
}
public static void WordChecker()
{
string inputString = Console.ReadLine ();
inputString = inputString.ToLower();
string[] stripChars = { ";", ",", ".", "-", "_", "^", "(", ")", "[", "]",
"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "\n", "\t", "\r" };
foreach (string character in stripChars)
{
inputString = inputString.Replace(character, "");
}
// Split on spaces into a List of strings
List<string> wordList = inputString.Split(' ').ToList();
// Define and remove stopwords
string[] stopwords = new string[] { "and", "the", "she", "for", "this", "you", "but" };
foreach (string word in stopwords)
{
// While there's still an instance of a stopword in the wordList, remove it.
// If we don't use a while loop on this each call to Remove simply removes a single
// instance of the stopword from our wordList, and we can't call Replace on the
// entire string (as opposed to the individual words in the string) as it's
// too indiscriminate (i.e. removing 'and' will turn words like 'bandage' into 'bdage'!)
while ( wordList.Contains(word) )
{
wordList.Remove(word);
}
}
// Create a new Dictionary object
Dictionary<string, int> dictionary = new Dictionary<string, int>();
// Loop over all over the words in our wordList...
foreach (string word in wordList)
{
// If the length of the word is at least three letters...
if (word.Length >= 3)
{
// ...check if the dictionary already has the word.
if ( dictionary.ContainsKey(word) )
{
// If we already have the word in the dictionary, increment the count of how many times it appears
dictionary[word]++;
}
else
{
// Otherwise, if it's a new word then add it to the dictionary with an initial count of 1
dictionary[word] = 1;
}
}
List<string> dicList = new List<string>();
dicList = dictionary.Keys.ToList ();
Dictionary<string, string> keyLookup = new Dictionary<string, string>();
keyLookup["hey"] = "greeting";
keyLookup["hi"] = "greeting";
keyLookup["greeting"] = "greeting";
keyLookup["math"] = "math";
keyLookup["calculate"] = "math";
keyLookup["equation"] = "math";
foundKey = keyLookup[word];
List<string> keyList = new List<string>();
foreach (string keyWord in dicList)
{
if(keyWord == foundKey)
{keyList.Add (keyWord); }
}
foreach (string mKey in keyList)
{
if(mKey == "greeting")
{Greetings ();}
if (mKey == "math")
{Math ();}
}
}
}
public static void Math()
{
Console.WriteLine ("What do you want me to math?");
Console.WriteLine ("input a number");
string input = Console.ReadLine ();
decimal a = Convert.ToDecimal (input);
Console.WriteLine("Tell me math function");
string mFunction = Console.ReadLine();
Console.WriteLine ("tell me another number");
string inputB = Console.ReadLine();
decimal b = Convert.ToDecimal (inputB);
Dictionary<string, string> mathFunction = new Dictionary<string, string>();
mathFunction["multiply"] = "multiply";
mathFunction["times"] = "multiply";
mathFunction["x"] = "multiply";
mathFunction["*"] = "multiply";
mathFunction["divide"] = "divide";
mathFunction["/"] = "divide";
mathFunction["subtract"] = "subtract";
mathFunction["minus"] = "subtract";
mathFunction["-"] = "subtract";
mathFunction["add"] = "add";
mathFunction["+"] = "add";
mathFunction["plus"] = "add";
string foundKey = mathFunction[mFunction];
if (foundKey == "add")
{
Console.WriteLine (a + b);
}
else if (foundKey == "subtract")
{
Console.WriteLine (a - b);
}
else if (foundKey == "multiply")
{
Console.WriteLine (a * b);
}
else if (foundKey == "divide")
{
Console.WriteLine (a / b);
}
else
{
Console.WriteLine ("not a math");
}
}
public static void Greetings()
{
Console.WriteLine("You said hello");
}
}
You should iterate through the dictionary differently (don't use ToList).
Try this instead:
foreach (KeyValuePair<string, string> kvp in testDictionary)
{
    Debug.WriteLine("Key:" + kvp.Key + " Value:" + kvp.Value);
}
And your application crashes when the word doesn't match because of this code (you're not creating a new entry that way):
// Otherwise, if it's a new word then add it to the dictionary with an initial count of 1
dictionary[word] = 1;
EDIT: I was wrong to say that dictionary[word] = 1 would not create a new element. It's perfectly fine as written.
foundKey = keyLookup[word];
If word doesn't exist in keyLookup, the indexer throws a KeyNotFoundException and the program crashes.
string foundKey = mathFunction[mFunction];
Likewise, if mFunction doesn't exist in mathFunction, this line throws and the program crashes.
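One way to avoid both crashes is Dictionary.TryGetValue, which reports a missing key instead of throwing. The sketch below reuses the keyLookup entries from the question; the class name KeywordLookup and the method FindCategory are made up for illustration.

```csharp
using System.Collections.Generic;

public static class KeywordLookup
{
    // Same word-to-category mapping as keyLookup in the question.
    private static readonly Dictionary<string, string> KeyLookup = new Dictionary<string, string>
    {
        { "hey", "greeting" },
        { "hi", "greeting" },
        { "greeting", "greeting" },
        { "math", "math" },
        { "calculate", "math" },
        { "equation", "math" },
    };

    // Returns the category for a word, or null when the word is not a known keyword.
    public static string FindCategory(string word)
    {
        return KeyLookup.TryGetValue(word, out string category) ? category : null;
    }
}
```

Dispatching directly on the returned category (for example, calling Greetings() when it is "greeting") also sidesteps the other symptom in the question: there, the category is compared against the words the user typed, so a match only happens when the typed word and its category are the same string.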
If you're trying to make this a "conversational" program, then the word look-up is the most important part. You don't use predicates or LINQ, both of which can make string handling much easier. Currently you use a Dictionary. Why not use a List for each group of keywords?
List<string> GreetingKeywords = new List<string>();
GreetingKeywords.Add("hello"); // ...
List<string> MathKeywords = new List<string>();
MathKeywords.Add("math"); // ...
foreach (var word in dicList)
{
    if (GreetingKeywords.Contains(word))
    { Greetings(); }
    if (MathKeywords.Contains(word))
    { Math(); }
}
I'd suggest you read up on predicates and on List/Dictionary methods such as Find, IndexOf and so on. That knowledge is invaluable in C#.

Occurrence of elements in the file with c# and Dictionary

I have a file like this:
outlook temperature Humidity Windy PlayTennis
sunny hot high false N
sunny hot high true N
overcast hot high false P
rain mild high false P
rain cool normal false P
rain cool normal true N
I want to find the number of occurrences of each element, e.g.
sunny: 2
rain: 3
overcast:1
hot: 3
and so on
My code is:
string file = openFileDialog1.FileName;
var text1 = File.ReadAllLines(file);
StringBuilder str = new StringBuilder();
string[] lines = File.ReadAllLines(file);
string[] nonempty=lines.Where(s => s.Trim(' ')!="")
.Select(s => Regex.Replace(s, @"\s+", " ")).ToArray();
string[] colheader = null;
if (nonempty.Length > 0)
colheader = nonempty[0].Split();
else
return;
var linevalue = nonempty.Skip(1).Select(l => l.Split());
int colcount = colheader.Length;
Dictionary<string, string> colvalue = new Dictionary<string, string>();
for (int i = 0; i < colcount; i++)
{
int k = 0;
foreach (string[] values in linevalue)
{
if(! colvalue.ContainsKey(values[i]))
{
colvalue.Add(values[i],colheader[i]);
}
label2.Text = label2.Text + k.ToString();
}
}
foreach (KeyValuePair<string, string> pair in colvalue)
{
label1.Text += pair.Key+ "\n";
}
Output I get here is
sunny
overcast
rain
hot
mild
cool
N
P
true
false
I also want to find the number of occurrences, which I am unable to get. Can you please help me out here?
This LINQ query will return Dictionary<string, int> which will contain each word in file as key, and word's occurrences as value:
var occurences = File.ReadAllLines(file).Skip(1) // skip titles line
.SelectMany(l => l.Split(new []{' '}, StringSplitOptions.RemoveEmptyEntries))
.GroupBy(w => w)
.ToDictionary(g => g.Key, g => g.Count());
Usage of dictionary:
int sunnyOccurences = occurences["sunny"];
foreach(var pair in occurences)
label1.Text += String.Format("{0}: {1}\n", pair.Key, pair.Value);
It seems to me that you are implementing a simple tag cloud. I have used non-generic collections here, but you can replace them with generic ones, for example Hashtable with Dictionary.
Follow this code:
Hashtable tagCloud = new Hashtable();
ArrayList frequency = new ArrayList();
Read from a file and store it as array
string[] lines = File.ReadAllLines("file.txt");
//use the specific delimiter
char[] delimiter = new char[] { ' ' };
StringBuilder buffer = new StringBuilder();
foreach (string line in lines)
{
if (line.ToString().Length != 0)
{
buffer.Append((" " + line.Trim()));
}
}
string[] words = buffer.ToString().Trim().Split(delimiter);
Storing occurrence of each word.
List<string> listOfWords = new List<string>(words);
foreach (string i in listOfWords)
{
int c = 0;
foreach (string j in words)
{
if (i.Equals(j))
c++;
}
frequency.Add(c);
}
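Store the results as key/value pairs. The key is the word and the value is its occurrence count.
for (int i = 0; i < listOfWords.Count; i++)
{
    // use a dictionary here; the indexer is used instead of Add because listOfWords can
    // contain the same word more than once, and Add throws on a duplicate key
    tagCloud[listOfWords[i]] = (int)frequency[i];
}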
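For comparison, here is a sketch of the generic Dictionary version hinted at above. It counts in a single pass rather than rescanning the word list for every word. WordCounter and CountWords are illustrative names, and the caller is assumed to skip the header row before passing the lines in.

```csharp
using System;
using System.Collections.Generic;

public static class WordCounter
{
    // Counts how often each whitespace-separated word occurs across all lines.
    public static Dictionary<string, int> CountWords(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        foreach (string line in lines)
        {
            // A null separator array makes Split break on any whitespace.
            foreach (string word in line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            {
                // TryGetValue leaves current at 0 when the word has not been seen yet.
                counts.TryGetValue(word, out int current);
                counts[word] = current + 1;
            }
        }
        return counts;
    }
}
```

With the file from the question, WordCounter.CountWords(File.ReadLines(file).Skip(1)) would give the per-word counts without building intermediate lists.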
If all you want is each keyword and a count of how many times it appears in the file, then lazyberezovsky's solution is about as elegant as you will find. But if you need to compute any other metrics on the file's data, I would load the file into a collection that keeps the rest of the row data intact.
Something simple like:
var forecasts = File.ReadAllLines(file).Skip(1) // skip the header row
.Select(line => line.Split(new []{' '}, StringSplitOptions.RemoveEmptyEntries)) // split the line into an array of strings
.Select (f =>
new
{
Outlook = f[0],
Temperature = f[1],
Humidity = f[2],
Windy = f[3],
PlayTennis = f[4]
});
will give you an IEnumerable<> of an anonymous type that has properties that can be queried.
For example if you wanted to see how many times "sunny" occurred in the Outlook then you could just use LINQ to do this:
var count = forecasts.Count( f => f.Outlook == "sunny");
Or if you just wanted the list of all outlooks you could write:
var outlooks = forecasts.Select(f => f.Outlook).Distinct();
Where this is useful is when you want to do more complicated queries, like "How many rainy cool days are there?":
var count = forecasts.Count (f => f.Outlook == "rain" && f.Temperature == "cool");
Again if you just want all words and their occurrence count, then this is overkill.
