How to do cascade splitting with C# LINQ - multiple foreach split

These are the separators I want to split the string by, one after another (in cascade):
List<string> lstsplitWord = new List<string> { ",", "=", "،", "أو", "او", "/", "." };
I have written it like this, but I assume there must be a more elegant LINQ solution:
foreach(var part1 in srSplitPart.Split(',')) {
foreach(var part2 in part1.Split('=')) {
foreach(var part3 in part2.Split('،')) {
foreach(var part4 in part3.func_Split_By_String("أو")) {
foreach(var part5 in part4.func_Split_By_String("او")) {
foreach(var part6 in part5.Split('/')) {
foreach(var part7 in part6.Split('.')) {
if (part7.Length < 3)
continue;
string srTrans = part7.FixArabic().func_Special_Trim();
srTemp.AppendLine($"{srTitle} > {srTrans} \t {irTransLevel}");
irTransLevel++;
}
}
}
}
}
}
}
I am using C# on .NET 4.6.2.
This is my custom split extension method:
public static List<string> func_Split_By_String(this string Sentence, string srReplace)
{
return Sentence.Split(new string[] { srReplace }, StringSplitOptions.None).ToList();
}

You can iteratively split every element into smaller parts, one separator at a time, in the given order:
string originalString = ...;
List<string> separators = new List<string> { ",", "=", "،", "أو", "او", "/", "." };
string[] result = new[] { originalString };
foreach (var separator in separators)
{
result = result.SelectMany(x => x.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries)).ToArray();
}
result = result
.Where(x => x.Length >= 3)
.Select(x => x.FixArabic().func_Special_Trim())
.ToArray();
foreach (var item in result)
{
srTemp.AppendLine($"{srTitle} > {item} \t {irTransLevel}");
irTransLevel++;
}
At the beginning, the array contains only your original string.
After the first foreach iteration, the array contains the original string split by ",".
After the second iteration, every comma-separated part is further split by "=".
This repeats until the result array contains only strings split by all the given separators. The code then applies the Length >= 3 condition, followed by FixArabic() and func_Special_Trim().
Update: I have just realized that, for these separators, applying them one at a time in order produces the same string array as applying all of them at once, because no separator contains or overlaps another.
So you can simply do:
string originalString = ...;
string[] separators = new[] { ",", "=", "،", "أو", "او", "/", "." };
string[] result = originalString
.Split(separators, StringSplitOptions.RemoveEmptyEntries)
.Where(x => x.Length >= 3)
.Select(x => x.FixArabic().func_Special_Trim())
.ToArray();
foreach (var item in result)
{
srTemp.AppendLine($"{srTitle} > {item} \t {irTransLevel}");
irTransLevel++;
}
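To check the equivalence claimed in the update, here is a minimal sketch that runs both approaches on the same input and compares the results. The sample string and the reduced ASCII separator list are made up for illustration; FixArabic() and func_Special_Trim() are left out because their definitions are not shown in the question.

```csharp
using System;
using System.Linq;

static class CascadeSplitDemo
{
    static void Main()
    {
        // Illustrative separators and input (assumptions, not the asker's data).
        string[] separators = { ",", "=", "/", "." };
        string input = "alpha,beta=gamma/delta.epsilon,,zeta";

        // Cascade: split every current piece by one separator at a time.
        string[] cascaded = { input };
        foreach (string separator in separators)
        {
            cascaded = cascaded
                .SelectMany(piece => piece.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries))
                .ToArray();
        }

        // Single call: pass all separators to String.Split at once.
        string[] direct = input.Split(separators, StringSplitOptions.RemoveEmptyEntries);

        Console.WriteLine(string.Join("|", cascaded));      // alpha|beta|gamma|delta|epsilon|zeta
        Console.WriteLine(cascaded.SequenceEqual(direct));  // True
    }
}
```

The two results can differ when one separator contains another or when separators can overlap in the text, so it is worth running a check like this on your own separator list.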

Related

How to split IEnumerable<string> to groups by separator?

I have an IEnumerable<string> that represents a txt file.
The txt file has this structure:
Number of group ( int )
WordOfGroup1 (string)
WordOfGroup2
WordOfGroupN
EmptyLine
Number of group ( int )
WordOfGroup1 (string)
etc.
I need to create a Dictionary<firstWordOfGroup (string), allWordsInGroup (List<string>)> from this text.
How can I do that in linear time?
Try the algorithm below. This will add a group of words to the dictionary whenever it comes across an empty line.
List<string> input = new List<string>()
{
"1",
"wordOfGroup11",
"wordOfGroup12",
"wordOfGroup1N",
"\n",
"2",
"wordOfGroup21",
"wordOfGroup22",
"\n"
};
Dictionary<string, List<string>> result = new Dictionary<string, List<string>>();
string firstWordOfGroup = "";
List<string> allWordsInGroup = new List<string>();
foreach (string line in input)
{
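if (int.TryParse(line, out int index))
{
// Start a new list for the next group. Calling Clear() here would also empty
// the list that was already stored in the dictionary, because it is the same object.
allWordsInGroup = new List<string>();
continue;
}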
// I don't know what "EmptyLine" means
if (line == "\n" || line == Environment.NewLine || line == string.Empty)
{
result.Add(firstWordOfGroup, allWordsInGroup);
}
else
{
if (allWordsInGroup.Count == 0)
{
firstWordOfGroup = line;
}
allWordsInGroup.Add(line);
}
}
Also note that if your groups can have the same first word (e.g. both starting with "WordOfGroup1"), then you should use a List<KeyValuePair<string, List<string>>>, because a dictionary cannot store duplicate keys.
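For comparison, here is a single-pass sketch of the same idea that gives every group its own list and does not depend on a trailing empty line. GroupLines is a hypothetical helper name. It assumes that the first non-blank line of each group is the group number, that groups are separated by blank (or whitespace-only) lines, and that a repeated first word should overwrite the earlier group.

```csharp
using System;
using System.Collections.Generic;

static class GroupLinesDemo
{
    // Hypothetical helper: maps the first word of each group to all words in that group.
    static Dictionary<string, List<string>> GroupLines(IEnumerable<string> lines)
    {
        var result = new Dictionary<string, List<string>>();
        List<string> current = null;  // list of the group currently being read
        bool expectNumber = true;     // the next non-blank line is a group number

        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
            {
                // A blank line ends the current group.
                current = null;
                expectNumber = true;
                continue;
            }
            if (expectNumber)
            {
                // Skip the group number line.
                expectNumber = false;
                continue;
            }
            if (current == null)
            {
                // First word of the group: it becomes the key (overwrites a duplicate key).
                current = new List<string>();
                result[line] = current;
            }
            current.Add(line);
        }
        return result;
    }

    static void Main()
    {
        var input = new[] { "1", "wordOfGroup11", "wordOfGroup12", "", "2", "wordOfGroup21", "wordOfGroup22" };
        foreach (var pair in GroupLines(input))
        {
            Console.WriteLine(pair.Key + ": " + string.Join(", ", pair.Value));
        }
    }
}
```

Each line is visited once and each dictionary operation is O(1) on average, so the whole pass is linear in the number of lines.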

Remove duplicate combination of numbers in C# from csv

I'm trying to remove the duplicate combination from a csv file.
I tried using Distinct, but the output stays the same.
string path;
string newcsvpath = @"C:\Documents and Settings\MrGrimm\Desktop\clean.csv";
OpenFileDialog openfileDial = new OpenFileDialog();
if (openfileDial.ShowDialog() == DialogResult.OK)
{
path = openfileDial.FileName;
var lines = File.ReadLines(path);
var grouped = lines.GroupBy(line => string.Join(", ", line.Split(',').Distinct())).ToArray();
var unique = grouped.Select(g => g.First());
var buffer = new StringBuilder();
foreach (var name in unique)
{
string value = name;
buffer.AppendLine(value);
}
File.WriteAllText(newcsvpath ,buffer.ToString());
label5.Text = "Complete";
}
For example, I have a combination of
{ 1,1,1,1,1,1,1,1 } { 1,1,1,1,1,1,1,2 }
{ 2,1,1,1,1,1,1,1 } { 1,1,1,2,1,1,1,1 }
The output should be
{ 1,1,1,1,1,1,1,1 }
{ 2,1,1,1,1,1,1,1 }
From your example, it seems that you want to treat each line as a sequence of numbers, and that you consider two lines equal if one sequence is a permutation of the other.
So from reading your file, you have:
var lines = new[]
{
"1,1,1,1,1,1,1,1",
"1,1,1,1,1,1,1,2",
"2,1,1,1,1,1,1,1",
"1,1,1,2,1,1,1,1"
};
Now let's convert it to an array of number sequences:
var linesAsNumberSequences = lines.Select(line => line.Split(',')
.Select(int.Parse)
.ToArray())
.ToArray();
Or better, since we are not interested in permutations, we can sort the numbers in the sequences immediately:
var linesAsSortedNumberSequences = lines.Select(line => line.Split(',')
.Select(int.Parse)
.OrderBy(number => number)
.ToArray())
.ToArray();
When using Distinct on this, we have to pass a comparer that considers two arrays equal if they have the same elements. Let's use the IEnumerableComparer<T> from this SO question (it is a custom class, not part of the framework):
var result = linesAsSortedNumberSequences.Distinct(new IEnumerableComparer<int>());
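Since a comparer like that has to be written by hand, here is a minimal sketch of one for int arrays, used with the sample lines from the question. IntSequenceComparer is a hypothetical name and is not the class from the linked question; it treats two arrays as equal when they contain the same numbers in the same order, which is enough once each line has been sorted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two int arrays are equal if they have the same elements in the same order.
sealed class IntSequenceComparer : IEqualityComparer<int[]>
{
    public bool Equals(int[] x, int[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(int[] sequence)
    {
        if (sequence == null) return 0;
        unchecked
        {
            // Combine the element values so that equal sequences get equal hash codes.
            int hash = 17;
            foreach (int item in sequence) hash = hash * 31 + item;
            return hash;
        }
    }
}

static class DistinctDemo
{
    static void Main()
    {
        var lines = new[]
        {
            "1,1,1,1,1,1,1,1",
            "1,1,1,1,1,1,1,2",
            "2,1,1,1,1,1,1,1",
            "1,1,1,2,1,1,1,1"
        };

        // Sort each line's numbers so that permutations become identical sequences.
        var unique = lines
            .Select(line => line.Split(',').Select(s => int.Parse(s)).OrderBy(n => n).ToArray())
            .Distinct(new IntSequenceComparer())
            .ToList();

        Console.WriteLine(unique.Count);  // 2
        foreach (int[] sequence in unique)
        {
            Console.WriteLine(string.Join(",", sequence));
        }
    }
}
```

Note that this prints the sorted form of each distinct combination. If the original line text must be kept, group the original lines by the sorted key and take the first line of each group instead.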
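Try this: build a key string from the column values of each row and use a HashSet to detect rows you have already seen.
HashSet<string> record = new HashSet<string>();
foreach (DataRow row in dtCSV.Rows)
{
StringBuilder textEditor = new StringBuilder();
foreach (string col in columns)
{
textEditor.AppendFormat("[{0}={1}]", col, row[col].ToString());
}
// Add returns false when the key is already in the set, i.e. the row is a duplicate.
if (!record.Add(textEditor.ToString()))
{
// Duplicate row: skip or remove it here.
}
}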

C# Use Regex to split on Words

This is a stripped down version of code I am working on. The purpose of the code is to take a string of information, break it down, and parse it into key value pairs.
Using the info in the example below, a string might look like:
"DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567"
One further point about the above example, at least three of the features we have to parse out will occasionally include additional values. Here is an updated fake example string.
"DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568"
The problem with this is that the code refuses to split out DIVIDE and DIV information separately. Instead, it keeps splitting at DIV and then assigning the rest of the information as the value.
Is there a way to tell my code that DIVIDE and DIV need to be parsed out as two separate values, and to not turn DIVIDE into DIV?
public List<string> FeatureFilterStrings
{
// All possible feature types from the EWSD switch.
get
{
return new List<string>() { "DIVIDE", "DIV", "CLACOS", "INT"};
}
}
public void Parse(string input){
Func<string, bool> queryFilter = delegate(string line) { return FeatureFilterStrings.Any(s => line.Contains(s)); };
Regex regex = new Regex(@"(?=\\bDIVIDE|DIV|CLACOS|INT)");
string[] ms = regex.Split(updatedInput);
List<string> queryLines = new List<string>();
// takes the parsed out data and assigns it to the queryLines List<string>
foreach (string m in ms)
{
queryLines.Add(m);
}
var features = queryLines.Where(queryFilter);
foreach (string feature in features)
{
foreach (Match m in Regex.Matches(workLine, valueExpression))
{
string key = m.Groups["key"].Value.Trim();
string value = String.Empty;
value = Regex.Replace(m.Groups["value"].Value.Trim(), @"\s", String.Empty);
AddKeyValue(key, value);
}
}
private void AddKeyValue(string key, string value)
{
try
{
// Check if key already exists. If it does, remove the key and add the new key with updated value.
// Value information appends to what is already there so no data is lost.
if (this.ContainsKey(key))
{
this.Remove(key);
this.Add(key, value.Split('&'));
}
else
{
this.Add(key, value.Split('&'));
}
}
catch (ArgumentException)
{
// Already added to the dictionary.
}
}
}
Further information: the key/value pairs are not separated by a fixed number of spaces, a string may not include all of the features, and the features aren't always in the same order. Welcome to parsing old telephone switch information.
I would create a dictionary from your input string
string input = "DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567";
var dict = Regex.Matches(input, @"(\w+?) = (.+?)( |$)").Cast<Match>()
.ToDictionary(m => m.Groups[1].Value, m => m.Groups[2].Value);
Test the code:
foreach(var kv in dict)
{
Console.WriteLine(kv.Key + "=" + kv.Value);
}
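If you would rather keep matching on the known feature names, as in the question, the DIVIDE/DIV clash can be avoided with word boundaries (\b) around a grouped alternation. The following is only a sketch under that assumption: the pattern, the class name and the variable names are illustrative, and it reads each value up to the next "KEY =" or the end of the string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class FeatureParseDemo
{
    static void Main()
    {
        string input = "DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568";

        // \b on both sides of the grouped alternation stops DIV from matching inside DIVIDE.
        // The lazy value runs until the next "KEY =" or the end of the input.
        var pattern = new Regex(
            @"\b(?<key>DIVIDE|DIV|CLACOS|INT)\b\s*=\s*(?<value>.*?)(?=\s*\b(?:DIVIDE|DIV|CLACOS|INT)\b\s*=|$)");

        var features = new Dictionary<string, string[]>();
        foreach (Match m in pattern.Matches(input))
        {
            // Remove all whitespace, then split multi-valued entries on ',' and '&'.
            string value = Regex.Replace(m.Groups["value"].Value, @"\s", string.Empty);
            features[m.Groups["key"].Value] = value.Split(',', '&');
        }

        foreach (var pair in features)
        {
            Console.WriteLine(pair.Key + " = " + string.Join(" | ", pair.Value));
        }
    }
}
```

With the sample input, DIVIDE should map to KE48, KE49 and KE50, DIV to 3466, and INT to 4567 and 4568. Because the keys are matched by name, the order of the features and the number of spaces between them do not matter.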
This might be a simple alternative for you.
Try this code:
var input = "DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567";
var parts = input.Split(new [] { '=', ' ' }, StringSplitOptions.RemoveEmptyEntries);
var dictionary =
parts.Select((x, n) => new { x, n })
.GroupBy(xn => xn.n / 2, xn => xn.x)
.Select(xs => xs.ToArray())
.ToDictionary(xs => xs[0], xs => xs[1]);
I then get the following dictionary:
DIVIDE = KE48
CLACOS = 4556D
DIV = 3466
INT = 4567
Based on your updated input, things get more complicated, but this works:
var input = "DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568";
Func<string, char, string> tighten =
(i, c) => String.Join(c.ToString(), i.Split(c).Select(x => x.Trim()));
var parts =
tighten(tighten(input, '&'), ',')
.Split(new[] { '=', ' ' }, StringSplitOptions.RemoveEmptyEntries);
var dictionary =
parts
.Select((x, n) => new { x, n })
.GroupBy(xn => xn.n / 2, xn => xn.x)
.Select(xs => xs.ToArray())
.ToDictionary(
xs => xs[0],
xs => xs
.Skip(1)
.SelectMany(x => x.Split(','))
.SelectMany(x => x.Split('&'))
.ToArray());
I get this dictionary:
DIVIDE = KE48, KE49, KE50
CLACOS = 4566D
DIV = 3466
INT = 4567, 4568

How to Split an Already Split String

I have a code as below.
foreach (var item in betSlipwithoutStake)
{
test1 = item.Text;
splitText = test1.Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
if (!test.Exists(str => str == splitText[0]))
test.Add(splitText[0]);
}
I'm getting values like "Under 56.5 Points (+56.5)".
Now I want to split each item in the list again and take everything after '(', so that I get a new list I can use. How can I do that?
If you want to extract the value inside the parentheses:
foreach (var item in betSlipwithoutStake)
{
test1 = item.Text;
splitText = test1.Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
if (!test.Exists(str => str == splitText[0]))
if(splitText[0].Contains("("))
test.Add(splitText[0].Split('(', ')')[1]);
else
test.Add(splitText[0]);
}
Well, assuming you are after a solution without regular expressions, and that you have a List<string> test declared, you can follow up with a substring, with indexes (and some error handling):
foreach (var item in betSlipwithoutStake)
{
test1 = item.Text;
splitText = test1.Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
if (splitText.Length == 0)
continue;
string stringToCheck = splitText[0];
int openParenIndex = stringToCheck.IndexOf('(');
int closeParenIndex = stringToCheck.LastIndexOf(')');
if (openParenIndex >= 0 && closeParenIndex > openParenIndex)
{
// get what's inside the outermost set of parens (without the parens themselves)
int length = closeParenIndex - openParenIndex - 1;
stringToCheck = stringToCheck.Substring(openParenIndex + 1, length);
}
// use the extracted value, not the original text, for the duplicate check and the list
if (!test.Exists(str => str == stringToCheck))
test.Add(stringToCheck);
}
You can find out about all of the methods to use with strings here.
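The index-based extraction can also be isolated in a small helper so it is easy to test on its own. This is a minimal sketch; InsideParens is a hypothetical name, and it assumes you want the text between the first '(' and the last ')', falling back to the unchanged input when there is no such pair.

```csharp
using System;

static class ParenDemo
{
    // Hypothetical helper: returns the text between the first '(' and the last ')',
    // or the input unchanged when no matching pair is found.
    static string InsideParens(string text)
    {
        int open = text.IndexOf('(');
        int close = text.LastIndexOf(')');
        if (open < 0 || close <= open)
        {
            return text;
        }
        return text.Substring(open + 1, close - open - 1);
    }

    static void Main()
    {
        Console.WriteLine(InsideParens("Under 56.5 Points (+56.5)"));  // +56.5
        Console.WriteLine(InsideParens("No parens here"));             // No parens here
    }
}
```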

Split a string based on multiple delimiters specified by user

Updated: Thank you for the answer, but I disagree that my question is answered by another thread. "Multiple delimiters" and "Multi-Character delimiters" are 2 different questions.
This is my code so far:
List<string> delimiters = new List<string>();
List<string> data = new List<string>
{
"Car|cBlue,Mazda~Model|m3",
//More data
};
string userInput = "";
int i = 1;
//The user can enter a maximum of 5 delimiters
while (userInput != "go" && i <= 5)
{
userInput = Console.ReadLine();
delimiters.Add(userInput);
i++;
}
foreach (string delimiter in delimiters)
{
foreach (string s in data)
{
//This split is not working
//string output[] = s.Split(delimiter);
}
}
So, if the user enters "|c" and "~", the expected output is: "Car", "Blue,Mazda", "Model|m3"
If the user enters "|c", "|m", and ",", then the expected output will be: "Car", "Blue", "Mazda~Model", "3"
Add the user input into the List delimiters.
string data = "Car|cBlue,Mazda~Model|m3";
List<string> delimiters = new List<string>();
delimiters.Add("|c");//Change this to user input
delimiters.Add("|m");//change this to user input
string[] parts = data.Split(delimiters.ToArray(), StringSplitOptions.RemoveEmptyEntries);
foreach (string item in parts)
{
Console.WriteLine(item);
}
String.Split has an overload that does exactly that - you just need to convert your List<string> to a string[] :
string input = "Car|cBlue,Mazda~Model|m3";
List<string> delims = new List<string> {"|c", "~"};
string[] out1 = input.Split(delims.ToArray(),StringSplitOptions.None);
//output:
// Car
// Blue,Mazda
// Model|m3
delims = new List<string> {"|c", "|m", ","};
string[] out2 = input.Split(delims.ToArray(),StringSplitOptions.None);
//output:
// Car
// Blue
// Mazda~Model
// 3
You can use SelectMany to collect the results from all the data strings, and the ToArray() method to create an array from delimiters:
var result = data.SelectMany(s => s.Split(delimiters.ToArray(), StringSplitOptions.None));
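Putting the pieces together, here is a small self-contained sketch that applies a list of multi-character delimiters to every data string and reproduces the two expected outputs from the question. SplitAll is a hypothetical helper name; it drops empty delimiters (for example, when the user just presses Enter), which is an assumption about the desired behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MultiDelimiterDemo
{
    // Hypothetical helper: splits every data string by all of the given delimiters.
    static string[] SplitAll(IEnumerable<string> data, IEnumerable<string> delimiters)
    {
        // Ignore empty entries so that a blank input line is not treated as a delimiter.
        string[] separators = delimiters.Where(d => !string.IsNullOrEmpty(d)).ToArray();
        return data
            .SelectMany(s => s.Split(separators, StringSplitOptions.None))
            .ToArray();
    }

    static void Main()
    {
        var data = new List<string> { "Car|cBlue,Mazda~Model|m3" };

        Console.WriteLine(string.Join(" / ", SplitAll(data, new[] { "|c", "~" })));
        // Car / Blue,Mazda / Model|m3

        Console.WriteLine(string.Join(" / ", SplitAll(data, new[] { "|c", "|m", "," })));
        // Car / Blue / Mazda~Model / 3
    }
}
```

When reading the delimiters in a loop like the one in the question, remember not to add the terminating "go" itself to the list, otherwise it will be used as a delimiter too.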
