Remove duplicate combination of numbers in C# from csv - c#

I'm trying to remove the duplicate combination from a csv file.
I tried using Distinct but it seems to stay the same.
string path;
string newcsvpath = @"C:\Documents and Settings\MrGrimm\Desktop\clean.csv";
OpenFileDialog openfileDial = new OpenFileDialog();
if (openfileDial.ShowDialog() == DialogResult.OK)
{
path = openfileDial.FileName;
var lines = File.ReadLines(path);
var grouped = lines.GroupBy(line => string.Join(", ", line.Split(',').Distinct())).ToArray();
var unique = grouped.Select(g => g.First());
var buffer = new StringBuilder();
foreach (var name in unique)
{
string value = name;
buffer.AppendLine(value);
}
File.WriteAllText(newcsvpath ,buffer.ToString());
label5.Text = "Complete";
}
For example, I have a combination of
{ 1,1,1,1,1,1,1,1 } { 1,1,1,1,1,1,1,2 }
{ 2,1,1,1,1,1,1,1 } { 1,1,1,2,1,1,1,1 }
The output should be
{ 1,1,1,1,1,1,1,1 }
{ 2,1,1,1,1,1,1,1 }

From your example, it seems that you want to treat each line as a sequence of numbers, and that you consider two lines equal if one sequence is a permutation of the other.
So from reading your file, you have:
var lines = new[]
{
"1,1,1,1,1,1,1,1",
"1,1,1,1,1,1,1,2",
"2,1,1,1,1,1,1,1",
"1,1,1,2,1,1,1,1"
};
Now let's convert it to an array of number sequences:
var linesAsNumberSequences = lines.Select(line => line.Split(',')
.Select(int.Parse)
.ToArray())
.ToArray();
Or better, since we are not interested in permutations, we can sort the numbers in the sequences immediately:
var linesAsSortedNumberSequences = lines.Select(line => line.Split(',')
.Select(int.Parse)
.OrderBy(number => number)
.ToArray())
.ToArray();
When using Distinct on this, we have to pass a comparer that considers two arrays equal if they have the same elements. Let's use the one from this SO question:
var result = linesAsSortedNumberSequences.Distinct(new IEnumerableComparer<int>());
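The comparer used above is not part of the .NET base class library; it comes from the linked question. As a rough sketch of what such a comparer could look like, the following uses two hypothetical names that are not from the original answer: `SequenceComparer<T>` (standing in for `IEnumerableComparer<int>`) and `PermutationDedup.DistinctSorted`. It assumes every field parses as an integer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the IEnumerableComparer<T> mentioned above:
// two arrays are equal when they hold the same elements in the same order.
public sealed class SequenceComparer<T> : IEqualityComparer<T[]>
{
    public bool Equals(T[] x, T[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(T[] obj)
    {
        if (obj == null) return 0;
        unchecked
        {
            // Combine the element hashes so equal sequences get equal hashes.
            int hash = 17;
            foreach (T item in obj)
                hash = hash * 31 + EqualityComparer<T>.Default.GetHashCode(item);
            return hash;
        }
    }
}

public static class PermutationDedup
{
    // Sorts the numbers of each line, then keeps one sorted sequence
    // per distinct combination of numbers.
    public static List<int[]> DistinctSorted(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Split(',')
                                .Select(s => int.Parse(s.Trim()))
                                .OrderBy(n => n)
                                .ToArray())
            .Distinct(new SequenceComparer<int>())
            .ToList();
    }
}
```

Note that the result holds the sorted sequences, not the original lines, so a line such as 2,1,1,1,1,1,1,1 comes back as 1,1,1,1,1,1,1,2.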

Try this:
HashSet<string> record = new HashSet<string>();
foreach (DataRow row in dtCSV.Rows)
{
StringBuilder textEditor = new StringBuilder();
foreach (string col in columns)
{
textEditor.AppendFormat("[{0}={1}]", col, row[col].ToString());
}
// HashSet.Add returns false when the key is already in the set
if (!record.Add(textEditor.ToString()))
{
// duplicate row: skip or remove it here
}
}
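The same HashSet idea can be applied directly to the lines of the CSV file from the question. This is only a sketch: the names `CsvDedup`, `CanonicalKey` and `KeepFirstOfEachCombination` are made up for illustration, and it assumes plain comma-separated fields without quoting. Sorting the fields gives an order-independent key, so permutations of the same numbers collapse into one entry while the first original line is kept as written.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvDedup
{
    // Builds an order-independent key by sorting the fields of a line.
    // Assumption: fields are plain comma-separated values with no quoting.
    public static string CanonicalKey(string line)
    {
        var fields = line.Split(',')
                         .Select(f => f.Trim())
                         .OrderBy(f => f, StringComparer.Ordinal);
        return string.Join(",", fields);
    }

    // Keeps the first line seen for each key; HashSet.Add returns false for repeats.
    public static List<string> KeepFirstOfEachCombination(IEnumerable<string> lines)
    {
        var seen = new HashSet<string>();
        var kept = new List<string>();
        foreach (string line in lines)
        {
            if (seen.Add(CanonicalKey(line)))
                kept.Add(line);
        }
        return kept;
    }
}
```

With the variables from the question, the result could then be written with File.WriteAllLines(newcsvpath, CsvDedup.KeepFirstOfEachCombination(File.ReadLines(path))).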

Related

Stacking the same lines into packet, and writing to new file

I would like to know how to stack lines with the same title into one packet and write the result to a new file. For example, I have the following problem:
I read CSV file line by line, and I want to stack lines with the same titles to one packet.
file1:
Test;param1
Test;param2
Test1;param1
Test1;param2
Test1;param3
Test2;param1
result file:
Test;[param1,param2]
Test1;[param1,param2,param3]
Test2;[param1]
It does not have to be identical, but it is a hint on how to do something like that.
My code:
var enumLines = System.IO.File.ReadLines(pathZamowienia, Encoding.UTF8);
int factor = 0;
foreach (var line in enumLines)
{
var tabLine = line.Split(';').ToList();
if (factor == 0)
{
Console.WriteLine();
}
else
{
try
{
Title = tabLine[0];
}
catch (FormatException ex)
{
Console.WriteLine("Failure");
}
try
{
Param = tabLine[1];
}
catch (FormatException ex)
{
Console.WriteLine("Failure");
}
factor++;
}
You can use a LINQ query to group the lines
// Test input
var enumLines = new List<string> {
"Test;param1",
"Test;param2",
"Test1;param1",
"Test1;param2",
"Test1;param3",
"Test2;param1"
};
// Re-group the parameters
var newLines = enumLines
.Select(s => s.Split(';'))
.GroupBy(a => a[0], a => a[1])
.Select(g => g.Key + ";[" + String.Join(",", g) + "]");
// Test output:
foreach (string line in newLines) {
Console.WriteLine(line);
}
Output:
Test;[param1,param2]
Test1;[param1,param2,param3]
Test2;[param1]
Note that the group g itself is an enumeration of the aggregated values and also has a property Key. The first argument of GroupBy selects the Key, the second optional parameter selects the value to be aggregated. If it is omitted, the input (the string array a) is aggregated.
If the input includes malformed lines, you could also exclude them with an additional Where clause:
var newLines = enumLines
.Select(s => s.Split(';'))
.Where(a => a.Length >= 2)
.GroupBy(a => a[0], a => a[1])
.Select(g => g.Key + ";[" + String.Join(",", g) + "]");
This is what you can do. Parse your file first and then transform it:
var data = new Dictionary<string, List<string>>();
string[] lines = File.ReadAllLines(fileName);
foreach (string line in lines)
{
string[] parts = line.Split(';');
if (!data.ContainsKey(parts[0]))
data.Add(parts[0], new List<string>());
data[parts[0]].Add(parts[1]);
}
// then you open stream and write this
foreach (var kvp in data)
{
string line = $"{kvp.Key};[{string.Join(",", kvp.Value)}]";
// write line here
}
// close stream
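Both answers stop short of actually producing the result file. A minimal sketch of that last step follows, assuming hypothetical names (`PacketWriter`, `Stack`, `StackFile`) and caller-supplied `inputPath` and `outputPath`; none of these appear in the original answers. It reuses the GroupBy query and hands the grouped lines to File.WriteAllLines, so no explicit stream handling is needed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PacketWriter
{
    // Groups "title;param" lines by title, skipping lines without a ';' separator.
    public static List<string> Stack(IEnumerable<string> lines)
    {
        return lines
            .Select(l => l.Split(';'))
            .Where(parts => parts.Length >= 2)
            .GroupBy(parts => parts[0], parts => parts[1])
            .Select(g => g.Key + ";[" + string.Join(",", g) + "]")
            .ToList();
    }

    // inputPath and outputPath are hypothetical parameters supplied by the caller.
    public static void StackFile(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath, Stack(File.ReadLines(inputPath)));
    }
}
```

GroupBy keeps the groups in order of first appearance, so the titles come out in the order they occur in the input.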

C# Use Regex to split on Words

This is a stripped down version of code I am working on. The purpose of the code is to take a string of information, break it down, and parse it into key value pairs.
Using the info in the example below, a string might look like:
"DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567"
One further point about the above example, at least three of the features we have to parse out will occasionally include additional values. Here is an updated fake example string.
"DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568"
The problem with this is that the code refuses to split out DIVIDE and DIV information separately. Instead, it keeps splitting at DIV and then assigning the rest of the information as the value.
Is there a way to tell my code that DIVIDE and DIV need to be parsed out as two separate values, and to not turn DIVIDE into DIV?
public List<string> FeatureFilterStrings
{
// All possible feature types from the EWSD switch.
get
{
return new List<string>() { "DIVIDE", "DIV", "CLACOS", "INT"};
}
}
public void Parse(string input){
Func<string, bool> queryFilter = delegate(string line) { return FeatureFilterStrings.Any(s => line.Contains(s)); };
Regex regex = new Regex(@"(?=\\bDIVIDE|DIV|CLACOS|INT)");
string[] ms = regex.Split(updatedInput);
List<string> queryLines = new List<string>();
// takes the parsed out data and assigns it to the queryLines List<string>
foreach (string m in ms)
{
queryLines.Add(m);
}
var features = queryLines.Where(queryFilter);
foreach (string feature in features)
{
foreach (Match m in Regex.Matches(workLine, valueExpression))
{
string key = m.Groups["key"].Value.Trim();
string value = String.Empty;
value = Regex.Replace(m.Groups["value"].Value.Trim(), @"s", String.Empty);
AddKeyValue(key, value);
}
}
private void AddKeyValue(string key, string value)
{
try
{
// Check if key already exists. If it does, remove the key and add the new key with updated value.
// Value information appends to what is already there so no data is lost.
if (this.ContainsKey(key))
{
this.Remove(key);
this.Add(key, value.Split('&'));
}
else
{
this.Add(key, value.Split('&'));
}
}
catch (ArgumentException)
{
// Already added to the dictionary.
}
}
}
Further information, the string information does not have a set number of spaces between each key/value, each string may not include all of the values, and the features aren't always in the same order. Welcome to parsing old telephone switch information.
I would create a dictionary from your input string
string input = "DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567";
var dict = Regex.Matches(input, @"(\w+?) = (.+?)( |$)").Cast<Match>()
.ToDictionary(m => m.Groups[1].Value, m => m.Groups[2].Value);
Test the code:
foreach(var kv in dict)
{
Console.WriteLine(kv.Key + "=" + kv.Value);
}
This might be a simple alternative for you.
Try this code:
var input = "DIVIDE = KE48 CLACOS = 4556D DIV = 3466 INT = 4567";
var parts = input.Split(new [] { '=', ' ' }, StringSplitOptions.RemoveEmptyEntries);
var dictionary =
parts.Select((x, n) => new { x, n })
.GroupBy(xn => xn.n / 2, xn => xn.x)
.Select(xs => xs.ToArray())
.ToDictionary(xs => xs[0], xs => xs[1]);
I then get the following dictionary:
DIVIDE = KE48
CLACOS = 4556D
DIV = 3466
INT = 4567
Based on your updated input, things get more complicated, but this works:
var input = "DIVIDE = KE48, KE49, KE50 CLACOS = 4566D DIV = 3466 INT = 4567 & 4568";
Func<string, char, string> tighten =
(i, c) => String.Join(c.ToString(), i.Split(c).Select(x => x.Trim()));
var parts =
tighten(tighten(input, '&'), ',')
.Split(new[] { '=', ' ' }, StringSplitOptions.RemoveEmptyEntries);
var dictionary =
parts
.Select((x, n) => new { x, n })
.GroupBy(xn => xn.n / 2, xn => xn.x)
.Select(xs => xs.ToArray())
.ToDictionary(
xs => xs[0],
xs => xs
.Skip(1)
.SelectMany(x => x.Split(','))
.SelectMany(x => x.Split('&'))
.ToArray());
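I get this dictionary (each value is a string array):
DIVIDE = KE48, KE49, KE50
CLACOS = 4566D
DIV = 3466
INT = 4567, 4568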
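If you would rather stay with Regex, as in the question, the DIVIDE/DIV clash can be avoided by putting word boundaries (\b) on both sides of the key alternation, so that DIV only matches as a whole word. The sketch below is one possible way to do it, not the asker's actual parser: `FeatureParser` and `Parse` are hypothetical names, and it assumes a value never contains a known key followed by an equals sign.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FeatureParser
{
    // Known feature names, taken from the question.
    private static readonly string[] Keys = { "DIVIDE", "DIV", "CLACOS", "INT" };

    public static Dictionary<string, string[]> Parse(string input)
    {
        string keys = string.Join("|", Keys.Select(Regex.Escape));
        // \b on both sides makes DIV match only as a whole word, never inside DIVIDE.
        // The lazy value runs until the next "KEY =" or the end of the input.
        string pattern =
            @"\b(?<key>" + keys + @")\b\s*=\s*(?<value>.*?)(?=\s*\b(?:" + keys + @")\b\s*=|\s*$)";

        var result = new Dictionary<string, string[]>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            // Split multi-valued entries on ',' and '&' and trim the pieces.
            string[] values = m.Groups["value"].Value
                .Split(new[] { ',', '&' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(v => v.Trim())
                .ToArray();
            result[m.Groups["key"].Value] = values; // a repeated key overwrites the earlier entry
        }
        return result;
    }
}
```

Because the keys are matched by name, the number of spaces between entries and the order of the features do not matter.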

Counting words using LinkedList

I have a class WordCount which has string wordDic and int count. Next, I have a List<WordCount>.
I have another List<string> which has lots of words inside it. I am trying to use the List<WordCount> to count the occurrences of each word inside the List<string>.
Below is where I am stuck.
class WordCount
{
string wordDic;
int count;
}
List<WordCount> usd = new List<WordCount>();
foreach (string word in wordsList)
{
if (usd.wordDic.Contains(new WordCount {wordDic=word, count=0 }))
usd.count[value] = usd.counts[value] + 1;
else
usd.Add(new WordCount() {wordDic=word, count=1});
}
I don't know how to properly implement this in code but I am trying to search my List to see if the word in wordsList already exists and if it does, add 1 to count but if it doesn't then insert it inside usd with count of 1.
Note: *I have to use Lists to do this. I am not allowed to use anything else like hash tables...*
This is the answer before you edited to only use lists...btw, what is driving that requirement?
List<string> words = new List<string> {...};
// For case-insensitive you can instantiate with
// new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
Dictionary<string, int> counts = new Dictionary<string, int>();
foreach (string word in words)
{
if (counts.ContainsKey(word))
{
counts[word] += 1;
}
else
{
counts[word] = 1;
}
}
If you can only use lists, can you use List<KeyValuePair<string,int>> counts? It holds the same kind of data as a dictionary, although it would not guarantee unique keys. The solution would be very similar. If you can only use lists, the following will work.
List<string> words = new List<string>{...};
List<string> foundWord = new List<string>();
List<int> countWord = new List<int>();
foreach (string word in words)
{
if (foundWord.Contains(word))
{
countWord[foundWord.IndexOf(word)] += 1;
}
else
{
foundWord.Add(word);
countWord.Add(1);
}
}
Using your WordCount class
List<string> words = new List<string>{...};
List<WordCount> foundWord = new List<WordCount>();
foreach (string word in words)
{
WordCount match = foundWord.SingleOrDefault(w => w.wordDic == word);
if (match!= null)
{
match.count += 1;
}
else
{
foundWord.Add(new WordCount { wordDic = word, count = 1 });
}
}
You can use Linq to do this.
static void Main(string[] args)
{
List<string> wordsList = new List<string>()
{
"Cat",
"Dog",
"Cat",
"Hat"
};
List<WordCount> usd = wordsList.GroupBy(x => x)
.Select(x => new WordCount() { wordDic = x.Key, count = x.Count() })
.ToList();
}
Use LINQ. Assuming your list of words is:
string[] words = { "blueberry", "chimpanzee", "abacus", "banana", "abacus","apple", "cheese" };
You can do:
var count =
from word in words
group word.ToUpper() by word.ToUpper() into g
where g.Count() > 0
select new { g.Key, Count = g.Count() };
(or in your case, select new WordCount()... it'll depend on how you have your constructor set up)...
The result will look like this (Key, Count):
BLUEBERRY 1
CHIMPANZEE 1
ABACUS 2
BANANA 1
APPLE 1
CHEESE 1
First, all of your class members are private, so they cannot be accessed from outside the class. Let's assume you're using them inside the WordCount class too.
Second, your count member is an int. Therefore, the following statement will not work:
usd.count[value] = usd.counts[value] + 1;
And I think you've made a typo between counts and count.
To solve your problem, find the counter corresponding to your word. If it exists, increase its count value; otherwise, create a new one.
foreach (string word in wordsList) {
WordCount counter = usd.Find(c => c.wordDic == word);
if (counter != null) // Counter exists
counter.count++;
else
usd.Add(new WordCount() { wordDic=word, count = 1 }); // Create new one
}
You should use a Dictionary, as it's faster when checking whether a key already exists.
Just replace your list with this:
Dictionary<string, WordCount> usd = new Dictionary<string, WordCount>();
foreach (string word in wordsList)
{
if (usd.ContainsKey(word.ToLower()))
usd[word.ToLower()].count++;
else
usd.Add(word.ToLower(), new WordCount() {wordDic=word, count=1});
}

C#: Loop over Textfile, split it and Print a new Textfile

I get many lines of String as an Input that look like this. The Input is a String that comes from
theObjects.Runstate;
each #VAR;****;#ENDVAR; represents one Line and one step in the loop.
#VAR;Variable=Speed;Value=Fast;Op==;#ENDVAR;#VAR;Variable=Fabricator;Value=Freescale;Op==;#ENDVAR;
I split it, to remove the unwanted fields, like #VAR,#ENDVAR and Op==.
The optimal Output would be:
Speed = Fast;
Fabricator = Freescale; and so on.
I am able to cut out the #VAR and the #ENDVAR. Cutting out the "Op==" won't be that hard, so that's not the main focus of the question. My biggest concern right now is that I want to print the output as a text file. To print an array I would have to loop over it. But in every iteration, when I get a new line, I overwrite the array with the current split string. I think the last line of the input file is an empty string, so the output I get is just an empty text file. It would be nice if someone could help me.
string[] w;
Textwriter tw2;
foreach (EA.Element theObjects in myPackageObject.Elements)
{
theObjects.Type = "Object";
foreach (EA.Element theElements in PackageHW.Elements)
{
if (theObjects.ClassfierID == theElements.ElementID)
{
t = theObjects.RunState;
w = t.Replace("#ENDVAR;", "#VAR;").Replace("#VAR;", ";").Split(new string[] { ";" }, StringSplitOptions.RemoveEmptyEntries);
foreach (string s in w)
{
tw2.WriteLine(s);
}
}
}
}
This LINQ query gives the expected result:
var keyValuePairLines = File.ReadLines(pathInputFile)
.Select(l =>
{
l = l.Replace("#VAR;", "").Replace("#ENDVAR;", "").Replace("Op==;", "");
IEnumerable<string[]> tokens = l.Split(new[]{';'}, StringSplitOptions.RemoveEmptyEntries)
.Select(t => t.Split('='));
return tokens.Select(t => {
return new KeyValuePair<string, string>(t.First(), t.Last());
});
});
foreach(var keyValLine in keyValuePairLines)
foreach(var keyVal in keyValLine)
Console.WriteLine("Key:{0} Value:{1}", keyVal.Key, keyVal.Value);
Output:
Key:Variable Value:Speed
Key:Value Value:Fast
Key:Variable Value:Fabricator
Key:Value Value:Freescale
If you want to output it to another text-file with one key-value pair on each line:
File.WriteAllLines(pathOutputFile, keyValuePairLines.SelectMany(l =>
l.Select(kv => string.Format("{0}:{1}", kv.Key, kv.Value))));
Edit according to your question in the comment:
"What would I have to change/add so that the Output is like this. I
need AttributeValuePairs, for example: Speed = Fast; or Fabricator =
Freescale ?"
Now I understand the logic: you have key-value pairs, but you are interested only in the values. Every two key-value pairs belong together: the first value specifies the attribute and the second specifies the value of that attribute (e.g. Speed=Fast).
Then it's a little bit more complicated:
var keyValuePairLines = File.ReadLines(pathInputFile)
.Select(l =>
{
l = l.Replace("#VAR;", "").Replace("#ENDVAR;", "").Replace("Op==;", "");
string[] tokens = l.Split(new[]{';'}, StringSplitOptions.RemoveEmptyEntries);
var lineValues = new List<KeyValuePair<string, string>>();
for(int i = 0; i < tokens.Length; i += 2)
{
// Value to a variable can be found on the next index, therefore i += 2
string[] pair = tokens[i].Split('=');
string key = pair.Last();
string value = null;
string nextToken = tokens.ElementAtOrDefault(i + 1);
if (nextToken != null)
{
pair = nextToken.Split('=');
value = pair.Last();
}
var keyVal = new KeyValuePair<string, string>(key, value);
lineValues.Add(keyVal);
}
return lineValues;
});
File.WriteAllLines(pathOutputFile, keyValuePairLines.SelectMany(l =>
l.Select(kv=>string.Format("{0} = {1}", kv.Key, kv.Value))));
Output in the file with your single sample-line:
Speed = Fast
Fabricator = Freescale
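Since the question starts from a RunState string rather than a file, the same attribute-value pairing can also be sketched with a single regular expression applied to that string. The names `RunStateParser` and `ToAttributeLines` are hypothetical, and the pattern assumes that every block has the shape #VAR;Variable=...;Value=...; with the two fields in that order, as in the sample input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RunStateParser
{
    // Matches the start of one "#VAR;Variable=<name>;Value=<value>;" block.
    // "#ENDVAR;" and "Op==;" are simply never matched, so they drop out.
    private static readonly Regex VarBlock =
        new Regex(@"#VAR;Variable=(?<name>[^;]*);Value=(?<value>[^;]*);");

    // Returns one "name = value" line per block found in the RunState string.
    public static List<string> ToAttributeLines(string runState)
    {
        return VarBlock.Matches(runState)
            .Cast<Match>()
            .Select(m => m.Groups["name"].Value + " = " + m.Groups["value"].Value)
            .ToList();
    }
}
```

To avoid the overwriting problem described in the question, the lines from every element can be collected into one List<string> with AddRange inside the loops and written once at the end with File.WriteAllLines.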

Create dynamic string array in C# and add strings (outcome of a split method) into two separate arrays through a loop

I have a list of strings which includes strings in format: xx#yy
xx = feature name
yy = project name
Basically, I want to split these strings at # and store the xx part in one string array and the yy part in another to do further operations.
string[] featureNames = all xx here
string[] projectNames = all yy here
I am able to split the strings using the split method (string.split('#')) in a foreach or for loop in C# but I can't store two parts separately in two different string arrays (not necessarily array but a list would also work as that can be converted to array later on).
The main problem is to determine two parts of a string after split and then appends them to string array separately.
This is one simple approach:
var xx = new List<string>();
var yy = new List<string>();
foreach(var line in listOfStrings)
{
var split = line.Split('#');
xx.Add(split[0]);
yy.Add(split[1]);
}
The above instantiates a list of xx and a list of yy, loops through the list of strings, and splits each one. It then adds the two parts of the split to the previously instantiated lists.
How about the following:
List<String> xx = new List<String>();
List<String> yy = new List<String>();
var strings = yourstring.Split('#');
xx.Add(strings.First());
yy.Add(strings.Last());
var featureNames = new List<string>();
var productNames = new List<string>();
foreach (var productFeature in productFeatures)
{
var parts = productFeature.Split('#');
featureNames.Add(parts[0]);
productNames.Add(parts[1]);
}
How about
List<string> lst = ... // your list containing xx#yy
List<string> _featureNames = new List<string>();
List<string> _projectNames = new List<string>();
lst.ForEach(x =>
{
string[] str = x.Split('#');
_featureNames.Add(str[0]);
_projectNames.Add(str[1]);
});
string[] featureNames = _featureNames.ToArray();
string[] projectNames = _projectNames.ToArray();
You can do something like this:
var splits = input.Select(v => v.Split('#'));
var features = splits.Select(s => s[0]).ToList();
var projects = splits.Select(s => s[1]).ToList();
If you don't mind slightly more code but better performance and less pressure on garbage collector then:
var features = new List<string>();
var projects = new List<string>();
foreach (var split in input.Select(v => v.Split('#')))
{
features.Add(split[0]);
projects.Add(split[1]);
}
But overall I'd suggest to create class and parse your input (more C#-style approach):
public class ProjectFeature
{
public readonly string Project;
public readonly string Feature;
public ProjectFeature(string project, string feature)
{
this.Project = project;
this.Feature = feature;
}
public static IEnumerable<ProjectFeature> ParseList(IEnumerable<string> input)
{
return input.Select(v =>
{
var split = v.Split('#');
return new ProjectFeature(split[1], split[0]);
});
}
}
and use it later (just an example of possible usage):
var projectFeatures = ProjectFeature.ParseList(File.ReadAllLines(@"c:\features.txt")).ToList();
var features = projectFeatures.Select(f => f.Feature).ToList();
var projects = projectFeatures.Select(f => f.Project).ToList();
// ??? etc.
var all_XX = yourArrayOfStrings.Select(str => str.Split('#')[0]); // this will be an IEnumerable<string>
var all_YY = yourArrayOfStrings.Select(str => str.Split('#')[1]); // the same for YY, but make sure that the element at [1] exists
The main problem is to determine two parts of a string after split and then appends them to string array separately.
Why the different arrays? Wouldn't a dictionary be more fitting?
List<String> input = File.ReadAllLines(path).ToList(); // path is a placeholder for your input file; or use whatever source you have
var output = new Dictionary<String, String>();
foreach (String line in input)
{
var parts = line.Split('#');
output.Add(parts[0], parts[1]);
}
foreach (var feature in output)
{
Console.WriteLine("{0}: {1}", feature.Key, feature.Value);
}
Try this.
var ls = new List<string>();
ls.Add("123#project");
ls.Add("123#project1");
var f = from c in ls
select new
{
XX = c.Split('#')[0],
YY = c.Split('#')[1]
};
string [] xx = f.Select (x => x.XX).ToArray();
string [] yy = f.Select (x => x.YY).ToArray();
