List<T> - distinction by T.field - c#

I have a List<X> where X has a couple of fields:
public string word;
public int count;
how do I get a List<X> with distinct X.word elements?

You can use grouping:
var distinctItems = from n in items
group n by n.word into g
select g.First();

MoreLinq has a DistinctBy method:
var distinctByWord = list.DistinctBy(x => x.Word).ToList();

From your data structure, I'd suggest you probably want a Dictionary instead of a list.
If you are doing something like counting the number of times a word is seen, or even combining (word,count) pairs from some other input by adding the counts, it will be more efficient to do this with a Dictionary because you won't have to scan the list to find the entry to update.

You'll need to use the overload of the Distinct method that takes an instance of IEqualityComparer<X>:
new List<X>().Distinct(new XComparer());
public class XComparer : IEqualityComparer<X> {
public bool Equals(X x, X y) {
return x.Word.Equals(y.Word);
}
public int GetHashCode(X obj) {
return obj.Word.GetHashCode();
}
}
public class X {
public string Word { get; set; }
public int Count { get; set; }
}
And then:
var myList = new List<X>() {
new X(){ Count = 1, Word = "A" },
new X(){ Count = 2, Word = "A"},
new X(){ Count = 1, Word = "B"}
};
foreach(var x in myList.Distinct(new XComparer()))
Console.WriteLine(x.Count + " " + x.Word);
Prints:
1 A
1 B

I think the idea is to count the words without losing the counts for entries that share the same word, right? If so, it reminds me of the map-reduce algorithm. You have already done the map step, so you still need the reduce step. I recommend creating a new Dictionary<string,int> and looping over your list. If the Dictionary does not contain the word, add it (key: word, value: count); if it does, add the count to the existing value.
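The Dictionary approach described above could look roughly like the sketch below. It assumes the X class with Word and Count properties from the earlier answer; the WordCounts class and its Merge method are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;

public class X
{
    public string Word { get; set; }
    public int Count { get; set; }
}

// Hypothetical helper, not part of the original answers.
public static class WordCounts
{
    // Sums Count over all entries that share the same Word.
    public static Dictionary<string, int> Merge(IEnumerable<X> items)
    {
        var totals = new Dictionary<string, int>();
        foreach (var item in items)
        {
            int current;
            // TryGetValue leaves current at 0 when the word has not been seen yet.
            totals.TryGetValue(item.Word, out current);
            totals[item.Word] = current + item.Count;
        }
        return totals;
    }
}
```

For the three sample items used earlier (A with 1, A with 2, B with 1), WordCounts.Merge(myList) would hold 3 for "A" and 1 for "B". Each update is a hash lookup, so the list is never rescanned.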

Related

Extract overlapping substrings from a list of strings using LINQ

I have the following list of strings {"a","b","c","d","e"}. How can I obtain sublists of length 3 using LINQ like this:
{"a","b","c"}
{"b","c","d"}
{"c","d","e"}
I am not looking for every combination
var list = students.OrderBy(student => student.LastName)
.Select(student => student);
List<Student> sortedStudents = list.ToList();
var triplets = from x in sortedStudents
from y in sortedStudents
from z in sortedStudents
select new { x, y, z};
StudentListBox.ItemsSource = triplets;
I am not looking for something like
{"a","b","c"}
{"a","b","d"}
{"a","b","e"}
.............
{"d","a","b"}
{"d","a","c"}
{"d","a","e"} and so on
Student class
class Student
{
public Student()
{
}
public String FirstName
{
get;
set;
}
public String LastName
{
get;
set;
}
public DateTime Birthday
{
get;
set;
}
public override string ToString()
{
return FirstName + " " + LastName;
}
}
You can use an overload of Select that passes the index of the current element to the selector as an extra parameter, like this:
var triplets = sortedStudents.Take(sortedStudents.Count - 2)
.Select((x, i) => new { S1 = x, S2 = sortedStudents[i+1], S3 = sortedStudents[i+2] });
Here is one approach with LINQ; .Take(3) defines the length of 3:
string[] input = { "a", "b", "c", "d", "e" };
var result = Enumerable.Range(0, input.Length - 2).Select(x => input.Skip(x).Take(3));
Just loop the strings in your array:
public IEnumerable<IEnumerable<string>> GetTriples(string[] myArray)
{
for (int i = 0; i < myArray.Length - 2; i++)
{
yield return myArray.Skip(i).Take(3);
}
}
This code loops over the strings in your array and, for each one, takes it together with the next two strings.
Assuming (because you haven't got a complete code sample) that you want to take triples of items from your collection in the order in which they appear, you can use a combination of Skip and Take to give you subsets representing your triples.
var triplets = new List<IEnumerable<Student>>();
for(int i = 0; i < (sortedStudents.Count - 2); i++)
{
triplets.Add(sortedStudents.Skip(i).Take(3));
}
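The Skip/Take answers above re-enumerate the source from the start for every window. As an alternative, here is a minimal sketch that indexes directly into the list and works for any window size. SlidingWindow and Windows are hypothetical names, and the method assumes the source supports indexing (an array or a List<T>).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original answers.
public static class SlidingWindow
{
    // Yields every run of `size` consecutive items, in source order.
    public static IEnumerable<T[]> Windows<T>(IList<T> source, int size)
    {
        if (size <= 0)
        {
            throw new ArgumentOutOfRangeException("size");
        }
        // The loop stops when fewer than `size` items remain.
        for (int start = 0; start + size <= source.Count; start++)
        {
            var window = new T[size];
            for (int offset = 0; offset < size; offset++)
            {
                window[offset] = source[start + offset];
            }
            yield return window;
        }
    }
}
```

Calling SlidingWindow.Windows(new[] { "a", "b", "c", "d", "e" }, 3) would yield the three windows a,b,c then b,c,d then c,d,e. A source shorter than the window size yields nothing.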

Sort a List in which each element contains 2 Values

I have a text file that contains Values in this Format: Time|ID:
180|1
60 |2
120|3
Now I want to sort them by Time. The output should therefore be:
60 |2
120|3
180|1
How can I solve this problem? With this:
var path = @"C:\Users\admin\Desktop\test.txt";
List<string> list = File.ReadAllLines(path).ToList();
list.Sort();
for (var i = 0; i < list.Count; i++)
{
Console.WriteLine(list[i]);
}
I had no success ...
3 steps are necessary to do the job:
1) split by the separator
2) convert to int because in a string comparison a 6 comes after a 1 or 10
3) use OrderBy to sort your collection
Here is a linq solution in one line doing all 3 steps:
list = list.OrderBy(x => Convert.ToInt32(x.Split('|')[0])).ToList();
Explanation
x => lambda expression, x denotes a single element in your list
x.Split('|')[0] splits each string and takes only the first part of it (time)
Convert.ToInt32(.. converts the time into a number so that the ordering will be done in the way you desire
list.OrderBy( sorts your collection
EDIT:
To understand why you got that result in the first place, here is an example that compares numbers in their string representation using the CompareTo method:
int res = "6".CompareTo("10");
res will have the value of 1 (meaning that 6 is larger than 10 or 6 follows 10)
According to the documentation->remarks:
The CompareTo method was designed primarily for use in sorting or alphabetizing operations.
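To make the difference concrete, the sketch below sorts the sample lines once as text and once by their numeric time. TimeSortDemo and its two methods are hypothetical names. The text sort uses StringComparer.Ordinal here so the result does not depend on the current culture, which is an assumption that differs from the culture-sensitive default used by list.Sort().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of the original answers.
public static class TimeSortDemo
{
    // Compares character by character, so "120|3" sorts before "60 |2".
    public static List<string> SortAsText(IEnumerable<string> lines)
    {
        return lines.OrderBy(line => line, StringComparer.Ordinal).ToList();
    }

    // Parses the part before '|' as a number and sorts on that.
    // int.Parse tolerates the trailing space in "60 ".
    public static List<string> SortByTime(IEnumerable<string> lines)
    {
        return lines.OrderBy(line => int.Parse(line.Split('|')[0])).ToList();
    }
}
```

For the input 180|1, 60 |2, 120|3, SortAsText would return 120|3, 180|1, 60 |2, while SortByTime would return 60 |2, 120|3, 180|1.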
You should parse each line of the file content and get values as numbers.
string[] lines = File.ReadAllLines("path");
// ID, time
var dict = new Dictionary<int, int>();
// Processing each line of the file content
foreach (var line in lines)
{
string[] splitted = line.Split('|');
int time = Convert.ToInt32(splitted[0]);
int ID = Convert.ToInt32(splitted[1]);
// Key = ID, Value = Time
dict.Add(ID, time);
}
var orderedListByID = dict.OrderBy(x => x.Key).ToList();
var orderedListByTime = dict.OrderBy(x => x.Value).ToList();
Note that I use your ID reference as Key of dictionary assuming that ID should be unique.
Short code version
// Key = ID Value = Time
var orderedListByID = lines.Select(x => x.Split('|')).ToDictionary(x => Convert.ToInt32(x[1]), x => Convert.ToInt32(x[0])).OrderBy(x => x.Key).ToList();
var orderedListByTime = lines.Select(x => x.Split('|')).ToDictionary(x => Convert.ToInt32(x[1]), x => Convert.ToInt32(x[0])).OrderBy(x => x.Value).ToList();
You need to convert them to numbers first. Sorting by string won't give you meaningful results.
var times = list.Select(l => l.Split('|')[0]).Select(Int32.Parse);
var ids = list.Select(l => l.Split('|')[1]).Select(Int32.Parse);
var pairs = times.Zip(ids, (t, id) => new{Time = t, Id = id})
.OrderBy(x => x.Time)
.ToList();
Thank you all, this is my Solution:
var path = @"C:\Users\admin\Desktop\test.txt";
List<string> list = File.ReadAllLines(path).ToList();
list = list.OrderBy(x => Convert.ToInt32(x.Split('|')[0])).ToList();
for(var i = 0; i < list.Count; i++)
{
Console.WriteLine(list[i]);
}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class TestClass {
public static void main(String[] args) {
List <LineItem> myList = new ArrayList<LineItem>();
myList.add(LineItem.getLineItem(500, 30));
myList.add(LineItem.getLineItem(300, 20));
myList.add(LineItem.getLineItem(900, 100));
System.out.println(myList);
Collections.sort(myList);
System.out.println("list after sort");
System.out.println(myList);
}
}
class LineItem implements Comparable<LineItem>{
int time;
int id ;
@Override
public String toString() {
return ""+ time + "|"+ id + " ";
}
@Override
public int compareTo(LineItem o) {
// Integer.compare avoids the overflow that subtraction can cause
return Integer.compare(this.time, o.time);
}
public static LineItem getLineItem( int time, int id ){
LineItem l = new LineItem();
l.time=time;
l.id=id;
return l;
}
}

How to compare two csv files by 2 columns?

I have 2 csv files
1.csv
spain;russia;japan
italy;russia;france
2.csv
spain;russia;japan
india;iran;pakistan
I read both files and add data to lists
var lst1= File.ReadAllLines("1.csv").ToList();
var lst2= File.ReadAllLines("2.csv").ToList();
Then I find all unique strings from both lists and add them to a result list
var rezList = lst1.Except(lst2).Union(lst2.Except(lst1)).ToList();
rezList contains this data
[0] = "italy;russia;france"
[1] = "india;iran;pakistan"
Now I want to do the same comparison (Except and Union), but only on the second and third columns of each row.
1.csv
spain;russia;japan
italy;russia;france
2.csv
spain;russia;japan
india;iran;pakistan
I think I need to split all rows on the ';' character and perform all 3 operations (except, distinct and union), but I cannot work out how.
rezList must contain
india;iran;pakistan
I added this class
class StringLengthEqualityComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
...
}
public int GetHashCode(string obj)
{
...
}
}
StringLengthEqualityComparer stringLengthComparer = new StringLengthEqualityComparer();
var rezList = lst1.Except(lst2,stringLengthComparer ).Union(lst2.Except(lst1,stringLengthComparer),stringLengthComparer).ToList();
Your question is not very clear: for instance, is india;iran;pakistan the desired result primarily because russia is at element [1]? Isn't it also included because element [2], pakistan, does not match france or japan? Even though that's unclear, I assume the desired result can come from either situation.
Then there is this: "find all unique strings from both lists", which changes the nature of the problem dramatically. So I take it that the desired result arises because "iran" appears in column [1] and nowhere else in column [1] in either file, and even if it did, that row would still be unique due to "pakistan" in column [2].
Also note that a data sample of 2 leaves room for a fair amount of error.
Trying to do it in one step makes it very confusing. Since eliminating dupes found in 1.CSV is pretty easy, do it first:
// parse "1.CSV"
List<string[]> lst1 = File.ReadAllLines(@"C:\Temp\1.csv").
Select(line => line.Split(';')).
ToList();
// parse "2.CSV"
List<string[]> lst2 = File.ReadAllLines(@"C:\Temp\2.csv").
Select(line => line.Split(';')).
ToList();
// extracting once speeds things up in the next step
// and leaves open the possibility of iterating in a method
List<List<string>> tgts = new List<List<string>>();
tgts.Add(lst1.Select(z => z[1]).Distinct().ToList());
tgts.Add(lst1.Select(z => z[2]).Distinct().ToList());
var tmpLst = lst2.Where(x => !tgts[0].Contains(x[1]) ||
!tgts[1].Contains(x[2])).
ToList();
That results in the items which are not in 1.CSV (no matching text in Col[1] nor Col[2]). If that is really all you need, you are done.
Getting unique rows within 2.CSV is trickier because you have to actually count the number of times each Col[1] item occurs to see if it is unique; then repeat for Col[2]. This uses GroupBy:
var unique = tmpLst.
GroupBy(g => g[1], (key, values) =>
new GroupItem(key,
values.ToArray()[0],
values.Count())
).Where(q => q.Count == 1).
GroupBy(g => g.Data[2], (key, values) => new
{
Item = string.Join(";", values.ToArray()[0]),
Count = values.Count()
}
).Where(q => q.Count == 1).Select(s => s.Item).
ToList();
The GroupItem class is trivial:
class GroupItem
{
public string Item { set; get; } // debug aide
public string[] Data { set; get; }
public int Count { set; get; }
public GroupItem(string n, string[] d, int c)
{
Item = n;
Data = d;
Count = c;
}
public override string ToString()
{
return string.Join(";", Data);
}
}
It starts with tmpLst and gets the rows with a unique element at [1]. It uses a class for storage since at this point we need the array data for further review.
The second GroupBy acts on those results, this time looking at col[2]. Finally, it selects the joined string data.
Results
Using 50,000 random items in File1 (1.3 MB), 15,000 in File2 (390 KB). There were no naturally occurring unique items, so I manually made 8 unique in 2.CSV and copied 2 of them into 1.CSV. The copies in 1.CSV should eliminate 2 of the 8 unique rows in 2.CSV, making the expected result 6 unique rows.
NepalX and ItalyX were the repeats in both files and they correctly eliminated each other.
With each step it is scanning and working with less and less data, which seems to make it pretty fast for 65,000 rows / 130,000 data elements.
The GetHashCode() method in your EqualityComparer is buggy. Fixed version:
public int GetHashCode(string obj)
{
return obj.Split(';')[1].GetHashCode();
}
Now the result is correct:
// one result: "india;iran;pakistan"
By the way, "StringLengthEqualityComparer" is not a good name ;-)
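Since the comparer bodies are elided in the question, here is one possible shape for a column-based comparer, offered as a sketch rather than as the asker's actual code. ColumnComparer, CsvDiff and SymmetricDifference are hypothetical names. The sketch assumes every row is non-null and has at least as many ';'-separated columns as the highest index requested. The key point is that GetHashCode hashes exactly the data that Equals compares.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two rows are equal when the chosen columns match.
public class ColumnComparer : IEqualityComparer<string>
{
    private readonly int[] _columns;

    public ColumnComparer(params int[] columns)
    {
        _columns = columns;
    }

    // Builds the comparison key from the selected columns only.
    private string Key(string row)
    {
        var parts = row.Split(';');
        return string.Join(";", _columns.Select(i => parts[i]));
    }

    public bool Equals(string x, string y)
    {
        return string.Equals(Key(x), Key(y), StringComparison.Ordinal);
    }

    // Must be derived from the same key that Equals uses.
    public int GetHashCode(string row)
    {
        return Key(row).GetHashCode();
    }
}

// Hypothetical wrapper around the Except/Union expression from the question.
public static class CsvDiff
{
    public static List<string> SymmetricDifference(
        IEnumerable<string> first, IEnumerable<string> second, IEqualityComparer<string> comparer)
    {
        var a = first.ToList();
        var b = second.ToList();
        return a.Except(b, comparer).Union(b.Except(a, comparer), comparer).ToList();
    }
}
```

With the sample files, new ColumnComparer(1) (second column only) would leave just india;iran;pakistan, which matches the result the asker expects. new ColumnComparer(1, 2) (second and third columns) would also keep italy;russia;france, because the pair russia/france does not occur in 2.csv. Which of the two is intended is not clear from the question.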
private List<string> GetUnion(List<string> lst1, List<string> lst2)
{
List<string> lstUnion = new List<string>();
foreach (string value in lst1)
{
// split once and reuse the parts
string[] columns = value.Split(';');
string valueColumn2 = columns[1];
string valueColumn3 = columns[2];
// first row in lst2 whose second and third columns match
string result = lst2.FirstOrDefault(s => s.Contains(";" + valueColumn2 + ";" + valueColumn3));
if (result != null && !lstUnion.Contains(result))
{
lstUnion.Add(result);
}
}
// return the collected rows
return lstUnion;
}
class Program
{
static void Main(string[] args)
{
var lst1 = File.ReadLines(@"D:\test\1.csv").Select(x => new StringWrapper(x)).ToList();
var lst2 = File.ReadLines(@"D:\test\2.csv").Select(x => new StringWrapper(x));
var set = new HashSet<StringWrapper>(lst1);
set.SymmetricExceptWith(lst2);
foreach (var x in set)
{
Console.WriteLine(x.Value);
}
}
}
struct StringWrapper : IEquatable<StringWrapper>
{
public string Value { get; }
private readonly string _comparand0;
private readonly string _comparand14;
public StringWrapper(string value)
{
Value = value;
var split = value.Split(';');
_comparand0 = split[0];
_comparand14 = split[14];
}
public bool Equals(StringWrapper other)
{
return string.Equals(_comparand0, other._comparand0, StringComparison.OrdinalIgnoreCase)
&& string.Equals(_comparand14, other._comparand14, StringComparison.OrdinalIgnoreCase);
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
return obj is StringWrapper && Equals((StringWrapper) obj);
}
public override int GetHashCode()
{
unchecked
{
return ((_comparand0 != null ? StringComparer.OrdinalIgnoreCase.GetHashCode(_comparand0) : 0)*397)
^ (_comparand14 != null ? StringComparer.OrdinalIgnoreCase.GetHashCode(_comparand14) : 0);
}
}
}

How to count occurrences of a number stored in a file containing multiple delimiters?

This is my input, stored in a file:
50|Carbon|Mercury|P:4;P:00;P:1
90|Oxygen|Mars|P:10;P:4;P:00
90|Serium|Jupiter|P:4;P:16;P:10
85|Hydrogen|Saturn|P:00;P:10;P:4
Now I want to take the first value in my first row, P:4, then the next, P:00, and so on, and count the occurrences of each in every other row. The expected output is:
P:4 3 (found in 2nd row, 3rd row, 4th row (last cell))
P:00 2 (found in 2nd row, 4th row)
P:1 0 (no occurrences)
P:10 1
P:16 0
etc.
Likewise, I would like to print the occurrences of each and every proportion.
So far I have succeeded in splitting the file row by row and storing the data in objects of my class like this:
public class Planets
{
//My rest fields
public string ProportionConcat { get; set; }
public List<proportion> proportion { get; set; }
}
public class proportion
{
public int Number { get; set; }
}
I have already filled my planet objects, and my List of planet objects now looks like this:
List<Planets> Planets = new List<Planets>();
Planets[0]:
{
Number:50
name: Carbon
object:Mercury
ProportionConcat:P:4;P:00;P:1
proportion[0]:
{
Number:4
},
proportion[1]:
{
Number:00
},
proportion[2]:
{
Number:1
}
}
Etc...
I know I can loop through the list and search and count, but that would need 2 or 3 nested loops and the code would be a little messy, so I am looking for a cleaner way to do this.
How do I search for each proportion and count its occurrences in every other entry of my planet list?
Well, if you have already parsed the proportions, you can create a new class for the output data:
// Class that stores the result for one proportion
public class Values
{
public int Count; // number of times the proportion occurs
public readonly HashSet<int> Rows = new HashSet<int>(); // row numbers in which it occurs
/// <summary>Records one more occurrence of the proportion.</summary>
/// <param name="rowNumber">Number of the row in which the proportion occurs</param>
public void Increment(int rowNumber)
{
++Count; // one more occurrence
Rows.Add(rowNumber); // remember the row in which it occurred
}
}
And use this code to fill it. I'm not sure it's "messy", and I don't see the need to complicate the code with LINQ. What do you think?
var result = new Dictionary<int, Values>(); // key: proportion; value: how often it occurs and in which rows
for (var i = 0; i < Planets.Count; i++) // for instead of foreach, because i is the row number
{
var planet = Planets[i];
foreach (var proportion in planet.proportion)
{
if (!result.ContainsKey(proportion.Number)) // first time we see this proportion
result.Add(proportion.Number, new Values()); // create its result entry
result[proportion.Number].Increment(i); // count the occurrence and record the row number
}
}
You can use var count = Regex.Matches(lineString, input).Count; to count the matches in each line. Try this example:
var list = new List<string>
{
"50|Carbon|Mercury|P:4;P:00;P:1",
"90|Oxygen|Mars|P:10;P:4;P:00",
"90|Serium|Jupiter|P:4;P:16;P:10",
"85|Hydrogen|Saturn|P:00;P:10;P:4"
};
int totalCount;
var result = CountWords(list, "P:4", out totalCount);
Console.WriteLine("Total Found: {0}", totalCount);
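foreach (var foundWords in result)
{
// print the fields; printing the object itself would only show its type name
Console.WriteLine("{0}: {1}", foundWords.LineNumber, foundWords.Found);
}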
public class FoundWords
{
public string LineNumber { get; set; }
public int Found { get; set; }
}
private List<FoundWords> CountWords(List<string> words, string input, out int total)
{
total = 0;
int[] index = {0};
var result = new List<FoundWords>();
foreach (var f in words.Select(word => new FoundWords {Found = Regex.Matches(word, input).Count, LineNumber = "Line Number: " + (index[0] + 1)}))
{
result.Add(f);
total += f.Found;
index[0]++;
}
return result;
}
I made a DotNetFiddle for you here: https://dotnetfiddle.net/z9QwmD
string raw =
@"50|Carbon|Mercury|P:4;P:00;P:1
90|Oxygen|Mars|P:10;P:4;P:00
90|Serium|Jupiter|P:4;P:16;P:10
85|Hydrogen|Saturn|P:00;P:10;P:4";
string[] splits = raw.Split(
new string[] { "|", ";", "\n" },
StringSplitOptions.None
);
foreach (string p in splits.Where(s => s.ToUpper().StartsWith(("P:"))).Distinct())
{
Console.WriteLine(
string.Format("{0} - {1}",
p,
splits.Count(s => s.ToUpper() == p.ToUpper())
)
);
}
Basically, you can use .Split to split on multiple delimiters at once, it's pretty straightforward. After that, everything is gravy :).
Obviously my code simply outputs the results to the console, but that part is fairly easy to change. Let me know if there's anything you didn't understand.
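For comparison, the same counting can be written with GroupBy so that each token is counted in a single pass. This is only a sketch: ProportionCounter and CountAll are hypothetical names, and it assumes the proportions are always the last '|'-separated field of a line. It counts every occurrence, including the first one, so subtract one per token if only the "other rows" should be counted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answers.
public static class ProportionCounter
{
    // Maps each proportion token (for example "P:4") to its total number of occurrences.
    public static Dictionary<string, int> CountAll(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Split('|').Last())        // e.g. "P:4;P:00;P:1"
            .SelectMany(cell => cell.Split(';'))            // individual tokens
            .GroupBy(token => token.Trim(), StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key, g => g.Count(), StringComparer.OrdinalIgnoreCase);
    }
}
```

For the four sample lines this would give P:4 = 4, P:00 = 3, P:10 = 3, P:1 = 1 and P:16 = 1.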

WPF list filtering

I am new to WPF so this is probably an easy question. I have an app that reads some words from a csv file and stores them in a list of strings. What I am trying to do is parameterise this list to show the most popular words in it. So in my UI I want to have a text box in which I can enter a number, e.g. 5, to filter the original list so that only the 5 most popular (most frequent) words remain in the new list. Can anyone assist with this final step? Thanks -
public class VM
{
public VM()
{
Words = LoadWords(fileList);
}
public IEnumerable<string> Words { get; private set; }
string[] fileList = Directory.GetFiles(@"Z:\My Documents\", "*.csv");
private static IEnumerable<string> LoadWords(String[] fileList)
{
List<String> words = new List<String>();
//
if (fileList.Length == 1)
{
try
{
foreach (String line in File.ReadAllLines(fileList[0]))
{
string[] rows = line.Split(',');
words.AddRange(rows);
}
}
catch (Exception ex)
{
System.Windows.MessageBox.Show(ex.Message, "Problem!");
}
}
else
{
System.Windows.MessageBox.Show("Please ensure that you have ONE read file in the source folder.", "Problem!");
}
return words;
}
}
A LINQ query that groups by the word and orders by the count of that word descending should do it. Try this:
private static IEnumerable<string> GetTopWords(IEnumerable<string> words, int count)
{
// words is passed in because a static method cannot see the instance's Words property
var popularWords = (from w in words
group w by w
into grp
orderby grp.Count() descending
select grp.Key).Take(count).ToList();
return popularWords;
}
You could use CollectionViewSource.GetDefaultView(viewModel.Words), which returns an ICollectionView.
ICollectionView exposes a Filter property of type Predicate<object>, which you can use for filtering.
So the common scenario looks like this:
The ViewModel exposes a PopularCount property that is bound to a text box in the View.
The ViewModel listens for changes to the PopularCount property.
When a change notification occurs, the model obtains the ICollectionView for the viewModel.Words collection and sets its Filter property.
You can find a working sample of Filter property usage here. If you get stuck with the code, let me know.
Instead of reading all the words into the list and then sorting it based on the frequency, a cleaner approach would be to create a custom class MyWord that stores the word and the frequency. While reading the file, the frequency of the word can be incremented. The class can implement IComparable<T> to compare the words based on the frequency.
public class MyWord : IComparable<MyWord>
{
public MyWord(string word)
{
this.Word = word;
this.Frequency = 0;
}
public MyWord(string word, int frequency)
{
this.Word = word;
this.Frequency = frequency;
}
public string Word { get; private set;}
public int Frequency { get; private set;}
public void IncrementFrequency()
{
this.Frequency++;
}
public void DecrementFrequency()
{
this.Frequency--;
}
public int CompareTo(MyWord secondWord)
{
return this.Frequency.CompareTo(secondWord.Frequency);
}
}
The main class VM would have these members,
public IEnumerable<MyWord> Words { get; private set; }
private void ShowMostPopularWords(int numberOfWords)
{
SortMyWordsDescending();
listBox1.Items.Clear();
for (int i = 0; i < numberOfWords; i++ )
{
listBox1.Items.Add(this.Words.ElementAt(i).Word + "|" + this.Words.ElementAt(i).Frequency);
}
}
And the call to ShowMostPopularWords()
private void Button_Click(object sender, RoutedEventArgs e)
{
int numberOfWords;
if(Int32.TryParse(textBox1.Text, NumberStyles.Integer, CultureInfo.CurrentUICulture, out numberOfWords))
{
ShowMostPopularWords(numberOfWords);
}
}
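The answer above mentions incrementing a frequency while reading the file but does not show that step. A minimal sketch of it follows, using a Dictionary instead of the MyWord class to keep it short. WordFrequency and TopWords are hypothetical names, and ties are broken alphabetically, which is an assumption the original does not state.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answers.
public static class WordFrequency
{
    // Returns the n most frequent words together with their counts, most frequent first.
    public static List<KeyValuePair<string, int>> TopWords(IEnumerable<string> words, int n)
    {
        var frequency = new Dictionary<string, int>();
        foreach (var word in words)
        {
            int current;
            // current stays 0 when the word has not been seen before.
            frequency.TryGetValue(word, out current);
            frequency[word] = current + 1;
        }
        return frequency
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)  // assumed tie-break
            .Take(n)                                           // safe when n exceeds the number of words
            .ToList();
    }
}
```

The returned pairs could then be added to the list box in place of the loop over this.Words, which also avoids the out-of-range access when fewer than numberOfWords distinct words exist.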
I'm not sure if grouping and ordering of the 'words' list is what you want but if yes this could be a way of doing it:
int topN = 3;
List<string> topNWords = new List<string>();
string[] words = new string[] {
"word5",
"word1",
"word1",
"word1",
"word2",
"word2",
"word2",
"word3",
"word3",
"word4",
"word5",
"word6",
};
// LINQ query
var wordGroups = from s in words
group s by s into g
select new { Count = g.Count(), Word = g.Key };
for (int i = 0; i < Math.Min(topN, wordGroups.Count()); i++)
{
// (g) => g.Count is a lambda expression
// .OrderBy and .Reverse are IEnumerable extension methods
var element = wordGroups.OrderBy((g) => g.Count).Reverse().ElementAt(i);
topNWords.Add(element.Count + " - " + element.Word);
}
This could be made much shorter by ordering inside the LINQ query, but I wished to introduce you to inline lambdas and IEnumerable extension methods too.
The short version could be:
topNWords = (from s in words
group s by s
into g
orderby g.Count() descending
select g.Key).Take(topN).ToList();
