Select Distinct rows when only one column is different Linq - c#

I have data = List<Model>, where Model looks like this:
public class Model
{
    public string String1 { get; set; }
    public int Int1 { get; set; }
    public int Int2 { get; set; }
    public string String2 { get; set; }
    public decimal Decimal1 { get; set; }
    public decimal DecimalN { get; set; }
}
I want the average of each DecimalX value, grouped by String1, Int1 and Int2. My problem is that sometimes I have two or more rows that are identical except for String2, so I would like to apply Distinct, but it doesn't work because of the String2 property. I tried setting String2 to null on every row:
var x = data.ForEach(x => x.String2 = null);
but I get the error "Cannot assign void to an implicitly-typed variable".

ForEach does not return anything (it is a void method), so assigning its result to x is not valid.
Instead of this:
var x = data.ForEach(x => x.String2= null);
You should do like this:
data.ForEach(x => x.String2= null);

Just like Reniuz said, that error occurs because ForEach doesn't return anything (void).
Returning to your main problem, you mentioned that you need the average value for each Decimal.
For Decimal1 you can do something like this:
var b = list
    .GroupBy(g => new { g.String1, g.Int1, g.Int2 })
    .Select(r => new { r.Key.String1, r.Key.Int1, r.Key.Int2, avgDecimal1 = r.Select(g => g.Decimal1).Average() })
    .ToList();
First, group by the elements that will be your keys (without String2, as you said). Then select those keys and the average of a given property (e.g. Decimal1). You can add more averages if you want (avgDecimal2, avgDecimal3, etc.).
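To also drop rows that differ only in String2 before averaging, one option is to project each row to an anonymous type without String2 and call Distinct on that projection, since anonymous types compare by value. The following is a minimal sketch, not the asker's actual code: the tuple list stands in for List<Model>, and the sample values are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample rows standing in for List<Model>:
// (String1, Int1, Int2, String2, Decimal1).
var data = new List<(string String1, int Int1, int Int2, string String2, decimal Decimal1)>
{
    ("a", 1, 1, "x", 10m),
    ("a", 1, 1, "y", 10m), // same as the row above except for String2
    ("a", 1, 1, "z", 20m),
    ("b", 2, 2, "x", 5m),
};

var averages = data
    // Project without String2; anonymous types compare by value, so Distinct removes the duplicate.
    .Select(m => new { m.String1, m.Int1, m.Int2, m.Decimal1 })
    .Distinct()
    // Group by the key columns and average the remaining decimal column.
    .GroupBy(m => new { m.String1, m.Int1, m.Int2 })
    .Select(g => new
    {
        g.Key.String1,
        g.Key.Int1,
        g.Key.Int2,
        AvgDecimal1 = g.Average(m => m.Decimal1)
    })
    .ToList();

foreach (var row in averages)
    Console.WriteLine($"{row.String1} {row.Int1} {row.Int2} {row.AvgDecimal1}");
```

Without the Distinct call, the duplicated 10 would count twice in the average for the first group.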

Start by projecting your results to just the values you want to work with, in the shape you want to work with them.
var query = source.Select(e => new
Just omit the properties you don't want.
// or go straight to your average
var sum = data.GroupBy(e => new
}).Average(e => e.Key.Decimal1);

I don't understand why you want distinct values. Maybe you can provide an example.
One solution can be -
var result = data.GroupBy(x => new { x.String1, x.Int1, x.Int2 })
    .Select(g => new
    {
        String1 = g.Key.String1,
        Int1 = g.Key.Int1,
        Int2 = g.Key.Int2,
        AvgDecimal1 = g.Select(x => x.Decimal1).Average(),
        AvgDecimal2 = g.Select(x => x.Decimal2).Average(),
        AvgDecimalN = g.Select(x => x.DecimalN).Average()
    });
If you want distinct values, you can add Distinct before the GroupBy:
data.Select(d => new Model { String1 = d.String1, Int1 = d.Int1, Int2 = d.Int2, Decimal1 = d.Decimal1, Decimal2 = d.Decimal2, DecimalN = d.DecimalN }) //returning new collection with String2 as null
First solution produces following result -


Why is the return type List<char>?

I am trying to pull file names that match a substring using the Contains method. However, the result seems to be a List<char>, while I expect a List<string>.
private void readAllAttribues()
using (var reader = new StreamReader(attribute_file))
//List<string> AllLines = new List<string>();
List<FileNameAttributeList> AllAttributes = new List<FileNameAttributeList>();
while (!reader.EndOfStream)
FileNameAttributeList Attributes = new FileNameAttributeList();
Attributes ImageAttributes = new Attributes();
Point XY = new Point();
string lineItem = reader.ReadLine();
var values = lineItem.Split(',');
Attributes.ImageFileName = values[1];
XY.X = Convert.ToInt16(values[3]);
XY.Y = Convert.ToInt16(values[4]);
ImageAttributes.Location = XY;
ImageAttributes.Radius = Convert.ToInt16(values[5]);
ImageAttributes.Area = Convert.ToInt16(values[6]);
List<string> unique_raw_filenames = AllAttributes.Where(x => x.ImageFileName.Contains(@"non")).FirstOrDefault().ImageFileName.ToList();
List<string> unique_reference_filenames = AllAttributes.Where(x => x.ImageFileName.Contains(@"ref")).FirstOrDefault().ImageFileName.ToList();
foreach (var unique_raw_filename in unique_raw_filenames)
var raw_attributes = AllAttributes.Where(x => x.ImageFileName == unique_raw_filename).ToList();
Datatype class:
public class FileNameAttributeList
{ // Do not change the order
    public string ImageFileName { get; set; }
    public List<Attributes> Attributes { get; set; }
    public FileNameAttributeList()
    {
        Attributes = new List<Attributes>();
    }
}
Why does FirstOrDefault() not work? It returns a List<char>, but I am expecting a List<string>, so the assignment fails.
The ToList() method converts collections that implement IEnumerable<SomeType> into lists.
Looking at the definition of String, you can see that it implements IEnumerable<Char>, and so ImageFileName.ToList() in the following code will return a List<char>.
AllAttributes.Where(x =>
Although I'm guessing at what you want, it seems like you want to filter AllAttributes based on the ImageFileName, and then get a list of those file names. If that's the case, you can use something like this:
var unique_raw_filenames = AllAttributes.Where(x => x.ImageFileName.Contains(@"non")).Select(y => y.ImageFileName).ToList();
In your code
List<string> unique_raw_filenames = AllAttributes.Where(x => x.ImageFileName.Contains(@"non")).FirstOrDefault().ImageFileName.ToList();
FirstOrDefault() returns the first, or default, FileNameAttributeList from the list AllAttributes where the ImageFileName contains the text non.
Calling ToList() on the ImageFileName then converts the string value into a list of chars because string is a collection of char.
I think that what you are intending can be achieved by switching out FirstOrDefault to Select. Select allows you to map one value onto another.
So your code could look like this instead.
List<string> unique_raw_filenames = AllAttributes.Where(x => x.ImageFileName.Contains(@"non")).Select(x => x.ImageFileName).ToList();
This then gives you a list of string.
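The difference between the two calls can be seen in a small self-contained sketch. The anonymous objects and file names below are hypothetical stand-ins for the entries of AllAttributes; only the ImageFileName property is modeled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entries standing in for AllAttributes.
var allAttributes = new[]
{
    new { ImageFileName = "non_001.png" },
    new { ImageFileName = "ref_001.png" },
    new { ImageFileName = "non_002.png" },
};

// FirstOrDefault() yields ONE element; ToList() on its string enumerates the characters.
List<char> chars = allAttributes
    .Where(x => x.ImageFileName.Contains("non"))
    .FirstOrDefault()
    .ImageFileName
    .ToList();

// Select maps every matching element to its file name, giving a list of strings.
List<string> names = allAttributes
    .Where(x => x.ImageFileName.Contains("non"))
    .Select(x => x.ImageFileName)
    .ToList();

Console.WriteLine(chars.Count); // characters of the first matching file name
Console.WriteLine(names.Count); // number of matching file names
```

Note that FirstOrDefault() returns null when nothing matches, so the first form would also throw a NullReferenceException on an empty result.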

Sort a List in which each element contains 2 Values

I have a text file that contains values in this format: Time|ID:
60 |2
Now I want to sort them by Time. The output should keep the same format:
60 |2
How can I solve this problem? With this:
var path = @"C:\Users\admin\Desktop\test.txt";
List<string> list = File.ReadAllLines(path).ToList();
for (var i = 0; i < list.Count; i++)
I got no success ...
Three steps are necessary to do the job:
1) split by the separator
2) convert to int, because in a string comparison "6" comes after "1" and "10"
3) use OrderBy to sort your collection
Here is a linq solution in one line doing all 3 steps:
list = list.OrderBy(x => Convert.ToInt32(x.Split('|')[0])).ToList();
x => lambda expression, x denotes a single element in your list
x.Split('|')[0] splits each string and takes only the first part of it (time)
Convert.ToInt32(.. converts the time into a number so that the ordering will be done in the way you desire
list.OrderBy( sorts your collection
Just to understand why you got the result in the first place here is an example of comparison of numbers in string representation using the CompareTo method:
int res = "6".CompareTo("10");
res will have the value 1 (meaning that "6" is considered larger than "10", i.e. "6" follows "10")
According to the documentation->remarks:
The CompareTo method was designed primarily for use in sorting or alphabetizing operations.
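To make the difference concrete, here is a small sketch that sorts the same lines once by the text of the time column and once by its numeric value. The three sample lines are made up, and StringComparer.Ordinal is used only to keep the text ordering independent of the current culture.

```csharp
using System;
using System.Linq;

// Hypothetical lines in the Time|ID format from the question.
var lines = new[] { "60 |2", "100|1", "9  |3" };

// Text ordering: compares character by character, so "100" sorts before "60" and "9".
var byText = lines
    .OrderBy(l => l.Split('|')[0].Trim(), StringComparer.Ordinal)
    .ToList();

// Numeric ordering: parse the time column first, then sort.
var byNumber = lines
    .OrderBy(l => int.Parse(l.Split('|')[0].Trim()))
    .ToList();

Console.WriteLine(string.Join(", ", byText));
Console.WriteLine(string.Join(", ", byNumber));
```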
You should parse each line of the file content and get values as numbers.
string[] lines = File.ReadAllLines("path");
// ID, time
var dict = new Dictionary<int, int>();
// Processing each line of the file content
foreach (var line in lines)
{
    string[] splitted = line.Split('|');
    int time = Convert.ToInt32(splitted[0]);
    int ID = Convert.ToInt32(splitted[1]);
    // Key = ID, Value = Time
    dict.Add(ID, time);
}
var orderedListByID = dict.OrderBy(x => x.Key).ToList();
var orderedListByTime = dict.OrderBy(x => x.Value).ToList();
Note that I use your ID reference as Key of dictionary assuming that ID should be unique.
Short code version
// Key = ID Value = Time
var orderedListByID = lines.Select(x => x.Split('|')).ToDictionary(x => Convert.ToInt32(x[1]), x => Convert.ToInt32(x[0])).OrderBy(x => x.Key).ToList();
var orderedListByTime = lines.Select(x => x.Split('|')).ToDictionary(x => Convert.ToInt32(x[1]), x => Convert.ToInt32(x[0])).OrderBy(x => x.Value).ToList();
You need to convert them to numbers first. Sorting by string won't give you meaningful results.
var times = list.Select(l => l.Split('|')[0]).Select(Int32.Parse);
var ids = list.Select(l => l.Split('|')[1]).Select(Int32.Parse);
var pairs = times.Zip(ids, (t, id) => new { Time = t, Id = id })
    .OrderBy(x => x.Time);
Thank you all, this is my Solution:
var path = @"C:\Users\admin\Desktop\test.txt";
List<string> list = File.ReadAllLines(path).ToList();
list = list.OrderBy(x => Convert.ToInt32(x.Split('|')[0])).ToList();
for(var i = 0; i < list.Count; i++)
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TestClass {
    public static void main(String[] args) {
        List<LineItem> myList = new ArrayList<LineItem>();
        myList.add(LineItem.getLineItem(500, 30));
        myList.add(LineItem.getLineItem(300, 20));
        myList.add(LineItem.getLineItem(900, 100));
        Collections.sort(myList); // sorts by time via LineItem.compareTo
        System.out.println("list after sort");
        System.out.println(myList);
    }
}

class LineItem implements Comparable<LineItem> {
    int time;
    int id;

    @Override
    public String toString() {
        return "" + time + "|" + id + " ";
    }

    @Override
    public int compareTo(LineItem o) {
        // Integer.compare avoids the overflow that this.time - o.time can cause
        return Integer.compare(this.time, o.time);
    }

    public static LineItem getLineItem(int time, int id) {
        LineItem l = new LineItem();
        l.time = time;
        l.id = id;
        return l;
    }
}

How to compare two csv files by 2 columns?

I have 2 csv files
I read both files and add data to lists
var lst1= File.ReadAllLines("1.csv").ToList();
var lst2= File.ReadAllLines("2.csv").ToList();
Then I find all unique strings from both lists and add it to result lists
var rezList = lst1.Except(lst2).Union(lst2.Except(lst1)).ToList();
rezlist contains this data
[0] = "italy;russia;france"
[1] = "india;iran;pakistan"
Now I want to compare the rows and apply Except and Union by the second and third columns only.
I think I need to split each row at ';' and perform all three operations (Except, Distinct and Union), but I cannot work out how.
rezList must contain
I added class
class StringLengthEqualityComparer : IEqualityComparer<string>
public bool Equals(string x, string y)
public int GetHashCode(string obj)
StringLengthEqualityComparer stringLengthComparer = new StringLengthEqualityComparer();
var rezList = lst1.Except(lst2,stringLengthComparer ).Union(lst2.Except(lst1,stringLengthComparer),stringLengthComparer).ToList();
Your question is not very clear: for instance, is india;iran;pakistan the desired result primarily because russia is at element [1]? Isn't it also included because element [2], pakistan, does not match france and japan? Even though that's unclear, I assume the desired result comes from either situation.
Then there is this: "find all unique strings from both lists", which changes the nature of the problem dramatically. So I take it that the desired results are there because "iran" appears in column [1] and nowhere else in column [1] of either file, and even if it did, that row would still be unique due to "pakistan" in column [2].
Also note that a data sample of 2 leaves room for a fair amount of error.
Trying to do it in one step makes it very confusing. Since eliminating dupes found in 1.CSV is pretty easy, do it first:
// parse "1.CSV"
List<string[]> lst1 = File.ReadAllLines(@"C:\Temp\1.csv").
    Select(line => line.Split(';')).
    ToList();
// parse "2.CSV"
List<string[]> lst2 = File.ReadAllLines(@"C:\Temp\2.csv").
    Select(line => line.Split(';')).
    ToList();
// extracting once speeds things up in the next step
// and leaves open the possibility of iterating in a method
List<List<string>> tgts = new List<List<string>>();
tgts.Add(lst1.Select(z => z[1]).Distinct().ToList());
tgts.Add(lst1.Select(z => z[2]).Distinct().ToList());
var tmpLst = lst2.Where(x => !tgts[0].Contains(x[1]) ||
That results in the items which are not in 1.CSV (no matching text in Col[1] nor Col[2]). If that is really all you need, you are done.
Getting unique rows within 2.CSV is trickier because you have to actually count the number of times each Col[1] item occurs to see if it is unique; then repeat for Col[2]. This uses GroupBy:
var unique = tmpLst.
GroupBy(g => g[1], (key, values) =>
new GroupItem(key,
).Where(q => q.Count == 1).
GroupBy(g => g.Data[2], (key, values) => new
Item = string.Join(";", values.ToArray()[0]),
Count = values.Count()
).Where(q => q.Count == 1).Select(s => s.Item).
The GroupItem class is trivial:
class GroupItem
{
    public string Item { set; get; } // debug aid
    public string[] Data { set; get; }
    public int Count { set; get; }

    public GroupItem(string n, string[] d, int c)
    {
        Item = n;
        Data = d;
        Count = c;
    }

    public override string ToString()
    {
        return string.Join(";", Data);
    }
}
It starts with tmpLst and gets the rows with a unique element at [1]. It uses a class for storage, since at this point we need the array data for further review.
The second GroupBy acts on those results, this time looking at col[2]. Finally, it selects the joined string data.
I tested using 50,000 random items in File1 (1.3 MB) and 15,000 in File2 (390 KB). There were no naturally occurring unique items, so I manually made 8 unique in 2.CSV and copied 2 of them into 1.CSV. The copies in 1.CSV should eliminate 2 of the 8 unique rows in 2.CSV, making the expected result 6 unique rows:
NepalX and ItalyX were the repeats in both files and they correctly eliminated each other.
With each step it is scanning and working with less and less data, which seems to make it pretty fast for 65,000 rows / 130,000 data elements.
Your GetHashCode() method in the EqualityComparer is buggy. Fixed version:
public int GetHashCode(string obj)
{
    return obj.Split(';')[1].GetHashCode();
}
Now the result is correct:
// one result: "india;iran;pakistan"
btw. "StringLengthEqualityComparer"is not a good name ;-)
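An alternative to a custom comparer is to compare on an explicit key built from the second and third columns, which avoids having to keep Equals and GetHashCode consistent by hand. This is a sketch of a different technique from the comparer above, under two assumptions: the rows below are invented sample data (the question's full files are not shown), and the goal is taken to be "rows whose (column 2, column 3) pair does not occur in the other file". The Key helper is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample rows; the real files are not shown in the question.
var lst1 = new List<string> { "italy;russia;france", "spain;china;japan" };
var lst2 = new List<string> { "germany;russia;france", "india;iran;pakistan" };

// Hypothetical helper: the comparison key is (second column, third column).
static (string, string) Key(string row)
{
    var parts = row.Split(';');
    return (parts[1], parts[2]);
}

// Tuples have value equality and matching hash codes, so they work as set members.
var keys1 = new HashSet<(string, string)>(lst1.Select(r => Key(r)));
var keys2 = new HashSet<(string, string)>(lst2.Select(r => Key(r)));

// Keep rows from each file whose key does not occur in the other file.
var rezList = lst1.Where(r => !keys2.Contains(Key(r)))
    .Concat(lst2.Where(r => !keys1.Contains(Key(r))))
    .ToList();

foreach (var row in rezList)
    Console.WriteLine(row);
```

Unlike Except and Union, this keeps duplicate rows within a single file; add a Distinct step if those should be collapsed too.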
private void GetUnion(List<string> lst1, List<string> lst2)
List<string> lstUnion = new List<string>();
foreach (string value in lst1)
string valueColumn1 = value.Split(';')[0];
string valueColumn2 = value.Split(';')[1];
string valueColumn3 = value.Split(';')[2];
string result = lst2.FirstOrDefault(s => s.Contains(";" + valueColumn2 + ";" + valueColumn3));
if (result != null)
if (!lstUnion.Contains(result))
class Program
static void Main(string[] args)
var lst1 = File.ReadLines(@"D:\test\1.csv").Select(x => new StringWrapper(x)).ToList();
var lst2 = File.ReadLines(@"D:\test\2.csv").Select(x => new StringWrapper(x));
var set = new HashSet<StringWrapper>(lst1);
foreach (var x in set)
struct StringWrapper : IEquatable<StringWrapper>
{
    public string Value { get; }
    private readonly string _comparand0;
    private readonly string _comparand14;

    public StringWrapper(string value)
    {
        Value = value;
        var split = value.Split(';');
        _comparand0 = split[0];
        _comparand14 = split[14];
    }

    public bool Equals(StringWrapper other)
    {
        return string.Equals(_comparand0, other._comparand0, StringComparison.OrdinalIgnoreCase)
            && string.Equals(_comparand14, other._comparand14, StringComparison.OrdinalIgnoreCase);
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        return obj is StringWrapper && Equals((StringWrapper) obj);
    }

    public override int GetHashCode()
    {
        return ((_comparand0 != null ? StringComparer.OrdinalIgnoreCase.GetHashCode(_comparand0) : 0)*397)
            ^ (_comparand14 != null ? StringComparer.OrdinalIgnoreCase.GetHashCode(_comparand14) : 0);
    }
}

What is the easiest way to split columns from a txt file

I've been looking around a bit but haven't really found a good example with what I'm struggling right now.
I have a .txt file with a couple of columns as follows:
52,20120406, 112, 91, 20, 130,
53,20130601, 332, 11, 33, 120,
I'm reading these from the file into a string[] array.
I'd like to split them into a list,
for example
List results, where index [0] will be the first entry of the columns.
I've been looking around and came up with the "\\s+" split,
but I'm not sure how to go about it, since the entries sit one below another.
string[] lines = File.ReadAllLines(path);
List<Bus> results = new List<Bus>();
//Bus = class with all the vars in it
//such as Bus.ID, Bus.COLD, Bus.YYYYMMDD
foreach (var line in lines) {
var val = line.Split("\\s+");
//not sure where to go from here
Would greatly appreciate any help!
Kind regards, Venomous.
I suggest using Linq, something like this:
List<Bus> results = File
.ReadLines(@"C:\MyFile.txt") // we have no need to read All lines in one go
.Skip(1) // skip file's title
.Select(line => line.Split(','))
.Select(items => new Bus( //TODO: check constructor's syntax
DateTime.ParseExact(items[2], "yyyyMMdd", CultureInfo.InvariantCulture)))
I would do
public class Foo
{
    public int Id { get; set; }
    public string Date { get; set; }
    public double Cold { get; set; }
}
Then read the file
var l = new List<Foo>();
foreach (var line in lines)
{
    var sp = line.Split(',');
    var foo = new Foo
    {
        Id = int.Parse(sp[0].Trim()),
        Date = sp[1].Trim(), // or parse the date into a DateTime struct
        Cold = double.Parse(sp[2].Trim())
    };
    l.Add(foo);
}
// now l contains a list filled with Foo objects
I would probably keep a List of properties and use reflection to populate the object, something like this :
var columnMap = new[]{"ID","YYYYMMDD","COLD","WATER","OD","OP"};
var properties = columnMap.Select(typeof(Bus).GetProperty).ToList();
var resultList = new List<Bus>();
foreach(var line in lines)
var val = line.Split(',');
var adding = new Bus();
for(int i=0;i<val.Length;i++)
This assumes that all of your properties are strings, however.
Something like this perhaps...
results.Add(new Bus
{
    ID = val[0],
    YYYYMMDD = val[1],
    COLD = val[2],
    WATER = val[3],
    OD = val[4],
    OP = val[5]
});
Keep in mind that all of the values in the val array are still strings at this point. If the properties of Bus are typed, you will need to parse them into the correct types e.g. assume ID is typed as an int...
ID = string.IsNullOrEmpty(val[0]) ? default(int) : int.Parse(val[0]),
Also, if the column headers are actually present in the file in the first line, you'll need to skip/disregard that line and process the rest.
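Putting the splitting and the typed parsing together could look roughly like the sketch below. It uses the two sample lines from the question and assumes the column order ID, YYYYMMDD, COLD, WATER, OD, OP; the tuple element names are illustrative and stand in for a real Bus class.

```csharp
using System;
using System.Globalization;
using System.Linq;

// The two sample lines from the question (note the padding and the trailing comma).
var lines = new[]
{
    "52,20120406, 112, 91, 20, 130,",
    "53,20130601, 332, 11, 33, 120,",
};

var results = lines
    // Split at commas and trim the padding around each value.
    .Select(line => line.Split(',').Select(p => p.Trim()).ToArray())
    // Assumed column order: ID, YYYYMMDD, COLD, WATER, OD, OP.
    .Select(v => (
        Id: int.Parse(v[0], CultureInfo.InvariantCulture),
        Date: DateTime.ParseExact(v[1], "yyyyMMdd", CultureInfo.InvariantCulture),
        Cold: int.Parse(v[2], CultureInfo.InvariantCulture),
        Water: int.Parse(v[3], CultureInfo.InvariantCulture),
        Od: int.Parse(v[4], CultureInfo.InvariantCulture),
        Op: int.Parse(v[5], CultureInfo.InvariantCulture)))
    .ToList();

foreach (var bus in results)
    Console.WriteLine($"{bus.Id} {bus.Date:yyyy-MM-dd} {bus.Cold} {bus.Water} {bus.Od} {bus.Op}");
```

The trailing comma produces an empty seventh element after the split, which is simply never read here. If the file starts with a header line, add a Skip(1) before the first Select.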
Given that we have the Bus class with all the variables from your textfile:
class Bus
{
    public int id;
    public DateTime date;
    public int cold;
    public int water;
    public int od;
    public int op;

    public Bus(int _id, DateTime _date, int _cold, int _water, int _od, int _op)
    {
        id = _id;
        date = _date;
        cold = _cold;
        water = _water;
        od = _od;
        op = _op;
    }
}
Then we can list them all in the results list like this:
List<Bus> results = new List<Bus>();
foreach (string line in File.ReadAllLines(path))
if (line.StartsWith("#"))
string[] parts = line.Replace(" ", "").Split(','); // Remove all spaces and split at commas
results.Add(new Bus(
DateTime.ParseExact(parts[1], "yyyyMMdd", CultureInfo.InvariantCulture),
And access the values as you wish:
I hope this helps.

What is the fastest way to compare a value from a list to a specific sum from another list?

I have two huge lists of created objects. A List<Forecast> with all the forecasts from different Resources and a List<Capacity> with the capacities of these Resources.
A Forecast also contains booleans indicating whether the Resource is over or below its capacity for the sum of all its forecasts.
public class Forecast
{
    public int ResourceId { get; set; }
    public double? ForecastJan { get; set; }
    // and ForecastFeb, ForecastMarch, ForecastApr, ForecastMay, etc.
    public bool IsOverForecastedJan { get; set; }
    // and IsOverForecastedFeb, IsOverForecastedMarch, IsOverForecastedApr, etc.
}
public class Capacity
{
    public int ResourceId { get; set; }
    public double? CapacityJan { get; set; }
    // and CapacityFeb, CapacityMar, CapacityApr, CapacityMay, etc.
}
I have to set the IsOverForecastedXXX properties, so for each month I need to know whether the sum of forecasts for each resource is higher than the capacity of that specific resource.
Here is my code :
foreach (Forecast f in forecastList)
if (capacityList.Where(c => c.Id == f.ResourceId)
.Select(c => c.CapacityJan)
< forecastList.Where(x => x.ResourceId == f.ResourceId)
.Sum(x => x.ForecastJan)
f.IsOverForecastedJan = true;
//Same for each month...
My code works, but performance is really bad when the lists are large (thousands of elements).
Do you know how I can improve this algorithm? How can I compare the sum of forecasts for each resource with the associated capacity?
You can use First or FirstOrDefault to get the capacities for the current resource, then compare them. I would use ToLookup, which is similar to a Dictionary, to get all forecasts for all resources.
ILookup<int, Forecast> forecastMonthGroups = forecastList
.ToLookup(fc => fc.ResourceId);
foreach (Forecast f in forecastList)
{
    double? janSum = forecastMonthGroups[f.ResourceId].Sum(fc => fc.ForecastJan);
    double? febSum = forecastMonthGroups[f.ResourceId].Sum(fc => fc.ForecastFeb);
    var capacities = capacityList.First(c => c.ResourceId == f.ResourceId);
    bool overJan = capacities.CapacityJan < janSum;
    bool overFeb = capacities.CapacityFeb < febSum;
    // ...
    f.IsOverForecastedJan = overJan;
    f.IsOverForecastedFeb = overFeb;
    // ...
}
It seems that there is only one Capacity per ResourceId. In that case I would use a Dictionary to map each ResourceId to its Capacity, which improves performance even more:
ILookup<int, Forecast> forecastMonthGroups = forecastList
.ToLookup(fc => fc.ResourceId);
Dictionary<int, Capacity> capacityResources = capacityList
.ToDictionary(c => c.ResourceId);
foreach (Forecast f in forecastList)
{
    double? janSum = forecastMonthGroups[f.ResourceId].Sum(fc => fc.ForecastJan);
    double? febSum = forecastMonthGroups[f.ResourceId].Sum(fc => fc.ForecastFeb);
    bool overJan = capacityResources[f.ResourceId].CapacityJan < janSum;
    bool overFeb = capacityResources[f.ResourceId].CapacityFeb < febSum;
    // ...
    f.IsOverForecastedJan = overJan;
    f.IsOverForecastedFeb = overFeb;
    // ...
}
I would try selecting out your capacities and forecasts for each month before entering the loop. That way you are not iterating each list every time you go round the loop.
Something like this:
var capacities = capacityList.GroupBy(c => c.ResourceId).ToDictionary(c => c.First().ResourceId, c => c.First().CapacityJan);
var forecasts = forecastList.GroupBy(x => x.ResourceId).ToDictionary(x => x.First().ResourceId, x => x.Sum(f => f.ForecastJan));
foreach (Forecast f in forecastList)
{
    if (capacities[f.ResourceId] < forecasts[f.ResourceId])
    {
        f.IsOverForecastedJan = true;
    }
}
There are lots of things you can do to speed this up. First up, make a single pass over forecastList and sum the capacity forecast for each month:
var demandForecasts = new Dictionary<int, double[]>();
foreach (var forecast in forecastList)
{
    var rid = forecast.ResourceId;
    if (!demandForecasts.ContainsKey(rid))
    {
        // Non-nullable sums start at 0; in a double?[] they would start as null, and null + x stays null
        demandForecasts[rid] = new double[12];
    }
    var demandForecast = demandForecasts[rid];
    demandForecast[0] += forecast.ForecastJan ?? 0;
    // etc
    demandForecast[11] += forecast.ForecastDec ?? 0;
}
Do the same for capacities, resulting in a capacities dictionary. Then, one more loop over forecastList to set the "over forecasted" flags:
foreach (var forecast in forecastList)
{
    var rid = forecast.ResourceId;
    forecast.IsOverForecastedJan = capacities[rid][0] < demandForecasts[rid][0];
    // ...
    forecast.IsOverForecastedDec = capacities[rid][11] < demandForecasts[rid][11];
}
As is obvious from the twelve-fold code repetition implicit in this code, modelling capacities etc as separate properties for each month is probably not the best way of doing things -- using some kind of indexed collection would allow the repetition to be eliminated.
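As a rough illustration of the indexed-collection idea, the sketch below stores the twelve monthly values in arrays, so that one inner loop replaces the twelve repeated statements. Everything here is an assumption for illustration: the tuple shape, the Months helper and the sample numbers are not part of the original Forecast and Capacity classes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds a 12-month array from the first few months' values.
static double?[] Months(params double?[] firstMonths)
{
    var all = new double?[12];
    Array.Copy(firstMonths, all, firstMonths.Length);
    return all;
}

// Hypothetical data: forecasts as (ResourceId, monthly values), one capacity array per resource.
var forecasts = new List<(int ResourceId, double?[] Monthly)>
{
    (1, Months(10.0, 5.0)),
    (1, Months(15.0, 2.0)),
    (2, Months(3.0)),
};
var capacities = new Dictionary<int, double?[]>
{
    [1] = Months(20.0, 10.0),
    [2] = Months(5.0),
};

// Single pass: sum the forecasts per resource and month (missing values count as 0).
var demand = new Dictionary<int, double[]>();
foreach (var f in forecasts)
{
    if (!demand.TryGetValue(f.ResourceId, out var sums))
    {
        sums = new double[12];
        demand[f.ResourceId] = sums;
    }
    for (int m = 0; m < 12; m++)
        sums[m] += f.Monthly[m] ?? 0;
}

// One flag per resource and month; a null capacity compares as "not over".
var isOver = new Dictionary<int, bool[]>();
foreach (var pair in demand)
{
    var flags = new bool[12];
    for (int m = 0; m < 12; m++)
        flags[m] = capacities[pair.Key][m] < pair.Value[m];
    isOver[pair.Key] = flags;
}

Console.WriteLine($"Resource 1 over capacity in January: {isOver[1][0]}");
```

The flags are kept per resource here because every forecast of the same resource gets the same value; copying them back onto each forecast is then a simple lookup.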
