How to find all duplicates in a List<string>? [duplicate] - c#

I have a List<string> which has some words duplicated. I need to find all words which are duplicates.
Any trick to get them all?

In .NET Framework 3.5 and above you can use Enumerable.GroupBy, which returns one group per distinct key, each group holding every element that shares that key. Filter out the groups whose Count() is 1 or less, then select the keys of the remaining groups to get back down to a single enumerable of the duplicated values:
var duplicateKeys = list.GroupBy(x => x)
.Where(group => group.Count() > 1)
.Select(group => group.Key);

If you are using LINQ, you can use the following query:
var duplicateItems = from x in list
group x by x into grouped
where grouped.Count() > 1
select grouped.Key;
or, if you prefer it without the syntactic sugar:
var duplicateItems = list.GroupBy(x => x).Where(x => x.Count() > 1).Select(x => x.Key);
This groups all elements that are the same, and then filters to only those groups with more than one element. Finally it selects just the key from those groups as you don't need the count.
If you'd prefer not to use LINQ, you can use this extension method:
public void SomeMethod() {
var duplicateItems = list.GetDuplicates();
…
}
public static IEnumerable<T> GetDuplicates<T>(this IEnumerable<T> source) {
HashSet<T> itemsSeen = new HashSet<T>();
HashSet<T> itemsYielded = new HashSet<T>();
foreach (T item in source) {
if (!itemsSeen.Add(item)) {
if (itemsYielded.Add(item)) {
yield return item;
}
}
}
}
This keeps track of the items it has seen and the items it has yielded. An item seen for the first time is added to the set of seen items and skipped. An item that has been seen before is yielded the first time that happens and ignored on every later occurrence, so each duplicate is returned exactly once.
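As a usage sketch of the two-set idea, the extension can be exercised on a small list. The optional comparer parameter and the sample words are assumptions added for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateExtensions
{
    // Same two-set approach; the comparer parameter is a hypothetical addition.
    public static IEnumerable<T> GetDuplicates<T>(
        this IEnumerable<T> source, IEqualityComparer<T> comparer = null)
    {
        var itemsSeen = new HashSet<T>(comparer);
        var itemsYielded = new HashSet<T>(comparer);
        foreach (T item in source)
        {
            // HashSet.Add returns false when the item is already present.
            if (!itemsSeen.Add(item) && itemsYielded.Add(item))
                yield return item;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var words = new List<string> { "cat", "dog", "Cat", "dog", "bird", "dog" };

        // Default comparer: only "dog" repeats exactly.
        Console.WriteLine(string.Join(", ", words.GetDuplicates()));

        // Case-insensitive comparer: "Cat" now counts as a repeat of "cat".
        Console.WriteLine(string.Join(", ",
            words.GetDuplicates(StringComparer.OrdinalIgnoreCase)));
    }
}
```

Each duplicate is reported at the position of its second occurrence, so with the case-insensitive comparer the spelling returned is the second one encountered.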

and without the LINQ:
string[] ss = {"1","1","1"};
var myList = new List<string>();
var duplicates = new List<string>();
foreach (var s in ss)
{
if (!myList.Contains(s))
myList.Add(s);
else
duplicates.Add(s);
}
// show list without duplicates
foreach (var s in myList)
Console.WriteLine(s);
// show duplicates list
foreach (var s in duplicates)
Console.WriteLine(s);

If you're looking for a more generic method:
public static List<U> FindDuplicates<T, U>(this List<T> list, Func<T, U> keySelector)
{
return list.GroupBy(keySelector)
.Where(group => group.Count() > 1)
.Select(group => group.Key).ToList();
}
EDIT: Here's an example:
public class Person {
public string Name {get;set;}
public int Age {get;set;}
}
List<Person> list = new List<Person>() { new Person() { Name = "John", Age = 22 }, new Person() { Name = "John", Age = 30 }, new Person() { Name = "Jack", Age = 30 } };
var duplicateNames = list.FindDuplicates(p => p.Name);
var duplicateAges = list.FindDuplicates(p => p.Age);
foreach(var dupName in duplicateNames) {
Console.WriteLine(dupName); // Will print out John
}
foreach(var dupAge in duplicateAges) {
Console.WriteLine(dupAge); // Will print out 30
}

Using LINQ, of course.
The code below gives you a dictionary keyed by each item (a string), whose value is the number of times that item occurs in your source list.
var item2ItemCount = list.GroupBy(item => item).ToDictionary(x=>x.Key,x=>x.Count());
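If only the duplicated items are wanted, the same dictionary can be built after filtering the groups. This is a minimal sketch; the class name, method name and sample data are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateCounts
{
    // Maps each value that occurs more than once to its number of occurrences.
    public static Dictionary<string, int> CountDuplicates(IEnumerable<string> source)
    {
        return source.GroupBy(item => item)
                     .Where(group => group.Count() > 1)
                     .ToDictionary(group => group.Key, group => group.Count());
    }

    public static void Main()
    {
        var list = new List<string> { "a", "b", "a", "c", "b", "a" };
        foreach (var pair in CountDuplicates(list))
            Console.WriteLine($"{pair.Key}: {pair.Value}");
    }
}
```

For the sample list the dictionary holds "a" with 3 and "b" with 2; "c" is dropped because it occurs once.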

For what it's worth, here is my way:
List<string> list = new List<string>(new string[] { "cat", "Dog", "parrot", "dog", "parrot", "goat", "parrot", "horse", "goat" });
Dictionary<string, int> wordCount = new Dictionary<string, int>();
//count them all:
list.ForEach(word =>
{
string key = word.ToLower();
if (!wordCount.ContainsKey(key))
wordCount.Add(key, 0);
wordCount[key]++;
});
//remove words appearing only once:
wordCount.Keys.ToList().FindAll(word => wordCount[word] == 1).ForEach(key => wordCount.Remove(key));
Console.WriteLine(string.Format("Found {0} duplicates in the list:", wordCount.Count));
wordCount.Keys.ToList().ForEach(key => Console.WriteLine(string.Format("{0} appears {1} times", key, wordCount[key])));

I'm assuming each string in your list contains several words; let me know if that's incorrect.
List<string> list = File.ReadAllLines("foobar.txt").ToList();
var words = from line in list
from word in line.Split(new[] { ' ', ';', ',', '.', ':', '(', ')' }, StringSplitOptions.RemoveEmptyEntries)
select word;
var duplicateWords = from w in words
group w by w.ToLower() into g
where g.Count() > 1
select new
{
Word = g.Key,
Count = g.Count()
};

I use a method like this to check for duplicate entries in a list of strings:
public static IEnumerable<string> CheckForDuplicated(IEnumerable<string> listString)
{
List<string> duplicateKeys = new List<string>();
List<string> notDuplicateKeys = new List<string>();
foreach (var text in listString)
{
if (notDuplicateKeys.Contains(text))
{
duplicateKeys.Add(text);
}
else
{
notDuplicateKeys.Add(text);
}
}
return duplicateKeys;
}
It may not be the shortest or most elegant way, but I think it is very readable.

lblrepeated.Text = "";
string value = txtInput.Text;
char[] arr = value.ToCharArray();
char[] crr=new char[1];
int count1 = 0;
for (int i = 0; i < arr.Length; i++)
{
int count = 0;
char letter=arr[i];
for (int j = 0; j < arr.Length; j++)
{
char letter3 = arr[j];
if (letter == letter3)
{
count++;
}
}
if (count1 < count)
{
Array.Resize<char>(ref crr,0);
int count2 = 0;
for(int l = 0;l < crr.Length;l++)
{
if (crr[l] == letter)
count2++;
}
if (count2 == 0)
{
Array.Resize<char>(ref crr, crr.Length + 1);
crr[crr.Length-1] = letter;
}
count1 = count;
}
else if (count1 == count)
{
int count2 = 0;
for (int l = 0; l < crr.Length; l++)
{
if (crr[l] == letter)
count2++;
}
if (count2 == 0)
{
Array.Resize<char>(ref crr, crr.Length + 1);
crr[crr.Length - 1] = letter;
}
count1 = count;
}
}
for (int k = 0; k < crr.Length; k++)
lblrepeated.Text = lblrepeated.Text + crr[k] + count1.ToString();

Related

Merge first list with second list based on standard deviation of second list C#

Given 2 datasets (both are sequences of values a number of standard deviations away from a number; we are looking for the overlapping sections):
var list1 = new decimal[] { 357.06m, 366.88m, 376.70m, 386.52m, 406.15m };
var list2 = new decimal[] { 370.51m, 375.62m, 380.72m, 385.82m, 390.93m };
I would like to perform a merge with items from List2 being placed closest to items of List1, within a certain range, i.e. merge List2 element within 5.10 (standard deviation) of List1 element:
357.06
366.88 => 370.51
376.70 => 375.62, 380.72
386.52 => 390.93
406.15
The idea is to cluster values from List2 and count them; in this case the element with value 376.70 would have the highest significance, as it has 2 close neighbors, 375.62 and 380.72 (whereas 366.88 and 386.52 have only 1 match each, and the remaining elements have none within range).
Which C# math/stats libraries could be used for this (or would there be a better way to combine statistically)?
If this is more of a computer science or stats question, apologies in advance; I will close it and reopen it on the relevant SO site.
Assuming that list2 is sorted (if it is not, call Array.Sort(list2); first), you can try binary search:
Given:
var list1 = new decimal[] { 357.06m, 366.88m, 376.70m, 386.52m, 406.15m };
var list2 = new decimal[] { 370.51m, 375.62m, 380.72m, 385.82m, 390.93m };
decimal sd = 5.10m;
Code:
// Array.Sort(list2); // Uncomment, if list2 is not sorted
List<(decimal value, decimal[] list)> result = new List<(decimal value, decimal[] list)>();
foreach (decimal value in list1) {
int leftIndex = Array.BinarySearch<decimal>(list2, value - sd);
if (leftIndex < 0)
leftIndex = -leftIndex - 1;
else // edge case: exact match, so step back to the first element equal to value - sd
for (; leftIndex >= 1 && list2[leftIndex - 1] == value - sd; --leftIndex) ;
int rightIndex = Array.BinarySearch<decimal>(list2, value + sd);
if (rightIndex < 0)
rightIndex = -rightIndex - 1;
else // edge case: exact match, so step past the last element equal to value + sd
for (; rightIndex < list2.Length && list2[rightIndex] == value + sd; ++rightIndex) ;
result.Add((value, list2.Skip(leftIndex).Take(rightIndex - leftIndex).ToArray()));
}
Let's have a look:
string report = string.Join(Environment.NewLine, result
.Select(item => $"{item.value} => [{string.Join(", ", item.list)}]"));
Console.Write(report);
Outcome:
357.06 => []
366.88 => [370.51]
376.70 => [375.62, 380.72]
386.52 => [385.82, 390.93]
406.15 => []
Something like this should work
var list1 = new double[] { 357.06, 366.88, 376.70, 386.52, 406.15 };
var list2 = new double[] { 370.51, 375.62, 380.72, 385.82, 390.93 };
double dev = 5.1;
var result = new Dictionary<double, List<double>>();
foreach (var l in list2) {
var diffs = list1.Select(r => new { diff = Math.Abs(r - l), r })
.Where(d => d.diff <= dev)
.MinBy(r => r.diff)
.FirstOrDefault();
if (diffs == null) {
continue;
}
List<double> list;
if (! result.TryGetValue(diffs.r, out list)) {
list = new List<double>();
result.Add(diffs.r, list);
}
list.Add(l);
}
It uses MinBy from MoreLinq, but it is easy to modify to work without it.
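One way to drop the MoreLinq dependency is to replace MinBy with OrderBy followed by Take(1). The sketch below is an assumption about how that modification could look, not the answerer's own version; the class, method and parameter names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NearestMerge
{
    // Assigns each candidate to the closest anchor within maxDistance.
    // Candidates with no anchor in range are skipped.
    public static Dictionary<double, List<double>> AssignToNearest(
        double[] anchors, double[] candidates, double maxDistance)
    {
        var result = new Dictionary<double, List<double>>();
        foreach (double candidate in candidates)
        {
            // OrderBy + Take(1) stands in for MoreLinq's MinBy.
            var nearest = anchors
                .Where(a => Math.Abs(a - candidate) <= maxDistance)
                .OrderBy(a => Math.Abs(a - candidate))
                .Take(1)
                .ToList();
            if (nearest.Count == 0)
                continue;

            if (!result.TryGetValue(nearest[0], out List<double> bucket))
            {
                bucket = new List<double>();
                result.Add(nearest[0], bucket);
            }
            bucket.Add(candidate);
        }
        return result;
    }

    public static void Main()
    {
        var list1 = new double[] { 357.06, 366.88, 376.70, 386.52, 406.15 };
        var list2 = new double[] { 370.51, 375.62, 380.72, 385.82, 390.93 };
        foreach (var pair in AssignToNearest(list1, list2, 5.1))
            Console.WriteLine($"{pair.Key} => {string.Join(", ", pair.Value)}");
    }
}
```

With the sample data, 366.88 receives 370.51, 376.70 receives 375.62 and 380.72, and 386.52 receives 385.82 and 390.93. Sorting every in-range anchor is more work than a single minimum scan, which should not matter for lists this small.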
In fact, you don't need extra libs or something else. You can use just LINQ for this.
internal class Program
{
private static void Main(string[] args)
{
var deviation = 5.1M;
var list1 = new decimal[] { 357.06M, 366.88M, 376.70M, 386.52M, 406.15M };
var list2 = new decimal[] { 370.51M, 375.62M, 380.72M, 385.82M, 390.93M };
var result = GetDistribution(list1.ToList(), list2.ToList(), deviation);
result.ForEach(x => Console.WriteLine($"{x.BaseValue} => {string.Join(", ", x.Destribution)} [{x.Weight}]"));
Console.ReadLine();
}
private static List<Distribution> GetDistribution(List<decimal> baseList, List<decimal> distrebutedList, decimal deviation)
{
return baseList.Select(x =>
new Distribution
{
BaseValue = x,
Destribution = distrebutedList.Where(y => x - deviation < y && y < x + deviation).ToList()
}).ToList();
}
}
internal class Distribution
{
public decimal BaseValue { get; set; }
public List<decimal> Destribution { get; set; }
public int Weight => Destribution.Count;
}
I hope it was useful for you.

Remove the repeating items and return the order number

I want to remove the repeating items of a list. I can do that easily with Distinct(), but I also need the order numbers (positions) of the items that have been removed. I can't find any function in LINQ that solves the problem, so I ended up implementing it with the following code:
public List<string> Repeat(List<string> str)
{
var Dlist = str.Distinct();
List<string> repeat = new List<string>();
foreach (string aa in Dlist)
{
int num = 0;
string re = "";
for (int i = 1; i <= str.LongCount(); i++)
{
if (aa == str[i - 1])
{
num = num + 1;
re = re + " - " + i;
}
}
if (num > 1)
{
repeat.Add(re.Substring(3));
}
}
return repeat;
}
Is there a simpler way to solve the problem? Or is there a function in LINQ that I missed? Any advice will be appreciated.
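This query builds the same kind of index strings as your function. Note that Where(group => group.Any()) is always true, so items that occur only once are listed too; use group.Count() > 1 instead if only repeated items should be kept, as in your function: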
var repeated = str.GroupBy(s => s).Where(group => group.Any())
.Select(group =>
{
var indices = Enumerable.Range(1, str.Count).Where(i => str[i-1] == group.Key).ToList();
return string.Join(" - ", group.Select((s, i) => indices[i]));
});
It first groups the items of the original list, so that all items with the same content end up in one group. It then looks up the indices of the group's items in the original list and joins those indices into a string, so that the resulting format is similar to the one you requested. You could also turn this statement lambda into an expression lambda:
var repeated = str.GroupBy(s => s).Where(group => group.Any())
.Select(group => string.Join(" - ",
group.Select((s, i) =>
Enumerable.Range(1, str.Count).Where(i2 => str[i2 - 1] == group.Key).ToList()[i])));
However, this significantly reduces performance.
I tested this with the following code:
public static void Main()
{
var str = new List<string>
{
"bla",
"bla",
"baum",
"baum",
"nudel",
"baum",
};
var copy = new List<string>(str);
var repeated = str.GroupBy(s => s).Where(group => group.Any())
.Select(group => string.Join(" - ",
group.Select((s, i) =>
Enumerable.Range(1, str.Count).Where(i2 => str[i2 - 1] == group.Key).ToList()[i])));
var repeated2 = Repeat(str);
var repeated3 = str.GroupBy(s => s).Where(group => group.Any())
.Select(group =>
{
var indices = Enumerable.Range(1, str.Count).Where(i => str[i-1] == group.Key).ToList();
return string.Join(" - ", group.Select((s, i) => indices[i]));
});
Console.WriteLine(string.Join("\n", repeated) + "\n");
Console.WriteLine(string.Join("\n", repeated2) + "\n");
Console.WriteLine(string.Join("\n", repeated3));
Console.ReadLine();
}
public static List<string> Repeat(List<string> str)
{
var distinctItems = str.Distinct();
var repeat = new List<string>();
foreach (var item in distinctItems)
{
var added = false;
var reItem = "";
for (var index = 0; index < str.LongCount(); index++)
{
if (item != str[index])
continue;
added = true;
reItem += " - " + (index + 1);
}
if (added)
repeat.Add(reItem.Substring(3));
}
return repeat;
}
Which has the following output:
1 - 2
3 - 4 - 6
5
1 - 2
3 - 4 - 6
5
1 - 2
3 - 4 - 6
5
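The repeated index lookup per group can be avoided by attaching the position to each item before grouping, so the list is walked only once. This is a sketch of that alternative rather than the code tested above; the method name and the minOccurrences parameter are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RepeatIndices
{
    // Returns one "i - j - k" string of 1-based positions per distinct item,
    // in order of first appearance. minOccurrences is a hypothetical knob:
    // pass 2 to keep only items that repeat.
    public static List<string> PositionsByItem(IList<string> items, int minOccurrences = 1)
    {
        return items
            .Select((item, index) => new { item, position = index + 1 })
            .GroupBy(x => x.item, x => x.position)
            .Where(group => group.Count() >= minOccurrences)
            .Select(group => string.Join(" - ", group))
            .ToList();
    }

    public static void Main()
    {
        var str = new List<string> { "bla", "bla", "baum", "baum", "nudel", "baum" };
        Console.WriteLine(string.Join("\n", PositionsByItem(str)));
        Console.WriteLine();
        Console.WriteLine(string.Join("\n", PositionsByItem(str, 2)));
    }
}
```

With the default argument this produces the same three lines as the test above; with minOccurrences set to 2 the single-item line "5" is dropped.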
Inside your Repeat method you can get the repeated items like this:
var repeated = str.GroupBy(s=>s)
.Where(grp=>grp.Count()>1)
.Select(y=>y.Key)
.ToList();

How to find the duplicates in the given string in c#

I want to find the duplicates in a given string. I tried it for collections and it works fine, but I don't know how to do it for a string.
Here is the code I tried for collections:
string name = "this is a a program program";
string[] arr = name.Split(' ');
var myList = new List<string>();
var duplicates = new List<string>();
foreach(string res in arr)
{
if (!myList.Contains(res))
{
myList.Add(res);
}
else
{
duplicates.Add(res);
}
}
foreach(string result in duplicates)
{
Console.WriteLine(result);
}
Console.ReadLine();
But I want to find the duplicates in the string below and store them in an array. How can I do that?
eg:- string aa = "elements";
In the above string I want to find the duplicate characters and store them in an array.
Can anyone help me?
Linq solution:
string name = "this is a a program program";
String[] result = name.Split(' ')
.GroupBy(word => word)
.Where(chunk => chunk.Count() > 1)
.Select(chunk => chunk.Key)
.ToArray();
Console.Write(String.Join(Environment.NewLine, result));
The same principle for duplicate characters within a string:
String source = "elements";
Char[] result = source
.GroupBy(c => c)
.Where(chunk => chunk.Count() > 1)
.Select(chunk => chunk.Key)
.ToArray();
// result = ['e']
Console.Write(String.Join(Environment.NewLine, result));
string name = "elements";
var myList = new List<char>();
var duplicates = new List<char>();
foreach (char res in name)
{
if (!myList.Contains(res))
{
myList.Add(res);
}
else if (!duplicates.Contains(res))
{
duplicates.Add(res);
}
}
foreach (char result in duplicates)
{
Console.WriteLine(result);
}
Console.ReadLine();
A string is a sequence of chars, so you can use your collection approach.
But I would recommend a typed HashSet. Load it with the string and you get its distinct chars (note that HashSet does not formally guarantee enumeration order).
Take a look:
string s = "aaabbcdaaee";
HashSet<char> hash = new HashSet<char>(s);
HashSet<char> hashDup = new HashSet<char>();
foreach (var c in s)
if (hash.Contains(c))
hash.Remove(c);
else
hashDup.Add(c);
foreach (var x in hashDup)
Console.WriteLine(x);
Console.ReadKey();
Instead of a List<> I'd use a HashSet<> because it doesn't allow duplicates and Add returns false in that case, which is more efficient. I'd also use a Dictionary<TKey, TValue> instead of the list to track the count of each char:
string text = "elements";
var duplicates = new HashSet<char>();
var duplicateCounts = new Dictionary<char, int>();
foreach (char c in text)
{
int charCount = 0;
bool isDuplicate = duplicateCounts.TryGetValue(c, out charCount);
duplicateCounts[c] = ++charCount;
if (isDuplicate)
duplicates.Add(c);
}
Now you have all unique duplicate chars in the HashSet and the count of each unique char in the dictionary. In this example the set contains only e because it appears three times in the string.
So you could output it in the following way:
foreach(char dup in duplicates)
Console.WriteLine("Duplicate char {0} appears {1} times in the text."
, dup
, duplicateCounts[dup]);
For what it's worth, here's a LINQ one-liner which also creates a Dictionary that only contains the duplicate chars and their count:
Dictionary<char, int> duplicateCounts = text
.GroupBy(c => c)
.Where(g => g.Count() > 1)
.ToDictionary(g => g.Key, g => g.Count());
I've shown it as second approach because you should first understand the standard way.
string name = "this is a a program program";
var arr = name.Split(' ').ToArray();
var dup = arr.Where(p => arr.Count(q => q == p) > 1).Select(p => p);
HashSet<string> hash = new HashSet<string>(dup);
string duplicate = string.Join(" ", hash);
You can do this through LINQ:
string name = "this is a a program program";
var d = name.Split(' ').GroupBy(x => x).Select(y => new { word = y.Key, Wordcount = y.Count() }).Where(z => z.Wordcount > 1).ToList();
Use LINQ to group values:
public static IEnumerable<T> GetDuplicates<T>(this IEnumerable<T> list)
{
return list.GroupBy(item => item).SelectMany(group => group.Skip(1));
}
public static bool HasDuplicates<T>(this IEnumerable<T> list)
{
return list.GetDuplicates().Any();
}
Then you use these extensions like this:
var list = new List<string> { "a", "b", "b", "c" };
var duplicatedValues = list.GetDuplicates();

finding all possible sum of two arrays element

I have two arrays and I am trying to get every possible sum of each element with the other elements of the two arrays, along with the index of each element:
int[] width = new int[2] {10,20 };
int[] height = new int[2] {30,40 };
The result should look like this (value / indexes):
10 width0
10+20 width0+width1
10+30 width0+height0
10+40 width0+height1
10+20+30 width0+width1+height0
10+20+40 width0+width1+height1
10+20+30+40 width0+width1+height0+height1
And so on for each element in the two arrays.
I tried using permutations, but I get different output.
It is easier to get all combinations from one array than from two. As we can see, you need to store the indices and array names along with the element values. So, in my opinion, the best option is to combine the two arrays into one dictionary, where the key is the number and the value is [array name + index of item] (e.g. width0, height1 and so on).
So, let's combine these arrays in one dictionary:
int[] width = new int[2] { 10, 20 };
int[] height = new int[2] { 30, 40 };
var widthDictionary = width.ToList().Select((number, index) => new { index, number })
.ToDictionary(key => key.number, value => string.Format("width{0}", value.index));
var heightDictionary = height.ToList().Select((number, index) => new { index, number })
.ToDictionary(key => key.number, value => string.Format("height{0}", value.index));
// And here is the final dictionary
var totalDictionary = widthDictionary.Union(heightDictionary);
Then add this method to your class: (source)
public static IEnumerable<IEnumerable<T>> GetPowerSet<T>(List<T> list)
{
return from m in Enumerable.Range(0, 1 << list.Count)
select
from i in Enumerable.Range(0, list.Count)
where (m & (1 << i)) != 0
select list[i];
}
Then send your dictionary as an argument to this method and project this collection as you want with the help of the Select() method:
var sumOfCombinations = GetPowerSet(totalDictionary.ToList())
.Where(x => x.Count() > 0)
.Select(x => new
{
Numbers = x.Select(pair => pair.Key).ToList(),
DisplayValues = x.Select(pair => pair.Value).ToList()
})
.ToList();
Finally, you can display the expected result like this:
sumOfCombinations.ForEach(x =>
{
x.Numbers.ForEach(number => Console.Write("{0} ", number));
x.DisplayValues.ForEach(displayValue => Console.Write("{0} ", displayValue));
Console.WriteLine();
});
And, the result is:
This is a play off of @Farhad Jabiyev's answer.
It declares a class called IndexValuePair and uses ForEach on the width and height lists to populate the Index property of each item.
Note: Index is a string.
Class & Static Function
public class IndexValuePair
{
public string Index {get;set;}
public int Value {get;set;}
}
public static IEnumerable<IEnumerable<T>> GetPowerSet<T>(List<T> list)
{
return from m in Enumerable.Range(0, 1 << list.Count)
select
from i in Enumerable.Range(0, list.Count)
where (m & (1 << i)) != 0
select list[i];
}
Main (Console)
static void Main(string[] args)
{
int[] width = new int[2] { 10, 20 };
int[] height = new int[2] { 30, 40 };
var wholeList = width.Select(val => new IndexValuePair() { Index = "width", Value = val }).ToList();
var heightList = height.Select(val => new IndexValuePair() { Index = "height", Value = val }).ToList();
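var iteration = 0;
// Append the running position to each label: width0, width1, ...
wholeList.ForEach(ivp => { ivp.Index = ivp.Index + iteration; iteration++; });
iteration = 0;
// Same for the height items: height0, height1, ...
heightList.ForEach(ivp => { ivp.Index = ivp.Index + iteration; iteration++; });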
wholeList.AddRange(heightList);
var sumOfCombinations = GetPowerSet(wholeList).Where(x => x.Count() > 0)
.Select(x => new { Combination = x.ToList(), Sum = x.Sum(ivp => ivp.Value) }).ToList();
StringBuilder sb = new StringBuilder();
sumOfCombinations.ForEach(ivp =>
{
ivp.Combination.ForEach(pair => sb.Append(string.Format("{0} ", pair.Value)));
sb.Append(string.Format("= {0} = ", ivp.Sum));
ivp.Combination.ForEach(pair=> sb.Append(string.Format("{0} + ", pair.Index)));
sb.Length -= 3;
Console.WriteLine(sb);
sb.Clear();
});
var key = Console.ReadKey();
}
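The bitmask idea behind GetPowerSet can also be written as one compact routine that returns each sum together with its labels. This is a sketch under assumptions: the tuple-based item type and the names below are illustrative and do not come from either answer, and it needs C# 7 or later for tuples.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LabelledSums
{
    // Every non-empty subset of the labelled values, with its sum and a "+"-joined label string.
    public static List<(int Sum, string Labels)> AllSums(IList<(string Label, int Value)> items)
    {
        var results = new List<(int Sum, string Labels)>();
        // Each mask from 1 to 2^n - 1 selects one non-empty subset;
        // bit i set means items[i] is part of it.
        for (int mask = 1; mask < (1 << items.Count); mask++)
        {
            var chosen = Enumerable.Range(0, items.Count)
                                   .Where(i => (mask & (1 << i)) != 0)
                                   .Select(i => items[i])
                                   .ToList();
            results.Add((chosen.Sum(c => c.Value),
                         string.Join("+", chosen.Select(c => c.Label))));
        }
        return results;
    }

    public static void Main()
    {
        int[] width = { 10, 20 };
        int[] height = { 30, 40 };
        var items = width.Select((v, i) => (Label: "width" + i, Value: v))
                         .Concat(height.Select((v, i) => (Label: "height" + i, Value: v)))
                         .ToList();
        foreach (var (sum, labels) in AllSums(items))
            Console.WriteLine($"{sum} {labels}");
    }
}
```

Keeping the items in a list rather than a dictionary keyed by number means two equal values (for example a width and a height that are both 10) do not collide. Four items give 15 non-empty subsets.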

How to find modal value across List<List<double>> for each inner value?

This is remarkably similar to another question I asked previously. I have no idea how to do things in LINQ, so I need some help with this one. I want to find the modal value of a List<List<double>> for each inner value.
I have the following list:
List<List<double>> myFullList = new List<List<double>>();
Random random = new Random(); // placeholder source of random numbers
for(int i = 1; i <= numberOfLoops; i++)
{
List<double> myInnerList = new List<double>();
for(int j = 1; j <= 10; j++)
{
// Populate inner list with random numbers
myInnerList.Add(random.NextDouble());
}
// Add the inner list to the full list
myFullList.Add(myInnerList);
}
The list should look something like this:
myFullList[0] = {rand#1,rand#2,rand#3,...,rand#10}
myFullList[1] = {rand#1,rand#2,rand#3,...,rand#10}
.
.
.
.
myFullList[numberOfLoops - 1] = {rand#1,rand#2,rand#3,...,rand#10}
I need to find the MODAL VALUE for that data to form ONE single list that looks something like this:
List<double> mode= new List<double>();
mode= {mode#1, mode#2........mode#10}
This output variable will find the mode of the data for the same "row" of data in the inner list.
Simple example:
innerList[0] = {1.00,2.00,3.00};
innerList[1] = {3.00,2.00,8.00};
innerList[2] = {3.00,9.00,1.00};
innerList[3] = {3.00,1.00,1};
fullList = {innerList[0], innerList[1], innerList[2], innerList[3]};
modeList = {3,2,1};
Not the most elegant way, but probably easier to understand. It has been successfully tested :)
class Program
{
static void Main(string[] args)
{
Random rnd = new Random();
int numberOfLoops = 10;
List<List<int>> myFullList = new List<List<int>>();
for (int i = 0; i < numberOfLoops; i++)
{
List<int> myInnerList = new List<int>();
for (int j = 0; j < 10; j++)
{
// Populate inner list with random numbers
myInnerList.Add(rnd.Next(0, 10));
}
// Add the inner list to the full list
myFullList.Add(myInnerList);
}
myFullList = Transpose<int>(myFullList);
List<int> result = new List<int>();
foreach (List<int> subList in myFullList)
result.Add(Mode(subList));
// LINQ version of the loop above:
//List<int> result = myFullList.Select(subList => Mode(subList)).ToList();
}
public static int Mode(List<int> x)
{
int mode = x.GroupBy(v => v)
.OrderByDescending(g => g.Count())
.First()
.Key;
return mode;
}
public static List<List<T>> Transpose<T>(List<List<T>> lists)
{
var longest = lists.Any() ? lists.Max(l => l.Count) : 0;
List<List<T>> outer = new List<List<T>>(longest);
for (int i = 0; i < longest; i++)
outer.Add(new List<T>(lists.Count));
for (int j = 0; j < lists.Count; j++)
for (int i = 0; i < longest; i++)
outer[i].Add(lists[j].Count > i ? lists[j][i] : default(T));
return outer;
}
}
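That's quite simple. Here is some code (sorry, I haven't fully tested it, but it's a starting point). Note that, as written, it returns the arithmetic mean of each inner list (Sum divided by Count) rather than the most frequent value at each position: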
public static class ModalHelper
{
public static List<double> GetModals(List<List<double>> source)
{
return source.Select(list => list.Sum()/list.Count).ToList();
}
}
This linq query should do the trick
var result = list.Select<List<double>, List<KeyValuePair<int, double>>>(sub =>
{
List<KeyValuePair<int, double>> elems = new List<KeyValuePair<int, double>>(sub.Count);
for (int i = 0; i < sub.Count; ++i)
elems.Add(new KeyValuePair<int, double>(i, sub[i]));
return elems;
}).SelectMany((x) => x).GroupBy((x) => x.Key).Select<IGrouping<int, KeyValuePair<int, double>>, double>(x =>
{
var y = x.GroupBy(g => g.Value).OrderByDescending(g => g.Count());
return y.First().First().Value;
});
Here is an example:
static void Main(string[] args)
{
List<List<double>> list = new List<List<double>>();
list.Add(new List<double> { 1.00, 2.00, 3.00 });
list.Add(new List<double> { 3.00, 2.00, 8.00 });
list.Add(new List<double> { 3.00, 9.00, 1.00 });
list.Add(new List<double> { 3.00, 1.00, 1 });
var result = list.Select<List<double>, List<KeyValuePair<int, double>>>(sub =>
{
List<KeyValuePair<int, double>> elems = new List<KeyValuePair<int, double>>(sub.Count);
for (int i = 0; i < sub.Count; ++i)
elems.Add(new KeyValuePair<int, double>(i, sub[i]));
return elems;
}).SelectMany((x) => x).GroupBy((x) => x.Key).Select<IGrouping<int, KeyValuePair<int, double>>, double>(x =>
{
var y = x.GroupBy(g => g.Value).OrderByDescending(g => g.Count());
return y.First().First().Value;
});
foreach (double val in result)
Console.Write(val + " ");
Console.WriteLine();
}
Here is a live version at ideone: http://ideone.com/ye2EhG
First the lists are transformed to lists of key-value-pairs which add the information of the index inside each list. Then these lists are flattened to one single list and then this new list is grouped by the index. The groups are ordered by the count of values and the most-frequent element is returned for each group.
Something like this should give the mode:
var temp = myFullList.SelectMany(l => l).GroupBy(all => all).Select(result => new
{
Value = result.Key,
Count = result.Count()
}).OrderByDescending(t => t.Count);
Explanation:
From MSDN - The SelectMany
Projects each element of a sequence to an IEnumerable and flattens
the resulting sequences into one sequence.
So it gives us each number from the sub lists. We then group by the numbers themselves and select the count for each along with its value. Finally we order by the count to give the most frequently occurring numbers first.
Edit based on the comment from Robert S
It seems the above code isn't what was required. As Robert S points out that code gives the mode of ALL numbers in the List<List<double>> but the question is how to get the mode from each column.
The following code should give the mode per column. Note that this code ignores ties; if more than one number appears the same number of times, the first number is returned:
var result1 = myFullList[0].Select((l, i) => new
{
Column = i,
Mode = myFullList.GroupBy(fl => fl[i]).OrderByDescending(t => t.Count()).Select(t => t.Key).FirstOrDefault()
});
foreach (var item in result1)
{
Console.WriteLine(string.Format("{0} {1}", item.Column, item.Mode));
}
The code is using the overload of Select to take the index of the element (the column in the OP's definition). It then groups each item at that index. Note there are no bounds checks on myFullList but in production code there should be.
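A possible shape for the bounds checks mentioned above is sketched here. It assumes that rows may have different lengths and that a row too short for a given column is simply left out of that column's count; the class and method names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ColumnModes
{
    // Mode of each column. Rows too short for a column are skipped for that column.
    // Assumes no inner list is null.
    public static List<double> ModePerColumn(List<List<double>> rows)
    {
        if (rows == null || rows.Count == 0)
            return new List<double>();

        int columns = rows.Max(row => row.Count);
        return Enumerable.Range(0, columns)
            .Select(col => rows
                .Where(row => col < row.Count)          // bounds check for jagged input
                .GroupBy(row => row[col])
                .OrderByDescending(group => group.Count())
                .First()                                 // ties resolve to the value seen first
                .Key)
            .ToList();
    }

    public static void Main()
    {
        var myFullList = new List<List<double>>
        {
            new List<double> { 1.00, 2.00, 3.00 },
            new List<double> { 3.00, 2.00, 8.00 },
            new List<double> { 3.00, 9.00, 1.00 },
            new List<double> { 3.00, 1.00, 1 },
        };
        Console.WriteLine(string.Join(", ", ModePerColumn(myFullList)));
    }
}
```

For the sample rows from the question this gives 3, 2 and 1. Because OrderByDescending is a stable sort, a tie goes to whichever value appears first in the column.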
If ties are an issue we need two steps:
var temp2 = myFullList[0].Select((l, i) => new
{
Column = i,
Mode = myFullList.GroupBy(fl => fl[i]).Select(t => new { Number = t.Key, Count = t.Count() }).OrderByDescending(a => a.Count)
});
var result2 = temp2.Select(t => new
{
Column = t.Column,
Mode = t.Mode.Where(m => m.Count == t.Mode.Max(tm => tm.Count))
});
foreach (var item in result2)
{
for (int i = 0; i < item.Mode.Count(); i++)
{
Console.WriteLine(string.Format("{0} {1}", item.Column, item.Mode.ElementAt(i).Number));
}
}
In the above code temp2.Mode will contain an IEnumerable of an anonymous object containing the number and how many times that number has appeared. result2 is then populated by grabbing each of those items where the count matches the max of the count.
Given the input:
myFullList.Add(new List<double> { 1.00, 2.00, 3.00 });
myFullList.Add(new List<double> { 3.00, 2.00, 3.00 });
myFullList.Add(new List<double> { 3.00, 9.00, 1.00 });
myFullList.Add(new List<double> { 3.00, 1.00, 1 });
The first code outputs
0 3
1 2
2 3
and the second outputs
0 3
1 2
2 3
2 1
Note we have two outputs for column 2 as both 3 and 1 are equally popular.
