comparing two lists and removing missing numbers with C# - c#

There are two lists:
List<int> list2 = new List<int>(new[] { 1, 2, 3, 5, 6 }); // missing: 0 and 4
List<int> list1 = new List<int>(new[] { 0, 1, 2, 3, 4, 5, 6 });
How do you compare two lists, find the numbers that are missing from list2, and remove those numbers from list1? To be more precise, I need a way to specify the starting and ending positions for the comparison.
I imagine that the process should be very similar to this:
Step 1.
int start_num = 3; // we know that comparisons starts at number 3
int start = list2.IndexOf(start_num); // we get index of Number (3)
int end = start + 2; // get ending position
int end_num = list2[end]; // get ending number (6)
now we've got positions of numbers (and numbers themselves) for comparison in List2 (3,5,6)
Step 2. To get positions of numbers in List1 for comparison - we can do the following:
int startlist1 = list1.IndexOf(start_num); // starting position
int endlist1 = list1.IndexOf(end_num); // ending position
The range is the following: (3,4,5,6)
Step 3. Comparison. The tricky part starts here, and I need help with it.
Basically now we need to compare list2 at (3,5,6) with list1 at (3,4,5,6). The missing number is "4".
// I have trouble with this step, but the result will be:
int remove_it = 4; // or int []
Step 4. Odd number removal.
int remove_it = 4;
list1 = list1.Where(a => a != remove_it).ToList();
This works great, but what happens if we have two missing numbers? For example:
int remove_it = 4; // becomes int[] remove_it = { 4, 0 };
Result. As you have guessed, the result is a new list1 without the number 4 in it.
richTextBox1.Text = "" + string.Join(",", list1.ToArray()); // output: 0,1,2,3,5,6
textBox1.Text = "" + start + " " + start_num; // output: 2 3
textBox3.Text = "" + end + " " + end_num; // output: 4 6
textBox2.Text = "" + startlist1; // output: 3
textBox4.Text = "" + endlist1; // output: 6
Can you guys help me out with Step 3 or point me in the right direction?
Also, what will happen if the starting number (start_num) is the last number in the list, but I still need the next two numbers? In the example above the numbers were 3,5,6, but the same logic should work for 5,6,0 or 6,0,1 or 0,1,2.

Just answering the first part:
var list3 = list1.Intersect(list2);
This will set list3 to { 0, 1, 2, 3, 4, 5, 6 } - { 0, 4 } = { 1, 2, 3, 5, 6 }
And a reaction to step 1:
int start_num = 3; // we know that comparisons starts at number 3
int start = list2.IndexOf(start_num); // we get index of Number (3)
int end = start + 2; // get ending position
Where do you get all those magic numbers (3, + 2) from?
I think you are over-thinking this, a lot.

var result = list1.Intersect(list2);
You can add a .ToList() on the end if you really need the result to be a list.

List<int> list2 = new List<int>(new[] { 1, 2, 3, 5, 6 }); // missing: 0 and 4
List<int> list1 = new List<int>(new[] { 0, 1, 2, 3, 4, 5, 6 });
// find items that are in list1 but not in list2
var exceptions = list1.Except(list2);
// or are you really wanting to do a union? (unique numbers in both arrays)
var uniquenumberlist = list1.Union(list2);
// or are you wanting to find common numbers in both arrays
var commonnumberslist = list1.Intersect(list2);

Maybe you should work with a sorted collection (for example SortedSet<int>) instead of List<int>; note that .NET has no type named OrderedList.

Something like this:
list1.RemoveAll(l=> !list2.Contains(l));

To get the numbers that exist in list1 but not in list2, you use the Except extension method:
IEnumerable<int> missing = list1.Except(list2);
To loop through this result and remove the numbers from list1, you have to materialize the result first. Otherwise the query keeps reading from the list while you are changing it, and you get an exception:
List<int> missing = list1.Except(list2).ToList();
Now you can just remove them:
foreach (int number in missing) {
list1.Remove(number);
}
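To make the point about materializing concrete, here is a minimal sketch. The class and method names (ExceptDemo, RemoveMissing, LazyRemovalThrows) are made up for illustration and are not from the question. The first method is the approach described above. The second runs the same loop without ToList() and reports whether modifying the list during enumeration raised InvalidOperationException.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptDemo
{
    // Removes from 'target' every value that does not occur in 'reference'.
    // ToList() takes a snapshot first, so the loop never enumerates 'target'
    // while it is being modified.
    public static void RemoveMissing(List<int> target, List<int> reference)
    {
        List<int> missing = target.Except(reference).ToList();
        foreach (int number in missing)
        {
            target.Remove(number);
        }
    }

    // Same loop without ToList(): Except is evaluated lazily, so it is still
    // reading 'target' when Remove changes it. Returns true if that raised
    // InvalidOperationException, false if the loop finished normally.
    public static bool LazyRemovalThrows(List<int> target, List<int> reference)
    {
        try
        {
            foreach (int number in target.Except(reference))
            {
                target.Remove(number);
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

Note that Except returns distinct values and List<T>.Remove deletes only the first occurrence, so if list1 can contain repeated values, list1.RemoveAll(x => missing.Contains(x)) is probably the safer call.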

I'm not sure I understand your issue, but I hope this solution works for you.
You have 2 lists:
List<int> list2 = new List<int>(new[] { 1, 2, 3, 5, 6 }); // missing: 0 and 4
List<int> list1 = new List<int>(new[] { 0, 1, 2, 3, 4, 5, 6 });
To remove from list1 all the numbers that are missing in list2, I suggest this solution:
Build a new list for the missing numbers:
List<int> diff = new List<int>();
Then put all the numbers you need to remove into this list. Now the removal should be simple: take each element you added to diff and remove it from list1.

Did I understand correctly that the algorithm is:
1) take the first number in list2 and find that number in list1,
2) then remove everything from list1 until you find the second number from list2 (5),
3) repeat step 2) for the next number in list2?

You can use Intersect in conjunction with Skip and Take to get the intersection logic combined with a range (here we ignore the fact 0 is missing as we skip it):
static void Main(string[] args)
{
var list1 = new List<int> { 1, 2, 3, 4, 5 };
var list2 = new List<int> { 0, 1, 2, 3, 5, 6 };
foreach (var i in list2.Skip(3).Take(3).Intersect(list1))
Console.WriteLine(i); // Outputs 3 then 5.
Console.Read();
}
Though if I'm being really honest, I'm not sure what is being asked - the only thing I'm certain on is the intersect part:
var list1 = new List<int> { 1, 2, 3, 4, 5 };
var list2 = new List<int> { 0, 1, 2, 3, 5, 6 };
foreach (var i in list2.Intersect(list1))
Console.WriteLine(i); // Outputs 1, 2, 3, 5.

OK, it seems I hadn't explained the problem well enough; sorry about that. Anyone interested can see what I meant by looking at this code:
List<int> list2 = new List<int>() { 1, 2, 3, 5, 6 }; // missing: 0 and 4
List<int> list1 = new List<int>() { 0, 1, 2, 3, 4, 5, 6 };
int number = 3; // starting position
int indexer = list2.BinarySearch(number);
if (indexer < 0)
{
list2.Insert(~indexer, number); // don't look at this part
}
// get indexes of "starting position"
int index1 = list1.Select((item, i) => new { Item = item, Index = i }).First(x => x.Item == number).Index;
int index2 = list2.Select((item, i) => new { Item = item, Index = i }).First(x => x.Item == number).Index;
// reorder lists starting at "starting position"
List<int> reorderedList1 = list1.Skip(index1).Concat(list1.Take(index1)).ToList(); //main big
List<int> reorderedList2 = list2.Skip(index2).Concat(list2.Take(index2)).ToList(); // main small
int end = 2; // get ending position: 2 numbers to the right
int end_num = reorderedList2[end]; // get ending number
int endlist1 = reorderedList1.IndexOf(end_num); // ending position
//get lists for comparison
reorderedList2 = reorderedList2.Take(end + 1).ToList();
reorderedList1 = reorderedList1.Take(endlist1 + 1).ToList();
//compare lists
var list3 = reorderedList1.Except(reorderedList2).ToList();
if (list3.Count != 0)
{
foreach (int item in list3)
{
list1 = list1.Where(x => x != item).ToList(); // remove from list
}
}
// list1 is the result that I wanted to see
If there are any ways to optimize this code, please let me know. Cheers.
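One possible way to shorten this, offered as a sketch rather than a drop-in replacement: index both lists modulo their length instead of building reordered copies, which also covers the wrap-around cases (5,6,0 or 6,0,1). The names WindowedRemoval, RemoveMissingInWindow, full, partial and span are invented for this example. It assumes startNum occurs in both lists, so it throws instead of inserting the number as the code above does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WindowedRemoval
{
    // full:     the complete list (list1 in the question)
    // partial:  the list with gaps (list2 in the question)
    // startNum: value at which the comparison window starts
    // span:     how many positions to the right of startNum the window extends
    public static List<int> RemoveMissingInWindow(
        List<int> full, List<int> partial, int startNum, int span)
    {
        int startPartial = partial.IndexOf(startNum);
        int startFull = full.IndexOf(startNum);
        if (startPartial < 0 || startFull < 0)
            throw new ArgumentException("startNum must occur in both lists");

        // Values of 'partial' inside the window; the modulo wraps past the end.
        var window = new HashSet<int>();
        for (int k = 0; k <= span; k++)
            window.Add(partial[(startPartial + k) % partial.Count]);
        int endNum = partial[(startPartial + span) % partial.Count];

        // Walk 'full' from startNum to endNum (wrapping) and note the gaps.
        var toRemove = new HashSet<int>();
        for (int k = 0; k < full.Count; k++)
        {
            int value = full[(startFull + k) % full.Count];
            if (!window.Contains(value)) toRemove.Add(value);
            if (value == endNum) break;
        }

        // Return a new list without the values found missing in the window.
        return full.Where(x => !toRemove.Contains(x)).ToList();
    }
}
```

With the lists from the question, startNum = 3 and span = 2, the window is (3,5,6) and only 4 is removed. With startNum = 6 the window wraps to (6,1,2) and only 0 is removed.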

Related

c# - How to remove from List using LINQ the elements with X times of repetition?

I have "myList" of int[].
int[] linha1 = { 1, 2, 2, 2, 3 };
int[] linha2 = { 1, 2, 2, 3, 4 };
int[] linha3 = { 1, 2, 3, 4, 5 };
List<int[]> myList = new List<int[]>();
myList.Add(linha1);
myList.Add(linha2);
myList.Add(linha3);
I want to remove from myList the elements with numbers repeating more than twice.
Ex.: only "linha1" would be removed because the number "2" repeats 3 times.
Is there a way using LINQ?
Just for the challenge try this
var linhaTokeep = myList.Where(l =>l.GroupBy(i => i).All(v => v.Count() <= 2));
foreach (var b in linhaTokeep)
{
Console.WriteLine(string.Join(",", b));
}

The union of the intersects of the 2 set combinations of a sequence of sequences

How can I find the set of items that occur in 2 or more sequences in a sequence of sequences?
In other words, I want the distinct values that occur in at least 2 of the passed in sequences.
Note:
This is not the intersect of all sequences but rather, the union of the intersect of all pairs of sequences.
Note 2:
This does not include the pair, or 2-combination, of a sequence with itself. That would be silly.
I have made an attempt myself,
public static IEnumerable<T> UnionOfIntersects<T>(
this IEnumerable<IEnumerable<T>> source)
{
var pairs =
from s1 in source
from s2 in source
select new { s1 , s2 };
var intersects = pairs
.Where(p => p.s1 != p.s2)
.Select(p => p.s1.Intersect(p.s2));
return intersects.SelectMany(i => i).Distinct();
}
but I'm concerned that this might be sub-optimal, I think it includes intersects of pair A, B and pair B, A which seems inefficient. I also think there might be a more efficient way to compound the sets as they are iterated.
I include some example input and output below:
{ { 1, 1, 2, 3, 4, 5, 7 }, { 5, 6, 7 }, { 2, 6, 7, 9 } , { 4 } }
returns
{ 2, 4, 5, 6, 7 }
and
{ { 1, 2, 3} } or { {} } or { }
returns
{ }
I'm looking for the best combination of readability and potential performance.
EDIT
I've performed some initial testing of the current answers, my code is here. Output below.
Original valid:True
DoomerOneLine valid:True
DoomerSqlLike valid:True
Svinja valid:True
Adricadar valid:True
Schmelter valid:True
Original 100000 iterations in 82ms
DoomerOneLine 100000 iterations in 58ms
DoomerSqlLike 100000 iterations in 82ms
Svinja 100000 iterations in 1039ms
Adricadar 100000 iterations in 879ms
Schmelter 100000 iterations in 9ms
At the moment, it looks as if Tim Schmelter's answer performs better by at least an order of magnitude.
// init sequences
var sequences = new int[][]
{
new int[] { 1, 2, 3, 4, 5, 7 },
new int[] { 5, 6, 7 },
new int[] { 2, 6, 7, 9 },
new int[] { 4 }
};
One-line way:
var result = sequences
.SelectMany(e => e.Distinct())
.GroupBy(e => e)
.Where(e => e.Count() > 1)
.Select(e => e.Key);
// result is { 2 4 5 7 6 }
Sql-like way (with ordering):
var result = (
from e in sequences.SelectMany(e => e.Distinct())
group e by e into g
where g.Count() > 1
orderby g.Key
select g.Key);
// result is { 2 4 5 6 7 }
Possibly the fastest code (but less readable), complexity O(N):
var dic = new Dictionary<int, int>();
var subHash = new HashSet<int>();
int length = sequences.Length;
for (int i = 0; i < length; i++)
{
    subHash.Clear();
    int subLength = sequences[i].Length;
    for (int j = 0; j < subLength; j++)
    {
        int n = sequences[i][j];
        // Add returns false when n was already seen in this sub-array,
        // so duplicates inside one sub-array are counted only once
        if (subHash.Add(n))
        {
            int counter;
            if (dic.TryGetValue(n, out counter))
            {
                // already seen in an earlier sub-array
                dic[n] = counter + 1;
            }
            else
            {
                // first occurrence
                dic[n] = 1;
            }
        }
    }
}
// values that occur in at least two sub-arrays
var repeated = dic.Where(p => p.Value > 1).Select(p => p.Key);
This should be very close to optimal - how "readable" it is depends on your taste. In my opinion it is also the most readable solution.
var seenElements = new HashSet<T>();
var repeatedElements = new HashSet<T>();
foreach (var list in source)
{
foreach (var element in list.Distinct())
{
if (seenElements.Contains(element))
{
repeatedElements.Add(element);
}
else
{
seenElements.Add(element);
}
}
}
return repeatedElements;
You can skip pairs of sequences that have already been intersected; this will be a little faster.
public static IEnumerable<T> UnionOfIntersects<T>(this IEnumerable<IEnumerable<T>> source)
{
var result = new List<T>();
var sequences = source.ToList();
for (int sequenceIdx = 0; sequenceIdx < sequences.Count(); sequenceIdx++)
{
var sequence = sequences[sequenceIdx];
for (int targetSequenceIdx = sequenceIdx + 1; targetSequenceIdx < sequences.Count; targetSequenceIdx++)
{
var targetSequence = sequences[targetSequenceIdx];
var intersections = sequence.Intersect(targetSequence);
result.AddRange(intersections);
}
}
return result.Distinct();
}
How does it work?
Input: {/*0*/ { 1, 2, 3, 4, 5, 7 } ,/*1*/ { 5, 6, 7 },/*2*/ { 2, 6, 7, 9 } , /*3*/{ 4 } }
Step 0: Intersect 0 with 1..3
Step 1: Intersect 1 with 2..3 (0 with 1 has already been intersected)
Step 2: Intersect 2 with 3 (0 with 2 and 1 with 2 have already been intersected)
Return: Distinct elements.
Result: { 2, 4, 5, 6, 7 }
You can test it with the below code
var lists = new List<List<int>>
{
new List<int> {1, 2, 3, 4, 5, 7},
new List<int> {5, 6, 7},
new List<int> {2, 6, 7, 9},
new List<int> {4 }
};
var result = lists.UnionOfIntersects();
You can try this approach. It might be more efficient, and it also lets you specify the minimum intersection count and the comparer used:
public static IEnumerable<T> UnionOfIntersects<T>(this IEnumerable<IEnumerable<T>> source
, int minIntersectionCount
, IEqualityComparer<T> comparer = null)
{
if (comparer == null) comparer = EqualityComparer<T>.Default;
foreach (T item in source.SelectMany(s => s).Distinct(comparer))
{
int containedInHowManySequences = 0;
foreach (IEnumerable<T> seq in source)
{
bool contained = seq.Contains(item, comparer);
if (contained) containedInHowManySequences++;
if (containedInHowManySequences == minIntersectionCount)
{
yield return item;
break;
}
}
}
}
Some explaining words:
It enumerates all unique items in all sequences. Since Distinct uses a set, this should be pretty efficient. That can help to speed things up when there are many duplicates across the sequences.
The inner loop just checks, for every sequence, whether the unique item is contained. Therefore it uses Enumerable.Contains, which stops as soon as the item is found (so duplicates are no issue).
If the intersection count reaches the minimum intersection count, the item is yielded and the next (unique) item is checked.
That should nail it:
int[][] test = { new int[] { 1, 2, 3, 4, 5, 7 }, new int[] { 5, 6, 7 }, new int[] { 2, 6, 7, 9 }, new int[] { 4 } };
var result = test.SelectMany(a => a.Distinct()).GroupBy(x => x).Where(g => g.Count() > 1).Select(y => y.Key).ToList();
First you make sure there are no duplicates within each sequence. Then you join all sequences into a single sequence and look for duplicates.

Merging arrays with common element

I want to merge arrays with common element. I have list of arrays like this:
List<int[]> arrList = new List<int[]>
{
new int[] { 1, 2 },
new int[] { 3, 4, 5 },
new int[] { 2, 7 },
new int[] { 8, 9 },
new int[] { 10, 11, 12 },
new int[] { 3, 9, 13 }
};
and I would like to merge these arrays like this:
List<int[]> arrList2 = new List<int[]>
{
new int[] { 1, 2, 7 },
new int[] { 10, 11, 12 },
new int[] { 3, 4, 5, 8, 9, 13 } //order of elements doesn't matter
};
How to do it?
Let each number be a vertex in the labelled graph. For each array connect vertices pointed by the numbers in the given array. E.g. given array (1, 5, 3) create two edges (1, 5) and (5, 3). Then find all the connected components in the graph (see: http://en.wikipedia.org/wiki/Connected_component_(graph_theory))
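A rough sketch of that idea follows, assuming a plain breadth-first search is acceptable for finding the components. The names ComponentMerge and MergeByComponents are invented for this example. Consecutive elements of each array are connected by an edge, exactly as in the (1, 5, 3) example, and every connected component becomes one merged array.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ComponentMerge
{
    public static List<int[]> MergeByComponents(List<int[]> arrays)
    {
        // Adjacency lists: consecutive elements of each array share an edge.
        var neighbours = new Dictionary<int, List<int>>();
        foreach (int[] arr in arrays)
        {
            for (int i = 0; i < arr.Length; i++)
            {
                if (!neighbours.ContainsKey(arr[i]))
                    neighbours[arr[i]] = new List<int>();
                if (i > 0)
                {
                    neighbours[arr[i]].Add(arr[i - 1]);
                    neighbours[arr[i - 1]].Add(arr[i]);
                }
            }
        }

        var visited = new HashSet<int>();
        var result = new List<int[]>();
        foreach (int start in neighbours.Keys)
        {
            if (!visited.Add(start)) continue; // already part of a component

            // Breadth-first search collects everything reachable from 'start'.
            var component = new List<int>();
            var queue = new Queue<int>();
            queue.Enqueue(start);
            while (queue.Count > 0)
            {
                int v = queue.Dequeue();
                component.Add(v);
                foreach (int w in neighbours[v])
                    if (visited.Add(w)) queue.Enqueue(w);
            }
            result.Add(component.ToArray());
        }
        return result;
    }
}
```

For the arrList from the question this should produce three arrays containing {1, 2, 7}, {3, 4, 5, 8, 9, 13} and {10, 11, 12}, in some order.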
I'm pretty sure it is neither the best nor the fastest solution, but it works.
static List<List<int>> Merge(List<List<int>> source)
{
var merged = 0;
do
{
merged = 0;
var results = new List<List<int>>();
foreach (var l in source)
{
var i = results.FirstOrDefault(x => x.Intersect(l).Any());
if (i != null)
{
i.AddRange(l);
merged++;
}
else
{
results.Add(l.ToList());
}
}
source = results.Select(x => x.Distinct().ToList()).ToList();
}
while (merged > 0);
return source;
}
I've used List<List<int>> instead of List<int[]> to get AddRange method available.
Usage:
var results = Merge(arrList.Select(x => x.ToList()).ToList());
// to get List<int[]> instead of List<List<int>>
var array = results.Select(x => x.ToArray()).ToList();
Use a disjoint-set forest data structure. The data structure supports three operations:
MakeSet(item) - creates a new set with a single item
Find(item) - Given an item, look up a set.
Union(item1, item2) - Given two items, connects together the sets to which they belong.
You can go through each array and call Union on its first element and each element that follows it. Once you are done with all arrays in the list, you can retrieve the individual sets by going through all the numbers again and calling Find(item) on them. Numbers for which Find returns the same set should be put into the same array.
With union by rank and path compression, each operation runs in O(α(n)) amortized time, so the whole merge takes O(n α(n)) for n numbers (α grows very slowly, so for all practical purposes it can be considered a small constant).
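Here is a small sketch of how that could look, assuming int items and dictionary-backed storage. The class names DisjointSet and DisjointSetMerge are made up for this example, and the implementation uses the usual union by rank and path compression rather than anything specific to this answer.

```csharp
using System.Collections.Generic;
using System.Linq;

public class DisjointSet
{
    private readonly Dictionary<int, int> parent = new Dictionary<int, int>();
    private readonly Dictionary<int, int> rank = new Dictionary<int, int>();

    // Creates a singleton set for 'item' unless it is already known.
    public void MakeSet(int item)
    {
        if (parent.ContainsKey(item)) return;
        parent[item] = item;
        rank[item] = 0;
    }

    // Returns the representative of item's set and compresses the path.
    public int Find(int item)
    {
        int root = item;
        while (parent[root] != root) root = parent[root];
        while (parent[item] != root)
        {
            int next = parent[item];
            parent[item] = root;
            item = next;
        }
        return root;
    }

    // Joins the two sets, attaching the lower-rank root under the higher one.
    public void Union(int a, int b)
    {
        int ra = Find(a), rb = Find(b);
        if (ra == rb) return;
        if (rank[ra] < rank[rb]) { int t = ra; ra = rb; rb = t; }
        parent[rb] = ra;
        if (rank[ra] == rank[rb]) rank[ra]++;
    }
}

public static class DisjointSetMerge
{
    public static List<int[]> Merge(List<int[]> arrays)
    {
        var sets = new DisjointSet();
        foreach (int[] arr in arrays)
        {
            foreach (int value in arr)
            {
                sets.MakeSet(value);
                sets.Union(arr[0], value); // link every element to the first one
            }
        }
        // Numbers with the same representative end up in the same array.
        return arrays.SelectMany(a => a).Distinct()
            .GroupBy(v => sets.Find(v))
            .Select(g => g.ToArray())
            .ToList();
    }
}
```

Usage would be DisjointSetMerge.Merge(arrList), which returns a List<int[]> with one array per merged group.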

How to display how many times an array element appears

I am new to C# and hope I can get some help on this topic. I have an array with elements and I need to display how many times every item appears.
For instance, in [1, 2, 3, 4, 4, 4, 3], 1 appears one time, 4 appears three times, and so on.
I have done the following but don't know what to put in the foreach/if statement...
int[] List = new int[]{1,2,3,4,5,4,4,3};
foreach(int d in List)
{
if("here I want to check for the elements")
}
Thank you, and sorry if this is a very basic one...
You can handle this via Enumerable.GroupBy. I recommend looking at the C# LINQ samples section on Count and GroupBy for guidance.
In your case, this can be:
int[] values = new []{1,2,3,4,5,4,4,3};
var groups = values.GroupBy(v => v);
foreach(var group in groups)
Console.WriteLine("Value {0} has {1} items", group.Key, group.Count());
You can keep a Dictionary of items found as well as their associated counts. In the example below, dict[d] refers to an element by its value. For example d = 4.
int[] List = new int[]{1,2,3,4,5,4,4,3};
var dict = new Dictionary<int, int>();
foreach(int d in List)
{
if (dict.ContainsKey(d))
dict[d]++;
else
dict.Add(d, 1);
}
When the foreach loop terminates you'll have one entry per unique value in dict. You can get the count of each item by accessing dict[d], where d is some integer value from your original list.
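As a small variation on the same idea, the counting can be wrapped in a helper that returns the dictionary so the counts can be read back afterwards. This is only a sketch. OccurrenceCounter and Count are names invented here, and TryGetValue is used so each value needs a single lookup before the update.

```csharp
using System.Collections.Generic;

public static class OccurrenceCounter
{
    // Returns a dictionary mapping each distinct value to how often it occurs.
    public static Dictionary<int, int> Count(IEnumerable<int> values)
    {
        var counts = new Dictionary<int, int>();
        foreach (int value in values)
        {
            int current;
            counts.TryGetValue(value, out current); // current stays 0 if the key is absent
            counts[value] = current + 1;
        }
        return counts;
    }
}
```

After var counts = OccurrenceCounter.Count(List); you can loop over counts with foreach (KeyValuePair<int, int> pair in counts) and print pair.Key and pair.Value.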
The LINQ answers are nice, but if you're trying to do it yourself:
int[] numberFound = new int[6];
int[] List = new int[] { 1, 2, 3, 4, 5, 4, 4, 3 };
foreach (int d in List)
{
numberFound[d]++;
}
var list = new int[] { 1, 2, 3, 4, 5, 4, 4, 3 };
var groups = list.GroupBy(i => i).Select(i => new { Number = i.Key, Count = i.Count() });
private static void CalculateNumberOfOccurenceSingleLoop()
{
int[] intergernumberArrays = { 1, 2, 3, 4, 1, 2, 4, 1, 2, 3, 5, 6, 1, 2, 1, 1, 2 };
Dictionary<int, int> NumberOccurence = new Dictionary<int, int>();
for (int i = 0; i < intergernumberArrays.Length; i++)
{
if (NumberOccurence.ContainsKey(intergernumberArrays[i]))
{
NumberOccurence[intergernumberArrays[i]]++; // increment in place; no need to scan the dictionary with Where
}
else
{
NumberOccurence.Add(intergernumberArrays[i], 1);
}
}
foreach (KeyValuePair<int, int> item in NumberOccurence)
{
Console.WriteLine(item.Key + " " + item.Value);
}
Console.ReadLine();
}

Finding duplicates within list of list

Simple situation. I have a list of lists, almost table like, and I am trying to find out if any of the lists are duplicated.
Example:
List<List<int>> list = new List<List<int>>(){
new List<int>() {0 ,1 ,2, 3, 4, 5, 6 },
new List<int>() {0 ,1 ,2, 3, 4, 5, 6 },
new List<int>() {0 ,1 ,4, 2, 4, 5, 6 },
new List<int>() {0 ,3 ,2, 5, 1, 6, 4 }
};
I would like to know that there are 4 total items, 2 of which are duplicates. I was thinking about doing something like a SQL checksum but I didn't know if there was a better/easier way.
I care about performance, and I care about ordering.
Additional Information That May Help
Things inserted into this list will never be removed
Not bound to any specific collection.
Don't care about the function signature
The type is not restricted to int
Let's try to get the best performance. If n is the number of lists and m is the length of each list, then we can get O(nm + n log n + n), plus some probability that hash codes are equal for different lists.
Major steps:
Calculate hash codes*
Sort them
Go over list to find dupes
* This is the important step. For simplicity you can calculate the hash as ... ^ (list[i] << i) ^ (list[i + 1] << (i + 1))
Edit for those who think that PLINQ, rather than a good algorithm, can boost things: PLINQ can also be added here, because all the steps are easily parallelizable.
My code:
static public void Main()
{
List<List<int>> list = new List<List<int>>(){
new List<int>() {0 ,1 ,2, 3, 4, 5, 6 },
new List<int>() {0 ,1 ,2, 3, 4, 5, 6 },
new List<int>() {0 ,1 ,4, 2, 4, 5, 6 },
new List<int>() {0 ,3 ,2, 5, 1, 6, 4 }
};
var hashList = list.Select((l, ind) =>
{
uint hash = 0;
for (int i = 0; i < l.Count; i++)
{
uint el = (uint)l[i];
hash ^= (el << i) | (el >> (32 - i));
}
return new {hash, ind};
}).OrderBy(l => l.hash).ToList();
//hashList.Sort();
uint prevHash = hashList[0].hash;
int firstInd = 0;
for (int i = 1; i <= hashList.Count; i++)
{
if (i == hashList.Count || hashList[i].hash != prevHash)
{
for (int n = firstInd; n < i; n++)
for (int m = n + 1; m < i; m++)
{
List<int> x = list[hashList[n].ind];
List<int> y = list[hashList[m].ind];
if (x.Count == y.Count && x.SequenceEqual(y))
Console.WriteLine("Dupes: {0} and {1}", hashList[n].ind, hashList[m].ind);
}
}
if (i == hashList.Count)
break;
if (hashList[i].hash != prevHash)
{
firstInd = i;
prevHash = hashList[i].hash;
}
}
}
Unless you're doing some seriously heavy lifting, perhaps the following simple code will work for you:
var lists = new List<List<int>>()
{
new List<int>() {0 ,1, 2, 3, 4, 5, 6 },
new List<int>() {0 ,1, 2, 3, 4, 5, 6 },
new List<int>() {0 ,1, 4, 2, 4, 5, 6 },
new List<int>() {0 ,3, 2, 5, 1, 6, 4 }
};
var duplicates = from list in lists
where lists.Except(new[] { list }).Any(l => l.SequenceEqual(list))
select list;
Obviously you could get better performance if you hand-tweak an algorithm such that you don't have to scan the lists each iteration, but there is something to be said for writing declarative, simpler code.
(Plus, thanks to the Awesomeness of LINQ®, by adding a .AsParallel() call to the above code, the algorithm will run on multiple cores, thus running potentially faster than the complex, hand-tweaked solutions mentioned in this thread.)
Something like this will give you the correct results:
List<List<int>> list = new List<List<int>>(){
new List<int>() {0 ,1 ,2, 3, 4, 5, 6 },
new List<int>() {0 ,1 ,2, 3, 4, 5, 6 },
new List<int>() {0 ,1 ,4, 2, 4, 5, 6 },
new List<int>() {0 ,3 ,2, 5, 1, 6, 4 }
};
list.ToLookup(l => String.Join(",", l.Select(i => i.ToString()).ToArray()))
.Where(lk => lk.Count() > 1)
.SelectMany(group => group);
You will have to iterate through each index of each list at least once, but you can potentially speed up the process by creating a custom hash table, so that you can quickly reject non-duplicate lists without having to do comparisons per-item.
Algorithm:
Create a custom hashtable (dictionary: hash -> list of lists)
For each list
    Take a hash of the list (one that takes order into account)
    Search in hashtable
    If you find matches for the hash
        For each list in the hash entry, re-compare the tables
            If you find a duplicate, return true
    Else if you don't find matches for the hash
        Create a temp list
        Append the current list to our temp list
        Add the temp list to the dictionary as a new hash entry
You didn't find any duplicates, so return false
If you have a strong enough hashing algorithm for your input data, you might not even have to do the sub-comparisons, since there wouldn't be any hash collisions.
I have some example code. The missing bits are:
An optimization so that we do the dictionary lookup only once per list (for search and insert). Might have to make your own Dictionary/Hash Table class to do this?
A better hashing algorithm that you find by profiling a bunch of them against your data
Here is the code:
public bool ContainsDuplicate(List<List<int>> input)
{
var encounteredLists = new Dictionary<int, List<EnumerableWrapper>>();
foreach (List<int> currentList in input)
{
var currentListWrapper = new EnumerableWrapper(currentList);
int hash = currentListWrapper.GetHashCode();
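if (encounteredLists.ContainsKey(hash))
{
foreach (EnumerableWrapper currentEncounteredEntry in encounteredLists[hash])
{
if (currentListWrapper.Equals(currentEncounteredEntry))
return true;
}
// hash collision with a different list: keep it so later lists are compared against it too
encounteredLists[hash].Add(currentListWrapper);
}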
else
{
var newEntry = new List<EnumerableWrapper>();
newEntry.Add(currentListWrapper);
encounteredLists[hash] = newEntry;
}
}
return false;
}
sealed class EnumerableWrapper
{
public EnumerableWrapper(IEnumerable<int> list)
{
if (list == null)
throw new ArgumentNullException("list");
this.List = list;
}
public IEnumerable<int> List { get; private set; }
public override bool Equals(object obj)
{
bool result = false;
var other = obj as EnumerableWrapper;
if (other != null)
result = Enumerable.SequenceEqual(this.List, other.List);
return result;
}
public override int GetHashCode()
{
// Todo: Implement your own hashing algorithm here
var sb = new StringBuilder();
foreach (int value in List)
sb.Append(value.ToString());
return sb.ToString().GetHashCode();
}
}
Here's a potential idea (this assumes that the values are numerical):
Implement a comparer that multiplies each member of each collection by its index, then sum the whole thing:
Value: 0 5 8 3 2 0 5 3 5 1
Index: 1 2 3 4 5 6 7 8 9 10
Multiple: 0 10 24 12 10 0 35 24 45 10
Member CheckSum: 170
So, the whole "row" has a number that changes with the members and ordering. Fast to compute and compare.
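The checksum above can be sketched in a few lines. PositionChecksum and Compute are names made up for this example, and the index is 1-based as in the table. Different rows can still share a checksum (for instance { 2, 0 } and { 0, 1 } both give 2), so equal checksums should be confirmed with an element-by-element comparison.

```csharp
using System.Collections.Generic;

public static class PositionChecksum
{
    // Sum of value * (1-based position); long avoids overflow for typical rows.
    public static long Compute(IReadOnlyList<int> row)
    {
        long sum = 0;
        for (int i = 0; i < row.Count; i++)
            sum += (long)row[i] * (i + 1);
        return sum;
    }
}
```

For the row in the table, PositionChecksum.Compute(new[] { 0, 5, 8, 3, 2, 0, 5, 3, 5, 1 }) returns 170.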
You could also try probabilistic algorithms if duplicates are either very rare or very common. e.g. a bloom filter
What about writing your own list comparer:
class ListComparer:IEqualityComparer<List<int>>
{
public bool Equals(List<int> x, List<int> y)
{
if(x.Count != y.Count)
return false;
for(int i = 0; i < x.Count; i++)
if(x[i] != y[i])
return false;
return true;
}
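public int GetHashCode(List<int> obj)
{
// combine the elements so that equal lists get equal hash codes
int hash = 17;
foreach (int value in obj)
hash = unchecked(hash * 31 + value);
return hash;
}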
}
and then just:
var nonDuplicatedList = list.Distinct(new ListComparer());
var distinctCount = nonDuplicatedList.Count();
If they are all single digits and the lists have the same number of elements, you can put the digits together, so the first list becomes 123456, and then check whether the numbers are the same.
Then you would have a list {123456, 123456, 142456, 325164},
which is easier to check for duplicates. If the individual members can be 10 or more, you would have to modify this.
Edit: added sample code. It can be optimized; it is just a quick example to explain what I meant.
var combined = new List<long>();
for (int i = 0; i < list.Count; i++)
{
    List<int> tempList = list[i];
    long temp = 0;
    // append the digits left to right: 0,1,2,3,4,5,6 becomes 123456
    for (int j = 0; j < tempList.Count; j++)
    {
        temp = temp * 10 + tempList[j];
    }
    combined.Add(temp);
}
for (int i = 0; i < combined.Count; i++)
{
    // start at i + 1 so an entry is never compared with itself
    for (int j = i + 1; j < combined.Count; j++)
    {
        if (combined[i] == combined[j])
        {
            return true;
        }
    }
}
return false;
There are a number of good solutions here already, but I believe this one will consistently run the fastest unless there is some structure of the data that you haven't yet told us about.
Create a map from integer key to List<int>, and a map from key to List<List<int>>
For each List<int>, compute a hash using some simple function like (...((x0)*a + x1)*a + ...)*a + xN) which you can calculate recursively; a should be something like 1367130559 (i.e. some large prime that is randomly non-close to any interesting power of 2).
Add the hash and the list it comes from as a key-value pair, if it does not exist. If it does exist, look in the second map. If the second map has that key, append the new List<int> to the accumulating list. If not, take the List<int> you looked up from the first map and the List<int> you were testing, and add a new entry in the second map containing a list of those two items.
Repeat until you've passed through your entire first list. Now you have a hashmap with a list of potential collisions (the second map), and a hashmap with a list of keys (the first map).
Iterate through the second map. For each entry, take the List<List<int>> therein and sort it lexicographically. Now just walk through doing equality comparisons to count the number of different blocks.
Your total number of items is equal to the length of your original list.
Your number of distinct items is equal to the size of your first hashmap plus the sum of (number of blocks - 1) for each entry in your second hashmap.
Your number of duplicate items is the difference of those two numbers (and you can find out all sorts of other things if you want).
If you have N non-duplicate items, and M entries which are duplicates from a set of K items, then it will take you O(N+M+2K) to create the initial hash maps, at the very worst O(M log M) to do the sorting (and probably more like O(M log(M/K))), and O(M) to do the final equality test.
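A simplified sketch of this scheme follows. It is not the exact two-map procedure above: it keeps a single dictionary from hash to the distinct lists seen with that hash, and compares within a bucket using SequenceEqual instead of sorting lexicographically. The names DuplicateCounter, PolynomialHash and CountDuplicates are invented here, and the multiplier is the constant suggested above.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DuplicateCounter
{
    private const int Multiplier = 1367130559; // constant taken from the text above

    // Horner-style evaluation of (...((x0)*a + x1)*a + ...)*a + xN,
    // wrapping around silently on integer overflow.
    public static int PolynomialHash(IEnumerable<int> values)
    {
        int hash = 0;
        foreach (int x in values)
            hash = unchecked(hash * Multiplier + x);
        return hash;
    }

    // Returns total minus distinct, i.e. how many lists repeat an earlier one.
    public static int CountDuplicates(List<List<int>> lists)
    {
        var buckets = new Dictionary<int, List<List<int>>>();
        int distinct = 0;
        foreach (List<int> list in lists)
        {
            int hash = PolynomialHash(list);
            List<List<int>> bucket;
            if (!buckets.TryGetValue(hash, out bucket))
            {
                bucket = new List<List<int>>();
                buckets[hash] = bucket;
            }
            // Only lists sharing a hash are compared element by element.
            if (!bucket.Any(seen => seen.SequenceEqual(list)))
            {
                bucket.Add(list);
                distinct++;
            }
        }
        return lists.Count - distinct;
    }
}
```

For the four lists in the question there are three distinct lists, so CountDuplicates returns 1 under the "total minus distinct" definition used above.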
Check out "C# 3.0: Need to return duplicates from a List<>"; it shows you how to return duplicates from a list.
Example from that page:
var duplicates = from car in cars
group car by car.Color into grouped
from car in grouped.Skip(1)
select car;
