How to compare two List<String> to each other? [duplicate] - c#

Let's say there are
List<string> a1 = new List<string>();
List<string> a2 = new List<string>();
Is there a way to do something like this?
if (a1 == a2)
{
}

If you want to check that the elements inside the list are equal and in the same order, you can use SequenceEqual:
if (a1.SequenceEqual(a2))
See it working online: ideone
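To make the difference from == concrete, here is a minimal sketch. The class and method names (SequenceEqualDemo, SameOrderAndContent) are made up for illustration; only SequenceEqual itself comes from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceEqualDemo
{
    // Hypothetical helper: true only when both lists hold equal elements in the same order.
    public static bool SameOrderAndContent(List<string> a1, List<string> a2)
    {
        return a1.SequenceEqual(a2);
    }

    public static void Main()
    {
        var a1 = new List<string> { "x", "y" };
        var a2 = new List<string> { "x", "y" };
        var a3 = new List<string> { "y", "x" };

        Console.WriteLine(a1 == a2);                    // False: == compares references
        Console.WriteLine(SameOrderAndContent(a1, a2)); // True
        Console.WriteLine(SameOrderAndContent(a1, a3)); // False: same items, different order
    }
}
```

SequenceEqual also returns false when the lists have different lengths, so no separate Count check is needed for correctness.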

You could also use Except (which produces the set difference of two sequences) to check whether the lists differ:
IEnumerable<string> inFirstOnly = a1.Except(a2);
IEnumerable<string> inSecondOnly = a2.Except(a1);
bool allInBoth = !inFirstOnly.Any() && !inSecondOnly.Any();
This is an efficient approach if neither the order nor the number of duplicates matters (as opposed to SequenceEqual in the accepted answer). Demo: Ideone
If you want to compare in a case-insensitive way, just add StringComparer.OrdinalIgnoreCase:
a1.Except(a2, StringComparer.OrdinalIgnoreCase)
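As a rough sketch of what "order and duplicates do not matter" means in practice, the two Except calls can be wrapped in one helper. SetEqualityDemo and SameDistinctElements are hypothetical names, and passing null as the comparer is assumed to fall back to the default string comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetEqualityDemo
{
    // Hypothetical helper: true when both sequences contain the same distinct values.
    // Order and repetition counts are ignored because Except works on sets.
    public static bool SameDistinctElements(
        IEnumerable<string> a1,
        IEnumerable<string> a2,
        IEqualityComparer<string> comparer = null)
    {
        return !a1.Except(a2, comparer).Any() && !a2.Except(a1, comparer).Any();
    }

    public static void Main()
    {
        var a1 = new List<string> { "a", "a", "b" };
        var a2 = new List<string> { "b", "a" };

        Console.WriteLine(SameDistinctElements(a1, a2)); // True: duplicates and order ignored
        Console.WriteLine(SameDistinctElements(new[] { "A" }, new[] { "a" })); // False
        Console.WriteLine(SameDistinctElements(
            new[] { "A" }, new[] { "a" }, StringComparer.OrdinalIgnoreCase)); // True
    }
}
```

If duplicates do matter to you, this check is too permissive and a counting approach is the safer choice.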

I discovered that SequenceEqual is not the most efficient way to compare two lists of strings (initially from http://www.dotnetperls.com/sequenceequal).
I wanted to test this myself so I created two methods:
/// <summary>
/// Compares two string lists using LINQ's SequenceEqual.
/// </summary>
public bool CompareLists1(List<string> list1, List<string> list2)
{
return list1.SequenceEqual(list2);
}
/// <summary>
/// Compares two string lists using a loop.
/// </summary>
public bool CompareLists2(List<string> list1, List<string> list2)
{
if (list1.Count != list2.Count)
return false;
for (int i = 0; i < list1.Count; i++)
{
if (list1[i] != list2[i])
return false;
}
return true;
}
The second method is a bit of code I encountered and wondered if it could be refactored to be "easier to read." (And also wondered if LINQ optimization would be faster.)
As it turns out, with two lists containing 32k strings, over 100 executions:
Method 1 took an average of 6761.8 ticks
Method 2 took an average of 3268.4 ticks
I usually prefer LINQ for brevity, performance, and code readability; but in this case I think a loop-based method is preferred.
Edit:
I recompiled using optimized code, and ran the test for 1000 iterations. The results still favor the loop (even more so):
Method 1 took an average of 4227.2 ticks
Method 2 took an average of 1831.9 ticks
Tested using Visual Studio 2010, C# .NET 4 Client Profile on a Core i7-920
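For anyone who wants to repeat the measurement, a timing harness along these lines should work. This is only a sketch: the names (ListCompareBenchmark, AverageTicks) and the test data are my own, not the original test code, and the numbers will differ by machine and runtime, so newer .NET versions may not reproduce the gap reported above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class ListCompareBenchmark
{
    public static bool CompareWithLinq(List<string> list1, List<string> list2)
    {
        return list1.SequenceEqual(list2);
    }

    public static bool CompareWithLoop(List<string> list1, List<string> list2)
    {
        if (list1.Count != list2.Count)
            return false;
        for (int i = 0; i < list1.Count; i++)
        {
            if (list1[i] != list2[i])
                return false;
        }
        return true;
    }

    // Average Stopwatch ticks per call (Stopwatch ticks are not the same unit as TimeSpan ticks).
    public static double AverageTicks(Func<bool> action, int iterations)
    {
        action(); // warm-up call so JIT compilation is not part of the measurement
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();
        return (double)sw.ElapsedTicks / iterations;
    }

    public static void Main()
    {
        // Assumed test data: 32,000 distinct strings, compared against an identical copy.
        List<string> list1 = Enumerable.Range(0, 32000).Select(i => "item" + i).ToList();
        List<string> list2 = new List<string>(list1);

        double linq = AverageTicks(() => CompareWithLinq(list1, list2), 100);
        double loop = AverageTicks(() => CompareWithLoop(list1, list2), 100);
        Console.WriteLine("SequenceEqual: {0:F1} ticks, loop: {1:F1} ticks", linq, loop);
    }
}
```

Run it from a Release build outside the debugger; otherwise the timings mostly measure the debugger.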

private static bool CompareDictionaries(IDictionary<string, IEnumerable<string>> dict1, IDictionary<string, IEnumerable<string>> dict2)
{
if (dict1.Count != dict2.Count)
{
return false;
}
var keyDiff = dict1.Keys.Except(dict2.Keys);
if (keyDiff.Any())
{
return false;
}
return (from key in dict1.Keys
let value1 = dict1[key]
let value2 = dict2[key]
select value1.Except(value2)).All(diffInValues => !diffInValues.Any());
}

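You can also use Intersect to get the elements the two lists have in common:
List<string> FilteredList = new List<string>();
// Compare the two lists and get the common elements (case-insensitive).
// Intersect returns an IEnumerable<string>, so ToList() is needed for the assignment to compile.
FilteredList = a1.Intersect(a2, StringComparer.OrdinalIgnoreCase).ToList();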


LINQ Select with a method that returns a type - creating a new list [duplicate]

I am a mere beginner and I am trying to learn a bit of LINQ. I have a list of values and I want to receive a different list based on some computation. For example, the below is often quoted in various examples across the Internet:
IEnumerable<int> squares = Enumerable.Range(1, 10).Select(x => x * x);
Here the "computation" simply multiplies each member of the original list by itself.
I wanted to actually use a method that returns a string and takes x as an argument.
Here is the code I wrote:
namespace mytests{
class program {
static void Main (string[] args)
{
List<string> nums = new List<string>();
nums.Add("999");
nums.Add("888");
nums.Add("777");
IEnumerable<string> strings = nums.AsEnumerable().Select(num => GetStrings(num));
Console.WriteLine(strings.ToString());
}
private static string GetStrings (string num){
if (num == "999")
return "US";
else if (num == "888")
{
return "GB";
}
else
{
return "PL";
}
}
}
}
It compiles but when debugging, the method GetStrings is never accessed and the strings object does not have any members. I was expecting it to return "US", "GB", "PL".
Any advice on what I could be doing wrong?
Thanks.
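Calling ToString() on an IEnumerable<string> does not work the way you expect. It does not print the elements; it returns the type name of the object, which for this query is the name of LINQ's internal iterator type, something like
System.Linq.Enumerable+WhereSelectListIterator`2[System.String,System.String]
(the exact name depends on the runtime version).
If you want to see the values held in the collection, you have to iterate over it: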
foreach (var i in strings)
Console.WriteLine(i);
This loop does two things. It writes the values held in the collection to the console, and it iterates the collection. LINQ queries are evaluated lazily: only when the iteration asks for a value does LINQ run the necessary operation (in your case, the GetStrings method).
Your current code never enumerates the query, so the values are never computed and GetStrings is never called.
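A small sketch makes the deferred execution visible. The call counter and the names DeferredExecutionDemo and GetCode are mine, added only to show when the method actually runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how often GetCode runs (illustration only).
    public static int Calls;

    public static string GetCode(string num)
    {
        Calls++;
        switch (num)
        {
            case "999": return "US";
            case "888": return "GB";
            default: return "PL";
        }
    }

    public static void Main()
    {
        var nums = new List<string> { "999", "888", "777" };

        // Building the query runs nothing yet.
        IEnumerable<string> query = nums.Select(num => GetCode(num));
        Console.WriteLine(Calls); // 0

        // ToList() enumerates the query, which calls GetCode once per element.
        List<string> codes = query.ToList();
        Console.WriteLine(Calls); // 3

        Console.WriteLine(string.Join(", ", codes)); // US, GB, PL
    }
}
```

Calling ToList() (or ToArray()) is also the usual way to keep the results around without re-running the query every time it is enumerated.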

C# All Unique Combinations of List<string> [duplicate]

This question has been asked many times, but every SO post I've seen wants a specific length in values, whereas I just want to know every unique combination regardless of length.
The code below only provides a list where there are exactly 3 entries of combinations (and they are not unique).
List<string> list = new List<string> { "003_PS", "003_DH", "003_HEAT" };
var perms = list.GetPermutations();
public static class Extensions
{
public static IEnumerable<IEnumerable<T>> GetPermutations<T>(this IEnumerable<T> items)
{
foreach (var item in items)
{
var itemAsEnumerable = Enumerable.Repeat(item, 1);
var subSet = items.Except(itemAsEnumerable);
if (!subSet.Any())
{
yield return itemAsEnumerable;
}
else
{
foreach (var sub in items.Except(itemAsEnumerable).GetPermutations())
{
yield return itemAsEnumerable.Union(sub);
}
}
}
}
}
/*
OUTPUT:
003_PS, 003_DH, 003_HEAT
003_PS, 003_HEAT, 003_DH
003_DH, 003_PS, 003_HEAT
003_DH, 003_HEAT, 003_PS
003_HEAT, 003_PS, 003_DH
003_HEAT, 003_DH, 003_PS
*/
What I'm looking for is this:
/*
OUTPUT:
003_PS, 003_DH, 003_HEAT
003_PS, 003_DH
003_PS, 003_HEAT
003_PS
003_DH, 003_HEAT
003_DH
003_HEAT
*/
The size is not limited to 3 items and each entry is unique.
What do I need to change in this function? I'm open to a LINQ solution
Any help is appreciated. Thanks!
EDIT: list above was not accurate to output
EDIT #2:
Here's a Javascript version of what I'm looking for. But I don't know what the C# equivalent syntax is:
function getUniqueList(arr){
if (arr.length === 1) return [arr];
else {
subarr = getUniqueList(arr.slice(1));
return subarr.concat(subarr.map(e => e.concat(arr[0])), [[arr[0]]]);
}
}
OK, you have n items, and you want the 2^n combinations of those items.
FYI, this is called the "power set" when you do it on sets; since sequences can contain duplicates and sets cannot, what you want is not exactly the power set, but it's close enough. The code I am presenting will be in a sense wrong if your sequence contains duplicates, since we'll say that the result for {10, 10} will be {}, {10}, {10}, {10, 10} but if you don't like that, well, remove duplicates before you start, and then you'll be computing the power set.
Computing the sequence of all combinations is straightforward. We reason as follows:
If there are zero items then there is only one combination: the combination that has zero items.
If there are n > 0 items then the combinations are all the combinations with the first element, and all the combinations without the first element.
Let's write the code:
using System;
using System.Collections.Generic;
using System.Linq;
static class Extensions
{
public static IEnumerable<T> Prepend<T>(
this IEnumerable<T> items,
T first)
{
yield return first;
foreach(T item in items)
yield return item;
}
public static IEnumerable<IEnumerable<T>> Combinations<T>(
this IEnumerable<T> items)
{
if (!items.Any())
yield return items;
else
{
var head = items.First();
var tail = items.Skip(1);
foreach(var sequence in tail.Combinations())
{
yield return sequence; // Without first
yield return sequence.Prepend(head);
}
}
}
}
public class Program
{
public static void Main()
{
var items = new [] { 10, 20, 30 };
foreach(var sequence in items.Combinations())
Console.WriteLine($"({string.Join(",", sequence)})");
}
}
And we get the output
()
(10)
(20)
(10,20)
(30)
(10,30)
(20,30)
(10,20,30)
Note that this code is not efficient if the original sequence is very long.
EXERCISE 1: Do you see why?
EXERCISE 2: Can you make this code efficient in both time and space for long sequences? (Hint: if you can make the head, tail and prepend operations more memory-efficient, that will go a long way.)
EXERCISE 3: You give the same algorithm in JavaScript; can you make the analogies between each part of the JS algorithm and the C# version?
Of course, we assume that the number of items in the original sequence is small. If it's even a hundred, you're going to be waiting a very long time to enumerate all 2^100 (roughly 10^30) combinations.
If you want to get all the combinations with a specific number of elements, see my series of articles on algorithms for that: https://ericlippert.com/2014/10/13/producing-combinations-part-one/
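For short lists there is also a non-recursive way to produce the same subsets: count from 1 to 2^n - 1 and treat each number as a bitmask selecting items. This is a different technique from the recursive code above, shown only as a sketch. SubsetDemo and NonEmptySubsets are hypothetical names, the empty combination is skipped to match the output asked for in the question, and the method assumes at most 30 items so the mask fits in an int.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubsetDemo
{
    // Yields every non-empty subset of items; bit i of mask decides whether items[i] is included.
    public static IEnumerable<List<T>> NonEmptySubsets<T>(IList<T> items)
    {
        if (items.Count > 30)
            throw new ArgumentException("At most 30 items are supported by the int bitmask.");

        int total = 1 << items.Count; // 2^n
        for (int mask = 1; mask < total; mask++) // start at 1 to skip the empty subset
        {
            var subset = new List<T>();
            for (int i = 0; i < items.Count; i++)
            {
                if ((mask & (1 << i)) != 0)
                    subset.Add(items[i]);
            }
            yield return subset;
        }
    }

    public static void Main()
    {
        var list = new List<string> { "003_PS", "003_DH", "003_HEAT" };
        foreach (List<string> subset in NonEmptySubsets(list))
            Console.WriteLine(string.Join(", ", subset));
        // Prints 7 lines, one per non-empty combination.
    }
}
```

As with the recursive version, duplicates in the input produce duplicate combinations, so call Distinct() on the input first if that matters.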

How to check if random values are unique?

C# code:
I have 20 random numbers between 1 and 100 in an array, and the program should check whether every value is unique. I need a method that returns true if the array contains only unique values and false if any value is repeated. I would appreciate it if someone could help me with this.
bool allUnique = array.Distinct().Count() == array.Count(); // or array.Length
or
var uniqueNumbers = new HashSet<int>(array);
bool allUnique = uniqueNumbers.Count == array.Count();
A small alternative to @TimSchmelter's excellent answer that can run a bit more efficiently:
public static bool AllUniq<T> (this IEnumerable<T> data) {
HashSet<T> hs = new HashSet<T>();
return data.All(hs.Add);
}
This is essentially equivalent to the following foreach loop:
public static bool AllUniq<T> (this IEnumerable<T> data) {
HashSet<T> hs = new HashSet<T>();
foreach(T x in data) {
if(!hs.Add(x)) {
return false;
}
}
return true;
}
As soon as one hs.Add call fails, because the element is already in the set, the method returns false. If no duplicate is found, it returns true.
The reason this can be faster is that it stops as soon as a duplicate is found, whereas the previously discussed approaches first construct a collection of all the unique numbers and then compare the sizes. If you iterate over a large number of values, constructing the entire distinct collection can be computationally expensive.
Furthermore, note that there are cleverer ways than generate-and-test to produce distinct random numbers, for instance interleaving the generate and test steps. I once had to fix a project that generated Sudoku puzzles by generate-and-test; one had to wait entire days before it came up with a puzzle.
Here's a non-LINQ solution:
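One common way to avoid generate-and-test altogether is to shuffle the pool of candidate values and take the first few. The sketch below uses a partial Fisher-Yates shuffle; it is my own illustration (DistinctRandomDemo and PickDistinct are made-up names), not code from the answers above.

```csharp
using System;
using System.Linq;

public static class DistinctRandomDemo
{
    // Returns `count` distinct random integers from the inclusive range [min, max].
    public static int[] PickDistinct(int count, int min, int max, Random rng)
    {
        int[] pool = Enumerable.Range(min, max - min + 1).ToArray();
        if (count < 0 || count > pool.Length)
            throw new ArgumentOutOfRangeException("count");

        // Partial Fisher-Yates shuffle: only the first `count` positions are finalized.
        for (int i = 0; i < count; i++)
        {
            int j = rng.Next(i, pool.Length); // i <= j < pool.Length
            int tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
        }
        return pool.Take(count).ToArray();
    }

    public static void Main()
    {
        int[] numbers = PickDistinct(20, 1, 100, new Random());
        Console.WriteLine(string.Join(", ", numbers));
        Console.WriteLine(numbers.Distinct().Count() == numbers.Length); // True
    }
}
```

Because each value can be picked at most once, the result is unique by construction and no check is needed afterwards.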
for(int i=0; i< YourArray.Length;i++)
{
for(int x=i+1; x< YourArray.Length; x++)
{
if(YourArray[i] == YourArray[x])
{
Console.WriteLine("Found repeated value");
}
}
}
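Since the question asks for a method that returns true or false, the nested loop can be wrapped like this. It is a sketch with made-up names (NestedLoopUniqueness, AllUnique); it returns at the first repeated value instead of printing a message.

```csharp
using System;

public static class NestedLoopUniqueness
{
    // Compares every pair once; O(n^2), which is fine for 20 values.
    public static bool AllUnique(int[] values)
    {
        for (int i = 0; i < values.Length; i++)
        {
            for (int x = i + 1; x < values.Length; x++)
            {
                if (values[i] == values[x])
                    return false; // found a repeated value
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(AllUnique(new[] { 1, 2, 3 })); // True
        Console.WriteLine(AllUnique(new[] { 1, 2, 1 })); // False
    }
}
```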

c# Linq Except Not Returning List of Different Values

I am trying to find the differences between two lists. List "y" should have 1 unique value when compared to list "x". However, Except does not return the difference. The count of the "differences" list always equals 0.
List<EtaNotificationUser> etaNotifications = GetAllNotificationsByCompanyIDAndUserID(PrevSelectedCompany.cmp_ID);
IEnumerable<string> x = etaNotifications.OfType<string>();
IEnumerable<string> y = EmailList.OfType<string>();
IEnumerable<string> differences = x.Except(y, new StringLengthEqualityComparer()).ToList();
foreach(string diff in differences)
{
addDiffs.Add(diff);
}
After reading a few posts and articles on the topic, I created a custom comparer. The comparer looks at string length (kept simple for testing) and uses the length as the hash code. Since these are two objects of different types (even though I convert their types to string), I thought that may have been the issue.
class StringLengthEqualityComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
return x.Length == y.Length;
}
public int GetHashCode(string obj)
{
return obj.Length;
}
}
This is my first time using Except. Sounds like a great, optimized way of comparing two lists, but I can't get it to work.
Update
X - Should hold Email Addresses from the database.
GetAllNotificationsByCompanyIDAndUserID - brings back email values from the DB.
Y - Should hold all Email Addresses in the UI Grid.
What I am trying to do is detect whether a new e-mail has been added to the grid. So at this point X will have the saved values from past entries. Y will have any new e-mail addresses added by the user that have not been saved yet.
I have verified this is all working correctly.
The problem is here:
IEnumerable<string> x = etaNotifications.OfType<string>();
But etaNotifications is a List<EtaNotificationUser>, and none of its elements can be a string, since string is sealed. OfType returns all instances that are of the given type; it does not "convert" each member to that type.
So x will always be empty.
Maybe you want:
IEnumerable<string> x = etaNotifications.Select(e => e.ToString());
if EtaNotificationUser has overridden ToString to give you the value you want to compare. If the value you want to compare is in a property you can use:
IEnumerable<string> x = etaNotifications.Select(e => e.EmailAddress);
or some other property.
You'll likely have to do something similar for y (unless EmailList is already a List<string> which I doubt).
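Putting that together, a sketch of the whole check might look like the following. The EtaNotificationUser class here is a stand-in with an assumed EmailAddress property, and NewEmails is a hypothetical helper. Note also that to find addresses that are in the grid but not yet saved, the grid list has to be the first operand of Except.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the real class; the EmailAddress property is an assumption.
public class EtaNotificationUser
{
    public string EmailAddress { get; set; }
}

public static class OfTypeDemo
{
    // Returns the grid addresses that are not among the saved ones.
    public static List<string> NewEmails(
        IEnumerable<EtaNotificationUser> saved,
        IEnumerable<string> gridEmails)
    {
        // Select projects each object to a string; OfType<string>() would only filter by type.
        IEnumerable<string> savedEmails = saved.Select(u => u.EmailAddress);
        return gridEmails.Except(savedEmails, StringComparer.OrdinalIgnoreCase).ToList();
    }

    public static void Main()
    {
        var saved = new List<EtaNotificationUser>
        {
            new EtaNotificationUser { EmailAddress = "a@example.com" }
        };
        var grid = new List<string> { "a@example.com", "b@example.com" };

        Console.WriteLine(saved.OfType<string>().Count());           // 0: no element is a string
        Console.WriteLine(string.Join(", ", NewEmails(saved, grid))); // b@example.com
    }
}
```

The case-insensitive comparer is a design choice for e-mail addresses; drop it if you need exact matching.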
Assuming you have verified that your two enumerables x and y actually contain the strings you expect them to, I believe your problem is with your string comparer. According to the docs, Enumerable.Except "Produces the set difference of two sequences. The set difference is the members of the first sequence that don't appear in the second sequence." But your equality comparer equates all strings with the same length. Thus, if a string in the first sequence happens to have the same length as a string in the second, it will not be found as different using your comparer.
Update: yup, I just tested it:
public class StringLengthEqualityComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
return x.Length == y.Length;
}
public int GetHashCode(string obj)
{
return obj.Length;
}
}
string [] array1 = new string [] { "foo", "bar", "yup" };
string[] array2 = new string[] { "dll" };
int diffCount;
diffCount = 0;
foreach (var diff in array1.Except(array2, new StringLengthEqualityComparer()))
{
diffCount++;
}
Debug.Assert(diffCount == 0); // No assert.
diffCount = 0;
foreach (var diff in array1.Except(array2))
{
diffCount++;
}
Debug.Assert(diffCount == 0); // Assert b/c diffCount == 3.
The assertion does not fire with the custom comparer, but it does with the default one.

Test whether two IEnumerable<T> have the same values with the same frequencies

I have two multisets, both IEnumerables, and I want to compare them.
string[] names1 = { "tom", "dick", "harry" };
string[] names2 = { "tom", "dick", "harry", "harry"};
string[] names3 = { "tom", "dick", "harry", "sally" };
string[] names4 = { "dick", "harry", "tom" };
I want names1 == names4 to return true (and self == self obviously returns true),
but all other combinations should return false.
What is the most efficient way? These can be large sets of complex objects.
I looked at doing:
var a = name1.OrderBy<MyCustomType, string>(v => v.Name);
var b = name4.OrderBy<MyCustomType, string>(v => v.Name);
return a == b;
First sort as you have already done, and then use Enumerable.SequenceEqual. You can use the first overload if your type implements IEquatable<MyCustomType> or overrides Equals; otherwise you will have to use the second form and provide your own IEqualityComparer<MyCustomType>.
So if your type does implement equality, just do:
return a.SequenceEqual(b);
Here's another option that is faster, safer, and requires no sorting:
public static bool UnsortedSequencesEqual<T>(
this IEnumerable<T> first,
IEnumerable<T> second)
{
return UnsortedSequencesEqual(first, second, null);
}
public static bool UnsortedSequencesEqual<T>(
this IEnumerable<T> first,
IEnumerable<T> second,
IEqualityComparer<T> comparer)
{
if (first == null)
throw new ArgumentNullException("first");
if (second == null)
throw new ArgumentNullException("second");
var counts = new Dictionary<T, int>(comparer);
foreach (var i in first) {
int c;
if (counts.TryGetValue(i, out c))
counts[i] = c + 1;
else
counts[i] = 1;
}
foreach (var i in second) {
int c;
if (!counts.TryGetValue(i, out c))
return false;
if (c == 1)
counts.Remove(i);
else
counts[i] = c - 1;
}
return counts.Count == 0;
}
The most efficient way would depend on the datatypes. A reasonably efficient O(N) solution that's very short is the following:
var list1Groups=list1.ToLookup(i=>i);
var list2Groups=list2.ToLookup(i=>i);
return list1Groups.Count == list2Groups.Count
&& list1Groups.All(g => g.Count() == list2Groups[g.Key].Count());
The items are required to have a valid Equals and GetHashcode implementation.
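Here is that ToLookup approach applied to the arrays from the question, as a self-contained sketch. MultisetDemo and SameElementsSameCounts are names I made up; the logic is the same as in the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultisetDemo
{
    // True when both sequences contain the same values with the same frequencies.
    public static bool SameElementsSameCounts<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        ILookup<T, T> groups1 = first.ToLookup(i => i);
        ILookup<T, T> groups2 = second.ToLookup(i => i);

        // A lookup returns an empty sequence for a missing key, so a missing value shows up as count 0.
        return groups1.Count == groups2.Count
            && groups1.All(g => g.Count() == groups2[g.Key].Count());
    }

    public static void Main()
    {
        string[] names1 = { "tom", "dick", "harry" };
        string[] names2 = { "tom", "dick", "harry", "harry" };
        string[] names3 = { "tom", "dick", "harry", "sally" };
        string[] names4 = { "dick", "harry", "tom" };

        Console.WriteLine(SameElementsSameCounts(names1, names4)); // True
        Console.WriteLine(SameElementsSameCounts(names1, names2)); // False: "harry" count differs
        Console.WriteLine(SameElementsSameCounts(names1, names3)); // False: "sally" is extra
    }
}
```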
If you want a faster solution, cdhowie's dictionary-based solution is comparably fast at 10,000 elements, and pulls ahead by a factor of 5 for large collections of simple objects, probably due to better memory efficiency.
Finally, if you're really interested in performance, I'd definitely try the Sort-then-SequenceEqual approach. Although it has worse complexity, that's just a log N factor, and those can definitely be drowned out by differences in the constant for all practical data set sizes - and you might be able to sort in-place, use arrays or even incrementally sort (which can be linear). Even at 4 billion elements, the log-base-2 is just 32; that's a relevant performance difference, but the difference in constant factor could conceivably be larger. For example, if you're dealing with arrays of ints and don't mind modifying the collection order, the following is faster than either option even for 10000000 items (twice that and I get an OutOfMemory on 32-bit):
Array.Sort(list1);
Array.Sort(list2);
return list1.SequenceEqual(list2);
YMMV depending on machine, data-type, lunar cycle, and the other usual factors influencing microbenchmarks.
You could use a binary search tree to keep the data sorted. Each insertion is then an O(log N) operation, so building a tree costs O(N log N) overall. After that you can walk both trees one item at a time and break as soon as you find a pair that is not equal. This also gives you the added benefit of being able to compare the sizes of the two trees first, since duplicates would be filtered out. I'm assuming these are treated as sets, whereby {"harry", "harry"} == {"harry"}.
If you are counting duplicates, then do a quicksort or a mergesort first; the comparison itself is then an O(N) operation. You could of course compare the sizes first, as two enumerables cannot be equal if their sizes differ. Since the data is sorted, the first non-equal pair you encounter means the whole result is "not equal".
@cdhowie's answer is great, but here's a nice trick that makes it even better for types that declare .Count: compare that value before falling back to enumerating the parameters as IEnumerable. Just add this to your code in addition to his solution:
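The sort-first idea for the duplicate-counting case can be sketched as below. SortedCompareDemo and SortedEqual are hypothetical names, and the method assumes the supplied comparer agrees with the type's default equality (for strings, StringComparer.Ordinal does). The inputs are copied so the callers' collections are not reordered.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedCompareDemo
{
    // Copies both inputs, checks the sizes, sorts the copies, then compares position by position.
    public static bool SortedEqual<T>(
        IEnumerable<T> first,
        IEnumerable<T> second,
        IComparer<T> comparer)
    {
        List<T> a = first.ToList();
        List<T> b = second.ToList();

        if (a.Count != b.Count)
            return false; // different sizes can never be equal as multisets

        a.Sort(comparer); // O(N log N)
        b.Sort(comparer);
        return a.SequenceEqual(b); // O(N), stops at the first mismatch
    }

    public static void Main()
    {
        string[] names1 = { "tom", "dick", "harry" };
        string[] names2 = { "tom", "dick", "harry", "harry" };
        string[] names4 = { "dick", "harry", "tom" };

        Console.WriteLine(SortedEqual(names1, names4, StringComparer.Ordinal)); // True
        Console.WriteLine(SortedEqual(names1, names2, StringComparer.Ordinal)); // False
    }
}
```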
public static bool UnsortedSequencesEqual<T>(this IReadOnlyList<T> first, IReadOnlyList<T> second, IEqualityComparer<T> comparer = null)
{
if (first.Count != second.Count)
{
return false;
}
return UnsortedSequencesEqual((IEnumerable<T>)first, (IEnumerable<T>)second, comparer);
}
