Difference of two lists C# - c#

I have two lists of strings, each with about 300,000 lines. List 1 has a few more lines than List 2. What I'm trying to do is find the strings that are in List 1 but not in List 2.
Considering how many strings I have to compare, is Except() good enough or is there something better (faster)?

Internally, the Enumerable.Except extension method uses a hash-based Set<T> to perform the computation. It's going to be at least as fast as any other method.
Go with list1.Except(list2).
It'll give you the best performance and the simplest code.
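As a rough illustration of why this is fast, the observable behaviour of Except can be approximated by the sketch below. This is not the framework's actual source, and ExceptSketch is a hypothetical name. It loads the second sequence into a hash set once and then streams the first, so the cost grows roughly with the combined length of the two lists rather than with their product.

```csharp
using System.Collections.Generic;

public static class ExceptDemo
{
    // Hypothetical helper that mimics the observable behaviour of Enumerable.Except.
    public static IEnumerable<string> ExceptSketch(IEnumerable<string> first, IEnumerable<string> second)
    {
        // Every item of the second sequence goes into the set up front.
        var seen = new HashSet<string>(second);
        foreach (var item in first)
        {
            // Add returns false for items already in the set, which filters out
            // both the excluded items and repeats within the first sequence.
            if (seen.Add(item))
                yield return item;
        }
    }
}
```

For example, ExceptDemo.ExceptSketch(new[] { "a", "b", "c", "b" }, new[] { "a" }) yields "b" and "c", with the repeated "b" dropped, just as Except would.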

My suggestion:
HashSet<String> hash1 = new HashSet<String>(new string[] { "a", "b", "c", "d" });
HashSet<String> hash2 = new HashSet<String>(new string[] { "a", "b" });
List<String> result = hash1.Except(hash2).ToList();

Related

algorithm - Generate all combinations of selecting one element from a lot of list [duplicate]

Let's say I have two lists:
A = {"b", "c", "d", "x", "y", "z"}
B = {"a", "e", "i"}
I want to generate all combinations of selecting one element from each list.
For just 2 lists this is easy (pseudo-ish code):
combinations = []
for a in A:
    for b in B:
        combinations += [(a, b)]
But what if the number of lists is unknown? I want to generalize this somehow, preferably with a C# extension method.
The method signature would be something like this:
public static IEnumerable<IEnumerable<T>> Combination<T>(this IEnumerable<IEnumerable<T>> elements)
EDIT:
I was looking for Cartesian product, thanks for clarification.
If you are just flat combining, and you have a set of these sets, you should be able to simply use a SelectMany on the sets.
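public static IEnumerable<IEnumerable<T>> Combination<T>(this IEnumerable<IEnumerable<T>> elements)
{
    // Start with a single empty combination, then extend every partial
    // combination with each item of the next list (Cartesian product).
    IEnumerable<IEnumerable<T>> seed = new[] { Enumerable.Empty<T>() };
    return elements.Aggregate(seed, (acc, list) =>
        acc.SelectMany(prefix => list.Select(item => prefix.Concat(new[] { item }))));
}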
Assuming you want all combinations including the ones with duplicate elements in a different order, this is basically what the existing method SelectMany does.
It's fairly easy to get all combinations using query expressions and anonymous objects. Here's an example:
var combinations = from a in allAs
from b in allBs
select new { A = a, B = b };
Doing this in one fell swoop (using the proposed Combination method) would at least require you to:
Provide a function to "select" a combination (similar to the signature of the Zip method)
Or, return a generic Tuple<T, T>
But then again, it's really just replacing the usage of a cartesian product (SelectMany) and a projection (Select), both of which already exist.
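The first option, a result selector in the style of Zip, might look like the sketch below. Combine and CombineExtensions are hypothetical names, not existing framework members. The body relies on the SelectMany overload that takes both a collection selector and a result selector, so the product and the projection happen in one call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CombineExtensions
{
    // Hypothetical helper: pairs every element of first with every element
    // of second and lets the caller decide what each pair becomes.
    public static IEnumerable<TResult> Combine<TFirst, TSecond, TResult>(
        this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TResult> resultSelector)
    {
        // For each a in first, walk all of second and project (a, b).
        return first.SelectMany(a => second, resultSelector);
    }
}
```

With this in scope, new[] { "b", "c" }.Combine(new[] { "a", "e" }, (x, y) => x + y) yields "ba", "be", "ca", "ce". As the answer notes, this is only a thin wrapper over what SelectMany already provides.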

Hashcode to check uniqueness in a string array

I am storing a large number of arrays of data in a List; however, I don't want to store the data if it already exists in my list, and the order of the data doesn't matter. I figured using GetHashCode to generate a hash code would be appropriate because it was supposed to not care about order. However, what I found with the simple test below is that the first two arrays, a1 and a2, get different hash codes.
Can I not utilize this method of checking? Can someone suggest a better way to check please?
string[] a1 = { "cat", "bird", "dog" };
string[] a2 = { "cat", "dog", "bird" };
string[] a3 = { "cat", "fish", "dog" };
Console.WriteLine(a1.GetHashCode());
Console.WriteLine(a2.GetHashCode());
Console.WriteLine(a3.GetHashCode());
The test above produces three different hash codes.
Ideally, I would have liked to see the same Hashcode for a1 and a2...so I am looking for something that would allow me to quickly check if those strings already exist.
Your arrays aren't equal, by the standard used by arrays for determining equality. The standard used by arrays for determining equality is that two separately created arrays are never equal.
If you want separately created collections with equal elements to compare as equal, then use a collection type which supports that.
I recommend HashSet<T>, in your case HashSet<string>. It doesn't provide the GetHashCode() and Equals() behaviour you want directly, but it has a CreateSetComparer() method that provides you with a helper class that does give you hash code and comparer methods that do what you want.
Just remember that you cannot use this for a quick equality check. You can only use this for a quick inequality check. Two objects that are not equal may still have the same hash code, basically by random chance. It's only when the hash codes aren't equal that you can skip the equality check.
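A minimal sketch of the CreateSetComparer() approach, assuming it is acceptable to treat each array as a set: the names UniqueSets and KeepDistinct are hypothetical. Note that converting an array to a HashSet<string> also discards repeated elements within that array, so { "cat", "cat", "dog" } and { "cat", "dog" } would count as the same data.

```csharp
using System.Collections.Generic;

public static class UniqueSets
{
    // Hypothetical helper: keeps only the arrays whose set of elements
    // has not been seen before, ignoring element order.
    public static List<string[]> KeepDistinct(IEnumerable<string[]> arrays)
    {
        // The set comparer treats two HashSet<string> instances as equal
        // when they contain the same elements.
        var seen = new HashSet<HashSet<string>>(HashSet<string>.CreateSetComparer());
        var result = new List<string[]>();
        foreach (var array in arrays)
        {
            // Add returns false when an equal set is already stored.
            if (seen.Add(new HashSet<string>(array)))
                result.Add(array);
        }
        return result;
    }
}
```

Passing the question's a1, a2 and a3 to KeepDistinct returns a1 and a3; a2 is skipped because it holds the same strings as a1 in a different order.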
Calling a1.GetHashCode() on an array does not look at the array's contents. Each array instance gets its own hash code, so separately created arrays will generally print different values even when they hold the same strings:
using System;
public class Program
{
    public static void Main()
    {
        string[] a1 = { "cat", "bird", "dog" };
        string[] a2 = { "cat", "dog", "bird" };
        string[] a3 = { "cat", "fish", "dog" };
        Console.WriteLine(a1.GetHashCode());
        Console.WriteLine(a2.GetHashCode());
        Console.WriteLine(a3.GetHashCode());
    }
}

Sort a List<string[]> by the second value (int) [duplicate]

I have a list
List<string[]> listname = ...
The list looks like this:
[string][string][string]
I want to sort the list by second string.
The second string is a number represented as a string. I want to keep it that way; I need it like this.
I want the numbers to be in increasing order.
Example for data:
["RDP"]["3389"]["TCP"]
["HTTP"]["80"]["TCP"]
["HTTPS"]["443"]["TCP"]
I want to sort by the port number.
In this example "RDP" will become the last one.
How can I do so?
var sortedList = listname.OrderBy(l => int.Parse(l[1]));
This assumes that the second entry in each array is parsable as int. If you're not sure about that, you'd need to add some checks and maybe use int.TryParse.
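One way the int.TryParse variant could look is sketched below. PortSort and SortByPort are hypothetical names, and sending unparsable rows to the end of the list is an assumption made for illustration; rejecting or skipping them may suit your data better.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PortSort
{
    // Hypothetical helper: sorts rows by the numeric value of their second entry.
    public static List<string[]> SortByPort(IEnumerable<string[]> rows)
    {
        return rows
            // Rows whose second entry is not a valid int get int.MaxValue
            // as their key and therefore sort last (an assumption, see above).
            .OrderBy(row => int.TryParse(row[1], out var port) ? port : int.MaxValue)
            .ToList();
    }
}
```

With the sample data from the question, the rows come back in the order HTTP (80), HTTPS (443), RDP (3389). The strings themselves are left untouched; only the sort key is parsed.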
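You can refer to the appropriate index of an array in OrderBy. Note that this compares the values as strings, so "443" would sort before "80"; parse the value to an int if you need numeric order.
var l = new List<string[]>() {
    new string[] { "a", "b", "c" },
    new string[] { "b", "a", "c" },
    new string[] { "c", "c", "c" },
};
var s = l.OrderBy(c => c[1]).ToList();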

Getting the Matched number of the two array

I have 3 arrays.
String[] arrFirst={"a","b","c","d","e"};
String[] arrSecond={"a","b","f","d","g"};
String[] arrThird={"a","f","g","h","e"};
For arrFirst and arrSecond, I want the result to be 3.
For arrFirst and arrThird, the result should be 2.
All the code I have found compares two arrays and returns whether they are exactly the same or not.
But what I want is how many elements match.
I can do it with a loop,
but I think it will take too much time, and I am wondering whether there is a faster way.
Thanks.
You can use the Intersect method.
String[] arrFirst={"a","b","c","d","e"};
String[] arrSecond={"a","b","f","d","g"};
String[] arrThird={"a","f","g","h","e"};
arrFirst.Intersect(arrSecond).Count(); // 3
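arrFirst.Intersect(arrThird).Count(); // 2
// Join counts every matching pair, regardless of position:
arrFirst.Join(arrSecond, f => f, s => s, (f, s) => f).Count();
// Zip compares the arrays position by position:
arrFirst.Zip(arrSecond, (a, b) => a.Equals(b)).Count(a => a);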

Why Except function applying Distinct? [duplicate]

I have two List<String> like
lstOne = { "A", "B", "C" ,"C" ,"C" };
lstTwo = { "A" };
lstResult = lstOne.Except(lstTwo).ToList();
Now the expected output is
lstResult = { "B","C","C","C" };
But the actual result is
lstResult = { "B","C" };
Why is that? I've used Except, so why is it applying Distinct too?
"Except" is documented as returning the set difference of two sequences.
The set difference by definition is a set. Sets by definition don't have duplicates.
"the expected output is ..."
No, the expected output is identical to the actual output.
If you expect something different, my advice is to adjust your expectations to match the documented behaviour.
It is documented as returning "A sequence that contains the set difference of the elements of two sequences.". Sets do not have duplicates.
It is perhaps a subtle point, but it functions as per the spec.
If you want the dups:
var lstOne = new[] { "A", "B", "C" ,"C" ,"C" };
var except = new HashSet<string> { "A" };
var lstResult = lstOne.Where(x => !except.Contains(x)).ToList();
// ^^ "B", "C", "C", "C"
MSDN Definition: "Produces the set difference of two sequences by using the default equality comparer to compare values." --> Difference as set --> each key is unique.
