I have a small struct and I have to compare the values to find which ones have the same FreeFlow text, and then grab that struct ENumber.
public struct Holder
{
public string FreeFlow;
public int ENumber;
}
and here is how I add them
foreach(Class1.TextElement re in Class1._TextElements)
{
//create struct with all details will be good for later
Holder ph = new Holder();
ph.FreeFlow = re.FreeFlow;
ph.ENumber = re.ENumber;
lstHolder.Add(ph);
}
foreach(Class1.TextElement2 re in Class1._TextElements2)
{
//create struct with all details will be good for later
Holder phi = new Holder();
phi.FreeFlow = re.FreeFlow;
phi.ENumber = re.ENumber;
lstHolder2.Add(phi);
}
I can do the comparison with a foreach inside a foreach, but I don't think that is the most efficient way. Any help?
EDIT: I am trying to determine if freeflow text is exactly the same as the other struct freeflow text
"I have to compare the values to find which ones have the same FreeFlow text, and then grab that struct ENumber."
If you can use LINQ you can join on the items with the same FreeFlow text then select the ENumber values of both items:
var query = from x in Class1._TextElements
join y in Class1._TextElements2 on x.FreeFlow equals y.FreeFlow
select new { xId = x.ENumber, yId = y.ENumber };
foreach (var item in query)
{
Console.WriteLine("{0} : {1}", item.xId, item.yId);
}
EDIT: my understanding is the FreeFlow text is the common member and that ENumber is probably different, otherwise it would make sense to determine equivalence based on that. If that is the case the join query above should be what you need.
If I'm interpreting you correctly, you want to find the elements that are in both lstHolder and lstHolder2, which is the intersection. If so, there is a two-step solution: first, override Equals() (and GetHashCode(), which Intersect relies on) on your Holder struct so that equality is based on FreeFlow; then use the LINQ Intersect operator:
var result = lstHolder.Intersect(lstHolder2);
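A minimal sketch of that two-step approach, assuming two holders should count as equal when their FreeFlow text matches exactly and ENumber is ignored (that choice is an assumption, not something the question states). HolderDemo and CommonENumbers are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct Holder : IEquatable<Holder>
{
    public string FreeFlow;
    public int ENumber;

    // Assumption: equality is decided by FreeFlow alone (ordinal, case-sensitive).
    public bool Equals(Holder other)
    {
        return string.Equals(FreeFlow, other.FreeFlow, StringComparison.Ordinal);
    }

    public override bool Equals(object obj)
    {
        return obj is Holder && Equals((Holder)obj);
    }

    // Must agree with Equals: equal FreeFlow text gives an equal hash code.
    public override int GetHashCode()
    {
        return FreeFlow == null ? 0 : FreeFlow.GetHashCode();
    }
}

public static class HolderDemo
{
    // ENumbers of the holders in 'first' whose FreeFlow also occurs in 'second'.
    public static List<int> CommonENumbers(List<Holder> first, List<Holder> second)
    {
        return first.Intersect(second).Select(h => h.ENumber).ToList();
    }
}
```

Intersect yields the matching elements of the first list only, and each distinct FreeFlow once. If you need the ENumber from both sides, the join query keeps both.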
What do you mean by "compare"? This could mean a lot of things. Do you want to know which items are common to both sets? Do you want to know which items are different?
LINQ might have the answer no matter what you mean. Union, Except, etc.
If you are using C# 3.0 or higher then try the SequenceEqual method
Class1._TextElements.SequenceEqual(Class1._TextElements2);
This will run equality checks on the elements in the collection. If the sequences are of different lengths or any of the elements in the same position are not equal it will return false.
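To illustrate how position-sensitive that check is, here is a small sketch using plain string arrays (SequenceEqualDemo is a hypothetical helper). It answers "are the two lists identical?", not "which items do they share?".

```csharp
using System.Linq;

public static class SequenceEqualDemo
{
    public static bool Compare(string[] left, string[] right)
    {
        // True only when both sequences have the same length and the
        // elements at every position are equal (default string equality).
        return left.SequenceEqual(right);
    }
}
```

With this helper, { "a", "b" } against { "a", "b" } is equal, while { "a", "b" } against { "b", "a" } is not, because the order differs.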
I have two DataTables. I applied the Except operator as follows,
and got either unfiltered or undesired results.
resultDataTable = dtA.AsEnumerable().Except(dtB.AsEnumerable()).CopyToDataTable();
Could anyone please kindly explain to me why Except(dtB.AsEnumerable()) is not the way to put it?
Note:
Both DataTables are plain simple with just one column.
dtA contains a dozen rows of strings.
dtB contains thousands of rows of strings.
I also tried the same syntax with another set operator, Intersect. This does not work either.
resultDataTable2 = dtA.AsEnumerable().Intersect(dtB.AsEnumerable()).CopyToDataTable();
Except will use the default comparer, i.e. it will compare references.
I think you are expecting the result to be filtered with a comparison based on members.
I recommend implementing your own IEqualityComparer to compare two objects based on their members.
e.g.
resultDataTable = dtA.AsEnumerable().Except(dtB.AsEnumerable(), new TestComparer()).CopyToDataTable();
class TestComparer : IEqualityComparer<MyTestClass>
{
public bool Equals(MyTestClass b1, MyTestClass b2)
{
if (b2 == null && b1 == null)
return true;
else if (b1 == null || b2 == null)
return false;
else if(b1.Prop1 == b2.Prop1 && b1.Prop2 == b2.Prop2) // ToDo add more check based on class
return true;
else
return false;
}
public int GetHashCode(MyTestClass obj)
{
if (obj == null) return 0;
// Hash the same properties that Equals compares (assumes they are non-null); add more based on class properties
int hCode = obj.Prop1.GetHashCode() ^ obj.Prop2.GetHashCode();
return hCode;
}
}
Doc
https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.except?view=net-6.0#system-linq-enumerable-except-1(system-collections-generic-ienumerable((-0))-system-collections-generic-ienumerable((-0))-system-collections-generic-iequalitycomparer((-0)))
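For DataRow specifically, System.Data already ships a value-based comparer, DataRowComparer.Default, so a hand-written comparer may not be needed. A hedged sketch, assuming both tables have the same column layout (DataTableExceptDemo and RowsOnlyInA are hypothetical names):

```csharp
using System.Data;
using System.Linq;

public static class DataTableExceptDemo
{
    public static DataTable RowsOnlyInA(DataTable dtA, DataTable dtB)
    {
        // DataRowComparer.Default compares rows by their column values,
        // not by reference.
        var onlyInA = dtA.AsEnumerable()
            .Except(dtB.AsEnumerable(), DataRowComparer.Default)
            .ToList();

        // CopyToDataTable throws when the sequence has no rows,
        // so fall back to an empty table with the same schema.
        return onlyInA.Count > 0 ? onlyInA.CopyToDataTable() : dtA.Clone();
    }
}
```

The same comparer can be passed to Intersect.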
Could anyone please kindly explain to me why Except(dtB.AsEnumerable()) is not the way to put it?
When you do a.Except(b) the contents of b are loaded into a hash set. A hash set is a collection that doesn't accept duplicates: its Add method returns false when you try to add something that is already in it.
After b is loaded into the hash set, a is looped over, and each element is also added to the hash set. Anything that adds successfully (set.Add returns true because it is not already there) is returned. Anything that is already there (set.Add returns false because it was added by being in b, or by appearing earlier in a) is not returned. You should note that this process also dedupes a, so 1,1,2,3 except 2,3 would return just a single 1. You've achieved "every unique thing in a that isn't in b", but do check whether you wanted a to be deduped too.
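The deduplication is easy to see with plain integers. This sketch (ExceptDedupDemo is a hypothetical name) contrasts Except with a Where filter that keeps repeats:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ExceptDedupDemo
{
    // Set difference: the repeated 1 in 'a' is collapsed to a single 1.
    public static int[] WithExcept(int[] a, int[] b)
    {
        return a.Except(b).ToArray();
    }

    // Filter only: everything in 'a' that is not in 'b', repeats kept.
    public static int[] KeepingRepeats(int[] a, int[] b)
    {
        var exclude = new HashSet<int>(b);
        return a.Where(x => !exclude.Contains(x)).ToArray();
    }
}
```

For a = 1,1,2,3 and b = 2,3 the first method gives a single 1 and the second gives 1,1.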
A hash set is a wonderful thing that enables super fast lookups. To do this it relies on two methods that every object has: GetHashCode and Equals. GetHashCode converts an object into a probably-unique number. Equals makes absolutely sure an object A equals B. The hash set tracks all the hash codes and objects it has seen before, so when you add something it first gets the hash code of what you're trying to add. If it never saw that hash code before, it adds the item. If you try to add anything that has the same hash code as something it saw already, it uses Equals to check whether or not it's the same as what it saw already (sometimes hash codes are the same for different data) and adds the item if Equals declares it to be different. This whole operation is very fast, much faster than searching object by object through all the objects it saw before.
By default any class in C# gets its implementation of GetHashCode and Equals from object. Object's versions of these methods essentially behave as if GetHashCode returned the memory address and Equals compared memory addresses: they work on object identity, not on the data inside the object.
This works fine for stuff that really is at the same memory address:
var p = new Person(){Name="John"};
var q = p; //same mem address as p
But it doesn't work for objects that have the same data but live at different memory addresses:
var p = new Person(){Name="John"};
var q = new Person(){Name="John"}; //not same mem address as p
If you define two people as being equal if they have the same name, and you want C# to consider them equal in the same way, you have to instruct C# to compare the names, not the memory addresses.
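One way to give that instruction, sketched under the assumption that Name alone defines equality for the Person class used above:

```csharp
using System;

public sealed class Person : IEquatable<Person>
{
    public string Name { get; set; }

    // Two people are equal when their names match.
    public bool Equals(Person other)
    {
        return other != null && Name == other.Name;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Person);
    }

    // Must agree with Equals: equal names give equal hash codes.
    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }
}
```

With these overrides, two separately constructed "John" objects compare equal, and a HashSet<Person> (and therefore Except, Intersect and Distinct) treats them as one item.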
A DataRow is like Person above: just because it has the same data as another DataRow doesn't mean it's the same row in C#'s opinion. Further, because a single DataRow cannot belong to two datatables, it's certain that the "John" in row 1 of dtA is a different object to the "John" in row 1 of dtB.
By default Equals returns false for these two data rows, so Except never considers them equal and therefore never removes dtA's John on account of the John in dtB...
...unless you provide an alternative comparison strategy that overrides C#'s default opinion of equality. That might look like:
provide a comparer, like Kalpesh's answer (typos on the class name aside),
override Equals/GetHashCode for the datarows so they work off column data, not memory addresses,
or use some other thing that does already have Equals and GetHashCode that work off data rather than memory addresses.
As these are just datatables of a single column full of strings, they're notionally not much more than an array of string. If we make them into an array of strings, when we do a.Except(b) we will be comparing strings. By default C#'s opinion of whether one string equals another is based on the data content of the string rather than the memory address it lives at (see note 1), so you can either use string arrays/lists to start with or convert your dtA/B to a string array:
var arrA = dtA.Rows.Cast<DataRow>().Select(r => r[0] as string).ToArray();
var arrB = dtB.Rows.Cast<DataRow>().Select(r => r[0] as string).ToArray();
var result = arrA.Except(arrB);
Technically we don't even need to call ToArray().
If you really need the result to be a datatable, make one and add all the strings to it:
var resultDt = new DataTable();
resultDt.Columns.Add("x");
foreach(var s in result)
resultDt.Rows.Add(s);
1: we'll ignore interning for now
Sorry, I think I was not clear earlier. I am trying to do as O.R.mapper says below: create a list of arbitrary variables and then get their values later in a foreach loop.
Moreover, all variables are of string type, so I think they can go in one list. Thanks.
Is there a way to store variables in a list or array and then loop through them later?
For example: I have three variables in a class c named x, y and z.
can I do something like:
public List Max_One = new List {c.x,c.y,c.z}
and then later in the code
foreach (string var in Max_One)
{
if ((var < 0) | (var > 1 ))
{
// some code here
}
}
Is there a particular reason why you want to store the list of variables beforehand? If it is sufficient to reuse such a list whenever you need it, I would opt for creating a property that returns an IEnumerable<string>:
public IEnumerable<string> Max_One {
get {
yield return c.x;
yield return c.y;
yield return c.z;
}
}
The values returned in this enumerable would be retrieved only when the property getter is invoked. Hence, the resulting enumerable would always contain the current values of c.x, c.y and c.z.
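To make the "current values" point concrete, here is a small sketch. The classes C and Owner and the LiveValuesDemo helper are hypothetical stand-ins for the asker's code; only the Max_One property mirrors the snippet above.

```csharp
using System.Collections.Generic;
using System.Linq;

public class C
{
    public string x, y, z;
}

public class Owner
{
    public C c = new C();

    // Re-reads c.x, c.y and c.z every time the property is enumerated.
    public IEnumerable<string> Max_One
    {
        get
        {
            yield return c.x;
            yield return c.y;
            yield return c.z;
        }
    }
}

public static class LiveValuesDemo
{
    public static string Run()
    {
        var o = new Owner();
        o.c.x = "1"; o.c.y = "2"; o.c.z = "3";

        var snapshot = o.Max_One.ToList(); // values copied at this moment
        o.c.x = "changed";

        // The snapshot keeps the old value; enumerating the property again sees the new one.
        return string.Join(",", snapshot) + " | " + string.Join(",", o.Max_One);
    }
}
```

Run() returns "1,2,3 | changed,2,3": a list built up front is a snapshot, while the property reflects whatever the fields hold when it is enumerated.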
You can then iterate over these values with a foreach loop as alluded to by yourself in your question.
This might not be practical if you need to gradually assemble the list of variables; in that case, you might have to work with reflection. If this is really required, please let me know; I can provide an example for that, but it will become more verbose and complex.
Yes, e.g. if they are all strings:
public List<string> Max_One = new List<string> {c.x,c.y,c.z};
This uses the collection initializer syntax.
It doesn't make sense to compare a string to an int, though. This is a valid example:
foreach (string var in Max_One)
{
if (string.IsNullOrEmpty(var))
{
// some code here
}
}
If your properties are numbers (int, for example) you can do this:
List<int> Max_One = new List<int> { c.x, c.y, c.z };
and use your foreach like this
foreach(int myNum in Max_One) { ... } // avoid naming the loop variable 'var': it is a contextual keyword, legal as an identifier but confusing
Replace int in list declaration with the correct numeric type (double, decimal, etc.)
You could try using:
List<object> list = new List<object>
{
c.x,
c.y,
c.z
};
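I will answer your question in reverse order.
To start with, avoid naming your variable "var". It is a contextual keyword (used for implicitly typed locals), so although the compiler accepts it as an identifier, it is confusing to read. So what you can do for the foreach is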
foreach (var x in Max_One)
{
if ((x< 0) || (x> 1 ))
{
// some code here
}
}
If you are on C# 3.0 or later, you can use "var" to declare x and let the compiler infer its type from the Max_One list, without spelling out the actual type. On an older version you need to specify the datatype of x explicitly, in which case your code is valid (still risky, though).
The last point (which is your first point):
public List Max_One = new List {c.x,c.y,c.z}
The main thing you need to know is that, in order to be stored in a list, the members must be of the same datatype. So unless x, y and z are of the same datatype you cannot store them in the same list, EXCEPT if you define the list to store elements of datatype "object".
If you used the "Object" method, you need to cast the elements into the original type such as:
var x = (int) Max_One[0];
You can read more about lists and other alternatives from this website
http://www.dotnetperls.com/collections
P.S. If this is homework, then you should read more and learn more from video tutorials and books ;)
i got a generic list that looks like this:
List<PicInfo> pi = new List<PicInfo>();
PicInfo is a class that looks like this:
[ProtoContract]
public class PicInfo
{
[ProtoMember(1)]
public string fileName { get; set; }
[ProtoMember(2)]
public string completeFileName { get; set; }
[ProtoMember(3)]
public string filePath { get; set; }
[ProtoMember(4)]
public byte[] hashValue { get; set; }
public PicInfo() { }
}
what i'm trying to do is:
first, filter the list for duplicate file names and return the duplicate objects;
then, filter the returned list for duplicate hash values;
i can only find examples on how to do this which return anonymous types. but i need it to be a generic list.
if someone can help me out, I'd appreciate it. also please explain your code. it's a learning process for me.
thanks in advance!
[EDIT]
the generic list contains a list of objects. these objects are pictures. every picture has a file name, hash value (and some more data which is irrelevant at this point). some pictures have the same name (duplicate file names). and i want to get a list of the duplicate file names from this generic list 'pi'.
But those pictures also have a hash value. from the file names that are identical, i want another list of those identical files names that also have identical hash values.
[/EDIT]
Something like this should work. Whether it is the best method I am not sure. It is not very efficient because for each element you are iterating through the list again to get the count.
List<PicInfo> pi = new List<PicInfo>();
IEnumerable<PicInfo> filt = pi.Where(x=>pi.Count(z=>z.fileName==x.fileName)>1);
I hope the code isn't too complicated to need explaining. I always think it's best to work it out on your own anyway, but if anything is confusing then just ask and I'll explain.
If you want the second filter to be filtering for the same filename and same hash being a duplicate then you just need to extend the lambda in the Count to check against hash too.
Obviously if you just want filenames at the end then it is easy enough to do a Select to get just an enumerable list of those filenames, possibly with a Distinct if you only want them to appear once.
NB. Code written by hand so do forgive typos. May not compile first time, etc. ;-)
Edit to explain code - spoilers! ;-)
In English, what we want to do is the following:
for each item in the list we want to select it if and only if there is more than one item in the list with the same filename.
Breaking this down to iterate over the list and select things based on a criteria we use the Where method. The condition of our where method is
there is more than one item in the list with the same filename
for this we clearly need to count the list so we use pi.Count. However we have a condition that we are only counting if the filename matches so we pass in an expression to tell it only to count those things.
The expression will work on each item of the list and return true if we want to count it and false if we don't want to.
The filename we are interested in is on x, the item we are filtering. So we want to count how many items have a filename the same as x.fileName. Thus our expression is z=>z.fileName==x.fileName. So z is our variable in this expression and x.fileName in this context is unchanging as we iterate over z.
We then of course put our criteria in of >1 to get the boolean value we want.
If you wanted those that are duplicates when considering both the filename and the hash value, then you would expand the part in the Count to compare the hashValue as well. Since hashValue is a byte array, == would only compare references, so use the SequenceEqual method to compare the arrays value by value.
So your final code to get the duplicates on both values would be:
List<PicInfo> pi = new List<PicInfo>();
List<PicInfo> filt = pi.Where(x=>pi.Count(z=>z.fileName==x.fileName && z.hashValue.SequenceEqual(x.hashValue))>1).ToList();
Note that I didn't create the intermediary list and just went straight from the original list. You could go from the intermediate list but the code would be much the same if going from the original as from a filtered list.
I think you have to use the SequenceEqual method for finding duplicates
(http://msdn.microsoft.com/ru-ru/library/bb348567.aspx).
For filter use
// needs: using System; using System.Linq;
var p = pi.GroupBy(rs => rs.fileName) // group by name
.Where(g => g.Count() > 1) // keep groups with more than one item
.Select(g => g.First()) // select 1st element from each group
.GroupBy(rs => Convert.ToBase64String(rs.hashValue)) // group by hash content; grouping on the byte[] itself would compare references
.Where(g => g.Count() > 1) // keep groups with more than one item
.Select(g => g.First()) // select first element from each group
.ToList(); // List<PicInfo> of the results
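If the goal is to get back every duplicate object rather than one representative per group, a variant with SelectMany may be closer to what the question asks. This is a sketch under two assumptions: hashValue is never null, and PicInfo is trimmed here to the two members that matter. DuplicateFinder is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class PicInfo
{
    public string fileName { get; set; }
    public byte[] hashValue { get; set; }
}

public static class DuplicateFinder
{
    // All items whose file name occurs more than once.
    public static List<PicInfo> SameName(List<PicInfo> pi)
    {
        return pi.GroupBy(p => p.fileName)
                 .Where(g => g.Count() > 1)
                 .SelectMany(g => g)
                 .ToList();
    }

    // Of those, the items that also share the same hash content.
    public static List<PicInfo> SameNameAndHash(List<PicInfo> pi)
    {
        return SameName(pi)
            // Base64 text of the bytes serves as a value-comparable key.
            .GroupBy(p => new { p.fileName, Hash = Convert.ToBase64String(p.hashValue) })
            .Where(g => g.Count() > 1)
            .SelectMany(g => g)
            .ToList();
    }
}
```

Both methods return List<PicInfo>, so no anonymous types leak out; the anonymous type is only used internally as a grouping key.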
I have a list of items.
The problem is that the source of the items (which I have no control over) returns the same items THREE times.
So while the actual things that should be in the list are:
A
B
C
I get
A
B
C
A
B
C
A
B
C
How can I cleanly and easily remove the duplicates? Maybe count the items, divide by three and delete anything from X to list.Count?
The quickest, simplest thing to do is to not remove the items but run a distinct query
var distinctItems = list.Distinct();
If it's a must that you have a list, you can always append .ToList() to the call. If it's a must that you continue to work with the same list, then you'd just have to iterate over it and keep track of what you already have and remove any duplicates.
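For the "same list" case, one possible sketch uses a HashSet to remember what has been seen and List<T>.RemoveAll to drop later repeats (InPlaceDedup is a hypothetical helper; it relies on the element type's own Equals and GetHashCode):

```csharp
using System.Collections.Generic;

public static class InPlaceDedup
{
    public static void RemoveDuplicates<T>(List<T> list)
    {
        var seen = new HashSet<T>();
        // HashSet.Add returns false for an item that is already present,
        // so every occurrence after the first is removed.
        list.RemoveAll(item => !seen.Add(item));
    }
}
```

Calling InPlaceDedup.RemoveDuplicates(list) on A, B, C, A, B, C, A, B, C leaves A, B, C in the original list object.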
Edit: "But I'm working with a class"
If you have a list of a given class, to use Distinct you need to either (a) override Equals and GetHashCode inside your class so that appropriate equality comparisons can be made. If you do not have access to the source code (or simply don't want to override these methods for whatever reason), then you can (b) provide an IEqualityComparer<YourClass> implementation as an argument to the Distinct method. This will also allow you to specify the Equals and GetHashCode implementations without having to modify the source of the actual class.
public class MyObjectComparer : IEqualityComparer<MyObject>
{
public bool Equals(MyObject a, MyObject b)
{
// code to determine equality, usually based on one or more properties
}
public int GetHashCode(MyObject a)
{
// code to generate hash code, usually based on a property
}
}
// ...
var distinctItems = myList.Distinct(new MyObjectComparer());
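A filled-in version of that skeleton might look like the following. MyObject and its Id and Name properties are hypothetical; the assumption is that Id alone identifies an item.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type; Id is assumed to identify an item.
public class MyObject
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class MyObjectComparer : IEqualityComparer<MyObject>
{
    public bool Equals(MyObject a, MyObject b)
    {
        if (ReferenceEquals(a, b)) return true;   // same instance, or both null
        if (a == null || b == null) return false;
        return a.Id == b.Id;
    }

    // Built from the same property that Equals compares.
    public int GetHashCode(MyObject a)
    {
        return a == null ? 0 : a.Id.GetHashCode();
    }
}

public static class DistinctDemo
{
    public static List<MyObject> Dedupe(IEnumerable<MyObject> items)
    {
        return items.Distinct(new MyObjectComparer()).ToList();
    }
}
```

The key design point is that GetHashCode uses only properties that Equals also compares; otherwise equal items can land in different hash buckets and never be compared.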
if you are 100% sure that you receive everything you need 3 times, then just
var newList = oldList.Take(oldList.Count / 3).ToList();
LINQ has a Distinct() method which does exactly this. Or put the items in a HashSet if you want to avoid duplicates completely.
If you're using C# 3 or up:
var newList = dupList.Distinct().ToList();
If not, then sort the list and do the following (shown for a list of strings; substitute your own element type):
string lastItem = null;
foreach (string item in dupList)
{
if (item != lastItem)
{
newItems.Add(item);
}
lastItem = item;
}
You could simply create a new list and add only the items that are not already in it.
I'm trying to get a distinct list of words from an array of words with the following code:
string words = "this is a this b";
var split = words.Split(' ');
IEnumerable<Word> distinctWords = (
from w in split
select new Word
{
Text = w.ToString()
}
).Distinct().ToList();
I thought this would take out the double occurrence of 'this' but it returns a list of each word in the phrase.
In your example, each Word object is distinct, because there is no comparison which looks at the Text property.
However, there's no reason to create a new object:
var distinctWords = (from w in split
select w).Distinct().ToList();
Or more simply:
var distinctWords = new List<string>(split.Distinct());
The problem is that you create several Word objects that contain the same value, but how should Distinct know that these are meant to be the same items?
Try
(from w in split.Distinct()
select new Word { Text = w.ToString()}).ToList();
You haven't posted the code for your Word class, but my guess is that it doesn't implement Equals with a value comparison so you get the default implementation of Equals which just checks the object references. Note that if you decide to implement your own version of Equals, you also need to correctly implement GetHashCode.
An alternative way to solve this issue is to provide an IEqualityComparer as a parameter to the Distinct function.
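As a sketch of the first option, here is a Word class with value-based Equals and GetHashCode. The real Word class was not posted, so this shape (a single Text property) is an assumption, and WordDemo is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of Word: just a Text property.
public class Word : IEquatable<Word>
{
    public string Text { get; set; }

    public bool Equals(Word other)
    {
        return other != null && Text == other.Text;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Word);
    }

    public override int GetHashCode()
    {
        return Text == null ? 0 : Text.GetHashCode();
    }
}

public static class WordDemo
{
    public static List<Word> DistinctWords(string words)
    {
        // Distinct now compares Text values instead of object references.
        return words.Split(' ')
                    .Select(w => new Word { Text = w })
                    .Distinct()
                    .ToList();
    }
}
```

With this class, WordDemo.DistinctWords("this is a this b") yields four words, because the second "this" is recognised as a repeat.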
You may try to convert array ToList() first before calling the .Distinct()
and then converting it ToArray() again
myArray= myArray.ToList().Distinct().ToArray();
As others noted, the problem is probably that your Word object doesn't implement structural equality (compare the actual content, not instance references). If you still want to get a collection of Word objects as the result, but use Distinct on the underlying string values, you can write this:
IEnumerable<Word> distinctWords =
(from w in split.Distinct()
select new Word { Text = w.ToString() }).ToList();