Get all unmatching elements between two large lists - c#

I have a Dictionary<string, Foo> with X elements. The dictionary key contains Foo.Id. I also have a List<Foo> newFoos, which in my case contains slightly fewer elements than the dictionary. What I would like is a new List<Foo> with all the elements that are in newFoos but not in my dictionary.
I solved this by using:
var list = MyDict.Where(x => newFoos.All(y => y.Id != x.Key)).ToList();
The problem with this was performance in my case; there must be a simpler and faster way? Preferably not one that uses Except/Intersect and overrides Equals.
public class Program {
public static Dictionary<int, Foo> MyDict { get; set; } = new Dictionary<int, Foo>();
private static void Main(string[] args) {
for (int i = 0; i < 2000; i++) {
MyDict.Add(i, new Foo() {Id = i});
}
var newFoos = new List<Foo>();
for (int i = 0; i < 1500; i++) {
newFoos.Add(new Foo() { Id = i });
}
var list = MyDict.Where(x => newFoos.All(y => y.Id != x.Key)).ToList();
}
}
public class Foo {
public int Id { get; set; }
//More properties
}
With the test code above it is not that slow, but the principle is the same.

var list = newFoos.Where(x => !MyDict.ContainsKey(x.Id)).ToList();
This should be more efficient since checking if a key is in a dictionary should be faster than looking up an item in a list.
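Note that the query in the question actually computes the reverse set: the dictionary entries whose key does not appear in newFoos. If that is what you need, the same idea applies with the roles swapped: collect the ids of newFoos into a HashSet<int> once and test each key against it. A minimal sketch of both directions, assuming the Foo class from the question with an int Id; the FooDiff class and its method names are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Foo
{
    public int Id { get; set; }
}

// Hypothetical helper class, not part of the question's code.
public static class FooDiff
{
    // Items of newFoos whose Id is not a key of the dictionary.
    public static List<Foo> MissingFromDictionary(Dictionary<int, Foo> dict, List<Foo> newFoos)
    {
        return newFoos.Where(f => !dict.ContainsKey(f.Id)).ToList();
    }

    // Dictionary values whose key is not the Id of any item in newFoos.
    public static List<Foo> MissingFromList(Dictionary<int, Foo> dict, List<Foo> newFoos)
    {
        // Built once, so each key check is a hash lookup instead of a list scan.
        var ids = new HashSet<int>(newFoos.Select(f => f.Id));
        return dict.Where(kv => !ids.Contains(kv.Key)).Select(kv => kv.Value).ToList();
    }
}

public class Program
{
    public static void Main()
    {
        // Dictionary holds ids 0..1999, newFoos holds ids 1000..2499.
        var dict = Enumerable.Range(0, 2000).ToDictionary(i => i, i => new Foo { Id = i });
        var newFoos = Enumerable.Range(1000, 1500).Select(i => new Foo { Id = i }).ToList();

        Console.WriteLine(FooDiff.MissingFromDictionary(dict, newFoos).Count); // ids 2000..2499
        Console.WriteLine(FooDiff.MissingFromList(dict, newFoos).Count);       // ids 0..999
    }
}
```

Either way the work is roughly proportional to the sum of the two collection sizes rather than to their product.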

I assume index access on the list is slightly faster than using an enumerator. As already stated, checking whether a key exists is also much faster than comparing against each item, which leads to a for loop with ContainsKey:
List<Foo> addedFoos = new List<Foo>();
for (int i = 0; i < newFoos.Count; i++)
{
    Foo current = newFoos[i];
    // Keep only the items whose Id is not yet a key in the dictionary
    if (!MyDict.ContainsKey(current.Id))
    {
        addedFoos.Add(current);
        //MyDict.Add(current.Id, current); /* see remark below */
    }
}
//addedFoos.ForEach(item => MyDict.Add(item.Id, item)); /* see remark below */
If you intend to add them to the dictionary, then depending on the number of newFoos it may be better to add the items after finding them rather than inside the loop; otherwise you enlarge the dictionary while searching, with items that will never produce a hit anyway.

Related

How to set flag in one list for the Id's which match with the Id's in another list using Lambda expression c#

I have two classes:
public class class1 {
    public int Id { get; set; }
    public bool Flag { get; set; }
}
public class class2 {
    public int Id { get; set; }
}
I have a List<class1> and a List<class2>.
I need to set the Flag property to true in List<class1> for only those Ids that also appear in List<class2>, using a lambda expression in C#. I don't want to use foreach.
using lambda expression. Don't want to use foreach.
That's usually a silly requirement and a hallmark that you're not really familiar with C#, Linq or performance analysis. You have a collection whose elements you want to modify, so you should use foreach().
If you're trying out functional programming, then you should treat the list elements as immutable and project into a new collection.
The first part of your problem, looking up which list elements to modify based on a presence of one of their properties in another collection's elements' properties, is trivial:
var elementsToModify = list1.Where(l1 => list2.Any(l2 => l2.Id == l1.Id));
Now with a foreach(), this'll be simple:
foreach (var l1 in elementsToModify)
{
l1.Flag = true;
}
Or, even denser (not that less code equals more performance):
foreach (var l1 in list1.Where(l1 => list2.Any(l2 => l2.Id == l1.Id)))
{
l1.Flag = true;
}
So, there's your code. But you didn't want to use foreach(). Then you need to project into a new collection:
var newList1 = list1.Where(l1 => list2.Any(l2 => l2.Id == l1.Id))
    .Select(l1 => new class1
    {
        Id = l1.Id,
        Flag = true,
    })
    .ToList();
There you have it, a List<class1> with only flagged items. Optionally you could use this list in a foreach() to update the original list1. Oh, wait.
The below solution does not use the classical "for each", but is compiled to one under the hood. If that's not what you meant, then please explain what you are trying to achieve. Using for each in this example is a good approach. One could also use while or for loops, but is it really what's being asked here?
Object definition:
public class MyObject
{
public int Id { get; set; }
public bool Flag { get; set; }
}
List initialization:
var list = new List<MyObject>()
{
new MyObject() { Id= 1 },
new MyObject() { Id= 2 },
new MyObject() { Id= 3 },
new MyObject() { Id= 4 }
};
var list2 = new List<MyObject>()
{
new MyObject() { Id= 2 },
new MyObject() { Id= 4 }
};
Code:
list.ForEach(el => el.Flag = list2.Any(el2 => el2.Id == el.Id));
EDIT:
An example with a while loop (a bit nasty to do it this way):
int i = -1;
int numberOfElements = list.Count;
while (++i < numberOfElements)
{
list[i].Flag = list2.Any(el => el.Id == list[i].Id);
}
I guess you can write a for loop yourself...
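Both answers above call list2.Any(...) once per element of the first list, so the work grows with the product of the two list sizes. For larger lists, one option is to collect the ids of the second list into a HashSet<int> first. A minimal sketch, reusing the MyObject class from the answer above; Flagger and FlagMatches are hypothetical names, and unlike the ForEach one-liner this version leaves non-matching items untouched instead of resetting them to false:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyObject
{
    public int Id { get; set; }
    public bool Flag { get; set; }
}

// Hypothetical helper, shown only to illustrate the HashSet lookup.
public static class Flagger
{
    // Sets Flag to true on every target whose Id also occurs in source;
    // other targets keep their current Flag value.
    public static void FlagMatches(List<MyObject> targets, IEnumerable<MyObject> source)
    {
        var ids = new HashSet<int>(source.Select(s => s.Id));
        foreach (var target in targets)
        {
            if (ids.Contains(target.Id))
            {
                target.Flag = true;
            }
        }
    }
}

public class Program
{
    public static void Main()
    {
        var list = Enumerable.Range(1, 4).Select(i => new MyObject { Id = i }).ToList();
        var list2 = new List<MyObject> { new MyObject { Id = 2 }, new MyObject { Id = 4 } };

        Flagger.FlagMatches(list, list2);

        foreach (var item in list)
        {
            Console.WriteLine($"{item.Id}: {item.Flag}");
        }
    }
}
```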

Why isn't my object being updated

I'm doing something wrong because after the loop executed myData still contains objects with blank ids. Why isn't the myData object being updated in the following foreach loop, and how do I fix it?
I thought it could be that I wasn't passing the object by reference, but added a ref keyword and also moved to the main method and I'm still showing the object not being updated.
Additional Information
The user object in the foreach loop is being updated, but the myData list does not reflect the updates I see being applied to the user object.
** Solution **
I was not creating a List but an Enumerable which was pulling the json each time I went through myData in a foreach list. Adding a ToList() fixed my issue.
public class MyData
{
public string ID { get; set; }
public Dictionary<string, string> Properties { get; set; }
}
int index = 0;
// Does not let me update the objects: this creates a lazily evaluated IEnumerable
//IEnumerable<MyData> myData = JObject.Parse(json)["Users"]
// .Select(x => new MyData()
// {
// ID = x["id"].ToString(),
// Properties = x.OfType<JProperty>()
// .ToDictionary(y => y.Name, y => y.Value.ToString())
// });
// Works: ToList() materializes the objects, so updates to the resulting list are kept.
IEnumerable<MyData> myData = JObject.Parse(json)["Users"]
.Select(x => new MyData()
{
ID = x["id"].ToString(),
Properties = x.OfType<JProperty>()
.ToDictionary(y => y.Name, y => y.Value.ToString())
}).ToList();
foreach (var user in myData) // Also tried myData.ToList()
{
if (string.IsNullOrEmpty(user.ID))
{
user.ID = index.ToString();
user.Properties["id"] = index.ToString();
}
index++;
}
public class MyData
{
public MyData()
{
this.Properties = new Dictionary<string,string>();
}
public string ID { get; set; }
public Dictionary<string, string> Properties { get; set; }
}
public static void Main(string[] args)
{
IEnumerable<MyData> myDataList = new List<MyData>();
int index = 0; // Assuming your starting point is 0
foreach (var obj in myDataList)
{
if (obj != null && string.IsNullOrEmpty(obj.ID))
{
obj.ID = index.ToString();
// Checks if the Properties dictionary has the key "id"
if (obj.Properties.ContainsKey("id"))
{
// If it does, then update it
obj.Properties["id"] = obj.ID;
}
else
{
// Else add it to the dictionary
obj.Properties.Add("id", obj.ID);
}
}
index++;
}
} // end of Main
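I believe your objects are not updating because the loop is probably not working on the same object instances that you inspect afterwards. As the solution in the question confirms, an IEnumerable produced by Select is re-evaluated on every enumeration, so each pass creates fresh objects. The easiest way (that I can think of; there are thousands of smarter programmers than me) is to create a new list that contains all of your updated objects, for example by calling ToList().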
Edit
I updated the code above with the code that I have. I created a method to set a small amount of objects to test:
private static IEnumerable<MyData> GetMyData()
{
return new List<MyData>()
{
new MyData(),
new MyData() {ID = "2"},
new MyData() {ID = "3"},
new MyData()
};
}
With this data I could step through the foreach loop and see my changes. If the ID of an object is null or empty, execution enters the if block and assigns the current index as the ID.
Now for my question: which "id" is blank, the "id" entry in the dictionary or the ID of the model? Are all of your (Model).ID values blank? Note also that assigning through the indexer, as in user.Properties["id"] = ..., adds the key when it is missing, whereas reading a missing key through the indexer throws a KeyNotFoundException, so check with ContainsKey (or use TryGetValue) before reading it.
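The behaviour described in the question's solution can be reproduced without any JSON parsing. The sketch below is only an illustration: it swaps JObject.Parse for Enumerable.Range, trims MyData down to its ID property, and uses hypothetical helper names (DeferredDemo, FillIds, CountBlank). It shows that a lazy Select query creates new objects on every enumeration, while a list produced by ToList() keeps the same instances.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyData
{
    public string ID { get; set; }
}

// Hypothetical helpers for the demonstration.
public static class DeferredDemo
{
    // Assigns the running index as ID to every item whose ID is empty.
    public static void FillIds(IEnumerable<MyData> items)
    {
        int index = 0;
        foreach (var item in items)
        {
            if (string.IsNullOrEmpty(item.ID))
            {
                item.ID = index.ToString();
            }
            index++;
        }
    }

    // Counts the items that still have an empty ID.
    public static int CountBlank(IEnumerable<MyData> items)
    {
        return items.Count(i => string.IsNullOrEmpty(i.ID));
    }
}

public class Program
{
    public static void Main()
    {
        // Lazy query: the Select lambda runs again on every enumeration.
        IEnumerable<MyData> lazy = Enumerable.Range(0, 3).Select(_ => new MyData());
        DeferredDemo.FillIds(lazy);
        Console.WriteLine(DeferredDemo.CountBlank(lazy)); // 3: fresh objects, updates lost

        // Materialized list: the same three objects are reused.
        IEnumerable<MyData> list = Enumerable.Range(0, 3).Select(_ => new MyData()).ToList();
        DeferredDemo.FillIds(list);
        Console.WriteLine(DeferredDemo.CountBlank(list)); // 0: updates kept
    }
}
```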

Accessing properties of elements in a collection to create new collections

Is there any way to define a table as a collection of rows, and automatically populate properties (columns) on the table according to the properties of the rows?
For example:
public class Foobar {
    public int TheNumber;
    public string TheString;
}
public class SomeFoobars : List<Foobar> {
    public List<int> TheNumber {
        // Extension methods need an explicit receiver, hence this.Select
        get { return this.Select(foo => foo.TheNumber).ToList(); }
        set { for (int i = 0; i < Count; i++) { this[i].TheNumber = value[i]; } }
    }
    public List<string> TheString {
        get { return this.Select(foo => foo.TheString).ToList(); }
        set { for (int i = 0; i < Count; i++) { this[i].TheString = value[i]; } }
    }
}
// So I can now do things like:
SomeFoobars myFoobars = ReturnsListOfFoobar();
MethodThatTakesListOfInt(myFoobars.TheNumber);
myFoobars.TheString = SomeMethodThatReturnsListOfString();
Creating the collection class implementation isn't so bad if you only have to do it once, but I would like to have this functionality for any type of row and not have to write the collection properties over and over. These property methods are essentially identical, other than the reference to the specific property on the contained class (i.e. TheNumber or TheString in the example above).
Is there any way to accomplish this? Perhaps using reflection?
I suggest you go back and revise your design. As you may realize by now, it is causing you a lot of trouble.
That said, if you still decide to keep the current design, you can remove the properties on SomeFoobars and still do the same thing this way:
MethodThatTakesListOfInt(
myFoobars
.Select(f => f.TheNumber)
.ToList());
SomeMethodThatReturnsListOfString()
.Select((s,i) => new { Index = i, String = s })
.ToList()
.ForEach(x => myFoobars[x.Index].TheString = x.String);
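If repeating these calls at every call site feels noisy, one reflection-free option is a pair of generic extension methods that take the member accessor as a lambda, so they work for any row type. This is only a sketch: Column and SetColumn are hypothetical names, not existing framework methods, and the length check is an assumption about the behaviour you would want.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Foobar
{
    public int TheNumber;
    public string TheString;
}

// Hypothetical helpers; they work for any reference-type row.
public static class ColumnExtensions
{
    // Reads one "column": the selected member of every row, in order.
    public static List<TValue> Column<TRow, TValue>(
        this IList<TRow> rows, Func<TRow, TValue> getter)
    {
        return rows.Select(getter).ToList();
    }

    // Writes one "column": applies setter to row i with values[i].
    public static void SetColumn<TRow, TValue>(
        this IList<TRow> rows, IList<TValue> values, Action<TRow, TValue> setter)
    {
        if (values.Count != rows.Count)
        {
            throw new ArgumentException("Expected one value per row.", nameof(values));
        }
        for (int i = 0; i < rows.Count; i++)
        {
            setter(rows[i], values[i]);
        }
    }
}

public class Program
{
    public static void Main()
    {
        var foobars = new List<Foobar>
        {
            new Foobar { TheNumber = 1 },
            new Foobar { TheNumber = 2 },
        };

        List<int> numbers = foobars.Column(f => f.TheNumber);
        foobars.SetColumn(new List<string> { "one", "two" }, (f, s) => f.TheString = s);

        Console.WriteLine(string.Join(",", numbers));
        Console.WriteLine(string.Join(",", foobars.Column(f => f.TheString)));
    }
}
```

The setter only has an effect when the row type is a class; with a struct row it would modify a copy.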

Element Metrics with Custom collection in C#

I am trying to figure out the best way to organise a bunch of my data classes, given I need to be able to access some metrics on them all at some point.
Here's a snippet of my OR class:
public enum status { CLOSED, OPEN }
public class OR
{
public string reference { get; set; }
public string title { get; set; }
public status status { get; set; }
}
Not every OR I initialise will have values for all properties. I want to be able to 'collect' thousands of these together in such a way that I can easily obtain a count of how many OR objects had a value set. For example:
OR a = new OR() { reference = "a" };
OR b = new OR() { reference = "b", title = "test" };
OR c = new OR() { reference = "c", title = "test", status = status.CLOSED };
Now these are somehow collected in such a way I can do (pseudo):
int titleCount = ORCollection.titleCount;
titleCount = 2
I would also want to be able gather metrics for the enum type properties, for example retrieve a Dictionary from the collection that looks like:
Dictionary<string, int> statusCounts = { "CLOSED", 1 }
The reason for wanting access to these metrics is that I am building two collections of ORs and comparing them side-by-side for any differences (they should be identical). I want to be able to compare their metrics at this higher level first, then break-down where precisely they differ.
Thanks for any light that can be shed on how to accomplish this. :-)
... to 'collect' thousands of these
Thousands is not a huge number. Just use a List<OR> and you can get all your metrics with Linq queries.
For example:
List<OR> orList = ...;
int titleCount = orList
.Where(o => ! string.IsNullOrEmpty(o.title))
.Count();
Dictionary<status, int> statusCounts = orList
.GroupBy(o => o.status)
.ToDictionary(g => g.Key, g => g.Count());
The existing answers using Linq are absolutely great and really elegant, so the idea presented below is just for posterity.
Here is a (very rough) reflection-based program that will allow you to count the "valid" properties in any collection of objects.
The validators are defined by you in the Validators dictionary so that you can easily change what is a valid/invalid value for each property. You may find it useful as a concept if you end up with objects having tons of properties and don't want to have to write inline LINQ metrics on the actual collection itself for every single property.
You could wrap this up as a function and then run it against both collections, giving you a basis to report on the exact differences between them, since it records references to the individual objects in the final dictionary.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Reflection;
namespace reftest1
{
public enum status { CLOSED, OPEN }
public class OR
{
public string reference { get; set; }
public string title { get; set; }
public status status { get; set; }
public int foo { get; set; }
}
//creates a dictionary by property of objects whereby that property is a valid value
class Program
{
//create dictionary containing what constitutes a valid value here
static Dictionary<string,Func<object,bool>> Validators = new Dictionary<string, Func<object,bool>>
{
{"reference",
(r)=> { if (r ==null) return false;
return !String.IsNullOrEmpty(r.ToString());}
},
{"title",
(t)=> { if (t ==null) return false;
return !String.IsNullOrEmpty(t.ToString());}
},
{"status", (s) =>
{
if (s == null) return false;
return !String.IsNullOrEmpty(s.ToString());
}},
{"foo",
(f) =>{if (f == null) return false;
return !(Convert.ToInt32(f.ToString()) == 0);}
}
};
static void Main(string[] args)
{
var collection = new List<OR>();
collection.Add(new OR() {reference = "a",foo=1,});
collection.Add(new OR(){reference = "b", title = "test"});
collection.Add(new OR(){reference = "c", title = "test", status = status.CLOSED});
Type T = typeof (OR);
var PropertyMetrics = new Dictionary<string, List<OR>>();
foreach (var pi in GetProperties(T))
{
PropertyMetrics.Add(pi.Name,new List<OR>());
foreach (var item in collection)
{
//execute validator if defined
if (Validators.ContainsKey(pi.Name))
{
//get actual property value and compare to valid value
var value = pi.GetValue(item, null);
//if the value is valid, record the object into the dictionary
if (Validators[pi.Name](value))
{
var lookup = PropertyMetrics[pi.Name];
lookup.Add(item);
}
}//end if validator defined
}
}//end foreach pi
foreach (var metric in PropertyMetrics)
{
Console.WriteLine("Property '{0}' is set in {1} objects in collection",metric.Key,metric.Value.Count);
}
Console.ReadLine();
}
private static List<PropertyInfo> GetProperties(Type T)
{
return T.GetProperties(BindingFlags.Public | BindingFlags.Instance).ToList();
}
}
}
You can get the title count using this linq query:
int titleCount = ORCollection
.Where(x => !string.IsNullOrWhiteSpace(x.title))
.Count();
You could get the count of closed like this:
int closedCount = ORCollection
.Where(x => x.status == status.CLOSED)
.Count();
If you were going to have larger collections or you access the values a lot it might be worth creating a custom collection implementation that stores the field counts, it could then increment/decrement these values as you add and remove items. You could also store a dictionary of status counts in this custom collection that gets updated as you add and remove items.
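A rough sketch of such a collection is shown below, assuming the OR class and status enum from the question. ORCollection, TitleCount, StatusCounts and Track are hypothetical names, and the design (deriving from Collection<OR> and overriding its protected hooks) is just one possible way to do it.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public enum status { CLOSED, OPEN }

public class OR
{
    public string reference { get; set; }
    public string title { get; set; }
    public status status { get; set; }
}

// Keeps running counts up to date as items are added and removed.
// Assumes items are non-null and are not modified while in the collection.
public class ORCollection : Collection<OR>
{
    public int TitleCount { get; private set; }
    public Dictionary<status, int> StatusCounts { get; } = new Dictionary<status, int>();

    // Adds delta (+1 or -1) to every counter the item contributes to.
    private void Track(OR item, int delta)
    {
        if (!string.IsNullOrEmpty(item.title)) TitleCount += delta;
        StatusCounts.TryGetValue(item.status, out int current);
        StatusCounts[item.status] = current + delta;
    }

    protected override void InsertItem(int index, OR item)
    {
        base.InsertItem(index, item);
        Track(item, +1);
    }

    protected override void RemoveItem(int index)
    {
        Track(this[index], -1);
        base.RemoveItem(index);
    }

    protected override void SetItem(int index, OR item)
    {
        Track(this[index], -1);
        base.SetItem(index, item);
        Track(item, +1);
    }

    protected override void ClearItems()
    {
        base.ClearItems();
        TitleCount = 0;
        StatusCounts.Clear();
    }
}

public class Program
{
    public static void Main()
    {
        var ors = new ORCollection
        {
            new OR { reference = "a" },
            new OR { reference = "b", title = "test" },
            new OR { reference = "c", title = "test", status = status.OPEN },
        };

        Console.WriteLine(ors.TitleCount);                  // 2
        Console.WriteLine(ors.StatusCounts[status.CLOSED]); // 2: a and b default to CLOSED
        Console.WriteLine(ors.StatusCounts[status.OPEN]);   // 1
    }
}
```

Two caveats apply. The counts go stale if an item's title or status is changed after it has been added. And because CLOSED is the first enum member, an OR whose status was never set is counted as CLOSED; declaring the property as a nullable status? would be one way to tell "not set" apart.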

Am I using the LINQ .OfType() operator correctly?

public class Stock
{
}
class Program
{
static void Main(string[] args)
{
ObjectCache cache = MemoryCache.Default;
cache["test"] = new Stock();
var x = cache.OfType<Stock>().ToList();
}
}
This is returning empty ...I thought OfType is supposed to return all instances in a collection of type T ?
Just to rule out the ObjectCache as a possible culprit I also tried
List<object> lstTest = new List<object>();
lstTest.Add(new Stock());
var y = lstTest.OfType<Stock>().ToList();
This works however - so it seems like the problem is with the ObjectCache, which is an instance of a Dictionary underneath
SOLUTION
cache.Select(item => item.Value).OfType<T>().ToList()
Thanks Alexei!
MemoryCache returns an enumerator of KeyValuePair<string, object>, not just the values: see MemoryCache.GetEnumerator().
You need to project accordingly to get your items. Something like:
var y = cache.Select(item => item.Value).OfType<Stock>();
This would work:
cache.GetValues(new string[] {"test"}).Values.OfType<Stock>()
But I don't think you should use this.
The cache works like a dictionary, so you can get a set of KeyValuePairs with GetValues.
This worked for me.
public class Stock
{
public Stock()
{
Name = "Erin";
}
public string Name { get; set; }
}
class Program
{
static void Main(string[] args)
{
System.Collections.ArrayList fruits = new System.Collections.ArrayList(4);
fruits.Add("Mango");
fruits.Add("Orange");
fruits.Add("Apple");
fruits.Add(3.0);
fruits.Add("Banana");
fruits.Add(new Stock());
// Apply OfType() to the ArrayList.
var query1 = fruits.OfType<Stock>();
Console.WriteLine("Elements of type 'stock' are:");
foreach (var fruit in query1)
{
Console.WriteLine(fruit);
}
}
}
Remember that IEnumerable is lazily evaluated. Use a foreach to loop through query1 and you will see that it only finds the Stock object.
Yeah, sorry, my mistake. ObjectCache is an IEnumerable<KeyValuePair<string, object>>,
not really an IDictionary.
This works:
var c = cache.Select(o => o.Value).OfType<Stock>().ToList();
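The difference can be reproduced without System.Runtime.Caching. The sketch below uses a plain Dictionary<string, object> as a stand-in for the cache, on the assumption that only the enumeration behaviour matters: both enumerate as KeyValuePair<string, object>. CacheQueries and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Stock { }

// Hypothetical helpers that take anything enumerating as key/value pairs.
public static class CacheQueries
{
    // Filters the enumerated pairs themselves; a KeyValuePair is never a Stock.
    public static int CountPairsOfTypeStock(IEnumerable<KeyValuePair<string, object>> cache)
    {
        return cache.OfType<Stock>().Count();
    }

    // Projects each pair to its value before filtering by type.
    public static int CountValuesOfTypeStock(IEnumerable<KeyValuePair<string, object>> cache)
    {
        return cache.Select(kv => kv.Value).OfType<Stock>().Count();
    }
}

public class Program
{
    public static void Main()
    {
        // Stand-in for the cache: one Stock and one unrelated value.
        var cache = new Dictionary<string, object>
        {
            ["test"] = new Stock(),
            ["other"] = "not a stock",
        };

        Console.WriteLine(CacheQueries.CountPairsOfTypeStock(cache));  // 0
        Console.WriteLine(CacheQueries.CountValuesOfTypeStock(cache)); // 1
    }
}
```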
