Question: What is the best way to sort items (T) into buckets (ConcurrentBag)?
Ok, so I have not yet taken an Algorithms class, so I am unsure of the best approach to the problem I have come across.
Preconditions:
Each bucket has a unique identifier (within each sBucket).
Each sBucket has a unique identifier.
Each item has a unique identifier.
Each item has a property (bucketId) corresponding to the bucket it belongs to.
Each item has a property (sBucketId) corresponding to the superBucket it belongs to.
Bucket and sBucket IDs are unique.
I have a ConcurrentBag of items I wish to sort into these buckets.
There are several hundred items.
There are several dozen buckets.
There are 3 super-buckets which contain the buckets.
Each super-bucket contains the same buckets, though with different items within the buckets.
I am currently using brute force via a Parallel.ForEach loop on the collection of items, comparing each item's bucketId to each individual bucket using LINQ. This is incredibly slow and cumbersome, though, so I'd like to find a better method.
I have thought about sorting the items based on their superBucket then Bucket, and then iterating through each superbucket->bucket to insert the items. Should this be the path I take?
Thanks for any help you can provide.
Example of current code
ConcurrentBag<Item> items ...
List<SuperBuckets> ListOfSuperBuckets ...
Parallel.ForEach(items, item =>
{
    ListOfSuperBuckets
        .Where(sBucket => sBucket.id == item.sBucketId)
        .First()
        .buckets
        .Where(bucket => bucket.id == item.bucketId)
        .First()
        .items
        .Add(item);
});
I wouldn't use parallelism for this, but there are a bunch of options.
var groupedBySBucket = ListOfSuperBuckets
    .GroupJoin(items, a => a.id, b => b.sBucketId, (a, b) => new
    {
        sBucket = a,
        buckets = a.buckets
            .GroupJoin(b, c => c.id, x => x.bucketId, (c, x) => new
            {
                bucket = c,
                items = x
            })
    });
foreach (var g in groupedBySBucket)
{
    // We benefit here from the fact that the collection types are passed by reference.
    foreach (var b in g.buckets)
    {
        b.bucket.items.AddRange(b.items);
    }
}
Or if that's too much code for you, this is comparable.
var groupedByBucket = ListOfSuperBuckets
    .SelectMany(c => c.buckets, (a, b) => new { sBucketId = a.id, bucket = b })
    .GroupJoin(items, a => new { a.sBucketId, bucketId = a.bucket.id }, b => new { b.sBucketId, b.bucketId }, (a, b) => new
    {
        bucket = a.bucket,
        items = b
    });
foreach (var g in groupedByBucket)
{
    // We benefit here from the fact that the collection types are passed by reference.
    g.bucket.items.AddRange(g.items);
}
This is also assuming ListOfSuperBuckets is a given. If that was simply an artifact of your implementation, there'd be an even simpler way. The following builds the list itself.
Beware, of course, that these are different: this one won't have any empty buckets where there is no data, but the first implementation could. We're also creating new buckets, which the first implementation doesn't; good if you need to, bad if you've already created them elsewhere. The first one could easily be modified to create them, of course.
var ListOfSuperBuckets = items
    .GroupBy(c => new { c.bucketId, c.sBucketId })
    .GroupBy(c => c.Key.sBucketId)
    .Select(c => new SuperBucket
    {
        id = c.Key,
        buckets = c.Select(b => new Bucket
        {
            id = b.Key.bucketId,
            items = b.ToList()
        }).ToList()
    })
    .ToList();
For what it's worth, all these ToList calls are meant to preserve the contract I assume you have. If you don't need them, you could benefit from LINQ's deferred execution by leaving them off. It's really a matter of how you're using the code, but that's worth consideration.
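As a tiny illustration of that trade-off, here is a sketch using the items collection from the question and a hypothetical someBucketId:
// Deferred: the filter runs again each time lazyItems is enumerated.
var lazyItems = items.Where(i => i.bucketId == someBucketId);
// Materialized: the filter runs once, here, and the result is a stable List<Item>.
var listedItems = items.Where(i => i.bucketId == someBucketId).ToList();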
You should use a Dictionary so you can look up buckets and SuperBuckets by ID instead of searching for them.
SuperBucket should have a Dictionary<id_type,Bucket> that you can use to look up buckets by ID, and you should keep the SuperBuckets in a Dictionary<id_type,SuperBucket>. (id_type is the type of your IDs, probably string or int, but I can't tell from your code.)
If you don't want to modify the existing classes, then build a Dictionary<id_type, Dictionary<id_type, Bucket>> and use that.
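For illustration, here is a minimal sketch of the nested-dictionary approach, assuming int IDs and the item/bucket shapes from the question (bucketsById is a hypothetical local):
// Build the lookup once: sBucketId -> (bucketId -> Bucket).
var bucketsById = new Dictionary<int, Dictionary<int, Bucket>>();
foreach (var sBucket in ListOfSuperBuckets)
{
    bucketsById[sBucket.id] = sBucket.buckets.ToDictionary(b => b.id);
}
// Placing each item is now two O(1) lookups instead of two linear searches.
foreach (var item in items)
{
    bucketsById[item.sBucketId][item.bucketId].items.Add(item);
}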
Related
public ActionResult ExistingPolicies()
{
    if (Session["UserId"] == null)
    {
        return RedirectToAction("Login");
    }
    using (PMSDBContext dbo = new PMSDBContext())
    {
        List<Policy> viewpolicy = new List<Policy>();
        var userid = Session["UserId"];
        List<AddPolicy> policy = dbo.AddPolicies.Where(c => c.MobileNumber ==
            (string)userid).ToList();
        foreach (AddPolicy p in policy)
        {
            viewpolicy = dbo.Policies.Where(c => c.PolicyId == p.PolicyId).ToList();
        }
        Session["Count"] = policy.Count;
        return View(viewpolicy);
    }
}
Here the policy list clearly has 2 items. But when I iterate through the foreach, the viewpolicy list only takes the last item as its value. If break is used, it takes only the first item. How do I store both items in the viewpolicy list?
Regards
Surya.
You can iterate through the policies and add them one by one to the list with Add, but I would say that often (not always, though) the better option is to retrieve the whole list from the DB in one query. Without knowing your entities, you can do at least something like this:
List<AddPolicy> policy = ...
viewpolicy = dbo.Policies
.Where(c => policy.Select(p => p.PolicyId).Contains(c.PolicyId))
.ToList();
But if you have correctly set up your entity relations, you should be able to do something like this:
var viewpolicy = dbo.AddPolicies
.Where(c => c.MobileNumber == (string)userid)
.Select(p => p.Policy) //guessing name here, also can be .SelectMany(p => p.Policy)
.ToList();
The problem is that instead of adding to the list, you replace it with a whole new one on each pass of the loop:
viewpolicy=dbo.Policies.Where(c => c.PolicyId ==p.PolicyId).ToList()
The code above searches all the policies for the policy with that ID, turns the result into a new List, and assigns it to the viewpolicy variable. You never actually add anything to a list this way; you just make new lists all the time and overwrite the old one with the latest list.
Perhaps you need something like this:
viewpolicy.Add(dbo.Policies.Single(c => c.PolicyId ==p.PolicyId));
This keeps the list, finds one policy by its ID number (for which there should be only one policy, right? It's an ID, so I figured it's unique) and adds it to the list.
You could use a Where and skip the loop entirely if you wanted:
viewpolicy=dbo.Policies.Where(c => policy.Any(p => c.PolicyId == p.PolicyId)).ToList();
Do not put this in a loop; it doesn't need one. It works by asking LINQ to do the looping for you. It should be converted to an IN query and run by the DB, so it is generally more performant than dragging the policies out one by one (by ID). If the ORM doesn't understand how to turn it into SQL, you can simplify things for it by selecting the IDs into an int collection:
viewpolicy=dbo.Policies.Where(c => policy.Select(p => p.PolicyId).Any(id => c.PolicyId == id)).ToList();
Final point: I recommend you name your "collections of things" with a plural. You have a List<Policy> viewpolicy; this is a list that contains multiple policies, so really it should be called viewPolicies. Same for the list of AddPolicy. Code reads more nicely if things that are collections/lists/arrays are named in the plural.
Something like:
viewpolicy.AddRange(dbo.Policies.Where(c => c.PolicyId ==p.PolicyId));
I am noticing a huge performance issue when trying to get the list of keys in a ConcurrentDictionary whose value object exists in an IEnumerable collection, as follows:
Customer object has:
string CustomerNumber;
string Location;
var CustomerDict = new ConcurrentDictionary<string, Customer>();
IEnumerable<string> customers = ...;
I am trying to get a list of the keys in the dictionary whose Customer.CustomerNumber appears in customers. What I have is below; removeItems takes a very long time to return:
var removeItems = CustomerDict
.Where(w => customers.Any(c => c == w.Value.CustomerNumber))
.Select(s => s.Key)
.ToList();
foreach(var item in removeItems)
{
CustomerDict.TryRemove(item, out _);
}
Any help on the best way to handle this would be much appreciated.
Make customers a HashSet<string>, whose Contains method is O(1):
var customers = new HashSet<string>();
var removeItems = CustomerDict
.Where(w => customers.Contains(w.Value.CustomerNumber))
.Select(s => s.Key);
Currently, Any iterates over customers every time, which has O(n) complexity.
Also, your call to ToList is superfluous: it adds an additional, unnecessary iteration, not to mention increased memory usage.
I think it's better to create a HashSet from customers so the lookups are faster:
HashSet<string> customersHashSet = new HashSet<string>(customers);
var removeItems = CustomerDict
.Where(c => customersHashSet.Contains(c.Value.CustomerNumber))
.Select(s => s.Key);
foreach (var item in removeItems)
{
CustomerDict.TryRemove(item, out _);
}
When removing, consider that if you have many items in the HashSet (relative to the dictionary), it may be better to iterate over the dictionary and search in the HashSet, like this:
foreach (var item in CustomerDict.ToArray())
{
    if (customersHashSet.Contains(item.Value.CustomerNumber))
        CustomerDict.TryRemove(item.Key, out _);
}
The problem is that .Any does a linear scan of its source collection (in your case the customers sequence) for every entry in the dictionary, which takes linear effort each time. It would be better to dump the customer numbers into a local HashSet and then check inclusion via .Contains(w.Value.CustomerNumber), which is nearly constant effort.
Why not just simply do this (assuming the dictionary is keyed by the customer number):
foreach (var customer in customers) // enumerate customers
    CustomerDict.TryRemove(customer, out _); // try to remove the customer; won't do anything if the customer isn't found
I hope this is not a duplicate but I wasn't able to find an answer on this.
It seems to be either undesired behavior or missing knowledge on my part.
I have a list of platform objects and a list of configuration objects. Both contain a string member CodeName.
The lists of CodeNames look like this:
dbContext.Platforms.Select(x => x.CodeName) => {"test", "PC", "Nintendo"}
dbContext.Configurations.Select(x => x.CodeName) => {"debug", "release"}
They are obtained from a MySQL database hence the dbContext object.
Here is the simple code that I want to translate to LINQ, because two foreach loops are things of the past:
var choiceList = new List<List<string>>();
foreach (Platform platform in dbContext.Platforms.ToList())
{
    foreach (Configuration configuration in dbContext.Configurations.ToList())
    {
        choiceList.Add(new List<string>() { platform.CodeName, configuration.CodeName });
    }
}
This code gives me exactly what I want, keeping the platform name first, which looks like:
var results = new List<List<string>>() {
    {"test", "debug"},
    {"test", "release"},
    {"PC", "debug"},
    {"PC", "release"},
    {"Nintendo", "debug"},
    {"Nintendo", "release"}};
But if I translate that to this, my list contains items in a different order:
var choiceList = dbContext.Platforms.SelectMany(p => dbContext.Configurations.Select(t => new List<string>() { p.CodeName, t.CodeName })).ToList();
I will end up with this, where the platform name isn't always first, which is not what is desired:
var results = new List<List<string>>() {
    {"debug", "test"},
    {"release", "test"},
    {"debug", "PC"},
    {"PC", "release"},
    {"debug", "Nintendo"},
    {"Nintendo", "release"}};
My question is, is it possible to obtain the desired result using LINQ?
Let me know if I'm not clear or my question lacks certain details.
Thanks
EDIT: So Ivan found the explanation and I modified my code accordingly.
In fact, only the enumerable in front of the SelectMany needed the .ToList().
I should also have mentioned that I was stuck with the need for a List<List<string>>.
Thanks everyone for the fast input, this was really appreciated.
When you use
var choiceList = dbContext.Platforms.SelectMany(p => dbContext.Configurations.Select(t => new List<string>() { p.CodeName, t.CodeName })).ToList();
it's really translated to a SQL query, and the order of the returned records is not defined unless you use ORDER BY.
To get the same results as your nested loops, execute and materialize both queries, and then do SelectMany in memory:
var platforms = dbContext.Platforms.ToList();
var configurations = dbContext.Configurations.ToList();
var choiceList = platforms.SelectMany(p => configurations,
(p, c) => new List<string>() { p.CodeName, c.CodeName })
.ToList();
Rather than projecting it out to an array, project it out to a new object with two fields (potentially an anonymous object) and then, if you really do need these values in an array, project that into a two-element array after you have retrieved the objects from the database.
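A rough sketch of that suggestion, assuming the Platforms and Configurations sets from the question (the anonymous-type property names are illustrative):
// Let the database return a flat, two-field projection; the field names keep platform and configuration apart.
var pairs = dbContext.Platforms
    .SelectMany(p => dbContext.Configurations,
                (p, c) => new { Platform = p.CodeName, Configuration = c.CodeName })
    .ToList();
// Only after materialization, build the two-element lists if you really need that shape.
var choiceList = pairs
    .Select(x => new List<string> { x.Platform, x.Configuration })
    .ToList();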
Try this:
var platforms = dbContext.Platforms.Select(x => x.CodeName);
var configurations = dbContext.Configurations.Select(x => x.CodeName);
var mix = platforms.SelectMany(num => configurations, (n, a) => new { n, a });
If you want to learn more in detail, see: Difference between Select and SelectMany
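As a quick, hypothetical illustration of that difference:
var groups = new List<List<int>> { new List<int> { 1, 2 }, new List<int> { 3 } };
// Select produces one result per input element: here, a sequence of two inner lists.
IEnumerable<List<int>> nested = groups.Select(g => g);
// SelectMany flattens the inner sequences into one: 1, 2, 3.
IEnumerable<int> flat = groups.SelectMany(g => g);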
I'm trying to sort this complex object:
Order _sut = new Order
{
    OrderDataArray = new[]
    {
        new OrderData
        {
            OrderHeaderArray = new[]
            {
                new OrderHeader
                {
                    SequenceNumber = 1,
                    OrderPositionArray = new[]
                    {
                        new OrderPositions
                        {
                            LineNumber = 3
                        },
                        new OrderPositions
                        {
                            LineNumber = 2
                        },
                        new OrderPositions
                        {
                            LineNumber = 1
                        }
                    }
                }
            }
        }
    }
};
Using the code:
[Fact]
public void Sorts_By_Sequence_Number()
{
    var ordered = _sut.OrderDataArray
        .OrderBy(o => o.OrderHeaderArray
            .OrderBy(a => a.OrderPositionArray
                .OrderBy(p => p.LineNumber)))
        .ToArray();
    _sut.OrderDataArray = ordered;
    OutputHelper(_sut);
}
I don't understand why this doesn't work; the sorting routine simply keeps the initial order of the LineNumber objects. I've tried various things with OrderBy, but it looks like it doesn't sort.
EDIT
Thank you for the responses, both are correct. I have accepted poke's response as it provides a bit more detailed information on the inner workings of the OrderBy method. Basically, I was missing the assignment within the loop; I was trying to sort all objects at once.
You should consider what OrderBy does. It orders a collection by the value you determine in the lambda expression and then returns an enumerable.
Your outer call is good for that:
_sut.OrderDataArray.OrderBy(o => something).ToArray();
You sort by something, and then convert the result into a (then sorted) array. There are two things that matter here: First of all, at least in your example, there is only one object in OrderDataArray, so there is no sort happening. Second, it depends on the return value of something how those objects are sorted.
So in that case, what is something? It’s the following:
o.OrderHeaderArray.OrderBy(a => somethingElse)
So regardless of somethingElse, what does this return? An IEnumerable<OrderHeader>. How do multiple enumerables compare to each other? They are not really comparable; and they especially don’t tell you anything about the order based on their content (you’d have to enumerate it first). So essentially, you order that OrderHeaderArray by “something else” and use the result, which does not tell you anything about an order, as the key to order the OrderDataArray. Then, you throw the sorted OrderHeaderArray away.
You do the same exactly one level deeper with the OrderPositionArray, which again will not do anything useful. The only actual useful ordering happens to the OrderPositionArray itself but that result is again thrown away.
Now, if you want to order your structure, you should do so properly, by reassigning the sorted structure to the array. So at some point, you would have to do the following:
a.OrderPositionArray = a.OrderPositionArray.OrderBy(p => p.LineNumber).ToArray();
But apart from the OrderPositionArray itself and the OrderHeader, you don’t really have anything that can be sorted (because you can’t really sort a collection by the order of a subcollection). So you could solve it like this:
foreach (OrderData data in _sut.OrderDataArray)
{
    foreach (OrderHeader header in data.OrderHeaderArray)
    {
        header.OrderPositionArray = header.OrderPositionArray.OrderBy(p => p.LineNumber).ToArray();
    }
    data.OrderHeaderArray = data.OrderHeaderArray.OrderBy(h => h.SequenceNumber).ToArray();
}
Instead of LINQ, you can also sort the arrays in place, which is maybe a bit nicer since you are not creating new inner arrays:
var c = Comparer<int>.Default;
foreach (OrderData data in _sut.OrderDataArray)
{
    foreach (OrderHeader header in data.OrderHeaderArray)
    {
        Array.Sort(header.OrderPositionArray, new Comparison<OrderPositions>((x, y) => c.Compare(x.LineNumber, y.LineNumber)));
    }
    Array.Sort(data.OrderHeaderArray, new Comparison<OrderHeader>((x, y) => c.Compare(x.SequenceNumber, y.SequenceNumber)));
}
Here,
var ordered = _sut.OrderDataArray.OrderBy(o => ...
expects a Func<OrderData, TKey>, and the values will be sorted by comparing the results of this function.
Instead, you pass the result of another OrderBy, which is an IOrderedEnumerable. That simply doesn't make much sense as a sort key.
In order to sort all the nested collections, you can do the following:
foreach (var orderData in _sut.OrderDataArray)
{
    foreach (var orderHeader in orderData.OrderHeaderArray)
    {
        orderHeader.OrderPositionArray = orderHeader.OrderPositionArray
            .OrderBy(x => x.LineNumber).ToArray();
    }
    orderData.OrderHeaderArray = orderData.OrderHeaderArray
        .OrderBy(x => x.SequenceNumber).ToArray();
}
_sut.OrderDataArray = _sut.OrderDataArray
    .OrderBy(x => ...).ToArray();
It sorts every OrderPositionArray by its items' LineNumber.
It sorts every OrderHeaderArray by its headers' SequenceNumber.
However, it is pretty unclear how you want to sort _sut.OrderDataArray itself; it is marked as x => ... in the example.
It has no comparable properties which can be used for sorting.
I have an IList<Price> SelectedPrices. I also have an IEnumerable<Price> that gets retrieved at a later date. I would like to add everything from the latter to the former where the former does NOT contain the primary key defined in the latter. So for instance:
IList<Price> contains Price.ID = 1, Price.ID = 2, and IEnumerable<Price> contains Price.ID = 2, Price.ID = 3, and Price.ID = 4. What's the easiest way to use a lambda to add those items so that I end up with the IList containing 4 unique Prices? I know I have to call ToList() on the IList to get access to the AddRange() method so that I can add multiple items at once, but how do I select only the items that DON'T exist in that list from the enumerable?
I know I have to call ToList() on the IList to get access to the AddRange() method
This is actually not safe. This will create a new List<T>, so you won't add the items to your original IList<T>. You'll need to add them one at a time.
The simplest option is just to loop and use a containment check:
var itemsToAdd = enumerablePrices.Where(p => !SelectedPrices.Any(sel => sel.ID == p.ID));
foreach(var item in itemsToAdd)
{
SelectedPrices.Add(item);
}
However, this is going to be quadratic in nature, so if the collections are very large, it may be slow. Depending on how large the collections are, it might actually be better to build a set of the IDs in advance:
var existing = new HashSet<int>(SelectedPrices.Select(p => p.ID));
var itemsToAdd = enumerablePrices.Where(p => !existing.Contains(p.ID));
foreach(var item in itemsToAdd)
{
SelectedPrices.Add(item);
}
This will prevent the routine from going quadratic if your collection (SelectedPrices) is large.
You can try this:
var newPrices = prices.Where(p => !SelectedPrices.Any(sp => sp.ID == p.ID));
foreach(var p in newPrices)
SelectedPrices.Add(p);
I know I have to call ToList() on the IList to get access to the AddRange() method so that I can add multiple items at once
ToList will create a new instance of List<Price>, so you will be modifying another list, not the original one... No, you need to add the items one by one.
Try yourEnumerable.Where(x => !yourList.Any(y => y.ID == x.ID)) for the selection part of your question.
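Putting those two points together, a minimal sketch using the placeholder names yourEnumerable and yourList from above:
// Materialize the selection first so we don't re-check against items we just added...
var missing = yourEnumerable.Where(x => !yourList.Any(y => y.ID == x.ID)).ToList();
// ...then add the new prices one by one, since IList<T> itself has no AddRange.
foreach (var price in missing)
{
    yourList.Add(price);
}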
If you want to add new elements to the existing list and do it in the most performant way, you should probably do it in the conventional way, like this:
IList<Price> selectedPrices = ...;
IEnumerable<Price> additionalPrices = ...;
IDictionary<int, Price> pricesById = new Dictionary<int, Price>();
foreach (var price in selectedPrices)
{
    pricesById.Add(price.Id, price);
}
foreach (var price in additionalPrices)
{
    if (!pricesById.ContainsKey(price.Id))
    {
        selectedPrices.Add(price);
    }
}