Speed up search and insert on a ConcurrentBag in C#

I have a problem with slowly "building" a list, and I have no idea how to speed it up.
Here is my code:
private static ConcurrentBag<Classe<PojedynczeSlowa>> categoryClasses = new ConcurrentBag<Classe<PojedynczeSlowa>>();
private const int howManyStudents = 20;
private static int howManyClasses;
private static EventWaitHandle[] ewhClass;
private static List<Classe<Words>> deserializeClasses;
//...
public static void CreateCategoryClasses()
{
deserializeClasses = Deserialize();
howManyClasses = deserializeClasses.Count;
ewhClass = new EventWaitHandle[howManyClasses];
for (var i = 4; i >= 0; --i)
{
categoryClasses.Add(new Classe<PojedynczeSlowa>(((Categories) i).ToString()));
}
WaitCallback threadMethod = ParseCategories;
ThreadPool.SetMaxThreads(howManyStudents, howManyClasses);
for (var i = 0; i < howManyClasses; ++i)
{
ewhClass[i] = new EventWaitHandle(false, EventResetMode.AutoReset);
ThreadPool.QueueUserWorkItem(threadMethod, i);
}
for (var i = 0; i < howManyClasses; ++i)
{
ewhClass[i].WaitOne();
}
var xmls = new XmlSerializer(typeof(List<Classe<PojedynczeSlowa>>)); // fix this!!
using (var sw = new StreamWriter(@"categoryClasses.xml"))
{
xmls.Serialize(sw, categoryClasses.ToList());
}
}
private static void ParseCategories(object index)
{
int sum;
var i = index as int?;
if (deserializeClasses[i.Value].Category == Categories.PEOPLE.ToString())
{
foreach (var word in deserializeClasses[i.Value].Bag)
{
sum =
deserializeClasses.Count(
clas =>
clas.Bag.Where(x => clas.Category == deserializeClasses[i.Value].Category)
.Contains(word));
if (!categoryClasses.ElementAt(0).Bag.Contains(new PojedynczeSlowa(word.Word, sum)))
{
categoryClasses.ElementAt(0)
.Bag.Add(new PojedynczeSlowa(word.Word,
Convert.ToDouble(sum)/
Convert.ToDouble(deserializeClasses.Count(x => x.Category == deserializeClasses[i.Value].Category))));
}
}
}
//rest of the code which adds elements to the list on other indexes.
ewhClass[(i).Value].Set();
}
I might add that:
deserializeClasses contains about 18550 elements of class "Word", and each of these elements contains a list of string and int; the average size of this list is about 200-250 elements. I use .NET 4.5.1.
Thanks for help!

A couple things (I don't have enough rep to comment so my comments are coming in here too)...
1) Class definitions would be very helpful. For example, you have
if (!categoryClasses.ElementAt(0).Bag.Contains(new PojedynczeSlowa(word.Word, sum)))
Contains will never return true here (so the negated condition is always true) if PojedynczeSlowa is a class and you haven't overridden object.Equals (did you?). Also, it's much harder to know what's going on with an incomplete sample.
2) Your code
sum = deserializeClasses.Count(clas => clas.Bag.Where(x => clas.Category == deserializeClasses[i.Value].Category).Contains(word));
doesn't make use of x at all. Consider
sum = deserializeClasses.Count(clas => clas.Category == deserializeClasses[i.Value].Category && clas.Bag.Contains(word));
This avoids much potential enumeration and could speed up the average cost even though the worst case cost remains the same.
3) Dictionaries are your friend. Consider making some temporary dictionaries that are indexed by whatever you're checking against. I'm having a hard time figuring out exactly what you're trying to do (see comment 1), but I'm guessing you could save quite a bit of performance cost, particularly on that Contains() call, by using a Dictionary.
4) I'm not sure that multithreading is going to save you anything here. I'm guessing it will make things slower since this looks to be CPU bound and you are adding CPU overhead with thread switching.
I would help out with some code but I'm in a bit of a hurry and don't have time to guess at the rest of the missing code to get everything to compile.
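As a rough illustration of point 3, here is a minimal sketch of the dictionary approach. It assumes the goal is, for one category, the fraction of that category's classes that contain each word. WordClass and WordFrequency are hypothetical stand-ins for the question's Classe<Words> and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Classe<Words>: a category plus its words.
public sealed class WordClass
{
    public string Category { get; private set; }
    public List<string> Words { get; private set; }

    public WordClass(string category, IEnumerable<string> words)
    {
        Category = category;
        Words = words.ToList();
    }
}

public static class WordFrequency
{
    // For one category, maps each word to the fraction of that
    // category's classes that contain the word at least once.
    public static Dictionary<string, double> Compute(
        IEnumerable<WordClass> classes, string category)
    {
        var inCategory = classes.Where(c => c.Category == category).ToList();
        var counts = new Dictionary<string, int>();

        foreach (var cls in inCategory)
        {
            // Distinct() so a word repeated inside one class counts once per class.
            foreach (var word in cls.Words.Distinct())
            {
                int seen;
                counts.TryGetValue(word, out seen); // seen stays 0 when the key is missing
                counts[word] = seen + 1;
            }
        }

        return counts.ToDictionary(
            kv => kv.Key,
            kv => (double)kv.Value / inCategory.Count);
    }
}
```

Each class is walked once, so the cost grows with the total number of words instead of with the number of words times the number of classes, which is what the nested Count/Contains calls in the question amount to.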

How to "return" multiple times with for loop?

Hopefully this post gives more clarity as to what I am trying to achieve.
Objective: I want to spawn 20 apples(that have an attached button) from a list at runtime. When the apples are clicked they will spawn a popup with information pertaining to the apple that was clicked.
What I'm doing currently: I am using a for loop to run through the list to spawn the apples. I currently have the following code:
public class AppleInventory : MonoBehaviour
{
[SerializeField] private ApplesScript applPrefab;
[SerializeField] private Transform applParent;
public ApplesScript CreateApples()
{
var appl = Instantiate(applPrefab, applParent);
for (int i = 0; i < apples.Count; i++)
{
appl = Instantiate(applPrefab, applParent);
appl.InitAppleVisualization(apples[i].GetAppleSprite());
appl.AssignAppleButtonCallback(() => CreateApplePopUpInfo(i));
appl.transform.position = new Vector2(apples[i].x, apples[i].y);
}
return appl;
}
}
The Problem: When I use the for loop and click on a button, I get the following error: ArgumentOutOfRangeException: Index was out of range. Must be non-negative and less than the size of the collection. The popup information also does not update.
Code without for loop: The code works to spawn one apple when I remove the for loop and set the int i = to a specific number, like below. It will give the correct popup info for any number that "i" is set to. This lets me know that it is not the rest of the code that is the issue. This leads me to believe it is the "return" line along with the for loop that is the issue. It seems I may need to "return" for each iteration but I am unsure of how to go about doing this.
public ApplesScript CreateApples()
{
int i = 7;
var appl = Instantiate(applPrefab, applParent);
appl.InitAppleVisualization(apples[i].GetAppleSprite());
appl.AssignAppleButtonCallback(() => CreateApplePopUpInfo(i));
appl.transform.position = new Vector2(apples[i].x, apples[i].y);
return appl;
}
Thank you,
-
UPDATE
The fix was so simple. I just ended up creating a new method specifically for the for loop and it worked the way I wanted. My code now looks like this:
public void StarterOfApplesCreation()
{
for (int i = 0; i < apples.Count; i++)
{
CreateApples(i);
}
}
public void CreateApples(int i)
{
var appl = Instantiate(applPrefab, applParent);
appl.InitAppleVisualization(apples[i].GetAppleSprite());
appl.AssignAppleButtonCallback(() => CreateApplePopUpInfo(i));
appl.transform.position = new Vector2(apples[i].x, apples[i].y);
}
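A likely reason the separate method fixes the exception: in C#, a lambda created inside a for loop captures the loop variable itself, not its value at that moment. By the time a button is clicked the loop has finished and i equals apples.Count, so apples[i] is out of range. Passing i as a method parameter gives each callback its own variable. The sketch below shows the effect without Unity; ClosureCaptureDemo and its methods are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ClosureCaptureDemo
{
    // Every lambda shares the single loop variable i.
    public static List<Func<int>> CaptureLoopVariable(int count)
    {
        var callbacks = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            callbacks.Add(() => i);
        }
        return callbacks;
    }

    // Each lambda captures its own copy, declared inside the loop body.
    public static List<Func<int>> CaptureLocalCopy(int count)
    {
        var callbacks = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            int index = i;
            callbacks.Add(() => index);
        }
        return callbacks;
    }
}
```

With count = 3, every callback from CaptureLoopVariable returns 3 (the value of i after the loop ends), while the callbacks from CaptureLocalCopy return 0, 1 and 2. Copying i into a local inside the original loop should therefore work as well as the extra method does.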
You have two options. The conventional option is to create all the items first and then return them all in some sort of list, e.g.
public static void Main()
{
foreach (var thing in GetThings(5))
{
Console.WriteLine(thing.Number);
}
Console.ReadLine();
}
public static Thing[] GetThings(int count)
{
var things = new Thing[count];
for (var i = 0; i < count; i++)
{
things[i] = new Thing { Number = i };
}
return things;
}
The more modern option is to use an iterator, which actually returns one item at a time. It has the limitation that you have to use the items there and then (you won't have random access as you would with an array or the like), but it also has advantages, e.g.
public static void Main()
{
foreach (var thing in GetThings(5))
{
Console.WriteLine(thing.Number);
}
Console.ReadLine();
}
public static IEnumerable<Thing> GetThings(int count)
{
for (var i = 0; i < count; i++)
{
var thing = new Thing { Number = i };
yield return thing;
}
}
The result of an iterator will usually be used as the source for a foreach loop or a LINQ query. Note that you can always call ToArray or ToList on the result of an iterator if you do want random access in specific situations, but you still have the advantages of an iterator elsewhere. For instance, let's say that your method produces 1000 items and you want to find the first one that matches some condition. Using my first example, you would have to create all 1000 items every time, even if the first one was a match. Using an iterator, because the items are processed as they are created, you can abort the process as soon as you find a match, meaning that you won't unnecessarily create the remaining items.
Note that my examples use the following class:
public class Thing
{
public int Number { get; set; }
}
You can copy and paste the code into a Console app that doesn't use top-level statements. The bones of the code will still work with top-level statements, but you'll need to make a few other modifications.
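To make the early-exit advantage of the iterator concrete, here is a small sketch that counts how many items actually get created. The Created counter and the LazyDemo class are additions for illustration only; Thing is the same class as above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Thing
{
    public int Number { get; set; }
}

public static class LazyDemo
{
    // Counts how many Thing objects the iterator has produced so far.
    public static int Created;

    public static IEnumerable<Thing> GetThings(int count)
    {
        for (var i = 0; i < count; i++)
        {
            Created++;
            yield return new Thing { Number = i };
        }
    }

    // Finds the first item with the given number, stopping as soon as it is found.
    public static Thing FindFirst(int count, int wanted)
    {
        return GetThings(count).First(t => t.Number == wanted);
    }
}
```

Calling LazyDemo.FindFirst(1000, 2) on a fresh counter leaves Created at 3, because First stops pulling from the iterator once it has its match. An array-returning GetThings would have built all 1000 items first.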
Store each separate "appl" that gets instantiated in an array, i.e. appls[i] = appl.
Do this within the for loop.
If you think about it, by putting the line "return appl;" outside the for loop, you are only returning the last game object, not all of them. That's why creating an array of game objects and assigning to it within the loop may work for you.

Is there a concurrent sorted dictionary or something similar?

For a project we've been working on we used a concurrent dictionary, which was fine until a new specification came up which required the dictionary to be sorted (it should remain in the order it was added, kind of like a FIFO).
This is currently what we do, we take an x amount (5 in this case) of items out of the dictionary:
private Dictionary<PriorityOfMessage, ConcurrentDictionary<Guid, PriorityMessage>> mQueuedMessages = new Dictionary<PriorityOfMessage, ConcurrentDictionary<Guid, PriorityMessage>>();
var messages = new List<KeyValuePair<Guid, PriorityMessage>>();
messages.AddRange(mQueuedMessages[priority].Take(5));
Then we do some things with them and, if everything succeeds, we remove them:
mQueuedMessages[priority].TryRemove(messageOfPriority.Key);
However, if things fail we won't remove them and will try again later. Unfortunately there is no concurrent sorted dictionary, but are there ways to ensure the messages stay in the order they were added?
It is very important we can take multiple objects from the list/dictionary without removing them (or we need to be able to add them to the front later).
How often will you take per second?
it could be a thousand times a second
1000 lock operations per second are absolutely nothing. This will consume almost no time at all.
my colleague has already tried using locks and lists and he deemed it too slow
In all likelihood this means that the locked region was too big. My guess is it went something like this:
lock (...) {
var item = TakeFromQueue();
Process(item);
DeleteFromQueue(item);
}
This does not work because Process is too slow and runs while the lock is held. It must be:
Item item; // declared outside so it is still visible after the lock
lock (...) {
item = TakeFromQueue();
}
Process(item);
lock (...) {
DeleteFromQueue(item);
}
You will not have any perf problems with that at all.
You can now pick any data structure that you like. You are no longer bound by the capabilities of the built-in concurrent data structures. Besides picking a data structure that you like you also can perform any operation on it that you like such as taking multiple items atomically.
I have not fully understood your needs but it sounds like SortedList might go in the right direction.
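One way the short-lock idea could look in practice is sketched below: a plain linked list keeps insertion order, a dictionary finds entries by key, and a single lock guards both. LockedFifo is a hypothetical helper written for this answer, not a framework type, and it has not been measured for performance.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: a FIFO of keyed items guarded by one lock.
public sealed class LockedFifo<TKey, TValue>
{
    private readonly object gate = new object();
    private readonly LinkedList<KeyValuePair<TKey, TValue>> items =
        new LinkedList<KeyValuePair<TKey, TValue>>();
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> index =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();

    public void Enqueue(TKey key, TValue value)
    {
        var node = new LinkedListNode<KeyValuePair<TKey, TValue>>(
            new KeyValuePair<TKey, TValue>(key, value));
        lock (gate)
        {
            index.Add(key, node); // throws on a duplicate key before the list is touched
            items.AddLast(node);
        }
    }

    // Copies up to count items from the front, oldest first, without removing them.
    public List<KeyValuePair<TKey, TValue>> PeekMany(int count)
    {
        lock (gate)
        {
            return items.Take(count).ToList();
        }
    }

    // Removes one item by key; items that are never removed keep their position.
    public bool TryRemove(TKey key)
    {
        lock (gate)
        {
            LinkedListNode<KeyValuePair<TKey, TValue>> node;
            if (!index.TryGetValue(key, out node)) return false;
            items.Remove(node);
            index.Remove(key);
            return true;
        }
    }
}
```

The slow processing happens between PeekMany and TryRemove, outside any lock. A failed message is simply not removed and stays at the front for the next attempt. Note that two consumers calling PeekMany at the same time would see the same items, so this sketch assumes a single consumer per queue.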
You could also go for another solution (haven't tested it performance-wise):
public class ConcurrentIndexableQueue<T> {
private long tailIndex;
private long headIndex;
private readonly ConcurrentDictionary<long, T> dictionary;
public ConcurrentIndexableQueue() {
tailIndex = -1;
headIndex = 0;
dictionary = new ConcurrentDictionary<long, T>();
}
public long Count { get { return tailIndex - headIndex + 1; } }
public bool IsEmpty { get { return Count == 0; } }
public void Enqueue(T item) {
var enqueuePosition = Interlocked.Increment(ref tailIndex);
dictionary.AddOrUpdate(enqueuePosition, k => item, (k, v) => item);
}
public T Peek(long index) {
T item;
return dictionary.TryGetValue(index, out item) ?
item :
default(T);
}
public long TryDequeue(out T item) {
if (headIndex > tailIndex) {
item = default(T);
return -1;
}
var dequeuePosition = Interlocked.Increment(ref headIndex) - 1;
dictionary.TryRemove(dequeuePosition, out item);
return dequeuePosition;
}
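public List<T> GetSnapshot() {
// Copies the current items without removing them. Calling TryDequeue here
// would empty the queue, and "i < snapshotTail" would skip the last item.
List<T> snapshot = new List<T>();
long snapshotTail = tailIndex;
long snapshotHead = headIndex;
for (long i = snapshotHead; i <= snapshotTail; i++) {
T item;
if (dictionary.TryGetValue(i, out item)) {
snapshot.Add(item);
}
}
return snapshot;
}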
}

More efficient and readable nested loop

I've created an algorithm which weighs the relevance of a list of articles against two lists of keywords that correlate to attributes of the article.
It works great and is super efficient... but it's a mess. It's not terribly readable, so it's difficult to discern what's going on.
The operation in pseudo code goes something like this:
Loop through every article in a list called articles(List<Article>)
For every article, loop through every role in a list of roles (List<string>)
Check to see if the current article has any roles (Article.Roles = List<string>)
If yes, then loop through each role in the article and try to match a role in the article to the role in the current loop
If a match is found, add weight to the article. If the index of the role on the article and the role in the roles list are both index 0 (in primary position) add extra weight for two matching primaries
Repeat for topics, but with no bonus for primary matches
What would be a better way to write the following code? I can't use foreach except in one or two places, because I need to match indexes to know what value to add on a match.
private static List<Article> WeighArticles(List<Article> articles, List<string> roles, List<string> topics, List<string> industries)
{
var returnList = new List<Article>();
for (int currentArticle = 0; currentArticle < articles.Count; currentArticle++)
{
for (int currentRole = 0; currentRole < roles.Count; currentRole++)
{
if (articles[currentArticle].Roles != null && articles[currentArticle].Roles.Count > 0)
{
for (int currentArticleRole = 0; currentArticleRole < articles[currentArticle].Roles.Count; currentArticleRole++)
{
if (articles[currentArticle].Roles[currentArticleRole].ToLower() == roles[currentRole].ToLower())
{
if (currentArticleRole == 0 && currentRole == 0)
articles[currentArticle].Weight += 3;
else
articles[currentArticle].Weight += 1;
}
}
}
}
for (int currentTopic = 0; currentTopic < topics.Count; currentTopic++)
{
if (articles[currentArticle].Topics != null && articles[currentArticle].Topics.Count > 0)
{
for (int currentArticleTopic = 0; currentArticleTopic < articles[currentArticle].Topics.Count; currentArticleTopic++)
{
if (articles[currentArticle].Topics[currentArticleTopic].ToLower() == topics[currentTopic].ToLower())
{
articles[currentArticle].Weight += 0.8;
}
}
}
}
returnList.Add(articles[currentArticle]);
}
return returnList;
}
//Article Class stub (unused properties left out)
public class Article
{
public List<string> Roles { get; set; }
public List<string> Topics { get; set; }
public double Weight { get; set; }
}
If you examine your code, you'll find that you are asking the Article class for data too many times. Use the Tell, Don't Ask principle and move the weight-adding logic into the Article class, where it belongs. That will increase the cohesion of Article and make your original code much more readable. Here is what your original code will look like:
foreach(var article in articles)
{
article.AddWeights(roles);
article.AddWeights(topics);
}
And Article will look like:
public double Weight { get; private set; } // probably you don't need setter
public void AddWeights(IEnumerable<Role> roles)
{
const double RoleWeight = 1;
const double PrimaryRoleWeight = 3;
if (!roles.Any())
return;
if (Roles == null || !Roles.Any())
return;
var primaryRole = roles.First();
var comparison = StringComparison.CurrentCultureIgnoreCase;
if (String.Equals(Roles[0], primaryRole, comparison))
{
Weight += PrimaryRoleWeight;
return;
}
foreach(var role in roles)
if (Roles.Contains(role, StringComparer.CurrentCultureIgnoreCase))
Weight += RoleWeight;
}
Adding topics weights:
public void AddWeights(IEnumerable<Topic> topics)
{
const double TopicWeight = 0.8;
if (Topics == null || !Topics.Any() || !topics.Any())
return;
foreach(var topic in topics)
if (Topics.Contains(topic, StringComparer.CurrentCultureIgnoreCase))
Weight += TopicWeight;
}
Okay, you have several design flaws in your code:
1 - It's too procedural. You need to learn to think to write code to tell the machine "what you want" as opposed to "how to do it", similar to the analogy of going to a bar and instructing the bartender about the exact proportions of everything instead of just asking for a drink.
2 - Collections Should NEVER be null. Which means that checking for articles[x].Roles != null makes no sense at all.
3 - iterating on a List<string> and comparing each with someOtherString makes no sense either. Use List<T>.Contains() instead.
4 - You're grabbing each and every one of the items in the input list and outputting them in a new list. Also nonsense. Either return the input list directly or create a new list by using inputList.ToList()
All in all, here's a more idiomatic C# way of writing that code:
private static List<Article> WeighArticles(List<Article> articles, List<string> roles, List<string> topics, List<string> industries)
{
var firstRole = roles.FirstOrDefault();
var firstArticle = articles.FirstOrDefault();
var firstArticleRole = firstArticle.Roles.FirstOrDefault();
if (firstArticleRole != null && firstRole != null &&
firstRole.ToLower() == firstArticleRole.ToLower())
firstArticle.Weight += 3;
var remaining = from a in articles.Skip(1)
from r in roles.Skip(1)
from ar in a.Roles.Skip(1)
where ar.ToLower() == r.ToLower()
select a;
foreach (var article in remaining)
article.Weight += 1;
var hastopics = from a in articles
from t in topics
from at in a.Topics
where at.ToLower() == t.ToLower()
select a;
foreach (var article in hastopics)
article.Weight += .8;
return articles;
}
There are even better ways to write this, such as using .Take(1) instead of .FirstOrDefault()
Use the Extract Method refactoring on each for loop and give it a semantic name: WeightArticlesForRole, WeightArticlesForTopic, etc. This will eliminate the nested loops (they are still there, but behind a function call that takes a list).
It will also make your code self-documenting and more readable, as you have now boiled a loop down to a named method that reflects what it accomplishes. Those reading your code will be most interested in understanding what it accomplishes before trying to understand how it accomplishes it. Semantic/conceptual function names will facilitate this. They can use Go To Definition to determine the how after they understand the what. Provide a summary tag comment for each method with an elaborated explanation (similar to your pseudo code), and now others can wrap their heads around what your code is doing without having to tediously read implementation details they aren't concerned with.
The refactored methods will likely have some dirty looking parameters, but they will be private methods so I generally don't worry about this. However, sometimes it helps me see what dependencies are there that should probably be removed and restructure the code in the call such that it can be reused from multiple places. I suspect with some params for the weighting and delegate functions you might be able to combine WeightArticlesForRole and WeightArticlesForTopic into a single function to be reused in both places.
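The combined function suggested above might look roughly like this. It is a sketch only: ArticleWeigher and MatchWeight are made-up names, and the comparison uses StringComparison.OrdinalIgnoreCase instead of the ToLower() calls in the question, which is an assumption about what kind of case-insensitivity is wanted.

```csharp
using System;
using System.Collections.Generic;

public static class ArticleWeigher
{
    // Returns the weight earned by one article's values against a keyword list.
    // Every case-insensitive match earns matchWeight, except a match between
    // the two primary (index 0) entries, which earns primaryMatchWeight.
    public static double MatchWeight(
        IList<string> articleValues,
        IList<string> keywords,
        double matchWeight,
        double primaryMatchWeight)
    {
        if (articleValues == null || keywords == null) return 0;

        double total = 0;
        for (int k = 0; k < keywords.Count; k++)
        {
            for (int a = 0; a < articleValues.Count; a++)
            {
                if (string.Equals(articleValues[a], keywords[k],
                                  StringComparison.OrdinalIgnoreCase))
                {
                    total += (a == 0 && k == 0) ? primaryMatchWeight : matchWeight;
                }
            }
        }
        return total;
    }
}
```

The body of the outer article loop then shrinks to two lines: article.Weight += ArticleWeigher.MatchWeight(article.Roles, roles, 1, 3); and article.Weight += ArticleWeigher.MatchWeight(article.Topics, topics, 0.8, 0.8);. Passing the same value for both weights gives topics no primary bonus.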

Why Doesn't My Anonymous Method Work in a Loop?

This function is supposed to set descending order numbers on an IEnumerable<Order>, but it doesn't work. Can anyone tell me what's wrong with it?
private void orderNumberSetter(IEnumerable<Order> orders)
{
var i = 0;
Action<Order, int> setOrderNumber = (Order o, int count) =>
{
o.orderNumber = i--;
};
var orderArray = orders.ToArray();
for (i = 0; i < orders.Count(); i++)
{
var order = orderArray[i];
setOrderNumber(order, i);
}
}
You are re-using i as the loop variable, and i also gets modified in your setOrderNumber lambda. Don't modify i. It's unclear what you meant to do; maybe the following:
Action<Order, int> setOrderNumber = (Order o, int count) =>
{
o.orderNumber = count;
};
If the above is the case you could have achieved that much, much easier though, your code seems unnecessarily complex, i.e:
for (i = 0; i < orderArray.Length; i++)
{
orderArray[i].orderNumber = i;
}
or even simpler without having to create an array at all:
int orderNum = 0;
foreach(var order in orders)
{
order.orderNumber = orderNum++;
}
Edit:
To set descending order numbers, you can determine the number of orders first then go backwards from there:
int orderNum = orders.Count();
foreach(var order in orders)
{
order.orderNumber = orderNum--;
}
Above would produce one based order numbers in descending order. Another approach, more intuitive and probably easier to maintain is to just walk the enumeration in reverse order:
int orderNum = 0;
foreach(var order in orders.Reverse())
{
order.orderNumber = orderNum++;
}
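To see why the original loop never finishes: the lambda captures the variable i itself, so its i-- undoes the loop's i++ on every pass. The sketch below reproduces that with an iteration cap so it terminates. SharedCaptureDemo and the cap parameter are additions for illustration.

```csharp
using System;

public static class SharedCaptureDemo
{
    // The lambda and the enclosing method share one variable.
    public static int DecrementViaLambda(int start)
    {
        var i = start;
        Action decrement = () => { i--; };
        decrement();
        return i; // the change made inside the lambda is visible here
    }

    // Same shape as the question's loop, but capped so it cannot run forever.
    // Returns how many iterations ran before the loop stopped.
    public static int CountIterations(int limit, int maxIterations)
    {
        var i = 0;
        Action decrement = () => { i--; };
        int iterations = 0;
        for (i = 0; i < limit && iterations < maxIterations; i++)
        {
            decrement(); // undoes the i++ that follows
            iterations++;
        }
        return iterations;
    }
}
```

CountIterations(3, 100) returns 100 rather than 3: i is back at 0 after every pass, so only the cap stops the loop.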
I agree with BrokenGlass, you are running into an infinite loop.
You could achieve the same thing using foreach:
private void orderNumberSetter(IEnumerable<Order> orders)
{
var count = orders.Count();
orders.ToList().ForEach(o =>
{
o.orderNumber = count--;
});
}
I would try this code instead that decrements i while it enumerates through the array
private void orderNumberSetter(IEnumerable<Order> orders)
{
int i = orders.Count();
foreach (Order order in orders.ToArray())
{
order.orderNumber = --i;
}
}
Though it's hard to tell what you're trying to do, it's a good bet that you didn't mean to keep referring to the same variable i, which is what's causing an infinite loop.
Here's another example of what I believe you wanted:
Order[] reversed = orders.ToArray(); // copy, to avoid reordering the original
Array.Reverse(reversed); // reverses the copy in place
int orderNumber = 0;
foreach (Order order in reversed)
{
order.orderNumber = orderNumber++;
}
I suggest editing your title. Your title should describe your question, and I'm sure you didn't want a broken C# function, since you already had one :P. It's also good to describe thoroughly in the post what your code is supposed to do, including what your expected results are and how your current example doesn't meet them. Don't let your non-working example alone explain what you want; it only showed us an example of what you didn't want.

Is there a more efficient way to randomise a set of LINQ results?

I've produced a function to get back a random set of submissions depending on the amount passed to it, but I worry that even though it works now with a small amount of data, it would become inefficient and cause problems when a large amount is passed through.
Is there a more efficient way of doing the following?
public List<Submission> GetRandomWinners(int id)
{
List<Submission> submissions = new List<Submission>();
int amount = (DbContext().Competitions
.Where(s => s.CompetitionId == id).FirstOrDefault()).NumberWinners;
for (int i = 1 ; i <= amount; i++)
{
bool added = false;
while (!added)
{
bool found = false;
var randSubmissions = DbContext().Submissions
.Where(s => s.CompetitionId == id && s.CorrectAnswer).ToList();
int count = randSubmissions.Count();
int index = new Random().Next(count);
foreach (var sub in submissions)
{
if (sub == randSubmissions.Skip(index).FirstOrDefault())
found = true;
}
if (!found)
{
submissions.Add(randSubmissions.Skip(index).FirstOrDefault());
added = true;
}
}
}
return submissions;
}
As I say, I have this fully working and bringing back the wanted result. It is just that I'm not liking the foreach and while checks in there and my head has just turned to mush now trying to come up with the above solution.
(Please read all the way through, as there are different aspects of efficiency to consider.)
There are definitely simpler ways of doing this - and in particular, you really don't need to perform the query for correct answers repeatedly. Why are you fetching randSubmissions inside the loop? You should also look at ElementAt to avoid the Skip and FirstOrDefault - and bear in mind that as randSubmissions is a list, you can use normal list operations, like the Count property and the indexer!
The option which comes to mind first is to perform a partial shuffle. There are loads of examples on Stack Overflow of a modified Fisher-Yates shuffle. You can modify that code very easily to avoid shuffling the whole list - just shuffle it until you've got as many random elements as you need. In fact, these days I'd probably implement that shuffle slightly differently so that you could just call:
return correctSubmissions.Shuffle(random).Take(amount).ToList();
For example:
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
T[] elements = source.ToArray();
for (int i = 0; i < elements.Length; i++)
{
// Find an item we haven't returned yet
int swapIndex = i + rng.Next(elements.Length - i);
T tmp = elements[i];
yield return elements[swapIndex];
elements[swapIndex] = tmp;
// Note that we don't need to copy the value into elements[i],
// as we'll never use that value again.
}
}
Given the above method, your GetRandomWinners method would look like this:
public List<Submission> GetRandomWinners(int competitionId, Random rng)
{
List<Submission> submissions = new List<Submission>();
int winnerCount = DbContext().Competitions
.Single(s => s.CompetitionId == competitionId)
.NumberWinners;
var correctEntries = DbContext().Submissions
.Where(s => s.CompetitionId == competitionId &&
s.CorrectAnswer)
.ToList();
return correctEntries.Shuffle(rng).Take(winnerCount).ToList();
}
I would advise against creating a new instance of Random in your method. I have an article on preferred ways of using Random which you may find useful.
One alternative you may want to consider is working out the count of the correct entries without fetching them all, then working out the winning entries by computing a random selection of "row IDs" and using ElementAt repeatedly (with a consistent order). Alternatively, instead of pulling the complete submissions, pull just their IDs. Shuffle the IDs to pick n random ones (which you put into a List<T>), then use something like:
return DbContext().Submissions
.Where(s => winningIds.Contains(s.Id))
.ToList();
I believe this will use an "IN" clause in the SQL, although there are limits as to how many entries can be retrieved like this.
That way even if you have 100,000 correct entries and 3 winners, you'll only fetch 100,000 IDs, but 3 complete records. Hope that makes sense!
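The ID-picking step could be sketched as a partial Fisher-Yates shuffle over the fetched IDs, as below. WinnerPicker and PickRandomIds are hypothetical names, and the sketch assumes the IDs are ints that have already been loaded into memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WinnerPicker
{
    // Picks up to count distinct IDs at random by shuffling only the
    // first count positions of a copy of the input.
    public static List<int> PickRandomIds(IEnumerable<int> ids, int count, Random rng)
    {
        int[] pool = ids.ToArray();
        int take = Math.Max(0, Math.Min(count, pool.Length));

        for (int i = 0; i < take; i++)
        {
            // Choose any not-yet-picked element and move it into slot i.
            int swapIndex = i + rng.Next(pool.Length - i);
            int tmp = pool[i];
            pool[i] = pool[swapIndex];
            pool[swapIndex] = tmp;
        }

        return pool.Take(take).ToList();
    }
}
```

The returned list would play the role of winningIds in the query above. Because each ID occupies one slot in the pool, no ID can be picked twice as long as the input itself has no duplicates.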
