How to ensure thread safe ASP.net page to access static list of objects - c#

In my web application I have the following static list, which is shared by all online users:
public static List<MyClass> myObjectList = new List<MyClass>();
When multiple online users read data from myObjectList at the same time, is there any chance of a thread synchronization issue?
In another scenario, multiple users are reading from myObjectList and a few of them are also writing, but every user writes to a different index of the list. Every user may also add a new item to the list. In this case I think synchronization issues are likely.
How can I write a thread-safe utility class that reads and writes data from this object safely?
Suggestions are highly welcome.
Code suggested by Angelo looks like this
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
namespace ObjectPoolExample
{
public class ObjectPool<T>
{
private ConcurrentBag<T> _objects;
private Func<T> _objectGenerator;
public ObjectPool(Func<T> objectGenerator)
{
if (objectGenerator == null) throw new ArgumentNullException("objectGenerator");
_objects = new ConcurrentBag<T>();
_objectGenerator = objectGenerator;
}
public T GetObject()
{
T item;
if (_objects.TryTake(out item)) return item;
return _objectGenerator();
}
public void PutObject(T item)
{
_objects.Add(item);
}
}
class Program
{
static void Main(string[] args)
{
CancellationTokenSource cts = new CancellationTokenSource();
// Create an opportunity for the user to cancel.
Task.Factory.StartNew(() =>
{
// Read the key once; calling Console.ReadKey() twice would wait for a second key press.
char key = Console.ReadKey().KeyChar;
if (key == 'c' || key == 'C')
cts.Cancel();
});
ObjectPool<MyClass> pool = new ObjectPool<MyClass> (() => new MyClass());
// Create a high demand for MyClass objects.
Parallel.For(0, 1000000, (i, loopState) =>
{
MyClass mc = pool.GetObject();
Console.CursorLeft = 0;
// This is the bottleneck in our application. All threads in this loop
// must serialize their access to the static Console class.
Console.WriteLine("{0:####.####}", mc.GetValue(i));
pool.PutObject(mc);
if (cts.Token.IsCancellationRequested)
loopState.Stop();
});
Console.WriteLine("Press the Enter key to exit.");
Console.ReadLine();
}
}
// A toy class that requires some resources to create.
// You can experiment here to measure the performance of the
// object pool vs. ordinary instantiation.
class MyClass
{
public int[] Nums {get; set;}
public double GetValue(long i)
{
return Math.Sqrt(Nums[i]);
}
public MyClass()
{
Nums = new int[1000000];
Random rand = new Random();
for (int i = 0; i < Nums.Length; i++)
Nums[i] = rand.Next();
}
}
}
I think I can go with this approach.

If you are using .NET 4.0 or later, you are better off switching to one of the thread-safe collections that ship with the framework, such as ConcurrentBag<T>.
ConcurrentBag<T> does not support access by index, however, so if you need to look up an object by a given key, use a ConcurrentDictionary<TKey, TValue> instead.
If .NET 4.0 is not an option, you should read the following blog post:
Why are thread safe collections so hard?
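To make the ConcurrentDictionary suggestion concrete, here is a minimal sketch. The SharedStore class and its members are hypothetical names, and MyClass is reduced to a stand-in with a single property because the question does not show its real members. The dictionary only protects its own structure: if stored objects are mutated in place, they would still need their own synchronization.

```csharp
using System.Collections.Concurrent;

// Stand-in for the question's MyClass (its real members are not shown).
public sealed class MyClass
{
    public MyClass(string name) { Name = name; }
    public string Name { get; }
}

// Hypothetical wrapper around one shared, thread-safe collection.
public static class SharedStore
{
    private static readonly ConcurrentDictionary<int, MyClass> Items =
        new ConcurrentDictionary<int, MyClass>();

    // Adds the item, or replaces whatever is stored under the same key.
    public static void Set(int key, MyClass item)
    {
        Items[key] = item;
    }

    // Returns false instead of throwing when the key is absent.
    public static bool TryGet(int key, out MyClass item)
    {
        return Items.TryGetValue(key, out item);
    }

    public static int Count
    {
        get { return Items.Count; }
    }
}
```

Because each user reads and writes through Set and TryGet, no caller ever touches a plain List<MyClass> directly, so concurrent adds and lookups no longer race on the list's internal array.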

Related

Enforcing asynchronous computation for a list of objects

I have a class that performs some heavy calculations. For a bunch of different inputs I would like to run those calculations in parallel on multiple threads, because they are independent of each other. How can I enforce that? I tried this code (a dummy test from dotnetfiddle), but the calculations are already done while the task list is being built, rather than during Task.WhenAll:
using System;
using System.Threading.Tasks;
using System.Collections.Generic;
using System.Diagnostics;
public class Program
{
public static async Task Main()
{
List<Model> models = new List<Model>();
for (int i = 100; i<10000; i++) {
models.Add(new Model(i));
}
List<Task> list = new();
/* assume list of model with different input sets */
var sw = new Stopwatch();
sw.Start();
foreach(var model in models)
{
list.Add(model.Calculate());
}
Console.WriteLine(sw.ElapsedMilliseconds.ToString()); // here I would assume 0, but all the calculations are already done
sw.Restart();
await Task.WhenAll(list);
Console.WriteLine(sw.ElapsedMilliseconds.ToString()); // yields almost 0
}
}
public class Model
{
public double Result {get; set;}
public int input {get; set;}
public Model(int Input) {
input = Input;
Result = 0;
}
public Task Calculate()
{
/* do "heavy" stuff */
for (int i = 0; i < input; i++) {
Result += i;
}
return Task.CompletedTask;
}
}
Returning a Task does not magically run the code on a new thread; in fact it indicates that the method is asynchronous rather than computationally expensive.
Regardless, Calculate has a completely synchronous implementation, and will run synchronously, irrespective of the Task being returned.
This being said, don't return a Task from Calculate.
If you want to use Tasks to run calculations on separate threads, then there is Task.Run, which offloads work to the ThreadPool, and returns a Task representing that work.
An example:
foreach (var model in models)
{
list.Add(Task.Run(model.Calculate));
}
await Task.WhenAll(list);
Or more concisely (this needs using System.Linq; for Select):
await Task.WhenAll(models.Select(model => Task.Run(model.Calculate)));
Although Parallel.ForEach is designed specifically for this purpose:
Parallel.ForEach(models, model => model.Calculate());

C# Multi-threading, wait for all task to complete in a situation when new tasks are being constantly added

I have a situation where new tasks are constantly being generated and added to a ConcurrentBag<Task>.
I need to wait for all tasks to complete.
Waiting for all the tasks in the ConcurrentBag via WaitAll is not enough, because the number of tasks may have grown by the time the previous wait completes.
At the moment I am waiting in the following way:
private void WaitAllTasks()
{
while (true)
{
int countAtStart = _tasks.Count();
Task.WaitAll(_tasks.ToArray());
int countAtEnd = _tasks.Count();
if (countAtStart == countAtEnd)
{
break;
}
#if DEBUG
if (_tasks.Count() > 100)
{
tokenSource.Cancel();
break;
}
#endif
}
}
I am not very happy with the while(true) solution.
Can anyone suggest a better, more efficient way to do this (without having to poll constantly in a while(true) loop)?
Additional context, as requested in the comments (although I don't think it is relevant to the question):
This piece of code is used in a web crawler. The crawler scans page content and looks for two types of information: data pages and link pages. Data pages are scanned and their data is collected; link pages are scanned and more links are collected from them.
As each of the tasks carries out its work and finds more links, it adds them to an EventList. An OnAdd event on the list (code below) triggers further tasks to scan the newly added URLs, and so forth.
The job is complete when there are no more running tasks (so no more links will be added) and all items have been processed.
public IEventList<ISearchStatus> CurrentLinks { get; private set; }
public IEventList<IDataStatus> CurrentData { get; private set; }
public IEventList<System.Dynamic.ExpandoObject> ResultData { get; set; }
private readonly ConcurrentBag<Task> _tasks = new ConcurrentBag<Task>();
private readonly CancellationTokenSource tokenSource = new CancellationTokenSource();
private readonly CancellationToken token;
public void Search(ISearchDefinition search)
{
CurrentLinks.OnAdd += UrlAdded;
CurrentData.OnAdd += DataUrlAdded;
var status = new SearchStatus(search);
CurrentLinks.Add(status);
WaitAllTasks();
_exporter.Export(ResultData as IList<System.Dynamic.ExpandoObject>);
}
private void DataUrlAdded(object o, EventArgs e)
{
var item = o as IDataStatus;
if (item == null)
{
return;
}
_tasks.Add(Task.Factory.StartNew(() => ProcessObjectSearch(item), token));
}
private void UrlAdded(object o, EventArgs e)
{
var item = o as ISearchStatus;
if (item==null)
{
return;
}
_tasks.Add(Task.Factory.StartNew(() => ProcessFollow(item), token));
_tasks.Add(Task.Factory.StartNew(() => ProcessData(item), token));
}
public class EventList<T> : List<T>, IEventList<T>
{
public EventHandler OnAdd { get; set; }
private readonly object locker = new object();
public new void Add(T item)
{
//lock (locker)
{
base.Add(item);
}
OnAdd?.Invoke(item, null);
}
public new bool Contains(T item)
{
//lock (locker)
{
return base.Contains(item);
}
}
}
I think this can be done with the TPL Dataflow library and a very basic setup. You'll need a BufferBlock<ParseTask>, a TransformManyBlock<ParseTask, DataTask>, and an ActionBlock<DataTask> (possibly more than one) for the actual data processing, like this:
// queue for a new urls to parse
var buffer = new BufferBlock<ParseTask>();
// parser itself, returns many data tasks from one url
// similar to LINQ.SelectMany method
var transform = new TransformManyBlock<ParseTask, DataTask>(task =>
{
// get all the additional urls to parse
var parsedLinks = GetLinkTasks(task);
// get all the data to parse
var parsedData = GetDataTasks(task);
// setup additional links to be parsed
foreach (var parsedLink in parsedLinks)
{
buffer.Post(parsedLink);
}
// return all the data to be processed
return parsedData;
});
// actual data processing
var consumer = new ActionBlock<DataTask>(s => ProcessData(s));
After that you need to link the blocks to each other:
buffer.LinkTo(transform, new DataflowLinkOptions { PropagateCompletion = true });
transform.LinkTo(consumer, new DataflowLinkOptions { PropagateCompletion = true });
Now you have a nice pipeline that executes in the background. Once you determine that everything you need has been parsed, you simply call the Complete method on a block so that it stops accepting new messages. Once the buffer is empty, it propagates completion down the pipeline to the transform block, which propagates it to the consumer(s), and you then wait for the Completion task:
// no additional links would be accepted
buffer.Complete();
// after all the tasks are done, this will get fired
await consumer.Completion;
You can detect the moment of completion by checking, for example, whether the buffer's Count property, the transform's InputCount, and the transform's CurrentDegreeOfParallelism (an internal property of TransformManyBlock) are all equal to 0.
However, I suggest implementing some additional logic of your own to track the number of running transformers, as relying on internal state isn't a great solution. As for cancelling the pipeline, you can create each TPL block with a CancellationToken, either one shared by all blocks or a dedicated one per block, which gives you cancellation out of the box.
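One possible form of that additional completion logic is a counter of outstanding work items, sketched below using only the base class library. PendingWorkTracker and its Begin and End methods are hypothetical names, not part of TPL Dataflow. The sketch assumes that Begin is called before an item is queued and that a parent item calls End only after it has queued all of its children, so the counter cannot reach zero while work is still on its way.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: counts outstanding work items and completes a task at zero.
public sealed class PendingWorkTracker
{
    private readonly TaskCompletionSource<bool> _done =
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
    private int _pending;

    // Completes once every Begin() has been matched by an End().
    public Task Completion
    {
        get { return _done.Task; }
    }

    // Call before queueing an item (for example, before posting a URL to the buffer).
    public void Begin()
    {
        Interlocked.Increment(ref _pending);
    }

    // Call after an item is fully processed, including the queueing of its children.
    public void End()
    {
        if (Interlocked.Decrement(ref _pending) == 0)
        {
            _done.TrySetResult(true);
        }
    }
}
```

In the pipeline above, the crawler could await tracker.Completion and only then call buffer.Complete(), instead of inspecting block internals.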
Why not write one function that yields your tasks as they are created? That way you can just use Task.WhenAll to wait for them to complete. Or have I missed the point? See this working here.
using System;
using System.Threading.Tasks;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
try
{
Task.WhenAll(GetLazilyGeneratedSequenceOfTasks()).Wait();
Console.WriteLine("Finished.");
}
catch (Exception ex)
{
Console.WriteLine(ex);
}
}
public static IEnumerable<Task> GetLazilyGeneratedSequenceOfTasks()
{
var random = new Random();
var finished = false;
while (!finished)
{
var n = random.Next(1, 2001);
if (n < 50)
{
finished = true;
}
if (n > 499)
{
yield return Task.Delay(n);
}
Task.Delay(20).Wait();
}
yield break;
}
}
Alternatively, if your question is not as trivial as my answer may suggest, I'd consider a mesh with TPL Dataflow. The combination of a BufferBlock and an ActionBlock would get you very close to what you need. You could start here.
Either way, I'd suggest you want to include a provision for accepting a CancellationToken or two.

Consuming blocking collection with multiple tasks/consumers

I have the following code, which populates users from a source (simplified below for the sake of example). What I want to do is consume the BlockingCollection with multiple consumers.
Is the code below the right way to do that? Also, what would be the best number of threads? I understand this depends on hardware, memory and so on. Or is there a better way to do it?
Also, would the implementation below ensure that I process everything in the collection until it is empty?
class Program
{
public static readonly BlockingCollection<User> users = new BlockingCollection<User>();
static void Main(string[] args)
{
for (int i = 0; i < 100000; i++)
{
var u = new User {Id = i, Name = "user " + i};
users.Add(u);
}
Run();
}
static void Run()
{
for (int i = 0; i < 100; i++)
{
Task.Factory.StartNew(Process, TaskCreationOptions.LongRunning);
}
}
static void Process()
{
foreach (var user in users.GetConsumingEnumerable())
{
Console.WriteLine(user.Id);
}
}
}
public class User
{
public int Id { get; set; }
public string Name { get; set; }
}
A few small things:
You never call CompleteAdding. Without it, the consuming foreach loops never complete and hang forever. Fix that by calling users.CompleteAdding() after the initial for loop.
You never wait for the work to finish. Run() spins up your 100 threads (which is likely way too many unless your real processing involves a lot of waiting on uncontended resources). Because tasks do not run on foreground threads, they will not keep your program alive when Main exits. You need something like a CountdownEvent to track when everything is done.
You don't start your consumers until after your producer has finished all of its work. You should either spin the producer off onto a separate thread, or start the consumers first so they are ready to work while you populate the collection on the main thread.
Here is an updated version of the code with the fixes:
class Program
{
private const int MaxThreads = 100; // way too high for this example
private static readonly CountdownEvent cde = new CountdownEvent(MaxThreads);
public static readonly BlockingCollection<User> users = new BlockingCollection<User>();
static void Main(string[] args)
{
Run();
for (int i = 0; i < 100000; i++)
{
var u = new User {Id = i, Name = "user " + i};
users.Add(u);
}
users.CompleteAdding();
cde.Wait();
}
static void Run()
{
for (int i = 0; i < MaxThreads; i++)
{
Task.Factory.StartNew(Process, TaskCreationOptions.LongRunning);
}
}
static void Process()
{
foreach (var user in users.GetConsumingEnumerable())
{
Console.WriteLine(user.Id);
}
cde.Signal();
}
}
public class User
{
public int Id { get; set; }
public string Name { get; set; }
}
As for the "best number of threads": like I said earlier, it really depends on what you are waiting on.
If what you are processing is CPU bound, the optimum number of threads is likely Environment.ProcessorCount.
If you are waiting on an external resource, but new requests do not affect old requests (for example, asking 20 different servers for information, where the load on server n does not affect the load on server n+1), I would let Parallel.ForEach choose the number of threads for you.
If you are waiting on a resource that is contended (for example, reading from or writing to a hard disk), you will not want to use many threads at all (perhaps only one). I posted an answer to another question about that: when reading from a hard disk, you should use only one thread at a time so the drive is not jumping around trying to complete all the reads at once.
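For the CPU-bound case, a minimal sketch of capping the worker count at Environment.ProcessorCount could look like this. CpuBoundRunner and ProcessAll are hypothetical names. Leaving out the ParallelOptions argument corresponds to the second case, where Parallel.ForEach picks the degree of parallelism itself.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class CpuBoundRunner
{
    // Runs `work` for every item using at most one worker per logical processor.
    // Returns the number of items processed.
    public static int ProcessAll<T>(IEnumerable<T> items, Action<T> work)
    {
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        int processed = 0;
        Parallel.ForEach(items, options, item =>
        {
            work(item);
            Interlocked.Increment(ref processed); // the counter is shared, so update it atomically
        });
        return processed;
    }
}
```

Parallel.ForEach blocks until every item has been handled, so no CountdownEvent is needed with this approach.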

Locking in a factory method

I am interfacing with a back-end system where I must never have more than one open connection to a given object (identified by its numeric ID), but different consumers may be opening and closing them independently of one another.
Roughly, I have a factory class fragment like this:
private Dictionary<ulong, IFoo> _openItems = new Dictionary<ulong, IFoo>();
private object _locker = new object();
public IFoo Open(ulong id)
{
lock (_locker)
{
if (!_openItems.ContainsKey(id))
{
_openItems[id] = _nativeResource.Open(id);
}
_openItems[id].RefCount++;
return _openItems[id];
}
}
public void Close(ulong id)
{
lock (_locker)
{
if (_openItems.ContainsKey(id))
{
_openItems[id].RefCount--;
if (_openItems[id].RefCount == 0)
{
_nativeResource.Close(id);
_openItems.Remove(id);
}
}
}
}
Now, here is the problem. In my case, _nativeResource.Open is very slow. The locking in here is rather naive and can be very slow when there are a lot of different concurrent .Open calls, even though they are (most likely) referring to different ids and don't overlap, especially if they are not in the _openItems cache.
How do I structure the locking so that I am only preventing concurrent access to a specific ID and not to all callers?
What you may want to look into is a striped locking strategy. The idea is that you share N locks among M items (the possible IDs in your case), and choose a lock such that for any given ID the lock chosen is always the same one. The classic way of choosing a lock is modulo division: divide the ID by N, take the remainder, and use the lock with that index:
// Assuming the allLocks class member is defined as follows:
private static AutoResetEvent[] allLocks = new AutoResetEvent[10];
// And initialized thus (in a static constructor):
for (int i = 0; i < 10; i++) {
allLocks[i] = new AutoResetEvent(true);
}
// Your method becomes
var lockIndex = id % allLocks.Length;
var lockToUse = allLocks[lockIndex];
// Wait for the lock to become free
lockToUse.WaitOne();
try {
// At this point we have taken the lock
// Do the work
} finally {
lockToUse.Set();
}
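Applied to the Open/Close factory from the question, the striping idea might be sketched as follows. This is an illustration under assumptions rather than a drop-in replacement: StripedResourceCache is a hypothetical name, the open and close delegates stand in for _nativeResource.Open and _nativeResource.Close, the reference count is kept in a private Entry instead of on IFoo, and plain lock objects are used in place of AutoResetEvent.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical ref-counted cache that serializes callers per stripe, not globally.
public sealed class StripedResourceCache<T>
{
    private sealed class Entry
    {
        public T Resource;
        public int RefCount;
    }

    private readonly object[] _stripes;
    private readonly ConcurrentDictionary<ulong, Entry> _open =
        new ConcurrentDictionary<ulong, Entry>();
    private readonly Func<ulong, T> _openResource;
    private readonly Action<ulong> _closeResource;

    public StripedResourceCache(Func<ulong, T> open, Action<ulong> close, int stripeCount = 16)
    {
        _openResource = open;
        _closeResource = close;
        _stripes = new object[stripeCount];
        for (int i = 0; i < stripeCount; i++) _stripes[i] = new object();
    }

    // The same ID always maps to the same lock object.
    private object StripeFor(ulong id)
    {
        return _stripes[(int)(id % (ulong)_stripes.Length)];
    }

    public T Open(ulong id)
    {
        lock (StripeFor(id))
        {
            Entry entry;
            if (!_open.TryGetValue(id, out entry))
            {
                // Slow call: blocks only callers whose IDs share this stripe.
                entry = new Entry { Resource = _openResource(id) };
                _open[id] = entry;
            }
            entry.RefCount++;
            return entry.Resource;
        }
    }

    public void Close(ulong id)
    {
        lock (StripeFor(id))
        {
            Entry entry;
            if (_open.TryGetValue(id, out entry) && --entry.RefCount == 0)
            {
                _closeResource(id);
                _open.TryRemove(id, out entry);
            }
        }
    }
}
```

Every operation on a given ID runs under the same stripe lock, so the check, open and ref-count update stay atomic for that ID, while the ConcurrentDictionary keeps the shared map safe when different stripes add or remove entries at the same time.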
If you are on .NET 4, you could try ConcurrentDictionary, with something along these lines:
private ConcurrentDictionary<ulong, IFoo> openItems = new ConcurrentDictionary<ulong, IFoo>();
private object locker = new object();
public IFoo Open(ulong id)
{
var foo = this.openItems.GetOrAdd(id, x => nativeResource.Open(x));
lock (this.locker)
{
foo.RefCount++;
}
return foo;
}
public void Close(ulong id)
{
IFoo foo = null;
if (this.openItems.TryGetValue(id, out foo))
{
lock (this.locker)
{
foo.RefCount--;
if (foo.RefCount == 0)
{
if (this.openItems.TryRemove(id, out foo))
{
this.nativeResource.Close(id);
}
}
}
}
}
If anyone can see any glaring issues with that, please let me know!
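One point worth checking in this approach: ConcurrentDictionary.GetOrAdd does not run the value factory under a lock, so when several threads request the same missing ID at once the factory can be invoked more than once, which would open the same object twice. A common way around that is to store Lazy<T> values, as in the sketch below. OpenOnceCache is a hypothetical name and the open delegate stands in for nativeResource.Open. The sketch covers only the open-once part and deliberately leaves out reference counting and removal.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical cache showing the Lazy<T> pattern with ConcurrentDictionary.
public sealed class OpenOnceCache<T>
{
    private readonly ConcurrentDictionary<ulong, Lazy<T>> _items =
        new ConcurrentDictionary<ulong, Lazy<T>>();
    private readonly Func<ulong, T> _open;

    public OpenOnceCache(Func<ulong, T> open)
    {
        _open = open;
    }

    public T GetOrOpen(ulong id)
    {
        // GetOrAdd may build several Lazy wrappers under contention, but only the
        // one that ends up stored is ever asked for its Value, so _open runs once
        // for as long as that entry stays in the dictionary.
        Lazy<T> lazy = _items.GetOrAdd(id, key => new Lazy<T>(() => _open(key)));
        return lazy.Value;
    }
}
```

Creating a Lazy<T> wrapper is cheap, so discarding the losing wrappers costs little, and callers that lose the race simply block on Value until the single open call returns.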

working with threads - add to collection

List<int> data=new List<int>();
foreach(int id in ids){
var myThread=new Thread(new ThreadStart(Work));
myThread.Start(id);
}
Work(){
}
The Work method does some processing on the received id and then adds the result to the data list. How can I add data to the collection from each thread? What would my code look like? Thanks.
If you're using .NET 4, I strongly suggest you use Parallel Extensions instead. For example:
var list = ids.AsParallel()
.Select(Work)
.ToList();
where Work is:
public int Work(int id)
{
...
}
so that it can receive the id appropriately. If you're not keen on the method conversion, you could add a lambda expression:
var list = ids.AsParallel()
.Select(id => Work(id))
.ToList();
Either way, this will avoid creating more threads than you really need, and deal with the thread safety side of things without you having to manage the locks yourself.
First of all, you need to protect access to the list from multiple threads with a lock. Second, you need to pass the parameter to your thread (or use a lambda, which can capture the local variable; beware that a captured loop variable can change value during the loop, so you ought to make a local copy).
object collectionLock = new object();
List<int> data = new List<int>();
foreach (int id in ids)
{
Thread t = new Thread(Worker);
t.Start(id);
}
void Worker(object o)
{
int id = (int)o;
lock(collectionLock)
{
data.Add(id);
}
}
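A fuller sketch of the same idea, using the lambda variant with a local copy of the loop variable and waiting for all threads before the list is read, might look like this. ThreadedCollector and Collect are hypothetical names, and the work delegate stands in for whatever processing Work does.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ThreadedCollector
{
    // Runs `work` on one thread per ID and returns all results (order is not guaranteed).
    public static List<int> Collect(IEnumerable<int> ids, Func<int, int> work)
    {
        var data = new List<int>();
        var collectionLock = new object();
        var threads = new List<Thread>();

        foreach (int id in ids)
        {
            int localId = id; // private copy so the lambda never sees a changed loop variable
            var worker = new Thread(() =>
            {
                int result = work(localId); // do the processing outside the lock
                lock (collectionLock)
                {
                    data.Add(result); // only the shared list access is serialized
                }
            });
            threads.Add(worker);
            worker.Start();
        }

        // Wait for every worker before handing the list back.
        foreach (Thread thread in threads)
        {
            thread.Join();
        }
        return data;
    }
}
```

Keeping the call to work outside the lock means the threads only queue up for the brief Add, not for the processing itself.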
You can pass data to threads and retrieve results from them using callbacks. See the MSDN article.
Example:
public class SomeClass
{
public static List<int> data = new List<int>();
public static readonly object obj = new object();
public void SomeMethod(int[] ids)
{
foreach (int id in ids)
{
Work w = new Work();
w.Data = id;
w.callback = ResultCallback;
var myThread = new Thread(new ThreadStart(w.DoWork));
myThread.Start();
}
}
public static void ResultCallback(int d)
{
lock (obj)
{
data.Add(d);
}
}
}
public delegate void ExampleCallback(int data);
class Work
{
public int Data { get; set; }
public ExampleCallback callback;
public void DoWork()
{
Console.WriteLine("Instance thread procedure. Data={0}", Data);
if (callback != null)
callback(Data);
}
}
