Prevent or forbid use of Parallel.ForEach

I wrote a library on top of System.Data.SQLite and realized it (my library) has unpredictable behaviour when using Parallel.ForEach. I might debug this eventually (i.e. if I get/take the time), most likely by locking the right parts, but for now let's say I want to just prevent usage of Parallel.ForEach, or force usage of my library to allow (or result in) only a single thread, how would I proceed?

You can't control how your API is consumed by external code. If it's something you absolutely cannot address prior to release, it would be a good idea to be very explicit about failure cases in your documentation (both XML comments and any sort of "help file").

A few quick [ThreadStatic] attributes might solve your concurrency problem, but this smells like the tip of a much larger iceberg. Fix the root cause, not the symptom.
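Although a library cannot forbid callers from using Parallel.ForEach, it can serialize its own entry points so that only one thread runs library code at a time, which is what the question asks for as a stopgap. The following is a minimal sketch under that assumption. SerializedGate and GateDemo are hypothetical names for illustration and are not part of System.Data.SQLite.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical wrapper: every call into the non-thread-safe library goes
// through one lock, so at most one thread runs library code at a time.
public sealed class SerializedGate
{
    private readonly object _sync = new object();

    public T Run<T>(Func<T> work)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));
        lock (_sync)
        {
            return work();
        }
    }

    public void Run(Action work)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));
        lock (_sync)
        {
            work();
        }
    }
}

public static class GateDemo
{
    // Increments a plain int from many threads. The gate makes the
    // read-modify-write safe, so the result equals the iteration count.
    public static int CountWithGate(int iterations)
    {
        var gate = new SerializedGate();
        int counter = 0;
        Parallel.For(0, iterations, _ => gate.Run(() => { counter++; }));
        return counter;
    }
}
```

This trades away parallel speed-up inside the library for predictable behaviour. Because lock is reentrant on the same thread, nested calls through the gate do not deadlock.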

Related

AccessViolationException when accessing variable solution value

We've been using OR-Tools to solve linear optimizations in a real-time .NET application; that is, we solve linear optimizations regularly with different inputs as time progresses.
Recently we ran into an issue that we haven't seen before while running our application on a server for extended periods of time, in which seemingly random attempts to solve the optimizations were causing AccessViolationExceptions. Specifically,
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.AccessViolationException
at Google.OrTools.LinearSolver.operations_research_linear_solverPINVOKE.Variable_SolutionValue(System.Runtime.InteropServices.HandleRef)
...
I'm trying to find out more specifically where this is happening in the pipeline, but given the output there I believe it is a section in which we are trying to retrieve the individual variable solution values out of the solver after solving the optimization.
We are using a wide variety of constraints over a decent sized number of variables.
Has anyone seen this before?

Reference github issue link
After some testing we found that what appears to have been happening is that the garbage collector was collecting some of the Variables we were using during the P/Invoke, as per this.
Unfortunately, this seems to be a side effect of the way that SWIG creates its .NET wrappers and their IDisposable implementations, using HandleRefs instead of something like SafeHandles, which 'handle' this as per the documentation:
Platform invoke operations automatically increment the reference count of handles encapsulated by a SafeHandle and decrement them upon completion. This ensures that the handle will not be recycled or closed unexpectedly.
More information here.
Without wanting to get into the business of creating our own SWIG typemap or compiling a new version of SWIG, we used the mechanism .NET provides for keeping objects 'alive' with regard to the garbage collector: calling GC.KeepAlive, at the end of the optimization procedure, on all of the objects whose values we access via P/Invoke (in our case the Solver and our Variables). This keeps the garbage collector from treating them as collectible at any point before the KeepAlive call, and the call has no other side effects (as per its documentation).
Preliminary testing has shown this to work, though given that the crash was only intermittent to begin with, we'll keep watching for it.
Going forward, the best option is probably either to ask SWIG to use SafeHandles (this has been discussed before and is still an open issue) or to change the typemap to use SafeHandles directly. I may try investigating the latter option myself, but because this fix ended up adding only 3 lines of code (plus a host of comments) to our code base for what seems like a full fix, it's going to be low priority for me. That said, a fix for this would be nice in an upcoming version.
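The placement of GC.KeepAlive can be illustrated with a small sketch that does not depend on OR-Tools. NativeWrapper is a hypothetical stand-in for a SWIG-generated wrapper, and the forced GC.Collect stands in for a collection that happens to run while native code is still using the handle. These names are assumptions, not the OR-Tools API.

```csharp
using System;

// Hypothetical stand-in for a SWIG-generated wrapper that owns a native handle.
public sealed class NativeWrapper
{
    // Placeholder value, not a real native pointer.
    public IntPtr Handle { get; } = new IntPtr(42);
}

public static class KeepAliveDemo
{
    // Returns true when the wrapper was still reachable after a forced collection.
    public static bool WrapperSurvivesCollection()
    {
        var wrapper = new NativeWrapper();
        var weak = new WeakReference(wrapper);

        // Stand-in for the window in which native code would use wrapper.Handle
        // while no later managed code touches 'wrapper'.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        bool alive = weak.IsAlive;

        // Keeps 'wrapper' reachable from the start of the method up to this line.
        GC.KeepAlive(wrapper);
        return alive;
    }
}
```

The call itself does nothing at run time. Its only effect is that the object cannot be collected before the line on which it appears, which is why it goes after the last P/Invoke that uses the handle.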

Threading not possible suggest any other way to do parallel processing

My C# code, which I tried to run on multiple threads, is not thread safe because it involves lots of database connections and stored procedure executions. I still want to reduce the execution time. Can anyone suggest something for parallel or asynchronous processing, from either the database side or the .NET side?
I am stuck with threading and have not been able to apply it.
Thanks

There is not nearly enough information in your question to suggest an optimized solution. However, if you must deal with resources that are not thread safe, one strategy is to spawn separate processes to handle sub-tasks.
Keep in mind, though, that separate processes still would have to handle portions of the overall solution that do not step on each other.

Well, there are a couple of things that can be done, though I don't know how well they fit your case.
On the database side you can do query optimization, indexing and other things that may help reduce the query run time. Use profilers to analyse, and look at the query plans to check whether the indexes are properly used.
Use the NOLOCK hint, but only in selects where you are sure you can read without locks. (Not always good practice, but it is used.)
Implement properly synchronized threading that can process multiple requests. There is no other way from the .NET side; you have to review your design properly. You can also optimize the code itself using a profiler, and you can use the Task Parallel Library as well.
Another thing: there could be an issue with your server. Check CPU and memory utilization.
(This is the least of my concerns.)

You should explain better what you are doing in your code...
If you do a lot of loops, you should try Parallel.For / Parallel.ForEach
http://msdn.microsoft.com/en-us/library/system.threading.tasks.parallel.aspx
With the Parallel class you can also queue tasks for ordered computation or divide loops into blocks, which could improve overall performance...
The best I can say, with so little information.
Hope it helps.

How do i get a reference to all managed threads

I know there's nothing in the box ... but does anyone have any tricks.
Managed threads not OS threads please.
Cheers
Answering the comments:
Version is .NET 3.5.
I want all managed threads in the current running process.
I want them so I can get the call stack of every thread.
Thanks

I suspect that anything at this level would be done with the debugging hooks outside of managed code. By design, it isn't really geared up to let you do that. Of course, you could just use any existing debugger, etc (even just windbg/sos).
For your own threads, simply store away a reference when you create them. But of course, don't do this as a mechanism to abort them etc.; there are much better (i.e. workable) ways of doing that with things like Monitor, Mutex, etc.
Of course, if you don't mind stepping outside of managed code I'm sure there are options...

How to prove that multithreading is working?

How can I prove that multithreading is working in my C# programs? This is for a testing requirement. For example, I'm going to have to add some locking to a logger class (yes I know, I shouldn't have written my own logging class), and I need a test case to prove that the change will work.

If you want to test that your locking code is correctly synchronizing access to your log(s), you need to construct tests that guarantee contention. This may require you to refactor your code so that you can inject a mock log writer class that can hold the log's lock for arbitrary periods of time.
This is a broad topic, and you can find several related questions on StackOverflow, which are all worth reading:
How do I perform a Unit Test using threads?
How to write an automated test for thread safety
What are some strategies to unit test a scheduler?
Unit testing a multithreaded application?
Should I unit test for multithreading problems before writing any lock? (.NET C# TDD)
CHESS is a framework under development for identifying "assertion violations, deadlocks, livelocks, data-races, and memory-model errors." I haven't actually used this, but it looks like it might be extremely helpful.

Well, this may sound wrong but the truth is you can't prove multi-threaded behavior with unit-tests. Having said that, you can gain some confidence in the code with testing and over time it might actually present an issue.
<rant>
Multi-threaded code is the bane of my existence in many a project. Often people/developers do not have the expertise required to do a good job. Bugs often go unnoticed for long periods of time before anyone sees them in the wild, and then you can't reproduce the issue to identify what's going on. Further, attempting to 'fix' broken multi-threaded code via debugging is often not a viable approach.
</rant>
Anyway, go ahead and test it; there is no harm in doing that much and it's easy enough to do. Just fire up N threads, have them all wait on a ManualResetEvent, and then call your API in a tight loop a couple of hundred thousand times :). But first I would recommend that everyone on your team do a code review. Walk every line of code thinking about it executing in parallel. Ask yourself:
Do I really need this lock()?
What's the least amount of code that MUST be in the lock()?
Can I make this object/state immutable and avoid the locking?
Is there any way for a caller to have code execute inside the lock?
Review all members accessed and changed inside a lock for 'volatile'?
Are you using System.Threading.Thread.MemoryBarrier correctly?
If multiple locks are involved are they always obtained in the same order?
[wiki add here]
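The stress test described in this answer (N threads parked on a ManualResetEvent, then released to call the API in a tight loop) can be sketched as follows. CountingLogger is a hypothetical stand-in for the logger class from the question, and its line counter stands in for the real log output.

```csharp
using System;
using System.Threading;

// Hypothetical logger under test; the lock is the change being verified.
public sealed class CountingLogger
{
    private readonly object _sync = new object();
    private int _lines;

    public int Lines
    {
        get { lock (_sync) { return _lines; } }
    }

    public void Write(string message)
    {
        lock (_sync)
        {
            _lines++; // stands in for appending 'message' to the real log
        }
    }
}

public static class StressTest
{
    // Returns the number of lines the logger recorded.
    public static int Run(int threadCount, int callsPerThread)
    {
        var logger = new CountingLogger();
        using (var start = new ManualResetEvent(false))
        {
            var threads = new Thread[threadCount];
            for (int i = 0; i < threadCount; i++)
            {
                threads[i] = new Thread(() =>
                {
                    start.WaitOne(); // park until the gate opens
                    for (int n = 0; n < callsPerThread; n++)
                    {
                        logger.Write("x");
                    }
                });
                threads[i].Start();
            }

            start.Set(); // release all threads at once
            foreach (var t in threads) t.Join();
        }
        return logger.Lines;
    }
}
```

If the lock were missing, StressTest.Run(8, 20000) would usually return less than 160000 because increments get lost. A passing run raises confidence, but it does not prove that the code is thread safe.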

You just can't :) It all depends on timing and it might blow up at any time. You have to mentally check every possible situation and that is the only way to go. That's why a lot of developers think multithreading is impossible to get right.

I have actually found Thread.Sleep() to be very useful to simulate various different race conditions. However, for obvious reasons you need to ensure that you either remove (or use configuration to disable) the Thread.Sleep before deploying to production.
In Robert C. Martin's book "Clean Code", he recommends using "jiggling strategies" in your unit tests to ferret out multi-threading issues. "Jiggling" involves adding random wait times to your code so that threads run in a different order at different times. You can then run your unit tests many times and your jiggling may root out some flaws. The important thing is NOT to ignore any unit test failures involving multithreading just because they pass the next time you run the test.

You actually can't. You can, however, write some debug-time code (start of routine, end of routine, special actions routine takes, ...) that writes to a console, so you can see that routines run at the same time.

Thread.Sleep. If you're suffering from race conditions with multithreading, a well-placed Thread.Sleep can widen the race window, making the problem easier to reproduce. Change
WriteA();
// potential race condition as another bit of code might read all the state
// and only get A in their read.
WriteB();
to
WriteA();
Thread.Sleep(60000);
WriteB();
Then you can write code that reproduces the problem. Then you can write code that fixes the problem. Then you can assert that your fix works. Profit!
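As a rough sketch of that workflow, the class below widens the window between the two writes with Thread.Sleep and lets a reader check whether it saw both fields in agreement. PairState and RaceDemo are hypothetical names, and the delays are arbitrary values chosen for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical two-field state; A and B are meant to change together.
public sealed class PairState
{
    private readonly object _sync = new object();
    private int _a, _b;

    // 'delayMs' is the artificial Thread.Sleep that widens the race window.
    public void WriteBoth(int value, int delayMs, bool useLock)
    {
        if (useLock) { lock (_sync) { WriteCore(value, delayMs); } }
        else { WriteCore(value, delayMs); }
    }

    private void WriteCore(int value, int delayMs)
    {
        _a = value;              // WriteA
        Thread.Sleep(delayMs);   // widened window
        _b = value;              // WriteB
    }

    // Returns true when the reader saw A and B in agreement.
    public bool ReadIsConsistent(bool useLock)
    {
        if (useLock) { lock (_sync) { return _a == _b; } }
        return Volatile.Read(ref _a) == Volatile.Read(ref _b);
    }
}

public static class RaceDemo
{
    // Starts a writer, reads once inside the widened window,
    // and reports what the reader saw.
    public static bool ReaderSeesConsistentState(bool useLock)
    {
        var state = new PairState();
        var writer = new Thread(() => state.WriteBoth(1, 500, useLock));
        writer.Start();
        Thread.Sleep(100); // give the writer time to get past WriteA
        bool consistent = state.ReadIsConsistent(useLock);
        writer.Join();
        return consistent;
    }
}
```

With useLock set to false the reader will normally land between the two writes and report an inconsistent state, which reproduces the bug. With useLock set to true the reader waits for the writer and always sees matching values, which is the assertion to keep once the fix is in.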

Another thread posted a related answer, using Microsoft's CHESS program.

Concurrency issues while accessing data via reflection in C#

I'm currently writing a library that can be used to show the internal state of some running code (mainly fields and properties both public and private). Objects are accessed in a different thread to put their info into a window for the user to see. The problem is, there are times while I'm walking a long IList in which its structure may change. Some piece of code in the program being 'watched' may add a new item, or even worse, remove some. This of course causes the whole thing to crash.
I've come up with some ideas but I'm afraid they're not quite correct:
Locking the list being accessed while I'm walking it. I'm not sure if this would work since the IList being used may have not been locked for writing at the other side.
Let the code being watched to be aware of my existence and provide some interfaces to allow for synchronization. (I'd really like it to be totally transparent though).
As a last resort, put every read access into a try/catch block and pretend as if nothing happened when it throws. (Can't think of an uglier solution that actually works).
Thanks in advance.

The only way you're going to keep things "transparent" to the code being monitored is to make the monitoring code robust in the face of state changes.
Some suggestions
Don't walk a shared list - make a copy of the list into a local List instance as soon (and as fast) as you can. Once you have a local (non-shared) list of instances, no one can monkey with the list.
Make things as robust as you can - putting every read into a try/catch might feel nasty, but you'll probably need to do it.
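The two suggestions combine naturally: copy the shared list quickly, and treat a failure during the copy as a signal to retry. The following is a minimal sketch assuming the watched collection is exposed as a non-generic IList. WatcherSnapshot and TryCopy are hypothetical names, and the two exception types caught are the ones most commonly thrown by List<T> and ArrayList when they change mid-read; other collections may throw something else.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class WatcherSnapshot
{
    // Copies 'source' into a private list. Returns false when the list
    // kept changing underneath us on every attempt.
    public static bool TryCopy(IList source, int maxAttempts, out List<object> snapshot)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            try
            {
                var copy = new List<object>(source.Count);
                for (int i = 0; i < source.Count; i++)
                {
                    copy.Add(source[i]);
                }
                snapshot = copy;
                return true;
            }
            catch (ArgumentOutOfRangeException)
            {
                // An item was removed between the Count check and the indexer; retry.
            }
            catch (InvalidOperationException)
            {
                // Some collections report concurrent modification this way; retry.
            }
        }
        snapshot = new List<object>();
        return false;
    }
}
```

A successful copy can still be slightly stale or reflect a list caught mid-update, since the watched code takes no lock. For a display-only watcher that is usually acceptable.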

Option number 3 may feel ugly, but this looks to be similar to the approach the Visual Studio watch windows use, and I would choose that approach.
In Visual Studio, you can often set a watch on some list or collection and at a later point notice the watch simply displays an exception when it can't evaluate a certain value due to user or code state changes.
This is the most robust approach when dealing with such an open ended range of possibilities. The fact is, if your watching code is designed to support as many scenarios as possible you will not be able to think of all situations in advance. Handling and presenting exceptions nicely is the next best approach.
By the way, someone else mentioned that locking your data structures will work. This is not true if the "other code" is not also using locks for synchronization. In fact both pieces of code must lock the same synchronization object, very unlikely if you don't control the other code. (I think you mention this in your question, so I agree.)

While I like Bevan's idea of copying the list for local read access, if the list is particularly large, that may not be a truly viable option.
If you really need seamless, transparent, concurrent access to these lists, you should look into the Parallel Extensions for .NET library. It is currently available for .NET 2.0 through 3.5 as a CTP. The extensions will be officially included in .NET 4.0 along with some additional collections. I think you would be interested in the BlockingCollection from the CTP, which would give you that transparent concurrent access you need. There is obviously a performance hit as with any threaded stuff that involves synchronization, however these collections are fairly well optimized.

As I understand it, you don't want to have ANY dependency on or requirement of the code being watched, or to enforce any constraints on how that code is written.
Although this is my favourite approach to coding a "watcher", it exposes your application to a very broad range of code and behaviours, any of which can cause it to crash.
So, as said before me, my advice is to make the watcher "robust" as the first step. You should be prepared for anything going wrong anywhere in your code, because given the "transparency" requirement, many things can potentially go wrong! (Be careful where you put your try/catch; entering and leaving the try block many times can have a visible performance impact.)
When you're done making your code robust, the next steps would be making it more usable and dodging the situations that can cause exceptions, like the "list" thing you mentioned. For example, you can check the watched object and, if it's a list and it's not too long, first make a quick copy of it and then do the rest on the copy. This way you eliminate a large share of the situations that can make your code throw.

Locking the list will work, because it is being modified, as you've observed via the crashing :)
Seems to me, though, that I'd avoid locking (because it seems that your thread is only the 'watcher' and shouldn't really interrupt).
On this basis, I would just try and handle the cases where you determine missing things. Is this not possible?
