How to prove that multithreading is working? - c#

How can I prove that multithreading is working in my C# programs? This is for a testing requirement. For example, I'm going to have to add some locking to a logger class (yes I know, I shouldn't have written my own logging class), and I need a test case to prove that the change will work.

If you want to test that your locking code is correctly synchronizing access to your log(s), you need to construct tests that guarantee contention. This may require you to refactor your code so that you can inject a mock log writer class that can hold the log's lock for arbitrary periods of time.
This is a broad topic, and you can find several related questions on StackOverflow, which are all worth reading:
How do I perform a Unit Test using threads?
How to write an automated test for thread safety
What are some strategies to unit test a scheduler?
Unit testing a multithreaded application?
Should I unit test for multithreading problems before writing any lock? (.NET C# TDD)
CHESS is a framework under development for identifying "assertion violations, deadlocks, livelocks, data-races, and memory-model errors." I haven't actually used this, but it looks like it might be extremely helpful.

Well, this may sound wrong, but the truth is you can't prove multi-threaded behavior with unit tests. Having said that, you can gain some confidence in the code with testing, and over time it might actually surface an issue.
<rant>
Multi-threaded code is the bane of my existence in many a project. Often people/developers do not have the expertise required to do a good job. Bugs often go unnoticed for long periods of time before anyone sees them in the wild, and then you can't reproduce the issue to identify what's going on. Further, attempting to 'fix' broken multi-threaded code via debugging is often not a viable approach.
</rant>
Anyway, go ahead and test it; there is no harm in doing that much and it's easy enough to do. Just fire up N threads, have them all wait on a ManualResetEvent, and then call your API in a tight loop a couple of hundred-thousand times :). But first I would recommend everyone on your team do a code review. Walk every line of code thinking about it executing in parallel. Ask yourself:
Do I really need this lock()?
What's the least amount of code that MUST be in the lock()?
Can I make this object/state immutable and avoid the locking?
Is there any way for a caller to have code execute inside the lock?
Have all members accessed and changed inside a lock() been reviewed for 'volatile'?
Are you using System.Threading.Thread.MemoryBarrier correctly?
If multiple locks are involved are they always obtained in the same order?
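The stress test sketched above (N threads parked on a ManualResetEvent, then hammering the API in a tight loop) might look like the following; the Logger class here is a stand-in for your own logging class, not a real API:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Stand-in for the class under test; swap in your own logger.
class Logger
{
    private readonly object _sync = new object();
    private int _count;

    public int Count { get { lock (_sync) return _count; } }

    public void Log(string message)
    {
        lock (_sync) { _count++; }  // remove the lock and the test should fail
    }
}

static class StressTest
{
    // Fire up N threads, park them on a ManualResetEvent so they start
    // at (nearly) the same instant, then hammer the API in a tight loop.
    public static int Run(int threads, int iterations)
    {
        var logger = new Logger();
        var go = new ManualResetEvent(false);
        var workers = new List<Thread>();

        for (int i = 0; i < threads; i++)
        {
            var t = new Thread(() =>
            {
                go.WaitOne();                 // maximize contention
                for (int j = 0; j < iterations; j++)
                    logger.Log("x");
            });
            t.Start();
            workers.Add(t);
        }

        go.Set();                             // release all threads at once
        foreach (var t in workers) t.Join();
        return logger.Count;                  // expect threads * iterations
    }
}
```

If the lock inside Log is removed, the final count will usually come out short, which is exactly the kind of failure this test is fishing for.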

You just can't :) It all depends on timing, and it might blow up at any time. You have to mentally check every possible interleaving, and that is the only way to go. That's why a lot of developers think multithreading is impossible to get right.

I have actually found Thread.Sleep() to be very useful to simulate various different race conditions. However, for obvious reasons you need to ensure that you either remove (or use configuration to disable) the Thread.Sleep before deploying to production.
In Robert C. Martin's book "Clean Code", he recommends using "jiggling strategies" in your unit tests to ferret out multi-threading issues. "Jiggling" involves adding random wait times to your code so that threads run in a different order at different times. You can then run your unit tests many times, and the jiggling may root out some flaws. The important thing is NOT to ignore a unit-test failure involving multithreading just because it passes the next time you run the test.
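A minimal sketch of a jiggle helper in the spirit of Clean Code; the ThreadJiggler name and the exact mix of yields and sleeps are my own invention, not Martin's:

```csharp
using System;
using System.Threading;

// Hypothetical jiggle helper: sprinkle Jiggle() calls between statements
// you suspect can interleave badly, then run the tests many times.
// The Conditional attribute compiles the calls away outside DEBUG builds,
// so production code is unaffected.
static class ThreadJiggler
{
    private static readonly Random Rng = new Random();
    private static readonly object RngLock = new object();  // Random is not thread safe

    [System.Diagnostics.Conditional("DEBUG")]
    public static void Jiggle()
    {
        int action, ms;
        lock (RngLock) { action = Rng.Next(3); ms = Rng.Next(1, 5); }
        switch (action)
        {
            case 0: Thread.Yield(); break;    // give up the rest of the time slice
            case 1: Thread.Sleep(ms); break;  // short random nap
            default: break;                   // sometimes do nothing at all
        }
    }
}
```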

You actually can't. You can, however, write some debug-time code (start of routine, end of routine, special actions the routine takes, ...) that writes to a console, so you can see that routines run at the same time.

Thread.Sleep. If you're suffering from race conditions in multithreaded code, a well-placed Thread.Sleep can widen the race window, making it easier to reproduce. Change
WriteA();
// potential race condition as another bit of code might read all the state
// and only get A in their read.
WriteB();
to
WriteA();
Thread.Sleep(60000);
WriteB();
Then you can write code that reproduces the problem. Then you can write code that fixes the problem. Then you can assert that your fix works. Profit!
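A runnable illustration of the idea, with the 60-second sleep scaled down; the SharedState class and the reader/writer threads are invented for this demo:

```csharp
using System;
using System.Threading;

class SharedState
{
    public volatile int A;
    public volatile int B;

    // Non-atomic two-field write; the sleep widens the window in which
    // a reader can observe A updated while B is still stale.
    public void WriteBoth(int value, int widenMs)
    {
        A = value;
        Thread.Sleep(widenMs);   // the scaled-down "Thread.Sleep(60000)"
        B = value;
    }
}

static class RaceDemo
{
    // Returns true if the reader ever saw the torn state (A == 1, B == 0).
    public static bool ReaderSawTornState()
    {
        var s = new SharedState();
        bool torn = false;

        var reader = new Thread(() =>
        {
            while (s.B == 0)
                if (s.A == 1 && s.B == 0) torn = true;
        });
        var writer = new Thread(() => s.WriteBoth(1, 500));

        reader.Start();
        writer.Start();
        writer.Join();
        reader.Join();
        return torn;
    }
}
```

With the widening sleep in place the torn read shows up essentially every run; remove it and you may need thousands of iterations to see it once.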

Another thread posted a related answer, using Microsoft's CHESS program.

Related

Threading not possible suggest any other way to do parallel processing

My C# code that I run in threads is not thread safe, as it involves lots of database connections and stored-procedure executions, but I still want to minimize the execution time. Can anyone suggest something for parallel or asynchronous processing, either on the database side or on the .NET side?
I am stuck with threading and not able to apply it.
Thanks
There is not nearly enough information in your question to suggest an optimized solution. However, if you must deal with resources that are not thread safe, one strategy is to spawn separate processes to handle sub-tasks.
Keep in mind, though, that separate processes still would have to handle portions of the overall solution that do not step on each other.
Well, there are a couple of things that can be done, although I don't know how well they fit your situation.
On the database side you can do query optimization, indexing, and other things that may reduce query run time. Use profilers to analyse, and check query plans to see whether the indexes are actually used.
Use NOLOCK hints, but only on SELECTs where you are sure you can tolerate them. (Not always good practice, but it is used.)
Implement properly synchronized threading that can process multiple requests; there is no other way from the .NET side. You have to review your design properly. You can also do code optimization using a profiler, and the Task library can help as well.
Finally, there could be an issue with your server: check CPU and memory utilization. (This is the least of my concerns.)
You should explain better what you are doing in your code...
If you do a lot of loops, you should try Parallel.For / Parallel.ForEach
http://msdn.microsoft.com/en-us/library/system.threading.tasks.parallel.aspx
With the Parallel class you can also queue tasks for ordered computation or divide loops into blocks, which could improve overall performance...
That's the best I can say with so little information.
Hope it helps.
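A minimal sketch of the Parallel.ForEach suggestion, assuming the work items are independent and each one creates its own (non-shared) connection; runProcedure here stands in for the real database call:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class ParallelWork
{
    // Processes independent items concurrently. The trick for
    // non-thread-safe resources (connections, commands) is to create
    // one per item inside the loop body instead of sharing an instance.
    public static List<int> ProcessAll(IEnumerable<int> ids, Func<int, int> runProcedure)
    {
        var results = new ConcurrentBag<int>();  // thread-safe collector
        Parallel.ForEach(ids, id =>
        {
            // e.g. open a fresh SqlConnection here, execute the stored
            // procedure, and dispose the connection before returning.
            results.Add(runProcedure(id));
        });
        return results.OrderBy(x => x).ToList();
    }
}
```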

What is non-thread-safety for?

There are a lot of articles and discussions explaining why it is good to build thread-safe classes. It is said that if multiple threads access, e.g., a field at the same time, there can be bad consequences. So, what is the point of keeping non-thread-safe code? I'm focusing mostly on .NET, but I believe the main reasons are not language-dependent.
E.g. .NET static fields are not thread-safe. What would be the result if they were thread-safe by default? (without a need to perform "manual" locking). What are the benefits of using (actually defaulting to) non-thread-safety?
One thing that comes to my mind is performance (more of a guess, though). It's rather intuitive that, when a function or field doesn't need to be thread-safe, it shouldn't be. However, the question is: what for? Is thread-safety just an additional amount of code you always need to implement? In what scenarios can I be 100% sure that e.g. a field won't be used by two threads at once?
Writing thread-safe code:
Requires more skilled developers
Is harder and consumes more coding efforts
Is harder to test and debug
Usually has bigger performance cost
But! Thread-safe code is not always needed. If you can be sure that some piece of code will be accessed by only one thread, the list above becomes pure, unnecessary overhead. It is like renting a van to go to a neighboring city when there are only two of you and not much luggage.
Thread safety comes with costs - you need to lock fields that might cause problems if accessed simultaneously.
In applications that make no use of threads, but need high performance where every CPU cycle counts, there is no reason to have thread-safe classes.
So, what is the point of keeping non thread-safe code?
Cost. Like you assumed, there usually is a penalty in performance.
Also, writing thread-safe code is more difficult and time consuming.
Thread safety is not a "yes" or "no" proposition. The meaning of "thread safety" depends upon context; does it mean "concurrent-read safe, concurrent write unsafe"? Does it mean that the application just might return stale data instead of crashing? There are many things that it can mean.
The main reason not to make a class "thread safe" is the cost. If the type won't be accessed by multiple threads, there's no advantage to putting in the work and increasing the maintenance cost.
Writing threadsafe code is painfully difficult at times. For example, simple lazy loading requires two checks for '== null' and a lock. It's really easy to screw up.
[EDIT]
I didn't mean to suggest that threaded lazy loading was particularly difficult, it's the "Oh and I didn't remember to lock that first!" moments that come fast and hard once you think you're done with the locking that are really the challenge.
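The "two checks for '== null' and a lock" pattern mentioned above looks like this; since .NET 4, Lazy&lt;T&gt; packages the same idea so you don't have to hand-roll it:

```csharp
using System;

class ExpensiveService
{
    public static int ConstructionCount;          // for demonstration only
    public ExpensiveService() { ConstructionCount++; }
}

static class Holder
{
    private static readonly object Sync = new object();
    private static volatile ExpensiveService _instance;

    // Classic double-checked locking: check, lock, check again.
    // Forgetting the second check (or the volatile) is the easy screw-up.
    public static ExpensiveService Instance
    {
        get
        {
            if (_instance == null)                // first check, no lock
            {
                lock (Sync)
                {
                    if (_instance == null)        // second check, under lock
                        _instance = new ExpensiveService();
                }
            }
            return _instance;
        }
    }
}

// The modern shortcut: Lazy<T> is thread safe by default.
static class ModernHolder
{
    public static readonly Lazy<ExpensiveService> Instance =
        new Lazy<ExpensiveService>(() => new ExpensiveService());
}
```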
There are situations where "thread-safe" doesn't make sense. This consideration is in addition to the higher developer skill and increased time (development, testing, and runtime all take hits).
For example, List<T> is a commonly-used non-thread-safe class. If we were to create a thread-safe equivalent, how would we implement GetEnumerator? Hint: there is no good solution.
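The underlying problem shows up even without threads: any mutation invalidates a live List&lt;T&gt; enumerator, so a "thread-safe" GetEnumerator would have to either lock out all writers for the enumeration's lifetime or hand back a snapshot:

```csharp
using System;
using System.Collections.Generic;

static class EnumeratorDemo
{
    // Returns true if mutating a List<T> while enumerating it throws.
    public static bool MutationDuringEnumerationThrows()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            foreach (var item in list)
                if (item == 2)
                    list.Add(4);   // invalidates the live enumerator
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;           // "Collection was modified; enumeration
                                   // operation may not execute."
        }
    }
}
```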
Turn this question on its head.
In the early days of programming there was no Thread-Safe code because there was no concept of threads. A program started, then proceeded step by step to the end. Events? What's that? Threads? Huh?
As hardware became more powerful, concepts of what types of problems could be solved with software became more imaginative and developers more ambitious, the software infrastructure became more sophisticated. It also became much more top-heavy. And here we are today, with a sophisticated, powerful, and in some cases unnecessarily top-heavy software ecosystem which includes threads and "thread-safety".
I realize the question is aimed more at application developers than, say, firmware developers, but looking at the whole forest does offer insights into how that one tree evolved.
So, what is the point of keeping non thread-safe code?
By allowing for code that isn't thread safe you're leaving it up to the programmer to decide what the correct level of isolation is.
As others have mentioned this allows for complexity reduction and improved performance.
Rico Mariani wrote two articles entitled "Putting your synchronization at the correct level" and
Putting your synchronization at the correct level -- solution that have a nice example of this in action.
In the article he has a method called DoWork(). In it he calls other classes' Read twice, Write twice, and then LogToSteam.
Read, Write, and LogToSteam all shared a lock and were thread safe. This is good, except that because DoWork was also thread safe, all the synchronizing work inside each Read, Write, and LogToSteam call was a complete waste of time.
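A sketch of the shape Rico Mariani describes (class and member names here are mine, not the article's): the outer lock in DoWork already serializes everything, so the inner locks are pure overhead, and they avoid deadlocking only because .NET monitors are reentrant:

```csharp
using System.Text;

// If DoWork holds the coarse lock for the whole operation, the
// fine-grained locks inside Read/Write/Log never see contention:
// they are pure overhead. Synchronize at one level, not both.
class Worker
{
    private readonly object _sync = new object();
    private readonly StringBuilder _log = new StringBuilder();
    private int _state;

    private int Read()         { lock (_sync) return _state; }   // redundant when
    private void Write(int v)  { lock (_sync) _state = v; }      // called under the
    private void Log(string s) { lock (_sync) _log.Append(s); }  // outer lock below

    public int DoWork()
    {
        lock (_sync)   // already serializes everything; monitors are reentrant
        {
            Write(Read() + 1);
            Write(Read() + 1);
            Log("done;");
            return Read();
        }
    }
}
```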
This is all related to the nature of imperative programming: its side effects create the need for this.
However, if you had a development platform where applications could be expressed as pure functions, with no dependencies or side effects, it would be possible to create applications where the threading was managed without developer intervention.
So, what is the point of keeping non thread-safe code?
The rule of thumb is to avoid locking as much as possible. The ideal code is re-entrant and thread safe without any locking, but that would be utopia.
Coming back to reality, a good programmer tries their best to use sectional locking as opposed to locking the entire context. An example would be locking a few lines of code at a time in various routines rather than locking everything in a function.
Also, one has to refactor the code to come up with a design that minimizes the locking, if it can't be eliminated entirely.
E.g. consider a foobar() function that gets new data on each call and uses a switch() on the type of data to change a node in a tree. The locking can be mostly (if not completely) avoided, as each case statement touches a different node in the tree. This may be a rather specific example, but I think it illustrates my point.

How to test for thread safety [duplicate]

This question already has answers here:
Unit testing a multithreaded application?
(9 answers)
How should I unit test multithreaded code?
(29 answers)
Closed 5 years ago.
Do you have any advice on how to test a multithreaded application?
I know threading errors are very difficult to catch; they may occur at any time, or not at all. Tests are difficult, and the results are never certain. Certainly it is best to carefully design and program the concurrent modules.
Nevertheless, I do not want to leave out the testing aspect. Running a lot of threads that all work on the same items can sometimes provoke threading errors.
Any ideas or best practices to get a high hit rate of hidden threading errors?
(I am using .Net/C#)
You can use some good tools to test for threading issues like data races, deadlocks, and stalled threads.
Intel Thread Checker is one such good tool.
You can also try, CHESS by Microsoft Research
Try increasing the number of threads to a large number if possible, even beyond how many will be used in a release. With lots of threads running your program, an error will appear more often since more threads are running over the code.
Double check your declarations, locks, unlocks, semaphore counts, etc and make sure they make sense.
Create a test document or spreadsheet, and using your knowledge of the code, think about where possible race conditions or deadlocks could occur.
Grab some people from the hall and do a 'hallway usability test' (Joel on Software said that I think?). Generally, people who have no idea what your program does/is about will be able to break it easily.
Good question. I usually test for race-conditions by spawning many threads and letting them wildly perform the operations which I suspect might be subject to race conditions.
Maybe you can look at PNUnit - although it's probably a little different from what you are looking for. The authors say they built it because "we needed to simulate hundreds of clients against the same server".
Grep the code for calls to threading routines. If any are found, fail the test, as your code has multi-threading bugs.
If it passes, expand your search to the parts of libraries you use, until it fails or (unlikely) is proven thread-safe (i.e. single-threaded).
Once you know you have threading bugs, the testing part of the job is done. All that remains is the small matter of finding and removing them...

Concurrency issues while accessing data via reflection in C#

I'm currently writing a library that can be used to show the internal state of some running code (mainly fields and properties, both public and private). Objects are accessed on a different thread, which puts their info into a window for the user to see. The problem is that while I'm walking a long IList, its structure may change: some piece of code in the program being 'watched' may add a new item, or even worse, remove some. This, of course, causes the whole thing to crash.
I've come up with some ideas but I'm afraid they're not quite correct:
Locking the list being accessed while I'm walking it. I'm not sure this would work, since the IList being used may not be locked for writing on the other side.
Letting the code being watched be aware of my existence and provide some interfaces to allow for synchronization. (I'd really like it to be totally transparent, though.)
As a last resort, put every read access into a try/catch block and pretend as if nothing happened when it throws. (Can't think of an uglier solution that actually works).
Thanks in advance.
The only way you're going to keep things "transparent" to the code being monitored is to make the monitoring code robust in the face of state changes.
Some suggestions
Don't walk a shared list - make a copy of the list into a local List instance as soon (and as fast) as you can. Once you have a local (non-shared) list of instances, no one can monkey with the list.
Make things as robust as you can - putting every read into a try/catch might feel nasty, but you'll probably need to do it.
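A sketch combining both suggestions, the fast local copy and the try/catch fallback; the Snapshot helper and its retry policy are invented for illustration:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class Watcher
{
    // Copy the shared list as fast as possible; if a concurrent mutation
    // invalidates the enumerator mid-copy, retry a few times. The caller
    // can then walk the private snapshot at leisure.
    public static List<object> Snapshot(IList source, int maxAttempts)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            try
            {
                var copy = new List<object>(source.Count);
                foreach (var item in source)
                    copy.Add(item);
                return copy;
            }
            catch (InvalidOperationException) { /* list mutated; retry */ }
            catch (ArgumentOutOfRangeException) { /* list shrank; retry */ }
        }
        return new List<object>();   // give up: show nothing rather than crash
    }
}
```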
Option number 3 may feel ugly, but this looks to be similar to the approach the Visual Studio watch windows use, and I would choose that approach.
In Visual Studio, you can often set a watch on some list or collection and at a later point notice the watch simply displays an exception when it can't evaluate a certain value due to user or code state changes.
This is the most robust approach when dealing with such an open ended range of possibilities. The fact is, if your watching code is designed to support as many scenarios as possible you will not be able to think of all situations in advance. Handling and presenting exceptions nicely is the next best approach.
By the way, someone else mentioned that locking your data structures will work. This is not true if the "other code" is not also using locks for synchronization. In fact both pieces of code must lock the same synchronization object, very unlikely if you don't control the other code. (I think you mention this in your question, so I agree.)
While I like Bevan's idea of copying the list for local read access, if the list is particularly large, that may not be a truly viable option.
If you really need seamless, transparent, concurrent access to these lists, you should look into the Parallel Extensions for .NET library. It is currently available for .NET 2.0 through 3.5 as a CTP. The extensions will be officially included in .NET 4.0 along with some additional collections. I think you would be interested in the BlockingCollection from the CTP, which would give you that transparent concurrent access you need. There is obviously a performance hit as with any threaded stuff that involves synchronization, however these collections are fairly well optimized.
As I understand it, you don't want to have ANY dependency/requirement on the code being watched, or to enforce any constraints on how it is written.
Although this is my favourite approach to coding a 'watcher', it exposes your application to a very broad range of code and behaviours, which can cause it to crash.
So, as said before me, my advice is to make the watcher 'robust' as a first step. You should be prepared for anything going wrong anywhere in your code, because considering the 'transparency' requirement, many things can potentially go wrong! (Be careful where you put your try/catch; entering and leaving a try block many times can have a visible performance impact.)
When you're done making your code robust, the next steps are making it more usable and dodging the situations that can cause exceptions, like the 'list' case you mentioned. For example, you can check whether the watched object is a list and, if it's not too long, first make a quick copy of it and then do the rest. This way you eliminate a large part of the risk of your code throwing.
Locking the list will work, because it is being modified, as you've observed via the crashing :)
It seems to me, though, that I'd avoid locking (your thread is only the 'watcher' and shouldn't really interrupt the watched code).
On that basis, I would just try to handle the cases where you detect that something is missing. Is that not possible?

Measure performance bottlenecks in a library

In my website I have a rather complicated business-logic part encapsulated in a single DLL. Because CPU usage hits the roof when I call certain methods of it, I need to measure and optimize.
I know about the performance profiler, but I don't know how to set it up for a library.
Furthermore, I can't seem to find any useful resources about it.
How do you approach this problem?
You can create a simple exe that runs the main methods of your library.
Although it requires some understanding to know which methods to call, it can help you focus on specific scenarios and check their bottlenecks.
You can also add some performance counters (look into MSDN), or use the debugger and the old-school approach: create a Stopwatch and use Debug.WriteLine to see what's happening.
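The Stopwatch approach might look like this minimal helper; MicroBench is an invented name, and the method you pass in would be one of your library's suspect entry points:

```csharp
using System;
using System.Diagnostics;

static class MicroBench
{
    // Times one call to the suspect library method. Repeat and average
    // for anything that completes in under a few milliseconds.
    public static TimeSpan Time(Action suspectMethod)
    {
        var sw = Stopwatch.StartNew();
        suspectMethod();
        sw.Stop();
        Debug.WriteLine("Elapsed: " + sw.Elapsed);  // visible in the debugger output
        return sw.Elapsed;
    }
}
```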
As Dror says, run it stand-alone under a simple exe. I would add, run THAT under the IDE, and while it's being slow, just pause it, several times, and each time look in detail at what it's doing. It's counterintuitive but very effective.
