Concurrency issues while accessing data via reflection in C# - c#

I'm currently writing a library that can be used to show the internal state of some running code (mainly fields and properties, both public and private). Objects are accessed from a different thread, which puts their info into a window for the user to see. The problem is that while I'm walking a long IList, its structure may change: some piece of code in the program being 'watched' may add a new item, or even worse, remove some. This of course causes the whole thing to crash.
I've come up with some ideas but I'm afraid they're not quite correct:
Locking the list being accessed while I'm walking it. I'm not sure if this would work, since the IList being used may not have been locked for writing on the other side.
Let the code being watched be aware of my existence and provide some interfaces to allow for synchronization. (I'd really like it to be totally transparent, though.)
As a last resort, put every read access into a try/catch block and pretend as if nothing happened when it throws. (Can't think of an uglier solution that actually works).
Thanks in advance.

The only way you're going to keep things "transparent" to the code being monitored is to make the monitoring code robust in the face of state changes.
Some suggestions
Don't walk a shared list - make a copy of the list into a local List instance as soon (and as fast) as you can. Once you have a local (non-shared) list of instances, no one can monkey with the list.
Make things as robust as you can - putting every read into a try/catch might feel nasty, but you'll probably need to do it.
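A minimal sketch of combining both suggestions, assuming the watched object exposes its list as a field (the Watcher and SnapshotListField names are illustrative, not part of any real API):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;

static class Watcher
{
    // Copy an IList field into a local List<object> so that later
    // mutations by the watched code cannot touch our copy.
    // The read is wrapped in try/catch because the structure may
    // change mid-copy; in that case we give up and return null.
    public static List<object> SnapshotListField(object target, string fieldName)
    {
        try
        {
            FieldInfo field = target.GetType().GetField(
                fieldName,
                BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
            var list = field?.GetValue(target) as IList;
            if (list == null)
                return null;

            var copy = new List<object>(list.Count);
            foreach (object item in list)   // may still throw if the list mutates under us
                copy.Add(item);
            return copy;
        }
        catch (Exception)
        {
            return null;   // "pretend nothing happened", as option 3 suggests
        }
    }
}
```

Note the copy itself can still fail mid-walk, which is exactly why the try/catch stays.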

Option number 3 may feel ugly, but this looks to be similar to the approach the Visual Studio watch windows use, and I would choose that approach.
In Visual Studio, you can often set a watch on some list or collection and at a later point notice the watch simply displays an exception when it can't evaluate a certain value due to user or code state changes.
This is the most robust approach when dealing with such an open ended range of possibilities. The fact is, if your watching code is designed to support as many scenarios as possible you will not be able to think of all situations in advance. Handling and presenting exceptions nicely is the next best approach.
By the way, someone else mentioned that locking your data structures will work. This is not true if the "other code" is not also using locks for synchronization. In fact both pieces of code must lock the same synchronization object, very unlikely if you don't control the other code. (I think you mention this in your question, so I agree.)

While I like Bevan's idea of copying the list for local read access, if the list is particularly large, that may not be a truly viable option.
If you really need seamless, transparent, concurrent access to these lists, you should look into the Parallel Extensions for .NET library. It is currently available for .NET 2.0 through 3.5 as a CTP. The extensions will be officially included in .NET 4.0 along with some additional collections. I think you would be interested in the BlockingCollection from the CTP, which would give you that transparent concurrent access you need. There is obviously a performance hit as with any threaded stuff that involves synchronization, however these collections are fairly well optimized.

As I understand it, you don't want to have ANY dependency/requirement on the code being watched or to enforce any constraints on how that code is written.
Although this is my favourite approach to coding a "watcher", it exposes your application to a very broad range of code and behaviours, which can cause it to crash.
So, as said before me, my advice is to make the watcher "robust" as a first step. You should be prepared for anything going wrong anywhere in your code, because given the "transparency", many things can potentially go wrong! (Be careful where you put your try/catch; entering and leaving a try block many times can have a visible performance impact.)
When you're done making your code robust, the next steps would be making it more usable and dodging the situations that can cause exceptions, like the "list" thing you mentioned. For example, you can check whether the watched object is a list and, if it's not too long, first make a quick copy of it and then do the rest. This way you eliminate a large part of the probability that can make your code throw.

Locking the list will work, because it is being modified, as you've observed via the crashing :)
Seems to me, though, that I'd avoid locking (because it seems that your thread is only the 'watcher' and shouldn't really interrupt).
On this basis, I would just try to handle the cases where you detect missing items. Is this not possible?

Related

Multithreading application and Threadsafe Lists

I've been working in a multithread application and I'm still trying to figure the best/most efficient way to deal with a List that's being used and changed by multiple threads.
I've seen that writing a thread-safe class for a List is not really the best option and, to be honest, I find all the locking somewhat messy.
I've thought about converting it into a ConcurrentDictionary as I've been using these and they seem to behave really good.
However, I've tried a different approach and would like to hear some opinions on whether or not this is a good option to take:
if (MyList.Count > 0)
{
    MyStruct[] Example = new MyStruct[MyList.Count];
    MyList.CopyTo(Example, 0);
    foreach (MyStruct B in Example)
    {
        // Code here
    }
}
This is just something that I tried and it seemed to work without my having to make changes anywhere else. I'm not sure if I should even be doing this; that's why I'm looking for some opinions on it.
No, this is not thread-safe.
Consider two threads for simplicity.
Thread A creates the list with 10 items
Thread B sees the list has 10 items, and creates an array for 10 items
Thread A adds an item to the list, so it now has 11 items
Thread B crashes on CopyTo, since the array isn't big enough
And that's one of the sanest things that can happen.
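For contrast, the copy does become safe if both threads agree to guard the list with the same lock object, a sketch with illustrative names:

```csharp
using System.Collections.Generic;

class Worker
{
    private readonly object _sync = new object();
    private readonly List<int> _shared = new List<int>();

    public void Add(int value)
    {
        lock (_sync) { _shared.Add(value); }   // writer takes the same lock
    }

    public int[] Snapshot()
    {
        // ToArray under the lock yields a consistent copy; iteration then
        // happens on the private array with no lock held, so the
        // count-changed-between-check-and-copy race cannot occur.
        lock (_sync) { return _shared.ToArray(); }
    }
}
```

The crucial point is that every access path, reads and writes alike, must go through the same lock; locking only the copy changes nothing.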
Don't mess around with multi-threading at random. It's messy, dangerous and you'll be left with plenty of bugs that are hard to reproduce and fix. Unless something is explicitly said to be thread-safe, don't assume any thread-safety at all. Obligatory starter on multi-threading: http://www.albahari.com/threading/
The usual checklist goes something like this:
Is it really necessary to have multi-threading in the first place?
Are you sure sure?
Are there thread-safe classes that do exactly what you need?
Can you use a simple, consistent locking architecture that is guaranteed not to deadlock?
Are you really sure you need to share that object between multiple threads?
Seriously, multi-threading is hard. Could you perhaps do with immutable data that's explicitly passed between the threads, rather than having shared mutable state?
Find the simplest correct way you can handle the synchronization between the threads.
Is it good enough? Good, stop.
It's not good enough? Consider alternate approaches to data sharing.
If you still have a bottle-neck on a shared resource, consider lock-less programming. This is much, much harder than lock-based synchronization. Make sure you know what you're doing. Even the people who designed C#/.NET are very wary of lock-less programming. Even Raymond Chen, and that's the Chuck Norris of software engineering. There be lions. You need perfect understanding of everything that is and isn't guaranteed, and what is safe on your platform and what's common to all the platforms.
You don't need to create an array of structs. Just use CopyTo with another List, or call ToArray.
Additionally, you can use BlockingCollection, which is thread-safe for producer/consumer operations.
https://msdn.microsoft.com/en-us/library/dd267312(v=vs.110).aspx
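A sketch of the producer/consumer shape with BlockingCollection (the item type and counts here are arbitrary):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Program
{
    public static void Main()
    {
        var queue = new BlockingCollection<int>();

        Task producer = Task.Run(() =>
        {
            for (int i = 1; i <= 5; i++)
                queue.Add(i);
            queue.CompleteAdding();   // tell consumers no more items are coming
        });

        int sum = 0;
        // GetConsumingEnumerable blocks until items arrive and ends
        // cleanly once CompleteAdding has been called.
        foreach (int item in queue.GetConsumingEnumerable())
            sum += item;

        producer.Wait();
        Console.WriteLine(sum);   // prints 15
    }
}
```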

Threading not possible suggest any other way to do parallel processing

My C# code that I put into threads is not thread safe, as it involves lots of database connections and stored procedure executions, but I still want to minimize the time required for execution. Can anyone suggest something for parallel or asynchronous processing, either on the database side or the .NET side?
I am stuck with threading and not able to apply it...
Thanks
There is not nearly enough information in your question to suggest an optimized solution. However, if you must deal with resources that are not thread safe, one strategy is to spawn separate processes to handle sub-tasks.
Keep in mind, though, that separate processes still would have to handle portions of the overall solution that do not step on each other.
Well, there are a couple of things that can be done; however, I don't know how well they really fit your case.
On the database side you can do query optimization, indexing, and other things that may improve query run time. Use profilers to analyse; check query plans to see whether the indexes are properly used.
Use NOLOCK, but only where you can see that reading without locks is safe in your selects. (Not always a good practice, but it is used.)
Implement proper synchronized threading that can process multiple requests; there is no other way from the .NET side. You have to review your design properly. You can also do code optimization using a profiler, and you can use the Task library as well.
Another thing: there could be an issue with your server. Check CPU and memory utilization.
(This is the least of my concerns.)
You should explain better what you are doing in your code...
If you do a lot of loops, you should try Parallel.For / Parallel.Foreach
http://msdn.microsoft.com/en-us/library/system.threading.tasks.parallel.aspx
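To illustrate the shape of that suggestion (ProcessRecord is a hypothetical stand-in for your per-row work; each iteration must open its own connection and share no mutable state for this to be safe):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    public static void Main()
    {
        var ids = new List<int> { 1, 2, 3, 4, 5 };
        int processed = 0;

        // Iterations run concurrently, so each one must be independent:
        // its own connection, its own stored procedure call, no shared state.
        Parallel.ForEach(ids, id =>
        {
            // ProcessRecord(id) would go here; we only count for illustration.
            Interlocked.Increment(ref processed);
        });

        Console.WriteLine(processed);   // prints 5
    }
}
```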
With the Parallel class you can also queue tasks for ordered computation or divide loops into blocks, which could improve overall performance...
That's the best I can say with so little information.
Hope it helps.

Prevent or forbid use of Parallel.ForEach

I wrote a library on top of System.Data.SQLite and realized it (my library) has unpredictable behaviour when using Parallel.ForEach. I might debug this eventually (i.e. if I get/take the time), most likely by locking the right parts, but for now let's say I want to just prevent usage of Parallel.ForEach, or force usage of my library to allow (or result in) only a single thread, how would I proceed?
You can't control how your API is consumed by external code. If it's something you absolutely cannot address prior to release, it would be a good idea to be very explicit about failure cases in your documentation (both XML comments and any sort of "help file").
A few quick [ThreadStatic] attributes might solve your concurrency problem, but this smells like the tip of a much larger iceberg. Fix the root cause, not the symptom.
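If you do decide to fix the root cause, one blunt but effective sketch is to funnel every call through a single process-wide lock (SqliteGateway and Execute are made-up names, not part of System.Data.SQLite):

```csharp
using System;

public class SqliteGateway   // hypothetical wrapper, not a real System.Data.SQLite type
{
    // One process-wide gate: even if callers use Parallel.ForEach,
    // only one thread at a time reaches the non-thread-safe code.
    private static readonly object _gate = new object();

    public T Execute<T>(Func<T> operation)
    {
        lock (_gate)
        {
            return operation();   // the actual SQLite work happens here
        }
    }
}
```

This serializes all database work, so callers get correctness at the price of losing any parallel speed-up inside the library.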

What is non-thread-safety for?

There are a lot of articles and discussions explaining why it is good to build thread-safe classes. It is said that if multiple threads access e.g. a field at the same time, there can be bad consequences. So, what is the point of keeping non-thread-safe code? I'm focusing mostly on .NET, but I believe the main reasons are not language-dependent.
E.g. .NET static fields are not thread-safe. What would be the result if they were thread-safe by default? (without a need to perform "manual" locking). What are the benefits of using (actually defaulting to) non-thread-safety?
One thing that comes to my mind is performance (more of a guess, though). It's rather intuitive that, when a function or field doesn't need to be thread-safe, it shouldn't be. However, the question is: what for? Is thread-safety just an additional amount of code you always need to implement? In what scenarios can I be 100% sure that e.g. a field won't be used by two threads at once?
Writing thread-safe code:
Requires more skilled developers
Is harder and consumes more coding efforts
Is harder to test and debug
Usually has bigger performance cost
But! Thread-safe code is not always needed. If you can be sure that some piece of code will be accessed by only one thread, the list above becomes pure, unnecessary overhead. It is like renting a van to go to a neighboring city when there are two of you and not much luggage.
Thread safety comes with costs - you need to lock fields that might cause problems if accessed simultaneously.
In applications that make no use of threads but need high performance, where every CPU cycle counts, there is no reason to have thread-safe classes.
So, what is the point of keeping non thread-safe code?
Cost. Like you assumed, there usually is a penalty in performance.
Also, writing thread-safe code is more difficult and time consuming.
Thread safety is not a "yes" or "no" proposition. The meaning of "thread safety" depends upon context; does it mean "concurrent-read safe, concurrent write unsafe"? Does it mean that the application just might return stale data instead of crashing? There are many things that it can mean.
The main reason not to make a class "thread safe" is the cost. If the type won't be accessed by multiple threads, there's no advantage to putting in the work and increase the maintenance cost.
Writing threadsafe code is painfully difficult at times. For example, simple lazy loading requires two checks for '== null' and a lock. It's really easy to screw up.
[EDIT]
I didn't mean to suggest that threaded lazy loading was particularly difficult, it's the "Oh and I didn't remember to lock that first!" moments that come fast and hard once you think you're done with the locking that are really the challenge.
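For reference, the two-null-checks-and-a-lock pattern the answer alludes to, roughly as it's usually written in C# (Config is a placeholder type, not from the original answer):

```csharp
using System;

public sealed class Config   // placeholder for something expensive to build
{
    private static readonly object _sync = new object();
    private static volatile Config _instance;   // volatile matters for double-checked locking

    private Config() { }

    public static Config Instance
    {
        get
        {
            if (_instance == null)             // first check, taken without the lock
            {
                lock (_sync)
                {
                    if (_instance == null)     // second check, under the lock
                        _instance = new Config();
                }
            }
            return _instance;
        }
    }
}
```

On .NET 4 and later, Lazy&lt;Config&gt; packages the same guarantee without hand-rolling the checks, which sidesteps most of the ways to screw this up.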
There are situations where "thread-safe" doesn't make sense. This consideration is in addition to the higher developer skill and increased time (development, testing, and runtime all take hits).
For example, List<T> is a commonly-used non-thread-safe class. If we were to create a thread-safe equivalent, how would we implement GetEnumerator? Hint: there is no good solution.
Turn this question on its head.
In the early days of programming there was no Thread-Safe code because there was no concept of threads. A program started, then proceeded step by step to the end. Events? What's that? Threads? Huh?
As hardware became more powerful, concepts of what types of problems could be solved with software became more imaginative and developers more ambitious, the software infrastructure became more sophisticated. It also became much more top-heavy. And here we are today, with a sophisticated, powerful, and in some cases unnecessarily top-heavy software ecosystem which includes threads and "thread-safety".
I realize the question is aimed more at application developers than, say, firmware developers, but looking at the whole forest does offer insights into how that one tree evolved.
So, what is the point of keeping non thread-safe code?
By allowing for code that isn't thread safe you're leaving it up to the programmer to decide what the correct level of isolation is.
As others have mentioned this allows for complexity reduction and improved performance.
Rico Mariani wrote two articles entitled "Putting your synchronization at the correct level" and
Putting your synchronization at the correct level -- solution that have a nice example of this in action.
In the article he has a method called DoWork(). In it, he calls Read twice, Write twice, and then LogToSteam on other classes.
Read, Write, and LogToSteam all shared a lock and were thread safe. This is good, except that because DoWork was also thread safe, all the synchronizing work inside each Read, Write, and LogToSteam call was a complete waste of time.
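A rough sketch of that point (the method names follow the article's, the bodies are stubs): when DoWork holds the lock for the whole sequence, the inner calls no longer need locks of their own on this path, so you pay for one acquisition instead of five.

```csharp
using System;

class Service
{
    private readonly object _sync = new object();
    public int LockAcquisitions;   // just to observe the effect

    // Synchronization at the correct level: DoWork takes the lock once
    // around the whole sequence, instead of each inner call locking itself.
    public void DoWork()
    {
        lock (_sync)
        {
            LockAcquisitions++;
            Read(); Read();
            Write(); Write();
            LogToSteam();
        }
    }

    private void Read()       { /* reads shared state; caller holds the lock */ }
    private void Write()      { /* writes shared state; caller holds the lock */ }
    private void LogToSteam() { /* logs; caller holds the lock */ }
}
```

As a bonus, the whole Read/Write sequence now executes atomically, which the five separate locks never guaranteed.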
This is all related to the nature of imperative programming: its side effects create the need for this.
However, if you had a development platform where applications could be expressed as pure functions, with no dependencies or side effects, then it would be possible to create applications where the threading was managed without developer intervention.
So, what is the point of keeping non thread-safe code?
The rule of thumb is to avoid locking as much as possible. The ideal code is re-entrant and thread-safe without any locking, but that would be utopia.
Coming back to reality, a good programmer tries their level best to use sectional locking as opposed to locking the entire context. An example would be locking a few lines of code at a time in various routines rather than locking everything in a function.
One also has to refactor the code to come up with a design that minimizes the locking, if not gets rid of it entirely.
E.g. consider a foobar() function that gets new data on each call and uses a switch on the type of data to change a node in a tree. The locking can be mostly (if not completely) avoided, as each case statement would touch a different node in the tree. This may be a rather specific example, but I think it elaborates my point.

Best practices for debugging

I've been doing quite a bit of debugging of managed applications lately using both Visual Studio and WinDbg, and as such I'm often asked to assist colleagues in debugging situations. On several occasions I have found people who just insert breakpoints here and there and hope for the best. In my experience that is rarely a useful technique.
My approach goes something like this.
Reproduce the problem. Ideally reduce the input as much as possible.
Examine what goes wrong and list theories for where the bug may be.
Examine one theory at a time by debugging that specific area of the code.
Repeat steps as necessary.
For complex debugging problems I often work with a colleague. For WinDbg this is especially useful.
Any other useful tips or best practices for debugging?
If there was one tip I could give to everyone about debugging it would be to break it again.
That is, when you think you've found the fix and the system seems to work. Back the fix out and see if the system breaks again.
Sometimes you can get lost in the sequence of what you've tried as potential solutions and you finish up in a totally different area of the system while you're debugging the problem. Then you forget what you've changed back in the original area where you were working.
Backing the fix out and then reproducing the problem ensures that the candidate fix isn't relying on something else that you've changed in another part of the system. That your patch for the fix is a correct standalone solution.
HTH.
cheers,
Rob
One very good practice is not diving into the debugger immediately, but looking at the code and thinking hard for some time.
I'm not sure where I read about "Rubber Duck Debugging", but I think it's great. The basic idea is to set a rubber duck on your desk and explain the code to it. The idea is that as you explain the code to the duck, you'll eventually find yourself saying "Now, this happens", and you'll notice that 'this' is not what you intend to be happening.
Lacking a duck, I find I just walk through the code and explain it to myself. It works, but I still think I might bring in a duck.
[EDIT]
I found where I read about the rubber duck: Rubber Duck Debugging
As another poster says, with some hard thinking, it's often possible to just see the logic error if you understand what's going on.
But often we think we do, and we don't, or we're simply required to fix something that we don't understand too well, so it's back to first principles.
Reproducing the problem is certainly a vital first step. If you can't do this, then you stand no chance of finding the problem, except by accident.
The next step is to establish beyond doubt the path through the code that actually executes when the bug hits. In a WinForms application that might have many events, and more than one thread, this can be anything but a trivial exercise.
Until you know exactly where the code is going, all the theories in the world about where the bug may be are worthless. And if the code is complex, discovering that code does not stop on a breakpoint can be as informative as having it stop.
So in my experience, using breakpoints early and often can be an essential tool for discovering how the code's working.
I often find that when a problem seems particularly intractable, it's because I've made a fatal assumption about what's going on, and not actually verified it.
So my 'best practice' is not to move on until I'm sure I understand, and not guess.
Not directly related to debugging, but to make debugging easier in the future, there are a few things to consider:
Implementing unit testing, preferably in the form of TDD, forces you to stay on task and develop only with the goal of passing tests. It is harder to "wander" when you are coding to a test, instead of to a task.
Get in the practice of regularly refactoring your code. Small, on-point methods are easier to debug than monolithic "jack of all trades" methods.
Utilize your team members. Often adding an extra set of eyes can help flush something out. Chances are, if you do not find something in a relatively quick manner, you are going to continue to overlook it for a while.
You can always rollback code in your version control system to try and isolate what version of a file caused the introduction of the bug. Once you do that, you can diff between the last good and first bad and just focus on the changes between the two.
The barriers to entry for the debugger in VS.NET with a language like C# or VB.NET is just so ridiculously low that it is often easier to insert a breakpoint or two where you know the problem is and just step through.
Sometimes I find myself using Edit & Continue to write code. It's great. You can see results immediately. Often it's most useful when there's some algorithm or relatively difficult to understand loop.
This book is honestly the best I've read about debugging, especially when you are beyond the normal debugging situation. It contains many tricks and is fun to read, with all the "true stories". If you are working with a big amount of code that you haven't written yourself, especially if it's crappy, this book is a must!
http://www.amazon.com/Debugging-Applications-Microsoft%C2%AE-Microsoft-Pro-Developer/dp/0735615365/ref=sr_1_1?ie=UTF8&s=books&qid=1238705836&sr=1-1
Something that helps, especially when you're new to debugging, is to keep some kind of debugging journal, with solutions to problems you've solved in the past. Most bugs follow relatively common patterns (for instance, apparently random problems in non-threaded applications are usually due to undefined variables, or similar use of uninitialized memory) and by keeping track of those patterns, you'll get much better at nailing in on future problems.
After a while, you just develop the necessary intuition (and then your journal becomes a very fun memory of all the nasty enemies you've conquered)
Like Conceptual Blockbusting suggests, I like to try different ways whenever I get stuck. "printf debugging", thinking about the behavior, binary search on code, binary search on version-control commits, writing a unit test to clarify, scratch refactoring, and also firing the debugger.
IMO too much preparation is a waste of time. If you know the codebase reasonably well, you can usually think right away of a few key places where the problem manifests. Put breakpoints there and see if you're right. Whenever you see better key points, move your breakpoints to get closer to the problem.
When you're chasing bad data such as a null pointer, it depends on where it comes from: if it's passed as an argument, look at the call stack to find where it comes from. If it's part of some data structure or object (the easier case), put a breakpoint there to see when and how it is modified.
Conditional breakpoints can be a great help; otherwise you can simulate them by adding if statements enclosing no-ops. If you have a breakpoint in a hot spot that gets hit too often before you get to the problem, deactivate it and put another one in a place you know will be hit shortly before the problem manifests, then activate the one in the hot spot.
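A sketch of the simulated conditional breakpoint (Order, Process, and the Id value are all made up for illustration): the if-statement carries the condition, so the break fires only for the interesting item, and Debugger.Break is guarded so it does nothing when no debugger is attached.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

class Order { public int Id; }

class Program
{
    static int processed;

    static void Process(Order o) { processed++; }

    public static void Main()
    {
        var orders = new List<Order> { new Order { Id = 1 }, new Order { Id = 4711 } };
        foreach (var order in orders)
        {
            // Simulated conditional breakpoint: put a real breakpoint on the
            // Debugger.Break line, or rely on Break itself when attached.
            if (order.Id == 4711 && Debugger.IsAttached)
                Debugger.Break();

            Process(order);
        }
        Console.WriteLine(processed);   // prints 2
    }
}
```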
A good practice is to make sure you're not fixing a symptom, but the cause.
Often, one might see an odd value while debugging and fix it there, without checking what caused the value to get there in the first place. This is, of course, a very bad idea.
BTW, this is why Linus objected to adding built-in support for kernel debugging.
I'll paraphrase my answer on a similar thread (which is essentially the last bullet point in joseph.ferris's answer to this thread ):
Using your version control system, isolate the file revision where the bug was introduced, using a binary search approach.
Diff that revision of the source file against the previous revision. The diff may make the reason for the bug apparent.
This is by no means a technical tip but it often works in my case.
Just stop working hard to find a root cause or fix a bug. Relax for a while: take a walk, eat dinner, or just switch to another task (hopefully much easier) - whatever you like...
...Then think about a problem a bit later, when you're "fresh" again. Trace back all the mental process of debugging that you already went through (theories, experiments, assumptions etc. that you made). Chances are that you will instantly see some key factor that you overlooked before ;).
Programmers (or at least I) tend to gradually narrow down their perspective of the problem and lose creativity during a long debugging session. But a wide perspective combined with creative ideas is our most powerful weapon in the battle against bugs!
I just replied in another post; the question was about C debugging, but as I stated in my reply, I think debugging techniques are language independent.
One thing I like to hammer home is that where you have one instance working and one not (say production and dev) it's about the differences and you need to clearly identify what those could be and deal with them one at a time. Environmental problems can be the hardest to trace and you'll go insane if you don't work systematically.
Incidentally, this is one of the reasons I habitually run my VS webapp projects through IIS rather than Cassini these days.
Another thing I've started doing to all my projects is to add a TraceListener (or derived class) and using it to take key snapshots of my application.
This usually gives me a good idea of where to focus my initial debugging efforts.
Plus, I can turn it on/off using a config file switch, so I can even get a hint on a production system, without recompiling the code.
