C# Multi-Producer/Multi-Tiered Multi-Consumer Losing Data

I have built a complex application using a multi-tiered producer-consumer pattern, with multiple consumers performing specialized tasks before enqueuing data to the next group of consumers. The ultimate job of the application is to break down a raw data file into normalized test records for individual units.
The base of the P-C pattern uses Dustin Hyun's pattern from http://dustin-hyun.blogspot.com/2013_07_01_archive.html. I have made numerous modifications because of the multi-tiered approach, among other reasons. The code is too complex to post here, but I could post snippets upon request to help clarify and answer questions.
I have employed two tools to speed up how a file gets processed. The first is running multiple instances of any tier of consumer: there could be eight "index" consumers running whose job is to convert the test data from Unit IDs and Test Names to Unit Indices and Test Name Indices, normalizing the results to load into the DB. The second is the bundling of units into merged DataTables at two points in the operation.
I have identified that data is lost intermittently, but in a fairly predictable pattern. The missing data appears to be the last, incomplete bundle. After the standard loop pattern, I have a check for a boolean that I use to flag whether there is an incomplete bundle, and it works:
if (dataToSend) // Check for an incomplete bundle to process & send before ending the consumer operation.
{
    UpdateLimitsIndices(bundleNlu);
    Enqueue(StdfQType.Func, new BundledNamedTables((N_ParamRes)bundlePR.Copy(), (N_FuncRes)bundleFR.Copy(), numUnitsInCurrBundle));
}
I have also put locks everywhere I can see that any of the P-C entities read or write any of the shared queue members. The locks alone appeared to have no real impact. On a whim, I started to play with the sleep time before the loop re-spins. So far, test conditions that caused data loss with a 1 ms sleep did not cause data loss with a 100 ms sleep, or even a 10 ms sleep, during limited testing. Could it be that the longer sleep is allowing the last piece/bundle of data to be properly processed?
I recognize that this question is vague and has few specifics because the application is too complex to post. I do hope I gave enough information for a dialog to start, however. I look forward to hearing your thoughts.
Jeff

I would suggest that because you are not using thread-safe collections (and neither is the author whose code you are basing yours on), this may be the cause of the data loss: a concurrent write operation that fails silently.
Luckily, along with the Task Parallel Library (TPL) .NET 4.0 gives us a whole bunch of concurrent collections which ARE thread-safe for multi-threaded environments.
Have a look at the collections in System.Collections.Concurrent. They are all thread-safe, and they rely on fine-grained locking or lock-free techniques that usually perform better under contention than wrapping an ordinary collection in a single lock.
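As a rough illustration of how those collections could apply to a tiered pipeline with bundling, here is a minimal sketch. It is not the asker's code: BundlingPipeline, Run, and the int "units" are hypothetical stand-ins, and BlockingCollection<T> is swapped in for the hand-rolled queues. The point it tries to show is that each bundling worker flushes its own incomplete bundle when its input is completed, and the next tier is only marked complete after every worker has exited, so no sleep interval is needed to catch the last bundle.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BundlingPipeline
{
    // Hypothetical two-tier pipeline: tier 1 groups unit IDs into bundles,
    // tier 2 counts the units it receives. Returns the number of units that
    // reached tier 2, which should equal unitCount if nothing is lost.
    public static int Run(int unitCount, int bundleSize, int bundlerCount)
    {
        using (var units = new BlockingCollection<int>(1024))
        using (var bundles = new BlockingCollection<List<int>>(64))
        {
            // Tier 1: several bundlers share one input queue.
            Task[] bundlers = Enumerable.Range(0, bundlerCount).Select(_ => Task.Run(() =>
            {
                var bundle = new List<int>(bundleSize);
                // The loop ends once units.CompleteAdding() was called AND the queue is empty.
                foreach (int unit in units.GetConsumingEnumerable())
                {
                    bundle.Add(unit);
                    if (bundle.Count == bundleSize)
                    {
                        bundles.Add(bundle);
                        bundle = new List<int>(bundleSize);
                    }
                }
                // Flush the last, incomplete bundle before this worker exits.
                if (bundle.Count > 0) bundles.Add(bundle);
            })).ToArray();

            // Tier 2: a single consumer of bundles.
            Task<int> counter = Task.Run(() =>
            {
                int received = 0;
                foreach (List<int> bundle in bundles.GetConsumingEnumerable())
                    received += bundle.Count;
                return received;
            });

            for (int i = 0; i < unitCount; i++) units.Add(i);
            units.CompleteAdding();    // no more input for tier 1

            Task.WaitAll(bundlers);    // every partial bundle has been enqueued by now
            bundles.CompleteAdding();  // only now may tier 2 finish
            return counter.Result;
        }
    }
}
```

Calling BundlingPipeline.Run(1003, 10, 4) should return 1003 even though 1003 is not a multiple of the bundle size. The ordering at the end (complete the input, wait for the workers, then complete the output) is the part that replaces the timing-dependent sleep.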

Threading is very difficult to get right, and it appears that you have not gotten it right. Also, why are you (and the author of that blog post) using sleep intervals rather than Monitor.Pulse()?
Rather than trying to implement this yourself, why not use a library that will give you a slightly higher level of abstraction above the underlying thread coordination mechanism?
TPL Dataflow
Reactive Extensions
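For reference, a minimal sketch of what signaling with Monitor.Wait and Monitor.Pulse might look like in place of a sleep-and-poll loop. SignalingQueue<T> and its members are hypothetical names, not taken from the blog post or the asker's application. Consumers block until an item arrives or the queue is marked complete, so there is no polling interval to tune.

```csharp
using System.Collections.Generic;
using System.Threading;

public sealed class SignalingQueue<T>
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly object _gate = new object();
    private bool _completed;

    public void Enqueue(T item)
    {
        lock (_gate)
        {
            _items.Enqueue(item);
            Monitor.Pulse(_gate);      // wake one waiting consumer
        }
    }

    // Signals that no more items will be added.
    public void Complete()
    {
        lock (_gate)
        {
            _completed = true;
            Monitor.PulseAll(_gate);   // wake every waiting consumer so each can exit
        }
    }

    // Blocks until an item is available. Returns false only when the queue
    // has been completed and fully drained.
    public bool TryDequeue(out T item)
    {
        lock (_gate)
        {
            // Re-check the condition in a loop after every wake-up.
            while (_items.Count == 0 && !_completed)
                Monitor.Wait(_gate);   // releases the lock while waiting

            if (_items.Count > 0)
            {
                item = _items.Dequeue();
                return true;
            }
            item = default(T);
            return false;
        }
    }
}
```

A consumer would loop on `while (queue.TryDequeue(out item)) { ... }` and then flush any partial bundle after the loop. Because TryDequeue keeps returning items until the queue is both completed and empty, the final items cannot be skipped by an early exit.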

Related

Database recommendations needed -> Columnar, Embedded (if possible)

EDIT: As a result of the answers so far, I would like to add more focus on what I want to zero in on: a database that allows in-memory writes (could be simple C# code) with options to persist to storage, so that the data can be accessed from within R. Redis looks the most promising so far. I am also considering using something similar to Lockfree++ or ZeroMQ in order to avoid writing data concurrently to the database, instead sending all the data to be persisted over a message bus (or other implementation) and having one "actor" handle all write operations to an in-memory db or other solution. Are there any more ideas aside from Redis? (Some mentioned SQLite, and I still need to test its performance.)
I am searching for the ideal database structure/solution that meets most of my below requirements but so far I utterly failed. Can you please help?
My tasks: I run a process in .Net 4.5 (C#) and generate (generally) value types that I want to use for further analysis in other applications and therefore like to either preserve in-memory or persist on disk. More below. The data is generated within different tasks/threads and thus a row based data format does not lend itself well to match this situation (because the data generated in different threads is generated at different times and is thus not aligned). Thus I thought a columnar data structure may be suitable but please correct me if I am wrong.
Example:
Task/Thread #1 generates the following data at given time stamps
datetime.ticks / value of output data
1000000001 233.23
1000000002 233.34
1000000006 234.23
...
Task/Thread #2 generates the following data at given time stamps
datetime.ticks / value of output data
1000000002 33.32
1000000005 34.34
1000000015 54.32
...
I do not need to align the time stamps at the .Net run-time, I am first and foremost after preserving the data and to process the data within R or Python at a later point.
My requirements:
Fast writes, fast writes, fast writes: I may generate 100,000 to 1,000,000 data points per second and need to persist them (worst case) or retain them in memory. It's OK to run the writes on their own thread, so this process can lag the data generation process, but the limit is 16 GB of RAM (64-bit code); more below.
Preference is for columnar db format as it lends itself well to how I want to query the data later but I am open to any other structure if it makes sense in regards to the examples above (document/key-value also ok if all other requirements are met, especially in terms of write speed).
An API that can be referenced from within .Net. Example: HDF5 may be considered capable by some, but I find its .Net port horrible. Something that supports .Net a little better would be a plus, but if all other requirements are met then I can deal with something similar to the HDF5 .Net port.
Concurrent writes if possible: As described earlier I like to write data concurrently from different tasks/threads.
I am constrained by 16gb memory (run .Net process in 64bit) and thus I probably look for something that is not purely in-memory as I may sometimes generate more data than that. Something in-memory which persists at times or a pure persistence model is probably preferable.
Preference for embedded but if a server in a client/server solution can run as a windows service then no issue.
In terms of data access, I have a strong preference for a db solution for which interfaces from R and Python already exist, because I would like to use the pandas library within Python for time series alignment and other analysis, and run analyses within R.
If the API/library supports in addition SQL/SQL-like/Linq/ like queries that would be terrific but generally I just need the absolute bare bones such as load columnar data in between start and end date (given the "key"/index is in such format) because I analyze and run queries within R/Python.
If it comes with a management console or data visualizer that would be a plus but not a must.
Should be open source or priced within "reach" (no, KDB does not qualify in that regard ;-)
OK, here is what I have so far, and again it's all I've got, because most db solutions simply fail on the write performance requirement:
Infobright and Db4o. I like what I read so far but I admit I have not checked into any performance stats
Something I write myself. I can easily store value types in binary format and index the data by datetime.ticks; I would just need to write scripts to load/deserialize the data in Python/R. But it would be a massive task if I wanted to add concurrency, a query engine, and other goodies. Thus I am looking for something already out there.

I can't comment -- low rep (I'm new here) -- so you get a full answer instead...
First, are you sure you need a database at all? If fast write speed and portability to R are your biggest concerns, have you considered a flat file mechanism? According to your comments you're willing to batch writes out but you need persistence; if those were my requirements I'd write a straight-to-disk buffering system that was lightning fast, then build a separate task that periodically took the disk files and moved them into a data store for R, and that's only if R reading the flat files wasn't sufficient in the first place.
If you can do alignment after-the-fact, then you could write the threads to separate files in your main parallel loop, cutting each file off every so often, and leave the alignment and database loading to the subprocess.
So (in crappy pseudo_code), build a thread process that you'd call with backgroundworker or some such and include a threadname string uniquely identifying each worker and thus each filestream (task/thread):
file_name = threadname + '0001.csv' // or something
open(file_name for writing)
while(generating_data) {
    generate_data()
    while (buffer_not_full and very_busy) {
        write_data_to_buffer
        generate_data()
    }
    flush_buffer_to_disk(file_name)
    if(file is big enough or enough time has passed or we're not too busy) {
        close(file_name)
        move(file_name to bob's folder)
        increment file_name
        open(file_name for writing)
    }
}
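In C#, the per-thread writer from the pseudocode above might be sketched roughly as follows. This is an assumption-laden illustration, not a tuned implementation: RotatingCsvWriter, the row-count rotation rule, and the 64 KB buffer size are all hypothetical choices. Each worker thread would own one instance, so no locking is needed on the write path, and finished files are moved to a separate folder for the loader to pick up.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

public sealed class RotatingCsvWriter : IDisposable
{
    private readonly string _workDirectory;   // where the file being written lives
    private readonly string _doneDirectory;   // "bob's folder": finished files only
    private readonly string _threadName;      // unique per worker, used as file prefix
    private readonly int _rowsPerFile;        // rotation threshold (assumed rule)
    private StreamWriter _writer;
    private string _currentPath;
    private int _fileIndex;
    private int _rowsInFile;

    public RotatingCsvWriter(string workDirectory, string doneDirectory, string threadName, int rowsPerFile)
    {
        _workDirectory = workDirectory;
        _doneDirectory = doneDirectory;
        _threadName = threadName;
        _rowsPerFile = rowsPerFile;
        Directory.CreateDirectory(workDirectory);
        Directory.CreateDirectory(doneDirectory);
    }

    public void Write(long ticks, double value)
    {
        if (_writer == null) OpenNext();
        // StreamWriter buffers in memory and only hits the disk when the buffer fills.
        _writer.Write(ticks.ToString(CultureInfo.InvariantCulture));
        _writer.Write(',');
        _writer.WriteLine(value.ToString("R", CultureInfo.InvariantCulture));
        if (++_rowsInFile >= _rowsPerFile) CloseCurrent();
    }

    private void OpenNext()
    {
        _fileIndex++;
        _currentPath = Path.Combine(_workDirectory, _threadName + _fileIndex.ToString("D4") + ".csv");
        _writer = new StreamWriter(_currentPath, false, new UTF8Encoding(false), 1 << 16);
        _rowsInFile = 0;
    }

    // Flushes and closes the current file, then hands it over to the loader's folder.
    private void CloseCurrent()
    {
        if (_writer == null) return;
        _writer.Dispose();
        _writer = null;
        File.Move(_currentPath, Path.Combine(_doneDirectory, Path.GetFileName(_currentPath)));
    }

    public void Dispose()
    {
        CloseCurrent();   // the last, partially filled file is handed over too
    }
}
```

Moving a file only after it is closed means the loader never sees a half-written file, provided both folders are on the same volume so that the move is a rename rather than a copy.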
Efficient and speedy file I/O and buffering is a straightforward and common problem. Nothing is going to be faster than this. Then you can just write another process to do the database loads and not sweat the performance there:
while(file_name in list of files in bob's folder sorted by date for good measure)
{
    read bob's file
    load bob's file to database
    align dates, make pretty
}
And I wouldn't write that part in C#, I'd batch script it and use the database's native loader which is going to be as fast as anything you can build from scratch.
You'll have to make sure the two loops don't interfere much if you're running on the same hardware. That is, run the task threads at a higher priority, or build in some mutex or performance limiters so that the database load doesn't hog resources while the threads are running. I'd definitely segregate the database server and hardware so that file I/O to the flat files isn't compromised.
FIFO queues would work if you're on Unix, but you're not. :-)
Also, hardware is going to have more of a performance impact for you than the database engine, I'd imagine. If you're on a budget I'm guessing you're on COTS hardware, so springing for a solid state drive may up performance fairly cheaply. As I said, separating the DB storage from the flat file storage would help, and the CPU/RAM for R, the Database, and your Threads should all be segregated ideally.
What I'm saying is that choice of DB vendor probably isn't your biggest issue, unless you have a lot of money to spend. You'll be hardware bound most of the time otherwise. Database tuning is an art, and while you can eke out minor performance gains at the top end, having a good database administrator will keep most databases in the same ballpark for performance. I'd look at what R and Python support well and that you're comfortable with. If you think in columnar fashion then look at R and C#'s support for Cassandra (my vote), Hana, Lucid, HBase, Infobright, Vertica and others and pick one based on price and support. For traditional databases on a single commodity machine, I haven't seen anything that MySQL can't handle.

This is not to answer my own question but to keep track of all data bases which I tested so far and why they have not met my requirements (yet): each time I attempted to write 1 million single objects (1 long, 2 floats) to the database. For ooDBs, I stuck the objects into a collection and wrote the collection itself, similar story for key/value such as Redis but also attempted to write simple ints (1mil) to columnar dbs such as InfoBright.
Db4o: awfully slow writes. Writing 1 million objects within a collection took about 45 seconds. I later optimized the collection structure and also wrote each object individually; not much love here.
InfoBright: Same thing, very slow in terms of write speed, which surprised me quite a bit as it organizes data in columnar format but I think the "knowledge tree" only kicks in when querying data rather than when saving flat data structures/tables-like structures.
Redis (through BookSleeve): Great API for .Net: Full Redis functionality (though couple drawbacks to run the server on Windows machines vs. a Linux or Unix box). Performance was very fast...North of 1 million items per second. I serialized all objects using Protocol Buffers (protobuf-net, both written by Marc Gravell), still need to play a lot more with the library but R and Python both have full access to the Redis DB, which is a big plus. Love it so far. The Async framework that Marc wrote around the Redis base functions is awesome, really neat and it works so far. I wanna spend a little more time to experiment with the Redis Lists/Collection types as well, as I so far only serialized to byte arrays.
SqLite: I ran purely in-memory and managed to write 1 million value type elements in around 3 seconds. Not bad for a pure RDBMS, obviously the in-memory option really speeds things up. I only created one connection, one transaction, created one command, one parameter, and simply adjusted the value of the parameter within a loop and ran the ExecuteNonQuery on each iteration. The transaction commit was then run outside the loop.
HDF5: Though there is a .Net port and there also exists a library to somehow work with HDF5 files out of R, I strongly discourage anyone from doing so. It's a pure nightmare. The .Net port is very badly written; heck, the whole HDF5 concept is more than questionable. It's a very old and, in my opinion, outgrown solution to store vectorized/columnar data. This is 2012 not 1995. If one cannot completely delete datasets and vectors out of the file in which they were stored, then I do not call that an annoyance but a major design flaw. The API in general (not just .Net) is very badly designed and written imho; there are tons of class objects that nobody understands how to use without having spent hours and hours studying the file structure. I think that is somewhat evidenced by the very sparse documentation and example code that is out there. Furthermore, the h5r R library is a drama, an absolute nightmare. It's badly written as well (often the file is not correctly closed after writing due to a faulty flush, which corrupts files), the library has issues even being properly installed on 32-bit OSs...and it goes on and on. I write the most about HDF5 because I spent the most time on this piece of .... and ended up with the most frustration. The idea of a fast columnar file storage system, accessible from R and .Net, was enticing, but it just does not deliver what it promised in terms of API integration and usability.
Update: I ditched testing velocityDB simply because there does not seem to be any adapter available to access the db from within R. I am currently contemplating writing my own GUI with a charting library, which would access the generated data either from a written binary file, or have it sent over a broker-less message bus (ZeroMQ), or sent through LockFree++ to an "actor" (my GUI). I could then call R from within C# and have results returned to my GUI. That would possibly allow me the most flexibility and freedom, but would obviously also be the most tedious to code. I am running into so many limitations during my tests that with each db test I befriend this idea more and more.
RESULT: Thanks for the participation. In the end I awarded the bounty points to Chipmonkey because he suggested partly what I considered important points to the solution to my problem (though I chose my own, different solution in the end).
I ended up with a hybrid between Redis in memory storage and direct calls out of .Net to the R.dll. Redis allows access to its data stored in memory by different processes. This makes it a convenient solution to quickly store the data as key/value in Redis and to then access the same data out of R. Additionally I directly send data and invoke functions in R through its .dll and the excellent R.Net library. Passing a collection of 1 million value types to R takes about 2.3 seconds on my machine which is fast enough given that I get the convenience to just pass in the data, invoke computational functions within R out of the .Net environment and getting the results back sync or async.

Just a note: I once had a similar problem posted by a fellow in a delphi forum. I could help him with a simple ID-key-value database backend I wrote at that time (kind of a NoSQL engine). Basically, it uses a B-Tree to store triplets (32bit ObjectID, 32bit PropertyKey, 64bit Value). I could manage to save about 500k/sec Values in real time (about 5 years ago). Of course, the data was indexed on all three values (ID, property-ID and value). You could optimize this by ignoring the value index.
The source I still have is in Delphi, but I would think about implementing something like that using C#. I cannot tell you whether it will meet your needs for performance, but if all else fails, give it a try. Using a buffered write should also drastically improve performance.

I would go with an approach that combines persistent storage (I personally prefer db4o, but you can use files as well, as mentioned above) with storing objects in memory, like this:
use BlockingCollection<T> to store objects in memory (I believe you will achieve better performance than 1,000,000/s when storing objects in memory), and then have one or more processing threads which consume the objects and store them in the persistent database
// Producing thread
for (int i = 0; i < 1000000; i++)
    blockingCollection.Add(myObject);
// Consuming threads
while (true)
{
    var myObject = blockingCollection.Take();
    db4oSession.Store(myObject); // or write it to the files or whatever
}
BlockingCollection pretty much solves the producer-consumer workflow, and if you use multiple instances of it together with AddToAny/TakeFromAny, you can spread the load across several queues and consumers.
Each consuming thread could have a different db4o session (file) to reach the desired performance (db4o is single-threaded).
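A self-contained variant of the snippet above might look like the following sketch. FanOutWriter and the per-consumer List<long> sinks are hypothetical; each list merely stands in for a per-thread db4o session or file, which is not shown. It differs from the snippet in one respect: the consumers use GetConsumingEnumerable instead of while (true) with Take, so they exit cleanly once the producer calls CompleteAdding and the queue has drained.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FanOutWriter
{
    // Produces itemCount values and lets consumerCount consumers drain them.
    // Returns one list per consumer (a stand-in for one session/file per thread).
    public static List<long>[] Run(int itemCount, int consumerCount)
    {
        List<long>[] sinks = Enumerable.Range(0, consumerCount)
            .Select(_ => new List<long>())
            .ToArray();

        // A bounded capacity makes the producer block instead of exhausting memory
        // when the consumers cannot keep up.
        using (var queue = new BlockingCollection<long>(10000))
        {
            Task[] consumers = sinks.Select(sink => Task.Run(() =>
            {
                // Each consumer writes only to its own sink, so the sinks need no locking.
                foreach (long item in queue.GetConsumingEnumerable())
                    sink.Add(item);
            })).ToArray();

            for (long i = 0; i < itemCount; i++)
                queue.Add(i);

            queue.CompleteAdding();   // lets the consumers' loops end after draining
            Task.WaitAll(consumers);  // all items are stored once this returns
        }
        return sinks;
    }
}
```

After FanOutWriter.Run(100000, 4), the four lists together should hold exactly 100,000 items, with each item appearing in exactly one list. How the items are split between consumers depends on scheduling and will vary between runs.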

Since you want to use ZeroMQ why not use memcache over Redis?
ZeroMQ offers no persistence as far as I know. Memcache also offers no persistence and is a bit faster than Redis.
Or perhaps the other way, if you use Redis why not use beanstalk MQ?
If you want to use Redis (for the persistence) you might want to switch from ZeroMQ to beanstalk MQ (also a fast in memory queue, but also has persistence via logging). Beanstalk also has C# libs.

Best concurrency framework for low latency, high throughput data transfer on single machine [closed]

I am looking for ideas how a concurrent framework might be implemented for my specific architecture, using C#:
I implemented several modules/containers (implemented as classes) that are all individually to connect to a message bus. Each module either mainly produces or mainly consumes, but all modules also implement a request/reply pattern for communication between two given modules. I am very new to concurrent and asynchronous programming but essentially want to run the whole architecture in a concurrent way rather than synchronously. I would really appreciate some pointers which technology (TPL, ThreadPool, CTP, open source libraries,..) to consider for my specific use case, given the following requirements:
The whole system only runs on a local machine (in-process, even the message bus)
At least one module performs heavy IO (several million 16byte messages per second reads from physical drive), publishing multiple 16byte chunks to a blocking collection throughout the whole time.
Another module consumes from the blocking collection throughout the whole time.
The entry point is the producer starting to publish messages, exit when the producer finishes publishing a finite set of 16byte messages.
The only communication that circumvents the message bus is the publishing/consuming to/from the blocking collection for throughput and latency reasons. (Am happy to hear suggestions to get rid of the message bus if it is plausible)
Other modules handle operations such as writing to an SQL database, publishing to a GUI server, and connecting to APIs that communicate with outside servers. Such operations run less frequently or are throttled, and could potentially be run as tasks rather than occupying a whole thread for as long as the system runs.
I run on a 64-bit, quad-core, 16 GB machine, but ideally I would like to implement a solution that can also run on a dual-core machine.
Given what I would like to manage, what concurrency implementation would you suggest I focus on?
EDIT: I like to emphasize that the biggest problem I am facing is how to conveniently hook up each container/module to a thread/task pool so that each of the modules runs async while still providing full in and out communication between such modules. I am not too concerned with optimizing a single producer/consumer pattern before I have not solved hooking up all the modules to a concurrent platform that can handle the number of tasks/threads involved dynamically.

I found n-act http://code.google.com/p/n-act/ , an Actors framework for .Net which implements pretty much what I am looking for. I described in my question that I look for bigger picture framework suggestions and it looks to me that an Actor Framework solves what I need. I am not saying that the n-act library will be what I implement but it is a neat example of setting up actors that can communicate asynchronously and can run on their own threads. Message passing also supports the new C#5 async/await functionality.
Disruptor was mentioned above and also the TPL and couple other ideas and I appreciate the input, it actually really got me thinking and I spent quite a bit of time to understand what each library/framework attempts to target and what problems it tries to solve, so the input was very fruitful.
For my particular case, however, I believe the Actors Framework is exactly what I need because my main concern is the exchange of async data flow. Unfortunately I do not see much of the Actor model implemented in any .Net technology (yet). TPL Dataflow looks very promising but as Weismat pointed out it is not yet production ready.
If N-Act does not prove stable or usable then I will look for a custom implementation through the TPL. It's about time anyway to fully understand all that TPL has to offer and start thinking concurrently already at the design stage rather than trying to transfer synchronous models into an asynchronous framework.
In summary, "Actor Model" was what I was looking for.

I recommend disruptor-net for a task like this, where you have high throughput, low latency, and a well-defined dataflow.
If you're willing to sacrifice some performance for some thread management, TPL Dataflow might work for you. It does a good job of using TPL for task scheduling.

You may look into the Concurrency and Coordination Runtime (CCR) as well if you are looking for a framework-based concurrency solution. I think this might be a fit for your design ideas.
Otherwise I would follow the rule that threads should be used when something will be running for the whole lifetime of your application, and tasks for short-running items.
I believe it is more important that the responsibility for the concurrency is clearly defined, so that you can change the framework later.
As usual when writing fast code, there are no rules of thumb, only the need for a lot of testing with small stubs and measuring the actual performance.
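To make the thread-versus-task rule concrete, here is a small sketch under assumed names (ThreadsVersusTasks, Run, and the int messages are all hypothetical). The consumer that lives as long as the "application" gets its own dedicated Thread, while the short, occasional work items are scheduled as thread-pool tasks.

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadsVersusTasks
{
    // Publishes the values 0..messageCount-1 from short-lived tasks and sums
    // them on one long-lived consumer thread. Returns the sum.
    public static long Run(int messageCount)
    {
        long sum = 0;
        using (var inbox = new BlockingCollection<int>())
        {
            // Long-lived work: a dedicated thread, so it never occupies a pool thread.
            var consumer = new Thread(() =>
            {
                foreach (int message in inbox.GetConsumingEnumerable())
                    sum += message;   // only this thread writes sum while it runs
            });
            consumer.IsBackground = true;
            consumer.Start();

            // Short-lived work: many small tasks that the pool schedules as it sees fit.
            Task[] publishers = Enumerable.Range(0, messageCount)
                .Select(i => Task.Run(() => inbox.Add(i)))
                .ToArray();
            Task.WaitAll(publishers);

            inbox.CompleteAdding();
            consumer.Join();          // sum is safe to read after the thread has ended
        }
        return sum;
    }
}
```

ThreadsVersusTasks.Run(100) should return 4950. Task.Factory.StartNew with TaskCreationOptions.LongRunning is an alternative way to request a dedicated thread while still getting a Task back.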

What to make parallel? What will make me better? (.net Web Business Application, MVC+SL)

I'm working on a web application framework, which uses MSSQL for data storage, mostly just does CRUD operations (but on arbitrarily complex structures), provides a WCF interface for rich Silverlight admin and has an MVC3 display (and some basic forms like user settings, etc).
It's getting quite good at being able to load, display, edit and save any (reasonably) complex data structure, in a user-friendly way.
But I'm looking towards the future, and want to expand my capabilities (and it would be fun to learn new things along the way as well...), so I've decided (in the light of what's coming for C#5...) to try to get some parallel/async optimization... Now, I haven't even learned TPL and PLINQ yet, so I'm happy for any advice there as well.
So my question is: what are the possible areas where parallel processing may be of help, and where do TPL and PLINQ help me with that?
My gut tells me I could try saving branches of a data structure to the database in parallel (this is where I'd expect the biggest performance optimization), I could perform some complex operations (file upload, mail sending maybe?) in a multithreaded environment, etc. Can I build complex SL UI views in parallel on the client? (Creating 60 data-bound fields on a view can cause "blinking"...) Can I create partial views (menus, category trees, search forms, etc) in MVC at once?
ps: If this turns into "Tell me everything about parallel stuffs" thread, I'm happy to make it community-wiki...

Remember that an asp.net web application is intrinsically a parallel application in any case. Requests can be serviced in parallel and this will all be managed by the asp.net framework. So there are two cases:
You have lots of users all hitting the site at once. In which case the parallel processing capability of the server is probably being used to capacity in any case.
You don't have lots of users all hitting the site at once. In which case the server is probably quite capable of dealing with the responses without parallel processing in a suitable fast response time.
Any time you start thinking about optimising something just because it might be fun, or because you just think you should make stuff faster, you are almost certainly guilty of premature optimization. Your efforts could almost certainly be better spent enriching the functionality of the framework, rather than making what is probably a plenty fast enough solution a little bit faster (at the cost of significantly increased complexity).
In answer to the question of where can TPL and PLINQ really help. In my opinion the main advantage of these technologies is in places in the application where you really do have a lot of long running blocking processes. For example if you have a situation where you call out several times to an external web service - it can be a significant advantage to make these calls in parallel. I would strongly question whether writing to a local database - or even a database on a different box on a local network would count as being a long running blocking process to the extent that this kind of parallelisation is of any significant value.
Pretty much all the examples you list fall into the category of getting the PC to do something in parallel that it was previously doing in sequence. How many CPUs are on your server, and how many are really free when the website is under load? Making something parallel does not necessarily equate to making it faster unless the process involved has some measure of time when your PC is sitting around doing nothing, waiting for an external event.
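The web service case is the one where the waiting time really can overlap. A minimal sketch, with the remote call simulated by Task.Delay (ParallelCalls, CallServiceAsync, and the doubled return value are hypothetical placeholders for a real service client):

```csharp
using System.Linq;
using System.Threading.Tasks;

public static class ParallelCalls
{
    // Hypothetical stand-in for a remote call: waits like a network round trip,
    // then returns a value derived from the request.
    private static async Task<int> CallServiceAsync(int id, int latencyMs)
    {
        await Task.Delay(latencyMs);   // no thread is blocked while waiting
        return id * 2;
    }

    // Starts all calls at once and waits for every one of them.
    // The results come back in the same order as the ids.
    public static async Task<int[]> CallAllAsync(int[] ids, int latencyMs)
    {
        return await Task.WhenAll(ids.Select(id => CallServiceAsync(id, latencyMs)));
    }
}
```

With three ids and a 200 ms latency, the combined call should take roughly as long as a single call rather than three, because the waits overlap. The same pattern buys nothing when the work is CPU-bound and the cores are already busy serving other requests.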

First question is to ask the users / testers which bits seem slow. The only way to know for sure what's slowing you down is to use a profiler like dottrace. The results are sometimes surprising.
If you do find something, parallel processing may not be the answer. You need to remember that there is an overhead in splitting tasks up, so if the task is fairly quick in the first place, it could end up being slower. You also have to consider the added complexity, e.g. what happens if half a task succeeds, and half fails? (Although TPL and PLINQ shield you from this to an extent.)
Have fun, but I'm wondering whether this is a case of 1) a solution chasing a problem, and 2) premature optimization.

C# Threading in real-world apps

Learning about threading is fascinating no doubt and there are some really good resources to do that. But, my question is threading applied explicitly either as part of design or development in real-world applications.
I have worked on some extensively used and well-architected .NET apps in C# but found no trace of explicit usage. Is there no real need because this is managed by the CLR, or is there some specific reason?
Also, any examples of threading coded in widely used .NET apps on CodePlex or Google Code are welcome.

The simplest place to use threading is performing a long operation in a GUI while keeping the UI responsive.
If you perform the operation on the UI thread, the entire GUI will freeze until it finishes. (Because it won't run a message loop)
By executing it on a background thread, the UI will remain responsive.
The BackgroundWorker class is very useful here.

is threading applied explicitly either as part of design or development in real-world applications.
In order to take full advantage of modern, multi-core systems, threading must be part of the design from the start. While it's fairly easy (especially in .NET 4) to find small portions of code to thread, to get real scalability, you need to design your algorithms to handle being threaded, preferably at a "high level" in your code. The earlier this is done in the design phases, the easier it is to properly build threading into an application.
Is there no real need due to this being managed by CLR or is there any specific reason?
There is definitely a need. Threading doesn't come for free - it must be added in by the developer. The main reason this isn't found very often, especially in open source code, is really more a matter of difficulty. Even using .NET 4, properly designing algorithms to thread in a scalable, safe manner is difficult.

That entirely depends on the application.
For a client app that ever needs to do any significant work (or perform other potentially long-running tasks, such as making web service calls) I'd expect background threads to be used. This could be achieved via BackgroundWorker, explicit use of the thread pool, explicit use of Parallel Extensions, or creating new threads explicitly.
Web services and web applications are somewhat less likely to create their own threads, in my experience. You're more likely to effectively treat each request as having a separate thread (even if ASP.NET moves it around internally) and perform everything synchronously. Of course there are web applications which either execute asynchronously or start threads for other reasons - but I'd say this comes up less often than in client apps.

Definitely a +1 on the Parallel Extensions to .NET. Microsoft has done some great work here to improve the ThreadPool. You used to have one global queue which handled all tasks, even if they were spawned from a worker thread. Now they have a lock-free global queue and local queues for each worker thread. That's a very nice improvement.
I'm not as big a fan of things like Parallel.For, Parallel.ForEach, and Parallel.Invoke (regions), as I believe they should be pure language extensions rather than class libraries. Obviously, I understand why we have this intermediate step, but it's inevitable for C# to gain language improvements for concurrency and it's equally inevitable that we'll have to go back and change our code to take advantage of it :-)
Overall, if you're looking at building concurrent apps in .NET, you owe it to yourself to research the heck out of the Parallel Extensions. I also think, given that this is a pretty nascent effort from Microsoft, you should be very vocal about what works for you and what doesn't, independent of what you perceive your own skill level to be with concurrency. Microsoft is definitely listening, but I don't think there are that many people yet using the Parallel Extensions. I was at VSLive Redmond yesterday and watched a session on this topic and continue to be impressed with the team working on this.
Disclosure: I used to be the Marketing Director for Visual Studio and am now at a startup called Corensic where we're building tools to detect bugs in concurrent apps.

Most real-world uses of threading I've seen are simply there to avoid blocking - UI, network, database calls, etc.
You might see it in use as BeginXXX and EndXXX method pairs, delegate.BeginInvoke calls, Control.Invoke calls.
Some systems I've seen, where threading would be a boon, actually use the isolation principle to achieve multiple "threads", in other words, split the work down into completely unrelated chunks and process them all independently of each other - "multi-threading" (or many-core utilisation) is automagically achieved by simply running all the processes at once.
I think it's fair to say that a lot of stock-in-trade applications (data presentation) largely do not require massive parallelisation, nor can they always be architected to suit it. The examples I've seen are all very specific problems. This may contribute to why you've not seen any noticeable implementations of it.

The question of whether to make use of an explicit threading implementation is normally a design consideration as others have mentioned here. Trying to implement concurrency as an afterthought usually requires a lot of radical and wholesale changes.
Keep in mind that simply throwing threads into an application doesn't inherently increase performance or speed, given that there is a cost in managing each thread, and also perhaps some memory overhead (not to mention, debugging it can be fun).
From my experience, the most common place to implement a threading design has been in Windows Services (background applications) and on applications which have had use case scenarios where a volume of work could be easily split up into smaller parcels of work (and handed off to threads to complete asynchronously).
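A minimal sketch of the "split the work into parcels" idea, assuming the work divides evenly and the parcels share no mutable state. The data, the parcel count and the summing job are all made up for illustration.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

static class ParcelSketch
{
    static void Main()
    {
        int[] data = Enumerable.Range(1, 1000).ToArray();
        const int parcelCount = 4;
        int parcelSize = data.Length / parcelCount; // assumes an even split

        var tasks = new Task<long>[parcelCount];
        for (int p = 0; p < parcelCount; p++)
        {
            int start = p * parcelSize; // local copy so each lambda captures its own value
            tasks[p] = Task.Factory.StartNew(() =>
            {
                long sum = 0;
                for (int i = start; i < start + parcelSize; i++) sum += data[i];
                return sum;
            });
        }

        // Wait for every parcel, then combine the partial results.
        Task.WaitAll(tasks);
        long total = tasks.Sum(t => t.Result);
        Console.WriteLine(total); // prints 500500
    }
}
```

Each parcel only reads the shared array and writes to its own local sum, which is why no locking is needed here.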
As for examples, you could check out the Microsoft Robotics Studio (as far as I know there's a free version now) - it comes with a redistributable (I can't find it as a standalone download) of the Concurrency and Coordination Runtime, and there's some coverage of it on Microsoft's Channel 9.
As mentioned by others the Parallel Extensions team (blog is here) have done some great work with thread safety and parallel execution and you can find some samples/examples on the MSDN Code site.

Threading is used in all sorts of scenarios. Anything network based depends on threading, whether explicit (sockets stuff) or implicit (web services). Threading keeps the UI responsive. And Windows services often run several parallel workers that all do the same thing: working through queues of data that need to be processed.
Those are just the most common ones I've seen.

Most answers reference long-running tasks in a GUI application. Another very common usage scenario in my experience is Producer/Consumer queues. We have many utility applications that have to perform web requests etc., often to a large number of endpoints. We use a producer/consumer threading pattern (usually by integrating a custom thread pool) to allow high parallelization of these tasks.
In fact, at this very moment I am checking up on an application that uploads a 200MB file to 200 different FTP locations. We use SmartThreadPool and run up to around 50 uploads in parallel, which allows the whole batch to complete in under one hour (as opposed to over 50 hours were all the uploads to happen consecutively - so in our usage we find almost straight linear improvements in time).
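For anyone who wants to try the pattern without a third-party pool, here is a minimal sketch using BlockingCollection from .NET 4 instead of SmartThreadPool (so this is not the setup described above, just the same shape). The item and consumer counts are scaled-down assumptions, and the Thread.Sleep stands in for an upload.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class ProducerConsumerSketch
{
    static void Main()
    {
        const int consumerCount = 4; // assumption: kept small for the sketch
        const int itemCount = 20;    // assumption: stand-in for the upload targets
        var queue = new BlockingCollection<int>(boundedCapacity: 8);
        int processed = 0;

        var consumers = new Task[consumerCount];
        for (int c = 0; c < consumerCount; c++)
        {
            consumers[c] = Task.Factory.StartNew(() =>
            {
                // Blocks while the queue is empty and ends only once
                // CompleteAdding has been called AND the queue is drained.
                foreach (int item in queue.GetConsumingEnumerable())
                {
                    Thread.Sleep(10); // simulate one upload
                    Interlocked.Increment(ref processed);
                }
            }, TaskCreationOptions.LongRunning);
        }

        // Producer: Add blocks when the bounded queue is full.
        for (int i = 0; i < itemCount; i++) queue.Add(i);
        queue.CompleteAdding(); // signal that no more items will arrive

        Task.WaitAll(consumers);
        Console.WriteLine("Processed " + processed + " of " + itemCount);
    }
}
```

The consumers never poll or sleep waiting for work; shutdown is driven by CompleteAdding, so the last items in the queue are still handed to a consumer before the loops exit.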

As modern day programmers we love abstractions so we use threads by calling Async methods or BeginInvoke and by using things like BackgroundWorker or PFX in .Net 4.
Yet sometimes there is a need to do the threading yourself. For example, in a web app I built, I have a mail queue that I add to from within the app, and a background thread that sends the emails. If the thread notices that the queue is filling up faster than it is sending, it creates another thread; if it then sees that that thread is idle, it kills it. This could be done with a higher-level abstraction, I guess, but I did it manually.

I can't resist the edge case - in some applications where either a high degree of operational certainty must be achieved or a high degree of operational uncertainty must be tolerated, threads and processes are considered from initial architecture design all the way through end delivery.
Case 1 - for systems that must achieve extremely high levels of operational reliability, three completely separate subsystems using three different mechanisms may be used in a voting architecture - spawn 3 threads/processes across each of the voters, wait for them to conclude/die/be killed, and proceed IFF they all say the same thing - example - complex avionic subsystems
Case 2 - for systems that must deal with a high degree of operational uncertainty - do the same thing, but once something/anything gets back to you, kill off the stragglers and go forth with the best answer you got - example - complex intraday trading algorithms endeavoring to destroy the businesses that employ them :-)
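A toy sketch of both cases using tasks, purely to show the shape. The Compute method, its delays and its answer are hypothetical; a real voting system would run genuinely independent implementations, and "killing" here is cooperative cancellation rather than a hard kill.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class RedundancySketch
{
    // Hypothetical voter: takes delayMs to answer, checking for cancellation as it goes.
    static int Compute(int delayMs, CancellationToken token)
    {
        for (int waited = 0; waited < delayMs; waited += 10)
        {
            token.ThrowIfCancellationRequested();
            Thread.Sleep(10);
        }
        return 42;
    }

    static void Main()
    {
        // Case 1: wait for all voters and proceed only if they agree.
        Task<int>[] voters = new[] { 30, 60, 90 }
            .Select(d => Task.Factory.StartNew(() => Compute(d, CancellationToken.None)))
            .ToArray();
        Task.WaitAll(voters);
        bool unanimous = voters.All(v => v.Result == voters[0].Result);
        Console.WriteLine("Unanimous: " + unanimous);

        // Case 2: take the first answer and cancel the stragglers.
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken token = cts.Token;
            Task<int>[] racers = new[] { 30, 300, 600 }
                .Select(d => Task.Factory.StartNew(() => Compute(d, token), token))
                .ToArray();
            int winner = Task.WaitAny(racers); // index of the first task to complete
            int answer = racers[winner].Result;
            cts.Cancel();                      // ask the remaining tasks to stop
            Console.WriteLine("First answer: " + answer);
        }
    }
}
```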

Parallelism in .Net

I have been asked to show the benefits and limitations of Parallelism and evaluate it for use within our company. We are predominantly a data orientated business, and essentially load objects from the database, then put them through some business logic, display to the user, then save back to the DB. In my mind, there isn't too much in that pipeline that would benefit from running in parallel, but being fairly new to the concept, I could be completely wrong. Would there be any part of that simple pipeline that would benefit from running in parallel? And are there any guidelines for how to implement this style of programming?
Also, are there any tools (preferably that come with VS2010) that would show where bottlenecks occur and would be able to visually show what's going on when I click "Go" on a simple app that runs a given number of loops (pre-written simple maths loops e.g. for i as integer = 1 to 1000 - do some calculations) in parallel, then in series?
I need to be able to display the difference using a decent profiling tool.

Yes, even with that simple model you could benefit greatly from parallelism.
Say for instance that during a load of your data you're doing something like this:
foreach(var datarow in someDataSet)
{
//put your data into some business objects here
}
you could optimize this with parallelism by doing something like this:
Parallel.ForEach(someDataSet, datarow =>
{
//put your data into some business objects here
});
This could greatly increase your performance, depending on how much data you're processing here and how much work each row needs.
The data rows can now be processed concurrently on several threads instead of strictly in sequence as in the typical foreach loop. Note that Parallel.ForEach itself still blocks until every row has been processed, and the loop body must not touch shared state without synchronization.
My suggestion to you would be to run some simple performance tests on an example as simple as this one and see what kind of results you get. Plot it out in a spreadsheet or something, and show it to your team. You might be surprised by the results you get.
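A minimal sketch of such a test, timing the same work sequentially and with Parallel.ForEach. The Work method is a made-up CPU-bound stand-in for building business objects, and the row count and loop size are arbitrary assumptions; the timings will vary by machine.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

static class LoopTimingSketch
{
    // Hypothetical CPU-bound stand-in for "put your data into business objects".
    static double Work(int seed)
    {
        double acc = 0;
        for (int i = 1; i <= 200000; i++) acc += Math.Sqrt(seed + i);
        return acc;
    }

    static void Main()
    {
        int[] rows = Enumerable.Range(0, 200).ToArray();
        var results = new double[rows.Length];

        var sw = Stopwatch.StartNew();
        foreach (int row in rows) results[row] = Work(row);
        sw.Stop();
        Console.WriteLine("Sequential: " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        // Each iteration writes to its own array slot, so no locking is needed.
        Parallel.ForEach(rows, row => { results[row] = Work(row); });
        sw.Stop();
        Console.WriteLine("Parallel:   " + sw.ElapsedMilliseconds + " ms");
    }
}
```

Run it a few times in a Release build before drawing conclusions; for very cheap loop bodies the parallel version can easily come out slower because of the scheduling overhead.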

You may reap more benefit from implementing a caching layer (distributed or otherwise) than parallelizing your current pipeline.
With a caching layer, the objects you use frequently will reside in the in-memory cache, allowing for much greater read/write performance. There are a number of options for keeping the cache in sync, and these will vary depending on which vendor you choose.
I'd suggest having a look at MemCached and NCache and see if you think they would be a good fit.
EDIT: As far as profiling tools go, I've used dotTrace extensively and would highly recommend it. You can download a 30 day trial from JetBrains' website.

Certainly there are many tasks that can be parallelized. A detailed analysis can help, but the bottlenecks are the likely candidates.
This material can help you: Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4

Possibly, but my general response to this sort of query would typically be - Do you have any performance problems in your application(s)? If yes then by all means investigate why and consider whether parallel execution can help. If not then time is probably best spent elsewhere.

Have you checked out Microsoft's Parallel Computing with Managed Code site? It contains several articles on implementation guidelines discussing both when and how to use .Net 4's parallel features.
