How to make my code run on multiple cores?

How to make my code run on multiple cores? - c#

I have built an application in C# that I would like to be optimized for multiple cores. I have some threads, should I do more?
Updated for more detail
C# 2.0
Run on Windows Vista and Windows Server 2003
Updated again
This code is running as a service
I do not want to have the complete code... my goal here is to get your experience and how to start. Like I say, I have already use threads. What more can I do?

I'd generalize that writing a highly optimized multi-threaded process is a lot harder than just throwing some threads in the mix.
I recommend starting with the following steps:
Split up your workloads into discrete parallel executable units
Measure and characterize workload types - Network intensive, I/O intensive, CPU intensive etc - these become the basis for your worker pooling strategies. e.g. you can have pretty large pools of workers for network intensive applications, but it doesn't make sense having more workers than hardware-threads for CPU intensive tasks.
Think about queuing/array or ThreadWorkerPool to manage pools of threads. Former more finegrain controlled than latter.
Learn to prefer async I/O patterns over sync patterns if you can - frees more CPU time to perform other tasks.
Work to eliminate or atleast reduce serialization around contended resources such as disk.
Minimize I/O, acquire and hold minimum level of locks for minimum period possible. (Reader/Writer locks are your friend)
5.Comb through that code to ensure that resources are locked in consistent sequence to minimize deadly embrace.
Test like crazy - race conditions and bugs in multithreaded applications are hellish to troubleshoot - often you only see the forensic aftermath of the massacre.
Bear in mind that it is entirely possible that a multi-threaded version could perform worse than a single-threaded version of the same app. There is no excuse for good engineering measurement.

You might want to take a look at the parallel extensions for .NET
http://msdn.com/concurrency

You might want to read Herb Sutter's column 'Effective Concurrency'. You'll find those articles here, with others.

To be able to utilize multiple cores more efficiently you should divide your work up in parts that can be executed in parallel and use threads to divide the work over the cores. You could use threads, background workers, thread pools, etc

For C#, start learning the LINQ-way of doing things, then make use of the Parallel LINQ library and its .AsParallel() extension.

Understanding the parallelism (or potential for parallelism) in the problem(s) you are trying to solve, your application and its algorithms is much more important than any details of thread synchronization, libraries, etc.
Start by reading Patterns for Parallel Programming (which focuses on 'finding concurrency' and higher-level design issues), and then move on to The Art of Multiprocessor Programming (practical details starting from a theoretical basis).

Related

Alternative to threads for implementing massive parallel calculation engine in c# or java?

Suppose there is a need for building a spreadsheet-like engine that needs to be ultra fast, each cell dependencies could be on parallel calculation branch. Could thread be created for each parallel branch ? Isn't thread costfull in term of memory. Easily you could think that with 1000 formulas rows or even 1 million you would have to create same number of threads is it realistic ?
If it isn't realistic is there an alternative to threads for this kind of scenario ?

In modern Java programming, you should avoid threads altogether, and instead use executors. The rest of the world calls them working queues. See Item 68 in Effective Java by Joshua Bloch.
Personally, I strongly prefer the APIs of Grand Central Dispatch. The Java version is called HawtDispatch. That API is simpler, and just works.

For CPU intensive tasks, the optimal number of threads is usually the same number of CPUs. The overhead of creating threads can be much higher than the work that thread does if you are not careful.
Its worth nothing that CPU is often not the main issue. Often memory bandwidth or cache utilisation is more of an issue, in which case having one thread efficiently written can out perform attempting to distribute work across many thread. If the work each thread does is CPU intensive, and uses relatively less memory bandwidth, having multiple threads can help.

Your best bet is Task Parallel Library or Fork/Join Framework in Java. They do use threads but optimize the number of threads and put work items on a work queue for you. They take care of a lot of low level optimization problems in really clever ways. You just use constructs like Parallel.For, etc.

The only thing besides threads that comes to mind are SIMD commands (unless you want to use special hardware which means you'd have to use a lower lvl Language). You'd have to use a external Library for these tough to gain access to the Processors/Gaphic Cards functions. Also CUDA or OpenCL might interest you.
On the other Hand you normally don't want to create that many threads as you described, you could use a thread Pool, with a fixed or dynamic amount of threads, that manages how many threads are created and executes tasks from a queue. Also there is a Fork/Join Feature in Java 7 which helps with thread management.
I'd say have a look at thread pools, with these you can balance out the overhead created from too many threads.
Since you are looking for Information this might help out a bit for threads too.

Please take also a look at Ateji PX. It's an extension to the java language for parallelization that may help you. It was a commercial product but meanwhile it has become available for free.

The Task Parallel Library can help you utilize the CPU as much as possible, and does most of the heavy lifting of thread creation for you.
If you have a very large number of (very) parallelizable computations, and you need the absolute best performance you can have, you will have too look beyond the cpu. There are alternatives that combines LINQ/TPL with the GPU such as MS. Accelerator and Brahma. See for example Utilizing the GPU with c#

C# Threading in real-world apps

Learning about threading is fascinating no doubt and there are some really good resources to do that. But, my question is threading applied explicitly either as part of design or development in real-world applications.
I have worked on some extensively used and well-architected .NET apps in C# but found no trace of explicit usage.Is there no real need due to this being managed by CLR or is there any specific reason?
Also, any example of threading coded in widely used .NET apps. in Codelplex or Gooogle Code are also welcome.

The simplest place to use threading is performing a long operation in a GUI while keeping the UI responsive.
If you perform the operation on the UI thread, the entire GUI will freeze until it finishes. (Because it won't run a message loop)
By executing it on a background thread, the UI will remain responsive.
The BackgroundWorker class is very useful here.

is threading applied explicitly either as part of design or development in real-world applications.
In order to take full advantage of modern, multi-core systems, threading must be part of the design from the start. While it's fairly easy (especially in .NET 4) to find small portions of code to thread, to get real scalability, you need to design your algorithms to handle being threaded, preferably at a "high level" in your code. The earlier this is done in the design phases, the easier it is to properly build threading into an application.
Is there no real need due to this being managed by CLR or is there any specific reason?
There is definitely a need. Threading doesn't come for free - it must be added in by the developer. The main reason this isn't found very often, especially in open source code, is really more a matter of difficulty. Even using .NET 4, properly designing algorithms to thread in a scalable, safe manner is difficult.

That entirely depends on the application.
For a client app that ever needs to do any significant work (or perform other potentially long-running tasks, such as making web service calls) I'd expect background threads to be used. This could be achieved via BackgroundWorker, explicit use of the thread pool, explicit use of Parallel Extensions, or creating new threads explicitly.
Web services and web applications are somewhat less likely to create their own threads, in my experience. You're more likely to effectively treat each request as having a separate thread (even if ASP.NET moves it around internally) and perform everything synchronously. Of course there are web applications which either execute asynchronously or start threads for other reasons - but I'd say this comes up less often than in client apps.

Definitely a +1 on the Parallel Extensions to .NET. Microsoft has done some great work here to improve the ThreadPool. You used to have one global queue which handled all tasks, even if they were spawned from a worker thread. Now they have a lock-free global queue and local queues for each worker thread. That's a very nice improvement.
I'm not as big a fan of things like Parallel.For, Parallel.Foreach, and Parallel.Invoke (regions), as I believe they should be pure language extensions rather than class libraries. Obviously, I understand why we have this intermediate step, but it's inevitable for C# to gain language improvements for concurrency and it's equally inevitable that we'll have to go back and change our code to take advantage of it :-)
Overall, if you're looking at building concurrent apps in .NET, you owe it to yourself to research the heck out of the Parallel Extensions. I also think, given that this is a pretty nascent effort from Microsoft, you should be very vocal about what works for you and what doesn't, independent of what you perceive your own skill level to be with concurrency. Microsoft is definitely listening, but I don't think there are that many people yet using the Parallel Extensions. I was at VSLive Redmond yesterday and watched a session on this topic and continue to be impressed with the team working on this.
Disclosure: I used to be the Marketing Director for Visual Studio and am now at a startup called Corensic where we're building tools to detect bugs in concurrent apps.

Most real-world usages of threading I've seen is to simply avoid blocking - UI, network, database calls, etc.
You might see it in use as BeginXXX and EndXXX method pairs, delegate.BeginInvoke calls, Control.Invoke calls.
Some systems I've seen, where threading would be a boon, actually use the isolation principle to achieve multiple "threads", in other words, split the work down into completely unrelated chunks and process them all independently of each other - "multi-threading" (or many-core utilisation) is automagically achieved by simply running all the processes at once.
I think it's fair to say you find a lot of stock-and-trade applications (data presentation) largely do not require massive parallisation, nor are they always able to be architected to be suitable for it. The examples I've seen are all very specific problems. This may attribute to why you've not seen any noticable implementations of it.

The question of whether to make use of an explicit threading implementation is normally a design consideration as others have mentioned here. Trying to implement concurrency as an afterthought usually requires a lot of radical and wholesale changes.
Keep in mind that simply throwing threads into an application doesn't inherently increase performance or speed, given that there is a cost in managing each thread, and also perhaps some memory overhead (not to mention, debugging it can be fun).
From my experience, the most common place to implement a threading design has been in Windows Services (background applications) and on applications which have had use case scenarios where a volume of work could be easily split up into smaller parcels of work (and handed off to threads to complete asynchronously).
As for examples, you could check out the Microsoft Robotics Studio (as far as I know there's a free version now) - it comes with an redistributable (I can't find it as a standalone download) of the Concurrency and Coordination Runtime, there's some coverage of it on Microsoft's Channel 9.
As mentioned by others the Parallel Extensions team (blog is here) have done some great work with thread safety and parallel execution and you can find some samples/examples on the MSDN Code site.

Threading is used in all sorts of scenarios, anything network based depends on threading, whether explicit (sockets stuff) or implicit (web services). Threading keeps UI responsive. And windows services having multiple parallel runs doing the same things in processing data working through queues that need to be processed.
Those are just the most common ones I've seen.

Most answers reference long-running tasks in a GUI application. Another very common usage scenario in my experience is Producer/Consumer queues. We have many utility applications that have to perform web requests etc. often to large number of endpoints. We use producer/consumer threading pattern (usually by integrating a custom thread pool) to allow high parallelization of these tasks.
In fact, at this very moment I am checking up on an application that uploads a 200MB file to 200 different FTP locations. We use SmartThreadPool and run up to around 50 uploads in parallel, which allows the whole batch to complete in under one hour (as opposed to over 50 hours were it all uploads to happen consecutively - so in our usage we find almost straight linear improvements in time).

As modern day programmers we love abstractions so we use threads by calling Async methods or BeginInvoke and by using things like BackgroundWorker or PFX in .Net 4.
Yet sometimes there is a need to do the threading yourself. For Example in a web app I built I have a mail queue that I add to from within the app and there is a background thread that sends the emails. If the thread notices that the queue is filling up faster that it is sending it creates another thread if it then sees that that thread is idle it kills it. This can be done with a higher level abstraction I guess but i did it manually.

I can't resist the edge case - in some applications where either a high degree of operational certainty must be achieved or a high degree of operational uncertainty must be tolerated, then threads and processes are considered from initial architecture design all the way through end delivery
Case 1 - for systems that must achieve extremely high levels of operational reliability, three completely separate subsystems using three different mechanisms may be used in a voting architecture - Spawn 3 threads/proceses across each of the voters, wait for them to conclude/die/be killed, and proceed IFF they all say the same thing - example - complex avionic susystems
Case 2 - for systems that must deal with a high degree of operational uncertainty - do the same thing, but once something/anything gets back to you, kill off the stragglers and go forth with the best answer you got - example - complex intraday trading algorithms endeavoring to destroy the business that employ them :-)

Multicore programming: the hard parts

I'm writing a book on multicore programming using .NET 4 and I'm curious to know what parts of multicore programming people have found difficult to grok or anticipate being difficult to grok?

What's a useful unit of work to parallelize, and how do I find/organize one?
All these parallelism primitives aren't helpful if you fork a piece of work that is smaller than the forking overhead; in fact, that buys you a nice slowdown instead of what you are expecting.
So one of the big problems is finding units of work that are obviously more expensive than the parallelism primitives. A key problem here is that nobody knows what anything costs to execute, including the parallelism primitives themselves. Clearly calibrating these costs would be very helpful. (As an aside, we designed, implemented, and daily use a parallel programming langauge, PARLANSE whose objective was to minimize the cost of the parallelism primitives by allowing the compiler to generate and optimize them, with the goal of making smaller bits of work "more parallelizable").
One might also consider discussion big-Oh notation and its applications. We all hope that the parallelism primitives have cost O(1). If that's the case, then if you find work with cost O(x) > O(1) then that work is a good candidate for parallelization. If your proposed work is also O(1), then whether it is effective or not depends on the constant factors and we are back to calibration as above.
There's the problem of collecting work into large enough units, if none of the pieces are large enough. Code motion, algorithm replacement, ... are all useful ideas to achieve this effect.
Lastly, there's the problem of synchnonization: when do my parallel units have to interact, what primitives should I use, and how much do those primitives cost? (More than you expect!).

I guess some of it depends on how basic or advanced the book/audience is. When you go from single-threaded to multi-threaded programming for the first time, you typically fall off a huge cliff (and many never recover, see e.g. all the muddled questions about Control.Invoke).
Anyway, to add some thoughts that are less about the programming itself, and more about the other related tasks in the software process:
Measuring: deciding what metric you are aiming to improve, measuring it correctly (it is so easy to accidentally measure the wrong thing), using the right tools, differentiating signal versus noise, interpreting the results and understanding why they are as they are.
Testing: how to write tests that tolerate unimportant non-determinism/interleavings, but still pin down correct program behavior.
Debugging: tools, strategies, when "hard to debug" implies feedback to improve your code/design and better partition mutable state, etc.
Physical versus logical thread affinity: understanding the GUI thread, understanding how e.g. an F# MailboxProcessor/agent can encapsulate mutable state and run on multiple threads but always with only a single logical thread (one program counter).
Patterns (and when they apply): fork-join, map-reduce, producer-consumer, ...
I expect that there will be a large audience for e.g. "help, I've got a single-threaded app with 12% CPU utilization, and I want to learn just enough to make it go 4x faster without much work" and a smaller audience for e.g. "my app is scaling sub-linearly as we add cores because there seems to be contention here, is there a better approach to use?", and so a bit of the challenge may be serving each of those audiences.

Since you write a whole book for multi-core programming in .Net.
I think you can also go beyond multi-core a little bit.
For example, you can use a chapter talking about parallel computing in a distributed system in .Net. Unlikely, there is no mature frameworks in .Net yet. DryadLinq is the closest. (On the other side, Hadoop and its friends in Java platform are really good.)
You can also use a chapter demonstrating some GPU computing stuff.

One thing that has tripped me up is which approach to use to solve a particular type of problem. There's agents, there's tasks, async computations, MPI for distribution - for many problems you could use multiple methods but I'm having difficulty understanding why I should use one over another.

To understand: low level memory details like the difference between acquire and release semantics of memory.
Most of the rest of the concepts and ideas (anything can interleave, race conditions, ...) are not that difficult with a little usage.
Of course the practice, especially if something is failing sometimes, is very hard as you need to work at multiple levels of abstraction to understand what is going on, so keep your design simple and as far as possible design out the need for locking etc. (e.g. using immutable data and higher level abstractions).

Its not so much theoretical details, but more the practical implementation details which trips people up.
What's the deal with immutable data structures?
All the time, people try to update a data structure from multiple threads, find it too hard, and someone chimes in "use immutable data structures!", and so our persistent coder writes this:
ImmutableSet set;
ThreadLoop1()
foreach(Customer c in dataStore1)
set = set.Add(ProcessCustomer(c));
ThreadLoop2()
foreach(Customer c in dataStore2)
set = set.Add(ProcessCustomer(c));
Coder has heard all their lives that immutable data structures can be updated without locking, but the new code doesn't work for obvious reasons.
Even if your targeting academics and experienced devs, a little primer on the basics of immutable programming idioms can't hurt.
How to partition roughly equal amounts of work between threads?
Getting this step right is hard. Sometimes you break up a single process into 10,000 steps which can be executed in parallel, but not all steps take the same amount of time. If you split the work on 4 threads, and the first 3 threads finish in 1 second, and the last thread takes 60 seconds, your multithreaded program isn't much better than the single-threaded version, right?
So how do you partition problems with roughly equal amounts of work between all threads? Lots of good heuristics on solving bin packing problems should be relevant here..
How many threads?
If your problem is nicely parallelizable, adding more threads should make it faster, right? Well not really, lots of things to consider here:
Even a single core processor, adding more threads can make a program faster because more threads gives more opportunities for the OS to schedule your thread, so it gets more execution time than the single-threaded program. But with the law of diminishing returns, adding more threads increasing context-switching, so at a certain point, even if your program has the most execution time the performance could still be worse than the single-threaded version.
So how do you spin off just enough threads to minimize execution time?
And if there are lots of other apps spinning up threads and competing for resources, how do you detect performance changes and adjust your program automagically?

I find the conceptions of synchronized data moving across worker nodes in complex patterns very hard to visualize and program.
Usually I find debugging to be a bear, also.

Will Multi threading increase the speed of the calculation on Single Processor

On a single processor, Will multi-threading increse the speed of the calculation. As we all know that, multi-threading is used for Increasing the User responsiveness and achieved by sepating UI thread and calculation thread. But lets talk about only console application. Will multi-threading increases the speed of the calculation. Do we get culculation result faster when we calculate through multi-threading.
what about on multi cores, will multi threading increse the speed or not.
Please help me. If you have any material to learn more about threading. please post.
Edit:
I have been asked a question, At any given time, only one thread is allowed to run on a single core. If so, why people use multithreading in a console application.
Thanks in advance,
Harsha

In general terms, no it won't speed up anything.
Presumably the same work overall is being done, but now there is the overhead of additional threads and context switches.
On a single processor with HyperThreading (two virtual processors) then the answer becomes "maybe".
Finally, even though there is only one CPU perhaps some of the threads can be pushed to the GPU or other hardware? This is kinda getting away from the "single processor" scenario but could technically be way of achieving a speed increase from multithreading on a single core PC.
Edit: your question now mentions multithreaded apps on a multicore machine.
Again, in very general terms, this will provide an overall speed increase to your calculation.
However, the increase (or lack thereof) will depend on how parallelizable the algorithm is, the contention for memory and cache, and the skill of the programmer when it comes to writing parallel code without locking or starvation issues.

Few threads on 1 CPU:
may increase performance in case you continue with another thread instead of waiting for I/O bound operation
may decrease performance if let say there are too many threads and work is wasted on context switching
Few threads on N CPUs:
may increase performance if you are able to cut job in independent chunks and process them in independent manner
may decrease performance if you rely heavily on communication between threads and bus becomes a bottleneck.
So actually it's very task specific - you can parallel one things very easy while it's almost impossible for others. Perhaps it's a bit advanced reading for new person but there are 2 great resources on this topic in C# world:
Joe Duffy's web log
PFX team blog - they have a very good set of articles for parallel programming in .NET world including patterns and practices.

What is your calculation doing? You won't be able to speed it up by using multithreading if it a processor bound, but if for some reason your calculation writes to disk or waits for some other sort of IO you may be able to improve performance using threading. However, when you say "calculation" I assume you mean some sort of processor intensive algorithm, so adding threads is unlikely to help, and could even slow you down as the context switch between threads adds extra work.

If the task is compute bound, threading will not make it faster unless the calculation can be split in multiple independent parts. Even so you will only be able to achieve any performance gains if you have multiple cores available. From the background in your question it will just add overhead.
However, you may still want to run any complex and long running calculations on a separate thread in order to keep the application responsive.

No, no and no.
Unless you write parallelizing code to take advantage of multicores, it will always be slower if you have no other blocking functions.

Exactly like the user input example, one thread might be waiting for a disk operation to complete, and other threads can take that CPU time.

As described in the other answers, multi-threading on a single core won't give you any extra performance (hyperthreading notwithstanding). However, if your machine sports an Nvidia GPU you should be able to use the CUDA to push calculations to the GPU. See http://www.hoopoe-cloud.com/Solutions/CUDA.NET/Default.aspx and C#: Perform Operations on GPU, not CPU (Calculate Pi).

Above mention most.
Running multiple threads on one processor can increase performance, if you can manage to get more work done at the same time, instead of let the processor wait between different operations. However, it could also be a severe loss of performance due to for example synchronization or that the processor is overloaded and cant step up to the requirements.
As for multiple cores, threading can improve the performance significantly. However, much depends on finding the hotspots and not overdo it. Using threads everywhere and the need of synchronization can even lower the performance. Optimizing using threads with multiple cores takes a lot of pre-studies and planning to get a good result. You need for example to think about how many threads to be use in different situations. You do not want the threads to sit and wait for information used by another thread.
http://www.intel.com/intelpress/samples/mcp_samplech01.pdf
https://computing.llnl.gov/tutorials/parallel_comp/
https://computing.llnl.gov/tutorials/pthreads/
http://en.wikipedia.org/wiki/Superscalar
http://en.wikipedia.org/wiki/Simultaneous_multithreading

I have been doing some intensive C++ mathematical simulation runs using 24 core servers. If I run 24 separate simulations in parallel on the 24 cores of a single server, then I get a runtime for each of my simulations of say X seconds.
The bizarre thing I have noticed is that, when running only 12 simulations, using 12 of the 24 cores, with the other 12 cores idle, then each of the simulations runs at a runtime of Y seconds, where Y is much greater than X! When viewing the task manager graph of the processor usage, it is obvious that a process does not stick to only one core, but alternates between a number of cores. That is to say, the switching between cores to use all the cores slows down the calculation process.
The way I maintained the runtime when running only 12 simulations, is to run another 12 "junk" simulations on the side, using the remaining 12 cores!
Conclusion: When using multi-cores, use them all at 100%, for lower utilisation, the runtime increases!

For single core CPU,
Actually the performance depends on the job you are referring.
In your case, for calculation done by CPU, in that case OverClocking would help if your parentBoard supports it. Otherwise there is no way for CPU to do calculations that are faster than the speed of CPU.
For the sake of Multicore CPU
As the above answers say, if properly designed the performance may increase, if all cores are fully used.
In single core CPU, if the threads are implemented in User Level then multithreading wont matter if there are blocking system calls in the thread, like an I/O operation. Because kernel won't know about the userlevel threads.
So if the process does I/O then you can implement the threads in Kernel space and then you can implement different threads for different job.
(The answer here is on theory based.)

Even a CPU bound task might run faster multi-threaded if properly designed to take advantage of cache memory and pipelineing done by the processor. Modern processors spend a lot of time
twiddling their thumbs, even when nominally fully "busy".
Imagine a process that used a small chunk of memory very intensively. Processing
the same chunk of memory 1000 times would be much faster than processing 1000 chunks
of similar memory.
You could certainly design a multi threaded program that would be faster than a single thread.

Treads don't increase performance. Threads sacrifice performance in favor of keeping parts of the code responsive.
The only exception is if you are doing a computation that is so parallelizeable that you can run different threads on different cores (which is the exception, not the rule).

C# - Moving files - to queue or multi-thread

I have an app that moves a project and its files from preview to production using a Flex front-end and a .NET web service. Currently, the process takes about 5-10 mins/per project. Aside from latency concerns, it really shouldn't take that long. I'm wondering whether or not this is a good use-case for multi-threading. Also, considering the user may want to push multiple projects or one right after another, is there a way to queue the jobs.
Any suggestions and examples are greatly appreciated.
Thanks!

Something that does heavy disk IO typically isn't a good candidate for multithreading since the disks can really only do one thing at a time. However, if you're pushing to multiple servers or the servers have particularly good disk subsystems some light threading may be beneficial.

As a note - regardless of whether or not you decide to queue the jobs, you will use multi-threading. Queueing is just one way of handling what is ultimately solved using multi-threading.
And yes, I'd recommend you build a queue to push out each project.

You should compare the speed of your code compared to just copying in Windows (i.e., explorer or command line) vs copying with something advanced like TeraCopy. If your code is significantly slower than Window then look at parts in your code to optimize using a profiler. If your code is about as fast as Windows but slower than TeraCopy, then multithreading could help.
Multithreading is not generally helpful when the operation I/O bound, but copying files involves reading from the disk AND writing over the network. This is two I/O operations, so if you separate them onto different threads, it could increase performance. For something like this you need a producer/consumer setup where you have a Circular queue with one thread reading from disk and writing to the queue, and another thread reading from the queue and writing to the network. It'll be important to keep in mind that the two threads will not run at the same speed, so if the queue gets full, wait before writing more data and if it's empty, wait before writing. Also the locking strategy could have a big impact on performance here and could cause the performance to degrade to slower than a single-threaded implementation.

If you're moving things between just two computers, the network is going to be the bottleneck, so you may want to queue these operations.
Likewise, on the same machine, the I/O is going to be the bottleneck, so you'd want to queue there, too.

You should try using the ThreadPool.
ThreadPool.QueueUserWorkItem(MoveProject, project);

Agreed with everyone over the limited performance of running the tasks in parallel.
If you have full control over your deployment environment, you could use Rhino Queues:
http://ayende.com/Blog/archive/2008/08/01/Rhino-Queues.aspx
This will allow you to produce a queue of jobs asynchronously (say from a WCF service being called from your Silverlight/Flex app) and consume them synchronously.
Alternatively you could use WCF and MSMQ, but the learning curve is greater.

When dealing with multiple files using multiple threads usually IS a good idea in concerns of performance.The main reason is that most disks nowadays support native command queuing.
I wrote an article recently about reading/writing files with multiple files on ddj.com.
See http://www.ddj.com/go-parallel/article/showArticle.jhtml?articleID=220300055.
Also see related question
Will using multiple threads with a RandomAccessFile help performance?
In particular i made the experience that when dealing with very many files it IS a good idea to use a number of threads. In contrary using many thread in many cases does not slow down applications as much as commonly expected.
Having said that i'd say there is no other way to find out than trying all possible different approaches. It depends on very many conditions: Hardware, OS, Drivers etc.

The very first thing you should do is point any kind of profiling tool towards your software. If you can't do that (like, if you haven't got such a tool), insert logging code.
The very first thing you need to do is figure out what is taking a long time to complete, and then why is it taking a long time to complete. That your "copy" operation as a whole takes a long time to complete isn't good enough, you need to pinpoint the reason for this down to a method or a set of methods.
Until you do that, all the other things you can do to your code will likely be guesswork. My experience has taught me that when it comes to performance, 9 out of 10 reasons for things running slow comes as surprises to the guy(s) that wrote the code.
So measure first, then change.
For instance, you might discover that you're in fact reporting progress of copying the file on a byte-per-byte basis, to a GUI, using a synchronous call to the UI, in which case it wouldn't matter how fast the actual copying can run, you'll still be bound by message handling speed.
But that's just conjecture until you know, so measure first, then change.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.