When should I allocate a new thread to the task?
I have one task to compute 100,000 equations and store the results in one array, and a second task to sort that array. Should I stick to 2 threads, given that I make the code thread-safe, or can I assign, say, 3 threads to each calculate one third of the 100,000 equations and a fourth one to deal with the sorting? Or just 2 threads?
Also, I have a 4-core processor; what happens if I bring the program with 4 threads onto another PC with 2 cores?
Thank you!
First of all, having two threads for the two tasks you describe (calculating and sorting) is useless, since you can only sort the results when all calculations are done, assuming you want to sort by the results.
For the calculations themselves, it depends on the weight of the calculation. Threads allow you to execute them simultaneously, but you also incur a little overhead. Having more than one thread on one core is slower than having just one thread, since you get the overhead of switching between threads without the benefit of simultaneous execution.
Also, you will need a thread-safe version of an array (or list), which might be a bit slower because it may need to synchronise access to it.
So I think a better solution would be to store the results in one array per thread, let the threads calculate independently, and combine the arrays only after they are all done. I must admit I don't know if you can assign a single thread to a single core. If so, I would create one thread per core.
When dividing the calculations between the threads, don't cut the array into N equal pieces. It could be that one of your cores is very busy with a demanding thread from another process. If that is the case, a thread of your process will get hardly any time on that core. So it's better to assign small pieces: if a thread is slower, it will simply calculate fewer pieces of the source array. If you use a thread-safe counter, each thread can just pick the next item after each calculation.
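A rough sketch of that counter idea (the Calculate method and the input data are placeholders, not from the question): each index of the result array is written by exactly one thread, so the array itself needs no locking, and the Interlocked counter hands out the work items one at a time.

```csharp
using System;
using System.Threading;

class WorkByCounterSketch
{
    static double[] _inputs = new double[100_000];   // the 100k equations (placeholder data)
    static double[] _results = new double[100_000];  // one slot per input; no two threads share a slot
    static int _nextIndex = -1;

    static double Calculate(double x) => x * x;      // stand-in for the real equation

    static void Worker()
    {
        while (true)
        {
            int i = Interlocked.Increment(ref _nextIndex); // thread-safe "pick the next item"
            if (i >= _inputs.Length) return;
            _results[i] = Calculate(_inputs[i]);
        }
    }

    static void Main()
    {
        int threadCount = Environment.ProcessorCount;      // one thread per core
        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) (threads[t] = new Thread(Worker)).Start();
        foreach (var t in threads) t.Join();

        Array.Sort(_results); // sort only after all calculations are done
    }
}
```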
I'm trying to get into multithreading by doing matrix multiplication, and my problem is how to get all the sub-matrices from a matrix.
My matrix variable is an int[,].
For example, if I have a 100 x 100 matrix, how would I get ten 10 x 10 sub-matrices? And is it possible for the user to choose how many equal parts to cut the matrix into, even if the matrix is not square, e.g. 400 x 300?
Is it even the right way to do it, calculating on the sub-matrices and then adding them together when done?
how would I get ten 10 x 10 sub-matrices
You would do a double loop, copying each value from the original matrix to the new sub matrix.
Is it even the right way to do it, calculating on the sub-matrices and then adding them together when done?
The normal way to multiply matrices is with a triple loop, as shown in this answer. It should be fairly trivial to convert the outer loop to a Parallel.For loop, since all calculations are independent of each other. This avoids any need to process individual sub-matrices and lets the framework deal with partitioning the work.
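A minimal sketch of that suggestion, assuming int[,] matrices as in the question (the class and method names are mine):

```csharp
using System.Threading.Tasks;

static class MatrixMath
{
    // a is n x m, b is m x p; dimension checks are omitted for brevity.
    public static int[,] MultiplyParallel(int[,] a, int[,] b)
    {
        int n = a.GetLength(0), m = a.GetLength(1), p = b.GetLength(1);
        var result = new int[n, p];

        // Each iteration of the outer loop writes only to row i of 'result',
        // so the iterations are independent and can run in parallel.
        Parallel.For(0, n, i =>
        {
            for (int j = 0; j < p; j++)
            {
                int sum = 0;
                for (int k = 0; k < m; k++)
                    sum += a[i, k] * b[k, j];
                result[i, j] = sum;
            }
        });

        return result;
    }
}
```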
However, things like this are typically fairly cache sensitive. A matrix will be stored in memory as sequential values, either row- or column-major. Accessing sequential values is very cache friendly, but accessing non-sequential values is not. So you might want to copy a full row/column to a temporary array to ensure all subsequent accesses are sequential. If using a parallel loop, you should probably use one of the overloads that gives you a thread-local array to use. There are more things one can do with cache optimizations and SIMD intrinsics, but that is probably best left as a later exercise.
There are algorithms with lower algorithmic complexity that do work on sub-matrices, but in my experience it will be fairly tricky to make them actually faster in C# than a cache-optimized triple loop.
Make sure to measure the performance of your method. I would also suggest comparing your performance with a well-optimized existing library to get a sense of how performant your implementation is.
I have 20 text files stored on a hard disk, each containing millions of records about an educational organization. Suppose I have a method that iterates over the text files in a loop and processes them. Which is the better way to do the work: starting a thread for each text file (Task.Factory.StartNew()) or a process for each text file (Process.Start())?
EDIT
I have 8 GB RAM and an 8-core server, so I thought of processing them in threads or processes. Currently I am using processes and I don't see any bottleneck as of now, but I am in a dilemma about using threads or processes.
The reading speed of the hard disk will most likely be the bottleneck here.
So, depending on the processing you need to do on the data, it might or might not be interesting to use multiple threads (and I would certainly not use processes).
The most important thing, however, will be to make sure that multiple threads are not accessing the same physical disk at the same time, because the constant switching and seeking of the HDD heads would cause a slowdown.
I have done some testing with that recently, and in some cases (depending on the HDD and/or PC) the OS takes care of it and it doesn't make a big difference; on another combination, however, a slowdown to 1/10 of the normal speed could be seen.
So, if you use multiple threads (only needed if the processing of your data takes longer than reading it from your HDD!), make sure you have a lock somewhere to prevent multiple threads from reading from the disk at the same time.
You might also want to look into memory mapped files for this.
edit:
In case you are working with buffers, you could start one thread to continuously fill the buffers, while another thread processes the data.
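A rough sketch of that idea (the directory path, file pattern and Process method are placeholders, not from the question), using BlockingCollection<T> as the buffer so that only one thread ever touches the disk:

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class FileProcessingSketch
{
    static void Main()
    {
        // Bounded so the reader cannot run unboundedly ahead of the processor.
        var buffer = new BlockingCollection<string>(boundedCapacity: 10_000);

        // Single reader: the only thread touching the disk, so reads stay sequential.
        var reader = Task.Run(() =>
        {
            foreach (var file in Directory.EnumerateFiles(@"C:\data", "*.txt"))
                foreach (var line in File.ReadLines(file))
                    buffer.Add(line);
            buffer.CompleteAdding();
        });

        // Processor: consumes records as they become available.
        var processor = Task.Run(() =>
        {
            foreach (var line in buffer.GetConsumingEnumerable())
                Process(line);
        });

        Task.WaitAll(reader, processor);
    }

    static void Process(string line)
    {
        // placeholder for the actual per-record work
    }
}
```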
edit2 (in answer to Micky):
"Process or thread which is best ,faster and take less memory?"
As I said, I would not use processes (due to the extra overhead). That leaves threads, or no threads at all, depending on the amount of processing that needs to be done on the data. If data is read directly from memory buffers (instead of using something like ReadLine, for example, where all bets would be off), one or at most two threads would probably be the best option (if the processing of the data is fast enough; testing and timing would be needed to be sure).
As for speed and memory usage: the best option (for me) would be memory-mapped files (with the files opened in forward-only mode). This would not only take advantage of the efficiency of the OS disk cache, but would also access the kernel memory directly, whereas when working with (user) buffers, memory has to be copied from kernel space to user space, which takes time and uses extra memory.
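As an illustration only (this code is not from the original answer), a minimal sketch of reading a file sequentially through a memory-mapped view in C#:

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

class MemoryMappedReadSketch
{
    static void Main()
    {
        // Map the whole file read-only (capacity 0 = use the file's length) and read it sequentially.
        using (var mmf = MemoryMappedFile.CreateFromFile(
                   "data1.txt", FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (var view = mmf.CreateViewStream(0, 0, MemoryMappedFileAccess.Read))
        using (var reader = new StreamReader(view))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // process the record; the OS pages the data in on demand via its disk cache
            }
        }
    }
}
```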
IOCP: OK, but it depends on what the threads would be asking for. For example, if 10 threads each asked for 100 KB in turn (on the different files), 10 x 10 ms of seek time would be needed, while reading the 100 KB itself takes less than 1 ms. Seek times for future requests would depend on how IOCP handles the caching, which would probably be the same as using memory mapping, but I don't think IOCP would be any faster in this case.
Using IOCP would probably also mean copying/filling buffers in user space (and it is probably harder to handle in general). But I have to say, while writing my answer I was thinking C/C++ (using direct access to memory buffers), only to see later that it was C#. Although the principles stay the same, maybe there's an easy way in C# to use async I/O with IOCP.
As for the speed testing and avoiding reading at the same time: I have done testing with more than 50 threads on large files (via memory mapping), and if done correctly, no reading speed is lost. On the other hand, when just firing up some threads and letting them access the HDD at random (even in large blocks), the total reading speed could come down to 10% in some cases, and sometimes not drop at all. Same PC, other HDD, other results.
How can I determine whether code is thread-safe or not?
Maybe there are some general guidelines or best practices?
I know that thread-safe code works across threads without unpredictable behavior, but that sometimes becomes very tricky and hard to achieve!
I came up with one simple rule, which is probably hard to implement and therefore theoretical in nature: code is not thread-safe if you can inject some Sleep operations into some places in the code and thereby change the outcome of the code in a significant way. The code is thread-safe otherwise (there is no combination of delays that can change the result of code execution).
Not only your code should be taken into account when considering thread safety, but also other parts of the code, the framework, the operating system, and external factors like disk drives and memory... everything. That is why this "rule of thumb" is mainly theoretical.
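As an illustration of the Sleep-injection rule (this example is mine, not from the answer above), a classic read-modify-write race that the injected delay makes almost certain to show up:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class SleepInjectionDemo
{
    static int _counter;

    static void UnsafeIncrement()
    {
        int temp = _counter;      // read the shared value
        Thread.Sleep(1);          // injected delay changes the outcome => not thread-safe
        _counter = temp + 1;      // write back a possibly stale value
    }

    static void Main()
    {
        Parallel.For(0, 100, _ => UnsafeIncrement());
        Console.WriteLine(_counter); // almost never 100 once the Sleep is injected
    }
}
```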
I think the best answer would be here: Multi Threading. I couldn't have noticed such an answer before writing this question.
I think it is better to close it! Thanks.
Edit by 280Z28 (since I can't add a new answer to a closed question)
Thread safety of an algorithm or application is typically measured in terms of the consistency model which it is guaranteed to follow in the presence of multiple threads of execution (or multiple processes for distributed systems). The two most important things to examine are the following.
Are the pre- and post-conditions of individual methods preserved when multiple threads are used? For example, if your method "adds an element to a dynamically-sized list", then one post-condition would be that the size of the list increases by 1 as a result of the add method. If your algorithm is thread-safe, then calling the add method 2 times would result in the size increasing by exactly 2, regardless of which threads were used for the add operations. On the other hand, if the algorithm is not thread-safe, then using multiple threads for the 2 calls could result in anything, ranging from correctly adding the 2 items all the way to the possibility of crashing the program entirely (see the sketch after this list).
When changes are made to data used by algorithms in the program, when do those changes become visible to the other threads in the system? This is the consistency model of your code. Consistency models can be very difficult to understand fully, so I'll leave the link above as the starting place for your continued learning, along with a note that systems guaranteeing linearizability or sequential consistency are often the easiest to work with, although not necessarily the easiest to create.
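To illustrate the first point about pre- and post-conditions (a sketch of my own, not from the answer), compare a plain List<T> with a concurrent collection when adding from multiple threads:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class PostConditionDemo
{
    const int N = 100_000;

    static void Main()
    {
        // Thread-safe: after N adds the count is exactly N, no matter which threads did them.
        var safeBag = new ConcurrentBag<int>();
        Parallel.For(0, N, i => safeBag.Add(i));
        Console.WriteLine(safeBag.Count);        // always 100000

        // Not thread-safe: the "count increases by 1 per Add" post-condition breaks down.
        var unsafeList = new List<int>();
        try
        {
            Parallel.For(0, N, i => unsafeList.Add(i));
            Console.WriteLine(unsafeList.Count); // usually less than 100000 (lost updates)
        }
        catch (AggregateException)
        {
            Console.WriteLine("List<T>.Add threw when called concurrently");
        }
    }
}
```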
Just a simple question on data updating.
Suppose I have a TextBox called txtBox1 and I want to update the value of a string variable called foo.
Which gives the best performance and is the better thing to do?
// The lengthier code but will check if the value is the same before updating.
if (foo != txtBox1.Text)
foo = txtBox1.Text;
or
// The shorter code but will update it regardless if it's the same value
foo = txtBox1.Text;
It really depends on what you do with the foo variable.
If updating foo involves updating other parts of your application (via data binding for example) then yes, you should only update it when necessary.
Original Answer
Warning: I messed up... this answer applies to the opposite case, that is:
txtBox1.Text = foo
It may depend on what TextBox you are using...
I haven't reviewed all the classes with that name in the .NET Framework from Microsoft, but I can tell you that for System.Windows.Forms.TextBox the check is done internally, so doing it yourself is a waste. This is probably the case for the others too.
New Answer
Note: This is an edit based on the comments. It is taken for granted that the objective is to keep track of the modifications of the textbox and that we are working in Windows Forms or a similar desktop forms solution (that may be WinForms, WPF, GTK#, etc.).
IF you need every value...
TextChanged is the way to go if you want a log or undo feature where you want to offer each value the textbox was in.
Take note, though, that the event runs on the same thread on which the text was assigned, and that thread ought to be the thread that created the textbox. This means that if you cause any kind of lock or do an expensive operation, it will heavily^1 impact the performance of the form, causing it to react slowly because the thread that must update the form is busy in the TextChanged handler.
^1: heavily compared to the alternative presented below.
If you need to do an expensive operation, what you should do is add the values to a ConcurrentQueue<T> (or similar). Then you can have an async^2 operation run in the background that takes the values from it and processes them. Make sure to add the necessary parameters^3 to the queue, so that the expensive operation can happen in the background (a sketch follows after the footnotes).
^2: It doesn't need to use the async keyword; it can be a ThreadPool, a Timer, a dedicated Thread or something like that.
^3: For example the text, and the time in the case of a log. If you have to monitor multiple controls, you could also consider using a POCO (Plain Old CLR Object) class or struct to store all the state that needs to be kept.
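A rough sketch of that queue-plus-background-worker idea, assuming a Windows Forms form with a single TextBox; all class and method names here are illustrative:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Windows.Forms;

public class ChangeLogForm : Form
{
    private readonly TextBox txtBox1 = new TextBox();
    private readonly ConcurrentQueue<(string Text, DateTime When)> _changes =
        new ConcurrentQueue<(string Text, DateTime When)>();

    public ChangeLogForm()
    {
        Controls.Add(txtBox1);
        txtBox1.TextChanged += TxtBox1_TextChanged;
        StartBackgroundProcessor();
    }

    private void TxtBox1_TextChanged(object sender, EventArgs e)
    {
        // Cheap and non-blocking: just record the value; the heavy work happens elsewhere.
        _changes.Enqueue((txtBox1.Text, DateTime.UtcNow));
    }

    private void StartBackgroundProcessor()
    {
        Task.Run(async () =>
        {
            while (true)
            {
                while (_changes.TryDequeue(out var change))
                    ProcessChange(change.Text, change.When); // the expensive operation
                await Task.Delay(100);                       // idle briefly when the queue is empty
            }
        });
    }

    private void ProcessChange(string text, DateTime when)
    {
        // e.g. append to a log file or build an undo history
    }
}
```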
IF you can miss some values...
Using the event
Use the event to update a version number instead of reading the value.
That is, you are going to keep two integer variables:
The current version number, which you will increment when there is a change. Use Thread.VolatileWrite for this (there is no need for Interlocked).
The last checked version number, which you will update when you read the values from the form (this is done from an async operation), and which you will use to verify whether there have been any updates recently. Use Interlocked.Exchange to update the value and proceed if the old value is different from the one read.
Note: Test the case of arithmetic overflow and make sure it wraps from MaxValue to MinValue. No, it will not happen often, but that's no excuse.
Again, under the assumption that it is OK to miss some values: if you are using a dedicated Thread for this, you may want to use a WaitHandle (ManualResetEvent or AutoResetEvent [preferably their slim counterparts]) to have the thread sleep when there have been no modifications, instead of having it spin-wait. You will then set the WaitHandle in the event.
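A possible sketch of that version-number variant, assuming Windows Forms; it uses the newer Volatile.Write/Volatile.Read in place of Thread.VolatileWrite, and all names are illustrative:

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

public class VersionedForm : Form
{
    private readonly TextBox txtBox1 = new TextBox();
    private int _currentVersion;       // written only by the UI thread
    private int _lastCheckedVersion;   // written only by the background thread
    private readonly AutoResetEvent _changed = new AutoResetEvent(false);

    public VersionedForm()
    {
        Controls.Add(txtBox1);
        txtBox1.TextChanged += (s, e) =>
        {
            // Single writer (the UI thread), so a volatile write is enough here.
            Volatile.Write(ref _currentVersion, _currentVersion + 1);
            _changed.Set(); // wake the background thread
        };

        new Thread(BackgroundLoop) { IsBackground = true }.Start();
    }

    private void BackgroundLoop()
    {
        while (true)
        {
            _changed.WaitOne(); // sleep instead of spin-waiting while nothing has changed
            int seen = Volatile.Read(ref _currentVersion);
            if (Interlocked.Exchange(ref _lastCheckedVersion, seen) != seen)
            {
                // Something changed since the last check; read the text via the UI thread.
                string text = (string)Invoke(new Func<string>(() => txtBox1.Text));
                // ... expensive processing of 'text' goes here; intermediate values may be missed ...
            }
        }
    }
}
```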
I have a program that is single-threaded at the moment, which basically puts lots of computed data into a multidimensional array (e.g. double[,] or string[,]).
Is it possible to assign segments of this array to different threads? More precisely, if I make sure only one thread will write at a given coordinate, will some lock mechanism be triggered?
In terms of concurrency problems, you will be fine as long as your threads do not read or write the same portion of your array concurrently. You may see slowdowns because of the "false sharing" hazard, though, so be on the lookout for unexpected slow-downs when the number of threads increases.
if I make sure only one thread will write at a given coordinate
Then you are safe, assuming you don't resize the array, etc.
If you are currently using a for loop, you can probably simply switch to Parallel.For(0, n, method).
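For example, a minimal sketch (the Compute method is a placeholder) where each Parallel.For iteration fills its own row of the multidimensional array, so no two threads ever write to the same coordinate:

```csharp
using System;
using System.Threading.Tasks;

class ParallelFillSketch
{
    static double Compute(int i, int j) => Math.Sin(i) * Math.Cos(j); // stand-in for the real work

    static void Main()
    {
        int rows = 1000, cols = 1000;
        var results = new double[rows, cols];

        // Each iteration writes only to row i, so the iterations are independent.
        Parallel.For(0, rows, i =>
        {
            for (int j = 0; j < cols; j++)
                results[i, j] = Compute(i, j);
        });

        Console.WriteLine(results[rows - 1, cols - 1]);
    }
}
```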