Tasks in C# - unclear outcome using Random - c#

I'm learning about async programming in C#, and written this code to test Task Parallel Library (Console application):
static void Main(string[] args)
{
Stopwatch sw = new Stopwatch();
var opr1 = new SlowOperation();
var opr2 = new SlowOperation();
//TASK
Console.WriteLine("Started processing using TASK. Start: {0}", sw.Elapsed);
sw.Start();
Task.Factory.StartNew(() => opr1.PerformSlowOperation(1));
Task.Factory.StartNew(() => opr2.PerformSlowOperation(2));
Console.WriteLine("Stopped processing using TASK. Stop: {0}", sw.Elapsed);
sw.Stop();
}
where slow operation is:
public class SlowOperation
{
public void PerformSlowOperation(int id)
{
var rand = new Random();
double sum = 0;
for (int i = 0; i < 100000000; i++)
{
var number = Convert.ToDouble(rand.Next(100)) / 100;
sum += number;
}
Console.WriteLine("Finished processing operation no. {0}. Final sum calculated is: {1}", id, sum.ToString("0.##"));
}
}
Could anyone please help me to understand why sum produced by each instance of SlowOperation class is exactly the same?

Random is seeded based on time at a low resolution. This is a classic issue and, in my mind, an API design error. I think this is already changed in the CoreCLR repo.
new Random().Next() == new Random().Next() is almost always true.
Also note, that 95% of the code in the question has nothing to do with the problem. In the future you can simplify the code yourself until only the random call is left. That allows you to find such problems yourself.

Set with a different seed value in each task. For example:
var rand = new Random(new System.DateTime().Millisecond + id);
Random construtor: https://msdn.microsoft.com/pt-br/library/ctssatww(v=vs.110).aspx
If your application requires different random number sequences, invoke
this constructor repeatedly with different seed values.One way to
produce a unique seed value is to make it time-dependent.For example,
derive the seed value from the system clock.However, the system clock
might not have sufficient resolution to provide different invocations
of this constructor with a different seed value.

Related

Join statement in multiple thread

I have a program that starts 2 threads and use Join.My understanding says that joins blocks the calling operation till it is finished executing .So,the below program should give 2 Million as answer since both the threads blocks till execution is completed but I am always getting the different value.This might be because first thread is completed but second thread is not run completely.
Can someone please explain the output.
Reference -Multithreading: When would I use a Join?
namespace ThreadSample
{
class Program
{
static int Total = 0;
public static void Main()
{
Thread thread1 = new Thread(Program.AddOneMillion);
Thread thread2 = new Thread(Program.AddOneMillion);
thread1.Start();
thread2.Start();
thread1.Join();
thread2.Join();
Console.WriteLine("Total = " + Total);
Console.ReadLine();
}
public static void AddOneMillion()
{
for (int i = 1; i <= 1000000; i++)
{
Total++;
}
}
}
}
When you call start method of thread, it starts immediately. hence by the time u call join on the thread1, thread2 would also have started. As a result variable 'Total' will be accessed by both threads simultaneously. Hence you will not get correct result as one thread operation is overwriting the value of 'Total' value causing data lose.
public static void Main()
{
Thread thread1 = new Thread(Program.AddOneMillion);
Thread thread2 = new Thread(Program.AddOneMillion);
thread1.Start(); //starts immediately
thread2.Start();//starts immediately
thread1.Join(); //By the time this line executes, both threads have accessed the Total varaible causing data loss or corruption.
thread2.Join();
Console.WriteLine("Total = " + Total);
Console.ReadLine();
}
Inorder to correct results either u can lock the Total variable as follows
static object _l = new object();
public static void AddOneMillion()
{
for (int i = 0; i < 1000000; i++)
{
lock(_l)
ii++;
}
}
U can use Interlocked.Increment which atomically updates the variable.
Please refer the link posted by #Emanuel Vintilă in the comment for more insight.
public static void AddOneMillion()
{
for (int i = 0; i < 1000000; i++)
{
Interlocked.Increment(ref Total);
}
}
It's because the increment operation is not done atomically. That means that each thread may hold a copy of Total and increment it. To avoid that you can use a lock or Interlock.Increment that is specific to incrementing a variable.
Clarification:
thread 1: read copy of Total
thread 2: read copy of Total
thread 1: increment and store Total
thread 2: increment and store Total (overwriting previous value)
I leave you with all possible scenarios where things could go wrong.
I would suggest avoiding explicit threading when possible and use map reduce operations that are less error prone.
You need to read about multi-threading programming and functional programming constructs available in mainstream languages. Most languages have added libraries to leverage the multicore capabilities of modern CPUs.

RX terminolgy: Async processing in RX operator when there are frequent observable notifications

The purpose is to do some async work on a scarce resource in a RX operator, Select for example. Issues arise when observable notifications came at a rate that is faster than the time it takes for the async operation to complete.
Now I actually solved the problem. My question would be what is the correct terminology for this particular kind of issue? Does it have a name? Is it backpressure? Research I did until now indicate that this is some kind of a pressure problem, but not necessarily backpressure from my understanding. The most relevant resources I found are these:
https://github.com/ReactiveX/RxJava/wiki/Backpressure-(2.0)
http://reactivex.io/documentation/operators/backpressure.html
Now to the actual code. Suppose there is a scarce resource and it's consumer. In this case exception is thrown when resource is in use. Please note that this code should not be changed.
public class ScarceResource
{
private static bool inUse = false;
public async Task<int> AccessResource()
{
if (inUse) throw new Exception("Resource is alredy in use");
var result = await Task.Run(() =>
{
inUse = true;
Random random = new Random();
Thread.Sleep(random.Next(1, 2) * 1000);
inUse = false;
return random.Next(1, 10);
});
return result;
}
}
public class ResourceConsumer
{
public IObservable<int> DoWork()
{
var resource = new ScarceResource();
return resource.AccessResource().ToObservable();
}
}
Now here is the problem with a naive implementation to consume the resource. Error is thrown because notifications came at a faster rate than the consumer takes to run.
private static void RunIntoIssue()
{
var numbers = Enumerable.Range(1, 10);
var observableSequence = numbers
.ToObservable()
.SelectMany(n =>
{
Console.WriteLine("In observable: {0}", n);
var resourceConsumer = new ResourceConsumer();
return resourceConsumer.DoWork();
});
observableSequence.Subscribe(n => Console.WriteLine("In observer: {0}", n));
}
With the following code the problem is solved. I slow down processing by using a completed BehaviorSubject in conjunction with the Zip operator. Essentially what this code does is to take a sequential approach instead of a parallel one.
private static void RunWithZip()
{
var completed = new BehaviorSubject<bool>(true);
var numbers = Enumerable.Range(1, 10);
var observableSequence = numbers
.ToObservable()
.Zip(completed, (n, c) =>
{
Console.WriteLine("In observable: {0}, completed: {1}", n, c);
var resourceConsumer = new ResourceConsumer();
return resourceConsumer.DoWork();
})
.Switch()
.Select(n =>
{
completed.OnNext(true);
return n;
});
observableSequence.Subscribe(n => Console.WriteLine("In observer: {0}", n));
Console.Read();
}
Question
Is this backpressure, and if not does it have another terminology associated?
You're basically implementing a form of locking, or a mutex. Your code an cause backpressure, it's not really handling it.
Imagine if your source wasn't a generator function, but rather a series of data pushes. The data pushes arrive at a constant rate of every millisecond. It takes you 10 Millis to process each one, and your code forces serial processing. This causes backpressure: Zip will queue up the unprocessed datapushes infinitely until you run out of memory.

C# program runs slowly on multiple instances

Now a days I'm practicing some programs of C#. I have an issue in this program.
I have created this program and it took 21 seconds to execute and my CPU usage is 20% and Ram usage is 1Gb max.
static void Main(string[] args)
{
string str = Console.ReadLine();
if (str == "start")
{
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 1; i < 200000; i++)
{
Console.WriteLine("Acccessed Value" + i.ToString());
Console.WriteLine("Time " + sw.ElapsedMilliseconds);
}
}
Console.Read();
}
but when I create 2 instances of this It took 140 seconds and CPU usage is 20 % and Ram usage is 1GB max.
Can you please help me, how can I run multiple instances which will take 21 seconds but can utilize my Ram and CPU as maximum.
You don't want to start different instances. Try using Tasks in your application, to utilize multiple cores of your CPU. Create Environment.ProcessorCount number of tasks and run the operations on them. There is a higher-level of abstraction too - Parallel, which you can look into.
You are using Console.WriteLine method which is an IO method, and does not scale well for multi-threaded operations (see here), and does not support asynchronous operations. So you are likely to not to have a control over this.
But the question is, do you really need such an application? I don't think so; no body wants to write that amount of text for output at once. Writing into file, maybe, which supports asynchronous operations.
As a simple improvement, you can use StringBuilder instead of creating many short-lived String objects as follows:
static void Main(string[] args)
{
string str = Console.ReadLine();
if (str == "start")
{
Stopwatch sw = new Stopwatch();
var builder = new StringBuilder();
for (int i = 1; i < 200000; i++)
{
sw.Start();
builder.Clear();
string a = builder.Append("Acccessed Value").Append(i.ToString()).ToString();
builder.Clear();
string b = builder.Append("Time ").Append(sw.ElapsedMilliseconds);
Console.WriteLine(a);
Console.WriteLine(b);
}
}
Console.Read();
}

Wanting to limit Threadpool so it doesn't max out CPU

I am programming with Threads for the first time. My program only shows a small amount of data at a time; as the user moves through the data I want it to load all the possible data that could be access next so there is as little lag as possible when user switches to a new section.
Worst case scenario I might need to preload 6 sections of data. So I use something like:
if (SectionOne == null)
{
ThreadPool.QueueUserWorkItem(new System.Threading.WaitCallback(PreloadSection),
Tuple.Create(thisSection, SectionOne));
}
if (SectionTwo == null)
{
ThreadPool.QueueUserWorkItem(new System.Threading.WaitCallback(PreloadSection),
Tuple.Create(thisSection, SectionTwo));
}
//....
to preload each area. It works great on my main system that has 8 cores; but on my test system that only has 4 cores the entire system slows to a crawl while it is running the threads.
I am thinking that I want to run a maximum of TotalCores - 2 threads at the same time. But really I have no idea.
Looking for any help in getting this to run as efficiently as possible on multiple system setups (single core through 8 cores or whatever). Also, I am using C# and this is a Portable Class Library project, so some of my options are limited.
I would be using this built in .NET parallelism magic.
Task Parallelism
With the Task operations that is managed for you but you still have control to pick how many cores and threads you want.
Example:
const int MAX = 10000;
var options = new ParallelOptions
{
MaxDegreeOfParallelism = 2
};
IList<int> threadIds = new List<int>();
Parallel.For(0, MAX, options, i =>
{
var id = Thread.CurrentThread.ManagedThreadId;
Console.WriteLine("Number '{0}' on thread {1}", i, id);
threadIds.Add(id);
});
You can even do it with Extensions if you want:
const int MAX_TASKS = 8;
var numbers = Enumerable.Range(0, 10000000);
IList<int> threadIds = new List<int>(MAX_TASKS);
numbers.AsParallel()
.WithDegreeOfParallelism(MAX_TASKS)
.ForAll(i =>
{
var id = Thread.CurrentThread.ManagedThreadId;
if (!threadIds.Contains(id))
{
threadIds.Add(id);
}
});
Assert.IsTrue(threadIds.Count > 2);
Assert.IsTrue(threadIds.Count <= MAX_TASKS);
Console.WriteLine(threadIds.Count);

Multithreading and speeding things up

I have the following piece of code. I wish to start the file creation on multiple threads. The objective is that it will take less time to create 10 files when I do it on multiple threads. As I understand I need to introduce the element of asynchronous calls to make that happen.
What changes should I make in this piece of code?
using System;
using System.Text;
using System.Threading;
using System.IO;
using System.Diagnostics;
namespace MultiDemo
{
class MultiDemo
{
public static void Main()
{
var stopWatch = new Stopwatch();
stopWatch.Start();
// Create an instance of the test class.
var ad = new MultiDemo();
//Should create 10 files in a loop.
for (var x = 0; x < 10; x++)
{
var y = x;
int threadId;
var myThread = new Thread(() => TestMethod("outpFile", y, out threadId));
myThread.Start();
myThread.Join();
//TestMethod("outpFile", y, out threadId);
}
stopWatch.Stop();
Console.WriteLine("Seconds Taken:\t{0}",stopWatch.Elapsed.TotalMilliseconds);
}
public static void TestMethod(string fileName, int hifi, out int threadId)
{
fileName = fileName + hifi;
var fs = new FileStream(fileName, FileMode.OpenOrCreate, FileAccess.ReadWrite);
var sw = new StreamWriter(fs, Encoding.UTF8);
for (int x = 0; x < 10000; x++)
{
sw.WriteLine(DateTime.Now.ToString());
}
sw.Close();
threadId = Thread.CurrentThread.ManagedThreadId;
Console.WriteLine("{0}",threadId);
}
}
}
Right now, if I comment the thread creation part of the code and just call testMethod 10 times in a loop, it is faster than the multiple threads that the thread creation attempts to process.
The threaded version of your code is doing extra work, so it's not suprising that it's slower.
When you do something like:
var myThread = new Thread(() => TestMethod("outpFile", y, out threadId));
myThread.Start();
myThread.Join();
...you're creating a thread, having it call TestMethod, then waiting for it to finish. The additional overhead of creating and starting a thread will make things slower than just calling TestMethod without any threads.
It's possible that you'll see better performance if you start all of the threads working and then wait for them to finish, e.g.:
var workers = new List<Thread>();
for (int i = 0; i < 10; ++i)
{
var y = x;
int threadId;
var myThread = new Thread(() => TestMethod("outpFile", y, out threadId));
myThread.Start();
workers.Add(myThread);
}
foreach (var worker in workers) worker.Join();
Perhaps this doesn't directly answer your question but here is my thought on the matter. The bottleneck in that code is unlikely to be the processor. I would bet the disk IO would take way more time than the CPU processing. As such, I don't believe that creating new threads will help at all (all the threads will attempt to write to the same disk). I think this is a case of premature optimization. If I were you, I would just do it all on one thread.
The reason you're slower is that all you're doing is starting up a new thread and waiting for it to complete so it has to be slower because your other method is simply not doing 3 steps.
Try this out (assuming .Net 4.0 because of TPL). On my machine, it's consistently 100ms faster when done in parallel.
[Test]
public void Y()
{
var sw = Stopwatch.StartNew();
Parallel.For(0, 10, n => TestMethod("parallel", n));
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
sw.Restart();
for (int i = 0; i < 10; i++)
TestMethod("forloop", i);
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
}
private static void TestMethod(string fileName, int hifi)
{
fileName = fileName + hifi;
var fs = new FileStream(fileName, FileMode.OpenOrCreate, FileAccess.ReadWrite);
var sw = new StreamWriter(fs, Encoding.UTF8);
for (int x = 0; x < 10000; x++)
{
sw.WriteLine(DateTime.Now.ToString());
}
sw.Close();
}
The primary thing to observe in your case is Amdahl's Law. Your algorithm makes roughly equal use of each of the following resources:
Processor usage
Memory access
Drive access
Of these, the drive access is by far the slowest item, so to see speedup you'll need to parallelize your algorithm across this resource. In other words, if you parallelize your program by writing the 10 different files to 10 different drives, you'll see a substantial performance improvement compared to just parallelizing the computation of the file contents. In fact if you create the files on 10 different threads, the serialization involved with drive access could actually reduce the overall performance of your program.
Although both imply multi-threaded programming, parallelization should NOT be treated the same as asynchronous programming in the case of IO. While I would not recommend parallelizing your use of the file system, it is almost always beneficial to use asynchronous methods for reading/writing to files.
it's wrong way to get up speed, multithreading for parallel work, but not for accelerate
So why did you decide to use multi threading? The price of starting a new thread might be higher than a simple loop. Its not something you can blindly decide about... If you insist on using threads, you can also check the managed ThreadPool / usage of async delegates, which can reduce the cost of creating new threads, by re-using existing ones.
You're negating the benefit of multiple threads because you Join each thread and thus wait for it to complete before you create and start the next thread.
Instead, add the threads to a list as you create and start them, and then loop through the list of threads, joining them in sequence until they finish.
using System.Collections.Generic;
List<Thread> threads= new List<Thread>();
//Should create 10 files in a loop.
for (var x = 0; x < 10; x++)
{
var y = x;
int threadId;
var myThread = new Thread(() => TestMethod("outpFile", y, out threadId));
threads.Add(myThread);
myThread.Start();
//myThread.Join();
//TestMethod("outpFile", y, out threadId);
}
foreach (var thread in threads) thread.Join();
try something like:
for (int i = 0; i < 10; ++i)
{
new Action(() => { TestMethod("outpFile"); }).BeginInvoke(null,null);
}
Console.ReadLine();
if it wont be quicker than serial calls then indeed your IO is a botleneck and nothing you can do about it.

Categories