Recently I started trying to mass-scrape a website for archiving purposes, and I thought it would be a good idea to have multiple web requests running asynchronously to speed things up (10,000,000 pages is definitely a lot to archive). So I ventured into the harsh mistress of parallelism, and three minutes later I started to wonder why the tasks I'm creating (via Task.Factory.StartNew) are 'clogging'.
Annoyed and intrigued, I decided to test this to see whether it was just a result of circumstance, so I created a new console project in VS2012 and wrote this:
static void Main(string[] args)
{
    for (int i = 0; i < 10; i++) {
        int i2 = i + 1;
        Stopwatch t = new Stopwatch();
        t.Start();
        Task.Factory.StartNew(() => {
            t.Stop();
            Console.ForegroundColor = ConsoleColor.Green; //Note that the other tasks might manage to write their lines between these colour changes messing up the colours.
            Console.WriteLine("Task " + i2 + " started after " + t.Elapsed.Seconds + "." + t.Elapsed.Milliseconds + "s");
            Thread.Sleep(5000);
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine("Task " + i2 + " finished");
        });
    }
    Console.ReadKey();
}
When run, it produced this result:
As you can see, the first four tasks start in quick succession with times of ~0.27; after that, however, the time each task takes to start increases drastically.
Why is this happening and what can I do to fix or get around this limitation?
The tasks (by default) run on the threadpool, which is just what it sounds like: a pool of threads. The threadpool is optimized for a lot of situations, but throwing Thread.Sleep in there probably throws a wrench in most of them. Also, Task.Factory.StartNew is generally a bad idea to use, because people don't understand how it works. Try this instead:
static void Main(string[] args)
{
    for (int i = 0; i < 10; i++) {
        int i2 = i + 1;
        Stopwatch t = new Stopwatch();
        t.Start();
        Task.Run(async () => {
            t.Stop();
            Console.ForegroundColor = ConsoleColor.Green; //Note that the other tasks might manage to write their lines between these colour changes messing up the colours.
            Console.WriteLine("Task " + i2 + " started after " + t.Elapsed.Seconds + "." + t.Elapsed.Milliseconds + "s");
            await Task.Delay(5000);
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine("Task " + i2 + " finished");
        });
    }
    Console.ReadKey();
}
More explanation:
The threadpool has a limited number of threads at its disposal. This number changes depending on certain conditions, but in general the limit holds. For this reason, you should never do anything blocking on the threadpool (if you want to achieve parallelism, that is). Thread.Sleep is a perfect example of a blocking API, but so are most web request APIs, unless you use the newer async versions.
So the problem in your original program with crawling is probably the same as in the sample you posted. You are blocking all the thread pool threads, and thus it's getting forced to spin up new threads, and ends up clogging.
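To make the point about non-blocking requests a bit more concrete, here is a minimal sketch of how the crawl itself could be throttled without tying up threadpool threads. FetchAsync is a hypothetical stand-in that only simulates latency with Task.Delay (a real crawler would presumably call something like an async HttpClient method there), and the class name and the concurrency limit are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ThrottledCrawl
{
    // Hypothetical stand-in for a real asynchronous web request.
    static async Task<string> FetchAsync(int page)
    {
        await Task.Delay(100); // simulated latency; no thread is blocked while waiting
        return "page " + page;
    }

    // Fetches every page with at most maxConcurrent requests in flight at once.
    public static async Task<string[]> FetchAllAsync(IEnumerable<int> pages, int maxConcurrent)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            List<Task<string>> tasks = pages.Select(async page =>
            {
                await gate.WaitAsync(); // waits asynchronously for a free slot
                try
                {
                    return await FetchAsync(page);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            // Results come back in the same order as the input pages.
            return await Task.WhenAll(tasks);
        }
    }
}
```

Calling ThrottledCrawl.FetchAllAsync(Enumerable.Range(1, 20), 5) would start all twenty operations but let only five of the simulated requests run at a time, which is usually kinder to the target site than firing everything at once.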
Extra goodies
Coincidentally, using Task.Run in this way also easily allows you to rewrite the code in such a way that you can know when it's complete. By storing a reference to all of the started tasks, and awaiting them all at the end (this does not prevent parallelism), you can reliably know when all the tasks have completed. The following shows how to achieve that:
static void Main(string[] args)
{
    var tasks = new List<Task>();
    for (int i = 0; i < 10; i++) {
        int i2 = i + 1;
        Stopwatch t = new Stopwatch();
        t.Start();
        tasks.Add(Task.Run(async () => {
            t.Stop();
            Console.ForegroundColor = ConsoleColor.Green; //Note that the other tasks might manage to write their lines between these colour changes messing up the colours.
            Console.WriteLine("Task " + i2 + " started after " + t.Elapsed.Seconds + "." + t.Elapsed.Milliseconds + "s");
            await Task.Delay(5000);
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine("Task " + i2 + " finished");
        }));
    }
    Task.WaitAll(tasks.ToArray());
    Console.WriteLine("All tasks completed");
    Console.ReadKey();
}
Note: this code has not been tested
Read more
More info on Task.Factory.StartNew and why it should be avoided: http://blog.stephencleary.com/2013/08/startnew-is-dangerous.html.
I think this is occurring because you have exhausted all available threads in the thread pool. Try starting your tasks using TaskCreationOptions.LongRunning. More details here.
Another problem is that you are using Thread.Sleep, which blocks the current thread and is a waste of resources. Try waiting asynchronously using await Task.Delay. You will need to change your lambda to be async.
Task.Factory.StartNew(async () => {
    t.Stop();
    Console.ForegroundColor = ConsoleColor.Green; //Note that the other tasks might manage to write their lines between these colour changes messing up the colours.
    Console.WriteLine("Task " + i2 + " started after " + t.Elapsed.Seconds + "." + t.Elapsed.Milliseconds + "s");
    await Task.Delay(5000);
    Console.ForegroundColor = ConsoleColor.Yellow;
    Console.WriteLine("Task " + i2 + " finished");
});
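One caveat if you ever need to wait for tasks started this way: passing an async lambda to Task.Factory.StartNew gives you a Task<Task>, and the outer task completes as soon as the lambda reaches its first await. The sketch below (class and method names are made up for illustration) shows the difference and uses Unwrap() to get a task that represents the whole lambda.

```csharp
using System;
using System.Threading.Tasks;

static class StartNewUnwrapDemo
{
    // Item1: had the lambda finished when the OUTER task completed?
    // Item2: had it finished after waiting on the unwrapped (inner) task?
    public static Tuple<bool, bool> Observe()
    {
        bool finished = false;

        // With an async lambda, StartNew returns Task<Task>, not Task.
        Task<Task> outer = Task.Factory.StartNew(async () =>
        {
            await Task.Delay(500);
            finished = true;
        });

        outer.Wait();                   // returns once the lambda hits its first await
        bool afterOuter = finished;

        outer.Unwrap().Wait();          // waits for the lambda to run to the end
        bool afterInner = finished;

        return Tuple.Create(afterOuter, afterInner);
    }
}
```

Task.Run does this unwrapping for you, which is one of the reasons it tends to be the safer choice for async lambdas.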
Related
I've just started learning C# and I'm trying to figure out threads.
So, I've made two threads and I would like to stop one of them by pressing x.
So far, when I press x it only shows on the console but it doesn't abort the thread.
I'm obviously doing something wrong, so can someone please point out what it is? Thank you.
static void Main(string[] args)
{
Console.WriteLine("Hello World!");
//Creating Threads
Thread t1 = new Thread(Method1)
{
Name = "Thread1"
};
Thread t4 = new Thread(Method4)
{
Name = "Thread4"
};
t1.Start();
t4.Start();
Console.WriteLine("Method4 has started. Press x to stop it. You have 5 SECONDS!!!");
var input = Console.ReadKey();
string input2 = input.Key.ToString();
Console.ReadKey();
if (input2 == "x")
{
t4.Abort();
Console.WriteLine("SUCCESS! You have stoped Thread4! Congrats.");
};
Console.Read();
}
static void Method1()
{
Console.WriteLine("Method1 Started using " + Thread.CurrentThread.Name);
for (int i = 1; i <= 5; i++)
{
Console.WriteLine("Method1: " + i);
System.Threading.Thread.Sleep(1000);
}
Console.WriteLine("Method1 Ended using " + Thread.CurrentThread.Name);
}
static void Method4()
{
Console.WriteLine("Method4 Started using " + Thread.CurrentThread.Name);
for (int i = 1; i <= 5; i++)
{
Console.WriteLine("Method4: " + i);
System.Threading.Thread.Sleep(1000);
}
Console.WriteLine("Method4 Ended using " + Thread.CurrentThread.Name);
}
It looks like you have an extra Console.ReadKey(); before if (input2 == "x"). That extra read makes the program stop and wait for a second key press before it reaches your if statement.
Also, input.Key returns an enum. When you call ToString() on it, you get a capital X, because that is the name of the enum member. Either use input.KeyChar.ToString() to convert it to a string or use
var input = Console.ReadKey();
if (input.Key == ConsoleKey.X)
to compare against the enum instead of a string.
I also recommend you read the article "How to debug small programs". Debugging is a skill you will need in order to write more complex programs. Stepping through the code with a debugger, you would have seen that input2 was equal to X, so your if statement was
if ("X" == "x")
which is not true.
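As a small sketch of the two options mentioned above (the class and method names are invented for illustration), the check can be wrapped in a helper so it is easy to try out without actually pressing keys:

```csharp
using System;

static class StopKey
{
    // Enum comparison: true for the X key, with or without Shift or Caps Lock.
    public static bool IsStopKey(ConsoleKeyInfo input)
    {
        return input.Key == ConsoleKey.X;
    }

    // Character comparison: looks at what was typed and ignores case.
    public static bool IsStopChar(ConsoleKeyInfo input)
    {
        return char.ToLowerInvariant(input.KeyChar) == 'x';
    }
}
```

In the program from the question this would be used as if (StopKey.IsStopKey(Console.ReadKey())), with the second Console.ReadKey() removed.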
I have two functions that use a lock statement, and after some operations finish I write the output to a .txt file. I have realised that they take a long time to execute, so they block each other and degrade the app's performance in general.
I suspect the high execution time is due to the write operations to the file. What would be the most efficient way to reduce the execution time? Should I use another thread for the write operations, and is it possible to start the write inside the lock without holding the lock while it runs?
A simplified version of my code is illustrated below:
StatsInformation statsInfo = new StatsInformation ();
List<int> lInt = new List<int>();
public void FunctionEnq(List<byte> lByte, int _int)
{
lock (OperationLock)
{
//Do some work here
lInt.Add(_int);
string result = "New Int " + _int + " size " + lInt.Count + " time " + DateTime.Now.ToString("hh:mm:ss.fff");
statsInfo.WriteStatsInFile(result);
}
}
public (List<byte> _outByte, int _time) FunctionDeq()
{
List<byte> _outByte = new List<byte> ();
int _time = -1;
lock (OperationLock)
{
//Do some work here
_outByte.Add(...);
int _int = lInt[0];
//do operations
_time = _int;
lInt.RemoveAt(0);
string result = "Get Int " + _int + " new size " + lInt.Count + " time " + DateTime.Now.ToString("hh:mm:ss.fff");
statsInfo.WriteStatsInFile(result);
}
return (_outByte, _time);
}
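One possible direction, sketched here under assumptions rather than taken from the code above: keep building the result string inside the lock, but hand it to a queue that a single background task drains to the file. BackgroundStatsWriter and its members are hypothetical names; it is not the StatsInformation class from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: callers enqueue lines cheaply, one background task writes them.
sealed class BackgroundStatsWriter : IDisposable
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly Task _writer;

    public BackgroundStatsWriter(string path)
    {
        // LongRunning hints that this work should get its own thread
        // instead of occupying a threadpool thread.
        _writer = Task.Factory.StartNew(() =>
        {
            using (var file = new StreamWriter(path, append: true))
            {
                // Blocks until a line is available; ends after CompleteAdding().
                foreach (string line in _queue.GetConsumingEnumerable())
                {
                    file.WriteLine(line);
                }
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Cheap enough to call while holding a lock: it only adds to an in-memory queue.
    public void Enqueue(string line)
    {
        _queue.Add(line);
    }

    // Flushes everything still queued, then stops the writer.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _writer.Wait();
        _queue.Dispose();
    }
}
```

Inside FunctionEnq and FunctionDeq, the call to statsInfo.WriteStatsInFile(result) would then become something like writer.Enqueue(result), so the lock is only held for the in-memory work. Lines are written in the order they were enqueued, since there is a single consumer.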
I'm trying to find a way to delay certain tasks without pausing the program. For example, I want to delay outputting 'Hello' to the console for 10 seconds while the rest of the program continues to execute.
Using TPL:
static void Main(string[] args)
{
Console.WriteLine("Starting at " + DateTime.Now.ToString());
Task.Run(() =>
{
Thread.Sleep(10000);
Console.WriteLine("Done sleeping " + DateTime.Now.ToString());
});
Console.WriteLine("Press any Key...");
Console.ReadKey();
}
output:
Starting at 2/14/2017 3:05:09 PM
Press any Key...
Done sleeping 2/14/2017 3:05:19 PM
Just note that if you press a key before the 10 seconds are up, the program will exit.
There are two typical ways to simulate a delay:
an asynchronous, task-based one: Task.Delay
or a blocking one: Thread.Sleep
You seem to be referring to the first.
Here is an example:
public static void Main(string[] args)
{
    Both();
}
static void Both() {
    var list = new Task[2];
    list[0] = PauseAndWrite();
    list[1] = WriteMore();
    Task.WaitAll(list);
}
static async Task PauseAndWrite() {
    await Task.Delay(2000);
    Console.WriteLine("A !");
}
static async Task WriteMore() {
    for (int i = 0; i < 5; i++) {
        await Task.Delay(500);
        Console.WriteLine("B - " + i);
    }
}
Output
B - 0
B - 1
B - 2
A !
B - 3
B - 4
Start a new task (by default it runs on a threadpool thread):
Task.Factory.StartNew(new Action(() =>
{
Thread.Sleep(1000 * 10); // sleep for 10 seconds
Console.Write("Whatever");
}));
You could use a combination of Task.Delay and ContinueWith methods:
Task.Delay(10000).ContinueWith(_ => Console.WriteLine("Done"));
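The same idea can also be written with async/await instead of ContinueWith. This is only a sketch; Delayed and RunAfterAsync are names made up for the example.

```csharp
using System;
using System.Threading.Tasks;

static class Delayed
{
    // Runs the action once the delay has elapsed, without blocking the caller.
    public static async Task RunAfterAsync(TimeSpan delay, Action action)
    {
        await Task.Delay(delay);
        action();
    }
}
```

For the question's case that would be Delayed.RunAfterAsync(TimeSpan.FromSeconds(10), () => Console.WriteLine("Hello")); the call returns immediately, and keeping the returned task lets you wait for it before the program exits.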
You could use 'Thread.Sleep(10000);'
See:
https://msdn.microsoft.com/en-us/library/d00bd51t(v=vs.110).aspx
I've written this sample code for a long running process, but the Windows Form freezes until the process completes. How do I change the code so that the work runs in parallel?
var ui = TaskScheduler.FromCurrentSynchronizationContext();
Task t = Task.Factory.StartNew(delegate
{
textBox1.Text = "Enter Thread";
for (int i = 0; i < 20; i++)
{
//My Long Running Work
}
textBox1.Text = textBox1.Text + Environment.NewLine + "After Loop";
}, CancellationToken.None, TaskCreationOptions.None, ui);
You can use a continuation. I don't remember the exact syntax, but it's something like:
textBox1.Text = "Enter Thread"; // assuming here we're on the UI thread
Task t = Task.Factory.StartNew(() =>
{
    string result = "After Loop";
    for (int i = 0; i < 20; i++)
    {
        //My Long Running Work
    }
    return result;
})
.ContinueWith(ret => { textBox1.Text = textBox1.Text + Environment.NewLine + ret.Result; },
    TaskScheduler.FromCurrentSynchronizationContext()); // the continuation runs on the UI thread
An alternative would be something like:
Task t = Task.Factory.StartNew(() =>
{
    // YourForm stands for your form instance; Invoke marshals the call to the UI thread
    YourForm.Invoke((Action)(() => textBox1.Text = "Enter Thread"));
    for (int i = 0; i < 20; i++)
    {
        //My Long Running Work
    }
    YourForm.Invoke((Action)(() => textBox1.Text = textBox1.Text + Environment.NewLine + "After Loop"));
},
CancellationToken.None, TaskCreationOptions.None, TaskScheduler.Default);
Again, I don't remember the exact syntax but the idea is that you want to perform the long operation on a thread different than the UI thread, but report progress (including completion) on the UI thread.
By the way, the BackgroundWorker class would work very well here, too (I personally like it very much).
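If you can target .NET 4.5 or later, the same "work off the UI thread, report on the UI thread" idea can be sketched with Task.Run and IProgress<T>. LongWork and RunAsync are hypothetical names, and the event handler in the comment assumes a button1 and textBox1 on the form.

```csharp
using System;
using System.Threading.Tasks;

static class LongWork
{
    // Runs the loop on a threadpool thread and reports progress after each step.
    public static Task<string> RunAsync(IProgress<int> progress)
    {
        return Task.Run(() =>
        {
            for (int i = 0; i < 20; i++)
            {
                // My long-running work goes here
                if (progress != null) progress.Report(i + 1);
            }
            return "After Loop";
        });
    }
}

// Possible use from a form (assumed control names):
//
// private async void button1_Click(object sender, EventArgs e)
// {
//     textBox1.Text = "Enter Thread";
//     // Progress<T> created on the UI thread raises its callback on the UI thread.
//     var progress = new Progress<int>(step => Text = "Step " + step);
//     string result = await LongWork.RunAsync(progress);
//     textBox1.Text = textBox1.Text + Environment.NewLine + result;
// }
```

The form stays responsive because the handler returns at the await, and the lines after it run back on the UI thread once the work has finished.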
I've got this code below, where I spawn several threads, normally about 7, and join them to wait until all are done:
List<Thread> threads = new List<Thread>();
Thread thread;
foreach (int size in _parameterCombinations.Keys)
{
thread = new Thread(new ParameterizedThreadStart(CalculateResults));
thread.Start(size);
threads.Add(thread);
}
// wait for all threads to finish
for (int index = 0; index < threads.Count; index++)
{
threads[index].Join();
}
When I check this, most of the time only one or two threads are running at the same time. Only once or twice, after rerunning the app, did all of them execute together.
Is there any way to force all the threads to start executing?
Many thanks.
Your code is fine. I changed it a bit to show you that execution is not limited to two threads.
I would look for problems in the calculation process.
class Program
{
static void Main(string[] args)
{
List<Thread> threads = new List<Thread>();
Thread thread;
for (int i = 0; i < 7; i++)
{
thread = new Thread(new ParameterizedThreadStart(CalculateResults));
thread.Start();
threads.Add(thread);
}
// wait for all threads to finish
for (int index = 0; index < threads.Count; index++)
{
threads[index].Join();
}
}
static void CalculateResults(object obj)
{
Console.WriteLine("Thread number " + Thread.CurrentThread.ManagedThreadId + " is alive");
Thread.Sleep(1000);
Console.WriteLine("Thread number " + Thread.CurrentThread.ManagedThreadId + " is closing");
}
}