Combining Thread with parameters and return value - c#

So i have been multithreading lately,and since im new to this im probably doing something basic wrong..
Thread mainthread = new Thread(() => threadmain("string", "string", "string"));
mainthread.Start();
the above code works flawlessly but now i want to get a value back from my thread.
to do that i searched on SO and found this code:
object value = null;
var thread = new Thread(
() =>
{
value = "Hello World";
});
thread.Start();
thread.Join();
MessageBox.Show(value);
}
and i dont know how to combine the two.
the return value will be a string.
thank you for helping a newbie,i tried combining them but got errors due to my lack of experience
edit:
my thread:
public void threadmain(string url,string search, string regexstring)
{
using (WebClient client = new WebClient()) // WebClient class inherits IDisposable
{
string allthreadusernames = "";
string htmlCode = client.DownloadString(url);
string[] htmlarray = htmlCode.Split(new string[] { "\n", "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
foreach (string line in htmlarray)
{
if (line.Contains(search))
{
var regex = new Regex(regexstring);
var matches = regex.Matches(line);
foreach (var singleuser in matches.Cast<Match>().ToList())
{
allthreadusernames = allthreadusernames + "\n" + singleuser.Groups[1].Value;
}
}
}
MessageBox.Show(allthreadusernames);
}
}

An easy solution would be to use another level of abstraction for asynchronous operations: Tasks.
Example:
public static int Calculate()
{
// Simulate some work
int sum = 0;
for (int i = 0; i < 10000; i++)
{
sum += i;
}
return sum;
}
// ...
var task = System.Threading.Tasks.Task.Run(() => Calculate());
int result = task.Result; // waits/blocks until the task is finished
In addition to task.Result, you can also wait for the task with await task (async/await pattern) or task.Wait (+ timeout and/or cancellation token).

Threads aren't really supposed to behave like functions. The code you found still lacks synchronization/thread-safety of reading/writing the output variable.
Task Parallel Library provides a better abstraction, Tasks.
Your problem can then be solved by code similar to this:
var result = await Task.Run(() => MethodReturningAValue());
Running tasks like this is actually more lightweight, as it only borrows an existing thread from either the SynchronizationContext or the .NET thread pool, with low overhead.
I highly recommend Stephen Cleary's blog series about using tasks for parallelism and asynchronicity. It should answer all your further questions.

Related

Task Factory for each loop with await

I'm new to tasks and have a question regarding the usage. Does the Task.Factory fire for all items in the foreach loop or block at the 'await' basically making the program single threaded? If I am thinking about this correctly, the foreach loop starts all the tasks and the .GetAwaiter().GetResult(); is blocking the main thread until the last task is complete.
Also, I'm just wanting some anonymous tasks to load the data. Would this be a correct implementation? I'm not referring to exception handling as this is simply an example.
To provide clarity, I am loading data into a database from an outside API. This one is using the FRED database. (https://fred.stlouisfed.org/), but I have several I will hit to complete the entire transfer (maybe 200k data points). Once they are done I update the tables, refresh market calculations, etc. Some of it is real time and some of it is End-of-day. I would also like to say, I currently have everything working in docker, but have been working to update the code using tasks to improve execution.
class Program
{
private async Task SQLBulkLoader()
{
foreach (var fileListObj in indicators.file_list)
{
await Task.Factory.StartNew( () =>
{
string json = this.GET(//API call);
SeriesObject obj = JsonConvert.DeserializeObject<SeriesObject>(json);
DataTable dataTableConversion = ConvertToDataTable(obj.observations);
dataTableConversion.TableName = fileListObj.series_id;
using (SqlConnection dbConnection = new SqlConnection("SQL Connection"))
{
dbConnection.Open();
using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
{
s.DestinationTableName = dataTableConversion.TableName;
foreach (var column in dataTableConversion.Columns)
s.ColumnMappings.Add(column.ToString(), column.ToString());
s.WriteToServer(dataTableConversion);
}
Console.WriteLine("File: {0} Complete", fileListObj.series_id);
}
});
}
}
static void Main(string[] args)
{
Program worker = new Program();
worker.SQLBulkLoader().GetAwaiter().GetResult();
}
}
Your awaiting the task returned from Task.Factory.StartNew does make it effectively single threaded. You can see a simple demonstration of this with this short LinqPad example:
for (var i = 0; i < 3; i++)
{
var index = i;
$"{index} inline".Dump();
await Task.Run(() =>
{
Thread.Sleep((3 - index) * 1000);
$"{index} in thread".Dump();
});
}
Here we wait less as we progress through the loop. The output is:
0 inline
0 in thread
1 inline
1 in thread
2 inline
2 in thread
If you remove the await in front of StartNew you'll see it runs in parallel. As others have mentioned, you can certainly use Parallel.ForEach, but for a demonstration of doing it a bit more manually, you can consider a solution like this:
var tasks = new List<Task>();
for (var i = 0; i < 3; i++)
{
var index = i;
$"{index} inline".Dump();
tasks.Add(Task.Factory.StartNew(() =>
{
Thread.Sleep((3 - index) * 1000);
$"{index} in thread".Dump();
}));
}
Task.WaitAll(tasks.ToArray());
Notice now how the result is:
0 inline
1 inline
2 inline
2 in thread
1 in thread
0 in thread
This is a typical problem that C# 8.0 Async Streams are going to solve very soon.
Until C# 8.0 is released, you can use the AsyncEnumarator library:
using System.Collections.Async;
class Program
{
private async Task SQLBulkLoader() {
await indicators.file_list.ParallelForEachAsync(async fileListObj =>
{
...
await s.WriteToServerAsync(dataTableConversion);
...
},
maxDegreeOfParalellism: 3,
cancellationToken: default);
}
static void Main(string[] args)
{
Program worker = new Program();
worker.SQLBulkLoader().GetAwaiter().GetResult();
}
}
I do not recommend using Parallel.ForEach and Task.WhenAll as those functions are not designed for asynchronous streaming.
You'll want to add each task to a collection and then use Task.WhenAll to await all of the tasks in that collection:
private async Task SQLBulkLoader()
{
var tasks = new List<Task>();
foreach (var fileListObj in indicators.file_list)
{
tasks.Add(Task.Factory.StartNew( () => { //Doing Stuff }));
}
await Task.WhenAll(tasks.ToArray());
}
My take on this: most time consuming operations will be getting the data using a GET operation and the actual call to WriteToServer using SqlBulkCopy. If you take a look at that class you will see that there is a native async method WriteToServerAsync method (docs here)
. Always use those before creating Tasks yourself using Task.Run.
The same applies to the http GET call. You can use the native HttpClient.GetAsync (docs here) for that.
Doing that you can rewrite your code to this:
private async Task ProcessFileAsync(string series_id)
{
string json = await GetAsync();
SeriesObject obj = JsonConvert.DeserializeObject<SeriesObject>(json);
DataTable dataTableConversion = ConvertToDataTable(obj.observations);
dataTableConversion.TableName = series_id;
using (SqlConnection dbConnection = new SqlConnection("SQL Connection"))
{
dbConnection.Open();
using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
{
s.DestinationTableName = dataTableConversion.TableName;
foreach (var column in dataTableConversion.Columns)
s.ColumnMappings.Add(column.ToString(), column.ToString());
await s.WriteToServerAsync(dataTableConversion);
}
Console.WriteLine("File: {0} Complete", series_id);
}
}
private async Task SQLBulkLoaderAsync()
{
var tasks = indicators.file_list.Select(f => ProcessFileAsync(f.series_id));
await Task.WhenAll(tasks);
}
Both operations (http call and sql server call) are I/O calls. Using the native async/await pattern there won't even be a thread created or used, see this question for a more in-depth explanation. That is why for IO bound operations you should never have to use Task.Run (or Task.Factory.StartNew. But do mind that Task.Run is the recommended approach).
Sidenote: if you are using HttpClient in a loop, please read this about how to correctly use it.
If you need to limit the number of parallel actions you could also use TPL Dataflow as it plays very nice with Task based IO bound operations. The SQLBulkLoaderAsyncshould then be modified to (leaving the ProcessFileAsync method from earlier this answer intact):
private async Task SQLBulkLoaderAsync()
{
var ab = new ActionBlock<string>(ProcessFileAsync, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });
foreach (var file in indicators.file_list)
{
ab.Post(file.series_id);
}
ab.Complete();
await ab.Completion;
}
Use a Parallel.ForEach loop to enable data parallelism over any System.Collections.Generic.IEnumerable<T> source.
// Method signature: Parallel.ForEach(IEnumerable<TSource> source, Action<TSource> body)
Parallel.ForEach(fileList, (currentFile) =>
{
//Doing Stuff
Console.WriteLine("Processing {0} on thread {1}", currentFile, Thread.CurrentThread.ManagedThreadId);
});
Why didnt you try this :), this program will not start parallel tasks (in a foreach), it will be blocking but the logic in task will be done in separate thread from threadpool (only one at the time, but the main thread will be blocked).
The proper approach in your situation is to use Paraller.ForEach
How can I convert this foreach code to Parallel.ForEach?

Task.WaitAll deadlock

I have a question regarding Task.WaitAll. At first I tried to use async/await to get something like this:
private async Task ReadImagesAsync(string HTMLtag)
{
await Task.Run(() =>
{
ReadImages(HTMLtag);
});
}
Content of this function doesn't matter, it works synchronously and is completely independent from outside world.
I use it like this:
private void Execute()
{
string tags = ConfigurationManager.AppSettings["HTMLTags"];
var cursor = Mouse.OverrideCursor;
Mouse.OverrideCursor = System.Windows.Input.Cursors.Wait;
List<Task> tasks = new List<Task>();
foreach (string tag in tags.Split(';'))
{
tasks.Add(ReadImagesAsync(tag));
//tasks.Add(Task.Run(() => ReadImages(tag)));
}
Task.WaitAll(tasks.ToArray());
Mouse.OverrideCursor = cursor;
}
Unfortunately I get deadlock on Task.WaitAll if I use it that way (with async/await). My functions do their jobs (so they are executed properly), but Task.WaitAll just stays here forever because apparently ReadImagesAsync doesn't return to the caller.
The commented line is approach that actually works correctly. If I comment the tasks.Add(ReadImagesAsync(tag)); line and use tasks.Add(Task.Run(() => ReadImages(tag))); - everything works well.
What am I missing here?
ReadImages method looks like that:
private void ReadImages (string HTMLtag)
{
string section = HTMLtag.Split(':')[0];
string tag = HTMLtag.Split(':')[1];
List<string> UsedAdresses = new List<string>();
var webClient = new WebClient();
string page = webClient.DownloadString(Link);
var siteParsed = Link.Split('/');
string site = $"{siteParsed[0]} + // + {siteParsed[1]} + {siteParsed[2]}";
int.TryParse(MinHeight, out int minHeight);
int.TryParse(MinWidth, out int minWidth);
int index = 0;
while (index < page.Length)
{
int startSection = page.IndexOf("<" + section, index);
if (startSection < 0)
break;
int endSection = page.IndexOf(">", startSection) + 1;
index = endSection;
string imgSection = page.Substring(startSection, endSection - startSection);
int imgLinkStart = imgSection.IndexOf(tag + "=\"") + tag.Length + 2;
if (imgLinkStart < 0 || imgLinkStart > imgSection.Length)
continue;
int imgLinkEnd = imgSection.IndexOf("\"", imgLinkStart);
if (imgLinkEnd < 0)
continue;
string imgAdress = imgSection.Substring(imgLinkStart, imgLinkEnd - imgLinkStart);
string format = null;
foreach (var imgFormat in ConfigurationManager.AppSettings["ImgFormats"].Split(';'))
{
if (imgAdress.IndexOf(imgFormat) > 0)
{
format = imgFormat;
break;
}
}
// not an image
if (format == null)
continue;
// some internal resource, but we can try to get it anyways
if (!imgAdress.StartsWith("http"))
imgAdress = site + imgAdress;
string imgName = imgAdress.Split('/').Last();
if (!UsedAdresses.Contains(imgAdress))
{
try
{
Bitmap pic = new Bitmap(webClient.OpenRead(imgAdress));
if (pic.Width > minHeight && pic.Height > minWidth)
webClient.DownloadFile(imgAdress, SaveAdress + "\\" + imgName);
}
catch { }
finally
{
UsedAdresses.Add(imgAdress);
}
}
}
}
You are synchronously waiting for tasks to finish. This is not gonna work for WPF without a little bit of ConfigureAwait(false) magic. Here is a better solution:
private async Task Execute()
{
string tags = ConfigurationManager.AppSettings["HTMLTags"];
var cursor = Mouse.OverrideCursor;
Mouse.OverrideCursor = System.Windows.Input.Cursors.Wait;
List<Task> tasks = new List<Task>();
foreach (string tag in tags.Split(';'))
{
tasks.Add(ReadImagesAsync(tag));
//tasks.Add(Task.Run(() => ReadImages(tag)));
}
await Task.WhenAll(tasks.ToArray());
Mouse.OverrideCursor = cursor;
}
If this is WPF, then I'm sure you would call it when some kind of event happens. The way you should call this method is from event handler, e.g.:
private async void OnWindowOpened(object sender, EventArgs args)
{
await Execute();
}
Looking at the edited version of your question I can see that in fact you can make it all very nice and pretty by using async version of DownloadStringAsync:
private async Task ReadImages (string HTMLtag)
{
string section = HTMLtag.Split(':')[0];
string tag = HTMLtag.Split(':')[1];
List<string> UsedAdresses = new List<string>();
var webClient = new WebClient();
string page = await webClient.DownloadStringAsync(Link);
//...
}
Now, what's the deal with tasks.Add(Task.Run(() => ReadImages(tag)));?
This requires knowledge of SynchronizationContext. When you create a task, you copy the state of thread that scheduled the task, so you can come back to it when you are finished with await. When you call method without Task.Run, you say "I want to come back to UI thread". This is not possible, because UI thread is already waiting for the task and so they are both waiting for themselves. When you add another task to the mix, you are saying: "UI thread must schedule an 'outer' task that will schedule another, 'inner' task, that I will come back to."
Use WhenAll instead of WaitAll, Turn your Execute into async Task and await the task returned by Task.WhenAll.
This way it never blocks on an asynchronous code.
I found some more detailed articles explaining why actually deadlock happened here:
https://medium.com/bynder-tech/c-why-you-should-use-configureawait-false-in-your-library-code-d7837dce3d7f
https://blog.stephencleary.com/2012/07/dont-block-on-async-code.html
Short answer would be making a small change in my async method so it looks like that:
private async Task ReadImagesAsync(string HTMLtag)
{
await Task.Run(() =>
{
ReadImages(HTMLtag);
}).ConfigureAwait(false);
}
Yup. That's it. Suddenly it doesn't deadlock. But these two articles + #FCin response explain WHY it actually happened.
It's like you are saying i do not care when ReadImagesAsync() finishes but you have to wait for it .... Here is a definition
The Task.WaitAll blocks the current thread until all other tasks have completed execution.
The Task.WhenAll method is used to create a task that will complete if and only if all the other tasks are complete.
So, if you are using Task.WhenAll you would get a task object that isn't complete. However, it will not block and would allow the program to execute. On the contrary, the Task.WaitAll method call actually blocks and waits for all other tasks to complete.
Essentially, Task.WhenAll would provide you a task that isn't complete but you can use ContinueWith as soon as the specified tasks have completed their execution. Note that neither the Task.WhenAll method nor the Task.WaitAll would run the tasks, i.e., no tasks are started by any of these methods.
Task.WhenAll(taskList).ContinueWith(t => {
// write your code here
});

Why simple multi task doesn't work when multi thread does?

var finalList = new List<string>();
var list = new List<int> {1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ................. 999999};
var init = 0;
var limitPerThread = 5;
var countDownEvent = new CountdownEvent(list.Count);
for (var i = 0; i < list.Count; i++)
{
var listToFilter = list.Skip(init).Take(limitPerThread).ToList();
new Thread(delegate()
{
Foo(listToFilter);
countDownEvent.Signal();
}).Start();
init += limitPerThread;
}
//wait all to finish
countDownEvent.Wait();
private static void Foo(List<int> listToFilter)
{
var listDone = Boo(listToFilter);
lock (Object)
{
finalList.AddRange(listDone);
}
}
This doesn't:
var taskList = new List<Task>();
for (var i = 0; i < list.Count; i++)
{
var listToFilter = list.Skip(init).Take(limitPerThread).ToList();
var task = Task.Factory.StartNew(() => Foo(listToFilter));
taskList.add(task);
init += limitPerThread;
}
//wait all to finish
Task.WaitAll(taskList.ToArray());
This process must create at least 700 threads in the end. When I run using Thread, it works and creates all of them. But with Task it doesn't.. It seems like its not starting multiples Tasks async.
I really wanna know why.... any ideas?
EDIT
Another version with PLINQ (as suggested).
var taskList = new List<Task>(list.Count);
Parallel.ForEach(taskList, t =>
{
var listToFilter = list.Skip(init).Take(limitPerThread).ToList();
Foo(listToFilter);
init += limitPerThread;
t.Start();
});
Task.WaitAll(taskList.ToArray());
EDIT2:
public static List<Communication> Foo(List<Dispositive> listToPing)
{
var listResult = new List<Communication>();
foreach (var item in listToPing)
{
var listIps = item.listIps;
var communication = new Communication
{
IdDispositive = item.Id
};
try
{
for (var i = 0; i < listIps.Count(); i++)
{
var oPing = new Ping().Send(listIps.ElementAt(i).IpAddress, 10000);
if (oPing != null)
{
if (oPing.Status.Equals(IPStatus.TimedOut) && listIps.Count() > i+1)
continue;
if (oPing.Status.Equals(IPStatus.TimedOut))
{
communication.Result = "NOK";
break;
}
communication.Result = oPing.Status.Equals(IPStatus.Success) ? "OK" : "NOK";
break;
}
if (listIps.Count() > i+1)
continue;
communication.Result = "NOK";
break;
}
}
catch
{
communication.Result = "NOK";
}
finally
{
listResult.Add(communication);
}
}
return listResult;
}
Tasks are NOT multithreading. They can be used for that, but mostly they're actually used for the opposite - multiplexing on a single thread.
To use tasks for multithreading, I suggest using Parallel LINQ. It has many optimizations in it already, such as intelligent partitioning of your lists and only spawning as many threads as there ar CPU cores, etc.
To understand Task and async, think of it this way - a typical workload often includes IO that needs to be waited upon. Maybe you read a file, or query a webservice, or access a database, or whatever. The point is - your thread gets to wait a loooong time (in CPU cycles at least) until you get a response from some faraway destination.
In the Olden Days™ that meant that your thread was getting locked down (suspended) until that response came. If you wanted to do something else in the meantime, you needed to spawn a new thread. That's doable, but not too efficient. Each OS thread carries a significant overhead (memory, kernel resources) with it. And you could end up with several threads actively burning the CPU, which means that the OS needs to switch between them so that each gets a bit of CPU time and these "context switches" are pretty expensive.
async changes that workflow. Now you can have multiple workloads executing on the same thread. While one piece of work is awaiting the result from a faraway source, another can step in and use that thread to do something else useful. When that second workload gets to its own await, the first can awaken and continue.
After all, it doesn't make sense to spawn more threads than there are CPU cores. You're not going to get more work done that way. Just the opposite - more time will be spent on switching the threads and less time will be available for useful work.
That is what the Task/async/await was originally designed for. However Parallel LINQ has also taken advantage of it and reused it for multithreading. In this case you can look at it this way - the other threads is what your main thread is the "faraway destination" that your main thread is waiting on.
Tasks are executed on the Thread Pool. This means that a handful of threads will serve a large number of tasks. You have multi-threading, but not a thread for every task spawned.
You should use tasks. You should aim to use as much threads as your CPU. Generally, the thread pool is doing this for you.
How did you measure up the performance? Do you think that the 700 threads will work faster than 700 tasks executing by 4 threads? No, they would not.
It seems like its not starting multiples Tasks async
How did you came up with this? As other suggested in comments and in other answers, you probably need to remove a thread creation, as after creating 700 threads you'll degrade your system performance, as your threads would fight to each other for the processor time, without any work done faster.
So, you need to add the async/await for your IO operations, into the Foo method, with SendPingAsync version. Also, your method could be simplyfied, as many checks for a listIps.Count() > i + 1 conditions are useless - you do it in the for condition block:
public static async Task<List<Communication>> Foo(List<Dispositive> listToPing)
{
var listResult = new List<Communication>();
foreach (var item in listToPing)
{
var listIps = item.listIps;
var communication = new Communication
{
IdDispositive = item.Id
};
try
{
var ping = new Ping();
communication.Result = "NOK";
for (var i = 0; i < listIps.Count(); i++)
{
var oPing = await ping.SendPingAsync(listIps.ElementAt(i).IpAddress, 10000);
if (oPing != null)
{
if (oPing.Status.Equals(IPStatus.Success)
{
communication.Result = "OK";
break;
}
}
}
}
catch
{
communication.Result = "NOK";
}
finally
{
listResult.Add(communication);
}
}
return listResult;
}
Other problem with your code is that PLINQ version isn't threadsafe:
init += limitPerThread;
This can fail while executing in parallel. You may introduce some helper method, like in this answer:
private async Task<List<PingReply>> PingAsync(List<Communication> theListOfIPs)
{
Ping pingSender = new Ping();
var tasks = theListOfIPs.Select(ip => pingSender.SendPingAsync(ip, 10000));
var results = await Task.WhenAll(tasks);
return results.ToList();
}
And do this kind of check (try/catch logic removed for simplicity):
public static async Task<List<Communication>> Foo(List<Dispositive> listToPing)
{
var listResult = new List<Communication>();
foreach (var item in listToPing)
{
var listIps = item.listIps;
var communication = new Communication
{
IdDispositive = item.Id
};
var check = await PingAsync(listIps);
communication.Result = check.Any(p => p.Status.Equals(IPStatus.Success)) ? "OK" : "NOK";
}
}
And you probably should use Task.Run instead of Task.StartNew for being sure that you aren't blocking the UI thread.

TPL DataFlow Workflow

I have just started reading TPL Dataflow and it is really confusing for me. There are so many articles on this topic which I read but I am unable to digest it easily. May be it is difficult and may be I haven't started to grasp the idea.
The reason why I started looking into this is that I wanted to implement a scenario where parallel tasks could be run but in order and found that TPL Dataflow can be used as this.
I am practicing TPL and TPL Dataflow both and am at very beginners level so I need help from experts who could guide me to the right direction. In the test method written by me I have done the following thing,
private void btnTPLDataFlow_Click(object sender, EventArgs e)
{
Stopwatch watch = new Stopwatch();
watch.Start();
txtOutput.Clear();
ExecutionDataflowBlockOptions execOptions = new ExecutionDataflowBlockOptions();
execOptions.MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded;
ActionBlock<string> actionBlock = new ActionBlock<string>(async v =>
{
await Task.Delay(200);
await Task.Factory.StartNew(
() => txtOutput.Text += v + Environment.NewLine,
CancellationToken.None,
TaskCreationOptions.None,
scheduler
);
}, execOptions);
for (int i = 1; i < 101; i++)
{
actionBlock.Post(i.ToString());
}
actionBlock.Complete();
watch.Stop();
lblTPLDataFlow.Text = Convert.ToString(watch.ElapsedMilliseconds / 1000);
}
Now the procedure is parallel and both asynchronous (not freezing my UI) but the output generated is not in order whereas I have read that TPL Dataflow keeps the order of the elements by default. So my guess is that, then the Task which I have created is the culprit and it is not output the string in correct order. Am I right?
If this is the case then how do I make this Asynchronous and in order both?
I have tried to separate the code and tried to distribute the code in to different methods but my this try is failed as only string is output to textbox and nothing else happened.
private async void btnTPLDataFlow_Click(object sender, EventArgs e)
{
Stopwatch watch = new Stopwatch();
watch.Start();
await TPLDataFlowOperation();
watch.Stop();
lblTPLDataFlow.Text = Convert.ToString(watch.ElapsedMilliseconds / 1000);
}
public async Task TPLDataFlowOperation()
{
var actionBlock = new ActionBlock<int>(async values => txtOutput.Text += await ProcessValues(values) + Environment.NewLine,
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded, TaskScheduler = scheduler });
for (int i = 1; i < 101; i++)
{
actionBlock.Post(i);
}
actionBlock.Complete();
await actionBlock.Completion;
}
private async Task<string> ProcessValues(int i)
{
await Task.Delay(200);
return "Test " + i;
}
I know I have written a bad piece of code but this is the first time I am experimenting with TPL Dataflow.
How do I make this Asynchronous and in order?
This is something of a contradiction. You can make concurrent tasks start in order, but you can't really guarantee that they will run or complete in order.
Let's examine your code and see what's happening.
First, you've selected DataflowBlockOptions.Unbounded. This tells TPL Dataflow that it shouldn't limit the number of tasks that it allows to run concurrently. Therefore, each of your tasks will start at more-or-less the same time, in order.
Your asynchronous operation begins with await Task.Delay(200). This will cause your method to be suspended and then resume after about 200 ms. However, this delay is not exact, and will vary from one invocation to the next. Also, the mechanism by which your code is resumed after the delay may presumably take a variable amount of time. Because of this random variation in the actual delay, then next bit of code to run is now not in order—resulting in the discrepancy you're seeing.
You might find this example interesting. It's a console application to simplify things a bit.
class Program
{
static void Main(string[] args)
{
OutputNumbersWithDataflow();
OutputNumbersWithParallelLinq();
Console.ReadLine();
}
private static async Task HandleStringAsync(string s)
{
await Task.Delay(200);
Console.WriteLine("Handled {0}.", s);
}
private static void OutputNumbersWithDataflow()
{
var block = new ActionBlock<string>(
HandleStringAsync,
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
for (int i = 0; i < 20; i++)
{
block.Post(i.ToString());
}
block.Complete();
block.Completion.Wait();
}
private static string HandleString(string s)
{
// Perform some computation on s...
Thread.Sleep(200);
return s;
}
private static void OutputNumbersWithParallelLinq()
{
var myNumbers = Enumerable.Range(0, 20).AsParallel()
.AsOrdered()
.WithExecutionMode(ParallelExecutionMode.ForceParallelism)
.WithMergeOptions(ParallelMergeOptions.NotBuffered);
var processed = from i in myNumbers
select HandleString(i.ToString());
foreach (var s in processed)
{
Console.WriteLine(s);
}
}
}
The first set of numbers is calculated using a method rather similar to yours—with TPL Dataflow. The numbers are out-of-order.
The second set of numbers, output by OutputNumbersWithParallelLinq(), doesn't use Dataflow at all. It relies on the Parallel LINQ features built into .NET. This runs my HandleString() method on background threads, but keeps the data in order through to the end.
The limitation here is that PLINQ doesn't let you supply an async method. (Well, you could, but it wouldn't give you the desired behavior.) HandleString() is a conventional synchronous method; it just gets executed on a background thread.
And here's a more complex Dataflow example that does preserve the correct order:
private static void OutputNumbersWithDataflowTransformBlock()
{
Random r = new Random();
var transformBlock = new TransformBlock<string, string>(
async s =>
{
// Make the delay extra random, just to be sure.
await Task.Delay(160 + r.Next(80));
return s;
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
// For a GUI application you should also set the
// scheduler here to make sure the output happens
// on the correct thread.
var outputBlock = new ActionBlock<string>(
s => Console.WriteLine("Handled {0}.", s),
new ExecutionDataflowBlockOptions
{
SingleProducerConstrained = true,
MaxDegreeOfParallelism = 1
});
transformBlock.LinkTo(outputBlock, new DataflowLinkOptions { PropagateCompletion = true });
for (int i = 0; i < 20; i++)
{
transformBlock.Post(i.ToString());
}
transformBlock.Complete();
outputBlock.Completion.Wait();
}

c# Create various number of threads

I have a method which creates different number of requests according to user's input. For each input I have to create an instance of an object and run method from that object in new thread.
It means I never know, how many threads I will need.
And later I will have to access data from each instance that I created before.
So the question is: How can I create different number of requests (and for each request one thread) according to user's input?
An example:
userInput = [start, end, start2, end2, start3, end3]
//for each pair Start-end create instance with unique name or make it other way accessible
Request req1 = new Request('start', 'end')
Request req2 = new Request('start2', 'end2')
Request req3 = new Request('start3', 'end3')
//create thread for each instance
Thread t1 = new Thread(new ThreadStart(req1.GetResponse));
t1.Start();
Thread t2 = new Thread(new ThreadStart(req2.GetResponse));
t2.Start();
Thread t3 = new Thread(new ThreadStart(req3.GetResponse));
t3.Start();
//get data from each instance
string a = req1.result
string b = req2.result
string c = req3.result
In .NET 4, this could be done via Task<T>. Instead of making a thread, you'd write this as something like:
Request req1 = new Request("start", "end");
Task<string> task1 = req1.GetResponseAsync(); // Make GetResponse return Task<string>
// Later, when you need the results:
string a = task1.Result; // This will block until it's completed.
You'd then write the GetRepsonseAsync method something like:
public Task<string> GetRepsonseAsync()
{
return Task.Factory.StartNew( () =>
{
return this.GetResponse(); // calls the synchronous version
});
}
You should NOT create a new thread per Request. Instead take advantage of the ThreadPool, either by using TPL, PLINQ or ThreadPool.QueueUserWorkItem.
An example of how you could do it with PLINQ:
public static class Extensions
{
public static IEnumerable<Tuple<string, string>> Pair(this string[] source)
{
for (int i = 0; i < source.Length; i += 2)
{
yield return Tuple.Create(source[i], source[i + 1]);
}
}
}
class Program
{
static void Main(string[] args)
{
var userinput = "start, end, start2, end2, start3, end3, start4, end4";
var responses = userinput
.Replace(" ", string.Empty)
.Split(',')
.Pair()
.AsParallel()
.Select(x => new Request(x.Item1, x.Item2).GetResponse());
foreach (var r in responses)
{
// do something with the response
}
Console.ReadKey();
}
}
If you are using .NET 3.5 or lower you could use the ThreadPool.QueueUserWorkItem method and if you are using .NET 4.0 use the TPL.

Categories