I have a method which creates different number of requests according to user's input. For each input I have to create an instance of an object and run method from that object in new thread.
It means I never know, how many threads I will need.
And later I will have to access data from each instance that I created before.
So the question is: How can I create different number of requests (and for each request one thread) according to user's input?
An example:
userInput = [start, end, start2, end2, start3, end3]
//for each pair Start-end create instance with unique name or make it other way accessible
Request req1 = new Request('start', 'end')
Request req2 = new Request('start2', 'end2')
Request req3 = new Request('start3', 'end3')
//create thread for each instance
Thread t1 = new Thread(new ThreadStart(req1.GetResponse));
t1.Start();
Thread t2 = new Thread(new ThreadStart(req2.GetResponse));
t2.Start();
Thread t3 = new Thread(new ThreadStart(req3.GetResponse));
t3.Start();
//get data from each instance
string a = req1.result
string b = req2.result
string c = req3.result
In .NET 4, this could be done via Task<T>. Instead of making a thread, you'd write this as something like:
Request req1 = new Request("start", "end");
Task<string> task1 = req1.GetResponseAsync(); // Make GetResponse return Task<string>
// Later, when you need the results:
string a = task1.Result; // This will block until it's completed.
You'd then write the GetRepsonseAsync method something like:
public Task<string> GetRepsonseAsync()
{
return Task.Factory.StartNew( () =>
{
return this.GetResponse(); // calls the synchronous version
});
}
You should NOT create a new thread per Request. Instead take advantage of the ThreadPool, either by using TPL, PLINQ or ThreadPool.QueueUserWorkItem.
An example of how you could do it with PLINQ:
public static class Extensions
{
public static IEnumerable<Tuple<string, string>> Pair(this string[] source)
{
for (int i = 0; i < source.Length; i += 2)
{
yield return Tuple.Create(source[i], source[i + 1]);
}
}
}
class Program
{
static void Main(string[] args)
{
var userinput = "start, end, start2, end2, start3, end3, start4, end4";
var responses = userinput
.Replace(" ", string.Empty)
.Split(',')
.Pair()
.AsParallel()
.Select(x => new Request(x.Item1, x.Item2).GetResponse());
foreach (var r in responses)
{
// do something with the response
}
Console.ReadKey();
}
}
If you are using .NET 3.5 or lower you could use the ThreadPool.QueueUserWorkItem method and if you are using .NET 4.0 use the TPL.
Related
I have a sample of code
var options = new ExecutionDataflowBlockOptions();
var actionBlock = new ActionBlock<int>(async request=>{
var rand = new Ranodm();
//do some stuff with request by using rand
},options);
And the problem with this code is that in every request I have to create new rand object. There is a way to define one rand object per thread and re-use the same object when handling requests?
Thanks to Reed Copsey for the nice article and Theodor Zoulias for reminding about ThreadLocal<T>.
The new ThreadLocal class provides us with a strongly typed,
locally scoped object we can use to setup data that is kept separate
for each thread. This allows us to use data stored per thread,
without having to introduce static variables into our types.
Internally, the ThreadLocal instance will automatically setup the
static data, manage its lifetime, do all of the casting to and from
our specific type. This makes developing much simpler.
A msdn example:
// Demonstrates:
// ThreadLocal(T) constructor
// ThreadLocal(T).Value
// One usage of ThreadLocal(T)
static void Main()
{
// Thread-Local variable that yields a name for a thread
ThreadLocal<string> ThreadName = new ThreadLocal<string>(() =>
{
return "Thread" + Thread.CurrentThread.ManagedThreadId;
});
// Action that prints out ThreadName for the current thread
Action action = () =>
{
// If ThreadName.IsValueCreated is true, it means that we are not the
// first action to run on this thread.
bool repeat = ThreadName.IsValueCreated;
Console.WriteLine("ThreadName = {0} {1}", ThreadName.Value, repeat ? "(repeat)" : "");
};
// Launch eight of them. On 4 cores or less, you should see some repeat ThreadNames
Parallel.Invoke(action, action, action, action, action, action, action, action);
// Dispose when you are done
ThreadName.Dispose();
}
So your code would like this:
ThreadLocal<Random> ThreadName = new ThreadLocal<Random>(() =>
{
return new Random();
});
var options = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 4,
EnsureOrdered = false,
BoundedCapacity = 4 * 8
};
var actionBlock = new ActionBlock<int>(async request =>
{
bool repeat = ThreadName.IsValueCreated;
var random = ThreadName.Value; // your local random class
Console.WriteLine($"Thread: {Thread.CurrentThread.ManagedThreadId},
repeat: ", repeat ? "(repeat)" : "");
}, options);
I'm new to tasks and have a question regarding the usage. Does the Task.Factory fire for all items in the foreach loop or block at the 'await' basically making the program single threaded? If I am thinking about this correctly, the foreach loop starts all the tasks and the .GetAwaiter().GetResult(); is blocking the main thread until the last task is complete.
Also, I'm just wanting some anonymous tasks to load the data. Would this be a correct implementation? I'm not referring to exception handling as this is simply an example.
To provide clarity, I am loading data into a database from an outside API. This one is using the FRED database. (https://fred.stlouisfed.org/), but I have several I will hit to complete the entire transfer (maybe 200k data points). Once they are done I update the tables, refresh market calculations, etc. Some of it is real time and some of it is End-of-day. I would also like to say, I currently have everything working in docker, but have been working to update the code using tasks to improve execution.
class Program
{
private async Task SQLBulkLoader()
{
foreach (var fileListObj in indicators.file_list)
{
await Task.Factory.StartNew( () =>
{
string json = this.GET(//API call);
SeriesObject obj = JsonConvert.DeserializeObject<SeriesObject>(json);
DataTable dataTableConversion = ConvertToDataTable(obj.observations);
dataTableConversion.TableName = fileListObj.series_id;
using (SqlConnection dbConnection = new SqlConnection("SQL Connection"))
{
dbConnection.Open();
using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
{
s.DestinationTableName = dataTableConversion.TableName;
foreach (var column in dataTableConversion.Columns)
s.ColumnMappings.Add(column.ToString(), column.ToString());
s.WriteToServer(dataTableConversion);
}
Console.WriteLine("File: {0} Complete", fileListObj.series_id);
}
});
}
}
static void Main(string[] args)
{
Program worker = new Program();
worker.SQLBulkLoader().GetAwaiter().GetResult();
}
}
Your awaiting the task returned from Task.Factory.StartNew does make it effectively single threaded. You can see a simple demonstration of this with this short LinqPad example:
for (var i = 0; i < 3; i++)
{
var index = i;
$"{index} inline".Dump();
await Task.Run(() =>
{
Thread.Sleep((3 - index) * 1000);
$"{index} in thread".Dump();
});
}
Here we wait less as we progress through the loop. The output is:
0 inline
0 in thread
1 inline
1 in thread
2 inline
2 in thread
If you remove the await in front of StartNew you'll see it runs in parallel. As others have mentioned, you can certainly use Parallel.ForEach, but for a demonstration of doing it a bit more manually, you can consider a solution like this:
var tasks = new List<Task>();
for (var i = 0; i < 3; i++)
{
var index = i;
$"{index} inline".Dump();
tasks.Add(Task.Factory.StartNew(() =>
{
Thread.Sleep((3 - index) * 1000);
$"{index} in thread".Dump();
}));
}
Task.WaitAll(tasks.ToArray());
Notice now how the result is:
0 inline
1 inline
2 inline
2 in thread
1 in thread
0 in thread
This is a typical problem that C# 8.0 Async Streams are going to solve very soon.
Until C# 8.0 is released, you can use the AsyncEnumarator library:
using System.Collections.Async;
class Program
{
private async Task SQLBulkLoader() {
await indicators.file_list.ParallelForEachAsync(async fileListObj =>
{
...
await s.WriteToServerAsync(dataTableConversion);
...
},
maxDegreeOfParalellism: 3,
cancellationToken: default);
}
static void Main(string[] args)
{
Program worker = new Program();
worker.SQLBulkLoader().GetAwaiter().GetResult();
}
}
I do not recommend using Parallel.ForEach and Task.WhenAll as those functions are not designed for asynchronous streaming.
You'll want to add each task to a collection and then use Task.WhenAll to await all of the tasks in that collection:
private async Task SQLBulkLoader()
{
var tasks = new List<Task>();
foreach (var fileListObj in indicators.file_list)
{
tasks.Add(Task.Factory.StartNew( () => { //Doing Stuff }));
}
await Task.WhenAll(tasks.ToArray());
}
My take on this: most time consuming operations will be getting the data using a GET operation and the actual call to WriteToServer using SqlBulkCopy. If you take a look at that class you will see that there is a native async method WriteToServerAsync method (docs here)
. Always use those before creating Tasks yourself using Task.Run.
The same applies to the http GET call. You can use the native HttpClient.GetAsync (docs here) for that.
Doing that you can rewrite your code to this:
private async Task ProcessFileAsync(string series_id)
{
string json = await GetAsync();
SeriesObject obj = JsonConvert.DeserializeObject<SeriesObject>(json);
DataTable dataTableConversion = ConvertToDataTable(obj.observations);
dataTableConversion.TableName = series_id;
using (SqlConnection dbConnection = new SqlConnection("SQL Connection"))
{
dbConnection.Open();
using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
{
s.DestinationTableName = dataTableConversion.TableName;
foreach (var column in dataTableConversion.Columns)
s.ColumnMappings.Add(column.ToString(), column.ToString());
await s.WriteToServerAsync(dataTableConversion);
}
Console.WriteLine("File: {0} Complete", series_id);
}
}
private async Task SQLBulkLoaderAsync()
{
var tasks = indicators.file_list.Select(f => ProcessFileAsync(f.series_id));
await Task.WhenAll(tasks);
}
Both operations (http call and sql server call) are I/O calls. Using the native async/await pattern there won't even be a thread created or used, see this question for a more in-depth explanation. That is why for IO bound operations you should never have to use Task.Run (or Task.Factory.StartNew. But do mind that Task.Run is the recommended approach).
Sidenote: if you are using HttpClient in a loop, please read this about how to correctly use it.
If you need to limit the number of parallel actions you could also use TPL Dataflow as it plays very nice with Task based IO bound operations. The SQLBulkLoaderAsyncshould then be modified to (leaving the ProcessFileAsync method from earlier this answer intact):
private async Task SQLBulkLoaderAsync()
{
var ab = new ActionBlock<string>(ProcessFileAsync, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });
foreach (var file in indicators.file_list)
{
ab.Post(file.series_id);
}
ab.Complete();
await ab.Completion;
}
Use a Parallel.ForEach loop to enable data parallelism over any System.Collections.Generic.IEnumerable<T> source.
// Method signature: Parallel.ForEach(IEnumerable<TSource> source, Action<TSource> body)
Parallel.ForEach(fileList, (currentFile) =>
{
//Doing Stuff
Console.WriteLine("Processing {0} on thread {1}", currentFile, Thread.CurrentThread.ManagedThreadId);
});
Why didnt you try this :), this program will not start parallel tasks (in a foreach), it will be blocking but the logic in task will be done in separate thread from threadpool (only one at the time, but the main thread will be blocked).
The proper approach in your situation is to use Paraller.ForEach
How can I convert this foreach code to Parallel.ForEach?
I have just started reading TPL Dataflow and it is really confusing for me. There are so many articles on this topic which I read but I am unable to digest it easily. May be it is difficult and may be I haven't started to grasp the idea.
The reason why I started looking into this is that I wanted to implement a scenario where parallel tasks could be run but in order and found that TPL Dataflow can be used as this.
I am practicing TPL and TPL Dataflow both and am at very beginners level so I need help from experts who could guide me to the right direction. In the test method written by me I have done the following thing,
private void btnTPLDataFlow_Click(object sender, EventArgs e)
{
Stopwatch watch = new Stopwatch();
watch.Start();
txtOutput.Clear();
ExecutionDataflowBlockOptions execOptions = new ExecutionDataflowBlockOptions();
execOptions.MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded;
ActionBlock<string> actionBlock = new ActionBlock<string>(async v =>
{
await Task.Delay(200);
await Task.Factory.StartNew(
() => txtOutput.Text += v + Environment.NewLine,
CancellationToken.None,
TaskCreationOptions.None,
scheduler
);
}, execOptions);
for (int i = 1; i < 101; i++)
{
actionBlock.Post(i.ToString());
}
actionBlock.Complete();
watch.Stop();
lblTPLDataFlow.Text = Convert.ToString(watch.ElapsedMilliseconds / 1000);
}
Now the procedure is parallel and both asynchronous (not freezing my UI) but the output generated is not in order whereas I have read that TPL Dataflow keeps the order of the elements by default. So my guess is that, then the Task which I have created is the culprit and it is not output the string in correct order. Am I right?
If this is the case then how do I make this Asynchronous and in order both?
I have tried to separate the code and tried to distribute the code in to different methods but my this try is failed as only string is output to textbox and nothing else happened.
private async void btnTPLDataFlow_Click(object sender, EventArgs e)
{
Stopwatch watch = new Stopwatch();
watch.Start();
await TPLDataFlowOperation();
watch.Stop();
lblTPLDataFlow.Text = Convert.ToString(watch.ElapsedMilliseconds / 1000);
}
public async Task TPLDataFlowOperation()
{
var actionBlock = new ActionBlock<int>(async values => txtOutput.Text += await ProcessValues(values) + Environment.NewLine,
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded, TaskScheduler = scheduler });
for (int i = 1; i < 101; i++)
{
actionBlock.Post(i);
}
actionBlock.Complete();
await actionBlock.Completion;
}
private async Task<string> ProcessValues(int i)
{
await Task.Delay(200);
return "Test " + i;
}
I know I have written a bad piece of code but this is the first time I am experimenting with TPL Dataflow.
How do I make this Asynchronous and in order?
This is something of a contradiction. You can make concurrent tasks start in order, but you can't really guarantee that they will run or complete in order.
Let's examine your code and see what's happening.
First, you've selected DataflowBlockOptions.Unbounded. This tells TPL Dataflow that it shouldn't limit the number of tasks that it allows to run concurrently. Therefore, each of your tasks will start at more-or-less the same time, in order.
Your asynchronous operation begins with await Task.Delay(200). This will cause your method to be suspended and then resume after about 200 ms. However, this delay is not exact, and will vary from one invocation to the next. Also, the mechanism by which your code is resumed after the delay may presumably take a variable amount of time. Because of this random variation in the actual delay, then next bit of code to run is now not in order—resulting in the discrepancy you're seeing.
You might find this example interesting. It's a console application to simplify things a bit.
class Program
{
static void Main(string[] args)
{
OutputNumbersWithDataflow();
OutputNumbersWithParallelLinq();
Console.ReadLine();
}
private static async Task HandleStringAsync(string s)
{
await Task.Delay(200);
Console.WriteLine("Handled {0}.", s);
}
private static void OutputNumbersWithDataflow()
{
var block = new ActionBlock<string>(
HandleStringAsync,
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
for (int i = 0; i < 20; i++)
{
block.Post(i.ToString());
}
block.Complete();
block.Completion.Wait();
}
private static string HandleString(string s)
{
// Perform some computation on s...
Thread.Sleep(200);
return s;
}
private static void OutputNumbersWithParallelLinq()
{
var myNumbers = Enumerable.Range(0, 20).AsParallel()
.AsOrdered()
.WithExecutionMode(ParallelExecutionMode.ForceParallelism)
.WithMergeOptions(ParallelMergeOptions.NotBuffered);
var processed = from i in myNumbers
select HandleString(i.ToString());
foreach (var s in processed)
{
Console.WriteLine(s);
}
}
}
The first set of numbers is calculated using a method rather similar to yours—with TPL Dataflow. The numbers are out-of-order.
The second set of numbers, output by OutputNumbersWithParallelLinq(), doesn't use Dataflow at all. It relies on the Parallel LINQ features built into .NET. This runs my HandleString() method on background threads, but keeps the data in order through to the end.
The limitation here is that PLINQ doesn't let you supply an async method. (Well, you could, but it wouldn't give you the desired behavior.) HandleString() is a conventional synchronous method; it just gets executed on a background thread.
And here's a more complex Dataflow example that does preserve the correct order:
private static void OutputNumbersWithDataflowTransformBlock()
{
Random r = new Random();
var transformBlock = new TransformBlock<string, string>(
async s =>
{
// Make the delay extra random, just to be sure.
await Task.Delay(160 + r.Next(80));
return s;
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
// For a GUI application you should also set the
// scheduler here to make sure the output happens
// on the correct thread.
var outputBlock = new ActionBlock<string>(
s => Console.WriteLine("Handled {0}.", s),
new ExecutionDataflowBlockOptions
{
SingleProducerConstrained = true,
MaxDegreeOfParallelism = 1
});
transformBlock.LinkTo(outputBlock, new DataflowLinkOptions { PropagateCompletion = true });
for (int i = 0; i < 20; i++)
{
transformBlock.Post(i.ToString());
}
transformBlock.Complete();
outputBlock.Completion.Wait();
}
I have a requirement where I have pull data from Sailthru API.The problem is it will take forever if I make a call synchronously as the response time depends on data.I have very much new to threading and tried out something but it didn't seems to work as expected. Could anyone please guide.
Below is my sample code
public void GetJobId()
{
Hashtable BlastIDs = getBlastIDs();
foreach (DictionaryEntry entry in BlastIDs)
{
Hashtable blastStats = new Hashtable();
blastStats.Add("stat", "blast");
blastStats.Add("blast_id", entry.Value.ToString());
//Function call 1
//Thread newThread = new Thread(() =>
//{
GetBlastDetails(entry.Value.ToString());
//});
//newThread.Start();
}
}
public void GetBlastDetails(string blast_id)
{
Hashtable tbData = new Hashtable();
tbData.Add("job", "blast_query");
tbData.Add("blast_id", blast_id);
response = client.ApiPost("job", tbData);
object data = response.RawResponse;
JObject jtry = new JObject();
jtry = JObject.Parse(response.RawResponse.ToString());
if (jtry.SelectToken("job_id") != null)
{
//Function call 2
Thread newThread = new Thread(() =>
{
GetJobwiseDetail(jtry.SelectToken("job_id").ToString(), client,blast_id);
});
newThread.Start();
}
}
public void GetJobwiseDetail(string job_id, SailthruClient client,string blast_id)
{
Hashtable tbData = new Hashtable();
tbData.Add("job_id", job_id);
SailthruResponse response;
response = client.ApiGet("job", tbData);
JObject jtry = new JObject();
jtry = JObject.Parse(response.RawResponse.ToString());
string status = jtry.SelectToken("status").ToString();
if (status != "completed")
{
//Function call 3
Thread.Sleep(3000);
Thread newThread = new Thread(() =>
{
GetJobwiseDetail(job_id, client,blast_id);
});
newThread.Start();
string str = "test sleeping thread";
}
else {
string export_url = jtry.SelectToken("export_url").ToString();
TraceService(export_url);
SaveCSVDataToDB(export_url,blast_id);
}
}
I want the Function call 1 to start asynchronously(or may be after gap of 3 seconds to avoid load on processor). In Function call 3 I am calling the same function again if the status is not completed with delay of 3 seconds to give time for receiving response.
Also correct me if my question sounds stupid.
You should never use Thread.Sleep like that because among others you don't know if 3000ms will be enough and instead of using Thread class you should use Task class which is much better option because it provides some additional features, thread pool managing etc. I don't have access to IDE but you should try to use Task.Factory.StartNew to invoke your request asynchronously then your function GetJobwiseDetail should return some value that you want to save to database and then use .ContinueWith with delegate that should save your function result into database. If you are able to use .NET 4.5 you should try feature async/await.
Easier solution:
Parallel.ForEach
Get some details on the internet about it, you have to know nothing about threading. It will invoke every loop iteration on another thread.
First of all, avoid at all cost starting new Threads in your code; those threads will suck your memory like a collapsing star because each of those gets ~1MB of memory allocated to it.
Now for the code - depending on the framework you can choose from the following:
ThreadPool.QueueUserWorkItem for older versions of .NET Framework
Parallel.ForEach for .NET Framework 4
async and await for .NET Framework 4.5
TPL Dataflow also for .NET Framework 4.5
The code you show fits quite well with Dataflow and also I wouldn't suggest using async/await here because its usage would transform your example is a sort of fire and forget mechanism which is against the recommendations of using async/await.
To use Dataflow you'll need in broad lines:
one TransformBlock which will take a string as an input and will return the response from the API
one BroadcastBlock that will broadcast the response to
two ActionBlocks; one to store data in the database and the other one to call TraceService
The code should look like this:
var downloader = new TransformBlock<string, SailthruResponse>(jobId =>
{
var data = new HashTable();
data.Add("job_id", jobId);
return client.ApiGet("job", data);
});
var broadcaster = new BroadcastBlock<SailthruResponse>(response => response);
var databaseWriter = new ActionBlock<SailthruResponse>(response =>
{
// save to database...
})
var tracer = new ActionBlock<SailthruResponse>(response =>
{
//TraceService() call
});
var options = new DataflowLinkOptions{ PropagateCompletion = true };
// link blocks
downloader.LinkTo(broadcaster, options);
broadcaster.LinkTo(databaseWriter, options);
broadcaster.LinkTo(tracer, options);
// process values
foreach(var value in getBlastIds())
{
downloader.Post(value);
}
downloader.Complete();
try with async as below.
async Task Task_MethodAsync()
{
// . . .
// The method has no return statement.
}
So i have been multithreading lately,and since im new to this im probably doing something basic wrong..
Thread mainthread = new Thread(() => threadmain("string", "string", "string"));
mainthread.Start();
the above code works flawlessly but now i want to get a value back from my thread.
to do that i searched on SO and found this code:
object value = null;
var thread = new Thread(
() =>
{
value = "Hello World";
});
thread.Start();
thread.Join();
MessageBox.Show(value);
}
and i dont know how to combine the two.
the return value will be a string.
thank you for helping a newbie,i tried combining them but got errors due to my lack of experience
edit:
my thread:
public void threadmain(string url,string search, string regexstring)
{
using (WebClient client = new WebClient()) // WebClient class inherits IDisposable
{
string allthreadusernames = "";
string htmlCode = client.DownloadString(url);
string[] htmlarray = htmlCode.Split(new string[] { "\n", "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
foreach (string line in htmlarray)
{
if (line.Contains(search))
{
var regex = new Regex(regexstring);
var matches = regex.Matches(line);
foreach (var singleuser in matches.Cast<Match>().ToList())
{
allthreadusernames = allthreadusernames + "\n" + singleuser.Groups[1].Value;
}
}
}
MessageBox.Show(allthreadusernames);
}
}
An easy solution would be to use another level of abstraction for asynchronous operations: Tasks.
Example:
public static int Calculate()
{
// Simulate some work
int sum = 0;
for (int i = 0; i < 10000; i++)
{
sum += i;
}
return sum;
}
// ...
var task = System.Threading.Tasks.Task.Run(() => Calculate());
int result = task.Result; // waits/blocks until the task is finished
In addition to task.Result, you can also wait for the task with await task (async/await pattern) or task.Wait (+ timeout and/or cancellation token).
Threads aren't really supposed to behave like functions. The code you found still lacks synchronization/thread-safety of reading/writing the output variable.
Task Parallel Library provides a better abstraction, Tasks.
Your problem can then be solved by code similar to this:
var result = await Task.Run(() => MethodReturningAValue());
Running tasks like this is actually more lightweight, as it only borrows an existing thread from either the SynchronizationContext or the .NET thread pool, with low overhead.
I highly recommend Stephen Cleary's blog series about using tasks for parallelism and asynchronicity. It should answer all your further questions.