I have a for-loop that creates a new Thread each iteration. In short, my loop is creating 20 threads that does some action, at the same time.
My goal inside each of these threads, is to create a DateTime variable with a start time, execute an operation, and create a DateTime variable with an end time. Hereafter I'll take the difference between these two variables to find out, how long this operation took in this SPECIFIC thread. Then log it out.
However that isn't working as expected, and I'm confused on why.
It seems like it justs "adds" the time to the variables, each iteration of a new thread, instead of creating a completely new and fresh version of the variable, only to be taking into consideration in that specific thread.
This is my for-loop code:
for(int i = 0; i < 20; i++)
{
Thread thread = new Thread(() =>
{
Stopwatch sw = new Stopwatch();
sw.Start();
RESTRequest(Method.POST, ....),
sw.Stop();
Console.WriteLine("Result took (" + sw.Elapsed.Seconds + " seconds, " + sw.Elapsed.Milliseconds + " milliseconds)");
});
thread.IsBackground = true;
thread.Start();
}
Long operation function:
public static string RESTRequest(Method method, string endpoint, string resource, string body, SimplytureRESTRequestHeader[] requestHeaders = null, SimplytureRESTResponseHeader[] responseHeaders = null, SimplytureRESTAuthentication authentication = null, SimplytureRESTParameter[] parameters = null)
{
var client = new RestClient(endpoint);
if(authentication != null)
{
client.Authenticator = new HttpBasicAuthenticator(authentication.username, authentication.password);
}
var request = new RestRequest(resource, method);
if (requestHeaders != null)
{
foreach (var header in requestHeaders)
{
request.AddHeader(header.headerType, header.headerValue);
}
}
if(body != null)
{
request.AddParameter("text/json", body, ParameterType.RequestBody);
}
if(parameters != null)
{
foreach (var parameter in parameters)
{
request.AddParameter(parameter.key, parameter.value);
}
}
IRestResponse response = client.Execute(request);
if (responseHeaders != null)
{
foreach (var header in responseHeaders)
{
var par = new Parameter();
par.Name = header.headerType;
par.Value = header.headerValue;
response.Headers.Add(par);
}
}
var content = response.Content;
return content;
}
This is my results:
EDIT:
I also tried using the Stopwatch class, but it didn't do any difference, but definitely more handy. I also Added the long operation for debugging.
There is a limitation for concurrent calls to the same ServicePoint.
The default is 2 concurrent connections for each unique ServicePoint.
Add System.Net.ServicePointManager.DefaultConnectionLimit = 20; to raise that limit to match the thread count.
System.Net.ServicePointManager.DefaultConnectionLimit
You can also set this value in config file
<system.net>
<connectionManagement>
<add address="*" maxconnection="20" />
</connectionManagement>
</system.net>
Related
I am having a problem calling a method in a list of Tasks. I have a method that creates N number of tasks. In each task I perform some operations that in the end results an fetching data via HttpWebRequest and writing that data into a file. I use lock objects to lock the access to shared resources like variables. Everything performs great except the call for a method that creates a executes an HttpWebRequest (method GetData). Whenever I don't lock that call for the method (GetData) it seems that some data/files are skipped. For example:
With the lock object I get file 1,2,3 and 4
Without the lock object I get file 2,4 and 3
Here's the code for the method that creates the tasks
private object lockObjectWebRequest= new object();
private object lockObjectTransactions = new object();
public List<Task> ExtractLoanTransactionsData(string URLReceived, string Headers, string Body)
{
List<Task> Tasks = new List<Task>();
try
{
int Limit = 0;
int OffsetItemsTotal = 0;
int NumberOftasks = 4;
// Create the task to run in parallel
for (int i = 0; i <= NumberOftasks; i++)
{
int OffsetCalculated = 0;
if (i > 0)
{
OffsetCalculated = Limit * i;
}
Tasks.Add(Task.Factory.StartNew(() =>
{
string URL = URLReceived+ "&offset=" + OffsetCalculated .ToString() + "&limit=" + Limit.ToString();
string Output = string.Empty;
lock (lockObjectWebRequest)
{
Output = GetData(URL, Headers,Body);
}
if (Output != "[]")
{
lock (lockObjectTransactions)
{
Identifier++;
Job.Identifier = Identifier;
// write to file
string json = JValue.Parse(Output).ToString(Formatting.Indented);
string FileName = OffSet.ToString() + Identifier;
string Path = #"C:\FileFolder\" + FileName + ".json";
File.WriteAllText(Path, json);
}
}
}));
}
}
catch (Exception ex)
{
Tasks = new List<Task>();
}
return Tasks;
}
Here's the code that performs the HttpWebRequest:
public string GetData(string URL, string Headers, string Body)
{
string Data = string.Empty;
Headers = Headers.Trim('{').Trim('}');
string[] HeadersSplit = Headers.Split(new char[] { ',', ':' });
HttpWebRequest WebRequest = (HttpWebRequest)HttpWebRequest.Create(URL);
WebRequest.Credentials = new NetworkCredential();
WebRequest.Method = "POST";
HttpWebResponse WebResponse;
// Set necessary Request Headers
for (int i = 0; i < HeadersSplit.Length; i = i + 2)
{
string HeaderPart1 = HeadersSplit[i].Replace("\"", "").Trim();
string HeaderPart2 = HeadersSplit[i + 1].Replace("\"", "").Trim();
if (HeaderPart1 == "Content-Type")
{
WebRequest.ContentType = HeaderPart2;
}
else if (HeaderPart1 == "Accept")
{
WebRequest.Accept = HeaderPart2;
}
else if (HeaderPart1 == "Authorization")
{
WebRequest.Headers["Authorization"] = HeaderPart2;
}
}
WebRequest.Headers.Add("cache-control", "no-cache");
// Add body to Request
using (var streamWriter = new StreamWriter(WebRequest.GetRequestStream()))
{
streamWriter.Write(Body);
streamWriter.Flush();
streamWriter.Close();
}
// Execute Request
WebResponse = (HttpWebResponse)WebRequest.GetResponse();
// Validate Response
if (WebResponse.StatusCode == HttpStatusCode.OK)
{
using (var streamReader = new StreamReader(WebResponse.GetResponseStream()))
{
Data = streamReader.ReadToEnd();
}
}
return Data;
}
What am I doing wrong here? The method doesn't have global data that is shared between tasks.
But you do have data that is shared between the tasks: the local varibleIdentifier and the method argument Job.
You are writing to a file using the Identifier in the file-name. If the lock is not in place, that piece of code will be running simultaniously.
The implications for Job can't be deduced from your question.
I think you can solve the identifier problem by doing this:
var identifier = Interlocked.Increment(ref Identifier);
Job.Identifier = identifier; // Use 'identifier', not 'Identifier'
// write to file
string json = ...;
string FileName = OffSet.ToString() + "_" +
"MAMBU_LT_" + DateTime.Now.ToString("yyyyMMddHHmmss") + "_" +
identifier; // Use 'identifier', not 'Identifier'
...
I've made a tool that sends HTTP requests (GET) (reads the info of what to send from a .txt), captures the json and parses it + writes it to a .txt. The tool is a console app targetting .NET Framework 4.5.
Could I possibly speed this up with the help of "multi-threading"?
System.IO.StreamReader file =
new System.IO.StreamReader(#"file.txt");
while ((line = file.ReadLine()) != null)
{
using (var client = new HttpClient(new HttpClientHandler { AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate }))
{
client.BaseAddress = new Uri("https://www.website.com/");
HttpResponseMessage response = client.GetAsync("index?userGetLevel=" + line).Result;
response.EnsureSuccessStatusCode();
string result = response.Content.ReadAsStringAsync().Result;
dynamic json = JObject.Parse(result);
string level = json.data.level;
if (level == "null")
{
Console.WriteLine("{0}: User does not exist.", line);
}
else
{
Console.WriteLine("{0}: User level is {1}.", line, level);
using (StreamWriter sw = File.AppendText(levels.txt))
{
sw.WriteLine("{0} : {1}", line, level);
}
}
}
}
file.Close();
Answers to questions:
"Speed up what?": I'd like to speed up the whole process (the amount of requests it sends each time, not how fast it sends them.
"How many requests?": The tool reads a string from a text file and puts that information in to a part of the URL, and then captures the response, then places that in a result.txt. I'd like to increase the speed it does this at/how many it does at a time.
"Can the requests happen concurrently or are dependant on prior request responses?": Yes, and no, they are not dependent.
" Does your web server impose a limit on the number of concurrent requests?": No.
"How long does a typical request take?": The request + the time the response shows up on console per each requests is a little bit more than 1/3 of a second.
There are several ways to asymmetric programming, one of them could be something like this:
var messages = new ConcurrentQueue<string>();
//or
//var lockObj = new object();
public int main()
{
var fileText = File.ReadAllLines(#"file.txt");
var taskList = new List<Task>();
foreach (var line in fileText)
{
taskList.Add(Task.Factory.StartNew(HandlerMethod, line));
//you can control the amount of produced task if you want:
//if(taskList.Count > 20)
//{
// Task.WaitAll(taskList.ToArray());
// taskList.Clear();
//}
}
Task.WaitAll(taskList.ToArray()); //this line may not work as I expected.
//for the first way
var results = new StringBuilder();
foreach (var msg in messages)
{
results.AppendLine("{0} : {1}", line, level);
}
File.WriteAllText("path", results.ToString());
}
For writing the results, either you can use a public concurrent collection or use a lock pattern:
public void HandlerMethod(object obj)
{
var line = (string)obj;
var result = string.Empty;
using (var client = new HttpClient(new HttpClientHandler { AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate }))
{
client.BaseAddress = new Uri("https://www.website.com/");
HttpResponseMessage response = client.GetAsync("index?userGetLevel=" + line).Result;
response.EnsureSuccessStatusCode();
string result = response.Content.ReadAsStringAsync().Result;
dynamic json = JObject.Parse(result);
result = json.data.level;
}
//for the first way
if (string.IsNullOrWhiteSpace(result))
{
messages.Enqueue("{0}: User does not exist.", line);
}
else
{
messages.Enqueue("{0}: User level is {1}.", line, result);
}
//for the second way
//lock(lockObj)
//{
// using (StreamWriter sw = File.AppendText(levels.txt))
// {
// sw.WriteLine("{0} : {1}", line, level);
// }
//}
}
I will be use this.
http://www.matlus.com/httpwebrequest-asynchronous-programming/
But how can I know request iscompleted? So how do I know that the work done?
I use like it :
PostAsync(url, postParameters, callbackState =>
{
if (callbackState.Exception != null)
{
throw callbackState.Exception;
}
else
{
WriteCallBack(GetResponseText(callbackState.ResponseStream));
}
});
But I do not know whether it ends the process?
In the documentation it says:
Each of these methods also requires a callback delegate that is called back with the response of the http call[...] The callback will be called for each response, as and when a response is received and we simply continue on with processing the response as we see fit
Whatever you do with the callbackState delegate, it will hold the response content. That's how you know it will be finished.
Edit: Example from the docs.
ServicePointManager.DefaultConnectionLimit = 50;
var url = "http://localhost/HttpTaskServer/default.aspx";
var iterations = 1000;
for (int i = 0; i < iterations; i++)
{
var postParameters = new NameValueCollection();
postParameters.Add("data", i.ToString());
HttpSocket.PostAsync(url, postParameters, callbackState =>
{
if (callbackState.Exception != null)
throw callbackState.Exception;
Console.WriteLine(HttpSocket.GetResponseText(callbackState.ResponseStream));
});
}
I wrote a method to download data from the internet and save it to my database. I wrote this using PLINQ to take advantage of my multi-core processor and because it is downloading thousands of different files in a very short period of time. I have added comments below in my code to show where it stops but the program just sits there and after awhile, I get an out of memory exception. This being my first time using TPL and PLINQ, I'm extremely confused so I could really use some advice on what to do to fix this.
UPDATE: I found out that I was getting a webexception constantly because the webclient was timing out. I fixed this by increasing the max amount of connections according to this answer here. I was then getting exceptions for the connection not opening and I fixed it by using this answer here. I'm now getting connection timeout errors for the database even though it is a local sql server. I still haven't been able to get any of my code to run so I could totally use some advice
static void Main(string[] args)
{
try
{
while (true)
{
// start the download process for market info
startDownload();
}
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
Console.WriteLine(ex.StackTrace);
}
}
public static void startDownload()
{
DateTime currentDay = DateTime.Now;
List<Task> taskList = new List<Task>();
if (Helper.holidays.Contains(currentDay) == false)
{
List<string> markets = new List<string>() { "amex", "nasdaq", "nyse", "global" };
Parallel.ForEach(markets, market =>
{
Downloads.startInitialMarketSymbolsDownload(market);
}
);
Console.WriteLine("All downloads finished!");
}
// wait 24 hours before you do this again
Task.Delay(TimeSpan.FromHours(24)).Wait();
}
public static void startInitialMarketSymbolsDownload(string market)
{
try
{
List<string> symbolList = new List<string>();
symbolList = Helper.getStockSymbols(market);
var historicalGroups = symbolList.AsParallel().Select((x, i) => new { x, i })
.GroupBy(x => x.i / 100)
.Select(g => g.Select(x => x.x).ToArray());
historicalGroups.AsParallel().ForAll(g => getHistoricalStockData(g, market));
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
Console.WriteLine(ex.StackTrace);
}
}
public static void getHistoricalStockData(string[] symbols, string market)
{
// download data for list of symbols and then upload to db tables
Uri uri;
string url, line;
decimal open = 0, high = 0, low = 0, close = 0, adjClose = 0;
DateTime date;
Int64 volume = 0;
string[] lineArray;
List<string> symbolError = new List<string>();
Dictionary<string, string> badNameError = new Dictionary<string, string>();
Parallel.ForEach(symbols, symbol =>
{
url = "http://ichart.finance.yahoo.com/table.csv?s=" + symbol + "&a=00&b=1&c=1900&d=" + (DateTime.Now.Month - 1) + "&e=" + DateTime.Now.Day + "&f=" + DateTime.Now.Year + "&g=d&ignore=.csv";
uri = new Uri(url);
using (dbEntities entity = new dbEntities())
using (WebClient client = new WebClient())
using (Stream stream = client.OpenRead(uri))
using (StreamReader reader = new StreamReader(stream))
{
while (reader.EndOfStream == false)
{
line = reader.ReadLine();
lineArray = line.Split(',');
// if it isn't the very first line
if (lineArray[0] != "Date")
{
// set the data for each array here
date = Helper.parseDateTime(lineArray[0]);
open = Helper.parseDecimal(lineArray[1]);
high = Helper.parseDecimal(lineArray[2]);
low = Helper.parseDecimal(lineArray[3]);
close = Helper.parseDecimal(lineArray[4]);
volume = Helper.parseInt(lineArray[5]);
adjClose = Helper.parseDecimal(lineArray[6]);
switch (market)
{
case "nasdaq":
DailyNasdaqData nasdaqData = new DailyNasdaqData();
var nasdaqQuery = from r in entity.DailyNasdaqDatas.AsParallel().AsEnumerable()
where r.Date == date
select new StockData { Close = r.AdjustedClose };
List<StockData> nasdaqResult = nasdaqQuery.AsParallel().ToList(); // hits this line
break;
default:
break;
}
}
}
// now save everything
entity.SaveChanges();
}
}
);
}
Async lambdas work like async methods in one regard: They do not complete synchronously but they return a Task. In your parallel loop you are simply generating tasks as fast as you can. Those tasks hold onto memory and other resources such as DB connections.
The simplest fix is probably to just use synchronous database commits. This will not result in a loss of throughput because the database cannot deal with high amounts of concurrent DML anyway.
New to multi-threded apps.
I am trying to create a console app to check a given list of IP addresses (intranet). Each web page for any given IP address contains some stats, displayed in an html table, that I need to collect.
I can do this in a single thread: set up the request/response sequence, get the page content and parse it.
What I am struggling with right now is to make this multi-threaded since I have to deal with 4000 IP addresses and single thread would take some time. I have the list of IPs in a list or array of strings; do you know how I can set up the threads?
Assuming I have a function that processes the response, say, "ProcessResponse(string s)", and want to start with 10 threads, can I start with something like:
public class PASSServer
{
private string _ip;
public string IPAddress
{
get;
set;
}
public PASSServer()
{
}
}
static void Main(string[] args)
{
int iNumThreads = 3;
Thread[] threads = new Thread[iNumThreads];
string[] sIPs = { "192.168.10.20", "192.168.10.21", "192.168.10.22" };
for (int i = 0; i < threads.Length; i++)
{
ParameterizedThreadStart start = new ParameterizedThreadStart(Start);
threads[i] = new Thread(start);
PASSServer pserver = new PASSServer();
pserver.IPAddress = sIPs[i];
threads[i].Start(pserver);
}
Console.WriteLine("DONE");
Console.ReadKey();
}
static void Start(object info)
{
PASSServer pserver = (PASSServer)info;
crawl(pserver.IPAddress);
}
private static void crawl(string sUrl)
{
PASSData cData = new PASSData();
string sRequestUrl = "http://" + sUrl.Trim() + "/cgi-bin/sysstat?";
string sEncodingType = "utf-8";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(sRequestUrl);
request.KeepAlive = true;
request.Timeout = 15 * 1000;
System.Net.HttpWebResponse response = (HttpWebResponse)request.GetResponse();
string sStatus = ((HttpWebResponse)response).StatusDescription;
sEncodingType = GetEncodingType(response);
System.IO.StreamReader reader = new System.IO.StreamReader(response.GetResponseStream(), Encoding.GetEncoding(sEncodingType));
// Read the content.
string responseFromServer = reader.ReadToEnd();
Console.WriteLine(responseFromServer);
}
Any help is greatly appreciated.
I have not used multi threading but googled the subject and got some ideas just not sure how best to set up my scenario.
Don't use threads. Use asynchronous HTTP requests. For example, use HttpWebRequest.BeginGetResponse or perhaps HttpWebRequest.GetResponseAsync. Limit the number of concurrent requests using a Semaphore.
So, if you have a list of URLs (a List<string>) and you want a maximum of 10 concurrent requests:
List<string> _urls = GetListOfUrls();
Semaphore _requestSemaphore = new Semaphore(10, 10);
foreach (var url in _urls)
{
// wait for an available spot
_requestSemaphore.WaitOne();
// Now start an asynchronous request with this url
var request = (HttpWebRequest)WebRequest.Create(url);
request.BeginGetResponse(GetResponseCallback, request);
}
When your list is empty, you have to wait for the final responses to be received. The way you do that is to wait on the semaphore 10 times. When you've got 10, then there can't be any outstanding requests:
for (int i = 0; i < 10; ++i)
{
_requestSemaphore.WaitOne();
}
And your callback, which is called when a response is received:
void GetResponseCallback(IAsyncResult ar)
{
var request = (HttpWebRequest)ar.AsyncState;
var response = (HttpWebResponse)request.EndGetResponse(ar);
// process the response here.
// when you're done processing the response, release the semaphore
_requestSemaphore.Release();
}
I would loop through your list of IP addresses and start a ThreadPool work item.
foreach(string addr in IpAddresses)
Threading.ThreadPool.QueueUserWorkItem(
(string ipaddr) =>
{
ResponseFromQuery resp = new ResponseFromQuery();
this.BeginInvoke(new MethodInvoker(() => { UpdateTable(resp); }));
}, addr);
*EDIT: Above, you will need to call BeginInvoke and create a methodinvoker that calls back to a new method in your application call UpdateTable. You can pass in your response information (whatever type it is, I used a made up ResponseFromQuery class for example).
You can use either an anonymous function or, if there is a lot of code and you might use it elsewhere, you could create a processing class and method that you can pass as your method that you want executed.
If you wanted to manage your threads yourself, you can create a Dictionary or List object and add a thread to that for each item in your collection:
Dictionary<string, Thread> _threads = new Dictionary<string, Thread>();
foreach (string addr in IpAddresses)
{
_threads.Add(addr, new System.Threading.Thread(
new System.Threading.ParameterizedThreadStart(
(object ip) =>
{
// process ip.
}, addr)));
_threads[addr].Start();
}