C# Error on multithreading webrequests

I am trying to make web requests with multiple threads, but if I try with more than 2 threads I get the error
Index was outside the bounds of the array
on this line:
string username = ScrapeBox1.Lines[NamesCounter].ToString();
Here's the code:
while (working)
{
while (usernamescount > NamesCounter)
{
string username = ScrapeBox1.Lines[NamesCounter].ToString();
string url = "http://www.someforum.com/members/" + username + ".html";
var request = (HttpWebRequest)(WebRequest.Create(url));
var response = request.GetResponse();
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/20100101 Firefox/16.0";
using (var responseStream = response.GetResponseStream())
{
using (var responseStreamReader = new StreamReader(responseStream))
{
var serverResponse = responseStreamReader.ReadToEnd();
int startpoint = serverResponse.IndexOf("Contact Info</span>");
try
{
string strippedResponse = serverResponse.Remove(0, startpoint);
ExtractEmails(strippedResponse);
}
catch { }
}
}
NamesCounter++;
textBox1.Text = NamesCounter.ToString();
}
}

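This code is not thread safe. NamesCounter is shared by all threads: one thread can pass the usernamescount > NamesCounter check, another thread can then increment NamesCounter past the last index, and the first thread's ScrapeBox1.Lines[NamesCounter] lookup throws.
You need the code that performs an HttpWebRequest to be self-contained and kept outside the loop over the collection.
For example: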
public void MakeHttpWebRequest(string userName)
{
    string url = "http://www.someforum.com/members/" + userName + ".html";
    var request = (HttpWebRequest)WebRequest.Create(url);
    // Headers must be set before GetResponse() sends the request.
    request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/20100101 Firefox/16.0";
    using (var response = request.GetResponse())
    using (var responseStream = response.GetResponseStream())
    using (var responseStreamReader = new StreamReader(responseStream))
    {
        var serverResponse = responseStreamReader.ReadToEnd();
        int startpoint = serverResponse.IndexOf("Contact Info</span>");
        // IndexOf returns -1 when the marker is missing; skip such pages
        // rather than swallowing the exception that Remove would throw.
        if (startpoint >= 0)
        {
            string strippedResponse = serverResponse.Remove(0, startpoint);
            ExtractEmails(strippedResponse);
        }
    }
}
Assuming ScrapeBox1.Lines implements IEnumerable, I would recommend using Parallel.ForEach and passing ScrapeBox1.Lines as the IEnumerable over which to iterate.
Now, there is one additional problem: the code that reads the response from the HttpWebRequest still needs to write its output to a shared location, and that must be done in a thread-safe manner. A common way to do this is with a lock. You need an object accessible to every thread. A class-level private field, private object sharedMutex = new object();, would work. Then the call ExtractEmails(strippedResponse); should be changed to
lock(sharedMutex)
{
ExtractEmails(strippedResponse);
}
Without having the code for the ExtractEmails(<string>) method, I can't provide a thread safe implementation for that, so that part of the solution may still cause a problem.
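As a rough sketch of how Parallel.ForEach and the lock could fit together (an illustration under assumptions, not the asker's actual code): the names are passed in as a fixed collection, each worker handles one name, and results are added to a shared list under a lock. FetchProfile, Run and the sample names are hypothetical stand-ins so the example runs without a network.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelScrapeSketch
{
    // Hypothetical stand-in for MakeHttpWebRequest; a real version would
    // download the page and extract the e-mail addresses from it.
    private static string FetchProfile(string userName)
    {
        return "profile:" + userName;
    }

    public static List<string> Run(IEnumerable<string> userNames)
    {
        var results = new List<string>();
        var sharedMutex = new object(); // guards every access to results

        // Each name is handed to a worker by the framework, so no thread
        // ever reads or increments a shared index.
        Parallel.ForEach(userNames, userName =>
        {
            string profile = FetchProfile(userName);

            // List<T> is not thread safe; only one thread at a time may add.
            lock (sharedMutex)
            {
                results.Add(profile);
            }
        });

        return results;
    }

    public static void Main()
    {
        // Assumed sample input; in the question this would be a copy of
        // ScrapeBox1.Lines taken before the parallel work starts.
        string[] userNames = { "alice", "bob", "carol", "dave" };

        List<string> results = Run(userNames);
        Console.WriteLine(results.Count); // prints 4 (the order of the items may vary)
    }
}
```

Parallel.ForEach also has an overload that takes a ParallelOptions argument, whose MaxDegreeOfParallelism property caps how many requests run at once.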

Related

Not able to download zip file using httpwebrequest through c# code. It gets downloaded through browser

I want to download a zip file from the website https://eqrreportviewer.ferc.gov/. To download it manually, you first click the Filing Inquiries tab, select SubmissionsBydate in the reportType dropdown and CSV in the export dropdown, and then click the Submit button; the zip file is then downloaded. I want to automate this process. I have written C# code that captures the request along with its headers and passes those details to the site, but I am not able to download the file through code.
This is the code that I have written:
public static string PageSourceCode { get; set; }
//The ASP.NET SessionID to add validation to posts
public static string SessionID { get; set; }
//The value we are posting to the page on subsequent calls
public static string PostBackValue { get; set; }
public static string AcquisitionURL = "https://eqrreportviewer.ferc.gov";
static void Main(string[] args)
{
Acquire();
}
private static void Acquire()
{
GetLandingPage();
PopulatePostBackValueForSubmitBtn();
PostToPageForSubmitBtn();
}
private static void GetLandingPage()
{
string mainPageOutput = string.Empty;
HttpWebRequest objRequestLandingPage = (HttpWebRequest)WebRequest.Create(AcquisitionURL);
objRequestLandingPage.Method = WebRequestMethods.Http.Get;
objRequestLandingPage.Headers.Add("Cache-Control", "max-age=0");
objRequestLandingPage.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9";
objRequestLandingPage.Headers.Add("Accept-Encoding", "gzip, deflate, br");
objRequestLandingPage.Headers.Add("Accept-Language", "en-US,en;q=0.9");
objRequestLandingPage.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36";
objRequestLandingPage.Headers.Add("Sec-Fetch-Dest", "document");
objRequestLandingPage.Headers.Add("Sec-Fetch-Mode", "navigate");
objRequestLandingPage.Headers.Add("Sec-Fetch-Site", "none");
objRequestLandingPage.Headers.Add("Sec-Fetch-User", "?1");
objRequestLandingPage.Headers.Add("Upgrade-Insecure-Requests", "1");
//objRequestLandingPage.Headers.Add("Connection", "keep-alive");
objRequestLandingPage.KeepAlive = true;
objRequestLandingPage.Host = "eqrreportviewer.ferc.gov";
using (WebResponse objResponseLandingPage = objRequestLandingPage.GetResponse())
{
WebHeaderCollection headers = objResponseLandingPage.Headers;
using (Stream streamLandingPage = objResponseLandingPage.GetResponseStream())
using (StreamReader streamReaderLandingPage = new StreamReader(streamLandingPage))
{
mainPageOutput = streamReaderLandingPage.ReadToEnd();
}
SessionID = headers["Set-Cookie"];
}
SessionID = StripCookie(SessionID);
//Set the source code of the page
PageSourceCode = mainPageOutput;
}
private static void PopulatePostBackValueForSubmitBtn()
{
if (!String.IsNullOrEmpty(PageSourceCode))
{
// get fields from landing page
Dictionary<string, string> formFields = GetFormFields(PageSourceCode);
formFields["TabContainerReportViewer$TabPanelReporting$TabContainerReports$TabPanelSummaryReports$ddlReportTypeSum"] = "0";
formFields["TabContainerReportViewer$TabPanelReporting$TabContainerReports$TabPanelSummaryReports$ddlReportPeriodSum"] = "650";
formFields["TabContainerReportViewer$TabPanelReporting$TabContainerReports$TabPanelSummaryReports$ListSearchExtender1_ClientState"] = String.Empty;
formFields["TabContainerReportViewer$TabPanelReporting$TabContainerReports$TabPanelFilingInquiries$ddlReportType"] = "4";
formFields["TabContainerReportViewer$TabPanelReporting$TabContainerReports$TabPanelFilingInquiries$txtFromSubmissionDate"] = System.DateTime.Now.Date.AddDays(-30).ToShortDateString();
formFields["TabContainerReportViewer$TabPanelReporting$TabContainerReports$TabPanelFilingInquiries$txtToSubmissionDate"] = System.DateTime.Now.Date.ToShortDateString();
formFields["TabContainerReportViewer$TabPanelReporting$TabContainerReports$TabPanelFilingInquiries$ddlExport"] = "2";
formFields["TabContainerReportViewer$TabPanelReporting$TabContainerReports$TabPanelFilingInquiries$btnSubmitOptional"] = "Submit";
formFields["TabContainerReportViewer$TabPanelDownloads$TabContainerDownloads$TabPanelSelectiveFilings$txtCID"] = String.Empty;
formFields["TabContainerReportViewer$TabPanelDownloads$TabContainerDownloads$TabPanelSelectiveFilings$txtFilingOrg"] = String.Empty;
formFields["TabContainerReportViewer$TabPanelDownloads$TabContainerDownloads$TabPanelSelectiveFilings$ddlQuarter"] = "Pick";
formFields["TabContainerReportViewer$TabPanelDownloads$TabContainerDownloads$TabPanelSelectiveFilings$ddlDownloadType"] = "CSV";
formFields["TabContainerReportViewer$TabPanelDownloads$TabContainerDownloads$TabPanelSelectiveFilings$txtName"] = String.Empty;
formFields["TabContainerReportViewer$TabPanelDownloads$TabContainerDownloads$TabPanelSelectiveFilings$txtEmail"] = String.Empty;
formFields["__EVENTTARGET"] = String.Empty;
formFields["__EVENTARGUMENT"] = String.Empty;
formFields["__LASTFOCUS"] = String.Empty;
formFields["__AjaxControlToolkitCalendarCssLoaded"] = String.Empty;
formFields["TabContainerReportViewer_ClientState"] = "{\"ActiveTabIndex\" : 0,\"TabState\": [true,true]}";
formFields["TabContainerReportViewer_TabPanelReporting_TabContainerReports_ClientState"] = "{\"ActiveTabIndex\" : 1,\"TabState\": [true,true]}";
formFields["TabContainerReportViewer_TabPanelDownloads_TabContainerDownloads_ClientState"] = "{\"ActiveTabIndex\" : 0,\"TabState\": [true,true]}";
formFields["__VIEWSTATE"] = ViewState;
formFields["__VIEWSTATEGENERATOR"] = ViewStateGenerator;
formFields["__VIEWSTATEENCRYPTED"] = ViewStateEncrypted;
string postString = FormatPostString(formFields);
PostBackValue = postString;
}
}
private static void PostToPageForSubmitBtn()
{
HttpWebRequest objRequestPostPage = (HttpWebRequest)WebRequest.Create(AcquisitionURL);
objRequestPostPage.Method = WebRequestMethods.Http.Post;
objRequestPostPage.ContentLength = PostBackValue.Length;
objRequestPostPage.ContentType = "application/x-www-form-urlencoded";
objRequestPostPage.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9";
objRequestPostPage.KeepAlive = true;
objRequestPostPage.Host = "eqrreportviewer.ferc.gov";
objRequestPostPage.Headers.Add("Cache-Control", "max-age=0");
objRequestPostPage.Headers.Add("Sec-Fetch-Dest", "document");
objRequestPostPage.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36";
objRequestPostPage.Headers.Add("Origin", "https://eqrreportviewer.ferc.gov");
objRequestPostPage.Headers.Add("Sec-Fetch-Site", "same-origin");
objRequestPostPage.Headers.Add("Sec-Fetch-Mode", "navigate");
objRequestPostPage.Referer = "https://eqrreportviewer.ferc.gov/";
objRequestPostPage.Headers.Add("Accept-Encoding", "gzip, deflate,br");
objRequestPostPage.Headers.Add("Accept-Language", "en-US,en;q=0.9");
//Pass in the ASP.NET Session ID
objRequestPostPage.Headers.Add("Cookie", SessionID);
objRequestPostPage.Headers.Add("Upgrade-Insecure-Requests", "1");
objRequestPostPage.Headers.Add("Sec-Fetch-User", "?1");
objRequestPostPage.ServicePoint.Expect100Continue = false;
StreamWriter streamWriterPostPage = new StreamWriter(objRequestPostPage.GetRequestStream());
//Post the arguments
streamWriterPostPage.Write(PostBackValue);
streamWriterPostPage.Close();
//Get response
HttpWebResponse responsePostPage = (HttpWebResponse)objRequestPostPage.GetResponse();
WebHeaderCollection responseHeaders = responsePostPage.Headers;
Stream responseStream = responsePostPage.GetResponseStream();
StreamReader reader = new StreamReader(responseStream);
PageSourceCode = reader.ReadToEnd();
using (FileStream file = new FileStream(@"C:\Test\test.csv", FileMode.Create, FileAccess.Write))
{
WriteFile(responseStream, file);
}
}
Can anyone let me know if I am doing something wrong? Right now all the values are hard-coded, but if it works I can organize that properly.
Also, I don't get the Content-Disposition response header in the response, but I do get this header when the request is run from the Chrome browser.
What could I do differently, or what am I missing?
Any help or suggestion would be a great help in moving forward with this issue.
I was not able to do this using C#.
In the end I used Python in combination with Selenium and the Chrome web driver to get the task done.
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--disable-extensions")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")
options.add_experimental_option("prefs", {"download.default_directory":"/databricks/driver"})
driver = webdriver.Chrome(chrome_options=options)
driver.implicitly_wait(5)
url = "https://eqrreportviewer.ferc.gov/"
driver.get(url)
driver.implicitly_wait(5)
#Filing Inquiries
driver.find_element_by_xpath('//*[@id="__tab_TabContainerReportViewer_TabPanelReporting_TabContainerReports_TabPanelFilingInquiries"]').click()
driver.implicitly_wait(5)
#Submission by Date
driver.find_element_by_xpath('//*[@id="TabContainerReportViewer_TabPanelReporting_TabContainerReports_TabPanelFilingInquiries_ddlReportType"]/option[5]').click()
driver.implicitly_wait(5)
#CSV
driver.find_element_by_xpath('//*[@id="TabContainerReportViewer_TabPanelReporting_TabContainerReports_TabPanelFilingInquiries_ddlExport"]/option[2]').click()
driver.implicitly_wait(15)
#Submit
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
element = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, '//*[@id="TabContainerReportViewer_TabPanelReporting_TabContainerReports_TabPanelFilingInquiries_btnSubmitOptional"]')))
element.click()
import time
time.sleep(15)  # implicitly_wait() only sets the element-lookup timeout and does not pause, so sleep here to let the file download before the driver is stopped.
driver.quit()

C# Httpwebrequest Log In System

The following code should log the user in, but it does not work. The code uses 9gag as an example, but the general idea should work elsewhere too. The user does not get access to his/her profile.
What is wrong with the code?
using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
namespace ttp_B
{
internal class Program
{
private static void Main(string[] args)
{
bool flag = false;
//string word_to_stop = "profile";
string urltosite = "https://9gag.com/login"; // site that i'm trying to log in
string emailaddress = "";
string password = "";
var coo = new System.Net.CookieContainer();
try
{
var request = System.Net.WebRequest.Create(urltosite) as System.Net.HttpWebRequest;
request.CookieContainer = coo;
request.Method = "POST";
request.Proxy = new WebProxy("127.0.0.1", 8888); // fiddler
//some extra headers
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0";
using (var stream = request.GetRequestStream())
{
byte[] buffer =
Encoding.UTF8.GetBytes(
string.Format(
"csrftoken=&next=http%3A%2F%2F9gag.com%2F&location=1&username={0}&password={1}", emailaddress, password));
//this text of request is correct I guess. Got it from fiddler btw.
stream.Write(buffer, 0, buffer.Length);
stream.Close();
}
using (var response = request.GetResponse() as HttpWebResponse)
{
coo.Add(response.Cookies); // adding cookies, just to make this working properly
using (var sr = new System.IO.StreamReader(response.GetResponseStream()))
{
string http_code = sr.ReadToEnd(); // gettin' that new html document
//flag = (http_code.Contains(word_to_stop)); // looking for word that I'm sure exist after succesfull loggin' in
//if(flag == true)
//{
// console.writeline("Works");
//}
}
}
}
catch (WebException e)
{
Console.Write(e.ToString());
}
}
}
}
As far as I can see, the request stream isn't closed before you go for the response.
Simplified, it should look like this:
var stream = request.GetRequestStream();
stream.Write(buffer, 0, buffer.Length);
// close the stream
stream.Close();
// go for the response
request.GetResponse();

Using a Shared HttpWebRequest Property

Is it possible to change the URI of a HttpWebRequest after it's been set? I only ask because if you see my code below I am setting the CookieContainer, and the UserAgent. If I was to set the shared client property to a new instance of a HttpWebRequest later in the code would I have to reset the UserAgent and CookieContainer?
The reason I wanted a shared HttpWebRequest property is so that I don't have to set these variables each time I make a request.
public MyAPI(String username, String password)
{
this.username = username;
this.password = password;
this.cookieContainer = new CookieContainer();
this.client = (HttpWebRequest)WebRequest.Create("http://mysite.com/api");
this.client.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0";
this.client.CookieContainer = this.cookieContainer;
}
private async Task<bool> initLoginTokens()
{
using (WebResponse response = await client.GetResponseAsync())
using (Stream responseStream = response.GetResponseStream())
using (StreamReader stream = new StreamReader(responseStream))
{
CsQuery.CQ dom = CsQuery.CQ.Create(stream.ReadToEnd());
tt = dom.Select("input[name='tt']").Attr("value");
dn = dom.Select("input[name='dn']").Attr("value");
pr = dom.Select("input[name='pr']").Attr("value");
if (tt == null || dn == null || pr == null) {
return false;
} else {
return true;
}
}
}
public async Task<string> LoginAsync()
{
if(! await initLoginTokens())
{
// Throw exception. Login tokens not set.
}
// Here I need to make another request, but utilizing the same HTTPWebRequest client if possible.
}
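No, the request URI cannot be changed once it is set; RequestUri is read-only. You will have to create a new HttpWebRequest and set UserAgent and CookieContainer on it again, although the same CookieContainer instance can be reused across requests.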
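One way to avoid repeating those settings is a small factory method that builds a fresh request each time and applies the shared values. The sketch below is only an illustration: ApiClientSketch, CreateRequest and the /api/login path are hypothetical names, and no request is actually sent.

```csharp
using System;
using System.Net;

public class ApiClientSketch
{
    private const string UserAgent =
        "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0";

    // One container shared by every request, so cookies stored from one
    // response are sent along with the next request.
    private readonly CookieContainer cookieContainer = new CookieContainer();

    // Hypothetical helper: builds a new request with the common settings applied.
    public HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = UserAgent;
        request.CookieContainer = cookieContainer;
        return request;
    }

    public static void Main()
    {
        var api = new ApiClientSketch();
        HttpWebRequest first = api.CreateRequest("http://mysite.com/api");
        HttpWebRequest second = api.CreateRequest("http://mysite.com/api/login"); // assumed path

        // Two distinct request objects that share a single cookie container.
        Console.WriteLine(ReferenceEquals(first, second));                                 // False
        Console.WriteLine(ReferenceEquals(first.CookieContainer, second.CookieContainer)); // True
    }
}
```

With this shape, initLoginTokens and LoginAsync would each call CreateRequest with their own URL instead of sharing one client field.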

Requesting Definitions Using the Wordnik API

I've only worked with APIs in a very minimal sense so I've been wanting to try out how to do this for some time. Ok so this is what I have so far and it works but it returns everything of the definition. So I have a few questions:
1. Is there a way to request just the definitions without anything else?
2. Do I just parse the data? I saw in the Wordnik API that I can include XML tags, so can I use an XMLReader to grab the definitions?
3. How about requesting both the definitions and the part of speech (noun, verb, etc.) at once?
The ultimate goal would be to create a list of definitions that I could do stuff with. Any help would be greatly appreciated. Here's my code so far:
class Program
{
static void Main(string[] args)
{
string apiKey = "***************";
string wordToSearch = "";
do
{
Console.Write("Please type a word to get the definition: ");
wordToSearch = Console.ReadLine();
if (!wordToSearch.Equals("q"))
{
string url = "http://api.wordnik.com/v4/word.json/" + wordToSearch + "/definitions?api_key=" + apiKey;
WebRequest request = WebRequest.Create(url);
request.Method = "GET";
request.ContentType = "application/json";
using (WebResponse response = request.GetResponse())
{
using (Stream stream = response.GetResponseStream())
{
StreamReader reader = new StreamReader(stream);
string responseFromWordnik = reader.ReadToEnd();
Console.WriteLine(responseFromWordnik);
}
}
}
} while (!wordToSearch.Equals("q"));
}
}
thanks,
Justin
Here's an example that gets the definitions for a word. Replace the API key with your own.
public class Word
{
public string word { get; set; }
public string sourceDictionary { get; set; }
public string partOfSpeech { get; set; }
public string text { get; set; }
}
public class WordList
{
public List<Word> wordList { get; set; }
}
string url = "http://api.wordnik.com:80/v4/word.json/" + word + "/definitions?limit=200&includeRelated=false&sourceDictionaries=all&useCanonical=false&includeTags=false&api_key=a2a73e7b926c924fad7001ca3111acd55af2ffabf50eb4ae5";
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
webRequest.Method = WebRequestMethods.Http.Get;
webRequest.Accept = "application/json";
webRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36";
webRequest.Referer = "http://developer.wordnik.com/docs.html";
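// Let the framework send Accept-Encoding and decompress the body; adding the header by hand would leave the response compressed.
webRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;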
webRequest.Headers.Add("Accept-Language", "en-US,en;q=0.8");
webRequest.Host = "api.wordnik.com";
HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse();
string enc = webResponse.ContentEncoding;
using (Stream stream = webResponse.GetResponseStream())
{
StreamReader reader = new StreamReader(stream, Encoding.UTF8);
String responseString = "{\"wordList\":" + reader.ReadToEnd() + "}";
if (responseString != null)
{
JavaScriptSerializer ser = new JavaScriptSerializer();
WordList words = ser.Deserialize<WordList>(responseString);
}
}
1. The API documentation will probably tell you that.
2. Yes, parse the data. If the data is coming down as XML, then you can parse it with an XMLReader, or you can load it into an XMLDocument. It looks like you're asking for JSON, though. If so, you'll want a JSON parser. Check out Json.Net.
3. Again, check out the API documentation.
Their documentation page is suspiciously sparse. You'll probably get better response on their Google group or one of the other sources listed on their support page.
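To illustrate the "parse the data" advice, here is a minimal sketch that pulls only the part of speech and the definition text out of a JSON array. It uses System.Text.Json (built into .NET Core 3.0 and later) rather than Json.Net, and the sample payload is made up for the example; only the field names partOfSpeech and text are taken from the Word class shown earlier, so treat the rest as assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class DefinitionParserSketch
{
    // Returns (partOfSpeech, text) pairs from a JSON array of definition objects.
    public static List<(string PartOfSpeech, string Text)> Parse(string json)
    {
        var result = new List<(string PartOfSpeech, string Text)>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            foreach (JsonElement item in doc.RootElement.EnumerateArray())
            {
                // Entries that lack either field are skipped.
                if (item.TryGetProperty("partOfSpeech", out JsonElement pos) &&
                    item.TryGetProperty("text", out JsonElement text))
                {
                    result.Add((pos.GetString(), text.GetString()));
                }
            }
        }
        return result;
    }

    public static void Main()
    {
        // Made-up sample payload for illustration; not real Wordnik output.
        string sample =
            "[{\"partOfSpeech\":\"noun\",\"text\":\"A small domesticated feline.\"}," +
            "{\"partOfSpeech\":\"verb\",\"text\":\"To hoist an anchor.\"}]";

        foreach (var (partOfSpeech, text) in Parse(sample))
        {
            Console.WriteLine(partOfSpeech + ": " + text);
        }
    }
}
```

In the question's program, the string read from the response stream (responseFromWordnik) would take the place of sample.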

Dotnet WebClient times out but browser works fine for JSON web service

I am trying to read the result of the JSON web service https://mtgox.com/code/data/getDepth.php into a string using the following code.
using (WebClient client = new WebClient())
{
string data = client.DownloadString("https://mtgox.com/code/data/getDepth.php");
}
But it always throws a timeout exception and returns no data. I plan to use fastjson to turn the response into objects, and I expected that to be the hard part, not fetching the content of the page.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://mtgox.com/code/data/getDepth.php");
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
using (StreamReader sr = new StreamReader(response.GetResponseStream()))
{
string data = sr.ReadToEnd();
}
}
This also resulted in the same error. Can anyone point out what I am doing wrong?
Hmm, strange, this works great for me:
class Program
{
static void Main()
{
using (var client = new WebClient())
{
client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0) Gecko/20100101 Firefox/4.0";
var result = client.DownloadString("https://mtgox.com/code/data/getDepth.php");
Console.WriteLine(result);
}
}
}
Notice that I am specifying a User Agent HTTP header as it seems that the site is expecting it.
I had a similar issue before. Setting request.KeepAlive = false solved my problem. Try this:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://mtgox.com/code/data/getDepth.php");
request.KeepAlive = false;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
using (StreamReader sr = new StreamReader(response.GetResponseStream()))
{
string data = sr.ReadToEnd();
}
}
