Screen scraping - bypass CAPTCHA validation in code [traffic issue] - C#

I am working on a screen-scraping project in ASP.NET using C#, and I can scrape the pages successfully.
However, I have to make many requests, one after another, to the target server. After some time the server redirects to a CAPTCHA validation page, and at that point I am stuck.
Here is my code:
public static string SearchPage(Uri url, int timeOutSeconds)
{
StringBuilder sb = new StringBuilder();
try
{
string place = HttpUtility.ParseQueryString(url.Query).Get("destination").Split(':')[1];
string resultID = HttpUtility.ParseQueryString(url.Query).Get("resultID");
string checkin = HttpUtility.ParseQueryString(url.Query).Get("checkin").Replace("-", "");
string checkout = HttpUtility.ParseQueryString(url.Query).Get("checkout").Replace("-", "");
string Rooms = HttpUtility.ParseQueryString(url.Query).Get("Rooms");
string adults_1 = HttpUtility.ParseQueryString(url.Query).Get("adults_1");
string languageCode = "EN";
string currencyCode = "INR";
string ck = "languageCode=" + languageCode + "; a_aid=400; GcRan=1; __RequestVerificationToken=IHZjc7KM_LbUXRypz02LoK4wmeLNcmRpIr-6vmPl5eNepILScAc15vn0TgQJtmABgedDy8xz4bnkqC30_zUGE1A1SaA1; Analytics=LandingID=place:77469:0m&LanguageCode=" + languageCode + "&WebPageID=9; Tests=165F000901000A1100F81000FE110100000102100103100104000105100052; dcid=dal05; currencyCode=" + currencyCode + "; countryCode=" + languageCode + "; search=place:" + place + "#" + checkin + "#" + checkout + "#" + adults_1 + "; SearchHistory=" + place + "%" + checkin + "%" + checkout + "%" + adults_1 + "%" + currencyCode + "%%11#" + place + "%" + checkin + "%" + checkout + "%" + adults_1 + "%" + currencyCode + "%%" + resultID + "#; visit=date=2015-11-23T18:26:05.4922127+11:00&id=45111733-acef-47d1-aed3-63cef1a60591; visitor=id=efff4190-a4a0-41b5-b807-5d18e4ee6177&tracked=true";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Timeout = timeOutSeconds * 1000;
request.ReadWriteTimeout = timeOutSeconds * 1000;
request.KeepAlive = true;
request.Method = "GET";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36 OPR/33.0.1990.115";
request.Headers.Add("Accept-Language", "en-US,en;q=0.8");
request.Headers.Add("Cookie", ck);
request.Headers.Add("Upgrade-Insecure-Requests", "1");
request.CachePolicy = new RequestCachePolicy(RequestCacheLevel.NoCacheNoStore);
StreamReader reader = new StreamReader(request.GetResponse().GetResponseStream());
string line = reader.ReadToEnd();
sb.Append(line);
sb.Replace("<br/>", Environment.NewLine);
sb.Replace("\n", Environment.NewLine);
sb.Replace("\t", " ");
reader.Close();
reader.Dispose();
request.Abort();
}
catch (Exception ex)
{
//throw ex;
}
return sb.ToString();
}
This code works, but after a number of requests it gets stuck, possibly because the server allows only a limited number of requests.

Related

Parse only specific text/int

I need some help with this.
I have a server log from which I need to filter out the error codes (404).
What I have so far extracts the status codes from the log, but it also displays the successful connection codes (200), which I don't want.
I'm new to C#, so any help is appreciated.
This is what I have:
private void btnOpen_Click(object sender, EventArgs e)
{
    openFileDialog1.ShowDialog();
    string filename = openFileDialog1.FileName;
    StreamReader streamreader = new StreamReader(filename);
    string value = filename;
    while (!streamreader.EndOfStream)
    {
        string data = streamreader.ReadLine();
        // Split the data to keep only the error codes
        string[] errorcodeArray = data.Split('"');
        string trim = Regex.Replace(errorcodeArray[2], @"", "");
        // Trim to keep only the 3 figure codes
        trim = trim.Substring(0, trim.IndexOf(" ") + 5);
        txtLog.Text += Environment.NewLine + data;
        txtError.Text += Environment.NewLine + trim;
        // Couldn't get the 404's out of this.
    }
    streamreader.Close();
}
Log sample:
109.169.248.247 - - [12/Dec/2015:18:25:11 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
109.169.248.247 - - [12/Dec/2015:18:25:11 +0100] "POST /administrator/index.php HTTP/1.1" 200 4494 "almhuette-raith.at/administrator" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
46.72.177.4 - - [12/Dec/2015:18:31:08 +0100] "GET /administrator/ HTTP/1.1" 200 4263 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"
One-liner:
var lines = File.ReadAllLines(/*path*/);
var result = lines.Select(x => Regex.Replace(x, @"HTTP/1.1"" \d+ ", @"HTTP/1.1"" "));
This strips the status code from every line.
To strip only 200 and 404:
var result = lines.Select(x => Regex.Replace(x, @"HTTP/1.1"" (200|404) ", @"HTTP/1.1"" "));
I believe you only want the lines with "404" codes, although your sample doesn't show any. If they are in the same format, this should work:
openFileDialog1.ShowDialog();
string filename = openFileDialog1.FileName;
var rows = File.ReadAllLines(filename);
var results = rows.Where(r => r.Split('"')[2].Trim().StartsWith("404"));
If the log file is very large and you don't want to read it all in one go, you should do the test in your loop. Here is a complete example of how to do it in a loop:
openFileDialog1.ShowDialog();
string filename = openFileDialog1.FileName;
string data;
// Using a StringBuilder to concatenate strings is much more efficient
StringBuilder sbLog = new StringBuilder();
StringBuilder sbError = new StringBuilder();
using (StreamReader file = new StreamReader(filename))
{
    while ((data = file.ReadLine()) != null)
    {
        if (data.Split('"')[2].Trim().StartsWith("404"))
        {
            sbLog.Append(data + Environment.NewLine);
            sbError.Append(data.Split('"')[2].Trim().Substring(0, 3) + Environment.NewLine);
        }
    }
}
txtLog.Text = sbLog.ToString();
txtError.Text += sbError.ToString();
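As an alternative to splitting on quotes, the status code can be pulled out with a regular expression anchored on the protocol token. The following is only a minimal sketch, not code from the question or the answers: the class name AccessLogFilter and its two methods are hypothetical, and it assumes every line follows the format of the log sample above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answers.
public static class AccessLogFilter
{
    // Matches the end of the request line and captures the 3-digit status
    // code that follows it, e.g.  ... HTTP/1.1" 404 1234 ...
    private static readonly Regex StatusPattern =
        new Regex(@"HTTP/\d\.\d"" (\d{3}) ", RegexOptions.Compiled);

    // Returns the status code as a string, or null if the line does not match.
    public static string GetStatusCode(string line)
    {
        Match m = StatusPattern.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Lazily yields only the lines whose status code equals the given one.
    public static IEnumerable<string> LinesWithStatus(IEnumerable<string> lines, string status)
    {
        foreach (string line in lines)
        {
            if (GetStatusCode(line) == status)
            {
                yield return line;
            }
        }
    }
}
```

For a large log, passing File.ReadLines(filename) as the lines argument keeps the processing streamed, because both File.ReadLines and LinesWithStatus enumerate lazily. Lines that do not match the assumed format are skipped rather than throwing.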

C# - unable to parse XML, receiving error 463

I am trying to parse XML from the URL shown in the code below. However, I receive {"The remote server returned an error: (463)."} (System.Net.WebException). The error happens at string xml = webClient2.DownloadString(address);
Here is my full code:
Task.Run((Action)(() =>
{
    XmlDocument xmlDocument = new XmlDocument();
    using (WebClient webClient1 = new WebClient())
    {
        WebClient webClient2 = webClient1;
        Uri address = new Uri("https://habbo.com/gamedata/furnidata_xml/1");
        string xml = webClient2.DownloadString(address);
        xmlDocument.LoadXml(xml);
    }
    foreach (XmlNode xmlNode1 in xmlDocument.GetElementsByTagName("furnitype"))
    {
        string nr1 = "[" + xmlNode1.Attributes["id"].Value + "]";
        string nr2 = " : " + xmlNode1.Attributes["classname"].InnerText;
        foreach (XmlNode xmlNode2 in xmlNode1)
        {
            XmlNode childNode = xmlNode2;
            if (childNode.Name == "name")
            {
                this.FurniCB.Invoke((Action)(() => this.FurniCB.Items.Add((object)(nr1 + nr2 + " : " + childNode.InnerText))));
                this.FurniDataList.Add(nr1 + nr2 + " : " + childNode.InnerText);
            }
        }
    }
}));
Thanks in advance.
I tested the download part of your code. All you need to do is add a User-Agent header to the request:
webClient1.Headers.Add("User-Agent", "Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30");

Programmatically reading emails from Exchange Server 2010 mailbox

We have a C# application that reads an email inbox currently hosted on Exchange 2003 using the HTTP service.
The mailbox is now being migrated to an Exchange 2010 server, so we are testing our code to confirm it will still work.
We are getting a 'Bad request' error with the code below (which tries to get all the unread mail):
public static XmlDocument GetUnreadMailAll()
{
HttpWebRequest loRequest = default(HttpWebRequest);
HttpWebResponse loResponse = default(HttpWebResponse);
string lsRootUri = null;
string lsQuery = null;
byte[] laBytes = null;
Stream loRequestStream = default(Stream);
Stream loResponseStream = default(Stream);
XmlDocument loXmlDoc = default(XmlDocument);
loXmlDoc = new XmlDocument();
try
{
lsRootUri = strServer + "/Exchange/" + strAlias + "/" + strInboxURL;
lsQuery = "<?xml version=\"1.0\"?>"
+ "<D:searchrequest xmlns:D = \"DAV:\" xmlns:m=\"urn:schemas:httpmail:\">"
+ "<D:sql>SELECT "
+ "\"urn:schemas:httpmail:to\", "
+ "\"urn:schemas:httpmail:displayto\", "
+ "\"urn:schemas:httpmail:from\", "
+ "\"urn:schemas:httpmail:fromemail\", "
+ "\"urn:schemas:httpmail:subject\", "
+ "\"urn:schemas:httpmail:textdescription\", "
//+ "\"urn:schemas:httpmail:htmldescription\", "
+ "\"urn:schemas:httpmail:hasattachment\", "
+ "\"urn:schemas:httpmail:attachmentfilename\", "
+ "\"urn:schemas:httpmail:senderemail\", "
+ "\"urn:schemas:httpmail:sendername\", "
+ "\"DAV:displayname\", "
+ "\"urn:schemas:httpmail:datereceived\", "
+ "\"urn:schemas:httpmail:read\", "
+ "\"DAV:id\" "
+ "FROM \"" + lsRootUri
+ "\" WHERE \"DAV:ishidden\" = false "
+ "AND \"DAV:isfolder\" = false "
+ "AND \"urn:schemas:httpmail:read\" = false "
+ "AND \"urn:schemas:httpmail:fromemail\" != 'emailAddy@domainName.co.uk' "
+ "</D:sql></D:searchrequest>";
loRequest = (HttpWebRequest)WebRequest.Create(lsRootUri);
loRequest.Credentials = new NetworkCredential(strUserName, strPassword);
loRequest.Method = "SEARCH";
laBytes = System.Text.Encoding.UTF8.GetBytes(lsQuery);
loRequest.ContentLength = laBytes.Length;
loRequestStream = loRequest.GetRequestStream();
loRequestStream.Write(laBytes, 0, laBytes.Length);
loRequestStream.Close();
loRequest.ContentType = "text/xml";
loRequest.Headers.Add("Translate", "F");
loResponse = (HttpWebResponse)loRequest.GetResponse();
loResponseStream = loResponse.GetResponseStream();
loXmlDoc.Load(loResponseStream);
loResponseStream.Close();
}
The exception is thrown on the line loResponseStream = loResponse.GetResponseStream();
Here is the XML that we are sending:
<?xml version="1.0" ?>
<D:searchrequest xmlns:D="DAV:" xmlns:m="urn:schemas:httpmail:">
<D:sql>SELECT "urn:schemas:httpmail:to", "urn:schemas:httpmail:displayto", "urn:schemas:httpmail:from", "urn:schemas:httpmail:fromemail", "urn:schemas:httpmail:subject", "urn:schemas:httpmail:textdescription", "urn:schemas:httpmail:hasattachment", "urn:schemas:httpmail:attachmentfilename", "urn:schemas:httpmail:senderemail", "urn:schemas:httpmail:sendername", "DAV:displayname", "urn:schemas:httpmail:datereceived", "urn:schemas:httpmail:read", "DAV:id" FROM "https://domain/Exchange/bbtest/Inbox" WHERE "DAV:ishidden" = false AND "DAV:isfolder" = false AND "urn:schemas:httpmail:read" = false AND "urn:schemas:httpmail:fromemail" != 'emailAddy@domainName.co.uk'</D:sql>
</D:searchrequest>
According to MSDN, the answer is that WebDAV is deprecated after Exchange 2007 and has been replaced by Exchange Web Services (EWS).
Here are a couple of links:
MSDN Library: Get started with Exchange Web Services
OMEGACODER: Getting all emails from Exchange using Exchange Web Services
MSDN Code Downloads: Exchange - 101 samples
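To give an idea of what the EWS replacement for the WebDAV SEARCH request might look like, here is a rough sketch using the EWS Managed API (the Microsoft.Exchange.WebServices.Data namespace, which ships separately from the .NET Framework). It is an illustration under assumptions, not tested against the poster's server: the class and method names are hypothetical, the EWS endpoint URL must be supplied by the caller (it commonly has the form https://server/EWS/Exchange.asmx, but that is an assumption about the environment), and only the unread filter from the original query is reproduced.

```csharp
using System;
using Microsoft.Exchange.WebServices.Data; // EWS Managed API, referenced separately

// Hypothetical helper illustrating an unread-mail query through EWS.
public static class UnreadMailReader
{
    public static void PrintUnread(string ewsUrl, string userName, string password)
    {
        // Target the Exchange 2010 schema and authenticate with explicit credentials.
        var service = new ExchangeService(ExchangeVersion.Exchange2010)
        {
            Credentials = new WebCredentials(userName, password),
            Url = new Uri(ewsUrl) // assumed to be supplied by the caller
        };

        // Equivalent of: WHERE "urn:schemas:httpmail:read" = false
        SearchFilter unread = new SearchFilter.IsEqualTo(EmailMessageSchema.IsRead, false);

        // Fetch at most 50 items in this single page.
        var view = new ItemView(50);

        FindItemsResults<Item> results =
            service.FindItems(WellKnownFolderName.Inbox, unread, view);

        foreach (Item item in results)
        {
            Console.WriteLine("{0}  {1}", item.DateTimeReceived, item.Subject);
        }
    }
}
```

The sender exclusion from the original query is left out here; it could be added by combining filters, and larger mailboxes would need paging through further ItemView offsets.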

C# Base-64 error when logging into server

I was testing my own method of logging into a database last night for a program I am working on and everything was fine. When I tried logging into the database this morning I get a Base-64 error. Here is the error in its entirety.
The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.
Before I put the program into production I need to fix this error but I cannot figure out what is causing it. Here is the login code (I am excluding the real server address and real encryption keys for security but it shouldn't make a difference).
private void login_Click(object sender, EventArgs e)
{
try
{
WebRequest request_user = WebRequest.Create("server address here");
WebResponse response_user = request_user.GetResponse();
StreamReader sr_user = new StreamReader(response_user.GetResponseStream());
string user = RC4.Decrypt("encryption key here", sr_user.ReadToEnd());
//show("Client: " + username.Text + "\nServer: " + user);
WebRequest request_pass = WebRequest.Create("server address here");
WebResponse response_pass = request_pass.GetResponse();
StreamReader sr_pass = new StreamReader(response_pass.GetResponseStream());
string pass = RC4.Decrypt("encryption key here", sr_pass.ReadToEnd());
//show("Client: " + password.Text + "\nServer: " + pass);
WebRequest request_key1 = WebRequest.Create("server address here");
WebResponse response_key1 = request_key1.GetResponse();
StreamReader sr_key1 = new StreamReader(response_key1.GetResponseStream());
string key1 = RC4.Decrypt("encryption key here", sr_key1.ReadToEnd());
//show("Client: " + RC4.Decrypt("encryption key here", AuthKey1) + "\nServer: " + key1);
WebRequest request_key2 = WebRequest.Create("server address here");
WebResponse response_key2 = request_key2.GetResponse();
StreamReader sr_key2 = new StreamReader(response_key2.GetResponseStream());
string key2 = RC4.Decrypt("encryption key here", sr_key2.ReadToEnd());
//show("Client: " + RC4.Decrypt("encryption key here", AuthKey2) + "\nServer: " + key2);
WebRequest request_ipv4 = WebRequest.Create("server address here");
WebResponse response_ipv4 = request_ipv4.GetResponse();
StreamReader sr_ipv4 = new StreamReader(response_ipv4.GetResponseStream());
string ipv4 = sr_ipv4.ReadToEnd();
//show("Client: " + IPAddress + "\nServer: " + ipv4);
if (user.Contains(username.Text) && pass.Contains(password.Text) && key1.Contains(RC4.Decrypt("encryption key here", AuthKey1)) && key2.Contains(RC4.Decrypt("encryption key here", AuthKey2)) && ipv4.Contains(IPAddress))
{
WebRequest request_tu = WebRequest.Create("server address here");
WebResponse response_tu = request_tu.GetResponse();
StreamReader sr_tu = new StreamReader(response_tu.GetResponseStream());
string tu = sr_tu.ReadToEnd();
show("Successfully logged into the Grand Theft Rape Server!\nCurrent TU: " + tu);
}
else
{
show("Username and/or Password incorrect!");
}
}
catch (Exception ex)
{
show(ex.Message);
}
}

ASP.NET - how to detect a Mac user

I am trying to detect a Mac user using C#. I have used the following code, but the platform is always reported as unknown when a Mac user navigates to my site. It works well for Windows users but not for Mac or anything else. Does anyone have any ideas on how to pick up Mac users?
Thanks
HttpBrowserCapabilities moo = HttpContext.Current.Request.Browser;
StringBuilder sb = new StringBuilder();
sb.Append("<p>Browser Capabilities:</p>");
sb.Append("Type = " + moo.Type + "<br>");
sb.Append("Name = " + moo.Browser + "<br>");
sb.Append("Version = " + moo.Version + "<br>");
sb.Append("Major Version = " + moo.MajorVersion + "<br>");
sb.Append("Minor Version = " + moo.MinorVersion + "<br>");
sb.Append("Platform = " + moo.Platform + "<br>");
sb.Append("Is Beta = " + moo.Beta + "<br>");
sb.Append("Is Crawler = " + moo.Crawler + "<br>");
sb.Append("Is AOL = " + moo.AOL + "<br>");
sb.Append("Is Win16 = " + moo.Win16 + "<br>");
sb.Append("Is Win32 = " + moo.Win32 + "<br>");
sb.Append("Supports Frames = " + moo.Frames + "<br>");
sb.Append("Supports Tables = " + moo.Tables + "<br>");
sb.Append("Supports Cookies = " + moo.Cookies + "<br>");
sb.Append("Supports VB Script = " + moo.VBScript + "<br>");
sb.Append("Supports ActiveX Controls = " + moo.ActiveXControls + "<br>");
sb.Append("CDF = " + moo.CDF + "<br>");
You can extract OS information from Request.UserAgent.
Macintosh user agent strings are in this form:
"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-us)
AppleWebKit/528.16 (KHTML, like Gecko) Version/4.0 Safari/528.16"
"Mozilla/4.0 (compatible; MSIE 5.15; Mac_PowerPC)"
So you could do something like:
public bool IsMacOS(string userAgent)
{
    var osInfo = userAgent.Split(new Char[] { '(', ')' })[1];
    return osInfo.Contains("Mac_PowerPC") || osInfo.Contains("Macintosh");
}
You can use Request.UserAgent; it will return something like this:
"Mozilla/5.0 (Windows; U; Windows NT 5.1; da; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13"
Then you will need to extract the OS from it.
This might help you:
http://www.javascripter.net/faq/operatin.htm
You should use the native ASP.NET browser capabilities and just extend them.
Just create an App_Browsers/BrowserFile.browser file in your ASP.NET application.
Then add this to the file:
<browsers>
  <gateway id="MacOS" parentID="Safari">
    <identification>
      <userAgent match="Intel Mac OS X" />
    </identification>
    <capabilities>
      <capability name="platform" value="MacOS" />
    </capabilities>
  </gateway>
</browsers>
This is enough for Browser.Platform to return "MacOS".
