I am new to programming and have a problem reading my HTML page content: I am not getting the complete data, but when I save the page with the right-click Save As option, I do get the complete data.
NOTE: In Inspect Element I also cannot see all the data (I can only see what I read using C#).
How can I read the complete data programmatically? Please help me overcome this. My code follows:
private void button1_Click(object sender, EventArgs e)
{
    string urlAddress = "http://iris-rmds.tomtomgroup.com";
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (response.StatusCode == HttpStatusCode.OK)
    {
        Stream receiveStream = response.GetResponseStream();
        StreamReader readStream = null;
        if (response.CharacterSet == null)
        {
            readStream = new StreamReader(receiveStream);
        }
        else
        {
            readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
        }
        string data = readStream.ReadToEnd();
        textBox1.Text = data;
        MessageBox.Show(data);
        Clipboard.SetText(data);
        response.Close();
        readStream.Close();
    }
}
Maybe your page uses JavaScript, so its content is not fully loaded at the start. Pages like Facebook or Twitter load their content dynamically this way. By the time you right-click and Save As, the content has already finished loading.
You could save it using the method in the answer to this other question: "Scraping webpage generated by javascript with C#".
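If that is the case here, a browser automation tool can hand you the DOM after the scripts have run. A minimal sketch using Selenium WebDriver (an assumption: the Selenium.WebDriver and ChromeDriver NuGet packages are installed; the URL is the one from the question):

```csharp
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class RenderedPageFetcher
{
    static string GetRenderedHtml(string url)
    {
        using (IWebDriver driver = new ChromeDriver())
        {
            driver.Navigate().GoToUrl(url);
            // PageSource returns the DOM as it stands after JavaScript ran,
            // unlike the raw bytes that HttpWebRequest downloads.
            return driver.PageSource;
        }
    }

    static void Main()
    {
        string html = GetRenderedHtml("http://iris-rmds.tomtomgroup.com");
        System.Console.WriteLine(html.Length);
    }
}
```

The trade-off is that this starts a real browser, so it is much heavier than HttpWebRequest, but it sees the same content you see in Inspect Element after the page settles.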
Related
I log in to a website --> I read links from a text file (which contains both working and broken links) --> I use HttpWebRequest and HttpWebResponse to detect the working and broken links ---Here is the main problem---> I am trying to separate the working and broken links and log them into two different files, but everything goes into only one file. I am using the StatusCode property to distinguish between the two, without success. Please find the code below. I am very new to Selenium and C#.
public void Login()
{
    WebDriver.Navigate().GoToUrl($"{ApplicationUrl}Login.aspx");
    WebDriver.Manage().Window.Maximize();
    WebDriver.FindElement(By.Id("UserName")).SendKeys(_user.Login);
    WebDriver.FindElement(By.Id("tbPassword")).SendKeys(_user.Passwort);
    IJavaScriptExecutor js = WebDriver as IJavaScriptExecutor;
    js.ExecuteScript("arguments[0].click();",
        WebDriver.FindElement(By.Id("LoginButton")));
    Thread.Sleep(3000);
    string fileName = @"TestFile.txt";
    // Reading the text file using StreamReader
    using (StreamReader sr = new StreamReader(fileName))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            WebDriver.Navigate().GoToUrl(line);
            IsLinkWorking(line);
        }
        WorkingLinks.Close();
        NotWorkingLinks.Close();
    }
}
public void IsLinkWorking(string line)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(line);
    try
    {
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        if (response.StatusCode == HttpStatusCode.OK)
        {
            WorkingLinks.WriteLine(WebDriver.Url);
            // Releases the resources of the response.
            response.Close();
        }
        else
        {
            NotWorkingLinks.WriteLine(WebDriver.Url);
            response.Close();
        }
    }
    catch
    { //TODO: Check for the right exception here
    }
}
When the HTTP status code is outside the 200-299 range, GetResponse throws a WebException and execution jumps to the catch block, so put your logging code there:
catch (WebException)
{
    NotWorkingLinks.WriteLine(WebDriver.Url);
}
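Putting that together with the method from the question, a sketch of IsLinkWorking with the logging moved into the catch block (WorkingLinks and NotWorkingLinks are assumed to be the same StreamWriter fields the question uses):

```csharp
public void IsLinkWorking(string line)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(line);
    try
    {
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            // GetResponse generally only returns normally for success
            // status codes, so reaching this point means the link works.
            WorkingLinks.WriteLine(line);
        }
    }
    catch (WebException)
    {
        // 4xx/5xx status codes and network failures both throw,
        // so broken links end up logged here.
        NotWorkingLinks.WriteLine(line);
    }
}
```

This is why the original `else` branch never ran: a 404 never produces a response object with a non-OK StatusCode in this API, it throws instead.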
I'm trying to get weather data from online as json and then deserialize the json into an object that I can use. Here's my code:
public static RootObject7 GetWeather7(int zip)
{
    var url = "http://api.weatherunlocked.com/api/forecast/us." + zip.ToString() + "?app_id=xxxxxxx&app_key=xxxxxxxxxxxxxxxxxxxxxxx";
    var weather = new wunlocked();
    string json = weather.getJson(url);
    JavaScriptSerializer serializer = new JavaScriptSerializer();
    var data = serializer.Deserialize<RootObject7>(json);
    return data;
}
private string getJson(string url)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    try
    {
        WebResponse response = request.GetResponse();
        using (Stream responseStream = response.GetResponseStream())
        {
            StreamReader reader = new StreamReader(responseStream, Encoding.UTF8);
            return reader.ReadToEnd();
        }
    }
    catch (WebException ex)
    {
        WebResponse errorResponse = ex.Response;
        using (Stream responseStream = errorResponse.GetResponseStream())
        {
            StreamReader reader = new StreamReader(responseStream, Encoding.GetEncoding("utf-8"));
            string errorText = reader.ReadToEnd();
        }
        throw;
    }
}
While debugging, I can see that my RootObject7 data object is created, but the "Forecast" object inside it, which is supposed to contain a list of other information, is null. I've already defined all of the classes (they're long, so I'll post them if it's important, but otherwise I don't think I need to). I've never done anything like this before, so most of this came from other code examples I found on here, but obviously I didn't put them together correctly, since my object is always null even though there is valid XML at the URL when I open it in a browser. I'm not sure if I need to convert the XML to JSON in my code, or if that is being done somehow. Like I said, I really don't know what I'm doing, so any suggestions would be great.
Try
dynamic data = serializer.Deserialize(json);
and then inspect the data object in the debugger - you may not need to deserialise to a fixed interface to get out the data you need. Using dynamic may also be a more robust solution to deal with upgrades to the service that may make a set interface/object more brittle.
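For example, with a hypothetical sample payload (the real API's field names may well differ), the untyped result comes back as nested dictionaries and arrays that you can walk directly:

```csharp
using System;
using System.Collections.Generic;
using System.Web.Script.Serialization;   // reference System.Web.Extensions

class DynamicJsonDemo
{
    static void Main()
    {
        // Hypothetical payload; the real weatherunlocked response differs.
        string json = @"{ ""Days"": [ { ""date"": ""01/01/2014"", ""temp_max_f"": 41.0 } ] }";

        var serializer = new JavaScriptSerializer();
        // Without a target type, the serializer yields Dictionary<string, object>
        // for JSON objects and object[] for JSON arrays.
        var root = (Dictionary<string, object>)serializer.DeserializeObject(json);
        var days = (object[])root["Days"];
        var firstDay = (Dictionary<string, object>)days[0];

        Console.WriteLine(firstDay["date"]);   // 01/01/2014
    }
}
```

Inspecting these structures in the debugger also shows you the exact property names the service sends, which is the usual cause of a typed deserialize silently leaving a property null: the class's property name must match the JSON key.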
I'm making repeated requests to a web server using HttpWebRequest, but I randomly get a 'broken' response stream in return, e.g. it doesn't contain tags that I KNOW are supposed to be there. If I request the same page multiple times in a row, it turns up 'broken' roughly 3 times out of 5.
The request always returns a 200 response, so I first thought a null value inserted in the response made the StreamReader think it had reached the end.
I've tried:
1) reading everything into a byte array and cleaning it
2) inserting a random Thread.Sleep after each request
Is there any potentially bad practice in my code below, or can anyone tell me why I'm randomly getting an incomplete response stream? As far as I can see I'm closing all unmanaged resources, so that shouldn't be a problem, right?
public string ReturnHtmlResponse(string url)
{
    string result;
    var request = (HttpWebRequest)WebRequest.Create(url);
    using (var response = (HttpWebResponse)request.GetResponse())
    {
        Console.WriteLine((int)response.StatusCode);
        var encoding = Encoding.GetEncoding(response.CharacterSet);
        using (var stream = response.GetResponseStream())
        {
            using (var sr = new StreamReader(stream, encoding))
            {
                result = sr.ReadToEnd();
            }
        }
    }
    return result;
}
I do not see any direct flaws in your code. It could be that one of the parent using statements is disposed before the nested ones. Try replacing the using statements with explicit Close() and Dispose() calls:
public string ReturnHtmlResponse(string url)
{
    string result;
    var request = (HttpWebRequest)WebRequest.Create(url);
    var response = (HttpWebResponse)request.GetResponse();
    Console.WriteLine((int)response.StatusCode);
    var encoding = Encoding.GetEncoding(response.CharacterSet);
    var stream = response.GetResponseStream();
    var sr = new StreamReader(stream, encoding);
    result = sr.ReadToEnd();
    sr.Close();
    stream.Close();
    response.Close();
    sr.Dispose();
    stream.Dispose();
    response.Dispose();
    return result;
}
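If the responses still come back truncated, one way to narrow it down (a diagnostic sketch, not part of the original answer) is to compare the Content-Length the server declares against the number of bytes actually received, before any text decoding is involved:

```csharp
using System;
using System.IO;
using System.Net;

class TruncationCheck
{
    static void Main()
    {
        // Hypothetical URL for illustration.
        var request = (HttpWebRequest)WebRequest.Create("http://www.example.com/");
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var buffer = new MemoryStream())
        {
            stream.CopyTo(buffer);
            long declared = response.ContentLength;   // -1 if the server omits the header
            long received = buffer.Length;
            Console.WriteLine("declared: {0}, received: {1}", declared, received);
            // If the numbers match, the bytes arrive intact and the problem
            // lies in decoding; if they differ, the connection is cut short.
        }
    }
}
```

This separates a transport problem (server or proxy closing early) from an encoding problem (StreamReader misinterpreting the bytes), which are the two usual suspects for intermittently incomplete pages.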
I'm making a WinForms project in C#/C++ (the language could change depending on the best way to reach my goal). I need to get a page from a website and parse it to extract some information. I'm a complete beginner in web programming with Visual C#/C++, and all the answers I found here are too complicated for a beginner. Could you tell me which standard classes I should use to fetch a page from the Internet and how to parse it afterwards? I would be very pleased if you have any code examples, because as I wrote above I have no experience in web coding and no time to learn every term in detail. Thank you in advance.
You can use C# to download the specific web page and then do the analysis. A code example for the downloading part:
using System.Net;
using System.Net.Mime;
using System.IO;
using System.Text;
using System.Windows.Forms;

string result = null;
string url = "http://www.devtopics.com";
WebResponse response = null;
StreamReader reader = null;
try
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "GET";
    response = request.GetResponse();
    // ContentType (from System.Net.Mime) parses the charset out of the header
    ContentType contentType = new ContentType(response.ContentType);
    Encoding encoding = Encoding.GetEncoding(contentType.CharSet);
    reader = new StreamReader(response.GetResponseStream(), encoding);
    result = reader.ReadToEnd();
}
catch (Exception ex)
{
    // handle error
    MessageBox.Show(ex.Message);
}
finally
{
    if (reader != null)
        reader.Close();
    if (response != null)
        response.Close();
}
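For the parsing step, a dedicated HTML parser such as HtmlAgilityPack (a suggestion, not used in the answer above) is the robust choice. For something as small as pulling out the page title, plain string searching is enough; a minimal sketch:

```csharp
using System;

static class HtmlText
{
    // Minimal sketch: extract the text between <title> and </title>.
    // Real-world HTML can defeat string matching; use a parser for more.
    public static string ExtractTitle(string html)
    {
        int start = html.IndexOf("<title>", StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;
        start += "<title>".Length;
        int end = html.IndexOf("</title>", start, StringComparison.OrdinalIgnoreCase);
        return end < 0 ? null : html.Substring(start, end - start).Trim();
    }

    static void Main()
    {
        string page = "<html><head><title>DevTopics</title></head><body></body></html>";
        Console.WriteLine(HtmlText.ExtractTitle(page));   // DevTopics
    }
}
```

Feed it the `result` string produced by the download code above and you have the two halves of the task: fetch, then parse.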
Check out this project 'here' and their code examples 'here'
I need to check if a text file exists on a site on a different domain. The URL could be:
http://sub.somedomain.com/blah/atextfile.txt
I need to do this from code behind. I am trying to use the HttpWebRequest object, but not sure how to do it.
EDIT: I am looking for a lightweight way of doing this, as I'll be executing this logic every few seconds.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
    "http://sub.somedomain.com/blah/atextfile.txt");
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
if (response.StatusCode == HttpStatusCode.OK)
{
    // FILE EXISTS!
}
response.Close();
You could probably use the method used here:
http://www.eggheadcafe.com/tutorials/aspnet/2c13cafc-be1c-4dd8-9129-f82f59991517/the-lowly-http-head-reque.aspx
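The idea behind the linked article is to use a HEAD request, which asks the server for the status line and headers only, so no file content is transferred; that is about as lightweight as the check can get for polling every few seconds. A sketch (the method name is made up for illustration):

```csharp
using System;
using System.Net;

static class RemoteCheck
{
    public static bool RemoteFileExists(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD";   // headers only, no body is downloaded
        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                return response.StatusCode == HttpStatusCode.OK;
            }
        }
        catch (WebException)
        {
            // 404 and other error statuses throw rather than return,
            // so a missing file lands here.
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(RemoteFileExists("http://sub.somedomain.com/blah/atextfile.txt"));
    }
}
```

Note that some servers reject or mishandle HEAD; if that happens, fall back to a GET and simply discard the body.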
Something like this might work for you:
using (WebClient webClient = new WebClient())
{
    try
    {
        using (Stream stream = webClient.OpenRead("http://does.not.exist.com/textfile.txt"))
        {
            // if OpenRead succeeds, the file exists
        }
    }
    catch (WebException)
    {
        // the file does not exist (or is unreachable)
    }
}