Reading data from a website using C# - c#

I have a webpage which has nothing on it except some string(s). No images, no background color or anything, just some plain text which is not really that long in length.
I am just wondering, what is the best (by that, I mean fastest and most efficient) way to read the string from the webpage so that I can use it for something else (e.g. display it in a text box)? I know of WebClient, but I'm not sure it will do what I want, and I'm reluctant to try it because the last time I used it, a simple operation took approximately 30 seconds.
Any ideas would be appreciated.

The WebClient class should be more than capable of handling the functionality you describe, for example:
System.Net.WebClient wc = new System.Net.WebClient();
byte[] raw = wc.DownloadData("http://www.yoursite.com/resource/file.htm");
string webData = System.Text.Encoding.UTF8.GetString(raw);
or (following Fredrick's suggestion in the comments):
System.Net.WebClient wc = new System.Net.WebClient();
string webData = wc.DownloadString("http://www.yoursite.com/resource/file.htm");
When you say it took 30 seconds, can you expand on that a little more? There are many reasons why that could have happened: a slow server, a slow internet connection, a dodgy implementation, and so on.
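You could go a level lower and implement something like this (the request-stream block is only needed when you send a body, for example with a POST; leave it out for a plain GET):
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("http://www.yoursite.com/resource/file.htm");
webRequest.Method = "POST"; // GetRequestStream throws on a GET request
using (StreamWriter streamWriter = new StreamWriter(webRequest.GetRequestStream(), Encoding.UTF8))
{
streamWriter.Write(requestData); // requestData: the string to send as the request body
}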
string responseData = string.Empty;
HttpWebResponse httpResponse = (HttpWebResponse)webRequest.GetResponse();
using (StreamReader responseReader = new StreamReader(httpResponse.GetResponseStream()))
{
responseData = responseReader.ReadToEnd();
}
However, at the end of the day the WebClient class wraps up this functionality for you. So I would suggest that you use WebClient and investigate the causes of the 30 second delay.

If you're downloading text, then I'd recommend using WebClient and getting a StreamReader over the text:
WebClient web = new WebClient();
System.IO.Stream stream = web.OpenRead("http://www.yoursite.com/resource.txt");
using (System.IO.StreamReader reader = new System.IO.StreamReader(stream))
{
String text = reader.ReadToEnd();
}
If this is taking a long time then it is probably a network issue or a problem on the web server. Try opening the resource in a browser and see how long that takes.
If the webpage is very large, you may want to look at streaming it in chunks rather than reading all the way to the end as in that example.
Look at http://msdn.microsoft.com/en-us/library/system.io.stream.read.aspx to see how to read from a stream.
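As a rough sketch of the chunked approach mentioned above, something like the following could work. The class name `ChunkedReader`, the method `ReadInChunks`, and the callback parameter are made up for illustration, and UTF-8 text is assumed.

```csharp
using System;
using System.IO;
using System.Text;

public static class ChunkedReader
{
    // Reads text from a stream in fixed-size chunks and passes each chunk to a callback,
    // so the whole document never has to be held in memory at once.
    // Returns the total number of characters read.
    public static int ReadInChunks(Stream stream, int chunkSize, Action<string> onChunk)
    {
        int total = 0;
        char[] buffer = new char[chunkSize];

        // StreamReader decodes the bytes to characters (UTF-8 is an assumption here).
        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
        {
            int read;
            // Read returns 0 once the end of the stream is reached.
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                onChunk(new string(buffer, 0, read));
                total += read;
            }
        }

        return total;
    }
}
```

With the `web` client from the example above, this might be called as `ChunkedReader.ReadInChunks(web.OpenRead(url), 4096, chunk => Console.Write(chunk));`.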

Regarding the suggestion
So I would suggest that you use WebClient and investigate the causes of the 30 second delay.
From the answers for the question
System.Net.WebClient unreasonably slow
Try setting Proxy = null;
WebClient wc = new WebClient();
wc.Proxy = null;
Credit to Alex Burtsev

If you use the WebClient to read the contents of the page, it will include HTML tags.
string webURL = "https://yoursite.com";
WebClient wc = new WebClient();
wc.Headers.Add("user-agent", "Only a Header!");
byte[] rawByteArray = wc.DownloadData(webURL);
string webContent = Encoding.UTF8.GetString(rawByteArray);
After getting the content, the HTML tags should be removed. Regex can be used for this:
var result = Regex.Replace(webContent, "<.*?>", String.Empty);
However, this method is not very accurate. A better way is to install HtmlAgilityPack and use the following code:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(webContent);
string result = doc.DocumentNode.InnerText;
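To illustrate why the regex approach is described as inaccurate, here is a small sketch. The class name `TagStripper` and the sample inputs are invented for the demonstration. It works on simple markup but breaks as soon as an attribute value contains a `>` character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Same pattern as in the answer: lazily match from '<' to the next '>'.
    public static string StripTags(string html)
    {
        return Regex.Replace(html, "<.*?>", String.Empty);
    }
}
```

For `"<p>Hello <b>world</b></p>"` this returns `"Hello world"`, but for `"<a title=\"x > y\">link</a>"` the match stops at the `>` inside the attribute and the result is `" y\">link"`. An HTML parser does not have this problem.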
You say it takes 30 seconds, but that has nothing to do with WebClient itself; the main factor is usually the internet connection or the proxy. WebClient has worked very well for me (example).

WebClient client = new WebClient();
using (Stream data = client.OpenRead(Text))
{
using (StreamReader reader = new StreamReader(data))
{
string content = reader.ReadToEnd();
string pattern = @"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)";
MatchCollection matches = Regex.Matches(content,pattern);
List<string> urls = new List<string>();
foreach (Match match in matches)
{
urls.Add(match.Value);
}
}
}

XmlDocument document = new XmlDocument();
document.Load("http://www.yourwebsite.com"); // needs a full URL, and the response must be well-formed XML
string allText = document.InnerText;

Related

Why does .NET Core HttpClient or WebClient take so much longer than Python

I am trying to retrieve content from a URL, I have tried .NET Core's HttpClient and WebClient, both of which take ~10 seconds to load this specific website.
However, when I use Python's urllib.request it loads within the same second. I have tried pretty much all the different combinations including: DownloadString, GetStringAsync, GetStreamAsync, GetAsync, OpenRead, etc.
I can provide the specific URL if needed. Any possible ideas?
Attempt #1
WebClient client = new WebClient();
Stream data = client.OpenRead("https://www.faa.gov/air_traffic/flight_info/aeronav/digital_products/dtpp/search/");
StreamReader reader = new StreamReader(data);
string s = await reader.ReadToEndAsync();
data.Close();
reader.Close();
return s;
Attempt #2
using (var wc = new HttpClient())
{
var test = wc.GetAsync("https://www.faa.gov/air_traffic/flight_info/aeronav/digital_products/dtpp/search/").Result;
var contents = test.Content.ReadAsStringAsync().Result;
return contents;
}
Attempt #3
using (var wc = new HttpClient())
{
HTML = await wc.GetStringAsync("https://www.faa.gov/air_traffic/flight_info/aeronav/digital_products/dtpp/search/");
return HTML;
}
All three attempts work, but they take ~10 seconds every time. If I run the same sort of thing in Python, it returns within the same second.
Python Version
with urllib.request.urlopen('https://www.faa.gov/air_traffic/flight_info/aeronav/digital_products/dtpp/search/') as response:
    html = response.read()

Pulling CSV file from server and displaying on a site [duplicate]

I'm trying to create a web service which goes to a URL, e.g. www.domain.co.uk/prices.csv, and then reads the CSV file. Is this possible, and if so, how? Ideally without downloading the CSV file.
You could use:
public string GetCSV(string url)
{
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
StreamReader sr = new StreamReader(resp.GetResponseStream());
string results = sr.ReadToEnd();
sr.Close();
return results;
}
And then to split it:
public void SplitCSV()
{
List<string> splitted = new List<string>();
string fileList = GetCSV("http://www.google.com");
string[] tempStr;
tempStr = fileList.Split(',');
foreach (string item in tempStr)
{
if (!string.IsNullOrWhiteSpace(item))
{
splitted.Add(item);
}
}
}
That said, there are plenty of CSV parsers out there, and I would advise against rolling your own. FileHelpers is a good one.
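The split above treats the whole file as one flat list of values. A slightly more structured sketch, splitting into rows first and then into fields, might look like this. The names `SimpleCsv` and `ParseRows` are invented for illustration, and the code assumes there are no quoted fields, embedded commas, or embedded line breaks, which is exactly where a real CSV parser earns its keep.

```csharp
using System;
using System.Collections.Generic;

public static class SimpleCsv
{
    // Naive parser: returns one string[] per non-empty line.
    // Assumes no quoted fields, embedded commas, or embedded line breaks.
    public static List<string[]> ParseRows(string csvText)
    {
        List<string[]> rows = new List<string[]>();

        // Handle both Windows (\r\n) and Unix (\n) line endings and skip empty lines.
        string[] lines = csvText.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            rows.Add(line.Split(','));
        }

        return rows;
    }
}
```

The string returned by `GetCSV(url)` could be passed straight in, e.g. `var rows = SimpleCsv.ParseRows(GetCSV(url));`.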
// Download the file to a specified path. Using the WebClient class we can download
// files directly from a provided url, like in this case.
System.Net.WebClient client = new WebClient();
client.DownloadFile(url, csvPath);
Where the url is your site with the csv file and the csvPath is where you want the actual file to go.
In your web service you could use the WebClient class to download the file, something like this (I have not included any exception handling, nor any using or Close/Dispose calls; I just wanted to give you the idea, which you can use and refine):
using System.Net;
WebClient webClient = new WebClient();
string csv = webClient.DownloadString("http://www.domain.co.uk/prices.csv"); // reads the content into memory without saving a file
Then you can do anything you like with the file content once it is available in the execution flow of your service.
If you have to return it to the client as the return value of the web service call, you can return a DataSet or any other data structure you prefer.
Sebastien Lorion's CSV Reader has a constructor that takes a Stream.
If you decided to use this, your example would become:
void GetCSVFromRemoteUrl(string url)
{
HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
HttpWebResponse response = request.GetResponse() as HttpWebResponse;
using (CsvReader csvReader = new CsvReader(response.GetResponseStream(), true))
{
int fieldCount = csvReader.FieldCount;
string[] headers = csvReader.GetFieldHeaders();
while (csvReader.ReadNextRecord())
{
//Do work with CSV file data here
}
}
}
The ever popular FileHelpers also allows you to read directly from a stream.
The documentation for WebRequest has an example that uses streams. Using a stream allows you to parse the document without storing it all in memory.

C# WebClient StreamReader string replace not working

I want to read the response from the URI, modify it by replacing all S's with X's, and return that string to the client.
Below is my code, but the replace is not working.
I downloaded the "response" string to check, and there are lots of S characters.
Any idea why this is not working, or how I can manipulate the string?
try
{
// open and read from the supplied URI
stream = webClient.OpenRead(uri);
reader = new StreamReader(stream);
response = reader.ReadToEnd();
response.Replace('S', 'X');
webClient.DownloadFile(uri, "C://Users//MyPC//Desktop//a.txt");
}
Thanks..
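Strings in .NET are immutable: Replace returns a new string rather than modifying the original, so you have to use its return value. You can also use webClient.DownloadString(uri),
like this: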
string str = webClient.DownloadString(uri).Replace('S', 'X');
File.WriteAllText(@"C://Users//MyPC//Desktop//a.txt", str);
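The underlying issue can be shown without any network code. The sketch below uses invented names (`ReplaceDemo`, `Broken`, `Fixed`) to contrast discarding the result of `Replace` with using it.

```csharp
using System;

public static class ReplaceDemo
{
    // Mirrors the question's code: the result of Replace is thrown away,
    // so the original string comes back unchanged.
    public static string Broken(string input)
    {
        input.Replace('S', 'X');
        return input;
    }

    // Uses the new string that Replace returns.
    public static string Fixed(string input)
    {
        return input.Replace('S', 'X');
    }
}
```

`ReplaceDemo.Broken("SOS")` returns `"SOS"`, while `ReplaceDemo.Fixed("SOS")` returns `"XOX"`.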

How to Download a file from a website programatically c#

I am developing an application for downloading MCX data from the website. It would be good if I could create the application myself and use it.
There is a date-time picker on the website in which I want to select the date programmatically, click the Go button, and later view the data in Excel. When I click "view in Excel", it downloads a file with the data for that particular date. You can see this link to understand what I mean:
http://www.mcxindia.com/sitepages/bhavcopy.aspx
I would greatly appreciate any help.
Thanks in advance.
using System.Net;
WebClient webClient = new WebClient();
webClient.DownloadFile("http://mysite.com/myfile.txt", @"c:\myfile.txt");
But if the file is too large, then you should use the async method.
Check this code example: http://www.csharp-examples.net/download-files/
You'll need to post your data to the server with your client request, as explained by @Peter.
This is an ASP.NET page, and therefore it requires that you send some data on postback in order to complete the callback.
Using Google, I was able to find this as a proof of concept.
The following is a snippet I wrote in LINQPad to test it out:
void Main()
{
WebClient webClient = new WebClient();
byte[] b = webClient.DownloadData("http://www.mcxindia.com/sitepages/BhavCopyDateWise.aspx");
string s = System.Text.Encoding.UTF8.GetString(b);
var __EVENTVALIDATION = ExtractVariable(s, "__EVENTVALIDATION");
__EVENTVALIDATION.Dump();
var forms = new NameValueCollection();
forms["__EVENTTARGET"] = "btnLink_Excel";
forms["__EVENTARGUMENT"] = "";
forms["__VIEWSTATE"] = ExtractVariable(s, "__VIEWSTATE");
forms["mTbdate"] = "11/15/2011"; // UploadValues URL-encodes the values itself, so pass the raw date
forms["__EVENTVALIDATION"] = __EVENTVALIDATION;
webClient.Headers.Set(HttpRequestHeader.ContentType, "application/x-www-form-urlencoded");
var responseData = webClient.UploadValues(@"http://www.mcxindia.com/sitepages/BhavCopyDateWise.aspx", "POST", forms);
System.IO.File.WriteAllBytes(@"c:\11152011.csv", responseData);
}
private static string ExtractVariable(string s, string valueName)
{
string tokenStart = valueName + "\" value=\"";
string tokenEnd = "\" />";
int start = s.IndexOf(tokenStart) + tokenStart.Length;
int length = s.IndexOf(tokenEnd, start) - start;
return s.Substring(start, length);
}
There are many ways to download a file using WebClient.
You should read this first:
http://msdn.microsoft.com/en-us/library/system.net.webclient.aspx
If you want to send some additional information, you can use WebClient.Headers, or post form values using UploadValues:
using System.Net;
WebClient webClient = new WebClient();
var forms = new NameValueCollection();
forms["token"] = "abc123";
var responseData = webClient.UploadValues(@"http://blabla.com/download/?name=abc.exe", "POST", forms);
System.IO.File.WriteAllBytes(@"D:\abc.exe", responseData);

SOAP to Stream to String

I have a SOAP object that I want to capture as a string. This is what I have now:
RateRequest request = new RateRequest();
//Do some stuff to request here
SoapFormatter soapFormat = new SoapFormatter();
using (MemoryStream myStream = new MemoryStream())
{
soapFormat.Serialize(myStream, request);
myStream.Position = 0;
using (StreamReader sr = new StreamReader(myStream))
{
string reqString = sr.ReadToEnd();
}
}
Is there a more elegant way to do this? I don't care that much about the resulting string format - just so it's human readable. XML is fine.
No, that's pretty much the way to do it. You could always factor this out to a method which will do this work for you, and then you can just reduce it to a single call where you need it.
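One way to factor this out, as a minimal sketch: a helper that takes any "write to a stream" action and returns what was written as a string. The names `StreamCapture` and `CaptureAsString` are hypothetical, and the helper assumes the serializer writes UTF-8 text.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamCapture
{
    // Runs the supplied action against an in-memory stream and
    // returns everything it wrote, decoded as UTF-8 text.
    public static string CaptureAsString(Action<Stream> writeTo)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            writeTo(stream);

            // ToArray returns only the bytes actually written,
            // so there is no need to rewind the stream and read it back.
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```

With the objects from the question, the call site would shrink to something like `string reqString = StreamCapture.CaptureAsString(s => soapFormat.Serialize(s, request));`.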
I think you can also do this:
soapFormat.Serialize(myStream, request);
string xml = System.Text.Encoding.UTF8.GetString(myStream.ToArray()); // ToArray returns only the bytes written; GetBuffer may include unused trailing bytes
