Strange characters as a result of HttpWebResponse [duplicate] - c#

This question already has an answer here:
Garbled httpWebResponse string when posting data to web form programmatically
I'm trying to create a site parser for a Telegram bot. The exact code is:
var link = "https://www.detmir.ru/";
var request = HttpWebRequest.Create(link);
var resp = (HttpWebResponse)request.GetResponse();
string result;
using (var stream = resp.GetResponseStream())
{
    using (var reader = new StreamReader(stream, Encoding.GetEncoding(resp.CharacterSet)))
        result = reader.ReadToEnd();
}
File.WriteAllText(@"d:\1.txt", result);
The result is a set of strange symbols.
As far as I can tell, the main clue is the encoding. I've tried Encoding.Default and Encoding.UTF8, with the same result.
With other sites this works perfectly. Is there any trick to solving this issue with this particular website?
Update
In Google Chrome the source code of the webpage displays correctly:
[Screenshot: Google Chrome webpage source code]

The content of the response is UTF-8, as the site reports, but it is compressed to improve throughput.
Enable automatic decompression:
var request = (HttpWebRequest)HttpWebRequest.Create(link);
request.AutomaticDecompression = DecompressionMethods.GZip;
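For reference, here is a minimal sketch combining decompression with the original read loop. Enabling Deflate alongside GZip is an assumption on my part, since a server may advertise either scheme:
var link = "https://www.detmir.ru/";
var request = (HttpWebRequest)WebRequest.Create(link);
// Decompress GZip or Deflate content transparently; Deflate is
// enabled as an extra precaution.
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;

string result;
using (var resp = (HttpWebResponse)request.GetResponse())
using (var stream = resp.GetResponseStream())
using (var reader = new StreamReader(stream, Encoding.GetEncoding(resp.CharacterSet)))
{
    result = reader.ReadToEnd();
}
File.WriteAllText(@"d:\1.txt", result);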

Related

Fastest Way To Download Website Data C#

I am writing an RCON tool in Visual Studio for Black Ops. I know it's an old game, but I still have a server running.
I am trying to download the data from this link
Black Ops Log File
I am using this code.
System.Net.WebClient wc = new System.Net.WebClient();
string raw = wc.DownloadString(logFile);
This takes between 6441 ms and 13741 ms according to Visual Studio.
Another attempt was...
string html = null;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(logFile);
request.AutomaticDecompression = DecompressionMethods.GZip;
request.Proxy = null;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (StreamReader reader = new StreamReader(stream))
{
    html = reader.ReadToEnd();
}
This also takes around 6133 ms according to VS debugging.
I have seen other RCON tools respond to commands really quickly. Mine takes at best 5000 ms, which is not really acceptable. How can I download this information quicker? I am told it shouldn't take this long. What am I doing wrong?
This is just how long the server takes to answer.
In the future you can debug such problems yourself using network tools such as Fiddler or by profiling your code to see what takes the longest amount of time.
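If you want to see in code where the time goes, here is a rough sketch with Stopwatch (logFile is the asker's URL variable) that separates the server's response time from the download itself:
var timer = System.Diagnostics.Stopwatch.StartNew();
var request = (HttpWebRequest)WebRequest.Create(logFile);
request.Proxy = null; // skip automatic proxy detection, a common source of delay
using (var response = (HttpWebResponse)request.GetResponse())
{
    // Time until the server answered with headers.
    Console.WriteLine("Headers received after " + timer.ElapsedMilliseconds + " ms");
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        string body = reader.ReadToEnd();
        // Time including the body download.
        Console.WriteLine("Body read after " + timer.ElapsedMilliseconds + " ms");
    }
}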

Yet another thread on The underlying connection was closed. A connection that was expected to be kept alive was closed by the server

I have seen lots of threads on this topic... but the solutions I found on Google are not working for me.
I am doing a POST operation using an HttpWebRequest object, and when I try to post a lot of data I get the error:
The underlying connection was closed. A connection that was expected to be kept alive was closed by the server
I googled and found three solutions:
Set KeepAlive = false and set ProtocolVersion = HttpVersion10.
When I do this, there is no error... but somehow the data I am posting doesn't reach the server (so it fails silently, without any error).
If I remove KeepAlive = false and only set ProtocolVersion = HttpVersion10, then small requests work fine... but for large requests I get the "underlying connection was closed" error.
I also found that some people solved the problem by moving to HttpClient instead of HttpWebRequest... but I believe that is only available from .NET 4.5, and I must compile for .NET 3.5 at a minimum.
Some people solved the problem by
ServicePoint sp = ServicePointManager.FindServicePoint(request.RequestUri);
sp.Expect100Continue = false;
Once again, this threw no errors, but the data was not committed.
So for me these solutions are not working.
Do you have any ideas?
Here is my code
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(completeUrl);
request.CookieContainer = Utility.GetSSOCookie(completeUrl);
request.Method = httpMethod;
request.Timeout = int.MaxValue;
Stream reqStream = null;
string output = null;
try {
    if (String.IsNullOrEmpty(input) == false) {
        byte[] buffer = Encoding.GetEncoding("UTF-8").GetBytes(input);
        request.ContentLength = buffer.Length;
        reqStream = request.GetRequestStream();
        reqStream.Write(buffer, 0, buffer.Length);
    }
    using (WebResponse response = request.GetResponse()) {
        using (StreamReader reader = new StreamReader(response.GetResponseStream())) {
            output = reader.ReadToEnd();
        }
    }
}
finally {
    // The original snippet ended after the try block; a finally that
    // closes the request stream makes it compile and avoids a leak.
    if (reqStream != null) reqStream.Close();
}
This exception can be misleading.
I got the same exception in a scenario where a C# HttpClient talks to an ASP.NET MVC 5 app on IIS 7. After a while I found out that the server was stuck in a recursion loop without throwing a StackOverflowException or writing anything to a log. The result was that the server responded by disconnecting the client. Maybe this helps someone running into this kind of error.
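For completeness, here is a minimal sketch applying the three workarounds from the question together. All of these are standard HttpWebRequest/ServicePoint properties available on .NET 3.5; as the question shows, they are not a guaranteed fix:
var request = (HttpWebRequest)WebRequest.Create(completeUrl);
request.KeepAlive = false;                       // close the connection after each request
request.ProtocolVersion = HttpVersion.Version10; // fall back to HTTP/1.0
ServicePoint sp = ServicePointManager.FindServicePoint(request.RequestUri);
sp.Expect100Continue = false;                    // don't send an Expect: 100-continue header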

Reading information from a website c#

In the project I have in mind I want to be able to look at a website, retrieve text from that website, and do something with that information later.
My question is: what is the best way to retrieve the data (text) from the website? I am unsure how to do this when dealing with a static page vs. a dynamic page.
From some searching I found this:
WebRequest request = WebRequest.Create("http://anysite.com"); // the URL must include a scheme
// If required by the server, set the credentials.
request.Credentials = CredentialCache.DefaultCredentials;
// Get the response.
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
// Display the status.
Console.WriteLine(response.StatusDescription);
Console.WriteLine();
// Get the stream containing content returned by the server.
using (Stream dataStream = response.GetResponseStream())
{
    // Open the stream using a StreamReader for easy access.
    StreamReader reader = new StreamReader(dataStream, Encoding.UTF8);
    // Read the content.
    string responseString = reader.ReadToEnd();
    // Display the content.
    Console.WriteLine(responseString);
    reader.Close();
}
response.Close();
From running this on my own I can see it returns the HTML code of a website, which is not exactly what I'm looking for. I eventually want to be able to type in a site (such as a news article) and get back the contents of the article. Is this possible in C# or Java?
Thanks
I hate to break this to you, but that's how webpages look: a long stream of HTML markup/content. This gets rendered by the browser into what you see on your screen. The only way I can think of is to parse the HTML yourself.
After a quick search on Google I found this Stack Overflow question:
What is the best way to parse html in C#?
I'm betting you figured this would be a bit easier than it turns out to be, but that's the fun in programming: always challenging problems.
You can just use a WebClient:
using (var webClient = new WebClient())
{
    string htmlFromPage = webClient.DownloadString("http://myurl.com");
}
In the above example htmlFromPage will contain the HTML which you can then parse to find the data you're looking for.
What you are describing is called web scraping, and there are plenty of libraries that do just that for both Java and C#. It doesn't really matter whether the target site is static or dynamic, since both output HTML in the end. JavaScript- or Flash-heavy sites, on the other hand, tend to be problematic.
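As a rough illustration, here is what that looks like with HtmlAgilityPack, one popular C# scraping library. The URL and the XPath expression are placeholders:
// Requires the HtmlAgilityPack NuGet package.
var web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://myurl.com");

// Select every paragraph node and print its text content.
// SelectNodes returns null when nothing matches, hence the check.
var paragraphs = doc.DocumentNode.SelectNodes("//p");
if (paragraphs != null)
{
    foreach (var node in paragraphs)
        Console.WriteLine(node.InnerText);
}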
Please try this:
System.Net.WebClient wc = new System.Net.WebClient();
string webData = wc.DownloadString("http://anysite.com");

Get Size of Image File before downloading from web

I am downloading image files from the web using the following code in my console application:
WebClient client = new WebClient();
// DownloadFile(string address, string fileName)
client.DownloadFile(addressOfImageFile, fileName);
The code is running absolutely fine.
I want to know if there is a way I can get the size of this image file before I download it.
PS: I have actually written code for a crawler which moves around the site downloading image files, so I don't know the size beforehand. All I have is the complete path of the file, extracted from the source of the webpage.
Here is a simple example you can try. If you have files of different extensions (.GIF, .JPG, etc.) you can create a variable or wrap the code in a switch statement.
System.Net.WebClient client = new System.Net.WebClient();
client.OpenRead("http://someURL.com/Images/MyImage.jpg");
Int64 bytes_total = Convert.ToInt64(client.ResponseHeaders["Content-Length"]);
MessageBox.Show(bytes_total.ToString() + " Bytes");
If the web server gives you a Content-Length HTTP header, then it will be the image file size. However, if the server streams the data to you (using chunked encoding), you won't know the size until the whole file is downloaded.
You can use this code:
using System.Net;
public long GetFileSize(string url)
{
    long result = 0;
    WebRequest req = WebRequest.Create(url);
    req.Method = "HEAD";
    using (WebResponse resp = req.GetResponse())
    {
        if (long.TryParse(resp.Headers.Get("Content-Length"), out long contentLength))
        {
            result = contentLength;
        }
    }
    return result;
}
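Usage is then straightforward; the URL here is a placeholder:
// Hypothetical usage of the GetFileSize helper above.
long size = GetFileSize("http://someURL.com/Images/MyImage.jpg");
Console.WriteLine(size + " bytes");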
You can use an HttpWebRequest to issue a HEAD request for the file and check the Content-Length header in the response.
You should look at this answer: C# Get http:/…/File Size, where your question is fully explained. It uses a HEAD HTTP request to retrieve the file size, but you can also read the "Content-Length" header during a GET request before reading the response stream.
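A minimal sketch of that second approach, checking the header on a GET before reading the body. The URL is a placeholder; ContentLength is -1 when the server sends no Content-Length header (for example with chunked transfer encoding):
var request = (HttpWebRequest)WebRequest.Create("http://someURL.com/Images/MyImage.jpg");
using (var response = (HttpWebResponse)request.GetResponse())
{
    // -1 means the header is missing (e.g. chunked transfer encoding).
    if (response.ContentLength >= 0)
        Console.WriteLine("Size: " + response.ContentLength + " bytes");

    // ...decide here whether to read response.GetResponseStream()...
}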

"An error occurred while parsing EntityName" while Loading an XmlDocument

I have written some code to parse RSS feeds for an ASP.NET C# application, and it works fine for all the RSS feeds I have tried, until I tried Facebook.
My code fails at the last line below...
WebRequest request = WebRequest.Create(url);
WebResponse response = request.GetResponse();
Stream rss = response.GetResponseStream();
XmlDocument xml = new XmlDocument();
xml.Load(rss);
...with the error "An error occurred while parsing EntityName. Line 12, position 53."
It is hard to work out what is at that position of the XML file, as the entire file is on one line, but it comes straight from Facebook and all characters appear to be encoded properly, except possibly one character (♥).
I don't particularly want to rewrite my RSS parser to use a different method. Any suggestions for how to bypass this error? Is there a way of turning off checking of the file?
Look at the downloaded stream. It doesn't contain the RSS feed, but an HTML page with a message about an incompatible browser. That's because when you download the URL like this, the User-Agent header is not set. If you set it, your code should work:
var request = (HttpWebRequest)WebRequest.Create(url);
request.UserAgent = "MyApplication";
var xml = new XmlDocument();
using (var response = request.GetResponse())
using (var rss = response.GetResponseStream())
{
    xml.Load(rss);
}
