C#: getting site's encoding for WebClient beforehand - c#

I'm downloading and parsing a lot of XML files from the Internet. They all have different encodings, which are declared on the first line:
<?xml version="1.0" encoding="windows-1251"?>
<?xml version="1.0" encoding="UTF-8"?>
and so on...
I need to set the correct WebClient.Encoding property in order to receive the text in the correct encoding. But I can't do that without pre-downloading the file and reading the first line.
Is there a way to find out the encoding beforehand?
Thank you

You don't need to set anything or handle the encoding at all. Just get the binary data and let the XML parser handle it: the parser reads the encoding from the XML declaration itself. Or, if you're going to store the files on disk, just dump the binary data straight onto disk.

Simply use this and it should handle everything on its own:
HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(url);
XDocument doc;
// Dispose the response once the document has been loaded
using (HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse())
    doc = XDocument.Load(myHttpWebResponse.GetResponseStream());
http://msdn.microsoft.com/en-us/library/system.xml.linq.xdocument.aspx
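To illustrate the point made in both answers, here is a minimal sketch of handing raw bytes to the parser so that it picks up the encoding from the XML declaration. The class and method names (XmlEncodingDemo, ParseRaw, Demo) are made up for this example, and the bytes are built in memory rather than downloaded; in the question's scenario they could come from WebClient.DownloadData instead.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class XmlEncodingDemo
{
    // Parses raw bytes. No Encoding is passed in: the XML parser
    // reads it from the declaration on the first line.
    public static XDocument ParseRaw(byte[] rawBytes)
    {
        using (var stream = new MemoryStream(rawBytes))
        {
            return XDocument.Load(stream);
        }
    }

    // Builds an ISO-8859-1 encoded document in memory and parses it.
    public static string Demo()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string xml = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><name>caf\u00e9</name>";
        byte[] rawBytes = latin1.GetBytes(xml);
        return ParseRaw(rawBytes).Root.Value;
    }
}
```

The same ParseRaw call should work unchanged for a UTF-8 document, which is why no per-file encoding setting is needed.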

Related

Keep special characters in File When i save it

I have a file that contains some HTML code. I am trying to load this data into a C# console app and transfer it into a JSON file to upload somewhere. When loading the file I am losing some of the special characters as soon as the data is read in.
Example data
<li>Comfort Range: -60°F to 30°F / -50°C to -1°C</li>
Basic read file
//Load the file
String HTML_File = File.ReadAllText(location);
//Output the file to see the text
Console.WriteLine(HTML_File);
Console Output
<li>Comfort Range: -60??F to 30?F / -50?C to -1?C</li>
After I split the data how I need to, I then save the class to a JSON file
File.WriteAllText(OutputPath,JsonConvert.SerializeObject(HTMLDATA));
JSON file Data
<li>Comfort Range: -60�F to 30�F / -50�C to -1�C</li>
How can I go about loading this data and converting it to JSON without losing the encoding? I am still pretty new when it comes to encoding like this.
@JeremyLakeman helped me solve this, thank you sir!! When reading the text into the utility I needed to set the Encoding explicitly rather than rely on the default one.
File.WriteAllText(OutputPath,JsonConvert.SerializeObject(HTMLDATA), Encoding.GetEncoding("iso-8859-1"));
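The answer talks about setting the encoding when reading, while the line shown sets it when writing. As a rough sketch of the reading side, assuming the input file really is ISO-8859-1 (the encoding named in the answer), something like the following could be used. EncodingReadDemo, ReadLatin1 and RoundTrip are hypothetical names, and the temp file only exists to make the example self-contained.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingReadDemo
{
    // Reads a text file as ISO-8859-1 instead of the default (UTF-8).
    public static string ReadLatin1(string path)
    {
        return File.ReadAllText(path, Encoding.GetEncoding("iso-8859-1"));
    }

    // Writes the bytes for "60°F" in ISO-8859-1 to a temp file and reads them back.
    public static string RoundTrip()
    {
        string path = Path.GetTempFileName();
        try
        {
            // 0xB0 is the degree sign in ISO-8859-1; on its own it is not valid UTF-8
            File.WriteAllBytes(path, new byte[] { 0x36, 0x30, 0xB0, 0x46 });
            return ReadLatin1(path);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Once the string is decoded correctly, File.WriteAllText without an encoding argument writes UTF-8, which is usually what JSON consumers expect.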

c# File in xml format not opening in xml viewer

I created an .xml file in a C# console program using StreamWriter (without using any XML writer library functions). However, it does not show data in an XML viewer, although it shows fine if opened as a text file.
I tried, like I saw somewhere on this site, the following -
FileStream fStream = new FileStream(@"c:\new.xml", FileMode.Create);
StreamWriter fWrite = new StreamWriter(fStream, Encoding.UTF8);
fWrite.WriteLine(myLine);
where the first myLine was
<?xml version="1.0" encoding="UTF-8"?>
Is there a way to make this open like an xml file without having to use the xml lib functions?
Here's some more info -
Contents of the file I wrote, as it opens in Notepad :
(OK, the contents are like below, but formatting isn't - the CTRL K that I was instructed to do here did the formatting!)
<?xml version="1.0" encoding="UTF-8"?>
<OutermostTag>
<RepetitiveInnerTag Action="AddSomething">
<ID1>12345<ID1>
<Level1>Leveldata1<Level1>
<DisplayName>Name to Display<DisplayName>
<Description>Describe it all here<Description>
<SortOrder>ASC<SortOrder>
<ID2>C3<ID2>
<Level2>Data<Level2>
</RepetitiveInnerTag>
</OutermostTag>
When opened as XML, only the first inner tag's (viz.,) data is displayed, space-delimited, as follows:
12345 Leveldata1 Name to Display Describe it all here ASC C3 Data
And the output display is the same whether I use the Encoding.UTF8 property or not.
By "open like an xml" I mean, in addition to displaying the entire data in the file, also make the tags collapsible (the color and all that format-related stuff that (presumably) the browser (IE) puts in)
Have you tried using Flush? Try putting it after fWrite.WriteLine:
fWrite.Flush();
OK guys, I found the blunder: I used the opening element tags to close the elements as well, instead of proper closing tags (oops!). Thank you all for your time (and apologies for wasting it). I'm a C/C++ programmer on my first C# trial project and didn't want the complications of using the XML writer libs; I'm delighted that it still works doing the lib's work myself, C-style (contrary to my boss's insistence that it won't :)). I'll be more careful next time I post.
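The mistake described above (closing an element by repeating its opening tag) can be caught early with a well-formedness check. The following is only a sketch: HandWrittenXmlDemo, Element and IsWellFormed are invented names, and the check uses XDocument.Parse purely for verification, while the XML itself is still assembled by hand as in the question.

```csharp
using System;
using System.Security;
using System.Xml;
using System.Xml.Linq;

public static class HandWrittenXmlDemo
{
    // Builds <name>value</name> by hand; note the slash in the closing tag.
    public static string Element(string name, string value)
    {
        // SecurityElement.Escape replaces <, >, &, " and ' in the value
        return "<" + name + ">" + SecurityElement.Escape(value) + "</" + name + ">";
    }

    // Returns true when the text parses as well-formed XML.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

For example, wrapping Element("ID1", "12345") in an OutermostTag element passes the check, while the original form with ID1 opened twice and never closed does not.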

How to read only a small part of a .XML

I built an application to read a file, but even though my connection is fast, the page takes several seconds to load. I would like to know how to read only the first records of this .xml
string rssURL = "http://www.cnt.org.br/Paginas/feed.aspx?t=n";
System.Net.WebRequest myRequest = System.Net.WebRequest.Create(rssURL);
System.Net.WebResponse myResponse = myRequest.GetResponse();
System.IO.Stream rssStream = myResponse.GetResponseStream();
System.Xml.XmlDocument rssDoc = new System.Xml.XmlDocument();
rssDoc.Load(rssStream);
System.Xml.XmlNodeList rssItems = rssDoc.SelectNodes("rss/channel/item");
Tks..
As the previous posters mention, you can't download part of a web request. But you can start parsing the XML before the request has finished. XmlDocument is the wrong approach for your use case, because it needs the complete response before it can build the object. Try using XmlTextReader.
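A minimal sketch of the streaming approach follows. It uses XmlReader.Create rather than the XmlTextReader named in the answer (Microsoft's documentation recommends the factory method for new code), and it assumes each item element has a title child, as in a typical RSS feed. FeedHeadReader and ReadFirstTitles are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class FeedHeadReader
{
    // Reads the <title> of at most maxItems <item> elements, then stops
    // without parsing the rest of the stream.
    public static List<string> ReadFirstTitles(Stream rssStream, int maxItems)
    {
        var titles = new List<string>();
        using (XmlReader reader = XmlReader.Create(rssStream))
        {
            // ReadToFollowing skips forward to the next <item> start tag
            while (titles.Count < maxItems && reader.ReadToFollowing("item"))
            {
                // Only look for <title> inside the current <item>
                if (reader.ReadToDescendant("title"))
                {
                    titles.Add(reader.ReadElementContentAsString());
                }
            }
        }
        return titles;
    }
}
```

In the question's code this could be called with rssStream in place of rssDoc.Load(rssStream). How much download time it saves depends on how the server and the underlying connection buffer the response.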
There is no easy way to download part of a web request and ensure it is what you want. One workaround would be to use the Google Feed API.
You'd have to use the JSON interface since they don't provide a library for C#, but since it's going through Google's servers it will be much faster. You'd have to modify your code a little bit, since it returns JSON by default instead of XML, but that is a trivial change to make. You can also change the parameter output=xml to retrieve the XML representation of the data.
Try going to this page, that is your same feed, with fewer elements and loads much faster. That only returns a few elements, but if you want 10 elements, all you have to do is add num=10 to the URL. For example, this url has 10 elements. Read the API documentation a little more to see what variables you can add to cater the request to what you want to do.

C# display XML from html POST

Got a problem here... If I put the XML file on the server, then I can read it through StreamReader, convert it to a variable and get everything working in the MSSQL database.
However, I am required to send it through an HTML POST, and that doesn't work with the code below:
page.Response.ContentType = "text/xml";
StreamReader reader = new StreamReader(page.Request.InputStream);
inputString = reader.ReadToEnd();
deleteShip(inputString);
It seems to me that the above code doesn't get the XML that is POSTed from my program, because the same code in deleteShip works fine if I use an XML file on the server.
Is there a way to solve this problem? As long as I can send any string to deleteShip(string s) then I'm happy. The string will be in XML format though
Thanks for the help!
It would be useful to see how the XML is POSTed to your program. Typically, data is sent from an HTML form as name-value pairs in the HTTP request body when using the POST method. It's not clear from your question whether you're using an HTML form to POST the XML to your program and it's hard to tell what might be going wrong without more information.
From your code it looks like you're reading the entire HTTP request body, where you'd usually read the value of a request parameter, for example:
Request["XmlParameterName"]
Where XmlParameterName is the name of an HTML form input field.
Have you inspected the value of the inputString variable? Is it valid XML? Is it encoded correctly? Are any invalid characters like ampersands (&) escaped correctly?
Update your question with a bit more information if none of the things I mentioned are the problem.
OK, I got it fixed.
Here is the code.
System.IO.Stream stream;
string inputString;
Int32 stringLength;
stream = Request.InputStream;
stringLength = Convert.ToInt32(stream.Length);
byte[] stringArray = new byte[stringLength];
// Copy the POST body into the buffer; without this read the buffer stays all zeros
stream.Read(stringArray, 0, stringLength);
// Note: ASCII decoding turns any non-ASCII character into '?'
inputString = System.Text.Encoding.ASCII.GetString(stringArray, 0, stringLength);
deleteShip(inputString);
This reads the POST body of my HTML request (which in this case is XML).
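As a possible alternative to the manual byte buffer, here is a sketch that reads the request body through a StreamReader with an explicit encoding. It assumes the client sends UTF-8, which the original post does not state; RequestBodyReader and ReadBody are hypothetical names. The stream is rewound first in case something earlier in the pipeline has already read it.

```csharp
using System;
using System.IO;
using System.Text;

public static class RequestBodyReader
{
    // Reads the whole stream as text using the given encoding.
    public static string ReadBody(Stream input, Encoding encoding)
    {
        // Rewind if possible, in case the stream was already consumed
        if (input.CanSeek)
        {
            input.Position = 0;
        }
        // leaveOpen: true so the caller's stream is not disposed with the reader
        using (var reader = new StreamReader(input, encoding, true, 1024, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In the page code this might be called as RequestBodyReader.ReadBody(Request.InputStream, Encoding.UTF8) before passing the result to deleteShip.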

Replace a word in an XML file through StreamReader in XNA?

Okay, so this is sort of a hack...but it may have to be. I'm writing an app in XNA, which from my research into this problem apparently doesn't support XML version 1.1. I'm reading in the contents of an ePub document, and one of the newer books encodes its content as a version 1.1 XML document. This causes my program to crash, however, the structure is the same as the rest. The only thing that is keeping it from working is the hard-coded "1.0" in the XmlDocument class.
Is it possible that I could read in the file from the stream, see if it contains:
<?xml version="1.1" encoding="UTF-8" standalone="no"?>
and simply replace it with "1.0"? Then I could pull it in as an XmlDocument. I'm not doing any writing to the file, or any complex structural reading, just looking for a few specific nodes, and pulling in the values, so I don't know what the ramifications of this would be.
You can do this in a very dodgy way by reading the entire XML file into memory and having your way with it:
string content = "";
// Read the XML file into content
StreamReader reader = new StreamReader("file.xml");
content = reader.ReadToEnd();
reader.Close();
// Find the character position just after the <?xml token, and just before the ?> token
int openIndex = content.IndexOf("<?xml", StringComparison.OrdinalIgnoreCase) + 5;
int closeIndex = content.IndexOf("?>", openIndex);
// Get the bits between <?xml and ?>
string header = content.Substring(openIndex, closeIndex - openIndex);
// Substitute version string.
header = header.Replace("version=\"1.1\"", "version=\"1.0\"");
// Put Humpty Dumpty back together again.
content = string.Concat(content.Substring(0, openIndex), header, content.Substring(closeIndex));
// Feed content into an XMLReader (or equivalent) here.
It works for the example string you provide, but I haven't tested it on imperfectly-formatted XML documents.
