Reading HTML file into textbox translates apostrophes and bullets into? - c#

I am using a StreamReader (in C#) to read the contents of an HTML file into a textbox. No matter which encoding I use as an option, all of the apostrophes and bullets get changed into question marks.
Is there another way to read an HTML file that will preserve these characters?
Thanks!
Jerry
Here is the code:
private void button1_Click(object sender, EventArgs e)
{
DialogResult result = openFileDialog1.ShowDialog();
if (result == DialogResult.Cancel)
return;
StreamReader sr = new StreamReader(openFileDialog1.FileName);
txtMessage.Text = sr.ReadToEnd();
sr.Close();
}
I have also used the StreamReader with the Encoding parameter (tried every one). The only thing it seems to change is whether the question marks appear as regular ones or as reversed ones (a black diamond with a white question mark).
If it makes any difference, the files are created in Word by another department and then exported to Filtered HTML.
One last thing: If I open the HTML file in something like Notepad and copy/paste the text into the textbox, then everything looks exactly as it should.
The changes only occur if I try to pull the file in via a reader.

I would try it with new StreamReader(..., Encoding.UTF8); or new StreamReader(..., Encoding.GetEncoding("iso-8859-1")); and if that doesn't work, then I'd go after the person who created the file and stuff needles under their fingernails until they confess what encoding they used to create it.
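As a hedged sketch of that advice: Word's "Filtered HTML" export is often saved as Windows-1252, where the curly apostrophe and the bullet are single bytes (0x92 and 0x95) that are not valid UTF-8 and that iso-8859-1 maps to control characters. The helper below is hypothetical (the name ReadHtml is not from the question). It tries strict UTF-8 first and falls back to code page 1252, which is an assumption about the file, not something the question confirms.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingSketch
{
    // Hypothetical helper: decode as strict UTF-8, otherwise assume Windows-1252.
    public static string ReadHtml(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        try
        {
            // throwOnInvalidBytes: true makes malformed UTF-8 throw instead of
            // silently producing replacement characters.
            string text = new UTF8Encoding(false, true).GetString(bytes);
            // GetString keeps a leading byte order mark as U+FEFF, so drop it.
            return text.TrimStart('\uFEFF');
        }
        catch (DecoderFallbackException)
        {
            // On .NET Core / .NET 5+ code page 1252 must be registered first.
            // On .NET Framework 1252 is built in and this line can be removed.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            return Encoding.GetEncoding(1252).GetString(bytes);
        }
    }
}
```

With this in place, the click handler from the question could assign txtMessage.Text = EncodingSketch.ReadHtml(openFileDialog1.FileName);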

Related

Saving source code formatting to TXT file

I am trying to extract the source code from a webpage and save it to a text file. However, I want to keep the formatting of the source code.
My code is below.
// this block fetches the source code from the URL entered.
private void buttonFetch_Click(object sender, EventArgs e)
{
using (WebClient webClient = new WebClient())
{
string s = webClient.DownloadString("http://www.ebay.com");
Clipboard.SetText(s, TextDataFormat.Text);
string[] lines = { s };
System.IO.File.WriteAllLines(@"C:\Users\user\Dropbox\Personal Projects\WriteLines.txt", lines);
MessageBox.Show(s.ToString(), "Source code",
MessageBoxButtons.OKCancel, MessageBoxIcon.Asterisk);
}
}
I would like the text file to show the source code as it is formatted in the Messagebox.
Messagebox screenshot:
Text file screenshot:
How would I go about getting the text document's formatting to be the same as in the Messagebox?
I agree with the comment, but I'll add just a note. If you open it in Notepad++, N++ will detect the line endings and display the file nicely for you. In Notepad++ you can go into the menu and change the line endings to Windows. If you then re-save it and open it in Notepad itself, it will look correct. The problem is that the base Notepad doesn't understand different line endings.
Hope it helps.
The problem is that the string you're downloading has LF-only line endings. The Windows standard is CRLF line endings. Windows Notepad is notoriously adamant about supporting only CRLF line endings. Other editors, including Visual Studio, correctly handle the LF-only versions.
You can convert the text to CRLF line endings easily enough:
string s = webClient.DownloadString("http://www.ebay.com");
string fixedString = s.Replace("\n", "\r\n");
System.IO.File.WriteAllText("filename", fixedString);
MessageBox.Show(fixedString, "Source code",
MessageBoxButtons.OKCancel, MessageBoxIcon.Asterisk);
Note also that it is not necessary to call ToString on a string.
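One caveat about the Replace call above: if the downloaded text already contains some CRLF pairs, replacing every "\n" turns them into "\r\r\n". A minimal sketch of a more defensive normalization follows; the class and method names are made up for illustration.

```csharp
using System.Text.RegularExpressions;

static class LineEndings
{
    // Converts CRLF, lone CR, and lone LF to CRLF.
    // "\r\n" is listed first in the pattern so an existing CRLF pair is
    // matched as one unit and is not doubled.
    public static string ToCrLf(string s)
    {
        return Regex.Replace(s, @"\r\n|\r|\n", "\r\n");
    }
}
```

It would be used in place of the Replace call: string fixedString = LineEndings.ToCrLf(s);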
Try this:
string[] lines = s.Split('\n');
System.IO.File.WriteAllLines(@"C:\Users\user\Dropbox\Personal Projects\WriteLines.txt", lines);

How to Open Texts with Format and Color

I am sorry if this question is not well structured, but it has been puzzling me for a while now.
I know how to Read a Text file from the Open file Dialog into a Rich Text box using
DialogResult DR = openFileDialog1.ShowDialog();
if (DR == DialogResult.OK)
{
string txt = openFileDialog1.SafeFileName;
FileStream textFile = new FileStream(openFileDialog1.FileName, FileMode.Open, FileAccess.Read);
StreamReader doc = new StreamReader(textFile);
GetCurrentTextbox().Text = doc.ReadToEnd();
tabControl1.SelectedTab.Text = txt;
}
This works perfectly for Ordinary Text files but the Problem is that if this File was created using Wordpad or MsWord, it shows something like
Please, what can I do?
This works perfectly for Ordinary Text files but the Problem is that if this File was created using Wordpad or MsWord, it shows something like
Yes, because those aren't text files - but you're trying to read them as text files.
If you need to read a Word/Wordpad document, you'll either need to use Office Interop, or possibly a third party library which understands the file format. Either way, you won't be able to just set the Text property of a control to anything to get formatted text. You might be able to convert it to RTF and then use a RichTextBox.

C#: Compare text file contents to a string variable

I have an application that dumps text to a text file. I think there might be an issue with the text not containing the proper carriage returns, so I'm in the process of writing a test that will compare the contents of this file to a string variable that I declare in the code.
Ex:
1) Code creates a text file that contains the text:
This is line 1
This is line 2
This is line 3
2) I have the following string that I want to compare it to:
string testString = "This is line 1\nThis is line 2\nThis is line 3";
I understand that I could open a file stream reader, read the text file line by line, and store that in a mutable string variable while appending "\n" after each line, but I'm wondering whether this is re-inventing the wheel (in other words, whether .NET has a built-in class for something like this). Thanks in advance.
You can either use StreamReader's ReadToEnd() method to read the contents into a single string, like this:
using System.IO;
using(StreamReader streamReader = new StreamReader(filePath))
{
string text = streamReader.ReadToEnd();
}
Note: you have to make sure that you release the resources (the code above uses "using" to do that). ReadToEnd() assumes that the stream knows when it has reached its end. For interactive protocols in which the server sends data only when you ask for it and does not close the connection, ReadToEnd might block indefinitely because it never reaches an end, so it should be avoided there. Also make sure that the current position in the stream is at the start before reading.
You can also use ReadAllText like
// Open the file to read from.
string readText = File.ReadAllText(path);
which is simpler: it opens the file, reads all of its text, and takes care of closing it as well.
No, there is nothing built in for this. The easiest way, assuming that your file is small, is to just read the whole thing and compare them:
var fileContents = File.ReadAllText(fileName);
return testString == fileContents;
If the file is fairly long, you may want to compare the file line by line, since finding a difference early on would allow you to reduce IO.
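A rough sketch of that line-by-line comparison is below; the class and method names are hypothetical. File.ReadLines reads lazily and SequenceEqual stops at the first mismatch, so a difference near the top avoids reading the rest of the file. Note that comparing by lines deliberately ignores which line terminator the file uses, so it would not detect the missing carriage returns the question is worried about; for that, the whole-string comparison above is the better fit.

```csharp
using System;
using System.IO;
using System.Linq;

static class FileCompareSketch
{
    // Returns true when the file's lines equal the lines of 'expected'.
    // Line terminators (CRLF vs LF) are NOT compared.
    public static bool LinesMatch(string path, string expected)
    {
        string[] expectedLines =
            expected.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        // ReadLines is lazy; SequenceEqual stops at the first difference.
        return File.ReadLines(path).SequenceEqual(expectedLines);
    }
}
```

A trailing newline in the expected string produces an extra empty entry after the split, while File.ReadLines does not report one, so trim the expected text first if that matters.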
A faster way to implement reading all the text in a file is
System.IO.File.ReadAllText()
but there's no shorter way to do the string-level comparison:
if (System.IO.File.ReadAllText(filename) == "This is line 1\nThis is line 2\nThis is line 3") {
// it matches
}
This should work:
StreamReader streamReader = new StreamReader(filePath);
string originalString = streamReader.ReadToEnd();
streamReader.Close();
I don't think there is a quicker way of doing it in C#.
You can read the entire file into a string variable this way:
FileStream stream;
StreamReader reader;
stream = new FileStream(yourFileName, FileMode.Open, FileAccess.Read, FileShare.Read);
reader = new StreamReader(stream);
string stringContainingFilesContent = reader.ReadToEnd();
reader.Close(); // also closes the underlying FileStream
// and check for your condition
if (testString.Equals(stringContainingFilesContent, StringComparison.OrdinalIgnoreCase))
{
// the contents match (ignoring case)
}

StreamReader is unable to correctly read extended character set (UTF8)

I am having an issue where I am unable to read a file that contains foreign characters. The file, I have been told, is encoded in UTF-8 format.
Here is the core of my code:
using (FileStream fileStream = fileInfo.OpenRead())
{
using (StreamReader reader = new StreamReader(fileStream, System.Text.Encoding.UTF8))
{
string line;
while (!string.IsNullOrEmpty(line = reader.ReadLine()))
{
hashSet.Add(line);
}
}
}
The file contains the word "achôcre" but when examining it during debugging it is adding it as "ach�cre".
(This is a profanity file so I apologize if you speak French. I for one, have no idea what that means)
The evidence clearly suggests that the file is not in UTF-8 format. Try System.Text.Encoding.Default and see if you get the correct text then — if you do, you know the file is in Windows-1252 (assuming that is your system default codepage). In that case, I recommend that you open the file in Notepad, then re-“Save As” it as UTF-8, and then you can use Encoding.UTF8 normally.
Another way to check what encoding the file is actually in is to open it in your browser. If the accents display correctly, then the browser has detected the correct character set — so look at the “View / Character set” menu to find out which one is selected. If the accents are not displaying correctly, then change the character set via that menu until they do.

Accessing the content of the file

//Introduction
Hey, Welcome.....
This is the tutorial
//EndIntro
//Help1
Select a Stock
To use this software you first need to select the stock. To do that, simply enter the stock symbol in the stock text-box (such as "MSFT").
To continue enter "MSFT" in the stock symbol box.
//EndHelp1
//Help2
Download Stock Data
Next step is to download the stock data from the online servers. To start the process simply press the "Update" button or hit the <ENTER> key.
After stock data is downloaded the "Refresh" button will appear instead. Press it when you want to refresh the data with the latest quote.
To continue make sure you are online and press the "Update" button
//EndHelp2
The first time, I want to display the content between //Introduction and //EndIntro; the second time, the content between //Help1 and //EndHelp1; and so on.
That's a very open-ended question - what sort of file? To read binary data from it you'd usually use:
using (Stream stream = File.OpenRead(filename))
{
// Read from the stream here
}
or
byte[] data = File.ReadAllBytes(filename);
To read text you could use any of:
using (TextReader reader = File.OpenText(filename))
{
// Read from the reader
}
or
string text = File.ReadAllText(filename);
or
string[] lines = File.ReadAllLines(filename);
If you could give more details about the kind of file you want to read, we could help you with more specific advice.
EDIT: To display content from an RTF file, I suggest you load it as text (but be careful of the encoding - I don't know what encoding RTF files use) and then display it in a RichTextBox control by setting the Rtf property. Make the control read-only to avoid the user editing the control (although if the user does edit the control, that wouldn't alter the file anyway).
If you only want to display part of the file, I suggest you load the file, find the relevant bit of text, and use it appropriately with the Rtf property. If you load the whole file as a single string you can use IndexOf and Substring to find the relevant start/end markers and take the substring between them; if you read the file as multiple lines you can look for the individual lines as start/end markers and then concatenate the content between them.
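The IndexOf/Substring approach could look roughly like this. It is only a sketch: ExtractSection is a made-up helper, and it assumes each marker appears once and in order, as in the sample file from the question.

```csharp
using System;

static class SectionSketch
{
    // Returns the text between startMarker and endMarker,
    // or null if either marker is missing.
    public static string ExtractSection(string text, string startMarker, string endMarker)
    {
        int start = text.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0) return null;
        start += startMarker.Length; // skip past the start marker itself

        // Search for the end marker only after the start marker.
        int end = text.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end < 0) return null;

        return text.Substring(start, end - start).Trim();
    }
}
```

For example, SectionSketch.ExtractSection(File.ReadAllText(filename), "//Help1", "//EndHelp1") would return the first help section, which can then be assigned to the control's Text property.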
(I also suggest that next time you ask a question, you include this sort of detail to start with rather than us having to tease it out of you.)
EDIT: As Mark pointed out in a comment, RTF files should have a header section. What you've shown isn't really an RTF file in the first place - it's just plain text. If you really want RTF, you could have a header section and then the individual sections. A better alternative would probably be to have separate files for each section - it would be cleaner that way.
Not sure I understand your question correctly. But you can read and write content using System.IO.StreamReader and StreamWriter classes
string content = string.Empty;
using (StreamReader sr = new StreamReader("C:\\sample.txt"))
{
content = sr.ReadToEnd();
}
using (StreamWriter sw = new StreamWriter("C:\\Sample1.txt"))
{
sw.Write(content);
}
Your question needs more clarification. Look at System.IO.File for many ways to read data.
The easiest way of reading a text file is probably this:
string[] lines = File.ReadAllLines("filename.txt");
Note that this automatically handles closing the file, so no using statement is needed.
If the file is large or you don't need all lines, you might prefer to read the text file in a streaming manner:
using (StreamReader streamReader = File.OpenText(path))
{
while (true)
{
string line = streamReader.ReadLine();
if (line == null)
{
break;
}
// Do something with line...
}
}
If the file contains XML data you might want to open it using an XML parser:
XDocument doc = XDocument.Load("input.xml");
var nodes = doc.Descendants();
There are many, many other ways to read data from a file. Could you be more specific about what the file contains and what information you need to read?
Update: To read an RTF file and display it:
richTextBox.Rtf = File.ReadAllText("input.rtf");
