Apostrophe character in an html email not showing - c#

I am using System.Net.Mail and I am reading HTML into the body of the email.
Unfortunately the apostrophe character ' is shown as a question mark with a black background.
I have tried replacing the apostrophe with its HTML entity, but this still displays the question mark with a black background. Other HTML tags (h1, p, etc.) work fine.
I know there must be a really obvious answer, but I cannot seem to find it. Thanks for your help.
UPDATE
It appears that it is System.IO.StreamReader that is causing my problem.
using (StreamReader reader = new StreamReader("/Email/Welcome.htm"))
{
body = reader.ReadToEnd();
//body string now has odd question mark character instead of apostrophe.
}

If you know the encoding of your file you will want to pass that to your StreamReader initialization:
using (StreamReader reader = new StreamReader("/Email/Welcome.htm", Encoding.GetEncoding("Windows-1252")))
{
body = reader.ReadToEnd();
// If the encoding is correct you'll now see ´ rather than �
// (� is U+FFFD, the Unicode replacement character.)
// See: http://www.fileformat.info/info/unicode/char/fffd/index.htm
}

Alternatively, save the file as UTF-8. StreamReader decodes as UTF-8 by default, so the apostrophe will then be read correctly without passing an encoding.
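The difference between the two readings can be reproduced without a file. The following is a minimal sketch, assuming the apostrophe in the file is the Windows-1252 "smart" apostrophe (byte 0x92) and assuming a .NET Core / .NET 5+ target, where code-page encodings must be registered first. The class and member names are hypothetical.

```csharp
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // "It’s" as it would be stored in a Windows-1252 file (assumed sample data).
    public static readonly byte[] Sample = { 0x49, 0x74, 0x92, 0x73 };

    // Decodes raw bytes through a StreamReader using the given encoding.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Returns the Windows-1252 encoding. The RegisterProvider call is needed
    // on .NET Core / .NET 5+ (assumption: that is the target runtime).
    public static Encoding Windows1252()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }
}
```

Calling `EncodingDemo.Decode(EncodingDemo.Sample, Encoding.UTF8)` yields a string containing U+FFFD, because 0x92 is not valid UTF-8 on its own, while decoding with `EncodingDemo.Windows1252()` yields the apostrophe U+2019.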

Related

C# Parse Memory Stream text from RichTextBox with special characters

I need your help to find the best/fastest way to parse (regular expression) text in a RichTextbox.
I have already tried several methods, and the fastest one, so far, seems to be saving the text into a MemoryStream and read it line by line while performing the validation.
I have no problem doing that, and it actually seems to work pretty well, except when I have special chars (Latin chars, to be more specific). Let's say, for example, that I have the name "João" (John in English, BTW): the text coming from the StreamReader appears as "Jo\'e3o", resulting in a failure to find the text.
I'm not sure if this is because of encoding. I have tried to set the Encoding to UTF8 when creating the StreamReader, but it doesn't work; I always see the text with those codes.
I am starting to think that my only option is to parse the text or lines from the RichTextbox obj, but it is sooooo much slower...
UPDATE
Adding some example code on how I'm reading the RichTextBox text.
(This seems to be the fastest way to read large amounts of text.)
var rtb = new RichTextBox();
var rtbMemStream = new MemoryStream();
rtb.SaveFile(rtbMemStream, RichTextBoxStreamType.RichText);
using (StreamReader sr = new StreamReader(rtbMemStream, Encoding.UTF8))
{
while (!sr.EndOfStream)
{
var streamLine = sr.ReadLine();
ParseLine(streamLine);
}
}
Any help or suggestions are appreciated.
Thank you in advance.

File ReadAllLines turns foreign language into gibberish (�)

I am creating a tool that replaces some text in a text file. My problem is that File.ReadAllLines turns the Hebrew characters into gibberish (weird question marks �).
Does anyone know why this is happening? Note that I do have a problem with Hebrew in games and such.. And in Notepad, I can't save Hebrew documents. I can write Hebrew letters but when I save it tells me there's a problem with that.
EDIT - Tried this, but it only turned the Hebrew into regular question marks and not "special" ones-
string[] lines = File.ReadAllLines(fullFilenameDir);
byte[] htmlBytes = Encoding.Convert(Encoding.ASCII, Encoding.Unicode, Encoding.ASCII.GetBytes(String.Join("\r\n", lines)));
char[] htmlChars = new char[Encoding.Unicode.GetCharCount(htmlBytes)];
Encoding.Unicode.GetChars(htmlBytes, 0, htmlBytes.Length, htmlChars, 0);
Try using the Windows-1255 code page to get the encoder.
var myLines = File.ReadAllLines(@"C:\MyFile.txt", Encoding.GetEncoding("Windows-1255"));
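As for why the attempt in the EDIT produced regular question marks: Encoding.ASCII cannot represent Hebrew letters, so GetBytes substitutes '?' for each one and the original characters are lost before any conversion happens. A minimal sketch of that effect, with hypothetical class and method names:

```csharp
using System.Text;

public static class AsciiLossDemo
{
    // "shalom" in Hebrew, written with escapes so the source file encoding does not matter.
    public const string Hebrew = "\u05E9\u05DC\u05D5\u05DD";

    // Encodes to ASCII and back; every non-ASCII character becomes '?'.
    public static string RoundTripThroughAscii(string text)
    {
        byte[] asciiBytes = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(asciiBytes);
    }

    // Encodes to UTF-8 and back; the text survives unchanged.
    public static string RoundTripThroughUtf8(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(utf8Bytes);
    }
}
```

The fix therefore has to happen at read time, by giving File.ReadAllLines the encoding the file was actually saved in, rather than by converting the strings afterwards.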

HttpWebResponse shows junk characters in Headers C#

Here is the code snippet that returns a Persian word in Response.AddHeaders
and here is a code snippet that gets the response via HttpWebResponse and shows junk characters for that Persian word
I'm curious why it returns these garbage characters, and how I can fix this issue. On my aspx page the junk text appears like this-
Please help as soon as possible...
Thanks
I found Unicode encoding useful.
pwd = HttpUtility.UrlEncodeUnicode(pwd); // encode string to unicode
pwd = HttpUtility.UrlDecode(pwd); // decode unicode to string
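The same idea can be sketched with the Uri class, which percent-encodes the UTF-8 bytes of a string so that only ASCII characters travel in the header. This is a sketch under the assumption that the junk characters come from non-ASCII header bytes being misread; the original snippets are not shown, and the class and method names here are hypothetical.

```csharp
using System;

public static class HeaderValueCodec
{
    // Server side: turn the text into an ASCII-only, percent-encoded value
    // before passing it to Response.AddHeader.
    public static string Encode(string value)
    {
        return Uri.EscapeDataString(value);
    }

    // Client side: restore the original text from the header value
    // read from HttpWebResponse.Headers.
    public static string Decode(string headerValue)
    {
        return Uri.UnescapeDataString(headerValue);
    }
}
```

Both sides must agree on the scheme: the server encodes before adding the header, and the client decodes after reading it.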

Xml exception due to leading unicode character in REST API response

When I try to parse a response from a certain REST API, I'm getting an XmlException saying "Data at the root level is invalid. Line 1, position 1." Looking at the XML it looks fine, but then examining the first character I see that it is actually a zero-width no-break space (character code 65279 or 0xFEFF).
Is there any good reason for that character to be there? Maybe I'm supposed to be setting a different Encoding when I make my request? Currently I'm using Encoding.UTF8.
I've thought about just removing the character from the string, or asking the developer of the REST API to fix it, but before I do either of those things I wanted to check if there is a valid reason for that character to be there. I'm no unicode expert. Is there something different I should be doing?
Edit: I suspected that it might be something like that (BOM). So, the question becomes, should I have to deal with this character specially? I've tried loading the XML two ways and both throw the same exception:
public static User GetUser()
{
WebClient req = new WebClient();
req.Encoding = Encoding.UTF8;
string response = req.DownloadString(url);
XmlSerializer ser = new XmlSerializer(typeof(User));
User user = ser.Deserialize(new StringReader(response)) as User;
XElement xUser = XElement.Parse(response);
...
return user;
}
U+FEFF is a byte order mark. It's there at the start of the document to indicate the character encoding (or rather, the byte order of an encoding which could be either way; most specifically UTF-16). It's entirely reasonable for it to be there at the start of an XML document. Its use as a zero-width non-breaking space is deprecated in favour of U+2060 instead.
It would be unreasonable if the byte-order mark was in a different encoding, e.g. if it were a UTF-8 BOM in a document which claimed to be UTF-16.
How are you loading the document? Perhaps you're specifying an inappropriate encoding somewhere? It's best to let the XML API detect the encoding if at all possible.
EDIT: After you've downloaded it as a string, I can imagine that could cause problems... given that it's used to detect the encoding, which you've already got. Don't download it as a string - download it as binary data (WebClient.DownloadData) and then you should be able to parse it okay, I believe. However, you probably still shouldn't use XElement.Parse as there may well be a document declaration - use XDocument.Parse. I'd be slightly surprised if the result of the call could be fed straight into XmlSerializer, but you can have a go... wrap it in a MemoryStream if necessary.
That is called a Byte Order Mark. It's not required in UTF-8 though.
Instead of using Encoding.UTF8, create your own UTF-8 encoder, using the constructor overload that lets you specify whether or not the BOM is to be emitted:
req.Encoding = new UTF8Encoding( false ) ; // omit the BOM
I believe that will do the trick for you.
Amended to Note: The following will work:
public static User GetUser()
{
WebClient req = new WebClient();
req.Encoding = Encoding.UTF8;
byte[] response = req.DownloadData(url);
User instance ;
using ( MemoryStream stream = new MemoryStream(response) )
using ( XmlReader reader = XmlReader.Create( stream ) )
{
XmlSerializer serializer = new XmlSerializer(typeof(User)) ;
instance = (User) serializer.Deserialize( reader ) ;
}
return instance ;
}
That character at the beginning is the BOM (Byte Order Mark). It's placed as the first character in unicode text files to specify which encoding was used to create the file.
The BOM should not be part of the response, as the encoding is specified differently for HTTP content.
Typically a BOM in the response comes from sending a text file as the response, where the text file was saved with the BOM signature. Visual Studio, for example, has an option to save a file without the BOM signature so that it can be sent directly as a response.
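To see the two behaviours side by side without calling a real API, here is a minimal sketch that builds a UTF-8 payload with a BOM in memory. The sample XML and all names are hypothetical; the point is only that decoding the bytes to a string keeps U+FEFF as the first character, while handing the raw bytes to the XML API lets it consume the BOM itself.

```csharp
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class BomDemo
{
    // Simulates a response body saved as UTF-8 with a BOM (EF BB BF).
    public static byte[] SampleResponse()
    {
        byte[] bom = Encoding.UTF8.GetPreamble();
        byte[] xml = Encoding.UTF8.GetBytes("<User><Name>Ann</Name></User>");
        return bom.Concat(xml).ToArray();
    }

    // Decoding manually keeps the BOM as a leading U+FEFF character.
    public static bool StringStartsWithBom(byte[] payload)
    {
        string text = Encoding.UTF8.GetString(payload);
        return text.Length > 0 && text[0] == '\uFEFF';
    }

    // Loading from a stream lets the XML reader detect and skip the BOM.
    public static string ReadNameFromBytes(byte[] payload)
    {
        using (var stream = new MemoryStream(payload))
        {
            XDocument doc = XDocument.Load(stream);
            return doc.Root.Element("Name").Value;
        }
    }
}
```

If the response is already held as a string, `text.TrimStart('\uFEFF')` before parsing is a possible workaround, though parsing from the bytes avoids the problem altogether.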

Find and replace question regarding RegEx.Replace

I have a text file and I want to be able to change all instances of:
T1M6 to N1T1M6
The T will always be a different value depending on the text file loaded. For example, it could sometimes be
T2M6 and that would need to be turned into N2T2M6. The N(value) must match the T(value). The M6 will always be M6.
Another example:
T9M6 would translate to N9T9M6
Here is my code to do the loading of the text file:
StreamReader reader = new StreamReader(fDialog.FileName.ToString());
string content = reader.ReadToEnd();
reader.Close();
Here is the Regex.Replace statement that I came up with. Not sure if it is right.
content = Regex.Replace(content, @"(T([-\d.]))M6", "N1$1M6");
It seems to work at searching for T5M6 and turning it into N1T5M6.
But I am unsure how to turn the N(value) into the value that T is. For example N5T5M6.
Can someone please show me how to modify my code to handle this?
Thanks.
Like this:
string content = File.ReadAllText(fDialog.FileName.ToString());
content = Regex.Replace(content, @"T([-\d.])M6", "N$1T$1M6");
Also, you should probably replace [-\d.] with \d or -?\d\.?
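Here is a small self-contained sketch of the replacement, assuming the T value is a single digit as in the examples from the question (so the pattern uses \d). The class and method names are hypothetical. It writes the back-reference as ${1}, which behaves like $1 but stays unambiguous if a digit ever follows it in the replacement string.

```csharp
using System.Text.RegularExpressions;

public static class ToolCodeRewriter
{
    // Rewrites every T<digit>M6 as N<digit>T<digit>M6.
    // Group 1 captures the digit after T and is reused twice in the replacement.
    public static string AddNPrefix(string content)
    {
        return Regex.Replace(content, @"T(\d)M6", "N${1}T${1}M6");
    }
}
```

For instance, `ToolCodeRewriter.AddNPrefix("T1M6 T9M6")` returns `"N1T1M6 N9T9M6"`.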
