I am having a problem where users are composing some large chunks of text in MS Word, then pasting that in to the online form. These get entered into the DB as an upside down ?. What are my options to replace these with standard quotes?
These smart quotes are a unicode point. All you need is a simple String.Replace to sort them out.
-edit- Something like:
mystring.Replace("\u201C","\"").Replace("\u201D","\"")
What are my options to replace these with standard quotes?
The best approach is not to replace them. People want to use “smart quotes”, let them. They're not aberrations that only exist in MS Word, they're perfectly valid Unicode characters, and if your application isn't storing non-ASCII characters right then there's a whole lot more that will go wrong than just smart quotes.
Use UTF-8 encoding for all your web pages and store your content in a Unicode-capable database (eg. if you are using SQL Server, use NVARCHAR) and you'll not only support smart quotes but also accents and other alphabets.
You should run the input through the HtmlEncode method, which will convert from or to and , allowing you to save those and other higher characters to a format that can be saved without hassle.
Should I also mention Joel's post again?
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Related
In the XML i need to read in C#, i find characters such as
é, É.
As far as i know , i should not find those characters in a windows-1252 encoded XML. Can i fix that problem in C# or the XML itself must be updated?
Thanks in advance.
It does look like the XML needs to be updated.
You could certainly write something that reads it in as the UTF-8 it really is and writes it back out as the Windows-1252 it claimed to be, but why bother? XML in Windows-1252 is like someone using their smart-phone while dressed ye olde knight at a Renaissance Faire anyway. Just drop the incorrect declaration from the first line and away you go.
The simple answer is: you're probably using the wrong encoding. From this I'd say you should be using UTF-8. You can force it by downloading the document before parsing it.
I should note that downloading URL's is tricky: web servers often report the wrong encoding. That is also the reason why the HTML5 standard includes a section on encoding detection. I'm afraid there's no easy generic solution for this -- we ended up implementing our own encoding detection algorithms for our web crawlers.
I've to read datamatrix barcodes (vda 4902, gtin, gs1) which use non-printable chars as seperator.
The goal is to scan the barcode with intermec or honeywell hardware and send it to a c# mvc webapplication.
The printable characters are received by the webapplication, but the non-printable chars not.
I've scanned the code to the VI editor on a linux server - bere i can see the special characters. But i couldn't get it with a asp.net to work nor a c# windows form application.
So currently i don't know where to look at...
Most likely if you are passing values to another page or webservice, you are forgetting the step of properly encoding the characters you are sending. You should probably look at using something like System.Web.HttpServerUtility.HtmlEncode. This function properly converts special characters in the value you are sending to an alternate representation that gets decoded on the receiving end.
Depending on other specifics would you did not elaborate on your original question, there are many other ways to encode/escape characters for purposes like this. But the above is what I would suggest starting with if you are not clear.
I'm using sharpPDF dll (http://sharppdf.sourceforge.net) to create PDF's in C#. Everything works great but I don't get any special characters (actually these are Polish letters such as "ą, ć, ł, Ó...") in my output. I'm saving strings in that PDF.
Is there any way to get that working?
Thanks.
Unfortunately SharpPDF has a lot of issues with special characters and there is no evolution planned for a correction of the special characters problem.
Sorry.
I have been trying to figure the difference for quite sometime now. The issue is with a file that is in ANSI encoding has japanese characters like: ‚È‚‚Æ‚à1‚‚ÌINCREMENTs‚ª•K—v‚Å‚·. It equivalent in shift-jis is 少なくとも1つのINCREMENT行が必要です. which is expected to be in japanese.
I need to display these characters after reading from file(in ANSI) on a webpage. There are some other files in UTF-8 displaying characters right not seeing this. I am finding it difficult to figure out whats the difference and how do I change encoding to do right things here..
I use c# for reading this file and displaying it, I also need to write the string back into file if its modified on web. Any encoding and decoding schemas here?
As far as code pages are concerned, "ANSI" (and Encoding.Default in .NET) basically just means "the non-Unicode codepage used by this system" - exactly what codepage that is, depends on how the system is configured, but on a Western European system, it's likely to be Windows-1252.
For the system where that text comes from, then "ANSI" would appear to mean Shift-JIS - so unless your system has the same code page, you'll need to tell your code to read the text as Shift-JIS.
Assuming you're reading the file with a StreamReader, there are various constructors that take an Encoding, so just grab a Shift-JIS encoding with Encoding.GetEncoding("shift_jis") or Encoding.GetEncoding(932) and use it to construct your StreamReader.
I get HTML from a webpage that is in german language, i have to insert its html in database, but when I insert it in database the german letters does not appear coorectly.
E.g. Bundesstraße appears as Bundesstraße. I am using C# and MYsql database.
It seems like special characters are encoded as html entities (http://www.w3schools.com/tags/ref_entities.asp) on the website. When using UTF8 this isn't necessary, but many sites still do it.
If you want to have the exact html as it is on the website these encoded entities are correct.
To decode the entities you can use System.Net.WebUtility.HtmlDecode(yourString).
What encoding are you using?
Try switching to UTF-8 and ensure your database supports it. It looks as if though your string is getting HTML encoding, this is fine for presentation, but you'll need the original format to store it in the database.
In HTML, ß is encoded as ß.
You say "i have to insert its html in database", and what you're currently getting is correct.