I'm using the sharpPDF DLL (http://sharppdf.sourceforge.net) to create PDFs in C#. Everything works great, but special characters (specifically Polish letters such as "ą, ć, ł, Ó...") don't appear in my output. I'm writing strings to the PDF.
Is there any way to get that working?
Thanks.
Unfortunately SharpPDF has a lot of issues with special characters and there is no evolution planned for a correction of the special characters problem.
Sorry.
Related
In the XML I need to read in C#, I find characters such as
é, É.
As far as I know, I should not find those characters in a Windows-1252 encoded XML file. Can I fix that problem in C#, or must the XML itself be updated?
Thanks in advance.
It does look like the XML needs to be updated.
You could certainly write something that reads it in as the UTF-8 it really is and writes it back out as the Windows-1252 it claimed to be, but why bother? XML in Windows-1252 is like someone using their smartphone while dressed as ye olde knight at a Renaissance Faire anyway. Just drop the incorrect declaration from the first line and away you go.
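If editing the file is not an option, here is a minimal sketch of reading it as the UTF-8 it really is. It assumes the bytes are valid UTF-8 and that only the declaration is wrong. The class and method names are hypothetical. The idea is to give the XML parser a reader that has already decoded the text, so the parser works from characters rather than from the raw bytes.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class XmlEncodingFix
{
    // Hypothetical helper: decodes the stream as UTF-8 ourselves and
    // passes the resulting characters to the XML parser.
    public static XDocument LoadAsUtf8(Stream input)
    {
        using (var reader = new StreamReader(input, new UTF8Encoding(false), true))
        {
            return XDocument.Load(reader);
        }
    }
}
```

Called as `XmlEncodingFix.LoadAsUtf8(File.OpenRead(path))`, this leaves the file itself untouched, so fixing the declaration at the source is still the cleaner option.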
The simple answer is: you're probably using the wrong encoding. From this I'd say you should be using UTF-8. You can force it by downloading the document before parsing it.
I should note that downloading URLs is tricky: web servers often report the wrong encoding. That is also the reason why the HTML5 standard includes a section on encoding detection. I'm afraid there's no easy generic solution for this -- we ended up implementing our own encoding detection algorithms for our web crawlers.
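As a very rough illustration of "download first, then decide on the encoding", the sketch below takes bytes that were already downloaded, tries a strict UTF-8 decode, and falls back to ISO-8859-1 if the bytes are not valid UTF-8. This is a crude stand-in, not the detection algorithm mentioned above or the one in the HTML5 standard. The helper name and the choice of fallback encoding are assumptions.

```csharp
using System.Text;

public static class DownloadDecoder
{
    // Hypothetical helper: strict UTF-8 first, ISO-8859-1 as a fallback.
    // A leading byte order mark, if present, is not stripped here.
    public static string DecodeBestEffort(byte[] bytes)
    {
        try
        {
            // throwOnInvalidBytes: true makes invalid sequences raise an
            // exception instead of being silently replaced.
            return new UTF8Encoding(false, true).GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            // ISO-8859-1 maps every byte to a character, so this cannot fail.
            return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
        }
    }
}
```

A real crawler would also look at the HTTP headers and any declaration inside the document before guessing.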
I have to read Data Matrix barcodes (VDA 4902, GTIN, GS1), which use non-printable characters as separators.
The goal is to scan the barcode with Intermec or Honeywell hardware and send it to a C# MVC web application.
The printable characters are received by the web application, but the non-printable characters are not.
I've scanned the code into the vi editor on a Linux server, and there I can see the special characters. But I couldn't get it to work with ASP.NET or with a C# Windows Forms application.
So currently i don't know where to look at...
Most likely if you are passing values to another page or webservice, you are forgetting the step of properly encoding the characters you are sending. You should probably look at using something like System.Web.HttpServerUtility.HtmlEncode. This function properly converts special characters in the value you are sending to an alternate representation that gets decoded on the receiving end.
Depending on other specifics that you did not elaborate on in your original question, there are many other ways to encode or escape characters for purposes like this. But the above is what I would suggest starting with if you are not sure.
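As one of those other ways, here is a small sketch that uses percent-encoding (`Uri.EscapeDataString`) rather than `HtmlEncode` to carry control characters through a URL or form field, plus a helper that makes control characters visible for debugging. The class name, the method names, and the sample payload are made up for illustration. Whether the scanner actually delivers the separator to the browser is a separate matter that this does not address.

```csharp
using System;
using System.Text;

public static class ScanTransport
{
    // Percent-encodes everything outside the URL "unreserved" set,
    // including control characters such as GS (0x1D).
    public static string Encode(string scanned)
    {
        return Uri.EscapeDataString(scanned);
    }

    // Reverses Encode on the receiving side.
    public static string Decode(string received)
    {
        return Uri.UnescapeDataString(received);
    }

    // Debugging aid: renders each control character as <0xNN>.
    public static string ShowControlChars(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsControl(c))
                sb.Append("<0x").Append(((int)c).ToString("X2")).Append(">");
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Logging `ShowControlChars(value)` at each hop (scanner input, request, controller) is a quick way to find where the separator gets lost.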
I have a process which reads text from a text file. Overall, it works fine, but in one file, the imported char shows as a triangle with a question mark.
What is this, and what do I do about it?
You're using the wrong encoding. You probably need to use UTF-8 rather than ASCII. I can't help with your code, because there is no code in the question, but the encoding is your problem.
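To show what an encoding mismatch looks like, the sketch below decodes ISO-8859-1 bytes as UTF-8. The invalid byte comes out as the replacement character U+FFFD, which most fonts draw as a question mark inside a diamond and which may be the glyph described in the question. The sample bytes and the names are illustrative only. The second method shows the usual fix, which is to pass the file's real encoding explicitly.

```csharp
using System.IO;
using System.Text;

public static class EncodingMismatchDemo
{
    // "café" encoded as ISO-8859-1: the final byte 0xE9 is not valid UTF-8.
    public static string DecodeLatin1BytesAsUtf8()
    {
        byte[] latin1Bytes = { 0x63, 0x61, 0x66, 0xE9 };
        // The UTF-8 decoder substitutes U+FFFD for the byte it cannot decode.
        return Encoding.UTF8.GetString(latin1Bytes);
    }

    // Reads a file with the encoding the caller knows it was written in.
    public static string ReadFile(string path, Encoding encoding)
    {
        return File.ReadAllText(path, encoding);
    }
}
```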
My question might be a little bit confusing, but I think it's still worth paying some attention to.
Basically I'm designing a program to display all printable Unicode characters in a RichTextBox.
I'm using VC# 2010 Express Edition.
However, the RichTextBox has a critical problem: some special characters cannot be displayed correctly.
For example, some Korean Characters (ᄀᄁᆪᄂᆬᆭᄃᄄᄅᆰᆱᆲᆳᆴᆵᄚᄆᄇᄈᄡᄉᄊᄋᄌᄍᄎᄏᄐᄑᄒ), can be displayed correctly in Microsoft Word. After I copy to the RichTextBox, the characters cannot be displayed correctly. However, when I copy back to Microsoft Word, it can be displayed correctly.
Therefore, it's a display problem (the characters themselves are correct). I guess it might be a font problem.
Some related property info:
RichTextBox.Font.GdiCharSet
RichTextBox.Font
How can I solve it, so that all printable Unicode characters can be displayed correctly? (Using different fonts for different character sets is acceptable.)
Actually, I also need further assistance with removing all formatting when pasting:
rtbxFileContent.Paste(DataFormats.GetFormat(DataFormats.Text)); // DataFormats.UnicodeText
I still need to have all printable characters to be displayed correctly, but without any formatting (except font).
Thanks.
Hope I made myself understood.
I hate sounding like MS Office Clippy, but your question seems a lot like this one.
Essentially, you're not mad, it is hard. You could try reading/writing the text manually, using UTF8Encoding and BinaryWriter/BinaryReader.
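A minimal sketch of that manual approach follows, with made-up class and method names. It writes a string with `BinaryWriter` using `UTF8Encoding` and reads it back with `BinaryReader`. It only demonstrates that the characters survive the round trip unchanged. It does nothing about which font the control uses to draw them.

```csharp
using System.IO;
using System.Text;

public static class Utf8RoundTrip
{
    // Serializes the text as a length-prefixed UTF-8 string.
    public static byte[] Write(string text)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new BinaryWriter(stream, new UTF8Encoding(false)))
            {
                writer.Write(text);
            }
            // ToArray is still valid after the stream has been closed.
            return stream.ToArray();
        }
    }

    // Reads back a string written by Write.
    public static string Read(byte[] bytes)
    {
        using (var reader = new BinaryReader(new MemoryStream(bytes), new UTF8Encoding(false)))
        {
            return reader.ReadString();
        }
    }
}
```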
I found that the font "Arial Unicode MS" almost solves my problem, but some characters from certain character sets look weird to me. (Also, what if the user's computer does not have "Arial Unicode MS" installed?)
So I'm still looking for a better, universal solution to my question: automatically using a different font for each character set in the RichTextBox.
Thanks.
I am having a problem where users compose large chunks of text in MS Word, then paste that into the online form. The smart quotes get entered into the DB as an upside-down ?. What are my options to replace these with standard quotes?
These smart quotes are Unicode code points. All you need is a simple String.Replace to sort them out.
-edit- Something like:
// Strings are immutable, so assign the result back.
mystring = mystring.Replace("\u201C", "\"").Replace("\u201D", "\"");
What are my options to replace these with standard quotes?
The best approach is not to replace them. People want to use “smart quotes”, let them. They're not aberrations that only exist in MS Word, they're perfectly valid Unicode characters, and if your application isn't storing non-ASCII characters right then there's a whole lot more that will go wrong than just smart quotes.
Use UTF-8 encoding for all your web pages and store your content in a Unicode-capable database (e.g. if you are using SQL Server, use NVARCHAR) and you'll support not only smart quotes but also accents and other alphabets.
You should run the input through the HtmlEncode method, which will convert “ and ” to HTML entities, allowing you to save those and other higher characters in a format that can be stored without hassle.
Should I also mention Joel's post again?
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)