I'm trying to output the unicode latin cross character described in this chart :
http://en.wikipedia.org/wiki/List_of_Unicode_characters
So, from a winforms program, I tried this :
MessageBox.Show("Unicode latin cross character follows : \u271D");
but it shows a small box shape where the cross was expected.
Is there a safe* way to output this char ?
Note * : "Safe", meaning it will work on the average PC which has standard Windows fonts installed.
(I actually want to output this char in a SSRS report. If it can't be done in text, I'll have to use an image).
This appears to be a font problem; a small box typically indicates a missing glyph. The character “✝” LATIN CROSS (U+271D) has limited coverage in fonts. None of the fonts normally shipped with Windows contains it. Arial Unicode MS does, but that font is not part of Windows; it comes with Microsoft Office (and some other products).
Consider using the widely supported “†” DAGGER (U+2020) instead, in bold face or a larger size if needed and applicable.
However, there is an ugly hack that may or may not work, depending on the software used: set the font to Wingdings 2 (commonly available on Windows) and output \u0085 or \u0086. This depends on the use of Wingdings 2 as a font with an 8-bit encoding. I don’t expect the trick to work in most modern environments.
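One way to choose between U+271D and the dagger fallback at run time is to ask the font whether it maps the code point to a glyph. The following is a minimal sketch, not a definitive implementation. It assumes the project can reference WPF's PresentationCore assembly, since `GlyphTypeface.CharacterToGlyphMap` is a WPF API rather than a WinForms one. The names `CrossPicker`, `ChooseCross` and `ChooseCrossForFont` are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Windows.Media; // WPF (PresentationCore); assumed to be referenced

public static class CrossPicker
{
    public const int LatinCross = 0x271D; // U+271D LATIN CROSS

    // Pure helper: return the cross if the character map covers it,
    // otherwise fall back to U+2020 DAGGER.
    public static string ChooseCross(IDictionary<int, ushort> characterToGlyphMap)
    {
        return characterToGlyphMap.ContainsKey(LatinCross) ? "\u271D" : "\u2020";
    }

    // Looks up the glyph map of the font that WPF resolves for familyName.
    // If no glyph typeface can be obtained, the dagger is returned.
    public static string ChooseCrossForFont(string familyName)
    {
        GlyphTypeface glyphs;
        if (new Typeface(familyName).TryGetGlyphTypeface(out glyphs))
        {
            return ChooseCross(glyphs.CharacterToGlyphMap);
        }
        return "\u2020";
    }
}
```

A call such as `CrossPicker.ChooseCrossForFont("Segoe UI")` only reports what that one font contains. The control that finally draws the text may use a different font or apply its own fallback, so the result is a hint rather than a guarantee.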
Related
Some background:
I’m developing a WPF application that measures and compares data delivered by a balance attached to it. It will be installed on a Windows 10 system delivered together with the balance. The application currently has to support eleven languages, including Japanese and Chinese. One feature of the application is that the measured values can be shown as a PDF report. We use the PDFsharp library to create the PDF and the Telerik RadPdfViewer to display it.
As mentioned, the application supports Japanese and Chinese, so the PDF also needs to be able to show Japanese and Chinese characters. Earlier versions of the application were delivered on older versions of Windows, which meant we could use either the Microsoft YaHei or the Arial Unicode MS font for this. Unfortunately, PDFsharp does not support TrueType font collections, which means I can’t use the version of Microsoft YaHei installed on the system, and Arial Unicode MS is not available anymore.
Problem:
Since these two aren’t an option anymore I searched the internet for an alternative. After a quick search I noticed that Google’s Noto Sans might be what I need, so I tried to use it. Unfortunately it resulted in a ton of IndexOutOfRange Exceptions from the Telerik.Windows.Documents.Fixed.dll (all internally caught by the library) and does not show any letters on the generated PDF. It shows the PDF and the lines of the generated table, so I assume it’s some problem with the font. I used “Noto Sans CJK SC Regular” for Chinese and “Noto Sans CJK JP Regular” for Japanese.
After a long search session I still did not find another suitable font for Chinese. For Japanese I could use “Gen Shin Gothic”, but I’d prefer to use a font with the same style for both languages. There is the additional problem that the font should be usable without an additional license needed.
Unfortunately I can’t add code, since it’s a running project of my employer.
Questions:
Is there something I need to set/adjust for the Noto fonts to work properly?
Is there another usable sans-serif font for Chinese that needs no additional license? (It has to be capable of showing Latin letters too, since some things, like the product name, are written that way.)
Alternatively:
Is there a tweak to the PDFsharp library, so it can use TTC fonts?
A user posted their changes for TTC support on the PDFsharp forum:
https://forum.pdfsharp.net/viewtopic.php?p=9039#p9039
Another user found a suitable TTF file that works with PDFsharp:
https://forum.pdfsharp.net/viewtopic.php?p=11874#p11874
There is one TTF and its font name is "標楷體".
I know the default encoding for Windows in Western Europe is Windows-1252 (often loosely called ISO-8859-1) and the default for web standards is UTF-8, but I'm hoping (Google is failing me) that someone knows the default for Windows/Visual Studio/C# software in India.
The reason is that an India-based company is calling our web services and getting a parse exception. My suspicion is that they aren't setting the encoding correctly (to UTF-8), but testing with the English Windows default (ISO-8859-1) works, so I'm investigating alternatives.
I may be wrong, but after a bit of research I concluded that if they are not using the en_IN locale, they have no codepage for either GUI or console.
This MS official source lists Hindi codepage as 0.
This random copy of this list says that Hindi is a Unicode-only locale.
IANA claims codepage numbers 0, 1 and 2 are reserved.
Here we have a Moodle developer who discovered that, while specialised codepages work for text files under most locales, under the Hindi locale they had to resort to UTF-8 (aka codepage 65001) text files, which in most other versions of Windows are called "Unicode files".
Here we have another developer who discovered that Hindi doesn't have a default codepage.
According to MSDN, all locale-sensitive functions default to C locale, which means ASCII for 8-bit strings.
So:
you cannot type Hindi without Unicode
the Hindi locale probably treats all bytes >= 128 in 8-bit strings as invalid characters, while in Windows-1252 most of them are valid; I'm guessing the application (or the client's code) converts between bytes and text too many times without taking the encoding into account
and finally, other languages of India also have no ANSI codepage
I'm right now on Linux, but if you can, I suggest running programs via Applocale under various locales. I recommend Hindi, Japanese and Turkish – for the largest chance of revealing bugs.
But my bet is that they read that XML off the wire, convert to string with default encoding and it blows up.
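The failure mode guessed at above can be sketched as follows. This is an illustration under assumptions, not the actual service code: the payload, the element name `name` and the class `EncodingDemo` are made up. It contrasts decoding UTF-8 bytes with a fixed single-byte encoding against handing the raw bytes to the XML parser, which reads the encoding from the byte order mark or the XML declaration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class EncodingDemo
{
    // Problematic: decodes the bytes with a fixed single-byte encoding,
    // so every multi-byte UTF-8 sequence turns into several wrong characters.
    public static string DecodeAsLatin1(byte[] payload)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(payload);
    }

    // Safer: give the parser the bytes and let it detect the encoding itself.
    public static string ReadRootText(byte[] payload)
    {
        using (var stream = new MemoryStream(payload))
        {
            return XDocument.Load(stream).Root.Value;
        }
    }

    public static void Main()
    {
        const string hindi = "\u0939\u093F\u0928\u094D\u0926\u0940"; // the word "Hindi" in Devanagari
        byte[] payload = Encoding.UTF8.GetBytes(
            "<?xml version=\"1.0\" encoding=\"utf-8\"?><name>" + hindi + "</name>");

        Console.WriteLine(DecodeAsLatin1(payload).Contains(hindi)); // False
        Console.WriteLine(ReadRootText(payload) == hindi);          // True
    }
}
```

Note that `Encoding.Default` behaves differently across runtimes: on .NET Framework it is the system ANSI code page, while on .NET Core and later it is UTF-8, so code that relies on it can pass on one machine and fail on another.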
I create an image in C# that should contain some Japanese text. I then put this image into a page that is also in Japanese. The page itself is displayed correctly (encoding: UTF-8), but the text in the image is not: instead of the expected characters I get wrong symbols (not '?', but something that looks like a square).
I need to draw this text on the image in Arial. Does anyone know what might be wrong and why the text is not rendered correctly?
One more thing: when I test it on my local machine everything looks correct, but when I deploy the app to an external server this strange error occurs.
In order to create image with text I use:
Font f = new Font("Arial", 10f, FontStyle.Bold); // Font is a class and must be created with 'new'
g.DrawString(text, f, b, rect);
The external server likely has a version of Arial installed that doesn't include the Japanese character set (as far as I recall, the one that includes Japanese is called "Arial Unicode MS"). Remember that when you're generating an image in ASP.NET, it's the server's fonts that are used.
Note, however, that legally, you're not allowed to install "Arial Unicode MS", except when it's part of Office - or if you've licensed it ("Arial Unicode") from Monotype/Ascender. It may be a more viable option to replace Arial with some other typeface, depending on your funds (I'll keep my subjective opinions on Arial out of this).
When installing a new font on the server, make sure you restart IIS. .NET won't recognize installed fonts until a restart (it's not enough to restart the application - may be enough to recycle the app pool, but I never tried that).
Update
If it still doesn't work, it's likely font fallback isn't in place. I.e., you're specifying "Arial", but GDI+ (DrawString) doesn't know to fall back to "Arial Unicode MS" for characters that aren't in Arial (Office sets this up on install, I think).
Two possibilities:
Change your code to actually use the font that has the glyphs (i.e. "Arial Unicode MS") rather than "Arial" (no version of plain "Arial" contains Japanese characters). This has the disadvantage that if you're using characters other than Japanese, they may look (even) less good than in the standard "Arial" typeface, because "Arial Unicode MS" includes no kerning or other such features.
Or check if there's a link between Arial and other fonts in your (local) registry: HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\FontLink\SystemLink - one of those fonts will be the one actually displaying your Japanese characters as a fallback font - it may not even be "Arial Unicode". You may add the same link manually in registry on the server (and probably restart IIS again).
Another likely candidate that may be used for fallback is "MS Gothic". As far as I recall, GDI+ uses the above "FontLink" system for font fallback, while WPF uses its own system. The easiest way to be sure (when you're using fonts that you control anyway) is to directly use a font that has Japanese glyphs. Arial Unicode is merely intended as a last resort for Windows when glyphs aren't found in other fonts - not as something that looks nice in itself.
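The advice to use a font that has Japanese glyphs directly could look roughly like the sketch below. It checks which of a few candidate fonts is installed on the machine running the code (the server, in the ASP.NET case) and fails loudly if none is. The candidate list and the names `JapaneseFontPicker`, `PickFirstInstalled` and `CreateFont` are assumptions for illustration; adjust the list to fonts you are licensed to install.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Text;
using System.Linq;

public static class JapaneseFontPicker
{
    // Assumed candidates, in order of preference.
    public static readonly string[] Candidates = { "Meiryo", "MS Gothic", "Arial Unicode MS" };

    // Pure helper: first candidate that appears in the installed list, or null.
    public static string PickFirstInstalled(IEnumerable<string> candidates, IEnumerable<string> installed)
    {
        var installedSet = new HashSet<string>(installed, StringComparer.OrdinalIgnoreCase);
        return candidates.FirstOrDefault(c => installedSet.Contains(c));
    }

    // Builds a bold font from the first installed candidate.
    public static Font CreateFont(float size)
    {
        using (var fonts = new InstalledFontCollection())
        {
            string name = PickFirstInstalled(Candidates, fonts.Families.Select(f => f.Name));
            if (name == null)
            {
                throw new InvalidOperationException("No candidate font with Japanese glyphs is installed.");
            }
            return new Font(name, size, FontStyle.Bold);
        }
    }
}
```

The original drawing call would then become `g.DrawString(text, JapaneseFontPicker.CreateFont(10f), b, rect);`. Throwing when no candidate is found makes a missing font on the server visible at once instead of producing an image full of squares.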
My question might be a little confusing, but I think it's still worth some attention.
Basically I'm designing a program to display all printable Unicode characters in a RichTextBox.
I'm using VC# 2010 Express Edition.
However, the RichTextBox has a critical problem: some special characters cannot be displayed correctly.
For example, some Korean Characters (ᄀᄁᆪᄂᆬᆭᄃᄄᄅᆰᆱᆲᆳᆴᆵᄚᄆᄇᄈᄡᄉᄊᄋᄌᄍᄎᄏᄐᄑᄒ), can be displayed correctly in Microsoft Word. After I copy to the RichTextBox, the characters cannot be displayed correctly. However, when I copy back to Microsoft Word, it can be displayed correctly.
Therefore, it's a display problem (the characters themselves are correct). I guess it might be a font problem.
Some related property info:
RichTextBox.Font.GdiCharSet
RichTextBox.Font
How can I solve it? So that all printable Unicode characters can be displayed correctly (using different fonts for different CharSets are acceptable).
Actually, I need further assistance about removing all formatting when pasting
rtbxFileContent.Paste(DataFormats.GetFormat(DataFormats.Text)); // DataFormats.UnicodeText
I still need to have all printable characters to be displayed correctly, but without any formatting (except font).
Thanks.
Hope I made myself understood.
I hate sounding like MS Office Clippy, but your question seems a lot like this one.
Essentially, you're not mad, it is hard. You could try reading/writing the text manually, using UTF8Encoding and BinaryWriter/BinaryReader.
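A minimal sketch of that manual approach, assuming the text only needs to survive a round trip through bytes. The class name `Utf8RoundTrip` is made up. `BinaryWriter.Write(string)` stores the string with a length prefix in the given encoding, and `BinaryReader.ReadString()` reads it back.

```csharp
using System.IO;
using System.Text;

public static class Utf8RoundTrip
{
    // Serializes the text as a length-prefixed UTF-8 string.
    public static byte[] Write(string text)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new BinaryWriter(stream, new UTF8Encoding(false)))
            {
                writer.Write(text);
            }
            // ToArray still works after the writer has closed the stream.
            return stream.ToArray();
        }
    }

    // Reads back a string written by Write.
    public static string Read(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data), new UTF8Encoding(false)))
        {
            return reader.ReadString();
        }
    }
}
```

This only keeps the characters intact. It does not help the RichTextBox draw them, which still depends on the font in use.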
I found that the font "Arial Unicode MS" almost solves my problem, but some characters from certain character sets look weird to me. (Also, what if the user's computer does not have "Arial Unicode MS" installed?)
So I'm still looking for a better universal solution to my question: automatically using different font for different Char Sets in the RichTextBox.
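One way to approximate "a different font per character set" is to split the text into runs by Unicode block and set `SelectionFont` for each run. The sketch below is only an illustration: the block ranges cover Hangul and the main CJK ideograph block, not all of Unicode, and the font names are assumptions that should be checked against what is installed and what those fonts actually cover. The names `ScriptFonts`, `FontNameFor` and `ApplyFonts` are hypothetical.

```csharp
using System.Drawing;
using System.Windows.Forms;

public static class ScriptFonts
{
    // Maps a character to an assumed font name by Unicode block.
    public static string FontNameFor(char c)
    {
        // Hangul Jamo (U+1100..U+11FF) and Hangul Syllables (U+AC00..U+D7AF)
        if ((c >= '\u1100' && c <= '\u11FF') || (c >= '\uAC00' && c <= '\uD7AF'))
        {
            return "Malgun Gothic";
        }
        // CJK Unified Ideographs (U+4E00..U+9FFF)
        if (c >= '\u4E00' && c <= '\u9FFF')
        {
            return "Microsoft YaHei";
        }
        return "Segoe UI";
    }

    // Walks the text and applies one font per run of same-font characters.
    public static void ApplyFonts(RichTextBox box, float size)
    {
        string text = box.Text;
        int runStart = 0;
        for (int i = 1; i <= text.Length; i++)
        {
            if (i == text.Length || FontNameFor(text[i]) != FontNameFor(text[runStart]))
            {
                box.Select(runStart, i - runStart);
                box.SelectionFont = new Font(FontNameFor(text[runStart]), size);
                runStart = i;
            }
        }
        box.Select(0, 0); // clear the selection
    }
}
```

Calling `ScriptFonts.ApplyFonts(rtbxFileContent, 10f)` after pasting plain text would reapply fonts without bringing in any other formatting. A fuller version would cache the `Font` objects and extend the table to more blocks.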
Thanks.
My application needs to be able to display text in English, German, Chinese, and Korean. I would like to use a single font throughout the application. I know I could use Arial Unicode MS or Lucida Sans Unicode. But they are both very large and need to be licensed.
Is there a good font that I could use?
edit:
This is a windows forms application.
First of all, Arial Unicode MS is a fallback font, it shouldn't be used explicitly. And the idea of having a single font is of much less use than most people think. Keep in mind that the scripts are already very different. You'll need Latin, sinographs and Hangul. While all those probably can be found in a single font I have found the Latin characters to be pretty ugly in comparison.
Furthermore, the operating system already knows which font it can use for which script. And mostly it does a pretty good job at choosing the right one.
It would help if you can say what kind of application we are talking about.
There are big differences (for instance) between ASP.NET (accessed with random browsers from random operating systems) and forms/WPF/Win32 applications.
You cannot use the same font between Korean, Traditional Chinese and Simplified Chinese.
If you do, it is immediately obvious to a native reader that the font is not the one for their language.
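If the application does set fonts explicitly, one option is to choose the font from the UI culture instead of forcing a single face on every language. A minimal sketch follows. The mapping uses fonts that ship with recent Windows versions, but the exact choices and the names `UiFontChooser` and `FontNameFor` are assumptions for illustration.

```csharp
using System;
using System.Globalization;

public static class UiFontChooser
{
    // Returns an assumed font family name for the given UI culture.
    public static string FontNameFor(CultureInfo culture)
    {
        string name = culture.Name; // e.g. "zh-CN", "zh-TW", "ko-KR", "de-DE"
        string language = culture.TwoLetterISOLanguageName;

        if (language == "ko")
        {
            return "Malgun Gothic";
        }
        if (language == "zh")
        {
            // Traditional Chinese cultures get a different font than Simplified.
            bool traditional = name == "zh-TW" || name == "zh-HK" || name == "zh-MO"
                || name.StartsWith("zh-Hant", StringComparison.Ordinal);
            return traditional ? "Microsoft JhengHei" : "Microsoft YaHei";
        }
        return "Segoe UI"; // English, German and other Latin-script cultures
    }
}
```

In a form this could be used as `Font = new System.Drawing.Font(UiFontChooser.FontNameFor(CultureInfo.CurrentUICulture), 9f);`, which keeps Korean, Traditional Chinese and Simplified Chinese on separate fonts.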