This question already has answers here:
How to write Unicode characters to the console?
(5 answers)
Closed 6 years ago.
Background:
1. I have a table in a SQL database; two of its columns hold English names and Simplified Chinese names.
2. With C#, I display the records of this table as buttons showing both names, such as Car车. This is how I did it:
button.Text = x.EnglishName + x.ChineseName;
The buttons display correctly.
3. I would like to compare button.Text to other strings, like so:
for (int K = 0; K < alist.Count; K++)
{
    string alpha = alist[K];
    if (alpha == button.Text)
    {
        // blahblahblah
    }
}
Problem:
There is always an error.
And I found out why: when I use Console.WriteLine(button.Text), the output is Car?.
Each Chinese character is turned into a "?".
So, apparently, writing Chinese characters onto the face of a button works, but reading them back off the button does not.
How do I correct this?
You might need to change the console's output encoding:
Console.OutputEncoding = Encoding.Unicode; // UTF-16
See here for other encoding types available.
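A minimal sketch, assuming a plain console app: Console.OutputEncoding only changes how text is written to the console window. The string itself still holds the Chinese character, so an ordinal comparison does not depend on what the console shows. The class and method names here are made up for illustration.

```csharp
using System;
using System.Text;

public static class ButtonTextDemo
{
    // Hypothetical helper: ordinal comparison, independent of console encoding.
    public static bool SameText(string a, string b)
    {
        return string.Equals(a, b, StringComparison.Ordinal);
    }

    public static void Main()
    {
        // Ask the console to emit UTF-8; the console font must also contain the glyph.
        Console.OutputEncoding = Encoding.UTF8;

        // "\u8F66" is 车; built the same way as EnglishName + ChineseName.
        string buttonText = "Car" + "\u8F66";
        Console.WriteLine(buttonText);
        Console.WriteLine(SameText(buttonText, "Car\u8F66")); // True
    }
}
```

If a comparison like this still fails in the real program, the strings in the list probably differ in some other way (for example trailing spaces from the database column), which is worth checking separately.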
I'm parsing a number of text files that contain 99.9% ascii characters. Numbers, basic punctuation and letters A-Z (upper and lower case).
The files also contain names, which occasionally contain characters which are part of the extended ascii character set, for example umlauts Ü and cedillas ç.
I want to only work with standard ascii, so I handle these extended characters by processing any names through a series of simple replace() commands...
myString = myString.Replace("ç", "c");
myString = myString.Replace("Ü", "U");
This works with all the strange characters I want to replace except for Ø (capital O with a forward slash through it). I think this has the decimal equivalent of 157.
If I process the string character-by-character using ToInt32() on each character it claims the decimal equivalent is 65533 - well outside the normal range of extended ascii codes.
Questions
Why doesn't myString.Replace("Ø", "O"); work on this character?
How can I replace "Ø" with "O"?
Other information - may be pertinent. Opening the file with Notepad shows the character as a "Ø". Comparison with other sources indicate that the data is correct (i.e. the full string is "Jørgensen" - a valid Danish name). Viewing the character in visual studio shows it as "�". I'm getting exactly the same problem (with this one character) in hundreds of different files. I can happily replace all the other extended characters I encounter without problems. I'm using System.IO.File.ReadAllLines() to read all the lines into an array of strings for processing.
Replace works fine for the 'Ø' when it 'knows' about it:
Console.WriteLine("Jørgensen".Replace("ø", "o"));
In your case the problem is that you are reading the data with the wrong encoding; that is why the string does not contain the character you are trying to replace.
Ø is part of iso-8859-1 (one of the "extended ASCII" sets), but File.ReadAllLines tries to detect the encoding from BOM bytes and, I suspect, falls back to UTF-8 in your case (see Remarks in the documentation).
You see the same behavior in VS Code: it opens the file as UTF-8 and shows you �. If you switch the encoding to the correct one, it shows the text correctly.
If you know what encoding is used for your files, just use it explicitly, here is an example to illustrate the difference:
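// prints J?rgensen: decoded as UTF-8, the ø has already become U+FFFD, so nothing matches
File.ReadAllLines("data.txt")
    .Select(l => l.Replace("ø", "o"))
    .ToList()
    .ForEach(Console.WriteLine);
// prints Jorgensen: decoded as iso-8859-1, the ø survives and is replaced
// (Jørgensen contains the lowercase ø, so that is the character to match)
File.ReadAllLines("data.txt", Encoding.GetEncoding("iso-8859-1"))
    .Select(l => l.Replace("ø", "o"))
    .ToList()
    .ForEach(Console.WriteLine);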
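To see where the 65533 from the question comes from, here is a small sketch that decodes the same bytes twice. It assumes the file really is iso-8859-1, where ø is the single byte 0xF8; the byte array and the class name are illustrative only.

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // "Jør" as iso-8859-1 bytes: ø is the single byte 0xF8.
    public static readonly byte[] Latin1Bytes = { 0x4A, 0xF8, 0x72 };

    public static string Decode(string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(Latin1Bytes);
    }

    public static void Main()
    {
        string wrong = Decode("utf-8");      // 0xF8 is not valid UTF-8 here
        string right = Decode("iso-8859-1");

        Console.WriteLine((int)wrong[1]); // 65533 (U+FFFD, the replacement character)
        Console.WriteLine((int)right[1]); // 248 (U+00F8, ø)
    }
}
```

Once the decoder has substituted U+FFFD, the original byte is gone, so no Replace call on the string can recover it.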
If you want only characters from the basic ASCII set, you can convert every special character from the extended set to its base letter (it will be ugly and non-trivial). Or search online for your specific concern; you may find String.Normalize() or this thread with several other suggestions.
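// requires: using System.Globalization; using System.Text;
public static string RemoveDiacritics(string s)
{
    // Form D splits letters such as Ü into a base letter plus a combining mark
    var normalizedString = s.Normalize(NormalizationForm.FormD);
    var stringBuilder = new StringBuilder();
    for (var i = 0; i < normalizedString.Length; i++)
    {
        var c = normalizedString[i];
        // keep everything except the combining marks
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            stringBuilder.Append(c);
    }
    return stringBuilder.ToString();
}
...
// prints Jorgensen
// ø/Ø has no Unicode decomposition, so RemoveDiacritics leaves it alone; replace it explicitly
File.ReadAllLines("data.txt", Encoding.GetEncoding("iso-8859-1"))
    .Select(l => RemoveDiacritics(l).Replace("ø", "o").Replace("Ø", "O"))
    .ToList()
    .ForEach(Console.WriteLine);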
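One caveat with decomposition-based approaches: they only strip marks that Unicode treats as separate combining characters. The sketch below (class and method names are made up) counts the UTF-16 code units after Form D normalization. Ü and ç split into two, while Ø stays as one because it has no canonical decomposition, so it still needs an explicit Replace.

```csharp
using System;
using System.Text;

public static class DecomposeDemo
{
    // Number of UTF-16 code units after canonical decomposition (Form D).
    public static int DecomposedLength(string s)
    {
        return s.Normalize(NormalizationForm.FormD).Length;
    }

    public static void Main()
    {
        Console.WriteLine(DecomposedLength("\u00DC")); // Ü -> U + combining diaeresis: 2
        Console.WriteLine(DecomposedLength("\u00E7")); // ç -> c + combining cedilla: 2
        Console.WriteLine(DecomposedLength("\u00D8")); // Ø has no decomposition: 1
    }
}
```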
I'd strongly recommend reading C# in Depth: Unicode by Jon Skeet and Programming with Unicode by Victor Stinner to get a much better understanding of what's going on :) Good luck.
PS. My code example is functional and compact but pretty inefficient; if you parse huge files, consider using another solution.
This question already has answers here:
How to insert a Symbol (Pound, Euro, Copyright) into a Textbox
(2 answers)
Closed 2 years ago.
I want to display the String with superscript like Shibu® using C#.
Unicode of ®:- U+000AE
Here is the code:-
String s = "Shibu";
Console.write(s.join("\xBU+000AE", s));
I am not getting proper output like Shibu®.
You need to concatenate those strings; string.Join is for joining several strings together with a separator repeated between them.
Also, Unicode characters are written as \uXXXX, where XXXX is the four-digit hexadecimal code point value.
string s = "Shibu";
Console.WriteLine(s + "\u00AE");
Or simply
string s = "Shibu\u00AE";
Console.WriteLine(s);
Also, you can directly write unicode characters; C# strings are unicode.
string s = "Shibu®";
Console.WriteLine(s);
This does not, however, make the character "superscript" in the Unicode or font sense of the term. I'm not sure that is possible with native C# strings; you have to make do with the existing basic Unicode characters.
For fancier formatting, use a rich text box control, or a WPF control, that allows you to set font options.
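The difference between joining and concatenating can be sketched as follows. This is only an illustration with made-up names; the comments show the string contents, and what the console actually renders depends on its encoding and font.

```csharp
using System;

public static class JoinDemo
{
    // Hypothetical helper: appends the registered sign (U+00AE) to a name.
    public static string WithRegisteredSign(string name)
    {
        return name + "\u00AE";
    }

    public static void Main()
    {
        // Join puts the separator between the items: a®b®c
        Console.WriteLine(string.Join("\u00AE", "a", "b", "c"));

        // Concatenation puts it at the end: Shibu®
        Console.WriteLine(WithRegisteredSign("Shibu"));
    }
}
```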
This question already has answers here:
How to write Unicode characters to the console?
(5 answers)
Closed 9 years ago.
I tried to print this character ’ using Console.WriteLine((char) 146); but it printed ?. When I set Console.OutputEncoding = System.Text.Encoding.UTF8 it printed some glitched characters, not the one I needed.
The code you need is 8217 (U+2019).
But you also have to enable UTF-8 output encoding and switch to a font that can display the character:
Console.OutputEncoding = Encoding.UTF8;
int value = '’';
Console.WriteLine((char)value);
Console.ReadLine();
And if your current console font doesn't support this character, you may also have to change it.
How?
After you launch the console, right-click the title bar -> Properties -> Font -> Lucida Console
And voila it works!
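As a quick check of the numbers involved, the sketch below (class name is illustrative) prints the code point of the right single quotation mark and shows that the value 146 from the question is a control character in Unicode, not a printable quote.

```csharp
using System;

public static class QuoteDemo
{
    // U+2019 RIGHT SINGLE QUOTATION MARK
    public const char RightQuote = '\u2019';

    public static void Main()
    {
        Console.WriteLine((int)RightQuote);             // 8217
        Console.WriteLine(char.IsControl((char)146));   // True: U+0092 is a C1 control code
    }
}
```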
Have you tried this?
static void Main(string[] args)
{
    Console.WriteLine((char)39); // prints the plain ASCII apostrophe ' (U+0027)
}
At least this works for me.
Using C#, Framework 4.0, I'm facing a tricky problem with the German language.
Consider this snippet:
string l_stest = "ZÄHLWERKE";
Console.WriteLine(l_stest.Length); // 9
Console.WriteLine(new StringInfo(l_stest).LengthInTextElements); // 9 (StringInfo is in System.Globalization)
Console.ReadLine();
Both calls print 9.
Now, when I select the text within Notepad++, it gives me a length of 10.
I'm guessing the encoding is the source of my problem. Without scanning my words and replacing the umlauts with the matching two letters (Ä -> AE), how can I calculate the length of my strings precisely?
Edit: I consider the correct length to be 10.
Thanks in advance !
Encoding.UTF8.GetByteCount(l_stest) looks like it'll get the length you want: Ä takes two bytes in UTF-8, so the byte count is 10.
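A small sketch of the two counts side by side, assuming the length reported by the editor is the UTF-8 byte count. The class and method names are made up, and the Ä is written as \u00C4 so the result does not depend on how the source file is saved.

```csharp
using System;
using System.Text;

public static class LengthDemo
{
    // Number of bytes the string occupies when encoded as UTF-8.
    public static int Utf8Length(string s)
    {
        return Encoding.UTF8.GetByteCount(s);
    }

    public static void Main()
    {
        string word = "Z\u00C4HLWERKE"; // ZÄHLWERKE

        Console.WriteLine(word.Length);      // 9 UTF-16 code units
        Console.WriteLine(Utf8Length(word)); // 10 bytes: Ä needs two bytes in UTF-8
    }
}
```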
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Which passwordchar shows a black dot (•) in a winforms textbox?
Unicode encoding for string literals in C++11
I want to use code to reveal the password or make it a dot like •
textBoxNewPassword.PasswordChar = (char)0149;
How can I achieve this?
http://blog.billsdon.com/2011/04/dot-password-character-c/ suggests '\u25CF';
Or try copy pasting this •
(not exactly an answer to your question, but still)
You can also use the UseSystemPasswordChar property to select the default password character of the system:
textBoxNewPassword.UseSystemPasswordChar = true;
Often mapped to the dot, and always creating a consistent user experience.
You need to look into using the PasswordBox control and setting the PasswordChar as *.
Example:
textBox1.PasswordChar = '*'; // Set a text box for password input
Wikipedia has a table of similar symbols.
In C#, to make a char literal corresponding to U+2022 (for example) use '\u2022'. (It's also fine to cast an integer literal as you do in your question, (char)8226)
Late addition. Your original approach was unsuccessful because the value 149 is not the Unicode code point of the bullet. It comes from Windows-1252, whose byte values 128 to 159 do not match the Unicode code points with the same numbers. In Unicode, decimal 149 is the C1 control code "Message Waiting".
You could translate from Windows-1252 with:
textBoxNewPassword.PasswordChar =
Encoding.GetEncoding("Windows-1252").GetString(new byte[] { 149, })[0];
but it is easier to use the Unicode value directly of course.
In newer versions of .NET, you need to call:
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
before you can use something like Encoding.GetEncoding("Windows-1252").
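Putting the two pieces together, a hedged sketch of the translation might look like this. It assumes a runtime where CodePagesEncodingProvider is available (.NET Core or .NET 5+); the class and method names are illustrative.

```csharp
using System;
using System.Text;

public static class Cp1252Demo
{
    // Decodes a single Windows-1252 byte into the corresponding Unicode character.
    public static char Decode(byte b)
    {
        // Makes code page encodings such as 1252 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252).GetString(new[] { b })[0];
    }

    public static void Main()
    {
        Console.WriteLine((int)Decode(149)); // 8226 (U+2022, bullet)
        Console.WriteLine((int)Decode(146)); // 8217 (U+2019, right single quotation mark)
    }
}
```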
textBoxNewPassword.PasswordChar = '\u25CF';