Using C# on .NET Framework 4.0, I'm facing a tricky problem with the German language.
Consider this snippet:
string l_stest = "ZÄHLWERKE";
Console.WriteLine(l_stest.Length); // 9
Console.WriteLine(new System.Globalization.StringInfo(l_stest).LengthInTextElements); // 9
Console.ReadLine();
The result is 9 in both cases.
Now, selecting the text within Notepad++ gives me a length of 10.
I'm guessing the encoding is the source of my problem, but without having to scan my words and replace the umlauts with the matching two letters (Ä -> AE), how can I calculate the length of my strings precisely?
Edit: I consider the correct length to be 10.
Thanks in advance !
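Encoding.UTF8.GetByteCount(l_stest) looks like it'll get the length you want: it returns 10 here, because Ä takes two bytes in UTF-8 while the other eight letters take one byte each.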
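To make the difference between the three counts concrete, here is a minimal sketch. The class and method names (LengthDemo, CodeUnits, TextElements, Utf8Bytes) are made up for illustration, and the Ä is written as an escape so the result does not depend on how the source file is saved. Presumably Notepad++ is reporting the size of the selection in bytes, which would explain the 10.

```csharp
using System;
using System.Globalization;
using System.Text;

static class LengthDemo
{
    // Number of UTF-16 code units, which is what string.Length reports.
    public static int CodeUnits(string s) { return s.Length; }

    // Number of text elements (what a reader would call "characters").
    public static int TextElements(string s) { return new StringInfo(s).LengthInTextElements; }

    // Number of bytes the string occupies once encoded as UTF-8.
    public static int Utf8Bytes(string s) { return Encoding.UTF8.GetByteCount(s); }

    static void Main()
    {
        string word = "Z\u00C4HLWERKE"; // "ZÄHLWERKE"
        Console.WriteLine(CodeUnits(word));    // 9
        Console.WriteLine(TextElements(word)); // 9
        Console.WriteLine(Utf8Bytes(word));    // 10
    }
}
```

Note that the byte count depends on the encoding you pick, so it only matches what an editor shows if the file is actually saved as UTF-8.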
Related
Issue identifying the form feed character in c# code when reading a file
string contents = File.ReadAllText(file);
I have attempted to encode in various formats and then run a replace using UTF-8 hex, UTF-32 hex values for the character.
In the watch window I see
'\f' character
but when I expand the visualizer I see the actual female sign (♀) character.
How do you identify which is the correct character to search for? Either the \f or some variation of the female sign?
I have looked at this site for the variations of encoding values, with no luck at actually finding it in C#: www.fileformat.info/info/unicode/char/2640/index.htm
Your question is a little vague on whether you are trying to find the character \f or the ♀ character.
If you are trying to find the ♀ character, you can use the hexadecimal code 0x2640, or simply use the character as-is:
var ctn = File.ReadAllText("file.txt", Encoding.UTF8);
int pos = ctn.IndexOf((char)0x2640);
int pos1 = ctn.IndexOf('♀');
Clarification: I think the confusion might come from the fact that character ALT+12 and character ALT+2640 often produce the same 'Female Sign' glyph. This is for historical reasons: code 12 is, in ASCII, a control code (form feed, '\f'), which some fonts and code pages happen to draw as ♀. Only the Unicode character U+2640 is specifically designed to always produce the ♀ sign.
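A small sketch of the distinction, in case it helps. The names (FormFeedDemo, FindFormFeed, FindFemaleSign) and the sample text are invented for illustration. The point is only that '\f' and '♀' are two different char values, so a search for one will never match the other, whatever the debugger draws.

```csharp
using System;

static class FormFeedDemo
{
    public const char FormFeed = '\f';       // U+000C, the control character
    public const char FemaleSign = '\u2640'; // U+2640, the real ♀ symbol

    // Index of the first form feed, or -1 if there is none.
    public static int FindFormFeed(string text) { return text.IndexOf(FormFeed); }

    // Index of the first ♀ character, or -1 if there is none.
    public static int FindFemaleSign(string text) { return text.IndexOf(FemaleSign); }

    static void Main()
    {
        string contents = "page one\fpage two"; // made-up sample text

        Console.WriteLine((int)FormFeed);            // 12
        Console.WriteLine((int)FemaleSign);          // 9792
        Console.WriteLine(FindFormFeed(contents));   // 8
        Console.WriteLine(FindFemaleSign(contents)); // -1
    }
}
```

Printing (int)contents[i] for the suspicious position is a quick way to see which of the two you really have in the file.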
So, I re-ran everything this morning with the following combination of UTF8 encoding and searching on '\f'
string contents = File.ReadAllText(file, Encoding.UTF8);
int pos = contents.IndexOf("\f");
and finally got a hit.
I still don't know why the watch and visualizer display the character differently, but that combination of searching works.
Thanks everyone.
I have a small problem using PadLeft and PadRight.
In my code, the user can enter the character they want to use for padding and how many characters they want to add. Like this:
String StartString;
int AmountOfCharacters;
Char PadCharacter;
StartString = TextBoxString.Text; // e.g. "Lawnmower"
AmountOfCharacters = Convert.ToInt32(TextBoxAmountofCharacters.Text); // let's say 5
PadCharacter = Convert.ToChar(TextBoxPadCharacter.Text); // let's use *
Then later I have:
Padding = StartString.PadLeft(AmountOfCharacters, PadCharacter);
The problem is that when I run the code as shown above, it doesn't do anything.
It just gives me the text Lawnmower without any **** attached.
Do I have to change something in my code to make it work, or am I using the wrong variables for this?
Because when I use PadCharacter as a String too, I get an error message:
Cannot implicitly convert String to char.
You misunderstand how PadLeft() works. The length you specify as a parameter (in your case AmountOfCharacters) does not specify how many characters you want added but how many characters the string should have at the end (at least).
So when you specify the string "Lawnmower" and AmountOfCharacters = 5, nothing will happen because the word Lawnmower is already more than 5 characters long.
If StartString's character count is less than AmountOfCharacters, you will see stars in front of StartString. The number of stars will be
[AmountOfCharacters] - [StartString Character Count]
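To illustrate the total-width behaviour described above, here is a short sketch. AddLeft is a hypothetical helper (not part of the framework) that turns "add N characters" into the total width PadLeft expects.

```csharp
using System;

static class PadDemo
{
    // Hypothetical helper: prepends exactly 'count' copies of 'padChar'
    // by asking PadLeft for a total width of (current length + count).
    public static string AddLeft(string s, int count, char padChar)
    {
        return s.PadLeft(s.Length + count, padChar);
    }

    static void Main()
    {
        string start = "Lawnmower"; // 9 characters

        Console.WriteLine(start.PadLeft(5, '*'));  // Lawnmower (already longer than 5)
        Console.WriteLine(start.PadLeft(12, '*')); // ***Lawnmower (12 - 9 = 3 stars)
        Console.WriteLine(AddLeft(start, 5, '*')); // *****Lawnmower
    }
}
```

The same reasoning applies to PadRight, with the padding appended instead of prepended.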
A String is a sequence of characters, not itself a character, which is why you cannot assign one to a char. Convert.ToChar only works on a string that is exactly one character long and throws an exception otherwise. Try TextBoxPadCharacter.Text[0] to get the first character of the user input. You will also need to verify that the input is non-empty.
I'm new to programming and self taught. I'm trying to output the astrological symbol for Taurus, which is supposed to be U+2649 in Unicode. Here is the code I'm using...
string myString = "\u2649";
byte[] unicode = System.Text.Encoding.Unicode.GetBytes(myString);
Console.WriteLine(unicode.Length);
The result I'm getting is the number 2 instead of the symbol or font. I'm sure I'm doing something wrong.
Why are you converting it to Unicode bytes? That doesn't help here. Drop the conversion and do the following:
string a = "\u2649";
Console.Write(a);
You need to have a font which displays that glyph. If you do, then:
Console.WriteLine(myString);
is all you need.
EDIT: Note, the only font I could find which has this glyph is "MS Reference Sans Serif".
The length of the character in bytes (in UTF-16, which is what Encoding.Unicode produces) is 2, and you are writing that Length to the console:
Console.WriteLine(unicode.Length);
If you want to display the actual character, then you want:
Console.WriteLine(myString);
You must be using a font that has that Unicode range for it to display properly.
UPDATE:
Using the default console font, the above Console.WriteLine(myString) will output a ? character, as the font has no glyph for \u2649. As far as I have been able to find, there is no easy way to make the console display Unicode characters that are not already part of the system code pages or the font you choose for the console.
It may be possible to change the font used by the console: Changing Console Fonts
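A quick sketch of why the original code printed 2. TaurusDemo and its helper names are made up for illustration. Setting Console.OutputEncoding to UTF-8 is an assumption on my part about what may help on some setups; whether the glyph actually appears still depends on the console font.

```csharp
using System;
using System.Text;

static class TaurusDemo
{
    public const string Taurus = "\u2649"; // U+2649, the Taurus symbol

    // Size of the string when encoded as UTF-16 (Encoding.Unicode).
    public static int Utf16Bytes(string s) { return Encoding.Unicode.GetByteCount(s); }

    // Size of the string when encoded as UTF-8.
    public static int Utf8Bytes(string s) { return Encoding.UTF8.GetByteCount(s); }

    static void Main()
    {
        Console.WriteLine(Taurus.Length);      // 1 (one UTF-16 code unit)
        Console.WriteLine(Utf16Bytes(Taurus)); // 2 (what the question printed)
        Console.WriteLine(Utf8Bytes(Taurus));  // 3

        // Assumption: asking the console for UTF-8 output may help,
        // but a font without this glyph will still show a placeholder.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(Taurus);
    }
}
```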
You are outputting the length of the character, in bytes. With its default code page and font the console can't display this character anyway, so it will come out as a '?' character.
What am I missing:
decVal = Decimal.Parse(myAr[0]);
Or
Decimal.TryParse(myAr[0], out decVal);
Fails!
Input string was not in a correct format.
myAr[0] is "678016".
I tried adding NumberStyles.Any and CultureInfo.InvariantCulture but got the same results.
More info on the string:
It is concatenated with some letters in Hebrew and a "\u200e" space between them, and then I use Split(' ') to get the numbers back.
This is probably the source of the error, but when I check myAr[0] in the watch it is a pure string....
Guys, I've found the answer; I'll rewrite the question for future generations.
The original string was a concatenation of letters and numbers separated by a special sequence to preserve the order in an RTL situation: "\u200E".
The numbers were extracted later using Split(' '), which seemed to work OK (in the watch) but caused the problem.
Once I used Split("\u200e".ToCharArray()) I got the same results in the watch, but now decimal.Parse is working.
It looks like the special char was still inside the string, invisible to the watch.
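Here is a small sketch of that effect. The exact layout of the original string is an assumption (digits, then "\u200E", then a space, then some Hebrew letters), and the names LrmDemo, FirstBySpace and FirstByMarkAndSpace are made up for illustration. Comparing the Length of the two results exposes the invisible character that the watch window hides.

```csharp
using System;
using System.Globalization;

static class LrmDemo
{
    // Splitting on the space only: the left-to-right mark stays attached to the number.
    public static string FirstBySpace(string s) { return s.Split(' ')[0]; }

    // Splitting on both the mark and the space removes it.
    public static string FirstByMarkAndSpace(string s)
    {
        return s.Split(new[] { '\u200E', ' ' }, StringSplitOptions.RemoveEmptyEntries)[0];
    }

    static void Main()
    {
        // Assumed layout of the concatenated string.
        string original = "678016" + "\u200E " + "\u05E9\u05DC\u05D5\u05DD";

        string dirty = FirstBySpace(original);
        string clean = FirstByMarkAndSpace(original);

        Console.WriteLine(dirty.Length); // 7: six digits plus the invisible U+200E
        Console.WriteLine(clean.Length); // 6

        decimal value;
        // This is the kind of input that failed to parse in the question.
        Console.WriteLine(decimal.TryParse(dirty, NumberStyles.Any, CultureInfo.InvariantCulture, out value));
        Console.WriteLine(decimal.TryParse(clean, NumberStyles.Any, CultureInfo.InvariantCulture, out value)); // True
    }
}
```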
This is weird, on my machine (.NET 4) even this works:
Decimal.TryParse("asdf123&*", out someDecimal);
By works I mean that TryParse returns false, no exception is thrown.
The Parse method may throw an exception, though. Maybe you have some whitespace, or the string literally contains " (quotes)?
Possible Duplicate:
Which passwordchar shows a black dot (•) in a winforms textbox?
I want to use code to reveal the password or make it a dot like •
textBoxNewPassword.PasswordChar = (char)0149;
How can I achieve this?
http://blog.billsdon.com/2011/04/dot-password-character-c/ suggests '\u25CF';
Or try copy pasting this •
(not exactly an answer to your question, but still)
You can also use the UseSystemPasswordChar property to select the default password character of the system:
textBoxNewPassword.UseSystemPasswordChar = true;
Often mapped to the dot, and always creating a consistent user experience.
You need to look into using the PasswordBox control and setting the PasswordChar as *.
Example:
textBox1.PasswordChar = '*'; // Set a text box for password input
Wikipedia has a table of similar symbols.
In C#, to make a char literal corresponding to U+2022 (for example) use '\u2022'. (It's also fine to cast an integer literal as you do in your question, (char)8226)
Late addition. The reason your original approach was unsuccessful is that the value 149 you had is not the Unicode code point of the bullet. Instead it comes from Windows-1252, and Windows-1252 is not a subset of Unicode. In Unicode, decimal 149 means the C1 control code "Message Waiting".
You could translate from Windows-1252 with:
textBoxNewPassword.PasswordChar =
Encoding.GetEncoding("Windows-1252").GetString(new byte[] { 149, })[0];
but it is easier to use the Unicode value directly of course.
In newer versions of .NET, you need to call:
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
before you can use something like Encoding.GetEncoding("Windows-1252").
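A minimal sketch of the point about 149, using only char values so it runs without any code page provider. PasswordCharDemo and its constant names are made up for illustration.

```csharp
using System;

static class PasswordCharDemo
{
    public const char Bullet = '\u2022';      // U+2022 BULLET
    public const char BlackCircle = '\u25CF'; // U+25CF BLACK CIRCLE

    static void Main()
    {
        char fromCast = (char)149; // what the question used

        Console.WriteLine(fromCast == '\u0095');     // True: U+0095, not a bullet
        Console.WriteLine(char.IsControl(fromCast)); // True: it is a C1 control code
        Console.WriteLine((char)8226 == Bullet);     // True: 8226 is 0x2022
        Console.WriteLine((int)BlackCircle);         // 9679
    }
}
```

Either Bullet or BlackCircle can be assigned to PasswordChar directly; which one looks better depends on the font.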
textBoxNewPassword.PasswordChar = '\u25CF';