I am writing a console tester for a web service that I am using in my app. When I write the JSON output to the console for a large enough result set, the console app hangs and beeps for 5-10 seconds. I checked the output for a \a and couldn't find one, so I'm not sure what is causing the beeping.
At this point I am just guessing that the long output is the cause, but I am unsure what else the problem could be or whether there are any solutions.
Could it be BEL?
Ctrl-G (character code 7, BEL, "BELL"): a control character that is used when there is a need to call for attention; it may control alarm or attention devices.
I've seen something similar when trying to write to an Event Log that is full.
Try Start > Run... > eventvwr then either clearing out some logs or changing the maximum size of logs (via the 'Action' menu).
Even if you check the input for BELL characters, it may still beep. This is due to font settings and Unicode conversion. The character in question is U+2022, Bullet.
Raymond Chen explains:
In the OEM code page, the bullet character is being converted to a
beep. But why is that?
What you're seeing is MB_USEGLYPHCHARS in reverse. Michael Kaplan
discussed MB_USEGLYPHCHARS a while ago. It determines whether certain
characters should be treated as control characters or as printable
characters when converting to Unicode. For example, it controls
whether the ASCII bell character 0x07 should be converted to the
Unicode bell character U+0007 or to the Unicode bullet U+2022. You
need the MB_USEGLYPHCHARS flag to decide which way to go when
converting to Unicode, but there is no corresponding ambiguity when
converting from Unicode. When converting from Unicode, both U+0007 and
U+2022 map to the ASCII bell character.
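If that is what is happening, one way to test the theory is to swap both characters out before writing to the console. This is only a minimal sketch: the names BeepSanitizer and StripBeepCharacters and the choice of '*' as the replacement are made up for illustration, and it only covers the two characters named in the quote.

```csharp
using System;
using System.Text;

static class BeepSanitizer
{
    // Replaces U+0007 (BEL) and U+2022 (bullet) with a harmless character,
    // so neither can end up as byte 0x07 after conversion to the OEM code page.
    public static string StripBeepCharacters(string text, char replacement = '*')
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            bool beeps = c == '\u0007' || c == '\u2022';
            sb.Append(beeps ? replacement : c);
        }
        return sb.ToString();
    }
}
```

Calling Console.WriteLine(BeepSanitizer.StripBeepCharacters(json)) instead of Console.WriteLine(json) should tell you whether the bullet is the culprit.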
In C#, the StringInfo and TextElementEnumerator classes provide methods and properties for working with text elements.
And here, we can find the definition of the Text Element.
The .NET Framework defines a text element as a unit of text that is
displayed as a single character, that is, a grapheme. A text element
can be any of the following:
Yes, it says a text element is a grapheme in .NET. I also tested it with some Unicode characters myself, and it really seemed true until I tested one Korean letter, '가'.
As we all know some Unicode characters consist of multiple code points. Also we may face code point sequences and that's the reason I'm using StringInfo and TextElementEnumerator instead of simple String.
StringInfo and TextElementEnumerator could tell if Chars were surrogate pairs correctly. And "\u0061\u0308", a Unicode character which consists of multiple code points, was recognized as one text element just as expected. But as for "\u1100\u1161", it failed to say that it was also one text element.
"\u1100" is a leading letter "ㄱ", and "\u1161" is a vowel letter "ㅏ". They can be individual characters and shown to the users just as I write here and you can see them now. But if they are used together, they are rendered as one character "가" instead of "ㄱㅏ".
There are two ways to represent the Korean character "가":
Using a single code point, U+AC00, from the Hangul Syllables block.
Using two code points, U+1100 and U+1161, from the Hangul Jamo block.
Most of the time the former is used. The latter is rarely used; to be honest, I can't imagine when it's used at all.
Anyway, the first one is just one precomposed letter and the second is a sequence of a lead and a vowel which is treated as one character. When rendered they look exactly the same, and the two are canonically equivalent.
Also the following line returns true in C# :
"\u1100\u1161".Normalize() == "\uAC00"
I wonder why Normalize() works just fine here when C# doesn't think they are one complete text element.
I thought it had something to do with my .NET version, but that turns out not to be the case. It happens in Mono too.
I tested this with ICU as well, and it could treat "\u1100\u1161" as one grapheme correctly!
I initially thought StringInfo and TextElementEnumerator could eliminate the need for ICU4C in some simple cases, so I'm very disappointed now.
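For reference, the checks described above can be sketched roughly like this. The class and method names (TextElementProbe, CountTextElements, ComposesTo) are made up for illustration; only StringInfo and Normalize are real framework APIs. The count for "\u1100\u1161" is deliberately not stated in the code, since that is exactly the value in question.

```csharp
using System;
using System.Globalization;
using System.Text;

static class TextElementProbe
{
    // Number of text elements .NET sees in the string.
    public static int CountTextElements(string s)
    {
        return new StringInfo(s).LengthInTextElements;
    }

    // True if NFC normalization turns the decomposed form into the precomposed one.
    public static bool ComposesTo(string decomposed, string precomposed)
    {
        return decomposed.Normalize(NormalizationForm.FormC) == precomposed;
    }
}
```

CountTextElements("\u0061\u0308") gives 1 as expected, and ComposesTo("\u1100\u1161", "\uAC00") is true, yet CountTextElements("\u1100\u1161") is the call that does not return 1 for me.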
Here's my question:
Am I doing something wrong here?
Or is a text element in .NET simply not a user-perceived character, unlike a grapheme in ICU?
The basic issue here is that per the Korean standard KS X 1026, the two jamos ㄱ and ㅏ are distinct from their combined form 가. In fact, this exact example is used in the official standard (see section 6.2).
Long story short, Microsoft attempted to follow the standard but other operating systems and applications don't necessarily do so. Hence you can get "malformed" content from other software / platforms that appears to be parsed incorrectly on Windows / in .NET, even though it is parsed "correctly" on those platforms.
You will either need to ensure your data is correctly formed in the first place (unlikely, given that the de-facto standard is to completely ignore the official standard) or you will need to use ICU (or a similar library) to deal with these cases.
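If pulling in ICU is not an option, a partial workaround is to normalize to NFC before counting, since Normalize() already composes the pair from the question. This is only a sketch under that assumption: the name NormalizedTextElements is hypothetical, and it only helps for jamo sequences that have a precomposed syllable, so it is not a general replacement for ICU.

```csharp
using System;
using System.Globalization;
using System.Text;

static class NormalizedTextElements
{
    // Composes the input to NFC first, then counts text elements.
    // "\u1100\u1161" becomes "\uAC00" before StringInfo ever sees it.
    public static int Count(string s)
    {
        string composed = s.Normalize(NormalizationForm.FormC);
        return new StringInfo(composed).LengthInTextElements;
    }
}
```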
I'm testing an SDK that extracts text from a searchable PDF. One of the SDK's dependencies was recently updated, and it's causing an existing test on Hebrew text to fail. I know neither Hebrew nor enough about how the technologies involved represent right-to-left languages.
The NUnit test asserts that the extracted text matches the C# string "מנבוצץז ".
string hebrewText = reader.ReadToEnd();
Assert.AreEqual("מנבוצץז ", hebrewText);
The rasterized PDF has what I believe are the same characters, but in the opposite order.
The unit test fails with this message:
Expected: "מנבוצץז "
But was: " זץצובנמ"
Although the actual result more closely matches what I see in the rasterized PDF, I'm not completely sure the original test is wrong.
1. Are Hebrew characters in a C# string supposed to be read right-to-left like printed Hebrew text?
2. Does any part of the .NET stack tamper with the direction of Hebrew strings?
3. What about NUnit?
4. Are Hebrew characters embedded in a searchable PDF normally supposed to go in the same direction as the rasterized text?
5. Anything else I should know before deciding whether to "fix" this unit test?
There are various ways to encode RTL languages. The most common way (and the Windows default) is logical ordering, which means the first letter is encoded as the first character in a string (or file). So whether the first letter visually appears on the left or the right side of the screen doesn't affect the order in which the letters are stored.
Now as for the text appearing in Visual Studio, it depends on the version. As far as I remember, prior to Visual Studio 2010 the code editor displayed Hebrew backwards. This was apparent when you tried to select Hebrew text: it reversed in an odd way, which was visually confusing. It appears this issue no longer exists in Visual Studio 2010 (at least with SP1, which I just tested).
Let's take a Hebrew word for which the direction is more clear to non-Hebrew speakers than the string specified in your text:
יון
The word happens to be the Hebrew word for an ion, and on your screen, it should appear as three letters where the tallest letter is on the left and the shortest is on the right. In a .NET string, the expression "יון".Substring(0, 1) will produce the short letter, since it's the first letter in the string. The string can also be written as "\u05D9\u05D5\u05DF" where the leftmost Unicode character \u05D9 represents the short letter displayed on the right, which clearly demonstrates the order in which the letters are stored.
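To check the stored order without relying on how an editor or test runner draws the text, you can dump the UTF-16 code units. A small sketch (LogicalOrderDemo and DescribeCodeUnits are names invented for this example):

```csharp
using System;
using System.Linq;

static class LogicalOrderDemo
{
    // Lists the UTF-16 code units in storage (logical) order, e.g. "U+05D9 U+05D5 U+05DF".
    public static string DescribeCodeUnits(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }
}
```

DescribeCodeUnits("\u05D9\u05D5\u05DF") returns "U+05D9 U+05D5 U+05DF". Running the same helper over the expected and actual strings in the failing test shows which one is stored in which order, independent of any display quirks.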
Since the string in your test case is nonsensical, I can't tell you whether it was a wrong test all along or whether it is a correct test that should pass. If the image you uploaded has been rendered correctly, then it appears the actual result of your test is correct and the expected value is incorrect, so you should fix the test.
1. I believe that all strings in C# will be stored internally as LTR; RTL strings will have a non-printable character (or something) denoting that they are indeed RTL.
2. More than likely. RTL GUIs and rendered text, for example, need certain properties (specifically RightToLeft and RightToLeftLayout) to be set in order to display correctly.
3. NUnit shouldn't. Nor should it care. IMHO a reversed string != the original string.
4. I couldn't comment. I'd assume that they should be whatever the test is expecting though, assuming it was passing at first.
5. Don't do half measures with RTL, it really doesn't like it. Either have full RTL support, or nothing. It can be pretty nasty, I wish you the best of luck!
I'm writing a console app that needs to print some atypical (for a console app) Unicode characters such as musical notes, box drawing symbols, etc.
Most characters show up correctly, or show a ? if the glyph doesn't exist in whatever font the console is using. However, I found one character that behaves oddly, which can be demonstrated with the lines below:
Console.Write("ABC");
Console.Write('♪'); //This is the same as: Console.Write((char)0x266A);
Console.Write("XYZ");
When this is run it will print ABC then move the cursor back to the start of the line and overwrite it with XYZ. Why does this happen?
The console doesn't use Unicode, so the characters have to be translated to an 8-bit code page. The ♪ character is converted to the character with code 13 (hex 0x0D), which is CR or Carriage Return.
In most code pages, for example code page 850, the glyph for the CR character is an eighth note, and U+266A is specified as its Unicode equivalent.
However, if you write the CR character to the console, it will not display the note glyph; instead it is interpreted as the control character CR, which moves the cursor to the beginning of the line.
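A rough way to spot such characters before printing is to encode them with the console's encoding and look for control bytes. This is a sketch only: ConsoleEncodingProbe and EncodesToControlByte are made-up names, the check is only meaningful for single-byte code pages (not UTF-16 or UTF-32), and whether Console.OutputEncoding reproduces the console's exact mapping on your system is an assumption.

```csharp
using System;
using System.Linq;
using System.Text;

static class ConsoleEncodingProbe
{
    // True if any byte produced by the encoding is an ASCII control code (below 0x20).
    public static bool EncodesToControlByte(string text, Encoding encoding)
    {
        return encoding.GetBytes(text).Any(b => b < 0x20);
    }
}
```

For example, ConsoleEncodingProbe.EncodesToControlByte("\u266A", Console.OutputEncoding) can be checked before calling Console.Write, and the character replaced if the result is true.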
Console.Write('♪'); is considered Unicode. My guess is that it translates it to the closest ASCII character. You should be using U+1D160 or the appropriate Unicode musical equivalent.
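Note that U+1D160 lies above U+FFFF, so it does not fit in a single C# char and a cast like (char)0x1D160 will not work. A minimal sketch of building such a string (MusicalSymbols and FromCodePoint are illustrative names; char.ConvertFromUtf32 is the real framework call):

```csharp
using System;

static class MusicalSymbols
{
    // Builds a string from any Unicode code point; code points above U+FFFF
    // come back as a surrogate pair, i.e. a string of length 2.
    public static string FromCodePoint(int codePoint)
    {
        return char.ConvertFromUtf32(codePoint);
    }
}
```

MusicalSymbols.FromCodePoint(0x1D160) is the same string as the literal "\U0001D160". Whether the console can actually draw it still depends on the console font.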
There are the required primitives to generate musical output in the Unicode code set (starting at U+1D100). For example, U+1D11A is a 5-line staff, U+1D158 is a closed notehead.
See http://www.unicode.org/charts/PDF/U1D100.pdf
Then the issue becomes making sure that you have a typeface with the appropriate glyphs included (and dealing with the issues of spacing things correctly, etc.).
If you're looking to generate printed output, you should look at LilyPond, which is an OSS music notation package that uses a text file format to define the musical content and then generates gorgeous output.
I need to encode some data (text) so that it can easily be passed by the user over phone.
The text contains random characters and is normally not longer than 100 chars. Example:
"37-b,kA.sZ:Bb9--10.y<§"
I'd like to encode this text into more human readable form so that it can easily be passed over phone.
Base36 produces a text that can easily be passed over phone, but I don't see how to encode/decode this correctly.
Any ideas or alternatives?
(Platform is .net 3.5 SP1)
Base 36 sounds like a good choice (with the symbols a-z and 0-9, it is the largest set of characters that can easily be passed over the phone). I would suggest you split the output into blocks of 6 or 8 characters to make it easier to read. Also, consider adding a checksum at the end so you can verify there are no errors in the data.
100 characters in this encoding will still not be easy to read over the phone and get right the first time. Have you considered another delivery mechanism, such as a text message (SMS)?
On Wikipedia, there is an example of encoding Base36 in Python - shouldn't be too hard to convert to C#.
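For what it's worth, here is a rough C# sketch of the idea that avoids BigInteger so it should also compile on .NET 3.5. It is not a port of the Wikipedia code. The class name Base36, the UTF-8 step, the 0x01 sentinel byte (used so leading zero bytes survive the round trip) and the '-' group separator are all choices made for this example, and no checksum is included.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class Base36
{
    private const string Alphabet = "0123456789abcdefghijklmnopqrstuvwxyz";

    // Text -> UTF-8 bytes -> one big-endian number -> base-36 digits.
    public static string Encode(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        byte[] number = new byte[utf8.Length + 1];
        number[0] = 1; // sentinel so leading zero bytes are not lost
        Array.Copy(utf8, 0, number, 1, utf8.Length);

        var digits = new StringBuilder();
        int start = 0; // index of the first non-zero byte
        while (start < number.Length)
        {
            // Long division of the whole number by 36; the remainder is the next digit.
            int remainder = 0;
            for (int i = start; i < number.Length; i++)
            {
                int value = remainder * 256 + number[i];
                number[i] = (byte)(value / 36);
                remainder = value % 36;
            }
            digits.Insert(0, Alphabet[remainder]);
            while (start < number.Length && number[start] == 0) start++;
        }
        return digits.ToString();
    }

    // Reverses Encode; accepts upper case and ignores '-' and ' ' separators.
    public static string Decode(string encoded)
    {
        var number = new List<byte>(); // little-endian while accumulating
        foreach (char c in encoded.ToLowerInvariant())
        {
            if (c == '-' || c == ' ') continue;
            int carry = Alphabet.IndexOf(c);
            if (carry < 0) throw new FormatException("Not a base-36 digit: " + c);
            for (int i = 0; i < number.Count; i++)
            {
                int value = number[i] * 36 + carry; // multiply by 36, add the digit
                number[i] = (byte)(value & 0xFF);
                carry = value >> 8;
            }
            if (carry > 0) number.Add((byte)carry);
        }
        number.Reverse();
        if (number.Count == 0 || number[0] != 1)
            throw new FormatException("Missing sentinel byte; input is not from Encode.");
        return Encoding.UTF8.GetString(number.ToArray(), 1, number.Count - 1);
    }

    // Splits the digits into blocks, e.g. "abcdefgh" with size 4 -> "abcd-efgh".
    public static string Group(string digits, int size)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < digits.Length; i++)
        {
            if (i > 0 && i % size == 0) sb.Append('-');
            sb.Append(digits[i]);
        }
        return sb.ToString();
    }
}
```

Usage would be something like Base36.Group(Base36.Encode(text), 6) on the sending side and Base36.Decode(whatTheUserTyped) on the receiving side. Expect the encoded form to be noticeably longer than the input.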
Please tell me how I can show symbols like "lambda" or "mu" using C#.NET in a desktop application. What I think is that we may do it using ASCII values and Convert.ToChar(). If I am right, then please give me a link to a page where I can get the ASCII values of all such scientific symbols.
Please give me a link to any URL that contains a list of such ASCII numbers.
Open the Windows character map (charmap.exe), select a Unicode font (Arial should suffice) and copy the symbols into your source code or resources. It's just characters. Of course, you can also switch to Greek keyboard layout, so you can write the characters directly rather than going the charmap route.
Note that you need to use a Unicode font for the labels. You can use charmap to look up which font has Greek characters.
Please tell me how can i show symbols like "lambda" or Mu using c#.net in desktop application.
You don't have to do anything special. Just use whatever letters you want in either the IDE or in strings in the program. C# treats Greek letters the same as any other letters; they are not special.
what i think is we may do it using ASCII values and convert.toChar();
Hold on, I have a phone call. Oh, it's for you. It's 1968 calling, and they want their character set back. :-)
ASCII proper only has 95 printable characters, and Greek letters are not among them. ASCII was invented for teletypes back in the 1960's; we don't use it anymore. Characters in modern programming environments are represented using Unicode, which provides uniform support for tens of thousands of characters in dozens of alphabets.
if i am right then please give me link of page where i can get ASCII values of all such a scientific symbols.
You can get a list of all the Unicode characters at unicode.org. But like I said, you don't need to. You can just embed the character you want directly in the text. There's no need to resort to clumsy tricks like unicode escapes. (Unless, of course, you are planning on sending your source code to your coworkers using a 1970's era teletype machine.)
C# applications are all Unicode - so there should be no problem assigning Unicode strings to the controls' text, for example:
textBox1.Text = "this is a lambda symbol - λ";
Try this:
char c = '\u03BB'; // U+03BB is lambda; use '\u03BC' for mu
System.Console.WriteLine(c.ToString());
Does it work for you?