I can't seem to find the answer to this question.
It seems like I should be able to go from a number to a character in C# simply by doing something along the lines of (char)MyInt to duplicate the behaviour of VB's Chr() function; however, this is not the case:
In VBScript in an ASP page, if my code says this:
Response.Write(Chr(139))
It outputs this:
‹ (character code 8249)
As opposed to this:
(character code 139)
I'm missing something somewhere with the encoding, but I can't find it. What encoding is Chr() using?
Chr() uses the system default encoding, I believe - so it's roughly equivalent to:
byte[] bytes = new byte[] { 139 };
char c = Encoding.Default.GetString(bytes)[0]; // decode byte 139 using the system default code page
On my box (Windows CP1252 as the default) that does indeed give Unicode 8249.
If you want to call something that has exactly the behaviour of VB's Chr from C#, why not simply call it rather than trying to deduce its behaviour?
Just put a "using Microsoft.VisualBasic;" at the top of your C# program, add the VB runtime DLL to your references, and go to town.
If you cast an int to a char, you will get the character with the Unicode character code that was in the integer. The char data type is just a 16-bit UTF-16 code unit.
To get the equivalent of the VBScript chr() function in .NET you would need something like:
string s = Encoding.Default.GetString(new byte[]{ 139 });
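To see the difference between the two approaches side by side, here is a sketch (the second result assumes a CP1252 default code page):

using System;
using System.Text;

class CastVsDecode
{
    static void Main()
    {
        char viaCast = (char)139; // Unicode code point 139, an invisible control character
        string viaDecode = Encoding.Default.GetString(new byte[] { 139 });

        Console.WriteLine((int)viaCast);      // 139
        Console.WriteLine((int)viaDecode[0]); // 8249 ('‹') under CP1252
    }
}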
Somehow I'm getting a weird result from GetString(). In my project I have this code:
byte[] arrayBytes = System.Convert.FromBase64String(n["spo_fdat"].InnerText);
string str = System.Text.Encoding.UTF8.GetString(arrayBytes);
The InnerText value and the code are at: https://dotnetfiddle.net/mMUlti
So, my problem is that somehow I'm getting this result in Visual Studio:
While in the online compiler that I post above the output is as expected.
This output is meant for a printer, and these \0 characters are destroying the format.
Anyone have a clue of what is going on and what should I do/try?
It looks like for some reason every other byte in your input is null. If you strip those out you get something that looks much more plausible as printer commands (though I am no expert). Hopefully you can verify things...
To do this, all I did was add this line:
arrayBytes = arrayBytes.Where((x, i) => i % 2 == 0).ToArray();
The Where clause takes the value (x) and the index (i); if the index mod 2 is 0 (i.e. it's even) the element is kept, and if it's odd it is thrown away.
The output I get from this starts:
CT~~CD,~CC^~CT~
^XA~TA000~JSN^LT0^MNW^MTT^PON^PMN^LH0,0^JMA^PR2,2~SD15^JUS^LRN^CI0^XZ
^XA
^MMT
^PW607
^LL0406
There are some non-printing characters in there too that look like possible printing commands (e.g. the first character is 16, the "data link escape" character).
Edited afterthought:
The problem you have here is obviously a problem with the specification: it seems that your input is wrong. You need to talk to whoever generated it, find out the specification they are using to generate it, make sure their code matches that spec, and then write your code to accept that spec. With a solid specification you should both be writing compatible code.
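For completeness, a minimal end-to-end sketch of this answer's approach (a guess, not a definitive fix; the base-64 value is elided as in the question, and the even-byte observation is assumed to hold):

using System;
using System.Linq;
using System.Text;

class StripOddBytes
{
    static void Main()
    {
        string base64 = "..."; // the spo_fdat value from the question
        byte[] arrayBytes = Convert.FromBase64String(base64);

        // Keep only the even-indexed bytes; the odd-indexed ones appear to be all nulls.
        arrayBytes = arrayBytes.Where((x, i) => i % 2 == 0).ToArray();

        Console.WriteLine(Encoding.UTF8.GetString(arrayBytes));
    }
}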
Try inspecting the bytes instead. You'll see that what you have encoded in the base-64 string is much closer to what Visual Studio shows you than to the output from dotnetfiddle. Consoles usually don't escape non-printables (such as \0, the null character), whereas the Visual Studio string inspector does so in an attempt to provide as much value to its user as possible.
Looking at your base-64 encoded data, it looks way more like UTF-16 than UTF-8. If you decode it like so, you'll perhaps get rid of the null characters in the Visual Studio inspector as well.
Regardless of that, the base-64 data doesn't make much sense. More semantic context is required to figure out what the issue is.
According to inspection by Chris, it looks like the data is UTF-8 encoded in UTF-16.
You should be able to get proper results with the following:
var xml = // your base-64 input...
var arrayBytes = Convert.FromBase64String(xml);
// First undo the UTF-16 layer...
var utf16 = Encoding.Unicode.GetString(arrayBytes);
// ...then narrow each char back to a byte and decode those bytes as UTF-8.
var utf8Bytes = utf16.Select(c => (byte)c).ToArray();
var utf8 = Encoding.UTF8.GetString(utf8Bytes);
Console.WriteLine(utf8);
The opposite is probably how your input was created. However, you could also go for Chris' solution of ignoring every odd byte, as it is basically the same with fewer weird encoding steps (although this version may be more explicit about what really goes on: UTF-8 inside UTF-16).
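For illustration, here is a guess at how such input could have been produced in the first place (a sketch only; the real producer is unknown):

using System;
using System.Linq;
using System.Text;

class HowTheInputMayHaveBeenMade
{
    static void Main()
    {
        string original = "^XA^MMT"; // some printer commands
        byte[] utf8 = Encoding.UTF8.GetBytes(original);

        // Suspected bug in the producer: each UTF-8 byte is widened to a 16-bit char...
        string bogus = new string(utf8.Select(b => (char)b).ToArray());

        // ...and the result is serialized as UTF-16, inserting a null after every byte.
        byte[] utf16 = Encoding.Unicode.GetBytes(bogus);
        Console.WriteLine(Convert.ToBase64String(utf16));
    }
}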
I had to copy an encryption/decryption function from VB6 to C#. I am running into a problem with extended ASCII characters. As an example, the character in question has an extended ASCII value of 155 (it looks like a smaller version of '>').
I learned from my Google searches that there are many extended ASCII variants (code pages?), but I just need the standard Latin-1 shown here: http://www.ascii-code.com/
But I could not find a clear way to do what I need. What I need is a way to get the value 155 (and any others in the extended set) from the character. VB6 does this with a simple Asc(String) statement. I just need a way to emulate this statement in C#.
You can do something like this:
string str = "›";
var encoding = System.Text.Encoding.Default;
var values = encoding.GetBytes(str); //Result is { 155 }
The trick here is to get an encoding object for the Windows-1252 code page, then use GetBytes to convert the string into a byte array. Note that Encoding.Default is only Windows-1252 on systems where that happens to be the default code page; Encoding.GetEncoding(1252) asks for it explicitly.
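If you want to be explicit about the code page rather than relying on the system default, something like this should work (a sketch; on .NET Core/.NET 5+ the code page must first be registered via the System.Text.Encoding.CodePages package):

using System;
using System.Text;

class AscDemo
{
    static void Main()
    {
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // needed on .NET Core/.NET 5+
        var cp1252 = Encoding.GetEncoding(1252);
        byte value = cp1252.GetBytes("›")[0];
        Console.WriteLine(value); // 155
    }
}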
In the latest edition of JavaSpecialists newsletter, the author mentions a piece of code that is un-compilable in Java
public class A1 {
    Character aChar = '\u000d';
}
Try compiling it, and you will get an error such as:
A1.java:2: illegal line end in character literal
Character aChar = '\u000d';
^
Why does an equivalent piece of C# code not show such a problem?
public class CharacterFixture
{
    char aChar = '\u000d';
}
Am I missing anything?
EDIT: My original intention with this question was to ask how the C# compiler got Unicode file parsing correct (if so) and why Java should still stick with the incorrect (if so) parsing.
EDIT: Also, I want my original question title to be restored. Why such heavy editing? I strongly suspect it modified my intentions.
Java's compiler translates \uxxxx escape sequences as one of the very first steps, even before the tokenizer gets a crack at the code. By the time it actually starts tokenizing, there are no \uxxxx sequences anymore; they're already turned into the chars they represent, so to the compiler your Java example looks the same as if you'd actually typed a carriage return in there somehow. It does this in order to provide a way to use Unicode within the source, regardless of the source file's encoding. Even ASCII text can still fully represent Unicode chars if necessary (at the cost of readability), and since it's done so early, you can have them almost anywhere in the code. (You could say \u0063\u006c\u0061\u0073\u0073\u0020\u0053\u0074\u0075\u0066\u0066\u0020\u007b\u007d, and the compiler would read it as class Stuff {}, if you wanted to be annoying or torture yourself.)
C# doesn't do that. \uxxxx is translated later, with the rest of the program, and is only valid in certain types of tokens (namely, identifiers and string/char literals). This means it can't be used in certain places where it can be used in Java. cl\u0061ss is not a keyword, for example.
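A small sketch of the C# rules described above:

class EscapeDemo
{
    char cr = '\u000d';   // fine: the escape is handled while lexing the char literal
    int \u0061bc = 0;     // fine: escapes are allowed inside identifiers (this one is "abc")
    // cl\u0061ss X { }   // would NOT compile: keywords cannot be spelled with escapes
}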
I am about to show my total ignorance of how encoding works and different string formats.
I am passing a string to a compiler (Microsoft's, as it happens, for their Flight Simulator). The string is passed as part of an XML document which is used as the source for the compiler. This is created using standard .NET strings. I have not needed to specify any encoding or type settings, since the XML is just text.
The string is just a collection of characters. This is an example of one that gives the error:
ARG, AFL, AMX, ACA, DAH, CCA, AEL, AGN, MAU, SEY, TSC, AZA, AAL, ANA, BBC, CPA, CAL, COA, CUB, DAL, UGX, ELY, UAE, ERT, ETH, EEZ, GHA, IRA, JAL, NWA, KAL, KAC, LAN, LDI, MAS, MEA, PIA, QTR, RAM, RJA, SVA, SIA, SWR, ROT, THA, THY, AUI, UAL, USA, ACA, TAR, UZB, IYE, QFA
If I create the string in my C# managed program then there is no issue. However, this string is coming from a C++ program that can create the compiled file using its own compiler, which is not compliant with the MS one.
The MS compiler does not like the string. It throws two errors:
INTERNAL COMPILER ERROR: #C2621: Couldn't convert WChar string!
INTERNAL COMPILER ERROR: #C2029: Failed to convert attribute value from UNICODE!
Unfortunately there is not any useful documentation with the compiler on its errors; we just make the best of what we see!
I have seen other errors of this type but these contain hidden characters and control characters that I can trap and remove.
In this case I looked at the string as a char[] and could not see anything unusual, only what I expected: no values above the ASCII limit of 127 and no control characters.
I understand that WChar is something that C++ understands (but I don't), that Unicode is a two-byte representation of characters, and that ASCII is a one-byte representation.
I would like to do two things - first identify a string that will fail if passed to the compiler and second fix the string. I assume the compiler is expecting ASCII.
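For the first part, I am thinking of something along these lines (just a sketch; IsPlainAscii is a name I made up):

using System.Linq;

static class StringCheck
{
    // Flags strings containing anything outside printable ASCII (0x20-0x7E).
    public static bool IsPlainAscii(string s) =>
        s.All(c => c >= 0x20 && c <= 0x7E);
}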
EDIT
I told an untruth - in fact I do use encoding. I checked the code I used to convert a byte array into a string.
public static string Bytes2String(byte[] bytes, int start, int length) {
    return Encoding.Default.GetString(bytes, start, length);
}
I realized that Default might be an issue but changing it to ASCII makes no difference. I am beginning to believe that the error message is not what it seems.
It looks like you are taking a byte array, and converting it as a string using the encoding returned by Encoding.Default.
The Microsoft documentation recommends that you do not do this.
You need to work out what encoding is being used in the C++ program to generate the byte array, and use the same one (or a compatible one) to convert the byte array back to a string again in the C# code.
E.g. if the byte array is using ASCII encoding, you could use:
System.Text.Encoding.ASCII.GetString(bytes, start, length);
or
System.Text.Encoding.UTF8.GetString(bytes, start, length);
P.S. I hope Joel doesn't catch you ;)
I have to come clean that the compiler error has nothing to do with the encoding format of the string. It turns out that it is the length of the string that is at fault. As in the sample, there are a number of entries separated by commas, and the compiler throws the rather unhelpful messages if the entry count exceeds 50.
However, thanks everyone for your help; it has raised the issue of encoding in my mind, and I will now look at it much more carefully.
In VB.NET 2008, I used the following statement:
MyKeyChr = ChrW(e.KeyCode)
Now I want to convert the above statement into C#.
Any Ideas?
The quick-and-dirty equivalent of ChrW in C# is simply casting the value to char:
char MyKeyChr = (char)e.KeyCode;
The longer and more expressive version is to use one of the conversion classes instead, like System.Text.ASCIIEncoding.
Or you could even use the actual VB.NET function in C# by importing the Microsoft.VisualBasic namespace. This is really only necessary if you're relying on some of the special checks performed by the ChrW method under the hood, ones you probably shouldn't be counting on anyway. That code would look something like this:
char MyKeyChr = Microsoft.VisualBasic.Strings.ChrW(e.KeyCode);
However, that's not guaranteed to produce exactly what you want in this case (and neither was the original code). Not all the values in the Keys enumeration are ASCII values, so not all of them can be directly converted to a character. In particular, casting Keys.NumPad1 et al. to char would not produce the correct value.
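A small sketch of that pitfall (it needs a reference to System.Windows.Forms; the numeric values come from the Keys enumeration):

using System;
using System.Windows.Forms;

class KeysPitfall
{
    static void Main()
    {
        Console.WriteLine((char)Keys.A);       // 'A': Keys.A is 65, which happens to match ASCII
        Console.WriteLine((char)Keys.NumPad1); // 'a': Keys.NumPad1 is 97, not the character '1'
    }
}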
Looks like the C# equivalent would be
var MyKeyChr = char.ConvertFromUtf32((int)e.KeyCode);
However, e.KeyCode does not contain a Unicode codepoint, so this conversion is meaningless.
The most literal way to translate the code is to use the VB.Net runtime function from C#
MyKeyChr = Microsoft.VisualBasic.Strings.ChrW(e.KeyCode);
If you'd like to avoid a dependency on the VB.Net runtime though you can use this trimmed down version
MyKeyChr = Convert.ToChar((int) (e.KeyCode & 0xffff));
The C# equivalent of ChrW(&H[YourCharCode]) is Strings.ChrW(0x[YourCharCode])
You can use https://converter.telerik.com/ to convert between VB and C#.
This worked for me to convert VB:
e.KeyChar = Microsoft.VisualBasic.ChrW(13)
To C#:
e.KeyChar == Convert.ToChar(13)