ASCII Extended in C# string - c#

How do I make a C# string accept non-printable extended ASCII characters such as •? When I try to put • in a string, I just get a blank space or null.

ASCII is a 7-bit code covering the values 0 to 127. Extended ASCII is ASCII plus the eighth (high) bit, which adds the values 128 to 255.
The problem is that no standards body has ratified a single standard for extended ASCII. There are many variants, and the bytes alone do not tell you which one you are using.
C# uses UTF-16 encoding, which will differ from whichever extended ASCII variant you are using.
You will have to find the matching Unicode character and display it as follows:
string a = "\u2649"; // 2649 is the Unicode code point in hexadecimal
Console.Write(a);
Alternatively, you could find out which encoding your files use and use it like so,
e.g. for Windows-1252:
Encoding encoding = Encoding.GetEncoding(1252);
and for UTF-16
Encoding enc = new UnicodeEncoding(false, true, true);
and convert it using
Encoding.Convert (Encoding, Encoding, Byte[], Int32, Int32)
Details are here
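As a minimal sketch of both approaches for the bullet from the question: it assumes the bullet came from Windows-1252, where it is byte 0x95, and that the runtime is .NET Core or .NET 5+, where code page 1252 has to be registered through CodePagesEncodingProvider before GetEncoding(1252) works. The class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class BulletDemo
{
    // Decodes one byte using the Windows-1252 code page (an assumption
    // about the data; use whichever code page your bytes really use).
    public static string DecodeWindows1252(byte value)
    {
        // Required on .NET Core / .NET 5+, harmless if called repeatedly.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding windows1252 = Encoding.GetEncoding(1252);
        return windows1252.GetString(new[] { value });
    }

    public static void Main()
    {
        string fromEscape = "\u2022";            // bullet written as a Unicode escape
        string fromByte = DecodeWindows1252(0x95); // bullet decoded from its 1252 byte
        Console.WriteLine(fromEscape == fromByte); // True
    }
}
```

Either way the string ends up holding the same UTF-16 character; only the route to it differs.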

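Try this.
Convert those characters to a string as follows:
string equivalentLetter = Encoding.Default.GetString(new byte[] { (byte)letter });
Now equivalentLetter contains the correct string.
I tried this for the euro symbol and it worked. Note that Encoding.Default is the system ANSI code page on .NET Framework but UTF-8 on .NET Core and later, so this only works where the default encoding matches your data.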

.NET strings are UTF-16 encoded, not extended ASCII (whatever that is). Simply adding a number to a character gives you another character from the UTF-16 set, not from your code page. If you want the character as it would appear in your extended ASCII encoding, you need to convert the newly calculated value from that encoding to UTF-16. See: http://msdn.microsoft.com/en-us/library/66sschk1.aspx

Related

Decode UTF-8 bytes as Latin-1 characters

I have a string that I receive from a third party app and I would like to display it correctly in any language using C# on my Windows Surface.
Due to incorrect encoding, a piece of my string looks like this in Farsi (Persian-Arabic):
Ù…Ø¯Ù„-Ø±Ù†Ú¯-Ù…ÙˆÛŒ-Ø¬Ø¯ÛŒØ¯-5-436x500
whereas it should look like this:
مدل-رنگ-موی-جدید-5-436x500
This link convert this correctly:
http://www.ltg.ed.ac.uk/~richard/utf-8.html
How I can do it in c#?
It is very hard to tell exactly what is going on from the description in your question. It would be much easier if you provided an example using a single character instead of a whole string, and if you chose a common character such as the bullet (U+2022).
Anyhow, what is probably happening is this:
The letter "ر" is represented in UTF-8 as the byte sequence D8 B1, but what you see is "Ø±", because in Unicode Ø is U+00D8 and ± is U+00B1. So the incoming text was originally UTF-8, but while it was being imported into a .NET string in your application it was incorrectly interpreted as some 8-bit character set such as ANSI or Latin-1. That is why you now have a string which appears to contain garbage.
However, the process of converting 8-bit characters to Unicode is for the most part not destructive, so all of the information is still there, that's why the UTF-8 tool that you linked to can still kind of make sense out of it.
What you need to do is convert the string back to an array of ANSI (or Latin-1, whatever) bytes, and then re-construct the string the right way, which is a conversion of UTF-8 to Unicode.
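I cannot easily reproduce your situation, so here is something to try. There is no Encoding.Ansi property, so get the 8-bit encoding explicitly (1252 here; use Encoding.Default or another code page if that is what garbled the text):
byte[] bytes = System.Text.Encoding.GetEncoding(1252).GetBytes( garbledUnicodeString );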
followed by
string properUnicodeString = System.Text.Encoding.UTF8.GetString( bytes );
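The round trip can be sketched on the single letter discussed above. This sketch assumes the text was garbled by a Latin-1 (ISO-8859-1) decode, which is available on every .NET runtime without extra registration; if the garbling was done with Windows-1252 or another code page, that same encoding must be used for the repair. The class and method names are illustrative only.

```csharp
using System;
using System.Text;

public static class MojibakeRepair
{
    // The 8-bit encoding assumed to have been applied by mistake.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Simulates the bug: UTF-8 bytes wrongly decoded as Latin-1.
    public static string Garble(string text) =>
        Latin1.GetString(Encoding.UTF8.GetBytes(text));

    // Undoes the bug: recover the original bytes, then decode them as UTF-8.
    public static string Repair(string garbled) =>
        Encoding.UTF8.GetString(Latin1.GetBytes(garbled));

    public static void Main()
    {
        string original = "\u0631";                     // Arabic letter reh
        string garbled = Garble(original);              // U+00D8 U+00B1
        Console.WriteLine(garbled == "\u00D8\u00B1");   // True
        Console.WriteLine(Repair(garbled) == original); // True
    }
}
```

The repair only works when the wrong decode was lossless, which is why it has to use the same 8-bit encoding that produced the garbage.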

Convert Extended character to int in csharp

I had to copy an encryption, decryption function from VB6 to csharp. I am running into a problem with extended ascii characters. As an example, the character in question has an Extended ASCII value of 155 (looks like a smaller version of the '>').
I learned from my Google searches that there are many extended ascii versions (pages?) but I just need the standard Latin-1 shown here http://www.ascii-code.com/
But I could not find a clear way to do what I need. What I need is a way to get the value 155 (and any others in the extended set) from the character. VB6 does this with a simple Asc(String) statement. I just need a way to emulate this statement in csharp.
You can do something like this:
string str = "›";
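var encoding = System.Text.Encoding.GetEncoding(1252); // Encoding.Default only works if the system default happens to be 1252
var values = encoding.GetBytes(str); // Result is { 155 }
The trick here is to get an encoding object for the Windows-1252 code page, then use GetBytes to convert the string into a byte array. On .NET Core and later, code page 1252 must first be registered through CodePagesEncodingProvider.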
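A small helper along these lines could stand in for VB6's Asc. It is only a sketch: it assumes the VB6 program ran on a system whose ANSI code page was Windows-1252, and it registers CodePagesEncodingProvider, which is needed on .NET Core and .NET 5+. The Vb6Compat class name is hypothetical.

```csharp
using System;
using System.Text;

public static class Vb6Compat
{
    // Returns the Windows-1252 byte value of a character, similar in
    // spirit to VB6 Asc() on a system that uses code page 1252.
    public static int Asc(char c)
    {
        // Required on .NET Core / .NET 5+ before code page 1252 is available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        byte[] bytes = Encoding.GetEncoding(1252).GetBytes(new[] { c });
        return bytes[0];
    }

    public static void Main()
    {
        Console.WriteLine(Asc('\u203A')); // 155, the single right angle quote
        Console.WriteLine(Asc('A'));      // 65
    }
}
```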

UTF-8 Encoding and Decoding in c#

I searched for "how to encode data in UTF-8 format". The best result I found is the following:
UTF8Encoding utf8 = new UTF8Encoding();
String unicodeString = "ABCD";
// Encode the string.
Byte[] encodedBytes = utf8.GetBytes(unicodeString);
// Decode bytes back to string.
String decodedString = utf8.GetString(encodedBytes);
But the problem is that when I look at the encoded data, it is nothing more than ASCII codes.
Can anyone help me improve my understanding?
For example, when I pass "ABCD" it gets converted into 65, 66, 67, 68. I think this is not UTF-8.
UTF-8 is backwards compatible with ASCII of course. You should test with some characters that are not included in ASCII.
If you program in C# the strings are already encoded in UTF-16. You will not see anything Special there. If you want to see something you should try to compare the LENGTH of the Byte[] when you encode the string into different Encodings.
Check out the Wikipedia article on UTF8: Wikipedia.
From there:
Backward compatibility: One-byte codes are used only for the ASCII
values 0 through 127. In this case the UTF-8 code has the same value
as the ASCII code. The high-order bit of these codes is always 0. This
means that UTF-8 can be used for parsers expecting 8-bit extended
ASCII even if they are not designed for UTF-8.
The point here is that anything in the ASCII range 0-127 is encoded identically in UTF-8. You need to try characters outside that range (an example in the article is the euro symbol) to see the difference: any character with a code above 127 takes two or more bytes in UTF-8.
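To see the difference, one can compare the encoded bytes of an ASCII string and of the euro sign. The following is a minimal sketch; the class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class Utf8Lengths
{
    // Number of bytes the string occupies in UTF-8.
    public static int Utf8Length(string s) => Encoding.UTF8.GetByteCount(s);

    // Number of bytes the string occupies in UTF-16 (little endian, no BOM).
    public static int Utf16Length(string s) => Encoding.Unicode.GetByteCount(s);

    public static void Main()
    {
        // "ABCD" is pure ASCII; "\u20AC" is the euro sign.
        foreach (string text in new[] { "ABCD", "\u20AC" })
        {
            string hex = BitConverter.ToString(Encoding.UTF8.GetBytes(text));
            Console.WriteLine($"{hex} utf8={Utf8Length(text)} utf16={Utf16Length(text)}");
        }
        // Prints:
        // 41-42-43-44 utf8=4 utf16=8
        // E2-82-AC utf8=3 utf16=2
    }
}
```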

Convert ISCII characters to its UTF-8 encoding?

I want to convert the ascii encoded text input by my users into UTF-8 encoding, so that I can display it using any unicode font types. For example, I want to display english alphabet 'l' in ASCII as 'ക' in Unicode. I think I would require a mapping system too, so that I can Map l to 'ക'. Please help me to solve this issue.
Your text is in ISCII (Indian Script Code for Information Interchange). You need to convert ISCII with the proper code page to unicode. The following methods should do the job. Convert will convert a given text from one encoding to another. GetEncoding will provide you with the Encoding objects to be used by the Convert method.
Example code can be found here: http://www.dotnetframework.org/default.aspx/Net/Net/3#5#50727#3053/DEVDIV/depot/DevDiv/releases/whidbey/netfxsp/ndp/clr/src/BCL/System/Text/ISCIIEncoding#cs/1/ISCIIEncoding#cs
Code page identifiers can be found here:
http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx
public static byte[] Convert(System.Text.Encoding srcEncoding, System.Text.Encoding dstEncoding, byte[] bytes)
Member of System.Text.Encoding
Summary:
Converts an entire byte array from one encoding to another.
Parameters:
srcEncoding: The encoding format of bytes.
dstEncoding: The target encoding format.
bytes: The bytes to convert.
Returns:
An array of type System.Byte containing the results of converting bytes from srcEncoding to dstEncoding.
and this
public static System.Text.Encoding GetEncoding(int codepage)
Member of System.Text.Encoding
Summary:
Returns the encoding associated with the specified code page identifier.
Parameters:
codepage: The code page identifier of the preferred encoding. -or- 0, to use the default encoding.
Returns:
The System.Text.Encoding associated with the specified code page.
As per Wikipedia Article, the code page for Malayalam is 57009
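How the two methods fit together can be sketched as below. To keep the example runnable everywhere, it uses Latin-1 as a stand-in source encoding; for ISCII Malayalam text the source would presumably be Encoding.GetEncoding(57009) instead, which on .NET Core and later is assumed to need CodePagesEncodingProvider registered first. The ConvertDemo name is hypothetical.

```csharp
using System;
using System.Text;

public static class ConvertDemo
{
    // Converts bytes from a source encoding (Latin-1 here, as a stand-in)
    // to UTF-8 using Encoding.Convert.
    public static byte[] Latin1ToUtf8(byte[] latin1Bytes)
    {
        Encoding source = Encoding.GetEncoding("iso-8859-1");
        return Encoding.Convert(source, Encoding.UTF8, latin1Bytes);
    }

    public static void Main()
    {
        // 0xE9 is "é" in Latin-1; in UTF-8 it becomes the two bytes C3 A9.
        byte[] utf8 = Latin1ToUtf8(new byte[] { 0xE9 });
        Console.WriteLine(BitConverter.ToString(utf8)); // C3-A9
    }
}
```

If the goal is a .NET string rather than UTF-8 bytes, calling GetString on the source encoding directly skips the intermediate conversion.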
Encoding.UTF8.GetString(Encoding.ASCII.GetBytes(input))
Your question makes no sense. Changing the encoding from ASCII to UTF-8 does not magically turn an l into a ക, it only changes the byte representation of the l (actually, since ASCII is a subset of UTF-8, it does not even do that here. It does nothing.)
What you probably want is some kind of transliteration between the Latin and Malayalam alphabet, but that is something completely different.

Default C# String encoding

I am having some issues with the default string encoding in C#. I need to read strings from certain files/packets. However, these strings include characters from the 128-255 range (extended ASCII), and all of these characters show up as question marks instead of the proper character. For example, a string could come out as "S?meStr?n?" if it contained extended ASCII characters.
Now, is there any way to change the default encoding for my application? I know in java you could define the default character set from command line.
There's no one single "extended ASCII" encoding. There are lots of different 8-bit encodings which are compatible with ASCII for the bottom 128 values.
You need to find out what encoding your files actually use, and specify that when reading the data with StreamReader (or whatever else you're using). For example, you may want encoding Windows-1252:
Encoding encoding = Encoding.GetEncoding(1252);
.NET strings are always sequences of UTF-16 code units. You can't change that, and you shouldn't try. (That's true in Java as well, and you really shouldn't use the platform default encoding when calling getBytes() etc. unless that's what you really, really mean.)
An Encoding can be specified in at least one overload of functions for reading text - for example, ReadAllText(string, Encoding).
So if you know that a file is encoded using Windows-1252, you can specify it like so:
string contents = File.ReadAllText(someFilePath, Encoding.GetEncoding(1252));
Of course, doing this requires knowing ahead of time which code page is being used.
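The effect of passing the right encoding can be sketched with a temporary file. The example assumes the data is Latin-1 (ISO-8859-1), which every .NET runtime supports out of the box; for Windows-1252 on .NET Core and later, Encoding.GetEncoding(1252) would additionally need CodePagesEncodingProvider registered. The class and method names are illustrative.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadWithEncoding
{
    // Reads a whole file, decoding it as Latin-1 (the assumed file encoding).
    public static string ReadLatin1(string path) =>
        File.ReadAllText(path, Encoding.GetEncoding("iso-8859-1"));

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // "Söme" in Latin-1: the ö is the single byte 0xF6.
            File.WriteAllBytes(path, new byte[] { 0x53, 0xF6, 0x6D, 0x65 });

            Console.WriteLine(ReadLatin1(path) == "S\u00F6me");       // True
            Console.WriteLine(File.ReadAllText(path, Encoding.ASCII)); // S?me
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Decoding the same bytes as ASCII reproduces the question marks from the question, because byte values above 127 have no meaning in ASCII.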
