EBCDIC support in c# - c#

If I have an bytes array for EBCDIC encoding, how can I convert it to a c# string?
I tried:
string str= Encoding.GetEncoding("EBCDIC").GetString(bytes);
string str= Encoding.GetEncoding("UTF-EBCDIC").GetString(bytes);
but got exception:
'EBCDIC' is not a supported encoding name
'UTF-EBCDIC' is not a supported encoding name
Is it a legacy encoding not supported in .Net anymore?

Related

Why does File.ReadAllText() also recognize UTF-16 encodings?

I read a file using
File.ReadAllText(..., Encoding.ASCII);
According the documentation [MSDN] (emphasis mine),
This method attempts to automatically detect the encoding of a file based on the presence of byte order marks. Encoding formats UTF-8 and UTF-32 (both big-endian and little-endian) can be detected.
However, in my case the ASCII file incorrectly started with 0xFE 0xFF and it detected UTF-16 (probably big endian, but I did not check).
According to File [referencesource] it uses a StreamReader:
private static String InternalReadAllText(String path, Encoding encoding, bool checkHost)
{
...
using (StreamReader sr = new StreamReader(path, encoding, true, StreamReader.DefaultBufferSize, checkHost))
return sr.ReadToEnd();
}
and that StreamReader overload with 5 parameter [MSDN] is documented to support UTF-16 as well
It automatically recognizes UTF-8, little-endian Unicode, big-endian Unicode, little-endian UTF-32, and big-endian UTF-32 text if the file starts with the appropriate byte order marks. Otherwise, the user-provided encoding is used.
(emphasis mine)
Since File.ReadAlltext() is supposed to and documented to detect Unicode BOMs, it's probably a good idea that it detects UTF-16 as well. However, the documentation is wrong and should be updated. I filed issue #7515.

Convert the strings to UTF-16 Encoding C#

I have few strings which are 1252 ENCODED ,UTF-8 and UTF-16 encoded. Ultimately I have to convert all the strings to UTF-16 encoding for comparison,how do I do this?
I came across if we know source encoding we can convert to destination encoding,but I need to convert strings(which may be encoded in any format) to UTF-16(default)
var url=#"file:///C:/Users/Œser/file.html";
Uri parsedurl;
var pass=Uri.TryCreate(url.Trim(),UriKind.Absolute,out parsedurl);
At this point parsedurl.AbsoluteUri prints file:///C:/Users/ %C5%92ser/file.html which is expected
Then I load the html file in IE WebBrowserControl
I intercept navigate
strURL = URL.ToString();
Now strURL prints file:///C:/Users/%8Cser/file.html
.NET string values are always UTF-16 (at least until Utf8String, which is looking like .NET 7 or .NET 8 now). So presumably you have some bytes or streams that are encoded in various encodings, that you want to covert to UTF-16 string instances.
The key here is Encoding; for example:
var enc = Encoding.GetEncoding(1252);
var enc = Encoding.UTF8
var enc = Encoding.BigEndianUnicode; (UTF-16, big-endian)
var enc = Encoding.Unicode; (UTF-16, little-endian)
You can use this encoding manually (GetString(...), GetEncoder(...) etc) - or you can pass it to a TextReader such as StreamReader as an optional constructor argument.
Note that 1252 may not be available in .NET Core / .NET 5 (only .NET Framework), as it depends on the OS encoding directory. You may have to settle for "Western European (ISO)" (iso-8859-1, code-page 28591 i.e. Encoding.GetEncoding(28591)).
From https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html:
ISO-8859-1 (also called Latin-1) is identical to Windows-1252 (also called CP1252) except for the code points 128-159 (0x80-0x9F). ISO-8859-1 assigns several control codes in this range. Windows-1252 has several characters, punctuation, arithmetic and business symbols assigned to these code points.
Similarly, Encoding can be used to write to any chosen encoding, if you want to get bytes again - presumably using either of the UTF-16 variants.

ASCII Extended in C# string

How do I make a string in C# to accept non printable ASCII extended characters like • , cause when I try to put • in a string it just give a blank space or null.
Extended ASCII is just ASCII with the 8 high bits set to different values.
The problem lies in the fact that no commission has ratified a standard for extended ASCII. There are a lot of variants out there and there's no way to tell what you are using.
Now C# uses UTF-16 encoding which will be different from whichever extended ASCII you are using.
You will have to find the matching Unicode character and display it as follows
string a ="\u2649" ; //where 2649 is a the Unicode number
Console.write(a) ;
Alternatively you could find out which encoding your files use and use it like so
eg. encoding Windows-1252:
Encoding encoding = Encoding.GetEncoding(1252);
and for UTF-16
Encoding enc = new UnicodeEncoding(false, true, true);
and convert it using
Encoding.Convert (Encoding, Encoding, Byte[], Int32, Int32)
Details are here
Try this..
Convert those charcaters as string as folows.
string equivalentLetter = Encoding.Default.GetString(new byte[] { (byte)letter });
Now, the equivalent letter contains the correct string.
I tried this for EURO symbol, it worked.
.NET strings are UTF-16 encoded, not extended-ascii (whatever that is). By simply adding a number to a character will give you another defined character within the UTF-16 plain set. If you want to see the underlying character as it would be in your extended ASCII encoding you need to convert the newly calculated letter from whatever encoding you are talking about to UTF-16. See: http://msdn.microsoft.com/en-us/library/66sschk1.aspx

How to convert a string from iso 8859-1 to utf-8? C# Windows phone 7 -

my question is very simple but at the moment i don't know how to do this. I have a string in ISO-8859-1 format and i need to convert this string to UTF-8. I need to do it in c# on windows phone 7 sdk. How can i do it? Thanks
The MSDN page for the Encoding class lists the recognized encodings.
28591 iso-8859-1 Western European (ISO)
For your question the correct choice is iso-8859-1 which you can pass to Encoding.GetEncoding.
var inputEncoding = Encoding.GetEncoding("iso-8859-1");
var text = inputEncoding.GetString(input);
var output = Encoding.Utf8.GetBytes(text);
Two clarifications on the previous answers:
There is no Encoding.GetText method (unless it was introduced specifically for the WP7 framework). The method should presumably be Encoding.GetString.
The Encoding.GetString method takes a byte[] parameter, not a string. All strings in .NET are internally represented as UTF-16; there is no way of having a “string in ISO-8859-1 format”. Thus, you must be careful how you read your source (file, network), rather than how you process your string.
For example, to read from a text file encoded in ISO-8859-1, you could use:
string text = File.ReadAllText(path, Encoding.GetEncoding("iso-8859-1"));
To save to a text file encoded in UTF-8, you could use:
File.WriteAllText(path, text, Encoding.UTF8);
Reply to comment:
Yes. You can use Encoding.GetString to decode your byte array (assuming it contains character values for text under a particular encoding) into a string, and Encoding.GetBytes to convert your string back into a byte array (possibly of a different encoding), as demonstrated in the other answers.
The concept of “encoding” relates to how byte sequences (be they a byte[] array in memory or the content of a file on disk) are to be interpreted. The string class is oblivious to the encoding that the text was read from, or should be saved to.
You can use Convert which works pretty well, especially when you have byte array:
var latinString = "Řr"; // år
Encoding latinEncoding = Encoding.GetEncoding("iso-8859-1");
Encoding utf8Encoding = Encoding.UTF8;
byte[] latinBytes = latinEncoding.GetBytes(latinString);
byte[] utf8Bytes = Encoding.Convert(latinEncoding, utf8Encoding, latinBytes);
var utf8String = Encoding.UTF8.GetString(utf8Bytes);

convert from ascii (dos) to windows

Hi I have a string written in ascii code whose output is " ”˜‰ƒ ‰™˜€" this is a name in Hebrew. How can I convert it to Hebrew letters?
.net c# winform
There are no Hebrew letters in ASCII, so you have to mean ANSI. There is a default encoding for the system that is used for encoding ANSI, which you need to know to decode it.
It's probably the Windows-1255 or ISO 8859-8 encoding that was used. You can use the Encoding class to decode the data. Example:
Encoding.GetEncoding("ISO 8859-8").GetString(data);
If you already have a string, the problem is that you have decoded data using the wrong encoding. You have to go further back in the process before the data was a string, so that you can get the actual encoded bytes.
If you for example are reading the string from a file, you have to either read the file as bytes instead, or set the encoding that the stream reader uses to decode the file data into characters.

Categories