Hi, I have a string that was read as ASCII and displays as " ”˜‰ƒ ‰™˜€". It is actually a name in Hebrew. How can I convert it to Hebrew letters?
.net c# winform
There are no Hebrew letters in ASCII, so you must mean ANSI. Each system has a default code page that is used for ANSI encoding, and you need to know which one was used in order to decode the data.
It's probably the Windows-1255 or ISO 8859-8 encoding that was used. You can use the Encoding class to decode the data. Example:
Encoding.GetEncoding("iso-8859-8").GetString(data);
If you already have a string, the problem is that the data was decoded using the wrong encoding. You have to go back to an earlier point in the process, before the data became a string, so that you can get at the actual encoded bytes.
If, for example, you are reading the string from a file, you either have to read the file as bytes instead, or set the encoding that the StreamReader uses to decode the file data into characters.
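A minimal sketch of that approach, assuming the bytes come from a file saved with the Hebrew ANSI code page (Windows-1255); the file name is hypothetical:

using System;
using System.IO;
using System.Text;

class DecodeHebrewSketch
{
    static void Main()
    {
        // Read the raw bytes instead of letting a reader decode them.
        byte[] data = File.ReadAllBytes("name.txt");

        // Decode with the encoding the file was actually written in.
        // On .NET Core/5+, Windows-1255 requires registering the
        // System.Text.Encoding.CodePages provider first:
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        string name = Encoding.GetEncoding("windows-1255").GetString(data);

        Console.WriteLine(name);
    }
}

If the result still looks wrong, try "iso-8859-8" instead; the two encodings place the Hebrew consonants in the same byte range but differ in some punctuation and vowel points.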
Related
I want to convert the ASCII-encoded text input by my users into UTF-8, so that I can display it using any Unicode font. For example, I want the English letter 'l' entered in ASCII to be displayed as 'ക' in Unicode. I think I would also require a mapping system, so that I can map 'l' to 'ക'. Please help me solve this issue.
Your text is in ISCII (Indian Script Code for Information Interchange). You need to convert the ISCII, using the proper code page, to Unicode. The following methods should do the job: Convert converts a byte array from one encoding to another, and GetEncoding provides the Encoding objects to be passed to Convert.
Example code can be found here: http://www.dotnetframework.org/default.aspx/Net/Net/3#5#50727#3053/DEVDIV/depot/DevDiv/releases/whidbey/netfxsp/ndp/clr/src/BCL/System/Text/ISCIIEncoding#cs/1/ISCIIEncoding#cs
Code page identifiers can be found here:
http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx
public static byte[] Convert(System.Text.Encoding srcEncoding, System.Text.Encoding dstEncoding, byte[] bytes)
Member of System.Text.Encoding
Summary:
Converts an entire byte array from one encoding to another.
Parameters:
srcEncoding: The encoding format of bytes.
dstEncoding: The target encoding format.
bytes: The array of bytes to convert.
Returns:
An array of type System.Byte containing the results of converting bytes from srcEncoding to dstEncoding.
and this
public static System.Text.Encoding GetEncoding(int codepage)
Member of System.Text.Encoding
Summary:
Returns the encoding associated with the specified code page identifier.
Parameters:
codepage: The code page identifier of the preferred encoding. -or- 0, to use the default encoding.
Returns:
The System.Text.Encoding associated with the specified code page.
As per the Wikipedia article, the code page for ISCII Malayalam is 57009.
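Putting Convert and GetEncoding together, a minimal sketch (the input file is a hypothetical placeholder; the ISCII code pages are built into the .NET Framework):

using System;
using System.IO;
using System.Text;

class IsciiToUtf8Sketch
{
    static void Main()
    {
        // Hypothetical file: in practice these bytes come from your
        // users' raw ISCII-Malayalam input (file, stream, etc.).
        byte[] isciiBytes = File.ReadAllBytes("input.txt");

        // 57009 = ISCII Malayalam, per the code page list linked above.
        Encoding iscii = Encoding.GetEncoding(57009);

        // Re-encode the ISCII bytes as UTF-8, then decode to a string.
        byte[] utf8Bytes = Encoding.Convert(iscii, Encoding.UTF8, isciiBytes);
        string text = Encoding.UTF8.GetString(utf8Bytes);

        Console.WriteLine(text);
    }
}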
Encoding.UTF8.GetString(Encoding.ASCII.GetBytes(input))
Your question makes no sense. Changing the encoding from ASCII to UTF-8 does not magically turn an l into a ക, it only changes the byte representation of the l (actually, since ASCII is a subset of UTF-8, it does not even do that here. It does nothing.)
What you probably want is some kind of transliteration between the Latin and Malayalam alphabets, but that is something completely different.
I have written code (in C#) to import a CSV file using FileHelpers.
I am facing an issue: if the file contains an em dash (—), it gets replaced by a '?' character (not exactly '?', but a special replacement character, as shown in the image below).
How can I handle this in code?
Thanks.
How is your StreamReader object created? Have you provided a specific encoding to it? If not, I think you should try that, since the default encoding cannot be detected when no BOM is present; see the sketch after the quote below.
From MSDN
The character encoding is set by the encoding parameter, and the buffer size is set to 1024 bytes. The StreamReader object attempts to detect the encoding by looking at the first three bytes of the stream. It automatically recognizes UTF-8, little-endian Unicode, and big-endian Unicode text if the file starts with the appropriate byte order marks. Otherwise, the user-provided encoding is used. See the Encoding.GetPreamble method for more information.
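A minimal sketch of both options with FileHelpers, assuming the CSV is UTF-8 (substitute the file's real encoding); MyCsvRecord, its fields, and the file name are hypothetical:

using System.IO;
using System.Text;
using FileHelpers;

[DelimitedRecord(",")]
public class MyCsvRecord
{
    public string Name;
    public string Notes; // may contain an em dash (—)
}

public class CsvImport
{
    public static MyCsvRecord[] Read()
    {
        var engine = new FileHelperEngine<MyCsvRecord>();

        // Option 1: tell the engine which encoding the file uses.
        engine.Encoding = Encoding.UTF8;
        return engine.ReadFile("data.csv");

        // Option 2: hand the engine a reader built with an explicit
        // encoding instead:
        // using (var reader = new StreamReader("data.csv", Encoding.UTF8))
        //     return engine.ReadStream(reader);
    }
}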
I am converting HTML to DOCX using http://www.codeproject.com/Articles/91894/HTML-as-a-Source-for-a-DOCX-File.
Most of the characters are read properly, but some special characters, such as • and “ ”, are displayed as •. What should I do to correct this?
The HTML that I was passing to HTMLtoDocx was also not rendering special characters properly; instead, it displayed '?'. After changing the encoding to Encoding.Default, it returns the correct characters.
In HTMLtoDOCX there are two places where I can set the encoding (lines below). In both places I tried changing the encoding from Encoding.UTF8 to Encoding.Default, but it isn't helping.
StreamWriter streamStartPart = new StreamWriter(docpartDocumentXML.GetStream(FileMode.Create, FileAccess.Write), Encoding.Default);
byte[] Origem = Encoding.Default.GetBytes(html);
• indicates a UTF-8 sequence incorrectly interpreted as ANSI (= Encoding.Default).
You should check whether the HTML file is being read with the correct encoding.
While encoding information is available in the HTTP header or in HTML META tags, that information may not be correct if the HTML is read from a file.
Since .NET treats string characters as two-byte Unicode values, making sure the correct encoding is applied when reading and writing byte streams is the first step toward fixing your problem.
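A small demonstration of the failure mode and the fix; the file name is hypothetical, and Encoding.Default is assumed to be Windows-1252 (as on a Western .NET Framework system):

using System;
using System.IO;
using System.Text;

class MojibakeDemo
{
    static void Main()
    {
        // "•" encoded as UTF-8 is the byte sequence E2 80 A2.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("•");

        // Misinterpreting those bytes as ANSI (Windows-1252) yields "•".
        Console.WriteLine(Encoding.Default.GetString(utf8Bytes));

        // The fix: read the HTML with the encoding it was written in.
        string html = File.ReadAllText("input.html", Encoding.UTF8);
        Console.WriteLine(html);
    }
}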
I'm attempting to write out C# string data to a UTF-8 file without a byte order mark (BOM), but the file that gets created reads as ANSI.
using (StreamWriter objStreamWriter = new StreamWriter(SomePath, false, new UTF8Encoding(false)))
{
    objStreamWriter.Write("Hello world - Encoding no BOM but actually returns ANSI");
    objStreamWriter.Close();
}
According to the documentation for the UTF8Encoding class constructor, setting the encoderShouldEmitUTF8Identifier parameter to false should inhibit the Byte Order Mark.
I'm using .NET Framework 4.5 on my British (en-GB) computer. Below is a screenshot of the StreamWriter object showing the UTF8Encoding in place.
So why am I getting an ANSI file (as checked with Notepad++) back from this operation?
The example string you're writing to the file consists only of characters in the ASCII range. The ASCII range is shared by ASCII, UTF-8, and most (all?) ANSI code pages. So, given that there is no BOM, Notepad++ has no way to tell whether UTF-8 or ANSI is meant, and apparently defaults to ANSI.
If there is no BOM and no non-ASCII characters, how do you expect Notepad++ to recognise the file as UTF-8? UTF-8, ANSI, and ASCII are all identical for the characters you are emitting.
(Even if you include some non-ASCII characters, Notepad++ may still struggle to guess the correct encoding.)
In "Hello world - Encoding no BOM but actually returns ANSI", no character is encoded differently in UTF8 and ANSI. Because of BOM absence, Notepad++ shows that the file is encoded in ANSI because there is no 'special character'. Try adding a "é, à, ê" character in your file and Notepad++ will show it as being encoded in UTF8 without BOM.
I'm using an old technology called RTTY to send data (it's basically fancy Morse Code) over radio.
RTTY can only transmit ASCII characters.
What I want to do is convert a file, such as a small JPG or something similar, into a block of ASCII text, send the characters over radio, and then convert the characters on the remote end back into the original file.
Some help getting started would be great.
I know I need to use a StreamReader, but then how can I convert the byte[] into an encoded ASCII string that I can then 'decode'?
Basically, you want to use a Base64 conversion. It will inflate the size of the data, but it guarantees that you'll be able to round-trip the original binary data.
Use Convert.ToBase64String to convert a byte[] into a string, and Convert.FromBase64String to do the reverse.
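A minimal round-trip sketch (file names are hypothetical):

using System;
using System.IO;

class Base64RoundTrip
{
    static void Main()
    {
        // Sender: read the binary file and encode it as ASCII-safe text.
        byte[] original = File.ReadAllBytes("photo.jpg");
        string ascii = Convert.ToBase64String(original);

        // ... transmit `ascii` over RTTY, possibly split into chunks ...

        // Receiver: decode the text back into the original bytes.
        byte[] restored = Convert.FromBase64String(ascii);
        File.WriteAllBytes("photo-copy.jpg", restored);
    }
}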