Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 9 years ago.
I am wondering how best to handle a special character such as ’ using C#.
e.g.
    public static string DecodeFrom64(string toDecode)
    {
        byte[] arrayToDecode = System.Convert.FromBase64String(toDecode);
        return System.Text.Encoding.Unicode.GetString(arrayToDecode);
    }
The problem here is that you've stored a UTF-8 string using a different encoding in your database, probably the Windows-1252 code page (CP1252). As a result, the UTF-8 character ’, represented by the byte sequence E2 80 99, is translated into the three CP1252 single-byte characters â€™. This was all explained to you previously in this answer, which also gives a solution to your current problem.
In order to get back to the original UTF-8 encoding you will need to take the string returned from your database and correct it with the following code:
    public static string UTF8From1252(string source)
    {
        // get original UTF-8 bytes from CP1252-encoded string
        byte[] bytes = System.Text.Encoding.GetEncoding("windows-1252").GetBytes(source);
        return System.Text.Encoding.UTF8.GetString(bytes);
    }
This highlights the fact that it is vital to use the correct encoding at all times when using the GetBytes method.
It is important to note that the reverse of this transformation is not always possible, since there are gaps in the CP1252 code space: byte values with no assigned character, which may be discarded or altered during conversion.
The hex values for these gaps are: 81 8D 8F 90 9D.
Unfortunately these values occur in the UTF-8 encodings of various characters, such as ” (E2 80 9D). If you have one of these characters in your database then it will not load correctly. Depending on how you did the first-stage conversion, the third byte may have been lost or corrupted in the database, in which case you cannot retrieve it.
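To make the mechanism concrete, here is a minimal sketch that reproduces the corruption and then reverses it. The class and method names (MojibakeDemo, Garble, Repair) are made up for illustration. The RegisterProvider call is an assumption about your runtime: it is needed on .NET Core / .NET 5+, where code page 1252 is not available by default, and can be dropped on the classic .NET Framework.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Returns the CP1252 encoding. On .NET Core / .NET 5+ the code-page
    // encodings have to be registered before they can be looked up.
    public static Encoding Cp1252()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // Simulates the bug: UTF-8 bytes are misread as CP1252 text.
    public static string Garble(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Cp1252().GetString(utf8Bytes);
    }

    // Reverses it: recover the original bytes via CP1252, then decode as UTF-8.
    public static string Repair(string garbled)
    {
        byte[] originalBytes = Cp1252().GetBytes(garbled);
        return Encoding.UTF8.GetString(originalBytes);
    }
}
```

Garble("\u2019") gives "\u00E2\u20AC\u2122" (that is, â€™), and Repair turns it back into ’. Characters whose UTF-8 bytes fall into the gaps listed above are the ones to check separately.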
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
I am working on Modbus communication. I am trying to get the length of a frame, which is actually stored as a string.
    while (reader.Read())
    {
        data.Add(reader["read_param"].ToString());
    }
    var single = string.Join("", data);
The resultant string is
4A601933906620468040204220442040004200404020602260246
According to the documentation, the length is 1B in hex (27 in decimal).
But when I read the length with int length = combine.Length; I get 53. How do I get the length in hex?
Any help would be highly appreciated.
You are converting each byte to a string, and each byte becomes one or two characters. So the 53 you get from combine.Length is the length of the converted string, whereas the 0x1B in the Modbus frame is the number of bytes. You are getting 53 characters instead of 54 (27 bytes × 2) because one of the bytes was probably of the form 0x0X, so its leading zero was dropped.
I am not sure which reader you are using, but if it reads bytes, you can add a counter to determine the length of the Modbus message.
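A rough sketch of the idea, assuming you can get at the frame as a byte[] (the FrameLength class and its methods are hypothetical, not part of any Modbus library). Formatting each byte with "X2" keeps the leading zero, so the hex string is always exactly twice as long as the frame:

```csharp
using System;
using System.Linq;

public static class FrameLength
{
    // "X2" always emits two hex digits per byte, so 0x06 becomes "06", not "6".
    public static string ToHex(byte[] frame)
    {
        return string.Concat(frame.Select(b => b.ToString("X2")));
    }

    // With fixed-width formatting, the byte count is half the character count.
    public static int ByteCount(string hex)
    {
        return hex.Length / 2;
    }
}
```

If you then want the length itself shown in hex, FrameLength.ByteCount(hex).ToString("X2") gives "1B" for a 27-byte frame.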
When a string is sent, it is encoded in some standard format. The most popular is UTF-8, in which each character occupies from 1 to 4 bytes.
So the encoded string can have more bytes than the string has characters.
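For illustration, a small sketch comparing the two counts (the Utf8Length class is just a made-up helper). Note that string.Length in C# counts UTF-16 code units, which is not always the same as user-visible characters either:

```csharp
using System.Text;

public static class Utf8Length
{
    // Number of UTF-16 code units in the string.
    public static int CharCount(string s) => s.Length;

    // Number of bytes the string occupies once encoded as UTF-8.
    public static int ByteCount(string s) => Encoding.UTF8.GetByteCount(s);
}
```

For plain ASCII such as "abc" both counts are 3, while "\u20AC" (the euro sign) is one character but three bytes.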
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 6 years ago.
I use the code below to convert UTF-8 (Persian characters) to Latin-1,
but it doesn't work for some characters, such as (و ی ه).
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(source);
string des = iso.GetString(utfBytes);
You say you are converting "UTF8 (Persian characters) to LATIN1", but ISO-8859-1 cannot contain Persian characters. What you are doing is intentionally creating mojibake.
If your code is doing something functional, that implies there is some other component taking the output from des and handling it incorrectly (i.e. outputting it in a Latin-like encoding when it should have been using UTF-8 all along). If you can at all, it would be much better to fix that problem downstream, instead of trying to work around it with a deliberately bad encoding.
If you really have to handle it this way, and some characters work but others don't, the likelihood is that the Latin-like encoding you are trying to target is not actually real Latin-1 (ISO-8859-1); most likely it is Windows code page 1252. This shares many of the same character mappings as ISO-8859-1, but not all. So try GetEncoding(1252).
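As a sketch of what the workaround amounts to (Latin1Tunnel, Wrap and Unwrap are hypothetical names): the UTF-8 bytes are smuggled through a single-byte encoding, and whoever receives the text has to apply the exact inverse with the same encoding. The three characters in the question encode as D9 88, DB 8C and D9 87 in UTF-8, and the bytes 88, 8C and 87 are exactly where ISO-8859-1 (control characters) and code page 1252 (printable characters) disagree.

```csharp
using System.Text;

public static class Latin1Tunnel
{
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Deliberate mojibake: each UTF-8 byte becomes one Latin-1 character.
    public static string Wrap(string text)
    {
        return Latin1.GetString(Encoding.UTF8.GetBytes(text));
    }

    // The inverse: turn each character back into a byte, then decode as UTF-8.
    public static string Unwrap(string wrapped)
    {
        return Encoding.UTF8.GetString(Latin1.GetBytes(wrapped));
    }
}
```

If the component on the other side really uses code page 1252, swap Latin1 for GetEncoding(1252) on both sides. Mixing the two is what makes only some characters break.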
Latin-1 is oriented toward the Latin alphabet (which is fine if that is all you need), whereas UTF-8 can represent any Unicode character, not just those used in Western European languages.
The two encode the ASCII range identically but differ for everything else. Take a look at the character tables for UTF-8 and Latin-1.
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 7 years ago.
Using the BinaryWriter I write an sbyte variable to a file but it gets written to the file as unsigned and not signed as it should be.
sbyte a = (sbyte)image.pixelData[i + 0];
bw.Write(a);
For example, the above code writes values ranging from 0x00 to 0xFF. (These are the values I see in a hex editor.)
You are demonstrating a fundamental misunderstanding of how data is stored in memory, files, etc.
All computer data is in binary form.
The different data types determine how the data is treated, how calculations are performed, how values are formatted, etc.
If you write a signed value to a file, it is always written using binary form (the only format a computer understands). If you read that data with a hex editor, then the hex editor will translate the data using whatever translation it considers to be appropriate.
If you write data to a file as a signed byte, and then you read that same data back as a signed byte, then the data will be the same as the data written. You should expect no more, and no less.
(Note: If you use the hex editor I wrote (Cygnus Hex Editor), you can inspect the data using any data type. In that case, you can have it appear in the format you expect. Otherwise, the hex editor is converting it to hex, or whatever, which tells you nothing about how the data is stored in the file.)
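A quick way to convince yourself, sketched with a MemoryStream instead of a real file (SByteRoundTrip is a made-up helper): the sbyte value -1 is stored as the single byte 0xFF, and reading that byte back as an sbyte gives -1 again.

```csharp
using System.IO;

public static class SByteRoundTrip
{
    // Writes one sbyte with BinaryWriter and returns the raw bytes produced.
    public static byte[] Write(sbyte value)
    {
        using (var ms = new MemoryStream())
        using (var bw = new BinaryWriter(ms))
        {
            bw.Write(value);
            bw.Flush();
            return ms.ToArray();
        }
    }

    // Reads the first byte of the data back as a signed byte.
    public static sbyte Read(byte[] data)
    {
        using (var br = new BinaryReader(new MemoryStream(data)))
        {
            return br.ReadSByte();
        }
    }
}
```

The bit pattern on disk is the same either way; only the type you read it back with decides whether 0xFF means 255 or -1.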
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 7 years ago.
I have a bitmask of 200 bits stored as a hexadecimal value.
In order to apply the & bit operator, I first have to convert the hex to an integer, but 200 bits is too big for a UInt64. So my question is: how do I split my bitmask into 4 different hexadecimal values without losing data?
That way I can also split my 200-bit data and then compare every chunk of data with the corresponding chunk of the bitmask without altering the result.
You can use the BigInteger from System.Numerics (it's a separate assembly):
BigInteger bi = BigInteger.Parse("01ABC000000000000000000000000000000000", System.Globalization.NumberStyles.HexNumber);
VERY IMPORTANT: prepend a "0" to the hex number, because BigInteger.Parse("F", NumberStyles.HexNumber) == -1, while BigInteger.Parse("0F", NumberStyles.HexNumber) == 15. Without the leading zero, a first digit of 8 or above is read as a sign bit.
BigInteger implements the "classical" bitwise operators (&, |, ^).
Requires .NET 4.0
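Putting that together, a minimal sketch might look like this (BitMask200, ParseUnsignedHex and ApplyMask are illustrative names, and it assumes both inputs are plain hex strings without a "0x" prefix):

```csharp
using System.Globalization;
using System.Numerics;

public static class BitMask200
{
    // Prepends "0" so the value is always parsed as non-negative.
    public static BigInteger ParseUnsignedHex(string hex)
    {
        return BigInteger.Parse("0" + hex, NumberStyles.HexNumber);
    }

    // ANDs the data with the mask; no splitting into chunks is needed.
    public static BigInteger ApplyMask(string dataHex, string maskHex)
    {
        return ParseUnsignedHex(dataHex) & ParseUnsignedHex(maskHex);
    }
}
```

A 200-bit value is simply a 50-digit hex string here, so the whole mask can be applied in one operation.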
The most efficient way of achieving this is to write a class that can store 200 bits of data and perform binary operations on them, take strings as input, etc.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
How can I encrypt and decrypt text without using a Base64 string?
I don't want to use a Base64 string because the encrypted text should not contain any special characters such as #, $, /, \, |, =, %, ^
Well the obvious approach if you don't want to use base64 is to use base16 - i.e. hex.
There are plenty of examples of converting between a byte array and a hex string representation on Stack Overflow. (BitConverter.ToString(data).Replace("-", "") is an inefficient way of performing the conversion to a string; there's nothing quite as simple for the reverse, but it's not much code.)
EDIT: As noted in comments, SoapHexBinary has a simple way of doing this. You may wish to wrap the use of that class in a less SOAP-specific type, of course :)
Of course that will use rather more space than base64. One alternative is to use base64, but using a different set of characters: find 65 characters you can use (the 65th is for padding) and encode it that way. (You may find there's a base64 library available which allows you to specify the characters to use, but if not it's pretty easy to write.)
Do not try to just use a normal Encoding - it's not appropriate for data which isn't fundamentally text.
EDIT: As noted in comments, you can use base32 as well. That can be case-insensitive (potentially handy) and you can avoid I/1 and O/0 for added clarity. It's harder to code and debug though.
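For the hex route, a minimal sketch could look like the following. HexCodec is a made-up name, and the byte array is assumed to be whatever your encryption routine produced; this only handles the text representation, not the encryption itself.

```csharp
using System;

public static class HexCodec
{
    // Simple (not the fastest) bytes-to-hex conversion, e.g. { 0x00, 0xAB } -> "00AB".
    public static string Encode(byte[] data)
    {
        return BitConverter.ToString(data).Replace("-", "");
    }

    // Reverse conversion: every two hex digits become one byte.
    public static byte[] Decode(string hex)
    {
        if (hex.Length % 2 != 0)
            throw new FormatException("Hex string must have an even number of digits.");

        var result = new byte[hex.Length / 2];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = Convert.ToByte(hex.Substring(2 * i, 2), 16);
        }
        return result;
    }
}
```

The output only ever contains 0-9 and A-F, so none of the characters you listed can appear, at the cost of two characters per byte.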
There's a great example in the MySQL Connector source code for the ASP.NET membership provider implementation. It may be a little hassle to download and research, but it has a well-established encryption and decryption module in there.
http://dev.mysql.com/downloads/connector/net/#downloads
Choose the 'source code' option before downloading.
If you want encoding/decoding for data transmission or condensed character storage, you should edit your question. Answers given to an encoding question will be much different than answers given to an encryption/decryption question.