Encode UTF8 text to Unicode C# [duplicate] - c#

This question already has answers here:
How to convert a UTF-8 string into Unicode?
(4 answers)
Closed 3 years ago.
How to encode UTF8 text to Unicode?
string text_txt = "пÑивеÑ";
byte[] bytesUtf8 = Encoding.Default.GetBytes(text_txt);
text_txt = Encoding.UTF8.GetString(bytesUtf8);
The problem is output: п�?иве�
I need output: привет
Using that site: https://www.branah.com/unicode-converter — if you enter "пÑивеÑ" into the "UTF-8 text (Example: a 中 Я)" box, it will show you "привет" as Unicode text.
Please give some advice, thanks.

byte[] utf8Bytes = new byte[text_txt.Length];
for (int i = 0; i < text_txt.Length; ++i)
{
    //Debug.Assert(0 <= text_txt[i] && text_txt[i] <= 255, "the char must be in byte's range");
    utf8Bytes[i] = (byte)text_txt[i];
}
text_txt = Encoding.UTF8.GetString(utf8Bytes, 0, text_txt.Length);
from answer: How to convert a UTF-8 string into Unicode?

Well, you probably mean this:
// Forward: given in UTF-8 represented in WIN-1252
byte[] data = Encoding.UTF8.GetBytes("привет");
string text = Encoding.GetEncoding(1252).GetString(data);
// Reverse: given in WIN-1252 represented in UTF-8
byte[] reversedData = Encoding.GetEncoding(1252).GetBytes("привет");
string reversedText = Encoding.UTF8.GetString(reversedData);
Console.WriteLine($"{string.Join(" ", data)} <=> {text}");
Console.WriteLine(reversedText);
Outcome:
208 191 209 128 208 184 208 178 208 181 209 130 <=> привет
привет
Please note that you've omitted the € and ‚ characters:
Ð¿Ñ Ð¸Ð²ÐµÑ - actual string
Ð¿Ñ€Ð¸Ð²ÐµÑ‚ - what it should be
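Note that on .NET Core / .NET 5+ the Windows-1252 code page is not registered by default, so Encoding.GetEncoding(1252) throws until you register the code-pages provider (a one-time setup step, assuming the System.Text.Encoding.CodePages package is referenced):
// Run once at startup before calling Encoding.GetEncoding(1252)
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);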

You need to be explicit about the type of encoding you're using to convert to bytes (System.Text.Encoding.UTF8.GetBytes), e.g.:
using System;
using System.Text;

public class Program {
    public static void Main() {
        string text_txt = "пÑивеÑ";
        byte[] bytesUtf8 = Encoding.UTF8.GetBytes(text_txt);
        text_txt = Encoding.UTF8.GetString(bytesUtf8);
        Console.WriteLine(text_txt);
    }
}
This way UTF-8 is used both to encode and decode the string, so you ensure that the same string comes back from the GetString method.
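As a quick illustration of that point (a minimal sketch, not code from the question): when the same encoding is used on both sides, the text always round-trips losslessly.
string original = "привет";
byte[] bytes = Encoding.UTF8.GetBytes(original);
string roundTripped = Encoding.UTF8.GetString(bytes);
Console.WriteLine(roundTripped == original); // True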

Related

how to convert a hex value from a byte array as an interpreted ASCII number into an integer?

I'm just starting with C# programming and, as the heading describes, I'm looking for a way to convert a number passed to me as ASCII characters in a byte[] into an integer. I often find how to convert a hex byte to an ASCII char or string, and also the other direction, getting the hex byte from a char. Maybe I should also say that I have the values displayed in a textbox for control.
As an example:
hex code: 30 36 38 31
ASCII string: (0) 6 8 1
Integer (dec) should be: 681
So far I have tried all sorts of things. I also couldn't find it on the Microsoft Visual Studio website. Actually this should be relatively simple. I am sorry for my missing basics in C#.
Putting together this hex-to-string answer and this integer parsing answer, we get the following:
// hex -> byte array -> string (requires using System.Linq;)
var hexBytes = "30 36 38 31";
var bytes = hexBytes.Split(' ')
    .Select(hb => Convert.ToByte(hb, 16)) // converts string -> byte using base 16
    .ToArray();
var asciiStr = System.Text.Encoding.ASCII.GetString(bytes);
// parse string as integer
int x = 0;
if (Int32.TryParse(asciiStr, out x))
{
    Console.WriteLine(x); // write to console
}
else
{
    Console.WriteLine("{0} is not a valid integer.", asciiStr); // invalid number, write error to console
}
A typical solution for this problem is a Linq query. We should:
Split the initial string into items.
Convert each item to int, treating the item as hexadecimal. We subtract '0' since each item is not the digit itself but its ASCII code.
Aggregate the items into the final integer.
Code:
using System.Linq;
...
string source = "30 36 38 31";
int result = source
.Split(' ')
.Select(item => Convert.ToInt32(item, 16) - '0')
.Aggregate((sum, item) => sum * 10 + item);
If you want to obtain the ASCII string, you can:
Split the string.
Convert each item into a char.
Join the chars back into a string:
Code:
string source = "30 36 38 31";
string asciiString = string.Join(" ", source
.Split(' ')
.Select(item => (char)Convert.ToInt32(item, 16)));
To convert a byte array containing ASCII codes to an integer:
byte[] data = {0x30, 0x36, 0x38, 0x31};
string str = Encoding.ASCII.GetString(data);
int number = int.Parse(str);
Console.WriteLine(number); // Prints 681
To convert an integer to a 4-byte array containing ASCII codes (only works if the number is <= 9999 of course):
int number = 681;
byte[] data = Encoding.ASCII.GetBytes(number.ToString("D4"));
// data[] now contains 30h, 36h, 38h, 31h
Console.WriteLine(string.Join(", ", data.Select(b => b.ToString("x"))));

How to achieve Base64 URL safe encoding in C#? [duplicate]

I want to achieve Base64 URL-safe encoding in C#. In Java, we have the common Codec library which gives me a URL-safe encoded string. How can I achieve the same using C#?
byte[] toEncodeAsBytes = System.Text.ASCIIEncoding.ASCII.GetBytes("StringToEncode");
string returnValue = System.Convert.ToBase64String(toEncodeAsBytes);
The above code converts it to Base64, but it pads with ==. Is there a way to achieve URL-safe encoding?
It is common to simply swap the alphabet for use in URLs, so that no %-encoding is necessary; only 3 of the 65 characters are problematic: +, / and =. The most common replacements are - in place of + and _ in place of /. As for the padding: just remove it (the =); you can infer the amount of padding needed. At the other end: just reverse the process:
string returnValue = System.Convert.ToBase64String(toEncodeAsBytes)
.TrimEnd(padding).Replace('+', '-').Replace('/', '_');
with:
static readonly char[] padding = { '=' };
and to reverse:
string incoming = returnValue
    .Replace('_', '/').Replace('-', '+');
switch (returnValue.Length % 4) {
    case 2: incoming += "=="; break;
    case 3: incoming += "="; break;
}
byte[] bytes = Convert.FromBase64String(incoming);
string originalText = Encoding.ASCII.GetString(bytes);
The interesting question, however, is: is this the same approach that the "common codec library" uses? It would certainly be a reasonable first thing to test - this is a pretty common approach.
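For convenience, here is the same approach gathered into a pair of helper methods (a sketch only; the class and method names are mine):
static class UrlSafeBase64
{
    static readonly char[] padding = { '=' };

    // Encode: standard base64, then strip '=' and swap the two URL-unsafe characters.
    public static string Encode(byte[] data) =>
        Convert.ToBase64String(data).TrimEnd(padding).Replace('+', '-').Replace('/', '_');

    // Decode: swap the characters back and re-append the stripped padding.
    public static byte[] Decode(string urlSafe)
    {
        string incoming = urlSafe.Replace('_', '/').Replace('-', '+');
        switch (urlSafe.Length % 4)
        {
            case 2: incoming += "=="; break;
            case 3: incoming += "="; break;
        }
        return Convert.FromBase64String(incoming);
    }
}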
You can use class Base64UrlEncoder from namespace Microsoft.IdentityModel.Tokens.
const string StringToEncode = "He=llo+Wo/rld";
var encodedStr = Base64UrlEncoder.Encode(StringToEncode);
var decodedStr = Base64UrlEncoder.Decode(encodedStr);
if (decodedStr == StringToEncode)
    Console.WriteLine("It works!");
else
    Console.WriteLine("Dangit!");
Microsoft.IdentityModel.Tokens is a NuGet package that has to be downloaded.
Another option, if you are using ASP.NET Core, would be to use Microsoft.AspNetCore.WebUtilities.WebEncoders.Base64UrlEncode.
If you are not using ASP.NET Core, the WebEncoders source is available under the Apache 2.0 License.
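A minimal sketch of that option (assuming the Microsoft.AspNetCore.WebUtilities package is available):
using Microsoft.AspNetCore.WebUtilities;
...
byte[] payload = Encoding.UTF8.GetBytes("He=llo+Wo/rld");
string token = WebEncoders.Base64UrlEncode(payload); // output contains no '+', '/' or '='
byte[] roundTrip = WebEncoders.Base64UrlDecode(token); // padding is restored internally
Console.WriteLine(Encoding.UTF8.GetString(roundTrip)); // He=llo+Wo/rld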
Based off the answers here with some performance improvements, we've published a very easy to use url-safe base64 implementation to NuGet with the source code available on GitHub (MIT licensed).
Usage is as easy as
var bytes = Encoding.UTF8.GetBytes("Foo");
var encoded = UrlBase64.Encode(bytes);
var decoded = UrlBase64.Decode(encoded);
To get a URL-safe base64-like encoding, but not "base64url" according to RFC 4648, use System.Web.HttpServerUtility.UrlTokenEncode(bytes) to encode, and System.Web.HttpServerUtility.UrlTokenDecode(token) to decode (note that decoding takes the encoded string, not bytes).
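A short sketch of what that looks like (System.Web, so .NET Framework only):
using System.Web; // reference System.Web.dll
...
byte[] payload = Encoding.UTF8.GetBytes("StringToEncode");
string token = HttpServerUtility.UrlTokenEncode(payload); // uses '-', '_' and a trailing digit instead of '+', '/' and '='
byte[] roundTrip = HttpServerUtility.UrlTokenDecode(token);
Console.WriteLine(Encoding.UTF8.GetString(roundTrip)); // StringToEncode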
Simplest solution:
(with no padding)
private static string Base64UrlEncode(string input) {
    var inputBytes = System.Text.Encoding.UTF8.GetBytes(input);
    // Special "url-safe" base64 encode.
    return Convert.ToBase64String(inputBytes)
        .Replace('+', '-') // replace URL unsafe characters with safe ones
        .Replace('/', '_') // replace URL unsafe characters with safe ones
        .Replace("=", ""); // no padding
}
Credit goes to: Tholle
public string Decode(string str)
{
    byte[] decbuff = Convert.FromBase64String(str.Replace(",", "=").Replace("-", "+").Replace("_", "/"));
    return System.Text.Encoding.UTF8.GetString(decbuff);
}
public string Encode(string input)
{
    byte[] encbuff = Encoding.UTF8.GetBytes(input ?? "");
    return Convert.ToBase64String(encbuff).Replace("=", ",").Replace("+", "-").Replace("/", "_");
}
This is the way to do it to align with JavaScript!
Here is another way to decode a URL-safe base64 string that was encoded the same way as in Marc's answer. I just don't get why 4 - length % 4 works (it does).
Only when the origin's bit length is a common multiple of 6 and 8 does base64 not append "=" to the result. Compare the 8-bit grouping (top) with the 6-bit grouping (bottom):
1 2 3 4 5 6 7 8|1 2 3 4 5 6 7 8|1 2 3 4 5 6 7 8
1 2 3 4 5 6|1 2 3 4 5 6|1 2 3 4 5 6|1 2 3 4 5 6
(one leftover byte gets "==" appended, two leftover bytes get "=")
So we can work backwards: if the unpadded result's bit length is not divisible by 8, padding was stripped and has to be re-appended:
base64String = base64String.Replace("-", "+").Replace("_", "/");
var base64 = Encoding.ASCII.GetBytes(base64String);
var padding = base64.Length * 3 % 4; // (base64.Length * 6 % 8) / 2
if (padding != 0)
{
    base64String = base64String.PadRight(base64String.Length + padding, '=');
}
return Convert.FromBase64String(base64String);
Karanvir Kang's answer is a good one and I voted for it. However, it does leave an odd character on the end of the string (indicating the number of padding characters removed). Here is my solution.
var bytesToEncode = System.Text.Encoding.UTF8.GetBytes("StringToEncode");
var bytesEncodedPadded = HttpServerUtility.UrlTokenEncode(bytesToEncode);
var objectIdBase64 = bytesEncodedPadded.Substring(0, bytesEncodedPadded.Length - 1);
Using Microsoft cryptographic engine in UWP.
uint length = 32;
IBuffer buffer = CryptographicBuffer.GenerateRandom(length);
string base64Str = CryptographicBuffer.EncodeToBase64String(buffer)
// ensure url safe
.TrimEnd('=').Replace('+', '-').Replace('/', '_');
return base64Str;

ulong.Parse(string, NumberStyles) Exception C#

I've been working on this "string to binary" method for longer than usual and I have no idea where I'm going wrong.
I have already searched the internet for a solution but nothing seems to work the way it is supposed to.
public static string hexToBin(string strValue)
{
    byte[] hexThis = ASCIIEncoding.ASCII.GetBytes(strValue.ToString());
    string thiI = ToHex(strValue);
    ulong number = UInt64.Parse(*string*, System.Globalization.NumberStyles.HexNumber);
    byte[] bytes = BitConverter.GetBytes(number);
    string binaryString = string.Empty;
    foreach (byte singleByte in bytes)
    {
        binaryString += Convert.ToString(singleByte, 2);
    }
    return binaryString;
}
ToHex(string) takes a string and returns its hex representation.
But all I keep getting is "Input string was not in a correct format." at the ulong.Parse(string, NumberStyle) call, and no matter what my inputs are I keep getting the FormatException "Input string was not in a correct format." error.
The inputs and their outputs:
string: format exception - "Hello"
hex: format exception - "48 65 6C 6C 6F"
byte[]: format exception - { 72, 101, 108, 108, 111 }
I have also tried using the "Hello" string, but it threw the same error.
Would you please let me know what I'm doing wrong here?
I have also tried clean/build/rebuild and restarting Visual Studio, but I keep getting the same format exception.
EDIT: I used UInt64.Parse(), not ulong.Parse(), and the string used is "Hello" without quotation marks.
EDIT #2: I did this based on knittl's suggestion and used Convert.ToUInt64 instead of Parse, but I'm still getting the same error:
ulong binary;
string binThis;
byte[] ByteThis;
binThis = "Hello";
ByteThis = ASCIIEncoding.ASCII.GetBytes(binThis);
binary = Convert.ToUInt64(ByteThis);
Console.WriteLine(binary);
The CurrentCulture is set to en-US and I am also using an en-US keyboard.
EDIT #3 - Solved
Thanks to knittl, the solution is as follows:
string thestring = "example";
string[] finale = new string[thestring.Length];
foreach (var c in ByteThis)
{
    for (int i = 0; i < ByteThis.Length; i++)
    {
        thestring = Convert.ToString(c, 2);
        thestring = "0" + thestring;
        if (thestring.Length == 9)
            thestring = thestring.Remove(0, 1); // Remove returns a new string; assign it back
        finale[i] = thestring;
        Console.WriteLine(finale[i]);
    }
}
The final loop is there to check the solution.
This question aimed to get the binary representation of a given string.
It's not totally clear what your method should do (i.e. what format the input string is in: is it a base-10 number, or already a hexadecimal number?).
If it's a hexadecimal number, use ulong.Parse(inputStr, NumberStyles.HexNumber). If not, simply use ulong.Parse(inputStr). Note that NumberStyles.HexNumber does not allow the 0x prefix (Convert.ToUInt64(inputStr, 16) does, however).
Then, once you have parsed your input string to a number, simply use Convert.ToString(number, 2) to convert it to base 2. You will notice that there is no overload that takes a ulong and an int, but you can simply cast your number to a (signed) long, since the binary representation is identical between the two (cf. two's complement). So, in effect, Convert.ToString((long)number, 2).
No need for complicated loops and conversions to byte arrays.
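A short sketch of those two steps; the sample value is mine ("Hello" written as a spaceless hex string):
using System.Globalization;
...
string inputStr = "48656C6C6F"; // hex for "Hello", no spaces, no 0x prefix
ulong number = ulong.Parse(inputStr, NumberStyles.HexNumber);
string binary = Convert.ToString((long)number, 2); // cast is safe: same bit pattern
Console.WriteLine(binary); // binary form of 0x48656C6C6F, leading zeros dropped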
Bonus answer.
If you are not too concerned with performance, you can even use a LINQ one-liner:
Encoding.ASCII.GetBytes(inputStr).Aggregate(
new StringBuilder(),
(sb, ch) => sb.Append(Convert.ToString(ch, 2).PadLeft(8, '0')),
sb => sb.ToString());

How to convert Hex to Chinese ASCII character in c#? [duplicate]

I have the following code to convert from HEX to ASCII.
//Hexadecimal to ASCII Conversion
private static string hex2ascii(string hexString)
{
    MessageBox.Show(hexString);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i <= hexString.Length - 2; i += 2)
    {
        sb.Append(Convert.ToString(Convert.ToChar(Int32.Parse(hexString.Substring(i, 2), System.Globalization.NumberStyles.HexNumber))));
    }
    return sb.ToString();
}
input hexString = D3FCC4A7B6FABBB7
output return = Óüħ¶ú»·
The output that I need is 狱魔耳环, but I am getting Óüħ¶ú»· instead.
How would I make it display the correct string?
First, convert the hex string to a byte[], e.g. using the code at "How do you convert Byte Array to Hexadecimal String, and vice versa?". Then use System.Text.Encoding.Unicode.GetString(myArray) to convert it to a string. Use the proper encoding: it might not be Unicode, but judging from your example it is a 16-bit encoding, which, incidentally, is not "ASCII" (ASCII is 7-bit).
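A minimal sketch of that advice, assuming the bytes are GB2312/GBK-encoded (an assumption, but the sample decodes to simplified Chinese, which suggests it):
using System;
using System.Linq;
using System.Text;
...
string hex = "D3FCC4A7B6FABBB7";
byte[] bytes = Enumerable.Range(0, hex.Length / 2)
    .Select(i => Convert.ToByte(hex.Substring(i * 2, 2), 16))
    .ToArray();
// On .NET Core / .NET 5+, register the code-pages provider first
// (System.Text.Encoding.CodePages package):
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
string text = Encoding.GetEncoding("GB2312").GetString(bytes);
Console.WriteLine(text); // should print 狱魔耳环 if the GB2312 assumption holds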

How do I convert a set of hexadecimal bytes to a custom string

So basically I need to find a way to convert this: 29 38 33 30 3D 34 FF, to this: Zidane
FF being the character that marks the end of the name.
What I've got so far is that I can read that as its literal string, )830=4ÿ, which isn't at all user friendly for what I'm trying to create.
Now just by that one name alone you can guess what I'm working on, but this is the only thing I seem to be getting stuck on is the whole custom character string.
This is the code to get a string from a hex string:
private string HexString2Ascii(string hexString)
{
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i <= hexString.Length - 2; i += 2)
    {
        sb.Append(Convert.ToString(Convert.ToChar(Int32.Parse(hexString.Substring(i, 2), System.Globalization.NumberStyles.HexNumber))));
    }
    return sb.ToString();
}
I will be happy to help if you explain what format the input string is in: hex or a byte array.
Ah, I see what is happening. Remember that when using hex you may be dealing with Unicode, Shift JIS and, in your case, little-endian data. As I understand it, it looks like you have an incorrect hex table for what you are currently trying to read. Sorry if my answer doesn't help enough.
If you decode the hex values and expect the output to be ASCII encoded, then you get exactly what you state above, as seen using this online hex decoder.
The string is obviously not ASCII encoded. I can't be certain of the exact encoding, but by looking at the values, the expected output and the differences between the values, you can predict how to map values to letters:
A-Z = 0x04 - 0x29
e.g. 'A' = 04, 'B' = 05, ..., 'Z' = 29
a-z = 0x30 - 0x55
e.g. 'a' = 30, 'b' = 31, ..., 'z' = 55
This should be enough to get you a readable string.
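A rough sketch of that idea using a lookup table. The table below is hypothetical and only partially filled in: the lowercase letters follow the offsets suggested above ('a' = 0x30 ... 'z' = 0x55), 0x29 is mapped to 'Z' because that is what the sample name implies, and the remaining entries would have to come from the game's real character map.
using System;
using System.Collections.Generic;
using System.Text;

class CustomTableDecoder
{
    static readonly Dictionary<byte, char> Table = BuildTable();

    static Dictionary<byte, char> BuildTable()
    {
        var table = new Dictionary<byte, char> { [0x29] = 'Z' }; // placeholder entry implied by the sample
        for (byte b = 0x30; b <= 0x55; b++)
            table[b] = (char)('a' + (b - 0x30)); // 'a' = 0x30 ... 'z' = 0x55
        return table;
    }

    static string Decode(byte[] data)
    {
        var sb = new StringBuilder();
        foreach (byte b in data)
        {
            if (b == 0xFF) break; // 0xFF terminates the name
            sb.Append(Table.TryGetValue(b, out char c) ? c : '?');
        }
        return sb.ToString();
    }

    static void Main()
    {
        byte[] raw = { 0x29, 0x38, 0x33, 0x30, 0x3D, 0x34, 0xFF };
        Console.WriteLine(Decode(raw)); // Zidane
    }
}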
