OK, so I'm making an ASCII-to-hex converter and it works fine, but when I insert line breaks it replaces them with this character -> Ú
i.e., it turns this:
1
2
3
into this:
1Ú2Ú3
Code under the command buttons:
private void asciiToHex_Click(object sender, EventArgs e)
{
    HexConverter HexConvert = new HexConverter();
    string sData = textBox1.Text;
    textBox2.Text = HexConvert.StringToHexadecimal(sData);
}

private void hexToAscii_Click(object sender, EventArgs e)
{
    HexConverter HexConvert = new HexConverter();
    string sData = textBox1.Text;
    textBox2.Text = HexConvert.HexadecimalToString(sData);
}
Code in HexConverter.cs:
public class HexConverter
{
    public string HexadecimalToString(string Data)
    {
        string Data1 = "";
        string sData = "";
        //first take two hex digits using substring.
        //then convert the hex value into an ASCII code.
        //then convert the ASCII code into a character.
        while (Data.Length > 0)
        {
            Data1 = System.Convert.ToChar(System.Convert.ToUInt32(Data.Substring(0, 2), 16)).ToString();
            sData = sData + Data1;
            Data = Data.Substring(2, Data.Length - 2);
        }
        return sData;
    }

    public string StringToHexadecimal(string Data)
    {
        //first take each character.
        //then convert the character into its ASCII code.
        //then convert the ASCII code into hex format.
        string sValue;
        string sHex = "";
        foreach (char c in Data.ToCharArray())
        {
            sValue = String.Format("{0:X}", Convert.ToUInt32(c));
            sHex = sHex + sValue;
        }
        return sHex;
    }
}
Any ideas?
The problem is that String.Format("{0:X}", Convert.ToUInt32(c)) does not zero-pad its output to two digits, so \r\n becomes DA instead of 0D0A. You'll get a similar problem, but worse, with \t (which becomes 9 instead of 09, which will cause misalignment for subsequent characters as well).
To zero-pad to two digits, you can use X2 instead of bare X; or, more generally, you can use Xn to zero-pad to n digits. (See the "Standard Numeric Format Strings" page on MSDN.)
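Here is a minimal sketch of the question's StringToHexadecimal with that one change applied. The class name PaddedHexEncoder is hypothetical. The sketch assumes every character code fits in two hex digits (0x00 to 0xFF), which holds for ASCII text.

```csharp
using System;
using System.Text;

// Hypothetical helper: the question's encoder with "X2" instead of "X".
// Assumes every character code is <= 0xFF, so two digits are always enough.
public static class PaddedHexEncoder
{
    public static string StringToHexadecimal(string data)
    {
        var sb = new StringBuilder(data.Length * 2);
        foreach (char c in data)
        {
            // "X2" zero-pads to two digits: '\r' -> "0D", '\n' -> "0A", '\t' -> "09"
            sb.Append(((int)c).ToString("X2"));
        }
        return sb.ToString();
    }
}
```

Each character now produces exactly two digits, so the question's HexadecimalToString stays aligned. It turns "310D0A32" back into "1", a line break, and "2".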
Instead of System.Convert.ToUInt32(hexString, 16), use
uint.Parse(hexString, System.Globalization.NumberStyles.AllowHexSpecifier);
MSDN says the "AllowHexSpecifier flag indicates that the string to be parsed is always interpreted as a hexadecimal value"
How to: Convert Between Hexadecimal Strings and Numeric Types
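Below is a hedged sketch of a decoder that follows this advice. It parses each two-digit pair with uint.Parse and NumberStyles.AllowHexSpecifier. HexPairDecoder is a hypothetical name. Rejecting odd-length input up front is an added assumption, not part of the original code.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical decoder: reads the input two hex digits at a time.
public static class HexPairDecoder
{
    public static string HexadecimalToString(string data)
    {
        // Each character was encoded as exactly two digits, so the length must be even
        if (data.Length % 2 != 0)
            throw new FormatException("Hex input must have an even number of digits.");

        var sb = new StringBuilder(data.Length / 2);
        for (int i = 0; i < data.Length; i += 2)
        {
            // AllowHexSpecifier: the pair is always interpreted as hexadecimal
            uint code = uint.Parse(data.Substring(i, 2), NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture);
            sb.Append((char)code);
        }
        return sb.ToString();
    }
}
```

Decoding "DA" yields U+00DA (Ú). That is exactly the character the unpadded encoder produced for "\r\n".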
The laziest thing you could do is call string.Replace("Ú", "\r\n") on the result. Unless there were a compelling reason not to do it this way, I would start here.
Otherwise, in your char loop, look for the newline characters and add them as-is to your string.
This question may reveal my ignorance regarding character encoding, so if it does, I would greatly appreciate information to correct that.
I am relaying strings from new applications to an old application. The old application only accepts ASCII characters (http://www.asciitable.com/). The old application also does not support certain characters such as backslashes. The new applications support more or less anything.
Let's say I have the string:
"Whatever - 1_夜_💦💦💦"
I need to convert that to something with only ASCII characters. For example, maybe something like:
"Whatever - 1_\u001cY_=???=???=???"
Then I want to replace the remaining illegal characters with substitution strings.
Ideally, any string that is encoded to ASCII should be decodable. That is, distinct input strings must produce distinct output strings (no two different inputs, such as "abc" and "xyz", may produce the same result), so that an algorithm could convert the output string back to the input string.
This is what I've tried:
static string ConvertToAscii(string str)
{
    var return_string = "";
    foreach (var c in str)
    {
        if ((int)c < 128)
        {
            return_string += c;
        }
        else
        {
            var charBytes = BitConverter.GetBytes(c);
            var ascii = Encoding.ASCII.GetString(charBytes);
            return_string += ascii;
        }
    }
    return return_string;
}
When I use this with the string I mentioned above, I get:
"Whatever - 1_\u001cY_=???=???=???"
That seems great - however, the "\u001cY" is apparently a single character, rather than a collection of ASCII characters. So my target database rejects it, and I am not able to figure out how to remove the "\" while leaving the remaining characters.
How can I convert any string into a collection of ASCII characters?
The easiest approach is to Base64-encode all the bytes, since you don't seem to care how strings are represented:
Convert.ToBase64String( Encoding.Unicode.GetBytes("Whatever - 1_夜_💦💦💦"))
will produce a result that is guaranteed to be ASCII (even printable ASCII). For your string the result would be "VwBoAGEAdABlAHYAZQByACAALQAgADEAXwAcWV8APdim3D3Yptw92Kbc".
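For completeness, here is a sketch of both directions of the Base64 approach. Base64Ascii is a hypothetical helper name. The encoded text only uses A-Z, a-z, 0-9, '+', '/' and '=', so it contains no backslashes. Decoding restores the original string exactly.

```csharp
using System;
using System.Text;

// Hypothetical helper: UTF-16 bytes <-> Base64 text (always printable ASCII).
public static class Base64Ascii
{
    public static string Encode(string text) =>
        Convert.ToBase64String(Encoding.Unicode.GetBytes(text));

    public static string Decode(string ascii) =>
        Encoding.Unicode.GetString(Convert.FromBase64String(ascii));
}
```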
Here is code similar to what I ended up using to convert everything to ASCII:
internal static string ConvertToAscii(string str)
{
    var returnStringBuilder = new StringBuilder();
    foreach (var c in str)
    {
        if (char.IsControl(c))
        {
            // Skip control characters entirely (this also drops \r, \n and \t)
            continue;
        }
        if (c < 127)
        {
            // Printable ASCII character: keep as-is
            returnStringBuilder.Append(c);
        }
        else
        {
            // Anything else: write its UTF-16 code unit as U+XXXX
            returnStringBuilder.Append("U+" + ((int) c).ToString("X4"));
        }
    }
    return returnStringBuilder.ToString();
}
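The code above is not reversible, although the question asked for that. Control characters are dropped, and an input that already contains text such as "U+0041" cannot be told apart from an escaped character.

Here is a minimal reversible sketch. It assumes '~' is acceptable to the old application as an escape character; that choice and the ReversibleAscii name are hypothetical. Printable ASCII passes through unchanged. Everything else, including '~' itself and the backslash, is written as '~' followed by the four hex digits of the UTF-16 code unit.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical reversible encoder; '~' is an assumed escape character.
public static class ReversibleAscii
{
    private const char Escape = '~';

    // Printable ASCII except the escape character and the backslash the old app rejects
    private static bool PassesThrough(char c) =>
        c >= 0x20 && c <= 0x7E && c != Escape && c != '\\';

    public static string Encode(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (PassesThrough(c))
                sb.Append(c);
            else
                sb.Append(Escape).Append(((int)c).ToString("X4")); // e.g. '\n' -> "~000A"
        }
        return sb.ToString();
    }

    public static string Decode(string ascii)
    {
        var sb = new StringBuilder(ascii.Length);
        for (int i = 0; i < ascii.Length; i++)
        {
            if (ascii[i] != Escape)
            {
                sb.Append(ascii[i]);
                continue;
            }
            // The four hex digits after the escape character must all be present
            if (i + 4 >= ascii.Length)
                throw new FormatException("Truncated escape sequence.");
            string hex = ascii.Substring(i + 1, 4);
            sb.Append((char)ushort.Parse(hex, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture));
            i += 4; // skip the digits just consumed
        }
        return sb.ToString();
    }
}
```

Surrogate pairs (such as the emoji in the question) become two escapes. They decode back into the same pair.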
How do I decode the string 'Sch\u00f6nen' (@"Sch\u00f6nen") in C#? I've tried HttpUtility, but it doesn't give me the result I need, which is "Schönen".
Regex.Unescape did the trick:
System.Text.RegularExpressions.Regex.Unescape(@"Sch\u00f6nen");
Note that you need to be careful when testing your variants or writing unit tests: "Sch\u00f6nen" is already "Schönen". You need @ in front of the string literal to keep \u00f6 as literal characters in the string.
If you landed on this question because you see "Sch\u00f6nen" (or similar \uXXXX values in a string constant), it is not an encoding. It is a way to represent Unicode characters as escape sequences, similar to how a string represents a newline with \n and a carriage return with \r.
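Here is a small sketch of that difference; the EscapeDemo name is hypothetical. In a regular literal the compiler turns \u00f6 into one character. The verbatim @-literal keeps six ordinary characters, which Regex.Unescape then converts. Be aware that Regex.Unescape also interprets other regex escapes such as \n and \t, so text containing unrelated backslashes may be altered or rejected.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class contrasting compile-time and run-time unescaping.
public static class EscapeDemo
{
    // The compiler turns \u00f6 into the single character 'ö': 7 chars in total
    public const string Compiled = "Sch\u00f6nen";

    // The verbatim literal keeps the characters \ u 0 0 f 6: 12 chars in total
    public const string Verbatim = @"Sch\u00f6nen";

    // Converts the literal escape sequence into the character it names
    public static string Decode(string escaped) => Regex.Unescape(escaped);
}
```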
I don't think you have to decode.
string unicodestring = "Sch\u00f6nen";
Console.WriteLine(unicodestring);
The output was Schönen.
I wrote some code that converts Unicode escape sequences to actual characters (but the best answer in this topic works fine and is less complex).
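using System;
using System.Text;
using System.Text.RegularExpressions;

string stringWithUnicodeSymbols = @"{""id"": 10440119, ""photo"": 10945418, ""first_name"": ""\u0415\u0432\u0433\u0435\u043d\u0438\u0439""}";
// The pattern has one capturing group, so Regex.Split returns the plain text
// at even indexes and the captured four hex digits at odd indexes.
var splitted = Regex.Split(stringWithUnicodeSymbols, @"\\u([0-9a-fA-F]{4})");
var outString = new StringBuilder();
for (int i = 0; i < splitted.Length; i++)
{
    if (i % 2 == 1)
    {
        // Captured hex digits: convert to the UTF-16 character they name
        outString.Append((char)Convert.ToUInt16(splitted[i], 16));
    }
    else
    {
        // Plain text (even four hex-looking characters) is copied as-is
        outString.Append(splitted[i]);
    }
}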
I was recently working on a project where I needed to convert a regular string of numbers into ASCII hexadecimal and store the hex in a string.
So I had something like
string random_string = "4000124273218347581"
and I wanted to convert it into a hexadecimal string in the form
string hex_string = "34303030313234323733323138333437353831"
This might seem like an oddly specific task but it's one I encountered and, when I tried to find out how to perform it, I couldn't find any answers online.
Anyway, I figured it out and created a class to make things tidier in my code.
In case anyone else needs to convert a regular string into a hexadecimal string I'll be posting an answer in a moment which will contain my solution.
(I'm fairly new to stackoverflow so I hope that doing this is okay)
=========================================
It turns out I can't answer my own question within the first 8 hours of asking, because my reputation isn't high enough.
So I'm sticking my answer here instead:
Okay, so here's my solution:
I created a class called StringToHex in the namespace
using System;
using System.Text;

public class StringToHex
{
    private string localstring;

    public StringToHex(string text)
    {
        localstring = text;
    }

    public string ToAscii()
    {
        /* Use a fresh builder so that calling ToAscii() twice does not append the output twice */
        StringBuilder outputstring = new StringBuilder();
        /* Convert text into an array of characters */
        char[] char_array = localstring.ToCharArray();
        foreach (char letter in char_array)
        {
            /* Get the integral value of the character */
            int value = Convert.ToInt32(letter);
            /* Convert the value to a two-digit hexadecimal string ("X2" zero-pads, e.g. '\n' -> "0A") */
            string hex = String.Format("{0:X2}", value);
            /* Append the hexadecimal version of the char to outputstring */
            outputstring.Append(hex);
        }
        return outputstring.ToString();
    }
}
And to use it you need to do something of the form:
/* Convert string to hexadecimal */
StringToHex an_instance_of_stringtohex = new StringToHex(string_to_convert);
string converted_string = an_instance_of_stringtohex.ToAscii();
If it's working properly, the converted string should be twice the length of the original string (because hex uses two digits to represent each character, as long as every character code is at most 0xFF).
Now, as someone's already pointed out, you can find an article doing something similar here:
http://www.c-sharpcorner.com/UploadFile/Joshy_geo/HexConverter10282006021521AM/HexConverter.aspx
But I didn't find it much help for my specific task and I'd like to think that my solution is more elegant ;)
This works as long as the character codes in the string are not greater than 255 (0xFF):
string hex_string =
String.Concat(random_string.Select(c => ((int)c).ToString("x2")));
Note: This also works for character codes below 16 (0x10). For example, it will produce the hex codes "0d0a" from the line break characters "\r\n", not "da" (use "X2" instead of "x2" for upper-case digits).
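Here is a sketch of the same one-liner using "X2", which gives upper-case digits and matches the expected output in the question, plus the inverse direction. DigitHex is a hypothetical name. The decoder assumes an even-length string of hex digits produced by ToHex.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper pairing the LINQ one-liner with its inverse.
public static class DigitHex
{
    // Two upper-case hex digits per character (character codes must be <= 0xFF)
    public static string ToHex(string s) =>
        string.Concat(s.Select(c => ((int)c).ToString("X2")));

    // Inverse: parse two hex digits at a time back into characters
    public static string FromHex(string hex) =>
        new string(Enumerable.Range(0, hex.Length / 2)
            .Select(i => (char)byte.Parse(hex.Substring(2 * i, 2), NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture))
            .ToArray());
}
```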
You need to read the following article:
http://www.c-sharpcorner.com/UploadFile/Joshy_geo/HexConverter10282006021521AM/HexConverter.aspx
The main function from that article; note that this one converts hex-format data back into characters:
public string Data_Hex_Asc(string Data)
{
    // Data is passed by value (not ref) so the caller's string is not consumed
    string Data1 = "";
    string sData = "";
    //first take two hex digits using substring.
    //then convert the hex value into an ASCII code.
    //then convert the ASCII code into a character.
    while (Data.Length > 0)
    {
        Data1 = System.Convert.ToChar(System.Convert.ToUInt32(Data.Substring(0, 2), 16)).ToString();
        sData = sData + Data1;
        Data = Data.Substring(2, Data.Length - 2);
    }
    return sData;
}
See if this is what you are looking for.
We have a text file that contains the following text:
"\u5b89\u5fbd\u5b5f\u5143"
When we read the file contents in C# .NET, it shows up as:
"\\u5b89\\u5fbd\\u5b5f\\u5143"
Our decoder method is:
public string Decoder(string value)
{
    Encoding enc = new UTF8Encoding();
    byte[] bytes = enc.GetBytes(value);
    return enc.GetString(bytes);
}
When I pass a hard-coded value,
string Output = Decoder("\u5b89\u5fbd\u5b5f\u5143");
it works well, but when we pass a variable instead, it does not work.
This is what we do with the string we get from the text file:
value = (text file content)
string Output = Decoder(value);
It returns the wrong output.
How can I fix this?
Use the code below. It unescapes any escaped characters in the input string:
Regex.Unescape(value);
You could use a regular expression to parse the file:
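// Requires: using System.Globalization; using System.Text.RegularExpressions;
private static Regex _regex = new Regex(@"\\u(?<Value>[0-9a-fA-F]{4})", RegexOptions.Compiled);

public string Decoder(string value)
{
    // Replace each \uXXXX escape with the character whose code is the hex number XXXX
    return _regex.Replace(
        value,
        m => ((char)int.Parse(m.Groups["Value"].Value, NumberStyles.HexNumber)).ToString()
    );
}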
And then:
string data = Decoder(File.ReadAllText("test.txt"));
So your file contains the verbatim string
\u5b89\u5fbd\u5b5f\u5143
in ASCII and not the string represented by those four Unicode codepoints in some given encoding?
As it happens, I just wrote some code in C# that can parse strings in this format for a JSON parser project -- here's a variant that only handles \uXXXX escapes:
private static string ReadSlashedString(TextReader reader) {
    var sb = new StringBuilder(32);
    bool q = false; // true right after a backslash has been read
    while (true) {
        int chrR = reader.Read();
        if (chrR == -1) break;
        var chr = (char) chrR;
        if (!q) {
            if (chr == '\\') {
                q = true;
                continue;
            }
            sb.Append(chr);
        }
        else {
            switch (chr) {
                case 'u':
                case 'U':
                    var hexb = new char[4];
                    // ReadBlock keeps reading until 4 chars arrive or the input ends
                    if (reader.ReadBlock(hexb, 0, 4) != 4)
                        throw new FormatException("Truncated \\u escape");
                    chr = (char) Convert.ToInt32(new string(hexb), 16);
                    sb.Append(chr);
                    break;
                default:
                    throw new Exception("Invalid backslash escape (\\ + charcode " + (int) chr + ")");
            }
            q = false;
        }
    }
    return sb.ToString();
}
And you could use it like:
var str = ReadSlashedString(new StringReader("\\u5b89\\u5fbd\\u5b5f\\u5143"));
(or using a StreamReader to read from a file).
Darin Dimitrov's regexp-utilizing answer is probably faster, but I happened to have this code at hand. :)
UTF8Encoding (or any other encoding) won't translate escape sequences like \u5b89 into the corresponding character.
The reason it works when you pass a string constant is that the C# compiler interprets the escape sequences and translates them into the corresponding characters before the decoder is called (in fact at compile time, before the program is even executed).
You have to write code that recognizes the escape sequences and converts them into the corresponding characters.
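To illustrate, here is the question's Decoder in a hypothetical wrapper class. For any well-formed string, encoding to UTF-8 and decoding again returns the input unchanged. Text read from the file therefore still contains the literal backslash sequences.

```csharp
using System.Text;

// Hypothetical wrapper around the question's Decoder method.
public static class EncodingRoundTrip
{
    // Encode to UTF-8 bytes and straight back: a no-op for well-formed strings,
    // so "\u5b89" written as six literal characters stays six characters.
    public static string Decoder(string value)
    {
        Encoding enc = new UTF8Encoding();
        byte[] bytes = enc.GetBytes(value);
        return enc.GetString(bytes);
    }
}
```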
When you are reading "\u5b89\u5fbd\u5b5f\u5143" you get exactly what you read. The debugger escapes your strings before displaying them. The double backslashes in the string are actually single backslashes that have been escaped.
When you pass your hard-coded value, you are not actually passing in what you see on the screen. You are passing in four Unicode characters, since the C# string literal is unescaped by the compiler.
Darin already posted a way to unescape Unicode characters from the file, so I won't repeat it.
I think this will give you some idea.
string str = "ivandro\u0020";
str = str.Trim();
If you try to print the string, you will notice that the space, which is \u0020, is removed.
I have strings like this:
var a = "abcdefg";
var b = "xxxxxxxx";
The strings are always longer than five characters.
Now I need to trim off the last 3 characters. Is there some simple way that I can do this with C#?
In the trivial case you can just use
result = s.Substring(0, s.Length-3);
to remove the last three characters from the string.
Or, as Jason suggested, Remove is an alternative:
result = s.Remove(s.Length - 3);
Unfortunately, for Unicode strings there can be a few problems:
A Unicode code point can consist of multiple chars, since .NET strings are encoded as UTF-16 (see surrogate pairs). This happens only for characters outside the Basic Multilingual Plane, i.e. those with a code point of 0x10000 (2^16) or above. This is relevant for emoji and for some rarer Chinese characters.
A glyph (graphical symbol) can consist of multiple code points. For example, ä can be written as a followed by a combining ¨.
Behavior with right-to-left writing might not be what you want either.
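One way to address the first two points is to count text elements (user-perceived characters) with System.Globalization.StringInfo instead of UTF-16 chars. Here is a minimal sketch; the name RemoveLastTextElements is hypothetical. It does not address the right-to-left point.

```csharp
using System;
using System.Globalization;

// Hypothetical helper that never splits a surrogate pair or a base + combining-mark sequence.
public static class TextElementTrim
{
    public static string RemoveLastTextElements(string s, int count)
    {
        if (count < 0)
            throw new ArgumentOutOfRangeException(nameof(count));

        var info = new StringInfo(s);
        int keep = info.LengthInTextElements - count;
        if (keep <= 0)
            return string.Empty; // removing everything (or more) leaves an empty string
        return info.SubstringByTextElements(0, keep);
    }
}
```

For example, "ab\U0001F4A6" has a Length of 4 chars, but it is only 3 text elements. Removing one text element gives "ab", whereas s.Remove(s.Length - 1) would leave half of the surrogate pair.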
You want String.Remove(Int32):
Deletes all the characters from this string beginning at a specified position and continuing through the last position.
If you want to perform validation, along the lines of druttka's answer, I would suggest creating an extension method
public static class MyStringExtensions
{
    public static string SafeRemove(this string s, int numCharactersToRemove)
    {
        if (numCharactersToRemove < 0 || numCharactersToRemove > s.Length)
        {
            // ArgumentOutOfRangeException derives from ArgumentException
            throw new ArgumentOutOfRangeException(nameof(numCharactersToRemove));
        }
        // other validation here
        return s.Remove(s.Length - numCharactersToRemove);
    }
}
var s = "123456";
var r = s.SafeRemove(3); //r = "123"
var t = s.SafeRemove(7); //throws ArgumentException
string a = "abcdefg";
a = a.Remove(a.Length - 3);
string newString = oldString.Substring(0, oldString.Length - 3);
If you really only need to trim off the last 3 characters, you can do this
string a = "abcdefg";
if (a.Length > 3)
{
    a = a.Substring(0, a.Length - 3);
}
else
{
    a = String.Empty;
}