trim hex from end of string - c#

I have a byte array that's been initialized with 0xFF in each byte:
for (int i = 0; i < buffer.Length; i++)
{
buffer[i] = 0xFF;
}
Once this byte array has been filled with valid data, I need to extract an ASCII string that's stored at offset 192 and may be up to 32 characters in length. I'm doing this like so:
ASCIIEncoding enc = new ASCIIEncoding();
stringToRead = enc.GetString(buffer, 192, 32);
This works but I need to strip off the trailing bytes that contain 0xFF to avoid the string looking something like "John Smith??????????????????????". Is there a function in .NET that provides this ability? Something like the String.TrimEnd() function perhaps or am I looking at a regex to do this?

I would suggest just finding out how long the string will really be:
int firstFF = Array.IndexOf(buffer, (byte) 0xff, 192);
if (firstFF == -1)
{
firstFF = buffer.Length;
}
stringToRead = Encoding.ASCII.GetString(buffer, 192, firstFF - 192);
I would not try to give Encoding.ASCII bytes which aren't valid ASCII-encoded text. I don't know offhand what it would do with them - I suspect it would convert them to ? to show the error (as suggested by your existing output), but then you wouldn't be able to tell the difference between that and real question marks. For example:
byte[] data = { 0x41, 0x42, 0x43, 0xff, 0xff };
string text = Encoding.ASCII.GetString(data);
Console.WriteLine(text.Contains((char) 0xff)); // False
Console.WriteLine(text.TrimEnd((char) 0xff).Length); // Still 5...
Now you could create an encoding which used some non-ASCII replacement character... but that's a lot of hassle when you can just find where the binary data stops being valid.
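The idea above can be wrapped in a small helper. This is only a sketch: the names PaddedField and Read are made up for illustration, and it additionally assumes that the search should never run past the 32-byte field, so a full-width name followed by other data is not over-read.

```csharp
using System;
using System.Text;

public static class PaddedField
{
    // Reads an ASCII field of at most maxLength bytes starting at offset,
    // stopping at the first padding byte (0xFF unless stated otherwise).
    public static string Read(byte[] buffer, int offset, int maxLength, byte padding = 0xFF)
    {
        // Never look past the field or past the end of the buffer.
        int available = Math.Min(maxLength, buffer.Length - offset);

        // Search only inside the field for the first padding byte.
        int end = Array.IndexOf(buffer, padding, offset, available);

        // No padding found means the field is completely filled.
        int length = end == -1 ? available : end - offset;
        return Encoding.ASCII.GetString(buffer, offset, length);
    }
}
```

With the buffer from the question, the call would be stringToRead = PaddedField.Read(buffer, 192, 32);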

var s = "Whatever" + new String((Char)0xFF, 32);
var trimmed = s.TrimEnd((Char)0xFF);
Alternatively, you can scan the string for the first index of the character, then take the substring:
var index = s.IndexOf((Char)0xFF);
var trimmed = index < 0 ? s : s.Substring(0, index); // IndexOf returns -1 when nothing needs trimming

Related

Unicode Hex String to String

I have a unicode string like this:
0030003100320033
Which should turn into 0123.
This is a simple case, the string 0123, but other inputs contain non-ASCII Unicode characters as well. How can I turn this type of Unicode hex string into a string in C#?
For the normal US character set, the first part is always 00, so 0031 is "1" in ASCII, 0032 is "2", and so on.
When it's an actual Unicode character, like Arabic or Chinese, the first part is not 00; for instance, for Arabic it's 06XX, like 0663.
I need to be able to turn this type of hex string into an ordinary C# string.
There are several encodings that can represent Unicode, of which UTF-8 is today's de facto standard. However, your example is actually a string representation of UTF-16 using the big-endian byte order. You can convert your hex string back into bytes, then use Encoding.BigEndianUnicode to decode this:
public static void Main()
{
var bytes = StringToByteArray("0030003100320033");
var decoded = System.Text.Encoding.BigEndianUnicode.GetString(bytes);
Console.WriteLine(decoded); // gives "0123"
}
// https://stackoverflow.com/a/311179/1149773
public static byte[] StringToByteArray(string hex)
{
byte[] bytes = new byte[hex.Length / 2];
for (int i = 0; i < hex.Length; i += 2)
bytes[i / 2] = Convert.ToByte(hex.Substring(i, 2), 16);
return bytes;
}
Since Char in .NET represents a UTF-16 code unit, this answer should give identical results to Slai's, including for surrogate pairs.
Shorter less efficient alternative:
Regex.Replace("0030003100320033", "....", m => (char)Convert.ToInt32(m + "", 16) + "");
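To illustrate the claim that decoding the bytes as big-endian UTF-16 and casting each four-digit group to a char give the same result, here is a small sketch with both approaches side by side. The class and method names are hypothetical, and the length check is an added assumption about what counts as valid input.

```csharp
using System;
using System.Text;

public static class HexUtf16
{
    // Approach 1: hex -> bytes -> Encoding.BigEndianUnicode.
    public static string DecodeByEncoding(string hex)
    {
        Validate(hex);
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return Encoding.BigEndianUnicode.GetString(bytes);
    }

    // Approach 2: every four hex digits are one UTF-16 code unit, i.e. one char.
    public static string DecodeByCodeUnit(string hex)
    {
        Validate(hex);
        var sb = new StringBuilder(hex.Length / 4);
        for (int i = 0; i < hex.Length; i += 4)
            sb.Append((char)Convert.ToUInt16(hex.Substring(i, 4), 16));
        return sb.ToString();
    }

    private static void Validate(string hex)
    {
        if (hex.Length % 4 != 0)
            throw new ArgumentException("Expected groups of four hex digits.", nameof(hex));
    }
}
```

Both methods turn "0030003100320033" into "0123", and both turn the surrogate pair "D83DDE00" into the single code point U+1F600.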
You should try this solution
public static void Main()
{
string hexString = "0030003100320033"; // Pairs of hex digits
//string hexStrWithDash = "00-30-00-31-00-32-00-33"; // Pairs of hex digits separated by dashes, as produced by BitConverter.ToString()
byte[] data = ParseHex(hexString);
string result = System.Text.Encoding.BigEndianUnicode.GetString(data);
Console.Write("Data: {0}", result);
}
public static byte[] ParseHex(string hexString)
{
hexString = hexString.Replace("-", "");
byte[] output = new byte[hexString.Length / 2];
for (int i = 0; i < output.Length; i++)
{
output[i] = Convert.ToByte(hexString.Substring(i * 2, 2), 16);
}
return output;
}

How do you convert UTF8 number into written text

I am writing a winform to convert written text into Unicode numbers and UTF8 numbers. This bit is working well
//------------------------------------------------------------------------
// Convert to UTF8
// The return will be either 1 byte, 2 bytes or 3 bytes.
//-----------------------------------------------------------------------
UTF8Encoding utf8 = new UTF8Encoding();
StringBuilder builder = new StringBuilder();
string utext = rchtxbx_text.Text;
// do one char at a time
for (int text_index = 0; text_index < utext.Length; text_index++)
{
byte[] encodedBytes = utf8.GetBytes(utext.Substring(text_index, 1));
for (int index = 0; index < encodedBytes.Length; index++)
{
builder.AppendFormat("{0}", Convert.ToString(encodedBytes[index], 16));
}
builder.Append(" ");
}
rchtxtbx_UTF8.SelectionFont = new System.Drawing.Font("San Serif", 20);
rchtxtbx_UTF8.AppendText(builder.ToString() + "\r");
As an example the characters 乘义ש give me e4b998 e4b989 d7a9, note I have a mix LtoR and RtoL text. Now if the user inputs the number e4b998 I want to show them it is 乘, in Unicode 4E58
I have tried a few things and the closest I got, but still far away, is
Encoding utf8 = Encoding.UTF8;
rchtxbx_text.Text = Encoding.ASCII.GetString(utf8.GetBytes(e4b998));
What do I need to do to input e4b998 and write 乘 to a textbox?
Something like this:
1. Split the source into 2-character chunks: "e4b998" -> {"e4", "b9", "98"}
2. Convert the chunks into bytes
3. Decode the bytes into the final string
Implementation:
string source = "e4b998";
string result = Encoding.UTF8.GetString(Enumerable
.Range(0, source.Length / 2)
.Select(i => Convert.ToByte(source.Substring(i * 2, 2), 16))
.ToArray());
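If you have an int as source:
byte[] intBytes = BitConverter.GetBytes(0xe4b998).Reverse().SkipWhile(b => b == 0).ToArray(); // assumes a little-endian platform; drops the leading zero byte
string fromInt = Encoding.UTF8.GetString(intBytes); // "乘"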
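The question also asks for the Unicode number of the decoded character (4E58 for 乘). A minimal sketch of the full round trip follows; Utf8Hex, Decode and CodePoint are hypothetical names, and the input is assumed to be an even number of hex digits forming valid UTF-8.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Utf8Hex
{
    // "e4b998" -> { 0xE4, 0xB9, 0x98 } -> "乘"
    public static string Decode(string hex)
    {
        byte[] bytes = Enumerable
            .Range(0, hex.Length / 2)
            .Select(i => Convert.ToByte(hex.Substring(i * 2, 2), 16))
            .ToArray();
        return Encoding.UTF8.GetString(bytes);
    }

    // Formats the first code point of the text as "U+XXXX".
    // char.ConvertToUtf32 also copes with surrogate pairs.
    public static string CodePoint(string text)
    {
        return "U+" + char.ConvertToUtf32(text, 0).ToString("X4");
    }
}
```

For example, Utf8Hex.CodePoint(Utf8Hex.Decode("e4b998")) returns "U+4E58", and Utf8Hex.Decode("d7a9") returns ש (U+05E9).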

How to convert decimal string value to hex byte array in C#?

I have an input string which is in decimal format:
var decString = "12345678"; // in hex this is 0xBC614E
and I want to convert this to a fixed length hex byte array:
byte[] hexBytes; // = { 0x00, 0x00, 0xBC, 0x61, 0x4E }
I've come up with a few rather convoluted ways to do this but I suspect there is a neat two-liner! Any thoughts? Thanks
UPDATE:
OK I think I may have inadvertently added a level of complexity by having the example showing 5 bytes. Maximum is in fact 4 bytes (FF FF FF FF) = 4294967295. Int64 is fine.
If you have no particular limit to the size of your integer, you could use BigInteger to do this conversion:
var b = BigInteger.Parse("12345678");
var bb = b.ToByteArray();
foreach (var s in bb) {
Console.Write("{0:x} ", s);
}
This prints
4e 61 bc 0
If the order of bytes matters, you may need to reverse the array of bytes.
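The trailing 0 in that output is a sign byte: BigInteger.ToByteArray returns little-endian two's complement, and it appends 0x00 when the top bit of the most significant byte (0xBC here) is set on a positive value. A sketch of turning that into the fixed-length, most-significant-byte-first array from the question could look like this; the helper name and the decision to throw on overflow are assumptions.

```csharp
using System;
using System.Numerics;

public static class DecimalBytes
{
    // Parses a non-negative decimal string and returns exactly `length` bytes,
    // most significant byte first, left-padded with zeros.
    public static byte[] ToFixedBigEndian(string decimalText, int length)
    {
        BigInteger value = BigInteger.Parse(decimalText);
        if (value.Sign < 0)
            throw new ArgumentOutOfRangeException(nameof(decimalText), "Negative values are not supported.");

        byte[] little = value.ToByteArray(); // little-endian, possibly with a 0x00 sign byte at the end

        // Ignore high-order zero bytes (including the sign byte).
        int significant = little.Length;
        while (significant > 1 && little[significant - 1] == 0) significant--;

        if (significant > length)
            throw new OverflowException("Value does not fit in the requested number of bytes.");

        // Copy in reverse so the least significant byte ends up last.
        var result = new byte[length];
        for (int i = 0; i < significant; i++)
            result[length - 1 - i] = little[i];
        return result;
    }
}
```

DecimalBytes.ToFixedBigEndian("12345678", 5) yields { 0x00, 0x00, 0xBC, 0x61, 0x4E }.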
Quoting the question: "Maximum is in fact 4 bytes (FF FF FF FF) = 4294967295"
You can use uint for that - like this:
uint data = uint.Parse("12345678");
byte[] bytes = new[] {
(byte)((data>>24) & 0xFF)
, (byte)((data>>16) & 0xFF)
, (byte)((data>>8) & 0xFF)
, (byte)((data>>0) & 0xFF)
};
To convert the string to bytes you can use BitConverter.GetBytes:
var byteArray = BitConverter.GetBytes(Int32.Parse(decString)).Reverse().ToArray();
Use the appropriate type instead of Int32 if the string is not always a 32-bit integer.
Then you could check the length and add padding bytes if needed:
if (byteArray.Length < 5)
{
var newArray = new byte[5];
Array.Copy(byteArray, 0, newArray, 5 - byteArray.Length, byteArray.Length);
byteArray = newArray;
}
You can use Linq:
String source = "12345678";
// "BC614E"
String result = String.Join("", BigInteger
.Parse(source)
.ToByteArray()
.Reverse()
.SkipWhile(item => item == 0)
.Select(item => item.ToString("X2")));
In case you want Byte[] it'll be
// [0xBC, 0x61, 0x4E]
Byte[] result = BigInteger
.Parse(source)
.ToByteArray()
.Reverse()
.SkipWhile(item => item == 0)
.ToArray();

Expressing byte values > 127 in .Net Strings

I'm writing some binary protocol messages in .Net using strings, and it mostly works, except for one particular case.
The message I'm trying to send is:
String cmdPacket = "\xFD\x0B\x16MBEPEXE1.";
myDevice.Write(Encoding.ASCII.GetBytes(cmdPacket));
(to help decode, those bytes are 253, 11, 22, then the ASCII chars: "MBEPEXE1.").
Except when I do the Encoding.ASCII.GetBytes, the 0xFD comes out as byte 0x3F
(value 253 changed to 63).
(I should point out that the \x0B and \x16 are interpreted correctly as Hex 0B & Hex 16)
I've also tried Encoding.UTF8 and Encoding.UTF7, to no avail.
I feel there is probably a good simple way to express values above 128 in Strings, and convert them to bytes, but I'm missing it.
Any guidance?
Ignoring if it's good or bad what you are doing, the encoding ISO-8859-1 maps all its characters to the characters with the same code in Unicode.
// Bytes with all the possible values 0-255
var bytes = Enumerable.Range(0, 256).Select(p => (byte)p).ToArray();
// String containing the values
var all1bytechars = new string(bytes.Select(p => (char)p).ToArray());
// Sanity check
Debug.Assert(all1bytechars.Length == 256);
// The encoder, you could make it static readonly
var enc = Encoding.GetEncoding("ISO-8859-1"); // It is the codepage 28591
// string-to-bytes
var bytes2 = enc.GetBytes(all1bytechars);
// bytes-to-string
var all1bytechars2 = enc.GetString(bytes);
// check string-to-bytes
Debug.Assert(bytes.SequenceEqual(bytes2));
// check bytes-to-string
Debug.Assert(all1bytechars.SequenceEqual(all1bytechars2));
From the wiki:
ISO-8859-1 was incorporated as the first 256 code points of ISO/IEC 10646 and Unicode.
Or a simple and fast method to convert a string to a byte[] (with unchecked and checked variant)
public static byte[] StringToBytes(string str)
{
var bytes = new byte[str.Length];
for (int i = 0; i < str.Length; i++)
{
bytes[i] = checked((byte)str[i]); // Slower but throws OverflowException if there is an invalid character
//bytes[i] = unchecked((byte)str[i]); // Faster
}
return bytes;
}
ASCII is a 7-bit code. The high-order bit used to be used as a parity bit, so "ASCII" could have even, odd or no parity. You may notice that 0x3F (decimal 63) is the ASCII character ?. That is what non-ASCII octets (those greater than 0x7F/decimal 127) are converted to by the CLR's ASCII encoding. The reason is that there is no standard ASCII character representation of the code points in the range 0x80–0xFF.
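The difference is easy to observe by encoding the same text with both encodings. The following is a sketch with hypothetical helper names; it assumes the ISO-8859-1 encoding can be obtained by name on the target runtime, as the earlier answer does.

```csharp
using System;
using System.Text;

public static class EncodingComparison
{
    // ASCII has no representation for U+0080 and above, so those become '?' (0x3F).
    public static byte[] AsAscii(string text)
    {
        return Encoding.ASCII.GetBytes(text);
    }

    // ISO-8859-1 maps U+0000..U+00FF straight onto byte values 0x00..0xFF.
    public static byte[] AsLatin1(string text)
    {
        return Encoding.GetEncoding("ISO-8859-1").GetBytes(text);
    }
}
```

For the single character "\u00FD", AsAscii gives 0x3F while AsLatin1 gives 0xFD. The \u form is used here on purpose: the C# \x escape consumes up to four hex digits, so a sequence like "\x16AB" would be read as one character rather than 0x16 followed by "AB".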
C# strings are UTF-16 encoded Unicode internally. If what you care about are the byte values of the strings, and you know that the strings are, in fact, characters whose Unicode code points are in the range U+0000 through U+00FF, then it's easy. Unicode's first 256 code points (0x00–0xFF), that is, the Unicode blocks C0 Controls and Basic Latin (\x00-\x7F) and C1 Controls and Latin-1 Supplement (\x80-\xFF), are the "normal" ISO-8859-1 characters. A simple incantation like this:
String cmdPacket = "\xFD\x0B\x16MBEPEXE1.";
byte[] buffer = cmdPacket.Select(c=>(byte)c).ToArray() ;
myDevice.Write(buffer);
will get you the byte[] you want, in this case
// \xFD \x0B \x16 M B E P E X E 1 .
[ 0xFD , 0x0B , 0x16 , 0x4d , 0x42 , 0x45, 0x50 , 0x45 , 0x58 , 0x45 , 0x31 , 0x2E ]
With LINQ, you could do something like this:
String cmdPacket = "\xFD\x0B\x16MBEPEXE1.";
myDevice.Write(cmdPacket.Select(Convert.ToByte).ToArray());
Edit: Added an explanation
First, you recognize that your string is really just an array of characters. What you want is an "equivalent" array of bytes, where each byte corresponds to a character.
To get the array, you have to "map" each character of the original array as a byte in the new array. To do that, you can use the built-in System.Convert.ToByte(char) method.
Once you've described your mapping from characters to bytes, it's as simple as projecting the input string, through the mapping, into an array.
Hope that helps!
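One caveat with this mapping: Convert.ToByte(char) throws an OverflowException for any character above U+00FF. If the input is not fully under your control, a non-throwing variant may be preferable. The sketch below uses a hypothetical TryToBytes helper and assumes that returning false for out-of-range text is the desired behaviour.

```csharp
using System;

public static class CharBytes
{
    // Maps each char to the byte with the same numeric value.
    // Returns false (and an empty array) if any char does not fit in one byte.
    public static bool TryToBytes(string text, out byte[] bytes)
    {
        bytes = new byte[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] > 0xFF)
            {
                bytes = Array.Empty<byte>();
                return false;
            }
            bytes[i] = (byte)text[i];
        }
        return true;
    }
}
```

A caller can then write if (CharBytes.TryToBytes(cmdPacket, out var buffer)) myDevice.Write(buffer); and handle the failure case explicitly.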
I use Windows-1252 as it seems to give the most bang for the byte
And is compatible with all .NET string values
You will probably want to comment out the ToLower
This was built for compatibility with SQL char (single byte)
namespace String1byte
{
/// <summary>
/// Interaction logic for MainWindow.xaml
/// </summary>
public partial class MainWindow : Window
{
public MainWindow()
{
InitializeComponent();
String8bit s1 = new String8bit("cat");
String8bit s2 = new String8bit("cat");
String8bit s3 = new String8bit("\xFD\x0B\x16MBEPEXE1.");
HashSet<String8bit> hs = new HashSet<String8bit>();
hs.Add(s1);
hs.Add(s2);
hs.Add(s3);
System.Diagnostics.Debug.WriteLine(hs.Count.ToString());
System.Diagnostics.Debug.WriteLine(s1.Value + " " + s1.GetHashCode().ToString());
System.Diagnostics.Debug.WriteLine(s2.Value + " " + s2.GetHashCode().ToString());
System.Diagnostics.Debug.WriteLine(s3.Value + " " + s3.GetHashCode().ToString());
System.Diagnostics.Debug.WriteLine(s1.Equals(s2).ToString());
System.Diagnostics.Debug.WriteLine(s1.Equals(s3).ToString());
System.Diagnostics.Debug.WriteLine(s1.MatchStart("ca").ToString());
System.Diagnostics.Debug.WriteLine(s3.MatchStart("ca").ToString());
}
}
public struct String8bit
{
private static Encoding EncodingUnicode = Encoding.Unicode;
private static Encoding EncodingWin1252 = System.Text.Encoding.GetEncoding("Windows-1252");
private byte[] bytes;
public override bool Equals(Object obj)
{
// Check for null values and compare run-time types.
if (obj == null) return false;
if (!(obj is String8bit)) return false;
String8bit comp = (String8bit)obj;
if (comp.Bytes.Length != this.Bytes.Length) return false;
for (Int32 i = 0; i < comp.Bytes.Length; i++)
{
if (comp.Bytes[i] != this.Bytes[i])
return false;
}
return true;
}
public override int GetHashCode()
{
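if (Bytes == null || Bytes.Length == 0) return 0; // default(String8bit) and empty strings
UInt32 hash = (UInt32)(Bytes[0]);
for (Int32 i = 1; i < Bytes.Length; i++) hash = hash ^ (UInt32)(Bytes[i] << (i%4)*8); // mix in every byte, not just Bytes[0]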
return (Int32)hash;
}
public bool MatchStart(string start)
{
if (string.IsNullOrEmpty(start)) return false;
if (start.Length > this.Length) return false;
start = start.ToLowerInvariant(); // SQL is case insensitive
// Convert the string into a byte array
byte[] unicodeBytes = EncodingUnicode.GetBytes(start);
// Perform the conversion from one encoding to the other
byte[] win1252Bytes = Encoding.Convert(EncodingUnicode, EncodingWin1252, unicodeBytes);
for (Int32 i = 0; i < win1252Bytes.Length; i++) if (Bytes[i] != win1252Bytes[i]) return false;
return true;
}
public byte[] Bytes { get { return bytes; } }
public String Value { get { return EncodingWin1252.GetString(Bytes); } }
public Int32 Length { get { return Bytes.Length; } }
public String8bit(string word)
{
word = word.ToLowerInvariant(); // SQL is case insensitive
// Convert the string into a byte array
byte[] unicodeBytes = EncodingUnicode.GetBytes(word);
// Perform the conversion from one encoding to the other
bytes = Encoding.Convert(EncodingUnicode, EncodingWin1252, unicodeBytes);
}
public String8bit(Byte[] win1252bytes)
{ // if reading from SQL char then read as System.Data.SqlTypes.SqlBytes
bytes = win1252bytes;
}
}
}

Convert byte array to int

I am trying to do some conversion in C#, and I am not sure how to do this:
private int byteArray2Int(byte[] bytes)
{
// bytes = new byte[] {0x01, 0x03, 0x04};
// how to convert this byte array to an int?
return BitConverter.ToInt32(bytes, 0); // is this correct?
// because if I have a bytes = new byte [] {0x32} => I got an exception
}
private string byteArray2String(byte[] bytes)
{
return System.Text.ASCIIEncoding.ASCII.GetString(bytes);
// but then the problem is that a 0x00 byte shows up as 0x20
}
Could anyone give me some ideas?
BitConverter is the correct approach.
Your problem is because you only provided 8 bits when you promised 32. Try instead a valid 32-bit number in the array, such as new byte[] { 0x32, 0, 0, 0 }.
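If shorter arrays have to be accepted, one option is to pad them to four bytes first. This is a sketch only: the helper name is made up, and it assumes the input is little-endian (least significant byte first), so missing high-order bytes are treated as zero.

```csharp
using System;

public static class ShortArrayConverter
{
    // Converts 0 to 4 little-endian bytes into an int, treating missing bytes as zero.
    public static int ToInt32LittleEndian(byte[] bytes)
    {
        if (bytes.Length > 4)
            throw new ArgumentException("At most 4 bytes fit in an Int32.", nameof(bytes));

        var padded = new byte[4];                  // zero-initialised
        Array.Copy(bytes, padded, bytes.Length);   // low-order bytes first

        // BitConverter follows the machine's byte order; flip on big-endian hardware.
        if (!BitConverter.IsLittleEndian)
            Array.Reverse(padded);

        return BitConverter.ToInt32(padded, 0);
    }
}
```

With this, new byte[] { 0x32 } converts to 50 and new byte[] { 0x01, 0x03, 0x04 } converts to 0x040301 (262913).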
If you want an arbitrary length array converted, you can implement this yourself:
ulong ConvertLittleEndian(byte[] array)
{
int pos = 0;
ulong result = 0;
foreach (byte by in array) {
result |= ((ulong)by) << pos;
pos += 8;
}
return result;
}
It's not clear what the second part of your question (involving strings) is supposed to produce, but I guess you want hex digits? BitConverter can help with that too, as described in an earlier question.
byte[] bytes = { 0, 0, 0, 25 };
// If the system architecture is little-endian (that is, little end first),
// reverse the byte array.
if (BitConverter.IsLittleEndian)
Array.Reverse(bytes);
int i = BitConverter.ToInt32(bytes, 0);
Console.WriteLine("int: {0}", i);
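An alternative that does not depend on the machine's byte order, and does not modify the input array, is to assemble the value with shifts. The helper below is a hypothetical sketch that assumes the four bytes are stored most significant byte first.

```csharp
using System;

public static class BigEndianReader
{
    // Reads four bytes starting at offset as a big-endian 32-bit signed integer.
    public static int ReadInt32(byte[] bytes, int offset = 0)
    {
        if (bytes.Length - offset < 4)
            throw new ArgumentException("Need at least 4 bytes from the given offset.", nameof(bytes));

        return (bytes[offset] << 24)
             | (bytes[offset + 1] << 16)
             | (bytes[offset + 2] << 8)
             | bytes[offset + 3];
    }
}
```

BigEndianReader.ReadInt32(new byte[] { 0, 0, 0, 25 }) returns 25 on any platform.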
This is correct, but you're missing that BitConverter.ToInt32 'wants' 32 bits (32/8 = 4 bytes) of information to make a conversion, so you cannot convert just one byte: new byte[] {0x32}.
That is exactly the trouble you have. And do not forget about the encoding you use: the number of bytes per symbol differs from encoding to encoding.
A fast and simple way of doing this is just to copy the bytes to an integer using Buffer.BlockCopy:
UInt32[] pos = new UInt32[1];
byte[] stack = ...
Buffer.BlockCopy(stack, 0, pos, 0, 4);
This has the added benefit of being able to parse numerous integers into an array just by manipulating offsets.
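As a sketch of the "numerous integers" case, the whole byte array can be copied into a UInt32 array in one call. The helper name is hypothetical. Note that Buffer.BlockCopy copies raw memory, so the resulting values follow the machine's byte order, exactly as BitConverter does.

```csharp
using System;

public static class BlockCopyReader
{
    // Reinterprets a byte array as consecutive 32-bit unsigned integers
    // in the machine's native byte order.
    public static uint[] ToUInt32Array(byte[] source)
    {
        if (source.Length % 4 != 0)
            throw new ArgumentException("Length must be a multiple of 4.", nameof(source));

        var result = new uint[source.Length / 4];
        // The last argument is a byte count, not an element count.
        Buffer.BlockCopy(source, 0, result, 0, source.Length);
        return result;
    }
}
```

On a little-endian machine, the bytes { 1, 0, 0, 0, 0, 1, 0, 0 } become the two values 1 and 256.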
