Removing trailing nulls from byte array in C#

OK, I am reading .dat files into a byte array. For some reason, the people who generate these files put about half a megabyte's worth of useless null bytes at the end of the file. Does anybody know a quick way to trim these off the end?
My first thought was to start at the end of the array and iterate backwards until I find something other than a null, then copy everything up to that point, but I wonder if there isn't a better way.
To answer some questions:
Are you sure the 0 bytes are definitely in the file, rather than there being a bug in the file reading code? Yes, I am certain of that.
Can you definitely trim all trailing 0s? Yes.
Can there be any 0s in the rest of the file? Yes, there can be 0s in other places, so, no, I can't start at the beginning and stop at the first 0.

I agree with Jon. The critical bit is that you must "touch" every byte from the last one until the first non-zero byte. Something like this:
byte[] foo;
// populate foo
int i = foo.Length - 1;
while (i >= 0 && foo[i] == 0)
    --i;
// now foo[i] is the last non-zero byte (i is -1 if the array is all zeros)
byte[] bar = new byte[i + 1];
Array.Copy(foo, bar, i + 1);
I'm pretty sure that's about as efficient as you're going to be able to make it.

Given the extra questions now answered, it sounds like you're fundamentally doing the right thing. In particular, you have to touch every byte of the file from the last non-zero byte onwards, to check that it only has 0s after that point.
Now, whether you have to copy everything or not depends on what you're then doing with the data.
You could perhaps remember the index and keep it with the data or filename.
You could copy the data into a new byte array
If you want to "fix" the file, you could call FileStream.SetLength to truncate the file
The "you have to read every byte between the truncation point and the end of the file" is the critical part though.
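For the truncation option, a minimal sketch (assuming the backwards scan has already found the last non-zero byte; path and lastNonZeroIndex are placeholder names, not from the original post):
using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
{
    // cut the file off right after the last non-zero byte
    fs.SetLength(lastNonZeroIndex + 1);
}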

@Factor Mystic,
I think there is a shorter way:
var data = new byte[] { 0x01, 0x02, 0x00, 0x03, 0x04, 0x00, 0x00, 0x00, 0x00 };
var new_data = data.TakeWhile((v, index) => data.Skip(index).Any(w => w != 0x00)).ToArray();

How about this:
[Test]
public void Test()
{
    var chars = new[] { 'a', 'b', '\0', 'c', '\0', '\0' };
    File.WriteAllBytes("test.dat", Encoding.ASCII.GetBytes(chars));
    var content = File.ReadAllText("test.dat");
    Assert.AreEqual(6, content.Length); // includes the null bytes at the end
    content = content.Trim('\0');
    Assert.AreEqual(4, content.Length); // no more null bytes at the end
    // but still has the one in the middle
}

Assuming 0 = null, that is probably your best bet. As a minor tweak, you might want to use Buffer.BlockCopy when you finally copy the useful data.
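As a hedged illustration of that tweak, applied to the snippet above (foo and i as defined there):
byte[] bar = new byte[i + 1];
// Buffer.BlockCopy counts in bytes and skips the per-element checks Array.Copy
// performs, which can matter when copying large byte arrays
Buffer.BlockCopy(foo, 0, bar, 0, i + 1);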

Test this:
private byte[] trimByte(byte[] input)
{
    int byteCounter = input.Length - 1;
    // walk backwards past the trailing zeros, stopping at the start of the array
    while (byteCounter >= 0 && input[byteCounter] == 0x00)
    {
        byteCounter--;
    }
    byte[] rv = new byte[byteCounter + 1];
    for (int i = 0; i <= byteCounter; i++)
    {
        rv[i] = input[i];
    }
    return rv;
}

There is always a LINQ answer:
byte[] data = new byte[] { 0x01, 0x02, 0x00, 0x03, 0x04, 0x00, 0x00, 0x00, 0x00 };
bool data_found = false;
byte[] new_data = data.Reverse().SkipWhile(point =>
{
    if (data_found) return false;
    if (point == 0x00) return true;
    data_found = true;
    return false;
}).Reverse().ToArray();

You could just count the number of zeros at the end of the array and use that count instead of .Length when iterating the array later on. You can encapsulate this however you like. The main point is that you don't really need to copy the data into a new structure; if the arrays are big, avoiding the copy may be worth it.
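A minimal sketch of that idea, assuming foo holds the file contents (ArraySegment is just one way to carry the array and its logical length together):
int len = foo.Length;
while (len > 0 && foo[len - 1] == 0)
    len--;
// a zero-copy view of the meaningful prefix; use view.Offset/view.Count when iterating
var view = new ArraySegment<byte>(foo, 0, len);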

If null bytes can be valid values in the file, do you know that the last byte in the file cannot be null? If so, iterating backwards and looking for the first non-null entry is probably best; if not, there is no way to tell where the actual end of the file is.
If you know more about the data format, for example that there can be no run of null bytes longer than two bytes (or some similar constraint), then you may be able to do a binary search for the 'transition point'. This should be much faster than the linear search (assuming that you can read in the whole file).
The basic idea (using my earlier assumption about no consecutive null bytes) would be:
var data = ...; // byte array of file data
var index = data.Length / 2;
var jmpsize = data.Length / 2;
while (true)
{
    jmpsize /= 2; // integer division
    if (jmpsize == 0) break;
    byte b1 = data[index];
    byte b2 = data[index + 1];
    if (b1 == 0 && b2 == 0) // too close to the end, go left
        index -= jmpsize;
    else
        index += jmpsize;
}
if (index == data.Length - 1) return data.Length;
byte b1 = data[index];
byte b2 = data[index + 1];
if (b2 == 0)
{
    if (b1 == 0) return index;
    else return index + 1;
}
else return index + 2;

When the file is large (much larger than my RAM), I use this to remove trailing nulls:
static void RemoveTrailingNulls(string inputFilename, string outputFilename)
{
    int bufferSize = 100 * 1024 * 1024;
    long totalTrailingNulls = 0;
    byte[] emptyArray = new byte[bufferSize];
    using (var inputFile = File.OpenRead(inputFilename))
    using (var inputFileReversed = new ReverseStream(inputFile))
    {
        var buffer = new byte[bufferSize];
        while (true)
        {
            var start = DateTime.Now;
            var bytesRead = inputFileReversed.Read(buffer, 0, buffer.Length);
            if (bytesRead == emptyArray.Length && Enumerable.SequenceEqual(emptyArray, buffer))
            {
                totalTrailingNulls += buffer.Length;
            }
            else
            {
                var nulls = buffer.Take(bytesRead).TakeWhile(b => b == 0).Count();
                totalTrailingNulls += nulls;
                if (nulls < bytesRead)
                {
                    // found the last non-null byte
                    break;
                }
            }
            var duration = DateTime.Now - start;
            var mbPerSec = (bytesRead / (1024 * 1024D)) / duration.TotalSeconds;
            Console.WriteLine($"{mbPerSec:N2} MB/second");
        }
        var bytesToKeep = inputFile.Length - totalTrailingNulls;
        using (var outputFile = File.Open(outputFilename, FileMode.Create, FileAccess.Write))
        {
            inputFile.Seek(0, SeekOrigin.Begin);
            inputFile.CopyTo(outputFile, bytesToKeep, bufferSize);
        }
    }
}
It uses the ReverseStream class, which can be found here.
And this extension method:
public static class Extensions
{
    public static long CopyTo(this Stream input, Stream output, long count, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        long totalRead = 0;
        while (count > 0)
        {
            int read = input.Read(buffer, 0, (int)Math.Min(bufferSize, count));
            if (read == 0) break;
            totalRead += read;
            output.Write(buffer, 0, read);
            count -= read;
        }
        return totalRead;
    }
}

In my case the LINQ approach never finished; it is too slow when working with byte arrays!
Why not use the Array.Copy() method?
/// <summary>
/// Gets the array of bytes from a memory stream.
/// </summary>
/// <param name="stream">Memory stream.</param>
public static byte[] GetAllBytes(this MemoryStream stream)
{
    byte[] result = new byte[stream.Length];
    Array.Copy(stream.GetBuffer(), result, stream.Length);
    return result;
}
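For context, GetBuffer() returns the stream's entire internal buffer, which is usually larger than what was written, so copying only stream.Length bytes drops the unused tail. A usage sketch (note that MemoryStream.ToArray() performs essentially the same trimming copy):
using (var ms = new MemoryStream())
{
    ms.Write(new byte[] { 1, 2, 3 }, 0, 3);
    // 3 bytes, without the buffer's spare capacity
    byte[] exact = ms.GetAllBytes();
}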

Related

Fastest way to split a Stream according to a pattern

What would be the most optimal/fastest way to split a Stream into chunks delimited by a byte pattern (e.g. new byte[] { 0, 0 })?
My current, naive and slow, implementation reads the stream byte by byte and decrements a counter each time it encounters the delimiter. If the counter is zero, it yields a memory chunk.
const int NUMBER_CONSECUTIVE_DELIMITER = 2;
const int DELIMITER = 0;
public IEnumerable<ReadOnlyMemory<byte>> Chunk(Stream stream)
{
    var chunk = new MemoryStream();
    try
    {
        int b; // the byte being read
        int c = NUMBER_CONSECUTIVE_DELIMITER;
        while ((b = stream.ReadByte()) != -1) // read the stream byte by byte; -1 = end of the stream
        {
            chunk.WriteByte((byte)b); // write this byte to the current chunk
            if (b == DELIMITER)
                c--; // if we hit the delimiter (i.e. '0'), decrement the counter
            else
                c = NUMBER_CONSECUTIVE_DELIMITER; // else, reset the counter
            if (c <= 0 || stream.Position == stream.Length) // we hit two subsequent '0's, or the end of the stream
            {
                var r = chunk.ToArray().AsMemory(); // copy it into a Memory<T>
                chunk.Dispose();
                chunk = new();
                c = NUMBER_CONSECUTIVE_DELIMITER; // start counting afresh for the next chunk
                yield return r;
            }
        }
    }
    finally
    {
        chunk.Dispose();
    }
}
Such an implementation is difficult to get right because a stream has to be read in fixed-size buffers, and a buffer can be too big or too small for the content being interpreted. To solve this problem, the ReadOnlySequence<T> struct was added; more information about this topic can be seen here.
By using System.IO.Pipelines (the package must be obtained) this problem can be solved as follows:
public static async Task FillPipeAsync(Stream stream, PipeWriter writer, CancellationToken cancellationToken = default)
{
    // The minimum buffer size that is used for the current buffer segment.
    const int bufferSize = 65536;
    while (true)
    {
        // Request 65536 bytes from the PipeWriter.
        Memory<byte> memory = writer.GetMemory(bufferSize);
        // Read the content from the stream.
        int bytesRead = await stream.ReadAsync(memory, cancellationToken).ConfigureAwait(false);
        if (bytesRead == 0) break;
        // Tell the writer how many bytes were read.
        writer.Advance(bytesRead);
        // Flush the data to the PipeWriter.
        FlushResult result = await writer.FlushAsync(cancellationToken).ConfigureAwait(false);
        if (result.IsCompleted) break;
    }
    // This notifies the reading side that no more data is coming.
    await writer.CompleteAsync().ConfigureAwait(false);
}
This will read your stream asynchronously and write the buffer segments to the pipe. Next you have to implement the read logic that slices/merges the concatenated buffer segments into chunks:
public static async IAsyncEnumerable<ReadOnlySequence<byte>> ReadPipeAsync(PipeReader reader, ReadOnlyMemory<byte> delimiter,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    while (true)
    {
        // Read from the PipeReader.
        ReadResult result = await reader.ReadAsync(cancellationToken).ConfigureAwait(false);
        ReadOnlySequence<byte> buffer = result.Buffer;
        while (TryReadChunk(ref buffer, delimiter.Span, out ReadOnlySequence<byte> chunk))
            yield return chunk;
        // When the writer has completed, return whatever remains as the last chunk
        // (this must happen before AdvanceTo, which invalidates the buffer).
        if (result.IsCompleted)
        {
            if (!buffer.IsEmpty)
                yield return buffer;
            break;
        }
        // Tell the PipeReader how many bytes were consumed and examined.
        // This is essential because the Pipe releases buffer segments that are no longer in use.
        reader.AdvanceTo(buffer.Start, buffer.End);
    }
    await reader.CompleteAsync().ConfigureAwait(false);
}
private static bool TryReadChunk(ref ReadOnlySequence<byte> buffer, ReadOnlySpan<byte> delimiter,
    out ReadOnlySequence<byte> chunk)
{
    // Search the buffer for the first byte of the delimiter.
    SequencePosition? position = buffer.PositionOf(delimiter[0]);
    // If no occurrence was found, or there are not enough bytes left, or the following
    // bytes do not match the rest of the delimiter, report failure so more data is read.
    if (position is null
        || buffer.Slice(position.Value).Length < delimiter.Length
        || !buffer.Slice(position.Value, delimiter.Length).FirstSpan.StartsWith(delimiter))
    {
        chunk = default;
        return false;
    }
    // Return the chunk up to the delimiter, then cut it, plus the delimiter, off the front of the buffer.
    chunk = buffer.Slice(0, position.Value);
    buffer = buffer.Slice(buffer.GetPosition(delimiter.Length, position.Value));
    return true;
}
For this to work in that form you have to use an IAsyncEnumerable so that the chunks can be streamed into an await foreach loop. Merging and slicing is largely handled by the pipe, so a reliable algorithm can be built here with relatively little code, and this code will handle the job in a high-performance manner.
Usage:
// Create a Pipe that manages the buffer.
Pipe pipe = new Pipe();
ConfiguredTaskAwaitable writing = FillPipeAsync(stream, pipe.Writer).ConfigureAwait(false);
// The delimiter that should be used. This can be any data with length > 0.
ReadOnlyMemory<byte> delimiter = new ReadOnlyMemory<byte>(new byte[] { 0, 0 });
// 'await foreach' and 'await writing' execute asynchronously (in parallel).
await foreach (ReadOnlySequence<byte> chunk in ReadPipeAsync(pipe.Reader, delimiter))
{
    // Use "chunk" to retrieve your chunked content.
}
await writing;
Note that reading and chunking is done asynchronously and independently.
I eventually ended up with the code below, strongly inspired by Philipp's answer above and https://keestalkstech.com/2010/11/seek-position-of-a-string-in-a-file-or-filestream/.
public override IEnumerable<byte[]> Chunk(Stream stream)
{
    var buffer = new byte[bufferSize];
    var size = bufferSize;
    var offset = 0;
    var position = stream.Position;
    var nextChunk = Array.Empty<byte>();
    while (true)
    {
        var bytesRead = stream.Read(buffer, offset, size);
        // when no bytes are read -- the pattern could not be found
        if (bytesRead <= 0)
            break;
        // when fewer than size bytes are read, slice the buffer to prevent reading of "previous" bytes
        ReadOnlySpan<byte> ro = buffer;
        if (bytesRead < size)
            ro = ro.Slice(0, offset + bytesRead);
        // check if we can find our search bytes in the buffer
        var i = ro.IndexOf(Delimiter);
        if (i > -1 && // we found something
            i <= bytesRead && // ...within the area that was read (at the end of the buffer, the last values are not overwritten; i == bytesRead if the delimiter is at the very end)
            nextChunk.Length + (i + Delimiter.Length - offset) >= MinChunkSize) // the chunk that will be produced is large enough
        {
            var chunk = buffer[offset..(i + Delimiter.Length)];
            yield return Concat(nextChunk, chunk);
            nextChunk = Array.Empty<byte>();
            offset = 0;
            size = bufferSize;
            position += i + Delimiter.Length;
            stream.Position = position;
            continue;
        }
        else if (stream.Position == stream.Length)
        {
            // we're at the end of the stream
            var chunk = buffer[offset..(bytesRead + offset)]; // return the bytes read
            yield return Concat(nextChunk, chunk);
            break;
        }
        // the stream is not finished; copy the last Delimiter.Length bytes to the beginning
        // of the buffer and set the offset so the buffer is refilled after them
        nextChunk = Concat(nextChunk, buffer[offset..buffer.Length]);
        offset = Delimiter.Length;
        size = bufferSize - offset;
        Array.Copy(buffer, buffer.Length - offset, buffer, 0, offset);
        position += bufferSize - offset;
    }
}
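The snippet relies on bufferSize, Delimiter, MinChunkSize fields and a Concat helper that are not shown in the answer. A straightforward Concat (my assumption, not the original author's code) could look like this:
private static byte[] Concat(byte[] first, byte[] second)
{
    // join two byte arrays into a single new one
    var result = new byte[first.Length + second.Length];
    Buffer.BlockCopy(first, 0, result, 0, first.Length);
    Buffer.BlockCopy(second, 0, result, first.Length, second.Length);
    return result;
}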

FileStream delimiter?

I have a large file with a (text/binary) format.
File format (each 0 represents a byte):
00000FileName0000000Hello
World
world1
...
0000000000000000000000
Currently I'm using FileStream, and I want to read the Hello.
I know where Hello starts, and it ends with 0x0D 0x0A.
I also need to go back if the word is not equal to Hello.
How can I read until a carriage return?
Is there any PEEK-like function in FileStream so I can move the read pointer back?
Is FileStream even a good choice in this case?
You can use the FileStream.Seek method to change the read/write position.
You can use BinaryReader for reading binary content; however, it uses an internal buffer, so you cannot rely on the underlying Stream.Position anymore, because the reader may have read more bytes in the background than you asked for. But you can re-implement the methods you need:
private byte[] ReadBytes(Stream s, int count)
{
    byte[] buffer = new byte[count];
    if (count == 0)
    {
        return buffer;
    }
    // reading one byte
    if (count == 1)
    {
        int value = s.ReadByte();
        if (value == -1)
            throw new IOException("Out of stream");
        buffer[0] = (byte)value;
        return buffer;
    }
    // reading multiple bytes
    int offset = 0;
    do
    {
        int readBytes = s.Read(buffer, offset, count - offset);
        if (readBytes == 0)
            throw new IOException("Out of stream");
        offset += readBytes;
    }
    while (offset < count);
    return buffer;
}
public int ReadInt32(Stream s)
{
    byte[] buffer = ReadBytes(s, 4);
    return BitConverter.ToInt32(buffer, 0);
}
// similarly, write ReadInt16/64, etc., whatever you need
Assuming that you are at the start position, you can write a ReadString, too:
private string ReadString(Stream s, char delimiter)
{
    var result = new List<char>();
    int c;
    while ((c = s.ReadByte()) != -1 && (char)c != delimiter)
    {
        result.Add((char)c);
    }
    return new string(result.ToArray());
}
Usage:
FileStream fs = GetMyFile(); // todo
if (!fs.CanSeek)
    throw new NotSupportedException("sorry");
long posCurrent = fs.Position;         // save the current position
int posHello = ReadInt32(fs);          // read the position of "hello"
fs.Seek(posHello, SeekOrigin.Begin);   // seek to hello
string hello = ReadString(fs, '\n');   // read hello
fs.Seek(posCurrent, SeekOrigin.Begin); // seek back
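On the PEEK question: FileStream has no built-in peek, but when the stream is seekable a one-byte peek can be sketched like this (my helper, not part of the framework):
static int PeekByte(FileStream fs)
{
    int b = fs.ReadByte();
    if (b != -1)
        fs.Seek(-1, SeekOrigin.Current); // rewind so the next read sees the same byte
    return b;
}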

Bitwise operation performance, how to improve

I have a simple task: determine how many bytes are necessary to encode some number (a byte array length) and encode the final value (implementing this article: Encoded Length and Value Bytes). For example, a 300-byte value gets the length bytes 0x82 0x01 0x2C: 0x80 plus the number of length bytes (2), followed by the length itself in big-endian order (0x012C = 300).
Originally I wrote a quick method that accomplishes the task:
public static Byte[] Encode(Byte[] rawData, Byte enclosingtag) {
    if (rawData == null) {
        return new Byte[] { enclosingtag, 0 };
    }
    List<Byte> computedRawData = new List<Byte> { enclosingtag };
    // if array size is less than 128, encode length directly. No questions here
    if (rawData.Length < 128) {
        computedRawData.Add((Byte)rawData.Length);
    } else {
        // convert array size to a hex string
        String hexLength = rawData.Length.ToString("x2");
        // if the hex string has odd length, align it to even by prepending
        // a '0' character
        if (hexLength.Length % 2 == 1) { hexLength = "0" + hexLength; }
        // take pairs of hex characters and convert each octet to a byte
        Byte[] lengthBytes = Enumerable.Range(0, hexLength.Length)
            .Where(x => x % 2 == 0)
            .Select(x => Convert.ToByte(hexLength.Substring(x, 2), 16))
            .ToArray();
        // insert the padding byte: set bit 7 to 1 and add the byte count required
        // to encode the length bytes
        Byte paddingByte = (Byte)(128 + lengthBytes.Length);
        computedRawData.Add(paddingByte);
        computedRawData.AddRange(lengthBytes);
    }
    computedRawData.AddRange(rawData);
    return computedRawData.ToArray();
}
This is old code and it was written in an awful way.
Now I'm trying to optimize the code by using either bitwise operators or the BitConverter class. Here is the bitwise edition:
public static Byte[] Encode2(Byte[] rawData, Byte enclosingtag) {
    if (rawData == null) {
        return new Byte[] { enclosingtag, 0 };
    }
    List<Byte> computedRawData = new List<Byte>(rawData);
    if (rawData.Length < 128) {
        computedRawData.Insert(0, (Byte)rawData.Length);
    } else {
        // temp number
        Int32 num = rawData.Length;
        // track byte count, this will be necessary further on
        Int32 counter = 1;
        // simply use bitwise AND to extract the byte value,
        // and shift right while the remaining value still needs more than 8 bits
        while (num >= 256) {
            counter++;
            computedRawData.Insert(0, (Byte)(num & 255));
            num = num >> 8;
        }
        // compose the final array
        computedRawData.InsertRange(0, new[] { (Byte)(128 + counter), (Byte)num });
    }
    computedRawData.Insert(0, enclosingtag);
    return computedRawData.ToArray();
}
And the final implementation with the BitConverter class:
public static Byte[] Encode3(Byte[] rawData, Byte enclosingtag) {
    if (rawData == null) {
        return new Byte[] { enclosingtag, 0 };
    }
    List<Byte> computedRawData = new List<Byte>(rawData);
    if (rawData.Length < 128) {
        computedRawData.Insert(0, (Byte)rawData.Length);
    } else {
        // convert the integer to a byte array
        Byte[] bytes = BitConverter.GetBytes(rawData.Length);
        // start from the end of the byte array to skip unnecessary zero bytes
        for (int i = bytes.Length - 1; i >= 0; i--) {
            // once the byte value is non-zero, take everything starting
            // from the current position down to the array start.
            if (bytes[i] > 0) {
                // we need to reverse the array to get the proper byte order
                computedRawData.InsertRange(0, bytes.Take(i + 1).Reverse());
                // compose the final array
                computedRawData.Insert(0, (Byte)(128 + i + 1));
                computedRawData.Insert(0, enclosingtag);
                return computedRawData.ToArray();
            }
        }
    }
    return null;
}
All methods do their work as expected. I used the example from the Stopwatch class page to measure performance. The performance tests surprised me. My test method performed 1000 runs of each method, encoding a byte array (actually, only the array size matters) with 100,000 elements, and the average times are:
Encode -- around 200 ms
Encode2 -- around 270 ms
Encode3 -- around 320 ms
I personally like method Encode2, because the code looks more readable, but its performance isn't that good.
The question: what would you suggest to improve Encode2's performance, or to improve Encode's readability?
Any help will be appreciated.
===========================
Update: thanks to all who participated in this thread. I took all suggestions into consideration and ended up with this solution:
public static Byte[] Encode6(Byte[] rawData, Byte enclosingtag) {
    if (rawData == null) {
        return new Byte[] { enclosingtag, 0 };
    }
    Byte[] retValue;
    if (rawData.Length < 128) {
        retValue = new Byte[rawData.Length + 2];
        retValue[0] = enclosingtag;
        retValue[1] = (Byte)rawData.Length;
    } else {
        Byte[] lenBytes = new Byte[3];
        Int32 num = rawData.Length;
        Int32 counter = 0;
        while (num >= 256) {
            lenBytes[counter] = (Byte)(num & 255);
            num >>= 8;
            counter++;
        }
        // 3 = enclosing tag + padding byte + the high byte of the length
        retValue = new byte[rawData.Length + 3 + counter];
        rawData.CopyTo(retValue, 3 + counter);
        retValue[0] = enclosingtag;
        retValue[1] = (Byte)(129 + counter);
        retValue[2] = (Byte)num;
        Int32 n = 3;
        for (Int32 i = counter - 1; i >= 0; i--) {
            retValue[n] = lenBytes[i];
            n++;
        }
    }
    return retValue;
}
Eventually I moved away from lists to fixed-size byte arrays. The average time against the same data set is now about 65 ms. It appears that it was the lists (not the bitwise operations) that were giving me a significant performance penalty.
The main problem here is almost certainly the allocation of the List, the allocations needed when you insert new elements, and the conversion of the list to an array at the end. This code probably spends most of its time in the garbage collector and memory allocator. Use vs. non-use of bitwise operators probably matters very little in comparison, and I would look into ways to reduce the amount of memory you allocate first.
One way is to pass in a reference to a byte array allocated in advance, plus an index to where you are in the array, instead of allocating and returning the data, and then return an integer telling how many bytes you have written. Working on large arrays is usually more efficient than working on many small objects. As others have mentioned, use a profiler and see where your code spends its time.
Of course the optimizations I mention make your code lower-level in nature, closer to what you would typically do in C, but there is often a trade-off between readability and performance.
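A minimal sketch of that suggestion, following the same length-encoding rules as the code above (the name EncodeInto and its signature are mine, not the original poster's):
public static int EncodeInto(byte[] rawData, byte enclosingTag, byte[] destination, int offset)
{
    int start = offset;
    destination[offset++] = enclosingTag;
    if (rawData.Length < 128)
    {
        // short form: the length fits in a single byte
        destination[offset++] = (byte)rawData.Length;
    }
    else
    {
        // long form: count how many bytes the length needs...
        int num = rawData.Length, lengthBytes = 0;
        while (num > 0) { lengthBytes++; num >>= 8; }
        // ...then emit the padding byte and the length in big-endian order
        destination[offset++] = (byte)(0x80 + lengthBytes);
        for (int shift = (lengthBytes - 1) * 8; shift >= 0; shift -= 8)
            destination[offset++] = (byte)(rawData.Length >> shift);
    }
    Buffer.BlockCopy(rawData, 0, destination, offset, rawData.Length);
    return offset + rawData.Length - start; // number of bytes written
}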
Using "reverse, append, reverse" instead of "insert at front", and preallocating everything, it might be something like this (not tested):
public static byte[] Encode4(byte[] rawData, byte enclosingtag) {
    if (rawData == null) {
        return new byte[] { enclosingtag, 0 };
    }
    List<byte> computedRawData = new List<byte>(rawData.Length + 6);
    computedRawData.AddRange(rawData);
    if (rawData.Length < 128) {
        computedRawData.InsertRange(0, new byte[] { enclosingtag, (byte)rawData.Length });
    } else {
        computedRawData.Reverse();
        // temp number
        int num = rawData.Length;
        // track byte count, this will be necessary further on
        int counter = 1;
        // simply cast to byte to extract the byte value,
        // and shift right while the remaining value still needs more than 8 bits
        while (num >= 256) {
            counter++;
            computedRawData.Add((byte)num);
            num >>= 8;
        }
        // compose the final array
        computedRawData.Add((byte)num);
        computedRawData.Add((byte)(counter + 128));
        computedRawData.Add(enclosingtag);
        computedRawData.Reverse();
    }
    return computedRawData.ToArray();
}
I don't know for sure whether it will be faster, but it would make sense: now the expensive "insert at front" operation is mostly avoided, except in the case where there would be only one of them (which is probably not enough to outweigh the two reverses).
Another idea is to limit the insert-at-front to a single operation in a different way: collect all the things that have to be inserted there and then do it once. It could look something like this (not tested):
public static byte[] Encode5(byte[] rawData, byte enclosingtag) {
    if (rawData == null) {
        return new byte[] { enclosingtag, 0 };
    }
    List<byte> computedRawData = new List<byte>(rawData);
    if (rawData.Length < 128) {
        computedRawData.InsertRange(0, new byte[] { enclosingtag, (byte)rawData.Length });
    } else {
        // list of all the things that will be inserted at the front
        List<byte> front = new List<byte>(8);
        // temp number
        int num = rawData.Length;
        // track byte count, this will be necessary further on
        int counter = 1;
        // simply cast to byte to extract the byte value,
        // and shift right while the remaining value still needs more than 8 bits
        while (num >= 256) {
            counter++;
            front.Insert(0, (byte)num); // inserting into a tiny list, not so bad
            num >>= 8;
        }
        // compose the final array
        front.InsertRange(0, new[] { (byte)(128 + counter), (byte)num });
        front.Insert(0, enclosingtag);
        computedRawData.InsertRange(0, front);
    }
    return computedRawData.ToArray();
}
If that's not good enough or doesn't help (or if it's worse; hey, it could be), I'll try to come up with more ideas.

C# Reading, Storing, and Combining Arrays

I am working on an RS232 communication effort but have been running into issues with some of the arrays I am handling.
In the example below I send out a "Command" with the intent to read and store the first 4 bytes of the command in a new array called "FirstFour". For every loop execution I also want to convert the integer "i" to a hex value. I then intend to combine the "FirstFour" and "iHex" arrays into a new array noted as "ComboByte". Below is my code so far, but it doesn't seem to be working.
private void ReadStoreCreateByteArray()
{
    byte[] Command = { 0x01, 0x02, 0x05, 0x04, 0x05, 0x06, 0x07, 0x08 };
    for (int i = 0; i < 10; i++)
    {
        // Send Command
        comport.Write(Command, 0, Command.Length);
        // Read response and store in buffer
        int bytes = comport.BytesToRead;
        byte[] Buffer = new byte[bytes];
        comport.Read(Buffer, 0, bytes);
        // Create 4 byte array to hold first 4 bytes out of Command
        var FirstFour = Buffer.Take(4).ToArray();
        // Convert i to a Hex value
        byte iHex = Convert.ToByte(i.ToString());
        // Combine "FirstFour" and "iHex" into a new array
        byte[] ComboByte = { iHex, FirstFour[1], FirstFour[2], FirstFour[3], FirstFour[4] };
        comport.Write(ComboByte, 0, ComboByte.Length);
    }
}
Any help would be appreciated. Thanks!
Arrays are zero-based, so...
byte[] ComboByte = { iHex, FirstFour[0], FirstFour[1], FirstFour[2], FirstFour[3] };
...should give you the first 4 elements of FirstFour.
Firstly, you need to handle the return value from Read operations. The following pattern should work fine for a range of APIs, including SerialPort, Stream, etc:
static void ReadExact(SerialPort port, byte[] buffer, int offset, int count)
{
    int read;
    while (count > 0 && (read = port.Read(buffer, offset, count)) > 0)
    {
        count -= read;
        offset += read;
    }
    if (count != 0) throw new EndOfStreamException();
}
So: you have a method that can read a reliable number of bytes - you should then be able to re-use a single buffer and populate it sequentially:
byte[] buffer = new byte[5];
for (int i = 0; i < 10; i++)
{
    //...
    buffer[0] = (byte)i;
    ReadExact(port, buffer, 1, 4);
}
The BytesToRead property is largely useless except for deciding whether to read synchronously or asynchronously, as it doesn't tell you whether more data is imminent. With your existing code there is no guarantee you will have at least 4 bytes.

BitConverter.ToString() in reverse? [duplicate]

This question already has answers here:
How do you convert a byte array to a hexadecimal string, and vice versa?
(53 answers)
Closed 8 years ago.
I have an array of bytes that I would like to store as a string. I can do this as follows:
byte[] array = new byte[] { 0x01, 0x02, 0x03, 0x04 };
string s = System.BitConverter.ToString(array);
// Result: s = "01-02-03-04"
So far so good. Does anyone know how I get this back to an array? There is no overload of BitConverter.GetBytes() that takes a string, and it seems like a nasty workaround to break the string into an array of strings and then convert each of them.
The array in question may be of variable length, probably about 20 bytes.
Not a built-in method, but an implementation. (It could be done without the split, though.)
String[] arr = str.Split('-');
byte[] array = new byte[arr.Length];
for (int i = 0; i < arr.Length; i++)
    array[i] = Convert.ToByte(arr[i], 16);
A method without Split (it makes many assumptions about the string format):
int length = (s.Length + 1) / 3;
byte[] arr1 = new byte[length];
for (int i = 0; i < length; i++)
    arr1[i] = Convert.ToByte(s.Substring(3 * i, 2), 16);
And one more method, without either split or substrings. You may get shot if you commit this to source control though. I take no responsibility for such health problems.
int length = (s.Length + 1) / 3;
byte[] arr1 = new byte[length];
for (int i = 0; i < length; i++)
{
    char sixteen = s[3 * i];
    if (sixteen > '9') sixteen = (char)(sixteen - 'A' + 10);
    else sixteen -= '0';
    char ones = s[3 * i + 1];
    if (ones > '9') ones = (char)(ones - 'A' + 10);
    else ones -= '0';
    arr1[i] = (byte)(16 * sixteen + ones);
}
(basically implementing base16 conversion on two chars)
You can parse the string yourself:
byte[] data = new byte[(s.Length + 1) / 3];
for (int i = 0; i < data.Length; i++) {
    data[i] = (byte)(
        "0123456789ABCDEF".IndexOf(s[i * 3]) * 16 +
        "0123456789ABCDEF".IndexOf(s[i * 3 + 1])
    );
}
The neatest solution though, I believe, is using extensions:
byte[] data = s.Split('-').Select(b => Convert.ToByte(b, 16)).ToArray();
If you don't need that specific format, try using Base64, like this:
var bytes = new byte[] { 0x12, 0x34, 0x56 };
var base64 = Convert.ToBase64String(bytes);
bytes = Convert.FromBase64String(base64);
Base64 will also be substantially shorter.
If you need to use that format, this obviously won't help.
byte[] data = Array.ConvertAll<string, byte>(str.Split('-'), s => Convert.ToByte(s, 16));
I believe the following will solve this robustly.
public static byte[] HexStringToBytes(string s)
{
    const string HEX_CHARS = "0123456789ABCDEF";
    if (s.Length == 0)
        return new byte[0];
    if ((s.Length + 1) % 3 != 0)
        throw new FormatException();
    byte[] bytes = new byte[(s.Length + 1) / 3];
    int state = 0; // 0 = expect first digit, 1 = expect second digit, 2 = expect hyphen
    int currentByte = 0;
    int x;
    int value = 0;
    foreach (char c in s)
    {
        switch (state)
        {
            case 0:
                x = HEX_CHARS.IndexOf(Char.ToUpperInvariant(c));
                if (x == -1)
                    throw new FormatException();
                value = x << 4;
                state = 1;
                break;
            case 1:
                x = HEX_CHARS.IndexOf(Char.ToUpperInvariant(c));
                if (x == -1)
                    throw new FormatException();
                bytes[currentByte++] = (byte)(value + x);
                state = 2;
                break;
            case 2:
                if (c != '-')
                    throw new FormatException();
                state = 0;
                break;
        }
    }
    return bytes;
}
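A quick usage sketch, round-tripping the question's example:
byte[] array = new byte[] { 0x01, 0x02, 0x03, 0x04 };
string s = BitConverter.ToString(array); // "01-02-03-04"
byte[] back = HexStringToBytes(s);       // { 0x01, 0x02, 0x03, 0x04 }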
"it seems like a nasty workaround to break the string into an array of strings and then convert each of them"
I don't think there's another way... The format produced by BitConverter.ToString is quite specific, so if there is no existing method to parse it back to a byte[], I guess you have to do it yourself.
The ToString method is not really intended as a conversion; it is meant to provide a human-readable format for debugging, easy printout, etc.
I'd rethink the byte[] -> string -> byte[] requirement and would probably prefer SLaks' Base64 solution.
