Convert SerialPort.ReadLine() non-ASCII string into a byte array - C#

I want to use SerialPort.ReadLine() because it's a blocking call and the data I'm getting ends with 0x0D 0x0A (CR LF). However, what I want is a hex string instead of an ASCII representation.
For example, the device I am communicating with sends byte arrays like {0xff, 0xff, 0x45, 0x0D, 0x0A}. I simply want my program to print 0xff, 0xff, 0x45; ReadLine() conveniently trims off the CR and LF.
I thought about using SerialPort.Read(buffer, ...) and specifying how many bytes to read, but it didn't work well: if I read too fast, half of the array is 0x00, and if I read too slowly, the COM port buffer overflows. I don't want to lose any bytes.
I tried converting what I get from SerialPort.ReadLine() to a byte array, but the non-ASCII characters usually come out as 0x3F. The code looks like this:
var line = string.Join(",", mySerialPort.ReadLine().Select(c => ((Byte)c).ToString("X")).ToArray());
I changed the encoding a few times (ASCII, UTF8, Unicode) but still no luck.
Is there any way to convert the non-ASCII string I get from ReadLine() into a byte array?

It sounds like you shouldn't be reading it as text data at all.
You're fundamentally dealing with binary data, so use the overload of Read which takes a byte array rather than a char array. (Or call ReadByte repeatedly.)
Any time you try to treat arbitrary binary data as if it's text, you're going to have a bad experience.
It sounds like you've already tried this, but done it badly:
But it didn't work well: if I read too fast, half of the array is 0x00
That suggests you're ignoring the return value of Read, which says how many bytes have actually been read. You should have something like:
int bytesRead = port.Read(buffer, 0, buffer.Length);
// Now use the portion of buffer which is from 0 (inclusive) to
// bytesRead (exclusive).
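If you need the ReadLine-style blocking behaviour, a minimal sketch along these lines (the port name and settings are placeholders for your device) accumulates bytes until the CR LF terminator and prints the payload as hex:

using System;
using System.Collections.Generic;
using System.IO.Ports;
using System.Linq;

class HexLineReader
{
    // Reads one CR-LF-terminated frame as raw bytes and returns the payload
    // without the terminator. The port is assumed to be open already.
    static byte[] ReadBinaryLine(SerialPort port)
    {
        var frame = new List<byte>();
        while (true)
        {
            int b = port.ReadByte();              // blocks until a byte arrives
            if (b < 0) break;                     // stream closed
            frame.Add((byte)b);

            int n = frame.Count;
            if (n >= 2 && frame[n - 2] == 0x0D && frame[n - 1] == 0x0A)
            {
                frame.RemoveRange(n - 2, 2);      // strip CR LF
                break;
            }
        }
        return frame.ToArray();
    }

    static void Main()
    {
        using var port = new SerialPort("COM1", 9600);   // placeholder settings
        port.Open();
        byte[] payload = ReadBinaryLine(port);
        Console.WriteLine(string.Join(", ", payload.Select(x => "0x" + x.ToString("X2"))));
    }
}

ReadByte blocks until a byte is available (or the ReadTimeout elapses), so nothing is lost and nothing is padded with 0x00.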

Related

Handling IAC bytes in TcpClient.Read()

I am reading data from a Telnet server with TcpClient into a byte array and converting it to a string with the ASCII encoder, i.e.:
int size;
byte[] buf = new byte[MAX_BUFFER_SIZE];
size = tcpclient.GetStream().Read(buf, 0, MAX_BUFFER_SIZE);
string resp = new System.Text.ASCIIEncoding().GetString(buf, 0, size);
so that buf holds the bytes that were read. I want to extract (and later answer) the IAC commands inside buf rather than have them end up in resp. Do you have an efficient way to suggest?
What first comes to mind is: find 0xFF characters in resp; (a) if one is followed by another 0xFF, replace the pair with a single 0xFF; (b) if not, take the following byte as an IAC command to answer, then resp.Remove() the two characters.
Is that an efficient solution? Do you suggest solving it at the byte[] level (is there a way to make the ASCII encoder ignore certain byte values?) or after decoding into a string?
What first comes to mind is: find 0xFF characters in resp; (a) if one is followed by another 0xFF, replace the pair with a single 0xFF; (b) if not, take the following byte as an IAC command to answer, then resp.Remove() the two characters.
Part a will work, but part b is lacking. There are several commands that consist of more than 2 bytes. You're going to have to do more work and determine how many bytes each command should consume. For instance, the SB command will contain a variable number of bytes depending on what the suboption is.
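For example, a byte-level scan along those lines could look like the following sketch. The constants are the standard Telnet values; as a simplification, it assumes every command arrives complete within the buffer:

using System;
using System.Collections.Generic;

static class TelnetFilter
{
    const byte IAC = 255, SB = 250, SE = 240, WILL = 251, DONT = 254;

    // Splits a raw buffer into application data and IAC command sequences.
    public static (byte[] Data, List<byte[]> Commands) Split(byte[] buf, int count)
    {
        var data = new List<byte>();
        var commands = new List<byte[]>();
        int i = 0;
        while (i < count)
        {
            if (buf[i] != IAC) { data.Add(buf[i++]); continue; }

            if (i + 1 < count && buf[i + 1] == IAC)          // escaped literal 0xFF
            {
                data.Add(IAC);
                i += 2;
            }
            else if (i + 1 < count && buf[i + 1] == SB)      // IAC SB ... IAC SE
            {
                int end = i + 2;
                while (end + 1 < count && !(buf[end] == IAC && buf[end + 1] == SE)) end++;
                commands.Add(buf[i..Math.Min(end + 2, count)]);
                i = end + 2;
            }
            else if (i + 2 < count && buf[i + 1] >= WILL && buf[i + 1] <= DONT)
            {
                commands.Add(buf[i..(i + 3)]);               // 3-byte option negotiation
                i += 3;
            }
            else
            {
                commands.Add(buf[i..Math.Min(i + 2, count)]); // other 2-byte command
                i += 2;
            }
        }
        return (data.ToArray(), commands);
    }
}

The Data part can then be handed to the ASCII encoder, and each entry in Commands answered separately.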

BinaryWriter Unusual hex

I have a problem transferring data via BinaryWriter.
When I try to send
bw.Write(0x1a);
bw.Write(0xf8);
bw.Write(0x05);
most of the output ends up as 0x00 when it goes out via
Client2Server._mainSock.Send(ms.ToArray());
What is causing this problem?
Greetings
You are writing 3 integers here. Integers take 4 bytes, and for the values shown, three of those four bytes will be zero. Write bytes instead:
bw.Write((byte)0x1a);
Of course, if you are only writing bytes, then BinaryWriter is overkill; you could just use the Stream directly.
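A quick self-contained sketch of the difference between the two overloads (using a MemoryStream so the bytes can be inspected):

using System;
using System.IO;

class WriteSizeDemo
{
    static void Main()
    {
        var ms = new MemoryStream();
        var bw = new BinaryWriter(ms);

        bw.Write(0x1a);          // int overload: writes 1A 00 00 00 (little-endian)
        bw.Write((byte)0xf8);    // byte overload: writes F8

        // Writing plain bytes needs no BinaryWriter at all:
        ms.WriteByte(0x05);

        Console.WriteLine(BitConverter.ToString(ms.ToArray()));   // 1A-00-00-00-F8-05
    }
}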

Byte array replace byte with byte sequence efficiency: iterate and copy versus SelectMany

I'm dealing with a byte array that comprises a text message, but some of the characters in the message are control characters (i.e. less than 0x20), and I want to replace them with sequences of characters that are human readable when decoded into ASCII (for instance, 0x09 would display [TAB] instead of actually being a tab character). So as I see it, I have three options:
Decode the whole thing into an ASCII string, then use String.Replace() to swap out what I want. The problem with this is that the control characters seem to be decoded as unprintable box characters or question marks, thus losing their actual byte values.
Iterate through the byte array looking for any of my control characters and performing an array insert operation (make new larger array, copy existing pieces in, write new pieces).
Use Array.ToList<byte>() to convert the byte array to a List, then use IEnumerable.SelectMany() to transform the control characters into sequences of readable characters which SelectMany will then flatten back out for me.
So the question is, which is the best option in terms of efficiency? I don't really have a good feel for the performance implications of the IEnumerable lambda operations. I believe option 1 is out as functionally unworkable, but I could be wrong.
Try
// your byte array for the message
byte[] TheMessage = ...;
// a string representation of your message (byte values 0x00-0x7F are NOT altered)
string MessageString = Encoding.ASCII.GetString(TheMessage);
// replace whatever you want...
MessageString = MessageString.Replace (" ", "x").Replace ( "\n", " " )...
// the replaced message back as byte array
byte[] TheReplacedMessage = Encoding.ASCII.GetBytes(MessageString);
EDIT:
Sample for replacing an 8 Bit byte value
MessageString = MessageString.Replace(Encoding.ASCII.GetString(new byte[] { 0xF7 }), " ")...
Regarding performance:
I am not 100% sure whether this is the fastest approach. We tried several approaches; our requirement was to replace a sequence of 1-n bytes within the original byte array, and this came out the fastest and cleanest for our use case (1 MB - 1 GB files).
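For comparison, option 3 from the question (SelectMany) can be written compactly. This is only a sketch with made-up sample data, and its speed would still have to be measured against the string-replace approach above:

using System;
using System.Linq;
using System.Text;

class ControlCharExpansion
{
    static void Main()
    {
        byte[] message = { 0x48, 0x69, 0x09, 0x21 };   // "Hi" + tab + "!"

        // Expand each control byte (< 0x20) into a readable tag, keep everything else.
        byte[] expanded = message
            .SelectMany(b => b < 0x20
                ? Encoding.ASCII.GetBytes($"[0x{b:X2}]")
                : new[] { b })
            .ToArray();

        Console.WriteLine(Encoding.ASCII.GetString(expanded));   // Hi[0x09]!
    }
}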

How to properly write a UDP packet

I am trying to rewrite some of my code from a C++ program I wrote a while ago, but I am not sure if/how I can write to a byte array properly, or if I should be using something else. The code I am trying to change to C# .NET is below.
unsigned char pData[1400];
bf_write g_ReplyInfo("SVC_ReplyInfo", &pData, 1400);
void PlayerManager::BuildReplyInfo()
{
    // Delete the old packet
    g_ReplyInfo.Reset();
    g_ReplyInfo.WriteLong(-1);
    g_ReplyInfo.WriteByte(73);
    g_ReplyInfo.WriteByte(g_ProtocolVersion.GetInt());
    g_ReplyInfo.WriteString(iserver->GetName());
    g_ReplyInfo.WriteString(iserver->GetMapName());
}
BinaryWriter might work, although strings are written with a preceding 7-bit encoded length, which I suspect the client won't be able to handle. You'll probably have to convert strings to bytes and then either add a length word or 0-terminate it.
No need to manually convert numbers to bytes. If you have a long that you want to write as a byte, just cast it. That is, if your BinaryWriter is bw, then you can write bw.Write((byte)longval);. To write -1 as a long: bw.Write((long)(-1)).
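Putting that together, a sketch of the packet builder might look like this. The 0-terminated ASCII string layout and the parameter names are assumptions about what the protocol on the other end expects:

using System.IO;
using System.Text;

class ReplyInfoBuilder
{
    // Writes a 0-terminated ASCII string instead of BinaryWriter's
    // length-prefixed format.
    static void WriteCString(BinaryWriter bw, string value)
    {
        bw.Write(Encoding.ASCII.GetBytes(value));
        bw.Write((byte)0);
    }

    // Mirrors the C++ BuildReplyInfo above; serverName, mapName and
    // protocolVersion stand in for whatever your server object provides.
    static byte[] BuildReplyInfo(int protocolVersion, string serverName, string mapName)
    {
        using var ms = new MemoryStream();
        using var bw = new BinaryWriter(ms);

        bw.Write(-1);                   // WriteLong(-1): 4-byte little-endian int
        bw.Write((byte)73);             // 'I'
        bw.Write((byte)protocolVersion);
        WriteCString(bw, serverName);
        WriteCString(bw, mapName);

        bw.Flush();
        return ms.ToArray();            // ready for Socket.Send / UdpClient.Send
    }
}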

Problem with Encoding.UTF8.GetBytes in C#

I am working in C# and trying the code below:
byte[] buffer = new byte[str.Length];
buffer = Encoding.UTF8.GetBytes(str);
str contains lengthy data, but I'm having a problem getting the complete encoded bytes.
Please tell me what's going wrong and how I can overcome this problem.
Why are you creating a new byte array and then ignoring it? The value of buffer before the call to GetBytes is being replaced with a reference to a new byte array returned by GetBytes.
However, you shouldn't expect the UTF-8 encoded version of a string to be the same length in bytes as the original string's length in characters, unless it's all ASCII. Any character over U+007F takes up at least 2 bytes.
What's the bigger picture here? What are you trying to achieve, and why does the length of the byte array matter to you?
The proper use is:
byte[] buffer = Encoding.UTF8.GetBytes(str);
In general, you should not make any assumptions about length/size/count when working with encoding, bytes and chars/strings. Let the Encoding objects do their work and then query the resulting objects for that info.
Having said that, I don't believe there is an inherent length restriction for the encoding classes. I have several production apps doing the same work in the opposite direction (bytes encoded to chars) which are processing byte arrays in the 10s of megabytes.
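As a small illustration of why the byte count can differ from the character count (only standard Encoding calls, nothing specific to the original data):

using System;
using System.Text;

class Utf8LengthDemo
{
    static void Main()
    {
        string ascii = "hello";
        string accented = "héllo";    // 'é' (U+00E9) takes two bytes in UTF-8

        Console.WriteLine(Encoding.UTF8.GetByteCount(ascii));      // 5
        Console.WriteLine(Encoding.UTF8.GetByteCount(accented));   // 6

        // GetBytes allocates a correctly sized array; no need to guess the length.
        byte[] buffer = Encoding.UTF8.GetBytes(accented);
        Console.WriteLine(buffer.Length);                          // 6
    }
}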
