I'm venturing into networking with C# and I'm trying to create a clean way to send my packets. For now I'm not going to worry about enclosing packets in special characters, which I've been reading about; instead, each packet starts with a three-digit packet number placed in front of the data sent to the client. For example, a data string may be the following.
LoginPacket is packet 000.
LoginPacket Data would be "000Username~Password"
I've tried to clean this up so I could write things in a cleaner manner, with something like this:
SendPacket(000, new string[] { "data", "parameters" });
However, the integer literal 000 is just 0, so the leading zeros are lost.
Is there a way around this, or would I be better off storing it all in a string, such as:
SendPacket(new string[] { "000", "data", "params" });
When you convert the number to text you need to specify the number of digits. The number 000 and 0 are both zero. However the string "000" and "0" are different strings.
Use
n.ToString("000");
To ensure you get three digits.
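The same zero-padding idea, shown as a minimal C++ sketch for a sender that is not written in C#. It assumes packet IDs stay in the range 0-999 and that fields are joined with '~' as in the "000Username~Password" example; buildPacket is a hypothetical helper name.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: writes the packet ID as exactly three digits
// (0 -> "000", 42 -> "042"), then the fields separated by '~'.
// Assumes 0 <= id <= 999; larger IDs would simply print more digits.
std::string buildPacket(int id, const std::vector<std::string>& fields) {
    std::ostringstream out;
    out << std::setw(3) << std::setfill('0') << id;  // setw applies to this value only
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) out << '~';
        out << fields[i];
    }
    return out.str();
}
```

Calling buildPacket(0, {"Username", "Password"}) yields "000Username~Password", so the ID is kept as an int in code and only becomes three characters when it is written out.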
I would suggest you go with a Command Type, followed by a Length, followed by the Payload.
Then your payloads can have a similar structure. For example, the login command (0) would use a structure that begins with a length byte followed by the username, then a second length byte, and finally the password.
For example (numbers shown as decimal text for readability; on the wire they are binary values):
0155Dan-o8password
That is: command 0, payload length 15, then 5 and "Dan-o", then 8 and "password".
Remember that all of this comes over the wire as a byte array, so you read the first 4 bytes (Int32, Command Type).
Then read the next Int32 to figure out the length of the payload; that's how many bytes you will read in your third read.
Now that you know the Command is login you can implement login-specific reading.
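The three reads described above can be sketched as an encoder/decoder pair in C++. This is a minimal sketch, assuming both header integers are 4-byte little-endian values (the byte order .NET's BinaryWriter uses); Frame, encodeFrame and decodeFrame are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical frame: [int32 command][int32 payload length][payload bytes].
struct Frame {
    std::int32_t command;
    std::vector<std::uint8_t> payload;
};

static void putInt32(std::vector<std::uint8_t>& out, std::int32_t v) {
    std::uint32_t u = static_cast<std::uint32_t>(v);
    for (int i = 0; i < 4; ++i)  // least significant byte first (little-endian)
        out.push_back(static_cast<std::uint8_t>(u >> (8 * i)));
}

static std::int32_t getInt32(const std::vector<std::uint8_t>& in, std::size_t pos) {
    std::uint32_t u = 0;
    for (int i = 0; i < 4; ++i)
        u |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    return static_cast<std::int32_t>(u);
}

std::vector<std::uint8_t> encodeFrame(const Frame& f) {
    std::vector<std::uint8_t> out;
    putInt32(out, f.command);                                     // read #1: command type
    putInt32(out, static_cast<std::int32_t>(f.payload.size()));  // read #2: payload length
    out.insert(out.end(), f.payload.begin(), f.payload.end());   // read #3: payload
    return out;
}

// Returns false if the buffer does not yet hold a complete frame
// (or if the declared length is negative, i.e. the data is corrupt).
bool decodeFrame(const std::vector<std::uint8_t>& buf, Frame& out) {
    if (buf.size() < 8) return false;  // header not complete yet
    std::int32_t length = getInt32(buf, 4);
    if (length < 0 || buf.size() - 8 < static_cast<std::size_t>(length)) return false;
    out.command = getInt32(buf, 0);
    out.payload.assign(buf.begin() + 8, buf.begin() + 8 + length);
    return true;
}
```

Once decodeFrame succeeds, a switch on command can hand the payload to the command-specific reader.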
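In addition, I would suggest you create some extension methods to make this easier,
such as Stream.ReadByte, Stream.ReadInt32, Stream.ReadInt64, and Stream.ReadString(length). (Stream already has a ReadByte method, and BinaryReader provides ReadInt32 and ReadInt64 if you'd rather not write your own.)
Then add some application-specific extensions, like Stream.ReadLogin.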
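A rough C++ analogue of those helpers, layering an application-specific readLogin on generic readByte/readString readers. It assumes the login payload layout from the example above ([length byte][username][length byte][password]); ByteReader, Login and readLogin are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Generic readers over a payload buffer (hypothetical class).
class ByteReader {
public:
    explicit ByteReader(const std::vector<std::uint8_t>& data) : data_(data) {}

    std::uint8_t readByte() {
        if (pos_ >= data_.size()) throw std::out_of_range("payload too short");
        return data_[pos_++];
    }

    std::string readString(std::size_t length) {
        if (data_.size() - pos_ < length) throw std::out_of_range("payload too short");
        std::string s(data_.begin() + pos_, data_.begin() + pos_ + length);
        pos_ += length;
        return s;
    }

private:
    const std::vector<std::uint8_t>& data_;  // must outlive the reader
    std::size_t pos_ = 0;
};

struct Login {
    std::string username;
    std::string password;
};

// Application-specific reader built only from the generic helpers.
Login readLogin(ByteReader& r) {
    Login login;
    login.username = r.readString(r.readByte());  // length byte, then that many bytes
    login.password = r.readString(r.readByte());
    return login;
}
```

The bounds checks mean a truncated or corrupt payload raises an error instead of reading past the buffer.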
Alright, so I basically want to read any file with a specific extension. Going through all the bytes and reading the file is basically easy, but what about getting the type of the next byte? For example:
while ((int)reader.BaseStream.Position != RecordSize * RecordsCount)
{
// How do I check what type is the next byte gonna be?
// Example:
// In every file, the first 4 bytes are always a uint:
uint id = reader.ReadUInt32();
// However, now I need to check for the next byte's type:
// How do I check the next byte's type?
}
Bytes don't have a type. When data of some language type, such as a char, a string, or a long, is converted to bytes and written to a file, there is no way to tell afterwards what the type was: all bytes look alike, each just a number from 0-255.
In order to know, and to convert back from bytes to structured language types, you need to know the format that the file was written in.
For example, you might know that the file was written as an ascii text file, and hence every byte represents one ascii character.
Or you might know that your file was written with the format {uint16}{50 byte string}{linefeed}, where the first 2 bytes represent an unsigned 16-bit integer, the next 50 a string, followed by a linefeed.
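Reading such a file only works because the reader knows the layout. A minimal C++ sketch, assuming the 2-byte integer is little-endian and the 50-byte text field is padded with '\0' (both are assumptions, not part of the format as stated); Record and readRecords are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

constexpr std::size_t kTextSize = 50;
constexpr std::size_t kRecordSize = 2 + kTextSize + 1;  // uint16 + text + '\n'

struct Record {
    std::uint16_t id;
    std::string text;
};

std::vector<Record> readRecords(const std::vector<std::uint8_t>& file) {
    if (file.size() % kRecordSize != 0) throw std::runtime_error("truncated record");
    std::vector<Record> records;
    for (std::size_t off = 0; off < file.size(); off += kRecordSize) {
        Record r;
        // Bytes 0-1: unsigned 16-bit integer, low byte first (assumed).
        r.id = static_cast<std::uint16_t>(file[off] | (file[off + 1] << 8));
        // Bytes 2-51: text, trailing '\0' padding dropped (assumed).
        const char* text = reinterpret_cast<const char*>(&file[off + 2]);
        std::size_t len = 0;
        while (len < kTextSize && text[len] != '\0') ++len;
        r.text.assign(text, len);
        // Byte 52: must be a linefeed, otherwise the file is not in this format.
        if (file[off + kRecordSize - 1] != '\n') throw std::runtime_error("missing linefeed");
        records.push_back(r);
    }
    return records;
}
```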
Because all bytes look the same, if you don't know the file format you can't read the file in a semantically correct way. For example, I might send you a file I created by writing out some ascii text, but I might tell you that the file is full of 2-byte uints. You would write a program to read those bytes as 2-byte uints and it would work : any 2 bytes can be interpreted as a uint. I could tell someone else that the same file was composed of 4-byte longs, and they could read it as 4-byte longs : any 4 bytes can be interpreted as a long. I could tell someone else the file was a 2 byte uint followed by 6 ascii characters. And so on.
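To make the point concrete, here is a small C++ sketch that reads the same four bytes (the ASCII text "ABCD") as text, as two 2-byte uints, and as one 4-byte uint. Multi-byte values are assembled little-endian, which is itself an assumption about the format; the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Interpretation 1: every byte is one ASCII character.
std::string asAscii(const std::vector<std::uint8_t>& b) {
    return std::string(b.begin(), b.end());
}

// Interpretation 2: every 2 bytes are an unsigned 16-bit integer.
std::vector<std::uint16_t> asUInt16s(const std::vector<std::uint8_t>& b) {
    std::vector<std::uint16_t> out;
    for (std::size_t i = 0; i + 1 < b.size(); i += 2)
        out.push_back(static_cast<std::uint16_t>(b[i] | (b[i + 1] << 8)));
    return out;
}

// Interpretation 3: every 4 bytes are an unsigned 32-bit integer.
std::vector<std::uint32_t> asUInt32s(const std::vector<std::uint8_t>& b) {
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i + 3 < b.size(); i += 4) {
        std::uint32_t v = 0;
        for (int k = 3; k >= 0; --k) v = (v << 8) | b[i + k];
        out.push_back(v);
    }
    return out;
}
```

For the bytes 0x41 0x42 0x43 0x44 these give "ABCD", {16961, 17475} and {1145258561}. All three succeed, and nothing in the bytes says which one the writer meant.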
Many types of files will have a defined format : for example, a Windows executable, or a Linux ELF binary.
You might be able to guess the types of the bytes in the file if you know something about the reason the file exists. But somehow you have to know, and then you interpret those bytes according to the file format description.
You might think "I'll write the bytes with a token describing them, so the reading program can know what each byte means". For example, a byte with a '1' might mean the next 2 bytes represent a uint, a byte with a '2' might mean the following byte tells the length of a string, and the bytes after that are the string, and so on. Sure, you can do that. But (a) the reading program still needs to understand that convention, so everything I said above still applies (it's turtles all the way down), (b) that approach uses a lot of space to describe the file, and (c) the reading program needs to know how to interpret a dynamically described file, which is only useful in certain circumstances and probably means there is a meta-meta format describing what the embedded meta-format means.
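A minimal C++ sketch of that tagged convention (tag 1 = a 2-byte uint follows, tag 2 = a length byte and that many string bytes follow). Little-endian order and the one-byte length limit are assumptions, and all names are hypothetical. Note that readTagged is still hard-coded to this one convention, which is exactly point (a).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

struct Value {
    bool isString;
    std::uint16_t number;  // valid when !isString
    std::string text;      // valid when isString
};

void writeUInt16(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(1);                                     // tag: uint follows
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));   // low byte
    out.push_back(static_cast<std::uint8_t>(v >> 8));     // high byte
}

void writeString(std::vector<std::uint8_t>& out, const std::string& s) {
    if (s.size() > 255) throw std::length_error("string too long for a 1-byte length");
    out.push_back(2);                                     // tag: string follows
    out.push_back(static_cast<std::uint8_t>(s.size()));   // length byte
    out.insert(out.end(), s.begin(), s.end());
}

std::vector<Value> readTagged(const std::vector<std::uint8_t>& in) {
    std::vector<Value> values;
    std::size_t pos = 0;
    while (pos < in.size()) {
        std::uint8_t tag = in[pos++];
        if (tag == 1 && in.size() - pos >= 2) {
            values.push_back({false, static_cast<std::uint16_t>(in[pos] | (in[pos + 1] << 8)), ""});
            pos += 2;
        } else if (tag == 2 && pos < in.size() &&
                   in.size() - pos - 1 >= static_cast<std::size_t>(in[pos])) {
            std::size_t len = in[pos++];
            values.push_back({true, 0, std::string(in.begin() + pos, in.begin() + pos + len)});
            pos += len;
        } else {
            throw std::runtime_error("unknown tag or truncated value");
        }
    }
    return values;
}
```

Writing the number 513 and the string "hi" costs 7 bytes here instead of the 4 bytes of data, which illustrates point (b).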
Long story short, all bytes look the same, and a reading program has to be told what those bytes represent before it can use them meaningfully.
I'm setting up a way to communicate between a server and a client. At the moment, the first byte of a message indicates what is coming, and by looking up that request's class I can determine the length of the request:
stream.Read(message, 0, 1);
if (message[0] == <byte representation of a known class>)
{
stream.Read(message, 0, Class.RequestSize);
}
I'm curious how to handle the case where the class size is not known, or where the data turns out to be corrupt after reading a known request.
I'm thinking that I could insert some sort of delimiter into the stream, but since a byte can only be between 0 and 255, I'm not sure how to go about creating a unique delimiter. Should I place a pattern into the stream to represent the end of a message? How can I be sure that this pattern is unique enough not to be mistaken for actual data?
There are different approaches to this. One option is to send the length of the class name, and possibly of the whole packet, first (e.g. always as the first byte). This way you can read just that byte and then n more bytes to get the class name.
With this approach you don't end up reading a lot of data that a malicious client sends with the intent to DoS your application, and you can quickly determine whether you have read enough to handle the packet or whether it's not yet complete.
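A minimal C++ sketch of that length-first approach for a TCP receive loop, assuming a 1-byte length prefix as in the example. Data arrives in arbitrary chunks, so bytes are buffered until a whole message is present, and an oversized declared length is rejected as soon as the length byte arrives. LengthPrefixedReader and maxLength are hypothetical.

```cpp
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <string>

class LengthPrefixedReader {
public:
    explicit LengthPrefixedReader(std::size_t maxLength) : maxLength_(maxLength) {}

    // Feed whatever the socket returned; chunk boundaries don't matter.
    void append(const std::string& chunk) {
        buffer_.insert(buffer_.end(), chunk.begin(), chunk.end());
    }

    // Returns true and fills 'message' if a complete message is buffered.
    bool tryNext(std::string& message) {
        if (buffer_.empty()) return false;
        std::size_t length = static_cast<unsigned char>(buffer_.front());
        if (length > maxLength_)  // refuse early instead of waiting for the data
            throw std::runtime_error("declared length too large");
        if (buffer_.size() < 1 + length) return false;  // incomplete: wait for more bytes
        message.assign(buffer_.begin() + 1, buffer_.begin() + 1 + length);
        buffer_.erase(buffer_.begin(), buffer_.begin() + 1 + length);
        return true;
    }

private:
    std::size_t maxLength_;
    std::deque<char> buffer_;
};
```

A receive loop would call append after every read and then call tryNext until it returns false.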
There are also some low-level control bytes intended specifically as delimiters. Start of Text and End of Text have (hex) values of 0x02 and 0x03 respectively, and there is Start of Heading coupled with End of Transmission, 0x01 and 0x04; you could use these.
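A minimal C++ sketch of STX/ETX framing on the receiving side. It assumes payloads never contain 0x02 or 0x03 themselves (true for plain printable text, not for arbitrary binary data), and it discards stray bytes outside a frame; extractFrames is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Pulls every complete STX...ETX frame out of 'buffer' and leaves any
// incomplete trailing frame in 'buffer' for the next call.
std::vector<std::string> extractFrames(std::string& buffer) {
    const char STX = 0x02, ETX = 0x03;
    std::vector<std::string> frames;
    for (;;) {
        std::size_t start = buffer.find(STX);
        if (start == std::string::npos) {  // no frame start: drop the noise
            buffer.clear();
            break;
        }
        std::size_t end = buffer.find(ETX, start + 1);
        if (end == std::string::npos) {    // frame not finished: keep it
            buffer.erase(0, start);
            break;
        }
        frames.push_back(buffer.substr(start + 1, end - start - 1));
        buffer.erase(0, end + 1);
    }
    return frames;
}
```

If the payload may contain arbitrary bytes, a length prefix is the safer choice, since a delimiter byte can then appear inside the data.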
Is there any way to compress small strings (86 chars) into something smaller?
#a#1\s\215\c\6\-0.55955,-0.766462,0.315342\s\1\x\-3421.-4006,3519.-4994,3847.1744,sbs
The only way I see is to replace recurring characters with a unique character.
But I can't find anything about that on Google.
Thanks for any reply.
http://en.wikipedia.org/wiki/Huffman_coding
Huffman coding would probably be pretty good start. In general the idea is to replace individual characters with the smallest bit pattern needed to replicate the original string or dataset.
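A minimal C++ sketch of the core of Huffman coding: repeatedly merging the two least frequent groups, so that frequent characters end up with shorter codes. It only computes code lengths and the total encoded size; assigning the actual bit patterns and agreeing on the code table with the receiver are further steps. Function names are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Bits per character in an optimal prefix code for 'sample'.
std::map<char, int> huffmanCodeLengths(const std::string& sample) {
    std::map<char, std::size_t> freq;
    for (char c : sample) ++freq[c];

    std::map<char, int> lengths;
    if (freq.size() == 1) {  // a single symbol still needs 1 bit
        lengths[freq.begin()->first] = 1;
        return lengths;
    }

    // Queue entry: (total weight, characters in this subtree); lightest on top.
    using Node = std::pair<std::size_t, std::vector<char>>;
    auto heavier = [](const Node& a, const Node& b) { return a.first > b.first; };
    std::priority_queue<Node, std::vector<Node>, decltype(heavier)> queue(heavier);
    for (const auto& kv : freq) queue.push(Node(kv.second, std::vector<char>{kv.first}));

    while (queue.size() > 1) {
        Node a = queue.top(); queue.pop();
        Node b = queue.top(); queue.pop();
        // Every character under the merged node moves one level deeper.
        for (char c : a.second) ++lengths[c];
        for (char c : b.second) ++lengths[c];
        a.second.insert(a.second.end(), b.second.begin(), b.second.end());
        queue.push(Node(a.first + b.first, std::move(a.second)));
    }
    return lengths;
}

// Total size of 'sample' in bits when Huffman-coded with its own statistics.
std::size_t huffmanBits(const std::string& sample) {
    std::map<char, int> lengths = huffmanCodeLengths(sample);
    std::size_t bits = 0;
    for (char c : sample) bits += lengths[c];
    return bits;
}
```

For "aaaabbc" this gives 'a' a 1-bit code and 'b'/'c' 2-bit codes, 10 bits in total instead of 56.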
You'll want to run statistical analysis on a variety of 'small strings' to find the most common characters, so that the more common characters are represented with the shortest unique bit patterns. And possibly make up an 'example' small string with every character that will need to be represented (like a-z0-9#.0-).
I took your example string of 85 bytes (not 83 since it was copied verbatim from the post, perhaps with some intended escapes not processed). I compressed it using raw deflate, i.e. no zlib or gzip headers and trailers, and it compressed to 69 bytes. This was done mostly by Huffman coding, though also with four three-byte backward string references.
The best way to compress this sort of thing is to use everything you know about the data. There appears to be some structure to it and there are numbers coded in it. You could develop a representation of the expected data that is shorter. You can encode it as a stream of bits, and the first bit could indicate that what follows is straight bytes in the case that the data you got was not what was expected.
Another approach would be to take advantage of previous messages. If this message is one of a stream of messages, and they all look similar to each other, then you can make a dictionary of previous messages to use as a basis for compression, which can be reconstructed at the other end from the previous messages received. That may offer dramatically improved compression if the messages really are similar.
You should look up RUN-LENGTH ENCODING. Here is a demonstration:
rrrrrunnnnnn BECOMES 5r1u6n. HOW? Truncate repetitions: for x consecutive r's, write xr.
Now what if some of the characters are digits? Then instead of writing x as a number, use the character whose ASCII value is x. For example,
if you have 43 consecutive P, write +P because '+' has ASCII code 43. If you have 49 consecutive y, write 1y because '1' has ASCII code 49.
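A minimal C++ sketch of that scheme, where each run becomes a count character (whose code equals the run length) followed by the repeated character. Capping runs at 255, so the count fits in one byte, is an assumption the answer does not cover. The output may contain non-printable characters (a run of 10 produces '\n'). Function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Each run of up to 255 identical characters becomes (count, char).
std::string rleEncode(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size();) {
        std::size_t run = 1;
        while (i + run < s.size() && s[i + run] == s[i] && run < 255) ++run;
        out += static_cast<char>(run);  // the count stored as a character code
        out += s[i];
        i += run;
    }
    return out;
}

std::string rleDecode(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i + 1 < s.size(); i += 2)
        out.append(static_cast<unsigned char>(s[i]), s[i + 1]);  // count copies of char
    return out;
}
```

rleEncode(std::string(43, 'P')) returns "+P", and "rrrrrunnnnnn" becomes six bytes: 5, 'r', 1, 'u', 6, 'n'.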
Now the catch is a string with little or no repetition: in that case your encoding may be longer than the original. But that's true for all compression algorithms.
NOTE:
I don't encourage using Huffman coding (or a Lempel-Ziv scheme) because it's a lot of work to get right.
I have a structure that I am converting to a byte array of length 37, then to a string from that.
I am writing a very basic activation-type library, and this string will be passed between people, so I want to shorten it from length 37 to something more manageable to type.
Right now:
Convert the structure to a byte array,
Convert the byte array to a base 64 string (which is still too long).
What is a good way to shorten this string, yet still maintain the data stored in it?
Thanks.
In the general case, going from an arbitrary byte[] to a string requires more data, since we assume we want to avoid non-printable characters. The only way to reduce it is to compress before the base-whatever (you can get a little higher than base-64, but not much - and it certainly isn't any more "friendly") - but compression won't really kick in for such a short size. Basically, you can't do that. You are trying to fit a quart in a pint pot, and that doesn't work.
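A quick back-of-the-envelope check of why a "friendlier" alphabet cannot help: 37 bytes are 296 bits, and each output character carries only log2(alphabet size) bits. The C++ helpers below (hypothetical names) do the ceiling division.

```cpp
#include <cstddef>

// Characters needed to carry 'bytes' bytes when each character carries
// 'bitsPerChar' bits (6 for base-64, 5 for base-32), ignoring padding.
std::size_t encodedChars(std::size_t bytes, std::size_t bitsPerChar) {
    return (bytes * 8 + bitsPerChar - 1) / bitsPerChar;  // ceiling division
}

// Standard padded base-64 emits 4 characters per started 3-byte group.
std::size_t base64PaddedChars(std::size_t bytes) {
    return 4 * ((bytes + 2) / 3);
}
```

For 37 bytes this gives 50 base-64 characters (52 with '=' padding), while a case-insensitive base-32 alphabet needs 60, so anything easier to type only makes the string longer.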
You may have to rethink your requirements. Perhaps save the BLOB internally, and issue a shorter token (maybe 10 chars, maybe a guid) that is a key to the actual BLOB.
Data compression may be a possibility to check out, but you can't just compress a 40-byte message to 6 bytes (for example).
If the space of possible strings/types is limited, map them to a list (information coding).
I don't know of anything better than base-64 if you actually have to pass the value around and if users have to type it in.
If you have a central data store they can all access, you could just give them the ID of the row where you saved it. This of course depends on how "secret" this data needs to be.
But I suspect that if you're trying to use this for activation, you need them to have an actual value.
How will the string be passed? Can you expect users to just copy and paste it? Perhaps time spent cleaning up superfluous line breaks introduced by an email reader, or even your own "Copy from here" and "Copy to here" lines, might bear more fruit!
Can your string contain non-printable characters? If so, you don't need to base64-encode the bytes; you can simply create the string from them (avoiding base-64's 33% size overhead):
string str = new string(byteArray.Select(b => (char)b).ToArray()); // requires using System.Linq;
Also, are the values in the byte array restricted somehow? If they fall into a certain range (i.e., not all of the 256 possible values are used), you can consider packing two of them into each character of the string.
If you really have 37 bytes of non-redundant information, then you are out of luck. Compression may help in some cases, but if this is an activation key, I would recommend having keys of same length (and compression will not enforce this).
If this code is going to be passed over e-mail, then I see no problem in having an even larger key. Another option might be to insert hyphens every 5-or-so characters, to break it into smaller chunks (e.g. XXXXX-XXXXX-XXXXX-XXXXX-XXXXX).
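A minimal C++ sketch of the hyphen grouping, with a matching function that strips the hyphens before decoding. Stripping is lossless for the standard base-64 alphabet (which uses '+' and '/', not '-'), but would not be for URL-safe base-64, which does use '-'. Function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Inserts a '-' every 'group' characters: XXXXX-XXXXX-XXXXX-...
std::string groupKey(const std::string& key, std::size_t group = 5) {
    std::string out;
    for (std::size_t i = 0; i < key.size(); ++i) {
        if (i > 0 && i % group == 0) out += '-';
        out += key[i];
    }
    return out;
}

// Removes the cosmetic hyphens again before the key is decoded.
std::string ungroupKey(const std::string& grouped) {
    std::string out;
    for (char c : grouped)
        if (c != '-') out += c;
    return out;
}
```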
Use a 160-bit hash and hope for no collisions? It would be much shorter. If you can use a look-up table, just use a 128-bit or even 64-bit incremental value. Much, much shorter than your 37 chars.
I have a couple of parameters, which need to be sent to a client app via TCP/IP.
For example:
//inside C++ program
int Temp = 10;
int maxTemp = 100;
float Pressure = 2.3;
Question: What is the best practice for formatting the string? I need to make sure that the whole string is received by the client, and it should be easy for the client to decode.
Basically, I want to know what format the string I send should have.
PS: Client app is in C# and the sender's app is in Qt (C++).
This is pretty subjective, but if it will always be as simple as described, keep it simple:
ASCII, space delimited, invariant (culture-independent) numbers in their fully expanded form (no E notation etc.), with CR as the end sentinel, so:
10 100 2.3
(with a CR at the end). This scales to any number of records, and will be easy to decode from just about any platform.
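A minimal C++ sketch of both ends of that format, using std::ostringstream with the classic "C" locale so the decimal separator is always '.'. Plain standard C++ streams are used here rather than Qt classes, and the function names are hypothetical. On the C# side the equivalent would be splitting on ' ' and parsing each token with double.Parse(token, CultureInfo.InvariantCulture).

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Sender: "10 100 2.3\r", always with '.' as decimal separator.
std::string formatReadings(int temp, int maxTemp, float pressure) {
    std::ostringstream out;
    out.imbue(std::locale::classic());
    out << temp << ' ' << maxTemp << ' ' << pressure << '\r';
    return out.str();
}

// Receiver: returns false until the CR sentinel has arrived, or if a
// token is not a number.
bool parseReadings(const std::string& line, std::vector<double>& values) {
    if (line.empty() || line.back() != '\r') return false;
    std::istringstream in(line.substr(0, line.size() - 1));
    in.imbue(std::locale::classic());
    values.clear();
    double v;
    while (in >> v) values.push_back(v);
    return in.eof();  // stopped at end of input, not at a bad token
}
```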
If it gets more nuanced: use a serializer built for the job, and just share details of what serialization format you are using.
Use ASCII of the form paramName paramValue, space delimited, with culture-independent numbers in their full form (no E notation) and a carriage return at the end, for example: T 10 mT 100 P 2.3 with a CR at the end. On the other side, you can simply split the string on whitespace: tokens at even indices are parameter names, and for every name at index i, the token at index i+1 is its value. Also mind the CR at the end.
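A minimal C++ sketch of the name/value variant, parsing into a map so the order of parameters no longer matters. The names T, mT and P are just the example's; parseNamedReadings is a hypothetical function, and the classic locale is used so '.' is the decimal separator.

```cpp
#include <locale>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Parses "name value name value ...\r": even-position tokens are names,
// the token right after each name is its value.
std::map<std::string, double> parseNamedReadings(const std::string& line) {
    if (line.empty() || line.back() != '\r') throw std::runtime_error("missing CR terminator");
    std::istringstream in(line.substr(0, line.size() - 1));
    in.imbue(std::locale::classic());
    std::map<std::string, double> params;
    std::string name;
    double value;
    while (in >> name) {
        if (!(in >> value)) throw std::runtime_error("name without a numeric value: " + name);
        params[name] = value;
    }
    return params;
}
```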