Determine what Byte array is? - c#

I have recently started learning C# networking, and I was wondering how you would tell whether a received byte array is a file or a string.

A byte array is just a byte array. It just has data in it.
How you interpret that data is up to you. What's the difference between a text file and a string, for example?
Fundamentally, if your application needs to know how to interpret the data, you've got to put that into the protocol.

A byte array is just a byte array. However, you could make the original byte array include a byte that describes what type it is (assuming you are the originator of it). Then you find this descriptor byte and use it to make decisions.
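A minimal sketch of the descriptor-byte idea, assuming you control both the sender and the receiver. The `PayloadKind` enum, its values, and the `TaggedMessage` helper are hypothetical names made up for illustration; they are not part of any library.

```csharp
using System;
using System.Text;

// Hypothetical one-byte type tags; the numeric values are arbitrary.
public enum PayloadKind : byte { Text = 1, File = 2 }

public static class TaggedMessage
{
    // Prepend the tag so the receiver knows how to interpret the rest.
    public static byte[] Wrap(PayloadKind kind, byte[] payload)
    {
        var message = new byte[payload.Length + 1];
        message[0] = (byte)kind;
        Buffer.BlockCopy(payload, 0, message, 1, payload.Length);
        return message;
    }

    // Split a received message back into its tag and its payload.
    public static (PayloadKind Kind, byte[] Payload) Unwrap(byte[] message)
    {
        if (message == null || message.Length == 0)
            throw new ArgumentException("Message must contain at least the tag byte.", nameof(message));

        var payload = new byte[message.Length - 1];
        Buffer.BlockCopy(message, 1, payload, 0, payload.Length);
        return ((PayloadKind)message[0], payload);
    }
}
```

The sender would call something like `TaggedMessage.Wrap(PayloadKind.Text, Encoding.UTF8.GetBytes("hello"))`, and the receiver would switch on the `Kind` returned by `Unwrap` to decide whether to decode the payload as text or save it as a file.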

Strings are encoded byte arrays; files can contain strings and/or binary data.
ASCII strings use byte values between 0 and 127 to represent characters and control codes. For UTF-8, people have written validation routines (https://stackoverflow.com/a/892443/884862).
You'd have to check the array for all of the string encoding characteristics before you could assume it's a binary file.
Edit: here's an SO question about classifying a file type, "Using .NET, how can you find the mime type of a file based on the file signature not the extension", which uses a signature (the first X bytes) of the file to determine its MIME type.
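As a rough sketch of such a check (not the routine from the linked answer), the strict mode of `UTF8Encoding` can be used to test whether an array decodes cleanly. The `TextSniffer` class name is hypothetical. Keep in mind this is only a heuristic: binary data can happen to be valid UTF-8, so passing the check does not prove the sender meant text.

```csharp
using System;
using System.Text;

public static class TextSniffer
{
    // Strict decoder: throws on invalid byte sequences instead of
    // silently substituting the replacement character.
    private static readonly UTF8Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // True if every byte is in the 7-bit ASCII range (0-127).
    public static bool IsAscii(byte[] data)
    {
        foreach (byte b in data)
        {
            if (b > 127) return false;
        }
        return true;
    }

    // True if the bytes form a well-formed UTF-8 sequence.
    public static bool IsValidUtf8(byte[] data)
    {
        try
        {
            StrictUtf8.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```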

No, you can't. Data is data; you must layer some form of protocol on top of your network communication. It will need to say something like: "If the first byte I see is a 1, the next four bytes represent an int; if I see a 2, read the next byte, and that is the length of the text string that follows..."
A much easier solution than inventing your own protocol is to use a prebuilt one that gives you a higher-level abstraction, like WCF, so you don't need to deal with byte arrays.
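If you do roll your own, the rule quoted above could be sketched roughly like this. The tag values 1 and 2 come from the description; the little-endian int layout (what `BinaryReader` uses), the UTF-8 text encoding, and the `TinyProtocol` name are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class TinyProtocol
{
    // Decodes a buffer made of tagged values:
    //   tag 1 -> 4-byte little-endian int
    //   tag 2 -> 1 length byte, then that many bytes of UTF-8 text
    public static List<object> ReadAll(byte[] received)
    {
        var values = new List<object>();
        using (var reader = new BinaryReader(new MemoryStream(received)))
        {
            while (reader.BaseStream.Position < reader.BaseStream.Length)
            {
                byte tag = reader.ReadByte();
                switch (tag)
                {
                    case 1:
                        values.Add(reader.ReadInt32());
                        break;
                    case 2:
                        byte length = reader.ReadByte();
                        byte[] text = reader.ReadBytes(length);
                        if (text.Length != length)
                            throw new EndOfStreamException("String payload was truncated.");
                        values.Add(Encoding.UTF8.GetString(text));
                        break;
                    default:
                        throw new InvalidDataException("Unknown tag " + tag + ".");
                }
            }
        }
        return values;
    }
}
```

Both ends have to agree on this table of tags in advance, which is exactly the point: the meaning lives in the protocol, not in the bytes.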

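Not quite a "file"; an array just contains data. You can loop through the array and write out each element.
If the array holds strings (a string[]), try this:
foreach (string data in array)
{
    Console.WriteLine(data);
}
If it doesn't contain strings but raw data (for example a byte[], where the loop above would not compile), iterate with var instead so each element keeps its own type:
foreach (var data in array)
{
    Console.WriteLine(data.ToString());
}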

Related

A custom binarywriter or a custom numeric datatype?

I have to query an Oracle database for various numeric values and dump them with my C# console app into a binary file with a custom format. Depending on the business data, I need to encode each numeric value in 1, 2, 3, 4, 6, 8, 10, or 16 bytes...
For now, I think I could store the 1-byte values as a char or a byte value type and write them with the standard BinaryWriter. For the 2-byte length I could use a short value type, etc. But I am pretty sure that there is no native .NET numeric type for the 3-byte length, the 10-byte length, and so on...
So I am trying to find out how to query the values (from Oracle as strings?) and write them in binary...
The two solutions I have in mind are to write a custom BinaryWriter or to create some custom numeric type classes (something like Byte10, Byte16...), but both solutions seem awkward...
How would you deal with that problem? Please do not advise switching to C/C++, as I do not really know those languages...
Thank you for any help.
Yes it is all about managing a custom byte[] and I think I have found what I was looking for here
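One way to handle the odd widths without a custom numeric type is a small helper that writes only the low N bytes of a wider integer. This is a sketch under assumptions, not necessarily what the asker ended up with: it assumes unsigned values and little-endian byte order, handles widths up to 8 bytes only, and the `FixedWidthWriter` name is hypothetical. The 10- and 16-byte widths would need a wider source type or a raw byte[].

```csharp
using System;
using System.IO;

public static class FixedWidthWriter
{
    // Writes the low byteCount bytes of value, least significant byte first.
    public static void WriteUnsigned(BinaryWriter writer, ulong value, int byteCount)
    {
        if (byteCount < 1 || byteCount > 8)
            throw new ArgumentOutOfRangeException(nameof(byteCount), "Supported widths are 1 to 8 bytes.");

        // Reject values that would be silently truncated.
        if (byteCount < 8 && (value >> (8 * byteCount)) != 0)
            throw new ArgumentOutOfRangeException(nameof(value), "Value does not fit in the requested width.");

        for (int i = 0; i < byteCount; i++)
        {
            writer.Write((byte)((value >> (8 * i)) & 0xFF));
        }
    }
}
```

For example, `FixedWidthWriter.WriteUnsigned(writer, 0x010203, 3)` emits the three bytes 03 02 01.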

C# WPF Binary Reading

Alright, so I basically want to read any file with a specific extension. Going through all the bytes and reading the file is basically easy, but what about getting the type of the next byte? For example:
// reader is assumed to be a System.IO.BinaryReader
while ((int)reader.BaseStream.Position != RecordSize * RecordsCount)
{
    // How do I check what type the next byte is going to be?
    // Example:
    // In every file, the first field is always a uint:
    uint id = reader.ReadUInt32();
    // However, now I need to check for the next byte's type:
    // How do I check the next byte's type?
}
Bytes don't have a type. When data in some language type, such as a char, string, or long, is converted to bytes and written to a file, there is no strict way to tell what the type was: all bytes look alike, a number from 0 to 255.
In order to know, and to convert back from bytes to structured language types, you need to know the format that the file was written in.
For example, you might know that the file was written as an ascii text file, and hence every byte represents one ascii character.
Or you might know that your file was written with the format {uint}{50 byte string}{linefeed}, where the first 4 bytes represent a uint, the next 50 a string, followed by a linefeed.
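Reading one record of that {uint}{50 byte string}{linefeed} layout might look like the sketch below. It assumes a 4-byte little-endian uint (the size of a C# uint and the layout `BinaryReader.ReadUInt32` expects), an ASCII string padded with spaces or NUL bytes, and a single '\n' terminator; the `RecordReader` name and the padding convention are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class RecordReader
{
    public const int NameLength = 50;

    // Reads one {uint}{50 byte string}{linefeed} record from the reader.
    public static (uint Id, string Name) ReadRecord(BinaryReader reader)
    {
        uint id = reader.ReadUInt32();

        byte[] nameBytes = reader.ReadBytes(NameLength);
        if (nameBytes.Length != NameLength)
            throw new EndOfStreamException("Record was truncated inside the string field.");

        byte terminator = reader.ReadByte();
        if (terminator != (byte)'\n')
            throw new InvalidDataException("Expected a linefeed at the end of the record.");

        // Assumed padding convention: trailing spaces or NUL bytes.
        string name = Encoding.ASCII.GetString(nameBytes).TrimEnd(' ', '\0');
        return (id, name);
    }
}
```

Nothing in the bytes themselves says "uint" or "string"; the method only works because the format was agreed on beforehand.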
Because all bytes look the same, if you don't know the file format you can't read the file in a semantically correct way. For example, I might send you a file I created by writing out some ASCII text, but I might tell you that the file is full of 2-byte unsigned integers. You would write a program to read those bytes as 2-byte unsigned integers and it would work: any 2 bytes can be interpreted as a 2-byte unsigned integer. I could tell someone else that the same file was composed of 4-byte integers, and they could read it as 4-byte integers: any 4 bytes can be interpreted as a 4-byte integer. I could tell someone else the file was a 2-byte unsigned integer followed by 6 ASCII characters. And so on.
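To make that concrete, here is a small sketch that reads the same four bytes three ways. The `Reinterpret` class is hypothetical, and little-endian byte order is assumed for the integer readings so the results do not depend on the machine.

```csharp
using System;
using System.Text;

public static class Reinterpret
{
    // Reading 1: every byte is one ASCII character.
    public static string AsAscii(byte[] data)
    {
        return Encoding.ASCII.GetString(data);
    }

    // Reading 2: every pair of bytes is a 2-byte unsigned integer (little-endian).
    public static ushort[] AsUInt16s(byte[] data)
    {
        var result = new ushort[data.Length / 2];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = (ushort)(data[2 * i] | (data[2 * i + 1] << 8));
        }
        return result;
    }

    // Reading 3: every group of four bytes is a 4-byte signed integer (little-endian).
    public static int[] AsInt32s(byte[] data)
    {
        var result = new int[data.Length / 4];
        for (int i = 0; i < result.Length; i++)
        {
            int o = 4 * i;
            result[i] = data[o] | (data[o + 1] << 8) | (data[o + 2] << 16) | (data[o + 3] << 24);
        }
        return result;
    }
}
```

For the bytes 0x41 0x42 0x43 0x44, the first reading gives "ABCD", the second gives 16961 and 17475, and the third gives 1145258561. All three are valid readings; only the file format says which one was intended.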
Many types of files will have a defined format : for example, a Windows executable, or a Linux ELF binary.
You might be able to guess the types of the bytes in the file if you know something about the reason the file exists. But somehow you have to know, and then you interpret those bytes according to the file format description.
You might think "I'll write the bytes with a token describing them, so the reading program can know what each byte means". For example, a byte with a '1' might mean the next 2 bytes represent an unsigned integer, a byte with a '2' might mean the following byte tells the length of a string and the bytes after that are the string, and so on. Sure, you can do that. But (a) the reading program still needs to understand that convention, so everything I said above is still true (it's turtles all the way down), (b) that approach uses a lot of space to describe the file, and (c) the reading program needs to know how to interpret a dynamically described file, which is only useful in certain circumstances and probably means there is a meta-meta format describing what the embedded meta-format means.
Long story short, all bytes look the same, and a reading program has to be told what those bytes represent before it can use them meaningfully.

Is it possible to convert any arbitrary file to a string? (C#)

I need to store an arbitrary number of files (of any file type) as a property on a class. This class will get serialized to a JSON file. Later the user can load the JSON file back into the app and recreate the files they originally loaded. Right now I'm storing the files as arrays of bytes. The issue is that some of the files are large, so the byte arrays are very large and cause the serialization/deserialization to take a very long time.
Is there a way I can store the files as a string/array of strings instead of bytes? Or some different way of storing the files? What are some options to deal with this problem?
edit:
I believe a string would be faster because right now the byte array is being rendered out in JSON in ASCII format, so it looks like this:
150,123,43,62...
Encode your byte array as a base 64 string using Convert.ToBase64String(). That should reduce the size of your JSON significantly: http://rextester.com/ILJNV57711
For example, here's a random byte array, serialized as JSON:
[95,103,154,174,23,5,178,179,158,186,181,89,40,229,233,168,217,42,98,65,248]
Here's the same array, converted to a base 64 string, serialized as JSON:
"X2earhcFsrOeurVZKOXpqNkqYkH4"
It's plain to see that a byte array is smaller in JSON when expressed as a base 64 string. It goes from 76 characters to 30.
Certainly don't store the byte array as decimal numbers like that; Base64 encode it at the very least. Base64 encoding will enlarge the data to 133% of the raw file size but that'll be a massive improvement from the 400% enlargement you're currently using.
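A minimal sketch of how the class could hold the file as a Base64 string rather than a byte array. The `StoredFile` class and its property names are hypothetical; only `Convert.ToBase64String` and `Convert.FromBase64String` come from the answers above.

```csharp
using System;

public sealed class StoredFile
{
    public string Name { get; set; }

    // The file content as Base64 text; this is what ends up in the JSON.
    public string ContentBase64 { get; set; }

    // Build the serializable object from the raw file bytes.
    public static StoredFile FromBytes(string name, byte[] content)
    {
        return new StoredFile
        {
            Name = name,
            ContentBase64 = Convert.ToBase64String(content)
        };
    }

    // Recover the original bytes when the JSON is loaded back.
    public byte[] ToBytes()
    {
        return Convert.FromBase64String(ContentBase64);
    }
}
```

On load, `File.WriteAllBytes(path, stored.ToBytes())` would recreate the original file, since the Base64 round trip is lossless.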

Sorting through byte arrays

My program sends data from one application to another in a byte array. I want to pull sections of the data out to store in different variables. For instance, the first 7 bytes of the array hold the symbol data, and the next section is a number whose length I don't know because it varies with each message sent. Before I send the data, I separate the sections I want with commas. My issue is setting up a loop that stops at the commas so I can add the data to another variable. Any ideas will help. Thanks.
You need to know what encoding you have, since comma is not always the same byte value in different encoding schemes. Also if you want efficiency, you can try to parse the byte array as a byte array, but this is easier. Also, you could create a class on both ends that has the properties you need and is [Serializable].
If for whatever reason you don't want to do that then you can easily parse the byte array like this:
UTF8Encoding encoding = new UTF8Encoding();
string s = encoding.GetString(byteArray);
string[] values = s.Split(new char[] {','});
//then do something with the values
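For the byte-level route mentioned above, a rough sketch could split at the separator byte directly. It assumes an ASCII-compatible encoding such as ASCII or UTF-8, where a comma is always the single byte 0x2C; the `ByteSplitter` name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ByteSplitter
{
    // Splits data at every separator byte, without decoding it to text first.
    // Like string.Split, it returns empty parts for adjacent separators.
    public static List<byte[]> Split(byte[] data, byte separator = (byte)',')
    {
        var parts = new List<byte[]>();
        int start = 0;
        while (true)
        {
            int end = Array.IndexOf(data, separator, start);
            if (end < 0) end = data.Length; // no more separators: take the rest

            var part = new byte[end - start];
            Buffer.BlockCopy(data, start, part, 0, part.Length);
            parts.Add(part);

            if (end == data.Length) break;
            start = end + 1; // skip over the separator itself
        }
        return parts;
    }
}
```

Each part can then be decoded or parsed on its own, for example with `Encoding.ASCII.GetString(parts[1])`.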
The data is just complicated to handle as a byte array, as it's really encoded text. Just decode it (using the encoding that you used to turn it into a byte array) and split it:
string[] parts = Encoding.UTF8.GetString(data).Split(',');
Now you can get each part and parse it:
int symbol = Int32.Parse(parts[0]);
int count = Int32.Parse(parts[1]);
I recommend defining an object model that represents the data that you need to send, and then using some serialization framework to convert this to/from a byte array.
See for example http://msdn.microsoft.com/en-us/library/ms973893.aspx
Another topic which may be interesting for you is data contracts in .Net.

c# reading/writing bytes to/from console

I have a byte[] array and want to write it to stdout: Console.Out.Write(arr2str(arr)). How to convert byte[] to string, so that app.exe > arr.txt does the expected thing? I just want to save the array to a file using a pipe, but encodings mess things up.
I'd later want to read that byte array from stdin: app.exe < arr.txt and get the same thing.
How can I do these two things: write and read byte arrays to/from stdin/stdout?
EDIT:
I'm reading with string s = Console.In.ReadToEnd(), and then System.Text.Encoding.Default.GetBytes(s). I'm converting from array to string with System.Text.Encoding.Default.GetString(bytes), but this doesn't work when used with <,>. By "doesn't work" I mean that writing and reading over a pipe does not return the same thing.
To work with binary files you want Console.OpenStandardInput() to retrieve a Stream that you can read from. This has been covered in other threads here at SO, this one for example: Read binary data from Console.In
If you are writing with Console.WriteLine, you need to encode the data into a printable text format. If you want to output raw binary to a file, you can't use Console.WriteLine.
If you still need to output to the console you either need to open the raw stream with Console.OpenStandardOutput() or call Convert.ToBase64String to turn the byte array to a string. There is also Convert.FromBase64String to come back from base64 to a byte array.
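A small sketch of the raw-stream approach, so that app.exe > arr.txt and app.exe < arr.txt round-trip the bytes without any text encoding in between. The `BinaryConsole` helper is a hypothetical name; it takes plain streams so the same code works with the console streams or with files.

```csharp
using System;
using System.IO;

public static class BinaryConsole
{
    // Reads every byte from the input stream until it ends.
    public static byte[] ReadAll(Stream input)
    {
        using (var buffer = new MemoryStream())
        {
            input.CopyTo(buffer);
            return buffer.ToArray();
        }
    }

    // Writes the bytes unchanged; no encoding or newline translation is applied.
    public static void WriteAll(Stream output, byte[] data)
    {
        output.Write(data, 0, data.Length);
        output.Flush();
    }
}
```

To use it with the console, pass the raw streams: `BinaryConsole.WriteAll(Console.OpenStandardOutput(), arr)` on the writing side and `byte[] arr = BinaryConsole.ReadAll(Console.OpenStandardInput())` on the reading side.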
