I'm doing a number of bitwise operations on an array of bytes in C#. I'm obtaining the array by calling FileStream.Read. I just realized that I'm not sure what would happen if a file had a bad or corrupt byte in it somewhere. For example, maybe a nibble is chopped off the end or something like that. What would the FileStream do with it? Would the messed-up byte be 'rounded' off by the Read method? Would an exception be thrown? Or is this something that will virtually never happen?
Thanks,
brian
If your FileStream.Read call succeeds, there's no such thing as a file having a bad byte or corrupt byte. Each byte that is successfully read, and part of the file, is a value from 0 to 255. How it is interpreted by a program is what matters.
If FileStream.Read returns, for example, 5, then you can rely on those 5 bytes having been read successfully from the file, with all of their bits placed into your buffer.
There is such a thing, though, as a bad cluster on your hard disk, in which case your Read would fail with some kind of exception.
For completeness I should also mention that every file type has a file format, i.e. a specification of how you should interpret the binary data. It is possible that a byte or several bytes don't follow the file format. In that sense you can view a byte as being corrupt or invalid, but it's not really corrupt or invalid, just wrong in terms of what the file format specifies.
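A minimal sketch of the two outcomes described above: Read either hands back whole bytes and reports how many, or it fails with an exception. The helper name `ReadFully` and the path `data.bin` are placeholders for illustration, not part of any API.

```csharp
using System;
using System.IO;

static class ReadExample
{
    // Reads until the buffer is full or the stream ends; returns the number of bytes read.
    // Stream.Read may return fewer bytes than requested, so loop on its return value.
    public static int ReadFully(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0) break; // end of stream
            total += n;
        }
        return total;
    }

    static void Main()
    {
        try
        {
            // "data.bin" is a placeholder path.
            using (var fs = new FileStream("data.bin", FileMode.Open, FileAccess.Read))
            {
                var buffer = new byte[16];
                int count = ReadFully(fs, buffer);
                // Only the first 'count' elements hold file data; each is a whole byte, 0 to 255.
                Console.WriteLine($"Read {count} bytes");
            }
        }
        catch (IOException ex)
        {
            // I/O failures (and a missing file) surface as exceptions, not as damaged bytes.
            Console.WriteLine($"Read failed: {ex.Message}");
        }
    }
}
```

There is no state in between: a byte that arrives in the buffer is always a complete 8-bit value.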
Alright, so I basically want to read any file with a specific extension. Going through all the bytes and reading the file is basically easy, but what about getting the type of the next byte? For example:
while ((int)reader.BaseStream.Position != RecordSize * RecordsCount)
{
// How do I check what type is the next byte gonna be?
// Example:
// In every file, the first byte is always a uint:
uint id = reader.GetUInt32();
// However, now I need to check for the next byte's type:
// How do I check the next byte's type?
}
Bytes don't have a type. When data of some language type, such as a char, a string or a long, is converted to bytes and written to a file, there is no strict way to tell what the type was: all bytes look alike, a number from 0 to 255.
In order to know, and to convert back from bytes to structured language types, you need to know the format that the file was written in.
For example, you might know that the file was written as an ascii text file, and hence every byte represents one ascii character.
Or you might know that your file was written with the format {uint}{50 byte string}{linefeed}, where the first 2 bytes represent an unsigned integer (2 bytes wide in this hypothetical format; a C# uint is 4 bytes), the next 50 a string, followed by a linefeed.
Because all bytes look the same, if you don't know the file format you can't read the file in a semantically correct way. For example, I might send you a file I created by writing out some ASCII text, but I might tell you that the file is full of 2-byte uints. You would write a program to read those bytes as 2-byte uints and it would work: any 2 bytes can be interpreted as a uint. I could tell someone else that the same file was composed of 4-byte longs, and they could read it as 4-byte longs: any 4 bytes can be interpreted as a long. I could tell someone else the file was a 2-byte uint followed by 6 ASCII characters. And so on.
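To make that concrete, here is a small sketch that reads the same four bytes three different ways. The class and method names are made up for the example, and little-endian byte order is an assumption; a real file format would have to state it.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

static class SameBytes
{
    // Three readings of the same buffer; none of them is "the" right one
    // unless the file format says so.
    public static string AsAscii(byte[] b) => Encoding.ASCII.GetString(b);
    public static ushort AsUInt16(byte[] b) => BinaryPrimitives.ReadUInt16LittleEndian(b); // first 2 bytes
    public static int AsInt32(byte[] b) => BinaryPrimitives.ReadInt32LittleEndian(b);      // first 4 bytes

    static void Main()
    {
        byte[] bytes = { 0x41, 0x42, 0x43, 0x44 };
        Console.WriteLine(AsAscii(bytes));   // ABCD
        Console.WriteLine(AsUInt16(bytes));  // 16961 (0x4241)
        Console.WriteLine(AsInt32(bytes));   // 1145258561 (0x44434241)
    }
}
```

All three calls succeed, which is exactly the problem: nothing in the bytes themselves says which reading was intended.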
Many types of files will have a defined format: for example, a Windows executable, or a Linux ELF binary.
You might be able to guess the types of the bytes in the file if you know something about the reason the file exists. But somehow you have to know, and then you interpret those bytes according to the file format description.
You might think "I'll write the bytes with a token describing them, so the reading program can know what each byte means". For example, a byte with a '1' might mean the next 2 bytes represent a uint, a byte with a '2' might mean the following byte tells the length of a string, and the bytes after that are the string, and so on. Sure, you can do that. But (a) the reading program still needs to understand that convention, so everything I said above is true (it's turtles all the way down), (b) that approach uses a lot of space to describe the file, and (c) the reading program needs to know how to interpret a dynamically described file, which is only useful in certain circumstances and probably means there is a meta-meta format describing what the embedded meta-format means.
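A rough sketch of a reader for the token convention just described, assuming tag 1 means "a 2-byte little-endian unsigned integer follows" and tag 2 means "a length byte follows, then that many ASCII bytes". The tag values come from the example above; the class name, the ASCII choice and the byte order are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class TaggedReader
{
    // Decodes a seekable stream of tagged values into a list of ushort and string objects.
    public static List<object> ReadAll(Stream stream)
    {
        var values = new List<object>();
        using (var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true))
        {
            while (stream.Position < stream.Length)
            {
                byte tag = reader.ReadByte();
                switch (tag)
                {
                    case 1: // 2-byte unsigned integer (BinaryReader reads little-endian)
                        values.Add(reader.ReadUInt16());
                        break;
                    case 2: // one length byte, then that many ASCII bytes
                        int length = reader.ReadByte();
                        values.Add(Encoding.ASCII.GetString(reader.ReadBytes(length)));
                        break;
                    default: // the reader still has to know every tag in advance
                        throw new InvalidDataException($"Unknown tag {tag}");
                }
            }
        }
        return values;
    }
}
```

Note that the `switch` is itself the file format: the reader can only handle tags it was written to understand, which is point (a) above.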
Long story short, all bytes look the same, and a reading program has to be told what those bytes represent before it can use them meaningfully.
I have a small problem with CRC checking in C#. I need to read a file that contains the CRC value in its last 8 bytes. This is how I'm doing it now:
open a FileStream with FileMode.Open
calculate the stream length minus 8 bytes
stream.Read(buffer, 0, stream length minus 8 bytes)
call the CRC's ComputeHash, passing in the buffer
this leaves the remaining 8 bytes, which I compare against the CRC value
The problem I've got is that it works OK for small files, but for bigger files I get a System.OutOfMemoryException. I know ComputeHash will take a stream, but then I have to pass in the full stream, which means I can't get the remaining bytes.
Is there a better way of doing this?
kindest regards
Providing a code snippet would be a great help to those of us trying to help you. While I understand what you are saying, I can never be sure that I get what you did without reading your code.
Not being sure what you want to do with the file I suggest you also look at the MemoryStream class. One quick advantage of a MemoryStream is that there is no need to create temporary buffers and files in an application meaning that you could actually save on memory.
You can apply your current method in a similar fashion to MemoryStream and see if that works.
Info on MemoryStream: http://msdn.microsoft.com/en-us/library/system.io.memorystream%28v=vs.110%29.aspx
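For the out-of-memory problem in the question, one possible approach is to feed the hash in fixed-size chunks rather than loading everything but the last 8 bytes into one buffer. The sketch below assumes the CRC class derives from HashAlgorithm (the question mentions ComputeHash, but the class itself is not shown), that the stream is seekable and positioned at its start, and that the trailer is stored in the same byte order the hash produces. `TrailerCheck` and `HashBody` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class TrailerCheck
{
    // Hashes everything except the last trailerLength bytes, 64 KB at a time.
    // Returns the computed hash; the trailer bytes come back through 'trailer'.
    public static byte[] HashBody(Stream stream, HashAlgorithm hash, int trailerLength, out byte[] trailer)
    {
        long remaining = stream.Length - trailerLength;
        if (remaining < 0) throw new InvalidDataException("Stream is shorter than the trailer.");

        var buffer = new byte[64 * 1024];
        while (remaining > 0)
        {
            int n = stream.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (n == 0) throw new EndOfStreamException();
            hash.TransformBlock(buffer, 0, n, null, 0); // accumulate without keeping the data
            remaining -= n;
        }
        hash.TransformFinalBlock(buffer, 0, 0); // finish the hash with an empty final block

        // The stream is now positioned at the trailer; read it completely.
        trailer = new byte[trailerLength];
        int got = 0;
        while (got < trailerLength)
        {
            int n = stream.Read(trailer, got, trailerLength - got);
            if (n == 0) throw new EndOfStreamException();
            got += n;
        }
        return hash.Hash;
    }
}
```

Memory use stays at the size of the chunk buffer regardless of file size. Comparing the returned hash with `trailer` is then a byte-by-byte comparison.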
Of all the example code I have read online regarding SerialPort, all of it uses ReadByte and then converts to a character instead of using ReadChar in the first place.
Is there an advantage in doing this?
The SerialPort.Encoding property is often misunderstood. The default is ASCIIEncoding, which will produce ? for byte values 0x80..0xFF, and people don't like getting these question marks. If you see code that converts the byte to a char directly, then they are getting it really wrong: Unicode has lots of unprintable codepoints in that byte range, and the odds that the device actually meant to send these characters are zero. A string tends to be regarded as easier to handle than a byte[], and it is.
When you use ReadChar, the result is based on the encoding you are using, as @Preston Guillot said. According to the documentation of ReadChar:
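The two behaviours can be seen without a serial port at all. This sketch uses a made-up byte array in place of device data; `SerialBytes` and `DecodeAscii` are names invented for the example.

```csharp
using System;
using System.Text;

static class SerialBytes
{
    // Decodes with the same encoding SerialPort uses by default.
    public static string DecodeAscii(byte[] data) => Encoding.ASCII.GetString(data);

    static void Main()
    {
        byte[] received = { 0x41, 0x80, 0xFF }; // made-up bytes standing in for device data

        // ASCII decoding replaces anything above 0x7F with '?'.
        Console.WriteLine(DecodeAscii(received));   // A??

        // A direct cast maps the byte to the Unicode code point with the same number.
        char direct = (char)received[1];            // U+0080
        Console.WriteLine(char.IsControl(direct));  // True
    }
}
```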
This method reads one complete character based on the encoding.
Use caution when using ReadByte and ReadChar together. Switching
between reading bytes and reading characters can cause extra data to
be read and/or other unintended behavior. If it is necessary to switch
between reading text and reading binary data from the stream, select a
protocol that carefully defines the boundary between text and binary
data, such as manually reading bytes and decoding the data.
I am trying to rewrite some of my code from a C++ program I wrote a while ago, but I am not sure if/how I can write to a byte array properly, or if I should be using something else. The code I am trying to change to C# .NET is below.
unsigned char pData[1400];
bf_write g_ReplyInfo("SVC_ReplyInfo", &pData, 1400);
void PlayerManager::BuildReplyInfo()
{
// Delete the old packet
g_ReplyInfo.Reset();
g_ReplyInfo.WriteLong(-1);
g_ReplyInfo.WriteByte(73);
g_ReplyInfo.WriteByte(g_ProtocolVersion.GetInt());
g_ReplyInfo.WriteString(iserver->GetName());
g_ReplyInfo.WriteString(iserver->GetMapName());
}
BinaryWriter might work, although strings are written with a preceding 7-bit encoded length, which I suspect the client won't be able to handle. You'll probably have to convert strings to bytes and then either add a length word or 0-terminate it.
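No need to manually convert numbers to bytes. If you have a long that you want to write as a byte, just cast it. That is, if your BinaryWriter is bw, then you can write bw.Write((byte)longval);. To write -1 as a C# long: bw.Write((long)(-1)). Be aware, though, that a C# long is 8 bytes while a C++ long is commonly 4 bytes (always on Windows), so if the C++ WriteLong writes 4 bytes, the matching call is bw.Write(-1), which writes an int.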
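Putting those suggestions together, a sketch of the packet builder might look like the following. It assumes that the C++ WriteLong writes 4 bytes, that WriteString writes a zero-terminated string, and that the strings are ASCII; none of that is confirmed by the question, so check it against the actual protocol. `ReplyInfoBuilder`, `Build` and `WriteCString` are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

static class ReplyInfoBuilder
{
    // Writes the string's bytes followed by a 0 terminator, with no length prefix.
    static void WriteCString(BinaryWriter bw, string s)
    {
        bw.Write(Encoding.ASCII.GetBytes(s)); // Write(byte[]) emits the raw bytes only
        bw.Write((byte)0);
    }

    public static byte[] Build(byte protocolVersion, string serverName, string mapName)
    {
        using (var ms = new MemoryStream())
        using (var bw = new BinaryWriter(ms))
        {
            bw.Write(-1);              // int overload: 4 bytes, FF FF FF FF
            bw.Write((byte)73);        // single byte
            bw.Write(protocolVersion); // single byte
            WriteCString(bw, serverName);
            WriteCString(bw, mapName);
            bw.Flush();
            return ms.ToArray();
        }
    }
}
```

A MemoryStream grows as needed, so there is no fixed 1400-byte buffer here; if the size limit matters, check the length of the returned array.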
I need to be able to read a file format that mixes binary and non-binary data. Assuming I know the input is good, what's the best way to do this? As an example, let's take a file that has a double as the first line, a newline (0x0D 0x0A) and then ten bytes of binary data afterward. I could, of course, calculate the position of the newline, then make a BinaryReader and seek to that position, but I keep thinking that there has to be a better way.
You can use System.IO.BinaryReader. The problem with this though is you must know what type of data you are going to be reading before you call any of the Read methods.
Read(byte[], int, int)
Read(char[], int, int)
Read()
Read7BitEncodedInt()
ReadBoolean()
ReadByte()
ReadBytes(int)
ReadChar()
ReadChars(int)
ReadDecimal()
ReadDouble()
ReadInt16()
ReadInt32()
ReadInt64()
ReadSByte()
ReadSingle()
ReadString()
ReadUInt16()
ReadUInt32()
ReadUInt64()
And of course the same methods exist for writing in System.IO.BinaryWriter.
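As a small illustration of the "you must know the type before you read" point, this sketch writes three values with BinaryWriter and reads them back with the matching Read methods in the same order. The class name and the choice of fields are made up for the example.

```csharp
using System;
using System.IO;
using System.Text;

static class RoundTrip
{
    public static (uint id, double value, string name) WriteThenRead(uint id, double value, string name)
    {
        using (var ms = new MemoryStream())
        {
            using (var bw = new BinaryWriter(ms, Encoding.UTF8, leaveOpen: true))
            {
                bw.Write(id);    // 4 bytes
                bw.Write(value); // 8 bytes
                bw.Write(name);  // 7-bit encoded length prefix, then the encoded characters
            }

            ms.Position = 0;
            using (var br = new BinaryReader(ms))
            {
                // The reader must ask for the same types in the same order as they were written.
                return (br.ReadUInt32(), br.ReadDouble(), br.ReadString());
            }
        }
    }
}
```

Calling the Read methods in a different order would not fail immediately; it would simply return different, meaningless values.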
Is this file format already fixed? If it's not, it's a really good idea to change to use a length-prefixed format for the strings. Then you can read just the right amount and convert it to a string.
Otherwise, you'll need to read chunks from the file, scan for the newline, and decode the right amount of data or (if you don't find the newline) either buffer it somewhere else (e.g. a MemoryStream) or just remember the starting point and rewind the stream appropriately. It will be ugly, but that's just because of the deficiency of the file format.
I would suggest you don't "over-decode" (i.e. decode the arbitrary binary data after the string) - while it may well not do any harm, in some encodings you could be reading an impossible sequence of binary data, which then starts getting into the realms of DecoderFallbacks and the like.
I've had to deal with that when reading HTTP requests coming in over the wire on Compact Framework. My solution was to roll my own non-buffering ASCII-only StreamReader, so that it was safe to interleave calls to both the StreamReader and the underlying Stream.
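A minimal sketch of that idea applied to the example in the question (a double on the first line, CRLF, then ten bytes of binary data). It reads the text part one byte at a time so nothing past the newline is consumed, then reads the binary part from the same stream. The names `MixedReader`, `ReadAsciiLine` and `Read` are invented for the example, and the text line is assumed to be ASCII in invariant-culture number format.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

static class MixedReader
{
    // Reads bytes one at a time up to and including CRLF; returns the line without the CRLF.
    public static string ReadAsciiLine(Stream stream)
    {
        var sb = new StringBuilder();
        int prev = -1, b;
        while ((b = stream.ReadByte()) != -1)
        {
            if (prev == '\r' && b == '\n')
            {
                sb.Length--; // drop the '\r' that was already appended
                return sb.ToString();
            }
            sb.Append((char)b);
            prev = b;
        }
        throw new EndOfStreamException("No CRLF found.");
    }

    public static (double value, byte[] payload) Read(Stream stream)
    {
        double value = double.Parse(ReadAsciiLine(stream), CultureInfo.InvariantCulture);

        // The stream is positioned exactly after the newline, so the binary data follows.
        var payload = new byte[10];
        int got = 0;
        while (got < payload.Length)
        {
            int n = stream.Read(payload, got, payload.Length - got);
            if (n == 0) throw new EndOfStreamException();
            got += n;
        }
        return (value, payload);
    }
}
```

Because no decoder ever sees the binary bytes, the "over-decoding" concern above does not arise.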