How to I read any file in binary using C#? [duplicate] - c#

This question already has answers here:
C# - How do I read and write a binary file?
(4 answers)
Closed 9 years ago.
The application I'm attempting to create would read the binary code of any file and create a file with the exact same binary code, creating a copy.
While writing a program that reads a file and writes it somewhere else, I was running into encoding issues, so I hypothesize that reading as straight binary will overcome this.
The file being read into the application is important, as after I get this to work I will add additional functionality to search within or manipulate the file's data as it is read.
Update:
I'd like to thank everyone that took the time to answer, I now have a working solution. Wolfwyrd's answer was exactly what I needed.

BinaryReader will handle reading the file into a byte buffer. BinaryWriter will handle dumping those bytes back out to another file. Your code will be something like:
using (var binReader = new System.IO.BinaryReader(System.IO.File.OpenRead("PATHIN")))
using (var binWriter = new System.IO.BinaryWriter(System.IO.File.OpenWrite("PATHOUT")))
{
byte[] buffer = new byte[512];
while (binReader.Read(buffer, 0, 512) != 0)
{
binWriter.Write(buffer);
}
}
Here we cycle a buffer of 512 bytes and immediately write it out to the other file. You would need to choose sensible sizes for your own buffer (there's nothing stopping you reading the entire file if it's reasonably sized). As you mentioned doing pattern matching you will need to consider the case where a pattern overlaps a buffered read if you do not load the whole file into a single byte array.
This SO Question has more details on best practices on reading large files.

Look at MemoryStream and BinaryReader/BinaryWriter:
http://www.dotnetperls.com/memorystream
http://msdn.microsoft.com/en-us/library/system.io.binaryreader.aspx
http://msdn.microsoft.com/en-us/library/system.io.binarywriter.aspx

Have a look at using BinaryReader Class
Reads primitive data types as binary values in a specific encoding.
and maybe BinaryReader.ReadBytes Method
Reads the specified number of bytes from the current stream into a
byte array and advances the current position by that number of bytes.
also BinaryWriter Class
Writes primitive types in binary to a stream and supports writing
strings in a specific encoding.
Another good example C# - Copying Binary Files

for instance, one char at a time.
using (BinaryReader writer = new BinaryWrite(File.OpenWrite("target"))
{
using (BinaryReader reader = new BinaryReader(File.OpenRead("source"))
{
var nextChar = reader.Read();
while (nextChar != -1)
{
writer.Write(Convert.ToChar(nextChar));
nextChar = reader.Read();
}
}
}

The application I'm attempting to create would read the binary code of any file and create a file with the exact same binary code, creating a copy.
Is this for academic purposes? Or do you actually just want to copy a file?
If the latter, you'll want to just use the System.IO.File.Copy method.

Related

How can I write to the column I want with StreamWriter? [duplicate]

I am trying to use StreamReader and StreamWriter to Open a text file (fixed width) and to modify a few specific columns of data. I have dates with the following format that are going to be converted to packed COMP-3 fields.
020100718F
020100716F
020100717F
020100718F
020100719F
I want to be able to read in the dates form a file using StreamReader, then convert them to packed fields (5 characters), and then output them using StreamWriter. However, I haven't found a way to use StreamWriter to right to a specific position, and beginning to wonder if is possible.
I have the following code snip-it.
System.IO.StreamWriter writer;
this.fileName = #"C:\Test9.txt";
reader = new System.IO.StreamReader(System.IO.File.OpenRead(this.fileName));
currentLine = reader.ReadLine();
currentLine = currentLine.Substring(30, 10); //Substring Containing the Date
reader.Close();
...
// Convert currentLine to Packed Field
...
writer = new System.IO.StreamWriter(System.IO.File.Open(this.fileName, System.IO.FileMode.Open));
writer.Write(currentLine);
Currently what I have does the following:
After:
!##$%0718F
020100716F
020100717F
020100718F
020100719F
!##$% = Ascii Characters SO can't display
Any ideas? Thanks!
UPDATE
Information on Packed Fields COMP-3
Packed Fields are used by COBOL systems to reduce the number of bytes a field requires in files. Please see the following SO post for more information: Here
Here is Picture of the following date "20120123" packed in COMP-3. This is my end result and I have included because I wasn't sure if it would effect possible answers.
My question is how do you get StreamWriter to dynamically replace data inside a file and change the lengths of rows?
I have always found it better to to read the input file, filter/process the data and write the output to a temporary file. After finished, delete the original file (or make a backup) and copy the temporary file over. This way you haven't lost half your input file in case something goes wrong in the middle of processing.
You should probably be using a Stream directly (probably a FileStream). This would allow you to change position.
However, you're not going to be able to change record sizes this way, at least, not in-line. You can have one Stream reading from the original file, and another writing to a new, converted copy of the file.
However, I haven't found a way to use StreamWriter to right to a specific position, and
beginning to wonder if is possible.
You can use StreamWriter.BaseStream.Seek method
using (StreamWriter wr = new StreamWriter(File.Create(#"c:\Temp\aaa.txt")))
{
wr.Write("ABC");
wr.Flush();
wr.BaseStream.Seek(0, SeekOrigin.Begin);
wr.Write("Z");
}

Data from byte array

I'm trying to read the bytes in the stream at each frame.
I want to be able to read the position and the timestamp information that is stored on a file I have created.
The stream is a stream of recorded skeleton data and it is in encoded binary format
Stream recordStream;
byte[] results;
using (FileStream SourceStream = File.Open(#".....\Stream01.recorded", FileMode.Open))
{
if (SourceStream.CanRead)
{
results = new byte[recordStream.Length];
SourceStream.Read(results, 0, (int)recordStream.Length);
}
}
The file should be read and the Read method should read the current sequence of bytes before advances the position in the stream.
Is there a way to pull out the data (position and timestamp) I want from the bytes read, and save it in separate variables before it advances?
Could using the binary reader give me the capabilities to do this.
BinaryReader br1 = new BinaryReader(recordStream);
I have save the file as .recorded. I have also saved it as .txt to see what is contained in the file, but since it is encoded, it is not understandable.
Update:
I tried running the code with breakpoints to see if it enters the function with my binaryreader and it crashes with an error: ArgumentException was unhandled. Stream was not readable, on the BinaryReader initialization and declaration
BinaryReader br1 = new BinaryReader(recordStream);
The file type was .recorded.
You did not provide any information about the format of the data you are trying to read.
However, using the BinaryReader is exactly what you need to do.
It exposes methods to read data from the stream and convert them to various types.
Consider the following example:
var filename = "pathtoyourfile";
using (var stream = File.Open(filename, FileMode.Open))
using(var reader = new BinaryReader(stream))
{
var x = reader.ReadByte();
var y = reader.ReadInt16();
var z = reader.ReadBytes(10);
}
It really depends on the format of your data though.
Update
Even though I feel I've already provided all the information you need,
let's use your data.
You say each record in your data starts with
[long: timestamp][int: framenumber]
using (var stream = File.Open(filename, FileMode.Open))
using(var reader = new BinaryReader(stream))
{
var timestamp = reader.ReadInt64();
var frameNumber = reader.ReadInt32();
//At this point you have the timestamp and the frame number
//you can now do whatever you want with it and decide whether or not
//to continue, after that you just continue reading
}
How you continue reading depends on the format of the remaining part of the records
If all fields in a record have a specific length, then you either (depending on the
choice you made knowing the values of the timestamp and the frame number) continue
reading all the fields for that record OR you simply advance to a position in the stream
that contains the next record. For example if each record is 100 bytes long, if you want to skip this record after you got the first two fields:
stream.Seek(88, SeekOrigin.Current);
//88 here because the first two fields take 12 bytes -> (100 - 8 + 4)
If the records have a variable length the solution is similar, but you'll have to
take into account the length of the various fields (which should be defined by
length fields preceding the variable length fields)
As for knowing if the first 8 bytes really do represent a timestamp,
there's no real way of knowing for sure... remember in the end the stream just contains
a series of individual bytes that have no meaning whatsoever except for the meaning
given to them by your file format. Either you have to revise the file format or you could
try checking if the value of 'timestamp' in the example above even makes sense.
Is this a file format you have defined yourself, if so... perhaps you are making it to complicated and might want to look at solutions such as Google Protocol Buffers or Apache Thrift.
If this is still not what you are looking for, you will have to redefine your question.
Based on your comments:
You need to know the exact definition of the entire file. You create a struct based on this file format:
struct YourFileFormat {
[FieldOffset(0)]
public long Timestamp;
[FieldOffset(8)]
public int FrameNumber;
[FieldOffset(12)]
//.. etc..
}
Then, using a BinaryReader, you can either read each field individually for each frame:
// assume br is an instantiated BinaryReader..
YourFileFormat file = new YourFileFormat();
file.Timestamp = br.ReadInt64();
file.FrameNumber = br.ReadInt32();
// etc..
Or, you can read the entire file in and have the Marshalling classes copy everything into the struct for you..
byte[] fileContent = br.ReadBytes(sizeof(YourFileFormat));
GCHandle gcHandle = GCHandle.Alloc(fileContent, GCHandleType.Pinned); // or pinning it via the "fixed" keyword in an unsafe context
file = (YourFileFormat)Marshal.PtrToStructure(gcHandle.AddrOfPinnedObject(), typeof(YourFileFormat));
gcHandle.Free();
However, this assumes you'll know the exact size of the file. With this method though.. each frame (assuming you know how many there are) can be a fixed size array within this struct for that to work.
Bottom line: Unless you know the size of what you want to skip.. you can't hope to get the data from the file you require.

Copying files using C# [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Best way to copy between two Stream instances - C#
I know I can use File.Copy but, I'm more interested in doing it a longer way round, just for educational purposes.
Now, the approach I want to discuss is to use a StreamReader and StreamWriter (or FileStreams).
In my mind, I would read the file (as binary) into memory and then write the file to the new location. This strikes me as potential for errors since
1) the entire file is being loaded into memory (and I wouldn't know how big the file is) and
2) it would be time consuming compared to something like copy a byte, paste a byte (which I assume is how streaming works) since we'd have to wait for the entire file to be stored in memory before pasting even begins.
So, a long way to ask, how would I stream a copy and paste job?
So, a long way to ask, how would I stream a copy and paste job?
Instead of reading either each byte separately into memory, or the whole file into memory, you'd go half-way between - create a buffer, e.g. 32K, and ask the input stream to read that much data. Write out however many bytes you've read, and then repeat until there's no more data to read. So something like:
public void Copy(Stream input, Stream output)
{
byte[] buffer = new byte[32 * 1024];
int bytesRead;
while ((bytesRead = input.Read(buffer, 0, buffer.Length)) != 0)
{
output.Write(buffer, 0, bytesRead);
}
}
This isn't as efficient as it could potentially be, as you're not reading and writing at the same time. Asynchronous IO would fix that, but make it much more complicated - and it's entirely possible that the OS will be smart enough to keep reading while you're writing anyway...
Note that you would definitely want to use Stream rather than TextReader / TextWriter unless you were actually reading text.
You can read/write the file in blocks of 1 MB (for example). MSDN has an example which shows how to do that.
Create an empty destination file
Read a buffer from the source file (lets say 1024 bytes)
Write the buffer to the destionation file
Repeat with 2 until everything is done

C# Reading, Modifying then writing binary data to file. Best convention?

I'm new to programming in general (My understanding of programming concepts is still growing.). So this question is about learning, so please provide enough info for me to learn but not so much that I can't, thank you.
(I would also like input on how to make the code reusable with in the project.)
The goal of the project I'm working on consists of:
Read binary file.
I have known offsets I need to read to find a particular chunk of data from within this file.
First offset is first 4 bytes(Offset for end of my chunk).
Second offset is 16 bytes from end of file. I read for 4 bytes.(Gives size of chunk in hex).
Third offset is the 4 bytes following previous, read for 4 bytes(Offset for start of chunk in hex).
Locate parts in the chunk to modify by searching ASCII text as well as offsets.
Now I have the start offset, end offset and size of my chunk.
This should allow me to read bytes from file into a byte array and know the size of the array ahead of time.
(Questions: 1. Is knowing the size important? Other than verification. 2. Is reading part of a file into a byte array in order to change bytes and overwrite that part of the file the best method?)
So far I have managed to read the offsets from the file using BinaryReader on a MemoryStream. I then locate the chunk of data I need and read that into a byte array.
I'm stuck in several ways:
What are the best practices for binary Reading / Writing?
What's the best storage convention for the data that is read?
When I need to modify bytes how do I go about that.
Should I be using FileStream?
Since you want to both read and write, it makes sense to use the FileStream class directly (using FileMode.Open and FileAccess.ReadWrite). See FileStream on MSDN for a good overall example.
You do need to know the number of bytes that you are going to be reading from the stream. See the FileStream.Read documentation.
Fundamentally, you have to read the bytes into memory at some point if you're going to use and later modify their contents. So you will have to make an in-memory copy (using the Read method is the right way to go if you're reading a variable-length chunk at a time).
As for best practices, always dispose your streams when you're done; e.g.:
using (var stream = File.Open(FILE_NAME, FileMode.Open, FileAccess.ReadWrite))
{
//Do work with the FileStream here.
}
If you're going to do a large amount of work, you should be doing the work asynchronously. (Let us know if that's the case.)
And, of course, check the FileStream.Read documentation and also the FileStream.Write documentation before using those methods.
Reading bytes is best done by pre-allocating an in-memory array of bytes with the length that you're going to read, then reading those bytes. The following will read the chunk of bytes that you're interested in, let you do work on it, and then replace the original contents (assuming the length of the chunk hasn't changed):
EDIT: I've added a helper method to do work on the chunk, per the comments on variable scope.
using (var stream = File.Open(FILE_NAME, FileMode.Open, FileAccess.ReadWrite))
{
var chunk = new byte[numOfBytesInChunk];
var offsetOfChunkInFile = stream.Position; // It sounds like you've already calculated this.
stream.Read(chunk, 0, numOfBytesInChunk);
DoWorkOnChunk(ref chunk);
stream.Seek(offsetOfChunkInFile, SeekOrigin.Begin);
stream.Write(chunk, 0, numOfBytesInChunk);
}
private void DoWorkOnChunk(ref byte[] chunk)
{
//TODO: Any mutation done here to the data in 'chunk' will be written out to the stream.
}

Reading a MemoryStream which contains multiple files

If I have a single MemoryStream of which I know I sent multiple files (example 5 files) to this MemoryStream. Is it possible to read from this MemoryStream and be able to break apart file by file?
My gut is telling me no since when we Read, we are reading byte by byte... Any help and a possible snippet would be great. I haven't been able to find anything on google or here :(
You can't directly, not if you don't delimit the files in some way or know the exact size of each file as it was put into the buffer.
You can use a compressed file such as a zip file to transfer multiple files instead.
A stream is just a line of bytes. If you put the files next to each other in the stream, you need to know how to separate them. That means you must know the length of the files, or you should have used some separator. Some (most) file types have a kind of header, but looking for this in an entire stream may not be waterproof either, since the header of a file could just as well be data in another file.
So, if you need to write files to such a stream, it is wise to add some extra information. For instance, start with a version number, then, write the size of the first file, write the file itself and then write the size of the next file, etc....
By starting with a version number, you can make alterations to this format. In the future you may decide you need to store the file name as well. In that case, you can increase version number, make up a new format, and still be able to read streams that you created earlier.
This is of course especially useful if you store these streams too.
Since you're sending them, you'll have to send them into the stream in such a way that you'll know how to pull them out. The most common way of doing this is to use a length specification. For example, to write the files to the stream:
write an integer to the stream to indicate the number of files
Then for each file,
write an integer (or a long if the files are large) to indicate the number of bytes in the file
write the file
To read the files back,
read an integer (n) to determine the number of files in the stream
Then, iterating n times,
read an integer (or long if that's what you chose) to determine the number of bytes in the file
read the file
You could use an IEnumerable<Stream> instead.
You need to implement this yourself, what you would want to do is write in some sort of 'delimited' into the stream. As you're reading, look for that delimited, and you'll know when you have hit a new file.
Here's a quick and dirty example:
byte[] delimiter = System.Encoding.Default.GetBytes("++MyDelimited++");
ms.Write(myFirstFile);
ms.Write(delimiter);
ms.Write(mySecondFile);
....
int len;
do {
len = ms.ReadByte(buffer, lastOffest, delimiter.Length);
if(buffer == delimiter)
{
// Close and open a new file stream
}
// Write buffer to output stream
} while(len > 0);

Categories