How to Interpret a Binary File - C#

I have a binary file that I'd like to open, read and understand; but I've never tried to work with binary information before.
Various questions (including Using structs in C# to read data and How to read a binary file using c#?) helped me to open and read the file, but I have no idea how to interpret the information I've so far extracted.
One approach I got some hopeful data out of was this:
var sb = new StringBuilder();
using (BinaryReader reader = new BinaryReader(File.Open(filename, FileMode.Open, FileAccess.Read)))
{
    for (int i = 0; i < 100; i++)
    {
        // Interpret the next four bytes as a little-endian 32-bit signed integer
        int iValue = reader.ReadInt32();
        sb.AppendFormat("{1}={2}{0}", Environment.NewLine, i, iValue);
    }
}
Returns something like this:
0=374014592
1=671183229
2=558694987
3=-1018526206
4=1414798970
5=650
6=4718677
7=44
8=0
9=7077888
10=7864460
But this isn't what I was expecting, nor do I even know what it means. Have I successfully determined that the file contains a bunch of numbers, or am I looking at just one interpretation of the data (similar to how using the wrong encoding will return different characters for the same input)?
Do I have any hope or should I stop entirely?

You have to already know how the binary file is structured in order to be able to read and interpret the file properly.
For example, if you write to a binary file an int, a double, a boolean and a string, like this:
int i = 25;
double d = 3.14157;
bool b = true;
string s = "I am happy";
using (var bw = new BinaryWriter(new FileStream("mydata", FileMode.Create)))
{
bw.Write(i);
bw.Write(d);
bw.Write(b);
bw.Write(s);
}
then you must later read back the data values using the same types, in exactly the same order:
using (var br = new BinaryReader(new FileStream("mydata", FileMode.Open)))
{
i = br.ReadInt32();
Console.WriteLine("Integer data: {0}", i);
d = br.ReadDouble();
Console.WriteLine("Double data: {0}", d);
b = br.ReadBoolean();
Console.WriteLine("Boolean data: {0}", b);
s = br.ReadString();
Console.WriteLine("String data: {0}", s);
}
http://www.tutorialspoint.com/csharp/csharp_binary_files.htm
Here is what you would need to know to be able to successfully read a .WAV file (a binary file format that holds sound information). WAV files are one of the simpler binary formats:
http://soundfile.sapp.org/doc/WaveFormat/
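To make the "you have to know the structure" point concrete, here is a minimal sketch that reads the first fields of a WAV header in the order given on the page linked above. The class name WavHeader and its members are made up for illustration, and the sketch assumes the canonical layout where the "fmt " chunk immediately follows "WAVE"; real files can carry extra chunks that this does not handle.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical holder for a few header fields (names are illustrative only).
public sealed class WavHeader
{
    public short AudioFormat;
    public short Channels;
    public int SampleRate;
    public short BitsPerSample;

    // Reads the canonical RIFF/WAVE/"fmt " layout; throws if the markers do not match.
    public static WavHeader Read(Stream stream)
    {
        // leaveOpen: true so the caller keeps ownership of the stream
        using (var br = new BinaryReader(stream, Encoding.ASCII, true))
        {
            string riff = Encoding.ASCII.GetString(br.ReadBytes(4));
            br.ReadInt32(); // overall chunk size (ignored here)
            string wave = Encoding.ASCII.GetString(br.ReadBytes(4));
            string fmt = Encoding.ASCII.GetString(br.ReadBytes(4));
            if (riff != "RIFF" || wave != "WAVE" || fmt != "fmt ")
                throw new InvalidDataException("Not a canonical WAV header.");

            br.ReadInt32(); // size of the fmt chunk (ignored here)
            var header = new WavHeader();
            header.AudioFormat = br.ReadInt16();   // 1 means PCM
            header.Channels = br.ReadInt16();
            header.SampleRate = br.ReadInt32();
            br.ReadInt32(); // byte rate (ignored here)
            br.ReadInt16(); // block align (ignored here)
            header.BitsPerSample = br.ReadInt16();
            return header;
        }
    }
}
```

Every Read call here matches a field of known type and size from the format description; without that description the same bytes would be just as opaque as the file in the question.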

By definition a binary file is just a series of bits. Whether you interpret those bits as numbers, characters or something else depends entirely upon what was written into the file in the first place.
In general there's no way to tell what was written into the file by looking at the file contents. Of course if you interpret the bits as characters and get readable text then there's a good chance that text is what was written into the file. But a file containing only text typically wouldn't be described as a binary file.
By calling ReadInt32 you are assuming that the contents of your file are a series of four-byte integers. But what if eight-byte integers or floats or an enumeration or something else was written to your file? What if your file doesn't contain a multiple of four bytes?
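As a rough illustration of how much the result depends on the assumed type, the sketch below prints several readings of the same four bytes. The class and method names are invented for this example. Note that BitConverter uses the byte order of the machine it runs on, so the integer values shown assume a little-endian machine.

```csharp
using System;
using System.Text;

public static class ByteInterpretations
{
    // Returns several different readings of the same four bytes.
    public static string Describe(byte[] four)
    {
        if (four == null || four.Length != 4)
            throw new ArgumentException("Expected exactly four bytes.", nameof(four));

        var sb = new StringBuilder();
        sb.AppendLine("Hex:     " + BitConverter.ToString(four));
        sb.AppendLine("Int32:   " + BitConverter.ToInt32(four, 0));
        sb.AppendLine("UInt16s: " + BitConverter.ToUInt16(four, 0) + ", " + BitConverter.ToUInt16(four, 2));
        sb.AppendLine("Single:  " + BitConverter.ToSingle(four, 0));
        sb.AppendLine("ASCII:   " + Encoding.ASCII.GetString(four));
        return sb.ToString();
    }
}
```

For example, Describe(new byte[] { 0x8A, 0x02, 0x00, 0x00 }) reports an Int32 of 650 on a little-endian machine (the same value as entry 5 in your output), but the same bytes could equally be two 16-bit values, 650 and 0. Nothing in the bytes themselves says which reading is the intended one.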
You might consider changing your loop to use ReadByte rather than ReadInt32 so it might look something like this...
bValue = reader.ReadByte();
sb.AppendFormat("{1}=0x{2:X}{0}", Environment.NewLine, i, bValue);
so you treat the file as a sequence of bytes and write the data out in hex rather than as a decimal number.
Another approach might be to find a good hex editor and use that to inspect the file contents rather than writing your own code (at least to start with).
There is a simple hex editor built into Visual Studio (assuming that's what you are using). Go to File | Open | Open File. Then in the Open File dialog select your binary file and then click on the drop down to the right of the Open Button and select Open With and then select Binary Editor.
What you'll see is the contents of the file shown as hex and characters. Not great but quick.
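If you would rather stay in code, a small hex dump along the same lines is easy to sketch. This is only an illustration (the HexDump class and the 16-bytes-per-row layout are my own choices, not anything standard): each row shows the offset, the bytes in hex, and the printable ASCII characters.

```csharp
using System;
using System.IO;
using System.Text;

public static class HexDump
{
    // Dumps up to maxBytes from the stream as rows of offset, hex bytes and ASCII.
    public static string Dump(Stream stream, int maxBytes)
    {
        var sb = new StringBuilder();
        var row = new byte[16];
        int offset = 0;
        while (offset < maxBytes)
        {
            int want = Math.Min(row.Length, maxBytes - offset);
            int got = stream.Read(row, 0, want);
            if (got <= 0) break; // end of stream

            sb.Append(offset.ToString("X8")).Append("  ");
            for (int i = 0; i < row.Length; i++)
                sb.Append(i < got ? row[i].ToString("X2") + " " : "   ");
            sb.Append(' ');
            for (int i = 0; i < got; i++)
                sb.Append(row[i] >= 0x20 && row[i] < 0x7F ? (char)row[i] : '.');
            sb.AppendLine();
            offset += got;
        }
        return sb.ToString();
    }
}
```

Calling HexDump.Dump(File.OpenRead(filename), 256) on your file would show the first 256 bytes. Readable strings in the right-hand column, or a recognisable signature in the first few bytes, are often the quickest hint as to what kind of file you are dealing with.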

Related

Reading alphanumeric data from text file

I am using the code below to read binary data from a text file and divide it into small chunks. I want to do the same with a text file containing alphanumeric data, which obviously does not work with the binary reader. Which reader would be best for that (stream, string or text), and how would I implement it in the following code?
public static IEnumerable<IEnumerable<byte>> ReadByChunk(int chunkSize)
{
IEnumerable<byte> result;
int startingByte = 0;
do
{
result = ReadBytes(startingByte, chunkSize);
startingByte += chunkSize;
yield return result;
} while (result.Any());
}
public static IEnumerable<byte> ReadBytes(int startingByte, int byteToRead)
{
byte[] result;
using (FileStream stream = File.Open(@"C:\Users\file.txt", FileMode.Open, FileAccess.Read, FileShare.Read))
using (BinaryReader reader = new BinaryReader(stream))
{
int bytesToRead = Math.Max(Math.Min(byteToRead, (int)reader.BaseStream.Length - startingByte), 0);
reader.BaseStream.Seek(startingByte, SeekOrigin.Begin);
result = reader.ReadBytes(bytesToRead);
}
return result;
}
I can only help you get the general process figured out:
String/Text is the 2nd worst data format to read, write or process. It should be reserved exclusively for output to and input from the user. It has some serious issues as a storage and retrieval format.
If you have to transmit, store or retrieve something as text, make sure you use a fixed Encoding and Culture Format (usually invariant) at all endpoints. You do not want to run into issues with those two.
The worst data format is raw binary. But there is a special 0th place for raw binary that you have to interpret into text, to then further process. To quote the most important parts of what I linked on encodings:
It does not make sense to have a string without knowing what encoding it uses. [...]
If you have a string, in memory, in a file, or in an email message, you have to know what encoding it is in or you cannot interpret it or display it to users correctly.
Almost every stupid “my website looks like gibberish” or “she can’t read my emails when I use accents” problem comes down to one naive programmer who didn’t understand the simple fact that if you don’t tell me whether a particular string is encoded using UTF-8 or ASCII or ISO 8859-1 (Latin 1) or Windows 1252 (Western European), you simply cannot display it correctly or even figure out where it ends. There are over a hundred encodings and above code point 127, all bets are off.
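Applied to the chunking code in the question, one possible shape is a StreamReader with an explicitly chosen encoding, reading fixed-size blocks of characters instead of bytes. This is only a sketch: the TextChunks class name is made up, and which encoding to pass is something you have to know about your file.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class TextChunks
{
    // Yields the file's text in pieces of at most chunkSize characters.
    public static IEnumerable<string> ReadByChunk(string path, int chunkSize, Encoding encoding)
    {
        using (var reader = new StreamReader(path, encoding))
        {
            var buffer = new char[chunkSize];
            int read;
            // ReadBlock keeps reading until the buffer is full or the file ends
            while ((read = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
            {
                yield return new string(buffer, 0, read);
            }
        }
    }
}
```

Unlike the byte-based version, this opens the file once and never splits a multi-byte character across two chunks, because the reader decodes bytes to characters before the chunks are cut.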

Can I write binary stored as a string to a binary file directly in C#?

This is a bit of a strange problem I have here. I'm working with C# 6 on the .NET platform on a binary compression algorithm. Multiple stages of compression are working great, even far better than expected! However, converting the unoptimized binary back into a file is proving to be a bit more of a headache than I expected.
The Case
Binary is being read from an arbitrary file, and passed along within the program as a string. Multiple waves of optimization work on the string, converting it into an intermediate representation, which is written as the compressed object. Then, deoptimization turns the intermediate form back into pure binary, ready to be written.
The Code
Binary Input
BinaryString = ""; Filename = filename;
StringBuilder sb = new StringBuilder();
foreach (byte b in File.ReadAllBytes(filename))
{
sb.Append(Convert.ToString(b, 2).PadLeft(8, '0'));
}
BinaryString = sb.ToString();
This is how I'm accepting input. It will return a literal binary string, in the form of 11001010110001
The conversion from its intermediate form returns exactly the same string.
Binary Output
Currently, I'm trying to directly write a binary file as bytes, as such:
List<Byte> bytes = new List<byte>();
foreach(char c in binary)
bytes.Add(Convert.ToByte(c));
File.WriteAllBytes(filename, bytes.ToArray());
The Problem
The method I'm trying right now for binary output is simply writing the binary outright to a text file, rather than writing a binary object to the filesystem.
We're compressing pictures, executables, text, git objects, etc. So it's obviously not feasible whatsoever to have it written like this.
Does there exist a method in C#/.NET that will easily let me translate the binary back into a file, or is this a more involved problem than I'm thinking?
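There are 8 bits in a byte, but you are converting 1 bit at a time to a byte. Convert.ToByte(c) turns each '0' or '1' character into its character code (48 or 49), which is why the output is just the text of the bit string. You need to gather up 8 bits first, then convert them to a byte using the Convert.ToByte() overload that accepts fromBase.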
Replace this code:
// Note: I assume you meant to reference `BinaryString` here
// and not "binary" which isn't defined in your example
foreach(char c in BinaryString)
bytes.Add(Convert.ToByte(c));
With this:
var thisByte = string.Empty;
foreach (char c in BinaryString)
{
thisByte += c;
if (thisByte.Length == 8)
{
bytes.Add(Convert.ToByte(thisByte, 2));
thisByte = string.Empty;
}
}
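A variation on the same idea, sketched here with made-up names (BitStrings.ToBytes), slices the string eight characters at a time instead of building each group by concatenation. It assumes the string length is a multiple of 8, which holds for a string produced by the PadLeft(8, '0') input code above.

```csharp
using System;

public static class BitStrings
{
    // Converts a string of '0'/'1' characters into bytes, eight characters per byte.
    public static byte[] ToBytes(string bits)
    {
        if (bits == null) throw new ArgumentNullException(nameof(bits));
        if (bits.Length % 8 != 0)
            throw new ArgumentException("Length must be a multiple of 8.", nameof(bits));

        var bytes = new byte[bits.Length / 8];
        for (int i = 0; i < bytes.Length; i++)
        {
            // fromBase = 2 parses the eight characters as a binary number
            bytes[i] = Convert.ToByte(bits.Substring(i * 8, 8), 2);
        }
        return bytes;
    }
}
```

The result can then be written with File.WriteAllBytes(filename, BitStrings.ToBytes(BinaryString)).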

loop for reading different data types & sizes off very large byte array from file

I have a raw byte stream stored on a file (rawbytes.txt) that I need to parse and output to a CSV-style text file.
The input of raw bytes (when read as characters/long/int etc.) looks something like this:
A2401028475764B241102847576511001200C...
Parsed it should look like:
OutputA.txt
(Field1,Field2,Field3) - heading
A,240,1028475764
OutputB.txt
(Field1,Field2,Field3,Field4,Field5) - heading
B,241,1028475765,1100,1200
OutputC.txt
C,...//and so on
Essentially, it's a hex-dump-style input of bytes that is continuous without any line terminators or gaps between data that needs to be parsed. The data, as seen above, consists of different data types one after the other.
Here's a snippet of my code - because there are no commas within any field, and no need arises to use "" (i.e. a CSV wrapper), I'm simply using TextWriter to create the CSV-style text file as follows:
if (File.Exists(fileName))
{
    using (BinaryReader reader = new BinaryReader(File.Open(fileName, FileMode.Open)))
    {
        inputCharIdentifier = reader.ReadChar();
        switch (inputCharIdentifier)
        {
            case 'A':
                field1 = reader.ReadUInt64();
                field2 = reader.ReadUInt64();
                field3 = reader.ReadChars(10);
                string strtmp = new string(field3);
                //and so on
                using (TextWriter writer = File.AppendText("outputA.txt"))
                {
                    writer.WriteLine(field1 + "," + field2 + "," + strtmp); // +
                }
                break;
            case 'B':
                //code...
                break;
        }
    }
}
My question is simple - how do I use a loop to read through the entire file? Generally, it exceeds 1 GB (which rules out File.ReadAllBytes and the methods suggested at Best way to read a large file into a byte array in C#?) - I considered using a while loop, but peekchar is not suitable here. Also, case A, B and so on have different sized input - in other words, A might be 40 bytes total, while B is 50 bytes. So the use of a fixed size buffer, say inputBuf[1000], or [50] for instance - if they were all the same size - wouldn't work well either, AFAIK.
Any suggestions? I'm relatively new to C# (2 months) so please be gentle.
You could read the file byte by byte, appending each byte to a currentBlock byte array until you find the start of the next block. When a byte identifies a new block, you can parse currentBlock using your case trick and then set currentBlock = characterJustRead.
This approach works even if the id of the next block is longer than 1 byte. In that case you just parse currentBlock[0, currentBlock.Length - lenOfCurrentIdInBytes]. In other words, you read a little too much, but you then parse only what is needed and use what is left as the base for the next currentBlock.
If you want more speed you can read the file in chunks of X bytes, but apply the same logic.
You said "The issue is that the data is not 100% kosher - i.e. there are situations where I need to separately deal with the possibility that the character I expect to identify each block is not in the right place." but building a currentBlock still should work. The code surely will have some complications, maybe something like nextBlock, but I'm guessing here without knowing what incorrect data you have to deal with.
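For the simpler case where the data is well formed and every record type has a known total size, a different and more direct loop is possible: read the one-byte tag, look up the size, read exactly that many bytes, and repeat until the stream position reaches its length. Note this is not the scan-for-the-next-block approach described above. The sketch below uses invented names (RecordReader, totalSizes) and made-up sizes, and assumes a seekable stream such as a FileStream.

```csharp
using System.Collections.Generic;
using System.IO;

public static class RecordReader
{
    // totalSizes maps a tag character to the full record size in bytes, tag included
    // (hypothetical values; the real sizes depend on your format).
    public static IEnumerable<KeyValuePair<char, byte[]>> ReadRecords(
        Stream stream, IDictionary<char, int> totalSizes)
    {
        using (var reader = new BinaryReader(stream))
        {
            // Position < Length replaces PeekChar as the end-of-file test
            while (stream.Position < stream.Length)
            {
                char tag = (char)reader.ReadByte();
                int size;
                if (!totalSizes.TryGetValue(tag, out size))
                    throw new InvalidDataException("Unknown record tag: " + tag);

                byte[] payload = reader.ReadBytes(size - 1);
                if (payload.Length != size - 1)
                    throw new EndOfStreamException("Truncated record: " + tag);

                yield return new KeyValuePair<char, byte[]>(tag, payload);
            }
        }
    }
}
```

Because the records are yielded one at a time, only a single record is in memory at once, so the 1 GB file size is not a problem. Where the tag is not where it should be, the exception is the point at which resynchronisation logic like the currentBlock idea would have to take over.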

How do I read any file in binary using C#? [duplicate]

This question already has answers here:
C# - How do I read and write a binary file?
(4 answers)
Closed 9 years ago.
The application I'm attempting to create would read the binary code of any file and create a file with the exact same binary code, creating a copy.
While writing a program that reads a file and writes it somewhere else, I was running into encoding issues, so I hypothesize that reading as straight binary will overcome this.
The file being read into the application is important, as after I get this to work I will add additional functionality to search within or manipulate the file's data as it is read.
Update:
I'd like to thank everyone that took the time to answer, I now have a working solution. Wolfwyrd's answer was exactly what I needed.
BinaryReader will handle reading the file into a byte buffer. BinaryWriter will handle dumping those bytes back out to another file. Your code will be something like:
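using (var binReader = new System.IO.BinaryReader(System.IO.File.OpenRead("PATHIN")))
// File.Create truncates an existing output file; File.OpenWrite would leave stale trailing bytes
using (var binWriter = new System.IO.BinaryWriter(System.IO.File.Create("PATHOUT")))
{
    byte[] buffer = new byte[512];
    int bytesRead;
    // Read returns the number of bytes actually read, which can be less than the buffer size
    while ((bytesRead = binReader.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Write only the bytes that were read, not the whole buffer
        binWriter.Write(buffer, 0, bytesRead);
    }
}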
Here we cycle a buffer of 512 bytes and immediately write it out to the other file. You would need to choose sensible sizes for your own buffer (there's nothing stopping you reading the entire file if it's reasonably sized). As you mentioned doing pattern matching you will need to consider the case where a pattern overlaps a buffered read if you do not load the whole file into a single byte array.
This SO Question has more details on best practices on reading large files.
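On the point about a pattern straddling two buffered reads, one way to handle it is to carry the last pattern.Length - 1 bytes of each read over to the front of the next one. The following is a rough sketch with invented names (StreamSearch.IndexOf), using a plain byte-by-byte comparison rather than any clever search algorithm; it assumes a non-empty pattern and a bufferSize of at least 1.

```csharp
using System;
using System.IO;

public static class StreamSearch
{
    // Returns the stream offset of the first occurrence of pattern, or -1 if absent.
    public static long IndexOf(Stream stream, byte[] pattern, int bufferSize)
    {
        if (pattern == null || pattern.Length == 0)
            throw new ArgumentException("Pattern must not be empty.", nameof(pattern));
        if (bufferSize < 1)
            throw new ArgumentOutOfRangeException(nameof(bufferSize));

        // Room for one read plus the bytes carried over from the previous read
        var buffer = new byte[bufferSize + pattern.Length - 1];
        int carried = 0;       // bytes kept from the previous read
        long bufferStart = 0;  // stream offset of buffer[0]
        int read;
        while ((read = stream.Read(buffer, carried, bufferSize)) > 0)
        {
            int valid = carried + read;
            for (int i = 0; i + pattern.Length <= valid; i++)
            {
                int j = 0;
                while (j < pattern.Length && buffer[i + j] == pattern[j]) j++;
                if (j == pattern.Length) return bufferStart + i;
            }

            // Keep the tail so a match spanning two reads is still seen
            int keep = Math.Min(pattern.Length - 1, valid);
            Array.Copy(buffer, valid - keep, buffer, 0, keep);
            bufferStart += valid - keep;
            carried = keep;
        }
        return -1;
    }
}
```

With a 4-byte buffer and the pattern { 3, 4, 5 } in a stream of the bytes 0 to 9, the match starts in the first read and ends in the second, and the carry-over is what lets the search find it.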
Look at MemoryStream and BinaryReader/BinaryWriter:
http://www.dotnetperls.com/memorystream
http://msdn.microsoft.com/en-us/library/system.io.binaryreader.aspx
http://msdn.microsoft.com/en-us/library/system.io.binarywriter.aspx
Have a look at using BinaryReader Class
Reads primitive data types as binary values in a specific encoding.
and maybe BinaryReader.ReadBytes Method
Reads the specified number of bytes from the current stream into a
byte array and advances the current position by that number of bytes.
also BinaryWriter Class
Writes primitive types in binary to a stream and supports writing
strings in a specific encoding.
Another good example C# - Copying Binary Files
for instance, one byte at a time.
using (BinaryWriter writer = new BinaryWriter(File.Create("target")))
{
    using (BinaryReader reader = new BinaryReader(File.OpenRead("source")))
    {
        // Stream.ReadByte returns -1 at end of stream; BinaryReader.Read would decode characters instead
        int nextByte = reader.BaseStream.ReadByte();
        while (nextByte != -1)
        {
            writer.Write((byte)nextByte);
            nextByte = reader.BaseStream.ReadByte();
        }
    }
}
The application I'm attempting to create would read the binary code of any file and create a file with the exact same binary code, creating a copy.
Is this for academic purposes? Or do you actually just want to copy a file?
If the latter, you'll want to just use the System.IO.File.Copy method.

What is the BEST way to replace text in a File using C# / .NET?

I have a text file that is being written to as part of a very large data extract. The first line of the text file is the number of "accounts" extracted.
Because of the nature of this extract, that number is not known until the very end of the process, but the file can be large (a few hundred megs).
What is the BEST way in C# / .NET to open a file (in this case a simple text file), and replace the data that is in the first "line" of text?
IMPORTANT NOTE: - I do not need to replace a "fixed amount of bytes" - that would be easy. The problem here is that the data that needs to be inserted at the top of the file is variable.
IMPORTANT NOTE 2: - A few people have asked about / mentioned simply keeping the data in memory and then replacing it... however that's completely out of the question. The reason why this process is being updated is because of the fact that sometimes it crashes when loading a few gigs into memory.
If you can you should insert a placeholder which you overwrite at the end with the actual number and spaces.
If that is not an option write your data to a cache file first. When you know the actual number create the output file and append the data from the cache.
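The cache-file variant could look roughly like the sketch below. The names (HeaderFirstWriter, writeRecords) are invented for illustration; the callback stands in for your extract, writing records to the writer it is given and returning how many it wrote.

```csharp
using System;
using System.IO;
using System.Text;

public static class HeaderFirstWriter
{
    // Writes records to a temp file first, then builds the final file: count line, then records.
    public static void Write(string outputPath, Func<TextWriter, int> writeRecords)
    {
        string tempPath = Path.GetTempFileName();
        try
        {
            int count;
            using (var body = new StreamWriter(tempPath))
            {
                count = writeRecords(body);
            }

            using (var output = File.Create(outputPath))
            {
                // leaveOpen: true so the raw copy below can reuse the same stream
                using (var header = new StreamWriter(output, new UTF8Encoding(false), 1024, true))
                {
                    header.WriteLine(count);
                }
                using (var input = File.OpenRead(tempPath))
                {
                    input.CopyTo(output); // streams in blocks, never loads the whole file
                }
            }
        }
        finally
        {
            File.Delete(tempPath);
        }
    }
}
```

The cost is writing the data twice, but memory use stays flat no matter how large the extract is, and the header can be any length.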
BEST is very subjective. For any smallish file, you can easily open the entire file in memory and replace what you want using a string replace and then re-write the file.
Even for largish files, it would not be that hard to load into memory. In the days of multi-gigs of memory, I would consider hundreds of megabytes to still be easily done in memory.
Have you tested this naive approach? Have you seen a real issue with it?
If this is a really large file (gigabytes in size), I would consider writing all of the data first to a temp file and then write the correct file with the header line going in first and then appending the rest of the data. Since it is only text, I would probably just shell out to DOS:
TYPE temp.txt >> outfile.txt
I do not need to replace a "fixed amount of bytes"
Are you sure?
If you write a big number to the first line of the file (UInt32.MaxValue or UInt64.MaxValue), then when you find the correct actual number, you can replace that number of bytes with the correct number, but left padded with zeros, so it's still a valid integer.
e.g.
Replace 999999 - your "large number placeholder"
With 000100 - the actual number of accounts
That seems workable to me, if I understand the question correctly.
What is the BEST way in C# / .NET to open a file (in this case a simple text file), and replace the data that is in the first "line" of text?
How about placing at the top of the file a token {UserCount} when it is first created.
Then use TextReader to read the file line by line. If it is the first line look for {UserCount} and replace with your value. Write out each line you read in using TextWriter
Example:
int lineNumber = 1;
int userCount = 1234;
string line = null;
using(TextReader tr = File.OpenText("OriginalFile"))
using(TextWriter tw = File.CreateText("ResultFile"))
{
while((line = tr.ReadLine()) != null)
{
if(lineNumber == 1)
{
line = line.Replace("{UserCount}", userCount.ToString());
}
tw.WriteLine(line);
lineNumber++;
}
}
If the extracted file is only a few hundred megabytes, then you can easily keep all of the text in-memory until the extraction is complete. Then, you can write your output file as the last operation, starting with the record count.
Ok, earlier I suggested an approach that would be better when dealing with existing files.
However in your situation you want to create the file and during the create process go back to the top and write out the user count. This will do just that.
Here is one way to do it that prevents you having to write the temporary file.
private void WriteUsers()
{
    // Width of the reserved first line; the final header is padded to exactly this many bytes
    const int headerWidth = 30;
    ASCIIEncoding enc = new ASCIIEncoding();
    int userCounter = 0;
    using (StreamWriter sw = File.CreateText("myfile.txt"))
    {
        // Write a fixed-width placeholder line.
        // Note this line will later contain our user count.
        sw.WriteLine(new string(' ', headerWidth));
        // Write out the records and keep track of the count
        for (int i = 1; i < 100; i++)
        {
            sw.WriteLine("User" + i);
            userCounter++;
        }
        // Flush buffered text before writing to the base stream directly
        sw.Flush();
        // Get the base stream and set the position to 0
        sw.BaseStream.Position = 0;
        // Pad to the reserved width so the records after the header are not overwritten
        string userCountString = ("User Count: " + userCounter).PadRight(headerWidth);
        byte[] userCountBytes = enc.GetBytes(userCountString);
        sw.BaseStream.Write(userCountBytes, 0, userCountBytes.Length);
    }
}
