How to read file bytes from byte offset? - c#

If I am given a .cmp file and a byte offset 0x598, how can I read a file from this offset?
I can ofcourse read file bytes like this
byte[] fileBytes = File.ReadAllBytes("upgradefile.cmp");
But how can I read it from byte offset 0x598
To explain a bit more, actually from this offset the actual data starts that I have to read and before this byte offset it is just header data, so basically I have to read file from that offset till end.

Try code like this:
using (BinaryReader reader = new BinaryReader(File.Open("upgradefile.cmp", FileMode.Open)))
{
long offset = 0x598;
if (reader.BaseStream.Length > offset)
{
reader.BaseStream.Seek(offset, SeekOrigin.Begin);
byte[]fileBytes = reader.ReadBytes((int) (reader.BaseStream.Length - offset));
}
}

If you are not familiar with Streams, Linq, or whatever, I have simplest solution for you:
Read entire file into memory (I hope you deal with small files):
byte[] fileBytes = File.ReadAllBytes("upgradefile.cmp");
Calculate how many bytes are present in array after given offset:
long startOffset = 0x598; // this is just hexadecimal representation for human, it can be decimal or whatever
long howManyBytesToRead = fileBytes.Length - startOffset;
Then just copy data to new array:
byte[] newArray = new byte[howManyBytesToRead];
long pos = 0;
for (int i = startOffset; i < fileBytes.Length; i++)
{
newArray[pos] = fileBytes[i];
pos = pos + 1;
}
If you understand how it works you can look at Array.Copy method in Microsoft documentation.

By not using ReadAllBytes.
Get a stream, move to potition, read rest of files.
You basically complain that a convenience method made to allow a one line read of a whole file is not what you want - ignoring that it is just that, a convenience method. The normal way to deal with files is opening them and using a Stream.

Related

C# - Stream.Read offset is working incorrectly

I have a test code like below. I am reading from stream, offsetting by 2 positions, and then taking next 2 bytes. I would hope that result would be an array with 2 elements. This does not work though - offset is completely ignored, and full sized array is always returned, with only offset blocks having values. But this means my result table is still very large, it just has a lot of unwanted zeroes
How can I rework below code, so that file.Read() returns only an array of 2 bytes instead of 10 when length = 2 and offset = 2? In real world scenario I am dealing with large files (>2gigs) so filtering out the result array is not an option.
Edit: As the issue is unclear - below code requires me to always define output array that is the same size as the stream. Instead I would like to have an output that is of size of length (in below example I would like to have var buffer = new byte[2], but that will throw an exception because file.Read ignores offset and length and always returns 10 elements (with only 2 of them being read, rest is dummy zeroes).
private byte[] GetFilePart(int length, int offset)
{
//build some dummy content
var content = new byte[10];
for (int i = 0; i<10; i++)
{
content[i] = 1;
}
//read the data from content
var buffer = new byte[10];
using (Stream file = new MemoryStream(content))
{
file.Read(buffer, offset, length);
}
return buffer;
}
Looks like it's working properly to me; maybe your confusion would clear a bit if you inited your content array with something like:
for (int i = 1; i<=10; i++)
{
content[i-1] = i;
}
then each byte would have a different number and the image would look like:
offset relates to where into buffer the Stream will write the bytes to (it reads from the start of content). It does not relate to what bytes are read out of content.
Imagine Read as being called WriteBytesInto(byte[] whatBuffer, int whereToStartWriting, int howManyBytesToWrite) - you provide the buffer it will write into and tell it where to start and how many to do
If you did this, having inited content to be incrementing numbers:
file.Read(buffer, 2, 3); //read 3 bytes from stream and write to buffer # index 2
file.Read(buffer, 0, 2); //read 2 bytes from stream and write to buffer # index 0
Your buffer would end up looking like:
4,5,1,2,3,0,0,0,0,0
The 1,2,3 having been written first, then the 4,5 written next
If you want to skip two bytes from the stream (i.e. read the 3rd and 4th byte from content, Seek() the stream or set its Position (or as canton7 advises in the commments, if the stream is not seekable, read and discard some bytes)
How can I rework below code, so that file.Read() returns only an array of 2 bytes instead of 10 when length = 2 and offset = 2?
Well, file.Read doesn't return an array at all; it modifies an array you give it. If you want a 2 byte array, give it a 2 byte array:
byte buf = new byte[2];
file.Read(buf, 0, buf.Length);
If you want to open a file, skip the first 7 bytes and then read bytes 8th and 9th into your length-of-2 byte array then:
byte buf = new byte[2];
file.Position = 7; //absolute skip to 8th byte
file.Read(buf, 0, buf.Length);
For more on seeking in streams see Stream.Seek(0, SeekOrigin.Begin) or Position = 0

Replace a byte of data

I'm trying to replace only one byte of data from a file, meaning something like 0X05 -> 0X15.
I'm using Replace function to do this.
using (StreamReader reader = new System.IO.StreamReader(Inputfile))
{
content = reader.ReadToEnd();
content = content.Replace("0x05","0x15");
reader.Close();
}
using (FileStream stream = new FileStream(outputfile, FileMode.Create))
{
using (BinaryWriter writer = new BinaryWriter(stream, Encoding.UTF8))
{
writer.Write(content);
}
}
Technically speaking, only that byte of data had to replaced with new byte, but I see there are many bytes of data changed.
Why other bytes are changing ?How can I achieve this?
You're talking about bytes but you've written code that reads strings; strings are an interpretation of bytes so if you truly do mean bytes, mangling them through strings is the wrong way to go
Anyways, there are helper methods to make your life easy, if the file is relatively small (maybe up to 500mb - I'd switch to using an incremental streaming reading/changing/writing method if it's bigger than this)
If you want bytes changed:
var b = File.ReadAllBytes("path");
for(int x = 0; x < b.Length; x++)
if(b[x] == 0x5)
b[x] = (byte)0x15;
File.WriteAllBytes("path", b);
If your file is a text file that literally has "0x05" in it:
File.WriteAllText("path", File.ReadAllText("path").Replace("0x05", "0x15"));
In response to your question in the comments, and assuming you want your file to grow by 2 bytes more for each 0x05 it contains (so a 1000 byte file that cotnains three 0x05 bytes will be 1006 bytes after being written) it is probably simplest to:
var b = File.ReadAllBytes("path");
using(FileStream fs = new FileStream("path", FileMode.Create)) //replace file
{
for(int x = 0; x < b.Length; x++)
if(b[x] == 0x5) {
fs.WriteByte((byte)0x15);
fs.WriteByte((byte)0x5);
fs.WriteByte((byte)0x15);
} else
fs.WriteByte(b);
}
Don't worry about writing a single byte at a time - it is buffered elsewhere in the IO chain. You could go for a solution that writes blocks of bytes from the array if you wanted.. this is just easier to code/understand

Processing Huge Files In C#

I have a 4Gb file that I want to perform a byte based find and replace on. I have written a simple program to do it but it takes far too long (90 minutes+) to do just one find and replace. A few hex editors I have tried can perform the task in under 3 minutes and don't load the entire target file into memory. Does anyone know a method where I can accomplish the same thing? Here is my current code:
public int ReplaceBytes(string File, byte[] Find, byte[] Replace)
{
var Stream = new FileStream(File, FileMode.Open, FileAccess.ReadWrite);
int FindPoint = 0;
int Results = 0;
for (long i = 0; i < Stream.Length; i++)
{
if (Find[FindPoint] == Stream.ReadByte())
{
FindPoint++;
if (FindPoint > Find.Length - 1)
{
Results++;
FindPoint = 0;
Stream.Seek(-Find.Length, SeekOrigin.Current);
Stream.Write(Replace, 0, Replace.Length);
}
}
else
{
FindPoint = 0;
}
}
Stream.Close();
return Results;
}
Find and Replace are relatively small compared with the 4Gb "File" by the way. I can easily see why my algorithm is slow but I am not sure how I could do it better.
Part of the problem may be that you're reading the stream one byte at a time. Try reading larger chunks and doing a replace on those. I'd start with about 8kb and then test with some larger or smaller chunks to see what gives you the best performance.
There are lots of better algorithms for finding a substring in a string (which is basically what you are doing)
Start here:
http://en.wikipedia.org/wiki/String_searching_algorithm
The gist of them is that you can skip a lot of bytes by analyzing your substring. Here's a simple example
4GB File starts with: A B C D E F G H I J K L M N O P
Your substring is: N O P
You skip the length of the substring-1 and check against the last byte, so compare C to P
It doesn't match, so the substring is not the first 3 bytes
Also, C isn't in the substring at all, so you can skip 3 more bytes (len of substring)
Compare F to P, doesn't match, F isn't in substring, skip 3
Compare I to P, etc, etc
If you match, go backwards. If the character doesn't match, but is in the substring, then you have to do some more comparing at that point (read the link for details)
Instead of reading file byte by byte read it by buffer:
buffer = new byte[bufferSize];
currentPos = 0;
length = (int)Stream .Length;
while ((count = Stream.Read(buffer, currentPos, bufferSize)) > 0)
{
currentPos += count;
....
}
Another, easier way of reading more than one byte at a time:
var Stream = new BufferedStream(new FileStream(File, FileMode.Open, FileAccess.ReadWrite));
Combining this with Saeed Amiri's example of how to read into a buffer, and one of the better binary find/replace algorithms should give you better results.
You should try using memory-mapped files. C# supports them starting with version 4.0.
A memory-mapped file contains the contents of a file in virtual memory.
Persisted files are memory-mapped files that are associated with a source file on a disk. When the last process has finished working with the file, the data is saved to the source file on the disk. These memory-mapped files are suitable for working with extremely large source files.

Split up a memorystream into bytarray

Im trying to split up a memorystream into chunks by reading parts into a byte array but i think i have got something fundamentally wrong. I can read the first chunk but when i try to read rest of memorystream i get index out of bound even if there are more bytes to read. It seems that the issue is the size of the receving byte buffer that need to be as large as the memorystrem. I need to convert it into chunks as the code is in a webservice.
Anyone knows whats wrong with this code
fb.buffer is MemoryStream
long bytesLeft = fb.Buffer.Length;
fb.Buffer.Position = 0;
int offset =0;
int BUFF_SIZE = 8196;
while (bytesLeft > 0)
{
byte[] fs = new byte[BUFF_SIZE];
fb.Buffer.Read(fs, offset, BUFF_SIZE);
offset += BUFF_SIZE;
bytesLeft -= BUFF_SIZE;
}
offset here is the offset into the array. It should be zero here. You should also be looking at the return value from Read. It is not guaranteed to fill the buffer, even if more data is available.
However, if this is a MemoryStream - a better option might be ArraySegment<byte>, which requires no duplication of data.
Please look at this code for Stream.Read from MSDN from a glance - you shouldn't be incrementing offset - it should always be zero. Unless, of course, you happen to know for a fact the exact length of the stream in advance (therefore you would create the array the exact size).
You should also always grab the amount of bytes read from Read (the return value).
If you're looking to split it into 'chunks` do you mean you want n 8k chunks? Then you might want to do something like this:
List<byte[]> chunks = new List<byte[]>();
byte chunk = new byte[BUFF_SIZE];
int bytesRead = fb.Buffer.Read(chunk, 0, BUFF_SIZE);
while(bytesRead > 0)
{
if(bytesRead != BUFF_SIZE)
{
byte[] tail = new byte[bytesRead];
Array.Copy(chunk, tail, bytesRead);
chunk = tail;
}
chunks.Add(chunk);
bytesRead = fb.Buffer.Read(chunk, 0, BUFF_SIZE);
}
Note in particular that the last chunk is more than likely not going to be exactly BUFF_SIZE in length.

Unable to read beyond the end of the stream

I did some quick method to write a file from a stream but it's not done yet. I receive this exception and I can't find why:
Unable to read beyond the end of the stream
Is there anyone who could help me debug it?
public static bool WriteFileFromStream(Stream stream, string toFile)
{
FileStream fileToSave = new FileStream(toFile, FileMode.Create);
BinaryWriter binaryWriter = new BinaryWriter(fileToSave);
using (BinaryReader binaryReader = new BinaryReader(stream))
{
int pos = 0;
int length = (int)stream.Length;
while (pos < length)
{
int readInteger = binaryReader.ReadInt32();
binaryWriter.Write(readInteger);
pos += sizeof(int);
}
}
return true;
}
Thanks a lot!
Not really an answer to your question but this method could be so much simpler like this:
public static void WriteFileFromStream(Stream stream, string toFile)
{
// dont forget the using for releasing the file handle after the copy
using (FileStream fileToSave = new FileStream(toFile, FileMode.Create))
{
stream.CopyTo(fileToSave);
}
}
Note that i also removed the return value since its pretty much useless since in your code, there is only 1 return statement
Apart from that, you perform a Length check on the stream but many streams dont support checking Length.
As for your problem, you first check if the stream is at its end. If not, you read 4 bytes. Here is the problem. Lets say you have a input stream of 6 bytes. First you check if the stream is at its end. The answer is no since there are 6 bytes left. You read 4 bytes and check again. Ofcourse the answer is still no since there are 2 bytes left. Now you read another 4 bytes but that ofcourse fails since there are only 2 bytes. (readInt32 reads the next 4 bytes).
I presume that the input stream have ints only (Int32). You need to test the PeekChar() method,
while (binaryReader.PeekChar() != -1)
{
int readInteger = binaryReader.ReadInt32();
binaryWriter.Write(readInteger);
}
You are doing while (pos < length) and length is the actual length of the stream in bytes. So you are effectively counting the bytes in the stream and then trying to read that many number of ints (which is incorrect). You could take length to be stream.Length / 4 since an Int32 is 4 bytes.
try
int length = (int)binaryReader.BaseStream.Length;
After reading the stream by the binary reader the position of the stream is at the end, you have to set the position to zero "stream.position=0;"

Categories