Replace a byte of data - C#

I'm trying to replace only one byte of data in a file, meaning something like 0x05 -> 0x15.
I'm using the Replace function to do this.
using (StreamReader reader = new System.IO.StreamReader(Inputfile))
{
    content = reader.ReadToEnd();
    content = content.Replace("0x05", "0x15");
    reader.Close();
}
using (FileStream stream = new FileStream(outputfile, FileMode.Create))
{
    using (BinaryWriter writer = new BinaryWriter(stream, Encoding.UTF8))
    {
        writer.Write(content);
    }
}
Technically speaking, only that one byte of data should be replaced with the new byte, but I see that many bytes of data have changed.
Why are other bytes changing? How can I achieve this?

You're talking about bytes, but you've written code that reads strings; strings are an interpretation of bytes, so if you truly do mean bytes, mangling them through strings is the wrong way to go.
Anyway, there are helper methods to make your life easy if the file is relatively small (maybe up to 500 MB; I'd switch to an incremental streaming read/change/write approach if it's bigger than that).
If you want bytes changed:
var b = File.ReadAllBytes("path");
for (int x = 0; x < b.Length; x++)
    if (b[x] == 0x5)
        b[x] = (byte)0x15;
File.WriteAllBytes("path", b);
If your file is a text file that literally has "0x05" in it:
File.WriteAllText("path", File.ReadAllText("path").Replace("0x05", "0x15"));
In response to your question in the comments, and assuming you want your file to grow by 2 bytes for each 0x05 it contains (so a 1000-byte file that contains three 0x05 bytes will be 1006 bytes after being written), it is probably simplest to:
var b = File.ReadAllBytes("path");
using (FileStream fs = new FileStream("path", FileMode.Create)) // replace file
{
    for (int x = 0; x < b.Length; x++)
    {
        if (b[x] == 0x5)
        {
            fs.WriteByte((byte)0x15);
            fs.WriteByte((byte)0x5);
            fs.WriteByte((byte)0x15);
        }
        else
        {
            fs.WriteByte(b[x]); // write the unchanged byte, not the whole array
        }
    }
}
Don't worry about writing a single byte at a time; it is buffered elsewhere in the IO chain. You could go for a solution that writes blocks of bytes from the array if you wanted; this is just easier to code and understand.
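For a file too large to read in one go, a rough sketch of the incremental streaming approach mentioned above could look like the following (the file names and the 64 KB buffer size are placeholders, not from the original code):
using (var input = new FileStream("input.bin", FileMode.Open, FileAccess.Read))
using (var output = new FileStream("output.bin", FileMode.Create, FileAccess.Write))
{
    byte[] buffer = new byte[64 * 1024];
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        // expand every 0x05 into 0x15 0x05 0x15, copy everything else as-is
        for (int i = 0; i < read; i++)
        {
            if (buffer[i] == 0x05)
            {
                output.WriteByte(0x15);
                output.WriteByte(0x05);
                output.WriteByte(0x15);
            }
            else
            {
                output.WriteByte(buffer[i]);
            }
        }
    }
}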

How to read file bytes from byte offset?

If I am given a .cmp file and a byte offset of 0x598, how can I read the file from this offset?
I can of course read the file bytes like this:
byte[] fileBytes = File.ReadAllBytes("upgradefile.cmp");
But how can I read it from byte offset 0x598?
To explain a bit more: the actual data that I have to read starts at this offset, and everything before it is just header data, so basically I have to read the file from that offset to the end.
Try code like this:
using (BinaryReader reader = new BinaryReader(File.Open("upgradefile.cmp", FileMode.Open)))
{
    long offset = 0x598;
    if (reader.BaseStream.Length > offset)
    {
        reader.BaseStream.Seek(offset, SeekOrigin.Begin);
        byte[] fileBytes = reader.ReadBytes((int)(reader.BaseStream.Length - offset));
    }
}
If you are not familiar with streams, LINQ, or whatever, here is the simplest solution for you:
Read entire file into memory (I hope you deal with small files):
byte[] fileBytes = File.ReadAllBytes("upgradefile.cmp");
Calculate how many bytes are present in array after given offset:
long startOffset = 0x598; // hexadecimal is just a human-readable representation; it could be written in decimal or whatever
long howManyBytesToRead = fileBytes.Length - startOffset;
Then just copy data to new array:
byte[] newArray = new byte[howManyBytesToRead];
long pos = 0;
for (long i = startOffset; i < fileBytes.Length; i++) // long, to match startOffset
{
    newArray[pos] = fileBytes[i];
    pos = pos + 1;
}
If you understand how it works, you can look at the Array.Copy method in the Microsoft documentation.
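For reference, the same copy can also be done without the manual loop; a minimal sketch using Array.Copy with the variables from above:
byte[] newArray = new byte[howManyBytesToRead];
// Array.Copy has a long-based overload, so the offsets can stay as long values
Array.Copy(fileBytes, startOffset, newArray, 0, howManyBytesToRead);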
By not using ReadAllBytes.
Get a stream, move to the position, and read the rest of the file.
You are basically complaining that a convenience method, made to allow a one-line read of a whole file, is not what you want, ignoring that it is just that: a convenience method. The normal way to deal with files is opening them and using a Stream.
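A minimal sketch of that stream-based approach (my own sketch; it assumes .NET 4 for CopyTo and reuses the file name and offset from the question):
using (var stream = File.OpenRead("upgradefile.cmp"))
using (var memory = new MemoryStream())
{
    stream.Seek(0x598, SeekOrigin.Begin); // skip the header
    stream.CopyTo(memory);                // copy only the bytes after the offset
    byte[] fileBytes = memory.ToArray();
    // use fileBytes here
}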

Better/faster way to fill a big array in C#

I have 3 *.dat files (346 KB, 725 KB, 1762 KB) that are filled with a JSON string of "big" int arrays.
Each time my object is created (several times) I take those three files and use JsonConvert.DeserializeObject to deserialize the arrays into the object.
I thought about using binary files instead of a JSON string, or could I even save these arrays directly? I don't need to use these files; it's just where the data is currently saved. I would gladly switch to anything faster.
What are the different ways to speed up the initialization of these objects?
The fastest way is to manually serialize the data.
An easy way to do this is by creating a FileStream, and then wrapping it in a BinaryWriter/BinaryReader.
You have access to functions to write the basic data structures (numbers, string, char, byte[] and char[]).
An easy way to write an int[] (unnecessary if it has a fixed size) is to prepend the length of the array as either an int or a long (depending on the size; unsigned doesn't really give any advantage, since arrays use signed data types for their length). Then write all the ints.
Two ways to write all the ints would be:
1. Simply loop over the entire array.
2. Convert it into a byte[] and write it using BinaryWriter.Write(byte[])
This is how you can implement them both:
// Writing, method 1
BinaryWriter writer = new BinaryWriter(new FileStream(...));
int[] intArr = new int[1000];
writer.Write(intArr.Length);
for (int i = 0; i < intArr.Length; i++)
    writer.Write(intArr[i]);

// Reading, method 1
BinaryReader reader = new BinaryReader(new FileStream(...));
int[] intArr = new int[reader.ReadInt32()];
for (int i = 0; i < intArr.Length; i++)
    intArr[i] = reader.ReadInt32();

// Writing, method 2
BinaryWriter writer = new BinaryWriter(new FileStream(...));
int[] intArr = new int[1000];
byte[] byteArr = new byte[intArr.Length * sizeof(int)];
Buffer.BlockCopy(intArr, 0, byteArr, 0, intArr.Length * sizeof(int));
writer.Write(intArr.Length);
writer.Write(byteArr);

// Reading, method 2
BinaryReader reader = new BinaryReader(new FileStream(...));
int[] intArr = new int[reader.ReadInt32()];
byte[] byteArr = reader.ReadBytes(intArr.Length * sizeof(int));
Buffer.BlockCopy(byteArr, 0, intArr, 0, byteArr.Length);
I decided to put this all to the test: with an array of 10,000 integers, I ran the test 10,000 times.
Method 1 took on average 888,200 ns on my system (about 0.89 ms),
while method 2 took on average only 568,600 ns on my system (about 0.57 ms).
Both times include the work the garbage collector has to do.
Obviously method 2 is faster than method 1, though possibly less readable.
Another reason why method 1 can be better than method 2 is that method 2 requires twice as much free RAM as the data you're going to write (the original int[] plus the byte[] converted from it), which matters when dealing with limited RAM or extremely large files (talking about 512 MB and up). If that is the case, you can always make a hybrid solution, for example by writing away 128 MB at a time.
Note that method 1 also requires this extra space, but because it's split into one operation per item of the int[], the memory can be released a lot earlier.
Something like this, will write 128MB of an int[] at a time:
const int WRITECOUNT = 32 * 1024 * 1024; // 32M ints = 128 MB per chunk
int[] intArr = new int[140 * 1024 * 1024]; // 140M ints = 560 MB in total
for (int i = 0; i < intArr.Length; i++)
    intArr[i] = i;
byte[] byteArr = new byte[WRITECOUNT * sizeof(int)]; // 128 MB buffer
int dataDone = 0;
using (Stream fileStream = new FileStream("data.dat", FileMode.Create))
using (BinaryWriter writer = new BinaryWriter(fileStream))
{
    while (dataDone < intArr.Length)
    {
        int dataToWrite = intArr.Length - dataDone;
        if (dataToWrite > WRITECOUNT) dataToWrite = WRITECOUNT;
        // Buffer.BlockCopy offsets and counts are in bytes, not elements
        Buffer.BlockCopy(intArr, dataDone * sizeof(int), byteArr, 0, dataToWrite * sizeof(int));
        // only write the bytes that were actually copied for this chunk
        writer.Write(byteArr, 0, dataToWrite * sizeof(int));
        dataDone += dataToWrite;
    }
}
Note that this is just for writing; reading works differently too :P.
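For completeness, a rough sketch of a matching chunked read (my own sketch, not from the original answer; it assumes the file holds nothing but the raw int data and that FileStream fills each requested chunk for a local file):
const int READCOUNT = 32 * 1024 * 1024; // 32M ints = 128 MB per chunk
int[] intArr;
byte[] byteArr = new byte[READCOUNT * sizeof(int)];
using (Stream fileStream = new FileStream("data.dat", FileMode.Open))
{
    intArr = new int[fileStream.Length / sizeof(int)];
    int dataDone = 0;
    while (dataDone < intArr.Length)
    {
        int dataToRead = intArr.Length - dataDone;
        if (dataToRead > READCOUNT) dataToRead = READCOUNT;
        int bytesRead = fileStream.Read(byteArr, 0, dataToRead * sizeof(int));
        if (bytesRead == 0) break; // unexpected end of file
        Buffer.BlockCopy(byteArr, 0, intArr, dataDone * sizeof(int), bytesRead);
        dataDone += bytesRead / sizeof(int);
    }
}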
I hope this gives you some more insight into dealing with very large data files :).
If you've just got a bunch of integers, then using JSON will indeed be pretty inefficient in terms of parsing. You can use BinaryReader and BinaryWriter to write binary files efficiently... but it's not clear to me why you need to read the file every time you create an object anyway. Why can't each new object keep a reference to the original array, which has been read once? Or if they need to mutate the data, you could keep one "canonical source" and just copy that array in memory each time you create an object.
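A rough sketch of that caching idea (the field and method names and the file name are made up for illustration; it assumes Json.NET, as in the question):
// Deserialize once, keep the canonical array, hand each new object its own copy.
static readonly Lazy<int[]> CanonicalData = new Lazy<int[]>(
    () => JsonConvert.DeserializeObject<int[]>(File.ReadAllText("data1.dat")));

static int[] GetPrivateCopy()
{
    return (int[])CanonicalData.Value.Clone(); // in-memory copy, no re-parsing
}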
The fastest way to create a byte array from an array of integers is to use Buffer.BlockCopy
byte[] result = new byte[a.Length * sizeof(int)];
Buffer.BlockCopy(a, 0, result, 0, result.Length);
// write result to FileStream or wherever
If you store the size of the array in the first element, you can use it again to deserialize. Make sure everything fits into memory, but looking at your file sizes it should.
var buffer = File.ReadAllBytes(@"...");
int size = BitConverter.ToInt32(buffer, 0);
var result = new int[size];
// skip the 4-byte length prefix and copy the remaining data into the int array
Buffer.BlockCopy(buffer, sizeof(int), result, 0, size * sizeof(int));
Binary is not human readable, but definitely faster than JSON.
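For symmetry, a possible write-side sketch (SaveIntArray is a made-up helper name) that stores the length in the first four bytes, matching the read code above:
static void SaveIntArray(string path, int[] a)
{
    // first four bytes: element count; remainder: the raw int data
    byte[] buffer = new byte[sizeof(int) + a.Length * sizeof(int)];
    Buffer.BlockCopy(BitConverter.GetBytes(a.Length), 0, buffer, 0, sizeof(int));
    Buffer.BlockCopy(a, 0, buffer, sizeof(int), a.Length * sizeof(int));
    File.WriteAllBytes(path, buffer);
}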

Reading compressed file and writing to new file will not allow decompression

I have a test program that demonstrates the end result that I am hoping for (even though in this test program the steps may seem unnecessary).
The program compresses data to a file using GZipStream. The resulting compressed file is C:\mydata.dat.
I then read this file, and write it to a new file.
//Read original file
string compressedFile = String.Empty;
using (StreamReader reader = new StreamReader(@"C:\mydata.dat"))
{
    compressedFile = reader.ReadToEnd();
    reader.Close();
    reader.Dispose();
}

//Write to a new file
using (StreamWriter file = new StreamWriter(@"C:\mynewdata.dat"))
{
    file.WriteLine(compressedFile);
}
When I try to decompress the two files, the original one decompresses perfectly, but the new file throws an InvalidDataException with the message "The magic number in GZip header is not correct. Make sure you are passing in a GZip stream."
Why are these files different?
StreamReader is for reading a sequence of characters, not bytes. The same applies to StreamWriter. Since treating compressed files as a stream of characters doesn't make any sense, you should use some implementation of Stream. If you want to get the stream as an array of bytes, you can use MemoryStream.
The exact reason why using character streams doesn't work is that they assume the UTF-8 encoding by default. If some byte is not valid UTF-8 (like the second byte of the header, 0x8B), it's represented as Unicode “replacement character” (U+FFFD). When the string is written back, that character is encoded using UTF-8 into something completely different than what was in the source.
For example, to read a file from a stream, get it as an array of bytes and then write it to another files as a stream:
byte[] bytes;
using (var fileStream = new FileStream(@"C:\mydata.dat", FileMode.Open))
using (var memoryStream = new MemoryStream())
{
    fileStream.CopyTo(memoryStream);
    bytes = memoryStream.ToArray();
}

using (var memoryStream = new MemoryStream(bytes))
using (var fileStream = new FileStream(@"C:\mynewdata.dat", FileMode.Create))
{
    memoryStream.CopyTo(fileStream);
}
The CopyTo() method is only available in .NET 4, but you can write your own if you use older versions.
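A hand-rolled equivalent for older framework versions might look like this (a sketch, not the framework's implementation; the 4 KB buffer size is arbitrary):
static void CopyStream(Stream source, Stream destination)
{
    byte[] buffer = new byte[4096];
    int read;
    while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        destination.Write(buffer, 0, read);
}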
Of course, for this simple example, there is no need to use streams. You can simply do:
byte[] bytes = File.ReadAllBytes(@"C:\mydata.dat");
File.WriteAllBytes(@"C:\mynewdata.dat", bytes);
EDIT: Apparently, my suggestions are wrong/invalid/whatever... please use one of the others, which have no doubt been highly refactored to the point where no extra performance could possibly be achieved (else, that would mean they are just as invalid as mine).
using (System.IO.StreamReader sr = new System.IO.StreamReader(@"C:\mydata.dat"))
{
    using (System.IO.StreamWriter sw = new System.IO.StreamWriter(@"C:\mynewdata.dat"))
    {
        byte[] bytes = new byte[1024];
        int count = 0;
        while ((count = sr.BaseStream.Read(bytes, 0, bytes.Length)) > 0)
        {
            sw.BaseStream.Write(bytes, 0, count);
        }
    }
}
Read all bytes
byte[] bytes = null;
using (System.IO.StreamReader sr = new System.IO.StreamReader(@"C:\mydata.dat"))
{
    bytes = new byte[sr.BaseStream.Length];
    int index = 0;
    int count = 0;
    // read at most 1024 bytes at a time, but never past the end of the buffer
    while ((count = sr.BaseStream.Read(bytes, index, Math.Min(1024, bytes.Length - index))) > 0)
    {
        index += count;
    }
}
Read all bytes/write all bytes (from svick's answer):
byte[] bytes = File.ReadAllBytes(@"C:\mydata.dat");
File.WriteAllBytes(@"C:\mynewdata.dat", bytes);
PERFORMANCE TESTING WITH OTHER ANSWERS:
Just did a quick test between my answer (StreamReader) (first part above, the file copy) and svick's answer (FileStream/MemoryStream) (the first one). The test is 1000 iterations of the code; here are the results from 4 tests (results are in whole seconds; all actual results were slightly over these values):
My Code | svick code
--------------------
9 | 12
9 | 14
8 | 13
8 | 14
As you can see, in my test at least, my code performed better. One thing perhaps to note with mine is that I am not reading a character stream; I am in fact accessing the BaseStream, which provides a byte stream. Perhaps svick's answer is slow because he is using two streams for reading and then two for writing. Of course, there is a lot of optimisation that could be done to svick's answer to improve the performance (and he also provided an alternative for a simple file copy).
Testing with third option (ReadAllBytes/WriteAllBytes)
My Code | svick code | 3rd
----------------------------
8 | 14 | 7
9 | 18 | 9
9 | 17 | 8
9 | 17 | 9
Note: in milliseconds the 3rd option was always better

How to read a file from the Xth line in .NET

I read a file from the file system and FTP it to an FTP server. I now have a request to skip the first line (it's a CSV with header info). I think I can somehow do this with the offset parameter of the stream.Read method (or the write method), but I am not sure how to translate a single line into a byte-array offset.
How would I calculate the offset to tell it to only read from the 2nd line of the file?
Thanks
// Read the file to be uploaded into a byte array
stream = File.OpenRead(currentQueuePathAndFileName);
var buffer = new byte[stream.Length];
stream.Read(buffer, 0, buffer.Length);
stream.Close();
// Get the stream for the request and write the byte array to it
var reqStream = request.GetRequestStream();
reqStream.Write(buffer, 0, buffer.Length);
reqStream.Close();
return request;
You should use File.ReadAllLines. It returns an array of strings. Then strArray.Skip(1) will return all lines except the first.
UPDATE
Here is the code:
var stringArray = File.ReadAllLines(fileName);
if (stringArray.Length > 1)
{
    stringArray = stringArray.Skip(1).ToArray();
    // Stream.Write needs bytes, so join the remaining lines and encode them
    var bytes = Encoding.UTF8.GetBytes(string.Join(Environment.NewLine, stringArray));
    var reqStream = request.GetRequestStream();
    reqStream.Write(bytes, 0, bytes.Length);
    reqStream.Close();
}
If you just need to skip the one line, you can use the reader class: read the first line with it, then read the rest of the contents as usual (note that StreamReader buffers internally, so the underlying stream's Position is not reliable right after ReadLine).
using (var stream = File.OpenRead("currentQueuePathAndFileName"))
using (var reader = new StreamReader(stream))
{
    reader.ReadLine(); // skip the header line
    // StreamReader buffers ahead, so stream.Position is usually past the end of
    // the first line here. Read the remainder through the reader and re-encode it
    // instead of reading raw bytes from the underlying stream.
    string remainder = reader.ReadToEnd();
    byte[] buffer = reader.CurrentEncoding.GetBytes(remainder);

    var reqStream = request.GetRequestStream();
    reqStream.Write(buffer, 0, buffer.Length);
    reqStream.Close();
    return request;
}
And why are you using RequestStream? Aren't you supposed to use HttpContext.Current.Response.OutputStream?
You could take the string from your stream input, split it into lines, drop the first one, and put it back together; or you could regex out the first line, or just look for a line feed and copy the string from there on, etc.
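A quick sketch of that split/drop/rejoin idea (it reuses the file path from the question and assumes System.Linq is available for Skip):
string input = File.ReadAllText(currentQueuePathAndFileName);
// split on both Windows and Unix line endings, drop the header, rejoin
string[] lines = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
string withoutHeader = string.Join(Environment.NewLine, lines.Skip(1));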
You should use a StreamReader class. Then by using ReadLine until the result is null you read all the lines one by one. Just skip the first one, as your requirement says. Reading the whole file into memory is usually not a good idea (it does not scale as the file becomes bigger).
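A minimal sketch of that line-by-line approach (using the path and request objects from the question; writing the output through a StreamWriter is my assumption):
using (var reader = new StreamReader(currentQueuePathAndFileName))
using (var writer = new StreamWriter(request.GetRequestStream()))
{
    reader.ReadLine(); // skip the CSV header line
    string line;
    while ((line = reader.ReadLine()) != null)
        writer.WriteLine(line);
}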
The brute force approach is to iterate through the source byte array and look for the first occurrence of the carriage return character followed by a line feed (CR+LF). That way you'll know the offset in the source array to exclude the first line of the CSV file.
Here's an example:
const byte carriageReturn = 13;
const byte lineFeed = 10;

byte[] file = File.ReadAllBytes(currentQueuePathAndFileName);
int offset = 0;
for (int i = 0; i < file.Length - 1; i++)
{
    if (file[i] == carriageReturn && file[i + 1] == lineFeed)
    {
        offset = i + 2; // first byte after the CR+LF pair
        break;
    }
}
Note that this example assumes the source CSV file was created on Windows, since line breaks are expressed differently on other platforms.
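If the file might use plain LF line endings instead, a small variation of the same loop (reusing the file and lineFeed variables from above) works for both cases, since the line feed terminates the first line either way:
int offset = 0;
for (int i = 0; i < file.Length; i++)
{
    if (file[i] == lineFeed)
    {
        offset = i + 1; // data starts right after the first line feed
        break;
    }
}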
Related resources:
Environment.NewLine Property

Problem with comparing 2 files - false when it should be true?

I tried using byte-by-byte comparison and also comparing calculated hashes of the files (code samples below). I have a file, I copy it, and I compare the two: the result is TRUE. But the problem starts when I open one of the files:
With MS Word files: after opening and closing one of the files, the result is still TRUE, but, for example, when I delete the last symbol in the file, write it back again, and then try to compare again, the result is FALSE. The files are basically the same, but it seems that byte-by-byte they are not the same anymore.
With Excel files: even opening a file causes the function to return FALSE. Should it really be like that? The only thing I can think of that has changed is the last access time. But is that taken into consideration when comparing byte-by-byte?
So I wanted to ask: should this comparison really work like this, and is there anything I can do to avoid it? In my program I will mostly compare .pdf files, where editing won't be much of an option, but I still wanted to know why it is acting like this.
Byte-by-Byte with buffer:
static bool FilesAreEqualFaster(string f1, string f2)
{
    // get file length and make sure lengths are identical
    long length = new FileInfo(f1).Length;
    if (length != new FileInfo(f2).Length)
        return false;

    byte[] buf1 = new byte[4096];
    byte[] buf2 = new byte[4096];

    // open both for reading
    using (FileStream stream1 = File.OpenRead(f1))
    using (FileStream stream2 = File.OpenRead(f2))
    {
        // compare content for equality
        int b1, b2;
        while (length > 0)
        {
            // figure out how much to read
            int toRead = buf1.Length;
            if (toRead > length)
                toRead = (int)length;
            length -= toRead;

            // read a chunk from each and compare
            b1 = stream1.Read(buf1, 0, toRead);
            b2 = stream2.Read(buf2, 0, toRead);
            for (int i = 0; i < toRead; ++i)
                if (buf1[i] != buf2[i])
                    return false;
        }
    }
    return true;
}
Hash:
private static bool CompareFileHashes(string fileName1, string fileName2)
{
    // Compare file sizes before continuing.
    // If sizes are equal then compare bytes.
    if (CompareFileSizes(fileName1, fileName2))
    {
        // Create an instance of System.Security.Cryptography.HashAlgorithm
        HashAlgorithm hash = HashAlgorithm.Create();

        // Declare byte arrays to store our file hashes
        byte[] fileHash1;
        byte[] fileHash2;

        // Open a System.IO.FileStream for each file.
        // Note: With the 'using' keyword the streams
        // are closed automatically.
        using (FileStream fileStream1 = new FileStream(fileName1, FileMode.Open),
                          fileStream2 = new FileStream(fileName2, FileMode.Open))
        {
            // Compute file hashes
            fileHash1 = hash.ComputeHash(fileStream1);
            fileHash2 = hash.ComputeHash(fileStream2);
        }

        return BitConverter.ToString(fileHash1) == BitConverter.ToString(fileHash2);
    }
    else
    {
        return false;
    }
}
Aside from anything else, this code is wrong:
b1 = stream1.Read(buf1, 0, toRead);
b2 = stream2.Read(buf2, 0, toRead);
for (int i = 0; i < toRead; ++i)
    if (buf1[i] != buf2[i])
        return false;
You're ignoring the possibility of b1 and b2 being unequal to each other and to toRead. What if you only read 10 bytes from the first stream and 20 from the second, when you asked for 30? You may not have reached the end of the files, but it can still potentially return you less data than you ask for. Never ignore the return value of Stream.Read. (You're saving it in a variable but then ignoring the variable.)
Basically you'll need to have independent buffers, which are replenished when necessary - keep track of where you are within each buffer, and how much useful data is there. Read more data into each buffer when you need to.
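One way to do that (a sketch, not the only option) is a small helper that keeps reading until the requested count arrives or the stream ends, together with a comparison loop that checks the returned counts:
// Keep reading until 'count' bytes have arrived or the stream is exhausted.
static int ReadFully(Stream stream, byte[] buffer, int count)
{
    int total = 0;
    int read;
    while (total < count && (read = stream.Read(buffer, total, count - total)) > 0)
        total += read;
    return total;
}

// Inside the comparison loop, use it in place of the single Read calls:
//     b1 = ReadFully(stream1, buf1, toRead);
//     b2 = ReadFully(stream2, buf2, toRead);
//     if (b1 != b2) return false;  // one file ended early
//     then compare buf1[0..b1) with buf2[0..b1) as before.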
Then there's the other problem of files actually changing just by opening them, as Henk mentioned.
