Open a file without loading it into memory at once - c#

Hi I have a problem to solve for college and I have a hard time understanding the sentence of the problem.
This is the problem I have :
Reverse the order of bytes from a file without loading the entire file into memory at once.You have to solve this problem in C# , Java , PHP and Python.
Now there are two things that I do not understand here.
First I am not sure if bytes refer to the actual characters of the file , or to something else.The problem does not state if it is a text file or not.
Second I am not sure how to open a file without actualy loading into memory.
This is how I would normaly approach this problem , but I think if I do it this way the file gets loaded into memory:
string fileName = 'file.txt';
reader = new StreamReader(fileName);
string line;
while ((line = reader.ReadLine()) != null)
{
Console.WriteLine(line + "\n");
}
Also I am not sure how I would actualy reverse all the characters if I am reading it one line at the time.
EDIT Sorry for posting in multiple languages I do not want the solution for the problem I only want to clarify it so I can solve it myself.I assumed that because I have to solve it in four different languages the concept would apply to all 4 and it did not matter who answer

Open a FileStream and use the Seek method to go to the end of the file. From there, go backwards, reading one byte at a time. This will read in reverse order. So, until you reach the beginning of the file, loop:
read 1 byte
// do whatever you want with that byte...write to another file?
seek back 2 bytes
As to efficiency, you can read a buffer of, say, 1024 bytes in memory. That way, you don't issue Read operations for each byte of the file. Once you have the buffer filled, reverse it and you're good to go.

Related

byte array stream crc check

I have a small problem with crc checking in c#, im needing to read a file which contains the crc value in the last 8 bytes, how im doing it now is like
using filestream with filemode open
calculate stream length minus 8 bytes
stream.read(buffer,0,streamlength minus 8 bytes)
crc computehash passed in buffer
this leaves the remaining 8bytes which I compare against the crcvalue
the problem ive got is that it works ok for small files, but obviously I get a system out of memory exception for bigger files, I know computehash will take a stream but its either pass in full stream which means I cant get the remaining bytes.
Is there a better way of doing this?
kindest regards
Providing a code snippet will prove to be a great help for us that are trying to help you. While I understand what you are saying I can never be sure that I do get what you did without reading your code.
Not being sure what you want to do with the file I suggest you also look at the MemoryStream class. One quick advantage of a MemoryStream is that there is no need to create temporary buffers and files in an application meaning that you could actually save on memory.
You can apply your current method in a similar fashion to MemoryStream and see if that works.
Info on MemoryStream: http://msdn.microsoft.com/en-us/library/system.io.memorystream%28v=vs.110%29.aspx

C# Reading, Modifying then writing binary data to file. Best convention?

I'm new to programming in general (My understanding of programming concepts is still growing.). So this question is about learning, so please provide enough info for me to learn but not so much that I can't, thank you.
(I would also like input on how to make the code reusable with in the project.)
The goal of the project I'm working on consists of:
Read binary file.
I have known offsets I need to read to find a particular chunk of data from within this file.
First offset is first 4 bytes(Offset for end of my chunk).
Second offset is 16 bytes from end of file. I read for 4 bytes.(Gives size of chunk in hex).
Third offset is the 4 bytes following previous, read for 4 bytes(Offset for start of chunk in hex).
Locate parts in the chunk to modify by searching ASCII text as well as offsets.
Now I have the start offset, end offset and size of my chunk.
This should allow me to read bytes from file into a byte array and know the size of the array ahead of time.
(Questions: 1. Is knowing the size important? Other than verification. 2. Is reading part of a file into a byte array in order to change bytes and overwrite that part of the file the best method?)
So far I have managed to read the offsets from the file using BinaryReader on a MemoryStream. I then locate the chunk of data I need and read that into a byte array.
I'm stuck in several ways:
What are the best practices for binary Reading / Writing?
What's the best storage convention for the data that is read?
When I need to modify bytes how do I go about that.
Should I be using FileStream?
Since you want to both read and write, it makes sense to use the FileStream class directly (using FileMode.Open and FileAccess.ReadWrite). See FileStream on MSDN for a good overall example.
You do need to know the number of bytes that you are going to be reading from the stream. See the FileStream.Read documentation.
Fundamentally, you have to read the bytes into memory at some point if you're going to use and later modify their contents. So you will have to make an in-memory copy (using the Read method is the right way to go if you're reading a variable-length chunk at a time).
As for best practices, always dispose your streams when you're done; e.g.:
using (var stream = File.Open(FILE_NAME, FileMode.Open, FileAccess.ReadWrite))
{
//Do work with the FileStream here.
}
If you're going to do a large amount of work, you should be doing the work asynchronously. (Let us know if that's the case.)
And, of course, check the FileStream.Read documentation and also the FileStream.Write documentation before using those methods.
Reading bytes is best done by pre-allocating an in-memory array of bytes with the length that you're going to read, then reading those bytes. The following will read the chunk of bytes that you're interested in, let you do work on it, and then replace the original contents (assuming the length of the chunk hasn't changed):
EDIT: I've added a helper method to do work on the chunk, per the comments on variable scope.
using (var stream = File.Open(FILE_NAME, FileMode.Open, FileAccess.ReadWrite))
{
var chunk = new byte[numOfBytesInChunk];
var offsetOfChunkInFile = stream.Position; // It sounds like you've already calculated this.
stream.Read(chunk, 0, numOfBytesInChunk);
DoWorkOnChunk(ref chunk);
stream.Seek(offsetOfChunkInFile, SeekOrigin.Begin);
stream.Write(chunk, 0, numOfBytesInChunk);
}
private void DoWorkOnChunk(ref byte[] chunk)
{
//TODO: Any mutation done here to the data in 'chunk' will be written out to the stream.
}

Reading a MemoryStream which contains multiple files

If I have a single MemoryStream of which I know I sent multiple files (example 5 files) to this MemoryStream. Is it possible to read from this MemoryStream and be able to break apart file by file?
My gut is telling me no since when we Read, we are reading byte by byte... Any help and a possible snippet would be great. I haven't been able to find anything on google or here :(
You can't directly, not if you don't delimit the files in some way or know the exact size of each file as it was put into the buffer.
You can use a compressed file such as a zip file to transfer multiple files instead.
A stream is just a line of bytes. If you put the files next to each other in the stream, you need to know how to separate them. That means you must know the length of the files, or you should have used some separator. Some (most) file types have a kind of header, but looking for this in an entire stream may not be waterproof either, since the header of a file could just as well be data in another file.
So, if you need to write files to such a stream, it is wise to add some extra information. For instance, start with a version number, then, write the size of the first file, write the file itself and then write the size of the next file, etc....
By starting with a version number, you can make alterations to this format. In the future you may decide you need to store the file name as well. In that case, you can increase version number, make up a new format, and still be able to read streams that you created earlier.
This is of course especially useful if you store these streams too.
Since you're sending them, you'll have to send them into the stream in such a way that you'll know how to pull them out. The most common way of doing this is to use a length specification. For example, to write the files to the stream:
write an integer to the stream to indicate the number of files
Then for each file,
write an integer (or a long if the files are large) to indicate the number of bytes in the file
write the file
To read the files back,
read an integer (n) to determine the number of files in the stream
Then, iterating n times,
read an integer (or long if that's what you chose) to determine the number of bytes in the file
read the file
You could use an IEnumerable<Stream> instead.
You need to implement this yourself, what you would want to do is write in some sort of 'delimited' into the stream. As you're reading, look for that delimited, and you'll know when you have hit a new file.
Here's a quick and dirty example:
byte[] delimiter = System.Encoding.Default.GetBytes("++MyDelimited++");
ms.Write(myFirstFile);
ms.Write(delimiter);
ms.Write(mySecondFile);
....
int len;
do {
len = ms.ReadByte(buffer, lastOffest, delimiter.Length);
if(buffer == delimiter)
{
// Close and open a new file stream
}
// Write buffer to output stream
} while(len > 0);

When should I slurp a file, and when should I read it by-line?

Imagine that I have a C# application that edits text files. The technique employed for each file can be either:
1) Read the file at once in to a string, make the changes, and write the string over the existing file:
string fileContents = File.ReadAllText(fileName);
// make changes to fileContents here...
using (StreamWriter writer = new StreamWriter(fileName))
{
writer.Write(fileContents);
}
2) Read the file by line, writing the changes to a temp file, then deleting the source and renaming the temp file:
using (StreamReader reader = new StreamReader(fileName))
{
string line;
using (StreamWriter writer = new StreamWriter(fileName + ".tmp"))
{
while (!reader.EndOfStream)
{
line = reader.ReadLine();
// make changes to line here
writer.WriteLine(line);
}
}
}
File.Delete(fileName);
File.Move(fileName + ".tmp", fileName);
What are the performance considerations with these options?
It seems to me that either reading by line or reading the entire file at once, the same quantity of data will be read, and disk times will dominate the memory alloc times. That said, once a file is in memory, the OS is free to page it back out, and when it does so the benefit of that large read has been lost. On the other hand, when working wit a temporary file, once the handles are closed I need to delete the old file and rename the temp file, which incurs a cost.
Then there are questions around caching, and prefetching, and disk buffer sizes...
I am assuming that in some cases, slurping the file is better, and in others, operating by line is better. My question is, what are the conditions for these two cases?
in some cases, slurping the file is better, and in others, operating by line is better.
Very nearly; except that reading line-by-line is actually a much more specific case. The actual choices we want to distinguish between are ReadAll and using a buffer. ReadLine makes assumptions - the biggest one being that the file actually has lines, and they are a reasonable length! If we can't make this assumption about the file, we want to choose a specific buffer size and read into that, regardless of whether we've reached the end of a line or not.
So deciding between reading it all at once and using a buffer - always go with the easiest to implement, and most naive approach until you run into a specific situation that does not work for you - and having a concrete case, you can make an educated decision based on the information you actually have, rather than speculating about hypothetical situations.
Simplest - read it all at once.
Is performance becoming a problem? Does this application run against uncontrolled files, so their size is not predictable? Just a few examples where you want to chunk it.

What is the BEST way to replace text in a File using C# / .NET?

I have a text file that is being written to as part of a very large data extract. The first line of the text file is the number of "accounts" extracted.
Because of the nature of this extract, that number is not known until the very end of the process, but the file can be large (a few hundred megs).
What is the BEST way in C# / .NET to open a file (in this case a simple text file), and replace the data that is in the first "line" of text?
IMPORTANT NOTE: - I do not need to replace a "fixed amount of bytes" - that would be easy. The problem here is that the data that needs to be inserted at the top of the file is variable.
IMPORTANT NOTE 2: - A few people have asked about / mentioned simply keeping the data in memory and then replacing it... however that's completely out of the question. The reason why this process is being updated is because of the fact that sometimes it crashes when loading a few gigs into memory.
If you can you should insert a placeholder which you overwrite at the end with the actual number and spaces.
If that is not an option write your data to a cache file first. When you know the actual number create the output file and append the data from the cache.
BEST is very subjective. For any smallish file, you can easily open the entire file in memory and replace what you want using a string replace and then re-write the file.
Even for largish files, it would not be that hard to load into memory. In the days of multi-gigs of memory, I would consider hundreds of megabytes to still be easily done in memory.
Have you tested this naive approach? Have you seen a real issue with it?
If this is a really large file (gigabytes in size), I would consider writing all of the data first to a temp file and then write the correct file with the header line going in first and then appending the rest of the data. Since it is only text, I would probably just shell out to DOS:
TYPE temp.txt >> outfile.txt
I do not need to replace a "fixed
amount of bytes"
Are you sure?
If you write a big number to the first line of the file (UInt32.MaxValue or UInt64.MaxValue), then when you find the correct actual number, you can replace that number of bytes with the correct number, but left padded with zeros, so it's still a valid integer.
e.g.
Replace 999999 - your "large number placeholder"
With 000100 - the actual number of accounts
Seems to me if I understand the question correctly?
What is the BEST way in C# / .NET to open a file (in this case a simple text file), and replace the data that is in the first "line" of text?
How about placing at the top of the file a token {UserCount} when it is first created.
Then use TextReader to read the file line by line. If it is the first line look for {UserCount} and replace with your value. Write out each line you read in using TextWriter
Example:
int lineNumber = 1;
int userCount = 1234;
string line = null;
using(TextReader tr = File.OpenText("OriginalFile"))
using(TextWriter tw = File.CreateText("ResultFile"))
{
while((line = tr.ReadLine()) != null)
{
if(lineNumber == 1)
{
line = line.Replace("{UserCount}", userCount.ToString());
}
tw.WriteLine(line);
lineNumber++;
}
}
If the extracted file is only a few hundred megabytes, then you can easily keep all of the text in-memory until the extraction is complete. Then, you can write your output file as the last operation, starting with the record count.
Ok, earlier I suggested an approach that would be a better if dealing with existing files.
However in your situation you want to create the file and during the create process go back to the top and write out the user count. This will do just that.
Here is one way to do it that prevents you having to write the temporary file.
private void WriteUsers()
{
string userCountString = null;
ASCIIEncoding enc = new ASCIIEncoding();
byte[] userCountBytes = null;
int userCounter = 0;
using(StreamWriter sw = File.CreateText("myfile.txt"))
{
// Write a blank line and return
// Note this line will later contain our user count.
sw.WriteLine();
// Write out the records and keep track of the count
for(int i = 1; i < 100; i++)
{
sw.WriteLine("User" + i);
userCounter++;
}
// Get the base stream and set the position to 0
sw.BaseStream.Position = 0;
userCountString = "User Count: " + userCounter;
userCountBytes = enc.GetBytes(userCountString);
sw.BaseStream.Write(userCountBytes, 0, userCountBytes.Length);
}
}

Categories