Find and replace data in file (without loading the entire thing)? - c#

I want to replace some data in a file, but I do not know exactly where in this 200MB file it occurs. Is it possible to find these values (and replace them with something else) without loading a 200MB+ file into memory?

Searching the file is not a problem. What you need is a FileStream, which is available via the File.Open method. You can read through the file up to the bytes you need to replace.
The problem arises when you need to insert something. The FileStream allows you to overwrite some or all of the file contents from a particular byte onward and to append new content to the end, but it does not allow you to insert data in the middle of the file. To overcome this you are going to need a temporary file. If you agree to that, you could do the following:
Open the FileStream on the original file.
Create a temporary file that will hold the draft version.
Search through the original file and copy all "good" data into temporary file up to the point where modifications are to be made.
Insert modified and new data into the temporary file.
Finish up the temporary file with the remaining "good" content from the original file.
Replace the original file with the temporary one.
Delete the temporary file.
You could use the Path.GetTempFileName method as a convenient way of creating the temporary file.
P.S. If you are modifying an exe then you are probably replacing text constants, and you need neither to insert new bytes nor to remove any. In that case you do not need to bother with the temporary file, and the FileStream is all you need.
P.P.S. When working with the FileStream, you decide on the size of the buffer you read from the file and write back. Keep in mind that this size is a tradeoff between memory consumption, I/O performance and the complexity of your code. Choose wisely. I would go byte by byte at first and, once that works, try to optimize by increasing the buffer to, say, 64k. You can count on the FileStream to buffer data; it does not perform disk I/O each time you request another byte from it. If you dive into buffering yourself then try not to fragment the Large Object Heap. The threshold for .NET 4.5 is 85000 bytes.
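The temporary-file steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the names StreamingReplace, Replace and ReplaceInFile are made up for illustration, the search uses a Knuth-Morris-Pratt failure table (a technique chosen here, not one named in the answer) so that matches spanning read boundaries are still found, and the original is replaced with File.Copy because the temporary directory may live on a different volume.

```csharp
using System;
using System.IO;

static class StreamingReplace
{
    // Copies input to output, replacing every non-overlapping occurrence
    // of pattern with replacement. Only a few bytes are held in memory.
    public static void Replace(Stream input, Stream output, byte[] pattern, byte[] replacement)
    {
        if (pattern.Length == 0)
            throw new ArgumentException("Pattern must not be empty.", nameof(pattern));

        // KMP failure table: fail[i] is the length of the longest proper
        // prefix of pattern[0..i] that is also a suffix of it.
        int[] fail = new int[pattern.Length];
        for (int i = 1, k = 0; i < pattern.Length; i++)
        {
            while (k > 0 && pattern[i] != pattern[k]) k = fail[k - 1];
            if (pattern[i] == pattern[k]) k++;
            fail[i] = k;
        }

        int matched = 0; // pattern bytes matched so far and not yet written out
        int b;
        while ((b = input.ReadByte()) != -1)
        {
            while (matched > 0 && (byte)b != pattern[matched])
            {
                // Flush the withheld bytes that can no longer start a match.
                int keep = fail[matched - 1];
                output.Write(pattern, 0, matched - keep);
                matched = keep;
            }

            if ((byte)b == pattern[matched])
            {
                matched++;
                if (matched == pattern.Length)
                {
                    output.Write(replacement, 0, replacement.Length);
                    matched = 0;
                }
            }
            else
            {
                output.WriteByte((byte)b);
            }
        }

        // A partial match at the end of the input is ordinary data.
        output.Write(pattern, 0, matched);
    }

    // File-level wrapper following the steps in the answer.
    public static void ReplaceInFile(string path, byte[] pattern, byte[] replacement)
    {
        string tempPath = Path.GetTempFileName();
        try
        {
            using (FileStream input = File.OpenRead(path))
            using (FileStream output = File.Create(tempPath))
            {
                Replace(input, output, pattern, replacement);
            }
            File.Copy(tempPath, path, overwrite: true);
        }
        finally
        {
            File.Delete(tempPath);
        }
    }
}
```

Reading byte by byte keeps the code simple and relies on the FileStream's own buffering, as suggested above; a larger manual buffer is an optimization to try only once this works.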

Just a thought: how about reading your file line by line, or maybe in chunks of bytes, and checking each chunk for the data that needs to be replaced? While reading, keep track of the file position you have read up to, so that when you find a match you can go back to that location and overwrite exactly the bytes you have targeted.
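A rough sketch of that chunked approach is below. It only covers the case where the replacement has exactly the same length as the data it replaces, since overwriting in place cannot grow or shrink the file. InPlacePatcher, Overwrite and the 64 KB default chunk size are illustrative assumptions; the sketch carries the unscanned tail of each chunk into the next one so that a match split across two chunks is not missed.

```csharp
using System;
using System.IO;

static class InPlacePatcher
{
    // Overwrites every occurrence of pattern with a same-length replacement,
    // scanning from the start of the stream. Returns the number of matches.
    public static int Overwrite(Stream file, byte[] pattern, byte[] replacement, int chunkSize = 64 * 1024)
    {
        if (pattern.Length == 0 || pattern.Length != replacement.Length)
            throw new ArgumentException("Pattern and replacement must be non-empty and of equal length.");

        byte[] buffer = new byte[chunkSize + pattern.Length - 1];
        int carried = 0;      // unscanned bytes kept from the previous chunk
        long bufferStart = 0; // file offset that buffer[0] corresponds to
        long readPos = 0;     // file offset where the next read begins
        int count = 0;

        while (true)
        {
            file.Position = readPos; // writes below move the position, so reset it
            int read = file.Read(buffer, carried, chunkSize);
            if (read == 0) break;
            readPos += read;
            int valid = carried + read;

            int i = 0;
            while (i + pattern.Length <= valid)
            {
                if (MatchesAt(buffer, i, pattern))
                {
                    // Go back to the match location and overwrite those exact bytes.
                    file.Position = bufferStart + i;
                    file.Write(replacement, 0, replacement.Length);
                    count++;
                    i += pattern.Length;
                }
                else
                {
                    i++;
                }
            }

            // Keep the tail that is too short to have been checked yet.
            carried = valid - i;
            Array.Copy(buffer, i, buffer, 0, carried);
            bufferStart += i;
        }

        return count;
    }

    private static bool MatchesAt(byte[] buffer, int index, byte[] pattern)
    {
        for (int j = 0; j < pattern.Length; j++)
        {
            if (buffer[index + j] != pattern[j]) return false;
        }
        return true;
    }
}
```

With a real file, the stream would come from something like File.Open(path, FileMode.Open, FileAccess.ReadWrite).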

Related

Editing specific line of text file in asp.net?

I have to change a specific line of a text file in asp.net.
Can I change/replace the text in one particular line only?
I have used the replace function in text file but it is replacing text in entire file.
I want to replace only one line specified by me.
Waiting for the reply..
Thanks in advance..
File systems don't generally allow you to edit within a file other than directly overwriting byte-by-byte. If your text file uses the same number of bytes for every line, then you can very efficiently replace a line of text - but that's a relatively rare case these days.
It's more likely that you'll need to take one of these options:
Load the whole file into memory using File.ReadAllLines, change the relevant line, and then write it out again using File.WriteAllLines. This is inefficient in terms of memory, but really simple to code. If your file is small, it's a good option.
Open the input file and a new output file. Read a line of text at a time from the input, and either copy it to the output or write a different line instead. Then close both files, delete the input file and rename the output file. This only requires a single line of text in memory at a time, but it's considerably more fiddly.
The second option has another benefit - you can shuffle the files around (using lots of rename steps) so that at no point do you ever have the possibility of losing the input file unless the output file is known to be complete and in the right place. That's even more complicated though.
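As a minimal sketch of the second option: the names LineEditor, ReplaceLine and ReplaceLineInFile, the 1-based line numbering and the ".tmp" suffix for the output file are assumptions made for illustration. Note that this simple version rewrites every line ending with the writer's default newline and does not do the extra rename shuffling described above.

```csharp
using System.IO;

static class LineEditor
{
    // Copies all lines from reader to writer, substituting newText
    // for the line at the given 1-based line number.
    public static void ReplaceLine(TextReader reader, TextWriter writer, int lineNumber, string newText)
    {
        string line;
        int current = 1;
        while ((line = reader.ReadLine()) != null)
        {
            writer.WriteLine(current == lineNumber ? newText : line);
            current++;
        }
    }

    // Streams the file through a sibling output file, then swaps it in.
    public static void ReplaceLineInFile(string path, int lineNumber, string newText)
    {
        string tempPath = path + ".tmp"; // hypothetical name for the output file
        using (var reader = new StreamReader(path))
        using (var writer = new StreamWriter(tempPath))
        {
            ReplaceLine(reader, writer, lineNumber, newText);
        }
        File.Delete(path);
        File.Move(tempPath, path);
    }
}
```

For a small file, the first option is just File.ReadAllLines, an assignment to one array element, and File.WriteAllLines.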

Reading Data from a File as it grows

I have a binary data file that is written to from a live data stream, so it keeps growing as the stream comes in. Meanwhile, I need to open it in read-only mode to display the data in my application (a time series chart). Opening the whole file takes a few minutes as it is pretty large (a few hundred MB).
What I would like to do is, rather than re-opening/reading the whole file every x seconds, read only the last data that was added to the file and append it to the data that was already read.
I would suggest using FileSystemWatcher to be notified of changes to the file. From there, cache information such as the size of the file between events and add some logic to only respond to full lines, etc. You can use the Seek() method of the FileStream class to jump to a particular point in the file and read only from there. I hope it helps.
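The Seek() part of this could look roughly like the sketch below, which the handler of a FileSystemWatcher Changed event (or a timer) could call. GrowingFileReader, ReadAppended and the maxBytes cap are illustrative names and choices, not an established API; the file is opened with FileShare.ReadWrite on the assumption that the writer keeps its own handle open.

```csharp
using System;
using System.IO;

static class GrowingFileReader
{
    // Reads the bytes appended since 'offset' (at most maxBytes per call)
    // and advances 'offset' past what was read.
    public static byte[] ReadAppended(string path, ref long offset, int maxBytes = 1 << 20)
    {
        // FileShare.ReadWrite lets the writing process keep the file open.
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            if (fs.Length <= offset) return new byte[0]; // nothing new yet

            fs.Seek(offset, SeekOrigin.Begin);
            int toRead = (int)Math.Min(fs.Length - offset, maxBytes);
            byte[] buffer = new byte[toRead];

            int total = 0;
            while (total < toRead)
            {
                int n = fs.Read(buffer, total, toRead - total);
                if (n == 0) break;
                total += n;
            }

            if (total < toRead) Array.Resize(ref buffer, total);
            offset += total;
            return buffer;
        }
    }
}
```

The caller keeps the offset between calls and appends each returned block to the data it has already loaded; for fixed-size binary records it would also hold back any incomplete record at the end.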
If you control the writing of this file, I would split it in several files of a predefined size.
When the writer determines that the current file is larger than, say, 50MB, close it and immediately create a new file to write data to. The process writing this data should always know the current file to write received data to.
The reader thread/process would read all these files in order, jumping to the next file when the current file was read completely.
You can probably use a FileSystemWatcher to monitor for changes in the file, like the example given here: Reading changes in a file in real-time using .NET.
But I'd suggest that you evaluate another solution, including a queue, like RabbitMQ, or Redis - any queue that has Subscriber-Publisher model. Then you'll just push the live data into the queue, and will have 2 different listeners(subscribers) - one to save in the file, and the other to process the last-appended data. This way you can achieve more flexibility with distributing load of the application.

Open file from byte array

I am storing attachments in my applications.
These get stored in SQL as varbinary types.
I then read them into a byte[] object.
I now need to open these files, but I don't want to first write the files to disk and then open them using Process.Start().
I would like to open them using in-memory streams. Is there a way to do this in .NET? Please note these files can be of any type.
You can write all bytes to file without using Streams:
System.IO.File.WriteAllBytes(path, bytes);
And then just use
Process.Start(path);
Trying to open file from memory isn't worth the result. Really, you don't want to do it.
MemoryStream has a constructor that takes a Byte array.
So:
var bytes = GetBytesFromDatabase(); // assuming you can do that yourself
var stream = new MemoryStream(bytes);
// use the stream just like a FileStream
That should pretty much do the trick.
Edit: Aw, crap, I totally missed the Process.Start part. I'm rewriting...
Edit 2:
You cannot do what you want to do. You must execute a process from a file. You'll have to write to disk; alternatively, the answer to this question has a very complex suggestion that might work, but would probably not be worth the effort.
MemoryMappedFile?
http://msdn.microsoft.com/en-us/library/system.io.memorymappedfiles.memorymappedfile.aspx
My only issue with this was that I will have to make sure the user has write access to the path where I will place the file...
You should be able to guarantee that the return of Path.GetTempFileName is something to which your user has access.
...and also am not sure how I will detect that the user has closed the file so I can delete the file from disk.
If you start the process with Process.Start(...), shouldn't you be able to monitor for when the process terminates?
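A hedged sketch of that write, open and clean-up sequence follows. AttachmentOpener, WriteToTempFile and OpenAndCleanUp are hypothetical names. The sketch builds its own file name under Path.GetTempPath() instead of using Path.GetTempFileName, on the assumption that the attachment's extension has to be preserved for the shell to pick the right application. Waiting on the returned process is only reliable when that process is the one that actually keeps the file open.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class AttachmentOpener
{
    // Writes the bytes to a uniquely named file in the user's temp folder.
    // 'extension' includes the dot, e.g. ".pdf".
    public static string WriteToTempFile(byte[] bytes, string extension)
    {
        string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + extension);
        File.WriteAllBytes(path, bytes);
        return path;
    }

    // Opens the attachment with its associated application and tries to
    // delete the temp file once the started process has exited.
    public static void OpenAndCleanUp(byte[] bytes, string extension)
    {
        string path = WriteToTempFile(bytes, extension);
        var startInfo = new ProcessStartInfo(path) { UseShellExecute = true };

        using (Process process = Process.Start(startInfo))
        {
            // Process.Start may return null if no new process was started,
            // for example when an already running application takes over.
            if (process == null) return;

            process.WaitForExit();
            try
            {
                File.Delete(path);
            }
            catch (IOException)
            {
                // Another process still has the file open; leave it in place.
            }
        }
    }
}
```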
If you absolutely don't want to write to disk yourself, you can implement a local HTTP server and serve attachments over HTTP (like http://localhost:3456/myrecord123/attachment1234.pdf).
I'm also not sure you get enough benefit from such non-trivial work. You'll open files from the local security zone, which is slightly better than opening from disk... and there is no need to write to disk yourself. And you'll likely get a somewhat reasonable warning if the attachment is an .exe file.
On tracking "process done with the attachment" you are more or less out of luck: only in some cases is the process that started opening the file the one that is actually using it. For example, Office applications are usually single-instance applications, and as a result the document will be opened in the first instance of the application, not the one you've started.

Need help manipulating WAV (RIFF) Files at a byte level

I'm writing an application in C# that will record audio files (*.wav) and automatically tag and name them. Wave files are RIFF files (like AVI) which can contain metadata chunks in addition to the waveform data chunks. So now I'm trying to figure out how to read and write the RIFF metadata to and from recorded wave files.
I'm using NAudio for recording the files, and asked on their forums as well on SO for way to read and write RIFF tags. While I received a number of good answers, none of the solutions allowed for reading and writing RIFF chunks as easily as I would like.
But more importantly I have very little experience dealing with files at a byte level, and think this could be a good opportunity to learn. So now I want to try writing my own class(es) that can read in a RIFF file and allow meta data to be read, and written from the file.
I've used streams in C#, but always with the entire stream at once. So now I'm a little lost, since I have to consider a file byte by byte. Specifically, how would I go about removing bytes from, or inserting bytes into, the middle of a file? I've tried reading a file through a FileStream into a byte array (byte[]) as shown in the code below.
System.IO.FileStream waveFileStream = System.IO.File.OpenRead(@"C:\sound.wav");
byte[] waveBytes = new byte[waveFileStream.Length];
waveFileStream.Read(waveBytes, 0, waveBytes.Length);
And I could see through the Visual Studio debugger that the first four bytes are the RIFF header of the file.
But arrays are a pain to deal with when performing actions that change their size, like inserting or removing values. So I was thinking I could then convert the byte[] into a List like this.
List<byte> list = waveBytes.ToList<byte>();
Which would make any manipulation of the file byte by byte a whole lot easier, but I'm worried I might be missing something like a class in the System.IO name-space that would make all this even easier. Am I on the right track, or is there a better way to do this? I should also mention that I'm not hugely concerned with performance, and would prefer not to deal with pointers or unsafe code blocks like this guy.
If it helps at all here is a good article on the RIFF/WAV file format.
I have not written C#, but I can point out some things that look wrong from my point of view:
1) Do not read whole WAV files into memory unless the files are your own and are known to be small.
2) There is no need to insert data in memory. You can, for example, do roughly the following: analyze the source file, store the offsets of the chunks, and read the metadata into memory; present the metadata for editing in a dialog; when saving, write the RIFF-WAV header and the fmt chunk, transfer the audio data from the source file (by reading and writing blocks), add the metadata, and then update the RIFF-WAV header.
3) Try to save the metadata at the tail of the file. Then altering only the tags will not require rewriting the whole file.
It seems some sources regarding working with RIFF files in C# are present here.
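The "analyze source file, store offsets of chunks" step might be sketched as below. RiffChunk, RiffScanner and ListChunks are made-up names for illustration. The sketch assumes a seekable stream and the usual RIFF layout: a 12-byte header ("RIFF", a little-endian size, "WAVE") followed by chunks that each have a 4-byte ID, a 4-byte little-endian size, and data padded to an even length. It records where each chunk's data lives without reading the audio into memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

sealed class RiffChunk
{
    public string Id;       // four-character chunk ID, e.g. "fmt ", "data", "LIST"
    public long DataOffset; // file offset of the first data byte
    public uint Size;       // data size in bytes, excluding any pad byte
}

static class RiffScanner
{
    // Walks the top-level chunks of a RIFF/WAVE stream and records
    // their positions; chunk contents are skipped, not loaded.
    public static List<RiffChunk> ListChunks(Stream stream)
    {
        var chunks = new List<RiffChunk>();
        using (var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true))
        {
            string riff = Encoding.ASCII.GetString(reader.ReadBytes(4));
            reader.ReadUInt32(); // overall RIFF size, not needed for scanning
            string form = Encoding.ASCII.GetString(reader.ReadBytes(4));
            if (riff != "RIFF" || form != "WAVE")
                throw new InvalidDataException("Not a RIFF/WAVE file.");

            while (stream.Position + 8 <= stream.Length)
            {
                string id = Encoding.ASCII.GetString(reader.ReadBytes(4));
                uint size = reader.ReadUInt32(); // BinaryReader reads little-endian
                chunks.Add(new RiffChunk { Id = id, DataOffset = stream.Position, Size = size });

                // Chunk data is padded to an even number of bytes.
                stream.Seek((long)size + (size & 1), SeekOrigin.Current);
            }
        }
        return chunks;
    }
}
```

With the offsets in hand, the metadata chunks can be read individually and the audio data can be copied block by block when saving.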

C#: Archiving a File into Parts of 100MB

In my application, the user selects a big file (>100 mb) on their drive. I wish for the program to then take the file that was selected and chop it up into archived parts that are 100 mb or less. How can this be done? What libraries and file format should I use? Could you give me some sample code? After the first 100mb archived part is created, I am going to upload it to a server, then I will upload the next 100mb part, and so on until the upload is finished. After that, from another computer, I will download all these archived parts, and then I wish to connect them into the original file. Is this possible with the 7zip libraries, for example? Thanks!
UPDATE: From the first answer, I think I'm going to use SevenZipSharp, and I believe I understand now how to split a file into 100mb archived parts, but I still have two questions:
Is it possible to create the first 100mb archived part and upload it before creating the next 100mb part?
How do you extract a file with SevenZipSharp from multiple split archives?
UPDATE #2: I was just playing around with the 7-zip GUI and creating multi-volume/split archives, and I found that selecting the first one and extracting from it will extract the whole file from all of the split archives. This leads me to believe that paths to the subsequent parts are included in the first one (or is it consecutive?). However, I'm not sure if this would work directly from the console, but I will try that now, and see if it solves question #2 from the first update.
Take a look at SevenZipSharp: you can use it to create your split 7z files, do whatever you want to upload them, then extract them on the server side.
To split the archive look at the SevenZipCompressor.CustomParameters member, passing in "v100m". (you can find more parameters in the 7-zip.chm file from 7zip)
You can split the data into 100MB "packets" first, and then pass each packet into the compressor in turn, pretending that they are just separate files.
However, this sort of compression is usually stream-based. As long as the library you are using will do its I/O via a Stream-derived class, it would be pretty simple to implement your own Stream that "packetises" the data any way you like on the fly - as data is passed into your Write() method you write it to a file. When you exceed 100MB in that file, you simply close that file and open a new one, and continue writing.
Either of these approaches would allow you to easily upload one "packet" while continuing to compress the next.
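A minimal sketch of such a "packetising" stream is shown below. SplittingStream and the openPart callback are hypothetical names; the callback is assumed to return a fresh writable stream for part 0, 1, 2 and so on (for example a new file per part). The stream is write-only and non-seekable, so whether a particular compression library will accept it as an output stream is something to verify for that library.

```csharp
using System;
using System.IO;

// Write-only stream that spreads its data over several part streams,
// each holding at most maxPartSize bytes.
sealed class SplittingStream : Stream
{
    private readonly Func<int, Stream> _openPart;
    private readonly long _maxPartSize;
    private Stream _current;
    private long _writtenInPart;
    private int _nextPartIndex;

    public SplittingStream(Func<int, Stream> openPart, long maxPartSize)
    {
        if (openPart == null) throw new ArgumentNullException(nameof(openPart));
        if (maxPartSize <= 0) throw new ArgumentOutOfRangeException(nameof(maxPartSize));
        _openPart = openPart;
        _maxPartSize = maxPartSize;
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            if (_current == null || _writtenInPart == _maxPartSize)
            {
                // Current part is full: close it and start the next one.
                if (_current != null) _current.Dispose();
                _current = _openPart(_nextPartIndex++);
                _writtenInPart = 0;
            }

            int n = (int)Math.Min(count, _maxPartSize - _writtenInPart);
            _current.Write(buffer, offset, n);
            _writtenInPart += n;
            offset += n;
            count -= n;
        }
    }

    public override void Flush()
    {
        if (_current != null) _current.Flush();
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing && _current != null)
        {
            _current.Dispose();
            _current = null;
        }
        base.Dispose(disposing);
    }

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}
```

A caller could pass something like i => File.Create("upload.part" + i) as openPart (the file name is only an example) and start uploading each part as soon as the next one is opened.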
edit
Just to be clear - Decompression is just the reverse sequence of the above, so once you've got the compression code working, decompression will be easy.
