I have an application that reads a linked list [1] from a file when it starts and writes it back to the file when it ends. I chose truncate as the file mode when writing back. However, truncate sounds a little dangerous to me, as it clears the whole content first; if something goes wrong, I cannot get my old data back. Is there a better alternative?
[1]: I use a linked list because the order of items may change, which is why I later use truncate to rewrite the whole file.
The accepted answer (and the reputation) goes to Hans, as he was the first to point out File.Replace(), though it is not available in Silverlight for now.
Write to a new temporary file. When you are finished and satisfied with the result, delete the old file and rename/copy the new temporary file into the original file's location. This way, should anything go wrong, you do not lose data.
As pointed out in Hans Passant's answer, you should use File.Replace for maximum robustness when replacing the original file.
This is covered well by the .NET Framework. Use the File.Replace() method. It securely replaces the content of your original file with the content of another file, leaving the original intact if there's any problem with the file system. It is a better mousetrap than the other upvoted answers, which will fail when there's a pending delete on the original file.
There's an overload that lets you control whether the original file is preserved as a backup file. It is best if you let the function create the backup; that significantly increases the odds that the call will succeed when another process has a lock on your file, which is the most typical failure mode, since that process gets to keep its lock on the backup file. The method also works best when you create the intermediate file on the same drive as the original, so you'll want to avoid GetTempFileName(). A good way to generate a file name is Guid.NewGuid().ToString().
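For illustration, a minimal sketch of that pattern might look like this (the method name and the backup naming are my own, not from the original code):

using System;
using System.IO;

class ReplaceExample
{
    static void SaveAtomically(string originalPath, string newContent)
    {
        // Create the intermediate file on the same drive as the original.
        string directory = Path.GetDirectoryName(Path.GetFullPath(originalPath));
        string tempPath = Path.Combine(directory, Guid.NewGuid().ToString());
        string backupPath = originalPath + ".bak";

        File.WriteAllText(tempPath, newContent);           // write the new data first
        File.Replace(tempPath, originalPath, backupPath);  // swap it in, keeping a backup
    }
}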
The "best" alternative for robustness would be to do the following:
Create a new file for the data you're persisting to disk
Write the data out to the new file
Perform any necessary data verification
Delete the original file
Move the new file to the original file location
You can use System.IO.Path.GetTempFileName to provide you with a uniquely named temporary file to use for step 1.
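For example, a rough sketch of those five steps might look like this (the names are illustrative only):

using System.IO;

class TempFileSave
{
    static void Save(string targetPath, byte[] data)
    {
        string tempPath = Path.GetTempFileName();   // step 1: a uniquely named temporary file
        File.WriteAllBytes(tempPath, data);         // step 2: write the data out
        // step 3: perform any necessary verification on tempPath here
        File.Delete(targetPath);                    // step 4: delete the original
        File.Move(tempPath, targetPath);            // step 5: move the new file into place
    }
}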
You have thought of using truncate, so I assume your data is always written out in full; therefore...
In a try ... catch, rename your original file to something like 'originalname_day_month_year.bak'.
Write your file from scratch with the new data.
This way you don't have to worry about losing anything and, as a side effect, you get a backup copy of your previous data. If that backup is not needed, you can always delete the backup file.
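If it helps, a possible sketch of that idea in C# (the names and the date format are only illustrative):

using System;
using System.IO;

class BackupThenRewrite
{
    static void Save(string path, string newContent)
    {
        // e.g. "data.txt" -> "data.txt_31_12_2024.bak"
        string backup = path + "_" + DateTime.Now.ToString("dd_MM_yyyy") + ".bak";
        try
        {
            if (File.Exists(path))
                File.Move(path, backup);         // keep the previous data as a backup
        }
        catch (IOException)
        {
            // handle a failed rename (e.g. today's backup already exists)
            throw;
        }
        File.WriteAllText(path, newContent);     // write the file anew
    }
}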
Related
Here is the situation. I have over 1500 SQL (text) files that contain a "GRANT some privilege ON some table TO some user" statement after the CREATE TABLE statement. I need to remove the GRANT statements from the original files and put them in their own file. Sometimes the GRANTs are on a single line and sometimes they are split across multiple lines. For example:
GRANT SELECT ON XYZ.TABLE1 TO MYROLE1 ;
GRANT
SELECT ON XYZ.TABLE1 TO MYROLE2 ;
GRANT
DELETE,
INSERT,
SELECT,
UPDATE ON XYZ.TABLE1 TO MYROLE3;
I am reading through the file until I get to the GRANT and then building a string containing the text from the GRANT to the semicolon, which I then write out to another file. I have an app I wrote in Delphi (Pascal) and this part works great. What I would like to do is, after I have read and processed the line I want, delete that line from the original text file. I can't do this in Delphi. The only solution there is to read the file line by line and write it back out to another file, excluding the lines I don't want, while also writing the GRANTs to yet another file, then delete the original and rename the new one. Way too much processing and risk.
I looked at using StreamReader and StreamWriter in C#, but it appears to be a similar situation to Delphi: I can read or I can write, but I can't do both to the same file at the same time.
I would appreciate any suggestions or recommendations.
Thanks
If you think there's "way too much processing and risk" in generating a new temporary file without the lines you don't want and replacing the original, then consider the alternative you're hoping to achieve.
Line 1
Line 2
Delete this line
+-->Line 4
| Line 5
|
+- Read position marker after reading line to be deleted
If you immediately delete the line while reading, the later lines have to be moved back into the "empty space" left behind after the 3rd line is deleted. In order to ensure you next read "Line 4", you'd have to backtrack your read-position-marker. What's the correct amount to backtrack? A "line" of variable length, or the number of characters of the deleted line?
What you perceive to be the "risky" option is actually the safe option!
If you want to delete while processing you can use an abstraction that gives you that impression. But you lose the benefits of stream processing and don't really eliminate any of the risk you were worried about in the first place.
E.g. load your entire file into a list of strings, such as an array, vector, or TStringList (in Delphi). Iterate the list and delete the items you don't want. Finally, save the list back to the file.
This approach has the following disadvantages:
Potentially high memory overhead, because you load the entire file instead of a small buffer for the stream.
You're at risk of mid-process failure with no recovery, because your job is all-or-nothing.
You have to deal with the nuances of the particular container you choose to hold your list of strings.
In some cases (e.g. TStringList) you might still need to backtrack your position marker in a similar fashion to the earlier description.
For arrays you'd have to copy all later lines back one position every time you delete something, at a huge performance cost. (The same happens in TStringList, though it's hidden from you.)
Iterators for some containers are invalidated whenever you modify the list. This means you'd have to copy to a new list without the 'deleted lines' in any case. More memory overhead.
In conclusion, take the safe option. Use a separate read and write stream; write to a temporary file, and rename when done. It will save you headaches.
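For what it's worth, a hedged sketch of that approach in C#, with the GRANT detection deliberately simplistic and the file names made up, could look like this:

using System;
using System.IO;

class GrantSplitter
{
    static void Process(string inputPath)
    {
        string tempPath = inputPath + ".tmp";
        string grantsPath = inputPath + ".grants.sql";

        using (var reader = new StreamReader(inputPath))
        using (var cleaned = new StreamWriter(tempPath))
        using (var grants = new StreamWriter(grantsPath))
        {
            string line;
            bool inGrant = false;
            while ((line = reader.ReadLine()) != null)
            {
                if (!inGrant && line.TrimStart().StartsWith("GRANT", StringComparison.OrdinalIgnoreCase))
                    inGrant = true;

                if (inGrant)
                {
                    grants.WriteLine(line);          // GRANT text goes to its own file
                    if (line.Contains(";"))
                        inGrant = false;             // the statement ends at the semicolon
                }
                else
                {
                    cleaned.WriteLine(line);         // everything else goes to the temp file
                }
            }
        }

        // Swap the cleaned file in only after everything succeeded.
        File.Replace(tempPath, inputPath, inputPath + ".bak");
    }
}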
What would be the fastest and smartest way to remove the first line from a huge (think 2-3 GB) file?
I think you probably can't avoid rewriting the whole file chunk by chunk, but I might be wrong.
Could using memory-mapped files somehow help to solve this issue?
Is it possible to achieve this behavior by operating directly on the file system (NTFS, for example) - say, update the corresponding inode data and change the file's starting sector so that the first line is ignored? If yes, would this approach be really fragile, or are there other applications, besides the OS itself, that do something similar?
NTFS by default on most volumes (but importantly not all!) stores data in 4096-byte chunks. These are referenced by the $MFT record, which you cannot edit directly because the operating system disallows it (for reasons of sanity). As a result, there is no trick available at the file-system level to do something approaching what you want (in other words, you cannot directly reverse-truncate a file on NTFS, even in file-system-chunk-sized amounts).
Because of the way files are stored in a file system, the only answer is that you must rewrite the entire file. Or figure out a different way to store your data. A 2-3 GB file is massive and crazy, especially considering you referred to lines, meaning this data is at least in part text.
You should look into putting this data into a database perhaps? Or organizing it a bit more efficiently at the very least.
You can overwrite every character that you want to erase with '\x7f'. Then, when reading in the file, your reader ignores that character. This assumes you have a text file that doesn't ever use the DEL character, of course.
#include <istream>
#include <string>

// Reads one line and strips any runs of the DEL ('\x7f') placeholder character,
// which marks text that has been logically erased.
std::istream &
my_getline (std::istream &in, std::string &s,
            char del = '\x7f', char delim = '\n') {
    std::getline(in, s, delim);
    std::size_t beg = s.find(del);
    while (beg != s.npos) {
        // Erase the contiguous run of DEL characters starting at beg.
        std::size_t end = s.find_first_not_of(del, beg+1);
        s.erase(beg, end-beg);
        beg = s.find(del, beg+1);
    }
    return in;
}
As Henk points out, you could choose a different character to act as your DELETE. But, the advantage is that the technique works no matter which line you want to remove (it is not limited to the first line), and doesn't require futzing with the file system.
Using the modified reader, you can periodically "defragment" the file. Or, the defragmentation may occur naturally as the contents are streamed/merged into a different file or archived to a different machine.
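For the C# side of the question, a rough sketch of the "erase by overwriting" step might look like this (assuming, as noted above, that the file never legitimately contains the DEL byte; the class and method names are my own):

using System.IO;

class DelOverwrite
{
    // Overwrites the first 'count' bytes of the file in place with the DEL (0x7f)
    // placeholder; a reader like my_getline above then treats them as erased.
    public static void MarkLeadingBytesDeleted(string path, int count)
    {
        byte[] block = new byte[count];
        for (int i = 0; i < count; i++) block[i] = 0x7f;

        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            stream.Write(block, 0, block.Length);   // FileStream starts at offset 0
        }
    }
}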
Edit: You don't explicitly say it, but I am guessing this is for some kind of logging application, where the goal is to put an upper bound on the size of the log file. However, if that is the goal, it is much easier to just use a collection of smaller log files. Let's say you maintained roughly 10 MB log files with the total bounded to 4 GB; that would be about 400 files. When the 401st file is started, for each line written there you could mark successive lines in the first file with the DELETE character. When all of its lines have been marked for deletion, the file itself can be deleted, leaving you with about 400 files again. There is no hidden O(n^2) behavior so long as the first file is not closed while the lines are being deleted.
But easier still is to let your logging system keep the 1st and the 401st files as they are, and simply remove the 1st file when moving on to the 402nd file.
Even if you could remove a leading block it would at least be a sector (512 bytes), probably not a match to the size of your line.
Consider a wrapper (maybe even a helper file) to just start reading from a certain offset.
Idea (no magic dust, only hard work below):
Use a user-mode file system such as http://www.eldos.com/cbfs/ or http://dokan-dev.net/en/ to wrap around your real file system, and create a small book-keeping layer to track how much of the file has been 'eaten' at the front. At a certain point, when the file grows too big, rewrite it into another file and start over.
How about that?
EDIT: If you go with a virtual file system, then you can use smaller (256 MB) file fragments that you glue together into one 'virtual' file at the desired offset. That way you won't ever need to rewrite the file.
MORE: Reflecting on the idea of 'overwriting' the first few lines with 'nothing': don't do that. Instead, add one 64-bit integer to the FRONT of the file and use any method you like to skip that many bytes, for example a Stream derivation that wraps the original stream and offsets reads into it.
I guess that might be better if you choose to use wrappers on the 'client' side.
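As a rough illustration of that header idea (the class and the header layout are my own invention, not an existing API):

using System;
using System.IO;

class SkippingStreamFactory
{
    // Opens the file, reads an 8-byte little-endian "skip" header, and returns a
    // stream positioned just after the logically deleted prefix.
    public static FileStream OpenForReading(string path)
    {
        var stream = new FileStream(path, FileMode.Open, FileAccess.Read);
        var header = new byte[8];
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) throw new EndOfStreamException("Missing skip header.");
            read += n;
        }
        long skip = BitConverter.ToInt64(header, 0);
        stream.Seek(8 + skip, SeekOrigin.Begin);
        return stream;
    }
}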
Break the file in two, the first part being the smaller chunk.
Remove the first line from it and then join it back with the other part.
From the documentation of File.Move:
Note that if you attempt to replace a file by moving a file of the same name into that directory, you get an IOException. You cannot use the Move method to overwrite an existing file.
In short, you can't overwrite on Move, so in order to facilitate overwriting on Move I mimic the behavior by doing a File.Copy followed by a File.Delete. Something like:
if (File.Exists(dstFileName))
{
    // System.IO.File.Move cannot be used to overwrite existing files, so we're going
    // to simulate that behavior with a Copy (with overwrite) & Delete.
    File.Copy(procContext.FileName, dstFileName, true);
    File.Delete(procContext.FileName);
}
else
    File.Move(procContext.FileName, dstFileName);
My question is: Are there any situations that I need to guard against which could lead to the source file being deleted without it first being successfully copied?
My understanding from reading the documentation is that, since File.Copy doesn't return anything, it should throw an exception in any case where it doesn't succeed. Has anyone encountered any situations where this isn't true?
I suggest you first check whether the target file exists and, if it does, delete it. Then execute a normal move operation.
Since this sequence is not atomic, in case the destination exists you might want to rename it instead of deleting it, to avoid losing it in case the move fails.
The correct way to do it would be to call
File.Replace(source, destination, backup)  // the third argument names the backup file for the destination
That does the trick for me
It is difficult to simulate an atomic operation if the operating system doesn't give you good atomic operations. Move is atomic on some but not all filesystems, but not when you are moving disk to disk.
In the case of the same disk, Delete + Move is somewhat elegant (fast and safe), as it does not really shuffle the data in any way. You could further extend it to:
try
{
    File.Move(dest, tmp);      // move the existing destination aside
    File.Move(src, dest);      // move the source into place
    File.Delete(tmp);          // only now discard the old destination
}
catch
{
    try
    {
        File.Move(tmp, dest);  // try to roll back: restore the old destination
    }
    catch
    {
        // nothing more we can do; the old data is still in tmp
    }
    throw;
}
(This makes it less likely that you will lose the destination file when you for example do not have the rights necessary to finish the move.)
In a scenario where you do not know whether it is the same disk, your solution is safe enough and simple enough. However, it copies the data even within the same disk, giving you a wider window of risk in the event of a power failure.
This is safe. File.Copy will either succeed entirely or throw. Of course, the delete could fail leaving the source file behind as garbage.
If your computer crashes, though, there is no guarantee that the copy operation has made the data durable on disk yet. You might lose data in that case.
During normal operations this is safe.
Check if file "Target" Exsists. If no, copy your file.
If yes: Move "Target" to temp dir, where you can be sure, that the move will be successful. You can generate a subdir in Temp with the name auf an UUID. Then copy your file.
I need to monitor a folder and its subdirectories for any file manipulations (add/remove/rename). I've read about FileSystemWatcher, but I'd like to monitor changes between each time the program is run, or when the user presses the "check for changes" button (FSW seems more oriented toward runtime detection). My first thought was to iterate through all the (sub)directories and hash each file, then concatenate all the hashes (which have been ordered) and hash that. When I want to check for changes, I repeat the process and check if the hashes are the same.
Is this an efficient way of doing it?
Also, once I've detected a change, how do I find out what file has been added, removed or renamed as quickly as possible?
As a side note, I don't mind using scripts to do this if they're faster as long as those scripts don't require end users to install anything and the scripts can notify my C# app of the changes.
We handle this by storing all found files in a database along with their last modification time.
On each pass through the files, we check the database for each file: if it doesn't exist in the DB, it is new; if it does exist but the timestamp is different, it has changed.
There is also an option to handle deleted files by marking all of the files in the database as ToBeDeleted prior to the pass and clearing this flag when the file is found. Then, at the end of the process, we can just delete all of the records that are still marked ToBeDeleted.
Obviously you need to make "snapshots" of the directory tree and compare them as required. What exactly goes into the snapshots would depend on your requirements. Keep in mind that:
You need to store filenames in order to detect "new" and "deleted" files
File sizes and last-modified times are a good and cheap indicator that a file has or has not changed, but do not provide a guarantee
Hashing the contents of files can be prohibitively expensive if the files can be large, but it's the only way to know they have changed with a near-perfect degree of accuracy (remember that hashes can collide as well, so if you want mathematical 100% certainty that's not going to be good enough either)
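To make the trade-off concrete, here is a hedged sketch of a name/size/last-write-time snapshot and comparison (a rename shows up as an add plus a remove in this simple form; hashing could be layered on top where stronger guarantees are needed):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class DirectorySnapshot
{
    // Snapshot keyed by full path, with size and last-write time as the change indicator.
    public static Dictionary<string, (long Size, DateTime Modified)> Take(string root)
    {
        return Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
            .ToDictionary(f => f, f =>
            {
                var info = new FileInfo(f);
                return (info.Length, info.LastWriteTimeUtc);
            });
    }

    public static void Compare(
        Dictionary<string, (long Size, DateTime Modified)> before,
        Dictionary<string, (long Size, DateTime Modified)> after)
    {
        foreach (var path in after.Keys.Except(before.Keys))
            Console.WriteLine($"Added:   {path}");
        foreach (var path in before.Keys.Except(after.Keys))
            Console.WriteLine($"Removed: {path}");
        foreach (var path in before.Keys.Intersect(after.Keys))
            if (!before[path].Equals(after[path]))
                Console.WriteLine($"Changed: {path}");
    }
}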
What is difference between
Copying a file and deleting it using File.Copy() and File.Delete()
Moving the file using File.Move()
In terms of the permissions required to do these operations, is there any difference? Any help much appreciated.
The File.Move method can be used to move a file from one path to another. This method works across disk volumes, and it does not throw an exception if the source and destination are the same.
You cannot use the Move method to overwrite an existing file. If you attempt to replace a file by moving a file of the same name into that directory, you get an IOException. To overcome this you can use the combination of the Copy and Delete methods.
Performance-wise, if the source and destination are on one and the same file system, moving a file is (in simplified terms) just adjusting some internal records of the file system itself (possibly adjusting some nodes in a red/black tree), without actually moving any data.
Imagine you have 180MiB to move, and you can write onto your disk at roughly 30MiB/s. Then with copy/delete, it takes approximately 6 seconds to finish. With a simple move [same file system], it goes so fast you might not even realise it.
(I once wrote some transactional file system helpers that would move or copy multiple files, all or none. In order to make the commit as fast as possible, I moved/copied everything into a temporary sub-folder first; the final commit would then move the existing data into another folder (to enable rollback) and move the new data into the target.)
I don't think there is any difference permission-wise, but I would personally prefer to use File.Move() since then you have both actions happening in the same "transaction". In other words if something on the move fails the whole operation fails. However, if you break it up in two steps (copy + delete) if copy worked and delete failed, you would have to reverse the "transaction" (delete the copy) manually.
Permission in a file transfer is checked at two points: the source and the destination. So if you don't have read permission on the source folder, or you don't have write permission on the destination, both approaches throw an UnauthorizedAccessException. In other words, permission checking is agnostic to the method in use.