Reading a file without causing access denial to other processes - C#

I've been thinking about writing a small specialized backup app, similar to newly introduced file history in Windows 8. The basic idea is to scan some directories every N hours for changed files and copy them to another volume. The problem is, some other apps may request access to these files while they are being backed up and get an access denial, potentially causing all kinds of nasty problems.
As far as I can tell, there are several approaches to that problem:
1) Using Volume Shadow Copy service
From my point of view, the future of this service is uncertain, and its overhead during heavy IO loads may cripple the system.
2) Using Sharing Mode when opening files
Something like this mostly works...
using (var stream = new FileStream("test.txt", FileMode.Open, FileAccess.Read,
    FileShare.Delete | FileShare.ReadWrite | FileShare.Read | FileShare.Write))
{
    // [Copy data]
}
... until some other process requests access to the same file without FileShare.Read, at which point an IOException will be thrown.
3) Using an opportunistic lock that may be "broken" by other (write?) requests.
This behaviour of FileIO.ReadTextAsync looks exactly like what I want, but it also looks very implementation-specific and may be changed in the future. Does anyone know how to explicitly oplock a file locally via C# or C++?
Maybe there is some simple C# method like File.TryReadBytes that provides such "polite" reading? I'm interested in solutions that will work on Windows 7 and above.
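For what it's worth, a "polite" read along those lines can be sketched as a small helper built on approach 2 (a hypothetical method, not part of the framework; it just opens with maximal sharing and reports failure instead of throwing):

static bool TryReadAllBytes(string path, out byte[] data)
{
    try
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
            FileShare.Delete | FileShare.ReadWrite))
        using (var ms = new MemoryStream())
        {
            stream.CopyTo(ms);   // copy while holding the least restrictive lock possible
            data = ms.ToArray();
            return true;
        }
    }
    catch (IOException)          // another process holds a conflicting lock
    {
        data = null;
        return false;
    }
}

Note that this still suffers from the sharing problem described above: a later open without FileShare.Read will fail, and the copy may observe a file that is being modified mid-read, which is exactly why the answer below favours VSS.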

My vote's on VSS. The main reason is that it doesn't interfere with other processes modifying your files, thus it provides consistency. A possible inconsistency pretty much defeats the purpose of a backup. The API is stable and I wouldn't worry about its future.

Related

How to properly handle temporary files?

Problem:
I have a web API which exposes a method, UploadFile, that uploads a file from a client to a specific directory on the server. The piece of code that handles the request and does the upload is the following:
var boundary = MultipartRequestHelper.GetBoundary(
    MediaTypeHeaderValue.Parse(Request.ContentType),
    _defaultFormOptions.MultipartBoundaryLengthLimit);
var reader = new MultipartReader(boundary, HttpContext.Request.Body);
try
{
    // Read the form data.
    var section = await reader.ReadNextSectionAsync();
    // This illustrates how to get the file names.
    while (section != null)
    {
        var hasContentDispositionHeader = ContentDispositionHeaderValue.TryParse(
            section.ContentDisposition, out ContentDispositionHeaderValue contentDisposition);
        if (hasContentDispositionHeader)
        {
            if (MultipartRequestHelper.HasFileContentDisposition(contentDisposition))
            {
                targetFilePath = Path.Combine(root, contentDisposition.FileName.ToString());
                using (var targetStream = System.IO.File.Create(targetFilePath))
                {
                    await section.Body.CopyToAsync(targetStream);
                    //_logger.LogInformation($"Copied the uploaded file '{targetFilePath}'");
                }
            }
        }
        section = await reader.ReadNextSectionAsync();   // advance to the next section
    }
}
catch (Exception)
{
    // (exception handling omitted in the original snippet)
    throw;
}
I have always called this method using the following statement:
bool res = await importClient.UploadFileAsync(filePath);
where UploadFileAsync (which is on the client) builds the request in this way:
var requestContent = new MultipartFormDataContent();
var array = File.ReadAllBytes(filePath);
var fileContent = new ByteArrayContent(array);
fileContent.Headers.ContentType = MediaTypeHeaderValue.Parse("application/octet-stream");
requestContent.Add(fileContent, "file", Path.GetFileName(filePath));
As you can see, this method expects a file name/path to work, which means the file must "exist" somewhere on the client machine. I've used this method without any problem until now, but I have a very specific case in which I need to upload something needed on the server that the user doesn't want to save on his client.
Possible solutions:
The first thing I thought of was to manually create a file on the client and delete it after the upload. However, I'm not very happy with this solution because I need to handle everything manually
I can use the System.IO.Path.GetTempFileName() method, which will create a file in the temporary directory, but I'm not quite sure how the deletion of those files is handled
I can use TempFileCollection, but it seems more or less a mix of the previous points. I could technically create this collection in a using statement to get rid of the files when the upload is done (see the sketch below)
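For illustration, that third option might look something like this (a sketch only; the content being written and the importClient call are placeholders borrowed from the snippets in this question):

using (var tempFiles = new System.CodeDom.Compiler.TempFileCollection())
{
    string path = tempFiles.AddExtension("xml");    // tracked by the collection
    File.WriteAllText(path, xmlContent);            // xmlContent: whatever needs to be uploaded
    bool res = await importClient.UploadFileAsync(path);
}   // tracked files not flagged "keep" are deleted here, when the collection is disposed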
I'm inexperienced in these topics, so I'm not sure which solution fits this scenario best.
My requirements are that I need to be 100% sure the file is deleted after the upload is done, and I would like the solution to be "async friendly", i.e. I need the whole process to keep going without problems.
EDIT: I see a little bit of confusion. My problem is not how to handle the files on the server. That part is not a problem. I need to handle "temporary" files on the client.
Once you write something to the disk, you can't be 100% sure that you will be able to delete it. Moreover, even if you delete the file, you can't be sure it can't be recovered.
So you have to ask why you need to delete the file. If it contains some secret, keep it in memory (see the sketch after this answer). If you can't fit the file into memory, write it to disk encrypted and keep only the key in memory.
If you relax 100% to 99%, I would go for creating a file with Path.GetTempFileName and deleting it in a finally block.
If 99% is not enough but 99.98% is, I would store the names of created temporary files in persistent storage and regularly check whether they have been deleted.
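If keeping the data in memory is acceptable, the client-side request from the question can be fed from a MemoryStream instead of a file. A sketch, assuming the payload construction and the final post call are adapted to the real client:

// Build the upload content entirely in memory, never touching the disk.
byte[] payload;
using (var ms = new MemoryStream())
{
    // ... serialize or generate whatever would have gone into the temp file ...
    payload = ms.ToArray();
}

var requestContent = new MultipartFormDataContent();
var fileContent = new ByteArrayContent(payload);
fileContent.Headers.ContentType = MediaTypeHeaderValue.Parse("application/octet-stream");
requestContent.Add(fileContent, "file", "upload.xml");   // the file name here is arbitrary
// ... post requestContent with the same HttpClient that UploadFileAsync already uses ...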
For completeness, I'm writing down the solution I used, based on the suggestions I received here. Also, naming the file the way I did guarantees that, statistically, you won't have two temporary files with the same name.
string tempFile = null;
try
{
    string file = System.IO.Path.GetTempPath() + Guid.NewGuid().ToString() + ".xml";
    tempFile = file;   // keep the full path so the delete in the finally block targets the right file
    using (FileStream fs = new FileStream(file, FileMode.Create, FileAccess.Write, FileShare.None))
    {
        XmlSerializer serializer = new XmlSerializer(typeof(FileTemplate));
        serializer.Serialize(fs, w.Template);
    }
}
catch (Exception ex)
{
    logger.Error(ex.Message);
    //...
}
finally
{
    //.... do stuff
    if (tempFile != null)
    {
        File.Delete(tempFile);
    }
}
You clearly shouldn't be using a file; in fact, you don't want your data to ever leave RAM. You need to use "secure" memory storage so that the data is "guaranteed" to be pinned to physical RAM, untouched by the garbage collector, and "never" paged out to swap. I use the quotes because all those terms are somewhat misleading: the implementation isn't secure in an absolute sense, it's just more secure than writing stuff to a disk file. Absolute security is impossible.
There are no common mechanisms that guarantee deletion of anything: the machine could "die" at any point between the writing of the data to the file and whatever deletion operation you'd use to wipe the file "clean". Then you have no guarantee that e.g. the SSD or the hard drive won't duplicate the data should a sector become bad and need to be reallocated. When you talk about files, you are dealing with several layers of underdocumented, complex (and often subtly buggy) software:
The firmware in the storage device controller.
The device driver for the storage device.
The virtual memory system.
The filesystem driver.
The virtual filesystem layer (present in most OSes).
The .net runtime (and possibly the C runtime, depending on implementation).
By using a file you're making a bet that all those layers will do exactly what you want them to do. That won't usually be the case unless you tightly control all of these layers (e.g. you deploy a purpose-made Linux distribution that you audit, and you use your own flash storage firmware, or use the Linux memory technology driver, which you'd audit too).
Instead, you can limit your exposure to just the VM system and the runtime. See e.g. this answer; it's easy to use:
using (var secret = new SecureArray<byte>(secretLength))
{
    DoSomethingSecret(secret.Buffer);
}
SecureArray makes it likely that secret.Buffer stays in RAM - but you should audit that code as well, since, after all, you need it to do what it does, with your reputation possibly at stake, or legal liability, etc.
A simple test that can give you some peace of mind would involve a small test application that writes a short pseudorandom sequence to secret.Buffer, and then sleeps. Let this run in the background for a few days as you use your computer, then forcibly power it down (on a desktop: turn the on-off switch on the power supply to "off" position). Then boot up from a linux live CD, and run a search for some chunk of the pseudorandom sequence on the raw disk device. The expected outcome is that no identifiable part of the sequence has leaked to disk (say nothing larger than 48-64 bits). Even then you can't be totally sure, but this will thwart the majority of attempts at recovering the information...
...until someone takes the customer's system, dumps liquid nitrogen on the RAM sticks, shuts down the power, and then transfers the RAM to a readout device.
...or until they get malware on the system where the software runs, and it helpfully streams out RAM contents over internet, because why not.
...or until someone injects their certificate into the trust root on just one client machine, and MITM-s all the data elsewhere on the client's network.
And so on. It's all a tradeoff: how sure you wish to be that the data doesn't leak? I suggest getting the exact requirements from the customer in writing, and they must agree that they understand that it's not possible to be completely sure.

Should I reuse a FileStream/BinaryWriter object?

Update: Looking in the event log at around the time this occurred, I see the message "The server was unable to allocate from the system nonpaged pool because the pool was empty." repeated continually throughout the log, until the machine was rebooted.
I am writing a class that writes debugging information to a file. Up until now the class has worked fine; however, I am now starting to stress-test my application (by running it 1000x faster than normal), and this has caused an unusual error to occur.
The problem I am seeing is that after a long period of time (4+ hours) my application crashes and seems to take out Windows with it; I can no longer open Windows Explorer or any other application. A system reboot seems to solve the issue; however, when I reboot, the file I am writing to is blank.
This makes me think that perhaps the issue is related to open file handles; perhaps Windows is reaching its limit of open file handles somehow?
So, here comes the related question; here is the main function that writes data to the file. As you can see, FileStream and BinaryWriter objects are created with each call to this function, wrapped in using statements to ensure they are properly Closed/Disposed.
/// <summary>
/// This is called after changing any
/// stats data, or on initial startup.
/// It saves the current stats to file.
/// </summary>
public void UpdateStatsData()
{
    lock (this.lockObject)
    {
        using (FileStream fileStream = new FileStream(Constants.StatsFile, FileMode.Create,
            FileAccess.Write, FileShare.None, 128, FileOptions.WriteThrough))
        {
            using (BinaryWriter binWriter = new BinaryWriter(fileStream))
            {
                binWriter.Write(this.serverStats.APM);
                binWriter.Write(this.serverStats.AverageJackpotWin);
                binWriter.Write(this.serverStats.AverageWinnings);
                binWriter.Write(this.serverStats.NumberOfGamesPlayed);
                binWriter.Write(this.serverStats.NumberOfJackpots);
                binWriter.Write(this.serverStats.RunningPercentage);
                binWriter.Write(this.serverStats.SiteID);
                binWriter.Write(this.serverStats.TotalJackpotsValue);
                binWriter.Write(this.serverStats.TotalStaked);
                binWriter.Write(this.serverStats.TotalWinnings);
            }
        }
    }
}
Is it possible that this function, when called very rapidly, could cause file handles to slowly build up and eventually exceed Windows' maximum?
A possible solution involves making the FileStream and BinaryWriter objects private member variables of the class, creating them in the constructor, and then overwriting the data with each call.
/// <summary>
/// This should be called after changing any
/// stats data, or on initial startup.
/// It saves the current stats to a serialized file.
/// </summary>
public void UpdateStatsData()
{
    lock (this.lockObject)
    {
        // Seek to the beginning of the file.
        this.binWriter.BaseStream.Seek(0, SeekOrigin.Begin);
        // Write the stats data over the existing data.
        this.binWriter.Write(this.serverStats.APM);
        this.binWriter.Write(this.serverStats.AverageJackpotWin);
        this.binWriter.Write(this.serverStats.AverageWinnings);
        this.binWriter.Write(this.serverStats.NumberOfGamesPlayed);
        this.binWriter.Write(this.serverStats.NumberOfJackpots);
        this.binWriter.Write(this.serverStats.RunningPercentage);
        this.binWriter.Write(this.serverStats.SiteID);
        this.binWriter.Write(this.serverStats.TotalJackpotsValue);
        this.binWriter.Write(this.serverStats.TotalStaked);
        this.binWriter.Write(this.serverStats.TotalWinnings);
    }
}
However, while it may be quicker and only mean using one FileStream, how do I ensure that the FileStream and BinaryWriter are Closed/Disposed properly on application shutdown?
The combination of parameters to the FileStream constructor looks suspect to me (assuming that all threads log to the same file, Constants.StatsFile):
FileMode.Create = always create the file, overwriting it if it exists. You are deleting all previously written stats with each entry into this method (you might try OpenOrCreate or Append).
FileOptions.WriteThrough = no caching; this forces the disk to spin and forces the thread to wait for the disk. Slow.
My guess: you are calling this method much more quickly than it can complete. Each call backs up on the lock statement, waiting for the previous call to delete the file, write to it, and completely flush it to disk. After a while you just run out of memory.
Assuming you didn't intend to delete the log file each time try this combination and see if things get better and at a minimum get rid of WriteThrough as that will make this method much faster:
using (FileStream fileStream = new FileStream(Constants.StatsFile, FileMode.Append,
FileAccess.Write, FileShare.None, 128, FileOptions.SequentialScan))
Running out of non-paged pool memory is a very serious mishap in Windows. Nothing good happens after that, drivers will fail to do their job, a reboot is required to recover from this.
Of course, it isn't normal for a user mode program (a managed one at that) to cause this to happen. Windows protects itself against this by giving a process a limited quota of the available system resources. There are many of them, a limit of 10,000 handles is an obvious one that strikes pretty often if a program leaks handles.
Memory from the non-paged pool is exclusively allocated by drivers. They need that kind of precious memory because they use memory at device interrupt time, a critical time where it isn't possible to map memory from the paging file. The pool is small; it needs to be, because it permanently occupies RAM. Its size depends on the amount of RAM your machine has, typically 256 MB max for a machine with 1 GB of RAM. You can see its current size in TaskMgr.exe, Performance tab. I'm giving my machine a decent workout right now; it is currently showing 61 MB.
Clearly your program is making a driver on your machine consume too much non-paged pool memory. Or it is leaking, possibly induced by the heavy workout you give it. Windows is powerless to prevent this; quotas are associated with processes, not drivers. You'll have to find the driver that misbehaves. It would be one that's associated with the file system or the disk. A very common one that causes trouble like this is, you probably guessed it by now, your virus scanner.
Most of this code looks fine to me -- you should have no problem re-creating the FileStreams like you are.
The only thing that jumps out at me is that your lockObject is not static. That's potentially a big problem -- multiple instances of the class will not block one another, which means you might be running into some strange condition caused by multiple threads running the same code at the same time. Who knows, under load you could be creating thousands of open file handles all at the same time.
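For instance (a sketch; the field name follows the question's code, and references would change from this.lockObject to just lockObject):

// Shared by all instances, so the lock truly serializes access to the file.
private static readonly object lockObject = new object();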
I see nothing wrong with the first version in terms of handle closure. I do with the second; specifically, the very issues you ask about. You could make your class disposable and then ideally dispose of it during a "controlled" shutdown, while depending on the file object's finaliser to take care of matters during an exceptional shutdown, but I'm not sure you're fixing the right issue.
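A minimal sketch of that, assuming the fields from the second approach (you would dispose the instance once, during your controlled shutdown):

public sealed class StatsWriter : IDisposable
{
    private readonly object lockObject = new object();
    private readonly FileStream fileStream;
    private readonly BinaryWriter binWriter;

    public StatsWriter(string path)
    {
        fileStream = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None);
        binWriter = new BinaryWriter(fileStream);
    }

    // UpdateStatsData() as in the second snippet above, locking on lockObject.

    public void Dispose()
    {
        lock (lockObject)
        {
            binWriter.Dispose();   // also flushes and closes the underlying FileStream
        }
    }
}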
What measurements of open file handles confirm your suspicion that this is the issue? It's reasonable to suspect open file handles when you are indeed opening lots of files, but it's foolish to "fix" that unless either A) examining the code shows it will obviously have this problem (not the case here) or B) you've shown that such file handles are indeed too high.
Does the app leave an exception in the event viewer on crashing?

How do I avoid excessive Network File I/O when appending to a large file with .NET?

I have a program that opens a large binary file, appends a small amount of data to it, and closes the file.
FileStream fs = File.Open( "\\\\s1\\temp\\test.tmp", FileMode.Append, FileAccess.Write, FileShare.None );
fs.Write( data, 0, data.Length );
fs.Close();
If test.tmp is 5MB before this program is run and the data array is 100 bytes, this program will cause over 5MB of data to be transmitted across the network. I would have expected that the data already in the file would not be transmitted across the network since I'm not reading it or writing it. Is there any way to avoid this behavior? This makes it agonizingly slow to append to very large files.
0xA3 provided the answer in a comment above. The poor performance was due to an on-access virus scan. Each time my program opened the file, the virus scanner read the entire contents of the file to check for viruses even though my program didn't read any of the existing content. Disabling the on-access virus scan eliminated the excessive network I/O and the poor performance.
Thanks to everyone for your suggestions.
I found this on MSDN (CreateFile is called internally):
When an application creates a file across a network, it is better to use GENERIC_READ | GENERIC_WRITE for dwDesiredAccess than to use GENERIC_WRITE alone. The resulting code is faster, because the redirector can use the cache manager and send fewer SMBs with more data. This combination also avoids an issue where writing to a file across a network can occasionally return ERROR_ACCESS_DENIED.
Using Reflector, FileAccess maps to dwDesiredAccess, so it would seem to suggest using FileAccess.ReadWrite instead of just FileAccess.Write.
I have no idea if this will help :)
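A sketch of that suggestion applied to the original snippet. Note that .NET rejects FileAccess.ReadWrite combined with FileMode.Append, so you open the file normally and seek to the end yourself:

// Request read+write access (GENERIC_READ | GENERIC_WRITE) and append manually.
using (var fs = new FileStream(@"\\s1\temp\test.tmp", FileMode.OpenOrCreate,
    FileAccess.ReadWrite, FileShare.None))
{
    fs.Seek(0, SeekOrigin.End);        // position at the current end of the file
    fs.Write(data, 0, data.Length);    // 'data' as in the original snippet
}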
You could cache your data into a local buffer and periodically (much less often than now) append it to the large file (see the sketch after this answer). This would save a bunch of network transfers, but... it would also increase the risk of losing that cache (and your data) in case your app crashes.
Logging (if that's what it is) of this type is often stored in a db. Using a decent RDBMS would allow you to post that 100 bytes of data very frequently with minimal overhead. The caveat there is the maintenance of an RDBMS.
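A rough sketch of that local-buffer idea (the class name and the 64 KB flush threshold are made up for illustration; as noted, anything still in the buffer is lost if the process dies before a flush):

using System;
using System.IO;

class BufferedAppender : IDisposable
{
    private const int FlushThreshold = 64 * 1024;      // flush once 64 KB has accumulated
    private readonly string _path;
    private readonly MemoryStream _buffer = new MemoryStream();

    public BufferedAppender(string path) { _path = path; }

    public void Append(byte[] data)
    {
        _buffer.Write(data, 0, data.Length);
        if (_buffer.Length >= FlushThreshold)
            Flush();
    }

    public void Flush()
    {
        if (_buffer.Length == 0) return;
        using (var fs = new FileStream(_path, FileMode.Append, FileAccess.Write, FileShare.None))
        {
            _buffer.WriteTo(fs);    // one batch of writes instead of many small ones
        }
        _buffer.SetLength(0);
    }

    public void Dispose()
    {
        Flush();
        _buffer.Dispose();
    }
}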
If you have system access or perhaps a friendly admin for the machine actually hosting the file you could make a small listener program that sits on the other end.
You make a call to it passing just the data to be written and it does the write locally, avoiding the extra network traffic.
The File class in .NET has quite a few static methods to handle this type of thing. I would suggest trying:
File.AppendAllText("FilePath", "What to append", Encoding.UTF8);
When you reflect this method it turns out that it's using:
using (StreamWriter writer = new StreamWriter(path, true, encoding))
{
    writer.Write(contents);
}
This StreamWriter method should allow you to simply append something to the end (at least this is the method I've seen used in every instance of logging that I've encountered so far).
Write the data to separate files, then join them (do it on the hosting machine if possible) only when necessary.
I did some googling and was looking more at how to read excessively large files quickly and found this link https://web.archive.org/web/20190906152821/http://www.4guysfromrolla.com/webtech/010401-1.shtml
The most interesting part there would be the part about byte reading:
Besides the more commonly used ReadAll and ReadLine methods, the TextStream object also supports a Read(n) method, where n is the number of bytes in the file/textstream in question. By instantiating an additional object (a file object), we can obtain the size of the file to be read, and then use the Read(n) method to race through our file. As it turns out, the "read bytes" method is extremely fast by comparison:
const ForReading = 1
const TristateFalse = 0
dim strSearchThis
dim objFS
dim objFile
dim objTS
set objFS = Server.CreateObject("Scripting.FileSystemObject")
set objFile = objFS.GetFile(Server.MapPath("myfile.txt"))
set objTS = objFile.OpenAsTextStream(ForReading, TristateFalse)
strSearchThis = objTS.Read(objFile.Size)
if instr(strSearchThis, "keyword") > 0 then
Response.Write "Found it!"
end if
You could then use this method to go to the end of the file and manually append to it, instead of loading the entire file in append mode with a FileStream.

Detect file 'COPY' operation in Windows

Say I want to be informed whenever a file copy is launched on my system and get the file name, the destination where it is being copied or moved and the time of copy.
Is this possible? How would you go about it? Should you hook CopyFile API function?
Is there any software that already accomplishes this?
Windows has the concept of I/O filters (file system filter drivers), which allow you to intercept all I/O operations and choose to perform additional actions as a result. They are primarily used for A/V-type scenarios but can be programmed for a wide variety of tasks. The Sysinternals Process Monitor, for example, uses an I/O filter to see file-level access.
You can view your current filters using the MS Filter Manager (fltmc.exe from a command prompt).
There is a kit to help you write filters; you can get the drivers and develop your own.
http://www.microsoft.com/whdc/driver/filterdrv/default.mspx is a starting place to get in depth info
As there is a .NET tag on this question, I would simply use System.IO.FileSystemWatcher that's in the .NET Framework. I'm guessing it is implemented using the I/O Filters that Andrew mentions in his answer, but I really do not know (nor care, exactly). Would that fit your needs?
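For reference, a minimal FileSystemWatcher sketch (the watched path is an assumption; note it reports create/change events, not "copy" operations as such, and it cannot tell you where a file was copied from):

using System;
using System.IO;

class Program
{
    static void Main()
    {
        using (var watcher = new FileSystemWatcher(@"C:\watched"))
        {
            watcher.IncludeSubdirectories = true;
            watcher.NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite;
            watcher.Created += (s, e) =>
                Console.WriteLine($"{DateTime.Now}: created {e.FullPath}");
            watcher.EnableRaisingEvents = true;

            Console.WriteLine("Watching. Press Enter to quit.");
            Console.ReadLine();
        }
    }
}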
As Andrew says a filter driver is the way to go.
There is no foolproof way of detecting a file copy as different programs copy files in different ways (some may use the CopyFile API, others may just read one file and write out the contents to another themselves). You could try calculating a hash in your filter driver of any file opened for reading, and then do the same after a program finishes writing to a file. If the hashes match you know you have a file copy. However this technique may be slow. If you just hook the CopyFile API you will miss file copies made without that API. Java programs (to name but one) have no access to the CopyFile API.
This is likely impossible as there is no guaranteed central method for performing a copy/move. You could hook into a core API (like CopyFile) but of course that means that you will still miss any copy/move that any application does without using this API.
Maybe you could watch the entire filesystem with I/O filters for open files and then draw conclusions yourself if two files with the same name and size are open at the same time. But that is not a 100% solution either.
As previously mentioned, a file copy operation can be implemented in various ways and may involve several disk and memory transfers, therefore it is not possible to simply get notified by the system when such an operation occurs.
Even for the user, there are multiple ways to duplicate content and entire files: copy commands, "save as", "send to", move, various tools. Under the hood, a copy operation is a succession of reads and writes, correlated by certain parameters; tracking that succession is the only way to guarantee successful auditing. Hooking CopyFile will not give you the copy operations of Total Commander, for example. Nor will it give you "Save as" operations, which are in fact: create a file -> move the file content -> close the original file -> open the new file. Then, things are different again when dealing with copies over the network, impersonated copy operations where the file handle's security context differs from the process's security context, and so on. I do not think there is a straightforward way to achieve all of the above.
However, there is a software that can notify you for most of the common copy operations (i.e. when they are performed through windows explorer, total commander, command prompt and other applications). It also gives you the source and destination file name, the timestamp and other relevant details. It can be found here: http://temasoft.com/products/filemonitor.
Note: I work for the company which develops this product.

How to avoid File Blocking

We are monitoring the progress of a customized app (whose source is not under our control) which writes to an XML manifest. At times, the application gets stuck because it is unable to write to the manifest file. We are trying to cover ourselves by explicitly closing the file handle with Close and by creating the file variables in using blocks, but somehow it keeps happening. (Our application is multithreaded, and at most three threads might be accessing the file.)
Another interesting thing is that their app updates this manifest on three different events (adding items, deleting items, completion of items), but we only have problems with one event (completion of items). My code is listed here:
using (var st = new FileStream(MenifestPath, FileMode.Open, FileAccess.Read))
{
    using (TextReader r = new StreamReader(st))
    {
        var xml = r.ReadToEnd();
        r.Close();
        st.Close();
        //................ Rest of our operations
    }
}
If you are only reading from the file, then you should be able to pass a flag to specify the sharing mode. I don't know how you specify this in .NET, but in WinAPI you'd pass FILE_SHARE_READ | FILE_SHARE_WRITE to CreateFile().
I suggest you check your file API documentation to see where it mentions sharing modes.
Two things:
You should do the rest of your operations outside the scopes of the using statements. This way, you won't risk using the closed stream and reader. Also, you needn't use the Close methods, because when you exit the scope of the using statement, Dispose is called, which is equivalent.
You should use the overload that has the FileShare enumeration. Locking is paranoid in nature, so the file may be locked automatically to protect you from yourself. :)
HTH.
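A sketch of the FileShare suggestion above, with the rest of the work moved outside the using blocks (MenifestPath as in the question; FileShare.ReadWrite lets the third-party app keep reading and writing while we read):

string xml;
using (var st = new FileStream(MenifestPath, FileMode.Open, FileAccess.Read,
    FileShare.ReadWrite))
using (var r = new StreamReader(st))
{
    xml = r.ReadToEnd();
}
// ... rest of our operations, using xml, outside the using blocks ...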
The problem is different because that person has full control over file access for all processes, while, as I mentioned, ONE PROCESS IS THIRD PARTY WITH NO SOURCE ACCESS. Our applications are working fine; however, their application seems to get stuck if it can't get hold of the file. So I am looking for a method of file access that does not disturb their operation.
This could happen if one thread was attempting to read from the file while another was writing. To avoid this type of situation, where you want multiple readers but only one writer at a time, make use of the ReaderWriterLock class or, from .NET 3.5 on, the ReaderWriterLockSlim class in the System.Threading namespace (see the sketch below).
Also, if you're using .NET 2.0+, you can simplify your code to just:
string xmlText = File.ReadAllText(ManifestFile);
See also: File.ReadAllText on MSDN.
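A sketch combining both suggestions from this answer (ReaderWriterLockSlim plus File.ReadAllText). Note this only coordinates the threads inside your own process; it cannot stop the third-party app from opening the file:

private static readonly ReaderWriterLockSlim manifestLock = new ReaderWriterLockSlim();

private string ReadManifest()
{
    manifestLock.EnterReadLock();
    try
    {
        return File.ReadAllText(ManifestFile);
    }
    finally
    {
        manifestLock.ExitReadLock();
    }
}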
