Problem:
I have a web API which exposes a method UploadFile, which uploads a file from a client to a specific directory on the server. The piece of code that handles the request and does the upload is the following:
var boundary = MultipartRequestHelper.GetBoundary(
    MediaTypeHeaderValue.Parse(Request.ContentType),
    _defaultFormOptions.MultipartBoundaryLengthLimit);
var reader = new MultipartReader(boundary, HttpContext.Request.Body);
try
{
    // Read the form data.
    var section = await reader.ReadNextSectionAsync();
    // This illustrates how to get the file names.
    while (section != null)
    {
        var hasContentDispositionHeader = ContentDispositionHeaderValue.TryParse(
            section.ContentDisposition, out ContentDispositionHeaderValue contentDisposition);
        if (hasContentDispositionHeader)
        {
            if (MultipartRequestHelper.HasFileContentDisposition(contentDisposition))
            {
                targetFilePath = Path.Combine(root, contentDisposition.FileName.ToString());
                using (var targetStream = System.IO.File.Create(targetFilePath))
                {
                    await section.Body.CopyToAsync(targetStream);
                    //_logger.LogInformation($"Copied the uploaded file '{targetFilePath}'");
                }
            }
        }

        // Advance to the next multipart section, otherwise the loop never terminates.
        section = await reader.ReadNextSectionAsync();
    }
}
catch (Exception)
{
    // Error handling omitted in the original excerpt.
    throw;
}
I always called this method using the following statement:
bool res = await importClient.UploadFileAsync(filePath);
where UploadFileAsync (which is on the client) builds the request this way:
var requestContent = new MultipartFormDataContent();
var array = File.ReadAllBytes(filePath);
var fileContent = new ByteArrayContent(array);
fileContent.Headers.ContentType = MediaTypeHeaderValue.Parse("application/octet-stream");
requestContent.Add(fileContent, "file", Path.GetFileName(filePath));
As you can see, this method expects a file name/path to work, which means that the file must "exist" somewhere on the client machine. I've used this method without any problem until now. However, I have a very specific case in which I need to upload something needed on the server that the user doesn't want to save on his client.
Possible solutions:
The first thing I thought of was to manually create a file on the client and delete it after the upload. However, I'm not very happy with this solution because I need to handle everything manually.
I could use the System.IO.Path.GetTempFileName() method, which creates a file in the temporary directory, but I'm not quite sure how the deletion of those files is handled.
I could use a TempFileCollection, which seems more or less a mix of the previous two points. I can create this collection in a using statement to get rid of the files when the upload is done (see the sketch right after this list).
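For illustration, a minimal sketch of the TempFileCollection idea (my own example, not tested against your API): files registered in the collection with the keep flag left at false are deleted when the collection is disposed, even if the upload throws. The "xml" extension, the payload, and the call to importClient.UploadFileAsync (taken from the question) are just placeholders.

using System.CodeDom.Compiler;

// ... inside an async method ...
using (var tempFiles = new TempFileCollection())
{
    // AddExtension generates a unique file name in the temp directory and
    // registers it for deletion when the collection is disposed.
    string tempPath = tempFiles.AddExtension("xml");
    File.WriteAllText(tempPath, "<payload />");

    bool res = await importClient.UploadFileAsync(tempPath);
} // the temporary file is removed here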
I'm inexperienced with these topics, so I'm not sure which solution fits this scenario best.
My requirements are that I need to be 100% sure that the file is deleted after the upload is done, and I would like the solution to be "async friendly", i.e. the whole process has to keep going without blocking.
EDIT: I see a little bit of confusion. My problem is not how to handle the files on the server; that part is not a problem. I need to handle "temporary" files on the client.
Once you write something to disk you can't be 100% sure that you will be able to delete it. Moreover, even if you delete the file, you can't be sure that it can't be recovered.
So you have to ask why you need to delete the file. If it contains some secret, keep it in memory. If you can't fit the file into memory, write it to disk encrypted and keep only the key in memory.
If you relax 100% to 99%, I would go for creating a file with Path.GetTempFileName and deleting it in a finally block.
If 99% is not enough but 99.98% is, I would store the names of the created temporary files in persistent storage and regularly check whether they have been deleted.
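To make the "keep it in memory" suggestion concrete, here is a minimal sketch that builds the multipart request from an in-memory buffer, so nothing is ever written to the client's disk. The types are the same ones used in the question's UploadFileAsync; GetPayloadBytes and the "upload.xml" name are hypothetical placeholders.

byte[] payload = GetPayloadBytes(); // hypothetical: produce the data in memory

var requestContent = new MultipartFormDataContent();
var fileContent = new ByteArrayContent(payload);
fileContent.Headers.ContentType = MediaTypeHeaderValue.Parse("application/octet-stream");
// The file name is only metadata sent to the server; no file exists on the client.
requestContent.Add(fileContent, "file", "upload.xml");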
For completeness, I'm writing the solution I used based on the suggestions I received here. Also, building the file name this way guarantees that, statistically, you won't get two temporary files with the same name.
try
{
    // Guid-based name: statistically you will never get two temporary files with the same name.
    string file = System.IO.Path.GetTempPath() + Guid.NewGuid().ToString() + ".xml";
    tempFile = file; // keep the full path, so File.Delete in the finally block removes the right file
    using (FileStream fs = new FileStream(file, FileMode.Create, FileAccess.Write, FileShare.None))
    {
        XmlSerializer serializer = new XmlSerializer(typeof(FileTemplate));
        serializer.Serialize(fs, w.Template);
    }
}
catch (Exception ex)
{
    logger.Error(ex.Message);
    //...
}
finally
{
    //.... do stuff
    File.Delete(tempFile);
}
You clearly shouldn't be using a file; in fact, you don't want your data to ever leave RAM. You need to use "secure" memory storage so that the data is "guaranteed" to be pinned to physical RAM, untouched by the garbage collector, and "never" paged out to swap. I use the quotes because all those terms are somewhat misleading: the implementation isn't secure in an absolute sense, it's just more secure than writing stuff to a disk file. Absolute security is impossible.
There are no common mechanisms that guarantee deletion of anything: the machine could "die" at any point between writing the data to the file and whatever deletion operation you'd use to wipe the file "clean". Then you have no guarantee that e.g. the SSD or the hard drive won't duplicate the data should a sector go bad and need to be reallocated. You're also taking on several underdocumented, complex (and often subtly buggy) layers of software when you talk about files:
The firmware in the storage device controller.
The device driver for the storage device.
The virtual memory system.
The filesystem driver.
The virtual filesystem layer (present in most OSes).
The .net runtime (and possibly the C runtime, depending on implementation).
By using a file you're making a bet that all those layers will do exactly what you want them to do. That won't usually be the case unless you tightly control all of these layers (e.g. you deploy a purpose-made linux distribution that you audit, and you use your own flash storage firmware or use linux memory technology driver that you'd audit too).
Instead, you can limit your exposure to just the VM system and the runtime. See e.g. this answer; it's easy to use:
using (var secret = new SecureArray<byte>(secretLength))
{
    DoSomethingSecret(secret.Buffer);
}
SecureArray makes it likely that secret.Buffer stays in RAM - but you should audit that code as well, since, after all, you need it to do what it claims, with your reputation or even legal liability possibly at stake.
A simple test that can give you some peace of mind would involve a small test application that writes a short pseudorandom sequence to secret.Buffer, and then sleeps. Let this run in the background for a few days as you use your computer, then forcibly power it down (on a desktop: turn the on-off switch on the power supply to "off" position). Then boot up from a linux live CD, and run a search for some chunk of the pseudorandom sequence on the raw disk device. The expected outcome is that no identifiable part of the sequence has leaked to disk (say nothing larger than 48-64 bits). Even then you can't be totally sure, but this will thwart the majority of attempts at recovering the information...
...until someone takes the customer's system, dumps liquid nitrogen on the RAM sticks, shuts down the power, and transfers the RAM to a readout device.
...or until they get malware on the system where the software runs, and it helpfully streams out RAM contents over internet, because why not.
...or until someone injects their certificate into the trust root on just one client machine, and MITM-s all the data elsewhere on the client's network.
And so on. It's all a tradeoff: how sure do you wish to be that the data doesn't leak? I suggest getting the exact requirements from the customer in writing, and they must agree that they understand that it's not possible to be completely sure.
Related
I have a class implementing a Log file writer.
Logs must be written for the application to "work correctly", so it is of the utmost importance that the writes to disk succeed.
The log file is kept open for the whole life of the application, and write operations are accordingly very fast:
var logFile = new FileInfo(filepath);
_outputStream = logFile.Open(FileMode.Append, FileAccess.Write, FileShare.Read);
Now, I need to synchronize this file to a network path, during application lifetime.
This network copy can be slightly delayed without problems. The important bit is that I have to guarantee that it doesn't interfere with log writing.
Given that this network copy must be eventually consistent, I need to make sure that all file contents are written, instead of only the last message(s).
A previous implementation used heavy locking and a simple System.IO.File.Copy(filepath, networkPath, true), but I would like to lock as little as possible.
How could I approach this problem? I'm out of ideas.
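To make the constraint concrete, a reader that would not interfere with the writer might look like the following sketch (my own illustration, not a confirmed solution): it opens the log read-only with FileShare.ReadWrite, so the writer's FileShare.Read handle is never blocked. The network path handling and the interval are placeholders.

private async Task SyncLogAsync(string filepath, string networkPath, CancellationToken token)
{
    while (!token.IsCancellationRequested)
    {
        // Read-only access plus FileShare.ReadWrite is compatible with the
        // writer's Append/FileShare.Read handle, so log writes are never blocked.
        using (var source = new FileStream(filepath, FileMode.Open,
            FileAccess.Read, FileShare.ReadWrite))
        using (var destination = new FileStream(networkPath, FileMode.Create,
            FileAccess.Write, FileShare.Read))
        {
            // Copies the whole file each pass, so the network copy eventually
            // contains everything written so far, not just the last messages.
            await source.CopyToAsync(destination);
        }

        await Task.Delay(TimeSpan.FromSeconds(30), token);
    }
}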
I've been thinking about writing a small specialized backup app, similar to newly introduced file history in Windows 8. The basic idea is to scan some directories every N hours for changed files and copy them to another volume. The problem is, some other apps may request access to these files while they are being backed up and get an access denial, potentially causing all kinds of nasty problems.
As far as I can tell, there are several approaches to that problem:
1) Using Volume Shadow Copy service
From my point of view, the future of this thing is uncertain, and its overhead during heavy IO loads may cripple the system.
2) Using Sharing Mode when opening files
Something like this mostly works...
using (var stream = new FileStream("test.txt", FileMode.Open, FileAccess.Read,
    FileShare.Delete | FileShare.ReadWrite | FileShare.Read | FileShare.Write))
{
    [Copy data]
}
... until some other process requests access to the same file without FileShare.Read, at which point an IOException is thrown.
3) Using an Opportunistic Lock that may be "broken" by other (write?) requests.
This behaviour of FileIO.ReadTextAsync looks exactly like what I want, but it also looks very implementation-specific and may change in the future. Does anyone know how to explicitly oplock a file locally via C# or C++?
Maybe there is some simple C# method like File.TryReadBytes that provides such "polite" reading? I'm interested in the solutions that will work on Windows 7 and above.
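For reference, File.TryReadBytes doesn't exist in the BCL; a helper along those lines, built on option 2, might look like the sketch below. Note that it only converts a sharing violation into a false return value - it does not back off mid-read the way a real oplock break would.

public static bool TryReadAllBytes(string path, out byte[] data)
{
    try
    {
        // Open as permissively as possible so other processes can still
        // read, write, or delete the file while it is being copied.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
            FileShare.ReadWrite | FileShare.Delete))
        using (var buffer = new MemoryStream())
        {
            stream.CopyTo(buffer);
            data = buffer.ToArray();
            return true;
        }
    }
    catch (IOException)
    {
        // Sharing violation (or other I/O failure): report "not readable right now".
        data = null;
        return false;
    }
}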
My vote's on VSS. The main reason is that it doesn't interfere with other processes modifying your files, thus it provides consistency. A possible inconsistency pretty much defeats the purpose of a backup. The API is stable and I wouldn't worry about its future.
I'm currently using file streams to copy files from one location to another.
It all functioned as intended until now, when I suddenly have the problem that File.Open freezes the thread that it is running in.
FileStream sourceStream = File.Open(filePath, FileMode.Open);
It only happens for one specific file (3 GB in size). The interesting thing is that one day prior it worked normally for this file, so it can't be the file size. The next thing I checked was whether some sort of exception was thrown that I don't catch.
I put a try/catch block around the whole thing (normally I use the calling method to catch the exceptions) and still got the same effect.
try
{
    FileStream sourceStream = File.Open(filePath, FileMode.Open);
    sourceStream.Close();
}
catch (Exception e)
{
    Console.Write("A");
}
I also checked what happens if the file is already being accessed. Then an exception is thrown (I tested it with other files since, like I said, for this specific file the thread now always hangs when I try to open it).
The file is located on the local hard drive, and other files (smaller, though) in the same folder don't show this problem.
As I'm now running out of ideas what the possible reason could be, my question is:
What could the possible reasons for this unexpected behaviour be, and how can they be averted?
EDIT:
It now functions again (just when I tried to use Process Monitor it started working again).
So, in total, I have no clue what could have caused the phenomenon. If anyone has an idea of what a possible reason could be, it would be good to know, to avoid a repeat of the problem in the future.
Also of note, as it was brought up: before the File.Open I have a using block with:
using (var stream = new BufferedStream(File.OpenRead(filePath), 1024 * 1024))
{
    //..do calculations
}
which I use to make some hash calculations for the file. THIS one had no issues at all opening the file (only the later File.Open had the issues).
Edit:
I've just received some info from the sysadmins here that shines a new light on the problem:
The system is set up so that the whole machine is backed up from time to time, file by file, without the OS having any knowledge of it. For the file being backed up this means the OS thinks it is there and that nobody is accessing it, when in reality it is currently being backed up (and thus accessed and, according to how they described the backup process, not accessible from within the OS; and as the OS doesn't know about the backup happening, nothing showed up in the resource monitor's hard drive activity nor in the task manager).
With that information, it could be that, as the OS didn't know about the file being accessed, it tried to access it (through the open command) and waited and waited for the hard drive read head to reach the file, which never happened because the file was in reality not accessible.
Thus it would have had to run into a timeout, which the File.Open command doesn't have (at least that's my guess with the new info, if I understood the sysadmins accurately).
Thanks.
A couple possible reasons:
Your antivirus. That thing hooks into the OS and replaces the I/O functions with its own. When you open a file, it can actually perform a virus check before returning back control to your application. You could have had a bad signature update which forced the AV to perform the check on your 3GB file, and a subsequent update could have fixed the problem.
A bad sector on your drive. This usually makes I/O perform very poorly, but your system could have remapped the bad sector to another one, so the performance went back to normal. You can run chkdsk /R to see if you have bad sectors.
Another app that locks the file, though I'd rather expect an exception in this case.
The problem stemmed not from C# or the Windows system, but from the architecture of how the PC itself was set up.
In this case it was set up so that the files I tried to read could be inaccessible (because they were being backed up) WITHOUT the OS of the local PC knowing it.
Thus the OS thought the file was accessible, and C# received that answer from the OS when it tried to open the file. And as file operations in C# use their Windows equivalents, and those have no timeouts, the whole operation hung/froze until the file backup was finished.
In retrospect I would say: Lucas Trzesniewski's answer should cover most situations where the freeze happens. My own problem was not answered by it only because I had such a special situation causing the problem in the end.
Are you absolutely sure that the freezing always occurs in File.Open()?
Given the absence of exceptions, it appears that the problem may be at a lower level. When you experienced it, did you try to open the file with a hex editor or some other tool to check that it is actually entirely readable? It could be a problem accessing a certain area of the hard drive.
Try to specify the access mode with FileAccess if you need read-only, write-only, etc.
See also this post for the actual usefulness of BufferedStream.
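For instance, the read-only open suggested above might look like this; FileShare.ReadWrite is an assumption about how tolerant the copy should be of other processes touching the file.

// Read-only open with permissive sharing (assumed to fit the copy scenario).
using (FileStream sourceStream = File.Open(filePath, FileMode.Open,
    FileAccess.Read, FileShare.ReadWrite))
{
    // ...copy from sourceStream...
}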
Have you checked the File.Open() call with explicit FileAccess and FileShare values? I think it's a file locking issue.
I had a similar issue where File.Open would sometimes hang when trying to check if a file is locked.
I solved it like this:
public async Task<bool> IsLocked(FileInfo file)
{
    var checkTask = Task.Run(() =>
    {
        try
        {
            using (file.Open(FileMode.Open, FileAccess.Read, FileShare.None)) { }
            return false;
        }
        catch (Exception)
        {
            return true;
        }
    });

    var delayTask = Task.Delay(1000);
    var firstTask = await Task.WhenAny(checkTask, delayTask);

    if (firstTask == delayTask)
    {
        return true;
    }
    else
    {
        return await checkTask;
    }
}
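A possible way to call it (the path variable is just a placeholder):

// Treat the file as busy if the exclusive open fails or takes longer than the delay.
if (await IsLocked(new FileInfo(path)))
{
    // skip this file and retry later
}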
I am working on server software that periodically needs to save data to disk. I need to make sure that the old file is overwritten, and that the file cannot get corrupted (e.g. only partially overwritten) in case of unexpected circumstances.
I've adopted the following pattern:
string tempFileName = Path.GetTempFileName();
// ...write out the data to temporary file...
MoveOrReplaceFile(tempFileName, fileName);
...where MoveOrReplaceFile is:
public static void MoveOrReplaceFile(string source, string destination) {
    if (source == null) throw new ArgumentNullException("source");
    if (destination == null) throw new ArgumentNullException("destination");
    if (File.Exists(destination)) {
        // File.Replace does not work across volumes
        if (Path.GetPathRoot(Path.GetFullPath(source)) == Path.GetPathRoot(Path.GetFullPath(destination))) {
            File.Replace(source, destination, null, true);
        } else {
            File.Copy(source, destination, true);
        }
    } else {
        File.Move(source, destination);
    }
}
This works well as long as the server has exclusive access to files. However, File.Replace appears to be very sensitive to external access to files. Any time my software runs on a system with an antivirus or a real-time backup system, random File.Replace errors start popping up:
System.IO.IOException: Unable to remove the file to be replaced.
Here are some possible causes that I've eliminated:
Unreleased file handles: using() ensures that all file handles are released as soon as possible.
Threading issues: lock() guards all access to each file.
Different disk volumes: File.Replace() fails when used across disk volumes. My method checks this already, and falls back to File.Copy().
And here are some suggestions that I've come across, and why I'd rather not use them:
Volume Shadow Copy Service: This only works as long as the problematic third-party software (backup and antivirus monitors, etc) also use VSS. Using VSS requires tons of P/Invoke, and has platform-specific issues.
Locking files: In C#, locking a file requires maintaining a FileStream open. It would keep third-party software out, but 1) I still won't be able to replace the file using File.Replace, and 2) Like I mentioned above, I'd rather write to a temporary file first, to avoid accidental corruption.
I'd appreciate any input on either getting File.Replace to work every time or, more generally, saving/overwriting files on disk reliably.
You really want to use the 3rd parameter, the backup file name. That allows Windows to simply rename the original file without having to delete it. Deleting will fail if any other process has the file opened without delete sharing; renaming is never a problem. You could then delete it yourself after the Replace() call and ignore any error. Also delete it before the Replace() call so the rename won't fail and you'll clean up failed earlier attempts. So, roughly:
string backup = destination + ".bak";
File.Delete(backup);
File.Replace(source, destination, backup, true);
try {
    File.Delete(backup);
}
catch {
    // optional:
    filesToDeleteLater.Add(backup);
}
There are several possible approaches; here are some of them:
Use a "lock" file - a temporary file that is created before the operation and indicates to other writers (or readers) that the file is being modified and thus exclusively locked. After the operation completes, remove the lock file. This method assumes that the file-creation command is atomic (see the sketch after this list).
Use the NTFS transactional API (if appropriate).
Create a link to the file, write the changed file under a random name (for example Guid.NewGuid()), and then remap the link to the new file. All readers will access the file through the link (whose name is known).
Of course, all three approaches have their own drawbacks and advantages.
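As a rough sketch of the first ("lock" file) approach, under assumed naming and error-handling conventions: FileMode.CreateNew is what makes acquiring the lock atomic, because it fails if the lock file already exists.

public static void WriteWithLockFile(string destination, byte[] data)
{
    string lockPath = destination + ".lock"; // assumed naming convention

    // Create (and immediately close) the marker file; its presence is the lock.
    // FileMode.CreateNew throws if another writer already holds it.
    using (new FileStream(lockPath, FileMode.CreateNew, FileAccess.Write, FileShare.None))
    {
    }

    try
    {
        File.WriteAllBytes(destination, data);
    }
    finally
    {
        // Release the lock even if the write fails.
        File.Delete(lockPath);
    }
}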
If the software is writing to an NTFS partition, then try using Transactional NTFS. You can use AlphaFS for a .NET wrapper of the API. That is probably the most reliable way to write files and prevent corruption.
I have a program that opens a large binary file, appends a small amount of data to it, and closes the file.
FileStream fs = File.Open( "\\\\s1\\temp\\test.tmp", FileMode.Append, FileAccess.Write, FileShare.None );
fs.Write( data, 0, data.Length );
fs.Close();
If test.tmp is 5MB before this program is run and the data array is 100 bytes, this program will cause over 5MB of data to be transmitted across the network. I would have expected that the data already in the file would not be transmitted across the network since I'm not reading it or writing it. Is there any way to avoid this behavior? This makes it agonizingly slow to append to very large files.
0xA3 provided the answer in a comment above. The poor performance was due to an on-access virus scan. Each time my program opened the file, the virus scanner read the entire contents of the file to check for viruses, even though my program didn't read any of the existing content. Disabling the on-access virus scan eliminated the excessive network I/O and the poor performance.
Thanks to everyone for your suggestions.
I found this on MSDN (CreateFile is called internally):
When an application creates a file across a network, it is better to use GENERIC_READ | GENERIC_WRITE for dwDesiredAccess than to use GENERIC_WRITE alone. The resulting code is faster, because the redirector can use the cache manager and send fewer SMBs with more data. This combination also avoids an issue where writing to a file across a network can occasionally return ERROR_ACCESS_DENIED.
Using Reflector, FileAccess maps to dwDesiredAccess, so it would seem to suggest using FileAccess.ReadWrite instead of just FileAccess.Write.
I have no idea if this will help :)
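If you want to try it, note that FileMode.Append only accepts FileAccess.Write, so the ReadWrite variant has to open the file normally and seek to the end before writing. Whether this actually helps is, as said above, unverified.

// Open for ReadWrite (per the quoted advice) and append manually by seeking to the end.
using (var fs = new FileStream(@"\\s1\temp\test.tmp", FileMode.OpenOrCreate,
    FileAccess.ReadWrite, FileShare.None))
{
    fs.Seek(0, SeekOrigin.End);
    fs.Write(data, 0, data.Length);
}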
You could cache your data in a local buffer and periodically (much less often than now) append it to the large file. This would save a bunch of network transfers, but... it would also increase the risk of losing that cache (and your data) if your app crashes.
Logging (if that's what it is) of this type is often stored in a db. Using a decent RDBMS would allow you to post that 100 bytes of data very frequently with minimal overhead. The caveat there is the maintenance of an RDBMS.
If you have system access, or perhaps a friendly admin for the machine actually hosting the file, you could make a small listener program that sits on the other end.
You make a call to it passing just the data to be written, and it does the write locally, avoiding the extra network traffic.
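A rough sketch of what such a listener could look like, assuming a trivial protocol where each TCP connection simply streams the bytes to append; the port, the target path, and the lack of any authentication are placeholder choices.

using System.IO;
using System.Net;
using System.Net.Sockets;

class AppendListener
{
    static void Main()
    {
        var listener = new TcpListener(IPAddress.Any, 9000); // placeholder port
        listener.Start();

        while (true)
        {
            using (TcpClient client = listener.AcceptTcpClient())
            using (NetworkStream input = client.GetStream())
            using (FileStream target = File.Open(@"C:\temp\test.tmp",
                FileMode.Append, FileAccess.Write, FileShare.None))
            {
                // Only the bytes sent by the client cross the network;
                // the append itself happens on the hosting machine.
                input.CopyTo(target);
            }
        }
    }
}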
The File class in .NET has quite a few static methods to handle this type of thing. I would suggest trying:
File.AppendAllText("FilePath", "What to append", Encoding.UTF8);
When you reflect this method it turns out that it's using:
using (StreamWriter writer = new StreamWriter(path, true, encoding))
{
    writer.Write(contents);
}
This StreamWriter method should allow you to simply append something to the end (at least this is the method I've seen used in every instance of logging that I've encountered so far).
Write the data to separate files, then join them (do it on the hosting machine if possible) only when necessary.
I did some googling and was looking more at how to read excessively large files quickly and found this link https://web.archive.org/web/20190906152821/http://www.4guysfromrolla.com/webtech/010401-1.shtml
The most interesting part there would be the part about byte reading:
Besides the more commonly used ReadAll and ReadLine methods, the TextStream object also supports a Read(n) method, where n is the number of bytes in the file/textstream in question. By instantiating an additional object (a file object), we can obtain the size of the file to be read, and then use the Read(n) method to race through our file. As it turns out, the "read bytes" method is extremely fast by comparison:
const ForReading = 1
const TristateFalse = 0
dim strSearchThis
dim objFS
dim objFile
dim objTS
set objFS = Server.CreateObject("Scripting.FileSystemObject")
set objFile = objFS.GetFile(Server.MapPath("myfile.txt"))
set objTS = objFile.OpenAsTextStream(ForReading, TristateFalse)
strSearchThis = objTS.Read(objFile.Size)
if instr(strSearchThis, "keyword") > 0 then
Response.Write "Found it!"
end if
This method could then be used to go to the end of the file and manually append to it, instead of loading the entire file in append mode with a FileStream.