Writing to file, memory used steadily increasing

Writing to file, memory used steadily increasing - c#

I have an application where I need to write binary to a file constantly. The bits of data are small, about 1K each. The computers this is running on aren't great and are running XP. I've run into the problem that when I turn on the logging the computers just get totally hosed and I watch the Task Manager and just see the memory usage going up and up until it crashes.
A coworker suggested that I just keep the packets in memory until a certain amount of time has passed and then write it all at once instead of writing each one separately - tried that, same issue.
This is the code (loggingBuffer is the List<byte[]> I'm storing the packets in while the interval passes):
if ((DateTime.Now - lastStoreTime).TotalSeconds > 10)
{
string fileName = #"C:\Storage\file";
FileMode fm = File.Exists(fileName) ? FileMode.Append : FileMode.Create;
using (BinaryWriter w = new BinaryWriter(File.Open(fileName, fm), Encoding.ASCII))
{
foreach (byte[] packetData in loggingBuffer)
{
w.Write(packetData);
}
}
loggingBuffer.Clear();
lastStoreTime= DateTime.Now;
}
Is there anything different I should be doing to accomplish this?

Seems to me that, while you're writing each 10 seconds, you could close the file in between. And cleanup all related file-writing things. Perhaps that would solved your problem.
Secondly, I'd suggest creating the BinaryWriter outside the function where you actually write the data. It'll keep things clearer. In your current code you're checking each time wether to append data or to create a new file and the write to it. If you'll do this outside the function and call it just once perhaps this will save memory too. All untested by me, that is :)

Related

C# I/O async (copyAsync): how to avoid file fragmentation?

Within a tool copying big files between disks, I replaced the
System.IO.FileInfo.CopyTo method by System.IO.Stream.CopyToAsync.
This allow a faster copy and a better control during the copy, e.g. I can stop the copy.
But this create even more fragmentation of the copied files. It is especially annoying when I copy file of many hundreds megabytes.
How can I avoid disk fragmentation during copy?
With the xcopy command, the /j switch copies files without buffering. And it is recommended for very large file in TechNet
It seems indeed to avoid file fragmentation (while a simple file copy within windows 10 explorer DOES fragment my file!)
A copy without buffering seems to be the opposite way than this async copy. Or it there any way to do async copy without buffering?
Here it my current code for aync copy. I let the default buffersize of 81920 bytes, i.e. 10*1024*size(int64).
I am working with NTFS file systems, thus 4096 bytes clusters.
EDIT: I updated the code with SetLength as suggested, added the FileOptions Async while creating the destinationStream and fix setting the attributes AFTER setting the time (otherwise, exception is thrown for ReadOnly files)
int bufferSize = 81920;
try
{
using (FileStream sourceStream = source.OpenRead())
{
// Remove existing file first
if (File.Exists(destinationFullPath))
File.Delete(destinationFullPath);
using (FileStream destinationStream = File.Create(destinationFullPath, bufferSize, FileOptions.Asynchronous))
{
try
{
destinationStream.SetLength(sourceStream.Length); // avoid file fragmentation!
await sourceStream.CopyToAsync(destinationStream, bufferSize, cancellationToken);
}
catch (OperationCanceledException)
{
operationCanceled = true;
}
} // properly disposed after the catch
}
}
catch (IOException e)
{
actionOnException(e, "error copying " + source.FullName);
}
if (operationCanceled)
{
// Remove the partially written file
if (File.Exists(destinationFullPath))
File.Delete(destinationFullPath);
}
else
{
// Copy meta data (attributes and time) from source once the copy is finished
File.SetCreationTimeUtc(destinationFullPath, source.CreationTimeUtc);
File.SetLastWriteTimeUtc(destinationFullPath, source.LastWriteTimeUtc);
File.SetAttributes(destinationFullPath, source.Attributes); // after set time if ReadOnly!
}
I fear also that the File.SetAttributes and Time at the end on my code could increase file fragmentation.
Is there a proper way to create a 1:1 asynchronous file copy without any file fragmentation, i.e. asking the HDD that the file steam get only contiguous sectors?
Other topics regarding file fragmentation like How can I limit file fragmentation while working with .NET suggests incrementing the file size in larger chunks, but it does not seem to be a direct answer to my question.

but the SetLength method does the job
It does not do the job. It only updates the file size in the directory entry, it does not allocate any clusters. The easiest way to see this for yourself is by doing this on a very large file, say 100 gigabytes. Note how the call completes instantly. Only way it can be instant is when the file system does not also do the job of allocating and writing the clusters. Reading from the file is actually possible, even though the file contains no actual data, the file system simply returns binary zeros.
This will also mislead any utility that reports fragmentation. Since the file has no clusters, there can be no fragmentation. So it only looks like you solved your problem.
The only thing you can do to force the clusters to be allocated is to actually write to the file. It is in fact possible to allocate 100 gigabytes worth of clusters with a single write. You must use Seek() to position to Length-1, then write a single byte with Write(). This will take a while on a very large file, it is in effect no longer async.
The odds that it will reduce fragmentation are not great. You merely reduced the risk somewhat that the writes will be interleaved by writes from other processes. Somewhat, actual writing is done lazily by the file system cache. Core issue is that the volume was fragmented before you began writing, it will never be less fragmented after you're done.
Best thing to do is to just not fret about it. Defragging is automatic on Windows these days, has been since Vista. Maybe you want to play with the scheduling, maybe you want to ask more about it at superuser.com

I think, FileStream.SetLength is what you need.

Considering Hans Passant answer,
in my code above, an alternative to
destinationStream.SetLength(sourceStream.Length);
would be, if I understood it properly:
byte[] writeOneZero = {0};
destinationStream.Seek(sourceStream.Length - 1, SeekOrigin.Begin);
destinationStream.Write(writeOneZero, 0, 1);
destinationStream.Seek(0, SeekOrigin.Begin);
It seems indeed to consolidate the copy.
But a look at the source code of FileStream.SetLengthCore seems it does almost the same, seeking at the end but without writing one byte:
private void SetLengthCore(long value)
{
Contract.Assert(value >= 0, "value >= 0");
long origPos = _pos;
if (_exposedHandle)
VerifyOSHandlePosition();
if (_pos != value)
SeekCore(value, SeekOrigin.Begin);
if (!Win32Native.SetEndOfFile(_handle)) {
int hr = Marshal.GetLastWin32Error();
if (hr==__Error.ERROR_INVALID_PARAMETER)
throw new ArgumentOutOfRangeException("value", Environment.GetResourceString("ArgumentOutOfRange_FileLengthTooBig"));
__Error.WinIOError(hr, String.Empty);
}
// Return file pointer to where it was before setting length
if (origPos != value) {
if (origPos < value)
SeekCore(origPos, SeekOrigin.Begin);
else
SeekCore(0, SeekOrigin.End);
}
}
Anyway, not sure that theses method guarantee no fragmentation, but at least avoid it for most of the cases. Thus the auto defragment tool will finish the job at a low performance expense.
My initial code without this Seek calls created hundred of thousands of fragments for 1 GB file, slowing down my machine when the defragment tool went active.

Crash safe on-the-fly compression with GZipStream

I'm compressing a log file as data is written to it, something like:
using (var fs = new FileStream("Test.gz", FileMode.Create, FileAccess.Write, FileShare.None))
{
using (var compress = new GZipStream(fs, CompressionMode.Compress))
{
for (int i = 0; i < 1000000; i++)
{
// Clearly this isn't what is happening in production, just
// a simply example
byte[] message = RandomBytes();
compress.Write(message, 0, message.Length);
// Flush to disk (in production we will do this every x lines,
// or x milliseconds, whichever comes first)
if (i % 20 == 0)
{
compress.Flush();
}
}
}
}
What I want to ensure is that if the process crashes or is killed, the archive is still valid and readable. I had hoped that anything since the last flush would be safe, but instead I am just ending up with a corrupt archive.
Is there any way to ensure I end up with a readable archive after each flush?
Note: it isn't essential that we use GZipStream, if something else will give us the desired result.

An option is to let Windows handle the compression. Just enable compression on the folder where you're storing your log files. There are some performance considerations you should be aware of when copying the compressed files, and I don't know how well NT compression performs in comparision to GZipStream or other compression options. You'll probably want to compare compression ratios and CPU load.
There's also the option of opening a compressed file, if you don't want to enable compression on the entire folder. I haven't tried this, but you might want to look into it: http://social.msdn.microsoft.com/forums/en-US/netfxbcl/thread/1b63b4a4-b197-4286-8f3f-af2498e3afe5

Good news: GZip is a streaming format. Therefore corruption at the end of the stream cannot affect the beginning which was already written.
So even if your streaming writes are interrupted at an arbitrary point, most of the stream is still good. You can write yourself a little tool that reads from it and just stops at the first exception it sees.
If you want an error-free solution I'd recommend splitting the log into one file every x seconds (maybe x = 1 or 10?). Write into a file with extensions ".gz.tmp" and rename to ".gz" after the file was completely written and closed.

Yes, but it's more involved than just flushing. Take a look at gzlog.h and gzlog.c in the zlib distribution. It does exactly what you want, efficiently adding short log entries to a gzip file, and always leaving a valid gzip file behind. It also has protection against crashes or shutdowns during the process, still leaving a valid gzip file behind and not losing any log entries.
I recommend not using GZIPStream. It is buggy and does not provide the necessary functionality. Use DotNetZip instead as your interface to zlib.

how to copy one Stream object values to second Stream Object in asp.net

In my project user can upload file up to 1GB. I want to copy that uploaded file stream data to second stream.
If I use like this
int i;
while ( ( i = fuVideo.FileContent.ReadByte() ) != -1 )
{
strm.WriteByte((byte)i);
}
then it is taking so much time.
If i try to do this by byte array then I need to add array size in long which is not valid.
If someone has better idea to do this then please let me know.
--
Hi Khepri thanks for your response. I tried Stream.Copy but it is taking so much time to copy one stream object to second.
I tried with 8.02Mb file and it took 3 to 4 minutes.
The code i have added is
Stream fs = fuVideo.FileContent; //fileInf.OpenRead();
Stream strm = ftp.GetRequestStream();
fs.CopyTo(strm);
If i am doing something wrong then please let me know.

Is this .NET 4.0?
If so Stream.CopyTo is probably your best bet.
If not, and to give credit where credit is due, see the answer in this SO thread. If you're not .NET 4.0 make sure to read the comments in that thread as there are some alternative solutions (Async stream reading/writing) that may be worth investigating if performance is at an absolute premium which may be your case.
EDIT:
Based off the update, are you trying to copy the file to another remote destination? (Just guessing based on GetRequestStream() [GetRequestStream()]. The time is going to be the actual transfer of the file content to the destination. So in this case when you do fs.CopyTo(strm) it has to move those bytes from the source stream to the remote server. That's where the time is coming from. You're literally doing a file upload of a huge file. CopyTo will block your processing until it completes.
I'd recommend looking at spinning this kind of processing off to another task or at the least look at the asynchronous option I listed. You can't really avoid this taking a large period of time. You're constrained by file size and available upload bandwidth.
I verified that when working locally CopyTo is sub-second. I tested with a half gig file and a quick Stopwatch class returned a processing time of 800 millisecondss.

If you are not .NET 4.0 use this
static void CopyTo(Stream fromStream, Stream destination, int bufferSize)
{
int num;
byte[] buffer = new byte[bufferSize];
while ((num = fromStream.Read(buffer, 0, buffer.Length)) != 0)
{
destination.Write(buffer, 0, num);
}
}

Power Loss after StreamWriter.Close() produces blank file, why?

Ok, so to explain; I am developing for a system that can suffer a power failure at any point in time, one point that I am testing is directly after I have written a file out using a StreamWriter. The code below:
// Write the updated file back out to the Shell directory.
using (StreamWriter shellConfigWriter =
new StreamWriter(#"D:\xxx\Shell\Config\Game.cfg.bak"))
{
for (int i = 0; i < configContents.Count; i++)
{
shellConfigWriter.WriteLine(configContents[i]);
}
shellConfigWriter.Close();
}
FileInfo gameCfgBackup = new FileInfo(#"D:\xxx\Shell\Config\Game.cfg.bak");
gameCfgBackup.CopyTo(#"D:\xxx\Shell\Config\Game.cfg", true);
Writes the contents of shellConfigWriter (a List of strings) out to a file used as a temporary store, then it is copied over the original. Now after this code has finished executing the power is lost, upon starting back up again the file Game.cfg exists and is the correct size, but is completely blank. At first I thought that this was due to Write-Caching being enabled on the hard drive, but even with it off it still occurs (albeit less often).
Any ideas would be very welcome!
Update: Ok, so after removing the .Close() statements and calling .Flush() after every write operation the files still end up blank. I could go one step further and create a backup of the original file first, before creating the new one, and then I have enough backups to do a integrity check, but I don't think it'll help to solve the underlying issue (that when I tell it to write to, flush and close a file... It doesn't!).

Keep the OS from buffering the output using the FileOptions parameter of the FileStream object's constructor:
using (Stream fs = new FileStream(#"D:\xxx\Shell\Config\Game.cfg.bak", FileMode.Create, FileAccess.Write, FileShare.None, 0x1000, FileOptions.WriteThrough))
using (StreamWriter shellConfigWriter = new StreamWriter(fs))
{
for (int i = 0; i < configContents.Count; i++)
{
shellConfigWriter.WriteLine(configContents[i]);
}
shellConfigWriter.Flush();
shellConfigWriter.BaseStream.Flush();
}

First of all, you don't have to call shellConfigWriter.Close() there. The using statement will take care of it. What you might want to do instead to guard against power failure is call shellConfigWriter.Flush().
Update
Something else you might want to consider is that if a power failure can really happen at any time, it could happen in the middle of a write, such that only some of the bytes make it to a file. There's really no way to stop that.
To protect against these scenarios, a common procedure is to use state/condition flag files. You use the existence or non-existence on the file system of a zero-byte file with a particular name to tell your program where to pick up again when it resumes. Then you don't create or destroy the files that trigger a particular state until you are sure you've reached that state and completed the previous.
The downside here is that it might mean throwing a lot of work away now and then. But the benefit is that it means the functional part of your code looks like normal: there's very little extra work to do to make the system sufficiently robust.

You want to set AutoFlush = true;

wait for a TXT file to be readable c#

My application use "FileSystemWatcher()" to raise an event when a TXT file is created by an "X" application and then read its content.
the "X" application create a file (my application detect it successfully) but it take some time to fill the data on it, so the this txt file cannot be read at the creation time, so im
looking for something to wait until the txt file come available to reading. not a static delay but something related to that file.
any help ? thx

Create the file like this:
myfile.tmp
Then when it's finished, rename it to
myfile.txt
and have your filewatcher watch for the .txt extension

The only way I have found to do this is to put the attempt to read the file in a loop, and exit the loop when I don't get an exception. Hopefully someone else will come up with a better way...
bool FileRead = false;
while (!FileRead)
{
try
{
// code to read file, which you already know
FileRead = true;
}
catch(Exception)
{
// do nothing or optionally cause the code to sleep for a second or two
}
}

You could track the file's Changed event, and see if it's available for opening on change. If the file is still locked, just watch for the next change event.

You can open and read a locked file like this
using (var stream = new FileStream(#"c:\temp\file.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite)) {
using (var file = new StreamReader(stream)) {
while (!file.EndOfStream) {
var line = file.ReadLine();
Console.WriteLine(line);
}
}
}
However, make sure your file writer flushes otherwise you may not see any changes.

The application X should lock the file until it closes it. Is application X also a .NET application and can you modify it? In that case you can simply use the FileInfo class with the proper value for FileShare (in this case FileShare.Read).
If you have no control over application X, the situation becomes a little more complex. But then you can always attempt to open the file exclusively via the same FileInfo.Open method. Provide FileShare.None in that case. It will attempt to open the file exclusively and will fail if the file is still in use. You can perform this action inside a loop until the file is closed by application X and ready to be read.

We have a virtual printer for creating pdf documents, and I do something like this to access that document after it's sent to the printer:
using (FileSystemWatcher watcher = new FileSystemWatcher(folder))
{
if(!File.Exists(docname))
for (int i = 0; i < 3; i++)
watcher.WaitForChanged(WatcherChangeTypes.Created, i * 1000);
}
So I wait for a total of 6 seconds (some documents can take a while to print but most come very fast, hence the increasing wait time) before deciding that something has gone awry.
After this, I also read in a for loop, in just the same way that I wait for it to be created. I do this just in case the document has been created, but not released by the printer yet, which happens nearly every time.

You can use the same class to be notified when file changes.
The Changed event is raised when changes are made to the size, system attributes, last write time, last access time, or security permissions of a file or directory in the directory being monitored.
So I think you can use that event to check if file is readable and open it if it is.

If you have a DB at your disposal I would recommend using a DB table as a queue with the file names and then monitor that instead. nice and transactional.

You can check if file's size has changed. Although this will require you to poll it's value with some frequency.
Also, if you want to get the data faster, you can .Flush() while writing, and make sure to .Close() stream as soon as you will finish writing to it.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.