An external Windows service I work with maintains a single text-based log file that it continuously appends to. This log file grows unbounded over time. I'd like to prune this log file periodically to keep, say, the most recent 5 MB of log entries. How can I efficiently implement the file I/O code in C# .NET 4.0 to prune the file to, say, 5 MB?
Updated:
The way service dependencies are set up, my service always starts before the external service. This means I get exclusive access to the log file to truncate it, if required. Once the external service starts up, I will not access the log file. I can gain exclusive access to the file on desktop startup. The problem is that the log file may be a few gigabytes in size and I'm looking for an efficient way to truncate it.
It's going to take as much memory as the amount of log you want to keep in order to build the "new" log file, but if you only want 5 MB then it should be fine. If you are talking about GB+ then you probably have other problems; however, it could still be accomplished using a temp file and some locking.
As noted before, you may experience a race condition but that's not the case if this is the only thread writing to this file. This would replace your current writing to the file.
// Requires: using System; using System.IO; using System.Text;
const int MAX_FILE_SIZE_IN_BYTES = 5 * 1024 * 1024; // 5 MB
const string LOG_FILE_PATH = @"ThisFolder\log.txt";
string newLogMessage = "Hey this happened";
#region Use one or the other (or both, if you really want to).
// Use this one to save an extra character
if (!newLogMessage.StartsWith(Environment.NewLine))
    newLogMessage = Environment.NewLine + newLogMessage;
// Use this one to imitate a write line
if (!newLogMessage.EndsWith(Environment.NewLine))
    newLogMessage = newLogMessage + Environment.NewLine;
#endregion
// Encode the message the same way the log is encoded (UTF-8 is an assumption here).
byte[] newMessageBytes = Encoding.UTF8.GetBytes(newLogMessage);
using (FileStream logFile = File.Open(LOG_FILE_PATH, FileMode.Open, FileAccess.ReadWrite))
{
    // How much of the existing log fits next to the new message
    int room = Math.Max(0, MAX_FILE_SIZE_IN_BYTES - newMessageBytes.Length);
    int sizeOfRetainedLog = (int)Math.Min(room, logFile.Length);
    byte[] logMessage = new byte[sizeOfRetainedLog + newMessageBytes.Length];
    // Set start position/offset of the file
    logFile.Position = logFile.Length - sizeOfRetainedLog;
    // Read the retained tail of the file into the beginning of the buffer
    int totalRead = 0;
    while (totalRead < sizeOfRetainedLog)
    {
        int read = logFile.Read(logMessage, totalRead, sizeOfRetainedLog - totalRead);
        if (read == 0) break;
        totalRead += read;
    }
    // Append the new log message after the retained data
    Buffer.BlockCopy(newMessageBytes, 0, logMessage, totalRead, newMessageBytes.Length);
    // Clear the file (this also moves the position back to 0)
    logFile.SetLength(0);
    // Write the file
    logFile.Write(logMessage, 0, totalRead + newMessageBytes.Length);
}
I wrote this really quickly; I apologize if I'm off by one somewhere.
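For the multi-gigabyte case in the question, here is a minimal sketch of the same idea that only ever reads the tail it keeps. It assumes your service has exclusive access to the file (as the question says it does at startup) and that the log uses a single-byte newline (ASCII or UTF-8). `LogPruner` and `KeepTail` are names made up for this example.

```csharp
using System;
using System.IO;

public static class LogPruner
{
    // Keeps at most maxBytes from the end of the file, starting after the
    // first newline found in that tail. Returns the number of bytes kept.
    public static long KeepTail(string path, int maxBytes)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open,
                                              FileAccess.ReadWrite, FileShare.None))
        {
            if (fs.Length <= maxBytes)
                return fs.Length; // already small enough, nothing to do

            // Jump straight to the tail; everything before it is never read.
            fs.Seek(-maxBytes, SeekOrigin.End);
            byte[] tail = new byte[maxBytes];
            int filled = 0;
            while (filled < tail.Length)
            {
                int n = fs.Read(tail, filled, tail.Length - filled);
                if (n == 0) break;
                filled += n;
            }

            // Drop the first (probably partial) line so the pruned log starts
            // on a line boundary. If there is no newline, keep the whole tail.
            int start = Array.IndexOf(tail, (byte)'\n', 0, filled) + 1;
            int kept = filled - start;

            // Copy the tail to the front of the file, then cut off the rest.
            fs.Position = 0;
            fs.Write(tail, start, kept);
            fs.SetLength(kept);
            return kept;
        }
    }
}
```

Calling `LogPruner.KeepTail(@"ThisFolder\log.txt", 5 * 1024 * 1024)` when your service starts would do the pruning. Only about 5 MB is read and written regardless of how big the file has grown, and the file is overwritten from the front and then truncated rather than emptied first.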
Depending on how often it is written to, I'd say you might be facing a race condition when trying to modify the file without damaging the log. You could always try writing a service to monitor the file size and, once it reaches a certain point, lock the file, duplicate it, clear the whole thing and close it. Then store the data in another file whose size your service can control easily. Alternatively, you could see if the external service has an option for logging to a database, which would make it pretty simple to roll out the oldest data.
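You could use a FileSystemWatcher to monitor the file:
FileSystemWatcher logWatcher = new FileSystemWatcher();
logWatcher.Path = @"c:\";              // Path must be a directory, not a file
logWatcher.Filter = "example.log";     // the file to watch inside that directory
logWatcher.Changed += logWatcher_Changed;
logWatcher.EnableRaisingEvents = true; // no events are raised until this is set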
Then, when the event is raised, you can use a StreamReader to read the file:
private void logWatcher_Changed(object sender, FileSystemEventArgs e)
{
    // FileShare.ReadWrite lets us open the log while another process has it open for writing.
    using (FileStream stream = new FileStream(e.FullPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (StreamReader readFile = new StreamReader(stream))
    {
        string line;
        while ((line = readFile.ReadLine()) != null)
        {
            // Here you pick the lines you want to keep or move to another file, so that your log stays small. Then save the file.
        }
    }
}
It's an option.
I work with a program that takes large amounts of data, turns the data into XML files, then zips those XML files for use in another program. Occasionally, during the zipping process, one or two XML files get left out. It is fairly rare, once or twice a month, but when it does happen it's a big mess. I am looking for help figuring out why the files don't get zipped and how to prevent it. The code is straightforward:
public string AddToZip(string outfile, string toCompress)
{
if (!File.Exists(toCompress)) throw new FileNotFoundException("Could not find the file to compress", toCompress);
string dir = Path.GetDirectoryName(outfile);
if(!Directory.Exists(dir))
{
Directory.CreateDirectory(dir);
}
// The program that gets this data can't handle files over
// 20 MB, so it splits it up into two or more files if it hits the
// limit.
if (File.Exists(outfile))
{
FileInfo tooBig = new FileInfo(outfile);
int converter = 1024;
float fileSize = tooBig.Length / converter; //bytes to KB
fileSize = fileSize / converter; //KB to MB
int limit = CommonTypes.Helpers.ConfigHelper.GetConfigEntryInt("zipLimit", "19");
if (fileSize >= limit)
{
outfile = MakeNewName(outfile);
}
}
using (ZipFile zf = new ZipFile(outfile))
{
zf.AddFile(toCompress,"");
zf.Save();
}
return outfile;
}
Ultimately, what I want is a check, run after the zip file is created, that sees whether any XML files weren't added to it, but stopping the problem in its tracks would be best overall. Thanks for the help.
Make sure you have that code inside a try...catch statement, and make sure that you actually do something with the exception. This would not be the first code base with this type of exception handling:
try
{
//...
}
catch { }
With code like the above, any exception thrown during your process is silently swallowed and you will never notice it.
It's hard to judge from this function alone, here's a list of things that can go wrong:
- The toCompress file can be gone by the time zf.AddFile is called (but after the Exists test). Test the return value or add exception handling to detect this.
- The zip outfile can be just below the size limit, so adding a new file can push it over the limit.
- AddToZip() may be called concurrently, which may cause adding to fail.
How is removal of the toCompress file handled? I think adding a lock around the whole AddToZip() function might also be a good idea.
This could be a timing issue. You are checking to see if outfile is too big before trying to add the toCompress file. What you should be doing is:
1. Add toCompress to outfile.
2. Check to see if adding the file made outfile too big.
3. If outfile is now too big, remove toCompress, create a new outfile, and add toCompress to the new outfile.
I suspect that you occasionally have an outfile that is just under the limit, but adding toCompress puts it over. Then the receiving program does not process outfile because it is too big.
I could be completely off base, but it is something to check.
I'm looking to create a console application that will read a file and monitor every new line, since the file is being written to by another process every .5 seconds.
How can I achieve that within a console app using .NET 4.5?
It sounds like you want a version of tail for Windows. See "Looking for a windows equivalent of the unix tail command" for discussion on that.
Otherwise, open the file with FileShare.ReadWrite so that you don't block other processes from accessing it. Seek to the end, then loop: read whatever is new, and use Thread.Sleep() or Task.Delay() to wait half a second before checking for changes again.
For example:
public static void Follow(string path)
{
// Note the FileShare.ReadWrite, allowing others to modify the file
using (FileStream fileStream = File.Open(path, FileMode.Open,
FileAccess.Read, FileShare.ReadWrite))
{
fileStream.Seek(0, SeekOrigin.End);
using (StreamReader streamReader = new StreamReader(fileStream))
{
for (;;)
{
// Substitute a different timespan if required.
Thread.Sleep(TimeSpan.FromSeconds(0.5));
// Write the output to the screen or do something different.
// If you want newlines, search the return value of "ReadToEnd"
// for Environment.NewLine.
Console.Out.Write(streamReader.ReadToEnd());
}
}
}
}
As @Sudhakar mentioned, FileSystemWatcher is useful when you want to be notified when a file updates sporadically, and polling at regular intervals is useful when you want to be constantly processing information from an always-growing file (such as a busy log file).
I'd like to add a note about efficiency. If you are concerned with the efficiency and speed of processing large files (many MB or GB), then you will want to track your position in the file as you read and process updates. For example:
// This does exactly what it looks like.
long position = GetMyLastReadPosition();
using (var file = File.Open(filename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
if (position == file.Length)
return;
file.Position = position;
using (var reader = new StreamReader(file))
{
string line;
while ((line = reader.ReadLine()) != null)
{
// Do reading.
}
position = file.Position; // Store this somewhere too.
}
}
This should allow you to avoid reprocessing any part of a file that you have already processed.
Solution 1: You can use FileSystemWatcher class
From MSDN:
Use FileSystemWatcher to watch for changes in a specified directory.
You can watch for changes in files and subdirectories of the specified
directory. You can create a component to watch files on a local
computer, a network drive, or a remote computer.
Solution 2: You can poll by creating a Timer and reading the contents of the file every 5 seconds.
I have a strange problem. My code works as follows:
1. The exe takes some data from the user.
2. It calls a web service to write the file (creating a CSV from the data) at a particular network location (say \\some-server\some-directory). The web service is hosted on the same machine as this folder (i.e. I could also change the path to c:\some-directory). The call returns after the file has been written.
3. The exe checks that the file exists; if it does, processing continues, otherwise it quits with an error.
The problem I am having is at step 3. When I try to read the file immediately after it has been written, I always get a file-not-found exception (even though the file is present). I do not get this exception when I am debugging (because stepping through the code introduces a delay) or when I call Thread.Sleep(3000) before reading the file.
This is really strange because I close the StreamWriter before the call returns to the exe. According to the documentation, Close should force a flush of the stream. It is also not related to the size of the file. I am not making async calls for writing and reading the file either; they run serially, one after another (the writing is done by the web service and the reading by the exe, but the calls are still serial).
I do not know, but it feels like there is some delay between calling Close() and the file actually being written to disk. However, this is baffling because it is not related to size at all; it happens for all file sizes. I have tried it with files of 10, 50, 100 and 200 lines of data.
Another thing I suspected was that, since I was writing this file to a network location, Windows might be optimizing the call by writing to a cache first and then to the network location. So I changed the code to write to a local drive (i.e. c:\some-directory) rather than the network location, but that resulted in the same error.
There is no error in the code (for reading or writing). As explained earlier, adding a delay makes it work fine. Some other useful information:
The exe is .Net Framework 3.5
Windows Server 2008(64 bit, 4 GB Ram)
Edit 1
File.AppendAllText() is not the correct solution, as it creates a new file if one does not exist.
Edit 2
code for writing
using (FileStream fs = new FileStream(outFileName, FileMode.Create))
{
using (StreamWriter writer = new StreamWriter(fs, Encoding.Unicode))
{
writer.WriteLine(someString);
}
}
code for reading
StreamReader rdr = new StreamReader(File.OpenRead(CsvFilePath));
string header = rdr.ReadLine();
rdr.Close();
Edit 3
used textwriter, same error
using (TextWriter writer = File.CreateText(outFileName))
{
}
Edit 4
Finally, as suggested by some users, I now check for the file in a while loop a certain number of times before throwing the file-not-found exception.
int i = 1;
while (i++ < 10)
{
bool fileExists = File.Exists(CsvFilePath);
if (!fileExists)
System.Threading.Thread.Sleep(500);
else
break;
}
So you are writing a stream to a file, then reading the file back into a stream? Do you need to write the file and then post-process it, or can you just use the source stream directly?
If you need the file, I would use a loop that keeps checking whether the file exists every second until it appears (or a silly amount of time has passed). The writer would give you an error if it couldn't write the file, so you know it will turn up eventually.
Since you're writing over a network, the best solution would be to save your file on the local system first, then copy it to the network location. This way you avoid network connection problems while writing, and you also have a backup in case of network failure.
Based on your update, try this instead:
File.WriteAllText(outFileName, someString);
string header = null;
using(StreamReader reader = new StreamReader(CsvFilePath)) {
header = reader.ReadLine();
}
Have you tried to read after disposing the writer FileStream?
Like this:
using (FileStream fs = new FileStream(outFileName, FileMode.Create))
{
using (StreamWriter writer = new StreamWriter(fs, Encoding.Unicode))
{
writer.WriteLine(someString);
}
}
using (StreamReader rdr = new StreamReader(File.OpenRead(CsvFilePath)))
{
string header = rdr.ReadLine();
}
I'm developing a small C# application that scans a log file for lines containing certain keywords and alerts the user when one of the keywords is found. This log is potentially extremely large (several gigabytes, in worst case scenario) but the only lines on the log that are relevant to me, are the ones added to the log while my application is running.
Is there a way I can capture each text line being appended to the file, without having to worry about the file content that was already present?
I already found out about the FileSystemWatcher class while searching for a solution, and while that seems great for notifying when I have new content to fetch from the log, it doesn't seem to help for telling me what was added to it.
If you keep a FileStream open in Read mode (allowing writers, of course), you should be able to initially scan through the whole file and wait at the end until the FSW notifies you that the file has been modified.
Just be careful to reset your reading thread somehow if the file is deleted, for example if the log file that you are tailing gets rolled.
Here, I knocked together an example. Run this and, while it is running, edit C:\Temp\Temp.txt in Notepad and save it:
public static void Main()
{
    var lockMe = new object();
    using (var latch = new ManualResetEvent(true))
    using (var fs = new FileStream(@"C:\Temp\Temp.txt", FileMode.OpenOrCreate, FileAccess.Read, FileShare.ReadWrite))
    using (var fsw = new FileSystemWatcher(@"C:\Temp\"))
    {
        fsw.Changed += (s, e) =>
        {
            lock (lockMe)
            {
                if (e.FullPath != @"C:\Temp\Temp.txt") return;
                latch.Set();
            }
        };
        fsw.EnableRaisingEvents = true; // without this the Changed event never fires
        using (var sr = new StreamReader(fs))
            while (true)
            {
                latch.WaitOne();
                lock (lockMe)
                {
                    String line;
                    while ((line = sr.ReadLine()) != null)
                        Console.Out.WriteLine(line);
                    latch.Reset(); // caught up; block until the watcher signals again
                }
            }
    }
}
The most efficient solution (if your application needs it) is to write a file hook driver to capture all write access to the file. That driver might tell you what bytes were changed. If you don't want to write the driver in C/C++, perhaps you can use EasyHook. EasyHook is great because, if you know the exact application that's writing to the log file, you can write a very simple user-mode hook (check its examples on CodePlex). If you don't know the name of the application, you might have to write a kernel hook (which is still easier with EasyHook).
Instead of reading the text from the file (what I assume you are doing), read the bytes of the file. If you can assume that writes to the file will always be appended, and you know the text encoding of the file, then you can just read in the bytes starting at the file size of the original file. Then convert the bytes to text using the proper encoding.
In a similar way to this question, but you'll need to have the old file size recorded. Then instead of seeking back 10 newlines, just seek back the size difference. You'll have to be careful about encodings though.
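The byte-offset approach described above could look roughly like this. It is only a sketch: `AppendedTextReader` and `ReadAppended` are made-up names, the caller is assumed to record the file size when the application starts, and a multi-byte character that is only half written at the moment of reading would be decoded incorrectly.

```csharp
using System;
using System.IO;
using System.Text;

public static class AppendedTextReader
{
    // Returns the text appended since 'offset' and advances 'offset'
    // past the bytes that were read.
    public static string ReadAppended(string path, ref long offset, Encoding encoding)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                              FileShare.ReadWrite | FileShare.Delete))
        {
            if (fs.Length < offset)
                offset = 0; // the file was truncated or rolled, so start over

            long available = fs.Length - offset;
            if (available == 0)
                return string.Empty;

            fs.Seek(offset, SeekOrigin.Begin);
            byte[] buffer = new byte[available];
            int filled = 0;
            while (filled < buffer.Length)
            {
                int n = fs.Read(buffer, filled, buffer.Length - filled);
                if (n == 0) break;
                filled += n;
            }

            offset += filled;
            return encoding.GetString(buffer, 0, filled);
        }
    }
}
```

At startup you would set `long offset = new FileInfo(path).Length;` so the existing gigabytes are skipped, then call `ReadAppended(path, ref offset, Encoding.UTF8)` whenever the FileSystemWatcher fires or a timer ticks, and split the result into lines to look for your keywords.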
My application uses "FileSystemWatcher()" to raise an event when a TXT file is created by an "X" application, and then reads its content.
The "X" application creates a file (my application detects it successfully), but it takes some time to fill it with data, so the TXT file cannot be read at creation time. I'm
looking for a way to wait until the TXT file becomes available for reading: not a static delay, but something tied to that file.
Any help? Thanks.
Create the file like this:
myfile.tmp
Then when it's finished, rename it to
myfile.txt
and have your filewatcher watch for the .txt extension
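If you control the program that produces the file, the write-then-rename idea above might be sketched like this. `RenameWhenDone` is a made-up helper, and it assumes the temporary and final names live in the same directory.

```csharp
using System.IO;

public static class RenameWhenDone
{
    // Writes the data under a .tmp name first and only gives the file its
    // final name once all the data is on disk and the file is closed.
    public static void Write(string finalPath, string contents)
    {
        string tempPath = Path.ChangeExtension(finalPath, ".tmp");
        File.WriteAllText(tempPath, contents); // file is closed when this returns

        if (File.Exists(finalPath))
            File.Delete(finalPath); // File.Move throws if the target exists

        File.Move(tempPath, finalPath);
    }
}
```

On the watching side, keep in mind that a rename shows up through FileSystemWatcher's Renamed event rather than Created, so the watcher filtering on *.txt would most likely need a Renamed handler.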
The only way I have found to do this is to put the attempt to read the file in a loop, and exit the loop when I don't get an exception. Hopefully someone else will come up with a better way...
bool FileRead = false;
while (!FileRead)
{
try
{
// code to read file, which you already know
FileRead = true;
}
catch(Exception)
{
// do nothing or optionally cause the code to sleep for a second or two
}
}
You could track the file's Changed event, and see if it's available for opening on change. If the file is still locked, just watch for the next change event.
You can open and read a locked file like this
using (var stream = new FileStream(@"c:\temp\file.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite)) {
    using (var file = new StreamReader(stream)) {
        while (!file.EndOfStream) {
            var line = file.ReadLine();
            Console.WriteLine(line);
        }
    }
}
However, make sure your file writer flushes, otherwise you may not see any changes.
The application X should lock the file until it closes it. Is application X also a .NET application and can you modify it? In that case you can simply use the FileInfo class with the proper value for FileShare (in this case FileShare.Read).
If you have no control over application X, the situation becomes a little more complex. But then you can always attempt to open the file exclusively via the same FileInfo.Open method. Provide FileShare.None in that case. It will attempt to open the file exclusively and will fail if the file is still in use. You can perform this action inside a loop until the file is closed by application X and ready to be read.
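A rough sketch of that loop, with a timeout so it cannot spin forever, might look like this. `FileReadiness` and `WaitUntilUnlocked` are made-up names, and the timeout and polling interval are values you would pick yourself.

```csharp
using System;
using System.IO;
using System.Threading;

public static class FileReadiness
{
    // Tries to open the file exclusively until it succeeds or the timeout
    // expires. Returns true once the file could be opened with FileShare.None.
    public static bool WaitUntilUnlocked(string path, TimeSpan timeout, TimeSpan pollInterval)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            try
            {
                using (new FileInfo(path).Open(FileMode.Open, FileAccess.Read, FileShare.None))
                {
                    return true; // nobody else has the file open any more
                }
            }
            catch (IOException)
            {
                // Still in use (or not created yet); give up after the deadline.
                if (DateTime.UtcNow >= deadline)
                    return false;
                Thread.Sleep(pollInterval);
            }
        }
    }
}
```

You could call it from the Created handler, for example `FileReadiness.WaitUntilUnlocked(e.FullPath, TimeSpan.FromSeconds(30), TimeSpan.FromMilliseconds(250))`, and read the file only if it returns true. There is still a small window in which another process could reopen the file between this check and your own read, so the read itself should keep its exception handling.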
We have a virtual printer for creating pdf documents, and I do something like this to access that document after it's sent to the printer:
using (FileSystemWatcher watcher = new FileSystemWatcher(folder))
{
    // Wait 1 s, then 2 s, then 3 s, stopping early once the document exists.
    for (int i = 1; i <= 3 && !File.Exists(docname); i++)
        watcher.WaitForChanged(WatcherChangeTypes.Created, i * 1000);
}
So I wait for a total of 6 seconds (some documents can take a while to print but most come very fast, hence the increasing wait time) before deciding that something has gone awry.
After this, I also read in a for loop, in just the same way that I wait for it to be created. I do this just in case the document has been created, but not released by the printer yet, which happens nearly every time.
You can use the same class to be notified when file changes.
The Changed event is raised when changes are made to the size, system attributes, last write time, last access time, or security permissions of a file or directory in the directory being monitored.
So I think you can use that event to check if file is readable and open it if it is.
If you have a DB at your disposal, I would recommend using a DB table as a queue of file names and then monitoring that instead. Nice and transactional.
You can check whether the file's size has changed, although this will require you to poll its value at some frequency.
Also, if you want to get the data faster, you can call .Flush() while writing, and make sure to .Close() the stream as soon as you finish writing to it.
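As a sketch of the size-polling idea: treat the file as finished once its size has stayed the same for a few consecutive checks. This is only a heuristic (a writer that pauses for longer than the polling window would fool it), and `SizeWatcher`, `WaitForStableSize` and their parameters are made up for this example.

```csharp
using System;
using System.IO;
using System.Threading;

public static class SizeWatcher
{
    // Polls the file size and returns true once it has been unchanged for
    // 'stableChecks' consecutive polls; returns false if 'timeout' expires first.
    public static bool WaitForStableSize(string path, TimeSpan pollInterval,
                                         int stableChecks, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        long lastSize = -1;
        int unchanged = 0;

        while (DateTime.UtcNow < deadline)
        {
            FileInfo info = new FileInfo(path); // fresh object, so Length is re-read
            long size = info.Exists ? info.Length : -1;

            if (size >= 0 && size == lastSize)
            {
                unchanged++;
                if (unchanged >= stableChecks)
                    return true;
            }
            else
            {
                unchanged = 0;
                lastSize = size;
            }
            Thread.Sleep(pollInterval);
        }
        return false;
    }
}
```

For example, `SizeWatcher.WaitForStableSize(path, TimeSpan.FromMilliseconds(500), 3, TimeSpan.FromSeconds(30))` waits until the size has not moved for about 1.5 seconds.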