I'm trying to write a console app in C# which reads a log file. The problem I'm facing is that this log file is updated every hour, so if it had 10 lines to begin with and later 12, on my second read attempt I should read only the 2 newly added lines.
Can you suggest a way to do this efficiently (without having to read all the lines again, since the log file usually has 5000+ lines)?
First of all, you can use FileSystemWatcher to get notified when the file changes.
Moreover, you can use FileStream and Seek to read only the newly added lines. At http://www.codeproject.com/Articles/7568/Tail-NET there is an example using Thread.Sleep:
using (StreamReader reader = new StreamReader(new FileStream(fileName,
    FileMode.Open, FileAccess.Read, FileShare.ReadWrite)))
{
    // start at the end of the file
    long lastMaxOffset = reader.BaseStream.Length;

    while (true)
    {
        System.Threading.Thread.Sleep(100);

        // if the file size has not changed, idle
        if (reader.BaseStream.Length == lastMaxOffset)
            continue;

        // seek to the last max offset
        reader.BaseStream.Seek(lastMaxOffset, SeekOrigin.Begin);

        // read out of the file until the EOF
        string line = "";
        while ((line = reader.ReadLine()) != null)
            Console.WriteLine(line);

        // update the last max offset
        lastMaxOffset = reader.BaseStream.Position;
    }
}
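As a rough sketch of the FileSystemWatcher approach mentioned above (the wiring shown here is an assumption, not part of the original example), you can let the watcher trigger the read instead of polling with Thread.Sleep:

long lastMaxOffset = new FileInfo(fileName).Length;

FileSystemWatcher watcher = new FileSystemWatcher(Path.GetDirectoryName(fileName), Path.GetFileName(fileName));
watcher.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Size;
watcher.Changed += (sender, e) =>
{
    using (StreamReader reader = new StreamReader(new FileStream(fileName,
        FileMode.Open, FileAccess.Read, FileShare.ReadWrite)))
    {
        // skip everything that was already read
        reader.BaseStream.Seek(lastMaxOffset, SeekOrigin.Begin);

        string line;
        while ((line = reader.ReadLine()) != null)
            Console.WriteLine(line);

        lastMaxOffset = reader.BaseStream.Position;
    }
};
watcher.EnableRaisingEvents = true;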
I have a little problem. My code in Visual Studio:
file = new StreamReader("D:\\BaseList.txt");
string line;
while ((line = file.ReadLine()) != null)
{
    listBox1.Items.Add(line);
}
file.Close(); // 1
file = new StreamReader("D:\\Baza3.txt"); // 2
I read all the lines in the file and I would like to read it once more from the beginning. Do I have to close the stream and reload the file into the stream (lines marked 1 and 2)?
Is there a method which allows me to set the stream back to the beginning of my file without using these marked lines?
You can reset the position of the base stream like this:
streamReader.BaseStream.Position = 0;
You can do that only if the base stream is seekable (myStream.CanSeek == true). This is true in your case, because you create the StreamReader with a path string.
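Note that StreamReader keeps its own internal buffer, so after rewinding the base stream you should also discard that buffer with DiscardBufferedData(); otherwise the reader may keep serving stale data. A minimal sketch using the file from the question:

using (StreamReader streamReader = new StreamReader("D:\\BaseList.txt"))
{
    string line;
    while ((line = streamReader.ReadLine()) != null)
    {
        listBox1.Items.Add(line);   // first pass
    }

    // rewind the underlying stream and clear the reader's internal buffer
    streamReader.BaseStream.Position = 0;
    streamReader.DiscardBufferedData();

    while ((line = streamReader.ReadLine()) != null)
    {
        // second pass over the same file
    }
}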
Try setting the BaseStream Position to 0, or copying the contents to a MemoryStream before you actually start reading it.
Check out this thread:
Return StreamReader to Beginning
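A minimal sketch of the MemoryStream approach (the path is taken from the question); since a MemoryStream is always seekable, you can rewind it as often as you like:

using (FileStream fs = File.OpenRead("D:\\BaseList.txt"))
using (MemoryStream ms = new MemoryStream())
{
    fs.CopyTo(ms);      // copy the whole file into memory
    ms.Position = 0;

    StreamReader reader = new StreamReader(ms);
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // first pass
    }

    // rewind for the second pass
    ms.Position = 0;
    reader.DiscardBufferedData();
    while ((line = reader.ReadLine()) != null)
    {
        // second pass
    }
}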
I wrote a Winform application that reads in each line of a text file, does a search and replace using RegEx on the line, and then it writes back out to a new file. I chose the "line by line" method as some of the files are just too large to load into memory.
I am using the BackgroundWorker object so the UI can be updated with the progress of the job. Below is the code (with parts omitted for brevity) that handles the reading and then outputting of the lines in the file.
public void bgWorker_DoWork(object sender, DoWorkEventArgs e)
{
    // Details of obtaining file paths omitted for brevity
    int totalLineCount = File.ReadLines(inputFilePath).Count();

    using (StreamReader sr = new StreamReader(inputFilePath))
    {
        int currentLine = 0;
        String line;

        while ((line = sr.ReadLine()) != null)
        {
            currentLine++;

            // Match and replace contents of the line
            // omitted for brevity

            if (currentLine % 100 == 0)
            {
                int percentComplete = (currentLine * 100 / totalLineCount);
                bgWorker.ReportProgress(percentComplete);
            }

            using (FileStream fs = new FileStream(outputFilePath, FileMode.Append, FileAccess.Write))
            using (StreamWriter sw = new StreamWriter(fs))
            {
                sw.WriteLine(line);
            }
        }
    }
}
Some of the files I am processing are very large (8 GB with 132 million rows). The process takes a very long time (a 2 GB file took about 9 hours to complete). It looks to be working at around 58 KB/sec. Is this expected or should the process be going faster?
Don't close and re-open the output file on every loop iteration; open the writer once, outside the loop. This should improve performance, as the writer no longer needs to seek to the end of the file on every single iteration.
Also, File.ReadLines(inputFilePath).Count(); causes you to read your input file twice and could be a big chunk of time. Instead of a percentage based on line count, calculate the percentage based on stream position.
public void bgWorker_DoWork(object sender, DoWorkEventArgs e)
{
    // Details of obtaining file paths omitted for brevity

    using (StreamWriter sw = new StreamWriter(outputFilePath, true)) // You can use this constructor instead of FileStream, it does the same operation.
    using (StreamReader sr = new StreamReader(inputFilePath))
    {
        int lastPercentage = 0;
        String line;

        while ((line = sr.ReadLine()) != null)
        {
            // Match and replace contents of the line
            // omitted for brevity

            // Position and Length are longs, not ints, so we need to cast at the end.
            int currentPercentage = (int)(sr.BaseStream.Position * 100L / sr.BaseStream.Length);
            if (lastPercentage != currentPercentage)
            {
                bgWorker.ReportProgress(currentPercentage);
                lastPercentage = currentPercentage;
            }

            sw.WriteLine(line);
        }
    }
}
Other than that, you will need to show what "Match and replace contents of the line, omitted for brevity" does, as I would guess that is where your slowness comes from. Run a profiler on your code, see where it is taking the most time, and focus your efforts there.
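If a profiler isn't handy, a rough sketch with Stopwatch can at least show whether the time goes into the regex work or the I/O (ApplyReplacements is a placeholder name for the omitted match-and-replace step, not part of the original code):

var regexTime = new System.Diagnostics.Stopwatch();
var ioTime = new System.Diagnostics.Stopwatch();

while ((line = sr.ReadLine()) != null)
{
    regexTime.Start();
    line = ApplyReplacements(line); // placeholder for the omitted match-and-replace step
    regexTime.Stop();

    ioTime.Start();
    sw.WriteLine(line);
    ioTime.Stop();
}

Console.WriteLine("regex: " + regexTime.Elapsed + ", I/O: " + ioTime.Elapsed);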
Follow this process:
Instantiate reader and writer
Loop through lines, doing the next two steps
In loop change line
In loop write changed line
Dispose of reader and writer
This should be a LOT faster than instantiating the writer on each line loop, as you have done.
I will append this with a code sample shortly. Looks like someone else beat me to the punch on code samples - see Scott Chamberlain's answer.
Remove the File.ReadLines(inputFilePath).Count() call at the top, as it reads through the whole file just to get the number of lines.
Suppose the following lines in the text file which I have to read:
INFO 2014-03-31 00:26:57,829 332024549ms Service1 startmethod - FillPropertyColor end
INFO 2014-03-31 00:26:57,829 332024549ms Service1 getReports_Dataset - getReports_Dataset started
INFO 2014-03-31 00:26:57,829 332024549ms Service1 cheduledGeneration - SwitchScheduledGeneration start
INFO 2014-03-31 00:26:57,829 332024549ms Service1 cheduledGeneration - SwitchScheduledGeneration limitId, subscriptionId, limitPeriod, dtNextScheduledDate,shoplimittype0, 0, , 3/31/2014 12:26:57 AM,0
I use FileStream to read the text file because the file is over 1 GB in size. I have to read the file in chunks: on the first run the program should read two lines, i.e. up to "getReports_Dataset started" on the second line, and on the next run it should read from the 3rd line. I wrote the code but can't get the desired output. The first problem is that my code doesn't give the exact position from which I have to start reading on the next run. The second problem is that while reading the text lines, it doesn't give complete lines, i.e. some part of a line is missing. Here is the code:
readPosition = getLastReadPosition();

using (FileStream fStream = new FileStream(logFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (System.IO.StreamReader rdr = new System.IO.StreamReader(fStream))
{
    rdr.BaseStream.Seek(readPosition, SeekOrigin.Begin);

    while (numCharCount > 0)
    {
        int numChars = rdr.ReadBlock(block, 0, block.Length);
        string blockString = new string(block);
        lines = blockString.Split(Convert.ToChar('\r'));
        lines[0] = fragment + lines[0];
        fragment = lines[lines.Length - 1];

        foreach (string line in lines)
        {
            lstTextLog.Add(line);
            if (lstTextLog.Contains(fragment))
            {
                lstTextLog.Remove(fragment);
            }
            numProcessedChar++;
        }
        numCharCount--;
    }
    SetLastPosition(numProcessedChar, logFilePath);
}
If you want to read a file line-by-line, do this:
foreach (string line in File.ReadLines("filename"))
{
    // process line here
}
If you really must read a line and save the position, you need to save the last line number read, rather than the stream position. For example:
int lastLineRead = getLastLineRead();
string nextLine = File.ReadLines("filename").Skip(lastLineRead).FirstOrDefault();
if (nextLine != null)
{
    lastLineRead++;
    SetLastPosition(lastLineRead, logFilePath);
}
The reason you can't do it by saving the base stream position is because StreamReader reads a large buffer full of data from the base stream, which moves the file pointer forward by the buffer size. StreamReader then satisfies read requests from that buffer until it has to read the next buffer full. For example, say you open a StreamReader and ask for a single character. Assuming that it has a buffer size of 4 kilobytes, StreamReader does essentially this:
if (buffer is empty)
{
    read buffer (4,096 bytes) from base stream
    buffer_position = 0;
}
char c = buffer[buffer_position];
buffer_position++; // increment position for next read
return c;
Now, if you ask for the base stream's position, it's going to report that the position is at 4096, even though you've only read one character from the StreamReader.
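A minimal sketch (file name assumed) that makes the mismatch visible: after reading a single short line, the base stream's position has already jumped ahead by a full buffer:

using (FileStream fs = new FileStream("Sample.txt", FileMode.Open, FileAccess.Read))
using (StreamReader reader = new StreamReader(fs))
{
    string firstLine = reader.ReadLine();
    Console.WriteLine("Line length:     " + firstLine.Length);
    Console.WriteLine("Stream position: " + fs.Position); // typically a full buffer (e.g. 1024), not firstLine.Length
}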
A feature of this program basically needs to tail a log file and forward lines which are newly written to it. I believe I am doing this correctly by creating the FileStream with the FileShare.ReadWrite option as the stream for StreamReader, as described in several other answers here and here.
But when I run the program it prevents some processes from writing to the file. Using Process Monitor I can see that my program is opening the file with R/W rights instead of just Read.
reader = new StreamReader(new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite));
{
    // start at the end of the file
    long lastMaxOffset = reader.BaseStream.Length;

    while (true)
    {
        System.Threading.Thread.Sleep(Properties.Settings.Default.pauseInMilliseconds);

        // if the file size has not changed, keep idling
        if (reader.BaseStream.Length == lastMaxOffset)
            continue;

        // handle if the file contents have been cleared
        if (reader.BaseStream.Length < lastMaxOffset)
        {
            lastMaxOffset = 0;
            eventLogger.WriteEntry("LogChipper target file was reset, starting from beginning", EventLogEntryType.Information, 0);
        }

        // seek to the last max offset
        reader.BaseStream.Seek(lastMaxOffset, SeekOrigin.Begin);

        // read out of the file until the EOF
        string line = "";
        while ((line = reader.ReadLine()) != null)
            syslogForwarder.Send(line);

        // update the last max offset
        lastMaxOffset = reader.BaseStream.Position;

        // block if the service is paused or is shutting down
        pause.WaitOne();
    }
}
Is there something else I'm doing in that block which is holding the file open? I'm open to trying different approaches (e.g. FileSystemWatcher) if that would be better...
I have a text file that I want to read line by line, recording my position in the file as I go. After reading any line of the file the program may exit, and it needs to resume reading at the next line when it starts again.
Here is some sample code:
using (FileStream fileStream = new FileStream("Sample.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    fileStream.Seek(GetLastPositionInFile(), SeekOrigin.Begin);

    using (StreamReader streamReader = new StreamReader(fileStream))
    {
        while (!streamReader.EndOfStream)
        {
            string line = streamReader.ReadLine();
            DoSomethingInteresting(line);
            SaveLastPositionInFile(fileStream.Position);

            if (CheckSomeCondition())
            {
                break;
            }
        }
    }
}
When I run this code, the value of fileStream.Position does not change after reading each line, it only advances after reading a couple of lines. When it does change, it increases in multiples of 1024. Now I assume that there is some buffering going on under the covers, but how can I record the exact position in the file?
It's not FileStream that's responsible - it's StreamReader. It's reading 1K at a time for efficiency.
Keeping track of the effective position of the stream as far as the StreamReader is concerned is tricky... particularly as ReadLine will discard the line ending, so you can't accurately reconstruct the original data (it could have ended with "\n" or "\r\n"). It would be nice if StreamReader exposed something to make this easier (I'm pretty sure it could do so without too much difficulty) but I don't think there's anything in the current API to help you :(
By the way, I would suggest that instead of using EndOfStream, you keep reading until ReadLine returns null. It just feels simpler to me:
string line;
while ((line = reader.ReadLine()) != null)
{
    // Process the line
}
I would agree with Stefan M., it is probably the buffering which is causing the Position to be incorrect. If it is just the number of characters that you have read that you want to track, then I suggest you do it yourself, as in:
using (FileStream fileStream = new FileStream("Sample.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    fileStream.Seek(GetLastPositionInFile(), SeekOrigin.Begin);
    /**Int32 position = 0;**/
    using (StreamReader streamReader = new StreamReader(fileStream))
    {
        while (!streamReader.EndOfStream)
        {
            string line = streamReader.ReadLine();
            /**position += line.Length;**/
            DoSomethingInteresting(line);
            /**SaveLastPositionInFile(position);**/
            if (CheckSomeCondition())
            {
                break;
            }
        }
    }
}
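One caveat with this approach: line.Length does not include the line terminator that ReadLine strips, so if the saved value is later used as an offset for fileStream.Seek it will drift. A hedged adjustment, assuming the file uses "\r\n" line endings and a single-byte encoding:

position += line.Length + 2;   // +2 for the "\r\n" removed by ReadLine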
Provided that your file is not too big, why not read the whole thing in big chunks and then manipulate the string - probably faster than the stop-and-go I/O.
For example,
//load entire file
StreamReader srFile = new StreamReader(strFileName);
StringBuilder sbFileContents = new StringBuilder();
char[] acBuffer = new char[32768];
int charsRead;
while ((charsRead = srFile.ReadBlock(acBuffer, 0, acBuffer.Length)) > 0)
{
    // append only the characters actually read, so the final partial block is not padded
    sbFileContents.Append(acBuffer, 0, charsRead);
}
srFile.Close();
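If the whole file fits in memory anyway, the framework can do the same thing in a single call:

string sFileContents = File.ReadAllText(strFileName);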