Read the large text files into chunks line by line

Read the large text files into chunks line by line - c#

Suppose the following lines in text file to which i have to read
INFO 2014-03-31 00:26:57,829 332024549ms Service1 startmethod - FillPropertyColor end
INFO 2014-03-31 00:26:57,829 332024549ms Service1 getReports_Dataset - getReports_Dataset started
INFO 2014-03-31 00:26:57,829 332024549ms Service1 cheduledGeneration - SwitchScheduledGeneration start
INFO 2014-03-31 00:26:57,829 332024549ms Service1 cheduledGeneration - SwitchScheduledGeneration limitId, subscriptionId, limitPeriod, dtNextScheduledDate,shoplimittype0, 0, , 3/31/2014 12:26:57 AM,0
I use the FileStream method to read the text file because the text file size having size over 1 GB. I have to read the files into chunks like initially in first run of program this would read two lines i.e. up to "getReports_Dataset started of second line". In next run it should read from 3rd line. I did the code but unable to get desired output.Problem is that my code doesn't give the exact chunk from where i have to start read text in next run. And second problem is while reading text lines .. don't give a complete line..i.e. some part is missing in lines. Following code:
readPosition = getLastReadPosition();
using (FileStream fStream = new FileStream(logFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (System.IO.StreamReader rdr = new System.IO.StreamReader(fStream))
{
rdr.BaseStream.Seek(readPosition, SeekOrigin.Begin);
while (numCharCount > 0)
{
int numChars = rdr.ReadBlock(block, 0, block.Length);
string blockString = new string(block);
lines = blockString.Split(Convert.ToChar('\r'));
lines[0] = fragment + lines[0];
fragment = lines[lines.Length - 1];
foreach (string line in lines)
{
lstTextLog.Add(line);
if (lstTextLog.Contains(fragment))
{
lstTextLog.Remove(fragment);
}
numProcessedChar++;
}
numCharCount--;
}
SetLastPosition(numProcessedChar, logFilePath);
}

If you want to read a file line-by-line, do this:
foreach (string line in File.ReadLines("filename"))
{
// process line here
}
If you really must read a line and save the position, you need to save the last line number read, rather than the stream position. For example:
int lastLineRead = getLastLineRead();
string nextLine = File.ReadLines("filename").Skip(lastLineRead).FirstOrDefault();
if (nextLine != null)
{
lastLineRead++;
SetLastPosition(lastLineRead, logFilePath);
}
The reason you can't do it by saving the base stream position is because StreamReader reads a large buffer full of data from the base stream, which moves the file pointer forward by the buffer size. StreamReader then satisfies read requests from that buffer until it has to read the next buffer full. For example, say you open a StreamReader and ask for a single character. Assuming that it has a buffer size of 4 kilobytes, StreamReader does essentially this:
if (buffer is empty)
{
read buffer (4,096 bytes) from base stream
buffer_position = 0;
}
char c = buffer[buffer_position];
buffer_position++; // increment position for next read
return c;
Now, if you ask for the base stream's position, it's going to report that the position is at 4096, even though you've only read one character from the StreamReader.

Related

How can i read the end of a file in c#?

i searched in stackoverflow and got one way but this method only let me to write word by word in the console. My goal is to get the end of my file but get the complete result not char by char.
This code only show me char by char the end of my file:
using (var reader = new StreamReader("file.dll")
{
if (reader.BaseStream.Length > 1024)
{
reader.BaseStream.Seek(-1024, SeekOrigin.End);
}
string line;
while ((line = reader.ReadLine()) != null)
{
Console.WriteLine(line);
Console.ReadKey();
}
}
I was trying to get something like this, it's c++ but i was trying to get the same result in c#.
QFile *archivo;
archivo = new QFile();
archivo->setFileName("file.dll");
archivo->open(QFile::ReadOnly);
archivo->seek(archivo->size() - 1024);
trama = archivo->read(1024);
It's possible to get the complete result of the end of my file in c#?

If the file is line-delimited text file, you can use ReadAllLines.
string[] lines = System.IO.File.ReadAllLines("file.txt");
If it's a binary file, you can use ReadAllBytes. Shocker, I know.
byte[] data = System.IO.File.ReadAllBytes("file.dll");
if you want to be able to seek first (e.g. if you want only the last 1024 bytes of the file) you can use the stream's Read method. Again, crazy.
reader.BaseStream.Seek(-1024, SeekOrigin.End);
var chars = new char[1024];
reader.Read(chars, 0, 1024);
And before you ask, you can convert the characters to a string by passing them to the constructor:
char[] chars = new char[1024];
string s = new string(chars);
Console.WriteLine(s);
Not sure what it'll look like, since you're reading characters from a binary file, but good luck. My guess is you should be reading bytes instead though:
reader.BaseStream.Seek(-1024, SeekOrigin.End);
var bytes = new byte[1024];
reader.BaseStream.Read(bytes, 0, 1024);
(Notice you don't even need the StreamReader, since the FileStream (your base stream) exposes the Read method you need).

Reading a complete text file using for loop

I am trying to read a text file using a for loop that runs for a 100 times.
StreamReader reader = new StreamReader("client.txt");
for (int i=0;i<=100;i++)
{
reader.readline();
}
Now this works fine if the text file has 100 lines but not if lets say 700. So I want the loop to run for 100 times but read "1%" of the file in each run.How would i do that?

If file size is not too large you can:
string[] lines = File.ReadAllLines("client.txt");
or
string text = File.ReadAllText("client.txt");
Reading 1% at a time is a bit tricky, I'd go with the approach of reading line by line:
var filename = "client.txt";
var info = new FileInfo(filename);
var text = new StringBuilder();
using (var stream = new FileStream(filename, FileMode.Open))
using (var reader = new StreamReader(stream))
{
while (!reader.EndOfStream)
{
text.AppendLine(reader.ReadLine());
var progress = Convert.ToDouble(stream.Position) * 100 / info.Length;
Console.WriteLine(progress);
}
}
var result = text.ToString();
But please notice, the progress will not be very accurate because StreamReader.ReadLine (and equivalently ReadLineAsync) will often read more than just a single line - it basically reads into a buffer and then interprets that buffer. That's much more efficient than reading a single byte at a time, but it does mean that the stream will have advanced further than it strictly speaking needs to.

How to read a log file which is hourly updated in c#?

I'm trying to write a console app in C# which reads a log file. The problem that i'm facing is that this log file is updated every 1 hour so for example, if I had 10 lines in the beginning and afterwards 12, in my second read attempt i will have to read only the 2 newly added lines.
Can you suggest me a way to do this efficiently (without the need to read all the lines again because the log file usually has 5000+ lines)?

First of all you can use FileSystemWatcher to have notifications after file changed.
Morover you can use FileStream and Seek function to ready only new added lines. On http://www.codeproject.com/Articles/7568/Tail-NET there is an example with Thread.Sleep:
using ( StreamReader reader = new StreamReader(new FileStream(fileName,
FileMode.Open, FileAccess.Read, FileShare.ReadWrite)) )
{
//start at the end of the file
long lastMaxOffset = reader.BaseStream.Length;
while ( true )
{
System.Threading.Thread.Sleep(100);
//if the file size has not changed, idle
if ( reader.BaseStream.Length == lastMaxOffset )
continue;
//seek to the last max offset
reader.BaseStream.Seek(lastMaxOffset, SeekOrigin.Begin);
//read out of the file until the EOF
string line = "";
while ( (line = reader.ReadLine()) != null )
Console.WriteLine(line);
//update the last max offset
lastMaxOffset = reader.BaseStream.Position;
}
}

Why StreamReader.EndOfStream property change the BaseStream.Position value

I wrote this small program which reads every 5th character from Random.txt
In random.txt I have one line of text: ABCDEFGHIJKLMNOPRST. I got the expected result:
Position of A is 0
Position of F is 5
Position of K is 10
Position of P is 15
Here is the code:
static void Main(string[] args)
{
StreamReader fp;
int n;
fp = new StreamReader("d:\\RANDOM.txt");
long previousBSposition = fp.BaseStream.Position;
//In this point BaseStream.Position is 0, as expected
n = 0;
while (!fp.EndOfStream)
{
//After !fp.EndOfStream were executed, BaseStream.Position is changed to 19,
//so I have to reset it to a previous position :S
fp.BaseStream.Seek(previousBSposition, SeekOrigin.Begin);
Console.WriteLine("Position of " + Convert.ToChar(fp.Read()) + " is " + fp.BaseStream.Position);
n = n + 5;
fp.DiscardBufferedData();
fp.BaseStream.Seek(n, SeekOrigin.Begin);
previousBSposition = fp.BaseStream.Position;
}
}
My question is, why after line while (!fp.EndOfStream) BaseStream.Position is changed to 19, e.g. end of a BaseStream. I expected, obviously wrong, that BaseStream.Position will stay the same when I call EndOfStream check?
Thanks.

Thre only certain way to find out whether a Stream is at its end is to actually read something from it and check whether the return value is 0. (StreamReader has another way – checking its internal buffer, but you correctly don't let it do that by calling DiscardBufferedData.)
So, EndOfStream has to read at least one byte from the base stream. And since reading byte by byte is inefficient, it reads more. That's the reason why the call to EndOfStream changes the position to the end (it woulnd't be the end of file for bigger files).
It seems you don't actually need to use StreamReader, so you should use Stream (or specifically FileStream) directly:
using (Stream fp = new FileStream(#"d:\RANDOM.txt", FileMode.Open))
{
int n = 0;
while (true)
{
int read = fp.ReadByte();
if (read == -1)
break;
char c = (char)read;
Console.WriteLine("Position of {0} is {1}.", c, fp.Position);
n += 5;
fp.Position = n;
}
}
(I'm not sure what does setting the position beyond the end of file do in this situation, you may need to add a check for that.)

The base stream's Position property refers to the position of the last read byte in the buffer, not the actual position of the StreamReader's cursor.

You are right and I could reproduce your issue as well, anyway according to (MSDN: Read Text from a File) the proper way to read a text file with a StreamReader is the following, not yours (this also always closes and disposes the stream by using a using block):
try
{
// Create an instance of StreamReader to read from a file.
// The using statement also closes the StreamReader.
using (StreamReader sr = new StreamReader("TestFile.txt"))
{
String line;
// Read and display lines from the file until the end of
// the file is reached.
while ((line = sr.ReadLine()) != null)
{
Console.WriteLine(line);
}
}
}
catch (Exception e)
{
// Let the user know what went wrong.
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}

Why does FileStream.Position increment in multiples of 1024?

I have a text file that I want to read line by line and record the position in the text file as I go. After reading any line of the file the program can exit, and I need to resume reading the file at the next line when it resumes.
Here is some sample code:
using (FileStream fileStream = new FileStream("Sample.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
fileStream.Seek(GetLastPositionInFile(), SeekOrigin.Begin);
using (StreamReader streamReader = new StreamReader(fileStream))
{
while (!streamReader.EndOfStream)
{
string line = streamReader.ReadLine();
DoSomethingInteresting(line);
SaveLastPositionInFile(fileStream.Position);
if (CheckSomeCondition())
{
break;
}
}
}
}
When I run this code, the value of fileStream.Position does not change after reading each line, it only advances after reading a couple of lines. When it does change, it increases in multiples of 1024. Now I assume that there is some buffering going on under the covers, but how can I record the exact position in the file?

It's not FileStream that's responsible - it's StreamReader. It's reading 1K at a time for efficiency.
Keeping track of the effective position of the stream as far as the StreamReader is concerned is tricky... particularly as ReadLine will discard the line ending, so you can't accurately reconstruct the original data (it could have ended with "\n" or "\r\n"). It would be nice if StreamReader exposed something to make this easier (I'm pretty sure it could do so without too much difficulty) but I don't think there's anything in the current API to help you :(
By the way, I would suggest that instead of using EndOfStream, you keep reading until ReadLine returns null. It just feels simpler to me:
string line;
while ((line = reader.ReadLine()) != null)
{
// Process the line
}

I would agree with Stefan M., it is probably the buffering which is causing the Position to be incorrect. If it is just the number of characters that you have read that you want to track than I suggest you do it yourself, as in:
using(FileStream fileStream = new FileStream("Sample.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
fileStream.Seek(GetLastPositionInFile(), SeekOrigin.Begin);
/**Int32 position = 0;**/
using(StreamReader streamReader = new StreamReader(fileStream))
{
while(!streamReader.EndOfStream)
{
string line = streamReader.ReadLine();
/**position += line.Length;**/
DoSomethingInteresting(line);
/**SaveLastPositionInFile(position);**/
if(CheckSomeCondition())
{
break;
}
}
}
}

Provide that your file is not too big, why not read the whole thing in big chuncks and then manipulate the string - probably faster than the stop and go i/o.
For example,
//load entire file
StreamReader srFile = new StreamReader(strFileName);
StringBuilder sbFileContents = new StringBuilder();
char[] acBuffer = new char[32768];
while (srFile.ReadBlock(acBuffer, 0, acBuffer.Length)
> 0)
{
sbFileContents.Append(acBuffer);
acBuffer = new char[32768];
}
srFile.Close();

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Read the large text files into chunks line by line - c#

Related

How can i read the end of a file in c#?

Reading a complete text file using for loop

How to read a log file which is hourly updated in c#?

Why StreamReader.EndOfStream property change the BaseStream.Position value

Why does FileStream.Position increment in multiples of 1024?

Categories

Resources