Why does FileStream.Position increment in multiples of 1024? - c#

I have a text file that I want to read line by line and record the position in the text file as I go. After reading any line of the file the program can exit, and I need to resume reading the file at the next line when it resumes.
Here is some sample code:
using (FileStream fileStream = new FileStream("Sample.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    fileStream.Seek(GetLastPositionInFile(), SeekOrigin.Begin);
    using (StreamReader streamReader = new StreamReader(fileStream))
    {
        while (!streamReader.EndOfStream)
        {
            string line = streamReader.ReadLine();
            DoSomethingInteresting(line);
            SaveLastPositionInFile(fileStream.Position);
            if (CheckSomeCondition())
            {
                break;
            }
        }
    }
}
When I run this code, the value of fileStream.Position does not change after each line is read; it only advances after a couple of lines have been read. When it does change, it increases in multiples of 1024. I assume that there is some buffering going on under the covers, but how can I record the exact position in the file?

It's not FileStream that's responsible - it's StreamReader. It's reading 1K at a time for efficiency.
Keeping track of the effective position of the stream as far as the StreamReader is concerned is tricky... particularly as ReadLine will discard the line ending, so you can't accurately reconstruct the original data (it could have ended with "\n" or "\r\n"). It would be nice if StreamReader exposed something to make this easier (I'm pretty sure it could do so without too much difficulty) but I don't think there's anything in the current API to help you :(
By the way, I would suggest that instead of using EndOfStream, you keep reading until ReadLine returns null. It just feels simpler to me:
string line;
while ((line = reader.ReadLine()) != null)
{
    // Process the line
}
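
One way to sidestep the StreamReader buffering described above is to skip StreamReader and split lines at the byte level, so the offset is always known. The following is only a minimal sketch: the class and method names are made up for illustration, and it assumes an encoding such as ASCII or UTF-8 in which the byte 0x0A only ever means a line feed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class PositionTrackingReader
{
    // Hypothetical helper: yields each line together with the byte offset
    // of the first byte after that line's terminator.
    public static IEnumerable<(string Line, long NextOffset)> ReadLines(
        Stream stream, long startOffset, Encoding encoding)
    {
        stream.Seek(startOffset, SeekOrigin.Begin);
        long offset = startOffset;
        var lineBytes = new List<byte>();
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            offset++;
            if (b == '\n')
            {
                // Drop a trailing '\r' so both "\r\n" and "\n" endings work.
                if (lineBytes.Count > 0 && lineBytes[lineBytes.Count - 1] == (byte)'\r')
                    lineBytes.RemoveAt(lineBytes.Count - 1);
                yield return (encoding.GetString(lineBytes.ToArray()), offset);
                lineBytes.Clear();
            }
            else
            {
                lineBytes.Add((byte)b);
            }
        }
        // Final line without a terminator.
        if (lineBytes.Count > 0)
            yield return (encoding.GetString(lineBytes.ToArray()), offset);
    }
}
```

In the question's loop, the NextOffset of each processed line would be the value passed to SaveLastPositionInFile, and the saved value would be passed back as startOffset on the next run.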

I would agree with Stefan M.: it is probably the buffering that is causing the Position to be incorrect. If it is just the number of characters you have read that you want to track, then I suggest you do it yourself, as in:
using (FileStream fileStream = new FileStream("Sample.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    fileStream.Seek(GetLastPositionInFile(), SeekOrigin.Begin);
    Int32 position = 0; // characters read so far in this run
    using (StreamReader streamReader = new StreamReader(fileStream))
    {
        while (!streamReader.EndOfStream)
        {
            string line = streamReader.ReadLine();
            // Counts characters, not bytes, and excludes the line ending that ReadLine strips.
            position += line.Length;
            DoSomethingInteresting(line);
            SaveLastPositionInFile(position);
            if (CheckSomeCondition())
            {
                break;
            }
        }
    }
}

Provided that your file is not too big, why not read the whole thing in big chunks and then manipulate the string? That is probably faster than stop-and-go I/O.
For example,
// Load the entire file
StringBuilder sbFileContents = new StringBuilder();
char[] acBuffer = new char[32768];
using (StreamReader srFile = new StreamReader(strFileName))
{
    int charsRead;
    while ((charsRead = srFile.ReadBlock(acBuffer, 0, acBuffer.Length)) > 0)
    {
        // Append only the characters actually read; the last block is usually shorter than the buffer.
        sbFileContents.Append(acBuffer, 0, charsRead);
    }
}

Related

C# - Read bytes from file from a specific string

I'm trying to parse a crg-file in C#. The file is mixed with plain text and binary data. The first section of the file contains plain text while the rest of the file is binary (lots of floats), here's an example:
$
$ROAD_CRG
reference_line_start_u = 100
reference_line_end_u = 120
$
$KD_DEFINITION
#:KRBI
U:reference line u,m,730.000,0.010
D:reference line phi,rad
D:long section 1,m
D:long section 2,m
D:long section 3,m
...
$
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
�#z����RA����\�l
...
I know I can read bytes starting at a specific offset but how do I find out which byte to start from? The last row before the binary section will always contain at least four dollar signs "$$$$". Here's what I've got so far:
using var fs = new FileStream(@"crg_sample.crg", FileMode.Open, FileAccess.Read);
var startByte = ??; // How to find out where to start?
using (BinaryReader reader = new BinaryReader(fs))
{
    reader.BaseStream.Seek(startByte, SeekOrigin.Begin);
    var f = reader.ReadSingle();
    Debug.WriteLine(f);
}
When you have a mixture of text data and binary data, you need to treat everything as binary. This means you should be using raw Stream access, or something similar, and using binary APIs to look through the text data (often looking for CR, LF, or CRLF bytes as sentinels, although it sounds like in your case you could just look for the $$$$ using binary APIs, then decode the entire block before it, and scan forwards). When you think you have an entire line, you can use Encoding to parse it, the most convenient API being encoding.GetString(). When you've finished looking through the text data as binary, you can continue parsing the binary data, again using the binary API. I would usually recommend against BinaryReader here too, because frankly it doesn't gain you much over the more direct APIs. The other problem you might want to think about is CPU endianness, but assuming that isn't a problem, BitConverter.ToSingle() may be your friend.
If the data is modest in size, you may find it easiest to use byte[] for the data; either via File.ReadAllBytes, or by renting an oversized byte[] from the array-pool, and loading it from a FileStream. The Stream API is awkward for this kind of scenario, because once you've looked at data: it has gone - so you need to maintain your own back-buffers. The pipelines API is ideal for this, when dealing with large data, but is an advanced topic.
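
As a rough illustration of the byte[] approach described above, the sketch below searches the raw bytes for the $$$$ marker and returns the offset just past the end of that line. The class and method names are invented for this example, and it assumes the header is ASCII-compatible and that the marker line ends with a line feed.

```csharp
using System;
using System.IO;
using System.Text;

static class CrgSplitter
{
    // Hypothetical helper: returns the offset of the first byte after the
    // line containing "$$$$", or -1 if no complete marker line is found.
    public static int FindBinaryStart(byte[] data)
    {
        byte[] marker = Encoding.ASCII.GetBytes("$$$$");
        for (int i = 0; i + marker.Length <= data.Length; i++)
        {
            bool match = true;
            for (int j = 0; j < marker.Length; j++)
            {
                if (data[i + j] != marker[j]) { match = false; break; }
            }
            if (!match) continue;

            // Marker found: the binary section starts after the next line feed.
            int newline = Array.IndexOf(data, (byte)'\n', i + marker.Length);
            return newline < 0 ? -1 : newline + 1;
        }
        return -1;
    }
}
```

With data loaded via File.ReadAllBytes, the header could then be decoded with Encoding.ASCII.GetString(data, 0, start) and the first float read with BitConverter.ToSingle(data, start), provided the file's byte order matches the machine's.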
UPDATE: This code may not work as expected. Please review the valuable information in the comments.
using (var fs = new FileStream(@"crg_sample.crg", FileMode.Open, FileAccess.Read))
{
    using (StreamReader sr = new StreamReader(fs, Encoding.ASCII, true, 1, true))
    {
        var line = sr.ReadLine();
        while (!string.IsNullOrWhiteSpace(line) && !line.Contains("$$$$"))
        {
            line = sr.ReadLine();
        }
    }
    using (BinaryReader reader = new BinaryReader(fs))
    {
        // TODO: Start reading the binary data
    }
}
Solution
I know this is far from the most optimized solution, but in my case it did the trick, and since the plain-text section of the file was known to be fairly small, it didn't cause any noticeable performance issues. Here's the code:
using var fileStream = new FileStream(@"crg_sample.crg", FileMode.Open, FileAccess.Read);
using var reader = new BinaryReader(fileStream);
var newLine = '\n';
var markerString = "$$$$";
var currentString = "";
var foundMarker = false;
var foundNewLine = false;
while (!foundNewLine)
{
    var c = reader.ReadChar();
    if (!foundMarker)
    {
        currentString += c;
        if (currentString.Length > markerString.Length)
            currentString = currentString.Substring(1);
        if (currentString == markerString)
            foundMarker = true;
    }
    else
    {
        if (c == newLine)
            foundNewLine = true;
    }
}
if (foundNewLine)
{
    // Read binary
}
Note: If you're dealing with larger or more complex files, you should probably take a look at Marc Gravell's answer and the comment sections.

Reading a complete text file using for loop

I am trying to read a text file using a for loop that runs 100 times.
StreamReader reader = new StreamReader("client.txt");
for (int i = 0; i < 100; i++)
{
    reader.ReadLine();
}
This works fine if the text file has 100 lines, but not if it has, let's say, 700. So I want the loop to run 100 times but read "1%" of the file in each run. How would I do that?
If the file size is not too large, you can use:
string[] lines = File.ReadAllLines("client.txt");
or
string text = File.ReadAllText("client.txt");
Reading 1% at a time is a bit tricky. I'd go with the approach of reading line by line:
var filename = "client.txt";
var info = new FileInfo(filename);
var text = new StringBuilder();
using (var stream = new FileStream(filename, FileMode.Open))
using (var reader = new StreamReader(stream))
{
    while (!reader.EndOfStream)
    {
        text.AppendLine(reader.ReadLine());
        var progress = Convert.ToDouble(stream.Position) * 100 / info.Length;
        Console.WriteLine(progress);
    }
}
var result = text.ToString();
But please note that the progress will not be very accurate, because StreamReader.ReadLine (and equivalently ReadLineAsync) will often read more than just a single line: it reads into a buffer and then interprets that buffer. That's much more efficient than reading a single byte at a time, but it does mean that the stream will have advanced further than it strictly needs to.
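
If the file is small enough to load with File.ReadAllLines, as suggested earlier in this answer, the "1% per iteration" idea can be sketched by splitting the line indices into 100 nearly equal ranges. The helper below is purely illustrative (its name and signature are assumptions, not part of any library).

```csharp
using System;
using System.Collections.Generic;

static class PercentBatches
{
    // Hypothetical helper: splits [0, lineCount) into `steps` contiguous
    // ranges whose sizes differ by at most one line.
    public static IEnumerable<(int Start, int Count)> Split(int lineCount, int steps = 100)
    {
        for (int i = 0; i < steps; i++)
        {
            // long arithmetic avoids overflow for very large line counts.
            int start = (int)((long)lineCount * i / steps);
            int end = (int)((long)lineCount * (i + 1) / steps);
            yield return (start, end - start);
        }
    }
}
```

For a 700-line file each of the 100 ranges covers 7 lines. For files with fewer than 100 lines some ranges are empty, so the loop body has to tolerate a Count of zero.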

Using StreamReader / StreamWriter to grab logs causes program to cease responding

I'm attempting to use StreamReader and StreamWriter to grab a temporary output log (.txt format) from another application.
The output log is always open and constantly written to.
Unhelpfully if the application closes or crashes, the log file ends up deleted - hence the need for a tool that can grab the information from this log and save it.
What my program currently does is:
Creates a new .txt file and stores the path of that file as the string "destinationFile".
Finds the .txt log file to read and stores the path of that file as the string "sourceFile".
Passes those two strings to the method below.
Essentially I'm trying to read the sourceFile one line at a time.
Each time one line is read, it is appended to destinationFile.
This keeps looping until the sourceFile no longer exists (i.e. the application has closed or crashed and deleted its log).
In addition, the sourceFile can get quite big (sometimes 100 MB+), and this program may be handling more than one log at a time.
Reading the whole log rather than line by line will most likely start consuming a fair bit of memory.
private void logCopier(string sourceFile, string destinationFile)
{
    while (File.Exists(sourceFile))
    {
        string textLine;
        using (var readerStream = File.Open(sourceFile,
            FileMode.Open,
            FileAccess.Read,
            FileShare.ReadWrite))
        using (var reader = new StreamReader(readerStream))
        {
            while ((textLine = reader.ReadLine()) != null)
            {
                using (FileStream writerStream = new FileStream(destinationFile,
                    FileMode.Append,
                    FileAccess.Write))
                using (StreamWriter writer = new StreamWriter(writerStream))
                {
                    writer.WriteLine(textLine);
                }
            }
        }
    }
}
The problem is that my WPF application locks up and ceases to respond when it reaches this code.
To track down where, I put a MessageBox just before the writerStream line of the code to output what the reader was picking up.
It was certainly reading the log file just fine, but there appears to be a problem with writing it to the file.
As soon as it reaches the using (FileStream writerStream = new FileStream part of the code, it stops responding.
Is using the StreamWriter in this manner not valid, or have I just gone and done something silly in the code?
I'm also open to a better solution than what I'm trying to do here.
As I understand it, you need to copy a file from a source, which may be deleted at any time, to a destination.
I suggest using FileSystemWatcher to watch for the source file's Changed event, and then simply copying the whole file from source to destination with File.Copy.
I've just solved the problem, and the issue was indeed something silly!
When creating the text file for the StreamWriter, I had forgotten to use .Dispose();. I had File.Create(filename); instead of File.Create(filename).Dispose(); This meant the text file was already open, and the StreamWriter was attempting to write to a file that was locked / in use.
The UI still locks up (as expected), as I've yet to implement this on a new thread as SteenT mentioned. However the program no longer crashes and the code correctly reads the log and outputs to a text file.
Also after a bit of refinement, my log reader/writer code now looks like this:
private void logCopier(string sourceFile, string destinationFile)
{
    long offset = 0L;
    while (File.Exists(sourceFile))
    {
        using (var stream = new FileStream(sourceFile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var reader = new StreamReader(stream))
        using (var writer = new StreamWriter(destinationFile, true))
        {
            // Resume where the previous pass stopped.
            stream.Seek(offset, SeekOrigin.Begin);
            string textLine;
            while ((textLine = reader.ReadLine()) != null)
            {
                writer.WriteLine(textLine);
            }
            // ReadLine returned null, so the reader has consumed the stream up to its current end.
            offset = stream.Position;
        }
        // Brief pause so the loop does not spin while waiting for new log data.
        Thread.Sleep(1);
    }
}
Just putting this code up here in case anyone else is looking for something like this. :)

Streamwriter is cutting off my last couple of lines sometimes in the middle of a line?

Here is my code:
FileStream fileStreamRead = new FileStream(pathAndFileName, FileMode.OpenOrCreate, FileAccess.Read, FileShare.None);
FileStream fileStreamWrite = new FileStream(reProcessedFile, FileMode.OpenOrCreate, FileAccess.Write, FileShare.None);
StreamWriter sw = new StreamWriter(fileStreamWrite);
int readIndex = 0;
using (StreamReader sr = new StreamReader(fileStreamRead))
{
    while (!sr.EndOfStream)
    {
        Console.WriteLine("eof" + sr.EndOfStream);
        readIndex++;
        Console.WriteLine(readIndex);
        string currentRecord = "";
        currentRecord = sr.ReadLine();
        if (currentRecord.Trim() != "")
        {
            Console.WriteLine("Writing " + readIndex);
            sw.WriteLine(currentRecord);
        }
        else
        {
            Console.WriteLine("*******************************************spaces ***********************");
        }
    }
}
With one test file it cuts off two and a half lines, and with the other test file it cuts off one and a half lines.
I am not a StreamReader/StreamWriter expert, as you can probably see.
Any ideas or suggestions would be greatly appreciated, as this is driving me batty. I am sure I am using these incorrectly.
You are missing a Flush/Close call, or simply a using block, for your writer.
using (FileStream fileStreamWrite =
    new FileStream(reProcessedFile, FileMode.OpenOrCreate, FileAccess.Write, FileShare.None))
{
    using (StreamWriter sw = new StreamWriter(fileStreamWrite))
    {
        // .... write everything here
    }
}
Right after the closing brace of the using statement, do this:
sw.Flush();
sw.Close();
There, that should do it.
You need to Flush your StreamWriter. A StreamWriter has a buffer, and it writes to disk only when the buffer is full. By flushing at the end you make sure all the text in the buffer is written to the disk.
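
A minimal sketch of the point above: when the writer lives in a using block, disposing it flushes whatever is still in its buffer before the file is closed. The class and method names are made up for illustration.

```csharp
using System.IO;

static class FlushDemo
{
    // Hypothetical helper: writes the lines and returns the file's length
    // as seen after the writer has been disposed.
    public static long WriteLines(string path, string[] lines)
    {
        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
        using (var writer = new StreamWriter(stream))
        {
            foreach (string line in lines)
                writer.WriteLine(line);
        } // Dispose flushes the writer's buffer and closes the stream.
        return new FileInfo(path).Length;
    }
}
```

The sketch uses FileMode.Create rather than the question's FileMode.OpenOrCreate because OpenOrCreate does not truncate an existing file, so leftover bytes from a longer previous run could remain at the end. Whether that matters depends on how the output file is reused.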
In addition to the other answers (use using, and/or Flush/Close), I would say that they do not actually answer the question of why it may cut off several lines.
My idea is that it is related to the fact that you use StreamReader and call EndOfStream twice: once in the while loop header, and again inside the loop.
The only possible way to find out whether the stream has ended is to try to read some data from it. So I suspect EndOfStream does that, and calling it twice may create a problem in stream processing.
To resolve the issue, either:
Use a simple TextReader, considering that you are reading a text file (it seems to me),
Or change your logic to call it only once, so no more call to Console.WriteLine("eof" + sr.EndOfStream);
Or change your logic so that you do not use EndOfStream at all, but read line by line until the line is null.
You're not using StreamWriter properly. Also, since you're always reading lines, I would use a method that already does all that for you (and manages it properly).
// "inputPath" and "outputPath" are placeholders; they must refer to different files.
using (var writer = new StreamWriter("outputPath"))
{
    foreach (var line in File.ReadLines("inputPath"))
    {
        if (string.IsNullOrWhiteSpace(line))
        { /**/ }
        else
        { /**/ }
    }
}
... or ...
/* do not call .ToArray or something that will evaluate this _here_, let WriteAllLines do that */
var lines = File.ReadLines("inputPath")
    .Select(line => string.IsNullOrWhiteSpace(line) ? Stars : line); // Stars: placeholder for the row of asterisks
var encoding = Encoding.ASCII; // whatever is appropriate for you.
File.WriteAllLines("outputPath", lines, encoding);

How to handle StreamReader?

I use StreamReader to read my csv file.
The problem is that I need to read this file twice, and the second time I use StreamReader,
StreamReader.EndOfStream is already true and nothing is read.
using (var csvReader = new StreamReader(file.InputStream))
{
    string inputLine = "";
    var values = new List<string>();
    while ((inputLine = csvReader.ReadLine()) != null)...
Can anybody help?
Try file.InputStream.Seek(0, SeekOrigin.Begin); before you open the second StreamReader to reset the Stream to the starting point.
A much better approach (if possible) would be to store the file contents in memory and reuse them from there.
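
Here is a small sketch of the rewind approach. It assumes the stream is seekable, and it passes leaveOpen: true so that disposing the first StreamReader does not also close the underlying stream before the second pass. The class and method names are invented for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class TwoPassReader
{
    // Hypothetical helper: rewinds the stream, reads every line, and leaves
    // the stream open so that it can be read again later.
    public static List<string> ReadAllLines(Stream stream)
    {
        stream.Seek(0, SeekOrigin.Begin);
        var lines = new List<string>();
        using (var reader = new StreamReader(stream, Encoding.UTF8, true, 1024, leaveOpen: true))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }
}
```

Calling TwoPassReader.ReadAllLines(file.InputStream) twice should then return the same lines both times. Keeping the first result in memory, as suggested above, avoids the second pass entirely.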
