Skip lines by using StreamReader - C#

I have a really big file with roughly 30,000 rows. I have to parse this file and cannot delete entries from it, so my idea is to skip the lines that have already been read. I tried something like this:
// Gets the number of lines already read
int readLines = GetCurrentCounter();
// Open the file
FileStream stream = new FileStream(LogDatabasePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
using (StreamReader reader = new StreamReader(stream))
{
    int counter = 0;
    string line;
    // If the file was already read up to a specified line, skip those lines
    if (readLines != 0) reader.ReadLine().Skip(readLines);
    // Check if new lines are available
    while ((line = reader.ReadLine()) != null)
    {
        if (counter >= readLines)
        {
            // If the line contains the Testsystem name we are searching for
            if (line.Contains(TestSystemName.ToUpper()))
            {
                // Create a new database entry
                new TestsystemError().GenerateNewDatabaseEntry(line, counter);
            }
        }
        System.Console.WriteLine(line);
        counter++;
    }
}
The problem is that reader.ReadLine().Skip(readLines) has no effect, or I am using it the wrong way.
I need a way to skip lines without calling reader.ReadLine() for each of them, because that is very slow (I get performance problems when I have to iterate through all ~30,000 lines).
Is there a way to skip lines? If so, it would be great if you could share code. Thanks.

The method reader.ReadLine() returns a string.
The extension method Skip(readLines) iterates over that string and returns an iterator that skips the first readLines characters of the string.
This has no effect on the reader.
If you want to skip the first n lines, either read the first n lines by calling reader.ReadLine() n times, or read the stream until you have read in n end-of-line character sequences before creating the reader. The latter approach avoids creating strings for the lines you want to ignore, but is more code.
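For example, a minimal sketch of the first approach, reusing the stream and readLines variables from the question's code:
using (StreamReader reader = new StreamReader(stream))
{
    // Read and throw away the lines that were already processed
    for (int i = 0; i < readLines; i++)
    {
        if (reader.ReadLine() == null)
        {
            break; // the file is shorter than expected
        }
    }
    // From here on, ReadLine() returns only the new lines
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // process the new line
    }
}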
If you happen to have extremely regular data, so that all the rows are the same length, then you can seek the stream before you create the reader:
FileStream stream = new FileStream(LogDatabasePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
stream.Seek(readRows * lengthOfRowInBytes, SeekOrigin.Begin);
using (StreamReader reader = new StreamReader(stream))
// etc
If you have the row number encoded in the row, you could also do a binary search, but that's more code.

Instead of keeping track of the number of lines, keep track of the number of characters read. Then you can use stream.Seek() to jump straight to the last read position instead of iterating through the whole file every time.
long currentPosition = GetCurrentPosition();
// Open the file
FileStream stream = new FileStream(LogDatabasePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
using (StreamReader reader = new StreamReader(stream))
{
    string line;
    // Seek to the previously read position
    stream.Seek(currentPosition, SeekOrigin.Begin);
    // Check if new lines are available
    while ((line = reader.ReadLine()) != null)
    {
        // do stuff with the line
        // ...
        Console.WriteLine(line);
        // Keep track of the current position. Note that this assumes a
        // single-byte encoding (line.Length counts characters, while Seek
        // works in bytes) and two-character "\r\n" line endings.
        currentPosition += line.Length + 2;
    }
}
SaveCurrentPosition(currentPosition);

You should skip the lines as you read them:
// If the file was already read up to a specified line, skip those lines
int skipped = 0;
while (skipped < readLines && reader.ReadLine() != null)
{
    skipped++;
}
// Check if new lines are available
string line;
while ((line = reader.ReadLine()) != null)

Related

How to read a line and write to that line, of a text file?

I'm using MS Visual Studio 2015 to develop a WinForms application in C#.
What I'm trying to build is a reader and writer that opens a CSV file with UTF-8 encoding and reads it line by line. My program reads a line, splits it at the semicolons (;), and sends that information to my database. Now it should mark that line as already read by appending some text or a special sign, e.g. "read", "done", or "§$%".
It's possible that someone or something (an ERP system) appends new data to that CSV file, so the next time my program iterates through the file, it should only read the lines without my special mark.
My program:
foreach (string sFile in Directory.GetFiles(sImportPfad, "*.*", SearchOption.TopDirectoryOnly))
{
    var oStream = new FileStream(sFile, FileMode.Append, FileAccess.Write, FileShare.Read);
    var iStream = new FileStream(sFile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
    var sw = new System.IO.StreamWriter(oStream);
    var sr = new System.IO.StreamReader(iStream);
    int c = 0;
    // load all lines of each file
    while (!sr.EndOfStream)
    {
        String line = sr.ReadLine();
        String[] splitLine = line.Trim().Split(txtDivider.Text.Trim().ToCharArray());
        if (line.Contains("§$%"))
            break;
        DatenbankEintragAuftragsPool dbEintrag = new DatenbankEintragAuftragsPool();
        foreach (myImportFilterObject ob in arrImportFilterObjects)
        {
            // ...
        }
        String result = Program.myPaletti.AuftragInDieDatenbankSchreiben(dbEintrag);
        if (result.Equals("ok"))
        {
            sw.WriteLine(line + " §$%");
            sw.Flush();
        }
    }
}
My problem is that the writer appends line + "special mark" to the end of my file instead of marking the line that was just read.
Additionally, I haven't figured out how to read the file with UTF-8 encoding.
I appreciate your answers!
EDIT:
This code would do the trick...
string[] lines = System.IO.File.ReadAllLines("test");
lines[0] = lines[0] + " $%&"; /* replace with whatever you need */
System.IO.File.WriteAllLines("test", lines);
But for my usage it's not practical to read all lines, because it's possible that nobody ever deletes any data for the next 20 years.
I'll keep looking for a solution that works line by line...
There are some problems in your code that I will try to solve here:
using (var stream = new FileStream(sFile, FileMode.Open, FileAccess.Read))
using (var reader = new StreamReader(stream, Encoding.UTF8))
{
    long position = GetFirstNewRecordOfFile(sFile);
    stream.Seek(position, SeekOrigin.Begin);
    while (!reader.EndOfStream)
    {
        var line = reader.ReadLine();
        // Process line
    }
    SaveFirstNewRecordOfFile(sFile, stream.Position);
}
Now you just need to figure out where and how to save the position of the file.
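For example, a minimal sketch of those helpers, keeping the position in a hypothetical ".pos" sidecar file next to the data file (the naming scheme is just an illustration):
// Hypothetical helpers matching the calls above; the position is kept
// in a small ".pos" file next to the data file.
static long GetFirstNewRecordOfFile(string dataFile)
{
    string positionFile = dataFile + ".pos";
    return File.Exists(positionFile)
        ? long.Parse(File.ReadAllText(positionFile))
        : 0L;
}

static void SaveFirstNewRecordOfFile(string dataFile, long position)
{
    File.WriteAllText(dataFile + ".pos", position.ToString());
}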
If you have a writer that appends data to the file, the file might grow to a huge size over time, so it may be better to truncate or delete the file once it has been read.
I recommend deleting the file, since then you will not have to loop through a lot of empty files. That will, however, require that you rename/move the file before processing it, to prevent the writer process from appending data to it after you close it but before you delete it.
If you just move the file to a sub folder, you can use that as a backup, roughly as sketched below.
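A rough sketch of that rename-then-process idea (the "processed" folder name and the lack of error handling are illustrative only):
// Move the file out of the import folder before reading it, so the
// writer process cannot append to it while it is being processed;
// the moved copy then doubles as a backup.
string backupDir = Path.Combine(Path.GetDirectoryName(sFile), "processed");
Directory.CreateDirectory(backupDir); // does nothing if the folder already exists
string backupPath = Path.Combine(backupDir, Path.GetFileName(sFile));
File.Move(sFile, backupPath);
// ... now read and process backupPath instead of sFile ...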
My solution now is to create a new file, write into this file, delete the original file and rename the new file.
foreach (string sFile in Directory.GetFiles(sImportPfad, "*.*", SearchOption.TopDirectoryOnly))
{
    FileStream iStream;
    try
    {
        using (iStream = new FileStream(sFile, FileMode.Open, FileAccess.Read, FileShare.None))
        {
            var sr = new System.IO.StreamReader(iStream, Encoding.UTF8);
            if (rbCSVfilesMarkieren.Checked)
            {
                using (var oStream = new FileStream(sFile + "_new", FileMode.Create, FileAccess.Write, FileShare.None))
                {
                    var sw = new System.IO.StreamWriter(oStream, Encoding.UTF8);
                    int c = 0;
                    while (!sr.EndOfStream)
                    {
                        String line = sr.ReadLine();
                        String[] splitLine = line.Trim().Split(txtDivider.Text.Trim().ToCharArray());
                        if (line.Contains("$$$"))
                        {
                            // line was already processed on an earlier run; copy it unchanged
                            sw.WriteLine(line);
                            sw.Flush();
                            continue;
                        }
                        String result = Program.myPaletti.Irgendwasneues(splitLine, arrImportFilterObjects);
                        if (result.Equals("ok"))
                        {
                            sw.WriteLine(line + "$$$");
                            sw.Flush();
                            anzNeueDatensätze++;
                        }
                    }
                }
            }
        }
        // the original file must be closed (end of the using block) before it can be deleted
        System.IO.File.Delete(sFile);
        System.IO.File.Move(sFile + "_new", sFile);
    }
    catch (System.IO.IOException)
    {
        // the file is locked by another process; skip it and try again on the next run
    }
}
I also included the UTF-8 encoding.
Furthermore, I've found a way to lock the file I'm reading/writing by using FileShare.None.
Thank you guys for your help! I appreciate it!

Very slow StreamReader for medium-sized files

I am reading text files into my program (they are encoded in Unicode; the output must be in UTF-8). The code below works fine for smaller files (around 150 lines, where each line is a single word), but when I use it on bigger files (around 20,000 lines, still one word per line) the program takes around half a minute to complete. Should I write new code, or is there a way to optimize this?
int next;
string storage = "";
using (StreamReader sr = new StreamReader(path))
{
    while ((next = sr.Read()) != -1)
    {
        storage += Char.ConvertFromUtf32(next);
    }
    sr.Close();
}
Use StringBuilder instead of String:
int next;
StringBuilder storage = new StringBuilder();
using (StreamReader sr = new StreamReader(path)) {
while ((next = sr.Read()) != -1) {
storage.Append(Char.ConvertFromUtf32(next));
}
sr.Close();
}
string result = storage.ToString();
So, everything started working really smoothly when I used a different StreamReader constructor:
using (StreamReader sr = new StreamReader(path, Encoding.Unicode))
This lets me get a properly decoded string rather than an int indicating a character, and it improved the speed by a lot.
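Putting it together, a minimal sketch of the whole task described in the question (UTF-16 "Unicode" input, UTF-8 output; the file names are placeholders):
using System.IO;
using System.Text;

class ConvertToUtf8
{
    static void Main()
    {
        // Decode the input as UTF-16 ("Unicode") and re-encode it as UTF-8;
        // input.txt and output.txt are placeholder names.
        using (var sr = new StreamReader("input.txt", Encoding.Unicode))
        using (var sw = new StreamWriter("output.txt", false, Encoding.UTF8))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                sw.WriteLine(line);
            }
        }
    }
}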

Read a very large file in chunks and not line by line

I want to read a CSV file which can be hundreds of GBs or even TBs in size. I have a limitation that I can only read the file in chunks of 32 MB. My solution to the problem not only works rather slowly, but can also break a line in the middle.
I wanted to ask if you know of a better solution:
const int MAX_BUFFER = 33554432; //32MB
byte[] buffer = new byte[MAX_BUFFER];
int bytesRead;
using (FileStream fs = File.Open(filePath, FileMode.Open, FileAccess.Read))
using (BufferedStream bs = new BufferedStream(fs))
{
    string line;
    bool stop = false;
    while ((bytesRead = bs.Read(buffer, 0, MAX_BUFFER)) != 0) // reading only 32MB chunks at a time
    {
        var stream = new StreamReader(new MemoryStream(buffer));
        while ((line = stream.ReadLine()) != null)
        {
            //process line
        }
    }
}
Please do not respond with a solution that reads the file line by line (for example, File.ReadLines is NOT an acceptable solution). Why? Because I'm just searching for another solution...
The problem with your solution is that you recreate the streams in each iteration. Try this version:
const int MAX_BUFFER = 33554432; //32MB
byte[] buffer = new byte[MAX_BUFFER];
int bytesRead;
StringBuilder currentLine = new StringBuilder();
using (FileStream fs = File.Open(filePath, FileMode.Open, FileAccess.Read))
using (BufferedStream bs = new BufferedStream(fs))
{
    string line;
    var memoryStream = new MemoryStream(buffer);
    var stream = new StreamReader(memoryStream);
    while ((bytesRead = bs.Read(buffer, 0, MAX_BUFFER)) != 0)
    {
        // Only expose the bytes that were actually read, otherwise the
        // last (partial) chunk would be padded with stale data
        memoryStream.SetLength(bytesRead);
        memoryStream.Seek(0, SeekOrigin.Begin);
        // The reader caches data internally, so it must be told that
        // the underlying stream has been rewound
        stream.DiscardBufferedData();
        while (!stream.EndOfStream)
        {
            line = ReadLineWithAccumulation(stream, currentLine);
            if (line != null)
            {
                //process line
            }
        }
    }
}

private string ReadLineWithAccumulation(StreamReader stream, StringBuilder currentLine)
{
    while (stream.Read(charBuffer, 0, 1) > 0)
    {
        if (charBuffer[0].Equals('\n'))
        {
            string result = currentLine.ToString();
            currentLine.Clear();
            if (result.Length > 0 && result[result.Length - 1] == '\r') //remove if newlines are single character
            {
                result = result.Substring(0, result.Length - 1);
            }
            return result;
        }
        else
        {
            currentLine.Append(charBuffer[0]);
        }
    }
    return null; //line not complete yet
}

private char[] charBuffer = new char[1];
NOTE: This needs some tweaking if newlines are two characters long and you need the newline characters to be contained in the result. The worst case would be the newline pair "\r\n" being split across two blocks; however, since you were using ReadLine, I assumed that you don't need this.
Also, keep in mind that if your whole file contains only one line, this will still end up accumulating all of the data in memory anyway.
which can be at a size of hundreds of GBs and even TB
For processing large files, the most suitable class is the MemoryMappedFile class.
Some advantages:
It is ideal for accessing a data file on disk without performing file I/O operations and without buffering the file's content. This works great when you deal with large data files.
You can use memory-mapped files to allow multiple processes running on the same machine to share data with each other.
So try it and you will notice the difference, as swapping between memory and hard disk is a time-consuming operation.
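For illustration, a minimal sketch of reading a file through 32 MB memory-mapped views (filePath and the chunk size mirror the question; note that a line can still straddle two views, so the line-accumulation trick from the answer above still applies):
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

const long CHUNK = 33554432; // 32MB per view, matching the question's limit
long fileLength = new FileInfo(filePath).Length;
using (var mmf = MemoryMappedFile.CreateFromFile(
    filePath, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
{
    for (long offset = 0; offset < fileLength; offset += CHUNK)
    {
        long size = Math.Min(CHUNK, fileLength - offset);
        using (var view = mmf.CreateViewStream(offset, size, MemoryMappedFileAccess.Read))
        using (var reader = new StreamReader(view))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // process line (beware of lines split across view boundaries)
            }
        }
    }
}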

Reading from large text files in C# causing memory leak

I am trying to read from a large text file with a word on each line and put all the values into an SQL database. With a small text file this works fine, but when I have a larger text file, say 300,000 lines, I run out of memory.
What is the best way to avoid this? Is there a way to read only a portion of the file, add it to the database, then release it from memory and move on to the next portion?
Here is my code so far:
string path = Server.MapPath("~/content/wordlist.txt");
StreamReader word_stream = new StreamReader(path);
string wordlist = word_stream.ReadToEnd();
string[] all_words = wordlist.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
I then loop through the array adding each value to the database, but when the file is too large it simply doesn't work.
Do it like this:
// Choose the size of the buffer according
// to your requirements and/or available memory.
int bufferSize = 256 * 1024 * 1024;
string path = Server.MapPath("~/content/wordlist.txt");
using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read))
using (BufferedStream bufferedStream = new BufferedStream(stream, bufferSize))
using (StreamReader reader = new StreamReader(bufferedStream))
{
    while (!reader.EndOfStream)
    {
        string line = reader.ReadLine();
        ... put line into DB ...
    }
}
Also, do not forget exception handling.
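For example, a minimal sketch of the kind of exception handling meant here (the logging is only illustrative):
try
{
    using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read))
    using (StreamReader reader = new StreamReader(stream))
    {
        while (!reader.EndOfStream)
        {
            string line = reader.ReadLine();
            // ... put line into DB ...
        }
    }
}
catch (IOException ex)
{
    // e.g. the file is locked or was removed while reading;
    // log it and decide whether to retry or skip
    Console.Error.WriteLine("Failed to read " + path + ": " + ex.Message);
}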
Try it with yield return (note that yield return only compiles inside an iterator method, so the loop needs to be wrapped in one):
static IEnumerable<string> ReadLines(string path)
{
    using (StreamReader r = new StreamReader(path))
    {
        while (!r.EndOfStream)
        {
            string line = r.ReadLine();
            yield return line;
        }
    }
}
You could then consume the iterator in portions: read, say, ten lines, write them to the database, and move on to the next portion, roughly as in the sketch below.
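A minimal sketch of that batching idea; SaveBatchToDatabase is a hypothetical placeholder for your own insert logic:
const int batchSize = 10;
var batch = new List<string>(batchSize);
foreach (string line in ReadLines(path))
{
    batch.Add(line);
    if (batch.Count == batchSize)
    {
        SaveBatchToDatabase(batch); // hypothetical DB call
        batch.Clear();
    }
}
if (batch.Count > 0)
{
    SaveBatchToDatabase(batch); // flush the final partial batch
}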

How to efficiently read only the last line of a text file

I need to get just the last line from a big log file. What is the best way to do that?
You want to read the file backwards using ReverseLineReader:
How to read a text file reversely with iterator in C#
Then run .Take(1) on it.
var lines = new ReverseLineReader(filename);
var last = lines.Take(1);
You'll want to use Jon Skeet's MiscUtil library directly rather than copying/pasting the code.
static string ReadLastLine(string filepath)
{
    String lastline = "";
    String filedata;
    // Open the file for reading
    var fullfiledata = new FileStream(filepath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
    StreamReader sr = new StreamReader(fullfiledata);
    //long offset = sr.BaseStream.Length - ((sr.BaseStream.Length * lengthWeNeed) / 100);
    // Assuming a line doesn't have more than 500 characters, else use the formula above;
    // Math.Max guards against files shorter than 500 bytes
    long offset = Math.Max(0, sr.BaseStream.Length - 500);
    // Jump directly to the position 500 bytes from the end
    sr.BaseStream.Seek(offset, SeekOrigin.Begin);
    // From there read lines, not the whole file
    while (!sr.EndOfStream)
    {
        filedata = sr.ReadLine();
        // Iterate to find the last line
        if (sr.Peek() == -1)
        {
            lastline = filedata;
        }
    }
    return lastline;
}
Or you can do it in two lines (.NET 4 and later only). Note that File.ReadLines streams the file, so this still reads through the whole file from the start, but without loading it all into memory at once:
var lines = File.ReadLines(path);
string line = lines.Last();
