I have a function that reads a big text file, splits out a part (from a given start to a given end), and saves the extracted data into another text file. Since the file is very large, I need to add one progress bar for reading the stream and another for writing the extracted text to the other file. P.S. start and end are given as DateTime values!
using (StreamReader sr = new StreamReader(file,System.Text.Encoding.ASCII))
{
while (sr.EndOfStream == false)
{
line = sr.ReadLine();
if (line.IndexOf(start) != -1)
{
using (StreamWriter sw = new StreamWriter(DateTime.Now.ToString().Replace("/", "-").Replace(":", "-") + "cut"))
{
sw.WriteLine(line);
while (sr.EndOfStream == false && line.IndexOf(end) == -1)
{
line = sr.ReadLine();
sw.WriteLine(line);
}
}
richTextBox1.Text += "done ..." + "\n";
break;
}
}
}
The first thing to do would be to work out how long the file is using FileInfo,
http://msdn.microsoft.com/en-us/library/system.io.fileinfo.aspx
FileInfo fileInfo = new FileInfo(file);
long length = fileInfo.Length;
I would suggest you do it like this,
private long currentPosition = 0;
private void UpdateProgressBar(int lineLength)
{
currentPosition += lineLength; // plus 2 if you need to take the carriage return/line feed into account
progressBar.Value = (int)(((decimal)currentPosition / (decimal)length) * (decimal)100);
}
private void CopyFile()
{
progressBar.Minimum = 0;
progressBar.Maximum = 100;
currentPosition = 0;
using (StreamReader sr = new StreamReader(file,System.Text.Encoding.ASCII))
{
while (sr.EndOfStream == false)
{
line = sr.ReadLine();
UpdateProgressBar(line.Length);
if (line.IndexOf(start) != -1)
{
using (StreamWriter sw = new StreamWriter(DateTime.Now.ToString().Replace("/", "-").Replace(":", "-") + "cut"))
{
sw.WriteLine(line);
while (sr.EndOfStream == false && line.IndexOf(end) == -1)
{
line = sr.ReadLine();
UpdateProgressBar(line.Length);
sw.WriteLine(line);
}
}
richTextBox1.Text += "done ..." + "\n";
break;
}
}
}
}
This calculates the percentage of the file that has been read and sets the progress bar to that value, so you don't have to worry about the length being a long while the progress bar uses an int.
If you don't want to truncate the value then do this (casting to an int above will always truncate the decimals, and thus round down),
progressBar.Value = (int)Math.Round(((decimal)currentPosition / (decimal)length) * (decimal)100, 0);
Is this on a background thread? Don't forget that you will have to call this.Invoke to update the progress bar, or else you will get a cross-thread exception.
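For example, here is a minimal sketch of such a marshalling helper (the method name is mine; it assumes progressBar lives on the form and the copy runs on a worker thread):
// Hedged sketch: push a percentage to the ProgressBar from any thread.
private void UpdateProgressBarSafe(int percent)
{
    if (progressBar.InvokeRequired)
        progressBar.Invoke((Action)(() => progressBar.Value = percent));
    else
        progressBar.Value = percent;
}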
Related
I have a huge text file around 2GB which I am trying to parse in C#.
The file has custom delimiters for rows and columns. I want to parse the file, extract the data, and write it to another file, inserting a column header and replacing the RowDelimiter with a newline and the ColumnDelimiter with a tab, so that I get the data in tabular format.
sample data:
1'~'2'~'3#####11'~'12'~'13
RowDelimiter: #####
ColumnDelimiter: '~'
I keep on getting System.OutOfMemoryException on the following line
while ((line = rdr.ReadLine()) != null)
public void ParseFile(string inputfile,string outputfile,string header)
{
using (StreamReader rdr = new StreamReader(inputfile))
{
string line;
while ((line = rdr.ReadLine()) != null)
{
using (StreamWriter sw = new StreamWriter(outputfile))
{
//Write the Header row
sw.Write(header);
//parse the file
string[] rows = line.Split(new string[] { ParserConstants.RowSeparator },
StringSplitOptions.None);
foreach (string row in rows)
{
string[] columns = row.Split(new string[] {ParserConstants.ColumnSeparator},
StringSplitOptions.None);
foreach (string column in columns)
{
sw.Write(column + "\\t");
}
sw.Write(ParserConstants.NewlineCharacter);
Console.WriteLine();
}
}
Console.WriteLine("File Parsing completed");
}
}
}
As mentioned already in the comments you won't be able to use ReadLine to handle this, you'll have to essentially process the data one byte - or character - at a time. The good news is that this is basically how ReadLine works anyway, so we're not losing a lot in this case.
Using a StreamReader we can read a series of characters from the source stream (in whatever encoding you need) into an array. Using that and a StringBuilder we can process the stream in chunks and check for separator sequences on the way.
Here's a method that will handle an arbitrary delimiter:
public static IEnumerable<string> ReadDelimitedRows(StreamReader reader, string delimiter)
{
char[] delimChars = delimiter.ToArray();
int matchCount = 0;
char[] buffer = new char[512];
int rc = 0;
StringBuilder sb = new StringBuilder();
while ((rc = reader.Read(buffer, 0, buffer.Length)) > 0)
{
for (int i = 0; i < rc; i++)
{
char c = buffer[i];
if (c == delimChars[matchCount])
{
if (++matchCount >= delimChars.Length)
{
// found full row delimiter
yield return sb.ToString();
sb.Clear();
matchCount = 0;
}
}
else
{
if (matchCount > 0)
{
// append previously matched portion of the delimiter
sb.Append(delimChars, 0, matchCount);
matchCount = 0;
}
sb.Append(c);
}
}
}
// return the last row if found (flushing any partially matched delimiter)
if (matchCount > 0)
sb.Append(delimChars, 0, matchCount);
if (sb.Length > 0)
yield return sb.ToString();
}
This should handle any cases where part of your block delimiter can appear in the actual data.
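For instance, a quick check with made-up data (my own example, assuming the ReadDelimitedRows method above; requires System.IO and System.Text) where a partial "#####" sequence appears inside a field:
// "###" inside the data is kept; only the full "#####" sequence ends a row.
var data = "1'~'2'~'ab###cd#####11'~'12'~'13";
using (var ms = new MemoryStream(Encoding.ASCII.GetBytes(data)))
using (var reader = new StreamReader(ms))
{
    foreach (var row in ReadDelimitedRows(reader, "#####"))
        Console.WriteLine(row); // prints "1'~'2'~'ab###cd" then "11'~'12'~'13"
}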
In order to translate your file from the input format you describe to a simple tab-delimited format you could do something along these lines:
const string RowDelimiter = "#####";
const string ColumnDelimiter = "'~'";
using (var reader = new StreamReader(inputFilename))
using (var writer = new StreamWriter(File.Create(outputFilename)))
{
foreach (var row in ReadDelimitedRows(reader, RowDelimiter))
{
writer.Write(row.Replace(ColumnDelimiter, "\t"));
}
}
That should process fairly quickly without eating up too much memory. Some adjustments might be required for non-ASCII output.
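If the input is not plain ASCII, the only change needed in the snippet above is to pass explicit encodings (UTF-8 here is an assumption; substitute whatever your file actually uses):
// Assumption: the file is UTF-8; swap in the correct Encoding otherwise.
using (var reader = new StreamReader(inputFilename, Encoding.UTF8))
using (var writer = new StreamWriter(File.Create(outputFilename), Encoding.UTF8))
{
    foreach (var row in ReadDelimitedRows(reader, RowDelimiter))
        writer.Write(row.Replace(ColumnDelimiter, "\t"));
}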
Read the data into a buffer and then do your parsing.
using (StreamReader rdr = new StreamReader(inputfile))
using (StreamWriter sw = new StreamWriter(outputfile))
{
char[] buffer = new char[256];
int read;
//Write the Header row
sw.Write(header);
string remainder = string.Empty;
while ((read = rdr.Read(buffer, 0, 256)) > 0)
{
// prepend the leftover from the previous read so a RowSeparator split across
// buffer boundaries is still detected
string bufferData = remainder + new string(buffer, 0, read);
//parse the file
string[] rows = bufferData.Split(
new string[] { ParserConstants.RowSeparator },
StringSplitOptions.None);
int completeRows = rows.Length - 1;
remainder = rows.Last();
foreach (string row in rows.Take(completeRows))
{
string[] columns = row.Split(
new string[] {ParserConstants.ColumnSeparator},
StringSplitOptions.None);
foreach (string column in columns)
{
sw.Write(column + "\t");
}
sw.Write(ParserConstants.NewlineCharacter);
Console.WriteLine();
}
}
if (remainder.Length > 0)
{
string[] columns = remainder.Split(
new string[] {ParserConstants.ColumnSeparator},
StringSplitOptions.None);
foreach (string column in columns)
{
sw.Write(column + "\t");
}
sw.Write(ParserConstants.NewlineCharacter);
Console.WriteLine();
}
Console.WriteLine("File Parsing completed");
}
The problem you have is that you are eagerly consuming the whole file and placing it in memory. Attempting to split a 2GB file in memory is going to be problematic, as you now know.
Solution? Consume one line at a time. Because your file doesn't have a standard line separator you'll have to implement a custom parser that does this for you. The following code does just that (or I think it does, I haven't tested it). It's probably very improvable from a performance perspective but it should at least get you started in the right direction (C# 7 syntax):
public static IEnumerable<string> GetRows(string path, string rowSeparator)
{
bool tryParseSeparator(StreamReader reader, char[] buffer)
{
var count = reader.Read(buffer, 0, buffer.Length);
if (count != buffer.Length)
return false;
return Enumerable.SequenceEqual(buffer, rowSeparator);
}
using (var reader = new StreamReader(path))
{
int peeked;
var rowBuffer = new StringBuilder();
var separatorBuffer = new char[rowSeparator.Length];
while ((peeked = reader.Peek()) > -1)
{
if ((char)peeked == rowSeparator[0])
{
if (tryParseSeparator(reader, separatorBuffer))
{
yield return rowBuffer.ToString();
rowBuffer.Clear();
}
else
{
rowBuffer.Append(separatorBuffer);
}
}
else
{
rowBuffer.Append((char)reader.Read());
}
}
if (rowBuffer.Length > 0)
yield return rowBuffer.ToString();
}
}
Now you have a lazy enumeration of rows from your file, and you can process it as you intended to:
foreach (var row in GetRows(inputFile, ParserConstants.RowSeparator))
{
var columns = row.Split(new string[] {ParserConstants.ColumnSeparator},
StringSplitOptions.None);
//etc.
}
I think this should do the trick...
public void ParseFile(string inputfile, string outputfile, string header)
{
int blockSize = 1024;
using (var file = File.OpenRead(inputfile))
{
using (StreamWriter sw = new StreamWriter(outputfile))
{
int bytes = 0;
int parsedBytes = 0;
var buffer = new byte[blockSize];
string lastRow = string.Empty;
while ((bytes = file.Read(buffer, 0, buffer.Length)) > 0)
{
// Because the buffer edge could split a RowDelimiter, we need to keep the
// last row from the prior split operation. Append the new buffer to the
// last row from the prior loop iteration.
lastRow += Encoding.Default.GetString(buffer,0, bytes);
//parse the file
string[] rows = lastRow.Split(new string[] { ParserConstants.RowSeparator }, StringSplitOptions.None);
// We cannot process the last row in this set because it may not be a complete
// row, and tokens could be clipped.
if (rows.Count() > 1)
{
for (int i = 0; i < rows.Count() - 1; i++)
{
sw.Write(new Regex(ParserConstants.ColumnSeparator).Replace(rows[i], "\t") + ParserConstants.NewlineCharacter);
}
}
lastRow = rows[rows.Count() - 1];
parsedBytes += bytes;
// The following statement is not quite true because we haven't parsed the lastRow.
Console.WriteLine($"Parsed {parsedBytes.ToString():N0} bytes");
}
// Now that there are no more bytes to read, we know that the lastrow is complete.
sw.Write(new Regex(ParserConstants.ColumnSeparator).Replace(lastRow, "\t"));
}
}
Console.WriteLine("File Parsing completed.");
}
Late to the party here, but in case anyone else wants to know an easy way to load such a large CSV file with custom delimiters, Cinchoo ETL does the job for you.
using (var parser = new ChoCSVReader("CustomNewLine.csv")
.WithDelimiter("~")
.WithEOLDelimiter("#####")
)
{
foreach (dynamic x in parser)
Console.WriteLine(x.DumpAsJson());
}
Disclaimer: I'm the author of this library.
I am trying to count the number of rows in a text file (to compare to a control file) before performing a complex SSIS insert package.
Currently I am using a StreamReader and it is breaking a line with a {LF} embedded into a new line, whereas SSIS is using {CR}{LF} (correctly), so the counts are not tallying up.
Does anyone know an alternate method of doing this where I can count the number of lines in the file based on {CR}{LF} Line breaks only?
Thanks in advance
Iterate through the file and count the number of CRLFs.
Pretty straightforward implementation:
public int CountLines(Stream stream, Encoding encoding)
{
int cur, prev = -1, lines = 0;
using (var sr = new StreamReader(stream, encoding, false, 4096, true))
{
while ((cur = sr.Read()) != -1)
{
if (prev == '\r' && cur == '\n')
lines++;
prev = cur;
}
}
//Empty stream will result in 0 lines, any content would result in at least one line
if (prev != -1)
lines++;
return lines;
}
Example usage:
using (var s = File.OpenRead(@"<your_file_path>"))
Console.WriteLine("Found {0} lines", CountLines(s, Encoding.Default));
Actually it's a find substring in string task. More generic algorithms can be used.
{CR}{LF} is what you want here; it's hard to say which convention is "correct".
Since ReadLine strips off the line ending, you don't know which one it was.
Use the StreamReader.Read() method and look for 13 followed by 10; it returns an int.
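A minimal sketch of that suggestion (essentially the same idea as the CountLines method above):
int prev = -1, cur, crlfCount = 0;
using (var sr = new StreamReader(@"<your_file_path>"))
{
    while ((cur = sr.Read()) != -1) // Read() returns an int, -1 at end of stream
    {
        if (prev == 13 && cur == 10) // '\r' (13) followed by '\n' (10)
            crlfCount++;
        prev = cur;
    }
}
Console.WriteLine("Found {0} CRLF line breaks", crlfCount);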
Here's a pretty lazy way... this will read the entire file into memory.
var cnt = File.ReadAllText("yourfile.txt")
.Split(new[] { "\r\n" }, StringSplitOptions.None)
.Length;
Here is an extension method that reads lines with the line separator {CR}{LF} only, and not {LF}. You could do a count on it.
var count = new StreamReader(@"D:\Test.txt").ReadLinesCrLf().Count();
But you could also use it for reading files, which is sometimes useful since the normal StreamReader.ReadLine breaks on both {CR}{LF} and {LF}. It can be used on any TextReader and works streaming (file size is not an issue).
public static IEnumerable<string> ReadLinesCrLf(this TextReader reader, int bufferSize = 4096)
{
StringBuilder lineBuffer = null;
//read buffer
char[] buffer = new char[bufferSize];
int charsRead;
var previousIsLf = false;
while ((charsRead = reader.Read(buffer, 0, bufferSize)) != 0)
{
int bufferIndex = 0;
int writeIdx = 0;
do
{
var currentChar = buffer[bufferIndex];
switch (currentChar)
{
case '\n':
if (previousIsLf)
{
if (lineBuffer == null)
{
//return from current buffer writeIdx could be higher than 0 when multiple rows are in the buffer
yield return new string(buffer, writeIdx, bufferIndex - writeIdx - 1);
//shift write index to next character that will be read
writeIdx = bufferIndex + 1;
}
else
{
Debug.Assert(writeIdx == 0, $"Write index should be 0, when linebuffer != null");
lineBuffer.Append(buffer, writeIdx, bufferIndex - writeIdx);
Debug.Assert(lineBuffer.ToString().Last() == '\r',$"Last character in linebuffer should be a carriage return now");
lineBuffer.Length--;
//shift write index to next character that will be read
writeIdx = bufferIndex + 1;
yield return lineBuffer.ToString();
lineBuffer = null;
}
}
previousIsLf = false;
break;
case '\r':
previousIsLf = true;
break;
default:
previousIsLf = false;
break;
}
bufferIndex++;
} while (bufferIndex < charsRead);
if (writeIdx < bufferIndex)
{
if (lineBuffer == null) lineBuffer = new StringBuilder();
lineBuffer.Append(buffer, writeIdx, bufferIndex - writeIdx);
}
}
//return last row
if (lineBuffer != null && lineBuffer.Length > 0) yield return lineBuffer.ToString();
}
I am having some problems with a C# Windows Forms app.
My goal is to slice a big file (maybe > 5 GB) into files, each containing a million lines.
According to the code below, I have no idea why it runs out of memory.
Thanks.
StreamReader readfile = new StreamReader(...);
StreamWriter writefile = new StreamWriter(...);
string content;
while ((content = readfile.ReadLine()) != null)
{
writefile.Write(content + "\r\n");
i++;
if (i % 1000000 == 0)
{
index++;
writefile.Close();
writefile.Dispose();
writefile = new StreamWriter(...);
}
label5.Text = i.ToString();
label5.Update();
}
The error is probably in the
label5.Text = i.ToString();
label5.Update();
just to make a test I've written something like:
for (int i = 0; i < int.MaxValue; i++)
{
label1.Text = i.ToString();
label1.Update();
}
The app freezes at around iteration 16,000-18,000 (Windows 7 Pro SP1 x64, with the app running as both x86 and x64).
What probably happens is that by running your long operation in the main thread of the app, you stall the message queue of the window, and at a certain point it freezes. You can see that this is the problem by adding a
Application.DoEvents();
instead of the
label5.Update();
But even this is a false solution. The correct solution is to move the copying to another thread and update the control every x milliseconds, using the Invoke method (because you are then on a secondary thread).
For example:
public void Copy(string source, string dest)
{
const int updateMilliseconds = 100;
int index = 0;
int i = 0;
StreamWriter writefile = null;
try
{
using (StreamReader readfile = new StreamReader(source))
{
writefile = new StreamWriter(dest + index);
// Initial value "back in time". Forces initial update
int milliseconds = unchecked(Environment.TickCount - updateMilliseconds);
string content;
while ((content = readfile.ReadLine()) != null)
{
writefile.Write(content);
writefile.Write("\r\n"); // split into two writes to avoid a string concatenation
i++;
if (i % 1000000 == 0)
{
index++;
writefile.Dispose();
writefile = new StreamWriter(dest + index);
// Force update
milliseconds = unchecked(milliseconds - updateMilliseconds);
}
int milliseconds2 = Environment.TickCount;
int diff = unchecked(milliseconds2 - milliseconds);
if (diff >= updateMilliseconds)
{
milliseconds = milliseconds2;
Invoke((Action)(() => label5.Text = string.Format("File {0}, line {1}", index, i)));
}
}
}
}
finally
{
if (writefile != null)
{
writefile.Dispose();
}
}
// Last update
Invoke((Action)(() => label5.Text = string.Format("File {0}, line {1} Finished", index, i)));
}
and call it with:
var thread = new Thread(() => Copy(@"C:\Temp\lst.txt", @"C:\Temp\output"));
thread.Start();
Note how it updates label5 every 100 milliseconds, plus once at the beginning (by setting the initial value of milliseconds "back in time"), each time the output file is changed (again by setting milliseconds "back in time"), and once at the end after everything has been disposed.
An even more correct example can be written using the BackgroundWorker class, which exists explicitly for this scenario. It has a ProgressChanged event that can be subscribed to in order to update the window.
Something like this:
private void button1_Click(object sender, EventArgs e)
{
BackgroundWorker backgroundWorker = new BackgroundWorker();
backgroundWorker.WorkerReportsProgress = true;
backgroundWorker.ProgressChanged += backgroundWorker_ProgressChanged;
backgroundWorker.RunWorkerCompleted += backgroundWorker_RunWorkerCompleted;
backgroundWorker.DoWork += backgroundWorker_DoWork;
backgroundWorker.RunWorkerAsync(new string[] { #"C:\Temp\lst.txt", #"C:\Temp\output" });
}
private void backgroundWorker_DoWork(object sender, DoWorkEventArgs e)
{
BackgroundWorker worker = sender as BackgroundWorker;
string[] arguments = (string[])e.Argument;
string source = arguments[0];
string dest = arguments[1];
const int updateMilliseconds = 100;
int index = 0;
int i = 0;
StreamWriter writefile = null;
try
{
using (StreamReader readfile = new StreamReader(source))
{
writefile = new StreamWriter(dest + index);
// Initial value "back in time". Forces initial update
int milliseconds = unchecked(Environment.TickCount - updateMilliseconds);
string content;
while ((content = readfile.ReadLine()) != null)
{
writefile.Write(content);
writefile.Write("\r\n"); // split into two writes to avoid a string concatenation
i++;
if (i % 1000000 == 0)
{
index++;
writefile.Dispose();
writefile = new StreamWriter(dest + index);
// Force update
milliseconds = unchecked(milliseconds - updateMilliseconds);
}
int milliseconds2 = Environment.TickCount;
int diff = unchecked(milliseconds2 - milliseconds);
if (diff >= updateMilliseconds)
{
milliseconds = milliseconds2;
worker.ReportProgress(0, new int[] { index, i });
}
}
}
}
finally
{
if (writefile != null)
{
writefile.Dispose();
}
}
// For the RunWorkerCompleted
e.Result = new int[] { index, i };
}
void backgroundWorker_ProgressChanged(object sender, ProgressChangedEventArgs e)
{
int[] state = (int[])e.UserState;
label5.Text = string.Format("File {0}, line {1}", state[0], state[1]);
}
void backgroundWorker_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
int[] state = (int[])e.Result;
label5.Text = string.Format("File {0}, line {1} Finished", state[0], state[1]);
}
Hi all, I have a list box on my form which displays all the .txt files from the C:\ directory. The list box's selection mode is set to MultiExtended.
A file is considered valid only if every line in it is exactly 94 characters long. I have written code for this too, but since I check inside the loop I can only validate one file at a time, which works fine as far as it goes. However, I need to check up front whether all of the selected files meet the condition; only if they all do should the remaining code run, otherwise I would like to display an error.
My code on Button Click
private void btnMerge_Click(object sender, EventArgs e)
{
if (lstACH.SelectedIndices.Count == 1)
{
MessageBox.Show("Select 2 Files To Merge");
}
else
{
for (i = 0; i < lstACH.SelectedItems.Count; i++)
{
strFile = lstACH.SelectedItems[i].ToString();
lines = File.ReadAllLines(strFile);
if (LinesHaveCorrectLength(lines, 94)) // Here I am checking, but only one file at a time; I need to check all of them first and only then execute the remaining code
{
if (i == 0)
{
Stream myStream;
SaveFileDialog saveFileDialog1 = new SaveFileDialog();
saveFileDialog1.InitialDirectory = #"C:\";
saveFileDialog1.DefaultExt = "txt";
saveFileDialog1.Filter = "(*.txt)|*.txt";
saveFileDialog1.FilterIndex = 2;
saveFileDialog1.RestoreDirectory = true;
saveFileDialog1.ValidateNames = true;
if (saveFileDialog1.ShowDialog() == DialogResult.OK)
{
Append.FileName = saveFileDialog1.FileName;
if (Append.FileName.Contains(" \\/:*?<>|"))
{
MessageBox.Show("File name should not contain \\/:*?<>|", "", MessageBoxButtons.OK, MessageBoxIcon.Error);
}
else
{
if ((myStream = saveFileDialog1.OpenFile()) != null)
{
Append.FileName = saveFileDialog1.FileName;
myStream.Close();
}
}
}
}
using (StreamWriter sw = new StreamWriter(Append.FileName, true))
{
using (StreamReader srBatch = new StreamReader(strFile))
{
while (srBatch.Peek() >= 0)
{
strReadLine = srBatch.ReadLine();
if (strReadLine.StartsWith("1"))
{
if (i == 0)
{
strFileHeader = strReadLine;
sw.WriteLine(strFileHeader);
}
}
if (strReadLine.StartsWith("5"))
{
strBtchHeader = strReadLine;
if (i == 0)
{
Btchno = Convert.ToInt32(strReadLine.Substring(87, 7));
BatchCnt = Convert.ToInt16(Btchno);
}
if (i > 0)
{
BatchCnt++;
strBtchHeader = strBtchHeader.Substring(0, 87) + Convert.ToString(BatchCnt.ToString().PadLeft(7, (char)48));
}
sw.WriteLine(strBtchHeader);
}
if (strReadLine.StartsWith("6"))
{
strEntryDetail = strReadLine;
if (i == 0)
{
strTraceNo = strEntryDetail.Substring(87, 7);
EntryCount = Convert.ToInt16(strTraceNo);
}
if (i > 0)
{
EntryCount++;
strEntryDetail = strEntryDetail.Substring(0, 87) + EntryCount.ToString().PadLeft(7, (char)48);
}
sw.WriteLine(strEntryDetail);
}
if (strReadLine.StartsWith("8"))
{
strBtchCntrl = strReadLine;
if (i > 0)
{
//btchEntry++;
strBtchCntrl = strBtchCntrl.Substring(0, 87) + BatchCnt.ToString().PadLeft(7, (char)48);
}
sw.WriteLine(strBtchCntrl);
}
if (strReadLine.StartsWith("9"))
{
strFileCntrl = strReadLine;
strBtchCnt = strReadLine.Substring(1, 6);
strEntrycnt = strReadLine.Substring(13, 8);
strEntryHash = strReadLine.Substring(21, 10);
strDebitAmnt = strReadLine.Substring(31, 12);
strCreditAmnt = strReadLine.Substring(43, 12);
BtchCnt += Convert.ToDouble(strBtchCnt);
Entrycnt += Convert.ToDouble(strEntrycnt);
EntryHash += Convert.ToDouble(strEntryHash);
DebitAmnt += Convert.ToDouble(strDebitAmnt);
CreditAmnt += Convert.ToDouble(strCreditAmnt);
if (i == lstACH.SelectedItems.Count - 1)
{
strFileCntrl = strFileCntrl.Substring(0, 1) + BtchCnt.ToString().PadLeft(6, (char)48) + strFileCntrl.Substring(7, (strFileCntrl.Length - 7));
strFileCntrl = strFileCntrl.Substring(0, 13) + Entrycnt.ToString().PadLeft(8, (char)48) + strFileCntrl.Substring(21, (strFileCntrl.Length - 21));
strFileCntrl = strFileCntrl.Substring(0, 21) + EntryHash.ToString().PadLeft(10, (char)48) + strFileCntrl.Substring(31, (strFileCntrl.Length - 31));
strFileCntrl = strFileCntrl.Substring(0, 31) + DebitAmnt.ToString().PadLeft(12, (char)48) + strFileCntrl.Substring(43, (strFileCntrl.Length - 43));
strFileCntrl = strFileCntrl.Substring(0, 43) + CreditAmnt.ToString().PadLeft(12, (char)48) + strFileCntrl.Substring(55, (strFileCntrl.Length - 55));
sw.WriteLine(strFileCntrl);
}
}
}
}
}
if (i == lstACH.SelectedItems.Count - 1)
{
MessageBox.Show("File Has Been Merged Successfully");
this.Close();
}
}
else
{
MessageBox.Show("One of the Selected File is not a Valid ACH File");
break;
}
}
}
}
Checking each and every line's length:
private static bool LinesHaveCorrectLength(string[] lines, int expectedLineLength)
{
foreach (string item in lines)
{
if (item.Length != expectedLineLength)
{
return false;
}
}
return true;
}
Just check them all first - and if all good, THEN begin merge.
if (lstACH.SelectedIndices.Count != 2)
{
MessageBox.Show("Select 2 Files To Merge");
return;
}
foreach (String fileName in lstACH.SelectedItems)
{
if( LinesHaveCorrectLength( File.ReadAllLines(fileName), 94 ) == false )
{
MessageBox.Show("File: " + fileName + " has an incorrect line length");
return;
}
}
// Now process them all again to merge:
foreach(String fileName in lstACH.SelectedItems)
{
// ... do merge logic
}
OK, according to your comment it looks like you first want all of the files to be validated. If that is the case, then (there could be many ways; this is one of them):
First define a Function to do the checking for all files:
public bool AreFilesValid(ListBox.SelectedObjectCollection filenames)
{
int count = filenames.Count;
for(int i=0;i<count;i++)
{
string strFile = filenames[i].ToString();
string[] lines = File.ReadAllLines(strFile);
if(!LinesHaveCorrectLength(lines, 94))
{
return false; // one invalid file makes the whole selection invalid
}
}
return true;
}
Then call it in your if condition, i.e. just change the following lines:
...
strFile = lstACH.SelectedItems[i].ToString();
lines = File.ReadAllLines(strFile);
if (LinesHaveCorrectLength(lines, 94))
{
...
To only this:
...
if (AreFilesValid(lstACH.SelectedItems))
{
...
You have already got your else statement down the code to catch when this if condition fails.
This question already has answers here:
Get last 10 lines of very large text file > 10GB
I need a snippet of code which would read out the last "n lines" of a log file. I came up with the following code from the net; I am kinda new to C#. Since the log file might be quite large, I want to avoid the overhead of reading the entire file. Can someone suggest any performance enhancement? I do not really want to read each character and change the position.
var reader = new StreamReader(filePath, Encoding.ASCII);
reader.BaseStream.Seek(0, SeekOrigin.End);
var count = 0;
while (count <= tailCount)
{
if (reader.BaseStream.Position <= 0) break;
reader.BaseStream.Position--;
int c = reader.Read();
if (reader.BaseStream.Position <= 0) break;
reader.BaseStream.Position--;
if (c == '\n')
{
++count;
}
}
var str = reader.ReadToEnd();
Your code will perform very poorly, since you aren't allowing any caching to happen.
In addition, it will not work at all for Unicode.
I wrote the following implementation:
///<summary>Returns the end of a text reader.</summary>
///<param name="reader">The reader to read from.</param>
///<param name="lineCount">The number of lines to return.</param>
///<returns>The last lneCount lines from the reader.</returns>
public static string[] Tail(this TextReader reader, int lineCount) {
var buffer = new List<string>(lineCount);
string line;
for (int i = 0; i < lineCount; i++) {
line = reader.ReadLine();
if (line == null) return buffer.ToArray();
buffer.Add(line);
}
int lastLine = lineCount - 1; //The index of the last line read into the buffer. Everything > this index was read earlier than everything <= this index
while (null != (line = reader.ReadLine())) {
lastLine++;
if (lastLine == lineCount) lastLine = 0;
buffer[lastLine] = line;
}
if (lastLine == lineCount - 1) return buffer.ToArray();
var retVal = new string[lineCount];
buffer.CopyTo(lastLine + 1, retVal, 0, lineCount - lastLine - 1);
buffer.CopyTo(0, retVal, lineCount - lastLine - 1, lastLine + 1);
return retVal;
}
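A possible usage sketch (the file name is mine, and it assumes the method above is placed in a static class so it can be called as an extension):
using (var reader = new StreamReader("app.log"))
{
    foreach (string line in reader.Tail(10))
        Console.WriteLine(line);
}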
I had trouble with your code. This is my version. Since it's a log file, something might be writing to it, so it's best to make sure you're not locking it.
You go to the end. Start reading backwards until you reach n lines. Then read everything from there on.
int n = 5; //or any arbitrary number
int count = 0;
string content;
byte[] buffer = new byte[1];
using (FileStream fs = new FileStream("text.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
// read to the end.
fs.Seek(0, SeekOrigin.End);
// read backwards 'n' lines
while (count < n)
{
fs.Seek(-1, SeekOrigin.Current);
fs.Read(buffer, 0, 1);
if (buffer[0] == '\n')
{
count++;
}
fs.Seek(-1, SeekOrigin.Current); // fs.Read(...) advances the position, so we need to go back again
}
fs.Seek(1, SeekOrigin.Current); // go past the last '\n'
// read the last n lines
using (StreamReader sr = new StreamReader(fs))
{
content = sr.ReadToEnd();
}
}
A friend of mine uses this method (BackwardReader can be found here):
public static IList<string> GetLogTail(string logname, string numrows)
{
int lineCnt = 1;
List<string> lines = new List<string>();
int maxLines;
if (!int.TryParse(numrows, out maxLines))
{
maxLines = 100;
}
string logFile = HttpContext.Current.Server.MapPath("~/" + logname);
BackwardReader br = new BackwardReader(logFile);
while (!br.SOF)
{
string line = br.Readline();
lines.Add(line + System.Environment.NewLine);
if (lineCnt == maxLines) break;
lineCnt++;
}
lines.Reverse();
return lines;
}
Does your log have lines of similar length? If yes, then you can calculate the average line length and do the following (a rough sketch is given after this list):
1. Seek to end_of_file - lines_needed * avg_line_length (call this previous_point)
2. Read everything up to the end
3. If you grabbed enough lines, that's fine. If not, seek to previous_point - lines_needed * avg_line_length
4. Read everything up to previous_point
5. Go to step 3
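Here is a rough, untested sketch of that procedure (the names, the window-doubling step, and the ASCII assumption are mine, not the answer's):
// Seek back by an estimated byte count; if the chunk doesn't contain enough
// lines, widen the window and try again. Assumes a single-byte encoding
// (e.g. ASCII) so byte offsets line up with characters.
// Requires System, System.IO, System.Linq and System.Text.
public static string[] TailByEstimate(string path, int linesNeeded, int avgLineLength)
{
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        long offset = (long)linesNeeded * avgLineLength;
        while (true)
        {
            long start = Math.Max(0, fs.Length - offset);
            fs.Seek(start, SeekOrigin.Begin);
            var chunk = new byte[fs.Length - start];
            int total = 0, read;
            while (total < chunk.Length &&
                   (read = fs.Read(chunk, total, chunk.Length - total)) > 0)
            {
                total += read;
            }
            string[] lines = Encoding.ASCII.GetString(chunk, 0, total).Split('\n');
            if (lines.Length > linesNeeded || start == 0)
                return lines.Skip(Math.Max(0, lines.Length - linesNeeded)).ToArray();
            offset *= 2; // not enough lines yet; seek further back and retry
        }
    }
}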
A memory-mapped file is also a good method: map the tail of the file, count the lines, map the previous block, count the lines, and so on, until you get the number of lines needed.
Here is my answer:-
private string StatisticsFile = @"c:\yourfilename.txt";
// Read last lines of a file....
public IList<string> ReadLastLines(int nFromLine, int nNoLines, out bool bMore)
{
// Initialise more
bMore = false;
try
{
char[] buffer = null;
//lock (strMessages) Lock something if you need to....
{
if (File.Exists(StatisticsFile))
{
// Open file
using (StreamReader sr = new StreamReader(StatisticsFile))
{
long FileLength = sr.BaseStream.Length;
int c, linescount = 0;
long pos = FileLength - 1;
long PreviousReturn = FileLength;
// Process file
while (pos >= 0 && linescount < nFromLine + nNoLines) // Until found correct place
{
// Read a character from the end
c = BufferedGetCharBackwards(sr, pos);
if (c == Convert.ToInt32('\n'))
{
// Found return character
if (++linescount == nFromLine)
// Found last place
PreviousReturn = pos + 1; // Read to here
}
// Previous char
pos--;
}
pos++;
// Create buffer
buffer = new char[PreviousReturn - pos];
sr.DiscardBufferedData();
// Read all our chars
sr.BaseStream.Seek(pos, SeekOrigin.Begin);
sr.Read(buffer, (int)0, (int)(PreviousReturn - pos));
sr.Close();
// Store if more lines available
if (pos > 0)
// Is there more?
bMore = true;
}
if (buffer != null)
{
// Get data
string strResult = new string(buffer);
strResult = strResult.Replace("\r", "");
// Store in List
List<string> strSort = new List<string>(strResult.Split('\n'));
// Reverse order
strSort.Reverse();
return strSort;
}
}
}
}
catch (Exception ex)
{
System.Diagnostics.Debug.WriteLine("ReadLastLines Exception:" + ex.ToString());
}
// Lets return a list with no entries
return new List<string>();
}
const int CACHE_BUFFER_SIZE = 1024;
private long ncachestartbuffer = -1;
private char[] cachebuffer = null;
// Cache the file....
private int BufferedGetCharBackwards(StreamReader sr, long iPosFromBegin)
{
// Check for error
if (iPosFromBegin < 0 || iPosFromBegin >= sr.BaseStream.Length)
return -1;
// See if we have the character already
if (ncachestartbuffer >= 0 && ncachestartbuffer <= iPosFromBegin && ncachestartbuffer + cachebuffer.Length > iPosFromBegin)
{
return cachebuffer[iPosFromBegin - ncachestartbuffer];
}
// Load into cache
ncachestartbuffer = (int)Math.Max(0, iPosFromBegin - CACHE_BUFFER_SIZE + 1);
int nLength = (int)Math.Min(CACHE_BUFFER_SIZE, sr.BaseStream.Length - ncachestartbuffer);
cachebuffer = new char[nLength];
sr.DiscardBufferedData();
sr.BaseStream.Seek(ncachestartbuffer, SeekOrigin.Begin);
sr.Read(cachebuffer, (int)0, (int)nLength);
return BufferedGetCharBackwards(sr, iPosFromBegin);
}
Note:-
Call ReadLastLines with nFromLine starting at 0 for the last line and nNoLines as the number of lines to read back.
It reverses the list so the 1st one is the last line in the file.
bMore returns true if there are more lines to read.
It caches the data in 1024 char chunks - so it is fast, you may want to increase this size for very large files.
Enjoy!
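For example, a usage sketch (assuming the fields and both methods above live in the same class):
bool bMore;
IList<string> lastTen = ReadLastLines(0, 10, out bMore); // last 10 lines of StatisticsFile
foreach (string line in lastTen)
    Console.WriteLine(line);
if (bMore)
    Console.WriteLine("There are more lines to read.");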
This is in no way optimal but for quick and dirty checks with small log files I've been using something like this:
List<string> mostRecentLines = File.ReadLines(filePath)
// .Where(....)
// .Distinct()
.Reverse()
.Take(10)
.ToList();
Something that you can now do very easily in C# 4.0 (and with just a tiny bit of effort in earlier versions) is use memory-mapped files for this type of operation. It's ideal for large files because you can map just a portion of the file, then access it as virtual memory.
There is a good example here.
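For instance, a minimal sketch of that idea (the names, the 64 KB window, and the single-byte-encoding assumption are mine), using System.IO.MemoryMappedFiles:
// Map only the last window of a (non-empty) file and return its last lineCount lines.
// A fuller solution would walk backwards window by window until enough lines are found.
public static string[] TailViaMemoryMap(string path, int lineCount)
{
    long fileLength = new FileInfo(path).Length;
    long window = Math.Min(fileLength, 64 * 1024);
    using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
    using (var view = mmf.CreateViewAccessor(fileLength - window, window, MemoryMappedFileAccess.Read))
    {
        var bytes = new byte[window];
        view.ReadArray(0, bytes, 0, (int)window);
        string[] lines = Encoding.ASCII.GetString(bytes).Split('\n');
        return lines.Skip(Math.Max(0, lines.Length - lineCount)).ToArray();
    }
}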
As @EugeneMayevski stated above, if you just need an approximate number of last lines returned, each line has roughly the same length, and you're more concerned with performance (especially for large files), this is a better implementation:
internal static StringBuilder ReadApproxLastNLines(string filePath, int approxLinesToRead, int approxLengthPerLine)
{
//If each line is more or less of the same length and you don't really care if you get back exactly the last n
using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
var totalCharsToRead = approxLengthPerLine * approxLinesToRead;
var buffer = new byte[1];
//read approx chars to read backwards from end
fs.Seek(totalCharsToRead > fs.Length ? -fs.Length : -totalCharsToRead, SeekOrigin.End);
while (buffer[0] != '\n' && fs.Position > 0) //find new line char
{
fs.Read(buffer, 0, 1);
}
var returnStringBuilder = new StringBuilder();
using (StreamReader sr = new StreamReader(fs))
{
returnStringBuilder.Append(sr.ReadToEnd());
}
return returnStringBuilder;
}
}
Most log files have a DateTime stamp. Although it can be improved, the code below works well if you want the log messages from the last N days.
/// <summary>
/// Returns list of entries from the last N days.
/// </summary>
/// <param name="N"></param>
/// <param name="cSEP">field separator, default is TAB</param>
/// <param name="indexOfDateColumn">default is 0; change if it is not the first item in each line</param>
/// <param name="bFileHasHeaderRow"> if true, it will not include the header row</param>
/// <returns></returns>
public List<string> ReadMessagesFromLastNDays(int N, char cSEP ='\t', int indexOfDateColumn = 0, bool bFileHasHeaderRow = true)
{
List<string> listRet = new List<string>();
//--- replace msFileName with the name (incl. path if appropriate)
string[] lines = File.ReadAllLines(msFileName);
if (lines.Length > 0)
{
DateTime dtm = DateTime.Now.AddDays(-N);
string sCheckDate = GetTimeStamp(dtm);
//--- process lines in reverse
int iMin = bFileHasHeaderRow ? 1 : 0;
for (int i = lines.Length - 1; i >= iMin; i--) //skip the header in line 0, if any
{
if (lines[i].Length > 0) //skip empty lines
{
string[] s = lines[i].Split(cSEP);
//--- s[indexOfDateColumn] contains the DateTime stamp in the log file
if (string.Compare(s[indexOfDateColumn], sCheckDate) >= 0)
{
//--- insert at top of list or they'd be in reverse chronological order
listRet.Insert(0, s[1]);
}
else
{
break; //out of loop
}
}
}
}
return listRet;
}
/// <summary>
/// Returns DateTime Stamp as formatted in the log file
/// </summary>
/// <param name="dtm">DateTime value</param>
/// <returns></returns>
private string GetTimeStamp(DateTime dtm)
{
// adjust format string to match what you use
return dtm.ToString("u");
}
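For example, a usage sketch (assuming msFileName has been set to the log file's path):
// Messages from the last 7 days, using the default TAB separator,
// the date in column 0 and a header row present.
List<string> recent = ReadMessagesFromLastNDays(7);
foreach (string msg in recent)
    Console.WriteLine(msg);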