I have a text file that contains about 100,000 articles.
The structure of the file is:
.Document ID 42944-YEAR:5
.Date 03\08\11
.Cat political
Article Content 1
.Document ID 42945-YEAR:5
.Date 03\08\11
.Cat political
Article Content 2
I want to open this file in C# and process it line by line.
I tried this code:
String[] FileLines = File.ReadAllText(TB_SourceFile.Text).Split(Environment.NewLine.ToCharArray());
But it says:
Exception of type 'System.OutOfMemoryException' was thrown.
The question is: how can I open this file and read it line by line?
File Size: 564 MB (591,886,626 bytes)
File Encoding: UTF-8
File contains Unicode characters.
You can open the file and read it as a stream rather than loading everything into memory all at once.
From MSDN:
using System;
using System.IO;
class Test
{
public static void Main()
{
try
{
// Create an instance of StreamReader to read from a file.
// The using statement also closes the StreamReader.
using (StreamReader sr = new StreamReader("TestFile.txt"))
{
String line;
// Read and display lines from the file until the end of
// the file is reached.
while ((line = sr.ReadLine()) != null)
{
Console.WriteLine(line);
}
}
}
catch (Exception e)
{
// Let the user know what went wrong.
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}
}
}
Your file is too large to be read into memory in one go, as File.ReadAllText is trying to do. You should instead read the file line by line.
Adapted from MSDN:
string line;
// Read the file and display it line by line.
using (StreamReader file = new StreamReader(@"c:\yourfile.txt"))
{
while ((line = file.ReadLine()) != null)
{
Console.WriteLine(line);
// do your processing on each line here
}
}
In this way, no more than a single line of the file is in memory at any one time.
If you are using .NET Framework 4, there is a new static method on System.IO.File called ReadLines that returns an IEnumerable<string>. I believe it was added to the framework for this exact scenario; however, I have yet to use it myself.
MSDN Documentation - File.ReadLines Method (String)
Related Stack Overflow Question - Bug in the File.ReadLines(..) method of the .net framework 4.0
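For example, a minimal sketch of using File.ReadLines on the article file from the question (the path is illustrative):

// File.ReadLines streams the file lazily, one line at a time (.NET 4+).
foreach (string line in File.ReadLines(@"c:\articles.txt"))
{
    if (line.StartsWith(".Document ID"))
    {
        // a new article record begins here
    }
    // process the line
}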
Something like this:
using (var fileStream = File.OpenText(@"path to file"))
{
    while (!fileStream.EndOfStream)
    {
        var fileLine = fileStream.ReadLine();
        // process fileLine here
    }
}
I'm using a StreamWriter to write data to a file. The data is a (potentially) long string that should be saved in full.
I've seen cases where the file is created, but its contents are not the entire string; that is, the string appears to have been "cut" during saving and not saved in its entirety.
The "buggy" file I have contains exactly 4096 characters, which is exactly the size of the internal buffer used by the StreamWriter class.
Example, similar to the code we're using:
string output = "......"; // long string
StreamWriter sw = File.CreateText(filename);
if (sw == null)
{
    return;
}
try
{
    sw.Write(output);
}
finally
{
    if (sw != null)
    {
        sw.Close();
    }
}
My question is:
Is this an expected scenario?
E.g., can StreamWriter write only part of the string it should save? If so, is there any simple way to overcome this?
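For what it's worth, a common cause of this kind of truncation is the writer never being flushed or disposed (for example, an exception thrown before Close runs, or the process exiting early). A minimal sketch of the usual fix, reusing the filename and output variables from the snippet above:

// A using block guarantees Dispose(), which flushes StreamWriter's
// internal buffer and closes the file even if Write throws.
using (StreamWriter sw = File.CreateText(filename))
{
    sw.Write(output);
}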
I'm new to C# and I have a problem. I would like to have the content of a file written to a RichTextBox, but the StreamReader.ReadLine method reads only the first line.
How can I solve this?
Probably the easiest way to do this would be to use the System.IO.File class's ReadAllText method:
myRichTextBox.Text = File.ReadAllText(filePath);
This class has a bunch of static methods that wrap the StreamReader class for you that make reading and writing to files quite easy.
There are several ways to read a file. If you want to do it with a StreamReader and read the whole file, this could be a solution:
using System;
using System.IO;
class Test
{
public static void Main()
{
try
{
// Create an instance of StreamReader to read from a file.
// The using statement also closes the StreamReader.
using (StreamReader sr = new StreamReader("TestFile.txt"))
{
string line;
// Read and display lines from the file until the end of
// the file is reached.
while ((line = sr.ReadLine()) != null)
{
Console.WriteLine(line);
}
}
}
catch (Exception e)
{
// Let the user know what went wrong.
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}
}
}
You could edit the part where the output to the console happens. There it would be possible to concatenate the string for your RichTextBox with:
text += Environment.NewLine + line;
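Alternatively, assuming a WinForms RichTextBox and a file small enough to fit in memory, you could skip the loop entirely and assign the lines directly:

// Reads the whole file at once; each array element becomes one line in the box.
myRichTextBox.Lines = File.ReadAllLines(filePath);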
The code is running on Windows CE 5.0 / .NET Compact Framework 2.0.
I always get an exception that says:
The process cannot access the file because it is being used by another process.
I'm really confused, as I already enclose the stream in a using statement, so the stream should be closed automatically once execution leaves the using block.
//read text
StringBuilder sb = new StringBuilder();
using (StreamReader sr = File.OpenText(fname))
{
string line;
while ((line = sr.ReadLine()) != null)
{
// append into stringbuilder
sb.Append(line);
sb.Append("\n");
}
}
//write text, below code raise the exception.
//if i comment it and re-run the code,exception disappear
using (StreamWriter sw = File.CreateText(fname))
{
sw.Write(sb.ToString());
}
Addition: I just want to update the file (read and write). Is there a better way?
Sorry guys, the issue is in my code, and I confused you here because I didn't share that code.
It's actually because I wrote this at the very beginning of the program:
// f is the FileInfo which points to fname as well
string text = f.OpenText().ReadToEnd();
This created a StreamReader that was never assigned to a variable and never disposed, so it kept the file open and I overlooked it.
Thanks to the people helping here. BTW, I changed the code to this and the issue was gone:
using (StreamReader sr = f.OpenText())
{
string text = sr.ReadToEnd();
}
I tested this code on my computer; there is no problem.
A better way: to read and write the full file, you can use File.ReadAllText(fname) and File.WriteAllText(fname, contents). And instead of using \n, use Environment.NewLine.
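A minimal sketch of that read-modify-write approach (the replacement strings are illustrative):

string text = File.ReadAllText(fname);
// modify the contents as needed, e.g. a hypothetical replacement:
text = text.Replace("oldValue", "newValue");
File.WriteAllText(fname, text);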
I need help figuring out the fastest way to read through about 80 files with over 500,000 lines in each file, and write them to one master file with each input file's line as a column in the master. The master file must be plain text that can be opened in an editor like Notepad, not a Microsoft Office product, because those can't handle the number of lines.
For example, the master file should look something like this:
File1_Row1,File2_Row1,File3_Row1,...
File1_Row2,File2_Row2,File3_Row2,...
File1_Row3,File2_Row3,File3_Row3,...
etc.
I've tried 2 solutions so far:
Create a jagged array to hold each file's contents, and then once all lines in all files have been read, write the master file. The issue with this solution is that Windows throws an error that too much virtual memory is being used.
Dynamically create a reader thread for each of the 80 files that reads a specific line number, and once all threads have read a line, combine those values, write them to the file, and repeat for each line in all files. The issue with this solution is that it is very, very slow.
Does anybody have a better solution for reading so many large files in a fast way?
The best way is going to be to open the input files with a StreamReader for each one and a StreamWriter for the output file. Then you loop through each reader and read a single line and write it to the master file. This way you are only loading one line at a time so there should be minimal memory pressure. I was able to copy 80 ~500,000 line files in 37 seconds. An example:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
class MainClass
{
static string[] fileNames = Enumerable.Range(1, 80).Select(i => string.Format("file{0}.txt", i)).ToArray();
public static void Main(string[] args)
{
var stopwatch = Stopwatch.StartNew();
List<StreamReader> readers = fileNames.Select(f => new StreamReader(f)).ToList();
try
{
using (StreamWriter writer = new StreamWriter("master.txt"))
{
string line = null;
do
{
for(int i = 0; i < readers.Count; i++)
{
if ((line = readers[i].ReadLine()) != null)
{
writer.Write(line);
}
if (i < readers.Count - 1)
writer.Write(",");
}
writer.WriteLine();
} while (line != null);
}
}
finally
{
foreach(var reader in readers)
{
reader.Close();
}
}
Console.WriteLine("Elapsed {0} ms", stopwatch.ElapsedMilliseconds);
}
}
I've assumed that all the input files have the same number of lines, but you should add logic to keep reading as long as at least one file is still giving you data.
Memory-mapped files seem to be what is suitable for you: something that does not put pressure on your app's memory while maintaining good performance in I/O operations.
Here is the complete documentation: Memory-Mapped Files
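A minimal sketch of reading one input file through a memory-mapped view (requires .NET 4; the file name is illustrative):

using System.IO.MemoryMappedFiles;

using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile("file1.txt"))
using (MemoryMappedViewStream stream = mmf.CreateViewStream())
using (StreamReader reader = new StreamReader(stream))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // process the line; note that the view is rounded up to the system
        // page size, so trailing '\0' characters may appear past the file's end
    }
}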
If you have enough memory on the computer, I would use the Parallel.Invoke construct and read each file into a pre-allocated array such as:
string[] file1lines = new string[someValue]; // someValue = expected line count
string[] file2lines = new string[someValue];
string[] file3lines = new string[someValue];
Parallel.Invoke(
() =>
{
ReadMyFile(file1,file1lines);
},
() =>
{
ReadMyFile(file2,file2lines);
},
() =>
{
ReadMyFile(file3,file3lines);
}
);
Each ReadMyFile method should just use the following sample code which, according to these benchmarks, is the fastest way to read a text file:
int x = 0;
using (StreamReader sr = File.OpenText(fileName))
{
while ((file1lines[x] = sr.ReadLine()) != null)
{
x += 1;
}
}
If you need to manipulate the data from each file before writing your final output, read this article on the fastest way to do that.
Then you just need one method to write the contents of each string[] to the output as you desire.
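A sketch of that final write, assuming the three arrays above have equal lengths:

using (StreamWriter writer = new StreamWriter("master.txt"))
{
    for (int row = 0; row < file1lines.Length; row++)
    {
        // Join the row-th line of each file into one comma-separated row.
        writer.WriteLine(string.Join(",", new[] { file1lines[row], file2lines[row], file3lines[row] }));
    }
}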
Have an array of open file handles. Loop through this array, read a line from each file into a string array, and then combine that array into one line of the master file, appending a newline at the end.
This differs from your second approach in that it is single-threaded and doesn't read a specific line number but always the next one.
Of course, you need to handle files that have fewer lines than the others, as sketched below.
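A sketch of that idea, writing an empty column once a file runs out of lines (it reuses the fileNames array from the answer above):

List<StreamReader> readers = fileNames.Select(f => new StreamReader(f)).ToList();
try
{
    using (StreamWriter writer = new StreamWriter("master.txt"))
    {
        bool anyData = true;
        while (anyData)
        {
            anyData = false;
            string[] fields = new string[readers.Count];
            for (int i = 0; i < readers.Count; i++)
            {
                string line = readers[i].ReadLine();
                if (line != null)
                    anyData = true;
                fields[i] = line ?? ""; // exhausted files contribute an empty column
            }
            if (anyData)
                writer.WriteLine(string.Join(",", fields));
        }
    }
}
finally
{
    foreach (StreamReader reader in readers)
        reader.Close();
}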
Let's say I have a text file with the following content:
Hello!
How are you?
I want to call the file via a simple application that produces an output file with the following contents:
buildLetter.Append("Hello!").AppendLine();
buildLetter.Append("How are you?").AppendLine();
As you see, every line should be wrapped in double quotes.
Any help will be appreciated.
void ConvertFile(string inPath, string outPath)
{
using (var reader = new StreamReader(inPath))
using (var writer = new StreamWriter (outPath))
{
string line = reader.ReadLine();
while (line != null)
{
writer.WriteLine("buildLetter.Append(\"{0}\").AppendLine();",line.Trim());
line = reader.ReadLine ();
}
}
}
You should add some I/O exception handling on your own.
If you want to wrap each line in "" you could try combining the ReadAllLines and WriteAllLines methods:
File.WriteAllLines(
"output.txt",
File
.ReadAllLines("input.txt")
.Select(line => string.Format("\"{0}\"", line))
.ToArray()
);
Notice that this loads the whole file contents into memory, so it wouldn't work well with very large files. In that case, stream readers and writers are better suited.
Use the StreamReader class from System.IO.
Refer to this link for sample code.
All you probably need to do is change the line
Console.WriteLine(sr.ReadLine());
to
Console.WriteLine("\"" + sr.ReadLine() + "\""); // handwritten code - not tested :-)
For small text files, this works for me.
private void EditFile(string path, string oldText, string newText)
{
string content = File.ReadAllText(path);
content = content.Replace(oldText, newText);
File.WriteAllText(path, content);
}
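Hypothetical usage (the path and strings are illustrative):

EditFile(@"c:\notes.txt", "Hello", "Goodbye");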