I want separate async threads of the SplitFile method to run so that the task finishes faster, but the code below is not working. When I debug, it reaches the line RecCnt = File.ReadAllLines(SourceFile).Length - 1; and then exits. Please help.
public delegate void SplitFile_Delegate(FileInfo file);
static void Main(string[] args)
{
DirectoryInfo d = new DirectoryInfo(@"D:\test\Perf testing Splitter"); //Assuming Test is your Folder
FileInfo[] Files = d.GetFiles("*.txt"); //Getting Text files
foreach (FileInfo file in Files)
{
SplitFile_Delegate LocalDelegate = new SplitFile_Delegate(SplitFile);
IAsyncResult R = LocalDelegate.BeginInvoke(file, null, null); //invoking the method
LocalDelegate.EndInvoke(R);
}
}
private static void SplitFile(FileInfo file)
{
try
{
String fname;
//int FileLength;
int RecCnt;
int fileCount;
fname = file.Name;
String SourceFile = @"D:\test\Perf testing Splitter\" + file.Name;
RecCnt = File.ReadAllLines(SourceFile).Length - 1;
fileCount = RecCnt / 10000;
FileStream fs = new FileStream(SourceFile, FileMode.Open);
using (StreamReader sr = new StreamReader(fs))
{
while (!sr.EndOfStream)
{
String dataLine = sr.ReadLine();
for (int x = 0; x < (fileCount + 1); x++)
{
String Filename = @"D:\test\Perf testing Splitter\Destination Files\" + fname + "_" + x + "by" + (fileCount + 1) + ".txt"; //test0by4
using (StreamWriter Writer = File.AppendText(Filename))
{
for (int y = 0; y < 10000; y++)
{
Writer.WriteLine(dataLine);
dataLine = sr.ReadLine();
}
Writer.Close();
}
}
}
}
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
Your code doesn't really need any multi-threading. It doesn't really even need asynchronous processing all that much - you're saturating the I/O most likely, and unless you've got multiple drives as the data sources, you're not going to improve that by adding parallelism.
On the other hand, your code is reading each file twice. For no reason, wasting memory, time and even CPU. Instead, just do this:
FileStream fs = new FileStream(SourceFile, FileMode.Open);
using (StreamReader sr = new StreamReader(fs))
{
    string line;
    string fileName = null;
    StreamWriter outputFile = null;
    int lineCounter = 0;
    int outputFileIndex = 0;
    while ((line = sr.ReadLine()) != null)
    {
        // Start a new output file for the first line and after every 10000 lines.
        if (fileName == null || lineCounter >= 10000)
        {
            lineCounter = 0;
            outputFileIndex++;
            fileName = @"D:\Output\" + fname + "_" + outputFileIndex + ".txt";
            if (outputFile != null) outputFile.Dispose();
            outputFile = File.AppendText(fileName);
        }
        outputFile.WriteLine(line);
        lineCounter++;
    }
    // Flush and close the last output file.
    if (outputFile != null) outputFile.Dispose();
}
If you really need to have the filename in format XOutOfY, you can just rename them afterwards - it's a lot cheaper than reading the source file twice, line after line. Or, if you don't care about keeping the whole file in memory at once, just use the array you got from ReadAllLines and iterate over that, rather than doing the reading all over again.
To make this even easier, you can also use foreach (var line in File.ReadLines(fileName)).
If you really want to make this asynchronous, the way to handle that is by using asynchronous I/O, not just by spooling new threads. So you can use await with StreamReader.ReadLineAsync etc.
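As a minimal sketch of what the asynchronous-I/O variant could look like, assuming .NET 4.5 or later: the class name AsyncSplitter, the method SplitFileAsync and its parameters are hypothetical names chosen for illustration, not something from the question.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class AsyncSplitter
{
    // Splits sourceFile into files of at most linesPerFile lines each, written to
    // outputDir as <name>_1.txt, <name>_2.txt, ... (hypothetical naming scheme).
    // Returns the number of output files created.
    public static async Task<int> SplitFileAsync(string sourceFile, string outputDir, int linesPerFile)
    {
        string baseName = Path.GetFileNameWithoutExtension(sourceFile);
        int fileIndex = 0;
        int lineCounter = 0;
        StreamWriter writer = null;
        try
        {
            using (var reader = new StreamReader(sourceFile))
            {
                string line;
                // The source file is read exactly once, one line at a time.
                while ((line = await reader.ReadLineAsync()) != null)
                {
                    if (writer == null || lineCounter >= linesPerFile)
                    {
                        if (writer != null) writer.Dispose();
                        fileIndex++;
                        lineCounter = 0;
                        string outputPath = Path.Combine(outputDir, baseName + "_" + fileIndex + ".txt");
                        writer = new StreamWriter(outputPath);
                    }
                    await writer.WriteLineAsync(line);
                    lineCounter++;
                }
            }
        }
        finally
        {
            // Flush and close the last output file, even if an exception occurred.
            if (writer != null) writer.Dispose();
        }
        return fileIndex;
    }
}
```

This keeps the single-pass structure of the loop above; the only change is that the calling thread is not blocked while the reads and writes are pending. It is unlikely to make the disk itself any faster.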
You are not required to call EndInvoke and really all EndInvoke does is wait on the return value for you. Since SplitFile returns void, my guess is there's an optimization that kicks in because you don't need to wait on anything and it simply ignores the wait. For more details: C# Asynchronous call without EndInvoke?
That being said, your usage of Begin/EndInvoke will likely not be faster than serial programming (and will likely be marginally slower) as your for loop is still serialized, and you're still running the iteration in serial. All that has changed is you're using a delegate where it looks like one isn't necessary.
It's possible that what you meant to use was Parallel.ForEach (MSDN: https://msdn.microsoft.com/en-us/library/dd992001(v=vs.110).aspx) which will potentially run iterations in parallel.
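A minimal sketch of that idea, assuming .NET 4 or later. The class ParallelSplitDemo and the method ProcessAll are hypothetical names; the per-file work is passed in as a delegate so the sketch does not depend on the question's SplitFile.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class ParallelSplitDemo
{
    // Runs processFile once for every *.txt file in sourceDir.
    // Parallel.ForEach may run several iterations at the same time,
    // and it returns only after all of them have finished.
    public static void ProcessAll(string sourceDir, Action<FileInfo> processFile)
    {
        FileInfo[] files = new DirectoryInfo(sourceDir).GetFiles("*.txt");
        Parallel.ForEach(files, processFile);
    }
}
```

From inside the question's class this could be called as ParallelSplitDemo.ProcessAll(@"D:\test\Perf testing Splitter", SplitFile);. Since each iteration may run on a different thread, the per-file method must not share mutable state with the other iterations.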
Edit: As someone else has mentioned, having multiple threads engage in file operations will likely not improve performance as your file ops are probably disk bound. The main benefit you would get from an async file read/write would probably be unblocking the main thread for a UI update. You will need to specify what you want with "performance" if you want a better answer.
I have two text files, Source.txt and Target.txt. The source will never be modified and contains N lines of text. I want to delete a specific line of text in Target.txt and replace it with a specific line of text from Source.txt. I know which line number I need: it is line 2 in both files.
I have something like this:
string line = string.Empty;
int line_number = 1;
int line_to_edit = 2;
using StreamReader reader = new StreamReader(@"C:\target.xml");
using StreamWriter writer = new StreamWriter(@"C:\target.xml");
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
writer.WriteLine(line);
line_number++;
}
But when I open the writer, the target file gets erased. It writes the lines, but when I open the target file it contains only the copied lines; the rest are lost.
What can I do?
The easiest way is:
static void lineChanger(string newText, string fileName, int line_to_edit)
{
string[] arrLine = File.ReadAllLines(fileName);
arrLine[line_to_edit - 1] = newText;
File.WriteAllLines(fileName, arrLine);
}
Usage:
lineChanger("new content for this line" , "sample.text" , 34);
You can't rewrite a line without rewriting the entire file (unless the lines happen to be the same length). If your files are small then reading the entire target file into memory and then writing it out again might make sense. You can do that like this:
using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
int line_to_edit = 2; // Warning: 1-based indexing!
string sourceFile = "source.txt";
string destinationFile = "target.txt";
// Read the appropriate line from the file.
string lineToWrite = null;
using (StreamReader reader = new StreamReader(sourceFile))
{
for (int i = 1; i <= line_to_edit; ++i)
lineToWrite = reader.ReadLine();
}
if (lineToWrite == null)
throw new InvalidDataException("Line does not exist in " + sourceFile);
// Read the old file.
string[] lines = File.ReadAllLines(destinationFile);
// Write the new file over the old file.
using (StreamWriter writer = new StreamWriter(destinationFile))
{
for (int currentLine = 1; currentLine <= lines.Length; ++currentLine)
{
if (currentLine == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
writer.WriteLine(lines[currentLine - 1]);
}
}
}
}
}
If your files are large it would be better to create a new file so that you can read streaming from one file while you write to the other. This means that you don't need to have the whole file in memory at once. You can do that like this:
using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
int line_to_edit = 2;
string sourceFile = "source.txt";
string destinationFile = "target.txt";
string tempFile = "target2.txt";
// Read the appropriate line from the file.
string lineToWrite = null;
using (StreamReader reader = new StreamReader(sourceFile))
{
for (int i = 1; i <= line_to_edit; ++i)
lineToWrite = reader.ReadLine();
}
if (lineToWrite == null)
throw new InvalidDataException("Line does not exist in " + sourceFile);
// Read from the target file and write to a new file.
int line_number = 1;
string line = null;
using (StreamReader reader = new StreamReader(destinationFile))
using (StreamWriter writer = new StreamWriter(tempFile))
{
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
writer.WriteLine(line);
}
line_number++;
}
}
// TODO: Delete the old file and replace it with the new file here.
}
}
You can afterwards move the file once you are sure that the write operation has succeeded (no exception was thrown and the writer is closed).
Note that in both cases it is a bit confusing that you are using 1-based indexing for your line numbers. It might make more sense in your code to use 0-based indexing. You can have 1-based index in your user interface to your program if you wish, but convert it to a 0-indexed before sending it further.
Also, a disadvantage of directly overwriting the old file with the new file is that if it fails halfway through then you might permanently lose whatever data wasn't written. By writing to a third file first you only delete the original data after you are sure that you have another (corrected) copy of it, so you can recover the data if the computer crashes halfway through.
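One way the write-then-swap idea could be sketched is with File.Replace, which swaps in the new file and keeps the previous contents as a backup. The class LineReplacer, the method ReplaceLine and the ".tmp" / ".bak" file names are assumptions made for this illustration.

```csharp
using System.IO;

public static class LineReplacer
{
    // Replaces the 1-based line lineToEdit of targetFile with newText by streaming
    // into a temporary file and swapping it in only after the write has finished.
    public static void ReplaceLine(string targetFile, int lineToEdit, string newText)
    {
        string tempFile = targetFile + ".tmp";   // hypothetical temp-file name
        string backupFile = targetFile + ".bak"; // hypothetical backup-file name

        using (var reader = new StreamReader(targetFile))
        using (var writer = new StreamWriter(tempFile))
        {
            string line;
            int lineNumber = 1;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(lineNumber == lineToEdit ? newText : line);
                lineNumber++;
            }
        }

        // Both streams are closed at this point. File.Replace moves tempFile over
        // targetFile and stores the previous contents of targetFile in backupFile.
        File.Replace(tempFile, targetFile, backupFile);
    }
}
```

If an exception is thrown while copying, the original target file has not been touched yet, which is the recovery property described above.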
A final remark: I noticed that your files had an xml extension. You might want to consider if it makes more sense for you to use an XML parser to modify the contents of the files instead of replacing specific lines.
When you create a StreamWriter this way, it always creates the file from scratch. You will have to create a third file, copy the content from the target while replacing what you need, and then replace the old file.
But as far as I can see, what you need is XML manipulation; you might want to use XmlDocument and modify your file using XPath.
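A rough sketch of the XmlDocument route follows. The structure of the asker's XML is not shown, so the class XmlEditDemo, the method SetNodeText and any XPath expression passed to it are hypothetical.

```csharp
using System.Xml;

public static class XmlEditDemo
{
    // Loads xmlFile, sets the text of the first node matching xpath to newValue,
    // and saves the document back. Returns false if no node matched.
    public static bool SetNodeText(string xmlFile, string xpath, string newValue)
    {
        var doc = new XmlDocument();
        doc.Load(xmlFile);

        XmlNode node = doc.SelectSingleNode(xpath);
        if (node == null)
            return false;

        node.InnerText = newValue;
        doc.Save(xmlFile);
        return true;
    }
}
```

For a file shaped like <root><item>old</item></root>, a call such as XmlEditDemo.SetNodeText(path, "/root/item", "new") would change the element's text without relying on line numbers.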
You need to open the output file for write access rather than using a new StreamWriter, which by default overwrites the output file.
FileStream stm = null;
FileInfo fi = new FileInfo(@"C:\target.xml");
if (fi.Exists)
    stm = fi.OpenWrite();
Of course, you will still have to seek to the correct line in the output file, which will be hard since you can't read from it, so unless you already KNOW the byte offset to seek to, you probably really want read/write access.
FileStream stm = fi.Open(FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
With this stream, you can read until you get to the point where you want to make changes, then write. Keep in mind that you are writing bytes, not lines, so to overwrite a line you will need to write exactly as many bytes as the line you want to change occupies.
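To make the byte-level constraint concrete, here is a small sketch that overwrites part of a file in place. It assumes the byte offset is already known and that the file is plain ASCII; the class InPlaceOverwrite and the method OverwriteAt are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

public static class InPlaceOverwrite
{
    // Overwrites the bytes starting at byteOffset with replacement.
    // The file length never changes, so the replacement must fit inside
    // the existing file and should be exactly as long as the text it replaces.
    public static void OverwriteAt(string path, long byteOffset, string replacement)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(replacement); // assumption: ASCII content

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
        {
            if (byteOffset < 0 || byteOffset + bytes.Length > fs.Length)
                throw new ArgumentOutOfRangeException("byteOffset");

            fs.Seek(byteOffset, SeekOrigin.Begin);
            fs.Write(bytes, 0, bytes.Length);
        }
    }
}
```

If the replacement is shorter or longer than the original line, the neighbouring bytes are left over or overwritten, which is why the answers above rewrite the whole file instead.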
I guess the code below should work (in place of the writer part of your example). Unfortunately I have no build environment at hand, so it is written from memory, but I hope it helps.
using (var fs = File.Open(filePath, FileMode.Open, FileAccess.ReadWrite))
{
var destinationReader = new StreamReader(fs);
var writer = new StreamWriter(fs);
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
destinationReader .ReadLine();
}
line_number++;
}
}
The solution works fine, but I need to change a single line of text when the same text occurs in multiple places. To do this, define a trackText: matching starts only at the first line that contains it, and the first line from there on that contains oldText is replaced with newText.
private int FindLineNumber(string fileName, string trackText, string oldText, string newText)
{
    int lineNumber = 0;
    bool traced = false; // becomes true once trackText has been seen
    string[] textLine = System.IO.File.ReadAllLines(fileName);
    for (int i = 0; i < textLine.Length; i++)
    {
        if (textLine[i].Contains(trackText)) // start matching from this line on
            traced = true;
        if (traced && textLine[i].Contains(oldText)) // match text
        {
            textLine[i] = newText; // replace the whole line with the new text
            traced = false;
            System.IO.File.WriteAllLines(fileName, textLine);
            lineNumber = i;
            break; // leave the loop
        }
    }
    return lineNumber;
}
I have a text file with 457379 lines and this structure
Key1\t\tValue1
Key2\t\tValue2
I'm using this code to load it into a Dictionary<string,string>
private void StartScan()
{
using (StreamReader sr = new StreamReader("fh.txt"))
{
while (!sr.EndOfStream)
{
scaned++;
label4.Text = scaned.ToString();
var read = sr.ReadLine().Split(new string[] { "\t\t" }, StringSplitOptions.None);
fh.Add(read[0], read[1]);
}
}
}
but it takes more than 6 minutes to load data.
Is there a better way to load the data?
The problem is that you're updating a UI element (label4) every time you read a line.
This can be very expensive, so I suggest either removing the line:
label4.Text = scaned.ToString();
or updating it less frequently, e.g. once every 100 lines read.
Try:
private void StartScan()
{
    var lastUpdate = 0;
    ...
    if (lastUpdate + 100 < scaned)
    {
        label4.Text = scaned.ToString();
        lastUpdate = scaned;
    }
    ...
This might improve things quite a bit; I guess updating the label is one of the most expensive operations in your code.
I find File.ReadLines to be the easiest/quickest way to process files line-by-line:
var dictionary = File.ReadLines("C:\\file.txt")
.Select(s => s.Split(new string[] { "\t\t" }, StringSplitOptions.None))
.ToDictionary(k => k[0], v => v[1]);
Having said that, there's not much functional difference between the code above and what you already have other than the fact it's slightly less verbose.
One thing that you can do is use a BufferedStream.
using (FileStream fs = File.Open(file, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs))
{
string line;
while ((line = sr.ReadLine()) != null)
{
//Do the add
}
}
You will see an improvement. Also do you need a Dictionary? If you don't need a key mapping to each value use a HashSet. It is slightly faster in adding. Just a bit but it might make a difference in the long run.
I tried to split a file of about 32 GB using the code below, but I got an out-of-memory exception.
Please suggest a way to split the file using C#.
string[] splitFile = File.ReadAllLines(@"E:\\JKS\\ImportGenius\\0.txt");
int cycle = 1;
int splitSize = Convert.ToInt32(txtNoOfLines.Text);
var chunk = splitFile.Take(splitSize);
var rem = splitFile.Skip(splitSize);
while (chunk.Take(1).Count() > 0)
{
string filename = "file" + cycle.ToString() + ".txt";
using (StreamWriter sw = new StreamWriter(filename))
{
foreach (string line in chunk)
{
sw.WriteLine(line);
}
}
chunk = rem.Take(splitSize);
rem = rem.Skip(splitSize);
cycle++;
}
Well, to start with you need to use File.ReadLines (assuming you're using .NET 4) so that it doesn't try to read the whole thing into memory. Then I'd just keep calling a method to spit the "next" however many lines to a new file:
int splitSize = Convert.ToInt32(txtNoOfLines.Text);
using (var lineIterator = File.ReadLines(...).GetEnumerator())
{
bool stillGoing = true;
for (int chunk = 0; stillGoing; chunk++)
{
stillGoing = WriteChunk(lineIterator, splitSize, chunk);
}
}
...
private static bool WriteChunk(IEnumerator<string> lineIterator,
int splitSize, int chunk)
{
using (var writer = File.CreateText("file " + chunk + ".txt"))
{
for (int i = 0; i < splitSize; i++)
{
if (!lineIterator.MoveNext())
{
return false;
}
writer.WriteLine(lineIterator.Current);
}
}
return true;
}
Do not read all lines into an array at once; instead use the StreamReader.ReadLine method, like this:
using (StreamReader sr = new StreamReader(@"E:\\JKS\\ImportGenius\\0.txt"))
{
while (sr.Peek() >= 0)
{
var fileLine = sr.ReadLine();
//do something with line
}
}
File.ReadAllLines
That will read the whole file into memory.
To work with large files you need to only read what you need now into memory, and then throw that away as soon as you have finished with it.
A better option would be File.ReadLines which returns a lazy enumerator, data is only read into memory as you get the next line from the enumerator. Providing you avoid multiple enumerations (eg. don't use Count()) only parts of the file will be read.
Instead of reading all the file at once using File.ReadAllLines, use File.ReadLines in a foreach loop to read the lines as needed.
foreach (var line in File.ReadLines(@"E:\\JKS\\ImportGenius\\0.txt"))
{
// Do something
}
Edit: On an unrelated note, you don't have to escape your backslashes when prefixing the string with an '@'. So either write "E:\\JKS\\ImportGenius\\0.txt" or @"E:\JKS\ImportGenius\0.txt", but @"E:\\JKS\\ImportGenius\\0.txt" is redundant.
The problem here is that you are reading the entire file's content into memory at once with File.ReadAllLines(). What you need to do is open a FileStream with File.OpenRead() and read/write smaller chunks.
Edit: Actually for your case ReadLine is obviously better. See other answers. :)
Use a StreamReader to read the file, write with a StreamWriter.
I am trying to remove the space at the end of each line and then write that line to another file.
But when the program reaches the file writer, it gives me the following error:
Process can't be accessed because it is being used by another process.
The code is below.
private void FrmCounter_Load(object sender, EventArgs e)
{
string[] filePaths = Directory.GetFiles(@"D:\abc", "*.txt", SearchOption.AllDirectories);
string activeDir = @"D:\dest";
System.IO.StreamWriter fw;
string result;
foreach (string file in filePaths)
{
result = Path.GetFileName(file);
System.IO.StreamReader f = new StreamReader(file);
string newFileName = result;
// Combine the new file name with the path
string newPath = System.IO.Path.Combine(activeDir, newFileName);
File.Create(newPath);
fw = new StreamWriter(newPath);
int counter = 0;
int spaceAtEnd = 0;
string line;
// Read the file and display it line by line.
while ((line = f.ReadLine()) != null)
{
if (line.EndsWith(" "))
{
spaceAtEnd++;
line = line.Substring(0, line.Length - 1);
}
fw.WriteLine(line);
fw.Flush();
counter++;
}
MessageBox.Show("File Name : " + result);
MessageBox.Show("Total Space at end : " + spaceAtEnd.ToString());
f.Close();
fw.Close();
}
}
File.Create itself returns a stream.
Use that stream to write the file. You are receiving this error because the stream returned by File.Create is still open when you try to open the same file again for writing.
Either close the stream returned by File.Create or, better, use that stream to write the file:
Stream newFile = File.Create(newPath);
fw = new StreamWriter(newFile);
Even though you solved your initial problem, if you want to write everything into a new file in the original location, you can try to read all of the data into an array and close the original StreamReader. Performance note: If your file is sufficiently large though, this option is not going to be the best for performance.
And you don't need File.Create as the StreamWriter will create a file if it doesn't exist, or overwrite it by default or if you specify the append parameter as false.
result = Path.GetFileName(file);
String[] f = File.ReadAllLines(file); // major change here...
                                      // now f is an array containing all lines
                                      // instead of a stream reader
int spaceAtEnd = 0; // declared before the using block so it is still in scope below
using (var fw = new StreamWriter(result, false))
{
    int counter = f.Length; // you aren't using counter anywhere, so I don't know if
                            // it is needed, but now you can just access the `Length`
                            // property of the array and get the length without a
                            // counter
    // Strip one trailing space per line and write the line out.
    foreach (var item in f)
    {
        var line = item;
        if (line.EndsWith(" "))
        {
            spaceAtEnd++;
            line = line.Substring(0, line.Length - 1);
        }
        fw.WriteLine(line);
        fw.Flush();
    }
}
MessageBox.Show("File Name : " + result);
MessageBox.Show("Total Space at end : " + spaceAtEnd.ToString());
Also, you will not remove multiple spaces from the end of the line using this method. If you need to do that, consider replacing line = line.Substring(0, line.Length - 1); with line = line.TrimEnd(' ');
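The difference between the two calls can be seen in a tiny sketch; the class TrimDemo and its two helper methods are hypothetical names used only for illustration.

```csharp
public static class TrimDemo
{
    // Removes exactly one character from the end, as the Substring call above does.
    public static string RemoveOne(string line)
    {
        return line.Substring(0, line.Length - 1);
    }

    // Removes every trailing space, however many there are.
    public static string RemoveAll(string line)
    {
        return line.TrimEnd(' ');
    }
}
```

For the input "value" followed by three spaces, RemoveOne leaves two trailing spaces behind, while RemoveAll returns "value".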
In your case, you have to close any files you are reading before you attempt to write to them.
Wrap the stream in a using statement, like this:
using (System.IO.StreamReader f = new StreamReader(file))
{
//your code goes here
}
EDIT:
Zafar is correct, however, maybe this will clear things up.
Because File.Create returns a stream.. that stream has opened your destination file. This will make things clearer:
File.Create(newPath).Close();
Using the above line, makes it work, however, I would suggest re-writing that properly. This is just for illustrative purposes.
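One possible shape for such a rewrite, as a sketch only: both streams live in using blocks and File.Create is dropped, because StreamWriter creates the file itself. The class TrailingSpaceCleaner and the method CopyWithoutTrailingSpaces are hypothetical names, and the UI calls from the question are left out.

```csharp
using System.IO;

public static class TrailingSpaceCleaner
{
    // Copies sourcePath to destPath, removing trailing spaces from every line.
    // Returns the number of lines that ended with at least one space.
    public static int CopyWithoutTrailingSpaces(string sourcePath, string destPath)
    {
        int linesWithTrailingSpace = 0;

        // The using blocks close both files even if an exception is thrown,
        // so no handle is left open to trigger the "used by another process" error.
        using (var reader = new StreamReader(sourcePath))
        using (var writer = new StreamWriter(destPath, false)) // false = overwrite, do not append
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.EndsWith(" "))
                {
                    linesWithTrailingSpace++;
                    line = line.TrimEnd(' ');
                }
                writer.WriteLine(line);
            }
        }

        return linesWithTrailingSpace;
    }
}
```

Inside the question's foreach loop this could be called as int spaceAtEnd = TrailingSpaceCleaner.CopyWithoutTrailingSpaces(file, newPath);, with the message boxes shown afterwards.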