Writing a huge amount of physical file names into text files in C#

I have developed a small tool in C# that displays data discrepancies. What it does, step by step:
Fetch data from the database and write the list of file names to a text file based on date criteria - output 1.
Take a directory listing of path 1 and write it to a text file - output 2.
Take directory listings of path 2, path 3 and path 4 and write them to separate text files - outputs 3/4/5.
Compare option: compare output 1 and output 2 and write the difference to a text file; that difference is then compared to output 3 and the new difference is written to another file, and so on.
My issue: my last path contains more than 2.5 million files. Whenever I try to write that listing to a text file, the application hangs and never produces any output. I did try filtering by date criteria, but even for a single day, where there are around 30 thousand records, it hangs.
I have searched many sites but have not found a solution that I can understand or implement. Below is my attempted code.
if (!txtpath3.Text.Equals(String.Empty) && System.IO.Directory.GetFiles(txtpath3.Text).Length > 0)
{
    var directory = txtpath3.Text;
    var from_dt = this.dtpickerstart.Value;
    var end_dt = this.dtpickerend.Value;
    DateTime from_date = from_dt;
    DateTime to_date = end_dt;
    DirectoryInfo di = new DirectoryInfo(directory);
    FileSystemInfo[] files = di.GetFileSystemInfos();
    var op = di.GetFiles()
        .Where(file => file.LastWriteTime >= from_date && file.LastWriteTime <= to_date);

    foreach (string file in System.IO.Directory.GetFiles(txtpath3.Text, "*.*"))
    {
        TextWriter tw = new StreamWriter(dirfile3, true);
        tw.WriteLine("" + file + "");
        tw.Close();
    }
}
else
{
}

Your foreach loop opens and closes the output file once for every line. You should open and close the file outside of the loop:
using (var tw = new StreamWriter(dirfile3, true))
{
    foreach (string file in System.IO.Directory.GetFiles(txtpath3.Text, "*.*"))
    {
        tw.WriteLine(file);
    }
}
Even easier would be using the already existing functions to do this:
File.AppendAllLines(dirfile3, System.IO.Directory.GetFiles(txtpath3.Text, "*.*"));
As 2.5 million file names are a lot to keep in RAM at once, you might be better off just enumerating them:
File.AppendAllLines(dirfile3, System.IO.Directory.EnumerateFiles(txtpath3.Text, "*.*"));
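If you also want to apply the date filter you already compute (the op query in your code), a sketch along these lines streams the filtered names through a single writer instead of materializing the whole listing. It assumes the same dirfile3, txtpath3 and date picker controls from your snippet:
// Sketch: stream the date-filtered file names into one open writer.
// Assumes the controls and the dirfile3 path from the question's code.
DateTime from_date = dtpickerstart.Value;
DateTime to_date = dtpickerend.Value;

var filtered = new DirectoryInfo(txtpath3.Text)
    .EnumerateFiles("*.*")
    .Where(f => f.LastWriteTime >= from_date && f.LastWriteTime <= to_date)
    .Select(f => f.FullName);

using (var writer = new StreamWriter(dirfile3, true))
{
    foreach (var name in filtered)
    {
        writer.WriteLine(name);
    }
}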

I think that the problem is in the foreach
foreach (string file in System.IO.Directory.GetFiles(txtpath3.Text, "*.*"))
{
    TextWriter tw = new StreamWriter(dirfile3, true);
    tw.WriteLine("" + file + "");
    tw.Close();
}
For each and every one of the many, many files, you are opening the output file, appending a line, and closing it, only to open it again, write another line, and so on.
You should prepare everything in a string first and then write all the text in one go, something like:
StringBuilder sb = new StringBuilder();
foreach (string file in System.IO.Directory.GetFiles(txtpath3.Text, "*.*"))
{
    sb.AppendLine(file);
}
File.WriteAllText(dirfile3, sb.ToString());

Related

Check specific row word count in text file based on starting row values

I have a text file with content in the below format:
0001EPP000000084906875 00000 0001
0002EPP000000084906875 00016 0002
0003EPP000000084906875
............
0001EPP000000084967647 00001 0002
0002EPP000000077676678 00016 0002
0003EPP000000084777770
I need to loop through all the rows, find the rows starting with 0001 and 0002, and get the corresponding word count for each of them (0001, 0002). With that count I will do further calculations...
For this, I have done this:
string filename = string.Empty;
DirectoryInfo dir = new DirectoryInfo(path);
FileInfo[] TXTFiles = dir.GetFiles("*.txt");
foreach (var file in TXTFiles)
{
    filename = file.Name;
}
var reader = new StreamReader(filename);
foreach (string item in File.ReadAllLines(filename))
{
    // Here I need to check the file content
}
Please, could anyone help with this? How do I identify the row content and get the word count when the row matches the starting criteria?
There are a couple of things that you can change that should get you where you want to be.
First, you can get rid of the StreamReader line, since File.ReadAllLines is all we need to get the contents.
Second, we should put the code that processes the file inside the foreach (var file ... loop. Otherwise, we're only processing the last file.
Third, you should replace file.Name (which is just the filename) with file.FullName (which includes the full path and file name). Otherwise, you will likely get a FileNotFoundException unless you happen to be searching in the current directory.
And finally, when reading the file, we can use StartsWith to check if the line starts with the text you're looking for, and when we find the line we want, we can use string.Split to break it on the space character into an array. Then it's just a matter of reading the parts you care about from the line.
The code would then look something like this:
var path = #"c:\public\temp";
DirectoryInfo dir = new DirectoryInfo(path);
foreach (var file in dir.GetFiles("*.txt"))
{
foreach (var line in File.ReadAllLines(file.FullName))
{
if (line.StartsWith("0001") || line.StartsWith("0002"))
{
var lineParts = line.Split(new[] { ' ' },
StringSplitOptions.RemoveEmptyEntries);
// This is assuming that the word count is in the column (which has index 2)
if (lineParts.Length > 2)
{
var wordCount = lineParts[2];
Console.WriteLine($"Found data in file {file.Name}:");
Console.WriteLine($" - Line starts with {lineParts[0].Substring(0, 4)}");
Console.WriteLine($" - Has word count of {wordCount}");
}
}
}
}

c# Looping through directories, finding the XML in each directory and, if & is present, replacing it with &amp; - Save

I would love to have the XML written correctly, but the file comes to us from a third party we have no control over (they are another vendor of our customer).
So with that in mind:
They send us a zip of folders and sub-folders (each of the sub-folders will contain an XML file).
The XML they send us has a bare & in place of &amp; (every & is like this, but not every XML file will have a &).
Using my code I can (on the console at least) loop through each of the sub-folders, find the XML, find the line with the & (if there is one) and change it to &amp;:
foreach (string d in Directory.GetFiles(targetDirectory, "*.xml", SearchOption.AllDirectories))
{
    Console.WriteLine(d);
    String[] lines = File.ReadAllLines(d);
    for (int i = 0; i < lines.Length; i++)
    {
        if (lines[i].Contains("&"))
        {
            string line = lines[i];
            line = line.Replace("&", "&amp;");
            Console.WriteLine(line);
        }
    }
}
In the last "Console.WriteLine(line);" it shows that the & is changed (in memory at least) but when I use the "System.IO.File" if still populates with the original &.
To save some time typing, instead of that loop, do something like:
lines = lines.Select(x => x.Replace("&", "&amp;")).ToArray();
foreach (string d in Directory.GetFiles(targetDirectory, "*.xml", SearchOption.AllDirectories))
{
    String[] lines = File.ReadAllLines(d);
    for (int i = 0; i < lines.Length; i++)
    {
        if (lines[i].Contains("&"))
        {
            lines[i] = lines[i].Replace("&", "&amp;");
        }
    }
    System.IO.File.WriteAllLines(d, lines);
}
As Joe mentioned in the comments, your original code was writing to the file while it was still looping over that file's lines. I also replaced the hard-coded file path in File.WriteAllLines() with the foreach variable.
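One caveat worth noting (not part of the original answer): a blanket Replace will also re-escape ampersands that are already part of an entity such as &amp; or &#160;. If the files can contain a mix, a sketch like the following, using a negative lookahead, only escapes the bare ampersands:
// Sketch: escape only ampersands that are not already the start of an XML entity.
// Assumes the usual named entities and numeric character references should be left alone.
using System.Text.RegularExpressions;

static string EscapeBareAmpersands(string line)
{
    return Regex.Replace(line, @"&(?!amp;|lt;|gt;|quot;|apos;|#\d+;|#x[0-9A-Fa-f]+;)", "&amp;");
}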

Joining two files in C#, split options

I am currently trying to work with files, joining multiple of them, and I have a problem: the last word from file 1 gets joined to the first word from file 2. For example:
File 1: John has got new haircut
File 2: Mike has got new haircut
and it prints "haircutMike".
The code I am using to split words:
input.Split(' ').ToList().ForEach(n =>{});
I am also making one big file from multiple ones like so:
string[] files = { "f1.txt", "f2.txt" };
FileStream outputFile = new FileStream("new.txt", FileMode.Create);
using (StreamWriter ws = new StreamWriter(outputFile))
{
foreach (string file in files)
{
ws.Write(System.IO.File.ReadAllText(file) + " ");
}
}
EDIT
Changed some code; of course I meant to use a stream, not binary. I am also using Split because I want to count the number of occurrences of each word in the files, so I have to split on spaces, dots, etc.
You mentioned the + " " option; although it works, it adds one extra character to the total count.
EDIT: for multiple input files:
string[] files = { "f1.txt", "f2.txt" };
var allLines = files.SelectMany(i => System.IO.File.ReadAllLines(i));
System.IO.File.WriteAllLines("new.txt", allLines.ToArray());

Copying files from one folder to another only if their file name is in a specified text file (C#)

I need to copy files from one folder to another but only if their file name is also in a text file. The text file is set up like so
file1.jpg
file2.jpg
file3.jpg
etc
There are around one million files to copy. I'm using C#.
What would be the best way to go about this? I'm not sure if I should first read all the file names from the text file into a list, then possibly convert the list into an array and use the array somehow, or whether there's a better way to go about it.
I know how to read and write files and how to copy from one location to another. What I don't know is how to filter out specific files while copying from one location to another.
Any help is greatly appreciated.
Thank you.
The following code should do what you want:
string source = @"C:\SourcePath\";
string destination = @"C:\DestinationPath\";
// one file name per line, as in the question
string[] strFiles = File.ReadAllLines(@"C:\Filename.txt");
for (int i = 0; i < strFiles.Length; i++)
{
    File.Copy(source + strFiles[i], destination + strFiles[i]);
}
If the text file is one single line containing a million file names, use this:
string from = @"c:\from", to = @"d:\to"; // source and destination
StreamReader file = new StreamReader(@"c:\list.txt"); // your files list
string total = file.ReadLine();
string[] tobecopied = total.Split(' ');
foreach (string fil in tobecopied)
{
    if (File.Exists(from + @"\" + fil))
    {
        File.Copy(from + @"\" + fil, to + @"\" + fil);
    }
    else
    {
        MessageBox.Show(fil + " not found");
    }
}
But if the text file has one line per file, for example
File1.exe
File2.exe
use this:
string from = #"c:\from" , to =#"d:\to"; // source and destination
StreamReader file = new StreamReader(#"c:\list.txt"); // your files list
string total="";
string temp="";
while((temp=file.ReadLine())!=null)
{
total+=temp+" ";
}
string[] tobecopied = total.Split(' ');
foreach(string fil in tobecopied)
{
if(File.Exists(from+#"\"+fil))
{
File.Copy(from+#"\"+fil,to+#"\"+fil);
}
else
{
MessageBox.Show(fil+"Not found ");
}
}
Both versions also check for file existence.
Hope it works. If someone sees an error, please edit it.
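With around a million names, a sketch like the one below may scale better: it streams the list with File.ReadLines, keeps the names in a HashSet for cheap lookups, and builds paths with Path.Combine instead of string concatenation (the three paths are placeholder assumptions):
// Sketch: copy only the files whose names appear in the list file.
string listFile = @"c:\list.txt";
string from = @"c:\from";
string to = @"d:\to";

// One name per line; the HashSet keeps lookups fast even for a million entries.
var wanted = new HashSet<string>(File.ReadLines(listFile), StringComparer.OrdinalIgnoreCase);

foreach (var sourcePath in Directory.EnumerateFiles(from))
{
    var name = Path.GetFileName(sourcePath);
    if (wanted.Contains(name))
    {
        File.Copy(sourcePath, Path.Combine(to, name), overwrite: true);
    }
}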

How to read a csv file one line at a time and replace/edit certain lines as you go?

I have a 60GB CSV file I need to make some modifications to. The customer wants some changes to the file's data, but I don't want to regenerate the data in that file because it took 4 days to produce.
How can I read the file, line by line (not loading it all into memory!), and make edits to those lines as I go, replacing certain values etc.?
The process would be something like this:
1. Open a StreamWriter to a temporary file.
2. Open a StreamReader to the target file.
3. For each line:
   3.1. Split the text into columns based on a delimiter.
   3.2. Check the columns for the values you want to replace, and replace them.
   3.3. Join the column values back together using your delimiter.
   3.4. Write the line to the temporary file.
4. When you are finished, delete the target file, and move the temporary file to the target file path.
Note regarding Steps 2 and 3.1: If you are confident in the structure of your file and it is simple enough, you can do all this out of the box as described (I'll include a sample in a moment). However, there are factors in a CSV file that may need attention (such as recognizing when a delimiter is being used literally in a column value). You can drudge through this yourself, or try an existing solution.
Basic example just using StreamReader and StreamWriter:
var sourcePath = #"C:\data.csv";
var delimiter = ",";
var firstLineContainsHeaders = true;
var tempPath = Path.GetTempFileName();
var lineNumber = 0;
var splitExpression = new Regex(#"(" + delimiter + #")(?=(?:[^""]|""[^""]*"")*$)");
using (var writer = new StreamWriter(tempPath))
using (var reader = new StreamReader(sourcePath))
{
string line = null;
string[] headers = null;
if (firstLineContainsHeaders)
{
line = reader.ReadLine();
lineNumber++;
if (string.IsNullOrEmpty(line)) return; // file is empty;
headers = splitExpression.Split(line).Where(s => s != delimiter).ToArray();
writer.WriteLine(line); // write the original header to the temp file.
}
while ((line = reader.ReadLine()) != null)
{
lineNumber++;
var columns = splitExpression.Split(line).Where(s => s != delimiter).ToArray();
// if there are no headers, do a simple sanity check to make sure you always have the same number of columns in a line
if (headers == null) headers = new string[columns.Length];
if (columns.Length != headers.Length) throw new InvalidOperationException(string.Format("Line {0} is missing one or more columns.", lineNumber));
// TODO: search and replace in columns
// example: replace 'v' in the first column with '\/': if (columns[0].Contains("v")) columns[0] = columns[0].Replace("v", #"\/");
writer.WriteLine(string.Join(delimiter, columns));
}
}
File.Delete(sourcePath);
File.Move(tempPath, sourcePath);
Memory-mapped files are a feature introduced in .NET Framework 4 that can be used to edit large files.
Read here: http://msdn.microsoft.com/en-us/library/dd997372.aspx
or google "memory-mapped files".
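For illustration only, a minimal sketch of opening a file as a memory-mapped view (the path is a placeholder, not from the answer). Note that a memory-mapped view cannot grow or shrink the file, so it suits fixed-length, in-place byte edits rather than a general CSV rewrite:
using System.IO;
using System.IO.MemoryMappedFiles;

// Sketch: map an existing file and patch bytes in place.
using (var mmf = MemoryMappedFile.CreateFromFile(@"C:\data.csv", FileMode.Open))
using (var accessor = mmf.CreateViewAccessor())
{
    byte first = accessor.ReadByte(0);   // read the first byte of the file
    // accessor.Write(0, (byte)'#');     // overwrite a byte in place (same length only)
}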
Just read the file line by line with a StreamReader, and then use regex! The most amazing tool in the world.
using (var sr = new StreamReader(new FileStream(@"C:\temp\file.csv", FileMode.Open)))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // do stuff
    }
}
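As a concrete (and purely hypothetical) example of the "do stuff" part, the following sketch streams each line to a temporary file while rewriting a made-up dd/MM/yyyy date into yyyy-MM-dd via Regex.Replace; the pattern and paths are assumptions, not from the answer:
// Sketch: rewrite each line to a temp file, replacing an assumed date format.
var regex = new Regex(@"\b(\d{2})/(\d{2})/(\d{4})\b");
var tempPath = Path.GetTempFileName();

using (var sr = new StreamReader(@"C:\temp\file.csv"))
using (var sw = new StreamWriter(tempPath))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        sw.WriteLine(regex.Replace(line, "$3-$2-$1"));
    }
}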
