Reading and writing very large text files in C# - c#

I have a very large file, almost 2GB in size. I am trying to write a process to read the file in and write it out without the first row. So far I have only been able to read and write one line at a time, which takes forever. I can open it, remove the first row and save it faster in TextPad, though that is still very slow.
I use this code to get the number of records in the file:
private long getNumRows(string strFileName)
{
long lngNumRows = 0;
string strMsg;
try
{
lngNumRows = 0;
using (var strReader = File.OpenText(@strFileName))
{
while (strReader.ReadLine() != null)
{
lngNumRows++;
}
strReader.Close();
strReader.Dispose();
}
}
catch (Exception excExcept)
{
strMsg = "The File could not be read: ";
strMsg += excExcept.Message;
System.Windows.MessageBox.Show(strMsg);
//Console.WriteLine("Thee was an error reading the file: ");
//Console.WriteLine(excExcept.Message);
//Console.ReadLine();
}
return lngNumRows;
}
This only takes seconds to run. When I add the following code it takes forever to run. Am I doing something wrong? Why does the write add so much time? Any ideas on how I can make this faster?
private void ProcessTextFiles(string strFileName)
{
string strDataLine;
string strFullOutputFileName;
string strSubFileName;
int intPos;
long lngTotalRows = 0;
long lngCurrNumRows = 0;
long lngModNumber = 0;
double dblProgress = 0;
double dblProgressPct = 0;
string strPrgFileName = "";
string strOutName = "";
string strMsg;
long lngFileNumRows;
try
{
using (StreamReader srStreamRdr = new StreamReader(strFileName))
{
while ((strDataLine = srStreamRdr.ReadLine()) != null)
{
lngCurrNumRows++;
if (lngCurrNumRows > 1)
{
WriteDataRow(strDataLine, strFullOutputFileName);
}
}
srStreamRdr.Dispose();
}
}
catch (Exception excExcept)
{
strMsg = "The File could not be read: ";
strMsg += excExcept.Message;
System.Windows.MessageBox.Show(strMsg);
//Console.WriteLine("The File could not be read:");
//Console.WriteLine(excExcept.Message);
}
}
public void WriteDataRow(string strDataRow, string strFullFileName)
{
//using (StreamWriter file = new StreamWriter(@strFullFileName, true, Encoding.GetEncoding("iso-8859-1")))
using (StreamWriter file = new StreamWriter(@strFullFileName, true, System.Text.Encoding.UTF8))
{
file.WriteLine(strDataRow);
file.Close();
}
}

Not sure how much this will improve the performance, but surely, opening and closing the output file for every line that you want to write is not a good idea.
Instead, open both files just once and then write each line directly:
using (StreamWriter file = new StreamWriter(@strFullFileName, true, System.Text.Encoding.UTF8))
using (StreamReader srStreamRdr = new StreamReader(strFileName))
{
while ((strDataLine = srStreamRdr.ReadLine()) != null)
{
lngCurrNumRows++;
if (lngCurrNumRows > 1)
file.WriteLine(strDataLine);
}
}
You could also remove the check on lngCurrNumRows by simply making an empty read before entering the while loop:
strDataLine = srStreamRdr.ReadLine();
if(strDataLine != null)
{
while ((strDataLine = srStreamRdr.ReadLine()) != null)
{
file.WriteLine(strDataLine);
}
}

Depending on the memory of your machine, you could try the following (my big file was "D:\savegrp.log", a 2GB file I had knocking about). This used about 6GB of memory when I tried it:
int counter = File.ReadAllLines(@"D:\savegrp.log").Length;
Console.WriteLine(counter);
It does depend on the memory available.
File.WriteAllLines(@"D:\savegrp2.log", File.ReadAllLines(@"D:\savegrp.log").Skip(1));
Console.WriteLine("file saved");

Related

I want to Optimize upload time of Excel file records saving to Database in asp.net mvc

I want to upload an Excel file of about 2500 records. This process currently takes more than 5 minutes, and I want to optimize it to under a minute.
Please help me find the best way to optimize this code. I am using DbGeography for storing locations. The file is uploaded and the data is saved properly; everything works fine, but I need it to be faster.
public ActionResult Upload(HttpPostedFileBase FileUpload)
{
if (FileUpload.ContentLength > 0)
{
string fileName = Path.GetFileName(FileUpload.FileName);
string ext = fileName.Substring(fileName.LastIndexOf('.'));
Random r = new Random();
int ran = r.Next(1, 13);
string path = Path.Combine(Server.MapPath("~/App_Data/uploads"), DateTime.Now.Ticks + "_" + ran + ext);
try
{
FileUpload.SaveAs(path);
ProcessCSV(path);
ViewData["Feedback"] = "Upload Complete";
}
catch (Exception ex)
{
ViewData["Feedback"] = ex.Message;
}
}
return View("Upload", ViewData["Feedback"]);
}
Here is the other method, where the uploaded file is processed and its records are saved to the database:
private void ProcessCSV(string filename)
{
Regex r = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
StreamReader sr = new StreamReader(filename);
string line = null;
string[] strArray;
int iteration = 0;
while ((line = sr.ReadLine()) != null)
{
if (iteration == 0)
{
iteration++;
continue; //Because Its Heading
}
strArray = r.Split(line);
StoreLocation store = new StoreLocation();
store.Name = strArray[0];
store.StoreName = strArray[1];
store.Street = strArray[2];
store.Street2 = strArray[3];
store.St_City = strArray[4];
store.St_St = strArray[5];
store.St_Zip = strArray[6];
store.St_Phone = strArray[7];
store.Office_Phone = strArray[8];
store.Fax = strArray[9];
store.Ownership = strArray[10];
store.website = strArray[11];
store.Retailer = check(strArray[12]);
store.WarehouseDistributor = check(strArray[13]);
store.OnlineRetailer = check(strArray[14]);
string temp_Add = store.Street + "," + store.St_City;
try
{
var point = new GoogleLocationService().GetLatLongFromAddress(temp_Add);
string points = string.Format("POINT({0} {1})", point.Latitude, point.Longitude);
store.Location = DbGeography.FromText(points);//lat_long
}
catch (Exception e)
{
continue;
}
db.StoreLocations.Add(store);
db.SaveChanges();
}
}
The obvious place to start would be your external call to the Geo Location service, which it looks like you are doing for each iteration of the loop. I don't know that service, but is there any way you can build up a list of addresses in memory, and then hit it once with multiple addresses, then go back and amend all the records you need to update?
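If the file contains repeated addresses, one low-effort option is to cache the lookups so each unique address is geocoded only once; a rough sketch reusing the calls from the question (the cache field and helper name are illustrative, not part of the original code):
// Illustrative cache: each unique address string is geocoded only once.
private readonly Dictionary<string, DbGeography> geoCache = new Dictionary<string, DbGeography>();

private DbGeography GetLocation(string address)
{
    DbGeography location;
    if (!geoCache.TryGetValue(address, out location))
    {
        var point = new GoogleLocationService().GetLatLongFromAddress(address);
        string points = string.Format("POINT({0} {1})", point.Latitude, point.Longitude);
        location = DbGeography.FromText(points); // lat_long, as in the original code
        geoCache.Add(address, location);
    }
    return location;
}
Inside the loop, store.Location = GetLocation(temp_Add); would then replace the per-row service call.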
The call to db.SaveChanges(), which updates the database, is happening for every line:
while (...)
{
...
db.StoreLocations.Add(store);
db.SaveChanges(); // Many database updates
}
Instead, just call it once at the end:
while (...)
{
...
db.StoreLocations.Add(store);
}
db.SaveChanges(); // One combined database update
This should speed up your code nicely.
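If you are using Entity Framework 6, it may also be worth disabling automatic change detection while adding a large number of entities; a minimal sketch, assuming db is an EF6 DbContext and storesToInsert is an illustrative collection:
// Assumption: db is a System.Data.Entity.DbContext (EF6).
db.Configuration.AutoDetectChangesEnabled = false;
try
{
    foreach (StoreLocation store in storesToInsert) // storesToInsert is illustrative
    {
        db.StoreLocations.Add(store);
    }
    db.SaveChanges(); // one combined database update
}
finally
{
    db.Configuration.AutoDetectChangesEnabled = true;
}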

Develop a file reading routine in c#

I have developed a file reading routine in C#.NET that reads the entire file contents into memory using a suitable data class or structure.
I have a 600MB text file that contains a RoadId and many other entries. I have to read that file using a query method, so I used a StreamReader in C#.NET that reads line by line. But I want to know whether there is any other method in C#.NET that would be more memory efficient and faster, perhaps by converting the text to binary and then reading that.
I'm not sure, so please guide me through this.
Here is my code for reading the whole file line by line:
public static void read_time()
{
DateTime end;
StreamReader file =
new StreamReader(@"C:\Users\Reva-Asus1\Desktop\DTF Test\F_Network_out.txt");
DateTime start = DateTime.Now;
while ((file.ReadLine()) != null) ;
end = DateTime.Now;
Console.WriteLine();
Console.WriteLine("Full File Read Time: " + (end - start));
Console.WriteLine();
file.Close();
Console.WriteLine("Data is read");
Console.ReadLine();
return;
}
// This querying method is to take roadId from user from console and display the record....
public static void querying_method()
{
Console.WriteLine("Give a RoadId to search record\n");
DateTime start, end;
string id =Console.ReadLine().Trim();
try
{
System.IO.StreamReader file =
new System.IO.StreamReader(@"C:\Users\Reva-Asus1\Desktop\DTF Test\F_Network_out.txt");
string line1;
int count = 1;
start = DateTime.Now;
while ((line1 = file.ReadLine()) != null)
{
if(line1 == id)
{
string line2 = " ";
while (count != 14)
{
Console.WriteLine(line2 = file.ReadLine());
count++;
}
int n = Convert.ToInt16(line2);
while (n != 0)
{
Console.WriteLine(line2 = file.ReadLine());
n--;
}
break;
}
}
end = DateTime.Now;
Console.WriteLine("Read Time for the data record: " + (end - start));
Console.ReadLine();
return;
}
catch (Exception)
{
Console.WriteLine("No ID match found in the file entered by user");
Console.ReadLine();
return;
}
}
You could maybe use this:
foreach (var line in File.ReadLines(path))
{
// TODO: Parse the line and convert to your object...
}
File.ReadLines(YourPath) uses a StreamReader in the background, so you can continue using it; this is the reference pattern. So if you are already using a StreamReader and reading only one line at a time, you don't need to change anything:
using (StreamReader sr = new StreamReader(path, encoding))
{
while ((line = sr.ReadLine()) != null)
{
//You are reading the file line by line and you load only the current line in the memory, not the whole file.
//do stuff which you want with the current line.
}
}
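If the goal is mainly to look a record up by RoadId, File.ReadLines also combines nicely with LINQ so the scan stays lazy and stops at the first hit; a rough sketch using the same file path as the question (it only locates the ID line; the record lines that follow would still be read as in querying_method):
// Requires using System.Linq; the file is enumerated lazily, one line at a time.
string id = Console.ReadLine().Trim();
bool found = File.ReadLines(@"C:\Users\Reva-Asus1\Desktop\DTF Test\F_Network_out.txt")
                 .Any(line => line == id);
Console.WriteLine(found ? "RoadId found" : "No ID match found in the file");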

Csvhelper - read / get a single column of all rows?

Hi, I'm using CsvHelper to read in CSV files with a variable number of columns. The first row always contains a header row. The number of columns is unknown at first; sometimes there are three columns and sometimes there are 30+. The number of rows can be large.
I can read in the csv file, but how do I address each column of data. I need to do some basic stats on the data (e.g. min, max, stddev), then write them out in a non csv format.
Here is my code so far...
try{
using (var fileReader = File.OpenText(inFile))
using (var csvResult = new CsvHelper.CsvReader(fileReader))
{
// read the header line
csvResult.Read();
// read the whole file
dynamic recs = csvResult.GetRecords<dynamic>().ToList();
/* now how do I get a whole column ???
* recs.getColumn ???
* recs.getColumn['headerName'] ???
*/
}
}
catch (Exception ex)
{
MessageBox.Show("Error: Could not read file from disk. Original error: " + ex.Message);
}
Thanks
I don't think the library is capable of doing this directly. You have to read your column from the individual fields and add them to a List, but the process is usually fast because the readers do their job quickly. For example, if your desired column is of type string, the code would look like this:
List<string> myStringColumn= new List<string>();
using (var fileReader = File.OpenText(inFile))
using (var csvResult = new CsvHelper.CsvReader(fileReader))
{
while (csvResult.Read())
{
string stringField=csvResult.GetField<string>("Header Name");
myStringColumn.Add(stringField);
}
}
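Once the column is in a List, the basic stats mentioned in the question (min, max, stddev) can be computed with LINQ; a minimal sketch, assuming the column holds numeric values (the header name is just an example):
// Requires using System.Linq; reads one numeric column and computes min, max and stddev.
List<double> values = new List<double>();
using (var fileReader = File.OpenText(inFile))
using (var csvResult = new CsvHelper.CsvReader(fileReader))
{
    while (csvResult.Read())
    {
        values.Add(csvResult.GetField<double>("Header Name"));
    }
}
double min = values.Min();
double max = values.Max();
double mean = values.Average();
double stdDev = Math.Sqrt(values.Sum(v => (v - mean) * (v - mean)) / values.Count);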
using (System.IO.StreamReader file = new System.IO.StreamReader(Server.MapPath(filepath)))
{
//Csv reader reads the stream
CsvReader csvread = new CsvReader(file);
while (csvread.Read())
{
int count = csvread.FieldHeaders.Count();
if (count == 55)
{
DataRow dr = myExcelTable.NewRow();
if (csvread.GetField<string>("FirstName") != null)
{
dr["FirstName"] = csvread.GetField<string>("FirstName"); ;
}
else
{
dr["FirstName"] = "";
}
if (csvread.GetField<string>("LastName") != null)
{
dr["LastName"] = csvread.GetField<string>("LastName"); ;
}
else
{
dr["LastName"] = "";
}
}
else
{
lblMessage.Visible = true;
lblMessage.Text = "Columns are not in specified format.";
lblMessage.ForeColor = System.Drawing.Color.Red;
return;
}
}
}

Merging 2 Text Files in C#

Firstly, I'd just like to mention that I've only started learning C# a few days ago, so my knowledge of it is limited.
I'm trying to create a program that will parse text files for certain phrases input by the user and then output them into a new text document.
At the moment, I have the program searching the original input file, gathering the selected text input by the user, copying those lines out into new text files, then merging them together and deleting them afterwards.
I'm guessing that this is not the most efficient way of doing this, but I built it in a way that works logically and that I can understand as a novice.
The code is as follows;
private void TextInput1()
{
using (StreamReader fileOpen = new StreamReader(txtInput.Text))
{
using (StreamWriter fileWrite = new StreamWriter(@"*DIRECTORY*\FIRSTFILE.txt"))
{
string file;
while ((file = fileOpen.ReadLine()) != null)
{
if (file.Contains(txtFind.Text))
{
fileWrite.Write(file + "\r\n");
}
}
}
}
}
private void TextInput2()
{
using (StreamReader fileOpen = new StreamReader(txtInput.Text))
{
using (StreamWriter fileWrite = new StreamWriter(@"*DIRECTORY*\SECONDFILE.txt"))
{
string file;
while ((file = fileOpen.ReadLine()) != null)
{
if (file.Contains(txtFind2.Text))
{
fileWrite.Write("\r\n" + file);
}
}
}
}
}
private static void Combination()
{
ArrayList fileArray = new ArrayList();
using (StreamWriter writer = File.CreateText(@"*DIRECTORY*\FINALOUTPUT.txt"))
{
using (StreamReader reader = File.OpenText(@"*DIRECTORY*\FIRSTFILE.txt"))
{
writer.Write(reader.ReadToEnd());
}
using (StreamReader reader = File.OpenText(@"*DIRECTORY*\SECONDFILE.txt"))
{
writer.Write(reader.ReadToEnd());
}
}
}
private static void Delete()
{
if (File.Exists(@"*DIRECTORY*\FIRSTFILE.txt"))
{
File.Delete(@"*DIRECTORY*\FIRSTFILE.txt");
}
if (File.Exists(@"*DIRECTORY*\SECONDFILE.txt"))
{
File.Delete(@"*DIRECTORY*\SECONDFILE.txt");
}
}
The output file that is being created simply contains the first text input followed by the second. I am wondering if it is possible to merge them into one file one line at a time, since the result needs to be consecutive: a line from input 1 followed by a line from input 2, rather than all of 1 and then all of 2.
Thanks, Neil.
To combine the two files' content into one merged file line by line, you could substitute your Combination() code with this:
string[] file1 = File.ReadAllLines(@"*DIRECTORY*\FIRSTFILE.txt");
string[] file2 = File.ReadAllLines(@"*DIRECTORY*\SECONDFILE.txt");
using (StreamWriter writer = File.CreateText(@"*DIRECTORY*\FINALOUTPUT.txt"))
{
int lineNum = 0;
while(lineNum < file1.Length || lineNum < file2.Length)
{
if(lineNum < file1.Length)
writer.WriteLine(file1[lineNum]);
if(lineNum < file2.Length)
writer.WriteLine(file2[lineNum]);
lineNum++;
}
}
This works even if the two files don't contain the same number of lines.
Try this method. It receives three paths: file 1, file 2 and the output file.
public void MergeFiles(string pathFile1, string pathFile2, string pathResult)
{
File.WriteAllText(pathResult, File.ReadAllText(pathFile1) + File.ReadAllText(pathFile2));
}
If the pathResult file exists, the WriteAllText method will overwrite it. Remember to include the System.IO namespace.
Important: this is not recommended for large files! Use one of the other options available in this thread.
If your input files are quite large and you run out of memory, you could also try wrapping the two readers like this:
using (StreamWriter writer = File.CreateText(@"*DIRECTORY*\FINALOUTPUT.txt"))
{
using (StreamReader reader1 = File.OpenText(@"*DIRECTORY*\FIRSTFILE.txt"))
{
using (StreamReader reader2 = File.OpenText(@"*DIRECTORY*\SECONDFILE.txt"))
{
string line1 = null;
string line2 = null;
while ((line1 = reader1.ReadLine()) != null)
{
writer.WriteLine(line1);
line2 = reader2.ReadLine();
if(line2 != null)
{
writer.WriteLine(line2);
}
}
}
}
}
Still, you have to have an idea how many lines you have in your input files, but I think it gives you the general idea to proceed.
Using a FileInfo extension you could merge one or more files by doing the following:
public static class FileInfoExtensions
{
public static void MergeFiles(this FileInfo fi, string strOutputPath , params string[] filesToMerge)
{
var fiLines = File.ReadAllLines(fi.FullName).ToList();
fiLines.AddRange(filesToMerge.SelectMany(file => File.ReadAllLines(file)));
File.WriteAllLines(strOutputPath, fiLines.ToArray());
}
}
Usage
FileInfo fi = new FileInfo("input");
fi.MergeFiles("output", "File2", "File3");
I appreciate this question is almost old enough to (up)vote (itself), but for an extensible approach:
const string FileMergeDivider = "\n\n";
public void MergeFiles(string outputPath, params string[] inputPaths)
{
if (!inputPaths.Any())
throw new ArgumentException(nameof(inputPaths) + " required");
if (inputPaths.Any(string.IsNullOrWhiteSpace) || !inputPaths.All(File.Exists))
throw new ArgumentNullException(nameof(inputPaths), "contains invalid path(s)");
File.WriteAllText(outputPath, string.Join(FileMergeDivider, inputPaths.Select(File.ReadAllText)));
}

Split text file, fastest method

Morning,
I'm trying to split a large text file (15,000,000 rows) using StreamReader/StreamWriter. Is there a quicker way?
I tested it with 130,000 rows and it took 2min 40sec which implies 15,000,000 rows will take approx 5hrs which seems a bit excessive.
//Perform split.
public void SplitFiles(int[] newFiles, string filePath, int processorCount)
{
using (StreamReader Reader = new StreamReader(filePath))
{
for (int i = 0; i < newFiles.Length; i++)
{
string extension = System.IO.Path.GetExtension(filePath);
string temp = filePath.Substring(0, filePath.Length - extension.Length)
+ i.ToString();
string FilePath = temp + extension;
if (!File.Exists(FilePath))
{
for (int x = 0; x < newFiles[i]; x++)
{
DataWriter(Reader.ReadLine(), FilePath);
}
}
else
{
return;
}
}
}
}
public void DataWriter(string rowData, string filePath)
{
bool appendData = true;
using (StreamWriter sr = new StreamWriter(filePath, appendData))
{
{
sr.WriteLine(rowData);
}
}
}
Thanks for your help.
You haven't made it very clear, but I'm assuming that the value of each element of the newFiles array is the number of lines to copy from the original into that file. Note that currently you don't detect the situation where there's either extra data at the end of the input file, or it's shorter than expected. I suspect you want something like this:
public void SplitFiles(int[] newFiles, string inputFile)
{
string baseName = Path.GetFileNameWithoutExtension(inputFile);
string extension = Path.GetExtension(inputFile);
using (TextReader reader = File.OpenText(inputFile))
{
for (int i = 0; i < newFiles.Length; i++)
{
string outputFile = baseName + i + extension;
if (File.Exists(outputFile))
{
// Better than silently returning, I'd suggest...
throw new IOException("File already exists: " + outputFile);
}
int linesToCopy = newFiles[i];
using (TextWriter writer = File.CreateText(outputFile))
{
for (int j = 0; j < linesToCopy; j++)
{
string line = reader.ReadLine();
if (line == null)
{
return; // Premature end of input
}
writer.WriteLine(line);
}
}
}
}
}
Note that this still won't detect if there's any unconsumed input... it's not clear what you want to do in that situation.
One option for code clarity is to extract the middle of this into a separate method:
public void SplitFiles(int[] newFiles, string inputFile)
{
string baseName = Path.GetFileNameWithoutExtension(inputFile);
string extension = Path.GetExtension(inputFile);
using (TextReader reader = File.OpenText(inputFile))
{
for (int i = 0; i < newFiles.Length; i++)
{
string outputFile = baseName + i + extension;
// Could put this into the CopyLines method if you wanted
if (File.Exists(outputFile))
{
// Better than silently returning, I'd suggest...
throw new IOException("File already exists: " + outputFile);
}
CopyLines(reader, outputFile, newFiles[i]);
}
}
}
private static void CopyLines(TextReader reader, string outputFile, int count)
{
using (TextWriter writer = File.CreateText(outputFile))
{
for (int i = 0; i < count; i++)
{
string line = reader.ReadLine();
if (line == null)
{
return; // Premature end of input
}
writer.WriteLine(line);
}
}
}
There are utilities for splitting files that may outperform your solution - e.g. search for "split file by line".
If they don't suit, there are solutions for loading all the source file into memory and then writing out the files but that probably isn't appropriate given the size of the source file.
In terms of improving your code, a minor improvement would be the generation of the destination file path (and also clarifying the confusion between the source filePath you use and the destination files). You don't need to re-establish the source file extension each time in your loop.
The second (and probably more significant, as highlighted by commenters) improvement concerns how you write out the destination files. Each destination file is meant to receive a given number of lines from the source (the value in each newFiles entry), so I'd suggest that for each entry you read all the source lines relevant to the next destination file and then write that destination out once, rather than repeatedly opening the destination file. You could "gather" the lines in a StringBuilder/List etc., or alternatively write them directly out to the destination file (but opening it only once).
public void SplitFiles(int[] newFiles, string sourceFilePath, int processorCount)
{
string sourceDirectory = System.IO.Path.GetDirectoryName(sourceFilePath);
string sourceFileName = System.IO.Path.GetFileNameWithoutExtension(sourceFilePath);
string extension = System.IO.Path.GetExtension(sourceFilePath);
using (StreamReader Reader = new StreamReader(sourceFilePath))
{
for (int i = 0; i < newFiles.Length; i++)
{
string destinationFileNameWithExtension = string.Format("{0}{1}{2}", sourceFileName, i, extension);
string destinationFilePath = System.IO.Path.Combine(sourceDirectory, destinationFileNameWithExtension);
if (!File.Exists(destinationFilePath))
{
// Read all the lines relevant to this destination file
// and temporarily store them in memory
StringBuilder destinationText = new StringBuilder();
for (int x = 0; x < newFiles[i]; x++)
{
destinationText.AppendLine(Reader.ReadLine());
}
DataWriter(destinationFilePath, destinationText.ToString());
}
else
{
return;
}
}
}
}
private static void DataWriter(string destinationFilePath, string content)
{
using (StreamWriter sr = new StreamWriter(destinationFilePath))
{
{
sr.Write(content);
}
}
}
I've recently had to do this for several hundred files under 2 GB each (up to 1.92 GB), and the fastest method I found (if you have the memory available) is StringBuilder. All the other methods I tried were painfully slow.
Please note that this is memory dependent. Adjust "CurrentPosition = 130000" accordingly.
string CurrentLine = String.Empty;
int CurrentPosition = 0;
int CurrentSplit = 0;
foreach (string file in Directory.GetFiles(@"C:\FilesToSplit"))
{
StringBuilder sb = new StringBuilder();
using (StreamReader sr = new StreamReader(file))
{
while ((CurrentLine = sr.ReadLine()) != null)
{
if (CurrentPosition == 130000) // Or whatever you want to split by.
{
using (StreamWriter sw = new StreamWriter(@"C:\FilesToSplit\SplitFiles\" + Path.GetFileNameWithoutExtension(file) + "-" + CurrentSplit + Path.GetExtension(file)))
{
// Append this line too, so we don't lose it.
sb.AppendLine(CurrentLine);
// Write the StringBuilder contents
sw.Write(sb.ToString());
// Clear the StringBuilder buffer, so it doesn't get too big. You can adjust this based on your computer's available memory.
sb.Clear();
// Increment the CurrentSplit number.
CurrentSplit++;
// Reset the current line position. We've found 130,001 lines of text.
CurrentPosition = 0;
}
}
else
{
sb.AppendLine(CurrentLine);
CurrentPosition++;
}
}
}
// Write out any remaining lines that didn't reach the split size, so they aren't lost.
if (sb.Length > 0)
{
using (StreamWriter sw = new StreamWriter(@"C:\FilesToSplit\SplitFiles\" + Path.GetFileNameWithoutExtension(file) + "-" + CurrentSplit + Path.GetExtension(file)))
{
sw.Write(sb.ToString());
}
}
// Reset the integers at the end of each file check, otherwise it can quickly go out of order.
CurrentPosition = 0;
CurrentSplit = 0;
}
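For comparison, a variant that never buffers more than one chunk in memory, using File.ReadLines to stream the input; a sketch with illustrative paths and chunk size (not the method from the answers above):
// Streams the source file and writes it out in fixed-size chunks.
const int linesPerFile = 130000; // chunk size, adjust to taste
int fileIndex = 0;
var buffer = new List<string>(linesPerFile);
foreach (string line in File.ReadLines(@"C:\FilesToSplit\big.log"))
{
    buffer.Add(line);
    if (buffer.Count == linesPerFile)
    {
        File.WriteAllLines(@"C:\FilesToSplit\SplitFiles\part" + fileIndex++ + ".log", buffer);
        buffer.Clear();
    }
}
if (buffer.Count > 0) // write the final partial chunk
{
    File.WriteAllLines(@"C:\FilesToSplit\SplitFiles\part" + fileIndex + ".log", buffer);
}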
