Large File - Adding Lines - Out Of Memory - c#

I have a very large sql insert file which is throwing "out of memory" errors when running it in SQL Enterprise Manager.
The advice I have seen is to add the "GO" command every X rows so that the inserts are "batched".
I am trying to write a small function to read the file and add a row with the text "GO" every 50 lines.
The code I have written is also throwing System.OutOfMemoryException when I run it.
Can anyone suggest a better way of writing my code to fix this problem please?
This is what I have written:
public static void AddGo()
{
int currentline = 0;
string FilePath = @"C:\Users\trevo_000\Desktop\fmm89386.sql";
var text = new StringBuilder();
foreach (string s in File.ReadAllLines(FilePath))
{
// add the current line
text.AppendLine(s);
// increase the line counter
currentline += 1;
if (currentline == 50)
{
text.AppendLine("GO");
currentline = 0;
}
}
using (var file = new StreamWriter(File.Create(@"C:\Users\trevo_000\Desktop\fmm89386Clean.sql")))
{
file.Write(text.ToString());
}
}

You're keeping the whole file in memory and then writing it from memory to a file. Instead, write the output file as you work through the input file, along these lines:
public static void AddGo() {
int currentline = 0;
string inputFilePath = @"C:\Users\trevo_000\Desktop\fmm89386.sql";
string outputFilePath = @"C:\Users\trevo_000\Desktop\fmm89386Clean.sql";
using (var outputFileStream=File.CreateText(outputFilePath)) {
foreach (string s in File.ReadLines(inputFilePath))
{
// add the current line
outputFileStream.WriteLine(s);
// increase the line counter
currentline += 1;
if (currentline == 50)
{
outputFileStream.WriteLine("GO");
currentline = 0;
}
}
}
}
Note the use of ReadLines on the input file, rather than ReadAllLines - see What is the difference between File.ReadLines() and File.ReadAllLines()? for more info on that.
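The same streaming idea can be sketched against TextReader and TextWriter, which makes the batching logic easy to try on in-memory text before pointing it at a multi-gigabyte file. This is only a sketch: the GoBatcher class and the batchSize parameter are names made up for illustration, and it assumes every INSERT statement sits on a single line, since a "GO" written into the middle of a multi-line statement would break the script.

```csharp
using System;
using System.IO;

public static class GoBatcher
{
    // Copies every line from reader to writer and writes a "GO" line
    // after each group of batchSize lines. Only one line is held in memory at a time.
    public static void AddGo(TextReader reader, TextWriter writer, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        int linesInBatch = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            writer.WriteLine(line);
            linesInBatch++;
            if (linesInBatch == batchSize)
            {
                writer.WriteLine("GO");
                linesInBatch = 0;
            }
        }
    }
}
```

For real files, the reader could come from File.OpenText(inputFilePath) and the writer from File.CreateText(outputFilePath), both wrapped in using blocks.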

Related

Streamreader reading the same line multiple times C#

My old method (other than being wrong in general) takes too long to get multiple lines from a file and then store the parameters into a dictionary.
Essentially it's open file, grab every second line one at a time, modify the line then store the data (line pos and the first element of the line (minus) ">") close the file and then repeat.
for (int i = 0; i < linecount - 1; i += 2)
{
string currentline = File.ReadLines
(datafile).Skip(i).Take(1).First();
string[] splitline = currentline.Split(' ');
string filenumber = splitline[0].Trim('>');
}
You need to read the next line inside the while loop; otherwise the loop body will always analyse the first line (that's why you get the Dictionary error) and the loop will never exit:
while (line != null)
{
// your current code here
line = sr.ReadLine();
}
The issue is that you only ever read the first line of the file. To solve this you need to ensure you call sr.ReadLine() on every iteration through the loop. This would look like:
using (StreamReader sr = File.OpenText(datafile))
{
string line = sr.ReadLine();
while (line != null)
{
count++;
if (count % 2 == 0)
{
string[] splitline = line.Split(' ');
string datanumber = splitline[0].Trim('>');
index.Add(datanumber, count);
}
line = sr.ReadLine();
}
}
This means on each iteration, the value of line will be a new value (from the next line of the file).
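The loop above can also be written over any sequence of lines, so the file is still read only once. The following is a minimal sketch rather than the original poster's code: LineIndexer and BuildIndex are hypothetical names, it assumes (as in the snippet above) that the data sits on lines 2, 4, 6 and so on, and it uses the dictionary indexer so a repeated key overwrites the earlier entry instead of throwing.

```csharp
using System;
using System.Collections.Generic;

public static class LineIndexer
{
    // Maps the first space-separated token of every second line
    // (with any '>' characters trimmed) to that line's 1-based line number.
    public static Dictionary<string, int> BuildIndex(IEnumerable<string> lines)
    {
        var index = new Dictionary<string, int>();
        int count = 0;
        foreach (string line in lines)
        {
            count++;
            if (count % 2 != 0)
                continue; // assumption: only even-numbered lines carry data

            string key = line.Split(' ')[0].Trim('>');
            index[key] = count; // overwrites on duplicate keys rather than throwing
        }
        return index;
    }
}
```

Passing File.ReadLines(datafile) as the argument keeps the read lazy, so each line is fetched exactly once.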

Adding characters to an array list only if it does not already exist

The code below is supposed to read a text file and count all ASCII characters in the file and add up the frequency. Then, it has to write the character, ASCII value and frequency to an output file. The code is below:
class CharacterFrequency
{
char ch;
int frequency;
public char getCharacter()
{
return ch;
}
public void setCharacter(char ch)
{
this.ch = ch;
}
public int getfrequency()
{
return frequency;
}
public void setfrequency(int frequency)
{
this.frequency = frequency;
}
static void Main()
{
Console.WriteLine("Enter the file path");
var InputFileName = Console.ReadLine();
Console.WriteLine("Enter the outputfile name");
var OutputFileName = Console.ReadLine();
StreamWriter streamWriter = new StreamWriter(OutputFileName);
string data = File.ReadAllText(InputFileName);
ArrayList al = new ArrayList();
//create two for loops to traverse through the arraylist and compare
for (int i = 0; i < data.Length; i++)
{
int k = 0;
int f = 0;
for (int j = 0; j < data.Length; j++)
{
if (data[i].Equals(data[j]))
{
f++;
}
}
if (!al.Contains(data[i]))
{
al.Add(data[i] + "(" + (int)data[i] + ")" + f + " ");
}
else
{
k++;
}
//i added the below if statement but it did not fix the issue
foreach (var item in al)
{
streamWriter.WriteLine(item);
}
}
streamWriter.Close();
}
}
The code compiles and runs perfectly fine, but the output file is not correct. It is adding letters that have already been reviewed. I've added an image of the output file showing the incorrect output it is creating.
How do I check if a character already exists in the array list? The way I am using is not working properly and I have been working on this for a few weeks now to no success. I have tried using the debugger but this issue will not show up there as the code still runs and compiles correctly.
An ArrayList is not well suited for this task, and in fact ArrayLists are not really used anymore. If someone is telling you that you have to do this with an ArrayList
A dictionary would be a much better container for this data. You can use the character as the key, and the count as the value.
Here's one way to do this:
var inputPath = @"c:\temp\temp.txt";
var outputPath = @"c:\temp\results.txt";
var data = new Dictionary<char, int>();
// For each character in the file, add it to the dictionary
// or increment the count if it already exists
foreach (var character in File.ReadAllText(inputPath))
{
if (data.ContainsKey(character)) data[character]++;
else data.Add(character, 1);
}
// Create our results summary
var results = data.ToList()
.Select(item => $"{item.Key} ({(int) item.Key}) {item.Value}");
// Write results to output file
File.WriteAllLines(outputPath, results);
If you have to use an ArrayList (which no one ever uses anymore, but you say you have to for some reason), it would only be useful for storing the results, not for keeping track of the counts.
One way to use an ArrayList would be in combination with the Linq extension methods Distinct and Count (first to find all distinct characters, and next to get the count of each one):
foreach (var chr in data.Distinct())
{
al.Add($"{chr} ({(int) chr}) {data.Count(c => c == chr)}");
}
Your algorithm works, but you are duplicating the output as you are writing to the file inside the loop, that is why you are seeing duplicates in the result. If you move the code outside the loop, it should be ok.
foreach (var item in al)
{
streamWriter.WriteLine(item);
}
Your algorithm is correct, but it will perform poorly because it does many unnecessary comparisons: every character is compared against every other character in the file. It would be worth reading up on using a dictionary to store the results.

Using a for loop to iterate from an array to a list

I have a text file that is divided up into many sections, each about 10 or so lines long. I'm reading in the file using File.ReadAllLines into an array, one line per element of the array, and I'm then trying to parse each section of the file to bring back just some of the data. I'm storing the results in a list, and hoping to export the list to csv ultimately.
My for loop is giving me trouble, as it loops through the right amount of times, but only pulls the data from the first section of the text file each time rather than pulling the data from the first section and then moving on and pulling the data from the next section. I'm sure I'm doing something wrong either in my for loop or for each loop. Any clues to help me solve this would be much appreciated! Thanks
David
My code so far:
namespace ParseAndExport
{
class Program
{
static readonly string sourcefile = @"Path";
static void Main(string[] args)
{
string[] readInLines = File.ReadAllLines(sourcefile);
int counter = 0;
int holderCPStart = counter + 3;//Changed Paths will be an different number of lines each time, but will always start 3 lines after the startDiv
/*Need to find the start of the section and the end of the section and parse the bit in between.
* Also need to identify the blank line that occurs in each section as it is essentially a divider too.*/
int startDiv = Array.FindIndex(readInLines, counter, hyphens72);
int blankLine = Array.FindIndex(readInLines, startDiv, emptyElement);
int endDiv = Array.FindIndex(readInLines, counter + 1, hyphens72);
List<string> results = new List<string>();
//Test to see if FindIndexes work. Results should be 0, 7, 9 for 1st section of sourcefile
/*Console.WriteLine(startDiv);
Console.WriteLine(blankLine);
Console.WriteLine(endDiv);*/
//Check how long the file is so that for testing we know how long the while loop should run for
//Console.WriteLine(readInLines.Length);
//sourcefile has 5255 lines (elements) in the array
for (int i = 0; i <= readInLines.Length; i++)
{
if (i == startDiv)
{
results = (readInLines[i + 1].Split('|').Select(p => p.Trim()).ToList());
string holderCP = string.Join(Environment.NewLine, readInLines, holderCPStart, (blankLine - holderCPStart - 1)).Trim();
results.Add(holderCP);
string comment = string.Join(" ", readInLines, blankLine + 1, (endDiv - (blankLine + 1)));//in case the comment is more than one line long
results.Add(comment);
i = i + 1;
}
else
{
i = i + 1;
}
foreach (string result in results)
{
Console.WriteLine(result);
}
//csvcontent.AppendLine("Revision Number, Author, Date, Time, Count of Lines, Changed Paths, Comments");
/* foreach (string result in results)
{
for (int x = 0; x <= results.Count(); x++)
{
StringBuilder csvcontent = new StringBuilder();
csvcontent.AppendLine(results[x] + "," + results[x + 1] + "," + results[x + 2] + "," + results[x + 3] + "," + results[x + 4] + "," + results[x + 5]);
x = x + 6;
string csvpath = @"addressforcsvfile";
File.AppendAllText(csvpath, csvcontent.ToString());
}
}*/
}
Console.ReadKey();
}
private static bool hyphens72(String h)
{
if (h == "------------------------------------------------------------------------")
{
return true;
}
else
{
return false;
}
}
private static bool emptyElement(String ee)
{
if (ee == "")
{
return true;
}
else
{
return false;
}
}
}
}
It looks like you are trying to grab all of the lines in a file that are not "------" and put them into a list of strings.
You can try this:
var lineswithoutdashes = readInLines.Where(x => !hyphens72(x)).ToList();
Now you can take this list and do the split with a '|' to extract the fields you wanted
The logic seems wrong. There are issues with the code in itself also. I am unsure what precisely you're trying to do. Anyway, a few hints that I hope will help:
1. The if (i == startDiv) check tests whether i equals startDiv. I assume the logic that runs when this condition is met is what you describe as "pulls the data from the first section". That is expected, given this code only runs when i equals startDiv.
2. You increase the counter i inside the body of the for loop, and the for statement itself also increases i, so the loop advances two lines per iteration.
3. Even if the issue in 2. didn't exist, I'd suggest not repeating the same operation "i = i + 1" in both the true and false branches of the if (i == startDiv).
4. Given that I assume this file might actually be massive, it's probably a good idea not to store it in memory, but to read and process the file line by line. There's currently no obvious reason why you'd want to consume this amount of memory, unless it's because of the convenience of the API "File.ReadAllLines(sourcefile)". I wouldn't be too scared to read the file like this:
using (var reader = new StreamReader(sourcefile))
{
string line;
while ((line = reader.ReadLine()) != null)
{
// process the line.
}
}
You can skip the lines until you've passed where the line equals hyphens72.
Then for each line, you process the line with the code you provided in the true case of (i == startDiv), or at least, from what you described, this is what I assume you are trying to do.
int startDiv will return the line number that contains hyphens72.
So your current for loop will only copy to results for the single line that matches the calculated line number.
I guess you want to search for the position of startDiv in the current line?
const string hyphens72 = "------------------------------------------------------------------------";
// loop over lines
for (var lineNumber = 0; lineNumber < readInLines.Length; lineNumber++) {
string currentLine = readInLines[lineNumber];
int startDiv = currentLine.IndexOf(hyphens72);
// loop over characters in line
for (var charIndex = 0; charIndex < currentLine.Length; charIndex++) {
if (charIndex == startDiv) {
var currentCharacter = currentLine[charIndex];
// write to result ...
}
else {
continue; // skip this character
}
}
}
There are several things which could be improved.
I would use File.ReadLines over File.ReadAllLines, because ReadAllLines reads all the lines at once, while ReadLines streams them.
With the line results = (readInLines[i + 1].Split('|').Select(p => p.Trim()).ToList()); you're overwriting the previous results list. You'd better use results.AddRange() to add new results.
for (int i = 0; i <= readInLines.Length; i++) means that when the length is 10 it will do 11 iterations, which is one too many (remove the =).
Array.FindIndex(readInLines, counter, hyphens72); will do a scan. On large files it will take ages to read them completely and search through them. Try to touch each line only once.
I cannot test what you are doing, but here's a hint:
IEnumerable<string> readInLines = File.ReadLines(sourcefile);
bool started = false;
List<string> results = new List<string>();
foreach(var line in readInLines)
{
// skip empty lines
if(emptyElement(line))
continue;
// when dashes are found, flip a boolean to activate the reading mode.
if(hyphens72(line))
{
// flip state.. (start/end)
started = !started;
}
if(started)
{
// I don't know what you are doing here precisely, do what you gotta do. ;-)
results.AddRange((line.Split('|').Select(p => p.Trim()).ToList()));
string holderCP = string.Join(Environment.NewLine, readInLines, holderCPStart, (blankLine - holderCPStart - 1)).Trim();
results.Add(holderCP);
string comment = string.Join(" ", readInLines, blankLine + 1, (endDiv - (blankLine + 1)));//in case the comment is more than one line long
results.Add(comment);
}
}
foreach (string result in results)
{
Console.WriteLine(result);
}
You might want to start with a class like this. I don't know whether each section begins with a row of hyphens, or if it's just in between. This should handle either scenario.
What this is going to do is take your giant list of strings (the lines in the file) and break it into chunks - each chunk is a set of lines (10 or so lines, according to your OP.)
The reason is that it's unnecessarily complicated to try to read the file, looking for the hyphens, and process the contents of the file at the same time. Instead, one class takes the input and breaks it into chunks. That's all it does.
Another class might read the file and pass its contents to this class to break them up. Then the output is the individual chunks of text.
Another class can then process those individual sections of 10 or so lines without having to worry about hyphens or what separates one chunk from another.
Now that each of these classes is doing its own thing, it's easier to write unit tests for each of them separately. You can test that your "processing" class receives an array of 10 or so lines and does whatever it's supposed to do with them.
public class TextSectionsParser
{
private readonly string _delimiter;
public TextSectionsParser(string delimiter)
{
_delimiter = delimiter;
}
public IEnumerable<IEnumerable<string>> ParseSections(IEnumerable<string> lines)
{
var result = new List<List<string>>();
var currentList = new List<string>();
foreach (var line in lines)
{
if (line == _delimiter)
{
if(currentList.Any())
result.Add(currentList);
currentList = new List<string>();
}
else
{
currentList.Add(line);
}
}
if (currentList.Any() && !result.Contains(currentList))
{
result.Add(currentList);
}
return result;
}
}
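As a variation on the class above, the same chunking can be sketched as an iterator that yields each section as soon as its closing delimiter is seen, so it can be combined with File.ReadLines without first collecting every section. SectionSplitter and Split are hypothetical names used only for this illustration; the behaviour is intended to match ParseSections for both layouts (delimiter before each section, or only between sections).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SectionSplitter
{
    // Lazily groups lines into sections separated by the delimiter line.
    // Empty sections (for example, two delimiters in a row) are skipped.
    public static IEnumerable<List<string>> Split(IEnumerable<string> lines, string delimiter)
    {
        var current = new List<string>();
        foreach (string line in lines)
        {
            if (line == delimiter)
            {
                if (current.Count > 0)
                    yield return current;
                current = new List<string>();
            }
            else
            {
                current.Add(line);
            }
        }

        // Emit the final section when the input does not end with a delimiter.
        if (current.Count > 0)
            yield return current;
    }
}
```

Each yielded list can then be handed to whatever code extracts the fields from one section, and that code can be unit tested with a hand-written list of 10 or so lines.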

reading and writing multiple files at the same time and performing same tasks on them

I am a beginner to programming. I wrote some code in C# to open a single file (that has 4 columns of data) and extract the fourth column into a list. I then did some basic work on the data to extract the mean, minimum and maximum values of the data set. The results were then written to dedicated files for the mean, minimum and maximum values.
Now I want to repeat the same tests for multiple sets of files, each with over 100,000 lines of data. I want the program to read multiple files in the same folder, do the same calculations for each file, and compile all the results for mean, minimum and maximum values into separate files, as before.
The code for the single file is as follows;
private void button1_Click_1(object sender, EventArgs e)
{
string text = "";
DialogResult result = openFileDialog1.ShowDialog(); // Show the dialog.
// create a list to insert the data into
List<float> noise = new List<float>();
int count = 0;
float sum = 0;
float mean = 0;
float max = 0;
float min = 100;
TextWriter tw = new StreamWriter("c:/Users/a3708906/Documents/Filereader - 13062012/Filereader/date.txt");
if (result == DialogResult.OK) // Test result.
{
string file = openFileDialog1.FileName;
FileInfo src = new FileInfo(file);
TextReader reader = src.OpenText();
text = reader.ReadLine();
// while the text being read in from reader.Readline() is not null
while (text != null)
{
text = reader.ReadLine();
if (text != null)
{
string[] words = text.Split(',');
noise.Add(Convert.ToSingle(words[3]));
// write text to a file
tw.WriteLine(text);
//foreach (string word in words)
//{
// tw.WriteLine(word);
//}
}
}
}
tw.Close();
TextWriter tw1 = new StreamWriter("c:/Users/a3708906/Documents/Filereader - 13062012/Filereader/noise.txt");
foreach (float ns in noise)
{
tw1.WriteLine(Convert.ToString(ns));
count++;
sum += ns;
mean = sum/count;
float min1 = 0;
if (ns > max)
max = ns;
else if (ns < max)
min1 = ns;
if (min1 < min && min1 >0)
min = min1;
else
min = min;
}
tw1.Close();
TextWriter tw2 = new StreamWriter("c:/Users/a3708906/Documents/Filereader - 13062012/Filereader/summarymeans.txt");
tw2.WriteLine("Mean Noise");
tw2.WriteLine("==========");
tw2.WriteLine("mote_noise 2: {0}", Convert.ToString(mean));
tw2.Close();
TextWriter tw3 = new StreamWriter("c:/Users/a3708906/Documents/Filereader - 13062012/Filereader/summarymaximums.txt");
tw3.WriteLine("Maximum Noise");
tw3.WriteLine("=============");
tw3.WriteLine("mote_noise 2: {0}", Convert.ToString(max));
tw3.Close();
TextWriter tw4 = new StreamWriter("c:/Users/a3708906/Documents/Filereader - 13062012/Filereader/summaryminimums.txt");
tw4.WriteLine("Minimum Noise");
tw4.WriteLine("=============");
tw4.WriteLine("mote_noise 2: {0}", Convert.ToString(min));
tw4.Close();
}
I will be grateful if someone could help to translate this code for working with multiple files. Thank you.
Wrap your logic for processing a single file into a single Action or a void-returning method, then enumerate the files, convert the sequence to a parallel query with AsParallel() and call ForAll on it.
For example, if you made an Action or method named DoStuff(string filename) that processes a single file, you can then call it with:
Directory.EnumerateFiles(dialog.SelectedPath).AsParallel().ForAll(DoStuff);
Your current code will work if you simply use Directory.GetFiles() properly. The easiest way to do it would be to have three inputs; one to get the Directory, and a second to get the file extension (if wanted), and a checkbox to ask whether or not you want to recursively search the folders or not.
Then instead of
string file = openFileDialog1.FileName;
you would instead have something like
//ensure the default fileExtensionDropdown.SelectedValue is "*"
string[] filePaths;
if(chkRecursiveSearch.IsChecked == true)
filePaths = Directory.GetFiles(dlgFolderBrowser.SelectedPath, @"*"+ddlFileExtension.SelectedValue, SearchOption.AllDirectories);
else
filePaths = Directory.GetFiles(dlgFolderBrowser.SelectedPath, @"*"+ddlFileExtension.SelectedValue);
Then you can use:
foreach (string path in filePaths) { /* do things */ }
to handle each file path the way you are right now.
Please note the code I've put here is definitely not as idiomatic and tidy as it could be, but since you said you were a beginner I decided to be a bit more clear. If requested I'll put up a more idiomatic take on things, though if we do that we should probably clean up your initial code a bit as well.
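To make either approach workable, the per-file calculation first has to be separated from the button handler. Here is a rough sketch of what that could look like. NoiseStats and Summarize are made-up names; it assumes, as the original loop does, that the first line of each file is skipped and that the fourth comma-separated column holds the value. Unlike the original, it takes the plain minimum (the original ignores values that are not greater than zero) and parses with the invariant culture, so adjust both if your data needs the old behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class NoiseStats
{
    // Computes mean, minimum and maximum of the fourth column in one pass,
    // without storing the values in a list.
    public static (float Mean, float Min, float Max) Summarize(IEnumerable<string> lines)
    {
        int count = 0;
        float sum = 0f;
        float min = float.MaxValue;
        float max = float.MinValue;

        foreach (string line in lines.Skip(1)) // assumption: first line is not data
        {
            string[] words = line.Split(',');
            float value = float.Parse(words[3], CultureInfo.InvariantCulture);
            count++;
            sum += value;
            if (value < min) min = value;
            if (value > max) max = value;
        }

        if (count == 0)
            throw new InvalidOperationException("No data rows found.");

        return (sum / count, min, max);
    }
}
```

Inside the foreach over filePaths, each file could then be processed with NoiseStats.Summarize(File.ReadLines(path)) and the three results appended to the summary files, one line per input file.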

How to loop through all text files in a directory C#

This piece of code takes a row from 1.txt and splits it into columns. Now I have a directory of 200+ files ending in something.txt, and I want to open them one at a time and run the process below on each. What is the easiest way to loop through all the files without changing my code too much?
Snippet of the current code:
string _nextLine;
string[] _columns;
char[] delimiters;
delimiters = "|".ToCharArray();
_nextLine = _reader.ReadLine();
string[] lines = File.ReadAllLines("C:\\P\\DataSource2_W\\TextFiles\\Batch1\\1.txt");
// Start at index 3 and keep looping up to index Length - 3
for (int i = 3; i < lines.Length - 2; i++)
{ _columns = lines[i].Split('|');
// Check that the row has the expected 146 columns
if (_columns.Length == 146)
{
JazzORBuffer.AddRow();
JazzORBuffer.Server = _columns[0];
JazzORBuffer.Country = _columns[1];
JazzORBuffer.QuoteNumber = _columns[2];
JazzORBuffer.DocumentName =_columns[3];
JazzORBuffer.CompanyNameSoldTo=_columns[4];
}
else
{
// Debug or messagebox the line that fails
MessageBox.Show("Cols:" + _columns.Length.ToString() + " Line: " + lines[i]);
return;
}
}
You can simply use Directory.EnumerateFiles() to iterate over the collection of files in the specified directory.
So you can put your code inside a foreach loop, like:
foreach (var file in
Directory.EnumerateFiles(@"C:\P\DataSource2_W\TextFiles\Batch1", "*.txt"))
{
//your code
}
