Writing to a .txt file runs out of memory? C# [closed] - c#

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I'm trying to write a large number of lines (around 20-30 thousand) to a .txt file in a C# application:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var watch = System.Diagnostics.Stopwatch.StartNew();
        // the code that you want to measure comes here
        List<string> firstListTrade = new List<string>();
        List<string> secondListTrade = new List<string>();
        firstListTrade = System.IO.File.ReadAllLines(@"C:\Users\me\Desktop\File1.txt").ToList();
        secondListTrade = System.IO.File.ReadAllLines(@"C:\Users\me\Desktop\Fil2.txt").ToList();
        string resultatsOne = "C:\\Users\\me\\Desktop\\resultOutput1.txt";
        string resultatsTwoo = "C:\\Users\\elbb001\\Desktop\\resultOutput2.txt";
        // Sort both lists
        firstListTrade = firstListTrade.OrderBy(q => q).ToList();
        secondListTrade = secondListTrade.OrderBy(q => q).ToList();
        // Open one output file per comparison
        StreamWriter outputFileOne = new StreamWriter(resultatsOne);
        StreamWriter outputFileTwoo = new StreamWriter(resultatsTwoo);
        int i = firstListTrade.Count();
        int j = secondListTrade.Count();
        int endofFile = 0;
        foreach (string trade in secondListTrade)
        {
            endofFile++;
            if (!firstListTrade.Contains(trade))
            {
                outputFileOne.WriteLine("Number : " + trade + " exist in first list but not second");
            }
            if (endofFile == i)
            {
                outputFileOne.WriteLine("End of file : " + endofFile);
            }
            outputFileOne.Flush();
        }
        endofFile = 0;
        foreach (string trade in firstListTrade)
        {
            endofFile++;
            if (!secondListTrade.Contains(trade))
            {
                outputFileTwoo.WriteLine("Number : " + trade + " exist in second but not in first ");
            }
            if (endofFile == j)
            {
                outputFileTwoo.WriteLine("End of file : " + endofFile);
            }
        }
        watch.Stop();
        var elapsedMs = watch.ElapsedMilliseconds;
        // Keep the console window open in debug mode.
        Console.WriteLine("Done in : " + elapsedMs.ToString());
        System.Console.ReadKey();
    }
}
It compiles with no errors, and when I open the file I see the results I want. But when I scroll all the way down, the file ends with this line:
" Trade number : 22311 "
When I set a breakpoint on the end-of-file write, it is reached in the code, but the line is never written to the file.
What could have gone wrong? Did it run out of memory, or can the .txt file not hold any more lines?

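Following the update to the question, here is the answer. It is not a memory problem: a StreamWriter buffers its output, and whatever is still in the buffer only reaches the file when the writer is flushed or disposed. In the question's code, outputFileTwoo is never flushed or closed, so its last lines are lost. Wrap each writer in a using block so it is disposed (and flushed) when you are done with it: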
List<string> firstListThatIcantRevealItName = new List<string>();
List<string> secondListThatIcantRevealItName = new List<string>();
firstListThatIcantRevealItName = System.IO.File.ReadAllLines(@"C:\Users\me\Desktop\blabla.txt").ToList();
secondListThatIcantRevealItName = System.IO.File.ReadAllLines(@"C:\Users\me\Desktop\potto.txt").ToList();
// Disposing the writer at the end of the using block flushes the remaining buffered lines to disk
using (StreamWriter outputFileOne = new StreamWriter(resultatsOne))
{
    foreach (string trade in secondListThatIcantRevealItName)
    {
        endofFile++;
        // Check the other list; checking the list being iterated would always find the trade
        if (!firstListThatIcantRevealItName.Contains(trade))
        {
            outputFileOne.WriteLine("Trade number : " + trade + " exists in the second list but not in the first list");
        }
        if (endofFile == i)
        {
            outputFileOne.WriteLine(endofFile);
        }
    }
}
// Wrap outputFileTwoo (writing to resultatsTwoo) in its own using block in the same way
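As a minimal sketch of the same fix applied to both comparisons, here is a helper that also replaces List.Contains (a linear scan per lookup, so quadratic overall) with a HashSet lookup. The class name TradeDiff, the method WriteMissing and the "Differences" summary line are hypothetical, and the sketch assumes the comparison file fits in memory:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class TradeDiff
{
    // Writes each line of sourcePath that is missing from otherPath to outputPath
    // and returns how many such lines were found.
    public static int WriteMissing(string sourcePath, string otherPath, string outputPath)
    {
        // HashSet<string>.Contains is O(1) on average; List<string>.Contains scans the whole list.
        var other = new HashSet<string>(File.ReadLines(otherPath));
        int missing = 0;

        // Disposing the writer at the end of the using block flushes its buffer to disk,
        // even if an exception is thrown inside the loop.
        using (var writer = new StreamWriter(outputPath))
        {
            foreach (string trade in File.ReadLines(sourcePath).OrderBy(t => t, StringComparer.Ordinal))
            {
                if (!other.Contains(trade))
                {
                    writer.WriteLine("Number : " + trade + " exists in " + Path.GetFileName(sourcePath)
                        + " but not in " + Path.GetFileName(otherPath));
                    missing++;
                }
            }
            writer.WriteLine("Differences : " + missing);
        }
        return missing;
    }
}
```

Calling it once per direction, for example WriteMissing(file2, file1, resultatsOne) and WriteMissing(file1, file2, resultatsTwoo), produces both reports, and neither output can end up truncated because each writer is disposed before the method returns.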

Related

Read a file line by line and write it transformed line by line into another file [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I have to read the contents of an existing file, "operativedata.txt", line by line and write the data to an Excel sheet, "datawarehouse.csv", line by line. The contents of "operativedata.txt" look like the following picture (operativedata.txt):
The data in operativedata.txt has to be transformed when it is written to datawarehouse.csv.
In the CSV file it has to look like this:
"date;time;randomvalue\n", where \n means a line break after each group of three values.
A line in this format has to be written to datawarehouse.csv every 10 seconds. At the end the file should look like this: datawarehouse.csv
Code for generating datawarehouse.csv:
using System;
using System.IO;
using System.Threading;
namespace etltxt2csv
{
    class Program
    {
        string[] content; //Array for reading each line of the file operativedata.txt
        public void loop()
        {
            content = File.ReadAllLines(@"C:\ETL-Process\operativedata.txt"); //File to be read
            while (true)
            {
                for (int i = 0; i < content.Length; i++)
                {
                    Thread.Sleep(10000);
                    File.AppendAllText(@"C:\ETL-Process\datawarehouse.csv", content[i] + ";" + "\n");
                    Console.WriteLine(content[i]);
                }
            }
        }
        static void Main(string[] args)
        {
            Program a = new Program();
            a.loop();
        }
    }
}
operativedata.txt was created with the following code:
using System;
using System.Threading;
using System.IO;
namespace createData
{
    class Program
    {
        string file, date, time;
        int randomValue;
        public void createFile()
        {
            file = @"C:\ETL-Process\operativedata.txt";
            date = DateTime.Now.ToString("yyMMdd");
            time = DateTime.Now.ToString("HHmmss");
            Random random = new Random();
            while (true)
            {
                randomValue = random.Next(200);
                Thread.Sleep(10000);
                File.AppendAllText(file, "\r\n" + date + "\r\n" + time + "\r\n" + randomValue);
            }
        }
        static void Main(string[] args)
        {
            Program a = new Program();
            a.createFile();
        }
    }
}
What do I have to change in the etltxt2csv code so that the data is written to datawarehouse.csv transformed like this:
"date;time;randomvalue\n" (always the first three lines joined on one line, then the next three lines on the next line after a line break \n)?
When I run the etltxt2csv code listed here, the output in my Excel file is not transformed as in the picture above (datawarehouse.csv).
You need to read three lines from the .txt file, and combine them into a single line for output:
using (var input = File.OpenText(txt_file_name))
{
    using (var output = File.CreateText(csv_file_name))
    {
        var date = input.ReadLine();
        var time = input.ReadLine();
        var random = input.ReadLine();
        output.WriteLine("{0};{1};{2}", date, time, random);
    }
}
var is a keyword that tells the compiler to infer the type. For example, when you write:
var input = File.OpenText(filename);
The compiler knows that File.OpenText returns a StreamReader, so it automatically makes input type StreamReader. Here's the same code as above, with var replaced by the actual types:
using (StreamReader input = File.OpenText(txt_file_name))
{
    using (StreamWriter output = File.CreateText(csv_file_name))
    {
        string date = input.ReadLine();
        string time = input.ReadLine();
        string random = input.ReadLine();
        output.WriteLine("{0};{1};{2}", date, time, random);
    }
}
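The snippet above converts a single record. A minimal sketch that loops over the whole file might look like the following. It assumes the layout produced by the createData program (each record is preceded by "\r\n", so the file starts with an empty line, followed by date, time and random value on separate lines) and converts everything in one pass rather than every 10 seconds. The names EtlConverter, ConvertAll and ReadNonEmptyLine are hypothetical:

```csharp
using System;
using System.IO;

static class EtlConverter
{
    // Joins every group of three non-empty lines (date, time, random value)
    // into one "date;time;randomvalue" line and returns the number of records written.
    public static int ConvertAll(string txtFileName, string csvFileName)
    {
        int records = 0;
        using (StreamReader input = File.OpenText(txtFileName))
        using (StreamWriter output = File.CreateText(csvFileName))
        {
            string date;
            while ((date = ReadNonEmptyLine(input)) != null)
            {
                string time = ReadNonEmptyLine(input);
                string random = ReadNonEmptyLine(input);
                if (time == null || random == null)
                    break; // incomplete trailing record: stop instead of writing a partial line
                output.WriteLine("{0};{1};{2}", date, time, random);
                records++;
            }
        }
        return records;
    }

    // Returns the next line that is not blank, or null at the end of the file.
    // Blank lines are skipped because the generator writes "\r\n" before each record.
    private static string ReadNonEmptyLine(StreamReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null && line.Trim().Length == 0) { }
        return line;
    }
}
```

Skipping blank lines keeps the groups of three aligned; grouping raw lines by index would pair the leading empty line with a date and a time.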
I agree with Jim Mischel: three input lines need to be read for each line in the output file.
Here's what the loop could look like if you keep the same API calls:
(Note: Jim Mischel's API calls are better for memory use and performance, though.)
for (int i = 0; i + 2 < content.Length; i += 3)
{
    // Console.WriteLine(content[i] + ";" + content[i+1] + ";" + content[i+2]);
    File.AppendAllText(
        @"C:\ETL-Process\datawarehouse.csv",
        content[i + 0] + ";"
        + content[i + 1] + ";"
        + content[i + 2] + "\n"
    );
}

Is there a more efficient way of reading and writing a text file at the same time?

I'm back again with another question, this time about editing text files. My homework is as follows:
Write a program that reads the contents of a text file, inserts the line number at the beginning of each line, and then rewrites the file contents.
This is what I have so far, though I'm not sure it's the most efficient way of doing it. I've only just started learning how to handle text files.
static void Main(string[] args)
{
    string fileName = @"C:\Users\Nate\Documents\Visual Studio 2015\Projects\Chapter 15\Chapter 15 Question 3\Chapter 15 Question 3\TextFile1.txt";
    StreamReader reader = new StreamReader(fileName);
    int lineCounter = 0;
    List<string> list = new List<string>();
    using (reader)
    {
        string line = reader.ReadLine();
        while (line != null)
        {
            list.Add("line " + (lineCounter + 1) + ": " + line);
            line = reader.ReadLine();
            lineCounter++;
        }
    }
    StreamWriter writer = new StreamWriter(fileName);
    using (writer)
    {
        foreach (string line in list)
        {
            writer.WriteLine(line);
        }
    }
}
Your help would be appreciated!
Thanks once again. :]
This should be enough (if the file is relatively small):
using System.IO;
(...)
static void Main(string[] args)
{
    string fileName = @"C:\Users\Nate\Documents\Visual Studio 2015\Projects\Chapter 15\Chapter 15 Question 3\Chapter 15 Question 3\TextFile1.txt";
    string[] lines = File.ReadAllLines(fileName);
    for (int i = 0; i < lines.Length; i++)
    {
        lines[i] = string.Format("{0} {1}", i + 1, lines[i]);
    }
    File.WriteAllLines(fileName, lines);
}
I suggest using LINQ, with File.ReadLines to read the content.
// Read all lines and apply the format
var formatteLines = File
    .ReadLines("filepath") // read lines lazily, one at a time
    .Select((line, i) => string.Format("line {0}: {1}", i + 1, line)) // format each line
    .ToList(); // finish reading now, so the same path can be overwritten below
// write the formatted lines either to a new file or over the original file
File.WriteAllLines("outputfilepath", formatteLines);
Just one loop here. I think it will be efficient.
using System;
using System.IO;
using System.Text;

class Program
{
    public static void Main()
    {
        string path = Path.Combine(Directory.GetCurrentDirectory(), "MyText.txt");
        string s = "";
        int counter = 1;
        StringBuilder sb = new StringBuilder();
        using (StreamReader sr1 = File.OpenText(path))
        {
            while ((s = sr1.ReadLine()) != null)
            {
                var lineOutput = counter++ + " " + s;
                Console.WriteLine(lineOutput);
                sb.AppendLine(lineOutput); // AppendLine keeps each numbered line on its own line
            }
        }
        Console.WriteLine();
        // Overwrite the file (rather than appending to it) so the original lines are replaced
        File.WriteAllText(path, sb.ToString());
    }
}
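For files too large to hold in memory, one option is to stream from the original into a second file and swap them afterwards, because a file cannot safely be overwritten while it is still being read. This is a minimal sketch under that assumption; the class LineNumberer and the ".tmp" naming of the temporary file are hypothetical choices:

```csharp
using System;
using System.IO;

static class LineNumberer
{
    // Streams the file line by line into a temporary file, then replaces the original,
    // so only one line is held in memory at a time.
    public static void AddLineNumbers(string path)
    {
        string tempPath = path + ".tmp"; // hypothetical temp-file name next to the original
        using (StreamReader reader = File.OpenText(path))
        using (StreamWriter writer = File.CreateText(tempPath))
        {
            string line;
            int lineNumber = 1;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine("line " + lineNumber + ": " + line);
                lineNumber++;
            }
        }
        // Both streams are closed at this point, so the original can be swapped out.
        File.Delete(path);
        File.Move(tempPath, path);
    }
}
```

This uses one loop and constant memory; the trade-off is a temporary file of roughly the same size as the original on disk.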

What is the fastest way to find the duplicate rows in a CSV file?

I have a CSV file containing 1,500,000 (15,00,000) records and need to find the duplicate rows in it. I am trying the code below:
DataTable dtUniqueDataView = dvDataView.ToTable(true, Utility.GetHeadersFromCsv(csvfilePath).Select(c => c.Trim()).ToArray());
But this doesn't give me the duplicate records, and the operation takes nearly 4 minutes. Can anyone suggest an approach that reduces the time and returns the duplicate rows?
Not the final solution but maybe something to start with:
Read the CSV file line by line and calculate a hash value of each line. You should be able to keep those numeric values in memory.
String.GetHashCode() is not good enough for this purpose because, as correctly pointed out in the comments, it can return the same value for different strings. A hash algorithm with far fewer collisions is required.
Store the hashes in a HashSet and check whether each value already exists there. If it does, the row is a duplicate and you can skip it.
Note: If most of the time is spent reading the file (as assumed in the comment above), you will have to work on that issue first. My assumption is that you're worried about finding the duplicates.
Read the CSV file as a stream, one line at a time. For each line read, calculate the MD5 hash and check whether that hash already exists in your stash. If it does, the line is a duplicate.
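A minimal sketch of that streaming MD5 approach follows. It treats rows as duplicates only when their text is byte-for-byte identical (no trimming or column normalization), and the class name DuplicateFinder is hypothetical. Keeping a fixed-size key per distinct row instead of the row itself mainly helps when rows are long:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class DuplicateFinder
{
    // Streams the file and returns every line whose MD5 hash has already been seen.
    public static List<string> FindDuplicates(string csvPath)
    {
        var seen = new HashSet<string>();
        var duplicates = new List<string>();
        using (MD5 md5 = MD5.Create())
        {
            foreach (string line in File.ReadLines(csvPath))
            {
                byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(line));
                // Base64 of the 16-byte hash is a compact, fixed-size key for the set.
                string key = Convert.ToBase64String(hash);
                // HashSet.Add returns false when the key is already present.
                if (!seen.Add(key))
                    duplicates.Add(line);
            }
        }
        return duplicates;
    }
}
```

A row that occurs three times is reported twice, once for each repeat after the first occurrence.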
I wrote an example using HashSet:
Output (15,000,000 entries in a csv file):
Reading File
File distinct read in 1600,6632 ms
Output (30,000,000 entries in a csv file):
Reading File
File distinct read in 3192,1997 ms
Output (45,000,000 entries in a csv file):
Reading File
File distinct read in 4906,0755 ms
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        string csvFile = "test.csv";
        if (!File.Exists(csvFile)) //Create a test CSV file
            CreateCSVFile(csvFile, 45000000, 15000);
        List<string> distinct = GetDistinct(csvFile); //Returns every line once
        Console.ReadKey();
    }

    static List<string> GetDistinct(string filename)
    {
        Stopwatch sw = new Stopwatch(); //just a timer
        List<HashSet<string>> lines = new List<HashSet<string>>(); //HashSet is very fast at finding duplicates
        HashSet<string> current = new HashSet<string>(); //The HashSet currently being filled
        lines.Add(current); //Add the current HashSet to the list of HashSets
        sw.Restart(); //just a timer
        Console.WriteLine("Reading File"); //just an output message
        foreach (string line in File.ReadLines(filename))
        {
            try
            {
                if (lines.TrueForAll(hSet => !hSet.Contains(line))) //Look for an existing entry in any of the HashSets
                    current.Add(line); //If the line was not found, add it to the current HashSet
            }
            catch (OutOfMemoryException) //The HashSet throws at about 12,000,000 elements
            {
                current = new HashSet<string>() { line }; //The line could not be added before, so add it to a new HashSet
                lines.Add(current); //Add the new HashSet to the list of HashSets
            }
        }
        sw.Stop(); //just a timer
        Console.WriteLine("File distinct read in " + sw.Elapsed.TotalMilliseconds + " ms"); //just an output message
        List<string> concatinated = new List<string>(); //Create a list of strings out of the HashSet list
        lines.ForEach(set => concatinated.AddRange(set)); //Fill the list of strings
        return concatinated; //Return the list
    }

    static void CreateCSVFile(string filename, int entries, int duplicateRow)
    {
        StringBuilder sb = new StringBuilder(); //Collects the rows that will be written again as duplicates
        using (FileStream fs = File.OpenWrite(filename))
        using (StreamWriter sw = new StreamWriter(fs))
        {
            Random r = new Random();
            string duplicateLine = null;
            string line = "";
            for (int i = 0; i < entries; i++)
            {
                line = r.Next(1, 10) + ";" + r.Next(11, 45) + ";" + r.Next(20, 500) + ";" + r.Next(2, 11) + ";" + r.Next(12, 46) + ";" + r.Next(21, 501);
                sw.WriteLine(line);
                if (i % duplicateRow == 0)
                {
                    if (duplicateLine != null && i < entries - 1)
                    {
                        sb.AppendLine(duplicateLine);
                        i++;
                    }
                    duplicateLine = line;
                }
            }
            sw.Write(sb.ToString()); //Write the collected duplicate rows at the end of the file
        }
    }
}

How to append several lines when iterating through List?

I'd appreciate it if someone could advise on the following.
I read a file containing the text below and write each line into a List<string>:
CODE/1
NAME/some_name1
SHORT_NAME/short_name1
CODE/2
NAME/document is a piece of paper
containing information often
used as proof of something
SHORT_NAME/document is a piece
Now I'm parsing the list to get CODE, NAME and SHORT_NAME separately.
The problem is that some NAME values are a single sentence that is broken across several lines because of its length. I want to join these lines into one sentence; the output should be:
...
NAME/document is a piece of paper containing information often used as proof of something
...
My code appends only the next line:
List<string> lines = File.ReadLines(path).ToList();
List<string> full_lines = new List<string>();
foreach (string line in lines)
{
    if (line.StartsWith("NAME"))
    {
        int name_index = lines.IndexOf(line);
        string new_line = "";
        if (!lines.ElementAt(name_index + 1).StartsWith("SHORT_NAME")) //checking if the next line does not start with SHORT_NAME (then it is continuation of NAME)
        {
            new_line = line + " " + lines.ElementAt(name_index + 1); //appending the next line
            full_lines.Add(new_line); //adding into new list
        }
        else
        {
            full_lines.Add(line);
        }
    }
}
So the output is:
...
NAME/document is a piece of paper
...
So, how can I append all lines?
Thank you
When you read the file, read each line separately instead of all of them at once. Then only start a new entry when the line begins with a keyword or, if '/' appears only in the key lines, when the line contains a '/'; otherwise append the line to the previous entry. Something like this might help:
List<string> full_lines = new List<string>();
using (System.IO.StreamReader sr = new System.IO.StreamReader(path))
{
    string line = "";
    while (!sr.EndOfStream)
    {
        line = sr.ReadLine();
        if (!line.Contains("/"))
        {
            // No '/' means this line continues the previous entry
            full_lines[full_lines.Count - 1] += " " + line;
        }
        else
            full_lines.Add(line);
    }
}
change
if (!lines.ElementAt(name_index + 1).StartsWith("SHORT_NAME")) //checking if the next line does not start with SHORT_NAME (then it is continuation of NAME)
{
    new_line = line + " " + lines.ElementAt(name_index + 1); //appending the next line
    full_lines.Add(new_line); //adding into new list
}
else
{
    full_lines.Add(line);
}
to
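new_line = line;
name_index++;
// Keep appending until the SHORT_NAME line (or the end of the list) is reached
while (name_index < lines.Count && !lines.ElementAt(name_index).StartsWith("SHORT_NAME"))
{
    new_line = new_line + " " + lines.ElementAt(name_index); //appending the next line
    name_index++;
}
full_lines.Add(new_line);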
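Both answers rely on recognizing which lines start a new entry. As a sketch that makes this explicit, the helper below assumes the only keys are CODE, NAME and SHORT_NAME and joins any other line onto the previous entry, so it works for any continued field and does not depend on List.IndexOf, which returns the first match and misbehaves when identical lines repeat. The names RecordParser and JoinContinuationLines are hypothetical:

```csharp
using System;
using System.Collections.Generic;

static class RecordParser
{
    // Joins continuation lines (lines without a known "KEY/" prefix) onto the previous entry.
    public static List<string> JoinContinuationLines(IEnumerable<string> lines)
    {
        var keys = new[] { "CODE/", "NAME/", "SHORT_NAME/" }; // assumed set of keys
        var result = new List<string>();
        foreach (string line in lines)
        {
            bool startsEntry = false;
            foreach (string key in keys)
            {
                if (line.StartsWith(key, StringComparison.Ordinal))
                {
                    startsEntry = true;
                    break;
                }
            }
            if (startsEntry || result.Count == 0)
                result.Add(line);
            else
                result[result.Count - 1] += " " + line.Trim(); // continuation of the previous value
        }
        return result;
    }
}
```

It can be fed directly from the file, for example JoinContinuationLines(File.ReadLines(path)), which reads one line at a time.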

Read a file and grab an integer [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I have a .txt file that contains a number I want to grab. The number has a prefix that can be used to find its location in the file:
GeneratedNumber="120"
The number can be any Int32 value.
P.S. The file is a .txt file, and one line contains several of these key-value pairs, for example:
<Output Change="12.13" GeneratedNumber="120" Total="99.21" />
You can use the following code. It's not very elegant or the best approach, but it's tested and works fine.
string[] lines = File.ReadAllLines(Path.Combine(Application.StartupPath, "test.txt"));
foreach (string s in lines)
{
    if (s.ToLowerInvariant().Contains("generatednumber"))
    {
        string temp = s.Substring(s.ToLowerInvariant().IndexOf("generatednumber"));
        temp = temp.Substring(temp.IndexOf("\"") + 1);
        temp = temp.Substring(0, temp.IndexOf("\""));
        int yournumber;
        if (int.TryParse(temp, out yournumber))
        {
            Console.WriteLine("Generated Number = {0}", yournumber);
        }
    }
}
I've only tested the XML part, but this should work (you may wish to add error handling and the conversion to integers):
using System.Xml;
(...)
var values = new List<string>();
using (var sr = new StreamReader(fileName))
{
    string line;
    XmlDocument x = new XmlDocument();
    while ((line = sr.ReadLine()) != null)
    {
        x.LoadXml(line);
        // XmlNodeList enumerates as object, so the loop variable must be declared as XmlNode
        foreach (XmlNode node in x.GetElementsByTagName("Output"))
            values.Add(node.Attributes["GeneratedNumber"].Value);
    }
}
Tested using:
XmlDocument x = new XmlDocument();
x.LoadXml("<Output Change=\"12.13\" GeneratedNumber=\"120\" Total=\"99.21\" />");
Console.WriteLine(x.GetElementsByTagName("Output")[0]
.Attributes["GeneratedNumber"].Value);
Console.ReadLine();
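If every line is a single well-formed element, LINQ to XML (System.Xml.Linq), swapped in here in place of XmlDocument, also covers the conversion to integers that the answer leaves open, through the explicit (int) conversion on XAttribute. A minimal sketch under that assumption; GeneratedNumberReader and ReadGeneratedNumbers are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

static class GeneratedNumberReader
{
    // Parses each non-empty line as one XML element and collects its
    // GeneratedNumber attribute as an int.
    public static List<int> ReadGeneratedNumbers(IEnumerable<string> lines)
    {
        var numbers = new List<int>();
        foreach (string line in lines)
        {
            if (line.Trim().Length == 0)
                continue; // skip blank lines
            XElement element = XElement.Parse(line);
            XAttribute attribute = element.Attribute("GeneratedNumber");
            if (attribute != null)
                numbers.Add((int)attribute); // throws FormatException if the value is not an integer
        }
        return numbers;
    }
}
```

Usage with a file would be ReadGeneratedNumbers(File.ReadLines(fileName)); lines that are not valid XML make XElement.Parse throw, so add error handling if the file can contain them.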
You can use this code:
// Read each line of the file into a string array. Each element
// of the array is one line of the file.
string[] lines = System.IO.File.ReadAllLines(@"C:\yourFile.txt");
string key = "GeneratedNumber=\"";
foreach (string line in lines)
{
    int start = line.IndexOf(key);
    if (start < 0)
        continue; // this line has no GeneratedNumber
    string sub = line.Substring(start + key.Length);
    int num = int.Parse(sub.Substring(0, sub.IndexOf("\"")));
    // whatever you want to do with the integer
}
This reads the lines of the text file and parses the value after the "=" sign into an integer.
Depending on what the file looks like, you might use XmlDocument instead. Please read about XML here.
string filePath = "your_file_path";
var match = System.Text.RegularExpressions.Regex.Match(
    System.IO.File.ReadAllText(filePath),
    @"GeneratedNumber=""(\d+)""",
    System.Text.RegularExpressions.RegexOptions.IgnoreCase);
int num = match.Success ? int.Parse(match.Groups[1].Value) : 0;
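Regex.Match returns only the first occurrence. If a file can contain several values, Regex.Matches returns all of them; the sketch below also allows an optional minus sign, since the question says any Int32 value can appear, and skips values outside the Int32 range. GeneratedNumberRegex and FindAll are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class GeneratedNumberRegex
{
    // Optional minus sign so that negative Int32 values match as well.
    private static readonly Regex Pattern =
        new Regex(@"GeneratedNumber=""(-?\d+)""", RegexOptions.IgnoreCase);

    // Returns every GeneratedNumber value in the text, skipping values that do not fit in an Int32.
    public static List<int> FindAll(string text)
    {
        var numbers = new List<int>();
        foreach (Match match in Pattern.Matches(text))
        {
            int value;
            if (int.TryParse(match.Groups[1].Value, out value))
                numbers.Add(value);
        }
        return numbers;
    }
}
```

With a file, call it as FindAll(System.IO.File.ReadAllText(filePath)).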
Assuming there's only one instance of that number in the file or you want to grab only the first one even if there are multiple.
string[] lines = File.ReadAllLines("path to file");
Hashtable values = new Hashtable();
foreach (string line in lines)
{
    // One line can hold several key="value" pairs, so split on spaces first
    foreach (string pair in line.Split(' '))
    {
        if (pair.Contains("=\""))
        {
            string[] split = pair.Split('=');
            values[split[0]] = split[1].Replace("\"", "");
        }
    }
}
// GeneratedNumber is the value of GeneratedNumber in the file.
int GeneratedNumber = Int32.Parse(values["GeneratedNumber"].ToString());
This code should match your needs:
private static int GetNumber(string fileName)
{
    string line;
    string key = "GeneratedNumber=\"";
    using (StreamReader file = new StreamReader(fileName))
    {
        while ((line = file.ReadLine()) != null)
        {
            if (line.Contains(key))
            {
                int startIndex = line.IndexOf(key) + key.Length;
                int endIndex = line.IndexOf("\"", startIndex);
                return int.Parse(line.Substring(startIndex, endIndex - startIndex));
            }
        }
    }
    return 0;
}
Also you may be interested in these articles:
Using of StreamReader
String methods
Int32.Parse method
