Replace value & save file during reading CSV file (C#)

I'm reading a CSV file:
string line;
StreamReader sr = new StreamReader(file.ToString());
while ((line = sr.ReadLine()) != null)
{
string col1 = line.Split(',')[10]; //old value
col1 = "my value"; //new value
}
sr.Close();
sr.Dispose();
I want to replace the old value with the new one.
Then I need to save the file with the changes.
How can I do that?

I suggest using the File class instead of streams and readers. LINQ is very convenient for querying data:
var modifiedData = File
.ReadLines(file.ToString())
.Select(line => line.Split(','))
.Select(items => {
//TODO: put relevant logic here: given items we should return csv line
items[10] = "my value";
return string.Join(",", items);
})
.ToList(); // <- we have to store modified data in memory
File.WriteAllLines(file.ToString(), modifiedData);
Another possibility (say, when the initial file is too large to fit in memory) is to save the modified data into a temporary file and then Move it:
var modifiedData = File
.ReadLines(file.ToString())
.Select(line => line.Split(','))
.Select(items => {
//TODO: put relevant logic here: given items we should return csv line
items[10] = "my value";
return string.Join(",", items);
});
string tempFile = Path.Combine(Path.GetTempPath(), $"{Guid.NewGuid()}.tmp");
File.WriteAllLines(tempFile, modifiedData);
File.Delete(file.ToString());
File.Move(tempFile, file.ToString());
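Both versions above index `items[10]` directly, which throws an `IndexOutOfRangeException` on any row with fewer than 11 columns (a blank trailing line, for example). Here is a minimal sketch of a guard. `CsvColumn.Replace` is a hypothetical helper name, not part of the answer, and it keeps the naive `Split(',')`, so quoted fields containing commas are still not handled.

```csharp
using System;

static class CsvColumn
{
    // Hypothetical helper: returns the line with column `index` replaced,
    // or the line unchanged when it has too few columns.
    public static string Replace(string line, int index, string newValue)
    {
        string[] items = line.Split(',');
        if (index < 0 || index >= items.Length)
            return line; // short or blank row: leave it as-is
        items[index] = newValue;
        return string.Join(",", items);
    }
}
```

In either pipeline it could stand in for the two Select calls, for instance `.Select(line => CsvColumn.Replace(line, 10, "my value"))`.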

Reading an entire file at once is memory-expensive. Not to mention creating its parallel copy. Using streams can fix it. Try this:
void Modify()
{
using (var fs = new FileStream(file, FileMode.Open, FileAccess.ReadWrite))
{
string line;
long position;
while ((line = fs.ReadLine(out position)) != null)
{
var tmp = line.Split(',');
tmp[1] = "00"; // new value
var newLine = string.Join(",", tmp);
fs.WriteLine(position, newLine);
}
}
}
with extensions:
static class FileStreamExtensions
{
private static readonly char[] newLine = Environment.NewLine.ToCharArray();
private static readonly int length = Environment.NewLine.Length;
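public static string ReadLine(this FileStream fs, out long position)
{
position = fs.Position;
var chars = new List<char>();
int b;
// Each byte is treated as one character, so this assumes a single-byte encoding such as ASCII.
while ((b = fs.ReadByte()) != -1)
{
chars.Add((char)b);
if (chars.Count >= length && chars.Skip(chars.Count - length).SequenceEqual(newLine))
{
// Strip the line terminator; WriteLine appends it again.
chars.RemoveRange(chars.Count - length, length);
return new string(chars.ToArray());
}
}
// End of file: return the last unterminated line, or null when nothing is left.
return chars.Count == 0 ? null : new string(chars.ToArray());
}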
public static void WriteLine(this FileStream fs, long position, string line)
{
var bytes = line.ToCharArray().Concat(newLine).Select(c => (byte)c).ToArray();
fs.Position = position;
fs.Write(bytes, 0, bytes.Length);
}
}
The shortcoming is you must keep your values the same length. E.g. 999 and __9 are both of length 3. Fixing this makes things much more complicated, so I'd leave it this way.
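One way to respect the same-length constraint is to pad, or reject, the new value before writing it. The following is a sketch only: `FixedWidth.Fit` is a hypothetical helper, and left-padding with a filler character is an assumption about what the file format tolerates.

```csharp
using System;

static class FixedWidth
{
    // Hypothetical helper: left-pads `value` to exactly `width` characters,
    // and refuses values that would change the length of the line.
    public static string Fit(string value, int width, char pad = ' ')
    {
        if (value.Length > width)
            throw new ArgumentException($"'{value}' is longer than {width} characters.", nameof(value));
        return value.PadLeft(width, pad);
    }
}
```

With the example above, `FixedWidth.Fit("9", 3, '_')` yields `__9`, which can overwrite `999` in place.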

Related

How to count lines

How do I count the lines in each log file and create a new log file from the count?
Below is my log file :
DDD.CGLOG
ID|AFP|DATE|FOLDER
1|DDD|20181204|B
2|DDD|20181104|B
3|DDD|20181004|B
FFF.CGLOG
ID|AFP|DATE|FOLDER
1|FFF|20181204|B
2|FFF|20181104|B
WWW.CGLOG
ID|AFP|DATE|FOLDER
1|WWW|20181204|B
I want to count the lines and create new log files as below:
DDD_QTY.Log
AFP|QTY
DDD|3
FFF_QTY.Log
AFP|QTY
FFF|2
WWW_QTY.Log
AFP|QTY
WWW|1
Below is what I have tried. I have managed to get the count from each log file inside the folder; now I just need to write the count into a new log file named after the existing log file.
string[] ori_Files = Directory.GetFiles(@"F:\Work\FLP Code\test", "*.CGLOG*", SearchOption.TopDirectoryOnly);
foreach (var file in ori_Files)
{
using (StreamReader file1 = new StreamReader(file))
{
string line;
int count = 0;
while ((line = file1.ReadLine()) != null)
{
Console.WriteLine(line);
count++;
}
Console.WriteLine(count);
}
}
Console.ReadLine();

Since you only want to count lines, you can keep it simple, assuming the file name determines the AFP value:
static long CountLinesInFile(string fileName,string outputfile)
{
var afp = Path.GetFileNameWithoutExtension(fileName);
var lineCount = File.ReadAllLines(fileName).Length;
File.WriteAllText(outputfile,$"AFP|QTY{Environment.NewLine}{afp}|{lineCount -1}");
return lineCount-1;
}
Note that this reports one line fewer than the file contains, because the header is not counted (as in your example). If the file name differs from the AFP term, you can use a regex to parse the AFP term from any line other than the header. An example regex for parsing the AFP term:
new Regex(@"^[0-9]+\|(?<AFP>[a-zA-Z]+)\|[0-9]+\|[a-zA-Z]+$")
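As a sketch of how that regex could be applied, the named group `AFP` can be read from the match. `AfpParser` and `ParseAfp` are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class AfpParser
{
    private static readonly Regex DataLine =
        new Regex(@"^[0-9]+\|(?<AFP>[a-zA-Z]+)\|[0-9]+\|[a-zA-Z]+$");

    // Returns the AFP term of a data line, or null when the line
    // does not match (the header line, for instance).
    public static string ParseAfp(string line)
    {
        Match m = DataLine.Match(line);
        return m.Success ? m.Groups["AFP"].Value : null;
    }
}
```

The header `ID|AFP|DATE|FOLDER` does not start with digits, so it falls through to null and can simply be skipped.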
Update
If your file is very large (say 15-20 GB, which is plausible for a log file), a better approach is to count newline bytes while streaming through it:
static long CountLinesInFile(string fileName,string outputFileName)
{
var afp = Path.GetFileNameWithoutExtension(fileName);
uint count = 0;
int query = (int)Convert.ToByte('\n');
using (var stream = File.OpenRead(fileName))
{
int current;
do
{
current = stream.ReadByte();
if (current == query)
{
count++;
continue;
}
} while (current!= -1);
}
using (System.IO.StreamWriter file = new System.IO.StreamWriter(outputFileName, true))
{
file.WriteLine($"AFP|QTY{Environment.NewLine}{afp}|{count}");
}
return count;
}
Update 2
To invoke the method for all files in a given folder, you can make use of DirectoryInfo.GetFiles, for example:
DirectoryInfo d = new DirectoryInfo(@"E:\TestFolder");
FileInfo[] Files = d.GetFiles("*.txt");
foreach(FileInfo file in Files )
{
CountLinesInFile(file.FullName,$"{file.FullName}.processed");
}
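The question asks for output files named like DDD_QTY.Log rather than the `.processed` suffix used in the loop above. Here is a small sketch of deriving that name. `QtyLog.PathFor` is a hypothetical helper, and writing the output next to the source file is an assumption.

```csharp
using System.IO;

static class QtyLog
{
    // Hypothetical helper: maps ".../DDD.CGLOG" to ".../DDD_QTY.Log"
    // in the same directory as the source file.
    public static string PathFor(string sourceFile)
    {
        string dir = Path.GetDirectoryName(sourceFile) ?? string.Empty;
        string afp = Path.GetFileNameWithoutExtension(sourceFile);
        return Path.Combine(dir, afp + "_QTY.Log");
    }
}
```

The loop body could then read `CountLinesInFile(file.FullName, QtyLog.PathFor(file.FullName));`.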

A simple two-liner:
static void CountLines(string path,string outfile)
{
var count = File.ReadLines(path).Count();
File.WriteAllText(outfile, $"AFP|QTY{Environment.NewLine}DDD|{count}");
}

Reading a Line from text file and return back

I am developing a C# application in which I need to read a line from a text file and then return to the start of that line.
Since the file may be very large, I can't copy it into an array.
I tried this code:
StreamReader str1 = new StreamReader(@"c:\file1.txt");
StreamReader str2 = new StreamReader(@"c:\file2.txt");
int a, b;
long pos1, pos2;
while (!str1.EndOfStream && !str2.EndOfStream)
{
pos1 = str1.BaseStream.Position;
pos2 = str2.BaseStream.Position;
a = Int32.Parse(str1.ReadLine());
b = Int32.Parse(str2.ReadLine());
if (a <= b)
{
Console.WriteLine("File1 ---> " + a.ToString());
str2.BaseStream.Seek(pos2, SeekOrigin.Begin);
}
else
{
Console.WriteLine("File2 ---> " + b.ToString());
str1.BaseStream.Seek(pos1, SeekOrigin.Begin);
}
}
When I debugged the program, I found that str1.BaseStream.Position and str2.BaseStream.Position are the same on every iteration, so nothing changes.
Is there a better way?
Thanks

You can use ReadLines for large files. It uses deferred execution and does not load the whole file into memory, so you can work with the lines as an IEnumerable<string>:
var lines = File.ReadLines("path");
If you are on an older .NET version, here is how to build ReadLines yourself:
public IEnumerable<string> ReadLine(string path)
{
using (var streamReader = new StreamReader(path))
{
string line;
while((line = streamReader.ReadLine()) != null)
{
yield return line;
}
}
}
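StreamReader reads ahead into an internal buffer, so BaseStream.Position does not track the line being read, which is why seeking back in the question has no useful effect. With lazily enumerated lines, the comparison loop can be sketched without any seeking: keep the current line of each file and advance only the side whose value was printed. `SortedMerge.Merge` is a hypothetical name, and like the original loop it stops as soon as either input runs out.

```csharp
using System;
using System.Collections.Generic;

static class SortedMerge
{
    // Walks two sequences of integer lines in step. Only the side whose
    // value was emitted is advanced, so no seeking back is needed.
    public static IEnumerable<string> Merge(IEnumerable<string> lines1, IEnumerable<string> lines2)
    {
        using (IEnumerator<string> e1 = lines1.GetEnumerator())
        using (IEnumerator<string> e2 = lines2.GetEnumerator())
        {
            bool has1 = e1.MoveNext();
            bool has2 = e2.MoveNext();
            while (has1 && has2)
            {
                int a = int.Parse(e1.Current);
                int b = int.Parse(e2.Current);
                if (a <= b)
                {
                    yield return "File1 ---> " + a;
                    has1 = e1.MoveNext();
                }
                else
                {
                    yield return "File2 ---> " + b;
                    has2 = e2.MoveNext();
                }
            }
        }
    }
}
```

It could be driven with `SortedMerge.Merge(File.ReadLines(@"c:\file1.txt"), File.ReadLines(@"c:\file2.txt"))`, printing each returned string.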

Another way, which I prefer to use.
Create a function like this:
string ReadLine( Stream sr,bool goToNext)
{
if (sr.Position >= sr.Length)
return string.Empty;
char readKey;
StringBuilder strb = new StringBuilder();
long position = sr.Position;
do
{
readKey = (char)sr.ReadByte();
strb.Append(readKey);
}
while (readKey != '\n' && sr.Position < sr.Length);
if(!goToNext)
sr.Position = position;
return strb.ToString();
}
Then create a stream from the file to pass as its argument:
Stream stream = File.Open("C:\\1.txt", FileMode.Open);

Split text file, fastest method

Morning,
I'm trying to split a large text file (15,000,000 rows) using StreamReader/StreamWriter. Is there a quicker way?
I tested it with 130,000 rows and it took 2min 40sec which implies 15,000,000 rows will take approx 5hrs which seems a bit excessive.
//Perform split.
public void SplitFiles(int[] newFiles, string filePath, int processorCount)
{
using (StreamReader Reader = new StreamReader(filePath))
{
for (int i = 0; i < newFiles.Length; i++)
{
string extension = System.IO.Path.GetExtension(filePath);
string temp = filePath.Substring(0, filePath.Length - extension.Length)
+ i.ToString();
string FilePath = temp + extension;
if (!File.Exists(FilePath))
{
for (int x = 0; x < newFiles[i]; x++)
{
DataWriter(Reader.ReadLine(), FilePath);
}
}
else
{
return;
}
}
}
}
public void DataWriter(string rowData, string filePath)
{
bool appendData = true;
using (StreamWriter sr = new StreamWriter(filePath, appendData))
{
{
sr.WriteLine(rowData);
}
}
}
Thanks for your help.

You haven't made it very clear, but I'm assuming that the value of each element of the newFiles array is the number of lines to copy from the original into that file. Note that currently you don't detect the situation where there's either extra data at the end of the input file, or it's shorter than expected. I suspect you want something like this:
public void SplitFiles(int[] newFiles, string inputFile)
{
string baseName = Path.GetFileNameWithoutExtension(inputFile);
string extension = Path.GetExtension(inputFile);
using (TextReader reader = File.OpenText(inputFile))
{
for (int i = 0; i < newFiles.Length; i++)
{
string outputFile = baseName + i + extension;
if (File.Exists(outputFile))
{
// Better than silently returning, I'd suggest...
throw new IOException("File already exists: " + outputFile);
}
int linesToCopy = newFiles[i];
using (TextWriter writer = File.CreateText(outputFile))
{
for (int j = 0; j < linesToCopy; j++)
{
string line = reader.ReadLine();
if (line == null)
{
return; // Premature end of input
}
writer.WriteLine(line);
}
}
}
}
}
Note that this still won't detect if there's any unconsumed input... it's not clear what you want to do in that situation.
One option for code clarity is to extract the middle of this into a separate method:
public void SplitFiles(int[] newFiles, string inputFile)
{
string baseName = Path.GetFileNameWithoutExtension(inputFile);
string extension = Path.GetExtension(inputFile);
using (TextReader reader = File.OpenText(inputFile))
{
for (int i = 0; i < newFiles.Length; i++)
{
string outputFile = baseName + i + extension;
// Could put this into the CopyLines method if you wanted
if (File.Exists(outputFile))
{
// Better than silently returning, I'd suggest...
throw new IOException("File already exists: " + outputFile);
}
CopyLines(reader, outputFile, newFiles[i]);
}
}
}
private static void CopyLines(TextReader reader, string outputFile, int count)
{
using (TextWriter writer = File.CreateText(outputFile))
{
for (int i = 0; i < count; i++)
{
string line = reader.ReadLine();
if (line == null)
{
return; // Premature end of input
}
writer.WriteLine(line);
}
}
}
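If leftover input should be treated as an error, one possible check is to peek at the reader after the last output file has been written. This is a sketch: `SplitChecks.HasUnconsumedInput` is a hypothetical helper, and it relies on `TextReader.Peek` returning -1 when no characters remain.

```csharp
using System.IO;

static class SplitChecks
{
    // True when the reader still has characters left to read.
    public static bool HasUnconsumedInput(TextReader reader)
    {
        return reader.Peek() != -1;
    }
}
```

Called at the end of SplitFiles, inside the using block, it lets the caller decide whether to throw, log, or write the remainder to an extra file.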

There are utilities for splitting files that may outperform your solution - e.g. search for "split file by line".
If they don't suit, there are solutions for loading all the source file into memory and then writing out the files but that probably isn't appropriate given the size of the source file.
In terms of improving your code, a minor improvement would be in how you generate the destination file path (and in clearing up the confusion between the source filePath and the destination files). You don't need to re-derive the source file extension on every iteration of the loop.
The second (and probably more significant) improvement, as highlighted by commenters, is in how you write out the destination files. Each newFiles entry specifies how many source lines go into the corresponding destination file, so I'd suggest that for each entry you read all the source lines belonging to the next destination file and then write them out in one go, rather than reopening the destination file for every line. You could gather the lines in a StringBuilder, a List, etc., or alternatively just write them directly to the destination file (opening it only once):
public void SplitFiles(int[] newFiles, string sourceFilePath, int processorCount)
{
string sourceDirectory = System.IO.Path.GetDirectoryName(sourceFilePath);
string sourceFileName = System.IO.Path.GetFileNameWithoutExtension(sourceFilePath);
string extension = System.IO.Path.GetExtension(sourceFilePath);
using (StreamReader Reader = new StreamReader(sourceFilePath))
{
for (int i = 0; i < newFiles.Length; i++)
{
string destinationFileNameWithExtension = string.Format("{0}{1}{2}", sourceFileName, i, extension);
string destinationFilePath = System.IO.Path.Combine(sourceDirectory, destinationFileNameWithExtension);
if (!File.Exists(destinationFilePath))
{
// Read all the lines relevant to this destination file
// and temporarily store them in memory
StringBuilder destinationText = new StringBuilder();
for (int x = 0; x < newFiles[i]; x++)
{
// AppendLine restores the line break that ReadLine strips.
destinationText.AppendLine(Reader.ReadLine());
}
DataWriter(destinationFilePath, destinationText.ToString());
}
else
{
return;
}
}
}
}
private static void DataWriter(string destinationFilePath, string content)
{
using (StreamWriter sr = new StreamWriter(destinationFilePath))
{
{
sr.Write(content);
}
}
}

I've recently had to do this for several hundred files under 2 GB each (up to 1.92 GB), and the fastest method I found (if you have the memory available) is StringBuilder. All the other methods I tried were painfully slow.
Please note that this is memory dependent. Adjust the "CurrentPosition == 130000" check accordingly.
string CurrentLine = String.Empty;
int CurrentPosition = 0;
int CurrentSplit = 0;
foreach (string file in Directory.GetFiles(@"C:\FilesToSplit"))
{
StringBuilder sb = new StringBuilder();
// Path.GetExtension already includes the leading dot, so no extra "." is needed.
string outputPrefix = @"C:\FilesToSplit\SplitFiles\" + Path.GetFileNameWithoutExtension(file) + "-";
string extension = Path.GetExtension(file);
using (StreamReader sr = new StreamReader(file))
{
while ((CurrentLine = sr.ReadLine()) != null)
{
// AppendLine restores the line break that ReadLine strips.
sb.AppendLine(CurrentLine);
if (CurrentPosition == 130000) // Or whatever you want to split by.
{
// Write the StringBuilder contents (130,001 lines, including this one).
File.WriteAllText(outputPrefix + CurrentSplit + extension, sb.ToString());
// Clear the StringBuilder buffer, so it doesn't get too big. You can adjust this based on your computer's available memory.
sb.Clear();
// Increment the CurrentSplit number.
CurrentSplit++;
// Reset the current line position.
CurrentPosition = 0;
}
else
{
CurrentPosition++;
}
}
}
// Write whatever is left over as the final, shorter split.
if (sb.Length > 0)
{
File.WriteAllText(outputPrefix + CurrentSplit + extension, sb.ToString());
}
// Reset the integers at the end of each file check, otherwise it can quickly go out of order.
CurrentPosition = 0;
CurrentSplit = 0;
}

C# Find if a word is in a document

I am looking for a way to check if the "foo" word is present in a text file using C#.
I could use a regular expression, but I'm not sure that will work if the word is split across two lines. I have the same issue with a StreamReader that enumerates over the lines.
Any comments?

What's wrong with a simple search?
If the file is not large, and memory is not a problem, simply read the entire file into a string (ReadToEnd() method), and use string Contains()
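To also catch a word that is broken across a line break, the line breaks can be removed before searching. This is a minimal sketch, assuming the break carries no hyphen and that the whole text fits in memory. `WordSearch.ContainsAcrossLines` is a hypothetical name.

```csharp
using System;

static class WordSearch
{
    // Removes line breaks so that "fo" at the end of one line and "o"
    // at the start of the next are searched as "foo".
    public static bool ContainsAcrossLines(string text, string word)
    {
        string joined = text.Replace("\r\n", "").Replace("\n", "");
        return joined.IndexOf(word, StringComparison.Ordinal) >= 0;
    }
}
```

It would be called on the result of `File.ReadAllText(path)`. Because joining lines also glues the last word of each line to the first word of the next, it can report matches that span two unrelated words.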

Here ya go. We look at each line as we read the file, keep track of the last word of the previous line combined with the first word of the current one, and check whether either that combination or the line itself matches your pattern.
string pattern = "foo";
string input = null;
string lastword = string.Empty;
string firstword = string.Empty;
bool result = false;
using (StreamReader SR = new StreamReader("File name and path"))
{
Regex RegPattern = new Regex(pattern);
while ((input = SR.ReadLine()) != null)
{
string trimmed = input.Trim();
// A line without a space is a single word: it is both the first and the last word.
int firstSpace = trimmed.IndexOf(" ");
firstword = firstSpace < 0 ? trimmed : trimmed.Substring(0, firstSpace);
// Join the last word of the previous line with the first word of this one.
if (lastword != string.Empty) { firstword = lastword + firstword; }
if (pattern == firstword || RegPattern.IsMatch(input)) { result = true; }
int lastSpace = trimmed.LastIndexOf(" ");
lastword = lastSpace < 0 ? trimmed : trimmed.Substring(lastSpace + 1);
}
}

Here is a quick example using LINQ:
static void Main(string[] args)
{
{ //LINQ version
bool hasFoo = "file.txt".AsLines()
.Any(l => l.Contains("foo"));
}
{ // No LINQ or Extension Methods needed
bool hasFoo = false;
foreach (var line in Tools.AsLines("file.txt"))
if (line.Contains("foo"))
{
hasFoo = true;
break;
}
}
}
}
public static class Tools
{
public static IEnumerable<string> AsLines(this string filename)
{
using (var reader = new StreamReader(filename))
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
while (line.EndsWith("-") && !reader.EndOfStream)
line = line.Substring(0, line.Length - 1)
+ reader.ReadLine();
yield return line;
}
}
}

What about if the line contains football? Or fool? If you are going to go down the regular expression route you need to look for word boundaries.
Regex r = new Regex(@"\bfoo\b");
Also ensure you are taking into consideration case insensitivity if you need to.
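Word boundaries and case insensitivity can be combined in a single call. A short sketch follows, in which `WholeWord.Contains` is a hypothetical helper name. `Regex.Escape` guards against search words that contain regex metacharacters.

```csharp
using System.Text.RegularExpressions;

static class WholeWord
{
    // True when `word` occurs in `text` as a whole word, ignoring case.
    public static bool Contains(string text, string word)
    {
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase);
    }
}
```

With this, "football" and "fool" do not match "foo", while "a Foo b" does.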

You don't need regular expressions in a case this simple. Simply loop over the lines and check whether each one contains foo.
using (StreamReader sr = File.OpenText("filename"))
{
string line = null;
while (!sr.EndOfStream) {
line = sr.ReadLine();
if (line.Contains("foo"))
{
// foo was found in the file
}
}
}

You could construct a regex which allows for newlines to be placed between every character.
private static bool IsSubstring(string input, string substring)
{
string[] letters = new string[substring.Length];
for (int i = 0; i < substring.Length; i += 1)
{
letters[i] = substring[i].ToString();
}
string regex = @"\b" + string.Join(@"(\r?\n?)", letters) + @"\b";
return Regex.IsMatch(input, regex, RegexOptions.ExplicitCapture);
}

Delete specific line from a text file?

I need to delete an exact line from a text file, but I cannot for the life of me work out how to go about doing this.
Any suggestions or examples would be greatly appreciated.
Related Questions
Efficient way to delete a line from a text file (C#)

If the line you want to delete is based on the content of the line:
string line = null;
string line_to_delete = "the line i want to delete";
using (StreamReader reader = new StreamReader("C:\\input")) {
using (StreamWriter writer = new StreamWriter("C:\\output")) {
while ((line = reader.ReadLine()) != null) {
if (String.Compare(line, line_to_delete) == 0)
continue;
writer.WriteLine(line);
}
}
}
Or if it is based on line number:
string line = null;
int line_number = 0;
int line_to_delete = 12;
using (StreamReader reader = new StreamReader("C:\\input")) {
using (StreamWriter writer = new StreamWriter("C:\\output")) {
while ((line = reader.ReadLine()) != null) {
line_number++;
if (line_number == line_to_delete)
continue;
writer.WriteLine(line);
}
}
}

The best way to do this is to open the file in text mode, read each line with ReadLine(), and then write it to a new file with WriteLine(), skipping the one line you want to delete.
There is no generic delete-a-line-from-file function, as far as I know.

One way to do it if the file is not very big is to load all the lines into an array:
string[] lines = File.ReadAllLines("filename.txt");
string[] newLines = RemoveUnnecessaryLine(lines);
File.WriteAllLines("filename.txt", newLines);
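RemoveUnnecessaryLine is not defined in the answer. One possible shape for it, sketched here under the assumption that the line is identified by a 1-based line number, uses the indexed overload of LINQ's Where. `LineFilter.RemoveLine` is a hypothetical name.

```csharp
using System;
using System.Linq;

static class LineFilter
{
    // Returns a copy of `lines` without the line at the given 1-based number.
    // An out-of-range number leaves the content unchanged.
    public static string[] RemoveLine(string[] lines, int lineNumber)
    {
        return lines.Where((line, index) => index != lineNumber - 1).ToArray();
    }
}
```

To filter by content instead, the predicate could compare `line` with the text to delete.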

Hope this simple and short code will help.
List<string> linesList = File.ReadAllLines("myFile.txt").ToList();
linesList.RemoveAt(0);
File.WriteAllLines("myFile.txt", linesList.ToArray());
OR use this
public void DeleteLinesFromFile(string strLineToDelete)
{
string strFilePath = "Provide the path of the text file";
string strSearchText = strLineToDelete;
string strOldText;
string n = "";
StreamReader sr = File.OpenText(strFilePath);
while ((strOldText = sr.ReadLine()) != null)
{
if (!strOldText.Contains(strSearchText))
{
n += strOldText + Environment.NewLine;
}
}
sr.Close();
File.WriteAllText(strFilePath, n);
}

You can actually use C# generics for this to make it real easy:
var file = new List<string>(System.IO.File.ReadAllLines("C:\\path"));
file.RemoveAt(12);
File.WriteAllLines("C:\\path", file.ToArray());

This can be done in three steps:
// 1. Read the content of the file
string[] readText = File.ReadAllLines(path);
// 2. Empty the file
File.WriteAllText(path, String.Empty);
// 3. Fill up again, but without the deleted line
using (StreamWriter writer = new StreamWriter(path))
{
foreach (string s in readText)
{
if (!s.Equals(lineToBeRemoved))
{
writer.WriteLine(s);
}
}
}

Read and remember each line
Identify the one you want to get rid of
Forget that one
Write the rest back over the top of the file

I cared about the file's original end-of-line characters ("\n" or "\r\n") and wanted to preserve them in the output file (rather than overwrite them with whatever the current environment's characters are, as the other answers appear to do). So I wrote my own method to read a line without removing the end-of-line characters, then used it in my DeleteLines method (I wanted the option to delete multiple lines, hence the use of a collection of line numbers to delete).
DeleteLines was implemented as a FileInfo extension and ReadLineKeepNewLineChars a StreamReader extension (but obviously you don't have to keep it that way).
public static class FileInfoExtensions
{
public static FileInfo DeleteLines(this FileInfo source, ICollection<int> lineNumbers, string targetFilePath)
{
var lineCount = 1;
using (var streamReader = new StreamReader(source.FullName))
{
using (var streamWriter = new StreamWriter(targetFilePath))
{
string line;
while ((line = streamReader.ReadLineKeepNewLineChars()) != null)
{
if (!lineNumbers.Contains(lineCount))
{
streamWriter.Write(line);
}
lineCount++;
}
}
}
return new FileInfo(targetFilePath);
}
}
public static class StreamReaderExtensions
{
private const char EndOfFile = '\uffff';
/// <summary>
/// Reads a line, similar to ReadLine method, but keeps any
/// new line characters (e.g. "\r\n" or "\n").
/// </summary>
public static string ReadLineKeepNewLineChars(this StreamReader source)
{
if (source == null)
throw new ArgumentNullException(nameof(source));
char ch = (char)source.Read();
if (ch == EndOfFile)
return null;
var sb = new StringBuilder();
while (ch != EndOfFile)
{
sb.Append(ch);
if (ch == '\n')
break;
ch = (char)source.Read();
}
return sb.ToString();
}
}

Are you on a Unix operating system?
You can do this with the "sed" stream editor. Read the man page for "sed"

What?
Open the file, seek to the position of the line, then use a stream to erase the line by writing null.
Got it? It's simple and fast, and it streams the file instead of using a memory-hungry array.
This works in VB. For example, search for the line culture=id, where culture is the name and id is the value, and change it to culture=en:
Fileopen(1, "text.ini")
dim line as string
dim currentpos as long
while true
line = lineinput(1)
dim namevalue() as string = split(line, "=")
if namevalue(0) = "line name value that i want to edit" then
currentpos = seek(1)
fileclose()
dim fs as filestream("test.ini", filemode.open)
dim sw as streamwriter(fs)
fs.seek(currentpos, seekorigin.begin)
sw.write(null)
sw.write(namevalue + "=" + newvalue)
sw.close()
fs.close()
exit while
end if
msgbox("org ternate jua bisa, no line found")
end while
that's all..use #d
