For a project I am working on, I need to delete the first X lines of a plaintext file. I say X because I will need to run this routine multiple times, and each time the number of lines to delete will be different, but it will always start from the beginning, delete the first X lines, and then output the result to the same file.
I am thinking about doing something like this, which I pieced together from other tutorials and examples that I read:
String line = null;
String tempFile = Path.GetTempFileName();
String originalFile = openFileDialog.FileName;
int line_number = 0;
int lines_to_delete = 25;

using (StreamReader reader = new StreamReader(originalFile)) {
    using (StreamWriter writer = new StreamWriter(tempFile)) {
        while ((line = reader.ReadLine()) != null) {
            line_number++;
            if (line_number <= lines_to_delete)
                continue;
            writer.WriteLine(line);
        }
    }
}
if (File.Exists(tempFile)) {
    File.Delete(originalFile);
    File.Move(tempFile, originalFile);
}
But I don't know if this would work because of small stuff like line numbers starting at line 0 or whatnot... also, I don't know if it is good code in terms of efficiency and form.
Thanks a bunch.
I like it short...
File.WriteAllLines(
    fileName,
    File.ReadAllLines(fileName).Skip(numberLinesToSkip).ToArray());
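Note that Skip and ToArray need a using System.Linq; directive, and that this reads the entire file into memory, so it is best suited to small files.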
It's OK, and doesn't look like it would have the off-by-one problems you fear. However, a leaner approach would be afforded by two separate loops -- one to just count the first X lines from the input file (and do nothing else), and a separate one to just copy the other lines from input to output. I.e., instead of your single while loop, have...:
while ((line = reader.ReadLine()) != null) {
    line_number++;
    // break on >= so exactly lines_to_delete lines are consumed
    if (line_number >= lines_to_delete)
        break;
}
while ((line = reader.ReadLine()) != null) {
    writer.WriteLine(line);
}
I like your approach, I see nothing wrong with it. If you know for certain they are small files then the other suggestions may be a little less code if that matters to you.
A slightly less verbose version of what you already have:
using (StreamReader reader = new StreamReader(originalFile))
using (StreamWriter writer = new StreamWriter(tempFile))
{
    while (lines_to_delete-- > 0)
        reader.ReadLine();

    while ((line = reader.ReadLine()) != null)
        writer.WriteLine(line);
}
You could read the file into an array of lines, ignore the first few elements, and write the rest back.
The downside to this approach is that it will consume the size of the file in memory. Your approach (although pretty unreadable, no offence) doesn't have this memory problem. Although if the files are not too large, there shouldn't be a reason to worry about memory usage.
Example:
string[] lines = System.IO.File.ReadAllLines("YourFile.txt").Skip(10).ToArray();
System.IO.File.WriteAllLines("OutFile.txt", lines);
I would like to insert text from one text file to another.
So for example I have a text file at C:\Users\Public\Test1.txt
first
second
third
forth
And I have a second text file at C:\Users\Public\Test2.txt
1
2
3
4
I want to insert Test2.txt into Test1.txt
The end result should be:
first
second
1
2
3
4
third
forth
It should be inserted at the third line.
So far I have this:
string strTextFileName = @"C:\Users\Public\test1.txt";
int iInsertAtLineNumber = 2;
string strTextToInsert = @"C:\Users\Public\test2.txt";
ArrayList lines = new ArrayList();
StreamReader rdr = new StreamReader(strTextFileName);
string line;
while ((line = rdr.ReadLine()) != null)
    lines.Add(line);
rdr.Close();

if (lines.Count > iInsertAtLineNumber)
    lines.Insert(iInsertAtLineNumber, strTextToInsert);
else
    lines.Add(strTextToInsert);

StreamWriter wrtr = new StreamWriter(strTextFileName);
foreach (string strNewLine in lines)
    wrtr.WriteLine(strNewLine);
wrtr.Close();
However, I get this when I run it:
first
second
C:\Users\Public\test2.txt
third
forth
Thanks in advance!
Instead of using StreamReaders/Writers, you can use methods from the File helper class.
const string textFileName = @"C:\Users\Public\test1.txt";
const string textToInsertFileName = @"C:\Users\Public\test2.txt";
const int insertAtLineNumber = 2;

List<string> fileContent = File.ReadAllLines(textFileName).ToList();
fileContent.InsertRange(insertAtLineNumber, File.ReadAllLines(textToInsertFileName));
File.WriteAllLines(textFileName, fileContent);
A List<string> is way more convenient than an ArrayList. I also renamed a couple of your variables (most notably textToInsertFileName), removed the type prefixes cluttering your declarations (any modern IDE will tell you the datatype if you hover for half a second), and declared your constants with const.
Your original problem was that you never read from strTextToInsert; it looks like you thought it already held the text to insert, when it actually holds the filename.
Without changing your structure or types around too much, you could create a method to read the lines:
public ArrayList GetFileLines(string fileName)
{
    var lines = new ArrayList();
    using (var rdr = new StreamReader(fileName))
    {
        string line;
        while ((line = rdr.ReadLine()) != null)
            lines.Add(line);
    }
    return lines;
}
In the initial question you were not reading the second file. In the following example it is a little easier to see when each file is read, and that both files are read:
string strTextFileName = @"C:\Users\Public\test1.txt";
int iInsertAtLineNumber = 2;
string strTextToInsert = @"C:\Users\Public\test2.txt";

ArrayList lines = new ArrayList();
lines.AddRange(GetFileLines(strTextFileName));
lines.InsertRange(iInsertAtLineNumber, GetFileLines(strTextToInsert));

using (var wrtr = new StreamWriter(strTextFileName))
{
    foreach (string strNewLine in lines)
        wrtr.WriteLine(strNewLine);
}
NOTE: if you wrap a reader or writer in a using statement, it will be closed automatically.
I haven't tested this and it could be done better, but hopefully it points you in the right direction. Note that this solution completely rewrites the first file.
I have a problem with StreamReader: I want to read just one line from a text file.
I want a specific line, like the seventh line, and I don't know how to do it.
Is there a function or something like that? Like file.ReadLine(7)?
The simplest approach would probably be to use LINQ combined with File.ReadLines:
string line = File.ReadLines("foo.txt").ElementAt(6); // 0-based
You could use File.ReadAllLines instead, but that would read the whole file even if you only want an early one. If you need various different lines of course, it means you can read them in one go. You could write a method to read multiple specific lines efficiently (i.e. in one pass, but no more than one line at a time) reasonably easily, but it would be overkill if you only want one line.
Note that this will throw an exception if there aren't enough lines - you could use ElementAtOrDefault if you want to handle that without any exceptions.
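For what it's worth, a minimal sketch of such a one-pass reader (the class and method names are my own, not from any library):

using System.Collections.Generic;
using System.IO;

static class LinePicker
{
    // Reads several specific lines (0-based indices) in a single pass,
    // keeping only the requested lines in memory.
    public static IDictionary<int, string> ReadLinesAt(string file, params int[] indices)
    {
        var wanted = new HashSet<int>(indices);
        var found = new Dictionary<int, string>();
        int index = 0;
        foreach (string line in File.ReadLines(file))
        {
            if (wanted.Contains(index))
            {
                found[index] = line;
                if (found.Count == wanted.Count)
                    break; // stop as soon as every requested line has been seen
            }
            index++;
        }
        return found;
    }
}

For example, LinePicker.ReadLinesAt("foo.txt", 6, 41) reads at most 42 lines and returns just those two.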
If you want to read a line by number, it's better to use
string line = File.ReadLines(fileName).Skip(N).FirstOrDefault();
Thus you avoid reading all lines from the file; lines are read only until you reach the one you need. If you need several lines, it's better to read all lines into an array and then pick your lines from that array:
string[] lines = File.ReadAllLines(fileName);
if (lines.Length > N)
    line = lines[N];
If you want a specific line using StreamReader: suppose you have the data Line1, Line2, Line3, Line4 in a text file.
Every time you call the "ReadLine" method, the reader advances by one line.
That means you can write your own function and pass the line number as a parameter.
You can do it like this:
string l1, l2, l3;
StreamReader sr = new StreamReader(sourcePath);
l1 = sr.ReadLine(); // Line 1
l2 = sr.ReadLine(); // Line 2
l3 = sr.ReadLine(); // Line 3
public string StreamReadLine(string sourcepath, int lineNum)
{
    string strLine = "N/A";
    try
    {
        using (StreamReader sr = new StreamReader(sourcepath))
        {
            // ReadLine is called lineNum + 1 times, so strLine ends up
            // holding the line at (0-based) index lineNum.
            for (int i = 0; i <= lineNum; i++)
            {
                strLine = sr.ReadLine();
                if (strLine == null)
                    return "N/A"; // the file has fewer lines than requested
            }
        }
    }
    catch (Exception ex)
    {
        strLine = ex.ToString();
    }
    return strLine;
}
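Usage would then look something like this (the index is 0-based, so 6 fetches the seventh line):

string seventh = StreamReadLine(sourcePath, 6);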
I have a 60GB CSV file I need to make some modifications to. The customer wants some changes to the file's data, but I don't want to regenerate the data in that file because it took 4 days to produce.
How can I read the file, line by line (not loading it all into memory!), and make edits to those lines as I go, replacing certain values etc.?
The process would be something like this:
1. Open a StreamWriter to a temporary file.
2. Open a StreamReader to the target file.
3. For each line:
   1. Split the text into columns based on a delimiter.
   2. Check the columns for the values you want to replace, and replace them.
   3. Join the column values back together using your delimiter.
   4. Write the line to the temporary file.
4. When you are finished, delete the target file, and move the temporary file to the target file path.
Note regarding Steps 2 and 3.1: If you are confident in the structure of your file and it is simple enough, you can do all this out of the box as described (I'll include a sample in a moment). However, there are factors in a CSV file that may need attention (such as recognizing when a delimiter is being used literally in a column value). You can drudge through this yourself, or try an existing solution.
Basic example just using StreamReader and StreamWriter:
var sourcePath = @"C:\data.csv";
var delimiter = ",";
var firstLineContainsHeaders = true;
var tempPath = Path.GetTempFileName();
var lineNumber = 0;

// Requires using System.Text.RegularExpressions;
// Split on the delimiter, but only when it appears outside double quotes.
var splitExpression = new Regex(@"(" + delimiter + @")(?=(?:[^""]|""[^""]*"")*$)");

using (var writer = new StreamWriter(tempPath))
using (var reader = new StreamReader(sourcePath))
{
    string line = null;
    string[] headers = null;

    if (firstLineContainsHeaders)
    {
        line = reader.ReadLine();
        lineNumber++;

        if (string.IsNullOrEmpty(line)) return; // file is empty

        headers = splitExpression.Split(line).Where(s => s != delimiter).ToArray();
        writer.WriteLine(line); // write the original header to the temp file
    }

    while ((line = reader.ReadLine()) != null)
    {
        lineNumber++;

        var columns = splitExpression.Split(line).Where(s => s != delimiter).ToArray();

        // if there are no headers, do a simple sanity check to make sure
        // you always have the same number of columns in a line
        if (headers == null) headers = new string[columns.Length];
        if (columns.Length != headers.Length)
            throw new InvalidOperationException(
                string.Format("Line {0} is missing one or more columns.", lineNumber));

        // TODO: search and replace in columns
        // example: replace 'v' in the first column with '\/':
        // if (columns[0].Contains("v")) columns[0] = columns[0].Replace("v", @"\/");

        writer.WriteLine(string.Join(delimiter, columns));
    }
}

File.Delete(sourcePath);
File.Move(tempPath, sourcePath);
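As a side note (my addition, not part of the original answer), File.Replace can do the final swap in one call and keep a backup of the original, provided both files are on the same volume:

// Alternative to Delete + Move: swap the files and keep a backup copy.
File.Replace(tempPath, sourcePath, sourcePath + ".bak");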
Memory-mapped files are a feature introduced in .NET Framework 4 that can be used to edit large files.
Read here: http://msdn.microsoft.com/en-us/library/dd997372.aspx
or search for "memory-mapped files".
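A minimal sketch of reading a window of a large file this way (my own illustration, not from the linked article; it assumes the file is at least 4 KB):

using System;
using System.IO.MemoryMappedFiles;
using System.Text;

class MmfPeek
{
    static void Main()
    {
        // Map the file and read only its first 4 KB, without loading the rest.
        using (var mmf = MemoryMappedFile.CreateFromFile(@"C:\data.csv"))
        using (var accessor = mmf.CreateViewAccessor(0, 4096))
        {
            var buffer = new byte[4096];
            accessor.ReadArray(0, buffer, 0, buffer.Length);
            Console.WriteLine(Encoding.UTF8.GetString(buffer));
        }
    }
}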
Just read the file, line by line, with StreamReader, and then use REGEX! The most amazing tool in the world.
using (var sr = new StreamReader(new FileStream(@"C:\temp\file.csv", FileMode.Open)))
{
    string line;
    // read until ReadLine returns null, so the last line is processed too
    while ((line = sr.ReadLine()) != null)
    {
        // do stuff with line
    }
}
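The snippet above never actually shows the regex part; as a rough illustration of the "do stuff" step (my example pattern, requires using System.Text.RegularExpressions;):

// e.g. rewrite dates like 31/12/2020 as 2020-12-31 in each line
line = Regex.Replace(line, @"(\d{2})/(\d{2})/(\d{4})", "$3-$2-$1");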
I am loading a file that has some encrypted text in it, and it uses a custom character table. How can I load the table from an external file, or put the character table in the code?
Thank you.
Start by going over the file and counting the lines so you can allocate an array. You could just use a list here, but arrays have much better performance, and you have a significant number of items that you'll have to loop over a lot (once for each encoded character in the file), so I think you should use an array instead.
int lines = 0;
try
{
    using (StreamReader sr = new StreamReader("Encoding.txt"))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            lines++;
        }
    }
}
catch (Exception e)
{
    // Let the user know what went wrong.
    Console.WriteLine("The file could not be read:");
    Console.WriteLine(e.Message);
}
Now we're going to allocate an array of tuples:
Tuple<string, string>[] tuples = new Tuple<string, string>[lines];
After that, we'll loop over the file again, adding each key-value pair as a tuple.
try
{
    using (StreamReader sr = new StreamReader("Encoding.txt"))
    {
        string line;
        for (int i = 0; i < lines; i++)
        {
            line = sr.ReadLine();

            // ignore comments and blank lines
            if (line.Length == 0 || line.StartsWith("#"))
                continue;

            string[] tokens = line.Split('='); // split into key and value

            // Trim returns a new string, so assign the results back
            string key = tokens[0].Trim();
            string value = tokens[1].Trim();

            // Tuple properties are read-only, so build the tuple in one go
            tuples[i] = Tuple.Create(key, value);
        }
    }
}
catch (Exception e)
{
    // Let the user know what went wrong.
    Console.WriteLine("The file could not be read:");
    Console.WriteLine(e.Message);
}
I've given you a lot of code, although it may take a little tinkering to make it work. I didn't bother to write the second loop in a compiler, and I'm too lazy to look up things like System.String.Trim to make sure I'm using them correctly. I'll leave those things to you. This has the core logic. If you want to use a list instead, move the logic inside the for loop into the while loop where I count the lines.
To decode the file you're reading, you'll have to loop over this array and compare the keys or values until you have a match.
One other thing - your array of tuples is going to have some empty indexes (the array is of length lines, while the file actually contains only lines - (comments + blank lines) usable entries). You'll need some check to make sure you're not accessing these indexes when you try to match characters. Alternatively, you could enhance the file reading so it doesn't count blank lines or comments, or remove those lines from the file you read from. The best solution would be to enhance the file reading, but that's also the most work.
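As a rough sketch of that decode lookup (my own illustration; it assumes tuples was filled as above, with null entries for skipped lines):

// Hypothetical decode helper: scan the pairs for a matching key.
static string Decode(Tuple<string, string>[] tuples, string encoded)
{
    foreach (var pair in tuples)
    {
        if (pair != null && pair.Item1 == encoded)
            return pair.Item2; // found the mapping
    }
    return encoded; // no mapping found; pass the character through unchanged
}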
I am trying to read some text files, where each line needs to be processed. At the moment I am just using a StreamReader, and then reading each line individually.
I am wondering whether there is a more efficient way (in terms of LoC and readability) to do this using LINQ without compromising operational efficiency. The examples I have seen involve loading the whole file into memory, and then processing it. In this case however I don't believe that would be very efficient. In the first example the files can get up to about 50k, and in the second example, not all lines of the file need to be read (sizes are typically < 10k).
You could argue that nowadays it doesn't really matter for these small files; however, I believe that sort of approach leads to inefficient code.
First example:
// Open file
using (var file = System.IO.File.OpenText(_LstFilename))
{
    // Read file
    while (!file.EndOfStream)
    {
        String line = file.ReadLine();

        // Ignore empty lines
        if (line.Length > 0)
        {
            // Create addon
            T addon = new T();
            addon.Load(line, _BaseDir);

            // Add to collection
            collection.Add(addon);
        }
    }
}
Second example:
// Open file
using (var file = System.IO.File.OpenText(datFile))
{
    // Compile regex
    Regex nameRegex = new Regex("IDENTIFY (.*)");

    while (!file.EndOfStream)
    {
        String line = file.ReadLine();

        // Check name
        Match m = nameRegex.Match(line);
        if (m.Success)
        {
            _Name = m.Groups[1].Value;

            // Remove me when other values are read
            break;
        }
    }
}
You can write a LINQ-based line reader pretty easily using an iterator block:
static IEnumerable<SomeType> ReadFrom(string file) {
    string line;
    using (var reader = File.OpenText(file)) {
        while ((line = reader.ReadLine()) != null) {
            SomeType newRecord = /* parse line */
            yield return newRecord;
        }
    }
}
or to make Jon happy:
static IEnumerable<string> ReadFrom(string file) {
    string line;
    using (var reader = File.OpenText(file)) {
        while ((line = reader.ReadLine()) != null) {
            yield return line;
        }
    }
}
...
var typedSequence = from line in ReadFrom(path)
                    let record = ParseLine(line)
                    where record.Active // for example
                    select record.Key;
then you have ReadFrom(...) as a lazily evaluated sequence without buffering, perfect for Where etc.
Note that if you use OrderBy or the standard GroupBy, it will have to buffer the data in memory; if you need grouping and aggregation, "PushLINQ" has some fancy code to allow you to perform aggregations on the data but discard it (no buffering). Jon's explanation is here.
It's simpler to read a line and check whether or not it's null than to check for EndOfStream all the time.
However, I also have a LineReader class in MiscUtil which makes all of this a lot simpler - basically it exposes a file (or a Func<TextReader>) as an IEnumerable<string>, which lets you do LINQ stuff over it. So you can do things like:
var query = from file in Directory.GetFiles(".", "*.log")
            from line in new LineReader(file)
            where line.Length > 0
            select new AddOn(line); // or whatever
The heart of LineReader is this implementation of IEnumerable<string>.GetEnumerator:
public IEnumerator<string> GetEnumerator()
{
    using (TextReader reader = dataSource())
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}
Almost all the rest of the source is just giving flexible ways of setting up dataSource (which is a Func<TextReader>).
Since .NET 4.0, the File.ReadLines() method is available.
int count = File.ReadLines(filepath).Count(line => line.StartsWith(">"));
NOTE: You need to watch out for the IEnumerable<T> solution, as it will result in the file being open for the duration of processing.
For example, with Marc Gravell's response:
foreach (var record in ReadFrom("myfile.csv")) {
    DoLongProcessOn(record);
}
the file will remain open for the whole of the processing.
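If holding the file open is a concern, one option (my suggestion, not from the original note) is to materialize the sequence first, trading memory for an early close:

// Buffer the records so the file handle is released before the slow work starts.
// Requires using System.Linq; ReadFrom is the method from the earlier answer.
var records = ReadFrom("myfile.csv").ToList();
foreach (var record in records) {
    DoLongProcessOn(record);
}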
Thanks all for your answers! I decided to go with a mixture, mainly focusing on Marc's, though, as I will only need to read lines from a file. I guess you could argue separation is needed everywhere, but heh, life is too short!
Regarding keeping the file open: that isn't going to be an issue in this case, as the code is part of a desktop application.
Lastly, I noticed you all used lowercase string. I know in Java there is a difference between capitalised and non-capitalised String, but I thought in C# lowercase string was just an alias for capitalised String?
public void Load(AddonCollection<T> collection)
{
    // read from file
    var query =
        from line in LineReader(_LstFilename)
        where line.Length > 0
        select CreateAddon(line);

    // add results to collection
    collection.AddRange(query);
}

protected T CreateAddon(String line)
{
    // create addon
    T addon = new T();
    addon.Load(line, _BaseDir);
    return addon;
}

protected static IEnumerable<String> LineReader(String fileName)
{
    String line;
    using (var file = System.IO.File.OpenText(fileName))
    {
        // read each line, ensuring not null (EOF)
        while ((line = file.ReadLine()) != null)
        {
            // return the trimmed line
            yield return line.Trim();
        }
    }
}