Hey guys, I've been having some trouble skipping unnecessary lines from the text file that I am reading into my program. The data has the following format:
Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
I want to read line 1, trim lines 3, 4 and the white space and then read line 5, trim lines 7 and 8. I have read something similar to this here on this website, however, that particular case was skipping the first 5 lines of the text file. This is what I have tried so far:
string TextLine;
System.IO.StreamReader file =
new System.IO.StreamReader("C://log.txt");
while ((TextLine = file.ReadLine()) != null)
{
foreach (var i in Enumerable.Range(2, 3)) file.ReadLine();
Console.WriteLine(TextLine);
}
As you guys can see, for the range, I have specified the start as line 2 and then skip 3 lines, which includes the white space. However, that first parameter of Enumerable.Range does not seem to matter. I can put a 0 and it will yield the same results. As I have it right now, the program trims from the first line, until the number specified in the second parameter of the .Range function. Does anyone know of a way to get around this problem? Thanks
Why not read all the lines into an array and then just index the ones you want?
var lines = File.ReadAllLines("C://log.txt");
Console.WriteLine(lines[0]); // line 1
Console.WriteLine(lines[4]); // line 5 (the array is zero-based)
If it's a really big file with consistent repeating sections you can create a read method and do:
while (!file.EndOfStream)
{
yield return file.ReadLine();
yield return file.ReadLine();
file.ReadLine();
file.ReadLine();
file.ReadLine();
}
or similar for whatever block format you need.
Here's an expanded version of the solution provided here at the OP's request.
public static IEnumerable<string> getMeaningfulLines(string filename)
{
System.IO.StreamReader file =
new System.IO.StreamReader(filename);
while (!file.EndOfStream)
{
//keep two lines that we care about
yield return file.ReadLine();
yield return file.ReadLine();
//discard three lines that we don't need
file.ReadLine();
file.ReadLine();
file.ReadLine();
}
}
public static void Main()
{
foreach(string line in getMeaningfulLines(@"C:/log.txt"))
{
//or do whatever else you want with the "meaningful" lines.
Console.WriteLine(line);
}
}
Here is another version that's going to be a little bit less fragile if the input file ends abruptly.
//Just get all lines from a file as an IEnumerable; handy helper method in general.
public static IEnumerable<string> GetAllLines(string filename)
{
System.IO.StreamReader file =
new System.IO.StreamReader(filename);
while (!file.EndOfStream)
{
yield return file.ReadLine();
}
}
public static IEnumerable<string> getMeaningfulLines2(string filename)
{
int counter = 0;
//This will yield when counter is 0 or 1, and not when it's 2, 3, or 4.
//The result is yield two, skip 3, repeat.
foreach(string line in GetAllLines(filename))
{
if(counter < 2)
yield return line;
//add one to the counter and have it wrap,
//so it is always between 0 and 4 (inclusive).
counter = (counter + 1) % 5;
}
}
Of course the start of the range doesn't matter: what you're doing is skipping 3 lines inside every while loop iteration, and the values that Range produces (2, 3, 4) have no effect on the file reader's position. I would suggest you just keep a counter telling you which line you are on and skip the line if its number is one of those you'd like to skip, e.g.
int currentLine = 1;
while ((TextLine = file.ReadLine()) != null)
{
if ( LineEnabled( currentLine )){
Console.WriteLine(TextLine);
}
currentLine++;
}
private bool LineEnabled( int lineNumber )
{
if ( lineNumber == 2 || lineNumber == 3 || lineNumber == 4 ){ return false; }
return true;
}
I don't think you want to read lines in two places (once in the loop condition and then again inside the loop body). I would take this approach:
while ((TextLine = file.ReadLine()) != null)
{
if (string.IsNullOrWhiteSpace(TextLine)) // Or any other conditions
continue;
Console.WriteLine(TextLine);
}
The documentation for Enumerable.Range states:
public static IEnumerable<int> Range(
int start,
int count
)
Parameters
start
Type: System.Int32
The value of the first integer in the sequence.
count
Type: System.Int32
The number of sequential integers to generate.
So changing the first parameter won't change the logic of your program.
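To make that concrete, here is a small sketch (the class and method names are made up for illustration) showing that the loop body runs count times no matter what start is. Only the values handed to the loop variable change, and the original code never uses them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, only for illustrating Enumerable.Range(start, count).
public static class RangeDemo
{
    // Returns the integers produced by Enumerable.Range(start, count).
    public static List<int> Values(int start, int count)
    {
        return Enumerable.Range(start, count).ToList();
    }

    // Counts how many times a foreach over Range(start, count) runs its body.
    public static int Iterations(int start, int count)
    {
        int iterations = 0;
        foreach (var unused in Enumerable.Range(start, count))
        {
            iterations++;
        }
        return iterations;
    }
}
```

So Range(2, 3) produces 2, 3, 4 and Range(0, 3) produces 0, 1, 2, but both run the body three times, which is why three lines get skipped either way.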
However, this is an odd way to do this. A for loop would be much simpler, easier to understand and far more efficient.
Also, your code currently reads the first line, skips three lines, then outputs the first line, and then repeats.
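For what it's worth, a plain for loop version might look like the sketch below. The names BlockReader and KeepThenSkip are my own, and I'm assuming the file is made of repeating blocks of "keep some lines, skip some lines". It takes a TextReader so it works with a StreamReader as well as a StringReader.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: keep `keep` lines, discard `skip` lines, repeat.
public static class BlockReader
{
    public static IEnumerable<string> KeepThenSkip(TextReader reader, int keep, int skip)
    {
        // Guard against an endless loop that never consumes any input.
        if (keep < 0 || skip < 0 || keep + skip == 0)
        {
            throw new ArgumentException("keep and skip must be non-negative and not both zero");
        }

        while (true)
        {
            // Hand back the lines we care about.
            for (int i = 0; i < keep; i++)
            {
                string line = reader.ReadLine();
                if (line == null) yield break; // end of input
                yield return line;
            }

            // Read and throw away the lines we don't need.
            for (int i = 0; i < skip; i++)
            {
                if (reader.ReadLine() == null) yield break; // end of input
            }
        }
    }
}
```

With the eight-line sample from the question, KeepThenSkip(reader, 1, 3) would give back "Line 1" and "Line 5".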
Have you tried something like this?
using (var file = new StreamReader("C://log.txt"))
{
var lineCt = 0;
string line;
while ((line = file.ReadLine()) != null)
{
lineCt++;
//logic for lines to keep
if (lineCt == 1 || lineCt == 5)
{
Console.WriteLine(line);
}
}
}
Although unless this is an extremely fixed format input file I'd find a different way to figure out what to do with each line rather than a fixed line number.
Related
I am reading from a text file which basically looks like:
>Name
>12345
>Name2
>32458
>Name3
>82745
and so on. I want it so once the program detects Name it prints both Name and the line after it: 12345 to the console.
Here is my code so far:
if (args[0] == "prog1")
{
List<string> lines = File.ReadAllLines(filename).ToList();
foreach (var line in lines)
{
if (line.Contains("Name"))
{
Console.WriteLine(line);
}
}
}
So far this only prints "Name" to the console and I am unsure of how to get it to print the line after it as well.
You can't access the next line while you're inside a foreach loop. Behind the scenes a foreach loop sets up an enumerator, but you can't access it (see the third solution for a way to make your own enumerator that you can control directly). You can, however, do one of the following:
Switch to using a for loop, and print the n+1 line
if (args[0] == "prog1")
{
string[] lines = File.ReadAllLines(filename);
for(int i = 0; i< lines.Length; i++)
{
var line = lines[i];
if (line.Contains("Name"))
{
Console.WriteLine(line);
Console.WriteLine(lines[++i]); // ++i means "increment i, then use it" so it is incremented first then used to access the line
}
}
}
Keep using the foreach and toggle a boolean to true, that will cause the next line to print even though it doesn't contain "Name", then toggle it off when you do the print
if (args[0] == "prog1")
{
List<string> lines = File.ReadAllLines(filename).ToList();
bool printLine = false;
foreach (var line in lines)
{
if (line.Contains("Name"))
{
printLine = true;
Console.WriteLine(line);
}
else if(printLine){
Console.WriteLine(line);
printLine = false;
}
}
}
Set up your own enumerator so you can move it onto the next thing yourself
string[] lines = File.ReadAllLines(filename);
var enumerator = ((IEnumerable<string>)lines).GetEnumerator(); //the cast gives a typed enumerator, so Current is a string; it starts "before" the first line of the file
while (enumerator.MoveNext()){ //moveNext returns true until the enumerator reaches the end
if(enumerator.Current.Contains("Name")){
Console.WriteLine(enumerator.Current); //print current line
if(enumerator.MoveNext()) //did we move to next line?
Console.WriteLine(enumerator.Current); //print next line
}
}
For what it's worth, I'd use the classic for loop, as I find it the easiest to read, understand and maintain.
Other notes:
You should add some error checking that prevents the ++i version causing a crash if the last line of the file contains "Name" - currently the code will just increment past the end of the array and then try to access it, causing a crash.
Handling this could take the form of something as simple as running to i < Length - 1 so it stops on the second to last line
Similarly the enumerator version would need protecting against this if the last line is a match for "Name" - I handled this by seeing if MoveNext() returned false
Strictly speaking you don't need to use a List<string> - File.ReadAllLines returns an array, and turning it to a list is a relatively expensive operation to perform if you don't need to. If all you will do is iterate it or change the content of individual lines (but not add or remove lines), leave it as an array of string. Using a List would make your life easier if you plan to manipulate it by inserting/removing lines though
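As a rough sketch of the bounds check described in these notes (the class and method names are my own invention), the for loop version could test i + 1 against the array length before touching the next line. Unlike stopping at Length - 1, this still reports a match on the very last line, just without a following line.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: collect each matching line plus the line after it.
public static class NameFollowers
{
    public static List<string> MatchesWithNextLine(string[] lines, string marker)
    {
        var result = new List<string>();
        for (int i = 0; i < lines.Length; i++)
        {
            if (!lines[i].Contains(marker)) continue;

            result.Add(lines[i]);

            // Only look at the next line if there is one.
            if (i + 1 < lines.Length)
            {
                result.Add(lines[i + 1]);
                i++; // the following line has been consumed, so step over it
            }
        }
        return result;
    }
}
```

It works directly on the string[] that File.ReadAllLines returns, so no ToList() call is needed.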
You can implement an FSM (Finite State Machine); we have 2 states to consider:
0 - line doesn't contain "Name"
1 - line contains "Name"
Code:
if (args[0] == "prog1")
{
int state = 0;
// ReadLines - we don't have to read the entire file into a collection
foreach (var line in File.ReadLines(filename)) {
if (state == 0) {
if (line.Contains("Name")) {
state = 1;
Console.WriteLine(line);
}
}
else if (state == 1) {
state = 0;
Console.WriteLine(line);
}
}
}
List<string> lines = File.ReadAllLines(filename).ToList();
for (int i = 0; i < lines.Count - 1; i++)
{
if (lines[i].Contains("Name"))
{
Console.WriteLine(lines[i]);
Console.WriteLine(lines[i + 1]);
}
}
I have several log files that I need to parse and combine based on a timestamp. They're of the format:
GaRbAgE fIrSt LiNe
[1124 0905 134242422 ] Logs initialized
[1124 0905 134242568 SYSTEM] Good log entry:
{ Collection:
["Attribute"|String]
...
[1124 0905 135212932 SYSTEM] Good log entry:
As you can see I don't need the first line.
I'm currently using some Regex to parse each file: one expression determines if I have a "Logs initialized" line, which I don't care about and discard; another determines if I have a "Good log entry", which I keep and parse; and some of the good log entries span multiple lines. I simply accept the logs that are on multiple lines. However, the code currently also captures the first garbage line because it is indistinguishable from a multi-line log comment from a Regex viewpoint. Furthermore, from what I read Regex is not the solution here (Parsing a log file with regular expressions).
There are many log files and they can grow to be rather large. For this reason, I'm only reading 50 lines at a time per log before buffering and then combining them into a separate file. I loop through every file as long as there are non-null files left. Below is a code example where I replaced some conditions and variables with explanations.
while (there are non-null files left to read)
{
foreach (object logFile in logFiles) //logFiles is an array that stores the log names
{
int numLinesRead = 0;
using (StreamReader fileReader = File.OpenText(logFile.ToString()))
{
string fileLine;
// read in a line from the file
while ((fileLine = fileReader.ReadLine()) != null && numLinesRead < 50)
{
// compare line to regex expressions
Match rMatch = rExp.Match(fileLine);
if (rMatch.Success) // found good log entry
{
...
How would you skip that first garbage line? Unfortunately it is not as easy as simply consuming a line with ReadLine() because the StreamReader is within a loop and I'll end up deleting a line every 50 others.
I thought of keeping a list or array of files for which I've skipped that first line already (in order to not skip it more than once) but that is sort of ugly. I also thought of getting rid of the using statement and opening the StreamReader up before the loop but I'd prefer not to do that.
EDIT after posting I just realized that my implementation might not be correct at all. When the StreamReader closes and disposes I believe my previous position in the file will be lost. In which case, should I still use StreamReader without the using construct or is there a different type of file reader I should consider?
You could just use something like this:
Instead of this:
using (StreamReader fileReader = File.OpenText(logFile.ToString()))
{
string fileLine;
// read in a line from the file
while ((fileLine = fileReader.ReadLine()) != null && numLinesRead < 50)
{
do this:
int numLinesRead = 0;
foreach (var fileLine in File.ReadLines(logFile.ToString()).Skip(1))
{
if (++numLinesRead >= 50)
break;
Add another parameter to the method for the position in the file. First time in it's zero, and you can consume the line before you go into the loop. After that you can use it to position the stream where that last one left off.
e.g
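long position = 0;
while (position >= 0)
{
position = ReadFiftyLines(argLogFile, position);
}
public long ReadFiftyLines(string argLogFile, long argPosition)
{
using (FileStream fs = new FileStream(argLogFile, FileMode.Open, FileAccess.Read))
using (StreamReader reader = new StreamReader(fs))
{
string line = null;
if (argPosition == 0)
{
line = reader.ReadLine(); // consume the garbage first line
if (line == null)
{
return -1; // empty file
}
}
else
{
fs.Seek(argPosition, SeekOrigin.Begin); // resume where the last call left off
}
int count = 0;
// test the count first so a 51st line is not read and thrown away
while ((count < 50) && ((line = reader.ReadLine()) != null))
{
count++;
// do stuff with line
}
if (line == null)
{
return -1; // end of file
}
// caveat: StreamReader reads ahead into a buffer, so fs.Position can be
// past the last line actually returned; track the byte offset yourself
// if the resume point has to be exact
return fs.Position;
}
}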
or somesuch.
I have a file to process, but I only need to pick the last line of the file and check whether it begins with the number 9. How can I do this using LINQ?
The record that begins with the number 9 is sometimes not the last line of the file, because the last line can be a \r\n.
I made a simple routine to do this:
var lines = File.ReadAllLines(file);
for (int i = 0; i < lines.Length; i++)
{
if (lines[i].StartsWith("9"))
{
//...
}
}
But I want to know if it is possible to make this faster or, better still, to do it using LINQ... :)
string output=File.ReadAllLines(path)
.Last(x=>!Regex.IsMatch(x,@"^[\r\n]*$"));
if(output.StartsWith("9"))//found
The other answers are fine, but the following is more intuitive to me (I love self-documenting code):
Edit: misinterpreted your question, updating my example code to be more appropriate
var nonEmptyLines =
from line in File.ReadAllLines(path)
where !String.IsNullOrEmpty(line.Trim())
select line;
if (nonEmptyLines.Any())
{
var lastLine = nonEmptyLines.Last();
if (lastLine.StartsWith("9")) // or char.IsDigit(lastLine.First()) for 'any number'
{
// Your logic here
}
}
You don't need LINQ; something like the following should work:
var fileLines = File.ReadAllLines("yourpath");
if(char.IsDigit(fileLines[fileLines.Length - 1][0]))
{
//last line starts with a digit.
}
Or for checking against specific digit 9 you can do:
if(fileLines.Last().StartsWith("9"))
if(list.Last(x =>!string.IsNullOrWhiteSpace(x)).StartsWith("9"))
{
}
Since you need to check the last two lines (in case the last line is a newline), you can do this. You can change lines to however many last lines you want to check.
int lines = 2;
if(File.ReadLines(file).Reverse().Take(lines).Any(x => x.StartsWith("9")))
{
//one of the last X lines starts with 9
}
else
{
//none of the last X lines start with 9
}
Example
If I had a text file with these lines:
The cat meowed.
The dog barked.
The cat ran up a tree.
I would want to end up with a matrix of rows and columns like this:
0 1 2 3 4 5 6 7 8 9
0| t-h-e- -c-a-t- -m-e-o-w-e-d-.- - - - - - - -
1| t-h-e- -d-o-g- -b-a-r-k-e-d-.- - - - - - - -
2| t-h-e- -c-a-t- -r-a-n- -u-p- -a- -t-r-e-e-.-
Then I would like to query this matrix to quickly determine information about the text file itself. For example, I would quickly be able to tell if everything in column "0" is a "t" (it is).
I realize that this might seem like a strange thing to do. I am trying to ultimately (among other things) determine if various text files are fixed-width delimited without any prior knowledge about the file. I also want to use this matrix to detect patterns.
The actual files that will go through this are quite large.
Thanks!
For example, I would quickly be able to tell if everything in column "0" is a "t" (it is).
int column = 0;
char charToCheck = 't';
bool b = File.ReadLines(filename)
.All(s => (s.Length > column ? s[column] : '\0') == charToCheck);
What you can do is read the first line of your text file and use it as a mask. Compare every subsequent line to the mask and remove every character from the mask that is not the same as the character at the same position. After processing all lines you'll have a list of delimiters.
Btw, code is not very clean but it is a good starter I think.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
namespace DynamicallyDetectFixedWithDelimiter
{
class Program
{
static void Main(string[] args)
{
var sr = new StreamReader(@"C:\Temp\test.txt");
// Get initial list of delimiters
char[] firstLine = sr.ReadLine().ToCharArray();
Dictionary<int, char> delimiters = new Dictionary<int, char>();
for (int i = 0; i < firstLine.Count(); i++)
{
delimiters.Add(i, firstLine[i]);
}
// Read subsequent lines, remove delimiters from
// the dictionary that are not present in subsequent lines
string line;
while ((line = sr.ReadLine()) != null && delimiters.Count() != 0)
{
var subsequentLine = line.ToCharArray();
var invalidDelimiters = new List<int>();
// Compare all chars in first and subsequent line
foreach (var delimiter in delimiters)
{
if (delimiter.Key >= subsequentLine.Count())
{
invalidDelimiters.Add(delimiter.Key);
continue;
}
// Remove delimiter when it differs from the
// character at the same position in a subsequent line
if (subsequentLine[delimiter.Key] != delimiter.Value)
{
invalidDelimiters.Add(delimiter.Key);
}
}
foreach (var invalidDelimiter in invalidDelimiters)
{
delimiters.Remove(invalidDelimiter);
}
}
foreach (var delimiter in delimiters)
{
Console.WriteLine(String.Format("Delimiter at {0} = {1}", delimiter.Key, delimiter.Value));
}
sr.Close();
}
}
}
"I am trying to ultimately (among other things) determine if various text files are fixed-width (...)"
If that's so, you could try this:
public bool isFixedWidth (string fileName)
{
string[] lines = File.ReadAllLines(fileName);
int length = lines[0].Length;
foreach (string s in lines)
{
if (s.Length != length)
{
return false;
}
}
return true;
}
Once you get that lines variable, you can access any character as though they were in a matrix. Like char c = lines[3][1];. However, there is no hard guarantee that all lines are the same length. You could pad them to be the same length as the longest one, if you so wanted.
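If you do want every row to have the same width, a minimal sketch of the padding idea could look like this. PadToLongest is a name I made up, and padding with spaces is just an assumption about what "empty" should mean for your files.

```csharp
using System;
using System.Linq;

// Hypothetical helper: make every line as long as the longest one.
public static class LineMatrix
{
    public static string[] PadToLongest(string[] lines, char padChar = ' ')
    {
        // Width of the longest line (0 when there are no lines at all).
        int width = lines.Length == 0 ? 0 : lines.Max(l => l.Length);

        // PadRight appends padChar until the string reaches the given width.
        return lines.Select(l => l.PadRight(width, padChar)).ToArray();
    }
}
```

After that, padded[row][col] is safe for any col below the common width, at the cost of no longer being able to tell a real trailing space from padding.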
Also,
"how would I query to get a list of all columns that contain a space character for ALL rows (for example)"
You could try this:
public bool CheckIfAllCharactersInAColumnAreTheSame (string[] lines, int colIndex)
{
try
{
// read the reference character inside the try, so a first line that is too short is also caught
char c = lines[0][colIndex];
foreach (string s in lines)
{
if (s[colIndex] != c)
{
return false;
}
}
return true;
}
catch (IndexOutOfRangeException ex)
{
return false;
}
}
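For the quoted question about listing every column that holds a given character in all rows, something along these lines might do. It is only a sketch: ColumnsWhereAllRowsHave is a made-up name, and I'm assuming that a column which is missing from a shorter row does not count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: find the columns where every row has the same given character.
public static class ColumnQuery
{
    public static List<int> ColumnsWhereAllRowsHave(string[] lines, char target)
    {
        var columns = new List<int>();
        if (lines.Length == 0) return columns;

        // Only columns that exist in every row can qualify.
        int shortest = lines.Min(l => l.Length);

        for (int col = 0; col < shortest; col++)
        {
            if (lines.All(l => l[col] == target))
            {
                columns.Add(col);
            }
        }
        return columns;
    }
}
```

Checking against the length of the shortest row up front avoids relying on an exception for rows that are too short.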
Since it's not clear where exactly you're having difficulty, here are a few pointers.
Reading the file as strings, one per line:
string[] lines = File.ReadAllLines("filename.txt");
Obtaining a jagged array (a matrix) of characters from the lines (this step seems unnecessary since strings can be indexed just like character arrays):
char[][] charMatrix = lines.Select(l => l.ToCharArray()).ToArray();
Example query: whether every character in column 0 is a 't':
bool allTs = charMatrix.All(row => row[0] == 't');
suppose this is my txt file:
line1
line2
line3
line4
line5
I'm reading the content of this file with:
string line;
List<string> stdList = new List<string>();
StreamReader file = new StreamReader(myfile);
while ((line = file.ReadLine()) != null)
{
stdList.Add(line);
}
finally
{//need help here
}
Now I want to read the data in stdList, but only the value on every second line (in this case I have to read "line2" and "line4").
Can anyone point me in the right direction?
Even shorter than Yuck's approach and it doesn't need to read the whole file into memory in one go :)
var list = File.ReadLines(filename)
.Where((ignored, index) => index % 2 == 1)
.ToList();
Admittedly it does require .NET 4. The key part is the overload of Where which provides the index as well as the value for the predicate to act on. We don't really care about the value (which is why I've named the parameter ignored) - we just want odd indexes. Obviously we care about the value when we build the list, but that's fine - it's only ignored for the predicate.
You can simplify your file read logic into one line, and just loop through every other line this way:
var lines = File.ReadAllLines(myFile);
for (var i = 1; i < lines.Length; i += 2) {
// do something
}
EDIT: Starting at i = 1 which is line2 in your example.
Add a conditional block and a tracking mechanism inside of a loop. (The body of the loop is as follows:)
int linesProcessed = 0; // declare this once, before the loop, so it is not reset on every pass
if( linesProcessed % 2 == 1 ){
// Read the line.
stdList.Add(line);
}
else{
// Don't read the line (Do nothing.)
}
linesProcessed++;
The line linesProcessed % 2 == 1 says: take the number of lines we have processed already, and find the mod 2 of this number. (The remainder when you divide that integer by 2.) That will check to see if the number of lines processed is even or odd.
If you have processed no lines, it will be skipped (such as line 1, your first line.) If you have processed one line or any odd number of lines already, go ahead and process this current line (such as line 2.)
If modular math gives you any trouble, see the question: https://stackoverflow.com/a/90247/758446
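Put together, the counter approach might look like the sketch below. The class and method names are made up, and it takes any sequence of lines (for example the result of File.ReadLines) rather than a StreamReader, which is an assumption on my part to keep it short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: keep line 2, line 4, line 6, ... of the input.
public static class EveryOtherLine
{
    public static List<string> TakeEvenNumberedLines(IEnumerable<string> lines)
    {
        var kept = new List<string>();
        int linesProcessed = 0; // lives outside the loop so it survives each pass

        foreach (string line in lines)
        {
            // An odd count of lines already processed means this is line 2, 4, 6, ...
            if (linesProcessed % 2 == 1)
            {
                kept.Add(line);
            }
            linesProcessed++;
        }
        return kept;
    }
}
```

For the five-line sample file this keeps "line2" and "line4".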
Try this:
string line;
List<string> stdList = new List<string>();
StreamReader file = new StreamReader(myfile);
try
{
while (file.ReadLine() != null) //this reads the odd-numbered line and throws the result away
{
if ((line = file.ReadLine()) == null)
break; //the file ended on an odd-numbered line
stdList.Add(line); //keeps line2, line4, ...
}
}
finally
{
file.Close();
}