Reading every second line of a text file in C#

Suppose this is my text file:
line1
line2
line3
line4
line5
I'm reading the contents of this file with:
string line;
List<string> stdList = new List<string>();
StreamReader file = new StreamReader(myfile);
try
{
    while ((line = file.ReadLine()) != null)
    {
        stdList.Add(line);
    }
}
finally
{
    file.Close(); // release the file even if reading fails
}
Now I want to read the data into stdList, but keep only every second line (in this case, "line2" and "line4").
Can anyone point me in the right direction?

Even shorter than Yuck's approach and it doesn't need to read the whole file into memory in one go :)
var list = File.ReadLines(filename)
.Where((ignored, index) => index % 2 == 1)
.ToList();
Admittedly it does require .NET 4. The key part is the overload of Where which provides the index as well as the value for the predicate to act on. We don't really care about the value (which is why I've named the parameter ignored) - we just want odd indexes. Obviously we care about the value when we build the list, but that's fine - it's only ignored for the predicate.
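A minimal sketch of the same idea as a reusable helper, assuming you want to try the index-aware Where overload on an in-memory array before pointing it at File.ReadLines(filename). The class and method names (EveryOther, OddIndexed) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EveryOther
{
    // Keeps the elements at odd zero-based indexes (1, 3, 5, ...),
    // i.e. the 2nd, 4th, 6th ... line of the input.
    public static List<string> OddIndexed(IEnumerable<string> lines)
    {
        return lines
            .Where((ignored, index) => index % 2 == 1) // index-aware Where overload
            .ToList();
    }
}
```

Usage: var list = EveryOther.OddIndexed(File.ReadLines(filename)); still streams the file, because Where and ToList pull one line at a time.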

You can simplify your file read logic into one line, and just loop through every other line this way:
var lines = File.ReadAllLines(myFile);
for (var i = 1; i < lines.Length; i += 2) {
// do something
}
EDIT: Starting at i = 1, which is line2 in your example.

Add a tracking counter before the loop and a conditional block inside it:
int linesProcessed = 0; // declare once, before the loop, so it is not reset on every line
while ((line = file.ReadLine()) != null)
{
    if (linesProcessed % 2 == 1)
    {
        // Keep the line.
        stdList.Add(line);
    }
    else
    {
        // Skip the line (do nothing).
    }
    linesProcessed++;
}
The condition linesProcessed % 2 == 1 takes the number of lines processed so far and computes it mod 2 (the remainder when that integer is divided by 2), which tells you whether that number is even or odd.
If you have processed no lines yet, the current line is skipped (such as line 1, your first line). If you have processed one line, or any odd number of lines, the current line is kept (such as line 2).
If modular arithmetic gives you any trouble, see this answer: https://stackoverflow.com/a/90247/758446
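The modulo test above generalizes to "keep every nth line". Here is a small sketch, assuming a hypothetical helper named KeepEveryNth that accepts any TextReader (a StreamReader for the real file, or a StringReader for testing). With n = 2 and offset = 1 it reproduces the line2/line4 case from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ModuloFilter
{
    // Keeps a line when (its zero-based position % n) == offset.
    // n = 2, offset = 1 keeps the 2nd, 4th, 6th ... line.
    public static List<string> KeepEveryNth(TextReader reader, int n, int offset)
    {
        if (n <= 0) throw new ArgumentOutOfRangeException("n");
        var kept = new List<string>();
        int linesProcessed = 0; // declared once, outside the loop
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (linesProcessed % n == offset)
                kept.Add(line);
            linesProcessed++;
        }
        return kept;
    }
}
```

Usage: using (var file = new StreamReader(myfile)) { stdList = ModuloFilter.KeepEveryNth(file, 2, 1); }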

Try this:
string line;
List<string> stdList = new List<string>();
using (StreamReader file = new StreamReader(myfile))
{
    // Read and discard one line (line1, line3, ...), then keep the one after it.
    while (file.ReadLine() != null)
    {
        line = file.ReadLine(); // line2, line4, ...
        if (line == null)
            break; // odd number of lines: nothing left to keep
        stdList.Add(line);
    }
}

Related

Merging CSV lines in huge file

I have a CSV that looks like this:
783582893T,2014-01-01 00:00,0,124,29.1,40.0,0.0,40,40,5,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,Y
783582893T,2014-01-01 00:15,1,124,29.1,40.0,0.0,40,40,5,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,Y
783582893T,2014-01-01 00:30,2,124,29.1,40.0,0.0,40,40,5,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,Y
783582855T,2014-01-01 00:00,0,128,35.1,40.0,0.0,40,40,5,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,Y
783582855T,2014-01-01 00:15,1,128,35.1,40.0,0.0,40,40,5,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,Y
783582855T,2014-01-01 00:30,2,128,35.1,40.0,0.0,40,40,5,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,Y
...
783582893T,2014-01-02 00:00,0,124,29.1,40.0,0.0,40,40,5,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,Y
783582893T,2014-01-02 00:15,1,124,29.1,40.0,0.0,40,40,5,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,Y
783582893T,2014-01-02 00:30,2,124,29.1,40.0,0.0,40,40,5,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,Y
...although there are 5 billion records. If you look at the first column and the date part of the second column, three of the records are 'grouped' together and are just a breakdown into 15-minute intervals for the first 30 minutes of that day.
I want the output to look like this:
783582893T,2014-01-01 00:00,0,124,29.1,40.0,0.0,40,40,5,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,Y,40.0,0.0,40,40,5,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,Y,40.0,0.0,40,40,5,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,Y
783582855T,2014-01-01 00:00,0,128,35.1,40.0,0.0,40,40,5,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,Y,40.0,0.0,40,40,5,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,Y,40.0,0.0,40,40,5,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,Y
...
783582893T,2014-01-02 00:00,0,124,29.1,40.0,0.0,40,40,5,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,Y,40.0,0.0,40,40,5,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,Y,40.0,0.0,40,40,5,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,Y
Here the first 4 columns of the repeating rows are omitted, and the rest of their columns are appended to the first record of their kind. Basically, I am converting the data from one line per 15 minutes to one line per day.
Since I will be processing 5 billion records, I think the best approach is to use regular expressions (and EmEditor) or some tool that is made for this (multithreaded, optimized), rather than a custom programmed solution. That said, I am open to ideas in Node.js or C# that are relatively simple and very fast.
How can this be done?
If there's always a set number of records and they're in order, it'd be fairly easy to just read a few lines at a time, then parse and output them. Trying to run a regex over billions of records would take forever. Using StreamReader and StreamWriter should make it possible to read and write these large files, since they read and write one line at a time.
using (StreamReader sr = new StreamReader("inputFile.txt"))
using (StreamWriter sw = new StreamWriter("outputFile.txt"))
{
    string line1;
    var lineCountToGroup = 3; // change to 96 for one line per day
    while ((line1 = sr.ReadLine()) != null)
    {
        var lines = new List<string>();
        lines.Add(line1);
        for (int i = 0; i < lineCountToGroup - 1; i++) // less 1 because we already added line1
        {
            string next = sr.ReadLine();
            if (next == null) break; // the file ended in the middle of a group
            lines.Add(next);
        }
        // Example grouping logic (needs System.Linq): keep the first row whole and
        // append the remaining rows without their first 4 columns.
        var groupedLine = string.Join(",", lines.Take(1)
            .Concat(lines.Skip(1).Select(l => string.Join(",", l.Split(',').Skip(4)))));
        sw.WriteLine(groupedLine);
    }
}
Disclaimer: untested code with little error handling, assuming that there really is the correct number of repeated lines, etc. You'd obviously need to tweak it for your exact scenario.
You could do something like this (untested code without any error handling - but should give you the general gist of it):
// needs: using System; using System.IO; using System.Linq; using System.Text;
using (var sin = new StreamReader("yourfile.csv"))
using (var sout = new StreamWriter("outfile.csv"))
{
    var line = sin.ReadLine(); // note: should add error handling for empty files
    var cells = line.Split(','); // note: you should probably check the length too!
    var key = cells[0]; // use this to match other rows
    StringBuilder output = new StringBuilder(line); // this is the output line we build
    while ((line = sin.ReadLine()) != null) // if we have more lines
    {
        cells = line.Split(','); // split so we can get the first column
        if (cells[0] == key) // if the first column matches the current key
        {
            output.Append(',').Append(String.Join(",", cells.Skip(4))); // add this row to our output line
        }
        else // once the key changes
        {
            sout.WriteLine(output.ToString()); // write out the line we've built up
            output.Clear();
            output.Append(line); // start building the new line
            key = cells[0]; // and update the key
        }
    }
    // once all lines have been processed
    sout.WriteLine(output.ToString()); // we'll have just the last line to write out
}
The idea is to loop through each line in turn and keep track of the current value of the first column. When that value changes, you write out the output line you've been building up and update the key. This way you don't have to worry about exactly how many matches you have or if you might be missing a few points.
One note: if you are going to concatenate 96 rows, using a StringBuilder for output rather than a String is likely to be more efficient.
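The key-change idea can also be written as a streaming iterator, which keeps it independent of file handling and easy to test. This is a sketch under two assumptions: the hypothetical MergeByKey helper uses the ID plus the date part of the timestamp as the key (so the same ID on the next day starts a new row even if the rows happen to be adjacent), and the number of leading columns to drop is a parameter, because the question says four while the sample output appears to drop five (it continues with 40.0 rather than 29.1).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class CsvMerge
{
    // Merges consecutive rows that share a key (ID column + date part of column 2).
    // The first row of a group is kept whole; later rows contribute everything
    // except their first dropColumns columns.
    public static IEnumerable<string> MergeByKey(IEnumerable<string> lines, int dropColumns)
    {
        string currentKey = null;
        var output = new StringBuilder();
        foreach (var line in lines)
        {
            var cells = line.Split(',');
            string key = cells[0] + "," + cells[1].Split(' ')[0]; // e.g. "783582893T,2014-01-01"
            if (key == currentKey)
            {
                output.Append(',').Append(string.Join(",", cells.Skip(dropColumns)));
            }
            else
            {
                if (currentKey != null)
                    yield return output.ToString(); // key changed: emit the finished row
                output.Clear();
                output.Append(line);
                currentKey = key;
            }
        }
        if (currentKey != null)
            yield return output.ToString(); // flush the last group
    }
}
```

Usage (.NET 4 or later): File.WriteAllLines("outfile.csv", CsvMerge.MergeByKey(File.ReadLines("yourfile.csv"), 4)); streams both files, so memory use stays at roughly one merged row.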
Define ProcessOutputLine to store the merged lines.
Call ProcessInputLine after each ReadLine, and at end of file call ProcessOutputLine(outputLine) once more to flush the last merged line.
string curKey = "";
int keyLength = ... ; // set total length of the first 4 columns
string outputLine = "";
private void ProcessInputLine(string line)
{
    string newKey = line.Substring(0, keyLength);
    if (newKey == curKey)
    {
        outputLine += line.Substring(keyLength); // same key: append the remaining columns
    }
    else
    {
        if (outputLine != "") ProcessOutputLine(outputLine); // key changed: emit the merged line
        curKey = newKey;
        outputLine = line;
    }
}
EDIT: this solution is very similar to Matt Burland's; the only noticeable difference is that I don't use the Split function.

Read last 30,000 lines of a file [duplicate]

This question already has answers here:
How to read last "n" lines of log file [duplicate]
(9 answers)
Closed 9 years ago.
I have a CSV file whose data grows over time. What I need to do is read the last 30,000 lines.
Code:
string[] lines = File.ReadAllLines(Filename).Where(r => r.ToString() != "").ToArray();
int count = lines.Count();
int loopCount = count > 30000 ? count - 30000 : 0;
for (int i = loopCount; i < lines.Count(); i++)
{
string[] columns = lines[i].Split(',');
orderList.Add(columns[2]);
}
It works fine, but the problem is that
File.ReadAllLines(Filename)
reads the complete file, which hurts performance. I want something that reads only the last 30,000 lines without iterating through the complete file.
PS: I am using .NET 3.5. File.ReadLines() does not exist in .NET 3.5.
You can use the File.ReadLines() method instead of File.ReadAllLines().
From MSDN, File.ReadLines():
The ReadLines and ReadAllLines methods differ as follows:
When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings be returned before you can access the array.
Therefore, when you are working with very large files, ReadLines can be more efficient.
Solution 1:
string[] lines = File.ReadAllLines(FileName).Where(r => r.ToString() != "").ToArray();
int count = lines.Count();
List<String> orderList = new List<String>();
int loopCount = count > 30000 ? count - 30000 : 0; // index of the first line to keep
for (int i = count - 1; i >= loopCount; i--)
{
string[] columns = lines[i].Split(',');
orderList.Add(columns[2]);
}
Solution 2: if you are using .NET Framework 3.5, as you said in the comments below, you cannot use the File.ReadLines() method, as it is only available since .NET 4.0.
You can use StreamReader as below:
List<string> lines = new List<string>();
List<String> orderList = new List<String>();
String line;
int count=0;
using (StreamReader reader = new StreamReader("c:\\Bethlehem-Deployment.txt"))
{
while ((line = reader.ReadLine()) != null)
{
lines.Add(line);
count++;
}
}
int loopCount = (count > 30000) ? count - 30000 : 0; // index of the first line to keep
for (int i = count - 1; i >= loopCount; i--)
{
string[] columns = lines[i].Split(',');
orderList.Add(columns[2]);
}
You can use File.ReadLines, with which you can start enumerating the collection of strings before the whole collection is returned.
After that, you can use LINQ to make things a lot easier: Reverse reverses the order of the collection, and Take takes the first n items. Applying Reverse again puts the last n lines back in their original order. Note that Enumerable.Reverse buffers the entire sequence, so this still holds every line of the file in memory at once.
var lines = File.ReadLines(Filename).Reverse().Take(30000).Reverse();
If you are using .NET 3.5 or earlier, you can create your own method that works the same way as File.ReadLines. Here is the code for such a method, originally written by @Jon:
public IEnumerable<string> ReadLines(string file)
{
using (TextReader reader = File.OpenText(file))
{
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
You can then use LINQ over this method, just like in the statement above:
var lines = ReadLines(Filename).Reverse().Take(30000).Reverse();
The problem is that you do not know where to start reading the file to get the last 30,000 lines. Unless you want to maintain a separate index of line offsets, you can either read the file from the start, counting lines and retaining only the last 30,000, or start from the end and count lines backwards. The latter approach can be efficient if the file is very large and you only want a few lines. However, 30,000 does not seem like "a few lines", so here is an approach that reads the file from the start and uses a queue to keep the last 30,000 lines:
var fileName = @" ... ";
var linesToRead = 30000;
var queue = new Queue<String>();
using (var streamReader = File.OpenText(fileName)) {
while (!streamReader.EndOfStream) {
queue.Enqueue(streamReader.ReadLine());
if (queue.Count > linesToRead)
queue.Dequeue();
}
}
Now you can access the lines that are stored in queue. This class implements IEnumerable<String> allowing you to use foreach to iterate the lines. However, if you want random access you will have to use the ToArray method to convert the queue into an array which adds some overhead to the computation.
This solution is memory-efficient because at most 30,000 lines have to be kept in memory, and the garbage collector can free older lines when required. Using File.ReadAllLines pulls all the lines into memory at once, which can increase the memory required by the process.
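The answer above mentions, but does not show, the "start from the end" approach. Here is a minimal sketch of it, assuming the text is ASCII or UTF-8 (where the byte 0x0A only ever means '\n'; this does not hold for UTF-16) and that the stream is seekable. TailLines is a hypothetical name. It only uses long-standing types such as StreamReader(Stream, Encoding) and List<T>, so it may also work on .NET 3.5, but I have not checked that.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class Tail
{
    // Returns the last n lines of a seekable stream by scanning backwards for '\n'.
    public static List<string> TailLines(Stream stream, int n)
    {
        var result = new List<string>();
        if (n <= 0) return result;

        long pos = stream.Length;
        // A trailing '\n' terminates the last line rather than starting a new one.
        if (pos > 0)
        {
            stream.Seek(pos - 1, SeekOrigin.Begin);
            if (stream.ReadByte() == '\n') pos--;
        }

        long start = 0;        // byte offset where the last n lines begin
        int newlinesSeen = 0;
        var buffer = new byte[4096];
        bool found = false;
        while (pos > 0 && !found)
        {
            int chunk = (int)Math.Min(buffer.Length, pos);
            pos -= chunk;
            stream.Seek(pos, SeekOrigin.Begin);
            int read = 0;
            while (read < chunk) // Stream.Read may return fewer bytes than requested
            {
                int r = stream.Read(buffer, read, chunk - read);
                if (r == 0) throw new EndOfStreamException();
                read += r;
            }
            for (int i = chunk - 1; i >= 0; i--)
            {
                if (buffer[i] == (byte)'\n' && ++newlinesSeen == n)
                {
                    start = pos + i + 1;
                    found = true;
                    break;
                }
            }
        }

        // Decode forward from the computed offset. The reader is deliberately not
        // disposed, so the caller keeps ownership of the stream.
        stream.Seek(start, SeekOrigin.Begin);
        var reader = new StreamReader(stream, Encoding.UTF8);
        string line;
        while ((line = reader.ReadLine()) != null)
            result.Add(line);
        return result;
    }
}
```

Usage: using (var fs = File.OpenRead(Filename)) { var last = Tail.TailLines(fs, 30000); }. Only the tail of the file is read, which is where this approach beats the queue when the file is much larger than 30,000 lines.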
Or, a different idea:
Try splitting the CSV into categories such as A-D, E-G, ...
and access the file for the first character you need.
Or you can split the data by entity count, for example 15,000 entities per file, plus a text file that holds a small index of the entities and their locations, like:
Txt File:
entitiesID | inWhich.Csv
....

Skip first line in log file using C# StreamReader in a loop

I have several log files that I need to parse and combine based on a timestamp. They're of the format:
GaRbAgE fIrSt LiNe
[1124 0905 134242422 ] Logs initialized
[1124 0905 134242568 SYSTEM] Good log entry:
{ Collection:
["Attribute"|String]
...
[1124 0905 135212932 SYSTEM] Good log entry:
As you can see I don't need the first line.
I'm currently using some Regex to parse each file: one expression determines if I have a "Logs initialized" line, which I don't care about and discard; another determines if I have a "Good log entry", which I keep and parse; and some of the good log entries span multiple lines. I simply accept the logs that are on multiple lines. However, the code currently also captures the first garbage line because it is indistinguishable from a multi-line log comment from a Regex viewpoint. Furthermore, from what I read Regex is not the solution here (Parsing a log file with regular expressions).
There are many log files and they can grow to be rather large. For this reason, I'm only reading 50 lines at a time per log before buffering and then combining them into a separate file. I loop through every file as long as there are non-null files left. Below is a code example where I replaced some conditions and variables with explanations.
while (there are non-null files left to read)
{
foreach (object logFile in logFiles) //logFiles is an array that stores the log names
{
int numLinesRead = 0;
using (StreamReader fileReader = File.OpenText(logFile.ToString()))
{
string fileLine;
// read in a line from the file
while ((fileLine = fileReader.ReadLine()) != null && numLinesRead < 50)
{
// compare line to regex expressions
Match rMatch = rExp.Match(fileLine);
if (rMatch.Success) // found good log entry
{
...
How would you skip that first garbage line? Unfortunately it is not as easy as simply consuming a line with ReadLine() because the StreamReader is within a loop and I'll end up deleting a line every 50 others.
I thought of keeping a list or array of files for which I've skipped that first line already (in order to not skip it more than once) but that is sort of ugly. I also thought of getting rid of the using statement and opening the StreamReader up before the loop but I'd prefer not to do that.
EDIT: After posting, I realized that my implementation might not be correct at all. When the StreamReader is closed and disposed, I believe my previous position in the file will be lost. In that case, should I still use StreamReader without the using construct, or is there a different type of file reader I should consider?
You could just use something like this:
Instead of this:
using (StreamReader fileReader = File.OpenText(logFile.ToString()))
{
string fileLine;
// read in a line from the file
while ((fileLine = fileReader.ReadLine()) != null && numLinesRead < 50)
{
do this:
int numLinesRead = 0;
foreach (var fileLine in File.ReadLines(logFile.ToString()).Skip(1))
{
if (++numLinesRead > 50) // stop after 50 lines have been processed
break;
Add another parameter to the method for the position in the file. The first time in, it's zero, and you can consume the garbage line before you go into the loop. After that, you can use it to position the stream where the last call left off.
e.g.:
long position = 0;
while (position >= 0)
{
    position = ReadFiftyLines(argLogFile, position);
}

public long ReadFiftyLines(string argLogFile, long argPosition)
{
    using (FileStream fs = new FileStream(argLogFile, FileMode.Open, FileAccess.Read))
    {
        fs.Seek(argPosition, SeekOrigin.Begin);
        StreamReader reader = new StreamReader(fs);
        string line = null;
        if (argPosition == 0)
        {
            line = reader.ReadLine(); // first call: consume the garbage first line
            if (line == null)
            {
                return -1; // empty file
            }
        }
        int count = 0;
        // check the count first so a 51st line is not read and then lost
        while (count < 50 && (line = reader.ReadLine()) != null)
        {
            count++;
            // do stuff with line
        }
        if (line == null)
        {
            return -1; // end of file
        }
        // Caveat: StreamReader reads ahead in blocks, so fs.Position may be past the
        // last line returned; a robust version must count the bytes actually consumed.
        return fs.Position;
    }
}
or somesuch.
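A different option, which avoids the read-ahead problem of returning fs.Position, is to keep one reader open per log between batches instead of reopening the file each time. This is a sketch of that swapped-in technique, not the answer's seek-based method; LogCursor and ReadBatch are hypothetical names, and it takes a TextReader so a StreamReader (or a StringReader in tests) can be passed in.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Keeps the reader open so each batch continues where the previous one stopped.
public sealed class LogCursor : IDisposable
{
    private readonly TextReader reader;
    private bool finished;

    public LogCursor(TextReader reader)
    {
        this.reader = reader;
        // Consume the garbage first line exactly once, at construction time.
        if (reader.ReadLine() == null) finished = true;
    }

    public bool Finished { get { return finished; } }

    // Returns up to maxLines lines; an empty list means the log is exhausted.
    public List<string> ReadBatch(int maxLines)
    {
        var batch = new List<string>();
        while (!finished && batch.Count < maxLines)
        {
            string line = reader.ReadLine();
            if (line == null) { finished = true; break; }
            batch.Add(line);
        }
        return batch;
    }

    public void Dispose() { reader.Dispose(); }
}
```

Usage: create one cursor per file with new LogCursor(File.OpenText(logFile.ToString())), call ReadBatch(50) on each in turn while any cursor is not Finished, and dispose them all at the end (for example in a finally block), which replaces the per-batch using statement.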

Get line that starts with some number

I have a file that I have to process, but I only need to pick the last line of the file and check whether it begins with the number 9. How can I do this using LINQ?
The record that begins with the number 9 is sometimes not the last line of the file, because the last line can be an empty \r\n.
I made a simple loop to do this:
var lines = File.ReadAllLines(file);
for (int i = 0; i < lines.Length; i++)
{
if (lines[i].StartsWith("9"))
{
//...
}
}
But I want to know whether it is possible to do something faster, or better, using LINQ. :)
string output=File.ReadAllLines(path)
.Last(x=>!Regex.IsMatch(x,@"^[\r\n]*$"));
if(output.StartsWith("9"))//found
The other answers are fine, but the following is more intuitive to me (I love self-documenting code):
Edit: misinterpreted your question, updating my example code to be more appropriate
var nonEmptyLines =
from line in File.ReadAllLines(path)
where !String.IsNullOrEmpty(line.Trim())
select line;
if (nonEmptyLines.Any())
{
var lastLine = nonEmptyLines.Last();
if (lastLine.StartsWith("9")) // or char.IsDigit(lastLine.First()) for 'any number'
{
// Your logic here
}
}
You don't need LINQ; something like the following should work:
var fileLines = File.ReadAllLines("yourpath");
if (char.IsDigit(fileLines[fileLines.Length - 1][0]))
{
//last line starts with a digit.
}
Or for checking against specific digit 9 you can do:
if(fileLines.Last().StartsWith("9"))
if(list.Last(x =>!string.IsNullOrWhiteSpace(x)).StartsWith("9"))
{
}
Since you need to check the last two lines (in case the last line is a newline), you can do this. You can change lines to however many last lines you want to check.
int lines = 2;
if(File.ReadLines(file).Reverse().Take(lines).Any(x => x.StartsWith("9")))
{
//one of the last X lines starts with 9
}
else
{
//none of the last X lines start with 9
}
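If you want to avoid both ReadAllLines and the buffering done by Reverse, a single forward pass that remembers the last non-blank line is enough. A minimal sketch, assuming the hypothetical name LastNonBlankStartsWith and .NET 4 or later (for string.IsNullOrWhiteSpace and File.ReadLines):

```csharp
using System;
using System.Collections.Generic;

public static class LastLineCheck
{
    // Single forward pass: remembers the most recent non-blank line,
    // so trailing empty lines at the end of the file are ignored.
    public static bool LastNonBlankStartsWith(IEnumerable<string> lines, string prefix)
    {
        string last = null;
        foreach (var line in lines)
        {
            if (!string.IsNullOrWhiteSpace(line))
                last = line;
        }
        return last != null && last.StartsWith(prefix, StringComparison.Ordinal);
    }
}
```

Usage: bool endsWithNine = LastLineCheck.LastNonBlankStartsWith(File.ReadLines(file), "9"); holds only one line in memory at a time. The ordinal comparison avoids culture-sensitive matching for a plain digit prefix.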

How to skip lines in a txt file

Hey guys I've been having some trouble skipping some unnecessary lines from the txt file that I am reading into my program. The data has the following format:
Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
I want to read line 1, trim lines 3 and 4 and the white space, then read line 5 and trim lines 7 and 8. I have read something similar on this website; however, that particular case was skipping the first 5 lines of the text file. This is what I have tried so far:
string TextLine;
System.IO.StreamReader file =
new System.IO.StreamReader("C://log.txt");
while ((TextLine = file.ReadLine()) != null)
{
foreach (var i in Enumerable.Range(2, 3)) file.ReadLine();
Console.WriteLine(TextLine);
}
As you can see, for the range I have specified the start as line 2 and then skip 3 lines, which includes the white space. However, the first parameter of Enumerable.Range does not seem to matter: I can put a 0 and it yields the same results. As it is right now, the program trims from the first line until the number specified in the second parameter of the Range function. Does anyone know of a way to get around this problem? Thanks
Why not read all the lines into an array and then just index the ones you want?
var lines = File.ReadAllLines("C://log.txt");
Console.WriteLine(lines[0]); // line 1
Console.WriteLine(lines[4]); // line 5 (arrays are zero-based)
If it's a really big file with consistent repeating sections, you can create a read method and do:
while (!file.EndOfStream)
{
yield return file.ReadLine();
yield return file.ReadLine();
file.ReadLine();
file.ReadLine();
file.ReadLine();
}
or similar for whatever block format you need.
Here's an expanded version of the solution provided here at the OP's request.
public static IEnumerable<string> getMeaningfulLines(string filename)
{
using (System.IO.StreamReader file =
new System.IO.StreamReader(filename))
{
while (!file.EndOfStream)
{
//keep two lines that we care about
yield return file.ReadLine();
yield return file.ReadLine();
//discard three lines that we don't need
file.ReadLine();
file.ReadLine();
file.ReadLine();
}
}
}
public static void Main()
{
foreach(string line in getMeaningfulLines(@"C:/log.txt"))
{
//or do whatever else you want with the "meaningful" lines.
Console.WriteLine(line);
}
}
Here is another version that's going to be a little bit less fragile if the input file ends abruptly.
//Just get all lines from a file as an IEnumerable; handy helper method in general.
public static IEnumerable<string> GetAllLines(string filename)
{
using (System.IO.StreamReader file =
new System.IO.StreamReader(filename))
{
while (!file.EndOfStream)
{
yield return file.ReadLine();
}
}
}
public static IEnumerable<string> getMeaningfulLines2(string filename)
{
int counter = 0;
//This will yield when counter is 0 or 1, and not when it's 2, 3, or 4.
//The result is yield two, skip 3, repeat.
foreach(string line in GetAllLines(filename))
{
if(counter < 2)
yield return line;
//add one to the counter and have it wrap,
//so it is always between 0 and 4 (inclusive).
counter = (counter + 1) % 5;
}
}
Of course the range doesn't matter: what you're doing is skipping 3 lines inside every while loop iteration (Enumerable.Range(2, 3) produces three numbers), and the start value 2 has no effect on the file reader's position. I would suggest you just keep a counter telling you which line you are on, and skip the line if its number is one of those you'd like to skip, e.g.:
int currentLine = 1;
while ((TextLine = file.ReadLine()) != null)
{
if ( LineEnabled( currentLine )){
Console.WriteLine(TextLine);
}
currentLine++;
}
private bool LineEnabled( int lineNumber )
{
if ( lineNumber == 2 || lineNumber == 3 || lineNumber == 4 ){ return false; }
return true;
}
I don't think you want to go about reading the line in two places (one in the loop and then again inside the loop). I would take this approach:
while ((TextLine = file.ReadLine()) != null)
{
if (string.IsNullOrWhiteSpace(TextLine)) // Or any other conditions
continue;
Console.WriteLine(TextLine);
}
The documentation for Enumerable.Range states:
public static IEnumerable<int> Range(
int start,
int count
)
Parameters
start
Type: System.Int32
The value of the first integer in the sequence.
count
Type: System.Int32
The number of sequential integers to generate.
So changing the first parameter won't change the logic of your program.
However, this is an odd way to do this. A for loop would be much simpler, easier to understand and far more efficient.
Also, your code currently reads the first line, skips three lines, outputs the first line, and then repeats.
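To make both points concrete, here is a small sketch (SkipDemo, IterationsOf and KeepOneSkipN are hypothetical names). IterationsOf shows that a foreach over Enumerable.Range runs count times whatever the start value is, and KeepOneSkipN is the question's loop rewritten with a plain for loop. On the eight-line sample it keeps "Line 1" and "Line 5", which appear to be the lines the question wants.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SkipDemo
{
    // Enumerable.Range(start, count) yields count integers beginning at start,
    // so Range(2, 3) and Range(0, 3) both produce three iterations.
    public static int IterationsOf(int start, int count)
    {
        return Enumerable.Range(start, count).Count();
    }

    // Keep a line, then skip the next `skip` lines, and repeat until the input ends.
    public static List<string> KeepOneSkipN(TextReader reader, int skip)
    {
        var kept = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            kept.Add(line);
            for (int i = 0; i < skip; i++)
            {
                if (reader.ReadLine() == null) break; // input ended mid-block
            }
        }
        return kept;
    }
}
```

Usage: using (var file = new StreamReader("C://log.txt")) { foreach (var l in SkipDemo.KeepOneSkipN(file, 3)) Console.WriteLine(l); }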
Have you tried something like this?
using (var file = new StreamReader("C://log.txt"))
{
var lineCt = 0;
string line;
while ((line = file.ReadLine()) != null)
{
lineCt++;
//logic for lines to keep
if (lineCt == 1 || lineCt == 5)
{
Console.WriteLine(line);
}
}
}
Although unless this is an extremely fixed format input file I'd find a different way to figure out what to do with each line rather than a fixed line number.
