Linq and streamreader getting lines - c#

Using LINQ, what is an efficient way to get each string from a tab-delimited .txt file (and then get each word, which is usually what string.Split(...) does)?
var v = from line in File.ReadAllLines(path)
        select line;
is part of the solution, I believe. I don't mind if this uses yield return.
EDIT: I've also seen threads on here detailing exactly what I am trying to do, but can't find them.

I'm not entirely sure what you're asking, but it sounds like you're trying to get every word from a tab-delimited file as an IEnumerable<string>. If so, try the following:
var query = File.ReadAllLines(somePathVariable)
    .SelectMany(x => x.Split(new char[] { '\t' }));

Using File.ReadAllLines is easy, but not necessarily the most efficient, since it reads the entire file into memory.
A short version would probably be:
var wordsPerLine = from line in File.ReadAllLines(filename)
                   select line.Split('\t');
foreach (var words in wordsPerLine)
{
    foreach (var word in words)
    {
        // process word...
    }
}
If you want a single enumerable of the words, you can use SelectMany to get that, too...
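A minimal sketch of that SelectMany variant follows. It swaps in File.ReadLines (lazy) for File.ReadAllLines so the file is streamed rather than loaded at once; the class name TabFile and method name ReadWords are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class TabFile
{
    // Hypothetical helper: lazily yields every tab-separated field
    // of every line in the file as one flat sequence.
    public static IEnumerable<string> ReadWords(string path)
    {
        return File.ReadLines(path)                      // streams lines one at a time
                   .SelectMany(line => line.Split('\t')); // flattens fields of all lines
    }
}
```

It could be consumed with a plain foreach, for example foreach (var word in TabFile.ReadWords("data.txt")) { ... }.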

Related

Replacing a word in a text file

I'm writing a little program where the data saved for some users is stored in a text file. I'm using System.IO with the StreamWriter to write new information to my text file.
The text in the file is formatted like so :
name1, 1000, 387
name2, 2500, 144
... and so on. I'm using infos = line.Split(',') to return the different values into an array that is more useful for searching purposes. What I'm doing is using a while loop to search for the correct line (where the name matches) and I return the number of points by using infos[1].
I'd like to modify this infos[1] value and set it to something else. I'm trying to find a way to replace a word in C# but I can't find a good way to do it. From what I've read there is no way to replace a single word, you have to rewrite the complete file.
Is there a way to delete a line completely, so that I could rewrite it at the end of the text file and not have to worry about it being duplicated?
I tried using the Replace method, but it didn't work. I'm a bit lost looking at the answers proposed for similar problems, so I would really appreciate it if someone could explain to me what my options are.
If I understand you correctly, you can use the File.ReadLines method and LINQ to accomplish this. First, get the line you want:
var line = File.ReadLines("path")
.FirstOrDefault(x => x.StartsWith("name1 or whatever"));
if(line != null)
{
/* change the line */
}
Then write the new line to your file, excluding the old line:
var lines = File.ReadLines("path")
    .Where(x => !x.StartsWith("name1 or whatever"))
    .ToList(); // materialize first: ReadLines is lazy and would keep the file open while it is being rewritten
var newLines = lines.Concat(new[] { line });
File.WriteAllLines("path", newLines);
The concept you are looking for is called 'RandomAccess' for file reading/writing. Most of the easy-to-use I/O methods in C# are 'SequentialAccess', meaning you read a chunk or a line and move forward to the next.
However, what you want to do is possible, but you need to read some tutorials on file streams. Here is a related SO question: ".NET C# - Random access in text files - no easy way?"
You are probably either reading the whole file, or reading it line-for-line as part of your search. If your fields are fixed length, you can read a fixed number of bytes, keep track of the Stream.Position as you read, know how many characters you are going to read and need to replace, and then open the file for writing, move to that exact position in the stream, and write the new value.
It's a bit complex if you are new to streams. If your file is not huge, copying a file line for line can be done pretty efficiently by the System.IO library if coded correctly, so you might just follow your second suggestion which is read the file line-for-line, write it to a new Stream (memory, temp file, whatever), replace the line in question when you get to that value, and when done, replace the original.
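The copy-and-replace approach described above can be sketched roughly as follows. This assumes the "name, points, other" comma-separated layout from the question; the class name PointsFile, the method SetPoints and the ".tmp" suffix are hypothetical choices, not part of any library.

```csharp
using System;
using System.IO;

public static class PointsFile
{
    // Hypothetical helper: copies the file line by line to a temp file,
    // rewriting the second field of the record whose first field equals
    // `name`, then swaps the temp file in for the original.
    public static void SetPoints(string path, string name, int newPoints)
    {
        string tempPath = path + ".tmp"; // assumed temp-file name

        using (var reader = new StreamReader(path))
        using (var writer = new StreamWriter(tempPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] infos = line.Split(',');
                if (infos.Length > 1 && infos[0].Trim() == name)
                {
                    // Keep the "name, points, other" spacing of the original file.
                    infos[1] = " " + newPoints;
                    line = string.Join(",", infos);
                }
                writer.WriteLine(line);
            }
        }

        // Replace the original only after both streams are closed.
        File.Delete(path);
        File.Move(tempPath, path);
    }
}
```

Only one line is held in memory at a time, so this stays cheap even when the file grows.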
It is most likely you are new to C# and don't realize the strings are immutable (a fancy way of saying you can't change them). You can only get new strings from modifying the old:
String MyString = "abc 123 xyz";
MyString.Replace("123", "999"); // does not work: the returned string is discarded
MyString = MyString.Replace("123", "999"); // works: the new string is assigned back
[Edit:]
If I understand your follow-up question, you could do this:
infos[1] = infos[1].Replace("1000", "1500");

Reading a specific time file and writing the contents to another one

At runtime, I want to read all files that have a timestamp within a particular hour. For example, if the application is running at 11:00:--, it should read all files created from 11:00:00 until now (excluding the present one) and write their contents to the present file. I have tried this:
string temp_file_format = "ScriptLog_" + DateTime.Now.ToString("dd_MM_yyyy_HH");
string path = @"C:\ScriptLogs";
var all_files = Directory.GetFiles(path, temp_file_format).SelectMany(File.ReadAllLines);
using (var w = new StreamWriter(logpath))
foreach (var line in all_files)
w.WriteLine(line);
But this doesn't seem to be working. There is no error and no exception, but it doesn't read the files even though they exist.
The pattern parameter of the GetFiles method should probably also include a wildcard, something like:
string temp_file_format = "ScriptLog_" + DateTime.Now.ToString("dd_MM_yyyy_HH") + "*";
This will match all files starting with "ScriptLog_13_09_2013_11"
As @Edwin already solved your problem, I'd just like to add a suggestion regarding your code (mostly performance related).
Since you are only reading these lines in order to write them to a different file and discard them from memory, you should consider using File.ReadLines instead of File.ReadAllLines, because the latter method loads all lines from each file into memory unnecessarily.
Combine this with the File.WriteAllLines method, and you can simplify your code while reducing memory pressure to:
var all_files = Directory.GetFiles(path, temp_file_format);
// File.ReadLines returns a "lazy" IEnumerable<string> which will
// yield lines one by one
var all_lines = all_files.SelectMany(File.ReadLines);
// this iterates through all_lines and writes them to logpath
File.WriteAllLines(logpath, all_lines);
All that can even be written as a one-liner (that is, if you are not paid by your source code line count). ;-)
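For illustration, here is one way the pieces might be combined, including the wildcard from the other answer. The class LogMerger and method MergeHour are invented names, and two details are assumptions beyond the thread: the output file is filtered out by comparing full paths, and input files are ordered by name.

```csharp
using System;
using System.IO;
using System.Linq;

public static class LogMerger
{
    // Hypothetical helper: concatenates all files in `directory` whose
    // names start with `prefix` into `outputPath`, skipping the output
    // file itself if it happens to match the pattern.
    public static void MergeHour(string directory, string prefix, string outputPath)
    {
        string outputFullPath = Path.GetFullPath(outputPath);

        var lines = Directory.GetFiles(directory, prefix + "*") // wildcard is required
            .Where(f => !string.Equals(Path.GetFullPath(f), outputFullPath,
                                       StringComparison.OrdinalIgnoreCase))
            .OrderBy(f => f)                                    // assumed: merge in name order
            .SelectMany(f => File.ReadLines(f));                // lazy, line by line

        File.WriteAllLines(outputPath, lines);
    }
}
```

A call might look like LogMerger.MergeHour(path, "ScriptLog_" + DateTime.Now.ToString("dd_MM_yyyy_HH"), logpath).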

Counting words from a string builder

I have a StringBuilder which stores many words. For example, I did:
StringBuilder builder = new StringBuilder();
builder.Append(reader.Value);
Now builder contains a string such as
" india is a great great country and it has many states and territories". It contains many paragraphs.
I want each unique word to be listed once together with its count. For example:
india: 1
great: 2
country: 1
and: 2
Also, this result should be saved in an Excel file, but I am not getting the result.
I searched on Google, but the answers I find either use LINQ or hard-code the words themselves. Can you please help me out? I am a beginner.
You can use LINQ to achieve it. Try something like this:
var result = from word in builder.ToString().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
             group word by word into g
             select new { Word = g.Key, Count = g.Count() };
You can also convert this result into a Dictionary object like this:
Dictionary<string, int> output = result.ToDictionary(a => a.Word, a => a.Count);
So each item in output will contain the word as its key and its count as the value.
Well, this is one way to get the words:
IEnumerable<string> words = builder.ToString().Split(' ');
Look into using the String.Split() function to break up your string into words. You can then use a Dictionary<string, int> to keep track of unique words and their counts.
You don't really need a StringBuilder for this, though. A StringBuilder is useful when you concatenate strings together a lot. You only have a single input string here and you won't add to it; you'll split it up.
Once you finish processing all the words in the input string, you can write the code to export the results to Excel. The simplest way to do that is to create a comma-separated text file - search for that phrase and look into using a StreamWriter to save the output. Excel has built-in converters for CSV files.
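The Dictionary and CSV steps described above might look like the sketch below. The names WordCounter, Count and WriteCsv are hypothetical, and it assumes words are separated only by whitespace, are compared case-insensitively, and contain no commas (punctuation is not stripped).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class WordCounter
{
    // Hypothetical helper: counts how often each whitespace-separated
    // word occurs in `text`, ignoring case.
    public static Dictionary<string, int> Count(string text)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        var separators = new[] { ' ', '\t', '\r', '\n' };

        foreach (var word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            int current;
            counts.TryGetValue(word, out current); // current stays 0 for a new word
            counts[word] = current + 1;
        }
        return counts;
    }

    // Hypothetical helper: writes "Word,Count" rows that Excel can open as CSV.
    // Assumes the words themselves contain no commas or quotes.
    public static void WriteCsv(string path, Dictionary<string, int> counts)
    {
        using (var writer = new StreamWriter(path))
        {
            writer.WriteLine("Word,Count");
            foreach (var pair in counts)
            {
                writer.WriteLine(pair.Key + "," + pair.Value);
            }
        }
    }
}
```

With the builder from the question, usage would be along the lines of WordCounter.WriteCsv("counts.csv", WordCounter.Count(builder.ToString())).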

How do I break a string into an array (or List) in C#?

Using C# (VS 2010 Express) I read the contents of a text file into a string. The string is rather long but reliably broken up by "\t" for tabs and "\r\n" for carriage returns/newlines.
The tabs indicate a new column of data, and new line indicates a new row of data.
I want to create an array or List of dimensions (X)(Y) such that each spot in the array can hold 1 row of data from the text file, and all of the Y columns contained in that 1 row ("\t" means a new column of data, and "\r\n" means a new row of data).
To make things simple let's say my text has 10 rows of data, and 2 columns. I'd like to create an array or List or whatever you think is best to store the data. How do I do this? Thanks.
This is the code that I used to read the data in the text file into a string:
// Read the file as one string.
System.IO.StreamReader myFile = new System.IO.StreamReader("f:\\data.txt");
string myString = myFile.ReadToEnd();
Just use the string as it is (you already have everything in it):
var rows = myString.Split(new string[] { "\r\n" }, StringSplitOptions.None)
    .Select(s => s.Split('\t'));
This gives you an IEnumerable<string[]>. Producing variants such as a list of lists or an array of arrays just needs the suitable ToArray() or ToList() call.
However, if you can deal with each line one at a time, you can be better off with something that lets you do so:
public IEnumerable<string[]> ReadTSV(TextReader tr)
{
using(tr)
for(string line = tr.ReadLine(); line != null; line = tr.ReadLine())
yield return line.Split('\t');
}
Then you only use as much memory as each line needs. We could go further and change the reading to emit each individual cell one at a time, but this is normally enough to read files of several hundred MB in size, with reasonable efficiency.
Edit based on comments on question:
If you really wanted to, you could get a List<string[]> from:
var myFile = new StreamReader("f:\\data.txt");
var list = ReadTSV(myFile).ToList();
Alternatively, change the line yield return line.Split('\t'); to yield return line.Split('\t').ToList(); (and the method's return type to IEnumerable<List<string>>) and you get a List<List<string>>.
However, if possible then work on the results directly, rather than putting it into a list first:
var myFile = new StreamReader("f:\\data.txt");
var chunks = ReadTSV(myFile);
foreach(var chunk in chunks)
{
DoSometingOnAChunk(chunk[0], chunk[1]);
}
It'll use less memory, and get started faster rather than pausing to read the whole thing first. Code like this can merrily work its way through gigabytes without complaint.
String.Split
http://msdn.microsoft.com/en-us/library/system.string.split.aspx
File.ReadLines(sourceFilePath)
.Select(line => line.Split('\t'))
.ToArray();
This will read the file and create a list of string arrays for you
List<string[]> rows= File.ReadLines("PathToFile")
.Select(line=>line.Split('\t')).ToList();
If you want string[][] version, simply use ToArray(); instead of ToList(); at the end.
The TextFieldParser is a fantastic class for dealing with text based delimited files. You can provide it a file, a delimiter (in this case "\t") and it will provide a method to get the next line of values (as a string array).
It has advantages over a simple Split in the general case as it can handle comments, quoted fields, escaped delimiters, etc. You may or may not have such cases, but having all of those awkward edge cases handled pretty much for free is rather nice.
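As a rough sketch of that approach: TextFieldParser lives in the Microsoft.VisualBasic.FileIO namespace, so the project needs a reference to the Microsoft.VisualBasic assembly. The wrapper class TsvParser and its Parse method are illustrative names only.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TsvParser
{
    // Hypothetical helper: reads all rows of tab-delimited text,
    // letting TextFieldParser deal with quoted fields.
    public static List<string[]> Parse(TextReader reader)
    {
        var rows = new List<string[]>();

        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters("\t");
            parser.HasFieldsEnclosedInQuotes = true; // a quoted field may contain a tab

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields()); // one string[] per row
            }
        }
        return rows;
    }
}
```

For the file in the question, this could be called as TsvParser.Parse(new StreamReader("f:\\data.txt")).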
var result = contents.Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
    .Select(s => s.Split('\t').ToList())
    .ToList();
result will be a List<List<string>>.

Order the lines in a file by the last character on the line

Can you please help me with this:
I want to build a method in C# which will order a lot of files by the following rule
every line contains strings and the last character in every line is an int.
I want to order the lines in the file by this last character, the int.
Thanks
To order ascending by the last character, interpreted as an integer you could do:
var orderedLines = File.ReadAllLines(@"test.txt")
    .OrderBy(line => line[line.Length - 1] - '0') // numeric value of the last digit character
    .ToList();
Edit:
With the clarification in your comment - integer following a space character, can be more than one digit:
var orderedLines = File.ReadAllLines(@"test.txt")
    .OrderBy(line => Convert.ToInt32(line.Substring(line.LastIndexOf(" ") + 1)))
    .ToList();
You could do something like this, where filename is the name of your file:
// Replace with the actual name of your file
string fileName = "MyFile.txt";
// Read the contents of the file into memory
string[] lines = File.ReadAllLines(fileName);
// Sort the contents of the file based on the number after the last space in each line
var orderedLines = lines.OrderBy(x => Int32.Parse(x.Substring(x.LastIndexOf(' ') + 1)));
// Write the lines back to the file
File.WriteAllText(fileName, string.Join(Environment.NewLine, orderedLines));
This is just a rough outline; hopefully it's helpful.
File.WriteAllLines(
pathToWriteTo,
File.ReadLines(pathToReadFrom)
.OrderBy(s => Convert.ToInt32(s.Split(' ').Last()))
);
If the file is large, this could be inefficient, as this method of sorting effectively requires reading the entire file into memory.
Assuming you want more than single digit integers and that you have a separation character between the filename and the rest (we'll call it 'splitChar') which can be any character at all:
from string str in File.ReadAllLines(fileName)
let split = str.Split(splitChar)
orderby Int32.Parse(split[split.Count()-1])
select str
will get you a sequence of strings in order of the integer value of the last grouping (separated by the split character).
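The snippets in this thread throw if a line does not end in a valid integer. A slightly more defensive sketch is shown below; LineSorter and its methods are made-up names, a space is assumed as the separator, and sorting malformed lines last is an assumption rather than something the question asks for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineSorter
{
    // Hypothetical helper: returns the integer after the last space, or
    // int.MaxValue when there is none (assumption: such lines sort last).
    public static int TrailingNumber(string line)
    {
        // LastIndexOf returns -1 when there is no space, so tail is the whole line.
        string tail = line.Substring(line.LastIndexOf(' ') + 1);
        int value;
        return int.TryParse(tail, out value) ? value : int.MaxValue;
    }

    // Orders the lines ascending by their trailing integer.
    public static List<string> SortByTrailingNumber(IEnumerable<string> lines)
    {
        return lines.OrderBy(line => TrailingNumber(line)).ToList();
    }
}
```

Because OrderBy is a stable sort, lines with equal numbers keep their original relative order.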
Maybe one of these links can help you by sorting it the natural way:
Natural Sorting in C#
Sorting for Humans : Natural Sort Order
