Counting words from a string builder

I have a StringBuilder that stores many words. For example, I did:
StringBuilder builder = new StringBuilder();
builder.Append(reader.Value);
Now builder contains a string such as
" india is a great great country and it has many states and territories", and it actually contains many paragraphs.
I want each unique word to be listed with its word count, for example:
india: 1
great: 2
country: 1
and: 2
This result should also be saved in an Excel file, but I am not getting the result.
I searched on Google, but I only found solutions that use LINQ or that hard-code the words themselves. Can you please help me out? I am a beginner.

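You can use LINQ to achieve it. Try something like this (StringBuilder has no Split method, so convert it to a string first; RemoveEmptyEntries drops the empty "word" produced by the leading space):
var result = from word in builder.ToString().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
             group word by word into g
             select new { Word = g.Key, Count = g.Count() };
You can also convert this result into a Dictionary object like this:
Dictionary<string, int> output = result.ToDictionary(a => a.Word, a => a.Count);
Here each item in output contains the word as its Key and its count as its Value.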

Well, this is one way to get the words:
IEnumerable<string> words = builder.ToString().Split(' ');

Look into using the String.Split() function to break up your string into words. You can then use a Dictionary<string, int> to keep track of unique words and their counts.
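A minimal sketch of the Split-plus-Dictionary approach, written as a C# 9+ top-level program. CountWords is a hypothetical helper name, and both the case-insensitive comparison and the list of separator characters are assumptions: the question does not say how capitalization or punctuation should be treated.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Count how often each distinct word occurs. Case-insensitive matching and the
// separator list below are assumptions; adjust them to your text.
static Dictionary<string, int> CountWords(string text)
{
    var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    char[] separators = { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '"' };
    // RemoveEmptyEntries drops the empty strings produced by leading spaces
    // or by several separators in a row.
    foreach (string word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
    {
        counts.TryGetValue(word, out int current); // current is 0 for a new word
        counts[word] = current + 1;
    }
    return counts;
}

var builder = new StringBuilder(" india is a great great country and it has many states and territories");
Dictionary<string, int> wordCounts = CountWords(builder.ToString());
foreach (KeyValuePair<string, int> pair in wordCounts)
{
    Console.WriteLine($"{pair.Key}: {pair.Value}");
}
```

A Dictionary lookup is constant time on average, so this stays fast even when the text contains many paragraphs.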
You don't really need a StringBuilder for this, though - a StringBuilder is useful when you concatenate strings together a lot. You only have a single input string here and you won't add to it - you'll split it up.
Once you finish processing all the words in the input string, you can write the code to export the results to Excel. The simplest way to do that is to create a comma-separated text file - search for that phrase and look into using a StreamWriter to save the output. Excel has built-in converters for CSV files.
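A rough sketch of the CSV export with StreamWriter, assuming the counts are already in a Dictionary<string, int>. WriteCountsToCsv, EscapeCsv and the wordcounts.csv file in the temp folder are hypothetical names; the quoting follows the common RFC 4180 convention of wrapping a field in double quotes and doubling any quotes inside it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Quote a field only if it contains a comma, a quote or a line break.
static string EscapeCsv(string field) =>
    field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0
        ? "\"" + field.Replace("\"", "\"\"") + "\""
        : field;

// Write the counts as a two-column CSV file that Excel can open directly.
static void WriteCountsToCsv(IDictionary<string, int> counts, string path)
{
    using var writer = new StreamWriter(path); // disposed (and flushed) when the method returns
    writer.WriteLine("Word,Count");            // header row
    foreach (KeyValuePair<string, int> pair in counts)
    {
        writer.WriteLine($"{EscapeCsv(pair.Key)},{pair.Value}");
    }
}

// Example data; in practice this is the dictionary built from your text.
var sample = new Dictionary<string, int> { ["india"] = 1, ["great"] = 2, ["and"] = 2 };
string csvPath = Path.Combine(Path.GetTempPath(), "wordcounts.csv"); // hypothetical output location
WriteCountsToCsv(sample, csvPath);
Console.WriteLine(File.ReadAllText(csvPath));
```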

Related

how to get a value from json with just the index?

I'm making an app that needs to loop through Steam games.
Reading libraryfolders.vdf, I need to loop through and find the first value and save it as a string.
"libraryfolders"
{
"0"
{
"path" "D:\\Steam"
"label" ""
"contentid" "-1387328137801257092942"
"totalsize" "0"
"update_clean_bytes_tally" "42563526469"
"time_last_update_corruption" "1663765126"
"apps"
{
"730" "31892201109"
"4560" "9665045969"
"9200" "22815860246"
"11020" "776953234"
"34010" "11967809445"
"34270" "1583765638"
For example, it would record:
730
4560
9200
11020
34010
34270
I'm already using System.Text.Json in the program. Is there any way I could loop through and just get the first value using System.Text.Json, or would I need to do something different, since VDF doesn't separate the values with colons or commas?

That is not JSON, that is the KeyValues format developed by Valve. You can read more about the format here:
https://developer.valvesoftware.com/wiki/KeyValues
There are existing Stack Overflow questions about converting a VDF file to JSON, and they mention libraries already developed to read VDF that can help you out:
VDF to JSON in C#
If you want a very quick and dirty way to read the file without needing any external library, I would probably use a regex and do something like this:
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
string pattern = "\"apps\"\\s+{\\s+(\"(\\d+)\"\\s+\"\\d+\"\\s+)+\\s+}";
string libraryPath = @"C:\Program Files (x86)\Steam\steamapps\libraryfolders.vdf";
string input = File.ReadAllText(libraryPath);
// Each match is one "apps" block; group 2 holds every app ID captured inside it.
List<string> indexes = Regex.Matches(input, pattern, RegexOptions.Singleline)
    .Cast<Match>()
    .SelectMany(m => m.Groups[2].Captures.Cast<Capture>())
    .Select(c => c.Value)
    .ToList();
foreach (string s in indexes)
{
    Debug.WriteLine(s);
}
See the regular expression explanation here:
https://regex101.com/r/bQSt79/1
It captures each occurrence of "apps" { ... } as a whole match (group 0), matches each pair of quoted numbers between the curly braces with the repeating group 1, and captures the left-most number of each pair in group 2. In most regex engines a repeated group keeps only its last occurrence, but because this is C# (.NET), every repetition is still available through the group's Captures collection.
The rest of the code takes each match, the second group of each match, the captures of that group, and the values of those captures, and puts them in a list of strings. A foreach loop then writes each string to the debug output.
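To see why group 2 still yields every app ID, here is a self-contained sketch that runs a variant of the pattern against a small made-up fragment instead of a real libraryfolders.vdf. The braces are escaped and the whitespace before the closing brace is relaxed from \s+ to \s*; both are small adjustments of my own, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Made-up fragment in the same shape as the "apps" block of libraryfolders.vdf.
string vdf = "\"apps\"\n\t\t{\n\t\t\t\"730\"\t\t\"31892201109\"\n\t\t\t\"4560\"\t\t\"9665045969\"\n\t\t}\n";

// Group 1 repeats once per "id" "size" pair; group 2 is the id inside it.
string pattern = @"""apps""\s+\{\s+(""(\d+)""\s+""\d+""\s+)+\s*\}";
Match match = Regex.Match(vdf, pattern);

// Group.Value only holds the last repetition...
string lastOnly = match.Groups[2].Value;
// ...but .NET keeps every repetition in Group.Captures.
List<string> appIds = match.Groups[2].Captures.Cast<Capture>().Select(c => c.Value).ToList();

Console.WriteLine($"Value: {lastOnly}");
Console.WriteLine($"Captures: {string.Join(", ", appIds)}");
```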

I have a CSV file whose contents look like the following. How can I convert it to a collection of List<string> dynamically?

This CSV file has lots of rows, but not all of the rows have the same number of values.

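For dealing with complex CSV files it is better to use a reliable library, such as CsvHelper: http://joshclose.github.io/CsvHelper/
It's simple to use (MyClass stands for your own record class; recent CsvHelper versions also expect a CultureInfo):
var csv = new CsvReader(textReader, CultureInfo.InvariantCulture);
var records = csv.GetRecords<MyClass>().ToList();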

I would use the string split method. https://msdn.microsoft.com/en-us/library/system.string.split(v=vs.110).aspx.
Suppose you read your csv file into a string, you can split the string on line breaks into multiple strings. You'll create a new list for each new line string. You'll split each line on whatever you have for your delimiter, then add those values to your list.
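A minimal sketch of that idea, producing one List<string> per line so that rows can have different lengths. ParseRows and the sample text are hypothetical, and it assumes a plain delimiter with no quoted fields; values that themselves contain the delimiter need a real CSV parser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Split CSV text into one List<string> per line; rows may be ragged.
// Assumes no quoted fields (a value containing the delimiter would be split).
static List<List<string>> ParseRows(string csvText, char delimiter = ',')
{
    return csvText
        .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries) // one string per line
        .Select(line => line.Split(delimiter).ToList())                      // one list per row
        .ToList();
}

string text = "a,b,c\r\nd,e\r\nf\r\n"; // hypothetical sample with rows of different lengths
List<List<string>> rows = ParseRows(text);
foreach (List<string> row in rows)
{
    Console.WriteLine($"{row.Count} values: {string.Join(" | ", row)}");
}
```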
Edit: here is a similar question: How to split() a delimited string to a List<String>

Regex to select all commas up to a specific character

I am having a terrible time with regular expressions. It's terrible for me to admit, but I just don't use them enough to be really good when I need to be. Because of the way our application runs, I have the contents of a .csv file pulled out into a string. I need to insert a new row above and below what already exists. The number of columns can change depending on the report. What I would like to do is grab all the commas, without any other characters (including whitespace), up to the first \r\n in the string. This way I have all the columns, and I can insert a blank row at the top and populate the columns with what I need. Here is an example of the .csv text:
"Date, Account Code, Description, Amount\r\n23-Apr-13,12345,Account1,$12345\r\n"
What I would like the regex to grab:
",,," or ",,,\r\n"
I just cannot seem to get this. Thank you.

You don't need a regex for this.
string firstLine = File.ReadLines(path).First(); // path is the location of the .csv file
int numCommas = firstLine.Count(c => c == ',');
string commaString = new String(',', numCommas);
If you only have the CSV contents in a string (here called test) rather than a file, you can take the first line directly:
string firstLine = test.Substring(0, test.IndexOf("\r\n", StringComparison.Ordinal));

You actually don't need to complicate your code with regular expressions to accomplish what you want: counting the columns.
Here's an extremely simple method:
String textline = csvtext.Substring(0, csvtext.IndexOfAny(new[] { '\r', '\n' }));
int columns = textline.Split(',').Length;
Now the columns variable holds your total number of columns.
The first line grabs just the first line of the CSV text. The second line splits that text into an array on commas (,), and the array's Length is the number of columns.
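Putting the column count to work for the original goal, here is a sketch that builds a blank row with one fewer comma than there are columns and adds it above and below the existing CSV text. AddBlankRows is a hypothetical helper, and it assumes "\r\n" line endings and a header without quoted commas, as in the sample from the question.

```csharp
using System;

// Add an empty row (only commas) above the header and after the last row.
// Assumes "\r\n" line endings and a header with no quoted commas.
static string AddBlankRows(string csvText)
{
    int lineEnd = csvText.IndexOf("\r\n", StringComparison.Ordinal);
    string header = lineEnd >= 0 ? csvText.Substring(0, lineEnd) : csvText;
    int columns = header.Split(',').Length;
    // N columns are separated by N - 1 commas, e.g. 4 columns -> ",,,"
    string blankRow = new string(',', columns - 1) + "\r\n";
    // Make sure the existing text ends with a line break before appending.
    string body = csvText.EndsWith("\r\n", StringComparison.Ordinal) ? csvText : csvText + "\r\n";
    return blankRow + body + blankRow;
}

string report = "Date, Account Code, Description, Amount\r\n23-Apr-13,12345,Account1,$12345\r\n";
string withBlanks = AddBlankRows(report);
Console.Write(withBlanks);
```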

You can use the regex below
(?<=[\d\w\s])(\r|\n|,)(?=[\d\w\s\W])
to match commas and newline characters.
You can use it with Regex.Replace(input, pattern, replacement, RegexOptions.IgnoreCase).
This can also be done with plain string operations:
string[] strs = inputstr.Split(new string[] { "\n", "\r", "," }, StringSplitOptions.RemoveEmptyEntries);
foreach (string str in strs)
{
    // do your part
}

How do I break a string into an array (or List) in C#?

Using C# (VS 2010 Express) I read the contents of a text file into a string. The string is rather long but reliably broken up by "\t" for tabs and "\r\n" for carriage returns/newlines.
The tabs indicate a new column of data, and new line indicates a new row of data.
I want to create an array or List of dimensions (X)(Y) such that each spot in the array can hold one row of data from the text file, and all of the Y columns contained in that row ("\t" means a new column of data, and "\r\n" means a new row of data).
To make things simple let's say my text has 10 rows of data, and 2 columns. I'd like to create an array or List or whatever you think is best to store the data. How do I do this? Thanks.
This is the code that I used to read the data in the text file into a string:
// Read the file as one string.
System.IO.StreamReader myFile = new System.IO.StreamReader("f:\\data.txt");
string myString = myFile.ReadToEnd();

Just as is (you already have a string with everything):
str.Split(new string[]{"\r\n"}, StringSplitOptions.None)
.Select(s => s.Split('\t'));
This gives you an IEnumerable<string[]>; producing variants such as a list of lists or an array of arrays just needs the suitable ToArray() or ToList() call.
However, if you can deal with each line one at a time, you can be better off with something that lets you do so:
public IEnumerable<string[]> ReadTSV(TextReader tr)
{
using(tr)
for(string line = tr.ReadLine(); line != null; line = tr.ReadLine())
yield return line.Split('\t');
}
Then you only use as much memory as each line needs. We could go further and change the reading to emit each individual cell one at a time, but this is normally enough to read files of several hundred MB in size, with reasonable efficiency.
Edit based on comments on question:
If you really wanted to, you could get a List<string[]> from:
var myFile = new StreamReader("f:\\data.txt");
var list = ReadTSV(myFile).ToList();
Alternatively, change the line yield return line.Split('\t'); to yield return line.Split('\t').ToList(); (and the method's return type to IEnumerable<List<string>>), and you get a List<List<string>>.
However, if possible then work on the results directly, rather than putting it into a list first:
var myFile = new StreamReader("f:\\data.txt");
var chunks = ReadTSV(myFile);
foreach(var chunk in chunks)
{
DoSomethingOnAChunk(chunk[0], chunk[1]);
}
It'll use less memory, and get started faster rather than pausing to read the whole thing first. Code like this can merrily work its way through gigabytes without complaint.

String.Split
http://msdn.microsoft.com/en-us/library/system.string.split.aspx

File.ReadLines(sourceFilePath)
.Select(line => line.Split('\t'))
.ToArray();

This will read the file and create a list of string arrays for you
List<string[]> rows= File.ReadLines("PathToFile")
.Select(line=>line.Split('\t')).ToList();
If you want string[][] version, simply use ToArray(); instead of ToList(); at the end.

The TextFieldParser is a fantastic class for dealing with text based delimited files. You can provide it a file, a delimiter (in this case "\t") and it will provide a method to get the next line of values (as a string array).
It has advantages over a simple Split in the general case as it can handle comments, quoted fields, escaped delimiters, etc. You may or may not have such cases, but having all of those awkward edge cases handled pretty much for free is rather nice.
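A minimal sketch of TextFieldParser (namespace Microsoft.VisualBasic.FileIO) reading tab-separated data. It is available to C# on .NET Core 3.0 and later; .NET Framework projects may need a reference to Microsoft.VisualBasic. ReadDelimited is a hypothetical helper, and a StringReader over made-up data stands in for the file; TextFieldParser also has a constructor that takes a file path.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO; // TextFieldParser lives here

// Read tab-delimited text into a list of string arrays, one array per line.
static List<string[]> ReadDelimited(TextReader source)
{
    var rows = new List<string[]>();
    using var parser = new TextFieldParser(source);
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters("\t");
    parser.HasFieldsEnclosedInQuotes = true; // a tab inside quotes stays part of the field
    parser.TrimWhiteSpace = false;           // keep field contents exactly as written
    while (!parser.EndOfData)
    {
        var fields = parser.ReadFields();
        if (fields != null) rows.Add(fields);
    }
    return rows;
}

string data = "name\tvalue\r\n\"tab\tinside\"\t2\r\n"; // hypothetical sample
List<string[]> table = ReadDelimited(new StringReader(data));
foreach (string[] row in table)
{
    Console.WriteLine(string.Join(" | ", row));
}
```

The quoted second row is the kind of case a plain Split('\t') would get wrong: it would produce three fields instead of two.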

var result = contents.Split("\r\n".ToArray(), StringSplitOptions.RemoveEmptyEntries)
    .Select(s => s.Split('\t').ToList())
    .ToList();
result will be a List<List<string>>.

Order the lines in a file by the last character on the line

Can you please help me with this:
I want to build a method in C# which will order a lot of files by the following rule
every line contains strings and the last character in every line is an int.
I want to order the lines in the file by this last character, the int.
Thanks

To order ascending by the last character, interpreted as an integer, you could do:
var orderedLines = File.ReadAllLines(@"test.txt")
    .OrderBy(line => line[line.Length - 1] - '0') // digit character to its value: '7' - '0' == 7
    .ToList();
Edit:
With the clarification in your comment (the integer follows a space character and can be more than one digit):
var orderedLines = File.ReadAllLines(@"test.txt")
    .OrderBy(line => Convert.ToInt32(line.Substring(line.LastIndexOf(' ') + 1)))
    .ToList();
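A self-contained sketch of the same ordering that can be tested without a file. TrailingNumberOrMax and OrderByTrailingNumber are hypothetical helpers; sending lines without a numeric tail to the end (via int.MaxValue) is an assumption, since the question does not say what should happen to them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Parse the integer after the last space; lines without a numeric tail get
// int.MaxValue so they sort last (an assumption, not from the question).
static int TrailingNumberOrMax(string line)
{
    // LastIndexOf returns -1 when there is no space, so Substring(0) keeps the whole line.
    string tail = line.Substring(line.LastIndexOf(' ') + 1);
    return int.TryParse(tail, out int value) ? value : int.MaxValue;
}

// Enumerable.OrderBy is a stable sort, so lines with equal numbers keep their original order.
static List<string> OrderByTrailingNumber(IEnumerable<string> lines) =>
    lines.OrderBy(line => TrailingNumberOrMax(line)).ToList();

// Hypothetical sample lines; in practice use File.ReadAllLines(path).
string[] sample = { "beta 10", "alpha 2", "gamma 33", "delta 2", "no number here" };
List<string> ordered = OrderByTrailingNumber(sample);
Console.WriteLine(string.Join(Environment.NewLine, ordered));
```

To save the result, File.WriteAllLines(path, ordered) works here because all lines were already read into memory before writing.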

You could do something like this, where filename is the name of your file:
// Replace with the actual name of your file
string fileName = "MyFile.txt";
// Read the contents of the file into memory
string[] lines = File.ReadAllLines(fileName);
// Sort the contents of the file based on the number after the last space in each line
var orderedLines = lines.OrderBy(x => Int32.Parse(x.Substring(x.LastIndexOf(' '))));
// Write the lines back to the file
File.WriteAllText(fileName, string.Join(Environment.NewLine, orderedLines));
This is just a rough outline; hopefully it's helpful.

File.WriteAllLines(
pathToWriteTo,
File.ReadLines(pathToReadFrom)
.OrderBy(s => Convert.ToInt32(s.Split(' ').Last()))
);
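If the file is large, this could be inefficient, because this way of sorting requires reading the entire file into memory. Also keep pathToWriteTo different from pathToReadFrom: File.WriteAllLines opens the output file before the input lines are read.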

Assuming you want more than single-digit integers and that there is a separator character between the number and the rest of the line (we'll call it splitChar), which can be any character at all:
from string str in File.ReadAllLines(fileName)
let split = str.Split(splitChar)
orderby Int32.Parse(split[split.Length - 1])
select str
will get you a sequence of strings ordered by the integer value of the last group (separated by the split character).

Maybe one of these links can help you by sorting it the natural way:
Natural Sorting in C#
Sorting for Humans : Natural Sort Order
