How to count paragraphs in a text? - c#

I'm stuck.
I have song lyrics stored in a string.
I need to count the song's houses (stanzas). Houses are separated by an empty line, so the empty line is my delimiter.
In addition, I need access to each word so I can associate the word with its house.
I would really appreciate your help.
This is my base code:
var paragraphMarker = Environment.NewLine;
var paragraphs = fileText.Split(new[] {paragraphMarker},
StringSplitOptions.RemoveEmptyEntries);
foreach (var paragraph in paragraphs)
{
var words = paragraph.Split(new[] {' '},
StringSplitOptions.RemoveEmptyEntries)
.Select(w => w.Trim());
//do something
}

You should be able to perform Regex.Split on \r\n\r\n, which is two carriage return/line feed pairs (assuming that your empty line is actually empty), and then String.Split each piece on ' ' to get the individual words in each paragraph.
This will break it apart into two sections and then count the words in each. For simplicity I've only got one sentence in each bit.
var poem = "Roses are red, violets are blue\r\n\r\nSomething something darkside";
var verses = System.Text.RegularExpressions.Regex.Split(poem, "\r\n\r\n");
foreach (var verse in verses)
{
var words = verse.Split(' ');
Console.WriteLine(words.Count());
}
You'll need to tidy up more edge cases like punctuation etc, but this should give you a starting point.
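To tie each word to its house, one option is to keep the words grouped per paragraph. The following is only a sketch under some assumptions: SongText and SplitIntoHouses are made-up names, and a "blank line" is taken to mean two or more line breaks with nothing but spaces or tabs between them (either \r\n or \n line endings).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SongText
{
    // Hypothetical helper: returns one string[] of words per house (paragraph).
    public static List<string[]> SplitIntoHouses(string text)
    {
        var houses = new List<string[]>();

        // Two or more line breaks, optionally padded with spaces/tabs, end a house.
        foreach (var block in Regex.Split(text, @"(?:\r?\n[ \t]*){2,}"))
        {
            // A null separator splits on any white-space character.
            var words = block.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (words.Length > 0)
                houses.Add(words);
        }
        return houses;
    }
}
```

With this, houses.Count is the number of houses, and the list index of each string[] is the house number its words belong to. Punctuation still stays attached to the words.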

String.Split will create an array using your delimiter as a token.
Array.Length will tell you how many elements are in an array.
For example, to find the count of words in this sentence:
var count = @"Hello! This is a naive example.".Split(' ').Length;

Related

How to get all words existing after x word count in c#

I am attempting to get all the words that exist after the fourth word in a string.
For example
private string myText="";
private int wordCount;
private string excessString="";
void IRunOncePerFrame()
{
// A user can type text into a text area to set myText
myText = EditorGUI.TextArea(someArea,myText);
//flag to check if someone edited the text
if(GUI.changed)
{
// get the number of actual words separated by white spaces, new lines
wordCount = myText.Split(new char[] {' ','\r','\n'},StringSplitOptions.RemoveEmptyEntries).Length;
if(wordCount > 4)
{
// get all the words that exist after the fourth word and store it in ExcessString
}
}
}
So if the user writes "This solves the issue, now time for a drink", everything after the word 'issue' should be placed in excessString.
That is the general portion of the problem which may be helpful to others.
The more personal portion for me is this: I want to take excessString, make it excessString = "<color = red>" + excessString + "</color>", and add it back to myText. Since I do not yet know how to get the words after the fourth word, I can't think how to get this done yet.
To get everything from the fifth word to the last one, you can use words.GetRange(4, words.Count - 4). Then use Join to concatenate the result into a new string.
var words = myText.Split(new char[] { ' ', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries).ToList();
if (words.Count > 4)
{
// get all the words that exist after the fourth word and store it in ExcessString
excessString = string.Join(" ", words.GetRange(4, words.Count - 4));
}
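For the second part of the question (wrapping the excess in a tag and putting it back), a rough sketch could look like the following. ExcessWords and HighlightExcess are hypothetical names, the tags are passed in because the exact rich-text syntax depends on the UI you render into, and rejoining with single spaces is a simplification that discards the original line breaks.

```csharp
using System;
using System.Linq;

static class ExcessWords
{
    // Hypothetical helper: keeps the first `limit` words as-is and wraps the rest in the given tags.
    public static string HighlightExcess(string text, int limit, string openTag, string closeTag)
    {
        var words = text.Split(new[] { ' ', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length <= limit)
            return text; // nothing to highlight

        var kept = string.Join(" ", words.Take(limit));
        var excess = string.Join(" ", words.Skip(limit));
        return kept + " " + openTag + excess + closeTag;
    }
}
```

For example, HighlightExcess(myText, 4, "<color=red>", "</color>") would leave the first four words untouched and wrap everything after them.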

How to remove whitespace from a string array formed by splitting on whitespace

I have a string which contains a value like:
90 524 000 1234567890 2207 1926 00:34 02:40 S
I have split this string into a string array on white-space. Now I want to create one more string array so that all the white-space entries are removed and it contains only the real values.
I also want to get the position of an element in the original string array based on a selection from the new string array formed by removing the white-space.
Please help me.
You can use StringSplitOptions.RemoveEmptyEntries via String.Split.
var values = input.Split(new [] {' '}, StringSplitOptions.RemoveEmptyEntries);
StringSplitOptions.RemoveEmptyEntries: The return value does not include array elements that contain an empty string
When the Split method encounters two consecutive white-space characters, it returns an empty string between them. Using StringSplitOptions.RemoveEmptyEntries removes the empty strings and gives you only the values you want.
You can also achieve this using LINQ
var values = input.Split().Where(x => x != string.Empty).ToArray();
Edit: If I understand you correctly you want the positions of the values in your old array. If so you can do this by creating a dictionary where the keys are the actual values and the values are indexes:
var oldValues = input.Split(' ');
var values = input.Split().Where(x => x != string.Empty).ToArray();
var indexes = values.ToDictionary(x => x, x => Array.IndexOf(oldValues, x));
Then indexes["1234567890"] will give you the position of 1234567890 in the first array.
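One caveat with the dictionary approach: ToDictionary throws if the same value appears twice, and Array.IndexOf only ever finds the first occurrence. If duplicates are possible, a sketch that carries the original index along with each value may be safer. TokenPositions and NonEmptyWithIndex are made-up names, and the tuple syntax assumes C# 7 or later.

```csharp
using System;
using System.Linq;

static class TokenPositions
{
    // Hypothetical helper: returns each non-empty token together with its index in input.Split(' ').
    public static (string Value, int OriginalIndex)[] NonEmptyWithIndex(string input)
    {
        return input.Split(' ')
            // Select's two-argument overload supplies the position in the original array.
            .Select((value, index) => (Value: value, OriginalIndex: index))
            .Where(t => t.Value.Length > 0)
            .ToArray();
    }
}
```

The i-th element of the result is the i-th entry of the cleaned-up array, and its OriginalIndex is where that entry sits in the array that still contains the empty strings.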
You can use StringSplitOptions.RemoveEmptyEntries:
string[] arr = str.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
Note that I've also added the tab character as a delimiter. There are other white-space characters, like the line separator character; add them as desired. Full list here.
string s = "90 524 000 1234567890 2207 1926 00:34 02:40 S ";
var values = s.Split(' ').Where(x => !String.IsNullOrWhiteSpace(x)).ToArray();

Using lambdas in C# to perform multiple functions on an array

I have a string which I would like to split on a particular delimiter and then remove starting and trailing whitespace from each member. Currently the code looks like:
string s = "A, B, C ,D";
string[] parts = s.Split(',');
for (int i = 0; i < parts.Length; i++)
{
parts[i] = parts[i].Trim();
}
I feel like there should be a way to do this with lambdas, so that it could fit on one line, but I can't wrap my head around it. I'd rather stay away from LINQ, but I'm not against it as a solution either.
string s = "A, B, C ,D";
string[] parts = s.Split(','); // This line should be able to perform the trims as well
I've been working in Python recently and I think that's what has made me revisit how I think about solutions to problems in C#.
What about:
string[] parts = s.Split(',').Select(x => x.Trim()).ToArray();
var parts = s.Split(',').Select(part => part.Trim());
If you really want to avoid LINQ, you can split on multiple characters and discard the extra "empty" entries you get between the "," and spaces. Note that you can end up getting odd results (e.g. if you have consecutive "," delimiters you won't get the empty string in between them anymore):
s.Split(new char[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
This will work for your sample input, but it's very fragile. For example, as @Oscar points out, whitespace inside your tokens will cause them to get split as well. I'd highly recommend you go with one of the LINQ-based options instead.
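If the goal is a lambda on one line without pulling in LINQ, Array.ConvertAll is another possibility. This is just a sketch; SplitAndTrim and SplitTrimmed are names invented for the example.

```csharp
using System;

static class SplitAndTrim
{
    // Hypothetical helper: splits on one delimiter and trims each piece, without LINQ.
    public static string[] SplitTrimmed(string s, char delimiter)
    {
        // Array.ConvertAll applies the lambda to every element and returns a new array.
        return Array.ConvertAll(s.Split(delimiter), part => part.Trim());
    }
}
```

Unlike splitting on both ',' and ' ', this keeps whitespace inside a token and keeps empty entries between consecutive delimiters.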

How to indicate whitespaces while reading from a .txt file

I have a simple .txt file with X,Y-values in it. It is structured like this:
-25.7754 35.87
-22.1233 32.16
-20.361 30.75
etc.
I am able to read single lines or the whole text to the end with objStream.ReadLine() and objStream.ReadToEnd().
But here's my question: how can I tell where the string for the first value ends, so I can parse it to a float and then proceed to read the next value?
Here is the read functionality I have so far :)
StreamReader objStream = new StreamReader("C:blablabla\\Text.asc");
textBox1.Text = objStream.ReadLine();
Thanks in advance,
BC++
Use String.Split()
As requested, an example :
string s = "there is a cat";
//
// Split string on spaces.
// ... This will separate all the words.
//
string[] words = s.Split(' ');
foreach (string word in words)
{
Console.WriteLine(word);
}
The output is :
there
is
a
cat
Look at the string.Split methods:
var line1 = objStream.ReadLine();
var lineParts = line1.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
textBox1.Text = lineParts[0];
textBox2.Text = lineParts[1];
Note the use of an overload that takes StringSplitOptions.RemoveEmptyEntries. This means that if you have multiple spaces in succession, the result will not contain empty entries.
If you really mean white-space and not space then you have to go this way:
string line = "-25.7754 35.87";
string[] values = line.Split(new char[] { }, StringSplitOptions.RemoveEmptyEntries);
The difference from the other answers is in the splitting character. If it is not defined, white-space characters are assumed to be the delimiters. In other words, you will get the same result for
string line = "-25.7754\t35.87"; // tab instead of spaces.
This gives you the flexibility to correctly split fixed-width or tab-delimited lines using the same code.
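Since the question also asks about turning the pieces into floats, here is a small sketch that combines the white-space split with float.Parse. PointParser and ParseLine are hypothetical names, and using CultureInfo.InvariantCulture is an assumption made so that '.' is always read as the decimal separator regardless of the machine's regional settings.

```csharp
using System;
using System.Globalization;

static class PointParser
{
    // Hypothetical helper: parses one "X Y" line into a pair of floats.
    public static (float X, float Y) ParseLine(string line)
    {
        // A null separator splits on any white-space (spaces or tabs).
        var parts = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 2)
            throw new FormatException("Expected two values per line: " + line);

        // Assumption: the file always uses '.' as the decimal separator.
        float x = float.Parse(parts[0], CultureInfo.InvariantCulture);
        float y = float.Parse(parts[1], CultureInfo.InvariantCulture);
        return (x, y);
    }
}
```

You could call ParseLine on each string returned by objStream.ReadLine() until ReadLine returns null.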

Read text file word-by-word using LINQ

I am learning LINQ, and I want to read a text file (let's say an e-book) word by word using LINQ.
This is what I could come up with:
static void Main()
{
string[] content = File.ReadAllLines("text.txt");
var query = (from c in content
select content);
foreach (var line in content)
{
Console.Write(line+"\n");
}
}
This reads the file line by line. If I change ReadAllLines to ReadAllText, the file is read letter by letter.
Any ideas?
string[] content = File.ReadAllLines("text.txt");
var words = content.SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
foreach(string word in words)
{
}
You'll need to add whatever whitespace characters you need. Using StringSplitOptions to deal with consecutive whitespaces is cleaner than the Where clause I originally used.
In .NET 4 you can use File.ReadLines for lazy evaluation and thus lower RAM usage when working on large files.
string str = File.ReadAllText("text.txt");
char[] separators = { '\n', ',', '.', ' ', '"', '\t' }; // add your own
var words = str.Split(separators, StringSplitOptions.RemoveEmptyEntries);
string content = File.ReadAllText("Text.txt");
var words = from word in content.Split(WhiteSpace, StringSplitOptions.RemoveEmptyEntries)
select word;
You will need to define the array of whitespace chars with your own values like so:
char[] WhiteSpace = { '\r', '\n', ' ', '\t' };
This code assumes that punctuation is a part of the word (like a comma).
It's probably better to read all the text using ReadAllText() then use regular expressions to get the words. Using the space character as a delimiter can cause some troubles as it will also retrieve punctuation (commas, dots .. etc). For example:
Regex re = new Regex("[a-zA-Z0-9_-]+", RegexOptions.Compiled); // You'll need to change the RE to fit your needs
Match m = re.Match(text);
while (m.Success)
{
string word = m.Value; // the pattern has no capture groups, so take the whole match
// do your processing here
m = m.NextMatch();
}
The following uses iterator blocks, and therefore uses deferred loading. Other solutions have you loading the entire file into memory before being able to iterate over the words.
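static IEnumerable<string> GetWords(string path){
foreach (var line in File.ReadLines(path)){
// A null separator means "split on any white-space character".
foreach (var word in line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)){
yield return word;
}
}
}
(Passing null as the separator splits on white-space; RemoveEmptyEntries drops the empty strings that consecutive white-space would otherwise produce.)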
Use it like this:
foreach (var word in GetWords(@"text.txt")){
Console.WriteLine(word);
}
Works with standard LINQ funness too:
GetWords(@"text.txt").Take(25);
GetWords(@"text.txt").Where(w => w.Length > 3);
Of course error handling etc. left out for sake of learning.
You could write content.ToList().ForEach(p => p.Split(' ').ToList().ForEach(Console.WriteLine)) but that's not a lot of linq.
