String split using C# - c#

I have the following string:
string text = "1. This is first sentence. 2. This is the second sentence. 3. This is the third sentence. 4. This is the fourth sentence.";
I want to split it according to 1. 2. 3. and so on:
result[0] == "This is first sentence."
result[1] == "This is the second sentence."
result[2] == "This is the third sentence."
result[3] == "This is the fourth sentence."
Is there any way I can do this in C#?

Assuming your sentences never contain the pattern "X. " (an integer, followed by a period, followed by a space), this should work:
String[] result = Regex.Split(text, @"[0-9]+\. ");
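One thing to watch with the split above: because the text begins with "1. ", the first element of the result is an empty string, and each piece except the last keeps the space that preceded the next marker. A minimal sketch that tidies both up, assuming the same "no X. inside a sentence" condition (the class and method names are made up for illustration):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberedSplitter
{
    // Hypothetical helper: splits on "N. " markers and cleans up the pieces.
    public static string[] SplitNumbered(string text)
    {
        return Regex.Split(text, @"[0-9]+\. ")
                    .Select(part => part.Trim())      // remove the space left before the next marker
                    .Where(part => part.Length > 0)   // remove the empty entry before the first marker
                    .ToArray();
    }
}
```

Calling NumberedSplitter.SplitNumbered(text) on the string from the question should then give exactly the four sentences.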

Is it possible that there will be numbers in the sentences too?
As I do not know your formatting, and you already said you cannot split on EOL/new line, I would try something like...
List<string> lines = new List<string>();
string buffer = "";
int count = 1;
foreach(char c in input)
{
if(c.ToString() == count.ToString())
{
if(!string.IsNullOrEmpty(buffer))
{
lines.Add(buffer);
buffer = "";
}
count++;
}
buffer += c;
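}
// Add the last sentence, which is still sitting in the buffer
if(!string.IsNullOrEmpty(buffer))
{
lines.Add(buffer);
}
//lines will now contain your split data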
You can then access each sentence like this...
string s1 = lines[0];
string s2 = lines[1];
string s3 = lines[2];
Important: Make sure you check the count of lines before getting a sentence, like...
string s1 = lines.Count > 0 ? lines[0] : "";
This makes a big assumption that you will not have the next line's number ID in a given sentence (i.e. sentence 2 will not contain the number 3).
If this does not help, then provide your input in its original format (do not add line breaks if there are none).
EDIT: Fixed my code (wrong variable sorry)

int index = 1;
String[] result = Regex.Split(text, @"[0-9]+\. ").Where(i => !string.IsNullOrEmpty(i)).Select(i => (index++).ToString() + ". " + i).ToArray();
result will contain your sentences, including the "line number".

You could split on the '.' char and drop anything smaller than 2 char from the resulting array.
Of course, this relies on you having no one-character pieces other than the numeric indicators. If that could happen, you could also check whether the piece is a numeric value.
This approach also drops the period from your sentences, so you'd have to add that back in. It takes a fair amount of manipulation, but it saves you from having to read each char and make a decision about it individually.
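A rough sketch of the split-on-'.' idea described above, using the numeric check instead of the length check so that markers like "10" are also dropped. The names are hypothetical, and it still assumes that periods appear only after a marker or at the end of a sentence:

```csharp
using System;
using System.Collections.Generic;

public static class DotSplitter
{
    // Hypothetical helper: splits on '.', skips the numeric markers,
    // and puts the period back on each sentence.
    public static List<string> SplitOnDots(string text)
    {
        var sentences = new List<string>();
        foreach (string piece in text.Split('.'))
        {
            string trimmed = piece.Trim();
            if (trimmed.Length == 0) continue;           // empty piece after the final period
            if (int.TryParse(trimmed, out _)) continue;  // a marker such as "1" or "10"
            sentences.Add(trimmed + ".");                // restore the period removed by Split
        }
        return sentences;
    }
}
```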

This is the easiest way:
var str = "1. This is first sentence." +
"2. This is the second sentence." +
"3. This is the third sentence." +
"n. This is the nth sentence";
//set your max number, e.g. 10000
var num = Enumerable.Range(1, 10000).Select(x=>x.ToString()+".").ToArray();
var res=str.Split(num ,StringSplitOptions.RemoveEmptyEntries);
Hope this helps ;)

Related

Take string before and after 'First' space character

I have a phrase of several words inside a string. Example:
White wine extra offer.
I want to put 'White' on the first line and 'wine extra offer' on the second.
Using the code below:
string value="White wine extra offer";
value = value.Split(' ').FirstOrDefault() + ' ' + Environment.NewLine + value.Split(' ').LastOrDefault();
The output I get is White/r offer.
I'm getting the word after the last space instead of everything after the first.
You can find the index of the first space and use substring I suppose.
string value = "White wine extra offer";
var spaceIndex = value.IndexOf(" ");
var firstLine = value.Substring(0, spaceIndex);
var secondLine = value.Substring(spaceIndex + 1);
var fullText = $"{firstLine}{Environment.NewLine}{secondLine}";
Your issue comes from how you are splitting your content. Splitting on a space creates an array with four elements, and LastOrDefault() returns only the last one. You can solve this with a couple of different approaches.
var sentence = "White wine extra offer";
var words = sentence.Split(' ');
var white = words.FirstOrDefault();
var wineExtraOffer = String.Join(" ", words.Skip(1));
You should also realize that if you apply LINQ to a string directly, it is treated as a sequence of char. So make sure you do not reuse the same variable for a series of LINQ calls while assigning values to it.
It can be done this way:
string value="White wine extra offer";
string[] words = value.Split(' ');
// Take the first word and add break line
value = words[0] + Environment.NewLine;
// Add the rest of the phrase
for(int i = 1; i < words.Length; ++i)
value += (i > 1 ? " " : "") + words[i];
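Another option, sketched here with a made-up helper name, is the String.Split overload that takes a maximum number of pieces. With a count of 2, everything after the first space stays together in the second element:

```csharp
using System;

public static class FirstSpaceSplitter
{
    // Hypothetical helper: puts the first word on one line and the rest on the next.
    public static string BreakAfterFirstWord(string value)
    {
        // A count of 2 means: split at the first space only.
        string[] parts = value.Split(new[] { ' ' }, 2);
        return parts.Length == 2
            ? parts[0] + Environment.NewLine + parts[1]
            : value; // no space found, nothing to break
    }
}
```

For "White wine extra offer" this should produce "White", a line break, and then "wine extra offer".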

Extracting a string before space and sequence of numbers

I have a string that contains numbers, a dash, and more numbers, so it can be
1-2
234-45
23-8
It can be any sequence of digits, up to 12 characters.
Each of these number sequences is preceded by a string. I need to extract the string that comes before the sequence begins.
This is a Test1 1-2
This is a test for the first time 234-45
This is a test that is good 23-8
so I need to extract
This is a Test1
This is a test for the first time
This is a test that is good
There is only one space between this string and the sequence.
Is there any way I can extract that string? The Split method is not working here.
I forgot to mention that I have numbers/text before the string too, so it can be
2123 This is a test for the first time 23-456
or
Ac23 This is a test for the first time 23-457
Any help will be appreciated.
Here's one way:
var sample = "2123 This is a Test1 1-2";
// Find the first occurrence of a space, and record the position of
// the next letter
var start = sample.IndexOf(' ') + 1;
// Pull from the string everything starting with the index found above
// to the last space (accounting for the difference from the starting index)
var text = sample.Substring(start, sample.LastIndexOf(' ') - start);
After this, text should equal:
This is a Test1
Wrap it up in a nice little function and send your collection of strings through it:
string ParseTextFromLine(string input)
{
var start = input.IndexOf(' ') + 1;
return input.Substring(start, input.LastIndexOf(' ') - start);
}
This is pretty easy,
string s = "This is a Test1 1-2";
s = s.Substring(0, s.LastIndexOf(" "));
and now s will be "This is a Test1"
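If you also want to be sure that the last token really is a digits-dash-digits sequence before cutting it off, a regex is one possibility. This is only a sketch with an invented helper name. It leaves any leading token such as "2123" or "Ac23" in place, and it returns the input unchanged when the pattern is not found:

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingSequence
{
    // Hypothetical helper: removes a trailing " <digits>-<digits>" from the input.
    public static string Strip(string input)
    {
        // One space, digits, a dash, digits, then the end of the string.
        return Regex.Replace(input, @" \d+-\d+$", "");
    }
}
```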

RegEx - Find and Replace while ignoring a number in the middle?

In the middle of a long string, I am looking for "No. 1234. "
The number (1234) in my example above can be any length whole number. It also has to match on the space at the end.
So I am looking for examples:
1) This is a test No. 42. Hello Nice People
2) I have no idea wtf No. 1234412344124. I am doing.
I have figured out a way to match on this pattern with the following regex:
(No. [\d]{1,}. )
What I cannot figure out, though, is how to do one simple thing when finding a match: Replace that last period with a darn comma!
So, with the two examples up above, I want to transform them into:
1) This is a test No. 42, Hello Nice People
2) I have no idea wtf No. 1234412344124, I am doing.
(Notice the commas now after the numbers)
How might one do this in C# and RegEx? Thank you!
EDIT:
Another way of looking at this is...
I can do this easily and have for years:
str = Replace(str, "Find this", "Replace it with this")
However, how can I do that by combining regex and the unknown portion of the string in the middle to replace the last period (not to be confused with the last character since the last character still needs to be a space)
This is a test No. 42. Hello Nice People
This is a test No. (some unknown length number). Hello Nice People
becomes
This is a test No. 42, Hello Nice People
This is a test No. (some unknown length number), Hello Nice People
(Notice the comma)
So you are essentially trying to match two adjacent groups, "\d+" and ". " then replace the second with ", ".
var r = new Regex(@"(\d+)(\. )");
var input = "This is a test No. 42. Hello Nice People";
var output = r.Replace(input, "$1, ");
Use parentheses to capture two groups, then in the replacement keep the first group and substitute ", " for the second.
Edit: derp, escape that period.
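Note that a pattern built only from the digits and the period will also rewrite any other "digits, period, space" run in the string. A variation that anchors on the literal "No. " prefix from the question might look like this (the helper name is made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class NumberPeriodFixer
{
    // Hypothetical helper: turns "No. 42. " into "No. 42, ".
    public static string PeriodToComma(string input)
    {
        // Group 1 keeps "No. <digits>"; the escaped period and space after it
        // are replaced by a comma and a space.
        return Regex.Replace(input, @"(No\. \d+)\. ", "$1, ");
    }
}
```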
Edit - #1:
neilh's way is much better!
OK, I know the code looks ugly. I don't know how to edit the last char of a match directly in a regex.
string[] stringhe = new string[5] {
"This is a test No. 42, Hello Nice People",
"I have no idea wtf No. 1234412344124. I am doing.",
"Very long No. 74385748957348957893458934; Hello World",
"Nope No. 48394839!!!",
"Nope"
};
Regex reg = new Regex(@"No.\s*([0-9]+)");
Match match;
int idx = 0;
StringBuilder builder;
foreach(string stringa in stringhe)
{
match = reg.Match(stringa);
if (match.Success)
{
Console.WriteLine("No. Stringa #" + idx + ": " + stringhe[idx]);
int indexEnd = match.Groups[1].Index + match.Groups[1].Length;
builder = new StringBuilder(stringa);
builder[indexEnd] = '.';
stringhe[idx] = builder.ToString();
Console.WriteLine("New String: " + stringhe[idx]);
}
++idx;
}
Console.ReadKey(true);
If you want to edit the char after the number only if it's a ',':
int indexEnd = match.Groups[1].Index + match.Groups[1].Length;
if (stringa[indexEnd] == ',')
{
builder = new StringBuilder(stringa);
builder[indexEnd] = '.';
stringhe[idx] = builder.ToString();
Console.WriteLine("New String: " + stringhe[idx]);
}
Or, better anyway, we can edit the regex so that it matches only when the number is followed by a comma:
No.\s*([0-9]+),
I'm not the best at Regex, but this should do what you want.
No.\s+([0-9]+)
If you expect zero or more whitespace characters between No. and {NUMBER}, this regex should do the work:
No.\s*([0-9]+)
An example of how the C# code could look:
string[] stringhe = new string[4] {
"This is a test No. 42, Hello Nice People",
"I have no idea wtf No. 1234412344124. I am doing.",
"Very long No. 74385748957348957893458934; Hello World",
"Nope No. 48394839!!!"
};
Regex reg = new Regex(@"No.\s+([0-9]+)");
Match match;
int idx = 0;
foreach(string stringa in stringhe)
{
match = reg.Match(stringa);
if (match.Success)
{
Console.WriteLine("No. Stringa #" + idx + ": " + match.Groups[1].Value);
}
++idx;
}
Here is the code :
private string Format(string input)
{
Match m = new Regex("No. [0-9]*.").Match(input);
int targetIndex = m.Index + m.Length - 1;
return input.Remove(targetIndex, 1).Insert(targetIndex, ",");
}

Length of string WITHOUT spaces (C#)

Quick little question...
I need to count the length of a string, but WITHOUT the spaces inside of it.
E.g. for a string like "I am Bob", string.Length would return 8 (6 letters + 2 spaces).
I need a method, or something, to give me the length (or number of) just the letters (6 in the case of "I am Bob")
I have tried the following
s.Replace (" ", "");
s.Replace (" ", null);
s.Replace (" ", string.Empty);
to try and get "IamBob", which I did, but it didn't solve my problem because it still counted "" as a character.
Any help?
This returns the number of non-whitespace characters:
"I am Bob".Count(c => !Char.IsWhiteSpace(c));
Char.IsWhiteSpace:
White space characters are the following Unicode characters:
Members of the SpaceSeparator category, which includes the characters SPACE (U+0020), OGHAM SPACE MARK (U+1680), MONGOLIAN VOWEL SEPARATOR (U+180E), EN QUAD (U+2000), EM QUAD (U+2001), EN SPACE (U+2002), EM SPACE (U+2003), THREE-PER-EM SPACE (U+2004), FOUR-PER-EM SPACE (U+2005), SIX-PER-EM SPACE (U+2006), FIGURE SPACE (U+2007), PUNCTUATION SPACE (U+2008), THIN SPACE (U+2009), HAIR SPACE (U+200A), NARROW NO-BREAK SPACE (U+202F), MEDIUM MATHEMATICAL SPACE (U+205F), and IDEOGRAPHIC SPACE (U+3000).
Members of the LineSeparator category, which consists solely of the LINE SEPARATOR character (U+2028).
Members of the ParagraphSeparator category, which consists solely of the PARAGRAPH SEPARATOR character (U+2029).
The characters CHARACTER TABULATION (U+0009), LINE FEED (U+000A), LINE TABULATION (U+000B), FORM FEED (U+000C), CARRIAGE RETURN (U+000D), NEXT LINE (U+0085), and NO-BREAK SPACE (U+00A0).
No. It doesn't.
string s = "I am Bob";
Console.WriteLine(s.Replace(" ", "").Length); // 6
Console.WriteLine(s.Replace(" ", null).Length); //6
Console.WriteLine(s.Replace(" ", string.Empty).Length); //6
Here is a DEMO.
But what are whitespace characters?
http://en.wikipedia.org/wiki/Whitespace_character
You probably forgot to reassign the result of Replace. Try this:
string s = "I am bob";
Console.WriteLine(s.Length); // 8
s = s.Replace(" ", "");
Console.WriteLine(s.Length); // 6
A pretty simple way is to write an extension method that will do just that- count the characters without the white spaces. Here's the code:
public static class MyExtension
{
public static int CharCountWithoutSpaces(this string str)
{
string[] arr = str.Split(' ');
string allChars = "";
foreach (string s in arr)
{
allChars += s;
}
int length = allChars.Length;
return length;
}
}
To execute, simply call the method on the string:
string yourString = "I am Bob";
int count = yourString.CharCountWithoutSpaces();
Console.WriteLine(count); //=6
Alternatively, you can split the string any way you want if you don't want to include, say, periods or commas:
string[] arr = str.Split('.');
or:
string[] arr = str.Split(',');
This is the fastest way:
var spaceCount = 0;
for (var i = 0; i < @string.Length; i++)
{
if (@string[i] == ' ') spaceCount++;
}
var res = @string.Length - spaceCount;
Your problem is probably related to Replace() method not actually changing the string, rather returning the replaced value;
string withSpaces = "I am Bob";
string withoutSpaces = withSpaces.Replace(" ","");
Console.WriteLine(withSpaces);
Console.WriteLine(withoutSpaces);
Console.WriteLine(withSpaces.Length);
Console.WriteLine(withoutSpaces.Length);
//output
//I am Bob
//IamBob
//8
//6
You can use a combination of Length and Count functions on the string object. Here is a simple example.
string sText = "This is great text";
int nNonSpaces = sText.Length - sText.Count(Char.IsWhiteSpace);
This handles single or multiple consecutive spaces correctly.
Hope it helps.

How to find the number of occurrences of a letter in only the first sentence of a string?

I want to find the number of occurrences of the letter "a" in only the first sentence. The code below counts "a" in all sentences, but I want to count only in the first sentence.
static void Main(string[] args)
{
string text; int k = 0;
text = "bla bla bla. something second. maybe last sentence.";
foreach (char a in text)
{
char b = 'a';
if (b == a)
{
k += 1;
}
}
Console.WriteLine("number of a in first sentence is " + k);
Console.ReadKey();
}
This splits the string into an array separated by '.', '!' or '?', then counts the number of 'a' chars in the first element of the array (the first sentence).
var count = text.Split(new[] { '.', '!', '?' })[0].Count(c => c == 'a');
This example assumes a sentence is separated by a ., ? or !. If you have a decimal number in your string (e.g. 123.456), that will count as a sentence break. Breaking up a string into accurate sentences is a fairly complex exercise.
This is perhaps more verbose than what you were looking for, but hopefully it'll breed understanding as you read through it.
public static void Main()
{
//Make an array of the possible sentence enders. Doing this pattern lets us easily update
// the code later if it becomes necessary, or allows us easily to move this to an input
// parameter
string[] SentenceEnders = new string[] {"$", @"\.", @"\?", @"\!" /* Add Any Others */};
string WhatToFind = "a"; //What are we looking for? Regular Expressions Will Work Too!!!
string SentenceToCheck = "This, but not to exclude any others, is a sample."; //First example
string MultipleSentencesToCheck = @"
Is this a sentence
that breaks up
among multiple lines?
Yes!
It also has
more than one
sentence.
"; //Second Example
//This will split the input on all the enders put together(by way of joining them in [] inside a regular
// expression.
string[] SplitSentences = Regex.Split(SentenceToCheck, "[" + String.Join("", SentenceEnders) + "]", RegexOptions.IgnoreCase);
//SplitSentences is an array, with sentences on each index. The first index is the first sentence
string FirstSentence = SplitSentences[0];
//Now, split that single sentence on our matching pattern for what we should be counting
string[] SubSplitSentence = Regex.Split(FirstSentence, WhatToFind, RegexOptions.IgnoreCase);
//Now that it's split, it's split a number of times that matches how many matches we found, plus one
// (the "left over" piece is the +1)
int HowMany = SubSplitSentence.Length - 1;
System.Console.WriteLine(string.Format("We found, in the first sentence, {0} '{1}'.", HowMany, WhatToFind));
//Do all this again for the second example. Note that ideally, this would be in a separate function
// and you wouldn't be writing code twice, but I wanted you to see it without all the comments so you can
// compare and contrast
SplitSentences = Regex.Split(MultipleSentencesToCheck, "[" + String.Join("", SentenceEnders) + "]", RegexOptions.IgnoreCase | RegexOptions.Singleline);
SubSplitSentence = Regex.Split(SplitSentences[0], WhatToFind, RegexOptions.IgnoreCase | RegexOptions.Singleline);
HowMany = SubSplitSentence.Length - 1;
System.Console.WriteLine(string.Format("We found, in the second sentence, {0} '{1}'.", HowMany, WhatToFind));
}
Here is the output:
We found, in the first sentence, 3 'a'.
We found, in the second sentence, 4 'a'.
You didn't define "sentence", but if we assume it's always terminated by a period (.), just add this inside the loop:
if (a == '.') {
break;
}
Expand from this to support other sentence delimiters.
Simply "break" the foreach(...) loop when you encounter a "." (period)
Well, assuming you define a sentence as ending with a '.':
Use String.IndexOf() to find the position of the first '.'. After that, search in a Substring instead of the entire string.
Find the position of the '.' in the text (you can use Split).
Count the 'a' characters in the text from position 0 to the first '.'.
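The IndexOf/Substring steps described above could be sketched like this. The class and method names are invented, and it assumes a sentence ends at the first '.'. If there is no period, the whole text is treated as the first sentence:

```csharp
using System;
using System.Linq;

public static class FirstSentenceCounter
{
    // Hypothetical helper: counts a letter in the text up to the first period.
    public static int CountInFirstSentence(string text, char letter)
    {
        int end = text.IndexOf('.');
        // No period found: fall back to the whole text.
        string firstSentence = end >= 0 ? text.Substring(0, end) : text;
        return firstSentence.Count(c => c == letter);
    }
}
```

For the text in the question, CountInFirstSentence(text, 'a') should give 3, one for each "bla".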
string SentenceToCheck = "Hi, I can wonder this situation where I can do best";
//Here are several ways to find this
//Using regular expressions
int HowMany = Regex.Split(SentenceToCheck, "a", RegexOptions.IgnoreCase).Length - 1;
int i = Regex.Matches(SentenceToCheck, "a").Count;
// Simple way
int Count = SentenceToCheck.Length - SentenceToCheck.Replace("a", "").Length;
//Linq
var _lamdaCount = SentenceToCheck.ToCharArray().Where(t => t.ToString() != string.Empty)
.Count(t => t.ToString().ToUpper().Equals("A"));
var _linqAIEnumareable = from _char in SentenceToCheck.ToCharArray()
where !String.IsNullOrEmpty(_char.ToString())
&& _char.ToString().ToUpper().Equals("A")
select _char;
int a = _linqAIEnumareable.Count();
var _linqCount = from g in SentenceToCheck.ToCharArray()
where g.ToString().Equals("a")
select g;
int b = _linqCount.Count();
