Regex. Camel case to underscore. Ignore first occurrence - c#

For example:
thisIsMySample
should be:
this_Is_My_Sample
My code:
System.Text.RegularExpressions.Regex.Replace(input, "([A-Z])", "_$0", System.Text.RegularExpressions.RegexOptions.Compiled);
It works fine, but if the input is changed to:
ThisIsMySample
the output will be:
_This_Is_My_Sample
How can first occurrence be ignored?

Non-Regex solution
string result = string.Concat(input.Select((x,i) => i > 0 && char.IsUpper(x) ? "_" + x.ToString() : x.ToString()));
Seems to be quite fast too: Regex: 2569ms, C#: 1489ms
Stopwatch stp = new Stopwatch();
stp.Start();
for (int i = 0; i < 1000000; i++)
{
string input = "ThisIsMySample";
string result = System.Text.RegularExpressions.Regex.Replace(input, "(?<=.)([A-Z])", "_$0",
System.Text.RegularExpressions.RegexOptions.Compiled);
}
stp.Stop();
MessageBox.Show(stp.ElapsedMilliseconds.ToString());
// Result 2569ms
Stopwatch stp2 = new Stopwatch();
stp2.Start();
for (int i = 0; i < 1000000; i++)
{
string input = "ThisIsMySample";
string result = string.Concat(input.Select((x, j) => j > 0 && char.IsUpper(x) ? "_" + x.ToString() : x.ToString()));
}
stp2.Stop();
MessageBox.Show(stp2.ElapsedMilliseconds.ToString());
// Result: 1489ms

You can use a lookbehind to ensure that each match is preceded by at least one character:
System.Text.RegularExpressions.Regex.Replace(input, "(?<=.)([A-Z])", "_$0",
System.Text.RegularExpressions.RegexOptions.Compiled);
lookaheads and lookbehinds allow you to make assertions about the text surrounding a match without including that text within the match.

Maybe like;
var str = Regex.Replace(input, "([A-Z])", "_$0", RegexOptions.Compiled);
if(str.StartsWith("_"))
str = str.SubString(1);

// (Preceded by a lowercase character or digit) (a capital) => The character prefixed with an underscore
var result = Regex.Replace(input, "(?<=[a-z0-9])[A-Z]", m => "_" + m.Value);
result = result.ToLowerInvariant();
This works for both PascalCase and camelCase.
It creates no leading or trailing underscores.
It leaves in tact any sequences of non-word characters and underscores in the string, because they would seem intentional, e.g. __HiThere_Guys becomes __hi_there_guys.
Digit suffixes are (intentionally) considered part of the word, e.g. NewVersion3 becomes new_version3.
Digit prefixes follow the original casing, e.g. 3VersionsHere becomes 3_versions_here, but 3rdVersion becomes 3rd_version.
Unfortunately, capitalized two-letter acronyms (e.g. in IDNumber, where ID would be considered a separate word), as suggested in Microsoft's Capitalization Conventions, are not supported, since they conflict with other cases. I recommend, in general, to resist this guideline, as it is a seemingly arbitrary exception to the convention of not capitalizing acronyms. Stick with IdNumber.

Elaborating on sa_ddam213's solution, mine extends this:
public static string GetConstStyleName(this string value)
{
return string.Concat(value.Select((x, i) =>
{
//want to avoid putting underscores between pairs of upper-cases or pairs of numbers, or adding redundant underscores if they already exist.
bool isPrevCharLower = (i == 0) ? false : char.IsLower(value[i - 1]);
bool isPrevCharNumber = (i == 0) ? false : char.IsNumber(value[i - 1]);
return (isPrevCharLower && (char.IsUpper(x) || char.IsNumber(x))) //lower-case followed by upper-case or number needs underscore
|| (isPrevCharNumber && (char.IsUpper(x))) //number followed by upper-case needs underscore
? "_" + x.ToString() : x.ToString();
})).ToUpperInvariant();
}

Use ".([A-Z])" for your regular expression, and then "_$1" for the replacement. So you use the captured string for the replacement and with the leading . you are sure you are not catching the first char of your string.

You need to modify your regex to not match the first char by defining you want to ignore the first char at all by
.([A-Z])
The above regex simply excludes every char that comes first and since it is not in the braces it would be in the matching group.
Now you need to match the second group like Bibhu noted:
System.Text.RegularExpressions.Regex.Replace(s, "(.)([A-Z])", "$1_$2", System.Text.RegularExpressions.RegexOptions.Compiled);

Related

How to remove multiple, repeating & unnecessary punctuation from string in C#?

Considering strings like this:
"This is a string....!"
"This is another...!!"
"What is this..!?!?"
...
// There are LOTS of examples of weird/angry sentence-endings like the ones above.
I want to replace the unnecessary punctuation at the end to make it look like this:
"This is a string!"
"This is another!"
"What is this?"
What I basically do is:
- split by space
- check if last char in string contains a punctuation
- start replacing with the patterns below
I have tried a very big ".Replace(string, string)" function, but it does not work - there has to be a simpler regex I guess.
Documentation:
Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string.
As well as:
Because this method returns the modified string, you can chain together successive calls to the Replace method to perform multiple replacements on the original string.
Anything is wrong here.
EDIT: ALL the proposed solutions work fine! Thank you very much!
This one was the best suited solution for my project:
Regex re = new Regex("[.?!]*(?=[.?!]$)");
string output = re.Replace(input, "");
Your solution works almost fine (demo), the only issue is when the same sequence could be matched starting at different spots. For example, ..!?!? from your last line is not part of the substitution list, so ..!? and !? get replaced by two separate matches, producing ?? in the output.
It looks like your strategy is pretty straightforward: in a chain of multiple punctuation characters the last character wins. You can use regular expressions to do the replacement:
[!?.]*([!?.])
and replace it with $1, i.e. the capturing group that has the last character:
string s;
while ((s = Console.ReadLine()) != null) {
s = Regex.Replace(s, "[!?.]*([!?.])", "$1");
Console.WriteLine(s);
}
Demo
Simply
[.?!]*(?=[.?!]$)
should do it for you. Like
Regex re = new Regex("[.?!]*(?=[.?!]$)");
Console.WriteLine(re.Replace("This is a string....!", ""));
This replaces all punctuations but the last with nothing.
[.?!]* matches any number of consecutive punctuation characters, and the (?=[.?!]$) is a positive lookahead making sure it leaves one at the end of the string.
See it here at ideone.
Or you can do it without regExps:
string TrimPuncMarks(string str)
{
HashSet<char> punctMarks = new HashSet<char>() {'.', '!', '?'};
int i = str.Length - 1;
for (; i >= 0; i--)
{
if (!punctMarks.Contains(str[i]))
break;
}
// the very last punct mark or null if there were no any punct marks in the end
char? suffix = i < str.Length - 1 ? str[str.Length - 1] : (char?)null;
return str.Substring(0, i+1) + suffix;
}
Debug.Assert("What is this?" == TrimPuncMarks("What is this..!?!?"));
Debug.Assert("What is this" == TrimPuncMarks("What is this"));
Debug.Assert("What is this." == TrimPuncMarks("What is this."));

c# How do trim all non numeric character in a string

what is the faster way to trim all alphabet in a string that have alphabet prefix.
For example, input sting "ABC12345" , and i wish to havee 12345 as output only.
Thanks.
Please use "char.IsDigit", try this:
static void Main(string[] args)
{
var input = "ABC12345";
var numeric = new String(input.Where(char.IsDigit).ToArray());
Console.Read();
}
You can use Regular Expressions to trim an alphabetic prefix
var input = "ABC123";
var trimmed = Regex.Replace(input, #"^[A-Za-z]+", "");
// trimmed = "123"
The regular expression (second parameter) ^[A-Za-z]+ of the replace method does most of the work, it defines what you want to be replaced using the following rules:
The ^ character ensures a match only exists at the start of a string
The [A-Za-z] will match any uppercase or lowercase letters
The + means the upper or lowercase letters will be matched as many times in a row as possible
As this is the Replace method, the third parameter then replaces any matches with an empty string.
The other answers seem to answer what is the slowest way .. so if you really need the fastest way, then you can find the index of the first digit and get the substring:
string input = "ABC12345";
int i = 0;
while ( input[i] < '0' || input[i] > '9' ) i++;
string output = input.Substring(i);
The shortest way to get the value would probably be the VB Val method:
double value = Microsoft.VisualBasic.Conversion.Val("ABC12345"); // 12345.0
You would have to regular expression. It seems you are looking for only digits and not letters.
Sample:
string result =
System.Text.RegularExpressions.Regex.Replace("Your input string", #"\D+", string.Empty);

How to find the number of occurrences of a letter in only the first sentence of a string?

I want to find number of letter "a" in only first sentence. The code below finds "a" in all sentences, but I want in only first sentence.
static void Main(string[] args)
{
string text; int k = 0;
text = "bla bla bla. something second. maybe last sentence.";
foreach (char a in text)
{
char b = 'a';
if (b == a)
{
k += 1;
}
}
Console.WriteLine("number of a in first sentence is " + k);
Console.ReadKey();
}
This will split the string into an array seperated by '.', then counts the number of 'a' char's in the first element of the array (the first sentence).
var count = Text.Split(new[] { '.', '!', '?', })[0].Count(c => c == 'a');
This example assumes a sentence is separated by a ., ? or !. If you have a decimal number in your string (e.g. 123.456), that will count as a sentence break. Breaking up a string into accurate sentences is a fairly complex exercise.
This is perhaps more verbose than what you were looking for, but hopefully it'll breed understanding as you read through it.
public static void Main()
{
//Make an array of the possible sentence enders. Doing this pattern lets us easily update
// the code later if it becomes necessary, or allows us easily to move this to an input
// parameter
string[] SentenceEnders = new string[] {"$", #"\.", #"\?", #"\!" /* Add Any Others */};
string WhatToFind = "a"; //What are we looking for? Regular Expressions Will Work Too!!!
string SentenceToCheck = "This, but not to exclude any others, is a sample."; //First example
string MultipleSentencesToCheck = #"
Is this a sentence
that breaks up
among multiple lines?
Yes!
It also has
more than one
sentence.
"; //Second Example
//This will split the input on all the enders put together(by way of joining them in [] inside a regular
// expression.
string[] SplitSentences = Regex.Split(SentenceToCheck, "[" + String.Join("", SentenceEnders) + "]", RegexOptions.IgnoreCase);
//SplitSentences is an array, with sentences on each index. The first index is the first sentence
string FirstSentence = SplitSentences[0];
//Now, split that single sentence on our matching pattern for what we should be counting
string[] SubSplitSentence = Regex.Split(FirstSentence, WhatToFind, RegexOptions.IgnoreCase);
//Now that it's split, it's split a number of times that matches how many matches we found, plus one
// (The "Left over" is the +1
int HowMany = SubSplitSentence.Length - 1;
System.Console.WriteLine(string.Format("We found, in the first sentence, {0} '{1}'.", HowMany, WhatToFind));
//Do all this again for the second example. Note that ideally, this would be in a separate function
// and you wouldn't be writing code twice, but I wanted you to see it without all the comments so you can
// compare and contrast
SplitSentences = Regex.Split(MultipleSentencesToCheck, "[" + String.Join("", SentenceEnders) + "]", RegexOptions.IgnoreCase | RegexOptions.Singleline);
SubSplitSentence = Regex.Split(SplitSentences[0], WhatToFind, RegexOptions.IgnoreCase | RegexOptions.Singleline);
HowMany = SubSplitSentence.Length - 1;
System.Console.WriteLine(string.Format("We found, in the second sentence, {0} '{1}'.", HowMany, WhatToFind));
}
Here is the output:
We found, in the first sentence, 3 'a'.
We found, in the second sentence, 4 'a'.
You didn't define "sentence", but if we assume it's always terminated by a period (.), just add this inside the loop:
if (a == '.') {
break;
}
Expand from this to support other sentence delimiters.
Simply "break" the foreach(...) loop when you encounter a "." (period)
Well, assuming you define a sentence as being ended with a '.''
Use String.IndexOf() to find the position of the first '.'. After that, searchin a SubString instead of the entire string.
find the place of the '.' in the text ( you can use split )
count the 'a' in the text from the place 0 to instance of the '.'
string SentenceToCheck = "Hi, I can wonder this situation where I can do best";
//Here I am giving several way to find this
//Using Regular Experession
int HowMany = Regex.Split(SentenceToCheck, "a", RegexOptions.IgnoreCase).Length - 1;
int i = Regex.Matches(SentenceToCheck, "a").Count;
// Simple way
int Count = SentenceToCheck.Length - SentenceToCheck.Replace("a", "").Length;
//Linq
var _lamdaCount = SentenceToCheck.ToCharArray().Where(t => t.ToString() != string.Empty)
.Select(t => t.ToString().ToUpper().Equals("A")).Count();
var _linqAIEnumareable = from _char in SentenceToCheck.ToCharArray()
where !String.IsNullOrEmpty(_char.ToString())
&& _char.ToString().ToUpper().Equals("A")
select _char;
int a =linqAIEnumareable.Count;
var _linqCount = from g in SentenceToCheck.ToCharArray()
where g.ToString().Equals("a")
select g;
int a = _linqCount.Count();

.Net Removing all the first 0 of a string

I got the following :
01.05.03
I need to convert that to 1.5.3
The problem is I cannot only trim the 0 because if I got :
01.05.10
I need to convert that to 1.5.10
So, what's the better way to solve that problem ? Regex ? If so, any regex example doing that ?
Expanding on the answer of #FrustratedWithFormsDesigner:
string Strip0s(string s)
{
return string.Join<int>(".", from x in s.Split('.') select int.Parse(x));
}
Regex-replace
(?<=^|\.)0+
with the empty string. The regex is:
(?<= # begin positive look-behind (i.e. "a position preceded by")
^|\. # the start of the string or a literal dot †
) # end positive look-behind
0+ # one or more "0" characters
† note that not all regex flavors support variable-length look-behind, but .NET does.
If you expect this kind of input: "00.03.03" and want to to keep the leading zero in this case (like "0.3.3"), use this expression instead:
(?<=^|\.)0+(?=\d)
and again replace with the empty string.
From the comments (thanks Kobi): There is a more concise expression that does not require look-behind and is equivalent to my second suggestion:
\b0+(?=\d)
which is
\b # a word boundary (a position between a word char and a non-word char)
0+ # one or more "0" characters
(?=\d) # positive look-ahead: a position that's followed by a digit
This works because the 0 happens to be a word character, so word boundaries can be used to find the first 0 in a row. It is a more compatible expression, because many regex flavors do not support variable-length look-behind, and some (like JavaScript) no look-behind at all.
You could split the string on ., then trim the leading 0s on the results of the split, then merge them back together.
I don't know of a way to do this in a single operation, but you could write a function that hides this and makes it look like a single operation. ;)
UPDATE:
I didn't even think of the other guy's regex. Yeah, that will probably do it in a single operation.
Here's another way you could do what FrustratedWithFormsDesigner suggests:
string s = "01.05.10";
string s2 = string.Join(
".",
s.Split('.')
.Select(str => str.TrimStart('0'))
.ToArray()
);
This is almost the same as dtb's answer, but doesn't require that the substrings be valid integers (it would also work with, e.g., "000A.007.0HHIMARK").
UPDATE: If you'd want any strings consisting of all 0s in the input string to be output as a single 0, you could use this:
string s2 = string.Join(
".",
s.Split('.')
.Select(str => TrimLeadingZeros(str))
.ToArray()
);
public static string TrimLeadingZeros(string text) {
int number;
if (int.TryParse(text, out number))
return number.ToString();
else
return text.TrimStart('0');
}
Example input/output:
00.00.000A.007.0HHIMARK // input
0.0.A.7.HHIMARK // output
There's also the old-school way which probably has better performance characteristics than most other solutions mentioned. Something like:
static public string NormalizeVersionString(string versionString)
{
if(versionString == null)
throw new NullArgumentException("versionString");
bool insideNumber = false;
StringBuilder sb = new StringBuilder(versionString.Length);
foreach(char c in versionString)
{
if(c == '.')
{
sb.Append('.');
insideNumber = false;
}
else if(c >= '1' && c <= '9')
{
sb.Append(c);
insideNumber = true;
}
else if(c == '0')
{
if(insideNumber)
sb.Append('0');
}
}
return sb.ToString();
}
string s = "01.05.10";
string newS = s.Replace(".0", ".");
newS = newS.StartsWith("0") ? newS.Substring(1, newS.Length - 1) : newS;
Console.WriteLine(newS);
NOTE: You will have to thoroughly check for possible input combination.
This looks like it is a date format, if so I would use Date processing code
DateTime time = DateTime.Parse("01.02.03");
String newFormat = time.ToString("d.M.yy");
or even better
String newFormat = time.ToShortDateString();
which will respect you and your clients culture setting.
If this data is not a date then don't use this :)
I had a similar requirement to parse a string with street adresses, where some of the house numbers had leading zeroes and I needed to remove them while keeping the rest of the text intact, so I slightly edited the accepted answer to meet my requirements, maybe someone finds it useful. Basically doing the same as accepted answer, with the difference that I am checking if the string part can be parsed as an integer, and defaulting to the string value when false;
string Strip0s(string s)
{
int outputValue;
return
string.Join(" ",
from x in s.Split(new[] { ' ' })
select int.TryParse(x, out outputValue) ? outputValue.ToString() : x);
}
Input: "Islands Brygge 34 B 07 TV"
Output: "Islands Brygge 34 B 7 TV"

Formatting sentences in a string using C#

I have a string with multiple sentences. How do I Capitalize the first letter of first word in every sentence. Something like paragraph formatting in word.
eg ."this is some code. the code is in C#. "
The ouput must be "This is some code. The code is in C#".
one way would be to split the string based on '.' and then capitalize the first letter and then rejoin.
Is there a better solution?
In my opinion, when it comes to potentially complex rules-based string matching and replacing - you can't get much better than a Regex-based solution (despite the fact that they are so hard to read!). This offers the best performance and memory efficiency, in my opinion - you'll be surprised at just how fast this'll be.
I'd use the Regex.Replace overload that accepts an input string, regex pattern and a MatchEvaluator delegate. A MatchEvaluator is a function that accepts a Match object as input and returns a string replacement.
Here's the code:
public static string Capitalise(string input)
{
//now the first character
return Regex.Replace(input, #"(?<=(^|[.;:])\s*)[a-z]",
(match) => { return match.Value.ToUpper(); });
}
The regex uses the (?<=) construct (zero-width positive lookbehind) to restrict captures only to a-z characters preceded by the start of the string, or the punctuation marks you want. In the [.;:] bit you can add the extra ones you want (e.g. [.;:?."] to add ? and " characters.
This means, also, that your MatchEvaluator doesn't have to do any unnecessary string joining (which you want to avoid for performance reasons).
All the other stuff mentioned by one of the other answerers about using the RegexOptions.Compiled is also relevant from a performance point of view. The static Regex.Replace method does offer very similar performance benefits, though (there's just an additional dictionary lookup).
Like I say - I'll be surprised if any of the other non-regex solutions here will work better and be as fast.
EDIT
Have put this solution up against Ahmad's as he quite rightly pointed out that a look-around might be less efficient than doing it his way.
Here's the crude benchmark I did:
public string LowerCaseLipsum
{
get
{
//went to lipsum.com and generated 10 paragraphs of lipsum
//which I then initialised into the backing field with #"[lipsumtext]".ToLower()
return _lowerCaseLipsum;
}
}
[TestMethod]
public void CapitaliseAhmadsWay()
{
List<string> results = new List<string>();
DateTime start = DateTime.Now;
Regex r = new Regex(#"(^|\p{P}\s+)(\w+)", RegexOptions.Compiled);
for (int f = 0; f < 1000; f++)
{
results.Add(r.Replace(LowerCaseLipsum, m => m.Groups[1].Value
+ m.Groups[2].Value.Substring(0, 1).ToUpper()
+ m.Groups[2].Value.Substring(1)));
}
TimeSpan duration = DateTime.Now - start;
Console.WriteLine("Operation took {0} seconds", duration.TotalSeconds);
}
[TestMethod]
public void CapitaliseLookAroundWay()
{
List<string> results = new List<string>();
DateTime start = DateTime.Now;
Regex r = new Regex(#"(?<=(^|[.;:])\s*)[a-z]", RegexOptions.Compiled);
for (int f = 0; f < 1000; f++)
{
results.Add(r.Replace(LowerCaseLipsum, m => m.Value.ToUpper()));
}
TimeSpan duration = DateTime.Now - start;
Console.WriteLine("Operation took {0} seconds", duration.TotalSeconds);
}
In a release build, the my solution was about 12% faster than the Ahmad's (1.48 seconds as opposed to 1.68 seconds).
Interestingly, however, if it was done through the static Regex.Replace method, both were about 80% slower, and my solution was slower than Ahmad's.
Here's a regex solution that uses the punctuation category to avoid having to specify .!?" etc. although you should certainly check if it covers your needs or set them explicitly. Read up on the "P" category under the "Supported Unicode General Categories" section located on the MSDN Character Classes page.
string input = #"this is some code. the code is in C#? it's great! In ""quotes."" after quotes.";
string pattern = #"(^|\p{P}\s+)(\w+)";
// compiled for performance (might want to benchmark it for your loop)
Regex rx = new Regex(pattern, RegexOptions.Compiled);
string result = rx.Replace(input, m => m.Groups[1].Value
+ m.Groups[2].Value.Substring(0, 1).ToUpper()
+ m.Groups[2].Value.Substring(1));
If you decide not to use the \p{P} class you would have to specify the characters yourself, similar to:
string pattern = #"(^|[.?!""]\s+)(\w+)";
EDIT: below is an updated example to demonstrate 3 patterns. The first shows how all punctuations affect casing. The second shows how to pick and choose certain punctuation categories by using class subtraction. It uses all punctuations while removing specific punctuation groups. The third is similar to the 2nd but using different groups.
The MSDN link doesn't spell out what some of the punctuation categories refer to, so here's a breakdown:
P: all punctuations (comprises all of the categories below)
Pc: underscore _
Pd: dash -
Ps: open parenthesis, brackets and braces ( [ {
Pe: closing parenthesis, brackets and braces ) ] }
Pi: initial single/double quotes (MSDN says it "may behave like Ps/Pe depending on usage")
Pf: final single/double quotes (MSDN Pi note applies)
Po: other punctuation such as commas, colons, semi-colons and slashes ,, :, ;, \, /
Carefully compare how the results are affected by these groups. This should grant you a great degree of flexibility. If this doesn't seem desirable then you may use specific characters in a character class as shown earlier.
string input = #"foo ( parens ) bar { braces } foo [ brackets ] bar. single ' quote & "" double "" quote.
dash - test. Connector _ test. Comma, test. Semicolon; test. Colon: test. Slash / test. Slash \ test.";
string[] patterns = {
#"(^|\p{P}\s+)(\w+)", // all punctuation chars
#"(^|[\p{P}-[\p{Pc}\p{Pd}\p{Ps}\p{Pe}]]\s+)(\w+)", // all punctuation chars except Pc/Pd/Ps/Pe
#"(^|[\p{P}-[\p{Po}]]\s+)(\w+)" // all punctuation chars except Po
};
// compiled for performance (might want to benchmark it for your loop)
foreach (string pattern in patterns)
{
Console.WriteLine("*** Current pattern: {0}", pattern);
string result = Regex.Replace(input, pattern,
m => m.Groups[1].Value
+ m.Groups[2].Value.Substring(0, 1).ToUpper()
+ m.Groups[2].Value.Substring(1));
Console.WriteLine(result);
Console.WriteLine();
}
Notice that "Dash" is not capitalized using the last pattern and it's on a new line. One way to make it capitalized is to use the RegexOptions.Multiline option. Try the above snippet with that to see if it meets your desired result.
Also, for the sake of example, I didn't use RegexOptions.Compiled in the above loop. To use both options OR them together: RegexOptions.Compiled | RegexOptions.Multiline.
You have a few different options:
Your approach of splitting the string, capitalizing and then re-joining
Using regular expressions to perform a replace of the expressions (which can be a bit tricky for case)
Write a C# iterator that iterates over each character and yields a new IEnumerable<char> with the first letter after a period in upper case. May offer benefit of a streaming solution.
Loop over each char and upper-case those that appear immediately after a period (whitespace ignored) - a StringBuffer may make this easier.
The code below uses an iterator:
public static string ToSentenceCase( string someString )
{
var sb = new StringBuilder( someString.Length );
bool wasPeriodLastSeen = true; // We want first letter to be capitalized
foreach( var c in someString )
{
if( wasPeriodLastSeen && !c.IsWhiteSpace )
{
sb.Append( c.ToUpper() );
wasPeriodLastSeen = false;
}
else
{
if( c == '.' ) // you may want to expand this to other punctuation
wasPeriodLastSeen = true;
sb.Append( c );
}
}
return sb.ToString();
}
I don't know why, but I decided to give yield return a try, based on what LBushkin had suggested. Just for fun.
static IEnumerable<char> CapitalLetters(string sentence)
{
//capitalize first letter
bool capitalize = true;
char lastLetter;
for (int i = 0; i < sentence.Length; i++)
{
lastLetter = sentence[i];
yield return (capitalize) ? Char.ToUpper(sentence[i]) : sentence[i];
if (Char.IsWhiteSpace(lastLetter) && capitalize == true)
continue;
capitalize = false;
if (lastLetter == '.' || lastLetter == '!') //etc
capitalize = true;
}
}
To use it:
string sentence = new String(CapitalLetters("this is some code. the code is in C#.").ToArray());
Do your work in a StringBuffer.
Lowercase the whole thing.
Loop through and uppercase leading chars.
Call ToString.

Categories