Can I use the same substring as part of different captures? - c#

I want to create a function that will allow me to convert CamelCase to Title Case. This seems like a good task for regular expressions, but I am not committed to using regular expressions, if you have a better solution.
Here is my first attempt that works in most cases, but there are some issues I will get to in a few lines:
private static Regex camelSplitRegex = new Regex(@"(\S)([A-Z])");
private static String camelReplacement = "$1 $2";
public String SplitCamel(String text){
return camelSplitRegex.Replace(text, camelReplacement);
}
The regex pattern looks for a non-whitespace character (1st capture) followed by a capital letter (2nd capture). In the function, Regex.Replace is used to insert a space between the 1st and 2nd captures.
This works fine for many examples:
SplitCamel("privateField") returns "private Field"
SplitCamel("PublicMethod") returns "Public Method"
SplitCamel(" LeadingSpace") returns " Leading Space" without inserting an extra space before "Leading", as desired.
The problem I have is when dealing with multiple consecutive capital letters.
SplitCamel("NASA") returns "N AS A" not "N A S A"
SplitCamel("C3PO") returns "C3 PO" not "C3 P O"
SplitCamel("CAPS LOCK FEVER") returns "C AP S L OC K F EV E R" not "C A P S L O C K F E V E R"
In these cases, I believe the issue is that each capital letter is only being captured as either \S or [A-Z], but cannot be \S on one match and [A-Z] on the next match.
My main question is, "Does the .NET regex engine have some way of supporting the same substring being used as different captures on consecutive matches?" Secondarily, is there a better way of splitting camel case?

private static Regex camelSplitRegex = new Regex(@"(?<=\w)(?=[A-Z])");
private static String camelReplacement = " ";
does the job.
The problem with your pattern is that when you have the string "ABCD", \S matches A and ([A-Z]) matches B and you obtain "A BCD", but for the next replacement B is already consumed by the pattern and can't be used any more.
The solution is to use lookarounds (a lookbehind (?<=...) and a lookahead (?=...)), which don't consume characters; they only test the current position in the string. That's why you don't need any group reference in the replacement string: you only need to put a space at the current position.
The \w character class contains unicode letters, unicode digits and the underscore. If you want to restrict the search to ASCII digits and letters, use [0-9a-zA-Z] instead.
To be more precise:
for unicode, use (?<=[\p{L}\p{N}])(?=\p{Lu}) that works with accented letters and other alphabets and digits.
for ASCII use (?<=[a-zA-Z0-9])(?=[A-Z])
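As a minimal sketch of the Unicode-aware variant above, the lookaround pattern could be wrapped as follows. The class name CamelSplitter is hypothetical and not part of the original answer. Because both assertions are zero-width, the replacement is just a single space:

```csharp
using System.Text.RegularExpressions;

public static class CamelSplitter
{
    // Zero-width position: preceded by a letter or digit, followed by an uppercase letter.
    private static readonly Regex Boundary =
        new Regex(@"(?<=[\p{L}\p{N}])(?=\p{Lu})");

    public static string SplitCamel(string text)
    {
        // Nothing is consumed by the match, so each uppercase letter can act as
        // the "previous character" for the boundary that follows it.
        return Boundary.Replace(text, " ");
    }
}
```

With this sketch, CamelSplitter.SplitCamel("NASA") gives "N A S A". Unlike the \w version, an underscore does not count as a preceding character, so "snake_Case" is left unchanged.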

Here's a non-regular expression way to do that.
public static string SplitCamel(this string stuff)
{
var builder = new StringBuilder();
char? prev = null;
foreach (char c in stuff)
{
if (prev.HasValue && !char.IsWhiteSpace(prev.Value) && 'A' <= c && c <= 'Z')
builder.Append(' ');
builder.Append(c);
prev = c;
}
return builder.ToString();
}
The following
Console.WriteLine("'{0}'", "privateField".SplitCamel());
Console.WriteLine("'{0}'", "PublicMethod".SplitCamel());
Console.WriteLine("'{0}'", " LeadingSpace".SplitCamel());
Console.WriteLine("'{0}'", "NASA".SplitCamel());
Console.WriteLine("'{0}'", "C3PO".SplitCamel());
Console.WriteLine("'{0}'", "CAPS LOCK FEVER".SplitCamel());
Prints
'private Field'
'Public Method'
' Leading Space'
'N A S A'
'C3 P O'
'C A P S L O C K F E V E R'

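Please consider using the string keyword instead of the String class name; string is a C# alias for System.String (a reference type, not a value type), so the change is purely stylistic. Then update the pattern to this:
private static Regex camelSplitRegex = new Regex(@"(^\S)?([A-Z])");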

Related

compiled Regex template with passing value dynamically [duplicate]

This is the input string: 23x^45*y or 2x^2 or y^4*x^3.
I am matching ^[0-9]+ after letter x. In other words I am matching x followed by ^ followed by numbers. Problem is that I don't know that I am matching x, it could be any letter that I stored as variable in my char array.
For example:
foreach (char cEle in myarray) // cEle is letter in char array x, y, z, ...
{
match CEle in regex(input) //PSEUDOCODE
}
I am new to regex, and I know that this can be done if I define regex variables, but I don't know how.
You can use the pattern @"[cEle]\^\d+" which you can create dynamically from your character array:
string s = "23x^45*y or 2x^2 or y^4*x^3";
char[] letters = { 'e', 'x', 'L' };
string regex = string.Format(@"[{0}]\^\d+",
Regex.Escape(new string(letters)));
foreach (Match match in Regex.Matches(s, regex))
Console.WriteLine(match);
Result:
x^45
x^2
x^3
A few things to note:
It is necessary to escape the ^ inside the regular expression otherwise it has a special meaning "start of line".
It is a good idea to use Regex.Escape when inserting literal strings from a user into a regular expression, to avoid that any characters they type get misinterpreted as special characters.
This will also match the x from the end of variables with longer names like tax^2. This can be avoided by requiring a word boundary (\b).
If you write x^1 as just x then this regular expression will not match it. This can be fixed by using (\^\d+)?.
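The last two notes can be sketched together as follows. This is only an illustration under stated assumptions: the names TermFinder and FindTerms are hypothetical, and a negative lookbehind and lookahead on ASCII letters is swapped in for \b, because digits are word characters and so \b would not match between 3 and x in 23x^45.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TermFinder
{
    // Returns each occurrence of one of the given letters, with an optional ^exponent.
    // Assumes the letters are plain letters; Regex.Escape does not escape ']' or '-'.
    public static List<string> FindTerms(string input, char[] letters)
    {
        string pattern = string.Format(
            @"(?<![a-zA-Z])[{0}](\^\d+)?(?![a-zA-Z])",
            Regex.Escape(new string(letters)));

        var terms = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern))
            terms.Add(match.Value);
        return terms;
    }
}
```

For the input "23x^45*y or 2x^2 or y^4*x^3 tax^2" and the letters x and y, this sketch should return x^45, y, x^2, y^4 and x^3, skipping the x at the end of tax.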
From my point of view, the easiest and fastest way to implement this is the following:
Input: This?_isWhat?IWANT
string tokenRef = "?";
Regex pattern = new Regex($@"([^{tokenRef}\/>]+)");
The pattern skips over my tokenRef and yields the following output:
Group1 This
Group2 _isWhat
Group3 IWANT
Try using this pattern for capturing the number but excluding the x^ prefix:
(?<=x\^)[0-9]+
string strInput = "23x^45*y or 2x^2 or y^4*x^3";
foreach (Match match in Regex.Matches(strInput, @"(?<=x\^)[0-9]+"))
Console.WriteLine(match);
This should print :
45
2
3
Do not forget to use the option IgnoreCase for matching, if required.

Regex for alphanumeric, at least 1 number and special chars

I am trying to find a regex which will give me the following validation:
string should contain at least 1 digit and at least 1 special character. Does allow alphanumeric.
I tried the following but this fails:
@"^[a-zA-Z0-9@#$%&*+\-_(),+':;?.,!\[\]\s\\/]+$]"
I tried "password1$" but that failed
I also tried "Password1!" but that also failed.
ideas?
UPDATE
Need the solution to work with C# - currently the suggestions posted as of Oct 22 2013 do not appear to work.
Try this:
Regex rxPassword = new Regex( @"
^ # start-of-line, followed by
(?=.*[0-9]) # a lookahead requiring at least one decimal digit,
(?=.*[!@\#]) # a lookahead requiring at least one of the special characters ! @ and #,
(?=.*[a-zA-Z]) # a lookahead requiring at least one upper- or lower-case ASCII letter,
[a-zA-Z0-9!@\#]+ # then one or more characters drawn from the set consisting of ASCII letters, digits or the punctuation characters ! @ and #
$ # followed by end-of-line
" , RegexOptions.IgnorePatternWhitespace ) ;
The construct (?=regular-expression) is a zero-width positive lookahead assertion. Each lookahead is tested from the start of the string without consuming characters, so the three requirements can be satisfied in any order.
Sometimes it's a lot simpler to do things one step at a time. The static constructor builds the escaped character class characters from a simple list of allowed special characters. The built-in Regex.Escape method doesn't work here.
public static class PasswordValidator {
private const string ALLOWED_SPECIAL_CHARS = @"@#$%&*+_()':;?.,![]\-";
private static string ESCAPED_SPECIAL_CHARS;
static PasswordValidator() {
var escapedChars = new List<char>();
foreach (char c in ALLOWED_SPECIAL_CHARS) {
if (c == '[' || c == ']' || c == '\\' || c == '-')
escapedChars.AddRange(new[] { '\\', c });
else
escapedChars.Add(c);
}
ESCAPED_SPECIAL_CHARS = new string(escapedChars.ToArray());
}
public static bool IsValidPassword(string input) {
// Length requirement?
if (input.Length < 8) return false;
// First just check for a digit
if (!Regex.IsMatch(input, @"\d")) return false;
// Then check for special character
if (!Regex.IsMatch(input, "[" + ESCAPED_SPECIAL_CHARS + "]")) return false;
// Require a letter?
if (!Regex.IsMatch(input, "[a-zA-Z]")) return false;
// DON'T allow anything else:
if (Regex.IsMatch(input, @"[^a-zA-Z\d" + ESCAPED_SPECIAL_CHARS + "]")) return false;
return true;
}
}
This may work. There are two possibilities: the digit comes before the special character, or the digit comes after it. You should use DOTALL mode (RegexOptions.Singleline in .NET) so that the dot matches every character, including newlines.
^((.*?[0-9].*?[@#$%&*+\-_(),+':;?.,!\[\]\s\\/].*)|(.*?[@#$%&*+\-_(),+':;?.,!\[\]\s\\/].*?[0-9].*))$
This worked for me:
@"(?=^[!@#$%\^&*()_-+=[{]};:<>|./?a-zA-Z\d]{8,}$)(?=([!@#$%\^&*()_-+=[{]};:<>|./?a-zA-Z\d]\W+){1,})(?=[^0-9][0-9])[!@#$%\^&*()_-+=[{]};:<>|./?a-zA-Z\d]*$"
alphanumeric, at least 1 numeric, and special character with a min length of 8
This should do the work:
(?:(?=.*[0-9]+)(?=.*[a-zA-Z]+)(?=.*[@#$%&*+\-_(),+':;?.,!\[\]\s\\/]+))+
Tested with JavaScript; I'm not sure about C#, so it may need some small adjustments.
What it does is use positive lookaheads to find the required elements of the password.
EDIT
Regular expression is designed to test if there are matches. Since all the patterns are lookahead, no real characters get captured and matches are empty, but if the expression "match", then the password is valid.
But, since the question is C# (sorry, i don't know c#, just improvising and adapting samples)
string input = "password1!";
string pattern = @"^(?:(?=.*[0-9]+)(?=.*[a-zA-Z]+)(?=.*[@#$%&*+\-_(),+':;?.,!\[\]\s\\/]+))+.*$";
Regex rgx = new Regex(pattern, RegexOptions.None);
MatchCollection matches = rgx.Matches(input);
if (matches.Count > 0) {
Console.WriteLine("{0} ({1} matches):", input, matches.Count);
foreach (Match match in matches)
Console.WriteLine(" " + match.Value);
}
With a start-of-line anchor added at the beginning and .*$ at the end, the expression matches only if the password is valid, and the match value is the password itself (I guess).
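For C#, a compact sketch of the same lookahead idea might look like this. The class name PasswordCheck and the particular set of special characters are assumptions made for illustration; substitute your own policy.

```csharp
using System.Text.RegularExpressions;

public static class PasswordCheck
{
    // Assumed policy for illustration: ASCII letters, digits and the
    // special characters ! @ # $ % & * ? only.
    private static readonly Regex Rx = new Regex(
        @"^(?=.*[0-9])(?=.*[!@#$%&*?])[a-zA-Z0-9!@#$%&*?]+$");

    public static bool IsValid(string input)
    {
        // Both lookaheads are tested at the start of the string and consume
        // nothing; the character class then checks every character.
        return Rx.IsMatch(input);
    }
}
```

Under these assumptions, both "password1$" and "Password1!" from the question pass, while a string with no digit, no special character, or a disallowed character such as a space is rejected.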

Regular expression to remove whitespace around a comma, except when quoted

I have a CSV file that has rows resembling this:
1, 4, 2, "PUBLIC, JOHN Q" ,ACTIVE , 1332
I am looking for a regular expression replacement that will match against these rows and spit out something resembling this:
1,4,2,"PUBLIC, JOHN Q",ACTIVE,1332
I thought this would be rather easy: I made the expression ([ \t]+,) and replaced it with ,. I made a complement expression (,[ \t]+) with a replacement of , and I thought I had achieved a good means of right-trimming and left-trimming strings.
...but then I noticed that my "PUBLIC, JOHN Q" was now "PUBLIC,JOHN Q" which isn't what I wanted. (Note the space following the comma is now gone).
What would be the appropriate expression to trim the white space before and after a comma, but leave quoted text untouched?
UPDATE
To clarify, I am using an application to handle the file. This application allows me to define multiple regular expression replacements; it does not provide a parsing capability. While this may not be the ideal mechanism for this, it would sure beat making another application for this one file.
If the engine used by your tool is the C# regular expression engine, then you can try the following expression:
(?<!,\s*"(?:[^\\"]|\\")*)\s+(?!(?:[^\\"]|\\")*"\s*,)
replace with empty string.
The other answers assume the quotes are balanced and use counting to determine whether a space is part of a quoted value or not.
My expression looks for all spaces that are not part of a quoted value.
Something like this might do the job:
(?<!(^[^"]*"[^"]*(("[^"]*){2})*))[\t ]*,[ \t]*
Which matches [\t ]*,[ \t]*, only when not preceded by an odd number of quotes.
Going with some CSV library or parsing the file yourself would be much easier, and IMO should be the preferable option here.
But if you really insist on a regex, you can use this one:
"\s+(?=([^\"]*\"[^\"]*\")*[^\"]*$)"
And replace it with empty string - ""
This regex matches one or more whitespace characters that are followed by an even number of quotes. This will of course work only if you have balanced quotes.
(?x) # Ignore Whitespace
\s+ # One or more whitespace characters
(?= # Followed by
( # A group - This group captures even number of quotes
[^\"]* # Zero or more non-quote characters
\" # A quote
[^\"]* # Zero or more non-quote characters
\" # A quote
)* # Zero or more repetition of previous group
[^\"]* # Zero or more non-quote characters
$ # Till the end
) # Look-ahead end
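If the tool uses the .NET engine, the even-number-of-quotes lookahead could be tried out with a sketch like the one below. The names CsvTrim and TrimRow are hypothetical, and a single row with balanced quotes is assumed. Note that the pattern removes all whitespace outside quotes, including any inside unquoted values.

```csharp
using System.Text.RegularExpressions;

public static class CsvTrim
{
    // Whitespace followed by an even number of quotes up to the end of the row,
    // i.e. whitespace that lies outside any quoted value.
    private static readonly Regex OutsideQuotes =
        new Regex(@"\s+(?=(?:[^""]*""[^""]*"")*[^""]*$)");

    public static string TrimRow(string row)
    {
        return OutsideQuotes.Replace(row, "");
    }
}
```

Applied to the sample row from the question, this should keep the space inside "PUBLIC, JOHN Q" while removing the whitespace around the separating commas.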
string format(string val)
{
if (val.StartsWith("\"")) val = " " + val;
string[] vals = val.Split('\"');
for (int i = 0; i < vals.Length; i += 2) vals[i] = vals[i].Replace(" ", "").Replace("\t", "");
return string.Join("\"", vals); // rejoin on the quote character that Split removed
}
This will work if you have properly closed quoted strings in between
Forget the regex (See Bart's comment on the question, regular expressions aren't suitable for CSV).
public static string ReduceSpaces( string input )
{
char[] a = input.ToCharArray();
int placeComma = 0, placeOther = 0;
bool inQuotes = false;
bool followedComma = true;
foreach( char c in a ) {
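inQuotes ^= (c == '\"');
if (c == ' ' && !inQuotes) { // only spaces outside quotes are candidates for removal
if (!followedComma)
a[placeOther++] = c;
}
else if (c == ',' && !inQuotes) { // only commas outside quotes are field separators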
a[placeComma++] = c;
placeOther = placeComma;
followedComma = true;
}
else {
a[placeOther++] = c;
placeComma = placeOther;
followedComma = false;
}
}
return new String(a, 0, placeComma);
}
Demo: http://ideone.com/NEKm09

Removing words with special characters in them

I have a long string composed of a number of different words.
I want to go through all of them, and if the word contains a special character or number (except '-'), or starts with a Capital letter, I want to delete it (the whole word not just that character). For all intents and purposes 'foreign' letters can count as special characters.
The obvious solution is to run a loop through each word (after splitting it) and then a loop through each character - but I'm hoping there's a faster way of doing it? Perhaps using Regex but I've almost no experience with it.
Thanks
ADDED:
(What I want for example:)
Input: "this Is an Example of 5 words in an input like-so from example.com"
Output: {this,an,of,words,in,an,input,like-so,from}
(What I've tried so far)
List<string> response = new List<string>();
string[] splitString = text.Split(' ');
foreach (string s in splitString)
{
bool add = true;
foreach (char c in s.ToCharArray())
{
if (!(c.Equals('-') || (Char.IsLetter(c) && Char.IsLower(c))))
{
add = false;
break;
}
}
// Add the word once, after all of its characters have been checked
if (add)
{
response.Add(s);
}
}
Edit 2:
For me a word should be a number of characters (a..z) separated by a space. ,/./!/... at the end shouldn't count for the 'special character' condition (which is really mostly just to remove URLs or the like)
So:
"I saw a dog. It was black!"
should result in
{saw,a,dog,was,black}
So you want to find all "words" that only contain characters a-z or -, for words that are separated by spaces?
A regex like this will find such words:
(?<!\S)[a-z-]+(?!\S)
To also allow for words that end with single punctuation, you could use:
(?<!\S)[a-z-]+(?=[,.!?:;]?(?!\S))
Example (ideone):
var re = @"(?<!\S)[a-z-]+(?=[,.!?:;]?(?!\S))";
var str = "this, Is an! Example of 5 words in an input like-so from example.com foo: bar?";
var m = Regex.Matches(str, re);
Console.WriteLine("Matched: ");
foreach (Match i in m)
Console.Write(i + " ");
Notice the punctuation in the string.
Output:
Matched:
this an of words in an input like-so from foo bar
How about this?
(?<=^|\s+)(?<word>[a-z-]+)(?=$|\s+)
Edit: Meant (?<=^|\s+)(?<word>[a-z\-]+)(?=(?:\.|,|!|\.\.\.)?(?:$|\s+))
Rules:
Word can only be preceded by start of line or some number of whitespace characters
Word can only be followed by end of line or some number of whitespace characters (Edit supports words ending with periods, commas, exclamation points, and ellipses)
Word can only contain lower case (latin) letters and dashes
The named group containing each word is "word"
Have a look at Microsoft's How to: Search Strings Using Regular Expressions (C# Programming Guide) - it's about regexes in C#.
List<string> strings = new List<string>() {"asdf", "sdf-sd", "sdfsdf"};
for (int i = strings.Count - 1; i >= 0; i--) // iterate backwards so removals don't shift unvisited items
{
if (strings[i].Contains("-"))
{
strings.RemoveAt(i);
}
}
This could be a starting point. Right now it only checks for "." as a special character. This outputs: "this an of words in an like-so from"
string pattern = @"[A-Z]\w+|\w*[0-9]+\w*|\w*[\.]+\w*";
string line = "this Is an Example of 5 words in an in3put like-so from example.com";
System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex(pattern);
line = r.Replace(line,"");
You can do this in two ways, the white-list way and the black-list way. With a white-list you define the set of characters that you consider to be acceptable, and with the black-list it's the opposite.
Let's assume the white-list way and that you accept only characters a-z, A-Z and the - character. Additionally, you have the rule that the first character of a word cannot be an upper-case character.
With this you can do something like this:
string target = "This is a white-list example: (Foo, bar1)";
var matches = Regex.Matches(target, @"(?:\b)(?<Word>[a-z]{1}[a-zA-Z\-]*)(?:\b)");
string[] words = matches.Cast<Match>().Select(m => m.Value).ToArray();
Console.WriteLine(string.Join(", ", words));
Outputs:
// is, a, white-list, example
You can use look-aheads and look-behinds to do this. Here's a regex that matches your example:
(?<=\s|^)[a-z-]+(?=\s|$)
The explanation is: match one or more alphabetic characters (lowercase only, plus hyphen), as long as what comes before the characters is whitespace (or the start of the string), and as long as what comes after is whitespace or the end of the string.
All you need to do now is plug that into System.Text.RegularExpressions.Regex.Matches(input, regexString) to get your list of words.
Reference: http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet
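Plugging that lookaround pattern into Regex.Matches might look like the following sketch. The names WordFilter and KeepPlainWords are hypothetical, and it handles only the space-separated case, without the trailing-punctuation rule from the question's second edit.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordFilter
{
    // Lowercase letters and hyphens only, bounded by whitespace or the string ends.
    private static readonly Regex Word = new Regex(@"(?<=\s|^)[a-z-]+(?=\s|$)");

    public static List<string> KeepPlainWords(string text)
    {
        var words = new List<string>();
        foreach (Match m in Word.Matches(text))
            words.Add(m.Value);
        return words;
    }
}
```

For the question's first example input, this should return this, an, of, words, in, an, input, like-so and from.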

Formatting sentences in a string using C#

I have a string with multiple sentences. How do I capitalize the first letter of the first word in every sentence? Something like paragraph formatting in Word.
e.g. "this is some code. the code is in C#. "
The output must be "This is some code. The code is in C#".
One way would be to split the string on '.', capitalize the first letter of each part, and then rejoin.
Is there a better solution?
In my opinion, when it comes to potentially complex rules-based string matching and replacing - you can't get much better than a Regex-based solution (despite the fact that they are so hard to read!). This offers the best performance and memory efficiency, in my opinion - you'll be surprised at just how fast this'll be.
I'd use the Regex.Replace overload that accepts an input string, regex pattern and a MatchEvaluator delegate. A MatchEvaluator is a function that accepts a Match object as input and returns a string replacement.
Here's the code:
public static string Capitalise(string input)
{
//now the first character
return Regex.Replace(input, @"(?<=(^|[.;:])\s*)[a-z]",
(match) => { return match.Value.ToUpper(); });
}
The regex uses the (?<=) construct (zero-width positive lookbehind) to restrict matches to a-z characters preceded by the start of the string, or by the punctuation marks you want. In the [.;:] bit you can add any extra ones you want (e.g. [.;:?"] to add the ? and " characters).
This means, also, that your MatchEvaluator doesn't have to do any unnecessary string joining (which you want to avoid for performance reasons).
All the other stuff mentioned by one of the other answerers about using the RegexOptions.Compiled is also relevant from a performance point of view. The static Regex.Replace method does offer very similar performance benefits, though (there's just an additional dictionary lookup).
Like I say - I'll be surprised if any of the other non-regex solutions here will work better and be as fast.
EDIT
Have put this solution up against Ahmad's as he quite rightly pointed out that a look-around might be less efficient than doing it his way.
Here's the crude benchmark I did:
public string LowerCaseLipsum
{
get
{
//went to lipsum.com and generated 10 paragraphs of lipsum
//which I then initialised into the backing field with @"[lipsumtext]".ToLower()
return _lowerCaseLipsum;
}
}
[TestMethod]
public void CapitaliseAhmadsWay()
{
List<string> results = new List<string>();
DateTime start = DateTime.Now;
Regex r = new Regex(@"(^|\p{P}\s+)(\w+)", RegexOptions.Compiled);
for (int f = 0; f < 1000; f++)
{
results.Add(r.Replace(LowerCaseLipsum, m => m.Groups[1].Value
+ m.Groups[2].Value.Substring(0, 1).ToUpper()
+ m.Groups[2].Value.Substring(1)));
}
TimeSpan duration = DateTime.Now - start;
Console.WriteLine("Operation took {0} seconds", duration.TotalSeconds);
}
[TestMethod]
public void CapitaliseLookAroundWay()
{
List<string> results = new List<string>();
DateTime start = DateTime.Now;
Regex r = new Regex(@"(?<=(^|[.;:])\s*)[a-z]", RegexOptions.Compiled);
for (int f = 0; f < 1000; f++)
{
results.Add(r.Replace(LowerCaseLipsum, m => m.Value.ToUpper()));
}
TimeSpan duration = DateTime.Now - start;
Console.WriteLine("Operation took {0} seconds", duration.TotalSeconds);
}
In a release build, my solution was about 12% faster than Ahmad's (1.48 seconds as opposed to 1.68 seconds).
Interestingly, however, if it was done through the static Regex.Replace method, both were about 80% slower, and my solution was slower than Ahmad's.
Here's a regex solution that uses the punctuation category to avoid having to specify .!?" etc. although you should certainly check if it covers your needs or set them explicitly. Read up on the "P" category under the "Supported Unicode General Categories" section located on the MSDN Character Classes page.
string input = @"this is some code. the code is in C#? it's great! In ""quotes."" after quotes.";
string pattern = @"(^|\p{P}\s+)(\w+)";
// compiled for performance (might want to benchmark it for your loop)
Regex rx = new Regex(pattern, RegexOptions.Compiled);
string result = rx.Replace(input, m => m.Groups[1].Value
+ m.Groups[2].Value.Substring(0, 1).ToUpper()
+ m.Groups[2].Value.Substring(1));
If you decide not to use the \p{P} class you would have to specify the characters yourself, similar to:
string pattern = @"(^|[.?!""]\s+)(\w+)";
EDIT: below is an updated example to demonstrate 3 patterns. The first shows how all punctuations affect casing. The second shows how to pick and choose certain punctuation categories by using class subtraction. It uses all punctuations while removing specific punctuation groups. The third is similar to the 2nd but using different groups.
The MSDN link doesn't spell out what some of the punctuation categories refer to, so here's a breakdown:
P: all punctuations (comprises all of the categories below)
Pc: underscore _
Pd: dash -
Ps: open parenthesis, brackets and braces ( [ {
Pe: closing parenthesis, brackets and braces ) ] }
Pi: initial single/double quotes (MSDN says it "may behave like Ps/Pe depending on usage")
Pf: final single/double quotes (MSDN Pi note applies)
Po: other punctuation such as commas, colons, semi-colons and slashes ,, :, ;, \, /
Carefully compare how the results are affected by these groups. This should grant you a great degree of flexibility. If this doesn't seem desirable then you may use specific characters in a character class as shown earlier.
string input = @"foo ( parens ) bar { braces } foo [ brackets ] bar. single ' quote & "" double "" quote.
dash - test. Connector _ test. Comma, test. Semicolon; test. Colon: test. Slash / test. Slash \ test.";
string[] patterns = {
@"(^|\p{P}\s+)(\w+)", // all punctuation chars
@"(^|[\p{P}-[\p{Pc}\p{Pd}\p{Ps}\p{Pe}]]\s+)(\w+)", // all punctuation chars except Pc/Pd/Ps/Pe
@"(^|[\p{P}-[\p{Po}]]\s+)(\w+)" // all punctuation chars except Po
};
// compiled for performance (might want to benchmark it for your loop)
foreach (string pattern in patterns)
{
Console.WriteLine("*** Current pattern: {0}", pattern);
string result = Regex.Replace(input, pattern,
m => m.Groups[1].Value
+ m.Groups[2].Value.Substring(0, 1).ToUpper()
+ m.Groups[2].Value.Substring(1));
Console.WriteLine(result);
Console.WriteLine();
}
Notice that "Dash" is not capitalized using the last pattern and it's on a new line. One way to make it capitalized is to use the RegexOptions.Multiline option. Try the above snippet with that to see if it meets your desired result.
Also, for the sake of example, I didn't use RegexOptions.Compiled in the above loop. To use both options OR them together: RegexOptions.Compiled | RegexOptions.Multiline.
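To illustrate the effect of combining the two options, here is a minimal sketch. The class name SentenceCase is hypothetical, and it uses an explicit [.?!] character class rather than \p{P}, capitalizing only the first character of each match. With RegexOptions.Multiline, ^ also matches at the start of every line, so a word that begins a new line is capitalized even when the previous line does not end in punctuation.

```csharp
using System.Text.RegularExpressions;

public static class SentenceCase
{
    // Group 1: start of a line, or sentence-ending punctuation plus whitespace.
    // Group 2: the first word character that follows.
    private static readonly Regex Rx = new Regex(
        @"(^|[.?!]\s+)(\w)",
        RegexOptions.Compiled | RegexOptions.Multiline);

    public static string Capitalize(string input)
    {
        // Keep the prefix as-is and upper-case only the captured character.
        return Rx.Replace(input,
            m => m.Groups[1].Value + m.Groups[2].Value.ToUpperInvariant());
    }
}
```

For example, SentenceCase.Capitalize("first line\nsecond line. third") should give "First line\nSecond line. Third".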
You have a few different options:
Your approach of splitting the string, capitalizing and then re-joining
Using regular expressions to perform a replace of the expressions (which can be a bit tricky for case)
Write a C# iterator that iterates over each character and yields a new IEnumerable<char> with the first letter after a period in upper case. May offer benefit of a streaming solution.
Loop over each char and upper-case those that appear immediately after a period (whitespace ignored) - a StringBuilder may make this easier.
The code below uses the last approach, a loop with a StringBuilder:
public static string ToSentenceCase( string someString )
{
var sb = new StringBuilder( someString.Length );
bool wasPeriodLastSeen = true; // We want first letter to be capitalized
foreach( var c in someString )
{
if( wasPeriodLastSeen && !char.IsWhiteSpace( c ) )
{
sb.Append( char.ToUpper( c ) );
wasPeriodLastSeen = false;
}
else
{
if( c == '.' ) // you may want to expand this to other punctuation
wasPeriodLastSeen = true;
sb.Append( c );
}
}
return sb.ToString();
}
I don't know why, but I decided to give yield return a try, based on what LBushkin had suggested. Just for fun.
static IEnumerable<char> CapitalLetters(string sentence)
{
//capitalize first letter
bool capitalize = true;
char lastLetter;
for (int i = 0; i < sentence.Length; i++)
{
lastLetter = sentence[i];
yield return (capitalize) ? Char.ToUpper(sentence[i]) : sentence[i];
if (Char.IsWhiteSpace(lastLetter) && capitalize == true)
continue;
capitalize = false;
if (lastLetter == '.' || lastLetter == '!') //etc
capitalize = true;
}
}
To use it:
string sentence = new String(CapitalLetters("this is some code. the code is in C#.").ToArray());
Do your work in a StringBuilder.
Lowercase the whole thing.
Loop through and uppercase leading chars.
Call ToString.
