Better regular expression for ReverseStringFormat - c#

I've been using this neat function, found here on SO, for a while:
private List<string> ReverseStringFormat(string template, string str)
{
string pattern = "^" + Regex.Replace(template, @"\{[0-9]+\}", "(.*?)") + "$";
Regex r = new Regex(pattern);
Match m = r.Match(str);
List<string> ret = new List<string>();
for (int i = 1; i < m.Groups.Count; i++)
ret.Add(m.Groups[i].Value);
return ret;
}
This function is able to process correctly templates like:
My name is {0} and I'm {1} years old
While it fails with patterns like:
My name is {0} and I'm {1:00} years old
I would like to handle this failing scenario and add fixed length parsing.
The function transforms the (first) template as follows:
My name is (.*?) and I'm (.*?) years old
I've been trying to write the above regular expression to limit the number of characters captured for the second group without success. This is my (terrible) attempt:
My name is (.*?) and I'm (.{2}) years old
I've been trying to process inputs like the following but the below PATTERN doesn't work:
PATTERN: My name is (.*?) (.{3})(.{5})
INPUT: My name is John 123ABCDE
EXPECTED OUTPUT: John, 123, ABCDE
Every suggestion is highly appreciated

It is highly unlikely that you will be able to measure the length of a captured group within the same Regex replacement.
I would strongly suggest you look at the following state machine implementation.
Please note that this implementation also solves the multiple curly brace escape feature of string.Format.
First you will need a state enum, very much like this one:
public enum State {
Outside,
OutsideAfterCurly,
Inside,
InsideAfterColon
}
Then you will need a nice way to iterate over each character in a string.
The string chars parameter represents your template parameter while the returning IEnumerable<string> represents consecutive parts of the resulting pattern:
public static IEnumerable<string> InnerTransmogrify(string chars) {
State state = State.Outside;
int counter = 0;
foreach (var @char in chars) {
switch (state) {
case State.Outside:
switch (@char) {
case '{':
state = State.OutsideAfterCurly;
break;
default:
yield return @char.ToString();
break;
}
break;
case State.OutsideAfterCurly:
switch (@char) {
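case '{':
// "{{" is an escaped literal brace: emit it (escaped for the regex) instead of dropping it
yield return @"\{";
state = State.Outside;
break;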
default:
state = State.Inside;
counter = 0;
yield return "(.";
break;
}
break;
case State.Inside:
switch (@char) {
case '}':
state = State.Outside;
yield return "*?)";
break;
case ':':
state = State.InsideAfterColon;
break;
default:
break;
}
break;
case State.InsideAfterColon:
switch (@char) {
case '}':
state = State.Outside;
yield return "{" + counter + "})";
break;
default:
counter++;
break;
}
break;
}
}
}
You could join the parts like so:
public static string Transmogrify(string chars) {
var parts = InnerTransmogrify(chars);
var result = string.Join("", parts);
return result;
}
And then wrap everything up, like you originally intended:
private List<string> ReverseStringFormat(string template, string str) {
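// <<SOME_PLACE>> stands for whichever class holds Transmogrify; the anchors force a whole-string match
string pattern = "^" + <<SOME_PLACE>>.Transmogrify(template) + "$";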
Regex r = new Regex(pattern);
Match m = r.Match(str);
List<string> ret = new List<string>();
for (int i = 1; i < m.Groups.Count; i++)
ret.Add(m.Groups[i].Value);
return ret;
}
Hope you understand why the Regex language isn't expressive enough (at least as far as my understanding is concerned) for this sort of job.

The only way to solve your problem with regular expressions is using a custom matcher to replace the group capture length.
The code below does this for your example:
private static string PatternFromStringFormat(string template)
{
// replaces only elements like {0}
string firstPass = Regex.Replace(template, @"\{[0-9]+\}", "(.*?)");
// replaces elements like {0:000} using a custom matcher
string secondPass = Regex.Replace(firstPass, @"\{[0-9]+\:(?<len>[0-9]+)\}",
(match) =>
{
// the number of digits in the format spec is the fixed capture length
var len = match.Groups["len"].Value.Length;
return "(.{" + len + "})";
});
return "^" + secondPass + "$";
}
private static List<string> ReverseStringFormat(string template, string str)
{
string pattern = PatternFromStringFormat(template);
Regex r = new Regex(pattern);
Match m = r.Match(str);
List<string> ret = new List<string>();
for (int i = 1; i < m.Groups.Count; i++)
ret.Add(m.Groups[i].Value);
return ret;
}
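To make the fixed-length idea concrete, here is a minimal self-contained sketch that combines both answers' goals. It is not the code from either answer: the names `ReverseFormatSketch`, `BuildPattern` and `Reverse` are made up for illustration, and it assumes that a format spec such as `{1:000}` always means "exactly as many characters as the spec is long". Literal text is passed through `Regex.Escape` so that characters like `.` or `(` in the template do not change the pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class ReverseFormatSketch
{
    // Matches {0} as well as {0:000}; the optional "spec" group holds the text after the colon.
    private static readonly Regex Placeholder =
        new Regex(@"\{[0-9]+(?::(?<spec>[^}]*))?\}");

    public static string BuildPattern(string template)
    {
        var sb = new StringBuilder("^");
        int last = 0;
        foreach (Match m in Placeholder.Matches(template))
        {
            // Literal text between placeholders is escaped so it matches verbatim.
            sb.Append(Regex.Escape(template.Substring(last, m.Index - last)));
            Group spec = m.Groups["spec"];
            if (spec.Success && spec.Length > 0)
                sb.Append("(.{" + spec.Length + "})"); // assumption: spec length == field width
            else
                sb.Append("(.*?)");
            last = m.Index + m.Length;
        }
        sb.Append(Regex.Escape(template.Substring(last)));
        sb.Append("$");
        return sb.ToString();
    }

    public static List<string> Reverse(string template, string str)
    {
        var ret = new List<string>();
        Match m = Regex.Match(str, BuildPattern(template));
        if (!m.Success) return ret; // no match: return an empty list
        for (int i = 1; i < m.Groups.Count; i++)
            ret.Add(m.Groups[i].Value);
        return ret;
    }
}
```

With the input from the question, `ReverseFormatSketch.Reverse("My name is {0} {1:000}{2:00000}", "My name is John 123ABCDE")` should give John, 123, ABCDE. The sketch does not handle the `{{` / `}}` escapes.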

Related

translate special character in strings

I have a program that reads from a xml document. In this xml document some of the attributes contain special characters like "\n", "\t", etc.
Is there an easy way to replace all of these strings with the actual character or do I just have to do it manually for each character like the following example?
Manual example:
s.Replace("\\n", "\n").Replace("\\t", "\t")...
edit:
I'm looking for some way to treat the string like an escaped string like this(even though I know this doesn't work)
s.Replace("\\", "\");
Try Regex.Unescape().
Official docs here:
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.unescape(v=vs.110).aspx
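As a quick illustration of that suggestion, the sketch below wraps `Regex.Unescape` in a hypothetical helper (`UnescapeSketch.Decode` is a made-up name). Keep in mind that `Regex.Unescape` is meant for regex escapes, so it also interprets regex-specific sequences and can throw an `ArgumentException` on input it does not recognise; try it against your real data first.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnescapeSketch
{
    // Turns the two-character sequences \n, \t, \\ and so on into the characters they name.
    public static string Decode(string raw)
    {
        return Regex.Unescape(raw);
    }
}
```

For example, `UnescapeSketch.Decode("first\\nsecond\\tend")` returns a string that contains a real newline and a real tab.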
Why not just walk the document and build up the new string in one pass? That saves a lot of duplicate searching and intermediate allocations.
string ConvertSpecialCharacters(string input) {
    var builder = new StringBuilder();
    bool inEscape = false;
    for (int i = 0; i < input.Length; i++) {
        if (inEscape) {
            switch (input[i]) {
                case 'n':
                    builder.Append('\n');
                    break;
                case 't':
                    builder.Append('\t');
                    break;
                default:
                    // unknown escape: keep the backslash and the character as they were
                    builder.Append('\\');
                    builder.Append(input[i]);
                    break;
            }
            inEscape = false;
        }
        else if (input[i] == '\\' && i + 1 < input.Length) {
            inEscape = true;
        }
        else {
            builder.Append(input[i]);
        }
    }
    return builder.ToString();
}

convert or figure formula which is contained parentheses

I need to find a way to expand a formula that uses just digits, letters and parentheses.
For example, for the input '5(2(a)sz)' the output should be 'aaszaaszaaszaaszaasz'.
I tried it this way:
string AddChainDeleteBracks(int open, int close, string input)
{
string to="",from="";
//get the local chain multipule the number in input[open-1]
//the number of the times the chain should be multiplied
for (int i = input[open - 1]; i > 0; i--)
{
//the content
for (int m = open + 1; m < close; m++)
{
to = to + input[m];
}
}
//get the chain i want to replace with "to"
for (int j = open - 1; j <= close; j++)
{
from = from + input[j];
}
String output = input.Replace(from, to);
return output;
}
But it doesn't work. Do you have a better idea to solve this?
You could store the opening parenthesis positions along with the number associated with that parenthesis in a stack (last-in-first-out, e.g. System.Collections.Generic.Stack); then when you encounter the first (that is: next) closing parenthesis, pop the top of the stack: this will give you the beginning and ending position of the substring between the (so far innermost) parentheses you need to repeat. Then replace this portion of the original string (including the repetition number) with the repeated string. Continue until you reach the end of the string.
Things to be aware of:
when you do the replacement, you will need to update your current position so it now points to the end of the repeated string in the new (modified) string
depending on whether 0 repetitions are allowed, you might need to handle an empty repetition -- that is, an empty string
when you reach the end of the string, the stack should be empty (all opening parentheses were matched with a closing one)
the stack might become empty in the middle of the string -- if you then encounter a closing parenthesis, the input string was malformed
there might be a way to escape the opening/closing parentheses, so they don't count as part of the repetition pattern -- this depends on your requirements
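The stack idea can be sketched as follows. This is a variation rather than a literal transcription of the steps above: instead of storing positions and rewriting the original string, it stacks the partially built text of each enclosing level, which avoids having to re-adjust the current position after every replacement. The class and method names are invented for the example, and it assumes that a number is only valid directly before an opening parenthesis and that there is no escape syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class StackExpandSketch
{
    public static string Expand(string input)
    {
        var outer = new Stack<StringBuilder>(); // text collected at the enclosing levels
        var counts = new Stack<int>();          // repeat count of each open group
        var current = new StringBuilder();      // text collected at the current level
        int number = 0;
        bool hasNumber = false;

        foreach (char c in input)
        {
            if (c >= '0' && c <= '9')
            {
                // accumulate a (possibly multi-digit) repeat count
                number = number * 10 + (c - '0');
                hasNumber = true;
                continue;
            }
            if (c == '(')
            {
                outer.Push(current);
                counts.Push(hasNumber ? number : 1); // no number means "once"
                current = new StringBuilder();
            }
            else if (hasNumber)
            {
                throw new FormatException("A number must be followed by '('.");
            }
            else if (c == ')')
            {
                if (outer.Count == 0)
                    throw new FormatException("Unmatched ')'.");
                string inner = current.ToString();
                current = outer.Pop();
                int n = counts.Pop();
                for (int i = 0; i < n; i++)
                    current.Append(inner);
            }
            else
            {
                current.Append(c);
            }
            number = 0;
            hasNumber = false;
        }
        if (hasNumber) throw new FormatException("Trailing number.");
        if (outer.Count != 0) throw new FormatException("Unmatched '('.");
        return current.ToString();
    }
}
```

`StackExpandSketch.Expand("5(2(a)sz)")` yields `aaszaaszaaszaaszaasz`; a count of 0 produces an empty repetition, and unbalanced parentheses raise a `FormatException`.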
Since the syntax of your expression is recursive, I suggest a recursive approach.
First split the expression into single tokens. I use Regex to do it and remove empty entries.
Example: "5(2(a)sz)" is split into "5", "(", "2", "(", "a", ")", "sz", ")"
Using an Enumerator enables you to get the tokens one by one. tokens.MoveNext() gets the next token. tokens.Current is the current token.
public string ConvertExpression(string expression)
{
IEnumerator<string> tokens = Regex.Split(expression, @"\b")
.Where(s => s != "")
.GetEnumerator();
if (tokens.MoveNext()) {
return Parse(tokens);
}
return "";
}
Here the main job is done in a recursive way
private string Parse(IEnumerator<string> tokens)
{
string s = "";
while (tokens.Current != ")") {
int n;
if (tokens.Current == "(") {
if (tokens.MoveNext()) {
s += Parse(tokens);
if (tokens.Current == ")") {
tokens.MoveNext();
return s;
}
}
} else if (Int32.TryParse(tokens.Current, out n)) {
if (tokens.MoveNext()) {
string subExpr = Parse(tokens);
var sb = new StringBuilder();
for (int i = 0; i < n; i++) {
sb.Append(subExpr);
}
s += sb.ToString();
}
} else {
s += tokens.Current;
if (!tokens.MoveNext())
return s;
}
}
return s;
}
Here is my second answer. My first answer was a quick shot. Here I tried to create a parser by doing the things one by one.
In order to convert an expression, you need to parse it. This means that you have to analyze its syntax. While analyzing its syntax you can produce an output as well.
1 The first thing to do, is to define the syntax of all the valid expressions.
Here I use EBNF to do it. EBNF is simple.
{ and } enclose repetitions (possibly zero).
[ and ] enclose an optional part.
| separates alternatives.
See Extended Backus–Naur Form (EBNF) on Wikipedia for more detailed information on EBNF. (The EBNF variant used here drops the concatenation operator ",").
Our syntax in EBNF
Expression = { Term }.
Term = [ Number ] Factor.
Factor = Text | "(" Expression ")" | Term.
Examples
5(2(a)sz) => aaszaaszaaszaaszaasz
5(2a sz) => aaszaaszaaszaaszaasz
2 3(a 2b)c => abbabbabbabbabbabbc
2 Lexical analysis
Before we analyze the syntax we have to split the whole expression into single lexical tokens (numbers, operators, etc.).
We use an enum to indicate the token type
private enum TokenType
{
None,
LPar,
RPar,
Number,
Text
}
The following fields are used to hold the token information and the Boolean _error which tells whether an error occurred during parsing.
private IEnumerator<Match> _matches;
TokenType _tokenType;
string _text;
int _number;
bool _error;
The method ConvertExpression starts the conversion. It splits the expression into single tokens represented as Regex.Matches.
Those are used by the method GetToken, which in turn converts the Regex.Matches into more useful information. This information is stored in the fields described above.
public string ConvertExpression(string expression)
{
_matches = Regex.Matches(expression, @"\d+|\(|\)|[a-zA-Z]+")
.Cast<Match>()
.GetEnumerator();
_error = false;
return GetToken() ? Expression() : "";
}
private bool GetToken()
{
_number = 0;
_tokenType = TokenType.None;
_text = null;
if (_error || !_matches.MoveNext())
return false;
_text = _matches.Current.Value;
switch (_text[0]) {
case '(':
_tokenType = TokenType.LPar;
break;
case ')':
_tokenType = TokenType.RPar;
break;
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
_tokenType = TokenType.Number;
_number = Int32.Parse(_text);
break;
default:
_tokenType = TokenType.Text;
break;
}
return true;
}
3 Syntactic and Semantic Analysis
Now we have everything we need to perform the actual parsing and expression conversion. Each of the methods below analyses one EBNF syntax production and returns the result of the conversion as string.
The conversion of EBNF into C# code is straightforward. A repetition in the syntax is converted to a C# loop statement.
An option is converted to an if statement and alternatives are converted to a switch statement.
// Expression = { Term }.
private string Expression()
{
string s = "";
do {
s += Term();
} while (_tokenType != TokenType.RPar && _tokenType != TokenType.None);
return s;
}
// Term = [ Number ] Factor.
private string Term()
{
int n;
if (_tokenType == TokenType.Number) {
n = _number;
if (!GetToken()) {
_error = true;
return " Error: Factor expected.";
}
string factor = Factor();
if (_error) {
return factor;
}
var sb = new StringBuilder(n * factor.Length);
for (int i = 0; i < n; i++) {
sb.Append(factor);
}
return sb.ToString();
}
return Factor();
}
// Factor = Text | "(" Expression ")" | Term.
private string Factor()
{
switch (_tokenType) {
case TokenType.None:
_error = true;
return " Error: Unexpected end of Expression.";
case TokenType.LPar:
if (GetToken()) {
string s = Expression();
if (_tokenType == TokenType.RPar) {
GetToken();
return s;
} else {
_error = true;
return s + " Error ')' expected.";
}
} else {
_error = true;
return " Error: Unexpected end of Expression.";
}
case TokenType.RPar:
_error = true;
GetToken();
return " Error: Unexpected ')'.";
case TokenType.Text:
string t = _text;
GetToken();
return t;
default:
return Term();
}
}

Parse Text or script into C# code

I want to store a combination of method invocations and formulas as text or script in a database. For example, I want to store something like this in the database as a string and execute it somewhere in code:
if(Vessel.Weight>200)
{
return Cargo.Weight*Cargo.Tariff*2
}
else
{
return Cargo.Weight*Cargo.Tariff
}
invocation:
var cost = executeFormula("CalculateTariff",new {Cargo= GetCargo(), Vessel=GetVessel()});
These rules will change frequently, and I don't want to deploy a DLL (CLR solution), nor do I want to store these rules as stored procedures and mix business rules with the DAL.
Any idea or tool?
If you place all values in a hashtable or a dictionary by changing the . with a _ (Vessel.Weight will become "Vessel_Weight") and simplify the syntax to one line it will be much easier to create a solution. This rule can be written for example as:
result=(Vessel_Weight>200)
?(Cargo_Weight*Cargo_Tariff*2)
:(Cargo_Weight*Cargo_Tariff)
Having rules defined like the above one you can use the following (draft, not optimal ...) code as a guide for a properly coded function that will do the job. I repeat that the following code is not perfect, but bottom line it's more than enough as a proof of concept.
Dictionary<string, dynamic> compute = new Dictionary<string, dynamic>();
compute.Add("Vessel_Weight", 123);
compute.Add("Cargo_Weight", 24);
compute.Add("Cargo_Tariff", 9);
// the rule must be a single line; it is only split here for readability
string rule = "result=(Vessel_Weight>200)" +
"?(Cargo_Weight*Cargo_Tariff*2)" +
":(Cargo_Weight*Cargo_Tariff)";
string process = rule.Replace(" ", "");
foreach (Match level1 in Regex.Matches(process, "\\([^\\)]+\\)"))
{
string parenthesis = level1.Value;
string keepit = parenthesis;
Console.Write("{0} -> ", parenthesis);
// replace all named variable with values from the dictionary
foreach (Match level2 in Regex.Matches(parenthesis, "[a-zA-Z0-9_]+"))
{
string variable = level2.Value;
if (Regex.IsMatch(variable, "[a-zA-Z_]+"))
{
if (!compute.ContainsKey(variable))
throw new Exception("Variable not found");
parenthesis = parenthesis.Replace(variable, compute[variable].ToString());
}
}
parenthesis = parenthesis.Replace("(", "").Replace(")", "");
Console.Write("{0} -> ", parenthesis);
// do the math
List<double> d = new List<double>();
foreach (Match level3 in Regex.Matches(parenthesis, "[0-9]+(\\.[0-9]+)?"))
{
d.Add(double.Parse(level3.Value));
parenthesis = Regex.Replace(parenthesis, level3.Value, "");
}
double start = d[0];
for (var i = 1; i < d.Count; i++)
{
switch (parenthesis[i - 1])
{
case '+':
start += d[i];
break;
case '-':
start -= d[i];
break;
case '*':
start *= d[i];
break;
case '/':
start /= d[i];
break;
case '=':
start = (start == d[i]) ? 0 : 1;
break;
case '>':
start = (start > d[i]) ? 0 : 1;
break;
case '<':
start = (start < d[i]) ? 0 : 1;
break;
}
}
parenthesis = start.ToString();
Console.WriteLine(parenthesis);
rule = rule.Replace(keepit, parenthesis);
}
Console.WriteLine(rule);
// pick a value in case of a condition (0 selects the first branch, 1 the second)
string condition = "[0-9]+(\\.[0-9]+)?\\?[0-9]+(\\.[0-9]+)?:[0-9]+(\\.[0-9]+)?";
if (Regex.IsMatch(rule, condition))
{
MatchCollection m = Regex.Matches(rule, "[0-9]+(\\.[0-9]+)?");
int check = int.Parse(m[0].Value) + 1;
rule = rule.Replace(Regex.Match(rule, condition).Value, m[check].Value);
}
Console.WriteLine(rule);
// final touch
int equal = rule.IndexOf("=");
// the name is everything before '=' (Substring takes a length, so no "- 1" here)
compute.Add(rule.Substring(0, equal), double.Parse(rule.Substring(equal + 1)));
Now the result is a named item in the dictionary. This way you may process more rules that produce intermediate results and have a final rule based on them. The code as written does not guarantee correct execution order for arithmetic operations, but if you keep your rules simple (and possibly split them if needed) you will achieve your goal.
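If operator precedence matters, one alternative worth trying is to keep the dictionary-substitution step from above but hand the arithmetic to `DataTable.Compute`, which evaluates expressions written in the `DataColumn.Expression` syntax (including `IIF`). The sketch below is an assumption-laden illustration, not part of the answer above: `RuleSketch` and `Evaluate` are invented names, and the rule has to be rewritten from the `?:` form into `IIF(condition, a, b)`.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;
using System.Text.RegularExpressions;

public static class RuleSketch
{
    // Substitutes named values into the expression, then lets DataTable.Compute do the math.
    public static double Evaluate(string expression, IDictionary<string, double> values)
    {
        string substituted = Regex.Replace(expression, @"[A-Za-z_][A-Za-z0-9_]*", m =>
        {
            if (m.Value == "IIF") return m.Value; // keep the built-in function name
            double v;
            if (!values.TryGetValue(m.Value, out v))
                throw new KeyNotFoundException("Variable not found: " + m.Value);
            return v.ToString(CultureInfo.InvariantCulture);
        });
        object result = new DataTable().Compute(substituted, null);
        return Convert.ToDouble(result, CultureInfo.InvariantCulture);
    }
}
```

With Vessel_Weight = 123, Cargo_Weight = 24 and Cargo_Tariff = 9, the rule `IIF(Vessel_Weight > 200, Cargo_Weight * Cargo_Tariff * 2, Cargo_Weight * Cargo_Tariff)` should evaluate to 216, the same value the hand-written evaluator produces.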

c# pad left to string

I want to find an efficient way to do this:
I have a string like:
'1,2,5,11,33'
I want to pad with a zero only the numbers below 10 (those that have one digit),
so I want to get
'01,02,05,11,33'
thanks
How much do you really care about efficiency? Personally I'd use:
string padded = string.Join(",", original.Split(',')
.Select(x => x.PadLeft(2, '0')));
(As pointed out in the comments, if you're using .NET 3.5 you'll need a call to ToArray after the Select.)
That's definitely not the most efficient solution, but it's what I would use until I'd proved that it wasn't efficient enough. Here's an alternative...
// Make more general if you want, with parameters for the separator, length etc
public static string PadCommaSeparated(string text)
{
StringBuilder builder = new StringBuilder();
int start = 0;
int nextComma = text.IndexOf(',');
while (nextComma >= 0)
{
int itemLength = nextComma - start;
switch (itemLength)
{
case 0:
builder.Append("00,");
break;
case 1:
builder.Append("0");
goto default;
default:
builder.Append(text, start, itemLength);
builder.Append(",");
break;
}
start = nextComma + 1;
nextComma = text.IndexOf(',', start);
}
// Now deal with the end...
int finalItemLength = text.Length - start;
switch (finalItemLength)
{
case 0:
builder.Append("00");
break;
case 1:
builder.Append("0");
goto default;
default:
builder.Append(text, start, finalItemLength);
break;
}
return builder.ToString();
}
It's horrible code, but I think it will do what you want...
string input = "1,2,3,11,33";
string[] split = input.Split(',');
List<string> outputList = new List<string>();
foreach (var s in split)
{
outputList.Add(s.PadLeft(2, '0'));
}
string output = string.Join(",", outputList.ToArray());
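For comparison, here is a small self-contained sketch of the one-liner from the first answer next to a regex variant. The regex version is not from any of the answers; it assumes the items are plain non-negative integers, and the names `PadSketch`, `PadWithLinq` and `PadWithRegex` are invented for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PadSketch
{
    // Split, pad every item to two characters, and join again.
    public static string PadWithLinq(string csv)
    {
        return string.Join(",", csv.Split(',').Select(x => x.PadLeft(2, '0')).ToArray());
    }

    // \b\d\b matches a digit that stands alone, i.e. a one-digit number.
    public static string PadWithRegex(string csv)
    {
        return Regex.Replace(csv, @"\b\d\b", m => "0" + m.Value);
    }
}
```

Both methods turn `1,2,5,11,33` into `01,02,05,11,33`. They differ on empty items: `PadWithLinq` pads an empty item to `00`, while `PadWithRegex` leaves it empty.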

Best way to convert Pascal Case to a sentence

What is the best way to convert from Pascal Case (upper Camel Case) to a sentence.
For example starting with
"AwaitingFeedback"
and converting that to
"Awaiting feedback"
C# preferable but I could convert it from Java or similar.
public static string ToSentenceCase(this string str)
{
return Regex.Replace(str, "[a-z][A-Z]", m => m.Value[0] + " " + char.ToLower(m.Value[1]));
}
In Visual Studio 2015 and later (C# 6), you can use string interpolation:
public static string ToSentenceCase(this string str)
{
return Regex.Replace(str, "[a-z][A-Z]", m => $"{m.Value[0]} {char.ToLower(m.Value[1])}");
}
Based on: Converting Pascal case to sentences using regular expression
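To see what this regex does on a few inputs, here is the same replacement as a self-contained sketch (written as a plain static method in a made-up class, `SentenceCaseSketch`, rather than as an extension method). It only acts on a lower-case letter followed directly by an upper-case one, so runs of capitals are not handled gracefully.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SentenceCaseSketch
{
    public static string ToSentenceCase(string str)
    {
        // For each "aB" pair: keep the 'a', insert a space, and lower-case the 'B'.
        return Regex.Replace(str, "[a-z][A-Z]",
            m => m.Value[0] + " " + char.ToLower(m.Value[1]));
    }
}
```

`ToSentenceCase("AwaitingFeedback")` gives `Awaiting feedback`, while an input with an acronym such as `CustomerID` comes out as `Customer iD`, which is worth knowing before relying on it.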
I would prefer to use Humanizer for this. Humanizer is a Portable Class Library that meets all your .NET needs for manipulating and displaying strings, enums, dates, times, timespans, numbers and quantities.
Short Answer
"AwaitingFeedback".Humanize() => Awaiting feedback
Long and Descriptive Answer
Humanizer can do a lot more. Other examples are:
"PascalCaseInputStringIsTurnedIntoSentence".Humanize() => "Pascal case input string is turned into sentence"
"Underscored_input_string_is_turned_into_sentence".Humanize() => "Underscored input string is turned into sentence"
"Can_return_title_Case".Humanize(LetterCasing.Title) => "Can Return Title Case"
"CanReturnLowerCase".Humanize(LetterCasing.LowerCase) => "can return lower case"
Complete code is :
using Humanizer;
using static System.Console;
namespace HumanizerConsoleApp
{
class Program
{
static void Main(string[] args)
{
WriteLine("AwaitingFeedback".Humanize());
WriteLine("PascalCaseInputStringIsTurnedIntoSentence".Humanize());
WriteLine("Underscored_input_string_is_turned_into_sentence".Humanize());
WriteLine("Can_return_title_Case".Humanize(LetterCasing.Title));
WriteLine("CanReturnLowerCase".Humanize(LetterCasing.LowerCase));
}
}
}
Output
Awaiting feedback
Pascal case input string is turned into sentence
Underscored input string is turned into sentence
Can Return Title Case
can return lower case
If you prefer to write your own C# code, you can do that too, as others have already answered.
Here you go...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace CamelCaseToString
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(CamelCaseToString("ThisIsYourMasterCallingYou"));
}
private static string CamelCaseToString(string str)
{
if (str == null || str.Length == 0)
return null;
StringBuilder retVal = new StringBuilder(32);
retVal.Append(char.ToUpper(str[0]));
for (int i = 1; i < str.Length; i++ )
{
if (char.IsLower(str[i]))
{
retVal.Append(str[i]);
}
else
{
retVal.Append(" ");
retVal.Append(char.ToLower(str[i]));
}
}
return retVal.ToString();
}
}
}
This works for me:
Regex.Replace(strIn, "([A-Z]{1,2}|[0-9]+)", " $1").TrimStart()
This is just like @SSTA's answer, but is more efficient than calling TrimStart.
Regex.Replace("ThisIsMyCapsDelimitedString", "(\\B[A-Z])", " $1")
Found this in the MvcContrib source, doesn't seem to be mentioned here yet.
return Regex.Replace(input, "([A-Z])", " $1", RegexOptions.Compiled).Trim();
Just because everyone has been using Regex (except this guy), here's an implementation with StringBuilder that was about 5x faster in my tests. Includes checking for numbers too.
"SomeBunchOfCamelCase2".FromCamelCaseToSentence() == "Some Bunch Of Camel Case 2"
public static string FromCamelCaseToSentence(this string input) {
if(string.IsNullOrEmpty(input)) return input;
var sb = new StringBuilder();
// start with the first character -- consistent camelcase and pascal case
sb.Append(char.ToUpper(input[0]));
// march through the rest of it
for(var i = 1; i < input.Length; i++) {
// any time we hit an uppercase OR number, it's a new word
if(char.IsUpper(input[i]) || char.IsDigit(input[i])) sb.Append(' ');
// add regularly
sb.Append(input[i]);
}
return sb.ToString();
}
Here's a basic way of doing it that I came up with using Regex
public static string CamelCaseToSentence(this string value)
{
var sb = new StringBuilder();
var firstWord = true;
foreach (var match in Regex.Matches(value, "([A-Z][a-z]+)|[0-9]+"))
{
if (firstWord)
{
sb.Append(match.ToString());
firstWord = false;
}
else
{
sb.Append(" ");
sb.Append(match.ToString().ToLower());
}
}
return sb.ToString();
}
It will also split off numbers which I didn't specify but would be useful.
string camel = "MyCamelCaseString";
string s = Regex.Replace(camel, "([A-Z])", " $1").ToLower().Trim();
Console.WriteLine(s.Substring(0,1).ToUpper() + s.Substring(1));
Edit: didn't notice your casing requirements, modified accordingly. You could use a MatchEvaluator to do the casing, but I think a substring is easier. You could also wrap it in a second regex replace where you change the first character
"^\w"
to upper
\U (i think)
I'd use a regex, inserting a space before each upper case character, then lowering all the string.
string spacedString = System.Text.RegularExpressions.Regex.Replace(yourString, @"\B([A-Z])", " $1");
spacedString = spacedString.ToLower();
It is easy to do in JavaScript (or PHP, etc.) where you can define a function in the replace call:
var camel = "AwaitingFeedbackDearMaster";
var sentence = camel.replace(/([A-Z].)/g, function (c) { return ' ' + c.toLowerCase(); });
alert(sentence);
Although I haven't solved the initial cap problem... :-)
Now, for the Java solution:
String ToSentence(String camel)
{
if (camel == null) return ""; // Or null...
String[] words = camel.split("(?=[A-Z])");
if (words == null) return "";
if (words.length == 1) return words[0];
StringBuilder sentence = new StringBuilder(camel.length());
if (words[0].length() > 0) // Just in case of camelCase instead of CamelCase
{
sentence.append(words[0] + " " + words[1].toLowerCase());
}
else
{
sentence.append(words[1]);
}
for (int i = 2; i < words.length; i++)
{
sentence.append(" " + words[i].toLowerCase());
}
return sentence.toString();
}
System.out.println(ToSentence("AwaitingAFeedbackDearMaster"));
System.out.println(ToSentence(null));
System.out.println(ToSentence(""));
System.out.println(ToSentence("A"));
System.out.println(ToSentence("Aaagh!"));
System.out.println(ToSentence("stackoverflow"));
System.out.println(ToSentence("disableGPS"));
System.out.println(ToSentence("Ahh89Boo"));
System.out.println(ToSentence("ABC"));
Note the trick to split the sentence without losing any character...
Pseudo-code:
NewString = "";
Loop through every char of the string (skip the first one)
If char is upper-case ('A'-'Z')
NewString = NewString + ' ' + lowercase(char)
Else
NewString = NewString + char
Better ways can perhaps be done by using regex or by string replacement routines (replace 'X' with ' x')
An xquery solution that works for both UpperCamel and lowerCamel case:
To output sentence case (only the first character of the first word is capitalized):
declare function content:sentenceCase($string)
{
let $firstCharacter := substring($string, 1, 1)
let $remainingCharacters := substring-after($string, $firstCharacter)
return
concat(upper-case($firstCharacter),lower-case(replace($remainingCharacters, '([A-Z])', ' $1')))
};
To output title case (first character of each word capitalized):
declare function content:titleCase($string)
{
let $firstCharacter := substring($string, 1, 1)
let $remainingCharacters := substring-after($string, $firstCharacter)
return
concat(upper-case($firstCharacter),replace($remainingCharacters, '([A-Z])', ' $1'))
};
Found myself doing something similar, and I appreciate having a point-of-departure with this discussion. This is my solution, placed as an extension method to the string class in the context of a console application.
using System;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string piratese = "avastTharMatey";
string ivyese = "CheerioPipPip";
Console.WriteLine("{0}\n{1}\n", piratese.CamelCaseToString(), ivyese.CamelCaseToString());
Console.WriteLine("For Pete\'s sake, man, hit ENTER!");
string strExit = Console.ReadLine();
}
}
public static class StringExtension
{
public static string CamelCaseToString(this string str)
{
StringBuilder retVal = new StringBuilder(32);
if (!string.IsNullOrEmpty(str))
{
string strTrimmed = str.Trim();
if (!string.IsNullOrEmpty(strTrimmed))
{
retVal.Append(char.ToUpper(strTrimmed[0]));
if (strTrimmed.Length > 1)
{
for (int i = 1; i < strTrimmed.Length; i++)
{
if (char.IsUpper(strTrimmed[i])) retVal.Append(" ");
retVal.Append(char.ToLower(strTrimmed[i]));
}
}
}
}
return retVal.ToString();
}
}
}
Most of the preceding answers split acronyms and numbers, adding a space in front of each character. I wanted acronyms and numbers to be kept together so I have a simple state machine that emits a space every time the input transitions from one state to the other.
/// <summary>
/// Add a space before any capitalized letter (but not for a run of capitals or numbers)
/// </summary>
internal static string FromCamelCaseToSentence(string input)
{
if (string.IsNullOrEmpty(input)) return String.Empty;
var sb = new StringBuilder();
bool upper = true;
for (var i = 0; i < input.Length; i++)
{
bool isUpperOrDigit = char.IsUpper(input[i]) || char.IsDigit(input[i]);
// any time we transition to upper or digits, it's a new word
if (!upper && isUpperOrDigit)
{
sb.Append(' ');
}
sb.Append(input[i]);
upper = isUpperOrDigit;
}
return sb.ToString();
}
And here are some tests:
[TestCase(null, ExpectedResult = "")]
[TestCase("", ExpectedResult = "")]
[TestCase("ABC", ExpectedResult = "ABC")]
[TestCase("abc", ExpectedResult = "abc")]
[TestCase("camelCase", ExpectedResult = "camel Case")]
[TestCase("PascalCase", ExpectedResult = "Pascal Case")]
[TestCase("Pascal123", ExpectedResult = "Pascal 123")]
[TestCase("CustomerID", ExpectedResult = "Customer ID")]
[TestCase("CustomABC123", ExpectedResult = "Custom ABC123")]
public string CanSplitCamelCase(string input)
{
return FromCamelCaseToSentence(input);
}
Mostly already answered here
Small change to the accepted answer, to convert the second and subsequent capitalised letters to lower case, so change
if (char.IsUpper(text[i]))
newText.Append(' ');
newText.Append(text[i]);
to
if (char.IsUpper(text[i]))
{
newText.Append(' ');
newText.Append(char.ToLower(text[i]));
}
else
newText.Append(text[i]);
Here is my implementation. This is the fastest that I got while avoiding creating spaces for abbreviations.
public static string PascalCaseToSentence(string input)
{
if (string.IsNullOrEmpty(input) || input.Length < 2)
return input;
var sb = new char[input.Length + ((input.Length + 1) / 2)];
var len = 0;
var lastIsLower = false;
for (int i = 0; i < input.Length; i++)
{
var current = input[i];
if (current < 97)
{
if (lastIsLower)
{
sb[len] = ' ';
len++;
}
lastIsLower = false;
}
else
{
lastIsLower = true;
}
sb[len] = current;
len++;
}
return new string(sb, 0, len);
}
