How to keep only numbers and some special characters in a string? - c#

I have a string containing regular characters, special characters and numbers. I'm trying to remove the regular characters, just keeping the numbers and the special characters. I use a loop to check if a character is a special character or a number. Then, I replace it with an empty string. However, this doesn't seem to work because I get an error "can't apply != to string or char". My code is below. If possible, please give me some ideas to fix this. Thanks.
public string convert_string_to_no(string val)
{
string str_val = "";
int val_len = val.Length;
for (int i = 0; i < val_len; i++)
{
char myChar = Convert.ToChar(val.Substring(i, 1));
if ((char.IsDigit(myChar) == false) && (myChar != "-"))
{
str_val = str_val.replace(str_val.substring(i,1),"");
}
}
return str_val;
}

you can use regular expressions to do that.its faster than using loop and clean
String test ="Q1W2-hjkxas1-EE3R4-5T";
Regex rgx = new Regex("[^0-9-]");
Console.WriteLine(rgx.Replace(test, ""));
check the working code here

It seem try to change "-" to '-', and better to construct the string and not replacing the char.
public string convert_string_to_no(string val)
{
string str_val = "";
int val_len = val.Length;
for (int i = 0; i < val_len; i++)
{
char myChar = Convert.ToChar(val.Substring(i, 1));
if (char.IsDigit(myChar) && myChar == '-')
{
str_val += myChar;
}
}
return str_val;
}

I perfer Linq:
public static class StringExtensions
{
public static string ToLimitedString(this string instance,
string validCharacters)
{
// null reference checking...
var result = new string(instance
.Where(c => validCharacters.Contains(c))
.ToArray());
return result;
}
}
usage:
var test ="Q1W2-hjkxas1-EE3R4-5T";
var limited = test.ToLimitedString("01234567890-");
Console.WriteLine(limited);
result:
12-1-34-5
DotNetFiddle Example

Related

From end to start how to get everything from the 5th occurence of "-"?

I have this string:
abc-8479fe75-82rb5-45g00-b563-a7346703098b
I want to go through this string starting from the end and grab everything till the 5th occurence of the "-" sign.
how do I substring this string to only show (has to be through the procedure described above): -8479fe75-82rb5-45g00-b563-a7346703098b
You can also define handy extension method if that's common use case:
public static class StringExtensions
{
public static string GetSubstringFromReversedOccurence(this string #this,
char #char,
int occurence)
{
var indexes = new List<int>();
var idx = 0;
while ((idx = #this.IndexOf(#char, idx + 1)) > 0)
{
indexes.Add(idx);
}
if (occurence > indexes.Count)
{
throw new Exception("Not enough occurences of character");
}
var desiredIndex = indexes[indexes.Count - occurence];
return #this.Substring(desiredIndex);
}
}
According to your string rules, I would do a simple split :
var text = "abc-8479fe75-82rb5-45g00-b563-a7346703098b";
var split = text.split('-');
if (split.Lenght <= 5)
return text;
return string.Join('-', split.Skip(split.Lenght - 5);
We could use a regex replacement with a capture group:
var text = "abc-8479fe75-82rb5-45g00-b563-a7346703098b";
var pattern = #"([^-]+(?:-[^-]+){4}).*";
var output = Regex.Replace(text, pattern, "$1");
Console.WriteLine(output); // abc-8479fe75-82rb5-45g00-b563
If we can rely on the input always having 6 hyphen separated components, we could also use this version:
var text = "abc-8479fe75-82rb5-45g00-b563-a7346703098b";
var pattern = #"-[^-]+$";
var output = Regex.Replace(text, pattern, "");
Console.WriteLine(output); // abc-8479fe75-82rb5-45g00-b563
I think this should be the most performant
public string GetSubString(string text)
{
var count = 0;
for (int i = text.Length-1; i > 0; i--)
{
if (text[i] == '-') count++;
if (count == 5) return text.Substring(i);
}
return null;
}

Multiple string replace in c#

I am dynamically editing a regex for matching text in a pdf, which can contain hyphenation at the end of some lines.
Example:
Source string:
"consecuti?vely"
Replace rules:
.Replace("cuti?",#"cuti?(-\s+)?")
.Replace("con",#"con(-\s+)?")
.Replace("consecu",#"consecu(-\s+)?")
Desired output:
"con(-\s+)?secu(-\s+)?ti?(-\s+)?vely"
The replace rules are built dynamically, this is just an example which causes problems.
Whats the best solution to perform such a multiple replace, which will produce the desired output?
So far I thought about using Regex.Replace and zipping the word to replace with optional (-\s+)? between each character, but that would not work, because the word to replace already contains special-meaning characters in regex context.
EDIT: My current code, doesnt work when replace rules overlap like in example above
private string ModifyRegexToAcceptHyphensOfCurrentPage(string regex, int searchedPage)
{
var originalTextOfThePage = mPagesNotModified[searchedPage];
var hyphenatedParts = Regex.Matches(originalTextOfThePage, #"\w+\-\s");
for (int i = 0; i < hyphenatedParts.Count; i++)
{
var partBeforeHyphen = String.Concat(hyphenatedParts[i].Value.TakeWhile(c => c != '-'));
regex = regex.Replace(partBeforeHyphen, partBeforeHyphen + #"(-\s+)?");
}
return regex;
}
the output of this program is "con(-\s+)?secu(-\s+)?ti?(-\s+)?vely";
and as I understand your problem, my code can completely solve your problem.
class Program
{
class somefields
{
public string first;
public string secound;
public string Add;
public int index;
public somefields(string F, string S)
{
first = F;
secound = S;
}
}
static void Main(string[] args)
{
//declaring output
string input = "consecuti?vely";
List<somefields> rules=new List<somefields>();
//declaring rules
rules.Add(new somefields("cuti?",#"cuti?(-\s+)?"));
rules.Add(new somefields("con",#"con(-\s+)?"));
rules.Add(new somefields("consecu",#"consecu(-\s+)?"));
// finding the string which must be added to output string and index of that
foreach (var rul in rules)
{
var index=input.IndexOf(rul.first);
if (index != -1)
{
var add = rul.secound.Remove(0,rul.first.Count());
rul.Add = add;
rul.index = index+rul.first.Count();
}
}
// sort rules by index
for (int i = 0; i < rules.Count(); i++)
{
for (int j = i + 1; j < rules.Count(); j++)
{
if (rules[i].index > rules[j].index)
{
somefields temp;
temp = rules[i];
rules[i] = rules[j];
rules[j] = temp;
}
}
}
string output = input.ToString();
int k=0;
foreach(var rul in rules)
{
if (rul.index != -1)
{
output = output.Insert(k + rul.index, rul.Add);
k += rul.Add.Length;
}
}
System.Console.WriteLine(output);
System.Console.ReadLine();
}
}
You should probably write your own parser, it's probably easier to maintain :).
Maybe you could add "special characters" around pattern in order to protect them like "##" if the strings not contains it.
Try this one:
var final = Regex.Replace(originalTextOfThePage, #"(\w+)(?:\-[\s\r\n]*)?", "$1");
I had to give up an easy solution and did the editing of the regex myself. As a side effect, the new approach goes only twice trough the string.
private string ModifyRegexToAcceptHyphensOfCurrentPage(string regex, int searchedPage)
{
var indexesToInsertPossibleHyphenation = GetPossibleHyphenPositions(regex, searchedPage);
var hyphenationToken = #"(-\s+)?";
return InsertStringTokenInAllPositions(regex, indexesToInsertPossibleHyphenation, hyphenationToken);
}
private static string InsertStringTokenInAllPositions(string sourceString, List<int> insertionIndexes, string insertionToken)
{
if (insertionIndexes == null || string.IsNullOrEmpty(insertionToken)) return sourceString;
var sb = new StringBuilder(sourceString.Length + insertionIndexes.Count * insertionToken.Length);
var linkedInsertionPositions = new LinkedList<int>(insertionIndexes.Distinct().OrderBy(x => x));
for (int i = 0; i < sourceString.Length; i++)
{
if (!linkedInsertionPositions.Any())
{
sb.Append(sourceString.Substring(i));
break;
}
if (i == linkedInsertionPositions.First.Value)
{
sb.Append(insertionToken);
}
if (i >= linkedInsertionPositions.First.Value)
{
linkedInsertionPositions.RemoveFirst();
}
sb.Append(sourceString[i]);
}
return sb.ToString();
}
private List<int> GetPossibleHyphenPositions(string regex, int searchedPage)
{
var originalTextOfThePage = mPagesNotModified[searchedPage];
var hyphenatedParts = Regex.Matches(originalTextOfThePage, #"\w+\-\s");
var indexesToInsertPossibleHyphenation = new List<int>();
//....
// Aho-Corasick to find all occurences of all
//strings in "hyphenatedParts" in the "regex" string
// ....
return indexesToInsertPossibleHyphenation;
}

Getting number from a string in C#

I am scraping some website content which is like this - "Company Stock Rs. 7100".
Now, what i want is to extract the numeric value from this string. I tried split but something or the other goes wrong with my regular expression.
Please let me know how to get this value.
Use:
var result = Regex.Match(input, #"\d+").Value;
If you want to find only number which is last "entity" in the string you should use this regex:
\d+$
If you want to match last number in the string, you can use:
\d+(?!\D*\d)
int val = int.Parse(Regex.Match(input, #"\d+", RegexOptions.RightToLeft).Value);
I always liked LINQ:
var theNumber = theString.Where(x => char.IsNumber(x));
Though Regex sounds like the native choice...
This code will return the integer at the end of the string. This will work better than the regular expressions in the case that there is a number somewhere else in the string.
public int getLastInt(string line)
{
int offset = line.Length;
for (int i = line.Length - 1; i >= 0; i--)
{
char c = line[i];
if (char.IsDigit(c))
{
offset--;
}
else
{
if (offset == line.Length)
{
// No int at the end
return -1;
}
return int.Parse(line.Substring(offset));
}
}
return int.Parse(line.Substring(offset));
}
If your number is always after the last space and your string always ends with this number, you can get it this way:
str.Substring(str.LastIndexOf(" ") + 1)
Here is my answer ....it is separating numeric from string using C#....
static void Main(string[] args)
{
String details = "XSD34AB67";
string numeric = "";
string nonnumeric = "";
char[] mychar = details.ToCharArray();
foreach (char ch in mychar)
{
if (char.IsDigit(ch))
{
numeric = numeric + ch.ToString();
}
else
{
nonnumeric = nonnumeric + ch.ToString();
}
}
int i = Convert.ToInt32(numeric);
Console.WriteLine(numeric);
Console.WriteLine(nonnumeric);
Console.ReadLine();
}
}
}
You can use \d+ to match the first occurrence of a number:
string num = Regex.Match(input, #"\d+").Value;

How can you remove duplicate characters in a string?

I have to implements a function that takes a string as an input and finds the non-duplicate character from this string.
So an an example is if I pass string str = "DHCD" it will return "DHC"
or str2 = "KLKLHHMO" it will return "KLHMO"
A Linq approach:
public static string RemoveDuplicates(string input)
{
return new string(input.ToCharArray().Distinct().ToArray());
}
It will do the job
string removedupes(string s)
{
string newString = string.Empty;
List<char> found = new List<char>();
foreach(char c in s)
{
if(found.Contains(c))
continue;
newString+=c.ToString();
found.Add(c);
}
return newString;
}
I should note this is criminally inefficient.
I think I was delirious on first revision.
For arbitrary length strings of byte-sized characters (not for wide characters or other encodings), I would use a lookup table, one bit per character (32 bytes for a 256-bit table). Loop through your string, only output characters that don't have their bits turned on, then turn the bit on for that character.
string removedupes(string s)
{
string t;
byte[] found = new byte[256];
foreach(char c in s)
{
if(!found[c]) {
t.Append(c);
found[c]=1;
}
}
return t;
}
I am not good with C#, so I don't know the right way to use a bitfield instead of a byte array.
If you know that your strings are going to be very short, then other approaches would offer better memory usage and/or speed.
void removeDuplicate()
{
string value1 = RemoveDuplicateChars("Devarajan");
}
static string RemoveDuplicateChars(string key)
{
string result = "";
foreach (char value in key)
if (result.IndexOf(value) == -1)
result += value;
return result;
}
It sounds like homework to me, so I'm just going to describe at a high level.
Loop over the string, examining each character
Check if you've seen the character before
if you have, remove it from the string
if you haven't, note that you've now seen that character
this is in C#. validation left out for brevity. primitive solution for removing duplicate chars from a given string
public static char[] RemoveDup(string s)
{
char[] chars = new char[s.Length];
int unique = 0;
chars[unique] = s[0]; // Assume: First char is unique
for (int i = 1; i < s.Length; i++)
{
// add char in i index to unique array
// if char in i-1 != i index
// i.e s = "ab" -> a != b
if (s[i-1] != s[i]
chars[++unique] = s[i];
}
return chars;
}
My answer in java language.
Posting here so that you might get a idea even it is in Java language.Algorithm would remain same.
public String removeDup(String s)
{
if(s==null) return null;
int l = s.length();
//if length is less than 2 return string
if(l<2)return s;
char arr[] = s.toCharArray();
for(int i=0;i<l;i++)
{
int j =i+1; //index to check with ith index
int t = i+1; //index of first repetative char.
while(j<l)
{
if(arr[j]==arr[i])
{
j++;
}
else
{
arr[t]=arr[j];
t++;
j++;
}
}
l=t;
}
return new String(arr,0,l);
}
you may use HashSet:
static void Main()
{
string textWithDuplicates = "aaabbcccggg";
Console.WriteLine(textWithDuplicates.Count());
var letters = new HashSet<char>(textWithDuplicates);
Console.WriteLine(letters.Count());
foreach (char c in letters) Console.Write(c);
}
class Program
{
static void Main(string[] args)
{
bool[] doesExists = new bool[256];
String st = Console.ReadLine();
StringBuilder sb = new StringBuilder();
foreach (char ch in st)
{
if (!doesExists[ch])
{
sb.Append(ch);
doesExists[ch] = true;
}
}
Console.WriteLine(sb.ToString());
}
}
Revised version of the first answer i.e: You don't need ToCharArray() function for this to work.
public static string RemoveDuplicates(string input)
{
return new string(input.Distinct().ToArray());
}
char *remove_duplicates(char *str)
{
char *str1, *str2;
if(!str)
return str;
str1 = str2 = str;
while(*str2)
{
if(strchr(str, *str2)<str2)
{
str2++;
continue;
}
*str1++ = *str2++;
}
*str1 = '\0';
return str;
}
char* removeDups(const char* str)
{
char* new_str = (char*)malloc(256*sizeof(char));
int i,j,current_pos = 0,len_of_new_str;
new_str[0]='\0';
for(i=0;i<strlen(str);i++)
{
len_of_new_str = strlen(new_str);
for(j=0;j<len_of_new_str && new_str[j]!=str[i];j++)
;
if(j==len_of_new_str)
{
new_str[len_of_new_str] = str[i];
new_str[len_of_new_str+1] = '\0';
}
}
return new_str;
}
Hope this helps
String str="AABBCANCDE";
String newStr="";
for( int i=0; i<str.length(); i++)
{
if(!newStr.contains(str.charAt(i)+""))
newStr= newStr+str.charAt(i);
}
System.out.println(newStr);
// Remove both upper-lower duplicates
public static string RemoveDuplicates(string key)
{
string Result = string.Empty;
foreach (char a in key)
{
if (Result.Contains(a.ToString().ToUpper()) || Result.Contains(a.ToString().ToLower()))
continue;
Result += a.ToString();
}
return Result;
}
var input1 = Console.ReadLine().ToLower().ToCharArray();
var input2 = input1;
var WithoutDuplicate = input1.Union(input2);
Console.WriteLine("Enter String");
string str = Console.ReadLine();
string result = "";
result += str[0]; // first character of string
for (int i = 1; i < str.Length; i++)
{
if (str[i - 1] != str[i])
result += str[i];
}
Console.WriteLine(result);
I like Quintin Robinson answer, only there should be some improvements like removing List, because it is not necessarry in this case.
Also, in my opinion Uppercase char ("K") and lowercase char ("k") is the same thing, so they should be counted as one.
So here is how I would do it:
private static string RemoveDuplicates(string textEntered)
{
string newString = string.Empty;
foreach (var c in textEntered)
{
if (newString.Contains(char.ToLower(c)) || newString.Contains(char.ToUpper(c)))
{
continue;
}
newString += c.ToString();
}
return newString;
}
Not sure how optimal it is:
public static string RemoveDuplicates(string input)
{
var output = string.Join("", input.ToHashSet());
return output;
}
Below is the code to remove duplicate chars from a string
var input = "SaaSingeshe";
var filteredString = new StringBuilder();
foreach(char c in input)
{
if(filteredString.ToString().IndexOf(c)==-1)
{
filteredString.Append(c);
}
}
Console.WriteLine(filteredString);
Console.ReadKey();
namespace Demo { class Program {
static void Main(string[] args) {
string myStr = "kkllmmnnouo";
Console.WriteLine("Initial String: "+myStr);
// var unique = new HashSet<char>(myStr);
HashSet<char> unique = new HashSet<char>(myStr);
Console.Write("New String after removing duplicates: ");
foreach (char c in unique)
Console.Write(c);
} } }
this works for me
private string removeDuplicateChars(String value)
{
return new string(value.Distinct().ToArray());
}

Best way to convert Pascal Case to a sentence

What is the best way to convert from Pascal Case (upper Camel Case) to a sentence.
For example starting with
"AwaitingFeedback"
and converting that to
"Awaiting feedback"
C# preferable but I could convert it from Java or similar.
public static string ToSentenceCase(this string str)
{
return Regex.Replace(str, "[a-z][A-Z]", m => m.Value[0] + " " + char.ToLower(m.Value[1]));
}
In versions of visual studio after 2015, you can do
public static string ToSentenceCase(this string str)
{
return Regex.Replace(str, "[a-z][A-Z]", m => $"{m.Value[0]} {char.ToLower(m.Value[1])}");
}
Based on: Converting Pascal case to sentences using regular expression
I will prefer to use Humanizer for this. Humanizer is a Portable Class Library that meets all your .NET needs for manipulating and displaying strings, enums, dates, times, timespans, numbers and quantities.
Short Answer
"AwaitingFeedback".Humanize() => Awaiting feedback
Long and Descriptive Answer
Humanizer can do a lot more work other examples are:
"PascalCaseInputStringIsTurnedIntoSentence".Humanize() => "Pascal case input string is turned into sentence"
"Underscored_input_string_is_turned_into_sentence".Humanize() => "Underscored input string is turned into sentence"
"Can_return_title_Case".Humanize(LetterCasing.Title) => "Can Return Title Case"
"CanReturnLowerCase".Humanize(LetterCasing.LowerCase) => "can return lower case"
Complete code is :
using Humanizer;
using static System.Console;
namespace HumanizerConsoleApp
{
class Program
{
static void Main(string[] args)
{
WriteLine("AwaitingFeedback".Humanize());
WriteLine("PascalCaseInputStringIsTurnedIntoSentence".Humanize());
WriteLine("Underscored_input_string_is_turned_into_sentence".Humanize());
WriteLine("Can_return_title_Case".Humanize(LetterCasing.Title));
WriteLine("CanReturnLowerCase".Humanize(LetterCasing.LowerCase));
}
}
}
Output
Awaiting feedback
Pascal case input string is turned into sentence
Underscored input string is turned into sentence Can Return Title Case
can return lower case
If you prefer to write your own C# code you can achieve this by writing some C# code stuff as answered by others already.
Here you go...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace CamelCaseToString
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(CamelCaseToString("ThisIsYourMasterCallingYou"));
}
private static string CamelCaseToString(string str)
{
if (str == null || str.Length == 0)
return null;
StringBuilder retVal = new StringBuilder(32);
retVal.Append(char.ToUpper(str[0]));
for (int i = 1; i < str.Length; i++ )
{
if (char.IsLower(str[i]))
{
retVal.Append(str[i]);
}
else
{
retVal.Append(" ");
retVal.Append(char.ToLower(str[i]));
}
}
return retVal.ToString();
}
}
}
This works for me:
Regex.Replace(strIn, "([A-Z]{1,2}|[0-9]+)", " $1").TrimStart()
This is just like #SSTA, but is more efficient than calling TrimStart.
Regex.Replace("ThisIsMyCapsDelimitedString", "(\\B[A-Z])", " $1")
Found this in the MvcContrib source, doesn't seem to be mentioned here yet.
return Regex.Replace(input, "([A-Z])", " $1", RegexOptions.Compiled).Trim();
Just because everyone has been using Regex (except this guy), here's an implementation with StringBuilder that was about 5x faster in my tests. Includes checking for numbers too.
"SomeBunchOfCamelCase2".FromCamelCaseToSentence == "Some Bunch Of Camel Case 2"
public static string FromCamelCaseToSentence(this string input) {
if(string.IsNullOrEmpty(input)) return input;
var sb = new StringBuilder();
// start with the first character -- consistent camelcase and pascal case
sb.Append(char.ToUpper(input[0]));
// march through the rest of it
for(var i = 1; i < input.Length; i++) {
// any time we hit an uppercase OR number, it's a new word
if(char.IsUpper(input[i]) || char.IsDigit(input[i])) sb.Append(' ');
// add regularly
sb.Append(input[i]);
}
return sb.ToString();
}
Here's a basic way of doing it that I came up with using Regex
public static string CamelCaseToSentence(this string value)
{
var sb = new StringBuilder();
var firstWord = true;
foreach (var match in Regex.Matches(value, "([A-Z][a-z]+)|[0-9]+"))
{
if (firstWord)
{
sb.Append(match.ToString());
firstWord = false;
}
else
{
sb.Append(" ");
sb.Append(match.ToString().ToLower());
}
}
return sb.ToString();
}
It will also split off numbers which I didn't specify but would be useful.
string camel = "MyCamelCaseString";
string s = Regex.Replace(camel, "([A-Z])", " $1").ToLower().Trim();
Console.WriteLine(s.Substring(0,1).ToUpper() + s.Substring(1));
Edit: didn't notice your casing requirements, modifed accordingly. You could use a matchevaluator to do the casing, but I think a substring is easier. You could also wrap it in a 2nd regex replace where you change the first character
"^\w"
to upper
\U (i think)
I'd use a regex, inserting a space before each upper case character, then lowering all the string.
string spacedString = System.Text.RegularExpressions.Regex.Replace(yourString, "\B([A-Z])", " \k");
spacedString = spacedString.ToLower();
It is easy to do in JavaScript (or PHP, etc.) where you can define a function in the replace call:
var camel = "AwaitingFeedbackDearMaster";
var sentence = camel.replace(/([A-Z].)/g, function (c) { return ' ' + c.toLowerCase(); });
alert(sentence);
Although I haven't solved the initial cap problem... :-)
Now, for the Java solution:
String ToSentence(String camel)
{
if (camel == null) return ""; // Or null...
String[] words = camel.split("(?=[A-Z])");
if (words == null) return "";
if (words.length == 1) return words[0];
StringBuilder sentence = new StringBuilder(camel.length());
if (words[0].length() > 0) // Just in case of camelCase instead of CamelCase
{
sentence.append(words[0] + " " + words[1].toLowerCase());
}
else
{
sentence.append(words[1]);
}
for (int i = 2; i < words.length; i++)
{
sentence.append(" " + words[i].toLowerCase());
}
return sentence.toString();
}
System.out.println(ToSentence("AwaitingAFeedbackDearMaster"));
System.out.println(ToSentence(null));
System.out.println(ToSentence(""));
System.out.println(ToSentence("A"));
System.out.println(ToSentence("Aaagh!"));
System.out.println(ToSentence("stackoverflow"));
System.out.println(ToSentence("disableGPS"));
System.out.println(ToSentence("Ahh89Boo"));
System.out.println(ToSentence("ABC"));
Note the trick to split the sentence without loosing any character...
Pseudo-code:
NewString = "";
Loop through every char of the string (skip the first one)
If char is upper-case ('A'-'Z')
NewString = NewString + ' ' + lowercase(char)
Else
NewString = NewString + char
Better ways can perhaps be done by using regex or by string replacement routines (replace 'X' with ' x')
An xquery solution that works for both UpperCamel and lowerCamel case:
To output sentence case (only the first character of the first word is capitalized):
declare function content:sentenceCase($string)
{
let $firstCharacter := substring($string, 1, 1)
let $remainingCharacters := substring-after($string, $firstCharacter)
return
concat(upper-case($firstCharacter),lower-case(replace($remainingCharacters, '([A-Z])', ' $1')))
};
To output title case (first character of each word capitalized):
declare function content:titleCase($string)
{
let $firstCharacter := substring($string, 1, 1)
let $remainingCharacters := substring-after($string, $firstCharacter)
return
concat(upper-case($firstCharacter),replace($remainingCharacters, '([A-Z])', ' $1'))
};
Found myself doing something similar, and I appreciate having a point-of-departure with this discussion. This is my solution, placed as an extension method to the string class in the context of a console application.
using System;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string piratese = "avastTharMatey";
string ivyese = "CheerioPipPip";
Console.WriteLine("{0}\n{1}\n", piratese.CamelCaseToString(), ivyese.CamelCaseToString());
Console.WriteLine("For Pete\'s sake, man, hit ENTER!");
string strExit = Console.ReadLine();
}
}
public static class StringExtension
{
public static string CamelCaseToString(this string str)
{
StringBuilder retVal = new StringBuilder(32);
if (!string.IsNullOrEmpty(str))
{
string strTrimmed = str.Trim();
if (!string.IsNullOrEmpty(strTrimmed))
{
retVal.Append(char.ToUpper(strTrimmed[0]));
if (strTrimmed.Length > 1)
{
for (int i = 1; i < strTrimmed.Length; i++)
{
if (char.IsUpper(strTrimmed[i])) retVal.Append(" ");
retVal.Append(char.ToLower(strTrimmed[i]));
}
}
}
}
return retVal.ToString();
}
}
}
Most of the preceding answers split acronyms and numbers, adding a space in front of each character. I wanted acronyms and numbers to be kept together so I have a simple state machine that emits a space every time the input transitions from one state to the other.
/// <summary>
/// Add a space before any capitalized letter (but not for a run of capitals or numbers)
/// </summary>
internal static string FromCamelCaseToSentence(string input)
{
if (string.IsNullOrEmpty(input)) return String.Empty;
var sb = new StringBuilder();
bool upper = true;
for (var i = 0; i < input.Length; i++)
{
bool isUpperOrDigit = char.IsUpper(input[i]) || char.IsDigit(input[i]);
// any time we transition to upper or digits, it's a new word
if (!upper && isUpperOrDigit)
{
sb.Append(' ');
}
sb.Append(input[i]);
upper = isUpperOrDigit;
}
return sb.ToString();
}
And here's some tests:
[TestCase(null, ExpectedResult = "")]
[TestCase("", ExpectedResult = "")]
[TestCase("ABC", ExpectedResult = "ABC")]
[TestCase("abc", ExpectedResult = "abc")]
[TestCase("camelCase", ExpectedResult = "camel Case")]
[TestCase("PascalCase", ExpectedResult = "Pascal Case")]
[TestCase("Pascal123", ExpectedResult = "Pascal 123")]
[TestCase("CustomerID", ExpectedResult = "Customer ID")]
[TestCase("CustomABC123", ExpectedResult = "Custom ABC123")]
public string CanSplitCamelCase(string input)
{
return FromCamelCaseToSentence(input);
}
Mostly already answered here
Small chage to the accepted answer, to convert the second and subsequent Capitalised letters to lower case, so change
if (char.IsUpper(text[i]))
newText.Append(' ');
newText.Append(text[i]);
to
if (char.IsUpper(text[i]))
{
newText.Append(' ');
newText.Append(char.ToLower(text[i]));
}
else
newText.Append(text[i]);
Here is my implementation. This is the fastest that I got while avoiding creating spaces for abbreviations.
public static string PascalCaseToSentence(string input)
{
if (string.IsNullOrEmpty(input) || input.Length < 2)
return input;
var sb = new char[input.Length + ((input.Length + 1) / 2)];
var len = 0;
var lastIsLower = false;
for (int i = 0; i < input.Length; i++)
{
var current = input[i];
if (current < 97)
{
if (lastIsLower)
{
sb[len] = ' ';
len++;
}
lastIsLower = false;
}
else
{
lastIsLower = true;
}
sb[len] = current;
len++;
}
return new string(sb, 0, len);
}

Categories