C# finding IndexOf and removing the @ character

There is a string which contains some @ characters. I want to find " @ " in my string and remove those occurrences, but it also finds and removes these ones: "@"
int atsignPlace = str.IndexOf(" @ ");
while (atsignPlace >= 0)
{
str = str.Remove(atsignPlace,3);
atsignPlace = str.IndexOf(" @ ");
}
I tried this code, but it removes nothing, so it always finds the first '@', which makes it an infinite loop.
int atsignPlace = str.IndexOf(" @");
while (atsignPlace >= 0)
{
if( atsignPlace+1 < str.Length && str[atsignPlace+1] == ' ' )
str = str.Remove(atsignPlace,3);
atsignPlace = str.IndexOf(" @ ");
}
The Replace method doesn't work correctly either:
str = str.Replace(" @ ", String.Empty);
Maybe there is a problem with the '@' character.
The input string is a SQL query; I am trying to remove some parameters from it.
[I have used try-catch for exceptions.]

Your code works fine. Here is a short but complete program to demonstrate:
using System;
class Test
{
static void Main()
{
string before = "xyz @ abc@123";
string after = CustomRemove(before);
Console.WriteLine(after); // Prints xyzabc@123
}
static string CustomRemove(string text)
{
int atSignIndex = text.IndexOf(" @ ");
while (atSignIndex >= 0)
{
text = text.Remove(atSignIndex, 3);
atSignIndex = text.IndexOf(" @ ");
}
return text;
}
}
EDIT: Of course, Replace works fine too:
using System;
class Test
{
static void Main()
{
string before = "xyz @ abc@123";
string after = before.Replace(" @ ", "");
Console.WriteLine(after); // Prints xyzabc@123
}
}
If you're still seeing a problem with either of these, then the issue is in how you're using this code, not in the code itself.
One guess: you might have non-printing characters within the " @ " which are preventing them from being removed. But you haven't really given us enough information to say. A short but complete program demonstrating it not working would help...
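One way to check the hidden-character guess is to print the code point of every character next to each '@'. This is a minimal sketch; the helper name `DescribeAround` and the sample input, which deliberately uses a no-break space (U+00A0) in place of an ordinary space, are assumptions for illustration:

```csharp
using System;
using System.Text;

static class HiddenCharCheck
{
    // For every occurrence of `target`, lists the characters within `radius`
    // positions of it as U+XXXX code points, one line per occurrence.
    public static string DescribeAround(string text, char target = '@', int radius = 1)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] != target) continue;
            int start = Math.Max(0, i - radius);
            int end = Math.Min(text.Length - 1, i + radius);
            for (int j = start; j <= end; j++)
                sb.Append($"U+{(int)text[j]:X4} ");
            sb.AppendLine();
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Hypothetical input: the character before '@' is U+00A0, not U+0020.
        string sql = "xyz\u00A0@ abc";
        Console.WriteLine(sql.IndexOf(" @ ", StringComparison.Ordinal)); // -1
        Console.Write(DescribeAround(sql)); // U+00A0 U+0040 U+0020
    }
}
```

If the line printed for your query shows anything other than U+0020 on either side of U+0040, an exact search for " @ " will not match it.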

Keep it simple:
string result = input.Replace(" @ ", String.Empty);
MSDN: String.Replace Method (String, String)

I would use a regex to make sure that you match one or more whitespace characters on each side:
Regex.Replace(input, @"\s+@\s+", string.Empty);
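The regex version is more forgiving because, in .NET, `\s` matches any Unicode whitespace (including tabs and the no-break space U+00A0), while `Replace(" @ ", ...)` only matches ordinary spaces. A small sketch; the class name and the sample strings are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

static class SpacedAtRemover
{
    // Removes every '@' that has at least one whitespace character on each side,
    // together with that surrounding whitespace. Like Replace(" @ ", ""), but
    // tolerant of tabs, no-break spaces and repeated spaces.
    public static string Remove(string input) =>
        Regex.Replace(input, @"\s+@\s+", string.Empty);

    static void Main()
    {
        string tricky = "xyz\u00A0@\tabc@123";          // no-break space and tab around '@'
        Console.WriteLine(tricky.Replace(" @ ", ""));   // unchanged: no plain " @ " present
        Console.WriteLine(Remove(tricky));              // xyzabc@123
    }
}
```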

string LclString = "@12 @ 123@123 @ @";
LclString = LclString.Replace(" @ ", " ");
Yields this:
@12 123@123 @

Related

Last and first word in a string in C#

I need to print the first and the last word in a string. Here is what I've tried:
Console.WriteLine("please enter a string");
string str = Console.ReadLine();
string first = str.Substring(0, str.IndexOf(" "));
string last = str.Substring(str.LastIndexOf(' '),str.Length-1);
Console.WriteLine(first + " " + last);
When I run the code, this message appears:
Unhandled Exception: System.ArgumentOutOfRangeException: Index and length must refer to a location within the string.
Parameter name: length
at System.String.Substring(Int32 startIndex, Int32 length)
at ConsoleApp1.Tar13.Main() in C:\Users\User\source\repos\ConsoleApp1\ConsoleApp1\Tar13.cs:line 16
I don't know what the problem is.
If this is homework, don't hand this in unless you really understand it, have done LINQ (or have a supervisor that approves of off-piste learning and you're prepared to acknowledge you got outside assistance/did background learning) and are willing to explain it if asked:
Console.WriteLine("please enter a string");
string str = Console.ReadLine();
string[] bits = str.Split();
Console.WriteLine(bits.First() + " " + bits.Last()); // First() and Last() need using System.Linq;
For a non-LINQ version:
Console.WriteLine("please enter a string");
string str = Console.ReadLine();
string first = str.Remove(str.IndexOf(' '));
string last = str.Substring(str.LastIndexOf(' ') + 1);
Console.WriteLine(first + " " + last);
Bear in mind that the non-LINQ version will crash if there are no spaces in the string (IndexOf returns -1, which Remove rejects); the Split version won't.
Look at String.Remove and String.Substring.
If you want to make it more robust so it doesn't crash:
Console.WriteLine("please enter a string");
string str = Console.ReadLine();
if(str.Contains(" ")){
string first = str.Remove(str.IndexOf(' '));
string last = str.Substring(str.LastIndexOf(' ') + 1);
Console.WriteLine(first + " " + last);
}
I'll leave a "what might we put in an else?" in that last code block, as an exercise for you :)
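For reference, the exception in the question comes from the second argument of `Substring(int startIndex, int length)`, which is a length, not an end index: `startIndex + length` must not exceed the string's length. A minimal sketch contrasting the two overloads; the helper names and the sample sentence are illustrative assumptions:

```csharp
using System;

static class SubstringDemo
{
    // Substring(startIndex) copies from startIndex to the end of the string.
    // With no space, LastIndexOf returns -1, so the whole string comes back.
    public static string LastWord(string s) => s.Substring(s.LastIndexOf(' ') + 1);

    // Substring(startIndex, length): the second argument is a length,
    // so it is computed from what remains after startIndex.
    public static string LastWordWithLength(string s)
    {
        int start = s.LastIndexOf(' ') + 1;
        return s.Substring(start, s.Length - start);
    }

    static void Main()
    {
        string str = "hello big world";               // 15 characters, last space at index 9
        Console.WriteLine(LastWord(str));             // world
        Console.WriteLine(LastWordWithLength(str));   // world

        try
        {
            // The question's call: 9 + 14 = 23 > 15, so this throws.
            str.Substring(str.LastIndexOf(' '), str.Length - 1);
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("startIndex + length ran past the end of the string");
        }
    }
}
```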
You can split the string and take the first and last elements:
var s = str.Split(' ', StringSplitOptions.RemoveEmptyEntries );
if(s.Length >= 2)
{
var first = s.First();
var last = s.Last();
Console.WriteLine($"{first} {last}");
}
In the general case, when the sentence can contain punctuation and not necessarily English letters, you can try regular expressions. Let's define:
A word is a non-empty sequence of letters and apostrophes.
And so we have
Code:
using System.Linq;
using System.Text.RegularExpressions;
...
private static (string first, string last) Solve(string value) {
if (string.IsNullOrWhiteSpace(value))
return ("", "");
var words = Regex
.Matches(value, @"[\p{L}']+")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
return words.Length > 0
? (words[0], words[words.Length - 1])
: ("", "");
}
Demo:
string[] tests = new string[] {
"Simple string", // Simple Smoke Test
"Single", // Single word which is both first an last
"", // No words at all; let's return empty strings
"words, punctuations: the end.", // Punctuations
"Русская (Russian) строка!", // Punctuations, non-English words
};
var result = string.Join(Environment.NewLine, tests
.Select(test => $"{test,-30} :: {Solve(test)}"));
Console.Write(result);
Outcome:
Simple string :: (Simple, string)
Single :: (Single, Single)
:: (, )
words, punctuations: the end. :: (words, end)
Русская (Russian) строка! :: (Русская, строка)
If you want to get the first and last word, try the following:
string sentence = "Hello World"; //Sentence
string[] words = sentence.Split(" "); //All words
string first = words[0]; //First word
string last = words[words.Length - 1]; //Last word
Console.WriteLine(first + " "+ last);

Console Application to count all '#' characters in a text file

My code is posted below. I expect it to output the number of # characters in an input file. It is currently not providing the expected output.
static void Main(string[] args)
{
StreamReader oReader;
if (File.Exists(@"C:\Documents and Settings\9chat73\Desktop\count.txt"))
{
Console.WriteLine("Enter a word to search");
string cSearforSomething = Console.ReadLine().Trim();
oReader = new StreamReader(@"C:\Documents and Settings\9chat73\Desktop\count.txt");
string cColl = oReader.ReadToEnd();
string cCriteria = @"\b" + cSearforSomething + @"\b";
System.Text.RegularExpressions.Regex oRegex = new System.Text.RegularExpressions.Regex(cCriteria, RegexOptions.IgnoreCase);
int count = oRegex.Matches(cColl).Count;
Console.WriteLine(count.ToString());
}
Console.ReadLine();
}
This gives me 0 as output every time. The file count.txt contains: 00100324103| #00100324137| #00100324145| #00100324153| #00100324179|. I want to count the number of hashes (#) in the file. How can I do that?
You are looking for # as a separate word. Remove the word-boundary requirement from your criteria:
string cCriteria = cSearforSomething;
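'#' (the symbol you're looking for) can be a special symbol in regular expressions (in .NET it starts a comment when RegexOptions.IgnorePatternWhitespace is set),
so the pattern should be escaped. Note that this version also drops the \b word boundaries: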
static void Main(string[] args) {
//String fileName = @"C:\Documents and Settings\9chat73\Desktop\count.txt";
// To search dynamically, just ask for a file:
Console.WriteLine("Enter a file to search");
String fileName = Console.ReadLine().Trim();
if (File.Exists(fileName)) {
Console.WriteLine("Enter a word to search");
String pattern = Console.ReadLine().Trim();
// Do not forget to escape the pattern!
int count = Regex.Matches(File.ReadAllText(fileName),
Regex.Escape(pattern),
RegexOptions.IgnoreCase).Count;
Console.WriteLine(count.ToString());
}
Console.ReadLine();
}
Try the above
int count = cColl.Count(x => x == '#');
var count = File.ReadAllText(@"c:\...").Count(x => x == '#');
string cCriteria = @"\b" + cSearforSomething + @"\b";
This is your problem. If you remove the @"\b" from each end, you'll get the correct number of '#' characters: \b only matches where a word character meets a non-word character, and a '#' that follows a space has no such boundary in front of it.
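A small sketch comparing the question's word-boundary pattern with a plain character count, using the sample data from the question (the class and method names are hypothetical):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class HashCounter
{
    // The question's approach: \b needs a word character on one side of it,
    // and neither ' ' nor '#' is a word character, so " #0" has no boundary before '#'.
    public static int CountWithWordBoundaries(string text) =>
        Regex.Matches(text, @"\b" + Regex.Escape("#") + @"\b").Count;

    // Plain character count with LINQ's Enumerable.Count, no regex needed.
    public static int CountChar(string text, char c) => text.Count(ch => ch == c);

    static void Main()
    {
        string sample = "00100324103| #00100324137| #00100324145| #00100324153| #00100324179|";
        Console.WriteLine(CountWithWordBoundaries(sample)); // 0
        Console.WriteLine(CountChar(sample, '#'));          // 4
    }
}
```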

Adding a '-' to my string in C#

I have a string in the form "123456789".
When displaying it on the screen, I want to show it as 123-456-789.
Please let me know how to add a "-" after every 3 digits.
Thanks in advance.
You can use string.Substring:
s = s.Substring(0, 3) + "-" + s.Substring(3, 3) + "-" + s.Substring(6, 3);
or a regular expression (ideone):
s = Regex.Replace(s, @"\d{3}(?=\d)", "$0-");
I'll go ahead and give the Regex based solution:
string rawNumber = "123456789";
var formattedNumber = Regex.Replace(rawNumber, @"(\d{3}(?!$))", "$1-");
That regex breaks down as follows:
( // Group the whole pattern so we can get its value in the call to Regex.Replace()
\d // This is a digit
{3} // match the previous pattern 3 times
(?!$) // This weird looking thing means "match anywhere EXCEPT the end of the string"
)
The "$1-" replacement string means that whenever a match for the above pattern is found, replace it with the same thing (the $1 part), followed by a -. So in "123456789", it would match 123 and 456, but not 789 because it's at the end of the string. It then replaces them with 123- and 456-, giving the final result 123-456-789.
You can also use a for loop if the string length is not fixed at 9 digits:
string textnumber = "123456789"; // textnumber = "123456789012346" also it will work
string finaltext = textnumber[0]+ "";
for (int i = 1; i < textnumber.Length; i++)
{
if ((i + 1) % 3 == 0)
{
finaltext = finaltext + textnumber[i] + "-";
}
else
{
finaltext = finaltext + textnumber[i];
}
}
if (finaltext.EndsWith("-")) // only a length that is a multiple of 3 leaves a trailing "-"
finaltext = finaltext.Remove(finaltext.Length - 1);
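A `StringBuilder` variant of the loop idea, sketched under the assumption that groups are counted from the left (like the regex answers) and that no trailing separator is wanted; `Group`, `groupSize` and `separator` are illustrative names:

```csharp
using System;
using System.Text;

static class DigitGrouper
{
    // Inserts `separator` before every group of `groupSize` characters except the
    // first, counting from the left, so no trailing separator is ever produced.
    public static string Group(string digits, int groupSize = 3, char separator = '-')
    {
        var sb = new StringBuilder(digits.Length + digits.Length / groupSize);
        for (int i = 0; i < digits.Length; i++)
        {
            if (i > 0 && i % groupSize == 0)
                sb.Append(separator);
            sb.Append(digits[i]);
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Group("123456789"));        // 123-456-789
        Console.WriteLine(Group("123456789012346"));  // 123-456-789-012-346
        Console.WriteLine(Group("1234"));             // 123-4
    }
}
```

Appending the separator before a group, rather than after one, avoids having to trim the result afterwards.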

Replace lowercase characters with star

I wrote code to replace lowercase characters with *.
But it does not work.
Where is the problem?
private void CharacterReplacement()
{
Console.WriteLine("Enter a string to replacement : ");
string TargetString = Console.ReadLine();
string MainString = TargetString;
for (int i = 0; i < TargetString.Length; i++)
{
if (char.IsLower(TargetString[i]))
{
TargetString.Replace(TargetString[i], '*');
}
}
Console.WriteLine("The string {0} has converted to {1}", MainString, TargetString);
}
Replace() returns a new string, so you need to re-assign it to TargetString:
TargetString = TargetString.Replace(TargetString[i], '*');
Another way to express your intent would be with LINQ. I'm not sure which I like better; this avoids all the temporary strings but has other overhead:
TargetString = new string(TargetString.Select(c => char.IsLower(c) ? '*' : c)
.ToArray());
You can of course write this in one short line by using a regular expression:
string output = Regex.Replace("ABCdef123", "[a-z]", "*"); // output = "ABC***123"
An improved version, based on Arto's comment, that handles all lowercase Unicode characters:
string output = Regex.Replace("ABCdefëï123", @"\p{Ll}", "*"); // output = "ABC*****123"
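Another option, sketched here as an assumption rather than taken from the answers above: copy the characters into a `char[]`, overwrite the lowercase ones in place, and build a single new string at the end. `LowercaseMasker.Mask` is a hypothetical name:

```csharp
using System;

static class LowercaseMasker
{
    // Overwrites each lowercase letter (char.IsLower, so Unicode lowercase,
    // not only a to z) with '*' and creates one new string at the end.
    public static string Mask(string input)
    {
        char[] chars = input.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            if (char.IsLower(chars[i]))
                chars[i] = '*';
        }
        return new string(chars);
    }

    static void Main()
    {
        string target = "Hello World";
        Console.WriteLine("The string {0} has converted to {1}", target, Mask(target));
        // The string Hello World has converted to H**** W****
    }
}
```

Unlike calling `Replace` inside the loop, this allocates only the array and the result string.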

Add spaces before Capital Letters

Given the string "ThisStringHasNoSpacesButItDoesHaveCapitals", what is the best way to add spaces before the capital letters? The end string would be "This String Has No Spaces But It Does Have Capitals".
Here is my attempt with a RegEx
System.Text.RegularExpressions.Regex.Replace(value, "[A-Z]", " $0")
The regexes will work fine (I even voted up Martin Brown's answer), but they are expensive (and personally I find any pattern longer than a couple of characters prohibitively obtuse).
This function
string AddSpacesToSentence(string text, bool preserveAcronyms)
{
if (string.IsNullOrWhiteSpace(text))
return string.Empty;
StringBuilder newText = new StringBuilder(text.Length * 2);
newText.Append(text[0]);
for (int i = 1; i < text.Length; i++)
{
if (char.IsUpper(text[i]))
if ((text[i - 1] != ' ' && !char.IsUpper(text[i - 1])) ||
(preserveAcronyms && char.IsUpper(text[i - 1]) &&
i < text.Length - 1 && !char.IsUpper(text[i + 1])))
newText.Append(' ');
newText.Append(text[i]);
}
return newText.ToString();
}
It will do it 100,000 times in 2,968,750 ticks; the regex takes 25,000,000 ticks (and that's with the regex compiled).
It's better, for a given value of better (i.e. faster); however, it's more code to maintain. "Better" is often a compromise between competing requirements.
Update
It's a good long while since I looked at this, and I just realised the timings haven't been updated since the code changed (it only changed a little).
On a string with 'Abbbbbbbbb' repeated 100 times (i.e. 1,000 bytes), a run of 100,000 conversions takes the hand coded function 4,517,177 ticks, and the Regex below takes 59,435,719 making the Hand coded function run in 7.6% of the time it takes the Regex.
Update 2
Will it take Acronyms into account? It will now!
The logic of the if statement is fairly obscure; as you can see, expanding it to this ...
if (char.IsUpper(text[i]))
if (char.IsUpper(text[i - 1]))
if (preserveAcronyms && i < text.Length - 1 && !char.IsUpper(text[i + 1]))
newText.Append(' ');
else ;
else if (text[i - 1] != ' ')
newText.Append(' ');
... doesn't help at all!
Here's the original simple method that doesn't worry about Acronyms
string AddSpacesToSentence(string text)
{
if (string.IsNullOrWhiteSpace(text))
return "";
StringBuilder newText = new StringBuilder(text.Length * 2);
newText.Append(text[0]);
for (int i = 1; i < text.Length; i++)
{
if (char.IsUpper(text[i]) && text[i - 1] != ' ')
newText.Append(' ');
newText.Append(text[i]);
}
return newText.ToString();
}
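A rough sketch of how timings like the ones above could be reproduced with `Stopwatch`. It pits the simple loop against a compiled version of the `([a-z])([A-Z])` regex from Martin Brown's answer (an assumption about which regex was timed), on the 'Abbbbbbbbb' × 100 input described in the update. The class and method names are hypothetical, `Stopwatch` ticks are not the same unit as `DateTime` ticks, and absolute numbers will vary with hardware and runtime:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

static class SpacingBenchmark
{
    // The simple hand-coded loop (no acronym handling).
    public static string AddSpacesLoop(string text)
    {
        if (string.IsNullOrWhiteSpace(text)) return "";
        var sb = new StringBuilder(text.Length * 2);
        sb.Append(text[0]);
        for (int i = 1; i < text.Length; i++)
        {
            if (char.IsUpper(text[i]) && text[i - 1] != ' ')
                sb.Append(' ');
            sb.Append(text[i]);
        }
        return sb.ToString();
    }

    // Lowercase-then-uppercase regex, compiled once and reused.
    static readonly Regex LowerUpper = new Regex("([a-z])([A-Z])", RegexOptions.Compiled);
    public static string AddSpacesRegex(string text) => LowerUpper.Replace(text, "$1 $2");

    static void Main()
    {
        // 'Abbbbbbbbb' repeated 100 times, i.e. 1,000 characters.
        string input = string.Concat(Enumerable.Repeat("Abbbbbbbbb", 100));
        const int runs = 100_000;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++) AddSpacesLoop(input);
        long loopTicks = sw.ElapsedTicks;

        sw.Restart();
        for (int i = 0; i < runs; i++) AddSpacesRegex(input);
        long regexTicks = sw.ElapsedTicks;

        // Compare the ratio, not the absolute values.
        Console.WriteLine($"loop: {loopTicks} ticks, regex: {regexTicks} ticks");
    }
}
```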
Your solution has an issue in that it puts a space before the first letter T so you get
" This String..." instead of "This String..."
To get around this look for the lower case letter preceding it as well and then insert the space in the middle:
newValue = Regex.Replace(value, "([a-z])([A-Z])", "$1 $2");
Edit 1:
If you use @"(\p{Ll})(\p{Lu})" it will pick up accented characters as well.
Edit 2:
If your strings can contain acronyms you may want to use this:
newValue = Regex.Replace(value, @"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))", " $0");
So "DriveIsSCSICompatible" becomes "Drive Is SCSI Compatible"
I didn't test performance, but here it is in one line with LINQ:
var val = "ThisIsAStringToTest";
val = string.Concat(val.Select(x => Char.IsUpper(x) ? " " + x : x.ToString())).TrimStart(' ');
I know this is an old one, but this is an extension I use when I need to do this:
public static class Extensions
{
public static string ToSentence( this string Input )
{
return new string(Input.SelectMany((c, i) => i > 0 && char.IsUpper(c) ? new[] { ' ', c } : new[] { c }).ToArray());
}
}
This will allow you to use MyCasedString.ToSentence()
I set out to make a simple extension method based on Binary Worrier's code which will handle acronyms properly, and is repeatable (won't mangle already spaced words). Here is my result.
public static string UnPascalCase(this string text)
{
if (string.IsNullOrWhiteSpace(text))
return "";
var newText = new StringBuilder(text.Length * 2);
newText.Append(text[0]);
for (int i = 1; i < text.Length; i++)
{
var currentUpper = char.IsUpper(text[i]);
var prevUpper = char.IsUpper(text[i - 1]);
var nextUpper = (text.Length > i + 1) ? char.IsUpper(text[i + 1]) || char.IsWhiteSpace(text[i + 1]): prevUpper;
var spaceExists = char.IsWhiteSpace(text[i - 1]);
if (currentUpper && !spaceExists && (!nextUpper || !prevUpper))
newText.Append(' ');
newText.Append(text[i]);
}
return newText.ToString();
}
Here are the unit test cases this function passes. I added most of tchrist's suggested cases to this list. The three of those it doesn't pass (two are just Roman numerals) are commented out:
Assert.AreEqual("For You And I", "ForYouAndI".UnPascalCase());
Assert.AreEqual("For You And The FBI", "ForYouAndTheFBI".UnPascalCase());
Assert.AreEqual("A Man A Plan A Canal Panama", "AManAPlanACanalPanama".UnPascalCase());
Assert.AreEqual("DNS Server", "DNSServer".UnPascalCase());
Assert.AreEqual("For You And I", "For You And I".UnPascalCase());
Assert.AreEqual("Mount Mᶜ Kinley National Park", "MountMᶜKinleyNationalPark".UnPascalCase());
Assert.AreEqual("El Álamo Tejano", "ElÁlamoTejano".UnPascalCase());
Assert.AreEqual("The Ævar Arnfjörð Bjarmason", "TheÆvarArnfjörðBjarmason".UnPascalCase());
Assert.AreEqual("Il Caffè Macchiato", "IlCaffèMacchiato".UnPascalCase());
//Assert.AreEqual("Mister Dženan Ljubović", "MisterDženanLjubović".UnPascalCase());
//Assert.AreEqual("Ole King Henry Ⅷ", "OleKingHenryⅧ".UnPascalCase());
//Assert.AreEqual("Carlos Ⅴº El Emperador", "CarlosⅤºElEmperador".UnPascalCase());
Assert.AreEqual("For You And The FBI", "For You And The FBI".UnPascalCase());
Assert.AreEqual("A Man A Plan A Canal Panama", "A Man A Plan A Canal Panama".UnPascalCase());
Assert.AreEqual("DNS Server", "DNS Server".UnPascalCase());
Assert.AreEqual("Mount Mᶜ Kinley National Park", "Mount Mᶜ Kinley National Park".UnPascalCase());
Welcome to Unicode
All these solutions are essentially wrong for modern text. You need to use something that understands case. Since Bob asked for other languages, I'll give a couple for Perl.
I provide four solutions, ranging from worst to best. Only the best one is always right. The others have problems. Here is a test run to show you what works and what doesn’t, and where. I’ve used underscores so that you can see where the spaces have been put, and I’ve marked as wrong anything that is, well, wrong.
Testing TheLoneRanger
Worst: The_Lone_Ranger
Ok: The_Lone_Ranger
Better: The_Lone_Ranger
Best: The_Lone_Ranger
Testing MountMᶜKinleyNationalPark
[WRONG] Worst: Mount_MᶜKinley_National_Park
[WRONG] Ok: Mount_MᶜKinley_National_Park
[WRONG] Better: Mount_MᶜKinley_National_Park
Best: Mount_Mᶜ_Kinley_National_Park
Testing ElÁlamoTejano
[WRONG] Worst: ElÁlamo_Tejano
Ok: El_Álamo_Tejano
Better: El_Álamo_Tejano
Best: El_Álamo_Tejano
Testing TheÆvarArnfjörðBjarmason
[WRONG] Worst: TheÆvar_ArnfjörðBjarmason
Ok: The_Ævar_Arnfjörð_Bjarmason
Better: The_Ævar_Arnfjörð_Bjarmason
Best: The_Ævar_Arnfjörð_Bjarmason
Testing IlCaffèMacchiato
[WRONG] Worst: Il_CaffèMacchiato
Ok: Il_Caffè_Macchiato
Better: Il_Caffè_Macchiato
Best: Il_Caffè_Macchiato
Testing MisterDženanLjubović
[WRONG] Worst: MisterDženanLjubović
[WRONG] Ok: MisterDženanLjubović
Better: Mister_Dženan_Ljubović
Best: Mister_Dženan_Ljubović
Testing OleKingHenryⅧ
[WRONG] Worst: Ole_King_HenryⅧ
[WRONG] Ok: Ole_King_HenryⅧ
[WRONG] Better: Ole_King_HenryⅧ
Best: Ole_King_Henry_Ⅷ
Testing CarlosⅤºElEmperador
[WRONG] Worst: CarlosⅤºEl_Emperador
[WRONG] Ok: CarlosⅤº_El_Emperador
[WRONG] Better: CarlosⅤº_El_Emperador
Best: Carlos_Ⅴº_El_Emperador
BTW, almost everyone here has selected the first way, the one marked "Worst". A few have selected the second way, marked "OK". But no one else before me has shown you how to do either the "Better" or the "Best" approach.
Here is the test program with its four methods:
#!/usr/bin/env perl
use utf8;
use strict;
use warnings;
# First I'll prove these are fine variable names:
my (
$TheLoneRanger ,
$MountMᶜKinleyNationalPark ,
$ElÁlamoTejano ,
$TheÆvarArnfjörðBjarmason ,
$IlCaffèMacchiato ,
$MisterDženanLjubović ,
$OleKingHenryⅧ ,
$CarlosⅤºElEmperador ,
);
# Now I'll load up some string with those values in them:
my @strings = qw{
TheLoneRanger
MountMᶜKinleyNationalPark
ElÁlamoTejano
TheÆvarArnfjörðBjarmason
IlCaffèMacchiato
MisterDženanLjubović
OleKingHenryⅧ
CarlosⅤºElEmperador
};
my($new, $best, $ok);
my $mask = " %10s %-8s %s\n";
for my $old (@strings) {
print "Testing $old\n";
($best = $old) =~ s/(?<=\p{Lowercase})(?=[\p{Uppercase}\p{Lt}])/_/g;
($new = $old) =~ s/(?<=[a-z])(?=[A-Z])/_/g;
$ok = ($new ne $best) && "[WRONG]";
printf $mask, $ok, "Worst:", $new;
($new = $old) =~ s/(?<=\p{Ll})(?=\p{Lu})/_/g;
$ok = ($new ne $best) && "[WRONG]";
printf $mask, $ok, "Ok:", $new;
($new = $old) =~ s/(?<=\p{Ll})(?=[\p{Lu}\p{Lt}])/_/g;
$ok = ($new ne $best) && "[WRONG]";
printf $mask, $ok, "Better:", $new;
($new = $old) =~ s/(?<=\p{Lowercase})(?=[\p{Uppercase}\p{Lt}])/_/g;
$ok = ($new ne $best) && "[WRONG]";
printf $mask, $ok, "Best:", $new;
}
When you can score the same as the "Best" on this dataset, you’ll know you’ve done it correctly. Until then, you haven’t. No one else here has done better than "Ok", and most have done it "Worst". I look forward to seeing someone post the correct ℂ♯ code.
I notice that StackOverflow’s highlighting code is miserably stoopid again. It makes the same old lame mistakes as most (but not all) of the rest of the poor approaches mentioned here. Isn’t it long past time to put ASCII to rest? It doesn’t make sense anymore, and pretending it’s all you have is simply wrong. It makes for bad code.
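A possible C# starting point, sketched under the assumption that .NET regular expressions support Unicode general categories such as `\p{Ll}`, `\p{Lu}` and `\p{Lt}` but not the derived `\p{Lowercase}` / `\p{Uppercase}` properties used in the "Best" pattern. That limits a direct port to the "Better" level: titlecase letters like Dž (U+01C5) are handled, while Ⅷ (category Nl) and ᶜ (category Lm) are not. The class and method names are illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

static class UnicodeSpacer
{
    // The "Better" pattern from the Perl test: a zero-width position between a
    // lowercase letter (Ll) and a following uppercase (Lu) or titlecase (Lt) letter.
    static readonly Regex Boundary = new Regex(@"(?<=\p{Ll})(?=[\p{Lu}\p{Lt}])");

    public static string AddSpaces(string text) => Boundary.Replace(text, " ");

    static void Main()
    {
        Console.WriteLine(AddSpaces("TheLoneRanger"));        // The Lone Ranger
        Console.WriteLine(AddSpaces("El\u00C1lamoTejano"));   // El Álamo Tejano
        // U+01C5 (Dž) and U+01C8 (Lj) are titlecase letters (Lt).
        Console.WriteLine(AddSpaces("Mister\u01C5enan\u01C8ubovi\u0107")); // Mister Dženan Ljubović
        // U+2167 (Ⅷ) is a letter number (Nl), so no space is inserted before it.
        Console.WriteLine(AddSpaces("OleKingHenry\u2167"));   // Ole King Henry followed by Ⅷ
    }
}
```

Reaching the "Best" results would need extra handling for the Other_Lowercase / Other_Uppercase characters that `\p{Lowercase}` and `\p{Uppercase}` include.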
This Regex places a space character in front of every capital letter:
using System.Text.RegularExpressions;
const string myStringWithoutSpaces = "ThisIsAStringWithoutSpaces";
var myStringWithSpaces = Regex.Replace(myStringWithoutSpaces, "([A-Z])([a-z]*)", " $1$2").TrimStart();
Mind the space in front of "$1$2"; that is what gets it done. It also puts a space before the very first capital, which TrimStart() removes.
This is the outcome:
"This Is A String Without Spaces"
Binary Worrier, I have used your suggested code, and it is rather good, I have just one minor addition to it:
public static string AddSpacesToSentence(string text)
{
if (string.IsNullOrEmpty(text))
return "";
StringBuilder newText = new StringBuilder(text.Length * 2);
newText.Append(text[0]);
for (int i = 1; i < text.Length; i++)
{
if (char.IsUpper(text[i]) && !char.IsUpper(text[i - 1]))
{
newText.Append(' ');
}
else if (i < text.Length - 1) // text[i + 1] must exist
{
if (char.IsUpper(text[i]) && !char.IsUpper(text[i + 1]))
newText.Append(' ');
}
newText.Append(text[i]);
}
return newText.ToString();
}
I have added a condition !char.IsUpper(text[i - 1]). This fixed a bug that would cause something like 'AverageNOX' to be turned into 'Average N O X', which is obviously wrong, as it should read 'Average NOX'.
Sadly this still has the bug that if you have the text 'FromAStart', you would get 'From AStart' out.
Any thoughts on fixing this?
Inspired by @MartinBrown:
Two lines of simple regex, which will resolve your name, including acronyms anywhere in the string.
public string ResolveName(string name)
{
var tmpDisplay = Regex.Replace(name, "([^A-Z ])([A-Z])", "$1 $2");
return Regex.Replace(tmpDisplay, "([A-Z]+)([A-Z][^A-Z$])", "$1 $2").Trim();
}
Here's mine:
private string SplitCamelCase(string s)
{
Regex upperCaseRegex = new Regex(@"[A-Z]{1}[a-z]*");
MatchCollection matches = upperCaseRegex.Matches(s);
List<string> words = new List<string>();
foreach (Match match in matches)
{
words.Add(match.Value);
}
return String.Join(" ", words.ToArray());
}
Make sure you aren't putting spaces at the beginning of the string, but you are putting them between consecutive capitals. Some of the answers here don't address one or both of those points. There are other ways than regex, but if you prefer to use that, try this:
Regex.Replace(value, @"\B[A-Z]", " $0")
The \B is a negated \b, so it represents a non-word-boundary. It means the pattern matches "Y" in XYzabc, but not in Yzabc or X Yzabc. As a little bonus, you can use this on a string with spaces in it and it won't double them.
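A quick sketch contrasting the question's `[A-Z]` pattern with the `\B[A-Z]` version (the class name and samples are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

static class NonBoundarySpacer
{
    // \B[A-Z]: an ASCII capital that is NOT at a word boundary, i.e. one that
    // directly follows another word character. The first letter and capitals
    // that already follow a space are left alone.
    public static string AddSpaces(string value) => Regex.Replace(value, @"\B[A-Z]", " $0");

    static void Main()
    {
        // The question's pattern also puts a space before the first capital.
        Console.WriteLine("[" + Regex.Replace("HelloWorld", "[A-Z]", " $0") + "]"); // [ Hello World]

        Console.WriteLine(AddSpaces("ThisStringHasNoSpacesButItDoesHaveCapitals"));
        // This String Has No Spaces But It Does Have Capitals
        Console.WriteLine(AddSpaces("This String")); // This String (no doubled space)
        Console.WriteLine(AddSpaces("DNSServer"));   // D N S Server
    }
}
```

The last sample shows the trade-off mentioned above: consecutive capitals are split one by one.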
What you have works perfectly. Just remember to reassign value to the return value of this function.
value = System.Text.RegularExpressions.Regex.Replace(value, "[A-Z]", " $0");
Here is how you could do it in SQL
create FUNCTION dbo.PascalCaseWithSpace(@pInput AS VARCHAR(MAX)) RETURNS VARCHAR(MAX)
BEGIN
declare @output varchar(8000)
set @output = ''
Declare @vInputLength INT
Declare @vIndex INT
Declare @vCount INT
Declare @PrevLetter varchar(50)
SET @PrevLetter = ''
SET @vCount = 0
SET @vIndex = 1
SET @vInputLength = LEN(@pInput)
WHILE @vIndex <= @vInputLength
BEGIN
IF ASCII(SUBSTRING(@pInput, @vIndex, 1)) = ASCII(Upper(SUBSTRING(@pInput, @vIndex, 1)))
begin
if(@PrevLetter != '' and ASCII(@PrevLetter) = ASCII(Lower(@PrevLetter)))
SET @output = @output + ' ' + SUBSTRING(@pInput, @vIndex, 1)
else
SET @output = @output + SUBSTRING(@pInput, @vIndex, 1)
end
else
begin
SET @output = @output + SUBSTRING(@pInput, @vIndex, 1)
end
set @PrevLetter = SUBSTRING(@pInput, @vIndex, 1)
SET @vIndex = @vIndex + 1
END
return @output
END
In Java, with String.replaceAll:
replaceAll("(?<=[^^\\p{Uppercase}])(?=[\\p{Uppercase}])"," ");
static string AddSpacesToColumnName(string columnCaption)
{
if (string.IsNullOrWhiteSpace(columnCaption))
return "";
StringBuilder newCaption = new StringBuilder(columnCaption.Length * 2);
newCaption.Append(columnCaption[0]);
int pos = 1;
for (pos = 1; pos < columnCaption.Length-1; pos++)
{
if (char.IsUpper(columnCaption[pos]) && !(char.IsUpper(columnCaption[pos - 1]) && char.IsUpper(columnCaption[pos + 1])))
newCaption.Append(' ');
newCaption.Append(columnCaption[pos]);
}
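if (columnCaption.Length > 1) // a one-character caption was already fully appended
newCaption.Append(columnCaption[columnCaption.Length - 1]);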
return newCaption.ToString();
}
In Ruby, via Regexp:
"FooBarBaz".gsub(/(?!^)(?=[A-Z])/, ' ') # => "Foo Bar Baz"
I took Kevin Striker's excellent solution and converted it to VB. Since I'm locked into .NET 3.5, I also had to write IsNullOrWhiteSpace. This passes all of his tests.
<Extension()>
Public Function IsNullOrWhiteSpace(value As String) As Boolean
If value Is Nothing Then
Return True
End If
For i As Integer = 0 To value.Length - 1
If Not Char.IsWhiteSpace(value(i)) Then
Return False
End If
Next
Return True
End Function
<Extension()>
Public Function UnPascalCase(text As String) As String
If text.IsNullOrWhiteSpace Then
Return String.Empty
End If
Dim newText = New StringBuilder()
newText.Append(text(0))
For i As Integer = 1 To text.Length - 1
Dim currentUpper = Char.IsUpper(text(i))
Dim prevUpper = Char.IsUpper(text(i - 1))
Dim nextUpper = If(text.Length > i + 1, Char.IsUpper(text(i + 1)) Or Char.IsWhiteSpace(text(i + 1)), prevUpper)
Dim spaceExists = Char.IsWhiteSpace(text(i - 1))
If (currentUpper And Not spaceExists And (Not nextUpper Or Not prevUpper)) Then
newText.Append(" ")
End If
newText.Append(text(i))
Next
Return newText.ToString()
End Function
The question is a bit old, but nowadays there is a nice library on NuGet that does exactly this, as well as many other conversions to human-readable text.
Check out Humanizer on GitHub or NuGet.
Example
"PascalCaseInputStringIsTurnedIntoSentence".Humanize() => "Pascal case input string is turned into sentence"
"Underscored_input_string_is_turned_into_sentence".Humanize() => "Underscored input string is turned into sentence"
"Underscored_input_String_is_turned_INTO_sentence".Humanize() => "Underscored input String is turned INTO sentence"
// acronyms are left intact
"HTML".Humanize() => "HTML"
Seems like a good opportunity for Aggregate. This is designed to be readable, not necessarily especially fast.
someString
.Aggregate(
new StringBuilder(),
(str, ch) => {
if (char.IsUpper(ch) && str.Length > 0)
str.Append(" ");
str.Append(ch);
return str;
}
).ToString();
I found a lot of these answers rather obtuse. I haven't fully tested my solution, but it works for what I need, should handle acronyms, and is much more compact and readable than the others, IMO:
private string CamelCaseToSpaces(string s)
{
if (string.IsNullOrEmpty(s)) return string.Empty;
StringBuilder stringBuilder = new StringBuilder();
for (int i = 0; i < s.Length; i++)
{
stringBuilder.Append(s[i]);
int nextChar = i + 1;
if (nextChar < s.Length && char.IsUpper(s[nextChar]) && !char.IsUpper(s[i]))
{
stringBuilder.Append(" ");
}
}
return stringBuilder.ToString();
}
In addition to Martin Brown's answer, I had an issue with numbers as well. For example, "Location2" and "Jan22" should become "Location 2" and "Jan 22" respectively.
Here is my regular expression for doing that, based on Martin Brown's answer:
@"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))|((?<=[\p{Ll}\p{Lu}])\p{Nd})|((?<=\p{Nd})\p{Lu})"
Here are a couple great sites for figuring out what each part means as well:
Java Based Regular Expression Analyzer (but works for most .net regex's)
Action Script Based Analyzer
The above regex won't work on the action script site unless you replace all of the \p{Ll} with [a-z], the \p{Lu} with [A-Z], and \p{Nd} with [0-9].
Here's my solution, based on Binary Worrier's suggestion and building in Richard Priddy's comments, but also taking into account that white space may exist in the provided string, so it won't add white space next to existing white space.
public string AddSpacesBeforeUpperCase(string nonSpacedString)
{
if (string.IsNullOrEmpty(nonSpacedString))
return string.Empty;
StringBuilder newText = new StringBuilder(nonSpacedString.Length * 2);
newText.Append(nonSpacedString[0]);
for (int i = 1; i < nonSpacedString.Length; i++)
{
char currentChar = nonSpacedString[i];
// Existing whitespace is copied as-is; no extra space is added next to it
if (char.IsWhiteSpace(currentChar))
{
newText.Append(currentChar);
continue;
}
char previousChar = nonSpacedString[i - 1];
char nextChar = i < nonSpacedString.Length - 1 ? nonSpacedString[i + 1] : nonSpacedString[i];
// Space before a capital, unless it follows or precedes whitespace,
// or sits inside an acronym (capitals on both sides)
if (char.IsUpper(currentChar) && !char.IsWhiteSpace(previousChar) && !char.IsWhiteSpace(nextChar)
&& !(char.IsUpper(previousChar) && char.IsUpper(nextChar)))
{
newText.Append(' ');
}
newText.Append(currentChar);
}
return newText.ToString();
}
For anyone who is looking for a C++ function answering this same question, you can use the following. It is modeled after the answer given by @Binary Worrier and preserves acronyms automatically.
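#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
void AddSpacesToSentence(const string& testString)
{
if (testString.empty()) return;
stringstream ss;
ss << testString.at(0);
for (auto it = testString.begin() + 1; it != testString.end(); ++it)
{
size_t index = static_cast<size_t>(it - testString.begin());
char c = *it;
if (isupper(static_cast<unsigned char>(c)))
{
char prev = testString.at(index - 1);
if (isupper(static_cast<unsigned char>(prev)))
{
// Inside a run of capitals: split only before the capital that starts a lowercase word
if (index < testString.length() - 1)
{
char next = testString.at(index + 1);
if (!isupper(static_cast<unsigned char>(next)) && next != ' ')
{
ss << ' ';
}
}
}
else if (islower(static_cast<unsigned char>(prev)))
{
ss << ' ';
}
}
ss << c;
}
cout << ss.str() << endl;
}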
The tests strings I used for this function, and the results are:
"helloWorld" -> "hello World"
"HelloWorld" -> "Hello World"
"HelloABCWorld" -> "Hello ABC World"
"HelloWorldABC" -> "Hello World ABC"
"ABCHelloWorld" -> "ABC Hello World"
"ABC HELLO WORLD" -> "ABC HELLO WORLD"
"ABCHELLOWORLD" -> "ABCHELLOWORLD"
"A" -> "A"
A C# solution for an input string that consists only of ASCII characters. The regex incorporates negative lookbehind to ignore a capital (upper case) letter that appears at the beginning of the string. Uses Regex.Replace() to return the desired string.
Also see regex101.com demo.
using System;
using System.Text.RegularExpressions;
public class RegexExample
{
public static void Main()
{
var text = "ThisStringHasNoSpacesButItDoesHaveCapitals";
// Use negative lookbehind to match all capital letters
// that do not appear at the beginning of the string.
var pattern = "(?<!^)([A-Z])";
var rgx = new Regex(pattern);
var result = rgx.Replace(text, " $1");
Console.WriteLine("Input: [{0}]\nOutput: [{1}]", text, result);
}
}
Expected Output:
Input: [ThisStringHasNoSpacesButItDoesHaveCapitals]
Output: [This String Has No Spaces But It Does Have Capitals]
Update: Here's a variation that will also handle acronyms (sequences of upper-case letters).
Also see regex101.com demo and ideone.com demo.
using System;
using System.Text.RegularExpressions;
public class RegexExample
{
public static void Main()
{
var text = "ThisStringHasNoSpacesASCIIButItDoesHaveCapitalsLINQ";
// Use positive lookbehind to locate all upper-case letters
// that are preceded by a lower-case letter.
var patternPart1 = "(?<=[a-z])([A-Z])";
// Use positive lookbehind and lookahead to locate all
// upper-case letters that are preceded by an upper-case
// letter and followed by a lower-case letter.
var patternPart2 = "(?<=[A-Z])([A-Z])(?=[a-z])";
var pattern = patternPart1 + "|" + patternPart2;
var rgx = new Regex(pattern);
var result = rgx.Replace(text, " $1$2");
Console.WriteLine("Input: [{0}]\nOutput: [{1}]", text, result);
}
}
Expected Output:
Input: [ThisStringHasNoSpacesASCIIButItDoesHaveCapitalsLINQ]
Output: [This String Has No Spaces ASCII But It Does Have Capitals LINQ]
Here is a more thorough solution that doesn't put spaces in front of words:
Note: I have used multiple regexes (not concise, but this also handles acronyms and single-letter words).
Dim s As String = "ThisStringHasNoSpacesButItDoesHaveCapitals"
s = System.Text.RegularExpressions.Regex.Replace(s, "([a-z])([A-Z](?=[A-Z])[a-z]*)", "$1 $2")
s = System.Text.RegularExpressions.Regex.Replace(s, "([A-Z])([A-Z][a-z])", "$1 $2")
s = System.Text.RegularExpressions.Regex.Replace(s, "([a-z])([A-Z][a-z])", "$1 $2")
s = System.Text.RegularExpressions.Regex.Replace(s, "([a-z])([A-Z][a-z])", "$1 $2") ' repeat a second time
In:
"ThisStringHasNoSpacesButItDoesHaveCapitals"
"IAmNotAGoat"
"LOLThatsHilarious!"
"ThisIsASMSMessage"
Out:
"This String Has No Spaces But It Does Have Capitals"
"I Am Not A Goat"
"LOL Thats Hilarious!"
"This Is ASMS Message" // (Difficult to handle single letter words when they are next to acronyms.)
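The four VB.NET calls translate directly to C# using the static `Regex.Replace(string, string, string)` overload. This is a straight port (the class name is hypothetical). The third pattern is applied twice because each match consumes the lower-case letter it ends with, so an adjacent boundary that needs that same letter (e.g. the `oS` in `NoSpaces`) is only found on the second pass:

```csharp
using System.Text.RegularExpressions;

public static class MultiPassSpacer
{
    public static string AddSpaces(string s)
    {
        // 1. lower-case letter, then a capital that is itself followed by a capital
        s = Regex.Replace(s, "([a-z])([A-Z](?=[A-Z])[a-z]*)", "$1 $2");
        // 2. end of an acronym: a capital followed by a capital + lower-case pair
        s = Regex.Replace(s, "([A-Z])([A-Z][a-z])", "$1 $2");
        // 3. ordinary word boundary: lower-case letter, then capital + lower-case
        s = Regex.Replace(s, "([a-z])([A-Z][a-z])", "$1 $2");
        // Repeat step 3: boundaries that share a consumed letter are found now.
        s = Regex.Replace(s, "([a-z])([A-Z][a-z])", "$1 $2");
        return s;
    }
}
```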
All the previous responses looked overcomplicated.
I had a string containing a mixture of capitals and underscores (_), so I used string.Replace() to turn each _ into a space, and then used the following loop to add a space before each capital letter.
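// result already holds the input with each _ replaced by a space,
// e.g. string result = input.Replace("_", " ");
for (int i = 0; i < result.Length; i++)
{
    if (char.IsUpper(result[i]))
    {
        // i > 0 stops a space being added when the string starts with a capital;
        // the second check avoids a double space right after a replaced _
        if (i > 0 && result[i - 1] != ' ')
        {
            result = result.Insert(i, " ");
            i++; // Required: the capital moved to i + 1, so skip past it,
                 // otherwise the loop keeps adding spaces before the same letter.
        }
    }
}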
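Calling `Insert` inside the loop copies the whole string on every capital (quadratic for long inputs) and forces the manual `i++`. A single pass that builds the output in a `StringBuilder` avoids both. This is a sketch of the same idea (underscore to space, then a space before each capital that is neither first nor already preceded by a space); the names are hypothetical:

```csharp
using System.Text;

public static class UnderscoreSpacer
{
    public static string SpaceOutCapitals(string input)
    {
        var sb = new StringBuilder(input.Length * 2);
        foreach (char ch in input)
        {
            // Treat _ as a space, as the string.Replace() step did.
            char c = ch == '_' ? ' ' : ch;
            // Space before a capital, unless it is first or follows a space.
            if (char.IsUpper(c) && sb.Length > 0 && sb[sb.Length - 1] != ' ')
                sb.Append(' ');
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For example, `"Hello_WorldFoo"` becomes `"Hello World Foo"`. Like the loop, it splits acronyms letter by letter (`"ABC"` becomes `"A B C"`).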
Inspired by Binary Worrier's answer, I took a swing at this.
Here's the result:
using System;
using System.Text;

public static class StringExtensions
{
    /// <summary>
    /// String extension method.
    /// Adds white space to strings based on upper-case letters.
    /// </summary>
    /// <example>
    /// strIn => "HateJPMorgan"
    /// preserveAcronyms false => "Hate JPMorgan"
    /// preserveAcronyms true => "Hate JP Morgan"
    /// </example>
    /// <param name="strIn">the string to evaluate</param>
    /// <param name="preserveAcronyms">when true, also separates an acronym from the capitalized word that follows it (optional, defaults to false)</param>
    public static string AddSpaces(this string strIn, bool preserveAcronyms = false)
    {
        if (string.IsNullOrWhiteSpace(strIn))
            return string.Empty;
        var stringBuilder = new StringBuilder(strIn.Length * 2)
            .Append(strIn[0]);
        // Loop to the end (bounds-checking strIn[i + 1]) so single-character
        // strings work and a trailing capital is also considered.
        for (int i = 1; i < strIn.Length; i++)
        {
            var c = strIn[i];
            // Space before a capital that follows a lower-case letter, or (with
            // preserveAcronyms) before a capital followed by a lower-case letter.
            if (Char.IsUpper(c) && (Char.IsLower(strIn[i - 1]) ||
                (preserveAcronyms && i + 1 < strIn.Length && Char.IsLower(strIn[i + 1]))))
                stringBuilder.Append(' ');
            stringBuilder.Append(c);
        }
        return stringBuilder.ToString();
    }
}
I tested it with a Stopwatch over 10,000,000 iterations, using various string lengths and combinations.
On average it was about 50% (maybe a bit more) faster than Binary Worrier's answer.
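A timing comparison like the one described can be sketched with `System.Diagnostics.Stopwatch`. The harness below is hypothetical, not the author's benchmark: it makes one untimed warm-up call so JIT compilation is excluded, then reports total elapsed time. For careful comparisons a dedicated tool such as BenchmarkDotNet is usually preferable.

```csharp
using System;
using System.Diagnostics;

public static class MicroBenchmark
{
    // Calls f(input) `iterations` times and returns the total elapsed time.
    public static TimeSpan Time(Func<string, string> f, string input, int iterations)
    {
        f(input); // warm-up call, not timed
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            f(input);
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Usage would look like `MicroBenchmark.Time(s => s.AddSpaces(), "HateJPMorgan", 10_000_000)`, run once per candidate implementation.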
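// Requires: using System.Linq; (for Count)
private string GetProperName(string Header)
{
    // Empty strings and strings with at most one capital are returned unchanged
    // (with no capital the loop below would not change anything anyway).
    if (string.IsNullOrEmpty(Header) || Header.Count(c => char.IsUpper(c)) <= 1)
    {
        return Header;
    }
    string ReturnHeader = Header[0].ToString();
    for (int i = 1; i < Header.Length; i++)
    {
        // Start a new word at each lower-case -> upper-case transition
        if (char.IsLower(Header[i - 1]) && char.IsUpper(Header[i]))
        {
            ReturnHeader += " " + Header[i].ToString();
        }
        else
        {
            ReturnHeader += Header[i].ToString();
        }
    }
    return ReturnHeader;
}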
This one handles acronyms and acronym plurals, and is a bit faster than the accepted answer:
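public string Sentencify(string value)
{
    if (string.IsNullOrWhiteSpace(value))
        return string.Empty;
    string final = string.Empty;
    for (int i = 0; i < value.Length; i++)
    {
        if (i != 0 && Char.IsUpper(value[i]))
        {
            if (!Char.IsUpper(value[i - 1]))
                final += " "; // previous char is not a capital: new word
            else if (i < value.Length - 1)
            {
                // Inside a run of capitals: start a new word only if the next
                // char is not a capital and not a plural suffix ("s" or "es").
                // i + 2 is bounds-checked so inputs ending in "...ABe" don't throw.
                bool pluralS = value[i + 1] == 's';
                bool pluralEs = i + 2 < value.Length && value[i + 1] == 'e' && value[i + 2] == 's';
                if (!Char.IsUpper(value[i + 1]) && !(pluralS || pluralEs))
                    final += " ";
            }
        }
        final += value[i];
    }
    return final;
}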
Passes these tests:
string test1 = "RegularOTs";
string test2 = "ThisStringHasNoSpacesASCIIButItDoesHaveCapitalsLINQ";
string test3 = "ThisStringHasNoSpacesButItDoesHaveCapitals";
An implementation with fold, also known as Aggregate:
public static string SpaceCapitals(this string arg) =>
new string(arg.Aggregate(new List<Char>(),
(accum, x) =>
{
if (Char.IsUpper(x) &&
accum.Any() &&
// prevent double spacing
accum.Last() != ' ' &&
// prevent spacing acronyms (ASCII, SCSI)
!Char.IsUpper(accum.Last()))
{
accum.Add(' ');
}
accum.Add(x);
return accum;
}).ToArray());
Beyond what was asked, this implementation also preserves leading, inner, and trailing spaces as well as acronyms, for example:
" SpacedWord " => " Spaced Word ",
"Inner Space" => "Inner Space",
"SomeACRONYM" => "Some ACRONYM".
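The same fold can also be seeded with a `StringBuilder` instead of a `List<char>`, which avoids the final `ToArray()` copy and gives indexed access to the last character. A sketch applying the same rules (class and method names are hypothetical, chosen so they don't clash with the original extension):

```csharp
using System.Linq;
using System.Text;

public static class FoldSpacer
{
    public static string SpaceCapitalsWithBuilder(this string arg) =>
        arg.Aggregate(new StringBuilder(arg.Length * 2), (sb, x) =>
        {
            // Space before a capital unless it is the first character,
            // follows a space, or follows another capital (acronyms).
            if (char.IsUpper(x) && sb.Length > 0 &&
                sb[sb.Length - 1] != ' ' && !char.IsUpper(sb[sb.Length - 1]))
            {
                sb.Append(' ');
            }
            return sb.Append(x);
        }).ToString();
}
```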
