Find a match for query within text

Find a match for query within text - c#

This question was asked recently during interview and i couldn't solve so need some suggestion how can i solve the problem
Declaration: I can not use REGEX or any in-built libraries
***** Problem Statement is as below*********
**matches
Input: text (string), query (string)
Output: true if you can find a match for query within text, false otherwise
If there are no special characters, most languages have a contains method that will just do this.
One special character: '?' -- if you find '?' in the query string, it signals that the previous character is optional (matches 0 or 1 times).
Examples:
No question marks:
matches("hello World", "hello") returns true
matches("hello World", "world") returns false
matches("hello World", "o W")returns true
matches("hello World", "W o") returns false
matches("hello World", "h W") returns false
With question marks -- "l?" means "optional l":
matches("heo World", "hel?o") returns true
matches("helo World", "hel?o") returns true
matches("hello World", "hel?o") returns false
Make sure you understand this case:
matches("hello World", "hell?lo") returns true
You can have more than one question mark:
matches("hello World", "ex?llo Worlds?") returns true
***** My approach was as below*********
public class StringPatternMatch
{
public static bool MatchPattern(string inputText, string pattern)
{
int count = 0; int patternIndex = 0;
for (var i = 0; i < inputText.Length; i++)
{
if (patternIndex > pattern.Length)
break;
if (inputText[i] == pattern[patternIndex] ||
(inputText[i] != pattern[patternIndex] && pattern[patternIndex + 1] == '?'))
count++;
patternIndex++;
}
return pattern.Length == count;
}
}
traverse both strings from one side to other side (say from rightmost character to leftmost). If we find a matching character, we move ahead in both strings with increasing counter for pattern - at the end match count with pattern-length
Also i have provided my code but that doesn't cover all the cases
Of course i didn't go next round, but i am still thinking about this problem and haven't found accurate solution - hope to see some interesting answers!

Your idea can work but your implementation is over-simplified:
// assumes the pattern is valid, e.g. no ??
public static boolean matches(String string, String pattern) {
int p = 0; // position in pattern
// because we only return boolean we can discard all optional characters at the beginning of the pattern
while (p + 1 < pattern.length() && pattern.charAt(p + 1) == '?')
p += 2;
if (p >= pattern.length())
return true;
for (int s = 0; s < string.length(); s++) // s is position in string
// find a valid start position for the first mandatory character in pattern and check if it matches
if (string.charAt(s) == pattern.charAt(p) && matches(string, pattern, s + 1, p + 1))
return true;
return false;
}
private static boolean matches(String string, String pattern, int s, int p) {
if (p >= pattern.length()) // end of pattern reached
return true;
if (s >= string.length() || string.charAt(s) != pattern.charAt(p)) // end of string or no match
// if the next character of the pattern is optional check if the rest matches
return p + 1 < pattern.length() && pattern.charAt(p + 1) == '?' && matches(string, pattern, s, p + 2);
// here we know the characters are matching
if (p + 1 < pattern.length() && pattern.charAt(p + 1) == '?') // if it is an optional character
// check if matching the optional character or skipping it leads to success
return matches(string, pattern, s + 1, p + 2) || matches(string, pattern, s, p + 2);
// the character wasn't optional (but matched as we know from above)
return matches(string, pattern, s + 1, p + 1);
}

Related

C#, Finding equal amount of brackets

I've made my own way to check if the amount of () and "" are equal. So for example "H(ell)o") is correct. However, the problem I face is that what if the first bracket is ) and the other is ( example "H)ell(o" this would mean it's incorrect. So my question is how would I check whether the first bracket in any word is opening?
EDIT:
public static Boolean ArTinkaSintakse(char[] simboliai)
{
int openingBracketsAmount = 0;
int closingBracketsAmount = 0;
int quotationMarkAmount = 0;
for (int i = 0; i < simboliai.Length; i++)
{
if (openingBracketsAmount == 0 && simboliai[i] == ')')
break;
else if (simboliai[i] == '\"')
quotationMarkAmount++;
else if (simboliai[i] == '(')
openingBracketsAmount++;
else if (simboliai[i] == ')')
closingBracketsAmount++;
}
int bracketAmount = openingBracketsAmount + closingBracketsAmount;
if (quotationMarkAmount % 2 == 0 && bracketAmount % 2 == 0)
return true;
else
return false;
}

Add a check for if (openingBracketsAmount < closingBracketsAmount). If that's ever true, you know that the brackets are unbalanced.

Add an if statement that breaks out of the loop if the first bracket encountered is a closing bracket, like so:
for (int i = 0; i < word.Length; i++)
{
if ((openingBracketsAmount == 0) && (word[i] == ')'))
{
<Log error...>
break;
}
<Rest of your loop...>
}
This way, as soon as openingBracketsAmount is updated, this if statement will be unreachable.

I would approach the problem recursively.
Create a Dictionary<char, char> to keep track of which opening character goes with which closing one.
Then I would implement:
boolean findMatches(string input, out int lastIndex) {}
The method will scan until a member of the Dictionary keys is found. Then it will recursively call itself with the remainder of the string. If the recursive call comes back false, return that. If true, check if the lastIndex character (or the one after; I always need to write the code to check fenceposts) is the matching bracket you want. If it is, and you're not at the end of the string, return the value of a recursive call with the rest of the string after that matching bracket. If you are at the end, return true with that character's index. If that character isn't the matching bracket/quote, pass the remainder of the string (including that last character) to another recursive call.
Continue until you reach the end of the string (returning true if you aren't matching a bracket or quote, false otherwise). Either way with a lastIndex of the last character.

Probably you need a stack-based approach to validate such expressions.
Please try the following code:
public static bool IsValid(string s)
{
var pairs = new List<Tuple<char, char>>
{
new Tuple<char, char>('(', ')'),
new Tuple<char, char>('{', '}'),
new Tuple<char, char>('[', ']'),
};
var openTags = new HashSet<char>();
var closeTags = new Dictionary<char, char>(pairs.Count);
foreach (var p in pairs)
{
openTags.Add(p.Item1);
closeTags.Add(p.Item2, p.Item1);
}
// Remove quoted parts
Regex r = new Regex("\".*?\"", RegexOptions.Compiled | RegexOptions.Multiline);
s = r.Replace(s, string.Empty);
var opened = new Stack<char>();
foreach (var ch in s)
{
if (ch == '"')
{
// Unpaired quote char
return false;
}
if (openTags.Contains(ch))
{
// This is a legal open tag
opened.Push(ch);
}
else if (closeTags.TryGetValue(ch, out var openTag) && openTag != opened.Pop())
{
// This is an illegal close tag or an unbalanced legal close tag
return false;
}
}
return true;
}

I want make some validation string that match the format below

I have a textbox, and I want to validate that it matches the pattern Car Plate = [X]__[####]_[ZZZ].
[X] = One upper case letter
_ = Space
[####] = Four digit number
[ZZZ] = Three upper case letter
Example : A 1234 BCD
How do I set validation to match this in a textbox?
this is my code according sir dimitri
private void isvalidplate(string a)
{
if (a[0] < 'A' && a[0] > 'Z')
{
MessageBox.Show("Car Plate is invalid!");
}
else if (a[1] != ' ' && a[5] != ' ')
{
MessageBox.Show("Car Plate is invalid!");
}
else if (a[2] != Int64.Parse(a) && a[3]!= Int64.Parse(a) && a[4]!= Int64.Parse(a) )
{
MessageBox.Show("Car Plate is invalid!");
}
else if ((a[6] < 'A' && a[6] > 'Z')&&(a[7] < 'A' && a[7] > 'Z')&&(a[8] < 'A' && a[8] > 'Z')&&(a[9] < 'A' && a[9] > 'Z'))
{
MessageBox.Show("Car Plate is invalid!");
}
}
but it show an error that "input String Was Not In A correct Format"
the error is in this line
else if (a[2] != Int64.Parse(a) && a[3]!= Int64.Parse(a) && a[4]!= Int64.Parse(a) )

A common tool used across many languages for string validation and parsing is Regular Expressions (often referred to as regex). Getting used to using them is very handy as a developer. A regex matching what you need looks like:
^[A-Z]\s\d{4}\s[A-Z]{3}$
This site shows your regex in action. In C#, you could test your string using the Regex library:
bool valid = Regex.IsMatch(myTestString, #"^[A-Z]\s\d{4}\s[A-Z]{3}$");
There are tons of resources on learning regex online.

So you have to implement explicit tests:
public static String IsCarPlateValid(String value) {
// we don't accept null or empty strings
if (String.IsNullOrEmpty(value))
return false;
// should contain exactly 11 characters
// letter{1}, space {2}, digit{4}, space{1}, letter{3}
// 1 + 2 + 4 + 1 + 3 == 11
if (value.Length != 11)
return false;
// First character should be in ['A'..'Z'] range;
// if it either less than 'A' or bigger than 'Z', car plate is wrong
if ((value[0] < 'A') || (value[0] > 'Z'))
return false;
//TODO: it's your homework, implement the rest
// All tests passed
return true;
}
To test your implementation you can use regular expression:
Boolean isValid = Regex.IsMatch(carPlate, "^[A-Z]{1} {2}[0-9]{4} {1}[A-Z]{3}$");
with evident meaning: one letter, two spaces, four digits, one space, three letters.

How find sub-string(0,91) from long string?

I write this program in c#:
static void Main(string[] args)
{
int i;
string ss = "fc7600109177";
// I want to found (0,91) in ss string
for (i=0; i<= ss.Length; i++)
if (((char)ss[i] == '0') && (((char)ss[i+1] + (char)ss[i+2]) == "91" ))
Console.WriteLine(" found");
}
What's wrong in this program and how can I find (0,91)?

First of all, you don't have to cast to char your ss[i] or others. ss[i] and others are already char.
As a second, you try to concatanate two char (ss[i+1] and ss[i+2]) in your if loop and after you check equality with a string. This is wrong. Change it to;
if ( (ss[i] == '0') && (ss[i + 1] == '9') && (ss[i + 2]) == '1')
Console.WriteLine("found");
As a third, which I think the most important, don't write code like that. You can easly use String.Contains method which does exactly what you want.
Returns a value indicating whether the specified String object occurs
within this string.
string ss = "fc7600109177";
bool found = ss.Contains("091");
Here a DEMO.
use "contain" return only true or false and "index of" return location
of string but I want to find location of "091" in ss and if "091"
repeat like: ss ="763091d44a0914" how can I find second "091" ??
Here how you can find all indexes in your string;
string chars = "091";
string ss = "763091d44a0914";
List<int> indexes = new List<int>();
foreach ( Match match in Regex.Matches(ss, chars) )
{
indexes.Add(match.Index);
}
for (int i = 0; i < indexes.Count; i++)
{
Console.WriteLine("{0}. match in index {1}", i+1, indexes[i]);
}
Output will be;
1. match in index: 3
2. match in index: 10
Here a DEMO.

Use String.Contains() for this purpose
if(ss.Contains("091"))
{
Console.WriteLine(" found");
}

if you want to know where "091" starts in the string then you can use:
var pos = ss.IndexOf("091")

Parsing strings recursively

I am trying to extract information out of a string - a fortran formatting string to be specific. The string is formatted like:
F8.3, I5, 3(5X, 2(A20,F10.3)), 'XXX'
with formatting fields delimited by "," and formatting groups inside brackets, with the number in front of the brackets indicating how many consecutive times the formatting pattern is repeated. So, the string above expands to:
F8.3, I5, 5X, A20,F10.3, A20,F10.3, 5X, A20,F10.3, A20,F10.3, 5X, A20,F10.3, A20,F10.3, 'XXX'
I am trying to make something in C# that will expand a string that conforms to that pattern. I have started going about it with lots of switch and if statements, but am wondering if I am not going about it the wrong way?
I was basically wondering if some Regex wizzard thinks that Regular expressions can do this in one neat-fell swoop? I know nothing about regular expressions, but if this could solve my problem I am considering putting in some time to learn how to use them... on the other hand if regular expressions can't sort this out then I'd rather spend my time looking at another method.

This has to be doable with Regex :)
I've expanded my previous example and it test nicely with your example.
// regex to match the inner most patterns of n(X) and capture the values of n and X.
private static readonly Regex matcher = new Regex(#"(\d+)\(([^(]*?)\)", RegexOptions.None);
// create new string by repeating X n times, separated with ','
private static string Join(Match m)
{
var n = Convert.ToInt32(m.Groups[1].Value); // get value of n
var x = m.Groups[2].Value; // get value of X
return String.Join(",", Enumerable.Repeat(x, n));
}
// expand the string by recursively replacing the innermost values of n(X).
private static string Expand(string text)
{
var s = matcher.Replace(text, Join);
return (matcher.IsMatch(s)) ? Expand(s) : s;
}
// parse a string for occurenses of n(X) pattern and expand then.
// return the string as a tokenized array.
public static string[] Parse(string text)
{
// Check that the number of parantheses is even.
if (text.Sum(c => (c == '(' || c == ')') ? 1 : 0) % 2 == 1)
throw new ArgumentException("The string contains an odd number of parantheses.");
return Expand(text).Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
}

I would suggest using a recusive method like the example below( not tested ):
ResultData Parse(String value, ref Int32 index)
{
ResultData result = new ResultData();
Index startIndex = index; // Used to get substrings
while (index < value.Length)
{
Char current = value[index];
if (current == '(')
{
index++;
result.Add(Parse(value, ref index));
startIndex = index;
continue;
}
if (current == ')')
{
// Push last result
index++;
return result;
}
// Process all other chars here
}
// We can't find the closing bracket
throw new Exception("String is not valid");
}
You maybe need to modify some parts of the code, but this method have i used when writing a simple compiler. Although it's not completed, just a example.

Personally, I would suggest using a recursive function instead. Every time you hit an opening parenthesis, call the function again to parse that part. I'm not sure if you can use a regex to match a recursive data structure.
(Edit: Removed incorrect regex)

Ended up rewriting this today. It turns out that this can be done in one single method:
private static string ExpandBrackets(string Format)
{
int maxLevel = CountNesting(Format);
for (int currentLevel = maxLevel; currentLevel > 0; currentLevel--)
{
int level = 0;
int start = 0;
int end = 0;
for (int i = 0; i < Format.Length; i++)
{
char thisChar = Format[i];
switch (Format[i])
{
case '(':
level++;
if (level == currentLevel)
{
string group = string.Empty;
int repeat = 0;
/// Isolate the number of repeats if any
/// If there are 0 repeats the set to 1 so group will be replaced by itself with the brackets removed
for (int j = i - 1; j >= 0; j--)
{
char c = Format[j];
if (c == ',')
{
start = j + 1;
break;
}
if (char.IsDigit(c))
repeat = int.Parse(c + (repeat != 0 ? repeat.ToString() : string.Empty));
else
throw new Exception("Non-numeric character " + c + " found in front of the brackets");
}
if (repeat == 0)
repeat = 1;
/// Isolate the format group
/// Parse until the first closing bracket. Level is decremented as this effectively takes us down one level
for (int j = i + 1; j < Format.Length; j++)
{
char c = Format[j];
if (c == ')')
{
level--;
end = j;
break;
}
group += c;
}
/// Substitute the expanded group for the original group in the format string
/// If the group is empty then just remove it from the string
if (string.IsNullOrEmpty(group))
{
Format = Format.Remove(start - 1, end - start + 2);
i = start;
}
else
{
string repeatedGroup = RepeatString(group, repeat);
Format = Format.Remove(start, end - start + 1).Insert(start, repeatedGroup);
i = start + repeatedGroup.Length - 1;
}
}
break;
case ')':
level--;
break;
}
}
}
return Format;
}
CountNesting() returns the highest level of bracket nesting in the format statement, but could be passed in as a parameter to the method. RepeatString() just repeats a string the specified number of times and substitutes it for the bracketed group in the format string.

Add spaces before Capital Letters

Given the string "ThisStringHasNoSpacesButItDoesHaveCapitals" what is the best way to add spaces before the capital letters. So the end string would be "This String Has No Spaces But It Does Have Capitals"
Here is my attempt with a RegEx
System.Text.RegularExpressions.Regex.Replace(value, "[A-Z]", " $0")

The regexes will work fine (I even voted up Martin Browns answer), but they are expensive (and personally I find any pattern longer than a couple of characters prohibitively obtuse)
This function
string AddSpacesToSentence(string text, bool preserveAcronyms)
{
if (string.IsNullOrWhiteSpace(text))
return string.Empty;
StringBuilder newText = new StringBuilder(text.Length * 2);
newText.Append(text[0]);
for (int i = 1; i < text.Length; i++)
{
if (char.IsUpper(text[i]))
if ((text[i - 1] != ' ' && !char.IsUpper(text[i - 1])) ||
(preserveAcronyms && char.IsUpper(text[i - 1]) &&
i < text.Length - 1 && !char.IsUpper(text[i + 1])))
newText.Append(' ');
newText.Append(text[i]);
}
return newText.ToString();
}
Will do it 100,000 times in 2,968,750 ticks, the regex will take 25,000,000 ticks (and thats with the regex compiled).
It's better, for a given value of better (i.e. faster) however it's more code to maintain. "Better" is often compromise of competing requirements.
Update
It's a good long while since I looked at this, and I just realised the timings haven't been updated since the code changed (it only changed a little).
On a string with 'Abbbbbbbbb' repeated 100 times (i.e. 1,000 bytes), a run of 100,000 conversions takes the hand coded function 4,517,177 ticks, and the Regex below takes 59,435,719 making the Hand coded function run in 7.6% of the time it takes the Regex.
Update 2
Will it take Acronyms into account? It will now!
The logic of the if statment is fairly obscure, as you can see expanding it to this ...
if (char.IsUpper(text[i]))
if (char.IsUpper(text[i - 1]))
if (preserveAcronyms && i < text.Length - 1 && !char.IsUpper(text[i + 1]))
newText.Append(' ');
else ;
else if (text[i - 1] != ' ')
newText.Append(' ');
... doesn't help at all!
Here's the original simple method that doesn't worry about Acronyms
string AddSpacesToSentence(string text)
{
if (string.IsNullOrWhiteSpace(text))
return "";
StringBuilder newText = new StringBuilder(text.Length * 2);
newText.Append(text[0]);
for (int i = 1; i < text.Length; i++)
{
if (char.IsUpper(text[i]) && text[i - 1] != ' ')
newText.Append(' ');
newText.Append(text[i]);
}
return newText.ToString();
}

Your solution has an issue in that it puts a space before the first letter T so you get
" This String..." instead of "This String..."
To get around this look for the lower case letter preceding it as well and then insert the space in the middle:
newValue = Regex.Replace(value, "([a-z])([A-Z])", "$1 $2");
Edit 1:
If you use #"(\p{Ll})(\p{Lu})" it will pick up accented characters as well.
Edit 2:
If your strings can contain acronyms you may want to use this:
newValue = Regex.Replace(value, #"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))", " $0");
So "DriveIsSCSICompatible" becomes "Drive Is SCSI Compatible"

Didn't test performance, but here in one line with linq:
var val = "ThisIsAStringToTest";
val = string.Concat(val.Select(x => Char.IsUpper(x) ? " " + x : x.ToString())).TrimStart(' ');

I know this is an old one, but this is an extension I use when I need to do this:
public static class Extensions
{
public static string ToSentence( this string Input )
{
return new string(Input.SelectMany((c, i) => i > 0 && char.IsUpper(c) ? new[] { ' ', c } : new[] { c }).ToArray());
}
}
This will allow you to use MyCasedString.ToSentence()

I set out to make a simple extension method based on Binary Worrier's code which will handle acronyms properly, and is repeatable (won't mangle already spaced words). Here is my result.
public static string UnPascalCase(this string text)
{
if (string.IsNullOrWhiteSpace(text))
return "";
var newText = new StringBuilder(text.Length * 2);
newText.Append(text[0]);
for (int i = 1; i < text.Length; i++)
{
var currentUpper = char.IsUpper(text[i]);
var prevUpper = char.IsUpper(text[i - 1]);
var nextUpper = (text.Length > i + 1) ? char.IsUpper(text[i + 1]) || char.IsWhiteSpace(text[i + 1]): prevUpper;
var spaceExists = char.IsWhiteSpace(text[i - 1]);
if (currentUpper && !spaceExists && (!nextUpper || !prevUpper))
newText.Append(' ');
newText.Append(text[i]);
}
return newText.ToString();
}
Here are the unit test cases this function passes. I added most of tchrist's suggested cases to this list. The three of those it doesn't pass (two are just Roman numerals) are commented out:
Assert.AreEqual("For You And I", "ForYouAndI".UnPascalCase());
Assert.AreEqual("For You And The FBI", "ForYouAndTheFBI".UnPascalCase());
Assert.AreEqual("A Man A Plan A Canal Panama", "AManAPlanACanalPanama".UnPascalCase());
Assert.AreEqual("DNS Server", "DNSServer".UnPascalCase());
Assert.AreEqual("For You And I", "For You And I".UnPascalCase());
Assert.AreEqual("Mount Mᶜ Kinley National Park", "MountMᶜKinleyNationalPark".UnPascalCase());
Assert.AreEqual("El Álamo Tejano", "ElÁlamoTejano".UnPascalCase());
Assert.AreEqual("The Ævar Arnfjörð Bjarmason", "TheÆvarArnfjörðBjarmason".UnPascalCase());
Assert.AreEqual("Il Caffè Macchiato", "IlCaffèMacchiato".UnPascalCase());
//Assert.AreEqual("Mister ǅenan ǈubović", "Misterǅenanǈubović".UnPascalCase());
//Assert.AreEqual("Ole King Henry Ⅷ", "OleKingHenryⅧ".UnPascalCase());
//Assert.AreEqual("Carlos Ⅴº El Emperador", "CarlosⅤºElEmperador".UnPascalCase());
Assert.AreEqual("For You And The FBI", "For You And The FBI".UnPascalCase());
Assert.AreEqual("A Man A Plan A Canal Panama", "A Man A Plan A Canal Panama".UnPascalCase());
Assert.AreEqual("DNS Server", "DNS Server".UnPascalCase());
Assert.AreEqual("Mount Mᶜ Kinley National Park", "Mount Mᶜ Kinley National Park".UnPascalCase());

Welcome to Unicode
All these solutions are essentially wrong for modern text. You need to use something that understands case. Since Bob asked for other languages, I'll give a couple for Perl.
I provide four solutions, ranging from worst to best. Only the best one is always right. The others have problems. Here is a test run to show you what works and what doesn’t, and where. I’ve used underscores so that you can see where the spaces have been put, and I’ve marked as wrong anything that is, well, wrong.
Testing TheLoneRanger
Worst: The_Lone_Ranger
Ok: The_Lone_Ranger
Better: The_Lone_Ranger
Best: The_Lone_Ranger
Testing MountMᶜKinleyNationalPark
[WRONG] Worst: Mount_MᶜKinley_National_Park
[WRONG] Ok: Mount_MᶜKinley_National_Park
[WRONG] Better: Mount_MᶜKinley_National_Park
Best: Mount_Mᶜ_Kinley_National_Park
Testing ElÁlamoTejano
[WRONG] Worst: ElÁlamo_Tejano
Ok: El_Álamo_Tejano
Better: El_Álamo_Tejano
Best: El_Álamo_Tejano
Testing TheÆvarArnfjörðBjarmason
[WRONG] Worst: TheÆvar_ArnfjörðBjarmason
Ok: The_Ævar_Arnfjörð_Bjarmason
Better: The_Ævar_Arnfjörð_Bjarmason
Best: The_Ævar_Arnfjörð_Bjarmason
Testing IlCaffèMacchiato
[WRONG] Worst: Il_CaffèMacchiato
Ok: Il_Caffè_Macchiato
Better: Il_Caffè_Macchiato
Best: Il_Caffè_Macchiato
Testing Misterǅenanǈubović
[WRONG] Worst: Misterǅenanǈubović
[WRONG] Ok: Misterǅenanǈubović
Better: Mister_ǅenan_ǈubović
Best: Mister_ǅenan_ǈubović
Testing OleKingHenryⅧ
[WRONG] Worst: Ole_King_HenryⅧ
[WRONG] Ok: Ole_King_HenryⅧ
[WRONG] Better: Ole_King_HenryⅧ
Best: Ole_King_Henry_Ⅷ
Testing CarlosⅤºElEmperador
[WRONG] Worst: CarlosⅤºEl_Emperador
[WRONG] Ok: CarlosⅤº_El_Emperador
[WRONG] Better: CarlosⅤº_El_Emperador
Best: Carlos_Ⅴº_El_Emperador
BTW, almost everyone here has selected the first way, the one marked "Worst". A few have selected the second way, marked "OK". But no one else before me has shown you how to do either the "Better" or the "Best" approach.
Here is the test program with its four methods:
#!/usr/bin/env perl
use utf8;
use strict;
use warnings;
# First I'll prove these are fine variable names:
my (
$TheLoneRanger ,
$MountMᶜKinleyNationalPark ,
$ElÁlamoTejano ,
$TheÆvarArnfjörðBjarmason ,
$IlCaffèMacchiato ,
$Misterǅenanǈubović ,
$OleKingHenryⅧ ,
$CarlosⅤºElEmperador ,
);
# Now I'll load up some string with those values in them:
my #strings = qw{
TheLoneRanger
MountMᶜKinleyNationalPark
ElÁlamoTejano
TheÆvarArnfjörðBjarmason
IlCaffèMacchiato
Misterǅenanǈubović
OleKingHenryⅧ
CarlosⅤºElEmperador
};
my($new, $best, $ok);
my $mask = " %10s %-8s %s\n";
for my $old (#strings) {
print "Testing $old\n";
($best = $old) =~ s/(?<=\p{Lowercase})(?=[\p{Uppercase}\p{Lt}])/_/g;
($new = $old) =~ s/(?<=[a-z])(?=[A-Z])/_/g;
$ok = ($new ne $best) && "[WRONG]";
printf $mask, $ok, "Worst:", $new;
($new = $old) =~ s/(?<=\p{Ll})(?=\p{Lu})/_/g;
$ok = ($new ne $best) && "[WRONG]";
printf $mask, $ok, "Ok:", $new;
($new = $old) =~ s/(?<=\p{Ll})(?=[\p{Lu}\p{Lt}])/_/g;
$ok = ($new ne $best) && "[WRONG]";
printf $mask, $ok, "Better:", $new;
($new = $old) =~ s/(?<=\p{Lowercase})(?=[\p{Uppercase}\p{Lt}])/_/g;
$ok = ($new ne $best) && "[WRONG]";
printf $mask, $ok, "Best:", $new;
}
When you can score the same as the "Best" on this dataset, you’ll know you’ve done it correctly. Until then, you haven’t. No one else here has done better than "Ok", and most have done it "Worst". I look forward to seeing someone post the correct ℂ♯ code.
I notice that StackOverflow’s highlighting code is miserably stoopid again. They’re making all the same old lame as (most but not all) of the rest of the poor approaches mentioned here have made. Isn’t it long past time to put ASCII to rest? It doens’t make sense anymore, and pretending it’s all you have is simply wrong. It makes for bad code.

This Regex places a space character in front of every capital letter:
using System.Text.RegularExpressions;
const string myStringWithoutSpaces = "ThisIsAStringWithoutSpaces";
var myStringWithSpaces = Regex.Replace(myStringWithoutSpaces, "([A-Z])([a-z]*)", " $1$2");
Mind the space in front if "$1$2", this is what will get it done.
This is the outcome:
"This Is A String Without Spaces"

Binary Worrier, I have used your suggested code, and it is rather good, I have just one minor addition to it:
public static string AddSpacesToSentence(string text)
{
if (string.IsNullOrEmpty(text))
return "";
StringBuilder newText = new StringBuilder(text.Length * 2);
newText.Append(text[0]);
for (int i = 1; i < result.Length; i++)
{
if (char.IsUpper(result[i]) && !char.IsUpper(result[i - 1]))
{
newText.Append(' ');
}
else if (i < result.Length)
{
if (char.IsUpper(result[i]) && !char.IsUpper(result[i + 1]))
newText.Append(' ');
}
newText.Append(result[i]);
}
return newText.ToString();
}
I have added a condition !char.IsUpper(text[i - 1]). This fixed a bug that would cause something like 'AverageNOX' to be turned into 'Average N O X', which is obviously wrong, as it should read 'Average NOX'.
Sadly this still has the bug that if you have the text 'FromAStart', you would get 'From AStart' out.
Any thoughts on fixing this?

Inspired from #MartinBrown,
Two Lines of Simple Regex, which will resolve your name, including Acyronyms anywhere in the string.
public string ResolveName(string name)
{
var tmpDisplay = Regex.Replace(name, "([^A-Z ])([A-Z])", "$1 $2");
return Regex.Replace(tmpDisplay, "([A-Z]+)([A-Z][^A-Z$])", "$1 $2").Trim();
}

Here's mine:
private string SplitCamelCase(string s)
{
Regex upperCaseRegex = new Regex(#"[A-Z]{1}[a-z]*");
MatchCollection matches = upperCaseRegex.Matches(s);
List<string> words = new List<string>();
foreach (Match match in matches)
{
words.Add(match.Value);
}
return String.Join(" ", words.ToArray());
}

Make sure you aren't putting spaces at the beginning of the string, but you are putting them between consecutive capitals. Some of the answers here don't address one or both of those points. There are other ways than regex, but if you prefer to use that, try this:
Regex.Replace(value, #"\B[A-Z]", " $0")
The \B is a negated \b, so it represents a non-word-boundary. It means the pattern matches "Y" in XYzabc, but not in Yzabc or X Yzabc. As a little bonus, you can use this on a string with spaces in it and it won't double them.

What you have works perfectly. Just remember to reassign value to the return value of this function.
value = System.Text.RegularExpressions.Regex.Replace(value, "[A-Z]", " $0");

Here is how you could do it in SQL
create FUNCTION dbo.PascalCaseWithSpace(#pInput AS VARCHAR(MAX)) RETURNS VARCHAR(MAX)
BEGIN
declare #output varchar(8000)
set #output = ''
Declare #vInputLength INT
Declare #vIndex INT
Declare #vCount INT
Declare #PrevLetter varchar(50)
SET #PrevLetter = ''
SET #vCount = 0
SET #vIndex = 1
SET #vInputLength = LEN(#pInput)
WHILE #vIndex <= #vInputLength
BEGIN
IF ASCII(SUBSTRING(#pInput, #vIndex, 1)) = ASCII(Upper(SUBSTRING(#pInput, #vIndex, 1)))
begin
if(#PrevLetter != '' and ASCII(#PrevLetter) = ASCII(Lower(#PrevLetter)))
SET #output = #output + ' ' + SUBSTRING(#pInput, #vIndex, 1)
else
SET #output = #output + SUBSTRING(#pInput, #vIndex, 1)
end
else
begin
SET #output = #output + SUBSTRING(#pInput, #vIndex, 1)
end
set #PrevLetter = SUBSTRING(#pInput, #vIndex, 1)
SET #vIndex = #vIndex + 1
END
return #output
END

replaceAll("(?<=[^^\\p{Uppercase}])(?=[\\p{Uppercase}])"," ");

static string AddSpacesToColumnName(string columnCaption)
{
if (string.IsNullOrWhiteSpace(columnCaption))
return "";
StringBuilder newCaption = new StringBuilder(columnCaption.Length * 2);
newCaption.Append(columnCaption[0]);
int pos = 1;
for (pos = 1; pos < columnCaption.Length-1; pos++)
{
if (char.IsUpper(columnCaption[pos]) && !(char.IsUpper(columnCaption[pos - 1]) && char.IsUpper(columnCaption[pos + 1])))
newCaption.Append(' ');
newCaption.Append(columnCaption[pos]);
}
newCaption.Append(columnCaption[pos]);
return newCaption.ToString();
}

In Ruby, via Regexp:
"FooBarBaz".gsub(/(?!^)(?=[A-Z])/, ' ') # => "Foo Bar Baz"

I took Kevin Strikers excellent solution and converted to VB. Since i'm locked into .NET 3.5, i also had to write IsNullOrWhiteSpace. This passes all of his tests.
<Extension()>
Public Function IsNullOrWhiteSpace(value As String) As Boolean
If value Is Nothing Then
Return True
End If
For i As Integer = 0 To value.Length - 1
If Not Char.IsWhiteSpace(value(i)) Then
Return False
End If
Next
Return True
End Function
<Extension()>
Public Function UnPascalCase(text As String) As String
If text.IsNullOrWhiteSpace Then
Return String.Empty
End If
Dim newText = New StringBuilder()
newText.Append(text(0))
For i As Integer = 1 To text.Length - 1
Dim currentUpper = Char.IsUpper(text(i))
Dim prevUpper = Char.IsUpper(text(i - 1))
Dim nextUpper = If(text.Length > i + 1, Char.IsUpper(text(i + 1)) Or Char.IsWhiteSpace(text(i + 1)), prevUpper)
Dim spaceExists = Char.IsWhiteSpace(text(i - 1))
If (currentUpper And Not spaceExists And (Not nextUpper Or Not prevUpper)) Then
newText.Append(" ")
End If
newText.Append(text(i))
Next
Return newText.ToString()
End Function

The question is a bit old but nowadays there is a nice library on Nuget that does exactly this as well as many other conversions to human readable text.
Check out Humanizer on GitHub or Nuget.
Example
"PascalCaseInputStringIsTurnedIntoSentence".Humanize() => "Pascal case input string is turned into sentence"
"Underscored_input_string_is_turned_into_sentence".Humanize() => "Underscored input string is turned into sentence"
"Underscored_input_String_is_turned_INTO_sentence".Humanize() => "Underscored input String is turned INTO sentence"
// acronyms are left intact
"HTML".Humanize() => "HTML"

Seems like a good opportunity for Aggregate. This is designed to be readable, not necessarily especially fast.
someString
.Aggregate(
new StringBuilder(),
(str, ch) => {
if (char.IsUpper(ch) && str.Length > 0)
str.Append(" ");
str.Append(ch);
return str;
}
).ToString();

Found a lot of these answers to be rather obtuse but I haven't fully tested my solution, but it works for what I need, should handle acronyms, and is much more compact/readable than the others IMO:
private string CamelCaseToSpaces(string s)
{
if (string.IsNullOrEmpty(s)) return string.Empty;
StringBuilder stringBuilder = new StringBuilder();
for (int i = 0; i < s.Length; i++)
{
stringBuilder.Append(s[i]);
int nextChar = i + 1;
if (nextChar < s.Length && char.IsUpper(s[nextChar]) && !char.IsUpper(s[i]))
{
stringBuilder.Append(" ");
}
}
return stringBuilder.ToString();
}

In addition to Martin Brown's Answer, I had an issue with numbers as well. For Example: "Location2", or "Jan22" should be "Location 2", and "Jan 22" respectively.
Here is my Regular Expression for doing that, using Martin Brown's answer:
"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))|((?<=[\p{Ll}\p{Lu}])\p{Nd})|((?<=\p{Nd})\p{Lu})"
Here are a couple great sites for figuring out what each part means as well:
Java Based Regular Expression Analyzer (but works for most .net regex's)
Action Script Based Analyzer
The above regex won't work on the action script site unless you replace all of the \p{Ll} with [a-z], the \p{Lu} with [A-Z], and \p{Nd} with [0-9].

Here's my solution, based on Binary Worriers suggestion and building in Richard Priddys' comments, but also taking into account that white space may exist in the provided string, so it won't add white space next to existing white space.
public string AddSpacesBeforeUpperCase(string nonSpacedString)
{
if (string.IsNullOrEmpty(nonSpacedString))
return string.Empty;
StringBuilder newText = new StringBuilder(nonSpacedString.Length * 2);
newText.Append(nonSpacedString[0]);
for (int i = 1; i < nonSpacedString.Length; i++)
{
char currentChar = nonSpacedString[i];
// If it is whitespace, we do not need to add another next to it
if(char.IsWhiteSpace(currentChar))
{
continue;
}
char previousChar = nonSpacedString[i - 1];
char nextChar = i < nonSpacedString.Length - 1 ? nonSpacedString[i + 1] : nonSpacedString[i];
if (char.IsUpper(currentChar) && !char.IsWhiteSpace(nextChar)
&& !(char.IsUpper(previousChar) && char.IsUpper(nextChar)))
{
newText.Append(' ');
}
else if (i < nonSpacedString.Length)
{
if (char.IsUpper(currentChar) && !char.IsWhiteSpace(nextChar) && !char.IsUpper(nextChar))
{
newText.Append(' ');
}
}
newText.Append(currentChar);
}
return newText.ToString();
}

For anyone who is looking for a C++ function answering this same question, you can use the following. This is modeled after the answer given by #Binary Worrier. This method just preserves Acronyms automatically.
using namespace std;
void AddSpacesToSentence(string& testString)
stringstream ss;
ss << testString.at(0);
for (auto it = testString.begin() + 1; it != testString.end(); ++it )
{
int index = it - testString.begin();
char c = (*it);
if (isupper(c))
{
char prev = testString.at(index - 1);
if (isupper(prev))
{
if (index < testString.length() - 1)
{
char next = testString.at(index + 1);
if (!isupper(next) && next != ' ')
{
ss << ' ';
}
}
}
else if (islower(prev))
{
ss << ' ';
}
}
ss << c;
}
cout << ss.str() << endl;
The tests strings I used for this function, and the results are:
"helloWorld" -> "hello World"
"HelloWorld" -> "Hello World"
"HelloABCWorld" -> "Hello ABC World"
"HelloWorldABC" -> "Hello World ABC"
"ABCHelloWorld" -> "ABC Hello World"
"ABC HELLO WORLD" -> "ABC HELLO WORLD"
"ABCHELLOWORLD" -> "ABCHELLOWORLD"
"A" -> "A"

A C# solution for an input string that consists only of ASCII characters. The regex incorporates negative lookbehind to ignore a capital (upper case) letter that appears at the beginning of the string. Uses Regex.Replace() to return the desired string.
Also see regex101.com demo.
using System;
using System.Text.RegularExpressions;
public class RegexExample
{
public static void Main()
{
var text = "ThisStringHasNoSpacesButItDoesHaveCapitals";
// Use negative lookbehind to match all capital letters
// that do not appear at the beginning of the string.
var pattern = "(?<!^)([A-Z])";
var rgx = new Regex(pattern);
var result = rgx.Replace(text, " $1");
Console.WriteLine("Input: [{0}]\nOutput: [{1}]", text, result);
}
}
Expected Output:
Input: [ThisStringHasNoSpacesButItDoesHaveCapitals]
Output: [This String Has No Spaces But It Does Have Capitals]
Update: Here's a variation that will also handle acronyms (sequences of upper-case letters).
Also see regex101.com demo and ideone.com demo.
using System;
using System.Text.RegularExpressions;
public class RegexExample
{
public static void Main()
{
var text = "ThisStringHasNoSpacesASCIIButItDoesHaveCapitalsLINQ";
// Use positive lookbehind to locate all upper-case letters
// that are preceded by a lower-case letter.
var patternPart1 = "(?<=[a-z])([A-Z])";
// Used positive lookbehind and lookahead to locate all
// upper-case letters that are preceded by an upper-case
// letter and followed by a lower-case letter.
var patternPart2 = "(?<=[A-Z])([A-Z])(?=[a-z])";
var pattern = patternPart1 + "|" + patternPart2;
var rgx = new Regex(pattern);
var result = rgx.Replace(text, " $1$2");
Console.WriteLine("Input: [{0}]\nOutput: [{1}]", text, result);
}
}
Expected Output:
Input: [ThisStringHasNoSpacesASCIIButItDoesHaveCapitalsLINQ]
Output: [This String Has No Spaces ASCII But It Does Have Capitals LINQ]

Here is a more thorough solution that doesn't put spaces in front of words:
Note: I have used multiple Regexs (not concise but it will also handle acronyms and single letter words)
Dim s As String = "ThisStringHasNoSpacesButItDoesHaveCapitals"
s = System.Text.RegularExpressions.Regex.Replace(s, "([a-z])([A-Z](?=[A-Z])[a-z]*)", "$1 $2")
s = System.Text.RegularExpressions.Regex.Replace(s, "([A-Z])([A-Z][a-z])", "$1 $2")
s = System.Text.RegularExpressions.Regex.Replace(s, "([a-z])([A-Z][a-z])", "$1 $2")
s = System.Text.RegularExpressions.Regex.Replace(s, "([a-z])([A-Z][a-z])", "$1 $2") // repeat a second time
In:
"ThisStringHasNoSpacesButItDoesHaveCapitals"
"IAmNotAGoat"
"LOLThatsHilarious!"
"ThisIsASMSMessage"
Out:
"This String Has No Spaces But It Does Have Capitals"
"I Am Not A Goat"
"LOL Thats Hilarious!"
"This Is ASMS Message" // (Difficult to handle single letter words when they are next to acronyms.)

All the previous responses looked too over complicated.
I had string that had a mixture of capitals and _ so used, string.Replace() to make the _, " " and used the following to add a space at the capital letters.
for (int i = 0; i < result.Length; i++)
{
if (char.IsUpper(result[i]))
{
counter++;
if (i > 1) //stops from adding a space at if string starts with Capital
{
result = result.Insert(i, " ");
i++; //Required** otherwise stuck in infinite
//add space loop over a single capital letter.
}
}
}

Inspired by Binary Worrier answer I took a swing at this.
Here's the result:
/// <summary>
/// String Extension Method
/// Adds white space to strings based on Upper Case Letters
/// </summary>
/// <example>
/// strIn => "HateJPMorgan"
/// preserveAcronyms false => "Hate JP Morgan"
/// preserveAcronyms true => "Hate JPMorgan"
/// </example>
/// <param name="strIn">to evaluate</param>
/// <param name="preserveAcronyms" >determines saving acronyms (Optional => false) </param>
public static string AddSpaces(this string strIn, bool preserveAcronyms = false)
{
if (string.IsNullOrWhiteSpace(strIn))
return String.Empty;
var stringBuilder = new StringBuilder(strIn.Length * 2)
.Append(strIn[0]);
int i;
for (i = 1; i < strIn.Length - 1; i++)
{
var c = strIn[i];
if (Char.IsUpper(c) && (Char.IsLower(strIn[i - 1]) || (preserveAcronyms && Char.IsLower(strIn[i + 1]))))
stringBuilder.Append(' ');
stringBuilder.Append(c);
}
return stringBuilder.Append(strIn[i]).ToString();
}
Did test using stopwatch running 10000000 iterations and various string lengths and combinations.
On average 50% (maybe a bit more) faster than Binary Worrier answer.

private string GetProperName(string Header)
{
if (Header.ToCharArray().Where(c => Char.IsUpper(c)).Count() == 1)
{
return Header;
}
else
{
string ReturnHeader = Header[0].ToString();
for(int i=1; i<Header.Length;i++)
{
if (char.IsLower(Header[i-1]) && char.IsUpper(Header[i]))
{
ReturnHeader += " " + Header[i].ToString();
}
else
{
ReturnHeader += Header[i].ToString();
}
}
return ReturnHeader;
}
return Header;
}

This one includes acronyms and acronym plurals and is a bit faster than the accepted answer:
public string Sentencify(string value)
{
if (string.IsNullOrWhiteSpace(value))
return string.Empty;
string final = string.Empty;
for (int i = 0; i < value.Length; i++)
{
if (i != 0 && Char.IsUpper(value[i]))
{
if (!Char.IsUpper(value[i - 1]))
final += " ";
else if (i < (value.Length - 1))
{
if (!Char.IsUpper(value[i + 1]) && !((value.Length >= i && value[i + 1] == 's') ||
(value.Length >= i + 1 && value[i + 1] == 'e' && value[i + 2] == 's')))
final += " ";
}
}
final += value[i];
}
return final;
}
Passes these tests:
string test1 = "RegularOTs";
string test2 = "ThisStringHasNoSpacesASCIIButItDoesHaveCapitalsLINQ";
string test3 = "ThisStringHasNoSpacesButItDoesHaveCapitals";

An implementation with fold, also known as Aggregate:
public static string SpaceCapitals(this string arg) =>
new string(arg.Aggregate(new List<Char>(),
(accum, x) =>
{
if (Char.IsUpper(x) &&
accum.Any() &&
// prevent double spacing
accum.Last() != ' ' &&
// prevent spacing acronyms (ASCII, SCSI)
!Char.IsUpper(accum.Last()))
{
accum.Add(' ');
}
accum.Add(x);
return accum;
}).ToArray());
In addition to the request, this implementation correctly saves leading, inner, trailing spaces and acronyms, for example,
" SpacedWord " => " Spaced Word ",
"Inner Space" => "Inner Space",
"SomeACRONYM" => "Some ACRONYM".

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Find a match for query within text - c#

Related

C#, Finding equal amount of brackets

I want make some validation string that match the format below

How find sub-string(0,91) from long string?

Parsing strings recursively

Add spaces before Capital Letters

Categories

Resources