Compare two strings ignoring little changes - c#

I want to compare two strings ignoring few words (say three).
Like if I compare these two strings:
"Hello! My name is Alex Jolig. Its nice to meet you."
"My name is Alex. Nice to meet you."
I should get result as True.
Is there any way to do that?

Nothing inbuilt that comes to my mind, but I think you can tokenise both the strings using a delimiter (' ' in your case) & punctuation marks (! & . in your case).
Once both the strings are broken down in ordered tokens you can apply a comparison between individual tokens as per your requirement.

You could split the strings into words and compare them like this;
private bool compareStrings()
{
string stringLeft = "Hello! My name is Alex Jolig. Its nice to meet you.";
string stringRight = "My name is Alex. Nice to meet you.";
List<string> liLeft = stringLeft.Split(' ').ToList();
List<string> liRight = stringRight.Split(' ').ToList();
double totalWordCount = liLeft.Count();
double matchingWordCount = 0;
foreach (var item in liLeft)
{
if(liRight.Contains(item)){
matchingWordCount ++;
}
}
//return bool based on percentage of matching words
return ((matchingWordCount / totalWordCount) * 100) >= 50;
}
This returns a boolean based on a percentage of matching words, you might want to use a Regex or similar to replace some format characters for more accurate results.

There is an article on Fuzzy String Matching with Edit Distance in codeproject
You can probably extend this idea to suit your requirement. It uses Levenshtein's Edit Distance as a Fuzzy String Match.
http://www.codeproject.com/Articles/162790/Fuzzy-String-Matching-with-Edit-Distance

Hey so my here is my go at an answer.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
var string1 = "Hi thar im a string";
var string2 = "Hi thar im a string";
var string3 = "Hi thar im a similar string";
var string4 = "im a really different string";
var string5 = "Hi thar im a string but have many different words";
Console.WriteLine(StringComparo(string1, string2));
Console.WriteLine(StringComparo(string1, string3));
Console.WriteLine(StringComparo(string1, string4));
Console.WriteLine(StringComparo(string1, string5));
Console.ReadLine();
}
public static bool StringComparo(string str1, string str2, int diffCounterLimiter = 3)
{
var counter = 0;
var arr1 = str1.Split(' ');
var arr2 = str2.Split(' ');
while (counter <= diffCounterLimiter)
{
TreeNode bestResult = null;
for (int i = 0; i < arr1.Length; i++)
{
for (int j = 0; j < arr2.Length; j++)
{
var result = new TreeNode() { arr1Index = i, arr2Index = j };
if (string.Equals(arr1[i], arr2[j]) && (bestResult == null || bestResult.diff < result.diff))
{
bestResult = result;
}
}
}
// no result found
if(bestResult == null)
{
// any left over words plus current counter
return arr1.Length + arr2.Length + counter <= diffCounterLimiter;
}
counter += bestResult.diff;
arr1 = arr1.Where((val, idx) => idx != bestResult.arr1Index).ToArray();
arr2 = arr2.Where((val, idx) => idx != bestResult.arr2Index).ToArray();
}
return false;
}
}
public class TreeNode
{
public int arr1Index;
public int arr2Index;
public int diff => Math.Abs(arr1Index - arr2Index);
}
}
I tried to implement a tree search(I know its not really a search tree I may re write it a bit).
In essence it find the closest matched elements in each string.While under the limit of 3 differences it removes the elements matched them adds the difference and repeats. Hope it helps.

Related

Find How many times an Anagram is contained in a String - asp.net c#

I need to find how many times the Anagrams are contained in a String like in this example:(the anagrams and the string itself)
Input 1(String) = thegodsanddogsweredogged
Input 2(String) = dog
Output(int) = 3
the output will be 3 because of these - (thegodsanddogsweredogged)
So far i managed to check how many times the word "dog" is contained in the string:
public ActionResult FindAnagram (string word1, string word2)
{
int ?count = Regex.Matches(word1, word2).Count;
return View(count);
}
This works to check how many times the word is contained but i still get an error: Cannot convert null to 'int' because it is a non-nulable value type.
So i need to check for how many times input 2 and the anagrams of input 2(dog,god,dgo,ogd etc) are contained in input 1?(int this case its 3 times-thegodsanddogsweredogged)
Thank you
I wanted to post a variation which is more readable at the cost of some runtime performance. Anton's answer is probably more performant, but IMHO less readable than it could be.
The nice thing about anagrams is that you know their exact length, and you can figure out all possible anagram locations quite easily. For a 3 letter anagram in a 100 letter haystack, you know that there are 98 possible locations:
0..2
1..3
2..4
...
96..98
97..99
These indexes can be generated quite easily:
var amountOfPossibleAnagramLocations = haystack.Length - needle.Length + 1;
var substringIndexes = Enumerable.Range(0, amountOfPossibleAnagramLocations);
At this point, you simply take every listed substring and test if it's an anagram.
var anagramLength = needle.Length;
int count = 0;
foreach(var index in substringIndexes)
{
var substring = haystack.Substring(index, anagramLength);
if(substring.IsAnagramOf(needle))
count++;
}
Note that a lot of this can be condensed into a single LINQ chain:
var amountOfPossibleAnagramLocations = haystack.Length - needle.Length + 1;
var anagramLength = needle.Length;
var anagramCount = Enumerable
.Range(0, amountOfPossibleAnagramLocations)
.Select(x => haystack.Substring(x, anagramLength))
.Count(substring => substring.IsAnagramOf(needle));
Whether it's more readable or not depends on how comfortable you are with LINQ. I personally prefer it (up to a reasonable size, of course).
To check for an anagram, simply sort the characters and check for equality. I used an extension method for the readability bonus:
public static bool IsAnagramOf(this string word1, string word2)
{
var word1Sorted = String.Concat(word1.OrderBy(c => c));
var word2Sorted = String.Concat(word2.OrderBy(c => c));
return word1Sorted == word2Sorted;
}
I've omitted things like case insensitivity or ignoring whitespace for the sake of brevity.
It would be better not to try to use Regex but write your own logic.
You can use a dictionary with key char - a letter of the word and value int - number of letter occurrences. And build such a dictionary for the word.
Anagrams will have similar dictionaries, so you can build a temp dictionary for each temp string built using the Windowing method over your str and compare it with the dictionary built for your word.
Here is my code:
using System;
using System.Collections.Generic;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
var str = "thegodsanddogsweredogged";
var word = "dog";
Console.WriteLine("Word: " + word);
Console.WriteLine("Str: " + str);
Console.WriteLine();
var count = CountAnagrams(str, word);
Console.WriteLine("Count: " + count);
Console.ReadKey();
}
private static int CountAnagrams(string str, string word) {
var charDict = BuildCharDict(word);
int count = 0;
for (int i = 0; i < str.Length - word.Length + 1; i++) {
string tmp = "";
for (int j = i; j < str.Length; j++) {
tmp += str[j];
if (tmp.Length == word.Length)
break;
}
var tmpCharDict = BuildCharDict(tmp);
if (CharDictsEqual(charDict, tmpCharDict)) {
count++;
Console.WriteLine("Anagram: " + tmp);
Console.WriteLine("Index: " + i);
Console.WriteLine();
}
}
return count;
}
private static Dictionary<char, int> BuildCharDict(string word) {
var charDict = new Dictionary<char, int>();
foreach (var ch in word)
{
if (charDict.ContainsKey(ch))
{
charDict[ch] += 1;
}
else
{
charDict[ch] = 1;
}
}
return charDict;
}
private static bool CharDictsEqual(Dictionary<char, int> dict1, Dictionary<char, int> dict2)
{
if (dict1.Count != dict2.Count)
return false;
foreach (var kv in dict1) {
if (!dict2.TryGetValue(kv.Key, out var val) || val != kv.Value)
{
return false;
}
}
return true;
}
}
}
Possibly there is a better solution, but mine works.
P.S. About your error. You should change int? to int, because your View might expect non-nullable int type

Mutating to a new string array

I'm doing an exercise that says I have to enter a phrase and put it in array, then I have to delete all the repeated characters and show the new phrase.
I did it like this but I don't know how to get char from string array and put it into another string array.
PS: I must only use the basics of C# and arrays :(
namespace Exercice3
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Entrez une phrase SVP!!");
string phrase = Console.ReadLine();
string[] phraseArray = new string[]{ phrase };
string[] newPhrase = new string[phrase.Length];
for (int i = 0; i <= phrase.Length - 1; i++)
{
for (int j = 1; j <= phrase.Length - 1; j++)
{
if (phraseArray[i] != phraseArray[j])
newPhrase = phraseArray[i]; //Probleme here
}
}
Console.WriteLine(newPhrase);
}
}
}
something like this would do it. You don't have to do it this way...
using System;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
var newColl = new List<char>();
foreach(char c in "aaabbbccc".ToCharArray())
{
if (!newColl.Contains(c))
newColl.Add(c);
}
Console.WriteLine(new string(newColl.ToArray()));
}
}
Output:
abc
Array-only method
using System;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
const string orig = "aaabbbcccddd";
int origLen = orig.Length;
char[] newArr = new char[origLen]; // get max possible spots in arr
int newCount = 0;
for(int i = 0; i < origLen; i++)
{
bool yes = false;
for(int j = 0; j < newCount + 1; j++)
{
if (newArr[j] == orig[i])
{
yes = true;
break;
}
}
if (!yes)
{
newArr[newCount] = orig[i];
newCount++;
}
}
Console.WriteLine(new string(newArr));
}
}
Output:
abcd
The main problem here is that you are using arrays of strings. That is inappropriate because you are trying to iterate the characters of the string.
You need to construct your array of characters like this:
char[] phraseArray = phrase.ToCharArray();
This should allow you the ability to iterate the set of characters, check for duplicates, and form the new character array.
The reason why you're getting an IndexOutOfRangeException is because if you look at the two arrays:
string[] phraseArray = new string[]{ phrase };
And
string[] newPhrase = new string[phrase.Length];
The length of both arrays are completely different, phraseArray has a length of 1 and newPhrase will be set to the length of your phrase, so if you insert a world of more than 1 character both arrays will not match and then either:
if (phraseArray[i] != phraseArray[j])
Or
newPhrase = phraseArray[i];
Will fail, specially newPhrase = phraseArray[i];, because here you're trying to replace newPhrase an array, for a character phrase[i]. Basically your solution won't work. So what you can do is do what #Travis suggested, basically change your arrays from String[]to char[], then:
char[] phraseArray = phrase.ToCharArray();
Or Instead of using arrays you can just use another string to filter your phrase. The first string being your phrase and the second would be the string you'd add your non repeated characters to. basically this:
Console.WriteLine("Entrez une phrase SVP!!");
string phrase = Console.ReadLine();
string newPhrase = "";
// loop through each character in phrase
for (int i = 0; i < phrase.Length; i++) {
// We check if the character of phrease is within our newPhrase, if it isn't we add it, otherwise do nothing
if (newPhrase.IndexOf(phrase[i]) == -1)
newPhrase += phrase[i]; // here we add it to newPhrase
}
Console.WriteLine(newPhrase);
Don't forget to read the comments in the code.
Edit:
Based on the comments and suggestions given I implemented a similar solution:
Console.WriteLine("Entrez une phrase SVP!!");
char[] phrase = Console.ReadLine().ToCharArray();
char[] newPhrase = new char[phrase.Length];
int index = 0;
// loop through each character in phrase
for (int i = 0; i < phrase.Length; i++) {
// We check if the character of phrease is within our newPhrase, if it isn't we add it, otherwise do nothing
if (Array.IndexOf(newPhrase, phrase[i]) == -1)
newPhrase[index++] = phrase[i]; // here we add it to newPhrase
}
Console.WriteLine(newPhrase);

C# How to generate a new string based on multiple ranged index

Let's say I have a string like this one, left part is a word, right part is a collection of indices (single or range) used to reference furigana (phonetics) for kanjis in my word:
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす"
The pattern in detail:
word,<startIndex>(-<endIndex>):<furigana>
What would be the best way to achieve something like this (with a space in front of the kanji to mark which part is linked to the [furigana]):
子[こ]で 子[こ]にならぬ 時鳥[ほととぎす]
Edit: (thanks for your comments guys)
Here is what I wrote so far:
static void Main(string[] args)
{
string myString = "ABCDEF,1:test;3:test2";
//Split Kanjis / Indices
string[] tokens = myString.Split(',');
//Extract furigana indices
string[] indices = tokens[1].Split(';');
//Dictionnary to store furigana indices
Dictionary<string, string> furiganaIndices = new Dictionary<string, string>();
//Collect
foreach (string index in indices)
{
string[] splitIndex = index.Split(':');
furiganaIndices.Add(splitIndex[0], splitIndex[1]);
}
//Processing
string result = tokens[0] + ",";
for (int i = 0; i < tokens[0].Length; i++)
{
string currentIndex = i.ToString();
if (furiganaIndices.ContainsKey(currentIndex)) //add [furigana]
{
string currentFurigana = furiganaIndices[currentIndex].ToString();
result = result + " " + tokens[0].ElementAt(i) + string.Format("[{0}]", currentFurigana);
}
else //nothing to add
{
result = result + tokens[0].ElementAt(i);
}
}
File.AppendAllText(#"D:\test.txt", result + Environment.NewLine);
}
Result:
ABCDEF,A B[test]C D[test2]EF
I struggle to find a way to process ranged indices:
string myString = "ABCDEF,1:test;2-3:test2";
Result : ABCDEF,A B[test] CD[test2]EF
I don't have anything against manually manipulating strings per se. But given that you seem to have a regular pattern describing the inputs, it seems to me that a solution that uses regex would be more maintainable and readable. So with that in mind, here's an example program that takes that approach:
class Program
{
private const string _kinvalidFormatException = "Invalid format for edit specification";
private static readonly Regex
regex1 = new Regex(#"(?<word>[^,]+),(?<edit>(?:\d+)(?:-(?:\d+))?:(?:[^;]+);?)+", RegexOptions.Compiled),
regex2 = new Regex(#"(?<start>\d+)(?:-(?<end>\d+))?:(?<furigana>[^;]+);?", RegexOptions.Compiled);
static void Main(string[] args)
{
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす";
string result = EditString(myString);
}
private static string EditString(string myString)
{
Match editsMatch = regex1.Match(myString);
if (!editsMatch.Success)
{
throw new ArgumentException(_kinvalidFormatException);
}
int ichCur = 0;
string input = editsMatch.Groups["word"].Value;
StringBuilder text = new StringBuilder();
foreach (Capture capture in editsMatch.Groups["edit"].Captures)
{
Match oneEditMatch = regex2.Match(capture.Value);
if (!oneEditMatch.Success)
{
throw new ArgumentException(_kinvalidFormatException);
}
int start, end;
if (!int.TryParse(oneEditMatch.Groups["start"].Value, out start))
{
throw new ArgumentException(_kinvalidFormatException);
}
Group endGroup = oneEditMatch.Groups["end"];
if (endGroup.Success)
{
if (!int.TryParse(endGroup.Value, out end))
{
throw new ArgumentException(_kinvalidFormatException);
}
}
else
{
end = start;
}
text.Append(input.Substring(ichCur, start - ichCur));
if (text.Length > 0)
{
text.Append(' ');
}
ichCur = end + 1;
text.Append(input.Substring(start, ichCur - start));
text.Append(string.Format("[{0}]", oneEditMatch.Groups["furigana"]));
}
if (ichCur < input.Length)
{
text.Append(input.Substring(ichCur));
}
return text.ToString();
}
}
Notes:
This implementation assumes that the edit specifications will be listed in order and won't overlap. It makes no attempt to validate that part of the input; depending on where you are getting your input from you may want to add that. If it's valid for the specifications to be listed out of order, you can also extend the above to first store the edits in a list and sort the list by the start index before actually editing the string. (In similar fashion to the way the other proposed answer works; though, why they are using a dictionary instead of a simple list to store the individual edits, I have no idea…that seems arbitrarily complicated to me.)
I included basic input validation, throwing exceptions where failures occur in the pattern matching. A more user-friendly implementation would add more specific information to each exception, describing what part of the input actually was invalid.
The Regex class actually has a Replace() method, which allows for complete customization. The above could have been implemented that way, using Replace() and a MatchEvaluator to provide the replacement text, instead of just appending text to a StringBuilder. Which way to do it is mostly a matter of preference, though the MatchEvaluator might be preferred if you have a need for more flexible implementation options (i.e. if the exact format of the result can vary).
If you do choose to use the other proposed answer, I strongly recommend you use StringBuilder instead of simply concatenating onto the results variable. For short strings it won't matter much, but you should get into the habit of always using StringBuilder when you have a loop that is incrementally adding onto a string value, because for long string the performance implications of using concatenation can be very negative.
This should do it (and even handle ranged indices), based on the formatting of the input string you have-
using System;
using System.Collections.Generic;
public class stringParser
{
private struct IndexElements
{
public int start;
public int end;
public string value;
}
public static void Main()
{
//input string
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす";
int wordIndexSplit = myString.IndexOf(',');
string word = myString.Substring(0,wordIndexSplit);
string indices = myString.Substring(wordIndexSplit + 1);
string[] eachIndex = indices.Split(';');
Dictionary<int,IndexElements> index = new Dictionary<int,IndexElements>();
string[] elements;
IndexElements e;
int dash;
int n = 0;
int last = -1;
string results = "";
foreach (string s in eachIndex)
{
e = new IndexElements();
elements = s.Split(':');
if (elements[0].Contains("-"))
{
dash = elements[0].IndexOf('-');
e.start = int.Parse(elements[0].Substring(0,dash));
e.end = int.Parse(elements[0].Substring(dash + 1));
}
else
{
e.start = int.Parse(elements[0]);
e.end = e.start;
}
e.value = elements[1];
index.Add(n,e);
n++;
}
//this is the part that takes the "setup" from the parts above and forms the result string
//loop through each of the "indices" parsed above
for (int i = 0; i < index.Count; i++)
{
//if this is the first iteration through the loop, and the first "index" does not start
//at position 0, add the beginning characters before its start
if (last == -1 && index[i].start > 0)
{
results += word.Substring(0,index[i].start);
}
//if this is not the first iteration through the loop, and the previous iteration did
//not stop at the position directly before the start of the current iteration, add
//the intermediary chracters
else if (last != -1 && last + 1 != index[i].start)
{
results += word.Substring(last + 1,index[i].start - (last + 1));
}
//add the space before the "index" match, the actual match, and then the formatted "index"
results += " " + word.Substring(index[i].start,(index[i].end - index[i].start) + 1)
+ "[" + index[i].value + "]";
//remember the position of the ending for the next iteration
last = index[i].end;
}
//if the last "index" did not stop at the end of the input string, add the remaining characters
if (index[index.Keys.Count - 1].end + 1 < word.Length)
{
results += word.Substring(index[index.Keys.Count-1].end + 1);
}
//trimming spaces that may be left behind
results = results.Trim();
Console.WriteLine("INPUT - " + myString);
Console.WriteLine("OUTPUT - " + results);
Console.Read();
}
}
input - 子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす
output - 子[こ]で 子[こ]にならぬ 時鳥[ほととぎす]
Note that this should also work with characters the English alphabet if you wanted to use English instead-
input - iliketocodeverymuch,2:A;4-6:B;9-12:CDEFG
output - il i[A]k eto[B]co deve[CDEFG]rymuch

How to remove spaces to make a combination of strings [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
Improve this question
I've been trying to figure out the best approach to combining words in a string to make combinations of that string. I'm trying to do this for a class project. If the string is "The quick fox", I need to find a way to output "Thequick fox", "the quickfox", and "thequickfox". I've tried using string.split and gluing them back together, but haven't had a lot of luck. The issues is the string input could be of any size.
I decided to try this for fun. The idea here is to split the bigger problems into smaller subproblems. So I first started with strings that had 0 and 1 space. I see that with 0 space, the only possible combinations is the string items. With 1 space, I can either have that space or not.
Then I just have to recursively divide the problem until I get one of the base cases. So to that do that I Skip elements in the split array in increments of 2. That way I am guaranteed to get one of the base cases eventually. Once I do that, I run it through the program again and figure out how to add all the results of that to my current set of combinations.
Here's the code:
class Program
{
static void Main(string[] args)
{
string test1 = "fox";
string test2 = "The quick";
string test3 = "The quick fox";
string test4 = "The quick fox says";
string test5 = "The quick fox says hello";
var splittest1 = test1.Split(' ');
var splittest2 = test2.Split(' ');
var splittest3 = test3.Split(' ');
var splittest4 = test4.Split(' ');
var splittest5 = test5.Split(' ');
var ans1 = getcombinations(splittest1);
var ans2 = getcombinations(splittest2);
var ans3 = getcombinations(splittest3);
var ans4 = getcombinations(splittest4);
var ans5 = getcombinations(splittest5);
}
static List<string> getcombinations(string[] splittest)
{
var combos = new List<string>();
var numspaces = splittest.Count() - 1;
if (numspaces == 1)
{
var addcombos = AddTwoStrings(splittest[0], splittest[1]);
var withSpacesCurrent = addcombos.Item1;
var noSpacesCurrent = addcombos.Item2;
combos.Add(withSpacesCurrent);
combos.Add(noSpacesCurrent);
}
else if (numspaces == 0)
{
combos.Add(splittest[0]);
}
else
{
var addcombos = AddTwoStrings(splittest[0], splittest[1]);
var withSpacesCurrent = addcombos.Item1;
var noSpacesCurrent = addcombos.Item2;
var futureCombos = getcombinations(splittest.Skip(2).ToArray());
foreach (var futureCombo in futureCombos)
{
var addFutureCombos = AddTwoStrings(withSpacesCurrent, futureCombo);
var addFutureCombosNoSpaces = AddTwoStrings(noSpacesCurrent, futureCombo);
var combo1 = addFutureCombos.Item1;
var combo2 = addFutureCombos.Item2;
var combo3 = addFutureCombosNoSpaces.Item1;
var combo4 = addFutureCombosNoSpaces.Item2;
combos.Add(combo1);
combos.Add(combo2);
combos.Add(combo3);
combos.Add(combo4);
}
}
return combos;
}
static Tuple<string, string> AddTwoStrings(string a, string b)
{
return Tuple.Create(a + " " + b, a + b);
}
}
}
This is how I got it working, not sure if it is the best algorithm.
public class Program
{
public static void Main(string[] args)
{
Console.WriteLine("Enter a string");
string input = Console.ReadLine();
//split the input string into an array
string[] arrInput = input.Split(' ');
Console.WriteLine("The combinations are...");
//output the original string
Console.WriteLine(input);
//this loop decide letter combination
for (int i = 2; i <= arrInput.Length; i++)
{
//this loop decide how many outputs we would get for a letter combination
//for ex. we would get 2 outputs in a 3 word string if we combine 2 words
for (int j = i-1; j < arrInput.Length; j++)
{
int end = j; // end index
int start = (end - i) + 1; //start index
string output = Combine(arrInput, start, end);
Console.WriteLine(output);
}
}
Console.ReadKey();
}
//combine array into a string with space except from start to end
public static string Combine(string[] arrInput, int start, int end) {
StringBuilder builder = new StringBuilder();
bool combine = false;
for (int i = 0; i < arrInput.Length; i++) {
//first word in the array... don't worry
if (i == 0) {
builder.Append(arrInput[i]);
continue;
}
//don't append " " if combine is true
combine = (i > start && i <= end) ? true : false;
if (!combine)
{
builder.Append(" ");
}
builder.Append(arrInput[i]);
}
return builder.ToString();
}
}

Best way to convert Pascal Case to a sentence

What is the best way to convert from Pascal Case (upper Camel Case) to a sentence.
For example starting with
"AwaitingFeedback"
and converting that to
"Awaiting feedback"
C# preferable but I could convert it from Java or similar.
public static string ToSentenceCase(this string str)
{
return Regex.Replace(str, "[a-z][A-Z]", m => m.Value[0] + " " + char.ToLower(m.Value[1]));
}
In versions of visual studio after 2015, you can do
public static string ToSentenceCase(this string str)
{
return Regex.Replace(str, "[a-z][A-Z]", m => $"{m.Value[0]} {char.ToLower(m.Value[1])}");
}
Based on: Converting Pascal case to sentences using regular expression
I will prefer to use Humanizer for this. Humanizer is a Portable Class Library that meets all your .NET needs for manipulating and displaying strings, enums, dates, times, timespans, numbers and quantities.
Short Answer
"AwaitingFeedback".Humanize() => Awaiting feedback
Long and Descriptive Answer
Humanizer can do a lot more work other examples are:
"PascalCaseInputStringIsTurnedIntoSentence".Humanize() => "Pascal case input string is turned into sentence"
"Underscored_input_string_is_turned_into_sentence".Humanize() => "Underscored input string is turned into sentence"
"Can_return_title_Case".Humanize(LetterCasing.Title) => "Can Return Title Case"
"CanReturnLowerCase".Humanize(LetterCasing.LowerCase) => "can return lower case"
Complete code is :
using Humanizer;
using static System.Console;
namespace HumanizerConsoleApp
{
class Program
{
static void Main(string[] args)
{
WriteLine("AwaitingFeedback".Humanize());
WriteLine("PascalCaseInputStringIsTurnedIntoSentence".Humanize());
WriteLine("Underscored_input_string_is_turned_into_sentence".Humanize());
WriteLine("Can_return_title_Case".Humanize(LetterCasing.Title));
WriteLine("CanReturnLowerCase".Humanize(LetterCasing.LowerCase));
}
}
}
Output
Awaiting feedback
Pascal case input string is turned into sentence
Underscored input string is turned into sentence Can Return Title Case
can return lower case
If you prefer to write your own C# code you can achieve this by writing some C# code stuff as answered by others already.
Here you go...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace CamelCaseToString
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(CamelCaseToString("ThisIsYourMasterCallingYou"));
}
private static string CamelCaseToString(string str)
{
if (str == null || str.Length == 0)
return null;
StringBuilder retVal = new StringBuilder(32);
retVal.Append(char.ToUpper(str[0]));
for (int i = 1; i < str.Length; i++ )
{
if (char.IsLower(str[i]))
{
retVal.Append(str[i]);
}
else
{
retVal.Append(" ");
retVal.Append(char.ToLower(str[i]));
}
}
return retVal.ToString();
}
}
}
This works for me:
Regex.Replace(strIn, "([A-Z]{1,2}|[0-9]+)", " $1").TrimStart()
This is just like #SSTA, but is more efficient than calling TrimStart.
Regex.Replace("ThisIsMyCapsDelimitedString", "(\\B[A-Z])", " $1")
Found this in the MvcContrib source, doesn't seem to be mentioned here yet.
return Regex.Replace(input, "([A-Z])", " $1", RegexOptions.Compiled).Trim();
Just because everyone has been using Regex (except this guy), here's an implementation with StringBuilder that was about 5x faster in my tests. Includes checking for numbers too.
"SomeBunchOfCamelCase2".FromCamelCaseToSentence == "Some Bunch Of Camel Case 2"
public static string FromCamelCaseToSentence(this string input) {
if(string.IsNullOrEmpty(input)) return input;
var sb = new StringBuilder();
// start with the first character -- consistent camelcase and pascal case
sb.Append(char.ToUpper(input[0]));
// march through the rest of it
for(var i = 1; i < input.Length; i++) {
// any time we hit an uppercase OR number, it's a new word
if(char.IsUpper(input[i]) || char.IsDigit(input[i])) sb.Append(' ');
// add regularly
sb.Append(input[i]);
}
return sb.ToString();
}
Here's a basic way of doing it that I came up with using Regex
public static string CamelCaseToSentence(this string value)
{
var sb = new StringBuilder();
var firstWord = true;
foreach (var match in Regex.Matches(value, "([A-Z][a-z]+)|[0-9]+"))
{
if (firstWord)
{
sb.Append(match.ToString());
firstWord = false;
}
else
{
sb.Append(" ");
sb.Append(match.ToString().ToLower());
}
}
return sb.ToString();
}
It will also split off numbers which I didn't specify but would be useful.
string camel = "MyCamelCaseString";
string s = Regex.Replace(camel, "([A-Z])", " $1").ToLower().Trim();
Console.WriteLine(s.Substring(0,1).ToUpper() + s.Substring(1));
Edit: didn't notice your casing requirements, modifed accordingly. You could use a matchevaluator to do the casing, but I think a substring is easier. You could also wrap it in a 2nd regex replace where you change the first character
"^\w"
to upper
\U (i think)
I'd use a regex, inserting a space before each upper case character, then lowering all the string.
string spacedString = System.Text.RegularExpressions.Regex.Replace(yourString, "\B([A-Z])", " \k");
spacedString = spacedString.ToLower();
It is easy to do in JavaScript (or PHP, etc.) where you can define a function in the replace call:
var camel = "AwaitingFeedbackDearMaster";
var sentence = camel.replace(/([A-Z].)/g, function (c) { return ' ' + c.toLowerCase(); });
alert(sentence);
Although I haven't solved the initial cap problem... :-)
Now, for the Java solution:
String ToSentence(String camel)
{
if (camel == null) return ""; // Or null...
String[] words = camel.split("(?=[A-Z])");
if (words == null) return "";
if (words.length == 1) return words[0];
StringBuilder sentence = new StringBuilder(camel.length());
if (words[0].length() > 0) // Just in case of camelCase instead of CamelCase
{
sentence.append(words[0] + " " + words[1].toLowerCase());
}
else
{
sentence.append(words[1]);
}
for (int i = 2; i < words.length; i++)
{
sentence.append(" " + words[i].toLowerCase());
}
return sentence.toString();
}
System.out.println(ToSentence("AwaitingAFeedbackDearMaster"));
System.out.println(ToSentence(null));
System.out.println(ToSentence(""));
System.out.println(ToSentence("A"));
System.out.println(ToSentence("Aaagh!"));
System.out.println(ToSentence("stackoverflow"));
System.out.println(ToSentence("disableGPS"));
System.out.println(ToSentence("Ahh89Boo"));
System.out.println(ToSentence("ABC"));
Note the trick to split the sentence without loosing any character...
Pseudo-code:
NewString = "";
Loop through every char of the string (skip the first one)
If char is upper-case ('A'-'Z')
NewString = NewString + ' ' + lowercase(char)
Else
NewString = NewString + char
Better ways can perhaps be done by using regex or by string replacement routines (replace 'X' with ' x')
An xquery solution that works for both UpperCamel and lowerCamel case:
To output sentence case (only the first character of the first word is capitalized):
declare function content:sentenceCase($string)
{
let $firstCharacter := substring($string, 1, 1)
let $remainingCharacters := substring-after($string, $firstCharacter)
return
concat(upper-case($firstCharacter),lower-case(replace($remainingCharacters, '([A-Z])', ' $1')))
};
To output title case (first character of each word capitalized):
declare function content:titleCase($string)
{
let $firstCharacter := substring($string, 1, 1)
let $remainingCharacters := substring-after($string, $firstCharacter)
return
concat(upper-case($firstCharacter),replace($remainingCharacters, '([A-Z])', ' $1'))
};
Found myself doing something similar, and I appreciate having a point-of-departure with this discussion. This is my solution, placed as an extension method to the string class in the context of a console application.
using System;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string piratese = "avastTharMatey";
string ivyese = "CheerioPipPip";
Console.WriteLine("{0}\n{1}\n", piratese.CamelCaseToString(), ivyese.CamelCaseToString());
Console.WriteLine("For Pete\'s sake, man, hit ENTER!");
string strExit = Console.ReadLine();
}
}
public static class StringExtension
{
public static string CamelCaseToString(this string str)
{
StringBuilder retVal = new StringBuilder(32);
if (!string.IsNullOrEmpty(str))
{
string strTrimmed = str.Trim();
if (!string.IsNullOrEmpty(strTrimmed))
{
retVal.Append(char.ToUpper(strTrimmed[0]));
if (strTrimmed.Length > 1)
{
for (int i = 1; i < strTrimmed.Length; i++)
{
if (char.IsUpper(strTrimmed[i])) retVal.Append(" ");
retVal.Append(char.ToLower(strTrimmed[i]));
}
}
}
}
return retVal.ToString();
}
}
}
Most of the preceding answers split acronyms and numbers, adding a space in front of each character. I wanted acronyms and numbers to be kept together so I have a simple state machine that emits a space every time the input transitions from one state to the other.
/// <summary>
/// Add a space before any capitalized letter (but not for a run of capitals or numbers)
/// </summary>
internal static string FromCamelCaseToSentence(string input)
{
if (string.IsNullOrEmpty(input)) return String.Empty;
var sb = new StringBuilder();
bool upper = true;
for (var i = 0; i < input.Length; i++)
{
bool isUpperOrDigit = char.IsUpper(input[i]) || char.IsDigit(input[i]);
// any time we transition to upper or digits, it's a new word
if (!upper && isUpperOrDigit)
{
sb.Append(' ');
}
sb.Append(input[i]);
upper = isUpperOrDigit;
}
return sb.ToString();
}
And here's some tests:
[TestCase(null, ExpectedResult = "")]
[TestCase("", ExpectedResult = "")]
[TestCase("ABC", ExpectedResult = "ABC")]
[TestCase("abc", ExpectedResult = "abc")]
[TestCase("camelCase", ExpectedResult = "camel Case")]
[TestCase("PascalCase", ExpectedResult = "Pascal Case")]
[TestCase("Pascal123", ExpectedResult = "Pascal 123")]
[TestCase("CustomerID", ExpectedResult = "Customer ID")]
[TestCase("CustomABC123", ExpectedResult = "Custom ABC123")]
public string CanSplitCamelCase(string input)
{
return FromCamelCaseToSentence(input);
}
Mostly already answered here
Small chage to the accepted answer, to convert the second and subsequent Capitalised letters to lower case, so change
if (char.IsUpper(text[i]))
newText.Append(' ');
newText.Append(text[i]);
to
if (char.IsUpper(text[i]))
{
newText.Append(' ');
newText.Append(char.ToLower(text[i]));
}
else
newText.Append(text[i]);
Here is my implementation. This is the fastest that I got while avoiding creating spaces for abbreviations.
public static string PascalCaseToSentence(string input)
{
if (string.IsNullOrEmpty(input) || input.Length < 2)
return input;
var sb = new char[input.Length + ((input.Length + 1) / 2)];
var len = 0;
var lastIsLower = false;
for (int i = 0; i < input.Length; i++)
{
var current = input[i];
if (current < 97)
{
if (lastIsLower)
{
sb[len] = ' ';
len++;
}
lastIsLower = false;
}
else
{
lastIsLower = true;
}
sb[len] = current;
len++;
}
return new string(sb, 0, len);
}

Categories