Ok, so a friend of mine asked me to help him out with a string reverse method that can be reused without using String.Reverse (it's a homework assignment for him). Now, I did, below is the code. It works. Splendidly actually. Obviously by looking at it you can see the larger the string the longer the time it takes to work. However, my question is WHY does it work? Programming is a lot of trial and error, and I was more pseudocoding than actual coding and it worked lol.
Can someone explain to me how exactly reverse = ch + reverse; is working? I don't understand what is making it go into reverse :/
class Program
{
static void Reverse(string x)
{
string text = x;
string reverse = string.Empty;
foreach (char ch in text)
{
reverse = ch + reverse;
// this shows the building of the new string.
// Console.WriteLine(reverse);
}
Console.WriteLine(reverse);
}
static void Main(string[] args)
{
string comingin;
Console.WriteLine("Write something");
comingin = Console.ReadLine();
Reverse(comingin);
// pause
Console.ReadLine();
}
}
If the string passed through is "hello", the loop will be doing this:
reverse = 'h' + string.Empty
reverse = 'e' + 'h'
reverse = 'l' + 'eh'
until it's equal to
olleh
If your string is My String, then:
Pass 1, reverse = 'M'
Pass 2, reverse = 'yM'
Pass 3, reverse = ' yM'
You're taking each char and saying "that character and tack on what I had before after it".
I think your question has been answered. My reply goes beyond the immediate question and more to the spirit of the exercise. I remember having this task many decades ago in college, when memory and mainframe (yikes!) processing time was at a premium. Our task was to reverse an array or string, which is an array of characters, without creating a 2nd array or string. The spirit of the exercise was to teach one to be mindful of available resources.
In .NET, a string is an immutable object, so I must use a 2nd string. I wrote up 3 more examples to demonstrate different techniques that may be faster than your method, but which shouldn't be used to replace the built-in .NET Replace method. I'm partial to the last one.
// StringBuilder inserting at 0 index
public static string Reverse2(string inputString)
{
var result = new StringBuilder();
foreach (char ch in inputString)
{
result.Insert(0, ch);
}
return result.ToString();
}
// Process inputString backwards and append with StringBuilder
public static string Reverse3(string inputString)
{
var result = new StringBuilder();
for (int i = inputString.Length - 1; i >= 0; i--)
{
result.Append(inputString[i]);
}
return result.ToString();
}
// Convert string to array and swap pertinent items
public static string Reverse4(string inputString)
{
var chars = inputString.ToCharArray();
for (int i = 0; i < (chars.Length/2); i++)
{
var temp = chars[i];
chars[i] = chars[chars.Length - 1 - i];
chars[chars.Length - 1 - i] = temp;
}
return new string(chars);
}
Please imagine that you entrance string is "abc". After that you can see that letters are taken one by one and add to the start of the new string:
reverse = "", ch='a' ==> reverse (ch+reverse) = "a"
reverse= "a", ch='b' ==> reverse (ch+reverse) = b+a = "ba"
reverse= "ba", ch='c' ==> reverse (ch+reverse) = c+ba = "cba"
To test the suggestion by Romoku of using StringBuilder I have produced the following code.
public static void Reverse(string x)
{
string text = x;
string reverse = string.Empty;
foreach (char ch in text)
{
reverse = ch + reverse;
}
Console.WriteLine(reverse);
}
public static void ReverseFast(string x)
{
string text = x;
StringBuilder reverse = new StringBuilder();
for (int i = text.Length - 1; i >= 0; i--)
{
reverse.Append(text[i]);
}
Console.WriteLine(reverse);
}
public static void Main(string[] args)
{
int abcx = 100; // amount of abc's
string abc = "";
for (int i = 0; i < abcx; i++)
abc += "abcdefghijklmnopqrstuvwxyz";
var x = new System.Diagnostics.Stopwatch();
x.Start();
Reverse(abc);
x.Stop();
string ReverseMethod = "Reverse Method: " + x.ElapsedMilliseconds.ToString();
x.Restart();
ReverseFast(abc);
x.Stop();
Console.Clear();
Console.WriteLine("Method | Milliseconds");
Console.WriteLine(ReverseMethod);
Console.WriteLine("ReverseFast Method: " + x.ElapsedMilliseconds.ToString());
System.Console.Read();
}
On my computer these are the speeds I get per amount of alphabet(s).
100 ABC(s)
Reverse ~5-10ms
FastReverse ~5-15ms
1000 ABC(s)
Reverse ~120ms
FastReverse ~20ms
10000 ABC(s)
Reverse ~16,852ms!!!
FastReverse ~262ms
These time results will vary greatly depending on the computer but one thing is for certain if you are processing more than 100k characters you are insane for not using StringBuilder! On the other hand if you are processing less than 2000 characters the overhead from the StringBuilder definitely seems to catch up with its performance boost.
Related
I need to reverse a string using a for loop but I just can't do it. What I'm doing wrong?
class Program
{
static void Main(string[] args)
{
string s;
string sReverse;
Console.WriteLine("Say any word please: ");
s = Console.ReadLine();
for (int i = 0; i < s.Length; i++)
{
s[s.Length - i] = sReversa[i];
}
}
}
Strings are immutable. You can't change them character by character.
If you want random access to the characters in a string, convert it to a char array first (using ToCharArray), then convert it back when you're done (String has a constructor that accepts a char array).
string a = "1234";
char[] b = a.ToCharArray();
b[1] = 'X';
a = new string(b);
Console.WriteLine("{0}", a);
Output:
1X34
There are much easier ways to reverse a string (e.g. LINQ would let you do var output = new string(input.Reverse().ToArray());), but if you want to use a for loop and your current approach, this is probably the piece of information you are missing.
These will be the minimal changes required in your code:
public static void Main()
{
string s;
string sReverse = string.Empty; // Initialise (always recommended)
Console.WriteLine("Say any word please: ");
s = Console.ReadLine();
for (int i = s.Length-1; i >=0 ; i--) // Chnage the order of indexing
{
sReverse += s[i]; // makes a new string everytime since strings are immutable.
}
Console.WriteLine(sReverse); // print your new string
}
As others said this may not be the best approach for very large number of string manipulation / formation. Instead, use StringBuilder.
StringBuilder is suited in situations where you need to perform repeated modifications to a string, the overhead associated with
creating a new String object can be costly.
Below is the snippet on how to use StringBuilder:
using System;
using System.Text; // Add this for StringBuilder
public class Program
{
public static void Main()
{
string s;
StringBuilder sReverse = new StringBuilder(); // Instantiate the object.
Console.WriteLine("Say any word please: ");
s = Console.ReadLine();
for (int i = s.Length-1; i >=0 ; i--)
{
sReverse.Append(s[i]); // Keep Appending to original string.
}
Console.WriteLine(sReverse.ToString()); // finally convert to printable string.
}
}
string reverse = string.Join("", "some word".Reverse());
You can not use s[s.Length - i] = sReversa[i];
Because Strings are immutable.
Instead, you can use sReverse = sReverse + s[Length];
The following code will give you reverse of a string:
static void Main(string[] args)
{
string s;
string sReverse = "";
int Length;
Console.WriteLine("Say any word please: ");
s = Console.ReadLine();
Length = s.Length - 1;
for (int i = Length; i >= 0; i--)
{
sReverse = sReverse + s[Length];
Length--;
}
Console.WriteLine("{0}", sReverse);
Console.ReadLine();
}
Note: For a large number of iteration, this might be a performance issue.
The simplest way to make your code work with minimal changes is to use a StringBuilder. Unlike string a StringBuilder is mutable.
Here's the code:
Console.WriteLine("Say any word please: ");
string s = Console.ReadLine();
StringBuilder sb = new StringBuilder(s);
for (int i = 0; i < s.Length; i++)
{
sb[s.Length - i - 1] = s[i];
}
string r = sb.ToString();
If you must use a for loop the way you want, you will have to allocate a new char array, initialize it with the string's characters from back to front, then take advantage of the string constructor that takes a char array as input. The code looks as follows:
public static string ReverseWithFor(string s)
{
if (string.IsNullOrEmpty(s)) return s;
char[] a = new char[s.Length];
for (int i = 0; i < s.Length; i++)
a[s.Length - 1 - i] = s[i];
return new string(a);
}
I am doing this in this way but its remove string previous characters, its out put is (Magic,Agic,Gic,Ic,C) but I want the whole string to be concate before and after.
public string[] Transform(string st)
{
string[] arr = new string[st.Length];
string[] arr1 = new string[st.Length];
for (int x = 0; x < st.Length; x++)
{
arr1[x] = char.ToLower(st[x]) + "".ToString();
}
for (int i = 0; i < st.Length; i++)
{
string st1 = "";
{
st1 = char.ToUpper(st[i]) + st.Substring(i + 1);
}
arr[i] = st1;
}
return arr;
}
You can do this with a single loop:
public static string[] Transform(string str)
{
var strs = new List<string>();
var sb = new StringBuilder();
for (int i = 0; i < str.Length; i++)
{
sb.Clear();
sb.Append(str);
sb[i] = char.ToUpper(str[i]);
strs.Add(sb.ToString());
}
return strs.ToArray();
}
What this does is adds the str to a StringBuilder and then modifies the indexed character with the upper case version of that character. For example, the input abcde will give:
Abcde
aBcde
abCde
abcDe
abcdE
Try it out on DotNetFiddle
If you wanted to get really fancy I'm sure there is some convoluted LINQ that can do the same, but this gives you a basic framework for how it can work.
You forgot to add left part of the string. Try to do like this:
st1 = st.ToLower().Substring + char.ToUpper(st[i]) + st.Substring(i + 1);
Here. This is twice as fast as the method that uses a string builder and a List
public static string[] Transform(string str)
{
var strs = new string [str.Length];
var sb = str.ToCharArray();
char oldCh;
for (int i = 0; i < str.Length; i++)
{
oldCh = sb[i];
sb[i] = char.ToUpper(sb[i]);
strs[i] = new string (sb);
sb[i] = oldCh;
}
return strs;
}
There's no need to clear and keep reading the string to the string builder. We also know the size of the array so that can be allocated at the start.
I wrote an answer for your questions (it's second code snippet), you can modify it for your needs, like changing the return type to string[], or use ToArray() extension method if you wanna stick with it. I think it's more readable this way.
I decided to put a the end little profiler to check CPU usage and memory compared to #Ron Beyer answer.
Here is my first attempt:
public static void Main()
{
var result = Transform("abcde");
result.ToList().ForEach(WriteLine);
}
public static IEnumerable<string> Transform(string str)
{
foreach (var w in str)
{
var split = str.Split(w);
yield return split[0] + char.ToUpper(w) + split[1];
}
}
Result:
Abcde
aBcde
abCde
abcDe
abcdE
Code fiddle https://dotnetfiddle.net/gnsAGX
There is one huge drawback of that code above, it works only if the passed word has unique letters. Therefore "aaaaa" won't produce proper result.
Here is my second successful attempt that seems works with any string input. I used one instance of StringBuilder to decrease the number of objects that would need to be created and manage on one instance, instead of so much copying objects so it's more optimized.
public static void Main()
{
var result = Transform("aaaaa");
result.ToList().ForEach(WriteLine);
}
public static IEnumerable<string> Transform(string str)
{
var result = new StringBuilder(str.ToLower());
for( int i = 0; i < str.Length; i++)
{
result[i] = char.ToUpper(str[i]);
yield return result.ToString();
result[i] = char.ToLower(str[i]);
}
}
Result:
Aaaaa
aAaaa
aaAaa
aaaAa
aaaaA
Code fiddle: https://dotnetfiddle.net/tzhXtP
Measuring execute time and memory uses.
I will use dotnetfiddle.net status panel, to make it easier.
Fiddle has few limitations like time execution of code 10 sec and used memory
besides differences are very significant.
I tested programs with 14 000 repetitions, my code additionally changes the output to array[].
My answer (https://dotnetfiddle.net/1fLVw9)
Last Run: 12:23:09 pm
Compile: 0.046s
Execute: 7.563s
Memory: 16.22Gb
CPU: 7.609s
Compared answer (https://dotnetfiddle.net/Zc88F2)
Compile: 0.031s
Execute: 9.953s
Memory: 16.22Gb
CPU: 9.938s
It slightly reduces the execution time.
Hope this helps!
public static string[] Transform(string str)
{
var strs = new string [str.Length];
var sb = str.ToCharArray();
char oldCh;
for (int i = 0; i < str.Length; i++)
{
oldCh = sb[i];
sb[i] = char.ToUpper(sb[i]);
strs[i] = new string (sb);
sb[i] = oldCh;
}
return strs;
}
Let's say I have a string like this one, left part is a word, right part is a collection of indices (single or range) used to reference furigana (phonetics) for kanjis in my word:
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす"
The pattern in detail:
word,<startIndex>(-<endIndex>):<furigana>
What would be the best way to achieve something like this (with a space in front of the kanji to mark which part is linked to the [furigana]):
子[こ]で 子[こ]にならぬ 時鳥[ほととぎす]
Edit: (thanks for your comments guys)
Here is what I wrote so far:
static void Main(string[] args)
{
string myString = "ABCDEF,1:test;3:test2";
//Split Kanjis / Indices
string[] tokens = myString.Split(',');
//Extract furigana indices
string[] indices = tokens[1].Split(';');
//Dictionnary to store furigana indices
Dictionary<string, string> furiganaIndices = new Dictionary<string, string>();
//Collect
foreach (string index in indices)
{
string[] splitIndex = index.Split(':');
furiganaIndices.Add(splitIndex[0], splitIndex[1]);
}
//Processing
string result = tokens[0] + ",";
for (int i = 0; i < tokens[0].Length; i++)
{
string currentIndex = i.ToString();
if (furiganaIndices.ContainsKey(currentIndex)) //add [furigana]
{
string currentFurigana = furiganaIndices[currentIndex].ToString();
result = result + " " + tokens[0].ElementAt(i) + string.Format("[{0}]", currentFurigana);
}
else //nothing to add
{
result = result + tokens[0].ElementAt(i);
}
}
File.AppendAllText(#"D:\test.txt", result + Environment.NewLine);
}
Result:
ABCDEF,A B[test]C D[test2]EF
I struggle to find a way to process ranged indices:
string myString = "ABCDEF,1:test;2-3:test2";
Result : ABCDEF,A B[test] CD[test2]EF
I don't have anything against manually manipulating strings per se. But given that you seem to have a regular pattern describing the inputs, it seems to me that a solution that uses regex would be more maintainable and readable. So with that in mind, here's an example program that takes that approach:
class Program
{
private const string _kinvalidFormatException = "Invalid format for edit specification";
private static readonly Regex
regex1 = new Regex(#"(?<word>[^,]+),(?<edit>(?:\d+)(?:-(?:\d+))?:(?:[^;]+);?)+", RegexOptions.Compiled),
regex2 = new Regex(#"(?<start>\d+)(?:-(?<end>\d+))?:(?<furigana>[^;]+);?", RegexOptions.Compiled);
static void Main(string[] args)
{
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす";
string result = EditString(myString);
}
private static string EditString(string myString)
{
Match editsMatch = regex1.Match(myString);
if (!editsMatch.Success)
{
throw new ArgumentException(_kinvalidFormatException);
}
int ichCur = 0;
string input = editsMatch.Groups["word"].Value;
StringBuilder text = new StringBuilder();
foreach (Capture capture in editsMatch.Groups["edit"].Captures)
{
Match oneEditMatch = regex2.Match(capture.Value);
if (!oneEditMatch.Success)
{
throw new ArgumentException(_kinvalidFormatException);
}
int start, end;
if (!int.TryParse(oneEditMatch.Groups["start"].Value, out start))
{
throw new ArgumentException(_kinvalidFormatException);
}
Group endGroup = oneEditMatch.Groups["end"];
if (endGroup.Success)
{
if (!int.TryParse(endGroup.Value, out end))
{
throw new ArgumentException(_kinvalidFormatException);
}
}
else
{
end = start;
}
text.Append(input.Substring(ichCur, start - ichCur));
if (text.Length > 0)
{
text.Append(' ');
}
ichCur = end + 1;
text.Append(input.Substring(start, ichCur - start));
text.Append(string.Format("[{0}]", oneEditMatch.Groups["furigana"]));
}
if (ichCur < input.Length)
{
text.Append(input.Substring(ichCur));
}
return text.ToString();
}
}
Notes:
This implementation assumes that the edit specifications will be listed in order and won't overlap. It makes no attempt to validate that part of the input; depending on where you are getting your input from you may want to add that. If it's valid for the specifications to be listed out of order, you can also extend the above to first store the edits in a list and sort the list by the start index before actually editing the string. (In similar fashion to the way the other proposed answer works; though, why they are using a dictionary instead of a simple list to store the individual edits, I have no idea…that seems arbitrarily complicated to me.)
I included basic input validation, throwing exceptions where failures occur in the pattern matching. A more user-friendly implementation would add more specific information to each exception, describing what part of the input actually was invalid.
The Regex class actually has a Replace() method, which allows for complete customization. The above could have been implemented that way, using Replace() and a MatchEvaluator to provide the replacement text, instead of just appending text to a StringBuilder. Which way to do it is mostly a matter of preference, though the MatchEvaluator might be preferred if you have a need for more flexible implementation options (i.e. if the exact format of the result can vary).
If you do choose to use the other proposed answer, I strongly recommend you use StringBuilder instead of simply concatenating onto the results variable. For short strings it won't matter much, but you should get into the habit of always using StringBuilder when you have a loop that is incrementally adding onto a string value, because for long string the performance implications of using concatenation can be very negative.
This should do it (and even handle ranged indices), based on the formatting of the input string you have-
using System;
using System.Collections.Generic;
public class stringParser
{
private struct IndexElements
{
public int start;
public int end;
public string value;
}
public static void Main()
{
//input string
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす";
int wordIndexSplit = myString.IndexOf(',');
string word = myString.Substring(0,wordIndexSplit);
string indices = myString.Substring(wordIndexSplit + 1);
string[] eachIndex = indices.Split(';');
Dictionary<int,IndexElements> index = new Dictionary<int,IndexElements>();
string[] elements;
IndexElements e;
int dash;
int n = 0;
int last = -1;
string results = "";
foreach (string s in eachIndex)
{
e = new IndexElements();
elements = s.Split(':');
if (elements[0].Contains("-"))
{
dash = elements[0].IndexOf('-');
e.start = int.Parse(elements[0].Substring(0,dash));
e.end = int.Parse(elements[0].Substring(dash + 1));
}
else
{
e.start = int.Parse(elements[0]);
e.end = e.start;
}
e.value = elements[1];
index.Add(n,e);
n++;
}
//this is the part that takes the "setup" from the parts above and forms the result string
//loop through each of the "indices" parsed above
for (int i = 0; i < index.Count; i++)
{
//if this is the first iteration through the loop, and the first "index" does not start
//at position 0, add the beginning characters before its start
if (last == -1 && index[i].start > 0)
{
results += word.Substring(0,index[i].start);
}
//if this is not the first iteration through the loop, and the previous iteration did
//not stop at the position directly before the start of the current iteration, add
//the intermediary chracters
else if (last != -1 && last + 1 != index[i].start)
{
results += word.Substring(last + 1,index[i].start - (last + 1));
}
//add the space before the "index" match, the actual match, and then the formatted "index"
results += " " + word.Substring(index[i].start,(index[i].end - index[i].start) + 1)
+ "[" + index[i].value + "]";
//remember the position of the ending for the next iteration
last = index[i].end;
}
//if the last "index" did not stop at the end of the input string, add the remaining characters
if (index[index.Keys.Count - 1].end + 1 < word.Length)
{
results += word.Substring(index[index.Keys.Count-1].end + 1);
}
//trimming spaces that may be left behind
results = results.Trim();
Console.WriteLine("INPUT - " + myString);
Console.WriteLine("OUTPUT - " + results);
Console.Read();
}
}
input - 子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす
output - 子[こ]で 子[こ]にならぬ 時鳥[ほととぎす]
Note that this should also work with characters the English alphabet if you wanted to use English instead-
input - iliketocodeverymuch,2:A;4-6:B;9-12:CDEFG
output - il i[A]k eto[B]co deve[CDEFG]rymuch
I am using this method to clean a string:
public static string CleanString(string dirtyString)
{
string removeChars = " ?&^$##!()+-,:;<>’\'-_*";
string result = dirtyString;
foreach (char c in removeChars)
{
result = result.Replace(c.ToString(), string.Empty);
}
return result;
}
This method gives the correct result. However, there is a performance glitch in this method. Every time I pass the string, every character goes into the loop. If I have a large string then it will take too much time to return the object.
Is there a better way of doing the same thing? Maybe using LINQ or jQuery/JavaScript?
Any suggestions would be appreciated.
OK, consider the following test:
public class CleanString
{
//by MSDN http://msdn.microsoft.com/en-us/library/844skk0h(v=vs.71).aspx
public static string UseRegex(string strIn)
{
// Replace invalid characters with empty strings.
return Regex.Replace(strIn, #"[^\w\.#-]", "");
}
// by Paolo Tedesco
public static String UseStringBuilder(string strIn)
{
const string removeChars = " ?&^$##!()+-,:;<>’\'-_*";
// specify capacity of StringBuilder to avoid resizing
StringBuilder sb = new StringBuilder(strIn.Length);
foreach (char x in strIn.Where(c => !removeChars.Contains(c)))
{
sb.Append(x);
}
return sb.ToString();
}
// by Paolo Tedesco, but using a HashSet
public static String UseStringBuilderWithHashSet(string strIn)
{
var hashSet = new HashSet<char>(" ?&^$##!()+-,:;<>’\'-_*");
// specify capacity of StringBuilder to avoid resizing
StringBuilder sb = new StringBuilder(strIn.Length);
foreach (char x in strIn.Where(c => !hashSet.Contains(c)))
{
sb.Append(x);
}
return sb.ToString();
}
// by SteveDog
public static string UseStringBuilderWithHashSet2(string dirtyString)
{
HashSet<char> removeChars = new HashSet<char>(" ?&^$##!()+-,:;<>’\'-_*");
StringBuilder result = new StringBuilder(dirtyString.Length);
foreach (char c in dirtyString)
if (removeChars.Contains(c))
result.Append(c);
return result.ToString();
}
// original by patel.milanb
public static string UseReplace(string dirtyString)
{
string removeChars = " ?&^$##!()+-,:;<>’\'-_*";
string result = dirtyString;
foreach (char c in removeChars)
{
result = result.Replace(c.ToString(), string.Empty);
}
return result;
}
// by L.B
public static string UseWhere(string dirtyString)
{
return new String(dirtyString.Where(Char.IsLetterOrDigit).ToArray());
}
}
static class Program
{
/// <summary>
/// The main entry point for the application.
/// </summary>
[STAThread]
static void Main()
{
var dirtyString = "sdfdf.dsf8908()=(=(sadfJJLef#ssyd€sdöf////fj()=/§(§&/(\"&sdfdf.dsf8908()=(=(sadfJJLef#ssyd€sdöf////fj()=/§(§&/(\"&sdfdf.dsf8908()=(=(sadfJJLef#ssyd€sdöf";
var sw = new Stopwatch();
var iterations = 50000;
sw.Start();
for (var i = 0; i < iterations; i++)
CleanString.<SomeMethod>(dirtyString);
sw.Stop();
Debug.WriteLine("CleanString.<SomeMethod>: " + sw.ElapsedMilliseconds.ToString());
sw.Reset();
....
<repeat>
....
}
}
Output
CleanString.UseReplace: 791
CleanString.UseStringBuilder: 2805
CleanString.UseStringBuilderWithHashSet: 521
CleanString.UseStringBuilderWithHashSet2: 331
CleanString.UseRegex: 1700
CleanString.UseWhere: 233
Conclusion
It probably does not matter which method you use.
The difference in time between the fastest (UseWhere: 233ms) and the slowest (UseStringBuilder: 2805ms) method is 2572ms when called 50000 (!) times in a row. If you don't run the method that often, the difference does not really matter.
But if performance is critical, use the UseWhere method (written by L.B). Note, however, that its behavior is slightly different.
If it's purely speed and efficiency you are after, I would recommend doing something like this:
public static string CleanString(string dirtyString)
{
HashSet<char> removeChars = new HashSet<char>(" ?&^$##!()+-,:;<>’\'-_*");
StringBuilder result = new StringBuilder(dirtyString.Length);
foreach (char c in dirtyString)
if (!removeChars.Contains(c)) // prevent dirty chars
result.Append(c);
return result.ToString();
}
RegEx is certainly an elegant solution, but it adds extra overhead. By specifying the starting length of the string builder, it will only need to allocate the memory once (and a second time for the ToString at the end). This will cut down on memory usage and increase the speed, especially on longer strings.
However, as L.B. said, if you are using this to properly encode text that is bound for HTML output, you should be using HttpUtility.HtmlEncode instead of doing it yourself.
use regex [?&^$##!()+-,:;<>’\'-_*] for replacing with empty string
I don't know if, performance-wise, using a Regex or LINQ would be an improvement.
Something that could be useful, would be to create the new string with a StringBuilder instead of using string.Replace each time:
using System.Linq;
using System.Text;
static class Program {
static void Main(string[] args) {
const string removeChars = " ?&^$##!()+-,:;<>’\'-_*";
string result = "x&y(z)";
// specify capacity of StringBuilder to avoid resizing
StringBuilder sb = new StringBuilder(result.Length);
foreach (char x in result.Where(c => !removeChars.Contains(c))) {
sb.Append(x);
}
result = sb.ToString();
}
}
This one is even faster!
use:
string dirty=#"tfgtf$#$%gttg%$% 664%$";
string clean = dirty.Clean();
public static string Clean(this String name)
{
var namearray = new Char[name.Length];
var newIndex = 0;
for (var index = 0; index < namearray.Length; index++)
{
var letter = (Int32)name[index];
if (!((letter > 96 && letter < 123) || (letter > 64 && letter < 91) || (letter > 47 && letter < 58)))
continue;
namearray[newIndex] = (Char)letter;
++newIndex;
}
return new String(namearray).TrimEnd();
}
Give this a try: http://msdn.microsoft.com/en-us/library/xwewhkd1.aspx
Perhaps it helps to first explain the 'why' and then the 'what'. The reason you're getting slow performance is because c# copies-and-replaces the strings for each replacement. From my experience using Regex in .NET isn't always better - although in most scenario's (I think including this one) it'll probably work just fine.
If I really need performance I usually don't leave it up to luck and just tell the compiler exactly what I want: that is: create a string with the upper bound number of characters and copy all the chars in there that you need. It's also possible to replace the hashset with a switch / case or array in which case you might end up with a jump table or array lookup - which is even faster.
The 'pragmatic' best, but fast solution is:
char[] data = new char[dirtyString.Length];
int ptr = 0;
HashSet<char> hs = new HashSet<char>() { /* all your excluded chars go here */ };
foreach (char c in dirtyString)
if (!hs.Contains(c))
data[ptr++] = c;
return new string(data, 0, ptr);
BTW: this solution is incorrect when you want to process high surrogate Unicode characters - but can easily be adapted to include these characters.
-Stefan.
I use this in my current project and it works fine. It takes a sentence, it removes all the non alphanumerical characters, it then returns the sentence with all the words in the first letter upper case and everything else in lower case. Maybe I should call it SentenceNormalizer. Naming is hard :)
internal static string StringSanitizer(string whateverString)
{
whateverString = whateverString.Trim().ToLower();
Regex cleaner = new Regex("(?:[^a-zA-Z0-9 ])", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);
var listOfWords = (cleaner.Replace(whateverString, string.Empty).Split(' ', StringSplitOptions.RemoveEmptyEntries)).ToList();
string cleanString = string.Empty;
foreach (string word in listOfWords)
{
cleanString += $"{word.First().ToString().ToUpper() + word.Substring(1)} ";
}
return cleanString;
}
I am not able to spend time on acid testing this but this line did not actually clean slashes as desired.
HashSet<char> removeChars = new HashSet<char>(" ?&^$##!()+-,:;<>’\'-_*");
I had to add slashes individually and escape the backslash
HashSet<char> removeChars = new HashSet<char>(" ?&^$##!()+-,:;<>’'-_*");
removeChars.Add('/');
removeChars.Add('\\');
What is the best way to convert from Pascal Case (upper Camel Case) to a sentence.
For example starting with
"AwaitingFeedback"
and converting that to
"Awaiting feedback"
C# preferable but I could convert it from Java or similar.
public static string ToSentenceCase(this string str)
{
return Regex.Replace(str, "[a-z][A-Z]", m => m.Value[0] + " " + char.ToLower(m.Value[1]));
}
In versions of visual studio after 2015, you can do
public static string ToSentenceCase(this string str)
{
return Regex.Replace(str, "[a-z][A-Z]", m => $"{m.Value[0]} {char.ToLower(m.Value[1])}");
}
Based on: Converting Pascal case to sentences using regular expression
I will prefer to use Humanizer for this. Humanizer is a Portable Class Library that meets all your .NET needs for manipulating and displaying strings, enums, dates, times, timespans, numbers and quantities.
Short Answer
"AwaitingFeedback".Humanize() => Awaiting feedback
Long and Descriptive Answer
Humanizer can do a lot more work other examples are:
"PascalCaseInputStringIsTurnedIntoSentence".Humanize() => "Pascal case input string is turned into sentence"
"Underscored_input_string_is_turned_into_sentence".Humanize() => "Underscored input string is turned into sentence"
"Can_return_title_Case".Humanize(LetterCasing.Title) => "Can Return Title Case"
"CanReturnLowerCase".Humanize(LetterCasing.LowerCase) => "can return lower case"
Complete code is :
using Humanizer;
using static System.Console;
namespace HumanizerConsoleApp
{
class Program
{
static void Main(string[] args)
{
WriteLine("AwaitingFeedback".Humanize());
WriteLine("PascalCaseInputStringIsTurnedIntoSentence".Humanize());
WriteLine("Underscored_input_string_is_turned_into_sentence".Humanize());
WriteLine("Can_return_title_Case".Humanize(LetterCasing.Title));
WriteLine("CanReturnLowerCase".Humanize(LetterCasing.LowerCase));
}
}
}
Output
Awaiting feedback
Pascal case input string is turned into sentence
Underscored input string is turned into sentence Can Return Title Case
can return lower case
If you prefer to write your own C# code you can achieve this by writing some C# code stuff as answered by others already.
Here you go...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace CamelCaseToString
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine(CamelCaseToString("ThisIsYourMasterCallingYou"));
}
private static string CamelCaseToString(string str)
{
if (str == null || str.Length == 0)
return null;
StringBuilder retVal = new StringBuilder(32);
retVal.Append(char.ToUpper(str[0]));
for (int i = 1; i < str.Length; i++ )
{
if (char.IsLower(str[i]))
{
retVal.Append(str[i]);
}
else
{
retVal.Append(" ");
retVal.Append(char.ToLower(str[i]));
}
}
return retVal.ToString();
}
}
}
This works for me:
Regex.Replace(strIn, "([A-Z]{1,2}|[0-9]+)", " $1").TrimStart()
This is just like #SSTA, but is more efficient than calling TrimStart.
Regex.Replace("ThisIsMyCapsDelimitedString", "(\\B[A-Z])", " $1")
Found this in the MvcContrib source, doesn't seem to be mentioned here yet.
return Regex.Replace(input, "([A-Z])", " $1", RegexOptions.Compiled).Trim();
Just because everyone has been using Regex (except this guy), here's an implementation with StringBuilder that was about 5x faster in my tests. Includes checking for numbers too.
"SomeBunchOfCamelCase2".FromCamelCaseToSentence == "Some Bunch Of Camel Case 2"
public static string FromCamelCaseToSentence(this string input) {
if(string.IsNullOrEmpty(input)) return input;
var sb = new StringBuilder();
// start with the first character -- consistent camelcase and pascal case
sb.Append(char.ToUpper(input[0]));
// march through the rest of it
for(var i = 1; i < input.Length; i++) {
// any time we hit an uppercase OR number, it's a new word
if(char.IsUpper(input[i]) || char.IsDigit(input[i])) sb.Append(' ');
// add regularly
sb.Append(input[i]);
}
return sb.ToString();
}
Here's a basic way of doing it that I came up with using Regex
public static string CamelCaseToSentence(this string value)
{
var sb = new StringBuilder();
var firstWord = true;
foreach (var match in Regex.Matches(value, "([A-Z][a-z]+)|[0-9]+"))
{
if (firstWord)
{
sb.Append(match.ToString());
firstWord = false;
}
else
{
sb.Append(" ");
sb.Append(match.ToString().ToLower());
}
}
return sb.ToString();
}
It will also split off numbers which I didn't specify but would be useful.
string camel = "MyCamelCaseString";
string s = Regex.Replace(camel, "([A-Z])", " $1").ToLower().Trim();
Console.WriteLine(s.Substring(0,1).ToUpper() + s.Substring(1));
Edit: didn't notice your casing requirements, modifed accordingly. You could use a matchevaluator to do the casing, but I think a substring is easier. You could also wrap it in a 2nd regex replace where you change the first character
"^\w"
to upper
\U (i think)
I'd use a regex, inserting a space before each upper case character, then lowering all the string.
string spacedString = System.Text.RegularExpressions.Regex.Replace(yourString, "\B([A-Z])", " \k");
spacedString = spacedString.ToLower();
It is easy to do in JavaScript (or PHP, etc.) where you can define a function in the replace call:
var camel = "AwaitingFeedbackDearMaster";
var sentence = camel.replace(/([A-Z].)/g, function (c) { return ' ' + c.toLowerCase(); });
alert(sentence);
Although I haven't solved the initial cap problem... :-)
Now, for the Java solution:
String ToSentence(String camel)
{
if (camel == null) return ""; // Or null...
String[] words = camel.split("(?=[A-Z])");
if (words == null) return "";
if (words.length == 1) return words[0];
StringBuilder sentence = new StringBuilder(camel.length());
if (words[0].length() > 0) // Just in case of camelCase instead of CamelCase
{
sentence.append(words[0] + " " + words[1].toLowerCase());
}
else
{
sentence.append(words[1]);
}
for (int i = 2; i < words.length; i++)
{
sentence.append(" " + words[i].toLowerCase());
}
return sentence.toString();
}
System.out.println(ToSentence("AwaitingAFeedbackDearMaster"));
System.out.println(ToSentence(null));
System.out.println(ToSentence(""));
System.out.println(ToSentence("A"));
System.out.println(ToSentence("Aaagh!"));
System.out.println(ToSentence("stackoverflow"));
System.out.println(ToSentence("disableGPS"));
System.out.println(ToSentence("Ahh89Boo"));
System.out.println(ToSentence("ABC"));
Note the trick to split the sentence without loosing any character...
Pseudo-code:
NewString = "";
Loop through every char of the string (skip the first one)
If char is upper-case ('A'-'Z')
NewString = NewString + ' ' + lowercase(char)
Else
NewString = NewString + char
Better ways can perhaps be done by using regex or by string replacement routines (replace 'X' with ' x')
An xquery solution that works for both UpperCamel and lowerCamel case:
To output sentence case (only the first character of the first word is capitalized):
declare function content:sentenceCase($string)
{
let $firstCharacter := substring($string, 1, 1)
let $remainingCharacters := substring-after($string, $firstCharacter)
return
concat(upper-case($firstCharacter),lower-case(replace($remainingCharacters, '([A-Z])', ' $1')))
};
To output title case (first character of each word capitalized):
declare function content:titleCase($string)
{
let $firstCharacter := substring($string, 1, 1)
let $remainingCharacters := substring-after($string, $firstCharacter)
return
concat(upper-case($firstCharacter),replace($remainingCharacters, '([A-Z])', ' $1'))
};
Found myself doing something similar, and I appreciate having a point-of-departure with this discussion. This is my solution, placed as an extension method to the string class in the context of a console application.
using System;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string piratese = "avastTharMatey";
string ivyese = "CheerioPipPip";
Console.WriteLine("{0}\n{1}\n", piratese.CamelCaseToString(), ivyese.CamelCaseToString());
Console.WriteLine("For Pete\'s sake, man, hit ENTER!");
string strExit = Console.ReadLine();
}
}
public static class StringExtension
{
public static string CamelCaseToString(this string str)
{
StringBuilder retVal = new StringBuilder(32);
if (!string.IsNullOrEmpty(str))
{
string strTrimmed = str.Trim();
if (!string.IsNullOrEmpty(strTrimmed))
{
retVal.Append(char.ToUpper(strTrimmed[0]));
if (strTrimmed.Length > 1)
{
for (int i = 1; i < strTrimmed.Length; i++)
{
if (char.IsUpper(strTrimmed[i])) retVal.Append(" ");
retVal.Append(char.ToLower(strTrimmed[i]));
}
}
}
}
return retVal.ToString();
}
}
}
Most of the preceding answers split acronyms and numbers, adding a space in front of each character. I wanted acronyms and numbers to be kept together so I have a simple state machine that emits a space every time the input transitions from one state to the other.
/// <summary>
/// Add a space before any capitalized letter (but not for a run of capitals or numbers)
/// </summary>
internal static string FromCamelCaseToSentence(string input)
{
if (string.IsNullOrEmpty(input)) return String.Empty;
var sb = new StringBuilder();
bool upper = true;
for (var i = 0; i < input.Length; i++)
{
bool isUpperOrDigit = char.IsUpper(input[i]) || char.IsDigit(input[i]);
// any time we transition to upper or digits, it's a new word
if (!upper && isUpperOrDigit)
{
sb.Append(' ');
}
sb.Append(input[i]);
upper = isUpperOrDigit;
}
return sb.ToString();
}
And here's some tests:
[TestCase(null, ExpectedResult = "")]
[TestCase("", ExpectedResult = "")]
[TestCase("ABC", ExpectedResult = "ABC")]
[TestCase("abc", ExpectedResult = "abc")]
[TestCase("camelCase", ExpectedResult = "camel Case")]
[TestCase("PascalCase", ExpectedResult = "Pascal Case")]
[TestCase("Pascal123", ExpectedResult = "Pascal 123")]
[TestCase("CustomerID", ExpectedResult = "Customer ID")]
[TestCase("CustomABC123", ExpectedResult = "Custom ABC123")]
public string CanSplitCamelCase(string input)
{
return FromCamelCaseToSentence(input);
}
Mostly already answered here
Small chage to the accepted answer, to convert the second and subsequent Capitalised letters to lower case, so change
if (char.IsUpper(text[i]))
newText.Append(' ');
newText.Append(text[i]);
to
if (char.IsUpper(text[i]))
{
newText.Append(' ');
newText.Append(char.ToLower(text[i]));
}
else
newText.Append(text[i]);
Here is my implementation. This is the fastest that I got while avoiding creating spaces for abbreviations.
public static string PascalCaseToSentence(string input)
{
if (string.IsNullOrEmpty(input) || input.Length < 2)
return input;
var sb = new char[input.Length + ((input.Length + 1) / 2)];
var len = 0;
var lastIsLower = false;
for (int i = 0; i < input.Length; i++)
{
var current = input[i];
if (current < 97)
{
if (lastIsLower)
{
sb[len] = ' ';
len++;
}
lastIsLower = false;
}
else
{
lastIsLower = true;
}
sb[len] = current;
len++;
}
return new string(sb, 0, len);
}