Low complexity algorithm to remove/replace special characters [duplicate] - c#

This question already has answers here:
A faster way of doing multiple string replacements
(8 answers)
Closed 9 years ago.
I want to replace some invalid characters in the name of a file uploaded to my application.
I searched the internet and found some fairly complex algorithms for this. Here's one:
public static string RemoverAcentuacao(string palavra)
{
string palavraSemAcento = null;
string caracterComAcento = "áàãâäéèêëíìîïóòõôöúùûüçáàãâÄéèêëíìîïóòõÖôúùûÜç, ?&:/!;ºª%‘’()\"”“";
string caracterSemAcento = "aaaaaeeeeiiiiooooouuuucAAAAAEEEEIIIIOOOOOUUUUC___________________";
if (!String.IsNullOrEmpty(palavra))
{
for (int i = 0; i < palavra.Length; i++)
{
if (caracterComAcento.IndexOf(Convert.ToChar(palavra.Substring(i, 1))) >= 0)
{
int car = caracterComAcento.IndexOf(Convert.ToChar(palavra.Substring(i, 1)));
palavraSemAcento += caracterSemAcento.Substring(car, 1);
}
else
{
palavraSemAcento += palavra.Substring(i, 1);
}
}
string[] cEspeciais = { "#39", "---", "--", "'", "#", "\r\n", "\n", "\r" };
for (int q = 0; q < cEspeciais.Length; q++)
{
palavraSemAcento = palavraSemAcento.Replace(cEspeciais[q], "-");
}
for (int x = (cEspeciais.Length - 1); x > -1; x--)
{
palavraSemAcento = palavraSemAcento.Replace(cEspeciais[x], "-");
}
palavraSemAcento = palavraSemAcento.Replace("+", "-").Replace(Environment.NewLine, "").TrimStart('-').TrimEnd('-').Replace("<i>", "-").Replace("<-i>", "-").Replace("<br>", "").Replace("--", "-");
}
else
{
palavraSemAcento = "indefinido";
}
return palavraSemAcento.ToLower();
}
Is there a way to do this with a less complex algorithm?
This algorithm seems far too complicated for such a simple task, but I can't think of a different approach.

I want to replace some invalid characters in the name of a file
if this is really what you want then it is easy
string ToLegalFileName(string s)
{
var invalidChars = new HashSet<char>(Path.GetInvalidFileNameChars());
return String.Join("", s.Select(c => invalidChars.Contains(c) ? '_' : c));
}
if your intent is to replace accented chars with their ascii counterparts then
string RemoverAcentuacao(string s)
{
return String.Join("",
s.Normalize(NormalizationForm.FormD)
.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
}
and this is the 3rd version which replaces accented chars + other chars with '_'
string RemoverAcentuacao2(string s)
{
return String.Join("",
s.Normalize(NormalizationForm.FormD)
.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
.Select(c => char.IsLetterOrDigit(c) ? c : '_')
.Select(c => (int)c < 128 ? c : '_'));
}
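As a rough sketch of how the two ideas above could be combined for the upload scenario, the helper below first strips combining marks and then replaces any character that Path.GetInvalidFileNameChars() reports. The class and method names (FileNameSanitizer, RemoveDiacritics, ToSafeFileName) are made up for illustration, and the set of invalid characters depends on the platform the code runs on.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper; the names are illustrative and not taken from the answers.
public static class FileNameSanitizer
{
    // Decompose accented letters (FormD) and drop the combining marks.
    public static string RemoveDiacritics(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        // Recompose whatever is left so the result is back in FormC.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Strip accents, then replace characters the file system rejects with '_'.
    public static string ToSafeFileName(string s)
    {
        var invalid = new HashSet<char>(Path.GetInvalidFileNameChars());
        var sb = new StringBuilder(s.Length);
        foreach (char c in RemoveDiacritics(s))
            sb.Append(invalid.Contains(c) ? '_' : c);
        return sb.ToString();
    }
}
```

For example, FileNameSanitizer.ToSafeFileName("relatório/2024") should give relatorio_2024, since '/' is rejected on both Windows and Unix-like systems. Using a StringBuilder instead of String.Join is only a stylistic choice here.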

A solution using regular expressions:
string ReplaceSpecial(string input, string replace, char replacewith)
{
char[] back = input.ToCharArray();
// Regex.Matches takes the input first and the pattern second.
var matches = Regex.Matches(input, String.Format("[{0}]", replace));
foreach (Match m in matches)
back[m.Index] = replacewith;
return new string(back);
}
A somewhat simpler solution using String.Replace:
string ReplaceSpecial(string input, char[] replace, char replacewith)
{
string back = input;
foreach (char i in replace)
back = back.Replace(i, replacewith); // strings are immutable, so keep the returned value
return back;
}

static string RemoverAcentuacao(string s)
{
string caracterComAcento = "áàãâäéèêëíìîïóòõôöúùûüçáàãâÄéèêëíìîïóòõÖôúùûÜç, ?&:/!;ºª%‘’()\"”“";
string caracterSemAcento = "aaaaaeeeeiiiiooooouuuucAAAAAEEEEIIIIOOOOOUUUUC___________________";
return new String(s.Select(c =>
{
int i = caracterComAcento.IndexOf(c);
return (i == -1) ? c : caracterSemAcento[i];
}).ToArray());
}

Here is a really simple method that I've used recently.
I hope it meets your requirements. To be honest, the code is a bit difficult to read due to the language of the variable declarations.
static readonly List<char> InvalidCharacters = new List<char>() { 'a','b','c' };
static string StripInvalidCharactersFromField(string field)
{
for (int i = 0; i < field.Length; i++)
{
// Compare the char directly; the list holds chars, not strings.
if (InvalidCharacters.Contains(field[i]))
{
field = field.Remove(i, 1);
i--;
}
}
return field;
}

Related

Replace char if not between "\'" char

I have a string that represents an action.
Each argument in the action is separated by the char ';'.
For each argument I want to replace the char ',' with the char '.', but only if the ',' is not between ' chars, using Regex replace.
For example:
1- "ActionName('1,b';1,2)"
2- "ActionName('a,b';1,2;1.2;'1,3')"
Desire result:
1- "ActionName('1,b';1.2)"
2- "ActionName('a,b';1.2;1.2;'1,3')"
Conditions:
The ',' can appear multiple times inside a string.
Currently I split the string on ';', loop over all the parts, and split each part on '\''.
Example Code:
public string Transform(string expression)
{
string newExpression = string.Empty;
string[] expParts = expression.Split(';');
for (int i = 0; i < expParts.Length; i++)
{
string newSubExpression = string.Empty;
string[] subExpParts = expParts[i].Split(new char[] { '\'' });
for (int subIndex = 0; subIndex < subExpParts.Length; subIndex += 2)
{
newSubExpression += subExpParts[subIndex].Replace(',', '.');
if (subIndex < subExpParts.Length - 1)
newSubExpression += "\'" + subExpParts[subIndex + 1] + "\'";
}
newExpression += newSubExpression;
if (i < expParts.Length - 1)
newExpression = newExpression + ";"; // re-insert the ';' separator removed by Split
}
return newExpression;
}
You can use (?<=^([^']|'[^']*')*),
var myPattern= "(?<=^([^']|'[^']*')*),";
var regex = new Regex(myPattern);
var result = regex.Replace("ActionName('a,b';1,2;1.2;'1,3')", ".");
Output
ActionName('a,b';1.2;1.2;'1,3')
Since you have tagged the question as regex, here is a regex that works for your input (at least for what you posted):
(,(?![\w\d]*'))
Just an example, I think that it can be useful for you as a starting point...
You need to replace each match with a '.'. In C# you can do it like this:
result = Regex.Replace(input, @"(,(?![\w\d]*'))", ".");
Take a look at regex lookaround documentation for more information.
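Another pattern sometimes used for this kind of problem, sketched here as an alternative rather than as what either answer above proposes, is to match either a whole quoted chunk or a bare comma and let a match evaluator decide what to emit. The sketch assumes quotes are balanced and never escaped; QuoteAwareReplace and CommasToDots are illustrative names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; not part of the original answers.
public static class QuoteAwareReplace
{
    // '[^']*' consumes a complete quoted chunk; the second alternative is a bare comma.
    private static readonly Regex QuotedOrComma = new Regex(@"'[^']*'|,");

    public static string CommasToDots(string expression)
    {
        // Quoted chunks are returned unchanged; only commas outside quotes become dots.
        return QuotedOrComma.Replace(expression, m => m.Value == "," ? "." : m.Value);
    }
}
```

Because the quoted alternative is tried first at each position, a comma inside quotes is swallowed as part of the chunk and never reaches the comma branch. For the second example input, CommasToDots returns ActionName('a,b';1.2;1.2;'1,3').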
A simple FSM (Finite State Machine) will do. Please notice that we have just two states (encoded with inQuotation): we are either within a quoted chunk or not.
public static string Transform(string expression) {
if (string.IsNullOrEmpty(expression))
return expression; // Or throw ArgumentNullException
StringBuilder sb = new StringBuilder(expression.Length);
bool inQuotation = false;
foreach (char c in expression)
if (c == ',' && !inQuotation)
sb.Append('.');
else {
if (c == '\'')
inQuotation = !inQuotation;
sb.Append(c);
}
return sb.ToString();
}
Tests:
string[] tests = new string[] {
"ActionName('1,b';1,2)",
"ActionName('a,b';1,2;1.2;'1,3')",
};
var result = tests
.Select((line, index) => $"{index + 1}- {Transform(line)}");
Console.WriteLine(string.Join(Environment.NewLine, result));
Outcome:
1- ActionName('1,b';1.2)
2- ActionName('a,b';1.2;1.2;'1,3')

English to PigLatin c#

Just had this Pig Latin problem as "homework". The conditions I have been given are:
For words that begin with consonant sounds, all letters before the initial vowel are placed at the end of the word sequence. Then, ay is added.
For words that begin with vowel sounds move the initial vowel(s) along with the first consonant or consonant cluster to the end of the word and add ay.
For words that have no consonant add way.
Tested with:
Write a method that will convert an English sentence into Pig Latin
That turned into
itewray away ethodmay atthay illway onvertcay anay ishenglay entencesay ointay igpay atinlay
It does what it should with one exception which is not in the rules but I thought about it and I have no idea how I can implement it. The method I created does exactly what the problem is asking but if I try to convert an all consonants word into piglatin it does not work. For example grrrrr into piglatin should be grrrrray.
public static string ToPigLatin(string sentencetext)
{
string vowels = "AEIOUaeiou";
//string cons = "bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ";
List<string> newWords = new List<string>();
foreach (string word in sentencetext.Split(' '))
{
if (word.Length == 1)
{
newWords.Add(word + "way");
}
if (word.Length == 2 && vowels.Contains(word[0]))
{
newWords.Add(word + "ay");
}
if (word.Length == 2 && vowels.Contains(word[1]) && !vowels.Contains(word[0]))
{
newWords.Add(word.Substring(1) + word.Substring(0, 1) + "ay");
}
if (word.Length == 2 && !vowels.Contains(word[1]) && !vowels.Contains(word[0]))
{
newWords.Add(word + "ay");
}
for (int i = 1; i < word.Length; i++)
{
if (vowels.Contains(word[i]) && (vowels.Contains(word[0])))
{
newWords.Add(word.Substring(i) + word.Substring(0, i) + "ay");
break;
}
}
for (int i = 0; i < word.Length; i++)
{
if (vowels.Contains(word[i]) && !(vowels.Contains(word[0])) && word.Length > 2)
{
newWords.Add(word.Substring(i) + word.Substring(0, i) + "ay");
break;
}
}
}
return string.Join(" ", newWords);
}
static void Main(string[] args)
{
//Console.WriteLine("Enter a sentence to convert to PigLatin:");
// string sentencetext = Console.ReadLine();
string pigLatin = ToPigLatin("Write a method that will convert an English sentence into Pig Latin");
Console.WriteLine(pigLatin);
Console.ReadKey();
}
Give this a go:
public static string ToPigLatin(string sentencetext)
{
string vowels = "AEIOUaeiou";
string cons = "bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ";
Func<string, string> toPigLatin = word =>
{
word = word.ToLower();
var result = word;
Func<string, string, (string, string)> split = (w, l) =>
{
var prefix = new string(w.ToArray().TakeWhile(x => l.Contains(x)).ToArray());
return (prefix, w.Substring(prefix.Length));
};
if (!word.Any(w => cons.Contains(w)))
{
result = word + "way";
}
else
{
var (s, e) = split(word, vowels);
var (s2, e2) = split(e, cons);
result = e2 + s + s2 + "ay";
}
return result;
};
return string.Join(" ", sentencetext.Split(' ').Select(x => toPigLatin(x)));
}
The code:
string pigLatin = ToPigLatin("Grrrr Write a method that will convert an English sentence into Pig Latin");
Console.WriteLine(pigLatin);
gives:
grrrray itewray away ethodmay atthay illway onvertcay anay ishenglay entencesay ointay igpay atinlay

Replace character in string with Uppercase of next in line (Pascal Casing)

I want to remove all underscores from a string and uppercase the character that follows each one (as well as the first character). For example, _my_string_ becomes MyString, and likewise my_string becomes MyString.
Is there a simpler way to do it? I currently have the following (assuming no input has two consecutive underscores):
StringBuilder sb = new StringBuilder();
int i;
for (i = 0; i < input.Length - 1; i++)
{
if (input[i] == '_')
sb.Append(char.ToUpper(input[++i]));
else if (i == 0)
sb.Append(char.ToUpper(input[i]));
else
sb.Append(input[i]);
}
if (i < input.Length && input[i] != '_')
sb.Append(input[i]);
return sb.ToString();
Now I know this is not totally related, but I thought to run some numbers on the implementations provided in the answers, and here are the results in Milliseconds for each implementation using 1000000 iterations of the string: "_my_string_121_a_" :
Achilles: 313
Raj: 870
Damian: 7916
Dmitry: 5380
Equalsk: 574
method utilised:
Stopwatch stp = new Stopwatch();
stp.Start();
for (int i = 0; i < 1000000; i++)
{
sb = Test("_my_string_121_a_");
}
stp.Stop();
long timeConsumed= stp.ElapsedMilliseconds;
In the end I think I'll go with Raj's implementation, because it's just very simple and easy to understand.
This should do it, using ToTitleCase from the System.Globalization namespace:
static string toCamel(string input)
{
TextInfo info = CultureInfo.CurrentCulture.TextInfo;
input= info.ToTitleCase(input).Replace("_", string.Empty);
return input;
}
Shorter (regular expressions), but I doubt if it's better (regular expressions are less readable):
string source = "_my_string_123__A_";
// MyString123A
string result = Regex
// _ + lower case Letter -> upper case letter (thanks to Wiktor Stribiżew)
.Replace(source, @"(_+|^)(\p{Ll})?", match => match.Groups[2].Value.ToUpper())
// all the other _ should be just removed
.Replace("_", "");
Loops over each character and converts to uppercase as necessary.
public string GetNewString(string input)
{
var convert = false;
var sb = new StringBuilder();
foreach (var c in input)
{
if (c == '_')
{
convert = true;
continue;
}
if (convert)
{
sb.Append(char.ToUpper(c));
convert = false;
continue;
}
sb.Append(c);
}
return sb.ToString().First().ToString().ToUpper() + sb.ToString().Substring(1);
}
Usage:
GetNewString("my_string");
GetNewString("___this_is_anewstring_");
GetNewString("___this_is_123new34tring_");
Output:
MyString
ThisIsAnewstring
ThisIs123new34tring
Try with Regex:
var regex = new Regex("^[a-z]|_[a-z]?");
var result = regex.Replace("my_string_1234", x => x.Value== "_" ? "" : x.Value.Last().ToString().ToUpper());
Tested with:
my_string -> MyString
_my_string -> MyString
_my_string_ -> MyString
You need to convert snake case to camel case. You can use this code; it works for me:
var x ="_my_string_".Split(new[] {"_"}, StringSplitOptions.RemoveEmptyEntries)
.Select(s => char.ToUpperInvariant(s[0]) + s.Substring(1, s.Length - 1))
.Aggregate(string.Empty, (s1, s2) => s1 + s2);
x = MyString
static string toCamel(string input)
{
StringBuilder sb = new StringBuilder();
int i;
for (i = 0; i < input.Length; i++)
{
if (input[i] == '_')
continue; // drop the underscore itself
if ((i == 0) || (input[i - 1] == '_'))
sb.Append(char.ToUpper(input[i]));
else
sb.Append(char.ToLower(input[i]));
}
return sb.ToString();
}

Split string by character count and store in string array [duplicate]

This question already has answers here:
Splitting a string into chunks of a certain size
(39 answers)
Split string after certain character count
(4 answers)
Closed 8 years ago.
I have a string like this
abcdefghij
And I want to split this string into chunks of 3 characters each.
My desired output will be a string array containing this
abc
def
ghi
j
Is it possible using the string.Split() method?
This code will group the chars in groups of 3, and convert each group to a string.
string s = "abcdefghij";
var split = s.Select((c, index) => new {c, index})
.GroupBy(x => x.index/3)
.Select(group => group.Select(elem => elem.c))
.Select(chars => new string(chars.ToArray()));
foreach (var str in split)
Console.WriteLine(str);
prints
abc
def
ghi
j
Fiddle: http://dotnetfiddle.net/1PgFu7
Using a bit of Linq
static IEnumerable<string> Split(string str)
{
while (str.Length > 0)
{
yield return new string(str.Take(3).ToArray());
str = new string(str.Skip(3).ToArray());
}
}
IEnumerable<string> GetNextChars ( string str, int iterateCount )
{
var words = new List<string>();
for ( int i = 0; i < str.Length; i += iterateCount )
if ( str.Length - i >= iterateCount ) words.Add(str.Substring(i, iterateCount));
else words.Add(str.Substring(i, str.Length - i));
return words;
}
This will avoid the ArgumentOutOfRangeException in @Sajeetharan's answer.
Edit: Sorry for completely dumb previous answer of mine :) this is supposed to do the trick.
No, I don't believe it is possible using just string.Split(). But it is simple enough to create your own function...
string[] MySplit(string input)
{
List<string> results = new List<string>();
int count = 0;
string temp = "";
foreach(char c in input)
{
temp += c;
count++;
if(count == 3)
{
results.Add(temp);
temp = "";
count = 0;
}
}
if(temp != "")
results.Add(temp);
return results.ToArray();
}
IEnumerable<string> Split(string str) {
for (int i = 0; i < str.Length; i += 3)
yield return str.Substring(i, Math.Min(str.Length - i, 3));
}
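On newer runtimes there is also a built-in option. The sketch below assumes .NET 6 or later, where Enumerable.Chunk is available in System.Linq; on older frameworks one of the loops above is still needed. ChunkExample and SplitEvery are illustrative names.

```csharp
using System.Linq;

// Hypothetical helper; assumes .NET 6 or later for Enumerable.Chunk.
public static class ChunkExample
{
    public static string[] SplitEvery(string s, int size)
    {
        return s.Chunk(size)                        // char[] groups of at most `size` chars
                .Select(chars => new string(chars)) // turn each group back into a string
                .ToArray();
    }
}
```

With the input from the question, ChunkExample.SplitEvery("abcdefghij", 3) yields abc, def, ghi and j. The last chunk is simply shorter, so no bounds check is needed.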

Reverse case of all alphabetic characters in C# string

What is the simplest way to reverse the case of all alphabetic characters in a C# string? For example "aBc1$;" should become "AbC1$;" I could easily write a method that does this, but I am hoping there is a library call that I don't know about that would make this easier. I would also like to avoid having a list of all known alphabetic characters and comparing each character to what is in the list. Maybe this can be done with regular expressions, but I don't know them very well. Thanks.
Thanks for the help. I created a string extension method for this that is mostly inspired by Anthony Pegram's solution, but without the LINQ. I think this strikes a good balance between readability and performance. Here is what I came up with.
public static string SwapCase(this string source) {
char[] caseSwappedChars = new char[source.Length];
for(int i = 0; i < caseSwappedChars.Length; i++) {
char c = source[i];
if(char.IsLetter(c)) {
caseSwappedChars[i] =
char.IsUpper(c) ? char.ToLower(c) : char.ToUpper(c);
} else {
caseSwappedChars[i] = c;
}
}
return new string(caseSwappedChars);
}
You could do it in a line with LINQ. One method:
string input = "aBc1$";
string reversedCase = new string(
input.Select(c => char.IsLetter(c) ? (char.IsUpper(c) ?
char.ToLower(c) : char.ToUpper(c)) : c).ToArray());
If you don't care about internationalization:
string input = "aBc1$#[\\]^_{|{~";
Encoding enc = new System.Text.ASCIIEncoding();
byte[] b = enc.GetBytes(input);
for (int i = input.Length - 1; i >= 0; i -= 1) {
if ((b[i] & 0xdf) >= 65 && (b[i] & 0xdf) <= 90) { //check if alpha
b[i] ^= 0x20; // then XOR the correct bit to change case
}
}
Console.WriteLine(input);
Console.WriteLine(enc.GetString(b));
If, on the other hand, you DO care about internationalization, you'll want to pass in CultureInfo.InvariantCulture to your ToUpper() and ToLower() functions...
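A minimal sketch of what passing a culture could look like, using the char.ToUpper(char, CultureInfo) and char.ToLower(char, CultureInfo) overloads. CaseSwapper and its SwapCase method are illustrative names, and which culture is appropriate depends on where the text comes from.

```csharp
using System.Globalization;

// Hypothetical helper; not taken from any of the answers.
public static class CaseSwapper
{
    public static string SwapCase(string source, CultureInfo culture)
    {
        char[] result = new char[source.Length];
        for (int i = 0; i < source.Length; i++)
        {
            char c = source[i];
            if (char.IsUpper(c))
                result[i] = char.ToLower(c, culture);   // upper -> lower under the given culture
            else if (char.IsLower(c))
                result[i] = char.ToUpper(c, culture);   // lower -> upper under the given culture
            else
                result[i] = c;                          // digits, punctuation, etc. stay as-is
        }
        return new string(result);
    }
}
```

Calling CaseSwapper.SwapCase("aBc1$;", CultureInfo.InvariantCulture) gives AbC1$;. Passing a specific culture instead lets culture-sensitive casing rules apply, such as the Turkish dotted and dotless i.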
You could do it old-school if you don't know LINQ.
static string InvertCasing(string s)
{
char[] c = s.ToCharArray();
char[] cUpper = s.ToUpper().ToCharArray();
char[] cLower = s.ToLower().ToCharArray();
for (int i = 0; i < c.Length; i++)
{
if (c[i] == cUpper[i])
{
c[i] = cLower[i];
}
else
{
c[i] = cUpper[i];
}
}
return new string(c);
}
Here's a regex approach:
string input = "aBcdefGhI123jKLMo$";
string result = Regex.Replace(input, "[a-zA-Z]",
m => Char.IsUpper(m.Value[0]) ?
Char.ToLower(m.Value[0]).ToString() :
Char.ToUpper(m.Value[0]).ToString());
Console.WriteLine("Original: " + input);
Console.WriteLine("Modified: " + result);
You can use Char.Parse(m.Value) as an alternate to m.Value[0]. Also, be mindful of using the ToUpperInvariant and ToLowerInvariant methods instead. For more info see this question: In C# what is the difference between ToUpper() and ToUpperInvariant()?
I made an extension method for strings which does just this!
public static class InvertStringExtension
{
public static string Invert(this string s)
{
char[] chars = s.ToCharArray();
for (int i = 0; i < chars.Length; i++)
chars[i] = chars[i].Invert();
return new string(chars);
}
}
public static class InvertCharExtension
{
public static char Invert(this char c)
{
if (!char.IsLetter(c))
return c;
return char.IsUpper(c) ? char.ToLower(c) : char.ToUpper(c);
}
}
To use
var hello = "hELLO wORLD";
var helloInverted = hello.Invert();
// helloInverted == "Hello World"
char[] carr = str.ToCharArray();
for (int i = 0; i < carr.Length; i++)
{
if (char.IsLetter(carr[i]))
{
carr[i] = char.IsUpper(carr[i]) ? char.ToLower(carr[i]) : char.ToUpper(carr[i]);
}
}
str = new string(carr);
I was asked a similar question yesterday and my answer is like:
public static partial class StringExtensions {
public static String InvertCase(this String t) {
Func<char, String> selector=
c => (char.IsUpper(c)?char.ToLower(c):char.ToUpper(c)).ToString();
return t.Select(selector).Aggregate(String.Concat);
}
}
You can easily change the method signature to add a parameter of type CultureInfo, and use it with methods like char.ToUpper for a requirement of globalization.
A little bit faster than some other methods listed here and it is nice because it uses Char arithmetics!
var line = "someStringToSwapCase";
var charArr = new char[line.Length];
for (int i = 0; i < line.Length; i++)
{
if (line[i] >= 65 && line[i] <= 90)
{
charArr[i] = (char)(line[i] + 32);
}
else if (line[i] >= 97 && line[i] <= 122)
{
charArr[i] = (char)(line[i] - 32);
}
else
{
charArr[i] = line[i];
}
}
var res = new String(charArr);
This may help too, because it does not call the built-in case-conversion functions directly.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace Practice
{
class Program
{
static void Main(string[] args)
{
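string str;
int ch;
Console.Write("Enter String : ");
str = Console.ReadLine();
// Size the buffer to the input instead of a fixed 50 characters.
char[] new_str = new char[str.Length];
for (int i = 0; i < str.Length; i++)
{
ch = (int)str[i];
if (ch > 64 && ch < 91) // 'A'..'Z' -> lower case
{
ch = ch + 32;
}
else if (ch > 96 && ch < 123) // 'a'..'z' -> upper case
{
ch = ch - 32;
}
// Anything else (digits, punctuation) is copied unchanged.
new_str[i] = Convert.ToChar(ch);
}
Console.Write(new_str);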
Console.ReadKey();
}
}
}
I am sure this will also work for you. Thank you.
Code below makes only 2 function calls to each letter. Instead of checking if IsLetter, we just apply upper/lowercase if necessary.
string result="";
foreach (var item in S)
{
if (char.ToLower(item) != item )
result+= char.ToLower(item);
else
result+= char.ToUpper(item);
}
It would also be possible (though less readable) to create an extra variable and set it to char.ToLower(item) before the check, trading one function call for one extra variable, this way:
string result="";
foreach (var item in S)
{
var temp=char.ToLower(item);
if (temp != item )
result+= temp;
else
result+= char.ToUpper(item);
}
