Is there a method for removing whitespace characters from a string? - c#

Is there a string class member function (or something else) for removing all spaces from a string? Something like Python's str.strip() ?

You could simply do:
myString = myString.Replace(" ", "");
If you want to remove all whitespace characters you could use LINQ, even if the syntax is not very appealing for this use case:
myString = new string(myString.Where(c => !char.IsWhiteSpace(c)).ToArray());

String.Trim method removes trailing and leading white spaces. It is the functional equivalent of Python's strip method.

LINQ feels like overkill here: converting a string to a sequence, filtering the sequence, then turning it back into a string. For removal of all whitespace, I would go for a regular expression: Regex.Replace(s, @"\s", ""). This is a common idiom and has probably been optimized.
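A minimal sketch of the regex approach, assuming a hypothetical helper class named WhitespaceRemover (the name is not from the answer). It keeps the Regex in a static field so the pattern is parsed only once; whether this actually beats the LINQ version would have to be measured.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceRemover
{
    // \s matches any single whitespace character (space, tab, line break, ...).
    private static readonly Regex Whitespace = new Regex(@"\s");

    // Hypothetical helper: returns the input with every whitespace character removed.
    public static string RemoveAll(string input)
    {
        return Whitespace.Replace(input, "");
    }
}
```

For example, WhitespaceRemover.RemoveAll(" a\tb \n c ") returns "abc".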

If you want to remove the spaces at the start or at the end of the string, have a look at TrimStart(), TrimEnd() and Trim().
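To make the difference between the three methods concrete, here is a small sketch. TrimDemo and TrimVariants are made-up names for illustration; only TrimStart, TrimEnd and Trim are real String methods.

```csharp
public static class TrimDemo
{
    // Returns the three trimmed variants of the same input,
    // in the order TrimStart, TrimEnd, Trim.
    public static string[] TrimVariants(string input)
    {
        return new[]
        {
            input.TrimStart(), // leading whitespace removed
            input.TrimEnd(),   // trailing whitespace removed
            input.Trim()       // both ends removed, interior untouched
        };
    }
}
```

TrimDemo.TrimVariants("  a b  ") yields "a b  ", "  a b" and "a b". None of the three touches the space between a and b.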

If you're looking to replace all whitespace in a string (not just leading and trailing whitespace) based on .NET's determination of what's whitespace or not, you could use a pretty simple LINQ query to make it work.
string whitespaceStripped = new string((from char c in someString
where !char.IsWhiteSpace(c)
select c).ToArray());

Yes, Trim.
String a = "blabla ";
var b = a.Trim(); // or TrimEnd or TrimStart

Yes, String.Trim().
var result = " a b ".Trim();
gives "a b" in result. By default all whitespace is trimmed. If you want to remove only space you need to type
var result = " a b ".Trim(' ');
If you want to remove all spaces in a string you can use string.Replace().
var result = " a b ".Replace(" ", "");
gives "ab" in result. But that is not equivalent to str.strip() in Python.

I don't know much about Python...
If str.strip() just removes whitespace at the start and the end, then you could use str = str.Trim() in .NET. Otherwise you could use str = str.Replace(" ", "") to remove all spaces.
If it removes all whitespace, then use
str = new string((from c in str where !char.IsWhiteSpace(c) select c).ToArray());

There are many different ways, some faster than others:
public static string StripTabsAndNewlines(this string s) {
// Option 1: StringBuilder (fast)
StringBuilder sb = new StringBuilder(s.Length);
for (int i = 0; i < s.Length; i++) {
if (!Char.IsWhiteSpace(s[i])) {
sb.Append(s[i]);
}
}
return sb.ToString();
// Option 2: LINQ (faster?). Use this line instead of the loop above:
// return new string(s.Where(c => !Char.IsWhiteSpace(c)).ToArray());
// Option 3: regex (slow). Use this line instead of the loop above:
// return Regex.Replace(s, @"\s+", "");
}

you could use
StringVariable.Replace(" ","")

I'm surprised no one mentioned this:
String.Join("", " all manner\tof\ndifferent\twhite spaces!\n".Split())
string.Split by default splits along the characters that are char.IsWhiteSpace so this is a very similar solution to filtering those characters out by the direct use of char.IsWhiteSpace and it's a one-liner that works in pre-LINQ environments as well.

Strip spaces? Strip whitespaces? Why should it matter? It only matters if we're searching for an existing implementation, but let's not forget how fun it is to program the solution rather than search MSDN (boring).
You should be able to strip any chars from any string by using 1 of the 2 functions below.
You can remove any chars like this
static string RemoveCharsFromString(string textChars, string removeChars)
{
string tempResult = "";
foreach (char c in textChars)
{
if (!removeChars.Contains(c))
{
tempResult = tempResult + c;
}
}
return tempResult;
}
or you can enforce a character set (so to speak) like this
static string EnforceCharLimitation(string textChars, string allowChars)
{
string tempResult = "";
foreach (char c in textChars)
{
if (allowChars.Contains(c))
{
tempResult = tempResult + c;
}
}
return tempResult;
}
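The two functions above build their result with repeated string concatenation, which copies the partial result on every append. A sketch of the same idea with a StringBuilder and a HashSet<char> lookup follows. CharFilter and its method names are invented for this illustration and are not part of the answer.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CharFilter
{
    // Keeps only the characters of text that are NOT listed in removeChars.
    public static string Remove(string text, string removeChars)
    {
        return Filter(text, new HashSet<char>(removeChars), keepListed: false);
    }

    // Keeps only the characters of text that ARE listed in allowChars.
    public static string Allow(string text, string allowChars)
    {
        return Filter(text, new HashSet<char>(allowChars), keepListed: true);
    }

    private static string Filter(string text, HashSet<char> listed, bool keepListed)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            // Append c when its membership in the set matches the requested mode.
            if (listed.Contains(c) == keepListed)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

CharFilter.Remove("a b\tc", " \t") returns "abc", and CharFilter.Allow("a1b2", "0123456789") returns "12".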

Related

What is the regular expression to replace white space with a specified character?

I have searched a lot of questions and answers, but I only found lengthy and complicated expressions. I want to replace all whitespace in a string with ',' (comma). I know it can be done with a regex, but I don't know enough about regular expressions to write it. I have checked some links without finding an exact answer. If you know of an existing question or answer like this, please point me to it.
My string is defined as below.
string sText = "BankMaster AccountNo decimal To varchar";
and the result should be return as below.
"BankMaster,AccountNo,decimal,To,varchar"
Full Code:
string sItems = Clipboard.GetText();
string[] lines = sItems.Split('\n');
for (int iLine =0; iLine<lines.Length;iLine++)
{
string sLine = lines[iLine];
sLine = //CODE TO REPLACE WHITE SPACE WITH ','
string[] cells = sLine.Split(',');
grdGrid.Rows.Add(iLine, cells[0], cells[1], cells[2], cells[4]);
}
Additional Details
I have more than 16000 lines in a list, and all lines are formatted like the example given above. So I am going to use a regular expression instead of a loop or a recursive function call. If you know a faster way to do this than regex, please suggest it.

string result = Regex.Replace(sText, "\\s+", ",");
\s+ stands for "match one or more consecutive whitespace characters of any kind".
By whitespace the regex engine understands space ( ), tab (\t), newline (\n) and carriage return (\r), along with less common characters such as form feed and vertical tab.
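The + quantifier matters when words are separated by more than one whitespace character. The sketch below contrasts the two patterns; CommaJoiner and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class CommaJoiner
{
    // One comma per run of whitespace: "a  b" becomes "a,b".
    public static string ReplaceRuns(string input)
    {
        return Regex.Replace(input, @"\s+", ",");
    }

    // One comma per whitespace character: "a  b" becomes "a,,b".
    public static string ReplaceEach(string input)
    {
        return Regex.Replace(input, @"\s", ",");
    }
}
```

With ReplaceEach, a later Split(',') would produce empty cells for every extra space, so ReplaceRuns is probably what the grid-filling code in the question needs.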

string a = "Some text with spaces";
Regex rgx = new Regex("\\s+");
string result = rgx.Replace(a, ",");
Console.WriteLine(result);
The code above will replace all the white spaces with ',' character

There are lots of samples that do this with regular expressions:
Flex: replace all spaces with comma,
Regex replace all commas with value,
http://www.perlmonks.org/?node_id=896548,
http://www.dslreports.com/forum/r20971008-sed-help-whitespace-to-comma

Try This:
string str = "BankMaster AccountNo decimal To varchar";
StringBuilder temp = new StringBuilder();
str=str.Trim(); //trim before logic to avoid any trailing/leading whitespaces.
foreach(char ch in str)
{
if (ch == ' ' && temp[temp.Length-1] != ',')
{
temp.Append(",");
}
else if (ch != ' ')
{
temp.Append(ch.ToString());
}
}
Console.WriteLine(temp);
Output:
BankMaster,AccountNo,decimal,To,varchar

Try this:
sText = Regex.Replace(sText, @"\s+", ",");

How to remove the exact occurrence of characters from a string?

For Example, I have a string like :
string str = "santhosh,phani,ravi,phani123,praveen,sathish,prakash";
I want to delete the characters ,phani from str.
Now, I am using str = str.Replace(",phani", string.Empty);
then my output is : str="santhosh,ravi123,praveen,sathish,prakash";
But I want a output like : str="santhosh,ravi,phani123,praveen,sathish,prakash";

string str = "santhosh,phani,ravi,phani123,praveen,sathish,prakash";
var words = str.Split(',');
str = String.Join(",", words.Where(word => word != "phani"));

The better choice is to use Split and Join.
It's easy with LINQ:
String str = "santhosh,phani,ravi,phani123,praveen,sathish,prakash";
String token = "phani";
String result = String.Join(",", str.Split(',').Where(s => s != token));
(edit : I take time for testing and i'm not first ^^)

var names = str.Split(',').ToList();
names.RemoveAll(name => name == "phani");
str = String.Join(",", names);
Removes every exact occurrence of the given name from the list.

How about
str = str.Replace(",phani,", ",");
This, however, does not work if "phani" is the first or the last item in the string. To get around this, you could do this:
string source = "...";
source = "," + source + ","; // Explicitly add a comma to both ends
source = source.Replace(",phani,", ",").Trim(',');
This adds the commas, replaces "phani" and removes the leading and trailing commas.
A third solution would be this:
str = String.Join(",", str.Split(',').Where(item => item != "phani"));

Try including the surrounding commas in the search string:
string str = "santhosh,ravi,phani,phani123,praveen,sathish,prakash";
str = str.Replace(",phani,", ",");
Console.WriteLine(str);
Output:
santhosh,ravi,phani123,praveen,sathish,prakash
As Davin mentioned in a comment, this won't work if phani is the last item in the string. Silvermind's answer looks like the right answer.

string str = "santhosh,phani,ravi,phani123,praveen,sathish,prakash";
string pattern = @"\b,phani,\b";
string replace = ",";
Console.WriteLine(Regex.Replace(str, pattern, replace));
Output:
santhosh,ravi,phani123,praveen,sathish,prakash

You may use the regular expression, but you have to take care of cases when your string starts or ends with the substring:
var pattern = @",?\bphani\b,?";
var regex = new Regex(pattern);
var result = regex.Replace(input, ",").Trim(',');
Shorter notation could look like this:
var result = Regex.Replace(input, @",?\bphani\b,?", ",").Trim(',');
Explanation of the regular expression: ,?\bphani\b,? matches the word phani, but only if it is preceded and followed by word-delimiter characters (because of the word boundary metacharacter \b), and it can be (but doesn't have to be) preceded and followed by a comma thanks to ,? which means zero or one comma.
At the end we need to remove possible commas from the beginning and end of the string, that's why there's Trim(',') on the result.
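The pattern can be wrapped in a small helper to check the start, middle and end cases. This is only a sketch: ItemRemover and RemoveWord are hypothetical names, and it assumes the word to remove consists of word characters (letters, digits, underscore), since \b behaves differently otherwise.

```csharp
using System.Text.RegularExpressions;

public static class ItemRemover
{
    // Removes whole-word occurrences of word from a comma-separated list.
    public static string RemoveWord(string input, string word)
    {
        // ,? optionally consumes the comma on each side; \b keeps "phani123" intact.
        string pattern = @",?\b" + Regex.Escape(word) + @"\b,?";
        // Each match collapses to a single comma; stray commas at the ends are trimmed.
        return Regex.Replace(input, pattern, ",").Trim(',');
    }
}
```

ItemRemover.RemoveWord("phani,a,phani", "phani") returns "a". One caveat: two adjacent occurrences, as in "a,phani,phani,b", leave an empty item behind ("a,,b"), which the Split and Join approach avoids.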

C# Capitalizing string, but only after certain punctuation marks

I'm trying to find an efficient way to take an input string and capitalize the first letter after every punctuation mark (. : ? !) which is followed by a white space.
Input:
"I ate something. but I didn't:
instead, no. what do you think? i
think not! excuse me.moi"
Output:
"I ate something. But I didn't:
Instead, no. What do you think? I
think not! Excuse me.moi"
The obvious would be to split it and then capitalize the first char of every group, then concatenate everything. But it's uber ugly. What's the best way to do this? (I'm thinking Regex.Replace using a MatchEvaluator that capitalizes the first letter but would like to get more ideas)
Thanks!

Fast and easy:
static class Ext
{
public static string CapitalizeAfter(this string s, IEnumerable<char> chars)
{
var charsHash = new HashSet<char>(chars);
StringBuilder sb = new StringBuilder(s);
for (int i = 0; i < sb.Length - 2; i++)
{
if (charsHash.Contains(sb[i]) && sb[i + 1] == ' ')
sb[i + 2] = char.ToUpper(sb[i + 2]);
}
return sb.ToString();
}
}
Usage:
string capitalized = s.CapitalizeAfter(new[] { '.', ':', '?', '!' });

Try this:
string expression = @"[\.\?\!,]\s+([a-z])";
string input = "I ate something. but I didn't: instead, no. what do you think? i think not! excuse me.moi";
char[] charArray = input.ToCharArray();
foreach (Match match in Regex.Matches(input, expression,RegexOptions.Singleline))
{
charArray[match.Groups[1].Index] = Char.ToUpper(charArray[match.Groups[1].Index]);
}
string output = new string(charArray);
// "I ate something. But I didn't: instead, No. What do you think? I think not! Excuse me.moi"

I use an extension method.
public static string CorrectTextCasing(this string text)
{
// /[.:?!]\\s[a-z]/ matches letters following a space and punctuation,
// /^(?:\\s+)?[a-z]/ matches the first letter in a string (with optional leading spaces)
Regex regexCasing = new Regex("(?:[.:?!]\\s[a-z]|^(?:\\s+)?[a-z])", RegexOptions.Multiline);
// First ensure all characters are lower case.
// (In my case it comes all in caps; this line may be omitted depending upon your needs)
text = text.ToLower();
// Capitalize each match in the regular expression, using a lambda expression
text = regexCasing.Replace(text, s => s.Value.ToUpper());
// Return the new string.
return text;
}
Then I can do the following:
string mangled = "i'm A little teapot, short AND stout. here IS my Handle.";
string corrected = mangled.CorrectTextCasing();
// returns "I'm a little teapot, short and stout. Here is my handle."

Using the Regex / MatchEvaluator route, you could match on
"[.:?!]\s[a-z]"
and capitalize the entire match.
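A minimal sketch of that route, assuming a hypothetical class named SentenceCaser. The lambda is converted to a MatchEvaluator, and upper-casing the whole match is harmless because the punctuation mark and the whitespace have no case.

```csharp
using System.Text.RegularExpressions;

public static class SentenceCaser
{
    // Punctuation mark, one whitespace character, then a lowercase ASCII letter.
    private static readonly Regex AfterPunctuation = new Regex(@"[.:?!]\s[a-z]");

    public static string Capitalize(string text)
    {
        // The evaluator runs once per match and returns its replacement text.
        return AfterPunctuation.Replace(text, m => m.Value.ToUpperInvariant());
    }
}
```

Because \s also matches a line break, the letter after "didn't:" at the end of a line is capitalized as well, while "me.moi" is left alone since no whitespace follows the dot.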

Where the text variable contains the string
string text = "I ate something. but I didn't: instead, no. what do you think? i think not! excuse me.moi";
string[] punctuators = { "?", "!", ",", "-", ":", ";", "." };
foreach (string punctuator in punctuators)
{
int pos = text.IndexOf(punctuator);
while (pos != -1)
{
// Only capitalize when the mark is followed by whitespace and then another character.
if (pos + 2 < text.Length && char.IsWhiteSpace(text[pos + 1]))
{
text = text.Insert(pos + 2, char.ToUpper(text[pos + 2]).ToString());
text = text.Remove(pos + 3, 1);
}
pos = text.IndexOf(punctuator, pos + 1);
}
}

Regular expression to split string and number

I have a string of the form:
codename123
Is there a regular expression that can be used with Regex.Split() to split the alphabetic part and the numeric part into a two-element string array?

I know you asked for the Split method, but as an alternative you could use named capturing groups:
var numAlpha = new Regex("(?<Alpha>[a-zA-Z]*)(?<Numeric>[0-9]*)");
var match = numAlpha.Match("codename123");
var alpha = match.Groups["Alpha"].Value;
var num = match.Groups["Numeric"].Value;

splitArray = Regex.Split("codename123", @"(?<=\p{L})(?=\p{N})");
will split between a Unicode letter and a Unicode digit.

Regex is a little heavy handed for this, if your string is always of that form. You could use
"codename123".IndexOfAny(new char[] {'1','2','3','4','5','6','7','8','9','0'})
and two calls to Substring.
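That suggestion could be sketched as follows. AlphaNumericSplitter is a made-up name, and returning an empty second element when there is no digit is an assumption of this sketch, not something the answer specifies.

```csharp
public static class AlphaNumericSplitter
{
    private static readonly char[] Digits = "0123456789".ToCharArray();

    // Splits "codename123" into { "codename", "123" }.
    // If the input contains no digit, the second element is empty.
    public static string[] Split(string input)
    {
        int firstDigit = input.IndexOfAny(Digits);
        if (firstDigit < 0)
        {
            return new[] { input, "" };
        }
        // Everything before the first digit, then everything from it onwards.
        return new[] { input.Substring(0, firstDigit), input.Substring(firstDigit) };
    }
}
```

Note that the split happens at the first digit, so an input like "code12name" would give "code" and "12name".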

A little verbose, but
Regex.Split("codename123", @"(?<=[a-zA-Z])(?=\d)");
Can you be more specific about your requirements? Maybe a few other input examples.

IMO, it would be a lot easier to find matches, like:
Regex.Matches("codename123", @"[a-zA-Z]+|\d+")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
rather than to use Regex.Split.

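Well, it's a one-liner: Regex.Split("codename123", "^([a-z]+)"); Note that the resulting array starts with an empty string, because the match sits at the very beginning of the input: { "", "codename", "123" }.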

Another simpler way is
string originalstring = "codename123";
string alphabets = string.Empty;
string numbers = string.Empty;
foreach (char item in originalstring)
{
if (Char.IsLetter(item))
alphabets += item;
if (Char.IsNumber(item))
numbers += item;
}

This code is written in Java; the logic should be the same elsewhere.
public String splitStringAndNumber(String string) {
String pattern = "(?<Alpha>[a-zA-Z]*)(?<Numeric>[0-9]*)";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(string);
if (m.find()) {
return (m.group(1) + " " + m.group(2));
}
return "";
}

Replace char in a string

How do I change
XXX#YYY.ZZZ into XXX_YYY_ZZZ
One way I know is to use the string.Replace(char, char) method,
but I want to replace both "#" and ".", and that method replaces just one char.
One more case: what if I have XX.X#YYY.ZZZ?
I still want the output to look like XX.X_YYY_ZZZ.
Is this possible? Any suggestions? Thanks.

So, if I'm understanding correctly, you want to replace # with _, and . with _, but only if . comes after #? If there is a guaranteed # (assuming you're dealing with e-mail addresses?):
string e = "XX.X#YYY.ZZZ";
e = e.Substring(0, e.IndexOf('#')) + "_" + e.Substring(e.IndexOf('#')+1).Replace('.', '_');

Here's a complete regex solution that covers both your cases. The key to your second case is to match dots after the # symbol by using a positive look-behind.
string[] inputs = { "XXX#YYY.ZZZ", "XX.X#YYY.ZZZ" };
string pattern = @"#|(?<=#.*?)\.";
foreach (var input in inputs)
{
string result = Regex.Replace(input, pattern, "_");
Console.WriteLine("Original: " + input);
Console.WriteLine("Modified: " + result);
Console.WriteLine();
}
That said, this is simple enough to accomplish with a couple of string Replace calls. Efficiency is something you will need to test, depending on the text size and the number of replacements the code will make.

You can use the Regex.Replace method:
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace(v=VS.90).aspx

You can use the following extension method to do your replacement without creating too many temporary strings (as occurs with Substring and Replace) or incurring regex overhead. It skips to the # symbol, and then iterates through the remaining characters to perform the replacement.
public static string CustomReplace(this string s)
{
var sb = new StringBuilder(s);
for (int i = Math.Max(0, s.IndexOf('#')); i < sb.Length; i++)
if (sb[i] == '#' || sb[i] == '.')
sb[i] = '_';
return sb.ToString();
}

You can chain Replace calls:
var newstring = "XX.X#YYY.ZZZ".Replace("#","_").Replace(".","_");
Note that this also replaces the dot before the #, giving XX_X_YYY_ZZZ.

Create an array with the characters you want to have replaced, loop through the array, and call Replace for each character in turn.
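One possible reading of that suggestion, as a sketch. MultiReplace and ReplaceAll are hypothetical names chosen for illustration.

```csharp
public static class MultiReplace
{
    // Replaces every occurrence of each character in targets with replacement.
    public static string ReplaceAll(string input, char[] targets, char replacement)
    {
        string result = input;
        foreach (char target in targets)
        {
            // String.Replace(char, char) returns a new string each time.
            result = result.Replace(target, replacement);
        }
        return result;
    }
}
```

MultiReplace.ReplaceAll("XXX#YYY.ZZZ", new[] { '#', '.' }, '_') returns "XXX_YYY_ZZZ". Like any plain loop over Replace, it also changes dots that come before the #, so it covers the first case in the question but not the second.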

Assuming the data format is like XX.X#YYY.ZZZ, here is another alternative with String.Split(char separator):
string[] tmp = "XX.X#YYY.ZZZ".Split('#');
string newstr = tmp[0] + "_" + tmp[1].Replace(".", "_");
