Given the C# code:
string foo = @"
abcde
fghijk";
I am trying to remove all formatting, including the whitespace between the lines.
So far the code
foo = foo.Replace("\n","").Replace("\r", "");
works, but the whitespace between lines 2 and 3 is still kept.
I assume a regular expression is the only solution?
Thanks.
I'm assuming you want to keep multiple lines; if not, I'd choose CAbbott's answer.
var fooNoWhiteSpace = string.Join(
Environment.NewLine,
foo.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.Select(fooline => fooline.Trim())
);
What this does is split the string into lines (foo.Split),
trim whitespace from the start and end of each line (.Select(fooline => fooline.Trim())),
then combine them back together with a newline in between (string.Join).
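The same split/trim/join idea can be made to cope with either line-ending style. The sketch below (LineTrimmer and TrimLines are hypothetical names) splits on both "\r\n" and "\n", because the line endings inside a verbatim string literal depend on how the source file was saved. It also drops blank lines after trimming, so whitespace-only lines disappear too.

```csharp
using System;
using System.Linq;

// Hypothetical helper: trims every line and drops lines that are empty
// after trimming, splitting on both Windows ("\r\n") and Unix ("\n") endings.
public static class LineTrimmer
{
    public static string TrimLines(string text, string separator)
    {
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        return string.Join(
            separator,
            lines.Select(line => line.Trim())
                 .Where(line => line.Length > 0));
    }
}
```

Passing "" as the separator gives a single-line result such as abcdefghijk; passing Environment.NewLine keeps one entry per line.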
You could use a regular expression:
foo = Regex.Replace(foo, @"\s+", "");
How about this?
string input = @"
abcde
fghijk";
string output = "";
string[] parts = input.Split('\n');
foreach (var part in parts)
{
// If you want everything on one line... else just + "\n" to it
output += part.Trim();
}
This should remove everything.
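If you want everything on one line, a StringBuilder avoids the repeated copying that output += part performs on every pass through the loop. A minimal sketch, using a hypothetical OneLineJoiner class:

```csharp
using System.Text;

// Same idea as the loop above, but appends into a StringBuilder instead of
// building a new string with += on every iteration.
public static class OneLineJoiner
{
    public static string JoinTrimmedLines(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (string part in input.Split('\n'))
        {
            // Trim() also strips the '\r' left over from "\r\n" line endings.
            sb.Append(part.Trim());
        }
        return sb.ToString();
    }
}
```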
If the whitespace is all spaces, you could use
foo = foo.Replace(" ", "");
For any other whitespace that may be in there, do the same. Example:
foo = foo.Replace("\t", "");
Just add a Replace(" ", ""); you're dealing with a verbatim string literal, which means all the whitespace is part of the string.
Try something like this:
string test = @"
abcde
fghijk";
EDIT: Added code to filter out only whitespace.
string newString = new string(test.Where(c => !char.IsWhiteSpace(c)).ToArray());
Produces the following: abcdefghijk
I've written something similar to George Duckett's answer, but put my logic into a string extension method so it's easier for others to read and consume:
using System;
using System.Linq;
public static class Extensions
{
public static string RemoveTabbing(this string fmt)
{
return string.Join(
System.Environment.NewLine,
fmt.Split(new string[] { System.Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.Select(fooline => fooline.Trim()));
}
}
You can then call it like this:
string foo = @"
abcde
fghijk".RemoveTabbing();
I hope that helps someone
Related
I've got a string value with a lot of different characters
I want to:
replace TAB and ENTER with a space
replace Arabic ي with Persian ی
replace Arabic ك with Persian ک
remove newlines from both sides of a string
replace multiple spaces with a single space
trim spaces
The following function cleans the data, and it works correctly.
Does anyone have ideas for better performance and less code to maintain? :)
static void Main(string[] args)
{
var output = "كgeeks 01$سهيلاطريقي03. اشك!#!!.ي";
//output = output.Replace("\u064A", "\u0649");//ي
output = output.Replace("\u064A", "\u06CC");//replace arabic ي with persian ی
output = output.Replace("\u0643", "\u06A9");//replace arabic ك with persian ک
output = output.Trim('\r', '\n');//remove newlines from both sides of a string
output = output.Replace("\n", "").Replace("\r", " ");//replace newline with space
RegexOptions options = RegexOptions.None;
Regex regex = new Regex("[ ]{2,}", options);//replace multiple space with one space
output = regex.Replace(output, " ");
char tab = '\u0009';
output = output.Replace(tab.ToString(), "");
Console.WriteLine(output);
}
You can refactor using two lists: one for the trim process and one for the replace process.
var itemsTrimChars = new List<char>()
{
'\r',
'\n'
};
var itemsReplaceStrings = new Dictionary<string, string>()
{
{ "\n", "" },
{ "\r", " " },
{ "\u064A", "\u06CC" },
{ "\u0643", "\u06A9" },
{ "\u0009", "" }
}.ToList();
They are then maintainable tables, and you can keep them wherever suits you: as locals (as in this example), as class-level fields, as database tables, as text files on disk...
Use them like this:
output = output.Trim(itemsTrimChars.ToArray()); // one call trims every listed char from both ends
itemsReplaceStrings.ForEach(p => output = output.Replace(p.Key, p.Value));
I don't know much about the regex for replacing multiple spaces, but if you need to replace other doubled characters, you can create a third list.
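Putting the trim and replace tables together with a step for the multiple-space rule might look like the sketch below. TableCleaner is a hypothetical name, and instead of a third list it reuses the question's "[ ]{2,}" regex for that step, running it last so that spaces left next to each other after tab removal are also collapsed.

```csharp
using System.Text.RegularExpressions;

// Hypothetical TableCleaner: the trim and replace tables from the answer,
// plus a final step that collapses runs of spaces.
public static class TableCleaner
{
    static readonly char[] TrimChars = { '\r', '\n' };

    // Same pairs as itemsReplaceStrings in the answer.
    static readonly (string From, string To)[] Replacements =
    {
        ("\n", ""),
        ("\r", " "),
        ("\u064A", "\u06CC"), // Arabic yeh -> Persian yeh
        ("\u0643", "\u06A9"), // Arabic kaf -> Persian keheh
        ("\u0009", ""),       // tab
    };

    static readonly Regex MultipleSpaces = new Regex("[ ]{2,}");

    public static string Clean(string input)
    {
        // Passing all trim characters at once strips a full "\r\n" from each end.
        string output = input.Trim(TrimChars);
        foreach (var (from, to) in Replacements)
            output = output.Replace(from, to);
        // Collapse runs of spaces last, after tabs have been removed.
        return MultipleSpaces.Replace(output, " ");
    }
}
```

Because the tables are plain data, adding a rule means adding a row rather than another line of Replace calls.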
You can do this by iterating over each character, applying those rules, and building a new output string in the format you want. Because it makes a single pass over the input, it is likely to be faster than all those string.Replace and Regex calls.
Use a StringBuilder for performance when appending; don't use string += string.
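A single-pass version of that idea might look like the sketch below. It is an illustration under assumptions, not a drop-in replacement: SinglePassCleaner is a hypothetical name, and it follows the question's list of rules (tab, CR and LF become spaces) rather than the exact behavior of the question's code, which deletes "\n" and tabs instead.

```csharp
using System.Text;

// Hypothetical single-pass cleaner: one loop over the input, one StringBuilder.
// Tab/CR/LF become spaces, Arabic yeh/kaf become their Persian forms,
// runs of spaces collapse to one, and leading/trailing spaces are dropped.
public static class SinglePassCleaner
{
    public static string Clean(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool pendingSpace = false; // whitespace seen since the last kept char

        foreach (char raw in input)
        {
            char c = raw;
            if (c == '\t' || c == '\r' || c == '\n') c = ' ';
            else if (c == '\u064A') c = '\u06CC'; // Arabic yeh -> Persian yeh
            else if (c == '\u0643') c = '\u06A9'; // Arabic kaf -> Persian keheh

            if (c == ' ')
            {
                pendingSpace = true;
                continue;
            }

            // Emit at most one space, and never before the first kept char.
            if (pendingSpace && sb.Length > 0) sb.Append(' ');
            pendingSpace = false;
            sb.Append(c);
        }
        // Trailing whitespace is never emitted, so the result is already trimmed.
        return sb.ToString();
    }
}
```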
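First find the character in your string, then remove it and insert the new character at the same index:
private string ReplaceChars(string Source, string Find, string Replace)
{
int Place = Source.IndexOf(Find);
// IndexOf returns -1 when Find is absent; Remove would throw in that case
if (Place < 0)
return Source;
// Only the first occurrence is replaced (string.Replace replaces all of them)
string result = Source.Remove(Place, Find.Length).Insert(Place, Replace);
return result;
}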
Usage:
string text = "كgeeks 01$سهيلاطريقي03. اشك!#!!.ي";
var result = ReplaceChars(text, "ي", "ی");
This is a very basic question, but I'm not sure why it isn't working. I have code where 'And' can be written in any form ('And', 'and', etc.), and I want to replace it with ','.
I tried this:
and.Replace("and".ToUpper(),",");
But this is not working. Is there another way to do this, or a way to make it work?
You should check out the Regex class
http://msdn.microsoft.com/en-us/library/xwewhkd1.aspx
using System.Text.RegularExpressions;
Regex re = new Regex(@"\band\b", RegexOptions.IgnoreCase); // verbatim string: in "..." the \b would be a backspace
string and = "This is my input string with and string in between.";
and = re.Replace(and, ","); // Replace returns a new string
words = words.Replace("AND", ",")
.Replace("And", ",")
.Replace("and", ",");
Or use a Regex.
The Replace method returns a string where the replacement is visible. It does not modify the original string. You should try something along the lines of
and = and.Replace("and",",");
You can do this for all variations of "and" you may encounter, or as other answers have suggested, you could use a regex.
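If you are on .NET Core 2.0 or later (an assumption; the overload is not available in .NET Framework), string.Replace also accepts a StringComparison, which covers every casing in a single call. Be aware that it still replaces the letters inside words such as "sand". AndReplacer and ReplaceAnyCase are hypothetical names:

```csharp
using System;

public static class AndReplacer
{
    // string.Replace(string, string, StringComparison) exists in .NET Core 2.0+.
    // It matches "and" in any casing, including inside other words.
    public static string ReplaceAnyCase(string text) =>
        text.Replace("and", ",", StringComparison.OrdinalIgnoreCase);
}
```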
You should take care if some words contain "and", for example "this is sand and sea". The word "sand" must not be affected by the replacement.
string and = "this is sand and sea";
//here you should probably add those delimiters that may occur near your "and"
//this substitution is not universal and will miss something like this " and, "
string[] delimiters = new string[] { " " };
//it results in: "this is sand , sea"
and = string.Join(" ",
and.Split(delimiters,
StringSplitOptions.RemoveEmptyEntries)
.Select(s => s.Length == 3 && s.ToUpper().Equals("AND")
? ","
: s));
I would also add something like this:
and = and.Replace(" , ", ", ");
So, the output:
this is sand, sea
Try using the static Regex.Replace() method:
and = System.Text.RegularExpressions.Regex.Replace(and,"(?i)and",",");
The "(?i)" causes the following text search to be case-insensitive.
http://msdn.microsoft.com/en-us/library/yd1hzczs.aspx
http://msdn.microsoft.com/en-us/library/xwewhkd1(v=vs.100).aspx
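Note that "(?i)and" on its own also matches inside "sand". Combining the inline (?i) option with \b word boundaries keeps such words intact. A small sketch (WholeWordReplacer is a hypothetical name); it uses a verbatim @"..." string, since in an ordinary C# literal "\b" is a backspace character rather than a regex word boundary:

```csharp
using System.Text.RegularExpressions;

public static class WholeWordReplacer
{
    // (?i) makes the match case-insensitive; \b restricts it to whole words.
    public static string ReplaceAnd(string text) =>
        Regex.Replace(text, @"(?i)\band\b", ",");
}
```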
I am trying to remove " from a string using Regex.
I receive a string in a method, and I would like to split it into the words it contains.
My code is below; hopefully you can see what I am doing.
The problem I am having is telling Regex that " is what I would like to remove. I have tried numerous ways and searched Google for an answer, and have had to resort to asking here.
search_string looks like this: blah="blah" la="la" ta="ta", and in the end I want just blah blah la la ta ta.
public blahblah blahblah(blah blah, string search_string)
{
Regex r = new Regex(@"/"+");
string s3 = r.Replace(search_string, @" ");
Regex r2 = new Regex(" ");
Regex r3 = new Regex("=");
string[] new_Split = { };
string[] split_String = r2.Split(s3);
foreach (string match in split_String)
{
new_Split = r3.Split(match);
}
//do blahblah stuff with new_Split[1] .. etc
// new_Split[0] should be blah and new_Split[1] should
// be blah with out "", not "blah"
return blah_Found;
}
Just use:
myString = myString.Replace( "\"", String.Empty );
[Update]
String.Empty (or "") is not a space character. You wrote that you want to convert this
blah="blah" la="la" ta="ta"
to this
blah blah la la ta ta
So the result still contains spaces. If you want this instead:
blahblahlalatata
you need to remove them too:
myString = myString.Replace( "\"", String.Empty ).Replace( " ", String.Empty );
For '=', do the same again, and so on.
You need to be more precise in your questions.
As a quick thought, and maybe barking up entirely the wrong tree, wouldn't you want something like:
Regex r = new Regex("(\".*\")");
e.g., a regular expression of ".*"
This is one way to do it.
It searches for anything of the form SomeWord="somethingelse"
and replaces it with SomeWord somethingelse
var regex = new Regex(@"(\w+)=\""(.+?)\"""); // lazy (.+?) stops at the next quote, so several pairs work
var result = regex.Replace("bla=\"bla\"", "$1 $2");
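If the goal is the list of words rather than a rewritten string, Regex.Matches can pull out each name/value pair directly. A sketch assuming every value is wrapped in double quotes and contains no embedded quotes; PairParser and Words are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PairParser
{
    // Matches name="value"; [^"]* stops at the closing quote,
    // so several pairs on one line are matched separately.
    static readonly Regex Pair = new Regex(@"(\w+)=""([^""]*)""");

    public static List<string> Words(string input)
    {
        var words = new List<string>();
        foreach (Match m in Pair.Matches(input))
        {
            words.Add(m.Groups[1].Value); // the name
            words.Add(m.Groups[2].Value); // the value, without quotes
        }
        return words;
    }
}
```

Joining the result with spaces gives blah blah la la ta ta for the question's input.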
I can't help you with Regex.
Anyway, if you only need to remove = and " and split the words, you could try:
string[] arr = s
.Replace("="," ")
.Replace("\""," ")
.Split(new string[1] {" "}, StringSplitOptions.RemoveEmptyEntries);
I did it in two passes:
string input = "blah=\"blah\" la=\"la\" ta=\"ta\"";
//replace " and = with a space
string output = Regex.Replace(input, "[\"=]", " ");
//condense the spaces
output = Regex.Replace(output, @"\s+", " ");
EDIT:
Treating " and = differently as per comment.
string input = "blah=\"blah\" la=\"la\" ta=\"ta\"";
//remove the quotes, then replace = with a space
string output = Regex.Replace(input, "\"", String.Empty);
output = Regex.Replace(output, "=", " ");
Clearly regex is a bit overkill here.
I have some code that tokenizes an equation input into a string array:
string infix = "( 5 + 2 ) * 3 + 4";
string[] tokens = tokenizer(infix, @"([\+\-\*\(\)\^\\])");
foreach (string s in tokens)
{
Console.WriteLine(s);
}
Now here is the tokenizer function:
public string[] tokenizer(string input, string splitExp)
{
string noWSpaceInput = Regex.Replace(input, @"\s", "");
Console.WriteLine(noWSpaceInput);
Regex RE = new Regex(splitExp);
return (RE.Split(noWSpaceInput));
}
When I run this, all the characters are split, but an empty string is inserted before the parenthesis characters. How do I remove these?
//empty string here
(
5
+
2
//empty string here
)
*
3
+
4
I would just filter them out:
public string[] tokenizer(string input, string splitExp)
{
string noWSpaceInput = Regex.Replace(input, @"\s", "");
Console.WriteLine(noWSpaceInput);
Regex RE = new Regex(splitExp);
return (RE.Split(noWSpaceInput)).Where(x => !string.IsNullOrEmpty(x)).ToArray();
}
You're seeing this because the input starts with a separator (the "(" at the beginning of the string), and later has two separator characters next to one another (")*" in the middle). Regex.Split returns an empty string in both places; this is by design.
As you may have found, String.Split has an optional StringSplitOptions argument that removes empty entries; however, Regex.Split has no such parameter. In your specific case you could simply ignore any token with a length of 0.
foreach (string s in tokens.Where(tt => tt.Length > 0))
{
Console.WriteLine(s);
}
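To see that behavior concretely, here is the raw Regex.Split output for the whitespace-free input, using a character class equivalent to the question's pattern. SplitDemo and RawTokens are hypothetical names. Because the pattern has a capturing group, the separators themselves are kept in the result, and the empty strings show up before the leading "(" and between ")" and "*".

```csharp
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Raw Regex.Split output, before any filtering of empty entries.
    public static string[] RawTokens(string expr) =>
        Regex.Split(expr, @"([+\-*()^\\])");
}
```

Joined with "|", the result for "(5+2)*3+4" is |(|5|+|2|)||*|3|+|4, where the first and the doubled "|" mark the empty entries.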
Well, one option would be to filter them out afterwards:
return RE.Split(noWSpaceInput).Where(x => !string.IsNullOrEmpty(x)).ToArray();
Try this (if you don't want to filter the result):
tokenizer(infix, @"(?=[-+*()^\\])|(?<=[-+*()^\\])");
Perl demo:
perl -E "say join ',', split /(?=[-+*()^])|(?<=[-+*()^])/, '(5+2)*3+4'"
(,5,+,2,),*,3,+,4
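Although it would be better to use a match instead of a split in this case, in my opinion. Note that .NET's Regex.Split, unlike Perl's split, should still return a leading empty string when the input starts with an operator, because a zero-width match at position 0 counts as a split point.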
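Matching tokens instead of splitting on separators avoids empty entries entirely. A minimal sketch, assuming tokens are unsigned numbers (optionally with a decimal part) or single non-whitespace characters such as operators and parentheses; MatchTokenizer is a hypothetical name:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchTokenizer
{
    // A token is a number, or any single non-whitespace character.
    // The number alternative is tried first, so "12" stays one token.
    static readonly Regex Token = new Regex(@"\d+(?:\.\d+)?|\S");

    public static string[] Tokenize(string input) =>
        Token.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
}
```

Whitespace is simply never matched, so there is no need to strip it first.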
I think you can use StringSplitOptions.RemoveEmptyEntries with String.Split, since the tokens in your input are separated by spaces:
static void Main(string[] args)
{
string infix = "( 5 + 2 ) * 3 + 4";
string[] results = infix.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
foreach (var result in results)
Console.WriteLine(result);
Console.ReadLine();
}
How do I replace \n with empty space?
I get an empty literal error if I do this:
string temp = mystring.Replace('\n', '');
String.Replace('\n', '') doesn't work because '' is not a valid character literal.
If you use the String.Replace(string, string) overload, it should work.
string temp = mystring.Replace("\n", "");
As replacing "\n" with "" doesn't give you the result that you want, that means that what you should replace is actually not "\n", but some other character combination.
One possibility is that what you should replace is the "\r\n" character combination, which is the newline code in a Windows system. If you replace only the "\n" (line feed) character it will leave the "\r" (carriage return) character, which still may be interpreted as a line break, depending on how you display the string.
If the source of the string is system-specific, you should use that specific string; otherwise, you should use Environment.NewLine to get the newline character combination for the current system.
string temp = mystring.Replace("\r\n", string.Empty);
or:
string temp = mystring.Replace(Environment.NewLine, string.Empty);
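If you don't know which line-ending convention the text uses, one option is to match all three at once with a regex. A sketch; NewlineStripper and ReplaceLineBreaks are hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class NewlineStripper
{
    // Matches Windows (\r\n), classic Mac (\r) and Unix (\n) line breaks.
    // \r\n is listed first so a Windows break is replaced once, not twice.
    static readonly Regex LineBreak = new Regex(@"\r\n|\r|\n");

    public static string ReplaceLineBreaks(string text, string replacement) =>
        LineBreak.Replace(text, replacement);
}
```

Passing " " keeps words on either side of a line break separated; passing "" removes the breaks entirely.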
This should work.
string temp = mystring.Replace("\n", "");
Are you sure there are actual \n new lines in your original string?
string temp = mystring.Replace("\n", string.Empty).Replace("\r", string.Empty);
Obviously, this removes both '\n' and '\r' and is as simple as I know how to do it.
If you use
string temp = mystring.Replace("\r\n", "").Replace("\n", "");
then you won't have to worry about where your string is coming from.
One caveat: in .NET the newline is "\r\n". So if you're loading your text from a file, you might have to use that instead of just "\n".
Edit: as Samuel pointed out in the comments, "\r\n" is not .NET-specific but Windows-specific.
What about creating an extension method like this?
public static class StringExtensions
{
public static string ReplaceTHAT(this string s)
{
return s.Replace("\r\n", ""); // a Windows line break is CR followed by LF
}
}
Then, wherever you want to do the replacement, you can write:
s = s.ReplaceTHAT();
Best Regards!
Here is your exact answer...
const char LineFeed = '\n'; // #10
string temp = new System.Text.RegularExpressions.Regex(
LineFeed.ToString() // the Regex constructor takes a string, not a char
).Replace(mystring, string.Empty);
But this one is much better, especially if you are trying to split the lines (you can also use it with Split):
const char CarriageReturn = '\r'; // #13
const char LineFeed = '\n'; // #10
string temp = new System.Text.RegularExpressions.Regex(
string.Format("{0}?{1}", CarriageReturn, LineFeed)
).Replace(mystring, string.Empty);
string temp = mystring.Replace("\n", " ");
@gnomixa, what do you mean in your comment about not achieving anything? The following works for me in VS2005.
If your goal is to remove the newline characters, thereby shortening the string, look at this:
string originalStringWithNewline = "12\n345"; // length is 6
System.Diagnostics.Debug.Assert(originalStringWithNewline.Length == 6);
string newStringWithoutNewline = originalStringWithNewline.Replace("\n", ""); // new length is 5
System.Diagnostics.Debug.Assert(newStringWithoutNewline.Length == 5);
If your goal is to replace the newline characters with a space character, leaving the string length the same, look at this example:
string originalStringWithNewline = "12\n345"; // length is 6
System.Diagnostics.Debug.Assert(originalStringWithNewline.Length == 6);
string newStringWithoutNewline = originalStringWithNewline.Replace("\n", " "); // new length is still 6
System.Diagnostics.Debug.Assert(newStringWithoutNewline.Length == 6);
And you have to replace single-character strings instead of characters, because '' is not a valid character literal to pass to Replace(char, char).
I know this is an old post but I'd like to add my method.
public static string Replace(string text, string[] toReplace, string replaceWith)
{
foreach (string str in toReplace)
text = text.Replace(str, replaceWith);
return text;
}
Example usage:
string newText = Replace("This is an \r\n \n an example.", new string[] { "\r\n", "\n" }, "");
Found on Bytes.com:
string temp = mystring.Replace('\n', '\0'); // note: '\0' is the NUL character, not an empty char, so the string keeps its length
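The difference between '\0' and actually removing the character is easy to check: Replace(char, char) swaps one character for another, so the length never changes. A small sketch with hypothetical names (NulDemo, WithNul, Removed):

```csharp
public static class NulDemo
{
    // One-for-one swap: the line feed becomes a NUL character, length unchanged.
    public static string WithNul(string s) => s.Replace('\n', '\0');

    // The string overload can replace with "", which really removes the line feed.
    public static string Removed(string s) => s.Replace("\n", "");
}
```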