Regex to replace every newline - c#

I need to replace every newline character in a string with another character (say 'X'). I do not wish to collapse multiple newlines with a single newline!
PS: this regex replaces all consecutive newlines with a single newline but its not what I need.
Regex regex_newline = new Regex("(\r\n|\r|\n)+");

That will replace one or more newlines with something, not necessarily with a single newline -- that's determined by the regex_newline.Replace(...) call, which you don't show.
So basically, the answer is
Regex regex_newline = new Regex("(\r\n|\r|\n)"); // remove the '+'
// :
regex_newline.replace(somestring, "X");

Just use the String.Replace method and replace the Environment.NewLine in your string. No need for Regex.
http://msdn.microsoft.com/en-us/library/system.environment.newline.aspx

Related

Matching multiple different start and end of line in one multiline regex [duplicate]

I need to remove lines that match a particular pattern from some text. One way to do this is to use a regular expression with the begin/end anchors, like so:
var re = new Regex("^pattern$", RegexOptions.Multiline);
string final = re.Replace(initial, "");
This works fine except that it leaves an empty line instead of removing the entire line (including the line break).
To solve this, I added an optional capturing group for the line break, but I want to be sure it includes all of the different flavors of line breaks, so I did it like so:
var re = new Regex(#"^pattern$(\r\n|\r|\n)?", RegexOptions.Multiline);
string final = re.Replace(initial, "");
This works, but it seems like there should be a more straightforward way to do this. Is there a simpler way to reliably remove the entire line including the ending line break (if any)?
To match any single line break sequence you may use (?:\r\n|[\r\n\u000B\u000C\u0085\u2028\u2029]) pattern. So, instead of (\r\n|\r|\n)?, you can use (?:\r\n|[\r\n\u000B\u000C\u0085\u2028\u2029])?.
Details:
‎000A - a newline, \n
‎000B - a line tabulation char
‎000C - a form feed char
‎000D - a carriage return, \r
‎0085 - a next line char, NEL
‎2028 - a line separator char
‎- 2029 - a paragraph separator char.
If you want to remove any 0+ non-horizontal (or vertical) whitespace chars after a matched line, you may use [\s-[\p{Zs}\t]]*: any whitespace (\s) but (-[...]) a horizontal whitespace (matched with [\p{Zs}\t]). Note that for some reason, \p{Zs} Unicode category class does not match tab chars.
One more aspect must be dealt with here since you are using the RegexOptions.Multiline option: it makes $ match before a newline (\n) or end of string. That is why if your line endings are CRLF the pattern may fail to match. Hence, add an optional \r? before $ in your pattern.
So, either use
#"^pattern\r?$(?:\r\n|[\r\n\u000B\u000C\u0085\u2028\u2029])?"
or
#"^pattern\r?$[\s-[\p{Zs}\t]]*"

How to remove special characters from start of string but leave in the body of the string?

I have a very long string, I would like to remove keep only letters at the beginning:
Example: If the string starts like: 1234 I am going to the movies at 2pm: or !##$% OMG I am 2 scared.
Then output should be: I am going to the movies at 2pm: And OMG I am 2 scared.
Note: The 2 still preserved.
In a nutshell I only want to remove the special character at the beginning of the string but leave all else in. I could use a Regex like this:
Regex.Replace(input, #"[^a-zA-Z]", "");
But not sure how I would incorporate this with s StartWith method?
By special character I am meaning only letters A-Z or a-z.
Thanks
var result = Regex.Replace(source, "^[^a-zA-Z]+", "");
^ at the beginning of regex pattern matches beginning of he string.
This method of string may help: String.TrimStart
I would go with TrimStart if you know the characters you want to remove. Or if all the characters are together at the start you could also do
myString.Substring(myString.IndexOf(' '));
to remove everything up until the first space

Regex.Replace removes '\r' character in "\r\n"

Here is a simple example
string text = "parameter=120\r\n";
int newValue = 250;
text = Regex.Replace(text, #"(?<=parameter\s*=).*", newValue.ToString());
text will be "parameter=250\n" after replacement. Replace() method removes '\r'. Does it uses unix-style for line feed by default? Adding \b to my regex (?<=parameter\s*=).*\b solves the problem, but I suppose there should be a better way to parse lines with windows-style line feeds.
Take a look at this answer. In short, the period (.) matches every character except \n in pretty much all regex implementations. Nothing to do with Replace in particular - you told it to remove any number of ., and that will slurp up \r as well.
Can't test now, but you might be able to rewrite it as (?<=parameter\s*=)[^\r\n]* to explicitly state which characters you want disallowed.
. by default doesn't match \n..If you want it to match you have to use single line mode..
(?s)(?<=parameter\s*=).*
^
(?s) would toggle the single line mode
Try this:
string text = "parameter=120\r\n";
int newValue = 250;
text = Regex.Replace(text, #"(parameter\s*=).*\r\n", "${1}" + newValue.ToString() + "\n");
Final value of text:
parameter=250\n
Match carriage return and newline explicitly. Will only match lines ending in \r\n.

Replace a string in multiline regex with end of line token

I got the following regex
var fixedString = Regex.Replace(subject, #"(:[\w]+ [\d]+)$", "",
RegexOptions.Multiline);
which doesn't work. It works if I use \r\n, but I would like to support all types of line breaks. As another answer states I have to use RegexOptions.Multiline to be able to use $ as end of line token (instead of end of string). But it doesn't seem to help.
What am I doing wrong?
I am not sure what you want to achieve, I think I understood, you want to replace also the newline character at the end of the row.
The problem is the $ is a zero width assertion. It does not match the newline character, it matches the position before \n.
You could do different other things:
If it is OK to match all following newlines, means also all following empty rows, you could do this:
var fixedString = Regex.Replace(subject, #"(:[\w]+ [\d]+)[\r\n]+", "");
If you only want to match the newline after the row and keep following empty rows, you have to make a pattern for all possible combinations, e.g.:
var fixedString = Regex.Replace(subject, #"(:[\w]+ [\d]+)\r?\n", "");
This would match the combination \n and \r\n

.NET Regular Expression: Get paragraphs

I'm trying to get paragraphs from a string in C# with Regular Expressions.
By paragraphs; I mean string blocks ending with double or more \r\n. (NOT HTML paragraphs <p>)...
Here is a sample text:
For example this is a paragraph with a carriage return here
and a new line here. At this point, second paragraph starts. A paragraph ends if double or more \r\n is matched or if reached at the end of the string ($).
I tried the pattern:
Regex regex = new Regex(#"(.*)(?:(\r\n){2,}|\r{2,}|\n{2,}|$)", RegexOptions.Multiline);
but this does not work. It matches every line ending with a single \r\n. What I need is to get all characters including single carriage returns and newline chars till reached a double \r\n.
.* is being greedy and consuming as much as it can. Your second set of () has a $ so the expression that is being used is (.*)(?). In order to make the .* not be greedy, follow it with a ?.
When you specify RegexOptions.Multiline, .NET will split the input on line breaks. Use RegexOptions.Singleline to make it treat the entire input as one.
Regex regex = new Regex(#"(.*?)(?:(\r\n){2,}|\r{2,}|\n{2,}|$)", RegexOptions.Singleline);
An opposite approach will be to match the separators instead of the paragraphs, making the problem almost trivial. Consider:
string[] paragraphs = Regex.Split(text, #"^\s*$", RegexOptions.Multiline);
By splitting the input string by empty lines you can easily get all paragraphs. If you only want blank lines with no spaces you can simplify that even further, and use the parretn ^$. In that case you can also use the non-regex String.Split, with an array of separators:
string[] separators = {"\n\n", "\r\r", "\r\n\r\n"};
string[] paragraphs = text.Split(separators,
StringSplitOptions.RemoveEmptyEntries);
Do you have to use a regular expression? Tools like COCO/R could make this job pretty easy as well. In addition it might just prove to be faster than generating code at runtime using a regex.
COMPILER YourParaProcessor
// your code goes here
TOKENS
newLine= '\r'|'\n'.
paraLetter = ANY - '\n' - '\r' .
YourParaProcessor
=
{Paragraph}
.
Paragraph =
{paraLetter} '\r\n' .

Categories