With C#, I'm trying to use regular expressions to replace each newline (\n) in a text file with a semicolon (;), but only if the line has any content.
If the text file is:
This is the program
Hello World
Then my return would be
This is the program;
Hello World;
I'm trying to use
my_str = Regex.Replace(val, "\n", ";");
But it also affects the lines without content.
Try capturing 1+ characters followed by a newline?
(.+)[\r\n]?
\1;\n
C#:
my_str = Regex.Replace(val, "(.+)[\r\n]?", "$1;\n");
I think something like this will work.
my_str = Regex.Replace(val, "(?<prev>.+)\\n", "${prev};\n");
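A minimal sketch of the idea both answers point at, assuming the goal is to append ; to every non-empty line while leaving blank lines and the existing line endings (\n or \r\n) untouched. The names LineTerminator and AppendSemicolons are made up for illustration, and a line containing only spaces counts as having content here.

```csharp
using System;
using System.Text.RegularExpressions;

static class LineTerminator
{
    // Appends ';' to each run of non-newline characters that is followed
    // by a line break (\n or \r\n) or by the end of the input.
    // Blank lines contain no such run, so they are left unchanged.
    public static string AppendSemicolons(string val) =>
        Regex.Replace(val, @"([^\r\n]+)(\r?\n|$)", "$1;$2");

    static void Main()
    {
        string val = "This is the program\n\nHello World\n";
        Console.Write(AppendSemicolons(val));
    }
}
```

Capturing the line break in its own group and writing it back with $2 avoids hard-coding \n in the replacement, so Windows-style files keep their \r\n.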
Related
I'm trying to parse some HTML that has a bunch of escaped chars inside it, a lot of
\t, \n, \r, and every double quote is escaped by a backslash. Sample HTML:
<div id=\"error-modal\" title=\"Retrieving Document Error\" class=\"text-hide\">\n We're sorry, we were unable to retrieve your requested document or image.</div>
I'm trying to replace these characters by doing this:
var xpar = new XML.Parser(wConn.RawString.Replace("\\n", "").Replace("\\t", "").Replace("\\r","").Replace("\\\"", "\""))
The parser errs out because there's something else in the HTML it doesn't like, but in the exception the string is the same as it was before, the backslashes are all still there. What am I doing wrong?
The problem is that the replacement method takes \n, \r and \t as escape codes and not as the literal text that you want.
You can use a regular expression to achieve that.
var patternToMatch = "\\\\(n|r|t|\\\")";
var replacement = "";
var escapedString = Regex.Replace(inputString, patternToMatch, replacement);
Modify the pattern to match your requirements, but basically this expression can solve your problem. Note that, as written, it removes the escaped double quotes entirely rather than turning \" into ".
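A hedged variation on the pattern above, assuming the input really contains literal backslash characters and that \" should become a plain double quote (as in the question) rather than disappear. The class and method names (EscapeCleaner, Clean) are hypothetical, and the sample HTML is shortened.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapeCleaner
{
    // Matches a literal backslash followed by n, r, t or a double quote.
    static readonly Regex Escaped = new Regex(@"\\([nrt""])");

    // Drops the literal \n, \r and \t sequences and turns \" into ".
    public static string Clean(string input) =>
        Escaped.Replace(input, m => m.Groups[1].Value == "\"" ? "\"" : "");

    static void Main()
    {
        // The C# literal below contains real backslash characters.
        string raw = "<div id=\\\"error-modal\\\">\\n We're sorry.</div>";
        Console.WriteLine(Clean(raw));
    }
}
```

The match evaluator decides per match what to emit, which is why one pass can both delete the whitespace escapes and keep the quotes.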
I have a CSV file. It is not very big; the problem is this: every line ends with the two characters CR LF.
Unfortunately, in one single record there is a column with an LF character in the middle. When I try to import the document, this character causes a conflict.
The line looks like this in Notepad++
text1, text2,te(LF)
xt3, text4 (CR LF)
And I need this
text1, text2,text3, text4 (CR LF)
Now, my question is: how can I delete this character in C# without affecting the end of the row?
Regards
Try this code:
string result = Regex.Replace(text, @"([^\r])\n", "$1");
You simply replace any newline that does not come right after a CR with just the character that precedes it.
Delete all LF. Then replace all CR by CR,LF. Use string.Replace for this.
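Both suggestions can be sketched side by side. This is only an illustration under the assumption that every real row ends in CR LF and that the input contains no lone CR characters; the names CsvLineFixer, RemoveStrayLf and RemoveStrayLfNoRegex are invented for the example. The regex variant uses a negative lookbehind instead of a capture group, which also covers an LF at the very start of the text.

```csharp
using System;
using System.Text.RegularExpressions;

static class CsvLineFixer
{
    // Regex variant: remove every LF that is not immediately preceded by CR.
    public static string RemoveStrayLf(string text) =>
        Regex.Replace(text, @"(?<!\r)\n", "");

    // string.Replace variant: delete all LF, then turn each CR back into CR LF.
    // Assumes the input has no lone CR characters.
    public static string RemoveStrayLfNoRegex(string text) =>
        text.Replace("\n", "").Replace("\r", "\r\n");

    static void Main()
    {
        string text = "text1, text2,te\nxt3, text4\r\n";
        Console.Write(RemoveStrayLf(text));
        Console.Write(RemoveStrayLfNoRegex(text));
    }
}
```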
I'm having trouble with my C# code.
I have stored a string object when I print it out it prints out "username" and when I view it in the debugger it is shown as "\"username\"". How can I replace "\ with whitespace in the variable? It is stopping me from making comparison operations.
I tried with
memberNameStripped = teamMemberName.Replace(@"\", "");
But it does not replace the "\ so how can I do it?
Thanks in advance.
Why regex? Use String.Trim to remove leading and trailing quotes("):
memberNameStripped = teamMemberName.Trim('"');
It's efficient and clear.
The \ is an escape character, what you probably want to replace is the double quote "
So try:
memberNameStripped = teamMemberName.Replace("\"", "");
In debugger it is shown as "\"username\"" because it is a quoted string. This is why it prints out "username". You could get rid of quotes using Replace("\"", "")
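A small sketch of the point made in these answers: the backslashes exist only in the debugger's (and the source code's) quoted representation, while the string itself contains just the two double quotes. QuoteDemo and StripQuotes are hypothetical names used for illustration.

```csharp
using System;

static class QuoteDemo
{
    // Removes leading and trailing double quotes only.
    public static string StripQuotes(string s) => s.Trim('"');

    static void Main()
    {
        // The string holds 10 characters: a quote, "username", a quote.
        // No backslash is stored in it.
        string teamMemberName = "\"username\"";
        Console.WriteLine(teamMemberName.Length);
        Console.WriteLine(teamMemberName.Contains("\\"));
        Console.WriteLine(StripQuotes(teamMemberName));
        Console.WriteLine(teamMemberName.Replace("\"", ""));
    }
}
```

Trim only touches the ends of the string, whereas Replace also removes any quotes in the middle; which one fits depends on the data.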
Here is a simple example
string text = "parameter=120\r\n";
int newValue = 250;
text = Regex.Replace(text, @"(?<=parameter\s*=).*", newValue.ToString());
text will be "parameter=250\n" after replacement. The Replace() method removes '\r'. Does it use Unix-style line feeds by default? Adding \b to my regex, (?<=parameter\s*=).*\b, solves the problem, but I suppose there should be a better way to parse lines with Windows-style line feeds.
Take a look at this answer. In short, the period (.) matches every character except \n in pretty much all regex implementations. Nothing to do with Replace in particular - you told it to remove any number of ., and that will slurp up \r as well.
Can't test now, but you might be able to rewrite it as (?<=parameter\s*=)[^\r\n]* to explicitly state which characters you want disallowed.
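The rewrite suggested above can be tried out as follows. This is a sketch rather than a tested drop-in; ParameterEditor and SetValue are names invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class ParameterEditor
{
    // Replaces everything after "parameter=" up to, but not including,
    // the line break, so an existing \r\n is kept.
    public static string SetValue(string text, int newValue) =>
        Regex.Replace(text, @"(?<=parameter\s*=)[^\r\n]*", newValue.ToString());

    static void Main()
    {
        string text = "parameter=120\r\n";
        string result = SetValue(text, 250);
        Console.WriteLine(result == "parameter=250\r\n"); // True
    }
}
```

Because the character class excludes both \r and \n, the same pattern behaves identically on Unix-style and Windows-style input.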
By default, . doesn't match \n. If you want it to match, you have to use single-line mode.
(?s)(?<=parameter\s*=).*
^
(?s) would toggle the single line mode
Try this:
string text = "parameter=120\r\n";
int newValue = 250;
text = Regex.Replace(text, @"(parameter\s*=).*\r\n", "${1}" + newValue.ToString() + "\n");
Final value of text:
parameter=250\n
Match carriage return and newline explicitly. Will only match lines ending in \r\n.
I'm trying to get paragraphs from a string in C# with Regular Expressions.
By paragraphs; I mean string blocks ending with double or more \r\n. (NOT HTML paragraphs <p>)...
Here is a sample text:
For example this is a paragraph with a carriage return here
and a new line here. At this point, second paragraph starts. A paragraph ends if double or more \r\n is matched or if reached at the end of the string ($).
I tried the pattern:
Regex regex = new Regex(@"(.*)(?:(\r\n){2,}|\r{2,}|\n{2,}|$)", RegexOptions.Multiline);
but this does not work. It matches every line ending with a single \r\n. What I need is to get all characters including single carriage returns and newline chars till reached a double \r\n.
.* is being greedy and consuming as much as it can. Your second group contains a $ alternative, so the expression that is effectively being used is (.*)($). To make the .* not greedy, follow it with a ?.
When you specify RegexOptions.Multiline, ^ and $ match at the start and end of each line, so $ stops the match at every line break. Use RegexOptions.Singleline instead, which lets . match \n so the entire input is treated as one string.
Regex regex = new Regex(@"(.*?)(?:(\r\n){2,}|\r{2,}|\n{2,}|$)", RegexOptions.Singleline);
An opposite approach will be to match the separators instead of the paragraphs, making the problem almost trivial. Consider:
string[] paragraphs = Regex.Split(text, @"^\s*$", RegexOptions.Multiline);
By splitting the input string on empty lines you can easily get all paragraphs. If you only want blank lines with no spaces, you can simplify that even further and use the pattern ^$. In that case you can also use the non-regex String.Split, with an array of separators:
string[] separators = {"\n\n", "\r\r", "\r\n\r\n"};
string[] paragraphs = text.Split(separators,
StringSplitOptions.RemoveEmptyEntries);
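As a sketch of the "match the separators" idea, the following splits on two or more consecutive line breaks, reusing the separator alternatives from the question's own pattern. It assumes the input does not mix line-ending styles inside one separator; ParagraphSplitter and Split are names chosen for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class ParagraphSplitter
{
    // Splits on two or more consecutive line breaks of the same style
    // (\r\n, \r or \n) and drops empty entries.
    public static string[] Split(string text) =>
        Regex.Split(text, @"(?:\r\n){2,}|\r{2,}|\n{2,}")
             .Where(p => p.Length > 0)
             .ToArray();

    static void Main()
    {
        string text = "For example this is a paragraph with a carriage return here\r\n" +
                      "and a new line here.\r\n\r\n" +
                      "At this point, second paragraph starts.";
        foreach (string p in Split(text))
            Console.WriteLine("[" + p + "]");
    }
}
```

Unlike String.Split with fixed separators, the {2,} quantifiers treat three or more line breaks in a row as a single separator, and single line breaks inside a paragraph are preserved.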
Do you have to use a regular expression? Tools like COCO/R could make this job pretty easy as well. In addition it might just prove to be faster than generating code at runtime using a regex.
COMPILER YourParaProcessor
// your code goes here
TOKENS
newLine= '\r'|'\n'.
paraLetter = ANY - '\n' - '\r' .
YourParaProcessor
=
{Paragraph}
.
Paragraph =
{paraLetter} '\r\n' .