How to replace "-" with " -" only when it's not preceded by "e"? - c#

I imagine I should use Regex for that but I still scratch my head a lot about it (and the only similar question I found wasn't exactly my case) so I decided to ask for help. This is the input and expected output:
Input: "c-0.68219,-0.0478 -1.01455-0.0441 0.9e-4,0.43212"
Output: "c -0.68219,-0.0478 -1.01455 -0.0441 0.9e-4,0.43212"
Basically I need either commas or spaces as value separators, but I can't break the exponential index (e‑4). Maybe do two successive replacements?

It's been some time for me, but you should be able to use something like this:
Regex rx = new Regex(@"([^e\s])-(\d)");
string output = rx.Replace(input, "$1 -$2");
Edit: This will add a space in front of the "-" even if it is preceded by a comma. Is there any reason not to do this?

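You can also use the following:
ResultString = Regex.Replace(MyString, "([^0-9,e,., ,e-])", "$1 ", RegexOptions.IgnoreCase);
Note that this only inserts a space after characters that are not digits, commas, periods, spaces, hyphens, or the letter e. It separates the leading "c" from the first number, but it does not split "-1.01455-0.0441".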
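For comparison, here is a minimal sketch of a lookbehind-based variant of the first answer. The class and method names are hypothetical and not from the thread, and it assumes the only characters that should block the inserted space are "e"/"E", a comma, or whitespace.

```csharp
using System.Text.RegularExpressions;

public static class SvgPathSpacing
{
    // Hypothetical helper (not from the thread): inserts a space before every
    // '-' unless the preceding character is 'e'/'E', a comma, or whitespace.
    public static string SeparateNegatives(string input)
    {
        // (?<![eE,\s]) is a negative lookbehind: it inspects the previous
        // character without consuming it, so no capture groups are needed.
        return Regex.Replace(input, @"(?<![eE,\s])-", " -");
    }
}
```

Applied to the input from the question, this should produce exactly the expected output, leaving ",-0.0478" and "0.9e-4" untouched. One caveat: a "-" at the very start of the string also gets a leading space, because there is no preceding character for the lookbehind to reject.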


Text files - how to programmatically mimic opening in Wordpad and overwriting as plain text [duplicate]

How can I replace lone instances of \n with \r\n (LF alone with CRLF) using a regular expression in C#?
I know how to do it using plain String.Replace, like:
myStr = myStr.Replace("\n", "\r\n");
myStr = myStr.Replace("\r\r\n", "\r\n");
However, this is inelegant, and would destroy any "\r+\r\n" already in the text (although they are not likely to exist).
It might be faster if you use this:
(?<!\r)\n
It looks for any \n that is not preceded by a \r. This would most likely be faster because, with a pattern like [^\r]\n, almost every character matches [^\r], so the engine would match that and then check for the \n after it. The lookbehind version only stops when it finds a \n, and then looks back to see whether a \r precedes it.
Will this do?
[^\r]\n
Basically it matches a '\n' that is preceded by a character that is not '\r'.
If you want it to detect lines that start with just a single '\n' as well, then try
([^\r]|^)\n
which says that it should match a '\n', but only those that are the first character of a line or those that are not preceded by '\r'.
There might be special cases to check: since you're messing with the definition of lines itself, the '^' might not work too well. But I think you get the idea.
EDIT: credit to @Kibbee. Using a lookbehind is clearly better since it won't capture the matched preceding character and should help with any edge cases as well. So here's a better regex, and the code becomes:
myStr = Regex.Replace(myStr, "(?<!\r)\n", "\r\n");
I was trying to apply the code below to a string and it was not working, because String.Replace treats its first argument as literal text rather than as a pattern:
myStr.Replace("(?<!\r)\n", "\r\n")
I used Regex.Replace and it worked:
Regex.Replace(oldValue, "(?<!\r)\n", "\r\n")
I guess that "myStr" is an object of type String; in that case, Replace does not use regex.
\r and \n are the equivalents of CR and LF.
My best guess is that if you know that you have a \n for EACH line, no matter what, then you should first strip out every \r and then replace all \n with \r\n.
The answer chakrit gives would also work, but then you need to use regex, and since you don't say what "myStr" is...
Edit: Looking at the other examples tells me one thing: why do things the difficult way when you can do them the easy way? Just because regex exists doesn't mean you must use it :D
Edit2: A tool is very valuable when fiddling with regex, XPath, and whatnot that gives you strange results. May I point you to: http://www.regexbuddy.com/
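myStr = Regex.Replace(myStr, "([^\r])\n", "$1\r\n");
In .NET the group reference in the replacement string is $1; some other regex engines use \1 instead. Note that String.Replace would treat the pattern as literal text, so Regex.Replace is needed here.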
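Try this: myStr = myStr.Replace(Char.ConvertFromUtf32(10), Char.ConvertFromUtf32(13) + Char.ConvertFromUtf32(10));
Here 10 is LF and 13 is CR. Be aware that this also turns any existing CRLF into CR CR LF, so it is only safe when the text contains no CRLF yet.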
If I know the line endings must be one of CRLF or LF, something that works for me is
myStr = Regex.Replace(myStr, "\r?\n", "\r\n");
This essentially does the same as neslekkiM's answer, except it performs only one replace operation on the string rather than two. It is also compatible with regex engines that don't support negative lookbehinds or backreferences.
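The lookbehind approach from the answers above can be wrapped up as a small sketch. The class and method names are hypothetical; the pattern is the one given in the thread.

```csharp
using System.Text.RegularExpressions;

public static class LineEndings
{
    // Hypothetical helper: converts lone LF to CRLF and leaves existing CRLF alone.
    public static string ToCrLf(string text)
    {
        // Regular (non-verbatim) string literal: C# turns \r and \n into the
        // actual CR and LF characters before the regex engine sees the pattern.
        return Regex.Replace(text, "(?<!\r)\n", "\r\n");
    }
}
```

Because the lookbehind does not consume the preceding character, this also handles a \n at the very start of the string and consecutive \n characters, which the capture-group variants can miss.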

Remove "[ ]" and everything within it, for all occurrences in a string

I'm building a small text clean-up program and I'm currently testing it on Wiki articles, and I'm trying to efficiently remove the "[2]", "[14]", "[nb 6]" etc.
I have this code which nearly does the job, but it seems overly long and I feel there must be a way to do it in one line. I'm new to Regex and can't figure it out. Also, I've read mixed opinions on Regex, so if there's an alternate way that'd be great.
Anyway, here is my current code:
string refinedText = Regex.Replace(sourceText, @"\[[0-9]\]", "");
refinedText = Regex.Replace(refinedText, @"\[[0-9]", "");
refinedText = Regex.Replace(refinedText, @"\[[a-z]", "");
refinedText = Regex.Replace(refinedText, @"[0-9]\]", "");
The issue is when there are 2 digits within the "[ ]": I don't know how to tell it to remove both, as "0-9" just removes the first digit. I can do the replace in 2 parts for them, but for instances of "[nb 3]" the "b" always remains, because once the "[ ]" are gone there is nothing left to identify the lone "b". "[nb 14]" has the same issue, with double digits after the "nb".
I'm sure this is simple to do in 1 line, but I can't find anywhere that explains regex to this extent.
-Thanks.
If you would like to delete the square brackets along with their content, no matter what that content is, the expression looks like this:
@"\[[^\]]*\]"
This means "match an opening bracket, then everything up to the closing bracket". It is usually more efficient than the dot with a reluctant quantifier, .*?, because the negated character class cannot run past the closing bracket and so avoids unnecessary backtracking.
Use the + quantifier:
string refinedText = Regex.Replace(sourceText, @"\[[0-9]+\]", "");
As the Regular Expression Language - Quick Reference explains:
Matches the previous element one or more times.
To remove any characters between the brackets:
string refinedText = Regex.Replace("[0as9]", @"\[.+\]", "");
Or if you want to also handle the "[]" case, then change + to *:
Matches the previous element zero or more times.
string refinedText = Regex.Replace("[0as9]", @"\[.*\]", "");
Note that .+ and .* are greedy, so on text with several bracketed groups they match from the first [ to the last ].
Try it like this:
string refinedText = Regex.Replace(sourceText, @"\[[0-9]+\]", "");
You may also try it like this:
var refinedText = Regex.Replace(sourceText, @" ?\[.*?\]", string.Empty);
This will remove the square brackets and everything inside them, including letters and numbers, along with one optional space in front of the opening bracket.
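A minimal sketch of the negated-character-class approach applied to the kind of text in the question. The class and method names and the sample sentence are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class CitationCleaner
{
    // Hypothetical helper: removes every "[...]" group, whatever it contains.
    public static string StripBracketed(string text)
    {
        // \[      an opening bracket
        // [^\]]*  any run of characters that are not a closing bracket
        // \]      the closing bracket
        return Regex.Replace(text, @"\[[^\]]*\]", "");
    }
}
```

With this pattern, "Water[2] boils[14] at 100 C[nb 6]." becomes "Water boils at 100 C.". By contrast, the greedy pattern \[.*\] applied to "a[1] b[2] c" removes "[1] b[2]" as a single match and leaves "a c".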

Format String : Parsing

I have a parsing question. I have a paragraph which has instances of :  word  . So basically it has a colon, two spaces, a word (could be anything), then two more spaces.
So when I have those instances I want to convert the string so that:
1. There is a new line character after the : and after the word.
2. The double space after the word is removed.
3. All double spaces are replaced with new line characters.
I don't know exactly how to go about this. I'm using C#. Item 2 above is the one I'm having a hard time with.
Thanks
Assuming your original string is exactly in the form you described, this will do:
var newString = myString.Trim().Replace("  ", "\n");
The Trim() removes leading and trailing whitespace, taking care of the spaces at the end of the string.
Then, the Replace replaces each remaining "  " (two space characters) with a "\n" new line character.
The result is assigned to the newString variable. This is needed because myString itself will not change: strings in .NET are immutable.
I suggest you read up on the String class and all its methods and properties.
You can try
var str = ":  first  :  second  ";
var result = Regex.Replace(str, ":\\s{2}(?<word>[a-zA-Z0-9]+)\\s{2}", ":\n${word}\n");
Using RegularExpressions will give you exact matches on what you are looking for.
The regex match for a colon, two spaces, a word, then two more spaces is:
Dim reg As New Regex(":  [a-zA-Z]*  ")
[a-zA-Z] will look for any character within the alphabetical range. You can append 0-9 as well if you accept digits within the word. The * afterwards indicates that there can be 0 or more instances of the preceding element.
[a-zA-Z]* will attempt to match the whole run of contiguous alphabetic characters.
Upon further reading, you may use \w in place of [a-zA-Z0-9] if that's what you are looking for. This will match any 'word' character (which also includes the underscore).
source: http://msdn.microsoft.com/en-us/library/ms972966.aspx
You can retrieve all the matches using reg.Matches(inputString).
Review http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace.aspx for more information on regular expression replacements and your options from there out
Edit: Before, I was using \s to search for spaces. That matches any whitespace character, including tabs, new lines and others. That is not what we want, so I reverted it to search for exact space characters.
You can use string.TrimEnd - http://msdn.microsoft.com/en-us/library/system.string.trimend.aspx - to trim spaces at the end of the string.
The following is an example using Regular Expressions. See also this question for more info.
Basically the pattern string tells the regex to look for a colon followed by two spaces. Then we save in a capture group named "word" whatever the word is surrounded by two spaces on either side. Finally two more spaces are specified to finish the pattern.
The replace uses a lambda which says for every match, replace it with a colon, a new line, the "lone" word, and another newline.
string Paragraph = "Jackdaws love my big sphinx of quartz:  fizz  The quick onyx goblin jumps over the lazy dwarf. Where:  buzz  The crazy dogs.";
string Pattern = @":  (?<word>\S*)  ";
string Result = Regex.Replace(Paragraph, Pattern, m =>
{
    // Read the captured word by its group name.
    var LoneWord = m.Groups["word"].Value;
    return ":" + Environment.NewLine + LoneWord + Environment.NewLine;
},
RegexOptions.IgnoreCase);
Input
Jackdaws love my big sphinx of quartz:  fizz  The quick onyx goblin jumps over the lazy dwarf. Where:  buzz  The crazy dogs.
Output
Jackdaws love my big sphinx of quartz:
fizz
The quick onyx goblin jumps over the lazy dwarf. Where:
buzz
The crazy dogs.
Note, for item 3 on your list, if you also want to replace individual occurrences of two spaces with newlines, you could do this:
Result = Result.Replace("  ", Environment.NewLine);

regex can't recognize "\n"?

So in the end (after a few days of debugging) I found the problem. It isn't in the regex at all :/ . It seems that I was trimming extra white space with
input = Regex.Replace(input, "\\s+", " ");
so all new lines were replaced with " ". Stupid! Moderator, please remove this if unnecessary!
I have regexp for tokenizing some text and it looks like this :
"(?<html>Ç)|
(?<number>\\d+(?:[.]\\d+)?(?=[][ \f\n\r\t\v!?.,():;\"'„Ç]|$))|
(?<other>(?:[^][Ç \f\n\r\t\v!?.,():;\"'„A-Za-zčćšđžČĆŠĐŽäöÖü][^ Ç\f\n\r\t\vA-Za-zčćšđžČĆŠĐŽäöÖü]*)?[^][ Ç\f\n\r\t\v!?.,():;\"'„A-Za-zčćšđžČĆŠĐŽäöÖü](?=[][!?.,():;\"'„]*(?:$|[ Ç\f\n\r\t\v])))|
(?<word>(?:[^][ Ç\f\n\r\t\v!?.,():;\"'„][^ Ç\f\n\r\t\v]*)?[^][ Ç\f\n\r\t\v!?.,():;\"'„])|
(?<punctuation>[][ \f\n\r\t\v!?.,():;\"'„])"
The problem is in this part: (?<punctuation>[][ \f\n\r\t\v!?.,():;\"'„]). When I'm parsing text with the input "\n\n", the matches in the punctuation group are " " and " " (in other words, space and space), and I don't know why.
I could be wrong, but you need to hand the pattern to the Regex as a string, which means you need to escape the backslashes.
... (?=[][ \\f\\n\\r\\t\\v!?.,():;\\" ...
Otherwise C# will replace \n with a line break within the regex statement.
Edit: It's also possible to use verbatim string literals, but they need to be marked with a leading @ (see Martin's answer).
If you put an @ in front of the string you can use single backslashes, and line breaks will be recognized.
@"(?<html>Ç)|
Set RegexOptions.IgnorePatternWhitespace
Update:
Are you sure [^] is correct? Unless it's some kind of character group (that I have never used), that will be the same as . (any character). The same goes for []. Perhaps I just have not used all of regex before :p
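The cause the asker eventually found (collapsing whitespace with \s+ before tokenizing) can be illustrated with a minimal sketch. The class and method names are hypothetical, and the second method assumes that only spaces and tabs should be collapsed.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceCleanup
{
    // The asker's original approach: \s also matches \r and \n,
    // so every line break is turned into a single space.
    public static string CollapseAllWhitespace(string input)
    {
        return Regex.Replace(input, @"\s+", " ");
    }

    // Alternative sketch: collapse runs of spaces and tabs only,
    // leaving line breaks intact for a later tokenizing step.
    public static string CollapseSpaces(string input)
    {
        return Regex.Replace(input, @"[ \t]+", " ");
    }
}
```

For the input "a  b\n\nc", the first method returns "a b c" while the second returns "a b\n\nc", which is why the tokenizer only ever saw spaces.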

matching string in C#

Alright, so I need to be able to match strings in ways that give me more flexibility.
Here is an example. If I had the string "This is my random string", I would want some way to make
" *random str* ",
" *is __ ran* ",
" *is* ",
" *this is * string ",
all match it. I think at this point a simple true or false would be okay to indicate whether it matches or not, but I'd like basically * to be any length of any characters, also that _ would match any one character. I can't think of a way, although I'm sure there is one, so if possible, could answers please contain code examples? Thanks in advance!
I can't quite figure out what you're trying to do, but in response to:
but I'd like basically * to be any length of any characters, also that _ would match any one character
In regex, you can use . to match any single character and .+ to match any number of characters (at least one), or .* to match any number of characters (or none if necessary).
So your *is __ ran* example might turn into the regex .+is .. ran.+, whilst this is * string could be this is .+ string.
If this doesn't answer your question then you might want to try re-wording it to make it clearer.
For learning more regex syntax, a popular site is regular-expressions.info, which provides pretty much everything you need to get started.
Use Regular Expressions.
In C#, you would use the Regex class.
For example:
var str = "This is my random string";
Console.WriteLine(Regex.IsMatch(str, ".*is .. ran.*")); //Prints "True"
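To accept the asker's own * and _ syntax directly, one option is to translate the wildcard into a regex first. This is a minimal sketch with hypothetical names; it assumes matching should be case-insensitive (since "this is * string" is expected to match "This is ...") and that the whole input must match.

```csharp
using System.Text.RegularExpressions;

public static class WildcardMatcher
{
    // Hypothetical helper: '*' matches any run of characters (including none),
    // '_' matches exactly one character; everything else is literal.
    public static bool IsMatch(string input, string wildcard)
    {
        string pattern = "^" + Regex.Escape(wildcard)
            .Replace(@"\*", ".*")   // Regex.Escape turns * into \*
            .Replace("_", ".")      // _ is not escaped by Regex.Escape
            + "$";
        // Singleline lets '.' match line breaks too.
        return Regex.IsMatch(input, pattern,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

With this, all four example patterns from the question (without their surrounding spaces) return true for "This is my random string", and a pattern such as "*xyz*" returns false. Regex.Escape keeps any other regex metacharacters in the wildcard from being interpreted as regex syntax.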
