I have two simple questions about regular expressions.
Given the string $10/$50, I want to get the 50, which will always be at the end of the string, so I wrote: ([\\d]*$)
Given the string 50c/70c, I want to get the 70, which will always be at the end of the string (I want it without the c), so I wrote: ([\\d]*)c$
Both seem to do what I want, but I would actually like to do two more things:
a) I'd like to put both in the same expression (is that possible?). I tried using |, but it didn't seem to work.
b) If a) is indeed possible, I'd like to know whether it's possible to format the text. As you can see, for both dollars and cents the regular expression retrieves the value shown in the string. But while the first case deals with dollars, the second deals with cents, so I'd like to transform 50 cents into 0,5. Is that possible, or will I have to code it myself?
To match both cases, you're basically saying that the "c" is optional. So use "?", which means "zero or one match of the preceding character". This should give you the following:
([\d]*)c?$
Hope that helps.
(a) is easy:
(\d+)c?$
(b) you can't do with regular expressions.
Providing these are Perl-style regexes (I don't actually know C#/.net):
(\d+(?=c$)|(?<=\$)\d+$)
Formatting the text would probably be something to do outside of the regex matching.
(\$?[\d]+)c?
This will match both in one hit from a paragraph. You can try it in the tool below with the example text that follows:
http://gskinner.com/RegExr/
Example Text
This is the string 50c/70c $10/$50
This is the string 50c/70c $9/$50
This is the string 50c/70c $8/$50
This is the string 50c/70c $7/$50
This is the string 50c/70c $6/$50
This is the string 50c/70c $5/$50
Your existing regexes can be simplified:
([\d]*$) ([\d]*)c$
You don't need the square brackets:
(\d*$) (\d*)c$
But I'd recommend that you require at least one digit in your number, so use + instead of *:
(\d+$) (\d+)c$
Now you can join the two together:
(\d+)(c?)$
I wouldn't recommend doing the calculation inside the regex.
We captured the 'c' in the second pair of parentheses, so we can work with this information.
This is what the whole thing would look like in Perl; please adapt as appropriate:
if( m/(\d+)(c?)$/ ) {
    if( $2 eq 'c' ) {
        $dollars = $1/100;
    } else {
        $dollars = $1;
    }
    print "$_ are $dollars dollars\n";
}
As I commented above, the calculation could be done in a regex substitution:
s/.*?(\d+)(c?)$/$1*($2?0.01:1)/e
but that might be a bit obfuscated.
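For C#, the Perl above might be adapted along these lines. This is only a sketch: the helper name TrailingAmountInDollars is made up for illustration, and it assumes the amount is always a whole number at the very end of the string.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answers): returns the trailing
// amount expressed in dollars, or null when the string has no trailing number.
decimal? TrailingAmountInDollars(string input)
{
    // Group 1: the digits. Group 2: an optional trailing "c" marking cents.
    Match m = Regex.Match(input, @"(\d+)(c?)$");
    if (!m.Success) return null;

    decimal value = decimal.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);

    // The arithmetic happens in code, not in the regex.
    return m.Groups[2].Value == "c" ? value / 100m : value;
}

Console.WriteLine(TrailingAmountInDollars("$10/$50"));
Console.WriteLine(TrailingAmountInDollars("50c/70c"));
```

Whether the result is displayed with a decimal comma or a decimal point is then a formatting decision (the culture used when converting the decimal to text), not something the regex has to handle.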
Others have pointed out that (\d+)(c?)$ satisfies a).
But I see no reason why you can't follow that with substitution statements to do the formatting:
s/(\d)0c/0,$1/
s/(\d\d)c/0,$1/
s/(\d)c/0,0$1/
I wrote a small program in C# to capture in-game text.
My issue is that the text also contains color codes, which I'd like to strip out. I read about the Regex.Replace function,
which I think is going to be suitable for that.
I have the following string (line) that I want to clean up. I used the little tool Espresso to play with regular expressions a bit, but I never really figured it out.
This is the string I am going to work with:
|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R
I tried to use ^|( [a-zA-Z0-9]{9})
which gave me these matches:
c001177ff
cff00AA00
cff00AA00
cff00AA00
cffff69b4
cff00AA00
cff40e0d0
cffffff00
cffffff00
cff40e0d0
cffff69b4
cff00AA00
Well, I am not good at regex; I have only just started. I'm not asking anybody to present a complete solution (though you are more than welcome to), just a little help with how I can solve this. I want to filter the text.
Input code:
|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R
It should be filtered to this:
Save Code = AGQg R9$# 4fR
I think these are hexadecimal color codes: |c marks the beginning and |r the end of a colored string. I think the "|r |" sequence just means that the first colored string ends, then we get a space, and the next | marks the start of the next one.
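How about a simple LINQ query?
// Note: this keeps only the single character that follows each 9-character color code,
// so the "Save Code =" label itself is dropped from the output.
var output = String.Join("", input.Split('|')
                                  .Select(s => s.Length != 10 ? ' ' : s.Last()))
                   .Trim();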
So I think the problem you were having was not escaping your |... the following regex works for me:
var replaced = Regex.Replace(input, @"\|c[0-9a-zA-Z]{8}|\|r", "");
\|c[0-9a-zA-Z]{8} - match starting with "|c" and then any 8 letters or numbers
| - or
\|r - match "|r"
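As a quick check, here is a small, self-contained sketch that runs this kind of replacement against the string from the question. One assumption on my part: the character class is narrowed to hex digits ([0-9a-fA-F]), since the codes look like hexadecimal colors; the [0-9a-zA-Z] version above gives the same result on this input.

```csharp
using System;
using System.Text.RegularExpressions;

// The sample line from the question, split across two literals for readability.
string input = "|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r "
             + "|cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R";

// \|c[0-9a-fA-F]{8}  a literal "|c" followed by exactly eight hex digits
// \|r                a literal "|r"
string cleaned = Regex.Replace(input, @"\|c[0-9a-fA-F]{8}|\|r", "");

Console.WriteLine(cleaned);
```

Because the quantifier is exactly {8}, a visible character that happens to be a hex digit (such as the 9, 4 and f in the sample) is left in place.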
You're on the right track. Your regex
^|( [a-zA-Z0-9]{9})
has two problems. The ^ start-of-line anchor ties the match to the start of your input string, and the | needs to be escaped: unescaped, it is the alternation ("or") operator, which completely changes the meaning of your regex.
In addition, the space after the | is unwanted, and the capture group is unnecessary, as you only want to eliminate this portion.
If you replace all instances of this
\|[a-zA-Z0-9]{9}
with nothing (the empty string)
You will achieve most of your goal. Try it here: http://regex101.com/r/rF6yB6/1
But it seems you really want to eliminate not exactly nine characters after the pipe, but up to nine characters. So use the {1,9} range quantifier instead:
\|[a-zA-Z0-9]{1,9}
Try it: http://regex101.com/r/rF6yB6/2
This seems to achieve your goal exactly.
Please consider bookmarking the Stack Overflow Regular Expressions FAQ for future reference.
string input = "[The example input from your question]";
string output = input.Replace("|r", "");
while (output.Contains("|c"))
    output = output.Remove(output.IndexOf("|c"), 10);
// output = "Save Code = AGQg R9$# 4fR"
I like this much more than using Regexes just because it's so much more clear to me.
var str1 = "|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R";
var str2 = Regex.Replace(str1, @"\|(r|[a-zA-Z0-9]{9})", ""); // "Save Code = AGQg R9$# 4fR"
In addition to this answer re: escaping the "pipe" character, you're starting your regex with the caret (^) character. This matches the beginning of a line.
A correct regex would be:
\|c[0-9a-zA-Z]{8}
This regex should match all of the characters you want to remove:
([|]c([0-9]|[a-f]|[A-F]){8})|[|]r
Here's the breakdown...
The vertical pipe is an OR marker, so to search for it literally, place it in square brackets: [|].
The parentheses form a group. So you're searching for ([|]c([0-9]|[a-f]|[A-F]){8}) OR [|]r, which is all of your color codes OR |r.
The color-code group begins with |c and is followed by exactly 8 characters, each of which can be 0 through 9, a through f, or A through F.
I tested it at RegexPal.com.
I have a regular expression designed to extract a number from between two parentheses. It had been working fine until we made the input string customizable. Now, if a number is found somewhere else in the string, the last number is taken. My expression is below:
int icorrespid = Convert.ToInt32(Regex.Match(subject, @"(\d+)(?!.#\d)", RegexOptions.RightToLeft).Value);
If I send the string This (12) is a test, it works fine, extracting the 12. However, if I send This (12) is a test2, the result is 2. I realize I can change the RightToLeft to LeftToRight, which will fix this instance, but I only want to get the number between the parentheses.
I am sure this will be easy for anyone with any regular expression experience (which is obviously not me). I am hoping you could show me how to correct this to get what I want, but also give a brief explanation of what I am doing wrong so I can hopefully improve.
Thank you.
Additional Information
I appreciate all of the responses. I have taken the agreed upon advice, and tried each of these formats:
int icorrespid = Convert.ToInt32(Regex.Match(subject, @"(\(\d+\))(?!.#\d)", RegexOptions.RightToLeft).Value);
int icorrespid = Convert.ToInt32(Regex.Match(subject, @"(\(\d+\))", RegexOptions.RightToLeft).Value);
int icorrespid = Convert.ToInt32(Regex.Match(subject, @"\(\d+\)", RegexOptions.RightToLeft).Value);
Unfortunately, with each, I get an exception stating that the input string was not in a correct format. I did not get that before. I'm sure that I could resolve this without using a regular expression in a minute or two, but my stubbornness has kicked in.
Thank you everyone for your comments.
You need to escape parentheses in a regex, because they have a special meaning:
@"(\(\d+\))(?!.#\d)"
or, if you didn't actually intend your number to be caught in a group:
@"\(\d+\)(?!.#\d)"
Try this regular expression:
\(#(\d+)\)
The brackets are escaped \( and \) and inside them is the normal search for numbers.
If you use the .Value property, it will give you the number surrounded by brackets. Instead you need to use the Groups collection. So to use it in your code, you do this (now with added error checking!):
var match = Regex.Match("hgf", @"\(#(\d+)\)", RegexOptions.RightToLeft).Groups[1].Value;
if (!string.IsNullOrEmpty(match))
{
    var icorrespid = Convert.ToInt32(match);
}
else
{
    // No match found
}
Use:
\(\d+\)(?!.#\d)
( and ) are reserved characters that delimit a capture group, so literal parentheses have to be escaped.
Parentheses have a meaning in regex, so you need to escape them:
\(\d+\)
The actual meaning is to create a capture group, so if you're relying on a capture group in your code, you need another pair of parentheses like this:
\((\d+)\)
I'm not quite sure what the purpose of the (?!.#\d) part is from your question, but if you do need it, you can leave it where it is (just append it to the end of either of the versions above)
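The "input string was not in a correct format" exception from the follow-up attempts is consistent with this: once the parentheses are matched literally, .Value is "(12)" rather than "12", and Convert.ToInt32 cannot parse that (it also throws on the empty string returned when nothing matches). A minimal sketch that reads the capture group instead; the helper name NumberInParentheses is hypothetical, and the (?!.#\d) part is left out because its purpose is unclear:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the last parenthesised number, or null if there is none.
int? NumberInParentheses(string subject)
{
    Match m = Regex.Match(subject, @"\((\d+)\)", RegexOptions.RightToLeft);
    if (!m.Success) return null;

    // m.Value would be "(12)"; Groups[1] holds only the digits.
    return int.Parse(m.Groups[1].Value);
}

Console.WriteLine(NumberInParentheses("This (12) is a test"));
Console.WriteLine(NumberInParentheses("This (12) is a test2"));
```

Checking m.Success before parsing avoids the exception when the subject contains no parenthesised number at all.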
Been scratching my head all day about this one!
Ok, so I have a string which contains the following:
?\"width=\"1\"height=\"1\"border=\"0\"style=\"display:none;\">');
I want to convert that string to the following:
?\"width=1height=1border=0style=\"display:none;\">');
I could theoretically just do a String.Replace on "\"1\"" etc. But this isn't really a viable option as the string could theoretically have any number within the expression.
I also thought about removing the string "\"", however there are other occurrences of this which I don't want to be replaced.
I have been attempting to use the Regex.Replace method as I believe this exists to solve problems along my lines. Here's what I've got:
chunkContents = Regex.Replace(chunkContents, "\".\"", ".");
Now that really messes things up (it replaces the correct elements, but with a full stop), but I think you can see what I am attempting to do with it. I am also worried that this will only work for single digits (\"1\" rather than \"11\"). That led me to think about using "*" or "+" rather than ".". However, I foresaw the problem of this picking up all of the text in between the desired characters (which are dotted all over the place), whereas I obviously only want to replace the ones with numeric characters in between them.
Hope I've explained that clearly enough, will be happy to provide any extra info if needed :)
Try this
var str = "?\"width=\"1\"height=\"1234\"border=\"0\"style=\"display:none;\">');";
str = Regex.Replace(str , "\"(\\d+)\"", "$1");
(\\d+) is a capturing group that looks for one or more digits and $1 references what the group captured.
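To see why the original attempt produced full stops: in Regex.Replace the replacement argument is literal text (apart from $-substitutions such as $1), so "." inserts an actual dot. A small sketch contrasting the two calls; the helper name StripQuotesAroundNumbers is made up for illustration, and it assumes the backslashes in the question are C# escapes rather than literal characters in the data:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes the quotes around purely numeric values only.
string StripQuotesAroundNumbers(string s) => Regex.Replace(s, "\"(\\d+)\"", "$1");

string sample = "?\"width=\"1\"height=\"1234\"border=\"0\"style=\"display:none;\">');";

// Original attempt: "." in the pattern matches any one character,
// but "." in the replacement is just a literal dot.
Console.WriteLine(Regex.Replace(sample, "\".\"", "."));

// \d+ restricts the match to digits, and $1 puts the captured digits back.
Console.WriteLine(StripQuotesAroundNumbers(sample));
```

The style attribute is left alone because its value contains no run of digits enclosed directly in quotes.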
This works:
String input = @"?\""width=\""1\""height=\""1\""border=\""0\""style=\""display:none;\"">');";
// replace the entire match of the regex with only what's captured (the number)
String result = Regex.Replace(input, @"\\""(\d+)\\""", match => match.Result("$1"));
// control string for the expected result
String shouldBe = @"?\""width=1height=1border=0style=\""display:none;\"">');";
// prints True
Console.WriteLine(result.Equals(shouldBe).ToString());
I need some help on a problem.
I'm trying to check an image's type by its hexadecimal signature.
string JpgHex = "FF-D8-FF-E0-xx-xx-4A-46-49-46-00";
Then I have a condition on
string.StartsWith(pngHex).
The problem is that the "x" characters in my "JpgHex" string can be anything.
I think I need a regex to check that, but I don't know how!
Thanks a lot!
I'm not quite clear what exactly you want to do, but the dot '.' character represents any character in Regex.
So the regex "^FF-D8-FF-E0-..-..-4A-46-49-46-00" will probably do the trick. '^' = Start of input.
If you want to allow only hex chars you can use "^FF-D8-FF-E0-[0-9A-F]{2}-[0-9A-F]{2}-4A-46-49-46-00".
Like I said, I'd need a better idea of what pattern you need to match.
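A minimal sketch of how the hex-only pattern could stand in for the StartsWith check. It assumes the hex string comes from BitConverter.ToString on the file's first bytes (the question does not say how it was built), and the helper name LooksLikeJfif is made up. There is no $ anchor, so anything after the signature is ignored, as with StartsWith.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true when the bytes begin with FF D8 FF E0 ?? ?? 4A 46 49 46 00.
bool LooksLikeJfif(byte[] header)
{
    // BitConverter.ToString yields upper-case hex pairs separated by '-', e.g. "FF-D8-FF".
    string hex = BitConverter.ToString(header);
    return Regex.IsMatch(hex, "^FF-D8-FF-E0-[0-9A-F]{2}-[0-9A-F]{2}-4A-46-49-46-00");
}

byte[] jpegStart = { 0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10, 0x4A, 0x46, 0x49, 0x46, 0x00, 0x01, 0x01 };
byte[] pngStart  = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

Console.WriteLine(LooksLikeJfif(jpegStart));
Console.WriteLine(LooksLikeJfif(pngStart));
```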
Here are some examples:
Regex rgx =
    new Regex(@"^FF-D8-FF-E0-[a-zA-Z0-9]{2}-[a-zA-Z0-9]{2}-4A-46-49-46-00$");
rgx.IsMatch(pngHex); // IsMatch returns a bool.
I use [a-zA-Z0-9]{2} to denote two instances of a character, caps or small or a number. So the above regex would match :
FF-D8-FF-E0-aa-zZ-4A-46-49-46-00
FF-D8-FF-E0-11-22-4A-46-49-46-00
.. etc
Change the regex according to your needs: for capitals and numbers only, change the class to [A-Z0-9]. The {2} denotes two occurrences.
The ^ means the string must start with FF, and $ means the string must end with 00.
Let's say you wanted to match only two digits; you would use \d{2}, and the whole thing would look like this:
Regex rgx = new Regex(@"^FF-D8-FF-E0-\d{2}-\d{2}-4A-46-49-46-00$");
rgx.IsMatch(pngHex);
How do I know of these magical characters? Simple, there are docs everywhere. See this MSDN page for some basic regex patterns. This page shows some quantifiers, those are things like match one or more or match only one.
Cheat-sheets also come in handy.
A regex would help you; you can use the following tool to help you test and learn:
http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
I recommend you have a play because then you'll learn!
To simply match any character in place of the x, the following should work:
"^FF-D8-FF-E0-..-..-4A-46-49-46-00$"
In C#, it would be something like this:
var test = "FF-D8-FF-E0-AB-CD-4A-46-49-46-00";
var foo = new Regex("^FF-D8-FF-E0-..-..-4A-46-49-46-00$");
if (foo.IsMatch(test))
{
    // Do magic
}
You will need to read up on regular expressions to understand some of the characters that may not look familiar, i.e. ^ and $. See http://www.regular-expressions.info/
I've created the following regex pattern in an attempt to match a string 6 characters in length ending in either "PRI" or "SEC", unless the string = "SIGSEC". For example, I want to match ABCPRI, XYZPRI, ABCSEC and XYZSEC, but not SIGSEC.
(\w{3}PRI$|[^SIG].*SEC$)
It is very close and sort of works (if I pass in "SINSEC", it returns a partial match on "NSEC"), but I don't have a good feeling about it in its current form. Also, I may have a need to add more exclusions besides "SIG" later and realize that this probably won't scale too well. Any ideas?
BTW, I'm using System.Text.RegularExpressions.Regex.Match() in C#
Thanks,
Rich
Assuming your regex engine supports negative lookaheads, try this:
((?!SIGSEC)\w{3}(?:SEC|PRI))
Edit: A commenter pointed out that .NET does support negative lookaheads, so this should work fine (thanks, Charlie).
To help break down Dan's (correct) answer, here's how it works:
(                 // outer capturing group to bind everything
  (?!SIGSEC)      // negative lookahead: a match only works if "SIGSEC" does not appear next
  \w{3}           // exactly three "word" characters
  (?:             // non-capturing group - we don't care which of the following things matched
    SEC|PRI       // either "SEC" or "PRI"
  )
)
All together: ((?!SIGSEC)\w{3}(?:SEC|PRI))
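A short sketch to try the lookahead against the strings from the question. The ^ and $ anchors are my addition, on the assumption that the whole string must be exactly six characters; without them the pattern can also match inside a longer string.

```csharp
using System;
using System.Text.RegularExpressions;

// Anchored variant of the pattern above; the outer capturing group is dropped
// because only a yes/no answer is needed here.
var pattern = new Regex(@"^(?!SIGSEC)\w{3}(?:SEC|PRI)$");

foreach (string candidate in new[] { "ABCPRI", "XYZSEC", "SINSEC", "SIGSEC" })
{
    Console.WriteLine(candidate + ": " + pattern.IsMatch(candidate));
}
```

With the anchors in place, "SINSEC" is accepted as a whole instead of producing the partial "NSEC" match described in the question.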
You can try this one:
@"\w{3}(?:PRI|(?<!SIG)SEC)"
Matches 3 "word" characters
Matches PRI or SEC (but not after SIG, i.e. SIGSEC is excluded). (?<!x)y is a negative lookbehind: it matches y only if it is not preceded by x.
Also, I may have a need to add more exclusions besides "SIG" later and realize that this probably won't scale too well
Using my code, you can easily add further exceptions; for example, the following pattern excludes SIGSEC and FOOSEC:
@"\w{3}(?:PRI|(?<!SIG|FOO)SEC)"
Why not use more readable code? In my opinion this is much more maintainable.
private Boolean HasValidEnding(String input)
{
    if (input.EndsWith("SEC", StringComparison.Ordinal) || input.EndsWith("PRI", StringComparison.Ordinal))
    {
        if (!input.Equals("SIGSEC", StringComparison.Ordinal))
        {
            return true;
        }
    }
    return false;
}
or in one line
private Boolean HasValidEnding(String input)
{
    return (input.EndsWith("SEC", StringComparison.Ordinal) || input.EndsWith("PRI", StringComparison.Ordinal)) && !input.Equals("SIGSEC", StringComparison.Ordinal);
}
It's not that I don't use regular expressions, but in this case I wouldn't use them.
Personally, I'd be inclined to build up the exclusion list in a separate variable and then include it in the full expression. It's the approach I've used in the past when I had to build any complex expression.
Something like exclude = 'someexpression'; prefix = 'list of prefixes'; suffix = 'list of suffixes'; expression = '{prefix}{exclude}{suffix}';
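In C# that idea might look roughly like the sketch below. The exclusion list, the variable names, and the choice of a negative lookahead to apply it are all illustrative assumptions, not something this answer prescribes.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical exclusion list; extend it as new exceptions turn up.
string[] excluded = { "SIGSEC", "FOOSEC" };

// Escape each entry so it is treated as literal text, then join with "|".
string exclude = string.Join("|", excluded.Select(word => Regex.Escape(word)));

// Reject any whole string on the exclusion list, then require
// three word characters followed by PRI or SEC.
var pattern = new Regex(@"^(?!(?:" + exclude + @")$)\w{3}(?:PRI|SEC)$");

foreach (string candidate in new[] { "ABCPRI", "ABCSEC", "SIGSEC", "FOOSEC" })
{
    Console.WriteLine(candidate + ": " + pattern.IsMatch(candidate));
}
```

Keeping the list in one array means adding an exclusion is a data change rather than an edit to the pattern itself.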
You may not even want to do the exclusions in the regex. For example, if this were Perl (I don't know C#, but you can probably follow along), I'd do it like this:
if ( ( $str =~ /^\w{3}(?:PRI|SEC)$/ ) && ( $str ne 'SIGSEC' ) )
That keeps the intent clear. It does exactly what you wanted:
The string is three word characters followed by PRI or SEC, and
it's not SIGSEC.
Nobody says you have to force everything into one regex.