Regex to match against something that is not a specific substring - c#

I am looking for a regex that will match a string that starts with one substring and does not end with a certain substring.
Example:
// Updated to be correct, thanks @Apocalisp
^foo.*(?<!bar)$
Should match anything that starts with "foo" and doesn't end with "bar". I know about the [^...] syntax, but I can't find anything that will do that for a string instead of single characters.
I am specifically trying to do this for Java's regex, but I've run into this before so answers for other regex engines would be great too.
Thanks to @Kibbee for verifying that this works in C# as well.

I think in this case you want negative lookbehind, like so:
foo.*(?<!bar)

Verified @Apocalisp's answer using:
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
Pattern p = Pattern.compile("^foo.*(?<!bar)$");
System.out.println(p.matcher("foobar").matches());
System.out.println(p.matcher("fooBLAHbar").matches());
System.out.println(p.matcher("1foo").matches());
System.out.println(p.matcher("fooBLAH-ar").matches());
System.out.println(p.matcher("foo").matches());
System.out.println(p.matcher("foobaz").matches());
}
}
This outputs the right answers:
false
false
false
true
true
true

I'm not familiar with Java regex, but the documentation for the Pattern class suggests you could use (?!X) for a zero-width negative lookahead (it checks that what follows at that position is not X, without capturing it as a backreference). So you could do:
foo.*(?!bar) // not correct
Update: Apocalisp's right, you want negative lookbehind. (you're checking that what the .* matches doesn't end with bar)
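To see why the lookahead version does not reject anything, here is a small Java sketch (the class and method names are made up for illustration). With (?!bar) placed just before the end anchor, the check runs at the end of the input, where nothing follows, so it always succeeds. (?<!bar) instead looks back at what .* has just consumed.

```java
import java.util.regex.Pattern;

final class LookaroundComparison {
    // Lookahead at the end: nothing follows the end of input, so it never rejects.
    static final Pattern LOOKAHEAD = Pattern.compile("^foo.*(?!bar)$");
    // Lookbehind at the end: inspects the characters immediately before the end.
    static final Pattern LOOKBEHIND = Pattern.compile("^foo.*(?<!bar)$");

    static boolean lookahead(String s) {
        return LOOKAHEAD.matcher(s).matches();
    }

    static boolean lookbehind(String s) {
        return LOOKBEHIND.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"foobar", "foobaz", "1foo"}) {
            System.out.println(s + ": lookahead=" + lookahead(s)
                    + ", lookbehind=" + lookbehind(s));
        }
    }
}
```

For "foobar" the lookahead pattern still matches, while the lookbehind pattern does not.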

A negative lookahead can also do this. In Java you can use this pattern:
"^first_string(?!.*second_string\\z).*\\z"
^ - ensures that the string starts with first_string
(?!.*second_string\z) - ensures that what follows first_string does not end with second_string
.*\z - matches the rest of the string up to its very end
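A lookahead-based check of this kind can be sketched in Java as follows. This is only an illustration: the class name and the build helper are hypothetical, and Pattern.quote is used on the assumption that both substrings should be treated as literal text.

```java
import java.util.regex.Pattern;

final class PrefixNotSuffix {
    // Builds ^prefix(?!.*suffix\z).*\z with both parts quoted as literals.
    static Pattern build(String prefix, String suffix) {
        return Pattern.compile(
                "^" + Pattern.quote(prefix)
                + "(?!.*" + Pattern.quote(suffix) + "\\z)"
                + ".*\\z");
    }

    static boolean accepts(String prefix, String suffix, String input) {
        return build(prefix, suffix).matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"foobar", "fooBLAHbar", "foo", "foobaz"}) {
            System.out.println(s + " -> " + accepts("foo", "bar", s));
        }
    }
}
```

Unlike the lookbehind form, the lookahead here only inspects the text that comes after the prefix.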

length in regular expression [duplicate]

I need a regex that will only find matches where the entire string matches my query.
For instance if I do a search for movies with the name "Red October" I only want to match on that exact title (case insensitive) but not match titles like "The Hunt For Red October". Not quite sure I know how to do this. Anyone know?
Thanks!
Try the following regular expression:
^Red October$
By default, regular expressions are case sensitive. The ^ marks the start of the matching text and $ the end.
Generally, and with default settings, ^ and $ anchors are a good way of ensuring that a regex matches an entire string.
A few caveats, though:
If you have alternation in your regex, be sure to enclose your regex in a non-capturing group before surrounding it with ^ and $:
^foo|bar$
is of course different from
^(?:foo|bar)$
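The difference is easy to check with a few lines of Java (the class and helper names are only for illustration). Without the group, ^ binds to foo and $ binds to bar, so the pattern means "starts with foo, or ends with bar".

```java
import java.util.regex.Pattern;

final class AlternationAnchors {
    // True if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^foo|bar$", "foobaz"));      // ^foo alone is enough
        System.out.println(found("^(?:foo|bar)$", "foobaz"));  // whole string must be foo or bar
        System.out.println(found("^(?:foo|bar)$", "bar"));
    }
}
```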
Also, ^ and $ can take on a different meaning (start/end of line instead of start/end of string) if certain options are set. In text editors that support regular expressions, this is usually the default behaviour. In some languages, especially Ruby, this behaviour cannot even be switched off.
Therefore there is another set of anchors that are guaranteed to only match at the start/end of the entire string:
\A matches at the start of the string.
\Z matches at the end of the string or before a final line break.
\z matches at the very end of the string.
But not all languages support these anchors, most notably JavaScript.
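Here is a rough Java sketch of how these anchors behave (names are illustrative only). Java supports \A, \Z and \z, and Pattern.MULTILINE is the option that makes ^ and $ match at line boundaries.

```java
import java.util.regex.Pattern;

final class StringAnchors {
    // True if the regex, compiled with the given flags, matches anywhere in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String twoLines = "The Hunt For\nRed October";
        // In multiline mode ^ and $ match at line boundaries...
        System.out.println(found("^Red October$", Pattern.MULTILINE, twoLines));
        // ...while \A and \z still refer to the whole input.
        System.out.println(found("\\ARed October\\z", Pattern.MULTILINE, twoLines));
        // \Z tolerates one final line break, \z does not.
        System.out.println(found("\\ARed October\\Z", 0, "Red October\n"));
        System.out.println(found("\\ARed October\\z", 0, "Red October\n"));
    }
}
```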
I know that this may be a little late to answer this, but maybe it will come in handy for someone else.
Simplest way:
var someString = "...";
var someRegex = "...";
var match = Regex.Match(someString, someRegex);
if (match.Success && match.Value.Length == someString.Length) {
//pass
} else {
//fail
}
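For comparison, here is a sketch of the same idea in Java, where Matcher.matches() already requires the whole input to match. The helper names are made up. The second helper ports the length comparison above, to show one case where the two approaches differ: the first match found is not always the longest one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WholeStringMatch {
    // matches() succeeds only if the pattern can cover the entire input.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // Port of the length-comparison approach: first match must span the input.
    static boolean firstMatchSpansInput(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        return m.find() && m.group().length() == input.length();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("Red October", "red october"));
        System.out.println(wholeMatch("Red October", "The Hunt For Red October"));
        // With alternation, find() stops at "a", so the length check fails
        // even though the pattern can match all of "ab".
        System.out.println(firstMatchSpansInput("a|ab", "ab"));
        System.out.println(wholeMatch("a|ab", "ab"));
    }
}
```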
Use the ^ and $ anchors to denote where the regex pattern sits relative to the start and end of the string:
Regex.Match("Red October", "^Red October$"); // pass
Regex.Match("The Hunt for Red October", "^Red October$"); // fail
You need to enclose your regex in ^ (start of string) and $ (end of string):
^Red October$
If the string may contain regex metacharacters (. { } ( ) $ etc.), I suggest using
^\QYourString\E$
\Q quotes all the characters up to \E.
Otherwise the regex may not match what you intend, or may even be invalid.
If the language takes the regex as a string literal (as in the example), the backslashes must be doubled:
^\\QYourString\\E$
Note that \Q...\E is supported by Java and PCRE-style engines but not by .NET, where Regex.Escape does the same job.
Hope this tip helps somebody.
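In Java, Pattern.quote wraps a string in \Q...\E for you. A minimal sketch, with an invented movie title and helper names chosen only for the example:

```java
import java.util.regex.Pattern;

final class LiteralTitle {
    // Anchored pattern that treats the whole title as literal text.
    static Pattern literal(String title) {
        return Pattern.compile("^" + Pattern.quote(title) + "$");
    }

    static boolean quotedMatches(String title, String input) {
        return literal(title).matcher(input).matches();
    }

    static boolean unquotedMatches(String title, String input) {
        return Pattern.compile("^" + title + "$").matcher(input).matches();
    }

    public static void main(String[] args) {
        String title = "Se7en (1995)";
        // Unquoted, the parentheses form a group instead of matching themselves.
        System.out.println(unquotedMatches(title, title));
        System.out.println(unquotedMatches(title, "Se7en 1995"));
        System.out.println(quotedMatches(title, title));
    }
}
```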
Sorry, but that's a little unclear.
From what I read, you want to do a simple string comparison. You don't need a regex for that.
string myTest = "Red October";
bool isMatch = (myTest.ToLower() == "Red October".ToLower());
Console.WriteLine(isMatch);
isMatch = (myTest.ToLower() == "The Hunt for Red October".ToLower());
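The same comparison can be sketched in Java with String.equalsIgnoreCase, which avoids building lowercased copies. The class and method names are only for illustration.

```java
final class TitleCompare {
    // Case-insensitive equality of two complete titles, no regex involved.
    static boolean sameTitle(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        System.out.println(sameTitle("Red October", "red october"));
        System.out.println(sameTitle("Red October", "The Hunt for Red October"));
    }
}
```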
You can do it like this. For example, if I only want to match a lowercase letter (here e) exactly once in a string, I can check it with myRegex.IsMatch():
^[^e]*e[^e]*$

Regular expression - starting and not ending with a pattern

How do I write a regular expression that checks whether a string starts with a certain pattern and does NOT end with a certain pattern?
Example:
Must start with: "US.INR.USD.CONV"
Should not end with: ".VALUE"
Passes Regex: "US.INR.USD.CONV.ABC.DEF.FACTOR"
Fails Regex Check: "US.INR.USD.CONV.ABC.DEF.VALUE"
I am using C#.
You can use this regex based on negative lookahead:
^US\.INR\.USD\.CONV(?!.*?\.VALUE$).*$
Explanation:
^US\.INR\.USD\.CONV - Match US.INR.USD.CONV at start of input
(?!.*?\.VALUE$) - Negative lookahead to make sure the line does not end with .VALUE
^US\.INR\.USD\.CONV.*(?<!\.VALUE)$
Try this. See the demo:
https://regex101.com/r/fA6wE2/26
Just use a negative lookbehind to make sure .VALUE does not come right before $, the end of the string.
(?<!\.VALUE)$ ==> When the regex engine reaches the end of the string, it looks behind and checks that `.VALUE` is not there.
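Both patterns from this thread can be tried side by side. The sketch below uses Java rather than C#, on the assumption that the two engines treat these particular constructs the same way. The class and method names are invented for the example.

```java
import java.util.regex.Pattern;

final class ConvKeyCheck {
    // Lookahead variant: reject if anything after the prefix ends in .VALUE.
    static final Pattern LOOKAHEAD =
            Pattern.compile("^US\\.INR\\.USD\\.CONV(?!.*?\\.VALUE$).*$");
    // Lookbehind variant: at the end, check that .VALUE is not just behind.
    static final Pattern LOOKBEHIND =
            Pattern.compile("^US\\.INR\\.USD\\.CONV.*(?<!\\.VALUE)$");

    static boolean viaLookahead(String s) {
        return LOOKAHEAD.matcher(s).matches();
    }

    static boolean viaLookbehind(String s) {
        return LOOKBEHIND.matcher(s).matches();
    }

    public static void main(String[] args) {
        String pass = "US.INR.USD.CONV.ABC.DEF.FACTOR";
        String fail = "US.INR.USD.CONV.ABC.DEF.VALUE";
        System.out.println(viaLookahead(pass) + " " + viaLookbehind(pass));
        System.out.println(viaLookahead(fail) + " " + viaLookbehind(fail));
    }
}
```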
You don't need regular expressions for that. You can just use String.StartsWith and String.EndsWith
if(val.StartsWith("US.INR.USD.CONV") && !val.EndsWith(".VALUE"))
{
// valid
}
And as you mention in your comment to anubhava's answer you can do this to check for ".PERCENT" at the end as well.
if(val.StartsWith("US.INR.USD.CONV") &&
!val.EndsWith(".VALUE") &&
!val.EndsWith(".PERCENT"))
{
// valid
}
IMHO this makes the code much more readable and will almost certainly perform faster as well.
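If the list of forbidden endings keeps growing, the same regex-free check can be written against a collection. Here is a sketch in Java, with a hypothetical isValid helper; String.startsWith and String.endsWith are the counterparts of the C# methods used above.

```java
import java.util.Arrays;
import java.util.List;

final class PrefixSuffixCheck {
    // Valid if val starts with prefix and ends with none of the banned suffixes.
    static boolean isValid(String val, String prefix, List<String> bannedSuffixes) {
        return val.startsWith(prefix)
                && bannedSuffixes.stream().noneMatch(val::endsWith);
    }

    public static void main(String[] args) {
        List<String> banned = Arrays.asList(".VALUE", ".PERCENT");
        String prefix = "US.INR.USD.CONV";
        System.out.println(isValid("US.INR.USD.CONV.ABC.DEF.FACTOR", prefix, banned));
        System.out.println(isValid("US.INR.USD.CONV.ABC.DEF.VALUE", prefix, banned));
        System.out.println(isValid("US.INR.USD.CONV.ABC.DEF.PERCENT", prefix, banned));
    }
}
```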

Regex in a string

I need some help with a problem.
I want to detect an image type from its hexadecimal signature.
string JpgHex = "FF-D8-FF-E0-xx-xx-4A-46-49-46-00";
Then I have a condition on
string.StartsWith(pngHex).
The problem is that the "x" characters in my "JpgHex" string can be anything.
I think I need a regex to check that, but I don't know how!
Thanks a lot!
I'm not quite clear what exactly you want to do, but the dot '.' character represents any character in Regex.
So the regex "^FF-D8-FF-E0-..-..-4A-46-49-46-00" will probably do the trick. '^' = Start of input.
If you want to allow only hex chars you can use "^FF-D8-FF-E0-[0-9A-F]{2}-[0-9A-F]{2}-4A-46-49-46-00".
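As a rough illustration of the hex-only variant, here is a Java sketch (names are made up). It uses Matcher.lookingAt(), which matches at the start of the input without requiring the whole input to match, so it plays the role of StartsWith and needs no ^ in the pattern. The sample strings are invented.

```java
import java.util.regex.Pattern;

final class JpegSignature {
    // Two wildcard bytes, each written as exactly two uppercase hex digits.
    static final Pattern JFIF = Pattern.compile(
            "FF-D8-FF-E0-[0-9A-F]{2}-[0-9A-F]{2}-4A-46-49-46-00");

    // True if the dash-separated hex dump begins with the signature.
    static boolean looksLikeJfif(String hexDump) {
        return JFIF.matcher(hexDump).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeJfif("FF-D8-FF-E0-00-10-4A-46-49-46-00-01-01"));
        System.out.println(looksLikeJfif("89-50-4E-47-0D-0A-1A-0A"));
    }
}
```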
Like I said, I'd need a better idea of what pattern you need to match.
Here are some examples:
Regex rgx = new Regex(@"^FF-D8-FF-E0-[a-zA-Z0-9]{2}-[a-zA-Z0-9]{2}-4A-46-49-46-00$");
rgx.IsMatch(pngHex); // IsMatch returns a bool.
I use [a-zA-Z0-9]{2} to denote two instances of a character, caps or small or a number. So the above regex would match :
FF-D8-FF-E0-aa-zZ-4A-46-49-46-00
FF-D8-FF-E0-11-22-4A-46-49-46-00
.. etc
Change the regex according to your needs: for capitals and numbers only, use [A-Z0-9]. The {2} denotes two occurrences.
The ^ denotes the string should start with FF and $ means the string should end with 00.
Let's say you wanted to match only two digits, so you would use \d{2}; the whole thing would look like this:
Regex rgx = new Regex(@"^FF-D8-FF-E0-\d{2}-\d{2}-4A-46-49-46-00$");
rgx.IsMatch(pngHex);
How do I know of these magical characters? Simple, there are docs everywhere. See this MSDN page for some basic regex patterns. This page shows some quantifiers, those are things like match one or more or match only one.
Cheat-sheets also come in handy.
A regex would help you; you can use the following tool to help you test and learn:
http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
I recommend you have a play because then you'll learn!
To simply match any character in place of the x, the following should work:
"^FF-D8-FF-E0-..-..-4A-46-49-46-00$"
In C#, it would be something like this:
var test = "FF-D8-FF-E0-AB-CD-4A-46-49-46-00";
var foo = new Regex("^FF-D8-FF-E0-..-..-4A-46-49-46-00$");
if (foo.IsMatch(test))
{
// Do magic
}
You will need to read up on regular expressions to understand some of the characters that may not look familiar, i.e. ^ and $. See http://www.regular-expressions.info/

C# Check a word exists in a string

Is the best way to do this with Regex? I don't want it picking up partial words; for example, if I'm searching for Gav it shouldn't match Gavin.
Any examples would be great as my regular expression skills are nonexistent.
Thanks
Yes, a Regex is perfect for the job.
Something like:
string regexPattern = string.Format(@"\b{0}\b", Regex.Escape(yourWord));
if (Regex.IsMatch(yourString, regexPattern)) {
// word found
}
What you want is probably like this:
if (Regex.IsMatch(myString, @"\bGav\b")) { ... }
The \b tokens in the regex indicate word boundaries, i.e. positions between a word character and a non-word character such as whitespace or punctuation, or the start/end of the string. You may also want to pass RegexOptions.IgnoreCase as the third parameter if you need case-insensitive matching. Note that the @ sign in front of the regex is essential; otherwise the pattern gets misinterpreted, because \ is also the escape character in ordinary C# string literals.
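The same whole-word search can be sketched in Java, where Pattern.quote plays the role of Regex.Escape and Pattern.CASE_INSENSITIVE that of RegexOptions.IgnoreCase. The helper name is made up, and the sketch assumes the search word starts and ends with a word character, since \b only helps in that case.

```java
import java.util.regex.Pattern;

final class WholeWordSearch {
    // True if word occurs in text as a whole word, ignoring case.
    static boolean containsWord(String text, String word) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("Gavin is here", "Gav"));
        System.out.println(containsWord("Hi Gav!", "Gav"));
        System.out.println(containsWord("ask gav.", "Gav"));
    }
}
```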
