How to match multiple words in a string using regex? - C#

I am trying to match 3 words that can appear anywhere in the string:
Win
Enter
Now
All 3 words must exist in the string for it to return a match, but I am having trouble getting a match when all 3 words do exist.
Below is the regex I am using: http://regexr.com/39b83
^(?=.*?win)(?=.*?(enter))(?=.*?(now)).*
The regex works when all three words are on the same line, but when they are spread across different lines of the string, it fails to match.
Any direction or help is appreciated.

Since you don't want to match words like center (with the word "enter"), I would use:
/(\benter\b)|(\bwin\b)|(\bnow\b)/

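C# (.NET) supports the (?s) modifier, the inline form of RegexOptions.Singleline, which lets the dot match newlines. So you could try the regex below (note that it only matches when the words appear in this order):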
(?i)(?s)win.*?enter.*?now
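To see the effect of (?s) concretely, here is a small C# check using that exact pattern. The sample strings are invented for illustration. It matches when the words sit on different lines but, because each lazy .*? only moves forward, it does not match when the same words appear in a different order.

```csharp
using System;
using System.Text.RegularExpressions;

// (?i) = case-insensitive, (?s) = Singleline, so '.' also matches '\n'.
// Each lazy .*? only moves forward, so the words must appear in this order.
var ordered = new Regex(@"(?i)(?s)win.*?enter.*?now");

bool inOrder = ordered.IsMatch("Win a prize\nEnter today\nAct Now");
bool outOfOrder = ordered.IsMatch("Act Now\nEnter today\nWin a prize");

Console.WriteLine(inOrder);    // True
Console.WriteLine(outOfOrder); // False
```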

How about...
/(win|enter|now)/gi

It sounds like you want to match the lines on which these words appear, across up to three lines. That’s not really easy, but:
/^.*win.*(?:\s+.*)?enter.*(?:\s+.*)?now.*|^.*win.*(?:\s+.*)?now.*(?:\s+.*)?enter.*|^.*enter.*(?:\s+.*)?win.*(?:\s+.*)?now.*|^.*enter.*(?:\s+.*)?now.*(?:\s+.*)?win.*|^.*now.*(?:\s+.*)?win.*(?:\s+.*)?enter.*|^.*now.*(?:\s+.*)?enter.*(?:\s+.*)?win.*/igm
should do it.

It's because the dot doesn't match the newline character. There are two ways to change this. The first is to use the s modifier (which allows the dot to match newlines):
(?s)^(?=.*\bwin\b)(?=.*\benter\b)(?=.*\bnow\b).*
But this feature isn't always available (for example, in JavaScript before ES2018). The second way is to replace the dot with [\s\S] (a character class that matches any character):
^(?=[\s\S]*\bwin\b)(?=[\s\S]*\benter\b)(?=[\s\S]*\bnow\b)[\s\S]+
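Here is a minimal C# sketch of the lookahead approach. It assumes you test with Regex.IsMatch, so the trailing .* is dropped (it only matters if you need the match to consume the text), and it assumes case-insensitive matching via RegexOptions.IgnoreCase, since the question's words are capitalized. RegexOptions.Singleline plays the role of (?s). The sample text is made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Each lookahead starts at ^ and scans on its own, so the words may appear in
// any order. \b stops "enter" from matching inside "center".
const string pattern = @"^(?=.*\bwin\b)(?=.*\benter\b)(?=.*\bnow\b)";

// Singleline lets '.' cross line breaks; IgnoreCase is assumed so "Win" counts.
var allThree = new Regex(pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
var dotStopsAtNewline = new Regex(pattern, RegexOptions.IgnoreCase);

string text = "Now is the time\nto Enter the draw\nand Win.";

bool withSingleline = allThree.IsMatch(text);
bool withoutSingleline = dotStopsAtNewline.IsMatch(text);
bool centerOnly = allThree.IsMatch("win the center now");

Console.WriteLine(withSingleline);    // True
Console.WriteLine(withoutSingleline); // False: '.' cannot leave the first line
Console.WriteLine(centerOnly);        // False: "center" is not the word "enter"
```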


C# Regex Validation

Can someone please validate this for me? (I'm new to regex.)
Rather than just asking the question, I am writing this:
Regex rgx = new Regex(@"^{3}[a-zA-Z0-9](\d{5})|{3}[a-zA-Z0-9](\d{9})$");
Can someone tell me if it's OK?
The accounts I am trying to match are either of:
1. BAA89345 (8 chars)
2. 12345678 (8 chars)
3. 123456789112 (12 chars)
Thanks in advance.
You can use a regex tester; there are plenty of free ones online. My Regex Tester is my current favorite.
Does the value that has 3 characters followed by digits always start with exactly three characters, or can it start with fewer or more? If it can vary, what are the minimum and maximum number of characters before the digits?
You need to place your quantifiers after the characters they are supposed to quantify. Also, character classes need to be wrapped in square brackets. This should work:
@"^(?:[a-zA-Z0-9]{3}|\d{3}\d{4})\d{5}$"
There are several good, automated regex testers out there. You may want to check out regexpal.
Although that may be a perfectly valid match, I would suggest rewriting it as:
^([a-zA-Z]{3}\d{5}|\d{8}|\d{12})$
which requires the string to match one of:
[a-zA-Z]{3}\d{5}: three letters followed by five digits,
\d{8}: eight digits, or
\d{12}: twelve digits.
Makes it easier to read, too...
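A quick C# sketch that checks the suggested pattern against the three account formats from the question plus two made-up near misses. The outer group matters: without it, ^ would only apply to the first alternative and $ only to the last, which is also a problem in the original pattern. Two .NET details worth knowing: $ also matches just before a trailing \n (use \z if that matters), and \d matches any Unicode decimal digit, so [0-9] is stricter.

```csharp
using System;
using System.Text.RegularExpressions;

// Three letters + five digits, or eight digits, or twelve digits.
// The outer group makes ^ and $ apply to every alternative.
var account = new Regex(@"^([a-zA-Z]{3}\d{5}|\d{8}|\d{12})$");

string[] samples = { "BAA89345", "12345678", "123456789112", "BA893456", "1234567890" };
foreach (string id in samples)
    Console.WriteLine($"{id}: {account.IsMatch(id)}");
// BAA89345, 12345678 and 123456789112 print True; the last two print False.
```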
I'm not 100% on your objective, but there are a few problems I can see right off the bat.
When you list the acceptable characters to match, like a-zA-Z0-9, you need to put them inside square brackets, like [a-zA-Z0-9]. Using a ^ at the beginning will negate the contained characters, e.g. [^a-zA-Z0-9].
Word characters can be matched with \w, which is similar to [a-zA-Z0-9_] (in .NET, \w also matches non-ASCII letters and digits unless you use RegexOptions.ECMAScript).
Quantifiers need to appear after the thing they quantify. So, instead of {3}[a-zA-Z0-9], you would need to write [a-zA-Z0-9]{3} (assuming you want to match three characters that each match [a-zA-Z0-9]).

Regular expression needed to identify where sentences don't have a space between them

I need a regular expression to identify all instances where a sentence begins without a space following the previous period.
For example, this is a bad sentence:
I'm sentence one.This is sentence two.
this needs to be fixed as follows:
I'm sentence one. This is sentence two.
It's not simply a case of replacing '.' with '. ', because there are also a lot of places in the paragraph where sentences already have the correct spacing, and this would give those an extra space.
\.(?!\s) will match dots not followed by a space. You probably want exclamation marks and question marks as well though: [\.\!\?](?!\s)
Edit:
If C# supports it, try this: [\.\!\?](?!\s|$). It won't match the punctuation at the end of the string.
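A minimal C# sketch of this lookahead idea, assuming you want to insert the missing space with Regex.Replace rather than just find the positions. AddMissingSpace is a hypothetical helper name, and [.!?] is equivalent to [\.\!\?] inside a character class. The third call shows the abbreviation problem described further down: "i.e.," gets extra spaces.

```csharp
using System;
using System.Text.RegularExpressions;

// Capture the punctuation mark and put it back followed by a space, but only
// when it is not already followed by whitespace or by the end of the string.
static string AddMissingSpace(string text) =>
    Regex.Replace(text, @"([.!?])(?!\s|$)", "$1 ");

string fixedText = AddMissingSpace("I'm sentence one.This is sentence two.");
string untouched = AddMissingSpace("Already fine. Nothing to do!");
string abbreviation = AddMissingSpace("Use i.e., not ie.");

Console.WriteLine(fixedText);    // I'm sentence one. This is sentence two.
Console.WriteLine(untouched);    // Already fine. Nothing to do!
Console.WriteLine(abbreviation); // Use i. e. , not ie.
```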
You could search for \w\.[A-Z] to find a word character, followed by a period, followed immediately by a capital letter, to identify these. For a find/replace: find (\w\.)([A-Z]) and replace with $1 $2.
I doubt that you can create a regular expression that will work in the general case.
Any regex solution you come up with is going to have some interesting edge cases that you'll have to look at carefully. For example, the abbreviation "i.e." would become "i. e." (i.e., it will have an extra space and, if this parenthetical comment were run through the regex, it would become "i. e. ,").
Also, the proper way to quote text is to include the punctuation inside the quotes, as in "He said it was okay." If you had ["He said it was okay."This is a new sentence.], your regex solution might put a space before the final quote, or might ignore the error altogether.
Those are just two cases that come to mind immediately. There are plenty of others.
Whereas a regular expression will work in a limited set of simple sentences, real written language will quickly show that regular expressions are insufficient to provide a general solution to this problem.
If a sentence ends with an ellipsis such as "...", you probably don't want to change it to ". . .".
I think the previous answers don't consider this case.
Try inserting a space where a word followed by punctuation runs straight into a new word starting with an uppercase letter:
find (\w+[\.!?])([A-Z]'?\w+) replace $1 $2
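The find/replace above can be tried in C# with Regex.Replace, which understands $1 and $2 in the replacement string. SplitRunOn is a hypothetical helper name and the inputs are invented. Because a word character must come right before the punctuation and an uppercase letter right after it, the lowercase abbreviation "i.e." and the ellipsis in "Wait...What" are left alone in these samples.

```csharp
using System;
using System.Text.RegularExpressions;

// Group 1: a word plus its closing punctuation.
// Group 2: the next word, which must start with an uppercase letter.
static string SplitRunOn(string text) =>
    Regex.Replace(text, @"(\w+[\.!?])([A-Z]'?\w+)", "$1 $2");

string fixedText = SplitRunOn("I'm sentence one.This is sentence two.");
string abbreviation = SplitRunOn("Use i.e., not ie.");
string ellipsis = SplitRunOn("Wait...What happened?");

Console.WriteLine(fixedText);    // I'm sentence one. This is sentence two.
Console.WriteLine(abbreviation); // Use i.e., not ie.
Console.WriteLine(ellipsis);     // Wait...What happened?
```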
Best website ever: http://www.regular-expressions.info/reference.html

C# Regex Replace Pattern (Replace String) Return $1

I'm currently working with parsing some data from SQL Server and I'm in need of help with a Regex.
I have an assembly in Sql Server 2005 that helps me Replace strings using C# Regex.Replace() Method.
I need to parse the following.
Strings:
CAD 90890
(CAD 90892)
CAD G67859
CAD 34G56
CAD 3S56.
AX CAD 890990
CAD 783783 MX
Needed Results:
90890
90892
G67859
34G56
3S56
890990
783783
SELECT TOP 25 CADCODE, dbo.RegExReplace(CADCODE, '*pattern*', '$1')
FROM dbo.CADCODES
WHERE CADCODE LIKE '%CAD%'
I need to get the string that follows the word CAD, up to the first whitespace or any other character that is not a letter or digit. I managed to get the digits, but it fails on the other cases. I'm trying to get it to work but I can't find a real solution.
Thanks in advance.
Updated to reflect new Strings
AX CAD 890990
CAD 783783 MX
Try this:
(\w+)\W*$
The pattern matches the last word, made of alphanumeric characters (and underscores).
Example: http://www.rubular.com/r/1zWQQVLZy1
Another option is to find a word with at least one digit - this one can match anywhere on the string, so you may need to handle multiple matches. In this case, you can add a capturing group around the whole pattern, or replace using $&.
[a-zA-Z_]*\d\w*
Example: http://www.rubular.com/r/XUrFNuPQUv
If you can't match (Regex.Match) and must use Regex.Replace, you can match the entire string start to end and replace it with the group you need:
RegExReplace(CADCODE, '^.*\b([a-zA-Z_]*\d\w*)\b.*$', '$1')
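A C# sketch of the Replace approach, assuming dbo.RegExReplace simply forwards to Regex.Replace(input, pattern, replacement) as the question describes. It runs the pattern over the seven sample strings from the question.

```csharp
using System;
using System.Text.RegularExpressions;

string[] codes =
{
    "CAD 90890", "(CAD 90892)", "CAD G67859", "CAD 34G56",
    "CAD 3S56.", "AX CAD 890990", "CAD 783783 MX"
};

// ^.* and .*$ make the match cover the whole string, so replacing it with $1
// leaves only group 1: a word that contains at least one digit. Because the
// leading .* is greedy, the last such word wins if there is more than one.
const string pattern = @"^.*\b([a-zA-Z_]*\d\w*)\b.*$";

string[] results = Array.ConvertAll(codes, code => Regex.Replace(code, pattern, "$1"));

foreach (string result in results)
    Console.WriteLine(result);
// 90890, 90892, G67859, 34G56, 3S56, 890990, 783783 (one per line)
```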
I think this is what you're after:
^\W*\w*CAD\w*\W*(\w+)\W*$
The regex has to match the whole string so RegExReplace can replace it with $1, effectively stripping off the unwanted parts.
EDIT: Let me back up and make sure I've got this right. Because of the
WHERE CADCODE LIKE '%CAD%'
in your query, you already know every string contains the sequence CAD. That being the case, there's no need to complicate the regex by matching that sequence again. This should be all you need:
^.*?(\w+)\W*$
Try this:
(?:\(CAD\)|CAD)\s+?([\dA-Z]+)
You can get the result from the capture group number 1.
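If you can call Regex.Match from C# instead of going through RegExReplace, the pattern above can be used as-is and group 1 read directly. A sketch over the question's sample strings follows. Note that the \(CAD\) branch only matches a literal "(CAD)"; "(CAD 90892)" is handled by the plain CAD branch, and the capture stops at the closing parenthesis.

```csharp
using System;
using System.Text.RegularExpressions;

string[] codes =
{
    "CAD 90890", "(CAD 90892)", "CAD G67859", "CAD 34G56",
    "CAD 3S56.", "AX CAD 890990", "CAD 783783 MX"
};

// Find "CAD" (or a literal "(CAD)"), skip the whitespace after it, and
// capture the run of digits and uppercase letters that follows.
var cad = new Regex(@"(?:\(CAD\)|CAD)\s+?([\dA-Z]+)");

// Groups[1].Value is an empty string when there is no match.
string[] results = Array.ConvertAll(codes, code => cad.Match(code).Groups[1].Value);

foreach (string result in results)
    Console.WriteLine(result);
```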
The problem with regex is that it's always easy to get a good pattern if you have a limited sample set.
In your case, you use:
\w{4}\w*
which just says: 4 word characters followed by 0 or more word characters. So none of the CAD parts would match (CAD has only three characters), and neither would spaces or parentheses.

Regex two given words in one sentence

I want a regex that can tell whether two given words are in the same sentence (word order matters). The problem is that a sentence can contain a contraction, so a period doesn't always mark the end of the sentence. The part of my regex that detects the end of a sentence is
\.(\s+[A-Z]|\s*$)
What would the pattern look like?
You could use this:
(\b\w+\b)(?:[^.]|\.\s)*(\b\w+\b)
This basically says: match and capture a word, then anything that is not a period, or a period followed by a space, any number of times, and finally match and capture another word.
EDIT: For given words in either order, use:
(\bWord1\b)(?:[^.]|\.\s)*(\bWord2\b)|(\bWord2\b)(?:[^.]|\.\s)*(\bWord1\b)
Not C#, but you should get the idea:
for sentence in split_text_with_regex(text):
    index_word1 = sentence.find(word1)
    index_word2 = sentence.find(word2)
    # do your thing
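A C# take on the same idea, as a sketch: split the text into sentences, then check each sentence for word1 followed later by word2. The sentence boundary reuses the question's rule (a period followed by whitespace and an uppercase letter), written with lookarounds because Regex.Split also returns the text of any capturing groups in the pattern. InSameSentence is a hypothetical helper, and like the question's rule it will still treat something like "Mr. Smith" as a sentence break.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: true when word1 is followed later by word2 inside one sentence.
static bool InSameSentence(string text, string word1, string word2)
{
    // Break after a period that is followed by whitespace and an uppercase letter.
    // Lookarounds keep the period and the capital in the pieces, and avoid
    // capturing groups, whose text Regex.Split would add to the result.
    string[] sentences = Regex.Split(text, @"(?<=\.)\s+(?=[A-Z])");

    // Whole words, word1 first; Singleline lets a sentence span line breaks.
    var inOrder = new Regex(
        $@"\b{Regex.Escape(word1)}\b.*\b{Regex.Escape(word2)}\b",
        RegexOptions.Singleline);

    return sentences.Any(sentence => inOrder.IsMatch(sentence));
}

Console.WriteLine(InSameSentence("The cat sat on the mat.", "cat", "mat"));              // True
Console.WriteLine(InSameSentence("The cat sat. The dog sat on the mat.", "cat", "mat")); // False
Console.WriteLine(InSameSentence("The mat was under the cat.", "cat", "mat"));           // False
```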
There is a very good set of options available here http://www.regular-expressions.info/near.html
Also, you can construct the regular expression in Visual Studio itself. Refer to the first paragraph of http://msdn.microsoft.com/en-us/library/2k3te2cs(VS.80).aspx
So I think it's something like this (untested):
(([\w\s]*\s)?Word1\s([\w\s]*)?\sWord2(\s[\w\s]*)?\.)(?=(\s+[A-Z]|\s*$))
Edit: Thinking about it, that won't match punctuation (commas, apostrophes). Perhaps each [\w\s] should be [^\.] or a list of possible characters.

Regular Expression to reject special characters other than commas

I am working in asp.net. I am using Regular Expression Validator
Could you please help me create a regular expression that rejects special characters other than the comma? The comma has to be allowed.
I checked regexlib but could not find a match. I tried ^(a-z|A-Z|0-9)*[^#$%^&*()']*$ but when I add other characters as invalid, it does not work.
Also, could you suggest a good resource for regular expressions? regexlib seems to be big; is there any other place that lists a small set of the most-used examples?
Also, can I create expressions using C# code? Are there any articles on that?
[\w\s,]+
works fine, as you can see below.
RegExr is a great place to test your regular expressions with real time results, it also comes with a very complete list of common expressions.
[] is a character class.
\w matches any word character (alphanumeric & underscore).
\s matches any whitespace character (spaces, tabs, line breaks).
, matches a literal comma.
+ is a greedy quantifier: it matches the previous item 1 or more times.
[\d\w\s,]*
Just a guess
To answer your question about articles: I got started here and find it to be an excellent resource:
http://www.regular-expressions.info/
For your current problem, try something like this:
[\w\s,]*
Here's a breakdown:
Match a single character present in the list below «[\w\s,]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A word character (letters, digits, etc.) «\w»
A whitespace character (spaces, tabs, line breaks, etc.) «\s»
The character “,” «,»
For a single character that is not a comma, [^,] should work perfectly fine.
You can try the [\w\s,] regular expression. It matches a single letter, digit, underscore, whitespace character or comma; to reject input that contains any other character, repeat and anchor it so it must cover the whole string: ^[\w\s,]*$.
For your second question, about regular expression resources, you can go to
http://www.regular-expressions.info/
This website has a lot of tutorials on regex, plus a lot of other useful information.
Also, can I create expressions using
C# code? Any articles for that?
By this, do you mean you want to know which classes and methods to use to execute regular expressions? Or do you want a tool that will create regular expressions for you?
You can create expressions with C#; something like this usually does the trick:
using System;
using System.Text.RegularExpressions;
// Letters (both cases, because of IgnoreCase), digits and commas only.
Regex regex = new Regex(@"^[a-z0-9,]*$", RegexOptions.IgnoreCase);
Console.Write("Enter text: ");
string s = Console.ReadLine() ?? "";
Match match = regex.Match(s);
if (match.Success)
{
    Console.WriteLine("True");
}
else
{
    Console.WriteLine("False");
}
Console.ReadLine();
You need to import the System.Text.RegularExpressions namespace (the using directive at the top).
The regular expression above accepts only letters (both upper and lower case), digits and the comma.
For a short introduction to regular expressions, I think the training book for the MCTS 70-536 exam can be a big help; you should be able to obtain a copy.
I am assuming that you never messed around with regular expressions in C#, hence I provided the code above.
Hope this helps.
Thank you all.
[\w\s,]* works
Let me go through regular-expressions.info and come back if I need further support.
Let me try the C# code approach and come back if I need further support.
[This forum is awesome. Quality replies, so quick.]
Thanks again
(…) denotes a group, not a character set; a character set is written […]. So try this:
^[a-zA-Z0-9,]*$
This will only allow alphanumeric characters and the comma.
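A small C# comparison of two of the suggestions, assuming the goal is to validate the entire input (hence the ^ and $ anchors). The inputs are made up. [a-zA-Z0-9,] allows only ASCII letters, digits and commas, while [\w\s,] also allows spaces, underscores and, in .NET, non-ASCII letters such as é.

```csharp
using System;
using System.Text.RegularExpressions;

// Strict: ASCII letters, digits and commas only, over the whole string.
var strict = new Regex(@"^[a-zA-Z0-9,]*$");
// Looser: word characters (Unicode-aware in .NET), whitespace and commas.
var loose = new Regex(@"^[\w\s,]*$");

string[] inputs = { "abc,123", "abc, 123", "abc#123", "caf\u00e9" };
foreach (string input in inputs)
    Console.WriteLine($"{input}: strict={strict.IsMatch(input)}, loose={loose.IsMatch(input)}");
```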
