I want to get a regex which can tell if two given words are in one sentence (word order matters). The problem is that I can have a contraction in a sentence, so the period doesn't indicate that there's the end of the sentence. The part of regex which indicates the end of the sentence is
\\.(\s+[A-Z]|\s*$)
What would the pattern look like?
You could use this:
(\b\w+\b)(?:[^.]|\.\s)*(\b\w+\b)
This basically says: match and capture a word, then anything that is not a period, or a period followed by a space, any number of times, and finally match and capture another word.
EDIT: For given words in either order, use:
(\bWord1\b)(?:[^.]|\.\s)*(\bWord2\b)|(\bWord2\b)(?:[^.]|\.\s)*(\bWord1\b)
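One caveat with the patterns above: `\.\s` also accepts a period followed by a space and a capital letter, which the question treats as the end of a sentence. Here is a rough Python sketch of a variant that reuses the asker's own end-of-sentence rule as a negative lookahead. The function name `in_same_sentence` and the sample text are made up for illustration, and the same pattern should carry over to .NET, though that is untested here.

```python
import re

def in_same_sentence(text, first, second):
    """Return True if `first` is followed by `second` before a sentence end.

    A period only counts as a sentence end when it is followed by
    whitespace plus a capital letter, or by the end of the text
    (the rule given in the question).
    """
    # Any non-period character, or a period that is NOT a sentence end.
    gap = r"(?:[^.]|\.(?!\s+[A-Z]|\s*$))*?"
    pattern = rf"\b{re.escape(first)}\b{gap}\b{re.escape(second)}\b"
    return re.search(pattern, text) is not None

sample = "He works at e.g. the mill daily. Then he sleeps."
print(in_same_sentence(sample, "works", "mill"))   # same sentence, right order
print(in_same_sentence(sample, "mill", "sleeps"))  # different sentences
```

The abbreviation "e.g." does not break the first check because its periods are not followed by a capital letter.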
Not C#, but you should get the idea:
for sentence in split_text_with_regex(text):
    # str.find returns -1 when the word is not in the sentence
    index_word1 = sentence.find(word1)
    index_word2 = sentence.find(word2)
    # do your thing, e.g. compare the two indexes
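The loop above relies on a splitter that is not shown. A minimal runnable sketch follows, assuming the asker's rule that a sentence ends at a period followed by whitespace and a capital letter. The body of `split_text_with_regex` and the helper `sentences_with_words_in_order` are guesses for illustration, not the answerer's code.

```python
import re

def split_text_with_regex(text):
    # Split on whitespace that follows a period and precedes a capital letter.
    return re.split(r"(?<=\.)\s+(?=[A-Z])", text)

def sentences_with_words_in_order(text, word1, word2):
    hits = []
    for sentence in split_text_with_regex(text):
        index_word1 = sentence.find(word1)
        if index_word1 == -1:
            continue
        # Look for word2 only after the end of word1, so order matters.
        index_word2 = sentence.find(word2, index_word1 + len(word1))
        if index_word2 != -1:
            hits.append(sentence)
    return hits

sample = "He works at e.g. the mill daily. Then he sleeps."
print(sentences_with_words_in_order(sample, "works", "mill"))
```

Note that `str.find` does substring matching, so "mill" would also be found inside "miller"; a word-boundary regex is needed if that matters.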
There is a very good set of options available here http://www.regular-expressions.info/near.html
Also, you can construct the regular expression in Visual Studio itself. Refer to the first paragraph of http://msdn.microsoft.com/en-us/library/2k3te2cs(VS.80).aspx
So I think it's something like this (untested):
(([\w\s]*\s)?Word1\s([\w\s]*)?\sWord2(\s[\w\s]*)?\.)(?=(\s+[A-Z]|\s*$))
Edit: Thinking about it, that won't match punctuation (commas, apostrophes). Perhaps each [\w\s] should be [^\.] or a list of possible characters.
Related
/((?:(?:is|will) (?!is))[^<?]+)/i
Test sentences:
text can be infront, Will is a good guy?
same here. Will you be able to help me?
I don't want it to match the first sentence, and I do want it to match the second (which it currently does).
I am trying to learn how I can return the whole regex as false if the word after "is|will" is "is", but it keeps matching and eventually finds a match in example number one. I am kind of new to regex, so all help is appreciated.
Thanks in advance
Your expression seems fine, but you need to anchor it so that it starts matching from the beginning of the text. You can use ^ at the start of the regex.
/^((?:(?:is|will) (?!is))[^<?]+)/i
Without the anchor, the regex engine tries to find a match anywhere in the string. In the case of the first sentence, it can match the word is as the first word in the expression, and then the rest of the expression matches.
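To see the difference concretely, here is a small Python sketch that runs both versions against the two test sentences. The helper `first_group` is made up for illustration. One thing it shows is that the anchored pattern only matches when the string itself starts with "is" or "will", so it presumably needs each sentence to be tested on its own.

```python
import re

UNANCHORED = re.compile(r"((?:(?:is|will) (?!is))[^<?]+)", re.IGNORECASE)
ANCHORED = re.compile(r"^((?:(?:is|will) (?!is))[^<?]+)", re.IGNORECASE)

first = "text can be infront, Will is a good guy?"
second = "same here. Will you be able to help me?"

def first_group(pattern, text):
    """Return capture group 1 of the first match, or None if there is none."""
    m = pattern.search(text)
    return m.group(1) if m else None

# Unanchored: "Will " is rejected by the lookahead, but the engine moves on
# and matches starting at the later "is ".
print(first_group(UNANCHORED, first))    # is a good guy
print(first_group(UNANCHORED, second))   # Will you be able to help me

# Anchored: nothing matches unless the text begins with "is" or "will".
print(first_group(ANCHORED, first))      # None
print(first_group(ANCHORED, "Will you be able to help me?"))
```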
I am trying to match 3 words that can appear anywhere in the string:
Win
Enter
Now
All 3 words must exist in the string for it to return a match. But I am having trouble getting a match even when all 3 words do exist.
Below is the regex I am using: http://regexr.com/39b83
^(?=.*?win)(?=.*?(enter))(?=.*?(now)).*
The regex works when all three words are on the same line. When they are spread across different lines of the string, it fails to match.
Any direction or help is appreciated.
Since you don't want to match words like center (with the word "enter"), I would use:
/(\benter\b)|(\bwin\b)|(\bnow\b)/
Link to Fiddler
I think C# would support (?s) DOTALL modifier. If yes then you could try the below regex,
(?i)(?s)win.*?enter.*?now
How about...
/(win|enter|now)/gi
It sounds like you want to match the lines on which these words appear, across up to three lines. That’s not really easy, but:
/^.*win.*(?:\s+.*)?enter.*(?:\s+.*)?now.*|^.*win.*(?:\s+.*)?now.*(?:\s+.*)?enter.*|^.*enter.*(?:\s+.*)?win.*(?:\s+.*)?now.*|^.*enter.*(?:\s+.*)?now.*(?:\s+.*)?win.*|^.*now.*(?:\s+.*)?win.*(?:\s+.*)?enter.*|^.*now.*(?:\s+.*)?enter.*(?:\s+.*)?win.*/igm
should do it.
It's because the dot doesn't match the newline character. There are two ways to change this. The first is to use the s modifier (which allows the dot to match newlines):
(?s)^(?=.*\bwin\b)(?=.*\benter\b)(?=.*\bnow\b).*
But this feature isn't always available (for example, in JavaScript before ES2018). The second way is to replace the dot with [\s\S] (a character class that matches every character, including newlines):
^(?=[\s\S]*\bwin\b)(?=[\s\S]*\benter\b)(?=[\s\S]*\bnow\b)[\s\S]+
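As a quick check of both variants, here is a Python sketch. Python spells the s modifier as `re.DOTALL`; the function names and sample strings are invented for illustration.

```python
import re

WORDS = ("win", "enter", "now")

def contains_all_dotall(text):
    # One lookahead per word; DOTALL lets "." cross line breaks.
    lookaheads = "".join(rf"(?=.*\b{w}\b)" for w in WORDS)
    return re.search("^" + lookaheads, text, re.IGNORECASE | re.DOTALL) is not None

def contains_all_class(text):
    # Same idea without DOTALL: [\s\S] matches any character, newlines included.
    lookaheads = "".join(rf"(?=[\s\S]*\b{w}\b)" for w in WORDS)
    return re.search("^" + lookaheads, text, re.IGNORECASE) is not None

multiline = "Enter now\nfor a chance to\nWIN!"
print(contains_all_dotall(multiline), contains_all_class(multiline))
```

Because of the `\b` word boundaries, "Winner" or "center" alone do not count as "win" or "enter".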
I need some help on a problem.
I am trying to check an image's type by its hexadecimal signature.
string JpgHex = "FF-D8-FF-E0-xx-xx-4A-46-49-46-00";
Then I have a condition on
string.StartsWith(pngHex).
The problem is that the "x" characters in my "JpgHex" string can be anything.
I think I need a regex to check that, but I don't know how!
Thanks a lot!
I'm not quite clear what exactly you want to do, but the dot '.' character represents any character in Regex.
So the regex "^FF-D8-FF-E0-..-..-4A-46-49-46-00" will probably do the trick. '^' = Start of input.
If you want to allow only hex chars you can use "^FF-D8-FF-E0-[0-9A-F]{2}-[0-9A-F]{2}-4A-46-49-46-00".
Like I said, I'd need a better idea of what pattern you need to match.
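For illustration, a Python sketch of the hex-only pattern. It assumes the dashed string comes from something like .NET's `BitConverter.ToString`, which the question does not state, and it leaves off a trailing `$` because the original code used `StartsWith`. The names `to_dashed_hex` and `looks_like_jfif` are hypothetical.

```python
import re

# Two wildcard bytes (any hex pair) sit between the marker and "JFIF\0".
JPG_HEADER = re.compile(r"^FF-D8-FF-E0-[0-9A-F]{2}-[0-9A-F]{2}-4A-46-49-46-00")

def to_dashed_hex(data):
    # Mimics the "FF-D8-..." layout used in the question.
    return "-".join(f"{byte:02X}" for byte in data)

def looks_like_jfif(data):
    return JPG_HEADER.match(to_dashed_hex(data)) is not None

jpg_bytes = bytes.fromhex("FFD8FFE000104A46494600") + b"\x01\x01"
png_bytes = bytes.fromhex("89504E470D0A1A0A")
print(looks_like_jfif(jpg_bytes), looks_like_jfif(png_bytes))
```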
Here are some examples:
Regex rgx =
    new Regex(@"^FF-D8-FF-E0-[a-zA-Z0-9]{2}-[a-zA-Z0-9]{2}-4A-46-49-46-00$");
rgx.IsMatch(pngHex); // IsMatch returns a bool.
I use [a-zA-Z0-9]{2} to denote two instances of a character, caps or small or a number. So the above regex would match :
FF-D8-FF-E0-aa-zZ-4A-46-49-46-00
FF-D8-FF-E0-11-22-4A-46-49-46-00
.. etc
Based on your need change the regex accordingly so for capitals and numbers only you change to [A-Z0-9]. The {2} denotes two occurrences.
The ^ denotes the string should start with FF and $ means the string should end with 00.
Let's say you wanted to match only two digits, so you would use \d{2}; the whole thing would look like this:
Regex rgx = new Regex(@"^FF-D8-FF-E0-\d{2}-\d{2}-4A-46-49-46-00$");
rgx.IsMatch(pngHex);
How do I know of these magical characters? Simple, there are docs everywhere. See this MSDN page for some basic regex patterns. This page shows some quantifiers, those are things like match one or more or match only one.
Cheat-sheets also come in handy.
A regex would help you; you can use the following tool to help you test and learn: -
http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
I recommend you have a play because then you'll learn!
To simply match any character in place of the x, the following should work: -
"^FF-D8-FF-E0-..-..-4A-46-49-46-00$"
In C#, it would be something like this: -
var test = "FF-D8-FF-E0-AB-CD-4A-46-49-46-00";
var foo = new Regex("^FF-D8-FF-E0-..-..-4A-46-49-46-00$");
if (foo.IsMatch(test))
{
    // Do magic
}
You will need to read up on regular expressions to understand some of the characters that may not look familiar, i.e. ^ and $. See http://www.regular-expressions.info/
I need a regular expression to identify all instances where a sentence begins without a space following the previous period.
For example, this is a bad sentence:
I'm sentence one.This is sentence two.
this needs to be fixed as follows:
I'm sentence one. This is sentence two.
It's not simply a case of doing a string replace of '.' with '. ', because there are also a lot of instances where the rest of the sentences in the paragraph have the correct spacing, and this would give those an extra space.
\.(?!\s) will match dots not followed by a space. You probably want exclamation marks and question marks as well though: [\.\!\?](?!\s)
Edit:
If C# supports it, try this: [\.\!\?](?!\s|$). It won't match the punctuation at the end of the string.
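A short Python sketch of that last pattern used as a replacement. The capture group and the function name `add_missing_spaces` are additions for illustration; the lookahead is the one from the answer.

```python
import re

# Sentence punctuation that is not followed by whitespace or end of string.
MISSING_SPACE = re.compile(r"([.!?])(?!\s|$)")

def add_missing_spaces(text):
    # Re-emit the punctuation and append the missing space.
    return MISSING_SPACE.sub(r"\1 ", text)

print(add_missing_spaces("I'm sentence one.This is sentence two."))
# I'm sentence one. This is sentence two.
print(add_missing_spaces("Already fine. Nothing changes here."))
# Already fine. Nothing changes here.
```

Abbreviations are not handled: `add_missing_spaces("i.e., it")` gives `"i. e. , it"`.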
You could search for \w\s{1}\.[A-Z] to find a word character, followed by a single space character, followed by a period, followed by a capital letter, to identify these. For a find/replace: find: (\w\s{1}\.)([A-Z]) and replace with $1 $2.
I doubt that you can create a regular expression that will work in the general case.
Any regex solution you come up with is going to have some interesting edge cases that you'll have to look at carefully. For example, the abbreviation "i.e." would become "i. e." (i.e., it will have an extra space and, if this parenthetical comment were run through the regex, it would become "i. e. ,").
Also, the proper way to quote text is to include the punctuation inside the quotes, as in "He said it was okay." If you had ["He said it was okay."This is a new sentence.], your regex solution might put a space before the final quote, or might ignore the error altogether.
Those are just two cases that come to mind immediately. There are plenty of others.
Whereas a regular expression will work in a limited set of simple sentences, real written language will quickly show that regular expressions are insufficient to provide a general solution to this problem.
If a sentence ends with an ellipsis, e.g. "...", you probably don't want to change this to ". . .".
I think the previous answers don't consider this case.
Try inserting a space wherever a word plus punctuation is immediately followed by a new word starting with an uppercase letter:
find (\w+[\.!?])([A-Z]'?\w+) replace $1 $2
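In Python, that find/replace might look like the sketch below. The function name `space_before_capital` is invented, and `\1 \2` is Python's spelling of `$1 $2`.

```python
import re

# A word plus sentence punctuation, directly followed by a capitalised word.
GLUED_SENTENCES = re.compile(r"(\w+[.!?])([A-Z]'?\w+)")

def space_before_capital(text):
    return GLUED_SENTENCES.sub(r"\1 \2", text)

print(space_before_capital("I'm sentence one.This is sentence two."))
# I'm sentence one. This is sentence two.
print(space_before_capital("Lower case, i.e. this, is left alone..."))
# Lower case, i.e. this, is left alone...
```

Because a capital letter is required after the punctuation, lowercase abbreviations such as "i.e." and a trailing "..." are left untouched.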
Best website ever: http://www.regular-expressions.info/reference.html
I am working in asp.net. I am using Regular Expression Validator
Could you please help me in creating a regular expression for not allowing special characters other than comma. Comma has to be allowed.
I checked regexlib, but I could not find a match. I tried ^(a-z|A-Z|0-9)*[^#$%^&*()']*$ . When I add other characters as invalid, it does not work.
Also, could you please suggest a good resource for regular expressions? regexlib seems to be big; is there another place that lists a limited set of the most used examples?
Also, can I create expressions using C# code? Any articles for that?
[\w\s,]+
works fine, as you can see below.
RegExr is a great place to test your regular expressions with real time results, it also comes with a very complete list of common expressions.
[] is a character class.
\w matches any word character (alphanumeric & underscore).
\s matches any whitespace character (spaces, tabs, line breaks).
, includes the comma.
+ is a greedy quantifier; it matches the preceding element one or more times.
[\d\w\s,]*
Just a guess
To answer on any articles, I got started here, find it to be an excellent resource:
http://www.regular-expressions.info/
For your current problem, try something like this:
[\w\s,]*
Here's a breakdown:
Match a single character present in the list below «[\w\s,]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A word character (letters, digits, etc.) «\w»
A whitespace character (spaces, tabs, line breaks, etc.) «\s»
The character “,” «,»
For a single character that is not a comma, [^,] should work perfectly fine.
You can try the [\w\s,] character class. It matches only word characters (letters, digits, underscore), whitespace, and the comma. If any other character appears in the text, it won't match.
For your second question, regarding regular expression resources, you can go to
http://www.regular-expressions.info/
This website has a lot of tutorials on regex, plus a lot of useful information.
Also, can I create expressions using C# code? Any articles for that?
By this, do you mean that you want to know which classes and methods execute regular expressions? Or do you want a tool that will create the regular expression for you?
You can create expressions with C#, something like this usually does the trick:
// Letters (either case, via IgnoreCase), digits, and commas only.
Regex regex = new Regex(@"^[a-z0-9,]*$", RegexOptions.IgnoreCase);
System.Console.Write("Enter Text: ");
String s = System.Console.ReadLine();
Match match = regex.Match(s);
if (match.Success)
{
    System.Console.WriteLine("True");
}
else
{
    System.Console.WriteLine("False");
}
System.Console.ReadLine();
You need to import the System.Text.RegularExpressions namespace.
The regular expression above accepts only numbers, letters (both upper and lower case) and the comma.
For a small introduction to Regular Expressions, I think that the book for MCTS 70-536 can be of a big help, I am pretty sure that you can either download it from somewhere or obtain a copy.
I am assuming that you never messed around with regular expressions in C#, hence I provided the code above.
Hope this helps.
Thank you, all..
[\w\s,]* works
Let me go through regular-expressions.info and come back if I need further support.
Let me try the C# code approach and come back if I need further support.
[This forum is awesome. Quality replies, so quick.]
Thanks again
(…) denotes a group, not a character set; character sets are written with […]. So try this:
^[a-zA-Z0-9,]*$
This will only allow alphanumeric characters and the comma.
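To try the whitelist idea outside of a validator control, here is a small Python sketch. It uses `fullmatch` in place of the explicit `^…$` anchors, and the function names are made up for illustration.

```python
import re

ALNUM_AND_COMMA = re.compile(r"[a-zA-Z0-9,]*")
WORDS_SPACES_AND_COMMA = re.compile(r"[\w\s,]*")

def is_alnum_or_comma(value):
    # fullmatch requires the whole string to consist of allowed characters.
    return ALNUM_AND_COMMA.fullmatch(value) is not None

def is_words_spaces_or_comma(value):
    return WORDS_SPACES_AND_COMMA.fullmatch(value) is not None

print(is_alnum_or_comma("abc,123"))                   # True
print(is_alnum_or_comma("abc 123"))                   # False, space not allowed
print(is_words_spaces_or_comma("hello world, 42"))    # True
print(is_words_spaces_or_comma("it's"))               # False, apostrophe rejected
```

Both patterns accept the empty string because of `*`; switch to `+` if at least one character is required.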