Given a string, I want to retrieve a string that is in between the quotation marks, and that is fully capitalized.
For example, if a string of
oqr"awr"q q"ASRQ" asd "qIKQWIR"
has been entered, the regex would only evaluate "ASRQ" as matching string.
What is the best way to approach this?
Edit: Forgot to mention the string takes a numeric input as well I.E: "IO8917AS" is a valid input
EDIT: If you actually want "one or more characters, and none of the characters is a lower-case letter" then you probably want:
Regex regex = new Regex("\"\\P{Ll}+\"");
That will then allow digits as well... and punctuation. If you want to allow digits and upper case letters but nothing else, you can use:
Regex regex = new Regex("\"[\\p{Lu}\\d]+\"");
Or in verbatim string literal form (makes the quotes more confusing, but the backslashes less so):
Regex regex = new Regex(#"""[\p{Lu}\d]+""");
Original answer (before digits were required)
Sounds like you just want (within the pattern)
"[A-Z]*"
So something like:
Regex regex = new Regex("\"[A-Z]*\"");
Or for full Unicode support, use the Lu Unicode character category:
Regex regex = new Regex("\"\\p{Lu}*\"");
EDIT: As noted, if you don't want to match an empty string in quotes (which is still "a string where everything is upper case") then use + instead of *, e.g.
Regex regex = new Regex("\"\\p{Lu}+\");
Short but complete example of finding and displaying the first match:
using System;
using System.Text.RegularExpressions;
class Program
{
public static void Main()
{
Regex regex = new Regex("\"\\p{Lu}+\"");
string text = "oqr\"awr\"q q\"ASRQ\" asd \"qIKQWIR\"";
Match match = regex.Match(text);
Console.WriteLine(match.Success); // True
Console.WriteLine(match.Value); // "ASRQ"
}
}
Like this:
"\"[A-Z]+\""
The outermost quotes are not part of the regex, they delimit a C# string.
This requires at least one uppercase character between quotes and works for the English language.
Please try the following:
[\w]*"([A-Z0-9]+)"
Related
I am trying to match text between two delimiters, [% %], and I want to get everything whether the string contains new lines or not.
Code
string strEmailContent = sr.ReadToEnd();
string commentPatt = #"\[%((\r\n?|\n).*(\r\n?|\n))%\]";
Regex commentRgx = new Regex(commentPatt, RegexOptions.Singleline);
Sample Inputs
//Successful
[%
New Comment
%] other content from input
//Match: [%\r\nNew Comment\r\n%]
//Fail
[% New Comment %]
//Match: false
//Successfully match single line with
string commentPatt = #"\[%(.*)%\]";
//Match: [% New Comment %]
I do not know how to combine these two patterns to match both cases. Can anyone provide any assistance?
To get text between two delimiters you need to use lazy matching with .*?, but to also match newline symbols, you need (?s) singleline modifier so that the dot could also match newline symbols:
(?s)\[%(.*?)%]
Note that (?s)\[%(.*?)%] will match even if the % is inside [%...%].
See regex demo. Note that the ] does not have to be escaped since it is situated in an unambiguous position and can only be interpreted as a literal ].
In C#, you can use
var rx = new Regex(#"(?s)\[%(.*?)%]");
var res = rx.Matches(str).Cast<Match>().Select(p => p.Groups[1].Value).ToList();
Try this pattern:
\[%([^%]*)%\]
It captures all characters between "[%" and "%]" that is not a "%" character.
Tested # Regex101
If you want to "see" the "\r\n" in your results, you'll have to escape them with a String.Replace().
See Fiddle Demo
My situation is not about removing empty spaces, but keeping them. I have this string >[database values] which I would like to find. I created this RegEx to find it then go in and remove the >, [, ]. The code below takes a string that is from a document. The first pattern looks for anything that is surrounded by >[some stuff] it then goes in and "removes" >, [, ]
string decoded = "document in string format";
string pattern = #">\[[A-z, /, \s]*\]";
string pattern2 = #"[>, \[, \]]";
Regex rgx = new Regex(pattern);
Regex rgx2 = new Regex(pattern2);
foreach (Match match in rgx.Matches(decoded))
{
string replacedValue= rgx2.Replace(match.Value, "");
Console.WriteLine(match.Value);
Console.WriteLine(replacedValue);
What I am getting in first my Console.WriteLine is correct. So I would be getting things like >[123 sesame St]. But my second output shows that my replace removes not just the characters but the spaces so I would get something like this 123sesameSt. I don't see any space being replaced in my Regex. Am I forgetting something, perhaps it is implicitly in a replace?
The [A-z, /, \s] and [>, \[, \]] in your patterns are also looking for commas and spaces. Just list the characters without delimiting them, like this: [A-Za-z/\s]
string pattern = #">\[[A-Za-z/\s]*\]";
string pattern2 = #"[>,\[\]]";
Edit to include Casimir's tip.
After rereading your question (if I understand well) I realize that your two steps approach is useless. You only need one replacement using a capture group:
string pattern = #">\[([^]]*)]";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(yourtext, "$1");
pattern details:
>\[ # literals: >[
( # open the capture group 1
[^]]* # all that is not a ]
) # close the capture group 1
] # literal ]
the replacement string refers to the capture group 1 with $1
By defining [>, \[, \]] in pattern2 you define a character group consisting of single characters like >, ,, , [ and every other character you listed in the square brackets. But I guess you don't want to match space and ,. So if you don't want to match them leave them out like
string pattern2 = #"[>\[\]]";
Alternatively, you could use
string pattern2 = #"(>\[|\])";
Thereby, you either match >[ or ] which better expresses your intention.
I need to extract a substring from an existing string. This String starts with uninteresting characters (include "," "space" and numbers) and ends with ", 123," or ", 57," or something like this where the numbers can change. I only need the Numbers.
Thanks
public static void Main(string[] args)
{
string input = "This is 2 much junk, 123,";
var match = Regex.Match(input, #"(\d*),$"); // Ends with at least one digit
// followed by comma,
// grab the digits.
if(match.Success)
Console.WriteLine(match.Groups[1]); // Prints '123'
}
Regex to match numbers: Regex regex = new Regex(#"\d+");
Source (slightly modified): Regex for numbers only
I think this is what you're looking for:
Remove all non numeric characters from a string using Regex
using System.Text.RegularExpressions;
...
string newString = Regex.Replace(oldString, "[^.0-9]", "");
(If you don't want to allow the decimal delimiter in the final result, remove the . from the regular expression above).
Try something like this :
String numbers = new String(yourString.TakeWhile(x => char.IsNumber(x)).ToArray());
You can use \d+ to match all digits within a given string
So your code would be
var lst=Regex.Matches(inp,reg)
.Cast<Match>()
.Select(x=x.Value);
lst now contain all the numbers
But if your input would be same as provided in your question you don't need regex
input.Substring(input.LastIndexOf(", "),input.LastIndexOf(","));
I am having a regular expression
Regex r = new Regex(#"(\s*)([A|B|C|E|G|H|J|K|L|M|N|P|R|S|T|V|Y|X]\d(?!.*[DFIOQU])(?:[A-Z](\s?)\d[A-Z]\d))(\s*)",RegexOptions.IgnoreCase);
and having a string
string test="LJHLJHL HJGJKDGKJ JGJK C1C 1C1 LKJLKJ";
I have to fetch C1C 1C1.This running fine.
But if a modify test string as
string test="LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON";
then it is unable to find the pattern i.e C1C 1C1.
any idea why this expression is failing?
You have a negative look ahead:
(?!.*[DFIOQU])
That matches the "O" in "ON" and since it is a negative look ahead, the whole pattern fails. And, as an aside, I think you want to replace this:
[A|B|C|E|G|H|J|K|L|M|N|P|R|S|T|V|Y|X]
With this:
[A-CEGHJ-NPR-TVYX]
A pipe (|) is a literal character inside a character class, not an alternation, and you can use ranges to help hilight the characters that you're leaving out.
A single regex might not be the best way to parse that string. Or perhaps you just need a looser regex.
You are searching for a not a following DFIOQU with your negative look ahead (?!.*[DFIOQU])
In your second string there is a O at the end in ON, so it must be failing to match.
If you remove the .* in your negative look ahead it will only check the directly following character and not the complete string to the end (Is it this what you want?).
\s*([ABCEGHJKLMNPRSTVYX]\d(?![DFIOQU])(?:[A-Z]\s?\d[A-Z]\d))\s*
then it works, see it here on Regexr. It is now checking if there is not one of the characters in the class directly after the digit, I don't know if this is intended.
Btw. I removed the | from your first character class, its not needed and also some brackets around your whitespaces, also not needed.
As I understood you need to find the C1C 1C1 text in your string
I've used this regex for do this
string strRegex = #"^.*(?<c1c>C1C)\s*(?<c1c2>1C1).*$";
after that you can extract text from named groups
string strRegex = #"^.*(?<c1c>C1C)\s*(?<c1c2>1C1).*$";
RegexOptions myRegexOptions = RegexOptions.Multiline;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = #"LJHLJHL HJGJKDGKJ JGJK C1C 1C1 LKJLKJ";
string secondStr = "LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON";
Match match = myRegex.Match(strTargetString);
string c1c = match.Groups["c1c"].Value;
string c1c2 = match.Groups["c1c2"].Value;
Console.WriteLine(c1c + " " +c1c2);
What am I doing wrong here?
string q = "john s!";
string clean = Regex.Replace(q, #"([^a-zA-Z0-9]|^\s)", string.Empty);
// clean == "johns". I want "john s";
just a FYI
string clean = Regex.Replace(q, #"[^a-zA-Z0-9\s]", string.Empty);
would actually be better like
string clean = Regex.Replace(q, #"[^\w\s]", string.Empty);
This:
string clean = Regex.Replace(dirty, "[^a-zA-Z0-9\x20]", String.Empty);
\x20 is ascii hex for 'space' character
you can add more individual characters that you want to be allowed.
If you want for example "?" to be ok in the return string add \x3f.
I got it:
string clean = Regex.Replace(q, #"[^a-zA-Z0-9\s]", string.Empty);
Didn't know you could put \s in the brackets
The following regex is for space inclusion in textbox.
Regex r = new Regex("^[a-zA-Z\\s]+");
r.IsMatch(textbox1.text);
This works fine for me.
I suspect ^ doesn't work the way you think it does outside of a character class.
What you're telling it to do is replace everything that isn't an alphanumeric with an empty string, OR any leading space. I think what you mean to say is that spaces are ok to not replace - try moving the \s into the [] class.
There appear to be two problems.
You're using the ^ outside a [] which matches the start of the line
You're not using a * or + which means you will only match a single character.
I think you want the following regex #"([^a-zA-Z0-9\s])+"
bottom regex with space, supports all keyboard letters from different culture
string input = "78-selim güzel667.,?";
Regex regex = new Regex(#"[^\w\x20]|[\d]");
var result= regex.Replace(input,"");
//selim güzel
The circumflex inside the square brackets means all characters except the subsequent range. You want a circumflex outside of square brackets.
This regex will help you to filter if there is at least one alphanumeric character and zero or more special characters i.e. _ (underscore), \s whitespace, -(hyphen)
string comparer = "string you want to compare";
Regex r = new Regex(#"^([a-zA-Z0-9]+[_\s-]*)+$");
if (!r.IsMatch(comparer))
{
return false;
}
return true;
Create a set using [a-zA-Z0-9]+ for alphanumeric characters, "+" sign (a quantifier) at the end of the set will make sure that there will be at least one alphanumeric character within the comparer.
Create another set [_\s-]* for special characters, "*" quantifier is to validate that there can be special characters within comparer string.
Pack these sets into a capture group ([a-zA-Z0-9]+[_\s-]*)+ to say that the comparer string should occupy these features.
[RegularExpression(#"^[A-Z]+[a-zA-Z""'\s-]*$")]
Above syntax also accepts space