How to distinguish hexadecimal number and decimal number in Regex expression? - c#

My English skill is poor because i'm not a native English speaker.
If rude expression exists in this article, please understand.
I am using regex library of .Net Core.
I wanted to distinguish hexadecimal number and decimal number so I written a code to do this.
But the code was not operated so I extract core logic that I want and created the below code on Test project newly.
string identPattern = "(?<rwC>[_a-zA-Z][_a-zA-Z0-9]*)";
string hexaPattern = "(?<iAZ>[0x][0-9]+)";
string decimalPattern = "(?<oKZ>[0-9]+)";
string pattern = string.Format("{0}|{1}|{2}", identPattern, hexaPattern, decimalPattern);
foreach (var data in Regex.Matches("0x10;", pattern, RegexOptions.Multiline | RegexOptions.ExplicitCapture))
{
var matchData = data as Match;
}
If it executes the above code "0" of "0x10" string is matched.
This means "0x10" string is matched with decimalPattern ("(?[0-9]+)"). This is not result that I want.
I want that "0x10" string is matched with hexaPattern.
Why don't matched with hexaPattern? and How to solve this problem?
My test code and execute result is as below.
Thanks for reading.

There is an error is in your hex pattern. you should not surround 0x with [] as it means charset and will match only one character.
Try below for hex pattern
(?<iAZ>0x[0-9]+)
And if you don't want your decimal pattern to match your hex number, try adding ^ at the beginning of decimal pattern.

Related

Extract value from a string in C# from a specific position

I have bunch of files in a folder and I am looping through them.
How do I extract the value from the below example? I need the value 0519 only.
DOC 75-20-0519-1.PDF
The below code gives the complete part include -1.
Convert.ToInt32(Path.GetFileNameWithoutExtension(objFile).Split('-')[2]);
Appreciate any help.
You can try regular expressions in order to match the value.
pattern:
[0-9]+ - one ore more digits
(?=[^0-9][0-9]+$) - followed by not a digit and one or more digits and end of string
code:
using System.Text.RegularExpressions;
...
string file = "DOC 75-20-0519-1.PDF";
// "0519"
string result = Regex
.Match(Path.GetFileNameWithoutExtension(file), #"[0-9]+(?=[^0-9][0-9]+$)")
.Value;
If Split('-') fails, and you have an entire string as a result, it seems that you have a wrong delimiter. It can be, say, one of the dashes:
"DOC 75–20–0519–1.PDF"; // n-dash
"DOC 75—20—0519—1.PDF"; // m-dash
You can use REGEX for this
Match match = Regex.Match("DOC 75-20-0519-1.PDF", #"DOC\s+\d+\-\d+\-(\d+)\-\d+", RegexOptions.IgnoreCase);
string data = match.Groups[1].Value;

C# Regex for retrieving capital string in quotation mark

Given a string, I want to retrieve a string that is in between the quotation marks, and that is fully capitalized.
For example, if a string of
oqr"awr"q q"ASRQ" asd "qIKQWIR"
has been entered, the regex would only evaluate "ASRQ" as matching string.
What is the best way to approach this?
Edit: Forgot to mention the string takes a numeric input as well I.E: "IO8917AS" is a valid input
EDIT: If you actually want "one or more characters, and none of the characters is a lower-case letter" then you probably want:
Regex regex = new Regex("\"\\P{Ll}+\"");
That will then allow digits as well... and punctuation. If you want to allow digits and upper case letters but nothing else, you can use:
Regex regex = new Regex("\"[\\p{Lu}\\d]+\"");
Or in verbatim string literal form (makes the quotes more confusing, but the backslashes less so):
Regex regex = new Regex(#"""[\p{Lu}\d]+""");
Original answer (before digits were required)
Sounds like you just want (within the pattern)
"[A-Z]*"
So something like:
Regex regex = new Regex("\"[A-Z]*\"");
Or for full Unicode support, use the Lu Unicode character category:
Regex regex = new Regex("\"\\p{Lu}*\"");
EDIT: As noted, if you don't want to match an empty string in quotes (which is still "a string where everything is upper case") then use + instead of *, e.g.
Regex regex = new Regex("\"\\p{Lu}+\");
Short but complete example of finding and displaying the first match:
using System;
using System.Text.RegularExpressions;
class Program
{
public static void Main()
{
Regex regex = new Regex("\"\\p{Lu}+\"");
string text = "oqr\"awr\"q q\"ASRQ\" asd \"qIKQWIR\"";
Match match = regex.Match(text);
Console.WriteLine(match.Success); // True
Console.WriteLine(match.Value); // "ASRQ"
}
}
Like this:
"\"[A-Z]+\""
The outermost quotes are not part of the regex, they delimit a C# string.
This requires at least one uppercase character between quotes and works for the English language.
Please try the following:
[\w]*"([A-Z0-9]+)"

Simple regex question C#

I need to match the string that is shown in the window displayed below :
8% of setup_av_free.exe from software-files-l.cnet.com Completed
98% of test.zip from 65.55.72.119 Completed
[numeric]%of[filename]from[hostname | IP address]Completed
I have written the regex pattern halfway
if (Regex.IsMatch(text, #"[\d]+%[\s]of[\s](.+?)(\.[^.]*)[\s]from[\s]"))
MessageBox.Show(text);
and I now need to integrate the following regex into my code above
ValidIpAddressRegex = "^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$";
ValidHostnameRegex = "^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$";
The 2 regex were taken from this link. These 2 regex works well when i use the Regex.ismatch to match "123.123.123.123" and "software-files-l.cnet.com" . However i cannot get it to work when i intergrate both of them to my existin regex code. I tried several variant but not able to get it to work. Can someone guide me to integrate the 2 regex to my existing code. Thanks in advance.
You can certainly combine all these regular expressions into one, but I'd recommend against it. Consider this method, first it checks wether your input text has the correct form overall, then it checks if the "from" part is an IP address or a hostname.
bool CheckString(string text) {
const string ValidIpAddressRegex = #"^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$";
const string ValidHostnameRegex = #"^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$";
var match = Regex.Match(text, #"[\d]+%[\s]of[\s](.+?)(\.[^.]*)[\s]from[\s](\S+)");
if(!match.Success)
return false;
string address = match.Groups[3].Value;
return Regex.IsMatch(address, ValidIpAddressRegex) ||
Regex.IsMatch(address, ValidHostnameRegex);
}
It does what you want and is much more readable and than single monster-sized regular expression. If you aren't going to call this method millions of time in a loop there is no reason to be concerned about it being less performant that single regex.
Also, in case you aren't aware of that the brackets around \d or \s aren't necessary.
The "Problem" that those two regexes do not match your string is that they start with ^ and end with $
^ means match the start of the string (or row if the m modifier is activated)
$ means match the end of the string (or row if the m modifier is activated)
When you try it this is true but in your real text they are in the middle of the string, so it is not matched.
Try just remove the ^ at the very beginning and the $ at the very end.
Here you go.
^[\d]+%[\s+]of[\s+](.+?)(\.[^.]*)[\s+]from[\s+]((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])|((([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])))[\s+]Completed
Remove the ^ and $ characters from the ValidIpAddressRegex and ValidHostnameRegex samples above, and add them separated by the or character (|) enclosed by parentheses.
You could use this, its should work for all cases. I mightve accidentally deleted a character while formatting so let me know if it doesnt work.
string captureString = "8% of setup_av_free.exe from software-files-l.cnet.com Completed";
Regex reg = new Regex(#"(?<perc>\d+)% of (?<file>\w+\.\w+) from (?<host>" +
#"(\d+\.\d+.\d+.\d+)|(((https?|ftp|gopher|telnet|file|notes|ms-help):" +
#"((//)|(\\\\))+)?[\w\d:##%/;$()~_?\+-=\\\.&]*)) Completed");
Match m = reg.Match(captureString);
string perc = m.Groups["perc"].Value;
string file = m.Groups["file"].Value;
string host = m.Groups["host"].Value;

Parse the number with Regex with non capturing group

I'm trying to parse phone number with regex. Exactly I want to get a string with phone number in it using function like this:
string phoneRegex = #"^([+]|00)(\d{2,12}(?:\s*-*)){1,5}$";
string formated = Regex.Match(e.Value.ToString(), phoneRegex).Value;
As you can see I'm trying to use non-capturing group (?:\s*-*) but I'm doing something wrong.
Expected resoult should be:
input (e.Value): +48 123 234 344 or +48 123234344 or +48 123-234-345
output: +48123234344
Thanks in advance for any suggestions.
Regex.Match will not alter the string for you; it will simply match it. If you have a phone number string and want to format it by removing unwanted characters, you will want to use the Regex.Replace method:
// pattern for matching anything that is not '+' or a decimal digit
string replaceRegex = #"[^+\d]";
string formated = Regex.Replace("+48 123 234 344", replaceRegex, string.Empty);
In my sample the phone number is hard-coded, but it's just for demonstration purposes.
As a side note; the regex that you have in your code sample above assumes that the country code is 2 digits; this may not be the case. The United States has a one digit code (1) and many countries have 3-digit codes (perhaps there are countries with more digits than that, as well?).
This should work:
Match m = Regex.Match(s, #"^([+]|00)\(?(\d{3})\)?[\s\-]?(\d{3})\-?(\d{4})$");
return String.Format("{0}{1}{2}{4}", m.Groups[1], m.Groups[2], m.Groups[3], m.Groups[3]);

Regex for numbers only

I haven't used regular expressions at all, so I'm having difficulty troubleshooting. I want the regex to match only when the contained string is all numbers; but with the two examples below it is matching a string that contains all numbers plus an equals sign like "1234=4321". I'm sure there's a way to change this behavior, but as I said, I've never really done much with regular expressions.
string compare = "1234=4321";
Regex regex = new Regex(#"[\d]");
if (regex.IsMatch(compare))
{
//true
}
regex = new Regex("[0-9]");
if (regex.IsMatch(compare))
{
//true
}
In case it matters, I'm using C# and .NET2.0.
Use the beginning and end anchors.
Regex regex = new Regex(#"^\d$");
Use "^\d+$" if you need to match more than one digit.
Note that "\d" will match [0-9] and other digit characters like the Eastern Arabic numerals ٠١٢٣٤٥٦٧٨٩. Use "^[0-9]+$" to restrict matches to just the Arabic numerals 0 - 9.
If you need to include any numeric representations other than just digits (like decimal values for starters), then see #tchrist's comprehensive guide to parsing numbers with regular expressions.
Your regex will match anything that contains a number, you want to use anchors to match the whole string and then match one or more numbers:
regex = new Regex("^[0-9]+$");
The ^ will anchor the beginning of the string, the $ will anchor the end of the string, and the + will match one or more of what precedes it (a number in this case).
If you need to tolerate decimal point and thousand marker
var regex = new Regex(#"^-?[0-9][0-9,\.]+$");
You will need a "-", if the number can go negative.
This works with integers and decimal numbers. It doesn't match if the number has the coma thousand separator ,
"^-?\\d*(\\.\\d+)?$"
some strings that matches with this:
894
923.21
76876876
.32
-894
-923.21
-76876876
-.32
some strings that doesn't:
hello
9bye
hello9bye
888,323
5,434.3
-8,336.09
87078.
It is matching because it is finding "a match" not a match of the full string. You can fix this by changing your regexp to specifically look for the beginning and end of the string.
^\d+$
Perhaps my method will help you.
public static bool IsNumber(string s)
{
return s.All(char.IsDigit);
}
If you need to check if all the digits are number (0-9) or not,
^[0-9]+$
Matches
1425
0142
0
1
And does not match
154a25
1234=3254
Sorry for ugly formatting.
For any number of digits:
[0-9]*
For one or more digit:
[0-9]+
^\d+$, which is "start of string", "1 or more digits", "end of string" in English.
Here is my working one:
^(-?[1-9]+\\d*([.]\\d+)?)$|^(-?0[.]\\d*[1-9]+)$|^0$
And some tests
Positive tests:
string []goodNumbers={"3","-3","0","0.0","1.0","0.1","0.0001","-555","94549870965"};
Negative tests:
string []badNums={"a",""," ","-","001","-00.2","000.5",".3","3."," -1","--1","-.1","-0"};
Checked not only for C#, but also with Java, Javascript and PHP
Use beginning and end anchors.
Regex regex = new Regex(#"^\d$");
Use "^\d+$" if you need to match more than one digit.
While non of the above solutions was fitting my purpose, this worked for me.
var pattern = #"^(-?[1-9]+\d*([.]\d+)?)$|^(-?0[.]\d*[1-9]+)$|^0$|^0.0$";
return Regex.Match(value, pattern, RegexOptions.IgnoreCase).Success;
Example of valid values:
"3",
"-3",
"0",
"0.0",
"1.0",
"0.7",
"690.7",
"0.0001",
"-555",
"945465464654"
Example of not valid values:
"a",
"",
" ",
".",
"-",
"001",
"00.2",
"000.5",
".3",
"3.",
" -1",
"--1",
"-.1",
"-0",
"00099",
"099"
Another way: If you like to match international numbers such as Persian or Arabic, so you can use following expression:
Regex = new Regex(#"^[\p{N}]+$");
To match literal period character use:
Regex = new Regex(#"^[\p{N}\.]+$");
Regex for integer and floating point numbers:
^[+-]?\d*\.\d+$|^[+-]?\d+(\.\d*)?$
A number can start with a period (without leading digits(s)),
and a number can end with a period (without trailing digits(s)).
Above regex will recognize both as correct numbers.
A . (period) itself without any digits is not a correct number.
That's why we need two regex parts there (separated with a "|").
Hope this helps.
I think that this one is the simplest one and it accepts European and USA way of writing numbers e.g. USA 10,555.12 European 10.555,12
Also this one does not allow several commas or dots one after each other e.g. 10..22 or 10,.22
In addition to this numbers like .55 or ,55 would pass. This may be handy.
^([,|.]?[0-9])+$
console.log(/^(0|[1-9][0-9]*)$/.test(3000)) // true
If you want to extract only numbers from a string the pattern "\d+" should help.
To check string is uint, ulong or contains only digits one .(dot) and digits
Sample inputs
Regex rx = new Regex(#"^([1-9]\d*(\.)\d*|0?(\.)\d*[1-9]\d*|[1-9]\d*)$");
string text = "12.0";
var result = rx.IsMatch(text);
Console.WriteLine(result);
Samples
123 => True
123.1 => True
0.123 => True
.123 => True
0.2 => True
3452.434.43=> False
2342f43.34 => False
svasad.324 => False
3215.afa => False
The following regex accepts only numbers (also floating point) in both English and Arabic (Persian) languages (just like Windows calculator):
^((([0\u0660\u06F0]|([1-9\u0661-\u0669\u06F1-\u06F9][0\u0660\u06F0]*?)+)(\.)[0-9\u0660-\u0669\u06F0-\u06F9]+)|(([0\u0660\u06F0]?|([1-9\u0661-\u0669\u06F1-\u06F9][0\u0660\u06F0]*?)+))|\b)$
The above regex accepts the following patterns:
11
1.2
0.3
۱۲
۱.۳
۰.۲
۲.۷
The above regex doesn't accept the following patterns:
3.
.3
0..3
.۱۲
Regex regex = new Regex ("^[0-9]{1,4}=[0-9]{1,4]$")

Categories