How to extract digits between two fixed strings in Arabic text? - C#

I have a string in the format:
خصم بقيمة 108 بتاريخ 31-01-2021
(Roughly: "discount of 108, dated 31-01-2021".)
I want to replace the digits between the words بقيمة and بتاريخ with a "?" character, while keeping the digits in the date part of the string.
I tried this regular expression: (?<=بقيمة)(.*?)(?=بتاريخ)
It works on https://regex101.com/.
But when I use it in C# with the Regex.Replace method, it has no effect when the pattern contains the Arabic words:
e.Row.Cells[3].Text = Regex.Replace(e.Row.Cells[3].Text, "(?<=بقيمة)(.*?)(?=بتاريخ)", "?");
But it works if I use Latin letters:
e.Row.Cells[3].Text = Regex.Replace(e.Row.Cells[3].Text, "(?<=X)(.*?)(?=Y)", "?");
Is there any way to make the function work with Arabic characters?
Or is there a better approach to achieve the desired result, for example by excluding the date part?

Since the digits you need (which contain no "-") are surrounded by spaces, just use \s(\d+)\s and read capture group 1, because the overall match also includes the two spaces.
var txt = "خصم بقيمة 108 بتاريخ 12-31-2021";
var pattern = @"\s(\d+)\s";
Console.WriteLine( Regex.Match(txt, pattern).Groups[1].Value ); // 108
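For the replacement asked about in the question, the same whitespace idea works with Regex.Replace. A minimal sketch as a top-level C# program; the second pattern, which anchors on the two Arabic words through a variable-length lookbehind (supported by .NET), is an added alternative and not part of the original answer:

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question: amount 108, date 31-01-2021.
string text = "خصم بقيمة 108 بتاريخ 31-01-2021";

// Option 1: a run of digits with whitespace on both sides.
// The date digits touch '-' or the end of the string, so they are left alone.
string byWhitespace = Regex.Replace(text, @"(?<=\s)\d+(?=\s)", "?");

// Option 2: digits sitting between the two Arabic words, with optional whitespace.
// .NET lookbehinds may have variable length, so \s* is allowed inside (?<=...).
string byWords = Regex.Replace(text, @"(?<=بقيمة\s*)\d+(?=\s*بتاريخ)", "?");

Console.WriteLine(byWhitespace);
Console.WriteLine(byWords);
```

Both lines should print the sentence with "?" in place of 108 and the date unchanged. Because only the digits are matched, the surrounding spaces are preserved, unlike the (.*?) version, which would also consume them.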

Related

How to convert Pascal Case with Numbers to a sentence?

I am trying to convert a Pascal Case string with numbers to a sentence:
OpenHouse2StartTimestamp => Open House 2 Start Timestamp
I've been able to use regex to separate them without numbers, thanks to this answer, but how to do so when numbers are present is eluding me:
string sentence = Regex.Replace(label, "[a-z][A-Z]", m => m.Value[0] + " " + m.Value[1]);
How can I add numbers into the mix?
You can use
var sentence = Regex.Replace(label, @"(?<=[a-z])(?=[A-Z])|(?<=\d)(?=\D)|(?<=\D)(?=\d)", " ");
See the .NET regex demo. The regex matches:
(?<=[a-z])(?=[A-Z])| - a location between a lowercase and an uppercase ASCII letter, or
(?<=\d)(?=\D)| - a location between a digit and a non-digit, or
(?<=\D)(?=\d) - a location between a non-digit and a digit.
Since all you need is to insert a space at the matched positions, you do not need a MatchEvaluator; a plain replacement string is enough.
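A small sketch of the replacement in use, as a top-level C# program. Item10Count is a made-up label added to show a multi-digit number. Runs of capitals such as HTMLParser are left unsplit, because the pattern only looks for lowercase-to-uppercase boundaries:

```csharp
using System;
using System.Text.RegularExpressions;

// Insert a space at every lower->upper, digit->non-digit and non-digit->digit boundary.
static string ToSentence(string label) =>
    Regex.Replace(label, @"(?<=[a-z])(?=[A-Z])|(?<=\d)(?=\D)|(?<=\D)(?=\d)", " ");

string first = ToSentence("OpenHouse2StartTimestamp");
string second = ToSentence("Item10Count"); // hypothetical extra label

Console.WriteLine(first);  // Open House 2 Start Timestamp
Console.WriteLine(second); // Item 10 Count
```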

Search for 2 specific letters followed by 4 numbers Regex

I need to check if a string begins with 2 specific letters and then is followed by any 4 numbers.
The two letters are "BR", so BR1234 would be valid, and so would BR7412, for example.
What code do I need to check whether the string matches the regex in C#?
The regex I have written is below; there is probably a more efficient way of writing it (I'm new to regex):
[B][R][0-9][0-9][0-9][0-9]
You can use this:
Regex regex = new Regex(@"^BR\d{4}");
^ defines the start of the string (so there should be no other characters before BR)
BR matches - well - BR
\d is a digit (0-9)
{4} says there must be exactly 4 of the previously mentioned group (\d)
You did not specify what is allowed to follow the four digits. If this should be the end of the string, add a $.
Usage in C#:
string matching = "BR1234";
string notMatching = "someOther";
Regex regex = new Regex(@"^BR\d{4}");
bool doesMatch = regex.IsMatch(matching); // true
doesMatch = regex.IsMatch(notMatching); // false
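Two details matter for this kind of check. Without a trailing anchor, ^BR\d{4} also accepts BR12345. In .NET, $ also matches just before a final newline, while \z means the absolute end of the string. In addition, .NET's \d matches any Unicode decimal digit (for example Arabic-Indic digits), so [0-9] restricts the match to ASCII digits. A sketch comparing the variants:

```csharp
using System;
using System.Text.RegularExpressions;

var prefixOnly = new Regex(@"^BR\d{4}");        // BR + 4 digits at the start; anything may follow
var wholeDollar = new Regex(@"^BR\d{4}$");      // $ also matches before a final '\n'
var wholeStrict = new Regex(@"^BR[0-9]{4}\z");  // BR + exactly 4 ASCII digits, then the absolute end

bool a = prefixOnly.IsMatch("BR12345");                      // true: the fifth digit is ignored
bool b = wholeDollar.IsMatch("BR12345");                     // false
bool c = wholeDollar.IsMatch("BR1234\n");                    // true: $ allows a trailing newline
bool d = wholeStrict.IsMatch("BR1234\n");                    // false
bool e = wholeDollar.IsMatch("BR\u0661\u0662\u0663\u0664");  // true: \d accepts Arabic-Indic digits
bool f = wholeStrict.IsMatch("BR\u0661\u0662\u0663\u0664");  // false: [0-9] is ASCII only

Console.WriteLine($"{a} {b} {c} {d} {e} {f}");
```

RegexOptions.ECMAScript is another way to make \d mean [0-9] only.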
Or simply:
BR\d{4}
(Without the ^ anchor this also matches BR1234 anywhere inside a longer string.)

Removing numbers from text using C#

I have a text file for processing, which has some numbers. I want JUST text in it, and nothing else. I managed to remove the punctuation marks, but how do I remove the numbers? I want this using C# code.
Also, I want to remove words longer than 10 characters. How do I do that using regular expressions?
You can do this with a regex:
string withNumbers = "abc123def"; // any string containing digits
string withoutNumbers = Regex.Replace(withNumbers, "[0-9]", "");
Use this regex to remove words with more than 10 characters:
\b\w{11,}\b
In a {min,max} quantifier the maximum can be left out: {11,} means "11 or more". There must also be no space after the comma; {10, 100} is not parsed as a quantifier at all.
To check for letters only and nothing else (since you also want to remove the punctuation marks):
Regex.IsMatch(input, @"^[a-zA-Z]+$");
You can also use string.Join:
string s = "asdasdad34534t3sdf43534";
s = string.Join(null, System.Text.RegularExpressions.Regex.Split(s, "[\\d]"));
The Regex.Replace method should do the trick.
// regex to match any digit
var regex = new Regex(@"\d");
// replace all matches in input with empty string
var output = regex.Replace(input, String.Empty);
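Putting the pieces of the question together (remove digits, then remove words longer than 10 characters) gives a minimal sketch like the following. The sample sentence is made up, and the final whitespace clean-up is an added assumption so that removed words do not leave double spaces:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Version 2 of the documentation has 300 pages";

// 1. Remove every ASCII digit.
string noDigits = Regex.Replace(input, "[0-9]", string.Empty);

// 2. Remove whole words of 11 or more word characters (i.e. longer than 10).
string noLongWords = Regex.Replace(noDigits, @"\b\w{11,}\b", string.Empty);

// 3. Collapse the leftover runs of whitespace into single spaces.
string output = Regex.Replace(noLongWords, @"\s+", " ").Trim();

Console.WriteLine(output); // Version of the has pages
```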

Parse the number with Regex with non capturing group

I'm trying to parse a phone number with a regex. Specifically, I want to get a string containing just the phone number, using code like this:
string phoneRegex = @"^([+]|00)(\d{2,12}(?:\s*-*)){1,5}$";
string formated = Regex.Match(e.Value.ToString(), phoneRegex).Value;
As you can see I'm trying to use non-capturing group (?:\s*-*) but I'm doing something wrong.
Expected result:
input (e.Value): +48 123 234 344 or +48 123234344 or +48 123-234-345
output: +48123234344
Thanks in advance for any suggestions.
Regex.Match will not alter the string for you; it will simply match it. If you have a phone number string and want to format it by removing unwanted characters, you will want to use the Regex.Replace method:
// pattern for matching anything that is not '+' or a decimal digit
string replaceRegex = @"[^+\d]";
string formated = Regex.Replace("+48 123 234 344", replaceRegex, string.Empty);
In my sample the phone number is hard-coded, but it's just for demonstration purposes.
As a side note: the regex in your code sample above assumes that the country code is 2 digits, which may not be the case. The United States has a one-digit code (1), and many countries have 3-digit codes (E.164 country codes are 1 to 3 digits long).
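Because the Replace approach only strips characters, it does not depend on the country-code length. A sketch that runs it over the three input formats from the question:

```csharp
using System;
using System.Text.RegularExpressions;

// The three input formats from the question.
string[] inputs = { "+48 123 234 344", "+48 123234344", "+48 123-234-345" };

// Anything that is not '+' or a decimal digit is removed.
var notPlusOrDigit = new Regex(@"[^+\d]");

var normalized = new string[inputs.Length];
for (int i = 0; i < inputs.Length; i++)
{
    normalized[i] = notPlusOrDigit.Replace(inputs[i], string.Empty);
    Console.WriteLine($"{inputs[i]} -> {normalized[i]}");
}
```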
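If the number always has a 3-3-4 digit layout after the prefix, capturing groups can rebuild it (note that this pattern does not match the +48 examples in the question, which use a 2-3-3-3 layout):
Match m = Regex.Match(s, @"^([+]|00)\(?(\d{3})\)?[\s\-]?(\d{3})\-?(\d{4})$");
return m.Success ? String.Format("{0}{1}{2}{3}", m.Groups[1], m.Groups[2], m.Groups[3], m.Groups[4]) : null;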

Regex for numbers only

I haven't used regular expressions at all, so I'm having difficulty troubleshooting. I want the regex to match only when the contained string is all numbers; but with the two examples below it is matching a string that contains all numbers plus an equals sign like "1234=4321". I'm sure there's a way to change this behavior, but as I said, I've never really done much with regular expressions.
string compare = "1234=4321";
Regex regex = new Regex(@"[\d]");
if (regex.IsMatch(compare))
{
//true
}
regex = new Regex("[0-9]");
if (regex.IsMatch(compare))
{
//true
}
In case it matters, I'm using C# and .NET 2.0.
Use the beginning and end anchors.
Regex regex = new Regex(@"^\d$");
Use "^\d+$" if you need to match more than one digit.
Note that "\d" matches [0-9] and also other decimal digit characters, such as the Eastern Arabic numerals ٠١٢٣٤٥٦٧٨٩. Use "^[0-9]+$" to restrict matches to the ASCII digits 0-9.
If you need to accept numeric representations other than plain digits (decimal values, for starters), see @tchrist's comprehensive guide to parsing numbers with regular expressions.
Your regex matches anything that contains a digit. You want to use anchors to match the whole string, and then match one or more digits:
regex = new Regex("^[0-9]+$");
The ^ will anchor the beginning of the string, the $ will anchor the end of the string, and the + will match one or more of what precedes it (a number in this case).
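To make the difference concrete, here is a minimal sketch using the input from the question. It uses current C# syntax (top-level statements, string interpolation); the static Regex.IsMatch calls themselves also exist in .NET 2.0:

```csharp
using System;
using System.Text.RegularExpressions;

string compare = "1234=4321";

// Unanchored: succeeds as soon as a single digit is found anywhere in the string.
bool containsDigit = Regex.IsMatch(compare, "[0-9]");

// Anchored: the whole string, from ^ to $, must consist of one or more digits.
bool allDigits = Regex.IsMatch(compare, "^[0-9]+$");
bool allDigitsOk = Regex.IsMatch("12344321", "^[0-9]+$");

Console.WriteLine($"{containsDigit} {allDigits} {allDigitsOk}"); // True False True
```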
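If you need to tolerate a decimal point and a thousands separator:
var regex = new Regex(@"^-?[0-9][0-9,\.]+$");
The optional "-" is needed if the number can be negative. Note that this pattern requires at least two characters, so a single digit such as "5" does not match.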
This works with integers and decimal numbers. It doesn't match if the number uses the comma thousands separator (,). Because every part is optional, it also matches an empty string and a lone "-".
"^-?\\d*(\\.\\d+)?$"
Some strings that match:
894
923.21
76876876
.32
-894
-923.21
-76876876
-.32
Some strings that don't match:
hello
9bye
hello9bye
888,323
5,434.3
-8,336.09
87078.
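A quick way to confirm the two lists above is to run them through the pattern. A sketch, written with a C# verbatim string (@"...") so the backslashes do not need doubling; the last line shows the empty-string and lone-minus cases:

```csharp
using System;
using System.Text.RegularExpressions;

var number = new Regex(@"^-?\d*(\.\d+)?$");

string[] shouldMatch = { "894", "923.21", "76876876", ".32", "-894", "-923.21", "-76876876", "-.32" };
string[] shouldNotMatch = { "hello", "9bye", "hello9bye", "888,323", "5,434.3", "-8,336.09", "87078." };

foreach (var s in shouldMatch)
    Console.WriteLine($"{s,-12} {number.IsMatch(s)}");
foreach (var s in shouldNotMatch)
    Console.WriteLine($"{s,-12} {number.IsMatch(s)}");

// Every part of the pattern is optional, so these edge cases also match.
Console.WriteLine($"empty: {number.IsMatch("")}, lone minus: {number.IsMatch("-")}");
```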
It matches because it finds "a match" somewhere, not a match of the full string. You can fix this by changing your regex to look specifically for the beginning and end of the string.
^\d+$
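Perhaps my method will help you (it needs using System.Linq;, which is available from .NET 3.5):
public static bool IsNumber(string s)
{
// All() is true for an empty sequence, so reject null/empty explicitly.
// char.IsDigit also accepts non-ASCII decimal digits, e.g. Arabic-Indic ones.
return !string.IsNullOrEmpty(s) && s.All(char.IsDigit);
}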
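The difference between char.IsDigit and a strict 0-9 check is easy to see with Arabic-Indic digits. A sketch; the helper names IsUnicodeDigits and IsAsciiDigits are hypothetical:

```csharp
using System;
using System.Linq;

// Accepts any Unicode decimal digit (e.g. Arabic-Indic or Persian digits).
static bool IsUnicodeDigits(string s) => !string.IsNullOrEmpty(s) && s.All(char.IsDigit);

// Accepts only the ASCII digits '0'..'9'.
static bool IsAsciiDigits(string s) => !string.IsNullOrEmpty(s) && s.All(c => c >= '0' && c <= '9');

string arabicIndic = "\u0661\u0664\u0662\u0665"; // Arabic-Indic 1, 4, 2, 5

Console.WriteLine(IsUnicodeDigits("1425"));     // True
Console.WriteLine(IsUnicodeDigits(arabicIndic)); // True
Console.WriteLine(IsAsciiDigits(arabicIndic));   // False
Console.WriteLine(IsAsciiDigits("1234=3254"));   // False
Console.WriteLine("".All(char.IsDigit));         // True: All() on an empty sequence
```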
If you need to check whether all the characters are ASCII digits (0-9):
^[0-9]+$
Matches
1425
0142
0
1
And does not match
154a25
1234=3254
For any number of digits (including none, so an empty string also passes):
^[0-9]*$
For one or more digits:
^[0-9]+$
^\d+$, which is "start of string", "1 or more digits", "end of string" in English.
Here is my working one:
^(-?[1-9]+\\d*([.]\\d+)?)$|^(-?0[.]\\d*[1-9]+)$|^0$
And some tests
Positive tests:
string[] goodNumbers = {"3","-3","0","1.0","0.1","0.0001","-555","94549870965"};
Negative tests:
string[] badNums = {"a",""," ","-","001","-00.2","000.5",".3","3."," -1","--1","-.1","-0"};
Checked not only in C#, but also in Java, JavaScript and PHP. (Note that "0.0" is rejected by this pattern; a later answer adds an extra alternative for that case.)
While none of the above solutions fit my purpose, this worked for me:
// The dot in ^0\.0$ is escaped so it only matches a literal period.
var pattern = @"^(-?[1-9]+\d*([.]\d+)?)$|^(-?0[.]\d*[1-9]+)$|^0$|^0\.0$";
return Regex.Match(value, pattern, RegexOptions.IgnoreCase).Success;
Example of valid values:
"3",
"-3",
"0",
"0.0",
"1.0",
"0.7",
"690.7",
"0.0001",
"-555",
"945465464654"
Example of not valid values:
"a",
"",
" ",
".",
"-",
"001",
"00.2",
"000.5",
".3",
"3.",
" -1",
"--1",
"-.1",
"-0",
"00099",
"099"
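The two value lists can be checked the same way. This sketch uses the pattern above with the dot in the last alternative escaped (^0\.0$), because an unescaped . matches any character; RegexOptions.IgnoreCase is dropped since it has no effect on digits. It also shows one quirk found by tracing the pattern: the 0.x alternative must end in a digit from 1 to 9, so "0.10" is rejected.

```csharp
using System;
using System.Text.RegularExpressions;

var number = new Regex(@"^(-?[1-9]+\d*([.]\d+)?)$|^(-?0[.]\d*[1-9]+)$|^0$|^0\.0$");

string[] valid = { "3", "-3", "0", "0.0", "1.0", "0.7", "690.7", "0.0001", "-555", "945465464654" };
string[] invalid = { "a", "", " ", ".", "-", "001", "00.2", "000.5", ".3", "3.", " -1", "--1", "-.1", "-0", "00099", "099" };

foreach (var v in valid) Console.WriteLine($"valid   '{v}': {number.IsMatch(v)}");
foreach (var v in invalid) Console.WriteLine($"invalid '{v}': {number.IsMatch(v)}");

// An unescaped ^0.0$ would accept "0a0"; the escaped form does not.
// The 0.x alternative has to end in 1-9, so "0.10" is rejected.
Console.WriteLine($"0a0: {number.IsMatch("0a0")}, 0.10: {number.IsMatch("0.10")}");
```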
Another way: if you want to match numbers written with other digit sets, such as Persian or Arabic, you can use the following expression (\p{N} is the Unicode "number" category):
var regex = new Regex(@"^[\p{N}]+$");
To also allow a literal period character, use:
var regex = new Regex(@"^[\p{N}\.]+$");
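\p{N} covers every Unicode number category (Nd, Nl and No), so it also accepts characters such as the fraction ½. If only digits are wanted, \p{Nd} (decimal digits) is the narrower choice. A sketch comparing the two; the digit strings are written as \u escapes so the exact code points are unambiguous:

```csharp
using System;
using System.Text.RegularExpressions;

var anyNumeric = new Regex(@"^\p{N}+$");    // Nd, Nl and No categories
var decimalOnly = new Regex(@"^\p{Nd}+$");  // decimal digits only

string ascii = "2021";
string arabicIndic = "\u0661\u0662"; // Arabic-Indic digits (U+0660-U+0669 block)
string persian = "\u06F1\u06F2";     // Extended Arabic-Indic (Persian) digits
string fraction = "\u00BD";          // the fraction one half, category No

foreach (var s in new[] { ascii, arabicIndic, persian, fraction })
    Console.WriteLine($"{s}: N={anyNumeric.IsMatch(s)}, Nd={decimalOnly.IsMatch(s)}");
```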
Regex for integer and floating point numbers:
^[+-]?\d*\.\d+$|^[+-]?\d+(\.\d*)?$
A number can start with a period (without leading digits) and can end with a period (without trailing digits); the regex above accepts both as correct numbers.
A period by itself, without any digits, is not a correct number.
That's why the regex needs two parts, separated by "|".
Hope this helps.
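A short sketch that exercises both alternatives of the integer/floating-point pattern; the sample strings are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Left alternative: optional digits, a period, then at least one digit (".5", "3.14").
// Right alternative: at least one digit, optionally followed by a period and more digits ("7", "7.").
var number = new Regex(@"^[+-]?\d*\.\d+$|^[+-]?\d+(\.\d*)?$");

string[] samples = { "7", "+7.", "-.5", "3.14", ".", "+", "1.2.3", "" };
foreach (var s in samples)
    Console.WriteLine($"'{s}' -> {number.IsMatch(s)}");
```

Unlike ^-?\d*(\.\d+)?$, this pattern rejects the empty string, because each alternative requires at least one digit.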
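I think this one is the simplest, and it accepts both the US and the European way of writing numbers, e.g. US 10,555.12 and European 10.555,12.
It also does not allow several commas or dots in a row, e.g. 10..22 or 10,.22.
In addition, numbers like .55 or ,55 pass, which may be handy.
^([,.]?[0-9])+$
Write the class as [,.] rather than [,|.]: inside a character class, | is a literal pipe, not alternation, so [,|.] would also accept strings such as 1|2. The pattern does not check where the separators appear, so 1,2.3,4 also passes.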
A pattern that rejects leading zeros, shown in JavaScript:
console.log(/^(0|[1-9][0-9]*)$/.test(3000)) // true
If you want to extract only the numbers from a string, the pattern "\d+" should help.
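A minimal sketch of extracting every run of digits with Regex.Matches; the sample sentence is made up:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "Order 66 shipped 3 items in 2021";

// Each maximal run of digits becomes one match.
string[] numbers = Regex.Matches(text, @"\d+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", numbers)); // 66, 3, 2021
```

Note that a decimal such as 3.5 would come out as two matches, 3 and 5.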
To check that a string is a uint or ulong, or contains only digits with at most one . (dot) (note that "0" on its own, and "0.0", are rejected by this pattern):
Regex rx = new Regex(@"^([1-9]\d*(\.)\d*|0?(\.)\d*[1-9]\d*|[1-9]\d*)$");
string text = "12.0";
var result = rx.IsMatch(text);
Console.WriteLine(result);
Samples
123 => True
123.1 => True
0.123 => True
.123 => True
0.2 => True
3452.434.43 => False
2342f43.34 => False
svasad.324 => False
3215.afa => False
The following regex accepts only numbers (including floating point) written with Western digits or with Arabic/Persian digits, much like the Windows calculator:
^((([0\u0660\u06F0]|([1-9\u0661-\u0669\u06F1-\u06F9][0\u0660\u06F0]*?)+)(\.)[0-9\u0660-\u0669\u06F0-\u06F9]+)|(([0\u0660\u06F0]?|([1-9\u0661-\u0669\u06F1-\u06F9][0\u0660\u06F0]*?)+))|\b)$
The above regex accepts the following patterns:
11
1.2
0.3
۱۲
۱.۳
۰.۲
۲.۷
The above regex doesn't accept the following patterns:
3.
.3
0..3
.۱۲
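If the input is expected to look like "1234=4321" (one to four digits, an equals sign, one to four digits):
Regex regex = new Regex("^[0-9]{1,4}=[0-9]{1,4}$");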
