Regex issue in C#: numbers are replaced by underscores

My regex is replacing every digit (0-9) in my string.
I don't understand why all the numbers are replaced by _.
EDIT: I understand that my "_" replacement string turns the matched characters into underscores. But why the numbers?
Can anyone help me out? I only need to replace the special characters.
See regex here:
string symbolPattern = "[!@#$%^&*()-=+`~{}'|]";
Regex.Replace("input here 12341234" , symbolPattern, "_");
Output: "input here ________"

The problem is that your pattern has a dash in the middle of the character class, where it defines a range of ASCII characters from ) to =. Here's a breakdown:
): 41
0: 48
9: 57
=: 61
As you can see, the digits occupy 48 to 57, which falls inside the range 41-61, so they're matched and replaced.
You need to place the - at either the beginning or the end of the character class for it to be matched literally rather than act as a range:
"[-!@#$%^&*()=+`~{}'|]"

You must escape the - because the sequence [)-=] contains the digits. Use a verbatim string so that the backslash reaches the regex engine:
string symbolPattern = @"[!@#$%^&*()\-=+`~{}'|]";

Move the - to the end of the list so it is seen as a literal:
"[!@#$%^&*()=+`~{}'|-]"
Or, to the front:
"[-!@#$%^&*()=+`~{}'|]"
As it stands, it will match all characters in the range )-=, which includes all numerals.
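To see the difference, here is a minimal sketch that runs the original pattern and a corrected one side by side. It assumes the character class was meant to start with !@# and uses C# top-level statements (C# 9 or later); the variable names are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "input here 12341234";

// Original pattern: ")-=" is read as the range U+0029..U+003D, which contains 0-9.
string rangePattern = "[!@#$%^&*()-=+`~{}'|]";

// Corrected pattern: a leading "-" is a literal hyphen, so no range is formed.
string literalPattern = "[-!@#$%^&*()=+`~{}'|]";

string withRange = Regex.Replace(input, rangePattern, "_");
string withLiteral = Regex.Replace(input, literalPattern, "_");

Console.WriteLine(withRange);   // input here ________
Console.WriteLine(withLiteral); // input here 12341234
```

Putting the hyphen last or escaping it inside a verbatim string should behave the same way as the leading hyphen used here.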

You need to take care with special characters in your regex. Inside a character class most of them (including *) are literal, but -, ^, ] and \ keep a special meaning depending on where they appear. Look at what each of those characters means for your match.

I've not used C#, but outside a character class the "*" character is a quantifier that needs escaping if you want to match it literally.
The following matches a whole line of any characters, although the "^" and "$" are somewhat redundant:
^.*$
This matches any number of "A" characters that appear in a string:
A*
The "Owl" book from O'Reilly (Mastering Regular Expressions) is what you really need to research this:
http://shop.oreilly.com/product/9780596528126.do?green=B5B9A1A7-B828-5E41-9D38-70AF661901B8&intcmp=af-mybuy-9780596528126.IP

Related

Regex to match if string is exactly as defined

How can I check that a string is in the correct format? I want the check to pass only if the string matches exactly. The following are the correct formats:
0.#
0.##
0.###
0.####
0.#####
There can be up to 10 hash characters (#) after the dot (.), but the string must consist of 0. followed only by hashes; nothing else is allowed.
Can someone please guide me on how to validate a string of this type?
In regular expressions the caret (^) represents the start of the string (or of each line in multiline mode) and the dollar sign ($) represents the end (or the position before a final newline).
A regex for an exact match is just the text you want, enclosed by ^ and $. But you must ensure that special regular expression characters are escaped. For example the regex
^Hello World$
would match exactly on the String "Hello World" and nothing else.
You can also use digits directly. You need to escape the dot "." because a dot in a regular expression means any character except newline. You escape a character by putting a backslash in front of it.
Next you should know about quantifiers. The usual ones are
* -> 0 or many
+ -> 1 or many
{n} -> exactly n times
{n,} -> at least n times
{n,m} -> n to m times
So you can write:
^0\.#{1,10}$
If you use a regular (non-verbatim) string literal in C#, you must write two backslashes:
^0\\.#{1,10}$
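As a quick check of the pattern above, the following sketch validates a few candidate strings. It is an illustration only: the helper name IsValidFormat is hypothetical, and the code assumes C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// A verbatim string (@"...") keeps the backslash as-is; "^0\\.#{1,10}$" is equivalent.
var formatRegex = new Regex(@"^0\.#{1,10}$");

// Hypothetical helper: true when the whole string is "0." followed by 1 to 10 hashes.
bool IsValidFormat(string s) => formatRegex.IsMatch(s);

string tenHashes = "0." + new string('#', 10);
string elevenHashes = "0." + new string('#', 11);

Console.WriteLine(IsValidFormat("0.#"));        // True
Console.WriteLine(IsValidFormat(tenHashes));    // True
Console.WriteLine(IsValidFormat(elevenHashes)); // False
Console.WriteLine(IsValidFormat("0."));         // False
Console.WriteLine(IsValidFormat("1.##"));       // False
```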

Ignore spaces at the end of a string

I use the following regex, which works, but I want to add a condition so that it accepts spaces at the end of the value. Currently that part is not working.
What am I missing here?
^[a-zA-Z][a-zA-Z0-9_]+\s?$[\s]*$
Assumption: you added the two end of string anchors $ by mistake.
? quantifier, matching one or zero repetitions, makes the previous item optional
* quantifier, matching zero or more repetitions
So change your expression to
^[a-zA-Z][a-zA-Z0-9_]+\s*$
This matches any amount of whitespace at the end of the string.
Be aware, whitespace is not just the space character, it is also tabs and newlines (and more)!
If you really want to match only space, just write a space or make a character class with all the characters you want to match.
^[a-zA-Z][a-zA-Z0-9_]+ *$
or
^[a-zA-Z][a-zA-Z0-9_]+[ \t]*$
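The difference between \s* and a plain space can be sketched as follows. The sample inputs are made up for illustration, and the code assumes C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// \s* accepts any trailing whitespace (spaces, tabs, newlines, ...).
var anyWhitespace = new Regex(@"^[a-zA-Z][a-zA-Z0-9_]+\s*$");

// " *" accepts trailing space characters only.
var spacesOnly = new Regex(@"^[a-zA-Z][a-zA-Z0-9_]+ *$");

Console.WriteLine(anyWhitespace.IsMatch("user_1   ")); // True
Console.WriteLine(anyWhitespace.IsMatch("user_1\t"));  // True
Console.WriteLine(spacesOnly.IsMatch("user_1   "));    // True
Console.WriteLine(spacesOnly.IsMatch("user_1\t"));     // False
Console.WriteLine(anyWhitespace.IsMatch("1user"));     // False
```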
Next thing is: Are you sure you only want plain ASCII letters? Today there is Unicode and you can use Unicode properties, scripts and blocks in your regular expressions.
Your expression in Unicode, allowing all letters and digits.
^\p{L}\w+\s*$
\p{L} Unicode property, any kind of letter from any language.
\w shorthand character class for word characters (letters, digits and connector characters like "_"); in .NET it is equivalent to [\p{L}\p{Mn}\p{Nd}\p{Pc}] written as a character class with Unicode properties. Definition on MSDN
Why two dollars?
^[a-zA-Z][a-zA-Z0-9_]+\s*$
or make it this:
@"^[a-zA-Z][a-zA-Z0-9_]+\s?\$\s*$"
if you want to literally match the dollar sign.
Try this:
@"^[a-zA-Z][a-zA-Z0-9_]+(\s)?$"
or this:
@"^[a-zA-Z][a-zA-Z0-9_]+((\s){0,})$"
$ indicates the end of the string; if you are looking for $ as a character, escape it with \.

How to correctly represent a whitespace character

I wanted to know how to represent a whitespace character in C#. I found the empty string representation string.Empty. Is there anything like that that represents a whitespace character?
I would like to do something like this:
test.ToLower().Split(string.Whitespace)
//test.ToLower().Split(Char.Whitespace)
Which whitespace character? The empty string is pretty unambiguous - it's a sequence of 0 characters. However, " ", "\t" and "\n" are all strings containing a single character which is characterized as whitespace.
If you just mean a space, use a space. If you mean some other whitespace character, there may well be a custom escape sequence for it (e.g. "\t" for tab) or you can use a Unicode escape sequence ("\uxxxx"). I would discourage you from including non-ASCII characters in your source code, particularly whitespace ones.
EDIT: Now that you've explained what you want to do (which should have been in your question to start with) you'd be better off using Regex.Split with a regular expression of \s which represents whitespace:
Regex regex = new Regex(@"\s");
string[] bits = regex.Split(text.ToLower());
See the Regex Character Classes documentation for more information on other character classes.
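Here is a small sketch of how splitting on whitespace behaves in practice. The sample text is made up, and the string.Split variant is an alternative added for comparison rather than part of the answer above; it relies on the documented behaviour that an empty separator array means "split on whitespace". The code assumes C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

string text = "Hello  World\tFoo";

// \s splits at every single whitespace character, so the double space yields an empty entry.
string[] single = Regex.Split(text.ToLower(), @"\s");

// \s+ treats a run of whitespace as one separator.
string[] runs = Regex.Split(text.ToLower(), @"\s+");

// An empty separator array makes string.Split use whitespace characters as delimiters.
string[] plain = text.ToLower().Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries);

Console.WriteLine(string.Join("|", single)); // hello||world|foo
Console.WriteLine(string.Join("|", runs));   // hello|world|foo
Console.WriteLine(string.Join("|", plain));  // hello|world|foo
```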
No, there is no such constant.
The whitespace char can be referenced by its ASCII code.
Character 32 represents a space, therefore:
char space = (char)32;
For example, you can use this approach to produce the desired number of spaces anywhere you want:
int length = 5; // desired number of spaces (example value)
string padding = string.Empty.PadRight(length, (char)32);
I had the same problem, so what I did was create a string containing a space and index that character.
string text = "Hello Morning Good Night";
char space = text[5];
Now whenever I need a space character I pull it from that reference.
Which whitespace character? The most common is the normal space, which is between each word in my sentences. This is just " ".
Using regular expressions, you can represent any whitespace character with the metacharacter "\s"
MSDN Reference
You can always use Unicode character, for me personally this is the most clear solution:
var space = "\u0020";

Regex.Split on plus and minus sign

I have a string 1.5(+1.2/-0.5). I want to use Regex to extract numerical value: {1.5, 1.2, 0.5}.
My plan is to split the string with (, +, / and -. When I do split with ( and /, it splits OK, but if I also add + and -, then program crashes.
string[] foo = Regex.Split("1.5(+1.5/-0.5)", @"(?=[(/)])");
// OK
string[] foo = Regex.Split("1.5(+1.5/-0.5)", @"(?=[(/+-)])");
// Exception caught
And the caught exception is:
System.ArgumentException: parsing "(?=[(/+-)])" - [x-y] range in reverse order
The dash is a special character inside square brackets in a regex. It denotes a range: [a-z] means any character from a to z. When you wrote [(/+-)], it actually means (, /, or any character from + to ). The error comes from the fact that in ASCII ordering ) (41) comes before + (43), so the character range +-) is in reverse order and therefore invalid.
To fix this, the dash must come first or last inside the brackets, or it needs to be escaped with a backslash.
And I agree, I'd probably use a global regexp to pick out [0-9.]+, and not a split to cut on everything else.
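Both approaches can be sketched like this: a split whose character class keeps the hyphen last, and a match that picks the numbers out directly. The exact patterns are illustrative choices rather than the only option, and the code assumes C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "1.5(+1.2/-0.5)";

// Split: the hyphen is placed last in the class, so it is a literal and no range is formed.
string[] parts = Regex.Split(input, @"[(/+)-]")
    .Where(p => p.Length > 0) // adjacent separators produce empty entries
    .ToArray();

// Match: one or more digits, optionally followed by a dot and more digits.
string[] numbers = Regex.Matches(input, @"[0-9]+(?:\.[0-9]+)?")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", parts));   // 1.5, 1.2, 0.5
Console.WriteLine(string.Join(", ", numbers)); // 1.5, 1.2, 0.5
```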
Have you tried escaping signs like + and -?
And why not use a regex like \d+(?:\.\d+)? to match the numbers directly? It won't split the string, but it returns the numbers.

Regular Expression: single word

I want to check in a C# program whether a user input is a single word. The word may only contain the characters A-Z and a-z. No spaces or other characters.
I tried [A-Za-z]*, but this doesn't work. What is wrong with this expression?
Regex regex = new Regex("[A-Za-z]*");
if (!regex.IsMatch(userinput))
{
...
}
Can you recommend a website with a comprehensive list of regex examples?
It probably works, but you aren't anchoring the regular expression. You need to use ^ and $ to anchor the expression to the beginning and end of the string, respectively:
Regex regex = new Regex("^[A-Za-z]+$");
I've also changed * to + because * will match 0 or more times while + will match 1 or more times.
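A short sketch of why anchoring matters: the unanchored pattern with * can always find an empty match, so it accepts any input. The sample inputs are made up, and the code assumes C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

var unanchored = new Regex("[A-Za-z]*");
var anchored = new Regex("^[A-Za-z]+$");

Console.WriteLine(unanchored.IsMatch("two words")); // True
Console.WriteLine(unanchored.IsMatch("1234"));      // True (empty match at position 0)
Console.WriteLine(anchored.IsMatch("word"));        // True
Console.WriteLine(anchored.IsMatch("two words"));   // False
Console.WriteLine(anchored.IsMatch(""));            // False
```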
You should add anchors for start and end of string: ^[A-Za-z]+$
Regarding the question of regex examples have a look at http://regexlib.com/.
For the regex, have a look at the special characters ^ and $, which represent starting and ending of string. This site can come in handy when constructing regexes in the future.
The asterisk character in regex specifies "zero or more of the preceding character class".
This explains why your expression is failing, because it will succeed if the string contains zero or more letters.
What you probably intended was to have one or more letters, in which case you should use the plus sign instead of the asterisk.
Having made that change, now it will fail if you enter a string that doesn't contain any letters, as you intended.
However, this still won't work for you entirely, because it will allow other characters in the string. If you want to restrict it to only letters, and nothing else, then you need to provide the start and end anchors (^ and $) in your regex to make the expression check that the 'one or more letters' is attached to the start and end of the string.
^[a-zA-Z]+$
This should work as intended.
Hope that helps.
For more information on regex, I recommend http://www.regular-expressions.info/reference.html as a good reference site.
I don't know C#'s regex syntax, but try [A-Za-z]+.
Try ^[A-Za-z]+$. If you don't include the ^ and $, it will match any part of the string that has alphabetic characters in it.
I know the question is only about strictly alphabetic input, but here's an interesting way of solving this which does not break on accented letters and other such special characters.
The regex "^\b.+?\b" will match the first word on the start of a string, but only if the string actually starts with a valid word character. Using that, you can simply check if A) the string matches, and B) the length of the matched string equals your full string's length:
public Boolean IsSingleWord(String userInput)
{
    Regex firstWordRegex = new Regex("^\\b.+?\\b");
    Match firstWordMatch = firstWordRegex.Match(userInput);
    return firstWordMatch.Success && firstWordMatch.Length == userInput.Length;
}
The others have explained how to solve the problem you know about. Now I'll talk about the problem you perhaps don't know about: diacritics :-) Your solution doesn't support àèéìòù and many other letters. A correct solution would be:
^(\p{L}\p{M}*)+$
where \p{L} is any letter plus \p{M}* that is 0 or more diacritic marks (in unicode diacritics can be "separated" from base letters, so you can have something like a + ` = à or you can have precomposed characters like the standard à)
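The following sketch compares the ASCII-only pattern with the Unicode one on a precomposed and a decomposed accented word. The sample word is made up, and the code assumes C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

var asciiWord = new Regex("^[A-Za-z]+$");
var unicodeWord = new Regex(@"^(\p{L}\p{M}*)+$");

string precomposed = "caf\u00E9"; // "é" as a single code point
string decomposed = "cafe\u0301"; // "e" followed by a combining acute accent

Console.WriteLine(asciiWord.IsMatch(precomposed));   // False
Console.WriteLine(unicodeWord.IsMatch(precomposed)); // True
Console.WriteLine(unicodeWord.IsMatch(decomposed));  // True
Console.WriteLine(unicodeWord.IsMatch("two words")); // False
```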
If you just need the characters a-z and A-Z, you could simply iterate over the characters and check that each one is inside your range.
For example:
// requires using System.Linq;
bool isWord = userInput.Length > 0 && userInput.All(c => ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z'));
This may be faster than a regex.
