Regex to exclude a particular substring pattern - c#

My current regex code is as below.
var caption = Regex.Replace(fileImage.Caption, @"[^\w\s\(\)\.#-]", "", RegexOptions.None);
Here I replace special characters with an empty string, except for certain allowed special characters.
Now I have a strange situation where, in addition to the above, I need to exclude from removal any substring of the form &#215;, where 215 can be any number. It can be a normal decimal or a hexadecimal number. If it is hexadecimal, it starts with 'x' after &#.
How can I achieve this?

I think you mean this:
var caption = Regex.Replace(fileImage.Caption, @"(&#x?[a-f\d]+;)|[^\w\s\(\)\.#-]", "$1");
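A minimal sketch of how the capture-and-reinsert trick behaves, assuming the goal is to keep numeric character references intact while stripping other disallowed characters. The class and method names are hypothetical, and RegexOptions.IgnoreCase is an added assumption so that uppercase hex digits such as &#xD7; are also kept.

```csharp
using System;
using System.Text.RegularExpressions;

static class CaptionCleaner
{
    // Alternative 1 captures a numeric character reference such as &#215; or &#xD7;.
    // Alternative 2 matches any single character outside the allowed set.
    // IgnoreCase is an assumption, not part of the original answer.
    static readonly Regex Pattern = new Regex(
        @"(&#x?[a-f\d]+;)|[^\w\s\(\)\.#-]",
        RegexOptions.IgnoreCase);

    public static string Clean(string caption)
    {
        // "$1" re-inserts the captured reference. When alternative 2 matched,
        // group 1 is empty, so the disallowed character is simply dropped.
        return Pattern.Replace(caption, "$1");
    }

    static void Main()
    {
        Console.WriteLine(Clean("Area: 5 &#215; 3 cm!"));
        // prints: Area 5 &#215; 3 cm
    }
}
```

Because the reference alternative comes first, the engine tries it before the negated class at every position, which is why the & and ; of a well-formed reference survive even though they are not in the allowed set.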

Related

Split String At Every Non-Letter/Non-Number Character

Imagine a string that contains special characters like $§%%,., numbers and letters.
I want to receive the letter and number chunks of an arbitrary string as an array of strings.
A good solution seems to be the use of regex, but I don't know how to express [numbers and letters].
// example
"abc" = {"abc"};
"ab .c" = {"ab", "c"}
"ab123,cd2, ,,%&$§56" = {"ab123", "cd2", "56"}
// try
string input = "jdahs32455$§&%$§df233§$fd";
string[] output = input.Split(Regex("makejunksfromstring"));
To extract chunks of 1 or more letters/digits you may use
[A-Za-z0-9]+ # ASCII only letters/digits
[\p{L}0-9]+ # Any Unicode letters and ASCII only digits
[\p{L}\p{N}]+ # Any Unicode letters/digits
See a regex demo.
C# usage:
string[] output = Regex.Matches(input, @"[\p{L}\p{N}]+").Cast<Match>().Select(x => x.Value).ToArray();
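As a self-contained sketch of the one-liner above, with the using directives it needs (System.Linq for Cast, Select and ToArray), run against one of the sample inputs from the question. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class ChunkExtractor
{
    // Returns every run of one or more Unicode letters or digits.
    public static string[] Chunks(string input)
    {
        return Regex.Matches(input, @"[\p{L}\p{N}]+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    static void Main()
    {
        Console.WriteLine(string.Join("|", Chunks("ab123,cd2, ,,%&$§56")));
        // prints: ab123|cd2|56
    }
}
```

Matching the chunks directly, rather than splitting on everything else, avoids the empty entries that a split can produce when the input starts or ends with a separator.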
Yes, regex is indeed a good solution for this.
And in fact, to just match all standard words in the input sequence, this is all you need:
(\w+)
Let me quickly explain
\w matches any word character. With RegexOptions.ECMAScript it is equivalent to [a-zA-Z0-9_] (a through z, A through Z, 0-9, or _); by default, .NET also matches Unicode letters and digits. You may want to use [a-zA-Z0-9] instead to avoid the underscore.
Wrapping an expression in () means that you want to capture that part as a group.
The + means that you want sequences of 1 or more of the preceding characters.
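A small sketch of what \w+ picks up in .NET, assuming you want to see how the underscore and non-ASCII letters are treated. The class and method names and the sample text are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class WordCharDemo
{
    // Returns every \w+ match under the given options.
    public static string[] Words(string input, RegexOptions options)
    {
        return Regex.Matches(input, @"\w+", options)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    static void Main()
    {
        // \u00EF is the letter i with diaeresis, as in "naive" written with an accent.
        string text = "na\u00EFve snake_case 42";

        // Default: the accented word is one match and the underscore is a word character.
        Console.WriteLine(Words(text, RegexOptions.None).Length);       // prints: 3

        // ECMAScript: \w is limited to [a-zA-Z0-9_], so the accented letter splits the word.
        Console.WriteLine(Words(text, RegexOptions.ECMAScript).Length); // prints: 4
    }
}
```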
Refer to a regular expression cheat sheet to see all the possibilities, such as
https://cheatography.com/davechild/cheat-sheets/regular-expressions/
Or any that you find online.
Also there are tools available to quickly test out your regular expressions, such as
https://regex101.com/ (quite well visualised matching)
or http://regexstorm.net/tester specifically for .NET

What is the C# regex format to match +- signs or space?

I'm writing code using a regex to search for numbers that have a sign before them, either +, - or a space, like the following numbers:
(+02.00)
(-03.50)
( 00.00)
I'm using this pattern, but I want to include a space along with + and -:
[+-]\d{2}.\d{2}
Please help, thanks.
You can use:
[+\s-]\d{2}\.\d{2}
Take note of \s in the first character class [...] that will match a whitespace.
Unescaped hyphen should be at first or last position in character class.
Also, you need to escape the dot otherwise it will match any character.
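A short sketch of the corrected pattern against the three sample values from the question, plus a check of why the unescaped dot matters. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class SignedNumberDemo
{
    // A sign or whitespace character, two digits, a literal dot, two digits.
    static readonly Regex Pattern = new Regex(@"[+\s-]\d{2}\.\d{2}");

    public static string[] Find(string text)
    {
        return Pattern.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
    }

    static void Main()
    {
        Console.WriteLine(string.Join("|", Find("(+02.00) (-03.50) ( 00.00)")));
        // prints: +02.00|-03.50| 00.00

        // With an unescaped dot, any character is accepted between the digit pairs.
        Console.WriteLine(Regex.IsMatch("+02x00", @"[+-]\d{2}.\d{2}"));  // prints: True
        Console.WriteLine(Regex.IsMatch("+02x00", @"[+-]\d{2}\.\d{2}")); // prints: False
    }
}
```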

Regex to allow some special characters c#

I have to check whether a string contains special characters, but these 5 special characters are allowed: .()_-
I have written my regex as
var specialCharacterSet = "^[()_-.]";
var test = Regex.Match("a!", specialCharacterSet);
var isValid = test.Success;
but it throws an error:
parsing "^[()_-.]" - [x-y] range in reverse order.
You have specified a range with -. Place it at the end:
[()_.-]
Otherwise the range is not valid: the lower bound _ (code 95) appears later in the character table than the upper bound . (code 46).
Also, if you plan to check whether any character inside a string belongs to this set, you should remove the ^, which anchors the match to the beginning of the string.
To test if a string meets some pattern, use Regex.IsMatch:
Indicates whether the regular expression finds a match in the input string.
var specialCharacterSet = "[()_.-]";
var test = Regex.IsMatch("a!", specialCharacterSet);
UPDATE
To accept any string value that doesn't contain the five characters, you can use
var str = "file.na*me";
if (!Regex.IsMatch(str, @"[()_.-]"))
    Console.WriteLine(string.Format("{0}: Valid!", str));
else
    Console.WriteLine(string.Format("{0}: Invalid!", str));
You can use ^[()_\-.] or ^[()_.-]. If a character has a special meaning in regex, it is best to put \ before it.
[()_.-]
Keep - at the end or escape it to avoid forming an invalid range. A - between two characters inside a character class forms a range. Here
_ is decimal 95
. is decimal 46.
So it is forming an invalid range from 95 to 46.
var specialCharacterSet = "^[()_.-]";
var test = Regex.IsMatch("a!", specialCharacterSet);
Console.WriteLine(test);
Console.ReadLine();
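To see the reverse-order range for yourself, here is a minimal sketch that prints the two code points and checks which spellings of the class .NET accepts. The class and method names are hypothetical. It catches ArgumentException, which is the type the Regex constructor throws for a pattern it cannot parse.

```csharp
using System;
using System.Text.RegularExpressions;

static class RangeOrderDemo
{
    // Returns false when the Regex constructor rejects the pattern.
    public static bool IsValidPattern(string pattern)
    {
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine((int)'_'); // prints: 95
        Console.WriteLine((int)'.'); // prints: 46

        Console.WriteLine(IsValidPattern("[()_-.]"));   // prints: False (range 95 to 46)
        Console.WriteLine(IsValidPattern("[()_.-]"));   // prints: True  (dash last, literal)
        Console.WriteLine(IsValidPattern(@"[()_\-.]")); // prints: True  (dash escaped)
    }
}
```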
Convert all special characters in a pattern to literal text using Regex.Escape(). This assumes you already have using System.Text.RegularExpressions;
string pattern = Regex.Escape("[");
Then check like this:
if (Regex.IsMatch("ab[c", pattern)) Console.WriteLine("found");
Microsoft's tutorial doesn't mention escaping. I learned it from Perl.
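A brief sketch of Regex.Escape in use, assuming the aim is to search for a literal piece of text that happens to contain regex metacharacters. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapeDemo
{
    // Treats 'literal' as plain text, not as a pattern.
    public static bool ContainsLiteral(string input, string literal)
    {
        return Regex.IsMatch(input, Regex.Escape(literal));
    }

    static void Main()
    {
        Console.WriteLine(Regex.Escape("a.b*c"));          // prints: a\.b\*c
        Console.WriteLine(ContainsLiteral("ab[c", "["));   // prints: True
        Console.WriteLine(ContainsLiteral("abc", "."));    // prints: False
        Console.WriteLine(Regex.IsMatch("abc", "."));      // prints: True (unescaped dot)
    }
}
```

One caveat for the character-class problem in this thread: Regex.Escape is aimed at whole patterns and leaves - untouched, so it does not by itself make a string safe to paste between [ and ].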
In C#, a safe way to write it is [()_\-\.]: the - is special inside a character class and must be escaped unless it is first or last. The . is not special inside a class, so escaping it is optional but harmless.

regex issue c# numbers are underscores now

My regex is replacing all digits (0-9) in my string.
I don't understand why all numbers are replaced by _.
EDIT: I understand that my "_" replacement changes the matched characters into underscores. But why numbers?
Can anyone help me out? I only need to replace the special characters.
See regex here:
string symbolPattern = "[!@#$%^&*()-=+`~{}'|]";
Regex.Replace("input here 12341234", symbolPattern, "_");
Output: "input here ________"
The problem is that your pattern has a dash in the middle, which creates a range of ASCII characters from ) to =. Here's a breakdown:
): 41
1: 49
=: 61
As you can see, the digits occupy codes 48 to 57 (1 is 49), which fall within the range 41-61, so they're matched and replaced.
You need to place the - at either the beginning or end of the character class for it to be matched literally rather than act as a range:
"[-!@#$%^&*()=+`~{}'|]"
You must escape the - because the sequence [)-=] contains the digits:
string symbolPattern = @"[!@#$%^&*()\-=+`~{}'|]";
Move the - to the end of the list so it is seen as a literal:
"[!@#$%^&*()=+`~{}'|-]"
Or, to the front:
"[-!@#$%^&*()=+`~{}'|]"
As it stands, it will match all characters in the range )-=, which includes all numerals.
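The effect can be reproduced with a short sketch that runs the question's input through the pattern with the dash in the middle and with the dash moved to the end. It assumes the pattern was meant to begin with !@#$, and the class, constant and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class DashRangeDemo
{
    // Dash between ')' and '=' forms the range 41-61, which contains the digits.
    public const string Broken = "[!@#$%^&*()-=+`~{}'|]";

    // Dash last in the class is a literal dash.
    public const string Fixed = "[!@#$%^&*()=+`~{}'|-]";

    public static string Mask(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "_");
    }

    static void Main()
    {
        Console.WriteLine(Mask("input here 12341234", Broken)); // prints: input here ________
        Console.WriteLine(Mask("input here 12341234", Fixed));  // prints: input here 12341234
        Console.WriteLine(Mask("a-b=c", Fixed));                // prints: a_b_c
    }
}
```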
You need to escape the special characters in your regex. For instance, * is a quantifier outside a character class. Look at what each of those special characters means for your match; inside [...] it is the - that has special meaning here.
I've not used C#, but typically the "*" character is also a control character that would need escaping.
The following matches a whole line of any characters, although the "^" and "$" are somewhat redundant:
^.*$
This matches any number of "A" characters that appear in a string:
A*
The "Owl" book from O'Reilly is what you really need to research this:
http://shop.oreilly.com/product/9780596528126.do?green=B5B9A1A7-B828-5E41-9D38-70AF661901B8&intcmp=af-mybuy-9780596528126.IP

Regular Expression for string

I have a string like
e.g. AHDFFH XXXX
where 'AHDFFH' can be a character string of any length
and 'XXXX' is a run of repeated 'X' characters of any length, which needs to be replaced by an auto-incremented database value from a table.
I need to find the repeated 'X' characters in the above string using a regular expression.
Can anyone please help me figure this out?
Try this:
\b(\p{L})\1+\b
Explanation:
<!--
\b(\p{L})\1+\b
Options: case insensitive; ^ and $ match at line breaks
Assert position at a word boundary «\b»
Match the regular expression below and capture its match into backreference number 1 «(\p{L})»
A character with the Unicode property “letter” (any kind of letter from any language) «\p{L}»
Match the same text as most recently matched by capturing group number 1 «\1+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Assert position at a word boundary «\b»
-->
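A minimal sketch of how this pattern could drive the replacement the question asks for. The class and method names are hypothetical, the next value is passed in as a plain int (fetching it from the database is out of scope here), and zero-padding the number to the width of the placeholder is an assumption, not something the question states.

```csharp
using System;
using System.Text.RegularExpressions;

static class PlaceholderFiller
{
    // Replaces each whole-word run of a repeated letter (e.g. XXXX) with nextValue,
    // left-padded with zeros to the length of the run (padding is an assumption).
    public static string Fill(string template, int nextValue)
    {
        return Regex.Replace(
            template,
            @"\b(\p{L})\1+\b",
            m => nextValue.ToString().PadLeft(m.Length, '0'));
    }

    static void Main()
    {
        Console.WriteLine(Fill("AHDFFH XXXX", 42)); // prints: AHDFFH 0042
    }
}
```

Note that the pattern accepts any repeated letter, so a prefix such as "ZZZ" would be replaced as well. If only runs of X should count, \bX+\b is a narrower alternative.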
Do you mean some characters + one or more spaces + some numbers?
If so, you can use this regular expression:
\w+\s+(\d+)
C# code like this:
System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(@"\w+\s+(\d+)");
System.Text.RegularExpressions.Match m = regex.Match("aaaa 3333");
if (m.Success) {
    MessageBox.Show(m.Groups[1].Value);
}
