Regex for including a special character - c#

I have a requirement to remove all special characters from any string except " and '.
ClientName = Regex.Replace(ClientName, @"\(.*?\)", " ").Trim();
This is the regex I am currently using. I want to remove all the special characters except " and '.
Example:
clientName = "S\"unny, Cool. Mr";
Output should be:
S"unny Cool Mr

Consider using the following pattern:
@"[^\p{L}\p{Nd}'""\s]+"
This matches runs of special characters, that is, anything other than letters, digits, single and double quotes, and whitespace.
string clientName = "S\"unny, Cool. Mr";
string output = Regex.Replace(clientName, @"[^\p{L}\p{Nd}'""\s]+", "");
Console.WriteLine(output);
This prints:
S"unny Cool Mr
The character classes \p{L} and \p{Nd} represent all Unicode letters and decimal digits, so placing them (together with the two quotes and \s) into a negated character class means: remove anything that is not a letter, digit, quote, or whitespace.
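A minimal sketch wrapping the pattern above in a reusable helper. The names ClientNameCleaner and RemoveSpecialCharacters are hypothetical; the Regex instance is kept in a static field so it is built only once.

```csharp
using System.Text.RegularExpressions;

public static class ClientNameCleaner
{
    // Matches a run of characters that are NOT a Unicode letter (\p{L}),
    // a decimal digit (\p{Nd}), a single quote, a double quote ("" inside a
    // verbatim string is one " character), or whitespace (\s).
    private static readonly Regex SpecialChars = new Regex(@"[^\p{L}\p{Nd}'""\s]+");

    // Hypothetical helper: deletes every matched run, so ' and " survive.
    public static string RemoveSpecialCharacters(string input) =>
        SpecialChars.Replace(input, string.Empty);
}
```

ClientNameCleaner.RemoveSpecialCharacters("S\"unny, Cool. Mr") returns S"unny Cool Mr. Because whitespace is kept, removing a symbol that sits between two spaces leaves a double space ("A & B" becomes "A  B"); if that matters, collapse it afterwards with Regex.Replace(result, @"\s+", " ").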

Related

Can't understand an apparently simple C# code snippet using the Regex class: r = new Regex(" |, |,");

I have problems with some code that should be simple.
namespace CSharp
{
using System;
using System.Text;
using System.Text.RegularExpressions;
public class Tester
{
static void Main()
{
string s1 = "One,Two,Three Liberty Associates, Inc.";
Regex theRegex = new Regex(" |, |,");
StringBuilder sBuilder = new StringBuilder();
int id = 1;
foreach (string subString in theRegex.Split(s1))
{
sBuilder.AppendFormat("{0}: {1}\n", id++, subString);
}
Console.WriteLine("{0}", sBuilder);
}
}//tester class
}//namespace
Which outputs:
1: One
2: Two
3: Three
4: Liberty
5: Associates
6: Inc.
If I modify the constructor call to new Regex(" |, ");
I get:
1: One,Two,Three
2: Liberty
3: Associates
4: Inc.
I know | means OR and that I am splitting on commas and spaces, but I don't understand how it works or why the comma has to appear twice.
You can think of "|" as OR. So when you break this regex down, it finds all matches of " " OR ", " OR ",".
The second regex has space OR comma-plus-space. The string "One,Two,Three" does not contain any spaces, so it does not match any part of that regex. To better see what is happening, try Regex("( |, |,)") and Regex("( |, )"). Adding capturing parentheses to the regexes adds the text they match to the results. The Regex.Split documentation states:
If capturing parentheses are used in a Regex.Split expression, any captured text is included in the resulting string array. For example, if you split the string "plum-pear" on a hyphen placed within capturing parentheses, the returned array includes a string element that contains the hyphen.
Additionally, I suggest changing
sBuilder.AppendFormat("{0}: {1}\n", id++, subString);
to
sBuilder.AppendFormat("{0}: '{1}'\n", id++, subString);
Enclosing the {1} in quotes makes the string easier to see, especially if it has leading or trailing spaces.
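A small sketch that puts both ideas together (the class and method names are hypothetical). It prints each piece in quotes as suggested above, returns the delimiters when they are captured, and also shows why ", " is listed before ",": .NET tries the alternatives from left to right at each position, so with " |,|, " the lone comma wins at "Associates, Inc.", the space after it becomes a separate match, and Regex.Split inserts an empty string between the two adjacent matches.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    public const string Sample = "One,Two,Three Liberty Associates, Inc.";

    // Alternatives are tried left to right: " ", then ", ", then ",".
    public static string[] SplitOriginal(string s) => Regex.Split(s, " |, |,");

    // Same alternation inside a capturing group: each delimiter is returned too.
    public static string[] SplitWithDelimiters(string s) => Regex.Split(s, "( |, |,)");

    // Lone comma listed before ", ": the comma and the following space become
    // two adjacent matches, so an empty string appears between them.
    public static string[] SplitCommaFirst(string s) => Regex.Split(s, " |,|, ");

    // Formats pieces as "n: 'piece'" so leading/trailing spaces are visible.
    public static string Describe(string[] parts)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < parts.Length; i++)
        {
            sb.AppendFormat("{0}: '{1}'\n", i + 1, parts[i]);
        }
        return sb.ToString();
    }
}
```

For example, SplitDemo.Describe(SplitDemo.SplitWithDelimiters(SplitDemo.Sample)) lists 11 entries, alternating pieces and delimiters and ending with 10: ', ' and 11: 'Inc.'.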
Your first regex " |, |," splits the text on three alternatives, tried from left to right:
one space (' ')
one comma and one space (', ')
one comma (',')
The second regex " |, " has only two alternatives:
one space (' ')
one comma and one space (', ')
There is no alternative for a lone comma, so it doesn't split "One,Two,Three".
I suggest replacing ' ' with \s, and combining the two options ',' and ', ' into ',\s?': one comma followed by zero or one whitespace character.
So the full regex is \s|,\s? (written @"\s|,\s?" in C#).
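A minimal sketch of the suggested \s|,\s? pattern (the class name CompactSplit is hypothetical). The pattern is written as a verbatim string because \s is not a valid escape sequence in a regular C# string literal.

```csharp
using System.Text.RegularExpressions;

public static class CompactSplit
{
    // \s    : any single whitespace character (space, tab, newline, ...), or
    // ,\s?  : a comma followed by zero or one whitespace character.
    public static string[] Split(string s) => Regex.Split(s, @"\s|,\s?");
}
```

On the question's input it returns the same six pieces as " |, |,", and it also splits on tabs and newlines, which a literal space does not.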
OK, so how does the following work?
Regex theReg = new Regex(@"(?<time>(\d|\:)+)\s" + @"(?<ip>(\d|\.)+)\s" +
@"(?<site>\S+)");
@"(?<time>(\d|\:)+)\s" - should mean a group called time that has any combination of digits and colons, followed by whitespace, right?
@"(?<ip>(\d|\.)+)\s" - a group called ip that has digits or dots in any amount
@"(?<site>\S+)" - a group called site made of non-whitespace characters
And the way this regex is designed, does it only match when all three parts appear together? I made a few tests with it; this is what I understand.
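A sketch of how this pattern behaves (the class name LogLineParser and the sample text are hypothetical). The three groups form one sequence, so a match needs a time, whitespace, an IP, whitespace and a site, in that order; Regex.Matches returns one match per such triple, and each named group is read back through Match.Groups["name"].

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LogLineParser
{
    // The pattern from the question: three named groups in a fixed order.
    // time: digits and colons; ip: digits and dots; site: non-whitespace characters.
    private static readonly Regex LogLine = new Regex(
        @"(?<time>(\d|\:)+)\s" + @"(?<ip>(\d|\.)+)\s" + @"(?<site>\S+)");

    // Returns one (Time, Ip, Site) tuple per place where the whole sequence matches.
    public static List<(string Time, string Ip, string Site)> Parse(string text)
    {
        var results = new List<(string Time, string Ip, string Site)>();
        foreach (Match m in LogLine.Matches(text))
        {
            results.Add((m.Groups["time"].Value, m.Groups["ip"].Value, m.Groups["site"].Value));
        }
        return results;
    }
}
```

Parsing "08:15:00 10.0.0.1 a.example 09:30:00 10.0.0.2 b.example" yields two entries, while "08:15:00 a.example" yields none because the ip part is missing.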

Separate title string with no spaces into words

I want to find and separate words in a title that has no spaces.
Before:
ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)"Test"'Test'[Test]
After:
This Is An Example Title HELLO-WORLD 2019 T.E.S.T. (Test) [Test] "Test" 'Test'
I'm looking for a regular expression that can do the following.
I thought I'd identify each word by the uppercase letter it starts with,
but also preserve all-uppercase words so they are not spaced out into A L L U P P E R C A S E.
Additional rules:
Put a space between a letter and a number that touch: Hello2019World becomes Hello 2019 World
Don't space initials that contain periods, hyphens, or underscores: T.E.S.T.
Don't add spaces inside brackets, parentheses, or quotes: [Test] (Test) "Test" 'Test'
Preserve hyphens: Hello-World
C#
https://rextester.com/GAZJS38767
// Title without spaces
string title = "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]\"Test\"'Test'";
// Detect where to space words
string[] split = Regex.Split(title, "(?<!^)(?=(?<![.\\-'\"([{])[A-Z][\\d+]?)");
// Trim each word of extra spaces before joining
split = (from e in split
select e.Trim()).ToArray();
// Join into new title
string newtitle = string.Join(" ", split);
// Display
Console.WriteLine(newtitle);
Regular expression
I'm having trouble with spacing before the numbers, brackets, parentheses, and quotes.
https://regex101.com/r/9IIYGX/1
(?<!^)(?=(?<![.\-'"([{])(?<![A-Z])[A-Z][\d+?]?)
(?<!^) // Negative look behind
(?= // Positive look ahead
(?<![.\-'"([{]) // Ignore if starts with punctuation
(?<![A-Z]) // Ignore if starts with double Uppercase letter
[A-Z] // Space after each Uppercase letter
[\d+]? // Space after number
)
Solution
Thanks for all your combined effort in the answers. Here's a Regex example. I'm applying this to file names, so I have excluded the special characters \/:*?"<>|.
https://rextester.com/FYEVE73725
https://regex101.com/r/xi8L4z/1
Here is a regex which seems to work well, at least for your sample input:
(?<=[a-z])(?=[A-Z])|(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])|(?<=\W)(?=\W)
This pattern says to split at a boundary where one of the following conditions holds:
what precedes is a lowercase letter and what follows is an uppercase letter
what precedes is a digit and what follows is a letter (or vice versa)
what precedes and what follows are both non-word characters (e.g. quote, parenthesis, etc.)
string title = "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]\"Test\"'Test'";
string[] split = Regex.Split(title, "(?<=[a-z])(?=[A-Z])|(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])|(?<=\\W)(?=\\W)");
split = (from e in split select e.Trim()).ToArray();
string newtitle = string.Join(" ", split);
Output:
This Is An Example Title HELLO-WORLD 2019 T.E.S.T. (Test) [Test] "Test" 'Test'
Note: You might also want to add this assertion to the regex alternation:
(?<=\W)(?=\w)|(?<=\w)(?=\W)
The sample input didn't need it, but other inputs might. Be aware that it splits at every word/non-word transition, so on this title it would also separate the hyphen in HELLO-WORLD and the periods in T.E.S.T.
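A sketch (TitleSplitter, Boundaries, WordEdges and Spaced are hypothetical names) that packages the answer's split, trim and join steps, with a flag for appending the extra boundaries from the note so their effect on this title can be checked directly.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class TitleSplitter
{
    // lower->upper, digit->letter, letter->digit, and between two non-word characters.
    public const string Boundaries =
        @"(?<=[a-z])(?=[A-Z])|(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])|(?<=\W)(?=\W)";

    // The optional extra alternatives from the note: every word/non-word transition.
    public const string WordEdges = @"|(?<=\W)(?=\w)|(?<=\w)(?=\W)";

    // Splits at the zero-width boundaries, trims each piece, and joins with single spaces.
    public static string Spaced(string title, bool includeWordEdges = false)
    {
        string pattern = includeWordEdges ? Boundaries + WordEdges : Boundaries;
        string[] parts = Regex.Split(title, pattern)
            .Select(p => p.Trim())
            .ToArray();
        return string.Join(" ", parts);
    }
}
```

TitleSplitter.Spaced(title) reproduces the output shown above; with includeWordEdges: true, HELLO-WORLD comes back as HELLO - WORLD.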
The first few parts are similar to @revo's answer: (?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}. Additionally, I add the following regex to put a space between a number and a letter: (?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])|(?<=[A-Z])(?=\d)|(?<=\d)(?=[A-Z]), and, to handle input like OTPIsADevice, a lookahead and lookbehind that find an uppercase letter next to a lowercase one: (((?<!^)[A-Z](?=[a-z]))|((?<=[a-z])[A-Z]))
Note that | is the alternation (OR) operator, which lets all of these patterns be combined into one regex.
Regex: (?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}|(?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])|(?<=[A-Z])(?=\d)|(?<=\d)(?=[A-Z])|(((?<!^)[A-Z](?=[a-z]))|((?<=[a-z])[A-Z]))
Update
Improved a bit:
From: (?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}|(?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])|(?<=[A-Z])(?=\d)|(?<=\d)(?=[A-Z])
into: (?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}|(?<=\p{L})\d, which does the same job for this input.
(((?<!^)(?<!\p{P})[A-Z](?=[a-z]))|((?<=[a-z])[A-Z]))|(?<!^)(?=[[({&])|(?<=[)\]}!&}]), improved from the OP's comment, which adds exceptions for some punctuation: (((?<!^)(?<!['([{])[A-Z](?=[a-z]))|((?<=[a-z])[A-Z]))|(?<!^)(?=[[({&])|(?<=[)\\]}!&}])
Final regex:
(?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}|(?<=\p{L})\d|(((?<!^)(?<!\p{P})[A-Z](?=[a-z]))|((?<=[a-z])[A-Z]))|(?<!^)(?=[[({&])|(?<=[)\]}!&}])
Aiming for simplicity rather than one huge regex, I would recommend this code with small, simple patterns (explanations are in the comments):
string str = "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)\"Test\"'Test'[Test]";
// insert a space when a lowercase letter is followed by an uppercase letter
str = Regex.Replace(str, "(?<=[a-z])(?=[A-Z])", " ");
// insert a space whenever a digit is followed by a letter
str = Regex.Replace(str, @"(?<=\d)(?=[A-Za-z])", " ");
// insert a space when a letter is followed by a digit
str = Regex.Replace(str, @"(?<=[A-Za-z])(?=\d)", " ");
// insert a space before any of the characters ( [ " ' when it is followed by a letter or digit
str = Regex.Replace(str, @"(?=[(\[""'][a-zA-Z0-9])", " ");
// insert a space after a closing ) or ] that is directly followed by a letter or digit
// (quotes are left out: an opening quote looks the same as a closing one, so it would match too)
str = Regex.Replace(str, @"(?<=[)\]])(?=[a-zA-Z0-9])", " ");
You could reduce the requirements to fewer regex steps by interpreting them differently. For example, the first requirement is equivalent to saying: put a space before a capital letter unless it is preceded by a punctuation mark or another capital letter.
The following regex works for almost all of the mentioned requirements and may be extended to include or exclude other situations:
(?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}
Use the Replace() method with " $0" (a space followed by the whole match) as the substitution string.
.NET:
string input = @"ThisIsAnExample.TitleHELLO-WORLD2019T.E.S.T.(Test)""Test""'Test'[Test]";
Regex regex = new Regex(@"(?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}", RegexOptions.Multiline);
Console.WriteLine(regex.Replace(input, @" $0"));
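The same Replace call as a reusable helper (the names TitleSpacer and Space are hypothetical; RegexOptions.Multiline is dropped on the assumption that a title is a single line). Running it on the sample also shows the "almost": the letter-to-digit boundary in WORLD2019 is not split by this pattern, which is the case the (?<=\p{L})\d alternative in the other answer covers.

```csharp
using System.Text.RegularExpressions;

public static class TitleSpacer
{
    // Alternative 1: an uppercase letter that is not at the start and not preceded
    //                by another uppercase letter or a punctuation character (\p{P}).
    // Alternative 2: a punctuation character preceded by another punctuation character.
    private static readonly Regex WordStart =
        new Regex(@"(?<!^|[A-Z\p{P}])[A-Z]|(?<=\p{P})\p{P}");

    // "$0" is the whole match, so each matched character gets a space in front of it.
    public static string Space(string title) => WordStart.Replace(title, " $0");
}
```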

how to replace a string having single quote with some characters in C#

I need code that checks whether a string contains a single quote ' directly before a character; each such single quote should be replaced with two single quotes ''.
Examples:
input = "test's"
output = "test''s"
input = "test'"
output = "test'"
input = "test' "
output = "test' "
Use positive lookahead to check if next character is a word:
string input = "test's";
var result = Regex.Replace(input, @"'(?=\w)", @"""");
This code uses a regular expression to replace matches in the input string with a double quote. The pattern '(?=\w) contains a single quote followed by a positive lookahead for the next character (the character itself is not included in the match). If a match is found (i.e. the input contains a single quote followed by a word character), the quote is replaced with the given string (a double-quote character in this case).
UPDATE: After your edit and comments, correct replacement should look like
var result = Regex.Replace(input, "'(?=[a-zA-Z])", "''");
Inputs:
"test's"
"test'"
"test' "
"test'42"
"Mr Jones' test isn't good - it's bad"
Outputs:
"test''s"
"test'"
"test' "
"test'42"
"Mr Jones' test isn''t good - it''s bad"
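The updated replacement as a reusable helper; QuoteDoubler and DoubleQuotesBeforeLetters are hypothetical names. The lookahead checks the next character without consuming it, so only the quote itself is replaced.

```csharp
using System.Text.RegularExpressions;

public static class QuoteDoubler
{
    // Matches a single quote only when the next character is an ASCII letter;
    // (?=[a-zA-Z]) inspects that letter without including it in the match.
    private static readonly Regex QuoteBeforeLetter = new Regex("'(?=[a-zA-Z])");

    // Doubles each such quote; trailing quotes and quotes before digits or spaces are left alone.
    public static string DoubleQuotesBeforeLetters(string input) =>
        QuoteBeforeLetter.Replace(input, "''");
}
```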
Try this way (note that it replaces every single quote with a double-quote character, whatever follows it):
string output = input.Replace("'", "\"");

Regex for special characters?

string Val = Regex.Replace(TextBox1.Text, @"[^a-z, A-z, 0-9]", string.Empty);
This expression does not match the characters ^ and _. What should I do to match them?
One more thing: if the TextBox1.Text value is longer than 10 characters, the last character (the 11th) should also match.
Note that ^ has a special meaning when it is the first character inside square brackets: it negates the character class, so the class matches everything except the characters listed between the [].
If you want to match "^" and "_", put the caret (^) in a position other than directly after the opening bracket, like so, using a repetition to restrict the character length:
[\W_]
That will make sure the characters in the entire string are 10.
Or escape it with a backslash: \^.
string Val = Regex.Replace(TextBox1.Text, @"[\W_]", string.Empty);
Your problem is A-z.
This matches all ASCII letters A through Z, then the characters that lie between Z and a in ASCII (among them ^ and _), then all ASCII letters from a to z. Because the class is negated, ^ and _ are never matched (and therefore never removed) by your regex, and neither are the comma and space that you also included.
To clarify, your regex could also have been written as
[^a-zA-Z0-9\[\\\]^_` ,]
You probably wanted
string Val = Regex.Replace(TextBox1.Text, @"[^a-zA-Z0-9]", string.Empty);
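A small sketch comparing the three classes discussed above on the same input (the class and method names are hypothetical). It shows that the original class keeps ^ and _ because A-z spans them, while both suggested fixes remove them.

```csharp
using System.Text.RegularExpressions;

public static class CharacterClassDemo
{
    // Original class: a-z, comma, space, A-z (which also spans [ \ ] ^ _ `), 0-9.
    // Because the class is negated, all of those characters are KEPT.
    public static string Original(string s) =>
        Regex.Replace(s, @"[^a-z, A-z, 0-9]", string.Empty);

    // Intended class: keep only ASCII letters and digits.
    public static string LettersAndDigitsOnly(string s) =>
        Regex.Replace(s, @"[^a-zA-Z0-9]", string.Empty);

    // [\W_]: any non-word character plus the underscore, which \w counts as a word character.
    public static string NonWordAndUnderscore(string s) =>
        Regex.Replace(s, @"[\W_]", string.Empty);
}
```

The two fixes differ on non-ASCII input: in .NET, \w includes all Unicode letters, so [\W_] keeps the é in "café" while [^a-zA-Z0-9] removes it.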

String Manipulation using C#

In C# we can check a string with the string.Contains() method, e.g.:
string test = "Microsoft";
if (test.Contains("i"))
test = test.Replace("i","a");
This is fine. But what if I want to replace the " symbol in a string?
I want to achieve this:
"<html><head>
I want to remove the " symbol from check so that the result is:
<html><head>
The " character can also be replaced, just like any other:
test = test.Replace("\"","");
Also, note that you don't have to test whether the character exists: your test.Contains("i") check can be removed, since the .Replace() method does nothing (no replacement, no error thrown) if the character doesn't occur in the string.
To include a quote symbol in a string, you need to escape it with a backslash. In your example, you want to use something like this:
if (test.Contains("\""))
There are two ways to include a '"' character in a string literal. All the answers so far have used the C-style escape:
var quotation = "Parting is such sweet sorrow";
var howSweetIsIt = quotation + " that I shall say \"good-night\" till it be morrow.";
In some contexts (especially for users experienced with Visual Basic), the verbatim string literal may be easier to read. A verbatim string literal begins with an @ sign, and the only character that requires escaping is the quotation mark; all other characters, including backslashes, are included verbatim (hence the name). Significantly, the quotation mark is escaped differently: rather than preceding it with a backslash, you double it:
var howSweetIsIt = quotation + @" that I shall say ""good-night"" till it be morrow.";
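A small sketch putting the two literal forms side by side (QuoteLiterals and RemoveQuotes are hypothetical names). Both constants hold exactly the same text, and the helper removes every double quote, returning the input unchanged when there is none, so no Contains check is needed first.

```csharp
public static class QuoteLiterals
{
    // Regular (C-style) literal: each quote is escaped with a backslash.
    public const string Escaped = "say \"good-night\"";

    // Verbatim literal (@"..."): backslashes are literal and quotes are doubled.
    public const string Verbatim = @"say ""good-night""";

    // Removes every double-quote character.
    public static string RemoveQuotes(string s) => s.Replace("\"", string.Empty);
}
```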
string SymbolString = "Micro\"so\"ft";
The string above uses the escape character \ to insert " between the characters.
string Result = SymbolString.Replace("\"", string.Empty);
The Replace call above replaces every " character with an empty string.
Is this what you are trying to achieve?
if (check.Contains("\""))
output = check.Replace("\"", "");
output = check.Replace("\"", "");
Just remember to use "\"" for the quote sign as the backslash is an escape character.
if (str.Contains("\""))
{
str = str.Replace("\"", "");
}
