Replace specific repeating characters from a string - c#

I have a string like "aaa\\\\\\\\test.txt".
How do I replace every run of repeated \ characters with a single \?
I have tried
pPath = new Regex("\\{2,}").Replace(pPath, Path.DirectorySeparatorChar.ToString());
which matches on http://regexstorm.net/tester but doesn't seem to do the trick in my program.
I'm running this on Windows, so Path.DirectorySeparatorChar is a \.

Use new Regex(@"\\{2,}") and keep the rest the same.
You need to leave the backslash escaped in your regular expression, so the C# string itself must contain two backslashes. The two equivalent ways to write that string literal are @"\\{2,}" and "\\\\{2,}".
Both of those literals produce the string \\{2,}, which is the correct regular expression. Your original literal "\\{2,}" produces the string \{2,}, in which the backslash escapes the { instead of being matched itself. Your regular expression calls for one backslash character occurring two or more times, and you have to escape that backslash. At the risk of being pedantic: if you wanted to replace two or more a characters, you would use the regular expression a{2,}, and if you want to replace two or more \ characters, you would use the regular expression \\{2,}, because \\ is the regular expression that matches a single \. Clear as mud? :)
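A minimal sketch of the verbatim-literal fix, assuming the input contains a run of four backslashes. The names PathCleaner and CollapseBackslashes are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathCleaner
{
    // The verbatim literal @"\\{2,}" is the regex \\{2,}:
    // a literal backslash repeated two or more times.
    private static readonly Regex RepeatedBackslashes = new Regex(@"\\{2,}");

    // Collapses every run of two or more backslashes into a single backslash.
    public static string CollapseBackslashes(string path)
    {
        // In a .NET replacement string only $ is special,
        // so @"\" is simply one literal backslash.
        return RepeatedBackslashes.Replace(path, @"\");
    }

    public static void Main()
    {
        string input = @"aaa\\\\test.txt"; // verbatim literal: four backslashes
        Console.WriteLine(CollapseBackslashes(input)); // prints aaa\test.txt
    }
}
```

The sketch hard-codes the backslash to stay platform independent; on Windows, Path.DirectorySeparatorChar.ToString() from the question yields the same replacement.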

Not being a demigod at regex, I would use a StringBuilder (from System.Text) and do something like this:
string txt = "";   // the path to clean up
int count = 0;     // length of the current run of backslashes
StringBuilder bldr = new StringBuilder();
foreach (char c in txt)
{
    if (c == '\\')
    {
        count++;
        // Keep only the first backslash of each run.
        if (count == 1)
        {
            bldr.Append(c);
        }
    }
    else
    {
        count = 0;
        bldr.Append(c);
    }
}
string result = bldr.ToString();

Related

Removing special characters using Regex in C#

I have one problem in this code. I want to remove all special characters but the square brackets are not getting removed.
string regExp = "[\\\"]";
string tmp = Regex.Replace(str, regExp," ");
string[] strArray = tmp.Split(',');
obj.amcid = db.Execute("select MAX(amcid)+1 from sca_amcmaster");
foreach (string i in strArray)
{
    // int myInts = int.Parse(i);
    db.Execute(";EXEC insertitems1 @0,@1", i, obj.invoiceno);
}
Square brackets are metacharacters in regular expressions: they delimit a character class, that is, a list of characters to match. So if you want to match the brackets themselves, you need to escape them inside the class and change your expression to:
string regExp = @"[\[\\""\]]";
In other words, you simply need to put a backslash before each square bracket to match them too.
If none of them are required in the expression, you can group them using parentheses and the ? quantifier (zero or one match):
string regExp = @"(\[)?(\\)?("")?(\])?";
Note that this second pattern can also match the empty string.
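A minimal sketch of a character class that matches square brackets, backslashes, and double quotes one character at a time and replaces each with a space, as the question's Regex.Replace call does. The names SpecialCharStripper and Strip are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpecialCharStripper
{
    // Character class matching exactly one of:  [  \  "  ]
    // Verbatim literal: "" is a literal quote, \[ \\ \] are regex escapes.
    private static readonly Regex Special = new Regex(@"[\[\\""\]]");

    // Replaces each special character with a single space.
    public static string Strip(string input)
    {
        return Special.Replace(input, " ");
    }

    public static void Main()
    {
        // Brackets make the leading and trailing spaces visible in the output.
        Console.WriteLine("<" + Strip("[\"1\",\"2\"]") + ">"); // prints <  1 , 2  >
    }
}
```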

what's wrong with this regular expression

I'm experimenting with regular expressions and I don't know why my regex doesn't match.
The string line is one line from a file. A line that should match looks like this:
["boxusers:settings/user[boxuser11]/name"] = "username",
The number of the boxuser and the value could be different, so I tried to find a regular expression
My code is this:
string user;
string patternUser = "[\"boxusers:settings/user[boxuser\\d{2,}]/name\"] = \"";
if (Regex.Match(line,patternUser).Success)
user = Regex.Replace(Regex.Replace(line, patternUser, String.Empty), ",*", String.Empty);
So I think \d{2,} should match a number with at least two digits, and the rest is just matched as written. But the regex just doesn't match.
What's going wrong?
Square brackets have a special significance in regular expressions. You need to escape them with a backslash.
var line = @"[""boxusers:settings/user[boxuser11]/name""] = ""username"", ";
string patternUser = @"\[""boxusers:settings/user\[boxuser\d{2,}\]/name""\] = """;
Console.WriteLine(Regex.Match(line, patternUser).Success);
If you don't want to use verbatim strings, you'll need to use two backslashes to escape each regex metacharacter (the first to escape the second).
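The two literal styles can be compared side by side. The sketch below also adds a named capture group to pull out the value, which goes beyond the original answer (it only tested for a match). BoxUserParser, TryGetUserName, and the group name are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoxUserParser
{
    // Verbatim literal: one backslash per regex escape, "" for a literal quote.
    public const string VerbatimPattern =
        @"\[""boxusers:settings/user\[boxuser\d{2,}\]/name""\] = ""(?<name>[^""]*)""";

    // Regular literal: every regex backslash is doubled, quotes are written \".
    public const string RegularPattern =
        "\\[\"boxusers:settings/user\\[boxuser\\d{2,}\\]/name\"\\] = \"(?<name>[^\"]*)\"";

    // Returns true and the quoted value when the line matches the pattern.
    public static bool TryGetUserName(string line, out string name)
    {
        Match m = Regex.Match(line, VerbatimPattern);
        name = m.Success ? m.Groups["name"].Value : null;
        return m.Success;
    }

    public static void Main()
    {
        string line = @"[""boxusers:settings/user[boxuser11]/name""] = ""username"",";
        Console.WriteLine(VerbatimPattern == RegularPattern); // prints True
        if (TryGetUserName(line, out string name))
            Console.WriteLine(name);                          // prints username
    }
}
```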

Regular expression to remove whitespace around a comma, except when quoted

I have a CSV file that has rows resembling this:
1, 4, 2, "PUBLIC, JOHN Q" ,ACTIVE , 1332
I am looking for a regular expression replacement that will match against these rows and spit out something resembling this:
1,4,2,"PUBLIC, JOHN Q",ACTIVE,1332
I thought this would be rather easy: I made the expression ([ \t]+,) and replaced it with ,. I made a complement expression (,[ \t]+) with a replacement of , and I thought I had achieved a good means of right-trimming and left-trimming strings.
...but then I noticed that my "PUBLIC, JOHN Q" was now "PUBLIC,JOHN Q" which isn't what I wanted. (Note the space following the comma is now gone).
What would be the appropriate expression to trim the white space before and after a comma, but leave quoted text untouched?
UPDATE
To clarify, I am using an application to handle the file. This application allows me to define multiple regular expression replacements; it does not provide a parsing capability. While this may not be the ideal mechanism for this, it would sure beat making another application for this one file.
If the engine used by your tool is the C# regular expression engine, then you can try the following expression:
(?<!,\s*"(?:[^\\"]|\\")*)\s+(?!(?:[^\\"]|\\")*"\s*,)
replace with empty string.
The other answers assume the quotes are balanced and count quotes to determine whether a space is part of a quoted value.
My expression looks for all spaces that are not part of a quoted value.
RegexHero Demo
Something like this might do the job:
(?<!(^[^"]*"[^"]*(("[^"]*){2})*))[\t ]*,[ \t]*
Which matches [\t ]*,[ \t]*, only when not preceded by an odd number of quotes.
Going with a CSV library or parsing the file yourself would be much easier, and IMO should be the preferred option here.
But if you really insist on a regex, you can use this one:
"\s+(?=([^\"]*\"[^\"]*\")*[^\"]*$)"
And replace it with an empty string ("").
This regex matches one or more whitespace characters that are followed by an even number of quotes. This will of course work only if your quotes are balanced.
(?x) # Ignore Whitespace
\s+ # One or more whitespace characters
(?= # Followed by
( # A group - This group captures even number of quotes
[^\"]* # Zero or more non-quote characters
\" # A quote
[^\"]* # Zero or more non-quote characters
\" # A quote
)* # Zero or more repetition of previous group
[^\"]* # Zero or more non-quote characters
$ # Till the end
) # Look-ahead end
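A minimal sketch that applies the lookahead pattern above to the sample row from the question, assuming one row per call and balanced quotes. CsvWhitespace and StripOutsideQuotes are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CsvWhitespace
{
    // Whitespace followed by an even number of quotes up to the end of the row,
    // i.e. whitespace that sits outside any quoted value (quotes must be balanced).
    private static readonly Regex OutsideQuotes =
        new Regex(@"\s+(?=([^""]*""[^""]*"")*[^""]*$)");

    // Removes all whitespace that is not inside a quoted value.
    public static string StripOutsideQuotes(string row)
    {
        return OutsideQuotes.Replace(row, string.Empty);
    }

    public static void Main()
    {
        string row = "1, 4, 2, \"PUBLIC, JOHN Q\" ,ACTIVE , 1332";
        Console.WriteLine(StripOutsideQuotes(row));
        // prints 1,4,2,"PUBLIC, JOHN Q",ACTIVE,1332
    }
}
```

Note that this removes every whitespace run outside quotes, not only the ones next to a comma, so an unquoted value such as NEW YORK would become NEWYORK.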
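string format(string val)
{
    if (val.StartsWith("\"")) val = " " + val;
    // Split on quotes: even-indexed parts are outside quotes, odd-indexed parts are inside.
    string[] vals = val.Split('\"');
    for (int i = 0; i < vals.Length; i += 2)
        vals[i] = vals[i].Replace(" ", "").Replace("\t", "");
    // Rejoin with the quote character so the quoted values keep their quotes.
    return string.Join("\"", vals);
}
This will work as long as every quoted string in the row is properly closed.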
Forget the regex (See Bart's comment on the question, regular expressions aren't suitable for CSV).
public static string ReduceSpaces( string input )
{
    char[] a = input.ToCharArray();
    int placeComma = 0, placeOther = 0;
    bool inQuotes = false;
    bool followedComma = true;
    foreach( char c in a ) {
        inQuotes ^= (c == '\"');
        if (c == ' ' && !inQuotes) {
            // Keep the space tentatively; a following comma overwrites it.
            if (!followedComma)
                a[placeOther++] = c;
        }
        else if (c == ',' && !inQuotes) {
            a[placeComma++] = c;
            placeOther = placeComma;
            followedComma = true;
        }
        else {
            // Any other character, including spaces and commas inside quotes.
            a[placeOther++] = c;
            placeComma = placeOther;
            followedComma = false;
        }
    }
    return new String(a, 0, placeComma);
}
Demo: http://ideone.com/NEKm09

Check for special characters are not allowed in C#

I have to validate a text box from a list of special characters that are not allowed.
These are all the characters that are not allowed:
"&";"\";"/";"!";"%";"#";"^";"(";")";"?";"|";"~";"+";" ";
"{";"}";"*";",";"[";"]";"$";";";":";"=";"
The semicolon is only used to separate the characters. I tried to write a regex for some of the characters, intending to extend it if it worked, but it is not working.
What am I doing wrong here?
Regex.IsMatch(textBox1.Text, @"^[\%\/\\\&\?\,\'\;\:\!\-]+$")
^[\%\/\\\&\?\,\'\;\:\!\-]+$
matches the strings that consist entirely of special characters. You need to invert the character class to match the strings that do not contain a special character:
^[^\%\/\\\&\?\,\'\;\:\!\-]+$
^--- added
Alternatively, you can use this regex to match any non-empty string containing only alphanumeric characters, hyphens, underscores and apostrophes:
^[a-zA-Z0-9\-'_]+$
The regex you mention in the comments
[^a-zA-Z0-9-'_]
matches a string that contains any character except those that are allowed (you might need to escape the hyphen, though). This works as well, assuming you reverse the condition correctly (accept the strings that do not match).
If you are just looking for any of a list of characters, then a regular expression is the more complicated option. String.IndexOfAny returns the first index of any character from an array, or -1 if none is found. So the check is:
if (input.IndexOfAny(theCharacters) != -1) {
    // Found one of them.
}
where theCharacters has previously been set up at class scope:
private readonly char[] theCharacters = new [] {'&','\\','/','!','%','#','^',... };
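A self-contained sketch of the IndexOfAny check. The Disallowed array holds only a hypothetical subset of the characters from the question, and InputValidator and IsAllowed are illustrative names.

```csharp
using System;

public static class InputValidator
{
    // Hypothetical subset of the disallowed characters; extend as needed.
    private static readonly char[] Disallowed =
        { '&', '\\', '/', '!', '%', '^', '(', ')', '?', '|', '~', '+', ' ' };

    // True when the text contains none of the disallowed characters.
    public static bool IsAllowed(string text)
    {
        return text.IndexOfAny(Disallowed) == -1;
    }

    public static void Main()
    {
        Console.WriteLine(IsAllowed("report_2024")); // prints True
        Console.WriteLine(IsAllowed("a/b"));         // prints False
    }
}
```

Keeping the array in a static readonly field means it is built once rather than on every validation call.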
You need to remove ^ from the beginning and $ from the end of the pattern; otherwise the string only matches if it consists of special characters from start to end.
So, instead of
@"^[\%\/\\\&\?\,\'\;\:\!\-]+$"
it should be
@"[\%\/\\\&\?\,\'\;\:\!\-]+"
You can read more about start of string and end of string anchors here
Your regex means "a string consisting only of special characters" (since you have the begin/end anchors ^ and $).
You probably just want to check whether the string contains any of the characters; @"[\%\/\\\&\?\,\'\;\:\!\-]" would be enough.
Also, String.IndexOfAny may be a better fit if you just need to see whether any of the characters is present in the source string.
Please use this in the TextChanged event:
//Regex regex = new Regex("([a-zA-Z0-9 ._#]+)");
Regex regex = new Regex("^[a-zA-Z0-9_#(+).,-]+$");
string alltxt = txtOthers.Text; // txtOthers is the text box's name
int k = alltxt.Length;
for (int i = 0; i <= k - 1; i++)
{
    string lastch = alltxt.Substring(i, 1);
    MatchCollection matches = regex.Matches(lastch);
    if (matches.Count == 0)
    {
        // Not an allowed character: remove it and re-check the same index.
        txtOthers.Text = alltxt.Remove(i, 1);
        i = i - 1;
        alltxt = txtOthers.Text;
        k = alltxt.Length;
    }
    txtOthers.Select(txtOthers.TextLength, 0);
}
BY Sharafu Hameed

Using Regular Expressions for Pattern Finding with Replace

I have a string in the following format in a comma delimited file:
someText, "Text with, delimiter", moreText, "Text Again"
What I need to do is create a method that will look through the string, and will replace any commas inside of quoted text with a dollar sign ($).
After the method, the string will be:
someText, "Text with$ delimiter", moreText, "Text Again"
I'm not entirely good with RegEx, but would like to know how I can use regular expressions to search for a pattern (finding a comma in between quotes), and then replace that comma with the dollar sign.
Personally, I'd avoid regexes here - assuming that there aren't nested quote marks, this is quite simple to write up as a for-loop, which I think will be more efficient:
var inQuotes = false;
var sb = new StringBuilder(someText.Length);
for (var i = 0; i < someText.Length; ++i)
{
    if (someText[i] == '"')
    {
        inQuotes = !inQuotes;
    }
    if (inQuotes && someText[i] == ',')
    {
        sb.Append('$');
    }
    else
    {
        sb.Append(someText[i]);
    }
}
This type of problem is where Regex fails, do this instead:
var sb = new StringBuilder(str);
var insideQuotes = false;
for (var i = 0; i < sb.Length; i++)
{
    switch (sb[i])
    {
        case '"':
            insideQuotes = !insideQuotes;
            break;
        case ',':
            if (insideQuotes)
                sb.Replace(',', '$', i, 1);
            break;
    }
}
str = sb.ToString();
You can also use a CSV parser to parse the string and write it again with replaced columns.
Here's how to do it with Regex.Replace:
string output = Regex.Replace(
    input,
    "\".*?\"",
    m => m.ToString().Replace(',', '$'));
Of course, if you want to ignore escaped double quotes it gets more complicated. Especially when the escape character can itself be escaped.
Assuming the escape character is \, then when trying to match the double quotes, you'll want to match only quotation marks which are preceded by an even number of escape characters (including zero). The following pattern will do that for you:
string pattern = @"(?<=((^|[^\\])(\\\\){0,}))"".*?(?<=([^\\](\\\\){0,}))""";
At this point, you might prefer to abandon regular expressions ;)
UPDATE:
In reply to your comment, it is easy to make the operation configurable for different quotation marks, delimiters and placeholders.
string quote = "\"";
string delimiter = ",";
string placeholder = "$";
string output = Regex.Replace(
    input,
    quote + ".*?" + quote,
    m => m.ToString().Replace(delimiter, placeholder));
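A runnable sketch of the configurable version. As an addition that is not in the original answer, it passes the quote string through Regex.Escape so that a quote such as | is not read as regex syntax. QuotedDelimiterReplacer and ReplaceInQuotes are hypothetical names, and escaped quotes inside a value are not handled.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedDelimiterReplacer
{
    // Replaces every delimiter found between a pair of quote strings with the placeholder.
    public static string ReplaceInQuotes(
        string input, string quote, string delimiter, string placeholder)
    {
        // Regex.Escape keeps a quote string such as "|" or "(" from acting as a metacharacter.
        string q = Regex.Escape(quote);
        return Regex.Replace(
            input,
            q + ".*?" + q,
            // The evaluator's return value is inserted literally, so "$" is safe here.
            m => m.Value.Replace(delimiter, placeholder));
    }

    public static void Main()
    {
        string input = "someText, \"Text with, delimiter\", moreText, \"Text Again\"";
        Console.WriteLine(ReplaceInQuotes(input, "\"", ",", "$"));
        // prints someText, "Text with$ delimiter", moreText, "Text Again"
    }
}
```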
If you'd like to go the regex route here's what you're looking for:
var result = Regex.Replace( text, "(\"[^,]*),([^,]*\")", "$1$$$2" );
The problem with regex in this case is that it won't catch "this, has, two commas".
See it working at http://refiddle.com/1ab
Can you give this a try: "[\w ],[\w ]" (double quotes included)?
And be careful with the replacement because direct replacement will remove the whole string enclosed in the double quotes.
