C# Regex - How to ignore escape sequences with variables - c#

Say I have this code:
foreach (string filepath in someList)
{
someBool = Regex.IsMatch(someString, filepath);
}
Where someBool, someList, and someString are just a random boolean, list, and string, respectively (This is a simple example of what I'm trying to do). Filepath is a filepath, with a bunch of backslashes (i.e. C:\\somefolder\somefile). The problem is by running this code, I get an ArgumentException error, with an "unrecognized escape sequence" problem for things like "D:\\H..." I tried using
someBool = Regex.IsMatch(someString, @filepath);
and I am still seeing the error. Is there something else I'm forgetting?

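Have you tried using Regex.Escape? It escapes the regex metacharacters in the path (including the backslashes), so the path is matched literally: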
Regex.IsMatch(someString, Regex.Escape(filepath));
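To make the difference concrete, here is a minimal sketch of the Regex.Escape approach. The class and method names (PathMatching, ContainsPath) and the sample path are made up for illustration. Note that the @ prefix only changes how a string literal is written in source code; putting it in front of a variable name does nothing to the string's contents, which is why @filepath still throws.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathMatching
{
    // Hypothetical helper: true when 'text' contains 'filepath' as literal text.
    public static bool ContainsPath(string text, string filepath)
    {
        // Regex.Escape turns D:\Hidden\file.txt into D:\\Hidden\\file\.txt,
        // so the backslashes and the dot are no longer regex syntax.
        string pattern = Regex.Escape(filepath);
        return Regex.IsMatch(text, pattern);
    }
}
```

Calling PathMatching.ContainsPath(@"Opened D:\Hidden\file.txt", @"D:\Hidden\file.txt") returns true, whereas passing the raw path straight to Regex.IsMatch throws because \H is not a recognized escape.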

Related

String replace with indication if replaced in one line

I'm looking for an efficient, case-insensitive string replace. If I use Regex, I don't want to call Regex.IsMatch and then Regex.Replace, because that is two searches through the input instead of one. I could do the following, but that requires an additional local variable. Is there a way to do it in one line without a local variable? Something like Regex.TryReplace(ref string input, ...) that would return a bool.
string input = "string with pattern";
string replaced = Regex.Replace(input , Regex.Escape("pattern"), "replace value", RegexOptions.IgnoreCase);
if (!ReferenceEquals(replaced, input))
{
input = replaced;
// do something
}
You can do it with a try/catch using the Replace(String, String, String, RegexOptions, TimeSpan) overload.
try {
Console.WriteLine(Regex.Replace(words, pattern, evaluator,
RegexOptions.IgnorePatternWhitespace,
TimeSpan.FromSeconds(.25)));
}
catch (RegexMatchTimeoutException) {
Console.WriteLine("Returned words:");
}
Reference
But you are still performing two operations: trying to replace, and checking whether anything was replaced, which you'll always have to do. I'm curious why doing the two operations in one line is such a concern.
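For what it's worth, the single-pass behaviour the question asks for can be sketched with the MatchEvaluator overload of Regex.Replace: the evaluator only runs when something matches, so it can set a flag. TryReplace and RegexExtras are hypothetical names taken from the wording of the question, not an existing .NET API, and the sketch assumes the search term is literal text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexExtras
{
    // Hypothetical helper: replaces every case-insensitive occurrence of
    // 'literal' in 'input' and reports whether anything was replaced.
    public static bool TryReplace(ref string input, string literal, string replacement)
    {
        bool replaced = false;
        input = Regex.Replace(
            input,
            Regex.Escape(literal),
            // The evaluator is only invoked for actual matches.
            match => { replaced = true; return replacement; },
            RegexOptions.IgnoreCase);
        return replaced;
    }
}
```

Usage would be if (RegexExtras.TryReplace(ref input, "pattern", "replace value")) { /* do something */ }. One thing to keep in mind: with an evaluator, the replacement is inserted as-is, so substitutions such as $1 are not expanded.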

Regex/Method to remove namespace from a Type.FullName - C#

I am working on writing a method to remove the namespace from a System.Type.FullName (not XML).
I started off googling and didn't get too far so switched to trying to write a Regex I could use with a Regex.Replace(). But I am far from a master of the Regex arts, so I present myself humbly before the regex gods.
Given the following inputs:
name.space.class
name.space.class<other.name.space.class1>
name.space.class<other.name.space.class1, shortSpace.class2>
I need to remove the namespaces so I get:
class
class<class1>
class<class1, class2>
Alternatively, if anyone knows of an existing library that has this functionality, all the better!
Note: I know System.Type has a Namespace property that I could use to remove the namespace (ie System.Type.FullName - System.Type.Namespace), but my method takes a type name as a string and needs to work with type names that the run-time does not know about (can't resolve).
How about this...
[.\w]+\.(\w+)
...and substituting with $1. See it in action on regex101.
From looking at some C# examples it seems you would do
string output = Regex.Replace(input, @"[.\w]+\.(\w+)", "$1");
Try this:
public static string RemoveNamespaces(string typename)
{
return string.Join("",
Regex.Split(typename,
@"([^\w\.])").Select(p =>
p.Substring(p.LastIndexOf('.') + 1)));
}
I wouldn't even consider using regexes for this. Imperative code is pretty trivial here, although it requires a bit of string-fu:
public string RemoveNamespace(string typename)
{
    // SubstringBetween, SubstringBefore and SubstringAfterLastOccurenceOf are
    // assumed string extension methods; they are not built into .NET.
    if (typename.Contains("<"))
    {
        var genericArguments =
            typename.
            // in reality, we need the substring between the
            // first occurrence of "<" and the last occurrence of ">"
            SubstringBetween("<", ">").
            Split(',').
            Select(s => s.Trim()).
            Select(RemoveNamespace);
        return
            RemoveNamespace(typename.SubstringBefore("<")) +
            "<" +
            string.Join(", ", genericArguments) +
            ">";
    }
    else
    {
        return typename.Trim().SubstringAfterLastOccurenceOf(".");
    }
}
Sounds like a good situation to use positive lookahead:
(\w+[.+])+(?=\w+)
This pattern will match any number of words separated by periods or plusses, except the last one in a sequence (the short name of the type). Replacing the matches by the empty string will remove all namespace prefixes.
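In C#, the lookahead approach could look roughly like this. TypeNames and RemoveNamespaces are illustrative names, and the sketch has only been checked against the three inputs from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TypeNames
{
    // One or more "word." or "word+" prefixes that are followed by another word.
    private static readonly Regex NamespacePrefix =
        new Regex(@"(\w+[.+])+(?=\w+)");

    // Removes every namespace (or outer-type) prefix, keeping the short names.
    public static string RemoveNamespaces(string typeName)
    {
        return NamespacePrefix.Replace(typeName, "");
    }
}
```

For example, TypeNames.RemoveNamespaces("name.space.class<other.name.space.class1, shortSpace.class2>") gives class<class1, class2>.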
Why not split by dot (.) and take only the last string?

C# String compare not working

I'm having some issues comparing a string that is received from Request.QueryString with a line from a .resx file.
The code reads Request.QueryString into a variable named q, then passes it to a function that checks whether a line contains the value of q:
while ((line = filehtml.ReadLine()) != null)
{
if (line.ToLower().Contains(q.ToLower().ToString()))
HttpContext.Current.Response.Write("<b>Content found!</b>");
else
HttpContext.Current.Response.Write("<b>Content not found!</b>");
}
Since this is a search in static files, special characters must be considered. Searching for Iberê, for example, doesn't return true, because .Contains, .IndexOf and .LastIndexOf end up comparing iber&ecirc; (coming from q) with iberê (coming from the line).
Consider that I already tried to use ResXResourceReader (which can't be found by Visual Studio), ResourceReader and ResourceManager (these I couldn't set a static file by the path to be read).
EDIT:
Problem solved. There was an instance of SpecialChars overwriting the value of q with its EntitiesEncode method.
The problem is that the ê character is escaped in both strings. So if you did something like this, it wouldn't work:
string line = "sample iberê text";
string q = "iberê";
if (line.Contains(q)) {
// do something
}
You need to unescape the strings. Use HttpUtility in the System.Web assembly. This will work:
line = System.Web.HttpUtility.HtmlDecode(line);
q = System.Web.HttpUtility.HtmlDecode(q);
if (line.Contains(q)) {
// do something
}
As suggested by @r3bel below, if you're using .NET 4 or above you can also use System.Net.WebUtility.HtmlDecode, so you don't need an extra assembly reference.
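A small sketch of the WebUtility variant, with the case-insensitive comparison from the question folded in. EntitySearch and LineContains are made-up names; the idea is simply to decode both sides before comparing, so it doesn't matter which of the two arrived entity-encoded.

```csharp
using System;
using System.Net;

public static class EntitySearch
{
    // Hypothetical helper: decodes HTML entities on both sides, then does a
    // case-insensitive "contains" check.
    public static bool LineContains(string line, string query)
    {
        string decodedLine = WebUtility.HtmlDecode(line);
        string decodedQuery = WebUtility.HtmlDecode(query);
        return decodedLine.IndexOf(decodedQuery, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

With this, a query of "Iber&ecirc;" is found in a line containing "iberê", and the other way round.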

Replacing _x with string.empty using Regex.Replace

I have a string "region_2>0" where I want to replace _2 with string.empty using Regex.
My expression is ((_)[^_]*)\w(?=[\s=!><]) which in both Regulator and Expresso gives me _2. However, the code(c#):
Regex.Match(legacyExpression, "((_)[^_]*)\\w(?=[\\s=!><])").Value
gives me "_2>0", which also causes the replace to be wrong (it returns "region", since it removes the whole "_2>0" instead of "_2"). The result I want is "region>0". Shouldn't the code and the regex programs give the same results? And how can I get it to work?
(Note that the string is not static; it could take many different forms, but the rule is that I want to replace the last _X in the string with string.empty.)
Thanks!
I copied your code as is into the new project:
static void Main(string[] args)
{
var legacyExpression = "region_2>0";
var rex = Regex.Match(legacyExpression, "((_)[^_]*)\\w(?=[\\s=!><])").Value;
Console.WriteLine(rex);
Console.ReadKey();
}
The output is _2.
I think this could work
(_\d+)
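If only the last _X should go, one option is to let the regex search from the right and replace a single match. This is a sketch that assumes X is always one or more digits, which the question doesn't actually guarantee; SuffixRemoval and RemoveLastSuffix are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SuffixRemoval
{
    // RightToLeft makes the engine start searching at the end of the input.
    // Assumption: the suffix is an underscore followed by digits.
    private static readonly Regex LastSuffix =
        new Regex(@"_\d+", RegexOptions.RightToLeft);

    // Removes only the right-most "_<digits>" occurrence.
    public static string RemoveLastSuffix(string expression)
    {
        return LastSuffix.Replace(expression, "", 1);
    }
}
```

SuffixRemoval.RemoveLastSuffix("region_2>0") gives "region>0", and an input such as "a_1_22>0" becomes "a_1>0".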

CSV Parsing with double quotes

I am trying to use C# to parse CSV. I used regular expressions to find "," and read string if my header counts were equal to my match count.
Now this will not work if I have a value like:
"a",""b","x","y"","c"
then my output is:
'a'
'"b'
'x'
'y"'
'c'
but what I want is:
'a'
'"b","x","y"'
'c'
Is there any regex or any other logic I can use for this ?
CSV, when dealing with things like multi-line, quoted, different delimiters* etc - can get trickier than you might think... perhaps consider a pre-rolled answer? I use this, and it works very well.
*=remember that some locales use [tab] as the C in CSV...
CSV is a great example for code reuse - No matter which one of the csv parsers you choose, don't choose your own. Stop Rolling your own CSV parser
I would use FileHelpers if I were you. Regular Expressions are fine but hard to read, especially if you go back, after a while, for a quick fix.
Just for sake of exercising my mind, quick & dirty working C# procedure:
public static List<string> SplitCSV(string line)
{
if (string.IsNullOrEmpty(line))
throw new ArgumentException();
List<string> result = new List<string>();
bool inQuote = false;
StringBuilder val = new StringBuilder();
// parse line
foreach (var t in line.Split(','))
{
int count = t.Count(c => c == '"');
if (count > 2 && !inQuote)
{
inQuote = true;
val.Append(t);
val.Append(',');
continue;
}
if (count > 2 && inQuote)
{
inQuote = false;
val.Append(t);
result.Add(val.ToString());
val.Clear(); // reset the buffer so the next quoted field starts empty
continue;
}
if (count == 2 && !inQuote)
{
result.Add(t);
continue;
}
if (count == 2 && inQuote)
{
val.Append(t);
val.Append(',');
continue;
}
}
// remove quotation
for (int i = 0; i < result.Count; i++)
{
string t = result[i];
result[i] = t.Substring(1, t.Length - 2);
}
return result;
}
There's an oft quoted saying:
Some people, when confronted with a
problem, think "I know, I'll use
regular expressions." Now they have
two problems. (Jamie Zawinski)
Given that there's no official standard for CSV files (instead there are a large number of slightly incompatible styles), you need to make sure that what you implement suits the files you will be receiving. No point in implementing anything fancier than what you need - and I'm pretty sure you don't need Regular Expressions.
Here's my stab at a simple method to extract the terms - basically, it loops through the line looking for commas, keeping track of whether the current index is within a string or not:
public IEnumerable<string> SplitCSV(string line)
{
int index = 0;
int start = 0;
bool inString = false;
foreach (char c in line)
{
switch (c)
{
case '"':
inString = !inString;
break;
case ',':
if (!inString)
{
yield return line.Substring(start, index - start);
start = index + 1;
}
break;
}
index++;
}
if (start < index)
yield return line.Substring(start, index - start);
}
Standard caveat - untested code, there may be off-by-one errors.
Limitations
The quotes around a value aren't removed automatically.
To do this, add a check just before the yield return statement near the end.
Single quotes aren't supported in the same way as double quotes
You could add a separate boolean inSingleQuotedString, renaming the existing boolean to inDoubleQuotedString and treating both the same way. (You can't make the existing boolean do double work because you need the string to end with the same quote that started it.)
Whitespace isn't automatically removed
Some tools introduce whitespace around the commas in CSV files to "pretty" the file; it then becomes difficult to tell intentional whitespace from formatting whitespace.
In order to have a parseable CSV file, any double quotes inside a value need to be properly escaped somehow. The two standard ways to do this are by representing a double quote either as two double quotes back to back, or a backslash double quote. That is one of the following two forms:
""
\"
In the second form your initial string would look like this:
"a","\"b\",\"x\",\"y\"","c"
If your input string is not formatted against some rigorous format like this then you have very little chance of successfully parsing it in an automated environment.
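As an illustration of the first form (two double quotes back to back), here is a rough single-line splitter. It is a sketch under the assumption that embedded quotes are doubled and that a field never spans several lines; CsvLine and Split are made-up names, and for real files a library is still the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Splits one CSV line on commas, honouring quoted fields in which an
    // embedded quote is written as "". Enclosing quotes are dropped.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // "" inside quotes is one literal quote
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote of the field
                }
                else
                {
                    current.Append(c);   // commas are plain data in here
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote of a field
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());  // last field has no trailing comma
        return fields;
    }
}
```

With the line written as "a","""b"",""x"",""y""","c", this returns the three values a, "b","x","y" and c.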
If all your values are guaranteed to be in quotes, look for values, not for commas:
("".*?""|"[^"]*")
This takes advantage of the fact that "the earliest longest match wins" - it looks for double quoted values first, and with a lower priority for normal quoted values.
If you don't want the enclosing quote to be part of the match, use:
"(".*?"|[^"]*)"
and go for the value in match group 1.
As I said: Prerequisite for this to work is well-formed input with guaranteed quotes or double quotes around each value. Empty values must be quoted as well! A nice side-effect is that it does not care for the separator char. Commas, TABs, semi-colons, spaces, you name it. All will work.
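Applied in C#, the second pattern could be used along these lines. QuotedValues and Extract are illustrative names, and I have only traced this against the sample line from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedValues
{
    // The pattern from above: "(".*?"|[^"]*)" with the value in group 1.
    private static readonly Regex Value =
        new Regex("\"(\".*?\"|[^\"]*)\"");

    // Returns the content of every quoted value found in the line.
    public static List<string> Extract(string line)
    {
        var values = new List<string>();
        foreach (Match m in Value.Matches(line))
        {
            values.Add(m.Groups[1].Value);
        }
        return values;
    }
}
```

For "a",""b","x","y"","c" this yields a, "b","x","y" and c. I would be careful with lines that contain several empty quoted values, though: the lazy .*? alternative can then stretch across more than one field.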
FileHelpers supports multiline fields.
You could parse files like these:
a,"line 1
line 2
line 3"
b,"line 1
line 2
line 3"
Here is the datatype declaration:
[DelimitedRecord(",")]
public class MyRecord
{
public string field1;
[FieldQuoted('"', QuoteMode.OptionalForRead, MultilineMode.AllowForRead)]
public string field2;
}
Here is the usage:
static void Main()
{
FileHelperEngine engine = new FileHelperEngine(typeof(MyRecord));
MyRecord[] res = engine.ReadFile("file.csv");
}
Try CsvHelper (a library I maintain) or FastCsvReader. Both work well. CsvHelper does writing also. Like everyone else has been saying, don't roll your own. :P
FileHelpers for .Net is your friend.
See the link "Regex fun with CSV" at:
http://snippets.dzone.com/posts/show/4430
The Lumenworks CSV parser (open source, free but needs a codeproject login) is by far the best one I've used. It'll save you having to write the regex and is intuitive to use.
Well, I'm no regex wiz, but I'm certain they have an answer for this.
Procedurally it's going through letter by letter. Set a variable, say dontMatch, to FALSE.
Each time you run into a quote toggle dontMatch.
Each time you run into a comma, check dontMatch. If it's TRUE, ignore the comma. If it's FALSE, split at the comma.
This works for the example you give, but the logic you use for quotation marks is fundamentally faulty - you must escape them or use another delimiter (single quotes, for instance) to set major quotations apart from minor quotations.
For instance,
"a", ""b", ""c", "d"", "e""
will yield bad results.
This can be fixed with another patch. Rather than simply keeping a true false you have to match quotes.
To match quotes you have to know what was last seen, which gets into pretty deep parsing territory. You'll probably, at that point, want to make sure your language is designed well, and if it is you can use a compiler tool to create a parser for you.
-Adam
I have just tried your regular expression in my code. It works fine for formatted text with quotes,
but I'm wondering if we can parse the value below with a regex:
"First_Bat7679",""NAME","ENAME","FILE"","","","From: "DDD,_Ala%as"#sib.com"
I am looking for result as:
'First_Bat7679'
'"NAME","ENAME","FILE"'
''
''
'From: "DDD,_Ala%as"#sib.com'
Thanx
