Ways to share regex across DataAnnotations /Attributes - c#

I am playing around with the System.ComponentModel.DataAnnotations namespace, with a view to getting some validation going on my ASP.NET MVC application.
I have already hit an issue with the RegularExpression annotation.
Because these annotations are attributes they require constant expressions.
OK, I can use a class filled with regex string constants.
The problem with that is I don't want to pollute my regex with escape characters required for the C# parser. My preference is to store the regex in a resources file.
The problem is I cant use those string resources in my data annotations, because they are not constants!
Is there any solution to this?
If not, this seems a significant limitation of using attributes for validation.

In C# there is only one escape code you need (double-quote)... if you use verbatim string literals:
#"like \this\ note \slash here does nothing only quote "" needs doubling
you can even use newline";
I always write regex with #"..." strings - avoids many headaches.

Apparently in .NET 4 there are overrides for the DataAnnotations attribubtes that take a Func< string> in their constructor described as "The function that enables access to validation resources."

You could create a custom validation attribute like this as a proxy which would load the regular expressions from your resource file.

Related

Change razor syntax

How can I change razor syntax in RazorEngine?
I need to use specific keyword instead of"#" symbol.
For example: $$Model.someField instead of #Model.someField. ("$$" instead of "#").
You can't. Razor is not really designed in a way to do it. Basically (Microsoft.AspNet.)Razor has some specially written parsers which handle "#" in a special manner (by switching parsers). This means the languages (C#, Html in this case) itself need to be compatible with this procedure as well!
If you want to replace "#" with something else you need to rewrite the Razor Parsers. This is possible, but at this point you already implemented the hardest part of Razor yourself...
The real question you should ask yourself (or answer here) is: Why you want to do it? It is not as trivial as one would think, I was at this point before.
As freedomn-m suggested you should use #Html.Raw("#") or ## if you need to output a "#".
matthid
- a RazorEngine contributor

Is it possible to use Regex to extract text from attributes repeated in a text file - c# .NET

I am working something at the moment and need to extract an attribute from a big list tags, they are formatted like this:
<appid="928" appname="extractapp" supportemail="me#mydomain.com" /><appid="928" appname="extractapp" supportemail="me#mydomain.com" />
The tags are repeated one after another and all have different appid, appname, supportemail.
I need to just extract all of the support emails, just the email address, without the supportemail=
Will I need to use two regex statements, one to seperate each individual tag, then loop through the result and pull out the emails?
I would then go through and Add the emails to a list, then loop through the list and write each one to a txt file, with a comma after it.
I've never really used Regex too much, so don't know if it's suitable for the above?
I would spend more time trying it myself but it's quite urgent. So hopefully somebody can help.
Have you considered Linq to XML?
http://www.hookedonlinq.com/LINQtoXML5MinuteOverview.ashx
Using XML is better, perhaps, but here's the regular expression you'd use (in case there's a particular reason you need/want to use regular expressions to read XML):
(appid="(?<AppID>[^"]+)" appname="(?<AppName>[^"]+)" supportemail="(?<SupportEmail>[^"]+)")
You can just take the last bit there for the support email but this will extract all of the attributes you mentioned and they will be "grouped" within each tag.
What about modify the string to have proper xml format and load xml to extract all the values of supportemail attribute?
Use
string pattern = "supportemail=\"([^\"]+)";
MatchCollection matches = Regex.Matches(inputString, pattern);
foreach(Match m in matches)
Console.WriteLine(m.Groups[1].Value);
See it here.
Problems you'll encounter by using regular expressions instead of an XML DOM:
All of the example regexes posted thus far will fail in the extremely common case that the attribute values are delimited by single quotes.
Any regex that depends on the attributes appearing in a specific order (e.g. appId before appName) will fail in the event that attributes - whose ordering is insignificant to XML - appear in an order different from what the regex expects.
A DOM will resolve entity references for you and a regex will not; if you use regex, you must check the returned values for (at least) the XML character entitites &, &apos;, >, <, and ".
There's a well-known edge case where using regular expressions to parse XML and XHTML unleashes the Great Old Ones. This will complicate your task considerably, as you will be reduced to gibbering madness and then the Earth will be eaten.

Regular expression to define format of backup filenames

In the application I am currently working on, I have an option to create automatic backups of a certain file on the hard disk. What I would like to do is offer the user the possibility to configure the name of the file and its extension.
For example, the backup filename could be something like : "backup_month_year_username.bak". I had the idea to save the format in the form of a regular expression. For the example above, the regexp would look like :
"^backup_(?<Month>\d{2})_(?<Year>\d{2})_(?<Username>\w).(?<extension>bak)$"
I thought about using regex because I will also have to browse through the directory of backuped files to delete those older than a certain date. The main trouble I have now is how to create a filename using the regex. In a way I should replace the tags with the information. I could do that using regex.replace and another regex, but I feel it's a big weird doing that and it might be a better way.
Thanks
[Edit] Maybe I wasn't really clear in the first go, but the idea is of course that the user (in this case an admin that will know regex syntax) will have the possibility to modify the form of the filename, that's all the idea behind it[/Edit]
... and if the regex changes, it is next to impossible to reconstruct a string from a given regex.
Edit:
Create some predefined "place-holders": %u could be the user's name, %y could be the year, etc.:
backup_%m_%y_%u.bak
and then simple replace the %? with their actual values.
It sounds like you're trying to use the regular expression to create the file name from a pattern which the user should be able to specify.
Regular expressions can - AFAIK - not be used to create output, but only to validate input, so you'd have the user specify two things:
a file name production pattern like Bart suggested
a validation pattern in form of a regular expression that helps you split the file names into their parts
EDIT
By the way, your sample regex contains an error: The "." is use for "any character", also \w only matches one word character, so I guess you meant to write
"^backup_(?<Month>\d{2})_(?<Year>\d{2})_(?<Username>\w+)\.(?<extension>bak)$"
If the filename is always in this form, there is no reason for a regex, as it's easier to process with string.Split ...
With Bart's solution it is easy enough to split (using string.Split) the generated file name using underscore as the delimiter, to get back the information.
Ok, I think I have found a way to use only the regex. As I am using groups to get the information, I will use another regular expression to match the regular expression and replace the groups with the value:
Regex rgx = new Regex("\(\?\<Month\>.+?\)");
rgx.Replace("^backup_(?<Month>\d{2})_(?<Year>\d{2})_(?<Username>\w+)\.(?<extension>bak)$"
, DateTime.Now.Month.ToString());
Ok, it's really a hack, but at least it works and I have only one pattern defined by the user. It might not work if the regex is too complex, but I think I can deal with that problem.
What do you think?

Regex index in matching string where the match failed

I am wondering if it is possible to extract the index position in a given string where a Regex failed when trying to match it?
For example, if my regex was "abc" and I tried to match that with "abd" the match would fail at index 2.
Edit for clarification. The reason I need this is to allow me to simplify the parsing component of my application. The application is an Assmebly language teaching tool which allows students to write, compile, and execute assembly like programs.
Currently I have a tokenizer class which converts input strings into Tokens using regex's. This works very well. For example:
The tokenizer would produce the following tokens given the following input = "INP :x:":
Token.OPCODE, Token.WHITESPACE, Token.LABEL, Token.EOL
These tokens are then analysed to ensure they conform to a syntax for a given statement. Currently this is done using IF statements and is proving cumbersome. The upside of this approach is that I can provide detailed error messages. I.E
if(token[2] != Token.LABEL) { throw new SyntaxError("Expected label");}
I want to use a regular expression to define a syntax instead of the annoying IF statements. But in doing so I lose the ability to return detailed error reports. I therefore would at least like to inform the user of WHERE the error occurred.
I agree with Colin Younger, I don't think it is possible with the existing Regex class. However, I think it is doable if you are willing to sweat a little:
Get the Regex class source code
(e.g.
http://www.codeplex.com/NetMassDownloader
to download the .Net source).
Change the code to have a readonly
property with the failure index.
Make sure your code uses that Regex
rather than Microsoft's.
I guess such an index would only have meaning in some simple case, like in your example.
If you'll take a regex like "ab*c*z" (where by * I mean any character) and a string "abbbcbbcdd", what should be the index, you are talking about?
It will depend on the algorithm used for mathcing...
Could fail on "abbbc..." or on "abbbcbbc..."
I don't believe it's possible, but I am intrigued why you would want it.
In order to do that you would need either callbacks embedded in the regex (which AFAIK C# doesn't support) or preferably hooks into the regex engine. Even then, it's not clear what result you would want if backtracking was involved.
It is not possible to be able to tell where a regex fails. as a result you need to take a different approach. You need to compare strings. Use a regex to remove all the things that could vary and compare it with the string that you know it does not change.
I run into the same problem came up to your answer and had to work out my own solution. Here it is:
https://stackoverflow.com/a/11730035/637142
hope it helps

PHPs htmlspecialcharacters equivalent in .NET?

PHP has a great function called htmlspecialcharacters() where you pass it a string and it replaces all of HTML's special characters with their safe equivalents, it's almost a one stop shop for sanitizing input. Very nice right?
Well is there an equivalent in any of the .NET libraries?
If not, can anyone link to any code samples or libraries that do this well?
Try this.
var encodedHtml = HttpContext.Current.Server.HtmlEncode(...);
System.Web.HttpUtility.HtmlEncode(string)
Don't know if there's an exact replacement, but there is a method HtmlUtility.HtmlEncode that replaces special characters with their HTML equivalents. A close cousin is HtmlUtility.UrlEncode for rendering URL's. You could also use validator controls like RegularExpressionValidator, RangeValidator, and System.Text.RegularExpression.Regex to make sure you're getting what you want.
Actually, you might want to try this method:
HttpUtility.HtmlAttributeEncode()
Why? Citing the HtmlAttributeEncode page at MSDN docs:
The HtmlAttributeEncode method converts only quotation marks ("), ampersands (&), and left angle brackets (<) to equivalent character entities. It is considerably faster than the HtmlEncode method.
In an addition to the given answers:
When using Razor view engine (which is the default view engine in ASP.NET), using the '#' character to display values will automatically encode the displayed value. This means that you don't have to use encoding.
On the other hand, when you don't want the text being encoded, you have to specify that explicitly (by using #Html.Raw). Which is, in my opinion, a good thing from a security point of view.

Categories