What is this piece of code doing with RegEx - c#

Have been studying a sample source code and I can't understand this part, what is this piece of code doing? Mostly the RegEx part...
In the parameters used, "code" is a string containing the C# source code we are passing in.
Match m = null;
if ((m = Regex.Match(code, "(?ims)^[/']{2}REFDLL (?<ref>.+?)$")).Success)
{
    foreach (string refDll in m.Groups["ref"].Value.Split(new char[] { ';', ',' }))
    {
        //2008-06-18 by Jeffrey, remove redundant \r
        string mdfyRefDll = refDll.Replace("\r", "").Replace("\n", "");
        //trim the ending .dll if exists
        if (mdfyRefDll.ToLower().EndsWith(".dll"))
            mdfyRefDll = mdfyRefDll.Substring(0, mdfyRefDll.Length - 4);
        string lcRefDll = mdfyRefDll.ToLower();
        if (lcRefDll == "system.data.linq" || lcRefDll == "system"
            || lcRefDll == "system.xml.linq" || lcRefDll == "system.core")
            continue;
        cp.ReferencedAssemblies.Add(mdfyRefDll + ".dll");
    }
}

I think this image addresses what's going on in the code you posted:
Mini C# Lab's project description is as follows:
A handy tool for simple short C# code running and testing, you can
save time on waiting for Visual Studio startup and avoid creating a
lot of one-time only project files.
It seems that the project is missing documentation, so it's difficult to work out why the author chose that particular way to add referenced DLLs when there are using directives in the code already. Perhaps he did it to avoid conflicts with the using directives.

First, (?ims) specifies options. i enables case-insensitive matching, m enables multi-line mode (so ^ and $ match at the start and end of each line, not only of the whole string), and s enables single-line (dot-all) mode, meaning that the wildcard . also matches newline characters.
Then, ^ asserts, "The match must begin at the start of a line..." while the $ at the end asserts, "The match must stop at the end of a line."
The [/']{2} matches exactly two of either the slash or single-quote characters, i.e. //, '', /', and '/.
The REFDLL followed by a space matches exactly what you see (in any letter case, because of the i option).
The (?<ref>.+?) matches the rest of the line and stores it in a group named ref. The final question mark matters: it makes the quantifier lazy, so the match stops at the first line end. Without it, because of the s option, .+ would run through the newlines to the end of the whole string. With Windows line endings the captured value still ends in \r, which is why the code strips it afterwards.
In summary, it's trying to match something like
//REFDLL helloworld foobar
and stores "helloworld foobar" in ref.
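To see the pattern at work outside the original project, here is a minimal sketch. The class name RefDllDemo, the method ExtractRefs and the sample source string are made up for illustration. Only the regex and the splitting on ; and , come from the code in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RefDllDemo
{
    // Hypothetical helper: returns the entries listed after a //REFDLL marker line.
    public static List<string> ExtractRefs(string code)
    {
        var result = new List<string>();
        // Same pattern as in the question: lazy .+? stops at the first line end.
        Match m = Regex.Match(code, "(?ims)^[/']{2}REFDLL (?<ref>.+?)$");
        if (!m.Success) return result;

        foreach (string entry in m.Groups["ref"].Value.Split(';', ','))
        {
            // With CRLF line endings the capture ends in \r, so strip it.
            string cleaned = entry.Replace("\r", "").Replace("\n", "");
            if (cleaned.Length > 0) result.Add(cleaned);
        }
        return result;
    }

    public static void Main()
    {
        string code = "using System;\r\n//REFDLL System.Drawing;MyLib.dll\r\nclass P { }\r\n";
        Console.WriteLine(string.Join("|", ExtractRefs(code))); // System.Drawing|MyLib.dll
    }
}
```

Only the first marker line is picked up, because Regex.Match returns the first match. Collecting several marker lines would need Regex.Matches instead.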


Appended extra Escape character in string in c# [duplicate]

var phone = @"^\+(?:[0-9] ?){6,14}[0-9]$";
phone will then equal ^\\+(?:[0-9] ?){6,14}[0-9]$
I thought (and the examples I found seem to show) the @ character meant to leave my string how I have it. Why is it doubling the \ and how do I stop it?
The visual studio debugger will show it as if it were doubled, since in C# a \ would precede an escape sequence. Don't worry - your string is unchanged.
It only looks like it's doubled in the debug inspectors.
Note that the strings shown in the inspectors don't start with @ - they are showing how you would have to write the string if you were to do it without the @. The two forms are equivalent.
If you're really worried about the contents of the string, output it in a console app.
To reiterate in another way, the comparison
var equal = @"^\+(?:[0-9] ?){6,14}[0-9]$" == "^\\+(?:[0-9] ?){6,14}[0-9]$";
will always be true. As would,
var equal = @"\" == "\\";
If you examine the variables using the Text Visualizer, you will be shown the plain unescaped string, as it was when you declared it verbatim.
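If you want to convince yourself outside the debugger, a small console sketch like the following should do. The class and field names are made up for illustration.

```csharp
using System;

public static class VerbatimDemo
{
    // The same pattern written as a verbatim literal and as a regular literal.
    public static readonly string Verbatim = @"^\+(?:[0-9] ?){6,14}[0-9]$";
    public static readonly string Escaped = "^\\+(?:[0-9] ?){6,14}[0-9]$";

    public static void Main()
    {
        // Both literals produce the same characters at run time.
        Console.WriteLine(Verbatim == Escaped); // True
        // Printing shows the real content: a single backslash before the +.
        Console.WriteLine(Verbatim);
    }
}
```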

Moving away from primary constructors

The C# 6 preview for Visual Studio 2013 supported a primary constructors feature that the team has decided will not make it into the final release. Unfortunately, my team implemented over 200 classes using primary constructors.
We're now looking for the most straightforward path to migrate our source. Since this is a one time thing, a magical regex replacement string or hacky parser would work.
Before I spend a lot of time writing such a beast, is there anyone out there that's already done this or knows of a better way?
As I suggested in comments, you could use the version of Roslyn which does know about primary constructors to parse the code into a syntax tree, then modify that syntax tree to use a "normal" constructor instead. You'd need to put all the initializers that use primary constructor parameters into the new constructor too, mind you.
I suspect that writing that code would take me at least two or three hours, quite possibly more - whereas I could do the job manually for really quite a lot of classes in the same amount of time. Automation's great, but sometimes the quickest solution really is to do things by hand... even 200 classes may well be faster to do manually, and you could definitely parallelize the work across multiple people.
(\{\s*)(\w*\s*?=\s*?\w*\s*?;\s*?)*?(public\s*\w*\s*)(\w*)(\s*?{\s*?get;\s*?\})(\s*?=\s*?\w*;\s*)
\1\2\4\6
A few answers: the first with a simple Regex find and replace which you need to repeat a few times:
Regex: A few lines of explanation then the actual regex string and replacement string:
a. In regex, first you match the full string of what you're looking for (in your case a primary constructor). Not hard to do: search for a curly bracket, the word public, then two words and an equals sign etc. Each text found according to this is called a Match.
b. Sometimes there are possible repeated sequences in the text that you are looking for. (In your case: the parameters are each defined on their own line.) For that, you simply mark the expected sequence as a Group by surrounding it with parentheses.
c. You then want to mark different parts of what you found, so you can use them or replace them in your corrected text. These parts are also called "Groups", or more precisely "Capture Groups". Again, simply surround the parts with parentheses.
In your case you'll be retaining the first captured group (the curly bracket) and the name of the property with its assignment to the parameter.
d. Here's the regex:
(\{\s*)(\w*\s*?=\s*?\w*\s*?;\s*?)*?(public\s*\w*\s*)(\w*)(\s*?{\s*?get;\s*?})(\s*?=\s*?\w*;\s*)
1. (
// ---- Capture1 -----
{
// code: \{\s*
// explained: curly bracket followed by possible whitespace
)
2. ( - Capture2 - previously corrected text
// - possible multiple lines of 'corrected' non-primary-constructors
// created during the find-replace process previously,
Propname = paramname; // word, equals-sign, word, semicolon
// code: \w*\s*?=\s*?\w*\s*?;\s*?
// explained: \w - any word character (letter, digit or underscore), \s - any whitespace
// * - zero or more times (greedy), *? - zero or more times (lazy, as few as possible)
)*?
// code: )*?
// explained: this group can be repeated zero or more times
// in other words it may not be found at all.
// These text lines are created during the recursive replacement process...
3. (
// ----Capture 3-----
// The first line of a primary constructor:
public type
// code: public\s*\w*\s*
// explained: the word 'public' and then another word (and [whitespace])
)
4. (
// ----- capture 4 -----
Propname
// code: \w*
// explained: any amount of alphanumeric letters
)
5. (
// ---- capture 5 ----
{ get; }
// code: \s*?{\s*?get;\s*?\}
)
6. (
// ---- capture 6 ----
= propname;
// code: \s*?=\s*?\w*;\s*
// explained: by now you should get it.
)
The replacement string is
\1\2\4\6
This leaves:
{
[old corrected code]
[new corrected line]
possible remaining lines to be corrected.
Notepad++ 10 minutes trial-and-error. I guarantee it won't take you more than that.
Visual Studio 2014 refactoring, but:
a. You have to install it on a separate VM or PC. MS warns you not to install it side by side with your existing code.
b. I'm not sure the refactoring works the other way. Here's an article about it.
Visual Studio macros. I know I know, they're long gone, but there are at least two plugins that replace them and perhaps more. I read about them on this SO (StackOverflow) discussion. (They give a few other options) Here:
Visual Commander - Free open source Visual Studio macro runner add-on
VSScript - A Visual Studio add-on: costs $50 !!
Try Automatic Regexp by example: you give it several examples of code in which you highlight what IS the expected result, and then the same (or other) code in which you highlight what IS NOT the expected result. You then wait for it to run through the examples and give you some regex code.
// for the following code (from http://odetocode.com/blogs/scott/archive/2014/08/14/c-6-0-features-part-ii-primary-constructors.aspx )
public struct Money(string currency, decimal amount)
{
public string Currency { get; } = currency;
public decimal Amount { get; } = amount;
}
// I get something like: { ++\w\w[^r-u][^_]++|[^{]++(?={ \w++ =)
Play with the regexp on this great site: https://www.regex101.com/
// I first tried: \{\s*((public\s*\w*\s*)\w*(\s*?{\s*?get;\s*?})\s*?=\s*?\w*;\s*)*\}
The repeated sequence of the primary-constructor lines (the "repeated capture group") only captures the last one.
Use C# code with the Captures collection of the repeated group (Group.Captures), as explained here in another StackOverflow question (see accepted answer).
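As a rough sketch of the Captures approach: the pattern below is a simplified one of my own (named groups name and param, braces escaped), not the regex from the steps above. It assumes the body consists only of "public Type Name { get; } = param;" lines. The class and method names are hypothetical too.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CapturesDemo
{
    // Assumed, simplified pattern: a brace-delimited body made only of
    // "public Type Name { get; } = param;" lines.
    private static readonly Regex Body = new Regex(
        @"\{\s*(?:public\s+\w+\s+(?<name>\w+)\s*\{\s*get;\s*\}\s*=\s*(?<param>\w+);\s*)*\}");

    // Builds one "Name = param;" assignment per property initializer.
    public static List<string> ToAssignments(string source)
    {
        var lines = new List<string>();
        Match m = Body.Match(source);
        if (!m.Success) return lines;

        // Groups["name"].Value would hold only the last repetition;
        // Captures keeps every repetition in order.
        CaptureCollection names = m.Groups["name"].Captures;
        CaptureCollection args = m.Groups["param"].Captures;
        for (int i = 0; i < names.Count; i++)
            lines.Add(names[i].Value + " = " + args[i].Value + ";");
        return lines;
    }

    public static void Main()
    {
        string source =
            "public struct Money(string currency, decimal amount)\n" +
            "{\n" +
            "    public string Currency { get; } = currency;\n" +
            "    public decimal Amount { get; } = amount;\n" +
            "}";
        foreach (string line in ToAssignments(source))
            Console.WriteLine(line);
        // Currency = currency;
        // Amount = amount;
    }
}
```

This only produces the assignment lines. Building the replacement constructor around them, and handling bodies that contain anything else, is left out.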

How can I generate a safe class name from a file name?

I'm trying to produce some dynamically compiled code with the Razor engine, and I want to name the generated classes according to their source file names to help understand where a piece of generated code comes from.
For example, I would expect the file C:\source\Foo.cs to be compiled with the name Foo.
Given that I have the path to the source file being compiled, is there a way to generate a valid C# identifier based on the file name?
According to the C# spec, the following rules must be adhered to when creating identifiers:
An identifier must start with a letter or an underscore
After the first character, it may contain numbers, letters, connectors, etc
If the identifier is a keyword, it must be prepended with “@”
This helper will satisfy those conditions:
private static string GenerateClassName(string value)
{
    string className = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(value);
    bool isValid = Microsoft.CSharp.CSharpCodeProvider.CreateProvider("C#").IsValidIdentifier(className);
    if (!isValid)
    {
        // File name contains invalid chars, remove them
        Regex regex = new Regex(@"[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Nl}\p{Mn}\p{Mc}\p{Cf}\p{Pc}\p{Lm}]");
        className = regex.Replace(className, "");
        // Class name is empty or doesn't begin with a letter, insert an underscore
        if (className.Length == 0 || !char.IsLetter(className, 0))
        {
            className = className.Insert(0, "_");
        }
    }
    return className.Replace(" ", string.Empty);
}
It first converts the file name to title case (personal preference), then uses IsValidIdentifier to determine whether the file name is already valid as a class name.
If not, it removes all invalid characters based on the Unicode character classes. It then checks whether the name starts with a letter; if it does not, it prepends an _ to fix it.
Finally, it removes any remaining spaces (by this point there should be none left, since a space is not valid in an identifier and the regex has already stripped them).
First, you need to extract the File-Name, for example with:
Path.GetFileNameWithoutExtension
Then you have to follow all the rules that apply to a C# class name.
For example:
It must start with a letter or _
I would remove all characters other than _, a-z (in either case) and 0-9
This should be all!
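A minimal sketch of that simpler approach, assuming an ASCII-only rule is good enough. The class and method names are made up, and it does not handle names that collide with C# keywords.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class SimpleClassName
{
    // Hypothetical helper: keeps only ASCII letters, digits and underscores.
    public static string FromPath(string path)
    {
        string name = Path.GetFileNameWithoutExtension(path);
        string cleaned = Regex.Replace(name, "[^A-Za-z0-9_]", "");
        // An identifier may not be empty or start with a digit.
        if (cleaned.Length == 0 || char.IsDigit(cleaned[0]))
            cleaned = "_" + cleaned;
        return cleaned;
    }

    public static void Main()
    {
        Console.WriteLine(FromPath("/source/My File-2.cs")); // MyFile2
        Console.WriteLine(FromPath("/source/9lives.cs"));    // _9lives
    }
}
```

The sample paths use forward slashes so that the directory part is split off on any platform. Backslash paths are only treated as directories on Windows.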
did you look at the codedom - http://msdn.microsoft.com/en-us/library/ms404245(v=vs.110).aspx ?
Take the path, replace the invalid characters like \ with let's say _ and you're done.
If you prefer shorter names, you could take the path, transform it to lowercase and take a hash value.
Some code sample:
var className = pathIncludingFilename.ToLowerSinceCasingIsNotRelevant().SomeHashFunctionLikeSha1OrPartOfIt() + filename.RemoveInvalidCharactersLikeWhitespace();
The result may look like this:
123a3b6b22foo
The hash should ensure unique names, the filename makes it easier to correlate.
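For what it's worth, the hash idea could look roughly like this. SHA-1, the 4-byte prefix and the leading underscore are my own choices for the sketch (a name starting with a digit would not be a valid identifier), and the class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Text.RegularExpressions;

public static class HashedClassName
{
    // Hypothetical helper: "_" + 8 hex chars of SHA-1(lowercased path) + cleaned file name.
    public static string FromPath(string path)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(path.ToLowerInvariant());
        byte[] hash;
        using (SHA1 sha1 = SHA1.Create())
        {
            hash = sha1.ComputeHash(bytes);
        }
        // First 4 bytes of the hash as 8 lowercase hex characters.
        string hex = BitConverter.ToString(hash, 0, 4).Replace("-", "").ToLowerInvariant();
        string file = Regex.Replace(Path.GetFileNameWithoutExtension(path), "[^A-Za-z0-9_]", "");
        return "_" + hex + file;
    }

    public static void Main()
    {
        Console.WriteLine(FromPath("/source/Foo.cs"));
    }
}
```

A 4-byte prefix keeps names short but makes collisions possible in principle, so use more of the hash if you have very many files.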

"Evaluate" a c# string

I am reading a C# source file.
When I encounter a string, I want to get its value.
For instance, in the following example:
public class MyClass
{
public MyClass()
{
string fileName = "C:\\Temp\\A Weird\"FileName";
}
}
I would like to retrieve
C:\Temp\A Weird"FileName
Is there an existing procedure to do that?
Coding a solution that covers all the possible cases looks quite tricky (@, escape sequences, ...).
I am convinced such a procedure exists...
I would like to have the dual function too (to inject a string into a C# source file)
Thanks in advance.
Philippe
P.S:
I gave an example with a filename, but I look for a solution working for all kinds of strings.
I'm pretty sure you can use CodeDOM to read a C# code file and parse its elements. It generates a code tree, and then you can look for nodes representing strings.
http://www.codeproject.com/Articles/2502/C-CodeDOM-parser
Other CodeDom parsers:
http://www.codeproject.com/Articles/14383/An-Expression-Parser-for-CodeDom
NRefactory: https://github.com/icsharpcode/NRefactory and http://www.codeproject.com/Articles/408663/Using-NRefactory-for-analyzing-Csharp-code
There is a way of extracting these strings using a regular expression:
("(\\"|[^"])*")
This particular one works on your simple example and gives the filename (complete with leading and trailing quote characters); whether it would work on more complex ones I can't easily tell unfortunately.
For clarity, (\\"|[^"]) matches any character apart from ", except where it has a leading \ character.
Just use the regex ".*" to match all string values, then remove the surrounding quotation marks and unescape the result.
This allows \" and "" sequences inside your string,
so both "C:\\Temp\\A Weird\"FileName" and "Hello ""World""" will match
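For the unescape step, here is a hand-rolled sketch rather than an existing library routine. It assumes the literal has already been extracted with its quotes, and it only understands verbatim literals plus a handful of simple escapes (no \u, \x or \U). All names are made up for illustration. EncodeVerbatim covers the reverse direction by always emitting a verbatim literal, where only the quote needs doubling.

```csharp
using System;
using System.Text;

public static class StringLiteralDemo
{
    // Decodes a C# string literal (including its quotes) into its value.
    public static string Decode(string literal)
    {
        // Verbatim literal: only "" is special.
        if (literal.StartsWith("@\"", StringComparison.Ordinal))
            return literal.Substring(2, literal.Length - 3).Replace("\"\"", "\"");

        var sb = new StringBuilder();
        for (int i = 1; i < literal.Length - 1; i++)
        {
            char c = literal[i];
            if (c != '\\') { sb.Append(c); continue; }
            i++; // move to the character after the backslash
            if (i >= literal.Length - 1) throw new FormatException("Dangling backslash");
            switch (literal[i])
            {
                case '\\': sb.Append('\\'); break;
                case '"': sb.Append('"'); break;
                case '\'': sb.Append('\''); break;
                case 'n': sb.Append('\n'); break;
                case 'r': sb.Append('\r'); break;
                case 't': sb.Append('\t'); break;
                case '0': sb.Append('\0'); break;
                default: throw new FormatException("Unsupported escape: \\" + literal[i]);
            }
        }
        return sb.ToString();
    }

    // Encodes any value as a verbatim literal that can be pasted into source code.
    public static string EncodeVerbatim(string value)
    {
        return "@\"" + value.Replace("\"", "\"\"") + "\"";
    }

    public static void Main()
    {
        string literal = @"""C:\\Temp\\A Weird\""FileName""";
        Console.WriteLine(Decode(literal)); // C:\Temp\A Weird"FileName
    }
}
```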

removing #region

I had to take over a c# project. The guy who developed the software in the first place was deeply in love with #region because he wrapped everything with regions.
It makes me almost crazy and I was looking for a tool or addon to remove all #region from the project. Is there something around?
Just use Visual Studio's built-in "Find and Replace" (or "Replace in Files", which you can open by pressing Ctrl + Shift + H).
To remove #region, you'll need to enable Regular Expression matching; in the "Replace In Files" dialog, check "Use: Regular Expressions". Then, use the following pattern: "\#region .*\n", replacing matches with "" (the empty string).
To remove #endregion, do the same, but use "\#endregion.*\n" as your pattern (without a space after endregion, since the directive is usually followed directly by the line break). Regular Expressions might be overkill for #endregion, but it wouldn't hurt (in case the previous developer ever left comments on the same line as an #endregion or something).
Note: Others have posted patterns that should work for you as well, they're slightly different than mine but you get the general idea.
Use one regex ^[ \t]*\#[ \t]*(region|endregion).*\n to find both: region and endregion. After replacing by empty string, the whole line with leading spaces will be removed.
[ \t]* - finds leading spaces
\#[ \t]*(region|endregion) - finds #region or #endregion (and also very rare case with spaces after #)
.*\n - finds everything after #region or #endregion (but in the same line)
EDIT: Answer changed to be compatible with old Visual Studio regex syntax. Was: ^[ \t]*\#(end)?region.*\n (question marks do not work for old syntax)
EDIT 2: Added [ \t]* after # to handle very rare case found by @Volkirith
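If you would rather check the combined pattern on a string before running it over a whole solution, a quick sketch with the .NET regex engine might look like this. The class name and sample source are made up, and RegexOptions.Multiline is needed so that ^ matches at every line start.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegionStripDemo
{
    // The single pattern from above, applied line by line via Multiline.
    private static readonly Regex RegionLine = new Regex(
        @"^[ \t]*\#[ \t]*(region|endregion).*\n", RegexOptions.Multiline);

    // Removes every #region / #endregion line, including its line break.
    public static string Strip(string source)
    {
        return RegionLine.Replace(source, "");
    }

    public static void Main()
    {
        string source = "class A\n{\n    #region Fields\n    int x;\n    #endregion\n}\n";
        Console.Write(Strip(source));
        // class A
        // {
        //     int x;
        // }
    }
}
```

Because the pattern ends in \n, a directive on the very last line of a file with no trailing line break is left in place.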
In Find and Replace use {[#]<region[^]*} for Find what: and replace it with empty string.
#EndRegion is simple enough to replace.
Should you have to cooperate with region lovers (and keep regions untouched ), then I would recommend "I hate #Regions" Visual Studio extension. It makes regions tolerable - all regions are expanded by default and #region directives are rendered with very small font.
For anyone using ReSharper it's just a simple Alt+Enter on the region line. You will then have the option to remove regions in file, in project, or in solution.
More info on JetBrains.
To remove #region with a newline after it, replace following with empty string:
^(?([^\r\n])\s)*\#region\ ([^\r\n])*\r?\n(?([^\r\n])\s)*\r?\n
To replace #endregion with a leading empty line, replace following with an empty string:
^(?([^\r\n])\s)*\r?\n(?([^\r\n])\s)*\#endregion([^\r\n])*\r?\n
How about writing your own program for it, to replace regions with nothing in all *.cs files in basePath recursively ?
(Hint: Careful with reading files as UTF8 if they aren't.)
public static void StripRegions(string fileName, System.Text.RegularExpressions.Regex re)
{
    string input = System.IO.File.ReadAllText(fileName, System.Text.Encoding.UTF8);
    string output = re.Replace(input, "");
    System.IO.File.WriteAllText(fileName, output, System.Text.Encoding.UTF8);
}
public static void StripRegions(string basePath)
{
    System.Text.RegularExpressions.Regex re = new System.Text.RegularExpressions.Regex(@"(^[ \t]*\#[ \t]*(region|endregion).*)(\r)?\n", System.Text.RegularExpressions.RegexOptions.Multiline);
    foreach (string file in System.IO.Directory.GetFiles(basePath, "*.cs", System.IO.SearchOption.AllDirectories))
    {
        StripRegions(file, re);
    }
}
Usage:
StripRegions(@"C:\sources\TestProject");
You can use the wildcard find/replace:
*\#region *
*\#endregion
And replace with no value. (Note the # needs to be escaped, as Visual Studio uses it to match "any number")
