How can I generate a safe class name from a file name? - c#

I'm trying to produce some dynamically compiled code with the Razor engine, and I want to name the generated classes according to their source file names to help understand where a piece of generated code comes from.
For example, I would expect the file C:\source\Foo.cs to be compiled with the name Foo.
Given that I have the path to the source file being compiled, is there a way to generate a valid C# identifier based on the file name?

According to the C# spec, the following rules must be adhered to when creating identifiers:
An identifier must start with a letter or an underscore
After the first character, it may contain numbers, letters, connectors, etc
If the identifier is a keyword, it must be prefixed with "@"
This helper will satisfy those conditions:
// Requires: using System.Globalization; using System.Text.RegularExpressions;
private static string GenerateClassName(string value)
{
    string className = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(value);
    bool isValid = Microsoft.CSharp.CSharpCodeProvider.CreateProvider("C#").IsValidIdentifier(className);
    if (!isValid)
    {
        // File name contains invalid chars, remove them
        Regex regex = new Regex(@"[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Nl}\p{Mn}\p{Mc}\p{Cf}\p{Pc}\p{Lm}]");
        className = regex.Replace(className, "");
        // Class name is empty or doesn't begin with a letter, insert an underscore
        if (className.Length == 0 || !char.IsLetter(className, 0))
        {
            className = className.Insert(0, "_");
        }
    }
    return className.Replace(" ", string.Empty);
}
It first converts the file name to title case (personal preference), then uses IsValidIdentifier to determine whether the file name is already valid as a class name.
If not, it removes all invalid characters based on the Unicode character classes. It then checks whether the name starts with a letter; if it does not, it prepends an _ to fix it.
Finally, it strips any remaining spaces as a safeguard (whitespace is not allowed inside a C# identifier).

First, you need to extract the file name, for example with:
Path.GetFileNameWithoutExtension
Then you have to follow all the rules that apply to a C# class name.
For example:
It must start with a letter or _
I would remove all characters other than _, a-z and 0-9
That should be all!
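The steps in this answer can be sketched as follows. This is only an illustration under a few assumptions: the `SimpleClassName.FromPath` name is hypothetical, both upper and lower case ASCII letters are kept, and keywords are not checked, so it does not cover the full C# identifier rules.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class SimpleClassName
{
    // Hypothetical helper: derives an ASCII-only identifier from a file path.
    public static string FromPath(string path)
    {
        // Strip the directory and the extension, leaving the bare file name.
        string name = Path.GetFileNameWithoutExtension(path);

        // Drop everything except ASCII letters, digits and underscores.
        name = Regex.Replace(name, "[^A-Za-z0-9_]", "");

        // An identifier must start with a letter or an underscore,
        // so prefix an underscore when it is empty or starts with a digit.
        if (name.Length == 0 || char.IsDigit(name[0]))
        {
            name = "_" + name;
        }
        return name;
    }
}
```

A file named `2fast.cs` would come out as `_2fast`, and a name made up only of stripped characters would come out as a single underscore.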

Did you look at CodeDOM? http://msdn.microsoft.com/en-us/library/ms404245(v=vs.110).aspx

Take the path, replace the invalid characters like \ with let's say _ and you're done.
If you prefer shorter names, you could take the path, transform it to lowercase and take a hash value.
A pseudocode sample:
var className = pathIncludingFilename.ToLowerSinceCasingIsNotRelevant().SomeHashFunctionLikeSha1OrPartOfIt() + filename.RemoveInvalidCharactersLikeWhitespace();
The result may look like this:
123a3b6b22foo
The hash should ensure unique names, the filename makes it easier to correlate.
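A minimal sketch of the hash idea might look like the following. The `HashedClassName.FromPath` name, the 10-character hash prefix and the choice of SHA-1 are assumptions made for illustration. Note that a hex hash can start with a digit (as in the sample result above), which is not a valid first character for an identifier, so this sketch adds a leading underscore.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Text.RegularExpressions;

public static class HashedClassName
{
    // Hypothetical helper: "_" + first 10 hex digits of SHA-1(lowercased path) + "_" + cleaned file name.
    public static string FromPath(string path)
    {
        // Lowercase first so that paths differing only in case hash identically.
        string normalized = path.ToLowerInvariant();

        byte[] hash;
        using (SHA1 sha1 = SHA1.Create())
        {
            hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(normalized));
        }
        string hex = BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();

        // Keep a readable suffix so the class can be correlated with its file.
        string fileName = Path.GetFileNameWithoutExtension(path);
        string safeName = Regex.Replace(fileName, "[^A-Za-z0-9_]", "");

        // The leading underscore keeps the identifier valid even when the hash starts with a digit.
        return "_" + hex.Substring(0, 10) + "_" + safeName;
    }
}
```

SHA-1 is used here only to spread names apart, not for security. Truncating the hash makes collisions more likely than with the full digest, so the prefix length is a trade-off between readability and uniqueness.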

Related

RegularExpressionAttribute Equivalent for SymbolUtilityServices.ValidateSymbolName

In AutoCAD there is a utility for determining if a string is valid for a symbol name, i.e. a Block or Layer name for instance. This utility is:
try
{
// Validate the provided symbol table name
SymbolUtilityServices.ValidateSymbolName(s, false);
System.Windows.Forms.MessageBox.Show(s + " is a valid name.");
}
catch
{
// An exception has been thrown, indicating that
// the name is invalid
System.Windows.Forms.MessageBox.Show(s + " is an invalid name.");
}
where "s" is the string you are testing.
See How to check if a given string is a valid name for an item in a symbol table?
Since this tool throws an exception if the name is out of compliance, I would much rather use a Regex attribute to do the same, something like:
[RegularExpressionAttribute(@"^[a-zA-Z]+$", ErrorMessage = "Special characters not allowed")]
But here lies my problem: I am not well versed in Regex. So what would the expression be to disallow these characters:
\<>/?":;*|,=`
(spaces allowed)
Your thoughts and help are appreciated.
Matt
This expression:
[RegularExpressionAttribute(@"^[a-zA-Z \d_-]+$", ErrorMessage = "Certain special characters not allowed")]
does seem to do the trick. I put this together myself, but I feel that it doesn't explicitly disallow the listed characters; instead, it only allows certain characters.
If there is a more concise answer I will accept it.
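One way to disallow the characters explicitly is a negated character class. The sketch below is built only from the character list in the question; the `SymbolNamePattern` class and its members are hypothetical names, and it is an assumption that this list matches everything ValidateSymbolName rejects.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SymbolNamePattern
{
    // Accepts one or more characters, none of which may be: \ < > / ? " : ; * | , = `
    // In a verbatim string, "" stands for one literal quote; \\ is the regex escape for a backslash.
    public const string Pattern = @"^[^\\<>/?"":;*|,=`]+$";

    public static bool IsAllowed(string name)
    {
        return Regex.IsMatch(name, Pattern);
    }
}
```

Because `Pattern` is a constant, it could be passed to the attribute as `[RegularExpression(SymbolNamePattern.Pattern, ErrorMessage = "Certain special characters not allowed")]`. Spaces are accepted because they are not in the excluded set.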

C# Save file with "/" in name

FileName contains e.g. Legend/Dery//Times
File.WriteAllBytes("/Pictures" + FileName, buffer);
I can't save the file because the "/" is treated as a path separator. I also can't remove the "/", because I need it for further processing. Is there any way of saving such a file?
You're out of luck. A forward slash can't be part of a file name.
You need to escape it somehow (i.e. change the name but provide a way of changing it back), but there isn't really a conventional way of doing that.
I've seen % being used for this purpose, with %% used to denote a single %, and something like %f for a forward slash, %b for a backslash, etc.
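The escaping scheme described above could be sketched like this. The `FileNameEscaper` class is a hypothetical name, and only the three mappings mentioned in the answer (%%, %f, %b) are handled; other characters that are invalid in file names would need their own codes.

```csharp
using System;
using System.Text;

public static class FileNameEscaper
{
    // Replaces % with %%, / with %f and \ with %b so the result contains no slashes.
    public static string Escape(string name)
    {
        var sb = new StringBuilder();
        foreach (char c in name)
        {
            switch (c)
            {
                case '%': sb.Append("%%"); break;
                case '/': sb.Append("%f"); break;
                case '\\': sb.Append("%b"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }

    // Reverses Escape by reading the character that follows each %.
    public static string Unescape(string escaped)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < escaped.Length; i++)
        {
            char c = escaped[i];
            if (c != '%')
            {
                sb.Append(c);
                continue;
            }
            if (i + 1 >= escaped.Length)
            {
                throw new FormatException("Dangling '%' at end of input.");
            }
            char code = escaped[++i];
            switch (code)
            {
                case '%': sb.Append('%'); break;
                case 'f': sb.Append('/'); break;
                case 'b': sb.Append('\\'); break;
                default: throw new FormatException("Unknown escape code: %" + code);
            }
        }
        return sb.ToString();
    }
}
```

The file would be written under `Escape(FileName)`, and `Unescape` recovers the original name, slashes included, when it is needed for further processing.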
Microsoft defines rules for file and folder names, and those rules do not allow this.
Instead of escaping, I suggest normalizing your input both when you save a file and when you try to access a file:
//replace all illegal characters with regex (with a dash):
new Regex(@"[<>:""/\\|?*]").Replace("Inpu|t","-")
//Or just replace all non alpha numeric characters (with a dash):
new Regex(@"[^a-zA-Z0-9\-]").Replace("Inpu|t","-")
This way you will always have clean file and folder names and don't have to worry about illegal names.

Extract second section of string

I will always have a string like this:
"/FirstWord/ImportantWord/ThirdWord"
How can I extract the ImportantWord? Words can contain at most one space and they are separated by forward slashes, as shown above, for example:
"/Folder/Second Folder/Content"
"/Main folder/Important/Other Content"
I always want to get the second word (Second Folder and Important in the examples above).
How about this:
string ImportantWord = path.Split('/')[2]; // Index 2 will give the required word
You may not need String.Split at all, whether with specific characters or with regular expressions. Since the inputs are well-formed directory paths, you can use the Directory.GetParent method of the System.IO.Directory class, which gives you the parent directory as a DirectoryInfo. From that you can take the Name of the directory, which will be the required text. Note that this returns the second-to-last segment, which is the second word here only because the paths have exactly three segments.
You can use it like this:
string pathFirst = "/Folder/Second Folder/Content";
string pathSecond = "/Main folder/Important/Other Content";
string reqWord1 = Directory.GetParent(pathFirst).Name; // will give you Second Folder
string reqWord2 = Directory.GetParent(pathSecond).Name; // will give you Important
Additional note: The method Directory.GetParent can be nested if you need to get a name in another level.
Also you may try this:
var stringValue = "/FirstWord/ImportantWord/ThirdWord";
var item = stringValue.Split('/').Skip(2).First(); //item: ImportantWord
There are several ways to solve this. The simplest one is using String.Split:
Char delimiter = '/';
String[] substrings = value.Split(delimiter);
String secondWord = substrings[2]; // index 0 is the empty string before the leading '/'
(you may want to do some input check to make sure the input is in the right format or else you will get some exception)
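The input check mentioned above could be sketched as follows. The `PathSegments.TryGetSecond` name is hypothetical. This version passes StringSplitOptions.RemoveEmptyEntries, so the empty entry produced by the leading slash is dropped and the second word sits at index 1.

```csharp
using System;

public static class PathSegments
{
    // Returns true and sets 'segment' when the path has at least two non-empty segments.
    public static bool TryGetSecond(string path, out string segment)
    {
        segment = null;
        if (string.IsNullOrEmpty(path))
        {
            return false;
        }

        // RemoveEmptyEntries discards the empty string that precedes the leading '/'.
        string[] parts = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 2)
        {
            return false;
        }

        segment = parts[1];
        return true;
    }
}
```

Returning a bool instead of indexing directly avoids an IndexOutOfRangeException when the input has fewer segments than expected.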
Another way is to use a regex, since the pattern is simple (segments separated by /).
If you are sure this is a path, you can use the approach from the other answer mentioned here.

"Evaluate" a c# string

I am reading a C# source file.
When I encounter a string, I want to get its value.
For instance, in the following example:
public class MyClass
{
public MyClass()
{
string fileName = "C:\\Temp\\A Weird\"FileName";
}
}
I would like to retrieve
C:\Temp\A Weird"FileName
Is there an existing procedure to do that?
Coding a solution that covers all the possible cases would be quite tricky (@, escape sequences, ...).
I am convinced such a procedure exists...
I would like to have the reverse function too (to inject a string into a C# source file).
Thanks in advance.
Philippe
P.S:
I gave an example with a filename, but I look for a solution working for all kinds of strings.
I'm pretty sure you can use CodeDOM to read a C# code file and parse its elements. It generates a code tree, and then you can look for nodes representing strings.
http://www.codeproject.com/Articles/2502/C-CodeDOM-parser
Other CodeDom parsers:
http://www.codeproject.com/Articles/14383/An-Expression-Parser-for-CodeDom
NRefactory: https://github.com/icsharpcode/NRefactory and http://www.codeproject.com/Articles/408663/Using-NRefactory-for-analyzing-Csharp-code
There is a way of extracting these strings using a regular expression:
("(\\"|[^"])*")
This particular one works on your simple example and gives the filename (complete with leading and trailing quote characters); whether it would work on more complex ones I can't easily tell unfortunately.
For clarity, (\\"|[^"]) matches any character apart from ", except where it has a leading \ character.
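Just use the regex ".*" to match all string values, then remove the surrounding quotation marks and unescape the result.
This will allow \" and "" sequences inside your string,
so both "C:\\Temp\\A Weird\"FileName" and "Hello ""World""" will match. Note that .* is greedy, so if a line contains more than one string literal, the match runs from the first opening quote to the last closing quote on that line.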
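As a rough sketch of the regex route, the following combines a pattern for both literal forms with a small decoder. The `StringLiterals` class and its members are hypothetical names. The decoder handles only a handful of escape sequences (no \u or \x), and the pattern knows nothing about comments or character literals, so a real parser would be more reliable for arbitrary source files.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class StringLiterals
{
    // First alternative: verbatim literal @"..." where "" is an embedded quote.
    // Second alternative: regular literal "..." where a backslash escapes the next character.
    private static readonly Regex Literal = new Regex(
        @"@""(?:[^""]|"""")*""|""(?:\\.|[^""\\\r\n])*""");

    // Yields the decoded value of every string literal found in the source text.
    public static IEnumerable<string> Extract(string source)
    {
        foreach (Match m in Literal.Matches(source))
        {
            yield return Decode(m.Value);
        }
    }

    // Turns a literal, quotes included, into the value it denotes.
    public static string Decode(string literal)
    {
        if (literal[0] == '@')
        {
            // Verbatim: drop @" and the closing quote, then collapse "" to ".
            return literal.Substring(2, literal.Length - 3).Replace("\"\"", "\"");
        }

        string body = literal.Substring(1, literal.Length - 2);
        var sb = new StringBuilder();
        for (int i = 0; i < body.Length; i++)
        {
            if (body[i] != '\\')
            {
                sb.Append(body[i]);
                continue;
            }
            char code = body[++i];
            switch (code)
            {
                case 'n': sb.Append('\n'); break;
                case 'r': sb.Append('\r'); break;
                case 't': sb.Append('\t'); break;
                case '0': sb.Append('\0'); break;
                case '\\':
                case '"':
                case '\'':
                    sb.Append(code);
                    break;
                default:
                    // Assumption: anything else is out of scope for this sketch.
                    throw new NotSupportedException("Escape \\" + code + " is not handled.");
            }
        }
        return sb.ToString();
    }
}
```

Unlike the greedy ".*" pattern, this one stops at the first unescaped closing quote, so several literals on one line are matched separately.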

What is this piece of code doing with RegEx

I have been studying some sample source code and I can't understand this part. What is this piece of code doing? Mostly the RegEx part...
In the parameters used, "code" is a string; it is the C# source code we are passing in.
Match m = null;
if ((m = Regex.Match(code, "(?ims)^[/']{2}REFDLL (?<ref>.+?)$")).Success)
{
foreach (string refDll in m.Groups["ref"].Value.Split(new char[] { ';', ',' }))
{
//2008-06-18 by Jeffrey, remove redundant \r
string mdfyRefDll = refDll.Replace("\r", "").Replace("\n", "");
//trim the ending .dll if exists
if (mdfyRefDll.ToLower().EndsWith(".dll"))
mdfyRefDll = mdfyRefDll.Substring(0, mdfyRefDll.Length - 4);
string lcRefDll = mdfyRefDll.ToLower();
if (lcRefDll == "system.data.linq" || lcRefDll == "system"
|| lcRefDll == "system.xml.linq" || lcRefDll == "system.core")
continue;
cp.ReferencedAssemblies.Add(mdfyRefDll + ".dll");
}
}
I think this image addresses what's going on in the code you posted:
Mini C# Lab's project description is as follows:
A handy tool for simple short C# code running and testing, you can
save time on waiting for Visual Studio startup and avoid creating a
lot of one-time only project files.
It seems like that project is missing documentation, so it's difficult to extrapolate why the author of the code chose that particular way to add referenced DLLs when there is a using directive in there already. Perhaps he did it to avoid conflicts with the using statement.
First, (?ims) is specifying options. i triggers case-insensitivity, m specifies multi-line mode, and s (IIRC) enables the dot-all option, meaning that the wildcard . includes newline characters.
Then, ^ asserts the start of a line and the $ at the end asserts the end of a line. Because of the m option, they match at every line boundary rather than only at the very start and end of the string.
The [/']{2} matches exactly two of either the slash or single-quote characters, i.e. //, '', /', and '/.
The REFDLL matches exactly what you see.
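The (?<ref>.+?) captures the rest of the line. The final question mark makes the quantifier lazy, and it matters here: due to the s option the dot also matches newline characters, so a greedy .+ would run on to the end of the last line, whereas the lazy version stops at the first end of line. With Windows line endings the capture still ends in \r, which is why the code strips it afterwards. This portion is stored in a group named ref.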
In summary, it's trying to match something like
//REFDLL helloworld foobar
and stores "helloworld foobar" in ref.
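To see the pattern in action, here is a small sketch that runs the same regex against a sample source string. The `RefDllDemo` class and its method names are hypothetical, and the parsing is simplified compared with the original code: it trims whitespace and the .dll suffix but does not skip any default assemblies.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RefDllDemo
{
    private const string Pattern = "(?ims)^[/']{2}REFDLL (?<ref>.+?)$";

    // Returns the raw text captured by the 'ref' group, or null when no directive is found.
    public static string RawCapture(string code)
    {
        Match m = Regex.Match(code, Pattern);
        return m.Success ? m.Groups["ref"].Value : null;
    }

    // Splits the captured text on ';' and ',' and strips whitespace and a trailing ".dll".
    public static List<string> ParseReferences(string code)
    {
        var result = new List<string>();
        string raw = RawCapture(code);
        if (raw == null)
        {
            return result;
        }
        foreach (string part in raw.Split(';', ','))
        {
            // Trim also removes the trailing \r left over from Windows line endings.
            string name = part.Trim();
            if (name.EndsWith(".dll", StringComparison.OrdinalIgnoreCase))
            {
                name = name.Substring(0, name.Length - 4);
            }
            if (name.Length > 0)
            {
                result.Add(name);
            }
        }
        return result;
    }
}
```

The directive works as a comment in both C# (`//REFDLL ...`) and VB (`''REFDLL ...`) sources, which may be why the pattern accepts slashes and single quotes.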
