RegEx For Validating File Names - c#

I have some MP3 files that are named with a particular syntax, for example:
1 - Sebastian Ingrosso - Calling (Feat. Ryan Tedder)
I have written a small program in C# that reads the Track, Artist and Title from the ID3 tags. What I would like to do is write a regular expression that can validate that the files are in fact named with the syntax listed above.
So i have a class called song:
//Properties
public string Filename
{
get { return _filename; }
set { _filename = value; }
}
public string Title
{
get { return _title; }
set { _title = value; }
}
public string Artist
{
get { return _artist; }
set { _artist = value; }
}
//Methods
public bool Parse(string strfile)
{
bool CheckFile;
Tags.ID3.ID3v1 Song = new Tags.ID3.ID3v1(strfile, true);
Filename = Song.FileName;
Title = Song.Title;
Artist = Song.Artist;
//Check File Name Formatting
string RegexBuilder = @"\d\s-\s" + Artist + @"\s-\s" + Title;
if (Regex.IsMatch(Filename, RegexBuilder))
{
CheckFile = true;
}
else
{
CheckFile = false;
}
return CheckFile;
}
So it works, MOST OF THE TIME. The minute I have a (Feat. ) in the title it fails. The closest I could come up with is:
\d\s-\s\Artist\s-\s.*
That's obviously not going to work as any text would pass the test, I have tried my very best but I have only been programming for two weeks.
tl;dr Would like song to pass a regex test whether it contains a featured artist or not, for example:
1 - Sebastian Ingrosso - Calling (Feat. Ryan Tedder)
and
1 - Flo Rida - Whistle
Should both pass the test.

The problem is that the "(" and ")" in your regex have meaning to the Regex engine. You should use the following code:
string RegexBuilder = @"\d\s-\s" + Regex.Escape(Artist) + @"\s-\s" + Regex.Escape(Title);
The Escape function will change "(Feat. )" to "\(Feat\.\ \)" (it also escapes the period and the space), which ensures that you are matching literal parentheses and not grouping "Feat. ".
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.escape.aspx
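As a minimal sketch of the escaping approach above, the check could be wrapped in a small helper. The class name FileNameCheck and the method Matches are hypothetical, and the sketch assumes the file name is passed without a path or extension. The ^ and $ anchors and the \d+ (to allow multi-digit track numbers) are additions that go beyond the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameCheck
{
    // Hypothetical helper: checks "<track> - <artist> - <title>".
    // Assumes fileName has no directory part and no extension.
    public static bool Matches(string fileName, string artist, string title)
    {
        // Regex.Escape turns metacharacters such as ( ) . into literals,
        // so a title like "Calling (Feat. Ryan Tedder)" is matched as plain text.
        string pattern = @"^\d+\s-\s" + Regex.Escape(artist)
                       + @"\s-\s" + Regex.Escape(title) + "$";
        return Regex.IsMatch(fileName, pattern);
    }
}
```

With this helper, both "1 - Sebastian Ingrosso - Calling (Feat. Ryan Tedder)" and "1 - Flo Rida - Whistle" pass when the matching artist and title are supplied.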


Find comments in text and replace them using Regex

I currently go through all my source files and read their text with File.ReadAllLines, and I want to filter out all comments with one regex, covering basically all comment possibilities. I tried several regex solutions I found on the internet, such as this one:
@"(@(?:""[^""]*"")+|""(?:[^""\n\\]+|\\.)*""|'(?:[^'\n\\]+|\\.)*')|//.*|/\*(?s:.*?)\*/"
And the top result when I google:
string blockComments = @"/\*(.*?)\*/";
string lineComments = @"//(.*?)\r?\n";
string strings = @"""((\\[^\n]|[^""\n])*)""";
string verbatimStrings = @"@(""[^""]*"")+";
See: Regex to strip line comments from C#
The second solution won't recognize any comments.
That's what I currently do:
public static List<string> FormatList(List<string> unformattedList, string dataType)
{
List<string> formattedList = unformattedList;
string blockComments = @"/\*(.*?)\*/";
string lineComments = @"//(.*?)\r?\n";
string strings = @"""((\\[^\n]|[^""\n])*)""";
string verbatimStrings = @"@(""[^""]*"")+";
string regexCS = blockComments + "|" + lineComments + "|" + strings + "|" + verbatimStrings;
//regexCS = @"(@(?:""[^""]*"")+|""(?:[^""\n\\]+|\\.)*""|'(?:[^'\n\\]+|\\.)*')|//.*|/\*(?s:.*?)\*/";
string regexSQL = "";
if (dataType.Equals("cs"))
{
for(int i = 0; i < formattedList.Count;i++)
{
string line = formattedList[i];
line = line.Trim(' ');
if(Regex.IsMatch(line, regexCS))
{
line = "";
}
formattedList[i] = line;
}
}
else if(dataType.Equals("sql"))
{
}
else
{
throw new Exception("Unknown DataType");
}
return formattedList;
}
The first Method recognizes the comments, but also finds things like
string[] bla = text.Split('\\\\');
Is there any solution to this problem, so that the regex excludes matches that are inside a string or char literal? If you have any other links I should check out, please let me know!
I tried a lot and can't figure out why this won't work for me.
[I also tried these links]
https://blog.ostermiller.org/find-comment
https://codereview.stackexchange.com/questions/167582/regular-expression-to-remove-comments
Regex to find comment in c# source file
Doing this with regexes will be very difficult, as stated in the comments. However, a fine way to eliminate comments would be to use a CSharpSyntaxWalker. The syntax walker knows about all language constructs and won't make hard-to-investigate mistakes (as regexes do).
Add a reference to the Microsoft.CodeAnalysis.CSharp NuGet package and inherit from CSharpSyntaxWalker.
class CommentWalker : CSharpSyntaxWalker
{
// The depth must be at least Trivia, otherwise VisitTrivia is never called.
public CommentWalker(SyntaxWalkerDepth depth = SyntaxWalkerDepth.Trivia) : base(depth)
{
}
public override void VisitTrivia(SyntaxTrivia trivia)
{
if (trivia.IsKind(SyntaxKind.MultiLineCommentTrivia)
|| trivia.IsKind(SyntaxKind.SingleLineCommentTrivia))
{
// Do something with the comments
// For example, find the comment location in the file, so you can replace it later.
// Make a List as a public property, so you can iterate the list of comments later on.
}
}
}
Then you can use it like so:
// Get the program text from your .cs file
SyntaxTree tree = CSharpSyntaxTree.ParseText(programText);
CompilationUnitSyntax root = tree.GetCompilationUnitRoot();
var walker = new CommentWalker();
walker.Visit(root);
// Now iterate your list of comments (probably backwards) and remove them.
Further reading:
Syntax walkers
Checking for big blocks of comments in code (NDepend, Roslyn)
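As a rough sketch of the "collect the comments, then remove them backwards" idea, the same Roslyn package can also be used without a custom walker by enumerating the trivia directly. The class name CommentStripper and the method StripComments are hypothetical. The sketch only handles // and /* */ comments (not XML documentation comments) and leaves the surrounding whitespace untouched.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class CommentStripper
{
    // Hypothetical helper: returns the program text with // and /* */ comments removed.
    public static string StripComments(string programText)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(programText).GetRoot();

        // Comments are trivia; text inside string or char literals is part of a token,
        // so something like "// not a comment" in a string is never reported here.
        var spans = root.DescendantTrivia()
            .Where(t => t.IsKind(SyntaxKind.SingleLineCommentTrivia)
                     || t.IsKind(SyntaxKind.MultiLineCommentTrivia))
            .Select(t => t.Span)
            .OrderByDescending(s => s.Start)
            .ToList();

        // Remove from the end of the text backwards so earlier offsets stay valid.
        string result = programText;
        foreach (var span in spans)
        {
            result = result.Remove(span.Start, span.Length);
        }
        return result;
    }
}
```

Working from the last comment to the first avoids having to adjust the offsets after each removal.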

Using split() method without text qualifier

I'm trying to get some field values from a text file using a StreamReader.
To read my custom values, I'm using the Split() method. My separator is a colon ':' and my text format looks like:
Title: Mytitle
Manager: Him
Thema: Free
.....
Main Idea: best idea ever
.....
My problem is that when I try to get the first field, which is the title, I use:
string title = text.Split(':')[1];
I get title = Mytitle Manager
instead of just title = Mytitle.
Any suggestions would be nice.
My text looks like this:
My mail : ........................text............
Manager mail : ..................text.............
Entity :.......................text................
Project Title :...............text.................
Principal idea :...................................
Scope of the idea : .........text...................
........................text...........................
Description and detail :................text.......
..................text.....
Cost estimation :..........
........................text...........................
........................text...........................
........................text...........................
Advantage for us :.................................
.......................................................
Direct Manager IM :................................
Updated per your post
//I would create a class to use if you haven't
//Just cleaner and easier to read
public class Entry
{
public string MyMail { get; set; }
public string ManagerMail { get; set; }
public string Entity { get; set; }
public string ProjectTitle { get; set; }
// ......etc
}
//in case your format location ever changes only change the index value here
public enum EntryLocation
{
MyMail = 0,
ManagerMail = 1,
Entity = 2,
ProjectTitle = 3
}
//return the entry
private Entry ReadEntry()
{
string s =
string.Format("My mail: test@test.com{0}Manager mail: test2@test2.com{0}Entity: test entity{0}Project Title: test project title", Environment.NewLine);
//in case you change your delimiter only need to change it once here
char delimiter = ':';
//your entry contains newline so lets split on that first
string[] split = s.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
//populate the entry
Entry entry = new Entry()
{
//use the enum makes it cleaner to read what value you are pulling
MyMail = split[(int)EntryLocation.MyMail].Split(delimiter)[1].Trim(),
ManagerMail = split[(int)EntryLocation.ManagerMail].Split(delimiter)[1].Trim(),
Entity = split[(int)EntryLocation.Entity].Split(delimiter)[1].Trim(),
ProjectTitle = split[(int)EntryLocation.ProjectTitle].Split(delimiter)[1].Trim()
};
return entry;
}
That is because split returns strings delimited by the sign you've specified. In your case:
Title
Mytitle Manager
Him
1. You can change your data format to get the value you need, for example:
Title: Mytitle:Manager: Him
There, every second element will be a value.
text.Split(':')[1] == " Mytitle";
text.Split(':')[3] == " Him";
2. Or you can call text.Split(' ', ':') to get an identical list of name-value pairs without a format change (note that names or values containing spaces will be split as well).
3. Also, if each entry is placed on its own line in the file, like:
Title: Mytitle
Manager: Him
and your content is streamed into a single string, then you can also do:
text.Split(new string[] {Environment.NewLine, ":"}, StringSplitOptions.None);
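A minimal sketch of the line-based idea, assuming each field starts on its own line as "Name: value": split into lines first, then cut each line at its first colon only. The class name FieldParser and the handling of continuation lines (lines without a colon are appended to the previous field) are assumptions made for illustration, not part of the answers above.

```csharp
using System;
using System.Collections.Generic;

public static class FieldParser
{
    // Hypothetical helper: maps field names to values for "Name: value" lines.
    public static Dictionary<string, string> Parse(string text)
    {
        var fields = new Dictionary<string, string>();
        string currentKey = null;
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            int colon = line.IndexOf(':');
            if (colon > 0)
            {
                // Cut at the first colon only, so a value may itself contain colons.
                currentKey = line.Substring(0, colon).Trim();
                fields[currentKey] = line.Substring(colon + 1).Trim();
            }
            else if (currentKey != null)
            {
                // Assumption: a line without a colon continues the previous value.
                fields[currentKey] = (fields[currentKey] + " " + line.Trim()).Trim();
            }
        }
        return fields;
    }
}
```

For the sample text, Parse(text)["Title"] gives "Mytitle" and Parse(text)["Main Idea"] gives "best idea ever". A continuation line that happens to contain a colon would be misread as a new field, so this only fits data where that cannot occur.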

Set String.Format at runtime

I have an XML file in which I want to allow the end user to set the format of a string.
ex:
<Viewdata>
<Format>{0} - {1}</Format>
<Parm>Name(property of obj being formatted)</Parm>
<Parm>Phone</Parm>
</Viewdata>
So at runtime I would somehow convert that to a String.Format("{0} - {1}", usr.Name, usr.Phone);
Is this even possible?
Of course. Format strings are just that, strings.
string fmt = "{0} - {1}"; // get this from your XML somehow
string name = "Chris";
string phone = "1234567";
string name_with_phone = String.Format(fmt, name, phone);
Just be careful with it, because your end user might be able to disrupt the program with a malformed format string. Do not forget to catch FormatException.
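A small sketch of guarding against a bad user-supplied format string. The helper name SafeFormat and the fallback of joining the raw values with spaces are assumptions; any other fallback (logging, a default format) would work just as well.

```csharp
using System;

public static class FormatHelper
{
    // Hypothetical helper: applies a user-supplied format string,
    // falling back to the raw values if the format is invalid.
    public static string SafeFormat(string fmt, params object[] args)
    {
        try
        {
            return string.Format(fmt, args);
        }
        catch (FormatException)
        {
            // Thrown for malformed braces or for an index such as {2}
            // when only two arguments were supplied.
            return string.Join(" ", args);
        }
    }
}
```

For example, SafeFormat("{0} - {1}", "Chris", "1234567") returns "Chris - 1234567", while SafeFormat("{0} - {2}", "Chris", "1234567") falls back to "Chris 1234567".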
I agree with the other posters who say you probably shouldn't be doing this but that doesn't mean we can't have fun with this interesting question. So first of all, this solution is half-baked/rough but it's a good start if someone wanted to build it out.
I wrote it in LinqPad which I love so Dump() can be replaced with console writelines.
void Main()
{
XElement root = XElement.Parse(
@"<Viewdata>
<Format>{0} | {1}</Format>
<Parm>Name</Parm>
<Parm>Phone</Parm>
</Viewdata>");
var formatter = root.Descendants("Format").FirstOrDefault().Value;
var parms = root.Descendants("Parm").Select(x => x.Value).ToArray();
Person person = new Person { Name = "Jack", Phone = "(123)456-7890" };
string formatted = MagicFormatter<Person>(person, formatter, parms);
formatted.Dump();
/// OUTPUT ///
/// Jack | (123)456-7890
}
public string MagicFormatter<T>(T theobj, string formatter, params string[] propertyNames)
{
for (var index = 0; index < propertyNames.Length; index++)
{
PropertyInfo property = typeof(T).GetProperty(propertyNames[index]);
propertyNames[index] = (string)property.GetValue(theobj);
}
return string.Format(formatter, propertyNames);
}
public class Person
{
public string Name { get; set; }
public string Phone { get; set; }
}
XElement root = XElement.Parse (
@"<Viewdata>
<Format>{0} - {1}</Format>
<Parm>damith</Parm>
<Parm>071444444</Parm>
</Viewdata>");
var format =root.Descendants("Format").FirstOrDefault().Value;
var result = string.Format(format, root.Descendants("Parm")
.Select(x=>x.Value).ToArray());
What about specifying your format string with parameter names:
<Viewdata>
<Format>{Name} - {Phone}</Format>
</Viewdata>
Then with something like this:
http://www.codeproject.com/Articles/622309/Extended-string-Format
you can do the work.
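For illustration only, named placeholders can also be handled with a few lines of reflection instead of a library. This is not the code from the linked article; the class name TemplateFormatter is hypothetical, and the sketch supports only simple {PropertyName} tokens without format specifiers.

```csharp
using System;
using System.Reflection;
using System.Text.RegularExpressions;

public static class TemplateFormatter
{
    // Hypothetical helper: replaces {PropertyName} tokens with the value of the
    // public property of that name on source. Unknown tokens are left unchanged.
    public static string Format(string template, object source)
    {
        return Regex.Replace(template, @"\{(\w+)\}", match =>
        {
            PropertyInfo property = source.GetType().GetProperty(match.Groups[1].Value);
            return property == null
                ? match.Value
                : Convert.ToString(property.GetValue(source, null));
        });
    }
}
```

With a Person whose Name is "Jack" and Phone is "(123)456-7890", Format("{Name} - {Phone}", person) yields "Jack - (123)456-7890".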
The short answer is yes, but how difficult it is going to be depends on the variety of your formatting options.
If some formatting strings accept 5 parameters and others accept only 3, then you need to take that into account.
I'd go with parsing the XML for the params and storing them in an array of objects to pass to the String.Format function.
You can use System.Linq.Dynamic and make entire format command editable:
class Person
{
public string Name;
public string Phone;
public Person(string n, string p)
{
Name = n;
Phone = p;
}
}
static void TestDynamicLinq()
{
foreach (var x in new Person[] { new Person("Joe", "123") }.AsQueryable().Select("string.Format(\"{0} - {1}\", it.Name, it.Phone)"))
Console.WriteLine(x);
}

How to make filenames web safe using c#

This is not about encoding URLs; it has more to do with a problem I noticed where you can have a valid filename on IIS such as "test & test.jpg", but it cannot be downloaded because the & causes an error. Other characters that are valid in Windows but not on the web do this as well.
My quick solution is to change the filename before saving, using the regex below:
public static string MakeFileNameWebSafe(string fileNameIn)
{
string pattern = @"[^A-Za-z0-9. ]";
string safeFilename = System.Text.RegularExpressions.Regex.Replace(fileNameIn, pattern, string.Empty);
if (safeFilename.StartsWith(".")) safeFilename = "noname" + safeFilename;
return safeFilename;
}
but I was wondering if there were any better built in ways of doing this.
Built-in I don't know about.
What you can do is, like you say, scan the original filename and generate a Web-safe version of it.
For such Web-safe versions, you can make it appear like slugs in blogs and blog categories (these are search engine-optimized):
Only lowercase characters
Numbers are allowed
Dashes are allowed
Spaces are replaced by dashes
Nothing else is allowed
Possibly you could replace "&" by "-and-"
So "test & test.jpg" would translate to "test-and-test.jpg".
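The slug rules listed above could be sketched roughly as follows. The class name SlugHelper is hypothetical, and the sketch assumes the input is a bare file name whose extension is kept as-is (only lowercased).

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class SlugHelper
{
    // Hypothetical helper: lowercase letters, digits and dashes only,
    // spaces become dashes, and "&" becomes "-and-".
    public static string ToSlug(string fileName)
    {
        string ext = Path.GetExtension(fileName).ToLowerInvariant();
        string name = Path.GetFileNameWithoutExtension(fileName).ToLowerInvariant();

        name = name.Replace("&", " and ");
        // Drop everything that is not a lowercase letter, digit, whitespace or dash.
        name = Regex.Replace(name, @"[^a-z0-9\s-]", string.Empty);
        // Collapse runs of whitespace and dashes into a single dash.
        name = Regex.Replace(name, @"[\s-]+", "-").Trim('-');

        return name + ext;
    }
}
```

ToSlug("test & test.jpg") returns "test-and-test.jpg".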
Just looking back at this question since it's fairly popular. I thought I would post my current solution here, with various overloads, for anyone who wants it:
public static string MakeSafeFilename(string filename, string spaceReplace)
{
return MakeSafeFilename(filename, spaceReplace, false, false);
}
public static string MakeSafeUrlSegment(string text)
{
return MakeSafeUrlSegment(text, "-");
}
public static string MakeSafeUrlSegment(string text, string spaceReplace)
{
return MakeSafeFilename(text, spaceReplace, false, true);
}
public static string MakeSafeFilename(string filename, string spaceReplace, bool htmlDecode, bool forUrlSegment)
{
if (htmlDecode)
filename = HttpUtility.HtmlDecode(filename);
string pattern = forUrlSegment ? @"[^A-Za-z0-9_\- ]" : @"[^A-Za-z0-9._\- ]";
string safeFilename = Regex.Replace(filename, pattern, string.Empty);
safeFilename = safeFilename.Replace(" ", spaceReplace);
return safeFilename;
}
I think you are referring to the "A potentially dangerous Request.Path value was detected from the client (%)" error, which ASP.NET throws for paths that include characters that might indicate cross-site scripting attempts.
There is a good article on how to work around this:
http://www.hanselman.com/blog/ExperimentsInWackinessAllowingPercentsAnglebracketsAndOtherNaughtyThingsInTheASPNETIISRequestURL.aspx
Here's the one I use:
public static string MakeFileNameWebSafe(string path, string replace, string other)
{
var folder = System.IO.Path.GetDirectoryName(path);
var name = System.IO.Path.GetFileNameWithoutExtension(path);
var ext = System.IO.Path.GetExtension(path);
if (name == null) return path;
var allowed = @"a-zA-Z0-9" + replace + (other ?? string.Empty);
name = System.Text.RegularExpressions.Regex.Replace(name.Trim(), @"[^" + allowed + "]", replace);
name = System.Text.RegularExpressions.Regex.Replace(name, @"[" + replace + "]+", replace);
if (name.EndsWith(replace)) name = name.Substring(0, name.Length - 1);
// Path.Combine restores the directory separator that GetDirectoryName strips
return System.IO.Path.Combine(folder ?? string.Empty, name + ext);
}
If you don't need to keep the original name, perhaps you could just replace it with a GUID?
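A tiny sketch of the GUID idea, assuming the original extension should be kept so the file is still served with the right content type. The class name GuidNames is hypothetical.

```csharp
using System;
using System.IO;

public static class GuidNames
{
    // Hypothetical helper: replaces the name with a GUID and keeps the extension.
    public static string NewName(string originalFileName)
    {
        // "N" formats the GUID as 32 hex digits without dashes or braces.
        return Guid.NewGuid().ToString("N")
             + Path.GetExtension(originalFileName).ToLowerInvariant();
    }
}
```

If the original name matters to users, it can be stored alongside the generated name, for example in a database column.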

How do I implement such template engine?

What I got:
I have a textual representation which my program converts to a much more readable format, especially for forums, websites and so on.
Why do I need such a template engine
As there are many different forums and blogs, the syntax of each one might be different. Instead of hard-coding the different syntaxes, I would like to generate one class for each of them (preferably extendable with easily modified XML files) to format my output with the desired syntax.
What I did imagine
For example I need something like
class xyz {
private string start_bold = "[B]";
private string end_bold = "[/B]";
public string bold(string s) {
return start_bold + s + end_bold;
}
}
How can I do that in the most elegant way? Feel free to edit this question, as I'm not entirely sure it's a template engine I need. I just don't have a better word for it right now.
Thanks for any help.
Some additional information:
Andrew's answer was a great hint, but I don't understand how I could support several different styles with this method. Currently I do it the hard way:
string s = String.Format("Output of [B]{0}[/B] with number [i]{1}[/i]",
Data.Type,
Data.Number);
For this example, I want the output to be designed for a forum. In future I would like to do it like this:
Layout l = new Layout("html");
string s = String.Format("Output of {0} with number {1}",
l.bold(Data.Type),
l.italic(Data.Number));
//desired output if style "html" is chosen:
"Output of <b>Name</b> with number <i>5</i>"
//desired output if style "phpbb" is chosen:
"Output of [b]Name[/b] with number [i]5[/i]"
I just don't know how this can be done in the most elegant way.
About the XML: Only the styling conventions should be derived by a xml-document, i.e. adding custom styles without using code.
I would use extension methods. Then you could call yourString.bold().
I think this would be the syntax (extension methods must live in a static class):
static class xyz {
private const string start_bold = "[B]";
private const string end_bold = "[/B]";
public static string bold(this string x) {
return start_bold + x + end_bold;
}
}
See: http://msdn.microsoft.com/en-us/library/bb383977.aspx
I'm leaving the code below as an example, but I think what you really need is something along the lines of a "token system"
Say you have a string as such:
string s = "I want {~b}this text to be bold{~~b} and {~i}this text to be italics{~~i}";
Your XML document should contain these nodes (I think, my XML is kinda rusty)
<site>
<html>
<style value="{~b}">[b]</style>
<style value="{~~b}">[/b]</style>
<style value="{~i}">[i]</style>
<style value="{~~i}">[/i]</style>
</html>
<phpBBCode>
......
public class Layout {
//private string start_bold = "[B]";
//private string end_bold = "[/B]";
//private string start_italics = "[I]";
//private string end_italics = "[/I]";
private string _stringtoformat;
public string StringToFormat { set { _stringtoformat = value; } }
private string _formattedString;
public string FormattedString { get { return _formattedString; } }
public Layout(string formattype, int siteid)
{
//get format type logic here
//if(formattype.ToLower() =="html")
//{ . . . do something . . . }
//call XML Doc for specific site, based upon formattype
if(!String.IsNullOrEmpty(_stringtoformat))
{
//you will want to put another loop here to loop over all of the custom styles
foreach(node n in siteNode)
{
//strings are immutable, so keep the result of Replace
_stringtoformat = _stringtoformat.Replace(n.value, n.text);
}
}
//Sorry, can't write XML document parsing code off the top of my head
_formattedString = _stringtoformat;
}
public string bold(this string x) {
return start_bold + x + end_bold;
}
public string italics(this string x) {
return start_italics + x+ end_italics;
}
}
IMPLEMENTATION
Layout l = new Layout("html", siteidorsomeuniqeidentifier);
l.html = stringtoformat;
output = l.formattedstring;
The code can be better, but it should give you a kick in the right direction :)
EDIT 2: based upon further info.....
If you want to do this:
Layout l = new Layout("html");
string s = String.Format("Output of {0} with number {1}",
l.bold(Data.Type),
l.italic(Data.Number));
and you are looking to change l.bold() and l.italic() based upon the blog engines specific mark up . . .
public class Layout {
private string start_bold = "[B]";
private string end_bold = "[/B]";
private string start_italics = "[I]";
private string end_italics = "[/I]";
public Layout(string formattype, int siteid)
{
//get format type logic here
//if(formattype.ToLower() =="html")
//{ . . . do something . . . }
//call XML Doc for specific site, based upon formattype
start_bold = Value.From.XML["bold_start"];
end_bold = Value.From.XML["bold_end"];
//Sorry, can't write XML document parsing code off the top of my head
}
public string bold(string x) {
return start_bold + x + end_bold;
}
public string italics(string x) {
return start_italics + x + end_italics;
}
}
Layout l = new Layout("html", siteid);
string s = String.Format("Output of {0} with number {1}",
l.bold(ValueToBeBoldAsAstring),
l.italics(ValueToBeItalicAsAstring));
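To make the XML-driven idea concrete, here is a minimal sketch that reads the start and end tags for each style from an XML document using LINQ to XML. The class name MarkupLayout, the XML layout and its attribute names (boldStart, boldEnd, and so on) are all made up for illustration, and the XML is embedded as a string only to keep the sketch self-contained; in practice it would be loaded from a file.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public class MarkupLayout
{
    // Hypothetical style definitions; normally these would live in an XML file.
    private const string StylesXml = @"<styles>
  <style name=""html"" boldStart=""&lt;b&gt;"" boldEnd=""&lt;/b&gt;"" italicStart=""&lt;i&gt;"" italicEnd=""&lt;/i&gt;"" />
  <style name=""phpbb"" boldStart=""[b]"" boldEnd=""[/b]"" italicStart=""[i]"" italicEnd=""[/i]"" />
</styles>";

    private readonly XElement _style;

    public MarkupLayout(string styleName)
    {
        // Pick the <style> element whose name attribute matches the requested style.
        _style = XElement.Parse(StylesXml)
            .Elements("style")
            .FirstOrDefault(e => (string)e.Attribute("name") == styleName);
        if (_style == null)
            throw new ArgumentException("Unknown style: " + styleName);
    }

    public string Bold(string s) { return Wrap("bold", s); }

    public string Italic(string s) { return Wrap("italic", s); }

    // Looks up e.g. boldStart/boldEnd on the chosen style and wraps the text.
    private string Wrap(string tag, string s)
    {
        return (string)_style.Attribute(tag + "Start") + s + (string)_style.Attribute(tag + "End");
    }
}
```

With new MarkupLayout("html"), String.Format("Output of {0} with number {1}", l.Bold("Name"), l.Italic("5")) gives "Output of <b>Name</b> with number <i>5</i>"; switching the constructor argument to "phpbb" gives "Output of [b]Name[/b] with number [i]5[/i]". Adding a new site then only requires another style element in the XML.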
