Regular Expression For File Name in C# - c#

I am trying to find a regular expression to parse two sections out of the file name for the .resx files in my project. There is one main file called "UiText.resx" and then many translation .resx files with convention "UiText.ja-JP.resx". I need both the "UiText" and the "ja-JP" out of the latter string, as we do have other resx files that don't have to be for UiText (e.g. I have some files named "ExceptionText.resx").
The pattern I'm using right now (which works, it just requires a little extra coding after) is "(?<=\.)((.*?)(?=\.resx))". For the example above, "UiText.ja-JP.resx" gets me a match set in C# of "UiText.", "ja-JP.", "ja-JP.", ".resx"
Of course I am able to just take the first occurrence of "ja-JP." and "UiText." from this set and massage it to what I want, but I'd rather just have a cleaner "UiText" "ja-JP" and be done with it.
I figure I'll probably have to have at least two different patterns for this, so that is OK. Thank you in advance!

Since UiText seems to be constant, you can use this regex to extract just ja-JP into $1:
^UiText\.(.+?)\.resx$
https://regex101.com/r/XKvwHA/1/

If I'm understanding your needs correctly, then the main reason you need "UiText" is not because you have any value for the term itself, but rather because you need to filter your files. The real term you need to play around with is "ja-JP", which changes for the files you need.
If I'm correct, try this regex:
(?<=^UiText\.).+(?=\.resx$)
Used in C# as follows:
var fileName = "UiText.ja-JP.resx";
var result = new Regex(@"(?<=^UiText\.).+(?=\.resx$)").Match(fileName).Value;
A little explanation:
(?<=^UiText\.) Start of string must begin exactly with "UiText."
.+ Any number of characters (but at least one)
(?=\.resx$) End of string must end with ".resx"
Any file that doesn't meet your criteria will return an empty string for 'result'.
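If you do want both parts (the base name and the culture) from any .resx file rather than only the UiText ones, a single pattern with two named groups may be enough. This is only a sketch: the class name ResxName, the method TryParse and the group names Base and Culture are made up for illustration, and it assumes the base name itself never contains a dot.

```csharp
using System.Text.RegularExpressions;

public static class ResxName
{
    // Base: everything before the first dot.
    // Culture: an optional middle segment such as "ja-JP".
    private static readonly Regex Pattern = new Regex(
        @"^(?<Base>[^.]+)(?:\.(?<Culture>[^.]+))?\.resx$",
        RegexOptions.IgnoreCase);

    // Returns false when the name does not look like "Base.resx" or "Base.culture.resx".
    // culture is an empty string for the main (untranslated) file.
    public static bool TryParse(string fileName, out string baseName, out string culture)
    {
        Match m = Pattern.Match(fileName);
        baseName = m.Success ? m.Groups["Base"].Value : "";
        culture = m.Success ? m.Groups["Culture"].Value : "";
        return m.Success;
    }
}
```

With this, "UiText.ja-JP.resx" yields "UiText" and "ja-JP", while "ExceptionText.resx" yields "ExceptionText" and an empty culture.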

Related

C# Regex filter problems

I posted an earlier question about the same kind of regex problem. It has given me headaches: I have looked up loads of documentation on how to use regex, but I still could not put my finger on it. I wouldn't want to waste another 6 hours trying to filter simple (I think) expressions.
So basically what I want to do is filter all file types ending with HTML extensions (the '*' stars come from a WinForms TabControl and signify that the file has been modified). I also need the match to ignore case:
.html, .htm, .shtml, .shtm, .xhtml
.html*, .htm*, .shtml*, .shtm*, .xhtml*
Also filtering some CSS files:
.css
.css*
And some SQL Files:
.sql, .ddl, .dml
.sql*, .ddl*, .dml*
My previous question got an answer to filtering Python files:
.py, .py3, .pyi, .pyx, .pyw
Expression would be: \.py[3ixw]?\*?$
But when I tried to adapt the expression above, I always ended up matching .xhtml only; the rest were not valid.
For the HTML expression, I currently have this: \.html|.html|.shtml|.shtm|.xhtml\*?$ with RegexOptions.IgnoreCase. But it only accepts .xhtml, case sensitive or insensitive; .html, .htm and the rest do not match. I would really appreciate an explanation of each expression you provide (so I don't have to ask the same question ever again).
Thank you.

For such cases you can start with a naive regex and simplify it step by step into a compact one.
In C#, with IgnoreCase, the setup is basically:
Regex myRegex = new Regex("PATTERN", RegexOptions.IgnoreCase);
Now the pattern. The easiest one simply concatenates all valid results with OR, escaping where needed:
\.html|\.htm|\.shtml|\.shtm|\.xhtml|\.html*|\.htm*|\.shtml*|\.shtm*|\.xhtml*
With .html* you mean .html + anything, which is written as .* (any character, zero or more times) in regex:
\.html|\.htm|\.shtml|\.shtm|\.xhtml|\.html.*|\.htm.*|\.shtml.*|\.shtm.*|\.xhtml.*
Then you can take the repeating parts and group them together. Every extension starts with a dot, and because ending.* also matches the bare ending, the alternatives without .* are redundant:
\.(html|htm|shtml|shtm|xhtml).*
Next, htm appears in every alternative, so I factor it out and collect the possible characters before and after it (? means 0 or 1 occurrence):
\.(s|x)?(htm)l?.*
I always check that the result still works in regexstorm for .NET.
In the same way you can build regular expressions for the other two groups and concatenate them all at the end.
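One caveat: the question describes the trailing * as a literal marker that the tab control appends to modified files. If that reading is right, the star has to be escaped and the pattern anchored at the end, otherwise names such as page.html.bak would also pass. A possible sketch under that assumption (the class and method names are invented for the example; note that it also accepts .xhtm, which was not in the original list):

```csharp
using System.Text.RegularExpressions;

public static class HtmlTabName
{
    // \.      a literal dot
    // [sx]?   an optional leading "s" or "x" (shtml, xhtml)
    // html?   "htm" followed by an optional "l"
    // \*?     an optional literal asterisk (the "modified" marker)
    // $       nothing else may follow
    private static readonly Regex Pattern =
        new Regex(@"\.[sx]?html?\*?$", RegexOptions.IgnoreCase);

    public static bool IsHtml(string tabText) => Pattern.IsMatch(tabText);
}
```

The CSS and SQL groups could follow the same shape, for example \.css\*?$ and \.(sql|ddl|dml)\*?$.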

Copy Every thing after a regex is matched

I have to create a function GetSourceCodeOfClass("ClassName", FilePath). This function will be called more than 10000 times to get source code from C# files, and from every source file I have to extract the source code of a complete class, i.e.
" Class someName { everything in the body including signature } "
This is simple if a file contains a single class, but many source files will contain two or more classes, and the bigger problem is that there may be nested classes inside a single class.
I want the following:
I want to extract the complete source of a given class.
If the file contains more than one class, I want to extract only the source code of the specified class.
If the file contains more than one class and my specified class has nested classes in it, I want to capture my class's source as well as all nested classes.
I have an algorithm in mind:
1-open file
2-match regex (C# classes signature ) - parameterized
@"(public|private|internal|protected|inline)?[\t ]*(static)?[\t ]class[\t ]" + sOurClassName + @"(([\t ][:][\t ]([a-zA-z]+(([ ])[,]([ ])\w+))+))?\s[\n\r\t\s]?{"
3- If Regex is matched in the source file
4- Start copying at that point until the same regex is matched again but without parameters
regex is:
@" (public|private|internal|protected)?[\t ]*(static)?[\t ]class[\t ]\w+(([\t ][:][\t ]([a-zA-z]+(([ ])[,]([ ])\w+))+))?\s[\n\r\t\s]?{"
(This is where I am stuck. I want to copy everything from the first match to the second match, or from the first match to the end of the file.)
Copying nested classes is still an issue and I am still thinking about it; if someone has an idea, help with this is welcome too.
Note: match.Groups[0] or match.Groups[1] only gives me the signature, but I want the complete source of the class; that's why I am doing it this way.
BTW I am using C#.

I agree with Nathan's sentiment that you would be better off using an existing C#-aware parser. Trying to write a regex for the task is a lot of work, and you are unlikely to get it right on the first try. It may work on your first example code, or even the first few, but eventually you'll find some code that's slightly different from what you expected and the regex will fail to catch something important.
That said, if you are comfortable with that limitation and risk, the general technique you are asking about (if I understand correctly…the question isn't entirely clear) is common enough, and worth understanding if you expect to use regex a lot. The key points to understand are that with a Match object, you can call the NextMatch() method to obtain the next match in the text, and that when calling the Regex.Match() method, you can pass the start and length of a substring you want to check, and it will limit its processing to that substring.
You can use the latter point to switch from one regex to another mid-parse.
In your scenario, I understand it to be that you want to run a regex containing the specific class name, to find that particular class in the file, and then to search the text after the initial match for any subsequent class in the file. If the second search finds something, you want to only return the text from the start of the first match to the start of the second match. If the second search finds nothing, you want to return the text from the start of the first match to the end of the whole file.
If that's correct, then something like this should work:
string ExtractClass(string fileContents, Regex classRegex, Regex nonClassRegex)
{
    Match match1 = classRegex.Match(fileContents);
    if (!match1.Success)
    {
        return null;
    }
    Match match2 = nonClassRegex.Match(fileContents, match1.Index + match1.Length);
    if (!match2.Success)
    {
        return fileContents.Substring(match1.Index);
    }
    return fileContents.Substring(match1.Index, match2.Index - match1.Index);
}
I should note that between two class declarations, or between the end of a lone class declaration and the actual end of the file there can easily be other non-white-space text that isn't part of the class declaration. I assume you have a plan for dealing with that.
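For the nested-class case, one alternative to searching for the next class signature is to count braces from the first match until the class body closes. To be clear, this is a different technique from the two-regex approach above, and it is only a rough sketch: it assumes no unbalanced { or } characters appear inside string literals, character literals or comments, which real code can easily violate. The names ClassExtractor and ExtractBalanced are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class ClassExtractor
{
    // Returns the text from the start of the signature match through the brace
    // that closes the class body (nested classes included), or null if the
    // signature is not found or the braces never balance.
    public static string ExtractBalanced(string source, Regex classRegex)
    {
        Match m = classRegex.Match(source);
        if (!m.Success) return null;

        // First opening brace at or after the signature.
        int open = source.IndexOf('{', m.Index);
        if (open < 0) return null;

        int depth = 0;
        for (int i = open; i < source.Length; i++)
        {
            if (source[i] == '{')
            {
                depth++;
            }
            else if (source[i] == '}' && --depth == 0)
            {
                return source.Substring(m.Index, i - m.Index + 1);
            }
        }
        return null; // ran out of text before the body closed
    }
}
```

Because the scan stops at the matching closing brace, any text between this class and the next one is left out automatically.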
If the above doesn't address your need, you should examine your question closely, and edit it both for length and clarity.

Looking for simple yet powerful windows wildcards (`*, ?`) matching implementation

I'm looking for a simple and powerful way to implement Windows-flavoured * and ? wildcard matching in strings.
BeginsWith() and EndsWith() are too simple to cover all cases, while translating wildcard expressions to regexes looks too complex, and I'm not sure about performance.
A happy medium is wanted.
EDIT: I'm trying to parse .gitignore file and match the same files, as Git does. This means:
File should be out of repository's index (so I'm checking file's path against one stored in index)
Number of patterns in .gitignore can be large;
Number of files to check might also be large.

The equivalents of the Windows wildcards ? and * in regex are just . and .*.
[Edit] Given your new edit (stating that you're looking for actual files), I would skip the translation altogether and let .Net do the searching using Directory.GetFiles().
(note that, for some reason, passing a ? into Directory.GetFiles() matches "zero or one characters," whereas in Windows it always matches exactly one character)
To get an exact match including all corner-cases, use
System.IO.Directory.GetFiles(myPath, myPattern)
You may have to create some temp files from your target strings first.
In other words, I think you should keep your patterns dry until it's time to meet the filesystem.
You should go with regex based approach unless your data volume is humungous or you have data-points to say regex will severely impact performance.
If that is the case, any other solution will also likely affect the performance and you will probably need to hand-roll something.
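Converting * and ? to regex is quite easy.
For ? replace the "?" with "." (exactly one character)
and for * replace the "*" with ".*" (zero or more characters).
Escape any other regex metacharacters in the pattern first, and anchor the result with ^ and $. That should get you essentially the same behaviour as wildcard matching on Windows.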
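A minimal sketch of such a translation, assuming * should match zero or more characters and ? exactly one, and that matching should ignore case as Windows file names do. The class and method names are invented for the example; Regex.Escape is used so that dots, parentheses and other metacharacters in the pattern stay literal.

```csharp
using System.Text.RegularExpressions;

public static class Wildcard
{
    // Translates a wildcard pattern into an anchored regex.
    public static Regex ToRegex(string pattern)
    {
        // Regex.Escape turns "*" into "\*" and "?" into "\?",
        // so those escaped forms are what get swapped out.
        string body = Regex.Escape(pattern)
            .Replace(@"\*", ".*")
            .Replace(@"\?", ".");
        return new Regex("^" + body + "$",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    public static bool IsMatch(string input, string pattern) =>
        ToRegex(pattern).IsMatch(input);
}
```

If the same pattern is checked against many paths, it is probably worth building the Regex once per pattern and reusing it instead of calling IsMatch with the raw pattern each time. Note that real .gitignore rules have extra syntax (negation, directory-only patterns, **) that this sketch does not cover.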
EDIT:
boolean PathMatchSpec(input, pattern) will do the job.
Private Declare Auto Function PathMatchSpec Lib "shlwapi" (ByVal pszFileParam As String, ByVal pszSpec As String) As Boolean

Conditional Regex Replace in C# without MatchEvaluator

So, I'm trying to make a program to rename some files. For the most part, I want them to look like this:
[Testing]StupidName - 2[720p].mkv
But I would like to be able to change the format if so desired. If I use MatchEvaluators, you would have to recompile every time. That's why I don't want to use the MatchEvaluator.
The problem I have is that I don't know how, or whether it's possible, to tell Replace that if a group was found, it should include this string. The only syntax for this I have ever seen was something like (?<group>:data), but I can't get this to work. If anyone has an idea, I'm all for it.
EDIT:
Current Capture Regexes =
^(\[(?<FanSub>[^\]\)\}]+)\])?[. _]*(?<SeriesTitle>[\w. ]*?)[. _]*\-[. _]*(?<EpisodeNumber>\d+)[. _]*(\-[. _]*(?<EpisodeName>[\w. ]*?)[. _]*)?([\[\(\{](?<MiscInfo>[^\]\)\}]*)[\]\)\}][. _]*)*[\w. ]*(?<Extension>\.[a-zA-Z]+)$
^(?<SeriesTitle>[\w. ]*?)[. _]*[Ss](?<SeasonNumber>\d+)[Ee](?<EpisodeNumber>\d+).*?(?<Extension>\.[a-zA-Z]+)$
^(?<SeriesTitle>[\w. ]*?)[. _]*(?<SeasonNumber>\d)(?<EpisodeNumber>\d{2}).*?(?<Extension>\.[a-zA-Z]+)$
Current Replace Regex = [${FanSub}]${SeriesTitle} - ${EpisodeNumber} [${MiscInfo}]${Extension}
Using Regex.Replace on the file TestFile 101.mkv, I get []TestFile - 1[].mkv. What I want is for [] to be included only if the group FanSub or MiscInfo was found.
I can solve this with a MatchEvaluator because I actually get to compile a function. But this would not be an easy solution for users of the program. The only other idea I have is to write my own Regex.Replace function that accepts special syntax.

It sounds like you want to be able to specify an arbitrary format dynamically rather than hard-code it into your code.
Perhaps one solution is to break your filename parts into specific groups then pass in a replacement pattern that takes advantage of those group names. This would give you the ability to pass in different replacement patterns which return the desired filename structure using the Regex.Replace method.
Since you didn't explain the categories of your filename I came up with some random groups to demonstrate. Here's a quick example:
string input = "Testing StupidName Number2 720p.mkv";
string pattern = @"^(?<Category>\w+)\s+(?<Name>.+?)\s+Number(?<Number>\d+)\s+(?<Resolution>\d+p)(?<Extension>\.mkv)$";
string[] replacePatterns =
{
    "[${Category}]${Name} - ${Number}[${Resolution}]${Extension}",
    "${Category} - ${Name} - ${Number} - ${Resolution}${Extension}",
    "(${Number}) - [${Resolution}] ${Name} [${Category}]${Extension}"
};
foreach (string replacePattern in replacePatterns)
{
    Console.WriteLine(Regex.Replace(input, pattern, replacePattern));
}
As shown in the sample, named groups in the pattern, specified as (?<Name>pattern), are referred to in the replacement pattern by ${Name}.
With this approach you would need to know the group names beforehand and pass these in to rearrange the pattern as needed.
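As for the empty [] left behind when an optional group such as FanSub or MiscInfo did not participate in the match: .NET substitutes an empty string for such a group, so one workaround that needs no MatchEvaluator is to expand the template first and then strip bracket pairs that ended up empty. This is only a sketch with a simplified capture pattern in the usage note (not the full ones from the question), and ConditionalFormatter and Format are hypothetical names. It assumes an empty bracket pair never legitimately belongs in the output.

```csharp
using System.Text.RegularExpressions;

public static class ConditionalFormatter
{
    // Expands ${Group} references from the user-supplied template,
    // then removes any (), [] or {} pairs that were left empty.
    public static string Format(string input, string pattern, string template)
    {
        Match m = Regex.Match(input, pattern);
        if (!m.Success)
        {
            return input; // leave names that do not match untouched
        }

        string expanded = m.Result(template);
        return Regex.Replace(expanded, @"\[\s*\]|\(\s*\)|\{\s*\}", "");
    }
}
```

For example, with the pattern ^(\[(?<FanSub>[^\]]+)\])?(?<Title>[\w ]+?) - (?<Ep>\d+)(\[(?<Misc>[^\]]*)\])?(?<Ext>\.\w+)$ and the template [${FanSub}]${Title} - ${Ep}[${Misc}]${Ext}, the input "TestFile - 1.mkv" comes back without any brackets, while "[Testing]StupidName - 2[720p].mkv" keeps both.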

Regular expression to define format of backup filenames

In the application I am currently working on, I have an option to create automatic backups of a certain file on the hard disk. What I would like to do is offer the user the possibility to configure the name of the file and its extension.
For example, the backup filename could be something like : "backup_month_year_username.bak". I had the idea to save the format in the form of a regular expression. For the example above, the regexp would look like :
"^backup_(?<Month>\d{2})_(?<Year>\d{2})_(?<Username>\w).(?<extension>bak)$"
I thought about using regex because I will also have to browse through the directory of backed-up files to delete those older than a certain date. The main trouble I have now is how to create a filename using the regex. In a way I should replace the tags with the information. I could do that using Regex.Replace and another regex, but I feel it's a bit weird doing that, and there might be a better way.
Thanks
[Edit] Maybe I wasn't really clear in the first go, but the idea is of course that the user (in this case an admin that will know regex syntax) will have the possibility to modify the form of the filename, that's all the idea behind it[/Edit]

... and if the regex changes, it is next to impossible to reconstruct a string from a given regex.
Edit:
Create some predefined "place-holders": %u could be the user's name, %y could be the year, etc.:
backup_%m_%y_%u.bak
and then simply replace each %? placeholder with its actual value.
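The placeholder idea could look roughly like this in C#. It is a sketch, assuming %m is a two-digit month, %y a two-digit year and %u the user name; the class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;

public static class BackupNameTemplate
{
    // Expands %m, %y and %u in a user-supplied template such as "backup_%m_%y_%u.bak".
    public static string Expand(string template, DateTime when, string userName)
    {
        string month = when.Month.ToString("00", CultureInfo.InvariantCulture);
        string year = (when.Year % 100).ToString("00", CultureInfo.InvariantCulture);

        // The user name goes last so that a "%m" or "%y" inside it is not expanded again.
        return template
            .Replace("%m", month)
            .Replace("%y", year)
            .Replace("%u", userName);
    }
}
```

Calling Expand("backup_%m_%y_%u.bak", new DateTime(2024, 3, 5), "alice") would give "backup_03_24_alice.bak".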
It sounds like you're trying to use the regular expression to create the file name from a pattern which the user should be able to specify.
Regular expressions can - AFAIK - not be used to create output, but only to validate input, so you'd have the user specify two things:
a file name production pattern like Bart suggested
a validation pattern in form of a regular expression that helps you split the file names into their parts
EDIT
By the way, your sample regex contains two errors: an unescaped "." means "any character", and \w matches only a single word character. I guess you meant to write
"^backup_(?<Month>\d{2})_(?<Year>\d{2})_(?<Username>\w+)\.(?<extension>bak)$"
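To illustrate how the corrected pattern splits an existing file name back into its parts (for instance when deciding which backups are old enough to delete), here is a short sketch. The class and method names are hypothetical, and the values are returned as strings to keep the example minimal.

```csharp
using System.Text.RegularExpressions;

public static class BackupNameParser
{
    private static readonly Regex Pattern = new Regex(
        @"^backup_(?<Month>\d{2})_(?<Year>\d{2})_(?<Username>\w+)\.(?<extension>bak)$");

    // Returns false when the file name does not follow the backup naming scheme.
    public static bool TryParse(string fileName, out string month, out string year, out string user)
    {
        Match m = Pattern.Match(fileName);
        month = m.Success ? m.Groups["Month"].Value : "";
        year = m.Success ? m.Groups["Year"].Value : "";
        user = m.Success ? m.Groups["Username"].Value : "";
        return m.Success;
    }
}
```

Since \w also matches the underscore, a name like backup_03_24_john_doe.bak is parsed with john_doe as the user name.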
If the filename is always in this form, there is no reason for a regex, as it's easier to process with string.Split ...
With Bart's solution it is easy enough to split (using string.Split) the generated file name using underscore as the delimiter, to get back the information.
Ok, I think I have found a way to use only the regex. As I am using groups to get the information, I will use another regular expression to match the regular expression and replace the groups with the value:
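// Verbatim strings (@"...") are needed so the backslashes reach the regex engine.
Regex rgx = new Regex(@"\(\?\<Month\>.+?\)");
string fileName = rgx.Replace(@"^backup_(?<Month>\d{2})_(?<Year>\d{2})_(?<Username>\w+)\.(?<extension>bak)$"
    , DateTime.Now.Month.ToString());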
Ok, it's really a hack, but at least it works and I have only one pattern defined by the user. It might not work if the regex is too complex, but I think I can deal with that problem.
What do you think?
