Regex required for renaming file in C# - c#

I need a regex for renaming files in C#. My file name is 22px-Flag_Of_Sweden.svg.png, and I want to rename it to sweden.png.
Please help me with the regex for that.
I have more than 300 files like the ones below:
22px-Flag_Of_Sweden.svg.png - should become sweden.png
13px-Flag_Of_UnitedStates.svg.png - unitedstates.png
17px-Flag_Of_India.svg.png - india.png
22px-Flag_Of_Ghana.svg.png - ghana.png
These are country flags. I want to extract Countryname.Fileextension, that's all.

var fileNames = new [] {
"22px-Flag_Of_Sweden.svg.png"
,"13px-Flag_Of_UnitedStates.svg.png"
,"17px-Flag_Of_India.svg.png"
,"22px-Flag_Of_Ghana.svg.png"
,"asd.png"
};
var regEx = new Regex(@"^.+Flag_Of_(?<country>.+)\.svg\.png$");
foreach ( var fileName in fileNames )
{
if ( regEx.IsMatch(fileName))
{
var newFileName = regEx.Replace(fileName,"${country}.png").ToLower();
// File.Move(Path.Combine(root, fileName), Path.Combine(root, newFileName));
}
}

I am not exactly sure how this would look in C# (although the regex is what matters, not the language), but in Java it would look like this:
String input = "22px-Flag_Of_Sweden.svg.png";
Pattern p = Pattern.compile(".+_(.+?)\\..+?(\\..+?)$");
Matcher m = p.matcher(input);
System.out.println(m.matches());
System.out.println(m.group(1).toLowerCase() + m.group(2));
The relevant part for you is this:
".+_(.+?)\\..+?(\\..+?)$"
Just concatenate the two groups.
I wish I knew a bit of C# right now :)
Cheers Eugene.

This will return country in the first capture group: ([a-zA-Z]+)\.svg\.png$

I don't know C#, but the regex could be:
^.+_(\p{L}+)\.svg\.png
and the replacement part is: $1.png
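For C# specifically, the rename step could be sketched roughly as below. The names FlagRenamer, ToFlagFileName and RenameAll are made up for illustration, and the sketch assumes every flag file follows the <size>px-Flag_Of_<Country>.svg.png pattern shown in the question.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class FlagRenamer
{
    // Captures the country name between "Flag_Of_" and ".svg.png".
    private static readonly Regex FlagPattern =
        new Regex(@"^.+Flag_Of_(?<country>.+)\.svg\.png$", RegexOptions.IgnoreCase);

    // Returns the new lower-case name, or null when the name does not match.
    public static string ToFlagFileName(string fileName)
    {
        Match match = FlagPattern.Match(fileName);
        return match.Success
            ? match.Groups["country"].Value.ToLowerInvariant() + ".png"
            : null;
    }

    // Hypothetical helper: renames every matching file in a folder.
    public static void RenameAll(string folder)
    {
        foreach (string path in Directory.GetFiles(folder, "*.svg.png"))
        {
            string newName = ToFlagFileName(Path.GetFileName(path));
            if (newName != null)
            {
                File.Move(path, Path.Combine(folder, newName));
            }
        }
    }
}
```

Note that File.Move throws if the destination already exists, so two source files that map to the same country name would need separate handling.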

Related

Search for string w/delimiter character

I created a little console program that searches text files and returns every line that matches a value entered by the user. One issue I ran into: say I want to look up "1234", which represents a location code, but a line also contains a phone number such as "555-1234"; I get that line back too. I am thinking that if I include the delimiter (e.g. ",") with the value (",1234,"), then maybe I can ensure the search is accurate. Am I on the right track, or is there a better way? This is where I am so far:
string[] file = File.ReadAllLines(sPath);
foreach (string s in file)
{
using (StreamWriter sw = File.AppendText(rPath))
{
if (sFound = Regex.IsMatch(s, string.Format(@"\b{0}\b",
Regex.Escape(searchVariable))))
{
sw.WriteLine(s);
}
}
}
I'd say you are on the right track.
I'd suggest changing the regular expression so that it uses a negative lookbehind to match "searchVariable" only when it is not preceded by "-", so "1234" in "555-1234" wouldn't be matched, but ",1234" (for instance) would.
You will only need to use "Regex.Escape()" if you want to include special regular expression characters in your search, which from your question you don't want to do.
You could change the code to something like this (it's late so I haven't tested this!):
var lines= File.ReadAllLines(sPath);
var regex = new Regex(String.Format(@"(?<!-){0}\b", searchVariable)); // verbatim string, so \b is a word boundary and not a backspace
if (lines.Any())
{
using (var streamWriter = File.AppendText(rPath))
{
foreach (var line in lines)
{
if (regex.IsMatch(line))
{
streamWriter.WriteLine(line);
}
}
}
}
A great website for testing these (often tricky!) regular expressions is Regex Hero.
Use Linq to CSV and make your life easier. Just go to NuGet and search for Linq to CSV.
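Coming back to the regex approach, here is a small sketch of the boundary idea: a code counts as a match only when it is not glued to a letter, digit, or hyphen on either side. The helper name ContainsCode is hypothetical, and which neighbouring characters should disqualify a match is an assumption to adjust to your data.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CodeSearch
{
    // True when `code` appears as a standalone token: not preceded or
    // followed by a word character or a hyphen.
    public static bool ContainsCode(string line, string code)
    {
        string pattern = @"(?<![\w-])" + Regex.Escape(code) + @"(?![\w-])";
        return Regex.IsMatch(line, pattern);
    }

    public static void Main()
    {
        Console.WriteLine(ContainsCode("Smith,1234,Stockholm", "1234"));     // True
        Console.WriteLine(ContainsCode("Smith,555-1234,Stockholm", "1234")); // False
    }
}
```

Regex.Escape is kept here even though the codes in the question are plain digits; it is harmless for such input and protects against a search value that happens to contain regex metacharacters.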

In C#, what is the best way to parse out this value from a string?

I have to parse out the system name from a larger string. The system name has a prefix of "ABC" and then a number. Some examples are:
ABC500
ABC1100
ABC1300
The full string that I need to parse the system name out of can look like any of the items below:
ABC1100 - 2ppl
ABC1300
ABC 1300
ABC-1300
Managers Associates Only (ABC1100 - 2ppl)
Before I saw the last one, I had this code, which worked pretty well:
string[] trimmedStrings = jobTitle.Split(new char[] { '-', '–' },StringSplitOptions.RemoveEmptyEntries)
.Select(s => s.Trim())
.ToArray();
return trimmedStrings[0];
But it fails on the last example, where there is a bunch of other text before the ABC.
Can anyone suggest a more elegant and future-proof way of parsing out the system name here?
One way to do this:
string[] strings =
{
"ABC1100 - 2ppl",
"ABC1300",
"ABC 1300",
"ABC-1300",
"Managers Associates Only (ABC1100 - 2ppl)"
};
var reg = new Regex(@"ABC[\s,-]?[0-9]+");
var systemNames = strings.Select(line => reg.Match(line).Value);
systemNames.ToList().ForEach(Console.WriteLine);
prints:
ABC1100
ABC1300
ABC 1300
ABC-1300
ABC1100
You really could leverage a Regex and get better results. This one should do the trick [A-Za-z]{3}\d+, and here is a Rubular to prove it. Then in the code use it like this:
var matches = Regex.Match(someInputString, @"[A-Za-z]{3}\d+");
if (matches.Success) {
var val = matches.Value;
}
You can use a regular expression to parse this. There may be better expressions, but this one works for your case:
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string txt = "ABC500";
string re1 = "((?:[a-z][a-z]+))"; // letters: the system prefix
string re2 = "(\\d+)";            // digits: the system number
Regex r = new Regex(re1 + re2, RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match m = r.Match(txt);
if (m.Success)
{
String word1=m.Groups[1].ToString();
String int1=m.Groups[2].ToString();
Console.Write("("+word1.ToString()+")"+"("+int1.ToString()+")"+"\n");
}
}
}
}
You should definitely use Regex for this. Depending on the exact nature of the system name, something like this could prove to be enough:
Regex systemNameRegex = new Regex(@"ABC[0-9]+");
If the ABC part of the name can change, you can modify the Regex to something like this:
Regex systemNameRegex = new Regex(@"[a-zA-Z]+[0-9]+");
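If the goal is a canonical name such as ABC1300 regardless of whether the input contains a space or a hyphen, capturing the digits separately is one option. This is only a sketch: it assumes the prefix is always the literal ABC, and the names SystemNameParser and Extract are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SystemNameParser
{
    // "ABC", an optional space or hyphen (ASCII hyphen or en dash), then the digits.
    private static readonly Regex NamePattern = new Regex(@"ABC[\s\-–]?(\d+)");

    // Returns the normalized name (e.g. "ABC1300"), or null if none is found.
    public static string Extract(string jobTitle)
    {
        Match match = NamePattern.Match(jobTitle);
        return match.Success ? "ABC" + match.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(Extract("ABC-1300"));                                  // ABC1300
        Console.WriteLine(Extract("Managers Associates Only (ABC1100 - 2ppl)")); // ABC1100
    }
}
```

Because the match is not anchored, any text before or after the system name is ignored, which covers the parenthesized example from the question.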

Extract File extensions using regular expression in C#

I want to write a regular expression that can extract file types from a string.
The string looks like this:
Text Files
(.prn;.txt;.rtf;.csv;.wq1)|.prn;.txt;.rtf;.csv;.wq1|PDF
Files (.pdf)|.pdf|Excel Files
(.xls;.xlsx;.xlsm;.xlsb;.xlam;.xltx;.xltm;.xlw)
The result should be, e.g.:
.prn
What you have is the file dialog filter format.
The extensions already appear twice (first appearance is unreliable) and when you try to handle this with a RegEx directly you'll have to think about
Text.Files (.prn;.txt;.rtf;.csv;.wq1)|.prn;.txt;.rtf;.csv;.wq1|
etc.
It looks safer to follow the known structure:
string filter = "Text Files (.prn;.txt;.rtf;.csv;.wq1)|.prn;.txt;.rtf;.csv;.wq1|PDF Files (.pdf)|.pdf|Excel Files (.xls;.xlsx;.xlsm;.xlsb;.xlam;.xltx;.xltm;.xlw)";
string[] filterParts = filter.Split('|');
// go through the odd sections
for (int i = 1; i < filterParts.Length; i += 2)
{
// approx, you may want some validation here first
string filterPart = filterParts[i];
string[] fileTypes = filterPart.Split(';');
// add to collection
}
This (only) requires that the filter string has the correct syntax.
Regex extensionRegex = new Regex(@"\.\w+");
foreach(Match m in extensionRegex.Matches(text))
{
Console.WriteLine(m.Value);
}
If that string format you have there is fairly fixed, then the following should work:
\.[^.;)]+
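The split-based approach can be fleshed out into a complete helper along these lines. It is a sketch that assumes a well-formed filter string of alternating description|pattern sections; the names FilterParser and GetExtensions are made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class FilterParser
{
    // Collects the distinct extensions from the odd-numbered sections
    // (the pattern halves) of a dialog filter string.
    public static List<string> GetExtensions(string filter)
    {
        var extensions = new List<string>();
        string[] parts = filter.Split('|');
        for (int i = 1; i < parts.Length; i += 2)
        {
            foreach (string ext in parts[i].Split(';'))
            {
                string trimmed = ext.Trim();
                if (trimmed.Length > 0 && !extensions.Contains(trimmed))
                {
                    extensions.Add(trimmed);
                }
            }
        }
        return extensions;
    }

    public static void Main()
    {
        string filter = "Text Files (.prn;.txt)|.prn;.txt|PDF Files (.pdf)|.pdf";
        Console.WriteLine(string.Join(" ", GetExtensions(filter))); // .prn .txt .pdf
    }
}
```

Reading only the pattern sections means dots or parentheses in the human-readable descriptions cannot produce false matches.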

Find/parse server-side <?abc?>-like tags in html document

I guess I need some regex help. I want to find all tags like <?abc?> so that I can replace each one with the result of the code run inside it. I just need help with the regex for the tag/code string, not with parsing the code inside :p.
<b><?abc print 'test' ?></b> would result in <b>test</b>
Edit: Not specifically but in general, matching (<?[chars] (code group) ?>)
This will build up a new copy of the string source, replacing <?abc code?> with the result of process(code)
Regex abcTagRegex = new Regex(@"\<\?abc(?<code>.*?)\?>");
StringBuilder newSource = new StringBuilder();
int curPos = 0;
foreach (Match abcTagMatch in abcTagRegex.Matches(source)) {
string code = abcTagMatch.Groups["code"].Value;
string result = process(code);
newSource.Append(source.Substring(curPos, abcTagMatch.Index - curPos)); // second argument is a length, not an end index
newSource.Append(result);
curPos = abcTagMatch.Index + abcTagMatch.Length;
}
newSource.Append(source.Substring(curPos));
source = newSource.ToString();
N.B. I've not been able to test this code, so some of the functions may be slightly the wrong name, or there may be some off-by-one errors.
var regex = new Regex(@"<\?(\w+) (\w+) (.+?)\?>");
This will take this source
<b><?abc print 'test' ?></b>
and break it up like this:
Value: <?abc print 'test' ?>
SubMatch: abc
SubMatch: print
SubMatch: 'test'
These can then be sent to a method that can handle it differently depending on what the parts are.
If you need more advanced syntax handling you need to go beyond regex I believe.
I designed a template engine using Antlr but thats way more complex ;)
var exp = new Regex(@"<\?abc print '(.+)' \?>");
str = exp.Replace(str, "$1");
Something like this should do the trick. Change the regexes as you see fit.
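Regex.Replace also accepts a MatchEvaluator delegate, which avoids tracking string positions by hand. The sketch below assumes a placeholder Process function standing in for whatever actually runs the embedded code; here it understands only print '...'. The class and method names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagExpander
{
    // Matches <?abc ... ?> and captures the code between the markers.
    private static readonly Regex TagPattern =
        new Regex(@"<\?abc\s+(?<code>.*?)\s*\?>", RegexOptions.Singleline);

    // Placeholder interpreter: handles only  print '...'  and returns the text.
    private static string Process(string code)
    {
        Match print = Regex.Match(code, @"^print\s+'(.*)'$");
        return print.Success ? print.Groups[1].Value : string.Empty;
    }

    // Replaces each <?abc ... ?> tag with the result of Process(code).
    public static string Expand(string source)
    {
        return TagPattern.Replace(source, m => Process(m.Groups["code"].Value));
    }

    public static void Main()
    {
        Console.WriteLine(Expand("<b><?abc print 'test' ?></b>")); // <b>test</b>
    }
}
```

In this sketch, code that Process does not recognize is replaced with an empty string; a real implementation would probably want to report an error instead.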

C# file upload: no groups from reg ex?

This code was working fine until this morning. Can anyone spot my mistake? It's probably something really silly, but it has me stumped!
I use a form to submit a file (field name 'fileUpEx'), and I wrote a class to upload it (like I said, it's been working for ages).
(If I write 'filepath' to the page, it is 'Test copy.pdf'.)
My class returns 'no Groups'!
Very odd. Can anyone please help?
string filepath = fileUpEx.PostedFile.FileName;
string pat = @"\\(?:.+)\\(.+)\.(.+)";
Regex r = new Regex(pat);
Match m = r.Match(filepath);
if (m.Groups[0].Captures.Count != 0)
{
//blaa blaa blaa
}
else
{
return "no Groups";
}
Thanks in advance,
Vauneen
Your regular expression requires that the file path contain a backslash, which it doesn't. You could perhaps make that part optional, for example:
@"(?:\\.+\\)?(.+)\.(.+)"
Alternatively you could use the methods available in System.IO.Path:
string extension = Path.GetExtension(filepath);
string filename = Path.GetFileNameWithoutExtension(filepath);
