Rename file using regular expression - c#

I want to rename all the files in a particular folder so that any digits in a file name are removed.
For example, a file named
someFileName.someExtension stays the same, but a file named
03 - Rocketman Elton John should be renamed to Rocketman Elton John (I have already written the part that removes the -). Likewise, 15-Trey Songz - Unfortunate (Prod. by Noah 40 Shebib) should become Trey Songz Unfortunate (Prod. by Noah Shebib) (again, I can remove the - myself). The user selects the folder like this:
private void txtFolder_MouseDown(object sender, MouseEventArgs e)
{
FolderBrowserDialog fd = new FolderBrowserDialog();
fd.RootFolder = Environment.SpecialFolder.Desktop;
fd.ShowNewFolderButton = true;
if (fd.ShowDialog() == DialogResult.OK)
{
txtFolder.Text = fd.SelectedPath;
}
}
The renaming starts like this:
private void btnGo_Click(object sender, EventArgs e)
{
StartRenaming(txtFolder.Text);
}
and
private void StartRenaming(string FolderName)
{
string[] files = Directory.GetFiles(FolderName);
foreach (string file in files)
RenameFile(file);
}
Now, in RenameFile, I need the regular expression that will remove any number(s) from the file name.
It is implemented as
private void RenameFile(string FileName)
{
string fileName = Path.GetFileNameWithoutExtension(FileName);
/* here goes the code that finds numbers in the file name using a regular expression and removes them */
}
so what I can do is, I can use something like
1 var matches = Regex.Matches(fileName, @"\d+");
2
3 if (matches.Count == 0)
4 return null;
5
6 // start the loop
7 foreach(var match in matches)
8 {
9 fileName = fileName.Replace(match, ""); /* or fileName.Replace(match.ToString(), ""), whatever be the case */
10 }
11 File.Move(FileName, Path.Combine(Path.GetDirectoryName(FileName), fileName));
12 return;
But I don't think that's the right way to do it. Is there a better option, or is this the best (and only) way? Also, is there anything like IN for String.Replace? In SQL I can use IN in a select command to specify a bunch of where conditions; is there something similar for String.Replace so that I don't have to run the loop from line 7 to 10?
PS: About that regex: I posted a question, Regular Expression for numbers? (apparently I wasn't clear enough), and that is where I got my regex. If you think some other regex would do better, please tell me, and if you need any other information, please let me know.

You can try Regex.Replace to remove digits, i.e.:
fileName = Regex.Replace(fileName, @"\d", "");
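As a rough sketch of how Regex.Replace could slot into RenameFile: the DigitStripper class, the CleanName helper, the hyphen removal and the whitespace collapsing below are my assumptions based on the examples in the question, not code taken from it. The extension is split off first so that something like .mp3 keeps its digit.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class DigitStripper
{
    // Hypothetical helper: returns the file name with digits and hyphens
    // removed from the stem, leaving the extension untouched.
    public static string CleanName(string fileName)
    {
        string stem = Path.GetFileNameWithoutExtension(fileName);
        string extension = Path.GetExtension(fileName);

        stem = Regex.Replace(stem, @"[\d-]", "");            // drop digits and hyphens in one pass
        stem = Regex.Replace(stem, @"\s{2,}", " ").Trim();   // collapse the gaps they leave behind

        // If nothing is left (e.g. "01.mp3"), keep the original name.
        return stem.Length == 0 ? fileName : stem + extension;
    }

    public static void RenameFile(string path)
    {
        string directory = Path.GetDirectoryName(path) ?? "";
        string target = Path.Combine(directory, CleanName(Path.GetFileName(path)));

        // File.Exists is also true when the name did not change, so such files are skipped.
        if (!File.Exists(target))
            File.Move(path, target);
    }
}
```

With this, StartRenaming could call DigitStripper.RenameFile(file) for each entry returned by Directory.GetFiles. Note that a file without a real extension but with a dot in its name would have everything after the last dot treated as the extension.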

On the off chance that you simply want to rename the files and thought writing your own program was the best way, I would recommend PFrank as a standalone tool (especially since you already understand regex).
If you take this suggestion (its interface is not the simplest or clearest), use \d+(\s?-)? as the match expression (the first column in PFrank). It matches one or more digits, optionally followed by a hyphen with an optional whitespace character between the two. Leave the replacement expression empty (a zero-length string, i.e. an empty second column in PFrank). Finally, select the folder containing the files you want renamed and click the scan button; in the dialog that pops up, confirm the results and click the rename button. Sorry if I wasted anyone's time!

For replacing, you should look into Regex.Replace, which can replace all occurrences at once.
Otherwise the code looks OK (with the exception of the strange fileName.Replace("match", ""), which uses a constant string...)

How about this ?
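private void StartRenaming(string FolderName)
{
    string[] files = Directory.GetFiles(FolderName);
    // Test only the file name, so digits in folder names are ignored.
    string[] applicableFiles = (from string s in files
                                where Regex.IsMatch(Path.GetFileName(s), @"(\d+)|(-+)", RegexOptions.None)
                                select s).ToArray<string>();
    foreach (string file in applicableFiles)
        RenameFile(file);
}
private void RenameFile(string file)
{
    // Strip digits and - from the name only; the directory and extension stay untouched.
    string name = Regex.Replace(Path.GetFileNameWithoutExtension(file), @"(\d+)|(-+)", "");
    string newFileName = name + Path.GetExtension(file);
    File.Move(file, Path.Combine(Path.GetDirectoryName(file), newFileName));
}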
StartRenaming now limits the files to be processed with a regex match: only files that contain a digit or a - are renamed, which skips needless work on the rest.
RenameFile replaces digits and - in a string and gives you newFileName.
I am not quite sure about the correctness of File.Move(file, Path.Combine(Path.GetDirectoryName(file), newFileName)); though, but I guess your problem was avoiding the foreach loop, and I think this does that.
Please note that I was not able to test this completely, so let me know whether it works for you; if it doesn't, I will be happy to help further.
EDIT: Forgot to mention that file.Replace(@"(\d+)|(-+)", "") will remove digits as well as - from the file string.
EDIT: Corrected file.Replace to Regex.Replace

I prefer to use brackets to select the before and after and then use the $n method to rebuild the string how you want it to be.
"03 - Rocketman Elton John" -Replace '^([^-]*) - ([^-]*)', '$1 $2'

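The one-liner in the last answer uses PowerShell's -replace operator. The same capture-group idea in C# might look like the sketch below. The TrackNames class and both method names are hypothetical, and the second pattern is my own variation that discards the leading number instead of keeping it.

```csharp
using System.Text.RegularExpressions;

public static class TrackNames
{
    // Same pattern as the PowerShell line: keep the text before and after
    // " - " as $1 and $2, and rebuild the string with a single space.
    public static string JoinAroundHyphen(string name)
    {
        return Regex.Replace(name, @"^([^-]*) - ([^-]*)", "$1 $2");
    }

    // Variation (an assumption, not from the answer): match a leading number
    // and hyphen, and keep only what follows them.
    public static string DropTrackNumber(string name)
    {
        return Regex.Replace(name, @"^\d+\s*-\s*(.*)$", "$1");
    }
}
```

JoinAroundHyphen only removes the hyphen, so the track number stays in the name; DropTrackNumber removes both, but only at the start of the name.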
Related

Find and replace file lines

I have a text file with over 12,000 lines. In that file I need to replace certain lines.
Some lines begin with a ;, some have random words, some start with space. However, I am only concerned with the two types of lines I describe below.
I have a line like
SET avariable:0 ;Comments
and I need to replace it to look like
set aDIFFvariable:0 :Integer // comments
The only place where case matters is the word Integer: the I needs to be capitalized.
I also have
String aSTRING(7) ;Comment
that needs to look like
STRING aSTRING(7) :array [0..7] of AnsiChar; // Comments
I need to keep all the spacing the same.
Here is what I have so far
static void Main(string[] args)
{
string text = File.ReadAllText("C:\\old.txt");
text = text.Replace("old text", "new text");
File.WriteAllText("C:\\new.txt", text);
}
I think I need to use REGEX, which I have tried to make for my first example:
\s\s[set]\s*{4}.*[:0]\s*[;].* <-- I now know this is invalid - please advise
I need help with properly setting up my program to find and replace those lines. Should I read one line at a time and if it matches then do something? I am confused really as to where to start.
BRIEF pseudo code of what I want to do
//open file
//step through file
//if line == [regex] then add/replace as needed
//else, go to next line
//if EOF, close file
Taking a stab at this separately because each line is so radically different that capturing both in the same expression will be a nightmare.
To match your first example and replace it:
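string input = "SET avariable:0 ;Comments";
// IgnoreCase is needed because the pattern spells "set" in lower case.
string pattern = @"\s?(set)\s*(\w+):?(\d)\s+;?(.*)?";
if (Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase))
{
    input = Regex.Replace(input, pattern, "$1 $2:$3 :Integer // $4", RegexOptions.IgnoreCase);
}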
Give that a shot (Play with it here: http://regex101.com/r/zY7hV2)
To match your second example and replace it:
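string input = "String aSTRING(7) ;Comments";
// IgnoreCase is needed because the pattern spells "string" in lower case.
string pattern = @"\s?(string)\s*(\w+)\((\d)\)\s*;(.*)";
if (Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase))
{
    input = Regex.Replace(input, pattern, "$1 $2($3) :array [0..$3] of AnsiChar; // $4", RegexOptions.IgnoreCase);
}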
And play around with this one here: http://regex101.com/r/jO5wP5
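To tie this back to the pseudocode in the question (step through the file, rewrite matching lines, pass everything else through), here is a minimal sketch. The LineConverter class, its method names and the anchored patterns are my own assumptions, adapted from the expressions above. They capture the whitespace runs so the original spacing is put back unchanged, and they accept more than one digit.

```csharp
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineConverter
{
    // Groups: 1 indent, 2 keyword, 3 gap, 4 name, 5 number, 6 gap before ';', 7 comment.
    private static readonly Regex SetLine = new Regex(
        @"^(\s*)(set)(\s+)(\w+):(\d+)(\s*);(.*)$", RegexOptions.IgnoreCase);
    private static readonly Regex StringLine = new Regex(
        @"^(\s*)(string)(\s+)(\w+)\((\d+)\)(\s*);(.*)$", RegexOptions.IgnoreCase);

    // Rewrites a line if it has one of the two shapes; returns any other line unchanged.
    public static string ConvertLine(string line)
    {
        if (SetLine.IsMatch(line))
            return SetLine.Replace(line, "$1$2$3$4:$5$6:Integer // $7");
        if (StringLine.IsMatch(line))
            return StringLine.Replace(line, "$1$2$3$4($5)$6:array [0..$5] of AnsiChar; // $7");
        return line;
    }

    // Reads the input lazily, converts each line, and writes the result to a new file.
    public static void ConvertFile(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath, File.ReadLines(inputPath).Select(line => ConvertLine(line)));
    }
}
```

Calling LineConverter.ConvertFile("C:\\old.txt", "C:\\new.txt") would mirror the Main method from the question. Renaming the variable (avariable to aDIFFvariable) is not covered, because the question does not say how the new name is derived.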

Search for string w/delimiter character

I created a little console program that searches text files and returns every line that matches a value entered by the user. One issue I ran into: say I want to look up "1234", which represents a location code, but a line also contains the phone number "555-1234"; I get that line back too. I am thinking that if I include the delimiter (e.g. ",") with the value (",1234,"), I can make the search accurate. Am I on the right track, or is there a better way? This is where I am so far:
string[] file = File.ReadAllLines(sPath);
foreach (string s in file)
{
using (StreamWriter sw = File.AppendText(rPath))
{
if (sFound = Regex.IsMatch(s, string.Format(@"\b{0}\b",
Regex.Escape(searchVariable))))
{
sw.WriteLine(s);
}
}
}
I'd say you are on the right track.
I'd suggest changing the regular expression so that it uses a negative lookbehind to match "searchVariable" only when it is not preceded by "-", so "1234" in "555-1234" wouldn't be matched, but ",1234" (for instance) would.
You will only need to use "Regex.Escape()" if you want to include special regular expression characters in your search, which from your question you don't want to do.
You could change the code to something like this (it's late so I haven't tested this!):
var lines = File.ReadAllLines(sPath);
var regex = new Regex(String.Format(@"(?<!-){0}\b", searchVariable));
if (lines.Any())
{
using (var streamWriter = File.AppendText(rPath))
{
foreach (var line in lines)
{
if (regex.IsMatch(line))
{
streamWriter.WriteLine(line);
}
}
}
}
A great website for testing these (often tricky!) regular expressions is Regex Hero.
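A slightly stricter variant of the lookbehind idea, as a sketch only: the CodeSearch class and its method names are hypothetical, and the choice to reject a neighbouring digit or hyphen on both sides is my assumption about what counts as a separate code. Regex.Escape is kept so the search value cannot change the pattern, although plain digits do not need it.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CodeSearch
{
    // Matches the code only when it is not glued to a digit or hyphen on either side.
    public static Regex BuildPattern(string code)
    {
        return new Regex(@"(?<![\d-])" + Regex.Escape(code) + @"(?![\d-])");
    }

    // Returns the lines that contain the code as a stand-alone value.
    public static List<string> FindLines(IEnumerable<string> lines, string code)
    {
        Regex pattern = BuildPattern(code);
        return lines.Where(line => pattern.IsMatch(line)).ToList();
    }
}
```

The result of FindLines(File.ReadAllLines(sPath), searchVariable) could then be written out with a single File.AppendAllLines call.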
Use Linq to CSV and make your life easier. Just go to Nuget and search Linq to CSV.

how to find indexof substring in a text file

I have converted an asp.net c# project to framework 3.5 using VS 2008. Purpose of app is to parse a text file containing many rows of like information then inserting the data into a database.
I didn't write original app but developer used substring() to fetch individual fields because they always begin at the same position.
My question is:
What is best way to find the index of substring in text file without having to manually count the position? Does someone have preferred method they use to find position of characters in a text file?
I would say IndexOf() / IndexOfAny() together with Substring(). Alternatively, regular expressions. If the file has an XML-like structure, this.
If the files are delimited eg with commas you can use string.Split
If data is: string[] text = { "1, apple", "2, orange", "3, lemon" };
private void button1_Click(object sender, EventArgs e)
{
string[] lines = this.textBoxIn.Lines;
List<Fruit> fields = new List<Fruit>();
foreach(string s in lines)
{
char[] delim = {','};
string[] fruitData = s.Split(delim);
Fruit f = new Fruit();
int tmpid = 0;
Int32.TryParse(fruitData[0], out tmpid);
f.id = tmpid;
f.name = fruitData[1];
fields.Add(f);
}
this.textBoxOut.Clear();
string text=string.Empty;
foreach(Fruit item in fields)
{
text += item.ToString() + " \n";
}
this.textBoxOut.Text = text;
}
}
The text file I'm reading does not contain delimiters: sometimes there are spaces between fields and sometimes they run together. In either case, every line is formatted the same. When I asked the question I was looking at the file in Notepad.
The question was: how do you find a position in a file so that the position (a number) can be specified as the startIndex of my Substring call?
Answer: I've found that opening the text file in Notepad++ displays the column number and line number of the cursor position, which makes this job easier.
You can use IndexOf() and then use the Length property to compute the second Substring parameter:
substr = str.Substring(str.IndexOf("."), str.Length - str.IndexOf("."));
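Once the column positions are known, a small helper keeps the Substring calls from throwing on short lines. This is only a sketch: the FixedWidth class, the Field method and the sample record layout are invented for illustration, not taken from the original application.

```csharp
using System;

public static class FixedWidth
{
    // Returns the field that starts at a zero-based offset and spans up to
    // "length" characters, trimmed; short lines yield a shorter or empty field.
    public static string Field(string line, int start, int length)
    {
        if (start >= line.Length)
            return string.Empty;

        int available = Math.Min(length, line.Length - start);
        return line.Substring(start, available).Trim();
    }
}
```

For a made-up record such as "0001JOHN SMITH 19800101", Field(line, 4, 11) would return the name and Field(line, 15, 8) the date. Keep in mind that Notepad++ shows one-based column numbers while Substring offsets are zero-based.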

Trim all chars off file name after first "_"

I'd like to trim these purchase order file names (a few examples below) so that everything after the first "_" is omitted.
INCOLOR_fc06_NEW.pdf
Keep: INCOLOR (write this to db as the VendorID) Remove: _fc06_NEW.pdf
NORTHSTAR_sc09.xls
Keep: NORTHSTAR (write this to db as the VendorID) Remove: _sc09.xls
Our scenario: the managers upload these files to our intranet web server to make them available to download, view, etc. I'm using Brettle's NeatUpload, and for each uploaded file I write the file's attributes into the PO table (SQL 2000). The first part of the file name will be written to the DB as a VendorID.
The naming convention for these files is consistent: the first part of the file name is always the vendor name (or Vendor ID), followed by an "_", then other unpredictable characters used to identify the type of purchase order, then the file extension, which is always .xls, .XLS, .PDF, or .pdf.
I tried TrimEnd - but the array of chars that you have to provide ends up being long and can conflict with the part of the file name I want to keep. I have a feeling I'm not using TrimEnd properly.
What is the best way to use string.TrimEnd (or any other string manipulation in C#) that will strip off all chars after the first "_" ?
String s = "INCOLOR_fc06_NEW.pdf";
int index = s.IndexOf("_");
return index >= 0 ? s.Substring(0,index) : s;
I'll probably offend the anti-regex lobby, but here I go (ducking):
string stripped = Regex.Replace(filename, @"(?<=[^_]*)_.*", String.Empty);
This code will strip all extra characters after the first '_', unless there is no '_' in the string (then it will just return the original string).
It's one line of code. It's slower than the more elaborate IndexOf() algorithm, but when used in a non-performance-sensitive part of the code, it's a good solution.
Get your flame-throwers out...
TrimEnd only removes trailing characters from the set you pass in (or trailing white space when called without arguments), so it won't help you here. Read more about TrimEnd here:
http://msdn.microsoft.com/en-us/library/system.string.trimend.aspx
Bnaffas code (with a small tweak):
String fileName = "INCOLOR_fc06_NEW.pdf";
int index = fileName.IndexOf("_");
return index >= 0 ? fileName.Substring(0, index) : fileName;
If you want to do something with the other parts, you could use a Split
string fileName = "INCOLOR_fc06_NEW.pdf";
string[] parts = fileName.Split('_');
public string StripOffStuff(string sInput)
{
int iIndex = sInput.IndexOf("_");
return (iIndex > 0) ? sInput.Substring(0, iIndex) : sInput;
}
// Call it like:
string sNewString = StripOffStuff("INCOLOR_fc06_NEW.pdf");
I would go with the SubString approach but to round out the available solutions here's a LINQ approach just for fun:
string filename = "INCOLOR_fc06_NEW.pdf";
string result = new string(filename.TakeWhile(c => c != '_').ToArray());
It'll return the original string if no underscore is found.
To go with all the "alternative" solutions, here's the second one that I thought of (after substring):
string filename = "INCOLOR_fc06_NEW.pdf";
string stripped = filename.Split('_')[0];

Highlight a list of words using a regular expression in c#

I have some site content that contains abbreviations. I have a list of recognised abbreviations for the site, along with their explanations. I want to create a regular expression which will allow me to replace all of the recognised abbreviations found in the content with some markup.
For example:
content: This is just a little test of the memb to see if it gets picked up.
Deb of course should also be caught here.
abbreviations: memb = Member; deb = Debut;
result: This is just a little test of the [a title="Member"]memb[/a] to see if it gets picked up.
[a title="Debut"]Deb[/a] of course should also be caught here.
(This is just example markup for simplicity).
Thanks.
EDIT:
CraigD's answer is nearly there, but there are issues. I only want to match whole words. I also want to keep the correct capitalisation of each word replaced, so that deb is still deb, and Deb is still Deb as per the original text. For example, this input:
This is just a little test of the memb.
And another memb, but not amemba.
Deb of course should also be caught here.deb!
First you would need to Regex.Escape() all the input strings.
Then you can look for them in the string, and iteratively replace them by the markup you have in mind:
string abbr = "memb";
string word = "Member";
string pattern = String.Format(@"\b{0}\b", Regex.Escape(abbr));
string substitute = String.Format("[a title=\"{0}\"]{1}[/a]", word, abbr);
string output = Regex.Replace(input, pattern, substitute);
EDIT: I asked if a simple String.Replace() wouldn't be enough - but I can see why regex is desirable: you can use it to enforce "whole word" replacements only by making a pattern that uses word boundary anchors.
You can go as far as building a single pattern from all your escaped input strings, like this:
\b(?:{abbr_1}|{abbr_2}|{abbr_3}|{abbr_n})\b
and then using a match evaluator to find the right replacement. This way you can avoid iterating the input string more than once.
Not sure how well this will scale to a big word list, but I think it should give the output you want (although in your question the 'result' seems identical to 'content')?
Anyway, let me know if this is what you're after
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
var input = @"This is just a little test of the memb to see if it gets picked up.
Deb of course should also be caught here.";
var dictionary = new Dictionary<string,string>
{
{"memb", "Member"}
,{"deb","Debut"}
};
var regex = "(" + String.Join(")|(", dictionary.Keys.ToArray()) + ")";
foreach (Match metamatch in Regex.Matches(input
, regex /*@"(memb)|(deb)"*/
, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture))
{
input = input.Replace(metamatch.Value, dictionary[metamatch.Value.ToLower()]);
}
Console.Write (input);
Console.ReadLine();
}
}
}
For anyone interested, here is my final solution. It is for a .NET user control. It uses a single pattern with a match evaluator, as suggested by Tomalak, so there is no foreach loop. It's an elegant solution, and it gives me the correct output for the sample input while preserving correct casing for matched strings.
public partial class Abbreviations : System.Web.UI.UserControl
{
private Dictionary<String, String> dictionary = DataHelper.GetAbbreviations();
protected void Page_Load(object sender, EventArgs e)
{
string input = "This is just a little test of the memb. And another memb, but not amemba to see if it gets picked up. Deb of course should also be caught here.deb!";
var regex = "\\b(?:" + String.Join("|", dictionary.Keys.ToArray()) + ")\\b";
MatchEvaluator myEvaluator = new MatchEvaluator(GetExplanationMarkup);
input = Regex.Replace(input, regex, myEvaluator, RegexOptions.IgnoreCase);
litContent.Text = input;
}
private string GetExplanationMarkup(Match m)
{
return string.Format("<b title='{0}'>{1}</b>", dictionary[m.Value.ToLower()], m.Value);
}
}
The output looks like this (below). Note that it only matches full words, and that the casing is preserved from the original string:
This is just a little test of the <b title='Member'>memb</b>. And another <b title='Member'>memb</b>, but not amemba to see if it gets picked up. <b title='Debut'>Deb</b> of course should also be caught here.<b title='Debut'>deb</b>!
I doubt it will perform better than a plain string.Replace, so if performance is critical, measure it (after refactoring a bit to use a compiled regex). You can do the regex version as:
var abbrsWithPipes = "(abbr1|abbr2)";
var regex = new Regex(abbrsWithPipes);
return regex.Replace(html, m => GetReplaceForAbbr(m.Value));
You need to implement GetReplaceForAbbr, which receives the specific abbr being matched.
I'm doing almost exactly what you're looking for in my application, and this works for me.
The parameter str is your content:
public static string GetGlossaryString(string str)
{
List<string> glossaryWords = GetGlossaryItems();//this collection would contain your abbreviations; you could just make it a Dictionary so you can have the abbreviation-full term pairs and use them in the loop below
str = string.Format(" {0} ", str);//quick and dirty way to also search the first and last word in the content.
foreach (string word in glossaryWords)
str = Regex.Replace(str, "([\\W])(" + word + ")([\\W])", "$1<span class='glossaryItem'>$2</span>$3", RegexOptions.IgnoreCase);
return str.Trim();
}
