Extract File extensions using regular expression in C#


I want to write a regular expression that can extract the file types from a string.
The string looks like this:
Text Files (.prn;.txt;.rtf;.csv;.wq1)|.prn;.txt;.rtf;.csv;.wq1|PDF Files (.pdf)|.pdf|Excel Files (.xls;.xlsx;.xlsm;.xlsb;.xlam;.xltx;.xltm;.xlw)
The result should be, e.g.:
.prn

This is the file dialog Filter format.
The extensions already appear twice (the first appearance, inside the description, is unreliable), and if you try to handle this directly with a regex you will have to think about cases such as
Text.Files (.prn;.txt;.rtf;.csv;.wq1)|.prn;.txt;.rtf;.csv;.wq1|
etc.
It is safer to follow the known structure:
string filter = "Text Files (.prn;.txt;.rtf;.csv;.wq1)|.prn;.txt;.rtf;.csv;.wq1|PDF Files (.pdf)|.pdf|Excel Files (.xls;.xlsx;.xlsm;.xlsb;.xlam;.xltx;.xltm;.xlw)";
string[] filterParts = filter.Split('|');
// the pattern sections are at the odd indices (1, 3, 5, ...)
for (int i = 1; i < filterParts.Length; i += 2)
{
    // approx, you may want some validation here first
    string filterPart = filterParts[i];
    string[] fileTypes = filterPart.Split(';');
    // add to collection
}
This (only) requires that the filter string has the correct syntax.
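A self-contained version of that loop, as a minimal sketch: it assumes the filter string strictly follows the description|pattern|description|pattern layout and collects each extension once. The class and method names (FilterExtensions, FromFilter) are hypothetical. Note that the sample string in the question has no pattern section after "Excel Files (...)", so the Excel extensions are not returned for it.

```csharp
using System.Collections.Generic;

// Hypothetical helper; assumes the "description|pattern|description|pattern..." layout.
public static class FilterExtensions
{
    public static List<string> FromFilter(string filter)
    {
        var result = new List<string>();
        string[] parts = filter.Split('|');

        // Odd indices (1, 3, 5, ...) hold the pattern sections.
        for (int i = 1; i < parts.Length; i += 2)
        {
            foreach (string raw in parts[i].Split(';'))
            {
                string ext = raw.Trim();
                // Skip empty entries and extensions already collected.
                if (ext.Length > 0 && !result.Contains(ext))
                {
                    result.Add(ext);
                }
            }
        }
        return result;
    }
}
```

For the string in the question, FilterExtensions.FromFilter(filter) returns .prn, .txt, .rtf, .csv, .wq1 and .pdf.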

// requires: using System.Text.RegularExpressions;
Regex extensionRegex = new Regex(@"\.\w+");
foreach (Match m in extensionRegex.Matches(text))
{
    Console.WriteLine(m.Value);
}

If the string format is fairly fixed, the following should work (| is excluded too, since it separates the sections):
\.[^.;)|]+
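The same pattern applied in C#, as a sketch; ExtensionFinder is a hypothetical name. The | in the character class matters: without it, the match starting at the second .wq1 would continue into "|PDF Files (". Distinct() drops the extensions that appear twice in a filter string.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExtensionFinder
{
    // A dot followed by anything except another dot, ';', ')' or '|'.
    private static readonly Regex ExtensionPattern = new Regex(@"\.[^.;)|]+");

    public static List<string> Extract(string text)
    {
        return ExtensionPattern.Matches(text)
            .Cast<Match>()      // MatchCollection is non-generic on older frameworks
            .Select(m => m.Value)
            .Distinct()         // each extension appears twice in the filter string
            .ToList();
    }
}
```

Unlike the split-based approach, this also picks up the Excel extensions from the description "(...)", because it does not care which section an extension comes from.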

Related

Filter a string variable from a file and change it

I am currently working on a basic program, but I'm having trouble selecting the string variables. All the variables begin with _ followed by a relevant string of words.

First, you don't need to read the whole file and save it again inside the loop that goes through each line; you can do all your replacements and then save once.
For replacement, you can use regular expression replacement. I'm not sure what kind of transformation you're looking to do on the text, but you can do something like the following (in this example I'm just transforming text to upper case):
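// requires: using System.Collections.Generic; using System.IO; using System.Text.RegularExpressions;
string[] textLines = File.ReadAllLines(@"c:\Users\Darren\Desktop\Hello.txt");
var results = new List<string>();
foreach (var line in textLines)
{
    // lazy .*? stops at the first whitespace after the _ (a greedy .* would run to the last one)
    var result = Regex.Replace(line, @"_(.*?)\s", match =>
    {
        return $"~{match.Groups[1].Value.ToUpper()} ";
    });
    results.Add(result);
}
File.WriteAllLines(@"c:\Users\Darren\Desktop\newHello.txt", results.ToArray());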
Instead of return $"~{match.Groups[1].Value.ToUpper()} ";, put the code that transforms the text the way you want. match.Groups[1].Value contains the text after the _ and before the space.
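To try the replacement without touching the file system, the per-line logic can be pulled into a helper. This is a sketch: VariableRenamer and Transform are hypothetical names, and it assumes a variable name is made of word characters (\w: letters, digits, underscore). Because it does not require trailing whitespace, a variable at the very end of a line is also matched, and the \b keeps it from matching the "_var" inside an identifier such as my_var.

```csharp
using System.Text.RegularExpressions;

public static class VariableRenamer
{
    // "_" at the start of a word, followed by the rest of the identifier.
    private static readonly Regex Variable = new Regex(@"\b_(\w+)");

    public static string Transform(string line)
    {
        // Same example transformation as above: "_name" -> "~NAME".
        return Variable.Replace(line, m => "~" + m.Groups[1].Value.ToUpperInvariant());
    }
}
```

Each line read with File.ReadAllLines can then be passed through VariableRenamer.Transform before the single File.WriteAllLines call.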

How to get the files in numeric order from the specified directory in c#?

I have to retrieve a list of file names from a specific directory in numeric order. The file names are a combination of strings and numeric values, but they end with numeric values.
For example: page_1.png, page_2.png, page_3.png, ..., page_10.png, page_11.png, page_12.png, ...
My C# code is below:
string filePath="D:\\vs-2010projects\\delete_sample\\delete_sample\\myimages\\";
string[] filePaths = Directory.GetFiles(filePath, "*.png");
It returns them in the following order:
page_1.png
page_10.png
page_11.png
page_12.png
page_2.png...
I expect the list to be ordered like this:
page_1.png
page_2.png
page_3.png
[...]
page_10.png
page_11.png
page_12.png

Ian Griffiths has a natural sort for C#. It makes no assumptions about where the numbers appear, and even correctly sorts filenames with multiple numeric components, such as app-1.0.2, app-1.0.11.
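The idea behind a natural sort can be sketched in a few lines. This is not Ian Griffiths' implementation, just a minimal comparer (the name NaturalComparer is hypothetical) that splits each string into text and digit runs and compares the digit runs by numeric value.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical minimal natural-order comparer (a sketch, not a library API).
public sealed class NaturalComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        // Splitting with a capturing group keeps the digit runs:
        // "page_10.png" -> ["page_", "10", ".png"]
        string[] a = Regex.Split(x, @"(\d+)");
        string[] b = Regex.Split(y, @"(\d+)");

        for (int i = 0; i < a.Length && i < b.Length; i++)
        {
            // With one capturing group, odd indices are always digit runs.
            int c = (i % 2 == 1)
                ? CompareDigits(a[i], b[i])
                : string.Compare(a[i], b[i], StringComparison.OrdinalIgnoreCase);
            if (c != 0) return c;
        }
        return a.Length.CompareTo(b.Length);
    }

    // Compares digit strings by value without parsing, so long runs cannot overflow.
    private static int CompareDigits(string a, string b)
    {
        string ta = a.TrimStart('0');
        string tb = b.TrimStart('0');
        if (ta.Length != tb.Length) return ta.Length.CompareTo(tb.Length);
        return string.CompareOrdinal(ta, tb);
    }
}
```

Usage: Directory.GetFiles(filePath, "*.png").OrderBy(p => p, new NaturalComparer()) (with System.Linq). Since all paths share the same directory prefix, only the file-name part decides the order.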

You can try the following code, which sorts your file names by their numeric values. Keep in mind that this logic relies on some conventions, such as the presence of '_'. Feel free to make the code more defensive so that it handles the other cases your application needs.
// requires: using System.IO; using System.Linq;
var vv = new DirectoryInfo(@"C:\Image")
    .GetFileSystemInfos("*.bmp")
    // "page_12.bmp" -> "12.bmp" -> strip the extension -> "12"
    .OrderBy(fs => int.Parse(fs.Name.Split('_')[1]
        .Substring(0, fs.Name.Split('_')[1].Length - fs.Extension.Length)));

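First you can extract the number:
// requires: using System.Text.RegularExpressions;
static int ExtractNumber(string text)
{
    Match match = Regex.Match(text, @"_(\d+)\.png");
    // Regex.Match never returns null; check Success instead
    if (!match.Success)
    {
        return 0;
    }
    int value;
    // parse only the captured digits, not the whole match ("_12.png")
    if (!int.TryParse(match.Groups[1].Value, out value))
    {
        return 0;
    }
    return value;
}
Then you can sort your List<string> using:
list.Sort((x, y) => ExtractNumber(x).CompareTo(ExtractNumber(y)));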

Maybe this?
string[] filePaths = Directory.GetFiles(filePath, "*.png").OrderBy(n => n).ToArray();
EDIT: As Marcelo pointed out, a plain string ordering still puts page_10.png before page_2.png. Instead, you can get all the file names, extract their numerical part with a regex, and then sort the file names by that number.

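This code would do that:
// requires: using System; using System.IO; using System.Linq; using System.Text.RegularExpressions;
var dir = @"C:\Pictures";
var sorted = (from fn in Directory.GetFiles(dir)
              // match on the file name only, so digits in the directory path are ignored
              let m = Regex.Match(Path.GetFileName(fn), @"(?<order>\d+)")
              where m.Success
              let n = int.Parse(m.Groups["order"].Value)
              orderby n
              select fn).ToList();
foreach (var fn in sorted) Console.WriteLine(fn);
It also filters out files that do not have a number in their names.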
You may want to change the regex pattern to match more specific name structures for file names.
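For example, assuming every file is named page_<number>.png (an assumption taken from the question's examples), the pattern can be anchored to the whole file name and applied to Path.GetFileName rather than the full path. PageSorter and SortByPageNumber are hypothetical names; this is a sketch, not a drop-in replacement.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class PageSorter
{
    // Whole-name match: "page_", one or more digits, ".png" (case-insensitive).
    private static readonly Regex PageName =
        new Regex(@"^page_(\d+)\.png$", RegexOptions.IgnoreCase);

    public static List<string> SortByPageNumber(IEnumerable<string> paths)
    {
        return (from path in paths
                let m = PageName.Match(Path.GetFileName(path))
                where m.Success                        // drop files that don't fit the pattern
                orderby long.Parse(m.Groups[1].Value)  // numeric, not alphabetical, order
                select path).ToList();
    }
}
```

Usage: PageSorter.SortByPageNumber(Directory.GetFiles(filePath, "*.png")).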

how to find indexof substring in a text file

I have converted an ASP.NET C# project to .NET Framework 3.5 using VS 2008. The purpose of the app is to parse a text file containing many rows of similar information and then insert the data into a database.
I didn't write the original app, but the developer used Substring() to fetch individual fields because they always begin at the same position.
My question is:
What is the best way to find the index of a substring in a text file without having to count the position manually? Does anyone have a preferred method for finding the position of characters in a text file?

I would use IndexOf() / IndexOfAny() together with Substring(). Alternatively, use regular expressions. If the file has an XML-like structure, an XML parser is a better fit.
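A minimal sketch of the IndexOf()/Substring() approach, assuming a hypothetical line format in which a field follows a known label (for example "Name:") and ends at the next separator character. The label, separator and the FieldReader/GetField names are illustrative only.

```csharp
using System;

public static class FieldReader
{
    // Returns the trimmed text between `label` and the next `terminator`,
    // or null if the label does not occur in the line.
    public static string GetField(string line, string label, char terminator)
    {
        int start = line.IndexOf(label, StringComparison.Ordinal);
        if (start < 0) return null;            // label not found
        start += label.Length;                 // skip past the label itself

        int end = line.IndexOf(terminator, start);
        if (end < 0) end = line.Length;        // field runs to the end of the line

        return line.Substring(start, end - start).Trim();
    }
}
```

Because the start index is computed from the label rather than hard-coded, the code keeps working if earlier fields change width.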

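If the fields are delimited, e.g. with commas, you can use string.Split.
For example, if the data is: string[] text = { "1, apple", "2, orange", "3, lemon" };
// Assumes a Fruit class with public int id and string name fields and a ToString() override.
// requires: using System.Collections.Generic; using System.Text;
private void button1_Click(object sender, EventArgs e)
{
    string[] lines = this.textBoxIn.Lines;
    List<Fruit> fields = new List<Fruit>();
    foreach (string s in lines)
    {
        // "1, apple" -> ["1", " apple"]
        string[] fruitData = s.Split(',');
        if (fruitData.Length < 2) continue; // skip blank or malformed lines
        Fruit f = new Fruit();
        int tmpid;
        Int32.TryParse(fruitData[0].Trim(), out tmpid);
        f.id = tmpid;
        f.name = fruitData[1].Trim();
        fields.Add(f);
    }
    this.textBoxOut.Clear();
    // A TextBox needs "\r\n" line breaks; AppendLine uses Environment.NewLine
    var text = new StringBuilder();
    foreach (Fruit item in fields)
    {
        text.AppendLine(item.ToString());
    }
    this.textBoxOut.Text = text.ToString();
}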

The text file I'm reading does not contain delimiters: sometimes there are spaces between fields and sometimes they run together. In either case, every line is formatted the same way. When I asked the question I was looking at the file in Notepad.
The question was: how do you find a position in a file so that the position (a number) can be used as the startIndex of my Substring call?
Answer: I've found that opening the text file in Notepad++, which displays the column and line number of the cursor position, makes this job easier.
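Once the start position and width of each field are known, the fields can be cut out with Substring(startIndex, length). A sketch, assuming a hypothetical layout: the field names and positions below are made up, so substitute the ones read off in the editor. Editors usually number columns from 1, whereas Substring indexes from 0, so subtract 1 from the column shown.

```csharp
using System;
using System.Collections.Generic;

public static class FixedWidthParser
{
    // Hypothetical layout: (field name, 0-based start index, length).
    private static readonly (string Name, int Start, int Length)[] Layout =
    {
        ("Id",   0, 5),
        ("Name", 5, 10),
        ("Qty", 15, 3),
    };

    public static Dictionary<string, string> ParseLine(string line)
    {
        var fields = new Dictionary<string, string>();
        foreach (var f in Layout)
        {
            // Guard against short lines so Substring never throws.
            if (f.Start >= line.Length) { fields[f.Name] = ""; continue; }
            int length = Math.Min(f.Length, line.Length - f.Start);
            fields[f.Name] = line.Substring(f.Start, length).Trim();
        }
        return fields;
    }
}
```

Keeping the positions in one table means a layout change only touches that table, not every Substring call.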

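You can use IndexOf() and then use the Length property to compute the second Substring parameter:
// IndexOf returns -1 if "." is not found, which would make Substring throw
string substr = str.Substring(str.IndexOf("."), str.Length - str.IndexOf("."));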

Regex required for renaming file in C#

I need a regex for renaming files in C#. My file name is 22px-Flag_Of_Sweden.svg.png and I want to rename it to sweden.png.
So for that I need a regex. Please help me.
I have more than 300 files like the ones below:
22px-Flag_Of_Sweden.svg.png - should become sweden.png
13px-Flag_Of_UnitedStates.svg.png - unitedstates.png
17px-Flag_Of_India.svg.png - india.png
22px-Flag_Of_Ghana.svg.png - ghana.png
These are actually country flags. I want to extract CountryName.FileExtension. That's all.

var fileNames = new [] {
"22px-Flag_Of_Sweden.svg.png"
,"13px-Flag_Of_UnitedStates.svg.png"
,"17px-Flag_Of_India.svg.png"
,"22px-Flag_Of_Ghana.svg.png"
,"asd.png"
};
var regEx = new Regex(@"^.+Flag_Of_(?<country>.+)\.svg\.png$");
foreach ( var fileName in fileNames )
{
if ( regEx.IsMatch(fileName))
{
var newFileName = regEx.Replace(fileName,"${country}.png").ToLower();
// File.Move(Path.Combine(root, fileName), Path.Combine(root, newFileName)); // root: folder containing the files
}
}

I am not exactly sure how this would look in C# (although the regex is what matters, not the language), but in Java it would look like this:
String input = "22px-Flag_Of_Sweden.svg.png";
Pattern p = Pattern.compile(".+_(.+?)\\..+?(\\..+?)$");
Matcher m = p.matcher(input);
System.out.println(m.matches());
System.out.println(m.group(1).toLowerCase() + m.group(2));
The part relevant to you is:
".+_(.+?)\\..+?(\\..+?)$"
Just concatenate the two groups.
I wish I knew a bit of C# right now :)
Cheers Eugene.
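A C# translation of the Java snippet, as a sketch (FlagRenamer and ToCountryFileName are hypothetical names). A verbatim string (@"...") avoids doubling the backslashes, and a leading ^ is added because Java's matches() requires the whole input to match, while Regex.Match in .NET searches anywhere in the string.

```csharp
using System.Text.RegularExpressions;

public static class FlagRenamer
{
    // Group 1: text after the last '_' up to the next '.'; group 2: the final extension.
    private static readonly Regex Pattern = new Regex(@"^.+_(.+?)\..+?(\..+?)$");

    // Returns e.g. "sweden.png", or null if the name does not match.
    public static string ToCountryFileName(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success) return null;
        return m.Groups[1].Value.ToLowerInvariant() + m.Groups[2].Value;
    }
}
```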

This will return the country in the first capture group: ([a-zA-Z]+)\.svg\.png$

I don't know C#, but the regex could be:
^.+_(\p{L}+)\.svg\.png
and the replacement is: $1.png
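In C#, this approach maps onto Regex.Replace. A sketch (FlagRenamerReplace is a hypothetical name): .NET only accepts the braced form \p{L} for the Unicode letter category, a $ anchor is added so only the full name is rewritten, and ToLowerInvariant() turns "Sweden.png" into "sweden.png".

```csharp
using System.Text.RegularExpressions;

public static class FlagRenamerReplace
{
    public static string Rename(string fileName)
    {
        // \p{L}+ = one or more Unicode letters (the country name).
        const string pattern = @"^.+_(\p{L}+)\.svg\.png$";
        if (!Regex.IsMatch(fileName, pattern)) return null;   // leave other files alone
        return Regex.Replace(fileName, pattern, "$1.png").ToLowerInvariant();
    }
}
```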

searching a textfile for a keyword

I have a text file with names such as balamurugan, chendurpandian, ......
If I enter a value such as ba in the textbox and click a submit button, I have to search the text file for the value ba and display the matching names.
I have read the text file using
string FilePath = txtBoxInput.Text;
and displayed it in a textbox using
textBoxContents.Text = File.ReadAllText(FilePath);
But I don't know how to search for a word in a text file using C#. Can anyone give a suggestion?

You can simply use:
textBoxContents.Text.Contains(keyword)
This will return true if your text contains your chosen keyword.

It depends on the kind of pattern matching you need: you can use something as simple as the String.Contains method, or try regular expressions, which give you more control over how you search and return all matches at once. Here are a couple of links to get you started quickly with regular expressions:
http://www.codeproject.com/KB/dotnet/regextutorial.aspx
http://www.developer.com/open/article.php/3330231/Regular-Expressions-Primer.htm
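For the case in the question (comma-separated names, find those that start with what the user typed), a regular-expression sketch might look like this. It assumes the file holds a comma-separated list of names; NameSearch and FindByPrefix are hypothetical names. Regex.Escape keeps characters such as . or * in the user's input from being treated as regex syntax.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NameSearch
{
    // Returns every name that starts with `prefix` (case-insensitive).
    public static List<string> FindByPrefix(string text, string prefix)
    {
        // (?<![^,\s]) : match only at the start of the text or after a comma/whitespace.
        // [^,\s]*     : the rest of the name, up to the next comma or whitespace.
        string pattern = @"(?<![^,\s])" + Regex.Escape(prefix) + @"[^,\s]*";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }
}
```

Usage: NameSearch.FindByPrefix(File.ReadAllText(FilePath), keyword). The lookbehind is what makes "ba" match balamurugan but not a name that merely contains "ba" in the middle.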

First, you should split up the file contents, after which you can call Contains on each value:
// On file read (the names are comma-separated):
string[] values = File.ReadAllText(FilePath).Split(',');
// On search:
List<string> results = new List<string>();
for (int i = 0; i < values.Length; i++)
{
    if (values[i].Contains(search)) results.Add(values[i]);
}
Alternatively, if you only want to match at the beginning or the end of each value, you can use StartsWith or EndsWith, respectively:
// Only match the beginning
values[i].StartsWith(search);
// Only match the end
values[i].EndsWith(search);
