Regex to get the file extension - c#

I have a list which contains file names (without their full path)
List<string> list=new List<string>();
list.Add("File1.doc");
list.Add("File2.pdf");
list.Add("File3.xls");
foreach(var item in list) {
var val=item.Split('.');
var ext=val[1];
}
I don't want to use String.Split. How can I get the file extension with a regex?

You don't need to use regex for that. You can use Path.GetExtension method.
Returns the extension of the specified path string.
string name = "notepad.exe";
string ext = Path.GetExtension(name).Replace(".", ""); // exe

To get the extension using regex:
foreach (var item in list) {
var ext = Regex.Match( item, "[^.]+$" ).Value;
}
Or if you want to make sure there is a dot:
@"(?<=\.)[^.]+$"
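A minimal sketch of how the lookbehind pattern above compares with Path.GetExtension on edge cases such as names with no dot or with several dots. The class and method names (ExtensionSketch, ViaRegex, ViaPath) are hypothetical and exist only for illustration.

```csharp
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of any library.
public static class ExtensionSketch
{
    // Returns the text after the last dot, or "" when there is no such text.
    public static string ViaRegex(string fileName)
    {
        Match m = Regex.Match(fileName, @"(?<=\.)[^.]+$");
        return m.Success ? m.Value : string.Empty;
    }

    // Path.GetExtension keeps the leading dot, so it is trimmed here.
    public static string ViaPath(string fileName)
    {
        return Path.GetExtension(fileName).TrimStart('.');
    }
}
```

For example, ExtensionSketch.ViaRegex("archive.tar.gz") and ExtensionSketch.ViaPath("archive.tar.gz") both yield gz, and both return an empty string for a name without a dot such as README.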

You could use Path.GetExtension().
Example (also removes the dot):
string filename = "MyAwesomeFileName.ext";
string extension = Path.GetExtension(filename).Replace(".", "");
// extension now contains "ext"

The regex is
\.([A-Za-z0-9]+)$
Escaped period, 1 or more alpha-numeric characters, end of string
You could also use LastIndexOf(".")
int delim = fileName.LastIndexOf(".");
string ext = delim >= 0 ? fileName.Substring(delim) : string.Empty; // includes the dot; empty when there is no dot
But using the built-in function is always more convenient.

For the benefit of googlers -
I was dealing with bizarre filenames e.g. FirstPart.SecondPart.xml, with the extension being unknown.
In this case, Path.GetExtension() got confused by the extra dots.
The regex I used was
\.[A-Za-z]{3,4}$
i.e. match only the last instance of a dot followed by 3 or 4 letters. You can test it at Regexr. Not a prize winner, but it did the trick.
The obvious flaw is that if the second part were 3-4 chars and the file had no extension, it would pick that up, however I knew that was not a situation I would encounter.

"\\.[^\\.]+" matches anything that starts with . character followed by 1 or more no . characters.
By the way the others are right, regex is overkill here.

Related

Extract all numbers from string

Let's say I have a string such as 123ad456. I want to make a method that separates the groups of numbers into a list, so then the output will be something like 123,456.
I've tried doing return Regex.Match(str, @"-?\d+").Value;, but that only outputs the first occurrence of a number, so the output would be 123. I also know I can use Regex.Matches, but from my understanding, that would output 123456, not separating the different groups of numbers.
I also see from this page on MSDN that Regex.Match has an overload that takes the string to find a match for and an int as an index at which to search for the match, but I don't see an overload that takes in the above in addition to a parameter for the regex pattern to search for, and the same goes for Regex.Matches.
I guess the approach to use would be to use a for loop of some sort, but I'm not entirely sure what to do. Help would be greatly appreciated.
All you have to do is use Matches instead of Match. Then simply iterate over all matches:
string result = "";
foreach (Match match in Regex.Matches(str, @"-?\d+"))
{
result += match.Value;
}
You may iterate over string data using foreach and use TryParse to check each character.
foreach (var item in stringData)
{
if (int.TryParse(item.ToString(), out int data))
{
// int data is contained in variable data
}
}
Using a combination of string.Join and Regex.Matches:
string result = string.Join(",", Regex.Matches(str, @"-?\d+").Cast<Match>().Select(m => m.Value));
string.Join performs better than continually appending to an existing string.
\d+ is the regex for integer numbers;
//System.Text.RegularExpressions.Regex
resultString = Regex.Match(subjectString, @"\d+").Value;
returns a string with the very first occurrence of a number in subjectString.
Int32.Parse(resultString) will then give you the number.
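To keep each group of digits as a separate number rather than one joined string, the matches can be parsed into a list. This is only a sketch: the names NumberSketch and ExtractAll are hypothetical, and it assumes every match consists of ASCII digits and fits in an int.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of any library.
public static class NumberSketch
{
    // Collects every run of digits (with an optional leading minus) as its own int.
    // Assumes ASCII digits that fit in an int; int.Parse throws otherwise.
    public static List<int> ExtractAll(string input)
    {
        var numbers = new List<int>();
        foreach (Match m in Regex.Matches(input, @"-?\d+"))
        {
            numbers.Add(int.Parse(m.Value, CultureInfo.InvariantCulture));
        }
        return numbers;
    }
}
```

With the question's input, NumberSketch.ExtractAll("123ad456") gives a list holding 123 and 456.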

C# Regex to Get file name without extension?

I want to use regex to get a filename without extension. I'm having trouble getting regex to return a value. I have this:
string path = @"C:\PERSONAL\TEST\TESTFILE.PDF";
var name = Regex.Match(path, @"(.+?)(\.[^\.]+$|$)").Value;
In this case, name always comes back as C:\PERSONAL\TEST\TESTFILE.PDF. What am I doing wrong? I think my search pattern is correct.
(I am aware that I could use Path.GetFileNameWithoutExtension(path); but I specifically want to try using regex.)
You need Groups[1].Value
string path = @"C:\PERSONAL\TEST\TESTFILE.PDF";
var match = Regex.Match(path, @"(.+?)(\.[^\.]+$|$)");
if(match.Success)
{
var name = match.Groups[1].Value;
}
match.Value returns the entire matched substring
match.Groups[0] always has the same value as match.Value
match.Groups[1] returns the value of the first capture group
For example:
string path = @"C:\PERSONAL\TEST\TESTFILE.PDF";
var match = Regex.Match(path, @"(.+?)(\.[^\.]+$|$)");
if(match.Success)
{
Console.WriteLine(match.Value);
// returns the entire matched substring
//Output: C:\PERSONAL\TEST\TESTFILE.PDF
Console.WriteLine(match.Groups[0].Value);
// always the same as match.Value
//Output: C:\PERSONAL\TEST\TESTFILE.PDF
Console.WriteLine(match.Groups[1].Value);
// returns the first capture group, which is (.+?) in this case
//Output: C:\PERSONAL\TEST\TESTFILE
Console.WriteLine(match.Groups[2].Value);
// returns the second capture group, which is (\.[^\.]+$|$) in this case
//Output: .PDF
}
Since the data is on the right side of the string, tell the regex parser to work from the end of the string to the beginning by using the RightToLeft option. This can significantly reduce the processing time and also simplifies the pattern needed.
The pattern below reads from left to right and says, give me everything that is not a \ character (to consume/match up to the slash and not proceed farther) and start consuming up to a period.
Regex.Match(@"C:\PERSONAL\TEST\TESTFILE.PDF",
@"([^\\]+)\.",
RegexOptions.RightToLeft)
.Groups[1].Value
Prints out
TESTFILE
Try this:
.*(?=[.][^OS_FORBIDDEN_CHARACTERS]+$)
For Windows:
OS_FORBIDDEN_CHARACTERS = :\/\\\?"><\|
This is a slight modification of:
Regular expression get filename without extention from full filepath
If you are fine to match forbidden characters then simplest regex would be:
.*(?=[.].*$)
Can be a bit shorter and greedier:
var name = Regex.Replace(@"C:\PERS.ONAL\TEST\TEST.FILE.PDF", @".*\\(.*)\..*", "$1"); // "TEST.FILE"

How to write regular expression to get the substring from the string using regular expression in c#?

I have following string
string s=@"\Users\Public\Roaming\Intel\Wireless\Settings";
I want output string like
string output="Wireless";
The substring I want comes right after "Intel\" and ends at the first "\" that follows it. The text before and after "Intel" may be different.
I have achieved it using string.Substring(), but I want to get it using a regular expression. What regular expression should I write to get that string?
For a regex solution you may use:
(?<=intel\\)([^\\]+?)(?:\\|$)
Notice the i flag.
BTW, Split is a much simpler and faster solution than a regex. Regexes are meant for matching patterns; for a fixed string structure like this one, plain string functions are the wiser choice.
With regex, it will look like
var txt = @"\Users\Public\Roaming\Intel\Wireless\Settings";
var res = Regex.Match(txt, @"Intel\\([^\\]+)", RegexOptions.IgnoreCase).Groups[1].Value;
But usually, you should use string methods with such requirements. Here is a demo code (without error checking):
var strt = txt.IndexOf("Intel\\") + 6; // 6 is the length of "Intel\"
var end = txt.IndexOf("\\", strt + 1); // Look for the next "\"
var res2 = txt.Substring(strt, end - strt); // Get the substring
You could also use this if you want everything AFTER Intel\:
/(?:intel\\)((\w+\\?)+)/gi
http://regexr.com/3blqm
You would need the $1 result. Note that $1 will be empty or nonexistent if the string does not contain Intel\ or anything after it.
Why not use Path.GetDirectoryName and Path.GetFileName for this:
string s = @"\Users\Public\Roaming\Intel\Wireless\Settings";
string output = Path.GetFileName(Path.GetDirectoryName(s));
Debug.Assert(output == "Wireless");
It is possible to iterate over directory components until you find the word Intel and return the next component.
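The component-by-component approach just described might look like the following sketch. The names PathComponentSketch and ComponentAfter are hypothetical, and the sketch assumes that both \ and / count as separators and that the marker is compared case-insensitively.

```csharp
using System;

// Hypothetical helper class, not part of any library.
public static class PathComponentSketch
{
    // Returns the path component that follows `marker`, or null if the
    // marker is missing or is the last component.
    public static string ComponentAfter(string path, string marker)
    {
        string[] parts = path.Split(new[] { '\\', '/' }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < parts.Length - 1; i++)
        {
            if (string.Equals(parts[i], marker, StringComparison.OrdinalIgnoreCase))
            {
                return parts[i + 1];
            }
        }
        return null;
    }
}
```

For the path in the question, PathComponentSketch.ComponentAfter(s, "Intel") returns Wireless. Unlike the Path.GetFileName approach, it does not depend on how many components follow Intel.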

Please tell me what the problem is c# regex.split()

string temp_constraint = row["Constraint_Name"].ToString();
string split_string = "FK_"+tableName+"_";
string[] words = Regex.Split(temp_constraint, split_string);
I am trying to split a string using another string.
temp_constraint = FK_ss_foo_ss_fee
split_string = FK_ss_foo_
but it returns a single dimension array with the same string as in temp_constraint
Please help
Your split operation works fine for me:
string temp_constraint = "FK_ss_foo_ss_fee";
string split_string = "FK_ss_foo_";
string[] words = Regex.Split(temp_constraint, split_string);
foreach (string word in words)
{
Console.WriteLine(">{0}<", word);
}
Output:
><
>ss_fee<
I think the problem is that your variables are not set to what you think they are. You will need to debug to find the error elsewhere in your program.
I would also avoid using Split for this (both Regex and String.Split). You aren't really splitting the input - you are removing a string from the start. Split might not always do what you want. Imagine if you have a foreign key like the following:
FK_ss_foo_ss_fee_FK_ss_foo_ss_bee
You want to get ss_fee_FK_ss_foo_ss_bee but split would give you ss_fee_ and ss_bee. This is a contrived example, but it does demonstrate that what you are doing is not a split.
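Removing the prefix directly, rather than splitting, could be sketched as follows. The names PrefixSketch and RemovePrefix are hypothetical; the sketch assumes an ordinal (case-sensitive) comparison is what you want for constraint names.

```csharp
using System;

// Hypothetical helper class, not part of any library.
public static class PrefixSketch
{
    // Removes `prefix` only when `value` starts with it;
    // otherwise returns `value` unchanged.
    public static string RemovePrefix(string value, string prefix)
    {
        return value.StartsWith(prefix, StringComparison.Ordinal)
            ? value.Substring(prefix.Length)
            : value;
    }
}
```

With the contrived key above, PrefixSketch.RemovePrefix("FK_ss_foo_ss_fee_FK_ss_foo_ss_bee", "FK_ss_foo_") returns ss_fee_FK_ss_foo_ss_bee, because only the leading occurrence is removed.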
You should use String.Split instead
string[] words =
temp_constraint.Split(new []{split_string}, StringSplitOptions.None);
The char[] overload of String.Split splits on each individual character, which is often not what you want; the string[] overload shown above splits on the whole delimiter string.
The following article shows how to split text by an entire word
http://www.bytechaser.com/en/functions/ufgr7wkpwf/split-text-by-words-and-not-character-arrays.aspx

C# string manipulation

I have a string like
A150[ff;1];A160;A100;D10;B10
from which I want to extract A150, D10, B10
In between these valid strings, I can have any characters. The one part that is consistent is the semicolon between each legitimate string.
Again, the junk characters that I am trying to remove can themselves contain a semicolon.
Without having more detail for the specific rules, it looks like you want to use String.Split(';') and then construct a regex to parse out the string you really need foreach string in your newly created collection. Since you said that the semi colon can appear in the "junk" it's irrelevant since it won't match your regex.
var input = "A150[ff+1];A160;A150[ff-1]";
var temp = new List<string>();
foreach (var s in input.Split(';'))
{
temp.Add(Regex.Replace(s, "(A[0-9]*)\\[*.*", "$1"));
}
foreach (var s1 in temp.Distinct())
{
Console.WriteLine(s1);
}
produces the output
A150
A160
First, you can use
string s="A150[ff;1];A160;A100;D10;B1";
s.IndexOf("A160");
This gives you the index of A160; the same works for the other words.
Then call s.Remove(index, count), which returns a new string with that range removed.
If you only want to remove the 'junk' between the '[' and ']' characters, you can use a regex for that:
Regex regex = new Regex(@"\[([^\]]+)\]");
string result = regex.Replace("A150[ff;1];A160;A100;D10;B10", "");
Then use String.Split to get the individual items.
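Both steps together might look like this sketch. The names BracketSketch and StripAndSplit are hypothetical, and the sketch assumes the junk is always enclosed in a non-nested [...] pair.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of any library.
public static class BracketSketch
{
    // Deletes every non-nested "[...]" group, then splits what is left on ';'.
    public static string[] StripAndSplit(string input)
    {
        string cleaned = Regex.Replace(input, @"\[[^\]]*\]", "");
        return cleaned.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For "A150[ff;1];A160;A100;D10;B10" this yields the five items A150, A160, A100, D10 and B10. The semicolon inside the brackets is removed before the split, so it does not produce a stray item.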
