Match Characters after last dot in string - c#

I have a string and I want to get the words after the last dot in the string.
Example:
input string = "XimEngine.DynamicGui.PickKind.DropDown";
Result:
DropDown

There's no need for Regex here; find the last . and take the Substring after it:
string result = input.Substring(input.LastIndexOf('.') + 1);
If input doesn't contain a ., LastIndexOf returns -1, so the start index is 0 and the entire input is returned.
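As a minimal sketch of the edge cases for the Substring approach (the class and method names are illustrative and not part of the answer):

```csharp
using System;

static class LastSegment
{
    // Hypothetical helper: returns the text after the last '.' in input.
    public static string AfterLastDot(string input)
    {
        // LastIndexOf returns -1 when there is no '.', so the start index becomes 0
        // and the whole input comes back unchanged.
        return input.Substring(input.LastIndexOf('.') + 1);
    }
}
```

Calling LastSegment.AfterLastDot("DropDown") returns the input unchanged, and an input that ends with a dot, such as "a.", yields an empty string.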

Not a RegEx answer, but you could do:
var result = input.Split('.').Last();

In Regex you can tell the parser to work from the end of the string/buffer by specifying the option RightToLeft.
With that option we can write a forward pattern that finds a period (\.) and then captures the text we are interested in into group 1 ((\w+)).
var str = "XimEngine.DynamicGui.PickKind.DropDown";
Console.WriteLine(Regex.Match(str,
@"\.(\w+)",
RegexOptions.RightToLeft).Groups[1].Value);
Outputs to console:
DropDown
Working from the end of the string means we don't have to deal with anything that comes before the text we need to extract.

Related

Extract value from a string in C# from a specific position

I have bunch of files in a folder and I am looping through them.
How do I extract the value from the below example? I need the value 0519 only.
DOC 75-20-0519-1.PDF
The code below returns the whole part, including the -1.
Convert.ToInt32(Path.GetFileNameWithoutExtension(objFile).Split('-')[2]);
Appreciate any help.
You can try regular expressions in order to match the value.
pattern:
[0-9]+ - one or more digits
(?=[^0-9][0-9]+$) - followed by not a digit and one or more digits and end of string
code:
using System.Text.RegularExpressions;
...
string file = "DOC 75-20-0519-1.PDF";
// "0519"
string result = Regex
.Match(Path.GetFileNameWithoutExtension(file), @"[0-9]+(?=[^0-9][0-9]+$)")
.Value;
If Split('-') fails, and you have an entire string as a result, it seems that you have a wrong delimiter. It can be, say, one of the dashes:
"DOC 75–20–0519–1.PDF"; // n-dash
"DOC 75—20—0519—1.PDF"; // m-dash
You can use REGEX for this
Match match = Regex.Match("DOC 75-20-0519-1.PDF", @"DOC\s+\d+\-\d+\-(\d+)\-\d+", RegexOptions.IgnoreCase);
string data = match.Groups[1].Value;
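If the file names may contain the look-alike dashes mentioned above, one option is to split on all of them at once. This is only a sketch: the class and method names are made up for illustration, and it assumes the wanted value is always the third dash-separated part.

```csharp
using System;
using System.IO;

static class DashSplit
{
    // Hyphen-minus, en dash (U+2013) and em dash (U+2014).
    private static readonly char[] Dashes = { '-', '\u2013', '\u2014' };

    // Hypothetical helper: returns the third dash-separated part of the file name,
    // or an empty string when there are fewer than three parts.
    public static string ThirdPart(string fileName)
    {
        string name = Path.GetFileNameWithoutExtension(fileName);
        string[] parts = name.Split(Dashes);
        return parts.Length > 2 ? parts[2] : string.Empty;
    }
}
```

DashSplit.ThirdPart("DOC 75-20-0519-1.PDF") returns "0519", and the same call works when the separators are en dashes or em dashes.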

Regular expression match between string and last digit

I'm trying to come up with a regular expression that matches the text in bold in all the examples.
Between the string "JZ" and any character before "-"
JZ123456789-301A
JZ134255872-22013
Between the string "JZ" and the last character
JZ123456789D
I have tried the following but it only works for the first example
(?<=JZ).*(?=-)
You can use (?<=JZ)[0-9]+, presuming the desired text will always be numeric.
You may use
JZ([^-]*)(?:-|.$)
and grab the Group 1 value.
Details
JZ - a literal substring
([^-]*) - Capturing group 1: zero or more chars other than -
(?:-|.$) - a non-capturing group matching either - or any char at the end of the string
C# code:
var m = Regex.Match(s, @"JZ([^-]*)(?:-|.$)");
if (m.Success)
{
Console.WriteLine(m.Groups[1].Value);
}
If, for some reason, you need to obtain the required value as a whole match, use lookarounds:
(?<=JZ)[^-]*(?=-|.$)
Use m.Value in the code above to grab the value.
A one-line answer without regex:
// requires: using System.Linq;
string s,r;
// if your string always starts with JZ
s = "JZ123456789-301A";
r = string.Concat(s.Substring(2).TakeWhile(char.IsDigit));
Console.WriteLine(r); // output : 123456789
// if JZ can appear anywhere in the string
s = "A12JZ123456789-301A";
// skip past "JZ" itself (+ 2), otherwise TakeWhile stops at 'J' and returns an empty string
r = string.Concat(s.Substring(s.IndexOf("JZ") + 2).TakeWhile(char.IsDigit));
Console.WriteLine(r); // output : 123456789
Basically, we remove everything up to and including the delimiter "JZ", then take characters while they are digits. Concat is used to turn the IEnumerable<char> back into a string. I think it is easier to read.

C# Extract part of the string that starts with specific letters

I have a string which I extract from an HTML document like this:
var elas = htmlDoc.DocumentNode.SelectSingleNode("//a[@class='a-size-small a-link-normal a-text-normal']");
if (elas != null)
{
//
_extractedString = elas.Attributes["href"].Value;
}
The HREF attribute contains this part of the string:
gp/offer-listing/B002755TC0/
And I'm trying to extract the B002755TC0 value, but the problem is that the string varies in length, so I cannot simply use the Substring method with fixed positions to extract that value...
Instead I was wondering if there's a clever way to do this, perhaps by matching the part of the string that precedes what I'm searching for?
For example, I know for a fact that each href has the structure I've shown, so I would simply match this keyword:
offer-listing/
So I would find this keyword and extract the part of the string that follows it (B002755TC0) up to the next "/" sign?
Can someone help me out with this?
This is a perfect job for a regular expression :
string text = "gp/offer-listing/B002755TC0/";
Regex pattern = new Regex(@"offer-listing/(\w+)/");
Match match = pattern.Match(text);
string whatYouAreLookingFor = match.Groups[1].Value;
Explanation : we just match the exact pattern you need.
'offer-listing/'
followed by any combination of (at least one) 'word characters' (letters, digits and underscore; note that a hyphen is not a word character),
followed by a slash.
The parentheses () mean 'capture this group' (so we can extract it later with match.Groups[1]).
EDIT: if you want to extract also from this : /dp/B01KRHBT9Q/
Then you could use this pattern :
Regex pattern = new Regex(@"/(\w+)/$");
which will match both this string and the previous. The $ stands for the end of the string, so this literally means :
capture the characters in between the last two slashes of the string
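To check that the end-anchored pattern handles both hrefs, a small sketch along these lines could be used (the class and method names are illustrative; the pattern is the one from the EDIT above):

```csharp
using System.Text.RegularExpressions;

static class LastPathSegment
{
    // Pattern from the answer: word characters between the last two slashes.
    private static readonly Regex Pattern = new Regex(@"/(\w+)/$");

    // Hypothetical helper: returns the captured segment,
    // or an empty string when the href does not end in "/segment/".
    public static string Extract(string href)
    {
        Match match = Pattern.Match(href);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }
}
```

LastPathSegment.Extract("gp/offer-listing/B002755TC0/") returns "B002755TC0" and LastPathSegment.Extract("/dp/B01KRHBT9Q/") returns "B01KRHBT9Q". An href without a trailing slash does not match, so checking match.Success matters here.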
Though there is already an accepted answer, I thought of sharing another solution, without using Regex. Find the position of your pattern in the input and add its length; the wanted text starts at that index. To find the end, search for the first "/" after the beginning of the wanted text:
string input = "gp/offer-listing/B002755TC0/";
string pat = "offer-listing/";
int begining = input.IndexOf(pat)+pat.Length;
int end = input.IndexOf("/",begining);
string result = input.Substring(begining,end-begining);
If your desired output is always the last piece, you could also use split and get the last non-empty piece:
string result2 = input.Split(new string[]{"/"},StringSplitOptions.RemoveEmptyEntries)
.ToList().Last();

C# Regex to Get file name without extension?

I want to use regex to get a filename without extension. I'm having trouble getting regex to return a value. I have this:
string path = @"C:\PERSONAL\TEST\TESTFILE.PDF";
var name = Regex.Match(path, @"(.+?)(\.[^\.]+$|$)").Value;
In this case, name always comes back as C:\PERSONAL\TEST\TESTFILE.PDF. What am I doing wrong, I think my search pattern is correct?
(I am aware that I could use Path.GetFileNameWithoutExtension(path);but I specifically want to try using regex)
You need Groups[1].Value
string path = @"C:\PERSONAL\TEST\TESTFILE.PDF";
var match = Regex.Match(path, @"(.+?)(\.[^\.]+$|$)");
if(match.Success)
{
var name = match.Groups[1].Value;
}
match.Value returns the entire matched text
match.Groups[0].Value always has the same value as match.Value
match.Groups[1].Value returns the text captured by the first group
For example:
string path = @"C:\PERSONAL\TEST\TESTFILE.PDF";
var match = Regex.Match(path, @"(.+?)(\.[^\.]+$|$)");
if(match.Success)
{
Console.WriteLine(match.Value);
// the substring matched by the whole pattern
//Output: C:\PERSONAL\TEST\TESTFILE.PDF
Console.WriteLine(match.Groups[0].Value);
// always the same as match.Value
//Output: C:\PERSONAL\TEST\TESTFILE.PDF
Console.WriteLine(match.Groups[1].Value);
// the first capture group, which is (.+?) in this case
//Output: C:\PERSONAL\TEST\TESTFILE
Console.WriteLine(match.Groups[2].Value);
// the second capture group, which is (\.[^\.]+$|$) in this case
//Output: .PDF
}
Since the data is on the right side of the string, tell the regex parser to work from the end of the string to the beginning by using the option RightToLeft. This can reduce the processing time and also shortens the pattern needed.
The pattern below is still written left to right: capture everything that is not a \ character (so the match stops at the last backslash), followed by a period.
Regex.Match(@"C:\PERSONAL\TEST\TESTFILE.PDF",
@"([^\\]+)\.",
RegexOptions.RightToLeft)
.Groups[1].Value
Prints out
TESTFILE
Try this:
.*(?=[.][^OS_FORBIDDEN_CHARACTERS]+$)
For Windows:
OS_FORBIDDEN_CHARACTERS = :\/\\\?"><\|
this is a slight modification of:
Regular expression get filename without extention from full filepath
If you don't mind matching forbidden characters as well, the simplest regex would be:
.*(?=[.].*$)
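For reference, a minimal sketch of how the simpler lookahead pattern behaves in C# (the class and method names are illustrative). Note that it keeps the directory part of a full path, since nothing in the pattern stops at a backslash:

```csharp
using System.Text.RegularExpressions;

static class BeforeLastDot
{
    // Hypothetical helper: returns everything before the last '.',
    // or an empty string when the input contains no '.'.
    public static string Strip(string input)
    {
        // .* is greedy, so the lookahead ends up positioned at the last dot.
        return Regex.Match(input, @".*(?=[.].*$)").Value;
    }
}
```

BeforeLastDot.Strip(@"C:\PERSONAL\TEST\TESTFILE.PDF") returns C:\PERSONAL\TEST\TESTFILE, and BeforeLastDot.Strip("TEST.FILE.PDF") returns TEST.FILE.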
Can be a bit shorter and greedier:
var name = Regex.Replace(@"C:\PERS.ONAL\TEST\TEST.FILE.PDF", @".*\\(.*)\..*", "$1"); // "TEST.FILE"

Extracting only the substring containing letters from a string containing digits strings and symbols

I have a string that is like the following:
string str = "hello_16_0_2016";
What I want is to extract hello from the string. The string is autogenerated, so the letters can occur anywhere in it and I cannot rely on a fixed position.
For example, here I could take the first five characters and store them in a new variable.
But since the position of the letters is random and I want to extract only the letters from the string above, can someone guide me to the correct procedure to do this?
Could you just use a simple regular expression to pull out only alphabetic characters, assuming you only need a-z?
using System.Text.RegularExpressions;
var str = "hello_16_0_2016";
var onlyLetters = Regex.Replace(str, @"[^a-zA-Z]", "");
// onlyLetters = "hello"
I'd use something like this (uses Linq):
var str = "hello_16_0_2016";
var result = string.Concat(str.Where(char.IsLetter));
Or, if performance is a concern (because you have to do this in a tight loop, or have to convert hundreds of thousands of strings), it'd probably be faster to do:
var result = new string(str.Where(char.IsLetter).ToArray());
But as occurring of letters is random and I want to extract only
letters from the string above, so can someone guide me to the correct
procedure to do this?
The following extracts the first run of letters, wherever it occurs in the string:
Console.WriteLine( Regex.Match("hello_16_0_2016", @"[A-Za-z]+").Value ); // "hello"
