Searching a string for a substring - c#

(Apologies, I'm very new to C#)
Given the following string:
SELECT * FROM TABLE WHERE sDATE ='~~Date~~' AND sName = '~~Some NAME~~'
I need to extract Date & Some NAME from the above string into an array.
It's not always going to be Date and Some NAME inside the ~~
There may be a single ~~x~~ in the string, or there may be two or more
~~x~~ could be any length and may contain numbers, letters and spaces
~~x~~ will always start and end with ~~
~~x~~ may or may not be in quotes
For a given string I'd like to get an array of the found values. I'd be OK with ~~Date~~ or just Date
I'm thinking this might be a regex situation. I've looked at .Split, .IndexOf, .Contains but none of those get me what I'm looking for since I'm not always looking for the same substring.
Update:
This is not strictly for SQL parsing, that's just a quick example. A string could also be
My name is ~~Some NAME~~ and I'm hunting for some help

Given your description, something like (and assuming there are no ~~ inside the string you want to capture):
~~(.*?)~~
would give you the text between a pair of ~~ in a capture group. You could change . to something more restrictive if you want to.
Example: https://dotnetfiddle.net/t6i7rx
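using System;
using System.Text.RegularExpressions;
var s = "SELECT * FROM TABLE WHERE sDATE ='~~Date~~' AND sName = '~~Some NAME~~'";
Regex r = new Regex(@"~~(.*?)~~"); // lazy *? stops at the first closing ~~
foreach (Match m in r.Matches(s)) {
    Console.WriteLine(m.Groups[1].Value); // group 1 = text between the markers
}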
Outputs:
Date
Some NAME
Note the importance of *? versus just * here. * by itself is greedy and will match as much as possible. So it would return Date~~' AND sName = '~~Some NAME because it'll take everything between the first and the last ~~. Adding the ? makes it lazy.
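Since the question asks for an array rather than console output, here is a minimal sketch (assuming C# 9 top-level statements and that LINQ is acceptable) that collects the group 1 values with Cast<Match>() and Select, and also shows the greedy result described above:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = "SELECT * FROM TABLE WHERE sDATE ='~~Date~~' AND sName = '~~Some NAME~~'";

// Lazy .*? stops at the first closing ~~, so each ~~x~~ pair becomes its own match.
string[] values = Regex.Matches(s, @"~~(.*?)~~")
    .Cast<Match>()                   // MatchCollection -> IEnumerable<Match>
    .Select(m => m.Groups[1].Value)  // group 1 holds the text between the markers
    .ToArray();
Console.WriteLine(string.Join(", ", values)); // Date, Some NAME

// Same pattern on the non-SQL example from the question's update.
string[] fromSentence = Regex.Matches("My name is ~~Some NAME~~ and I'm hunting for some help", @"~~(.*?)~~")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

// For contrast: greedy .* runs from the first ~~ to the last one.
string greedy = Regex.Match(s, @"~~(.*)~~").Groups[1].Value;
Console.WriteLine(greedy); // Date~~' AND sName = '~~Some NAME
```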

Related

Match Characters after last dot in string

I have a string and I want to get the words after the last dot in the string.
Example:
input string = "XimEngine.DynamicGui.PickKind.DropDown";
Result:
DropDown
There's no need for Regex; just find the last . and take the Substring after it:
string result = input.Substring(input.LastIndexOf('.') + 1);
If input doesn't contain a ., LastIndexOf returns -1, so the entire input is returned.
Not a RegEx answer, but you could do:
var result = input.Split('.').Last(); // Last() requires using System.Linq;
In Regex you can tell the parser to work from the end of the string/buffer by specifying the option RightToLeft.
Using that, we can write a normal forward pattern that finds a period (\.) and then captures (using ( )) the text we are interested in into group 1 ((\w+)).
var str = "XimEngine.DynamicGui.PickKind.DropDown";
Console.WriteLine(Regex.Match(str,
@"\.(\w+)",
RegexOptions.RightToLeft).Groups[1].Value);
Outputs to console:
DropDown
Working from the end of the string means we don't have to deal with anything between the start of the string and the text we need to extract.
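A small comparison sketch (the local function names ViaLastIndexOf, ViaSplit and ViaRegex are hypothetical, added only for illustration) that runs all three answers on the sample and on an input without a dot. The string methods fall back to the whole input, while the regex simply fails to match and yields an empty group value:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The three approaches from the answers above, wrapped for comparison.
string ViaLastIndexOf(string input) => input.Substring(input.LastIndexOf('.') + 1);
string ViaSplit(string input) => input.Split('.').Last();
string ViaRegex(string input) =>
    Regex.Match(input, @"\.(\w+)", RegexOptions.RightToLeft).Groups[1].Value;

var sample = "XimEngine.DynamicGui.PickKind.DropDown";
Console.WriteLine(ViaLastIndexOf(sample)); // DropDown
Console.WriteLine(ViaSplit(sample));       // DropDown
Console.WriteLine(ViaRegex(sample));       // DropDown

// Edge case: no dot at all. LastIndexOf returns -1 and Split returns a single element,
// so both return the whole input; the regex finds no match, so the group value is "".
var noDot = "DropDown";
Console.WriteLine(ViaLastIndexOf(noDot));  // DropDown
Console.WriteLine(ViaSplit(noDot));        // DropDown
Console.WriteLine(ViaRegex(noDot) == "");  // True
```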

C# Extract part of the string that starts with specific letters

I have a string which I extract from an HTML document like this:
var elas = htmlDoc.DocumentNode.SelectSingleNode("//a[@class='a-size-small a-link-normal a-text-normal']");
if (elas != null)
{
//
_extractedString = elas.Attributes["href"].Value;
}
The HREF attribute contains this part of the string:
gp/offer-listing/B002755TC0/
And I'm trying to extract the B002755TC0 value, but the problem is that the length of the string varies, so I cannot simply use the Substring method with fixed positions to extract that value...
Instead, I was wondering if there's a clever way to do this, perhaps by matching the part of the string that comes just before what I'm searching for?
I know for a fact that each href has the structure I've shown, so I would simply match this keyword:
offer-listing/
So I would find this keyword and extract the part of the string after it (B002755TC0) up to the next / sign?
Can someone help me out with this?
This is a perfect job for a regular expression:
string text = "gp/offer-listing/B002755TC0/";
Regex pattern = new Regex(@"offer-listing/(\w+)/");
Match match = pattern.Match(text);
string whatYouAreLookingFor = match.Groups[1].Value;
Explanation: we just match the exact pattern you need:
'offer-listing/'
followed by any combination of (at least one) 'word characters' (letters, digits and underscore),
followed by a slash.
The parentheses () mean 'capture this group' (so we can extract it later with match.Groups[1]).
EDIT: if you also want to extract from this: /dp/B01KRHBT9Q/
then you could use this pattern:
Regex pattern = new Regex(@"/(\w+)/$");
which will match both this string and the previous one. The $ stands for the end of the string, so this literally means:
capture the word characters between the last two slashes of the string
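A quick sketch, assuming C# 9 top-level statements, that applies both patterns to the two href shapes and checks Match.Success before reading the group:

```csharp
using System;
using System.Text.RegularExpressions;

// The two href shapes mentioned in the question and the EDIT.
string[] hrefs = { "gp/offer-listing/B002755TC0/", "/dp/B01KRHBT9Q/" };

var offerPattern = new Regex(@"offer-listing/(\w+)/"); // needs the literal keyword
var endPattern = new Regex(@"/(\w+)/$");              // word characters between the last two slashes

foreach (var href in hrefs)
{
    Match m = endPattern.Match(href);
    // Check Success first: on a failed match, Groups[1].Value is just an empty string.
    Console.WriteLine(m.Success ? m.Groups[1].Value : "(no match)");
}

// The keyword-based pattern does not apply to the /dp/ form.
bool offerMatchesDp = offerPattern.IsMatch(hrefs[1]);
Console.WriteLine(offerMatchesDp); // False
```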
Though there is already an accepted answer, I thought I'd share another solution that doesn't use Regex. Just find the position of your pattern in the input and add its length; the wanted text starts at the next character. To find the end, search for the first "/" after the beginning of the wanted text:
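string input = "gp/offer-listing/B002755TC0/";
string pat = "offer-listing/";
int found = input.IndexOf(pat, StringComparison.Ordinal);   // -1 if the pattern is absent
int beginning = found + pat.Length;                         // the wanted text starts right after the pattern
int end = found >= 0 ? input.IndexOf('/', beginning) : -1;  // first "/" after the start of the wanted text
string result = end >= 0 ? input.Substring(beginning, end - beginning) : null;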
If your desired output is always the last piece, you could also use split and get the last non-empty piece:
string result2 = input.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries)
    .Last(); // Last() requires using System.Linq;
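To see why this answer asks for the last non-empty piece, this sketch (assuming LINQ and C# 9 top-level statements) splits both href shapes with and without StringSplitOptions.RemoveEmptyEntries:

```csharp
using System;
using System.Linq;

string[] hrefs = { "gp/offer-listing/B002755TC0/", "/dp/B01KRHBT9Q/" };

// Plain Split: the trailing "/" leaves an empty string as the last element.
string[] plainLast = hrefs.Select(h => h.Split('/').Last()).ToArray();

// RemoveEmptyEntries drops those empty elements, so Last() is the ID.
string[] lastNonEmpty = hrefs
    .Select(h => h.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries).Last())
    .ToArray();

for (int i = 0; i < hrefs.Length; i++)
    Console.WriteLine($"{hrefs[i]} -> '{plainLast[i]}' vs '{lastNonEmpty[i]}'");
```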

separate number from String containing spaces and hyphens in C#

I am developing a C# MVC application. I get an account name and its code from a single field in the view, but I have to separate them to store them in the database. I have used a regular expression and successfully separated the code from the rest of the string, but for the text part I only get the text before the first space or hyphen. My regexes are:
string numberPart = Regex.Match(s, @"\d+").Value;
string alphaPart = Regex.Match(s, @"[a-zA-Z]+\s+").Value;
d.code = numberPart;
d.name = alphaPart;
"2103010001 - SALES - PACKING SERV - MUTTON ( 1F )"
This is the complete string from the view. When I use the above regexes to separate the code and the description, I get the following:
numberPart = 2103010001
alphaPart = SALES
What I want is:
numberPart = 2103010001
alphaPart = SALES - PACKING SERV - MUTTON ( 1F )
What would be the appropriate expression to do this?
For the second regex, you essentially want "everything after (and including) the first letter". Thus you can simply try
string alphaPart = Regex.Match(s, @"[a-zA-Z].*").Value;
If you want to be more specific, you can restrict the "after" part to just the characters you expect, maybe
string alphaPart = Regex.Match(s, @"[a-zA-Z][a-zA-Z0-9 ()-]*").Value;
but you still need the leading [a-zA-Z] because otherwise you'd match the number part too.
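A sketch running the question's pattern and both suggested patterns on the sample string, assuming C# 9 top-level statements. It also shows why the original [a-zA-Z]+\s+ stopped early: it matches a single word plus the whitespace after it:

```csharp
using System;
using System.Text.RegularExpressions;

var s = "2103010001 - SALES - PACKING SERV - MUTTON ( 1F )";

string numberPart = Regex.Match(s, @"\d+").Value;                  // first run of digits
string original = Regex.Match(s, @"[a-zA-Z]+\s+").Value;           // question's pattern: one word plus trailing whitespace
string loose = Regex.Match(s, @"[a-zA-Z].*").Value;                // from the first letter to the end
string strict = Regex.Match(s, @"[a-zA-Z][a-zA-Z0-9 ()-]*").Value; // first letter, then only the expected characters

Console.WriteLine(numberPart);      // 2103010001
Console.WriteLine($"[{original}]"); // [SALES ]
Console.WriteLine(loose);           // SALES - PACKING SERV - MUTTON ( 1F )
Console.WriteLine(strict);          // SALES - PACKING SERV - MUTTON ( 1F )
```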
Just split on the first - character (and the whitespace around it):
Regex.Split(input, @"(?<=^[^-]*?)\s*-\s*");
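A sketch of what that Regex.Split call returns for the sample string. The lookbehind (?<=^[^-]*?) only succeeds where no - occurs earlier in the string, so only the first separator is used; this relies on .NET supporting variable-length lookbehind. The String.Split call with a count of 2 is an alternative not taken from the answer, shown only for comparison:

```csharp
using System;
using System.Text.RegularExpressions;

var input = "2103010001 - SALES - PACKING SERV - MUTTON ( 1F )";

// Split only where no '-' appears earlier in the string, i.e. at the first " - ".
string[] parts = Regex.Split(input, @"(?<=^[^-]*?)\s*-\s*");

Console.WriteLine(parts.Length); // 2
Console.WriteLine(parts[0]);     // 2103010001
Console.WriteLine(parts[1]);     // SALES - PACKING SERV - MUTTON ( 1F )

// Non-regex alternative: split on " - " but return at most 2 pieces.
string[] viaString = input.Split(new[] { " - " }, 2, StringSplitOptions.None);
```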

Get sub-strings from a string that are enclosed using some specified character

Suppose I have a string
Likes (20)
I want to fetch the substring enclosed in round brackets (in the above case, it's 20) from this string. This substring can change dynamically at runtime; it might be any number from 0 upwards. To achieve this, my idea is to use a for loop that traverses the whole string: when a ( is found, it starts adding the characters to another character array, and when ) is encountered, it stops adding characters and returns the array. But I think this might have poor performance. I know very little about regular expressions, so is there a regular expression solution available, or any function that can do this efficiently?
If you don't fancy using regex you could use Split:
string foo = "Likes (20)";
string[] arr = foo.Split(new char[]{ '(', ')' }, StringSplitOptions.None);
string count = arr[1];
Count = 20
This will work fine regardless of the number in the brackets ()
e.g:
Likes (242535345)
Will give:
242535345
This also works with plain string methods:
string result = "Likes (20)";
int index = result.IndexOf('(');
if (index >= 0)
{
result = result.Substring(index + 1); // take part behind (
index = result.IndexOf(')');
if (index >= 0)
result = result.Remove(index); // remove part from )
}
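The IndexOf approach above returns the input unchanged when there are no brackets. A minimal sketch of a helper that returns null instead (BetweenParens is a hypothetical name, not from the answer), with int.TryParse for the numeric case, assuming C# 9 top-level statements:

```csharp
using System;

// Hypothetical helper built from the IndexOf/Substring approach.
// Returns null when there is no "(...)" pair instead of returning the input unchanged.
static string? BetweenParens(string text)
{
    int open = text.IndexOf('(');
    if (open < 0) return null;
    int close = text.IndexOf(')', open + 1); // look for ')' only after the '('
    if (close < 0) return null;
    return text.Substring(open + 1, close - open - 1);
}

Console.WriteLine(BetweenParens("Likes (20)"));        // 20
Console.WriteLine(BetweenParens("Likes (242535345)")); // 242535345
Console.WriteLine(BetweenParens("Likes") ?? "(none)"); // (none)

// Convert to a number only if the extracted text really is numeric.
bool ok = int.TryParse(BetweenParens("Likes (20)"), out int likes);
```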
For strict matching, you can do:
Regex reg = new Regex(@"^Likes \((\d+)\)$");
Match m = reg.Match(yourstring);
This way you'll have everything you need in m.Groups[1].Value.
As suggested by I4V, assuming you have only that one sequence of digits in the whole string, as in your example, you can use the simpler version:
var res = Regex.Match(str, @"\d+");
and in this case, you can get the value you are looking for with res.Value
EDIT
In case the value enclosed in brackets is not just numbers, you can replace \d with something like [\w\s] if you want to allow alphabetic characters, digits and spaces in there (\w already includes digits).
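A sketch exercising the strict pattern (written with the space after "Likes" that the sample string contains), the simpler \d+ version, and a [\w\s] variant for non-numeric content, assuming C# 9 top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

// Strict form: the whole string must be "Likes (<digits>)"; note the space after "Likes".
Match m = Regex.Match("Likes (20)", @"^Likes \((\d+)\)$");
Console.WriteLine(m.Success ? m.Groups[1].Value : "(no match)"); // 20

// Without that space, the anchored pattern cannot match this input.
bool noSpace = Regex.IsMatch("Likes (20)", @"^Likes\((\d+)\)$");
Console.WriteLine(noSpace); // False

// Simpler form: the first run of digits anywhere in the string.
Match res = Regex.Match("Likes (20)", @"\d+");
Console.WriteLine(res.Value); // 20

// Variant for non-numeric content: letters, digits (via \w) and spaces.
Match words = Regex.Match("Likes (twenty 20)", @"^Likes \(([\w\s]+)\)$");
Console.WriteLine(words.Groups[1].Value); // twenty 20
```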
Or even with LINQ:
var s = "Likes (20)";
var s1 = new string(s.SkipWhile(x => x != '(').Skip(1).TakeWhile(x => x != ')').ToArray());
const string likes = "Likes (20)";
int open = likes.IndexOf('('), close = likes.IndexOf(')', open + 1);
int likesCount = int.Parse(likes.Substring(open + 1, close - open - 1)); // text strictly between the brackets
Matching when the part in parentheses is supposed to be a number:
string inputstring = "Likes (20)";
Regex reg = new Regex(@"\((\d+)\)");
string num = reg.Match(inputstring).Groups[1].Value;
Explanation:
By definition, a regex matches a substring, so unless you indicate otherwise, the text you are looking for can occur anywhere in your string.
\d stands for a digit. It matches any single digit.
We want it to be repeated potentially several times, but at least once. The + quantifier means "the previous symbol or group, repeated one or more times".
So \d+ will match one or more digits. It will match 20.
To ensure that we get the number that is in parentheses, we say that it should be between ( and ). These are special characters in a regex, so we need to escape them.
\(\d+\) would match (20), and we are almost there.
Since we want only the part inside the parentheses, not the parentheses themselves, we tell the regex that the digits form a capture group.
We do that with unescaped parentheses in the regex. \((\d+)\) will still match (20), but now it records that 20 is a subgroup of this match, and we can fetch it with Match.Groups[1].
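The three steps of that explanation can be checked one at a time; a sketch assuming C# 9 top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Likes (20)";

// Step 1: \d+ matches one or more digits anywhere in the string.
string step1 = Regex.Match(input, @"\d+").Value;     // "20"

// Step 2: escaped \( and \) require the digits to sit inside parentheses.
string step2 = Regex.Match(input, @"\(\d+\)").Value; // "(20)"

// Step 3: an unescaped ( ) pair around \d+ creates capture group 1.
Match m = Regex.Match(input, @"\((\d+)\)");
string whole = m.Value;            // "(20)"
string group1 = m.Groups[1].Value; // "20"

Console.WriteLine($"{step1} | {step2} | {whole} | {group1}");
```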
For arbitrary text in parentheses, things get a little harder.
Regex reg = new Regex(@"\((.+)\)");
This would work for many strings (the dot matches any character). But if the input is something like "This is an example(parantesis1)(parantesis2)", you would match (parantesis1)(parantesis2), with parantesis1)(parantesis2 as the captured subgroup. This is unlikely to be what you are after.
The solution is to match "any character except a closing parenthesis":
Regex reg = new Regex(@"\(([^)]+)\)");
This will find (parantesis1) as the first match, with parantesis1 as .Groups[1].
It will still fail for nested parentheses, but since regular expressions are not the right tool for nested parentheses, I feel that this case is a bit out of scope.
If you know that the string always starts with "Likes " before the group, then Save's solution is better.
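A sketch contrasting greedy .+ with a class that excludes the closing parenthesis, [^)]+, on the two-group example above (assuming C# 9 top-level statements and LINQ for collecting all matches):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "This is an example(parantesis1)(parantesis2)";

// Greedy .+ runs to the last ')' in the string.
string greedy = Regex.Match(text, @"\((.+)\)").Groups[1].Value;

// [^)]+ cannot cross a closing parenthesis, so each (...) pair is matched separately.
string[] each = Regex.Matches(text, @"\(([^)]+)\)")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

Console.WriteLine(greedy);                  // parantesis1)(parantesis2
Console.WriteLine(string.Join(", ", each)); // parantesis1, parantesis2
```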

Problem creating regex to match filename

I am trying to create a regex in C# to extract the artist, track number and song title from a filename named like: 01.artist - title.mp3
Right now I can't get the thing to work, and am having problems finding much relevant help online.
Here is what I have so far:
string fileRegex = "(?<trackNo>\\d{1,3})\\.(<artist>[a-z])\\s-\\s(<title>[a-z])\\.mp3";
Regex r = new Regex(fileRegex);
Match m = r.Match(song.Name); // song.Name is the filname
if (m.Success)
{
Console.WriteLine("Artist is {0}", m.Groups["artist"]);
}
else
{
Console.WriteLine("no match");
}
I'm not getting any matches at all, and all help is appreciated!
You might want to put ?'s before the <> tags in all your groupings, and put a + sign after your [a-z]'s, like so:
string fileRegex = "(?<trackNo>\\d{1,3})\\.(?<artist>[a-z]+)\\s-\\s(?<title>[a-z]+)\\.mp3";
Then it should work. The ?'s are required so that the contents of the angle brackets <> are interpreted as a group name, and the +'s are required to match one or more repetitions of the previous element, which here is any character from a to z (inclusive).
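A sketch showing the difference the ? makes, assuming C# 9 top-level statements: without it, (<artist>[a-z]) is an ordinary numbered group that must match the literal text <artist>, so the pattern never matches; with ?<name> and + the named groups capture whole words:

```csharp
using System;
using System.Text.RegularExpressions;

string file = "01.artist - title.mp3";

// Original pattern: "(<artist>[a-z])" looks for the literal text "<artist>" plus one letter.
bool brokenMatches = Regex.IsMatch(file,
    @"(?<trackNo>\d{1,3})\.(<artist>[a-z])\s-\s(<title>[a-z])\.mp3");

// Fixed pattern: named groups, each spanning one or more letters.
Match m = Regex.Match(file,
    @"(?<trackNo>\d{1,3})\.(?<artist>[a-z]+)\s-\s(?<title>[a-z]+)\.mp3");

Console.WriteLine(brokenMatches);             // False
Console.WriteLine(m.Groups["trackNo"].Value); // 01
Console.WriteLine(m.Groups["artist"].Value);  // artist
Console.WriteLine(m.Groups["title"].Value);   // title
```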
Your artist and title groups are missing the ? (so (<artist>...) tries to match the literal text <artist>), and each one matches exactly one character. Try:
"(?<trackNo>\\d{1,3})\\.(?<artist>[a-z]+)\\s-\\s(?<title>[a-z]+)\\.mp3"
I really recommend http://www.ultrapico.com/Expresso.htm for building regular expressions. It's brilliant and free.
P.S. I like to write my regex patterns as verbatim string literals, like so:
@"(?<trackNo>\d{1,3})\.(?<artist>[a-z]+)\s-\s(?<title>[a-z]+)\.mp3"
Maybe try:
"(?<trackNo>\\d{1,3})\\.(?<artist>[a-z]*)\\s-\\s(?<title>[a-z]*)\\.mp3";
CODE
String fileName = @"01. Pink Floyd - Another Brick in the Wall.mp3";
String regex = @"^(?<TrackNumber>[0-9]{1,3})\. ?(?<Artist>((?! - ).)+) - (?<Title>.+)\.mp3$"; // Artist stops at the first " - "
Match match = Regex.Match(fileName, regex);
if (match.Success)
{
Console.WriteLine(match.Groups["TrackNumber"]);
Console.WriteLine(match.Groups["Artist"]);
Console.WriteLine(match.Groups["Title"]);
}
OUTPUT
01
Pink Floyd
Another Brick in the Wall
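A sketch testing a pattern of this shape on a few more filenames, assuming C# 9 top-level statements and that the Artist group is written as ((?! - ).)+, so the negative lookahead is checked before each character and Artist stops at the first " - ". The extra filenames are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"^(?<TrackNumber>[0-9]{1,3})\. ?(?<Artist>((?! - ).)+) - (?<Title>.+)\.mp3$");

string[] names =
{
    "01. Pink Floyd - Another Brick in the Wall.mp3",
    "01.artist - title.mp3",                 // no space after the dot, as in the question
    "07. Artist - Title - Live Version.mp3", // a second " - " stays in the title
};

foreach (var name in names)
{
    Match m = regex.Match(name);
    if (m.Success)
        Console.WriteLine($"{m.Groups["TrackNumber"].Value} | {m.Groups["Artist"].Value} | {m.Groups["Title"].Value}");
    else
        Console.WriteLine("no match: " + name);
}
```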
