How to find the third element value using Regex - c#

All, I am trying to use a regex in C# to parse each element in the format below and extract a value from inside the parentheses. For example, I would like to extract 2002_MAX_ALLOW_DATE. Note that not all the names will be alphanumeric.
I initially had the pattern: Regex regex = new Regex(@"(\w\d\d\d.[A-Z])\w+");
However, this only matches names that start with a character followed by three digits, such as 2002_MAX_ALLOW_DATE.
Based on a reply, I tried the following, but I am struggling to format it without getting a syntax error, and I don't want to change the regex itself.
Can someone please help me extract the name in the third quoted position? For example, from this,'46032','46032','2002_MAX_ALLOW_DATE' I want 2002_MAX_ALLOW_DATE.
<button class="longlist-cb longlist-cb-yes" id="cb46032"
onclick="$ll.CATG.toggleCb(this,'46032','46032','2002_MAX_ALLOW_DATE')">
</button>

Please try this
Regex rex = new Regex("'[^']+','[^']+','(?<ThirdElement>[^']+)'");
String data = "'46032','46032','2002_MAX_ALLOW_DATE'";
Match match = rex.Match(data);
Console.WriteLine(match.Groups["ThirdElement"]); // Output: 2002_MAX_ALLOW_DATE
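The same named-group idea can be applied directly to the button markup. Below is a minimal C# sketch, assuming the arguments are single-quoted and may be separated by optional whitespace (as in the variant with spaces after the commas further down). The class and method names OnclickArgs and ThirdQuoted are hypothetical. Using [^']* instead of \w+ also accepts names that are not purely alphanumeric.

```csharp
using System.Text.RegularExpressions;

public static class OnclickArgs
{
    // Three single-quoted arguments separated by commas and optional whitespace;
    // the third one is captured in the named group "ThirdElement".
    private static readonly Regex Pattern = new Regex(
        @"'[^']*'\s*,\s*'[^']*'\s*,\s*'(?<ThirdElement>[^']*)'");

    // Returns the third quoted argument, or null when the pattern is not found.
    public static string ThirdQuoted(string html)
    {
        Match match = Pattern.Match(html);
        return match.Success ? match.Groups["ThirdElement"].Value : null;
    }
}
```

For example, OnclickArgs.ThirdQuoted applied to the onclick attribute above should return 2002_MAX_ALLOW_DATE.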

SECOND EDIT:
I've written some code that provides all the elements inside the onclick as capture groups:
Regex regex = new Regex("onclick=\"\\$ll.CATG.toggleCb\\((.*),\\s?(.*),\\s?(.*),\\s?(.*)\\)");
string x = "<button class=\"longlist-cb longlist-cb-yes\" id=\"cb46032\" onclick=\"$ll.CATG.toggleCb(this, '46032', '46032', '2002_MAX_ALLOW_DATE')\"></button>";
Match match = regex.Match(x);
if (match.Success)
{
    Console.WriteLine("match.Value returns: " + match.Value);
    foreach (Group y in match.Groups)
    {
        Console.WriteLine("the current capture group: " + y.Value);
    }
}
else
{
    Console.Write("No match");
}
Console.ReadKey();
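will print the whole match (match.Value), then group 0 (the whole match again), followed by groups 1 to 4: this, '46032', '46032' and '2002_MAX_ALLOW_DATE'.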
EDIT: After trying with VS, this worked for me: Regex regex = new Regex("onclick=\"\\$ll.CATG.toggleCb\\((.*),.*,.*,.*\\)");
ORIGINAL ANSWER:
If you were to use Regex regex = new Regex(@"onclick=""\$ll.CATG.toggleCb\(.*,.*,(.*),.*\)"); on your provided text, group 1 should contain '46032' (the third argument, counting this).
You could alter this regex by moving the capturing ( and ) to a different .*. For example, onclick="\$ll.CATG.toggleCb\(.*,.*,.*,(.*)\) would capture the fourth argument, '2002_MAX_ALLOW_DATE', while onclick="\$ll.CATG.toggleCb\((.*),.*,.*,.*\) would capture the first one, this.

Why not get just the value of the onclick attribute instead of the whole HTML of the button? Working with the full HTML makes the problem more complex.
String.Split can then solve the problem simply, although you chose to use a regex. Note that the piece at index 3 still includes the quotes and the closing parenthesis:
the_button_element.GetAttribute("onclick").Split(',')[3]
Or use a regex:
new Regex(@".*?,'(\w+)'\)$")
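A minimal C# sketch of the Split idea, assuming you already have the onclick attribute value as a string (how you obtain it, for example through a DOM or HTML-parsing library, is outside this sketch) and that no argument contains a comma. The class and method names OnclickSplit and FourthArgument are hypothetical; the method trims the quotes and the closing parenthesis from the piece at index 3.

```csharp
using System;

public static class OnclickSplit
{
    // Splits the onclick value on commas and returns the piece at index 3
    // (the fourth argument, counting "this") with spaces, quotes and ')' trimmed.
    // Returns null if there are fewer than four pieces.
    public static string FourthArgument(string onclick)
    {
        string[] parts = onclick.Split(',');
        if (parts.Length < 4)
        {
            return null;
        }
        return parts[3].Trim(' ', '\'', ')');
    }
}
```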

Related

How to use grouped regex content from one file to another?

How do I get the matched regex group value from one file and paste it in a different file
I've tried something like this:
var doc=File.ReadAllText(@"D:\Project\12345\database\xyz.txt");
Regex r=new Regex(@"<ttl>(\w+)</ttl>");
Match m=r.Match(doc);
string gr=m.Groups[1].Value;
File.WriteAllText(@"E:\Final\12345\2017\xyz.txt", File.ReadAllText(@"E:\Final\12345\2017\123.txt").Replace("<ce-title>[^<]+</ce-title>","<ce-title>"+gr+"</ce-title>"));
Console.WriteLine("Done");
Console.ReadLine();
But it does not work, and I can't figure out what is wrong.
I'm basically trying to get the content of the first <ttl> element in one file and paste that value into another file's <ce-title> element using regex.
NOTE: I'm aware that this can be done using xml/html parsing techniques but I want to know how I can do this simple thing using regex.
Can anyone help me on this?
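You are using String.Replace() rather than Regex.Replace. String.Replace treats its first argument as literal text, so <ce-title>[^<]+</ce-title> is never interpreted as a pattern.
Rewrite your code as follows: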
var doc = File.ReadAllText(@"D:\Project\12345\database\xyz.txt");
var r = new Regex(@"<ttl>(\w+)</ttl>");
Match m = r.Match(doc);
if (m.Success)
{
    var gr = m.Groups[1].Value;
    var rx = new Regex("<ce-title>[^<]+</ce-title>");
    File.WriteAllText(@"E:\Final\12345\2017\xyz.txt",
        rx.Replace(
            File.ReadAllText(@"E:\Final\12345\2017\123.txt"), // Input
            string.Format("<ce-title>{0}</ce-title>", gr),      // Replacement
            1                                                   // Number of occurrences
        )
    );
}
Console.WriteLine("Done");
Console.ReadLine();
Since gr consists only of word characters, it is safe to use string.Format("<ce-title>{0}</ce-title>", gr) as the replacement. If gr can contain arbitrary characters, escape every $ first, because $ starts a substitution (such as $1) in a replacement string: string.Format("<ce-title>{0}</ce-title>", gr.Replace("$", "$$")).
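To illustrate the $ escaping, here is a minimal C# sketch that works on strings instead of files. The class and method names TitleCopier and CopyTitle are hypothetical, and the sketch assumes (unlike the \w+ above) that the title may contain any characters except <. It copies the first <ttl> value into the first <ce-title> element only.

```csharp
using System.Text.RegularExpressions;

public static class TitleCopier
{
    private static readonly Regex Ttl = new Regex(@"<ttl>([^<]+)</ttl>");
    private static readonly Regex CeTitle = new Regex(@"<ce-title>[^<]+</ce-title>");

    // Returns target with the content of its first <ce-title> replaced by the
    // content of the first <ttl> in source; target is unchanged if source has no <ttl>.
    public static string CopyTitle(string source, string target)
    {
        Match m = Ttl.Match(source);
        if (!m.Success)
        {
            return target;
        }
        // "$" starts a substitution in a replacement string, so double it
        // to insert a literal "$".
        string title = m.Groups[1].Value.Replace("$", "$$");
        return CeTitle.Replace(target, "<ce-title>" + title + "</ce-title>", 1);
    }
}
```

With the escaping in place, a title such as Cost $1 each is inserted literally instead of being treated as a reference to group 1.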

REGEX help needed in c#

I am very new to regex and I am not sure what's going on with this one. A friend gave me this to solve my issue, but somehow it is not working.
string: department_name:womens AND item_type_keyword:base-layer-underwear
reg-ex: (department_name:([\\w-]+))?(item_type_keyword:([\\w-]+))?
desired output: array OR group
1st element should be: department_name:womens
2nd should be: womens
3rd: item_type_keyword:base-layer-underwear
4th: base-layer-underwear
The string can contain department_name, item_type_keyword or both, in any order; neither is mandatory.
C# Code
Regex regex = new Regex(@"(department_name:([\w-]+))?(item_type_keyword:([\w-]+))?");
Match match = regex.Match(query);
if (match.Success)
    if (!String.IsNullOrEmpty(match.Groups[4].ToString()))
        d1.ItemType = match.Groups[4].ToString();
This C# code only returns three elements:
1: department_name:womens
2: department_name:womens
3: womens
Somehow the first element is duplicated, and I don't know why. It also doesn't return the other elements I expect.
Can someone please help?
When I test the regex online, it looks fine to me:
http://fiddle.re/crvw1
Thanks
You can use something like this to get the output you have in your question:
string txt = "department_name:womens AND item_type_keyword:base-layer-underwear";
var reg = new Regex(@"(?:department_name|item_type_keyword):([\w-]+)", RegexOptions.IgnoreCase);
var ms = reg.Matches(txt);
ArrayList results = new ArrayList();
foreach (Match match in ms)
{
    results.Add(match.Groups[0].Value);
    results.Add(match.Groups[1].Value);
}
// results is your final array containing all results
foreach (string elem in results)
{
    Console.WriteLine(elem);
}
Prints:
department_name:womens
womens
item_type_keyword:base-layer-underwear
base-layer-underwear
match.Groups[0].Value gives the whole text matched by the pattern, while match.Groups[1].Value gives the text captured by the first capturing group.
In your expression, group 0 (the whole match) and group 1 (the outer group around department_name) both contain department_name:womens, which is why it appears twice. Groups 3 and 4 are empty because Match returns only the first match, and inside it the optional item_type_keyword group would have to start immediately after womens, where the text is " AND " instead.
Once you have the different elements, you can put them in an array or list for further processing.
The loop lets you iterate over every match, which you cannot really do with if and .Match(): Match suits a single match, whereas Matches returns all of them, so neither the order nor the number of keys matters.
(?:
department_name # Match department_name
| # Or
item_type_keyword # Match item_type_keyword
)
:
([\w-]+) # Capture \w and - characters
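The commented layout above can be used as written: with RegexOptions.IgnorePatternWhitespace, .NET ignores unescaped whitespace in the pattern and treats # as the start of a comment. A minimal C# sketch (the class and method names QueryParser and Parse are hypothetical) that uses that form and collects each full match and its captured value into a List<string> instead of an ArrayList:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QueryParser
{
    // Same pattern as above, spread across lines with inline comments.
    private static readonly Regex Key = new Regex(@"
        (?:
            department_name     # Match department_name
          |                     # Or
            item_type_keyword   # Match item_type_keyword
        )
        :
        ([\w-]+)                # Capture \w and - characters
        ", RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

    // For every match, in input order, adds the whole match and then the captured value.
    public static List<string> Parse(string query)
    {
        var results = new List<string>();
        foreach (Match match in Key.Matches(query))
        {
            results.Add(match.Groups[0].Value);
            results.Add(match.Groups[1].Value);
        }
        return results;
    }
}
```

Because Matches scans the whole input, swapping the two keys in the query only swaps the order of the pairs in the result.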
It's better to use the alternation (logical OR) operator |, because the order of the keys in the input string is not known.
(department_name:([\w-]+))|(item_type_keyword:([\w-]+))
String input = @"department_name:womens AND item_type_keyword:base-layer-underwear";
Regex rgx = new Regex(@"(?:(department_name:([\w-]+))|(item_type_keyword:([\w-]+)))");
foreach (Match m in rgx.Matches(input))
{
    Console.WriteLine(m.Groups[1].Value);
    Console.WriteLine(m.Groups[2].Value);
    Console.WriteLine(m.Groups[3].Value);
    Console.WriteLine(m.Groups[4].Value);
}
Another idea using a lookahead for capturing and getting all groups in one match:
^(?!$)(?=.*(department_name:([\w-]+))|)(?=.*(item_type_keyword:([\w-]+))|)
As a .NET string literal:
"^(?!$)(?=.*(department_name:([\\w-]+))|)(?=.*(item_type_keyword:([\\w-]+))|)"
test at regexplanet (click on .NET); test at regex101.com
(for multiline input, put the m (multiline) modifier before the anchor: "(?m)^...")
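A minimal C# sketch of reading the lookahead version with a single Match call. The class and method names LookaheadParser and Parse are hypothetical. .NET keeps the captures made inside the lookaheads, so groups 2 and 4 hold the values even though the overall match is an empty string; when a key is absent, the empty alternative matches instead and that group's Success is false.

```csharp
using System.Text.RegularExpressions;

public static class LookaheadParser
{
    private static readonly Regex Both = new Regex(
        @"^(?!$)(?=.*(department_name:([\w-]+))|)(?=.*(item_type_keyword:([\w-]+))|)");

    // Returns the department and item type values; a missing key yields null.
    public static (string Department, string ItemType) Parse(string query)
    {
        Match m = Both.Match(query);
        if (!m.Success)
        {
            return (null, null);
        }
        string department = m.Groups[2].Success ? m.Groups[2].Value : null;
        string itemType = m.Groups[4].Success ? m.Groups[4].Value : null;
        return (department, itemType);
    }
}
```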
If the parts are always joined by a keyword such as AND, in this order, you can use:
(department_name:(.*?)) AND (item_type_keyword:(.*?)$)
•1: department_name:womens
•2: womens
•3: item_type_keyword:base-layer-underwear
•4: base-layer-underwear
(?=(department_name:\w+)).*?:([\w-]+)|(?=(item_type_keyword:.*)$).*?:([\w-]+)
Try this. It uses a lookahead to capture, then backtracks and captures again. See the demo:
http://regex101.com/r/lS5tT3/52

Regex replacing inside of

Well, I have this code:
StreamReader sr = new StreamReader(@"main.cl", true);
String str = sr.ReadToEnd();
Regex r = new Regex(@"&");
string[] line = r.Split(str);
foreach (string val in line)
{
    string Change = val.Replace("puts","System.Console.WriteLine()");
    Console.Write(Change);
}
As you can see, I'm trying to replace puts (content) with Console.WriteLine(content), but that needs regular expressions and I haven't found a good article on how to do this.
Basically, using * to stand for the incoming value, I'd like to do this:
string Change = val.Replace("puts *","System.Console.WriteLine(*)");
Then, if I receive:
puts "Hello World";
I want to get:
System.Console.WriteLine("Hello World");
You need to use Regex.Replace, capture part of the input with a capturing group, and include the captured text in the output. Example:
Regex.Replace(
"puts 'foo'", // input
"puts (.*)", // .* means "any number of characters"
"System.Console.WriteLine($1)") // $1 stands for whatever (.*) matched
If the input always ends in a semicolon you would want to move that semicolon outside the WriteLine parens. One way to do that is:
Regex.Replace(
"puts 'foo';", // input
"puts (.*);", // ; outside parens -- now it's not captured
"System.Console.WriteLine($1);") // manually adding the fixed ; at the end
If you intend to adapt these examples it's a good idea to consult a technical reference first; you can find a very good one here.
What you want to look at is Grouping Expressions. Give the following a try:
Regex.Replace(val, "puts (.*);", "System.Console.WriteLine(${1});");
Note that you can also name your groups, as opposed to using their indexes for replacement. You can do this like so:
Regex.Replace(val, "puts (?<str>.*);", "System.Console.WriteLine(${str});");
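One caveat with puts (.*); is that .* is greedy: on a line such as puts "a"; puts "b"; a single match runs from the first puts to the last ;. A minimal C# sketch using a lazy .*? and a named group instead; the class and method names PutsTranslator and Translate are hypothetical, and it still assumes the printed expression contains no ; of its own.

```csharp
using System.Text.RegularExpressions;

public static class PutsTranslator
{
    // Lazy (.*?) stops at the first ';' after each "puts", so several
    // statements on one line are converted separately.
    private static readonly Regex Puts = new Regex(@"puts\s+(?<arg>.*?);");

    public static string Translate(string source)
    {
        return Puts.Replace(source, "System.Console.WriteLine(${arg});");
    }
}
```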

Parse out Value

Does anyone have ideas on how to parse out this value in the simplest way possible? It needs to be quick and lean. Someone suggested regex, but I haven't used them before. Can a regex be used to get what's inside value?
name="org.apache.struts.taglib.html.TOKEN" value="THIS IS WHAT IS NEEDED"
var reVal = new Regex( @"name=""org\.apache\.struts\.taglib\.html\.TOKEN""\s+value=""(?<value>.*?)""" );
string value = reVal.Match( input ).Groups["value"].Value;
To explain: the pattern first matches the name attribute, then whitespace, then value=". Next, (?<value> opens a named group called "value", and .*?" lazily matches everything up to the first following ". The second line then reads the value of that group.
You could start by reading the MSDN docs of the Regex class.
var tokenString = "name=\"org.apache.struts.taglib.html.TOKEN\" value=\"THIS IS WHAT IS NEEDED\"";
Regex regex = new Regex("value=\"(.*)\"");
var match = regex.Match(tokenString);
if (match.Success)
{
    Console.WriteLine(match.Groups[1]);
}
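For the "quick and lean" requirement, a minimal C# sketch that builds the pattern once, escapes the dots in the attribute name so they only match literal dots, and reports a missing token instead of silently returning an empty string. The class and method names StrutsToken and TryGetToken are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class StrutsToken
{
    // Built once and reused; [^"]* stops at the closing quote of value="...".
    private static readonly Regex TokenPattern = new Regex(
        @"name=""org\.apache\.struts\.taglib\.html\.TOKEN""\s+value=""(?<value>[^""]*)""",
        RegexOptions.Compiled);

    // Returns true and sets token when the input contains the TOKEN input; otherwise token is null.
    public static bool TryGetToken(string input, out string token)
    {
        Match match = TokenPattern.Match(input);
        token = match.Success ? match.Groups["value"].Value : null;
        return match.Success;
    }
}
```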

Problem creating regex to match filename

I am trying to create a regex in C# to extract the artist, track number and song title from a filename named like: 01.artist - title.mp3
Right now I can't get the thing to work, and am having problems finding much relevant help online.
Here is what I have so far:
string fileRegex = "(?<trackNo>\\d{1,3})\\.(<artist>[a-z])\\s-\\s(<title>[a-z])\\.mp3";
Regex r = new Regex(fileRegex);
Match m = r.Match(song.Name); // song.Name is the filename
if (m.Success)
{
    Console.WriteLine("Artist is {0}", m.Groups["artist"]);
}
else
{
    Console.WriteLine("no match");
}
I'm not getting any matches at all, and all help is appreciated!
You might want to put ?'s before the <> tags in all your groupings, and put a + sign after your [a-z]'s, like so:
string fileRegex = "(?<trackNo>\\d{1,3})\\.(?<artist>[a-z]+)\\s-\\s(?<title>[a-z]+)\\.mp3";
Then it should work. The ?'s are required so that the contents of the angled brackets <> are interpreted as a grouping name, and the +'s are required to match 1 or more repetitions of the last element, which is any character between (and including) a-z here.
Your artist and title groups each match exactly one character. Try:
"(?<trackNo>\\d{1,3})\\.(?<artist>[a-z]+)\\s-\\s(?<title>[a-z]+)\\.mp3"
I really recommend http://www.ultrapico.com/Expresso.htm for building regular expressions. It's brilliant and free.
P.S. I like to write my regex string literals as verbatim strings, like so:
@"(?<trackNo>\d{1,3})\.(?<artist>[a-z]+)\s-\s(?<title>[a-z]+)\.mp3"
Maybe try:
"(?<trackNo>\\d{1,3})\\.(?<artist>[a-z]*)\\s-\\s(?<title>[a-z]*)\\.mp3";
CODE
String fileName = @"01. Pink Floyd - Another Brick in the Wall.mp3";
String regex = @"^(?<TrackNumber>[0-9]{1,3})\. ?(?<Artist>((?! - ).)+) - (?<Title>.+)\.mp3$";
Match match = Regex.Match(fileName, regex);
if (match.Success)
{
    Console.WriteLine(match.Groups["TrackNumber"]);
    Console.WriteLine(match.Groups["Artist"]);
    Console.WriteLine(match.Groups["Title"]);
}
OUTPUT
01
Pink Floyd
Another Brick in the Wall
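A C# sketch that handles both the asker's original 01.artist - title.mp3 form and the spaced form above, assuming the artist itself never contains " - ". The class and method names TrackName and TryParse are hypothetical. The space after the dot is optional, the artist runs up to the first " - ", and the extension is matched case-insensitively.

```csharp
using System.Text.RegularExpressions;

public static class TrackName
{
    // Artist: one or more characters, none of which starts the " - " separator.
    private static readonly Regex Pattern = new Regex(
        @"^(?<TrackNumber>\d{1,3})\.\s*(?<Artist>(?:(?! - ).)+) - (?<Title>.+)\.mp3$",
        RegexOptions.IgnoreCase);

    // Returns true on a match; on failure the out values are empty strings.
    public static bool TryParse(string fileName, out string track, out string artist, out string title)
    {
        Match m = Pattern.Match(fileName);
        track = m.Groups["TrackNumber"].Value;
        artist = m.Groups["Artist"].Value;
        title = m.Groups["Title"].Value;
        return m.Success;
    }
}
```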
