How to use grouped regex content from one file to another? - c#

How do I get the matched regex group value from one file and paste it into a different file?
I've tried something like this:
var doc=File.ReadAllText(@"D:\Project\12345\database\xyz.txt");
Regex r=new Regex(@"<ttl>(\w+)</ttl>");
Match m=r.Match(doc);
string gr=m.Groups[1].Value;
File.WriteAllText(@"E:\Final\12345\2017\xyz.txt", File.ReadAllText(@"E:\Final\12345\2017\123.txt").Replace("<ce-title>[^<]+</ce-title>","<ce-title>"+gr+"</ce-title>"));
Console.WriteLine("Done");
Console.ReadLine();
But it does not work, and I can't figure out what is wrong.
I'm basically trying to get the content inside the first <ttl> element from one file and paste that value into another file's <ce-title> element using regex.
NOTE: I'm aware that this can be done using XML/HTML parsing techniques, but I want to know how I can do this simple thing using regex.
Can anyone help me with this?

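You are using String.Replace(), which replaces a literal string, rather than Regex.Replace(), which treats its argument as a pattern. Your code searches for the literal text <ce-title>[^<]+</ce-title>, which never occurs in the file.
Rewrite your code as follows: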
var doc=File.ReadAllText(@"D:\Project\12345\database\xyz.txt");
var r = new Regex(@"<ttl>(\w+)</ttl>");
Match m=r.Match(doc);
if (m.Success)
{
var gr = m.Groups[1].Value;
var rx = new Regex("<ce-title>[^<]+</ce-title>");
File.WriteAllText(@"E:\Final\12345\2017\xyz.txt",
rx.Replace(
File.ReadAllText(@"E:\Final\12345\2017\123.txt"), // Input
string.Format("<ce-title>{0}</ce-title>", gr), // Replacement
1 // Number of occurrences
)
);
}
Console.WriteLine("Done");
Console.ReadLine();
Since gr only consists of word characters, it is safe to use string.Format("<ce-title>{0}</ce-title>", gr) as the replacement. If gr could contain arbitrary characters, double every $ (which Regex.Replace otherwise reads as the start of a substitution such as $1) and use string.Format("<ce-title>{0}</ce-title>", gr.Replace("$", "$$")).
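A minimal sketch of the same approach on in-memory strings rather than files (the helper name CopyTitle and the sample strings are hypothetical). It differs from the answer in two assumed ways: it captures ([^<]+) so titles containing spaces also match, and it always escapes $ in the captured value before using it as a replacement:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: copies the first <ttl> value in `source` into the first
// <ce-title> element of `target`. Works on strings; reading and writing the
// files (File.ReadAllText / File.WriteAllText) is left to the caller.
static string CopyTitle(string source, string target)
{
    // ([^<]+) instead of (\w+) so titles containing spaces or punctuation also match
    Match m = Regex.Match(source, @"<ttl>([^<]+)</ttl>");
    if (!m.Success)
        return target; // no <ttl> found: leave the target unchanged

    // Double every '$' so the title is inserted literally, not read as a substitution like $0
    string replacement = "<ce-title>" + m.Groups[1].Value.Replace("$", "$$") + "</ce-title>";

    var rx = new Regex(@"<ce-title>[^<]+</ce-title>");
    return rx.Replace(target, replacement, 1); // replace only the first occurrence
}

string src = "<doc><ttl>Pay $0 fee</ttl></doc>";
string dst = "<article><ce-title>old</ce-title></article>";
Console.WriteLine(CopyTitle(src, dst)); // <article><ce-title>Pay $0 fee</ce-title></article>
```

Without the $ escaping, the $0 in that title would be replaced by the whole matched <ce-title> element instead of appearing literally.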

Related

How to find the third element value using Regex

I am currently trying to parse each element in the format below, using regex and C#, to find the values inside the parentheses. For example, I would like to extract 2002_MAX_ALLOW_DATE. Note that not all of the names will be alphanumeric.
I initially used the pattern: Regex regex = new Regex(@"(\w\d\d\d.[A-Z])\w+");
However, this only matches names with a numeric prefix.
Based on a reply, I tried the following, and I am trying to format it so that I don't get a syntax error, without changing the regex itself.
Can someone please help me find the name in the third position? For example, from this,'46032','46032','2002_MAX_ALLOW_DATE' I need 2002_MAX_ALLOW_DATE:
<button class="longlist-cb longlist-cb-yes" id="cb46032"
onclick="$ll.CATG.toggleCb(this,'46032','46032','2002_MAX_ALLOW_DATE')">
</button>
Please try this:
Regex rex = new Regex("'[^']+','[^']+','(?<ThirdElement>[^']+)'");
String data = "'46032','46032','2002_MAX_ALLOW_DATE'";
Match match = rex.Match(data);
Console.WriteLine(match.Groups["ThirdElement"]); // Output: 2002_MAX_ALLOW_DATE
SECOND EDIT:
I've written some code that provides all the elements inside the onclick as capture groups:
Regex regex = new Regex("onclick=\"\\$ll.CATG.toggleCb\\((.*),\\s?(.*),\\s?(.*),\\s?(.*)\\)");
string x = "<button class=\"longlist - cb longlist - cb - yes\" id=\"cb46032\" onclick=\"$ll.CATG.toggleCb(this, '46032', '46032', '2002_MAX_ALLOW_DATE')\"></button>";
Match match = regex.Match(x);
if (match.Success)
{
Console.WriteLine("match.Value returns: " + match.Value);
foreach (Group y in match.Groups)
{
Console.WriteLine("the current capture group: " + y.Value);
}
}
else
{
Console.Write("No match");
}
Console.ReadKey();
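will print:
match.Value returns: onclick="$ll.CATG.toggleCb(this, '46032', '46032', '2002_MAX_ALLOW_DATE')
the current capture group: onclick="$ll.CATG.toggleCb(this, '46032', '46032', '2002_MAX_ALLOW_DATE')
the current capture group: this
the current capture group: '46032'
the current capture group: '46032'
the current capture group: '2002_MAX_ALLOW_DATE'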
EDIT: After trying with VS, this worked for me: Regex regex = new Regex("onclick=\"\\$ll.CATG.toggleCb\\((.*),.*,.*,.*\\)");
ORIGINAL ANSWER:
If you were to use Regex regex = new Regex(@"onclick=""\$ll.CATG.toggleCb\(.*,.*,(.*),.*\)"); on your provided text (note that a " inside a verbatim @"..." string must be doubled), that should return '46032'.
You could alter this regex by moving the capturing ( and ) to a different .* to capture a different element. For example, onclick="\$ll.CATG.toggleCb\((.*),.*,.*,.*\) would capture the first argument, this.
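Why not get just the value of the onclick attribute, instead of the whole HTML of the button, which makes the question more complex?
Then String.Split can solve your problem simply, although you chose to use a regex:
the_button_element.GetAttribute("onclick").Split(',')[3]
This returns '2002_MAX_ALLOW_DATE') with the quotes and the closing parenthesis still attached, so trim those characters afterwards.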
Or use RegExp:
new Regex(@".*?,'(\w+)'\)$")
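Since the question notes that not all names are alphanumeric, here is a sketch that collects every single-quoted argument with Regex.Matches instead of relying on \w+. QuotedArgs is a hypothetical helper name, and it assumes each argument is wrapped in single quotes and contains no single quote itself:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the contents of every '...' argument, in order.
// [^']* accepts any character except a quote, so names like 2002_MAX_ALLOW_DATE,
// a-b or x.y! are all captured.
static string[] QuotedArgs(string onclick) =>
    Regex.Matches(onclick, @"'([^']*)'")
         .Cast<Match>()
         .Select(m => m.Groups[1].Value)
         .ToArray();

string onclick = "$ll.CATG.toggleCb(this,'46032','46032','2002_MAX_ALLOW_DATE')";
string[] quoted = QuotedArgs(onclick);
Console.WriteLine(quoted[2]); // third quoted argument: 2002_MAX_ALLOW_DATE
```

Indexing the resulting array avoids having to rewrite the pattern for each position.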

C# Regex to Get file name without extension?

I want to use regex to get a filename without extension. I'm having trouble getting regex to return a value. I have this:
string path = @"C:\PERSONAL\TEST\TESTFILE.PDF";
var name = Regex.Match(path, @"(.+?)(\.[^\.]+$|$)").Value;
In this case, name always comes back as C:\PERSONAL\TEST\TESTFILE.PDF. What am I doing wrong? I think my search pattern is correct.
(I am aware that I could use Path.GetFileNameWithoutExtension(path), but I specifically want to try using regex.)
You need match.Groups[1].Value:
string path = @"C:\PERSONAL\TEST\TESTFILE.PDF";
var match = Regex.Match(path, @"(.+?)(\.[^\.]+$|$)");
if(match.Success)
{
var name = match.Groups[1].Value;
}
match.Value returns the entire matched substring.
match.Groups[0].Value is always the same as match.Value.
match.Groups[1].Value returns the value of the first capturing group.
For example:
string path = @"C:\PERSONAL\TEST\TESTFILE.PDF";
var match = Regex.Match(path, @"(.+?)(\.[^\.]+$|$)");
if(match.Success)
{
Console.WriteLine(match.Value);
// returns the substring of the matching part
//Output: C:\PERSONAL\TEST\TESTFILE.PDF
Console.WriteLine(match.Groups[0].Value);
// always the same as match.Value
//Output: C:\PERSONAL\TEST\TESTFILE.PDF
Console.WriteLine(match.Groups[1].Value);
// returns the first capture group, which is (.+?) in this case
//Output: C:\PERSONAL\TEST\TESTFILE
Console.WriteLine(match.Groups[2].Value);
// returns the second capture group, which is (\.[^\.]+$|$) in this case
//Output: .PDF
}
Since the data is on the right side of the string, you can tell the regex engine to work from the end of the string toward the beginning with the RightToLeft option. This lets it find the extension first and allows a shorter pattern.
The pattern below (read left to right) says: capture one or more characters that are not a \ (so the match stops at the last backslash), followed by a period. With RightToLeft, that period is the last one in the string.
Regex.Match(@"C:\PERSONAL\TEST\TESTFILE.PDF",
@"([^\\]+)\.",
RegexOptions.RightToLeft)
.Groups[1].Value
This returns:
TESTFILE
Try this:
.*(?=[.][^OS_FORBIDDEN_CHARACTERS]+$)
For Windows:
OS_FORBIDDEN_CHARACTERS = :\/\\\?"><\|
This is a slight modification of:
Regular expression get filename without extention from full filepath
If you don't mind matching forbidden characters, the simplest regex would be:
.*(?=[.].*$)
It can be a bit shorter and greedier:
var name = Regex.Replace(@"C:\PERS.ONAL\TEST\TEST.FILE.PDF", @".*\\(.*)\..*", "$1"); // "TEST.FILE"
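A sketch combining the ideas from these answers: a named group for the file name, an optional non-capturing group for the last extension, and an anchor at the end of the string. NameWithoutExtension is a hypothetical helper; as an assumed design choice it treats both \ and / as separators, whereas Path.GetFileNameWithoutExtension does not treat \ as a separator on Linux or macOS:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: file name without its last extension, using only a regex.
// (?<name>[^\\/]+?) cannot cross a separator, so the match starts after the last \ or /.
// (?:\.[^.\\/]+)? optionally consumes the final ".ext"; $ forces it to be the last one.
static string NameWithoutExtension(string path)
{
    Match m = Regex.Match(path, @"(?<name>[^\\/]+?)(?:\.[^.\\/]+)?$");
    return m.Success ? m.Groups["name"].Value : "";
}

Console.WriteLine(NameWithoutExtension(@"C:\PERSONAL\TEST\TESTFILE.PDF"));   // TESTFILE
Console.WriteLine(NameWithoutExtension(@"C:\PERS.ONAL\TEST\TEST.FILE.PDF")); // TEST.FILE
Console.WriteLine(NameWithoutExtension("docs/README"));                      // README
```

Reading m.Groups["name"].Value rather than m.Value is the same fix as in the first answer: the whole match would still include the extension.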

Substitutions in Regular Expressions, and Replacement pattern

I spent 4 hours on this and it is still not clear to me how this should work.
I want to use the logic from this link. I want to transform
Some123Grouping TO GroupingSome123
I have 3 parts and need to change their order using the replacement ($1, $2, $3).
Also I need something to transform
name@gmail.com TO name
It is not clear to me how to define the replacement, or what is captured in my case.
Thanks for the help, I would really appreciate it.
$1, $2, etc. refer to capturing groups, numbered in the order in which their opening parentheses appear in the pattern. So you need to define groups in your regex, which you do with parentheses. For example:
Regex.Replace("Some123Grouping", #"(Some)(123)(Grouping)", #"$3$1$2")
yields "GroupingSome123".
Note that for better readability, groups can also be named and then referenced by their name. For example:
Regex.Replace("mr.smith@gmail.com", @"(?<name>.*)(@gmail\.com)", @"${name}")
yields "mr.smith".
BTW, if you are looking for a general (non .NET specific but great) introduction to Regexes, I recommend Regular-Expressions.info.
Taking your requirement literally gives
Regex.Replace("name@gmail.com", @"(name)(@gmail.com)", @"$1")
but I suspect what you want is more along the lines of
Regex.Replace("name@gmail.com", @"(\w*)(@.*)", @"$1")
If I understood correctly:
The input is text, followed by numbers, followed by text. If that is correct, this pattern should match it:
string pattern = @"([A-Za-z]+)(\d+)([A-Za-z]+)";
The next step is getting the groups out of it:
Regex rx = new Regex(pattern);
var match = rx.Match(input);
Then you can get your result in two ways. The short version:
result = rx.Replace(input, "$3$1$2");
And the long version:
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
string input = "Some123Grouping";
string pattern = @"([A-Za-z]+)(\d+)([A-Za-z]+)";
Regex rx = new Regex(pattern);
var match = rx.Match(input);
// Groups.Count includes group 0 (the whole match), so it is 4 here
Console.WriteLine("{0} groups found in:\n {1}",
match.Groups.Count,
input);
// Same order as the "$3$1$2" replacement: third group, then first, then second
var newInput = match.Groups[3].Value + match.Groups[1].Value + match.Groups[2].Value;
Console.WriteLine(newInput); // GroupingSome123
}
}
Regarding your second issue, it seems to be as simple as:
var result = "name@gmail.com".Split('@')[0];
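A sketch putting both transformations side by side with named groups, so the replacement reads as ${tail}${head}${num} instead of $3$1$2. Reorder and LocalPart are hypothetical helper names; anchoring with ^ and $ is an assumed choice that makes input not fitting the pattern come back unchanged:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: "<letters><digits><letters>" -> "<last letters><first letters><digits>".
// ${name} in the replacement refers to the (?<name>...) group of the same name.
static string Reorder(string s) =>
    Regex.Replace(s, @"^(?<head>[A-Za-z]+)(?<num>\d+)(?<tail>[A-Za-z]+)$", "${tail}${head}${num}");

// Hypothetical helper: keeps only the part of an e-mail address before the '@'.
static string LocalPart(string email) =>
    Regex.Replace(email, @"^([^@]+)@.*$", "$1");

Console.WriteLine(Reorder("Some123Grouping"));      // GroupingSome123
Console.WriteLine(LocalPart("mr.smith@gmail.com")); // mr.smith
```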

How can I split a regex into exact words?

I need a little help with regular expressions in C#.
I have the following string:
"[[Sender.Name]]\r[[Sender.AdditionalInfo]]\r[[Sender.Street]]\r[[Sender.ZipCode]] [[Sender.Location]]\r[[Sender.Country]]\r"
The string could also contain spaces and theoretically any other characters, so I really need to match the [[words]].
What I need is a string array like this:
"[[Sender.Name]]",
"[[Sender.AdditionalInfo]]",
"[[Sender.Street]]",
// ... And so on.
I'm pretty sure that this is perfectly doable with:
var stringArray = Regex.Split(line, @"\[\[+\]\]")
I'm just too stupid to find the correct regex for the Regex.Split() call.
Can anyone here tell me the correct regular expression to use in my case?
As you can tell I'm not that experienced with RegEx :)
Why don't you split on "\r"?
You don't need a regex for that; just use the standard string function:
string[] delimiters = {"\r"};
string[] split = line.Split(delimiters, StringSplitOptions.None);
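Use matching instead if you want to get each [[...]] block (splitting on \r would leave [[Sender.ZipCode]] [[Sender.Location]] together, because they are separated by a space):
Regex rgx = new Regex(@"\[\[.*?\]\]");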
foreach (Match m in rgx.Matches(input))
Console.WriteLine(m.Groups[0].Value);
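The regex you are using (\[\[+\]\]) matches two or more literal [ characters immediately followed by two literal ] characters, so it never matches a [[word]] that has text between the brackets.
A regex solution is to match all the non-] characters inside doubled [[ and ]] (requiring at least one, since the name inside the brackets should not be empty, I guess?), and cast the MatchCollection to a list or array (here is an example with a list):
var str = "[[Sender.Name]]\r[[Sender.AdditionalInfo]]\r[[Sender.Street]]\r[[Sender.ZipCode]] [[Sender.Location]]\r[[Sender.Country]]\r";
var rgx = new Regex(@"\[\[[^\]]+?\]\]");
// Cast<Match>() and ToList() require using System.Linq;
var res = rgx.Matches(str).Cast<Match>().ToList();
The resulting list contains six matches: [[Sender.Name]], [[Sender.AdditionalInfo]], [[Sender.Street]], [[Sender.ZipCode]], [[Sender.Location]] and [[Sender.Country]].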
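A sketch that returns the placeholders as a string[], the shape the question asks for (Placeholders is a hypothetical helper name). It uses a greedy [^\]]+, which cannot run past the first ] and therefore behaves the same as the lazy version here:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: every [[...]] placeholder in the text, in order of appearance.
// [^\]]+ requires at least one non-']' character, so an empty [[]] is not matched.
static string[] Placeholders(string text) =>
    Regex.Matches(text, @"\[\[[^\]]+\]\]")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

var line = "[[Sender.Name]]\r[[Sender.AdditionalInfo]]\r[[Sender.Street]]\r[[Sender.ZipCode]] [[Sender.Location]]\r[[Sender.Country]]\r";
foreach (var token in Placeholders(line))
    Console.WriteLine(token);
```

Matching rather than splitting means the \r and space separators never need to be described in the pattern.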

Regex replacing inside of

Well, I have this code:
StreamReader sr = new StreamReader(@"main.cl", true);
String str = sr.ReadToEnd();
Regex r = new Regex(#"&");
string[] line = r.Split(str);
foreach (string val in line)
{
string Change = val.Replace("puts","System.Console.WriteLine()");
Console.Write(Change);
}
As you can see, I'm trying to replace puts (content) with Console.WriteLine(content), but that needs regular expressions and I didn't find a good article about how to do this.
Basically, using * to stand for the incoming value, I'd like to do this:
string Change = val.Replace("puts *","System.Console.WriteLine(*)");
Then, if I receive:
puts "Hello World";
I want to get:
System.Console.WriteLine("Hello World");
You need to use Regex.Replace, capture part of the input with a capturing group, and include the captured text in the output. Example:
Regex.Replace(
"puts 'foo'", // input
"puts (.*)", // .* means "any number of characters"
"System.Console.WriteLine($1)") // $1 stands for whatever (.*) matched
If the input always ends in a semicolon, you would want to move that semicolon outside the WriteLine parentheses. One way to do that is:
Regex.Replace(
"puts 'foo';", // input
"puts (.*);", // ; outside parens -- now it's not captured
"System.Console.WriteLine($1);") // manually adding the fixed ; at the end
If you intend to adapt these examples it's a good idea to consult a technical reference first; you can find a very good one here.
What you want to look at is Grouping Expressions. Give the following a try:
Regex.Replace(val, "puts (.*);", "System.Console.WriteLine(${1});");
Note that you can also name your groups, as opposed to using their indexes for replacement. You can do this like so:
Regex.Replace(val, "puts (?<str>.*);", "System.Console.WriteLine(${str});");
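The answers above use a greedy (.*), which works for one statement per line. A sketch with a lazy (.*?) so that several statements on the same line are translated separately; TranslatePuts is a hypothetical name, and it assumes the argument itself contains no ; (for example, no semicolon inside a string literal):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical translator step: rewrites each `puts <expr>;` as System.Console.WriteLine(<expr>);
// \b keeps words such as "outputs" from matching; the lazy .*? stops at the first ';'.
static string TranslatePuts(string source) =>
    Regex.Replace(source, @"\bputs\s+(?<arg>.*?);", "System.Console.WriteLine(${arg});");

Console.WriteLine(TranslatePuts("puts \"Hello World\";")); // System.Console.WriteLine("Hello World");
Console.WriteLine(TranslatePuts("puts 1; puts 2;"));       // System.Console.WriteLine(1); System.Console.WriteLine(2);
```

With a greedy (.*) the second input would become a single WriteLine whose argument is 1; puts 2.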
