Need a Regular Expression C#

I need a regular expression to replace text in a string:
string s = "Insert into VERSION (ENTRYID,APPVERSION,PLATFORMVERSION,TIMESTAMPED,USERNAME,SQLSCRIPTNAME,COMMENTS)VALUES(SWS_Version_ID.\"NEXTVAL\",'[3.02.01P20]','[4.1.38orcl]',sysdate,null,null,null);";
I need to replace 3.02.01P20 in square brackets with NEW_VERSION.
The version may be something other than 3.02.01P20, but in the line it always follows the first opening square bracket.
Also, what would I need to change if the version (3.02.01P20) followed, say, the third opening square bracket ([)? That way I won't have to write a separate question for each case.

using System;
using System.Text.RegularExpressions;
class Tester
{
public static void Main()
{
string s = "Insert into VERSION " +
"(ENTRYID,APPVERSION,PLATFORMVERSION,TIMESTAMPED,USERNAME,SQLSCRIPTNAME,COMMENTS)" +
"VALUES(SWS_Version_ID.\"NEXTVAL\",'[3.02.01P20]','[4.1.38orcl]',sysdate,null,null,null);";
Match m = (new Regex("^(.*)(\\[.*?\\])(.*?)(\\[.*?\\])(.*)$")).Match(s);
//Console.WriteLine("{0},{1}", m.Groups[2].Value, m.Groups[3].Value);
string[] parts = {
m.Groups[1].Value,
m.Groups[2].Value, //[3.02.01P20]
m.Groups[3].Value, //','
m.Groups[4].Value, //[4.1.38orcl]
m.Groups[5].Value //tail
};
parts[1] = "[NEW_VERSION]";
Console.WriteLine(string.Join("",parts));
}
}
You mean like this?

Try this:
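string output = Regex.Replace(s, @"(.*'\[)(.*)(\]'.*)('\[.*)", "${1}" + newVer + "$3$4"); // ${1} rather than $1, so a newVer starting with a digit is not read as part of the group number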
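For the follow-up about the version sitting after the third (or any n-th) opening bracket, one option is to count the brackets inside the pattern with a quantifier, so only the number changes. The following is a minimal sketch under that assumption. ReplaceAfterNthBracket is a hypothetical helper name, and it assumes the old value contains no ']' character:

```csharp
using System;
using System.Text.RegularExpressions;

public static class VersionReplacer
{
    // Hypothetical helper: replaces the text between the n-th '[' (1-based) and the
    // following ']' with newValue. If the input has fewer than n '[' characters, the
    // pattern does not match and the input is returned unchanged.
    public static string ReplaceAfterNthBracket(string input, int n, string newValue)
    {
        // Group 1: everything from the start of the string up to and including the n-th '['.
        // [^\]]* then consumes the old value up to (but not including) the closing ']'.
        string pattern = @"^((?:[^\[]*\[){" + n + @"})[^\]]*";
        // A MatchEvaluator inserts newValue literally, so '$' or leading digits in it
        // cannot be misread as group references.
        return Regex.Replace(input, pattern, m => m.Groups[1].Value + newValue);
    }

    public static void Main()
    {
        string s = "Insert into VERSION " +
            "(ENTRYID,APPVERSION,PLATFORMVERSION,TIMESTAMPED,USERNAME,SQLSCRIPTNAME,COMMENTS)" +
            "VALUES(SWS_Version_ID.\"NEXTVAL\",'[3.02.01P20]','[4.1.38orcl]',sysdate,null,null,null);";
        Console.WriteLine(ReplaceAfterNthBracket(s, 1, "NEW_VERSION")); // first [...] replaced
        Console.WriteLine(ReplaceAfterNthBracket(s, 2, "NEW_VERSION")); // second [...] replaced
    }
}
```

To target the third bracket, pass 3 for n. The {n} quantifier is the only part of the pattern that changes.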


Replacing anchor/link in text

I'm having trouble doing a find/replace in my function. I'm extracting the <a href="link">anchor from an article and replacing it with this format: [link anchor]. The link and anchor are dynamic, so I can't hard-code the values. What I have so far is:
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
string theString = string.Empty;
switch (articleWikiCheck) {
case "id|wpTextbox1":
StringBuilder newHtml = new StringBuilder(articleBody);
Regex r = new Regex(@"\<a href=\""([^\""]+)\"">([^<]+)");
string final = string.Empty;
foreach (var match in r.Matches(theString).Cast<Match>().OrderByDescending(m => m.Index))
{
string text = match.Groups[2].Value;
string newHref = "[" + match.Groups[1].Index + " " + match.Groups[1].Index + "]";
newHtml.Remove(match.Groups[1].Index, match.Groups[1].Length);
newHtml.Insert(match.Groups[1].Index, newHref);
}
theString = newHtml.ToString();
break;
default:
theString = articleBody;
break;
}
Helpers.ReturnMessage(theString);
return theString;
}
Currently it just returns the article unchanged, with the anchors still in the traditional <a href="link">anchor format.
Can anyone see what I have done wrong?
If your input is HTML, you should consider using a proper HTML parser; HtmlAgilityPack is really helpful here.
As for the current code, it looks too verbose. You may use a single Regex.Replace to perform the search and replace in one pass:
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
if (articleWikiCheck == "id|wpTextbox1")
{
return Regex.Replace(articleBody, @"<a\s+href=""([^""]+)"">([^<]+)", "[$1 $2]");
}
else
{
// Helpers.ReturnMessage(articleBody); // Uncomment if it is necessary
return articleBody;
}
}
The <a\s+href="([^"]+)">([^<]+) regex matches <a, one or more whitespace characters and href=", then captures one or more characters other than " into Group 1, then matches "> and captures one or more characters other than < into Group 2.
The [$1 $2] replacement replaces the matched text with [, the Group 1 contents, a space, the Group 2 contents and ].
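For reference, the loop in the question returns the article unchanged because r.Matches(theString) searches theString, which is still string.Empty at that point, so there is nothing to process. The loop would also build newHref from Groups[1].Index (a position) instead of the group values, and it removes only the Group 1 span rather than the whole tag. The following minimal sketch applies the same idea with those points fixed. It assumes simple anchors of the form <a href="...">text</a>, and AnchorFixer and FixAnchors are hypothetical names. Unlike the pattern above, this one also consumes the closing </a>, so the tag does not remain in the output:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorFixer
{
    // Group 1 = href value, Group 2 = anchor text; the closing </a> is consumed too.
    private static readonly Regex AnchorRegex =
        new Regex(@"<a\s+href=""([^""]+)"">([^<]+)</a>", RegexOptions.IgnoreCase);

    public static string FixAnchors(string articleBody)
    {
        // Search the article itself (not an empty string) and build "[link anchor]"
        // from the group *values*, not their indexes.
        return AnchorRegex.Replace(articleBody,
            m => "[" + m.Groups[1].Value + " " + m.Groups[2].Value + "]");
    }

    public static void Main()
    {
        string html = "See <a href=\"http://example.com\">Example</a> for details.";
        Console.WriteLine(FixAnchors(html)); // See [http://example.com Example] for details.
    }
}
```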
Update: corrected the regex to support whitespace and newlines.
You can try this expression:
Regex r = new Regex(@"<[\s\n]*a[\s\n]*(([^\s]+\s*[ ]*=*[ ]*[\s|\n*]*('|"").*\3)[\s\n]*)*href[ ]*=[ ]*('|"")(?<link>.*)\4[.\n]*>(?<anchor>[\s\S]*?)[\s\n]*<\/[\s\n]*a>");
It will match your anchors even if they are split across multiple lines. It is so long because it allows whitespace between the tags and their values, and C# regex does not support subroutines, so the [\s\n]* part has to be repeated multiple times.
You can use it in your example like this:
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
if (articleWikiCheck == "id|wpTextbox1")
{
return Regex.Replace(articleBody,
@"<[\s\n]*a[\s\n]*(([^\s]+\s*[ ]*=*[ ]*[\s|\n*]*('|"").*\3)[\s\n]*)*href[ ]*=[ ]*('|"")(?<link>.*)\4[.\n]*>(?<anchor>[\s\S]*?)[\s\n]*<\/[\s\n]*a>",
"[${link} ${anchor}]");
}
else
{
return articleBody;
}
}

How can I extract a dynamic length string from multiline string?

I am using "nslookup" to get the machine name from an IP address.
nslookup 1.2.3.4
The output is multiline and the machine name has a variable length. How can I extract "DynamicLengthString" from the whole output? All the suggestions I found use IndexOf and Split, but when I tried that, it was not a good solution for me. Any advice?
Server: volvo.toyota.opel.tata
Address: 5.6.7.8

Name:    DynamicLengthString.toyota.opel.tata
Address: 1.2.3.4
I did it the good old C# way, without regex.
string input = @"Server: volvo.toyota.opel.tata
Address: 5.6.7.8
Name: DynamicLengtdfdfhString.toyota.opel.tata
Address: 1.2.3.4";
string targetLineStart = "Name:";
string[] allLines = input.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
string targetLine = String.Empty;
foreach (string line in allLines)
if (line.StartsWith(targetLineStart))
{
targetLine = line;
}
System.Console.WriteLine(targetLine);
string dynamicLengthString = targetLine.Remove(0, targetLineStart.Length).Split('.')[0].Trim();
System.Console.WriteLine("<<" + dynamicLengthString + ">>");
System.Console.ReadKey();
This extracts "DynamicLengtdfdfhString" from the given input, no matter where the Name line is or what comes after it.
This is a console version for testing and verifying it.
You can use a Regex:
using System;
using System.Text.RegularExpressions;
public class Program
{
    public static void Main()
    {
        string content = "Server: volvo.toyota.opel.tata \rAddress: 5.6.7.8 \rName: DynamicLengthString.toyota.opel.tata \rAddress: 1.2.3.4";
        // Group 1: the characters after "Name:" up to the first period or whitespace
        string pattern = @"Name:\s*([^.\s]+)";
        MatchCollection matchList = Regex.Matches(content, pattern);
        Console.WriteLine("Running");
        foreach (Match match in matchList)
        {
            Console.WriteLine(match.Groups[1].Value); // DynamicLengthString
        }
    }
}
I'm going to assume your output is exactly as you posted it.
string output = ExactlyAsInTheQuestion();
var fourthLine = output.Split(new[] { Environment.NewLine }, StringSplitOptions.None)[3];
var nameValue = fourthLine.Substring(9); // skips over "Name:" and the four spaces after it
var firstPartBeforePeriod = nameValue.Split('.')[0];
//firstPartBeforePeriod should equal "DynamicLengthString"
Note that this is a barebones example:
Either check all array indexes before you access them, or be prepared to catch IndexOutOfRangeExceptions.
I've assumed that the whitespace between "Name:" and "DynamicLengthString" is four spaces. If it is a single tab character, change Substring(9) to Substring(6).
If "DynamicLengthString" can itself contain periods, this answer does not apply; you'll need to use a regex in that case.
Note: I'm aware that you said IndexOf and Split were not a good solution for you.
But from that description alone, it's impossible to tell whether the problem was getting Split to work or whether it is genuinely unsuitable for your situation.

Identify the string that does not exist in another string using regex and C#

I am trying to capture the part of a string that is not contained in another string.
string searchedString = " This is my search string";
string subsetofSearchedString = "This is my";
My output should be "search string". I would like to use only regex so that I can handle complex strings.
Below is the code I have tried so far, without success:
Match match = new Regex(subsetofSearchedString ).Match(searchedString );
if (!string.IsNullOrWhiteSpace(match.Value))
{
UnmatchedString= UnmatchedString.Replace(match.Value, string.Empty);
}
Update: the code above does not work for the texts below.
text1 = 'Property Damage (2015 ACURA)' Exposure Added Automatically for IP:Claimant DriverLoss Reserve Line :Property DamageReserve Amount $ : STATIP Role(s): Owner, DriverExposure Owner :Jaimee Watson_csr Author:
text2 = 'Property Damage (2015 ACURA)' Exposure Added Automatically for IP:Claimant DriverLoss Reserve Line :Property DamageReserve Amount $ : STATIP Role(s): Owner, Driver
Match match = new Regex(text2).Match(text1);
You can use Regex.Split:
var ans = Regex.Split(searchedString, subsetofSearchedString);
If you want the answer as a single string minus the subset, you can join it:
var ansjoined = String.Join("", ans);
Replacing with String.Empty will also work:
var ans = Regex.Replace(searchedString, subsetofSearchedString, String.Empty);
Answer:
Regex wasn't working for me because my string contains regex metacharacters. Regex.Escape did not help me with the comparison.
string.Contains worked like a charm here:
if (text1.Contains(text2))
{
status = TestResult.Pass;
text1= text1.Replace(text2, string.Empty);
}
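If a regex is still needed, for example to combine the removal with other pattern logic, the usual way to make a literal string safe to use as a pattern is Regex.Escape. It escapes metacharacters such as (, ) and $, which appear in text1 and text2. The following minimal sketch uses shortened versions of those two texts, and RemoveLiteral is a hypothetical name. For plain removal, string.Replace as shown above does the same job without a regex:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubsetRemover
{
    // Removes every literal occurrence of subset from text.
    // Regex.Escape turns metacharacters such as '(', ')' and '$' into literals,
    // so the subset is matched exactly as written rather than as a pattern.
    public static string RemoveLiteral(string text, string subset)
    {
        return Regex.Replace(text, Regex.Escape(subset), string.Empty);
    }

    public static void Main()
    {
        // Shortened versions of text1 and text2 from the question.
        string text1 = "'Property Damage (2015 ACURA)' Reserve Amount $ : STAT Author:";
        string text2 = "'Property Damage (2015 ACURA)' Reserve Amount $ : STAT";
        Console.WriteLine(RemoveLiteral(text1, text2)); // " Author:"
    }
}
```

Without the Regex.Escape call, "(2015 ACURA)" would be read as a capturing group and "$" as an end anchor, so the pattern would not match the literal text.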

How to Regex match a pattern with parentheses in C#

Background: I'm doing some complicated code generation that requires me to extract the methods within a C# interface file. I cannot simply use reflection because this code will feed a T4 template, which will not have the compiled code to reflect upon, so I am parsing the source instead. I can easily write my own parser, but it would be nice if there were a regular expression solution.
Question: Is there a regex pattern that would match the method declarations (including the return types and parameters) in the string below using C#'s regular expressions library, and if so, what is it?
string testing = @"
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
public interface Service
{
int Test1(int a);
int Test2(int a, int b);
int Test3(
int a,
int b);
int Test4(out int a);
}
}
";
The regex pattern I desire should make four matches:
"int Test1(int a);"
"int Test2(int a, int b);"
"int Test3( int a, int b);" [note: #3 would be multi-line]
"int Test4(out int a);"
Solution Attempt: Here is possibly the closest I have come to a regex solution thus far:
string WhiteSpacePattern = @"\s+";
string PossibleWhiteSpacePattern = @"\s*";
string CsharpWordPattern = @"[a-zA-Z_]+";
string ParenthesesPattern = @"[(][\s\S]*?[)]";
string DoubleCsharpWordPattern = CsharpWordPattern + WhiteSpacePattern + CsharpWordPattern;
string MethodDeclarationPattern =
DoubleCsharpWordPattern +
PossibleWhiteSpacePattern +
ParenthesesPattern;
Pattern usage example:
MatchCollection tests = Regex.Matches(testing, MethodDeclarationPattern);
The individual patterns work perfectly (CsharpWordPattern, ParenthesesPattern, WhiteSpacePattern, and PossibleWhiteSpacePattern). However, when I put them all together into a single pattern (MethodDeclarationPattern), the full pattern fails.
How does MethodDeclarationPattern or my usage example need to be altered so that it will start matching the method declarations in the interface code?
You can also match literal parens by escaping them with backslashes:
string ParenthesesPattern = @"\([\s\S]*?\)";
That regex snippet matches an open paren, then as few characters of any kind as possible (newlines included), then a close paren. You're putting it at the end of your overall regex.
Your complete concatenated regex looks like this:
[a-zA-Z_]+\s+[a-zA-Z_]+\s*[(][\s\S]*?[)]
Identifier, whitespace, identifier, optional whitespace, open paren, anything, close paren.
The catch is that [a-zA-Z_]+ contains no digits, so the method-name part cannot match Test1, Test2 and so on. For the pattern to match, the method name would have to consist of letters and underscores only, like this:
"int foo ()"
I believe you'll have better success with something like this:
string openParenPattern = @"\([\s\S]*?";
string closeParenPattern = @"[\s\S]*?\)";
What you really need, conceptually, is this (leaving out space -- no need to clutter it up with that):
identifier
identifier
open paren
((ref|out)? identifier identifier comma)*
((ref|out)? identifier identifier)?
close paren
You know all the syntax for that, I think. You'll have nested groups. Looking at it, I'm really starting to warm up to your idea of putting sub-regexes in string variables and then concatenating them.
The following code matches all four method declarations in your test string:
// This has one bug: It matches "int foo(int a,)"
// Somebody good with regexes could fix that.
var methodPattern =
// return type
identPattern + spacePattern
// method name
+ identPattern + spacePattern
// open paren
+ openParenPattern + spacePattern
// Zero or more parameters followed by commas
+ "(" + paramPattern + spacePattern + "," + spacePattern + ")*" + spacePattern
// Final (or only) parameter not followed by a comma
+ "(" + paramPattern + spacePattern + ")?" + spacePattern
// Close paren
+ closeParenPattern;
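The snippet above relies on identPattern, spacePattern and paramPattern, which it does not define. The following minimal, self-contained sketch applies the same concatenation idea with assumed definitions for those pieces:

- Identifiers may contain digits after the first character, so Test1 matches.
- Parameters may carry a ref or out modifier.
- The parameter list is written as "one parameter, then zero or more comma-plus-parameter", so "int foo(int a,)" is rejected.
- The trailing semicolon is included, as in the question's expected matches.

MethodMatcher and its member names are hypothetical. Generic types, arrays, params/in modifiers and default values are not handled:

```csharp
using System;
using System.Text.RegularExpressions;

public static class MethodMatcher
{
    // Assumed building blocks (not shown in the answer above).
    const string IdentPattern = @"[A-Za-z_][A-Za-z0-9_]*";      // allows digits, e.g. Test1
    const string SpacePattern = @"\s*";
    const string ParamPattern = @"(?:(?:ref|out)\s+)?" + IdentPattern + @"\s+" + IdentPattern;

    public static readonly Regex MethodRegex = new Regex(
        IdentPattern + @"\s+" + IdentPattern + SpacePattern          // return type, method name
        + @"\(" + SpacePattern                                        // open paren
        + "(?:" + ParamPattern + SpacePattern                         // first parameter...
        + "(?:," + SpacePattern + ParamPattern + SpacePattern + ")*"  // ...then ", parameter" pairs
        + ")?"                                                        // the whole list is optional
        + @"\)" + SpacePattern + ";");                                // close paren and semicolon

    public static void Main()
    {
        string testing = @"
public interface Service
{
    int Test1(int a);
    int Test2(int a, int b);
    int Test3(
        int a,
        int b);
    int Test4(out int a);
    int Bad(int a,);
}";
        foreach (Match m in MethodRegex.Matches(testing))
        {
            // Collapse the multi-line declaration onto one line for display.
            Console.WriteLine(Regex.Replace(m.Value, @"\s+", " "));
        }
    }
}
```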

Regex C#: is it possible to use a variable in a substitution?

I have a bunch of strings in a text that look something like this:
h1. this is the Header
h3. this one the header too
h111. and this
And I have a function that is supposed to process this text depending on, let's say, the iteration it has been called at:
public void ProcessHeadersInText(string inputText, int atLevel = 1)
So, for example, when it is called as
ProcessHeadersInText(inputText, 2)
Output should be:
<h3>this is the Header</h3>
<h5>this one the header too</h5>
<h9>and this</h9>
(The last one looks like this because if the value after the h is more than 9, it should be 9 in the output.)
So, I started to think about using regex.
Here's the example https://regex101.com/r/spb3Af/1/
(As you can see, I came up with the regex (^(h([\d]+)\.+?)(.+?)$) and tried to use the substitution <h$3>$4</h$3> on it.)
It's almost what I'm looking for, but I need to add some logic to work with the heading level.
Is it possible to use variables in the substitution?
Or do I need to find another way (extract all headings first, replace them taking the function parameters and the header value into account, and only then apply the regex I wrote)?
The regex you may use is
^h(\d+)\.+\s*(.+)
If you need to make sure the match does not span lines, replace \s with [^\S\r\n].
When replacing in C#, parse the Group 1 value to int, add the level and cap it at 9 inside a match evaluator passed to the Regex.Replace method.
Here is the example code that will help you:
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.IO;
public class Test
{
// Demo: https://regex101.com/r/M9iGUO/2
public static readonly Regex reg = new Regex(@"^h(\d+)\.+\s*(.+)", RegexOptions.Compiled | RegexOptions.Multiline);
public static void Main()
{
var inputText = "h1. Topic 1\r\nblah blah blah, because of bla bla bla\r\nh2. PartA\r\nblah blah blah\r\nh3. Part a\r\nblah blah blah\r\nh2. Part B\r\nblah blah blah\r\nh1. Topic 2\r\nand its cuz blah blah\r\nFIN";
var res = ProcessHeadersInText(inputText, 2);
Console.WriteLine(res);
}
public static string ProcessHeadersInText(string inputText, int atLevel = 1)
{
// Add atLevel first, then cap the heading level at 9.
return reg.Replace(inputText, m =>
string.Format("<h{0}>{1}</h{0}>", Math.Min(int.Parse(m.Groups[1].Value) + atLevel, 9), m.Groups[2].Value.Trim()));
}
}
Note that I am using .Trim() on m.Groups[2].Value because . matches \r. You may use TrimEnd('\r') instead to remove just that character.
You can use a Regex like the one used below to fix your issues.
Regex.Replace(s, @"^(h\d+)\.(.*)$", @"<$1>$2</$1>", RegexOptions.Multiline)
Let me explain what I am doing:
// This will capture the header number which is followed
// by a '.' but ignore the . in the capture
(h\d+)\.
// This will capture the rest of the string up to the end
// of the line (see the multi-line regex option being used)
(.*)$
The parentheses capture the text into groups that can be referenced as "$1" for the first capture and "$2" for the second.
Try this:
private static string ProcessHeadersInText(string inputText, int atLevel = 1)
{
// Group 1 = value after 'h'
// Group 2 = Content of header without leading whitespace
string pattern = @"^h(\d+)\.\s*(.*?)\r?$";
return Regex.Replace(inputText, pattern, match => EvaluateHeaderMatch(match, atLevel), RegexOptions.Multiline);
}
private static string EvaluateHeaderMatch(Match m, int atLevel)
{
int hVal = int.Parse(m.Groups[1].Value) + atLevel;
if (hVal > 9) { hVal = 9; }
return $"<h{hVal}>{m.Groups[2].Value}</h{hVal}>";
}
Then just call
ProcessHeadersInText(input, 2);
This uses the Regex.Replace(string, string, MatchEvaluator, RegexOptions) overload with a custom evaluator function.
You could of course streamline this solution into a single function with an inline lambda expression:
public static string ProcessHeadersInText(string inputText, int atLevel = 1)
{
string pattern = @"^h(\d+)\.\s*(.*?)\r?$";
return Regex.Replace(inputText, pattern,
match =>
{
int hVal = int.Parse(match.Groups[1].Value) + atLevel;
if (hVal > 9) { hVal = 9; }
return $"<h{hVal}>{match.Groups[2].Value}</h{hVal}>";
},
RegexOptions.Multiline);
}
There are a lot of good solutions in this thread, but I don't think you really need a regex for this problem. For fun and as a challenge, here is a non-regex solution:
using System;
using System.Linq;
public class Program
{
public static void Main()
{
string extractTitle(string x) => x.Substring(x.IndexOf(". ") + 2);
string extractNumber(string x) => x.Remove(x.IndexOf(". ")).Substring(1);
string build(string n, string t) => $"<h{n}>{t}</h{n}>";
var inputs = new [] {
"h1. this is the Header",
"h3. this one the header too",
"h111. and this" };
foreach (var line in inputs.Select(x => build(extractNumber(x), extractTitle(x))))
{
Console.WriteLine(line);
}
}
}
I use C# 7 local functions and a C# 6 interpolated string. If you want, I can use older C#. The code should be easy to read; I can add comments if needed.
C# 5 version:
using System;
using System.Linq;
public class Program
{
static string extractTitle(string x)
{
return x.Substring(x.IndexOf(". ") + 2);
}
static string extractNumber(string x)
{
return x.Remove(x.IndexOf(". ")).Substring(1);
}
static string build(string n, string t)
{
return string.Format("<h{0}>{1}</h{0}>", n, t);
}
public static void Main()
{
var inputs = new []{
"h1. this is the Header",
"h3. this one the header too",
"h111. and this"
};
foreach (var line in inputs.Select(x => build(extractNumber(x), extractTitle(x))))
{
Console.WriteLine(line);
}
}
}
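Neither non-regex version above applies atLevel or the cap at 9 from the question. The following minimal sketch adds both to the same IndexOf-based idea. It assumes every line has the form hN. title. HeaderConverter and ProcessHeaderLine are hypothetical names, and unlike ProcessHeadersInText the method handles one line at a time:

```csharp
using System;

public static class HeaderConverter
{
    // Non-regex sketch: assumes the line looks like "hN. title".
    public static string ProcessHeaderLine(string line, int atLevel = 1)
    {
        int dot = line.IndexOf(". ", StringComparison.Ordinal);
        // Digits between the leading 'h' and the ". " separator, shifted by atLevel.
        int level = int.Parse(line.Substring(1, dot - 1)) + atLevel;
        // Cap the heading level at 9, as the question requires.
        level = Math.Min(level, 9);
        string title = line.Substring(dot + 2);
        return string.Format("<h{0}>{1}</h{0}>", level, title);
    }

    public static void Main()
    {
        string[] inputs = { "h1. this is the Header", "h3. this one the header too", "h111. and this" };
        foreach (string line in inputs)
        {
            Console.WriteLine(ProcessHeaderLine(line, 2));
        }
    }
}
```

Called with atLevel = 2, this prints the three lines from the question's expected output.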
