Regex to parse URL from an excel formula - c#

I have a formula in Excel which, when read from C# code, looks like this:
"=HYPERLINK(CONCATENATE(\"https://abc.efghi.rtyui.com/#/wqeqwq/\",#REF!,\"/asdasd\"), \"View asdas\")"
I want to use regex to fetch the URL from this string, i.e.
https://abc.efghi.rtyui.com/#/wqeqwq/#REF!/asdasd
The url can be different but the format of the formula will remain the same.
"=HYPERLINK(CONCATENATE(\"{SOME_STRING}\",#REF!,\"{SOME_STRING}\"), \"View asdas\")"

Try it like this:
(?<=HYPERLINK\(CONCATENATE\(")[^"]+
The positive lookbehind lets us exclude the part in front of the URL from the full match.
If there can be an arbitrary amount of whitespace in between, add \s* where needed, e.g. after each opening parenthesis. The = at the beginning of the string can be matched as well.
Sample Code:
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"(?<=HYPERLINK\(CONCATENATE\("")[^""]+";
string input = @"=HYPERLINK(CONCATENATE(""https://abc.efghi.rtyui.com/#/wqeqwq/"",#REF!,""/asdasd""), ""View asdas"")";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}
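The whitespace-tolerant variant mentioned above could look roughly like the sketch below. It assumes whitespace may only appear directly after the two opening parentheses; `WhitespaceTolerantExample` and `FirstConcatArgument` are hypothetical names used for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceTolerantExample
{
    // Hypothetical helper: returns the first quoted argument of CONCATENATE,
    // or null when the formula does not have the expected shape.
    public static string FirstConcatArgument(string formula)
    {
        // \s* allows optional whitespace after each opening parenthesis.
        // .NET supports variable-length lookbehind, so \s* is allowed inside it.
        Match m = Regex.Match(formula, @"(?<=HYPERLINK\(\s*CONCATENATE\(\s*"")[^""]+");
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string input = @"=HYPERLINK( CONCATENATE(  ""https://abc.efghi.rtyui.com/#/wqeqwq/"",#REF!,""/asdasd""), ""View asdas"")";
        // Prints the part of the URL before #REF!
        Console.WriteLine(FirstConcatArgument(input));
    }
}
```

Like the original pattern, this only returns the first string argument; the addendum that follows covers assembling the full URL.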
Addendum: Here is another technique that uses capturing groups and regex Replace to extract the resulting URL string (after CONCATENATE would have happened):
^\=HYPERLINK\(CONCATENATE\("([^"]+)",([^,]+),"([^"]+)".*$
string pattern = @"^\=HYPERLINK\(CONCATENATE\(""([^""]+)"",([^,]+),""([^""]+)"".*$";
string substitution = @"$1$2$3";
string input = @"=HYPERLINK(CONCATENATE(""https://abc.efghi.rtyui.com/#/wqeqwq/"",#REF!,""/asdasd""), ""View asdas"")";
Regex regex = new Regex(pattern);
string result = regex.Replace(input, substitution, 1);

You can extract the URL from the formula using capturing groups in regular expression as given below:
string inputString = "=HYPERLINK(CONCATENATE(\"https://abc.efghi.rtyui.com/#/wqeqwq/\",#REF!,\"/asdasd\"), \"View asdas\")";
string regex = "CONCATENATE\\(\"([\\S]+)\",#REF!,\"([\\S]+)\"\\)";
Regex substringRegex = new Regex(regex, RegexOptions.IgnoreCase);
Match substringMatch = substringRegex.Match(inputString);
if (substringMatch.Success)
{
string url = substringMatch.Groups[1].Value + "#REF!" + substringMatch.Groups[2].Value;
}
I have defined two capturing groups in my regular expression. One for extracting part of the URL before #REF! and the other for extracting part of the URL after #REF!. Then I am concatenating all the extracted parts with #REF! to get the final URL.

Related

How to use regex to extract a string with multiple curly braces?

I have this sample string
`{1:}{2:}{3:}{4:\r\n-}{5:}`
and I want to extract out only {4:\r\n-}
This is my code but it is not working.
var str = "{1:}{2:}{3:}{4:\r\n-}{5:}";
var regex = new Regex(@"{4:*.*-}");
var match = regex.Match(str);
You need to escape the special regex characters (in this case the opening and closing braces and the backslashes) in the search string. This would capture just that part:
var regex = new Regex(@"\{4:\\r\\n-\}");
... or if you wanted anything up to and including the hyphen before the closing brace (which is what it looks like you might be trying to do)...
var regex = new Regex(@"\{4:[^-]*-\}");
You just need to escape your \r and \n characters in your regular expression. You can use the Regex.Escape() method to escape characters in your regex string which returns a string of characters that are converted to their escaped form.
Working example: https://dotnetfiddle.net/6GLZrl
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
string str = @"{1:}{2:}{3:}{4:\r\n-}{5:}";
string regex = @"{4:\r\n-}"; //Original regex
Match m = Regex.Match(str, Regex.Escape(regex));
if (m.Success)
{
Console.WriteLine("Found '{0}' at position {1}.", m.Value, m.Index);
}
else
{
Console.WriteLine("No match found");
}
}
}

Regex - Extract string patterns

I have many strings like these
/test/v1/3908643GASF/item
/test/v1/343569/item/AAAS45663/document
/test/v2/field/1230FRE/item
...
For each one I need to extract the defined pattern like these
/test/v1/{Value}/item
/test/v1/{Value}/item/{Value}/document
/test/v2/field/{Value}/item
The value can be a GUID or something else. Can I match the input paths against the given string patterns with regex?
I wrote just this code, but I don't know how to match input paths with patterns. The result should be the pattern. Thank you.
string pattern1 = "/test/v1/{Value}/item";
string pattern2 = "/test/v1/{Value}/item/{Value}/document";
string pattern3 = "/test/v2/field/{Value}/item";
List<string> paths = new List<string>();
List<string> matched = new List<string>();
paths.Add("/test/v1/3908643GASF/item");
paths.Add("/test/v1/343569/item/AAAS45663/document");
paths.Add("/test/v1/343569/item/AAAS45664/document");
paths.Add("/test/v1/123444/item/AAAS45688/document");
paths.Add("/test/v2/field/1230FRE/item");
foreach (var path in paths)
{
}
This can also be achieved using regex alone. You can probably try:
(\w+)\/\w+(?<=\/item)(\/(\w+)\/)?
Explanation of the above regex:
(\w+) - Represents a capturing group matching a word character one or more time. This group captures our required result.
\/\w+(?<=\/item) - Matches a slash followed by word characters; the positive look-behind then asserts that the text just consumed ends with /item.
$1 - Captured group 1 contains the required information you're expecting.
(\/(\w+)\/)? - The optional second and third capturing groups; group 3 captures the value that follows item, if there is one.
Sample implementation in C#:
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"(\w+)\/\w+(?<=\/item)(\/(\w+)\/)?";
string input = @"/test/v1/3908643GASF/item
/test/v1/343569/item/AAAS45663/document
/test/v2/field/1230FRE/item";
foreach (Match m in Regex.Matches(input, pattern))
{
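Console.Write(m.Groups[1].Value);
// Group.Value is never null; Success tells whether the optional group took part in the match.
if (m.Groups[3].Success)
Console.Write(" " + m.Groups[3].Value);
Console.WriteLine();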
}
}
}
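To get from an input path back to the pattern it belongs to, as the question asks, one option is to turn each pattern into an anchored regex. The sketch below assumes that every {Value} placeholder stands for exactly one path segment (no slashes); `TemplateMatcher`, `ToRegex`, and `FindTemplate` are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TemplateMatcher
{
    // Hypothetical helper: turns "/test/v1/{Value}/item" into an anchored regex
    // in which each {Value} placeholder matches one non-empty path segment.
    public static Regex ToRegex(string template)
    {
        // Split on the placeholder, escape the literal pieces, and rejoin them
        // with a "one or more non-slash characters" sub-pattern.
        string[] literalParts = template.Split(new[] { "{Value}" }, StringSplitOptions.None);
        string body = string.Join("[^/]+", literalParts.Select(p => Regex.Escape(p)));
        return new Regex("^" + body + "$");
    }

    // Returns the first template that matches the path, or null if none does.
    public static string FindTemplate(string path, IEnumerable<string> templates)
    {
        return templates.FirstOrDefault(t => ToRegex(t).IsMatch(path));
    }

    public static void Main()
    {
        var templates = new List<string>
        {
            "/test/v1/{Value}/item",
            "/test/v1/{Value}/item/{Value}/document",
            "/test/v2/field/{Value}/item"
        };
        var paths = new List<string>
        {
            "/test/v1/3908643GASF/item",
            "/test/v1/343569/item/AAAS45663/document",
            "/test/v2/field/1230FRE/item"
        };
        foreach (var path in paths)
        {
            Console.WriteLine(path + " -> " + (FindTemplate(path, templates) ?? "(no match)"));
        }
    }
}
```

Because each regex is anchored with ^ and $, the shorter pattern /test/v1/{Value}/item does not accidentally match the longer .../document paths. For many paths it would be worth building the Regex objects once instead of on every lookup.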

How can I replace "XX,XXX" with "XX XXX"?

I need to replace a string like "XX,XXX" with "XX XXX". The string "XX,XXX" is inside another string, e.g.:
"-1299-5,"XXX,XXXX",trft,4,0,10800"
The string is read from a text file. I want to split the string by ",", but the comma inside the quoted substring leads to the wrong result.
The X represents a character. I think regex can help; can anyone give me the right regular expression?
This expression,
(.*"[^,]*),([^,]*".*)
with a replacement of $1 $2 might work.
Example
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"(.*""[^,]*),([^,]*"".*)";
// .NET replacement strings reference groups as $1 and $2; \1 and \2 would be inserted literally.
string substitution = @"$1 $2";
string input = @"-1299-5,""XXX,XXXX"",trft,4,0,10800";
RegexOptions options = RegexOptions.Multiline;
Regex regex = new Regex(pattern, options);
string result = regex.Replace(input, substitution);
}
}
If you already have the quoted substring on its own, simply use String.Replace to replace the character:
var test = "XXX,XXXX";
var filtered = test.Replace(',', ' ');
Console.WriteLine(filtered);
Output :
XXX XXXX
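The two ideas above can be combined: match each quoted field with a regex and replace the commas only inside it, which also handles more than one comma per field. This is a minimal sketch that assumes quoted fields contain no embedded double quotes; `QuotedCommaExample` and `ReplaceCommasInsideQuotes` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedCommaExample
{
    // Hypothetical helper: replaces commas with spaces, but only inside "..." fields.
    public static string ReplaceCommasInsideQuotes(string line)
    {
        // "[^"]*" matches one quoted field; the evaluator rewrites just that field.
        return Regex.Replace(line, "\"[^\"]*\"", m => m.Value.Replace(',', ' '));
    }

    public static void Main()
    {
        string input = @"-1299-5,""XXX,XXXX"",trft,4,0,10800";
        string cleaned = ReplaceCommasInsideQuotes(input);
        Console.WriteLine(cleaned);

        // The remaining commas are real separators, so a plain Split is now safe.
        string[] fields = cleaned.Split(',');
        Console.WriteLine(fields.Length);
    }
}
```

For files with escaped quotes or multi-line fields, a dedicated CSV parser is likely a better fit than a regex.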

Looking for patterns in a string how to?

I'm trying to find all instances of the substring EnemyType('XXXX'), where XXXX is an arbitrary string and EnemyType('XXXX') can appear multiple times.
Right now I'm using a combination of IndexOf/Substring calls in C#, but would like to know if there's a cleaner way of doing it.
Use regex. Example:
using System.Text.RegularExpressions;
var inputString = " EnemyType('1234')abcdeEnemyType('5678')xyz";
var regex = new Regex(@"EnemyType\('\d{4}'\)");
var matches = regex.Matches(inputString);
foreach (Match i in matches)
{
Console.WriteLine(i.Value);
}
It will print:
EnemyType('1234')
EnemyType('5678')
The pattern to match is @"EnemyType\('\d{4}'\)", where \d{4} means 4 numeric characters (0-9). The parentheses are escaped with a backslash.
Edit: Since you only want the string inside quotes, not the whole string, you can use named groups instead.
var inputString = " EnemyType('1234')abcdeEnemyType('5678')xyz";
var regex = new Regex(@"EnemyType\('(?<id>[^']+)'\)");
var matches = regex.Matches(inputString);
foreach (Match i in matches)
{
Console.WriteLine(i.Groups["id"].Value);
}
Now it prints:
1234
5678
Regex is a really nice tool for parsing strings. If you often parse strings, regex can make life so much easier.

C# regular expression to find custom markers and take content

I have a string:
productDescription
In it are some custom tags such as:
[MM][/MM]
For example the string might read:
This product is [MM]1000[/MM] long
Using a regular expression how can I find those MM tags, take the content of them and replace everything with another string? So for example the output should be:
This product is 10 cm long
I think you'll need to pass a delegate to the regex for that.
Regex theRegex = new Regex(@"\[MM\](\d+)\[/MM\]");
text = theRegex.Replace(text, delegate(Match thisMatch)
{
int mmLength = Convert.ToInt32(thisMatch.Groups[1].Value);
int cmLength = mmLength / 10;
return cmLength.ToString() + "cm";
});
Using RegexDesigner.NET:
using System.Text.RegularExpressions;
// Regex Replace code for C#
void ReplaceRegex()
{
// Regex search and replace
RegexOptions options = RegexOptions.None;
Regex regex = new Regex(@"\[MM\](?<value>.*)\[\/MM\]", options);
string input = @"[MM]1000[/MM]";
string replacement = @"10 cm";
string result = regex.Replace(input, replacement);
// TODO: Do something with result
System.Windows.Forms.MessageBox.Show(result, "Replace");
}
Or if you want the original text back in the replacement:
Regex regex = new Regex(@"\[MM\](?<theText>.*)\[\/MM\]", options);
string replacement = @"${theText} cm";
A regex like this
\[(\w+)\](\d+)\[\/\w+\]
will find and collect the units (like MM) and the values (like 1000). That would at least allow you to use the pairs of parts intelligently to do the conversion. You could then put the replacement string together, and do a straightforward string replacement, because you know the exact string you're replacing.
I don't think you can do a simple RegEx.Replace, because you don't know the replacement string at the point you do the search.
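One way to work with the unit/value pairs is the Regex.Replace overload that takes a MatchEvaluator, which computes the replacement per match. The sketch below is only an illustration: the `DivisorToCm` table and the names `UnitTagExample` and `ConvertTags` are assumptions, and the output is always expressed in centimetres using integer division.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UnitTagExample
{
    // Hypothetical table: tag name -> divisor that converts the tagged value to centimetres.
    private static readonly Dictionary<string, int> DivisorToCm =
        new Dictionary<string, int> { { "MM", 10 } };

    public static string ConvertTags(string text)
    {
        // \1 requires the closing tag to repeat the name captured by the opening tag.
        return Regex.Replace(text, @"\[(\w+)\](\d+)\[/\1\]", m =>
        {
            int divisor;
            if (!DivisorToCm.TryGetValue(m.Groups[1].Value, out divisor))
            {
                return m.Value; // unknown tag: leave the text untouched
            }
            int value = int.Parse(m.Groups[2].Value);
            return (value / divisor) + " cm"; // integer division, as a simplification
        });
    }

    public static void Main()
    {
        // 1000 mm divided by 10 gives 100 cm.
        Console.WriteLine(ConvertTags("This product is [MM]1000[/MM] long"));
    }
}
```

Tags that are not in the table are returned unchanged, so the method can be extended one unit at a time.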
Regex rex = new Regex(@"\[MM\]([0-9]+)\[\/MM\]");
string s = "This product is [MM]1000[/MM] long";
MatchCollection mc = rex.Matches(s);
Will match only integers.
mc[n].Groups[1].Value;
will then give the numeric part of nth match.
