Find Regex Expression only after finding a given String - c#

I am trying to create a method that receives a text and a string and uses a regex to find the datetime associated with the given string.
I don't know the position of the regex match. It can be anywhere and can change over time, since the text is editable. The following example has 3 entries, but there can be 10, 25 or even 100.
At the moment, I have a method that finds a datetime, but it returns the first match in the text rather than the one after the given string.
private static DateTime getLastExecutionTime(string text, string nameFile)
{
string lastRun = string.Empty;
if (Regex.IsMatch(text, nameFile))
{
lastRun = Regex.Match(text, "[0-9]{2}-[0-9]{2}-[0-9]{4} [0-9]{2}:[0-9]{2}").ToString();
return DateTime.Parse(lastRun);
}
return nullDate;
}
=============
INPUT EXAMPLE
=============
text = "Cat 01-08-2019 16:32\r\nDog 03-08-2019 12:32\r\nBear 13-07-2019 19:22"
nameFile = "Dog"
===============
EXPECTED OUTPUT
===============
lastRun = "03-08-2019 12:32"

One option is to remove all the text before nameFile by using Substring and IndexOf.
private static DateTime getLastExecutionTime(string text, string nameFile)
{
string lastRun = string.Empty;
if (Regex.IsMatch(text, nameFile))
{
lastRun = Regex.Match(text.Substring(text.IndexOf(nameFile)), " [0-9]{2}-[0-9]{2}-[0-9]{4} [0-9]{2}:[0-9]{2}").ToString();
return DateTime.Parse(lastRun);
}
return new DateTime();
}
You can also use a full regex solution:
private static DateTime getLastExecutionTime(string text, string nameFile)
{
// Regex.Escape keeps special characters in nameFile from being read as pattern syntax.
string lastRun = Regex.Match(text, "(?:" + Regex.Escape(nameFile) + ") ([0-9]{2}-[0-9]{2}-[0-9]{4} [0-9]{2}:[0-9]{2})").Groups[1].Value;
if (string.IsNullOrEmpty(lastRun))
return new DateTime();
return DateTime.Parse(lastRun);
}
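DateTime.Parse depends on the current culture, so "03-08-2019" may be read as 3 August or 8 March. Below is a minimal sketch of the same idea with an explicit format. It assumes the timestamps are day-month-year; the class name LastRunFinder and the nullable return value are illustrative choices, not part of the original answer.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class LastRunFinder
{
    // Returns the timestamp that directly follows nameFile, or null when there is none.
    public static DateTime? GetLastExecutionTime(string text, string nameFile)
    {
        // Regex.Escape keeps characters such as '.' or '(' in nameFile literal.
        string pattern = Regex.Escape(nameFile)
            + @" ([0-9]{2}-[0-9]{2}-[0-9]{4} [0-9]{2}:[0-9]{2})";
        Match match = Regex.Match(text, pattern);
        if (!match.Success)
            return null;

        // Assumption: dates are written day-month-year.
        // TryParseExact with the invariant culture avoids culture-dependent parsing.
        DateTime parsed;
        bool ok = DateTime.TryParseExact(
            match.Groups[1].Value, "dd-MM-yyyy HH:mm",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);
        return ok ? parsed : (DateTime?)null;
    }
}
```

Returning null instead of new DateTime() lets the caller tell "not found" apart from a real date.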

Try this expression:
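Regex.Match(text, @"^Dog (.+?)\r?$", RegexOptions.Multiline).Groups[1].Value
For your input in a text variable the output is: 03-08-2019 12:32
The \r? keeps the carriage return of a \r\n line ending out of the captured group, because in .NET $ with RegexOptions.Multiline matches only before \n.
If you need to parametrize it, go ahead:
Regex.Match(text, $@"^{Regex.Escape(query)} (.+?)\r?$", RegexOptions.Multiline).Groups[1].Value
Regex.Escape makes the query match literally. Without it, a query from an untrusted source could inject its own pattern.
You can quickly test the expression here: https://regex101.com/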

We can try doing a one-liner regex replacement using the following pattern:
^[\s\S]*([0-9]{2}-[0-9]{2}-[0-9]{4} [0-9]{2}:[0-9]{2})[\s\S]*$
Script:
string text = "Cat 01-08-2019 16:32\r\nDog 03-08-2019 12:32";
string pattern = @"^[\s\S]*([0-9]{2}-[0-9]{2}-[0-9]{4} [0-9]{2}:[0-9]{2})[\s\S]*$";
string output = Regex.Replace(text, pattern, "$1");
Console.WriteLine(output);
This prints:
03-08-2019 12:32

If your date recogniser is the name of the file, you need to add it to the regular expression.
In this example, Regex.Match finds a date preceded by "Dog".
var match = Regex.Match(text, "Dog ([0-9]{2}-[0-9]{2}-[0-9]{4} [0-9]{2}:[0-9]{2})");
Or in a more generic way:
var match = Regex.Match(text, nameFile + " ([0-9]{2}-[0-9]{2}-[0-9]{4} [0-9]{2}:[0-9]{2})");
In this case, you need to access group 1 of the match result. The group at index 0 returns the whole match, including the file name.
match.Groups[1].Value
More about the Match class and its Groups:
MSDN

Related

Removing text between 2 strings

I tried to write a function in C# which removes the string between two strings. Like this:
string RemoveBetween(string sourceString, string startTag, string endTag)
At first I thought this is easy, but after some time I encountered more and more problems
So this is the easy case (All examples with startTag="Start" and endTag="End")
"Any Text Start remove this End between" => "Any Text StartEnd between"
But it should also be able to handle multiples without deleting the text between:
"Any Text Start remove this End between should be still there Start and remove this End multiple" => "Any Text StartEnd between should be still there StartEnd multiple"
It should always take the smallest string to remove:
"So Start followed by Start only remove this End other stuff" => "So Start followed by StartEnd other stuff"
It should also respect the order of the tags:
"the End before Start. Start before End is correct" => "the End before Start. StartEnd is correct"
I tried a RegEx which did not work (It could not handle multiples):
public string RemoveBetween(string sourceString, string startTag, string endTag)
{
Regex regex = new Regex(string.Format("{0}(.*){1}", Regex.Escape(startTag), Regex.Escape(endTag)));
return regex.Replace(sourceString, string.Empty);
}
And then I tried to work with IndexOf and Substring, but I do not see an end to it. And even if it worked, this can't be the most elegant way to solve this.
Here is an approach with string.Remove():
string input = "So Start followed by Start only remove this End other stuff";
int start = input.LastIndexOf("Start") + "Start".Length;
int end = input.IndexOf("End", start);
string result = input.Remove(start, end - start);
I use LastIndexOf() because there can be multiple starts and you want to have the last one.
You must slightly modify your function to do a non-greedy match with ? and use RegexOptions.RightToLeft for it to work with all your examples:
public static string RemoveBetween(string sourceString, string startTag, string endTag)
{
Regex regex = new Regex(string.Format("{0}(.*?){1}", Regex.Escape(startTag), Regex.Escape(endTag)), RegexOptions.RightToLeft);
return regex.Replace(sourceString, startTag+endTag);
}
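The same result can also be obtained without RegexOptions.RightToLeft. The sketch below is an alternative technique, not the answer's method: a negative lookahead stops the match from stepping over another start tag. The class name TagText and the use of RegexOptions.Singleline (so that the removed text may span lines) are assumptions.

```csharp
using System.Text.RegularExpressions;

public static class TagText
{
    public static string RemoveBetween(string source, string startTag, string endTag)
    {
        string start = Regex.Escape(startTag);
        string end = Regex.Escape(endTag);

        // (?:(?!start).)*? consumes characters lazily but refuses to step over
        // another start tag, so each match begins at the start tag nearest to
        // the end tag that follows it.
        string pattern = start + "(?:(?!" + start + ").)*?" + end;

        // A MatchEvaluator returns the tags verbatim, even if they contain '$'.
        return Regex.Replace(source, pattern, m => startTag + endTag,
            RegexOptions.Singleline);
    }
}
```

For the question's third example, "So Start followed by Start only remove this End other stuff", only the text after the second Start is removed.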
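You can use this:
public static string Remove(string original, string firstTag, string secondTag)
{
// Escape the tags so that characters such as '[' or '.' are matched literally.
string pattern = Regex.Escape(firstTag) + "(.*?)" + Regex.Escape(secondTag);
Regex regex = new Regex(pattern, RegexOptions.RightToLeft);
foreach(Match match in regex.Matches(original))
{
// Remove by position: matches arrive right to left, so earlier indexes stay valid.
// Replacing by value would also delete the same text elsewhere in the string.
original = original.Remove(match.Groups[1].Index, match.Groups[1].Length);
}
return original;
}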
string data = "text start this is my text end text";
string startTag = "start";
string endTag = "end";
int startIndex = data.IndexOf(startTag)+ startTag.Length;
Console.WriteLine(data.Substring(startIndex, data.IndexOf(endTag)-startIndex));
Or you could use LINQ, as shown here, to remove a set of characters:
public static string Remove(this string s, IEnumerable<char> chars)
{
return new string(s.Where(c => !chars.Contains(c)).ToArray());
}

c# regular expressions find and extract number of giving length

I have a string such as:
"12/11/2015: Liefertermin 71994 : 30.11.2015 -> 27.11.2015"
And I want to extract the substring 71994, which will always be a number of 5 digits
I have tried the following with no success:
private string FindDispo_InInfo()
{
Regex pattern = new Regex("^[0-9]{5,5}$");
Match match = pattern.Match(textBox1.Text);
string stDispo = match.Groups[0].Value;
return stDispo;
}
Replace the anchors ^ and $ with a word boundary \b and use a verbatim string literal:
Regex pattern = new Regex(@"\b[0-9]{5}\b");
And you can access the value using match.Value:
string stDispo = match.Value;
Fixed code:
private static string FindDispo_InInfo(string text)
{
Regex pattern = new Regex(@"\b[0-9]{5}\b");
Match match = pattern.Match(text);
if (match.Success)
return match.Value;
else
return string.Empty;
}
And here is a C# demo:
Console.WriteLine(FindDispo_InInfo("12/11/2015: Liefertermin 71994 : 30.11.2015 -> 27.11.2015"));
// => 71994
However, creating a Regex object on every call can be inefficient. It is better to declare it as a private static readonly field and reuse it inside the method as many times as necessary.
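A minimal sketch of that suggestion follows. The class name DispoParser, the field name FiveDigits, and the use of RegexOptions.Compiled are illustrative assumptions.

```csharp
using System.Text.RegularExpressions;

public static class DispoParser
{
    // Built once and reused by every call; a Regex instance can be shared for matching.
    private static readonly Regex FiveDigits =
        new Regex(@"\b[0-9]{5}\b", RegexOptions.Compiled);

    // Returns the first standalone 5-digit number, or an empty string if there is none.
    public static string FindDispo(string text)
    {
        Match match = FiveDigits.Match(text);
        return match.Success ? match.Value : string.Empty;
    }
}
```

RegexOptions.Compiled trades a slower start for faster matching, so it only pays off when the pattern is used many times.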
What you need is (\d{5}) which will capture a number of length 5

regex replace matches with function and delete other matches

I have a string like the one below and I want to replace the FieldNN instances with the output from a function.
So far I have been able to replace the NN instances with the output from the function. But I am not sure how I can delete the static "field" portion with the same regex.
input string:
(Field30="2010002257") and Field1="yuan" not Field28="AAA"
required output:
(IncidentId="2010002257") and Author="yuan" not Recipient="AAA"
This is the code I have so far:
public string translateSearchTerm(string searchTerm) {
string result = "";
result = Regex.Replace(searchTerm.ToLower(), @"(?<=field).*?(?=\=)", delegate(Match Match) {
string fieldId = Match.ToString();
return String.Format("_{0}", getFieldName(Convert.ToInt64(fieldId)));
});
log.Info(String.Format("result={0}", result));
return result;
}
which gives:
(field_IncidentId="2010002257") and field_Author="yuan" not field_Recipient="aaa"
The issues I would like to resolve are:
1. Remove the static "field" prefixes from the output.
2. Make the regex case-insensitive on the "FieldNN" parts and not lowercase the quoted text portions.
3. Make the regex more robust so that the quoted string parts can use either double or single quotes.
4. Make the regex more robust so that spaces are ignored: FieldNN = "AAA" vs. FieldNN="AAA"
I really only need to address the first issue, the other three would be a bonus but I could probably fix those once I have discovered the right patterns for whitespace and quotes.
Update
I think the pattern below solves issues 2. and 4.
result = Regex.Replace(searchTerm, @"(?<=\b(?i:field)).*?(?=\s*\=)", delegate(Match Match)
To fix the first issue, use a capturing group instead of a positive lookbehind:
public string translateSearchTerm(string searchTerm) {
string result = "";
result = Regex.Replace(searchTerm.ToLower(), @"field(.*?)(?=\=)", delegate(Match Match) {
string fieldId = Match.Groups[1].Value;
return getFieldName(Convert.ToInt64(fieldId));
});
log.Info(String.Format("result={0}", result));
return result;
}
In this case the "field" prefix is part of each match, so it is replaced as well.
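The remaining issues can be handled in the same call. The sketch below drops ToLower() so quoted values keep their case, matches the prefix case-insensitively, and allows spaces before the equals sign. The FieldNames dictionary is a hypothetical stand-in for the question's getFieldName, and the class name is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SearchTermTranslator
{
    // Hypothetical stand-in for the question's getFieldName lookup.
    private static readonly Dictionary<long, string> FieldNames =
        new Dictionary<long, string>
        {
            { 30, "IncidentId" }, { 1, "Author" }, { 28, "Recipient" }
        };

    public static string Translate(string searchTerm)
    {
        // \bfield(\d+) matches the prefix and captures the number; the lookahead
        // requires an '=' (optionally preceded by spaces) without consuming it.
        return Regex.Replace(searchTerm, @"\bfield(\d+)(?=\s*=)",
            m => FieldNames[Convert.ToInt64(m.Groups[1].Value)],
            RegexOptions.IgnoreCase);
    }
}
```

The quotes are never part of the match, so single and double quotes both pass through unchanged. In this sketch an unknown field number throws a KeyNotFoundException.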

Replace any string between quotes

Problem:
Cannot find a consistent way to replace a random string between quotes with a specific string I want. Any help would be greatly appreciated.
Example:
String str1 = "test=\"-1\"";
should become
String str2 = "test=\"31\"";
but also work for
String str3 = "test=\"foobar\"";
basically I want to turn this
String str4 = "test=\"antyhingCanGoHere\"";
into this
String str4 = "test=\"31\"";
Have tried:
Case insensitive Regex without using RegexOptions enumeration
How do you do case-insensitive string replacement using regular expressions?
Replace any character in between AnyText: and <usernameredacted#example.com> with an empty string using Regex?
Replace string in between occurrences
Replace a String between two Strings
Current code:
Regex RemoveName = new Regex("(?VARIABLE=\").*(?=\")", RegexOptions.IgnoreCase);
String convertSeccons = RemoveName.Replace(ruleFixed, "31");
Returns error:
System.ArgumentException was caught
Message=parsing "(?VARIABLE=").*(?=")" - Unrecognized grouping construct.
Source=System
StackTrace:
at System.Text.RegularExpressions.RegexParser.ScanGroupOpen()
at System.Text.RegularExpressions.RegexParser.ScanRegex()
at System.Text.RegularExpressions.RegexParser.Parse(String re, RegexOptions op)
at System.Text.RegularExpressions.Regex..ctor(String pattern, RegexOptions options, Boolean useCache)
at System.Text.RegularExpressions.Regex..ctor(String pattern, RegexOptions options)
at application.application.insertGroupID(String rule) in C:\Users\winserv8\Documents\Visual Studio 2010\Projects\application\application\MainFormLauncher.cs:line 298
at application.application.xmlqueryDB(String xmlSaveLocation, TextWriter tw, String ruleName) in C:\Users\winserv8\Documents\Visual Studio 2010\Projects\application\application\MainFormLauncher.cs:line 250
InnerException:
Found the answer:
string s = Regex.Replace(ruleFixed, "VARIABLE=\"(.*)\"", "VARIABLE=\"31\"");
ruleFixed = s;
I found this code sample at "Replace any character in between AnyText: and <usernameredacted#example.com> with an empty string using Regex?", which is one of the links I posted above. I had skipped over this syntax because I thought it wouldn't handle what I needed.
var str1 = "test=\"foobar\"";
var str2 = str1.Substring(0, str1.IndexOf("\"") + 1) + "31\"";
If needed add check for IndexOf != -1
I don't know if I understood you correctly, but if you want to replace all characters inside the quotes, why not use a simple regular expression?
String str = "test=\"-\"1\"";
Regex regExpr = new Regex("\".*\"", RegexOptions.IgnoreCase);
String result = regExpr.Replace(str , "\"31\"");
Console.WriteLine(result);
prints:
test="31"
Note: You can take advantage of plain old XAttribute
String ruleFixed = "test=\"-\"1\"";
var splited = ruleFixed.Split('=');
var attribute = new XAttribute(splited[0], splited[1]);
attribute.Value = "31";
Console.WriteLine(attribute);//prints test="31"
var parts = given.Split('=');
return string.Format("{0}=\"{1}\"", parts[0], replacement);
In the case that your string has other things in it besides just the key/value pair of key="value", then you need to make the value-match part not match quote marks, or it will match all the way from the first value to the last quote mark in the string.
If that is true, then try this:
Regex.Replace(ruleFixed, "(?<=VARIABLE\\s*=\\s*\")[^\"]*(?=\")", "31");
This uses a positive look-behind to match the VARIABLE=" part (with optional white space around the equals sign, so VARIABLE = " works as well) and a positive look-ahead to match the closing ", without including either in the final match, so you replace only the value.
If not, then your solution will work, but is not optimal because you have to repeat the value and the quote marks in the replace text.
Assuming that the string within the quotes does not contain quotes itself, you can use this general pattern in order to find a position between a prefix and a suffix:
(?<=prefix)find(?=suffix)
In your case
(?<=\w+=").*?(?=")
Here we are using the prefix \w+=" where \w+ denotes word characters (the variable) and =" are the equal sign and the quote.
We want to find anything .*? until we encounter the next quote.
The suffix is simply the quote ".
string result = Regex.Replace(input, "(?<=\\w+=\").*?(?=\")", replacement);
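A small variation on the pattern above is sketched here, assuming the values never contain quotes: [^"]* in place of .*? makes that assumption explicit, and a MatchEvaluator keeps a replacement containing '$' from being read as a substitution. The class and method names are illustrative.

```csharp
using System.Text.RegularExpressions;

public static class QuotedValue
{
    public static string ReplaceValues(string input, string replacement)
    {
        // Pattern: (?<=\w+=")[^"]*(?=")
        // [^"]* stops at the first closing quote, so several key="value"
        // pairs in one string are handled separately.
        return Regex.Replace(input, @"(?<=\w+="")[^""]*(?="")", m => replacement);
    }
}
```

Every key="value" pair in the input is rewritten. To target a single variable, replace \w+ with its escaped name.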
Try this:
[^"\r\n]*(?:""[\r\n]*)*
var pattern = "\"(.*)?\"";
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
var replacement = regex.Replace("test=\"hereissomething\"", "\"31\"");
String str1 = "test=\"-1\"";
string[] parts = str1.Split(new[] {'"'}, 3);
string str2 = parts.Length == 3 ? string.Join("\"", parts.First(), "31", parts.Last()) : str1;
String str1 = "test=\"-1\"";
string res = Regex.Replace(str1, "(^+\").+(\"+)", "$1" + "31" + "$2");
I'm pretty bad at regex, but you could write a simple extension method using string functions to do this.
public static class StringExtensions
{
public static string ReplaceBetweenQuotes(this string str, string replacement)
{
if (str.Count(c => c.Equals('"')) == 2)
{
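int start = str.IndexOf('"') + 1;
int end = str.LastIndexOf('"');
// Rebuild by position; string.Replace would also change the same text outside the quotes.
str = str.Substring(0, start) + replacement + str.Substring(end);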
}
return str;
}
}
Usage:
String str3 = "test=\"foobar\"";
str3 = str3.ReplaceBetweenQuotes("31");
returns: "test=\"31\""

C# regular expression to find custom markers and take content

I have a string:
productDescription
In it are some custom tags such as:
[MM][/MM]
For example the string might read:
This product is [MM]1000[/MM] long
Using a regular expression how can I find those MM tags, take the content of them and replace everything with another string? So for example the output should be:
This product is 10 cm long
I think you'll need to pass a delegate to the regex for that.
Regex theRegex = new Regex(@"\[MM\](\d+)\[/MM\]");
text = theRegex.Replace(text, delegate(Match thisMatch)
{
int mmLength = Convert.ToInt32(thisMatch.Groups[1].Value);
int cmLength = mmLength / 10;
return cmLength.ToString() + " cm";
});
Using RegexDesigner.NET:
using System.Text.RegularExpressions;
// Regex Replace code for C#
void ReplaceRegex()
{
// Regex search and replace
RegexOptions options = RegexOptions.None;
Regex regex = new Regex(@"\[MM\](?<value>.*)\[\/MM\]", options);
string input = @"[MM]1000[/MM]";
string replacement = @"10 cm";
string result = regex.Replace(input, replacement);
// TODO: Do something with result
System.Windows.Forms.MessageBox.Show(result, "Replace");
}
Or if you want the original text back in the replacement:
Regex regex = new Regex(@"\[MM\](?<theText>.*)\[\/MM\]", options);
string replacement = @"${theText} cm";
A regex like this
\[(\w+)\](\d+)\[\/\w+\]
will find and collect the units (like MM) and the values (like 1000). That would at least allow you to use the pairs of parts intelligently to do the conversion. You could then put the replacement string together, and do a straightforward string replacement, because you know the exact string you're replacing.
I don't think you can do a simple RegEx.Replace, because you don't know the replacement string at the point you do the search.
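A MatchEvaluator lets the replacement be computed per match, so the unit and the value can be combined in one pass. This is a sketch under assumptions: the Converters table, the class name UnitTags, and the backreference \1 (which requires the closing tag to repeat the opening tag's name) are illustrative, not part of the answers above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UnitTags
{
    // Hypothetical conversion table: tag name -> formatter for the tagged number.
    private static readonly Dictionary<string, Func<long, string>> Converters =
        new Dictionary<string, Func<long, string>>
        {
            { "MM", mm => (mm / 10) + " cm" } // integer division, as in the delegate answer
        };

    public static string Expand(string text)
    {
        // Group 1 is the tag name, group 2 the digits; \1 matches the same name again.
        return Regex.Replace(text, @"\[(\w+)\]([0-9]+)\[/\1\]", m =>
        {
            Func<long, string> convert;
            if (!Converters.TryGetValue(m.Groups[1].Value, out convert))
                return m.Value; // unknown tag: leave the text untouched
            return convert(long.Parse(m.Groups[2].Value));
        });
    }
}
```

With this table, "This product is [MM]1000[/MM] long" becomes "This product is 100 cm long"; tags without a converter are left as they are.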
Regex rex = new Regex(@"\[MM\]([0-9]+)\[\/MM\]");
string s = "This product is [MM]1000[/MM] long";
MatchCollection mc = rex.Matches(s);
Will match only integers.
mc[n].Groups[1].Value;
will then give the numeric part of nth match.
