Removing text between 2 strings - c#

I tried to write a function in C# which removes the string between two strings. Like this:
string RemoveBetween(string sourceString, string startTag, string endTag)
At first I thought this would be easy, but after some time I ran into more and more problems.
So this is the easy case (All examples with startTag="Start" and endTag="End")
"Any Text Start remove this End between" => "Any Text StartEnd between"
But it should also be able to handle multiples without deleting the text between:
"Any Text Start remove this End between should be still there Start and remove this End multiple" => "Any Text StartEnd between should be still there StartEnd multiple"
It should always take the smallest string to remove:
"So Start followed by Start only remove this End other stuff" => "So Start followed by StartEnd other stuff"
It should also respect the order of the tags:
"the End before Start. Start before End is correct" => "the End before Start. StartEnd is correct"
I tried a RegEx which did not work (It could not handle multiples):
public string RemoveBetween(string sourceString, string startTag, string endTag)
{
    Regex regex = new Regex(string.Format("{0}(.*){1}", Regex.Escape(startTag), Regex.Escape(endTag)));
    return regex.Replace(sourceString, string.Empty);
}
And then I tried to work with IndexOf and Substring, but I do not see an end to it. And even if it worked, this can't be the most elegant way to solve this.

Here is an approach with string.Remove():
string input = "So Start followed by Start only remove this End other stuff";
int start = input.LastIndexOf("Start") + "Start".Length;
int end = input.IndexOf("End", start);
string result = input.Remove(start, end - start);
I use LastIndexOf() because there can be multiple starts and you want the last one. Note that this handles only a single Start/End pair, and it throws if no End follows the last Start.

You must slightly modify your function to do a non-greedy match with ? and use RegexOptions.RightToLeft to work with all your examples:
public static string RemoveBetween(string sourceString, string startTag, string endTag)
{
    Regex regex = new Regex(string.Format("{0}(.*?){1}", Regex.Escape(startTag), Regex.Escape(endTag)), RegexOptions.RightToLeft);
    return regex.Replace(sourceString, startTag+endTag);
}
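For comparison, here is a left-to-right sketch of the same idea. It treats the four examples from the question as the whole specification, and the class name TagRemover is made up for illustration. Instead of RegexOptions.RightToLeft it uses a negative lookahead so that a match can never run across a second start tag, and a MatchEvaluator so that a $ inside the tags is not read as a substitution.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; not part of the original answer.
public static class TagRemover
{
    public static string RemoveBetween(string source, string startTag, string endTag)
    {
        string start = Regex.Escape(startTag);
        string end = Regex.Escape(endTag);

        // (?:(?!start).)*? consumes characters lazily but refuses to step
        // onto another start tag, so only the innermost pair is matched.
        string pattern = start + "(?:(?!" + start + ").)*?" + end;

        // Singleline lets '.' match newlines; the evaluator returns the
        // two tags literally, dropping everything between them.
        return Regex.Replace(source, pattern, m => startTag + endTag, RegexOptions.Singleline);
    }
}
```

Called as TagRemover.RemoveBetween(text, "Start", "End"), this should give the expected result for each of the four example strings in the question.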

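You can use this:
public static string Remove(string original, string firstTag, string secondTag)
{
    // Escape the tags so regex metacharacters in them are matched literally.
    string pattern = Regex.Escape(firstTag) + "(.*?)" + Regex.Escape(secondTag);
    Regex regex = new Regex(pattern, RegexOptions.RightToLeft);
    // Matches come back right to left, so removing by index keeps the earlier indices valid.
    foreach (Match match in regex.Matches(original))
    {
        Group inner = match.Groups[1];
        original = original.Remove(inner.Index, inner.Length);
    }
    return original;
}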

string data = "text start this is my text end text";
string startTag = "start";
string endTag = "end";
int startIndex = data.IndexOf(startTag)+ startTag.Length;
Console.WriteLine(data.Substring(startIndex, data.IndexOf(endTag)-startIndex));

Or you could try to use LINQ, as shown here:
public static string Remove(this string s, IEnumerable<char> chars)
{
return new string(s.Where(c => !chars.Contains(c)).ToArray());
}


Find Regex Expression only after finding a given String

I am trying to create a method that receives a text and a string and uses a regex to find the datetime associated with the given string.
I don't know the position of the regex match. It can be anywhere and can change over time, since the text is editable. The following example has 3 entries, but there could be 10, 25 or even 100.
At the moment, I have a method that finds a datetime; however, it returns the first match in the text and not the one after the given string.
private static DateTime getLastExecutionTime(string text, string nameFile)
{
string lastRun = string.Empty;
if (Regex.IsMatch(text, nameFile))
{
lastRun = Regex.Match(text, "[0-9]{2}-[0-9]{2}-[0-9]{4} [0-9]{2}:[0-9]{2}").ToString();
return DateTime.Parse(lastRun);
}
return nullDate;
}
=============
INPUT EXAMPLE
=============
text = "Cat 01-08-2019 16:32\r\nDog 03-08-2019 12:32\r\nBear 13-07-2019 19:22"
nameFile = "Dog"
===============
EXPECTED OUTPUT
===============
lastRun = "03-08-2019 12:32"
An option will be to remove all the text before your nameFile by using Substring and IndexOf.
private static DateTime getLastExecutionTime(string text, string nameFile)
{
string lastRun = string.Empty;
if (Regex.IsMatch(text, nameFile))
{
lastRun = Regex.Match(text.Substring(text.IndexOf(nameFile)), " [0-9]{2}-[0-9]{2}-[0-9]{4} [0-9]{2}:[0-9]{2}").ToString();
return DateTime.Parse(lastRun);
}
return new DateTime();
}
You can also use a full regex solution:
private static DateTime getLastExecutionTime(string text, string nameFile)
{
string lastRun = Regex.Match(text, "(?:" + nameFile + ") ([0-9]{2}-[0-9]{2}-[0-9]{4} [0-9]{2}:[0-9]{2})").Groups[1].Value;
if (string.IsNullOrEmpty(lastRun))
return new DateTime();
return DateTime.Parse(lastRun);
}
Try this expression:
Regex.Match(text, "^Dog (.+)$", RegexOptions.Multiline).Groups[1].Value
For your input in a text variable the output is: 03-08-2019 12:32
If you need to parametrize it, go ahead:
Regex.Match(text, $"^{query} (.+)$", RegexOptions.Multiline).Groups[1].Value
But make sure you're receiving the query from a trusted source, to prevent injection attacks.
You can quickly test expression here: https://regex101.com/
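One way to address the injection concern is to pass the name through Regex.Escape before building the pattern. The sketch below does that and parses the captured text with an explicit format. The class name ExecutionLog, the nullable return type and the day-first format "dd-MM-yyyy HH:mm" are assumptions for illustration (the sample value 13-07-2019 suggests day first). The optional \r before $ is there because, with RegexOptions.Multiline, $ matches just before \n, which would otherwise leave a trailing \r in the capture for \r\n-separated input.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper; names and the date format are assumptions.
public static class ExecutionLog
{
    public static DateTime? GetLastExecutionTime(string text, string nameFile)
    {
        // Escape the name so characters such as '.' or '(' are matched literally.
        string pattern = "^" + Regex.Escape(nameFile)
            + @" ([0-9]{2}-[0-9]{2}-[0-9]{4} [0-9]{2}:[0-9]{2})\r?$";

        Match match = Regex.Match(text, pattern, RegexOptions.Multiline);
        if (!match.Success)
            return null;

        DateTime parsed;
        bool ok = DateTime.TryParseExact(
            match.Groups[1].Value,
            "dd-MM-yyyy HH:mm",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out parsed);

        // Null also covers text that looks like a date but is not one, e.g. 99-99-2019.
        return ok ? parsed : (DateTime?)null;
    }
}
```

Returning null instead of a sentinel such as new DateTime() lets the caller tell "not found" apart from a real value.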
We can try doing a one-liner regex replacement using the following pattern:
^[\s\S]*([0-9]{2}-[0-9]{2}-[0-9]{4} [0-9]{2}:[0-9]{2})[\s\S]*$
Script:
string text = "Cat 01-08-2019 16:32\r\nDog 03-08-2019 12:32";
string pattern = @"^[\s\S]*([0-9]{2}-[0-9]{2}-[0-9]{4} [0-9]{2}:[0-9]{2})[\s\S]*$";
string output = Regex.Replace(text, pattern, "$1");
Console.WriteLine(output);
This prints:
03-08-2019 12:32
If your date recogniser is the name of the file, you need to add it to the regular expression.
In this example, Regex.Match will find a date preceded by "Dog".
var match = Regex.Match(text, "Dog ([0-9]{2}-[0-9]{2}-[0-9]{4} [0-9]{2}:[0-9]{2})");
Or in a more generic way:
var match = Regex.Match(text, nameFile + " ([0-9]{2}-[0-9]{2}-[0-9]{4} [0-9]{2}:[0-9]{2})");
In this case, you need to access group 1 of the match result. The group at index 0 is the entire match, including the name.
match.Groups[1].Value
More about MatchCollection:
MSDN

Remove last occurrence of a string in a string

I have a string that is of nature
RTT(50)
RTT(A)(50)
RTT(A)(B)(C)(50)
What I want is to remove the last () occurrence from the string. That is, if the string is RTT(50), then I want only RTT returned. If it is RTT(A)(50), I want RTT(A) returned, etc.
How do I achieve this? I currently use a substring method that takes out any occurrence of the () regardless. I thought of using:
Regex.Matches(node.Text, "( )").Count
To count the number of occurrences so I did something like below.
if(Regex.Matches(node.Text, "( )").Count > 1)
//value = node.Text.Remove(Regex.//Substring(1, node.Text.IndexOf(" ("));
else
value = node.Text.Substring(0, node.Text.IndexOf(" ("));
The else part will do what I want. However, how to remove the last occurrence in the if part is where I am stuck.
The String.LastIndexOf method does what you need - returns the last index of a char or string.
If you're sure that every string will have at least one set of parentheses:
var result = node.Text.Substring(0, node.Text.LastIndexOf("("));
Otherwise, you could test the result of LastIndexOf:
var lastParenSet = node.Text.LastIndexOf("(");
var result =
    node.Text.Substring(0, lastParenSet > -1 ? lastParenSet : node.Text.Length);
This should do what you want :
your_string = your_string.Remove(your_string.LastIndexOf(string_to_remove));
It's that simple.
There are a couple of different options to consider.
LastIndexOf
Get the last index of the ( character and take the substring up to that index. The downside of this approach is an additional last index check for ) would be needed to ensure that the format is correct and that it's a pair with the closing parenthesis occurring after the opening parenthesis (I did not perform this check in the code below).
var index = input.LastIndexOf('(');
if (index >= 0)
{
var result = input.Substring(0, index);
Console.WriteLine(result);
}
Regex with RegexOptions.RightToLeft
By using RegexOptions.RightToLeft we can grab the last index of a pair of parentheses.
var pattern = @"\(.+?\)";
var match = Regex.Match(input, pattern, RegexOptions.RightToLeft);
if (match.Success)
{
var result = input.Substring(0, match.Index);
Console.WriteLine(result);
}
else
{
Console.WriteLine(input);
}
Regex depending on numeric format
If you're always expecting the final parentheses to have numeric content, similar to your example values where (50) is getting removed, we can use a pattern that matches any numbers inside parentheses.
var patternNumeric = @"\(\d+\)";
var result = Regex.Replace(input, patternNumeric, "");
Console.WriteLine(result);
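The pairing check that the LastIndexOf option leaves out can be folded into the pattern itself. The sketch below assumes the groups are never nested, as in the question's samples, and the names ParenTrimmer and RemoveLastGroup are made up for illustration. It matches only a complete (...) pair, searches from the right, and removes just that pair.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; assumes parentheses are not nested.
public static class ParenTrimmer
{
    public static string RemoveLastGroup(string input)
    {
        // [^()]* keeps the match inside one pair; RightToLeft finds the last pair first.
        Match match = Regex.Match(input, @"\([^()]*\)", RegexOptions.RightToLeft);

        // Leave the input untouched when there is no complete pair.
        return match.Success ? input.Remove(match.Index, match.Length) : input;
    }
}
```

Unlike the Substring versions, this keeps any text that follows the last pair and returns the input unchanged when no complete pair exists.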
It's very simple. You can easily achieve it like this:
string a = "RTT(50)";
string res = a.Substring(0, a.LastIndexOf("("));
As an extension:
namespace CustomExtensions
{
public static class StringExtension
{
public static string ReplaceLastOf(this string str, string fromStr, string toStr)
{
int lastIndexOf = str.LastIndexOf(fromStr);
if (lastIndexOf < 0)
return str;
string leading = str.Substring(0, lastIndexOf);
int charsToEnd = str.Length - (lastIndexOf + fromStr.Length);
string trailing = str.Substring(lastIndexOf+fromStr.Length, charsToEnd);
return leading + toStr + trailing;
}
}
}
Use:
string myFavColor = "My favourite color is blue";
string newFavColor = myFavColor.ReplaceLastOf("blue", "red");
Try a function like this:
public static string ReplaceLastOccurrence(string source, string find, string replace)
{
    int place = source.LastIndexOf(find);
    // Nothing to replace if the search string does not occur.
    if (place < 0)
        return source;
    return source.Remove(place, find.Length).Insert(place, replace);
}
It removes the last occurrence of a string and puts another one in its place. Use it like this:
string result = ReplaceLastOccurrence(value, "(", string.Empty);
In this case, it finds the last ( in value and replaces it with string.Empty. It can also be used to replace it with other text.

regex replace matches with function and delete other matches

I have a string like the one below and I want to replace the FieldNN instances with the output from a function.
So far I have been able to replace the NN instances with the output from the function. But I am not sure how I can delete the static "field" portion with the same regex.
input string:
(Field30="2010002257") and Field1="yuan" not Field28="AAA"
required output:
(IncidentId="2010002257") and Author="yuan" not Recipient="AAA"
This is the code I have so far:
public string translateSearchTerm(string searchTerm) {
string result = "";
result = Regex.Replace(searchTerm.ToLower(), @"(?<=field).*?(?=\=)", delegate(Match Match) {
string fieldId = Match.ToString();
return String.Format("_{0}", getFieldName(Convert.ToInt64(fieldId)));
});
log.Info(String.Format("result={0}", result));
return result;
}
which gives:
(field_IncidentId="2010002257") and field_Author="yuan" not field_Recipient="aaa"
The issues I would like to resolve are:
1. Remove the static "field" prefixes from the output.
2. Make the regex case-insensitive on the "FieldNN" parts and not lowercase the quoted text portions.
3. Make the regex more robust so that the quoted string parts can use either double or single quotes.
4. Make the regex more robust so that spaces are ignored: FieldNN = "AAA" vs. FieldNN="AAA"
I really only need to address the first issue, the other three would be a bonus but I could probably fix those once I have discovered the right patterns for whitespace and quotes.
Update
I think the pattern below solves issues 2. and 4.
result = Regex.Replace(searchTerm, @"(?<=\b(?i:field)).*?(?=\s*\=)", delegate(Match Match)
To fix the first issue, use a group instead of a positive lookbehind:
public string translateSearchTerm(string searchTerm) {
string result = "";
result = Regex.Replace(searchTerm.ToLower(), @"field(.*?)(?=\=)", delegate(Match Match) {
string fieldId = Match.Groups[1].Value;
return getFieldName(Convert.ToInt64(fieldId));
});
log.Info(String.Format("result={0}", result));
return result;
}
In this case the "field" prefix is part of each match, so it is replaced along with the number.
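The group-based fix can be combined with the case and whitespace handling from the question's update. The following is only a sketch: the FieldNames table is a hypothetical stand-in for the question's getFieldName, and the class and method names are made up. Because only the FieldNN token is matched, the quoted values keep their case and their quote style.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical translator; FieldNames stands in for the question's getFieldName.
public static class SearchTermTranslator
{
    private static readonly Dictionary<long, string> FieldNames = new Dictionary<long, string>
    {
        { 30, "IncidentId" },
        { 1, "Author" },
        { 28, "Recipient" }
    };

    public static string Translate(string searchTerm)
    {
        // "field" plus digits, any case, followed by optional spaces and '='.
        // The lookahead keeps the spaces and '=' out of the replaced text.
        return Regex.Replace(searchTerm, @"\bfield([0-9]+)(?=\s*=)", match =>
        {
            long id;
            string name;
            if (long.TryParse(match.Groups[1].Value, out id) && FieldNames.TryGetValue(id, out name))
                return name;

            // Unknown ids are left as they were.
            return match.Value;
        }, RegexOptions.IgnoreCase);
    }
}
```

The input is no longer lowercased up front; RegexOptions.IgnoreCase applies to the pattern only, so "AAA" stays "AAA".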

Replace any string between quotes

Problem:
Cannot find a consistent way to replace a random string between quotes with a specific string I want. Any help would be greatly appreciated.
Example:
String str1 = "test=\"-1\"";
should become
String str2 = "test=\"31\"";
but also work for
String str3 = "test=\"foobar\"";
basically I want to turn this
String str4 = "test=\"antyhingCanGoHere\"";
into this
String str4 = "test=\"31\"";
Have tried:
Case insensitive Regex without using RegexOptions enumeration
How do you do case-insensitive string replacement using regular expressions?
Replace any character in between AnyText: and <usernameredacted#example.com> with an empty string using Regex?
Replace string in between occurrences
Replace a String between two Strings
Current code:
Regex RemoveName = new Regex("(?VARIABLE=\").*(?=\")", RegexOptions.IgnoreCase);
String convertSeccons = RemoveName.Replace(ruleFixed, "31");
Returns error:
System.ArgumentException was caught
Message=parsing "(?VARIABLE=").*(?=")" - Unrecognized grouping construct.
Source=System
StackTrace:
at System.Text.RegularExpressions.RegexParser.ScanGroupOpen()
at System.Text.RegularExpressions.RegexParser.ScanRegex()
at System.Text.RegularExpressions.RegexParser.Parse(String re, RegexOptions op)
at System.Text.RegularExpressions.Regex..ctor(String pattern, RegexOptions options, Boolean useCache)
at System.Text.RegularExpressions.Regex..ctor(String pattern, RegexOptions options)
at application.application.insertGroupID(String rule) in C:\Users\winserv8\Documents\Visual Studio 2010\Projects\application\application\MainFormLauncher.cs:line 298
at application.application.xmlqueryDB(String xmlSaveLocation, TextWriter tw, String ruleName) in C:\Users\winserv8\Documents\Visual Studio 2010\Projects\application\application\MainFormLauncher.cs:line 250
InnerException:
Found the answer:
string s = Regex.Replace(ruleFixed, "VARIABLE=\"(.*)\"", "VARIABLE=\"31\"");
ruleFixed = s;
I found this code sample at Replace any character in between AnyText: and with an empty string using Regex?, which is one of the links I previously posted. I had skipped over this syntax because I thought it wouldn't handle what I needed.
var str1 = "test=\"foobar\"";
var str2 = str1.Substring(0, str1.IndexOf("\"") + 1) + "31\"";
If needed add check for IndexOf != -1
I don't know if I understood you correctly, but if you want to replace all chars inside the quotes, why aren't you using a simple regular expression?
String str = "test=\"-\"1\"";
Regex regExpr = new Regex("\".*\"", RegexOptions.IgnoreCase);
String result = regExpr.Replace(str , "\"31\"");
Console.WriteLine(result);
prints:
test="31"
Note: You can take advantage of plain old XAttribute
String ruleFixed = "test=\"-\"1\"";
var splited = ruleFixed.Split('=');
var attribute = new XAttribute(splited[0], splited[1]);
attribute.Value = "31";
Console.WriteLine(attribute);//prints test="31"
var parts = given.Split('=');
return string.Format("{0}=\"{1}\"", parts[0], replacement);
In the case that your string has other things in it besides just the key/value pair of key="value", then you need to make the value-match part not match quote marks, or it will match all the way from the first value to the last quote mark in the string.
If that is true, then try this:
Regex.Replace(ruleFixed, "(?<=VARIABLE\\s*=\\s*\")[^\"]*(?=\")", "31");
This uses a positive look-behind to match the VARIABLE=" part (with optional white space around the equals sign, so VARIABLE = " works as well) and a positive look-ahead to match the closing ", without including either in the final match, so you replace only the value.
If not, then your solution will work, but is not optimal because you have to repeat the value and the quote marks in the replace text.
Assuming that the string within the quotes does not contain quotes itself, you can use this general pattern in order to find a position between a prefix and a suffix:
(?<=prefix)find(?=suffix)
In your case
(?<=\w+=").*?(?=")
Here we are using the prefix \w+=" where \w+ denotes word characters (the variable) and =" are the equal sign and the quote.
We want to find anything .*? until we encounter the next quote.
The suffix is simply the quote ".
string result = Regex.Replace(input, "(?<=\\w+=\").*?(?=\")", replacement);
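A parameterised sketch of the lookbehind/lookahead approach follows. It assumes the value itself contains no quotes, and the names QuotedValue and Set are made up for illustration. The variable name is escaped before it goes into the pattern, and a MatchEvaluator supplies the new value so that a $ in it is inserted literally rather than read as a substitution.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; assumes the quoted value contains no quote characters.
public static class QuotedValue
{
    public static string Set(string input, string name, string newValue)
    {
        // Look-behind: name, optional spaces, '=', optional spaces, opening quote.
        // [^"]* stops at the closing quote, so later pairs are not swallowed.
        string pattern = "(?<=\\b" + Regex.Escape(name) + "\\s*=\\s*\")[^\"]*(?=\")";

        // The evaluator returns newValue as-is, with no substitution parsing.
        return Regex.Replace(input, pattern, m => newValue);
    }
}
```

For example, QuotedValue.Set("test=\"foobar\"", "test", "31") is intended to yield test="31" while leaving any other key="value" pairs in the string alone.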
Try this:
[^"\r\n]*(?:""[\r\n]*)*
var pattern = "\"(.*)?\"";
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
var replacement = regex.Replace("test=\"hereissomething\"", "\"31\"");
String str1 = "test=\"-1\"";
string[] parts = str1.Split(new[] {'"'}, 3);
string str2 = parts.Length == 3 ? string.Join("\"", parts.First(), "31", parts.Last()) : str1;
String str1 = "test=\"-1\"";
string res = Regex.Replace(str1, "(^+\").+(\"+)", "$1" + "31" + "$2");
I'm pretty bad at regex, but you could make a simple extension method using string functions to do this.
public static class StringExtensions
{
public static string ReplaceBetweenQuotes(this string str, string replacement)
{
if (str.Count(c => c.Equals('"')) == 2)
{
int start = str.IndexOf('"') + 1;
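// Rebuild around the quotes instead of calling Replace, which would also change other occurrences of the same text.
str = str.Substring(0, start) + replacement + str.Substring(str.LastIndexOf('"'));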
}
return str;
}
}
Usage:
String str3 = "test=\"foobar\"";
str3 = str3.ReplaceBetweenQuotes("31");
returns: "test=\"31\""

Replace strings in file

I have to do a replacement in the following manner:
if the string is "string _countryCode", I have to replace it with "string _sCountryCode".
As you can see, wherever there is _ I replace it with _s followed by the next character in capitals, i.e. _sC.
More examples:
string _postalCode is to be replaced with string _sPostalCode
string _firstName is to be replaced with string _sFirstName
Please help.
Preferably answer in C# syntax.
Not sure I understand why, but perhaps something like:
static readonly Regex hungarian =
    new Regex(@"(string\s+_)([a-z])", RegexOptions.Compiled);
...
string text = ...
string newText = hungarian.Replace(text, match =>
match.Groups[1].Value + "s" +
match.Groups[2].Value.ToUpper());
Note that the regex won't necessarily spot examples such as (valid C#):
string
_name = "abc";
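Since the question is about a file, here is a minimal sketch that wraps the same pattern in a read-transform-write step. The class FieldRenamer and its method names are made up for illustration, and the whole file is assumed to fit in memory.

```csharp
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern from the answer above.
public static class FieldRenamer
{
    private static readonly Regex Declaration =
        new Regex(@"\b(string\s+_)([a-z])", RegexOptions.Compiled);

    // Turns "string _name" into "string _sName"; other text is untouched.
    public static string AddPrefix(string source)
    {
        return Declaration.Replace(source, m =>
            m.Groups[1].Value + "s" + m.Groups[2].Value.ToUpperInvariant());
    }

    // Reads the file, rewrites the declarations and saves it in place.
    public static void RewriteFile(string path)
    {
        File.WriteAllText(path, AddPrefix(File.ReadAllText(path)));
    }
}
```

Two caveats with this sketch: it renames only the declarations, not later uses of the field, and it is not idempotent, because a second run sees the lower-case s in _sCountryCode and produces _sSCountryCode.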
If the strings follow the pattern you have shown, you do not need a regex. You can do this with string methods and a StringBuilder.
string concat = "news_india"; // or textbox1.Text
int indexs = concat.LastIndexOf("_") + 1; // index of the character after "_"
string find_lower = concat.Substring(indexs, 1);
string find_upper = find_lower.ToUpper(); // convert to upper case
StringBuilder ss = new StringBuilder(concat);
ss.Remove(indexs, 1); // drop the lower-case character
ss.Insert(indexs, "s" + find_upper); // "s" can be any prefix you like
// ss.ToString() is now "news_sIndia"
Try this; it should work.
