Remove last occurrence of a string in a string - c#

I have strings of the following form:
RTT(50)
RTT(A)(50)
RTT(A)(B)(C)(50)
What I want is to remove the last (...) group from the string. That is, if the string is RTT(50), I want only RTT returned; if it is RTT(A)(50), I want RTT(A) returned, and so on.
How do I achieve this? I currently use a substring method that takes out any occurrence of the () regardless. I thought of using:
Regex.Matches(node.Text, "( )").Count
to count the number of occurrences, so I did something like the following:
if(Regex.Matches(node.Text, "( )").Count > 1)
//value = node.Text.Remove(Regex.//Substring(1, node.Text.IndexOf(" ("));
else
value = node.Text.Substring(0, node.Text.IndexOf(" ("));
The else part will do what I want. However, how to remove the last occurrence in the if part is where I am stuck.

The String.LastIndexOf method does what you need: it returns the index of the last occurrence of a char or string.
If you're sure that every string will have at least one set of parentheses:
var result = node.Text.Substring(0, node.Text.LastIndexOf("("));
Otherwise, you could test the result of LastIndexOf:
var lastParenSet = node.Text.LastIndexOf("(");
var result =
    node.Text.Substring(0, lastParenSet > -1 ? lastParenSet : node.Text.Length);

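This should do what you want:
your_string = your_string.Remove(your_string.LastIndexOf(string_to_remove));
It's that simple. Note that Remove(int) deletes everything from that index to the end of the string, and it throws if LastIndexOf returns -1.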

There are a couple of different options to consider.
LastIndexOf
Get the last index of the ( character and take the substring up to that index. The downside of this approach is an additional last index check for ) would be needed to ensure that the format is correct and that it's a pair with the closing parenthesis occurring after the opening parenthesis (I did not perform this check in the code below).
var index = input.LastIndexOf('(');
if (index >= 0)
{
var result = input.Substring(0, index);
Console.WriteLine(result);
}
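The pairing check mentioned above could be sketched as follows. This is only an illustration, not part of the original answer: the names ParenGroups and RemoveLastGroup are made up for the example, and it assumes that an unpaired ( should leave the input unchanged.

```csharp
using System;

public static class ParenGroups
{
    // Hypothetical helper: removes the last "(...)" group, but only when the
    // last '(' is actually followed by a ')'.
    public static string RemoveLastGroup(string input)
    {
        int open = input.LastIndexOf('(');
        if (open < 0)
            return input;               // no opening parenthesis at all

        int close = input.IndexOf(')', open + 1);
        if (close < 0)
            return input;               // unpaired '(': assumed to mean "leave as-is"

        // Remove from '(' through ')' inclusive; any text after ')' is kept.
        return input.Remove(open, close - open + 1);
    }
}
```

With the question's inputs, ParenGroups.RemoveLastGroup("RTT(A)(B)(C)(50)") gives RTT(A)(B)(C) and ParenGroups.RemoveLastGroup("RTT(50)") gives RTT. A string without parentheses comes back unchanged.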
Regex with RegexOptions.RightToLeft
By using RegexOptions.RightToLeft we can grab the last index of a pair of parentheses.
var pattern = @"\(.+?\)";
var match = Regex.Match(input, pattern, RegexOptions.RightToLeft);
if (match.Success)
{
var result = input.Substring(0, match.Index);
Console.WriteLine(result);
}
else
{
Console.WriteLine(input);
}
Regex depending on numeric format
If you're always expecting the final parentheses to have numeric content, similar to your example values where (50) is getting removed, we can use a pattern that matches any numbers inside parentheses.
var patternNumeric = @"\(\d+\)";
var result = Regex.Replace(input, patternNumeric, "");
Console.WriteLine(result);
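Note that the pattern above removes every numeric group in the string, not only the last one. If only a trailing group should go, one option is to anchor the pattern to the end of the input. A minimal sketch, assuming the group to remove always sits at the very end of the string and contains no nested parentheses (TrailingGroup and Strip are illustrative names, not from the answer):

```csharp
using System.Text.RegularExpressions;

public static class TrailingGroup
{
    // \(      literal opening parenthesis
    // [^()]*  anything except parentheses, so only one un-nested group matches
    // \)      literal closing parenthesis
    // $       the group must sit at the end of the input
    private static readonly Regex Trailing = new Regex(@"\([^()]*\)$");

    public static string Strip(string input)
    {
        return Trailing.Replace(input, string.Empty);
    }
}
```

For example, TrailingGroup.Strip("RTT(A)(50)") returns RTT(A), while a string that does not end in a parenthesized group is returned unchanged.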

It's very simple. You can do it like this:
string a = "RTT(50)";
string res = a.Substring(0, a.LastIndexOf("("));

As an extension method:
namespace CustomExtensions
{
public static class StringExtension
{
public static string ReplaceLastOf(this string str, string fromStr, string toStr)
{
int lastIndexOf = str.LastIndexOf(fromStr);
if (lastIndexOf < 0)
return str;
string leading = str.Substring(0, lastIndexOf);
int charsToEnd = str.Length - (lastIndexOf + fromStr.Length);
string trailing = str.Substring(lastIndexOf+fromStr.Length, charsToEnd);
return leading + toStr + trailing;
}
}
}
Use:
string myFavColor = "My favourite color is blue";
string newFavColor = myFavColor.ReplaceLastOf("blue", "red");

Try a function like this:
public static string ReplaceLastOccurrence(string source, string find, string replace)
{
    int place = source.LastIndexOf(find);
    if (place == -1)
        return source; // nothing to replace
    return source.Remove(place, find.Length).Insert(place, replace);
}
It replaces the last occurrence of one string with another. Use it like this:
string result = ReplaceLastOccurrence(value, "(", string.Empty);
In this case, it finds the last ( in value and replaces it with string.Empty. It can also be used to substitute other text.

Related

Remove list of words from string

I have a list of words that I want to remove from a string. I use the following method:
string stringToClean = "The.Flash.2014.S07E06.720p.WEB-DL.HEVC.x265.RMTeam";
string[] BAD_WORDS = {
"720p", "web-dl", "hevc", "x265", "Rmteam", "."
};
var cleaned = string.Join(" ", stringToClean.Split(' ').Where(w => !BAD_WORDS.Contains(w, StringComparer.OrdinalIgnoreCase)));
But it is not working, and the following text is output:
The.Flash.2014.S07E06.720p.WEB-DL.HEVC.x265.RMTeam
For this it would be a good idea to create a reusable method that splits a string into words. I'll do this as an extension method of string. If you are not familiar with extension methods, read extension methods demystified
public static IEnumerable<string> ToWords(this string text)
{
// TODO implement
}
Usage will be as follows:
string text = "This is some wild text!";
List<string> words = text.ToWords().ToList();
var first3Words = text.ToWords().Take(3);
var lastWord = text.ToWords().LastOrDefault();
Once you've got this method, the solution to your problem will be easy:
IEnumerable<string> badWords = ...
string inputText = ...
IEnumerable<string> validWords = inputText.ToWords().Except(badWords);
Or maybe you want to use Except(badWords, StringComparer.OrdinalIgnoreCase);
The implementation of ToWords depends on what you would call a word: everything delimited by a dot? or do you want to support whitespaces? or maybe even new-lines?
The implementation for your problem: A word is any sequence of characters delimited by a dot.
public static IEnumerable<string> ToWords(this string text)
{
    // find the next dot:
    const char dot = '.';
    int startIndex = 0;
    int dotIndex = text.IndexOf(dot, startIndex);
    while (dotIndex != -1)
    {
        // found a dot, return the substring up to the dot:
        int wordLength = dotIndex - startIndex;
        yield return text.Substring(startIndex, wordLength);
        // find the next dot
        startIndex = dotIndex + 1;
        dotIndex = text.IndexOf(dot, startIndex);
    }
    // read until the end of the text. Return everything after the last dot:
    yield return text.Substring(startIndex);
}
TODO:
Decide what you want to return if text starts with a dot ".ABC.DEF".
Decide what you want to return if the text ends with a dot: "ABC.DEF."
Check if the return value is what you want if text is empty.
Your split/join don't match up with your input.
That said, here's a quick one-liner:
string clean = BAD_WORDS.Aggregate(stringToClean, (acc, word) => acc.Replace(word, string.Empty));
This is basically a "reduce". Not fantastically performant but over strings that are known to be decently small I'd consider it acceptable. If you have to use a really large string or a really large number of "words" you might look at another option but it should work for the example case you've given us.
Edit: The downside of this approach is that you'll get partial matches. For example, your token array has "720p", but the code I suggested here will also match inside "720px". There are ways around it: instead of string's implementation of Replace you could use a regex that matches your delimiters, something like Regex.Replace(acc, $"[. ]{word}([. ])", "$1") (regex not confirmed, but it should be close; I added a capture for the delimiter in order to put it back for the next pass).
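For the sample input in the question, the tokens are separated by dots rather than spaces, and string.Replace(string, string) is case-sensitive, so "web-dl" does not match "WEB-DL". A sketch of a token-based alternative, assuming a dot is the only delimiter and the surviving words should be joined with spaces (WordCleaner and Clean are made-up names for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCleaner
{
    public static string Clean(string text, IEnumerable<string> badWords)
    {
        // Case-insensitive lookup set, so "web-dl" matches "WEB-DL".
        var bad = new HashSet<string>(badWords, StringComparer.OrdinalIgnoreCase);

        // Split on dots (assumed delimiter) and keep only whole tokens
        // that are not in the bad-word set.
        var kept = text
            .Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !bad.Contains(word));

        return string.Join(" ", kept);
    }
}
```

Called with the question's string and the words "720p", "web-dl", "hevc", "x265" and "Rmteam", this returns The Flash 2014 S07E06. Because whole tokens are compared, a token such as "720px" would not be removed.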

Removing text between 2 strings

I tried to write a function in C# which removes the string between two strings. Like this:
string RemoveBetween(string sourceString, string startTag, string endTag)
At first I thought this would be easy, but after some time I encountered more and more problems.
So this is the easy case (All examples with startTag="Start" and endTag="End")
"Any Text Start remove this End between" => "Any Text StartEnd between"
But it should also be able to handle multiples without deleting the text between:
"Any Text Start remove this End between should be still there Start and remove this End multiple" => "Any Text StartEnd between should be still there StartEnd multiple"
It should always take the smallest string to remove:
"So Start followed by Start only remove this End other stuff" => "So Start followed by StartEnd other stuff"
It should also respect the order of the tags:
"the End before Start. Start before End is correct" => "the End before Start. StartEnd is correct"
I tried a RegEx which did not work (It could not handle multiples):
public string RemoveBetween(string sourceString, string startTag, string endTag)
{
Regex regex = new Regex(string.Format("{0}(.*){1}", Regex.Escape(startTag), Regex.Escape(endTag)));
return regex.Replace(sourceString, string.Empty);
}
And then I tried to work with IndexOf and Substring, but I do not see an end to it. And even if it worked, this can't be the most elegant way to solve this.
Here is an approach with string.Remove():
string input = "So Start followed by Start only remove this End other stuff";
int start = input.LastIndexOf("Start") + "Start".Length;
int end = input.IndexOf("End", start);
string result = input.Remove(start, end - start);
I use LastIndexOf() because there can be multiple starts and you want to have the last one.
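The snippet above handles a single Start/End pair. One way the same LastIndexOf idea might be extended to several pairs is sketched below. It is an illustration rather than the answerer's code: TagText is a made-up class name, and it assumes both tags are non-empty and compared ordinally.

```csharp
using System;
using System.Text;

public static class TagText
{
    public static string RemoveBetween(string source, string startTag, string endTag)
    {
        if (string.IsNullOrEmpty(startTag) || string.IsNullOrEmpty(endTag))
            throw new ArgumentException("Tags must be non-empty.");

        var result = new StringBuilder();
        int pos = 0; // everything before pos has already been copied

        while (true)
        {
            int end = source.IndexOf(endTag, pos, StringComparison.Ordinal);
            if (end == -1)
                break;

            // Look only at the text between pos and this endTag, and take the
            // LAST startTag in it so that the smallest span is removed.
            string window = source.Substring(pos, end - pos);
            int start = window.LastIndexOf(startTag, StringComparison.Ordinal);

            if (start == -1)
                result.Append(window);   // endTag with no startTag before it: keep the text
            else
                result.Append(window, 0, start + startTag.Length); // keep up to and including startTag

            result.Append(endTag);
            pos = end + endTag.Length;
        }

        result.Append(source, pos, source.Length - pos); // tail after the last endTag
        return result.ToString();
    }
}
```

Run against the four examples in the question with startTag "Start" and endTag "End", this produces the expected strings, including "the End before Start. StartEnd is correct" for the ordering case.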
You must slightly modify your function to do a non-greedy match with ? and use RegexOptions.RightToLeft for it to work with all your examples:
public static string RemoveBetween(string sourceString, string startTag, string endTag)
{
Regex regex = new Regex(string.Format("{0}(.*?){1}", Regex.Escape(startTag), Regex.Escape(endTag)), RegexOptions.RightToLeft);
return regex.Replace(sourceString, startTag+endTag);
}
You can use this:
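public static string Remove(string original, string firstTag, string secondTag)
{
    // Escape the tags so regex metacharacters in them are matched literally.
    string pattern = Regex.Escape(firstTag) + "(.*?)" + Regex.Escape(secondTag);
    Regex regex = new Regex(pattern, RegexOptions.RightToLeft);
    // Matches arrive right to left, so removing by index keeps the
    // positions of the remaining (earlier) matches valid.
    foreach (Match match in regex.Matches(original))
    {
        Group inner = match.Groups[1];
        original = original.Remove(inner.Index, inner.Length);
    }
    return original;
}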
string data = "text start this is my text end text";
string startTag = "start";
string endTag = "end";
int startIndex = data.IndexOf(startTag)+ startTag.Length;
Console.WriteLine(data.Substring(startIndex, data.IndexOf(endTag)-startIndex));
Or you could try to use LINQ, as shown here:
public static string Remove(this string s, IEnumerable<char> chars)
{
return new string(s.Where(c => !chars.Contains(c)).ToArray());
}

Get string between strings in c#

I am trying to get string between same strings:
The texts starts here ** Get This String ** Some other text ongoing here.....
I am wondering how to get the string between the stars. Should I use a regex or other functions?
You can try Split:
string source =
"The texts starts here** Get This String **Some other text ongoing here.....";
// 3: we need 3 chunks and we'll take the middle (1) one
string result = source.Split(new string[] { "**" }, 3, StringSplitOptions.None)[1];
You can use IndexOf to do the same without regular expressions.
This one will return the first occurrence of a string between two "**" with whitespace trimmed. It also checks for the case where no string matches this condition.
public string FindTextBetween(string text, string left, string right)
{
    // TODO: Validate input arguments
    int beginIndex = text.IndexOf(left); // find occurrence of left delimiter
    if (beginIndex == -1)
        return string.Empty; // or throw exception?
    beginIndex += left.Length;
    int endIndex = text.IndexOf(right, beginIndex); // find occurrence of right delimiter
    if (endIndex == -1)
        return string.Empty; // or throw exception?
    return text.Substring(beginIndex, endIndex - beginIndex).Trim();
}
string str = "The texts starts here ** Get This String ** Some other text ongoing here.....";
string result = FindTextBetween(str, "**", "**");
I usually prefer to not use regex whenever possible.
If you want to use regex, this could do:
.*\*\*(.*)\*\*.*
The first and only capture has the text between stars.
Another option would be using IndexOf to find the position of the first star, check if the following character is a star too and then repeat that for the second set. Substring the part between those indexes.
If you can have multiple pieces of text to find in one string, you can use following regex:
\*\*(.*?)\*\*
Sample code:
string data = "The texts starts here ** Get This String ** Some other text ongoing here..... ** Some more text to find** ...";
Regex regex = new Regex(@"\*\*(.*?)\*\*");
MatchCollection matches = regex.Matches(data);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[1].Value);
}
You could use split but this would only work if there is 1 occurrence of the word.
Example:
string output = "";
string input = "The texts starts here **Get This String **Some other text ongoing here..";
var splits = input.Split( new string[] { "**", "**" }, StringSplitOptions.None );
//Check if the index is available
//if there are no '**' in the string the [1] index will fail
if ( splits.Length >= 2 )
output = splits[1];
Console.Write( output );
Console.ReadKey();
You can use Substring for this:
string s = "The texts starts here ** Get This String ** Some other text ongoing here";
s = s.Substring(s.IndexOf("**") + 2);
s = s.Substring(0, s.IndexOf("**"));

how to remove special char from the string and make new string?

I have a string 4(4X),4(4N),3(3X) from this string I want to make string 4,4,3. If I am getting the string 4(4N),3(3A),2(2X) then I want to make my string 4,3,2.
Please someone tell me how can I solve my problem.
This LINQ query selects a substring from each part of the input string, from the beginning up to the first opening parenthesis:
string input = "4(4N),3(3A),2(2X)";
string result = String.Join(",", input.Split(',')
.Select(s => s.Substring(0, s.IndexOf('('))));
// 4,3,2
This may help:
string inputString = "4(4X),4(4N),3(3X)";
string[] temp = inputString.Split(',');
List<string> result = new List<string>();
foreach (string item in temp)
{
result.Add(item.Split('(')[0]);
}
var whatYouNeed = string.Join(",", result);
You can use regular expressions
String input = @"4(4X),4(4N),3(3X)";
String pattern = @"(\d)\(\1.\)";
// ( ) - first group.
// \d - one digit.
// \( and \) - literal parentheses.
// \1 - repeats whatever the first group matched.
String result = Regex.Replace(input, pattern, "$1");
// $1 means that each match is replaced by the first group.
// result = 4,4,3
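If the text inside the parentheses does not have to repeat the leading digit, a looser pattern that simply deletes every parenthesized group may be enough. A sketch under that assumption (GroupStripper and StripGroups are hypothetical names, not from the answers):

```csharp
using System.Text.RegularExpressions;

public static class GroupStripper
{
    public static string StripGroups(string input)
    {
        // \( and \) are literal parentheses; [^)]* is everything up to the next ')'.
        return Regex.Replace(input, @"\([^)]*\)", string.Empty);
    }
}
```

GroupStripper.StripGroups("4(4X),4(4N),3(3X)") returns 4,4,3, and the second input from the question, 4(4N),3(3A),2(2X), becomes 4,3,2.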

Extracting string between two characters?

I want to extract the email IDs between < and >.
For example:
input string : "abc" <abc@gmail.com>; "pqr" <pqr@gmail.com>;
output string : abc@gmail.com;pqr@gmail.com
Without regex, you can use this:
public static string GetStringBetweenCharacters(string input, char charFrom, char charTo)
{
int posFrom = input.IndexOf(charFrom);
if (posFrom != -1) //if found char
{
int posTo = input.IndexOf(charTo, posFrom + 1);
if (posTo != -1) //if found char
{
return input.Substring(posFrom + 1, posTo - posFrom - 1);
}
}
return string.Empty;
}
And then:
GetStringBetweenCharacters("\"abc\" <abc@gmail.com>;", '<', '>')
and you will get
abc@gmail.com
string input = @"""abc"" <abc@gmail.com>; ""pqr"" <pqr@gmail.com>;";
var output = String.Join(";", Regex.Matches(input, @"\<(.+?)\>")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value));
Tested
string input = "\"abc\" <abc@gmail.com>; \"pqr\" <pqr@gmail.com>;";
string matchedValuesConcatenated = string.Join(";",
    Regex.Matches(input, @"(?<=<)([^>]+)(?=>)")
        .Cast<Match>()
        .Select(m => m.Value));
(?<=<) is a non capturing look behind so < is part of the search but not included in the output
The capturing group is anything not > one or more times
You can also use non-capturing groups, @"(?:<)([^>]+)(?:>)", in which case the address is in m.Groups[1].Value rather than m.Value.
The answer from LB +1 is also correct. I just did not realize it was correct until I wrote an answer myself.
Use the String.IndexOf(char, int) method to search for < starting at a given index in the string (e.g. the last index that you found a > character at, i.e. at the end of the previous e-mail address - or 0 when looking for the first address).
Write a loop that repeats for as long as you find another < character, and every time you find a < character, look for the next > character. Use the String.Substring(int, int) method to extract the e-mail address, whose start and end positions are then known to you.
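The loop described above might look like the following sketch. The names AngleBrackets and Extract are made up for the example, and it assumes that an opening < without a closing > simply ends the search.

```csharp
using System;
using System.Collections.Generic;

public static class AngleBrackets
{
    public static List<string> Extract(string input)
    {
        var found = new List<string>();
        int searchFrom = 0; // 0 for the first address, then just past the previous '>'

        while (true)
        {
            int open = input.IndexOf('<', searchFrom);
            if (open == -1)
                break;                       // no further '<'

            int close = input.IndexOf('>', open + 1);
            if (close == -1)
                break;                       // unterminated '<': assumed to end the search

            // Text strictly between '<' and '>'.
            found.Add(input.Substring(open + 1, close - open - 1));
            searchFrom = close + 1;
        }

        return found;
    }
}
```

Joining the result with string.Join(";", AngleBrackets.Extract(input)) for the question's input gives abc@gmail.com;pqr@gmail.com.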
Could use the following regex and some linq.
var regex = new Regex(@"\<(.*?)\>");
var input = @"""abc"" <abc@gmail.com>; ""pqr"" <pqr@gmail.com>";
var matches = regex.Matches(input);
var res = string.Join(";", matches.Cast<Match>().Select(x => x.Value.Replace("<","").Replace(">","")).ToArray());
The <> brackets get removed afterwards, you could also integrate it into Regex I guess.
string str = "\"abc\" <abc@gmail.com>; \"pqr\" <pqr@gmail.com>;";
string output = string.Empty;
int open;
while ((open = str.IndexOf('<')) != -1)
{
    int close = str.IndexOf('>', open + 1);
    if (close == -1)
        break; // no closing bracket left
    output += str.Substring(open + 1, close - open - 1) + ";";
    str = str.Substring(close + 1); // continue after the '>'
}
output = output.TrimEnd(';');
