remove text in between delimiters in a string - regex

remove text in between delimiters in a string - regex - c#

I have been trying real hard understanding regular expression, Is there any way I can replace character(s) that is between two regex/ For example I have
string datax = "a4726e1e-babb-4898-a5d5-e29d2bc40028;POPULATE DATA AØ99c1d133-15f5-4ef5-bc59- d9ed673b70c6;POPULATE DATA BØ";
how to remove string between regex ";" and "Ø" ???
i try to use code like this :
string xresult = Regex.Replace(datax, #"(?<=;)(\w+?)(?=Ø)", "");
But not working.
please corrected and give me solutions...
thanks...
i want the result like this sir :
string datax = "a4726e1e-babb-4898-a5d5-e29d2bc40028;Ø99c1d133-15f5-4ef5-bc59-d9ed673b70c6;Ø";

I think you need to understand regex a little better and how the replace function works. with regex you're defining capture groups, and with the replace function you want to replace those groups.
how to remove string between regex ";" and "Ø" ???
Step 1: First find ";",then capture all characters up to and including "Ø".
That's (;.*?Ø)
( New Capture Group
; Match ";"
. Match Anything
* Zero or more times
? Be Lazy
Ø Match "Ø"
) End Capture
Step 2: Replace each group with ";Ø"
public static string Replace(string input, string pattern, string
replacement)
So you need to put back the ";Ø" you removed from the original capture.
static void Test2()
{
foreach (string item in SO2588078())
{
Console.WriteLine(item);
}
string input = "a4726e1e-babb-4898-a5d5-e29d2bc40028;POPULATE DATA AØ99c1d133-15f5-4ef5-bc59- d9ed673b70c6;POPULATE DATA BØ";
string regex = "(;.*?Ø)";
string output = Regex.Replace(input, regex, ";Ø");
if (output == string.Join(";Ø", SO2588078()) + ";Ø")
{
Console.WriteLine("TRUE");
}
}
An alternative would be to parse the string without regex. It's a simple format and this gives you more control over the process so you can see what's happening, why it's gone wrong and why it gives the results it does. Since you can step through it.
private static IEnumerable<string> SO2588078()
{
string datax = "a4726e1e-babb-4898-a5d5-e29d2bc40028;POPULATE DATA AØ99c1d133-15f5-4ef5-bc59- d9ed673b70c6;POPULATE DATA BØ";
string temp = datax;
while (!string.IsNullOrEmpty(temp))
{
int index1 = temp.IndexOf(';');
if (index1 > -1)
{
string guid = temp.Remove(index1);
yield return guid;
int index2 = temp.IndexOf('Ø');
if (index2 > -1)
{
temp = temp.Substring(index2 + 1);
}
else
{
temp = null;
}
}
else
{
temp = null;
}
}
}

Related

How to do I cut off a certain part a String?

I have a big String in my program.
For Example:
String Newspaper = "...Blablabla... What do you like?...Blablabla... ";
Now I want to cut out the "What do you like?" an write it to a new String. But the problem is that the "Blablabla" is everytime something diffrent. Whit "cut out" I mean that you submit a start and a end word and all the things wrote between these lines should be in the new string. Because the sentence "What do you like?" changes sometimes except the start word "What" and the end word "like?"
Thanks for every responds

You can write the following method:
public static string CutOut(string s, string start, string end)
{
int startIndex = s.IndexOf(start);
if (startIndex == -1) {
return null;
}
int endIndex = s.IndexOf(end, startIndex);
if (endIndex == -1) {
return null;
}
return s.Substring(startIndex, endIndex - startIndex + end.Length);
}
It returns null if either the start or end pattern is not found. Only end patterns that follow the start pattern are searched for.
If you are working with C# 8+ and .NET Core 3.0+, you can also replace the last line with
return s[startIndex..(endIndex + end.Length)];
Test:
string input = "...Blablabla... What do you like?...Blablabla... ";
Console.WriteLine(CutOut(input, "What ", " like?"));
prints:
What do you like?
If you are happy with Regex, you can also write:
public static string CutOutRegex(string s, string start, string end)
{
Match match = Regex.Match(s, $#"\b{Regex.Escape(start)}.*{Regex.Escape(end)}");
if (match.Success) {
return match.Value;
}
return null;
}
The \b ensures that the start pattern is only found at the beginning of a word. You can drop it if you want. Also, if the end pattern occurs more than once, the result will include all of them unlike the first example with IndexOf which will only include the first one.

You have to do a substring, like the example below. See source for more information on substrings.
// A long string
string bio = "Mahesh Chand is a founder of C# Corner. Mahesh is also an
author, speaker, and software architect. Mahesh founded C# Corner in
2000.";
// Get first 12 characters substring from a string
string authorName = bio.Substring(0, 12);
Console.WriteLine(authorName);

In this case I would do it like this, cut the first part and then the second and concatenate with the fixed words using them as a parameter for cutting.
public string CutPhrase(string phrase)
{
var fst = "What";
var snd = "like?";
string[] cut1 = phrase.Split(new[] { fst }, StringSplitOptions.None);
string[] cut2 = cut1[1].Split(new[] { snd }, StringSplitOptions.None);
var rst = $"{fst} {cut2[0]} {snd}";
return rst;
}

Replacing anchor/link in text

I'm having issues doing a find / replace type of action in my function, i'm extracting the < a href="link">anchor from an article and replacing it with this format: [link anchor] the link and anchor will be dynamic so i can't hard code the values, what i have so far is:
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
string theString = string.Empty;
switch (articleWikiCheck) {
case "id|wpTextbox1":
StringBuilder newHtml = new StringBuilder(articleBody);
Regex r = new Regex(#"\<a href=\""([^\""]+)\"">([^<]+)");
string final = string.Empty;
foreach (var match in r.Matches(theString).Cast<Match>().OrderByDescending(m => m.Index))
{
string text = match.Groups[2].Value;
string newHref = "[" + match.Groups[1].Index + " " + match.Groups[1].Index + "]";
newHtml.Remove(match.Groups[1].Index, match.Groups[1].Length);
newHtml.Insert(match.Groups[1].Index, newHref);
}
theString = newHtml.ToString();
break;
default:
theString = articleBody;
break;
}
Helpers.ReturnMessage(theString);
return theString;
}
Currently, it just returns the article as it originally is, with the traditional anchor text format: < a href="link">anchor
Can anyone see what i have done wrong?
regards

If your input is HTML, you should consider using a corresponding parser, HtmlAgilityPack being really helpful.
As for the current code, it looks too verbose. You may use a single Regex.Replace to perform the search and replace in one pass:
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
if (articleWikiCheck == "id|wpTextbox1")
{
return Regex.Replace(articleBody, #"<a\s+href=""([^""]+)"">([^<]+)", "[$1 $2]");
}
else
{
// Helpers.ReturnMessage(articleBody); // Uncomment if it is necessary
return articleBody;
}
}
See the regex demo.
The <a\s+href="([^"]+)">([^<]+) regex matches <a, 1 or more whitespaces, href=", then captures into Group 1 any one or more chars other than ", then matches "> and then captures into Group 2 any one or more chars other than <.
The [$1 $2] replacement replaces the matched text with [, Group 1 contents, space, Group 2 contents and a ].

Updated (Corrected regex to support whitespaces and new lines)
You can try this expression
Regex r = new Regex(#"<[\s\n]*a[\s\n]*(([^\s]+\s*[ ]*=*[ ]*[\s|\n*]*('|"").*\3)[\s\n]*)*href[ ]*=[ ]*('|"")(?<link>.*)\4[.\n]*>(?<anchor>[\s\S]*?)[\s\n]*<\/[\s\n]*a>");
It will match your anchors, even if they are splitted into multiple lines. The reason why it is so long is because it supports empty whitespaces between the tags and their values, and C# does not supports subroutines, so this part [\s\n]* has to be repeated multiple times.
You can see a working sample at dotnetfiddle
You can use it in your example like this.
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
if (articleWikiCheck == "id|wpTextbox1")
{
return Regex.Replace(articleBody,
#"<[\s\n]*a[\s\n]*(([^\s]+\s*[ ]*=*[ ]*[\s|\n*]*('|"").*\3)[\s\n]*)*href[ ]*=[ ]*('|"")(?<link>.*)\4[.\n]*>(?<anchor>[\s\S]*?)[\s\n]*<\/[\s\n]*a>",
"[${link} ${anchor}]");
}
else
{
return articleBody;
}
}

C# - How can stop recursive function?

I want to replace all the word that start via # with another word, here is my code:
public string SemiFinalText { get; set; }
public string FinalText { get; set; }
//sample text : "aaaa bbbb #cccc dddd #eee fff g"
public string GetProperText(string text)
{
if (text.Contains('#'))
{
int index = text.IndexOf('#');
string restText = text.Substring(index);
var indexLast = restText.IndexOf(' ');
var oldName = text.Substring(index, indexLast);
string restText2 = text.Substring( index + indexLast);
SemiFinalText += text.Substring(0, index + indexLast).Replace(oldName, "#New");
if (restText2.Contains('#'))
{
GetProperText(restText2);
}
FinalText = SemiFinalText + restText2;
return FinalText;
}
else
{
return text;
}
}
When return FinalText; is executed I want to stop recursive function. How can fix it?
Maybe another approach is better than recursive function. If you know another way please give an answer to me.

You don't need a recursive solution for this problem. You have a string containing a number of words (separated by spaces) and you want to replace the ones starting with an '#' with another string. Modifying your solution to have a simple method that splits based on spaces, replaces all words starting with # and then combines them once again.
Using Linq:
string text = "aaaa bbbb #cccc dddd #eee fff g";
FinalText = GetProperText(text, "New");
public string GetProperText(string text, string replacewith)
{
text = string.Join(" ", text.Split(' ').Select(x => x.StartsWith("#") ? replacewith: x));
return text;
}
Output: aaaa bbbb New dddd New fff g
Using Regex:
Regex rgx = new Regex("#([^ #])*");
string result = rgx.Replace(text, replaceword);

Solution with Regular Expressions:
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
string pattern = #"#\w+";
var r = new Regex(pattern);
Console.WriteLine(r.Replace("ABC #ABC ABC #DEF klm.#bhsh", "BOOM!"));
}
}
This does not rely on space character being the delimiter, any non-word (letters and numbers) can be used to separate the 'words'. This example outputs:
ABC BOOM! ABC BOOM! klm.BOOM!
You can test it out here: https://dotnetfiddle.net/rZyjjg
If you're new to Regex: .NET Introduction to Regular Expressions

Here also the proper way to do it recursively for anyone interested. I think your stopping condition was actually oke, but you should concatenate the outcome of the recursive function call to the already processed text. Also I think that using global variables in a recursive function defeats its purpose a little bit.
That being said I think that using RegEx from one of the supplied answer is better and faster.
The recursive code:
//sample text : "aaaa bbbb #cccc dddd #eee fff g"
public string GetProperText(string text)
{
if (text.Contains('#'))
{
int index = text.IndexOf('#'); //Index of first occuring '#'
var indexLast = text.IndexOf(' ',index); //Index of first ' ' after '#'
var oldName = text.Substring(index, indexLast); //Old Name
string processedText = text.Substring(0, index + indexLast).Replace(oldName, "New"); //String with new name
string restText = text.Substring(indexLast); //Rest Text
if (text.Contains('#'))
{
//Here the outcome of the function is pasted on the allready processed text part.
text = processedText + GetProperText(restText);
}
return text;
}
else
{
return text;
}
}

replace a character in a string in c# based on position with a string

I want to replace a charecter in a string with a string in c#.
I have tried the following,
Here in the following program, i want replace set of charecters between charecters ':' and first occurance of '-' with some others charecters.
I could able to extract the set of charecters between ':' and first occurance of '-'.
Can any one say how to insert these back in the source string.
string source= "tcm:7-426-8";
string target= "tcm:10-15-2";
int fistunderscore = target.IndexOf("-");
string temp = target.Substring(4, fistunderscore-4);
Response.Write("<BR>"+"temp1:" + temp + "<BR>");
Examples:
source: "tcm:7-426-8" or "tcm:100-426-8" or "tcm:10-426-8"
Target: "tcm:10-15-2" or "tcm:5-15-2" or "tcm:100-15-2"
output: "tcm:10-426-8" or "tcm:5-426-8" or "tcm:100-426-8"
In a nutshell, I want to replace the set of charectes between ':' and '-'(firstoccurance) and the charecters extracetd from the same sort of string.
Can any help how it can be done.
Thank you.

If you want to replace the first ":Number-" from the source with the content from target, you can use the following regex.
var pattern1 = New Regex(":\d{1,3}-{1}");
if(pattern1.IsMatch(source) && pattern1.IsMatch(target))
{
var source = "tcm:7-426-8";
var target = "tcm:10-15-2";
var res = pattern1.Replace(source, pattern1.Match(target).Value);
// "tcm:10-426-8"
}
Edit: To not have your string replaced with something empty, add an if-clause before the actualy replacing.

Try a regex solution - first this method, takes the source and target strings, and performs a regex replace on the first, targetting the first numbers after the 'tcm', which must be anchored to the start of the string. In the MatchEvaluator it executes the same regex again, but on the target string.
static Regex rx = new Regex("(?<=^tcm:)[0-9]+", RegexOptions.Compiled);
public string ReplaceOneWith(string source, string target)
{
return rx.Replace(source, new MatchEvaluator((Match m) =>
{
var targetMatch = rx.Match(target);
if (targetMatch.Success)
return targetMatch.Value;
return m.Value; //don't replace if no match
}));
}
Note that no replacement is performed if the regex doesn't return a match on the target string.
Now run this test (probably need to copy the above into the test class):
[TestMethod]
public void SO9973554()
{
Assert.AreEqual("tcm:10-426-8", ReplaceOneWith("tcm:7-426-8", "tcm:10-15-2"));
Assert.AreEqual("tcm:5-426-8", ReplaceOneWith("tcm:100-426-8", "tcm:5-15-2"));
Assert.AreEqual("tcm:100-426-8", ReplaceOneWith("tcm:10-426-8", "tcm:100-15-2"));
}

I'm not clear on the logic used to decide which bit from which string is used, but still, you should use Split(), rather than mucking about with string offsets:
(note that the Remove(0,4) is there to remove the tcm: prefix)
string[] source = "tcm:90-2-10".Remove(0,4).Split('-');
string[] target = "tcm:42-23-17".Remove(0,4).Split('-');
Now you have the numbers from both source and target in easy-to-access arrays, so you can build the new string any way you want:
string output = string.Format("tcm:{0}-{1}-{2}", source[0], target[1], source[2]);

Heres without regex
string source = "tcm:7-426-8";
string target = "tcm:10-15-2";
int targetBeginning = target.IndexOf("-");
int sourceBeginning = source.IndexOf("-");
string temp = target.Substring(0, targetBeginning);//tcm:10
string result = temp + source.Substring(sourceBeginning, source.Length-sourceBeginning); //tcm:10 + -426-8

Finding the number of occurences strings in a specific format occur in a given text

I have a large string, where there can be specific words (text followed by a single colon, like "test:") occurring more than once. For example, like this:
word:
TEST:
word:
TEST:
TEST: // random text
"word" occurs twice and "TEST" occurs thrice, but the amount can be variable. Also, these words don't have to be in the same order and there can be more text in the same line as the word (as shown in the last example of "TEST"). What I need to do is append the occurrence number to each word, for example the output string needs to be this:
word_ONE:
TEST_ONE:
word_TWO:
TEST_TWO:
TEST_THREE: // random text
The RegEx for getting these words which I've written is ^\b[A-Za-z0-9_]{4,}\b:. However, I don't know how to accomplish the above in a fast way. Any ideas?

Regex is perfect for this job - using Replace with a match evaluator:
This example is not tested nor compiled:
public class Fix
{
public static String Execute(string largeText)
{
return Regex.Replace(largeText, "^(\w{4,}):", new Fix().Evaluator);
}
private Dictionary<String, int> counters = new Dictionary<String, int>();
private static String[] numbers = {"ONE", "TWO", "THREE",...};
public String Evaluator(Match m)
{
String word = m.Groups[1].Value;
int count;
if (!counters.TryGetValue(word, out count))
count = 0;
count++;
counters[word] = count;
return word + "_" + numbers[count-1] + ":";
}
}
This should return what you requested when calling:
result = Fix.Execute(largeText);

i think you can do this with Regax.Replace(string, string, MatchEvaluator) and a dictionary.
Dictionary<string, int> wordCount=new Dictionary<string,int>();
string AppendIndex(Match m)
{
string matchedString = m.ToString();
if(wordCount.Contains(matchedString))
wordCount[matchedString]=wordCount[matchedString]+1;
else
wordCount.Add(matchedString, 1);
return matchedString + "_"+ wordCount.ToString();// in the format: word_1, word_2
}
string inputText = "....";
string regexText = #"";
static void Main()
{
string text = "....";
string result = Regex.Replace(text, #"^\b[A-Za-z0-9_]{4,}\b:",
new MatchEvaluator(AppendIndex));
}
see this:
http://msdn.microsoft.com/en-US/library/cft8645c(v=VS.80).aspx

If I understand you correctly, regex is not necessary here.
You can split your large string by the ':' character. Maybe you also need to read line by line (split by '\n'). After that you just create a dictionary (IDictionary<string, int>), which counts the occurrences of certain words. Every time you find word x, you increase the counter in the dictionary.
EDIT
Read your file line by line OR split the string by '\n'
Check if your delimiter is present. Either by splitting by ':' OR using regex.
Get the first item from the split array OR the first match of your regex.
Use a dictionary to count your occurrences.
if (dictionary.Contains(key)) dictionary[key]++;
else dictionary.Add(key, 1);
If you need words instead of numbers, then create another dictionary for these. So that dictionary[key] equals one if key equals 1. Mabye there is another solution for that.

Look at this example (I know it's not perfect and not so nice)
lets leave the exact argument for the Split function, I think it can help
static void Main(string[] args)
{
string a = "word:word:test:-1+234=567:test:test:";
string[] tks = a.Split(':');
Regex re = new Regex(#"^\b[A-Za-z0-9_]{4,}\b");
var res = from x in tks
where re.Matches(x).Count > 0
select x + DecodeNO(tks.Count(y=>y.Equals(x)));
foreach (var item in res)
{
Console.WriteLine(item);
}
Console.ReadLine();
}
private static string DecodeNO(int n)
{
switch (n)
{
case 1:
return "_one";
case 2:
return "_two";
case 3:
return "_three";
}
return "";
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

remove text in between delimiters in a string - regex - c#

Related

How to do I cut off a certain part a String?

Replacing anchor/link in text

C# - How can stop recursive function?

replace a character in a string in c# based on position with a string

Finding the number of occurences strings in a specific format occur in a given text

Categories

Resources