Getting Text between tags

Getting Text between tags - c#

Hey, I have an input string that looks like this:
Just a test Post [c] hello world [/c]
The output should be:
hello world
Can anybody help?
I tried to use:
Regex regex = new Regex("[c](.*)[/c]");
var v = regex.Match(post.Content);
string s = v.Groups[1].ToString();

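You can do this without Regex. Consider this extension method:
// Extension methods must be declared in a non-nested static class (StringComparison needs using System;).
public static string GetStrBetweenTags(this string value,
string startTag,
string endTag)
{
int start = value.IndexOf(startTag, StringComparison.Ordinal);
if (start < 0)
return null;
start += startTag.Length;
// Search for the end tag only after the start tag, so an end tag that appears first does not cause an exception.
int end = value.IndexOf(endTag, start, StringComparison.Ordinal);
if (end < 0)
return null;
return value.Substring(start, end - start);
}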
and use it:
string s = "Just a test Post [c] hello world [/c] ";
string res = s.GetStrBetweenTags("[c]", "[/c]"); // res == " hello world " (call Trim() to remove the surrounding spaces)
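The extension method above returns only the first pair. If a post may contain several [c]...[/c] blocks, the same IndexOf idea can be repeated in a loop. A minimal sketch, assuming ordinal (case-sensitive) tag matching; the TagText class and AllBetweenTags name are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class TagText
{
    // Returns the text of every startTag...endTag pair, scanning left to right.
    public static List<string> AllBetweenTags(string value, string startTag, string endTag)
    {
        var results = new List<string>();
        int pos = 0;
        while (true)
        {
            int start = value.IndexOf(startTag, pos, StringComparison.Ordinal);
            if (start < 0)
                break; // no more start tags
            start += startTag.Length;
            int end = value.IndexOf(endTag, start, StringComparison.Ordinal);
            if (end < 0)
                break; // unclosed tag: stop scanning
            results.Add(value.Substring(start, end - start));
            pos = end + endTag.Length; // continue after this end tag
        }
        return results;
    }
}
```

Usage: TagText.AllBetweenTags("[c]a[/c] x [c]b[/c]", "[c]", "[/c]") yields "a" and "b".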

In a regex,
[character_group]
means: match any single character in character_group. So [c] matches the letter c, and [/c] matches either / or c, not the literal tags.
Note that \, *, +, ?, |, {, [, (, ), ^, $, ., # and white space can have special meaning (the last two only with RegexOptions.IgnorePatternWhitespace), so to match them literally you have to escape them with \:
\[c\](.*)\[/c\]
The backslash character \ in a regular expression indicates that the character that follows it either is a special character or should be interpreted literally.
So your code should work correctly if you change your regex (use a verbatim @"..." string so C# does not treat the backslashes as string escape sequences):
Regex regex = new Regex(@"\[c\](.*)\[/c\]");
var v = regex.Match(post.Content);
string s = v.Groups[1].ToString();
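If the tags come from variables rather than being typed into the pattern, Regex.Escape can do the escaping for you. A minimal sketch, assuming you want the first pair only and that the content may span lines (hence RegexOptions.Singleline); TagRegex and Between are hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class TagRegex
{
    // Builds the pattern from literal tag strings. Regex.Escape turns "[c]" into "\[c]",
    // so the bracket is matched literally instead of opening a character class.
    public static string Between(string input, string startTag, string endTag)
    {
        string pattern = Regex.Escape(startTag) + "(.*?)" + Regex.Escape(endTag);
        Match m = Regex.Match(input, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null; // null when no pair is found
    }
}
```

The lazy (.*?) stops at the first end tag after the start tag.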

Change your code to:
Regex regex = new Regex(@"\[c\](.*)\[/c\]");
var v = regex.Match(post.Content);
string s = v.Groups[1].Value;
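One thing to watch with (.*) is that it is greedy: if a post contains two [c]...[/c] blocks, it runs to the last [/c]. A small sketch contrasting it with the lazy (.*?) form; the class and method names are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // .* consumes as much as possible, then backtracks to the LAST [/c].
    public static string Greedy(string input) =>
        Regex.Match(input, @"\[c\](.*)\[/c\]").Groups[1].Value;

    // .*? consumes as little as possible, so it stops at the FIRST [/c].
    public static string Lazy(string input) =>
        Regex.Match(input, @"\[c\](.*?)\[/c\]").Groups[1].Value;
}
```

For "[c]a[/c] and [c]b[/c]", Greedy returns "a[/c] and [c]b" while Lazy returns "a".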

Piggybacking on @horgh's answer, this adds an inclusive/exclusive option:
public static string ExtractBetween(this string str, string startTag, string endTag, bool inclusive)
{
string rtn = null;
int s = str.IndexOf(startTag);
if (s >= 0)
{
if(!inclusive)
s += startTag.Length;
int e = str.IndexOf(endTag, s);
if (e >= 0)
{
if (inclusive)
e += endTag.Length; // include the end tag itself
rtn = str.Substring(s, e - s);
}
}
return rtn;
}

Are you looking for something like this?
var regex = new Regex(@"(?<=\[c\]).*?(?=\[/c\])");
foreach(Match match in regex.Matches(someString))
Console.WriteLine(match.Value);

This code also handles identical opening and closing tags (the end tag is searched for only after the start tag) and can ignore tag case (pass StringComparison.OrdinalIgnoreCase):
public static string GetTextBetween(this string value, string startTag, string endTag, StringComparison stringComparison = StringComparison.CurrentCulture)
{
if (!string.IsNullOrEmpty(value))
{
int tagIndex = value.IndexOf(startTag, stringComparison);
if (tagIndex >= 0) // check before adding the tag length, otherwise "not found" looks like a valid index
{
int startIndex = tagIndex + startTag.Length;
var endIndex = value.IndexOf(endTag, startIndex, stringComparison);
if (endIndex >= 0)
{
return value.Substring(startIndex, endIndex - startIndex);
}
}
}
return null;
}

Related

Is it possible to remove every occurrence of a character and one character after it from an entire string using regex?

Is it possible to remove every occurrence of a character plus the one after it from an entire string?
Here's code that achieves what I described, but it is rather long:
private static string FilterText(string text){
string filteredText = text;
while (true)
{
int comaIndex = filteredText.IndexOf('.');
if (comaIndex == -1)
{
break;
}
else if (comaIndex > 1)
{
filteredText = filteredText.Substring(0, comaIndex) + filteredText.Substring(comaIndex + 2);
}
else if (comaIndex == 1)
{
filteredText = filteredText[0] + filteredText.Substring(comaIndex + 2);
}
else
{
filteredText = filteredText.Substring(comaIndex + 2);
}
}
return filteredText;
}
For example, this code turns the input .Otest.1 .2.,string.w.. into test string.
Is it possible to achieve the same result using regex?

You want to use
var output = Regex.Replace(text, @"\..", "", RegexOptions.Singleline);
Details:
\. - matches a dot
. - matches any char, including a line feed char, because of the RegexOptions.Singleline option.
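A self-contained version of that one-liner, wrapped in the same FilterText name the question uses (the DotFilter class is hypothetical), so it can be checked against the question's example:

```csharp
using System.Text.RegularExpressions;

public static class DotFilter
{
    // Removes every '.' together with the character that follows it.
    // RegexOptions.Singleline lets the second '.' in the pattern also consume a '\n'.
    public static string FilterText(string text) =>
        Regex.Replace(text, @"\..", "", RegexOptions.Singleline);
}
```

DotFilter.FilterText(".Otest.1 .2.,string.w..") returns "test string". Unlike the loop in the question, a lone '.' at the very end is simply left in place instead of making Substring throw.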

Try this pattern: (?<!\.)\w+ (a run of word characters whose first character is not directly preceded by a dot)
code:
using System;
using System.Text.RegularExpressions;
public class Test{
public static void Main(){
string str = ".Otest.1 .2.,string.w..";
Console.WriteLine(FilterText(str));
}
private static string FilterText(string text){
string pattern = @"(?<!\.)\w+";
string result = "";
foreach(Match m in Regex.Matches(text, pattern)){
result += m.Value + " ";
}
return result.TrimEnd(); // drop the trailing space
}
}
output:
test string

How to replace sections of a string enclosed by square brackets with data stored in an array?

I have created a program which takes a string (e.g. "[2*4]x + [3/2]x"), finds all the instances of text within square brackets and places them in an array, 'matches'. It then strips off the square brackets and, using the Flee library, evaluates each string (e.g. 2*4) and stores the result in an array, 'answers'. I now need a way to replace the items within square brackets in the original string with the items in the 'answers' array, but I am not sure how to do this.
public string result(string solution,int num1, int num2, int num3,int num4)
{
Regex regex = new Regex(@"\[.*?\]");
MatchCollection matches = regex.Matches(solution);
int count = matches.Count;
int [] answers = new int [count];
for (int i = 0; i < count; i++)
{
string match = matches[i].Value;
match = match.Replace("[", "");
match = match.Replace("]", "");
Console.WriteLine(match);
ExpressionOptions options = new ExpressionOptions();
options.Imports.AddType(typeof(System.Math));
ExpressionOwner owner = new ExpressionOwner();
owner.a = num1;
owner.b = num2;
owner.c = num3;
owner.d = num4;
Expression expressionmethod = new Expression(match, owner, options);
try
{
ExpressionEvaluator<int> evaluator = (ExpressionEvaluator<int>)expressionmethod.Evaluator;
int result = evaluator();
answers[i] = result;
}
catch
{
ExpressionEvaluator<double> evaluator = (ExpressionEvaluator<double>)expressionmethod.Evaluator;
double result = evaluator();
answers[i] = Convert.ToInt32(result);
}
}
}

You can use Regex.Replace with a callback as the replacement argument: inside it you can do whatever you need with the match value, and whatever it returns is put back into the resulting string. Capture the text between the square brackets so you don't have to strip the brackets from the match value yourself.
Here is the code (the callback is written as a lambda so it can use num1 to num4):
public string result(string solution, int num1, int num2, int num3, int num4)
{
return Regex.Replace(solution, @"\[(.*?)]", m =>
{
string match = m.Groups[1].Value;
Console.WriteLine(match);
ExpressionOptions options = new ExpressionOptions();
options.Imports.AddType(typeof(System.Math));
ExpressionOwner owner = new ExpressionOwner();
owner.a = num1;
owner.b = num2;
owner.c = num3;
owner.d = num4;
Expression expressionmethod = new Expression(match, owner, options);
try
{
ExpressionEvaluator<int> evaluator = (ExpressionEvaluator<int>)expressionmethod.Evaluator;
return evaluator().ToString();
}
catch
{
// Not an int expression: evaluate it as a double instead.
ExpressionEvaluator<double> evaluator = (ExpressionEvaluator<double>)expressionmethod.Evaluator;
return evaluator().ToString();
}
});
}
The \[(.*?)] regex matches [, then matches and captures any 0+ chars other than a newline, as few as possible, and then matches a ] char. So the text between [...] is in m.Groups[1].Value, which the callback then evaluates.
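If the answers have already been computed into an array, as in the question, the same callback technique can simply hand them out in order. A minimal sketch, assuming answers holds at least one entry per bracketed section; BracketReplacer and ReplaceWithAnswers are hypothetical names and no Flee code is involved:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BracketReplacer
{
    // Replaces the n-th [...] section with answers[n], in order of appearance.
    public static string ReplaceWithAnswers(string solution, IReadOnlyList<string> answers)
    {
        int n = 0; // index of the next answer to use; captured by the lambda
        return Regex.Replace(solution, @"\[(.*?)]", m => answers[n++]);
    }
}
```

Usage: BracketReplacer.ReplaceWithAnswers("[2*4]x + [3/2]x", new[] { "8", "1.5" }) returns "8x + 1.5x".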

As well as getting the matches, use regex.Split with the same pattern to get the text between the matches.
Then it's just a case of interleaving them with your results to build your result string.
Right after MatchCollection matches = regex.Matches(solution); add the line:
string[] otherStuff = regex.Split(solution);
Debug.Assert(otherStuff.Length == matches.Count + 1); // Optional obviously. Regex can be weird though, I'd check it just in case.
Then you can just do
StringBuilder finalResult = new StringBuilder();
finalResult.Append(otherStuff[0]);
for (int i = 0; i < count; i++)
{
finalResult.Append(answers[i]);
finalResult.Append(otherStuff[i+1]);
}
and finalResult.ToString() gives you what you need (StringBuilder lives in System.Text).
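Put together as one method, the Split approach looks roughly like this. It is a sketch assuming int answers, one per bracketed section; SplitRebuilder and Rebuild are hypothetical names:

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class SplitRebuilder
{
    // Interleaves the text outside the brackets with the computed answers.
    public static string Rebuild(string solution, int[] answers)
    {
        var regex = new Regex(@"\[.*?\]");
        // The pattern has no capturing groups, so Split returns only the text between
        // matches: always one more piece than there are matches (possibly empty strings).
        string[] otherStuff = regex.Split(solution);
        var finalResult = new StringBuilder(otherStuff[0]);
        for (int i = 0; i < otherStuff.Length - 1; i++)
        {
            finalResult.Append(answers[i]);
            finalResult.Append(otherStuff[i + 1]);
        }
        return finalResult.ToString();
    }
}
```

For "[2*4]x + [3/2]x", Split yields "", "x + " and "x", so Rebuild with { 8, 2 } returns "8x + 2x".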

Changing a specific part of a string

In C#, I have a string in the following format:
a number|a number|a number,a number
for example: 1|2|3,4
I consider each number a different part of the string: in the previous example, 1 is the first part, 2 is the second, and so on.
I want to be able to replace a specific part of the string, given the index of the part I want to change.
It's not that hard to do with String.Split, but the part with the comma makes it tedious, since then I need to check whether the index is 3 or 4 and also split on the comma.
Is there a more elegant way to replace a specific part of the string? Maybe with a regular expression?
EDIT: Some requirements I didn't mention before:
What if I want to, for example, take the 3rd part of the string and replace it with that number plus 2? For example, turn 1|2|3,4 into 1|2|5,4, where the 5 is NOT a constant but depends on the input string.

You can create the following method
static string Replace(string input, int index, string replacement)
{
int matchIndex = 0;
return Regex.Replace(input, @"\d+", m => matchIndex++ == index ? replacement : m.Value);
}
Usage:
string input = "1|2|3,4";
string output = Replace(input, 1, "hello"); // "1|hello|3,4"
As Eric Herlitz suggested, you can use another regex: a negated character class of the delimiters. For example, if you expect , and | delimiters, you can replace \d+ with the [^,|]+ regex. If you expect ,, | and # delimiters, you can use the [^,|#]+ regex.
If you need to do some mathematical operations, you're free to do so:
static string Replace(string input, int index, int add)
{
int matchIndex = 0;
return Regex.Replace(input, @"\d+", m => matchIndex++ == index ? (int.Parse(m.Value) + add).ToString() : m.Value);
}
Example:
string input = "1|2|3,4";
string output = Replace(input, 2, 2); // 1|2|5,4
You can even make it generic:
static string Replace(string input, int index, Func<string,string> operation)
{
int matchIndex = 0;
return Regex.Replace(input, @"\d+", m => matchIndex++ == index ? operation(m.Value) : m.Value);
}
Example:
string input = "1|2|3,4";
string output = Replace(input, 2, value => (int.Parse(value) + 2).ToString()); // 1|2|5,4
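The negated-delimiter variant mentioned above, as a self-contained sketch. It assumes , and | are the only delimiters; PartReplacer and ReplacePart are hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class PartReplacer
{
    // Same counter technique, but a "part" is any run of characters other than ',' and '|',
    // so the parts do not have to be numbers.
    public static string ReplacePart(string input, int index, string replacement)
    {
        int matchIndex = 0;
        return Regex.Replace(input, @"[^,|]+", m => matchIndex++ == index ? replacement : m.Value);
    }
}
```

Usage: PartReplacer.ReplacePart("a|b|c,d", 2, "X") returns "a|b|X,d".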

Try this:
static void Main()
{
string input = "1|2|3|4,5,6|7,8|9|23|29,33";
Console.WriteLine(ReplaceByIndex(input, "hello", 23));
Console.ReadLine();
}
static string ReplaceByIndex(string input, string replaceWith, int index)
{
// Note: this finds the part by its text (index.ToString()), not by its position.
int indexStart = input.IndexOf(index.ToString());
if (indexStart < 0)
return input; // not found: nothing to replace
// The part ends at the next ',' or '|', whichever comes first, or at the end of the string.
int indexEnd = input.IndexOfAny(new[] { ',', '|' }, indexStart);
string part1 = input.Substring(0, indexStart);
string part2 = indexEnd >= 0 ? input.Substring(indexEnd) : "";
return part1 + replaceWith + part2;
}
This assumes the numbers are in ascending order. Also note that IndexOf can match digits inside a longer number (searching for 3 in 23|3 finds the 3 of 23).

Use Regex.Split for the input and Regex.Matches to collect your delimiters:
string input = "1|2|3,4,5,6|7,8|9";
string pattern = @"[,|]+";
// Collect the values
string[] substrings = Regex.Split(input, pattern);
// Collect the delimiters
MatchCollection matches = Regex.Matches(input, pattern);
// Replace anything you like, i.e.
substrings[3] = "222";
// Rebuild the string
int i = 0;
string newString = string.Empty;
foreach (string substring in substrings)
{
newString += string.Concat(substring, matches.Count >= i + 1 ? matches[i++].Value : string.Empty);
}
newString will then be "1|2|3,222,5,6|7,8|9".
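A shorter variation on the same idea: when the Split pattern contains a capturing group, Regex.Split also puts the captured delimiters into the result array, so there is no need for a separate Matches call. A sketch assuming single-character , or | delimiters (unlike the [,|]+ above) and an index that exists; DelimitedParts and ReplaceAt are hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class DelimitedParts
{
    // The capturing group makes Regex.Split keep each delimiter in the array:
    // values sit at even positions, delimiters at odd ones.
    public static string ReplaceAt(string input, int index, string replacement)
    {
        string[] pieces = Regex.Split(input, "([,|])");
        pieces[index * 2] = replacement; // part n lives at array position 2n
        return string.Concat(pieces);
    }
}
```

Usage: DelimitedParts.ReplaceAt("1|2|3,4,5,6|7,8|9", 3, "222") returns "1|2|3,222,5,6|7,8|9".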

Try this (tested):
public static string Replace(string input, int value, int index)
{
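string pattern = @"\d+"; // an unescaped | means "or", so (\d+)|(\d+)|(\d+),(\d+) was equivalent to \d+
return Regex.Replace(input, pattern, match =>
{
if (match.Index == index * 2) // match.Index is a character offset; index * 2 assumes single-digit parts, one delimiter each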
{
return value.ToString();
}
return match.Value;
});
}
Usage example:
string input = "1|2|3,4";
string output = Replace(input, 9, 1);
Updated with new requirement:
public static string ReplaceIncrement(string input, int incrementValue, int index)
{
string pattern = @"\d+";
return Regex.Replace(input, pattern, match =>
{
if (match.Index == index * 2)
{
return (int.Parse(match.Value) + incrementValue).ToString();
}
return match.Value;
});
}

Getting number from a string in C#

I am scraping some website content which looks like this: "Company Stock Rs. 7100".
Now I want to extract the numeric value from this string. I tried Split, but something or other goes wrong with my regular expression.
Please let me know how to get this value.

Use:
var result = Regex.Match(input, @"\d+").Value;
If you want to find only a number that is the last "entity" in the string, use this regex:
\d+$
If you want to match the last number in the string, you can use:
\d+(?!\D*\d)
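The first and last patterns side by side, returning null instead of an empty string when there is no number. A sketch; NumberExtractor, First and Last are hypothetical names, and int.Parse(...) can be applied to a non-null result to get the numeric value:

```csharp
using System.Text.RegularExpressions;

public static class NumberExtractor
{
    // First run of digits anywhere in the string, or null if there is none.
    public static string First(string input)
    {
        Match m = Regex.Match(input, @"\d+");
        return m.Success ? m.Value : null;
    }

    // Last run of digits: a digit run with no digit anywhere after it.
    public static string Last(string input)
    {
        Match m = Regex.Match(input, @"\d+(?!\D*\d)");
        return m.Success ? m.Value : null;
    }
}
```

For "Order 12 costs Rs. 7100", First returns "12" and Last returns "7100".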

To get the last number instead of the first, search from right to left:
int val = int.Parse(Regex.Match(input, @"\d+", RegexOptions.RightToLeft).Value);

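I always liked LINQ (this concatenates every digit in the string, so "Company Stock Rs. 7100" gives "7100"):
var theNumber = new string(theString.Where(c => char.IsDigit(c)).ToArray()); // needs using System.Linq;
Though Regex sounds like the natural choice...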

This code returns the integer at the end of the string, or -1 if the string does not end with a digit. Unlike taking the first \d+ match, it is not confused by a number somewhere else in the string.
public int getLastInt(string line)
{
int offset = line.Length;
for (int i = line.Length - 1; i >= 0; i--)
{
char c = line[i];
if (char.IsDigit(c))
{
offset--;
}
else
{
if (offset == line.Length)
{
// No int at the end
return -1;
}
return int.Parse(line.Substring(offset));
}
}
return offset == line.Length ? -1 : int.Parse(line.Substring(offset)); // -1 for an empty string
}

If your number is always after the last space and your string always ends with this number, you can get it this way:
str.Substring(str.LastIndexOf(" ") + 1)

Here is my answer: it separates the numeric characters of a string from the rest using C#.
static void Main(string[] args)
{
string details = "XSD34AB67";
string numeric = "";
string nonnumeric = "";
foreach (char ch in details)
{
if (char.IsDigit(ch))
{
numeric = numeric + ch;
}
else
{
nonnumeric = nonnumeric + ch;
}
}
int i = Convert.ToInt32(numeric); // 3467 as an int (throws if there are no digits)
Console.WriteLine(numeric);    // 3467
Console.WriteLine(nonnumeric); // XSDAB
Console.ReadLine();
}

You can use \d+ to match the first occurrence of a number:
string num = Regex.Match(input, @"\d+").Value;

How do I extract text that lies between parentheses (round brackets)?

I have a string "User name (sales)" and I want to extract the text between the brackets. How would I do this?
I suspect Substring is involved, but I can't work out how to read until the closing bracket, since the length of the text will vary.

If you wish to stay away from regular expressions, the simplest way I can think of is:
string input = "User name (sales)";
string output = input.Split('(', ')')[1];

A very simple way to do it is by using regular expressions:
Regex.Match("User name (sales)", @"\(([^)]*)\)").Groups[1].Value
Here's the same regex with some explanation:
\( # Escaped parenthesis, means "starts with a '(' character"
( # Parentheses in a regex mean "put (capture) the stuff
# in between into the Groups array"
[^)] # Any character that is not a ')' character
* # Zero or more occurrences of the aforementioned "non ')' char"
) # Close the capturing group
\) # "Ends with a ')' character"
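The commented layout above is not just for reading: with RegexOptions.IgnorePatternWhitespace, .NET ignores unescaped white space in the pattern and treats # as the start of a comment that runs to the end of the line. A sketch assuming you want the first pair of parentheses; ParenText and Extract are hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class ParenText
{
    // Unescaped spaces and newlines in this verbatim pattern are ignored,
    // and each # starts a comment that runs to the end of its line.
    private static readonly Regex Paren = new Regex(@"
        \(          # literal opening parenthesis
        ( [^)]* )   # capture everything up to the closing parenthesis
        \)          # literal closing parenthesis
        ", RegexOptions.IgnorePatternWhitespace);

    public static string Extract(string input)
    {
        Match m = Paren.Match(input);
        return m.Success ? m.Groups[1].Value : null; // null when there are no parentheses
    }
}
```

ParenText.Extract("User name (sales)") returns "sales".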

Assuming that you only have one pair of parentheses:
string s = "User name (sales)";
int start = s.IndexOf("(") + 1;
int end = s.IndexOf(")", start);
string result = s.Substring(start, end - start);

Use this function:
public string GetSubstringByString(string a, string b, string c)
{
return c.Substring((c.IndexOf(a) + a.Length), (c.IndexOf(b) - c.IndexOf(a) - a.Length));
}
and here is the usage:
GetSubstringByString("(", ")", "User name (sales)")
and the output would be:
sales

Regular expressions might be the best tool here. If you are not familiar with them, I recommend you install Expresso, a great little regex tool.
Something like:
Regex regex = new Regex("\\((?<TextInsideBrackets>\\w+)\\)");
string incomingValue = "Username (sales)";
string insideBrackets = null;
Match match = regex.Match(incomingValue);
if(match.Success)
{
insideBrackets = match.Groups["TextInsideBrackets"].Value;
}

string input = "User name (sales)";
string output = input.Substring(input.IndexOf('(') + 1, input.IndexOf(')') - input.IndexOf('(') - 1);

A regex, maybe? This would work if the text in the parentheses is lowercase letters only:
\(([a-z]+?)\)

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
private IEnumerable<string> GetSubStrings(string input, string start, string end)
{
Regex r = new Regex(Regex.Escape(start) + "(.*?)" + Regex.Escape(end));
MatchCollection matches = r.Matches(input);
foreach (Match match in matches)
yield return match.Groups[1].Value;
}

int start = input.IndexOf("(") + 1;
int length = input.IndexOf(")") - start;
output = input.Substring(start, length);

Use a Regular Expression:
string test = "(test)";
string word = Regex.Match(test, @"\((\w+)\)").Groups[1].Value;
Console.WriteLine(word);

input.Remove(input.IndexOf(')')).Substring(input.IndexOf('(') + 1);

The regex method is superior, I think, but if you want to use the humble Substring:
string input= "my name is (Jayne C)";
int start = input.IndexOf("(");
int stop = input.IndexOf(")");
string output = input.Substring(start+1, stop - start - 1);
or
string input = "my name is (Jayne C)";
string output = input.Substring(input.IndexOf("(") +1, input.IndexOf(")")- input.IndexOf("(")- 1);

var input = "12(34)1(12)(14)234";
var output = "";
for (int i = 0; i < input.Length; i++)
{
if (input[i] == '(')
{
var start = i + 1;
var end = input.IndexOf(')', i + 1);
output += input.Substring(start, end - start) + ",";
}
}
if (output.Length > 0) // remove last comma
output = output.Remove(output.Length - 1);
output: "34,12,14"
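The same result can be had with Regex.Matches, which also quietly skips an unclosed "(" instead of letting Substring throw. A sketch assuming non-nested parentheses; AllParens and JoinContents are hypothetical names:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class AllParens
{
    // Collects the contents of every (...) pair and joins them with commas.
    public static string JoinContents(string input) =>
        string.Join(",", Regex.Matches(input, @"\(([^)]*)\)")
                              .Cast<Match>()
                              .Select(m => m.Groups[1].Value));
}
```

AllParens.JoinContents("12(34)1(12)(14)234") returns "34,12,14".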

Here is a general purpose readable function that avoids using regex:
// Returns the text between 'start' and 'end'.
string ExtractBetween(string text, string start, string end)
{
int iStart = text.IndexOf(start);
iStart = (iStart == -1) ? 0 : iStart + start.Length;
int iEnd = text.LastIndexOf(end);
if(iEnd == -1)
{
iEnd = text.Length;
}
int len = iEnd - iStart;
return text.Substring(iStart, len);
}
To call it in your particular example you can do:
string result = ExtractBetween("User name (sales)", "(", ")");

I'm finding that regular expressions are extremely useful but very difficult to write. So, I did some research and found this tool that makes writing them so easy.
Don't shy away from them because the syntax is difficult to figure out. They can be so powerful.

This code is faster than most solutions here (if not all). It is packaged as a String extension method and does not support recursive nesting:
public static string GetNestedString(this string str, char start, char end)
{
int s = -1;
int i = -1;
while (++i < str.Length)
if (str[i] == start)
{
s = i;
break;
}
int e = -1;
while(++i < str.Length)
if (str[i] == end)
{
e = i;
break;
}
if (e > s)
return str.Substring(s + 1, e - s - 1);
return null;
}
This one is a little longer and slower, but it handles recursive nesting more nicely:
public static string GetNestedString(this string str, char start, char end)
{
int s = -1;
int i = -1;
while (++i < str.Length)
if (str[i] == start)
{
s = i;
break;
}
int e = -1;
int depth = 0;
while (++i < str.Length)
if (str[i] == end)
{
e = i;
if (depth == 0)
break;
else
--depth;
}
else if (str[i] == start)
++depth;
if (e > s)
return str.Substring(s + 1, e - s - 1);
return null;
}
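For comparison, .NET regular expressions can also match balanced, nested parentheses through balancing groups, which is the usual regex answer to the nesting problem the loop above solves. A sketch assuming you want the first balanced outer group; NestedParens, FirstOuter and the group name depth are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class NestedParens
{
    // Balancing groups: each "(" pushes an empty capture onto the group named depth,
    // each ")" pops one, and (?(depth)(?!)) fails if any "(" is still open.
    private static readonly Regex Outer = new Regex(
        @"\(((?>[^()]+|\((?<depth>)|\)(?<-depth>))*(?(depth)(?!)))\)");

    // Returns the contents of the first balanced outer (...) group, or null.
    public static string FirstOuter(string input)
    {
        Match m = Outer.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

NestedParens.FirstOuter("a (b (c) d) e") returns "b (c) d".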

I've been using and abusing C# 9 recently, and I can't help throwing in Spans even in questionable scenarios... Just for the fun of it, here's a variation on the answers above:
var input = "User name (sales)";
var txtSpan = input.AsSpan();
var startPoint = txtSpan.IndexOf('(') + 1;
var length = txtSpan.LastIndexOf(')') - startPoint;
var output = txtSpan.Slice(startPoint, length);
For the OP's specific scenario, it produces the right output.
(Personally, I'd use a regex, as posted by others. It's easier to handle the trickier scenarios where the solution above falls apart.)
A better version (as extension method) I made for my own project:
//Note: This only captures the first occurrence, but
//can be easily modified to scan across the text (I'd prefer Slicing a Span)
public static string ExtractFromBetweenChars(this string txt, char openChar, char closeChar)
{
ReadOnlySpan<char> span = txt.AsSpan();
int firstCharPos = span.IndexOf(openChar);
int lastCharPos = -1;
if (firstCharPos != -1)
{
for (int n = firstCharPos + 1; n < span.Length; n++)
{
if (span[n] == openChar) firstCharPos = n; //This allows the opening char position to change
if (span[n] == closeChar) lastCharPos = n;
if (lastCharPos > firstCharPos) break;
//This would correctly extract "sales" from this [contrived]
//example: "just (a (name (sales) )))(test"
}
if (lastCharPos > firstCharPos) // only slice when a closing char was found after the opening one
return span.Slice(firstCharPos + 1, lastCharPos - firstCharPos - 1).ToString();
}
return "";
}

Very similar to @Gustavo Baiocchi Costa's answer, but the length is calculated with another intermediate Substring.
int innerTextStart = input.IndexOf("(") + 1;
int innerTextLength = input.Substring(innerTextStart).IndexOf(")");
string output = input.Substring(innerTextStart, innerTextLength);

I came across this while I was looking for a solution to a very similar implementation.
Here is a snippet from my actual code. It takes the substring starting from the first char (index 0).
string separator = "\n"; //line terminator
string output;
string input= "HowAreYou?\nLets go there!";
output = input.Substring(0, input.IndexOf(separator));
