Find a string between 2 known values - c#

I need to be able to extract a string between 2 tags for example: "00002" from "morenonxmldata<tag1>0002</tag1>morenonxmldata"
I am using C# and .NET 3.5.

Regex regex = new Regex("<tag1>(.*)</tag1>");
var v = regex.Match("morenonxmldata<tag1>0002</tag1>morenonxmldata");
string s = v.Groups[1].ToString();
Or (as mentioned in the comments) to match the minimal subset:
Regex regex = new Regex("<tag1>(.*?)</tag1>");
Regex class is in System.Text.RegularExpressions namespace.

Solution without need of regular expression:
string ExtractString(string s, string tag) {
// You should check for errors in real-world code, omitted for brevity
var startTag = "<" + tag + ">";
int startIndex = s.IndexOf(startTag) + startTag.Length;
int endIndex = s.IndexOf("</" + tag + ">", startIndex);
return s.Substring(startIndex, endIndex - startIndex);
}

A Regex approach using lazy match and back-reference:
foreach (Match match in Regex.Matches(
"morenonxmldata<tag1>0002</tag1>morenonxmldata<tag2>abc</tag2>asd",
#"<([^>]+)>(.*?)</\1>"))
{
Console.WriteLine("{0}={1}",
match.Groups[1].Value,
match.Groups[2].Value);
}

Extracting contents between two known values can be useful for later as well. So why not create an extension method for it. Here is what i do, Short and simple...
public static string GetBetween(this string content, string startString, string endString)
{
int Start=0, End=0;
if (content.Contains(startString) && content.Contains(endString))
{
Start = content.IndexOf(startString, 0) + startString.Length;
End = content.IndexOf(endString, Start);
return content.Substring(Start, End - Start);
}
else
return string.Empty;
}

string input = "Exemple of value between two string FirstString text I want to keep SecondString end of my string";
var match = Regex.Match(input, #"FirstString (.+?) SecondString ").Groups[1].Value;

To get Single/Multiple values without regular expression
// For Single
var value = inputString.Split("<tag1>", "</tag1>")[1];
// For Multiple
var values = inputString.Split("<tag1>", "</tag1>").Where((_, index) => index % 2 != 0);

For future reference, I found this code snippet at http://www.mycsharpcorner.com/Post.aspx?postID=15 If you need to search for different "tags" it works very well.
public static string[] GetStringInBetween(string strBegin,
string strEnd, string strSource,
bool includeBegin, bool includeEnd)
{
string[] result ={ "", "" };
int iIndexOfBegin = strSource.IndexOf(strBegin);
if (iIndexOfBegin != -1)
{
// include the Begin string if desired
if (includeBegin)
iIndexOfBegin -= strBegin.Length;
strSource = strSource.Substring(iIndexOfBegin
+ strBegin.Length);
int iEnd = strSource.IndexOf(strEnd);
if (iEnd != -1)
{
// include the End string if desired
if (includeEnd)
iEnd += strEnd.Length;
result[0] = strSource.Substring(0, iEnd);
// advance beyond this segment
if (iEnd + strEnd.Length < strSource.Length)
result[1] = strSource.Substring(iEnd
+ strEnd.Length);
}
}
else
// stay where we are
result[1] = strSource;
return result;
}

I strip before and after data.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
namespace testApp
{
class Program
{
static void Main(string[] args)
{
string tempString = "morenonxmldata<tag1>0002</tag1>morenonxmldata";
tempString = Regex.Replace(tempString, "[\\s\\S]*<tag1>", "");//removes all leading data
tempString = Regex.Replace(tempString, "</tag1>[\\s\\S]*", "");//removes all trailing data
Console.WriteLine(tempString);
Console.ReadLine();
}
}
}

Without RegEx, with some must-have value checking
public static string ExtractString(string soapMessage, string tag)
{
if (string.IsNullOrEmpty(soapMessage))
return soapMessage;
var startTag = "<" + tag + ">";
int startIndex = soapMessage.IndexOf(startTag);
startIndex = startIndex == -1 ? 0 : startIndex + startTag.Length;
int endIndex = soapMessage.IndexOf("</" + tag + ">", startIndex);
endIndex = endIndex > soapMessage.Length || endIndex == -1 ? soapMessage.Length : endIndex;
return soapMessage.Substring(startIndex, endIndex - startIndex);
}

public string between2finer(string line, string delimiterFirst, string delimiterLast)
{
string[] splitterFirst = new string[] { delimiterFirst };
string[] splitterLast = new string[] { delimiterLast };
string[] splitRes;
string buildBuffer;
splitRes = line.Split(splitterFirst, 100000, System.StringSplitOptions.RemoveEmptyEntries);
buildBuffer = splitRes[1];
splitRes = buildBuffer.Split(splitterLast, 100000, System.StringSplitOptions.RemoveEmptyEntries);
return splitRes[0];
}
private void button1_Click(object sender, EventArgs e)
{
string manyLines = "Received: from exim by isp2.ihc.ru with local (Exim 4.77) \nX-Failed-Recipients: rmnokixm#gmail.com\nFrom: Mail Delivery System <Mailer-Daemon#isp2.ihc.ru>";
MessageBox.Show(between2finer(manyLines, "X-Failed-Recipients: ", "\n"));
}

Related

From end to start how to get everything from the 5th occurence of "-"?

I have this string:
abc-8479fe75-82rb5-45g00-b563-a7346703098b
I want to go through this string starting from the end and grab everything till the 5th occurence of the "-" sign.
how do I substring this string to only show (has to be through the procedure described above): -8479fe75-82rb5-45g00-b563-a7346703098b
You can also define handy extension method if that's common use case:
public static class StringExtensions
{
public static string GetSubstringFromReversedOccurence(this string #this,
char #char,
int occurence)
{
var indexes = new List<int>();
var idx = 0;
while ((idx = #this.IndexOf(#char, idx + 1)) > 0)
{
indexes.Add(idx);
}
if (occurence > indexes.Count)
{
throw new Exception("Not enough occurences of character");
}
var desiredIndex = indexes[indexes.Count - occurence];
return #this.Substring(desiredIndex);
}
}
According to your string rules, I would do a simple split :
var text = "abc-8479fe75-82rb5-45g00-b563-a7346703098b";
var split = text.split('-');
if (split.Lenght <= 5)
return text;
return string.Join('-', split.Skip(split.Lenght - 5);
We could use a regex replacement with a capture group:
var text = "abc-8479fe75-82rb5-45g00-b563-a7346703098b";
var pattern = #"([^-]+(?:-[^-]+){4}).*";
var output = Regex.Replace(text, pattern, "$1");
Console.WriteLine(output); // abc-8479fe75-82rb5-45g00-b563
If we can rely on the input always having 6 hyphen separated components, we could also use this version:
var text = "abc-8479fe75-82rb5-45g00-b563-a7346703098b";
var pattern = #"-[^-]+$";
var output = Regex.Replace(text, pattern, "");
Console.WriteLine(output); // abc-8479fe75-82rb5-45g00-b563
I think this should be the most performant
public string GetSubString(string text)
{
var count = 0;
for (int i = text.Length-1; i > 0; i--)
{
if (text[i] == '-') count++;
if (count == 5) return text.Substring(i);
}
return null;
}

Is it possible to remove every occurance of a character and one character after it from entire string using regex?

Is it possible to remove every occurance of a character + one after it from entire string?
Here's a code that achieves what I described, but is rather big:
private static string FilterText(string text){
string filteredText = text;
while (true)
{
int comaIndex = filteredText.IndexOf('.');
if (comaIndex == -1)
{
break;
}
else if (comaIndex > 1)
{
filteredText = filteredText.Substring(0, comaIndex) + filteredText.Substring(comaIndex + 2);
}
else if (comaIndex == 1)
{
filteredText = filteredText[0] + filteredText.Substring(comaIndex + 2);
}
else
{
filteredText = filteredText.Substring(comaIndex + 2);
}
}
return filteredText;
}
This code would turn for example input .Otest.1 .2.,string.w.. to test string
Is it possible to achieve the same result using regex?
You want to use
var output = Regex.Replace(text, #"\..", RegexOptions.Singleline);
See the .NET regex demo. Details:
\. - matches a dot
. - matches any char including a line feed char due to the RegexOptions.Singleline option used.
Try this pattern: (?<!\.)\w+
code:
using System;
using System.Text.RegularExpressions;
public class Test{
public static void Main(){
string str = ".Otest.1 .2.,string.w..";
Console.WriteLine(FilterText(str));
}
private static string FilterText(string text){
string pattern = #"(?<!\.)\w+";
string result = "";
foreach(Match m in Regex.Matches(text, pattern)){
result += m + " ";
}
return result;
}
}
output:
test string

Using indexOf (C#) to remove a set of characters from a string

public static string shita1(string st1)
{
string st2 = "", stemp = st1;
int i;
for(i=0; i<stemp.Length; i++)
{
if (stemp.IndexOf("cbc") == i)
{
i += 2 ;
stemp = "";
stemp = st1.Substring(i);
i = 0;
}
else
st2 = st2 + stemp[i];
}
return st2;
}
static void Main(string[] args)
{
string st1;
Console.WriteLine("enter one string:");
st1 = Console.ReadLine();
Console.WriteLine(shita1(st1));
}
}
i got a challange from my college, the challange is to move any "cbc" characters from a string...
this is my code... it works when i use only one "cbc" but when i use 2 of them it stucks... help please :)
The IndexOf Method gives you everything you need to know.
Per the documentation.
Reports the zero-based index of the first occurrence of a specified
Unicode character or string within this instance. The method returns
-1 if the character or string is not found in this instance.
This means you can create a loop that repeats as long as the returned index is not -1 and you don't have to loop through the string testing letter by letter.
I think this should work just tested it on some examples. Doesn't use string.Replace or IndexOf
static void Main(string[] args)
{
Console.WriteLine("enter one string:");
var input = Console.ReadLine();
Console.WriteLine(RemoveCBC(input));
}
static string RemoveCBC(string source)
{
var result = new StringBuilder();
for (int i = 0; i < source.Length; i++)
{
if (i + 2 == source.Length)
break;
var c = source[i];
var b = source[i + 1];
var c2 = source[i + 2];
if (c == 'c' && c2 == 'c' && b == 'b')
i = i + 2;
else
result.Append(source[i]);
}
return result.ToString();
}
You can use Replace to remove/replace all occurances of a string inside another string:
string original = "cbc_STRING_cbc";
original = original.Replace("cbc", String.Empty);
If you want remove characters from string using only IndexOf method you can use this code.
public static string shita1(string st1)
{
int index = -1;
string yourMatchingString = "cbc";
while ((index = st1.IndexOf(yourMatchingString)) != -1)
st1 = st1.Remove(index, yourMatchingString.Length);
return st1;
}
This code remove all inputs of you string.
But you can do it just in one line:
st1 = st1.Replace("cbc", string.Empty);
Hope this help.

How can I check a string multiple times for certain pattern?

I have a string (a source code of a website). And I want to check for a string multiple times. I currently use this code:
string verified = "starting";
if (s.Contains(verified))
{
int startIndex = s.IndexOf(verified);
int endIndex = s.IndexOf("ending", startIndex);
string verifiedVal = s.Substring(startIndex + verified.Length, (endIndex - (startIndex + verified.Length)));
imgCodes.Add(verifiedVal);
}
The code does work, but it only works for the first string found. There are more than 1 appearances in the string, so how can I find them all and add them to the list?
Thanks.
class Program
{
static void Main(string[] args)
{
string s = "abc123djhfh123hfghh123jkj12";
string v = "123";
int index = 0;
while (s.IndexOf(v, index) >= 0)
{
int startIndex = s.IndexOf(v, index);
Console.WriteLine(startIndex);
//do something with startIndex
index = startIndex + v.Length ;
}
Console.ReadLine();
}
}
You can use Regex to do that:
var matches = Regex.Matches(s, #"starting(?<content>(?:(?!ending).)*)");
imgCodes.AddRange(matches.Cast<Match>().Select(x => x.Groups["content"].Value));
After that imgCodes will contain ALL substrings between starting and ending.
Sample:
var s = "afjaöklfdata-entry-id=\"aaa\" data-page-numberuiwotdata-entry-id=\"bbb\" data-page-numberweriouw";
var matches = Regex.Matches(s, #"data-entry-id=""(?<content>(?:(?!"" data-page-number).)+)");
var imgCodes = matches.Cast<Match>().Select(x => x.Groups["content"].Value).ToList();
// imgCode = ["aaa", "bbb"] (as List<string> of course)
If you don't need empty strings you can change .)* in Regex to .)+.

C# How can I compare two word strings and indicate which parts are different

For example if I have...
string a = "personil";
string b = "personal";
I would like to get...
string c = "person[i]l";
However it is not necessarily a single character. I could be like this too...
string a = "disfuncshunal";
string b = "dysfunctional";
For this case I would want to get...
string c = "d[isfuncshu]nal";
Another example would be... (Notice that the length of both words are different.)
string a = "parralele";
string b = "parallel";
string c = "par[ralele]";
Another example would be...
string a = "ato";
string b = "auto";
string c = "a[]to";
How would I go about doing this?
Edit: The length of the two strings can be different.
Edit: Added additional examples. Credit goes to user Nenad for asking.
I must be very bored today, but I actually made UnitTest that pass all 4 cases (if you did not add some more in the meantime).
Edit: Added 2 edge cases and fix for them.
Edit2: letters that repeat multiple times (and error on those letters)
[Test]
[TestCase("parralele", "parallel", "par[ralele]")]
[TestCase("personil", "personal", "person[i]l")]
[TestCase("disfuncshunal", "dysfunctional", "d[isfuncshu]nal")]
[TestCase("ato", "auto", "a[]to")]
[TestCase("inactioned", "inaction", "inaction[ed]")]
[TestCase("refraction", "fraction", "[re]fraction")]
[TestCase("adiction", "ad[]diction", "ad[]iction")]
public void CompareStringsTest(string attempted, string correct, string expectedResult)
{
int first = -1, last = -1;
string result = null;
int shorterLength = (attempted.Length < correct.Length ? attempted.Length : correct.Length);
// First - [
for (int i = 0; i < shorterLength; i++)
{
if (correct[i] != attempted[i])
{
first = i;
break;
}
}
// Last - ]
var a = correct.Reverse().ToArray();
var b = attempted.Reverse().ToArray();
for (int i = 0; i < shorterLength; i++)
{
if (a[i] != b[i])
{
last = i;
break;
}
}
if (first == -1 && last == -1)
result = attempted;
else
{
var sb = new StringBuilder();
if (first == -1)
first = shorterLength;
if (last == -1)
last = shorterLength;
// If same letter repeats multiple times (ex: addition)
// and error is on that letter, we have to trim trail.
if (first + last > shorterLength)
last = shorterLength - first;
if (first > 0)
sb.Append(attempted.Substring(0, first));
sb.Append("[");
if (last > -1 && last + first < attempted.Length)
sb.Append(attempted.Substring(first, attempted.Length - last - first));
sb.Append("]");
if (last > 0)
sb.Append(attempted.Substring(attempted.Length - last, last));
result = sb.ToString();
}
Assert.AreEqual(expectedResult, result);
}
Have you tried my DiffLib?
With that library, and the following code (running in LINQPad):
void Main()
{
string a = "disfuncshunal";
string b = "dysfunctional";
var diff = new Diff<char>(a, b);
var result = new StringBuilder();
int index1 = 0;
int index2 = 0;
foreach (var part in diff)
{
if (part.Equal)
result.Append(a.Substring(index1, part.Length1));
else
result.Append("[" + a.Substring(index1, part.Length1) + "]");
index1 += part.Length1;
index2 += part.Length2;
}
result.ToString().Dump();
}
You get this output:
d[i]sfunc[shu]nal
To be honest I don't understand what this gives you, as you seem to completely ignore the changed parts in the b string, only dumping the relevant portions of the a string.
Here is a complete and working console application that will work for both examples you gave:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication2
{
class Program
{
static void Main(string[] args)
{
string a = "disfuncshunal";
string b = "dysfunctional";
StringBuilder sb = new StringBuilder();
for (int i = 0; i < a.Length; i++)
{
if (a[i] != b[i])
{
sb.Append("[");
sb.Append(a[i]);
sb.Append("]");
continue;
}
sb.Append(a[i]);
}
var str = sb.ToString();
var startIndex = str.IndexOf("[");
var endIndex = str.LastIndexOf("]");
var start = str.Substring(0, startIndex + 1);
var mid = str.Substring(startIndex + 1, endIndex - 1);
var end = str.Substring(endIndex);
Console.WriteLine(start + mid.Replace("[", "").Replace("]", "") + end);
}
}
}
it will not work if you want to display more than one entire section of the mismatched word.
You did not specify what to do if the strings were of different lengths, but here is a solution to the problem when the strings are of equal length:
private string Compare(string string1, string string2) {
//This only works if the two strings are the same length..
string output = "";
bool mismatch = false;
for (int i = 0; i < string1.Length; i++) {
char c1 = string1[i];
char c2 = string2[i];
if (c1 == c2) {
if (mismatch) {
output += "]" + c1;
mismatch = false;
} else {
output += c1;
}
} else {
if (mismatch) {
output += c1;
} else {
output += "[" + c1;
mismatch = true;
}
}
}
return output;
}
Not really good approach but as an exercise in using LINQ: task seem to be find matching prefix and suffix for 2 strings, return "prefix + [+ middle of first string + suffix.
So you can match prefix (Zip + TakeWhile(a==b)), than repeat the same for suffix by reversing both strings and reversing result.
var first = "disfuncshunal";
var second = "dysfunctional";
// Prefix
var zipped = first.ToCharArray().Zip(second.ToCharArray(), (f,s)=> new {f,s});
var prefix = string.Join("",
zipped.TakeWhile(c => c.f==c.s).Select(c => c.f));
// Suffix
var zippedReverse = first.ToCharArray().Reverse()
.Zip(second.ToCharArray().Reverse(), (f,s)=> new {f,s});
var suffix = string.Join("",
zippedReverse.TakeWhile(c => c.f==c.s).Reverse().Select(c => c.f));
// Cut and combine.
var middle = first.Substring(prefix.Length,
first.Length - prefix.Length - suffix.Length);
var result = prefix + "[" + middle + "]" + suffix;
Much easier and faster approach is to use 2 for loops (from start to end, and from end to start).

Categories