Related
The string (aka Message) I am trying to parse out looks like this. (It also looks exactly like this when you paste it in Notepad
"CorrelationId: b99fb632-78cf-4910-ab23-4f69833ed2d9
Request for API: /api/acmsxdsreader/readpolicyfrompolicyassignment Caller:C2F023C52E2148C9C1D040FBFAC113D463A368B1 CorrelationId: b99fb632-78cf-4910-ab23-4f69833ed2d9 RequestedSchemas: {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}VoicePolicy, {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}OnlineVoiceRoutingPolicy, TenantId: 7a205197-8e59-487d-b9fa-3fc1b108f1e5"
I would like to make 5 separate functions that return each of the specific values which would be like GetCorrelationId, GetRFAPI, GetCaller, GetRqSchema, and GetTenantId and extract out their corresponding values.
How would I do this in C# without using Regex?
Below is the code I have made for the caller (and this method is the same for all other 4 functions) but I have been advised that regex is slow and should not be used by my mentor and the method I have below doesn't even work anyway. Also, my biggest problem with trying to use the regex is that there are multiple delimiters in the message like ',' ' ' and ': ' and ':'
string parseCaller(string message)
{
var pattern = #"Caller:(.*)";
var r = new Regex(pattern).Match(message);
var caller = r.Groups[1].Value;
return caller;
}
Expected result should be:
GetCorrelationId(message) RETURNS b99fb632-78cf-4910-ab23-4f69833ed2d9
GetRFAPI(message) RETURNS /api/acmsxdsreader/readpolicyfrompolicyassignment
GetRqSchema(message) RETURNS {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}VoicePolicy, {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}OnlineVoiceRoutingPolicy
GetCaller(message) RETURNS C2F023C52E2148C9C1D040FBFAC113D463A368B1
GetTenantId(message) RETURNS 7a205197-8e59-487d-b9fa-3fc1b108f1e5
I would approach this a little differently and create a class that has properties for each value you want to parse from the string. Then we can create a static Parse method that creates an instance of the class from an input string, which sets all the properties for you.
If the string always has the same items (CorrelationId, RequestForAPI, Caller, etc) in the same order, we can utilize a simple helper method to GetValueBetween two headers.
The code is pretty simple:
class MessageData
{
public string CorrelationId { get; set; }
public string RequestForAPI { get; set; }
public string RequestedSchemas { get; set; }
public string Caller { get; set; }
public string TennantId { get; set; }
public static MessageData Parse(string input)
{
return new MessageData
{
CorrelationId = GetValueBetween(input, "CorrelationId:", "Request for API:"),
RequestForAPI = GetValueBetween(input, "Request for API:", "Caller:"),
Caller = GetValueBetween(input, "Caller:", "CorrelationId:"),
RequestedSchemas = GetValueBetween(input, "RequestedSchemas:", "TenantId:"),
TennantId = GetValueBetween(input, "TenantId:", null),
};
}
private static string GetValueBetween(string input, string startDelim, string endDelim)
{
if (input == null) return string.Empty;
var start = input.IndexOf(startDelim);
if (start == -1) return string.Empty;
start += startDelim.Length;
var length = endDelim == null
? input.Length - start
: input.IndexOf(endDelim, start) - start;
if (length < 0) length = input.Length - start;
return input.Substring(start, length).Trim();
}
}
And now we can just call MessageData.Parse(inputString), and we have a class with all it's properties set from the input string:
private static void Main()
{
var message = #"CorrelationId: b99fb632-78cf-4910-ab23-4f69833ed2d9
Request for API: /api/acmsxdsreader/readpolicyfrompolicyassignment Caller:C2F023C52E2148C9C1D040FBFAC113D463A368B1 CorrelationId: b99fb632-78cf-4910-ab23-4f69833ed2d9 RequestedSchemas: {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}VoicePolicy, {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}OnlineVoiceRoutingPolicy, TenantId: 7a205197-8e59-487d-b9fa-3fc1b108f1e5";
var messageData = MessageData.Parse(message);
// Now we can access any property
Console.WriteLine(messageData.CorrelationId);
Console.WriteLine(messageData.RequestForAPI);
Console.WriteLine(messageData.RequestedSchemas);
Console.WriteLine(messageData.Caller);
Console.WriteLine(messageData.TennantId);
Console.ReadKey();
}
Based on these specifications in your question:
How would I do this in C# without using Regex?
and
I would like to make 5 separate functions
You can try the below. It's really simple, as you can study your string and use IndexOf and SubString functions appropriately:
using System;
class ParseTest
{
static string GetCorrelationId(string message)
{
int i = message.IndexOf(": ") + 2; //length of ": "
int j = message.IndexOf("Request");
return message.Substring(i, j-i).Trim();
}
static string GetRFAPI(string message)
{
int i = message.IndexOf("API: ") + 5; //length of "API: "
int j = message.IndexOf("Caller");
return message.Substring(i, j-i).Trim();
}
static string GetCaller(string message)
{
int i = message.IndexOf("Caller:") + 7; //length of "Caller: "
int j = message.IndexOf(" CorrelationId");
return message.Substring(i, j-i).Trim();
}
static string GetRqSchema(string message)
{
int i = message.IndexOf("RequestedSchemas:") + 17; //length of "RequestedSchemas:"
int j = message.IndexOf(", TenantId:");
return message.Substring(i, j-i).Trim();
}
static string GetTenantId(string message)
{
int i = message.IndexOf("TenantId:") + 9; //length of "TenantId: "
return message.Substring(i).Trim();
}
static void Main()
{
string m = #"CorrelationId: b99fb632-78cf-4910-ab23-4f69833ed2d9
Request for API: /api/acmsxdsreader/readpolicyfrompolicyassignment Caller:C2F023C52E2148C9C1D040FBFAC113D463A368B1 CorrelationId: b99fb632-78cf-4910-ab23-4f69833ed2d9 RequestedSchemas: {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}VoicePolicy, {urn:schema:Microsoft.Rtc.Management.Policy.Voice.2008}OnlineVoiceRoutingPolicy, TenantId: 7a205197-8e59-487d-b9fa-3fc1b108f1e5";
Console.WriteLine(GetCorrelationId(m));
Console.WriteLine(GetRFAPI(m));
Console.WriteLine(GetCaller(m));
Console.WriteLine(GetRqSchema(m));
Console.WriteLine(GetTenantId(m));
}
}
You can run it here.
EDIT: Of course you can modify this to use get-only properties, as some other answerers have tried to do.
If, on the other hand, you want to write a parser (this is kind of lazy one 😂), then that's another matter for your researching pleasure.
Here's a more robust solution if you don't know if the order of these elements returned from the API will change in the future:
private string GetRFAPI(string str)
{
return GetSubstring(str, "Request for API: ", ' ', 1);
}
private string GetCaller(string str)
{
return GetSubstring(str, "Caller:", ' ', 1);
}
private string GetCorrelationId(string str)
{
return GetSubstring(str, "CorrelationId: ", ' ', 1);
}
private string GetTenantId(string str)
{
return GetSubstring(str, "TenantId: ", ' ', 1);
}
private string GetRequestedSchemas(string str)
{
return GetSubstring(str, "RequestedSchemas: ", ',', 2);
}
private string GetSubstring(string str, string pattern, char delimiter, int occurrence)
{
int start = str.IndexOf(pattern);
if (start < 0)
{
return null;
}
for (int i = start + pattern.Length, counter = 0; i < str.Length; i++, counter++)
{
if ((str[i] == delimiter && --occurrence == 0) || i == str.Length - 1)
{
return str.Substring(start + pattern.Length, counter).Trim();
}
}
return null;
}
I need to parse this string from serial:-
!00037,00055#
00037 as one string, 00055 as another string
However this string is came out when the robot's tire is rotated and some other string may also display before and after the string that I need to parse. For example this is the some of the transmission received:-
11,00085#R-STOPR-STOP!00011,00095#!00001,00015#R-STOP!00001,00085#!00003,00075#!00006,00015#R-STOP!00009,00025#!00011,00035#!00011,00085#R-STOPR-STOP!00011,00095#!00001,00015#R-STOP!00001,00085#!00003,00075#!00006,00015#R-STOP!00009,00025#!00011,00035#R-STOP!00001,00085#!00003,00075#!00006,00015#R-STOP!00009,00025#!00011,00035#R-STOP!00037,00055#!00023,00075#R-STOPR-STOP!00022,00065#!00011,00085#R-STOPR-STOP!00011,00095#!00001,00015#R-STOP!00001,00085#!00003,00075#!00006,00015#R-STOP!00009,00025#!00011,00035#R-STOP!00037,00055#!00023,00075#R-STOPR-STOP!00022,00065#!00011,00085#R-STOPR-STOP!00011,00095#!00001,00015#
So far I'm stuck at what to do next after SerialPort.ReadExisting()
Here is some code to retrieve the serial data:-
private void serialCom_DataReceived(object sender, SerialDataReceivedEventArgs e)
{
try
{
InputData = serialCom.ReadExisting();
if (InputData != String.Empty)
{
this.BeginInvoke(new SetTextCallback(IncomingData), new object[] { InputData });
}
}
catch
{
MessageBox.Show("Error");
}
}
and display incoming serial data inside textbox
private void IncomingData(string data)
{
tb_incomingData.AppendText(data);
tb_incomingData.ScrollToCaret();
}
This code is using .NET Framework 4.0 and Windows Form.
Finally solve it using indexof and substring.
private void IncomingData(string data)
{
//Show received data in textbox
tb_incomingData.AppendText(data);
tb_incomingData.ScrollToCaret();
//Append data inside longdata (string)
longData = longData + data;
if (longData.Contains('#') && longData.Contains(',') && longData.Contains('!'))
{
try
{
indexSeru = longData.IndexOf('!'); //retrieve index number of the symbol !
indexComma = longData.IndexOf(','); //retrieve index number of the symbol ,
indexAlias = longData.IndexOf('#'); //retrieve index number of the symbol ,
rotation = longData.Substring(indexSeru + 1, 5); //first string is taken after symbol ! and 5 next char
subRotation = longData.Substring(indexComma + 1, 5); //second string is taken after symbol ! and 5 next char
//tss_distance.Text = rotation + "," + subRotation;
longData = null; //clear longdata string
}
catch
{
indexSeru = 0;
indexComma = 0;
indexAlias = 0;
}
}
}
You can determine your pattern to transform this string into a array using SPLIT function.
This code, send "!00037,00055#" returns two itens: 00037 and 00055.
static void Main(string[] args)
{
string k = "!00037,00055#";
var array = k.ToString().Split(',');
Console.WriteLine("Dirty Itens");
for (var x = 0; x <= array.Length - 1; x++)
{
var linha = "Item " + x.ToString() + " = " + array[x];
Console.WriteLine(linha);
}
Console.WriteLine("Cleaned Itens");
for (var x = 0; x <= array.Length - 1; x++)
{
var linha = "Item " + x.ToString() + " = " + CleanString(array[x]);
Console.WriteLine(linha);
}
Console.ReadLine();
}
public static string CleanString(string inputString)
{
string resultString = "";
Regex regexObj = new Regex(#"[^\d]");
resultString = regexObj.Replace(inputString, "");
return resultString;
}
I would like to match strings with a wildcard (*), where the wildcard means "any". For example:
*X = string must end with X
X* = string must start with X
*X* = string must contain X
Also, some compound uses such as:
*X*YZ* = string contains X and contains YZ
X*YZ*P = string starts with X, contains YZ and ends with P.
Is there a simple algorithm to do this? I'm unsure about using regex (though it is a possibility).
To clarify, the users will type in the above to a filter box (as simple a filter as possible), I don't want them to have to write regular expressions themselves. So something I can easily transform from the above notation would be good.
Often, wild cards operate with two type of jokers:
? - any character (one and only one)
* - any characters (zero or more)
so you can easily convert these rules into appropriate regular expression:
// If you want to implement both "*" and "?"
private static String WildCardToRegular(String value) {
return "^" + Regex.Escape(value).Replace("\\?", ".").Replace("\\*", ".*") + "$";
}
// If you want to implement "*" only
private static String WildCardToRegular(String value) {
return "^" + Regex.Escape(value).Replace("\\*", ".*") + "$";
}
And then you can use Regex as usual:
String test = "Some Data X";
Boolean endsWithEx = Regex.IsMatch(test, WildCardToRegular("*X"));
Boolean startsWithS = Regex.IsMatch(test, WildCardToRegular("S*"));
Boolean containsD = Regex.IsMatch(test, WildCardToRegular("*D*"));
// Starts with S, ends with X, contains "me" and "a" (in that order)
Boolean complex = Regex.IsMatch(test, WildCardToRegular("S*me*a*X"));
You could use the VB.NET Like-Operator:
string text = "x is not the same as X and yz not the same as YZ";
bool contains = LikeOperator.LikeString(text,"*X*YZ*", Microsoft.VisualBasic.CompareMethod.Binary);
Use CompareMethod.Text if you want to ignore the case.
You need to add using Microsoft.VisualBasic.CompilerServices; and add a reference to the Microsoft.VisualBasic.dll.
Since it's part of the .NET framework and will always be, it's not a problem to use this class.
For those using .NET Core 2.1+ or .NET 5+, you can use the FileSystemName.MatchesSimpleExpression method in the System.IO.Enumeration namespace.
string text = "X is a string with ZY in the middle and at the end is P";
bool isMatch = FileSystemName.MatchesSimpleExpression("X*ZY*P", text);
Both parameters are actually ReadOnlySpan<char> but you can use string arguments too. There's also an overloaded method if you want to turn on/off case matching. It is case insensitive by default as Chris mentioned in the comments.
Using of WildcardPattern from System.Management.Automation may be an option.
pattern = new WildcardPattern(patternString);
pattern.IsMatch(stringToMatch);
Visual Studio UI may not allow you to add System.Management.Automation assembly to References of your project. Feel free to add it manually, as described here.
A wildcard * can be translated as .* or .*? regex pattern.
You might need to use a singleline mode to match newline symbols, and in this case, you can use (?s) as part of the regex pattern.
You can set it for the whole or part of the pattern:
X* = > #"X(?s:.*)"
*X = > #"(?s:.*)X"
*X* = > #"(?s).*X.*"
*X*YZ* = > #"(?s).*X.*YZ.*"
X*YZ*P = > #"(?s:X.*YZ.*P)"
*X*YZ* = string contains X and contains YZ
#".*X.*YZ"
X*YZ*P = string starts with X, contains YZ and ends with P.
#"^X.*YZ.*P$"
It is necessary to take into consideration, that Regex IsMatch gives true with XYZ, when checking match with Y*. To avoid it, I use "^" anchor
isMatch(str1, "^" + str2.Replace("*", ".*?"));
So, full code to solve your problem is
bool isMatchStr(string str1, string str2)
{
string s1 = str1.Replace("*", ".*?");
string s2 = str2.Replace("*", ".*?");
bool r1 = Regex.IsMatch(s1, "^" + s2);
bool r2 = Regex.IsMatch(s2, "^" + s1);
return r1 || r2;
}
This is kind of an improvement on the popular answer from #Dmitry Bychenko above (https://stackoverflow.com/a/30300521/4491768). In order to support ? and * as a matching characters we have to escape them. Use \\? or \\* to escape them.
Also a pre compiled regex will improve the performance (on reuse).
public class WildcardPattern
{
private readonly string _expression;
private readonly Regex _regex;
public WildcardPattern(string pattern)
{
if (string.IsNullOrEmpty(pattern)) throw new ArgumentNullException(nameof(pattern));
_expression = "^" + Regex.Escape(pattern)
.Replace("\\\\\\?","??").Replace("\\?", ".").Replace("??","\\?")
.Replace("\\\\\\*","**").Replace("\\*", ".*").Replace("**","\\*") + "$";
_regex = new Regex(_expression, RegexOptions.Compiled);
}
public bool IsMatch(string value)
{
return _regex.IsMatch(value);
}
}
usage
new WildcardPattern("Hello *\\**\\?").IsMatch("Hello W*rld?");
new WildcardPattern(#"Hello *\**\?").IsMatch("Hello W*rld?");
To support those one with C#+Excel (for partial known WS name) but not only - here's my code with wildcard (ddd*).
Briefly: the code gets all WS names and if today's weekday(ddd) matches the first 3 letters of WS name (bool=true) then it turn it to string that gets extracted out of the loop.
using System;
using Microsoft.Office.Interop.Excel;
using System.Runtime.InteropServices;
using Range = Microsoft.Office.Interop.Excel.Range;
using System.Diagnostics;
using System.Reflection;
using System.IO;
using System.Text.RegularExpressions;
...
string weekDay = DateTime.Now.ToString("ddd*");
Workbook sourceWorkbook4 = xlApp.Workbooks.Open(LrsIdWorkbook, 0, false, 5, "", "", true, XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
Workbook destinationWorkbook = xlApp.Workbooks.Open(masterWB, 0, false, 5, "", "", true, XlPlatform.xlWindows, "\t", false, false, 0, true, 1, 0);
static String WildCardToRegular(String value)
{
return "^" + Regex.Escape(value).Replace("\\*", ".*") + "$";
}
string wsName = null;
foreach (Worksheet works in sourceWorkbook4.Worksheets)
{
Boolean startsWithddd = Regex.IsMatch(works.Name, WildCardToRegular(weekDay + "*"));
if (startsWithddd == true)
{
wsName = works.Name.ToString();
}
}
Worksheet sourceWorksheet4 = (Worksheet)sourceWorkbook4.Worksheets.get_Item(wsName);
...
public class Wildcard
{
private readonly string _pattern;
public Wildcard(string pattern)
{
_pattern = pattern;
}
public static bool Match(string value, string pattern)
{
int start = -1;
int end = -1;
return Match(value, pattern, ref start, ref end);
}
public static bool Match(string value, string pattern, char[] toLowerTable)
{
int start = -1;
int end = -1;
return Match(value, pattern, ref start, ref end, toLowerTable);
}
public static bool Match(string value, string pattern, ref int start, ref int end)
{
return new Wildcard(pattern).IsMatch(value, ref start, ref end);
}
public static bool Match(string value, string pattern, ref int start, ref int end, char[] toLowerTable)
{
return new Wildcard(pattern).IsMatch(value, ref start, ref end, toLowerTable);
}
public bool IsMatch(string str)
{
int start = -1;
int end = -1;
return IsMatch(str, ref start, ref end);
}
public bool IsMatch(string str, char[] toLowerTable)
{
int start = -1;
int end = -1;
return IsMatch(str, ref start, ref end, toLowerTable);
}
public bool IsMatch(string str, ref int start, ref int end)
{
if (_pattern.Length == 0) return false;
int pindex = 0;
int sindex = 0;
int pattern_len = _pattern.Length;
int str_len = str.Length;
start = -1;
while (true)
{
bool star = false;
if (_pattern[pindex] == '*')
{
star = true;
do
{
pindex++;
}
while (pindex < pattern_len && _pattern[pindex] == '*');
}
end = sindex;
int i;
while (true)
{
int si = 0;
bool breakLoops = false;
for (i = 0; pindex + i < pattern_len && _pattern[pindex + i] != '*'; i++)
{
si = sindex + i;
if (si == str_len)
{
return false;
}
if (str[si] == _pattern[pindex + i])
{
continue;
}
if (si == str_len)
{
return false;
}
if (_pattern[pindex + i] == '?' && str[si] != '.')
{
continue;
}
breakLoops = true;
break;
}
if (breakLoops)
{
if (!star)
{
return false;
}
sindex++;
if (si == str_len)
{
return false;
}
}
else
{
if (start == -1)
{
start = sindex;
}
if (pindex + i < pattern_len && _pattern[pindex + i] == '*')
{
break;
}
if (sindex + i == str_len)
{
if (end <= start)
{
end = str_len;
}
return true;
}
if (i != 0 && _pattern[pindex + i - 1] == '*')
{
return true;
}
if (!star)
{
return false;
}
sindex++;
}
}
sindex += i;
pindex += i;
if (start == -1)
{
start = sindex;
}
}
}
public bool IsMatch(string str, ref int start, ref int end, char[] toLowerTable)
{
if (_pattern.Length == 0) return false;
int pindex = 0;
int sindex = 0;
int pattern_len = _pattern.Length;
int str_len = str.Length;
start = -1;
while (true)
{
bool star = false;
if (_pattern[pindex] == '*')
{
star = true;
do
{
pindex++;
}
while (pindex < pattern_len && _pattern[pindex] == '*');
}
end = sindex;
int i;
while (true)
{
int si = 0;
bool breakLoops = false;
for (i = 0; pindex + i < pattern_len && _pattern[pindex + i] != '*'; i++)
{
si = sindex + i;
if (si == str_len)
{
return false;
}
char c = toLowerTable[str[si]];
if (c == _pattern[pindex + i])
{
continue;
}
if (si == str_len)
{
return false;
}
if (_pattern[pindex + i] == '?' && c != '.')
{
continue;
}
breakLoops = true;
break;
}
if (breakLoops)
{
if (!star)
{
return false;
}
sindex++;
if (si == str_len)
{
return false;
}
}
else
{
if (start == -1)
{
start = sindex;
}
if (pindex + i < pattern_len && _pattern[pindex + i] == '*')
{
break;
}
if (sindex + i == str_len)
{
if (end <= start)
{
end = str_len;
}
return true;
}
if (i != 0 && _pattern[pindex + i - 1] == '*')
{
return true;
}
if (!star)
{
return false;
}
sindex++;
continue;
}
}
sindex += i;
pindex += i;
if (start == -1)
{
start = sindex;
}
}
}
}
C# Console application sample
Command line Sample:
C:/> App_Exe -Opy PythonFile.py 1 2 3
Console output:
Argument list: -Opy PythonFile.py 1 2 3
Found python filename: PythonFile.py
using System;
using System.Text.RegularExpressions; //Regex
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
string cmdLine = String.Join(" ", args);
bool bFileExtFlag = false;
int argIndex = 0;
Regex regex;
foreach (string s in args)
{
//Search for the 1st occurrence of the "*.py" pattern
regex = new Regex(#"(?s:.*)\056py", RegexOptions.IgnoreCase);
bFileExtFlag = regex.IsMatch(s);
if (bFileExtFlag == true)
break;
argIndex++;
};
Console.WriteLine("Argument list: " + cmdLine);
if (bFileExtFlag == true)
Console.WriteLine("Found python filename: " + args[argIndex]);
else
Console.WriteLine("Python file with extension <.py> not found!");
}
}
}
I'm learning about string utilities in C#, and I have a method that replaces parts of a string.
Using the replace method I need to get an output such as
"Old file name: file00"
"New file name: file01"
Depending on what the user wants to change it to.
I am looking for help on making the method (NextImageName) replace only the digits, but not the file name.
class BuildingBlock
{
public static string ReplaceOnce(string word, string characters, int position)
{
word = word.Remove(position, characters.Length);
word = word.Insert(position, characters);
return word;
}
public static string GetLastName(string name)
{
string result = "";
int posn = name.LastIndexOf(' ');
if (posn >= 0) result = name.Substring(posn + 1);
return result;
}
public static string NextImageName(string filename, int newNumber)
{
if (newNumber > 9)
{
return ReplaceOnce(filename, newNumber, (filename.Length - 2))
}
if (newNumber < 10)
{
}
if (newNumber == 0)
{
}
}
The other "if" statements are empty for now until I find out how to do the first one.
The correct way to do this would be to use Regular Expressions.
Ideally you would separate "file" from "00" in "file00". Then take "00", convert it to an Int32 (using Int32.Parse()) and then rebuild your string with String.Format().
public static string NextImageName(string filename, int newNumber)
{
string oldnumber = "";
foreach (var item in filename.ToCharArray().Reverse())
if (char.IsDigit(item))
oldnumber = item + oldnumber ;
else
break;
return filename.Replace(oldnumber ,newNumber.ToString());
}
public static string NextImageName(string filename, int newNumber)
{
int i = 0;
foreach (char c in filename) // get index of first number
{
if (char.IsNumber(c))
break;
else
i++;
}
string s = filename.Substring(0,i); // remove original number
s = s + newNumber.ToString(); // add new number
return s;
}
I need to be able to extract a string between 2 tags for example: "00002" from "morenonxmldata<tag1>0002</tag1>morenonxmldata"
I am using C# and .NET 3.5.
Regex regex = new Regex("<tag1>(.*)</tag1>");
var v = regex.Match("morenonxmldata<tag1>0002</tag1>morenonxmldata");
string s = v.Groups[1].ToString();
Or (as mentioned in the comments) to match the minimal subset:
Regex regex = new Regex("<tag1>(.*?)</tag1>");
Regex class is in System.Text.RegularExpressions namespace.
Solution without need of regular expression:
string ExtractString(string s, string tag) {
// You should check for errors in real-world code, omitted for brevity
var startTag = "<" + tag + ">";
int startIndex = s.IndexOf(startTag) + startTag.Length;
int endIndex = s.IndexOf("</" + tag + ">", startIndex);
return s.Substring(startIndex, endIndex - startIndex);
}
A Regex approach using lazy match and back-reference:
foreach (Match match in Regex.Matches(
"morenonxmldata<tag1>0002</tag1>morenonxmldata<tag2>abc</tag2>asd",
#"<([^>]+)>(.*?)</\1>"))
{
Console.WriteLine("{0}={1}",
match.Groups[1].Value,
match.Groups[2].Value);
}
Extracting contents between two known values can be useful for later as well. So why not create an extension method for it. Here is what i do, Short and simple...
public static string GetBetween(this string content, string startString, string endString)
{
int Start=0, End=0;
if (content.Contains(startString) && content.Contains(endString))
{
Start = content.IndexOf(startString, 0) + startString.Length;
End = content.IndexOf(endString, Start);
return content.Substring(Start, End - Start);
}
else
return string.Empty;
}
string input = "Exemple of value between two string FirstString text I want to keep SecondString end of my string";
var match = Regex.Match(input, #"FirstString (.+?) SecondString ").Groups[1].Value;
To get Single/Multiple values without regular expression
// For Single
var value = inputString.Split("<tag1>", "</tag1>")[1];
// For Multiple
var values = inputString.Split("<tag1>", "</tag1>").Where((_, index) => index % 2 != 0);
For future reference, I found this code snippet at http://www.mycsharpcorner.com/Post.aspx?postID=15 If you need to search for different "tags" it works very well.
public static string[] GetStringInBetween(string strBegin,
string strEnd, string strSource,
bool includeBegin, bool includeEnd)
{
string[] result ={ "", "" };
int iIndexOfBegin = strSource.IndexOf(strBegin);
if (iIndexOfBegin != -1)
{
// include the Begin string if desired
if (includeBegin)
iIndexOfBegin -= strBegin.Length;
strSource = strSource.Substring(iIndexOfBegin
+ strBegin.Length);
int iEnd = strSource.IndexOf(strEnd);
if (iEnd != -1)
{
// include the End string if desired
if (includeEnd)
iEnd += strEnd.Length;
result[0] = strSource.Substring(0, iEnd);
// advance beyond this segment
if (iEnd + strEnd.Length < strSource.Length)
result[1] = strSource.Substring(iEnd
+ strEnd.Length);
}
}
else
// stay where we are
result[1] = strSource;
return result;
}
I strip before and after data.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
namespace testApp
{
class Program
{
static void Main(string[] args)
{
string tempString = "morenonxmldata<tag1>0002</tag1>morenonxmldata";
tempString = Regex.Replace(tempString, "[\\s\\S]*<tag1>", "");//removes all leading data
tempString = Regex.Replace(tempString, "</tag1>[\\s\\S]*", "");//removes all trailing data
Console.WriteLine(tempString);
Console.ReadLine();
}
}
}
Without RegEx, with some must-have value checking
public static string ExtractString(string soapMessage, string tag)
{
if (string.IsNullOrEmpty(soapMessage))
return soapMessage;
var startTag = "<" + tag + ">";
int startIndex = soapMessage.IndexOf(startTag);
startIndex = startIndex == -1 ? 0 : startIndex + startTag.Length;
int endIndex = soapMessage.IndexOf("</" + tag + ">", startIndex);
endIndex = endIndex > soapMessage.Length || endIndex == -1 ? soapMessage.Length : endIndex;
return soapMessage.Substring(startIndex, endIndex - startIndex);
}
public string between2finer(string line, string delimiterFirst, string delimiterLast)
{
string[] splitterFirst = new string[] { delimiterFirst };
string[] splitterLast = new string[] { delimiterLast };
string[] splitRes;
string buildBuffer;
splitRes = line.Split(splitterFirst, 100000, System.StringSplitOptions.RemoveEmptyEntries);
buildBuffer = splitRes[1];
splitRes = buildBuffer.Split(splitterLast, 100000, System.StringSplitOptions.RemoveEmptyEntries);
return splitRes[0];
}
private void button1_Click(object sender, EventArgs e)
{
string manyLines = "Received: from exim by isp2.ihc.ru with local (Exim 4.77) \nX-Failed-Recipients: rmnokixm#gmail.com\nFrom: Mail Delivery System <Mailer-Daemon#isp2.ihc.ru>";
MessageBox.Show(between2finer(manyLines, "X-Failed-Recipients: ", "\n"));
}