Find String with BeginsWith and EndsWith - c#

I want to search a long string for a string inside it that BeginsWith and EndsWith something, and then eventually replace it with a token.
So, let's say I have a string:
"This is text with a token: <TokenID=123>doesn't matter</Token>, and some more text!"
I'd like to extract/identify the following string:
<TokenID=123>doesn't matter</Token>
So that I can use it within a replace statement on the original string, to replace it with something else. The ID for this tag could be different, so I want to identify the string above by using something like:
var beginsWith = "<TokenID=";
var endsWith = "</Token>";
The BeginsWith and EndsWith values will be pulled from a CSV file into a list, as there are many of them with different content. Once I know how to extract these, I eventually want to replace them with a character that I could split on to create an array of strings around the extracted strings.
I don't want to use regex, as the BeginsWith and EndsWith strings need to be easily configurable and added to at a later stage.
This feels like it should be a really simple exercise, but for the life of me I can't figure out how to do it...

If I'm understanding your question correctly, you're going to want to use IndexOf() & Substring().
IndexOf() gives you the positions of beginsWith and endsWith, which you can then pass to Substring():
string data = "This is text with a token: <TokenID=123>doesn't matter</Token>, and some more text!";
string beginsWith = "<TokenID=";
string endsWith = "</Token>";
int startIndex = data.IndexOf(beginsWith, StringComparison.Ordinal);
// Look for endsWith only after beginsWith (IndexOf returns -1 when nothing is found)
int endTagIndex = startIndex >= 0 ? data.IndexOf(endsWith, startIndex, StringComparison.Ordinal) : -1;
if (endTagIndex >= 0)
{
    // Add the length of endsWith so the substring ends just past its last character
    int endIndex = endTagIndex + endsWith.Length;
    string extract = data.Substring(startIndex, endIndex - startIndex);
    Console.WriteLine(extract);
}
Console.ReadLine();
Results:
<TokenID=123>doesn't matter</Token>
If you change your mind about using Regex, you can still use your beginsWith and endsWith to create your pattern.
using System.Text.RegularExpressions;
string data = "This is text with a token: <TokenID=123>doesn't matter</Token>, and some more text!";
string beginsWith = "<TokenID=";
string endsWith = "</Token>";
// Regex.Escape stops configurable delimiters from being read as regex syntax; the lazy .+? stops at the first endsWith
string extract = Regex.Match(data, String.Format("{0}.+?{1}", Regex.Escape(beginsWith), Regex.Escape(endsWith))).Value;
Console.WriteLine(extract);
Console.ReadLine();
The String.Format() call creates a pattern that looks like
<TokenID=.+?</Token>
Results:
<TokenID=123>doesn't matter</Token>

Here's an extension method so you can just attach it to a string:
Method
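public static class StringExtensions
{
    // Extension methods must live in a non-generic static class
    public static string StringBetween(this string stringEval, string startsWith, string endsWith)
    {
        var start = stringEval.IndexOf(startsWith, StringComparison.Ordinal);
        if (start < 0) return null; // startsWith not found
        var end = stringEval.IndexOf(endsWith, start, StringComparison.Ordinal);
        if (end < 0) return null;   // no endsWith after startsWith
        return stringEval.Substring(start, (end - start) + endsWith.Length);
    }
}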
Usage
static void Main(string[] args)
{
var exampleStr = "This is text with a token: <TokenID=123>doesn't matter</Token>, and some more text!";
var startsWith = "<TokenID=";
var endsWith = "</Token>";
var val = exampleStr.StringBetween(startsWith, endsWith);
Console.WriteLine(val);
Console.ReadKey();
}

Without using Regex:
string s = "This is text with a token: <TokenID=123>doesn't matter</Token>, and some more text!";
string beginsWith = "<TokenID=";
string endsWith = "</Token>";
string extract = null;
int i, j;
// j is the position of endsWith relative to i, so Substring(i, j) stops right before endsWith
if ((i = s.IndexOf(beginsWith)) >= 0 && (j = s.Substring(i).IndexOf(endsWith)) >= 0)
    extract = s.Substring(i, j) + endsWith;
// result in extract (null if delimiting strings not found)
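The answers above only return the first match. Since the question asks for many begin/end pairs loaded from a CSV, and for replacing each match with a character to split on, here is a minimal sketch that loops with IndexOf over every occurrence. The class TokenSplitter, its methods, and the default '|' marker are hypothetical choices, and the CSV is assumed to have already been parsed into (Begin, End) pairs.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TokenSplitter
{
    // Hypothetical helper: replaces every "begin...end" span with the marker character.
    public static string ReplaceTokens(string text, IEnumerable<(string Begin, string End)> pairs, char marker)
    {
        foreach (var (begin, end) in pairs)
        {
            // Empty delimiters would never advance the search position, so skip them.
            if (string.IsNullOrEmpty(begin) || string.IsNullOrEmpty(end)) continue;

            var sb = new StringBuilder();
            int pos = 0;
            while (true)
            {
                int start = text.IndexOf(begin, pos, StringComparison.Ordinal);
                if (start < 0) break;
                int close = text.IndexOf(end, start + begin.Length, StringComparison.Ordinal);
                if (close < 0) break;               // unterminated token: keep the rest as-is
                sb.Append(text, pos, start - pos);  // text before the token
                sb.Append(marker);                  // the whole token becomes the marker
                pos = close + end.Length;           // continue right after the end delimiter
            }
            sb.Append(text, pos, text.Length - pos); // remaining tail
            text = sb.ToString();
        }
        return text;
    }

    // Replaces all tokens, then splits on the marker to get the surrounding text pieces.
    public static string[] SplitOnTokens(string text, IEnumerable<(string Begin, string End)> pairs, char marker = '|')
    {
        return ReplaceTokens(text, pairs, marker).Split(marker);
    }
}
```

With the sample string and the single pair ("<TokenID=", "</Token>"), SplitOnTokens should return the two pieces before and after the token. Note that the marker must be a character that never appears in the original text, otherwise Split will cut in extra places.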

Related

How do I cut off a certain part of a String?

I have a big String in my program.
For example:
String Newspaper = "...Blablabla... What do you like?...Blablabla... ";
Now I want to cut out "What do you like?" and write it to a new String. The problem is that the "Blablabla" is different every time. By "cut out" I mean that you pass a start word and an end word, and everything written between them ends up in the new string, because the sentence "What do you like?" sometimes changes, except for the start word "What" and the end word "like?".
Thanks for any responses.
You can write the following method:
public static string CutOut(string s, string start, string end)
{
int startIndex = s.IndexOf(start);
if (startIndex == -1) {
return null;
}
int endIndex = s.IndexOf(end, startIndex);
if (endIndex == -1) {
return null;
}
return s.Substring(startIndex, endIndex - startIndex + end.Length);
}
It returns null if either the start or end pattern is not found. Only end patterns that follow the start pattern are searched for.
If you are working with C# 8+ and .NET Core 3.0+, you can also replace the last line with
return s[startIndex..(endIndex + end.Length)];
Test:
string input = "...Blablabla... What do you like?...Blablabla... ";
Console.WriteLine(CutOut(input, "What ", " like?"));
prints:
What do you like?
If you are happy with Regex, you can also write:
public static string CutOutRegex(string s, string start, string end)
{
Match match = Regex.Match(s, $@"\b{Regex.Escape(start)}.*{Regex.Escape(end)}");
if (match.Success) {
return match.Value;
}
return null;
}
The \b ensures that the start pattern is only found at the beginning of a word. You can drop it if you want. Also, if the end pattern occurs more than once, the greedy .* makes the result run up to the last occurrence, unlike the first example with IndexOf, which stops at the first one.
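To make the note above about the end pattern occurring more than once concrete, here is a small sketch comparing the greedy .* with the lazy .*? quantifier, using the same Regex.Escape approach as CutOutRegex but without the \b. The class name GreedyVsLazy and the sample input are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Greedy: .* runs to the LAST occurrence of the end pattern.
    public static string Greedy(string s, string start, string end) =>
        Regex.Match(s, $@"{Regex.Escape(start)}.*{Regex.Escape(end)}").Value;

    // Lazy: .*? stops at the FIRST occurrence of the end pattern, like the IndexOf version.
    public static string Lazy(string s, string start, string end) =>
        Regex.Match(s, $@"{Regex.Escape(start)}.*?{Regex.Escape(end)}").Value;
}
```

On "...What do you like?...or what would you like?..." with start "What " and end " like?", Greedy returns "What do you like?...or what would you like?" while Lazy returns "What do you like?". When there is no match, Match.Value is an empty string rather than null.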
You can take a substring, as in the example below.
// A long string
string bio = "Mahesh Chand is a founder of C# Corner. Mahesh is also an author, speaker, and software architect. Mahesh founded C# Corner in 2000.";
// Get first 12 characters substring from a string
string authorName = bio.Substring(0, 12);
Console.WriteLine(authorName);
In this case I would do it like this: cut off the first part, then the second, and concatenate the result with the fixed words that were used as the cutting parameters.
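public string CutPhrase(string phrase)
{
    var fst = "What";
    var snd = "like?";
    string[] cut1 = phrase.Split(new[] { fst }, StringSplitOptions.None);
    if (cut1.Length < 2) return null; // start word not found
    string[] cut2 = cut1[1].Split(new[] { snd }, StringSplitOptions.None);
    if (cut2.Length < 2) return null; // end word not found after the start word
    // cut2[0] already keeps its surrounding spaces (" do you "), so none are added here
    var rst = $"{fst}{cut2[0]}{snd}";
    return rst;
}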

Remove part of a string between a start and end

Code first:
string myString = "<at>onePossibleName</at> some question here regarding <at>disPossibleName</at>";
// some code to handle myString and save it in myEditedString
Console.WriteLine(myEditedString);
//output now is: some question here regarding <at>disPossibleName</at>
I want to remove <at>onePossibleName</at> from myString. The strings onePossibleName and disPossibleName could be any other string.
So far I am working with
string myEditedString = string.Join(" ", myString.Split(' ').Skip(1));
The problem is that this breaks if onePossibleName contains spaces, for example one Possible Name.
The same goes for my attempt with myString.Remove(startIndex, count); this is not the solution.
There are different methods depending on what you want: you can go with IndexOf and Substring, and a regex would be a solution too.
// Substring and IndexOf method
// Useful if you don't care about the word inside the at tag and just want to remove the first at tag
string myEditedString = myString;
int closeIndex = myString.IndexOf("</at>");
if (closeIndex >= 0)
{
    myEditedString = myString.Substring(closeIndex + "</at>".Length).TrimStart();
}
// Regex method (needs using System.Text.RegularExpressions;)
var stringToRemove = "onePossibleName";
var rgx = new Regex($"<at>{Regex.Escape(stringToRemove)}</at>");
var regexEditedString = rgx.Replace(myString, string.Empty, 1).TrimStart(); // The 1 means only the first occurrence is replaced
You could use this generic regular expression.
var myString = "<at>onePossibleName</at> some question here regarding <at>disPossibleName</at>";
var rg = new Regex(@"<at>(.*?)<\/at>");
var result = rg.Replace(myString, "").Trim();
This would remove all 'at' tags and the content between. The Trim() call is to remove any white space at the beginning/end of the string after the replacement.
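If the goal is exactly what the question asks, namely dropping only the first tag pair whatever name it contains (including names with spaces such as one Possible Name), the lazy pattern above can be combined with the count overload of Regex.Replace. This is a sketch; AtTagRemover and RemoveFirstAtTag are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class AtTagRemover
{
    // <at>.*?</at> matches one tag pair lazily, so the name may contain spaces
    // but the match never runs past the first </at>. \s* also removes the space after it.
    private static readonly Regex FirstAtTag = new Regex(@"<at>.*?</at>\s*");

    public static string RemoveFirstAtTag(string input) =>
        FirstAtTag.Replace(input, string.Empty, 1); // count = 1: only the first occurrence
}
```

Applied to "<at>one Possible Name</at> some question here regarding <at>disPossibleName</at>", this should leave "some question here regarding <at>disPossibleName</at>".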
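string myString = "<at>onePossibleName</at> some question here regarding <at>disPossibleName</at>";
int sFrom = myString.IndexOf("<at>");
int sTo = myString.IndexOf("</at>");
string myEditedString = myString;
if (sFrom >= 0 && sTo > sFrom)
{
    // Keep what comes before <at> and after </at>, dropping the tag pair and the space after it
    sTo += "</at>".Length;
    myEditedString = (myString.Substring(0, sFrom) + myString.Substring(sTo)).TrimStart();
}
Console.WriteLine(myEditedString);
//output now is: some question here regarding <at>disPossibleName</at>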

What is the best practice to extract a pattern from a string in C# and use it to create variables

Let's consider a string like this:
string myString = "C125:AAAAA|C12:22222|C16542:D1|ABCD:1234A|C6:12AAA";
I'd like to end up with something like this:
string C125 = "AAAAA";
string C12 = "22222";
string C16542 = "D1";
string C6 = "12AAA";
It means that I'd like to extract substrings (or whatever) that match the pattern "C+characters:characters" (and exclude other patterns, e.g. ABCD:1234A). Then I'd like to automatically create a variable that has the 1st part of my substring ("C125:AAAAA" e.g.) as its name (so string C125 in that case) and the 2nd part of my substring as its value ("AAAAA" e.g.).
What would be the best practice to do that in C#? Thx!
Use a dictionary to store your values:
string myString = "C125:AAAAA|C12:22222|C16542:D1|ABCD:1234A|C6:12AAA";
Dictionary<string, string> result = new Dictionary<string, string>();
myString.Split('|').ToList().ForEach(x => result.Add(x.Split(':')[0], x.Split(':')[1]));
Update - improved solution from CodesInChaos:
string myString = "C125:AAAAA|C12:22222|C16542:D1|ABCD:1234A|C6:12AAA";
Dictionary<string, string> result = myString.Split('|').Select(x => x.Split(':')).ToDictionary(x => x[0], x => x[1]);
If your purpose is code generation, you can use a StringBuilder:
string myString = "C125:AAAAA|C12:22222|C16542:D1|ABCD:1234A|C6:12AAA";
var res = myString.Split('|')
.Select(s=>s.Split(':'))
.Where(arr=>arr[0][0] == 'C')
.Aggregate(new StringBuilder(),
(b, t)=>b.AppendFormat("string {0} = \"{1}\";", t[0], t[1])
.AppendLine());
output:
string C125 = "AAAAA";
string C12 = "22222";
string C16542 = "D1";
string C6 = "12AAA";
If your purpose is to store values by key, you can create a Dictionary:
string myString = "C125:AAAAA|C12:22222|C16542:D1|ABCD:1234A|C6:12AAA";
var D = myString.Split('|')
.Select(s=>s.Split(':'))
.Where(arr=>arr[0][0] == 'C')
.ToDictionary(arr=>arr[0], arr=>arr[1]);
output:
[C125, AAAAA]
[C12, 22222]
[C16542, D1]
[C6, 12AAA]
The format of the input string is not complex, so String.Split is more appropriate here than Regex.
You can use the following regex, which matches one or more word characters that come after a colon (:).
@":(\w+)"
Note that this pattern uses a capture group, so to get the proper result you need to print the 1st group.
Or you can use a positive lookbehind:
@"(?<=:)\w+"
But if you want to create a name from the first part, the better choice for such tasks is to use a data structure like a dictionary.
So you can loop over the results of the following command:
foreach (Match match in Regex.Matches(text, @"(\w+):(\w+)"))
and put the pairs of 1st and 2nd groups into a dictionary.
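Here is a minimal sketch of that loop which also excludes pairs such as ABCD:1234A, as the question asks. It assumes the keys are a 'C' followed by digits (C125, C16542) and that the values never contain '|'; the class CodeParser and method Parse are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CodeParser
{
    // A pair starts at the beginning of the string or right after '|'.
    // Group 1: key ('C' + digits, an assumption). Group 2: value (everything up to the next '|').
    private static readonly Regex Pair = new Regex(@"(?:^|\|)(C\d+):([^|]*)");

    public static Dictionary<string, string> Parse(string text)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Pair.Matches(text))
        {
            // The indexer overwrites a repeated key instead of throwing like Add would.
            result[m.Groups[1].Value] = m.Groups[2].Value;
        }
        return result;
    }
}
```

For the sample string this should give four entries (C125, C12, C16542, C6) and no ABCD key. If the "characters" after C can be letters too, C\w+ could replace C\d+, at the cost of matching keys like CD.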
Perhaps you could use something like this: simply use String.Split.
string myString = "C125:AAAAA|C12:22222|C16542:D1|ABCD:1234A|C6:12AAA";
StringBuilder result=new StringBuilder();
List<string> resulttemp = myString.Split('|').ToList();
foreach (string[] temp in from v in resulttemp where v.StartsWith("C") select v.Split(':'))
{
result.Append("string ");
result.Append(temp[0]);
result.Append("=");
result.Append("\"");
result.Append(temp[1]);
result.Append("\"");
result.Append(";");
result.Append("\n");
}

When using IndexOf and Substring, how do I get the right start and end indexes? And how do I encode Hebrew characters?

I have this code:
string firstTag = "Forums2008/forumPage.aspx?forumId=";
string endTag = "</a>";
index = forums.IndexOf(firstTag, index1);
if (index == -1)
continue;
var secondIndex = forums.IndexOf(endTag, index);
result = forums.Substring(index + firstTag.Length + 12, secondIndex - (index + firstTag.Length - 50));
The string I want to extract from is, for example:
הנקה
What I want to get is only the word after the title: הנקה
The second problem is that when I extract it, I see some gibberish like this instead of Hebrew: ������
One powerful way to do this is to use Regular Expressions instead of trying to find a starting position and use a substring. Try out this code, and you'll see that it extracts the anchor tag's title:
var input = "הנקה";
var expression = new System.Text.RegularExpressions.Regex(@"title=\""([^\""]+)\""");
var match = expression.Match(input);
if (match.Success) {
Console.WriteLine(match.Groups[1]);
}
else {
Console.WriteLine("not found");
}
And for the curious, here is a version in JavaScript:
var input = 'הנקה';
var expression = new RegExp('title=\"([^\"]+)\"');
var results = expression.exec(input);
if (results) {
document.write(results[1]);
}
else {
document.write("not found");
}
Okay, here is the solution using String.Substring(), String.Split() and String.IndexOf():
String str = "הנקה"; // <== Assume this is the string being passed in (unusual escape sequences were needed to write it as a literal)
int splitStart = str.IndexOf("title="); // < Where to start splitting
int splitEnd = str.LastIndexOf("</a>"); // < = Where to end
/* What we try to extract is this : title="הנקה">הנקה
* (Given without escape sequence)
*/
String extracted = str.Substring(splitStart, splitEnd - splitStart); // <=Extracting required portion
String[] splitted = extracted.Split('"'); // < = Now split with "
Console.WriteLine(splitted[1]); // <= Printing this may show ???? in the console, but put a breakpoint here and check the values in the split array
Now the problem: as you can see, I had to use escape sequences in an unusual way to put the HTML into a string literal. You can ignore that, since you are simply passing in the string you scanned.
This actually works, but you cannot see the result with the Console.WriteLine(splitted[1]) call above.
If you put a breakpoint and inspect the split array, you can see that the text has been extracted.
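For the second part of the question (Hebrew shows up as gibberish), two separate things are worth checking: whether the console is set to a Unicode output encoding, which is what Console.OutputEncoding controls, and whether the HTML was decoded with the right encoding when it was downloaded. If the page is served in a legacy Hebrew code page such as windows-1255 (an assumption to check against the page's charset), the bytes would need that encoding; on .NET Core and later, Encoding.GetEncoding(1255) only works after Encoding.RegisterProvider(CodePagesEncodingProvider.Instance). The sketch below covers only the console side plus an IndexOf/Substring title extraction; HebrewTitle, ExtractTitle and Print are hypothetical names.

```csharp
using System;
using System.Text;

public static class HebrewTitle
{
    // Extracts the value of title="..." with IndexOf/Substring, returning null if absent.
    public static string ExtractTitle(string anchor)
    {
        const string marker = "title=\"";
        int start = anchor.IndexOf(marker, StringComparison.Ordinal);
        if (start < 0) return null;
        start += marker.Length;                 // first character of the title value
        int end = anchor.IndexOf('"', start);   // closing quote
        return end < 0 ? null : anchor.Substring(start, end - start);
    }

    public static void Print(string anchor)
    {
        // Ask the console to emit UTF-8 so Hebrew is not replaced by '?'.
        // The console font must also be able to display Hebrew glyphs.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(ExtractTitle(anchor) ?? "not found");
    }
}
```

Since the original anchor markup was lost from the question, any test input such as <a href="Forums2008/forumPage.aspx?forumId=1" title="הנקה">הנקה</a> is invented, including the forumId value.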

replace a character in a string in c# based on position with a string

I want to replace a character in a string with a string in C#.
I have tried the following.
In the program below, I want to replace the set of characters between ':' and the first occurrence of '-' with some other characters.
I was able to extract the set of characters between ':' and the first occurrence of '-'.
Can anyone tell me how to insert them back into the source string?
string source= "tcm:7-426-8";
string target= "tcm:10-15-2";
int fistunderscore = target.IndexOf("-");
string temp = target.Substring(4, fistunderscore-4);
Response.Write("<BR>"+"temp1:" + temp + "<BR>");
Examples:
source: "tcm:7-426-8" or "tcm:100-426-8" or "tcm:10-426-8"
Target: "tcm:10-15-2" or "tcm:5-15-2" or "tcm:100-15-2"
output: "tcm:10-426-8" or "tcm:5-426-8" or "tcm:100-426-8"
In a nutshell, I want to replace the set of characters between ':' and the first '-' with the characters extracted from the same position in another string of the same form.
Can anyone help with how this can be done?
Thank you.
If you want to replace the first ":Number-" from the source with the content from target, you can use the following regex.
var source = "tcm:7-426-8";
var target = "tcm:10-15-2";
var pattern1 = new Regex(@":\d{1,3}-{1}");
if (pattern1.IsMatch(source) && pattern1.IsMatch(target))
{
    // Replace only the first ":Number-" in source (count = 1) with the one found in target
    var res = pattern1.Replace(source, pattern1.Match(target).Value, 1);
    // "tcm:10-426-8"
}
Edit: To avoid having your string replaced with something empty, add an if-clause before the actual replacement.
Try a regex solution. The method below takes the source and target strings and performs a regex replace on the source, targeting the first number after 'tcm:', which must be anchored to the start of the string. In the MatchEvaluator, it runs the same regex again, this time on the target string.
static Regex rx = new Regex("(?<=^tcm:)[0-9]+", RegexOptions.Compiled);
public string ReplaceOneWith(string source, string target)
{
return rx.Replace(source, new MatchEvaluator((Match m) =>
{
var targetMatch = rx.Match(target);
if (targetMatch.Success)
return targetMatch.Value;
return m.Value; //don't replace if no match
}));
}
Note that no replacement is performed if the regex doesn't return a match on the target string.
Now run this test (probably need to copy the above into the test class):
[TestMethod]
public void SO9973554()
{
Assert.AreEqual("tcm:10-426-8", ReplaceOneWith("tcm:7-426-8", "tcm:10-15-2"));
Assert.AreEqual("tcm:5-426-8", ReplaceOneWith("tcm:100-426-8", "tcm:5-15-2"));
Assert.AreEqual("tcm:100-426-8", ReplaceOneWith("tcm:10-426-8", "tcm:100-15-2"));
}
I'm not clear on the logic used to decide which bit from which string is used, but still, you should use Split(), rather than mucking about with string offsets:
(note that the Remove(0,4) is there to remove the tcm: prefix)
string[] source = "tcm:90-2-10".Remove(0,4).Split('-');
string[] target = "tcm:42-23-17".Remove(0,4).Split('-');
Now you have the numbers from both source and target in easy-to-access arrays, so you can build the new string any way you want:
string output = string.Format("tcm:{0}-{1}-{2}", source[0], target[1], source[2]);
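Wrapped into a function that reproduces the examples in the question (first number taken from target, the rest from source), the Split approach might look like this sketch. TcmIds and SpliceFirstNumber are hypothetical names, and the code assumes both ids have the form tcm:A-B-C.

```csharp
using System;

public static class TcmIds
{
    // Assumes ids of the form "tcm:A-B-C".
    public static string SpliceFirstNumber(string source, string target)
    {
        const string prefix = "tcm:";
        if (!source.StartsWith(prefix, StringComparison.Ordinal) || !target.StartsWith(prefix, StringComparison.Ordinal))
            throw new ArgumentException("Both ids must start with \"tcm:\".");

        string[] src = source.Substring(prefix.Length).Split('-');
        string[] tgt = target.Substring(prefix.Length).Split('-');

        // First segment from target, remaining segments from source.
        src[0] = tgt[0];
        return prefix + string.Join("-", src);
    }
}
```

With the three source/target pairs from the question, this should produce "tcm:10-426-8", "tcm:5-426-8" and "tcm:100-426-8". Because it works on segments rather than fixed offsets, the number of digits in each part does not matter.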
Here's a version without regex:
string source = "tcm:7-426-8";
string target = "tcm:10-15-2";
int targetBeginning = target.IndexOf("-");
int sourceBeginning = source.IndexOf("-");
string temp = target.Substring(0, targetBeginning);//tcm:10
string result = temp + source.Substring(sourceBeginning, source.Length-sourceBeginning); //tcm:10 + -426-8
