Remove part of a string between an start and end

Remove part of a string between an start and end - c#

Code first:
string myString = "<at>onePossibleName</at> some question here regarding <at>disPossibleName</at>"
// some code to handle myString and save it in myEditedString
Console.WriteLine(myEditedString);
//output now is: some question here regarding <at>disPossibleName</at>
I want to remove <at>onePossibleName</at> from myString. The string onePossibleName and disPossbileName could be any other string.
So far I am working with
string myEditedString = string.Join(" ", myString.Split(' ').Skip(1));
The problem here would be that if onePossibleName becomes one Possible Name.
Same goes for the try with myString.Remove(startIndex, count) - this is not the solution.

There will be different method depending on what you want, you can go with a IndexOf and a SubString, regex would be a solution too.
// SubString and IndexOf method
// Usefull if you don't care of the word in the at tag, and you want to remove the first at tag
if (myString.Contains("</at>"))
{
var myEditedString = myString.Substring(myString.IndexOf("</at>") + 5);
}
// Regex method
var stringToRemove = "onePossibleName";
var rgx = new Regex($"<at>{stringToRemove}</at>");
var myEditedString = rgx.Replace(myString, string.Empty, 1); // The 1 precise that only the first occurrence will be replaced

You could use this generic regular expression.
var myString = "<at>onePossibleName</at> some question here regarding <at>disPossibleName</at>";
var rg = new Regex(#"<at>(.*?)<\/at>");
var result = rg.Replace(myString, "").Trim();
This would remove all 'at' tags and the content between. The Trim() call is to remove any white space at the beginning/end of the string after the replacement.

string myString = "<at>onePossibleName</at> some question here regarding <at>disPossibleName</at>"
int sFrom = myString.IndexOf("<at>") + "<at>".Length;
int sTo = myString.IndexOf("</at>");
string myEditedString = myString.SubString(sFrom, sFrom - sTo);
Console.WriteLine(myEditedString);
//output now is: some question here regarding <at>disPossibleName</at>

Related

How should we reverse format & extract a string in C#?

Pretty sure we all do string.format and define some string in a specifed format.
I have a string which is always formatted in a way like this:
const string myString = string.Format("pt:{0}-first:{1}", inputString);
To get {0}, I can always check for pt:{ and read till }.
But what is the best/recommended way to extract {0} & {1} from the above variable myString ?

A Regex version of answer, but again, assuming your input doesnt contain '-'
var example = = "pt:hello-first:23";
var str = "pt:(?<First>[^-]+)-first:(?<Second>[^%]+)";
var match = new Regex(str).Match(example);
var first = match.Groups["First"].Value;
var second = match.Groups["Second"].Value;
It might be a good idea that you define what your variable can/cannot contain.

Not sure if this is the best way to do it, but it's the most obvious:
string example = "pt:123-first:456";
var split = example.Split('-');
var pt = split[0].Substring(split[0].IndexOf(':') + 1);
var first = split[1].Substring(split[1].IndexOf(':') + 1);
As Shawn said, if you can guarantee that the variables wont contain either : or - this will be adequate.

Extract ID and replace everything in `Example HTML`

New to Regular Expressions, I want to have the following text in my HTML and would like to replace with something else
Example HTML:
{{Object id='foo'}}
Extract the id into a variable like this:
string strId = "foo";
So far I have the following Regular Expression code that will capture the Example HTML:
string strStart = "Object";
string strFind = "{{(" + strStart + ".*?)}}";
Regex regExp = new Regex(strFind, RegexOptions.IgnoreCase);
Match matchRegExp = regExp.Match(html);
while (matchRegExp.Success)
{
//At this point, I have this variable:
//{{Object id='foo'}}
//I can find the id='foo' (see below)
//but not sure how to extract 'foo' and use it
string strFindInner = "id='(.*?)'"; //"{{Slider";
Regex regExpInner = new Regex(strFindInner, RegexOptions.IgnoreCase);
Match matchRegExpInner = regExpInner.Match(matchRegExp.Value.ToString());
//Do something with 'foo'
matchRegExp = matchRegExp.NextMatch();
}
I understand this might be a simple solution, I am hoping to gain more knowledge about Regular Expressions but more importantly, I am hoping to receive a suggestion on how to approach this cleaner and more efficiently.
Thank you
Edit:
Is this an example that I could potentially use: c# regex replace

While I am not solving my initial question with Regular Expressions, I did move into a simpler solution using SubString, IndexOf and string.Split for the time being, I understand that my code needs to be cleaned up but thought I would post the answer that I have thus far.
string html = "<p>Start of Example</p>{{Object id='foo'}}<p>End of example</p>"
string strObject = "Slider"; //Example
//When found, this will contain "{{Object id='foo'}}"
string strCode = "";
//ie: "id='foo'"
string strCodeInner = "";
//Tags will be a list, but in this example, only "id='foo'"
string[] tags = { };
//Looking for the following "{{Object "
string strFindStart = "{{" + strObject + " ";
int intFindStart = html.IndexOf(strFindStart);
//Then ending in the following
string strFindEnd = "}}";
int intFindEnd = html.IndexOf(strFindEnd) + strFindEnd.Length;
//Must find both Start and End conditions
if (intFindStart != -1 && intFindEnd != -1)
{
strCode = html.Substring(intFindStart, intFindEnd - intFindStart);
//Remove Start and End
strCodeInner = strCode.Replace(strFindStart, "").Replace(strFindEnd, "");
//Split by spaces, this needs to be improved if more than IDs are to be used
//but for proof of concept this is perfect
tags = strCodeInner.Split(new char[] { ' ' });
}
Dictionary<string, string> dictTags = new Dictionary<string, string>();
foreach (string tag in tags)
{
string[] tagSplit = tag.Split(new char[] { '=' });
dictTags.Add(tagSplit[0], tagSplit[1].Replace("'", "").Replace("\"", ""));
}
//At this point, I can replace "{{Object id='foo'}}" with anything I'd like
//What I don't show is that I go into the website's database,
//get the object (ie: Slider) and return the html for slider with the ID of foo
html = html.Replace(strCode, strView);
/*
"html" variable may contain:
<p>Start of Example</p>
<p id="foo">This is the replacement text</p>
<p>End of example</p>
*/

The fastest way to trim string in C#

I need to trim paths in million strings like this:
C:\workspace\my_projects\my_app\src\my_component\my_file.cpp
to
src\my_component\my_file.cpp
I.e. remove absolute part of the path, what is the fastest way to do that?
My try using regex:
Regex.Replace(path, #"(.*?)\src", ""),

I wouldn't go with regex for this, use the plain old method.
If the path prefix is always the same:
const string partToRemove = #"C:\workspace\my_projects\my_app\";
if (path.StartsWith(partToRemove, StringComparison.OrdinalIgnoreCase))
path = path.Substring(partToRemove.Length);
If the prefix is variable, you can get the last index of \src\:
var startIndex = path.LastIndexOf(#"\src\", StringComparison.OrdinalIgnoreCase);
if (startIndex >= 0)
path = path.Substring(startIndex + 1);

define the regex with a new and reuse it
there is a (significant) cost to creating the regex
string input = "This is text with far too much " +
"whitespace.";
string pattern = "\\s+";
string replacement = " ";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);

I'm not sure if you need speed here, but if you always get the full path, you could do a simple .Substring()
var path = #"C:\workspace\my_projects\my_app\src\my_component\my_file.cpp";
Console.WriteLine(path.Substring(32));
However, I think you should sanitize your input first; in this case, the Uri class could do the parsing step:
var root = #"C:\workspace\my_projects\my_app\";
var path = #"C:\workspace\my_projects\my_app\src\my_component\my_file.cpp";
var relative = new Uri(root).MakeRelativeUri(new Uri(path));
Console.WriteLine(relative.OriginalString.Replace("/", "\\"));
Notice here the Uri will change the \ with a /: that's the .Replace reason.

Cant think any faster than this
path.Substring(33);
What is before src is constant. and it starts from index 33.
C:\workspace\my_projects\my_app\src\my_component\my_file.cpp
^
How ever if its not always constant. you can find it once. and do the rest inside loop.
int startInd = path.IndexOf(#"\src\") + 1;
// Do this inside loop. 1 million times
path.Substring(startInd);

If your files will all end in "src/filename.ext" you could use the Path class in the .NET framework for it and get around all caveats you could have with pathes and filenames:
result = "src\" + Path.GetFileName(path);
So you should first double-check that the conversion is the thing that takes to long.

When using indexof and substring how do i parse the right start and end indexs ? And how do i encode hebrew chars?

I have this code:
string firstTag = "Forums2008/forumPage.aspx?forumId=";
string endTag = "</a>";
index = forums.IndexOf(firstTag, index1);
if (index == -1)
continue;
var secondIndex = forums.IndexOf(endTag, index);
result = forums.Substring(index + firstTag.Length + 12, secondIndex - (index + firstTag.Length - 50));
The string i want to extract from is for example:
הנקה
What i want to get is the word after the title only this: הנקה
And the second problem is that when i'm extracting it i see instead hebrew some gibrish like this: ������

One powerful way to do this is to use Regular Expressions instead of trying to find a starting position and use a substring. Try out this code, and you'll see that it extracts the anchor tag's title:
var input = "הנקה";
var expression = new System.Text.RegularExpressions.Regex(#"title=\""([^\""]+)\""");
var match = expression.Match(input);
if (match.Success) {
Console.WriteLine(match.Groups[1]);
}
else {
Console.WriteLine("not found");
}
And for the curious, here is a version in JavaScript:
var input = 'הנקה';
var expression = new RegExp('title=\"([^\"]+)\"');
var results = expression.exec(input);
if (results) {
document.write(results[1]);
}
else {
document.write("not found");
}

Okay here is the solution using String.Substring() String.Split() and String.IndexOf()
String str = "הנקה"; // <== Assume this is passing string. Yes unusual scape sequence are added
int splitStart = str.IndexOf("title="); // < Where to start splitting
int splitEnd = str.LastIndexOf("</a>"); // < = Where to end
/* What we try to extract is this : title="הנקה">הנקה
* (Given without escape sequence)
*/
String extracted = str.Substring(splitStart, splitEnd - splitStart); // <=Extracting required portion
String[] splitted = extracted.Split('"'); // < = Now split with "
Console.WriteLine(splitted[1]); // <= Try to Out but yes will produce ???? But put a breakpoint here and check the values in split array
Now the problem, here you can see that i have to use escape sequence in an unusual way. You may ignore that since you are simply passing the scanning string.
And this actually works, but you cannot visualize it with the provided Console.WriteLine(splitted[1]);
But if you put a break point and check the extracted split array you can see that text are extracted. you can confirm it with following screenshot

replace a character in a string in c# based on position with a string

I want to replace a charecter in a string with a string in c#.
I have tried the following,
Here in the following program, i want replace set of charecters between charecters ':' and first occurance of '-' with some others charecters.
I could able to extract the set of charecters between ':' and first occurance of '-'.
Can any one say how to insert these back in the source string.
string source= "tcm:7-426-8";
string target= "tcm:10-15-2";
int fistunderscore = target.IndexOf("-");
string temp = target.Substring(4, fistunderscore-4);
Response.Write("<BR>"+"temp1:" + temp + "<BR>");
Examples:
source: "tcm:7-426-8" or "tcm:100-426-8" or "tcm:10-426-8"
Target: "tcm:10-15-2" or "tcm:5-15-2" or "tcm:100-15-2"
output: "tcm:10-426-8" or "tcm:5-426-8" or "tcm:100-426-8"
In a nutshell, I want to replace the set of charectes between ':' and '-'(firstoccurance) and the charecters extracetd from the same sort of string.
Can any help how it can be done.
Thank you.

If you want to replace the first ":Number-" from the source with the content from target, you can use the following regex.
var pattern1 = New Regex(":\d{1,3}-{1}");
if(pattern1.IsMatch(source) && pattern1.IsMatch(target))
{
var source = "tcm:7-426-8";
var target = "tcm:10-15-2";
var res = pattern1.Replace(source, pattern1.Match(target).Value);
// "tcm:10-426-8"
}
Edit: To not have your string replaced with something empty, add an if-clause before the actualy replacing.

Try a regex solution - first this method, takes the source and target strings, and performs a regex replace on the first, targetting the first numbers after the 'tcm', which must be anchored to the start of the string. In the MatchEvaluator it executes the same regex again, but on the target string.
static Regex rx = new Regex("(?<=^tcm:)[0-9]+", RegexOptions.Compiled);
public string ReplaceOneWith(string source, string target)
{
return rx.Replace(source, new MatchEvaluator((Match m) =>
{
var targetMatch = rx.Match(target);
if (targetMatch.Success)
return targetMatch.Value;
return m.Value; //don't replace if no match
}));
}
Note that no replacement is performed if the regex doesn't return a match on the target string.
Now run this test (probably need to copy the above into the test class):
[TestMethod]
public void SO9973554()
{
Assert.AreEqual("tcm:10-426-8", ReplaceOneWith("tcm:7-426-8", "tcm:10-15-2"));
Assert.AreEqual("tcm:5-426-8", ReplaceOneWith("tcm:100-426-8", "tcm:5-15-2"));
Assert.AreEqual("tcm:100-426-8", ReplaceOneWith("tcm:10-426-8", "tcm:100-15-2"));
}

I'm not clear on the logic used to decide which bit from which string is used, but still, you should use Split(), rather than mucking about with string offsets:
(note that the Remove(0,4) is there to remove the tcm: prefix)
string[] source = "tcm:90-2-10".Remove(0,4).Split('-');
string[] target = "tcm:42-23-17".Remove(0,4).Split('-');
Now you have the numbers from both source and target in easy-to-access arrays, so you can build the new string any way you want:
string output = string.Format("tcm:{0}-{1}-{2}", source[0], target[1], source[2]);

Heres without regex
string source = "tcm:7-426-8";
string target = "tcm:10-15-2";
int targetBeginning = target.IndexOf("-");
int sourceBeginning = source.IndexOf("-");
string temp = target.Substring(0, targetBeginning);//tcm:10
string result = temp + source.Substring(sourceBeginning, source.Length-sourceBeginning); //tcm:10 + -426-8

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Remove part of a string between an start and end - c#

Related

How should we reverse format & extract a string in C#?

Extract ID and replace everything in `Example HTML`

The fastest way to trim string in C#

When using indexof and substring how do i parse the right start and end indexs ? And how do i encode hebrew chars?

replace a character in a string in c# based on position with a string

Categories

Resources