c# Remove string from XML


I have XML with several attributes and values, such as the following:
<z:row ID="1"
Author="2;#Bruce, Banner"
Editor="1;#Bruce, Banner"
FileRef="1;#Reports/Pipeline Tracker Report.xltm"
FileDirRef="1;#Reports"
Last_x0020_Modified="1;#2014-04-04 12:05:56"
Created_x0020_Date="1;#2014-04-04 11:36:21"
File_x0020_Size="1;#311815"
/>
How can I remove the part of each value from just after the opening " up to and including the #?
Original
'Author="2;#Bruce, Banner"'
Converted
'Author="Bruce, Banner"'

See if this helps.
private string FilterValue(string input)
{
// If the string does not contain #, return value
if (!input.Contains("#"))
return input;
// # does exist in the string so
// 1) find its location
// 2) Read everything from that point to the end of the string
// 3) Return the SubString value
var index = input.IndexOf("#", StringComparison.Ordinal) + 1;
return input.Substring(index, input.Length - index);
}

Something like this?
// Same logic as M Patel's answer.
// This one only fits if there are exactly three characters to remove (one digit, one semicolon and one hash sign).
// Otherwise use M Patel's solution.
string CleanElement(string elem)
{
return elem.Substring(3, elem.Length - 3);
}
or like this :
// slower I guess but still a solution
string CleanElement(string elem)
{
string[] strs = elem.Split('#');
strs[0] = "";
return string.Join("", strs);
}

You can use the string.Substring and string.IndexOf methods:
string value = node.Attributes["Author"].Value;
// Keep everything after the first '#'; if there is no '#', the whole value is kept
value = value.Substring(value.IndexOf('#') + 1);
I hope this is what you are looking for, assuming that you are already reading your node from the XML document.
If you are new to reading XML in C#, I would recommend taking a look at the following MSDN link: https://msdn.microsoft.com/en-us/library/cc189056(v=vs.95).aspx

You can use a regex to search for your pattern and the Regex.Replace() method to remove it.
The pattern could be something like \d;# (written as @"\d;#" in C#).
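As a minimal sketch of that suggestion, the helper below strips the prefix from a single attribute value. The class and method names (LookupValue, StripPrefix) are made up for illustration, and the pattern is anchored and widened to \d+ on the assumption that the number before ;# may have more than one digit.

```csharp
using System.Text.RegularExpressions;

public static class LookupValue
{
    // Removes a leading "<digits>;#" prefix, e.g. "2;#Bruce, Banner" -> "Bruce, Banner".
    // Values that do not start with such a prefix are returned unchanged.
    public static string StripPrefix(string value)
    {
        return Regex.Replace(value, @"^\d+;#", "");
    }
}
```

Anchoring with ^ means a ;# that appears later in the value is left alone.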

It should work if the entry is 2;#Bruce, Banner:
string value = node.Attributes["Author"].Value;
var op = value.Split('#');
string name = op[1];
If another # is expected in the value, then:
string value1 = value.Substring(3, value.Length - 3);

You can use a simple regex:
string s = @"<z:row ID=""1""
Author=""2;#Bruce, Banner""
Editor=""1;#Bruce, Banner""
FileRef=""1;#Reports/Pipeline Tracker Report.xltm""
FileDirRef=""1;#Reports""
Last_x0020_Modified=""1;#2014-04-04 12:05:56""
Created_x0020_Date=""1;#2014-04-04 11:36:21""
File_x0020_Size=""1;#311815""
/>";
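// Replace each opening quote plus "N;#" prefix with just the quote, so the attribute stays well-formed
string result = Regex.Replace(s, "\"[0-9]+;#", "\"");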

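If the XML is being parsed anyway, the same cleanup can be done on the attribute values rather than on the raw text. The sketch below is one possible way, not the only one: it uses XDocument from System.Xml.Linq, the class and method names are hypothetical, and it assumes the z prefix is declared somewhere in the document (the fragment in the question omits the declaration, so parsing it on its own would fail).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RowCleaner
{
    // Parses the XML, strips everything up to and including the first ";#"
    // from every attribute value that contains it, and returns the result.
    public static string StripLookupPrefixes(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // ToList() so we iterate over a snapshot while changing values.
        var attributes = doc.Descendants()
                            .Attributes()
                            .Where(a => !a.IsNamespaceDeclaration)
                            .ToList();

        foreach (XAttribute attr in attributes)
        {
            int marker = attr.Value.IndexOf(";#", StringComparison.Ordinal);
            if (marker >= 0)
            {
                attr.Value = attr.Value.Substring(marker + 2);
            }
        }

        return doc.ToString();
    }
}
```

For example, with a wrapper element that declares the prefix (the namespace URI here is a placeholder), `RowCleaner.StripLookupPrefixes("<rows xmlns:z=\"urn:example:rowset\"><z:row ID=\"1\" Author=\"2;#Bruce, Banner\" /></rows>")` leaves ID untouched and turns the Author value into Bruce, Banner.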
Related

Replacing anchor/link in text

I'm having issues doing a find/replace type of action in my function. I'm extracting the <a href="link">anchor from an article and replacing it with this format: [link anchor]. The link and anchor will be dynamic, so I can't hard-code the values. What I have so far is:
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
string theString = string.Empty;
switch (articleWikiCheck) {
case "id|wpTextbox1":
StringBuilder newHtml = new StringBuilder(articleBody);
Regex r = new Regex(@"\<a href=\""([^\""]+)\"">([^<]+)");
string final = string.Empty;
foreach (var match in r.Matches(theString).Cast<Match>().OrderByDescending(m => m.Index))
{
string text = match.Groups[2].Value;
string newHref = "[" + match.Groups[1].Index + " " + match.Groups[1].Index + "]";
newHtml.Remove(match.Groups[1].Index, match.Groups[1].Length);
newHtml.Insert(match.Groups[1].Index, newHref);
}
theString = newHtml.ToString();
break;
default:
theString = articleBody;
break;
}
Helpers.ReturnMessage(theString);
return theString;
}
Currently, it just returns the article as it originally was, with the traditional anchor format: <a href="link">anchor
Can anyone see what I have done wrong?
Regards

If your input is HTML, you should consider using a corresponding parser, HtmlAgilityPack being really helpful.
As for the current code, it looks too verbose. You may use a single Regex.Replace to perform the search and replace in one pass:
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
if (articleWikiCheck == "id|wpTextbox1")
{
return Regex.Replace(articleBody, @"<a\s+href=""([^""]+)"">([^<]+)", "[$1 $2]");
}
else
{
// Helpers.ReturnMessage(articleBody); // Uncomment if it is necessary
return articleBody;
}
}
See the regex demo.
The <a\s+href="([^"]+)">([^<]+) regex matches <a, 1 or more whitespaces, href=", then captures into Group 1 any one or more chars other than ", then matches "> and then captures into Group 2 any one or more chars other than <.
The [$1 $2] replacement replaces the matched text with [, Group 1 contents, space, Group 2 contents and a ].

Updated (corrected the regex to support whitespace and new lines)
You can try this expression:
Regex r = new Regex(@"<[\s\n]*a[\s\n]*(([^\s]+\s*[ ]*=*[ ]*[\s|\n*]*('|"").*\3)[\s\n]*)*href[ ]*=[ ]*('|"")(?<link>.*)\4[.\n]*>(?<anchor>[\s\S]*?)[\s\n]*<\/[\s\n]*a>");
It will match your anchors even if they are split across multiple lines. The reason it is so long is that it supports whitespace between the tags and their values, and C# does not support regex subroutines, so the [\s\n]* part has to be repeated multiple times.
You can see a working sample at dotnetfiddle
You can use it in your example like this:
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
if (articleWikiCheck == "id|wpTextbox1")
{
return Regex.Replace(articleBody,
@"<[\s\n]*a[\s\n]*(([^\s]+\s*[ ]*=*[ ]*[\s|\n*]*('|"").*\3)[\s\n]*)*href[ ]*=[ ]*('|"")(?<link>.*)\4[.\n]*>(?<anchor>[\s\S]*?)[\s\n]*<\/[\s\n]*a>",
"[${link} ${anchor}]");
}
else
{
return articleBody;
}
}

When using IndexOf and Substring, how do I work out the right start and end indexes? And how do I encode Hebrew chars?

I have this code:
string firstTag = "Forums2008/forumPage.aspx?forumId=";
string endTag = "</a>";
index = forums.IndexOf(firstTag, index1);
if (index == -1)
continue;
var secondIndex = forums.IndexOf(endTag, index);
result = forums.Substring(index + firstTag.Length + 12, secondIndex - (index + firstTag.Length - 50));
The string I want to extract from is, for example:
הנקה
What I want to get is only the word after the title: הנקה
The second problem is that when I extract it, instead of Hebrew I see gibberish like this: ������

One powerful way to do this is to use Regular Expressions instead of trying to find a starting position and use a substring. Try out this code, and you'll see that it extracts the anchor tag's title:
var input = "הנקה";
var expression = new System.Text.RegularExpressions.Regex(@"title=\""([^\""]+)\""");
var match = expression.Match(input);
if (match.Success) {
Console.WriteLine(match.Groups[1]);
}
else {
Console.WriteLine("not found");
}
And for the curious, here is a version in JavaScript:
var input = 'הנקה';
var expression = new RegExp('title=\"([^\"]+)\"');
var results = expression.exec(input);
if (results) {
document.write(results[1]);
}
else {
document.write("not found");
}

Okay, here is a solution using String.Substring(), String.Split() and String.IndexOf():
String str = "הנקה"; // <= Assume this is the string being passed in. Yes, unusual escape sequences are added
int splitStart = str.IndexOf("title="); // <= Where to start splitting
int splitEnd = str.LastIndexOf("</a>"); // <= Where to end
/* What we try to extract is this: title="הנקה">הנקה
* (given without escape sequences)
*/
String extracted = str.Substring(splitStart, splitEnd - splitStart); // <= Extract the required portion
String[] splitted = extracted.Split('"'); // <= Now split on "
Console.WriteLine(splitted[1]); // <= Tries to print it, but will produce ????; put a breakpoint here and check the values in the split array
Now the problem: here you can see that I had to use escape sequences in an unusual way. You may ignore that, since you are simply passing in the string to scan.
This actually works, but you cannot see it with the provided Console.WriteLine(splitted[1]);
If you put a breakpoint there and inspect the split array, you can see that the text is extracted.
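On the encoding half of the question: replacement characters like ������ usually appear when downloaded bytes are decoded with a different encoding than the one the page was written in. The sketch below only illustrates that idea. The class and method names are hypothetical, and which encoding the forum page actually uses is not known from the question, so that has to be checked against the page's charset declaration.

```csharp
using System;
using System.Text;

public static class PageDecoding
{
    // Decodes raw downloaded bytes with the encoding the page declares.
    // The caller has to supply the right encoding; guessing wrong garbles the text.
    public static string Decode(byte[] rawBytes, Encoding pageEncoding)
    {
        return pageEncoding.GetString(rawBytes);
    }

    // True when the text contains U+FFFD, the replacement character a decoder
    // emits for bytes that are not valid in the chosen encoding.
    public static bool LooksMisdecoded(string text)
    {
        return text.IndexOf('\uFFFD') >= 0;
    }
}
```

If the string is correct in the debugger but shows as ???? in a console window, the console itself may be the limiting factor; setting Console.OutputEncoding = Encoding.UTF8 before writing is a common thing to try, though the console font also has to contain Hebrew glyphs.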

Shorthand way to remove last forward slash and trailing characters from string

If I have the following string:
/lorem/ipsum/dolor
and I want this to become:
/lorem/ipsum
What is the short-hand way of removing the last forward slash, and all characters following it?
I know how I can do this by spliting the string into a List<> and removing the last item, and then joining, but is there a shorter way of writing this?
My question is not URL specific.

You can use Substring() and LastIndexOf():
str = str.Substring(0, str.LastIndexOf('/'));
EDIT (suggested comment)
To prevent any issues when the string may not contain a /, you could use something like:
int lastSlash = str.LastIndexOf('/');
str = (lastSlash > -1) ? str.Substring(0, lastSlash) : str;
Storing the position in a temp-variable would prevent the need to call .LastIndexOf('/') twice, but it could be dropped in favor of a one-line solution instead.

If there is '/' at the end of the url, remove it.
If not; just return the original one.
var url = this.Request.RequestUri.ToString();
url = url.EndsWith("/") ? url.Substring(0, url.Length - 1) : url;
url += @"/mycontroller";

You can do something like str.Remove(str.LastIndexOf("/")), but there is no built-in method to do what you want.
Edit: you could also use the Uri object to traverse directories, although it does not give exactly what you want:
Uri baseUri = new Uri("http://domain.com/lorem/ipsum/dolor");
Uri myUri = new Uri(baseUri, ".");
// myUri now contains http://domain.com/lorem/ipsum/

One simple way would be
String s = "domain.com/lorem/ipsum/dolor";
s = s.Substring(0, s.LastIndexOf('/'));
Console.WriteLine(s);
Another option, if you only need to strip trailing slashes (note that it does not remove the last segment):
String s = "domain.com/lorem/ipsum/dolor";
s = s.TrimEnd('/');
Console.WriteLine(s);

You can use the regex /[^/]*$ and replace with the empty string:
var result = new Regex("/[^/]*$").Replace("domain.com/lorem/ipsum/dolor", "");
But it's probably overkill here. @newfurniturey's answer of Substring with LastIndexOf is probably best.

I like to create a String Extension for stuff like this:
/// <summary>
/// Returns with suffix removed, if present
/// </summary>
public static string TrimIfEndsWith(
this string value,
string suffix)
{
return
value.EndsWith(suffix) ?
value.Substring(0, value.Length - suffix.Length) :
value;
}
You can then use like this:
var myString = "/lorem/ipsum/dolor";
var myStringClean = myString.TrimIfEndsWith("/dolor");
You now have a re-usable extension across all of your projects that can be used to remove one trailing character or multiple.

using System.IO;
// TrimEnd returns a new string, so assign the result back
mystring = mystring.TrimEnd(Path.AltDirectorySeparatorChar); // To remove "/"
mystring = mystring.TrimEnd(Path.DirectorySeparatorChar); // To remove "\" (on Windows)

// EndsWith is safe on an empty string, where Last() would throw
while (input.EndsWith("/") || input.EndsWith("\\"))
{
input = input.Substring(0, input.Length - 1);
}

Thank you @Curt for your question.
I slightly improved @newfurniturey's code, and here is my version.
if(str.Contains('/')){
str = str.Substring(0, str.LastIndexOf('/'));
}

I'm way late to the party, but if you're using C# 8.0+, another clean approach would be to use the range operator:
if (urlStr.EndsWith("/")) urlStr = urlStr[..^1];
If you're curious as to how this works, take a look at the spec for ranges in C#:
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/proposals/csharp-8.0/ranges
tldr; urlStr[..^1] roughly translates to something along the lines of "Give me a substring comprised of the characters contained within the range of index 0 to whatever index is 1 away from the last index.".
In other words, it's similar to...
urlStr.Substring(0, urlStr.Length-1)

replace a character in a string in c# based on position with a string

I want to replace a character in a string with a string in C#.
I have tried the following.
In the following program, I want to replace the set of characters between ':' and the first occurrence of '-' with some other characters.
I was able to extract the set of characters between ':' and the first occurrence of '-'.
Can anyone say how to insert these back into the source string?
string source= "tcm:7-426-8";
string target= "tcm:10-15-2";
int fistunderscore = target.IndexOf("-");
string temp = target.Substring(4, fistunderscore-4);
Response.Write("<BR>"+"temp1:" + temp + "<BR>");
Examples:
source: "tcm:7-426-8" or "tcm:100-426-8" or "tcm:10-426-8"
Target: "tcm:10-15-2" or "tcm:5-15-2" or "tcm:100-15-2"
output: "tcm:10-426-8" or "tcm:5-426-8" or "tcm:100-426-8"
In a nutshell, I want to replace the set of characters between ':' and the first occurrence of '-' with the characters extracted from the same position in another string of the same form.
Can anyone help with how this can be done?
Thank you.

If you want to replace the first ":Number-" in the source with the corresponding content from the target, you can use the following regex.
var source = "tcm:7-426-8";
var target = "tcm:10-15-2";
var pattern1 = new Regex(@":\d{1,3}-");
if (pattern1.IsMatch(source) && pattern1.IsMatch(target))
{
var res = pattern1.Replace(source, pattern1.Match(target).Value);
// "tcm:10-426-8"
}
Edit: To avoid replacing your string with something empty, add an if-clause before the actual replacement.

Try a regex solution. This method takes the source and target strings and performs a regex replace on the first, targeting the first number after 'tcm:', which must be anchored to the start of the string. In the MatchEvaluator it executes the same regex again, but on the target string.
static Regex rx = new Regex("(?<=^tcm:)[0-9]+", RegexOptions.Compiled);
public string ReplaceOneWith(string source, string target)
{
return rx.Replace(source, new MatchEvaluator((Match m) =>
{
var targetMatch = rx.Match(target);
if (targetMatch.Success)
return targetMatch.Value;
return m.Value; //don't replace if no match
}));
}
Note that no replacement is performed if the regex doesn't return a match on the target string.
Now run this test (probably need to copy the above into the test class):
[TestMethod]
public void SO9973554()
{
Assert.AreEqual("tcm:10-426-8", ReplaceOneWith("tcm:7-426-8", "tcm:10-15-2"));
Assert.AreEqual("tcm:5-426-8", ReplaceOneWith("tcm:100-426-8", "tcm:5-15-2"));
Assert.AreEqual("tcm:100-426-8", ReplaceOneWith("tcm:10-426-8", "tcm:100-15-2"));
}

I'm not clear on the logic used to decide which bit from which string is used, but still, you should use Split(), rather than mucking about with string offsets:
(note that the Remove(0,4) is there to remove the tcm: prefix)
string[] source = "tcm:90-2-10".Remove(0,4).Split('-');
string[] target = "tcm:42-23-17".Remove(0,4).Split('-');
Now you have the numbers from both source and target in easy-to-access arrays, so you can build the new string any way you want:
string output = string.Format("tcm:{0}-{1}-{2}", source[0], target[1], source[2]);

Here's one without regex:
string source = "tcm:7-426-8";
string target = "tcm:10-15-2";
int targetBeginning = target.IndexOf("-");
int sourceBeginning = source.IndexOf("-");
string temp = target.Substring(0, targetBeginning);//tcm:10
string result = temp + source.Substring(sourceBeginning, source.Length-sourceBeginning); //tcm:10 + -426-8

get all characters to right of last dash

I have the following:
string test = "9586-202-10072";
How would I get all characters to the right of the final - so 10072. The number of characters is always different to the right of the last dash.
How can this be done?

You can get the position of the last - with str.LastIndexOf('-'). So the next step is obvious:
var result = str.Substring(str.LastIndexOf('-') + 1);
Correction:
As Brian states below, using this on a string with no dashes will result in the original string being returned.

You could use LINQ, and save yourself the explicit parsing:
string test = "9586-202-10072";
string lastFragment = test.Split('-').Last();
Console.WriteLine(lastFragment);

I can see this post was viewed over 46,000 times. I would bet many of the 46,000 viewers are asking this question simply because they just want the file name... and these answers can be a rabbit hole if you cannot make your substring verbatim using the at sign.
If you simply want to get the file name, then there is a simple answer which should be mentioned here. Even if it's not the precise answer to the question.
result = Path.GetFileName(fileName);
see https://msdn.microsoft.com/en-us/library/system.io.path.getfilename(v=vs.110).aspx

string tail = test.Substring(test.LastIndexOf('-') + 1);

YourString.Substring(YourString.LastIndexOf("-") + 1); // + 1 so the dash itself is not included

With the latest C# 8 and later you can use Range Indexer as follows:-
string test = "9586-202-10072";
var foo = test?[(test.LastIndexOf('-') + 1)..];
// foo is => 10072

string atest = "9586-202-10072";
int indexOfHyphen = atest.LastIndexOf("-");
if (indexOfHyphen >= 0)
{
string contentAfterLastHyphen = atest.Substring(indexOfHyphen + 1);
Console.WriteLine(contentAfterLastHyphen );
}

See the String.LastIndexOf method.

I created a string extension for this, hope it helps.
public static string GetStringAfterChar(this string value, char substring)
{
if (!string.IsNullOrWhiteSpace(value))
{
var index = value.LastIndexOf(substring);
return index >= 0 ? value.Substring(index + 1) : value; // >= 0 so a match at position 0 is handled too
}
return string.Empty;
}

test[(test.LastIndexOf('-') + 1)..]
C# 8 (late 2019) introduced the range operator, which simplifies this a bit further. The two dots here mean from the given index (inclusive) to the end of the string.

test.Substring(test.LastIndexOf("-") + 1)

and... in case you need the left part of a string:
private string AllTheLeftPart(string theString)
{
// Cut at the last dash; if there is no dash, return the string unchanged
int lastDash = theString.LastIndexOf('-');
string leftPart = lastDash >= 0 ? theString.Substring(0, lastDash) : theString;
return leftPart;
}
