get div element contents in C#

get div element contents in C# - c#

I have a moderately well-formatted HTML document. It is not XHTML so it's not valid XML. Given a offset of the opening tag I need to obtain contents of this tag, considering that it can have multiple nested tags inside of it.
What is the easiest way to solve this problem with a minimum amount of C# code that doesn't involve using non-standard libraries?

You can strip your html content using following function
public static string StripHTMLTag(string strHTML)
{
return Regex.Replace(strHTML, "<(.|\n)*?>", "");
}
pass your content of outer tag, this will strip all html tags and provide you only content.
Hope this helps
Imran

I ended up writing the following function. It seems to get the job done for my purposes.
I know that it's kind of dirty, but so is the HTML code of most web-pages.
If anyone can point out principal flaws, please do so:
private static readonly Regex rxDivTag = new Regex(
#"<(?<close>/)?div(\s[^>]*?)?(?<selfClose>/)?>",
RegexOptions.Compiled | RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase | RegexOptions.Singleline);
private const string RXCAP_DIVTAG_CLOSE = "close";
private const string RXCAP_DIVTAG_SELFCLOSE = "selfClose";
private static List<string> GetProductDivs(string pageText, int start)
{
bool success = true;
int curr = start + 1;
for (Match matchNextTag = rxDivTag.Match(pageText, curr) ; depth > 0 ; matchNextTag = rxDivTag.Match(pageText, curr))
{
if (matchNextTag == Match.Empty)
{
success = false;
break;
}
if (matchNextTag.Groups[RXCAP_DIVTAG_CLOSE].Success)
{
if (matchNextTag.Groups[RXCAP_DIVTAG_SELFCLOSE].Success)
{
success = false;
break;
}
--depth;
}
else if (!matchNextTag.Groups[RXCAP_DIVTAG_SELFCLOSE].Success)
{
++depth;
}
curr = matchNextTag.Index + matchNextTag.Length;
}
if (success)
{
return pageText.Substring(start, curr - start);
}
else
{
return null;
}
}

Related

How do I remove all HTML tags from a string without knowing which tags are in it?

Is there any easy way to remove all HTML tags or ANYTHING HTML related from a string?
For example:
string title = "<b> Hulk Hogan's Celebrity Championship Wrestling <font color=\"#228b22\">[Proj # 206010]</font></b> (Reality Series, )"
The above should really be:
"Hulk Hogan's Celebrity Championship Wrestling [Proj # 206010] (Reality Series)"

You can use a simple regex like this:
public static string StripHTML(string input)
{
return Regex.Replace(input, "<.*?>", String.Empty);
}
Be aware that this solution has its own flaw. See Remove HTML tags in String for more information (especially the comments of 'Mark E. Haase'/#mehaase)
Another solution would be to use the HTML Agility Pack.
You can find an example using the library here: HTML agility pack - removing unwanted tags without removing content?

You can parse the string using Html Agility pack and get the InnerText.
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(#"<b> Hulk Hogan's Celebrity Championship Wrestling <font color=\"#228b22\">[Proj # 206010]</font></b> (Reality Series, )");
string result = htmlDoc.DocumentNode.InnerText;

You can use the below code on your string and you will get the complete string without html part.
string title = "<b> Hulk Hogan's Celebrity Championship Wrestling <font color=\"#228b22\">[Proj # 206010]</font></b> (Reality Series, )".Replace(" ",string.Empty);
string s = Regex.Replace(title, "<.*?>", String.Empty);

I built a small function to remove HTML tags.
public static string RemoveHtmlTags(string text)
{
List<int> openTagIndexes = Regex.Matches(text, "<").Cast<Match>().Select(m => m.Index).ToList();
List<int> closeTagIndexes = Regex.Matches(text, ">").Cast<Match>().Select(m => m.Index).ToList();
if (closeTagIndexes.Count > 0)
{
StringBuilder sb = new StringBuilder();
int previousIndex = 0;
foreach (int closeTagIndex in closeTagIndexes)
{
var openTagsSubset = openTagIndexes.Where(x => x >= previousIndex && x < closeTagIndex);
if (openTagsSubset.Count() > 0 && closeTagIndex - openTagsSubset.Max() > 1 )
{
sb.Append(text.Substring(previousIndex, openTagsSubset.Max() - previousIndex));
}
else
{
sb.Append(text.Substring(previousIndex, closeTagIndex - previousIndex + 1));
}
previousIndex = closeTagIndex + 1;
}
if (closeTagIndexes.Max() < text.Length)
{
sb.Append(text.Substring(closeTagIndexes.Max() + 1));
}
return sb.ToString();
}
else
{
return text;
}
}

public static string StripHTML(string input)
{
if (input==null)
{
return string.Empty;
}
return Regex.Replace(input, "<.*?>", String.Empty);
}

Regular expression can't handle rogue square brackets

Thanks to the smarties on here in the past I have this amazing recursive regular expression that helps me to transform custom BBCode-style tags in a block of text.
/// <summary>
/// Static class containing common regular expression strings.
/// </summary>
public static class RegularExpressions
{
/// <summary>
/// Expression to find all root-level BBCode tags. Use this expression recursively to obtain nested tags.
/// </summary>
public static string BBCodeTags
{
get
{
return #"
(?>
\[ (?<tag>[^][/=\s]+) \s*
(?: = \s* (?<val>[^][]*) \s*)?
]
)
(?<content>
(?>
\[(?<innertag>[^][/=\s]+)[^][]*]
|
\[/(?<-innertag>\k<innertag>)]
|
[^][]+
)*
(?(innertag)(?!))
)
\[/\k<tag>]
";
}
}
}
This regex works beautifully, recursively matching on all tags. Like this:
[code]
some code
[b]some text [url=http://www.google.com]some link[/url][/b]
[/code]
The regex does exactly what I want and matches the [code] tag. It breaks it up into three groups: tag, optional value, and content. Tag being the tag name ("code" in this case). Optional value being a value after the equals(=) sign if there is one. And content being everything between the opening and closing tag.
The regex can be used recursively to match nested tags. So after matching on [code] I would run it again against the content group and it would match the [b] tag. If I ran it again on the next content group it would then match the [url] tag.
All of that is wonderful and delicious but it hiccups on one issue. It can't handle rogue square brackets.
[code]This is a successful match.[/code]
[code]This is an [ unsuccessful match.[/code]
[code]This is also an [unsuccessful] match.[/code]
I really suck at regular expressions but if anyone knows how I might tweak this regex to correctly ignore rogue brackets (brackets that do not make up an opening tag and/or do not have a matching closing tag) so that it still matches the surrounding tags, I would be very appreciative :D
Thanks in advance!
Edit
If you are interested in seeing the method where I use this expression you are welcome to.

I did a program that can parse your strings in a debugable, developer-friendly way. It is not a small code like those regexes, but it has a positive side: you can debug it, and fine tune it as you need.
The implementation is a descent recursive parser, but if you need some kind of contextual data, you can place it all inside the ParseContext class.
It is quite long, but I consider it as being better than a a regex based solution.
To test it, create a console application, and replace all the code inside Program.cs with the following code:
using System.Collections.Generic;
namespace q7922337
{
static class Program
{
static void Main(string[] args)
{
var result1 = Match.ParseList<TagsGroup>("[code]This is a successful match.[/code]");
var result2 = Match.ParseList<TagsGroup>("[code]This is an [ unsuccessful match.[/code]");
var result3 = Match.ParseList<TagsGroup>("[code]This is also an [unsuccessful] match.[/code]");
var result4 = Match.ParseList<TagsGroup>(#"
[code]
some code
[b]some text [url=http://www.google.com]some link[/url][/b]
[/code]");
}
class ParseContext
{
public string Source { get; set; }
public int Position { get; set; }
}
abstract class Match
{
public override string ToString()
{
return this.Text;
}
public string Source { get; set; }
public int Start { get; set; }
public int Length { get; set; }
public string Text { get { return this.Source.Substring(this.Start, this.Length); } }
protected abstract bool ParseInternal(ParseContext context);
public bool Parse(ParseContext context)
{
var result = this.ParseInternal(context);
this.Length = context.Position - this.Start;
return result;
}
public bool MarkBeginAndParse(ParseContext context)
{
this.Start = context.Position;
var result = this.ParseInternal(context);
this.Length = context.Position - this.Start;
return result;
}
public static List<T> ParseList<T>(string source)
where T : Match, new()
{
var context = new ParseContext
{
Position = 0,
Source = source
};
var result = new List<T>();
while (true)
{
var item = new T { Source = source, Start = context.Position };
if (!item.Parse(context))
break;
result.Add(item);
}
return result;
}
public static T ParseSingle<T>(string source)
where T : Match, new()
{
var context = new ParseContext
{
Position = 0,
Source = source
};
var result = new T { Source = source, Start = context.Position };
if (result.Parse(context))
return result;
return null;
}
protected List<T> ReadList<T>(ParseContext context)
where T : Match, new()
{
var result = new List<T>();
while (true)
{
var item = new T { Source = this.Source, Start = context.Position };
if (!item.Parse(context))
break;
result.Add(item);
}
return result;
}
protected T ReadSingle<T>(ParseContext context)
where T : Match, new()
{
var result = new T { Source = this.Source, Start = context.Position };
if (result.Parse(context))
return result;
return null;
}
protected int ReadSpaces(ParseContext context)
{
int startPos = context.Position;
int cnt = 0;
while (true)
{
if (startPos + cnt >= context.Source.Length)
break;
if (!char.IsWhiteSpace(context.Source[context.Position + cnt]))
break;
cnt++;
}
context.Position = startPos + cnt;
return cnt;
}
protected bool ReadChar(ParseContext context, char p)
{
int startPos = context.Position;
if (startPos >= context.Source.Length)
return false;
if (context.Source[startPos] == p)
{
context.Position = startPos + 1;
return true;
}
return false;
}
}
class Tag : Match
{
protected override bool ParseInternal(ParseContext context)
{
int startPos = context.Position;
if (!this.ReadChar(context, '['))
return false;
this.ReadSpaces(context);
if (this.ReadChar(context, '/'))
this.IsEndTag = true;
this.ReadSpaces(context);
var validName = this.ReadValidName(context);
if (validName != null)
this.Name = validName;
this.ReadSpaces(context);
if (this.ReadChar(context, ']'))
return true;
context.Position = startPos;
return false;
}
protected string ReadValidName(ParseContext context)
{
int startPos = context.Position;
int endPos = startPos;
while (char.IsLetter(context.Source[endPos]))
endPos++;
if (endPos == startPos) return null;
context.Position = endPos;
return context.Source.Substring(startPos, endPos - startPos);
}
public bool IsEndTag { get; set; }
public string Name { get; set; }
}
class TagsGroup : Match
{
public TagsGroup()
{
}
protected TagsGroup(Tag openTag)
{
this.Start = openTag.Start;
this.Source = openTag.Source;
this.OpenTag = openTag;
}
protected override bool ParseInternal(ParseContext context)
{
var startPos = context.Position;
if (this.OpenTag == null)
{
this.ReadSpaces(context);
this.OpenTag = this.ReadSingle<Tag>(context);
}
if (this.OpenTag != null)
{
int textStart = context.Position;
int textLength = 0;
while (true)
{
Tag tag = new Tag { Source = this.Source, Start = context.Position };
while (!tag.MarkBeginAndParse(context))
{
if (context.Position >= context.Source.Length)
{
context.Position = startPos;
return false;
}
context.Position++;
textLength++;
}
if (!tag.IsEndTag)
{
var tagGrpStart = context.Position;
var tagGrup = new TagsGroup(tag);
if (tagGrup.Parse(context))
{
if (textLength > 0)
{
if (this.Contents == null) this.Contents = new List<Match>();
this.Contents.Add(new Text { Source = this.Source, Start = textStart, Length = textLength });
textStart = context.Position;
textLength = 0;
}
this.Contents.Add(tagGrup);
}
else
{
textLength += tag.Length;
}
}
else
{
if (tag.Name == this.OpenTag.Name)
{
if (textLength > 0)
{
if (this.Contents == null) this.Contents = new List<Match>();
this.Contents.Add(new Text { Source = this.Source, Start = textStart, Length = textLength });
textStart = context.Position;
textLength = 0;
}
this.CloseTag = tag;
return true;
}
else
{
textLength += tag.Length;
}
}
}
}
context.Position = startPos;
return false;
}
public Tag OpenTag { get; set; }
public Tag CloseTag { get; set; }
public List<Match> Contents { get; set; }
}
class Text : Match
{
protected override bool ParseInternal(ParseContext context)
{
return true;
}
}
}
}
If you use this code, and someday find that you need optimizations because the parser has become ambiguous, then try using a dictionary in the ParseContext, take a look here for more info: http://en.wikipedia.org/wiki/Top-down_parsing in the topic Time and space complexity of top-down parsing. I find it very interesting.

The first change is pretty simple - you can get it by changing [^][]+, which is responsible for matching the free text, to .. This seems a little crazy, perhaps, but it's actually safe, because you are using a possessive group (?> ), so all the valid tags will be matched by the first alternation - \[(?<innertag>[^][/=\s]+)[^][]*] - and cannot backtrack and break the tags.
(Remember to enable the Singleline flag, so . matches newlines)
The second requirement, [unsuccessful], seems to go against your goal it. The whole idea from the very start is not to match these unclosed tags. If you allow unclosed tags, all matches of the form \[(.*?)\].*?[/\1] become valid. Not good. At best, you can try to whitelist a few tags which are not allowed to be matched.
An example of both changes:
(?>
\[ (?<tag>[^][/=\s]+) \s*
(?: = \s* (?<val>[^][]*) \s*)?
\]
)
(?<content>
(?>
\[(?:unsuccessful)\] # self closing
|
\[(?<innertag>[^][/=\s]+)[^][]*]
|
\[/(?<-innertag>\k<innertag>)]
|
.
)*
(?(innertag)(?!))
)
\[/\k<tag>\]
Working example on Regex Hero

Ok. Here's another attempt. This one is a little more complicated.
The idea is to match the whole text from start to ext, and parse it to a single Match. While rarely used as such, .Net Balancing Groups allow you to fine tune your captures, remembering all positions and captures exactly the way you want them.
The pattern I came up with is:
\A
(?<StartContentPosition>)
(?:
# Open tag
(?<Content-StartContentPosition>) # capture the content between tags
(?<StartTagPosition>) # Keep the starting postion of the tag
(?>\[(?<TagName>[^][/=\s]+)[^\]\[]*\]) # opening tag
(?<StartContentPosition>) # start another content capture
|
# Close tag
(?<Content-StartContentPosition>) # capture the content in the tag
\[/\k<TagName>\](?<Tag-StartTagPosition>) # closing tag, keep the content in the <tag> group
(?<-TagName>)
(?<StartContentPosition>) # start another content capture
|
. # just match anything. The tags are first, so it should match
# a few if it can. (?(TagName)(?!)) keeps this in line, so
# unmatched tags will not mess with the resul
)*
(?<Content-StartContentPosition>) # capture the content after the last tag
\Z
(?(TagName)(?!))
Remember - the balancing group (?<A-B>) captures into A all text since B was last captured (and pops that position from B's stack).
Now you can parse the string using:
Match match = Regex.Match(sample, pattern, RegexOptions.Singleline |
RegexOptions.IgnorePatternWhitespace);
Your interesting data will be on match.Groups["Tag"].Captures, which contains all tags (some of them are contained in others), and match.Groups["Content"].Captures, which contains tag's contents, and contents between tags. For example, without all blanks, it contains:
some code
some text
This is also an successful match.
This is also an [ unsuccessful match.
This is also an [unsuccessful] match.
This is pretty close to a full parsed document, but you'll still have to play with indices and length to figure out the exact order and structure of the document (though it isn't more complex than sorting all captures)
At this point I'll state what others have said - it may be a good time to write a parser, this pattern isn't pretty...

String Replacements for Word Merge

using asp.net 4
we do a lot of Word merges at work. rather than using the complicated conditional statements of Word i want to embed my own syntax. something like:
Dear Mr. { select lastname from users where userid = 7 },
Your invoice for this quarter is: ${ select amount from invoices where userid = 7 }.
......
ideally, i'd like this to get turned into:
string.Format("Dear Mr. {0}, Your invoice for this quarter is: ${1}", sqlEval[0], sqlEval[1]);
any ideas?

Well, I don't really recommend rolling your own solution for this, however I will answer the question as asked.
First, you need to process the text and extract the SQL statements. For that you'll need a simple parser:
/// <summary>Parses the input string and extracts a unique list of all placeholders.</summary>
/// <remarks>
/// This method does not handle escaping of delimiters
/// </remarks>
public static IList<string> Parse(string input)
{
const char placeholderDelimStart = '{';
const char placeholderDelimEnd = '}';
var characters = input.ToCharArray();
var placeHolders = new List<string>();
string currentPlaceHolder = string.Empty;
bool inPlaceHolder = false;
for (int i = 0; i < characters.Length; i++)
{
var currentChar = characters[i];
// Start of a placeholder
if (!inPlaceHolder && currentChar == placeholderDelimStart)
{
currentPlaceHolder = string.Empty;
inPlaceHolder = true;
continue;
}
// Start of a placeholder when we already have one
if (inPlaceHolder && currentChar == placeholderDelimStart)
throw new InvalidOperationException("Unexpected character detected at position " + i);
// We found the end marker while in a placeholder - we're done with this placeholder
if (inPlaceHolder && currentChar == placeholderDelimEnd)
{
if (!placeHolders.Contains(currentPlaceHolder))
placeHolders.Add(currentPlaceHolder);
inPlaceHolder = false;
continue;
}
// End of a placeholder with no matching start
if (!inPlaceHolder && currentChar == placeholderDelimEnd)
throw new InvalidOperationException("Unexpected character detected at position " + i);
if (inPlaceHolder)
currentPlaceHolder += currentChar;
}
return placeHolders;
}
Okay, so that will get you a list of SQL statements extracted from the input text. You'll probably want to tweak it to use properly typed parser exceptions and some input guards (which I elided for clarity).
Now you just need to replace those placeholders with the results of the evaluated SQL:
// Sample input
var input = "Hello Mr. {select firstname from users where userid=7}";
string output = input;
var extractedStatements = Parse(input);
foreach (var statement in extractedStatements)
{
// Execute the SQL statement
var result = Evaluate(statement);
// Update the output with the result of the SQL statement
output = output.Replace("{" + statement + "}", result);
}
This is obviously not the most efficient way to do this, but I think it sufficiently demonstrates the concept without muddying the waters.
You'll need to define the Evaluate(string) method. This will handle executing the SQL.

I just finished building a proprietary solution like this for a law firm here.
I evaluated a product called Windward reports. It's a tad pricy, esp if you need a lot of copies, but for one user it's not bad.
it can pull from XML or SQL data sources (or more if I remember).
Might be worth a look (and no I don't work for 'em, just evaluated their stuff)

You might want to check out the razor engine project on codeplex
http://razorengine.codeplex.com/
Using SQL etc within your template looks like a bad idea. I'd suggest you make a ViewModel for each template.
The Razor thing is really easy to use. Just add a reference, import the namespace, and call the Parse method like so:
(VB guy so excuse syntax!)
MyViewModel myModel = new MyViewModel("Bob",150.00); //set properties
string myTemplate = "Dear Mr. #Model.FirstName, Your invoice for this quarter is: #Model.InvoiceAmount";
string myOutput = Razor.Parse(myTemplate, myModel);
Your string can come from anywhere - I use this with my templates stored in a database, you could equally load it from files or whatever. It's very powerful as a view engine, you can do conditional stuff, loops, etc etc.

i ended up rolling my own solution but thanks. i really dislike if statements. i'll need to refactor them out. here it is:
var mailingMergeString = new MailingMergeString(input);
var output = mailingMergeString.ParseMailingMergeString();
public class MailingMergeString
{
private string _input;
public MailingMergeString(string input)
{
_input = input;
}
public string ParseMailingMergeString()
{
IList<SqlReplaceCommand> sqlCommands = new List<SqlReplaceCommand>();
var i = 0;
const string openBrace = "{";
const string closeBrace = "}";
while (string.IsNullOrWhiteSpace(_input) == false)
{
var sqlReplaceCommand = new SqlReplaceCommand();
var open = _input.IndexOf(openBrace) + 1;
var close = _input.IndexOf(closeBrace);
var length = close != -1 ? close - open : _input.Length;
var newInput = _input.Substring(close + 1);
var nextClose = newInput.Contains(openBrace) ? newInput.IndexOf(openBrace) : newInput.Length;
if (i == 0 && open > 0)
{
sqlReplaceCommand.Text = _input.Substring(0, open - 1);
_input = _input.Substring(open - 1);
}
else
{
sqlReplaceCommand.Command = _input.Substring(open, length);
sqlReplaceCommand.PlaceHolder = openBrace + i + closeBrace;
sqlReplaceCommand.Text = _input.Substring(close + 1, nextClose);
sqlReplaceCommand.NewInput = _input.Substring(close + 1);
_input = newInput.Contains(openBrace) ? sqlReplaceCommand.NewInput : string.Empty;
}
sqlCommands.Add(sqlReplaceCommand);
i++;
}
return sqlCommands.GetParsedString();
}
internal class SqlReplaceCommand
{
public string Command { get; set; }
public string SqlResult { get; set; }
public string PlaceHolder { get; set; }
public string Text { get; set; }
protected internal string NewInput { get; set; }
}
}
internal static class SqlReplaceExtensions
{
public static string GetParsedString(this IEnumerable<MailingMergeString.SqlReplaceCommand> sqlCommands)
{
return sqlCommands.Aggregate("", (current, replaceCommand) => current + (replaceCommand.PlaceHolder + replaceCommand.Text));
}
}

Parsing tree in C#

I have a [textual] tree like this:
+---step-1
| +---step_2
| | +---step3
| | \---step4
| +---step_2.1
| \---step_2.2
+---step1.2
Tree2
+---step-1
| \---step_2
| | +---step3
| | \---step4
+---step1.2
This is just a small example, tree can be deeper and with more children and etc..
Right now I'm doing this:
for (int i = 0; i < cmdOutList.Count; i++)
{
string s = cmdOutList[i];
String value = Regex.Match(s, #"(?<=\---).*").Value;
value = value.Replace("\r", "");
if (s[1].ToString() == "-")
{
DirectoryNode p = new DirectoryNode { Name = value };
//p.AddChild(f);
directoryList.Add(p);
}
else
{
DirectoryNode f = new DirectoryNode { Name = value };
directoryList[i - 1].AddChild(f);
directoryList.Add(f);
}
}
But this doesn't handle the "step_2.1" and "step_2.2"
I think I'm doing this totally wrong, maybe someone can help me out with this.
EDIT:
Here is the DirectoryNode class to make that a bit more clear..
public class DirectoryNode
{
public DirectoryNode()
{
this.Children = new List<DirectoryNode>();
}
public DirectoryNode ParentObject { get; set; }
public string Name;
public List<DirectoryNode> Children { get; set; }
public void AddChild(DirectoryNode child)
{
child.ParentObject = this;
this.Children.Add(child);
}
}

If your text is that simple (just either +--- or \--- preceded by a series of |), then a regex might be more than you need (and what's tripping you up).
DirectoryNode currentParent = null;
DirectoryNode current = null;
int lastStartIndex = 0;
foreach(string temp in cmdOutList)
{
string line = temp;
int startIndex = Math.Max(line.IndexOf("+"), line.IndexOf(#"\");
line = line.Substring(startIndex);
if(startIndex > lastStartIndex)
{
currentParent = current;
}
else if(startIndex < lastStartIndex)
{
for(int i = 0; i < (lastStartIndex - startIndex) / 4; i++)
{
if(currentParent == null) break;
currentParent = currentParent.ParentObject;
}
}
lastStartIndex = startIndex;
current = new DirectoryNode() { Name = line.Substring(4) };
if(currentParent != null)
{
currentParent.AddChild(current);
}
else
{
directoryList.Add(current);
}
}

Regex definitely looks unnecessary here, since the symbols in your markup language (that's what it is, after all) are both static and few. That is: Although the label names may vary, the tokens you need to look for when trying to parse them into relevant pieces will never be anything other than +---, \---, and ..
From a question I answered yesterday: "Regexes are extremely useful for describing a whole class of needles in a largely unknown haystack, but they're not the right tool for input that's in a very static format."
String manipulation is what you want for parsing this, especially since you're dealing with a recursive markup language, which can't be fully understood by regex anyway. I'd also suggest creating a tree-type data structure to store the data (which, surprisingly, doesn't seem to be included in the framework unless they added it after 2.0).
As an aside, your regex above seems to have an unnecessary \ in it, but that doesn't matter in most regex flavors.

Simple regex - replace quote information in forum topic in C#

This should be pretty straightforward I would think.
I have this string:
[quote=Joe Johnson|1]Hi![/quote]
Which should be replaced with something like
<div class="quote">Hi!<div>JoeJohnson</div></div>
I'm pretty shure this is not going very well. So far I have this:
Regex regexQuote = new Regex(#"\[quote\=(.*?)\|(.*?)\](.*?)\[\/quote\]");
Can anyone point me in the right direction?
Any help appreciated!

Try this:
string pattern = #"\[quote=(.*?)\|(\d+)\]([\s\S]*?)\[/quote\]";
string replacement =
#"<div class=""quote"">$3<div>$1</div></div>";
Console.WriteLine(
Regex.Replace(input, pattern, replacement));

why didn't you said you want to deal with nested tags as well...
i've barely ever worked with regex, but here the thing:
static string ReplaceQuoteTags(string input)
{
const string closeTag = #"[/quote]";
const string pattern = #"\[quote=(.*?)\|(\d+?)\](.*?)\[/quote\]"; //or whatever you prefer
const string replacement = #"<div class=""quote"">{0}<div>{2}</div></div>";
int searchStartIndex = 0;
int closeTagIndex = input.IndexOf(closeTag, StringComparison.OrdinalIgnoreCase);
while (closeTagIndex > -1)
{
Regex r = new Regex(pattern, RegexOptions.RightToLeft | RegexOptions.IgnoreCase);
bool found = false;
input = r.Replace(input,
x =>
{
found = true;
return string.Format(replacement, x.Groups[3], x.Groups[2], x.Groups[1]);
}
, 1, closeTagIndex + closeTag.Length);
if (!found)
{
searchStartIndex = closeTagIndex + closeTag.Length;
//in case there is a close tag without a proper corresond open tag.
}
closeTagIndex = input.IndexOf(closeTag, searchStartIndex, StringComparison.OrdinalIgnoreCase);
}
return input;
}

this should be your regex in dot net:
\[quote\=(?<name>(.*))\|(?<id>(.*))\](?<content>(.*))\[\/quote\]
string name = regexQuote.Match().Groups["name"];
string id = regexQuote.Match().Groups["id"];
//..

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

get div element contents in C# - c#

You can strip your html content using following function public static string StripHTMLTag(string strHTML) { return Regex.Replace(strHTML, "<(.|\n)*?>", ""); } pass your content of outer tag, this will strip all html tags and provide you only content. Hope this helps Imran

Related

How do I remove all HTML tags from a string without knowing which tags are in it?

Regular expression can't handle rogue square brackets

String Replacements for Word Merge

Parsing tree in C#

Simple regex - replace quote information in forum topic in C#

Categories

Resources