Regular expression: replace HTML tags with <p>, except certain tags - C#

We have HTML content like this:
<em></ em >
<font style="text-align:justify;">aaaaaaaaaaa</font>
<img src="abc.jpg"/>
<iframe src="somelink.com">
</iframe>
<br>
<br/>
We want to change all HTML tags to <p></p>,
but not the <img/> and <br/> tags (some <br/> tags may appear as <br>).
So this is our expected result:
<p></p>
<p>aaaaaaaaaaa</p>
<img src="abc.jpg"/>
<p>
</p>
<br>
<br/>
My regular expressions (in C#):
String result = Regex.Replace(content, @"<[^/b>]*>", "<p>");
result = Regex.Replace(result, @"</[^>]*>", "</p>");
but they can't skip the tags we want to keep.
Please help me, thanks!

You can use this:
<(?<close>/?)((?!img|br).)*?>
and replace with:
<${close}p>
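A minimal sketch of the same replacement, written in JavaScript so it runs without a .NET project (.NET's Regex accepts the same named-group syntax; only the replacement token differs: ${close} in .NET, $<close> in JavaScript). It also shows a tightened variant, which is an assumption of this sketch and not part of the answer: the answer's ((?!img|br).)*? rejects a tag if "img" or "br" appears anywhere inside it, so a tag such as <a href="library.html"> is left alone because "library" contains "br". The variant checks only the tag name. toParagraphTags is a hypothetical helper name.

```javascript
// Answer's pattern ported to JavaScript: "close" captures an optional "/",
// and the lookahead rejects the tag if "img" or "br" starts at ANY position inside it.
const answerPattern = /<(?<close>\/?)((?!img|br).)*?>/g;

// Tightened variant (an assumption, not from the answer): the exclusion is tested once,
// right after "<" and an optional "/", so only the tag name is inspected.
// The "i" flag also leaves <IMG> and <BR> alone.
const tagNamePattern = /<(?!\/?(?:img|br)\b)(?<close>\/?)[^>]*>/gi;

// Hypothetical helper: rewrites every tag not excluded by the pattern as <p> or </p>.
function toParagraphTags(html, pattern = tagNamePattern) {
  // "$<close>" re-inserts the captured "/" (or nothing) before "p".
  return html.replace(pattern, "<$<close>p>");
}

const input = [
  "<em></ em >",
  '<font style="text-align:justify;">aaaaaaaaaaa</font>',
  '<img src="abc.jpg"/>',
  '<iframe src="somelink.com">',
  "</iframe>",
  "<br>",
  "<br/>",
].join("\n");

console.log(toParagraphTags(input));
```

Both patterns produce the expected result for the sample input; they differ on tags such as <a href="library.html">, where the answer's pattern leaves the opening tag untouched but still rewrites the closing </a>.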

Related

AngularJS: replace a token in an HTML string

I have a value coming from the model that is a fairly large HTML string. It contains a token that needs to be replaced with a value from Angular, and I am trying to work out how to make the two ends meet.
My HTML:
<h2 class="sub-title"> @Model.WelcomeText </h2>
This welcome text is an HTML string that contains a token, something like this:
"Hi ##Name , Welcome to our Website. Click <a href='#'> here </a> for more details. <small> For more details please visit blah blah </small>"
Now I need to replace ##Name with a value from my controller, $scope.Name.
I tried:
<h2 class="sub-title" ng-init="$scope.welcomeText('@Model.WelcomeText')"></h2>
and then, in my controller:
function welcomeText(str)
{
return str.replace('##Name',$scope.Name);
}
But it breaks in the HTML itself, because the value passed to $scope.welcomeText inside ng-init (an HTML string with its own quotes and tags) is not a valid expression.
Any pointers on how to achieve this? Model.Title comes from Sitecore CMS, so I don't have that value in JS.
I ended up doing this. I'm not sure it is the ideal solution, but at least it works.
<!-- Added a hidden div to get the value. -->
<div class="hidden" id="inp-title-server"> @Model.Title </div>
<h2 class="sub-title" data-ng-bind-html="header"></h2>
Then, in my controller:
var title = $("#inp-title-server").text();
$scope.header = title.replace('##Name', $scope.Name);
You can simply use the string's replace function:
angular.module("app",[])
.controller("ctrl",['$scope',function($scope){
$scope.WelcomeText = "Hi ##Name , Welcome to our Website. Click <a href='#'> here </a> for more details. <small> For more details please visit blah blah </small>";
$scope.UserName = "ABC";
$scope.WelcomeText = $scope.WelcomeText.replace("##Name",$scope.UserName);
}]);
<script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.2.23/angular.min.js"></script>
<html>
<body ng-app="app" ng-controller="ctrl">
<h4 class="sub-title" ng-bind="WelcomeText"> </h4>
</body>
</html>
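Two details in these answers are worth handling explicitly. String.prototype.replace with a string pattern replaces only the first occurrence of the token. Also, the welcome text contains markup: ng-bind renders it as literal text (tags included), whereas ng-bind-html renders the markup but needs the ngSanitize module or a value marked with $sce.trustAsHtml. A minimal sketch, assuming tokens always have the form ##Word and that substituted values are plain text that should be HTML-escaped; fillTokens and escapeHtml are hypothetical helper names, not AngularJS APIs.

```javascript
// Hypothetical helper: escapes the five HTML-significant characters in plain text.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: replaces EVERY ##Key token with the escaped value of values[Key].
// Tokens with no matching key are left unchanged.
function fillTokens(template, values) {
  return template.replace(/##(\w+)/g, (token, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? escapeHtml(values[key]) : token
  );
}

const welcomeText =
  "Hi ##Name , Welcome to our Website. Click <a href='#'> here </a> for more details.";
console.log(fillTokens(welcomeText, { Name: "ABC" }));
```

In a controller this would be used along the lines of $scope.header = fillTokens(title, { Name: $scope.Name }), bound with ng-bind-html="header". Escaping only the substituted value keeps a user-supplied name from injecting markup, while the template's own tags still render.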

Extract content from a <span class="value"> tag with C# Regex

I have some HTML code:
<div class="code">
<span class="title">desc</span>:<span class="value">'Custom text'</span>,
<div class="code">
<span class="title">
</span>
<div>
I need to get the content between the <span class="value"> and </span> tags; in this sample, that is Custom text. How can I do it with Regex and C#?
You can capture it in the first group, like this:
<span .*? class="value".*?>([^<]+)<\/span>
You can use the following regex:
<span[^>]*class=\"value\"[^>]*>([^<]*)<\/span>
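For comparison, here is the second pattern applied with matchAll. The sketch is in JavaScript only so it runs standalone; in C#, the same pattern works with Regex.Matches and match.Groups[1].Value. Its [^>]* cannot run past the end of a tag, whereas the lazy .*? in the first pattern can stretch across preceding tags to reach class="value". extractValues is a hypothetical helper, and the optional quote-trimming is an assumption based on the sample, where the value is written as 'Custom text'.

```javascript
// The second answer's pattern; group 1 is the text between the opening <span> and </span>.
const valueSpan = /<span[^>]*class="value"[^>]*>([^<]*)<\/span>/g;

// Hypothetical helper: returns the inner text of every <span class="value"> in the input.
// When trimQuotes is true, one pair of surrounding single quotes is removed.
function extractValues(html, trimQuotes = false) {
  return [...html.matchAll(valueSpan)].map((m) =>
    trimQuotes ? m[1].replace(/^'(.*)'$/s, "$1") : m[1]
  );
}

const html = `<div class="code">
<span class="title">desc</span>:<span class="value">'Custom text'</span>,
</div>`;

console.log(extractValues(html, true));
```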
Alternatively, in ASP.NET Web Forms, add runat="server" to the control whose content you want to extract, then read it from the code-behind (.cs) file:
HTML:
<span class="value" id="test" runat="server">'Custom text'</span>
C# (code-behind):
String vl=test.InnerHtml;

CsQuery replace tags

I'm using CsQuery to parse HTML documents. I'm trying to replace every <br> tag with a "." character.
Assuming that this is my input HTML:
<html>
<body>
Hello
<br>
World
</body>
</html>
The desired output is:
<html>
<body>
Hello
.
World
</body>
</html>
Pseudo code:
CQ dom = CQ.CreateFromUrl("http://my.url");
dom.ReplaceTag("<br>", ".");
Is this possible?
Thanks for any advice.
That's pretty easy: just replace the <br> elements by setting their OuterHTML.
The relevant selector is just "br":
foreach (var br in dom["br"])
br.OuterHTML = ".";
Call dom.Render() to see the result.
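If only the HTML string is needed and a DOM library such as CsQuery is not an option, the same substitution can be approximated with a regular expression. This is a different technique from the answer's, sketched in JavaScript; it assumes that <br> variants (<br>, <br/>, <br />, any letter case) never occur inside comments, scripts, or attribute values, an assumption a DOM-based approach does not need. replaceBreaks is a hypothetical name.

```javascript
// Hypothetical helper: replaces every <br>, <br/> or <br /> tag (any letter case,
// optional attributes) with the given text. "\b" keeps tags like <bra> untouched.
function replaceBreaks(html, replacement = ".") {
  return html.replace(/<br\b[^>]*>/gi, replacement);
}

const input = `<html>
<body>
Hello
<br>
World
</body>
</html>`;

console.log(replaceBreaks(input));
```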

Changing html with Regex in C#

I've got a source string like this:
<p>blablabla</p>
<p><img style="float: left;" src="../Content/attachments/455dd178-db28-4856-85e8-c65c8e6b04df_312540909.jpg" alt="455dd178-db28-4856-85e8-c65c8e6b04df_312540909.jpg" />blablabla</p>
<p><img style="float: right;" src="../Content/attachments/dec0f850-2921-4bf7-87b8-d2410e04a841_image001.gif" alt="dec0f850-2921-4bf7-87b8-d2410e04a841_image001.gif" width="100" /></p>
For each img element, I need to remove the alt attribute and replace the src value with srcFileName.Substring(37).
I can't figure out the regex needed. Please help.
I had to use the Html Agility Pack for this.
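For input as regular as the sample (double-quoted attribute values, no ">" inside them), a callback-based replacement restricted to <img> tags is one possible alternative to a full parser; for arbitrary HTML, a parser such as the Html Agility Pack remains the more robust choice. A minimal JavaScript sketch, assuming "srcFileName" means the part of src after the last "/" and that the directory prefix should be kept; rewriteImages is a hypothetical name.

```javascript
// Hypothetical helper: for each <img> tag, drops the alt attribute and strips the first
// 37 characters (a 36-character GUID plus "_") from the file name in src.
// Assumes double-quoted attribute values with no ">" inside them.
function rewriteImages(html) {
  return html.replace(/<img\b[^>]*>/gi, (tag) =>
    tag
      // Remove alt="..." together with the whitespace before it.
      .replace(/\s+alt="[^"]*"/i, "")
      // Split src into directory (up to the last "/") and file name.
      .replace(
        /(\s)src="([^"]*\/)?([^"\/]*)"/i,
        // dir is undefined when src has no "/", so it defaults to "".
        (match, space, dir = "", fileName) => `${space}src="${dir}${fileName.substring(37)}"`
      )
  );
}

const source =
  '<p><img style="float: left;" src="../Content/attachments/455dd178-db28-4856-85e8-c65c8e6b04df_312540909.jpg" alt="455dd178-db28-4856-85e8-c65c8e6b04df_312540909.jpg" />blablabla</p>';
console.log(rewriteImages(source));
```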

Replace newlines with <p> paragraphs and <br /> tags

I know how to replace newlines in my C# code, but replacing every newline with a <br /> tag isn't always correct.
So I was wondering what strategy others use. The correct way, I guess, would be to use both <p> tags and <br /> tags.
Here are some examples of the results I would like to get.
If there is no newline, I want the text to be wrapped in <p> tags.
This text contains no newlines
<p>This text contains no newlines</p>
If the text contains a newline, I want the newline replaced by a <br /> tag and the text wrapped in <p> tags.
This text contains
1 newline
<p>This text contains<br /> 1 newline.</p>
If there are 'double newlines', I want each block to be wrapped in its own <p> tags.
This is a text with 'double newlines' at the end.
This is a text with no newline at the end.
<p>This is a text with 'double newlines' at the end.</p>
<p>This is a text with no newline at the end.</p>
I could write more examples and combinations, but I think it's clear what I mean.
Thanks in advance.
Here's a way you could do it using only simple string replacements:
string result = "<p>" + text
.Replace(Environment.NewLine + Environment.NewLine, "</p><p>")
.Replace(Environment.NewLine, "<br />")
.Replace("</p><p>", "</p>" + Environment.NewLine + "<p>") + "</p>";
Note that your text must be HTML-escaped first; otherwise you could be at risk of cross-site scripting attacks. (Even using <pre> tags still carries a cross-site scripting risk.)
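A JavaScript sketch of the same idea that also performs the HTML-escaping step (escapeHtml and textToParagraphs are hypothetical names). It normalizes Windows line endings first, and it splits on runs of two or more newlines instead of chaining Replace calls. That is a small deliberate deviation: with three consecutive newlines, the chained Replace version starts a paragraph with <br />, while this version simply starts a new paragraph.

```javascript
// Hypothetical helper: escapes characters that are significant in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: blank-line-separated blocks become <p> elements and single
// newlines inside a block become <br />. The text is escaped before any tags are added.
function textToParagraphs(text) {
  const normalized = escapeHtml(text.replace(/\r\n?/g, "\n"));
  return normalized
    .split(/\n{2,}/)
    .map((block) => "<p>" + block.split("\n").join("<br />") + "</p>")
    .join("\n");
}

console.log(textToParagraphs("This text contains\n1 newline"));
```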
You could just leave the newlines alone and use CSS (white-space: pre) to render the breaks correctly. Here is a more elaborate example, a kind of "pretty" replacement for <pre> that uses a <p> instead:
<p style="padding: 1em; line-height: 1.1em; font-family: monospace; white-space: pre; overflow: auto; background-color: rgb(240,255,240); border: thin solid rgb(255,220,255);">
Text goes here.
</p>
