I have HTML code like this:
<tr>
<th colspan="2" style="padding: 10px; font-size: 11px; background: #eee; border: 1px solid white" align="left">
Some Text Here
</th>
</tr>
I am storing this in string like this:
string gtr =
@"<tr>
<th colspan=""2"" style=""padding: 10px; font-size: 11px; background: #eee; border: 1px solid white"" align=""left"">
Some Text Here
</th>
</tr>";
But when I debug it, it shows the string like this:
<tr>
<th colspan=\"2\" style=\"padding: 10px; font-size: 11px; background: #eee; border: 1px solid white\" align=\"left\">
Some Text Here
</th>
</tr>
It shows escape sequence characters.
I tried to remove them like this:
gtr = gtr.Replace(@"\", "");
and tried every other method I could think of,
but it doesn't work: the string gtr always shows the escape sequence characters.
How can I get the string without escape sequence characters, as plain HTML code?
I am only using ASP.NET with C# (not MVC), and this is static content.
"But when I debug it, it shows the string like this"
That's because you're looking at the string in the debugger. The string doesn't actually contain those backslashes - they're just part of the debug output, which escapes various characters to make it look like it would as a regular string literal in code.
Write the string to a file or the console and you'll see the backslashes really aren't there.
As an alternative way of convincing yourself of this even in the debugger, try this:
string x = "\"\"";
int y = x.Length;
char z = x[0];
Then in the debugger you'll see that y is 2 and z is just " (it may be displayed escaped again, but it clearly can't be both characters of \", because it's a single char).
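You can reproduce the same effect outside Visual Studio. The Python sketch below is only an analogy, not how the debugger actually works. Here json.dumps stands in for the debugger's display formatting: it renders a string as a double-quoted literal and therefore escapes the embedded quotes. The HTML fragment is a shortened version of the one in the question.

```python
import json

# Shortened version of the question's HTML, stored as an ordinary string.
gtr = '<th colspan="2" align="left">Some Text Here</th>'

# Stand-in for the debugger's display: render the value as a double-quoted
# literal, which forces every embedded " to be written as \".
debug_view = json.dumps(gtr)

print(debug_view)  # "<th colspan=\"2\" align=\"left\">Some Text Here</th>"
print(gtr)         # <th colspan="2" align="left">Some Text Here</th>

# The backslashes exist only in the rendered view, never in the data itself.
assert "\\" not in gtr
assert json.loads(debug_view) == gtr

# Same check as the C# snippet: a string of two quote characters has length 2.
x = '""'
assert len(x) == 2 and x[0] == '"'
```

Writing the string to a file, the console, or the page has the same effect as print(gtr): you get the plain HTML.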
When I check your code in the Text Visualizer from QuickWatch, it shows the string without escape characters.
I am searching through a database to find span tags with video information for the purpose of migration.
My regex works well, and I can extract most of the information I need. The trouble starts when the style attribute is in a different position than expected. That throws off the expression, and I get only about two thirds of the captures I would expect.
If I try to nest the style capture group inside the main capture group, it captures nothing. I also tried negative and positive lookaheads, but they only work if I make the group optional. I think I'm not nesting it correctly. Most related questions suggest a negative lookbehind, but my understanding is that a lookbehind is more of an assertion/quantifier.
So how can I always capture the style attribute regardless of its position in the span tag?
Regex flavor is .NET (server side).
I have a RegExr setup:
/(?<tag><span class='vidly-vid' data-thumb='(?<thumb>http.+\.jpg)'.+aspect-ratio='(?<aspect>\d{1,3}:\d{1,3})'.+sources='\[{"file":.+"(?<src>(?<uri>https:\/\/cf1234.cloudfront\.net\/Vids\/)(?<key>(?<ident>[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}|[a-z0-9]{6})\/(?<mp4>mp4_1080.mp4|mp4_720.mp4|mp4_480.mp4|mp4_360.mp4|mp4.mp4))).+style='(?<style>.+width: (?<width>.+)px.+height: (?<height>.+)px.+)'.+<\/span>)/gmi
Sample Data
All of these should match. The first one does NOT, the other three do.
<span class='vidly-vid' data-thumb='https://cf1234.cloudfront.net/Vids/Thumbnails/691DBB43-5EC8-4D57-AF7B-99896D9BD5D1_19127.jpg' data-aspect-ratio='4:3' style='border-width: 0px; width: 352px; height: 240px;' data-sources='[{"file":"https://cf1234.cloudfront.net/Vids/6v1j0a/hls.m3u8","label":"HD"},{"file":"https://cf1234.cloudfront.net/Vids/6v1j0a/mp4_360.mp4","label":"360p SD"}]'> </span>
<span class='vidly-vid' data-thumb='https://cf1234.cloudfront.net/Vids/Thumbnails/b181cfa5-565d-470a-b93a-2610987bb4da_28142.jpg' data-aspect-ratio='160:117' data-sources='[{"file":"https://cf1234.cloudfront.net/Vids/b181cfa5-565d-470a-b93a-2610987bb4da/hls.m3u8","label":"HD"},{"file":"https://cf1234.cloudfront.net/Vids/b181cfa5-565d-470a-b93a-2610987bb4da/mp4_480.mp4","label":"480p SD"},{"file":"https://cf1234.cloudfront.net/Vids/b181cfa5-565d-470a-b93a-2610987bb4da/mp4_360.mp4","label":"360p SD"},{"file":"https://cf1234.cloudfront.net/Vids/b181cfa5-565d-470a-b93a-2610987bb4da/mp4_720.mp4","label":"720p HD"},{"file":"https://cf1234.cloudfront.net/Vids/b181cfa5-565d-470a-b93a-2610987bb4da/mp4_1080.mp4","label":"1080p HD"}]' style='border-width: 0px; width: 600px; height: 480px;'> </span>
<table align="left" border="0" cellpadding="5" cellspacing="5" style="width:600px"> <tbody> <tr> <td><img alt="" src="/content/generator/Course_90016206/Case-10-LMLO_MG_FLAVOR1label.jpg" style="height:497px; width:324px" /></td> <td><span class='vidly-vid' data-thumb='https://cf1234.cloudfront.net/Vids/Thumbnails/b2a7cbd3-5d31-49a5-bf89-aef0cf9f7414_28142.jpg' data-aspect-ratio='146:225' data-sources='[{"file":"https://cf1234.cloudfront.net/Vids/b2a7cbd3-5d31-49a5-bf89-aef0cf9f7414/hls.m3u8","label":"HD"},{"file":"https://cf1234.cloudfront.net/Vids/b2a7cbd3-5d31-49a5-bf89-aef0cf9f7414/mp4_480.mp4","label":"480p SD"},{"file":"https://cf1234.cloudfront.net/Vids/b2a7cbd3-5d31-49a5-bf89-aef0cf9f7414/mp4_360.mp4","label":"360p SD"},{"file":"https://cf1234.cloudfront.net/Vids/b2a7cbd3-5d31-49a5-bf89-aef0cf9f7414/mp4_720.mp4","label":"720p HD"},{"file":"https://cf1234.cloudfront.net/Vids/b2a7cbd3-5d31-49a5-bf89-aef0cf9f7414/mp4_1080.mp4","label":"1080p HD"}]' style='border-width: 0px; width: 324px; height: 500px;'> </span></td> </tr> </tbody> </table>
<span class='vidly-vid' data-thumb='https://cf1234.cloudfront.net/Vids/Thumbnails/231913a7-b608-4d8b-9332-64b6840c22f0_28142.jpg' data-aspect-ratio='16:9' data-sources='[{"file":"https://cf1234.cloudfront.net/Vids/231913a7-b608-4d8b-9332-64b6840c22f0/hls.m3u8","label":"HD"},{"file":"https://cf1234.cloudfront.net/Vids/231913a7-b608-4d8b-9332-64b6840c22f0/mp4_480.mp4","label":"480p SD"},{"file":"https://cf1234.cloudfront.net/Vids/231913a7-b608-4d8b-9332-64b6840c22f0/mp4_360.mp4","label":"360p SD"},{"file":"https://cf1234.cloudfront.net/Vids/231913a7-b608-4d8b-9332-64b6840c22f0/mp4_720.mp4","label":"720p HD"},{"file":"https://cf1234.cloudfront.net/Vids/231913a7-b608-4d8b-9332-64b6840c22f0/mp4_1080.mp4","label":"1080p HD"}]' style='border-width: 0px; width: 920px; height: 520px;'> </span>
I'd personally just split up the regex into more manageable chunks, like so:
using System.Linq;
using System.Text.RegularExpressions;

// inputs is assumed to be an IEnumerable<string> of HTML fragments from the database
var spanRegex = new Regex(@"<span class='vidly-vid'.+<\/span>");
var attrRegexes = new[]{
@"data-thumb='(?<thumb>http.+\.jpg)'",
@"aspect-ratio='(?<aspect>\d{1,3}:\d{1,3})'",
@"sources='\[{""file"":.+""(?<src>(?<uri>https:\/\/cf1234.cloudfront\.net\/Vids\/)(?<key>(?<ident>[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}|[a-z0-9]{6})\/(?<mp4>mp4_1080.mp4|mp4_720.mp4|mp4_480.mp4|mp4_360.mp4|mp4.mp4)))",
@"style='(?<style>.+width: (?<width>.+)px.+height: (?<height>.+)px.+)'",
}
.Select(r => new Regex(r))
.ToList();
// For each input, take the whole span, then run every attribute regex on it
// independently, so the order of the attributes inside the span doesn't matter.
var results = inputs.Select(i => spanRegex.Match(i).Value)
.Select(i => new
{
i,
attributes =
from r in attrRegexes
let match = r.Match(i)
from g in match.Groups.Cast<Group>().Skip(1) // Skip(1) drops group 0, the whole match
select new {g.Name, capture = g.Value}
});
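If you'd rather keep a single pattern, the lookahead idea from the question also works. Each attribute needs its own lookahead, each one starting right after the tag name and limited to the opening tag with [^>]*. Below is a rough Python sketch of that idea, not a drop-in replacement, with these caveats:

- Python writes named groups as (?P<name>...), where .NET uses (?<name>...).
- The src/key/mp4 groups are left out for brevity.
- It assumes no attribute value contains a > character, which holds for the samples above.

The sample strings are hypothetical, shortened versions of the question's data, and parse_spans is an illustrative name.

```python
import re

# One lookahead per attribute, all starting right after the tag name.
# [^>]* keeps each lookahead inside the opening tag, so the order of the
# attributes (e.g. style before or after data-sources) no longer matters.
VID_SPAN = re.compile(
    r"<span class='vidly-vid'"
    r"(?=[^>]*data-thumb='(?P<thumb>[^']*\.jpg)')"
    r"(?=[^>]*data-aspect-ratio='(?P<aspect>\d{1,3}:\d{1,3})')"
    r"(?=[^>]*style='(?P<style>[^']*)')"
    r"[^>]*>",
    re.IGNORECASE,
)

# width/height are read from the captured style value; the lookbehind
# stops "border-width" from being matched as "width".
WIDTH = re.compile(r"(?<![\w-])width:\s*(\d+)px")
HEIGHT = re.compile(r"(?<![\w-])height:\s*(\d+)px")


def parse_spans(html):
    """Return one dict per vidly-vid span found in html."""
    results = []
    for m in VID_SPAN.finditer(html):
        style = m.group("style")
        w = WIDTH.search(style)
        h = HEIGHT.search(style)
        results.append({
            "thumb": m.group("thumb"),
            "aspect": m.group("aspect"),
            "style": style,
            "width": int(w.group(1)) if w else None,
            "height": int(h.group(1)) if h else None,
        })
    return results


# Hypothetical, shortened inputs: style comes before data-sources in the
# first span and after it in the second.
before = ("<span class='vidly-vid' data-thumb='https://example.com/a.jpg' "
          "data-aspect-ratio='4:3' style='border-width: 0px; width: 352px; height: 240px;' "
          "data-sources='[{\"file\":\"a/mp4_360.mp4\"}]'> </span>")
after = ("<span class='vidly-vid' data-thumb='https://example.com/b.jpg' "
         "data-aspect-ratio='16:9' data-sources='[{\"file\":\"b/mp4_720.mp4\"}]' "
         "style='border-width: 0px; width: 920px; height: 520px;'> </span>")

for row in parse_spans(before + after):
    print(row["aspect"], row["width"], row["height"])
```

Groups captured inside a lookahead are kept after the lookahead succeeds. .NET should behave the same way once the group syntax is switched back, but test it against your real data.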
The HTTP GET response for a request looks like this:
<html>
<head> <script type="text/javascript">----</script> <script type="text/javascript">---</script> <title>Detailed Notes</title>
</head>
<body style="background-color: #FFFFFF; border-width: 0px; font-family: sans-serif; font-size: 13; color: #000000"> <p>this is one note </p> </body> </html>
I get this as a string, and I have to read the body part out of it.
I tried HtmlAgilityPack, but parsing fails because of some special characters in the HTML content (I think something in the commented script is causing this).
So, to read the tag's content, I am thinking of a Substring operation,
i.e. a Substring starting at the beginning of the <body tag.
How can I take a Substring that starts at the beginning of a given word in a text?
Using a simple Substring() with IndexOf() + LastIndexOf():
// everything before "</body>", then everything from "<body" onward
string BodyContent = input.Substring(0, input.LastIndexOf("</body>")).Substring(input.IndexOf("<body"));
// drop the opening <body ...> tag itself
BodyContent = BodyContent.Substring(BodyContent.IndexOf(">") + 1).Trim();
This will return:
<p>this is one note </p>
string FullBody = input.Substring(0, input.LastIndexOf("</body>") + "</body>".Length).Substring(input.IndexOf("<body")).Trim();
This will return:
<body style="background-color: #FFFFFF; border-width: 0px; font-family: sans-serif; font-size: 13; color: #000000"> <p>this is one note </p> </body>
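Here is the same IndexOf/LastIndexOf idea sketched in Python with str.find and str.rfind. The function names body_inner and body_full are illustrative, and the sample page is a shortened copy of the response in the question. Like the C# version, it assumes the opening <body ...> tag has no > inside its attribute values. It also uses len("</body>") instead of a hard-coded 7.

```python
def body_inner(html):
    """Text between the opening <body ...> tag and </body>, trimmed."""
    start = html.find("<body")
    end = html.rfind("</body>")
    if start == -1 or end == -1:
        return None
    open_end = html.find(">", start)  # first '>' closes the opening tag
    if open_end == -1 or open_end > end:
        return None
    return html[open_end + 1:end].strip()


def body_full(html):
    """The whole <body ...>...</body> element, trimmed."""
    start = html.find("<body")
    end = html.rfind("</body>")
    if start == -1 or end == -1:
        return None
    return html[start:end + len("</body>")].strip()


page = """<html>
<head> <script type="text/javascript">----</script> <title>Detailed Notes</title>
</head>
<body style="background-color: #FFFFFF; font-size: 13; color: #000000"> <p>this is one note </p> </body> </html>"""

print(body_inner(page))  # <p>this is one note </p>
print(body_full(page))
```

Returning None when a marker is missing avoids the exception that Substring would throw in C# when IndexOf returns -1.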
The " will cause a problem, so you need to replace every " after you get the request source:
using System;
using System.Net;
using System.Text.RegularExpressions;

WebClient client = new WebClient(); // make an instance of WebClient
string source = client.DownloadString("url").Replace("\"", ",,"); // get the HTML source and replace each " with a placeholder (,,)
// use a regex to get the content between the body tags; <body[^>]*> skips the tag's attributes
MatchCollection m0 = Regex.Matches(source, "<body[^>]*>(?<body>.*?)</body>", RegexOptions.Singleline);
foreach (Match m in m0) // loop through the results
{
    string result = m.Groups["body"].Value.Replace(",,", "\""); // get the result and put the " back
    Console.WriteLine(result);
}
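For comparison, here is a Python sketch of the regex extraction. re.DOTALL plays the role of RegexOptions.Singleline, and <body[^>]*> skips past the tag's attributes so that only the inner content is captured. The pattern never mentions quote characters, so this sketch leaves out the ,, placeholder step. extract_bodies is a hypothetical helper name.

```python
import re

# Non-greedy body capture; re.DOTALL lets '.' cross newlines, like
# RegexOptions.Singleline in .NET. <body[^>]*> skips the tag's attributes.
BODY_RE = re.compile(r"<body[^>]*>(?P<body>.*?)</body>", re.DOTALL | re.IGNORECASE)


def extract_bodies(html):
    """Return the inner content of every <body>...</body> pair, trimmed."""
    return [m.group("body").strip() for m in BODY_RE.finditer(html)]


code = ('<body style="background-color: #FFFFFF; border-width: 0px; '
        'font-family: sans-serif; font-size: 13; color: #000000"> '
        '<p>this is one note </p> </body>')

print(extract_bodies(code))  # ['<p>this is one note </p>']
```

The non-greedy .*? stops at the first </body> after each opening tag rather than the last one in the document.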
I need to get rid of the borders around the individual checkboxes that are rendered by a CheckBoxList control. Here's what it looks like now:
The ASP.Net markup is straightforward:
<asp:CheckBoxList ID="cblEthnicity" runat="server" RepeatDirection="Vertical"
RepeatColumns="3" RepeatLayout="Table" BorderStyle="None" BorderWidth="0">
</asp:CheckBoxList>
which is in a cell in a table with the class formTable applied (see below).
As you can see, I've tried setting the attributes BorderStyle="None" and BorderWidth="0" to no effect.
I'm pretty sure that what's behind this is the following CSS, which puts rounded corner borders around the enclosing table cells, which I want to keep:
.formTable
{
background-color: #eeeeee;
border: solid 1px #bbbbbb;
-moz-border-radius: 7px;
-webkit-border-radius: 7px;
border-radius: 7px;
}
.formTable tr, .formTable tr td, .formTable tr th
{
background-color: #eeeeee;
padding: 3px;
border: solid 1px #bbbbbb;
vertical-align: top;
}
I added the following CSS, which also did nothing:
.formTable tr td input[type="checkbox"]
{
border: none;
}
Finally, the HTML rendered from the .aspx for the CheckBoxList, as seen in Chrome DevTools, looks like this (edited a little for brevity):
<table id="main_cblEthnicity" style="border-width:0px; border-style:None; border-top-left-radius:5px; border-top-right-radius:5px; border-bottom-left-radius:5px; border-bottom-right-radius:5px;">
<tbody>
<tr>
<td style="border-top-left-radius:5px; border-top-right-radius:5px; border-bottom-left-radius:5px; border-bottom-right-radius:5px;">
<input id="main_cblEthnicity_0" type="checkbox" name="ctl00$main$cblEthnicity$0"
checked="checked" value="Native American" />
<label for="main_cblEthnicity_0">Native American</label>
</td>
...
</tr>
</tbody>
</table>
Any suggestions on how I can get rid of the unwanted borders?
UPDATE: Here are some images to make it clearer what's going on and what I'm trying to accomplish:
This is what I'm getting now:
This is what I get if I use either suggestion that has been presented so far:
This is what I'm trying to achieve:
In addition to the suggestions made here, I tried adding this to the CSS, but it made no difference:
.formTable tr td > input[type="checkbox"] {
border: none;
}
I also tried this in JavaScript/jQuery:
<script type="text/javascript">
$(document).ready(function() {
$('.formTable tr td > input[type="checkbox"]').removeAttr("border");
});
</script>
The problem isn't the input but its td.
Look:
<td style="border-top-left-radius:5px; border-top-right-radius:5px; border-bottom-left-radius:5px; border-bottom-right-radius:5px;">
Here (above) is defined the border radius. And here (below) the border color:
.formTable tr, .formTable tr td, .formTable tr th
{
background-color: #eeeeee;
padding: 3px;
border: solid 1px #bbbbbb;
vertical-align: top;
}
So, to change this, add the following right after the CSS above:
.formTable tr td
{
border:0;
}
This makes only the td borders disappear, not the borders of the tr or th elements.
UPDATE AFTER OP's CLARIFICATIONS
Oh, all right. With those new screenshots we can clearly see what you're trying to achieve.
Still, you're trying to remove a border from the input, but I repeat: the problem isn't the input but its td.
Let me explain using the code you gave us:
<table id="main_cblEthnicity" style="border-width:0px; border-style:None; border-top-left-radius:5px; border-top-right-radius:5px; border-bottom-left-radius:5px; border-bottom-right-radius:5px;">
<tbody>
<tr>
<td style="border-top-left-radius:5px; border-top-right-radius:5px; border-bottom-left-radius:5px; border-bottom-right-radius:5px;">
<input id="main_cblEthnicity_0" type="checkbox" name="ctl00$main$cblEthnicity$0"
checked="checked" value="Native American" />
<label for="main_cblEthnicity_0">Native American</label>
</td>
...
</tr>
</tbody>
</table>
This is the HTML code of the table that contains all those checkboxes. All its TDs have rounded borders, as we already know. This table is itself inside a bigger TD (whose borders you want to keep). We're in the following situation:
So you have two ways to fix this without changing all your HTML: CSS or jQuery.
The CSS way
Pretty simple: put an inline style on those table cells (the ones with checkboxes inside), style="border:0" instead of style="border-top-left-radius:5px; border-top-right-radius:5px; border-bottom-left-radius:5px; border-bottom-right-radius:5px;". Or just create a new CSS class like this:
.no-borders {
border:0;
}
and apply it to every td whose border you don't want to see.
The jQuery way
<script type="text/javascript">
$(document).ready(function() {
$('.formTable input[type="checkbox"]').parent().css('border','none');
});
</script>
Your code doesn't show it, but apparently the class .formTable is applied to the CheckBoxList at some point. Just remove border: solid 1px #bbbbbb; from the second rule:
.formTable tr, .formTable tr td, .formTable tr th
{
background-color: #eeeeee;
padding: 3px;
vertical-align: top;
}
Demo: http://jsfiddle.net/pgpR3/1/
I am trying to print rich text content in a PDF using the iTextSharp library, version 5.5.8.0. But when I print that rich text, it prints only a plain string without any style effects or HTML tags.
Here is my web view that contains the rich text:
But when printing to PDF with iTextSharp, it prints like rich text, except that each HTML element starts on a new line, as in the image below:
Here is the code that I am using to print rich text in the PDF:
// descriptionData holds the HTML string; "" means no extra CSS
ElementList elements = XMLWorkerHelper.ParseToElementList(descriptionData, "");
PdfPCell cellDescriptionData = new PdfPCell();
foreach (var ele in elements) {
cellDescriptionData.AddElement(ele);
}
tableThirdBlock.AddCell(cellDescriptionData);
Here the descriptionData variable contains the HTML string.
I want to print it in the PDF exactly as it appears in the web view.
Here is the actual HTML string, which is generated dynamically, so both the CSS and the text will vary.
" <b style="font-family: Arial, Verdana; font-size: 10pt; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal; color: rgb(204, 255, 255); background-color: rgb(51, 102, 255);">zzz </b><div style="font-family: Arial, Verdana; font-size: 10pt; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal;"><b style="color: rgb(204, 255, 255); background-color: rgb(51, 102, 255);"><br /></b></div><div><b style="font-family: Arial, Verdana; font-size: 10pt; font-style: normal; font-variant: normal; font-weight: normal; line-height: normal; color: rgb(204, 255, 255); background-color: rgb(51, 102, 255);">Test </b> <span style="color: rgb(255, 0, 0);"> <span style="font-weight: bold;">Description</span> </span><span style="color: rgb(153, 51, 153); font-style: italic;">Other Color</span></div> ".
Is there anything I am missing?
Please help me print rich text with all its effects and styles in the PDF.
Thanks
In your comment, you are asking for an example of how to use XML Worker to parse HTML into a list of Element objects. Such examples can be found on the official iText web site in the answers to questions such as:
1. How to adjust the page height to the content height?
2. How to convert Arabic HTML to PDF?
In 1, you'll discover that you can parse an HTML and CSS file to an ElementList object:
ElementList elements = XMLWorkerHelper.parseToElementList(HTML, CSS);
In 2, you'll learn how to add the elements in an ElementList to a PdfPCell:
PdfPTable table = new PdfPTable(1);
PdfPCell cell = new PdfPCell();
for (Element e : elements) {
cell.addElement(e);
}
table.addCell(cell);
document.add(table);
Note that I simplified the examples from the original questions. You don't need RTL because your text probably isn't Arabic. You can probably use the convenience method parseToElementList() instead of using the full code that was used in 2.
I have created a proof of concept that results in the file test-herin.pdf:
The HTML to get this result looks like this:
<div><span class="bluetextwhitebackground">zzz</span></div>
<div>
<span class="bluetextwhitebackground">Test</span>
<span class="redtext">Description</span>
<span class="italicpurple">Other Color</span>
</div>
The CSS to get the desired styles looks like this:
.bluetextwhitebackground
{ font-family: times; color: white; background: blue}
.redtext
{ font-family: times; color: red; }
.italicpurple
{ font-family: times; font-style: italic; color: purple }
As you can see, I used <div> when I want a block of text that causes a new line after the text is rendered. I used <span> in cases where I don't want a new line to appear.
It is hard to answer your question because you don't show your HTML. New lines are triggered by <p> or <div> tags, or by giving an element a block display (for example display: block) in the CSS.
By the way, the code to generate the PDF can be found here: ParseHtml13
Update:
After you shared the HTML, I created another proof of concept:
The resulting PDF file is test-herin2.pdf and it looks like this:
This looks exactly the way I expect it. In other words: I can't reproduce the problem you're describing. If I can't reproduce a problem, I can't fix it.
I'm generating a PDF file based on some HTML, using the Pechkin DLL.
This is all working nicely except the background colors are not being rendered.
An example of the HTML I'm using is:
<table style="border-top: 0px solid black; border-bottom: 2px solid black; background-color: #99ccff; height: 30px; width: 800px;" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td><strong>Insured Details</strong></td>
</tr>
</tbody>
</table>
The code I'm using to generate the PDF is as below:
Dim buf As Byte() = Pechkin.Factory.Create(New GlobalConfig().SetMargins(New Margins(20, 20, 20, 20)) _
.SetDocumentTitle("").SetCopyCount(1).SetImageQuality(100) _
.SetLosslessCompression(True).SetMaxImageDpi(300).SetOutlineGeneration(True).SetOutputDpi(1200).SetPaperOrientation(True) _
.SetPaperSize(PaperKind.A4) _
.SetImageQuality(100) _
.SetPaperOrientation(False)).Convert(New ObjectConfig().SetPrintBackground(True).SetAllowLocalContent(True), strHTML)
Return buf
I've seen articles around the net that seem to indicate that my code should work fine, but it doesn't.
From memory, I had to add these to see backgrounds:
.SetPrintBackground(true)
.SetScreenMediaType(true)