How to insert custom page numbers in Aspose.Words (C#)

I want to add custom page numbers (like 1/2, 2/2) to a Word document using Aspose.Words, but I couldn't find any C# sample. I tried to overwrite the footer, but I couldn't apply a format to the page numbers.
Please help!
Thanks!
Edit:
After I tried the first answer, it worked the way I wanted, but another problem came up. I am appending child documents to the main document, and I can only format the main document's page numbers. The child documents still have ordinary page numbers.
Here is a sample of the code:
public void AddChildDocs (System.IO.Stream parentStream, List<System.IO.Stream> childStreams)
{
doc = new Aspose.Words.Document(parentStream);
if (Items.Count > 0)
{
WordReplacer evaluator = new WordReplacer(this);
doc.Range.Replace(new Regex(ReplaceRegex), evaluator, false);
}
foreach (var item in childStreams)
{
Aspose.Words.Document childDoc = new Aspose.Words.Document(item);
if (Items.Count > 0)
{
WordReplacer evaluator = new WordReplacer(this);
childDoc.Range.Replace(new Regex(ReplaceRegex), evaluator, false);
}
doc.AppendDocument(childDoc, ImportFormatMode.KeepSourceFormatting);
}
DocumentBuilder builder = new DocumentBuilder(doc);
builder.MoveToHeaderFooter(HeaderFooterType.FooterPrimary);
builder.InsertField("PAGE", "");
builder.Write(" / ");
builder.InsertField("NUMPAGES", "");
}

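You can get the idea from this page in the Aspose documentation. Below is the sample code taken from that page, trimmed to the part related to custom page numbers. For appended documents, the relevant step is LinkToPrevious(true), which makes the first section of the appended document reuse the footer of the section before it.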
String src = dataDir + "Page numbers.docx";
String dst = dataDir + "Page numbers_out.docx";
// Create a new document or load from disk
Aspose.Words.Document doc = new Aspose.Words.Document(src);
// Create a document builder
Aspose.Words.DocumentBuilder builder = new DocumentBuilder(doc);
// Go to the primary footer
builder.MoveToHeaderFooter(HeaderFooterType.FooterPrimary);
// Add fields for current page number
builder.InsertField("PAGE", "");
// Add any custom text
builder.Write(" / ");
// Add field for total page numbers in document
builder.InsertField("NUMPAGES", "");
// Import new document
Aspose.Words.Document newDoc = new Aspose.Words.Document(dataDir + "new.docx");
// Link the header/footer of first section to previous document
newDoc.FirstSection.HeadersFooters.LinkToPrevious(true);
doc.AppendDocument(newDoc, ImportFormatMode.UseDestinationStyles);
// Save the document
doc.Save(dst);
I work with Aspose as Developer Evangelist.
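For the loop over child documents in the question's edit, the same idea might be applied to every appended section. This is only a sketch: PageNumberSketch and Merge are hypothetical names, it assumes the parent document has a single section whose primary footer should carry the numbering, and it assumes continuous numbering across child documents is wanted.

```csharp
using System.Collections.Generic;
using System.IO;
using Aspose.Words;

public static class PageNumberSketch
{
    // Hypothetical helper: appends each child document to the parent and
    // makes every appended section inherit the parent's footer.
    public static Document Merge(Stream parentStream, IEnumerable<Stream> childStreams)
    {
        Document doc = new Document(parentStream);

        // Write "current / total" into the primary footer of the first section.
        DocumentBuilder builder = new DocumentBuilder(doc);
        builder.MoveToHeaderFooter(HeaderFooterType.FooterPrimary);
        builder.InsertField("PAGE", "");
        builder.Write(" / ");
        builder.InsertField("NUMPAGES", "");

        foreach (Stream childStream in childStreams)
        {
            Document childDoc = new Document(childStream);
            foreach (Section section in childDoc.Sections)
            {
                // Drop the child's own headers/footers in favour of the previous section's.
                section.HeadersFooters.LinkToPrevious(true);
                // Assumption: numbering should continue instead of restarting per child.
                section.PageSetup.RestartPageNumbering = false;
            }
            doc.AppendDocument(childDoc, ImportFormatMode.KeepSourceFormatting);
        }
        return doc;
    }
}
```

Linking is done before AppendDocument so that the imported sections arrive already linked; the regex replacement from the question can be run on each document before this step.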

Here is code that sets a custom starting page number in Aspose.Words. Once the page margins and the starting page number are set, the numbering continues automatically each time the content flows onto a new page. Try this:
section.PageSetup.PaperSize = PaperSize.Letter;
section.PageSetup.LeftMargin = 10;
section.PageSetup.RightMargin = 10;
section.PageSetup.TopMargin = 0;
section.PageSetup.BottomMargin = 0;
section.PageSetup.HeaderDistance = 50;
section.PageSetup.FooterDistance = 50;
section.PageSetup.Borders.Color = Color.Black;
// PageStartingNumber only takes effect when page numbering restarts in this section.
section.PageSetup.RestartPageNumbering = true;
section.PageSetup.PageStartingNumber = 1;


Aspose.Words ImportNode ignores font formatting when appending child

I am currently using Aspose.Words to open a document, pull the content between a bookmark start and a bookmark end, and then place that content into another document. The issue I'm having is that the ImportNode method imports the content into my document but changes all of the fonts from Calibri to Times New Roman and changes the font size from whatever it was in the original document to 12pt.
I obtain the content from the bookmark with the Aspose ExtractContent method.
Because ImportNode was stripping my font formatting, I tried saving each node to an HTML string using ToString(HtmlSaveOptions) instead. This mostly works, but it strips out the returns from the Word document, so none of my text has the appropriate spacing. My returns end up coming in as HTML in the following format:
"<p style=\"margin-top:0pt; margin-bottom:8pt; line-height:108%; font-size:11pt\"><span style=\"font-family:Calibri; display:none; -aw-import:ignore\">&#xa0;</span></p>"
When using
DocumentBuilder.InsertHtml("<p style=\"margin-top:0pt; margin-bottom:8pt; line-height:108%; font-size:11pt\"><span style=\"font-family:Calibri; display:none; -aw-import:ignore\">&#xa0;</span></p>");
it does not correctly add the return to the Word document.
Here is the code I'm using. Please forgive the comments etc.; they are left over from my attempts at correcting this.
public async Task<string> GenerateHtmlString(Document srcDoc, ArrayList nodes)
{
// Create a blank document.
Document dstDoc = new Document();
ELSLogHelper.InsertInfoLog(_callContext, ELSLogHelper.AsposeLogMessage("Open"), MethodBase.GetCurrentMethod()?.Name, MethodBase.GetCurrentMethod().DeclaringType?.Name, Environment.StackTrace);
// Remove the first paragraph from the empty document.
dstDoc.FirstSection.Body.RemoveAllChildren();
// Create a new Builder for the temporary document that gets generated with the header or footer data.
// This allows us to control font and styles separately from the main document being built.
var newBuilder = new DocumentBuilder(dstDoc);
Aspose.Words.Saving.HtmlSaveOptions htmlSaveOptions = new Aspose.Words.Saving.HtmlSaveOptions();
htmlSaveOptions.ExportImagesAsBase64 = true;
htmlSaveOptions.SaveFormat = SaveFormat.Html;
htmlSaveOptions.ExportFontsAsBase64 = true;
htmlSaveOptions.ExportFontResources = true;
htmlSaveOptions.ExportTextBoxAsSvg = true;
htmlSaveOptions.ExportRoundtripInformation = true;
htmlSaveOptions.Encoding = Encoding.UTF8;
// Obtain all the links from the source document
// This is used later to add hyperlinks to the html
// because by default extracting nodes using Aspose
// does not pull in the links in a usable way.
var srcDocLinks = srcDoc.Range.Fields.GroupBy(x => x.DisplayResult).Select(x => x.First()).Where(x => x.Type == Aspose.Words.Fields.FieldType.FieldHyperlink).Distinct().ToList();
var childNodes = nodes.Cast<Node>().Select(x => x).ToList();
var oldBuilder = new DocumentBuilder(srcDoc);
oldBuilder.MoveToBookmark("Header");
var allchildren = oldBuilder.CurrentParagraph.Runs;
var allChildNodes = childNodes[0].Document.GetChildNodes(NodeType.Any, true);
var headerText = allChildNodes[0].Range.Bookmarks["Header"].BookmarkStart.GetText();
foreach (Node node in nodes)
{
var html = node.ToString(htmlSaveOptions);
try
{
// &#xa0; is used by Aspose because it works in XML
// If we see this character and the text of the node is \r we need to insert a break
if (html.Contains("&#xa0;") && node.Range.Text == "\r")
{
newBuilder.InsertHtml(html, false);
// Change the node into an HTML string
/*var htmlString = node.ToString(SaveFormat.Html);
var tempHtmlLinkDoc = new HtmlDocument();
tempHtmlLinkDoc.LoadHtml(htmlString);
// Get all the child nodes of the html document
var allChildNodes = tempHtmlLinkDoc.DocumentNode.SelectNodes("//*");
// Loop over all child nodes so we can make sure we apply the correct font family and size to the break.
foreach (var childNode in allChildNodes)
{
// Get the style attribute from the child node
var childNodeStyles = childNode.GetAttributeValue("style", "").Split(';');
foreach (var childNodeStyle in childNodeStyles)
{
// Apply the font name and size to the new builder on the document.
if (childNodeStyle.ToLower().Contains("font-family"))
{
newBuilder.Font.Name = childNodeStyle.Split(':')[1].Trim();
}
if (childNodeStyle.ToLower().Contains("font-size"))
{
newBuilder.Font.Size = Convert.ToDouble(childNodeStyle.Split(':')[1]
.Replace("pt", "")
.Replace("px", "")
.Replace("em", "")
.Replace("rem", "")
.Replace("%", "")
.Trim());
}
}
}
// Insert the break with the corresponding font size and name.
newBuilder.InsertBreak(BreakType.ParagraphBreak);*/
}
else
{
// Loop through the source document links so the link can be applied to the HTML.
foreach (var srcDocLink in srcDocLinks)
{
if (html.Contains(srcDocLink.DisplayResult))
{
// Now that we know the html string has one of the links in it we need to get the address from the node.
var linkAddress = srcDocLink.Start.NextSibling.GetText().Replace(" HYPERLINK \"", "").Replace("\"", "");
//Convert the node into an HTML String so we can get the correct font color, name, size, and any text decoration.
var htmlString = srcDocLink.Start.NextSibling.ToString(SaveFormat.Html);
var tempHtmlLinkDoc = new HtmlDocument();
tempHtmlLinkDoc.LoadHtml(htmlString);
var linkStyles = tempHtmlLinkDoc.DocumentNode.ChildNodes[0].GetAttributeValue("style", "").Split(';');
var linkStyleHtml = "";
foreach (var linkStyle in linkStyles)
{
if (linkStyle.ToLower().Contains("color"))
{
linkStyleHtml += $"color:{linkStyle.Split(':')[1].Trim()};";
}
if (linkStyle.ToLower().Contains("font-family"))
{
linkStyleHtml += $"font-family:{linkStyle.Split(':')[1].Trim()};";
}
if (linkStyle.ToLower().Contains("font-size"))
{
linkStyleHtml += $"font-size:{linkStyle.Split(':')[1].Trim()};";
}
if (linkStyle.ToLower().Contains("text-decoration"))
{
linkStyleHtml += $"text-decoration:{linkStyle.Split(':')[1].Trim()};";
}
}
if (linkAddress.ToLower().Contains("mailto:"))
{
// Since the link has mailto included don't add the target attribute to the link.
html = new Regex($@"\b{srcDocLink.DisplayResult}\b").Replace(html, $"{srcDocLink.DisplayResult}");
//html = html.Replace(srcDocLink.DisplayResult, $"{srcDocLink.DisplayResult}");
}
else
{
// Since the links is not an email include the target attribute.
html = new Regex($@"\b{srcDocLink.DisplayResult}\b").Replace(html, $"{srcDocLink.DisplayResult}");
//html = html.Replace(srcDocLink.DisplayResult, $"{srcDocLink.DisplayResult}");
}
}
}
// Insert the HTML string into the temporary document.
newBuilder.InsertHtml(html, false);
}
}
catch (Exception ex)
{
throw;
}
}
// This is just for debugging/troubleshooting purposes and to make sure things look correct
string tempDocxPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "temp", "TemporaryCompiledDocument.docx");
dstDoc.Save(tempDocxPath);
// We generate this HTML file then load it back up and pass the DocumentNode.OuterHtml back to the requesting method.
ELSLogHelper.InsertInfoLog(_callContext, ELSLogHelper.AsposeLogMessage("Save"), MethodBase.GetCurrentMethod()?.Name, MethodBase.GetCurrentMethod().DeclaringType?.Name, Environment.StackTrace);
string tempHtmlPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "temp", "TemporaryCompiledDocument.html");
dstDoc.Save(tempHtmlPath, htmlSaveOptions);
var tempHtmlDoc = new HtmlDocument();
tempHtmlDoc.Load(tempHtmlPath);
var htmlText = tempHtmlDoc.DocumentNode.OuterHtml;
// Clean up our mess...
if (File.Exists(tempDocxPath))
{
File.Delete(tempDocxPath);
}
if (File.Exists(tempHtmlPath))
{
File.Delete(tempHtmlPath);
}
// Return the generated HTML string.
return htmlText;
}
Saving each node to HTML and then inserting it into the destination document is not a good idea, because not all nodes can be properly saved to HTML and some formatting can be lost in the Aspose.Words DOM -> HTML -> Aspose.Words DOM roundtrip.
Regarding the original issue, the problem might occur because you are using ImportFormatMode.UseDestinationStyles. In this case the styles and defaults of the destination document are used, so the font might change. If you need to keep the source document formatting, you should use ImportFormatMode.KeepSourceFormatting.
If the problem occurs even with ImportFormatMode.KeepSourceFormatting, this must be a bug and you should report it to the Aspose.Words staff in the support forum.
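A minimal sketch of what importing with KeepSourceFormatting could look like, skipping the HTML roundtrip entirely. ImportSketch and CopyNodes are hypothetical names, and it assumes the nodes are block-level nodes (paragraphs or tables) that belong to srcDoc, as returned by the ExtractContent helper mentioned in the question.

```csharp
using System.Collections;
using Aspose.Words;

public static class ImportSketch
{
    // Hypothetical helper: copies extracted nodes into a new document while
    // keeping the fonts and styles they had in the source document.
    public static Document CopyNodes(Document srcDoc, ArrayList nodes)
    {
        Document dstDoc = new Document();
        // Remove the empty paragraph that a blank document starts with.
        dstDoc.FirstSection.Body.RemoveAllChildren();

        // The importer translates styles between the two documents;
        // KeepSourceFormatting preserves the source look.
        NodeImporter importer =
            new NodeImporter(srcDoc, dstDoc, ImportFormatMode.KeepSourceFormatting);

        foreach (Node node in nodes)
        {
            // Deep import so that runs and their font settings come along.
            Node imported = importer.ImportNode(node, true);
            dstDoc.FirstSection.Body.AppendChild(imported);
        }
        return dstDoc;
    }
}
```

Reusing one NodeImporter for all nodes, rather than calling dstDoc.ImportNode per node, keeps style translation consistent across the whole batch.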

Add a date to presentations (Open XML SDK) in C#

My reference is Presentations (Open XML SDK).
With the code-behind below I add the date to an existing PowerPoint file.
This code works and the date is added to the first slide, but is it possible to customize where on the slide the date is inserted? And the font and size?
Thank you in advance for any help.
string fileName = @"C:\inetpub\wwwroot\aspnet\Template\01_FOCUS.pptx";
using (PresentationDocument oPDoc = PresentationDocument.Open(fileName, true))
{
PresentationPart oPPart = oPDoc.PresentationPart;
SlideIdList slideIdList = oPPart.Presentation.SlideIdList;
SlidePart sp = slideIdList.ChildElements
.Cast<SlideId>()
.Select(x => oPPart.GetPartById(x.RelationshipId))
.Cast<SlidePart>().First();
AddDateToSlidePart(sp);
}
public static void AddDateToSlidePart(SlidePart slidePart1)
{
Slide slide1 = slidePart1.Slide;
CommonSlideData commonSlideData1 = slide1.GetFirstChild<CommonSlideData>();
ShapeTree shapeTree1 = commonSlideData1.GetFirstChild<ShapeTree>();
DocumentFormat.OpenXml.Presentation.Shape shape1 =
new DocumentFormat.OpenXml.Presentation.Shape();
DocumentFormat.OpenXml.Presentation.NonVisualShapeProperties nonVisualShapeProperties1 =
new DocumentFormat.OpenXml.Presentation.NonVisualShapeProperties();
DocumentFormat.OpenXml.Presentation.NonVisualDrawingProperties nonVisualDrawingProperties1 =
new DocumentFormat.OpenXml.Presentation.NonVisualDrawingProperties() { Id = (UInt32Value)4U, Name = "Date Placeholder 3" };
DocumentFormat.OpenXml.Presentation.NonVisualShapeDrawingProperties nonVisualShapeDrawingProperties1 =
new DocumentFormat.OpenXml.Presentation.NonVisualShapeDrawingProperties();
DocumentFormat.OpenXml.Drawing.ShapeLocks shapeLocks1 =
new DocumentFormat.OpenXml.Drawing.ShapeLocks() { NoGrouping = true };
nonVisualShapeDrawingProperties1.Append(shapeLocks1);
ApplicationNonVisualDrawingProperties applicationNonVisualDrawingProperties1 =
new ApplicationNonVisualDrawingProperties();
PlaceholderShape placeholderShape1 =
new PlaceholderShape() { Type = PlaceholderValues.DateAndTime, Size = PlaceholderSizeValues.Half, Index = (UInt32Value)10U };
applicationNonVisualDrawingProperties1.Append(placeholderShape1);
nonVisualShapeProperties1.Append(nonVisualDrawingProperties1);
nonVisualShapeProperties1.Append(nonVisualShapeDrawingProperties1);
nonVisualShapeProperties1.Append(applicationNonVisualDrawingProperties1);
DocumentFormat.OpenXml.Presentation.ShapeProperties shapeProperties1 =
new DocumentFormat.OpenXml.Presentation.ShapeProperties();
DocumentFormat.OpenXml.Presentation.TextBody textBody1 =
new DocumentFormat.OpenXml.Presentation.TextBody();
DocumentFormat.OpenXml.Drawing.BodyProperties bodyProperties1 =
new DocumentFormat.OpenXml.Drawing.BodyProperties();
DocumentFormat.OpenXml.Drawing.ListStyle listStyle1 =
new DocumentFormat.OpenXml.Drawing.ListStyle();
DocumentFormat.OpenXml.Drawing.Paragraph paragraph1 =
new DocumentFormat.OpenXml.Drawing.Paragraph();
DocumentFormat.OpenXml.Drawing.Field field1 =
new DocumentFormat.OpenXml.Drawing.Field() { Id = "{528B97E8-8E4B-4D32-BA17-4F287283DFD6}", Type = "datetime1" };
DocumentFormat.OpenXml.Drawing.RunProperties runProperties1 =
new DocumentFormat.OpenXml.Drawing.RunProperties() { Language = "it-IT" };
DocumentFormat.OpenXml.Drawing.Text text1 =
new DocumentFormat.OpenXml.Drawing.Text();
text1.Text = DateTime.Now.ToString("dd/MM/yyyy"); // "MM" is the month; "mm" would print minutes
field1.Append(runProperties1);
field1.Append(text1);
DocumentFormat.OpenXml.Drawing.EndParagraphRunProperties endParagraphRunProperties1 =
new DocumentFormat.OpenXml.Drawing.EndParagraphRunProperties() { Language = "it-IT" };
paragraph1.Append(field1);
paragraph1.Append(endParagraphRunProperties1);
textBody1.Append(bodyProperties1);
textBody1.Append(listStyle1);
textBody1.Append(paragraph1);
shape1.Append(nonVisualShapeProperties1);
shape1.Append(shapeProperties1);
shape1.Append(textBody1);
shapeTree1.Append(shape1);
}
The OpenXML SDK Productivity Tool (downloadable from the Microsoft site) is your friend here. When I need to do something like this, I:
1. Create the document I want.
2. Open it in the appropriate tool (PowerPoint in this case).
3. Make a tiny change (to "dirty" the document and make PowerPoint believe it needs to be saved).
4. Save the result.
5. Make the changes to the document that I want to see, and save the result under a new name.
6. Open the OpenXML Productivity Tool and click the "Compare Files" tool.
The reason I force a save (in steps 3/4) is so that PowerPoint can add all of its PowerPoint-ness to the document I created. The second save (step 5) will necessarily have all of that, so I want the two documents to be as close as possible, with only the "changes to the document that I want to see" being the difference between them.
At this point, you should see a path to a solution to your problem. That tool is indispensable when working with OOXML.
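For orientation, here is a hedged sketch of the kind of code such a comparison tends to point at: an explicit transform on the shape for position and size, and run properties for the font. PlaceAndStyle is a hypothetical helper, and the coordinates, the 14 pt size, and the Arial typeface are arbitrary illustration values. Offsets and extents are in EMU (914,400 per inch); font size is in hundredths of a point.

```csharp
using A = DocumentFormat.OpenXml.Drawing;
using P = DocumentFormat.OpenXml.Presentation;

public static class DatePlacementSketch
{
    private const long EmuPerInch = 914400;

    // Hypothetical helper: gives the shape an explicit position and size,
    // and gives the run an explicit font size and typeface.
    public static void PlaceAndStyle(P.ShapeProperties shapeProperties,
                                     A.RunProperties runProperties)
    {
        // Top-left corner 0.5in from the left and 6in from the top;
        // the box is 2in wide and 0.5in high.
        shapeProperties.Transform2D = new A.Transform2D(
            new A.Offset { X = EmuPerInch / 2, Y = 6 * EmuPerInch },
            new A.Extents { Cx = 2 * EmuPerInch, Cy = EmuPerInch / 2 });

        // 1400 hundredths of a point = 14 pt.
        runProperties.FontSize = 1400;
        runProperties.Append(new A.LatinFont { Typeface = "Arial" });
    }
}
```

In the question's AddDateToSlidePart, this would be called as PlaceAndStyle(shapeProperties1, runProperties1) before the shape is appended to the shape tree. An explicit transform on a placeholder shape overrides the position inherited from the slide layout.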

Aspose PDF - get text from page that has a matching string

I'm working with an existing library - the goal of the library is to pull text out of PDFs to verify against expected values to quality check recorded data vs data in pdf.
I'm looking for a way to succinctly pull a specific page worth of text given a string that should only fall on that specific page.
var pdfDocument = new Document(file.PdfFilePath);
var textAbsorber = new TextAbsorber{
ExtractionOptions = {
FormattingMode = TextExtractionOptions.TextFormattingMode.Pure
}
};
pdfDocument.Pages.Accept(textAbsorber);
foreach (var page in pdfDocument.Pages)
{
}
I'm stuck inside the foreach(var page in pdfDocument.Pages) portion... or is that the right area to be looking?
Answer: recreate the TextAbsorber for each page, inside the foreach loop.
If the absorber isn't recreated, it keeps text from previous loops.
public List<string> ProcessPage(MyInfoClass file, string find)
{
var pdfDocument = new Document(file.PdfFilePath);
foreach (Page page in pdfDocument.Pages)
{
var textAbsorber = new TextAbsorber {
ExtractionOptions = {
FormattingMode = TextExtractionOptions.TextFormattingMode.Pure
}
};
page.Accept(textAbsorber);
var ext = textAbsorber.Text;
var exts = ext.Replace("\n", "").Split('\r').ToList();
if (ext.Contains(find))
return exts;
}
return null;
}

Write to PowerPoint pptx from C# using OpenXML

I am writing data to the last slide of a pptx using OpenXML in C#, and below is the code that saves the ppt:
public static void CreateTableInLastSlide(PresentationDocument presentationDocument)
{
// Get the presentation Part of the presentation document
PresentationPart presentationPart = presentationDocument.PresentationPart;
// Get the Slide Id collection of the presentation document
var slideIdList = presentationPart.Presentation.SlideIdList;
if (slideIdList == null)
{
throw new NullReferenceException("The number of slide is empty, please select a ppt with a slide at least again");
}
// Get all Slide Part of the presentation document
var list = slideIdList.ChildElements
.Cast<SlideId>()
.Select(x => presentationPart.GetPartById(x.RelationshipId))
.Cast<SlidePart>();
// Get the last Slide Part of the presentation document
var tableSlidePart = (SlidePart)list.Last();
// Declare and instantiate the graphic Frame of the new slide
P.GraphicFrame graphicFrame = tableSlidePart.Slide.CommonSlideData.ShapeTree.AppendChild(new P.GraphicFrame());
ApplicationNonVisualDrawingPropertiesExtension applicationNonVisualDrawingPropertiesExtension = new ApplicationNonVisualDrawingPropertiesExtension();
P14.ModificationId modificationId1 = new P14.ModificationId() { Val = 3229994563U };
modificationId1.AddNamespaceDeclaration("p14", "http://schemas.microsoft.com/office/powerpoint/2010/main");
applicationNonVisualDrawingPropertiesExtension.Append(modificationId1);
graphicFrame.NonVisualGraphicFrameProperties = new DocumentFormat.OpenXml.Presentation.NonVisualGraphicFrameProperties
(new A.NonVisualDrawingProperties() { Id = 5, Name = "table 1" },
new A.NonVisualGraphicFrameDrawingProperties(new A.GraphicFrameLocks() { NoGrouping = true }),
new ApplicationNonVisualDrawingProperties(new ApplicationNonVisualDrawingPropertiesExtensionList(applicationNonVisualDrawingPropertiesExtension)));
graphicFrame.Transform = new Transform(new Offset() { X = 10, Y = 10 });
graphicFrame.Graphic = new A.Graphic(new A.GraphicData(GenerateTable()) { Uri = "http://schemas.openxmlformats.org/drawingml/2006/table" });
presentationPart.Presentation.Save();
}
The file is saved with the required data, but when I open it in PowerPoint I get the following error:
PowerPoint found a problem with content in "Sample.pptx". PowerPoint can attempt to repair the presentation.
Once repaired, the content is shown correctly. But since the file is corrupted, the next time I run the code it is not updated with the latest data.
Please help me solve the issue.
- I tried opening it in different versions of Office, but no luck.
- I created the template from different versions, but the issue still occurs.

Removing jquery and CSS from an Xml Document

I'm using SgmlReader to convert HTML to XML. The output goes into an XmlDocument object, from which I can then use the InnerText property to extract the plain text of the website. I'm trying to get the text to look as clean as possible by removing any JavaScript. Looping through the XML and removing any <script type="text/javascript"> is easy enough, but I've hit a brick wall when any jQuery or styling isn't encapsulated in any tags. Can anybody help me out?
Sample Code:
Step one:
Once I use the WebClient class to download the HTML, I save it, then open the file with the TextReader class.
Step two:
Create sgmlreader class and set the input stream to the text reader:
// setup SGMLReader
Sgml.SgmlReader sgmlReader = new Sgml.SgmlReader();
sgmlReader.DocType = "HTML";
sgmlReader.WhitespaceHandling = WhitespaceHandling.All;
sgmlReader.CaseFolding = Sgml.CaseFolding.ToLower;
sgmlReader.InputStream = reader;
// create document
doc = new XmlDocument();
doc.PreserveWhitespace = true;
doc.XmlResolver = null;
doc.Load(sgmlReader);
Step three:
Once I have an XmlDocument, I use doc.InnerText to get my plain text.
Step four:
I can easily remove JavaScript tags like so:
XmlNodeList nodes = document.GetElementsByTagName("text/javascript");
for (int i = nodes.Count - 1; i >= 0; i--)
{
nodes[i].ParentNode.RemoveChild(nodes[i]);
}
Some stuff still slips through. Here's an example of the output for one particular website I'm scraping:
Criminal and Civil Enforcement | Fraud | Office of Inspector General | U.S. Department of Health and Human Services
#fancybox-right {
right:-20px;
}
#fancybox-left {
left:-20px;
}
#fancybox-right:hover span, #fancybox-right span
#fancybox-right:hover span, #fancybox-right span {
left:auto;
right:0;
}
#fancybox-left:hover span, #fancybox-left span
#fancybox-left:hover span, #fancybox-left span {
right:auto;
left:0;
}
#fancybox-overlay {
/* background: url('/connections/images/wc-overlay.png'); */
/* background: url('/connections/images/banner.png') center center no-repeat; */
}
$(document).ready(function(){
$("a[rel=photo-show]").fancybox({
'titlePosition' : 'over',
'overlayColor' : '#000',
'overlayOpacity' : 0.9
});
$(".title-under").fancybox({
'titlePosition' : 'outside',
'overlayColor' : '#000',
'overlayOpacity' : 0.9
})
});
That jquery and styling needs to be removed.
I just threw this together in LinqPad based on the html of this page and it properly removes the script and style tags.
void Main()
{
string htmlPath = @"C:\Users\Jschubert\Desktop\html\test.html";
var sgmlReader = new Sgml.SgmlReader();
var stringReader = new StringReader(File.ReadAllText(htmlPath));
sgmlReader.DocType = "HTML";
sgmlReader.WhitespaceHandling = WhitespaceHandling.All;
sgmlReader.CaseFolding = Sgml.CaseFolding.ToLower;
sgmlReader.InputStream = stringReader;
// create document
var doc = new XmlDocument();
doc.PreserveWhitespace = true;
doc.XmlResolver = null;
doc.Load(sgmlReader);
List<XmlNode> nodes = doc.GetElementsByTagName("script")
.Cast<XmlNode>().ToList();
var byType = doc.SelectNodes("script[@type = 'text/javascript']")
.Cast<XmlNode>().ToList();
var style = doc.GetElementsByTagName("style").Cast<XmlNode>().ToList();
nodes.AddRange(byType);
nodes.AddRange(style);
for (int i = nodes.Count - 1; i >= 0; i--)
{
nodes[i].ParentNode.RemoveChild(nodes[i]);
}
doc.DumpFormatted();
stringReader.Close();
sgmlReader.Close();
}
Casting to XmlNode to use the generic list is not ideal, but I did it for the sake of space and demonstration.
Also, you shouldn't need both
doc.GetElementsByTagName("script") and
doc.SelectNodes("script[@type = 'text/javascript']").
Again, I did that for the sake of demonstration.
If you have other scripts and you only want to remove JavaScript, use the latter. If you're removing all script tags, use the first one. Or, use both if you want.
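As a compact variant of the same idea, a single XPath union can pick up script and style elements anywhere in the tree in one pass. This sketch uses only System.Xml. ScriptStripper is a hypothetical name, and the inline well-formed string in the usage note merely stands in for SgmlReader output, assuming lowercase element names without a namespace.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class ScriptStripper
{
    // Hypothetical helper: removes every <script> and <style> element and
    // returns the remaining plain text.
    public static string StripScriptsAndStyles(string xhtml)
    {
        var doc = new XmlDocument { PreserveWhitespace = true, XmlResolver = null };
        doc.LoadXml(xhtml);

        // "//" searches the whole tree; "|" unions the two node sets.
        // Copy the matches first so the tree is not edited while the
        // XPath result is still being enumerated.
        var toRemove = new List<XmlNode>();
        foreach (XmlNode node in doc.SelectNodes("//script | //style"))
        {
            toRemove.Add(node);
        }

        foreach (XmlNode node in toRemove)
        {
            node.ParentNode.RemoveChild(node);
        }

        return doc.DocumentElement.InnerText;
    }
}
```

For example, passing `<html><head><title>T</title><style>#a{left:0;}</style></head><body><p>Hello</p></body></html>` should leave only the title and paragraph text. Because each element is selected exactly once, no node is removed twice.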
