How can I remove watermark in PDF using itext/itextsharp? [duplicate] - c#

I added a watermark to a PDF using PdfStamper. Here is the code:
for (int pageIndex = 1; pageIndex <= pageCount; pageIndex++)
{
iTextSharp.text.Rectangle pageRectangle = reader.GetPageSizeWithRotation(pageIndex);
PdfContentByte pdfData = stamper.GetUnderContent(pageIndex);
pdfData.SetFontAndSize(BaseFont.CreateFont(BaseFont.HELVETICA, BaseFont.CP1252,
BaseFont.NOT_EMBEDDED), watermarkFontSize);
PdfGState graphicsState = new PdfGState();
graphicsState.FillOpacity = watermarkFontOpacity;
pdfData.SetGState(graphicsState);
pdfData.SetColorFill(iTextSharp.text.BaseColor.BLACK);
pdfData.BeginText();
pdfData.ShowTextAligned(PdfContentByte.ALIGN_CENTER, "LipikaChatterjee",
pageRectangle.Width / 2, pageRectangle.Height / 2, watermarkRotation);
pdfData.EndText();
}
This works fine. Now I want to remove this watermark from my PDF. I looked into iTextSharp but was not able to find any help. I even tried to add the watermark as a layer and then delete the layer, but I was not able to delete the content of the layer from the PDF. I looked into iText for layer removal and found a class OCGRemover, but I was not able to find an equivalent class in iTextSharp.

I'm going to give you the benefit of the doubt based on the statement "I even tried to add watermark as layer" and assume that you are working on content that you are creating and not trying to unwatermark someone else's content.
PDFs use Optional Content Groups (OCG) to store objects as layers. If you add your watermark text to a layer you can fairly easily remove it later.
The code below is a full working C# 2010 WinForms app targeting iTextSharp 5.1.1.0. It uses code based on Bruno's original Java code found here. The code is in three sections. Section 1 creates a sample PDF for us to work with. Section 2 creates a new PDF from the first and applies a watermark to each page on a separate layer. Section 3 creates a final PDF from the second but removes the layer with our watermark text. See the code comments for additional details.
When you create a PdfLayer object you can assign it a name to appear within a PDF reader. Unfortunately I can't find a way to access this name, so the code below looks for the actual watermark text within the layer. If you aren't using additional PDF layers I would recommend only looking for /OC within the content stream and not wasting time looking for your actual watermark text. If you find a way to look for /OC groups by name please let me know!
using System;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
namespace WindowsFormsApplication1 {
public partial class Form1 : Form {
public Form1() {
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e) {
string workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
string startFile = Path.Combine(workingFolder, "StartFile.pdf");
string watermarkedFile = Path.Combine(workingFolder, "Watermarked.pdf");
string unwatermarkedFile = Path.Combine(workingFolder, "Un-watermarked.pdf");
string watermarkText = "This is a test";
//SECTION 1
//Create a 5 page PDF, nothing special here
using (FileStream fs = new FileStream(startFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (Document doc = new Document(PageSize.LETTER)) {
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
for (int i = 1; i <= 5; i++) {
doc.NewPage();
doc.Add(new Paragraph(String.Format("This is page {0}", i)));
}
doc.Close();
}
}
}
//SECTION 2
//Create our watermark on a separate layer. The only difference here is that we are adding the watermark to a PdfLayer, which is an OCG or Optional Content Group
PdfReader reader1 = new PdfReader(startFile);
using (FileStream fs = new FileStream(watermarkedFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (PdfStamper stamper = new PdfStamper(reader1, fs)) {
int pageCount1 = reader1.NumberOfPages;
//Create a new layer
PdfLayer layer = new PdfLayer("WatermarkLayer", stamper.Writer);
for (int i = 1; i <= pageCount1; i++) {
iTextSharp.text.Rectangle rect = reader1.GetPageSize(i);
//Get the ContentByte object
PdfContentByte cb = stamper.GetUnderContent(i);
//Tell the CB that the next commands should be "bound" to this new layer
cb.BeginLayer(layer);
cb.SetFontAndSize(BaseFont.CreateFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED), 50);
PdfGState gState = new PdfGState();
gState.FillOpacity = 0.25f;
cb.SetGState(gState);
cb.SetColorFill(BaseColor.BLACK);
cb.BeginText();
cb.ShowTextAligned(PdfContentByte.ALIGN_CENTER, watermarkText, rect.Width / 2, rect.Height / 2, 45f);
cb.EndText();
//"Close" the layer
cb.EndLayer();
}
}
}
//SECTION 3
//Remove the layer created above
//First we bind a reader to the watermarked file, then strip out a bunch of things, and finally use a simple stamper to write out the edited reader
PdfReader reader2 = new PdfReader(watermarkedFile);
//NOTE, This will destroy all layers in the document, only use if you don't have additional layers
//Remove the OCG group completely from the document.
//reader2.Catalog.Remove(PdfName.OCPROPERTIES);
//Clean up the reader, optional
reader2.RemoveUnusedObjects();
//Placeholder variables
PRStream stream;
String content;
PdfDictionary page;
PdfArray contentarray;
//Get the page count
int pageCount2 = reader2.NumberOfPages;
//Loop through each page
for (int i = 1; i <= pageCount2; i++) {
//Get the page
page = reader2.GetPageN(i);
//Get the raw content
contentarray = page.GetAsArray(PdfName.CONTENTS);
if (contentarray != null) {
//Loop through content
for (int j = 0; j < contentarray.Size; j++) {
//Get the raw byte stream
stream = (PRStream)contentarray.GetAsStream(j);
//Convert to a string. NOTE, you might need a different encoding here
content = System.Text.Encoding.ASCII.GetString(PdfReader.GetStreamBytes(stream));
//Look for the OCG token in the stream as well as our watermarked text
if (content.IndexOf("/OC") >= 0 && content.IndexOf(watermarkText) >= 0) {
//Remove it by giving it zero length and zero data
stream.Put(PdfName.LENGTH, new PdfNumber(0));
stream.SetData(new byte[0]);
}
}
}
}
//Write the content out
using (FileStream fs = new FileStream(unwatermarkedFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (PdfStamper stamper = new PdfStamper(reader2, fs)) {
}
}
this.Close();
}
}
}

As an extension to Chris's answer, a VB.Net class for removing a layer is included at the bottom of this post; it should be a bit more precise.
1. It goes through the PDF's list of layers (stored in the OCGs array in the OCProperties dictionary in the file's catalog). This array contains indirect references to the objects in the PDF file that hold each layer's name.
2. It goes through the properties of each page (also stored in a dictionary) to find the properties that point to the layer objects (via indirect references).
3. It does an actual parse of the content stream to find instances of the pattern /OC /{PagePropertyReference} BDC {Actual Content} EMC so it can remove just these segments as appropriate.
The code then cleans up all the references as much as it can. Calling the code might work as shown:
Public Shared Sub RemoveWatermark(path As String, savePath As String)
Using reader = New PdfReader(path)
Using fs As New FileStream(savePath, FileMode.Create, FileAccess.Write, FileShare.None)
Using stamper As New PdfStamper(reader, fs)
Using remover As New PdfLayerRemover(reader)
remover.RemoveByName("WatermarkLayer")
End Using
End Using
End Using
End Using
End Sub
Full class:
Imports iTextSharp.text
Imports iTextSharp.text.io
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser
Public Class PdfLayerRemover
Implements IDisposable
Private _reader As PdfReader
Private _layerNames As New List(Of String)
Public Sub New(reader As PdfReader)
_reader = reader
End Sub
Public Sub RemoveByName(name As String)
_layerNames.Add(name)
End Sub
Private Sub RemoveLayers()
Dim ocProps = _reader.Catalog.GetAsDict(PdfName.OCPROPERTIES)
If ocProps Is Nothing Then Return
Dim ocgs = ocProps.GetAsArray(PdfName.OCGS)
If ocgs Is Nothing Then Return
'Get a list of indirect references to the layer information
Dim layerRefs = (From l In (From i In ocgs
Select Obj = DirectCast(PdfReader.GetPdfObject(i), PdfDictionary),
Ref = DirectCast(i, PdfIndirectReference))
Where _layerNames.Contains(l.Obj.GetAsString(PdfName.NAME).ToString)
Select l.Ref).ToList
'Get a list of numbers for these layer references
Dim layerRefNumbers = (From l In layerRefs Select l.Number).ToList
'Loop through the pages
Dim page As PdfDictionary
Dim propsToRemove As IEnumerable(Of PdfName)
For i As Integer = 1 To _reader.NumberOfPages
'Get the page
page = _reader.GetPageN(i)
'Get the page properties which reference the layers to remove
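Dim props = _reader.GetPageResources(i).GetAsDict(PdfName.PROPERTIES)
'Pages without a Properties dictionary reference no layers, so skip them
If props Is Nothing Then Continue For
propsToRemove = (From k In props.Keys Where layerRefNumbers.Contains(props.GetAsIndirectObject(k).Number) Select k).ToList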
'Get the raw content
Dim contentarray = page.GetAsArray(PdfName.CONTENTS)
If contentarray IsNot Nothing Then
For j As Integer = 0 To contentarray.Size - 1
'Parse the stream data looking for references to a property pointing to the layer.
Dim stream = DirectCast(contentarray.GetAsStream(j), PRStream)
Dim streamData = PdfReader.GetStreamBytes(stream)
Dim newData = GetNewStream(streamData, (From p In propsToRemove Select p.ToString.Substring(1)))
'Store data without the stream references in the stream
If newData.Length <> streamData.Length Then
stream.SetData(newData)
stream.Put(PdfName.LENGTH, New PdfNumber(newData.Length))
End If
Next
End If
'Remove the properties from the page data
For Each prop In propsToRemove
props.Remove(prop)
Next
Next
'Remove references to the layer in the master catalog
RemoveIndirectReferences(ocProps, layerRefNumbers)
'Clean up unused objects
_reader.RemoveUnusedObjects()
End Sub
Private Shared Function GetNewStream(data As Byte(), propsToRemove As IEnumerable(Of String)) As Byte()
Dim item As PdfLayer = Nothing
Dim positions As New List(Of Integer)
positions.Add(0)
Dim pos As Integer
Dim inGroup As Boolean = False
Dim tokenizer As New PRTokeniser(New RandomAccessFileOrArray(New RandomAccessSourceFactory().CreateSource(data)))
While tokenizer.NextToken
If tokenizer.TokenType = PRTokeniser.TokType.NAME AndAlso tokenizer.StringValue = "OC" Then
pos = CInt(tokenizer.FilePointer - 3)
If tokenizer.NextToken() AndAlso tokenizer.TokenType = PRTokeniser.TokType.NAME Then
If Not inGroup AndAlso propsToRemove.Contains(tokenizer.StringValue) Then
inGroup = True
positions.Add(pos)
End If
End If
ElseIf tokenizer.TokenType = PRTokeniser.TokType.OTHER AndAlso tokenizer.StringValue = "EMC" AndAlso inGroup Then
positions.Add(CInt(tokenizer.FilePointer))
inGroup = False
End If
End While
positions.Add(data.Length)
If positions.Count > 2 Then
Dim length As Integer = 0
For i As Integer = 0 To positions.Count - 1 Step 2
length += positions(i + 1) - positions(i)
Next
'VB array bounds are inclusive, so use length - 1 to get exactly length bytes
Dim newData(length - 1) As Byte
length = 0
For i As Integer = 0 To positions.Count - 1 Step 2
Array.Copy(data, positions(i), newData, length, positions(i + 1) - positions(i))
length += positions(i + 1) - positions(i)
Next
Dim origStr = System.Text.Encoding.UTF8.GetString(data)
Dim newStr = System.Text.Encoding.UTF8.GetString(newData)
Return newData
Else
Return data
End If
End Function
Private Shared Sub RemoveIndirectReferences(dict As PdfDictionary, refNumbers As IEnumerable(Of Integer))
Dim newDict As PdfDictionary
Dim arrayData As PdfArray
Dim indirect As PdfIndirectReference
Dim i As Integer
For Each key In dict.Keys
newDict = dict.GetAsDict(key)
arrayData = dict.GetAsArray(key)
If newDict IsNot Nothing Then
RemoveIndirectReferences(newDict, refNumbers)
ElseIf arrayData IsNot Nothing Then
i = 0
While i < arrayData.Size
indirect = arrayData.GetAsIndirectObject(i)
If indirect IsNot Nothing AndAlso refNumbers.Contains(indirect.Number) Then
arrayData.Remove(i)
Else
i += 1
End If
End While
End If
Next
End Sub
#Region "IDisposable Support"
Private disposedValue As Boolean ' To detect redundant calls
' IDisposable
Protected Overridable Sub Dispose(disposing As Boolean)
If Not Me.disposedValue Then
If disposing Then
RemoveLayers()
End If
' TODO: free unmanaged resources (unmanaged objects) and override Finalize() below.
' TODO: set large fields to null.
End If
Me.disposedValue = True
End Sub
' TODO: override Finalize() only if Dispose(ByVal disposing As Boolean) above has code to free unmanaged resources.
'Protected Overrides Sub Finalize()
' ' Do not change this code. Put cleanup code in Dispose(ByVal disposing As Boolean) above.
' Dispose(False)
' MyBase.Finalize()
'End Sub
' This code added by Visual Basic to correctly implement the disposable pattern.
Public Sub Dispose() Implements IDisposable.Dispose
' Do not change this code. Put cleanup code in Dispose(ByVal disposing As Boolean) above.
Dispose(True)
GC.SuppressFinalize(Me)
End Sub
#End Region
End Class
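The marked-content pattern described in this answer (/OC /{PagePropertyReference} BDC ... EMC) can be illustrated without iTextSharp. The following is only a minimal C# sketch that works on a content stream already decoded to a string. The class and method names are hypothetical, and it assumes whitespace-separated tokens and that no string literal or inline image contains the words BDC, BMC or EMC, which a real tokenizer such as PRTokeniser does not need to assume. Unlike the VB class above, it counts nesting depth so that an inner EMC does not end the removal early.

```csharp
using System;
using System.Collections.Generic;

public static class MarkedContentStripper
{
    // Removes every "/OC /<name> BDC ... EMC" run whose property name is listed
    // in propsToRemove. Hypothetical helper, not part of iTextSharp.
    // Assumption: tokens are separated by whitespace only.
    public static string Strip(string content, ICollection<string> propsToRemove)
    {
        string[] tokens = content.Split(
            new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        var kept = new List<string>();
        int skipDepth = 0; // > 0 while inside a group that is being removed

        for (int i = 0; i < tokens.Length; i++)
        {
            string t = tokens[i];
            if (skipDepth > 0)
            {
                // Track nesting so that only the matching EMC ends the removal.
                if (t == "BDC" || t == "BMC") skipDepth++;
                else if (t == "EMC") skipDepth--;
                continue;
            }

            bool startsGroup = t == "/OC"
                && i + 2 < tokens.Length
                && tokens[i + 2] == "BDC"
                && tokens[i + 1].StartsWith("/", StringComparison.Ordinal)
                && propsToRemove.Contains(tokens[i + 1].Substring(1));
            if (startsGroup)
            {
                skipDepth = 1;
                i += 2; // consume "/<name>" and "BDC"
                continue;
            }

            kept.Add(t);
        }

        // Note: the original whitespace is normalised to single spaces.
        return string.Join(" ", kept);
    }
}
```

Groups that reference a property name not in the set are left untouched, which is what allows one layer to be removed while other layers stay in place.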

Related

Is there a way to export QR code as image?

I was trying to export QR code as image inside C: folder. I'm using C# on visual studio and devExpress to create the QR code. I've already created the QR code but I don't know how to export it. Is there a way to do it? Hadn't tried much because I didn't find a lot. Thank you!
If you wish to generate barcodes using our WinForms BarCodeControl, use the BarCodeControl.ExportToImage(Stream, ImageFormat, Int32) method to export the barcode as an image to a stream. After that, convert the stream to a byte array or a Base64 string.
If you prefer the non-visual Barcode Generation API library to create barcodes, use the BarCode.Save(Stream, ImageFormat) method to save the barcode image to a stream in the specified format. Then convert the stream to a byte array or a Base64 string.
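The last step mentioned above, turning the stream into a byte array or a Base64 string, needs only the .NET base class library. This is a small sketch; the class and method names are hypothetical and it makes no assumption about how the stream was filled.

```csharp
using System;
using System.IO;

public static class StreamEncoding
{
    // Copies the whole stream into a byte array.
    public static byte[] ToByteArray(Stream source)
    {
        // Rewind first if possible: a stream that was just written to
        // is usually positioned at its end.
        if (source.CanSeek) source.Position = 0;
        using (var buffer = new MemoryStream())
        {
            source.CopyTo(buffer);
            return buffer.ToArray();
        }
    }

    // Encodes the whole stream as a Base64 string.
    public static string ToBase64(Stream source)
    {
        return Convert.ToBase64String(ToByteArray(source));
    }
}
```

To end up with an image file on disk, the byte array can be passed to File.WriteAllBytes together with the target path. Writing directly to the root of C: may require elevated permissions on some systems, so a user folder is usually the safer target.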
This code is VB.NET and I'm not using DevExpress.
Add the ZXing.Net NuGet package.
Maybe this code can help you or someone else.
Imports System.Data
Imports ZXing.Common
Imports ZXing
Imports ZXing.QrCode
Public Class Form1
Private options As QrCodeEncodingOptions
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
options = New QrCodeEncodingOptions With {
.DisableECI = True,
.CharacterSet = "UTF-8",
.Width = 50,
.Height = 50
}
Dim writer = New BarcodeWriter()
writer.Format = BarcodeFormat.QR_CODE
writer.Options = options
writer.Options.PureBarcode = True
End Sub
Private Sub btn_GenerateQRCode_Click(sender As Object, e As EventArgs) Handles btn_GenerateQRCode.Click
If String.IsNullOrWhiteSpace(TextBox1.Text) OrElse String.IsNullOrEmpty(TextBox1.Text) Then
PictureBox1.Image = Nothing
MessageBox.Show("Text not found", "Oops!", MessageBoxButtons.OK, MessageBoxIcon.[Error])
Else
Dim qr = New ZXing.BarcodeWriter()
qr.Options = options
qr.Format = ZXing.BarcodeFormat.QR_CODE
Dim result = New Bitmap(qr.Write(TextBox1.Text.Trim()))
PictureBox1.Image = result
TextBox1.Clear()
End If
End Sub
Private Sub btn_DecodeQRCode_Click(sender As Object, e As EventArgs) Handles btn_DecodeQRCode.Click
Try
Dim bitmap As Bitmap = New Bitmap(PictureBox1.Image)
Dim reader As BarcodeReader = New BarcodeReader With {
.AutoRotate = True,
.TryInverted = True
}
Dim result As Result = reader.Decode(bitmap)
Dim decoded As String = result.ToString().Trim()
TextBox1.Text = decoded
Catch ex As Exception
MessageBox.Show("Image not found", "Oops!", MessageBoxButtons.OK, MessageBoxIcon.[Error])
End Try
End Sub
Private Sub btn_BrowseQRImage_Click(sender As Object, e As EventArgs) Handles btn_BrowseQRImage.Click
Dim open As OpenFileDialog = New OpenFileDialog()
If open.ShowDialog() = System.Windows.Forms.DialogResult.OK Then
Dim qr = New ZXing.BarcodeWriter()
qr.Options = options
qr.Format = ZXing.BarcodeFormat.QR_CODE
PictureBox1.ImageLocation = open.FileName
End If
End Sub
Private Sub btnExportQRCodetoFolder_Click(sender As Object, e As EventArgs) Handles btnExportQRCodetoFolder.Click
If PictureBox1.Image Is Nothing Then
MessageBox.Show("Image not found", "Oops!", MessageBoxButtons.OK, MessageBoxIcon.[Error])
Else
Dim save As SaveFileDialog = New SaveFileDialog()
'save.CreatePrompt = True
'save.OverwritePrompt = True
'save.FileName = "MyQR"
save.Filter = "PNG|*.png|JPEG|*.jpg|BMP|*.bmp|GIF|*.gif"
'Set the starting folder before the dialog is shown, otherwise it has no effect
save.InitialDirectory = Environment.GetFolderPath(Environment.SpecialFolder.Desktop)
If save.ShowDialog() = System.Windows.Forms.DialogResult.OK Then
PictureBox1.Image.Save(save.FileName)
End If
End If
End Sub
End Class

editing html string having text and several images using regex

I am new to this forum and hoping to get some help.
I have an HTML string containing text and several base64 images.
I need to loop through all image tags adding a slash / before
the closing tag > so that each image ends with /> and return
a new html string with the changes.
so each
<IMG src="data:image/png;base64,iVBORw0KG....">
should then be
<IMG src="data:image/png;base64,iVBORw0KG...."/>
I am not versed with html and I am wondering how to do it
(using regex?).
Here is some pseudo code:
Function GetSourceImges(Sourcehtml As String) As List(Of String)
Dim listOfImgs As New List(Of String)()
'use regex to find image tags
'Return list of base64 image tags
End Function
For each image in list
insert a slash appropriately
next
Reconstitute a new html string with edited images
Thanks
Map all "IMG" tags using LINQ and use their indexes as an anchor to fix the missing "/" characters. Please see my comments inside the code.
Sub Main()
Dim htmlstring As String = "<IMG src=""data:image/png;base64,iVBORw0KG....""> " & vbCrLf _
& "<img src=""data:image/png;base64,iVBORw0KG...."">" & vbCrLf _
& "<p>blahblah</p>" & vbCrLf _
& "<IMG src=""data:image/png;base64,iVBORw0KG...."">" & vbCrLf _
& "<p>blahblah</p>"
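' find all indexes of img using regex and lambda expressions, last match first, '
' so that inserting characters does not shift the indexes still to be processed '
Dim indexofIMG() As Integer = Regex.Matches(htmlstring, "IMG", RegexOptions.IgnoreCase) _
.Cast(Of Match)().Select(Function(x) x.Index).Reverse().ToArray()
' check from each index of "IMG" if "/" is missing '
For Each itm As Integer In indexofIMG
Dim counter As Integer = itm
' scan up to and including the last character, in case the tag ends the string '
While counter < htmlstring.Length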
If htmlstring(counter) = ">" Then
If htmlstring(counter - 1) <> "/" Then
' fix the missing "/" using Insert() method '
htmlstring = htmlstring.Insert(counter, "/")
End If
Exit While
End If
counter += 1
End While
Next
Console.WriteLine(htmlstring)
Console.ReadLine()
End Sub
Surprisingly, it works in the console app but not when I view the result in a RichTextBox, as in the btnEditHTML method below. The generated PDF has only one red dot and not two.
I can't say why.
I must say you have been very helpful.
'SetTable and customimagetagprocessor borrowed from [here] iTextsharp base64 embedded image in header not parsing/showing
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.tool.xml
Imports iTextSharp.text.pdf
Imports iTextSharp.tool.xml.parser
Imports iTextSharp.tool.xml.pipeline.css
Imports iTextSharp.tool.xml.pipeline.html
Imports iTextSharp.tool.xml.pipeline.end
Imports iTextSharp.tool.xml.html
Imports System.Text.RegularExpressions
Public Class Form1
Dim dsktop As String = My.Computer.FileSystem.SpecialDirectories.Desktop
Public Function GetFormattedHTML(str As String) As String
'format images by changing > to />
' find all indexes of img using regex and lambda expressions '
Dim indexofIMG() As Integer = Regex.Matches(str.ToString, "IMG", RegexOptions.IgnoreCase) _
.Cast(Of Match)().Select(Function(x) x.Index).ToArray()
' check from each index of "IMG" if "/" is missing '
For Each itm As Integer In indexofIMG
Dim counter As Integer = itm
While counter < str.ToString.Length - 1
If str(counter) = ">" Then
If str(counter - 1) <> "/" Then
' fix the missing "/" using Insert() method '
str = str.ToString.Insert(counter, " /")
End If
Exit While
End If
counter += 1
End While
Next
Return str.ToString
End Function
Private Sub btnEditHTML_Click(sender As Object, e As EventArgs) Handles btnEditHTML.Click
Rtb.Text = String.Empty
'the 2 base64 images in the html below are actually just small red dots
Dim RawHTML As String = "<P>John Doe</P><IMG " &
"src=""data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8z8BQDwAEhQGAhKmMIQAAAABJRU5ErkJggg==""> Jackson5<IMG " &
"src=""data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8z8BQDwAEhQGAhKmMIQAAAABJRU5ErkJggg=="">"
Rtb.Text = GetFormattedHTML(RawHTML)
'notice that the 2nd base64 string is not edited as required.
End Sub
Private Sub btnGenerate_Click(sender As Object, e As EventArgs) Handles btnGenerate.Click
'here I create a 2 column itextsharp table to parse my html into the cells
Dim doc As New iTextSharp.text.Document(iTextSharp.text.PageSize.A4, 25, 25, 25, 30)
Dim wri As PdfWriter = PdfWriter.GetInstance(doc, New System.IO.FileStream(dsktop & "\testtable.pdf", System.IO.FileMode.Create))
doc.Open()
'set table columnwidths -------------------------------------------------------------
Dim MainTable As New PdfPTable(2) '2 column table
MainTable.WidthPercentage = 100
Dim Wth(1) As Single
Dim u As Integer = 2
For i As Integer = 0 To 1
Wth(i) = CInt(Math.Floor(2 * 500 / u))
Next
MainTable.SetWidths(Wth)
Dim htmlstr As String = GetFormattedHTML("<P>John Doe</P><IMG " &
"src=""data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8z8BQDwAEhQGAhKmMIQAAAABJRU5ErkJggg==""> Jackson5<IMG " &
"src=""data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8z8BQDwAEhQGAhKmMIQAAAABJRU5ErkJggg=="">")
Dim Elmts = New ElementList()
Elmts = XMLWorkerHelper.ParseToElementList(htmlstr, Nothing)
Dim MinorTable As New PdfPTable(1)
MinorTable = SetTable(Elmts, htmlstr)
For i = 1 To 2
Dim Cell As New PdfPCell
Cell.AddElement(MinorTable)
MainTable.AddCell(Cell)
Next
doc.Add(MainTable)
doc.Close()
Process.Start(dsktop & "\testtable.pdf")
End Sub
Public Function SetTable(ByVal elements As ElementList, ByVal htmlcode As String) As PdfPTable
Dim tagProcessors As DefaultTagProcessorFactory = CType(Tags.GetHtmlTagProcessorFactory(), DefaultTagProcessorFactory)
tagProcessors.RemoveProcessor(HTML.Tag.IMG) ' remove the default processor
tagProcessors.AddProcessor(HTML.Tag.IMG, New CustomImageTagProcessor()) ' use our new processor
Dim cssResolver As ICSSResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(True)
cssResolver.AddCssFile(Application.StartupPath & "\pdf.css", True)
'see sample css file at https://learnwebcode.com/how-to-create-your-first-css-file/
'Setup Fonts
Dim xmlFontProvider As XMLWorkerFontProvider = New XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS)
xmlFontProvider.RegisterDirectory(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "assets/fonts/"))
Dim cssAppliers As CssAppliers = New CssAppliersImpl(xmlFontProvider)
Dim htmlContext As HtmlPipelineContext = New HtmlPipelineContext(cssAppliers)
htmlContext.SetAcceptUnknown(True)
htmlContext.SetTagFactory(tagProcessors)
Dim pdf As ElementHandlerPipeline = New ElementHandlerPipeline(elements, Nothing)
Dim htmlp As HtmlPipeline = New HtmlPipeline(htmlContext, pdf)
Dim css As CssResolverPipeline = New CssResolverPipeline(cssResolver, htmlp)
Dim worker As XMLWorker = New XMLWorker(css, True)
Dim p As XMLParser = New XMLParser(worker)
'Dim holderTable As New PdfPTable({1})
Dim holderTable As PdfPTable = New PdfPTable({1})
holderTable.WidthPercentage = 100
holderTable.HorizontalAlignment = Element.ALIGN_LEFT
Dim holderCell As New PdfPCell()
holderCell.Padding = 0
holderCell.UseBorderPadding = False
holderCell.Border = 0
p.Parse(New MemoryStream(System.Text.Encoding.ASCII.GetBytes(htmlcode)))
For Each el As IElement In elements
holderCell.AddElement(el)
Next
holderTable.AddCell(holderCell)
'Dim holderRow As New PdfPRow({holderCell})
'holderTable.Rows.Add(holderRow)
Return holderTable
End Function
End Class
Public Class CustomImageTagProcessor
Inherits iTextSharp.tool.xml.html.Image
Public Overrides Function [End](ctx As IWorkerContext, tag As Tag, currentContent As IList(Of IElement)) As IList(Of IElement)
Dim attributes As IDictionary(Of String, String) = tag.Attributes
Dim src As String = String.Empty
If Not attributes.TryGetValue(iTextSharp.tool.xml.html.HTML.Attribute.SRC, src) Then
Return New List(Of IElement)(1)
End If
If String.IsNullOrEmpty(src) Then
Return New List(Of IElement)(1)
End If
If src.StartsWith("data:image/", StringComparison.InvariantCultureIgnoreCase) Then
' data:[<MIME-type>][;charset=<encoding>][;base64],<data>
Dim base64Data As String = src.Substring(src.IndexOf(",") + 1)
Dim imagedata As Byte() = Convert.FromBase64String(base64Data)
Dim image As iTextSharp.text.Image = iTextSharp.text.Image.GetInstance(imagedata)
Dim list As List(Of IElement) = New List(Of IElement)()
Dim htmlPipelineContext As pipeline.html.HtmlPipelineContext = GetHtmlPipelineContext(ctx)
list.Add(GetCssAppliers().Apply(New Chunk(DirectCast(GetCssAppliers().Apply(image, tag, htmlPipelineContext), iTextSharp.text.Image), 0, 0, True), tag, htmlPipelineContext))
Return list
Else
If File.Exists(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, src)) Then
Dim imagedata As Byte() = File.ReadAllBytes(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, src))
Dim image As iTextSharp.text.Image = iTextSharp.text.Image.GetInstance(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, src))
Dim list As List(Of IElement) = New List(Of IElement)()
Dim htmlPipelineContext As pipeline.html.HtmlPipelineContext = GetHtmlPipelineContext(ctx)
list.Add(GetCssAppliers().Apply(New Chunk(DirectCast(GetCssAppliers().Apply(image, tag, htmlPipelineContext), iTextSharp.text.Image), 0, 0, True), tag, htmlPipelineContext))
Return list
End If
Return MyBase.[End](ctx, tag, currentContent)
End If
End Function
End Class
I highly recommend just using AngleSharp to parse the HTML, edit the document if required, and save it again.
There are many posts on here about why trying to parse HTML with regular expressions is a bad idea.
var doc = new HtmlParser().Parse(html);
As you aren't actually changing the HTML content, just fixing up the tags, you should be able to just parse it and save it with no changes to fix the tags.
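For input as small and regular as the strings in this thread, a single Regex.Replace is another option, with the caveat just mentioned that a regular expression is not a general HTML parser. The sketch below is in C#, the class and method names are hypothetical, and it assumes that no attribute value contains a ">" character, which holds for Base64 data URIs. Because the whole tag is matched and replaced in one pass, there are no indexes to keep in sync, and a tag that is the very last thing in the string is handled like any other.

```csharp
using System.Text.RegularExpressions;

public static class ImgTagFixer
{
    // Group 1 = the tag name as written (IMG or img), group 2 = its attributes.
    // Assumption: attribute values contain no '>' character.
    private static readonly Regex ImgTag =
        new Regex(@"<(img)\b([^>]*?)\s*/?>", RegexOptions.IgnoreCase);

    // Rewrites every <img ...> or <img .../> as <img ... />.
    public static string SelfCloseImages(string html)
    {
        return ImgTag.Replace(
            html,
            m => "<" + m.Groups[1].Value + m.Groups[2].Value + " />");
    }
}
```

Tags that are already self-closed are normalised to the same form, so running the method twice gives the same result as running it once.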

Generate PDF using Windows Forms

I want to generate a PDF from a Windows Forms desktop application. I have a ready-made PDF design and I just want to fill the blank sections of that PDF with data from the database for each user (a kind of receipt). Please guide me. I have searched, but most of the solutions I found are for ASP.NET web applications; I want to do this in a desktop app. Here is my code. I am able to fetch data from the database and print it in a PDF, but the main problem is that I have an already designed PDF and I want to place the data exactly in the corresponding fields (i.e. name, amount, date, etc.).
using System;
using System.Windows.Forms;
using System.Diagnostics;
using PdfSharp;
using PdfSharp.Drawing;
using PdfSharp.Pdf;
using System.Data.SqlClient;
using System.Data;
using System.Configuration;
namespace printPDF
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void button1_Click_1(object sender, EventArgs e)
{
try
{
string connetionString = null;
SqlConnection connection ;
SqlCommand command ;
SqlDataAdapter adapter = new SqlDataAdapter();
DataSet ds = new DataSet();
int i = 0;
string sql = null;
int yPoint = 0;
string pubname = null;
string city = null;
string state = null;
connetionString = "Data Source=EEVO-SALMAN\\MY_PC;Initial Catalog=;User ID=s***;Password=******";
// var connectionString = ConfigurationManager.ConnectionStrings["CharityManagement"].ConnectionString;
sql = "select NAME,NAME,uid from tblumaster";
connection = new SqlConnection(connetionString);
connection.Open();
command = new SqlCommand(sql, connection);
adapter.SelectCommand = command;
adapter.Fill(ds);
connection.Close();
PdfDocument pdf = new PdfDocument();
pdf.Info.Title = "Database to PDF";
PdfPage pdfPage = pdf.AddPage();
XGraphics graph = XGraphics.FromPdfPage(pdfPage);
XFont font = new XFont("Verdana", 20, XFontStyle.Regular );
yPoint = yPoint + 100;
for (i = 0; i <=ds.Tables[0].Rows.Count-1; i++)
{
pubname = ds.Tables[0].Rows[i].ItemArray[0].ToString ();
city = ds.Tables[0].Rows[i].ItemArray[1].ToString();
state = ds.Tables[0].Rows[i].ItemArray[2].ToString();
graph.DrawString(pubname, font, XBrushes.Black, new XRect(10, yPoint, pdfPage.Width.Point, pdfPage.Height.Point), XStringFormats.TopLeft);
graph.DrawString(city, font, XBrushes.Black, new XRect(200, yPoint, pdfPage.Width.Point, pdfPage.Height.Point), XStringFormats.TopLeft);
graph.DrawString(state, font, XBrushes.Black, new XRect(400, yPoint, pdfPage.Width.Point, pdfPage.Height.Point), XStringFormats.TopLeft);
yPoint = yPoint + 40;
}
string pdfFilename = "dbtopdf.pdf";
pdf.Save(pdfFilename);
Process.Start(pdfFilename);
}
catch (Exception ex)
{
MessageBox.Show(ex.ToString());
}
}
}
}
Instead of modifying the document, create a new document and copy the pages from the old document into it.
Sample code can be found here:
http://forum.pdfsharp.net/viewtopic.php?p=2637#p2637
Modifying an existing PDF with the 'PdfSharp' library is not recommended. If you still want to edit it, you can use the 'ISharp' library, which needs a license.
Here is some VB.Net code I use to fill PDF forms. You need a PDF fillable form with form control names matching the SQL record field names.
It calls a routine Gen.GetDataTable() that just builds a typical DataTable. You could re-code to accept a pre-built Datatable as a parameter. Only the top row is processed. The code can be modified to work with a DataRow (.Table.Columns for column reference) or a DataReader.
Public Function FillPDFFormSQL(pdfMasterPath As String, pdfFinalPath As String, SQL As String, Optional FlattenForm As Boolean = True, Optional PrintPDF As Boolean = False, Optional PrinterName As String = "", Optional AllowMissingFields As Boolean = False) As Boolean
' case matters SQL <-> PDF Form Field Names
Dim pdfFormFields As AcroFields
Dim pdfReader As PdfReader
Dim pdfStamper As PdfStamper
Dim s As String = ""
Try
If pdfFinalPath = "" Then pdfFinalPath = pdfMasterPath.Replace(".pdf", "_Out.pdf")
Dim newFile As String = pdfFinalPath
pdfReader = New PdfReader(pdfMasterPath)
pdfStamper = New PdfStamper(pdfReader, New FileStream(newFile, FileMode.Create))
pdfReader.Close()
pdfFormFields = pdfStamper.AcroFields
Dim dt As DataTable = Gen.GetDataTable(SQL)
For i As Integer = 0 To dt.Columns.Count - 1
s = dt.Columns(i).ColumnName
If AllowMissingFields Then
If pdfFormFields.Fields.ContainsKey(s) Then
pdfFormFields.SetField(s, dt.Rows(0)(i).ToString.Trim)
Else
Debug.WriteLine($"Missing PDF Field: {s}")
End If
Else
pdfFormFields.SetField(s, dt.Rows(0)(i).ToString.Trim)
End If
Next
' flatten the form to remove editing options
' set it to false to leave the form open for subsequent manual edits
If My.Computer.Keyboard.CtrlKeyDown Then
pdfStamper.FormFlattening = False
Else
pdfStamper.FormFlattening = FlattenForm
End If
pdfStamper.Close()
If Not newFile.Contains("""") Then newFile = """" & newFile & """"
If Not PrintPDF Then
Process.Start(newFile)
Else
Dim sPDFProgramPath As String = INI.GetValue("OISForms", "PDFProgramPath", "C:\Program Files (x86)\Foxit Software\Foxit PhantomPDF\FoxitPhantomPDF.exe")
If Not IO.File.Exists(sPDFProgramPath) Then MsgBox("PDF EXE not found:" & vbNewLine & sPDFProgramPath) : Exit Function
If PrinterName.Length > 0 Then
Process.Start(sPDFProgramPath, "/t " & newFile & " " & PrinterName)
Else
Process.Start(sPDFProgramPath, "/p " & newFile)
End If
End If
Return True
Catch ex As Exception
MsgBox(ex.Message)
Return False
Finally
pdfStamper = Nothing
pdfReader = Nothing
End Try
End Function

How do I remove link annotations from a PDF using iText?

I want to remove annotations (Link, Text, ...) from a PDF permanently using iTextSharp.
I have already tried
AnnotationDictionary.Remove(PdfName.LINK);
but the Link annotations still exist in the PDF.
Note:
I want to remove only particular, selected annotations (Link, Text, ...).
For example, I want to remove the Link annotation whose URI is www.google.com and keep the remaining Link annotations as they are.
I got the answer to my question.
Sample Code:
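//Get the current page
PageDictionary = R.GetPageN(i);
//Get all of the annotations for the current page
Annots = PageDictionary.GetAsArray(PdfName.ANNOTS);
if (Annots != null)
{
//Walk backwards so that removing an entry does not shift the entries still to be visited
for (int idx = Annots.Size - 1; idx >= 0; idx--)
{
PdfDictionary Annot = Annots.GetAsDict(idx);
//code to check the annotation; here only Link annotations are selected
if (Annot != null && PdfName.LINK.Equals(Annot.GetAsName(PdfName.SUBTYPE)))
{
//remove the annotation
Annots.Remove(idx);
}
}
}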
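' Method name assumed; the original post did not include the Sub header
Public Sub RemoveLinkAnnotations(fileloc As String)
Dim pdfReader As New PdfReader(fileloc)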
For i = 1 To pdfReader.NumberOfPages
Dim pageDict As PdfDictionary = pdfReader.GetPageN(i)
Dim annots As PdfArray = pageDict.GetAsArray(PdfName.ANNOTS)
Dim newAnnots As PdfArray = New PdfArray()
If annots IsNot Nothing Then
For j As Integer = 0 To annots.Size() - 1
Dim annotDict As PdfDictionary = annots.GetAsDict(j)
If Not PdfName.LINK.Equals(annotDict.GetAsName(PdfName.SUBTYPE)) Then
newAnnots.Add(annots.GetAsDict(j))
End If
Next
pageDict.Put(PdfName.ANNOTS, newAnnots)
End If
Next
Dim pdfStamper As PdfStamper = Nothing
Dim extension = Path.GetExtension(fileloc)
Dim filename As String = Path.GetFileNameWithoutExtension(fileloc)
Dim filePath As String = Path.GetDirectoryName(fileloc)
fileloc = filePath + "\" + filename + "new" + extension
pdfStamper = New PdfStamper(pdfReader, New FileStream(fileloc, FileMode.Create))
pdfStamper.FormFlattening = False
pdfStamper.Close()
pdfReader.Close()
End Sub

Converting a file read VB code to c#

I need to convert a piece of code from VB to C#. What should I use in place of FileSystemObject and TextStream?
The code below reads a file already present in a directory and adds the content of the file to the fields.
Private Sub Read_abc_File()
Dim FileSystem As FileSystemObject
Dim abcFile As TextStream
Dim abcLine As String, abcSection As String
Dim abcFilename As String
Const Read As Integer = 1
abcFilename = "abc.txt"
Set FileSystem = New FileSystemObject
If Not FileSystem.FileExists(abcFilename) Then
Set FileSystem = Nothing
Exit Sub
End If
Set abcFile = FileSystem.OpenTextFile(abcFilename, Read, False)
Do While abcFile.AtEndOfStream <> True
abcLine = abcFile.ReadLine
If abcLine > " " Then
If Left$(abcLine, 1) = "[" Then
abcSection = abcLine
Else
Select Case abcSection
Case "[Datafiles]"
DataFilename.AddItem abcLine
Case "[Locations]"
Location.AddItem abcLine
Case "[Formats]"
Format.AddItem abcLine
Case "[Categories]"
Category.AddItem abcLine
End Select
End If
End If
Loop
abcFile.Close
Set abcFile = Nothing
Set FileSystem = Nothing
End Sub
Any suggestions/answers are appreciated.
Thanks!
Here's a code snippet to get you started; I think you should be able to complete the job.
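using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
string fileName = "abc.txt";
// Nothing to do if the file is missing (mirrors the FileExists check in the VB code)
if (!File.Exists(fileName))
return;
// File replaces FileSystemObject; StreamReader takes the place of TextStream
using (FileStream file = File.OpenRead(fileName))
using (StreamReader reader = new StreamReader(file))
{
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
// TODO: handle section headers and items here
}
}
}
}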
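To go one step further than the snippet above, the section handling of the VB routine could be sketched as follows. This is an assumption-laden illustration rather than a line-by-line port: the class and method names are hypothetical, the result is returned as a dictionary instead of being added to the form's controls, and string.IsNullOrWhiteSpace only roughly corresponds to the VB test abcLine > " ".

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SectionFileReader
{
    // Groups the non-blank lines of an INI-like file under the most recent
    // "[Section]" header. Lines that appear before any header are ignored,
    // as in the VB Select Case.
    public static Dictionary<string, List<string>> ReadSections(TextReader reader)
    {
        var sections = new Dictionary<string, List<string>>();
        string current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (string.IsNullOrWhiteSpace(line)) continue; // skip blank lines

            if (line.StartsWith("[", StringComparison.Ordinal))
            {
                // A new section starts; keep the header text, brackets included.
                current = line;
                if (!sections.ContainsKey(current))
                    sections[current] = new List<string>();
            }
            else if (current != null)
            {
                sections[current].Add(line);
            }
        }
        return sections;
    }
}
```

A caller would open the file with a StreamReader, pass it to ReadSections, and then add the entries found under "[Datafiles]", "[Locations]", "[Formats]" and "[Categories]" to the matching controls. How items are added depends on the control type, for example Items.Add on a WinForms ListBox or ComboBox.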
