Chinese characters instead of text in the "Producer" metadata field - C#

I have a problem when I edit the metadata of a PDF with iTextSharp.
I save a Word document as a PDF from Word. Word fills the "Producer" field with the text "Microsoft® Word 2010". Afterwards I edit the metadata with iTextSharp, and iTextSharp modifies this field to append the text "modified using iTextSharp 4.1.6".
The result is Producer(þÿMicrosoft® Word 2010; modified using iTextSharp 4.1.6 by 1T3XT). In Adobe Reader, the PDF Producer field in the document properties shows Chinese characters.
Adobe Reader can read the field if I manually remove the characters þÿ.
Why does this happen, and what can I do to solve it?

For reference, this works with iText 2.1.7. It is Java code, but the same approach probably works in C# too.
import java.io.File;
import java.io.FileOutputStream;
import org.junit.Test;
import com.lowagie.text.pdf.PdfDictionary;
import com.lowagie.text.pdf.PdfName;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfStamper;
import com.lowagie.text.pdf.PdfString;

public class AppTest {
    @Test
    public void testApp() throws Exception {
        PdfReader reader = new PdfReader(AppTest.class.getResourceAsStream("/msword2010.pdf"));
        FileOutputStream fos = new FileOutputStream(new File("target", "modified_msword2010.pdf"));
        // '\0' keeps the original PDF version; true opens the stamper in append mode.
        PdfStamper stamper = new PdfStamper(reader, fos, '\0', true);
        PdfDictionary infoDict = stamper.getReader().getTrailer().getAsDict(PdfName.INFO);
        String producerCleaned = null;
        if (infoDict != null) {
            PdfString producer = (PdfString) infoDict.get(PdfName.PRODUCER);
            if (producer != null) {
                // Decode the producer as a Unicode string, then store it back as a clean PdfString.
                producerCleaned = producer.toUnicodeString();
                PdfString cleanStrObj = new PdfString(producerCleaned);
                infoDict.put(PdfName.PRODUCER, cleanStrObj);
            }
        }
        stamper.close();
    }
}
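To see why the þÿ prefix matters, here is a minimal, library-free Java sketch. A PDF text string that starts with the bytes 0xFE 0xFF is UTF-16BE, and þÿ is simply how that byte order mark looks when the bytes are read as single-byte text. The class and method names (PdfTextStringDecoder, decode, concat) are hypothetical, ISO-8859-1 is used only as an approximation of PDFDocEncoding, and the "single-byte suffix appended to a UTF-16BE string" scenario is an assumption about what happened to the file in the question, not a confirmed description of iTextSharp 4.1.6 internals.

```java
import java.nio.charset.StandardCharsets;

public class PdfTextStringDecoder {

    /** Decodes the raw bytes of a PDF text string such as the /Producer value. */
    public static String decode(byte[] raw) {
        boolean hasUtf16Bom = raw.length >= 2
                && (raw[0] & 0xFF) == 0xFE
                && (raw[1] & 0xFF) == 0xFF;
        if (hasUtf16Bom) {
            // Skip the two BOM bytes and decode the rest as UTF-16BE.
            return new String(raw, 2, raw.length - 2, StandardCharsets.UTF_16BE);
        }
        // Assumption: ISO-8859-1 stands in for PDFDocEncoding here.
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    /** Joins two byte arrays; used to simulate a suffix appended in the wrong encoding. */
    public static byte[] concat(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        // What Word writes: a BOM followed by UTF-16BE text.
        byte[] producer = "\uFEFFMicrosoft\u00AE Word 2010".getBytes(StandardCharsets.UTF_16BE);

        // Reading the same bytes as single-byte text exposes the BOM as the two characters þÿ.
        String misread = new String(producer, StandardCharsets.ISO_8859_1);
        System.out.println(misread.startsWith("\u00FE\u00FF"));

        // Decoding with BOM detection recovers the intended text.
        System.out.println(decode(producer));

        // Assumed failure mode: single-byte ASCII appended after the UTF-16BE bytes.
        byte[] mixed = concat(producer, "; modified".getBytes(StandardCharsets.US_ASCII));
        System.out.println(decode(mixed));
    }
}
```

In the mixed case a BOM-aware reader treats every pair of ASCII bytes as one UTF-16 code unit (the bytes for "; " become U+3B20, for example), and those code units fall in the CJK ranges. That would explain both the Chinese characters in Adobe Reader and why the toUnicodeString() round trip in the answer above produces a clean value.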

Related

How to insert a barcode into a PDF created by the iText 7 HtmlConverter.ConvertToDocument method

I'm using iText 7 to convert my HTML file into a PDF and download the file automatically when users click a button.
I'm now trying to insert a barcode into the PDF file using iText 7, but I get an error. The code works fine without the barcode part.
This is the error: 'iText.Kernel.PdfException: Pdf indirect object belongs to other PDF document. Copy object to current pdf document.'
How can I add a barcode at the end of my PDF file?
public MemoryStream GetCovidFormPdfByAccessionNumber(string htmlFile, string accessionNumber)
{
    var workStream = new MemoryStream();
    using (var pdfWriter = new PdfWriter(workStream))
    {
        pdfWriter.SetCloseStream(false);
        var pdfDoc = new PdfDocument(pdfWriter);
        using (var document = HtmlConverter.ConvertToDocument(htmlFile, pdfWriter))
        {
            document.Add(CreateBarcode(accessionNumber, pdfDoc));
        }
    }
    workStream.Position = 0;
    return workStream;
}

private static Image CreateBarcode(string code, PdfDocument pdfDoc)
{
    Barcode39 barcode = new Barcode39(pdfDoc);
    barcode.SetCode(code);
    //Create barcode object to put it to the cell as image
    PdfFormXObject barcodeObject = barcode.CreateFormXObject(ColorConstants.BLACK, ColorConstants.BLACK, pdfDoc);
    var image = new Image(barcodeObject);
    image.SetWidth(250);
    return image;
}

OpenXml to remove watermarks from Word, Excel and Powerpoint

I'm new to OpenXml. I'm developing a Windows application in C# and I need to remove the watermark (at run time) from a selected Word, Excel or PowerPoint file. The watermark was added manually by the user (don't ask me why he cannot remove it manually; it's a customer request).
I created an "empty" docx file ("Hello Word!" in the body) with a "DRAFT" watermark. I implemented a simple example application to remove it, using the code from this topic (code 1): Removing watermark in word with OpenXml & C# corrupts document
but the application throws a System.ArgumentOutOfRangeException.
The code is the following:
public Form1()
{
    InitializeComponent();
    string document = @"D:\Work\EsempioFiligrana\doc1.docx";
    // Open the file in editable mode.
    using (WordprocessingDocument wordprocessingDocument =
        WordprocessingDocument.Open(document, true))
    {
        DeleteCustomWatermark(wordprocessingDocument, "DRAFT");
    }
}

private static void DeleteCustomWatermark(WordprocessingDocument package, string watermarkId)
{
    MainDocumentPart maindoc = package.MainDocumentPart;
    if (maindoc != null)
    {
        var headers = maindoc.GetPartsOfType<HeaderPart>();
        if (headers != null)
        {
            var head = headers.First(); //we are sure that this header part contains the Watermark with id=watermarkId
            var watermark = head.GetPartById(watermarkId); // !! This statement generates the exception !!
            if (watermark != null)
                head.DeletePart(watermark);
        }
    }
}
What's wrong? What can I do to remove the watermark from the document?
Thanks

Convert a Word (DOCX) file to a PDF in C# on cloud environment

I have generated a Word file using Open XML and I need to send it as an email attachment in PDF format, but I cannot save any physical PDF or Word file to disk because my application runs in a cloud environment (CRM Online).
The only way I have found is Aspose.Words for .NET:
http://www.aspose.com/docs/display/wordsnet/How+to++Convert+a+Document+to+a+Byte+Array But it is too expensive.
I then found another approach: convert Word to HTML, then convert HTML to PDF. But my Word document contains a picture, and I cannot get that to work.
The most accurate conversion from DOCX to PDF is going to be through Word. Your best option for that is setting up a server with OWAS (Office Web Apps Server) and doing your conversion through that.
You'll need to set up a WOPI endpoint on your application server and call:
/wv/WordViewer/request.pdf?WOPISrc={WopiUrl}&type=downloadpdf
OR
/wv/WordViewer/request.pdf?WOPISrc={WopiUrl}&type=printpdf
Alternatively you could try and do it using OneDrive and Word Online, but you'll need to work out the parameters Word Online uses as well as whether that's permitted within the Ts & Cs.
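As a rough illustration of how the two request URLs above could be assembled, here is a small Java sketch (in C# the equivalent encoding call would be Uri.EscapeDataString). The host names, the file id and the helper name buildPdfUrl are hypothetical, and percent-encoding the WOPISrc value is an assumption based on ordinary query-string rules rather than something the answer states.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class WopiPdfUrl {

    /**
     * Builds the Office Web Apps Server PDF request URL.
     * type is "downloadpdf" or "printpdf", the two values listed in the answer.
     */
    public static String buildPdfUrl(String owasBaseUrl, String wopiSrc, String type) {
        // Percent-encode the WOPI source so its own ':' and '/' travel as a single query value.
        String encodedSrc = URLEncoder.encode(wopiSrc, StandardCharsets.UTF_8);
        return owasBaseUrl + "/wv/WordViewer/request.pdf?WOPISrc=" + encodedSrc + "&type=" + type;
    }

    public static void main(String[] args) {
        // Hypothetical hosts and file id, for illustration only.
        String url = buildPdfUrl(
                "https://owas.example.com",
                "https://app.example.com/wopi/files/42",
                "downloadpdf");
        System.out.println(url);
    }
}
```

The application would then issue an HTTP GET against the resulting URL and read the response body as the PDF bytes, so nothing has to be written to disk.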
You can try Gnostice XtremeDocumentStudio .NET.
Converting From DOCX To PDF Using XtremeDocumentStudio .NET
http://www.gnostice.com/goto.asp?id=24900&t=convert_docx_to_pdf_using_xdoc.net
In the published article, conversion has been demonstrated to save to a physical file. You can use documentConverter.ConvertToStream method to convert a document to a Stream as shown below in the code snippet.
DocumentConverter documentConverter = new DocumentConverter();
// input can be a FilePath, Stream, list of FilePaths or list of Streams
Object input = "InputDocument.docx";
string outputFileFormat = "pdf";
ConversionMode conversionMode = ConversionMode.ConvertToSeperateFiles;
List<Stream> outputStreams = documentConverter.ConvertToStream(input, outputFileFormat, conversionMode);
Disclaimer: I work for Gnostice.
If you want to convert a byte array, you can use Metamorphosis:
// Converter instance from the SautinSoft PDF Metamorphosis .Net library.
SautinSoft.PdfMetamorphosis p = new SautinSoft.PdfMetamorphosis();
string docxPath = @"example.docx";
string pdfPath = Path.ChangeExtension(docxPath, ".pdf");
byte[] docx = File.ReadAllBytes(docxPath);
// Convert DOCX to PDF in memory
byte[] pdf = p.DocxToPdfConvertByte(docx);
if (pdf != null)
{
    // Save the PDF document to a file for a viewing purpose.
    File.WriteAllBytes(pdfPath, pdf);
    System.Diagnostics.Process.Start(pdfPath);
}
else
{
    System.Console.WriteLine("Conversion failed!");
    Console.ReadLine();
}
I recently used the SautinSoft 'Document .Net' library to convert DOCX to PDF in an application with a React frontend and a .NET Core microservices backend. It takes only 15 seconds to generate a 23-page PDF. Those 15 seconds include fetching the data from the database, merging it into a DOCX template and converting the result to PDF. The code is deployed to an Azure Linux box and works fine.
https://sautinsoft.com/products/document/
Sample code
public string GeneratePDF(PDFDocumentModel document)
{
    byte[] output = null;
    using (var outputStream = new MemoryStream())
    {
        // Create single pdf.
        DocumentCore singlePDF = new DocumentCore();
        var documentCores = new List<DocumentCore>();
        foreach (var section in document.Sections)
        {
            documentCores.Add(GenerateDocument(section));
        }
        foreach (var dc in documentCores)
        {
            // Create import session.
            ImportSession session = new ImportSession(dc, singlePDF, StyleImportingMode.KeepSourceFormatting);
            // Loop through all sections in the source document.
            foreach (Section sourceSection in dc.Sections)
            {
                // Because we are copying a section from one document to another,
                // it is required to import the Section into the destination document.
                // This adjusts any document-specific references to styles, bookmarks, etc.
                // Importing an element creates a copy of the original element, but the copy
                // is ready to be inserted into the destination document.
                Section importedSection = singlePDF.Import<Section>(sourceSection, true, session);
                // First section starts on a new page.
                if (dc.Sections.IndexOf(sourceSection) == 0)
                    importedSection.PageSetup.SectionStart = SectionStart.NewPage;
                // Now the new section can be appended to the destination document.
                singlePDF.Sections.Add(importedSection);
                // Paging
                HeaderFooter footer = new HeaderFooter(singlePDF, HeaderFooterType.FooterDefault);
                // Create a new paragraph to insert the page numbering,
                // so that it looks like: Page N of M.
                Paragraph par = new Paragraph(singlePDF);
                par.ParagraphFormat.Alignment = HorizontalAlignment.Center;
                CharacterFormat cf = new CharacterFormat() { FontName = "Consolas", Size = 11.0 };
                par.Content.Start.Insert("Page ", cf.Clone());
                // Page numbering is a Field.
                Field fPage = new Field(singlePDF, FieldType.Page);
                fPage.CharacterFormat = cf.Clone();
                par.Content.End.Insert(fPage.Content);
                par.Content.End.Insert(" of ", cf.Clone());
                Field fPages = new Field(singlePDF, FieldType.NumPages);
                fPages.CharacterFormat = cf.Clone();
                par.Content.End.Insert(fPages.Content);
                footer.Blocks.Add(par);
                importedSection.HeadersFooters.Add(footer);
            }
        }
        var pdfOptions = new PdfSaveOptions();
        pdfOptions.Compression = false;
        pdfOptions.EmbedAllFonts = false;
        pdfOptions.EmbeddedImagesFormat = PdfSaveOptions.EmbImagesFormat.Png;
        pdfOptions.EmbeddedJpegQuality = 100;
        // Don't allow editing after population; this also ensures the content can be printed.
        pdfOptions.PreserveFormFields = false;
        pdfOptions.PreserveContentControls = false;
        if (!string.IsNullOrEmpty(document.PdfProperties.Title))
        {
            singlePDF.Document.Properties.BuiltIn[BuiltInDocumentProperty.Title] = document.PdfProperties.Title;
        }
        if (!string.IsNullOrEmpty(document.PdfProperties.Author))
        {
            singlePDF.Document.Properties.BuiltIn[BuiltInDocumentProperty.Author] = document.PdfProperties.Author;
        }
        if (!string.IsNullOrEmpty(document.PdfProperties.Subject))
        {
            singlePDF.Document.Properties.BuiltIn[BuiltInDocumentProperty.Subject] = document.PdfProperties.Subject;
        }
        singlePDF.Save(outputStream, pdfOptions);
        output = outputStream.ToArray();
    }
    return Convert.ToBase64String(output);
}

How can I remove image properties, such as the local path, that Adobe Illustrator has embedded in a PDF file?

I'm trying to replace an image in a PDF file using iTextSharp (not the Java version). It works, but when I open the PDF with Adobe Illustrator it still opens with the old hard link, so Adobe Illustrator always shows the old image from before the replacement. Oddly, Adobe Reader displays the replaced image correctly.
This is the code snippet that I've tried:
public static void ReplaceImage(string pdfIn, string imagePath, string pdfOut)
{
    PdfReader reader = new PdfReader(pdfIn);
    PdfStamper stamper = new PdfStamper(reader, new FileStream(pdfOut, FileMode.Create));
    PdfWriter writer = stamper.Writer;
    Image img = Image.GetInstance(SysDrawing.Image.FromFile(imagePath), ImageFormat.Tiff);
    PdfDictionary page = reader.GetPageN(1);
    PdfDictionary resources = page.GetAsDict(PdfName.RESOURCES);
    PdfDictionary xobject = resources.GetAsDict(PdfName.XOBJECT);
    PdfDictionary properties = resources.GetAsDict(PdfName.PROPERTIES);
    PdfDictionary procset = resources.GetAsDict(PdfName.PROCSET);
    if (xobject != null)
    {
        List<PdfName> imgs = new List<PdfName>();
        foreach (var ele in xobject.Keys)
        {
            PdfIndirectReference iref = xobject.GetAsIndirectObject(ele);
            imgs.Add(ele);
            if (iref.IsIndirect())
            {
                try
                {
                    PdfDictionary pg = (PdfDictionary)PdfReader.GetPdfObject(iref);
                    if (pg != null)
                    {
                        PdfReader.KillIndirect(iref);
                        if (PdfName.IMAGE.Equals(SubType))
                        {
                            if (img.ImageMask != null)
                                writer.AddDirectImageSimple(img.ImageMask);
                            writer.AddDirectImageSimple(img, iref);
                        }
                    }
                    else
                    {
                        PdfReader.KillIndirect(iref);
                        writer.AddDirectImageSimple(img, iref);
                    }
                }
                catch
                {
                    continue;
                }
            }
        }
    }
    //stamper.SetFullCompression();
    stamper.Close();
    stamper.Dispose();
    reader.RemoveUnusedObjects();
    reader.RemoveAnnotations();
    reader.RemoveFields();
    reader.Close();
    reader.Dispose();
}
Any answer would be appreciated!
Your PDF contains two different documents: one described using PDF syntax and one described using Adobe Illustrator syntax. These two different documents should look identical, but as you have changed the PDF version of the document, they no longer do.
You perceive the document as only one document, because the AI document is stored inside the PDF document. In another question on SO, mkl explains the mechanism: Insert hidden digest in pdf using iText library
In his answer, mkl explains how to add hidden data (in his case a hidden digest, in your case the document in AI format) to a PDF.
You can remove this second document like this:
PdfDictionary catalog = reader.getCatalog();
catalog.remove(PdfName.PIECEINFO);
Of course, this throws away the Adobe Illustrator data entirely, so you won't be able to edit the PDF in Adobe Illustrator anymore. If you want the image to change in the AI syntax as well, you need a library that is able to change AI syntax (and I don't know of any such library).

Export data to Word in C#

Below is the code that creates a PDF file to write to. Every time I call it, it creates a new PDF file. My question is: is there an equivalent method for exporting to Word, or, more simply, one that just creates a blank .doc file so that I can export data into it?
public void showPDf() {
    iTextSharp.text.Document doc = new iTextSharp.text.Document(
        iTextSharp.text.PageSize.A4);
    string combined = Path.Combine(txtPath.Text, ".pdf");
    PdfWriter pw = PdfWriter.GetInstance(doc, new FileStream(combined, FileMode.Create));
    doc.Open();
}
1. Interop API
It is available in the namespace Microsoft.Office.Interop.Word.
You can use the Word Interop COM API to do that, using the following code:
// Open a doc file.
Application application = new Application();
Document document = application.Documents.Open("C:\\word.doc");
// Loop through all words in the document.
int count = document.Words.Count;
for (int i = 1; i <= count; i++)
{
    // Write the word.
    string text = document.Words[i].Text;
    Console.WriteLine("Word {0} = {1}", i, text);
}
// Close Word.
application.Quit();
The only drawback is that you must have Office installed to use this feature.
2. OpenXML
You can use Open XML to build Word documents; see the following link:
http://msdn.microsoft.com/en-us/library/bb264572(v=office.12).aspx
Did you try searching the web for this?
How to automate Microsoft Word to create a new document by using Visual C#
There is a free solution to export data to Word:
http://www.codeproject.com/Articles/151789/Export-Data-to-Excel-Word-PDF-without-Automation-f
