I convert a PDF file to a Word file using the PDFFocus.net DLL, but for my system I need a .docx file. I tried several approaches. Some libraries are available, but they are not free. This is my PDF-to-.doc conversion code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using iTextSharp.text;
using iTextSharp.text.pdf;

namespace ConsoleApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
            // Verbatim strings (@"...") keep the backslashes in Windows paths literal.
            f.OpenPdf(@"E:\input.pdf");
            f.ToWord(@"E:\input.doc");
        }
    }
}
This works successfully.
Then I tried the code below to convert .doc to .docx, but it gives me an error.
//Open a Document.
Document doc=new Document("input.doc");
//Save Document.
doc.save("output.docx");
Can anyone help me, please?
Yes, as Erop said, you can use the Microsoft Word 14.0 Object Library. Then it's really easy to convert from .doc to .docx, e.g. with a function like this:
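public void ConvertDocToDocx(string path)
{
    // Only handle legacy .doc files.
    if (!path.EndsWith(".doc", StringComparison.OrdinalIgnoreCase))
        return;

    var sourceFile = new FileInfo(path);
    // ChangeExtension swaps only the extension; Replace(".doc", ".docx")
    // would also alter any other ".doc" that appears in the path.
    string newFileName = Path.ChangeExtension(sourceFile.FullName, ".docx");

    Application word = new Application();
    try
    {
        var document = word.Documents.Open(sourceFile.FullName);
        document.SaveAs2(newFileName, WdSaveFormat.wdFormatXMLDocument,
            CompatibilityMode: WdCompatibilityMode.wdWord2010);
        document.Close();
    }
    finally
    {
        // Always shut Word down, even if the conversion fails.
        word.Quit();
    }
    // Remove the original .doc once the .docx has been written.
    File.Delete(path);
}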
Make sure to add CompatibilityMode: WdCompatibilityMode.wdWord2010; otherwise the file will stay in compatibility mode. Also make sure that Microsoft Office is installed on the machine where you want to run the application.
Another thing: I don't know PDFFocus.net, but have you tried converting directly from PDF to .docx? Like this:
static void Main(string[] args)
{
    SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
    f.OpenPdf(@"E:\input.pdf");
    f.ToWord(@"E:\input.docx");
}
I would assume that this works, but it's only an assumption.
Try using the Microsoft.Office.Interop.Word assembly.
An MSDN article can be found here.
Add the references to your project, and enable their use in a code module as in this example from the link above:
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;
I have to copy the selected text from the active document to another file (appending it at the end of the target file). Both source and target are .docx files. The source file is open in Word and the user is working with it.
I would like to copy the selection without opening the target file as a Microsoft.Office.Interop.Word.Document and doing a copy-paste (for performance reasons).
I don't know how to turn the "Selection" in the open document into XML that DocumentFormat.OpenXml understands, or how to inject this XML into the target file.
using DocumentFormat.OpenXml.Wordprocessing;
using DocumentFormat.OpenXml.Packaging;
using Range = Microsoft.Office.Interop.Word.Range;

public void RangeToNewDocument(string documentPath, Range range)
{
    string selectedXML = range.WordOpenXML; //??????????
    using (WordprocessingDocument doc = WordprocessingDocument.Open(documentPath, true))
    {
        Body body = doc.MainDocumentPart.Document.Body;
        //body.Append(selectedXML); ??????
        doc.SaveAs(documentPath + ".RangeToNewDocumentTest.docx");
    }
}
Many code examples show how to add something, but they create new objects (such as Runs or whole Paragraphs); I couldn't find anything about inserting an existing object.
The only link I found is:
https://learn.microsoft.com/en-us/office/open-xml/how-to-copy-the-contents-of-an-open-xml-package-part-to-a-document-part-in-a-dif
but that example replaces one ThemePart with another, and I have no idea how to adapt it to my case.
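One possible starting point, offered as a sketch rather than a tested solution: Range.WordOpenXML returns the range as a Flat OPC package (a single XML string with one pkg:part per package part), so the block-level elements of the selection can be pulled out of the /word/document.xml part with LINQ to XML. The class name FlatOpc, the method name BodyFragments and the hand-written sample string are all illustrative assumptions; real output from Word contains many more parts and attributes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FlatOpc
{
    static readonly XNamespace Pkg = "http://schemas.microsoft.com/office/2006/xmlPackage";
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the outer XML of each child of w:body (paragraphs, tables, ...)
    // found in a Flat OPC string, skipping the section properties.
    public static List<string> BodyFragments(string flatOpcXml)
    {
        XElement documentPart = XDocument.Parse(flatOpcXml)
            .Descendants(Pkg + "part")
            .First(p => (string)p.Attribute(Pkg + "name") == "/word/document.xml");

        return documentPart.Descendants(W + "body").First()
            .Elements()
            .Where(e => e.Name != W + "sectPr")
            .Select(e => e.ToString(SaveOptions.DisableFormatting))
            .ToList();
    }

    public static void Main()
    {
        // Hand-written stand-in for range.WordOpenXML, trimmed to the essentials.
        string sample =
            "<pkg:package xmlns:pkg=\"http://schemas.microsoft.com/office/2006/xmlPackage\">" +
            "<pkg:part pkg:name=\"/word/document.xml\"><pkg:xmlData>" +
            "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
            "<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p><w:sectPr/></w:body>" +
            "</w:document></pkg:xmlData></pkg:part></pkg:package>";

        foreach (string fragment in BodyFragments(sample))
            Console.WriteLine(fragment);
    }
}
```

Each fragment could then be handed to the Open XML SDK, for example via the outer-XML constructor new Paragraph(fragment) for w:p elements, and inserted before the target body's final section properties. Note that styles, numbering and images referenced by the fragment live in other package parts and are not carried over by this approach.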
I know this question has probably been answered multiple times across this site, but even after looking at those solutions I don't have an answer to why my program isn't writing to the text file I have assigned. Here is the code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;

namespace Main
{
    class Program
    {
        public static void Main (string[] args)
        {
            using (StreamWriter writer = new StreamWriter("test.txt"))
            {
                writer.WriteLine("Hello, World!");
            }
        }
    }
}
My program throws no exceptions and exits with 0, so I am not understanding how this is still not functioning properly. If someone could please provide an answer along with an explanation as to why this doesn't work, I would really appreciate it.
EDIT: Okay, I fixed the code after playing around a bit. Turns out that upon reading the file the text that I wrote is there. Thus, a clarification of this problem would be:
The text is written to the file, but it is not visible to me when I open the file from my project in Visual Studio. I am not sure why, and this is leading to confusion.
Your sample is correct. The file is saved to the build output path (where "yourApp.exe" is).
You can use an absolute path to define where the file will be saved, for example StreamWriter writer = new StreamWriter(@"c:\test\test.txt")
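To check where a relative path such as "test.txt" actually ends up, it can help to print the resolved absolute path. This is a minimal sketch, assuming the usual behaviour that relative paths resolve against the process's current directory (when run from Visual Studio this is typically the build output folder, not the project folder). The class and method names are illustrative.

```csharp
using System;
using System.IO;

public static class WhereIsMyFile
{
    // Writes one line to a relative path and returns the absolute path
    // that the runtime resolved it to.
    public static string WriteAndLocate(string relativePath)
    {
        using (StreamWriter writer = new StreamWriter(relativePath))
        {
            writer.WriteLine("Hello, World!");
        }
        return Path.GetFullPath(relativePath);
    }

    public static void Main()
    {
        string fullPath = WriteAndLocate("test.txt");
        Console.WriteLine("Current directory: " + Environment.CurrentDirectory);
        Console.WriteLine("File written to:   " + fullPath);
    }
}
```

The copy of test.txt shown in Solution Explorer is the one in the project folder, which is a different file from the one written next to the executable.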
I'm trying to embed an XML file into a C# console application via Right clicking on file -> Build Action -> Embedded Resource.
How do I then access this embedded resource?
XDocument XMLDoc = XDocument.Load(???);
Edit: Hi all, despite all the bashing this question received, here's an update.
I managed to get it working by using
XDocument.Load(new System.IO.StreamReader(System.Reflection.Assembly.GetExecutingAssembly().GetManifestResourceStream("Namespace.FolderName.FileName.Extension")))
It didn't work previously because the folder name containing the resource file within the project was not included (none of the examples I found seemed to have that).
Thanks everyone who tried to help.
Something along these lines:
using System.IO;
using System.Reflection;
using System.Xml;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Resource name = default namespace + folder path + file name, joined with dots.
            // GetManifestResourceStream returns null if the name does not match exactly.
            using (var stream = Assembly.GetExecutingAssembly().GetManifestResourceStream("ConsoleApplication1.XMLFile1.xml"))
            using (StreamReader reader = new StreamReader(stream))
            {
                XmlDocument doc = new XmlDocument();
                doc.LoadXml(reader.ReadToEnd());
            }
        }
    }
}
Here is a link to the Microsoft doc that describes how to do it. http://support.microsoft.com/kb/319292
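Because the resource name has to match exactly (default namespace, then any folder names, then the file name, separated by dots), listing the names that were actually embedded is a quick way to find the right string. A minimal sketch using only System.Reflection; the helper names FindResource and Load and the suffix-matching approach are illustrative choices, not something taken from the answers above.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Xml.Linq;

public static class EmbeddedXml
{
    // Returns the full manifest name of the first resource whose name ends
    // with the given file name, or null when nothing matches.
    public static string FindResource(Assembly assembly, string fileName)
    {
        return assembly.GetManifestResourceNames()
            .FirstOrDefault(n => n.EndsWith(fileName, StringComparison.OrdinalIgnoreCase));
    }

    // Loads an embedded XML file by its file name alone.
    public static XDocument Load(Assembly assembly, string fileName)
    {
        string name = FindResource(assembly, fileName);
        if (name == null)
            throw new FileNotFoundException("No embedded resource ends with " + fileName);
        using (Stream stream = assembly.GetManifestResourceStream(name))
        {
            return XDocument.Load(stream);
        }
    }

    public static void Main()
    {
        // Print every embedded resource name so the exact string is visible.
        foreach (string name in Assembly.GetExecutingAssembly().GetManifestResourceNames())
            Console.WriteLine(name);
    }
}
```

Matching on the file name suffix avoids hard-coding the folder part of the name, at the cost of ambiguity if two embedded files share a name.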
I'm currently trying to find a PDF library which will run without a running X server. I have already tried the following...
Migradoc/PDFSharp (requires X)
ITextSharp (requires X)
SharpPDF (might work, but I am looking for something with a bit more features)
The library does not have to be opensource or free.
My solution runs on Apache2.2 mod_mono.
Does anyone know of such a library?
--- edit ---
The test code used for iTextSharp, which produces errors on my test server, is listed below (the code for Migradoc and SharpPDF is just as simple):
using System;
using sharp = iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.xml;
using System.IO;

namespace pdftester
{
    public static class ITextSharpTest
    {
        public static void HelloWorld(string filename)
        {
            // Dispose the stream even if PDF generation throws.
            using (Stream stream = new FileStream(filename, FileMode.Create))
            {
                sharp.Document document = new sharp.Document();
                PdfWriter.GetInstance(document, stream);
                document.Open();
                document.Add(new sharp.Paragraph("Hello world"));
                document.Close();
            }
        }
    }
}
Since no one has given a definitive answer to the thread, I'm closing it.
I've chosen to go the sharpPDF way, as it's the only one supported on my server. I'll simply have to implement what's needed for my project.
Thanks for the help received so far :)
Hi all,
I have a PDF file with an XML file attached, and I need to parse the XML file. Does anyone know how to do that?
I'm using C#.
Thanks in advance.
I believe this blog post describing how to read from a PDF file using C# is what you want.
This is the example he gives of grabbing text from the PDF:
using System;
using org.pdfbox.pdmodel;
using org.pdfbox.util;

namespace PDFReader
{
    class Program
    {
        static void Main(string[] args)
        {
            PDDocument doc = PDDocument.load("lopreacamasa.pdf");
            try
            {
                // Extract the plain text of every page and print it.
                PDFTextStripper pdfStripper = new PDFTextStripper();
                Console.Write(pdfStripper.getText(doc));
            }
            finally
            {
                // Release the file handle held by the loaded document.
                doc.close();
            }
        }
    }
}
Here is what looks like an exhaustive and highly organized list of how to read PDFs with C#.
If what you need is some form of embedded metadata, as Mark suggested, I'm sure it's also possible to fetch it using the tools I've linked to.
Try using LINQ to XML as suggested in this question.
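Once the attached XML has been extracted from the PDF as a string (that step depends on the PDF library and is not shown here), LINQ to XML can parse it in a few lines. A minimal sketch; the element names order and item are invented sample data, and ReadItems is a hypothetical helper.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class AttachmentParser
{
    // Parses XML text and returns the value of every <item> element.
    public static string[] ReadItems(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("item").Select(e => e.Value).ToArray();
    }

    public static void Main()
    {
        string xml = "<order><item>pen</item><item>paper</item></order>";
        foreach (string item in ReadItems(xml))
            Console.WriteLine(item);
    }
}
```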
PDF files can have a metadata information object. Is that what you mean, or is it an XML file embedded as an object?