i want to develop such an application through which i can read book ,currently i am using the Richtextbox in flow document,i dont want to use the scroll ,i prefer the navigation style i.e prev page next page start and end ,book may contains images tables so and so ,and how do i import books in my application
How can i achieve?
Regards,
Aamir
You want to look at the FlowDocument, which is meant for documents with pages.
You'll need to write code to extract the text from the word or pdf document.
You can get out the text from a Word document using Word automation.
For pdf files you can possibly use the iTextSharp library:
http://itextsharp.sourceforge.net/
For other formats you might be able to use the source of FBReader as a sample:
http://www.fbreader.org/downloads.php
Related
I am trying to make a word document in an asp.Net MVC application using OpenXML template document .
The main challenges for me are
How can i create a word document as an OpenXML template? In my word
document i have some paragraphs of texts and in every paragraphs i
have to fill information from data base like in word file there are
instances of text like
etc and these should be filled with actual data. But
i dont know how can i convert a normal word document as a OpenXML
template file .
How can i fill the values with data from db ? If i have a model say WordModel in hand with filled values of properties FirstName TotalAmount AmountUnit TotalCopies etc then how can i fill the details to template and allow user to download the file ?
What you want to do is called Mail Merge and means populating word documents that act as templates with your application data. There are some existing (commercial) solutions out there, but if you want to do it yourself using Open XML SDK, you need to set up some kind of tagging mechanism that you will use to tag certain parts of document where you want the data/text from the database should be placed. For Word documents you have the following options: Bookmarks, Content Controls, Merge Fields, or special text (<% FirstName %>). I would personally go with Content Controls as they offer the best user experience and they are pretty easy to parse and replace. So the templates would be ordinary word documents containing these tags and then you could use Open XML SDK to parse these templates, search the tags in them and replace them with your application data according to the tag's name/code/title. This a very abstract, high level picture of a mail merge processor. Of course system like this is not easy to implement and also note that using Open XML requires some learning. There was a similar question answered, but you can probably find many more - just google.
I have a fairly simple task: I need to read a PDF file and write out its image contents while ignoring its text contents. So essentially I need to do the complement of "save as text".
Ideally, I would prefer to avoid any sort of re-compression of the image contents but if it's not possible, it's ok too.
Are the examples of how to do it?
Thanks!
Extracting text from a PDF file with PDFsharp is not a simple task.
It was discussed recently in this thread:
https://stackoverflow.com/a/9161732/162529
Extracting text from a PDF with PdfSharp can actually be very easy, depending on the document type and what you intend to do with it. If the text is in the document as text, and not an image, and you don't care about the position or format, then it's quite simple. This code gets all of the text of the first page in the PDFs I'm working with:
var doc = PdfReader.Open(docPath);
string pageText = doc.Pages[0].Contents.Elements.GetDictionary(0).Stream.ToString();
doc.Pages.Count gives you the total number of pages, and you access each one through the doc.Pages array with the index. I don't recommend using foreach and Linq here, as the interfaces aren't implemented well. The index passed into GetDictionary is for which PDF document element - this may vary based on how the documents are produced. If you don't get the text you're looking for, try looping through all of the elements.
The text that this produces will be full of various PDF formatting codes. If all you need to do is extract strings, though, you can find the ones you want using Regex or any other appropriate string searching code. If you need to do anything with the formatting or positioning, then good luck - from what I can tell, you'll need it.
Example of PDFSharp libraries extracting images from .pdf file:
link
library
EDIT:
Then if you want to extract text from image you have to use OCR libraries.
There are two good OCRs tessnet and MODI
Link to thread on stack
But I fully can recommend MODI which I am using now. Some sample # codeproject.
EDIT 2 :
If you don't want to read text from extracted images, you should write new PDF document and put all of them into it. For writing PDFs I use MigraDoc. It is not difficult to use that library.
I'm currently using Aspose PDF Kit to split a 'master PDF' up into individual documents + thumbnails. This works well at the moment, but the device I'll be rendering the PDF on won't know about the annotations/links within the PDF.
I understand there is a way to parse the PDF document to detect the X/Y position of a hyperlink etc, is there an simple way to extract/iterate across the document data so I can write it to an external XML file?
You may want to try Docotic.Pdf library for this (disclaimer: I work for Bit Miracle).
The library can be used to retrieve all hyperlinks in a document. You may retrieve bounding box, text and other properties of a link, too.
Please take a look at "Extract text from link target" sample. It may help you to get started.
I need to Embed the Word Document in silverlight,and i need to have all the same functionality of Word Document.
Like Cut,Copy,Paste,Save,Save us,Formating Etc.
How can i Achieve this?.
Also Suggest me some links too.
SL4 comes with COM automation support mean if client machine has Word installed SL can use it to display work doc:
http://forums.silverlight.net/forums/p/185680/424357.aspx
http://www.silverlightshow.net/items/MS-Word-Mail-Merge-with-Silverlight-4-COM-Automation.aspx
If you are using SL3.. it will be a little daunting... may be you will have to find some RTE to display word in it.
Regards.
The problem with using the COM model to read the file is that you must run the Silverlight app out of the browser with Elevated privileges and the user mush have Word installed so not very useful if you want a web app.
However Word documents are storred as XML files inside a zipped file (rename file from name.docx to name.zip to see the files) so you could always write a class to read in the XML and display it inside a Rich Text Box and then after formatting write it out to a XML form, this will take a lot of effort.
I have several Word templates and I wish to use these to dynamically create Word documents in my app. I wish to avoid using automation at all costs as this is no good. I know that I can use both HTML and XML to create word documents but I just don't know where to start with regards to using a template that may well have images in the footer or the header of a document.
I use the OpenXML SDK with Word 2007. After you get the hang of it, it's not so bad. I have several template docx files that I scan through to search and replace for placeholder strings with what I want, and then can stitch together multiple templates into one document if I want to. It's nice because I can start with docx files as the template and modify them while the whole time staying within the realm of the docx format. If an image is in the docx when you start modifying it, it'll be there after you re-save it after modification (provided you didn't programmatically remove it of course).
If you have more details with what you'll be doing, let us know.
You could use DocX. It's free, very easy to use, with nice tutorials and is feature reach. It works with only DOCX documents thou. Also development is currently on hold until the author will finish his semester. Here's detailed blog about it.
It has good example of using template in his Invoice Example.
MigraDoc http://www.pdfsharp.net/MigraDocOverview.ashx is a free utility for exporting PDF/Word/HTML files. I've not worked with it using templates as yet however, you could use the DDL files to persists a layout for your files to be re-used.