How can I get text from a table in a PDF file? - C#

I want to get the text from a table in a PDF file.
I cannot get the cells in the table. I tried to run the LEADTOOLS example, but it cannot auto-detect the cells.
https://www.leadtools.com/help/leadtools/v20/dh/fo/iocrtablezonemanager.html
Can you give me some advice? Thanks, all.

In tables similar to the image you posted, you should be able to find the cells using the IOcrPage.TableZoneManager.AutoDetectCells() method. This method is used in the OcrMultiEngineDemo project that’s shipped with the current version of LEADTOOLS.
Here’s how you can test it:
Run the OCR Multi-Engine Demo.
Select the OmniPage OCR Engine.
Open the image or PDF file that contains the table.
Draw a zone around the table.
Choose “Update Zones…” from the OCR->Zones menu.
In the “Update Zones” dialog, click “Detect Cells” as shown in the attached image.
If this doesn’t give you the result you’re expecting, send the actual files you’re testing with to support@leadtools.com and explain exactly how you tested.

Related

Embed word document into another WITHOUT icon

How can I embed a Word document into another Word document via the OpenXML SDK so that it shows the content, not a Word icon? That is, the same result as doing it manually in Word: Insert object from file -> WITHOUT checking "Display as icon".
I've found this article, but it uses an icon. I've also tried the OpenXML SDK Productivity Tool, but it shows only the generated binary data.
EDITED:
I use the following code:
DrawAspect = OleDrawAspectValues.Content
and then I add the image part:
var imagePart = mainDocumentPart.AddNewPart<ImagePart>("image/x-emf", imagePartId);
GenerateImagePart(imagePart);
But my image part is just a byte array of Word's icon.
So here is what happens: when I open the generated document, it shows the embedded document as an icon. When I double-click the embedded document, edit it and save the changes, the embedded document is then shown as content. So maybe there is some way to show this content without editing the embedded document? Instead of the byte array of Word's icon, should I use a byte array of a screenshot of the document?
I'm not sure I described it clearly, so please ask.
I'm afraid what you are asking for is almost impossible.
As far as the Word file is concerned, the only difference between the icon view and the content view is the image.
When you don't use an icon, Word pretty much just takes a screenshot of the document you are embedding and inserts that in place of the icon graphic.
I've uploaded an example I grabbed from a Word file I made. Found this little gem in the /media folder inside the .docx file.
So basically, if you can't live with the icon, your only choice is to somehow grab a picture of the Word file you want to embed and insert that instead of the icon image.
How you'd go about that can't be pretty. First of all, the Open XML SDK contains no such functionality. I tried playing around a bit with Office interop as well, but no luck.
I only see two possible ways to achieve this.
The first is via interop. You'll need to install a "pretend printer" like the ones that print to PDF instead of sending the job to a real printer. This one, however, needs to print to an image format. The file in the media folder was in .emf format, but I'm not positive that's a requirement.
Anyway, should the above somehow be possible, you could embed that picture, pretty much using the example you linked from Microsoft, and just change the size of the "icon", which would now be an image of the document.
The second possibility would be to open the Word document as a process, set the document zoom to 72% (or whatever makes the document the only thing on screen on your desktop), then grab a print screen, crop it down to just the document, and use that as your image for the embedding.
For the record, I don't recommend you do any of the above, but those are the only options I see.
Should someone have a better solution to this, I'm all ears.
Finally, should you decide that you want to push on with this, I'll be happy to code up an example of option number 2 if you reply and tell me you'd like that.
Kaspar
There is a nice wrapper API (Document Builder 2.2) around Open XML, designed specifically for merging documents, with the flexibility to choose which paragraphs to merge, etc. You can download it from here.
Using this tool you can embed a paragraph of another word document or entire word document as per your requirement.
The documentation and screen casts on how to use it are here.
Hope this helps.

How can I export data from Excel to png?

I have a set of Excel spreadsheets with multiple tabs, each of which contains one table that I need to export as a picture in an automated process (I have dozens of such files to process).
While I could "manually" select each table, copy it and paste it as an image into another program, I need to automate this process to save time.
What would be the best approach using .NET or any built-in Excel feature?
Thanks
Check this question.
Programmatically (C#) convert Excel to an image
It looks like they're doing what you need?
I think I would use a small C# app to do it. That assumes you have a one-off task and don't want to mess about with Excel templates, global Excel macros, opening each spreadsheet by hand, etc.
I would do it like this:
Dump all the Excel docs in a single folder.
Open each doc in the folder in the C# app.
Iterate over each tab.
If the tab holds data, capture the whole used range (from A1 to whatever the bottom-right cell is), and pull off any embedded charts as well.
If the tab is a chart, pull it off.
Dump each one to the folder as an image prefixed with the Excel doc name and some iterative suffix like _chart01 or _data01.
How to read it in and convert it to an image? See here => http://csharp.net-informations.com/excel/csharp-excel-chart-picturebox.htm
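The naming step in the list above could be sketched roughly as follows. This only covers building the output file names. It does not open Excel or render anything, and the names ImageNaming and BuildImageName are made up for illustration.

```csharp
using System;
using System.IO;

public static class ImageNaming
{
    // Builds an output name such as "Sales_data01.png" from the workbook
    // file name, a kind label ("data" or "chart") and a 1-based counter.
    public static string BuildImageName(string workbookPath, string kind, int index)
    {
        if (string.IsNullOrEmpty(workbookPath)) throw new ArgumentException("Workbook path is required.", nameof(workbookPath));
        if (string.IsNullOrEmpty(kind)) throw new ArgumentException("Kind is required.", nameof(kind));
        if (index < 1) throw new ArgumentOutOfRangeException(nameof(index));

        // Drop the directory and the .xls/.xlsx extension, keep the bare name.
        string baseName = Path.GetFileNameWithoutExtension(workbookPath);

        // "D2" pads the counter to two digits: 1 -> "01".
        return baseName + "_" + kind + index.ToString("D2") + ".png";
    }
}

public static class NamingDemo
{
    public static void Main()
    {
        // Hypothetical workbook with two data tabs and one chart.
        Console.WriteLine(ImageNaming.BuildImageName("Sales.xlsx", "data", 1));
        Console.WriteLine(ImageNaming.BuildImageName("Sales.xlsx", "data", 2));
        Console.WriteLine(ImageNaming.BuildImageName("Sales.xlsx", "chart", 1));
    }
}
```

Keeping a separate counter per kind (data, chart) means the images from one workbook sort together in the output folder.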
Copy all desired cells
Open MS-Paint
Paste
Save as PNG.

Excel to Powerpoint - looking for a helpful nudge

Good Morning,
I have a C# agent which runs periodically and updates certain values in a particularly important spreadsheet. The reason this spreadsheet is updated is that, every so often, someone manually goes into this .xls file, takes screenshots of the worksheets and pastes them into a PowerPoint presentation template as images.
These 'images' aren't charts or tables, simply ranges of cells that are coloured etc. in the spreadsheet. What I'm looking to do is basically automate this by customising my agent so that every time it updates the spreadsheet, it 'print screens' a certain range that I specify and copies it as an image into the .ppt file.
I appreciate this question lacks a code example, and I'm not expecting someone to 'do it for me', any advice or pointers on how I might accomplish this would be much appreciated.
Also VSTO is not an option unfortunately (work environment).
Many Thanks
You may not have the ability to control how other people create their PowerPoint slides, but if they want a specific range of cells to update to match the current state of the Excel sheet, they can Ctrl-C the section and then use Paste Special with the paste link option.
I know your question asked about automating a print screen capture of the cell range, but would this work for you? Or must there be no possibility of an accidental update, or some reason it must be an image?
The linked section will automatically update if the file is open and if it isn't it will ask if you want to update the links on opening the PowerPoint. Or right-click on the object in the slide and update link.
I've been doing this recently too.
These might give you an idea.
Set the AlternativeText on your shape in PowerPoint; it might help you place your new image in the correct place.
C# Paste HTML to Excel or PowerPoint
Showing HTML in PowerPoint

Display dynamic QR Code in Crystal Report in asp .net page C# code-behind

I'm using CRv9 and want to use the Google Charts API to generate a QR code on the fly (in ASP.NET) and display it in a Crystal Report in PDF format.
I have spent the whole day looking for a solution with no luck. The way we output the report is that we take the .rpt file, feed it with data and use Response.OutputStream to send it to the browser. There is no CrystalReportViewer control, hence a CSS solution is not an option.
Now, I got as far as adding an OLE Object from a file with Link, which I would overwrite every time a new QR code is generated. I appreciate that CR requires it to be a bitmap, so I was planning to download Google's generated PNG file and convert it to BMP; that's not an issue. The problem is that the image in the report does not update after I replace the file. That is, it displays the original image, the one that was added as an OLE Object.
If I open this report in the CR designer, the image gets refreshed, and I'd have to save the changes to the report to see the new image the next time I generate a PDF file.
The question is really how to achieve a dynamic image in Crystal Reports 9. Remember, the Picture object did not have a Graphic Location property until vXI, so I cannot use that.
Please help, I'm kinda stuck here. Manipulating DataSets is not an option either, as we're not giving the report a data source; instead we just map the fields with FormulaFieldDefinitions.
sample qr code url: https://chart.googleapis.com/chart?chs=150x150&cht=qr&chl=Hello%20world&choe=UTF-8
Try this:
Insert a picture; use a dummy QR code or something about the same size
Right-click the image and select 'Format Graphic...'
Select the Picture tab
Add your URL, in double-quotation marks, to the Graphic Location's conditional formatting
Refresh the report
My original posting: Crystal Reports: Dynamic Images
This technique worked with versions prior to XI.
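Whichever way the URL reaches the report, the text to encode has to be URL-escaped first. Here is a minimal sketch of building a URL in the same shape as the sample in the question. The class and method names are made up, and the query parameters are simply copied from that sample URL.

```csharp
using System;

public static class QrUrlBuilder
{
    // Builds a Google Charts QR code URL with the same parameters as the
    // sample URL in the question (chs, cht, chl, choe).
    public static string Build(string text, int size)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

        // Percent-encode the payload so spaces, '&', '=' etc. cannot break the query string.
        string encoded = Uri.EscapeDataString(text);

        return "https://chart.googleapis.com/chart?chs=" + size + "x" + size
             + "&cht=qr&chl=" + encoded + "&choe=UTF-8";
    }
}

public static class QrUrlDemo
{
    public static void Main()
    {
        // Same input as the sample URL in the question.
        Console.WriteLine(QrUrlBuilder.Build("Hello world", 150));
    }
}
```

The resulting string could then be handed to the report as a quoted string, for example through one of the formula fields the question mentions.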
Another idea: create a user-function library (UFL):
Creating a Crystal Reports Custom Function Library
You can also create a UFL in Java. In the UFL, you could make the call to Google's service and return the resulting image.
Or purchase a QR UFL: QR Code Font kit
No idea about anything in crystal reports, but the traditional way of embedding barcodes is to use a font, not an image. So it should be pretty doable if you have the ability to use custom fonts here.
The answer to my question is "it's impossible" :(
I had a similar issue with a dynamic QR code as an image. The problem with Crystal Reports is that it flattens image rendering. My solution was to use the RDLC reporting option, though I am not an expert in it. It solved the issue because it renders the image as the original file.
Add image to the report and set the source to database in the property. Set it to conform to original size. I think SAP should look into the way image is rendered because I had to change lots of design to rdlc.

Extract data from nested tables in PDF

I have a few PDF files that were created from Word or Excel files.
I need to get the information that's in the tables.
The text in the document is not an image, so I'm able to extract the text using tools such as PDFBox.
Once I have the text, I have no way of knowing which table cells it belongs to, because I don't know where the table borders are.
I've tried a few desktop tools such as ABBYY or Solid PDF Converter, and they are able to convert the files into nice Word documents, but this doesn't suit my needs as I want to be able to do this programmatically in C#.
Some of the tables have nested tables, which I think makes this a little more difficult.
I appreciate your help
The difficulty here is caused by the fact that the text in the PDF is not contained within any table. It might look like it is, but underneath the surface, it is not.
So there are a couple of options that I can think of. But none of them are going to be quite as satisfying as you'd probably like.
There are some companies that offer SDKs for PDF to Excel/Word conversion. Investintech and Iceni are a couple of examples. But these solutions are not free.
If you know the exact layout of the PDF files that you need to extract the table data from, then you can use any SDK that lets you extract text from a PDF and also tells you the exact co-ordinates of the extracted text. Using this method you need to know in advance where the text is going to be, so that you can extract text from a specific area on the page. It obviously won't work if you need to process any random document.
It's a difficult task, but hopefully this will give you a starting point.
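As a rough illustration of the coordinate-based option, here is a minimal sketch that sorts positioned text fragments into a grid once the column and row boundaries are known. It assumes some PDF SDK has already produced the fragments with their coordinates. The TextFragment type, the boundary values and the top-left origin are all assumptions made for the example, not part of any particular library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A piece of text with the position a PDF text extractor might report (hypothetical type).
public sealed class TextFragment
{
    public string Text { get; }
    public double X { get; }
    public double Y { get; }
    public TextFragment(string text, double x, double y) { Text = text; X = x; Y = y; }
}

public static class TableLayout
{
    // 'edges' holds ascending boundaries; band i spans [edges[i], edges[i + 1]).
    // Returns the band index containing 'value', or -1 if it lies outside the table.
    public static int FindBand(double[] edges, double value)
    {
        for (int i = 0; i < edges.Length - 1; i++)
        {
            if (value >= edges[i] && value < edges[i + 1]) return i;
        }
        return -1;
    }

    // Places each fragment into the cell whose row and column bands contain its position.
    public static string[,] ToGrid(IEnumerable<TextFragment> fragments, double[] columnEdges, double[] rowEdges)
    {
        int rows = rowEdges.Length - 1;
        int cols = columnEdges.Length - 1;
        var grid = new string[rows, cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                grid[r, c] = "";

        // Assumes Y grows downward; sort so text inside a cell keeps reading order.
        foreach (var f in fragments.OrderBy(f => f.Y).ThenBy(f => f.X))
        {
            int row = FindBand(rowEdges, f.Y);
            int col = FindBand(columnEdges, f.X);
            if (row < 0 || col < 0) continue; // text outside the known table area
            grid[row, col] = grid[row, col].Length == 0 ? f.Text : grid[row, col] + " " + f.Text;
        }
        return grid;
    }
}

public static class TableLayoutDemo
{
    public static void Main()
    {
        // Made-up layout: two columns and two rows.
        double[] columnEdges = { 0, 100, 200 };
        double[] rowEdges = { 0, 20, 40 };
        var fragments = new List<TextFragment>
        {
            new TextFragment("Name", 10, 5), new TextFragment("Qty", 110, 5),
            new TextFragment("Widget", 10, 25), new TextFragment("3", 110, 25)
        };

        string[,] grid = TableLayout.ToGrid(fragments, columnEdges, rowEdges);
        for (int r = 0; r < grid.GetLength(0); r++)
            Console.WriteLine(grid[r, 0] + " | " + grid[r, 1]);
    }
}
```

For a nested table, the same function could be applied a second time to the fragments that fall inside one outer cell, using that inner table's own boundaries.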
