I need to capture feedback from a printed paper form. It contains choices to mark and may also contain some handwritten text. Completed feedback forms should be scanned, processed using an OCR tool, and stored in an MS SQL Server database.
The system is developed using C#.Net.
Please share your experience with this. BTW, what about Abbyy FineReader for this?
You are talking about OMR (optical mark recognition), also known as form processing. You will need a toolkit with the following features:
OMR / form processing, to read optical marks
ICR, to read handwritten text
Image processing, to pre-process and correct the image before recognition
I have used Atalasoft for OMR, capture and image processing, and IRIS for OCR and ICR.
Good paid solutions:
Abbyy FineReader - a good option.
Atalasoft - basic/intermediate OMR feature, great capture and image processing features.
LEADTOOLS - provides OMR, OCR and ICR.
IRIS (iDRS OCR engine) - difficult to program, but great-quality OCR and ICR features.
As Needo mentioned, there are commercial toolkits that support OCR/OMR technology. We tried some of them and ended up using LEADTOOLS, which helped us capture data from our surveys. You can find more here:
http://support.leadtools.com/CS/forums/37869/ShowPost.aspx
I'm looking for an OCR solution along the lines of Tesseract or Google's Vision API that can extract textual information from a passport / ID card image (which may be captured on a mobile device or scanned, so the frame size may vary a little). I have been through several posts and found Tesseract to be the preferred solution.
I also ran my test data through the Vision API and got 99% accurate, satisfactory results. But I have the following problems/requirements:
Problems:
Tesseract is the suggested solution in most of the posts I have been through, but it gives very bad results because the frame may vary. I can't train it on my data, and I'm okay with any paid library that helps in my scenario.
The Vision API gives accurate results, but my requirement is not to use a cloud-based solution.
There are a few providers (e.g. LeadTool, IdScan, etc.) which provide this feature, but they use their scanners first to scan the passport. Hence their SDK works for their scanners device.
Summary: Is there any C# library (paid or open source) which takes a passport/cedula image as input and returns accurate text? Any suggestion/help will be appreciated.
A company called MicroBlink created the BlinkID SDK to scan passports and ID cards. It is not free for commercial usage, but it is free for development. Tesseract may give you bad results because you have probably not done any image processing before the OCR scan, which is mandatory if you want proper results, especially for images of passports, IDs and so on. For image processing you can use OpenCV (free), but it may take you some time to learn computer vision and image processing (which are actually very rewarding).
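To make "processing before the OCR scan" a bit more concrete, here is a minimal sketch of one common pre-processing step, binarization with Otsu's method. The answer above does not name a specific technique, so this choice is an assumption, and the class and method names are made up for illustration. It works on a plain 8-bit grayscale buffer so that it does not depend on any imaging library; in a real pipeline you would get that buffer from OpenCV or a similar toolkit, and you would probably also need deskewing and cropping.

```csharp
using System;

static class Binarizer
{
    // Otsu's method: pick the threshold (0-255) that maximizes the
    // between-class variance of "dark" and "light" pixels.
    public static int OtsuThreshold(byte[] gray)
    {
        var histogram = new int[256];
        foreach (byte pixel in gray) histogram[pixel]++;

        long total = gray.Length;
        double sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (double)i * histogram[i];

        double sumDark = 0;      // sum of intensities at or below the threshold
        long countDark = 0;      // number of pixels at or below the threshold
        double bestVariance = -1;
        int bestThreshold = 0;

        for (int t = 0; t < 256; t++)
        {
            countDark += histogram[t];
            if (countDark == 0) continue;        // no dark pixels yet
            long countLight = total - countDark;
            if (countLight == 0) break;          // everything is dark from here on

            sumDark += (double)t * histogram[t];
            double meanDark = sumDark / countDark;
            double meanLight = (sumAll - sumDark) / countLight;
            double diff = meanDark - meanLight;
            double variance = (double)countDark * countLight * diff * diff;

            if (variance > bestVariance)
            {
                bestVariance = variance;
                bestThreshold = t;
            }
        }
        return bestThreshold;
    }

    // Pixels at or below the threshold become black (0), all others white (255).
    public static byte[] Binarize(byte[] gray, int threshold)
    {
        var result = new byte[gray.Length];
        for (int i = 0; i < gray.Length; i++)
            result[i] = gray[i] <= threshold ? (byte)0 : (byte)255;
        return result;
    }
}
```

Typical use would be Binarizer.Binarize(pixels, Binarizer.OtsuThreshold(pixels)), with the result handed to the OCR engine instead of the raw photo. A single global threshold tends to struggle with uneven lighting, which is common in mobile captures, so treat this as a starting point only.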
I'm one of the developers at MicroBlink, which is a company specializing in the development of barcode and OCR solutions.
Tesseract is indeed one of the options you have. The problem with Tesseract is that it's hard to set the right parameters to get really accurate OCR results. And you still need to implement the data extraction logic on top of the OCR results. And integration on iOS/Android requires two separate codebases.
Google Cloud Vision gives very accurate OCR results, but as you said, it performs image processing on the server side, which raises privacy and security concerns about sending private ID information over the network to third parties.
There are other companies developing similar products with similar properties (server side, no data extraction, etc.).
MicroBlink's BlinkID is different in the sense that it performs all processing locally (without a server-side connection). It uses our proprietary machine-learning based OCR engine to ensure data is captured correctly. It supports MRZ, PDF417 barcodes, and scanning the front side of some ID documents (such as UK driver's licenses, Malaysian IDs, EU IDs...). All ID data is parsed and verified according to each country's standards, with checksum validation.
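As an illustration of what checksum validation means for MRZ data, here is a minimal sketch of the public check-digit rule from ICAO Doc 9303. This is not BlinkID code, and the class and method names are made up for the example. Each character is mapped to a number (digits as themselves, A-Z as 10-35, the filler '<' as 0), multiplied by the repeating weights 7, 3, 1, and the sum modulo 10 is the check digit.

```csharp
using System;

static class MrzCheck
{
    private static readonly int[] Weights = { 7, 3, 1 };

    // Maps an MRZ character to its numeric value per ICAO Doc 9303.
    private static int CharValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
        if (c == '<') return 0;
        throw new ArgumentException("Invalid MRZ character: " + c);
    }

    // Computes the check digit (0-9) for one MRZ field.
    public static int CheckDigit(string field)
    {
        int sum = 0;
        for (int i = 0; i < field.Length; i++)
            sum += CharValue(field[i]) * Weights[i % 3];
        return sum % 10;
    }

    // True when the printed check digit matches the computed one.
    public static bool IsValid(string field, char printedCheckDigit)
    {
        return printedCheckDigit >= '0' && printedCheckDigit <= '9'
            && CheckDigit(field) == printedCheckDigit - '0';
    }
}
```

With the specimen values from the ICAO documentation, MrzCheck.CheckDigit("L898902C3") gives 6 and MrzCheck.CheckDigit("740812") gives 2. A mismatch after OCR is a cheap signal that at least one character in the field was misread.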
BlinkID is provided as native iOS, Android and Windows Phone 8 SDKs, PhoneGap / Cordova plugins for iOS and Android, and a Xamarin component (C#) for iOS and Android.
There is also a server-side library (available on request) which runs on Linux / Windows / MacOS, has a C API, and can be used from a .NET application via C++/CLI. Our development team is here to help with the integration in a .NET app.
Please contact support#microblink.com for more information on the subject.
Asprise C# .NET OCR and Barcode Recognition SDK can recognize normal text, MRZ data and barcodes on passports and other identity documents. The accuracy rate for MRZ data is extremely high.
You can simply pass input images in formats like BMP, JPG, PNG, PDF or TIFF.
Many government agencies use Asprise OCR to read passport MRZ information.
You may contact Asprise support to obtain a special evaluation version for your scenario.
Just a correction about LEADTOOLS needing to...
"use their scanners first to scan the passport. Hence their SDK
works for their scanners device"
This part is not correct. We do not make or sell any type of scanning device. LEADTOOLS SDKs can use different standard devices (Twain and WIA on Windows, Sane on Linux, and cameras on Android and iOS devices using the operating system's own APIs). The OCR and ID recognition is then done through software.
I'm looking for a way to convert a high-resolution PDF file to a low-resolution PDF file from an ASP.NET application (C#).
Users will import high-resolution PDFs, and the solution should then be able to provide both the high-resolution PDF and a low-resolution PDF.
I'm looking for an API to do that. I have found a lot of PDF APIs, but none of them seems to do what I'm looking for.
ABCpdf .NET will do this for you. There are a variety of functions for resizing, resampling or recompressing the images within a PDF document. However, given your requirements, you probably just want to use the document reduce size operation.
To do this you just need code of the following form:
Doc doc = new Doc();
doc.Read(Server.MapPath("../mypics/sample.pdf"));
using (ReduceSizeOperation op = new ReduceSizeOperation(doc)) {
    op.UnembedSimpleFonts = false; // though of course making these true...
    op.UnembedComplexFonts = false; // ... would further reduce file size.
    op.MonochromeImageDpi = 72;
    op.GrayImageDpi = 72;
    op.ColorImageDpi = 144;
    op.Compact(true);
}
doc.Save(Server.MapPath("ReduceSizeOperation.pdf"));
I work on the ABCpdf .NET software component so my replies may feature concepts based around ABCpdf. It's just what I know. :-)
There are a number of possible approaches to this problem, one of which would simply be to export from InDesign twice (which would let you make the two required versions of the PDF). If that is not feasible (which it might not be, as exporting from InDesign can take a bit of time), there are definitely libraries on the market that can do what you want. Here, too, you have a number of approaches:
1) While this will get me shot by most Adobe employees, you could re-distill your PDF file into a smaller file. I would not advocate doing this; I'm mentioning it for completeness. The risks include quality loss, artefacts and so on (mostly because PostScript doesn't support a number of features that PDF does, and by redistilling you'd lose those features).
2) Use an application or library that was made for this task. callas pdfToolbox for example (warning, I'm affiliated with this company!) is capable of taking a PDF file and running corrections on it. Amongst those corrections are image downsampling, conversion to RGB, image re-compression (for example with JPEG-2000), throwing away unnecessary data in the PDF and much more. Given that the application has a command-line interface, it can easily be integrated into a C# process. Other applications with similar functionality are available from companies such as Enfocus, Apago and others.
3) Use a low-level PDF library (such as the Adobe PDF library that can be licensed from Adobe through DataLogics) and implement the necessary changes yourself. More work, but potentially more flexible.
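For approach 2, driving a command-line tool from C# could look roughly like the sketch below. It only uses System.Diagnostics.Process from the .NET base class library. The executable path and the "input output" argument layout are hypothetical placeholders, not the real switches of pdfToolbox or any other product, so check the tool's documentation for the actual command line.

```csharp
using System;
using System.Diagnostics;

static class PdfToolRunner
{
    // Describes the process to start. The argument layout ("input output")
    // is a hypothetical placeholder; real tools define their own switches.
    public static ProcessStartInfo BuildStartInfo(string exePath, string inputPdf, string outputPdf)
    {
        return new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = "\"" + inputPdf + "\" \"" + outputPdf + "\"",
            UseShellExecute = false,          // required for stream redirection
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            CreateNoWindow = true
        };
    }

    // Runs the tool, waits for it to finish and returns its exit code.
    public static int Run(ProcessStartInfo startInfo, out string output, out string errors)
    {
        using (Process process = Process.Start(startInfo))
        {
            // Read stderr asynchronously so that a full error pipe cannot
            // block the child process while stdout is being read.
            var errorTask = process.StandardError.ReadToEndAsync();
            output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            errors = errorTask.Result;
            return process.ExitCode;
        }
    }
}
```

In an ASP.NET application you would call BuildStartInfo with the uploaded high-resolution file and a target path, call Run, and treat a non-zero exit code (assuming the tool follows that convention) as a failed conversion. Running this on a background worker rather than inside the request keeps slow conversions from blocking users.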
Whichever approach you choose, remember that you are starting from a high-quality PDF file and that your process should retain as much of that quality as possible (depending, of course, on what the low-resolution PDF file will be used for). Make sure you don't get into trouble by losing proper overprint, transparency, etc.
You can probably just resize the images in the high-resolution PDF, and this will give you much smaller files.
You can resize and/or recompress images using the Docotic.Pdf library. For more details, please take a look at my answer to a similar question.
Disclaimer: I am one of the developers of Docotic.Pdf library.
I am looking for Optical Mark Recognition software to read scanned documents and process them automatically.
These documents are created from an ASP.NET web application. Users will fill in the printed forms, which have a barcode, and then scan them.
If you have any ideas or have used such software, I would appreciate your suggestions.
Thanks
You can try something like http://en.wikipedia.org/wiki/Microsoft_Office_Document_Imaging for scanning documents.
http://www.pixel-technology.com/freeware/tessnet2/ Tessnet2 is an open source .NET assembly that wraps the Tesseract engine. Note that it does OCR (text recognition) rather than OMR.
I have never done barcode scanning or reader implementation, but I found this on google. Hope this helps. http://www.onbarcode.com/products/net_barcode_reader/
I've used Softek's Barcode Reader Toolkit for reading barcodes off of bitmaps before, and it worked very well. It's fairly configurable and speedy.
I'm creating a service that monitors a folder for scanned files. Once a file is there, the service picks it up and converts it to a readable PDF. In this process the service also searches for a barcode. After this, the text is extracted and the file, with its text, is stored in the database of our software. The location is based on the barcode.
Now, for the OCR we are using the SDK of Atalasoft (http://www.atalasoft.com/).
Also the Barcode recognizer is included in this SDK.
But the converted text still has some mistakes. (I ran some tests with other OCR programs, and Atalasoft came out well.)
I'm looking for some software (an SDK) which allows me to improve the quality of the PDF for OCR purposes.
I tested Kofax VRS Elite (http://www.kofax.com/vrs-virtualrescan/). I'm looking for something similar, but that can be built into the service through an SDK.
Anyone who did this before, or had similar problems?
thx in advance!
You may try and follow a different path altogether:
See if you can configure the scanner(s) to scan directly to PDF and do the OCR on the fly. Lexmark scanners can do this. This creates PDFs with selectable and searchable text, which in turn can be extracted with a PDF reading library.
Alternatively you may want to have a look at http://www.abbyy.com/ and see if you get better results.
If these are not good options, you may want to break down your problem in a systematic way:
1. Is the image quality of the scanned images the problem? If so, then this will have to be fixed first. Your OCR solution may be affected by resolution, contrast, and colour.
2. Is it the OCR software? Take a highly legible document and see if the OCR software makes mistakes. If so, then you know you have to find better OCR software.
3. If your document quality is decent and your OCR software has a high success rate in deciphering a legible document, then you may want to look at the exceptions that do not work, and tackle these on a case by case basis.
If smears and background images on documents are the cause of the problem, you may want to look into ways of avoiding them, or of cleaning them up with image processing software that exposes an API.
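Step 2 above is easier to act on if "makes mistakes" is turned into a number. One way to do that, sketched here as an assumption rather than anything the answer prescribes, is the character error rate: the edit (Levenshtein) distance between the text you know is on the page and what the OCR returned, divided by the length of the known text. The names below are made up for the example.

```csharp
using System;

static class OcrAccuracy
{
    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions and substitutions turning expected into actual.
    public static int EditDistance(string expected, string actual)
    {
        var previous = new int[actual.Length + 1];
        var current = new int[actual.Length + 1];
        for (int j = 0; j <= actual.Length; j++) previous[j] = j;

        for (int i = 1; i <= expected.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= actual.Length; j++)
            {
                int substitution = previous[j - 1] + (expected[i - 1] == actual[j - 1] ? 0 : 1);
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                current[j] = Math.Min(substitution, Math.Min(insertion, deletion));
            }
            // Reuse the two rows instead of allocating a full matrix.
            var swap = previous; previous = current; current = swap;
        }
        return previous[actual.Length];
    }

    // Fraction of the expected text that the OCR got wrong (0.0 = perfect).
    public static double CharacterErrorRate(string expected, string actual)
    {
        if (expected.Length == 0) return actual.Length == 0 ? 0.0 : 1.0;
        return (double)EditDistance(expected, actual) / expected.Length;
    }
}
```

For example, OcrAccuracy.CharacterErrorRate("Invoice 1024", "Invo1ce 1O24") counts two substitutions over twelve characters, roughly 0.17. Running the same clean reference document through each OCR engine, and then the same engine over raw versus cleaned-up scans, tells you whether the engine or the image quality is the bigger problem.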
hi
I am developing a video capture application using C#.NET. I captured video through a webcam and saved it as JPEG images, and now I want to make a WMV file from those images. How can I do that, and what are the basic steps? Can anybody help?
I am working on this myself. I have found two ways that may be possible - both require the purchase of an outside library.
The first one looks to be the easiest but costs the most. You can use it for free, but you will have to deal with a pop-up telling you to purchase the library: http://bytescout.com/products/developer/imagetovideosdk/imagetovideosdk_convert_jpg_to_video.html
The other involves using Microsoft Expression Encoder 4. I am still working on this one. You can get the free version here: http://www.microsoft.com/download/en/details.aspx?id=18974
C# doesn't natively support much in the way of sound or video so outside reference assemblies seem to be a necessity.
Right now that is the best help I can offer until I figure it out.