Tesseract 3 (OCR) - .NET Wrapper - c#

http://code.google.com/p/tesseractdotnet/
I am having a problem getting Tesseract to work in my Visual Studio 2010 projects. I have tried console and winforms and both have the same outcome. I have come across a dll by someone else who claims to have it working in VS2010:
http://code.google.com/p/tesseractdotnet/issues/detail?id=1
I am adding a reference to the dll, which can be found in the attachment to post 64 on the page above. Every time I run my project I get an AccessViolationException saying that an attempt was made to read or write protected memory.
public void StartOCR()
{
    const string language = "eng";
    const string TessractData = @"C:\Users\Joe\Desktop\tessdata\";
    using (TesseractProcessor processor = new TesseractProcessor())
    {
        using (Bitmap bmp = Bitmap.FromFile(fileName) as Bitmap)
        {
            if (processor.Init(TessractData, language, (int)eOcrEngineMode.OEM_DEFAULT))
            {
                string text = processor.Recognize(bmp);
            }
        }
    }
}
The access violation exception always points to if (processor.Init(TessractData, language, (int)eOcrEngineMode.OEM_DEFAULT)). I've seen a few suggestions to make sure the solution platform is set to x86 in the configuration manager and that the tessdata folder location ends with a trailing slash, but to no avail. Any ideas?

It appeared to be the contents of the tessdata folder that was causing the problem. Obtained the tessdata folder from the first link and all is now working.

I have just completed a project with Tesseract engine 3. I think there is a bug in the engine that needs to be fixed. What I did to get rid of the AccessViolationException was to append "\tessdata" to the real tessdata directory string. I don't know why, but the engine seems to truncate the innermost directory in the tessdata path.
Just made Full OCR package (Dlls+Tessdata(english)) that works with .net framework 4.

If somebody has the same problem and the advice about the trailing slash doesn't work, try... TWO ending slashes! Seriously. It works for me.
if (processor.Init(@".\tessdata\\", "eng", (int)eOcrEngineMode.OEM_DEFAULT))
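Since the answers above all come down to what is passed to Init as the data path, it may help to check that path before calling into the native code. Here is a minimal sketch, assuming the wrapper wants a trailing separator and that the folder holds a <language>.traineddata file (as Tesseract 3 language packs do). The TessdataPath helper is hypothetical and not part of tesseractdotnet.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of tesseractdotnet.
public static class TessdataPath
{
    // Returns the absolute tessdata path with exactly one trailing separator.
    // Throws if the folder or the <language>.traineddata file is missing.
    public static string Normalize(string tessdataDir, string language)
    {
        string full = Path.GetFullPath(tessdataDir)
            .TrimEnd(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);

        if (!Directory.Exists(full))
            throw new DirectoryNotFoundException("tessdata folder not found: " + full);

        string trained = Path.Combine(full, language + ".traineddata");
        if (!File.Exists(trained))
            throw new FileNotFoundException("language data not found", trained);

        return full + Path.DirectorySeparatorChar;
    }
}
```

The result can then be passed as the first argument of processor.Init. A managed exception here is easier to diagnose than an access violation inside the engine. If your build needs the double-slash workaround, append one more separator to the returned string.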

Seems your problem relates to stability issue mentioned here. On the official site there is a recommendation to use previous stable release 2.4.1. You can install it from nuget.org via the package manager command: Install-Package Tesseract -Version 2.4.1

Related

Unable to find an entry point named 'sk_color_get_bit_shift' in DLL 'libSkiaSharp'. when using SkiaSharp 15.9.1

I am attempting to build and use the Microcharts project available here: https://github.com/dotnet-ad/Microcharts which is dependent on the SkiaSharp project available here: https://github.com/mono/SkiaSharp
The specific version I am attempting to use is 15.9.1 (the version that the nuget package downloads) which utilizes skia m59.
I need to build them myself and cannot use NuGet due to business restrictions, so just using the package isn't an option for me.
I have built skia m59, SkiaSharp and MicroCharts but when I attempt to create a SKBitmap object I get an error when it attempts to initialize SkiaSharp.SKImageInfo. The error is as follows:
Unable to find an entry point named 'sk_color_get_bit_shift' in DLL 'libSkiaSharp'.
I had to make a few changes to the base BUILD.gn to point to the correct file locations, for the windows SDK and the VC install. I enabled skia_use_gdi in the BUILD.gn and ran the following commands.
python2 tools/git-sync-deps
gn gen out/Release --args="is_debug=false is_official_build=true skia_use_system_expat=false skia_use_system_libjpeg_turbo=false skia_use_system_libpng=false skia_use_system_libwebp=false skia_use_system_zlib=false skia_use_icu=false is_component_build=true"
ninja -C out/Release skia
This process outputs a DLL which I assumed is the same as the libSkiaSharp that the SkiaSharp project relies on. I add all my references and run; the project runs successfully until I attempt to create the SKBitmap object, then it fails.
Either this DLL is not the correct DLL and I am misunderstanding something here or something in my process is wrong. I would love any help I can get as I am completely new to building these sorts of projects, I am a C# developer by trade.
This is not the same thing. SkiaSharp has a few other bits that it adds to the core skia. The output that you would have got is a skia.dll, which is only part of it. Not sure how you got a libSkiaSharp from the skia target...
If you can't use SkiaSharp from NuGet.org (which is the supported case) you can follow this to build your own: https://github.com/mono/SkiaSharp/wiki/Building-SkiaSharp
You can also check out the Azure DevOps yaml: https://github.com/mono/SkiaSharp/blob/master/scripts/azure-pipelines.yml
Just set up your own DevOps job to use that and all the work will be done for you.

Unable to load DLL 'libdl' when using System.Drawing.Common NuGet package on AWS Lambda

We have a thumbnail generator lambda function which I'm trying to update to .NET Core 2.0, but I've encountered the following error when using Microsoft's System.Drawing.Common NuGet package:
TypeInitializationException
The type initializer for 'Gdip' threw an exception.
at System.Drawing.SafeNativeMethods.Gdip.GdipCreateBitmapFromScan0(Int32 width, Int32 height, Int32 stride, Int32 format, HandleRef scan0, IntPtr& bitmap)
at System.Drawing.Bitmap..ctor(Int32 width, Int32 height, PixelFormat format)
at TestFailExample.Function.FunctionHandler(String input, ILambdaContext context) in C:\work\graphics\TestFailExample\Function.cs:line 25
at lambda_method(Closure , Stream , Stream , LambdaContextInternal )
caused by
DllNotFoundException
Unable to load DLL 'libdl': The specified module or one of its dependencies could not be found.\n (Exception from HRESULT: 0x8007007E)
at Interop.Libdl.dlopen(String fileName, Int32 flag)
at System.Drawing.SafeNativeMethods.Gdip.LoadNativeLibrary()
at System.Drawing.SafeNativeMethods.Gdip..cctor()
I've seen this question, but there was no resolution.
The minimum code to reproduce the issue is this:
public string FunctionHandler(string input, ILambdaContext context)
{
    using (var bmp = new Bitmap(100, 100))
    {
        return bmp.Width.ToString();
    }
}
Simply create a .NET Core 2.0 Lambda function project, add a reference to the System.Drawing.Common NuGet package, and replace the function handler with the above code. Chuck it on AWS and run it to get the error. I've noted that referencing the package doesn't cause a problem until you try to actually use it, but this could be down to compiler optimizations.
I've packaged the MCVE into a project and uploaded it to GitHub here for the sake of simplifying the steps people have to go through to reproduce the issue.
I can see that /lib64/libdl.so.2 exists, but /lib64/libdl.so does not. Since symlinking doesn't seem to be possible (read-only file system), I'm not sure how I can resolve this. I've tried using the LD_LIBRARY_PATH environment variable by creating a folder in /tmp and symlinking the file there as the first thing the function does. Unfortunately, it seems to look here for all libraries so the function doesn't run at all. I've also tried setting LD_LIBRARY_PATH to /var/lang/lib:/lib64:/usr/lib64:/var/runtime:/var/runtime/lib:/var/task:/var/task/lib:/tmp and, although I could now run the function again, this still didn't help and I just get the same Gdip error.
I noted that /var/task/lib is already included in the LD_LIBRARY_PATH, so I tried packaging libdl.so and libgdiplus.so with my function, but this also failed, this time stating that entry point GdiplusStartup wasn't found in libgdiplus.so. These files weren't from an Amazon Linux instance, so I've now tried installing Mono and obtaining them from an Amazon Linux instance. This has not helped.
I've tried with the CoreCompat drawing library but this also reports problems pertaining to libgdiplus.so, even if I try and bundle that with the function.
I've tried since on my own Linux instance and can confirm that System.Drawing.Common works.
Is there some clever solution that will allow me to use System.Drawing.Common on AWS Lambda? Is there another way I can fudge my lambda function to have libdl and work?
Update:
Our latest attempt involved using AWS Lambda Layers and carefully extracting all the packages installed by apt within the Docker Amazon Linux image, and then applying those to their own layer. Still we ultimately came down to the "libdl" issue, so we gave up.
A lot of the issues with the libraries people suggested were that they didn't render Japanese text correctly, which is important for us. This seems to be an issue which isn't going to get better on AWS Lambda, and ultimately it was easier to rewrite our function in Go than to continue using C# for this.
Since the libraries mentioned by the answers below are seemingly suitable for general use - and may indeed support Japanese text now - I've chosen to accept the answer that I'm sure will work on AWS Lambda.
I had the same issue after uploading my application on Ubuntu 18 server running dotnet core 2.1.500 version. I resolved this issue with this solution https://github.com/dotnet/dotnet-docker/issues/618 using MichaelSimons suggestions.
I ran
sudo apt-get update
sudo apt-get install -y --allow-unauthenticated \
    libc6-dev \
    libgdiplus \
    libx11-dev
sudo rm -rf /var/lib/apt/lists/*
This resolved the issues.
I found a solution for this issue which worked for me:
First I removed the System.Drawing.Common library from the project, then I installed the library you can find here. It uses the same classes.
using System.Drawing;
...
var bmp = new Bitmap(100, 100);
Finally I installed this other library, which contains all the DLLs necessary for using drawing libraries on Linux and on Lambda as well. After these steps the code can be uploaded to AWS without any problem.
If you are on CentOS, the command below helps.
sudo yum install libgdiplus
For image processing in .NET Core Lambda I use SixLabors.ImageSharp.
Here is the code I used in my recent AWS re:Invent talk that did a lot of image processing:
var imageBuffer = new MemoryStream();
var resizeOptions = new ResizeOptions
{
    Size = new SixLabors.Primitives.Size { Width = this.TileSize, Height = this.TileSize },
    Mode = ResizeMode.Stretch
};
image.Mutate(x => x.Resize(resizeOptions));
image.Save(imageBuffer, new SixLabors.ImageSharp.Formats.Jpeg.JpegEncoder());
imageBuffer.Position = 0;

TuesPechkin unable to load DLL 'wkhtmltox.dll'

I've been using TuesPechkin for some time now and today I went to update the nuget package to the new version 2.0.0+ and noticed that Factory.Create() no longer resolved, so I went to read on the GitHub the changes made and noticed it now expects the path to the dll?
IConverter converter =
    new ThreadSafeConverter(
        new PdfToolset(
            new StaticDeployment(DLL_FOLDER_PATH)));
For the past few hours I've tried almost all the paths I can think of, "\bin", "\app_data", "\app_start", etc and I can't seem to find or figure out what it wants for the path and what dll?
I can see the TuesPechkin dll in my bin folder and it was the first path I tried, but I got the following error:
Additional information: Unable to load DLL 'wkhtmltox.dll': The
specified module could not be found. (Exception from HRESULT:
0x8007007E)
Where is that dll and how can I get it, as the library doesn't seem to contain it? I tried installing the TuesPechkin.Wkhtmltox.Win32 package but the dll is still nowhere to be found. Also, I am using this in an ASP.NET website project, so I assume that the following should work for obtaining the path, right?
var path = HttpContext.Current.Server.MapPath(@"~\bin\TuesPechkin.dll");
Further information: https://github.com/tuespetre/TuesPechkin/issues/57
TuesPechkin ships the 'wkhtmltox.dll' file as a zip resource inside the Win32 and Win64 embedded packages.
What it does when you use the Win32 or Win64 Embedded package is unzips the file and places it in the directory that you specify.
I have been putting a copy of the wkhtmltox dll at the root portion of my web app directory and pointing the DLL_FOLDER_PATH to it using the server physical path of my web app to get to it.
According to the author, you must set the converter in a static field for best results.
I do that, but set the converter to null when I am finished using it, and that seems to work.
TuesPechkin is a wrapper for the wkhtmltox dll file.
The original file is written in C++ and so will not automatically be usable in C# or VB.NET or any of the other managed code domains.
The Tuespechkin.dll file DOES NOT contain a copy of 'wkhtmltox.dll'. You either have to use one of the other embedded deployment modules or install a copy of the 'wkhtmltox.dll' in your web app after downloading it from the internet. That is what I do, and it seems to work just fine.
I am using Team Foundation Server, and attempts to compile code after using the Tuespechkin routines will fail the first time because the 'wkhtmltox.dll' file gets locked, but all you have to do is simply retry your build and it will go through.
I had issues with the 32-bit routine not working in a 64-bit environment and the 64-bit environment not being testable on localhost. I went with the workaround I came up with after examining the source code for Tuespechkin and the Win32 and Win64 embedded deployment packages.
It works well as long as you specify a url for the input rather than raw html.
The older package didn't render css very well.
If you are using a print.aspx routine, you can create the url for it as an offset from your main url.
I don't have the source code I am using with me at this point to offset to your base url for your web application, but it is simply an offshoot of HttpRequest.
You have to use the physical path to find the .dll, but you can use a web path for the print routine.
I hope this answers your question a bit.
If you are getting this error -> Could not load file or assembly 'TuesPechkin.Wkhtmltox.Win64' or one of its dependencies. An attempt was made to load a program with an incorrect format.
In Visual Studio Go to -
Tools -> Options -> Projects and Solutions -> Web Projects -> Use the 64 bit version of IIS Express for web sites and projects.
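The "incorrect format" error above depends on the bitness of the process hosting your site, not on the bitness of the machine. As a rough sketch, you can log which one you are actually running under before choosing between the Win32 and Win64 packages. The ProcessBitness class is a hypothetical helper. Environment.Is64BitProcess is available from .NET Framework 4 onwards.

```csharp
using System;

// Hypothetical helper for logging the bitness of the current process.
public static class ProcessBitness
{
    // Returns "x64" when the current process is 64-bit, otherwise "x86".
    public static string Current()
    {
        return Environment.Is64BitProcess ? "x64" : "x86";
    }
}
```

If this reports x86 under IIS Express while you reference the Win64 package, the setting above (or the Win32 package) is the thing to change.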
I installed TuesPechkin.Wkhtmltox.Win64 Nuget package and used the following code in a singleton:
public class PechkinPDFConvertor : IPDFConvertor
{
    IConverter converter =
        new ThreadSafeConverter(
            new RemotingToolset<PdfToolset>(
                new Win64EmbeddedDeployment(
                    new TempFolderDeployment())));

    public byte[] Convert(string html)
    {
        // return PechkinSync.Convert(new GlobalConfig(), html);
        return converter.Convert(new HtmlToPdfDocument(html));
    }
}
The web application then has to be run in x64, otherwise you will get an error about trying to load an x64 assembly in an x86 environment. Presumably you have to choose x64 or x86 at design time and use the corresponding NuGet package. It would be nicer to be able to choose this in the web.config.
EDIT: The above code failed on one server with the exact same message as yours - it was due to having not installed VC++ 2013. So the new code is running x86 as follows
try
{
    string path = Path.Combine(Path.GetTempPath(), "MyApp_PDF_32");
    Converter = new ThreadSafeConverter(
        new RemotingToolset<PdfToolset>(
            new Win32EmbeddedDeployment(
                new StaticDeployment(path))));
}
catch (Exception e)
{
    if (e.Message.StartsWith("Unable to load DLL 'wkhtmltox.dll'"))
    {
        throw new InvalidOperationException(
            "Ensure the prerequisite C++ 2013 Redistributable is installed", e);
    }
    else
        throw;
}
If you do not want to run the installer for wkhtmltox just to get the dll, you can do the following:
As @Timothy suggests, if you use the embedded version of wkhtmltox.dll from TuesPechkin, it will unzip it and place it in a temp directory. I copied this dll and referenced it with the StaticDeployment option without any issues.
To find the exact location, I just used Process Monitor (procmon.exe). For me it was C:\Windows\Temp\-169958574\8\0.12.2.1\wkhtmltox.dll
In my case, I got this error when deploying to a 64-bit VPS. I solved the problem by installing wkhtmltopdf, which I downloaded from http://wkhtmltopdf.org/downloads.html. I chose the 32-bit installer.
In my case, I have solved the problem by installing the Wkhtmltox for win32 at https://www.nuget.org/packages/TuesPechkin.Wkhtmltox.Win32/
This error: Unable to load DLL 'wkhtmltox.dll': The specified module could not be found. (Exception from HRESULT: 0x8007007E) is returned in two situations:
1- Deploy dependency not installed:
To solve this, you can install the NuGet package "TuesPechkin.Wkhtmltox.Win64" and use this code (for web applications running in IIS):
IConverter converter =
    new ThreadSafeConverter(
        new RemotingToolset<PdfToolset>(
            new Win64EmbeddedDeployment(
                new TempFolderDeployment())));
// Keep the converter somewhere static, or as a singleton instance!
// Do NOT run the above code more than once in the application lifecycle!
byte[] result = converter.Convert(document);
At runtime this code copies the dependency "wkhtmltox.dll" into a temporary directory such as "C:\Windows\Temp\1402166677\8\0.12.2.1". You can get the destination of the file using:
var deployment = new Win64EmbeddedDeployment(new TempFolderDeployment());
Console.WriteLine(deployment.Path);
2- Microsoft Visual C++ 2013 Redistributable not installed:
As described here:
https://github.com/tuespetre/TuesPechkin/issues/65#issuecomment-71266114, the Visual C++ 2013 Runtime is required.
The solution from README is:
You must have Visual C++ 2013 runtime installed to use these packages. Otherwise, you will need to download the MingW build of wkhtmltopdf and its dependencies from their website and use that with the library. https://github.com/tuespetre/TuesPechkin#wkhtmltoxdll
or, you can install the Microsoft Visual C++ 2013 Redistributable:
choco install msvisualcplusplus2013-redist
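To tell the two situations above apart, you can check whether the dll is actually in the folder you hand to the deployment. This is a minimal sketch using only System.IO. The WkhtmltoxCheck class is hypothetical and not part of TuesPechkin.

```csharp
using System;
using System.IO;

// Hypothetical diagnostic helper, not part of TuesPechkin.
public static class WkhtmltoxCheck
{
    // Returns true when wkhtmltox.dll exists in the given deployment folder.
    public static bool IsDeployed(string dllFolder)
    {
        return File.Exists(Path.Combine(dllFolder, "wkhtmltox.dll"));
    }
}
```

If this returns false, you are in situation 1 and the file was never deployed to that path. If it returns true and you still get HRESULT 0x8007007E, a dependency of the dll is the more likely culprit, which matches situation 2.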
Here is an AnyCPU version, which also supports IIS-based or WinForms applications:
using TuesPechkin.Wkhtmltox.AnyCPU;
...
var converter = PDFHelper.Factory.GetConverter();
var result = converter.Convert(this.Document);
Reference : https://github.com/tloy1966/TuesPechkin
Installing the Visual C++ Redistributable for Visual Studio 2013 resolved the error for me.
https://www.microsoft.com/en-us/download/details.aspx?id=40784

BundleTransformer.Less unable to parse variables in @import statements

EDIT:
So according to this it is not possible which is a real shame. I will need to look for a library that bundles and compresses with the support of variables in imports.
I am having trouble trying to get BundleTransformer.Less to parse the following LESS:
// There is a path to Startup framework
@startup-basePath: "../../../";
@import '@{startup-basePath}flat-ui/less/config.less';
@import '@{startup-basePath}flat-ui/less/mixins.less';
And it is producing the following error:
You are importing a file ending in .less that cannot be found.":"/lib/startup/samples/template/less/@{startup-basePath}flat-ui/less/config.less
The files do exist, but as you can see it isn't parsing the variable in the location string. Web Essentials in VS2013 has no problem compiling the LESS files and outputs the CSS as expected. I suspect the issue lies with BundleTransformer or the way that I have set it up. I am using the following versions:
Id                     Version Description/Release Notes
--                     ------- -------------------------
BundleTransformer.Core 1.8.0   Bundle Transformer - a modular extension for System.Web.Optimization (aka Microsoft ASP.NET Web Optimization Framework). Classes `CssTransformer` and `JsTra...
BundleTransformer.Less 1.7.16  BundleTransformer.Less contains translator-adapter LessTranslator. This adapter makes translation of LESS-code to CSS-code. Also contains HTTP-handler LessA...
BundleTransformer.Yui  1.8.0   BundleTransformer.Yui contains 2 minifier-adapters: `YuiCssMinifier` (for minification of CSS-code) and `YuiJsMinifier` (for minification of JS-code). These...
I have to use these versions as I am using Umbraco 7 and it will not allow me to update Newtonsoft.Json without breaking Umbraco.
My bundle config file looks like the following:
public static void RegisterBundles(BundleCollection bundles)
{
    bundles.UseCdn = true;
    var nullBuilder = new NullBuilder();
    var nullOrderer = new NullOrderer();

    // CSS + LESS
    var libCSS = new CustomStyleBundle("~/libCSS");
    libCSS.Include(
        "~/Content/font-awesome.css",
        // LESS
        "~/lib/startup/samples/template/less/style.less");
    libCSS.Orderer = nullOrderer;
    bundles.Add(libCSS);
}
I assumed that the issue was with the Less transformer not being registered correctly but I have followed the installation instructions to the letter, please see the documentation for the LESS version. Can anybody see something that I am missing that would help solve this issue or could anyone recommend something that I could try?
All help appreciated.
I have also tried this library, but unsuccessfully :/
The best solutions for compiling LESS are the node.js packages LESS (https://www.npmjs.org/package/less) or Recess (http://twitter.github.io/recess/).
Because your LESS files won't change after you deploy the web project, you don't have to generate CSS during each application initialization.
You can also generate the CSS before the application builds or after saving a LESS file.
If you're interested I can help you with more information.
I have had better luck with the 1.9.40 and 1.9.34 versions of these BundleTransformer packages. I had problems with the 1.8 versions failing at times. We've been using BundleTransformer.Core.1.9.40, BundleTransformer.Less.1.9.40, and BundleTransformer.Yui.1.9.34 for a couple of weeks now without any of the errors of the 1.8 versions.
BundleTransformer.Less does not support the string interpolation in file paths (see «Is string interpolation not supported?» discussion).
UPDATE: BundleTransformer.Less version 1.9.92 now supports interpolation in file paths.

SharpSVN Path Problem

Having a problem with SharpSVN (1.5 and 1.6) checking out code. (Note, I also have Tortoise 1.5 installed on my machine)
This same code has worked previously, so I don't know why things might have broken.
using (SvnClient client = new SvnClient())
{
    SvnUriTarget url = new SvnUriTarget(checkoutURL.ToString());
    client.Authentication.DefaultCredentials = new NetworkCredential(userName, password);
    return client.CheckOut(url, destinationPath, out result); // error happens here
}
This code pulls down a copy from SVN. It creates the copy in a directory named Sandbox.
Nothing has changed (except my own System configuration, I'll get to that in a minute), however, now I get the error:
SharpSvn.SvnException:
Can't open file '..\..\..\TestHarness\Sandbox\testBuild\Trunk\TestProjects\XX\Source\XX.TestHarness\Tests\Service\_svn\tmp\text-base\IViewProject_Tester.cs.svn-base':
The system cannot find the path specified.
Now this is crazy. This has pulled down fine before. For it to tell me to run "Cleanup" insinuates that there was a working copy there previously!
Also, you can also see that SharpSVN thinks that the .cs file is inside the _svn directory!
About my setup..
My system has Tortoise 1.5 on it (after downgrading from Tortoise 1.6 to see if I could fix this problem... no go).
since I am a .net developer, I did set up Tortoise to use _svn folders
Any clues? Even questions are welcome..
ok,
Apparently this is an unresolvable bug that is tied to the max length for relative file paths in Windows.
Bert Huijben answers the issue pretty well here.
http://sharpsvn.open.collab.net/ds/viewMessage.do?dsForumId=728&dsMessageId=331173
Solution: Ditch the relative path and use a fully qualified path.
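One way to do that is to resolve the destination once before handing it to CheckOut. This is only a sketch: the CheckoutPath helper and its parameter names are hypothetical, and it uses nothing beyond System.IO.

```csharp
using System;
using System.IO;

// Hypothetical helper for turning a relative checkout destination
// into a fully qualified one.
public static class CheckoutPath
{
    // Combines baseDir and relativePath, then collapses any ".." segments
    // so the result is an absolute path.
    public static string ToFullPath(string baseDir, string relativePath)
    {
        return Path.GetFullPath(Path.Combine(baseDir, relativePath));
    }
}
```

You would then pass the returned string as the destination argument instead of something like ..\..\..\TestHarness\Sandbox. Note that this only removes the relative segments. A path that is too long for Windows after resolving will still be too long.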
