About analysing the photo with tesseract


I wrote this code to read the numbers in a picture. It runs without any error, but it cannot read the numbers: when I start the program, it shows an empty MessageBox.
I want to read pictures like this:
The code:
private string FotoAnaliz()
{
    FileStream fs = new FileStream("D:\\program_goruntusu.jpg", FileMode.OpenOrCreate);
    //string fotopath = @"D:\\program_goruntusu.jpg";
    Bitmap images = new Bitmap(fs);
    using (var engine = new TesseractEngine(@"./tessdata", "eng"))
    {
        engine.SetVariable("tessedit_char_whitelist", "0123456789");
        // have to load Pix via a bitmap since Pix doesn't support loading a stream.
        using (var image = new Bitmap(images))
        {
            using (var pix = PixConverter.ToPix(image))
            {
                using (var page = engine.Process(pix))
                {
                    sayı = page.GetText();
                    MessageBox.Show(sayı);
                    fs.Close();
                }
            }
        }
    }
    return sayı;
}

Try page segmentation mode (PSM) 10, which treats the image as a single character. The available modes are listed in the Tesseract manual:
https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc
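As a rough sketch of that suggestion with the Tesseract .NET wrapper used in the question: the page segmentation mode can be passed as the second argument of engine.Process, and PageSegMode.SingleChar corresponds to PSM 10. The class name, method name, and both path parameters are made up for this illustration, and whether this mode suits a given screenshot is an assumption that needs testing.

```csharp
using Tesseract;

static class SingleCharOcr
{
    // imagePath and tessdataPath are hypothetical parameters for this sketch.
    public static string ReadDigit(string imagePath, string tessdataPath)
    {
        using (var engine = new TesseractEngine(tessdataPath, "eng", EngineMode.Default))
        {
            // Restrict recognition to digits, as in the question.
            engine.SetVariable("tessedit_char_whitelist", "0123456789");

            // Load the image directly from disk instead of going through a FileStream.
            using (var pix = Pix.LoadFromFile(imagePath))
            // PageSegMode.SingleChar is PSM 10: treat the image as a single character.
            using (var page = engine.Process(pix, PageSegMode.SingleChar))
            {
                return page.GetText().Trim();
            }
        }
    }
}
```

If the image can also contain multi-digit numbers, PageSegMode.SingleWord or PageSegMode.SingleLine may be worth trying in the same place.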

Related

PDFiumSharp PDF to Image size

I am using PDFiumSharp to generate JPGs from a PDF file. Here is my code:
using (WebClient client = new WebClient())
{
    byte[] pdfData = await client.DownloadDataTaskAsync(pdfUrl);
    using (var doc = new PdfDocument(pdfData))
    {
        int i = 0;
        foreach (var page in doc.Pages)
        {
            using (var bitmap = new PDFiumBitmap((int)page.Width, (int)page.Height, true))
            using (var stream = new MemoryStream())
            {
                page.Render(bitmap);
                bitmap.Save(stream);
                ...
                i++;
            }
        }
    }
}
The code works well and the images are generated accurately. However, each JPG is about 2 MB, so with a multi-page PDF the total size adds up quickly. Is there any way to reduce the JPG file size? I only need the JPGs for preview purposes, not for printing, so a lower resolution or quality is fine.

When you call bitmap.Save(...), the bytes written to the MemoryStream represent a BMP, not a JPG. You have to convert it to JPG yourself:
public static byte[] Render(PdfDocument pdfDocument, int pageNumber, (int width, int height) outputSize)
{
    var page = pdfDocument.Pages[pageNumber];
    using var thumb = new PDFiumBitmap((int)page.Width, (int)page.Height, false);
    page.Render(thumb);
    using MemoryStream memoryStreamBMP = new();
    thumb.Save(memoryStreamBMP);
    using Image imageBmp = Image.FromStream(memoryStreamBMP);
    using MemoryStream memoryStreamJPG = new();
    imageBmp.Save(memoryStreamJPG, ImageFormat.Jpeg);
    return memoryStreamJPG.ToArray();
}
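Since the question says lower quality is fine for previews, the JPEG step above could also take an explicit quality setting. The following is a minimal sketch using System.Drawing (Windows-only in recent .NET versions). The class name JpegPreview, the method name EncodeJpeg, and the quality value in the usage line are illustrative assumptions, not part of the original answer.

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Linq;

static class JpegPreview
{
    // Encodes an image as JPEG with the given quality (0 = smallest, 100 = best).
    public static byte[] EncodeJpeg(Image source, long quality)
    {
        // Look up the built-in JPEG encoder.
        ImageCodecInfo jpegCodec = ImageCodecInfo.GetImageEncoders()
            .First(codec => codec.FormatID == ImageFormat.Jpeg.Guid);

        using (var parameters = new EncoderParameters(1))
        using (var output = new MemoryStream())
        {
            parameters.Param[0] =
                new EncoderParameter(System.Drawing.Imaging.Encoder.Quality, quality);
            source.Save(output, jpegCodec, parameters);
            return output.ToArray();
        }
    }
}
```

In the Render method above, this would replace the imageBmp.Save(...) call, for example byte[] jpg = JpegPreview.EncodeJpeg(imageBmp, 50L);. Rendering into a smaller bitmap in the first place should reduce the size further.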

Tesseract can not read single number

I am trying to scan the numbers in a screenshot. Tesseract can read two-digit numbers (e.g. 30, 21, 19), but it cannot read single digits (e.g. 2, 6, 9). How can I fix that? I tried some solutions, but none of them fixed the problem.
private string FotoAnaliz()
{
    FileStream fs = new FileStream("D:\\program_goruntusuasıl.png", FileMode.OpenOrCreate);
    //string fotopath = @"D:\\program_goruntusu.jpg";
    Bitmap images = new Bitmap(fs);
    using (var engine = new TesseractEngine(@"./tessdata", "eng"))
    {
        engine.SetVariable("tessedit_char_whitelist", "0123456789");
        // have to load Pix via a bitmap since Pix doesn't support loading a stream.
        using (var image = new Bitmap(images))
        {
            using (var pix = PixConverter.ToPix(image))
            {
                using (var page = engine.Process(pix))
                {
                    sayı = page.GetText();
                    MessageBox.Show(sayı);
                    fs.Close();
                }
            }
        }
    }
    return sayı;
}

Sevenzipsharp CompressStream throws Exception: Value cannot be null

I am trying the following code to compress and save a bitmap of a screenshot, but I get this error. I haven't tried CompressFiles because I have to do it with a memory stream.
public void CompressAndSaveBitmap(Bitmap bitmap)
{
    using (MemoryStream memStream = new MemoryStream())
    {
        using (FileStream file = new FileStream("XYZ", FileMode.Create, System.IO.FileAccess.Write))
        {
            bitmap.Save(memStream, System.Drawing.Imaging.ImageFormat.Bmp);
            // tried these but they don't work.
            //memStream.Seek(0, 0);
            //memStream.Position = 0;
            this.compressor.CompressStream(memStream, file); // throws error here
            // also tried the following to see if Bitmap contains the problem
            //UnicodeEncoding uniEncoding = new UnicodeEncoding();
            //byte[] firstString = uniEncoding.GetBytes("Invalid file path characters are: ");
            //int count = 0;
            //while (count < firstString.Length)
            //memStream.WriteByte(firstString[count++]);
            //memStream.Flush();
            ////memStream.Seek(0, 0);
            //memStream.Position = 0;
            //this.compressor.CompressStream(memStream, file);
        }
    }
}
Compressor initialization code:
public Compressor()
{
    SevenZipCompressor.SetLibraryPath(@"D:\7z.dll");
    this.compressor = new SevenZip.SevenZipCompressor();
    compressor.CompressionLevel = CompressionLevel.Ultra;
    compressor.CompressionMethod = CompressionMethod.Ppmd;
}
It crashes on this line in the library:
var lockObject = (object) _files ?? _streams;
lock (lockObject) // here in ArchiveUpdateCallback.cs
I see that the file is created but it's corrupt.

I was able to compress the file successfully using the following code:
public void CompressAndSaveBitmap(Bitmap bitmap)
{
    using (MemoryStream memStream = new MemoryStream())
    {
        using (FileStream file = new FileStream("XYZ", FileMode.Create, System.IO.FileAccess.Write))
        {
            bitmap.Save(memStream, System.Drawing.Imaging.ImageFormat.Bmp);
            // Create the compressor locally so it is guaranteed to be initialized.
            SevenZipCompressor compressor =
                new SevenZipCompressor
                {
                    CompressionLevel = CompressionLevel.Ultra,
                    CompressionMethod = CompressionMethod.Lzma
                };
            compressor.CompressStream(memStream, file);
        }
    }
}
This was the bitmap file:
The code I executed:
Bitmap bmp = new Bitmap("Test.bmp");
res.CompressAndSaveBitmap(bmp);
After execution I got the following output:
I opened the file with WinRAR and extracted it:
In the folder I found the extracted image:
UPDATE:
I think the only problem with the OP's code is that this.compressor is not initialized.

Byte array getting corrupted when passed to another method

I have a bunch of Jpg images in byte array form. I want to add these to a zip file, turn the zip file into a byte array, and pass it to somewhere else. In a method, I have this code:
var response = //some response object that will hold a byte array
using (var ms = new MemoryStream())
{
    using (var zipArchive = new ZipArchive(ms, ZipArchiveMode.Create, true))
    {
        var i = 1;
        foreach (var image in images) // some collection that holds byte arrays.
        {
            var entry = zipArchive.CreateEntry(i + ".jpg");
            using (var entryStream = entry.Open())
            using (var compressStream = new MemoryStream(photo.ImageOriginal))
            {
                compressStream.CopyTo(entryStream);
            }
            i++;
        }
        response.ZipFile = ms.ToArray();
    }
    using (var fs = new FileStream(@"C:\Users\MyName\Desktop\image.zip", FileMode.Create))
    {
        ms.Position = 0;
        ms.CopyTo(fs);
    }
}
return response;
I've added a FileStream near the bottom to write the stream to a zip file right away for testing purposes. This works: I get a zip file with one or more images in it on my desktop. However, response.ZipFile cannot be turned into a valid zip file in the same way. I have tried this:
using (var ms2 = new MemoryStream(response.ZipFile))
using (var fs = new FileStream(@"C:\Users\Bara\Desktop\image.zip", FileMode.Create))
{
    ms2.Position = 0;
    ms2.CopyTo(fs);
}
But that creates a zip file that cannot be opened.
What I'm trying to do: turn response.ZipFile into an array that can be turned into a working zip file again. What am I doing wrong in this code?

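How do you know that ZipArchive's Dispose doesn't write more to the underlying stream? In Create mode it does: the archive's central directory is only written when the ZipArchive is disposed, so an array taken before that point is an incomplete zip file.
You should move this line so that it comes after the ZipArchive has been disposed: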
response.ZipFile = ms.ToArray();
Full code:
var response = //some response object that will hold a byte array
using (var ms = new MemoryStream())
{
    using (var zipArchive = new ZipArchive(ms, ZipArchiveMode.Create, true))
    {
        var i = 1;
        foreach (var image in images) // some collection that holds byte arrays.
        {
            var entry = zipArchive.CreateEntry(i + ".jpg");
            using (var entryStream = entry.Open())
            using (var compressStream = new MemoryStream(photo.ImageOriginal))
            {
                compressStream.CopyTo(entryStream);
            }
            i++;
        }
    }
    response.ZipFile = ms.ToArray();
}
return response;
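The same idea as a self-contained sketch that uses only System.IO.Compression. The class name ZipBuilder and the method name BuildZip are hypothetical, and the entries are named 1.jpg, 2.jpg, and so on, as in the question.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

static class ZipBuilder
{
    // Packs each byte array into its own entry and returns the finished archive.
    public static byte[] BuildZip(IReadOnlyList<byte[]> files)
    {
        using (var ms = new MemoryStream())
        {
            // leaveOpen: true keeps ms usable after the archive is disposed.
            using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
            {
                for (int i = 0; i < files.Count; i++)
                {
                    ZipArchiveEntry entry = zip.CreateEntry((i + 1) + ".jpg");
                    using (Stream entryStream = entry.Open())
                    {
                        entryStream.Write(files[i], 0, files[i].Length);
                    }
                }
            } // Disposing the archive writes the central directory into ms.

            // Only now does ms hold a complete zip file.
            return ms.ToArray();
        }
    }
}
```

Reading the returned array back with new ZipArchive(new MemoryStream(bytes), ZipArchiveMode.Read) is a quick way to confirm that the archive is complete.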

Creating Multipage TIFF with Magick.NET

I'm using Magick.NET and trying to create multipage TIFF files. My input is a PDF file. Writing the result to a MemoryStream or getting it as a byte array results in an error:
iisexpress.exe: Error flushing data before directory write. `TIFFWriteDirectorySec' @ error/tiff.c/TIFFErrors/551
But when I write the result to a file on the hard disk, there is no error and the file is fine.
Here is my code:
var outputStream = new MemoryStream();
using (var inputPdf = new MagickImageCollection())
{
    inputPdf.Read(rawData, settings);
    using (var tif = new MagickImageCollection())
    {
        foreach (var pdf in inputPdf)
        {
            pdf.Depth = 8;
            pdf.Format = MagickFormat.Tif;
            tif.Add(pdf);
        }
        if (debug)
        {
            // Writing the data to a file is successful!
            tif.Write(pathImage);
        }
        // But writing it to a stream results in the error!
        //tif.Write(outputStream);
        // Same as getting the data as byte-array!
        var outputData = tif.ToByteArray(MagickFormat.Tif);
        outputStream.Write(outputData, 0, outputData.Length);
    }
}

Solved.
The solution is to set a compression method on each page:
pdf.CompressionMethod = CompressionMethod.JPEG;
Does anyone have an idea why?
