Read Text from Image (OCR) in C# with IronOCR Tesseract - c#

Following this link, I installed the IronOcr package and tried the following code.
using IronOcr;
var Result = new IronTesseract().Read(path);
string currentSubText = Result.Text;
textBox1.Text += currentSubText + Environment.NewLine + Environment.NewLine;
I tested it with six pictures, but I could only upload four of them:
[four pictures]
Actually, the result looks good. There are just a few mistakes with some special German characters (äöü).
Result 1:
I googled and found that it is possible to use a language pack with the OCR engine. I tried it with the following code.
var Ocr = new IronTesseract();
//Ocr.Language = OcrLanguage.German;
Ocr.Language = OcrLanguage.GermanBest;
using (var Input = new OcrInput(path))
{
var Result = Ocr.Read(Input);
string currentSubText = Result.Text;
textBox1.Text += currentSubText + Environment.NewLine + Environment.NewLine;
}
Unfortunately the result is very, very bad.
Result 2:
Can someone help me here?
Thanks and best regards

Did you try using the built-in color inversion filter?
All OCR tends to work best for me with black text on white. I use this code based on code found in the IronOCR documentation:
https://ironsoftware.com/csharp/ocr/examples/ocr-image-filters-for-net-tesseract/
Simplified source code:
using IronOcr;
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.GermanBest;
using (var Input = new OcrInput(@"image.png"))
{
//Input.EnhanceResolution(300);
Input.Invert();
/*
// Optional: Export modified images so you can view them.
foreach(var page in Input.Pages){
page.SaveAsImage("filtered.bmp");
}
*/
var Result = Ocr.Read(Input);
Console.WriteLine(Result.Text);
}
MSDN style docs:
https://ironsoftware.com/csharp/ocr/object-reference/api/IronOcr.OcrInput.html#IronOcr_OcrInput_Invert_System_Boolean_


C# add simple title for spreadsheet chart OpenXml

Hi!
I'm able to write charts to my XLSX file, but I'm stuck adding a simple title to every chart. No styles, just simple plain text.
My code is like this:
String Dtitulo = "Hello chart";
DocumentFormat.OpenXml.Drawing.Charts.Title chartTitle = new DocumentFormat.OpenXml.Drawing.Charts.Title();
chartTitle.ChartText = new ChartText();
chartTitle.ChartText.RichText = new RichText();
DocumentFormat.OpenXml.Drawing.Paragraph parrafoTitulo = new DocumentFormat.OpenXml.Drawing.Paragraph();
DocumentFormat.OpenXml.Drawing.Run run = parrafoTitulo.AppendChild(new DocumentFormat.OpenXml.Drawing.Run());
run.AppendChild(new DocumentFormat.OpenXml.Drawing.Text(Dtitulo));
chartTitle.ChartText.RichText.AppendChild<DocumentFormat.OpenXml.Drawing.Paragraph>(parrafoTitulo);
chart.Title = chartTitle;
But when I open my file, Excel says the file is corrupt, or something like that.
A bit late, but I was faced with the same task. I created an Excel sheet, manually added a chart with a chart title, and then opened the XML to understand which tags were needed. After a while I got it working and moved everything into a small function, shown below.
You can pass your chart object and the title you want to the function below, and it will add the chart title.
Note: I'm using Open XML SDK 2.0 for Microsoft Office.
private void AddChartTitle(DocumentFormat.OpenXml.Drawing.Charts.Chart chart,string title)
{
var ctitle = chart.AppendChild(new Title());
var chartText = ctitle.AppendChild(new ChartText());
var richText = chartText.AppendChild(new RichText());
var bodyPr = richText.AppendChild(new BodyProperties());
var lstStyle = richText.AppendChild(new ListStyle());
var paragraph = richText.AppendChild(new Paragraph());
var apPr = paragraph.AppendChild(new ParagraphProperties());
apPr.AppendChild(new DefaultRunProperties());
var run = paragraph.AppendChild(new DocumentFormat.OpenXml.Drawing.Run());
run.AppendChild(new DocumentFormat.OpenXml.Drawing.RunProperties() { Language = "en-CA" });
run.AppendChild(new DocumentFormat.OpenXml.Drawing.Text() { Text = title });
}
And if you want a full example, you can review the official one here, and inject the above function in the right place (after the creation of the chart object) and it will add the chart title.

Reading text and variables from text file c#

I have the following code, which tries to read data from a text file (so users can modify it easily) and auto-format a paragraph based on the words in the text document plus variables in the form. The contents of the file "Body" go into a field. My Body text file has the following data in it:
"contents: " + contents
I was hoping based on that to get
contents: Item 1, 2, etc.
based on my input. Instead, I only get exactly what's in the text document, quotes and all. What am I doing wrong? I was hoping to get the variables substituted into my text.
string readSettings(string name)
{
string path = System.Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments) + "/Yuneec_Repair_Inv";
try
{
// Create an instance of StreamReader to read from a file.
// The using statement also closes the StreamReader.
using (StreamReader sr = new StreamReader(path + "/" + name + ".txt"))
{
string data = sr.ReadToEnd();
return data;
}
}
catch (Exception e)
{
// Let the user know what went wrong.
Console.WriteLine("The settings file for " + name + " could not be read:");
Console.WriteLine(e.Message);
string content = "error";
return content;
}
}
private void Form1_Load(object sender, EventArgs e)
{
createSettings("Email");
createSettings("Subject");
createSettings("Body");
yuneecEmail = readSettings("Email");
subject = readSettings("Subject");
body = readSettings("Body");
}
private void button2_Click(object sender, EventArgs e)
{
bodyTextBox.Text = body;
}
If you want to let your users customize certain parts of the text, you should use some "indicator" that you know beforehand and that can be searched for and parsed out. For example, everything between # and # is something you will read as a string:
Hello #Mr Douglas#,
Today is #DayOfTheWeek#.....
At that point your users can replace whatever they need between the # symbols, and you read that (for example using regular expressions) and use it as your "variable" text.
Let me know if this is what you are after and I can provide some C# code as an example.
OK, this is the example code for that (it uses [ and ] as the delimiters):
StreamReader sr = new StreamReader(@"C:\temp\settings.txt");
var set = sr.ReadToEnd();
var settings = new Regex(@"(?<=\[)(.*?)(?=\])").Matches(set);
foreach (var setting in settings)
{
Console.WriteLine("Parameter read from settings file is " + setting);
}
Console.WriteLine("Press any key to finish program...");
Console.ReadKey();
And this is the source of the text file:
Hello [MrReceiver],
This is [User] from [Company] something else, not very versatile using this as an example :)
[Signature]
Hope this helps!
When you read text from a file as a string, you get a string of text, nothing more.
There's no part of the system which assumes it's C#, parses, compiles and executes it in the current scope, casts the result to text and gives you the result of that.
That would be mostly not what people want, and would be a big security risk - the last thing you want is to execute arbitrary code from outside your program with no checks.
If you need a templating engine, you need to build one - e.g. read in the string, process the string looking for keywords, e.g. %content%, then add the data in where they are - or find a template processing library and integrate it.
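As a rough illustration of the keyword approach described above, here is a minimal sketch. It assumes placeholders are written as %name% and that the values come from a dictionary you fill from your form fields; the class and method names (SimpleTemplate, Fill) are made up for this example and are not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class SimpleTemplate
{
    // Matches %name%, where name consists of letters, digits or underscores.
    private static readonly Regex Placeholder = new Regex(@"%(\w+)%");

    // Replaces every %name% in the template with values[name].
    public static string Fill(string template, IDictionary<string, string> values)
    {
        return Placeholder.Replace(template, match =>
        {
            string key = match.Groups[1].Value;
            string value;
            // Unknown placeholders are left untouched so typos stay visible.
            return values.TryGetValue(key, out value) ? value : match.Value;
        });
    }
}
```

With this, the Body file would contain contents: %contents% instead of a C# expression, and the button handler would call SimpleTemplate.Fill(body, values) before assigning the result to bodyTextBox.Text. The text is only ever treated as data, so nothing from the file is executed.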

Image pre-processing for text recognize with tesseract or puma.net

How can I pre-process an image with OpenCVdotnet for better text recognition?
I tried a Tesseract wrapper and Puma.NET, but my results got worse. How can I improve them?
#region Tesseract
Bitmap pictureInfoArea = src.ToBitmap();
TesseractEngine engine = new TesseractEngine("tessdata/", "rus", EngineMode.Default);
//engine.SetVariable("tessedit_char_whitelist", "0123456789");
var page = engine.Process(pictureInfoArea, PageSegMode.Auto);
string sTesseract = page.GetText();
#endregion
#region Puma.NET
PumaPage pumaInfoArea = new PumaPage(pictureInfoArea);
using (pumaInfoArea)
{
// Changing default settings
pumaInfoArea.FileFormat = PumaFileFormat.TxtAnsi;
pumaInfoArea.EnableSpeller = true;
pumaInfoArea.Language = PumaLanguage.Russian;
// Recognizing and saving results to a file
string sPuma = pumaInfoArea.RecognizeToString();
//MessageBox.Show(s);
}
#endregion
Here is a tutorial explaining how to train your own language. I suggest that you install jTessBoxEditor, which helps a lot in training your patterns after applying the letter separation algorithm. jTessBoxEditor has a GUI that lets you train your own dataset.
Or:
Here you have another tutorial on training Tesseract 3 for a new language.
Have a look at this one (I did not test it): sunnypage.ge/en http://lib.psnc.pl/Content/358/PSNC_Tesseract-FineReader-report.pdf

Searching using the Google custom search API and displaying links

I am working on a personal assistant for home automation. So far it has basic features such as searching Wolfram Alpha and pulling weather conditions/forecasts, but I want to enable it to search for things on Google and display the results on screen.
After searching around the community, it seems the recommended way is to use the Google Search API (which has been replaced with the Google Custom Search API). I have looked at some examples and am able to get the data into a data grid on the Windows form. However, I want to show clickable links. How can I do this? I already have an API key and CX to use with the code, but cannot get the proper output.
GoogleSearch search = new GoogleSearch()
{
Key = "KEY HERE",
CX = "CX HERE"
};
search.SearchCompleted += (a, b) =>
{
this.DataGridResults.ItemsSource = b.Response.Items;
};
search.Search(search_query.Text);
So I solved this problem after working on it for a long time. Turns out I was just using the list the method returned wrong. I attached a link to the original post that gave me the method and my completed solution which just outputs the titles and HTML links in a text box. You can do whatever you like with them from there.
private void Button_Click_1(object sender, RoutedEventArgs e)
{
GoogleSearch search = new GoogleSearch()
{
Key = "API KEY HERE",
CX = "CX GOES HERE"
};
search.SearchCompleted += (a, b) =>
{
foreach (Item i in b.Response.Items)
{
results_box.Text = results_box.Text + Environment.NewLine + "Page Title: " + i.Title;
results_box.Text = results_box.Text + Environment.NewLine + "Link to Page " + i.Link;
}
};
search.Search(search_query.Text);
}
The method and original post can be found at http://kiwigis.blogspot.com/2011/03/google-custom-search-in-c.html

How to load quickdic dictionary into C#

I have downloaded a dictionary file from http://code.google.com/p/quickdic-dictionary/
But the file extension is .quickdic and is not plain text.
How can I load the quickdic dictionaries (.quickdic) into c# to make simple word queries?
I browsed through the git code, and a few things stuck out.
First, in the DictionaryActivity.java file, there is the following in onCreate():
final String name = application.getDictionaryName(dictFile.getName());
this.setTitle("QuickDic: " + name);
dictRaf = new RandomAccessFile(dictFile, "r");
dictionary = new Dictionary(dictRaf);
That Dictionary class is not the one built into Java, but, according to the imports, this one:
import com.hughes.android.dictionary.engine.Dictionary;
When I look there, it shows a constructor for a Dictionary taking a RandomAccessFile as the parameter. Here's that source code:
public Dictionary(final RandomAccessFile raf) throws IOException {
dictFileVersion = raf.readInt();
if (dictFileVersion < 0 || dictFileVersion > CURRENT_DICT_VERSION) {
throw new IOException("Invalid dictionary version: " + dictFileVersion);
}
creationMillis = raf.readLong();
dictInfo = raf.readUTF();
// Load the sources, then seek past them, because reading them later disrupts the offset.
try {
final RAFList<EntrySource> rafSources = RAFList.create(raf, new EntrySource.Serializer(this), raf.getFilePointer());
sources = new ArrayList<EntrySource>(rafSources);
raf.seek(rafSources.getEndOffset());
pairEntries = CachingList.create(RAFList.create(raf, new PairEntry.Serializer(this), raf.getFilePointer()), CACHE_SIZE);
textEntries = CachingList.create(RAFList.create(raf, new TextEntry.Serializer(this), raf.getFilePointer()), CACHE_SIZE);
if (dictFileVersion >= 5) {
htmlEntries = CachingList.create(RAFList.create(raf, new HtmlEntry.Serializer(this), raf.getFilePointer()), CACHE_SIZE);
} else {
htmlEntries = Collections.emptyList();
}
indices = CachingList.createFullyCached(RAFList.create(raf, indexSerializer, raf.getFilePointer()));
} catch (RuntimeException e) {
final IOException ioe = new IOException("RuntimeException loading dictionary");
ioe.initCause(e);
throw ioe;
}
final String end = raf.readUTF();
if (!end.equals(END_OF_DICTIONARY)) {
throw new IOException("Dictionary seems corrupt: " + end);
}
}
So, anyway, this is how his java code reads the file in.
Hopefully, this helps you simulate this in C#.
From here you would probably want to see how he is serializing the EntrySource, PairEntry, TextEntry, and HtmlEntry, as well as the indexSerializer.
Next look to see how RAFList.create() works.
Then see how that result is incorporated in creating a CachingList using CachingList.create()
Disclaimer: I'm not sure if the built in serializer in C# uses the same format as Java's, so you may need to simulate that too :)
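On the format question: Java's RandomAccessFile.readInt() and readLong() read big-endian values, and readUTF() reads a 2-byte big-endian length followed by that many bytes of "modified UTF-8". C#'s BinaryReader reads little-endian integers and uses a different length prefix for ReadString(), so it cannot be used directly. Below is a minimal sketch of readers for the three header fields the constructor above starts with. The class and method names are made up for this example, and the string decoding assumes the text contains no embedded NUL characters or characters outside the Basic Multilingual Plane, which modified UTF-8 encodes differently from standard UTF-8.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, named for illustration only.
public static class JavaDataReader
{
    // Reads exactly count bytes or throws if the stream ends early.
    private static byte[] ReadExact(Stream s, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = s.Read(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
        return buffer;
    }

    // Equivalent of Java's readInt(): 4 bytes, most significant first.
    public static int ReadInt32BigEndian(Stream s)
    {
        byte[] b = ReadExact(s, 4);
        return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3];
    }

    // Equivalent of Java's readLong(): 8 bytes, most significant first.
    public static long ReadInt64BigEndian(Stream s)
    {
        byte[] b = ReadExact(s, 8);
        long value = 0;
        foreach (byte x in b) value = (value << 8) | x;
        return value;
    }

    // Approximation of Java's readUTF(): 2-byte big-endian length, then the bytes.
    // Assumption: plain UTF-8 decoding is close enough for the text being read.
    public static string ReadJavaUtf(Stream s)
    {
        byte[] len = ReadExact(s, 2);
        int length = (len[0] << 8) | len[1];
        return Encoding.UTF8.GetString(ReadExact(s, length));
    }
}
```

Opening the .quickdic file with File.OpenRead and calling ReadInt32BigEndian, ReadInt64BigEndian and ReadJavaUtf in that order would correspond to dictFileVersion, creationMillis and dictInfo in the Java constructor. The lists that follow still need the RAFList and serializer logic ported separately.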
