Using MODI OCR to extract text from an image - C#

I planned to use OCR in my project and looked into several OCR methods, but I didn't find anything that worked. Eventually I heard about MODI and tried it, but it throws the following error:
Retrieving the COM class factory for component with CLSID {40942A6C-1520-4132-BDF8-BDC1F71F547B} failed due to the following error: 80040154
I'm using Microsoft Office 2013 and Visual Studio 2012.
The code I am using is as follows:
private void button1_Click(object sender, EventArgs e)
{
CheckFileType(@"E:\");
}
public void CheckFileType(string directoryPath)
{
IEnumerator files = Directory.GetFiles(directoryPath).GetEnumerator();
while (files.MoveNext())
{
//get file extension
string fileExtension = Path.GetExtension(Convert.ToString(files.Current));
//get file name without extenstion
string fileName=Convert.ToString(files.Current).Replace(fileExtension,string.Empty);
//Check for JPG File Format
if (fileExtension == ".jpg" || fileExtension == ".JPG") // or // ImageFormat.Jpeg.ToString()
{
try
{
//OCR Operations ...
MODI.Document md = new MODI.Document();
md.Create(Convert.ToString(files.Current));
md.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
MODI.Image image = (MODI.Image)md.Images[0];
//create text file with the same Image file name
FileStream createFile = new FileStream(fileName + ".txt",FileMode.CreateNew);
//save the image text in the text file
StreamWriter writeFile = new StreamWriter(createFile);
writeFile.Write(image.Layout.Text);
writeFile.Close();
}
catch (Exception)
{
MessageBox.Show("This Image hasn't a text or has a problem",
"OCR Notifications",
MessageBoxButtons.OK, MessageBoxIcon.Information);
}
}
}
}
Can anyone help me with this? Is the problem related to the Microsoft Office version, or do I need to make any changes? Is there a better OCR DLL? Thanks.

The error occurs because Microsoft Office Document Imaging (MODI) was discontinued as of MS Office 2010, so it is not installed with Office 2013. Error 80040154 means the COM class is not registered on the machine. In Office 2013, the OCR functionality has been folded into OneNote.
I am also still looking for a solution, or for other tools that can extract text from images programmatically. If you know of any, please share.
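To avoid hitting the exception at all, one option is to check whether the MODI COM class is registered before trying to create it. This is only a minimal sketch: the class name ModiProbe is hypothetical, and the ProgID "MODI.Document" is an assumption that you should verify against a machine where MODI is installed.

```csharp
using System;

static class ModiProbe
{
    // Returns true when a COM class with the given ProgID is registered
    // on this machine, and false otherwise.
    public static bool IsComClassRegistered(string progId)
    {
        try
        {
            // GetTypeFromProgID returns null when the ProgID is unknown.
            return Type.GetTypeFromProgID(progId) != null;
        }
        catch (PlatformNotSupportedException)
        {
            // COM lookups are only available on Windows.
            return false;
        }
    }
}
```

A caller could then test ModiProbe.IsComClassRegistered("MODI.Document") and show a clear "MODI is not installed" message instead of letting new MODI.Document() fail.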

Related

How to open password protected Excel File with NPOI in C#

Good day!
Currently the code to open excel file is:
public void LoadExcelFile(string fullPath)
{
using (var fileStream = File.OpenRead(fullPath))
{
_workbook = WorkbookFactory.Create(fileStream);
}
}
One of the files that I need to open now is password protected.
How can I send in a password to open the file?
Using NPOI version 2.3.0.0
Thank you in advance!
Found a solution:
As mentioned above, NPOI does not cater for a file with a password.
So I added a reference through NuGet to EPPlus and calling it as follows:
public void LoadExcelFile(string fullPath, string password)
{
var file = new FileInfo(fullPath);
var _workbook = new OfficeOpenXml.ExcelPackage(file, password).Workbook;
}
Using it in OutSystems to load Excel files with various formats.

How to run the installer from an iso file using c# code

I have created an ISO image containing the installation folder of an application. I want to start the application's installer from .NET code. I have been using the following code to open the image as a drive (this relies on File Explorer being the default application for opening ISO files) and then scan the drives to check whether the file I want to run exists.
System.Diagnostics.Process.Start(@"C:\Users\tjdtud\Desktop\done\publish.iso");
private void button1_Click(object sender, EventArgs e)
{
DriveInfo[] diLocalDrives = DriveInfo.GetDrives();
try
{
foreach (DriveInfo diLogicalDrive in diLocalDrives)
{
if (File.Exists(diLogicalDrive.Name + "setup.exe"))
{
MessageBox.Show(diLogicalDrive.Name + "setup.exe");
System.Diagnostics.Process.Start(diLogicalDrive.Name + "setup.exe");
//MessageBox.Show("Logical Drive: " + diLogicalDrive.Name,
// "Logical Drives",
// MessageBoxButtons.OK,
// MessageBoxIcon.Information);
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
This code fails if File Explorer is not the default application for opening ISO files. Besides, I have a strong feeling that it is not even close to the right way of doing it. I would very much appreciate any help or pointers to useful links. Thank you for reading.
You can use .NET DiscUtils to extract the file as follows:
using (FileStream isoStream = File.OpenRead(@"C:\temp\sample.iso"))
{
CDReader cd = new CDReader(isoStream, true);
Stream fileStream = cd.OpenFile(@"Folder\Hello.txt", FileMode.Open);
// Use fileStream...
}
Extract the file to a temporary location and then execute it.
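The "extract to a temporary location" step could look roughly like the sketch below. It is not part of DiscUtils: the class IsoExtraction and the method ExtractToTempFile are hypothetical helpers that work on any readable Stream, so the stream returned by cd.OpenFile above can be passed in directly.

```csharp
using System;
using System.IO;

static class IsoExtraction
{
    // Copies the given stream into a new, uniquely named temporary folder
    // and returns the full path of the file that was written.
    public static string ExtractToTempFile(Stream source, string fileName)
    {
        string tempDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(tempDir);

        // Keep only the file name so the result always lands inside tempDir.
        string targetPath = Path.Combine(tempDir, Path.GetFileName(fileName));
        using (FileStream target = File.Create(targetPath))
        {
            source.CopyTo(target);
        }
        return targetPath;
    }
}
```

With a stream opened from the ISO, the returned path can then be passed to System.Diagnostics.Process.Start. If the installer depends on other files from the same folder, those would need to be extracted next to it as well.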

The document cannot be opened because there is an invalid part with an unexpected content type

I am getting an error when opening a presentation (PPTX file) with my presentation-creation code.
The code I am using is given below:
public static void UpdatePPT()
{
const string presentationmlNamespace = "http://schemas.openxmlformats.org/presentationml/2006/main";
const string drawingmlNamespace = "http://schemas.openxmlformats.org/drawingml/2006/main";
string fileName = Server.MapPath("~/PPT1.pptx"); //path of pptx file
using (PresentationDocument pptPackage = PresentationDocument.Open(fileName, true))
{
} // Using pptPackage
}
The error I am getting is:
"The document cannot be opened because there is an invalid part with an unexpected content type.
[Part Uri=/ppt/printerSettings/printerSettings1.bin],
[Content Type=application/vnd.openxmlformats-officedocument.presentationml.printerSettings],
[Expected Content Type=application/vnd.openxmlformats-officedocument.spreadsheetml.printerSettings]."
The error occurs at: using (PresentationDocument pptPackage = PresentationDocument.Open(fileName, true))
The code works fine for many PPTX files, but it throws this error on some files.
I have not been able to find a solution.
Thanks for your help.
Old post, but I ran into the same problem. I solved it programmatically.
That is:
My code runs using (var document = PresentationDocument.Open(fileName, true)).
If this throws an exception, I have a document like the one described. I then call the FixPowerpoint() method and run the rest of my code again.
Here is the method (it uses System.IO.Packaging):
private static void FixPowerpoint(string fileName)
{
//Opening the package associated with file
using (Package wdPackage = Package.Open(fileName, FileMode.Open, FileAccess.ReadWrite))
{
//Uri of the printer settings part
var binPartUri = new Uri("/ppt/printerSettings/printerSettings1.bin", UriKind.Relative);
if (wdPackage.PartExists(binPartUri))
{
//Uri of the presentation part which contains the relationship
var presPartUri = new Uri("/ppt/presentation.xml", UriKind.RelativeOrAbsolute);
var presPart = wdPackage.GetPart(presPartUri);
//Getting the relationship from the URI
var presentationPartRels =
presPart.GetRelationships().Where(a => a.RelationshipType.Equals("http://schemas.openxmlformats.org/officeDocument/2006/relationships/printerSettings",
StringComparison.InvariantCultureIgnoreCase)).SingleOrDefault();
if (presentationPartRels != null)
{
//Delete the relationship
presPart.DeleteRelationship(presentationPartRels.Id);
}
//Delete the part
wdPackage.DeletePart(binPartUri);
}
wdPackage.Close();
}
}
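The "open, repair on failure, open again" flow described above can be sketched as a small generic helper. The names RepairingOpener and OpenWithRepair are hypothetical, and the sketch deliberately does not depend on the Open XML SDK: the caller supplies the open and repair steps as delegates and chooses which exception type triggers the repair.

```csharp
using System;

static class RepairingOpener
{
    // Tries open(); if it throws TException, runs repair() once and retries.
    // A failure on the second attempt is not caught and reaches the caller.
    public static T OpenWithRepair<T, TException>(Func<T> open, Action repair)
        where TException : Exception
    {
        try
        {
            return open();
        }
        catch (TException)
        {
            repair();
            return open();
        }
    }
}
```

In this scenario, open would be () => PresentationDocument.Open(fileName, true) and repair would be () => FixPowerpoint(fileName). The exception type is an assumption: OpenXmlPackageException is the likely candidate, but check what your SDK version actually throws.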
Finally I solved my problem. The PPTX I received was created on Mac OS. I opened a working PPTX file, copied all the contents of the broken PPTX into it, and saved it under the broken file's name.

Include resource file available for download

I am new to C# and I am almost done with a simple project. This project needs to include an Excel file that is available for download through a LinkLabel.
How can I include this file when compiling my project so that, when the LinkLabel is clicked, it asks the user where to save the file?
My Google searches always point me to creating an Excel file. I don't need to create it; it already exists, and I just need to include it in my resource file.
I am stuck here:
private void linkLabel1_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
{
}
It is now working fine with the code below. I can't answer my own question yet because of my low score.
private void linkLabel1_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
{
string filePath = null;
if (saveFileDialog1.ShowDialog() == DialogResult.OK)
{
filePath = saveFileDialog1.FileName;
File.WriteAllBytes(filePath, Properties.Resources.importPurchases);
MessageBox.Show("File Successfully saved.\r\n\r\n" + filePath, "Success Message", MessageBoxButtons.OK, MessageBoxIcon.Information);
}
else
{
return;
}
}
Add the Excel file to your project (Add -> Existing Item).
Right-click the added Excel file and go to Properties.
Set Build Action to Content.
Set Copy to Output Directory to Copy Always or Copy If Newer.
With these settings, the Excel file is copied to the output folder as part of the build.
var saveFileDialog = new SaveFileDialog();
saveFileDialog.DefaultExt = "xls";
saveFileDialog.Filter = "Excel files (*.xls)|*.xls|All files (*.*)|*.*";
if (saveFileDialog.ShowDialog() == DialogResult.OK)
{
const string MyFileName = "myExcelFile.xls";
string execPath = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location);
var filePath = Path.Combine(execPath, MyFileName);
Microsoft.Office.Interop.Excel.Application app = new Microsoft.Office.Interop.Excel.Application();
Microsoft.Office.Interop.Excel.Workbook book = app.Workbooks.Open(filePath);
book.SaveAs(saveFileDialog.FileName); //Save
book.Close();
app.Quit(); //Close the Excel instance started above
}
Update: The above sample is for Windows Application...
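If the workbook only needs to be handed to the user unchanged, a plain file copy may be enough and avoids starting Excel. The following is a minimal sketch under that assumption; BundledFiles and CopyBundledFile are hypothetical names, not part of any library.

```csharp
using System.IO;

static class BundledFiles
{
    // Copies a file that ships alongside the application to the path the
    // user picked, overwriting any existing file at that path.
    public static void CopyBundledFile(string sourceDirectory, string fileName, string destinationPath)
    {
        string sourcePath = Path.Combine(sourceDirectory, fileName);
        File.Copy(sourcePath, destinationPath, true);
    }
}
```

Inside the DialogResult.OK branch this could be called as BundledFiles.CopyBundledFile(AppDomain.CurrentDomain.BaseDirectory, "myExcelFile.xls", saveFileDialog.FileName), assuming the file was copied to the output folder as described in the steps above.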

Corrupted PPTX file when using OpenXML to create PowerPoint presentation

**Important: question answered after code block**
Platform:
C#, OpenXML SDK (2.5), .NET 4.0
What I'm trying to achieve
I've been trying to generate a pptx presentation from some data and images coming from my database.
Every generated file is corrupted, even though it passes OOXML validation. I really don't know what else I could do.
What I've already tried
I tried removing the images and the text, then commented out the code that deletes the first (template) slide, but nothing changes the final result: a corrupted file.
The error
When I try to open the file: "PowerPoint was unable to display some of the text, images, or objects on slides in the file, "filename.pptx", because they have become corrupted.
Affected slides have been replaced by blank slides in the presentation and it is not possible to recover the lost information. To ensure that the file can be opened in previous versions of PowerPoint, use the Save As command (File menu) and save the file with either the same or a new name."
Code
Here's the code I'm using to generate the PPTX:
void GenerateSlides(string fullPath, string path, IEnumerable<Data> data)
{
var slidePath = fullPath;
if (!Directory.Exists(path))
Directory.CreateDirectory(path);
// Copy the template file to generate new slides
File.Copy(string.Format("{0}{1}", path, "TemplateTF.pptx"), slidePath, true);
using (var presentationDocument = PresentationDocument.Open(slidePath, true))
{
var presentationPart = presentationDocument.PresentationPart;
var slideTemplate = (SlidePart)presentationPart.GetPartById("rId2");
// Recover the data to fullfill the slidepart
int i = 1;
foreach (var singleData in data)
{
(...)
// Creates the new image
var newSlide = CloneSlidePart(presentationPart, slideTemplate);
var imgId = "rIdImg" + i;
var imagePart = newSlide.AddImagePart(ImagePartType.Jpeg, imgId);
var stream = new MemoryStream();
using (var file = File.Open(string.Format("{0}{1}"
, WebConfigurationManager.AppSettings["pathImages"]
, singleData.ImageName), FileMode.Open))
{
var buffer = new byte[file.Length];
file.Read(buffer, 0, (int)file.Length);
stream.Write(buffer, 0, buffer.Length);
imagePart.FeedData(new MemoryStream(buffer));
}
// Important method to swap the original image
SwapPhoto(newSlide, imgId);
i++;
InsertContent(newSlide, (...));
SwapPhoto(newSlide, imgId);
newSlide.Slide.Save();
}
DeleteTemplateSlide(presentationPart, slideTemplate);
presentationPart.Presentation.Save();
}
}
void SwapPhoto(SlidePart slidePart, string imgId)
{
var blip = slidePart.Slide.Descendants<Drawing.Blip>().First();
blip.Embed = imgId;
slidePart.Slide.Save();
}
void DeleteTemplateSlide(PresentationPart presentationPart, SlidePart slideTemplate)
{
var slideIdList = presentationPart.Presentation.SlideIdList;
foreach (SlideId slideId in slideIdList.ChildElements)
{
if (slideId.RelationshipId.Value.Equals("rId2"))
{
slideIdList.RemoveChild(slideId);
}
}
presentationPart.DeletePart(slideTemplate);
}
SlidePart CloneSlidePart(PresentationPart presentationPart, SlidePart slideTemplate)
{
var newSlidePart = presentationPart.AddNewPart<SlidePart>("newSlide" + i);
i++;
newSlidePart.FeedData(slideTemplate.GetStream(FileMode.Open));
newSlidePart.AddPart(slideTemplate.SlideLayoutPart);
var slideIdList = presentationPart.Presentation.SlideIdList;
uint maxSlideId = 1;
SlideId prevSlideId = null;
foreach (SlideId slideId in slideIdList.ChildElements)
{
if (slideId.Id > maxSlideId)
{
maxSlideId = slideId.Id;
prevSlideId = slideId;
}
}
maxSlideId++;
var newSlideId = slideIdList.InsertAfter(new SlideId(), prevSlideId);
newSlideId.Id = maxSlideId;
newSlideId.RelationshipId = presentationPart.GetIdOfPart(newSlidePart);
return newSlidePart;
}
void InsertContent(SlidePart slidePart, (...))
{
SwapPlaceholderText(slidePart, "Title", "ReplacementString1");
SwapPlaceholderText(slidePart, "Text", "ReplacementString2");
}
void SwapPlaceholderText(SlidePart slidePart, string placeholder, string value)
{
var textList = slidePart.Slide.Descendants<Drawing.Text>().Where(
t => t.Text.Equals(placeholder)).ToList();
foreach (Drawing.Text text in textList)
{
text.Text = value;
}
}
Answer
Ok, I realized how different MS Office versions can be.
a) If I try to open the .pptx file with Office 2013: error message + opens perfectly, with neither the logo image nor any slide part showing additional information
b) If I try to open the .pptx file with Office 2007: error message + empty slides, no information at all
c) If I try to open the .pptx file with Office 2010: error message + empty slides and the most important information I could ever have: a corrupted icon in the logo's place!!!
I removed the logo image from my template and voilà, the file is generated perfectly. Now, if I really NEED to add the logo image, I can do it programmatically.
Thanks! After one week of trying to work out what was happening, a great friend of mine opened the file using Office 2010, and only then did I realize the logo image was corrupted in my original template file.
Thanks :)
