Accessing Images in PowerPoint file via VSTO Add-In (C#)

Accessing Images in PowerPoint file via VSTO Add-In (C#) - c#

I am trying to build a PowerPoint add-in that will allow me to do a one-click optimization of the current presentation by removing unused master slides and converting huge 24-bit PNG files to slightly-compressed JPGs.
I have the first part handled already, and now I'm working on the images. Although I can easily find the Shape object containing the image, I cannot find a way to access the source image through the managed API. At best, I can copy the shape to the clipboard, which does give me the image but in a different format (MemoryBmp).
using PPTX = Microsoft.Office.Interop.PowerPoint;
...
foreach (PPTX.Slide slide in Globals.ThisAddIn.Application.ActivePresentation.Slides)
{
foreach (PPTX.Shape shape in slide.Shapes)
{
if (shape.Type == MsoShapeType.msoPicture)
{
// Now, how can I access the source image contained within this shape?
// I -can- copy it via the clipboard, like this:
shape.Copy();
System.Drawing.Image image = Clipboard.GetImage();
// ...but image's format reflects a MemoryBmp ImageFormat, as noted here:
// https://referencesource.microsoft.com/#System.Drawing/commonui/System/Drawing/Advanced/ImageFormat.cs
// ...which doesn't help me determine if the source image is something that should be optimized.
}
}
}
Obviously, I can get to the images directly via other methods (e.g. accessing the contents of the file as a ZIP, or using OpenXML SDK), but I'm trying to perform all the steps from within the already-opened presentation (which means I can't update the open file). Any thoughts on how I can do this?

You could use a Microsoft.Office.Interop.PowerPoint.Shape Export function.
void Export(string PathName, PpShapeFormat Filter, int ScaleWidth = 0, int ScaleHeight = 0, PpExportMode ExportMode = PpExportMode.ppRelativeToSlide)
There is a very little documentation of this function.
I've found it in the Shape interface definition.
So your code would look something like:
shape.Export(<some_file>, PpShapeFormat.ppShapeFormatPNG);
More info on MSDN

Related

How to open Image manager through Enterprise Architect API

I am working on Enterprise Architect through C# add-ins. I need to display the image manager through automation where user can add directly add images on an "add image" button click in form.
I use the API Repository.InvokeConstructPicker() but it only opens the select package/class/component window. Is there an EA API available to open the Image Manager.

No, there is none. There is the undocumented Respository.CustomCommand which can open several properties windows. But the image manager is not part of that (or it has not been discovered what parameters to supply).
Please see Edit2 below about adding new values to the table.
Edit: Based on another question I had to dig into this a bit deeper.
I found out that, although EA imports a number of different image formats, it internally uses PNG to store the image. Obviously their BMP-importer does not like all BMP formats (not so deep in that, but I seem to remember there's some 8/16 bit stuff; typical Windoze weirdness). Anyhow, I used this Python code snippet to retrieve some test image data, previously imported into EA:
import sys
import win32com.client
import base64
import xml.etree.ElementTree
eaRep = None
try:
eaApp = win32com.client.GetActiveObject(Class="EA.App")
eaRep = eaApp.repository
except:
print "failure to open EA"
sys.exit(2)
def dump():
sqlRes = eaRep.SQLQuery("SELECT * FROM t_image")
root = xml.etree.ElementTree.fromstring(sqlRes)
for dataset in root:
for data in dataset:
for row in data:
name = row[1].text
print name
data = row[3].text
png = base64.standard_b64decode(data)
file = open("c:/" + name + ".png", "wb")
file.write(png)
file.close()
dump()
This correctly extracted the images from the database.
Edit2: I was assuming that EA stores the png as base64, but that's not true. EA only delivers base64 on return of SQLQuery. But they internally just store the raw png in Image. So, unfortunately, you can not use Repository.Execute since it can not transport binary data - or at least I have not figured out how to do that. As a work around you can look into Repository.ConnectionString and open a native connection to the database. Once you have plugged the new picture(s) in the table you can use them via thier ImageID.
Contents of t_image:
ImageID : You just need to create an unique ID
Name : an arbitrary string
Type : fixed string Bitmap
Image : blob of a png
Here's a Python snippet that connects natively to an EAP file:
import pyodbc
db_file = r'''C:\Documents and Settings\Administrator\Desktop\empty.eap'''
odbc_conn_str = 'DRIVER={Microsoft Access Driver (*.mdb)};DBQ=' + db_file
conn = pyodbc.connect(odbc_conn_str)
cursor = conn.cursor()
cursor.execute("select * from t_image")
row = cursor.fetchone()
if row:
print(row)
Rather than printing the row with the image data (which shows that its contents is a png-blob) you can use it to actually issue an INSERT or UPDATE to modify t_image.

How to get text from image using C# .NET [duplicate]

Is there an API to use Onenote OCR capabilities to recognise text in images automatically?

If you have OneNote client on the same machine as your program will execute you can create a page in OneNote and insert the image through the COM API. Then you can read the page in XML format which will include the OCR'ed text.
You want to use
Application.CreateNewPage to create a page
Application.UpdatePageContent to insert the image
Application.GetPageContent to read the page content and look for OCRData and OCRText elements in the XML.
OneNote COM API is documented here: http://msdn.microsoft.com/en-us/library/office/jj680120(v=office.15).aspx

When you put an image on a page in OneNote through the API, any images will automatically be OCR'd. The user will then be able to search any text in the images in OneNote. However, you cannot pull the image back and read the OCR'd text at this point.
If this is a feature that interests you, I invite you to go to our UserVoice site and submit this idea: http://onenote.uservoice.com/forums/245490-onenote-developers
update: vote on the idea: https://onenote.uservoice.com/forums/245490-onenote-developer-apis/suggestions/10671321-make-ocr-available-in-the-c-api
-- James

There is a really good sample of how to do this here:
http://www.journeyofcode.com/free-easy-ocr-c-using-onenote/
The main bit of code is:
private string RecognizeIntern(Image image)
{
this._page.Reload();
this._page.Clear();
this._page.AddImage(image);
this._page.Save();
int total = 0;
do
{
Thread.Sleep(PollInterval);
this._page.Reload();
string result = this._page.ReadOcrText();
if (result != null)
return result;
} while (total++ < PollAttempts);
return null;
}

As I will be deleting my blog (which was mentioned in another post), I thought I should add the content here for future reference:
Usage
Let's start by taking a look on how to use the component: The class OnenoteOcrEngine implements the core functionality and implements the interface IOcrEngine which provides a single method:
public interface IOcrEngine
{
string Recognize(Image image);
}
Excluding any error handling, it can be used in a way similar to the following one:
using (var ocrEngine = new OnenoteOcrEngine())
using (var image = Image.FromFile(imagePath))
{
var text = ocrEngine.Recognize(image);
if (text == null)
Console.WriteLine("nothing recognized");
else
Console.WriteLine("Recognized: " + text);
}
Implementation
The implementation is far less straight-forward. Prior to Office 2010, Microsoft Office Document Imaging (MODI) was available for OCR. Unfortunately, this no longer is the case. Further research confirmed that OneNote's OCR functionality is not directly exposed in form of an API, but the suggestions were made to manually parse OneNote documents for the text (see Is it possible to do OCR on a Tiff image using the OneNote interop API? or need a document to extract text from image using onenote Interop?. And that's exactly what I did:
Connect to OneNote using COM interop
Create a temporary page containing the image to process
Show the temporary page (important because OneNote won't perform the OCR otherwise)
Poll for an OCRData tag containing an OCRText tag in the XML code of the page.
Delete the temporary page
Challenges included the parsing of the XML code for which I decided to use LINQ to XML. For example, inserting the image was done using the following code:
private XElement CreateImageTag(Image image)
{
var img = new XElement(XName.Get("Image", OneNoteNamespace));
var data = new XElement(XName.Get("Data", OneNoteNamespace));
data.Value = this.ToBase64(image);
img.Add(data);
return img;
}
private string ToBase64(Image image)
{
using (var memoryStream = new MemoryStream())
{
image.Save(memoryStream, ImageFormat.Png);
var binary = memoryStream.ToArray();
return Convert.ToBase64String(binary);
}
}
Note the usage of XName.Get("Image", OneNoteNamespace) (where OneNoteNamespace is the constant "http://schemas.microsoft.com/office/onenote/2013/onenote" ) for creating the element with the correct namespace and the method ToBase64 which serializes an GDI-image from memory into the Base64 format. Unfortunately, polling (See What is wrong with polling? for a discussion of the topic) in combination with a timeout is necessary to determine whether the detection process has completed successfully:
int total = 0;
do
{
Thread.Sleep(PollInterval);
this._page.Reload();
string result = this._page.ReadOcrText();
if (result != null)
return result;
} while (total++ < PollAttempts);
Results
The results are not perfect. Considering the quality of the images, however, they are more than satisfactory in my opinion. I could successfully use the component in my project. One issue remains which is very annoying: Sometimes, OneNote crashes during the process. Most of the times, a simple restart will fix this issue, but trying to recognise text from some images reproducibly crashes OneNote.
Code / Download
Check out the code at GitHub

not sure about OCR, but the documentation site for onenote API is this
http://msdn.microsoft.com/en-us/library/office/dn575425.aspx#sectionSection1

Onenote OCR capabilities in a desktop software

Is there an API to use Onenote OCR capabilities to recognise text in images automatically?

If you have OneNote client on the same machine as your program will execute you can create a page in OneNote and insert the image through the COM API. Then you can read the page in XML format which will include the OCR'ed text.
You want to use
Application.CreateNewPage to create a page
Application.UpdatePageContent to insert the image
Application.GetPageContent to read the page content and look for OCRData and OCRText elements in the XML.
OneNote COM API is documented here: http://msdn.microsoft.com/en-us/library/office/jj680120(v=office.15).aspx

When you put an image on a page in OneNote through the API, any images will automatically be OCR'd. The user will then be able to search any text in the images in OneNote. However, you cannot pull the image back and read the OCR'd text at this point.
If this is a feature that interests you, I invite you to go to our UserVoice site and submit this idea: http://onenote.uservoice.com/forums/245490-onenote-developers
update: vote on the idea: https://onenote.uservoice.com/forums/245490-onenote-developer-apis/suggestions/10671321-make-ocr-available-in-the-c-api
-- James

There is a really good sample of how to do this here:
http://www.journeyofcode.com/free-easy-ocr-c-using-onenote/
The main bit of code is:
private string RecognizeIntern(Image image)
{
this._page.Reload();
this._page.Clear();
this._page.AddImage(image);
this._page.Save();
int total = 0;
do
{
Thread.Sleep(PollInterval);
this._page.Reload();
string result = this._page.ReadOcrText();
if (result != null)
return result;
} while (total++ < PollAttempts);
return null;
}

As I will be deleting my blog (which was mentioned in another post), I thought I should add the content here for future reference:
Usage
Let's start by taking a look on how to use the component: The class OnenoteOcrEngine implements the core functionality and implements the interface IOcrEngine which provides a single method:
public interface IOcrEngine
{
string Recognize(Image image);
}
Excluding any error handling, it can be used in a way similar to the following one:
using (var ocrEngine = new OnenoteOcrEngine())
using (var image = Image.FromFile(imagePath))
{
var text = ocrEngine.Recognize(image);
if (text == null)
Console.WriteLine("nothing recognized");
else
Console.WriteLine("Recognized: " + text);
}
Implementation
The implementation is far less straight-forward. Prior to Office 2010, Microsoft Office Document Imaging (MODI) was available for OCR. Unfortunately, this no longer is the case. Further research confirmed that OneNote's OCR functionality is not directly exposed in form of an API, but the suggestions were made to manually parse OneNote documents for the text (see Is it possible to do OCR on a Tiff image using the OneNote interop API? or need a document to extract text from image using onenote Interop?. And that's exactly what I did:
Connect to OneNote using COM interop
Create a temporary page containing the image to process
Show the temporary page (important because OneNote won't perform the OCR otherwise)
Poll for an OCRData tag containing an OCRText tag in the XML code of the page.
Delete the temporary page
Challenges included the parsing of the XML code for which I decided to use LINQ to XML. For example, inserting the image was done using the following code:
private XElement CreateImageTag(Image image)
{
var img = new XElement(XName.Get("Image", OneNoteNamespace));
var data = new XElement(XName.Get("Data", OneNoteNamespace));
data.Value = this.ToBase64(image);
img.Add(data);
return img;
}
private string ToBase64(Image image)
{
using (var memoryStream = new MemoryStream())
{
image.Save(memoryStream, ImageFormat.Png);
var binary = memoryStream.ToArray();
return Convert.ToBase64String(binary);
}
}
Note the usage of XName.Get("Image", OneNoteNamespace) (where OneNoteNamespace is the constant "http://schemas.microsoft.com/office/onenote/2013/onenote" ) for creating the element with the correct namespace and the method ToBase64 which serializes an GDI-image from memory into the Base64 format. Unfortunately, polling (See What is wrong with polling? for a discussion of the topic) in combination with a timeout is necessary to determine whether the detection process has completed successfully:
int total = 0;
do
{
Thread.Sleep(PollInterval);
this._page.Reload();
string result = this._page.ReadOcrText();
if (result != null)
return result;
} while (total++ < PollAttempts);
Results
The results are not perfect. Considering the quality of the images, however, they are more than satisfactory in my opinion. I could successfully use the component in my project. One issue remains which is very annoying: Sometimes, OneNote crashes during the process. Most of the times, a simple restart will fix this issue, but trying to recognise text from some images reproducibly crashes OneNote.
Code / Download
Check out the code at GitHub

not sure about OCR, but the documentation site for onenote API is this
http://msdn.microsoft.com/en-us/library/office/dn575425.aspx#sectionSection1

Can a PDF be converted to a vector image format that can be printed from .NET?

We have a .NET app which prints to both real printers and PDF, currently using PDFsharp, although that part can be changed if there's a better option. Most of the output is generated text or images, but there can be one or more pages that get appended to the end. That page(s) are provided by the end-user in PDF format.
When printing to paper, our users use pre-printed paper, but in the case of an exported PDF, we concatenate those pages to the end, since they're already in PDF format.
We want to be able to embed those PDFs directly into the print stream so they don't need pre-printed paper. However, there aren't really any good options for rendering a PDF to a GDI page (System.Drawing.Graphics).
Is there a vector format the PDF could be converted to by some external program, that could rendered to a GDI+ page without being degraded by conversion to a bitmap first?

In an article titled "How To Convert PDF to EMF In .NET," I have shown how to do this using our PDFOne .NET product. EMFs are vector graphics and you can render them on the printer canvas.
A simpler alternative for you is PDF overlay explained in another article titled "PDF Overlay - Stitching PDF Pages Together in .NET." PDFOne allows x-y offsets in overlays that allows you stitch pages on the edges. In the article cited here, I have overlaid the pages one over another by setting the offsets to zero. You will have set it to page width and height.
DISCLAIMER: I work for Gnostice.

Ghostscript can output PostScript (which is a vector file) which can be directly sent to some types of printers. For example, if you're using an LPR capable printer, the PS file can be directly set to that printer using something like this project: http://www.codeproject.com/KB/printing/lpr.aspx
There are also some commercial options which can print a PDF (although I'm not sure if the internal mechanism is vector or bitmap based), for example http://www.tallcomponents.com/pdfcontrols2-features.aspx or http://www.tallcomponents.com/pdfrasterizer3.aspx

I finally figured out that there is an option that addresses my general requirement of embedding a vector format into a print job, but it doesn't work with GDI based printing.
The XPS file format created by Microsoft XPS Writer print driver can be printed from WPF, using the ReachFramework.dll included in .NET. By using WPF for printing instead of GDI, it's possible to embed an XPS document page into a larger print document.
The downside is, WPF printing works quite a bit different, so all the support code that directly uses stuff in the Sytem.Drawing namespace has to be re-written.
Here's the basic outline of how to embed the XPS document:
Open the document:
XpsDocument xpsDoc = new XpsDocument(filename, System.IO.FileAccess.Read);
var document = xpsDoc.GetFixedDocumentSequence().DocumentPaginator;
// pass the document into a custom DocumentPaginator that will decide
// what order to print the pages:
var mypaginator = new myDocumentPaginator(new DocumentPaginator[] { document });
// pass the paginator into PrintDialog.PrintDocument() to do the actual printing:
new PrintDialog().PrintDocument(mypaginator, "printjobname");
Then create a descendant of DocumentPaginator, that will do your actual printing. Override the abstract methods, in particular the GetPage should return DocumentPages in the correct order. Here's my proof of concept code that demonstrates how to append custom content to a list of Xps documents:
public override DocumentPage GetPage(int pageNumber)
{
for (int i = 0; i < children.Count; i++)
{
if (pageNumber >= pageCounts[i])
pageNumber -= pageCounts[i];
else
return FixFixedPage(children[i].GetPage(pageNumber));
}
if (pageNumber < PageCount)
{
DrawingVisual dv = new DrawingVisual();
var dc = dv.Drawing.Append();
dc = dv.RenderOpen();
DoRender(pageNumber, dc); // some method to render stuff to the DrawingContext
dc.Close();
return new DocumentPage(dv);
}
return null;
}
When trying to print to another XPS document, it gives an exception "FixedPage cannot contain another FixedPage", and a post by H.Alipourian demonstrates how to fix it: http://social.msdn.microsoft.com/Forums/da/wpf/thread/841e804b-9130-4476-8709-0d2854c11582
private DocumentPage FixFixedPage(DocumentPage page)
{
if (!(page.Visual is FixedPage))
return page;
// Create a new ContainerVisual as a new parent for page children
var cv = new ContainerVisual();
foreach (var child in ((FixedPage)page.Visual).Children)
{
// Make a shallow clone of the child using reflection
var childClone = (UIElement)child.GetType().GetMethod(
"MemberwiseClone", BindingFlags.Instance | BindingFlags.NonPublic
).Invoke(child, null);
// Setting the parent of the cloned child to the created ContainerVisual
// by using Reflection.
// WARNING: If we use Add and Remove methods on the FixedPage.Children,
// for some reason it will throw an exception concerning event handlers
// after the printing job has finished.
var parentField = childClone.GetType().GetField(
"_parent", BindingFlags.Instance | BindingFlags.NonPublic);
if (parentField != null)
{
parentField.SetValue(childClone, null);
cv.Children.Add(childClone);
}
}
return new DocumentPage(cv, page.Size, page.BleedBox, page.ContentBox);
}
Sorry that it's not exactly compiling code, I just wanted to provide an overview of the pieces of code necessary to make it work to give other people a head start on all the disparate pieces that need to come together to make it work. Trying to create a more generalized solution would be much more complex than the scope of this answer.

While not open source and not .NET native (Delphi based I believe, but offers a precompiled .NET library), Quick PDF can render a PDF to an EMF file which you could load into your Graphics object.

Reading PSD file format

I wonder if this is even possible. I have an application that adds a context menu when you right click a file. It all works fine but here is what I'd like to do:
If the file is a PSD then I want the program to extract the image. Is this possible to do without having Photoshop installed?
Basically I want the user to right click and click "image" which would save a .jpg of the file for them.
edit: will be using c#
Thanks

The ImageMagick libraries (which provide bindings for C#) also support the PSD format. They might be easier to get started with than getting into the Paint.NET code and also come with a quite free (BSD-like) license.
A simple sample (found at http://midimick.com/magicknet/magickDoc.html) using MagickNet would look like this:
using System;
static void Main(string[] args)
{
MagickNet.Magick.Init();
MagicNet.Image img = new MagicNet.Image("file.psd");
img.Resize(System.Drawing.Size(100,100));
img.Write("newFile.png");
MagickNet.Magick.Term();
}
Note: MagickNet has moved to http://www.codeproject.com/KB/dotnet/ImageMagick_in_VBNET.aspx

Well, there's a PSD plugin for Paint.NET which I think is Open-Source which you might want to take a look at for starters:
http://frankblumenberg.de/doku/doku.php?id=paintnet:psdplugin#download

This guy do it easier:
http://www.codeproject.com/KB/graphics/simplepsd.aspx
With a C# library and a sample project.
I've tried with PS2 files and works ok.

I have written a PSD parser which extracts raster format layers from all versions of PSD and PSB. http://www.telegraphics.com.au/svn/psdparse/trunk

You can use GroupDocs.Viewer for .NET API to render your PSD files as images (JPG, PNG, BMP) in your application using a few lines of code.
C#
ViewerConfig config = new ViewerConfig();
config.StoragePath = "D:\\storage\\";
// Create handler
ViewerImageHandler imageHandler = new ViewerImageHandler(config);
// Guid implies that unique document name
string guid = "sample.psd";
// Get document pages as images
List<PageImage> pages = imageHandler.GetPages(guid);
foreach (PageImage page in pages)
{
// Access each image using page.Stream
}
For more details and sample code, please visit here.
Disclosure: I work as a Developer Evangelist at GroupDocs.

For people who are reading this now: the link from accepted answer doesn't seem to work anymore (at least for me). Would add a comment there, but not allowed to comment yet - hence I'm adding a new answer.
The working link where you can find the psdplugin code for Paint.Net: https://github.com/PsdPlugin/PsdPlugin

Here is my own psd parser and exporter:
http://papirosnik.info/psdsplit/.
It allows to correctly parse psd with rgb color 8, 16- and 32-bit for channel, process user masks, export selected layers into jpeg, png, jng, bmp, tiff; create xml layout of exported layers and groups and also create a texture atlas and animations set from given layers.
It's entirely written in C#. If you want its sources inform me via support link on About dialog in the application.

ImageMagick.NET - http://imagemagick.codeplex.com/ - is the later version of the link 0xA3 gave, with some slightly different syntax. (Note, this is untested):
using ImageMagickNET;
public void Test() {
MagickNet.InitializeMagick();
ImageMagickNET.Image img = new ImageMagickNET.Image("file.psd");
img.Resize(new Geometry(100, 100, 0, 0, false, false);
img.Write("newFile.png");
}

I got extraction from psd working. see my answer here
How to extract layers from a Photoshop file? C#
may help someone else.

FastStone does this pretty efficiently.
They do not have their libraries availaible, but I guess you can contact them and see if they can help.
Check out their website: http://www.faststone.org/download.htm

I've had great success with Aspose's Imaging component which can load and save PSD files without Photoshop: https://products.aspose.com/imaging/net

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Accessing Images in PowerPoint file via VSTO Add-In (C#) - c#

Related

How to open Image manager through Enterprise Architect API

How to get text from image using C# .NET [duplicate]

Onenote OCR capabilities in a desktop software

Can a PDF be converted to a vector image format that can be printed from .NET?

Reading PSD file format

Categories

Resources