I have looked into this fairly extensively, but have not found quite what I am looking for. The two methods I have found are:
1: Using Microsoft.Office.Interop.Excel, iterate through the workbook's shapes, copy each image to the clipboard, load the clipboard data into a bitmap, and finally save that bitmap. The problem with this method is the clipboard: we would like to use this in a multi-threaded environment and are afraid of clipboard conflicts between threads. We would rather not deal with clipboard locking.
2: Save the workbook as an .HTML file, then grab the images from the "_files" folder created next to the saved document. The problem is that two images get created for each picture (one high-res, one low-res), and there is no good way to tell which is which: they are all named image###, and some files list them high, low, high, low, while others list them high, high, low, low. Using all of these files is slow and wastes space. I could compare images by aspect ratio, but that is unreliable because multiple images could have the same aspect ratio.
Is there a way to convert an Excel.Shape directly to a bitmap (or any image format) without using the clipboard? It seems like there must be a way, because the Shape.CopyPicture method is able to put it on the clipboard in an image format.
Otherwise, is there a way to do something similar to number 2 without getting duplicates? I would prefer a solution that avoids third-party libraries.
Thank you.
I assume you are using the Excel 2007/2010 (.xlsx) format. Every Excel 2007/2010 file is a ZIP file! So simply rename Excel.xlsx to Excel.zip and unzip it to a folder. The images are in the folder '/xl/media'.
Try this manually first. Once you know the layout, it's easy to automate with a small C# program.
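Here is a minimal sketch of that automation in Python, using the standard `zipfile` module. In C#, `System.IO.Compression.ZipFile` covers the same steps. The `xl/media/` prefix comes from the answer. The function name `extract_xlsx_media` is hypothetical. The package is opened directly, so no rename is needed.

```python
import os
import zipfile


def extract_xlsx_media(xlsx_path, out_dir):
    """Copy every file under xl/media/ in an .xlsx package to out_dir.

    An .xlsx file is a ZIP archive, so it can be opened directly without
    renaming it to .zip first. Returns the list of written file paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(xlsx_path) as package:
        for name in package.namelist():
            # Embedded pictures are stored under xl/media/ inside the package.
            if name.startswith("xl/media/") and not name.endswith("/"):
                target = os.path.join(out_dir, os.path.basename(name))
                with package.open(name) as src, open(target, "wb") as dst:
                    dst.write(src.read())
                written.append(target)
    return written
```

Usage: `extract_xlsx_media("Excel.xlsx", "images")` writes the media files into `images/` and returns their paths. This does not touch the clipboard, so it can run on several threads at once.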
I have written a program that has a feature for viewing images. It works fine for common file types; I simply use a PictureBox. But I want to be able to load DNG (Digital Negative) images too.
I don't need to load the whole thing. Loading just the embedded JPEG preview would be fine. Not all DNGs have one, but I am willing to settle for that to avoid a huge hassle. Plus, since DNGs are so big and my program is meant for browsing through them quickly, this might be best anyway.
I have tried using WindowsAPICodePack.Shell to get the Extra Large Bitmap or Icon from the file. I am already using it for my "explorer" thumbnails. But I get nothing larger than the "Large" size, which I believe is around 256px, even though I know the file contains a JPEG preview of around 1024px.
Is there an easy way to get at the JPEG preview of a DNG file?
I'm trying to create a card game in C#, and for this I have a lot of images that I need to load. They're all JPG images, and there are about 7000 of them.
I would like to make sure that if you download the game, the images will not be easily accessible, meaning that they should not just be JPG images in a subfolder of the application. So I thought about embedding them in a DLL file.
But how do I do this? And how do I handle it efficiently? Is there a technique for this sort of thing, or is another method preferable?
I would like to make sure that [...] the images will not be easily accessible
First, you should ask yourself why you want to forbid this. If you just want to prevent someone from manipulating the pictures, you can leave them as JPGs in a bunch of subfolders. Just generate a checksum for each file and verify it when the program loads the picture.
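Here is a minimal sketch of the checksum approach in Python. The answer does not name a hash algorithm, so SHA-256 from `hashlib` is an assumption, and the helper names are hypothetical. The manifest has to be stored somewhere users cannot easily edit, such as inside the executable. Otherwise, anyone who edits a picture can also update its checksum.

```python
import hashlib
import os


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(image_dir):
    """Map each file's path (relative to image_dir) to its checksum."""
    manifest = {}
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            full = os.path.join(root, name)
            manifest[os.path.relpath(full, image_dir)] = sha256_of(full)
    return manifest


def is_untampered(image_dir, rel_path, manifest):
    """True if the file still matches the checksum recorded at build time."""
    expected = manifest.get(rel_path)
    if expected is None:
        return False  # unknown file: treat as not trusted
    return sha256_of(os.path.join(image_dir, rel_path)) == expected
```

You would run `build_manifest` once at build time and call `is_untampered` each time the game loads a picture.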
If you want to prevent reuse of the pictures, you can still keep them in a bunch of subfolders, but not as plain JPGs: encrypt them, for example with the standard AES algorithm. But beware: that won't prevent anyone from taking screenshots while your application is running, so consider whether it's really worth the effort.
EDIT: If you want to embed the images because installation is easier with just one big file to deploy instead of 7000 individual files, you could write a helper program that creates resource files programmatically. See this page from Microsoft, especially the part about .resources files, to learn how to use the ResourceWriter class for that purpose.
If you have 7000 images, you need a database. Microsoft SQL Server Compact 4.0 is an option. It's small and easy to use.
I'm assuming that this is a Windows application.
To embed an image in the assembly:
1. Right-click the image file and select Properties.
2. In the Properties pane, set Build Action to Embedded Resource.
The image then becomes an embedded resource when the application is compiled.
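Then you can access the image from the assembly. Note that the strongly typed Properties.Resources accessor is generated only for images added through the project's Resources designer (Resources.resx). A loose file marked as Embedded Resource is read with Assembly.GetManifestResourceStream instead:
global::[[MyNameSpace]].Properties.Resources.[[ImageName]]
For example: this.pictureBox1.Image = global::[[MyNameSpace]].Properties.Resources.[[ImageName]];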
Hi and thanks for looking!
Update
For the sake of clarity: a third-party .NET library is just fine, preferably an open-source or free one. The solution need not be native .NET.
Background
I am working on an enterprise web application for which the client has given us thousands of pages of content in MS Word documents. We have to parse these documents, extract the data, and send it to the content database.
Within these docs are various embedded images, each representing a larger original image stored in a separate folder.
The client did not provide any paths to the original source images. So when we see content with an embedded image in the MS Word doc, we have to go through several "assets" folders and look for the corresponding image, which is extraordinarily time-consuming.
We are already using DocX to parse the documents, so you can assume that we have a list of bitmap images to loop through that we have pulled from the document.
Question
Given a list of bitmaps that we just extracted from the document, how do we search a different folder containing hundreds of images for the matching image, and then return the file path to it?
TinEye.com does this over the web. I am wondering if, using System.Drawing or something, we can do it on a PC with C#.
Thanks!
Matt
I hate to propose an answer to my own question, but I think I might be onto something here. Here is heuristic pseudocode for a C# Forms app; your thoughts are appreciated:
Part 1
Using System.IO, traverse the "assets" folders and get all images.
For each image, Base64 encode it.
Take the resulting string and place in an XML file:
<Image>
<Path>C:\SomePath</Path>
<EncodedString>[Some Base64 String]</EncodedString>
</Image>
Now we have an XML file containing all original images, in Base64 form, along with their file path.
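Here is a minimal sketch of Part 1 in Python, using `base64` and `xml.etree.ElementTree`. It writes the `<Image>`/`<Path>`/`<EncodedString>` layout described above. The root element name `<Images>`, the extension filter, and the function name `build_index` are assumptions for illustration.

```python
import base64
import os
import xml.etree.ElementTree as ET

# Assumption: which file extensions count as "images" in the assets folders.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tif", ".tiff"}


def build_index(assets_dirs, xml_path):
    """Base64-encode every image under assets_dirs into one XML index file.

    Returns the number of <Image> entries written.
    """
    root = ET.Element("Images")  # hypothetical root element name
    for assets_dir in assets_dirs:
        for dirpath, _dirs, files in os.walk(assets_dir):
            for name in files:
                if os.path.splitext(name)[1].lower() not in IMAGE_EXTENSIONS:
                    continue
                full = os.path.join(dirpath, name)
                with open(full, "rb") as f:
                    encoded = base64.b64encode(f.read()).decode("ascii")
                image = ET.SubElement(root, "Image")
                ET.SubElement(image, "Path").text = full
                ET.SubElement(image, "EncodedString").text = encoded
    ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=True)
    return len(root)
```

Storing whole files as Base64 roughly inflates the index to 4/3 of the images' total size, which is acceptable for a one-off overnight job.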
Part 2
Using DocX, extract all images from the MS Word doc.
For each image, use LINQ to XML to search for an exact match in the XML file from Part 1.
If there are no exact matches, iterate over the XML file and compute the Levenshtein distance to each entry.
Inside the foreach, store the XML node ID (or file path) and the Levenshtein distance as a key/value pair.
Take the key/value pair with the lowest distance and return its file path.
For performance, set a tolerance so that the foreach stops as soon as an original image has an acceptably low distance from the image extracted from the document.
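The Part 2 matching loop is sketched here in Python against the XML layout from Part 1. It first looks for an exact match, then computes the plain dynamic-programming Levenshtein distance and stops early once a distance is within the tolerance. The function names and the `tolerance` default are hypothetical. Plain Levenshtein costs O(len(a) × len(b)), which gets expensive for Base64 strings of whole images. The update below reports switching to the Sift library's FastDistance() for speed.

```python
import xml.etree.ElementTree as ET


def levenshtein(a, b):
    """Classic edit distance using two rolling rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the rows as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


def find_original(encoded, xml_path, tolerance=0):
    """Return the Path of the best-matching <Image>, or None if the index is empty."""
    images = ET.parse(xml_path).getroot().findall("Image")
    # Step 1: exact match on the Base64 string.
    for image in images:
        if image.findtext("EncodedString") == encoded:
            return image.findtext("Path")
    # Step 2: keep the lowest distance; stop early once within tolerance.
    best_path, best_distance = None, None
    for image in images:
        distance = levenshtein(encoded, image.findtext("EncodedString") or "")
        if best_distance is None or distance < best_distance:
            best_path, best_distance = image.findtext("Path"), distance
        if distance <= tolerance:
            break
    return best_path
```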
Since this is a one-off task, I don't need instant performance. So, I could run this tonight before leaving the office and, hopefully, come back tomorrow to a list of paths connecting the original images to the ones embedded in the docs.
UPDATE
The heuristic above worked beautifully! I ended up using the Sift library to efficiently calculate distances between Base64 strings; specifically, I used its FastDistance() method. I'm getting 100% accuracy in finding the images I need, even when the angle from which the photo was taken is slightly different.
There is no built-in algorithm in the .NET Framework for computing image similarity. You'd need to use a third-party library or write it yourself. There are lots of image similarity algorithm questions on SO:
Algorithm for finding similar images
How can I measure the similarity between two images?
comparing images programmatically - lib or class
One more, for .NET: Are there any OK image recognition libraries for .NET? This one refers you to AForge, which seems to have the algorithm that you are after.
According to this SO answer to a similar question, you should look at OpenCV and VLFeat. The former has a C++ API and the latter a C API, so you would need to write your own P/Invoke wrapper or perhaps wrap them in a C++/CLI facade, which you could call from C#.
I have a system for creating a PDF book from users' own images. The images are high resolution, and the PDF ends up at around 70 pages, with pictures on most of them.
When generating the PDF in a local application on the server, the process uses around 3 GB of RAM, which makes it crash more often than it succeeds. The files are also really huge, around 1.2 GB. Running the same content through print-to-PDF would make it about a hundred times smaller.
Is there a way to make ABCPdf use less memory and create smaller files?
I have had a very similar experience with iTextSharp, where I was basically running out of memory any time I created a large PDF with images in it.
I found that there is a function I should call to release each image once I am done with it. Otherwise the library holds it in memory, in case you want to use it again, until you finally close the PDF.
Either reuse the image if it is a repeating header/footer logo, or release images as you go.
That is most likely the issue you are facing, but I have no experience with ABCPdf.
I haven't used ABCPdf directly, but I suspect the images are the source of your issues. Resize them before they are included in the PDF objects; I suspect that's what a print-to-PDF process does.
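To pick a resize target, one simple rule is to compute how many pixels an image actually needs at the size it will be printed. The Python sketch below does only that arithmetic. The 300 dpi default is an assumed print resolution, not something the answer specifies. The actual resampling would be done with whatever imaging library you already use, before the image is handed to ABCpdf.

```python
def target_size(width_px, height_px, box_width_in, box_height_in, dpi=300):
    """Pixel size to resample an image to before placing it in a PDF.

    Fits the image inside a box_width_in x box_height_in inch slot at the
    given print resolution (dpi is an assumption), keeps the aspect ratio,
    and never upscales.
    """
    scale = min(box_width_in * dpi / width_px,
                box_height_in * dpi / height_px,
                1.0)
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))
```

For example, a 6000 × 4000 photo placed in a 6 × 4 inch slot needs only 1800 × 1200 pixels at 300 dpi, which is 9% of the original pixel count.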
One other note: for very large PDFs, you may want to set "linearize" to false.
pdfDoc.SaveOptions.Linearize = false;
Linearization optimizes the PDF for web streaming, so if you are streaming the PDF you might want to leave it set to true. However, I've found it drastically increases the memory ABCpdf uses during the save.
My C# application receives image files from the KOFAX VRS TWAIN driver in TWSX_FILE mode, but neither my own .NET-based application nor the Windows default image viewer can open these files. However, Adobe Photoshop can open them without any problem.
I tried the FreeImage library. Although it detects their dimensions correctly, it renders black images.
It seems that KOFAX has its own bitmap format, whose header is different from normal BMP files:
http://www.fileformat.info/mirror/egff/ch03_03.htm
I have uploaded one of these files here:
http://www.box.net/shared/aby42aagz4
How can I open these images in my applications? Does anybody know of a lightweight open-source/free library or a C++/C# code snippet that supports this image format?
You've basically answered your own question: The file is neither a Windows bitmap file nor is it in the documented Kofax Raster Format.
As you pointed out, the first two bytes are 'BM', which would indicate the file is purporting to be a Windows bitmap. However, if that were truly the case the next four bytes would contain the file size. In your sample file, the next four bytes contain a value much bigger than the actual file size so it can't be correctly interpreted as a Windows bitmap file.
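The size check described above can be done in a few lines. This Python sketch reads the 'BM' signature and the little-endian 32-bit size field of the bitmap file header and compares it with the real file size. The function name `inspect_bmp_header` is hypothetical.

```python
import os
import struct


def inspect_bmp_header(path):
    """Return (starts_with_bm, declared_size, actual_size) for a file.

    A genuine Windows bitmap starts with b"BM", followed by the total file
    size as a little-endian unsigned 32-bit integer.
    """
    actual_size = os.path.getsize(path)
    with open(path, "rb") as f:
        header = f.read(6)
    if len(header) < 6:
        return False, None, actual_size
    declared_size = struct.unpack("<I", header[2:6])[0]
    return header[:2] == b"BM", declared_size, actual_size
```

A result where the first element is true but `declared_size` is far larger than `actual_size` matches the sample file described here.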
As the fileformat.info site you linked to states, if the file was truly in Kofax Raster Format, it would start with the bytes '68464B2Eh'. Thus, your file isn't in Kofax Raster Format either. In fact, I tried opening it with Kofax's VCDemo software and got the following error: "Error 20204 - Internal invalid state"
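The Kofax signature can be tested the same way. This Python sketch compares the first four bytes against 68464B2Eh. It assumes the magic number is stored as a little-endian DWORD (x86 byte order), which on disk means the bytes 2E 4B 46 68. The page quotes the value, not the byte order, so treat that as an assumption. The function name is hypothetical.

```python
import struct

KOFAX_MAGIC = 0x68464B2E  # value quoted on the fileformat.info page


def has_kofax_magic(path):
    """True if the first four bytes, read as a little-endian DWORD, equal 68464B2Eh.

    Assumption: the magic number is stored little-endian (bytes 2E 4B 46 68).
    """
    with open(path, "rb") as f:
        head = f.read(4)
    return len(head) == 4 and struct.unpack("<I", head)[0] == KOFAX_MAGIC
```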
Thus, Kofax's own software thinks the file is corrupt.
The fact that Photoshop can open it and display something doesn't necessarily mean it's a valid image file format. Image processing software packages will often simply try to guess at interpreting the raw bytes of the file. Sometimes they get lucky, sometimes not.
Trying to find something that can read the files assumes that the file is in a standard format, which it isn't. Thus, I wouldn't search for something that could read the file but instead search for why the VRS/TWAIN configuration you are using is producing a non-standard format.