I'm reading in a .docx file using the Novacode API, and am unable to create or display any images within the file to a WinForm app due to not being able to convert from a Novacode Picture (pic) or Image to a system image. I've noticed that there's very little info inside the pic itself, with no way to get any pixel data that I can see. So I have been unable to utilize any of the usual conversion ideas.
I've also looked up how Word saves images inside the files as well as Novacode source for any hints and I've come up with nothing.
My question, then: is there a way to convert a Novacode Picture to a System.Drawing image, or should I use something else, such as OpenXML, to gather the image data? If so, would Novacode and OpenXML conflict in any way?
There's also this answer that might be another place to start.
Any help is much appreciated.
Okay. This is what I ended up doing. Thanks to gattsbr for the advice. This only works if you can grab all the images in order, and have descending names for all the images.
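using System.Collections.Generic;
using System.Drawing; // Bitmap
using System.IO;
using System.IO.Compression; // Needs references to System.IO.Compression and System.IO.Compression.FileSystem
using System.Linq; // OrderByDescending, Reverse, ToArray
using Novacode;
// System.Drawing.Image is fully qualified because Novacode also defines an Image type
Dictionary<string, System.Drawing.Image> images = new Dictionary<string, System.Drawing.Image>();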
void LoadTree()
{
// In case of previous exception
if(File.Exists("Images.zip")) { File.Delete("Images.zip"); }
// Allow the file to be open while parsing
using(FileStream stream = File.Open("Images.docx", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using(DocX doc = DocX.Load(stream))
{
// Work rest of document
// Still parse here to get the names of the images
// Might have to drag and drop images into the file, rather than insert through Word
foreach(Picture pic in doc.Pictures)
{
string name = pic.Description;
if(null == name) { continue; }
name = name.Substring(name.LastIndexOf("\\") + 1);
name = name.Substring(0, name.Length - 4);
images[name] = null;
}
// Save while still open
doc.SaveAs("Images.zip");
}
}
// Use temp zip directory to extract images
using(ZipArchive zip = ZipFile.OpenRead("Images.zip"))
{
// Gather all image names, in order
// They're retrieved from the bottom up, so reverse
string[] keys = images.Keys.OrderByDescending(o => o).Reverse().ToArray();
for(int i = 1; ; i++)
{
// Also had to add an assembly for ZipArchiveEntry
ZipArchiveEntry entry = zip.GetEntry(String.Format("word/media/image{0}.png", i));
// Stop when there are no more entries, or no more names to pair them with
if(null == entry || i > keys.Length) { break; }
Stream stream = entry.Open();
images[keys[i - 1]] = new Bitmap(stream);
}
}
// Remove temp directory
File.Delete("Images.zip");
}
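Since a .docx file is itself a ZIP package, the intermediate Images.zip copy can probably be skipped by opening the document stream as an archive directly. The following is a minimal sketch of that idea, not the approach from the answer above: `DocxMedia` and `ReadMedia` are hypothetical names, it assumes the images live under word/media/ (as in the code above), and it makes no attempt to match entries to their position in the document.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of Novacode/DocX.
public static class DocxMedia
{
    // Returns the raw bytes of every entry under word/media/, keyed by entry name.
    public static Dictionary<string, byte[]> ReadMedia(Stream docx)
    {
        var media = new Dictionary<string, byte[]>();
        // leaveOpen: true so the caller keeps ownership of the document stream
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                if (!entry.FullName.StartsWith("word/media/", StringComparison.OrdinalIgnoreCase))
                    continue;
                using (Stream source = entry.Open())
                using (var buffer = new MemoryStream())
                {
                    // Copy into memory so the data outlives the archive
                    source.CopyTo(buffer);
                    media[entry.FullName] = buffer.ToArray();
                }
            }
        }
        return media;
    }
}
```

Each byte array can then be turned into an image with `System.Drawing.Image.FromStream(new MemoryStream(bytes))`; the MemoryStream should stay alive for as long as the image is in use.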
I have this method for resizing images, and I have managed to copy all of the metadata into the new image except for the XMP data. I can only find topics on how to manage the XMP part in C++, but I need it in C#. The closest I've gotten is the xmp-sharp project, which is based on an old port of Adobe's SDK, but I can't get that working. The MetadataExtractor project gives me the same result: file format/encoding not supported. I've tried this with .jpg, .png and .tif files.
Is there no good way of reading and writing XMP in C#?
Here is my code if it's of any help (omitting all irrelevant parts):
public Task<Stream> Resize(Size size, Stream image)
{
using (var bitmap = Image.FromStream(image))
{
var newSize = new Size(size.Width, size.Height);
var ms = new MemoryStream();
using (var bmPhoto = new Bitmap(newSize.Width, newSize.Height, PixelFormat.Format24bppRgb))
{
// This saves all metadata except XMP
foreach (var id in bitmap.PropertyIdList)
bmPhoto.SetPropertyItem(bitmap.GetPropertyItem(id));
// Trying to use xmp-sharp for the XMP part
try
{
IXmpMeta xmp = XmpMetaFactory.Parse(image);
}
catch (XmpException e)
{
// Here, I always get "Unsupported Encoding, XML parsing failure"
}
// Trying to use MetadataExtractor for the XMP part
try
{
var xmpDirs = ImageMetadataReader.ReadMetadata(image).Where(d => d.Name == "XMP");
}
catch (Exception e)
{
// Here, I always get "File format is not supported"
}
// more code to modify image and save to stream
}
ms.Position = 0;
return Task.FromResult<Stream>(ms);
}
}
You get "File format is not supported" because the stream has already been consumed by the call to Image.FromStream(image) in the first few lines, so the metadata reader starts at the end of the data.
If you don't do that, you should find that you can read out the XMP just fine.
var xmp = ImageMetadataReader.ReadMetadata(stream).OfType<XmpDirectory>().FirstOrDefault();
If your stream is seekable, you might be able to seek back to the origin (using the Seek method, or by setting Position to zero.)
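The rewind step can be sketched as a small helper. `StreamRewind` and `EnsureSeekable` are hypothetical names, not part of MetadataExtractor or xmp-sharp. For a non-seekable stream the sketch copies whatever has not been read yet into memory, so it has to be called before anything consumes the stream.

```csharp
using System.IO;

// Hypothetical helper for reading the same image data more than once.
public static class StreamRewind
{
    // Returns a stream positioned at 0 that can be rewound again later.
    public static Stream EnsureSeekable(Stream input)
    {
        if (input.CanSeek)
        {
            // Seekable: just go back to the start
            input.Position = 0;
            return input;
        }
        // Not seekable: buffer the remaining bytes in memory
        var copy = new MemoryStream();
        input.CopyTo(copy);
        copy.Position = 0;
        return copy;
    }
}
```

In the Resize method above, one option would be to call this once at the top, pass the result to Image.FromStream, and then set Position back to 0 before handing the same stream to ImageMetadataReader.ReadMetadata.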
I've just started using ImageResizer to create thumbnails for my images using the code from their website below.
private void CreateThumbnail()
{
Dictionary<string, string> versions = new Dictionary<string, string>();
//Define the versions to generate
versions.Add("_thumb", "width=100&height=100&crop=auto&format=jpg"); //Crop to square thumbnail
versions.Add("_medium", "maxwidth=100&maxheight=100&format=jpg"); //Fit inside 100x100 area, jpeg
versions.Add("_large", "maxwidth=1900&maxheight=1900&format=jpg"); //Fit inside 1900x1900 area
//Loop through each uploaded file
foreach (string fileKey in HttpContext.Current.Request.Files.Keys)
{
HttpPostedFile file = HttpContext.Current.Request.Files[fileKey];
if (file.ContentLength <= 0) continue; //Skip unused file controls.
//Get the physical path for the uploads folder and make sure it exists
string uploadFolder = MapPath("~/Images");
if (!Directory.Exists(uploadFolder)) Directory.CreateDirectory(uploadFolder);
//Generate each version
foreach (string suffix in versions.Keys)
{
//Generate a filename (GUIDs are best).
string fileName = Path.Combine(uploadFolder, "AssetID" + suffix);
//Let the image builder add the correct extension based on the output file type
fileName = ImageBuilder.Current.Build(file, fileName, new ResizeSettings(versions[suffix]), false, true);
}
}
}
However, when I apply this code to a PDF, it crashes with the error 'File may be corrupted, empty, or may contain a PNG image with a single dimension greater than 65,535 pixels.'
What changes do I need to make to enable resizing a PDF? I've gone through their documentation, and although it seems that it can create a thumbnail from a PDF, the examples all use images.
This is the list of Plugins including PdfRenderer
The PdfiumRenderer or PdfRenderer plugin (whichever you chose) is not installed. Thus, the primary decoder is failing to decode the image.
You must install a PDF plugin for this to work.
PdfiumRenderer is the better of the two.
See http://imageresizing.net/docs/v4/plugins/pdfiumrenderer
I have a pdf file with a cover that looks like the following:
Now, I need to remove the so-called 'galley marks' around the edges of the cover. I am using iTextSharp with C# and I need code using iTextSharp to create a new document with only the intended cover or use PdfStamper to remove that. Or any other solution using iTextSharp that would deliver the results.
I have been unable to find any good code samples in my search to this point.
Do you have to actually remove them or can you just crop them out? If you can just crop them out then the code below will work. If you have to actually remove them from the file then to the best of my knowledge there isn't a simple way to do that. Those objects aren't explicitly marked as meta-objects to the best of my knowledge. The only way I can think of to remove them would be to inspect everything and see if it fits into the document's active area.
Below is sample code that reads each page in the input file and finds the various boxes that might exist, trim, art and bleed. (See this page.)
As long as it finds at least one it sets the page's crop box to the first item in the list. In your case you might actually have to perform some logic to find the "smallest" of all of those items or you might be able to just know that "art" will always work for you. See the code for additional comments. This targets iTextSharp 5.4.0.0.
//Sample input file
var inputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Binder1.pdf");
//Sample output file
var outputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Cropped.pdf");
//Bind a reader to our input file
using (var r = new PdfReader(inputFile)) {
//Get the number of pages
var pageCount = r.NumberOfPages;
//See this for a list: http://api.itextpdf.com/itext/com/itextpdf/text/pdf/PdfReader.html#getBoxSize(int, java.lang.String)
var boxNames = new string[] { "trim", "art", "bleed" };
//We'll create a list of all possible boxes to pick from later
List<iTextSharp.text.Rectangle> boxes;
//Loop through each page
for (var i = 1; i <= pageCount; i++) {
//Initialize our list for this page
boxes = new List<iTextSharp.text.Rectangle>();
//Loop through the list of known boxes
for (var j = 0; j < boxNames.Length; j++) {
//If the box exists
if(r.GetBoxSize(i, boxNames[j]) != null){
//Add it to our collection
boxes.Add(r.GetBoxSize(i, boxNames[j]));
}
}
//If we found at least one box
if (boxes.Count > 0) {
//Get the page's entire dictionary
var dict = r.GetPageN(i);
//At this point we might want to apply some logic to find the "inner most" box if our trim/bleed/art aren't all the same
//I'm just hard-coding the first item in the list for demonstration purposes
//Set the page's crop box to the specified box
dict.Put(PdfName.CROPBOX, new PdfRectangle(boxes[0]));
}
}
//Create our output file
using (var fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
//Bind a stamper to our reader and output file
using(var stamper = new PdfStamper(r,fs)){
//We did all of our PDF manipulation above so we don't actually have to do anything here
}
}
}
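The "find the smallest box" logic mentioned above could look roughly like the following. This is only a sketch: `BoxPicker` and `SmallestByArea` are hypothetical names, the helper is deliberately generic so it does not depend on iTextSharp, and smallest area is used as a stand-in for "innermost" without checking that the chosen box is actually contained in the others.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, independent of iTextSharp.
public static class BoxPicker
{
    // Returns the item with the smallest width * height, or default(T) if the list is empty.
    public static T SmallestByArea<T>(IEnumerable<T> boxes, Func<T, float> width, Func<T, float> height)
    {
        T best = default(T);
        float bestArea = 0f;
        bool found = false;
        foreach (T box in boxes)
        {
            float area = width(box) * height(box);
            if (!found || area < bestArea)
            {
                best = box;
                bestArea = area;
                found = true;
            }
        }
        return best;
    }
}
```

With the list built in the loop above, the call would be along the lines of `BoxPicker.SmallestByArea(boxes, b => b.Width, b => b.Height)`, and the result would replace `boxes[0]` when setting the crop box.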
**Important: question answered after code block**
Platform:
C#, OpenXML SDK (2.5), .NET 4.0
What I'm trying to achieve
I've been trying to generate a pptx presentation from some data and images coming from my database.
Every generated file is reported as corrupted, even though it passes OOXML validation. I really don't know what else I could do.
What I've already tried
I tried to remove the images, the text, then I've commented the code that deletes the first (template) slide but nothing changes my final result: a corrupted file.
The error
When I try to open the file: "PowerPoint was unable to display some of the text, images, or objects on slides in the file, "filename.pptx", because they have become corrupted.
Affected slides have been replaced by blank slides in the presentation and it is not possible to recover the lost information. To ensure that the file can be opened in previous versions of PowerPoint, use the Save As command (File menu) and save the file with either the same or a new name."
Code
Here's the code I'm using to generate the PPTX:
void GenerateSlides(string fullPath, string path, IEnumerable<Data> data)
{
var slidePath = fullPath;
if (!Directory.Exists(path))
Directory.CreateDirectory(path);
// Copy the template file to generate new slides
File.Copy(string.Format("{0}{1}", path, "TemplateTF.pptx"), slidePath, true);
using (var presentationDocument = PresentationDocument.Open(slidePath, true))
{
var presentationPart = presentationDocument.PresentationPart;
var slideTemplate = (SlidePart)presentationPart.GetPartById("rId2");
// Retrieve the data to fill the slide part
int i = 1;
foreach (var singleData in data)
{
(...)
// Creates the new image
var newSlide = CloneSlidePart(presentationPart, slideTemplate);
var imgId = "rIdImg" + i;
var imagePart = newSlide.AddImagePart(ImagePartType.Jpeg, imgId);
var stream = new MemoryStream();
using (var file = File.Open(string.Format("{0}{1}"
, WebConfigurationManager.AppSettings["pathImages"]
, singleData.ImageName), FileMode.Open))
{
var buffer = new byte[file.Length];
file.Read(buffer, 0, (int)file.Length);
stream.Write(buffer, 0, buffer.Length);
imagePart.FeedData(new MemoryStream(buffer));
}
// Important method to swap the original image
SwapPhoto(newSlide, imgId);
i++;
InsertContent(newSlide, (...));
SwapPhoto(newSlide, imgId);
newSlide.Slide.Save();
}
DeleteTemplateSlide(presentationPart, slideTemplate);
presentationPart.Presentation.Save();
}
}
void SwapPhoto(SlidePart slidePart, string imgId)
{
var blip = slidePart.Slide.Descendants<Drawing.Blip>().First();
blip.Embed = imgId;
slidePart.Slide.Save();
}
void DeleteTemplateSlide(PresentationPart presentationPart, SlidePart slideTemplate)
{
var slideIdList = presentationPart.Presentation.SlideIdList;
foreach (SlideId slideId in slideIdList.ChildElements)
{
if (slideId.RelationshipId.Value.Equals("rId2"))
{
slideIdList.RemoveChild(slideId);
}
}
presentationPart.DeletePart(slideTemplate);
}
SlidePart CloneSlidePart(PresentationPart presentationPart, SlidePart slideTemplate)
{
var newSlidePart = presentationPart.AddNewPart<SlidePart>("newSlide" + i);
i++;
newSlidePart.FeedData(slideTemplate.GetStream(FileMode.Open));
newSlidePart.AddPart(slideTemplate.SlideLayoutPart);
var slideIdList = presentationPart.Presentation.SlideIdList;
uint maxSlideId = 1;
SlideId prevSlideId = null;
foreach (SlideId slideId in slideIdList.ChildElements)
{
if (slideId.Id > maxSlideId)
{
maxSlideId = slideId.Id;
prevSlideId = slideId;
}
}
maxSlideId++;
var newSlideId = slideIdList.InsertAfter(new SlideId(), prevSlideId);
newSlideId.Id = maxSlideId;
newSlideId.RelationshipId = presentationPart.GetIdOfPart(newSlidePart);
return newSlidePart;
}
void InsertContent(SlidePart slidePart, (...))
{
SwapPlaceholderText(slidePart, "Title", "ReplacementString1");
SwapPlaceholderText(slidePart, "Text", "ReplacementString2");
}
void SwapPlaceholderText(SlidePart slidePart, string placeholder, string value)
{
var textList = slidePart.Slide.Descendants<Drawing.Text>().Where(
t => t.Text.Equals(placeholder)).ToList();
foreach (Drawing.Text text in textList)
{
text.Text = value;
}
}
Answer
Ok, I realized how different MS Office versions can be.
a) If I try to open the .pptx file with Office 2013: error message + opens perfectly, no logo image nor slidepart showing any additional information
b) If I try to open the .pptx file with Office 2007: error message + empty slides, no information at all
c) If I try to open the .pptx file with Office 2010: error message + empty slides and the most important information I could ever have: corrupted icon in logo's place!!!
I removed the logo image from my template and voilà, the file is perfectly generated. Now, if I really NEED to add the logo image, I can do it programmatically.
Thanks! After one week trying to realize what the hell was happening, a great friend of mine opened the file using Office 2010, then I DID realize the logo image was corrupted in my original template file.
Thanks :)
I have been developing a web application with ASP.NET and I have some questions about SharpZipLib. I have a file called Template.odt (from Open Office). This file is a compressed package (like docx) with other files inside it (manifest, XML, images, etc.). I need to open this file, change the files content.xml and styles.xml, save the result as another .odt file and give it to my client. But I'm not sure whether we can use temporary files, so I was thinking about how to do this using MemoryStream.
Look what I got:
protected byte[] GetReport() {
Stream inputStream = File.OpenRead(Server.MapPath("~/Odt/Template.odt"));
var zipInputStream = new ZipInputStream(inputStream);
var outputStream = new MemoryStream();
var zipOutputStream = new ZipOutputStream(outputStream);
ZipEntry entry = zipInputStream.GetNextEntry();
while (entry != null) {
if (entry.Name == "content.xml")
// how change the content ?
else if (entry.Name == "styles.xml")
// how change the content ?
// how to add it or create folders in the output ?
zipOutputStream.Write( ??? );
entry = zipInputStream.GetNextEntry();
}
zipOutputStream.Flush();
return outputStream.ToArray();
}
I'm not sure if it's right but I think it's on the way.
I tried reading ExtraData from the ZipEntry instance, but it is null. Is that normal?
Can someone help me?
Thank you
An example of how you can update ZIP files in memory can be found here:
http://wiki.sharpdevelop.net/SharpZipLib_Updating.ashx#Updating_a_zip_file_in_memory_1
In your case, you probably have to load content.xml into a XmlDocument or XDocument to modify it - but that depends on what you are trying to change exactly.
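As a rough illustration of the in-memory approach, here is a sketch that uses the framework's own System.IO.Compression classes instead of SharpZipLib (a deliberate substitution, not what the linked wiki page shows). `OdtTemplate` and `RewriteXmlEntry` are hypothetical names. It assumes the entry is well-formed XML, and the output should be checked in your ODF application, since ODF packages have their own rules about the mimetype entry.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

// Hypothetical helper built on System.IO.Compression, not on SharpZipLib.
public static class OdtTemplate
{
    // Copies the package into memory, lets the caller edit one XML entry, and returns the new package bytes.
    public static byte[] RewriteXmlEntry(byte[] package, string entryName, Action<XDocument> edit)
    {
        using (var buffer = new MemoryStream())
        {
            // Expandable in-memory copy of the template; no temporary files involved
            buffer.Write(package, 0, package.Length);
            using (var zip = new ZipArchive(buffer, ZipArchiveMode.Update, leaveOpen: true))
            {
                ZipArchiveEntry entry = zip.GetEntry(entryName);
                if (entry == null) throw new FileNotFoundException("Entry not found", entryName);
                using (Stream s = entry.Open())
                {
                    XDocument doc = XDocument.Load(s);
                    edit(doc);
                    // Truncate the old content before writing the modified XML
                    s.Position = 0;
                    s.SetLength(0);
                    doc.Save(s);
                }
            }
            // Disposing the archive above writes the changes back into buffer
            return buffer.ToArray();
        }
    }
}
```

In GetReport, this could be called as `OdtTemplate.RewriteXmlEntry(File.ReadAllBytes(Server.MapPath("~/Odt/Template.odt")), "content.xml", doc => { /* modify doc here */ })`, and again on the result for styles.xml.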
As a side note: when using streams, make sure you dispose of them. The easiest way is to wrap the operation in a using statement:
using(var inputStream = File.OpenRead(Server.MapPath("~/Odt/Template.odt")))
{
// ...
}
More information on that: http://www.codeproject.com/Articles/6564/Understanding-the-using-statement-in-C