This should be a pretty trivial programming task in C#, however after I have searched a while I simply cannot find anything relevant on how to remove metadata.
I want to remove jpg and png image metadata such as: folder path, shared with, owner and computer.
My application is an MVC 4 application. In my website users can upload an image I get this image at this ActionResult method
if (image != null)
{
photo.ImageFileName = image.FileName;
photo.ImageMimeType = image.ContentType;
photo.PhotoFile = new byte[image.ContentLength];
image.InputStream.Read(photo.PhotoFile, 0, image.ContentLength);
}
Photo is a property in the model, goes like this.
public byte[] PhotoFile { get; set; }
I imagine the way to remove above mentioned metadata or just all metadata, would be to use some coding like this
if (image != null)
{
image = image.RemoveAllMetaData; !!!
I dont mind using some 3rd party dll as long as it is compatible with NET 4.
Thanks.
'Metadata' here is a bit ambiguous--Do you mean the data which is required for a viewer to properly determine the image format so it can be displayed, saving only the raw image data? Or, more likely, do you mean the extra information, such as author, camera type, GPS location, etc, that is often added via the EXIF tags?
If you mean something like the EXIF data, there's a lot of programming material already on the web about how to add/modify/remove EXIF tags, and even some apps which already strips such tags: http://www.steelbytes.com/?mid=30 for example.
If you mean you just want the raw image data, you'll probably have to read and process the image first, since both JPEG and PNG do not contain simply the raw image data; It's encoded with various methods--which is why they contain metadata to tell you how to decode it in the first place. You'll have to learn/explore the JPEG and PNG data formats to extract the original raw image data (or a reasonable facsimile in the case of a "lossy" encoding).
All the above is well-documented on various websites which can be found on Google, and many include image manipulation libraries which can handle these chores for you. I suspect you just didn't know to search for something like "JPEG PNG EXIF METADATA".
BTW, EXIF applies to JPEG's, where EXIF is, loosely (and not fully technically correct) an addition of data (extension) to the end of the JPEG file, which can usually simply be truncated to remove. A quick Google search for me turned up something like libexif.sourceforge.net and other similar results.
I'm not entirely certain about the PNG format, but I believe the PNG format (which does call such items "metadata" as well) was written to include such data as part of the file format rather than an "extension" tagged on after the fact like EXIF is. PNG, however, is open source, and you can obtain libraries and code for manipulating them from the PNG website (www.libpng.org).
There's an app for that but it's written in Perl. It doesn't recompress the image and it's here http://www.sno.phy.queensu.ca/~phil/exiftool
Found it in this thread
How to remove EXIF data without recompressing the JPEG?
Do what all the social media websites do. Create a new image file, stream in the image byte data and use the file you created than the original one that was uploaded. Of course, now you will need to find out the original image's color depth and so on so that the image you create is not of a lower quality -- unless you need to do a disk or image resize as well.
Related
Looking for a way to convert iTextSharp.text.pdf.BarcodeQRCode to System.Drawing.Image
This is what I have so far...
public System.Drawing.Image GetQRCode(string content)
{
iTextSharp.text.pdf.BarcodeQRCode qrcode = new iTextSharp.text.pdf.BarcodeQRCode(content, 115, 115, null);
iTextSharp.text.Image img = qrcode.GetImage();
MemoryStream ms = new MemoryStream(img.OriginalData);
return System.Drawing.Image.FromStream(ms);
}
In line 3 above using img.OriginalData returns null
Using img.RawData on line 3 instead thows invalid parameter error on line 4.
I've googled some of the code samples on how to perform the thing you want and your code (the "OriginalData" approach) is basicaly the same: https://csharp.hotexamples.com/examples/iTextSharp.text.pdf/BarcodeQRCode/-/php-barcodeqrcode-class-examples.html .
However, I don't see how it could work. From my investigations of BarcodeQRCode#getImage it seems that OriginalData is not set while processing such a barcode, so it will always be null.
More than that, the code you mention belongs to iText 5, which is end of life and no longer maintained (with an exception of considerable security fixes), so it's recommended to update to iText 7.
As for iText 7, I do see how to achieve the same in Java, since barcode classes do have a createAwtImage method there. .NET, on the other hand, lacks such a functionality, so I'd day that one unfortunately couldn't do it in .NET.
There are some good reasons for that. iText's Images (and a barcode could be easily converted to an iText's Image object as shown here: https://kb.itextpdf.com/home/it7kb/faq/how-to-generate-2d-barcode-as-vector-image) represent a PDF's XObject. In PDF syntax, an image file (jpg, png, etc.) is an XObject with the raw image data stored inside. However, an XObject can also contain PDF syntaxt content (it is not just used for image files). So to render such a content one needs to process the data from PDF syntax to image syntax, which is not that easy. There are some means in Java's awt to do so, that's why it's implemented in Java. As for .NET, since there is no out-of-the-box means to convert PDF images to System.Drawing.Image, it was decided not to implement it.
To conclude, there is another iText product, pdfRender, which allows one to convert PDF files (and you could create a page just for a barcode) to images. Perhaps you might want to play with it: https://itextpdf.com/en/products/itext-7/convert-pdf-to-image-pdfrender
I am trying to validate an image submitted to backend in Base64 string format by parsing it into an Image object, extracting it from the same Image object and finally comparing input byte array and output byte array assuming these two should be the same or there was something wrong in the input image. Here is the code:
private void UpdatePhoto(string photoBase64)
{
var imageDataInBytes = Convert.FromBase64String(photoBase64);
ValidateImageContent(imageDataInBytes);
}
private void ValidateImageContent(byte[] imageDataInBytes)
{
using (var inputMem = new MemoryStream(imageDataInBytes))
{
var img = Image.FromStream(inputMem, false, true);
using (MemoryStream outputMemStream = new MemoryStream())
{
img.Save(outputMemStream, img.RawFormat);
var outputSerialized = outputMemStream.ToArray();
if (!outputSerialized.SequenceEqual(imageDataInBytes))
throw new Exception("Invalid image. Identified extra data in the input. Please upload another photo.");
}
}
}
and it fails on an image that I know is a valid one.
Is my assumption wrong that output of Image.Save must be the same as what Image.FromStream is fed with? Is there a way to correct this logic to achieve this way of validation correctly?
If you compare the original image with the created image, you will notice a few differences in the metadata: For my sample image, I could observe that some metadata was stripped (XMP data was completely removed). In addition, while the EXIF data was preserved, the endianness it is written in was reversed from little endian to big endian. This alone explains why the data won’t match.
In my example, the actual image data was identical but you won’t be able to tell easily from just looking at the bytes.
If you wanted to produce a result identical to the source, you would have to produce the metadata in the exact same way as the source did. You won’t be able to do so without actually looking closely at the metadata of the original photo though. .NET’s Image simply isn’t able to pertain all the metadata that a file could contain. And even if you were able to extract all the metadata and store it in the right format again, there are lots of fine nuances between metadata serializers that make it very difficult to produce the exact same result.
So if you wanted to compare the images, you should probably strip the metadata and just compare the image data. But then, when you think about how you save the image (Raw), then you will just get the exact same blob of data again, so I wouldn’t expect differences there.
I tried to use magickimage.net (C#) to convert HEIC image files (from iPhone 7) to JPG format.
Using all the default values (see below), though it converts successfully -- however, when comparing the converted image, vs, if I copy the files from iPhone directly to my computer as JPG, I noticed the images converted from magickimage look more "pale", lacking vivid color (saturation, I would say).
Just wondering if anyone happens to know the right settings to improve that?
using (MagickImage image = new MagickImage(files[i]))
{
image.Format = MagickFormat.Jpeg;
image.Write(MyFile.ReplaceFileExtension(files[i], "jpg"));
}
Just a guess, most likely not apply the correct color profile.
Check the image if it has Color properties ie. Table VII. Image Properties and Figure 1 in the Technical spec for HEIF is at:
https://nokiatech.github.io/heif/technical.html
Also I would check if the following display the image correctly.
https://github.com/liuziangexit/HEIF-Utility/tree/master/HEIF%20Utility%20English
I need to convert image files to PDF without using third party libraries in C#. The images can be in any format like (.jpg, .png, .jpeg, .tiff).
I am successfully able to do this with the help of itextsharp; here is the code.
string value = string.Empty;//value contains the data from a json file
List<string> sampleData;
public void convertdata()
{
//sampleData = Newtonsoft.Json.JsonConvert.DeserializeObject<List<string>>(value);
var jsonD = System.IO.File.ReadAllLines(#"json.txt");
sampleData = Newtonsoft.Json.JsonConvert.DeserializeObject<List<string>>(jsonD[0]);
Document document = new Document();
using (var stream = new FileStream("test111.pdf", FileMode.Create, FileAccess.Write, FileShare.None))
{
PdfWriter.GetInstance(document, stream);
document.Open();
foreach (var item in sampleData)
{
newdata = Convert.FromBase64String(item);
var image = iTextSharp.text.Image.GetInstance(newdata);
document.Add(image);
Console.WriteLine("Conversion done check folder");
}
document.Close();
}
But now I need to perform the same without using third party library.
I have searched the internet but I am unable to get something that can suggest a proper answer. All I am getting is to use it with either "itextsharp" or "PdfSharp" or with the "GhostScriptApi".
Would someone suggest a possible solution?
This is doable but not practical in the sense that it would very likely take way too much time for you to implement. The general procedure is:
Open the image file format
Either copy the encoded bytes verbatim to a stream in a PDF document you have created or decode the image data and re-encode it in a PDF stream (whether it's the former or latter depends on the image format)
Save the PDF
This looks easy (it's only three points after all :-)) but when you start to investigate you'll see that it's very complicated.
First of all you need to understand enough of the PDF specification to write a new PDF file from scratch, doing all of the right things. The PDF specification is way over 1000 pages by now; you don't need all of it but you need to support a good portion of it to write a proper PDF document.
Secondly you will need to understand every image file format you want to support. That by itself is not trivial (the TIFF file format for example is so broad that it's a nightmare to support a reasonable fraction of TIFF files out there). In some cases you'll be able to simply copy the bulk of an image file format into your PDF document (jpeg files fall in that category for example), that's a complication you want to support because uncompressing the JPEG file and then recompressing it in a PDF stream will cause quality loss.
So... possible? Yes. Plausible? No. Unless you have gotten lots and lots of time to complete this project.
The structure of the simpliest PDF document with one single page and one single image is the following:
- pdf header
- pdf document catalog
- pages info
- image
- image header
- image data
- page
- reference to image
- list of references to objects inside pdf document
Check this Python code that is doing the following steps to convert image to PDF:
Writes PDF header;
Checks image data to find which filter to use. You should better select just one format like FlateDecode codec (used by PDF to compress images without loss);
Writes "catalog" object which is basically is the array of references to page objects.
Writes image object header;
Writes image data (pixels by pixels, converted to the given codec format) as the "stream" object in pdf;
Writes "page" object which contains "image" object;
Writes "trailer" section with the set of references to objects inside PDF and their starting offsets. PDF format stores references of objects at the end of PDF document.
I would write my own ASP.NET Web Service or Web API service and call it within the app :)
I want to add image to pdf file. I'm using iTextSharp for this.
I have the following code:
var imageBanner = iTextSharp.text.Image.GetInstance(bannerImagePath);
The problem in that the RawData property is equal NULL for jpg images, but for png is all ok.
Please read chapter 10 of the book "iText in Action". The Image class is abstract. It has different implementations for different image types. Some image types exist in PDF. For instance: a JPG (DCTDecode) can be copied literally into a PDF. File types such as PNG and GIF don't exist in PDF, so they need to be converted to raw data first; they are compressed (FlateDecode) later on in the process.
As there's absolutely no need for any 'processing' when dealing with JPGs, no memory is wasted on creating a raw image. It would be bad if RawData weren't null, hence my question: why is it a problem for you? you should be happy that RawData is null!