I'm using C# with ASP.NET 2.0. I'm trying to open an image file, read (and change) the XMP header, and close it back up again. I can't upgrade ASP.NET, so WIC is out, and I just can't figure out how to get this working.
Here's what I have so far:
Bitmap bmp = new Bitmap(Server.MapPath(imageFile));
MemoryStream ms = new MemoryStream();
StreamReader sr = new StreamReader(Server.MapPath(imageFile));
// [stuff with find and replace here]
byte[] data = ToByteArray(sr.ReadToEnd());
ms = new MemoryStream(data);
originalImage = System.Drawing.Image.FromStream(ms);
Any suggestions?
How about this kinda thing?
byte[] data = File.ReadAllBytes(path);
// ... find & replace bit here ...
File.WriteAllBytes(path, data);
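The find & replace part could look roughly like this; a sketch only, where "OldTitle"/"NewTitle" are placeholder values and the new value is assumed to have the same byte length as the old one so the file layout doesn't shift:
byte[] data = File.ReadAllBytes(path);

byte[] oldValue = Encoding.UTF8.GetBytes("OldTitle"); // placeholder: the text to find
byte[] newValue = Encoding.UTF8.GetBytes("NewTitle"); // placeholder: must be the same byte length here

for (int i = 0; i <= data.Length - oldValue.Length; i++)
{
    bool match = true;
    for (int j = 0; j < oldValue.Length && match; j++)
    {
        if (data[i + j] != oldValue[j]) match = false;
    }
    if (match)
    {
        // Overwrite the match in place; the file size stays the same.
        Array.Copy(newValue, 0, data, i, newValue.Length);
        break; // first occurrence only
    }
}

File.WriteAllBytes(path, data);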
Also, I really recommend against using System.Drawing.Bitmap in an ASP.NET process, as it leaks memory and will crash/randomly fail every now and again (even MS admits this).
Here's the bit from MS about why System.Drawing.Bitmap isn't stable:
http://msdn.microsoft.com/en-us/library/system.drawing.aspx
"Caution:
Classes within the System.Drawing namespace are not supported for use within a Windows or ASP.NET service. Attempting to use these classes from within one of these application types may produce unexpected problems, such as diminished service performance and run-time exceptions."
Part 1 of the XMP spec 2012, page 10 specifically talks about how to edit a file in place without needing to understand the surrounding format (although they do suggest this as a last resort). The embedded XMP packet looks like this:
<?xpacket begin="■" id="W5M0MpCehiHzreSzNTczkc9d"?>
... the serialized XMP as described above: ...
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf= ...>
...
</rdf:RDF>
</x:xmpmeta>
... XML whitespace as padding ...
<?xpacket end="w"?>
In this example, ‘■’ represents the Unicode “zero width non-breaking space character” (U+FEFF) used as a byte-order marker.
XMP Spec 2010, Part 3, page 12 also gives the specific byte patterns (UTF-8, UTF-16, big/little-endian) to look for when scanning the bytes. This complements Chris' answer about reading the file in as a giant byte stream.
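For example, a minimal sketch of scanning a UTF-8 encoded file for the packet wrapper shown above (the IndexOf helper is illustrative, not from any library):
// Illustrative helper: find a byte pattern in a buffer (returns -1 if not found).
static int IndexOf(byte[] haystack, byte[] needle, int start)
{
    for (int i = start; i <= haystack.Length - needle.Length; i++)
    {
        int j = 0;
        while (j < needle.Length && haystack[i + j] == needle[j]) j++;
        if (j == needle.Length) return i;
    }
    return -1;
}

// Locate the XMP packet wrapper in a UTF-8 encoded file.
byte[] data = File.ReadAllBytes(path);
byte[] begin = Encoding.UTF8.GetBytes("<?xpacket begin=");
byte[] end = Encoding.UTF8.GetBytes("<?xpacket end=");

int packetStart = IndexOf(data, begin, 0);
int packetEnd = packetStart >= 0 ? IndexOf(data, end, packetStart) : -1;
// Everything from packetStart up to (and including) the end processing instruction is the XMP packet.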
You can use the following functions to read/write the binary data:
public byte[] GetBinaryData(string path, int bufferSize)
{
    MemoryStream ms = new MemoryStream();
    using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read))
    {
        int bytesRead;
        byte[] buffer = new byte[bufferSize];
        // Read the file in bufferSize chunks until end of file.
        while ((bytesRead = fs.Read(buffer, 0, bufferSize)) > 0)
        {
            ms.Write(buffer, 0, bytesRead);
        }
    }
    return ms.ToArray();
}
public void SaveBinaryData(string path, byte[] data, int bufferSize)
{
    using (FileStream fs = File.Open(path, FileMode.Create, FileAccess.Write))
    {
        int totalBytesSaved = 0;
        // Write the array out in bufferSize chunks.
        while (totalBytesSaved < data.Length)
        {
            int remainingBytes = Math.Min(bufferSize, data.Length - totalBytesSaved);
            fs.Write(data, totalBytesSaved, remainingBytes);
            totalBytesSaved += remainingBytes;
        }
    }
}
However, loading entire images into memory would use quite a bit of RAM. I don't know much about XMP headers, but if possible you should (see the sketch after this list):
Load only the headers in memory
Manipulate the headers in memory
Write the headers to a new file
Copy the remaining data from the original file
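A rough sketch of that flow, assuming you already know how many bytes at the start of the file make up the header section (headerLength and ManipulateHeader are hypothetical placeholders for your own values and logic):
using (FileStream input = File.OpenRead(sourcePath))
using (FileStream output = File.Create(targetPath))
{
    // 1-2. Load only the header bytes and manipulate them in memory.
    byte[] header = new byte[headerLength];
    int read = 0;
    while (read < header.Length)
    {
        int n = input.Read(header, read, header.Length - read);
        if (n == 0) break;
        read += n;
    }
    byte[] newHeader = ManipulateHeader(header); // your find & replace logic

    // 3. Write the (possibly modified) headers to the new file.
    output.Write(newHeader, 0, newHeader.Length);

    // 4. Copy the remaining data from the original file unchanged.
    byte[] buffer = new byte[8192];
    int bytesRead;
    while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, bytesRead);
    }
}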
Related
I have a method that decompresses a *.gz file:
using (FileStream originalFileStream = new FileStream(gztempfilename, FileMode.Open, FileAccess.Read))
{
using (FileStream decompressedFileStream = new FileStream(outputtempfilename, FileMode.Create, FileAccess.Write))
{
using (GZipStream decompressionStream = new GZipStream(originalFileStream, CompressionMode.Decompress))
{
decompressionStream.CopyTo(decompressedFileStream);
}
}
}
It worked perfectly, but recently I received a pack of files with the wrong size:
When I open them with 7-Zip they have Packed Size ~ 1,600,000 and Size = 7 (it should be ~20,000,000).
So when I extract them using this code I get only part of the file. But when I extract the same file using 7-Zip I get the full file.
How can I handle this situation in my code?
My guess is that the other end makes a mistake when GZipping the files. It looks like it does not set the ISIZE bytes correctly.
The ISIZE bytes are the last four bytes of a valid GZip file and come after a 32-bit CRC value which in turn comes directly after the compressed data bytes.
7-Zip seems to be robust against such mistakes, whereas GZipStream is not. It is odd, however, that 7-Zip is not showing you any errors. It should show you (tested with 7-Zip 16.02 x64/Win7)...
CRC error in case the size is simply wrong,
"Unexpected end of data" in case some or all of the ISIZE bytes are cut off,
"There are some data after end of the payload data" in case there is more data following the ISIZE bytes.
7-Zip always uses the last four bytes of the packed file to determine the size of the original unpacked file without checking if the file is valid and whether the bytes read for that are actually the ISIZE bytes.
You can verify this by checking those last four bytes of the GZipped file with a hex viewer. For your example they should be exactly 07 00 00 00.
If you know the exact size of the unpacked original file you could replace those bytes so that they specify the correct size. For instance, if the unpacked file's size is 20,000,078, which is 01312D4E in hex (0-padded to eight digits), those bytes should be 4E 2D 31 01.
In case you don't know the exact size you can try replacing them with the maximum value, i.e. FF FF FF FF.
After that try your unpack code again.
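A minimal sketch of that patch, assuming you know (or guess) the unpacked size; gztempfilename is the same file the question's code opens:
byte[] data = File.ReadAllBytes(gztempfilename);
uint newSize = 20000078;                            // the known unpacked size, or 0xFFFFFFFF if unknown
byte[] sizeBytes = BitConverter.GetBytes(newSize);  // little-endian on x86/x64, matching the gzip ISIZE layout
Array.Copy(sizeBytes, 0, data, data.Length - 4, 4); // overwrite the last four bytes (the ISIZE field)
File.WriteAllBytes(gztempfilename, data);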
This is obviously only a hacky solution to your problem. Better try fixing the code that GZips the files you receive or try to find a library that is more robust than GZipStream.
I've used ICSharpCode.SharpZipLib.GZip.GZipInputStream from this library instead of System.IO.Compression.GZipStream and it helped.
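For reference, the drop-in change would look roughly like this (a sketch that assumes the SharpZipLib package is referenced; the variable names match the question's code):
// needs: using ICSharpCode.SharpZipLib.GZip;
using (FileStream originalFileStream = new FileStream(gztempfilename, FileMode.Open, FileAccess.Read))
using (FileStream decompressedFileStream = new FileStream(outputtempfilename, FileMode.Create, FileAccess.Write))
using (GZipInputStream decompressionStream = new GZipInputStream(originalFileStream))
{
    // GZipInputStream is a Stream, so the same CopyTo call works here.
    decompressionStream.CopyTo(decompressedFileStream);
}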
Did you try this to check the size? I.e.:
byte[] bArray;
using (FileStream f = new FileStream(tempFile, FileMode.Open))
{
    bArray = new byte[f.Length];
    f.Read(bArray, 0, (int)f.Length);
}
try:
GZipStream uncompressed = new GZipStream(streamIn, CompressionMode.Decompress, true);
FileStream streamOut = new FileStream(tempDoc[0], FileMode.Create, FileAccess.Write, FileShare.None);
uncompressed.CopyTo(streamOut); // copy the decompressed bytes into the output file
Looks like this is some sort of bug in GZipStream (it does not write the original file length into the end of the gz file).
You need to change the way you compress your files using GZipStream.
This way it will work:
inputBytes = Encoding.UTF8.GetBytes(output);
using (var outputStream = new MemoryStream())
{
using (var gZipStream = new GZipStream(outputStream, CompressionMode.Compress))
gZipStream.Write(inputBytes, 0, inputBytes.Length);
System.IO.File.WriteAllBytes("file.xml.gz", outputStream.ToArray());
}
And this way will cause the error you have (no matter whether you call Flush() or not):
inputBytes = Encoding.UTF8.GetBytes(output);
using (var outputStream = new MemoryStream())
{
using (var gZipStream = new GZipStream(outputStream, CompressionMode.Compress))
{
gZipStream.Write(inputBytes, 0, inputBytes.Length);
System.IO.File.WriteAllBytes("file.xml.gz", outputStream.ToArray());
}
}
You might need to call decompressedStream.Seek() after closing the gZip stream.
I have a big amount of data, around 5 GB, in the form of bytes.
I need to store this data in a file, ServerData.xml. The data should first be converted into a string and then saved to the file so that we can perform operations on the file.
I used the code below to convert the stream of bytes to a string and then save it in the file.
private const string fileName = "ServerData.xml";
public void ProcessBuffer(byte[] receiveBuffer, int bytes)
{
if (!File.Exists(fileName))
{
using (File.Create(fileName)) { };
}
TextWriter tw = new StreamWriter(fileName, true);
tw.Write(Encoding.UTF8.GetString(receiveBuffer).TrimEnd((Char)0));
tw.Close();
}
Is this the right way?
Or please suggest a better way, so that there won't be any memory issues in the future.
The code in your question can only work if ProcessBuffer is always called with a UTF-8 encoded text that is broken on code point boundaries. That seems pretty unlikely to me, so I would expect that you encounter errors when decoding to text.
However, decoding to text and then writing is rather pointless and indeed counter-productive. The bytes are already UTF-8 encoded. Write them directly to file as they arrive from the socket. Don't perform any processing of them. When you come to read the XML using XmlReader, the parser will read the encoding as UTF-8 from the document's XML declaration, and be able to decode the rest of the document. I am assuming that the document's XML declaration specifies UTF-8, but that seems highly likely. You should check.
You should get rid of the text writer which is no use to you for writing bytes. Write the bytes directly to a file stream. And try to avoid opening and closing the file repeatedly. That's very inefficient. Open and close the file exactly once.
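A rough sketch of that shape (the BeginReceive/EndReceive method names are mine, and it assumes ProcessBuffer is always called from a single thread):
// Open the file once, append each received buffer as raw bytes,
// and dispose the stream only when the transfer is finished.
private FileStream _output;

public void BeginReceive()
{
    _output = new FileStream("ServerData.xml", FileMode.Create, FileAccess.Write);
}

public void ProcessBuffer(byte[] receiveBuffer, int bytes)
{
    _output.Write(receiveBuffer, 0, bytes); // no decoding, just the bytes that arrived
}

public void EndReceive()
{
    _output.Dispose();
}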
Why do you need to convert it to a string?
using System.IO;
public static void WriteBytes(byte[] bytes, string filename)
{
using (FileStream fs = new FileStream(filename, FileMode.OpenOrCreate))
using (BinaryWriter writer = new BinaryWriter(fs, Encoding.UTF8))
{
writer.Write(bytes);
}
}
You can simply write these bytes to a file using FileStream:
public void ProcessBuffer(byte[] receivedBuffer, int bytes)
{
using (var fileStream = new FileStream(fileName, FileMode.Create)) // overwrites file
{
fileStream.Write(receivedBuffer, 0, bytes);
}
}
Update: You won't be able to work with such a big XML document if you don't have enough resources. I would suggest reformatting this file. For example, I would parse this XML and insert data into a SQL database. Then, you can easily operate with such amounts of data.
I would prefer to write all the bytes to a file. When reading, convert them to a string and then to XML using XDocument, XElement, etc. By writing bytes to the file you save space, and it is efficient.
Instead of using FileStream, I would prefer the File.WriteAllBytes method.
private const string fileName = "ServerData.xml";
public void ProcessBuffer(byte[] receiveBuffer, int bytes)
{
    File.WriteAllBytes(fileName, receiveBuffer);

    // And when reading:
    var data = File.ReadAllBytes(fileName);
    var binaryReader = new BinaryReader(new MemoryStream(data));
    // Parse strings and build the XML, e.g.
    binaryReader.ReadString();
}
I have read the posts on this subject but none of them explains it to me clearly enough to be able to fix the problem.
I am trying to upload a file from a local directory to the server.
Here is my code:
string fullPath = Path.Combine(
AppDomain.CurrentDomain.BaseDirectory + #"Images\Readings", PhotoFileName);
Stream s = System.IO.File.OpenRead(fileUpload);
byte[] buffer = new byte[s.Length];
s.Read(buffer, 0, Convert.ToInt32(s.Length));
using (FileStream fs = new FileStream(fullPath, FileMode.Create))
{
fs.Write(buffer, 0, Convert.ToInt32(fs.Length));
Bitmap bmp = new Bitmap((Stream)fs);
bmp.Save(fs, ImageFormat.Jpeg);
}
I keep on getting an Argument Exception: "Parameter is not valid" on line:
Bitmap bmp = new Bitmap((Stream)fs);
Can anyone explain this to me, please?
There are at least two problems, probably three. First, your copying code is broken:
byte[] buffer = new byte[s.Length];
s.Read(buffer, 0, Convert.ToInt32(s.Length));
You've assumed that this will read all of the data in a single Read call, and ignored the return value for Read. Generally, you'd need to loop round, reading data and writing it (the amount you've just read) to the output stream, until you read the end. However, as of .NET 4, Stream.CopyTo makes this much simpler.
Next is how you're creating the bitmap:
using (FileStream fs = new FileStream(fullPath, FileMode.Create))
{
fs.Write(buffer, 0, Convert.ToInt32(fs.Length));
Bitmap bmp = new Bitmap((Stream)fs);
bmp.Save(fs, ImageFormat.Jpeg);
}
You're trying to read from the stream when you've just written to it - but without "rewinding"... so there's no more data left to read.
Finally, I would strongly advise against using Bitmap.Save to write to the same stream that you're loading the bitmap from. Bitmap will keep a stream open, and read from it when it needs to - if you're trying to write to it at the same time, that could be very confusing.
It's not clear why you're using Bitmap at all, to be honest - if you're just trying to save the file that was uploaded, without any changes, just use:
using (Stream input = File.OpenRead(fileUpload),
output = File.Create(fullPath))
{
input.CopyTo(output);
}
This is assuming that fileUpload really is an appropriate filename - it's not clear why you haven't just written the file to the place you want to write it to straight away, to be honest. Or use File.Copy to copy the file. The above code should work with any stream, so you can change it to save the stream straight from the request...
I am writing an XML file of more than 1 GB, but at write time I want to compress the file so that its size is reduced and xmlDoc.Load(fileName) can load it in the minimum time.
My code for writing the XML file is:
using (FileStream fileStream = new FileStream(_logFilePath, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite))
{
xmlDoc.Load(fileStream);
int byteLenght = fileStream.ReadByte();
byte[] intBytes = BitConverter.GetBytes(byteLenght);
intBytes = Compress(intBytes);
xmlDoc.DocumentElement.AppendChild(newelement);
fileStream.SetLength(0);
xmlDoc.Save(fileStream);
}
and for compression:
private static byte[] Compress(byte[] data)
{
byte[] retVal;
using (MemoryStream compressedMemoryStream = new MemoryStream())
{
DeflateStream compressStream = new DeflateStream(compressedMemoryStream, CompressionMode.Compress, true);
compressStream.Write(data, 0, data.Length);
compressStream.Close();
retVal = new byte[compressedMemoryStream.Length];
compressedMemoryStream.Position = 0L;
compressedMemoryStream.Read(retVal, 0, retVal.Length);
compressedMemoryStream.Close();
compressStream.Close();
}
return retVal;
}
But it does not compress the file.
Compressing the file on disk won't do much to improve the time spent loading the document, because the larger part of the time is in building up the object graph for the XmlDocument. That is so slow that it can take as long as (or longer than) reading the uncompressed XML from disk. Although compression can save time here, it's only a minor gain if a fast medium like an internal HDD is used.
If you want to improve performance working with large XML files, you'll need to use something like an XmlReader that streams the file instead of loading it all at once.
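For example, a minimal sketch with XmlReader (the element name "entry" is a placeholder, _logFilePath is the path from the question's code, and System.Xml is required):
using (XmlReader reader = XmlReader.Create(_logFilePath))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "entry")
        {
            // Handle one element at a time; the whole document never sits in memory.
        }
    }
}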
I have a web server which will read large binary files (several megabytes) into byte arrays. The server could be reading several files at the same time (different page requests), so I am looking for the most optimized way for doing this without taxing the CPU too much. Is the code below good enough?
public byte[] FileToByteArray(string fileName)
{
byte[] buff = null;
FileStream fs = new FileStream(fileName,
FileMode.Open,
FileAccess.Read);
BinaryReader br = new BinaryReader(fs);
long numBytes = new FileInfo(fileName).Length;
buff = br.ReadBytes((int) numBytes);
return buff;
}
Simply replace the whole thing with:
return File.ReadAllBytes(fileName);
However, if you are concerned about the memory consumption, you should not read the whole file into memory all at once at all. You should do that in chunks.
I might argue that the answer here generally is "don't". Unless you absolutely need all the data at once, consider using a Stream-based API (or some variant of reader / iterator). That is especially important when you have multiple parallel operations (as suggested by the question) to minimise system load and maximise throughput.
For example, if you are streaming data to a caller:
Stream dest = ...
using(Stream source = File.OpenRead(path)) {
byte[] buffer = new byte[2048];
int bytesRead;
while((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0) {
dest.Write(buffer, 0, bytesRead);
}
}
I would think this:
byte[] file = System.IO.File.ReadAllBytes(fileName);
Your code can be factored to this (in lieu of File.ReadAllBytes):
public byte[] ReadAllBytes(string fileName)
{
    byte[] buffer = null;
    using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
        buffer = new byte[fs.Length];
        int offset = 0;
        // Read can return fewer bytes than requested, so loop until the buffer is full.
        while (offset < buffer.Length)
        {
            int read = fs.Read(buffer, offset, buffer.Length - offset);
            if (read == 0) break;
            offset += read;
        }
    }
    return buffer;
}
Note the Int32.MaxValue file-size limitation imposed by the Read method. In other words, you can only read a 2 GB chunk at once.
Also note that the FileStream constructor has an overload whose last argument is a buffer size.
I would also suggest reading about FileStream and BufferedStream.
As always, a simple sample program to profile which approach is fastest will be most beneficial.
Also, your underlying hardware will have a large effect on performance. Are you using server-based hard disk drives with large caches and a RAID card with onboard memory cache? Or are you using a standard drive connected to the IDE port?
Depending on the frequency of operations, the size of the files, and the number of files you're looking at, there are other performance issues to take into consideration. One thing to remember is that each of your byte arrays will be released at the mercy of the garbage collector. If you're not caching any of that data, you could end up creating a lot of garbage and losing most of your performance to % Time in GC. If the chunks are larger than 85 KB, you'll be allocating on the Large Object Heap (LOH), which requires a collection of all generations to free up (this is very expensive, and on a server will stop all execution while it's going on). Additionally, if you have a ton of objects on the LOH, you can end up with LOH fragmentation (the LOH is never compacted), which leads to poor performance and out-of-memory exceptions. You can recycle the process once you hit a certain point, but I don't know if that's a best practice.
The point is, you should consider the full life cycle of your app before necessarily just reading all the bytes into memory the fastest way possible or you might be trading short term performance for overall performance.
I'd say BinaryReader is fine, but can be refactored to this, instead of all those lines of code for getting the length of the buffer:
public byte[] FileToByteArray(string fileName)
{
byte[] fileData = null;
using (FileStream fs = File.OpenRead(fileName))
{
using (BinaryReader binaryReader = new BinaryReader(fs))
{
fileData = binaryReader.ReadBytes((int)fs.Length);
}
}
return fileData;
}
This should be better than using .ReadAllBytes(), since I saw in the comments on the top response (the one with .ReadAllBytes()) that one of the commenters had problems with files > 600 MB, and a BinaryReader is meant for this sort of thing. Also, putting it in a using statement ensures the FileStream and BinaryReader are closed and disposed.
In case 'a large file' means beyond the 4 GB limit, then my following code logic is appropriate. The key issue to notice is the long data type used with the Seek method, since a long can address beyond the 2^32 byte boundary.
In this example, the code first processes the large file in chunks of 1 GB; after the whole 1 GB chunks are processed, the leftover (<1 GB) bytes are processed. I use this code to calculate the CRC of files beyond 4 GB in size.
(using https://crc32c.machinezoo.com/ for the crc32c calculation in this example)
private uint Crc32CAlgorithmBigCrc(string fileName)
{
uint hash = 0;
byte[] buffer = null;
FileInfo fileInfo = new FileInfo(fileName);
long fileLength = fileInfo.Length;
int blockSize = 1024000000;
decimal div = fileLength / blockSize;
int blocks = (int)Math.Floor(div);
int restBytes = (int)(fileLength - (blocks * blockSize));
long offsetFile = 0;
uint interHash = 0;
Crc32CAlgorithm Crc32CAlgorithm = new Crc32CAlgorithm();
bool firstBlock = true;
using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
buffer = new byte[blockSize];
using (BinaryReader br = new BinaryReader(fs))
{
while (blocks > 0)
{
blocks -= 1;
fs.Seek(offsetFile, SeekOrigin.Begin);
buffer = br.ReadBytes(blockSize);
if (firstBlock)
{
firstBlock = false;
interHash = Crc32CAlgorithm.Compute(buffer);
hash = interHash;
}
else
{
interHash = Crc32CAlgorithm.Append(interHash, buffer); // chain each block onto the running hash
hash = interHash;
}
offsetFile += blockSize;
}
if (restBytes > 0)
{
Array.Resize(ref buffer, restBytes);
fs.Seek(offsetFile, SeekOrigin.Begin);
buffer = br.ReadBytes(restBytes);
hash = Crc32CAlgorithm.Append(interHash, buffer);
}
buffer = null;
}
}
//MessageBox.Show(hash.ToString());
//MessageBox.Show(hash.ToString("X"));
return hash;
}
Overview: if your image is added as an embedded resource (Build Action = Embedded Resource), then use GetExecutingAssembly to retrieve the jpg resource into a stream, then read the binary data from the stream into a byte array.
public byte[] GetAImage()
{
byte[] bytes=null;
var assembly = Assembly.GetExecutingAssembly();
var resourceName = "MYWebApi.Images.X_my_image.jpg";
using (Stream stream = assembly.GetManifestResourceStream(resourceName))
{
bytes = new byte[stream.Length];
stream.Read(bytes, 0, (int)stream.Length);
}
return bytes;
}
Use the BufferedStream class in C# to improve performance. A buffer is a block of bytes in memory used to cache data, thereby reducing the number of calls to the operating system. Buffers improve read and write performance.
See the following for a code example and additional explanation:
http://msdn.microsoft.com/en-us/library/system.io.bufferedstream.aspx
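A rough sketch of the pattern (the 64 KB buffer size is an arbitrary example value):
// Wrap the FileStream in a BufferedStream to cut down on calls to the operating system.
using (FileStream fs = File.OpenRead(fileName))
using (BufferedStream bs = new BufferedStream(fs, 64 * 1024)) // 64 KB buffer, arbitrary example
using (BinaryReader reader = new BinaryReader(bs))
{
    byte[] data = reader.ReadBytes((int)fs.Length);
    // ...
}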
use this:
bytesRead = responseStream.ReadAsync(buffer, 0, buffer.Length).Result;
I would recommend trying the Response.TransmitFile() method, then Response.Flush() and Response.End(), for serving your large files.
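A rough sketch of that inside a page or handler (the content type, header, and file path are placeholders):
// Serve a large file without buffering it in the worker process.
Response.ContentType = "application/octet-stream";                           // placeholder type
Response.AddHeader("Content-Disposition", "attachment; filename=large.bin"); // placeholder name
Response.TransmitFile(Server.MapPath("~/files/large.bin"));                  // placeholder path
Response.Flush();
Response.End();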
If you're dealing with files above 2 GB, you'll find that the above methods fail.
It's much easier just to hand the stream off to MD5 and allow that to chunk your file for you:
private byte[] computeFileHash(string filename)
{
MD5 md5 = MD5.Create();
using (FileStream fs = new FileStream(filename, FileMode.Open))
{
byte[] hash = md5.ComputeHash(fs);
return hash;
}
}