Streaming large multi-part uploads to S3 using AmazonS3Client

An upload stream given to AmazonS3Client must be seekable. You can make a Stream seekable using AmazonS3Util.MakeStreamSeekable. However, its source shows that it copies the entire input into a MemoryStream, so it will not perform well with large streams:
public static System.IO.Stream MakeStreamSeekable(System.IO.Stream input)
{
System.IO.MemoryStream output = new System.IO.MemoryStream();
const int readSize = 32 * 1024;
byte[] buffer = new byte[readSize];
int count = 0;
using (input)
{
while ((count = input.Read(buffer, 0, readSize)) > 0)
{
output.Write(buffer, 0, count);
}
}
output.Position = 0;
return output;
}
So, what approaches are available to upload a Stream to S3 without copying the entire contents into memory?

Each chunk must be a seekable Stream, but you can have many chunks. The solution is to split up the input Stream.
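A minimal sketch of that splitting step, assuming a hypothetical helper named StreamChunker.SplitIntoParts (it is not part of the AWS SDK). It reads a forward-only Stream and yields each part as its own seekable MemoryStream, so only about one part needs to be in memory at a time if the caller disposes each part after uploading it. Note that S3 multi-part uploads require every part except the last to be at least 5 MB, so partSize should be chosen accordingly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StreamChunker
{
    // Hypothetical helper: yields the input as a sequence of seekable
    // MemoryStreams holding at most partSize bytes each.
    public static IEnumerable<MemoryStream> SplitIntoParts(Stream input, int partSize)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (partSize <= 0) throw new ArgumentOutOfRangeException(nameof(partSize));

        byte[] buffer = new byte[partSize];
        while (true)
        {
            // Read may return fewer bytes than requested, so keep reading
            // until the part is full or the input ends.
            int filled = 0;
            int read;
            while (filled < partSize &&
                   (read = input.Read(buffer, filled, partSize - filled)) > 0)
            {
                filled += read;
            }

            if (filled == 0) yield break; // nothing left to read

            // Copy the bytes so that each part owns its data.
            byte[] partBytes = new byte[filled];
            Buffer.BlockCopy(buffer, 0, partBytes, 0, filled);
            yield return new MemoryStream(partBytes);

            if (filled < partSize) yield break; // a short part means the input ended
        }
    }
}
```

Each yielded MemoryStream could then be handed to one upload-part call of a multi-part upload, with the part number taken from its position in the sequence.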

Related

Convert Stream to byte[] c# for large files of 2GB

I am trying to convert a Stream object to a byte[] using the method below:
public static byte[] ReadFully(System.IO.Stream input)
{
byte[] buffer = new byte[16*1024];
using (System.IO.MemoryStream ms = new System.IO.MemoryStream())
{
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
ms.Write(buffer, 0, read);
}
return ms.ToArray();
}
}
However, the "input" parameter is a large file of 2 GB, so the code does not enter the while loop and the stream is not converted to a byte array.
For smaller files it works fine.

That's what a Stream is for.
You don't load the whole content into a byte[]; you read a small buffer from the Stream into memory, handle it, then discard it and read the next buffer.
If you still need to use a byte[]:
It seems your app can't address more than 2^32 bytes of memory, meaning it's running as a 32-bit process.
Try changing it to 64-bit (in Project Properties, go to Build and disable "Prefer 32-bit").
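As a rough illustration of the "read a small buffer and handle it" approach, here is a sketch built around a hypothetical helper, ChunkedReader.ForEachChunk. The name and the callback shape are assumptions made for this example; the point is only that memory use stays at one buffer regardless of the size of the input.

```csharp
using System;
using System.IO;

public static class ChunkedReader
{
    // Hypothetical helper: passes the stream to 'handle' one buffer at a time
    // and returns the total number of bytes seen.
    public static long ForEachChunk(Stream input, int bufferSize, Action<byte[], int> handle)
    {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Only the first 'read' bytes of the buffer are valid in this call.
            handle(buffer, read);
            total += read;
        }
        return total;
    }
}
```

The callback might write the bytes to another stream, feed a hash, or parse records; the total is a long, so it is not capped by the int-sized limit on a single array.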

A single object is limited to under 2 GB (that is why all array indexes use int).
How about using a list of byte arrays to hold the entire data?
public List<byte[]> ReadBytesList(string fileName)
{
    // Stay just under the array size limit (int.MaxValue is 2147483647).
    const int MaxChunk = 2100000000;
    List<byte[]> rawDataBytes = new List<byte[]>();
    // Dispose the file handle and reader even if a read throws.
    using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    using (BinaryReader br = new BinaryReader(fs))
    {
        long remaining = fs.Length;
        while (remaining > 0)
        {
            // ReadBytes takes an int count, so read at most MaxChunk bytes at a time.
            int count = (int)Math.Min(MaxChunk, remaining);
            byte[] buff = br.ReadBytes(count);
            if (buff.Length == 0) break; // the file ended earlier than expected
            rawDataBytes.Add(buff);
            remaining -= buff.Length;
        }
    }
    return rawDataBytes;
}

ushort array compression in C#

I've got a ushort array (actually an image coming from a camera) that I'd like to compress losslessly before persistent storage. I'm using the GZipStream class from System.IO.Compression. This approach, to my knowledge, requires that I convert the ushort array to a byte array. My solution appears to function properly, but just isn't as quick as I'd like. The images are about 2 MB in raw size; compression takes 200 to 400 ms (on my slow machine) and decompression takes 100 to 200 ms. I'm looking for suggestions to improve performance.
public static class Zip
{
public static ushort[] Decompress_ByteToShort(byte[] zippedData)
{
byte[] decompressedData = null;
using (MemoryStream outputStream = new MemoryStream())
{
using (MemoryStream inputStream = new MemoryStream(zippedData))
{
using (GZipStream zip = new GZipStream(inputStream, CompressionMode.Decompress))
{
zip.CopyTo(outputStream);
}
}
decompressedData = outputStream.ToArray();
}
ushort[] decompressShort = new ushort[decompressedData.Length / sizeof(ushort)];
Buffer.BlockCopy(decompressedData, 0, decompressShort, 0, decompressedData.Length);
return decompressShort;
}
public static byte[] Compress_ShortToByte(ushort[] plainData)
{
byte[] compressesData = null;
byte[] uncompressedData = new byte[plainData.Length * sizeof(ushort)];
Buffer.BlockCopy(plainData, 0, uncompressedData, 0, plainData.Length * sizeof(ushort));
using (MemoryStream outputStream = new MemoryStream())
{
using (GZipStream zip = new GZipStream(outputStream, CompressionMode.Compress))
{
zip.Write(uncompressedData, 0, uncompressedData.Length);
}
//Don't get the MemoryStream data before the GZipStream is closed,
//since it doesn't yet contain the complete compressed data.
//GZipStream writes additional data, including footer information, when it is disposed.
compressesData = outputStream.ToArray();
}
return compressesData;
}
}

The first problem I see in your approach is that you are using byte arrays instead of directly reading from and writing to files.
Using a smaller temporary buffer and reading/writing to streams and files directly in chunks should be much faster.
Here I propose some functions and overloads you can use to decompress from byte arrays, to byte arrays, from a stream, to a stream, from a file and to a file.
The performance improvement should be from 10% to 20%.
Try to adjust the constants as needed.
I used DeflateStream instead of GZipStream, which increases the performance a bit.
You can go back to GZipStream if you prefer.
I tried just the byte[] to ushort[] and ushort[] to byte[] version of the code and it is about 10% faster.
Accessing files directly instead of loading them into a big buffer should increase the performance even more.
WARNING: Reading and writing images this way is not endianness-agnostic: a file saved on a little-endian machine (such as Intel/AMD) will not be read back correctly on a big-endian machine. Just as a side note :)
/// <summary>The average file size, used to preallocate the right amount of memory for compression.</summary>
private const int AverageFileSize = 100000;
/// <summary>The default size of the buffer used to convert data. WARNING: Must be a multiple of 2!</summary>
private const int BufferSize = 32768;
/// <summary>Decompresses a byte array to unsigned shorts.</summary>
public static ushort[] Decompress_ByteToShort(byte[] zippedData)
{
using (var inputStream = new MemoryStream(zippedData))
return Decompress_File(inputStream);
}
/// <summary>Decompresses a file to unsigned shorts.</summary>
public static ushort[] Decompress_File(string inputFilePath)
{
using (var stream = new FileStream(inputFilePath, FileMode.Open, FileAccess.Read))
return Decompress_File(stream);
}
/// <summary>Decompresses a file stream to unsigned shorts.</summary>
public static ushort[] Decompress_File(Stream zippedData)
{
using (var zip = new DeflateStream(zippedData, CompressionMode.Decompress, true))
{
// Our temporary buffer.
var buffer = new byte[BufferSize];
// Read the number of bytes, written initially as header in the file.
zip.Read(buffer, 0, sizeof(int));
var resultLength = BitConverter.ToInt32(buffer, 0);
// Creates the result array
var result = new ushort[resultLength];
// Decompress the file chunk by chunk
var resultOffset = 0;
for (; ; )
{
// Read a chunk of data
var count = zip.Read(buffer, 0, BufferSize);
if (count <= 0)
break;
// Copy a piece of the decompressed buffer
Buffer.BlockCopy(buffer, 0, result, resultOffset, count);
// Advance counter
resultOffset += count;
}
return result;
}
}
/// <summary>Compresses an ushort array to a byte array.</summary>
public static byte[] Compress_ShortToByte(ushort[] plainData)
{
using (var outputStream = new MemoryStream(AverageFileSize))
{
Compress_File(plainData, outputStream);
return outputStream.ToArray();
}
}
/// <summary>Compresses an ushort array directly to a file.</summary>
public static void Compress_File(ushort[] plainData, string outputFilePath)
{
using (var stream = new FileStream(outputFilePath, FileMode.Create, FileAccess.Write)) // Create truncates any existing file
Compress_File(plainData, stream);
}
/// <summary>Compresses an ushort array directly to a file stream.</summary>
public static void Compress_File(ushort[] plainData, Stream outputStream)
{
using (var zip = new DeflateStream(outputStream, CompressionMode.Compress, true))
{
// Our temporary buffer.
var buffer = new byte[BufferSize];
// Writes the length of the plain data
zip.Write(BitConverter.GetBytes(plainData.Length), 0, sizeof(int));
var inputOffset = 0;
var availableBytes = plainData.Length * sizeof(ushort);
while (availableBytes > 0)
{
// Compute the amount of bytes to copy.
var bytesCount = Math.Min(BufferSize, availableBytes);
// Copy a chunk of plain data into the temporary buffer
Buffer.BlockCopy(plainData, inputOffset, buffer, 0, bytesCount);
// Write the buffer
zip.Write(buffer, 0, bytesCount);
// Advance counters
inputOffset += bytesCount;
availableBytes -= bytesCount;
}
}
}
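Regarding the endianness warning above, one way to make the stored bytes independent of the machine's byte order is to convert each ushort explicitly instead of using Buffer.BlockCopy. The sketch below assumes a hypothetical helper class, LittleEndianPixels, that always writes the low byte first. It is slower than BlockCopy because it loops per element, so whether the portability is worth the cost depends on where the files will be read.

```csharp
using System;

public static class LittleEndianPixels
{
    // Hypothetical helper: always stores the low byte first,
    // regardless of the byte order of the machine running the code.
    public static byte[] ToBytes(ushort[] pixels)
    {
        var bytes = new byte[pixels.Length * 2];
        for (int i = 0; i < pixels.Length; i++)
        {
            bytes[2 * i] = (byte)(pixels[i] & 0xFF);  // low byte
            bytes[2 * i + 1] = (byte)(pixels[i] >> 8); // high byte
        }
        return bytes;
    }

    // Inverse of ToBytes: rebuilds each ushort from its low and high byte.
    public static ushort[] FromBytes(byte[] bytes)
    {
        if (bytes.Length % 2 != 0)
            throw new ArgumentException("Length must be a multiple of 2.", nameof(bytes));

        var pixels = new ushort[bytes.Length / 2];
        for (int i = 0; i < pixels.Length; i++)
        {
            pixels[i] = (ushort)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
        }
        return pixels;
    }
}
```

The resulting byte array can be passed to the compression code in place of the BlockCopy output.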

There is a GZipStream constructor that takes (Stream, CompressionLevel), available since .NET 4.5. You can change the CompressionLevel to speed up compression; the enumeration has a Fastest level.
Links to relevant documentation:
http://msdn.microsoft.com/pl-pl/library/hh137341(v=vs.110).aspx
http://msdn.microsoft.com/pl-pl/library/system.io.compression.compressionlevel(v=vs.110).aspx
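A short sketch of how that constructor could be used. The class name FastGzip is made up for this example; GZipStream and CompressionLevel are the framework types from System.IO.Compression. Fastest usually trades some compression ratio for speed, so it is worth measuring both levels on real images before settling on one.

```csharp
using System.IO;
using System.IO.Compression;

public static class FastGzip
{
    // Compresses with the given level, e.g. CompressionLevel.Fastest.
    public static byte[] Compress(byte[] data, CompressionLevel level)
    {
        using (var output = new MemoryStream())
        {
            using (var zip = new GZipStream(output, level))
            {
                zip.Write(data, 0, data.Length);
            }
            // Read the result only after the GZipStream has been disposed,
            // so that the gzip footer has been written.
            return output.ToArray();
        }
    }

    // Decompresses data produced by Compress.
    public static byte[] Decompress(byte[] zipped)
    {
        using (var input = new MemoryStream(zipped))
        using (var zip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            zip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Usage would look like FastGzip.Compress(bytes, CompressionLevel.Fastest), with CompressionLevel.Optimal as the alternative to compare against.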

how to buffer an input stream until it is complete

I'm implementing a WCF service that accepts image streams. However, I'm currently getting an exception when I run it, because it tries to get the length of the stream before the stream is complete. What I'd like to do is buffer the stream until it's complete, but I can't find any examples of how to do this.
Can anyone help?
My code so far:
public String uploadUserImage(Stream stream)
{
Stream fs = stream;
BinaryReader br = new BinaryReader(fs);
Byte[] bytes = br.ReadBytes((Int32)fs.Length);// this causes exception
File.WriteAllBytes(filepath, bytes);
}

Rather than try to fetch the length, you should read from the stream until it returns that it's "done". In .NET 4, this is really easy:
// Assuming we *really* want to read it into memory first...
MemoryStream memoryStream = new MemoryStream();
stream.CopyTo(memoryStream);
memoryStream.Position = 0;
File.WriteAllBytes(filepath, memoryStream.ToArray()); // WriteAllBytes takes a byte[], not a Stream
In .NET 3.5 there's no CopyTo method, but you can write something similar yourself:
public static void CopyStream(Stream input, Stream output)
{
byte[] buffer = new byte[8192];
int bytesRead;
while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
{
output.Write(buffer, 0, bytesRead);
}
}
However, now we've got something to copy a stream, why bother reading it all into memory first? Let's just write it straight to a file:
using (FileStream output = File.Create(filepath)) // Create truncates an existing file; OpenWrite does not
{
CopyStream(stream, output); // Or stream.CopyTo(output);
}

I'm not sure what you are returning (or not returning), but something like this might work for you:
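public String uploadUserImage(Stream stream) {
    const int KB = 1024;
    Byte[] bytes = new Byte[KB];
    StringBuilder sb = new StringBuilder();
    using (BinaryReader br = new BinaryReader(stream)) {
        int len;
        // Read returns 0 only at the end of the stream; a short read is not the end.
        while ((len = br.Read(bytes, 0, KB)) > 0) {
            // Decode only the bytes actually read, not the whole buffer.
            // Note: a multi-byte UTF-8 character split across two reads will not decode correctly.
            sb.Append(Encoding.UTF8.GetString(bytes, 0, len));
        }
    }
    //File.WriteAllBytes(filepath, bytes);
    return sb.ToString();
}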
A string can hold up to 2 GB, I believe.

Try this :
using (FileStream fs = File.Create(filepath))
{
    // CopyTo needs a Stream target; a StreamWriter is a TextWriter, not a Stream.
    stream.CopyTo(fs);
}

Jon Skeet's answer for .NET 3.5 and below uses a buffered read, and that pattern needs care.
The buffer isn't cleared between reads, so writing out the whole buffer instead of only the bytes actually read causes problems on any read that returns fewer than 8192 bytes. For example, if the second read returns 192 bytes, the last 8000 bytes from the first read are STILL in the buffer and would be written to the output again. (CopyStream above avoids this by writing only bytesRead bytes.)
With my code below, you supply a Stream and it returns an IEnumerable<byte[]>.
You can foreach over it, write each array to a MemoryStream, and then call .ToArray() to end up with one merged byte[] (.GetBuffer() would also return the unused tail of the internal buffer).
private IEnumerable<byte[]> ReadFullStream(Stream stream) {
while(true) {
byte[] buffer = new byte[8192];//since this is created every loop, its buffer is cleared
int bytesRead = stream.Read(buffer, 0, buffer.Length);//read up to 8192 bytes into buffer
if (bytesRead == 0) {//if we read nothing, stream is finished
break;
}
if(bytesRead < buffer.Length) {//if we read LESS than 8192 bytes, resize the buffer to essentially remove everything after what was read, otherwise you will have nullbytes/0x00bytes at the end of your buffer
Array.Resize(ref buffer, bytesRead);
}
yield return buffer;//yield return the buffer data
}//loop here until we reach a read == 0 (end of stream)
}

compressing and decompressing source data gives result different than source data

In my app I need to Decompress data written by DataContractSerializer to compression Deflate Stream in another app, edit the decompressed data and Compress it again.
Decompression works fine, but not for data compressed by me.
The problem is that when I do this:
byte[] result = Compressor.Compress(Compressor.Decompress(sourceData));
the length of the result byte array is different than sourceData array.
For example:
string source = "test value";
byte[] oryg = Encoding.Default.GetBytes(source);
byte[] comp = Compressor.Compress(oryg);
byte[] result1 = Compressor.Decompress(comp);
string result2 = Encoding.Default.GetString(result1);
and here result1.Length is 0 and result2 is "", of course.
Here is the code of my Compressor class.
public static class Compressor
{
public static byte[] Decompress(byte[] data)
{
byte[] result;
using (MemoryStream baseStream = new MemoryStream(data))
{
using (DeflateStream stream = new DeflateStream(baseStream, CompressionMode.Decompress))
{
result = ReadFully(stream, -1);
}
}
return result;
}
public static byte[] Compress(byte[] data)
{
byte[] result;
using (MemoryStream baseStream = new MemoryStream())
{
using (DeflateStream stream = new DeflateStream(baseStream, CompressionMode.Compress, true))
{
stream.Write(data, 0, data.Length);
result = baseStream.ToArray();
}
}
return result;
}
/// <summary>
/// Reads data from a stream until the end is reached. The
/// data is returned as a byte array. An IOException is
/// thrown if any of the underlying IO calls fail.
/// </summary>
/// <param name="stream">The stream to read data from</param>
/// <param name="initialLength">The initial buffer length</param>
private static byte[] ReadFully(Stream stream, int initialLength)
{
// If we've been passed an unhelpful initial length, just
// use 32K.
if (initialLength < 1)
{
initialLength = 32768;
}
byte[] buffer = new byte[initialLength];
int read = 0;
int chunk;
while ((chunk = stream.Read(buffer, read, buffer.Length - read)) > 0)
{
read += chunk;
// If we've reached the end of our buffer, check to see if there's
// any more information
if (read == buffer.Length)
{
int nextByte = stream.ReadByte();
// End of stream? If so, we're done
if (nextByte == -1)
{
return buffer;
}
// Nope. Resize the buffer, put in the byte we've just
// read, and continue
byte[] newBuffer = new byte[buffer.Length * 2];
Array.Copy(buffer, newBuffer, buffer.Length);
newBuffer[read] = (byte)nextByte;
buffer = newBuffer;
read++;
}
}
// Buffer is now too big. Shrink it.
byte[] ret = new byte[read];
Array.Copy(buffer, ret, read);
return ret;
}
}
Please help me with this case if You can.
Best regards,
Adam

(edited: switched from using flush, which still might not flush out all bytes, to now ensuring deflate is disposed first, as per Phil's answer here: zip and unzip string with Deflate)
Before attempting to read from the backing store, you have to ensure the deflate stream has fully flushed itself when compressing, which lets it finish compressing and write the final bytes. Closing the deflate stream, or disposing of it, will achieve this.
public static byte[] Compress(byte[] data)
{
byte[] result;
using (MemoryStream baseStream = new MemoryStream())
{
using (DeflateStream stream = new DeflateStream(baseStream, CompressionMode.Compress, true))
{
stream.Write(data, 0, data.Length);
}
result = baseStream.ToArray(); // only safe to read after deflate closed
}
return result;
}
Also, your ReadFully routine is more complicated than it needs to be, which makes it hard to verify.
Take this line:
while ((chunk = stream.Read(buffer, read, buffer.Length - read)) > 0)
It is only valid as long as read never exceeds buffer.Length; otherwise it would pass a negative count to stream.Read. The routine avoids that by doubling the buffer whenever read reaches buffer.Length, but that bookkeeping is easy to get wrong.
I recommend Jon's version of ReadFully for this purpose: Creating a byte array from a stream

How to properly serve a PDF file

I am using .NET 3.5 ASP.NET. Currently my web site serves a PDF file in the following manner:
context.Response.WriteFile(@"c:\blah\blah.pdf");
This works great. However, I'd like to serve it via the context.Response.Write(char [], int, int) method.
So I tried sending out the file via
byte [] byteContent = File.ReadAllBytes(ReportPath);
ASCIIEncoding encoding = new ASCIIEncoding();
char[] charContent = encoding.GetChars(byteContent);
context.Response.Write(charContent, 0, charContent.Length);
That did not work (e.g. browser's PDF plugin complains that the file is corrupted).
So I tried the Unicode approach:
byte [] byteContent = File.ReadAllBytes(ReportPath);
UnicodeEncoding encoding = new UnicodeEncoding();
char[] charContent = encoding.GetChars(byteContent);
context.Response.Write(charContent, 0, charContent.Length);
which also did not work.
What am I missing?

You should not convert the bytes into characters; that is why the file becomes "corrupted". Even though ASCII characters are stored in bytes, the actual ASCII character set is limited to 7 bits. Thus, decoding a byte stream with ASCIIEncoding loses every byte above 0x7F: such bytes have no ASCII character and are replaced with a substitute character such as '?'.
The bytes should be written to the OutputStream stream of the Response instance.
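To see why the character conversion breaks binary content, a small check like the following can help. The class name EncodingRoundTrip is hypothetical; it decodes bytes to chars and encodes them back, then reports whether the original bytes survived.

```csharp
using System.Linq;
using System.Text;

public static class EncodingRoundTrip
{
    // Returns true when decoding to chars and re-encoding yields the same bytes.
    public static bool Survives(byte[] data, Encoding encoding)
    {
        char[] chars = encoding.GetChars(data);
        byte[] back = encoding.GetBytes(chars);
        return data.SequenceEqual(back);
    }
}
```

Plain 7-bit content such as the "%PDF" header survives an ASCII round trip, but a PDF body normally contains bytes above 0x7F, and those do not.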
Instead of loading all bytes from the file upfront, which could possibly consume a lot of memory, reading the file in chunks from a stream is a better approach. Here's a sample of how to read from one stream and then write to another:
void LoadStreamToStream(Stream inputStream, Stream outputStream)
{
    const int bufferSize = 64 * 1024;
    var buffer = new byte[bufferSize];
    int bytesRead;
    // Read returns 0 only at the end of the stream; a short read
    // (fewer bytes than requested) does not mean the stream is finished.
    while ((bytesRead = inputStream.Read(buffer, 0, bufferSize)) > 0)
    {
        outputStream.Write(buffer, 0, bytesRead);
    }
}
You can then use this method to load the contents of your file directly to the Response.OutputStream
LoadStreamToStream(fileStream, Response.OutputStream);
Better still, here's a method opening a file and loading its contents to a stream:
void LoadFileToStream(string inputFile, Stream outputStream)
{
using (var streamInput = new FileStream(inputFile, FileMode.Open, FileAccess.Read))
{
LoadStreamToStream(streamInput, outputStream);
streamInput.Close();
}
}

You may also need to set the ContentType by doing something like this:
Response.ContentType = "application/octet-stream";

Building upon Peter Lillevold's answer, I went and just made some extension methods for his above functions.
public static void WriteTo(this Stream inputStream, Stream outputStream)
{
    const int bufferSize = 64 * 1024;
    var buffer = new byte[bufferSize];
    int bytesRead;
    // Keep copying until Read reports the end of the stream by returning 0;
    // a short read alone is not a reason to stop.
    while ((bytesRead = inputStream.Read(buffer, 0, bufferSize)) > 0)
    {
        outputStream.Write(buffer, 0, bytesRead);
    }
}
public static void WriteToFromFile(this Stream outputStream, string inputFile)
{
using (var inputStream = new FileStream(inputFile, FileMode.Open, FileAccess.Read))
{
inputStream.WriteTo(outputStream);
inputStream.Close();
}
}
