Serializing JSON directly into an AWS S3 bucket with Newtonsoft.Json - C#

I have an object that has to be converted to JSON and uploaded via a Stream object. This is the AWS S3 upload code:
AWSS3Client.PutObjectAsync(new PutObjectRequest()
{
    InputStream = stream,
    BucketName = name,
    Key = keyName
}).Wait();
Here stream is a Stream instance that is read by AWSS3Client.
The data I am uploading is a complex object that has to be in JSON format.
I can convert the object to a string using JsonConvert.SerializeObject, or serialize it to a file using JsonSerializer, but since the amount of data is quite significant I would prefer to avoid a temporary string or file and convert the object to a readable Stream right away. My ideal code would look something like this:
AWSS3Client.PutObjectAsync(new PutObjectRequest()
{
    InputStream = MagicJsonConverter.ToStream(myDataObject),
    BucketName = name,
    Key = keyName
}).Wait();
Is there a way to achieve this using Newtonsoft.Json?

You need two things here: first, a producer/consumer stream, e.g. the BlockingStream from this StackOverflow question; and second, a Json.Net serializer writing to that stream, as in this other SO question.
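As a rough sketch of that idea (not compiled against the AWS SDK): the producer/consumer stream below is built from an anonymous pipe instead of a custom BlockingStream, with a background task serializing the object into the write end while S3 reads from the other end. Note that PutObjectRequest generally expects a seekable stream or a known content length, so you may need to set the content length header or fall back to TransferUtility for unknown sizes.
// Sketch only: an anonymous pipe stands in for a BlockingStream to get
// producer/consumer behaviour from framework types.
using System.IO;
using System.IO.Pipes;
using System.Threading.Tasks;
using Newtonsoft.Json;

public static class MagicJsonConverter
{
    public static Stream ToStream(object data)
    {
        var writeEnd = new AnonymousPipeServerStream(PipeDirection.Out);
        var readEnd = new AnonymousPipeClientStream(PipeDirection.In, writeEnd.ClientSafePipeHandle);

        Task.Run(() =>
        {
            using (var streamWriter = new StreamWriter(writeEnd))
            using (var jsonWriter = new JsonTextWriter(streamWriter))
            {
                new JsonSerializer().Serialize(jsonWriter, data);
            } // disposing closes the write end, so the reader sees end-of-stream
        });

        return readEnd; // hand this to PutObjectRequest.InputStream
    }
}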

Another practical option is to wrap the memory stream with a gzip stream (about two lines of code).
JSON usually compresses very well (a 1 GB file can often shrink to around 50 MB).
Then, when serving the object from S3 later, wrap the downloaded stream in a GZipStream that decompresses it.
I guess the trade-off compared to a temp file is CPU vs. IO (both will probably work well). If you can keep it compressed on S3, it will also save space and improve network efficiency.
Example code:
var compressed = new MemoryStream();
using (var zip = new GZipStream(compressed, CompressionLevel.Fastest, leaveOpen: true))
using (var writer = new StreamWriter(zip))
using (var jsonWriter = new JsonTextWriter(writer))
{
    new JsonSerializer().Serialize(jsonWriter, myDataObject); // write the JSON into the gzip stream
}
compressed.Seek(0, SeekOrigin.Begin); // rewind before handing the stream to S3
// use 'compressed' as the InputStream of the PutObjectRequest

Related

Wrapping a JSON Stream

I'm trying to store large objects as gzipped JSON text to an Azure blob.
I don't want to hold the serialized data in memory, and I don't want to spool to disk if I can avoid it, but I don't see how to just let it serialize and compress on the fly.
I'm using JSON.NET from Newtonsoft (pretty much the de facto standard JSON serializer for .NET), but the signatures of the methods don't really seem to support on-the-fly streaming.
Microsoft.WindowsAzure.Storage.Blob.CloudBlockBlob has an UploadFromStream(Stream source, AccessCondition accessCondition = null, BlobRequestOptions options = null, OperationContext operationContext = null) method, but for that to work properly the stream position needs to be 0, and JsonSerializer.Serialize doesn't leave it there. It just acts on a stream, and when it's done the stream position is at EOF.
What I'd like to do is something like this:
public void SaveObject(object obj, string path, JsonSerializerSettings settings = null)
{
    using (var jsonStream = new JsonStream(obj, settings ?? _defaultSerializerSettings))
    using (var gzipStream = new GZipStream(jsonStream, CompressionMode.Compress))
    {
        var blob = GetCloudBlockBlob(path);
        blob.UploadFromStream(gzipStream);
    }
}
...the idea being that serialization does not start until something is pulling data (in this case the GZipStream, which does not compress data until it is pulled by the blob.UploadFromStream() method), so the overhead stays low. The stream does not need to be seekable, it just needs to be readable on demand.
I trust everyone can see how this would work if the source were a stream from System.IO.File.OpenRead() instead of new JsonStream(object obj). It gets a bit more complicated because Json.NET needs to "look ahead" and potentially fill a buffer, but CryptoStream and GZipStream manage the same pull-based pattern and it works very smoothly.
Is there a way to do this that does not load the entire JSON representation of the object into memory, or spool it to disk first just to regurgitate it? If CryptoStream can do it, we should be able to do it with Json.NET without a huge amount of effort, I would think.
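As an illustrative sketch (not from the original thread): one way to get the same streaming behaviour is to invert the flow and push the JSON through a GZipStream into the blob's own writable stream from CloudBlockBlob.OpenWrite(), so nothing is buffered in memory or spooled to disk. GetCloudBlockBlob and _defaultSerializerSettings are the helpers assumed by the snippet above, and the classic Microsoft.WindowsAzure.Storage client is assumed for OpenWrite().
// Push-based sketch: the JSON is compressed and uploaded as it is written.
public void SaveObject(object obj, string path, JsonSerializerSettings settings = null)
{
    var blob = GetCloudBlockBlob(path);
    using (var blobStream = blob.OpenWrite())        // writable stream straight to the blob
    using (var gzipStream = new GZipStream(blobStream, CompressionMode.Compress))
    using (var textWriter = new StreamWriter(gzipStream))
    using (var jsonWriter = new JsonTextWriter(textWriter))
    {
        JsonSerializer.Create(settings ?? _defaultSerializerSettings)
                      .Serialize(jsonWriter, obj);
    }
}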

Audio file is not working via FTP upload programmatically

I am uploading an .mp3 file via FTP code using C#. The file is uploaded successfully to the server, but when I bind it to a simple audio control or view it directly in the browser it does not work as expected, whereas when I upload it manually to the server it works perfectly.
Code:
var inputStream = FileUpload1.PostedFile.InputStream;
byte[] fileBytes = new byte[inputStream.Length];
inputStream.Read(fileBytes, 0, fileBytes.Length);
Note: When I view the file in Firefox, it says the MIME type is not supported.
Thanks!
You're reading the file as a string and then using UTF-8 encoding to turn it into bytes. If you do that and the file contains any binary sequence that isn't a valid UTF-8 value, parts of the data stream will simply get discarded.
Instead, read it directly as bytes. Don't bother with the StreamReader. Call the Read() method on the underlying stream. Example:
var inputStream = FileUpload1.PostedFile.InputStream;
byte[] fileBytes = new byte[inputStream.Length];
inputStream.Read(fileBytes, 0, fileBytes.Length); // read the raw bytes; no text decoding involved
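The answer above only covers reading the bytes; for completeness, here is a hedged sketch of the upload side using FtpWebRequest in binary mode, since uploading media in ASCII/text mode is another common way the file gets corrupted. The host, credentials and remote path are placeholders.
// Sketch only: binary-mode FTP upload of the bytes read above.
var request = (FtpWebRequest)WebRequest.Create("ftp://example.com/audio/song.mp3"); // placeholder URL
request.Method = WebRequestMethods.Ftp.UploadFile;
request.UseBinary = true;                                        // essential for .mp3 and other binary files
request.Credentials = new NetworkCredential("user", "password"); // placeholder credentials
using (var requestStream = request.GetRequestStream())
{
    requestStream.Write(fileBytes, 0, fileBytes.Length);         // send the raw bytes unchanged
}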

Can we convert a live video stream into a byte array?

I'm using C#/.NET and I'm getting a live video stream from a URL (rtsp://streamurl). Now I want to know whether we can convert this live stream into a byte array so that I can use the NReco.VideoConverter component to encode the stream with H.264 and then stream it via a server.
I'm currently gathering details and studying the basics of NReco.VideoConverter. It has a method to convert a live video stream, but as input it requires a System.IO.Stream instead of a URL path. That's why I'm asking this question. Thanks!
I have no experience with NReco.VideoConverter, so this is just a guess:
Looking at your link to the interface, you'll see:
public ConvertLiveMediaTask ConvertLiveMedia(
    Stream inputStream,
    string inputFormat,
    string outputFile,
    string outputFormat,
    ConvertSettings settings
)
The first input parameter is a Stream, which is very flexible: it can come from a file as well as from the web. So you should be able to do it this way (I haven't compiled this code):
// convert url to stream
WebRequest request = WebRequest.Create(url); // your RTSP url?
request.Timeout = 30 * 60 * 1000;
request.UseDefaultCredentials = true;
request.Proxy.Credentials = request.Credentials;
WebResponse response = request.GetResponse();
using (Stream stream = response.GetResponseStream())
{
    var converter = new FFMpegConverter(); // init converter
    converter.ConvertLiveMedia(
        stream,                             // put your stream here
        "???",                              // problem here... no RTSP entry found in the Format constants, so you may need to know the raw video format
        @"C:\whateverpath\whatever.hevc",   // extension?
        Format.h265,
        new ConvertSettings());             // the interface above also expects a ConvertSettings argument
}
I don't see how RTSP is supported here, and you might need to know what kind of video encoding is packed inside the RTSP stream first, otherwise the converter won't understand the input (at least when using the interface you mentioned).
And that's what I meant in my comment: you need to know the data structure of the (byte) stream to know how to interpret the bits, or you have to make a guess.
Their website states the feature:
Live video stream transcoding from C# Stream (or Webcam, RTSP URL, file) to C# Stream (or streaming server URL, file)

Amazon S3 TransferUtility: use FilePath or Stream?

When uploading a file to S3 using the TransferUtility class, there is an option to use either a FilePath or an input stream. I'm using multi-part uploads.
I'm uploading a variety of things, some of which are files on disk and others raw streams. I'm currently using the InputStream variety for everything, which works OK, but I'm wondering if I should specialize the method further. For the files on disk, I'm basically using File.OpenRead and passing that stream to the InputStream of the transfer request.
Are there any performance gains, or other reasons, to prefer the FilePath method over the InputStream one when the input is known to be a file?
In short: is this the same thing
using (var fs = File.OpenRead("some path"))
{
    var uploadMultipartRequest = new TransferUtilityUploadRequest
    {
        BucketName = "defaultBucket",
        Key = "key",
        InputStream = fs,
        PartSize = partSize
    };
    using (var transferUtility = new TransferUtility(s3Client))
    {
        await transferUtility.UploadAsync(uploadMultipartRequest);
    }
}
As:
var uploadMultipartRequest = new TransferUtilityUploadRequest
{
    BucketName = "defaultBucket",
    Key = "key",
    FilePath = "some path",
    PartSize = partSize
};
using (var transferUtility = new TransferUtility(s3Client))
{
    await transferUtility.UploadAsync(uploadMultipartRequest);
}
Or is there any significant difference between the two? I know whether the files are large or not, and could prefer one method or the other based on that.
Edit: I've also done some decompiling of the S3Client, and there does indeed seem to be some difference in regard to the concurrency level of the transfer, found in MultipartUploadCommand.cs:
private int CalculateConcurrentServiceRequests()
{
    int num = !this._fileTransporterRequest.IsSetFilePath() || this._s3Client is AmazonS3EncryptionClient
        ? 1
        : this._config.ConcurrentServiceRequests;
    if (this._totalNumberOfParts < num)
        num = this._totalNumberOfParts;
    return num;
}
From the TransferUtility documentation:
When uploading large files by specifying file paths instead of a
stream, TransferUtility uses multiple threads to upload multiple parts
of a single upload at once. When dealing with large content sizes and
high bandwidth, this can increase throughput significantly.
Which suggests that uploading by file path uses multi-threaded multipart upload, but uploading from a stream won't.
But then I read through the documentation of the Upload(stream, bucketName, key) method:
Uploads the contents of the specified stream. For large uploads, the
file will be divided and uploaded in parts using Amazon S3's multipart
API. The parts will be reassembled as one object in Amazon S3.
Which means that multipart upload is used for streams as well.
Amazon recommends using multipart upload if the file size is larger than 100 MB: http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html
Multipart upload allows you to upload a single object as a set of
parts. Each part is a contiguous portion of the object's data. You can
upload these object parts independently and in any order. If
transmission of any part fails, you can retransmit that part without
affecting other parts. After all parts of your object are uploaded,
Amazon S3 assembles these parts and creates the object. In general,
when your object size reaches 100 MB, you should consider using
multipart uploads instead of uploading the object in a single
operation.
Using multipart upload provides the following advantages:
- Improved throughput: you can upload parts in parallel to improve throughput.
- Quick recovery from any network issues: a smaller part size minimizes the impact of restarting a failed upload due to a network error.
- Pause and resume object uploads: you can upload object parts over time. Once you initiate a multipart upload there is no expiry; you must explicitly complete or abort the multipart upload.
- Begin an upload before you know the final object size: you can upload an object as you are creating it.
So based on the Amazon S3 documentation there is no difference between using a Stream or a FilePath, but it might make a slight performance difference depending on your code and OS.
I think the difference may be that both use the Multipart Upload API, but using a FilePath allows for concurrent uploads. However:
When you're using a stream for the source of data, the TransferUtility
class does not do concurrent uploads.
https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingTheMPDotNetAPI.html
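To make that concrete, here is a small sketch based on the snippets above (same placeholder bucket, key, path and partSize): when the source is a file on disk, pass FilePath and raise ConcurrentServiceRequests on TransferUtilityConfig so the parts can be uploaded in parallel; with an InputStream that setting has no effect, per the decompiled check above.
// Sketch: concurrency only applies to the FilePath case (see CalculateConcurrentServiceRequests above).
var transferConfig = new TransferUtilityConfig { ConcurrentServiceRequests = 10 };
using (var transferUtility = new TransferUtility(s3Client, transferConfig))
{
    await transferUtility.UploadAsync(new TransferUtilityUploadRequest
    {
        BucketName = "defaultBucket",
        Key = "key",
        FilePath = "some path",   // file source, so parts are uploaded concurrently
        PartSize = partSize
    });
}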

Decompressing a Zip file from a string

I'm fetching an object from Couchbase where one of the fields holds a file. The file is zipped and then encoded in Base64.
How would I be able to take this string and decompress it back to the original file?
Then, if I'm using ASP.NET MVC 4, how would I send it back to the browser as a downloadable file?
The original file is created on a Linux system and decoded on a Windows system (C#).
You should use Convert.FromBase64String to get the bytes, then decompress, and then use Controller.File to have the client download the file. To decompress, you need to open the zip file using some sort of ZIP library. .NET 4.5's built-in ZipArchive class should work. Or you could use another library, both SharpZipLib and DotNetZip support reading from streams.
public ActionResult MyAction()
{
    // requires System.IO.Compression (ZipArchive), System.Linq and System.Web (MimeMapping)
    string base64String = GetBase64FromCouchbase(); // hypothetical helper: however you fetch the field from the Linux system
    byte[] zipBytes = Convert.FromBase64String(base64String);
    using (var zipStream = new MemoryStream(zipBytes))
    using (var zipArchive = new ZipArchive(zipStream))
    {
        var entry = zipArchive.Entries.Single();
        string mimeType = MimeMapping.GetMimeMapping(entry.Name);
        using (var decompressedStream = entry.Open())
        using (var resultStream = new MemoryStream())
        {
            decompressedStream.CopyTo(resultStream); // buffer the entry so it survives disposal of the archive
            return File(resultStream.ToArray(), mimeType, entry.Name);
        }
    }
}
You'll also need the MIME type of the file; MimeMapping.GetMimeMapping will give you that for most common types.
I've used SharpZipLib successfully for this type of task in the past.
For an example that's very close to what you need to do have a look here.
Basically, the steps should be something like this (a rough SharpZipLib sketch follows the list):
- get the compressed input as a Base64 string from the database
- decode it with Convert.FromBase64String and write the resulting bytes to a MemoryStream
- seek back to the beginning of the memory stream
- use the MemoryStream as the input to the SharpZipLib ZipFile class
- follow the example linked above to unpack the contents of the ZipFile
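A rough sketch of those steps, assuming the ICSharpCode.SharpZipLib.Zip namespace, placeholder variable names (base64FromDb, mimeType) and no error handling:
// Sketch only: unpack a Base64-encoded ZIP archive held entirely in memory.
byte[] zipBytes = Convert.FromBase64String(base64FromDb);
using (var memoryStream = new MemoryStream(zipBytes))    // position is already at 0
using (var zipFile = new ZipFile(memoryStream))
{
    foreach (ZipEntry entry in zipFile)
    {
        if (!entry.IsFile) continue;                      // skip directory entries
        using (Stream entryStream = zipFile.GetInputStream(entry))
        using (var output = new MemoryStream())
        {
            entryStream.CopyTo(output);                   // unpack this entry into memory
            byte[] fileBytes = output.ToArray();
            // e.g. return File(fileBytes, mimeType, entry.Name); from an MVC action
        }
    }
}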
Update
If the string contains only the zipped contents of the file (not a full ZIP archive), then you can simply use the GZipStream class in .NET to unzip the contents. You can find a sample here. The initial steps are the same as above (get the string from the database, decode it, write the bytes to a memory stream, and feed the memory stream to a GZipStream to decompress).
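A minimal sketch of that GZip case, again with a placeholder variable name (base64String) and no error handling:
// Sketch only: Base64 string -> bytes -> MemoryStream -> GZipStream -> original bytes.
byte[] compressedBytes = Convert.FromBase64String(base64String);
using (var input = new MemoryStream(compressedBytes))
using (var gzip = new GZipStream(input, CompressionMode.Decompress))
using (var output = new MemoryStream())
{
    gzip.CopyTo(output);                     // decompress the raw gzip payload
    byte[] originalBytes = output.ToArray();
    // hand originalBytes to Controller.File(...) to return it as a download
}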
