Random Access Stream Azure Block Storage - c#

Is there a way to create a stream object directly over an Azure Blob or Azure Block Storage blob? I.e.
var s = new AzureStream(blockObject);
ms.CopyTo(s);
s.Position = 200;
ms.CopyTo(s);
s.Read...
This would allow for some awesome interactions, such as storing database indices in an Azure blob without needing to pull them down locally.

Not sure if this answers your question, but you can read a range of bytes from a blob. When using the REST API directly, you can specify the bytes you want to read in either the Range or x-ms-range header.
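For illustration, here's a minimal sketch of the same range read done over raw HTTP with HttpClient (the account, container and blob names are placeholders, and the blob is assumed to be readable anonymously or via a SAS token appended to the URL):
// Hypothetical blob URL; append a SAS token if the container is not public.
var blobUri = "https://myaccount.blob.core.windows.net/mycontainer/myblob";
using (var client = new HttpClient())
using (var request = new HttpRequestMessage(HttpMethod.Get, blobUri))
{
    // Request bytes 200-1223 only (equivalent to sending the x-ms-range header).
    request.Headers.Range = new System.Net.Http.Headers.RangeHeaderValue(200, 1223);
    using (var response = await client.SendAsync(request))
    {
        response.EnsureSuccessStatusCode(); // the service responds with 206 Partial Content
        byte[] chunk = await response.Content.ReadAsByteArrayAsync();
    }
}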
When using the C# SDK, you can use the DownloadRangeToStream method, something like:
using (var ms = new MemoryStream())
{
long offset = 200;
long bytesToRead = 1024;
blob.DownloadRangeToStream(ms, offset, bytesToRead);
}

If your question is, "can I use Streams with Azure Blobs", in order to avoid the need to hold the entire size of the blob in memory at any point in time, then the answer is absolutely yes.
For example, when reading block blobs, as per this answer here, blobs can be accessed as a stream handle with methods such as CloudBlob.OpenReadAsync. The default buffer size is 4 MB, but it can be adjusted via properties like StreamMinimumReadSizeInBytes. Here we copy the blob stream to another open output stream:
using (var stream = await myBlockBlob.OpenReadAsync(cancellationToken))
{
await stream.CopyToAsync(outputStream);
}
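If the default 4 MB buffer doesn't suit your access pattern, you can set StreamMinimumReadSizeInBytes on the blob reference before opening the stream. A small sketch (the 1 MB value is just an example):
// Use a 1 MB read-ahead buffer instead of the default 4 MB (example value)
myBlockBlob.StreamMinimumReadSizeInBytes = 1 * 1024 * 1024;
using (var stream = await myBlockBlob.OpenReadAsync(cancellationToken))
{
    await stream.CopyToAsync(outputStream);
}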
Similarly, you can write a stream directly into Blob Storage:
await blockBlob.UploadFromStreamAsync(streamToSave, cancellationToken);


Unable to Append to Append Blob in Azure

I'm trying to perform a textbook append to an append blob in Azure.
First I create a blob container. I know this operation succeeds because I can see the container in the storage explorer.
Next I create the blob. I know this operation succeeds because I can see the blob in the storage explorer.
Finally I attempt to append to the blob with the following code.
var csa = CloudStorageAccount.Parse(BLOB_CONNECTION_STRING);
var client = csa.CreateCloudBlobClient();
var containerReference = client.GetContainerReference(CONTAINER_NAME);
var blobReference = containerReference.GetAppendBlobReference(BLOB_NAME);
var ms = new MemoryStream();
var sr = new StreamWriter(ms);
sr.WriteLine(message);
ms.Seek(0, SeekOrigin.Begin);
await blobReference.AppendBlockAsync(ms);
No matter what I do I get the following exception.
WindowsAzure.Storage StorageException: The value for one of the HTTP headers is not in the correct format.
I'm at a bit of a loss as to how to proceed. I can't even tell from the exception which parameter is the problem. The connection string is copied directly from the Azure portal. Note I am using the latest version (9.3.0) of the WindowsAzure.Storage NuGet package.
Any ideas how I can figure out what the problem is?
Thanks!
Just add sr.Flush(); after sr.WriteLine(message); so that the buffered data is written to the underlying stream immediately.
AutoFlush on StreamWriter is false by default, so buffered data won't be written to the destination until we call Flush or Close.
We still need the MemoryStream that was passed to the StreamWriter's constructor, so we can't use Close; otherwise we'd get an exception like Cannot access a closed Stream.
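For reference, the write portion of the question's code with the fix applied would look something like this (same variables as above):
var ms = new MemoryStream();
var sr = new StreamWriter(ms);
sr.WriteLine(message);
sr.Flush();                   // push the buffered text into the MemoryStream
ms.Seek(0, SeekOrigin.Begin); // rewind so AppendBlockAsync reads from the start
await blobReference.AppendBlockAsync(ms);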

I know about uploading in chunks; do we have to do something on the receiving end?

My Azure Function receives large video files and images and stores them in Azure Blob storage. The client API is sending data in chunks to my Azure HTTP trigger function. Do I have to do something on the receiving end to improve performance, like receiving the data in chunks?
Bruce, actually the client code is being developed by another team. Right now I am testing it with Postman and getting the files from a multipart HTTP request.
foreach (HttpContent ctnt in provider.Contents)
{
    var dataStream = await ctnt.ReadAsStreamAsync();
    if (ctnt.Headers.ContentDisposition.Name.Trim().Replace("\"", "") == "file")
    {
        byte[] ImageBytes = ReadFully(dataStream);
        var fileName = WebUtility.UrlDecode(ctnt.Headers.ContentDisposition.FileName);
    }
}
ReadFully function:
public static byte[] ReadFully(Stream input)
{
    using (MemoryStream ms = new MemoryStream())
    {
        input.CopyTo(ms);
        return ms.ToArray();
    }
}
As the documentation for BlobRequestOptions.ParallelOperationThreadCount states:
Gets or sets the number of blocks that may be simultaneously uploaded.
Remarks:
When using the UploadFrom* methods on a blob, the blob will be broken up into blocks. Setting this value limits the number of outstanding I/O "put block" requests that the library will have in-flight at a given time. Default is 1 (no parallelism). Setting this value higher may result in faster blob uploads, depending on the network between the client and the Azure Storage service. If blobs are small (less than 256 MB), keeping this value equal to 1 is advised.
I assumed that you could explicitly set the ParallelOperationThreadCount for faster blob uploading.
var requestOption = new BlobRequestOptions()
{
    ParallelOperationThreadCount = 5 //Gets or sets the number of blocks that may be simultaneously uploaded.
};
//upload a blob from the local file system
await blockBlob.UploadFromFileAsync("{your-file-path}", null, requestOption, null);
//upload a blob from the stream
await blockBlob.UploadFromStreamAsync({stream-for-upload}, null, requestOption, null);
foreach (HttpContent ctnt in provider.Contents)
Based on your code, I assumed that you retrieve the provider instance as follows:
MultipartMemoryStreamProvider provider = await request.Content.ReadAsMultipartAsync();
At this time, you could use the following code for uploading your new blob:
var blobname = ctnt.Headers.ContentDisposition.FileName.Trim('"');
CloudBlockBlob blockBlob = container.GetBlockBlobReference(blobname);
//set the content-type for the current blob
blockBlob.Properties.ContentType = ctnt.Headers.ContentType.MediaType;
await blockBlob.UploadFromStreamAsync(await ctnt.ReadAsStreamAsync(), null, requestOption, null);
I would prefer to use MultipartFormDataStreamProvider, which stores the uploaded files from the client on the file system, instead of MultipartMemoryStreamProvider, which uses server memory to temporarily hold the data sent from the client. For the MultipartFormDataStreamProvider approach, you could follow this similar issue; a rough sketch is shown below. Moreover, I would prefer to use the Azure Storage Client Library with my Azure Function; you could follow Get started with Azure Blob storage using .NET.
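A rough sketch of the MultipartFormDataStreamProvider approach, reusing the request, container and requestOption variables from the snippets above (the temp folder is just an assumption; the provider writes each uploaded part to disk and exposes the local file names):
// Write incoming parts to a temp folder instead of buffering them in server memory
var formDataProvider = new MultipartFormDataStreamProvider(Path.GetTempPath());
await request.Content.ReadAsMultipartAsync(formDataProvider);
foreach (MultipartFileData fileData in formDataProvider.FileData)
{
    var blobName = fileData.Headers.ContentDisposition.FileName.Trim('"');
    CloudBlockBlob blockBlob = container.GetBlockBlobReference(blobName);
    blockBlob.Properties.ContentType = fileData.Headers.ContentType.MediaType;
    // Stream the temp file straight into blob storage
    using (var fileStream = File.OpenRead(fileData.LocalFileName))
    {
        await blockBlob.UploadFromStreamAsync(fileStream, null, requestOption, null);
    }
}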
UPDATE:
Moreover, you could follow this tutorial about breaking a large file into small chunks, uploading them on the client side, then merging them back on the server side.

How to save entire MailKit mime message as a byte array

I'm building a simple .NET MailKit IMAP client. Rather than pulling emails again and again from the IMAP server, is it possible to store the entire MailKit MIME message (in full, including attachments) as a byte array? If so, how?
Then I could write it to MySql or a file and reuse it for testing code changes.
As Lucas points out, you can use the MimeMessage.WriteTo() method to write the message to either a file name or to a stream (such as a MemoryStream).
If you want the message as a byte array in order to save it to an SQL database, you could do this:
using (var memory = new MemoryStream ()) {
message.WriteTo (memory);
var blob = memory.ToArray ();
// now save the blob to the database
}
To read it back from the database, you'd first read the blob as a byte[] and then do this:
using (var memory = new MemoryStream (blob, false)) {
message = MimeMessage.Load (memory);
}
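If a plain file is easier for your testing workflow than a database column, the same pair of methods works with a path (the file name here is just an example):
// Persist the full message, attachments included, to an .eml file...
message.WriteTo ("message.eml");
// ...and load it back later without hitting the IMAP server again
var reloaded = MimeMessage.Load ("message.eml");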

An Excel file is hosted on Azure Blob; how do I read it into a FileStream?

I upload a file to Azure Blob using C#.
Now I want to read the uploaded file using ExcelDataReader.
I am using the below code,
where _imageRootPathNos is the path (http://imor.blob.core.windows.net/files) where the file is saved:
FileStream stream = System.IO.File.Open(_imageRootPathNos + "/" + "ImEx.xlsx", FileMode.Open, FileAccess.Read);
I get an error: System.ArgumentException: 'URI formats are not supported.'
What am I missing?
ExcelDataReader can read data from any stream, not just a FileStream. You can use either WebClient (obsolete), HttpClient or the Azure SDK to open a stream and read the blob.
Reading or downloading a blob opens and reads a stream anyway. Instead of, e.g., downloading the blob or reading all of its contents into a buffer, you access the stream directly. No matter which technique you use, in the end you open a stream over a single URL for reading.
In your case, you can download and keep the file to reuse it, or you can read from the stream directly. You may want to do the latter in a web application if you don't have permission to write to a disk file, or if you serve many requests at the same time and don't want to deal with temporary file storage.
Using HttpClient, you can use the GetStreamAsync method to open a stream:
var client = new HttpClient();
// Note the trailing slash, so that the relative blob name resolves inside the container
client.BaseAddress = new Uri("https://imor.blob.core.windows.net/files/");
// Set headers and credentials
// ...
using (var stream = await client.GetStreamAsync("ImEx.xlsx"))
{
    var excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
    //Process the data
}
With the Azure SDK, you can use the CloudBlob.OpenReadAsync method:
var blob = container.GetBlockBlobReference("Imex.xlsx");
using(var stream=await blob.OpenReadAsync())
{
var excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
//Process the data
}
You may want to store the data in a memory buffer or a file, e.g. for caching or reprocessing. To do that you can create a MemoryStream or FileStream respectively, and copy the data from the blob stream to the target stream.
With HttpClient, you can fill a memory buffer with:
//To avoid reallocations, create a buffer large enough to hold the file
using(var memStream=new MemoryStream(65536))
{
using(var stream=await client.GetStreamAsync("ImEx.xlsx"))
{
await stream.CopyToAsync(memStream);
}
memStream.Position=0;
var excelReader = ExcelReaderFactory.CreateOpenXmlReader(memStream);
}
With the SDK:
using(var memStream=new MemoryStream(65536))
{
//.....
var blob = container.GetBlockBlobReference("Imex.xlsx");
await blob.DownloadToStreamAsync(memStream);
memStream.Position=0;
var excelReader = ExcelReaderFactory.CreateOpenXmlReader(memStream);
//...
}
To download a file you can replace the MemoryStream with a FileStream.
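For example, with the SDK the file-backed variant would look roughly like this (the local path is an assumption):
var blob = container.GetBlockBlobReference("Imex.xlsx");
// Download straight into a local file instead of a memory buffer
using (var fileStream = File.Create(@"C:\temp\ImEx.xlsx"))
{
    await blob.DownloadToStreamAsync(fileStream);
}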
You can't access Azure Blob Storage files using a standard FileStream. As suggested in Chris's answer, you could use the Azure SDK to access the file. Alternatively you could use the Azure Blob Service API.
Another solution would be to use Azure File Storage and create a mapped network drive to the File storage. Then you could use your code to access the file as if it were on a local or network storage system.
There are quite a number of technical differentiators between the two services.
As pricing goes, Azure File Storage is more expensive than Azure Blob Storage, however depending on the intended use, both are pretty cheap.
When working with the Azure Storage service, it's recommended that you use the Azure .NET SDK. The SDK exposes the appropriate methods to download, upload and manage your containers and blob storage. In this case, your code should look like this:
// Retrieve storage account from connection string.
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
CloudConfigurationManager.GetSetting("StorageConnectionString"));
// Create the blob client.
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
// Retrieve reference to a previously created container.
CloudBlobContainer container = blobClient.GetContainerReference("files");
// Retrieve reference to a blob named "imex.xlsx".
CloudBlockBlob blockBlob = container.GetBlockBlobReference("Imex.xlsx");
// Save blob contents to a file.
using (var fileStream = System.IO.File.OpenWrite(@"path\myfile"))
{
blockBlob.DownloadToStream(fileStream);
}
You can find all the information you need on how to use the SDK here: https://learn.microsoft.com/en-us/azure/storage/storage-dotnet-how-to-use-blobs
I used this block of code to read the Excel file (uploaded to Azure) into a DataSet:
Uri blobUri = new Uri(_imageRootPath + "/" + fileName);
var wc = new WebClient();
var sourceStream = wc.DownloadData(blobUri);
Stream memoryStream = new MemoryStream(sourceStream);
IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(memoryStream);
DataSet dsResult = excelReader.AsDataSet();
return dsResult;

Amazon S3 TransferUtility - use FilePath or Stream?

When uploading a file to S3 using the TransferUtility class, there is an option to either use FilePath or an input stream. I'm using multi-part uploads.
I'm uploading a variety of things, of which some are files on disk and others are raw streams. I'm currently using the InputStream variety for everything, which works OK, but I'm wondering if I should specialize the method further. For the files on disk, I'm basically using File.OpenRead and passing that stream to the InputStream of the transfer request.
Are there any performance gains or other reasons to prefer the FilePath method over the InputStream one where the input is known to be a file?
In short: Is this the same thing
using (var fs = File.OpenRead("some path"))
{
var uploadMultipartRequest = new TransferUtilityUploadRequest
{
BucketName = "defaultBucket",
Key = "key",
InputStream = fs,
PartSize = partSize
};
using (var transferUtility = new TransferUtility(s3Client))
{
await transferUtility.UploadAsync(uploadMultipartRequest);
}
}
As:
var uploadMultipartRequest = new TransferUtilityUploadRequest
{
BucketName = "defaultBucket",
Key = "key",
FilePath = "some path",
PartSize = partSize
};
using (var transferUtility = new TransferUtility(s3Client))
{
await transferUtility.UploadAsync(uploadMultipartRequest);
}
Or is there any significant difference between the two? I know whether the files are large or not, and could prefer one method or the other based on that.
Edit: I've also done some decompiling of the S3Client, and there does indeed seem to be some difference with regard to the concurrency level of the transfer, as found in MultipartUploadCommand.cs:
private int CalculateConcurrentServiceRequests()
{
int num = !this._fileTransporterRequest.IsSetFilePath() || this._s3Client is AmazonS3EncryptionClient ? 1 : this._config.ConcurrentServiceRequests;
if (this._totalNumberOfParts < num)
num = this._totalNumberOfParts;
return num;
}
From the TransferUtility documentation:
When uploading large files by specifying file paths instead of a stream, TransferUtility uses multiple threads to upload multiple parts of a single upload at once. When dealing with large content sizes and high bandwidth, this can increase throughput significantly.
Which suggests that using file paths will use multipart upload, but using a stream won't.
But when I read through the documentation for the Upload(stream, bucketName, key) method:
Uploads the contents of the specified stream. For large uploads, the file will be divided and uploaded in parts using Amazon S3's multipart API. The parts will be reassembled as one object in Amazon S3.
Which means that multipart upload is used for streams as well.
Amazon recommends using multipart upload if the file size is larger than 100 MB: http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html
Multipart upload allows you to upload a single object as a set of parts. Each part is a contiguous portion of the object's data. You can upload these object parts independently and in any order. If transmission of any part fails, you can retransmit that part without affecting other parts. After all parts of your object are uploaded, Amazon S3 assembles these parts and creates the object. In general, when your object size reaches 100 MB, you should consider using multipart uploads instead of uploading the object in a single operation.
Using multipart upload provides the following advantages:
Improved throughput: You can upload parts in parallel to improve throughput.
Quick recovery from any network issues: Smaller part size minimizes the impact of restarting a failed upload due to a network error.
Pause and resume object uploads: You can upload object parts over time. Once you initiate a multipart upload there is no expiry; you must explicitly complete or abort the multipart upload.
Begin an upload before you know the final object size: You can upload an object as you are creating it.
So as far as Amazon S3 is concerned, there is no difference between using a stream or a file path, but it might make a slight performance difference depending on your code and OS.
I think the difference may be that they both use the Multipart Upload API, but using a FilePath allows for concurrent uploads. However:
When you're using a stream for the source of data, the TransferUtility class does not do concurrent uploads.
https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingTheMPDotNetAPI.html
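If you do take the FilePath route and want to tune that concurrency, my reading of the decompiled code is that _config corresponds to a TransferUtilityConfig passed to the TransferUtility constructor; a hedged sketch, reusing the request from the question:
var transferConfig = new TransferUtilityConfig
{
    // Number of parts uploaded in parallel; per the decompiled code, only honored for FilePath uploads
    ConcurrentServiceRequests = 10
};
var uploadMultipartRequest = new TransferUtilityUploadRequest
{
    BucketName = "defaultBucket",
    Key = "key",
    FilePath = "some path",
    PartSize = partSize
};
using (var transferUtility = new TransferUtility(s3Client, transferConfig))
{
    await transferUtility.UploadAsync(uploadMultipartRequest);
}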
