I'm trying to perform a textbook append to an append blob in Azure.
First I create a blob container. I know this operation succeeds because I can see the container in the storage explorer.
Next I create the blob. I know this operation succeeds because I can see the blob in the storage explorer.
Finally I attempt to append to the blob with the following code.
var csa = CloudStorageAccount.Parse(BLOB_CONNECTION_STRING);
var client = csa.CreateCloudBlobClient();
var containerReference = client.GetContainerReference(CONTAINER_NAME);
var blobReference = containerReference.GetAppendBlobReference(BLOB_NAME);
var ms = new MemoryStream();
var sr = new StreamWriter(ms);
sr.WriteLine(message);
ms.Seek(0, SeekOrigin.Begin);
await blobReference.AppendBlockAsync(ms);
No matter what I do, I get the following exception.
Microsoft.WindowsAzure.Storage.StorageException: The value for one of the HTTP headers is not in the correct format.
I'm at a bit of a loss as to how to proceed. I can't even determine from the exception which parameter is the problem. The connection string is copied directly from the Azure portal. Note that I am using the latest version (9.3.0) of the WindowsAzure.Storage NuGet package.
Any ideas how I can figure out what the problem is?
Thanks!
Just add sr.Flush(); after sr.WriteLine(message); so that the buffered data is written to the underlying stream immediately.
AutoFlush is false by default on StreamWriter, so buffered data won't be written to the destination until we call Flush or Close.
We still need the MemoryStream that was passed to the StreamWriter constructor, so we can't call Close, otherwise we will get an exception like "Cannot access a closed Stream".
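For reference, here's a minimal sketch of the corrected append (same names as in the question; the connection string, container name, blob name and message are placeholders):
var csa = CloudStorageAccount.Parse(BLOB_CONNECTION_STRING);
var client = csa.CreateCloudBlobClient();
var containerReference = client.GetContainerReference(CONTAINER_NAME);
var blobReference = containerReference.GetAppendBlobReference(BLOB_NAME);
using (var ms = new MemoryStream())
using (var sr = new StreamWriter(ms))
{
    sr.WriteLine(message);
    sr.Flush();                    // push the buffered text into the MemoryStream
    ms.Seek(0, SeekOrigin.Begin);  // rewind so AppendBlockAsync reads from the start
    await blobReference.AppendBlockAsync(ms);
    // the writer and stream are only disposed after the append has completed
}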
Is there a way to create a stream object directly on an Azure Blob or Azure Block Blob?
i.e.
var s = new AzureStream(blockObject)
ms.CopyTo(s);
s.Position = 200;
ms.CopyTo(s);
s.Read...
This would allow for some awesome interactions, such as storing database indices in an Azure blob and not needing to pull them down locally.
Not sure if this answers your question, but you can read a range of bytes from a blob. When using the REST API directly, you can specify the bytes you want to read in either the Range or x-ms-range header.
When using the C# SDK, you can use the DownloadRangeToStream method, something like:
using (var ms = new MemoryStream())
{
long offset = 200;
long bytesToRead = 1024;
blob.DownloadRangeToStream(ms, offset, bytesToRead);
}
If your question is "can I use Streams with Azure Blobs, so that I don't need to hold the entire blob in memory at any point in time?", then the answer is absolutely yes.
For example, when reading block blobs, as per this answer here, blobs can be accessed as a stream handle with methods such as CloudBlob.OpenReadAsync. The default buffer size is 4 MB, but it can be adjusted via properties like StreamMinimumReadSizeInBytes. Here we copy the blob stream to another open output stream:
using (var stream = await myBlockBlob.OpenReadAsync(cancellationToken))
{
await stream.CopyToAsync(outputStream);
}
Similarly, you can write a stream directly into Blob Storage:
await blockBlob.UploadFromStreamAsync(streamToSave, cancellationToken);
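If you specifically want a writable stream handle, closer to the hypothetical AzureStream in the question, CloudBlockBlob also exposes OpenWriteAsync. A minimal sketch, assuming myBlockBlob is an existing CloudBlockBlob reference and ms is any readable source stream:
// Open a write stream over the blob and copy data into it.
// Note the returned stream is forward-only; it does not support seeking back
// via Position the way the hypothetical AzureStream above would.
using (var blobStream = await myBlockBlob.OpenWriteAsync())
{
    await ms.CopyToAsync(blobStream);
    // disposing the stream flushes and commits the uploaded blocks
}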
My Azure Function receives large video files and images and stores them in Azure Blob storage. The client API is sending data in chunks to my Azure HTTP trigger function. Do I have to do something on the receiving end to improve performance, like receiving data in chunks?
Bruce, actually the client code is being developed by some other team. Right now I am testing it with Postman and getting the files from a multipart HTTP request.
foreach (HttpContent ctnt in provider.Contents)
{
    var dataStream = await ctnt.ReadAsStreamAsync();
    if (ctnt.Headers.ContentDisposition.Name.Trim().Replace("\"", "") == "file")
    {
        byte[] ImageBytes = ReadFully(dataStream);
        var fileName = WebUtility.UrlDecode(ctnt.Headers.ContentDisposition.FileName);
    }
}
ReadFully Function
public static byte[] ReadFully(Stream input)
{
    using (MemoryStream ms = new MemoryStream())
    {
        input.CopyTo(ms);
        return ms.ToArray();
    }
}
As the documentation for BlobRequestOptions.ParallelOperationThreadCount states:
Gets or sets the number of blocks that may be simultaneously uploaded.
Remarks:
When using the UploadFrom* methods on a blob, the blob will be broken up into blocks. Setting this value limits the number of outstanding I/O "put block" requests that the library will have in-flight at a given time. Default is 1 (no parallelism). Setting this value higher may result in faster blob uploads, depending on the network between the client and the Azure Storage service. If blobs are small (less than 256 MB), keeping this value equal to 1 is advised.
So you could explicitly set ParallelOperationThreadCount for faster blob uploads:
var requestOption = new BlobRequestOptions()
{
    ParallelOperationThreadCount = 5 // Gets or sets the number of blocks that may be simultaneously uploaded.
};

//upload a blob from the local file system
await blockBlob.UploadFromFileAsync("{your-file-path}", null, requestOption, null);

//upload a blob from the stream
await blockBlob.UploadFromStreamAsync({stream-for-upload}, null, requestOption, null);
foreach (HttpContent ctnt in provider.Contents)
Based on your code, I assumed that you retrieve the provider instance as follows:
MultipartMemoryStreamProvider provider = await request.Content.ReadAsMultipartAsync();
Then you could use the following code to upload your new blob:
var blobname = ctnt.Headers.ContentDisposition.FileName.Trim('"');
CloudBlockBlob blockBlob = container.GetBlockBlobReference(blobname);
//set the content-type for the current blob
blockBlob.Properties.ContentType = ctnt.Headers.ContentType.MediaType;
await blockBlob.UploadFromStreamAsync(await ctnt.ReadAsStreamAsync(), null, requestOption, null);
I would prefer to use MultipartFormDataStreamProvider, which stores the uploaded files from the client on the file system, instead of MultipartMemoryStreamProvider, which uses server memory to temporarily store the data sent from the client. For the MultipartFormDataStreamProvider approach, you could follow this similar issue. Moreover, I would prefer to use the Azure Storage Client Library with my Azure Function; you could follow Get started with Azure Blob storage using .NET.
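A rough sketch of that MultipartFormDataStreamProvider approach, assuming the usual System.Net.Http and Azure Storage namespaces, the same request variable as above, and an already-initialized CloudBlobContainer named container (the temp folder is illustrative only):
// Buffer the multipart parts to temp files on disk instead of in server memory,
// then stream each saved file into Blob Storage.
string tempRoot = Path.Combine(Path.GetTempPath(), "uploads");
Directory.CreateDirectory(tempRoot);
var provider = new MultipartFormDataStreamProvider(tempRoot);
await request.Content.ReadAsMultipartAsync(provider);

foreach (MultipartFileData fileData in provider.FileData)
{
    var blobName = fileData.Headers.ContentDisposition.FileName.Trim('"');
    CloudBlockBlob blockBlob = container.GetBlockBlobReference(blobName);
    using (var fileStream = File.OpenRead(fileData.LocalFileName))
    {
        await blockBlob.UploadFromStreamAsync(fileStream);
    }
}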
UPDATE:
Moreover, you could follow this tutorial about breaking a large file into small chunks, uploading them on the client side, then merging them back on the server side.
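On the storage side, the block blob API maps naturally onto that chunked pattern. A hedged sketch using PutBlockAsync/PutBlockListAsync (the container reference, blob name, chunks variable, and block-ID scheme are illustrative assumptions):
// Stage each received chunk as a block, then commit the block list to merge them.
CloudBlockBlob blockBlob = container.GetBlockBlobReference("large-video.mp4");
var blockIds = new List<string>();
int blockNumber = 0;

foreach (Stream chunk in chunks) // 'chunks' stands in for however the client-sent pieces arrive
{
    // Block IDs must be base64-encoded and all the same length before encoding.
    string blockId = Convert.ToBase64String(BitConverter.GetBytes(blockNumber++));
    await blockBlob.PutBlockAsync(blockId, chunk, null); // third argument is the optional content MD5
    blockIds.Add(blockId);
}

await blockBlob.PutBlockListAsync(blockIds); // merges the staged blocks into the final blob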
I upload a file to Azure Blob storage using C#.
Now I want to read the uploaded file using ExcelDataReader.
I am using the below code.
where _imageRootPathNos is the path (http://imor.blob.core.windows.net/files) where the file is saved:
FileStream stream = System.IO.File.Open(_imageRootPathNos + "/" + "ImEx.xlsx", FileMode.Open, FileAccess.Read);
I get an error: System.ArgumentException: 'URI formats are not supported.'
What am I missing?
ExcelDataReader can read data from any stream, not just a FileStream. You can use either WebClient (obsolete), HttpClient, or the Azure SDK to open a stream and read the blob.
Reading or downloading a blob opens and reads a stream anyway. Instead of, e.g., downloading the blob or reading all of its contents into a buffer, you access the stream directly. No matter which technique you use, in the end you open a stream over a single URL for reading.
In your case, you can download and keep the file to reuse it, or you can read from the stream directly. You may want to do the latter in a web application if you don't have permission to write to a disk file, or if you serve many requests at the same time and don't want to deal with temporary file storage.
Using HttpClient, you can use the GetStreamAsync method to open a stream:
var client = new HttpClient();
client.BaseAddress = new Uri("https://imor.blob.core.windows.net/files/");
// Set headers and credentials
// ...
using (var stream = await client.GetStreamAsync("ImEx.xlsx"))
{
    var excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
    //Process the data
}
With the Azure SDK, you can use the CloudBlob.OpenReadAsync method:
var blob = container.GetBlockBlobReference("Imex.xlsx");
using(var stream=await blob.OpenReadAsync())
{
var excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
//Process the data
}
You may want to store the data in a memory buffer or a file, e.g. for caching or reprocessing. To do that you can create a MemoryStream or FileStream respectively, and copy the data from the blob stream to the target stream.
With HttpClient, you can fill a memory buffer with:
//To avoid reallocations, create a buffer large enough to hold the file
using (var memStream = new MemoryStream(65536))
{
    using (var stream = await client.GetStreamAsync("ImEx.xlsx"))
    {
        await stream.CopyToAsync(memStream);
    }
    memStream.Position = 0;
    var excelReader = ExcelReaderFactory.CreateOpenXmlReader(memStream);
}
With the SDK:
using (var memStream = new MemoryStream(65536))
{
    //.....
    var blob = container.GetBlockBlobReference("Imex.xlsx");
    await blob.DownloadToStreamAsync(memStream);
    memStream.Position = 0;
    var excelReader = ExcelReaderFactory.CreateOpenXmlReader(memStream);
    //...
}
To download a file you can replace the MemoryStream with a FileStream.
You can't access Azure Blob Storage files using a standard FileStream. As suggested in Chris's answer, you could use the Azure SDK to access the file. Alternatively you could use the Azure Blob Service API.
Another solution would be to use Azure File Storage and create a mapped network drive to the File storage. Then you could use your code to access the file as if it were on a local or network storage system.
There are quite a number of technical differentiators between the two services.
As far as pricing goes, Azure File Storage is more expensive than Azure Blob Storage; however, depending on the intended use, both are pretty cheap.
When working with the Azure Storage service, it's recommended that you use the Azure .NET SDK. The SDK exposes the appropriate methods to download, upload and manage your containers and blob storage. In this case, your code should look like this:
// Retrieve storage account from connection string.
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
CloudConfigurationManager.GetSetting("StorageConnectionString"));
// Create the blob client.
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
// Retrieve reference to a previously created container.
CloudBlobContainer container = blobClient.GetContainerReference("files");
// Retrieve reference to a blob named "imex.xlsx".
CloudBlockBlob blockBlob = container.GetBlockBlobReference("Imex.xlsx");
// Save blob contents to a file.
using (var fileStream = System.IO.File.OpenWrite(@"path\myfile"))
{
blockBlob.DownloadToStream(fileStream);
}
You can find all the information you need on how to use the SDK here: https://learn.microsoft.com/en-us/azure/storage/storage-dotnet-how-to-use-blobs
I used this block of code to read the Excel file (uploaded to Azure) into a DataSet:
Uri blobUri = new Uri(_imageRootPath + "/" + fileName);
var wc = new WebClient();
var sourceStream = wc.DownloadData(blobUri);
Stream memoryStream = new MemoryStream(sourceStream);
IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(memoryStream);
DataSet dsResult = excelReader.AsDataSet();
return dsResult;
In an attempt to create a non-buffered file upload I have extended System.Web.Http.WebHost.WebHostBufferPolicySelector, overriding function UseBufferedInputStream() as described in this article: http://www.strathweb.com/2012/09/dealing-with-large-files-in-asp-net-web-api/. When a file is POSTed to my controller, I can see in trace output that the overridden function UseBufferedInputStream() is definitely returning FALSE as expected. However, using diagnostic tools I can see the memory growing as the file is being uploaded.
The heavy memory usage appears to be occurring in my custom MediaTypeFormatter (something like the FileMediaFormatter here: http://lonetechie.com/). It is in this formatter that I would like to incrementally write the incoming file to disk, but I also need to parse json and do some other operations with the Content-Type:multipart/form-data upload. Therefore I'm using HttpContent method ReadAsMultiPartAsync(), which appears to be the source of the memory growth. I have placed trace output before/after the "await", and it appears that while the task is blocking the memory usage is increasing fairly rapidly.
Once I find the file content in the parts returned by ReadAsMultiPartAsync(), I am using Stream.CopyTo() in order to write the file contents to disk. This writes to disk as expected, but unfortunately the source file is already in memory by this point.
Does anyone have any thoughts about what might be going wrong? It seems that ReadAsMultiPartAsync() is buffering the whole post data; if that is true why do we require var fileStream = await fileContent.ReadAsStreamAsync() to get the file contents? Is there another way to accomplish the splitting of the parts without reading them into memory? The code in my MediaTypeFormatter looks something like this:
// save the stream so we can seek/read again later
Stream stream = await content.ReadAsStreamAsync();
var parts = await content.ReadAsMultipartAsync(); // <- memory usage grows rapidly
if (!content.IsMimeMultipartContent())
{
throw new HttpResponseException(HttpStatusCode.UnsupportedMediaType);
}
//
// pull data out of parts.Contents, process json, etc.
//
// find the file data in the multipart contents
var fileContent = parts.Contents.FirstOrDefault(
x => x.Headers.ContentDisposition.DispositionType.ToLower().Trim() == "form-data" &&
x.Headers.ContentDisposition.Name.ToLower().Trim() == "\"" + DATA_CONTENT_DISPOSITION_NAME_FILE_CONTENTS + "\"");
// write the file to disk
using (var fileStream = await fileContent.ReadAsStreamAsync())
{
using (FileStream toDisk = File.OpenWrite("myUploadedFile.bin"))
{
((Stream)fileStream).CopyTo(toDisk);
}
}
WebHostBufferPolicySelector only specifies if the underlying request is bufferless. This is what Web API will do under the hood:
IHostBufferPolicySelector policySelector = _bufferPolicySelector.Value;
bool isInputBuffered = policySelector == null ? true : policySelector.UseBufferedInputStream(httpContextBase);
Stream inputStream = isInputBuffered
? requestBase.InputStream
: httpContextBase.ApplicationInstance.Request.GetBufferlessInputStream();
So if your implementation returns false, then the request is bufferless.
However, ReadAsMultipartAsync() loads everything into MemoryStream - because if you don't specify a provider, it defaults to MultipartMemoryStreamProvider.
To get the files to save automatically to disk as every part is processed use MultipartFormDataStreamProvider (if you deal with files and form data) or MultipartFileStreamProvider (if you deal with just files).
There is an example on asp.net or here. In these examples everything happens in controllers, but there is no reason why you wouldn't use it in, for example, a formatter.
Another option, if you really want to play with streams, is to implement a custom class inheriting from MultipartStreamProvider that would fire whatever processing you want as soon as it grabs part of the stream. The usage would be similar to the aforementioned providers - you'd need to pass it to the ReadAsMultipartAsync(provider) method.
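For illustration, here's a minimal sketch of such a provider that hands each part a FileStream as soon as the part's headers arrive (the target folder and naming scheme are my assumptions, not part of the framework):
// Every part is written straight to a file as Web API reads it, so nothing is buffered in memory.
public class DirectToDiskStreamProvider : MultipartStreamProvider
{
    private readonly string _rootPath;

    public DirectToDiskStreamProvider(string rootPath)
    {
        _rootPath = rootPath;
    }

    // Called once per part, before the part's body is read;
    // the returned stream receives the part's bytes as they arrive.
    public override Stream GetStream(HttpContent parent, HttpContentHeaders headers)
    {
        string fileName = Guid.NewGuid().ToString("N"); // illustrative naming scheme
        return File.Create(Path.Combine(_rootPath, fileName));
    }
}

// Usage, e.g. inside the formatter:
// var provider = new DirectToDiskStreamProvider(@"c:\temp\uploads");
// await content.ReadAsMultipartAsync(provider);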
Finally - if you are feeling suicidal - since the underlying request stream is bufferless theoretically you could use something like this in your controller or formatter:
Stream stream = HttpContext.Current.Request.GetBufferlessInputStream();
byte[] b = new byte[32 * 1024];
int n;
while ((n = stream.Read(b, 0, b.Length)) > 0)
{
//do stuff with stream bit
}
But of course that's very, for lack of a better word, "ghetto."
I'm trying to fix a bug where the following code results in a 0 byte file on S3, and no error message.
This code feeds in a Stream (from the poorly-named FileUpload4) which contains an image and the desired image path (from a database wrapper object) to Amazon's S3, but the file itself is never uploaded.
CloudUtils.UploadAssetToCloud(FileUpload4.FileContent, ((ImageContent)auxSRC.Content).PhysicalLocationUrl);
ContentWrapper.SaveOrUpdateAuxiliarySalesRoleContent(auxSRC);
The second line simply saves the database object which stores information about the (supposedly) uploaded picture. This save is going through, demonstrating that the above line runs without error.
The first line above calls in to this method, after retrieving an appropriate bucketname:
public static bool UploadAssetToCloud(Stream asset, string path, string bucketName, AssetSecurity security = AssetSecurity.PublicRead)
{
TransferUtility txferUtil;
S3CannedACL ACL = GetS3ACL(security);
using (txferUtil = new Amazon.S3.Transfer.TransferUtility(AWSKEY, AWSSECRETKEY))
{
TransferUtilityUploadRequest request = new TransferUtilityUploadRequest()
.WithBucketName(bucketName)
.WithTimeout(TWO_MINUTES)
.WithCannedACL(ACL)
.WithKey(path);
request.InputStream = asset;
txferUtil.Upload(request);
}
return true;
}
I have made sure that the stream is a good stream - I can save it anywhere else I have permissions for, the bucket exists and the path is fine (the file is created at the destination on S3, it just doesn't get populated with the content of the stream). I'm close to my wits end, here - what am I missing?
EDIT: One of my coworkers pointed out that it would be better to use the FileUpload's PostedFile property. I'm now pulling the stream off of that instead. It still isn't working.
Is the stream positioned correctly? Check asset.Position to make sure the position is set to the beginning of the stream.
asset.Seek(0, SeekOrigin.Begin);
Edit
OK, more guesses (I'm down to guesses, though):
(all of this is assuming that you can still read from your incoming stream just fine "by hand")
Just for testing, try one of the simpler Upload methods on the TransferUtility -- maybe one that just takes a file path string. If that works, then maybe there are additional properties to set on the UploadRequest object.
If you hook the UploadProgressEvent on the UploadRequest object, do you get any additional clues to what's going wrong?
I noticed that the UploadRequest's API includes both an InputStream property and a WithInputStream fluent method. Maybe there's a bug with setting InputStream? Maybe try using the .WithInputStream API instead.
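For that last suggestion, a hedged sketch reusing the variables from the question's UploadAssetToCloud method (this assumes WithInputStream chains like the other With* methods shown there):
// Set the source stream through the fluent API rather than the InputStream property.
TransferUtilityUploadRequest request = new TransferUtilityUploadRequest()
    .WithBucketName(bucketName)
    .WithTimeout(TWO_MINUTES)
    .WithCannedACL(ACL)
    .WithKey(path)
    .WithInputStream(asset);

txferUtil.Upload(request);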
Which Stream are you using? Does the stream you are using support the mark() and reset() methods?
It might be that the upload method first calculates the MD5 for the given stream and then uploads it, so if your stream does not support these two methods, then by the time the MD5 calculation finishes the stream has reached EOF and cannot be repositioned to upload the object.
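If that is the case, one possible workaround (my assumption, not something confirmed by the SDK documentation) is to copy the incoming stream into a seekable MemoryStream before handing it to the TransferUtility:
// Buffer the source into a seekable stream so it can be read twice:
// once for the MD5 calculation and once for the actual upload.
// 'request', 'asset' and 'txferUtil' are the variables from the question's method.
var seekable = new MemoryStream();
asset.CopyTo(seekable);
seekable.Seek(0, SeekOrigin.Begin);

request.InputStream = seekable;
txferUtil.Upload(request);
For large uploads, buffering to a temporary FileStream instead of a MemoryStream avoids holding the whole file in memory.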