Azure blob storage V12 - Example of using specialized class BlockBlobStorage - c#

I am having a hard time with the new version of the Azure Storage Blobs client library for .NET.
What I need is a stream I can write data to, and once the stream reaches a size of, say, 4 MB, I need to upload it. I found BlockBlobClient, which has two methods, CommitBlockListAsync and StageBlockAsync. These methods look like what I need, but I can't find any examples of their usage.
Do you know of a scenario similar to mine? Or can someone help me understand this client?
I need something like this: every 4 MB, stage the block, clear the stream, and continue writing:
public class MyStreamWrapper : Stream
{
readonly BlockBlobClient _blockBlobClient;
readonly Stream _wrappedStream;
bool _isCommited;
readonly List<string> _blockIds;
public MyStreamWrapper (BlockBlobClient blockBlobClient)
{
_wrappedStream = new MemoryStream();
_blockBlobClient = blockBlobClient;
_isCommited = false;
_blockIds = new List<string>();
}
public override async Task WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken)
{
if ((_wrappedStream.Length + buffer.Length) / 1024 > 4) // check size if
{
int byteCount = (int)(_wrappedStream.Length - buffer.Length);
if (byteCount > 0)
{
_wrappedStream.Write(buffer, offset, byteCount);
offset += byteCount;
}
string base64Id = Convert.ToBase64String(buffer);
_blockIds.Add(base64Id);
_blockBlobClient.StageBlock(base64Id, _wrappedStream);
_wrappedStream.Flush();
}
await _wrappedStream.WriteAsync(buffer, offset, count, cancellationToken);
}
}

For APIs that don't appear in the samples folder, look at the tests, e.g.:
[Test]
public async Task CommitBlockListAsync()
{
await using DisposingContainer test = await GetTestContainerAsync();
// Arrange
BlockBlobClient blob = InstrumentClient(test.Container.GetBlockBlobClient(GetNewBlobName()));
var data = GetRandomBuffer(Size);
var firstBlockName = GetNewBlockName();
var secondBlockName = GetNewBlockName();
var thirdBlockName = GetNewBlockName();
// Act
// Stage blocks
using (var stream = new MemoryStream(data))
{
await blob.StageBlockAsync(ToBase64(firstBlockName), stream);
}
using (var stream = new MemoryStream(data))
{
await blob.StageBlockAsync(ToBase64(secondBlockName), stream);
}
// Commit first two Blocks
var commitList = new string[]
{
ToBase64(firstBlockName),
ToBase64(secondBlockName)
};
await blob.CommitBlockListAsync(commitList);
// Stage 3rd Block
using (var stream = new MemoryStream(data))
{
await blob.StageBlockAsync(ToBase64(thirdBlockName), stream);
}
// Assert
Response<BlockList> blobList = await blob.GetBlockListAsync(BlockListTypes.All);
Assert.AreEqual(2, blobList.Value.CommittedBlocks.Count());
Assert.AreEqual(ToBase64(firstBlockName), blobList.Value.CommittedBlocks.First().Name);
Assert.AreEqual(ToBase64(secondBlockName), blobList.Value.CommittedBlocks.ElementAt(1).Name);
Assert.AreEqual(1, blobList.Value.UncommittedBlocks.Count());
Assert.AreEqual(ToBase64(thirdBlockName), blobList.Value.UncommittedBlocks.First().Name);
}
https://github.com/Azure/azure-sdk-for-net/blob/master/sdk/storage/Azure.Storage.Blobs/tests/BlockBlobClientTests.cs
Also familiarize yourself with the REST API, for which the client libraries are wrappers. You are using a "lower level" API approach, that maps directly to Put Block and Put Block List REST APIs.
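The stage-then-commit flow shown in the test can be sketched as a small buffering helper. This is only a sketch under stated assumptions, not the SDK's own pattern: BlockStagingWriter and its members are hypothetical names, and the two delegates stand in for BlockBlobClient.StageBlockAsync and BlockBlobClient.CommitBlockListAsync so the example stays self-contained. The block ID scheme (a zero-padded counter, Base64-encoded) is one possible choice that keeps all IDs the same length.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper, not part of the Azure SDK: buffers writes and hands
// every full block to a caller-supplied "stage" delegate.
public sealed class BlockStagingWriter
{
    private readonly int _blockSize;
    private readonly Func<string, Stream, Task> _stageBlock;        // assumed to wrap StageBlockAsync
    private readonly Func<IEnumerable<string>, Task> _commitBlocks; // assumed to wrap CommitBlockListAsync
    private readonly List<string> _blockIds = new List<string>();
    private readonly MemoryStream _buffer = new MemoryStream();

    public BlockStagingWriter(int blockSize,
        Func<string, Stream, Task> stageBlock,
        Func<IEnumerable<string>, Task> commitBlocks)
    {
        _blockSize = blockSize;
        _stageBlock = stageBlock;
        _commitBlocks = commitBlocks;
    }

    public async Task WriteAsync(byte[] data, int offset, int count)
    {
        while (count > 0)
        {
            // Copy only as much as still fits into the current block.
            int room = _blockSize - (int)_buffer.Length;
            int take = Math.Min(room, count);
            _buffer.Write(data, offset, take);
            offset += take;
            count -= take;

            if (_buffer.Length == _blockSize)
            {
                await StageBufferedBlockAsync();
            }
        }
    }

    // Stages whatever is left in the buffer, then commits all block IDs in order.
    public async Task CompleteAsync()
    {
        if (_buffer.Length > 0)
        {
            await StageBufferedBlockAsync();
        }
        await _commitBlocks(_blockIds);
    }

    private async Task StageBufferedBlockAsync()
    {
        // Block IDs are Base64 strings and must all have the same length within one blob.
        string id = Convert.ToBase64String(
            Encoding.UTF8.GetBytes(_blockIds.Count.ToString("d6")));

        _buffer.Position = 0;          // rewind so the delegate reads from the start
        await _stageBlock(id, _buffer);
        _blockIds.Add(id);

        _buffer.SetLength(0);          // discard the staged bytes
        _buffer.Position = 0;
    }
}
```

With a real client, the wiring would presumably look like new BlockStagingWriter(4 * 1024 * 1024, (id, s) => blob.StageBlockAsync(id, s), ids => blob.CommitBlockListAsync(ids)), where blob is a BlockBlobClient. Nothing becomes visible in the blob until CompleteAsync commits the block list.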

Related

Stream Extensions to convert Stream content into String or Byte array

Using C# 10, I am creating Stream extensions to read the content into a string or a byte array.
Something similar to File.ReadAllTextAsync in .NET 6.
public static async Task<string> ReadAllTextAsync(this Stream stream) {
string result;
using (var reader = new StreamReader(stream)) {
result = await reader.ReadToEndAsync().ConfigureAwait(false);
}
return result;
}
public static async Task<byte[]> ReadAllBytesAsync(this Stream stream) {
using (var content = new MemoryStream()) {
var buffer = new byte[4096];
int read = await stream.ReadAsync(buffer, 0, 4096).ConfigureAwait(false);
while (read > 0) {
content.Write(buffer, 0, read);
read = await stream.ReadAsync(buffer, 0, 4096).ConfigureAwait(false);
}
return content.ToArray();
}
}
public static async Task<List<string>> ReadAllLinesAsync(this Stream stream) {
var lines = new List<string>();
using (var reader = new StreamReader(stream)) {
string line;
while ((line = await reader.ReadLineAsync().ConfigureAwait(false)) != null) {
lines.Add(line);
}
}
return lines;
}
Is there a better way to do this?
I am not sure about the ConfigureAwait(false) that I picked up from some code online.
A better alternative for ReadAllBytesAsync is:
public static async Task<byte[]> ReadAllBytesAsync(this Stream stream)
{
    switch (stream)
    {
        case MemoryStream mem:
            // Already in memory: copy its contents directly.
            return mem.ToArray();
        default:
        {
            // Braces are needed: a using declaration cannot sit directly in a switch section.
            using var m = new MemoryStream();
            await stream.CopyToAsync(m);
            return m.ToArray();
        }
    }
}
For ReadAllLinesAsync, async streams (C# 8) can make the code cleaner:
public static async IAsyncEnumerable<string> ReadAllLinesAsync(this Stream stream)
{
    using var reader = new StreamReader(stream);
    while (await reader.ReadLineAsync() is { } line)
    {
        yield return line;
    }
}
Note that the empty braces { } here are a property pattern, available since C# 8. It checks whether the result of awaiting reader.ReadLineAsync() is non-null and, if so, assigns it to the variable line.
Usage:
await foreach (var line in stream.ReadAllLinesAsync())
{
    // write your own logic here
}
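To illustrate the { } pattern mentioned above in isolation, here is a minimal sketch. PatternDemo and Describe are hypothetical names used only for this example.

```csharp
public static class PatternDemo
{
    // `value is { } v` matches any non-null value and binds it to v;
    // a null value falls through to the second branch.
    public static string Describe(object value) =>
        value is { } v ? $"non-null: {v}" : "null";
}
```

Calling PatternDemo.Describe(5) takes the first branch, while PatternDemo.Describe(null) takes the second.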
P.S.:
ConfigureAwait(false) has no effect when there is no SynchronizationContext to capture, as in a console app. It instructs the awaiter not to capture the current SynchronizationContext, so the continuation is not marshaled back to the original context and typically runs on a thread-pool thread instead. This is useful when you're writing a library or SDK, since your users may call it from a GUI application, where the combination of blocking waits such as Task.Wait() and a captured SynchronizationContext often leads to deadlock; ConfigureAwait(false) avoids this. For a detailed explanation, see the ConfigureAwait FAQ.

ASP.NET Core 3.1 read stream file upload in HttpPut request problem

Problem Statement:
I'm trying to iterate over a streamed file upload in an HttpPut request using the Request.Body stream. I'm having a really hard time, and my google-fu has turned up little. I expect something like this to work, and it doesn't:
[HttpPut("{accountName}/{subAccount}/{revisionId}/{randomNumber}")]
[ProducesResponseType(StatusCodes.Status200OK)]
[ProducesResponseType(StatusCodes.Status500InternalServerError)]
public async Task<IActionResult> PutTest()
{
var memStream = new MemoryStream();
var b = new Memory<byte>();
int totalBytes = 0;
int bytesRead = 0;
byte[] buffer = new byte[1024];
do
{
bytesRead = await Request.Body.ReadAsync(new Memory<byte>(buffer), CancellationToken.None);
totalBytes += bytesRead;
await memStream.WriteAsync(buffer, 0, bytesRead);
} while (bytesRead > 0);
return Ok(memStream);
}
In the debugger, I can examine the Request.Body and look at its internal _buffer. It contains the desired data. When the above code runs, the MemoryStream is full of zeros. During "Read", the buffer is also full of zeros. The Request.Body also has a length of 0.
The Goal:
Use a HttpPut request to upload a file via streaming, iterate over it in chunks, do some processing, and stream those chunks using gRPC to another endpoint. I want to avoid reading the entire file into memory.
What I've tried:
This works:
using (var sr = new StreamReader(Request.Body))
{
var body = await sr.ReadToEndAsync();
return Ok(body);
}
That code will read all of the Stream into memory as a string which is quite undesirable, but it proves to me that the Request.Body data can be read in some fashion in the method I'm working on.
In the configure method of the Startup.cs class, I have included the following to ensure that buffering is enabled:
app.Use(async (context, next) => {
context.Request.EnableBuffering();
await next();
});
I have tried encapsulating the Request.Body in another stream like BufferedStream and FileBufferingReadStream and those don't make a difference.
I've tried:
var reader = new BinaryReader(Request.Body, Encoding.Default);
do
{
bytesRead = reader.Read(buffer, 0, buffer.Length);
await memStream.WriteAsync(buffer);
} while (bytesRead > 0);
This, as well, turns up a MemoryStream with all zeros.
I do this kind of request-body streaming a lot in my current project.
This works perfectly fine for me:
[HttpPut("{accountName}/{subAccount}/{revisionId}/{randomNumber}")]
[ProducesResponseType(StatusCodes.Status200OK)]
[ProducesResponseType(StatusCodes.Status500InternalServerError)]
public async Task<IActionResult> PutTest(CancellationToken cancel) {
using (var to = new MemoryStream()) {
var from = HttpContext.Request.Body;
var buffer = new byte[8 * 1024];
long totalBytes = 0;
int bytesRead;
while ((bytesRead = await from.ReadAsync(buffer, 0, buffer.Length, cancel)) > 0) {
await to.WriteAsync(buffer, 0, bytesRead, cancel);
totalBytes += bytesRead;
}
return Ok(to);
}
}
The only things I am doing differently are:
I am creating the MemoryStream in a scoped context (using).
I am using a slightly bigger buffer (some trial and error led me to this specific size).
I am using a different overload of Stream.ReadAsync, where I pass the byte[] buffer, the start offset (0), and the number of bytes to read.

How to read chunk of file in WebAPI when file is large

I have a big file, and I want to send it to a Web API, which will forward it to Amazon. Since the file is big, I want to send it to Amazon in chunks.
So if I have a 1 GB file, I want my API to receive it in, say, 20 MB chunks, so that I can send each chunk to Amazon and then receive the next 20 MB chunk. How can this be done? Below is my attempt.
public async Task<bool> Upload()
{
var fileuploadPath = ConfigurationManager.AppSettings["FileUploadLocation"];
var provider = new MultipartFormDataStreamProvider(fileuploadPath);
var content = new StreamContent(HttpContext.Current.Request.GetBufferlessInputStream(true));
// Now code below writes to a folder, but I want to make sure I read it as soon as I receive some chunk
await content.ReadAsMultipartAsync(provider);
return true;
}
Pseudo Code:
While (await content.ReadAsMultipartAsync(provider) == 20 MB chunk)
{
//Do something
// Then again do something with rest of chunk and so on.
}
File is as large as 1 GB.
As of now entire file is getting sent by this line of code:
await content.ReadAsMultipartAsync(provider);
I am lost here; please help me. All I want is to receive the file in small chunks and process them.
P.S.: I am sending the file as multipart/form-data from Postman to test.
Attempt No 2:
var filesReadToProvider = await Request.Content.ReadAsMultipartAsync();
foreach (var content in filesReadToProvider.Contents)
{
var stream = await content.ReadAsStreamAsync();
using (StreamReader sr = new StreamReader(stream))
{
string line = "";
while ((line = sr.ReadLine()) != null)
{
using (MemoryStream outputStream = new MemoryStream())
using (StreamWriter sw = new StreamWriter(outputStream))
{
sw.WriteLine(line);
sw.Flush();
// Do Something
}
}
}
}
I have had no time to test this, but the ReadBlock method seems to be what you want to use.
It should look something like the code below, which assumes all your other code is good and you just need help with the buffering. This is a blocking read operation, but there is also a ReadBlockAsync method that returns a Task. Note that StreamReader.ReadBlock reads characters, not bytes.
const int bufferSize = 1024;
var filesReadToProvider = await Request.Content.ReadAsMultipartAsync();
foreach (var content in filesReadToProvider.Contents)
{
    var stream = await content.ReadAsStreamAsync();
    using (StreamReader sr = new StreamReader(stream))
    {
        int charsRead;
        char[] buffer = new char[bufferSize];
        while ((charsRead = sr.ReadBlock(buffer, 0, bufferSize)) > 0)
        {
            // Do something with the first <charsRead> characters of buffer,
            // not with <bufferSize>: <charsRead> holds the number of
            // characters actually read by the call to ReadBlock
        }
    }
}
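For binary uploads, the same loop can be written against the raw Stream instead of a StreamReader, so that chunks are counted in bytes. The following is a minimal sketch under assumptions: StreamChunking, ForEachChunkAsync, and the handleChunk callback are hypothetical names, and the sketch covers only the chunking loop, not how the multipart content is received.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class StreamChunking
{
    // Reads 'source' in chunks of up to 'chunkSize' bytes and passes each chunk,
    // with its valid length, to 'handleChunk'. Returns the total bytes read.
    public static async Task<long> ForEachChunkAsync(
        Stream source, int chunkSize, Func<byte[], int, Task> handleChunk)
    {
        var buffer = new byte[chunkSize];
        long total = 0;

        while (true)
        {
            // A single ReadAsync may return fewer bytes than requested,
            // so keep reading until the buffer is full or the stream ends.
            int filled = 0;
            while (filled < chunkSize)
            {
                int read = await source.ReadAsync(buffer, filled, chunkSize - filled);
                if (read == 0) break; // end of stream
                filled += read;
            }

            if (filled == 0) break;   // nothing left to hand over

            await handleChunk(buffer, filled);
            total += filled;

            if (filled < chunkSize) break; // last, partial chunk
        }

        return total;
    }
}
```

The buffer is reused between calls, so the callback has to finish with the bytes (for example, by uploading them) before it returns. A 20 MB chunk size would be passed as 20 * 1024 * 1024.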

Return stream immediately and then write to stream asynchronously

In my current code I have a method like this to read data from a device (pseudo code):
public async Task<string> ReadAllDataFromDevice()
{
var buffer = "";
using (var device = new Device())
{
while(device.HasMoreData)
{
buffer += await device.ReadLineAsync();
}
}
return buffer;
}
I then want to send all that data via the network to some receiver. The amount of data can be really large. So clearly the above design is not very memory-efficient since it requires to read all the data before I can start sending it to the network socket.
So what I'd like to have is a function that returns a stream instead. Something like this:
public async Task<Stream> ReadAllDataFromDevice()
{
var stream = new MemoryStream();
using (var device = new Device())
using (var streamWriter = new StreamWriter(stream, new UTF8Encoding(), 512, true))
{
while(device.HasMoreData)
{
var line = await device.ReadLineAsync();
await streamWriter.WriteLineAsync(line);
}
await streamWriter.FlushAsync();
}
return stream;
}
This returns a stream but it clearly does not solve my problem, because the stream is returned only after all the data has been read from the device.
So I came up with this:
public Stream ReadAllDataFromDevice()
{
var stream = new MemoryStream();
Task.Run(async () => {
using (var device = new Device())
using (var streamWriter = new StreamWriter(stream, new UTF8Encoding(), 512, true))
{
while(device.HasMoreData)
{
var line = await device.ReadLineAsync();
await streamWriter.WriteLineAsync(line);
}
await streamWriter.FlushAsync();
}
});
return stream;
}
Is this a good design? I'm especially concerned about thread-safety, lifetime of the stream object used in the lambda, and exception handling.
Or is there a better pattern for this kind of problem?
Edit
Actually I just came up with another design that looks much cleaner to me. Instead of having the ReadAllDataFromDevice() function returning a stream, I let the consumer of the data provide the stream, like this:
public async Task ReadAllDataFromDevice(Stream stream)
{
using (var device = new Device())
using (var streamWriter = new StreamWriter(stream, new UTF8Encoding(), 512, true))
{
while(device.HasMoreData)
{
var line = await device.ReadLineAsync();
await streamWriter.WriteLineAsync(line);
}
await streamWriter.FlushAsync();
}
}
This is the design I'm using now:
public async Task ReadAllDataFromDevice(Func<Stream, Task> readCallback)
{
using (var device = new Device())
{
await device.Initialize();
using (var stream = new DeviceStream(device))
{
await readCallback(stream);
}
}
}
The line-by-line device access is encapsulated in the custom DeviceStream class (not shown here).
The consumer of the data would look something like this:
await ReadAllDataFromDevice(async stream => {
using (var streamReader = new StreamReader(stream))
{
var data = await streamReader.ReadToEndAsync();
// do something with data
}
});

How to upload the Stream from an HttpContent result to Azure File Storage

I am attempting to download a list of files from urls stored in my database, and then upload them to my Azure FileStorage account. I am successfully downloading the files and can turn them into files on my local storage or convert them to text and upload them. However I lose data when converting something like a pdf to a text and I do not want to have to store the files on the Azure app that this endpoint is hosted on as I do not need to manipulate the files in any way.
I have attempted to upload the files from the Stream I get from the HttpContent object using the UploadFromStream method on the CloudFile. Whenever this command is run I get an InvalidOperationException with the message "Operation is not valid due to the current state of the object."
I've tried converting the original Stream to a MemoryStream as well but this just writes a blank file to the FileStorage account, even if I set the position to the beginning of the MemoryStream. My code is below and if anyone could point out what information I am missing to make this work I would appreciate it.
public DownloadFileResponse DownloadFile(FileLink fileLink)
{
string fileName = string.Format("{0}{1}{2}", fileLink.ExpectedFileName, ".", fileLink.ExpectedFileType);
HttpStatusCode status;
string hash = "";
using (var client = new HttpClient())
{
client.Timeout = TimeSpan.FromSeconds(10); // candidate for .config setting
client.DefaultRequestHeaders.Add("User-Agent", USER_AGENT);
var request = new HttpRequestMessage(HttpMethod.Get, fileLink.ExpectedURL);
var sendTask = client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
var response = sendTask.Result; // not ensuring success here, going to handle error codes without exceptions
status = response.StatusCode;
if (status == HttpStatusCode.OK)
{
var httpStream = response.Content.ReadAsStreamAsync().Result;
fileStorage.WriteFile(fileLink.ExpectedFileType, fileName, httpStream);
hash = HashGenerator.GetMD5HashFromStream(httpStream);
}
}
return new DownloadFileResponse(status, fileName, hash);
}
public void WriteFile(string targetDirectory, string targetFilePath, Stream fileStream)
{
var options = SetOptions();
var newFile = GetTargetCloudFile(targetDirectory, targetFilePath);
newFile.UploadFromStream(fileStream, options: options);
}
public FileRequestOptions SetOptions()
{
FileRequestOptions options = new FileRequestOptions();
options.ServerTimeout = TimeSpan.FromSeconds(10);
options.RetryPolicy = new NoRetry();
return options;
}
public CloudFile GetTargetCloudFile(string targetDirectory, string targetFilePath)
{
if (!shareConnector.share.Exists())
{
throw new Exception("Cannot access Azure File Storage share");
}
CloudFileDirectory rootDirectory = shareConnector.share.GetRootDirectoryReference();
CloudFileDirectory directory = rootDirectory.GetDirectoryReference(targetDirectory);
if (!directory.Exists())
{
throw new Exception("Target Directory does not exist");
}
CloudFile newFile = directory.GetFileReference(targetFilePath);
return newFile;
}
I had the same problem. The only way it worked was to read the incoming stream (in your case, httpStream in the DownloadFile(FileLink fileLink) method) into a byte array and use UploadFromByteArray(byte[] buffer, int index, int count) instead of UploadFromStream.
So your WriteFile method will look like this:
public void WriteFile(string targetDirectory, string targetFilePath, Stream fileStream)
{
    var options = SetOptions();
    var newFile = GetTargetCloudFile(targetDirectory, targetFilePath);
    const int bufferLength = 600; // read-buffer size; this value is just an example
    byte[] buffer = new byte[bufferLength];
    List<byte> byteArrayFile = new List<byte>(); // the whole file ends up here
    int count;
    while ((count = fileStream.Read(buffer, 0, bufferLength)) > 0)
    {
        // Append only the bytes actually read; the last read is usually
        // shorter than the buffer
        byteArrayFile.AddRange(new ArraySegment<byte>(buffer, 0, count));
    }
    fileStream.Close();
    newFile.UploadFromByteArray(byteArrayFile.ToArray(), 0, byteArrayFile.Count, options: options);
}
Based on your description and code, I suggest using Stream.CopyTo to copy the stream into a local MemoryStream first, and then uploading the MemoryStream to Azure File Storage.
For more details, refer to the code below.
I changed only the DownloadFile method to test it.
HttpStatusCode status;
using (var client = new HttpClient())
{
client.Timeout = TimeSpan.FromSeconds(10); // candidate for .config setting
// client.DefaultRequestHeaders.Add("User-Agent", USER_AGENT);
//here I use my blob file to test it
var request = new HttpRequestMessage(HttpMethod.Get, "https://xxxxxxxxxx.blob.core.windows.net/media/secondblobtest-eypt.txt");
var sendTask = client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
var response = sendTask.Result; // not ensuring success here, going to handle error codes without exceptions
status = response.StatusCode;
if (status == HttpStatusCode.OK)
{
MemoryStream ms = new MemoryStream();
var httpStream = response.Content.ReadAsStreamAsync().Result;
httpStream.CopyTo(ms);
ms.Position = 0;
WriteFile("aaa", "testaa", ms);
// hash = HashGenerator.GetMD5HashFromStream(httpStream);
}
}
I had a similar problem and found that the UploadFromStream method only works with buffered streams. I was able to upload files to Azure storage by using a MemoryStream, but I don't think this is a very good solution, as you use up memory by copying the content of the file stream into memory before handing it to the Azure stream. What I came up with instead is writing directly to an Azure stream: use the OpenWriteAsync method to create the stream, and then a simple CopyToAsync from the source stream.
CloudStorageAccount storageAccount = CloudStorageAccount.Parse( "YourAzureStorageConnectionString" );
CloudFileClient fileClient = storageAccount.CreateCloudFileClient();
CloudFileShare share = fileClient.GetShareReference( "YourShareName" );
CloudFileDirectory root = share.GetRootDirectoryReference();
CloudFile file = root.GetFileReference( "TheFileName" );
using (CloudFileStream fileWriteStream = await file.OpenWriteAsync( fileMetadata.FileSize, new AccessCondition(),
new FileRequestOptions { StoreFileContentMD5 = true },
new OperationContext() ))
{
await fileContent.CopyToAsync( fileWriteStream, 128 * 1024 );
}
