I am currently working on a problem I've encountered while using Azure Blob Storage with the C# API. I also didn't find a fitting solution in the existing questions here, since most of them just download a file once and are done.
What I want to achieve is an API that acts as a proxy for file downloads for my mobile clients. I therefore need a fast response / fast time to first byte, since the mobile applications have a rather low timeout of five seconds.
[HttpGet, Route("{id}")]
[Authorize(Policy = xxxxx)]
public async Task<FileStreamResult> Get(Guid tenantId, Guid id)
{
if (tenantId == default)
{
throw new ArgumentException($"Tenant id '{tenantId}' is not valid.");
}
if (id == default)
{
throw new ArgumentException($"Package id '{id}' is not valid.");
}
var assetPackage = await _assetPackageService.ReadPackage(myenum.myvalue, tenantId, id).ConfigureAwait(false);
if (assetPackage == null)
{
return File(new MemoryStream(), "application/octet-stream");
}
return File(assetPackage.FileStream, assetPackage.ContentType);
}
public async Task<AssetPackage> ReadPackage(AssetPackageContent packageContent, Guid tenantId, Guid packageId)
{
    var blobRepository = await _blobRepositoryFactory.CreateAsync(_settings, tenantId.ToString())
        .ConfigureAwait(false);

    var blobPath = string.Empty;
    //some missing irrelevant code

    var blobReference = await blobRepository.ReadBlobReference(blobPath).ConfigureAwait(false);
    if (blobReference == null)
    {
        return null;
    }

    var stream = new MemoryStream();
    await blobReference.DownloadToStreamAsync(stream).ConfigureAwait(false);
    stream.Seek(0, SeekOrigin.Begin);

    return new AssetPackage(packageContent, stream, blobReference.Properties.ContentType);
}
I am aware that MemoryStream is a poor choice for downloads, since it buffers the whole file in memory before anything is sent to the client.
How would you tackle this? Is there an easy way to have my API act as a proxy, rather than downloading the whole file and then letting the client download it again from my API?
A possible and working solution is, as silent mentioned, putting Azure Storage behind Azure API Management. You can add authorization there, or work with SAS links, which may or may not fit your application.
I followed this guide to set up my architecture and it works flawlessly. Thanks to silent for the initial idea.
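If you go the SAS route, a minimal sketch (assuming the newer Azure.Storage.Blobs v12 SDK, a BlobServiceClient created with a StorageSharedKeyCredential, and illustrative container/blob names) could look like this: instead of proxying the bytes through the API, return a short-lived read-only SAS URL and let the mobile client download directly from Blob Storage, which keeps the time to first byte low.
using System;
using Azure.Storage.Blobs;
using Azure.Storage.Sas;

public class SasLinkService
{
    private readonly BlobServiceClient _blobServiceClient; // assumed to be injected, built with a shared key credential

    public SasLinkService(BlobServiceClient blobServiceClient)
        => _blobServiceClient = blobServiceClient;

    // Returns a read-only SAS URL valid for a few minutes; the client downloads straight from storage.
    public Uri GetDownloadUrl(string containerName, string blobName)
    {
        var blobClient = _blobServiceClient
            .GetBlobContainerClient(containerName)
            .GetBlobClient(blobName);

        // CanGenerateSasUri is true when the client was created with a StorageSharedKeyCredential.
        if (!blobClient.CanGenerateSasUri)
        {
            throw new InvalidOperationException("Client is not authorized to generate SAS tokens.");
        }

        return blobClient.GenerateSasUri(BlobSasPermissions.Read, DateTimeOffset.UtcNow.AddMinutes(5));
    }
}
The controller can then return this URL (or a 302 redirect to it), so the file bytes never flow through the API at all.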
Background
I'm currently working on a .NET Core (C#) application that relies on various Azure services. I've been tasked with creating an endpoint that allows users to bulk download a varying number of files based on some querying/filtering. The endpoint will be triggered by a "download all" button on the frontend and should return a .zip of all the matching files. The total size of this zip could be anywhere from 100 KB to 100 GB, depending on the query/filters provided.
Note: Although I'm familiar with asynchrony, concurrency, and streams, the interactions between them, and between API layers, are something I'm still getting my head around. Bear with me.
Question
How can I achieve this in a performant and scalable manner given some architectural constraints? Details provided below.
Architecture
The backend currently consists of two main layers. The API Layer consists of Azure Functions, which are the first point of contact for any and all requests from the frontend. The Service Layer sits between the API Layer and other Azure services. In this particular case the Service Layer interacts with an Azure Blob Storage container, where the various files are stored.
Current Implementation/Plan
Request:
The request itself is straightforward. The API Layer takes the queries and filters and turns them into a list of filenames. That list is then sent in the body of a POST request to the Service Layer. The Service Layer loops through the list and retrieves each file individually from blob storage; as of right now there is no way of bulk downloading attachments. This is where the complications start.
Given the potential file size, we can't pull all of the data into memory at one time; it may need to be streamed or batched.
Given the number of files, we may need to download files from blob storage in parallel.
We need to build the zip file from those async, parallel tasks, and it can't be built completely in memory either.
Response:
I currently have a working version of this that doesn't worry about memory. The above diagram is meant as an illustration of the limitations/considerations of the task rather than a concept that can be put directly to code. No one layer can/should hold all of the data at any given time. My original attempt/idea was to use a series of streams that pipe the data down the line in some manner. However, I realized this might be a fool's errand and decided to make this post.
Any thoughts on a better high-level workflow to accomplish this task would be greatly appreciated. I would also love to hear completely different solutions to the problem.
Thank you, Sha. Posting your suggestions as an answer to help other community members.
POST a list of file paths to an Azure Function (HTTP trigger).
Create a queue message containing the file paths and put it on a storage queue.
Listen to that storage queue with another Azure Function (queue trigger).
Stream each file from Azure Storage -> add it to a zip stream -> stream it back to Azure Storage. (A sketch of the function wiring for steps 1-3 is shown below.)
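Before the zip code itself, here is a minimal sketch of that function wiring for steps 1-3. It assumes the in-process Azure Functions model with the Microsoft.Azure.WebJobs storage bindings, an illustrative queue name of zip-requests, and that ICreateZipFileCommand (shown in the next snippet) is registered for dependency injection in a FunctionsStartup class.
using System.IO;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

public class ZipRequest
{
    public string ContainerName { get; set; }
    public string[] FilePaths { get; set; }
}

public class ZipFunctions
{
    private readonly ICreateZipFileCommand _createZipFileCommand;

    public ZipFunctions(ICreateZipFileCommand createZipFileCommand)
        => _createZipFileCommand = createZipFileCommand;

    // Steps 1 + 2: accept the file list over HTTP and drop it on a storage queue.
    [FunctionName("EnqueueZipRequest")]
    public async Task<IActionResult> Enqueue(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest request,
        [Queue("zip-requests")] IAsyncCollector<string> queue)
    {
        using var reader = new StreamReader(request.Body);
        var body = await reader.ReadToEndAsync(); // body is the serialized ZipRequest

        await queue.AddAsync(body);
        return new AcceptedResult();
    }

    // Step 3: pick the message up again and build the zip out-of-band.
    [FunctionName("CreateZipFile")]
    public async Task CreateZip(
        [QueueTrigger("zip-requests")] string message,
        CancellationToken cancellationToken)
    {
        var zipRequest = JsonSerializer.Deserialize<ZipRequest>(message);
        await _createZipFileCommand.Execute(zipRequest.ContainerName, zipRequest.FilePaths, cancellationToken);
    }
}
Splitting the work across a queue like this keeps the HTTP function fast and lets the long-running zip creation happen without an open client connection.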
The code below will help with creating the ZIP file itself (step 4).
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Azure.Storage.Blobs.Models;
using Azure.Storage.Blobs.Specialized;
using ICSharpCode.SharpZipLib.Zip;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Logging;

public class AzureBlobStorageCreateZipFileCommand : ICreateZipFileCommand
{
    private readonly UploadProgressHandler _uploadProgressHandler;
    private readonly ILogger<AzureBlobStorageCreateZipFileCommand> _logger;
    private readonly string _storageConnectionString;
    private readonly string _zipStorageConnectionString;

    public AzureBlobStorageCreateZipFileCommand(
        IConfiguration configuration,
        UploadProgressHandler uploadProgressHandler,
        ILogger<AzureBlobStorageCreateZipFileCommand> logger)
    {
        _uploadProgressHandler = uploadProgressHandler ?? throw new ArgumentNullException(nameof(uploadProgressHandler));
        _logger = logger ?? throw new ArgumentNullException(nameof(logger));
        _storageConnectionString = configuration.GetValue<string>("FilesStorageConnectionString") ?? throw new Exception("FilesStorageConnectionString was null");
        _zipStorageConnectionString = configuration.GetValue<string>("ZipStorageConnectionString") ?? throw new Exception("ZipStorageConnectionString was null");
    }

    public async Task Execute(
        string containerName,
        IReadOnlyCollection<string> filePaths,
        CancellationToken cancellationToken)
    {
        var zipFileName = $"{DateTime.UtcNow:yyyyMMddHHmmss}.{Guid.NewGuid().ToString().Substring(0, 4)}.zip";
        var stopwatch = Stopwatch.StartNew();

        try
        {
            // Open a writeable stream directly on the destination zip blob, so the archive
            // is streamed to storage instead of being buffered in memory.
            using (var zipFileStream = await OpenZipFileStream(zipFileName, cancellationToken))
            {
                using (var zipFileOutputStream = CreateZipOutputStream(zipFileStream))
                {
                    var level = 0; // store only, no compression
                    _logger.LogInformation("Using Level {Level} compression", level);
                    zipFileOutputStream.SetLevel(level);

                    foreach (var filePath in filePaths)
                    {
                        // Stream each source blob straight into its zip entry.
                        var blockBlobClient = new BlockBlobClient(_storageConnectionString, containerName, filePath);
                        var properties = await blockBlobClient.GetPropertiesAsync(cancellationToken: cancellationToken);
                        var zipEntry = new ZipEntry(blockBlobClient.Name)
                        {
                            Size = properties.Value.ContentLength
                        };
                        zipFileOutputStream.PutNextEntry(zipEntry);
                        await blockBlobClient.DownloadToAsync(zipFileOutputStream, cancellationToken);
                        zipFileOutputStream.CloseEntry();
                    }
                }
            }

            stopwatch.Stop();
            _logger.LogInformation("[{ZipFileName}] DONE, took {ElapsedTime}",
                zipFileName,
                stopwatch.Elapsed);
        }
        catch (TaskCanceledException)
        {
            // Clean up the partially written zip blob if the operation is cancelled.
            var blockBlobClient = new BlockBlobClient(_zipStorageConnectionString, "zips", zipFileName);
            await blockBlobClient.DeleteIfExistsAsync();
            throw;
        }
    }

    private async Task<Stream> OpenZipFileStream(
        string zipFilename,
        CancellationToken cancellationToken)
    {
        var zipBlobClient = new BlockBlobClient(_zipStorageConnectionString, "zips", zipFilename);
        return await zipBlobClient.OpenWriteAsync(true, options: new BlockBlobOpenWriteOptions
        {
            ProgressHandler = _uploadProgressHandler,
            HttpHeaders = new BlobHttpHeaders
            {
                ContentType = "application/zip"
            }
        }, cancellationToken: cancellationToken);
    }

    private static ZipOutputStream CreateZipOutputStream(Stream zipFileStream)
    {
        return new ZipOutputStream(zipFileStream)
        {
            IsStreamOwner = false
        };
    }
}
Check Zip File using Azure functions for further information.
I created a POST API which basically saves a file to a directory.
Will asynchronous code make my API handle scalability better when multiple requests come in from clients?
Currently, the code works synchronously.
Should I make every method asynchronous? And where should I place the await keyword?
The tasks:
Task 1: Read the request content (XML)
Task 2: Create a directory if it doesn't already exist
Task 3: Make the filenames unique
Task 4: Save the file to the directory
[System.Web.Mvc.HttpPost]
public IHttpActionResult Post(HttpRequestMessage request)
{
    try
    {
        string contentResult = string.Empty;
        ValidateRequest(ref contentResult, request);
        //contentResult = "nothing";

        //Validation of the post-requested XML
        //XmlReaderSettings(contentResult);
        using (StringReader s = new StringReader(contentResult))
        {
            doc.Load(s);
        }

        string path = MessagePath;

        //Directory creation
        DirectoryInfo dir = Directory.CreateDirectory($@"{path}\PostRequests");
        string dirName = dir.Name;

        //Format file name
        var uniqueFileName = UniqueFileNameFormat();
        doc.Save($@"{path}\{dirName}\{uniqueFileName}");
    }
    catch (Exception e)
    {
        LogService.LogToEventLog("Error occurred while receiving a message from messagedistributor: " + e.ToString(), System.Diagnostics.EventLogEntryType.Error);
        throw; // rethrow without resetting the stack trace
    }

    LogService.LogToEventLog("Message was received successfully from messagedistributor.", System.Diagnostics.EventLogEntryType.Information);
    return new ResponseMessageResult(Request.CreateResponse((HttpStatusCode)200));
}
Yes, it should.
When you use async with network or I/O calls, you do not block threads, and they can be reused for processing other requests.
But if you only have one drive and other clients are doing the same job, you will not get a speed benefit; the overall health of the system will still be better with async calls, though.
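As a rough illustration, and not a drop-in replacement (it assumes Web API 2 on .NET Framework 4.5+, keeps the helper names from the question, and swaps ValidateRequest for a plain content read), the handler could be made asynchronous like this, so the request thread is released during the network read and the disk write:
[System.Web.Mvc.HttpPost]
public async Task<IHttpActionResult> Post(HttpRequestMessage request)
{
    try
    {
        // Task 1: read the request content without blocking a thread pool thread.
        // (Your ValidateRequest logic would still run against contentResult afterwards.)
        string contentResult = await request.Content.ReadAsStringAsync();

        using (StringReader s = new StringReader(contentResult))
        {
            doc.Load(s);
        }

        // Task 2: directory creation is quick and has no async counterpart, so it stays synchronous.
        string path = MessagePath;
        DirectoryInfo dir = Directory.CreateDirectory($@"{path}\PostRequests");

        // Tasks 3 + 4: build the unique name, then write the file asynchronously.
        var uniqueFileName = UniqueFileNameFormat();
        var fullPath = Path.Combine(path, dir.Name, uniqueFileName);

        using (var stream = new FileStream(fullPath, FileMode.CreateNew, FileAccess.Write,
                   FileShare.None, bufferSize: 4096, useAsync: true))
        {
            var bytes = Encoding.UTF8.GetBytes(doc.OuterXml);
            await stream.WriteAsync(bytes, 0, bytes.Length);
        }
    }
    catch (Exception e)
    {
        LogService.LogToEventLog("Error occurred while receiving a message from messagedistributor: " + e, System.Diagnostics.EventLogEntryType.Error);
        throw;
    }

    LogService.LogToEventLog("Message was received successfully from messagedistributor.", System.Diagnostics.EventLogEntryType.Information);
    return new ResponseMessageResult(Request.CreateResponse((HttpStatusCode)200));
}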
Recently I started developing an API for the company I work for.
After some research we settled on NLog as our logging library, and it has a layout renderer for logging the posted request body, so that's good. But there is also a need to log the response and the time it took for the request to be processed and returned (since the API will also be consumed by 3rd-party vendors, and the way it usually goes with some of them is: "I clicked the thing." "Hmm, no you didn't.").
Now, I have read a lot these days about middleware logging, but some posts are dated, some solutions only work partially (I'm having an issue with viewing the developer exception page), and somewhere on GitHub I've read that it's bad practice to log the response since it can contain sensitive data. Maybe there is something like telemetry I'm missing?
Thanks for your time and help, and sorry for the rant; I'm still pretty burned out after the endless reading and testing.
What I have already tried and what the current issue is:
The issue with context.Response.Body is that it is a non-readable but writeable stream. In order to read it, it must be swapped with another stream: assign a new readable stream to .Body, let the pipeline continue on to the controller, read the returned stream, and copy it back to the original .Body.
The example middleware class. (Credits to: jarz.net | logging-middleware)
public class LoggingMiddleware
{
    private readonly RequestDelegate _next;
    private readonly ILogger<LoggingMiddleware> _logger;

    public LoggingMiddleware(RequestDelegate next, ILogger<LoggingMiddleware> logger)
    {
        _logger = logger;
        _next = next;
    }

    public async Task Invoke(HttpContext context)
    {
        if (_logger.IsEnabled(LogLevel.Trace))
        {
            string responseBodyString = string.Empty;
            try
            {
                // Swap the original Response.Body stream with one we can read / seek
                Stream originalResponseBody = context.Response.Body;
                using MemoryStream replacementResponseBody = new MemoryStream();
                context.Response.Body = replacementResponseBody;

                await _next(context); // Continue processing (additional middleware, controller, etc.)

                // Outbound (after the controller)
                replacementResponseBody.Position = 0;

                // Copy the response body to the original stream
                await replacementResponseBody.CopyToAsync(originalResponseBody).ConfigureAwait(false);
                context.Response.Body = originalResponseBody;

                if (replacementResponseBody.CanRead)
                {
                    replacementResponseBody.Position = 0;
                    responseBodyString = await new StreamReader(replacementResponseBody, leaveOpen: true)
                        .ReadToEndAsync().ConfigureAwait(false);
                    replacementResponseBody.Position = 0;
                }
            }
            finally
            {
                if (responseBodyString.Length > 0)
                {
                    // Pass the body as a structured argument so braces in the payload
                    // are not treated as a format template.
                    _logger.LogTrace("{ResponseBody}", responseBodyString);
                }
            }
        }
        else
        {
            await _next(context);
        }
    }
}
Here is a minimal, end-to-end example of adding logging middleware to an ASP.NET Core service, without disrupting things like the generation of the developer exception page.
https://github.com/Treit/LoggingMiddlewareExample
I want to get an alert when a service (Grafana or InfluxDB) in an Azure virtual machine (Ubuntu 16.04) has stopped. I'd like to use C# to connect to the VM and check the status of the Grafana and InfluxDB services. Can anyone share a code sample that implements this?
Both services provide health endpoints that can be used to check their status from a remote server. There's no need to open a remote shell connection. In fact, it would be impossible to monitor large server farms if one had to SSH to each one.
In the simplest case, and ignoring networking issues, one can simply hit the health endpoints to check the status of both services. A rough implementation could look like this:
public async Task<bool> CheckBoth()
{
    var client = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(30)
    };

    const string grafanaHealthUrl = "https://myGrafanaURL/api/health";
    const string influxPingUrl = "https://myInfluxURL/ping";

    var (grafanaOK, grafanaError) = await CheckAsync(client, grafanaHealthUrl,
        HttpStatusCode.OK, "Grafana error");
    var (influxOK, influxError) = await CheckAsync(client, influxPingUrl,
        HttpStatusCode.NoContent, "InfluxDB error");

    if (!influxOK || !grafanaOK)
    {
        //Do something with the errors
        return false;
    }

    return true;
}

public async Task<(bool ok, string result)> CheckAsync(HttpClient client,
    string healthUrl,
    HttpStatusCode expected,
    string errorMessage)
{
    try
    {
        var status = await client.GetAsync(healthUrl);
        if (status.StatusCode != expected)
        {
            //Failure message, get it and log it
            var statusBody = await status.Content.ReadAsStringAsync();
            //Possibly log it ....
            return (ok: false, result: $"{errorMessage}: {statusBody}");
        }
    }
    catch (TaskCanceledException)
    {
        return (ok: false, result: $"{errorMessage}: Timeout");
    }

    return (ok: true, "");
}
Perhaps a better solution would be to use Azure Monitor to ping the health URLs periodically and send an alert if they are down.
Here is something you can use to connect to an Azure Linux VM over SSH in C# (the SshClient below comes from the SSH.NET library):
using (var client = new SshClient("my-vm.cloudapp.net", 22, "username", "password"))
{
    client.Connect();
    Console.WriteLine("it worked!");
    client.Disconnect();
    Console.ReadLine();
}
Usually an SSH server only allows public key auth or other two-factor auth.
Change your /etc/ssh/sshd_config and uncomment #PasswordAuthentication yes:
# Change to no to disable tunnelled clear text passwords
#PasswordAuthentication yes
Later you can poll the installed services for their status.
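For example, a minimal sketch of polling a systemd unit over that SSH connection with SSH.NET's RunCommand (the host, credentials, and service names such as grafana-server / influxdb are placeholders) could look like this:
using System;
using Renci.SshNet;

public static class ServiceStatusChecker
{
    // Returns true when systemd reports the unit as active; service names are illustrative.
    public static bool IsServiceActive(string host, string userName, string password, string serviceName)
    {
        using (var client = new SshClient(host, 22, userName, password))
        {
            client.Connect();

            // "systemctl is-active" prints "active" and exits with 0 when the unit is running.
            var command = client.RunCommand($"systemctl is-active {serviceName}");
            client.Disconnect();

            return command.ExitStatus == 0 && command.Result.Trim() == "active";
        }
    }
}

// Usage (hypothetical host and credentials):
// var grafanaUp = ServiceStatusChecker.IsServiceActive("my-vm.cloudapp.net", "username", "password", "grafana-server");
// var influxUp  = ServiceStatusChecker.IsServiceActive("my-vm.cloudapp.net", "username", "password", "influxdb");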
Also, as an alternative solution, you can deploy a REST API in your Linux VM that checks the status of your services, and then call it from a C# HttpClient.
Hope it helps.
I'm streaming data into BigQuery with the .NET API, and I noticed in Process Explorer that new TCP/IP connections are created and closed over and over again. I'm wondering if it's possible to reuse the connection and avoid the big overhead of connection setup and teardown?
public async Task InsertAsync(BaseBigQueryTable table, IList<IDictionary<string, object>> rowList, GetBqInsertIdFunction getInsert, CancellationToken ct)
{
    if (rowList.Count == 0)
    {
        return;
    }

    string tableId = table.TableId;
    IList<TableDataInsertAllRequest.RowsData> requestRows = rowList
        .Select(row => new TableDataInsertAllRequest.RowsData { Json = row, InsertId = getInsert(row) })
        .ToList();
    TableDataInsertAllRequest request = new TableDataInsertAllRequest { Rows = requestRows };

    bool needCreateTable = false;
    BigqueryService bqService = null;
    try
    {
        bqService = GetBigQueryService();
        TableDataInsertAllResponse response =
            await bqService.Tabledata.InsertAll(request, _account.ProjectId, table.DataSetId, tableId)
                .ExecuteAsync(ct);

        IList<TableDataInsertAllResponse.InsertErrorsData> insertErrors = response.InsertErrors;
        if (insertErrors != null && insertErrors.Count > 0)
        {
            //handling errors, removed for easier reading..
        }
    }
    catch
    {
        //... removed for easier reading
    }
    finally
    {
        if (bqService != null)
            bqService.Dispose();
    }
}

private BigqueryService GetBigQueryService()
{
    return new BigqueryService(new BaseClientService.Initializer
    {
        HttpClientInitializer = _credential,
        ApplicationName = _applicationName,
    });
}
Follow up
The answer given below seems to be the only solution for reducing HTTP connections. However, I found that using batch requests on a large amount of live streaming data can have some limitations; see my other question on this: Google API BatchRequest: An established connection was aborted by the software in your host machine
The link below documents how to batch API calls together to reduce the number of HTTP connections your client has to make:
https://cloud.google.com/bigquery/batch
After the batch request is issued, you can get the response and parse out all of the involved job ids. Alternatively, you can preset job ids in the batch request for each and every inner request. Note: you need to make sure those job ids are unique.
After that you can check what is going on with each of these jobs via jobs.get: https://cloud.google.com/bigquery/docs/reference/v2/jobs/get
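As a rough sketch of that batching idea with the Google .NET client library (reusing the BigqueryService and request types from the question; error handling and the job-id bookkeeping are omitted), several insertAll calls can be queued into a single BatchRequest so they share one HTTP round trip:
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Google.Apis.Bigquery.v2;
using Google.Apis.Bigquery.v2.Data;
using Google.Apis.Requests;

public class BatchedInserter
{
    // Queue several insertAll requests into one batch so they share a single HTTP round trip.
    public async Task InsertBatchAsync(
        BigqueryService bqService,
        string projectId,
        string dataSetId,
        IEnumerable<(string tableId, TableDataInsertAllRequest request)> inserts,
        CancellationToken ct)
    {
        var batch = new BatchRequest(bqService);

        foreach (var (tableId, request) in inserts)
        {
            batch.Queue<TableDataInsertAllResponse>(
                bqService.Tabledata.InsertAll(request, projectId, dataSetId, tableId),
                (response, error, index, message) =>
                {
                    // Per-request callback: inspect 'error' or response.InsertErrors here.
                });
        }

        await batch.ExecuteAsync(ct);
    }
}
Keeping a single, long-lived BigqueryService instance (rather than disposing it per call, as in the question) also helps the underlying HttpClient reuse connections between batches.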