We have a multi-tenant system consisting of an Azure Web Role, a Worker Role, and Desktop/Mobile apps. Each client app allows uploading images that get routed to a tenant-specific Azure Blob Storage account.
The Azure Worker Role polls for these files and processes them. We use third-party SDKs for processing that require either a file system path or a stream. Providing a stream directly from blob storage is trivial, but the SDK also expects to write physical metadata files that our app consumes.
This is a problem since the SDK is a black box and does not provide an alternative. Is there a way to have local storage within worker roles for transient files? This storage is only required for a few seconds per worker role iteration and may be recycled/discarded if the role is recycled or shut down. In addition, the files are rather large (500 MB+), so blob latency is not desired.
Searching around revealed some hacky workarounds, the best of which appears to be something that wraps blob storage to let our role access it as a file system.
Is there a way to simply have access to a file system similar to Web Role App_Data folders?
You can use RoleEnvironment.GetLocalResource() from within an Azure Worker Role to get a named handle to local file storage:
RoleEnvironment.GetLocalResource()
This avoids hardcoding specific file paths that may change over time.
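A minimal sketch, assuming a resource named "ScratchSpace" declared in ServiceDefinition.csdef (the name and 2 GB size are just examples); with cleanOnRoleRecycle the folder is wiped when the role is recycled, which matches your transient-file requirement:

// ServiceDefinition.csdef (inside the <WorkerRole> element):
//   <LocalResources>
//     <LocalStorage name="ScratchSpace" sizeInMB="2048" cleanOnRoleRecycle="true" />
//   </LocalResources>

using System;
using System.IO;
using Microsoft.WindowsAzure.ServiceRuntime;

public static class ScratchStorage
{
    // Returns a fresh per-iteration working directory under the role's local resource.
    public static string CreateWorkDir()
    {
        LocalResource scratch = RoleEnvironment.GetLocalResource("ScratchSpace");
        string workDir = Path.Combine(scratch.RootPath, Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(workDir);
        return workDir; // download the blob here, point the SDK at it, then delete the directory afterwards
    }
}

Since the SDK's files can exceed 500 MB, size the resource generously; GetLocalResource throws if the name is not declared in the service definition.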
Good luck!
I am using Azure App Service with Azure SQL Database to host an ASP.NET Core Web Application.
This application involves uploading documents to a directory. On my local dev machine I am simply using:
var fileUploadDir = @"C:\FileUploads";
On Azure what feature would I use to create a directory structure and place files in the directory structure using:
var filePath = Path.Combine(fileUploadDir, formFile.FileName);
using (var stream = new FileStream(filePath, FileMode.Create))
{
await formFile.CopyToAsync(stream);
}
What Azure feature would I use and is there an API for file system actions or can I simply update the fileUploadDir that my existing code uses with an Azure directory path?
With an Azure App Service you can upload your file the same way. You just have to create your directory in the wwwroot folder. If you have multiple instances, this folder will be shared between them, as stated in the documentation under File access across multiple instances:
File access across multiple instances: The home directory contains an app's content, and application code can write to it. If an app runs on multiple instances, the home directory is shared among all instances so that all instances see the same directory. So, for example, if an app saves uploaded files to the home directory, those files are immediately available to all instances.
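If you want to keep your existing FileStream code, a minimal sketch (the "FileUploads" folder name is my own placeholder; IWebHostEnvironment is the standard ASP.NET Core way to resolve wwwroot) would be to root the upload directory under wwwroot instead of C:\:

using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

public class UploadController : Controller
{
    private readonly IWebHostEnvironment _env;

    public UploadController(IWebHostEnvironment env) => _env = env;

    [HttpPost]
    public async Task<IActionResult> Upload(IFormFile formFile)
    {
        // Resolves to something like D:\home\site\wwwroot\FileUploads on an App Service (Windows) instance.
        var fileUploadDir = Path.Combine(_env.WebRootPath, "FileUploads");
        Directory.CreateDirectory(fileUploadDir); // no-op if the directory already exists

        // Note: in real code you would sanitize formFile.FileName before using it in a path.
        var filePath = Path.Combine(fileUploadDir, formFile.FileName);
        using (var stream = new FileStream(filePath, FileMode.Create))
        {
            await formFile.CopyToAsync(stream);
        }
        return Ok();
    }
}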
Nevertheless, depending on the needs of your application, a better solution could be to use blob storage to manage your files, especially if they must persist. Blobs can also be useful if you want to trigger asynchronous processing with an Azure Function after the upload, for instance.
For short-duration processing with temporary files I have used the home directory without any issue. As soon as the processing can be long, or if I want to keep the files, I tend to use asynchronous processing and blob storage.
Blob storage also avoids exposing the files in the home directory to users, and lets you rely on a service dedicated to storage rather than a simple file-system store tied to the App Service. Writing and deleting are simple, and it opens up many other possibilities: direct access via the REST service, access via shared access signatures, asynchronous processing, and so on.
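For example, a rough sketch of handing a user a time-limited download link with the Azure.Storage.Blobs package (the container and blob names here are placeholders); the client must be built from the connection string (shared key) so GenerateSasUri can sign the link:

using System;
using Azure.Storage.Blobs;
using Azure.Storage.Sas;

string connectionString = Environment.GetEnvironmentVariable("AZURE_STORAGE_CONNECTION_STRING");

// Shared-key credentials are required so the SAS can be signed.
var container = new BlobContainerClient(connectionString, "uploads");
BlobClient blob = container.GetBlobClient("documents/report.pdf");

// Read-only link that stops working after 15 minutes; hand this URI to the user.
Uri downloadUrl = blob.GenerateSasUri(BlobSasPermissions.Read, DateTimeOffset.UtcNow.AddMinutes(15));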
Before I published the website I've been working on to Azure, I kept all the images inside a local "Catalog" folder which was referenced in the program like:
<img src='/Catalog/Images/Thumbs/<%#:Item.ImagePath%>' />
Now it is deployed on Azure, I believe I need to turn to something called "Unstructured Blob Storage" to store and retrieve the images on the website.
This is my first time using Azure, I am wondering if it is as easy as storing the images in an unstructured blob storage on Azure, then just changing the "Catalog/Images/Thumbs" to the file path on Azure.
Does anybody know exactly how this works?
Thanks!
AFAIK, after deploying your web application to Azure, you can still store your resources (e.g. images, docs, Excel files, etc.) within your web application. To better manage your resources and reduce the load on your application when serving static resources, you can store them in a central data store.
This is my first time using Azure, I am wondering if it is as easy as storing the images in an unstructured blob storage on Azure, then just changing the "Catalog/Images/Thumbs" to the file path on Azure.
Based on your requirement, you could create a blob container named catalog and upload your images to the virtual directory Images/Thumbs, and set anonymous read access on your container and blobs. As a simple approach, you could leverage Azure Storage Explorer to upload your images and set the access level for your container.
And your image tag would look like this:
<img src="https://brucchstorage.blob.core.windows.net/catalog/Images/Thumbs/lake.jpeg">
Moreover, you could leverage AzCopy to copy data to Azure Blob storage using simple commands with optimal performance. Additionally, you could leverage the Azure Storage client library to manage your storage resources programmatically.
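If you'd rather script the upload than click through Storage Explorer, a rough sketch with the Azure.Storage.Blobs client library (the local path is a placeholder, and the storage account must allow public blob access for the anonymous URL to work):

using System;
using System.IO;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

string connectionString = Environment.GetEnvironmentVariable("AZURE_STORAGE_CONNECTION_STRING");

// Create the "catalog" container with anonymous read access to its blobs.
var container = new BlobContainerClient(connectionString, "catalog");
await container.CreateIfNotExistsAsync(PublicAccessType.Blob);

// The "/" in the blob name acts as the virtual directory Images/Thumbs.
using (FileStream file = File.OpenRead(@"C:\site\Catalog\Images\Thumbs\lake.jpeg"))
{
    await container.UploadBlobAsync("Images/Thumbs/lake.jpeg", file);
}
// The image is then served at https://<account>.blob.core.windows.net/catalog/Images/Thumbs/lake.jpeg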
I need to write an XML file to an Azure website folder under site root, with data pulled from the website's Azure SQL DB. This will be done on a recurring basis.
I have written a worker role which pulls the data and writes the file, however, it writes to a folder within the worker role folder structure using the below code. I am not sure how to specify and access the web folder path.
XmlTextWriter xml = new XmlTextWriter(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "feed.xml"), Encoding.UTF8);
Both the worker role and the website are in the same VS project, and deployed to Azure.
Since both the website and the worker role are in their respective VM instances, maybe the WR cannot access the website - I am not completely sure, and would really appreciate any guidance.
Would a webjob be better suited for this? Can it connect to an Azure DB, and write to a folder within the site root? Appreciate any code examples.
You wouldn't want to write to any single location, as the server running your website is rather ephemeral and can be taken offline and replaced at any time.
You should consider having your job generate the file and write it to Blob Storage and having your website read and serve the same file. Set up your site to treat the path to the file as a regular non-static route (this will let you keep the path /feeds/feed.xml) and the Action can read the XML file from your blob storage and serve it as the response.
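A rough sketch of that route with the classic Microsoft.WindowsAzure.Storage SDK and MVC attribute routing (the container name, blob name, and app setting are assumptions on my part):

using System.Configuration;
using System.Web.Mvc;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public class FeedsController : Controller
{
    // Serves GET /feeds/feed.xml even though no physical file exists under the site root.
    // Requires routes.MapMvcAttributeRoutes() in RouteConfig.
    [Route("feeds/feed.xml")]
    public ActionResult Feed()
    {
        var account = CloudStorageAccount.Parse(
            ConfigurationManager.AppSettings["StorageConnectionString"]);

        // The worker role / WebJob writes the generated feed.xml to this same blob.
        CloudBlockBlob blob = account.CreateCloudBlobClient()
            .GetContainerReference("feeds")
            .GetBlockBlobReference("feed.xml");

        // Stream the blob content straight out as the response body.
        return File(blob.OpenRead(), "application/xml");
    }
}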
I'm trying to find out which storage service is best for my specific problem.
The Application
I'm developing a mobile app with Xamarin (.NET). Each user has to register and log in to use my service. Each user can be in several groups in which he has permission to store files (each file about 200 KB). My backend is an EC2 instance hosting Cassandra as my database.
The Problems
I am thinking about using AWS S3 to store the files.
Question #1:
Should I upload directly to S3, or should I upload to EC2, handle the permissions, and then store the files in S3? With direct upload to S3, I have the advantage of using much less bandwidth on my EC2 instance. For direct uploading I have to provide a Token Vending Machine (TVM), which has two modes for providing the credentials I need to interact with S3: anonymous and identity. As I read it, the anonymous approach is mostly used for read-only scenarios. But for the identity approach the user has to register in a browser window, which is absolutely not what I want for my users.
The application initiates communication with the TVM by bringing up a browser on the mobile device to enable the user to register a user name and password with the TVM. The TVM derives a secret key from the password. The TVM stores the user name and secret key for future reference.
Is it even possible to handle the permissions I need (each user can only upload and download files for the groups he belongs to) just by assigning AWS permissions to the TVM credentials?
Question #2:
Should I maybe consider storing each file directly in Cassandra, since every file is only about 200 KB? The problem here is that the same files could be accessed several times per second.
I would use S3. That way you don't have to worry about bandwidth or permissions on the file. You do have to interact with Amazon S3 and the IAM service (their authorization service). You can do this through the API in your language of choice (Python, Ruby, Java, etc.).
If you are concerned about being tied to Amazon, you can potentially set up something like OpenStack Storage (compatible with the S3 API) in your own datacenter and move your data to it. The files would still be handled by your initial application, since your code would be "S3 compatible".
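For question #1, one common middle ground, sketched below with the AWS SDK for .NET (the bucket name, key scheme, and group check are my own assumptions), is to let your EC2 backend authorize each request and hand the client a short-lived pre-signed URL, so the 200 KB files never flow through your instance and you avoid the TVM browser registration entirely:

using System;
using Amazon.S3;
using Amazon.S3.Model;

public class UploadUrlService
{
    // Credentials come from the EC2 instance role; no keys shipped to the mobile app.
    private readonly AmazonS3Client _s3 = new AmazonS3Client();

    // Call this only after the backend has verified (against Cassandra) that the
    // user really belongs to the group he wants to upload into.
    public string GetUploadUrl(string groupId, string fileName)
    {
        var request = new GetPreSignedUrlRequest
        {
            BucketName = "my-app-files",              // placeholder bucket name
            Key = $"groups/{groupId}/{fileName}",     // one key prefix per group
            Verb = HttpVerb.PUT,
            Expires = DateTime.UtcNow.AddMinutes(10)  // the link is useless afterwards
        };
        // The mobile client PUTs the file straight to S3 with this URL.
        return _s3.GetPreSignedURL(request);
    }
}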
I'd like to write a process in a Worker Role to download (sync) a batch of files under a folder (directory) to a local mirrored folder (directory).
Is there a timestamp (or a way to get one) for when a folder (directory) was last updated?
The folder (directory) structure is not known in advance; simply put, I want to download whatever is there to local storage as soon as it changes. Other than recursing through it and setting up a timer to check repeatedly, what smarter ideas do you have?
(edit) P.S. I found many solutions for syncing files from local storage to Azure storage, but the same principle for local files cannot be applied to Azure blobs. I am still looking for the easiest way to download (sync) files to local storage as soon as they change.
Eric, I believe the concept you're trying to implement isn't really that effective for your core requirement, if I understand it correctly.
Consider the following scenario:
Keep your views in the blob storage.
Implement Azure (AppFabric) Cache.
Store any view file in the cache if it's not yet there on a web request, with an unlimited (or very long) expiration time.
Enable local cache on your web role instances with a short expiration time (e.g. 5 minutes).
Create a (single, separate) worker role, outside your web roles, which scans your blobs' ETags for changes at an interval, and reset the view's cache key for any blob that changed (a rough scan loop is sketched after the notes below).
Get rid of those ugly "workers" inside of your web roles :-)
There're a few things to think about in this scenario:
Your updated views will get to the web role instances within "local cache expiration time + worker scan interval". The lower the values, the more distributed cache requests and blob storage transactions.
The Azure AppFabric Cache is the only Azure service preventing the whole platform from being truly scalable. You have to choose the best cache plan based on the overall size (in MB) of your views, the number of your instances, and the number of simultaneous cache requests required per instance.
Consider caching the compiled views inside your instances (not in the AppFabric cache). Reset this local cache based on the dedicated AppFabric cache key/keys. This will improve performance greatly, as rendering the output HTML will be as easy as injecting the model into the pre-compiled views.
Of course, the cache-retrieval code in your web roles must be able to retrieve the view from the primary source (storage) if it is unable to retrieve it from the cache for whatever reason.
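A rough sketch of that worker's scan loop with the classic Microsoft.WindowsAzure.Storage SDK (the container, the one-minute interval, and the InvalidateCache helper are placeholders for your own cache API):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using Microsoft.WindowsAzure.Storage.Blob;

public class ViewChangeScanner
{
    // Last known ETag per blob name.
    private readonly Dictionary<string, string> _knownEtags = new Dictionary<string, string>();

    public void Run(CloudBlobContainer viewsContainer)
    {
        while (true)
        {
            // Flat listing of every view blob in the container.
            foreach (var blob in viewsContainer.ListBlobs(useFlatBlobListing: true).OfType<CloudBlockBlob>())
            {
                string etag;
                if (!_knownEtags.TryGetValue(blob.Name, out etag) || etag != blob.Properties.ETag)
                {
                    _knownEtags[blob.Name] = blob.Properties.ETag;
                    InvalidateCache(blob.Name); // reset the view's cache key
                }
            }
            Thread.Sleep(TimeSpan.FromMinutes(1)); // worker scan interval
        }
    }

    // Hypothetical hook: remove/reset the key in the AppFabric cache, e.g. dataCache.Remove(viewName).
    private void InvalidateCache(string viewName) { }
}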
My suggestion is to create an abstraction on top of the blob storage, so that no one is directly writing to the blob. Then submit a message to Azure's Queue service when a new file is written. Have the file receiver poll that queue for changes. No need to scan the entire blob store recursively.
As far as the abstraction goes, use an Azure web role or worker role to authenticate and authorize your clients. Have it write to the Blob store(s). You can implement the abstraction using HTTPHandlers or WCF to directly handle the IO requests.
This abstraction will allow you to overcome the blob limitation of 5,000 files you mention in the comments above, and will allow you to scale out and provide additional features to your customers.
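A rough sketch of that notification flow with the classic Microsoft.WindowsAzure.Storage SDK (the queue name and message format are just examples): the write abstraction enqueues the blob's path after each upload, and the receiver drains the queue instead of rescanning the whole container:

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

string storageConnectionString = "<your storage connection string>";
var account = CloudStorageAccount.Parse(storageConnectionString);
CloudQueue changes = account.CreateCloudQueueClient().GetQueueReference("blob-changes");
changes.CreateIfNotExists();

// --- In the write abstraction, right after a blob is uploaded ---
changes.AddMessage(new CloudQueueMessage("container-name/path/to/new-file.dat"));

// --- In the file receiver's polling loop ---
CloudQueueMessage msg;
while ((msg = changes.GetMessage()) != null)
{
    string changedBlobPath = msg.AsString;   // which blob to download to the local mirror
    // ... download just that blob to the local folder ...
    changes.DeleteMessage(msg);              // delete only after the download succeeded
}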
I'd be interested in seeing your code when you have a chance. Perhaps I can give you some more tips or code fixes.