Azure Blob Storage large file upload

I am developing an application that will run on Azure and requires users to upload very large .tiff files. We are going to use blob storage to store the files. I have been reviewing several websites to determine the correct approach, and this link provides a good example: http://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-shared-access-signature-part-2/. In my case, though, I am using AngularJS on the front end to grab the files, chunk them, and upload them to the SAS locator via JavaScript. My main confusion centers on the expiration time I should give the SAS for the user to perform the upload. I would like a distinct SAS to be created each time a user uploads a file, and for it to go away once the file is uploaded. I would also like to allow other site users to view the files. What is the best approach for handling these two scenarios? Also, there are examples of how to generate a SAS locator for the container and for the blob; if I need to add a new blob to a container, which makes more sense?

It sounds like you may want to use a stored access policy on the container. This allows you to modify the expiry time for the SAS after the SAS has been created. Take a look at http://azure.microsoft.com/en-us/documentation/articles/storage-manage-access-to-resources/#use-a-stored-access-policy.
With the stored access policy, you could create a SAS with a longer expiry time and with write permissions only for the upload operation, and then have your application check when the file has finished uploading, and change the expiry time to revoke the SAS.
You can create a separate SAS and stored access policy with read permissions only, for users who are viewing the files on the site. In this case you can provide a long expiry time and change it or revoke it only if you need to.
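The revocation flow described above can be sketched with a small in-memory model. This is not the Azure SDK: the classes Container, StoredAccessPolicy and SasToken, and the policy name "upload-1", are hypothetical stand-ins that only show why a SAS tied to a stored access policy can be revoked after it has been handed out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class StoredAccessPolicy:
    """Hypothetical stand-in for a container-level stored access policy."""
    permissions: str  # e.g. "w" for write-only, "r" for read-only
    expiry: datetime


@dataclass(frozen=True)
class SasToken:
    """Hypothetical SAS that references a policy instead of carrying its own expiry."""
    container: str
    policy_id: str


class Container:
    def __init__(self, name):
        self.name = name
        self.policies = {}  # policy_id -> StoredAccessPolicy

    def issue_sas(self, policy_id, permissions, lifetime):
        # Create the policy and hand out a token that only points at it.
        expiry = datetime.now(timezone.utc) + lifetime
        self.policies[policy_id] = StoredAccessPolicy(permissions, expiry)
        return SasToken(self.name, policy_id)

    def is_allowed(self, sas, permission):
        # Every request is checked against the policy's current state.
        policy = self.policies.get(sas.policy_id)
        if policy is None or sas.container != self.name:
            return False
        not_expired = datetime.now(timezone.utc) < policy.expiry
        return permission in policy.permissions and not_expired

    def revoke(self, policy_id):
        # Moving the expiry into the past invalidates every SAS tied to this policy.
        past = datetime.now(timezone.utc) - timedelta(seconds=1)
        self.policies[policy_id].expiry = past


uploads = Container("uploads")
upload_sas = uploads.issue_sas("upload-1", "w", timedelta(hours=4))
print(uploads.is_allowed(upload_sas, "w"))  # write allowed while the upload runs
uploads.revoke("upload-1")                  # called once the upload has finished
print(uploads.is_allowed(upload_sas, "w"))  # the same token is now rejected
```

One thing to keep in mind with this design: a container supports at most five stored access policies, so a separate policy for every individual upload will not scale to many concurrent uploads.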

Related

In UWP, where should I save user data?

I made a UWP app for the Microsoft Store. However, user data automatically saved in the LocalState folder is deleted every time the app is updated. I want the data to be retained across updates, so I am considering asking users to save their data themselves in the Documents folder or somewhere similar so it is not deleted, but I don't want to bother them. Where should I save user data?
The roaming folder will not be usable in the future, and I don't want to use Azure because of its cost.

The common approach is to store the data in some remote location, like for example in the cloud. You would typically use a service of some kind to request and save the data.
If you think Azure is too expensive, you'll have to find a cheaper storage solution. The principle is the same regardless of which storage provider you use.
As mentioned in the docs, roaming data is (or at least will be) deprecated. The recommended replacement is Azure App Service.

Expiry of url after file upload using Azure Blob Storage?

I've been researching and I haven't found the maximum time limit that a url can be accessed after file upload using Azure Blob Storage. The url that will be generated will be accessed by anonymous users and I wanted to know what is the maximum time that anonymous users can access it?

"The url that will be generated will be accessed by anonymous users and I wanted to know what is the maximum time that anonymous users can access it?"
Azure does not impose a maximum time limit on the expiry of a SAS URL. You can set it to 9999-12-31T23:59:59Z so that it effectively never expires.
However, this is not recommended. You should always issue short-lived SAS URLs so that they can't be misused.
You can find more information about the best practices for SAS here: https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview#best-practices-when-using-sas.
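As a minimal sketch of the short-lived alternative, the helper below computes an expiry timestamp in the same UTC format as the 9999-12-31T23:59:59Z value above. The function name sas_expiry and the 15-minute lifetime are assumptions chosen for illustration; the resulting string would still have to be passed to whatever SDK call generates the SAS.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def sas_expiry(lifetime: timedelta, now: Optional[datetime] = None) -> str:
    """Return a UTC expiry timestamp formatted like 9999-12-31T23:59:59Z."""
    start = now if now is not None else datetime.now(timezone.utc)
    expiry = (start + lifetime).astimezone(timezone.utc)
    return expiry.strftime("%Y-%m-%dT%H:%M:%SZ")


# A fixed clock makes the example reproducible.
fixed_now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(sas_expiry(timedelta(minutes=15), now=fixed_now))  # 2024-01-01T12:15:00Z
```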

Create a cloud storage app with ASP.NET and Azure

I need to create a cloud storage application using ASP.NET MVC, C# and integrate it with Azure storage.
I currently have a functional interface which allows users to register and securely stores their details in an SQL database. I also have a basic file uploader using Azure Blob storage that was created using this tutorial as a guideline.
My question regards how to give users their own container/page so that their files are only accessible by them. At the moment, the file uploader and Azure container is shared so that anybody with an account can view and edit the uploads. I want to restrict this so that each user has their own individual space that cannot be read or modified by others.
I have searched for answers but cannot find anything that suits my needs. Any advice would be greatly appreciated.

"My question regards how to give users their own container/page so that their files are only accessible by them."
One way to achieve this is to assign a container to each user. When a user signs up, you create a blob container for the user as part of the registration process and store the container's name along with the user's other details. When the user signs in, you fetch this information and show only the files from that container. Similarly, when the user uploads files, you save them in that container only.
A few things you would need to consider:
You can't set a hard limit on the size of a container, so a container can grow as large as the storage account's capacity allows. If you want to restrict how much data a user can upload, you need to manage that outside of storage, in your application. If that's a requirement, you may also want to look into Azure File Service, where you can restrict the size of a share (the equivalent of a blob container).
You may even want to load-balance your users across multiple storage accounts to achieve better throughput. If you go down this route, you need to store the storage account name, as well as the container name, with the user information.
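The considerations above can be sketched as follows. Everything here is an assumption made for illustration: the names assign_storage and QuotaTracker, the "user-" prefix, and the choice to pick a storage account by hashing the user ID. The regular expression encodes the Azure container naming rules (3 to 63 characters, lowercase letters, digits and hyphens, starting and ending with a letter or digit, no consecutive hyphens).

```python
import hashlib
import re
from typing import Dict, List, Tuple

# Azure container naming rules, checked before the name is stored.
_CONTAINER_NAME = re.compile(r"^[a-z0-9](?!.*--)[a-z0-9-]{1,61}[a-z0-9]$")


def assign_storage(user_id: str, accounts: List[str]) -> Tuple[str, str]:
    """Pick a storage account and container name for a newly registered user.

    The result should be saved with the user's record at registration time,
    because the list of accounts may change later.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    account = accounts[int(digest, 16) % len(accounts)]
    container = f"user-{digest[:32]}"
    if not _CONTAINER_NAME.match(container):
        raise ValueError(f"invalid container name: {container}")
    return account, container


class QuotaTracker:
    """Application-side quota, since blob containers have no size limit of their own."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.used: Dict[str, int] = {}

    def try_reserve(self, user_id: str, size_bytes: int) -> bool:
        # Refuse the upload if it would push the user over the limit.
        used = self.used.get(user_id, 0)
        if used + size_bytes > self.limit_bytes:
            return False
        self.used[user_id] = used + size_bytes
        return True


account, container = assign_storage("alice@example.com", ["acct1", "acct2"])
print(account, container)
```

In a real application the quota counters would live in the same database as the user records rather than in memory.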

Store the document names in a separate SQL database linked to the user's account. Then display to the user only those files with filenames linked specifically to them, and you should be on your way! I've used this architecture before, and it works like a charm. You should attach a Guid or some other unique identifier to each filename before implementing this model, however.
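A minimal sketch of that architecture, using an in-memory SQLite database as a stand-in for the SQL database. The table name user_files and the helper functions are hypothetical; the point is the GUID prefix on the stored name and the per-owner filter on the listing query.

```python
import sqlite3
import uuid
from typing import List


def open_index() -> sqlite3.Connection:
    """Create the (hypothetical) table that links blob names to their owners."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_files ("
        " blob_name TEXT PRIMARY KEY,"
        " owner_id TEXT NOT NULL,"
        " display_name TEXT NOT NULL)"
    )
    return conn


def register_upload(conn: sqlite3.Connection, owner_id: str, filename: str) -> str:
    """Record an upload and return the unique name to use for the blob."""
    blob_name = f"{uuid.uuid4()}-{filename}"  # the GUID prefix avoids collisions
    conn.execute(
        "INSERT INTO user_files (blob_name, owner_id, display_name) VALUES (?, ?, ?)",
        (blob_name, owner_id, filename),
    )
    conn.commit()
    return blob_name


def files_for(conn: sqlite3.Connection, owner_id: str) -> List[str]:
    """List only the files that belong to the given user."""
    rows = conn.execute(
        "SELECT display_name FROM user_files WHERE owner_id = ? ORDER BY display_name",
        (owner_id,),
    )
    return [row[0] for row in rows]


conn = open_index()
register_upload(conn, "alice", "report.pdf")
register_upload(conn, "bob", "report.pdf")
print(files_for(conn, "alice"))  # ['report.pdf']
```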

AWS S3 upload or EC2 upload to handle permissions

I'm trying to find out what is the best storage service for my specific problem.
The Application
I'm developing a mobile app with Xamarin (.NET). Each user has to register and log in to use my service. Each user can be in several groups in which he has permission to store files (each file is about 200 KB). My backend is an EC2 instance hosting Cassandra as my database.
The Problems
I think about using AWS S3 for storing the files.
Question #1:
Should I upload directly to S3, or should I upload to EC2, handle the permissions there, and then store the file in S3? With direct upload to S3, I have the advantage of using much less bandwidth on my EC2 instance. For direct uploading I have to provide a Token Vending Machine (TVM), which has two modes for providing the credentials I need to interact with S3: anonymous and identity. As I understand it, the anonymous approach is mostly used for read-only scenarios. But with the identity approach the user has to register in a browser window, which is not something I want for my users.
"The application initiates communication with the TVM by bringing up a browser on the mobile device to enable the user to register a user name and password with the TVM. The TVM derives a secret key from the password. The TVM stores the user name and secret key for future reference."
Is it even possible to handle the permissions I need (each user can only upload and download files in groups he belongs to) solely by assigning AWS permissions to the TVM credentials?
Question #2:
Should I maybe consider storing each file directly in Cassandra, since every file is only about 200 KB? The problem here is that the same files could be accessed several times per second.

I would use S3. That way you don't have to worry about bandwidth and permissions on the file. You do have to interact with Amazon S3 and the IAM service (their authorization service). You can do this through the API in your language of choice (Python, Ruby, Java, etc.).
If you are concerned about being tied to Amazon, you can potentially set up something like OpenStack Storage (compatible with the S3 API) in your own datacenter and move your data to it. The files would still be handled by your initial application, since your code would be "S3 compatible".
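The answer does not spell out how the group rule from the question would be expressed. One possible layout, shown here purely as an assumption, is to put every file under a per-group key prefix and to check group membership before credentials for a key are issued. The prefix "groups/" and the functions object_key and may_access are hypothetical, and no AWS API is called.

```python
from typing import Dict, Set


def object_key(group_id: str, filename: str) -> str:
    """Build the object key for a file stored in a group (assumed layout)."""
    if "/" in group_id or "/" in filename:
        raise ValueError("group_id and filename must not contain '/'")
    return f"groups/{group_id}/{filename}"


def may_access(memberships: Dict[str, Set[str]], user_id: str, key: str) -> bool:
    """Allow access only to keys under a group the user belongs to."""
    parts = key.split("/")
    if len(parts) != 3 or parts[0] != "groups" or not parts[2]:
        return False
    return parts[1] in memberships.get(user_id, set())


memberships = {"alice": {"g1", "g2"}, "bob": {"g2"}}
key = object_key("g1", "photo.jpg")
print(may_access(memberships, "alice", key))  # True
print(may_access(memberships, "bob", key))    # False
```

The same prefix could then be used in an IAM policy that restricts temporary credentials to the matching part of the bucket.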

Sync Files from Azure Blob to Local

I would like to write a process in a worker role that downloads (syncs) a batch of files under a folder (directory) to a mirrored local folder (directory).
Is there a timestamp (or a way to get one) for when a folder (directory) was last updated?
The folder (directory) structure is not known in advance; simply put, the goal is to download whatever is there to the local machine as soon as it changes. Apart from recursing through the folders and setting up a timer to check them repeatedly, is there a smarter way to do this?
(edit) P.S. I found many solutions for syncing files from local storage to Azure storage, but the principle used for local files does not apply to Azure blobs. I am still looking for the easiest way to download (sync) files to the local machine as soon as they change.

Eric, I believe the concept you're trying to implement isn't really that effective for your core requirement, if I understand it correctly.
Consider the following scenario:
Keep your views in the blob storage.
Implement Azure (AppFabric) Cache.
On a web request, store any view file in the cache if it is not there yet, with an unlimited (or very long) expiration time.
Enable local cache on your web role instances with a short expiration time (e.g. 5 minutes)
Create a single, separate worker role, outside your web roles, which scans your blobs' ETags for changes at a regular interval. Reset the view's cache key for any blob that has changed.
Get rid of those ugly "workers" inside of your web roles :-)
There're a few things to think about in this scenario:
Your updated views will reach the web role instances within "local cache expiration time + worker scan interval". The lower these values, the more distributed cache requests and blob storage transactions you will incur.
The Azure AppFabric Cache is the only Azure service preventing the whole platform from being truly scalable. You have to choose the best cache plan based on the overall size (in MB) of your views, the number of your instances, and the number of simultaneous cache requests required per instance.
Consider caching the compiled views inside your instances (not in the AppFabric cache). Reset this local cache based on the dedicated AppFabric cache key or keys. This will improve performance greatly, as rendering the output HTML will be as easy as injecting the model into the pre-compiled views.
Of course, the cache-retrieval code in your web roles must be able to retrieve the view from the primary source (storage) if it is unable to retrieve it from the cache for whatever reason.
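The scenario above can be sketched in a few lines. Plain dictionaries stand in for blob storage and for the distributed cache, and the class name ViewCache and its methods are hypothetical. The sketch only shows the two moving parts: a read path that falls back to storage on a cache miss, and a scan that compares ETags and resets the cache key of any blob that changed.

```python
from typing import Dict, List, Tuple


class ViewCache:
    def __init__(self, storage: Dict[str, Tuple[str, str]]) -> None:
        self.storage = storage                 # name -> (etag, content), stand-in for blobs
        self.cache: Dict[str, str] = {}        # stand-in for the distributed cache
        self.seen_etags: Dict[str, str] = {}   # ETags recorded by the last scan

    def get_view(self, name: str) -> str:
        """Web role read path: use the cache, fall back to storage on a miss."""
        if name not in self.cache:
            self.cache[name] = self.storage[name][1]
        return self.cache[name]

    def scan_once(self) -> List[str]:
        """Worker role pass: reset the cache key of every new or changed blob."""
        changed = []
        for name, (etag, _content) in self.storage.items():
            if self.seen_etags.get(name) != etag:
                self.seen_etags[name] = etag
                self.cache.pop(name, None)
                changed.append(name)
        return changed


storage = {"home": ("etag-1", "<h1>v1</h1>")}
views = ViewCache(storage)
views.scan_once()
print(views.get_view("home"))               # <h1>v1</h1>
storage["home"] = ("etag-2", "<h1>v2</h1>")
print(views.get_view("home"))               # still <h1>v1</h1> until the next scan
views.scan_once()
print(views.get_view("home"))               # <h1>v2</h1>
```

In the real setup scan_once would run on a timer in the worker role, and the staleness window is the scan interval plus the local cache expiration time, as noted above.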

My suggestion is to create an abstraction on top of the blob storage, so that no one is directly writing to the blob. Then submit a message to Azure's Queue service when a new file is written. Have the file receiver poll that queue for changes. No need to scan the entire blob store recursively.
As far as the abstraction goes, use an Azure web role or worker role to authenticate and authorize your clients. Have it write to the Blob store(s). You can implement the abstraction using HTTPHandlers or WCF to directly handle the IO requests.
This abstraction will allow you to overcome the blob limitation of 5000 files you mention in the comments above, and will allow you to scale out and provide additional features to your customers.
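A minimal sketch of this suggestion, with Python's in-process queue standing in for the Azure Queue service and a dictionary standing in for the blob store. The names BlobGateway and sync_pending are hypothetical. All writes go through the gateway, which records a message per change, so the receiver only has to drain the queue instead of scanning the store.

```python
import queue
from typing import Dict, List


class BlobGateway:
    """Single write path: stores the blob and announces the change on a queue."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}                 # stand-in for the blob store
        self.changes: "queue.Queue[str]" = queue.Queue()  # stand-in for the queue service

    def write(self, path: str, data: bytes) -> None:
        self.blobs[path] = data
        self.changes.put(path)


def sync_pending(gateway: BlobGateway, local: Dict[str, bytes]) -> List[str]:
    """Receiver: download every blob that has a pending change message."""
    synced = []
    while True:
        try:
            path = gateway.changes.get_nowait()
        except queue.Empty:
            break
        local[path] = gateway.blobs[path]
        synced.append(path)
    return synced


gateway = BlobGateway()
local_mirror: Dict[str, bytes] = {}
gateway.write("reports/a.txt", b"first")
gateway.write("reports/b.txt", b"second")
print(sync_pending(gateway, local_mirror))  # ['reports/a.txt', 'reports/b.txt']
print(sync_pending(gateway, local_mirror))  # [] because nothing has changed since
```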
I'd be interested in seeing your code when you have a chance. Perhaps I can give you some more tips or code fixes.
