I have to update a large file stored as an Azure blob. The update will take a few seconds, and I need to ensure that no other client ever gets the partially updated file.
As described in https://learn.microsoft.com/en-us/azure/storage/common/storage-concurrency, it should be easy to lock the file for writing, but as far as I understand, other clients will still be able to read the file. I could use read locks, but that would mean that only one client can read the file, and that's not what I want.
According to Preventing azure blob from being accessed by other service while it's being created, it seems that new files are only "committed" at the end of an upload, but I could not find information on what happens when I update an existing file.
So, the question is: what will other clients read during an update (replace) operation? Will they read the old file until the new data is committed, or will they read the partially updated file content?
I did a test for this scenario (I didn't find any official documentation about it): I replaced a 400 MB blob with a 600 MB file, and during the update (about 10 seconds after it started) I used code to read the blob that was being updated.
The result is that only the old file can be read while the update is in progress.
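For anyone wanting to reproduce this, a rough sketch of such a test (assuming the Azure.Storage.Blobs v12 SDK; the connection string, blob name and file names are placeholders) could look like this:

using System;
using System.IO;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

class BlobOverwriteTest
{
    static async Task Main()
    {
        // Placeholders - replace with your own storage account and blob names.
        var container = new BlobContainerClient("<connection-string>", "test-container");
        BlobClient blob = container.GetBlobClient("large-file.bin");

        // Start overwriting the existing 400 MB blob with the new 600 MB file.
        using Stream newContent = File.OpenRead("new-600mb.bin");
        Task upload = blob.UploadAsync(newContent, overwrite: true);

        // About 10 seconds into the upload, read the same blob as "another client".
        await Task.Delay(TimeSpan.FromSeconds(10));
        using (Stream readBack = File.Create("read-during-update.bin"))
        {
            // Completes with the old, already-committed content.
            await container.GetBlobClient("large-file.bin").DownloadToAsync(readBack);
        }

        await upload;
    }
}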
Related
I want to download a file from a direct link. The files are between 900 MB and 30 GB. That's pretty large, so I don't want to download them to a temp folder and then upload them. I want to use something like Azure Functions to do this every x hours, and the temp storage there is pretty limited.
Is there a way to download / stream the download and upload simultaneously to blob storage? I don't want to save it first.
Hope you can help me out
Is there a way to download / stream the download and upload simultaneously to blob storage? I don't want to save it first.
You don't really need to do that yourself. Azure Storage can do it for you.
You will need to use the Copy Blob functionality: provide the URL of the file you wish to transfer, and Azure Storage will asynchronously copy the file into blob storage. Please do note that this is an asynchronous operation and you do not have control over when the blob gets copied.
If you want a synchronous copy operation, you can take a look at the Put Block From URL operation. This is where you control how many bytes of data you want to transfer from the source to blob storage.
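As a rough illustration of the server-side copy approach (assuming the Azure.Storage.Blobs v12 SDK; the container name, blob name and source URL are placeholders):

using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

class ServerSideCopy
{
    static async Task Main()
    {
        // Placeholders - replace with your own values.
        var container = new BlobContainerClient("<connection-string>", "downloads");
        await container.CreateIfNotExistsAsync();
        BlobClient destination = container.GetBlobClient("big-file.bin");

        // Ask Azure Storage to pull the file from the source URL itself.
        // The copy runs inside the storage service, so nothing is staged locally.
        var copy = await destination.StartCopyFromUriAsync(new Uri("https://example.com/big-file.bin"));

        // Optionally wait for the asynchronous copy to finish.
        await copy.WaitForCompletionAsync();
    }
}

For the synchronous variant, the same SDK exposes the Put Block From URL path via BlockBlobClient.StageBlockFromUri plus CommitBlockList, which lets you decide how much data is transferred per call.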
I am using the Azure Storage File Shares client library for .NET in order to save files in the cloud, read them, and so on. I have a file saved in the storage which is supposed to be updated every time I perform a specific action in my code.
The way I'm doing it now is by downloading the file from the storage using
ShareFileDownloadInfo download = file.Download();
And then I edit the file locally and upload it back to the storage.
The problem is that the file can be updated frequently, which means lots of downloads and uploads of a file that keeps increasing in size.
Is there a better way of editing a file on Azure storage? Maybe some way to edit the file directly in the storage without the need to download it before editing?
Downloading and uploading the file is the correct way to make edits with the way you are currently handling the data. If you are finding yourself doing this often, there are some strategies you could use to reduce traffic:
1) If you are the only one editing the file, you could cache a copy of it locally and apply the updates to that copy instead of downloading it each time.
2) Cache pending updates and only update the file at regular intervals instead of with each change (strategies 1 and 2 are sketched together below).
3) Break the single file up into multiple time-boxed files, say one per hour. This wouldn't help with frequency, but it can help with size.
FYI, when pushing logs to storage, many Azure services use a combination of #2 and #3 to minimize traffic.
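A minimal sketch of strategies 1 and 2 combined, assuming the Azure.Storage.Files.Shares v12 SDK (the class, the edit callback and the flush interval are made up for illustration):

using System;
using System.IO;
using System.Threading;
using Azure.Storage.Files.Shares;

class CachedShareFile : IDisposable
{
    private readonly ShareFileClient _remote;
    private readonly string _localPath;
    private readonly Timer _flushTimer;
    private volatile bool _dirty;

    public CachedShareFile(ShareFileClient remote, string localPath, TimeSpan flushInterval)
    {
        _remote = remote;
        _localPath = localPath;

        // Strategy 1: download once and keep a local working copy.
        using (var local = File.Create(_localPath))
            _remote.Download().Value.Content.CopyTo(local);

        // Strategy 2: push the copy back on a timer instead of after every edit.
        _flushTimer = new Timer(_ => Flush(), null, flushInterval, flushInterval);
    }

    // Edit the local copy only; mark it as needing an upload.
    public void Edit(Action<string> editLocalFile)
    {
        editLocalFile(_localPath);
        _dirty = true;
    }

    public void Flush()
    {
        if (!_dirty) return;
        _dirty = false;

        using var local = File.OpenRead(_localPath);
        _remote.Create(local.Length);   // resize the share file to the new length
        _remote.Upload(local);
    }

    public void Dispose()
    {
        _flushTimer.Dispose();
        Flush();
    }
}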
I've found tons of information about how to create and upload a zip file to Azure storage, but I'm trying to see if there's a good way to add content to an existing zip file in blob storage that doesn't involve downloading the whole blob, rebuilding in memory, then uploading again.
My use case involves zipping several hundred, up to several million, items into an archive to be stored in Azure blob storage for later download. I have code that will handle splitting so I don't wind up with a single several-GB file, but I still run into memory management issues when dealing with large quantities of files. I'm trying to address this by creating the zip file in blob storage and adding subsequent files to it one by one. I recognize this will incur cost for the additional writes, and that's fine.
I know how to use Append Blobs and Block Blobs, and I have working code to create the zip file and upload it, but I can't seem to find out if there's a way to do this. Has anyone managed to accomplish this, or can anyone confirm that it is not possible?
Since you're dealing with zip files, the only way to add new files to an existing zip file is to download the blob, add the new file to that zip file, and then re-upload the blob.
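A sketch of that flow with Azure.Storage.Blobs and System.IO.Compression, buffering the archive in memory (so it only suits archives that still fit comfortably in RAM):

using System.IO;
using System.IO.Compression;
using Azure.Storage.Blobs;

class ZipAppender
{
    // Downloads the zip blob, adds one entry, and re-uploads the whole archive.
    public static void AddFileToZipBlob(BlobClient zipBlob, string entryName, byte[] content)
    {
        using var buffer = new MemoryStream();
        zipBlob.DownloadTo(buffer);

        // ZipArchiveMode.Update needs a seekable stream, which MemoryStream provides.
        using (var archive = new ZipArchive(buffer, ZipArchiveMode.Update, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.CreateEntry(entryName, CompressionLevel.Optimal);
            using Stream entryStream = entry.Open();
            entryStream.Write(content, 0, content.Length);
        } // disposing the archive writes the updated central directory into the buffer

        buffer.Position = 0;
        zipBlob.Upload(buffer, overwrite: true);
    }
}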
I have a binary data file that is written to from a live data stream, so it keeps growing as the stream comes in. In the meantime, I need to open it in read-only mode to display data in my application (a time series chart). Opening the whole file takes a few minutes as it is pretty large (a few hundred MB).
What I would like to do is, rather than re-opening/reading the whole file every x seconds, read only the data that was last added to the file and append it to the data that was already read.
I would suggest using FileSystemWatcher to be notified of changes to the file. From there, cache information such as the size of the file between events and add some logic to only respond to full lines, etc. You can use the Seek() method of the FileStream class to jump to a particular point in the file and read only from there. I hope it helps.
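A minimal sketch of that idea in plain .NET - it remembers the last read position and reads only the bytes appended since then:

using System;
using System.IO;

class FileTailReader
{
    private readonly string _filePath;
    private readonly FileSystemWatcher _watcher;
    private long _lastPosition;

    public event Action<byte[]> NewData;

    public FileTailReader(string filePath)
    {
        _filePath = filePath;

        // Raise an event whenever the writer appends to the file.
        _watcher = new FileSystemWatcher(Path.GetDirectoryName(filePath), Path.GetFileName(filePath))
        {
            NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.Size,
            EnableRaisingEvents = true
        };
        _watcher.Changed += (sender, args) => ReadNewBytes();
    }

    private void ReadNewBytes()
    {
        // FileShare.ReadWrite lets us read while the writer still has the file open.
        using var stream = new FileStream(_filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
        if (stream.Length <= _lastPosition)
            return;   // nothing new yet

        stream.Seek(_lastPosition, SeekOrigin.Begin);
        var buffer = new byte[stream.Length - _lastPosition];
        int read = stream.Read(buffer, 0, buffer.Length);
        _lastPosition += read;

        if (read < buffer.Length)
            Array.Resize(ref buffer, read);

        NewData?.Invoke(buffer);   // hand only the freshly appended bytes to the chart
    }
}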
If you control the writing of this file, I would split it into several files of a predefined size.
When the writer determines that the current file is larger than, say, 50MB, close it and immediately create a new file to write data to. The process writing this data should always know the current file to write received data to.
The reader thread/process would read all these files in order, jumping to the next file when the current file was read completely.
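A sketch of what the writer side could look like (the 50 MB threshold and the file naming scheme are just examples):

using System;
using System.IO;

class RollingBinaryWriter : IDisposable
{
    private const long MaxFileSize = 50 * 1024 * 1024;   // ~50 MB per file, as an example
    private readonly string _directory;
    private int _fileIndex;
    private FileStream _current;

    public RollingBinaryWriter(string directory)
    {
        _directory = directory;
        OpenNextFile();
    }

    public void Write(byte[] data)
    {
        // When the current file is "full", close it and start the next one.
        if (_current.Length + data.Length > MaxFileSize)
            OpenNextFile();

        _current.Write(data, 0, data.Length);
        _current.Flush();
    }

    private void OpenNextFile()
    {
        _current?.Dispose();
        string path = Path.Combine(_directory, $"data-{_fileIndex++:D6}.bin");
        _current = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.Read);
    }

    public void Dispose() => _current?.Dispose();
}

The reader then works through data-000000.bin, data-000001.bin, ... in order, moving to the next file once the current one has been read completely and a newer file exists.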
You can probably use a FileSystemWatcher to monitor for changes in the file, like the example given here: Reading changes in a file in real-time using .NET.
But I'd suggest that you evaluate another solution involving a queue, like RabbitMQ or Redis - anything with a publisher-subscriber model. Then you'd just push the live data into the queue and have two different listeners (subscribers): one to save the data to the file, and the other to process the last-appended data. This way you can achieve more flexibility in distributing the load of the application.
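A rough sketch of the publisher-subscriber idea, using StackExchange.Redis purely as one possible example (the channel name, connection string and sample payload are made up):

using System;
using System.IO;
using StackExchange.Redis;

class LiveDataPubSub
{
    static void Main()
    {
        var redis = ConnectionMultiplexer.Connect("localhost");
        ISubscriber sub = redis.GetSubscriber();

        // Subscriber 1: persist every message to the data file.
        sub.Subscribe("live-data", (channel, message) =>
            File.AppendAllText("data.log", (string)message + Environment.NewLine));

        // Subscriber 2: feed the freshly arrived data to the chart.
        sub.Subscribe("live-data", (channel, message) =>
            Console.WriteLine($"chart update: {message}"));

        // Publisher: the process receiving the live stream pushes into the channel.
        sub.Publish("live-data", "42.7;2024-01-01T12:00:00Z");

        Console.ReadLine();
    }
}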
I am using the FileUpload control of ASP.NET and uploading an Excel file with some data. I can't save it to a folder. I can have the stream of the Excel file, or I can have a blob stream after uploading the Excel file as a blob. Now I want to convert the first sheet of that Excel file to a DataTable - how shall I do that? I am using C# .NET. I don't want to use the Interop library; I can use external libraries. An OleDb connection fails because I don't have a physical path to the Excel file to use as a data source. I tried the following links:
1) http://www.codeproject.com/Articles/14639/Fast-Excel-file-reader-with-basic-functionality
2) http://exceldatareader.codeplex.com/
Please help.
Depending on the type of Excel file, you can use the examples you posted or go for the OpenXML alternative (for .xlsx files): http://openexcel.codeplex.com/
Now, the problem with the physical path is easy to solve. Saving the file to blob storage is great. But if you want, you can also save it in a local resource to have it locally. This will allow you to process the file using a simple OleDb connection. Once you're done with the file, you can just delete it from the local resource (it will still be available in the blob storage since you also uploaded it there).
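A sketch of that approach - the local resource name, the sheet name and the ACE OLE DB provider are assumptions, and the local resource itself has to be declared in the service definition:

using System.Data;
using System.Data.OleDb;
using System.IO;
using Microsoft.WindowsAzure.ServiceRuntime;

class ExcelImporter
{
    public static DataTable ReadFirstSheet(Stream uploadedExcel, string fileName)
    {
        // Write the uploaded stream to the role's local resource so OleDb gets a physical path.
        LocalResource scratch = RoleEnvironment.GetLocalResource("ExcelScratch");
        string path = Path.Combine(scratch.RootPath, fileName);
        using (var file = File.Create(path))
            uploadedExcel.CopyTo(file);

        // ACE OLE DB provider reading the first sheet ("Sheet1" here is an assumption).
        string connectionString =
            $"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={path};" +
            "Extended Properties='Excel 12.0 Xml;HDR=YES'";

        var table = new DataTable();
        using (var connection = new OleDbConnection(connectionString))
        using (var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", connection))
        {
            adapter.Fill(table);
        }

        // Clean up the temporary copy; the blob copy is still in storage.
        File.Delete(path);
        return table;
    }
}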
Don't forget to have some kind of clean up mechanism in case your processing fails. You wouldn't want to end up with a disk filled with temporary files (even though it could take a while before this happens).
Read more on local resources here: http://msdn.microsoft.com/en-us/library/windowsazure/ee758708.aspx
You should use the OpenXML SDK, which is an officially suggested way of working with MS Office documents - http://www.microsoft.com/download/en/details.aspx?id=5124
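For completeness, a trimmed-down sketch that reads the first sheet of an .xlsx stream into a DataTable with the OpenXML SDK (it treats every cell as text and ignores edge cases such as gaps in a row or duplicate header names):

using System.Data;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

class OpenXmlExcelReader
{
    public static DataTable ReadFirstSheet(Stream xlsxStream)
    {
        var table = new DataTable();

        using (SpreadsheetDocument document = SpreadsheetDocument.Open(xlsxStream, false))
        {
            WorkbookPart workbook = document.WorkbookPart;
            Sheet firstSheet = workbook.Workbook.Descendants<Sheet>().First();
            var worksheet = (WorksheetPart)workbook.GetPartById(firstSheet.Id);
            SharedStringTable sharedStrings = workbook.SharedStringTablePart?.SharedStringTable;

            foreach (Row row in worksheet.Worksheet.Descendants<Row>())
            {
                var values = row.Elements<Cell>()
                                .Select(cell => GetCellText(cell, sharedStrings))
                                .ToArray();

                // First row becomes the column headers, the rest become data rows.
                if (table.Columns.Count == 0)
                {
                    foreach (string header in values)
                        table.Columns.Add(header);
                }
                else
                {
                    table.Rows.Add(values);
                }
            }
        }

        return table;
    }

    private static string GetCellText(Cell cell, SharedStringTable sharedStrings)
    {
        string raw = cell.CellValue?.Text ?? string.Empty;

        // String cells store an index into the shared string table, not the text itself.
        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString && sharedStrings != null)
            return sharedStrings.ElementAt(int.Parse(raw)).InnerText;

        return raw;
    }
}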
I first created local storage as per the link:
http://msdn.microsoft.com/en-us/library/windowsazure/ee758708.aspx
suggested by Sandrino above - thanks, Sandrino. Then I used an OleDb connection and it gave me the error "Microsoft.Jet.Oledb.4.0 dll is not registered". Then I logged on to the Azure server and, in IIS, changed the app pool configuration to 32-bit. To change the app pool to 32-bit, refer to the following link:
http://blog.nkadesign.com/2008/windows-2008-the-microsoftjetoledb40-provider-is-not-registered-on-the-local-machine/
The approach you followed is not the correct one. As you said, you logged on to Azure and changed the configuration manually, but the VM running on Azure is not permanent for you: with any update you are going to get a new VM, and your manual change will be lost. You will have to find a workaround for this instead of modifying it manually. You can make use of startup tasks in your Azure app. See the link below; it may help you.
http://msdn.microsoft.com/en-us/library/gg456327.aspx
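For example, the manual IIS change could be re-applied by an elevated startup task, roughly like this (the task name and script are illustrative - the script should run the same appcmd/IIS change you applied by hand):

<!-- ServiceDefinition.csdef, inside the <WebRole> element -->
<Startup>
  <Task commandLine="startup.cmd" executionContext="elevated" taskType="simple" />
</Startup>

startup.cmd (deployed with the role, with "Copy to Output Directory" set to "Copy always"):

rem Re-apply the 32-bit app pool setting on every new VM instance.
%windir%\system32\inetsrv\appcmd set config -section:applicationPools -applicationPoolDefaults.enable32BitAppOnWin64:true
exit /b 0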