How to roll back an S3 document upload? - C#

I make a request to a REST API (call it API 1), which internally calls two other APIs (APIs 2 & 3) synchronously.
API 1 = REST API
API 2 = Pre-signed URL to upload a file into S3
API 3 = A DB update (SQL Server)
API 3, i.e. the DB update, runs only if the file was successfully uploaded to S3 (API 2).
If the DB update (API 3) fails, the changes API 2 made should be rolled back, i.e. the uploaded file should be deleted from S3.
Please advise how to handle this scenario. Any out-of-the-box solution is welcome.

S3 is not transactional. In fact, REST APIs in general are not transactional; each operation is atomic:
What are atomic operations for newbies?
What this means is that you can't roll back an operation once it has succeeded.
It would be easy to say that's fine: once the local DB update fails, I can issue a delete call on S3 to remove my file. But what happens if this delete fails too?
Another way would be to write to your database first and then post the file. Again, if the file upload fails you can roll back the DB command. That is safer, but still... what happens when you send the request but get a timeout? The file might be on the server, but you just won't know.
Enter the world of eventual consistency.
While there are ways to mitigate the issue with retries (check the Polly library for retry policies), what you can do is store the action.
You want to upload the file, so add that task to a queue and run it. If it fails, mark the task as failed, retry as many times as you want, and record the failure reasons.
Then come manual interventions: when all else fails, someone should step in with some resolution strategy.
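A minimal sketch of that idea in C#, assuming the AWSSDK.S3 and Polly NuGet packages; the bucket/key names, UpdateDatabaseAsync, and EnqueueForManualCleanupAsync are hypothetical placeholders for your own DB update (API 3) and a dead-letter store for manual intervention:

using System;
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;
using Polly;

public class DocumentUploader
{
    private readonly IAmazonS3 _s3 = new AmazonS3Client();

    public async Task SaveDocumentAsync(string bucket, string key, Stream content)
    {
        // API 2: upload the file to S3.
        await _s3.PutObjectAsync(new PutObjectRequest
        {
            BucketName = bucket,
            Key = key,
            InputStream = content
        });

        try
        {
            await UpdateDatabaseAsync(key); // API 3: hypothetical DB update
        }
        catch
        {
            // Compensating action: try to delete the uploaded file.
            // The delete can fail too, so retry with backoff; if all
            // retries fail, store the action for manual intervention.
            var retry = Policy
                .Handle<Exception>()
                .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));
            try
            {
                await retry.ExecuteAsync(() => _s3.DeleteObjectAsync(bucket, key));
            }
            catch
            {
                await EnqueueForManualCleanupAsync(bucket, key); // hypothetical dead-letter store
            }
            throw;
        }
    }

    private Task UpdateDatabaseAsync(string key) => Task.CompletedTask;                         // placeholder
    private Task EnqueueForManualCleanupAsync(string bucket, string key) => Task.CompletedTask; // placeholder
}

If even the dead-letter write can fail, you are back to manual reconciliation, which is exactly the eventual-consistency trade-off described above.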

If you need to "undo" an upload to any file system (and S3 can be treated as one for this purpose), you do it like this.
Upload the new file with some temporary unique file name (a GUID will do fine).
To "commit" the upload, remove the file you're replacing and rename the one you just uploaded so it has the same name as the old one (in S3, a "rename" is a copy followed by a delete).
To "roll back" the upload, remove the temp file.
An even better way, if your application allows it, is to give each version of the file a different name. Then you just upload each new one with its own name, and clean up by deleting the old ones.
In your particular scenario, it might make sense to do your database update operation first, and the upload second, if that won't open you up to a nasty race condition.

Related

Scan Uploaded File using Sophos Labs Intelix in c#

I am new to SophosLabs Intelix. I am trying to build a sample in my ASP.NET application (WebForms/MVC) in which I want to run an antivirus scan on the file uploaded by the user. If the uploaded file is clean, I want to upload it to the server; otherwise I want to cancel the operation. I want to specifically use SophosLabs Intelix for this functionality. It would be great if someone can guide me regarding this functionality. A code sample in C# would be appreciated a lot.
Thanks in advance for your help.
Sample:
if (file.HasFile)
{
    // run an antivirus scan
    // result
    if (result == NoThreat)
    {
        // Uploaded successfully
    }
    else
    {
        // File contains a virus. Upload failed!
    }
}
else
{
    // Please select a file to upload!
}
I suggest starting with the implementation of the OAuth 2 authentication request. You can find some ideas here: How do I get an OAuth 2.0 authentication token in C#
As soon as you have the access_token, you can use it for a /reports/?sha256=... query.
It may return the report immediately.
If it does not return any data (404), this request was free and you can POST the file to the root endpoint "/" for analysis.
That can take a few seconds or minutes, during which you should poll the /reports/{job_id} endpoint until you get the report.
If you cannot wait minutes for decision data, you may use the File Hash Lookup API as well, which returns immediately.
It may give a reputationScore between 30 and 69, in which case it cannot decide how dangerous the file is, but you can still perform a static or dynamic analysis on the file.
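A rough HttpClient sketch of that flow; the base address, the shape of the token and job_id responses, and the polling status codes are assumptions, so check the Intelix documentation for the exact contract:

using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public class IntelixClientSketch
{
    private readonly HttpClient _http = new HttpClient
    {
        BaseAddress = new Uri("https://api.example.intelix.sophos.com/") // hypothetical host
    };

    public async Task<string> GetReportAsync(string accessToken, string sha256, byte[] fileBytes)
    {
        _http.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", accessToken);

        // 1) Try the report lookup by hash first (free if nothing is found).
        var lookup = await _http.GetAsync("reports/?sha256=" + sha256);
        if (lookup.StatusCode == HttpStatusCode.OK)
            return await lookup.Content.ReadAsStringAsync();

        // 2) Unknown hash (404): POST the file itself to the root endpoint.
        string jobId;
        using (var form = new MultipartFormDataContent())
        {
            form.Add(new ByteArrayContent(fileBytes), "file", "upload.bin");
            var submit = await _http.PostAsync("/", form);
            jobId = await submit.Content.ReadAsStringAsync(); // assumption: body carries the job_id
        }

        // 3) Poll /reports/{job_id} until the report is ready.
        while (true)
        {
            var report = await _http.GetAsync("reports/" + jobId);
            if (report.StatusCode == HttpStatusCode.OK)
                return await report.Content.ReadAsStringAsync();
            await Task.Delay(TimeSpan.FromSeconds(5)); // assumption: not-ready returns a non-OK status
        }
    }
}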

When listing a Drive folder's changes (via ChangesResource) for the first time, what page token should be used?

Let's say the user already has files synchronized (via my app) to their Drive folder. Now they sign into my app on a second device and are ready to sync files for the first time. Do I use the Changes API for the initial sync process?
I ask because using the Changes API requires a StartPageToken, which implies that there has been a previous sync operation. There is no possible way for the user to already have a StartPageToken if they are synchronizing data on a device for the first time.
Google's documentation is a joke. They shouldn't leave it up to us to read between the lines and just figure this out. I'm sure I can cook up something that will "work", but how do I ever know that it is the "appropriate" and EFFICIENT way to go about handling this?
public async Task<AccessResult> GetChangesAsync(CancellationToken cancellationToken, string fields = "*")
{
    ChangesResource.ListRequest listRequest = new ChangesResource.ListRequest(DriveService, startPageToken)
    {
        Spaces = Folder_appDataFolder,
        Fields = fields + ", nextPageToken",
        IncludeRemoved = true,
        PageSize = 20
    };
    ChangeList changeList = await listRequest.ExecuteAsync(cancellationToken);
    // ...
}
Here, I am looking to sync the user's data for the first time, and a page token doesn't even make sense for that, because during the first sync your goal is to get all of the user's data. From then on you are looking to sync only further changes.
One approach I thought of is to simply use ListRequest to list all of the user's data and start downloading the files that way. I can then request a start page token and store it, to be used during sync attempts that occur later...
...But what if during the initial download of the user's files (800 files, for example) an error occurs and the ListRequest fails on file 423? Because I cannot obtain a StartPageToken in the middle of a ListRequest to store in case of emergency, do I have to start all over and download all 800 files again instead of starting at file 423?
When doing changes.list for the first time you should call getStartPageToken; this returns the page token you can use to get the change list. If it's the first time, there will of course be no changes.
If the user is using your application from more than one device, then the logical course of action would be to save the page token in a central location when the user starts the application for the first time on the first device. This will enable you to use that same token on all additional devices the user may choose to use.
This could be on your own server or even in the user's app data folder on Drive.
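A minimal sketch with the Google.Apis.Drive.v3 client library: fetch the start page token once, persist it centrally, then use it (and the tokens that come back) on every device. SaveTokenAsync/LoadTokenAsync are hypothetical placeholders for that central store:

using System.Threading;
using System.Threading.Tasks;
using Google.Apis.Drive.v3;
using Google.Apis.Drive.v3.Data;

public class ChangeSync
{
    private readonly DriveService _drive;

    public ChangeSync(DriveService drive)
    {
        _drive = drive;
    }

    // First run on the first device: grab the token and persist it centrally.
    public async Task InitializeAsync(CancellationToken ct)
    {
        StartPageToken token = await _drive.Changes.GetStartPageToken().ExecuteAsync(ct);
        await SaveTokenAsync(token.StartPageTokenValue);
    }

    // Later runs (any device): list changes since the stored token.
    public async Task SyncChangesAsync(CancellationToken ct)
    {
        string pageToken = await LoadTokenAsync();
        while (pageToken != null)
        {
            ChangesResource.ListRequest request = _drive.Changes.List(pageToken);
            request.Spaces = "appDataFolder";
            ChangeList changes = await request.ExecuteAsync(ct);

            foreach (Change change in changes.Changes)
            {
                // apply the change locally...
            }

            // NewStartPageToken is only populated on the last page;
            // persist it so the next sync starts from here.
            if (changes.NewStartPageToken != null)
                await SaveTokenAsync(changes.NewStartPageToken);

            pageToken = changes.NextPageToken;
        }
    }

    private Task SaveTokenAsync(string token) => Task.CompletedTask;        // hypothetical central store
    private Task<string> LoadTokenAsync() => Task.FromResult<string>(null); // hypothetical central store
}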
I am not exactly sure what your application is doing, but I really don't think you should be downloading the user's files unless they try to access them. There is no logical reason I can think of for your application to store a mirror image of a user's Drive account. Access the data they need when they need it; you shouldn't need everything. Again, I don't know exactly what your application does.

ASP.NET: Concurrent file upload fails after N number of large uploads

I am working on an ASP.NET application (WebForms, ASP.NET 2.0, Framework 3.5). It is a 32-bit application running on IIS 7.0, on Windows 2008 R2 SP1.
I am facing an issue with large file uploads (files of more than 20 MB or so). The application is able to upload large files; however, it is noticed that after N uploads, the next set of uploads keeps failing until IIS is restarted.
The application supports concurrent file uploads. A single large file upload always works; only when we start uploads for more than one file does one of the uploads get stuck.
I tried looking at the temp folders into which posted file data gets written and noticed that when the issue happens, the upload for the failing file never starts from the server's point of view, as it never generates any temp file, and after a few seconds the request fails.
When the things fail,
CPU is all OK
W3wp stands at 2 GB memory usage (against total 4 GB RAM)
W3wp does not show a crash, as the other pages of the application still work fine
I tried using Wireshark to look at the network traffic, but it only shows ERR_CONNECTION_RESET. Apart from that, I am not getting any clue.
I am suspecting below things but not sure how to conclude or fix.
1) To serve concurrent uploads, the server needs to keep up with the data rate coming from the client side, and when it is unable to match that, it must be failing internally. This could be due to the server's inability to serve concurrent requests.
2) Frequent large uploads increase the memory footprint of the application to the point where it cannot handle concurrent uploads, because RAM is still required to dump the files to the temporary location in a chunked manner.
Here is my web config setting
<httpRuntime maxRequestLength="2097151" executionTimeout="10800" enableVersionHeader="false"/>
From the implementation perspective,
1) We have a client-side implementation written in JavaScript, which creates FormData and sends the XHR to the server
2) The server has a method which gets called when the complete file has been copied to the server's temp directory; we extract the file data using the Request.Files collection and then process it further
When the issue happens, the server method gets called, but Request.Files comes back empty.
Please let me know if anyone has good insight on this which can guide me to the root cause and a fix.
UPDATE:
Client side code representation:
// Set HTTP headers
_http.setRequestHeader("x-uploadmethod", "formdata");
_http.setRequestHeader("x-filename", "Name of file");
// Prepare form data
var data = new FormData();
data.append("Name of file", fileContents); // fileContents: the File/Blob being uploaded
// Send the XHR request
_http.send(data);
Server side code representation:
HttpFileCollection files = Request.Files;
int Id = objUpload.UploadMyAssets(files[0]);
The logic in UploadMyAssets takes files[0] as an HttpPostedFile and then moves ahead with application-specific logic.
Thanks
I had the same issue. It turns out the ASP.NET default session manager blocks async streams over HTTPS (HTTP/2). It didn't happen over HTTP (non-SSL).
Resolved this by using SessionStateBehavior.ReadOnly for the controller class. Related to this post:
ASP.Net Asynchronous HTTP File Upload Handler
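A minimal sketch of that fix on an ASP.NET MVC controller; the controller and action names are hypothetical. With read-only session state, requests to this controller no longer queue behind the exclusive session lock, so concurrent uploads can proceed:

using System.Web;
using System.Web.Mvc;
using System.Web.SessionState;

// Read-only session state: this controller can read Session but never
// takes the exclusive writer lock that serializes concurrent requests.
[SessionState(SessionStateBehavior.ReadOnly)]
public class UploadController : Controller
{
    [HttpPost]
    public ActionResult Upload()
    {
        foreach (string name in Request.Files)
        {
            HttpPostedFileBase file = Request.Files[name];
            // process the uploaded file...
        }
        return new HttpStatusCodeResult(200);
    }
}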

Azure File System - Can I "Watch" or only poll?

I am an experienced Windows C# developer, but new to the world of Azure, and so I am trying to figure out a "best practice" as I implement one or more Azure Cloud Services.
I have a number of (external, and outside of my control) sources that can all save files to a folder (or possibly a set of folders). In the current state of my system under Windows, I have a FileSystemWatcher set up to monitor a folder and raise an event when a file appears there.
In the world of Azure, what is the equivalent way to do this? Or is there?
I am aware I could create a timer (or sleep) to pass some time (say 30 seconds), and poll the folder, but I'm just not sure that's the "best" way in a cloud environment.
It is important to note that I have no control over the inputs - in other words the files are saved by an external device over which I have no control; so I can't, for example, push a message onto a queue when the file is saved, and respond to that message...
Although, in the end, that's the goal... So I intend to have a "Watcher" service which will (via events or polling) detect the presence of one or more files, and push a message onto the appropriate queue for the next step in my workflow to respond to.
It should be noted that I am using VS2015, and the latest Azure SDK stuff, so I'm not limited by anything legacy.
What I have so far is basically this (a snippet of a larger code base):
storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));

// Create a CloudFileClient object for credentialed access to File storage.
fileClient = storageAccount.CreateCloudFileClient();

// Obtain the file share name from the config file.
string sharenameString = CloudConfigurationManager.GetSetting("NLRB.Scanning.FileSharename");

// Get a reference to the file share.
share = fileClient.GetShareReference(sharenameString);

// Ensure that the share exists.
if (share.Exists())
{
    Trace.WriteLine("Share exists.");

    // Get a reference to the root directory for the share.
    rootDir = share.GetRootDirectoryReference();

    // Here is where I want to start watching the folder represented by rootDir...
}
Thanks in advance.
If you're using an attached disk (or local scratch disk), the behavior would be like on any other Windows machine, so you'd just set up a FileSystemWatcher accordingly and deal with callbacks as you normally would.
There's Azure File Service, which is SMB as-a-service and supports any actions you'd be able to do on a regular SMB volume on your local network.
There's Azure blob storage. Blobs cannot be watched; you'd have to poll for changes to, say, a blob container.
You could create a loop that polls the root directory periodically using the CloudFileDirectory.ListFilesAndDirectories method: https://msdn.microsoft.com/en-us/library/dn723299.aspx
You could also write a small recursive method to call this in subdirectories.
To detect differences, you can build up an in-memory hash map of all files and directories. If you want something like a persistent distributed cache, you can use e.g. Redis to keep this list of files/directories. Every time you poll, if a file or directory is not in your list, then you have detected a new file/directory under the root.
You could separate the responsibility of detection from the business logic: e.g. a worker role keeps polling the directory and writes the new files to a queue, and the consuming end is another worker role/web service that does the processing with that information.
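A minimal sketch of that polling loop against the classic Microsoft.WindowsAzure.Storage SDK used in the question; the in-memory HashSet plays the role of the hash map, and OnNewFile is a placeholder for the queue hand-off:

using System;
using System.Collections.Generic;
using System.Threading;
using Microsoft.WindowsAzure.Storage.File;

public class ShareWatcher
{
    private readonly CloudFileDirectory _rootDir;
    private readonly HashSet<string> _known = new HashSet<string>();

    public ShareWatcher(CloudFileDirectory rootDir)
    {
        _rootDir = rootDir;
    }

    public void PollForever()
    {
        while (true)
        {
            foreach (IListFileItem item in _rootDir.ListFilesAndDirectories())
            {
                // HashSet.Add returns false for items we have already seen.
                if (item is CloudFile file && _known.Add(file.Uri.AbsoluteUri))
                {
                    // New file detected: hand it to the next stage,
                    // e.g. push a message onto a queue (placeholder).
                    OnNewFile(file);
                }
            }
            Thread.Sleep(TimeSpan.FromSeconds(30)); // poll interval from the question
        }
    }

    private void OnNewFile(CloudFile file)
    {
        Console.WriteLine("New file: " + file.Name);
    }
}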
Azure Blob Storage pushes events through Azure Event Grid. Blob storage has two event types, Microsoft.Storage.BlobCreated and Microsoft.Storage.BlobDeleted. So instead of long polling you can simply react to the created event.
See this link for more information:
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-event-overview
I had a very similar requirement. I used the Box application, which has a webhook feature for events occurring on files or folders, such as Add, Move, Delete, etc.
Also there are some newer alternatives with Azure Automation.
I'm pretty new to Azure too, and actually I'm investigating a file watcher type thing. I'm considering something involving Azure Functions, because of this, which looks like a way of triggering some code when a blob is created or updated. There's a way of specifying a pattern too: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob
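For completeness, a minimal sketch of such a blob-triggered Azure Function (Microsoft.Azure.WebJobs attributes); the container name "uploads" is hypothetical:

using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class BlobWatcherFunction
{
    // Fires whenever a blob matching uploads/{name} is created or updated;
    // from here you could push a message onto a queue for the next stage.
    [FunctionName("BlobWatcher")]
    public static void Run(
        [BlobTrigger("uploads/{name}")] Stream blob,
        string name,
        ILogger log)
    {
        log.LogInformation($"Blob detected: {name} ({blob.Length} bytes)");
    }
}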

How can I overwrite/update a file that is currently being served by IIS?

The problem:
My company puts out a monthly newsletter which I host on our internal website. I have a page for the author of the newsletter to upload the latest version. Once the author has uploaded the latest newsletter, he sends a broadcast email to announce the new newsletter. Employees invariably check the new newsletter and send feedback to the author with corrections that need to be made.
Once the author has made the necessary corrections (typically within an hour of sending the broadcast email), he revisits my page and replaces the latest version with the updated newsletter.
Immediately following the replacement (or update, if you will) of the newsletter, anyone attempting to access it gets a 500 - Internal Server Error.
My IT guy who maintains the server cannot delete/rename/move the file because of a permissions error and has to do a lot of convoluted things to get the file deleted (and once the file is deleted, the author of the newsletter can re-upload the corrected copy and it works fine).
My IT guy and I are pretty sure that the problem stems from the fact that I'm trying to replace the file while IIS is actively serving it to users (which I thought of, and thought that I had coded against).
The code that runs the replacement is as follows:
Protected Sub ReplaceLatestNewsletter()
    Dim dr As DataRow
    Dim sFile As String
    Dim mFileLock As Mutex
    Try
        If Me.Archives.Rows.Count > 0 Then
            dr = Me.Archives.Rows(0)
            sFile = dr("File").ToString
            If dr("Path").ToString.Length > 0 Then
                mFileLock = New Mutex(True, "MyMutexToPreventReadsOnOverwrite")
                Try
                    mFileLock.WaitOne()
                    System.IO.File.Delete(dr("Path").ToString)
                Catch ex As Exception
                    lblErrs.Text = ex.ToString
                Finally
                    mFileLock.ReleaseMutex()
                End Try
            End If
            fuNewsletter.PostedFile.SaveAs(Server.MapPath("~/Newsletter/archives/" & sFile))
        End If
    Catch ex As Exception
        lblErrs.Text = ex.ToString
    End Try
    dr = Nothing
    sFile = Nothing
    mFileLock = Nothing
End Sub
I thought the Mutex would take care of this (although after re-reading the documentation I'm not sure I can actually use it like I'm trying to). Other comments on the code above:
Me.Archives is a DataTable stored in ViewState
dr("File").ToString is the filename (no path)
dr("Path").ToString is the full local machine path and filename (i.e., 'C:\App_Root\Newsletters\archives\20120214.pdf')
The filenames of the newsletters are set to "YYYYMMDD.pdf" where YYYYMMDD is the date (formatted) of the upload.
In any case, I'm pretty sure that the code above is not establishing an exclusive lock on the file so that the file can be overwritten safely.
Ultimately, I would like to make sure that the following happens:
If IIS is currently serving the file, wait until IIS has finished serving it.
Before IIS can serve the file again, establish an exclusive lock on the file so that no other process, thread, user (etc.) can read from or write to the file.
Either delete the file entirely and write a new file to replace it or overwrite the existing file with the new content.
Remove the exclusive lock so that users can access the file again.
Suggestions?
Also, can I use a Mutex to get a mutually exclusive lock on a file in the Windows filesystem?
Thank you in advance for your assistance and advice.
EDIT:
The way that the links for the newsletter are generated is based on the physical filename. The method used is:
Get all PDF files in the "archives" directory. For each file:
Parse the date of publication from the filename.
Store the date, the path to the file, the filename, and a URL to each file in a DataRow in a DataTable
Sort the DataTable by date (descending).
Output the first row as the current issue.
Output all subsequent rows as "archives" organized by year and month.
UPDATE:
Since I was not able to discern when all existing requests for the file have completed, I took a closer look at the first part of @Justin's answer ("your mutex will only have an effect if the process that reads from the file also obtains the same mutex").
This led me to Configure IIS7 to serve static content through the ASP.NET Runtime and the linked article in the accepted answer.
To that end, I have implemented a handler for all PDF files which uses New Mutex(True, "MyMutexToPreventReadsOnOverwrite") to ensure that only one thread is doing something with the PDF at any given time.
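A rough sketch of what such a handler can look like (an IHttpHandler mapped to *.pdf), reusing the same mutex name as the upload code so reads and the delete/overwrite can never overlap; this is a reconstruction, not the exact handler:

using System.Threading;
using System.Web;

public class PdfHandler : IHttpHandler
{
    public bool IsReusable
    {
        get { return true; }
    }

    public void ProcessRequest(HttpContext context)
    {
        // Same named mutex as ReplaceLatestNewsletter, so a read can never
        // overlap the delete/overwrite in the upload code.
        using (Mutex fileLock = new Mutex(false, "MyMutexToPreventReadsOnOverwrite"))
        {
            fileLock.WaitOne();
            try
            {
                context.Response.ContentType = "application/pdf";
                context.Response.TransmitFile(context.Request.PhysicalPath);
            }
            finally
            {
                fileLock.ReleaseMutex();
            }
        }
    }
}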
Thank you for your answer, @Justin. While I did not wind up using the implementation you suggested, your answer pointed me towards an acceptable solution.
Your mutex will only have an effect if the process that reads from the file also obtains the same mutex. What is the method used to serve up the file? Is ASP.Net used or is this just a static file?
My workflow would be a little different:
Write the new newsletter to a new file
Have IIS start serving up the new file instead of the old one for the given Newsletter url
Delete the old file once all existing requests for that file have completed
This requires no locking and also means that we don't need to wait for requests for the current file to complete (something which could potentially take an indefinite amount of time if people keep making new requests). The only interesting bit is step 2, which will depend on how the file is served; the easiest way would probably be to either set up an HTTP redirect or use URL rewriting.
HTTP Redirect
An HTTP redirect is where the server tells the client to look in a different place when it gets a request for a given resource, so that the browser URL is automatically updated to match the new location. For example, if the user requested http://server/20120221.pdf then they could be automatically redirected to another URL such as http://server/20120221_v2.pdf (the URL shown in the browser would change; however, the URL they need to type in would not).
You can do this in IIS 7 using the httpRedirect configuration element, for example:
<configuration>
  <system.webServer>
    <httpRedirect enabled="true" exactDestination="true" httpResponseStatus="Found">
      <!-- Note that I needed to add a * for IIS to accept the wildcard even though it isn't used in this case -->
      <add wildcard="*20120221.pdf" destination="20120221_v2.pdf" />
    </httpRedirect>
  </system.webServer>
</configuration>
The linked page shows how to change these settings from ASP.NET.
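If you want to flip the redirect from code, a rough sketch using the Microsoft.Web.Administration API (requires IIS administrative rights; the site and file names are illustrative):

using Microsoft.Web.Administration;

public static class RedirectUpdater
{
    public static void PointAtNewVersion(string siteName, string oldName, string newName)
    {
        using (ServerManager serverManager = new ServerManager())
        {
            // Open the site's web.config and the httpRedirect section.
            Configuration config = serverManager.GetWebConfiguration(siteName);
            ConfigurationSection redirect = config.GetSection("system.webServer/httpRedirect");
            redirect["enabled"] = true;
            redirect["exactDestination"] = true;

            // Add a wildcard entry mapping the old name to the new file.
            ConfigurationElementCollection entries = redirect.GetCollection();
            ConfigurationElement entry = entries.CreateElement("add");
            entry["wildcard"] = "*" + oldName;
            entry["destination"] = newName;
            entries.Add(entry);

            serverManager.CommitChanges();
        }
    }
}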
URL Rewriting
Alternatively, IIS can be set up to automatically serve the content of a different file for a given URL without the client (the browser) ever knowing the difference. This is called URL rewriting and can be done in IIS using something like this; however, it requires that additional components be installed into IIS.
Using a HTTP Redirect is probably the easiest method.
