AWSSDK S3 .NET question: my multipart upload crashes after 15 mins of transfer - C#

I'm uploading rather a lot of data (30 GB+) across thousands of files. The whole process takes a while, but I've been finding that, consistently after 15 minutes of transfers, the upload process fails and I get errors for each file that is currently being transferred (I'm doing it multithreaded, so there are multiple uploads at once). The error I'm getting is "error: Amazon.S3.AmazonS3Exception: The difference between the request time and the current time is too large. ---> Amazon.Runtime.Internal.HttpErrorResponseException: The remote server returned an error: (403) Forbidden. ---> System.Net.WebException: The remote server returned an error: (403) Forbidden."
Seeing as it's exactly 15 minutes from the start of the whole process that this crashes, I think the client may be timing out; however, I've set my client's timeout to 45 minutes, I think:
{
    var client = new AmazonS3Client(new AmazonS3Config()
    {
        RegionEndpoint = RegionEndpoint.EUWest2,
        UseAccelerateEndpoint = true,
        Timeout = TimeSpan.FromMinutes(45),
        ReadWriteTimeout = TimeSpan.FromMinutes(45),
        RetryMode = RequestRetryMode.Standard,
        MaxErrorRetry = 10
    });
    Parallel.ForEach(srcObjList, async srcObj =>
    {
        try
        {
            var putObjectRequest = new PutObjectRequest();
            putObjectRequest.BucketName = destBucket;
            putObjectRequest.Key = srcObj.Key;
            putObjectRequest.FilePath = filePathString;
            putObjectRequest.CannedACL = S3CannedACL.PublicRead;
            var uploadTask = client.PutObjectAsync(putObjectRequest);
            lock (threadLock)
            {
                syncTasks.Add(uploadTask);
            }
            await uploadTask;
        }
        catch (Exception e)
        {
            Debug.LogError($"Copy task ({srcObj.Key}) failed with error: {e}");
            throw;
        }
    });
    try
    {
        await Task.WhenAll(syncTasks.Where(x => x != null).ToArray());
    }
    catch (Exception e)
    {
        Debug.LogError($"Upload encountered an issue: {e}");
    }
});
await transferOperations;
Debug.Log("Done!");

The documentation doesn't specify the maximum timeout value, but given that you're seeing 15 minutes exactly, it stands to reason there is some upper limit to the timeout value, either a hard limit or something in the S3 bucket's settings.
This answer suggests a clock synchronization difference might also be the cause, but then I'd wonder why the transfer starts at all.
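If clock synchronization does turn out to be the culprit (the "request time vs. current time" wording points that way), the .NET SDK can be asked to compensate for local clock drift. A minimal sketch, assuming your AWSSDK version exposes AWSConfigs.CorrectForClockSkew (worth verifying against the SDK docs for your version):
using System;
using Amazon;
using Amazon.S3;

// Sketch only: enable the SDK's clock-skew compensation before creating the client.
// AWSConfigs.CorrectForClockSkew is assumed to be available in your SDK version;
// it lets the SDK adjust request signing times when it detects skew.
AWSConfigs.CorrectForClockSkew = true;

var client = new AmazonS3Client(new AmazonS3Config()
{
    RegionEndpoint = RegionEndpoint.EUWest2,
    UseAccelerateEndpoint = true,
    Timeout = TimeSpan.FromMinutes(45),
    ReadWriteTimeout = TimeSpan.FromMinutes(45)
});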

Related

Azure Function SSH.Net Socket read operation has timed out

I'm attempting to connect from a timed Azure Function to a third-party SFTP server that I have access to but do not control. My function runs successfully locally when using the Azure Functions emulator; however, I receive an exception ("Socket read operation has timed out after 30000 milliseconds.") when attempting to run it in Azure.
Is there anything from a networking perspective I need to do to allow/set up outbound SFTP connections, or does anyone see anything wrong with my code below?
var ftpHost = Environment.GetEnvironmentVariable("SFTP:Server");
var ftpUser = Environment.GetEnvironmentVariable("SFTP:User");
var ftpPass = Environment.GetEnvironmentVariable("SFTP:Password");
var ftpDirectory = Environment.GetEnvironmentVariable("SFTP:WorkingDirectory");
log.Info($"Connecting to {ftpHost}"); // This outputs the correct values I would expect from my app settings
using (var sftp = new SftpClient(ftpHost, ftpUser, ftpPass))
{
    sftp.Connect(); // This throws the exception
    log.Info("Connected");
    var files = sftp.ListDirectory(ftpDirectory);
    log.Info("Directory listing successful");
    var exceptions = new List<Exception>();
    foreach (var file in files.Where(f => f.IsRegularFile))
    {
        try
        {
            log.Info($"{file.FullName} - {file.LastWriteTimeUtc}");
            var records = Process(sftp, file);
            log.Info($"Parsed {records.Count} records");
            sftp.DeleteFile(file.FullName);
            log.Info($"Deleted {file.FullName}");
        }
        catch (Exception ex)
        {
            exceptions.Add(ex);
        }
    }
    if (exceptions.Any())
    {
        throw new AggregateException(exceptions);
    }
}
Edit
I did leave my failing code out there, and the failures appear to be intermittent. Running every 15 minutes, I have a roughly 50% success rate: of the last 20 attempts, 10 have succeeded.
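Not an answer from the original thread, just a hedged suggestion: since the failures are intermittent, it may be worth raising SSH.NET's connect and operation timeouts and retrying the connect a couple of times before failing the run. A rough sketch (the timeout values, retry count, and delays are arbitrary):
using System;
using System.Threading;
using Renci.SshNet;
using Renci.SshNet.Common;

using (var sftp = new SftpClient(ftpHost, ftpUser, ftpPass))
{
    // Loosen SSH.NET's timeouts; the defaults can be tight for a slow remote host.
    sftp.ConnectionInfo.Timeout = TimeSpan.FromSeconds(60); // connection/handshake timeout
    sftp.OperationTimeout = TimeSpan.FromSeconds(60);       // per-operation timeout
    sftp.KeepAliveInterval = TimeSpan.FromSeconds(30);

    // Retry the connect a few times before letting the function run fail.
    const int maxAttempts = 3;
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            sftp.Connect();
            break;
        }
        catch (SshOperationTimeoutException) when (attempt < maxAttempts)
        {
            Thread.Sleep(TimeSpan.FromSeconds(5 * attempt));
        }
    }

    // ... the existing ListDirectory / Process / DeleteFile logic goes here ...
}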

Setting file permissions in batch

I'm getting the following error when executing a batch of file-permission changes. I'm not sure what to make of it, since it's only a problem for some of the files, and the issue doesn't occur when running in debug mode:
500 >> Internal Error. User message: "An internal error has occurred which prevented the sharing of these item(s): Example File.DOCX"
I'm using the following code:
var batch = new Google.Apis.Requests.BatchRequest(service);
Google.Apis.Requests.BatchRequest.OnResponse<Permission> callback = delegate (
    Permission permission,
    Google.Apis.Requests.RequestError error,
    int index,
    System.Net.Http.HttpResponseMessage message) {
        if (error != null) {
            // Handle error
            Console.WriteLine("File PERMISSION Error: " + error.Code + " >> " + error.Message);
        } else {
            Console.WriteLine("File Permission ID: " + permission.Id);
        }
    };
Permission filePermission = new Permission()
{
    EmailAddress = "test-email#gmail.com"
    , Type = GoogleDriveRoleType
    , Role = GoogleDriveRole
};
var permExec = service.Permissions.Create(filePermission, googleDriveObjectId);
permExec.SendNotificationEmail = false;
permExec.Fields = "id";
batch.Queue(permExec, callback);
await batch.ExecuteAsync();
This code is within a method that's public static async Task MyMethod(...).
You may want to make your batch requests smaller. You might be experiencing the 500 internal error because you are flooding the server with too many requests per second. As stated in this related SO post, as the server handles your batch request it is not smart enough to slow itself down to avoid the error 500. You can also use exponential backoff and then retry the batch request. As stated in this forum, there is currently no way to know which parts of a batch request failed or succeeded; you have to create your own implementation for that. Hope this helps.
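A rough sketch of the smaller-batches-plus-delay idea (not from the original answer; the batch size, delay, and the Drive v3 types are assumptions you would adapt to your own setup):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Google.Apis.Drive.v3;
using Google.Apis.Drive.v3.Data;

// Sketch: queue the permission requests in smaller batches and pause between
// batches. permissionRequests is a hypothetical list of CreateRequest objects
// built the same way as permExec in the question.
static async Task ExecuteInSmallBatchesAsync(
    DriveService service,
    IList<PermissionsResource.CreateRequest> permissionRequests,
    Google.Apis.Requests.BatchRequest.OnResponse<Permission> callback,
    int batchSize = 10)
{
    var delay = TimeSpan.FromSeconds(1);
    foreach (var chunk in permissionRequests
                 .Select((request, index) => new { request, index })
                 .GroupBy(x => x.index / batchSize, x => x.request))
    {
        var batch = new Google.Apis.Requests.BatchRequest(service);
        foreach (var request in chunk)
        {
            batch.Queue(request, callback);
        }
        await batch.ExecuteAsync();

        // Simple pacing between batches; double the delay if 500s keep appearing.
        await Task.Delay(delay);
    }
}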

FTP client, Unexpected error occurred on a receive occurs twice, then times out indefinitely

I have an FTP client, running as part of a Windows service, that gets information from an FTP server on a scheduled basis. My issue is that sometimes the FTP server is down for planned maintenance. When this happens, my FTP client still calls out on its schedule and fails with the following error:
System.Net.WebException. The underlying connection was closed: An unexpected error occurred on a receive
I get the error above twice. After this, I get the following timeout error every time indefinitely:
System.Net.WebException The operation has timed out
Even with the maintenance window complete, my Windows service will keep timing out when attempting to connect to the FTP server. The only way we can solve the problem is by restarting the Windows service. The following code shows my FTP client code:
var _request = (FtpWebRequest)WebRequest.Create(configuration.Url);
_request.Method = WebRequestMethods.Ftp.DownloadFile;
_request.KeepAlive = false;
_request.Timeout = configuration.RequestTimeoutInMilliseconds;
_request.Proxy = null; // Do NOT use a proxy
_request.Credentials = new NetworkCredential(configuration.UserName, configuration.Password);
_request.ServicePoint.ConnectionLeaseTimeout = configuration.RequestTimeoutInMilliseconds;
_request.ServicePoint.MaxIdleTime = configuration.RequestTimeoutInMilliseconds;
try
{
    using (var _response = (FtpWebResponse)_request.GetResponse())
    using (var _responseStream = _response.GetResponseStream())
    using (var _streamReader = new StreamReader(_responseStream))
    {
        this.c_rateSourceData = _streamReader.ReadToEnd();
    }
}
catch (Exception genericException)
{
    throw genericException;
}
Anyone know what the issue might be?
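One hedged guess, not from the original thread: the indefinite timeouts after the maintenance window could be a dead pooled connection being reused on every scheduled run. A sketch of forcing a fresh connection after a failure, assuming you give the request a ConnectionGroupName ("FtpGroup" here is an arbitrary name, not something from the question):
// Sketch only: name the connection group so it can be torn down on failure.
_request.ConnectionGroupName = "FtpGroup";
try
{
    using (var _response = (FtpWebResponse)_request.GetResponse())
    using (var _responseStream = _response.GetResponseStream())
    using (var _streamReader = new StreamReader(_responseStream))
    {
        this.c_rateSourceData = _streamReader.ReadToEnd();
    }
}
catch (WebException)
{
    // Abort the request and drop the pooled connections in this group so the
    // next scheduled run opens a clean socket instead of reusing a dead one.
    _request.Abort();
    _request.ServicePoint.CloseConnectionGroup("FtpGroup");
    throw;
}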

The connection was closed unexpectedly C# after a long running time

Hi, I was making a crawler for a site. After about 3 hours of crawling, my app stopped on a WebException. Below is my code in C#. client is a predefined WebClient object that is disposed every time gameDoc has been processed; gameDoc is an HtmlDocument object (from HtmlAgilityPack).
while (retrygamedoc)
{
    try
    {
        gameDoc.LoadHtml(client.DownloadString(url)); // this line caused the exception
        retrygamedoc = false;
    }
    catch
    {
        client.Dispose();
        client = new WebClient();
        retrygamedoc = true;
        Thread.Sleep(500);
    }
}
I tried the code below (to keep the WebClient fresh) from this answer:
while (retrygamedoc)
{
    try
    {
        using (WebClient client2 = new WebClient())
        {
            gameDoc.LoadHtml(client2.DownloadString(url)); // this line causes the exception
            retrygamedoc = false;
        }
    }
    catch
    {
        retrygamedoc = true;
        Thread.Sleep(500);
    }
}
but the result is still the same. Then I used StreamReader and the result stays the same! Below is my code using StreamReader:
while (retrygamedoc)
{
    try
    {
        // using native classes to check the result
        HttpWebRequest webreq = (HttpWebRequest)WebRequest.Create(url);
        string responsestring = string.Empty;
        HttpWebResponse response = (HttpWebResponse)webreq.GetResponse(); // this causes the exception
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            responsestring = reader.ReadToEnd();
        }
        gameDoc.LoadHtml(client.DownloadString(url));
        retrygamedoc = false;
    }
    catch
    {
        retrygamedoc = true;
        Thread.Sleep(500);
    }
}
What should I do and check? I am so confused, because I am able to crawl some pages on the same site, and then after about 1000 results it causes the exception. The message from the exception is only "The request was aborted: The connection was closed unexpectedly." and the status is ConnectionClosed.
P.S. The app is a desktop forms app.
Update:
Now I am skipping those values and setting them to null so that the crawling can go on. But if the data is really needed, I still have to update the crawling results manually, which is tiring because the results contain thousands of records. Please help me.
Example:
It is as if you have downloaded about 1300 records from the website, and then the application stops, saying "The request was aborted: The connection was closed unexpectedly.", while your internet connection is still up and at a good speed.
ConnectionClosed may indicate (and probably does) that the server you're downloading from is closing the connection. Perhaps it is noticing a large amount of requests from your client and is denying you additional service.
Since you can't control server-side shenanigans, I'd recommend you have some sort of logic to retry the download a bit later.
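A minimal sketch of that retry-later idea (the attempt count and delays are arbitrary, and DownloadPage is a hypothetical wrapper around the download call in the question):
using System;
using System.Net;
using System.Threading;

// Hypothetical helper: retry a page download with an increasing delay so the
// server gets some breathing room once it starts refusing connections.
static string DownloadPage(string url, int maxAttempts = 5)
{
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            using (var client = new WebClient())
            {
                return client.DownloadString(url);
            }
        }
        catch (WebException) when (attempt < maxAttempts)
        {
            // Back off longer each time: 30s, 60s, 90s, ...
            Thread.Sleep(TimeSpan.FromSeconds(30 * attempt));
        }
    }
}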
I got this error because the response was being returned as a 404 from the server.

UploadValuesAsync response time

I am writing a test harness to test an HTTP POST. The test case sends 8 HTTP requests using UploadValuesAsync in the WebClient class at 10-second intervals: it sleeps 10 seconds after every 8 requests. I am recording the start time and end time of each request. When I compute the average response time, I get around 800 ms. But when I run this test case synchronously, using the UploadValues method in WebClient, I get an average response time of 250 milliseconds. Can you tell me why there is a difference between these two methods? I was expecting a lower response time with async, but I did not get that.
Here is the code that sends 8 requests asynchronously:
var count = 0;
foreach (var nameValueCollection in requestCollections)
{
    count++;
    NameValueCollection collection = nameValueCollection;
    PostToURL(collection, uri);
    if (count % 8 == 0)
    {
        Thread.Sleep(TimeSpan.FromSeconds(10));
        count = 0;
    }
}
UPDATED
Here is the code that sends 8 requests synchronously:
public void PostToURLSync(NameValueCollection collection, Uri uri)
{
    var response = new ServiceResponse
    {
        Response = "Not Started",
        Request = string.Join(";", collection.Cast<string>()
            .Select(col => String.Concat(col, "=", collection[col])).ToArray()),
        ApplicationId = collection["ApplicationId"]
    };
    try
    {
        using (var transportType2 = new DerivedWebClient())
        {
            transportType2.Expect100Continue = false;
            transportType2.Timeout = TimeSpan.FromMilliseconds(2000);
            response.StartTime = DateTime.Now;
            var responeByte = transportType2.UploadValues(uri, "POST", collection);
            response.EndTime = DateTime.Now;
            response.Response = Encoding.Default.GetString(responeByte);
        }
    }
    catch (Exception exception)
    {
        Console.WriteLine(exception.ToString());
    }
    response.ResponseInMs = (int)response.EndTime.Subtract(response.StartTime).TotalMilliseconds;
    responses.Add(response);
    Console.WriteLine(response.ResponseInMs);
}
Here is the code that posts to the HTTP URI:
public void PostToURL(NameValueCollection collection, Uri uri)
{
    var response = new ServiceResponse
    {
        Response = "Not Started",
        Request = string.Join(";", collection.Cast<string>()
            .Select(col => String.Concat(col, "=", collection[col])).ToArray()),
        ApplicationId = collection["ApplicationId"]
    };
    try
    {
        using (var transportType2 = new DerivedWebClient())
        {
            transportType2.Expect100Continue = false;
            transportType2.Timeout = TimeSpan.FromMilliseconds(2000);
            response.StartTime = DateTime.Now;
            transportType2.UploadValuesCompleted += new UploadValuesCompletedEventHandler(transportType2_UploadValuesCompleted);
            transportType2.UploadValuesAsync(uri, "POST", collection, response);
        }
    }
    catch (Exception exception)
    {
        Console.WriteLine(exception.ToString());
    }
}
Here is the upload-completed event handler:
private void transportType2_UploadValuesCompleted(object sender, UploadValuesCompletedEventArgs e)
{
    var now = DateTime.Now;
    var response = (ServiceResponse)e.UserState;
    response.EndTime = now;
    response.ResponseInMs = (int)response.EndTime.Subtract(response.StartTime).TotalMilliseconds;
    Console.WriteLine(response.ResponseInMs);
    if (e.Error != null)
    {
        response.Response = e.Error.ToString();
    }
    else if (e.Result != null && e.Result.Length > 0)
    {
        string downloadedData = Encoding.Default.GetString(e.Result);
        response.Response = downloadedData;
    }
    // Recording response in global variable
    responses.Add(response);
}
One problem you're probably running into is that .NET, by default, will throttle outgoing HTTP connections to the limit (2 concurrent connections per remote host) mandated by the relevant RFC. Assuming 2 concurrent connections and 250 ms per request, the response time for your first 2 requests will be 250 ms, the second 2 will be 500 ms, the third 750 ms, and the last 1000 ms. This yields a 625 ms average response time, which is not far from the 800 ms you're seeing.
To remove the throttling, increase ServicePointManager.DefaultConnectionLimit to the maximum number of concurrent connections you want to support, and you should see your average response time go down a lot.
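For instance, something along these lines before any requests are issued (the limit value here is arbitrary):
using System.Net;

// Raise the per-host connection limit before creating any WebClient instances.
// The default of 2 forces most of the 8 concurrent uploads to queue.
ServicePointManager.DefaultConnectionLimit = 16; // arbitrary; use >= your concurrent request count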
A secondary problem may be that the server itself is slower handling multiple concurrent connections than handling one request at a time. Even once you remove the throttling problem above, I'd expect each of the async requests to, on average, execute somewhat slower than if the server were only executing one request at a time. How much slower depends on how well the server is optimized for concurrent requests.
A final problem may be caused by test methodology. For example, if your test client is simulating a browser session by storing cookies and re-sending them with each request, that may run into problems with servers that serialize requests from a single user. This is often a simplification for server apps so they don't have to deal with locking cross-request state like session state. If you're running into this problem, make sure that each WebClient sends different cookies to simulate different users.
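If that last point applies, one rough way to give each simulated user its own cookie jar is a WebClient subclass along these lines (a sketch; your DerivedWebClient may already do something similar):
using System;
using System.Net;

// Sketch: each instance carries its own CookieContainer, so concurrent clients
// look like separate users to servers that serialize requests per session.
public class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = base.GetWebRequest(address);
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.CookieContainer = Cookies;
        }
        return request;
    }
}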
I'm not saying that you're running into all three of these problems; you might only be running into 1 or 2, but these are the most likely culprits for the behavior you're seeing.
As Justin said, I tried ServicePointManager.DefaultConnectionLimit, but that did not fix the issue. I was not able to reproduce the other problems suggested by Justin, and I am not sure how I would reproduce them in the first place.
What I did was run the same piece of code on a peer machine, and there it performed with the response times I expected. The difference between the two machines is the operating system: mine is running Windows Server 2003 and the other machine is running Windows Server 2008.
As it worked on the other machine, I suspect it might be one of the problems specified by Justin, or server settings on 2003, or something else. I did not spend much time after that digging into this issue; as this is a test harness, it had low priority and we had no time to take it further.
As I have no clue what exactly fixed it, I am not accepting any answer other than this one, because at the very least I know that switching to Server 2008 fixed this issue.
