Hangfire: huge latency when using MySQL

I'm using Hangfire with a MySQL backend. When I enqueue a simple job:
var jobId = BackgroundJob.Enqueue(
    () => Debug.WriteLine("Test"));
I see a delay of 5-10 seconds before it runs, even though I've set the polling interval to 1 second:
app.UseHangfireServer(new BackgroundJobServerOptions()
{
    SchedulePollingInterval = TimeSpan.FromSeconds(1),
    ServerCheckInterval = TimeSpan.FromSeconds(1),
    HeartbeatInterval = TimeSpan.FromSeconds(1)
});
My setup is as simple as possible: server and client are on the same machine, and the queue is empty. I was expecting delays of no more than 1 second.
I do not plan to use distributed servers. Can I somehow force the server to start a task immediately? I assume that if I switched to in-memory storage for Hangfire it would start tasks immediately. I found https://github.com/perrich/Hangfire.MemoryStorage, but it states that it should not be used in production. What are my options for reducing latency?

Related

Entity framework transient failure and command timeout

I get sporadic transient errors when connecting from my Docker container to the database on the hosting machine.
For this reason, I configured my DB context to retry on failure. This solved the problem only partially: the client application still fails, because a retried request can take up to 2 minutes (30 seconds on average).
This is too long and makes for a bad user experience. I was trying to understand why it takes so long, and the only explanation I can think of is that the timeout before the connection is declared failed is too long. I thought of making the command timeout smaller (by default it is 30 seconds, if I am not wrong), maybe 2-3 seconds, since most of my queries take less than 30 ms. But I don't know if this would create other problems.
When checking my logs I discovered that the problem doesn't lie in the retry logic, because it retries straight after the failure. What takes so long is the failure response itself.
This is my current configuration.
builder.Services.AddDbContext<AuthDbContext>(options =>
{
    options.UseNpgsql(EnvironmentVariables.GetEnvironmentVariable(EnvironmentVariables.DB_AUTH), conf =>
    {
        conf.EnableRetryOnFailure(5, TimeSpan.FromSeconds(5), new List<string> { "4060" });
        conf.CommandTimeout(2); //This is the command timeout that I want to add.
    });
    options.LogTo(
        filter: (eventId, level) => eventId.Id == CoreEventId.ExecutionStrategyRetrying,
        logger: (eventData) =>
        {
            var retryEventData = eventData as ExecutionStrategyEventData;
            var exceptions = retryEventData.ExceptionsEncountered;
            Log.Information("TRANSIENT ERROR Retry #{attemptNumber} with delay {delayMs} due to error: {errorMessage}", exceptions.Count, retryEventData.Delay, exceptions.Last().Message);
        });
});

There are a few points that you may want to consider:
I am not sure which of the two timeouts you experienced; see "What is the difference between SqlCommand.CommandTimeout and SqlConnection.ConnectionTimeout?"
Setting the connection timeout to 3 seconds will make errors more frequent. I suggest not making it less than 15 seconds.
I recently learned that EF Core doesn't consider timeouts to be transient errors, so EnableRetryOnFailure doesn't retry after a timeout. You can use a custom strategy if you want to retry on timeout, e.g. as in https://github.com/dotnet/efcore/issues/27826#issuecomment-1177641624
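To illustrate the last point without depending on EF Core itself, here is a minimal, library-free sketch of the idea of treating a timeout as retryable. It assumes the provider surfaces a timeout as a TimeoutException somewhere in the exception chain, which you should verify for your own provider. The names TransientErrors, IsTimeout and RetryOnTimeoutAsync are made up for this example; in EF Core a predicate like IsTimeout would typically back the ShouldRetryOn override of a custom execution strategy rather than a hand-written loop.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of EF Core or Npgsql.
public static class TransientErrors
{
    // Returns true if a TimeoutException appears anywhere in the exception chain.
    // Assumption: the provider reports timeouts this way.
    public static bool IsTimeout(Exception ex)
    {
        for (var e = ex; e != null; e = e.InnerException)
        {
            if (e is TimeoutException) return true;
        }
        return false;
    }

    // Runs the operation, retrying up to maxRetries times when it fails with a timeout.
    // Any other exception, or a timeout after the last retry, propagates to the caller.
    public static async Task<T> RetryOnTimeoutAsync<T>(
        Func<Task<T>> operation, int maxRetries, TimeSpan delay)
    {
        for (var attempt = 0; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex) when (attempt < maxRetries && IsTimeout(ex))
            {
                await Task.Delay(delay);
            }
        }
    }
}
```

Keeping the retry count and the delay small matters here, because every retried timeout adds the full timeout duration to the time the user waits.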

Hangfire persist local scope

I would like to use Hangfire to create long-running fire-and-forget tasks. If the web server dies and the background job is retried, I would like it to pick up where it left off.
In the example below, let's say that foo.RetryCount reaches 3 -> server restarts -> Hangfire reruns the job. In this case I would like to run the task only 7 more times (based on MaxAttemps), instead of restarting from zero.
I thought Hangfire persisted the arguments passed to the method in their current state, but as far as I can tell they are reset.
var foo = new Foo { RetryCount = 0, MaxAttemps = 10 };
BackgroundJob.Enqueue(() => RequestAndRetryOnFailure(foo));
void RequestAndRetryOnFailure(Foo foo)
{
// make request to server, if fail, wait for a
// while and try again later if not foo.MaxAttemps is reached
foo.RetryCount++;
}

I use Hangfire extensively for a lot of different actions and constantly need to reschedule jobs that started but couldn't complete due to certain constraints.
The persistence you are referring to applies to the serialized version of the job as it was enqueued; changes made to the arguments while the job executes are not kept.
What I would recommend is to schedule a new job to run after a certain delay if the server is not available. This also means the job is picked up again if it is scheduled and Hangfire restarts.
var foo = new Foo { RetryCount = 0, MaxAttemps = 10 };
BackgroundJob.Enqueue(() => RequestAndRetryOnFailure(foo));

void RequestAndRetryOnFailure(Foo foo)
{
    // Make the request to the server. TrySendRequest is a hypothetical
    // placeholder for the real call; it returns false on failure.
    bool requestFailed = !TrySendRequest();
    if (requestFailed)
    {
        foo.RetryCount++;
        // Schedule a new job with the updated counter. The arguments are
        // serialized when the new job is created, so RetryCount is preserved.
        if (foo.RetryCount < foo.MaxAttemps)
            BackgroundJob.Schedule(() => RequestAndRetryOnFailure(foo), TimeSpan.FromSeconds(30));
        // Otherwise MaxAttemps has been reached: give up and do nothing.
    }
}

Speed up reverse DNS lookups for large batch of IPs

For analytics purposes, I'd like to perform reverse DNS lookups on large batches of IPs. "Large" meaning, at least tens of thousands per hour. I'm looking for ways to increase the processing rate, i.e. lower the processing time per batch.
Wrapping the async version of Dns.GetHostEntry into awaitable tasks has already helped a lot (compared to sequential requests), leading to a throughput of approx. 100-200 IPs/second:
static async Task DoReverseDnsLookups()
{
    // in reality, thousands of IPs
    var ips = new[] { "173.194.121.9", "173.252.110.27", "98.138.253.109" };
    var hosts = new Dictionary<string, string>();
    var tasks =
        ips.Select(
            ip =>
                Task.Factory.FromAsync(Dns.BeginGetHostEntry,
                    (Func<IAsyncResult, IPHostEntry>) Dns.EndGetHostEntry,
                    ip, null)
                    .ContinueWith(t =>
                        hosts[ip] = ((t.Exception == null) && (t.Result != null))
                            ? t.Result.HostName : null));
    var start = DateTime.UtcNow;
    await Task.WhenAll(tasks);
    var end = DateTime.UtcNow;
    Console.WriteLine("Resolved {0} IPs in {1}, that's {2}/sec.",
        ips.Count(), end - start,
        ips.Count() / (end - start).TotalSeconds);
}
Any ideas how to further improve the processing rate?
For instance, is there any way to send a batch of IPs to the DNS server?
Btw, I'm assuming that under the covers, I/O Completion Ports are used by the async methods - correct me if I'm wrong please.

Hello, here are some tips that may help:
Cache the results locally, since this information usually doesn't change for days or even years. This way you don't have to resolve the same address every time.
Most DNS servers cache the information automatically, so the next lookup will resolve pretty fast. The cache lifetime is usually 4 hours; at least that is the default on Windows servers. This means that running the process as one batch in a short period will perform better than resolving the addresses several times during the day and letting the cache expire in between.
It is good that you are using task parallelism, but you are still asking the same DNS servers that are configured on your machine. I think that using two machines with different DNS servers would speed up the process.
I hope this helps.

As always, I would suggest using TPL Dataflow's ActionBlock instead of firing all requests at once and waiting for all to complete. Using an ActionBlock with a high MaxDegreeOfParallelism lets the TPL decide for itself how many calls to fire concurrently, which can lead to a better utilization of resources:
var block = new ActionBlock<string>(
    async ip =>
    {
        try
        {
            var host = (await Dns.GetHostEntryAsync(ip)).HostName;
            if (!string.IsNullOrWhiteSpace(host))
            {
                hosts[ip] = host;
            }
        }
        catch
        {
            // Ignore IPs that fail to resolve.
            return;
        }
    },
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5000 });
I would also suggest adding a cache, and making sure you don't resolve the same ip more than once.
When you use .NET's Dns class, it includes some fallbacks besides DNS (e.g. LLMNR), which can make it very slow. If all you need are DNS queries, you might want to use a dedicated library like ARSoft.Tools.Net.
P.S: Some remarks about your code sample:
You should be using GetHostEntryAsync instead of FromAsync
The continuation can potentially run on different threads so you should really be using ConcurrentDictionary.
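The caching and ConcurrentDictionary suggestions above could be combined roughly as follows. This is only a sketch: the class name CachingReverseDns and its injectable resolver delegate are assumptions made for illustration, and failed lookups are cached as an empty string, which may or may not be what you want.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical helper: resolves each IP at most once and is safe to call from many threads.
public sealed class CachingReverseDns
{
    private readonly ConcurrentDictionary<string, Lazy<Task<string>>> _cache =
        new ConcurrentDictionary<string, Lazy<Task<string>>>();
    private readonly Func<string, Task<string>> _resolve;

    // Default: resolve through the system resolver.
    public CachingReverseDns() : this(ResolveWithDnsAsync) { }

    // The resolver can be replaced, e.g. by a dedicated DNS library or a test stub.
    public CachingReverseDns(Func<string, Task<string>> resolve)
    {
        _resolve = resolve;
    }

    // Lazy ensures that the resolver runs only once per IP, even when
    // several callers ask for the same address at the same time.
    public Task<string> GetHostNameAsync(string ip)
    {
        return _cache.GetOrAdd(ip, key => new Lazy<Task<string>>(() => _resolve(key))).Value;
    }

    private static async Task<string> ResolveWithDnsAsync(string ip)
    {
        try
        {
            return (await Dns.GetHostEntryAsync(ip)).HostName;
        }
        catch (SocketException)
        {
            return string.Empty; // lookup failed; the empty result is cached too
        }
    }
}
```

A batch can then be resolved with await Task.WhenAll(ips.Select(dns.GetHostNameAsync)), or the same method can be called from inside the ActionBlock shown earlier.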

Boosting performance on async web calls

Background: I must call a web service 1500 times, and each call takes roughly 1.3 seconds to complete. (I have no control over this 3rd party API.) Total time = 1500 * 1.3 = 1950 seconds, or roughly 32 minutes.
I came up with what I thought was a good solution; however, it did not pan out that well.
I changed the calls to async web calls, thinking this would dramatically improve my results. It did not.
Example Code:
Pre-Optimizations:
foreach (var elmKeyDataElementNamed in findResponse.Keys)
{
    var getRequest = new ElementMasterGetRequest
    {
        Key = new elmFullKey
        {
            CmpCode = CodaServiceSettings.CompanyCode,
            Code = elmKeyDataElementNamed.Code,
            Level = filterLevel
        }
    };
    ElementMasterGetResponse getResponse;
    _elementMasterServiceClient.Get(new MasterOptions(), getRequest, out getResponse);
    elementList.Add(new CodaElement { Element = getResponse.Element, SearchCode = filterCode });
}
With Optimizations:
var tasks = findResponse.Keys.Select(elmKeyDataElementNamed => new ElementMasterGetRequest
{
    Key = new elmFullKey
    {
        CmpCode = CodaServiceSettings.CompanyCode,
        Code = elmKeyDataElementNamed.Code,
        Level = filterLevel
    }
}).Select(getRequest => _elementMasterServiceClient.GetAsync(new MasterOptions(), getRequest)).ToList();
Task.WaitAll(tasks.ToArray());
elementList.AddRange(tasks.Select(p => new CodaElement
{
    Element = p.Result.GetResponse.Element,
    SearchCode = filterCode
}));
Smaller Sampling Example:
To test this more easily I took a smaller sample of 40 records. Without optimizations it took 60 seconds; with the optimizations it took 50 seconds. I would have thought it would be closer to 30 seconds or better.
I used Wireshark to watch the transactions come through and realized that the async version was not sending requests as fast as I had assumed it would.
[Image: Wireshark capture of the async requests]
[Image: Wireshark capture of the normal, unoptimized requests]
You can see that the async version pushes a few requests out very fast and then drops off...
Also note that nearly 3 seconds passed between requests 10 and 11.
Is the overhead of creating threads for the tasks so high that it takes seconds?
Note: The tasks I am referring to are from the .NET 4.5 TAP task library.
Why wouldn't the requests come through faster than that?
I was told the Apache web server I was hitting could handle a maximum of 200 threads, so I don't see an issue there.
Am I not thinking about this clearly?
When calling web services, is there little advantage to async requests?
Do I have a mistake in my code?
Any ideas would be great.

After many days of searching I found this post, which solved my problem:
Trying to run multiple HTTP requests in parallel, but being limited by Windows (registry)
The reason the requests were not hitting the server more quickly was my client-side code; it had nothing to do with the server. By default, .NET allows only 2 concurrent connections per host.
See here: http://msdn.microsoft.com/en-us/library/system.net.servicepointmanager.defaultconnectionlimit.aspx
I simply added this line of code, and then all requests came through within milliseconds.
System.Net.ServicePointManager.DefaultConnectionLimit = 50;

azure queue performance

For Windows Azure queues, the scalability target per storage account is supposed to be around 500 messages / second (http://msdn.microsoft.com/en-us/library/windowsazure/hh697709.aspx). I have the following simple program that just writes a few messages to a queue. The program takes 10 seconds to complete (4 messages / second). I am running the program from inside a virtual machine (in West Europe), and my storage account is also located in West Europe. I haven't set up geo-replication for my storage. My connection string is set up to use the http protocol.
// http://blogs.msdn.com/b/windowsazurestorage/archive/2010/06/25/nagle-s-algorithm-is-not-friendly-towards-small-requests.aspx
ServicePointManager.UseNagleAlgorithm = false;
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(ConfigurationManager.AppSettings["DataConnectionString"]);
var cloudQueueClient = storageAccount.CreateCloudQueueClient();
var queue = cloudQueueClient.GetQueueReference(Guid.NewGuid().ToString());
queue.CreateIfNotExist();
var w = new Stopwatch();
w.Start();
for (int i = 0; i < 50; i++)
{
    Console.WriteLine("nr {0}", i);
    queue.AddMessage(new CloudQueueMessage("hello " + i));
}
w.Stop();
Console.WriteLine("elapsed: {0}", w.ElapsedMilliseconds);
queue.Delete();
Any idea how I can get better performance?
EDIT:
Based on Sandrino Di Mattia's answer I re-analyzed the code I've originally posted and found out that it was not complete enough to reproduce the error. In fact I had created a queue just before the call to ServicePointManager.UseNagleAlgorithm = false; The code to reproduce my problem looks more like this:
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(ConfigurationManager.AppSettings["DataConnectionString"]);
var cloudQueueClient = storageAccount.CreateCloudQueueClient();
var queue = cloudQueueClient.GetQueueReference(Guid.NewGuid().ToString());
//ServicePointManager.UseNagleAlgorithm = false; // If you disable the Nagle algorithm here, the performance will be okay.
queue.CreateIfNotExist();
ServicePointManager.UseNagleAlgorithm = false; // TOO LATE: the service point for the queue was already created with Nagle still enabled
var w = new Stopwatch();
w.Start();
for (int i = 0; i < 50; i++)
{
    Console.WriteLine("nr {0}", i);
    queue.AddMessage(new CloudQueueMessage("hello " + i));
}
w.Stop();
Console.WriteLine("elapsed: {0}", w.ElapsedMilliseconds);
queue.Delete();
The suggested solution from Sandrino to configure the ServicePointManager using the app.config file has the advantage that the ServicePointManager is initialized when the application starts up, so you don't have to worry about time dependencies.

I answered a similar question a few days ago: How to achive more 10 inserts per second with azure storage tables.
For adding 1000 items in table storage it took over 3 minutes, and with the changes I described in my answer it dropped to 4 seconds (250 requests/sec). In the end, table storage and storage queues aren't all that different. The backend is the same, data is simply stored in a different way. And both table storage and queues are exposed through a REST API, so if you improve the way you handle your requests, you'll get a better performance.
The most important changes:
expect100Continue: false
useNagleAlgorithm: false (you're already doing this)
Parallel requests combined with connectionManagement/maxconnection
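For readers who prefer code over app.config, the settings in this list map onto static properties of ServicePointManager. The sketch below is an assumption about how one might group them, not the configuration from the linked answer; the class name StorageClientTuning and the maxConnections parameter are made up for illustration. It should run once at application startup, before the first request creates a service point. These are the classic .NET Framework settings discussed in this thread, and newer HttpClient-based stacks may ignore them.

```csharp
using System.Net;

// Hypothetical startup helper for the settings listed above.
public static class StorageClientTuning
{
    public static void Apply(int maxConnections)
    {
        // Skip the "Expect: 100-continue" handshake before sending a request body.
        ServicePointManager.Expect100Continue = false;

        // Send small requests immediately instead of letting Nagle's algorithm buffer them.
        ServicePointManager.UseNagleAlgorithm = false;

        // Allow more than the default number of concurrent connections per host.
        ServicePointManager.DefaultConnectionLimit = maxConnections;
    }
}
```

Calling StorageClientTuning.Apply(48) as the first statement in Main would be one way to use it; the right connection limit depends on your workload and is something to measure rather than guess.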

Also, ServicePointManager.DefaultConnectionLimit should be increased before a service point is created. Sandrino's answer actually says the same thing, but using config.
Turn off proxy auto-detection (the auto-detect setting in the proxy config element), even in the cloud; it slows initialisation.
Choose well-distributed partition keys.
Co-locate your account near your compute and your customers.
Design to add more accounts as needed.
Microsoft set the SLA at 2,000 tps on queues and tables as of 07 2012.
I didn't read Sandrino's linked answer, sorry; I was just on this question because I was watching a Build 2012 session on exactly this.
