Azure Table Storage QueryAll(), Improve Throughput - C#

I have some data (approximately 5 million items in 1,500 tables, about 10 GB) in Azure tables. The entities can be large and contain some serialized binary data in the protobuf format.
I have to process all of them and transform them into another structure. This processing is not thread-safe. I also process some data from a MongoDB replica set using the same code (the MongoDB is hosted in another datacenter).
For debugging purposes I log the throughput and realized that it is very low. With MongoDB I get a throughput of 5,000 items/sec, with Azure Table Storage only 30 items per second.
To improve the performance I tried to use TPL Dataflow, but it doesn't help:
public async Task QueryAllAsync(Action<StoredConnectionSetModel> handler)
{
    List<CloudTable> tables = await QueryAllTablesAsync(companies, minDate);
    // The handler is not thread-safe, so it runs on a single worker.
    var handlerBlock = new ActionBlock<StoredConnectionSetModel>(handler,
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 });
    // Up to 20 tables are downloaded concurrently.
    var downloaderBlock = new ActionBlock<CloudTable>(
        x => QueryTableAsync(x, s => handlerBlock.Post(s)),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 20 });
    foreach (CloudTable table in tables)
    {
        downloaderBlock.Post(table);
    }
    downloaderBlock.Complete();
    await downloaderBlock.Completion;
    handlerBlock.Complete();
    await handlerBlock.Completion;
}
private static async Task QueryTableAsync(CloudTable table, Action<StoredConnectionSetModel> handler)
{
    var query = new TableQuery<AzureTableEntity<StoredConnectionSetModel>>();
    TableContinuationToken token = null;
    do
    {
        TableQuerySegment<AzureTableEntity<StoredConnectionSetModel>> segment =
            await table.ExecuteQuerySegmentedAsync(query, token);
        foreach (var entity in segment.Results)
        {
            handler(entity.Entity);
        }
        token = segment.ContinuationToken;
    }
    while (token != null);
}
I run the batch process on my local machine (with a 100 Mbit connection) and in Azure (as a worker role), and it is very strange that the throughput on my machine is higher (100 items/sec) than in Azure. I max out my internet connection locally, but the worker role should not have this 100 Mbit limitation, I hope.
How can I increase the throughput? I have no idea what is going wrong here.
EDIT: I realized that I was wrong about the 30 items per second. It is often higher (100/sec), depending on the size of the items, I guess. According to the documentation (http://azure.microsoft.com/en-us/documentation/articles/storage-performance-checklist/#subheading10) there is a limit:
The scalability limit for accessing tables is up to 20,000 entities (1 KB each) per second for an account. That is only about 19 MB/sec, which is not very impressive if you keep in mind that there are also the normal requests from the production system. I will probably test using multiple storage accounts.
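To make that idea concrete, once the tables are split across accounts the reading code could pick the right client per table roughly like this (a sketch; the connection strings and the hash-based assignment are assumptions, not my current setup):
// Requires Microsoft.WindowsAzure.Storage, Microsoft.WindowsAzure.Storage.Table and System.Linq.
string[] connectionStrings = { "account1-connection-string", "account2-connection-string" }; // placeholders
List<CloudTableClient> clients = connectionStrings
    .Select(cs => CloudStorageAccount.Parse(cs).CreateCloudTableClient())
    .ToList();
// Hash the table name so a given table always maps to the same account.
string tableName = "connections0042"; // hypothetical table name
CloudTableClient client = clients[Math.Abs(tableName.GetHashCode()) % clients.Count];
CloudTable table = client.GetTableReference(tableName);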
EDIT #2: I made two separate tests, starting with a list of 500 keys [1...500] (pseudo code):
Test #1: Old approach (Table 1)
foreach (key1 in keys)
    foreach (key2 in keys)
        insert new Entity { partitionKey = key1, rowKey = key2 }
Test #2: New approach (Table 2)
numPartitions = 100
foreach (key1 in keys)
    foreach (key2 in keys)
        insert new Entity { partitionKey = (key1 + key2).GetHashCode() % numPartitions, rowKey = key1 + key2 }
Each entity gets another property with 10 KB of random text data.
Then I made the query tests. In the first case I just query all entities from Table 1 in one thread (sequential).
In the next test I create one task per partition key and query all entities from Table 2 (parallel). I know that the test is not that good, because in my production environment I have a lot more partitions than only 500 per table, but that doesn't matter. At least the second attempt should perform well.
It makes no difference. My max throughput is 600 entities/sec, varying between 200 and 400 most of the time. The documentation says that I can query 20,000 entities/sec (with 1 KB each), so I should get at least 1,500 or so on average, I think. I tested it on a machine with a 500 Mbit internet connection and I only reached about 30 Mbit, so bandwidth should not be the problem.
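For reference, the parallel part of Test #2 was roughly equivalent to the following (a sketch; table2 stands for the CloudTable of Table 2, and the 100 partition-key strings are assumed to be "0".."99"):
// One query task per partition key; requires Microsoft.WindowsAzure.Storage.Table and System.Linq.
var tasks = Enumerable.Range(0, 100).Select(async partition =>
{
    var query = new TableQuery<DynamicTableEntity>().Where(
        TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partition.ToString()));
    TableContinuationToken token = null;
    int count = 0;
    do
    {
        var segment = await table2.ExecuteQuerySegmentedAsync(query, token);
        count += segment.Results.Count;
        token = segment.ContinuationToken;
    }
    while (token != null);
    return count;
});
int total = (await Task.WhenAll(tasks)).Sum();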

You should also check out the Table Storage Design Guide. Hope this helps.

Related

Why is this eating memory?

I wrote an application whose purpose is to read logs from a large table (90 million rows) and process them into easily understandable stats: how many, how long, etc.
The first run took 7.5 hours and only had to process 27 million of the 90 million. I would like to speed this up, so I am trying to run the queries in parallel. But when I run the code below, I crash within a couple of minutes with an OutOfMemoryException.
Environments:
Sync
Test: 26 applications, 15 million logs, 5 million retrieved, < 20 MB, takes 20 seconds
Production: 56 applications, 90 million logs, 27 million retrieved, < 30 MB, takes 7.5 hours
Async
Test: 26 applications, 15 million logs, 5 million retrieved, < 20 MB, takes 3 seconds
Production: 56 applications, 90 million logs, 27 million retrieved, OutOfMemoryException
public void Run()
{
    List<Application> apps;
    //Query for apps
    using (var ctx = new MyContext())
    {
        apps = ctx.Applications.Where(x => x.Type == "TypeIWant").ToList();
    }
    var tasks = new Task[apps.Count];
    for (int i = 0; i < apps.Count; i++)
    {
        var app = apps[i];
        tasks[i] = Task.Run(() => Process(app));
    }
    //try catch
    Task.WaitAll(tasks);
}
public void Process(Application app)
{
    //Query for logs for time period
    using (var ctx = new MyContext())
    {
        var logs = ctx.Logs.Where(l => l.Id == app.Id).AsNoTracking();
        foreach (var log in logs)
        {
            Interlocked.Increment(ref _totalLogsRead);
            var l = log;
            Task.Run(() => ProcessLog(l, app.Id));
        }
    }
}
Is it ill-advised to create 56 contexts?
Do I need to dispose of and re-create contexts after a certain number of logs have been retrieved?
Perhaps I'm misunderstanding how the IQueryable is working? <-- My guess
My understanding is that it will retrieve logs as needed; is the foreach loop effectively like a yield? Or is my issue that 56 'threads' are calling the database and I am storing 27 million logs in memory?
Side question
The results don't really scale together. Based on the test environment results I would expect production to take only a few minutes. I assume the increase is directly related to the number of records in the table.
With 27 million rows the problem is one of stream processing, not parallel execution. You need to approach the problem as you would with SQL Server's SSIS or any other ETL tool: each processing step is a transformation that processes its input and sends its output to the next step.
Parallel processing is achieved by using a separate thread to run each step. Some steps could also use multiple threads to process multiple inputs, up to a limit. Setting limits on each step's thread count and input buffer ensures you can achieve maximum throughput without flooding your machine with waiting tasks.
.NET's TPL Dataflow addresses exactly this scenario. It provides blocks to transform inputs to outputs (TransformBlock), split collections into individual messages (TransformManyBlock), execute actions without transformations (ActionBlock), combine data in batches (BatchBlock), etc.
You can also specify the maximum degree of parallelism for each step so that, e.g., you have only one log query executing at a time but use 10 tasks for log processing.
In your case, you could:
Start with a TransformManyBlock that receives an application type and returns a list of app IDs.
Have a TransformBlock read the logs for a specific ID and send them downstream.
Have an ActionBlock process the batch.
Step #3 could be broken into several further steps. E.g., if you don't need to process all of an app's log entries together, you can use a step that processes individual entries. Or you could first group them by date.
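A minimal sketch of that pipeline, assuming hypothetical GetAppIds, ReadLogs and ProcessBatch helpers standing in for your EF queries and per-app processing (LogEntry is a placeholder type; requires System.Threading.Tasks.Dataflow):
// step 1: application type -> app IDs
var getAppIds = new TransformManyBlock<string, int>(appType => GetAppIds(appType));
// step 2: app ID -> that app's log entries (one DB query at a time)
var readLogs = new TransformBlock<int, IList<LogEntry>>(appId => ReadLogs(appId),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1, BoundedCapacity = 4 });
// step 3: process one app's batch, up to 10 batches in parallel
var processLogs = new ActionBlock<IList<LogEntry>>(batch => ProcessBatch(batch),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10, BoundedCapacity = 4 });
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
getAppIds.LinkTo(readLogs, linkOptions);
readLogs.LinkTo(processLogs, linkOptions);
getAppIds.Post("TypeIWant");
getAppIds.Complete();
await processLogs.Completion;
The bounded capacities are what keep memory in check: a step that runs ahead of its consumer simply blocks instead of buffering millions of entries.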
Another option is to create a custom block that reads data from the database using a DbDataReader and posts each entry to the next step immediately, instead of waiting for all rows to return. This would allow you to process each entry as it arrives, instead of waiting to receive all entries.
If each app log contains many entries, this could be a huge memory and time saver.
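A rough sketch of that kind of reader, assuming a SQL Server backend (the connection string, SQL text, column layout and the LogEntry type are placeholders):
// Streams rows to a dataflow target as they arrive, instead of materializing the whole result set.
// Requires System.Data.SqlClient and System.Threading.Tasks.Dataflow.
static async Task StreamLogsAsync(int appId, ITargetBlock<LogEntry> target, string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("SELECT Id, Message, CreatedAt FROM Logs WHERE AppId = @appId", connection))
    {
        command.Parameters.AddWithValue("@appId", appId);
        await connection.OpenAsync();
        using (var reader = await command.ExecuteReaderAsync())
        {
            while (await reader.ReadAsync())
            {
                var entry = new LogEntry
                {
                    Id = reader.GetInt32(0),
                    Message = reader.GetString(1),
                    CreatedAt = reader.GetDateTime(2)
                };
                // SendAsync honors the target's BoundedCapacity, so the reader slows down
                // instead of piling rows up in memory.
                await target.SendAsync(entry);
            }
        }
    }
}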

Speed up reverse DNS lookups for large batch of IPs

For analytics purposes, I'd like to perform reverse DNS lookups on large batches of IPs. "Large" meaning at least tens of thousands per hour. I'm looking for ways to increase the processing rate, i.e. lower the processing time per batch.
Wrapping the async version of Dns.GetHostEntry into awaitable tasks has already helped a lot (compared to sequential requests), leading to a throughput of approx. 100-200 IPs/second:
static async Task DoReverseDnsLookups()
{
    // in reality, thousands of IPs
    var ips = new[] { "173.194.121.9", "173.252.110.27", "98.138.253.109" };
    var hosts = new Dictionary<string, string>();
    var tasks =
        ips.Select(
            ip =>
                Task.Factory.FromAsync(Dns.BeginGetHostEntry,
                                       (Func<IAsyncResult, IPHostEntry>)Dns.EndGetHostEntry,
                                       ip, null)
                    .ContinueWith(t =>
                        hosts[ip] = ((t.Exception == null) && (t.Result != null))
                            ? t.Result.HostName : null));
    var start = DateTime.UtcNow;
    await Task.WhenAll(tasks);
    var end = DateTime.UtcNow;
    Console.WriteLine("Resolved {0} IPs in {1}, that's {2}/sec.",
        ips.Count(), end - start,
        ips.Count() / (end - start).TotalSeconds);
}
Any ideas how to further improve the processing rate?
For instance, is there any way to send a batch of IPs to the DNS server?
Btw, I'm assuming that under the covers, I/O Completion Ports are used by the async methods - correct me if I'm wrong please.
Hello, here are some tips so you can improve:
Cache the queries locally, since this information doesn't usually change for days or even years. That way you don't have to resolve every time.
Most DNS servers will automatically cache the information, so the next time it will resolve pretty fast. Usually the cache lasts 4 hours; at least that is the default on Windows servers. This means that if you run this process as one batch over a short period, it will perform better than if you resolve the addresses several times during the day, letting the cache expire.
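A simple way to do the local caching, assuming a long-running process where an in-memory cache is enough (a sketch; ResolveCachedAsync is a hypothetical helper, not part of the original code):
// using System.Collections.Concurrent; using System.Net; using System.Net.Sockets; using System.Threading.Tasks;
static readonly ConcurrentDictionary<string, Task<string>> Cache =
    new ConcurrentDictionary<string, Task<string>>();

static Task<string> ResolveCachedAsync(string ip)
{
    // Each IP is resolved at most once per process lifetime; later callers reuse the same task.
    return Cache.GetOrAdd(ip, async key =>
    {
        try
        {
            return (await Dns.GetHostEntryAsync(key)).HostName;
        }
        catch (SocketException)
        {
            return null; // lookup failed
        }
    });
}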
It is good that you are using task parallelism, but you are still asking the same DNS servers configured on your machine. I think that having two machines using different DNS servers would improve the process.
I hope this helps.
As always, I would suggest using TPL Dataflow's ActionBlock instead of firing all requests at once and waiting for all to complete. Using an ActionBlock with a high MaxDegreeOfParallelism lets the TPL decide for itself how many calls to fire concurrently, which can lead to a better utilization of resources:
var block = new ActionBlock<string>(
    async ip =>
    {
        try
        {
            var host = (await Dns.GetHostEntryAsync(ip)).HostName;
            if (!string.IsNullOrWhiteSpace(host))
            {
                hosts[ip] = host;
            }
        }
        catch
        {
            // ignore failed lookups
        }
    },
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5000 });
I would also suggest adding a cache and making sure you don't resolve the same IP more than once.
When you use .NET's Dns class, it includes some fallbacks besides DNS (e.g. LLMNR), which makes it very slow. If all you need are DNS queries, you might want to use a dedicated library like ARSoft.Tools.Net.
P.S.: Some remarks about your code sample:
You should be using GetHostEntryAsync instead of FromAsync.
The continuation can potentially run on different threads, so you should really be using a ConcurrentDictionary.
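Putting those two remarks together, the lookup could look something like this (a sketch, not a drop-in replacement for your method):
// using System.Collections.Concurrent; using System.Linq; using System.Net; using System.Net.Sockets; using System.Threading.Tasks;
var hosts = new ConcurrentDictionary<string, string>();
var tasks = ips.Select(async ip =>
{
    try
    {
        // GetHostEntryAsync replaces the Begin/End + FromAsync pattern.
        var entry = await Dns.GetHostEntryAsync(ip);
        hosts[ip] = entry.HostName; // safe to write from multiple threads
    }
    catch (SocketException)
    {
        hosts[ip] = null; // lookup failed
    }
});
await Task.WhenAll(tasks);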

Takes a long time to insert rows into the database - data parallel method

I insert data from terminals into an Access database through a web service, like this:
using (Conn = new OleDbConnection(Work_Connect))
{
    Conn.Open();
    foreach (DataRow R in ds.Tables["MyCount"].Rows)
    {
        U_TermNum = TermNum;
        U_Id = Id;
        U_Bar = R["Bar"].ToString().Trim();
        U_Qty = R["Qty"].ToString().Trim();
        U_Des = R["Des"].ToString().Trim();
        U_UserName = UserName;
        U_UserID = UserID;
        SQL = "INSERT INTO MyTbl (ID,Bar,Qty,TermNum,Des,UserName,UserID) VALUES (@A,@B,@C,@D,@E,@F,@G)";
        using (OleDbCommand Cmd4 = new OleDbCommand(SQL, Conn))
        {
            Cmd4.Parameters.AddWithValue("@A", Convert.ToInt32(U_Id));
            Cmd4.Parameters.AddWithValue("@B", U_Bar);
            Cmd4.Parameters.AddWithValue("@C", Convert.ToDouble(U_Qty));
            Cmd4.Parameters.AddWithValue("@D", U_TermNum);
            Cmd4.Parameters.AddWithValue("@E", U_Des);
            Cmd4.Parameters.AddWithValue("@F", U_UserName);
            Cmd4.Parameters.AddWithValue("@G", U_UserID);
            Cmd4.ExecuteNonQuery();
        }
    }
}
I try to send from 20 terminals.
If I send them one at a time --> send 1, wait 10 seconds, send 2, wait 10 seconds, ... --> it works very fast and all terminals finish sending within 1 minute.
But if I send from all of them in parallel at once --> it works very slowly and all terminals finish after 6 minutes.
Why? And how can I change my code so that I can send in parallel and everything finishes quickly?
Now I have noticed that not all rows were inserted into the database (when I try to send them all in at once).
How do I deal with this problem?
If you find that your application is bogging down under load, then using an Access back-end database might not be the right choice for your situation. Specifically:
ACE/Jet (Access) databases are generally not recommended for use with web applications, where the number of concurrent connections can vary greatly and web traffic can "spike" the level of activity well above ACE/Jet's "comfort zone".
Informal discussions among Access developers tend to consider ~10 concurrent users as the point where an Access application will start to slow down, and ~25 concurrent users is often cited as the practical limit. These are very general guidelines, of course, and some Access applications can handle many more concurrent users depending on their usage patterns (e.g., mostly lookups with occasional inserts and updates).
So, if your application will regularly have ~20 concurrent connections hammering INSERTs into the database as fast as they can, then you should consider switching your database back-end to a server-based product that is better suited to that type of activity.

azure queue performance

For Windows Azure queues, the scalability target per storage account is supposed to be around 500 messages/second (http://msdn.microsoft.com/en-us/library/windowsazure/hh697709.aspx). I have the following simple program that just writes a few messages to a queue. The program takes 10 seconds to complete (4 messages/second). I am running the program from inside a virtual machine (in West Europe) and my storage account is also located in West Europe. I don't have geo-replication set up for my storage. My connection string is set up to use the HTTP protocol.
// http://blogs.msdn.com/b/windowsazurestorage/archive/2010/06/25/nagle-s-algorithm-is-not-friendly-towards-small-requests.aspx
ServicePointManager.UseNagleAlgorithm = false;
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(ConfigurationManager.AppSettings["DataConnectionString"]);
var cloudQueueClient = storageAccount.CreateCloudQueueClient();
var queue = cloudQueueClient.GetQueueReference(Guid.NewGuid().ToString());
queue.CreateIfNotExist();
var w = new Stopwatch();
w.Start();
for (int i = 0; i < 50; i++)
{
    Console.WriteLine("nr {0}", i);
    queue.AddMessage(new CloudQueueMessage("hello " + i));
}
w.Stop();
Console.WriteLine("elapsed: {0}", w.ElapsedMilliseconds);
queue.Delete();
Any idea how I can get better performance?
EDIT:
Based on Sandrino Di Mattia's answer I re-analyzed the code I originally posted and found that it was not complete enough to reproduce the problem. In fact, I had created a queue just before the call to ServicePointManager.UseNagleAlgorithm = false; the code to reproduce my problem looks more like this:
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(ConfigurationManager.AppSettings["DataConnectionString"]);
var cloudQueueClient = storageAccount.CreateCloudQueueClient();
var queue = cloudQueueClient.GetQueueReference(Guid.NewGuid().ToString());
//ServicePointManager.UseNagleAlgorithm = false; // If you change the Nagle algorithm here, the performance will be okay.
queue.CreateIfNotExist();
ServicePointManager.UseNagleAlgorithm = false; // TOO LATE, the queue was already created without 'nagle'
var w = new Stopwatch();
w.Start();
for (int i = 0; i < 50; i++)
{
    Console.WriteLine("nr {0}", i);
    queue.AddMessage(new CloudQueueMessage("hello " + i));
}
w.Stop();
Console.WriteLine("elapsed: {0}", w.ElapsedMilliseconds);
queue.Delete();
The suggested solution from Sandrino to configure the ServicePointManager via the app.config file has the advantage that the ServicePointManager is initialized when the application starts up, so you don't have to worry about timing dependencies.
I answered a similar question a few days ago: How to achieve more than 10 inserts per second with Azure storage tables.
Adding 1,000 items to table storage took over 3 minutes, and with the changes I described in my answer it dropped to 4 seconds (250 requests/sec). In the end, table storage and storage queues aren't all that different. The backend is the same; the data is simply stored in a different way. And both table storage and queues are exposed through a REST API, so if you improve the way you handle your requests, you'll get better performance.
The most important changes:
expect100Continue: false
useNagleAlgorithm: false (you're already doing this)
Parallel requests combined with connectionManagement/maxconnection
Also, ServicePointManager.DefaultConnectionLimit should be increased before the service point is created. Sandrino's answer says the same thing, but via config.
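In code (instead of app.config) those settings might look roughly like this; they must run before the first request creates the service point, and the connection limit of 100 is just an example value:
// Run at startup, before any request to the storage endpoint is made.
ServicePointManager.Expect100Continue = false;    // skip the 100-Continue handshake
ServicePointManager.UseNagleAlgorithm = false;    // don't batch small requests
ServicePointManager.DefaultConnectionLimit = 100; // allow more parallel connections (example value)
var account = CloudStorageAccount.Parse(ConfigurationManager.AppSettings["DataConnectionString"]);
var queueClient = account.CreateCloudQueueClient();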
Turn off proxy detection, even in the cloud (the autodetect setting in the proxy config element); it slows down initialization.
Choose well-distributed partition keys.
Co-locate your storage account with your compute and your customers.
Design so you can add more storage accounts as needed.
Microsoft set the scalability target at 2,000 tps on queues and tables as of July 2012.
I didn't read Sandrino's linked answer, sorry; I just came across this question while watching the Build 2012 session on exactly this topic.

Bad performance of mass insertion to Redis DB with Sider .NET client

I need to insert about one million key-value pairs into a Redis DB. I have a Redis server instance on the same computer as my C# application. I use the Sider client to connect to Redis. All settings are defaults. The following code takes 4 seconds to execute:
redis_client.Pipeline(c =>
{
    for (int i = 0; i < 1000; ++i)
    {
        Console.Write("\r" + i);
        string key = "aaaaaaaaaaa" + i;
        string value = "bbbbbbbbbb";
        c.Set(key, value);
    }
});
I tried both the normal and the pipelined method of insertion. The standard Redis benchmark shows similar results. The CPU and HDD are not a problem; they are more than sufficient for mass insertions into other databases. The official Redis benchmark page mentions the possibility of ~100,000 SET operations per second. I get fewer than 1,000... What's the problem?
