EDIT: Following the answers so far, I would like to narrow the focus: I am looking for a database that allows in-memory writes (possibly from simple C# code) with options to persist to storage, so that the data can be accessed from within R. Redis looks the most promising so far. I am also considering something like Lockfree++ or ZeroMQ to avoid writing to the database concurrently: all data to be persisted would be sent over a message bus (or similar), and a single "actor" would handle all write operations to an in-memory db or other solution. Some mentioned SQLite, whose performance I still need to test. Any other suggestions besides Redis?
I am searching for the ideal database structure/solution that meets most of my requirements below, but so far I have utterly failed. Can you please help?
My tasks: I run a process in .NET 4.5 (C#) and generate (mostly) value types that I want to use for further analysis in other applications, so I would like to either keep them in memory or persist them on disk. More below. The data is generated within different tasks/threads, so a row-based data format does not match the situation well (data generated in different threads arrives at different times and is thus not aligned). I therefore thought a columnar data structure might be suitable, but please correct me if I am wrong.
Example:
Task/Thread #1 generates the following data at given time stamps
datetime.ticks / value of output data
1000000001 233.23
1000000002 233.34
1000000006 234.23
...
Task/Thread #2 generates the following data at given time stamps
datetime.ticks / value of output data
1000000002 33.32
1000000005 34.34
1000000015 54.32
...
I do not need to align the time stamps at .NET run time; first and foremost I want to preserve the data and process it within R or Python at a later point.
My requirements:
Fast writes, fast writes, fast writes: I may generate 100,000 to 1,000,000 data points per second and need to persist them (worst case) or retain them in memory. It is OK to run the writes on their own thread, so the write process can lag behind the data generation process, but the limit is 16 GB of RAM (64-bit code); more below.
Preference is for columnar db format as it lends itself well to how I want to query the data later but I am open to any other structure if it makes sense in regards to the examples above (document/key-value also ok if all other requirements are met, especially in terms of write speed).
API that can be referenced from within .NET. Example: HDF5 may be considered capable by some, but I find their .NET port horrible. Something that supports .NET a little better would be a plus, but if all other requirements are met then I can deal with something similar to the HDF5 .NET port.
Concurrent writes if possible: As described earlier, I would like to write data concurrently from different tasks/threads.
I am constrained by 16gb memory (run .Net process in 64bit) and thus I probably look for something that is not purely in-memory as I may sometimes generate more data than that. Something in-memory which persists at times or a pure persistence model is probably preferable.
Preference for embedded but if a server in a client/server solution can run as a windows service then no issue.
In terms of data access I have a strong preference for a db solution for which interfaces from R and Python already exist, because I would like to use the pandas library within Python for time series alignment and other analysis, and run analyses within R.
If the API/library supports in addition SQL/SQL-like/Linq/ like queries that would be terrific but generally I just need the absolute bare bones such as load columnar data in between start and end date (given the "key"/index is in such format) because I analyze and run queries within R/Python.
If it comes with a management console or data visualizer that would be a plus but not a must.
Should be open source or priced within "reach" (no, KDB does not qualify in that regard ;-) )
OK, here is what I have so far, and again it is all I have, because most db solutions simply fail already on the write performance requirement:
Infobright and Db4o. I like what I read so far but I admit I have not checked into any performance stats
Something done myself. I can easily store value types in binary format and index the data by datetime.ticks; I would just need to somehow write scripts to load/deserialize the data in Python/R. But it would be a massive task if I wanted to add concurrency, a query engine, and other goodies. Thus I am looking for something already out there.
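For the "something done myself" option, the Python side of loading such a binary file can be quite small. The following is only a sketch under an assumed record layout (a little-endian 64-bit ticks value followed by one 64-bit float, 16 bytes per record); the layout, the function names and the file name are illustrative and not taken from the post.

```python
import struct
from pathlib import Path

# Assumed record layout (not from the original post): little-endian
# int64 ticks followed by one float64 value, 16 bytes per record.
RECORD = struct.Struct("<qd")


def write_records(path, rows):
    """Append (ticks, value) pairs to a binary file."""
    with open(path, "ab") as f:
        for ticks, value in rows:
            f.write(RECORD.pack(ticks, value))


def read_records(path, start_ticks, end_ticks):
    """Return the (ticks, value) pairs with start_ticks <= ticks <= end_ticks."""
    data = Path(path).read_bytes()
    return [
        (ticks, value)
        for ticks, value in RECORD.iter_unpack(data)
        if start_ticks <= ticks <= end_ticks
    ]
```

Calling read_records("thread1.bin", 1000000002, 1000000006) would return the rows in that tick range. This scans the whole file; a real index over datetime.ticks, concurrency and a query engine are exactly the parts that remain to be built.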
I can't comment -- low rep (I'm new here) -- so you get a full answer instead...
First, are you sure you need a database at all? If fast write speed and portability to R are your biggest concerns, have you considered a plain flat file mechanism? According to your comments you're willing to batch writes out but you need persistence; if those were my requirements I'd write a straight-to-disk buffering system that was lightning fast, then build a separate task that periodically took the disk files and moved them into a data store for R, and that's only if R reading the flat files wasn't sufficient in the first place.
If you can do alignment after-the-fact, then you could write the threads to separate files in your main parallel loop, cutting each file off every so often, and leave the alignment and database loading to the subprocess.
So (in crappy pseudo_code), build a thread process that you'd call with backgroundworker or some such and include a threadname string uniquely identifying each worker and thus each filestream (task/thread):
file_name = threadname + '0001.csv' // or something
open(file_name for writing)
while(generating_data) {
    generate_data()
    while (buffer_not_full and very_busy) {
        write_data_to_buffer
        generate_data()
    }
    flush_buffer_to_disk(file_name)
    if(file is big enough or enough time has passed or we're not too busy) {
        close(file_name)
        move(file_name to bob's folder)
        increment file_name
        open(file_name for writing)
    }
}
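A rough Python rendering of the same idea, as a sketch rather than a tuned implementation: rows are buffered in memory, flushed to a per-thread CSV file, and the file is moved to a hand-off folder once it is large enough. The class name, the thresholds and the folder arguments are hypothetical, and rotation here is driven by row count only (not elapsed time or load).

```python
import os
import shutil


class RotatingWriter:
    """Buffer rows in memory, flush them to a per-thread CSV file, and move
    finished files into a hand-off folder for a separate loader."""

    def __init__(self, thread_name, work_dir, done_dir,
                 buffer_rows=10_000, max_file_rows=1_000_000):
        self.thread_name = thread_name
        self.work_dir = work_dir
        self.done_dir = done_dir          # "bob's folder" in the pseudocode
        self.buffer_rows = buffer_rows
        self.max_file_rows = max_file_rows
        self.buffer = []
        self.file_index = 1
        self.rows_in_file = 0

    def _path(self):
        name = f"{self.thread_name}{self.file_index:04d}.csv"
        return os.path.join(self.work_dir, name)

    def write(self, ticks, value):
        self.buffer.append(f"{ticks},{value}\n")
        if len(self.buffer) >= self.buffer_rows:
            self.flush()

    def flush(self):
        """Write buffered rows to disk and rotate if the file is big enough."""
        if self.buffer:
            with open(self._path(), "a") as f:
                f.writelines(self.buffer)
            self.rows_in_file += len(self.buffer)
            self.buffer.clear()
        if self.rows_in_file >= self.max_file_rows:
            self.rotate()

    def rotate(self):
        """Hand the current file to the loader and start the next one."""
        if self.rows_in_file:
            shutil.move(self._path(), self.done_dir)
            self.file_index += 1
            self.rows_in_file = 0

    def close(self):
        self.flush()
        self.rotate()
```

Each worker thread would own one RotatingWriter, so no locking is needed between writers; only complete files ever appear in the hand-off folder.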
Efficient and speedy file I/O and buffering is a straightforward and common problem. Nothing is going to be faster than this. Then you can just write another process to do the database loads and not sweat the performance there:
while(file_name in list of files in bob's folder sorted by date for good measure)
{
    read bob's file
    load bob's file to database
    align dates, make pretty
}
And I wouldn't write that part in C#, I'd batch script it and use the database's native loader which is going to be as fast as anything you can build from scratch.
You'll have to make sure the two loops don't interfere much if you're running on the same hardware. That is, run the task threads at a higher priority, or build in some mutex or performance limiters so that the database load doesn't hog resources while the threads are running. I'd definitely segregate the database server and hardware so that file I/O to the flat files isn't compromised.
FIFO queues would work if you're on Unix, but you're not. :-)
Also, hardware is going to have more of a performance impact for you than the database engine, I'd imagine. If you're on a budget I'm guessing you're on COTS hardware, so springing for a solid state drive may up performance fairly cheaply. As I said, separating the DB storage from the flat file storage would help, and the CPU/RAM for R, the Database, and your Threads should all be segregated ideally.
What I'm saying is that choice of DB vendor probably isn't your biggest issue, unless you have a lot of money to spend. You'll be hardware bound most of the time otherwise. Database tuning is an art, and while you can eke out minor performance gains at the top end, having a good database administrator will keep most databases in the same ballpark for performance. I'd look at what R and Python support well and that you're comfortable with. If you think in columnar fashion then look at R and C#'s support for Cassandra (my vote), Hana, Lucid, HBase, Infobright, Vertica and others and pick one based on price and support. For traditional databases on a single commodity machine, I haven't seen anything that MySQL can't handle.
This is not to answer my own question but to keep track of all databases which I have tested so far and why they have not met my requirements (yet). Each time I attempted to write 1 million single objects (1 long, 2 floats) to the database. For ooDBs, I stuck the objects into a collection and wrote the collection itself; similar story for key/value stores such as Redis. I also attempted to write simple ints (1 million) to columnar dbs such as InfoBright.
Db4o, awfully slow writes: 1 million objects within a collection took about 45 seconds. I later optimized the collection structure and also wrote each object individually; not much love here.
InfoBright: Same thing, very slow in terms of write speed, which surprised me quite a bit as it organizes data in columnar format but I think the "knowledge tree" only kicks in when querying data rather than when saving flat data structures/tables-like structures.
Redis (through BookSleeve): Great API for .NET with full Redis functionality (though there are a couple of drawbacks to running the server on Windows machines vs. a Linux or Unix box). Performance was very fast: north of 1 million items per second. I serialized all objects using Protocol Buffers (protobuf-net; both it and BookSleeve were written by Marc Gravell). I still need to play a lot more with the library, but R and Python both have full access to the Redis DB, which is a big plus. Love it so far. The async framework that Marc wrote around the Redis base functions is awesome, really neat, and it works so far. I want to spend a little more time experimenting with the Redis list/collection types as well, as so far I have only serialized to byte arrays.
SQLite: I ran purely in-memory and managed to write 1 million value type elements in around 3 seconds. Not bad for a pure RDBMS; obviously the in-memory option really speeds things up. I created only one connection, one transaction, one command, and one parameter, and simply adjusted the value of the parameter within a loop and ran ExecuteNonQuery on each iteration. The transaction commit was then run outside the loop.
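The same pattern (one connection, one parameterized statement, one transaction committed after the loop) looks like this with Python's built-in sqlite3 module. This is only an analogous sketch, not the C# test described above; the table and column names are made up.

```python
import sqlite3


def bulk_insert(rows):
    """Insert (ticks, value) rows into an in-memory SQLite table in one transaction."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (ticks INTEGER, value REAL)")
    with conn:  # one transaction: committed once, after all rows are inserted
        conn.executemany(
            "INSERT INTO samples (ticks, value) VALUES (?, ?)", rows
        )
    return conn


conn = bulk_insert((i, i * 0.5) for i in range(100_000))
count = conn.execute(
    "SELECT COUNT(*) FROM samples WHERE ticks BETWEEN ? AND ?", (10, 19)
).fetchone()[0]
```

Committing per row instead of once at the end is usually what makes SQLite inserts slow, which is why the single transaction matters more than the loop itself.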
HDF5: Though there is a .NET port and there also exists a library to somehow work with HDF5 files out of R, I strongly discourage anyone from doing so. It's a pure nightmare. The .NET port is very badly written; heck, the whole HDF5 concept is more than questionable. It's a very old and, in my opinion, outgrown solution for storing vectorized/columnar data. This is 2012, not 1995. If one cannot completely delete datasets and vectors from the file in which they were stored, then I do not call that an annoyance but a major design flaw. The API in general (not just .NET) is very badly designed and written imho; there are tons of class objects that nobody understands how to use without having spent hours and hours studying the file structure. I think that is somewhat evidenced by the very sparse documentation and example code out there. Furthermore, the h5r R library is a drama, an absolute nightmare. It's badly written as well (often the file is not correctly closed after writing due to a faulty flush, which corrupts files), the library has issues even being properly installed on 32-bit OSs...and it goes on and on. I write the most about HDF5 because I spent most of my time on this piece of .... and ended up with the most frustration. The idea of a fast columnar file storage system, accessible from R and .NET, was enticing, but it just does not deliver what it promised in terms of API integration and usability.
Update: I ditched testing VelocityDB simply because there does not seem to be any adapter available to access the db from within R. I am currently contemplating writing my own GUI with a charting library, which would access the generated data either from a written binary file or have it sent over a broker-less message bus (ZeroMQ) or through LockFree++ to an "actor" (my GUI). I could then call R from within C# and have results returned to my GUI. That would possibly allow me the most flexibility and freedom, but would obviously also be the most tedious to code. I am running into so many limitations during my tests that with each db test I befriend this idea more and more.
RESULT: Thanks for the participation. In the end I awarded the bounty points to Chipmonkey because he suggested partly what I considered important points to the solution to my problem (though I chose my own, different solution in the end).
I ended up with a hybrid between Redis in memory storage and direct calls out of .Net to the R.dll. Redis allows access to its data stored in memory by different processes. This makes it a convenient solution to quickly store the data as key/value in Redis and to then access the same data out of R. Additionally I directly send data and invoke functions in R through its .dll and the excellent R.Net library. Passing a collection of 1 million value types to R takes about 2.3 seconds on my machine which is fast enough given that I get the convenience to just pass in the data, invoke computational functions within R out of the .Net environment and getting the results back sync or async.
Just a note: I once had a similar problem posted by a fellow in a delphi forum. I could help him with a simple ID-key-value database backend I wrote at that time (kind of a NoSQL engine). Basically, it uses a B-Tree to store triplets (32bit ObjectID, 32bit PropertyKey, 64bit Value). I could manage to save about 500k/sec Values in real time (about 5 years ago). Of course, the data was indexed on all three values (ID, property-ID and value). You could optimize this by ignoring the value index.
The source I still have is in Delphi, but I would think about implementing something like that using C#. I cannot tell you whether it will meet your needs for performance, but if all else fails, give it a try. Using a buffered write should also drastically improve performance.
I would combine persistent storage (I personally prefer db4o, but you can use files as well, as mentioned above) with storing objects in memory, this way:
Use BlockingCollection<T> to store objects in memory (I believe you will achieve better performance than 1,000,000/s when storing objects in memory), and then have one or more processing threads that consume the objects and store them in the persistent database:
// Producing thread
for (int i = 0; i < 1000000; i++)
    blockingCollection.Add(myObject);
// Consuming threads
while (true)
{
    var myObject = blockingCollection.Take();
    db4oSession.Store(myObject); // or write it to files or whatever
}
BlockingCollection pretty much solves the producer-consumer workflow, and if you use multiple instances of it together with AddToAny/TakeFromAny you can scale the number of producers and consumers as far as you need.
Each consuming thread could have a different db4o session (file) to reach the desired performance (db4o is single-threaded).
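For readers who will consume the data from Python, the same producer-consumer shape can be sketched with the standard library's queue.Queue, which plays the role BlockingCollection<T> plays above. This is an analogous sketch, not a port; run_pipeline and the store callback are hypothetical names, and the capacity and consumer count are arbitrary.

```python
import queue
import threading


def run_pipeline(items, store, consumers=2, capacity=10_000):
    """Feed items through a bounded queue to consumer threads that call store(item)."""
    q = queue.Queue(maxsize=capacity)  # bounded: producers block instead of exhausting RAM
    stop = object()                    # sentinel telling a consumer to exit

    def consume():
        while True:
            item = q.get()
            if item is stop:
                break
            store(item)                # e.g. write to a file or a database session

    workers = [threading.Thread(target=consume) for _ in range(consumers)]
    for w in workers:
        w.start()
    for item in items:                 # producing side
        q.put(item)
    for _ in workers:                  # one sentinel per consumer
        q.put(stop)
    for w in workers:
        w.join()
```

The bounded capacity is the part that matters for the 16 GB constraint: if the consumers fall behind, the producer blocks on put() rather than letting the backlog grow without limit.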
Since you want to use ZeroMQ why not use memcache over Redis?
ZeroMQ offers no persistence as far as I know. Memcache also offers no persistence and is a bit faster than Redis.
Or perhaps the other way, if you use Redis why not use beanstalk MQ?
If you want to use Redis (for the persistence) you might want to switch from ZeroMQ to beanstalk MQ (also a fast in memory queue, but also has persistence via logging). Beanstalk also has C# libs.
I am at the start of a mid-sized ASP.NET C# project with an application performance requirement of supporting 400+ concurrent users.
What are the things I need to keep in mind while architecting an application to meet such performance and availability standards? Pages need to be served in under 5 seconds. I plan to have the application and database on separate physical machines. From a coding and application layering perspective:
If I have the database layer exposed to the application layer via a
WCF service, will it hamper the performance? Should I use a direct
tcp connection instead?
Will it matter if I am using Entity framework or some other ORM or the enterprise library data block?
Should I log exceptions to database or a text file?
How do I check during development whether the code being built is going to meet those performance standards eventually? Or is this even something I need to worry about at the development stage?
Do I need to put my database connection code and other classes that hold lookup data that rarely change for the life of the application, in static classes so it is available thru the life of the application?
What kind of caching policy should I apply?
What free tools can I use to measure and test performance? I know of red-gate performance measurement tools but that has a high license cost, so free tools are what I'd prefer.
I apologize if this question is too open ended. Any tips or thoughts on how I should proceed?
Thanks for your time.
An important consideration when designing a scalable application is to make it stateless. No sessions. Another important consideration is to cache everything that you can in order to reduce database queries. This cache should be distributed across other machines that are specifically designed to store it. Then all you have to do is add another server when the application starts to run slowly due to increased user load.
As far as your questions about WCF are concerned, you can use WCF, it won't be a bottleneck for your application. It will definitely add an additional layer which will slow things a bit but if you want to expose a reusable layer that can scale independently on its own WCF is great.
ORMs might indeed introduce a performance slowdown in your application. It's more due to the fact that you have less control over the generated SQL queries and thus more difficult to tune them. This doesn't mean that you shouldn't use an ORM. It's just to be careful about what SQL it spits and tune it with your DB admin. There are also lightweight ORMs such as dapper, PetaPoco and Massive that you might consider.
As far as static classes are concerned, they won't improve performance that much compared to instance classes. A class instantiation on the CLR is a pretty fast operation as Ayende explains. Static classes will introduce tight coupling between your data access layer and your consuming layer. So you can forget about static classes for the moment.
For error logging, I would recommend you ELMAH.
For benchmarking there are quite a lot of tools; ApacheBench is one that is simple to use.
There's always a trade-off between developer productivity, maintainability and performance; you can only really make that trade-off sensibly if you can measure. Productivity is measured by how long it takes to get something done; maintainability is harder to measure, but luckily, performance is fairly easy to quantify. In general, I'd say to optimize for productivity and maintainability first, and only optimize for performance if you have a measurable problem.
To work in this way, you need to have performance targets, and a way of regularly assessing the solution against those targets - it's very hard to retro-fit performance into a project. However, optimizing for performance without proven necessity tends to lead to obscure, hard-to-debug software solutions.
Firstly, you need to turn your performance target into numbers you can measure; for web applications, that's typically "dynamic page requests per second". 400 concurrent users probably don't all request pages at exactly the same time - they usually spend some time reading the page, completing forms etc. On the other hand, AJAX-driven sites request a lot more dynamic pages.
Use Excel or something to work from peak concurrent users to dynamic page generations per second based on wait time, requests per interaction, and build in a buffer - I usually over-provision by 50%.
For instance:
400 concurrent users with a session length of 5 interactions and
2 dynamic pages per interaction means 400 * 5 * 2 = 4000 page requests.
With a 30 seconds wait time, those requests will be spread over 30 * 5 = 150 seconds.
Therefore, your average page requests / second is 4000 / 150 = 27 requests / second.
With a 50% buffer, you need to be able to support a peak of roughly 40 requests / second.
That's not trivial, but by no means exceptional.
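The arithmetic above is easy to keep in a small script instead of a spreadsheet, so the inputs can be varied. A minimal sketch; the function and parameter names are made up, and the inputs are the ones from the example.

```python
def required_requests_per_second(users, interactions, pages_per_interaction,
                                 wait_seconds, buffer=0.5):
    """Return (average, buffered peak) dynamic page requests per second."""
    total_requests = users * interactions * pages_per_interaction
    session_seconds = wait_seconds * interactions
    average = total_requests / session_seconds
    return average, average * (1 + buffer)


average, peak = required_requests_per_second(
    users=400, interactions=5, pages_per_interaction=2, wait_seconds=30
)
print(round(average, 1), round(peak))  # 26.7 40
```

Halving the wait time to 15 seconds doubles both numbers, which shows how sensitive the target is to the assumed user behaviour.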
Next, set up a performance testing environment whose characteristics you completely understand and can replicate, and can map to the production environment. I usually don't recommend re-creating production at this stage. Instead, reduce your page generations / second benchmark to match the performance testing environment (e.g. if you have 4 servers in production and only 2 in the performance testing environment, reduce by half).
As soon as you start developing, regularly (at least once a week, ideally every day) deploy your work-in-progress to this testing environment. Use a load test generator (Apache Benchmark or Apache JMeter work for me), write load tests simulating typical user journeys (but without the wait time), and run them against your performance test environment. Measure success by hitting your target "page generations / second" benchmark. If you don't hit the benchmark, work out why (Redgate's ANTS profiler is your friend!).
Once you get closer to the end of the project, try to get a test environment that's closer to the production system in terms of infrastructure. Deploy your work, and re-run your performance tests, increasing the load to reflect the "real" pages / second requirement. At this stage, you should have a good idea of the performance characteristics of the app, so you're really only validating your assumptions. It's usually a lot harder and more expensive to get such a "production-like" environment, and it's usually a lot harder to make changes to the software, so you should use this purely to validate, not to do the regular performance engineering work.
I'm curious how does a typical C# profiler work?
Are there special hooks in the virtual machine?
Is it easy to scan the byte code for function calls and inject calls to start/stop timer?
Or is it really hard and that's why people pay for tools to do this?
(as a side note i find a bit interesting bec it's so rare - google misses the boat completely on the search "how does a c# profiler work?" doesn't work at all - the results are about air conditioners...)
There is a free CLR Profiler by Microsoft, version 4.0.
https://www.microsoft.com/downloads/en/details.aspx?FamilyID=be2d842b-fdce-4600-8d32-a3cf74fda5e1
BTW, there's a nice section in the CLR Profiler doc that describes how it works, in detail, page 103. There's source as part of distro.
Is it easy to scan the byte code for
function calls and inject calls to
start/stop timer?
Or is it really hard and that's why
people pay for tools to do this?
Injecting calls is hard enough that tools are needed to do it.
Not only is it hard, it's a very indirect way to find bottlenecks.
The reason is that a bottleneck is one statement, or a small number of statements, in your code responsible for a good percentage of the time being spent, time that could be reduced significantly; i.e. it is not truly necessary, it is wasteful.
IF you can tell the average inclusive time of one of your routines (including IO time), and IF you can multiply it by how many times it has been called, and divide by the total time, you can tell what percent of time the routine takes.
If the percent is small (like 10%) you probably have bigger problems elsewhere.
If the percent is larger (like 20% to 99%) you could have a bottleneck inside the routine.
So now you have to hunt inside the routine for it, looking at things it calls and how much time they take. Also you want to avoid being confused by recursion (the bugaboo of call graphs).
There are profilers (such as Zoom for Linux, Shark, & others) that work on a different principle.
The principle is that there is a function call stack, and during all the time a routine is responsible for (either doing work or waiting for other routines to do work that it requested) it is on the stack.
So if it is responsible for 50% of the time (say), then that's the amount of time it is on the stack,
regardless of how many times it was called, or how much time it took per call.
Not only is the routine on the stack, but the specific lines of code costing the time are also on the stack.
You don't need to hunt for them.
Another thing you don't need is precision of measurement.
If you took 10,000 stack samples, the guilty lines would be measured at 50 +/- 0.5 percent.
If you took 100 samples, they would be measured as 50 +/- 5 percent.
If you took 10 samples, they would be measured as 50 +/- 16 percent.
In every case you find them, and that is your goal.
(And recursion doesn't matter. All it means is that a given line can appear more than once in a given stack sample.)
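The +/- figures above are what the binomial standard deviation gives for a line that is on the stack 50% of the time. A short sketch to reproduce them (the function name is made up):

```python
import math


def sampling_error_percent(fraction, samples):
    """Standard deviation, in percentage points, of a time fraction
    estimated from the given number of independent stack samples."""
    return 100 * math.sqrt(fraction * (1 - fraction) / samples)


for n in (10_000, 100, 10):
    print(n, round(sampling_error_percent(0.5, n), 1))
# 10000 0.5
# 100 5.0
# 10 15.8
```

Even at 10 samples the estimate is far from zero, which is the point: a line costing half the time is seen on roughly half of the samples no matter how few you take.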
On this subject, there is lots of confusion. At any rate, the profilers that are most effective for finding bottlenecks are the ones that sample the stack, on wall-clock time, and report percent by line. (This is easy to see if certain myths about profiling are put in perspective.)
1) There's no such thing as "typical". People collect profile information by a variety of means: time sampling the PC, inspecting stack traces, capturing execution counts of methods/statements/compiled instructions, inserting probes in code to collect counts and optionally calling contexts to get profile data on a call-context basis. Each of these techniques might be implemented in different ways.
2) There's profiling "C#" and profiling "CLR". In the MS world, you could profile CLR and back-translate CLR instruction locations to C# code. I don't know if Mono uses the same CLR instruction set; if they did not, then you could not use the MS CLR profiler; you'd have to use a Mono IL profiler. Or, you could instrument C# source code to collect the profiling data, and then compile/run/collect that data on either MS, Mono, or somebody's C# compatible custom compiler, or C# running in embedded systems such as WinCE where space is precious and features like CLR-built-ins tend to get left out.
One way to instrument source code is to use source-to-source transformations, to map the code from its initial state to code that contains data-collecting code as well as the original program. This paper on instrumenting code to collect test coverage data shows how a program transformation system can be used to insert test coverage probes by inserting statements that set block-specific boolean flags when a block of code is executed. A counting-profiler substitutes counter-incrementing instructions for those probes. A timing profiler inserts clock-snapshot/delta computations for those probes. Our C# Profiler implements both counting and timing profiling for C# source code both ways; it also collects the call graph data by using more sophisticated probes that collect the execution path. Thus it can produce timing data on call graphs this way. This scheme works anywhere you can get your hands on a halfway decent resolution time value.
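To make the idea of counting and timing probes concrete, here is a small Python sketch. Note the swap in technique: it wraps functions at run time with a decorator rather than rewriting source code, so it only illustrates what the inserted probes record, not how a source-to-source tool inserts them. All names are made up.

```python
import time
from collections import defaultdict
from functools import wraps

call_counts = defaultdict(int)      # counting probe results, per function name
total_seconds = defaultdict(float)  # timing probe results (inclusive time)


def probe(func):
    """Wrap func with a counting probe and a clock-snapshot/delta timing probe."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            total_seconds[func.__name__] += time.perf_counter() - start
    return wrapper


@probe
def work(n):
    return sum(i * i for i in range(n))


work(10)
work(100_000)
print(call_counts["work"], total_seconds["work"])
```

The recorded time is inclusive, so a probed recursive function would count its inner calls more than once; call-graph probes of the kind described above are what address that.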
This is a link to a lengthy article that discusses both instrumentation and sampling methods:
http://smartbear.com/support/articles/aqtime/profiling/
I have been asked to show the benefits and limitations of Parallelism and evaluate it for use within our company. We are predominantly a data orientated business, and essentially load objects from the database, then put them through some business logic, display to the user, then save back to the DB. In my mind, there isn't too much in that pipe line that would benefit from running in parallel, but being fairly new to the concept, I could be completely wrong. Would there be any part of that simple pipe line that would benefit from running in parallel? And are there any guidelines for how to implement this style of programming?
Also, are there any tools (preferably that come with VS2010) that would show where bottle necks occur and would be able to visually show what's going on when I click "Go" on a simple app that runs a given amount of loops (pre-written simple maths loops e.g. for i as integer = 1 to 1000 - do some calculations) in parallel, then in series?
I need to be able to display the difference using a decent profiling tool.
Yes, even from that simple model you could greatly benefit from parallelism.
Say for instance that during a load of your data you're doing something like this:
foreach (var datarow in someDataSet)
{
    // put your data into some business objects here
}
you could optimize this with parallelism by doing something like this:
Parallel.ForEach(someDataSet, datarow =>
{
    // put your data into some business objects here
});
This could greatly increase your performance, depending on how much data you're processing here.
Data rows will now be processed in parallel across multiple threads instead of one after another as in the typical foreach loop, so the loop body must be safe to run concurrently.
My suggestion to you would be to run some simple performance tests on an example as simple as this one and see what kind of results you get. Plot it out in a spreadsheet or something, and show it to your team. You might be surprised by the results you get.
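Such a test needs little more than a timer around both versions. The sketch below shows the shape of the comparison in Python using a thread pool; it is an analogy to the C# loops above, not a measurement of them. build_object is a hypothetical stand-in whose sleep simulates waiting on I/O, and CPU-bound work in Python threads would behave differently.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def build_object(row):
    """Stand-in for mapping one data row to a business object (hypothetical work)."""
    time.sleep(0.01)  # simulates waiting on I/O
    return {"id": row, "value": row * 2}


def timed(label, fn):
    """Run fn once, print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result


rows = range(50)
sequential = timed("sequential", lambda: [build_object(r) for r in rows])
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = timed("parallel", lambda: list(pool.map(build_object, rows)))
```

Running the same comparison at several data sizes and worker counts gives the numbers to plot; for very small loop bodies the scheduling overhead can outweigh the gain.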
You may reap more benefit from implementing a caching layer (distributed or otherwise) than parallelizing your current pipeline.
With a caching layer, the objects you use frequently will reside in the in-memory cache, allowing for much greater read/write performance. There are a number of options for keeping the cache in sync, and these will vary depending on which vendor you choose.
I'd suggest having a look at MemCached and NCache and see if you think they would be a good fit.
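To show what a caching layer does in the load/process/save pipeline, here is a minimal in-process read-through cache in Python. It is a sketch of the concept only; it does not use or imitate the MemCached or NCache APIs, and the class name, the loader callback and the time-to-live value are assumptions.

```python
import time


class ReadThroughCache:
    """Serve repeated reads from memory and fall back to a loader on a miss."""

    def __init__(self, loader, ttl_seconds=60.0):
        self.loader = loader            # e.g. a function that queries the database
        self.ttl_seconds = ttl_seconds
        self.entries = {}               # key -> (expiry time, value)
        self.misses = 0

    def get(self, key):
        entry = self.entries.get(key)
        now = time.monotonic()
        if entry is not None and entry[0] > now:
            return entry[1]             # cache hit: no database round trip
        self.misses += 1
        value = self.loader(key)
        self.entries[key] = (now + self.ttl_seconds, value)
        return value

    def invalidate(self, key):
        """Drop a key after the underlying record has been saved."""
        self.entries.pop(key, None)
```

Calling invalidate(key) right after saving an object back to the DB is one simple way of keeping the cache in sync; distributed caches offer more options.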
EDIT: As far as profiling tools go, I've used dotTrace extensively and would highly recommend it. You can download a 30 day trial from JetBrains' website.
Certainly there are many tasks that can be parallelized; a detailed analysis can help, but bottlenecks are the likely candidates.
This material can help you: Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4
Possibly, but my general response to this sort of query would typically be - Do you have any performance problems in your application(s)? If yes then by all means investigate why and consider whether parallel execution can help. If not then time is probably best spent elsewhere.
Have you checked out Microsoft's Parallel Computing with Managed Code site? It contains several articles on implementation guidelines discussing both when and how to use .Net 4's parallel features.
I am tasked with building an application wherein the business users will be defining a number of rules for data manipulation & processing (e.g. taking one numerical value and splitting it equally amongst a number of records selected on the basis of the condition specified in the rule).
On a monthly basis, a batch application has to be run in order to process around half a million records as per the rules defined. Each record has around 100 fields. The environment is .NET, C# and SQL server with a third party rule engine
Could you please suggest how to go about defining and/or ascertaining what kind of hardware will be best suited if the requirement is to process records within a timeframe of let's say around 8 to 10 hours. How will the specs vary if the user either wants to increase or decrease the timeframe depending on the hardware costs?
Thanks in advance
Abby
Create the application and profile it?
Step 0. Create the application. It is impossible to tell real world performance of a multi-computer system like you're describing from "paper" specifications... You need to try it and see what holds the biggest slow downs... This is traditionally physical IO, but not always...
Step 1. Profile with sample sets of data in an isolated environment. This is a gross metric. You're not trying to isolate what takes the time, just measuring the overall time it takes to run the rules.
What does isolated environment mean? You want to use the same sorts of network hardware between the machines, but do not allow any other traffic on that network segment. That introduces too many variables at this point.
What does profile mean? With current hardware, measure how long it takes to complete under the following circumstances. Write a program to automate the data generation.
Scenario 1. 1,000 of the simplest rules possible.
Scenario 2. 1,000 of the most complex rules you can reasonably expect users to enter.
Scenarios 3 & 4. 10,000 Simplest and most complex.
Scenarios 5 & 6. 25,000 Simplest and Most complex
Scenarios 7 & 8. 50,000 Simplest and Most complex
Scenarios 9 & 10. 100,000 Simplest and Most complex
Step 2. Analyze the data.
See if there are trends in completion time. Figure out if they appear tied to strictly the volume of rules or if the complexity also factors in... I assume it will.
Develop a trend line that shows how long you can expect it to take if there are 200,000 and 500,000 rules. Perform another run at 200,000. See if the trend line is correct, if not, revise your method of developing the trend line.
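One simple way to develop such a trend line is an ordinary least-squares fit of run time against rule count. The sketch below assumes a linear relationship, which the measurements may well contradict, and the numbers in it are placeholders invented for illustration, not measurements.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Placeholder data, invented for illustration only: replace with the
# completion times measured in the scenarios.
rule_counts = [1_000, 10_000, 25_000, 50_000, 100_000]
minutes = [2.0, 20.0, 50.0, 100.0, 200.0]

slope, intercept = fit_line(rule_counts, minutes)
for target in (200_000, 500_000):
    print(target, round(slope * target + intercept, 1))
```

If the verification run at 200,000 rules lands well off the predicted value, that is the signal to try a different model (for example one that grows faster than linearly) before trusting the 500,000 estimate.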
Step 3. Measure the database and network activity as the system processes the 20,000 rule sets. See if there is more activity happening with more rules. If so the more you speed up the throughput to and from the SQL server the faster it will run.
If these are "relatively low," then CPU and RAM speed are likely where you'll want to beef up the requested machines specification...
Of course if all this testing is going to cost your employer more than buying the beefiest server hardware possible, just quantify the cost of the time spent testing vs. the cost of buying the best server and being done with it and only tweaking your app and the SQL that you control to improve performance...
If this system is not the first of its kind, you can consider the following:
Re-use (after additional evaluation) hardware requirements from previous projects
Evaluate hardware requirements based on workload and hardware configuration of existing application
If that is not the case and performance requirements are very important, then the best way would be to create a prototype with, say, 10 rules implemented. Process the dataset using the prototype and extrapolate to a full rule set. Based on this information you should be able to derive initial performance and hardware requirements. Then you can fine tune these specifications taking into account planned growth in processed data volume, scalability requirements and redundancy.