I'm still pretty new, so bear with me on this one; my question(s) are not meant to be argumentative or petty, but during some reading something struck me as odd.
I'm under the impression that when computers were slow and memory was expensive, using the correct variable type was much more of a necessity than it is today. Now that memory is a bit easier to come by, people seem to have relaxed a bit. For example, you see this sample code everywhere:
for (int i = 0; i < length; i++)
int (-2,147,483,648 to 2,147,483,647) for a length? Isn't byte (0-255) a better choice?
So I'm curious about your opinion and what you believe to be best practice. I hate to think this is used only because the keyword "int" is more intuitive for a beginner... or has memory just become so cheap that we really don't need to concern ourselves with such petty things, and therefore we should just use long so we can be sure any other numbers/types (within reason) can be cast automagically?
...or am I just being silly by concerning myself with such things?
Luca Bolognese posted this in his blog.
Here's the relevant part:
Use int whenever your values can fit in an int, even for values which can never be negative.
Use long when your values can't fit in an int.
Byte, sbyte, short, ushort, uint, and ulong should only ever be used for interop with C code. Otherwise they're not worth the hassle.
Using a variable that is smaller than the CPU's native register size can actually result in more code being emitted.
As another poster said, don't worry about micro-optimisations. If you have a performance problem, first profile. 9 times out of 10 your performance problem won't be where you thought it was.
No, I don't think you are being silly, this is a great question!
My opinion is that using strongly typed variables is a best practice. In your example, the variable i is never negative, so it could be an unsigned int (uint).
When developing programs we need to consider: 1) size 2) speed and 3) the cost of the programmer. These are not mutually exclusive, sometimes we trade off size for speed and of course those best able to do this (great programmers) cost more than beginners.
Also remember that what is fastest on computer A may be slower on computer B. Is it a 16-bit, 32-bit, or 64-bit operating system? In many cases we want a variable to be aligned on word boundaries for speed, so using variables smaller than a word may not end up saving any space.
So it is not necessarily best to use the smallest possible variable, but it is always best practice to make an informed choice as to the best type to use.
Generally memory is cheap these days; since you don't have to worry about it, you can concern yourself with more important programming details. Think of it like managing memory: the less of it you have to do, the more productive you are overall. Ultimately it comes down to the level of abstraction; if you don't need to control how all the cogs and wheels work, then it is best not to tinker with them.
In practice you will generally always use either ints or longs (if you need the extra size). I wouldn't concern myself with anything smaller these days unless I was optimizing. And remember the golden rule of optimization: don't optimize unless you have to. Write your code first, then optimize if needed.
Another similar situation is designing a database schema. I often see people design a schema and allow only what they need on NVARCHAR columns. To me this is ridiculous: it is a variable-length column, so you are not wasting space, and by giving yourself plenty of room you avoid problems down the road. I once worked for a company that had internal logging on the website; once I upgraded to IE8, the website started crashing. After some investigation I found that the logging schema only allowed 32 characters for the browser id string, but when using IE8 (with Visual Studio and other extensions) the browser id string grew beyond 32 characters and caused a problem which prevented the website from working at all. Sure, there could have been more stringent length checking and better error handling on the part of the developer in charge of that, but allowing 256 characters instead of 32 would not only have prevented the crash, we also wouldn't be truncating the data in the db.
I am not suggesting that you use string and Int64 for all your datatypes (any more than I'd suggest you set all your SQL columns to NVARCHAR(4000)), because you lose readability. But choose an appropriate type and give yourself lots of padding.
Local variables like loop indexes are cheap. How many frames are you going to have on the stack at a time? Fifty? A hundred? A thousand? What's the overhead of using a thousand int counters instead of a thousand byte counters? 3K? Is saving 3K worth the rework when it turns out that a couple of those arrays need more than 255 elements?
If you're allocating tens of millions of these things, then squeezing the bit count may make sense. For locals, it's a false economy.
Another factor is what the type communicates to your readers. For better or for worse, people put very little interpretation on an int; but when they see a byte they'll tend to interpret it as something specifically byte-oriented, such as binary data coming off a stream, maybe pixels, or a stream that needs to be run through an encoder to turn it into a string. Using a byte as a loop counter will throw many readers of your code off their stride as they stop and wonder, "Wait, why is this a byte instead of an int?"
This note may help you:
The runtime optimizes the performance of 32-bit integer types (Int32 and UInt32), so use those types for counters and other frequently accessed integral variables. For floating-point operations, Double is the most efficient type because those operations are optimized by hardware.
source: MCTS Self-Paced Training Kit (Exam 70-536): Microsoft® .NET Framework Application Development Foundation, Second edition
Note: I think this is OK for x86 machines, but for x64 I don't know.
Related
Let's start small: say I need to store a const value of 200. Should I always be using an unsigned byte for this?
This is just a minimal thing, I guess. But what about structs? Is it wise to build up my structs so that their size is divisible by 32 bits on a 32-bit system? Let's say I need to iterate over a very large array of structs; does it matter much if the struct consists of 34 bits or 64? I would think it gains a lot if I could squeeze off 2 bits from the 34-bit struct?
Or does all this create unnecessary overhead, and am I better off replacing all my bits and shorts with ints inside this struct so the CPU does not have to "go looking" for the right memory block?
This is very much a processor implementation detail; the CLR and the jitter already do a lot of work to ensure that your data types are optimal to get the best perf out of the program. There is, for example, never a case where a struct occupies 34 bits; the CLR's design choices already ensure that you get a running start on using types that work well on modern processors.
Structs are laid out to be optimal, and that involves alignment choices that depend on the data type. An int, for example, will always be aligned to an offset that's a multiple of 4. That gives the processor an easy time reading the int: it doesn't have to multiplex the misaligned bytes back into an int, and it avoids a scenario where the value straddles a cpu cache line and needs to be glued back together from multiple memory bus reads. Some processors even treat misaligned reads and writes as fatal errors, one of the reasons you don't have an Itanium in your machine.
So if you have a struct that has a byte and an int then you'll end up with a data type that takes 8 bytes which doesn't use 3 of the bytes, the ones between the byte and the int. These unused bytes are called padding. There can also be padding at the end of a struct to ensure that alignment is still optimal when you put them in an array.
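As an illustrative sketch (the struct and field names here are invented, and the managed layout can differ from the marshaled one, but the padding principle is the same), you can see the effect by asking for the marshaled size:

using System;
using System.Runtime.InteropServices;

struct ByteAndInt
{
    public byte Flag;   // 1 byte at offset 0
    public int Count;   // 4 bytes, aligned to offset 4, so offsets 1-3 are padding
}

class PaddingDemo
{
    static void Main()
    {
        // 8 bytes total: 1 byte of data + 3 bytes of padding + 4 bytes of data
        Console.WriteLine(Marshal.SizeOf(typeof(ByteAndInt)));
    }
}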
Declaring a single variable as a byte is okay, Intel/AMD processors take the same amount of time to read/write one as a 32-bit int. But using short is not okay, that requires an extra byte in the cpu instruction (a size override prefix) and can cost an extra cpu cycle. In practice you don't often save any memory because of the alignment rule. Using byte only buys you something if it can be combined with another byte. An array of bytes is fine, a struct with multiple byte members is fine. Your example is not, it works just as well when you declare it int.
Using types smaller than an int can be awkward in C# code; the MSIL code model is int-based. Basic operators like + and - are only defined for int and larger; there is no operator for smaller types. So you end up having to use a cast to truncate the result back to a smaller size. The sweet spot is int.
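A minimal sketch of that awkwardness: adding two shorts produces an int, so the compiler makes you cast the result back down.

short a = 1000, b = 2000;
// short sum = a + b;         // does not compile: '+' on shorts yields an int
short sum = (short)(a + b);   // explicit truncating cast required
int total = a + b;            // with int there is nothing extra to do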
Wow, it really depends on a bunch of stuff. Are you concerned about performance or memory? If it's performance, you are generally better off staying with the "natural" word-size alignment. So, for example, on a 64-bit processor, using 64-bit ints aligned on 64-bit boundaries provides the best performance. I don't think C# makes any guarantees about this type of thing (it's meant to remain abstract from the hardware).
That said, there is an informal rule that says "avoid the sin of premature optimization". This is particularly true in C#. If you aren't having a performance or memory issue, don't worry about it.
If you find you are having a performance problem, use a profiler to determine where the problem actually is (it might not be where you think). If it's a memory problem determine the objects consuming the most memory and determine where you can optimize (as per your example using a byte rather than an int or short, if possible).
If you really have to worry about such details you might want to consider using C++, where you can better control memory usage (for example you can allocate large blocks of memory without it being initialized), access bitfields, etc.
I am attempting to ascertain the maximum sizes (in RAM) of a List and a Dictionary. I am also curious as to the maximum number of elements / entries each can hold, and their memory footprint per entry.
My reasons are simple: I, like most programmers, am somewhat lazy (this is a virtue). When I write a program, I like to write it once, and try to future-proof it as much as possible. I am currently writing a program that uses Lists, but noticed that the iterator wants an integer. Since the capabilities of my program are only limited by available memory / coding style, I'd like to write it so I can use a List with Int64s or possibly BigInts (as the iterators). I've seen IEnumerable as a possibility here, but would like to find out if I can just stuff a Int64 into a Dictionary object as the key, instead of rewriting everything. If I can, I'd like to know what the cost of that might be compared to rewriting it.
My hope is that should my program prove useful, I need only hit recompile in 5 years time to take advantage of the increase in memory.
Is it specified in the documentation for the class? No, then it's unspecified.
In terms of current implementations, there's no maximum size in RAM in the classes themselves. If you create a value type that's 2MB in size, push a few thousand into a list, and get an out of memory exception, that's nothing to do with List<T>.
Internally, List<T>'s workings would prevent it from ever having more than 2 billion items. It's harder to come to a quick answer with Dictionary<TKey, TValue>, since the way things are positioned within it is more complicated, but really, if I was looking at dealing with a billion items (if a 32-bit value, for example, then 4GB), I'd be looking to store them in a database and retrieve them using data-access code.
At the very least, once you're dealing with a single data structure that's 4GB in size, rolling your own custom collection class no longer counts as reinventing the wheel.
I am using a ConcurrentDictionary to rank 3x3 patterns in half a million games of Go. Obviously there are a lot of possible patterns. With .NET 4.0 the ConcurrentDictionary runs out of memory at around 120 million objects. It is using 8GB at that time (on a 32GB machine) but wants to grow way too much, I think (table growths happen in large chunks with ConcurrentDictionary). Using a database would slow me down at least a hundredfold, I think. And the process is already taking 10 hours.
My solution was to use a multi-phase approach, doing multiple passes, one for each subset of patterns - like one pass for odd patterns and one for even patterns. When using more objects no longer fails, I can reduce the number of passes.
.NET 4.5 adds support for larger arrays in 64-bit by using unsigned 32-bit array indexes (the mentioned limit goes from 2 billion to 4 billion). See also http://msdn.microsoft.com/en-us/library/hh285054(v=vs.110).aspx. Not sure which objects will benefit from this; List<> might.
I think you have bigger issues to solve before even wondering if a Dictionary with an int64 key will be useful in 5 or 10 years.
Having a List or Dictionary of 2e+9 elements in memory (Int32) doesn't seem to be a good idea, never mind 9e+18 elements (Int64). Anyhow, the framework will never allow you to create a monster that size (not even close), and probably never will. (Keep in mind that a simple int[int.MaxValue] array already far exceeds the framework's limit for memory allocation of any given object.)
And the question remains: why would you ever want your application to hold in memory a list of so many items? You are better off using a specialized data storage backend (a database) if you have to manage that amount of information.
My question is basically about how the C# compiler handles memory allocation of small datatypes. I do know that, for example, operators like add are defined on int and not on short, and thus computations will be executed as if the shorts were ints.
Assuming the following:
There's no business logic/validation logic associated with the choice of short as a datatype
We're not doing anything with unsafe code
Does using the short datatype wherever possible reduce the memory footprint of my application, and is it advisable to do so? Or is using short and the like not worth the effort because the compiler allocates the full memory amount of an Int32 anyway and adds additional casts when doing arithmetic?
Any links on the supposed runtime performance impact would be greatly appreciated.
Related questions:
Why should I use int instead of a byte or short in C#
Integer summing blues, short += short problem
From a memory-only perspective, using short instead of int will be better. The simple reason is that a short variable needs only half the size of an int variable in memory. The CLR does not expand short to int in memory.
Nevertheless, this reduced memory consumption might, and probably will, decrease the runtime performance of your application significantly. All modern CPUs perform much better with 32-bit numbers than with 16-bit numbers. Additionally, in many cases the CLR will have to convert between short and int, e.g. when calling methods that take int arguments. There are many other performance considerations you have to take into account before going this way.
I would only change this at very dedicated locations and modules of your application and only if you really encounter measurable memory shortages.
In some cases you can of course switch from int to short easily without hurting performance. One example is a giant array of ints, all of which also fit into shorts.
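A sketch of that case (the array name and size are made up for illustration): for a large array whose values are known to fit in 16 bits, the element type roughly halves the memory footprint.

// 50 million values known to fit in 16 bits
short[] asShorts = new short[50000000];  // roughly 100 MB of element data
int[]   asInts   = new int[50000000];    // roughly 200 MB for the same values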
It makes sense in terms of memory usage only if you have in your program very large arrays (or collections built on arrays like List<>) of these types, or arrays of packed structs composed of same. By 'large' I mean that the total memory footprint of these arrays is a large percentage of the working set and a large percentage of the available memory. As for advisability, I'd venture that it is inadvisable to use short types unless the data your program operates on is explicitly specified in terms of short etc., or the volume of data runs into gigabytes.
In short: yes. However, you should also pay attention to memory alignment. You may find Mastering C# structs and Memory alignment of classes in c#? useful.
It depends on what you are using the shorts for. Also, are you allocating so many variables that the memory footprint is going to matter?
If this program was going to be used on a mobile device or a device with memory limitations then I might be concerned. However, most machines today are running at least 1-2 GB of RAM and have pretty decent dual-core processors. Also, most mobile devices today are becoming beastly mini computers. If you're declaring so much that that type of machine would start to die, then you have a problem in your code already.
However, in response to the question: it can matter on memory-limited machines. If you're declaring a lot of 4-byte variables when you only need 2 bytes to hold the values, then you should probably use the short.
If you're performing complicated calculations, square roots and such, or high-value calculations, then you should probably use variables with more bytes so you don't risk losing any data. Just declare what you need when you need it. Zero it out when you're done with it to make sure C# cleans it up, if you're worried about memory limitations.
When it comes to machine language, at the register level, I think it is better to align with the register size, as most of the move and arithmetic operations are done at register boundaries. If the machine has a 32-bit register set, it is better to align with 32 bits. If the machine has 16-bit registers for I/O operations, it is good practice to align with 16 bits to reduce the number of operations when moving the content.
It seems like optimization is a lost art these days. Wasn't there a time when all programmers squeezed every ounce of efficiency from their code? Often doing so while walking five miles in the snow?
In the spirit of bringing back a lost art, what are some tips that you know of for simple (or perhaps complex) changes to optimize C#/.NET code? Since it's such a broad thing that depends on what one is trying to accomplish it'd help to provide context with your tip. For instance:
When concatenating many strings together use StringBuilder instead. See link at the bottom for caveats on this.
Use string.Compare to compare two strings instead of doing something like string1.ToLower() == string2.ToLower() (a sketch of both tips follows below)
The general consensus so far seems to be that measuring is key. This kind of misses the point: measuring doesn't tell you what's wrong, or what to do about it if you run into a bottleneck. I ran into the string concatenation bottleneck once and had no idea what to do about it, so these tips are useful.
My point in even posting this is to have a place for common bottlenecks and how they can be avoided before even running into them. It's not even necessarily about plug-and-play code that anyone should blindly follow, but more about gaining an understanding that performance should be thought about, at least somewhat, and that there are some common pitfalls to look out for.
I can see, though, that it might be useful to also know why a tip is useful and where it should be applied. For the StringBuilder tip I found the help I needed long ago here on Jon Skeet's site.
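A minimal sketch of both tips (the strings and loop bounds are just placeholders):

using System;
using System.Text;

// Tip 1: build up large strings with StringBuilder instead of repeated concatenation
var sb = new StringBuilder();
for (int i = 0; i < 10000; i++)
    sb.Append(i).Append(',');
string csv = sb.ToString();

// Tip 2: compare case-insensitively without allocating lowered copies of both strings
string s1 = "Hello", s2 = "HELLO";
bool same = string.Compare(s1, s2, StringComparison.OrdinalIgnoreCase) == 0;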
It seems like optimization is a lost art these days.
There was once a day when manufacture of, say, microscopes was practiced as an art. The optical principles were poorly understood. There was no standardization of parts. The tubes and gears and lenses had to be made by hand, by highly skilled workers.
These days microscopes are produced as an engineering discipline. The underlying principles of physics are extremely well understood, off-the-shelf parts are widely available, and microscope-building engineers can make informed choices as to how to best optimize their instrument to the tasks it is designed to perform.
That performance analysis is a "lost art" is a very, very good thing; it should never have been practiced as an art in the first place. Optimization should be approached for what it is: an engineering problem solvable through careful application of solid engineering principles.
I have been asked dozens of times over the years for my list of "tips and tricks" that people can use to optimize their vbscript / their jscript / their active server pages / their VB / their C# code. I always resist this. Emphasizing "tips and tricks" is exactly the wrong way to approach performance. That way leads to code which is hard to understand, hard to reason about, hard to maintain, and that is typically not noticeably faster than the corresponding straightforward code.
The right way to approach performance is to approach it as an engineering problem like any other problem:
Set meaningful, measurable, customer-focused goals.
Build test suites to test your performance against these goals under realistic but controlled and repeatable conditions.
If those suites show that you are not meeting your goals, use tools such as profilers to figure out why.
Optimize the heck out of what the profiler identifies as the worst-performing subsystem. Keep profiling on every change so that you clearly understand the performance impact of each.
Repeat until one of three things happens (1) you meet your goals and ship the software, (2) you revise your goals downwards to something you can achieve, or (3) your project is cancelled because you could not meet your goals.
This is the same as you'd solve any other engineering problem, like adding a feature -- set customer focused goals for the feature, track progress on making a solid implementation, fix problems as you find them through careful debugging analysis, keep iterating until you ship or fail. Performance is a feature.
Performance analysis on complex modern systems requires discipline and focus on solid engineering principles, not on a bag full of tricks that are narrowly applicable to trivial or unrealistic situations. I have never once solved a real-world performance problem through application of tips and tricks.
Get a good profiler.
Don't bother even trying to optimize C# (really, any code) without a good profiler. It actually helps dramatically to have both a sampling and a tracing profiler on hand.
Without a good profiler, you're likely to create false optimizations, and, most importantly, optimize routines that aren't a performance problem in the first place.
The first three steps to profiling should always be 1) Measure, 2) measure, and then 3) measure....
Optimization guidelines:
Don't do it unless you need to
Don't do it if it's cheaper to throw new hardware at the problem instead of a developer
Don't do it unless you can measure the changes in a production-equivalent environment
Don't do it unless you know how to use a CPU and a Memory profiler
Don't do it if it's going to make your code unreadable or unmaintainable
As processors continue to get faster the main bottleneck in most applications isn't CPU, it's bandwidth: bandwidth to off-chip memory, bandwidth to disk and bandwidth to net.
Start at the far end: use YSlow to see why your web site is slow for end-users, then move back and fix your database accesses to be not too wide (columns) and not too deep (rows).
In the very rare cases where it's worth doing anything to optimize CPU usage be careful that you aren't negatively impacting memory usage: I've seen 'optimizations' where developers have tried to use memory to cache results to save CPU cycles. The net effect was to reduce the available memory to cache pages and database results which made the application run far slower! (See rule about measuring.)
I've also seen cases where a 'dumb' un-optimized algorithm has beaten a 'clever' optimized algorithm. Never underestimate how good compiler-writers and chip-designers have become at turning 'inefficient' looping code into super efficient code that can run entirely in on-chip memory with pipelining. Your 'clever' tree-based algorithm with an unwrapped inner loop counting backwards that you thought was 'efficient' can be beaten simply because it failed to stay in on-chip memory during execution. (See rule about measuring.)
When working with ORMs be aware of N+1 Selects.
List<Order> _orders = _repository.GetOrders(DateTime.Now);
foreach (var order in _orders)
{
    Print(order.Customer.Name);
}
If the customers are not eagerly loaded this could result in several round trips to the database.
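How to fix it depends on the ORM. With Entity Framework, for example, the repository could eagerly load the customers in the original query - a sketch only, since _context and the OrderDate property are assumptions not shown in the snippet above:

// requires: using System.Data.Entity;  (for the Include() extension method)
public List<Order> GetOrders(DateTime date)
{
    // Include() tells EF to fetch the Customer up front,
    // so the loop above issues one query instead of one per order.
    return _context.Orders
                   .Include(o => o.Customer)
                   .Where(o => o.OrderDate >= date)
                   .ToList();
}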
Don't use magic numbers, use enumerations
Don't hard-code values
Use generics where possible since it's typesafe & avoids boxing & unboxing
Use an error handler where it's absolutely needed
Dispose, dispose, dispose. The CLR wouldn't know how to close your database connections, so close them after use and dispose of unmanaged resources (see the using sketch after this list)
Use common-sense!
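For the dispose point, the idiomatic way to guarantee cleanup is a using block (a minimal sketch; the connection string and query are placeholders):

using System.Data.SqlClient;

using (var connection = new SqlConnection("...connection string..."))
using (var command = new SqlCommand("SELECT 1", connection))
{
    connection.Open();
    command.ExecuteScalar();
}   // Dispose (and therefore Close) runs here even if an exception was thrown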
OK, I have got to throw in my favorite: If the task is long enough for human interaction, use a manual break in the debugger.
Vs. a profiler, this gives you a call stack and variable values you can use to really understand what's going on.
Do this 10-20 times and you get a good idea of what optimization might really make a difference.
If you identify a method as a bottleneck, but you don't know what to do about it, you are essentially stuck.
So I'll list a few things. None of these are silver bullets, and you will still have to profile your code. I'm just making suggestions for things you could do that can sometimes help. Especially the first three are important.
Try solving the problem using just (or: mainly) low-level types or arrays of them.
Problems are often small - using a smart but complex algorithm does not always make you win, especially if the less-smart algorithm can be expressed in code that only uses (arrays of) low level types. Take for example InsertionSort vs MergeSort for n<=100 or Tarjan's Dominator finding algorithm vs using bitvectors to naively solve the data-flow form of the problem for n<=100. (the 100 is of course just to give you some idea - profile!)
Consider writing a special case that can be solved using just low-level types (often problem instances of size < 64), even if you have to keep the other code around for larger problem instances.
Learn bitwise arithmetic to help you with the two ideas above.
BitArray can be your friend, compared to Dictionary, or worse, List. But beware that the implementation is not optimal; you can write a faster version yourself. Instead of testing that your arguments are out of range etc., you can often structure your algorithm so that the index cannot go out of range anyway - but you cannot remove the check from the standard BitArray, and it is not free.
As an example of what you can do with just arrays of low-level types: the BitMatrix is a rather powerful structure that can be implemented as just an array of ulongs, and you can even traverse it using a ulong as the "front", because you can take the lowest-order bit in constant time (compared with the Queue in breadth-first search - but obviously the order is different and depends on the index of the items rather than purely the order in which you find them).
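A sketch of the lowest-order-bit trick behind that kind of traversal (the frontier value is invented; this is plain bit arithmetic, nothing framework-specific):

using System;

ulong front = 0x94;                         // bits 2, 4 and 7 set
while (front != 0)
{
    ulong lowest = front & (~front + 1);    // isolate the lowest set bit
    int index = 0;
    while ((lowest >> index) != 1) index++; // position of that bit (naive count)
    Console.WriteLine("visit item " + index);
    front &= front - 1;                     // clear the lowest set bit and continue
}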
Division and modulo are really slow unless the right hand side is a constant.
Floating point math is not in general slower than integer math anymore (not "something you can do", but "something you can skip doing")
Branching is not free. If you can avoid it using simple arithmetic (anything but division or modulo) you can sometimes gain some performance. Moving a branch outside a loop is almost always a good idea.
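A small sketch of hoisting an invariant branch out of a loop (the data and flag are invented for illustration):

int[] data = { 1, 2, 3, 4 };
int weight = 3, total = 0;
bool useWeights = true;

// Branch evaluated on every iteration:
for (int i = 0; i < data.Length; i++)
    total += useWeights ? data[i] * weight : data[i];

// Same result with the invariant test moved outside the loop:
total = 0;
if (useWeights)
    for (int i = 0; i < data.Length; i++) total += data[i] * weight;
else
    for (int i = 0; i < data.Length; i++) total += data[i];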
People have funny ideas about what actually matters. Stack Overflow is full of questions about, for example, is ++i more "performant" than i++. Here's an example of real performance tuning, and it's basically the same procedure for any language. If code is simply written a certain way "because it's faster", that's guessing.
Sure, you don't purposely write stupid code, but if guessing worked, there would be no need for profilers and profiling techniques.
The truth is that there is no such thing as perfectly optimised code. You can, however, optimise for a specific portion of code, on a known system (or set of systems), on a known CPU type (and count), a known platform (Microsoft? Mono?), a known framework / BCL version, a known CLI version, a known compiler version (bugs, specification changes, tweaks), a known amount of total and available memory, a known assembly origin (GAC? disk? remote?), with known background system activity from other processes.
In the real world, use a profiler, and look at the important bits; usually the obvious things are anything involving I/O, anything involving threading (again, this changes hugely between versions), and anything involving loops and lookups, but you might be surprised at what "obviously bad" code isn't actually a problem, and what "obviously good" code is a huge culprit.
Tell the compiler what to do, not how to do it. As an example, foreach (var item in list) is better than for (int i = 0; i < list.Count; i++), and m = list.Max(i => i.value); is better than list.Sort((a, b) => a.value.CompareTo(b.value)); m = list[list.Count - 1];.
By telling the system what you want to do it can figure out the best way to do it. LINQ is good because its results aren't computed until you need them. If you only ever use the first result, it doesn't have to compute the rest.
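For instance (a minimal sketch), First() on a deferred LINQ query stops at the first match, while forcing the query with ToList() first filters everything before one element is taken:

using System.Linq;

int[] numbers = Enumerable.Range(1, 1000000).ToArray();

// Deferred: Where yields results on demand, so First stops at the first even number.
int firstEven = numbers.Where(n => n % 2 == 0).First();

// Eager: ToList() filters the whole array before a single element is used.
int firstEvenEager = numbers.Where(n => n % 2 == 0).ToList().First();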
Ultimately (and this applies to all programming) minimize loops and minimize what you do in loops. Even more important is to minimize the number of loops inside your loops. What's the difference between an O(n) algorithm and an O(n^2) algorithm? The O(n^2) algorithm has a loop inside of a loop.
I don't really try to optimize my code, but at times I will go through and use something like Reflector to decompile my programs back to source. It is interesting to then compare what I wrote with what Reflector outputs. Sometimes I find that what I did in a more complicated form was simplified. It may not optimize things, but it helps me to see simpler solutions to problems.
Explicitly checking/handling that you don't hit the 2^31 - 1 (?) maximum number of entries when adding to a C# List is craziness, true or false?
(Assuming this is an app where the average List size is less than 100.)
1. Memory limits
Well, the size of System.Object without any properties is 8 bytes (two 32-bit pointers), or 16 bytes on a 64-bit system. [EDIT:] Actually, I just checked in WinDbg, and the size is 12 bytes on x86 (32-bit).
So on a 32-bit system, you would need 24 GB of RAM (which you cannot have on a 32-bit system).
2. Program design
I strongly believe that such a large list shouldn't be held in memory, but rather in some other storage medium. But in that case, you will always have the option to create a cached class wrapping a List, which would handle actual storage under the hood. So testing the size before adding is the wrong place to do the testing, your List implementation should do it itself if you find it necessary one day.
3. To be on the safe side
Why not add a re-entrance counter inside each method to prevent a Stack Overflow? :)
So, yes, it's crazy to test for that. :)
Seems excessive. Would you not hit the machine's memory limit first, depending on the size of the objects in your list? (I assume this check is performed by the user of the List class, and is not a check in the implementation?)
Perhaps it's reassuring that colleagues are thinking ahead though ? (sarcasm!)
It would seem so, and I probably wouldn't include the check, but I'm conflicted on this. Programmers once thought that 2 digits were enough to represent the year in date fields on the grounds that it was fine for the expected life of their code, yet we discovered that this assumption wasn't correct.
Look at the risk, look at the effort and make a judgement call (otherwise known as an educated guess! :-) ). I wouldn't say there's any hard or fast rule on this one.
As in the answer above, I suspect there are more things to worry about going wrong than that. But yes, if you have the time and inclination, you can polish code till it shines!
True
(well you asked true or false..)
Just tried this code:
List<int> list = new List<int>();
while (true) list.Add(1);
I got a System.OutOfMemoryException. So what would you do to check / handle this?
If you keep adding items to the list, you'll run out of memory long before you hit that limit. By "long" I really mean "a lot sooner than you think".
See this discussion on the large object heap (LOH). Once you hit around 21,500 elements (half that on a 64-bit system, assuming you're storing object references), your list's backing array will start to be a large object. Since the LOH isn't compacted in the same way the normal .NET heaps are, you'll eventually fragment it badly enough that a large enough contiguous memory area cannot be allocated.
So you don't have to check for that limit at all, it's not a real limit.
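A rough sketch of where the ~21,500 figure comes from, assuming the documented 85,000-byte large-object threshold and pointer-sized references:

using System;

const int LohThresholdBytes = 85000;       // objects at or above this go on the LOH
int referenceSize = IntPtr.Size;           // 4 on x86, 8 on x64
int elements = LohThresholdBytes / referenceSize;
Console.WriteLine(elements);               // ~21250 on 32-bit, ~10625 on 64-bit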
Yes, that is craziness.
Consider what happens to the rest of the code when you start to reach those numbers. Is the application even usable if you would have millions of items in the list?
If it's even possible that the application would reach that amount of data, perhaps you should instead take measures to keep the list from getting that large. Perhaps you should not even keep all the data in memory at once. I can't really imagine a scenario where any code could practically make use of that much data.