Have Arrays in .NET lost their significance? - c#

For every situation that warrants the use of an array ... there is an awesome collection with benefits. Is there any specific use case for Arrays any more in .NET?

Sending/receiving data with a specific length comes to mind, e.g. serial port, web request, FTP request. Basically stuff that works at a lower level in the system. Also, most collections use an array for storage (notable exception: LinkedList<T>). Collections are just another abstraction layer.

Arrays are useful because they are always contiguous in memory and are fast to work with. For example, I can take a byte[] and marshal it directly into a structure without any problems, but a List<T> would have to be converted to an array first, as far as I know.
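To illustrate the byte[]-to-structure point, here is a minimal sketch. It assumes a modern .NET runtime where System.Runtime.InteropServices.MemoryMarshal is available; PacketHeader and its field layout are hypothetical, made up for the example.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical 8-byte message header; the layout is an assumption for illustration.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct PacketHeader
{
    public ushort Id;       // bytes 0-1
    public ushort Length;   // bytes 2-3
    public uint Checksum;   // bytes 4-7
}

public static class HeaderReader
{
    // Reinterprets the first 8 bytes of the buffer as a PacketHeader.
    // Field values follow the byte order of the machine running the code.
    public static PacketHeader Parse(byte[] buffer)
    {
        // byte[] converts implicitly to ReadOnlySpan<byte>, so no copy is needed.
        return MemoryMarshal.Read<PacketHeader>(buffer);
    }
}
```

With a List<byte> you would typically call ToArray() first, which copies the data before it can be read this way.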

No, they still have their uses and should always be considered.
Remember that arrays are a very basic representation of a fixed-length sequence, so they are very fast, and most languages understand them, depending on the element type.
You need to define an array's size when it is created and cannot change it later. Lists and other collections can grow as needed, which adds overhead with respect to memory allocation.
Lists and other types are useful because they can do a lot, but sometimes you don't need all that extra overhead, so an array is all you need.
It's like driving a 4x4 because you think one day you might need to go off-roading, even though there is a 99.9% chance you will be on normal roads. An array would be the basic car and a List, for example, would be a 4x4: it does everything a car can do (and under the hood might use most of the same parts) but at the expense of gas, cost, not fitting in certain parking stalls, etc.
Arrays = performance and compatibility
Lists (or other representations) = ease of use at a cost of performance and compatibility

Yes, there certainly still is a use for arrays. Some methods still need arrays.
For example:
string[] items = "a,b;c:d".Split(new char[]{',',';',':'});
It's still the simplest way to keep a bunch of items, and the number one choice until you need some specific feature, like for example dynamic growth.

Have arrays lost (some) significance?
Yes. For many tasks requiring a 'table' of items there now are more flexible and useful solutions like List<> and IEnumerable<>.
Have arrays lost their importance?
No. They are the fastest form of storage and they are used 'under the hood' in most of the collection classes, System.String etc.
So, arrays have become more low-level, and an application programmer will be using them directly less often.

Quite apart from everything else, many of those great collection classes you refer to are implemented using arrays. You might not be using them explicitly, but you're using loads of them and your program is better for it. That means that arrays must be in the language (or that the collections are implemented directly using lots of native code, which would be suckier).

Yes? Anytime I have a type which internally maintains a fixed-size collection of items, I use an array as it's the fastest to iterate and requires the least memory. No sense using a List<T>, Queue<T>, etc. if you don't need those features.

Two cases where I work daily with arrays:
Image analysis. Images are almost always byte[] or int[] arrays.
External hardware communication. Most devices require perfectly structured arrays to send/receive messages.

It's a fair question, but the answer is definitely that they're still useful. Speed is one reason, simplicity for fixed sizes is another. But I think the most important one is flexibility. It gives you a nice base to design your own collection, backed by a simple array, if you ever needed it.

No, arrays will not lose their importance.
1- When you know the number of items in advance, you can go for an array, which gives you very fast access.
2- In graph theory, when you store information about the links between vertices, you can implement it using arrays, which are faster than a linked-list implementation.
3- Some methods, like string.Split, return an array.
You can use this wonderful static placeholder of items in a variety of computer problems.
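As an illustration of the graph point, here is a minimal sketch of an adjacency matrix backed by a plain two-dimensional array, for an undirected graph with a vertex count known in advance. The class name AdjacencyMatrix and its members are made up for the example.

```csharp
using System;

// Hypothetical helper: stores which vertices are linked in a fixed-size bool[,].
public sealed class AdjacencyMatrix
{
    private readonly bool[,] _edges;

    public AdjacencyMatrix(int vertexCount)
    {
        _edges = new bool[vertexCount, vertexCount];
    }

    // Records an undirected link between vertices a and b.
    public void Connect(int a, int b)
    {
        _edges[a, b] = true;
        _edges[b, a] = true;
    }

    // Constant-time lookup: a single array access.
    public bool AreConnected(int a, int b)
    {
        return _edges[a, b];
    }
}
```

The trade-off is memory: the matrix needs vertexCount * vertexCount entries no matter how few links exist.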

I use arrays to manipulate images, as with the WriteableBitmap class.

One thing I must tell you: arrays are the building blocks of any programming language. If you want to declare storage for more than one element, arrays are the basic option.
Take, for instance, a List.
If you look at the definition of List, it actually holds
T[] items
Just use Reflector and look at the definition of List; you will be surprised to find that List is actually an array. In .NET, most of the collections other than LinkedList are basically array implementations. They use arrays because of their fast storage and retrieval.
I agree that arrays have limitations when it comes to updating or removing elements; if your main emphasis is on storage rather than speed, you might go for a LinkedList.
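To make the "a List is really an array" point concrete, here is a simplified sketch of an array-backed list. It is not the actual List<T> source; SimpleList<T>, the initial capacity of 4 and the doubling policy are assumptions chosen for illustration.

```csharp
using System;

// Simplified sketch of a growable list on top of a plain T[].
public sealed class SimpleList<T>
{
    private T[] _items = new T[4];   // backing array (initial capacity is an assumption)
    private int _count;              // number of slots actually in use

    public int Count
    {
        get { return _count; }
    }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= _count)
                throw new ArgumentOutOfRangeException("index");
            return _items[index];    // direct array access
        }
    }

    public void Add(T item)
    {
        if (_count == _items.Length)
        {
            // Backing array is full: allocate one twice as large and copy everything over.
            Array.Resize(ref _items, _items.Length * 2);
        }
        _items[_count++] = item;
    }
}
```

The growth step is where the extra cost of a list over a fixed-size array comes from: every resize allocates a new array and copies the old contents.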

What do you think the backing field behind many of those fancy collections is?

Related

Difference between ImmutableArray<T> and ImmutableList<T>

What is difference between ImmutableArray<T> and ImmutableList<T>, and where would it be best to use each?
Here is some reading that might help explain: Please welcome ImmutableArray
Here's an excerpt:
Reasons to use immutable array:
Updating the data is rare or the number of elements is quite small (<16)
You need to be able to iterate over the data in performance critical sections
You have many instances of immutable collections and you can’t afford keeping the data in trees
Reasons to stick with immutable list:
Updating the data is common or the number of elements isn’t expected to be small
Updating the collection is more performance critical than iterating the contents
I think you are asking where to use each of them. Please welcome ImmutableArray will help. To summarize, use immutable array when:
Updating the data is rare or the number of elements is quite small (<16)
You need to be able to iterate over the data in performance critical sections
You have many instances of immutable collections and you can’t afford keeping the data in trees
Use immutable list when:
Updating the data is common or the number of elements isn't expected to be small
Updating the collection is more performance critical than iterating the contents
The main difference is that an ImmutableArray is just a wrapper around a normal array, which means that retrieving elements within the array is extremely quick.
An ImmutableArray is also a struct, meaning that it's a value type, so the wrapper itself adds no extra heap object on top of the underlying array, whereas an ImmutableList is a reference type.
For these reasons, an ImmutableArray is suitable to use if you rarely need to update its data, or the number of elements is small (less than 16), or if the performance of iterating over the collection is important.
If your needs do not match the reasons above, you should use an ImmutableList.
Look here for the documentation.
Going by the blog post "Please welcome ImmutableArray<T>" (the same blog post everyone else is referencing), the differences are...
ImmutableArray:
Simple implementation (just an array).
Not optimized for quickly updating large collections (the entire array needs to be copied).
More efficient for most use cases (anything that does not involve updating large collections).
ImmutableList:
More complicated implementation (something involving trees).
Is optimized for quickly updating large collections (it can be done in logarithmic time).
Less efficient for most use cases.
So if you're not updating much, or your collections are small, use ImmutableArray. If you're going to be frequently updating large collections, you'll need to use ImmutableList.
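A minimal sketch of the two usage patterns described above, assuming the System.Collections.Immutable library is referenced (built into current .NET, a NuGet package on older frameworks). The class and method names are made up for the example.

```csharp
using System.Collections.Immutable;

public static class ImmutableChoice
{
    // Small and rarely updated, mostly iterated: array-backed.
    public static ImmutableArray<string> BuildSmallLookup()
    {
        ImmutableArray<string> baseUnits = ImmutableArray.Create("m", "kg", "s");
        // Add copies every element into a new array; baseUnits itself is unchanged.
        return baseUnits.Add("A");
    }

    // Large and frequently updated: tree-backed.
    public static ImmutableList<int> BuildLargeLog(int count)
    {
        ImmutableList<int> log = ImmutableList<int>.Empty;
        for (int i = 0; i < count; i++)
        {
            // Add returns a new list that shares most of its nodes with the old one.
            log = log.Add(i);
        }
        return log;
    }
}
```

Both Add calls return a new collection and leave the original untouched; the difference is only in how much gets copied per update.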

What are the alternatives for System.Collections.ArrayList?

I have been asked to revise code written some time ago for a Windows Forms application. The programmer used ArrayList heavily. I think generic lists are far more efficient than ArrayList, and I plan to rewrite the code using List<T>. I wanted to know if there are any other alternatives that might also be worth considering. I work on .NET 2.0.
If you're working in .NET 2, then you won't have any of the concurrent collections in .NET 4 available to you, which pretty much just leaves List<T> in terms of "collections which are a bit like ArrayList". (Even within the concurrent collections, there isn't an immediate equivalent - and you should only use the concurrent collections when you actually anticipate concurrent access anyway.)
There are Stack<T> and Queue<T>, as well as LinkedList<T> - but all of those are somewhat different to ArrayList in terms of what you can do with them. They're worth considering if you don't need random access, of course.
I wouldn't expect too much more in terms of efficiency unless you're currently boxing a lot of large value types in your ArrayList. What you can expect is far clearer code. Fewer casts, less uncertainty about the contents of the collection, etc.
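A small before/after sketch of what "fewer casts, less uncertainty" looks like in practice. The method names are made up for the example, and the code sticks to generics, static classes and foreach, all of which were available in C# 2.0.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class ListMigration
{
    // Before: ArrayList stores object, so every int is boxed on Add
    // and must be cast back (unboxed) on the way out.
    public static int SumOld(ArrayList values)
    {
        int total = 0;
        foreach (object boxed in values)
        {
            total += (int)boxed;   // throws InvalidCastException at run time if not an int
        }
        return total;
    }

    // After: List<int> is checked at compile time, no casts and no boxing.
    public static int SumNew(List<int> values)
    {
        int total = 0;
        foreach (int value in values)
        {
            total += value;
        }
        return total;
    }
}
```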
If you have the option of upgrading to .NET 3.5 at any point in the near future, that would then give you access to LINQ, which is fabulously useful when dealing with collections. Relatively few new collection types, but much simpler ways of expressing operations on them.
Update:
For adding to/removing from the head/tail it is better to use LinkedList<T>, but if you can determine the exact maximum capacity of the collection and its size will be close to that capacity, then maybe it's better to use Queue<T> (because internally it's an array, reallocated when the size reaches capacity). With Queue<T> you avoid the memory overhead that comes with LinkedList nodes.
Original:
From MSDN: The List<T> class is the generic equivalent of the ArrayList class.
Please read the List<T> Performance Considerations section carefully.
What you should use depends on how the ArrayList is used. Is it random access, or adding to/removing from the head/tail?
Try SortedList and Collection.
Both are supported by .NET Framework 2.0.

Creating a List64 class

I've decided that I need to create a 64-bit List to fulfill some of my program's needs, namely the ability to use longs as indexes. I've looked at the Mono code for implementing a List, and have come to the general conclusion that no matter what I choose, I should make a variation of IList (using longs) to work with it.
Now my question is, what do you think would be a good approach to this design? I am currently thinking of two possibilities -> since a List is just a wrapper for the Array class, I can just rewrite the List class to use giant arrays; or I can write the class to use a List of Lists to maintain and grow the data as needed. The problem with the first appears to be choosing an array that is too large, and the second is trying to make Remove() and other assorted methods work, when I'd probably need to perform massive memory copies to keep everything indexed properly. Your thoughts?
Since arrays are limited to being indexed by int, you will have to use multiples of them anyway. I'd go directly for a List of Lists. Note that every object in the CLR is limited to a single 2 GB chunk of memory for allocation.
Side notes:
If you are planning to implement a true 4 GB+ linear array that supports insert/remove operations in the middle, you should simply adjust your performance expectations: any insert or remove will be slow. At such a scale, an append-only array may be a reasonable approach.
If add/remove are important, consider other data structures, e.g. a B-tree. You will sacrifice constant access time in most cases, but gain reasonable insert/remove performance.
If your arrays are sparse, consider a simple Dictionary instead, or as backing storage.
Here is a link to a more detailed discussion about large arrays.
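A minimal sketch of the append-only "list of chunks" idea, indexable by long. BigList<T> and the chunk size are hypothetical; there is no Insert or Remove, in line with the side note about append-only storage.

```csharp
using System;
using System.Collections.Generic;

// Append-only sketch: a list of fixed-size arrays addressed with a long index.
public sealed class BigList<T>
{
    private readonly int _chunkSize;
    private readonly List<T[]> _chunks = new List<T[]>();
    private long _count;

    // The default chunk size is an arbitrary assumption; tune it for your data.
    public BigList(int chunkSize = 1 << 20)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");
        _chunkSize = chunkSize;
    }

    public long Count
    {
        get { return _count; }
    }

    public void Add(T item)
    {
        int offset = (int)(_count % _chunkSize);
        if (offset == 0)
        {
            // Previous chunk is full (or this is the first item): start a new one.
            _chunks.Add(new T[_chunkSize]);
        }
        _chunks[_chunks.Count - 1][offset] = item;
        _count++;
    }

    public T this[long index]
    {
        get
        {
            if (index < 0 || index >= _count)
                throw new ArgumentOutOfRangeException("index");
            // Chunk number and offset within the chunk both fit in an int.
            return _chunks[(int)(index / _chunkSize)][(int)(index % _chunkSize)];
        }
    }
}
```

Because no chunk is ever copied or moved, growing the list never requires one huge contiguous allocation.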

C# creating a fixed size hashtable

I want to be able to create a fixed-size hashmap of, say, 100 buckets, and if I need to store over 100 items then collisions and overwriting will just have to happen. The Hashtable class has an IsFixedSize property, however it is read-only.
Am I thinking about this completely wrongly, or is there a solution to this?
Collections in the .NET framework don't allow for a lot of fine-tuning, although you might find one efficient enough for your needs. Try some viable ones out before optimizing.
If you don't roll your own then you might find a 3rd party alternative that has more fine-grained controls. For example, see The C5 Generic Collection Library for C# and CLI as a possible start. Check into the various Hash* classes on their documentation page.
If you decide to roll your own then you'll want to implement some of the standard interfaces for collections and/or lists, enumerations, etc. so that it works as expected with C# foreach and other language and .NET features.
You might also take an efficient C++ implementation if you have one; there are ways of using it from C#/.NET. It might take a bit of finagling but there are answers on SO about how to accomplish this kind of thing.
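If you do roll your own, a minimal sketch of the behaviour asked for in the question (a fixed number of buckets, where a colliding key simply overwrites the previous entry) could look like this. FixedSizeTable and its members are hypothetical names, and it deliberately implements no collection interfaces to stay short.

```csharp
using System;
using System.Collections.Generic;

// Sketch: one entry per bucket, collisions overwrite whatever was there.
public sealed class FixedSizeTable<TKey, TValue>
{
    private readonly TKey[] _keys;
    private readonly TValue[] _values;
    private readonly bool[] _used;

    public FixedSizeTable(int bucketCount)
    {
        if (bucketCount <= 0) throw new ArgumentOutOfRangeException("bucketCount");
        _keys = new TKey[bucketCount];
        _values = new TValue[bucketCount];
        _used = new bool[bucketCount];
    }

    private int BucketOf(TKey key)
    {
        // Mask off the sign bit so the remainder is never negative.
        int hash = EqualityComparer<TKey>.Default.GetHashCode(key) & int.MaxValue;
        return hash % _keys.Length;
    }

    // Stores the pair, silently replacing any entry already in that bucket.
    public void Set(TKey key, TValue value)
    {
        int i = BucketOf(key);
        _keys[i] = key;
        _values[i] = value;
        _used[i] = true;
    }

    // Succeeds only if the bucket currently holds exactly this key.
    public bool TryGet(TKey key, out TValue value)
    {
        int i = BucketOf(key);
        if (_used[i] && EqualityComparer<TKey>.Default.Equals(_keys[i], key))
        {
            value = _values[i];
            return true;
        }
        value = default(TValue);
        return false;
    }
}
```

In effect this is a small lossy cache: memory use is fixed up front, and an older entry disappears as soon as another key lands in the same bucket.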

In-memory search index for application takes up too much memory - any suggestions?

In our desktop application, we have implemented a simple search engine using an inverted index.
Unfortunately, some of our users' datasets can get very large, e.g. taking up ~1GB of memory before the inverted index has been created. The inverted index itself takes up a lot of memory, almost as much as the data being indexed (another 1GB of RAM).
Obviously this creates problems with out of memory errors, as the 32 bit Windows limit of 2GB memory per application is hit, or users with lesser spec computers struggle to cope with the memory demand.
Our inverted index is stored as a:
Dictionary<string, List<ApplicationObject>>
And this is created during the data load when each object is processed such that the applicationObject's key string and description words are stored in the inverted index.
So, my question is: is it possible to store the search index more efficiently space-wise? Perhaps a different structure or strategy needs to be used? Alternatively is it possible to create a kind of CompressedDictionary? As it is storing lots of strings I would expect it to be highly compressible.
If it's going to be 1GB... put it on disk. Use something like Berkeley DB. It will still be very fast.
Here is a project that provides a .net interface to it:
http://sourceforge.net/projects/libdb-dotnet
I see a few solutions:
If you have the ApplicationObjects in an array, store just the index - might be smaller.
You could use a bit of C++/CLI to store the dictionary, using UTF-8.
Don't bother storing all the different strings, use a Trie
I suspect you may find you've got a lot of very small lists.
I suggest you find out roughly what the frequency is like - how many of your dictionary entries have single element lists, how many have two element lists etc. You could potentially store several separate dictionaries - one for "I've only got one element" (direct mapping) then "I've got two elements" (map to a Pair struct with the two references in) etc until it becomes silly - quite possibly at about 3 entries - at which point you go back to normal lists. Encapsulate the whole lot behind a simple interface (add entry / retrieve entries). That way you'll have a lot less wasted space (mostly empty buffers, counts etc).
If none of this makes much sense, let me know and I'll try to come up with some code.
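A rough sketch of that idea, simplified to a single dictionary rather than several separate ones: each entry holds the item itself while there is only one, and switches to a List only when a second item arrives. CompactIndex<T> is a hypothetical name, and the sketch assumes T is never itself a List<T>.

```csharp
using System.Collections.Generic;

// Sketch: avoids allocating a List<T> for keys that have a single entry.
public sealed class CompactIndex<T> where T : class
{
    // The value is either a single T or, once a key has several entries, a List<T>.
    private readonly Dictionary<string, object> _map = new Dictionary<string, object>();

    public void Add(string key, T item)
    {
        object existing;
        if (!_map.TryGetValue(key, out existing))
        {
            _map[key] = item;            // first entry: store the item directly
            return;
        }

        List<T> list = existing as List<T>;
        if (list == null)
        {
            // Second entry for this key: promote the single item to a list.
            list = new List<T>(2);
            list.Add((T)existing);
            _map[key] = list;
        }
        list.Add(item);
    }

    public IEnumerable<T> Get(string key)
    {
        object existing;
        if (!_map.TryGetValue(key, out existing))
            yield break;

        List<T> list = existing as List<T>;
        if (list != null)
        {
            foreach (T item in list)
                yield return item;
        }
        else
        {
            yield return (T)existing;
        }
    }
}
```

Whether this saves much depends on the frequency distribution mentioned above, so it is worth measuring how many keys really have only one entry first.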
I agree with bobwienholt, but if you are indexing datasets I assume these came from a database somewhere. Would it make sense to just search that with a search engine like DTSearch or Lucene.net?
You could take the approach Lucene did. First, you create a random-access in-memory stream (System.IO.MemoryStream); this stream mirrors an on-disk one, but only a portion of it (if you have the wrong portion, load another one from disk). This does cause one headache: you need a file-mappable format for your dictionary. Wikipedia has a description of the paging technique.
On the file-mappable scenario: if you open up Reflector and reflect the Dictionary class, you will see that it is composed of buckets. You can probably use each of these buckets as a page and physical file (this way inserts are faster). You can then also loosely delete values by simply writing an "item x deleted" value to the file and cleaning the file up every so often.
By the way, a bucket holds the entries whose hash codes map to that bucket. It is very important that any custom type you use as a key overrides the GetHashCode() method, and Equals() along with it (the compiler warns when Equals() is overridden without GetHashCode()). You will get a significant speed increase in lookups if you do this.
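A minimal sketch of overriding GetHashCode() and Equals() together on a type used as a dictionary key. TermKey and its two fields are hypothetical; the multiplier 397 is just a commonly used odd prime, not a requirement.

```csharp
using System;

// Hypothetical composite key: a search term plus the id of the field it was found in.
public sealed class TermKey : IEquatable<TermKey>
{
    private readonly string _term;
    private readonly int _fieldId;

    public TermKey(string term, int fieldId)
    {
        if (term == null) throw new ArgumentNullException("term");
        _term = term;
        _fieldId = fieldId;
    }

    // Two keys are equal when both fields match.
    public bool Equals(TermKey other)
    {
        return !ReferenceEquals(other, null)
            && _fieldId == other._fieldId
            && _term == other._term;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as TermKey);
    }

    // Must be consistent with Equals: equal keys always produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            return (_term.GetHashCode() * 397) ^ _fieldId;
        }
    }
}
```

Without these overrides a class falls back to reference equality, so two separately constructed keys with the same contents would never find each other's entries.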
How about using Memory Mapped File Win32 API to transparently back your memory structure?
http://www.eggheadcafe.com/articles/20050116.asp has the PInvokes necessary to enable it.
Is the index only added to or do you remove keys from it as well?
