Related
I am working in c# and I am creating a list (newList) that I can get from the length of another list (otherList). In c# is list implemented in a way that preallocating the length of the list is better for performance using otherList.Count or to just use newList.Add(obj) and not worry about the length?
The following constructor for List<T> is implemented for the purpose of improving performance in scenarios like yours:
http://msdn.microsoft.com/en-us/library/dw8e0z9z.aspx
public List(int capacity)
Just pass the capacity in the constructor.
newList = new List<string>(otherList.Count);
If you know the exact length of the new list, creating it with that capacity indeed performs - a bit - better.
The reason is that the implementation of List<T> internally uses an array. If this gets too small, a new array is created and the items from the old array are copied over to the new item.
Taken from the Remarks section on MSDN
The capacity of a List<T> is the number of elements that the List<T>
can hold. As elements are added to a List<T>, the capacity is automatically increased as required by reallocating the internal array.
If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the List<T>.
The capacity can be decreased by calling the TrimExcess method or by
setting the Capacity property explicitly. Decreasing the capacity
reallocates memory and copies all the elements in the List<T>.
So, this would suggest that there would be a performance increase if you have an estimate of the size of the list you are going to populate. Of course the other-side of this is allocating a list size too big and therefore using up memory unnecessarily.
To be honest, I would not worry about this sort of micro optimization unless I really need to.
So I benchmarked the scenario of preallocation, and just wanted to share some numbers. The code simply times this:
var repetitions = 100000000
var list = new List<object>();
for (var i = 0; i < repetitions; i++)
list.Add(new object());
The comparison is by allocating in the list the exact number of repetitions, twice the number and without preallocation. This are the times:
List not preallocated: 6561, 6394, 6556, 6283, 6466
List preallocated: 5951, 6037, 5885, 6044, 5996
List double preallocated: 6710, 6665, 6729, 6760, 6624
So is preallocation worth it? Yes if the number of objects in the list is known, or if a bottom line of this number is known (how many items will the list have at least).
If not, you could risk actually wasting more time and memory, since the preallocation will use time and memory to make space for the allocation desired.
Note that also in terms of memory, exact preallocation is the most memory efficient, since not preallocating will make your list grow over the tightest needed capacity.
I want to write a function that takes a collection of integers and removes the duplicates from the collection. I can not apply any sorting algorithm. Similarly I cannot duplicate the collection. I need to conserve the memory and provide an efficient solution that can process millions of items without significantly overusing the battery.
if you are very short on memory, best solution would be not to include the redundant
integers in the list in the first place.
To do this you might use an array [0..65536] of boolean (that you might 'pack' 8 by 8 to get it smaller) which record which one was allready used.
Another solution is to have the list sorted, by inserting items in the right place, but not inserting them if they are allready here. Insertion will be in log(number of unique items so far) for each item, so it should be something like a n*log(n) time for your list.
If you do not have control over the source, you could still use an array of boolean, maybe bigger if you need to, then initialize it (set all to false, then : isUsed[itemList[i]] = true;), then you can dispose of the list so you have memory again, then build a new list out of the array. So the output will be ordered.
If your integers are 32 bits, array would be 500 MB big, so maybe too big..., but depending on the integers distribution (is there a wide range of possible numbers ?? ), you might do be able to lower that size...
Notice that if you are very short on memory you might use an object pool to reuse objects.
(you might even re-use objects that you just removed from the list.)
This question already has answers here:
How to find the index of an element in an array in Java?
(15 answers)
Closed 6 years ago.
I was asked this question in an interview. Although the interview was for dot net position, he asked me this question in context to java, because I had mentioned java also in my resume.
How to find the index of an element having value X in an array ?
I said iterating from the first element till last and checking whether the value is X would give the result. He asked about a method involving less number of iterations, I said using binary search but that is only possible for sorted array. I tried saying using IndexOf function in the Array class. But nothing from my side answered that question.
Is there any fast way of getting the index of an element having value X in an array ?
As long as there is no knowledge about the array (is it sorted? ascending or descending? etc etc), there is no way of finding an element without inspecting each one.
Also, that is exactly what indexOf does (when using lists).
How to find the index of an element having value X in an array ?
This would be fast:
int getXIndex(int x){
myArray[0] = x;
return 0;
}
A practical way of finding it faster is by parallel processing.
Just divide the array in N parts and assign every part to a thread that iterates through the elements of its part until value is found. N should preferably be the processor's number of cores.
If a binary search isn't possible (beacuse the array isn't sorted) and you don't have some kind of advanced search index, the only way I could think of that isn't O(n) is if the item's position in the array is a function of the item itself (like, if the array is [10, 20, 30, 40], the position of an element n is (n / 10) - 1).
Maybe he wants to test your knowledge about Java.
There is Utility Class called Arrays, this class contains various methods for manipulating arrays (such as sorting and searching)
http://download.oracle.com/javase/6/docs/api/java/util/Arrays.html
In 2 lines you can have a O(n * log n) result:
Arrays.sort(list); //O(n * log n)
Arrays.binarySearch(list, 88)); //O(log n)
Puneet - in .net its:
string[] testArray = {"fred", "bill"};
var indexOffset = Array.IndexOf(testArray, "fred");
[edit] - having read the question properly now, :) an alternative in linq would be:
string[] testArray = { "cat", "dog", "banana", "orange" };
int firstItem = testArray.Select((item, index) => new
{
ItemName = item,
Position = index
}).Where(i => i.ItemName == "banana")
.First()
.Position;
this of course would find the FIRST occurence of the string. subsequent duplicates would require additional logic. but then so would a looped approach.
jim
It's a question about data structures and algorithms (altough a very simple data structure). It goes beyond the language you are using.
If the array is ordered you can get O(log n) using binary search and a modified version of it for border cases (not using always (a+b)/2 as the pivot point, but it's a pretty sophisticated quirk).
If the array is not ordered then... good luck.
He can be asking you about what methods you have in order to find an item in Java. But anyway they're not faster. They can be olny simpler to use (than a for-each - compare - return).
There's another solution that's creating an auxiliary structure to do a faster search (like a hashmap) but, OF COURSE, it's more expensive to create it and use it once than to do a simple linear search.
Take a perfectly unsorted array, just a list of numbers in memory. All the machine can do is look at individual numbers in memory, and check if they are the right number. This is the "password cracker problem". There is no faster way than to search from the beginning until the correct value is hit.
Are you sure about the question? I have got a questions somewhat similar to your question.
Given a sorted array, there is one element "x" whose value is same as its index find the index of that element.
For example:
//0,1,2,3,4,5,6,7,8,9, 10
int a[10]={1,3,5,5,6,6,6,8,9,10,11};
at index 6 that value and index are same.
for this array a, answer should be 6.
This is not an answer, in case there was something missed in the original question this would clarify that.
If the only information you have is the fact that it's an unsorted array, with no reletionship between the index and value, and with no auxiliary data structures, then you have to potentially examine every element to see if it holds the information you want.
However, interviews are meant to separate the wheat from the chaff so it's important to realise that they want to see how you approach problems. Hence the idea is to ask questions to see if any more information is (or could be made) available, information that can make your search more efficient.
Questions like:
1/ Does the data change very often?
If not, then you can use an extra data structure.
For example, maintain a dirty flag which is initially true. When you want to find an item and it's true, build that extra structure (sorted array, tree, hash or whatever) which will greatly speed up searches, then set the dirty flag to false, then use that structure to find the item.
If you want to find an item and the dirty flag is false, just use the structure, no need to rebuild it.
Of course, any changes to the data should set the dirty flag to true so that the next search rebuilds the structure.
This will greatly speed up (through amortisation) queries for data that's read far more often than written.
In other words, the first search after a change will be relatively slow but subsequent searches can be much faster.
You'll probably want to wrap the array inside a class so that you can control the dirty flag correctly.
2/ Are we allowed to use a different data structure than a raw array?
This will be similar to the first point given above. If we modify the data structure from an array into an arbitrary class containing the array, you can still get all the advantages such as quick random access to each element.
But we gain the ability to update extra information within the data structure whenever the data changes.
So, rather than using a dirty flag and doing a large update on the next search, we can make small changes to the extra information whenever the array is changed.
This gets rid of the slow response of the first search after a change by amortising the cost across all changes (each change having a small cost).
3. How many items will typically be in the list?
This is actually more important than most people realise.
All talk of optimisation tends to be useless unless your data sets are relatively large and performance is actually important.
For example, if you have a 100-item array, it's quite acceptable to use even the brain-dead bubble sort since the difference in timings between that and the fastest sort you can find tend to be irrelevant (unless you need to do it thousands of times per second of course).
For this case, finding the first index for a given value, it's probably perfectly acceptable to do a sequential search as long as your array stays under a certain size.
The bottom line is that you're there to prove your worth, and the interviewer is (usually) there to guide you. Unless they're sadistic, they're quite happy for you to ask them questions to try an narrow down the scope of the problem.
Ask the questions (as you have for the possibility the data may be sorted. They should be impressed with your approach even if you can't come up with a solution.
In fact (and I've done this in the past), they may reject all your possibile approaches (no, it's not sorted, no, no other data structures are allowed, and so on) just to see how far you get.
And maybe, just maybe, like the Kobayashi Maru, it may not be about winning, it may be how you deal with failure :-)
My requirement is to find a duplicate number in an array of integers of length 10 ^ 15.
I need to find a duplicate in one pass. I know the method (logic) to find a duplicate number from an array, but how can I handle such a large size.
An array of 10^15 of integers would require more than a petabyte to store. You said it can be done in a single pass, so there's no need to store all the data. But even reading this amount of data would take a lot of time.
But wait, if the numbers are integers, they fall into a certain range, let's say N = 2^32. So you only need to search at most N+1 numbers to find a duplicate. Now that's feasible.
You can use a BitVector array with length = 2^(32-5) = 0x0800000
This has a bit for each posible int32 number.
Note: easy solution (BitArray) do´nt support adecuate constructor.
BitVector32[] bv = new BitVector32[0x8000000];
int[] ARR = ....; // Your array
foreach (int I in ARR)
{
int Element = I >> 5;
int Bit = I & 0x1f;
if (bv[Element ][Bit])
{
// Option for First Duplicate Found
}
else
{
bv[I][Bit] = true;
}
}
You'll need a different data structure. I suspect the requirement isn't really to use an array - I'd hope not, as arrays can only hold up to Int32.MaxValue elements, i.e. 2,147,483,647... much less than 10^15. Even on a 64-bit machine, I believe the CLR requires that arrays have at most that many elements. (See the docs for Array.CreateInstance for example - even though you can specify the bounds as 64-bit integers, it will throw an exception if they're actually that big.)
Now, if you can explain what the real requirement is, we may well be able to suggest alternative data structures.
If this is a theoretical problem rather than a practical one, it would be helpful if you could tell us those constraints, too.
For example, if you've got enough memory for the array itself, then asking for 2^24 bytes to store which numbers you've seen already (one bit per value) isn't asking for much. This is assuming the values themselves are 32-bit integers, of course. Start with an empty array, and set the relevant bit for each number you find. If you find you're about to set one that's already set, then you've found the first duplicate.
You can declare it in the normal way: new int[1000000000000000]. However this will only work on a 64-bit machine; the largest you can expect to store on a 32-bit machine is a little over 2GB.
Realistically you won't be able to store the whole array in memory. You'll need to come up with a way of generating it in smaller chunks, and checking those chunks individually.
What's the data in the array? Maybe you don't need to generate it all in one go. Or maybe you can store the data in a file.
You cannot declare an array of size greater than Int32.MaxValue (2^31, or approx. 2*10^9), so you will have to either chain arrays together or use a List<int> to hold all of the values.
Your algorithm should really be the same regardless of the array size. The best time complexity you'll get has got to be (ideally) O(n) of course.
Consider the following pseudo-code for the algorithm:
Create a HashSet<int> of capacity equal to the range of numbers in your array.
Loop over each number in the array and check if it already exists in the hashset.
if no, add it to the hashset now.
if yes, you've found a duplicate.
Memory usage here is far from trivial, but if you want speed, it will do the job.
You don't need to do anything. By definition, there will be a duplicate because 2^32 < 10^15 - there aren't enough numbers to fill a 10^15 array uniquely. :)
Now if there is an additional requirement that you know where the duplicates are... thats another story, but it wasn't in the original problem.
question,
1) is the number of items in the array 10^15
2) or can the value of the items be 10^15?
if it is #1:
where are you pulling the nummbers from? if its a file you can step through it.
are there more than 2,147,483,647 unique numbers?
if its #2:
a int64 can handle the number
if its #1 and #2:
are there more than 2,147,483,647 unique numbers?
if there are less then 2,147,483,647 unique numbers you can use a List<bigint>
In my program I have a bunch of growing arrays where a new element is grown one by one to the end of the array. I identified Lists to be a speed bottleneck in a critical part of my program due to their slow access time in comparison with an array - switching to an array increased performance tremendously to an acceptable level. So to grow the array i'm using Array.Resize. This works well as my implementation restricts the array size to approximately 20 elements, so the O(N) performance of Array.Resize is bounded.
But it would be better if there was a way to just increase an array by one element at the end without having to use Array.Resize; which I believe does a copy of the old array to the newly sized array.
So my question is, is there a more efficiant method for adding one element to the end of an array without using List or Array.Resize?
A List has constant time access just like an array. For 'growing arrays' you really should be using List.
When you know that you may be adding elements to an array backed structure, you don't want to add one new size at a time. Usually it is best to grow an array by doubling it's size when it fills up.
As has been previously mentioned, List<T> is what you are looking for. If you know the initial size of the list, you can supply an initial capacity to the constructor, which will increase your performance for your initial allocations:
List<int> values = new List<int>(5);
values.Add(1);
values.Add(2);
values.Add(3);
values.Add(4);
values.Add(5);
List's allocate 4 elements to begin with (unless you specify a capacity when you construct it) and then grow every 4 elements.
Why don't you try a similar thing with Array? I.e. create it as having 4 elements, then when you insert the fifth element, first grow the array by another 4 elements.
There is no way to resize an array, so the only way to get a larger array is to use Array.Resize to create a new array.
Why not just create the arrays to have 20 elements from start (or whatever capacity you need at most), and use a variable to keep track of how many elements are used in the array? That way you never have to resize any arrays.
Growing an array AFAIK means that a new array is allocated, the existing content being copied to the new instance. I doubt that this should be faster than using List...?
it's much faster to resize an array in chunks (like 10) and store this as a seperate variable e.g capacity and then only resize the array when the capacity is reached. This is how a list works but if you prefer to use arrays then you should look into resizing them in larger chunks especially if you have a large number of Array.Resize calls
I think that every method, that wants to use array, will not be ever optimized because an array is a static structure so I think it's better to use dynamic structures like List or others.