combination question - c#

i have an array like below
int[] array = new array[n];// n may be 2,3,4
example for N = 4
int[] array = new array[4];
array[0] = 2;
array[1] = 4;
array[2] = 6;
array[3] = 8;
how i calculate all unrepeated combination of this array without using linq may be within?
2,4,6,8
2,4,8,6
2,8,6,4
2,6,4,6
8,6,4,2
2,4,6,8
.......
.......
.......

Here's a pretty flexible C# implementation using iterators.

Well, given that you are looking for all unrepeated combinations, that means there will be N! such combinations... (so, in your case, N! = 4! = 24 such combinations).
As I'm in the middle of posting this, dommer has pointed out a good implementation.
Just be warned that it's is going to get really slow for large values of N (since there are N! permutations).

Think about the two possible states of the world to see if that sheds any light.
1) There are no dupes in my array(i.e. each number in the array is unique). In this case, how many possible permutations are there?
2) There is one single dupe in the array. So, of the number of permutations that you calculated in part one, how many are just duplicates
Hmmm, lets take a three element array for simplicity
1,3,5 has how many permutations?
1,3,5
1,5,3
3,1,5
3,5,1
5,1,3
5,3,1
So six permutations
Now what happens if we change the list to say 1,5,5?
We get
1,5,5
5,1,5
5,5,1
My question to you would be, how can you express this via factorials?
Perhaps try writing out all the permutations with a four element array and see if the light bulb goes off?

Related

Best performing algorithm for unique trip selection using arrays?

If I am given three arrays of equal length. Each array represents the distance to a specific attraction (ie the first array is only theme parks, the second is only museums, the third is only beaches) on a road trip I am taking. I wan't to determine all possible trips stopping at one of each type of attraction on each trip, never driving backwards, and never visiting the same attraction twice.
IE if I have the following three arrays:
[29 50]
[61 37]
[37 70]
The function would return 3 because the possible combinations would be: (29,61,70)(29,37,70)(50,61,70)
What I've got so far:
public int test(int[] A, int[] B, int[] C) {
int firstStop = 0;
int secondStop = 0;
int thirdStop = 0;
List<List<int>> possibleCombinations = new List<List<int>>();
for(int i = 0; i < A.Length; i++)
{
firstStop = A[i];
for(int j = 0; j < B.Length; j++)
{
if(firstStop < B[j])
{
secondStop = B[j];
for(int k = 0; k < C.Length; k++)
{
if(secondStop < C[k])
{
thirdStop = C[k];
possibleCombinations.Add(new List<int>{firstStop, secondStop, thirdStop});
}
}
}
}
}
return possibleCombinations.Count();
}
This works for the folowing test cases:
Example test: ([29, 50], [61, 37], [37, 70])
OK Returns 3
Example test: ([5], [5], [5])
OK Returns 0
Example test: ([61, 62], [37, 38], [29, 30])
FAIL Returns 0
What is the correct algorithm to calculate this correctly?
What is the best performing algorithm?
How can I tell the performance of this algorithm's time complexity (ie is it O(N*log(N))?)
UPDATE: The question has been rewritten with new details and still is completely unclear and self-contradictory; attempts to clarify the problem with the original poster have been unsuccessful, and the original poster admits to having started coding before understanding the problem themselves. The solution below is correct for the problem as it was originally stated; what the solution to the real problem looks like, no one can say, because no one can say what the real problem is. I'll leave this here for historical purposes.
Let's re-state the problem:
We are given three arrays of distances to attractions along a road.
We wish to enumerate all sequences of possible stops at attractions that do not backtrack. (NOTE: The statement of the problem is to enumerate them; the wrong algorithm given counts them. These are completely different problems. Counting them can be extremely fast. Enumerating them is extremely slow! If the problem is to count them then clarify the problem.)
No other constraints are given in the problem. (For example, it is not given in the problem that we stop at no more than one beach, or that we must stop at one of every kind, or that we must go to a beach before we go to a museum. If those are constraints then they must be stated in the problem)
Suppose there are a total of n attractions. For each attraction either we visit it or we do not. It might seem that there are 2n possibilities. However, there's a problem. Suppose we have two museums, M1 and M2 both 5 km down the road. The possible routes are:
(Start, End) -- visit no attractions on your road trip
(Start, M1, End)
(Start, M2, End)
(Start, M1, M2, End)
(Start, M2, M1, End)
There are five non-backtracking possibilities, not four.
The algorithm you want is:
Partition the attractions by distance, so that all the partitions contain the attractions that are at the same distance.
For each partition, generate a set of all the possible orderings of all the subsets within that partition. Do not forget that "skip all of them" is a possible ordering.
The combinations you want are the Cartesian product of all the partition ordering sets.
That should give you enough hints to make progress. You have several problems to solve here: partitioning, permuting within a partition, and then taking the cross product of arbitrarily many sets. I and many others have written articles on all of these subjects, so do some research if you do not know how to solve these sub-problems yourself.
As for the asymptotic performance: As noted above, the problem given is to enumerate the solutions. The best possible case is, as noted before, 2n for cases where there are no attractions at the same distance, so we are at least exponential. If there are collisions then it becomes a product of many factorials; I leave it to you to work it out, but it's big.
Again: if the problem is to work out the number of solutions, that's much easier. You don't have to enumerate them to know how many solutions there are! Just figure out the number of orderings at each partition and then multiply all the counts together. I leave figuring out the asymptotic performance of partitioning, working out the number of orderings, and multiplying them together as an exercise.
Your solution runs in O(n ^ 3). But if you need to generate all possible combinations and the distances are sorted row and column wise i.e
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
all solutions will degrade to O(n^3) as it requires to compute all possible subsequences.
If the input has lots of data and the distance between each of them is relatively far then a Sort + binary search + recursive solution might be faster.
static List<List<int>> answer = new List<List<int>>();
static void findPaths(List<List<int>> distances, List<int> path, int rowIndex = 0, int previousValue = -1)
{
if(rowIndex == distances.Count)
{
answer.Add(path);
return;
}
previousValue = previousValue == -1 ? distances[0][0] : previousValue;
int startIndex = distances[rowIndex].BinarySearch(previousValue);
startIndex = startIndex < 0 ? Math.Abs(startIndex) - 1 : startIndex;
// No further destination can be added
if (startIndex == distances[rowIndex].Count)
return;
for(int i=startIndex; i < distances[rowIndex].Count; ++i)
{
var temp = new List<int>(path);
int currentValue = distances[rowIndex][i];
temp.Add(currentValue);
findPaths(distances, temp, rowIndex + 1, currentValue);
}
}
The majority of savings in this solution comes from the fact that since the data is already sorted we need not look distances in the next destinations with distance less than the previous value we have.
For smaller and more closed distances this might be a overkill with the additional sorting and binary search overhead making it slower than the straightforward brute force approach.
Ultimately i think this comes down to how your data is and you can try out both approaches and try which one is faster for you.
Note: This solution does not assume strictly increasing distances i.e) [29, 37, 37] is valid here. If you do not want such solution you'll have to change Binary Search to do a upper bound as opposed to lower bound.
Use Dynamic Programming with State. As there are only 3 arrays, so there are only 2*2*2 states.
Combine the arrays and sort it. [29, 37, 37, 50, 61, 70]. And we make an 2d-array: dp[0..6][0..7]. There are 8 states:
001 means we have chosen 1st array.
010 means we have chosen 2nd array.
011 means we have chosen 1st and 2nd array.
.....
111 means we have chosen 1st, 2nd, 3rd array.
The complexity is O(n*8)=O(n)

Find compressed total ZipFile size in KB (not length) [duplicate]

How can I determine size of an array (length / number of items) in C#?
If it's a one-dimensional array a,
a.Length
will give the number of elements of a.
If b is a rectangular multi-dimensional array (for example, int[,] b = new int[3, 5];)
b.Rank
will give the number of dimensions (2) and
b.GetLength(dimensionIndex)
will get the length of any given dimension (0-based indexing for the dimensions - so b.GetLength(0) is 3 and b.GetLength(1) is 5).
See System.Array documentation for more info.
As #Lucero points out in the comments, there is a concept of a "jagged array", which is really nothing more than a single-dimensional array of (typically single-dimensional) arrays.
For example, one could have the following:
int[][] c = new int[3][];
c[0] = new int[] {1, 2, 3};
c[1] = new int[] {3, 14};
c[2] = new int[] {1, 1, 2, 3, 5, 8, 13};
Note that the 3 members of c all have different lengths.
In this case, as before c.Length will indicate the number of elements of c, (3) and c[0].Length, c[1].Length, and c[2].Length will be 3, 2, and 7, respectively.
You can look at the documentation for Array to find out the answer to this question.
In this particular case you probably need Length:
int sizeOfArray = array.Length;
But since this is such a basic question and you no doubt have many more like this, rather than just telling you the answer I'd rather tell you how to find the answer yourself.
Visual Studio Intellisense
When you type the name of a variable and press the . key it shows you a list of all the methods, properties, events, etc. available on that object. When you highlight a member it gives you a brief description of what it does.
Press F1
If you find a method or property that might do what you want but you're not sure, you can move the cursor over it and press F1 to get help. Here you get a much more detailed description plus links to related information.
Search
The search terms size of array in C# gives many links that tells you the answer to your question and much more. One of the most important skills a programmer must learn is how to find information. It is often faster to find the answer yourself, especially if the same question has been asked before.
Use a tutorial
If you are just beginning to learn C# you will find it easier to follow a tutorial. I can recommend the C# tutorials on MSDN. If you want a book, I'd recommend Essential C#.
Stack Overflow
If you're not able to find the answer on your own, please feel free to post the question on Stack Overflow. But we appreciate it if you show that you have taken the effort to find the answer yourself first.
for 1 dimensional array
int[] listItems = new int[] {2,4,8};
int length = listItems.Length;
for multidimensional array
int length = listItems.Rank;
To get the size of 1 dimension
int length = listItems.GetLength(0);
yourArray.Length :)
With the Length property.
int[] foo = new int[10];
int n = foo.Length; // n == 10
For a single dimension array, you use the Length property:
int size = theArray.Length;
For multiple dimension arrays the Length property returns the total number of items in the array. You can use the GetLength method to get the size of one of the dimensions:
int size0 = theArray.GetLength(0);
In most of the general cases 'Length' and 'Count' are used.
Array:
int[] myArray = new int[size];
int noOfElements = myArray.Length;
Typed List Array:
List <int> myArray = new List<int>();
int noOfElements = myArray.Count;
it goes like this:
1D:
type[] name=new type[size] //or =new type[]{.....elements...}
2D:
type[][]name=new type[size][] //second brackets are emtpy
then as you use this array :
name[i]=new type[size_of_sec.Dim]
or You can declare something like a matrix
type[ , ] name=new type [size1,size2]
What has been missed so far is what I suddenly was irritated about:
How do I know the amount of items inside the array? Is .Length equal .Count of a List?
The answer is: the amount of items of type X which have been put into an array of type X created with new X[number] you have to carry yourself!
Eg. using a counter: int countItemsInArray = 0 and countItemsInArray++ for every assignment to your array.
(The array just created with new X[number] has all space for number items (references) of type X already allocated, you can assign to any place inside as your first assignment, for example (if number = 100 and the variable name = a) a[50] = new X();.
I don't know whether C# specifies the initial value of each place inside an array upon creation, if it doesn't or the initial value you cannot compare to (because it might be a value you yourself have put into the array), you would have to track which places inside the array you already assigned to too if you don't assign sequentially starting from 0 (in which case all places smaller than countItemsInArray would be assigned to).)
In your question size of an array (length / number of items) depending on whether / is meant to stand for "alternative" or "divide by" the latter still has to be covered (the "number of items" I just gave as "amount of items" and others gave .Length which corresponds to the value of number in my code above):
C# has a sizeof operator (https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/operators/sizeof). It's safe to use for built-in types (such as int) (and only operates on types (not variables)). Thus the size of an array b of type int in bytes would be b.Length * sizeof(int).
(Due to all space of an array already being allocated on creation, like mentioned above, and sizeof only working on types, no code like sizeof(variable)/sizeof(type) would work or yield the amount of items without tracking.)

Calculating numbers from an external file from my project (Project Euler #13)

I'm trying to find multiple ways to solve Project Euler's problem #13. I've already got it solved two different ways, but what I am trying to do this time is to have my solution read from a text file that contains all of the numbers, from there it converts it and adds the column numbers farthest to the right. I also want to solve this problem in a way such that if we were to add new numbers to our list, the list can contain any amount of rows or columns, so it's length is not predefined (non array? I'm not sure if a jagged array would apply properly here since it can't be predefined).
So far I've got:
static void Main(string[] args)
{
List<int> sum = new List<int>();
string bigIntFile = #"C:\Users\Justin\Desktop\BigNumbers.txt";
string result;
StreamReader streamReader = new StreamReader(bigIntFile);
while ((result = streamReader.ReadLine()) != null)
{
for (int i = 0; i < result.Length; i++)
{
int converted = Convert.ToInt32(result.Substring(i, 1));
sum.Add(converted);
}
}
}
which reads the file and converts each character from the string to a single int. I'm trying to think how I can store that int in a collection that is like 2D array, but the collection needs to be versatile and store any # of rows / columns. Any ideas on how to store these digits other than just a basic list? Is there maybe a way I can set up a list so it's like a 2D array that is not predefined? Thanks in advance!
UPDATE: Also I don't want to use "BigInteger". That'd be a little too easy to read the line, convert the string to a BigInt, store it in a BigInt list and then sum up all the integers from there.
There is no resizable 2D collection built into the .NET framework. I'd just go with the "jagged arrays" type of data structure, just with lists:
List<List<int>>
You can also vary this pattern by using an array for each row:
List<int[]>
If you want to read the file a little simpler, here is how:
List<int[]> numbers =
File.EnumerateLines(path)
.Select(lineStr => lineStr.Select(#char => #char - '0').ToArray())
.ToList();
Far less code. You can reuse a lot of built-in stuff to do basic data transformations. That gives you less code to write and to maintain. It is more extensible and it is less prone to bugs.
If you want to select a column from this structure, do it like this:
int colIndex = ...;
int[] column = numbers.Select(row => row[index]).ToArray();
You can encapsulate this line into a helper method to remove noise from your main addition algorithm.
Note, that the efficiency of all those patterns is far less than a 2D array, but in your case it is good enough.
In this case you can simply use an 2D array, since you actually do know in advance its dimensions: 100 x 50.
If for some reason you want to solve a more general problem, you may indeed use a List of Lists, List>.
having said that, I wonder: are you actually trying to sum up all the numbers? if so, I would suggest another approach: consider just which section part of the 50 digit numbers actually influences the first digits of their sum. Hint: you don't need the entire number.

smart way to generate unique random number

i want to generate a sequence of unique random numbers in the range of 00000001 to 99999999.
So the first one might be 00001010, the second 40002928 etc.
The easy way is to generate a random number and store it in the database, and every next time do it again and check in the database if the number already exists and if so, generate a new one, check it again, etc.
But that doesn't look right, i could be regenerating a number maybe 100 times if the number of generated items gets large.
Is there a smarter way?
EDIT
as allways i forgot to say WHY i wanted this, and it will probably make things clearer and maybe get an alternative, and it is:
we want to generate an ordernumber for a booking, so we could just use 000001, 000002 etc. But we don't want to give the competitors a clue of how much orders are created (because it's not a high volume market, and we don't want them to know if we are on order 30 after 2 months or at order 100. So we want to have an order number which is random (yet unique)
You can use either an Linear Congruential Generator (LCG) or Linear Feedback Shift Register (LFSR). Google or wikipedia for more info.
Both can, with the right parameters, operate on a 'full-cycle' (or 'full period') basis so that they will generate a 'psuedo-random number' only once in a single period, and generate all numbers within the range. Both are 'weak' generators, so no good for cyptography, but perhaps 'good enough' for apparent randomness. You may have to constrain the period to work within your 'decimal' maximum as having 'binary' periods is necessary.
Update: I should add that it is not necessary to pre-calculate or pre-store previous values in any way, you only need to keep the previous seed-value (single int) and calculate 'on-demand' the next number in the sequence. Of course you can save a chain of pre-calculated numbers to your DB if desired, but it isn't necessary.
How about creating a set all of possible numbers and simply randomising the order? You could then just pick the next number from the tail.
Each number appears only once in the set, and when you want a new one it has already been generated, so the overhead is tiny at the point at which you want one. You could do this in memory or the database of your choice. You'll just need a sensible locking strategy for pulling the next available number.
You could build a table with all the possible numbers in it, give the record a 'used' field.
Select all records that have not been 'used'
Pick a random number (r) between 1 and record count
Take record number r
Get your 'random value' from the record
Set the 'used' flag and update the db.
That should be more efficient than picking random numbers, querying the database and repeat until not found as that's just begging for an eternity for the last few values.
Use Pseudo-random Number Generators.
For example - Linear Congruential Random Number Generator
(if increment and n are coprime, then code will generate all numbers from 0 to n-1):
int seed = 1, increment = 3;
int n = 10;
int x = seed;
for(int i = 0; i < n; i++)
{
x = (x + increment) % n;
Console.WriteLine(x);
}
Output:
4
7
0
3
6
9
2
5
8
1
Basic Random Number Generators
Mersenne Twister
Using this algorithm might be suitable, though it's memory consuming:
http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
Put the numbers in the array from 1 to 99999999 and do the shuffle.
For the extremely limited size of your numbers no you cannot expect uniqueness for any type of random generation.
You are generating a 32bit integer, whereas to reach uniqueness you need a much larger number in terms around 128bit which is the size GUIDs use which are guaranteed to always be globally unique.
In case you happen to have access to a library and you want to dig into and understand the issue well, take a look at
The Art of Computer Programming, Volume 2: Seminumerical Algorithms
by Donald E. Knuth. Chapter 3 is all about random numbers.
You could just place your numbers in a set. If the size of the set after generation of your N numbers is too small, generate some more.
Do some trial runs. How many numbers do you have to generate on average? Try to find out an optimal solution to the tradeoff "generate too many numbers" / "check too often for duplicates". This optimal is a number M, so that after generating M numbers, your set will likely hold N unique numbers.
Oh, and M can also be calculated: If you need an extra number (your set contains N-1), then the chance of a random number already being in the set is (N-1)/R, with R being the range. I'm going crosseyed here, so you'll have to figure this out yourself (but this kinda stuff is what makes programming fun, no?).
You could put a unique constraint on the column that contains the random number, then handle any constraint voilations by regenerating the number. I think this normally indexes the column as well so this would be faster.
You've tagged the question with C#, so I'm guessing you're using C# to generate the random number. Maybe think about getting the database to generate the random number in a stored proc, and return it.
You could try giving writing usernames by using a starting number and an incremental number. You start at a number (say, 12000), then, for each account created, the number goes up by the incremental value.
id = startValue + (totalNumberOfAccounts * inctrementalNumber)
If incrementalNumber is a prime value, you should be able to loop around the max account value and not hit another value. This creates the illusion of a random id, but should also have very little conflicts. In the case of a conflicts, you could add a number to increase when there's a conflict, so the above code becomes. We want to handle this case, since, if we encounter one account value that is identical, when we increment, we will bump into another conflict when we increment again.
id = startValue + (totalNumberOfAccounts * inctrementalNumber) + totalConflicts
By fallowing line we can get e.g. 6 non repetitive random numbers for range e.g. 1 to 100.
var randomNumbers = Enumerable.Range(1, 100)
.OrderBy(n => Guid.NewGuid())
.Take(6)
.OrderBy(n => n);
I've had to do something like this before (create a "random looking" number for part of a URL). What I did was create a list of keys randomly generated. Each time it needed a new number it simply randomly selected a number from keys.Count and XOR the key and the given sequence number, then outputted XORed value (in base 62) prefixed with the keys index (in base 62).
I also check the output to ensure it does not contain any naught words. If it does simply take the next key and have a second go.
Decrypting the number is equally simple (the first digit is the index to the key to use, a simple XOR and you are done).
I like andora's answer if you are generating new numbers and might have used it had I known. However if I was to do this again I would have simply used UUIDs. Most (if not every) platform has a method for generating them and the length is just not an issue for URLs.
You could try shuffling the set of possible values then using them sequentially.
I like Lazarus's solution, but if you want to avoid effectively pre-allocating the space for every possible number, just store the used numbers in the table, but build an "unused numbers" list in memory by adding all possible numbers to a collection then deleting every one that's present in the database. Then select one of the remaining numbers and use that, adding it to the list in the database, obviously.
But, like I say, I like Lazaru's solution - I think that's your best bet for most scenarios.
function getShuffledNumbers(count) {
var shuffledNumbers = new Array();
var choices = new Array();
for (var i = 0; i<count; i++) {
// choose a number between 1 and amount of numbers remaining
choices[i] = selectedNumber = Math.ceil(Math.random()*(99999999 - i));
// Now to figure out the number based on this selection, work backwards until
// you figure out which choice this number WOULD have been on the first step
for (var j = 0; j < i; j++) {
if (choices[i - 1 - j] >= selectedNumber) {
// This basically says "it was choice number (selectedNumber) on the last step,
// but if it's greater than or equal to this, it must have been choice number
// (selectedNumber + 1) on THIS step."
selectedNumber++;
}
}
shuffledNumbers[i] = selectedNumber;
}
return shuffledNumbers;
}
This is as fast a way I could think of and only uses memory as it needs, however if you run it all the way through it will use double as much memory because it has two arrays, choices and shuffledNumbers.
Running a linear congruential generator once to generate each number is apt to produce rather feeble results. Running it through a number of iterations which is relatively prime to your base (100,000,000 in this case) will improve it considerably. If before reporting each output from the generator, you run it through one or more additional permutation functions, the final output will still be a duplicate-free permutation of as many numbers as you want (up to 100,000,000) but if the proper functions are chosen the result can be cryptographically strong.
create and store ind db two shuffled versions(SHUFFLE_1 and SHUFFLE_2) of the interval [0..N), where N=10'000;
whenever a new order is created, you assign its id like this:
ORDER_FAKE_INDEX = N*SHUFFLE_1[ORDER_REAL_INDEX / N] + SHUFFLE_2[ORDER_REAL_INDEX % N]
I also came with same kind of problem but in C#. I finally solved it. Hope it works for you also.
Suppose I need random number between 0 and some MaxValue and having a Random type object say random.
int n=0;
while(n<MaxValue)
{
int i=0;
i=random.Next(n,MaxValue);
n++;
Write.Console(i.ToString());
}
the stupid way: build a table to record, store all the numble first, and them ,every time the numble used, and flag it as "used"
System.Random rnd = new System.Random();
IEnumerable<int> numbers = Enumerable.Range(0, 99999999).OrderBy(r => rnd.Next());
This gives a randomly shuffled collection of ints in your range. You can then iterate through the collection in order.
The nice part about this is that you're not actually creating the entire collection in memory.
See comments below - this will generate the entire collection in memory when you iterate to the first element.
You can genearate number like below if you are ok with consumption of memory.
import java.util.ArrayList;
import java.util.Collections;
public class UniqueRandomNumbers {
public static void main(String[] args) {
ArrayList<Integer> list = new ArrayList<Integer>();
for (int i=1; i<11; i++) {
list.add(i);
}
Collections.shuffle(list);
for (int i=0; i<11; i++) {
System.out.println(list.get(i));
}
}
}

i need to use large size of array

My requirement is to find a duplicate number in an array of integers of length 10 ^ 15.
I need to find a duplicate in one pass. I know the method (logic) to find a duplicate number from an array, but how can I handle such a large size.
An array of 10^15 of integers would require more than a petabyte to store. You said it can be done in a single pass, so there's no need to store all the data. But even reading this amount of data would take a lot of time.
But wait, if the numbers are integers, they fall into a certain range, let's say N = 2^32. So you only need to search at most N+1 numbers to find a duplicate. Now that's feasible.
You can use a BitVector array with length = 2^(32-5) = 0x0800000
This has a bit for each posible int32 number.
Note: easy solution (BitArray) do´nt support adecuate constructor.
BitVector32[] bv = new BitVector32[0x8000000];
int[] ARR = ....; // Your array
foreach (int I in ARR)
{
int Element = I >> 5;
int Bit = I & 0x1f;
if (bv[Element ][Bit])
{
// Option for First Duplicate Found
}
else
{
bv[I][Bit] = true;
}
}
You'll need a different data structure. I suspect the requirement isn't really to use an array - I'd hope not, as arrays can only hold up to Int32.MaxValue elements, i.e. 2,147,483,647... much less than 10^15. Even on a 64-bit machine, I believe the CLR requires that arrays have at most that many elements. (See the docs for Array.CreateInstance for example - even though you can specify the bounds as 64-bit integers, it will throw an exception if they're actually that big.)
Now, if you can explain what the real requirement is, we may well be able to suggest alternative data structures.
If this is a theoretical problem rather than a practical one, it would be helpful if you could tell us those constraints, too.
For example, if you've got enough memory for the array itself, then asking for 2^24 bytes to store which numbers you've seen already (one bit per value) isn't asking for much. This is assuming the values themselves are 32-bit integers, of course. Start with an empty array, and set the relevant bit for each number you find. If you find you're about to set one that's already set, then you've found the first duplicate.
You can declare it in the normal way: new int[1000000000000000]. However this will only work on a 64-bit machine; the largest you can expect to store on a 32-bit machine is a little over 2GB.
Realistically you won't be able to store the whole array in memory. You'll need to come up with a way of generating it in smaller chunks, and checking those chunks individually.
What's the data in the array? Maybe you don't need to generate it all in one go. Or maybe you can store the data in a file.
You cannot declare an array of size greater than Int32.MaxValue (2^31, or approx. 2*10^9), so you will have to either chain arrays together or use a List<int> to hold all of the values.
Your algorithm should really be the same regardless of the array size. The best time complexity you'll get has got to be (ideally) O(n) of course.
Consider the following pseudo-code for the algorithm:
Create a HashSet<int> of capacity equal to the range of numbers in your array.
Loop over each number in the array and check if it already exists in the hashset.
if no, add it to the hashset now.
if yes, you've found a duplicate.
Memory usage here is far from trivial, but if you want speed, it will do the job.
You don't need to do anything. By definition, there will be a duplicate because 2^32 < 10^15 - there aren't enough numbers to fill a 10^15 array uniquely. :)
Now if there is an additional requirement that you know where the duplicates are... thats another story, but it wasn't in the original problem.
question,
1) is the number of items in the array 10^15
2) or can the value of the items be 10^15?
if it is #1:
where are you pulling the nummbers from? if its a file you can step through it.
are there more than 2,147,483,647 unique numbers?
if its #2:
a int64 can handle the number
if its #1 and #2:
are there more than 2,147,483,647 unique numbers?
if there are less then 2,147,483,647 unique numbers you can use a List<bigint>

Categories