I'm trying to find multiple ways to solve Project Euler's problem #13. I've already got it solved two different ways, but what I am trying to do this time is to have my solution read from a text file that contains all of the numbers, from there it converts it and adds the column numbers farthest to the right. I also want to solve this problem in a way such that if we were to add new numbers to our list, the list can contain any amount of rows or columns, so it's length is not predefined (non array? I'm not sure if a jagged array would apply properly here since it can't be predefined).
So far I've got:
static void Main(string[] args)
{
List<int> sum = new List<int>();
string bigIntFile = #"C:\Users\Justin\Desktop\BigNumbers.txt";
string result;
StreamReader streamReader = new StreamReader(bigIntFile);
while ((result = streamReader.ReadLine()) != null)
{
for (int i = 0; i < result.Length; i++)
{
int converted = Convert.ToInt32(result.Substring(i, 1));
sum.Add(converted);
}
}
}
which reads the file and converts each character from the string to a single int. I'm trying to think how I can store that int in a collection that is like 2D array, but the collection needs to be versatile and store any # of rows / columns. Any ideas on how to store these digits other than just a basic list? Is there maybe a way I can set up a list so it's like a 2D array that is not predefined? Thanks in advance!
UPDATE: Also I don't want to use "BigInteger". That'd be a little too easy to read the line, convert the string to a BigInt, store it in a BigInt list and then sum up all the integers from there.
There is no resizable 2D collection built into the .NET framework. I'd just go with the "jagged arrays" type of data structure, just with lists:
List<List<int>>
You can also vary this pattern by using an array for each row:
List<int[]>
If you want to read the file a little simpler, here is how:
List<int[]> numbers =
File.EnumerateLines(path)
.Select(lineStr => lineStr.Select(#char => #char - '0').ToArray())
.ToList();
Far less code. You can reuse a lot of built-in stuff to do basic data transformations. That gives you less code to write and to maintain. It is more extensible and it is less prone to bugs.
If you want to select a column from this structure, do it like this:
int colIndex = ...;
int[] column = numbers.Select(row => row[index]).ToArray();
You can encapsulate this line into a helper method to remove noise from your main addition algorithm.
Note, that the efficiency of all those patterns is far less than a 2D array, but in your case it is good enough.
In this case you can simply use an 2D array, since you actually do know in advance its dimensions: 100 x 50.
If for some reason you want to solve a more general problem, you may indeed use a List of Lists, List>.
having said that, I wonder: are you actually trying to sum up all the numbers? if so, I would suggest another approach: consider just which section part of the 50 digit numbers actually influences the first digits of their sum. Hint: you don't need the entire number.
Related
I have to do a program in C# Form, which has to load from a file which looks something like that:
100ACTGGCTTACACTAATCAAG
101TTAAGGCACAGAAGTTTCCA
102ATGGTATAAACCAGAAGTCT
...
120GCATCAGTACGTACCCGTAC
20 lines formed with a number (ID) and 20 letters (ADN); the other file looks like that:
TGCAACGTGTACTATGGACC
In few words, this is a game where a murder is done, there are 20 people; i have to load and split the letters and.. i have to compare them and in the end i have to find the best match.
I have no idea how to do that, I don't know how to load the letters in the array and then to split them.. and then to compare them.
What you want to do here, is use something like a calculation of the Levenshtein distance between the strings.
In simple terms, that provides a count of how many single letters you have to change for a string to become equal to another. In the context of DNA or Proteins, this can be interpreted as representing the number of mutations between two individuals or samples. A shorter distance will therefore indicate a closer relationship between the two.
The algorithm can be fairly heavy computationally, but will give you a good answer. It's also quite fun and enlightening to implement. You can find a couple of ways of implementing it under the wikipedia article.
If you find it challenging to understand how it works, I recommend you set up an example grid by hand, with one short string horizontally along the top, and one vertically along the left side, and try going through the calculations manually, just to understand the concept properly (it can be confusing at first, but is really not that difficult).
This is a simple match function. It might not be of the complexity your game requires. This solution does not require an explicit split on the strings in order to get an array of DNA "letters". The DNA is compared in place.
Compare each "suspect" entry to the "evidence one.
int idLength = 3;
string evidence = //read from file
List<string> suspects = //read from file
List<double> matchScores = new List<double>();
foreach (string suspect in suspects)
{
int count = 0;
for (int i = idLength; i < suspect.Length; i++)
{
if (suspect[i + idLength] == evidence[i]) count++;
}
matchScores.Add(count * 100 / evidence.Length);
}
The matchScores list now contains all the individual match scores. I did not save the maximum match score in a separate variable as there can be several "suspects" with the same score. To find out which subject has the best match, just iterate the matchScores list. The index of the best match is the index of the suspect in the suspects list.
Optimization notes:
you could check each "suspect" string to see where (i.e. at what index does) the DNA sequence starts, as it could be variable;
a dictionary could be used here, instead of two lists, with the "suspect string" as key and the match score as value
I've been struggling for a while now with an error I can't fix.
I searched the Internet without any success and started wandering if it is possible what I want to accomplish.
I want the create an array with the a huge amount of nodes, so huge that it I need BigInteger.
I founded that LinkedList would fit my solution the best, so I started with this code.
BigInteger[] numberlist = { 0, 1 };
LinkedList<BigInteger> number = new LinkedList<BigInteger>(numberlist);
for(c = 2; c <= b; c++)
{
numberlist [b] = 1; /* flag numbers to 1 */
}
The meaning of this is to set all nodes in the linkedlist to active (1).
The vars c and b are bigintegers too.
The error I get from VS10 is :
Cannot implicitly convert type 'System.Numerics.BigInteger' to 'int'. An explicit conversion exists (are you missing a cast?)
The questions:
Is it possible to accomplish?
How can I flag all nodes in number with the use of BigInteger (not int)?
Is there an other better method for accomplishing the thing?
UPDATE
In the example I use c++ as the counter. This is variable though...
The node list could look like this:
numberlist[2]
numberlist[3]
numberlist[200]
numberlist[20034759044900]
numberlist[23847982344986350]
I'll remove processed nodes. At the maximum I'll use 1,5gb of memory.
Please reply on this update, I want to know whether my ideas are correct or not.
Also I would like too learn from my mistakes!
The generic argument of LinkedList<T> describes the element type and has nothing to do with the number of elements you can put in the collection.
Indexing into a linked list is a bad idea too. It's an O(n) operation.
And I can't imagine how you can have more elements than what fits into an Int64. There is simply not enough memory to back that.
You can have more than 2^31-1 elements in a 64bit process, but most likely you need to create your own collection type for that, since most built in collections have lower limits.
If you need more than 2^31 flags I'd create my own collection type that's backed by multiple arrays and bitpacks the flags. That way you get about 8*2^31 = 16 billion flags into a 2GB array.
If your data is sparse you could consider using a HashSet<Int64> or Dictionary<Int64,Node>.
If your data has long sequences with the same value you could use some tree structure or perhaps some variant of run-length-encoding.
If you don't need the indexes at all, you could just use a Queue<T> and dequeue from the beginning.
From your update it seems you don't want to have huge amount of data, you just want to index them using huge numbers. If that's the case, you can use Dictionary<BigInteger, int> or Dictionary<BigInteger, bool> if you only want true/false values. Alternatively, you could use HashSet<BigInteger>, if you don't need to distinguish between false and “not in collection”.
LinkedList<BigInteger> is a small number of elements, where each element is a BigInteger.
.NET doesn't allow any single array to be larger than 2GB (even on 64-bit), so there's no point in having an index larger than an int.
Try breaking your big array into smaller segments, where each segment can be addressed by an int.
If I may read your mind, it sounds like what you want a sparse array which is indexed by a BigInteger. As others have mentioned, LinkedList<BigInteger> is entirely the wrong data structure for this. I suggest something entirely different, namely a Dictionary<BigInteger, int>. This allows you to do the following:
Dictionary<BigInteger, int> data = new Dictionary<BigInteger, int>();
BigInteger b = GetBigInteger();
data[b] = 1; // the BigInteger is the *index*, and the integer is the *value*
I have two multi-dimentional arrays declared like this:
bool?[,] biggie = new bool?[500, 500];
bool?[,] small = new bool?[100, 100];
I want to copy part of the biggie one into the small. Let’s say I want from the index 100 to 199 horizontally and 100 to 199 vertically.
I have written a simple for statement that goes like this:
for(int x = 0; x < 100; x++)
{
For(int y = 0; y < 100; y++)
{
Small[x,y] = biggie[x+100,y+100];
}
}
I do this A LOT in my code, and this has proven to be a major performance jammer.
Array.Copy only copies single-dimentional arrays, and with multi-dimentional arrays it just considers as if the whole matrix is a single array, putting each row at the end of the other, which won’t allow me to cut a square in the middle of my array.
Is there a more efficient way to do this?
Ps.: I do consider refactoring my code in order not to do this at all, and doing whatever I want to do with the bigger array. Copying matrixes just can’t be painless, the point is that I have already stumbled upon this before, looked for an answer, and got none.
In my experience, there are two ways to do this efficiently:
Use unsafe code and work directly with pointers.
Convert the 2D array to a 1D array and do the necessary arithmetic when you need to access it as a 2D array.
The first approach is ugly and it uses potentially invalid assumptions since 2D arrays are not guaranteed to be laid out contiguously in memory. The upshot to the first approach is that you don't have to change your code that is already using 2D arrays. The second approach is as efficient as the first approach, doesn't make invalid assumptions, but does require updating your code.
i want to generate a sequence of unique random numbers in the range of 00000001 to 99999999.
So the first one might be 00001010, the second 40002928 etc.
The easy way is to generate a random number and store it in the database, and every next time do it again and check in the database if the number already exists and if so, generate a new one, check it again, etc.
But that doesn't look right, i could be regenerating a number maybe 100 times if the number of generated items gets large.
Is there a smarter way?
EDIT
as allways i forgot to say WHY i wanted this, and it will probably make things clearer and maybe get an alternative, and it is:
we want to generate an ordernumber for a booking, so we could just use 000001, 000002 etc. But we don't want to give the competitors a clue of how much orders are created (because it's not a high volume market, and we don't want them to know if we are on order 30 after 2 months or at order 100. So we want to have an order number which is random (yet unique)
You can use either an Linear Congruential Generator (LCG) or Linear Feedback Shift Register (LFSR). Google or wikipedia for more info.
Both can, with the right parameters, operate on a 'full-cycle' (or 'full period') basis so that they will generate a 'psuedo-random number' only once in a single period, and generate all numbers within the range. Both are 'weak' generators, so no good for cyptography, but perhaps 'good enough' for apparent randomness. You may have to constrain the period to work within your 'decimal' maximum as having 'binary' periods is necessary.
Update: I should add that it is not necessary to pre-calculate or pre-store previous values in any way, you only need to keep the previous seed-value (single int) and calculate 'on-demand' the next number in the sequence. Of course you can save a chain of pre-calculated numbers to your DB if desired, but it isn't necessary.
How about creating a set all of possible numbers and simply randomising the order? You could then just pick the next number from the tail.
Each number appears only once in the set, and when you want a new one it has already been generated, so the overhead is tiny at the point at which you want one. You could do this in memory or the database of your choice. You'll just need a sensible locking strategy for pulling the next available number.
You could build a table with all the possible numbers in it, give the record a 'used' field.
Select all records that have not been 'used'
Pick a random number (r) between 1 and record count
Take record number r
Get your 'random value' from the record
Set the 'used' flag and update the db.
That should be more efficient than picking random numbers, querying the database and repeat until not found as that's just begging for an eternity for the last few values.
Use Pseudo-random Number Generators.
For example - Linear Congruential Random Number Generator
(if increment and n are coprime, then code will generate all numbers from 0 to n-1):
int seed = 1, increment = 3;
int n = 10;
int x = seed;
for(int i = 0; i < n; i++)
{
x = (x + increment) % n;
Console.WriteLine(x);
}
Output:
4
7
0
3
6
9
2
5
8
1
Basic Random Number Generators
Mersenne Twister
Using this algorithm might be suitable, though it's memory consuming:
http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
Put the numbers in the array from 1 to 99999999 and do the shuffle.
For the extremely limited size of your numbers no you cannot expect uniqueness for any type of random generation.
You are generating a 32bit integer, whereas to reach uniqueness you need a much larger number in terms around 128bit which is the size GUIDs use which are guaranteed to always be globally unique.
In case you happen to have access to a library and you want to dig into and understand the issue well, take a look at
The Art of Computer Programming, Volume 2: Seminumerical Algorithms
by Donald E. Knuth. Chapter 3 is all about random numbers.
You could just place your numbers in a set. If the size of the set after generation of your N numbers is too small, generate some more.
Do some trial runs. How many numbers do you have to generate on average? Try to find out an optimal solution to the tradeoff "generate too many numbers" / "check too often for duplicates". This optimal is a number M, so that after generating M numbers, your set will likely hold N unique numbers.
Oh, and M can also be calculated: If you need an extra number (your set contains N-1), then the chance of a random number already being in the set is (N-1)/R, with R being the range. I'm going crosseyed here, so you'll have to figure this out yourself (but this kinda stuff is what makes programming fun, no?).
You could put a unique constraint on the column that contains the random number, then handle any constraint voilations by regenerating the number. I think this normally indexes the column as well so this would be faster.
You've tagged the question with C#, so I'm guessing you're using C# to generate the random number. Maybe think about getting the database to generate the random number in a stored proc, and return it.
You could try giving writing usernames by using a starting number and an incremental number. You start at a number (say, 12000), then, for each account created, the number goes up by the incremental value.
id = startValue + (totalNumberOfAccounts * inctrementalNumber)
If incrementalNumber is a prime value, you should be able to loop around the max account value and not hit another value. This creates the illusion of a random id, but should also have very little conflicts. In the case of a conflicts, you could add a number to increase when there's a conflict, so the above code becomes. We want to handle this case, since, if we encounter one account value that is identical, when we increment, we will bump into another conflict when we increment again.
id = startValue + (totalNumberOfAccounts * inctrementalNumber) + totalConflicts
By fallowing line we can get e.g. 6 non repetitive random numbers for range e.g. 1 to 100.
var randomNumbers = Enumerable.Range(1, 100)
.OrderBy(n => Guid.NewGuid())
.Take(6)
.OrderBy(n => n);
I've had to do something like this before (create a "random looking" number for part of a URL). What I did was create a list of keys randomly generated. Each time it needed a new number it simply randomly selected a number from keys.Count and XOR the key and the given sequence number, then outputted XORed value (in base 62) prefixed with the keys index (in base 62).
I also check the output to ensure it does not contain any naught words. If it does simply take the next key and have a second go.
Decrypting the number is equally simple (the first digit is the index to the key to use, a simple XOR and you are done).
I like andora's answer if you are generating new numbers and might have used it had I known. However if I was to do this again I would have simply used UUIDs. Most (if not every) platform has a method for generating them and the length is just not an issue for URLs.
You could try shuffling the set of possible values then using them sequentially.
I like Lazarus's solution, but if you want to avoid effectively pre-allocating the space for every possible number, just store the used numbers in the table, but build an "unused numbers" list in memory by adding all possible numbers to a collection then deleting every one that's present in the database. Then select one of the remaining numbers and use that, adding it to the list in the database, obviously.
But, like I say, I like Lazaru's solution - I think that's your best bet for most scenarios.
function getShuffledNumbers(count) {
var shuffledNumbers = new Array();
var choices = new Array();
for (var i = 0; i<count; i++) {
// choose a number between 1 and amount of numbers remaining
choices[i] = selectedNumber = Math.ceil(Math.random()*(99999999 - i));
// Now to figure out the number based on this selection, work backwards until
// you figure out which choice this number WOULD have been on the first step
for (var j = 0; j < i; j++) {
if (choices[i - 1 - j] >= selectedNumber) {
// This basically says "it was choice number (selectedNumber) on the last step,
// but if it's greater than or equal to this, it must have been choice number
// (selectedNumber + 1) on THIS step."
selectedNumber++;
}
}
shuffledNumbers[i] = selectedNumber;
}
return shuffledNumbers;
}
This is as fast a way I could think of and only uses memory as it needs, however if you run it all the way through it will use double as much memory because it has two arrays, choices and shuffledNumbers.
Running a linear congruential generator once to generate each number is apt to produce rather feeble results. Running it through a number of iterations which is relatively prime to your base (100,000,000 in this case) will improve it considerably. If before reporting each output from the generator, you run it through one or more additional permutation functions, the final output will still be a duplicate-free permutation of as many numbers as you want (up to 100,000,000) but if the proper functions are chosen the result can be cryptographically strong.
create and store ind db two shuffled versions(SHUFFLE_1 and SHUFFLE_2) of the interval [0..N), where N=10'000;
whenever a new order is created, you assign its id like this:
ORDER_FAKE_INDEX = N*SHUFFLE_1[ORDER_REAL_INDEX / N] + SHUFFLE_2[ORDER_REAL_INDEX % N]
I also came with same kind of problem but in C#. I finally solved it. Hope it works for you also.
Suppose I need random number between 0 and some MaxValue and having a Random type object say random.
int n=0;
while(n<MaxValue)
{
int i=0;
i=random.Next(n,MaxValue);
n++;
Write.Console(i.ToString());
}
the stupid way: build a table to record, store all the numble first, and them ,every time the numble used, and flag it as "used"
System.Random rnd = new System.Random();
IEnumerable<int> numbers = Enumerable.Range(0, 99999999).OrderBy(r => rnd.Next());
This gives a randomly shuffled collection of ints in your range. You can then iterate through the collection in order.
The nice part about this is that you're not actually creating the entire collection in memory.
See comments below - this will generate the entire collection in memory when you iterate to the first element.
You can genearate number like below if you are ok with consumption of memory.
import java.util.ArrayList;
import java.util.Collections;
public class UniqueRandomNumbers {
public static void main(String[] args) {
ArrayList<Integer> list = new ArrayList<Integer>();
for (int i=1; i<11; i++) {
list.add(i);
}
Collections.shuffle(list);
for (int i=0; i<11; i++) {
System.out.println(list.get(i));
}
}
}
My requirement is to find a duplicate number in an array of integers of length 10 ^ 15.
I need to find a duplicate in one pass. I know the method (logic) to find a duplicate number from an array, but how can I handle such a large size.
An array of 10^15 of integers would require more than a petabyte to store. You said it can be done in a single pass, so there's no need to store all the data. But even reading this amount of data would take a lot of time.
But wait, if the numbers are integers, they fall into a certain range, let's say N = 2^32. So you only need to search at most N+1 numbers to find a duplicate. Now that's feasible.
You can use a BitVector array with length = 2^(32-5) = 0x0800000
This has a bit for each posible int32 number.
Note: easy solution (BitArray) do´nt support adecuate constructor.
BitVector32[] bv = new BitVector32[0x8000000];
int[] ARR = ....; // Your array
foreach (int I in ARR)
{
int Element = I >> 5;
int Bit = I & 0x1f;
if (bv[Element ][Bit])
{
// Option for First Duplicate Found
}
else
{
bv[I][Bit] = true;
}
}
You'll need a different data structure. I suspect the requirement isn't really to use an array - I'd hope not, as arrays can only hold up to Int32.MaxValue elements, i.e. 2,147,483,647... much less than 10^15. Even on a 64-bit machine, I believe the CLR requires that arrays have at most that many elements. (See the docs for Array.CreateInstance for example - even though you can specify the bounds as 64-bit integers, it will throw an exception if they're actually that big.)
Now, if you can explain what the real requirement is, we may well be able to suggest alternative data structures.
If this is a theoretical problem rather than a practical one, it would be helpful if you could tell us those constraints, too.
For example, if you've got enough memory for the array itself, then asking for 2^24 bytes to store which numbers you've seen already (one bit per value) isn't asking for much. This is assuming the values themselves are 32-bit integers, of course. Start with an empty array, and set the relevant bit for each number you find. If you find you're about to set one that's already set, then you've found the first duplicate.
You can declare it in the normal way: new int[1000000000000000]. However this will only work on a 64-bit machine; the largest you can expect to store on a 32-bit machine is a little over 2GB.
Realistically you won't be able to store the whole array in memory. You'll need to come up with a way of generating it in smaller chunks, and checking those chunks individually.
What's the data in the array? Maybe you don't need to generate it all in one go. Or maybe you can store the data in a file.
You cannot declare an array of size greater than Int32.MaxValue (2^31, or approx. 2*10^9), so you will have to either chain arrays together or use a List<int> to hold all of the values.
Your algorithm should really be the same regardless of the array size. The best time complexity you'll get has got to be (ideally) O(n) of course.
Consider the following pseudo-code for the algorithm:
Create a HashSet<int> of capacity equal to the range of numbers in your array.
Loop over each number in the array and check if it already exists in the hashset.
if no, add it to the hashset now.
if yes, you've found a duplicate.
Memory usage here is far from trivial, but if you want speed, it will do the job.
You don't need to do anything. By definition, there will be a duplicate because 2^32 < 10^15 - there aren't enough numbers to fill a 10^15 array uniquely. :)
Now if there is an additional requirement that you know where the duplicates are... thats another story, but it wasn't in the original problem.
question,
1) is the number of items in the array 10^15
2) or can the value of the items be 10^15?
if it is #1:
where are you pulling the nummbers from? if its a file you can step through it.
are there more than 2,147,483,647 unique numbers?
if its #2:
a int64 can handle the number
if its #1 and #2:
are there more than 2,147,483,647 unique numbers?
if there are less then 2,147,483,647 unique numbers you can use a List<bigint>