I have a List that contains multiple numbers like this:
1.75
1.25
2.03
1.44
What I want to do is find the difference between the lowest and highest number. In this case it would be 1.25 and 2.03, which would make 0.78.
How should I do it?
The steps are quite simple:
1. Find the largest number in the list
2. Find the smallest number in the list
The result you need = [Result of 1] - [Result of 2]
To implement this you can use LINQ:
// Initialize your list (or use the existing one)
var list = new List<decimal>{ 1.75m, 1.25m, 2.03m, 1.44m};
// The result is maximum of the list minus minimum of the list
var result = list.Max() - list.Min();
// Print or use the result
Console.WriteLine(result); // prints the result 0.78
First determine the maximum and the minimum number, and then subtract the min from the max.
There are a few different ways you can do it, but first a few questions:
Is the list sorted? If it is, then it's really simple.
If it's not sorted, then you may iterate through the list and have two variables "min" and "max", and in the end just find the difference (see the sketch below).
Are you given the list? If not and you are the one adding values to the list, then you can keep track of the added values, and assign them appropriately to your "min" and "max" variables.
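For the unsorted case, here is a minimal sketch of that single-pass min/max loop (using a List<decimal> like the one in the question):
var list = new List<decimal> { 1.75m, 1.25m, 2.03m, 1.44m };
// Track the smallest and largest values seen so far in one pass
decimal min = list[0];
decimal max = list[0];
foreach (var value in list)
{
    if (value < min) min = value;
    if (value > max) max = value;
}
Console.WriteLine(max - min); // prints 0.78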
You can use LINQ
I need to store a set of elements. What I need is functionality to
1. remove (single) elements and
2. add (sets of) elements and
3. each object should only be in the set once and
4. get a random element from the set
I chose the HashSet (C#) since it sports fast methods for removing elements (hashSet.Remove(element)) and adding sets (hashSet.UnionWith(anotherHashSet)), and the nature of a HashSet guarantees that there are no duplicates, so requirements 1 to 3 are taken care of.
The only way I found to get a random element is
var element = hashSet.ElementAt(rnd.Next(hashSet.Count));
But this is very slow, since I call it once for every pixel of my map (creating a random flood fill from multiple starting points; mapsize 500x500 at the moment but I'd like to go bigger) and the hashset holds rather many items. (A quick test shows it blows up to 5752 entries before shrinking again.)
Profiling (CPU sampling) tells me my ElementAt calls take over 50%.
I realize 500x500 operations over a big hashset is no easy task, but other operations (Remove and UnionWith) are called as often as ElementAt, so the main problem seems to be the operation and not the number of calls.
I vaguely understand why getting a certain element from a HashSet is very expensive (when compared to getting it from a list or another ordered data structure), but I just want a random pick. Can it really be so hard, and is there no way around it? Is there a better data structure for my purpose?
Changing everything to Lists doesn't help because now other methods become bottlenecks and it takes even longer.
Converting the HashSet to an array and picking my random element from there, as expected, doesn't help either, because while picking a random element from an array is quick, converting the HashSet to the array in the first place takes longer than running hashSet.ElementAt by itself.
If you want to understand better what I am trying to do: A link to my question and the answer.
I think that OrderedDictionary might suit your purposes:
var dict = new OrderedDictionary();
dict.Add("My String Key", "My String");
dict.Add(12345, 54321);
Console.WriteLine(dict[0]); // Prints "My String"
Console.WriteLine(dict[1]); // Prints 54321
Console.WriteLine(dict["My String Key"]); // Prints "My String"
Console.WriteLine(dict[(object)12345]); // Prints 54321 (note the need to cast!)
This has fast add and remove, and O(1) indexing. It only works with object keys and values though - there's no generic version.
[EDIT] Many years later: We now have the strongly-typed generic SortedDictionary<TKey, TValue> which might be better.
The basic problem is the indexing.
In an array or a list, the data is indexed by its coördinate - usually just a simple int index. In a HashSet, you pick the index yourself - the key. The side-effect is, though, that there is no "coördinate" - the question "element at index 3" doesn't make sense, really. The way it's actually implemented is that the whole HashSet is enumerated, item after item, and the n-th item is returned. This means that to get the 1000th item, you have to enumerate all the 999 items before that as well. This hurts.
The best way to solve this would be to pick the random based on an actual key of the HashSet. Of course, this only works if it's reasonable to pick random keys just like that.
If you can't pick the key at random in a satisfactory way, you'll probably want to keep two separate lists - whenever you add a new item to a HashSet, add its key to a List<TKey>; you can then easily pick a random key from the List, and follow it. Depending on your requirements, duplicates may not be much of a problem.
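A rough, untested sketch of that idea, using int elements for simplicity (stale entries left behind by Remove are skipped and compacted lazily when they turn up):
class RandomPickSet
{
    private readonly HashSet<int> set = new HashSet<int>();
    private readonly List<int> items = new List<int>(); // parallel list for O(1) random indexing
    private readonly Random rnd = new Random();

    public bool Add(int value)
    {
        if (!set.Add(value)) return false; // the HashSet still guarantees uniqueness
        items.Add(value);
        return true;
    }

    public bool Remove(int value)
    {
        return set.Remove(value); // the list keeps a stale entry for now
    }

    // Assumes the set is not empty.
    public int PickRandom()
    {
        while (true)
        {
            int i = rnd.Next(items.Count);
            int candidate = items[i];
            if (set.Contains(candidate)) return candidate;

            // Stale entry: swap-remove it from the list and try again
            items[i] = items[items.Count - 1];
            items.RemoveAt(items.Count - 1);
        }
    }
}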
And of course, you could save on the ElementAt enumerations if you only do the enumeration once - for example, before searching the HashSet, you could convert it to List. This only makes sense if you're picking multiple random indices at once, of course (e.g. if you pick 5 indices at random at once, you'll save about 1/5th of the time on average) - if you're always picking one, then modifying the HashSet and picking another, it's not going to help.
Depending on your exact use case, it might also be worth having a look at SortedSet. It works in a similar way to HashSet, but it maintains order in the keys. The helpful part is that you can use the GetViewBetween method to get a whole range of keys - you could use this quite effectively if your keys are sparse, but well balanced between arbitrary ranges. You'd just first pick a range at random, then get the items in range with GetViewBetween, and pick a random one out of those as well. In effect, this will allow you to partition the search results, and should save quite a bit of time.
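As a rough sketch of the GetViewBetween idea (the int encoding of map cells and the bucket width are assumptions of mine):
// Assumed encoding: each open map cell is stored as y * 500 + x in a SortedSet
var cells = new SortedSet<int>();
var rnd = new Random();
const int mapSize = 500 * 500;
const int bucketWidth = 1000; // arbitrary partition width for this sketch

// Pick a random bucket of the key space, then a random element within it.
// If the bucket happens to be empty, retry with another bucket.
int bucketStart = rnd.Next(mapSize / bucketWidth) * bucketWidth;
var view = cells.GetViewBetween(bucketStart, bucketStart + bucketWidth - 1);
if (view.Count > 0)
{
    int cell = view.ElementAt(rnd.Next(view.Count)); // enumerates only the small view
}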
Let's say I have an array of n integers and I am supposed to return
the product of most common integers.
Example: (2, 3, 4, 2, 2, 1, 4) returns 8, since 2 is the most common (it occurs three times) and 2 * 2 * 2 = 8.
I can use normal arrays to store the values.
If I want runtime optimization, should I use a Hashtable or a Dictionary in C#?
Will that help me have space optimization as well?
Or is there any other approach to have space optimization?
You can have speed optimisation or space optimisation, but not both.
If you want space optimisation you can sort the array, then loop through it to look for the longest sequence of the same value. This will need very little storage except the array, but it will change the original array. Sorting would generally be O(n log n), and then looping would be O(n).
If you want speed optimisation, you can use a dictionary to count the occurrences, then get the largest count. This would need on average about n * 10 bytes for the dictionary, but it leaves the original array unchanged. Counting would be close to O(n), and finding the largest count would be O(n).
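A rough sketch of that dictionary-counting approach (my own illustration, not part of the original answer):
int[] numbers = { 2, 3, 4, 2, 2, 1, 4 };

// Count how often each value occurs
var counts = new Dictionary<int, int>();
foreach (int n in numbers)
{
    if (counts.ContainsKey(n))
        counts[n]++;
    else
        counts[n] = 1;
}

// Find the most common value and its count
int bestValue = 0, bestCount = 0;
foreach (var pair in counts)
{
    if (pair.Value > bestCount)
    {
        bestCount = pair.Value;
        bestValue = pair.Key;
    }
}

// Multiply the most common value by itself once per occurrence: 2 * 2 * 2 = 8
long product = 1;
for (int i = 0; i < bestCount; i++)
    product *= bestValue;
Console.WriteLine(product);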
This is so contrived that it looks like homework, so I'm just going to roughly describe how I'll do it. Then you can try something, and come back if you got any questions about it.
I would associate in a table of some sorts (a Dictionary<int, int> is the simplest choice) the numbers with how many times they were encountered. Then it's just a matter of taking the one with the highest count (let's call that count n from now on) and multiplying that number with itself n times.
I want to generate a sequence of unique random numbers in the range of 00000001 to 99999999.
So the first one might be 00001010, the second 40002928 etc.
The easy way is to generate a random number and store it in the database, and every next time do it again and check in the database if the number already exists and if so, generate a new one, check it again, etc.
But that doesn't look right; I could be regenerating a number maybe 100 times if the number of generated items gets large.
Is there a smarter way?
EDIT
As always I forgot to say WHY I wanted this, and it will probably make things clearer and maybe suggest an alternative, so here it is:
we want to generate an order number for a booking, so we could just use 000001, 000002 etc. But we don't want to give the competitors a clue of how many orders are created (because it's not a high volume market, and we don't want them to know whether we are on order 30 after 2 months or at order 100). So we want to have an order number which is random (yet unique).
You can use either a Linear Congruential Generator (LCG) or a Linear Feedback Shift Register (LFSR). Google or Wikipedia for more info.
Both can, with the right parameters, operate on a 'full-cycle' (or 'full period') basis so that they generate each 'pseudo-random number' only once in a single period, and generate all numbers within the range. Both are 'weak' generators, so no good for cryptography, but perhaps 'good enough' for apparent randomness. You may have to constrain the period to work within your 'decimal' maximum, since these generators naturally have 'binary' periods.
Update: I should add that it is not necessary to pre-calculate or pre-store previous values in any way; you only need to keep the previous seed value (a single int) and calculate 'on-demand' the next number in the sequence. Of course you can save a chain of pre-calculated numbers to your DB if desired, but it isn't necessary.
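For illustration, a minimal full-period LCG sketch in C#. The parameters a = 21 and c = 3 are my own choice satisfying the Hull-Dobell conditions for m = 100,000,000, so every value from 0 to 99,999,999 appears exactly once per period; as noted above, such a generator is statistically weak.
const long m = 100000000; // modulus: 10^8 possible values
const long a = 21;        // a - 1 is divisible by 2, 5 (the prime factors of m) and by 4
const long c = 3;         // coprime with m
long x = 12345678;        // seed: the only state that needs to be stored

long NextOrderNumber()
{
    do
    {
        x = (a * x + c) % m;
    } while (x == 0);     // skip 0 so results stay in 00000001..99999999
    return x;
}

Console.WriteLine(NextOrderNumber().ToString("D8")); // e.g. an 8-digit order number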
How about creating a set of all possible numbers and simply randomising the order? You could then just pick the next number from the tail.
Each number appears only once in the set, and when you want a new one it has already been generated, so the overhead is tiny at the point at which you want one. You could do this in memory or the database of your choice. You'll just need a sensible locking strategy for pulling the next available number.
You could build a table with all the possible numbers in it, give the record a 'used' field.
Select all records that have not been 'used'
Pick a random number (r) between 1 and record count
Take record number r
Get your 'random value' from the record
Set the 'used' flag and update the db.
That should be more efficient than picking random numbers, querying the database and repeating until not found, as that approach is just begging for an eternity for the last few values.
Use Pseudo-random Number Generators.
For example - Linear Congruential Random Number Generator
(if increment and n are coprime, then the code will generate all numbers from 0 to n-1):
int seed = 1, increment = 3;
int n = 10;
int x = seed;
for(int i = 0; i < n; i++)
{
x = (x + increment) % n;
Console.WriteLine(x);
}
Output:
4
7
0
3
6
9
2
5
8
1
Basic Random Number Generators
Mersenne Twister
This algorithm might be suitable, though it's memory-consuming:
http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
Put the numbers from 1 to 99999999 in an array and do the shuffle.
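A hedged C# sketch of that approach (memory-heavy, as noted: the array holds all 99,999,999 values, roughly 400 MB of ints):
int count = 99999999;
int[] numbers = new int[count];
for (int i = 0; i < count; i++)
    numbers[i] = i + 1; // fill with 1..99999999

// Fisher-Yates shuffle
var rnd = new Random();
for (int i = count - 1; i > 0; i--)
{
    int j = rnd.Next(i + 1); // 0 <= j <= i
    int tmp = numbers[i];
    numbers[i] = numbers[j];
    numbers[j] = tmp;
}

// Hand the numbers out in order; each one is unique
int next = 0;
int orderNumber = numbers[next++];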
For the extremely limited size of your numbers, no, you cannot expect uniqueness from any type of random generation.
You are generating a 32-bit integer, whereas to reach uniqueness you need a much larger number, in the region of 128 bits, which is the size GUIDs use; those are guaranteed to always be globally unique.
In case you happen to have access to a library and you want to dig into and understand the issue well, take a look at
The Art of Computer Programming, Volume 2: Seminumerical Algorithms
by Donald E. Knuth. Chapter 3 is all about random numbers.
You could just place your numbers in a set. If the size of the set after generation of your N numbers is too small, generate some more.
Do some trial runs. How many numbers do you have to generate on average? Try to find an optimal solution to the tradeoff "generate too many numbers" / "check too often for duplicates". This optimum is a number M, such that after generating M numbers, your set will likely hold N unique numbers.
Oh, and M can also be calculated: if you need one extra number (your set contains N-1), then the chance of a random number already being in the set is (N-1)/R, with R being the range. I'm going cross-eyed here, so you'll have to figure this out yourself (but this kind of stuff is what makes programming fun, no?).
You could put a unique constraint on the column that contains the random number, then handle any constraint violations by regenerating the number. I think this normally indexes the column as well, so this would be faster.
You've tagged the question with C#, so I'm guessing you're using C# to generate the random number. Maybe think about getting the database to generate the random number in a stored proc, and return it.
You could try generating the ids by using a starting number and an incremental number. You start at a number (say, 12000), then, for each account created, the number goes up by the incremental value.
id = startValue + (totalNumberOfAccounts * incrementalNumber)
If incrementalNumber is a prime value, you should be able to loop around the max account value and not hit another value. This creates the illusion of a random id, but should also have very few conflicts. We do want to handle the conflict case, since, if we encounter one account value that is identical, then when we increment we will bump into another conflict the next time we increment. To handle it, you could add a number that you increase whenever there's a conflict, so the above formula becomes:
id = startValue + (totalNumberOfAccounts * incrementalNumber) + totalConflicts
With the following line we can get e.g. 6 non-repeating random numbers from the range 1 to 100.
var randomNumbers = Enumerable.Range(1, 100)
.OrderBy(n => Guid.NewGuid())
.Take(6)
.OrderBy(n => n);
I've had to do something like this before (create a "random looking" number for part of a URL). What I did was create a list of randomly generated keys. Each time it needed a new number it randomly selected a key index below keys.Count, XORed that key with the given sequence number, then output the XORed value (in base 62) prefixed with the key's index (in base 62).
I also check the output to ensure it does not contain any naughty words. If it does, simply take the next key and have a second go.
Decrypting the number is equally simple (the first digit is the index to the key to use, a simple XOR and you are done).
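A rough sketch of that scheme (my own illustration: the key values are made up, and I use hex instead of base 62 purely to keep the encoding short):
// A small list of pre-generated random keys; an index of 0-2 fits in one hex digit
int[] keys = { 0x5A3C19F2, 0x1B7E44D0, 0x7C0912AB };
var rnd = new Random();

string Encode(int sequenceNumber)
{
    int keyIndex = rnd.Next(keys.Length);
    int masked = sequenceNumber ^ keys[keyIndex];
    return keyIndex.ToString("X1") + masked.ToString("X8"); // key index, then masked value
}

int Decode(string token)
{
    int keyIndex = Convert.ToInt32(token.Substring(0, 1), 16); // first digit selects the key
    int masked = Convert.ToInt32(token.Substring(1), 16);
    return masked ^ keys[keyIndex];
}

Console.WriteLine(Encode(42)); // e.g. "05A3C19D8", depending on which key was picked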
I like andora's answer if you are generating new numbers and might have used it had I known. However if I was to do this again I would have simply used UUIDs. Most (if not every) platform has a method for generating them and the length is just not an issue for URLs.
You could try shuffling the set of possible values then using them sequentially.
I like Lazarus's solution, but if you want to avoid effectively pre-allocating the space for every possible number, just store the used numbers in the table, but build an "unused numbers" list in memory by adding all possible numbers to a collection then deleting every one that's present in the database. Then select one of the remaining numbers and use that, adding it to the list in the database, obviously.
But, like I say, I like Lazarus's solution - I think that's your best bet for most scenarios.
function getShuffledNumbers(count) {
    var shuffledNumbers = new Array();
    var choices = new Array();
    for (var i = 0; i < count; i++) {
        // choose a rank between 1 and the amount of numbers remaining
        var selectedNumber = Math.floor(Math.random() * (99999999 - i)) + 1;
        choices[i] = selectedNumber;
        // Now to figure out the actual number for this rank, work backwards through
        // the earlier choices: every earlier pick at or below the current candidate
        // means the candidate must shift up by one.
        for (var j = 0; j < i; j++) {
            if (choices[i - 1 - j] <= selectedNumber) {
                selectedNumber++;
            }
        }
        shuffledNumbers[i] = selectedNumber;
    }
    return shuffledNumbers;
}
This is as fast a way as I could think of, and it only uses memory as it needs it; however, if you run it all the way through it will use twice as much memory, because it has two arrays, choices and shuffledNumbers.
Running a linear congruential generator once to generate each number is apt to produce rather feeble results. Running it through a number of iterations which is relatively prime to your base (100,000,000 in this case) will improve it considerably. If before reporting each output from the generator, you run it through one or more additional permutation functions, the final output will still be a duplicate-free permutation of as many numbers as you want (up to 100,000,000) but if the proper functions are chosen the result can be cryptographically strong.
create and store in the db two shuffled versions (SHUFFLE_1 and SHUFFLE_2) of the interval [0..N), where N = 10'000;
whenever a new order is created, you assign its id like this:
ORDER_FAKE_INDEX = N*SHUFFLE_1[ORDER_REAL_INDEX / N] + SHUFFLE_2[ORDER_REAL_INDEX % N]
I also came across the same kind of problem, in C#. I finally solved it. Hope it works for you too.
Suppose I need random numbers between 0 and some MaxValue, and I have a Random object, say random.
int n = 0;
while (n < MaxValue)
{
    int i = random.Next(n, MaxValue);
    n++;
    Console.WriteLine(i.ToString());
}
The stupid way: build a table to record them, store all the numbers first, and then, every time a number is used, flag it as "used".
System.Random rnd = new System.Random();
IEnumerable<int> numbers = Enumerable.Range(1, 99999999).OrderBy(r => rnd.Next());
This gives a randomly shuffled collection of ints in your range. You can then iterate through the collection in order.
The nice part about this is that you're not actually creating the entire collection in memory.
Edit: see the comments below - this will in fact generate the entire collection in memory as soon as you iterate to the first element.
You can generate the numbers like below if you are OK with the memory consumption.
import java.util.ArrayList;
import java.util.Collections;

public class UniqueRandomNumbers {

    public static void main(String[] args) {
        ArrayList<Integer> list = new ArrayList<Integer>();
        for (int i = 1; i < 11; i++) {
            list.add(i);
        }
        Collections.shuffle(list);
        for (int i = 0; i < list.size(); i++) {
            System.out.println(list.get(i));
        }
    }
}
In some library code, I have a List that can contain 50,000 items or more.
Callers of the library can invoke methods that result in strings being added to the list. How do I efficiently check for uniqueness of the strings being added?
Currently, just before adding a string, I scan the entire list and compare each string to the to-be-added string. This starts showing scale problems above 10,000 items.
I will benchmark this, but interested in insight.
if I replace the List<> with a Dictionary<>, will ContainsKey() be appreciably faster as the list grows to 10,000 items and beyond?
if I defer the uniqueness check until after all items have been added, will it be faster? At that point I would need to check every element against every other element, still an O(n^2) operation.
EDIT
Some basic benchmark results. I created an abstract class that exposes 2 methods: Fill and Scan. Fill just fills the collection with n items (I used 50,000). Scan scans the list m times (I used 5000) to see if a given value is present. Then I built an implementation of that class for List, and another for HashSet.
The strings used were uniformly 11 characters in length, and randomly generated via a method in the abstract class.
A very basic micro-benchmark.
Hello from Cheeso.Tests.ListTester
filling 50000 items...
scanning 5000 items...
Time to fill: 00:00:00.4428266
Time to scan: 00:00:13.0291180
Hello from Cheeso.Tests.HashSetTester
filling 50000 items...
scanning 5000 items...
Time to fill: 00:00:00.3797751
Time to scan: 00:00:00.4364431
So, for strings of that length, HashSet is roughly 25x faster than List when scanning for uniqueness. Also, for this size of collection, HashSet has zero penalty over List when adding items to the collection.
The results are interesting but, strictly speaking, not valid. To get valid results, I'd need to do warmup intervals, multiple trials, with random selection of the implementation. But I feel confident that that would move the bar only slightly.
Thanks everyone.
EDIT2
After adding randomization and multiple trials, HashSet consistently outperforms List in this case, by about 20x.
These results don't necessarily hold for strings of variable length, more complex objects, or different collection sizes.
You should use the HashSet<T> class, which is specifically designed for what you're doing.
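For illustration, HashSet<T>.Add combines the membership test and the insert in a single near-O(1) call, so you don't even need a separate Contains check:
var seen = new HashSet<string>();
// Add returns false if the string is already in the set
bool isNew = seen.Add("some string");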
Use HashSet<string> instead of List<string>, then it should scale very well.
From my tests, HashSet<string> takes no time compared to List<string> :)
Possibly off-topic, but if you want to scale very large unique sets of strings (millions+) in a language-independent way, you might check out Bloom Filters.
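For a rough idea of how a Bloom filter behaves, here is a toy sketch (the sizing, hash choice and double-hashing scheme are my own assumptions, not a production implementation). It needs System.Collections for BitArray and System.Collections.Generic for IEnumerable<int>:
class BloomFilter
{
    private readonly BitArray bits;
    private readonly int hashCount;

    public BloomFilter(int sizeInBits, int hashCount)
    {
        bits = new BitArray(sizeInBits);
        this.hashCount = hashCount;
    }

    // Set k bit positions for the string
    public void Add(string value)
    {
        foreach (int index in Indexes(value))
            bits[index] = true;
    }

    // False means "definitely not seen"; true means "probably seen" (false positives are possible)
    public bool MightContain(string value)
    {
        foreach (int index in Indexes(value))
            if (!bits[index]) return false;
        return true;
    }

    // Double hashing: derive k positions from two base hashes
    private IEnumerable<int> Indexes(string value)
    {
        int h1 = value.GetHashCode();
        int h2 = Fnv1a(value);
        for (int i = 0; i < hashCount; i++)
        {
            long combined = (h1 + (long)i * h2) & 0x7FFFFFFF;
            yield return (int)(combined % bits.Length);
        }
    }

    private static int Fnv1a(string value)
    {
        unchecked
        {
            int hash = (int)2166136261;
            foreach (char c in value)
                hash = (hash ^ c) * 16777619;
            return hash;
        }
    }
}
The pay-off is a few bits per string instead of storing the strings themselves, at the cost of a small, tunable false-positive rate.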
Does the Contains(T) function not work for you?
I have read that dictionary<> is implemented as an associative array. In some languages (not necessarily anything related to .NET), string indexes are stored as a tree structure that forks at each node based upon the character in the node. Please see http://en.wikipedia.org/wiki/Associative_arrays.
A similar data structure was devised by Aho and Corasick in 1973 (I think). If you store 50,000 strings in such a structure, then it matters not how many strings you are storing; it matters more how long the strings are. If they are all about the same length, then you will likely never see a slow-down in lookups, because the search algorithm is linear in run-time with respect to the length of the string you are searching for. Even for a red-black tree or AVL tree, the search run-time depends more upon the length of the string you are searching for than on the number of elements in the index. However, if you choose to implement your index keys with a hash function, you now incur the cost of hashing the string (going to be O(m), m = string length) and also the lookup of the string in the index, which will likely be on the order of O(log(n)), n = number of elements in the index.
edit: I'm not a .NET guru. Other more experienced people suggest another structure. I would take their word over mine.
edit2: your analysis is a little off for comparing uniqueness. If you use a hashing structure or dictionary, then it will not be an O(n^2) operation because of the reasoning I posted above. If you continue to use a list, then you are correct that it is O(n^2) * (max length of a string in your set) because you must examine each element in the list each time.