I've been searching around for a solution to this, but I think because of how I'm thinking about it, my search phrases might be a bit loaded in favor of topics that aren't completely relevant.
I have a number, say 950,000. This represents an inventory of [widgets] within an entire system. I have about 200 "buckets" that should each receive a portion of this inventory such that there are no widgets left over.
What I would like to happen is for each bucket to receive different amounts. I don't have any solid code to show right now, but here's some pseudo code to illustrate what I've been thinking:
//List<BucketObject> _buckets is a collection of "buckets", each of which has a "quantity" property for holding these numbers.
int _widgetCnt = 950000;
int _bucketCnt = _buckets.Count; //Number of buckets
//To start, each bucket receives (_widgetCnt / _bucketCnt) or 4750.
Random _rnd = new Random();
for (int i = 0; i < _bucketCnt; i++)
{
    int _rndAmt = _rnd.Next(1, _buckets[i].Quantity / 2); //Take SOME from this bucket...
    int _rndBucket = _rnd.Next(0, _bucketCnt); //Get a random bucket index from the List<BucketObject> collection (upper bound is exclusive).
    _buckets[_rndBucket].Quantity += _rndAmt;
    _buckets[i].Quantity -= _rndAmt;
}
Is this a statistically/mathematically proper way to handle this, or is there a distribution formula out there that handles this? The kicker is that while this pseudo code would run 200 times (so each bucket has a chance to alter its quantities) it would have to run X number of times depending on the TYPE of widget (which currently stands at just 11 flavors, but is expected to expand significantly in the future).
{EDIT}
This system is for a commodity trading game. Quantities at the 200 shops must differ because the inventory will determine the price at that station. The distro can't be even because that would make all prices the same. Over time, prices will naturally get out of balance, but the inventory must start out off-balance. And all inventories have to be at least similar in scope (ie, no one shop can have 1 item, and another have 900,000)
Sure, there is a solution. You could use the Dirichlet distribution for such a task. A property of the distribution is that
Σ_i x_i = 1
So the solution would be to sample 200 random values (equal to the number of buckets) from the Dirichlet distribution, and then multiply each value by 950,000 (or whatever the total inventory is); that would give you the number of items per bucket. If you want non-uniform sampling, you can tweak the alpha in the Dirichlet sampling.
The items per bucket will have to be rounded up/down, of course, but that is pretty trivial.
I have Dirichlet sampling code in C# somewhere; if you struggle to implement it, tell me and I'll dig it out.
UPDATE
I found some code (.NET Core 2); below is the excerpt. I used it to sample Dirichlet RNs with the same alpha for every component; making them all different is trivial.
//
// Dirichlet sampling, using Gamma sampling from Math .NET
//
using MathNet.Numerics.Distributions;
using MathNet.Numerics.Random;
static void SampleDirichlet(double alpha, double[] rn)
{
if (rn == null)
throw new ArgumentException("SampleDirichlet:: Results placeholder is null");
if (alpha <= 0.0)
throw new ArgumentException($"SampleDirichlet:: alpha {alpha} is non-positive");
int n = rn.Length;
if (n == 0)
throw new ArgumentException("SampleDirichlet:: Results placeholder is of zero size");
var gamma = new Gamma(alpha, 1.0);
double sum = 0.0;
for(int k = 0; k != n; ++k) {
double v = gamma.Sample();
sum += v;
rn[k] = v;
}
if (sum <= 0.0)
throw new ApplicationException($"SampleDirichlet:: sum {sum} is non-positive");
// normalize
sum = 1.0 / sum;
for(int k = 0; k != n; ++k) {
rn[k] *= sum;
}
}
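To connect this back to the original question, here is a minimal usage sketch of my own (not part of the answer's code) that turns the sampled weights into integer widget counts summing exactly to the total; the leftover-handling at the end is just one possible rounding strategy:
// Sketch only: split 'total' widgets over 'bucketCount' buckets using the SampleDirichlet method above.
// The rounding/leftover handling is an assumption, not taken from the answer.
static int[] DistributeInventory(int total, int bucketCount, double alpha)
{
    var weights = new double[bucketCount];
    SampleDirichlet(alpha, weights); // weights are positive and sum to 1
    var quantities = new int[bucketCount];
    int assigned = 0;
    for (int i = 0; i < bucketCount; ++i)
    {
        quantities[i] = (int)Math.Floor(weights[i] * total);
        assigned += quantities[i];
    }
    // Hand out the rounding leftovers (fewer than bucketCount items) one bucket at a time.
    for (int i = 0; assigned < total; i = (i + 1) % bucketCount, ++assigned)
        quantities[i]++;
    return quantities;
}
// Usage: int[] perBucket = DistributeInventory(950000, 200, 2.0);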
Related
Context: I am building a random-number generating user interface where a user can enter values for the following:
lowerLimit: the lower limit for each randomly generated number
upperLimit: the upper limit for each randomly generated number
maxPrecision: the maximum precision of each randomly generated number
Quantity: the maximum number of random number values to be generated
The question is: how can I ensure that at a given lowerLimit/upperLimit range and at a given precision, that the user does not request a greater quantity than is possible?
Example:
lowerLimit: 1
upperLimit: 1.01
maxPrecision: 3
Quantity: 50
At this precision level (3), there are 11 possible values between 1 and 1.01: 1.000, 1.001, 1.002, 1.003, 1.004, 1.005, 1.006, 1.007, 1.008, 1.009, 1.010, yet the user is asking for 50.
In one version of the function, which returns only distinct values that match the user criteria, I am using a dictionary object to store already-generated values; if a value already exists, I try another random number until I have found X distinct random number values, where X is the user-desired quantity. The problem is that my logic allows for a never-ending loop if the number of possible values is less than the user-entered quantity.
While I could probably employ logic to detect the runaway condition, I thought it would be a nicer approach to somehow calculate the quantity of possible return values in advance to make sure the request is possible. But that logic is eluding me. (I haven't tried anything because I can't think of how to do it.)
Please note: I did see the question Generating random, unique values C# but it does not address the specifics of my question relating to the number of possible values at a given precision and the subsequent runaway condition.
private Random RandomSeed = new Random();
public double GetRandomDouble(double lowerBounds, double upperBounds, int maxPrecision)
{
//Return a randomly-generated double between lowerBounds and upperBounds
//with maximum precision of maxPrecision
double x = (RandomSeed.NextDouble() * ((upperBounds - lowerBounds))) + lowerBounds;
return Math.Round(x, maxPrecision);
}
public double[] GetRandomDoublesUnique(double lowerBounds, double upperBounds, int maxPrecision, int quantity)
{
//This method returns an array of doubles containing randomly-generated numbers
//between user-entered lowerBounds and upperBounds with a maximum precision of
//maxPrecision. The array size is capped at user-entered quantity.
//Create Dictionary to store number values already generated so we can ensure
//we don't have duplicates
Dictionary<double, int> myDoubles = new Dictionary<double, int>();
double[] returnValues = new double[quantity];
double nextValue;
for (int i = 0; i < quantity; i++)
{
nextValue = GetRandomDouble(lowerBounds, upperBounds, maxPrecision);
if (!myDoubles.ContainsKey(nextValue))
{
myDoubles.Add(nextValue, i);
returnValues[i] = nextValue;
}
else
{
i -= 1;
}
}
return returnValues;
}
The number of items can be computed by just subtracting the "position" of the first from the last (pseudo-code below; use Math.Pow to compute 10^x):
(int)(last * 10 ^ precision) - (int)(first * 10 ^ precision)
This may need to be adjusted depending on whether you want boundaries and whether you take decimal (precise) or float/double as input - some +/-1 and Math.Round may need to be sprinkled in to get desired results for all expected values.
After you get the number of items there are essentially two cases:
there are significantly more choices than desired results (i.e. 1 to 100, take 5 random numbers) - use the code you have to filter out duplicates.
the number of choices is close to or less than the desired number of results (i.e. 1 to 10, return 11 random numbers) - pre-generate the list of all values and shuffle.
Experiment with the boundary between "significantly more" and "close" - I'd use 25% as the boundary (i.e. 1 to 100, take 76 - use shuffling) to avoid excessive retries close to the end (which is the exact reason for the slowness/infinite retries of the basic approach).
A correct implementation of shuffle is in Randomize a List<T> (check out similar posts like Generating random, unique values C# for more discussion).
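For illustration, here is a rough sketch of the shuffle branch (the helper name, the value enumeration, and the Fisher-Yates shuffle shown here are my own, not the code from the linked posts):
// Sketch: when quantity is close to the number of possible values, pre-generate every
// representable value at the given precision, shuffle, and take the first N.
static double[] ShuffledValues(double lower, double upper, int precision, int quantity, Random rng)
{
    double step = Math.Pow(10, -precision);
    int count = (int)Math.Round((upper - lower) / step) + 1;
    var values = new double[count];
    for (int i = 0; i < count; i++)
        values[i] = Math.Round(lower + i * step, precision);
    // Fisher-Yates shuffle.
    for (int i = count - 1; i > 0; i--)
    {
        int j = rng.Next(i + 1);
        double tmp = values[i];
        values[i] = values[j];
        values[j] = tmp;
    }
    int resultSize = Math.Min(quantity, count);
    var result = new double[resultSize];
    Array.Copy(values, result, resultSize);
    return result;
}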
The easiest way would probably be to convert the values to integers by multiplying them by 10 ^ precision and then subtract
int lowerInt = (int)(lower * (decimal)Math.Pow(10, precision));
int higherInt = (int)(higher * (decimal)Math.Pow(10, precision));
int possibleValues = higherInt - lowerInt + 1;
I feel like it would defeat the purpose of your project to require the user to know how many possible values there are in advance, since it seems like that's what they are hitting this function for in the first place. I'm assuming that requirement was just to alleviate the technical issues you were having. You can just change your loop to this now:
for (int i = 0; i < possibleValues; i++)
This is what worked based on Josh Williard's answer.
public double[] GetRandomDoublesUnique(double lowerBounds, double upperBounds, int maxPrecision, int quantity)
{
if (lowerBounds >= upperBounds)
{
throw new Exception("Error in GetRandomDoublesUnique: LowerBounds must be less than UpperBounds!");
}
//These next few lines are for the purpose of determining the maximum possible number of return values
//possibleValues is populated to prevent a runaway condition that could occur if the
//max possible values--at the given precision level--is less than the user-selected quantity.
//i.e. if user selects 1 to 1.01, precision of 3, and quantity of 50, there would be a problem
// if we didn't limit loop to the 11 possible values at precision of 3:
//1.000, 1.001, 1.002, 1.003, 1.004, 1.005, 1.006, 1.007, 1.008, 1.009, 1.010
int lowerInt = (int)(lowerBounds * (double)Math.Pow(10, maxPrecision));
int higherInt = (int)(upperBounds * (double)Math.Pow(10, maxPrecision));
int possibleValues = higherInt - lowerInt + 1;
//Create Dictionary to store number values already generated so we can ensure
//we don't have duplicates
Dictionary<double, int> myDoubles = new Dictionary<double, int>();
double[] returnValues = new double[(quantity>possibleValues?possibleValues:quantity)];
double NextValue;
//Iterate through and generate values--limiting to both the user-selected quantity and # of possible values
for (int i = 0; (i < quantity)&&(i<possibleValues); i++)
{
NextValue = GetRandomDouble(lowerBounds, upperBounds, maxPrecision);
if (!myDoubles.ContainsKey(NextValue))
{
myDoubles.Add(NextValue, i);
returnValues[i] = NextValue;
}
else
{
i -= 1;
}
}
return returnValues;
}
First things first:
I have a git repo over here that holds the code of my current efforts and an example data set
Background
The example data set holds a bunch of records in Int32 format. Each record is composed of several bit fields that basically hold info on events where an event is either:
The detection of a photon
The arrival of a synchronizing signal
Each Int32 record can be treated like following C-style struct:
struct {
unsigned TimeTag :16;
unsigned Channel :12;
unsigned Route :2;
unsigned Valid :1;
unsigned Reserved :1;
} TTTRrecord;
Whether we are dealing with a photon record or a sync event, the time tag will always hold the time of the event relative to the start of the experiment (macro-time).
If a record is a photon, valid == 1.
If a record is a sync signal or something else, valid == 0.
If a record is a sync signal, sync type = channel & 7 will give either a value indicating start of frame or end of scan line in a frame.
The last relevant bit of info is that Timetag is 16 bit and thus obviously limited. If the Timetag counter rolls over, the rollover counter is incremented. This rollover (overflow) count can easily be obtained from channel overflow = Channel & 2048.
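For reference, here is a minimal sketch of my own (the helper name is a placeholder) showing how those fields can be unpacked from a raw Int32 record using the bit layout above:
// Sketch: decode one raw TTTR record according to the bit layout above.
static void DecodeRecord(int record, out int timeTag, out int channel, out int valid, out bool isOverflow)
{
    timeTag    = record & 0xFFFF;         // bits 0-15: macro-time tag
    channel    = (record >> 16) & 0xFFF;  // bits 16-27: channel / sync info
    valid      = (record >> 30) & 1;      // bit 30: 1 = photon, 0 = sync or other
    isOverflow = (channel & 2048) != 0;   // rollover marker, i.e. overflow = Channel & 2048
}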
My Goal
These records come in from a high speed scanning microscope and I would like to use these records to reconstruct images from the recorded photon data, preferably at 60 FPS.
To do so, I obviously have all the info:
I can look over all available data, find all overflows, which allows me to reconstruct the sequential macro time for each record (photon or sync).
I also know when the frame started and when each line composing the frame ended (and thus also how many lines there are).
Therefore, to reconstruct a bitmap of size noOfLines * noOfLines I can process the bulk array of records line by line where each time I basically make a "histogram" of the photon events with edges at the time boundary of each pixel in the line.
Put another way, if I know Tstart and Tend of a line, and I know the number of pixels I want to spread my photons over, I can walk through all records of the line and check if the macro time of my photons falls within the time boundary of the current pixel. If so, I add one to the value of that pixel.
This approach works, current code in the repo gives me the image I expect but it is too slow (several tens of ms to calculate a frame).
What I tried already:
The magic happens in the function int[] Renderline (see repo).
public static int[] RenderlineV(int[] someRecords, int pixelduration, int pixelCount)
{
// Will hold the pixels obviously
int[] linePixels = new int[pixelCount];
// Calculate everything (sync, overflow, ...) from the raw records
int[] timeTag = someRecords.Select(x => Convert.ToInt32(x & 65535)).ToArray();
int[] channel = someRecords.Select(x => Convert.ToInt32((x >> 16) & 4095)).ToArray();
int[] valid = someRecords.Select(x => Convert.ToInt32((x >> 30) & 1)).ToArray();
int[] overflow = channel.Select(x => (x & 2048) >> 11).ToArray();
int[] absTime = new int[overflow.Length];
absTime[0] = 0;
Buffer.BlockCopy(overflow, 0, absTime, 4, (overflow.Length - 1) * 4);
absTime = absTime.Cumsum(0, (prev, next) => prev * 65536 + next).Zip(timeTag, (o, tt) => o + tt).ToArray();
long lineStartTime = absTime[0];
int tempIdx = 0;
for (int j = 0; j < linePixels.Length; j++)
{
int count = 0;
for (int i = tempIdx; i < someRecords.Length; i++)
{
if (valid[i] == 1 && lineStartTime + (j + 1) * pixelduration >= absTime[i])
{
count++;
}
}
// Avoid checking records in the raw data that were already binned to a pixel.
linePixels[j] = count;
tempIdx += count;
}
return linePixels;
}
Treating photon records in my data set as an array of structs and addressing members of my struct in an iteration was a bad idea. I could increase speed significantly (2X) by dumping all bitfields into an array and addressing these. This version of the render function is already in the repo.
I also realised I could improve the loop speed by making sure I refer to the .Length property of the array I am running through as this supposedly eliminates bounds checking.
The major speed loss is in the inner loop of this nested set of loops:
for (int j = 0; j < linePixels.Length; j++)
{
int count = 0;
lineStartTime += pixelduration;
for (int i = tempIdx; i < absTime.Length; i++)
{
//if (lineStartTime + (j + 1) * pixelduration >= absTime[i] && valid[i] == 1)
// Seems quicker to calculate the boundary before...
//if (valid[i] == 1 && lineStartTime >= absTime[i] )
// Quicker still...
if (lineStartTime > absTime[i] && valid[i] == 1)
{
// Slow... looking into linePixels[] each iteration is a bad idea.
//linePixels[j]++;
count++;
}
}
// Doing it here is faster.
linePixels[j] = count;
tempIdx += count;
}
Rendering 400 lines like this in a for loop takes roughly 150 ms in a VM (I do not have a dedicated Windows machine right now and I run a Mac myself, I know I know...).
I just installed Win10CTP on a 6 core machine and replacing the normal loops by Parallel.For() increases the speed by almost exactly 6X.
Oddly enough, the non-parallel for loop runs almost at the same speed in the VM or the physical 6 core machine...
Regardless, I cannot imagine that this function cannot be made quicker. I would first like to eke out every bit of efficiency from the line render before I start thinking about other things.
I would like to optimise the function that generates the line to the maximum.
Outlook
Until now, my programming dealt with rather trivial things so I lack some experience but things I think I might consider:
Matlab is/seems very efficient with vectorized operations. Could I achieve similar things in C#, e.g. by using Microsoft.Bcl.Simd? Is my case suited for something like this? Would I see gains even in my VM, or should I definitely move to real HW?
Could I gain from pointer arithmetic/unsafe code to run through my arrays?
...
Any help would be greatly, greatly appreciated.
I apologize beforehand for the quality of the code in the repo, I am still in the quick and dirty testing stage... Nonetheless, criticism is welcomed if it is constructive :)
Update
As some mentioned, absTime is ordered already. Therefore, once a record is hit that is no longer in the current pixel or bin, there is no need to continue the inner loop.
5X speed gain by adding a break...
for (int i = tempIdx; i < absTime.Length; i++)
{
//if (lineStartTime + (j + 1) * pixelduration >= absTime[i] && valid[i] == 1)
// Seems quicker to calculate the boundary before...
//if (valid[i] == 1 && lineStartTime >= absTime[i] )
// Quicker still...
if (lineStartTime > absTime[i] && valid[i] == 1)
{
// Slow... looking into linePixels[] each iteration is a bad idea.
//linePixels[j]++;
count++;
}
else
{
break;
}
}
I have a list that stores well over a million objects within it. I need to look through the list and update the objects found through the below query as efficiently as possible.
I was thinking of using a Dictionary or HashSet, but I'm relatively new to C# and could not figure out how to implement these other two approaches. My current code is simply a LINQ statement searching through an IList.
public IList<LandObject> landObjects = new List<LandObject>();
var lObjsToUpdate = from obj in landObjects
where
obj.position.x >= land.x - size.x &&
obj.position.x <= land.x + size.x &&
obj.position.y >= land.y - size.y &&
obj.position.y <= land.y + size.y
select obj;
foreach(var item in lObjsToUpdate)
{
//do what I need to do with records I've found
}
Could anyone be so kind as to suggest how I might approach this efficiently?
The real answer should involve performance tests and comparisons, and depends on your runtime environment (memory vs. cpu, etc).
The first thing I would try: since land and size are constant, you can maintain a simple HashSet<LandObject> of objects that fit the criteria (in addition to a list or set of all objects, or just all other objects). Every time a new object is added, check if it fits the criteria and, if so, add it to that set. I don't know how good HashSet is at avoiding collisions when working with object references, but you can try to measure it.
BTW, this related question discussing the memory limits of HashSet<int> might interest you. With .NET < 4.5 your HashSet should be OK up to several million items. From what I understand, with some configuration .NET 4.5 removes the 2 GB max object size limitation and you'll be able to go crazy, assuming you have a 64-bit machine.
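A rough sketch of the first suggestion (FitsCriteria and OnObjectAdded are my own placeholder names; land and size are assumed constant, as stated above):
// Sketch: keep a pre-filtered HashSet alongside the full list so the range query never rescans everything.
HashSet<LandObject> objectsInRange = new HashSet<LandObject>();
bool FitsCriteria(LandObject obj)
{
    return obj.position.x >= land.x - size.x && obj.position.x <= land.x + size.x &&
           obj.position.y >= land.y - size.y && obj.position.y <= land.y + size.y;
}
void OnObjectAdded(LandObject obj)
{
    landObjects.Add(obj);
    if (FitsCriteria(obj))
        objectsInRange.Add(obj); // adding/looking up by reference is O(1) on average
}
// Later, iterate over objectsInRange directly instead of filtering landObjects.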
One thing that will probably help with that many iterations is to do the calculations once and use the precomputed variables inside your query. It should also help somewhat to widen the range by 1 on each end so you can eliminate the equality checks:
public IList<LandObject> landObjects = new List<LandObject>();
float xmax = land.x + size.x + 1;
float xmin = land.x - size.x - 1;
float ymax = land.y + size.y + 1;
float ymin = land.y - size.y - 1;
var lObjsToUpdate = from obj in landObjects
where
obj.position.x > xmin &&
obj.position.x < xmax &&
obj.position.y > ymin &&
obj.position.y < ymax
select obj;
Ideally you need a way to partition elements so that you don't have to test every single one to find the ones that fit and the ones that should be thrown out. How you partition will depend on how dense the items are - it might be as simple as partitioning on the integer portion of the X coordinate for instance, or by some suitable scaled value of that coordinate.
Given a method (let's call it Partition for now) that takes an X coordinate and returns a partition value for it, you can filter on the X coordinate fairly quickly as a first-pass to reduce the total number of nodes you need to check. You might need to play with the partition function a little to get the right distribution though.
For example, say that you have floating-point coordinates in the range -100 < X <= 100, with your 1,000,000+ objects distributed fairly uniformly across that range. That would divide the list into 200 partitions of (on average) 5000 entries if partitioned on integer values of X. That means that for every integer step in the X dimension of your search range you only have ~5,000 entries to test.
Here's some code:
public interface IPosition2F
{
float X { get; }
float Y { get; }
}
public class CoordMap<T> where T : IPosition2F
{
SortedDictionary<int, List<T>> map = new SortedDictionary<int,List<T>>();
readonly Func<float, int> xPartition = (x) => (int)Math.Floor(x);
public void Add(T entry)
{
int xpart = xPartition(entry.X);
List<T> col;
if (!map.TryGetValue(xpart, out col))
{
col = new List<T>();
map[xpart] = col;
}
col.Add(entry);
}
public T[] ExtractRange(float left, float top, float right, float bottom)
{
var rngLeft = xPartition(left) - 1;
var rngRight = xPartition(right) + 1;
var cols =
from keyval in map
where keyval.Key >= rngLeft && keyval.Key <= rngRight
select keyval.Value;
var cells =
from cell in cols.SelectMany(c => c)
where cell.X >= left && cell.X <= right &&
cell.Y >= top && cell.Y <= bottom
select cell;
return cells.ToArray();
}
public CoordMap()
{ }
// Create instance with custom partition function
public CoordMap(Func<float, int> partfunc)
{
xPartition = partfunc;
}
}
That will partition on the X coordinate, reducing your final search space. If you wanted to take it a step further you could also partition on the Y coordinate... I'll leave that as an exercise for the reader :)
If your partition function is very finely grained and could result in a large number of partitions, it might be useful to add a ColumnRange function similar to:
public IEnumerable<List<T>> ColumnRange(int left, int right)
{
using (var mapenum = map.GetEnumerator())
{
bool finished = mapenum.MoveNext();
while (!finished && mapenum.Current.Key < left)
finished = mapenum.MoveNext();
while (!finished && mapenum.Current.Key <= right)
{
yield return mapenum.Current.Value;
finished = mapenum.MoveNext();
}
}
}
The ExtractRange method can then use that like so:
public T[] ExtractRange(float left, float top, float right, float bottom)
{
var rngLeft = xPartition(left) - 1;
var rngRight = xPartition(right) + 1;
var cells =
from cell in ColumnRange(rngLeft, rngRight).SelectMany(c => c)
where cell.X >= left && cell.X <= right &&
cell.Y >= top && cell.Y <= bottom
select cell;
return cells.ToArray();
}
I used SortedDictionary for convenience, and because it makes it possible to do an ExtractRange method that is reasonably quick. There are other container types that are possibly better suited to the task.
I have several ordered Lists of X/Y Pairs and I want to calculate an ordered List of X/Y Pairs representing the average of these Lists.
All these Lists (including the "average list") will then be drawn onto a chart (see example picture below).
I have several problems:
The different lists don't have the same amount of values
The X and Y values can increase and decrease and increase (and so on) (see example picture below)
I need to implement this in C#, although I guess that's not really important for the algorithm itself.
Sorry, that I can't explain my problem in a more formal or mathematical way.
EDIT: I replaced the term "function" with "List of X/Y Pairs" which is less confusing.
I would use the method Justin proposes, with one adjustment. He suggests using a mapping table with fractional indices, though I would suggest integer indices. This might sound a little mathematical, but it's no shame to have to read the following twice (I'd have to too). Suppose the point at index i in a list of pairs A has been matched with the closest point in another list B, and that closest point is at index j. To find the closest point in B to A[i+1] you should only consider points in B with an index equal to or larger than j. It will probably be j + 1, but it could be j, or j + 2, j + 3, etc. - just never below j. Even if the point closest to A[i+1] has an index smaller than j, you still shouldn't use that point to interpolate with, since that would result in an unexpected average and graph. I'll take a moment now to create some sample code for you. I hope you see that this optimization makes sense.
EDIT: While implementing this, I realised that j is not only bounded from below (by the method described above), but also bounded from above. When you try the distance from A[i+1] to B[j], B[j+1], B[j+2] etc., you can stop comparing once the distance from A[i+1] to B[j+...] stops decreasing. There's no point in searching further in B. The same reasoning applies as when j was bounded from below: even if some point elsewhere in B were closer, that's probably not the point you want to interpolate with. Doing so would result in an unexpected graph, probably less smooth than you'd expect. An extra bonus of this second bound is the improved performance. I've created the following code:
IEnumerable<Tuple<double, double>> Average(List<Tuple<double, double>> A, List<Tuple<double, double>> B)
{
if (A == null || B == null || A.Any(p => p == null) || B.Any(p => p == null)) throw new ArgumentException();
Func<double, double> square = d => d * d;//squares its argument
Func<int, int, double> euclidianDistance = (a, b) => Math.Sqrt(square(A[a].Item1 - B[b].Item1) + square(A[a].Item2 - B[b].Item2));//computes the distance from A[first argument] to B[second argument]
int previousIndexInB = 0;
for (int i = 0; i < A.Count; i++)
{
double distance = euclidianDistance(i, previousIndexInB);//distance between A[i] and B[j - 1], initially
for (int j = previousIndexInB + 1; j < B.Count; j++)
{
var distance2 = euclidianDistance(i, j);//distance between A[i] and B[j]
if (distance2 < distance)//if it's closer than the previously checked point, keep searching. Otherwise stop the search and return an interpolated point.
{
distance = distance2;
previousIndexInB = j;
}
else
{
break;//don't place the yield return statement here, because that could go wrong at the end of B.
}
}
yield return LinearInterpolation(A[i], B[previousIndexInB]);
}
}
Tuple<double, double> LinearInterpolation(Tuple<double, double> a, Tuple<double, double> b)
{
return new Tuple<double, double>((a.Item1 + b.Item1) / 2, (a.Item2 + b.Item2) / 2);
}
For your information, the function Average returns the same number of interpolated points as the list A contains, which is probably fine, but you should think about this for your specific application. I've added some comments to clarify some details, and I've described all aspects of this code in the text above. I hope it's clear, and otherwise feel free to ask questions.
SECOND EDIT: I misread and thought you had only two lists of points. I have created a generalised version of the function above that accepts multiple lists. It still uses only the principles explained above.
IEnumerable<Tuple<double, double>> Average(List<List<Tuple<double, double>>> data)
{
if (data == null || data.Count < 2 || data.Any(list => list == null || list.Any(p => p == null))) throw new ArgumentException();
Func<double, double> square = d => d * d;
Func<Tuple<double, double>, Tuple<double, double>, double> euclidianDistance = (a, b) => Math.Sqrt(square(a.Item1 - b.Item1) + square(a.Item2 - b.Item2));
var firstList = data[0];
int[] previousIndices = new int[data.Count];//the indices of points which are closest to the previous point firstList[i - 1]
//(or zero if i == 0). This is kept track of per list, except the first list.
for (int i = 0; i < firstList.Count; i++)
{
var closests = new Tuple<double, double>[data.Count];//an array of points used for caching, of which the average will be yielded.
closests[0] = firstList[i];
for (int listIndex = 1; listIndex < data.Count; listIndex++)
{
var list = data[listIndex];
double distance = euclidianDistance(firstList[i], list[previousIndices[listIndex]]);
for (int j = previousIndices[listIndex] + 1; j < list.Count; j++)
{
var distance2 = euclidianDistance(firstList[i], list[j]);
if (distance2 < distance)//if it's closer than the previously checked point, keep searching. Otherwise stop the search and return an interpolated point.
{
distance = distance2;
previousIndices[listIndex] = j;
}
else
{
break;
}
}
closests[listIndex] = list[previousIndices[listIndex]];
}
yield return new Tuple<double, double>(closests.Select(p => p.Item1).Average(), closests.Select(p => p.Item2).Average());
}
}
Actually, doing the specific case for 2 lists separately might have been a good thing: it is easily explained and offers a stepping stone towards understanding the generalised version. Furthermore, the square root could be taken out, since it doesn't change the order of the distances when sorted, just their lengths.
THIRD EDIT: In the comments it became clear there might be a bug. I think there are none, aside from the small bug already mentioned, which shouldn't make any difference except at the end of the graphs. As proof that it actually works, this is the result of it (the dotted line is the average):
I'll use a metaphor of your functions being cars racing down a curvy racetrack, where you want to extract the center-line of the track given the cars' positions. Each car's position can be described as a function of time:
p1(t) = (x1(t), y1(t))
p2(t) = (x2(t), y2(t))
p3(t) = (x3(t), y3(t))
The crucial problem is that the cars are racing at different speeds, which means that p1(10) could be twice as far down the race track as p2(10). If you took a naive average of these two points, and there was a sharp curve in the track between the cars, the average may be far from the track.
If you could just transform your functions to no longer be a function of time, but a function of the distance along the track, then you would be able to do what you want.
One way you could do this would be to choose the slowest car (i.e., the one with the greatest number of samples). Then, for each sample of the slowest car's position, look at all of the other cars' paths, find the two closest points, and choose the point on the interpolated line which is closest to the slowest car's position. Then average these points together. Once you do this for all of the slow car's samples, you have an average path.
I'm assuming that all of the cars start and end in roughly the same places; if any of the cars just race a small portion of the track, you will need to add some more logic to detect that.
A possible improvement (for both performance and accuracy), is to keep track of the most recent sample you are using for each car and the speed of each car (the relative sampling rate). For your slowest car, it would be a simple map: 1 => 1, 2 => 2, 3 => 3, ... For the other cars, though, it could be more like: 1 => 0.3, 2 => 0.7, 3 => 1.6 (fractional values are due to interpolation). The speed would be the inverse of the change in sample number (e.g., the slow car would have speed 1, and the other car would have speed 1/(1.6-0.7)=1.11). You could then ensure that you don't accidentally backtrack on any of the cars. You could also improve the calculation speed because you don't have to search through the whole set of all points on each path; instead, you can assume that the next sample will be somewhere close to the current sample plus 1/speed.
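As a small sketch of the "choose the point on the interpolated line which is closest" step described above (the helper name and the Tuple representation are my own assumptions):
// Sketch: project point p onto the segment between the two closest samples a and b,
// returning the point on that segment nearest to p.
static Tuple<double, double> ClosestPointOnSegment(Tuple<double, double> a, Tuple<double, double> b, Tuple<double, double> p)
{
    double dx = b.Item1 - a.Item1, dy = b.Item2 - a.Item2;
    double lengthSquared = dx * dx + dy * dy;
    if (lengthSquared == 0.0)
        return a; // a and b coincide
    // Parameter of the projection along the segment, clamped to [0, 1].
    double t = ((p.Item1 - a.Item1) * dx + (p.Item2 - a.Item2) * dy) / lengthSquared;
    t = Math.Max(0.0, Math.Min(1.0, t));
    return new Tuple<double, double>(a.Item1 + t * dx, a.Item2 + t * dy);
}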
As these are not y=f(x) functions, are they perhaps something like (x,y)=f(t)?
If so, you could interpolate along t, and calculate avg(x) and avg(y) for each t.
EDIT This of course assumes that t can be made available to your code - so that you have an ordered list of T/X/Y triples.
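A minimal sketch of that idea, assuming the lists have already been interpolated onto a shared, ordered set of t values (the triple layout is my own assumption):
// Sketch: with T/X/Y triples sharing the same t samples, the average curve is the per-t mean.
static List<Tuple<double, double, double>> AveragePerT(List<List<Tuple<double, double, double>>> curves)
{
    var result = new List<Tuple<double, double, double>>();
    int samples = curves[0].Count; // assumes every curve has the same t samples
    for (int i = 0; i < samples; i++)
    {
        double t = curves[0][i].Item1;
        double avgX = curves.Average(c => c[i].Item2);
        double avgY = curves.Average(c => c[i].Item3);
        result.Add(Tuple.Create(t, avgX, avgY));
    }
    return result;
}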
There are several ways this can be done. One is to combine all of your data into one single set of points, and do a best-fit curve through the combined set.
you have e.g. 2 "functions" with
fc1 = { {1,0.3} {2, 0.5} {3, 0.1} }
fc2 = { {1,0.1} {2, 0.8} {3, 0.4} }
You want the arithmetic mean (slang: "average") of the two functions. To do this you just calculate the pointwise arithmetic mean:
fc3 = { {1, (0.3+0.1)/2} ... }
Optimization:
If you have large numbers of points you should first convert your "ordered List of X/Y Pairs" into a Matrix OR at least store the points column-wise like so:
{0.3, 0.1}, {0.5, 0.8}, {0.1, 0.4}
I have a coding/maths problem that I need help translating into C#. It's a poker chip calculator that takes in the BuyIn, the number of players and the total amount of chips for each colour (there are x amount of colours) and their value.
It then shows you every possible combination of chips per person to equal the Buy In. The user can then pick the chipset distribution they would like to use. It's best illustrated with a simple example.
BuyIn: $10
Number of Players: 1
10 Red Chips, $1 value
10 Blue Chips, $2 value
10 Green Chips, $5 value
So, the possible combinations are:
R/B/G
10/0/0
8/1/0
6/2/0
5/0/1
4/3/0
2/4/0
1/2/1
etc.
I have spent a lot of time trying to come up with an algorithm in C#/.NET to work this out. I am stumbling on the variable factor - there's usually only 3 or 4 different chip colours in a set, but there could be any amount. If you have more than one player then you have to count up until TotalChips / NumberOfPlayers.
I started off with a loop through all the chips and then looping from 0 up to NumberOfChips for that colour. And this is pretty much where I have spent the last 4 hours... how do I write the code to loop through x amount of chips and check the value of the sum of the chips and add it to a collection if it equals the BuyIn? I need to change my approach radically methinks...
Can anyone put me on the right track on how to solve this please? Pseudo code would work - thank you for any advice!
The below is my attempt so far - it's hopeless (and won't compile; it's just an example to show you my thought process so far). It might be better not to look at it, as it might bias you towards a solution...
private void SplitChips(List<ChipSuggestion> suggestions)
{
decimal valueRequired = (decimal)txtBuyIn.Value;
decimal checkTotal = 0;
ChipSuggestion suggestion;
//loop through each colour
foreach (Chip chip in (PagedCollectionView)gridChips.ItemsSource)
{
//for each value, loop through them all again
foreach (Chip currentChip in (PagedCollectionView)gridChips.ItemsSource)
{
//start at 0 and go all the way up
for (int i = 0; i < chip.TotalChipsInChipset; i++)
{
checkTotal = currentChip.ChipValue * i;
//if it is greater than than ignore and stop
if (checkTotal > valueRequired)
{
break;
}
else
{
//if it is equal to then this is a match
if (checkTotal == valueRequired)
{
suggestion = new ChipSuggestion();
suggestion.SuggestionName = "Suggestion";
chipRed.NumberPerPlayer = i;
suggestion.Chips.Add(chipRed);
chipBlue.NumberPerPlayer = y;
suggestion.Chips.Add(chipBlue);
chipGreen.NumberPerPlayer = 0;
suggestion.Chips.Add(chipGreen);
//add this to the Suggestion
suggestions.Add(suggestion);
break;
}
}
}
}
}
}
Here's an implementation that reads the number of chips, the chips (their worth and amount) and the buyin and displays the results in your example format. I have explained it through comments, let me know if you have any questions.
class Test
{
static int buyIn;
static int numChips;
static List<int> chips = new List<int>(); // chips[i] = value of chips of color i
static List<int> amountOfChips = new List<int>(); // amountOfChips[i] = number of chips of color i
static void generateSolutions(int sum, int[] solutions, int last)
{
if (sum > buyIn) // our sum is too big, return
return;
if (sum == buyIn) // our sum is just right, print the solution
{
for (int i = 0; i < chips.Count; ++i)
Console.Write("{0}/", solutions[i]);
Console.WriteLine();
return; // and return
}
for (int i = last; i < chips.Count; ++i) // try adding another chip with the same value as the one added at the last step.
// this ensures that no duplicate solutions will be generated, since we impose an order of generation
if (amountOfChips[i] != 0)
{
--amountOfChips[i]; // decrease the amount of chips
++solutions[i]; // increase the number of times chip i has been used
generateSolutions(sum + chips[i], solutions, i); // recursive call
++amountOfChips[i]; // (one of) chip i is no longer used
--solutions[i]; // so it's no longer part of the solution either
}
}
static void Main()
{
Console.WriteLine("Enter the buyin:");
buyIn = int.Parse(Console.ReadLine());
Console.WriteLine("Enter the number of chips types:");
numChips = int.Parse(Console.ReadLine());
Console.WriteLine("Enter {0} chips values:", numChips);
for (int i = 0; i < numChips; ++i)
chips.Add(int.Parse(Console.ReadLine()));
Console.WriteLine("Enter {0} chips amounts:", numChips);
for (int i = 0; i < numChips; ++i)
amountOfChips.Add(int.Parse(Console.ReadLine()));
int[] solutions = new int[numChips];
generateSolutions(0, solutions, 0);
}
}
Enter the buyin:
10
Enter the number of chips types:
3
Enter 3 chips values:
1
2
5
Enter 3 chips amounts:
10
10
10
10/0/0/
8/1/0/
6/2/0/
5/0/1/
4/3/0/
3/1/1/
2/4/0/
1/2/1/
0/5/0/
0/0/2/
Break the problem down recursively by the number of kinds of chips.
For the base case, how many ways are there to make an $X buy-in with zero chips? If X is zero, there is one way: no chips. If X is more than zero, there are no ways to do it.
Now we need to solve the problem for N kinds of chips, given the solution for N - 1. We can take one kind of chip, and consider every possible number of that chip up to the buy-in. For example, if the chip is $2, and the buy-in is $5, try using 0, 1, or 2 of them. For each of these tries, we have to use only the remaining N - 1 chips to make up the remaining value. We can solve that by doing a recursive call, and then adding our current chip to each solution it returns.
private static IEnumerable<IEnumerable<Tuple<Chip, int>>> GetAllChipSuggestions(List<Chip> chips, int players, int totalValue)
{
return GetAllChipSuggestions(chips, players, totalValue, 0);
}
private static IEnumerable<IEnumerable<Tuple<Chip, int>>> GetAllChipSuggestions(List<Chip> chips, int players, int totalValue, int firstChipIndex)
{
if (firstChipIndex == chips.Count)
{
// Base case: we have no chip types remaining
if (totalValue == 0)
{
// One way to make 0 with no chip types
return new[] { Enumerable.Empty<Tuple<Chip, int>>() };
}
else
{
// No ways to make more than 0 with no chip types
return Enumerable.Empty<IEnumerable<Tuple<Chip, int>>>();
}
}
else
{
// Recursive case: try each possible number of this chip type
var allSuggestions = new List<IEnumerable<Tuple<Chip, int>>>();
var currentChip = chips[firstChipIndex];
var maxChips = Math.Min(currentChip.TotalChipsInChipset / players, totalValue / currentChip.ChipValue);
for (var chipCount = 0; chipCount <= maxChips; chipCount++)
{
var currentChipSuggestion = new[] { Tuple.Create(currentChip, chipCount) };
var remainingValue = totalValue - currentChip.ChipValue * chipCount;
// Get all combinations of chips after this one that make up the rest of the value
foreach (var suggestion in GetAllChipSuggestions(chips, players, remainingValue, firstChipIndex + 1))
{
allSuggestions.Add(suggestion.Concat(currentChipSuggestion));
}
}
return allSuggestions;
}
}
For some large combinations this is probably not solvable in a reasonable amount of time.
(It is related to the knapsack problem, which is NP-hard.)
http://en.wikipedia.org/wiki/Knapsack_problem
There are also links with code there that could help you.
Hope this helps a bit.