Efficient averaging (moving average) - c#

I have a stream of data (integers) with given (constant) frequency. From time to time I need to compute different averages (predefined). I am looking for solution to do it fast and efficient.
Assumptions:
Sampling rate is constant (predefined) and might be something between 125-500 SPS
Averages I need to compute are predefined and it might me one average or many (for example only last 200ms average or last 250ms and last 500ms). There might be many averages but they are predefined!
At any time I need to be able to compute current average (real time)
What I have right now:
I assume that in particular timeframe there will be always the same amount of data. So having frequency 100SPS I assume that one second contain exactly 100 values
Queue with constant length is created (something like buffer)
For EVERY defined average, Sum variable is created
Every time new sample arrive I place it on the queue.
Every time I have new sample in the queue I add its value to the every Sum variables I have and also remove value of element which is out of the window (based on position in Queue)
Once I need to compute average I just take the particular Sum variable and divide it by number of elements this Sum should contain
To give you more better insight there is a code which I have right now:
public class Buffer<T> : LinkedList<T>
{
private readonly int capacity;
public bool IsFull => Count >= capacity;
public Buffer(int capacity)
{
this.capacity = capacity;
}
public void Enqueue(T item)
{
if (Count == capacity)
{
RemoveFirst();
}
AddLast(item);
}
}
public class MovingAverage
{
private readonly Buffer<float> Buffer;
private static readonly object bufferLock = new object();
public Dictionary<string, float> Sums { get; private set; }
public Dictionary<string, int> Counts { get; private set; }
public MovingAverage(List<int> sampleCounts, List<string> names)
{
if (sampleCounts.Count != names.Count)
{
throw new ArgumentException("Wrong Moving Averages parameters");
}
Buffer = new Buffer<float>(sampleCounts.Max());
Sums = new Dictionary<string, float>();
Counts = new Dictionary<string, int>();
for (int i = 0; i < names.Count; i++)
{
Sums[names[i]] = 0;
Counts[names[i]] = sampleCounts[i];
}
}
public void ProcessAveraging(float val)
{
lock (bufferLock)
{
if (float.IsNaN(val))
{
val = 0;
}
foreach (var keyVal in Counts.OrderBy(a => a.Value))
{
Sums[keyVal.Key] += val;
if (Buffer.Count >= keyVal.Value)
{
Sums[keyVal.Key] -= Buffer.ElementAt(Buffer.Count - keyVal.Value);
}
}
Buffer.Enqueue(val);
}
}
public float GetLastAverage(string averageName)
{
lock (bufferLock)
{
if (Buffer.Count >= Counts[averageName])
{
return Sums[averageName] / Counts[averageName];
}
else
{
return Sums[averageName] / Buffer.Count;
}
}
}
}
That works really nice and is fast enough but in real world having 100 SPS doesnt really mean you will always have 100 samples in 1 second. Sometimes its 100, sometimes 99, sometimes 101. Computing these averages is critical for my system and 1 sample more or less could change a lot. Thats why I need a real timer telling me whether sample is already out of moving-average window or not.
The idea with adding timestamp to every sample seems to be promising

Plenty of answers here.. Might as well add another one :)
This one might need some minor debugging for "off by one" etc - I didn't have a real dataset to work with so perhaps treat it as pseudocode
It's like yours: there's a buffer that is circular - give it enough capacity to hold N samples where N is enough to inspect your moving averages - 100 SPS and want to inspect 250ms I think you'll need at least 25, but we aren't short on space so you could make it more
struct Cirray
{
long _head;
TimedFloat[] _data;
public Cirray(int capacity)
{
_head = 0;
_data = new TimedFloat[capacity];
}
public void Add(float f)
{
_data[_head++%_data.Length] = new TimedFloat() { F = f };
}
public IEnumerable<float> GetAverages(int[] forDeltas)
{
double sum = 0;
long start = _head - 1;
long now = _data[start].T;
int whichDelta = 0;
for (long idx = start; idx >= 0 && whichDelta < forDeltas.Length; idx--)
{
if (_data[idx % _data.Length].T < now - forDeltas[whichDelta])
{
yield return (float)(sum / (start - idx));
whichDelta++;
}
sum += _data[idx % _data.Length].F;
}
}
}
struct TimedFloat
{
[DllImport("Kernel32.dll", CallingConvention = CallingConvention.Winapi)]
private static extern void GetSystemTimePreciseAsFileTime(out long filetime);
private float _f;
public float F { get => _f;
set {
_f = value;
GetSystemTimePreciseAsFileTime(out long x);
T = DateTime.FromFileTimeUtc(x).Ticks;
}
}
public long T;
}
The normal DateTime.UtcNow isn't very precise - about 16ms - so it's probably no good for timestamping data like this if youre saying that even one sample could throw it off. Instead we can make it so we get the ticks equivalent of the high resolution timer, if your system supports it (if not, you might have to change system, or abuse a StopWatch class into giving a higher resolution supplement) and we're timestamping every data item.
I thought about going to the complexity of maintaining N number of constantly moving pointers to various tail ends of the data and dec/incrementing N number of sums - it could still be done (and you clearly know how) but your question read like you'd probably call for the averages infrequently enough that an N sums/counts solution would spend more time maintaining the counts than it would to just run through 250 or 500 floats every now and then and just add them up. GetAverages as a result takes an array of ticks (10 thousand per ms) of the ranges you want the data over, e.g. new[] { 50 * 10000, 100 * 10000, 150 * 10000, 200 * 10000, 250 * 10000 } for 50ms to 250ms in steps of 50, and it starts at the current head and sums backwards until the point where it's going to break a time boundary (and this might be the off-by-one bit) whereupon it yields the average for that timespan, then resumes summing and counting (the count given by math of the start minus the current index) for the next time span.. I think I understood right that you want e.g. the "average over the last 50ms" and "average over the last 100ms", not "average for the recent 50ms" and "average for the 50ms before recent"
Edit:
Thought about it some more and did this:
struct Cirray
{
long _head;
TimedFloat[] _data;
RunningAverage[] _ravgs;
public Cirray(int capacity)
{
_head = 0;
_data = new TimedFloat[capacity];
}
public Cirray(int capacity, int[] deltas) : this(capacity)
{
_ravgs = new RunningAverage[deltas.Length];
for (int i = 0; i < deltas.Length; i++)
_ravgs[i] = new RunningAverage() { OverMilliseconds = deltas[i] };
}
public void Add(float f)
{
//in c# every assignment returns the assigned value; capture it for use later
var addedTF = (_data[_head++ % _data.Length] = new TimedFloat() { F = f });
if (_ravgs == null)
return;
foreach (var ra in _ravgs)
{
//add the new tf to each RA
ra.Count++;
ra.Total += addedTF.F;
//move the end pointer in the RA circularly up the array, subtracting/uncounting as we go
var boundary = addedTF.T - ra.OverMilliseconds;
while (_data[ra.EndPointer].T < boundary) //while the sample is timed before the boundary, move the
{
ra.Count--;
ra.Total -= _data[ra.EndPointer].F;
ra.EndPointer = (ra.EndPointer + 1) % _data.Length; //circular indexing
}
}
}
public IEnumerable<float> GetAverages(int[] forDeltas)
{
double sum = 0;
long start = _head - 1;
long now = _data[start].T;
int whichDelta = 0;
for (long idx = start; idx >= 0 && whichDelta < forDeltas.Length; idx--)
{
if (_data[idx % _data.Length].T < now - forDeltas[whichDelta])
{
yield return (float)(sum / (start - idx));
whichDelta++;
}
sum += _data[idx % _data.Length].F;
}
}
public IEnumerable<float> GetAverages() //from the built ins
{
foreach (var ra in _ravgs)
{
if (ra.Count == 0)
yield return 0;
else
yield return (float)(ra.Total / ra.Count);
}
}
}
Absolutely haven't tested it, but it embodies my thinking in the comments

Instead of using a linked list I would fall back to some internal functions as array copy. In this answer I included a possible rewrite for your buffer class. Taking over the idea to keep a sum at every position.
This buffer keeps track of all the sums but in order to do that it needs to sum up every item with the new value. Based on the frequency you need to get that average it might be better to sum up when you need it and only keep the individual values.
In any way I just wanted to point out how you could do it with Array.Copy
public class BufferSum
{
private readonly int _capacity;
private readonly int _last;
private float[] _items;
public int Count { get; private set; }
public bool IsFull => Count >= _capacity;
public BufferSum(int capacity)
{
_capacity = capacity;
_last = capacity - 1;
_items = new float[_capacity];
}
public void Enqueue(float item)
{
if (Count == _capacity)
{
Array.Copy(_items, 1, _items, 0, _last);
_items[_last] = 0;
}
else
{
Count++;
}
for (var i = 0; i < Count; i ++)
{
_items[i] += item;
}
}
public float Avarage => _items[0] / Count;
public float AverageAt(int ms, int fps)
{
var _pos = Convert.ToInt32(ms / 1000 * fps);
return _items[Count - _pos] / _pos;
}
}
Additional be careful with the lock statement that will take a lot of time to.

Make an array of size 500, int counter c.
For every sample:
summ -= A[c % 500] //remove old value
summ += sample
A[c % 500] = sample //replace it with new value
c++
if needed, calculate
average = summ / 500

You always want to remove the oldest element on one side of your sequence and add a new element at the other side of the sequence: you need a queue instead of a stack.
I think a round list will be faster: as long as you have not the maximum size, just add the elements, once you've got the maximum size, replace the oldest element.
This seems like a nice reusable class. Later we'll add the moving average part.
class RoundArray<T>
{
public RoundArray(int maxSize)
{
this.MaxSize = maxSize;
this.roundArray = new List<T>(maxSize);
}
private readonly int maxSize;
private readonly List<T> roundArray;
public int indexOldestItem = 0;
public void Add(T item)
{
// if list not full, just add
if (this.roundArray.Count < this.maxSize)
this.roundArray.Add(item);
else
{
// list is full, replace the oldest item:
this.roundArray[oldestItem] = item;
oldestItem = (oldestItem + 1) % this.maxSize;
}
public int Count => this.roundArray.Count;
public T Oldest => this.roundArray[this.indexOldestItem];
}
}
To make this class useful, add methods to enumerate the data, starting at the oldest or the newest, consider to add other useful reusable methods. Maybe you should implement IReadOnlyCollection<T>. Maybe some private fields should have public properties.
Your moving average calculator will use this RoundArray. Whenever an item is added, and your roundArray is not full yet, the item is added to the sum and to the round array.
If the roundArray is full, then the item replaces the oldest item. You subtract the value of the OldestItem from the Sum, and add the new Item to the Sum.
class MovingAverageCalculator
{
public MovingAverageCalculator(int maxSize)
{
this.roundArray = new RoundArray<int>(maxSize);
}
private readonly RoundArray<int> roundArray;
private int sum = 0;
private int Count => this.RoundArray.Count;
private int Average => this.sum / this.Count;
public voidAdd(int value)
{
if (this.Count == this.MaxSize)
{
// replace: remove the oldest value from the sum and add the new one
this.Sum += value - this.RoundArray.Oldest;
}
else
{
// still building: just add the new value to the Sum
this.Sum += value;
}
this.RoundArray.Add(value);
}
}

Cumulative sums.
Compute a series of cumulative sums1 for every block of ~1000 or so elements. (Could be less however 500 or 1000 is not that much of a difference and this will be more comfortable) You want to hold every block as long as at least one element inside is relevant. Then it can be recycled.2
When you need your current sum and you are within one block, your desired sum is:block[max_index] - block[last_relevant_number].
For the case when you are at the borderline of two blocks b1, b2 in this order, your desired sum is:
b1[b1.length - 1] - b1[last_relevant_number] + b2[max_index]
And we are done. The main advantage of this approach is that you don't need to know beforehands how many elements you want to keep and you can compute the result on the go.
You also don't need to handle the removal of the elements as you will naturally overwrite them when you recycle the segment - keeping the indices is all you need.
Example: let us have a constant timeseries ts = [1,1,1, .... 1]. The cumulative sums of the series will be cumsum = [1,2,3 ... n]. The sum from i-th to the j-th(inclusive) element of the ts will be cumsum[j] - cumsum[i - 1] = j - i - 1. For i = 5, j = 6 it will be 6 - 4 = 2 which is correct.
1 For array [1,2,3,4,5] these would be [1,3,6,10,15] - just for the sake of completeness.
2 Since you mentioned ~500 elements, two blocks should be enough.

Related

How to wrap around the index when reaching the end or beginning of an array in c#?

Im currently writing a weapon script for a FPS and I want to switch weapons with the mouse wheel. I created an array with the weapons in it and everytime I scroll up with the mouse wheel the index of the weapon increases by one. My problem is that when I'm at the last weapon I get IndexOutOfBounds error message. I've tried to reset the weapon index to 0 if its at the end of the array but for some reason that didn't work. I've also tried to do it with a while loop instead of an if-statement but that didn't work as well. Here's the code:
public class WeaponManager : MonoBehaviour
{
[SerializeField]
private WeaponHandler[] weapons;
private int current_weapon_index;
void Start()
{
current_weapon_index = 0;
weapons[current_weapon_index].gameObject.SetActive(true);
}
void Update()
{
if (Input.GetKeyDown(KeyCode.Alpha1))
{
TurnOnSelectedWeapon(0);
}
if (Input.GetKeyDown(KeyCode.Alpha2))
{
TurnOnSelectedWeapon(1);
}
if (Input.GetKeyDown(KeyCode.Alpha3))
{
TurnOnSelectedWeapon(2);
}
if (Input.GetKeyDown(KeyCode.Alpha4))
{
TurnOnSelectedWeapon(3);
}
if (Input.GetKeyDown(KeyCode.Alpha5))
{
TurnOnSelectedWeapon(4);
}
if (Input.GetKeyDown(KeyCode.Alpha6))
{
TurnOnSelectedWeapon(5);
}
if(Input.mouseScrollDelta.y > 0)
{
SwitchToNextWeapon();
}
if (Input.mouseScrollDelta.y < 0)
{
SwitchToPreviousWeapon();
}
}
void TurnOnSelectedWeapon(int weaponIndex)
{
weapons[current_weapon_index].gameObject.SetActive(false);
weapons[weaponIndex].gameObject.SetActive(true);
current_weapon_index = weaponIndex;
}
void SwitchToNextWeapon()
{
weapons[current_weapon_index].gameObject.SetActive(false);
current_weapon_index++;
weapons[current_weapon_index].gameObject.SetActive(true);
if (current_weapon_index >= weapons.Length)
{
current_weapon_index = 0;
}
}
void SwitchToPreviousWeapon()
{
weapons[current_weapon_index].gameObject.SetActive(false);
current_weapon_index--;
weapons[current_weapon_index].gameObject.SetActive(true);
}
}
void SwitchToNextWeapon()
{
weapons[current_weapon_index].gameObject.SetActive(false);
var temp = current_weapon_index + 1;
current_weapon_index = temp >= weapons.Count() ? 0 : temp;
weapons[current_weapon_index].gameObject.SetActive(true);
}
void SwitchToPreviousWeapon()
{
weapons[current_weapon_index].gameObject.SetActive(false);
var temp = current_weapon_index - 1;
current_weapon_index = temp < 0 ? weapons.Count() - 1 : temp;
weapons[current_weapon_index].gameObject.SetActive(true);
}
Just add a check before increasing or decreasing current weapon index. If it reaches max, then revert to 0 and if it's reached min (0) then set index to max.
It would be quite straightforward to implement a class which handles this for you, just maintaining an internal List<> of items. Add a Current property to read the currently selected item, as well as a MoveNext/MovePrevious methods to be called from your mouse wheel handler
public class ContinuousList<T>
{
private List<T> internalList = new List<T>();
private int currentIndex = 0;
public void Add(T item) => internalList.Add(item);
public T Current { get => internalList[currentIndex]; }
public void MoveNext()
{
currentIndex++;
if(currentIndex >= internalList.Count) currentIndex = 0;
}
public void MovePrevious()
{
currentIndex--;
if(currentIndex <= 0) currentIndex = internalList.Count - 1;
}
}
Usage assuming you maybe have some weapons which have a base class Weapon:
var weaponList = new ContinuousList<Weapon>();
weaponList.Add(new Sword());
weaponList.Add(new Axe());
var currentWeapon = weaponList.Current; // gets Sword
weaponList.MoveNext();
var currentWeapon = weaponList.Current; // gets Axe
weaponList.MoveNext();
var currentWeapon = weaponList.Current; // Back to Sword
Live example: https://dotnetfiddle.net/Ji7rkt
Note that it is very easy to implement IEnumerable<T> on this ContinuousList so that it can be used in any enumerations and with LINQ methods. I did not want to complicate a simple example with this but to see this in action check here: https://dotnetfiddle.net/NtdfDi
Cyclic values are easily handled using a modulo (%) operator.
int mod = 5;
for (int i = 0; i < 10; i++)
{
Console.WriteLine(i % mod);
}
You'll see that the output cycles from 0 to mod-1: 0,1,2,3,4,0,1,2,...
This covers the case for incrementing by one:
int index = 4;
int mod = myArray.Length; // assume 5 items in the array
// Increment and cycle
index = ++index % mod;
You'll see that index is now 0 because you were at the end of the list, so the next item should be at the start of the list.
However, there is a bit of an issue for decrementing cyclical values. For a reason I don't understand, C# has opted to allow for negative modulo values, i.e.:
-1 % 5 = -1
... instead of 4, which is what you'd expect.
Edit: It is contended in the comments that 4 is not what everyone would expect. From experience when I was tackling this issue for the first time, I found a lot of confusion/annoyance online at the existence of negative modulo results, but I cannot disprove that this is observation bias on my part.
I've tackled this issue in the past, and the easiest way to solve this is to:
Take the modulo
Add the modulo
Take the modulo again
In essence, if the first step ends up with a negative result (e.g. -1), we simply add the modulo, therefore pushing the value above zero. However, if the first step was already a positive result, we've now made the value too high. Therefore, by taking the modulo again, we are able to cancel out the potentially too high value. This covers both cases.
Here is a dotnetfiddle to prove that it works.
In other words:
public int Increment(int current, int mod)
{
return ((++current % mod) + mod) % mod;
}
public int Decrement(int current, int mod)
{
return ((--current % mod) + mod) % mod;
}
For the sake of DRY, you can reshape it so you only use this complex formula once
public int Cycle(int current, int mod)
{
return ((current % mod) + mod) % mod;
}
... but then you have to manually in/decrement the value first. Which version you prefer is up to you.
This answer is for swift but basically applies in the same way in c#.
So in general you can wrap any given index to an array length by applying modulo twice.
This works in more general cases, not only for moving a single step up and down:
public static class ArrayUtils
{
public static void Forward<T>(ref int currentIndex, T[] array, int amount)
{
currentIndex = WrapIndex(currentIndex + amount, array);
}
public static void Backward<T>(ref int currentIndex, T[] array, int amount)
{
currentIndex = WrapIndex(currentIndex - amount, array);
}
public static int WrapIndex<T>(int newIndex, T[] array)
{
var length = array.Length;
return ((newIndex % length) + length) % length;
}
}
See Fiddle
Which you can now use for any array.

Compare current and last value in list

There is a moving average suppose: 2, 4, 6 , 8 , 10...n;
Then add the current value (10) to list
List<int>numHold = new List<int>();
numhold.Add(currentvalue);
Inside the list:
the current value is added
10
and so on
20
30
40 etc
by using
var lastdigit = numHold[numhold.Count -1];
I can get the last digit but the output is
current: 10 last: 10
current: 20 last: 20
the output should be
current: 20 last: 10
Thanks
Typically, C# indexers start from 0, so the first element has index 0. On the other hand, Count/Length will use 1 for one element. So your
numHold[numhold.Count - 1]
actually takes the last element in the list. If you need the one before that, you need to use - 2 - though be careful you do not reach outside of the bounds of the list (something like Math.Max(0, numhold.Count - 2) might be appropriate).
You can also store the values in separate variables:
List<int> nums = new List<int> { 1 };
int current = 1;
int last = current;
for (int i = 0; i < 10; i++)
{
last = current;
current = i * 2;
nums.Add(current);
}
Console.WriteLine("Current: {0}", current);
Console.WriteLine("Last: {0}", last);
Question is so unclear, but if ur using moving average to draw a line graph 📈 you would use a circular buffer which can be implemented by urself utilizing an object that contains an array of specified size, and the next available position. You could also download a nuget package that already has it done.
A relatively simple way to calculate a moving average is to use a circular buffer to hold the last N values (where N is the number of values for which to compute a moving average).
For example:
public sealed class MovingAverage
{
private readonly int _max;
private readonly double[] _numbers;
private double _total;
private int _front;
private int _count;
public MovingAverage(int max)
{
_max = max;
_numbers = new double[max];
}
public double Average
{
get { return _total / _count; }
}
public void Add(double value)
{
_total += value;
if (_count == _max)
_total -= _numbers[_front];
else
++_count;
_numbers[_front] = value;
_front = (_front+1)%_max;
}
};
which you might use like this:
var test = new MovingAverage(11);
for (int i = 0; i < 25; ++i)
{
test.Add(i);
Console.WriteLine(test.Average);
}
Note that this code is optimised for speed. After a large number of iterations, you might start to get rounding errors. You can avoid this by adding to class MovingAverage a slower method to calculate the average instead of using the Average property:
public double AccurateAverage()
{
double total = 0;
for (int i = 0, j = _front; i < _count; ++i)
{
total += _numbers[j];
if (--j < 0)
j = _max - 1;
}
return total/_count;
}
Your last item will always be at position 0.
List<int>numHold = new List<int>();
numHold.add(currentvalue); //Adding 10
numHold[0]; // will contain 10
numHold.add(currentvalue); //Adding 20
numHold[0]; // will contain 10
numHold[numhold.Count - 1]; // will contain 20
the better way to get first and last are
numHold.first(); //Actually last in your case
numHold.last(); //first in your case

How to elegantly assign values in function of number range?

I am searching of an elegant way to assign values in function of a number belonging to a specific range.
For example, having the number X, the elegant way would return:
'a' - if X is between 0 and 1000
'b' - if X is between 1000 and 1500
and so on (but a fixed number of defined intervals)
By elegant I mean something more appealing than
if ((x => interval_1) && (x < interval_2))
class_of_x = 'a';
else if ((x => interval_2) && (x < interval_3))
class_of_x = 'b';
...
or
if(Enumerable.Range(interval_1, interval_2).Contains(x))
class_of_x = 'a';
else if(Enumerable.Range(interval_2 + 1, interval_3).Contains(x))
class_of_x = 'b';
...
I hate seeing so many IFs.
Also, the interval values can be stored in a collection (maybe this would help me eliminate the ISs?), not necessary as interval_1, interval_2 and so on.
Somewhat inspired by this question How to elegantly check if a number is within a range? which came out while looking for a solution for the problem described above.
You can create extention method:
public static class IntExtensions
{
// min inclusive, max exclusive
public static bool IsBetween(this int source, int min, int max)
{
return source >= min && source < max
}
}
and then
// Item1 = min, Item2 = max, Item3 = character class
IList<Tuple<int, int, char>> ranges = new List<Tuple<int, int, char>>();
// init your ranges here
int num = 1;
// assuming that there certainly is a range which fits num,
// otherwise use "OrDefault"
// it may be good to create wrapper for Tuple,
// or create separate class for your data
char characterClass = ranges.
First(i => num.IsBetween(i.Item1, i.Item2)).Item3;
If my comment is correct then your first if statement has a lot of unnecessary checks, if its not less than interval 2 then it must be greater than or equal to, therefore:
if((x => i1) && (x < i2))
else if(x < i3)
else if(x < i4)...
When a "true" argument is found then the rest of the if statement is irrelevant, as long as your conditions are in order this should suit your needs
Create an Interval class and use LINQ:
public class Interval
{
public string TheValue { get; set; }
public int Start { get; set; }
public int End { get; set; }
public bool InRange(int x)
{
return x >= this.Start && x <= this.End;
}
}
public void MyMethod()
{
var intervals = new List<Interval>();
// Add them here...
var x = 3213;
var correctOne = intervals.FirstOrDefault(i => i.InRange(x));
Console.WriteLine(correctOne.TheValue);
}
Firstly, define a little class to hold the inclusive maximum value, and the corresponding value to use for that band:
sealed class Band
{
public int InclusiveMax;
public char Value;
}
Then declare an array of Band which specifies the value to use for each band and loop to find the corresponding band value for any input:
public char GetSetting(int input)
{
var bands = new[]
{
new Band {InclusiveMax = 1000, Value = 'a'},
new Band {InclusiveMax = 1500, Value = 'b'},
new Band {InclusiveMax = 3000, Value = 'c'}
};
char maxSetting = 'd';
foreach (var band in bands)
if (input <= band.InclusiveMax)
return band.Value;
return maxSetting;
}
Note: In real code, you would wrap all this into a class which initialises the bands array only once, and not every single time it's called (as it is in the code above).
Here you could also use the static System.Linq.Enumerable's Range() method that implements
IEnumerable<T>
with the Contains() method (again from System.Linq.Enumerable), to do something like:
var num = 254;
if(Enumerable.Range(100,300).Contains(num)) { ...your logic here; }
This looks more elegant in my eyes at least.

LINQ to calculate a moving average of a SortedList<dateTime,double>

I have a time series in the form of a SortedList<dateTime,double>. I would like to calculate a moving average of this series. I can do this using simple for loops. I was wondering if there is a better way to do this using linq.
my version:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
var mySeries = new SortedList<DateTime, double>();
mySeries.Add(new DateTime(2011, 01, 1), 10);
mySeries.Add(new DateTime(2011, 01, 2), 25);
mySeries.Add(new DateTime(2011, 01, 3), 30);
mySeries.Add(new DateTime(2011, 01, 4), 45);
mySeries.Add(new DateTime(2011, 01, 5), 50);
mySeries.Add(new DateTime(2011, 01, 6), 65);
var calcs = new calculations();
var avg = calcs.MovingAverage(mySeries, 3);
foreach (var item in avg)
{
Console.WriteLine("{0} {1}", item.Key, item.Value);
}
}
}
class calculations
{
public SortedList<DateTime, double> MovingAverage(SortedList<DateTime, double> series, int period)
{
var result = new SortedList<DateTime, double>();
for (int i = 0; i < series.Count(); i++)
{
if (i >= period - 1)
{
double total = 0;
for (int x = i; x > (i - period); x--)
total += series.Values[x];
double average = total / period;
result.Add(series.Keys[i], average);
}
}
return result;
}
}
}
In order to achieve an asymptotical performance of O(n) (as the hand-coded solution does), you could use the Aggregate function like in
series.Skip(period-1).Aggregate(
new {
Result = new SortedList<DateTime, double>(),
Working = List<double>(series.Take(period-1).Select(item => item.Value))
},
(list, item)=>{
list.Working.Add(item.Value);
list.Result.Add(item.Key, list.Working.Average());
list.Working.RemoveAt(0);
return list;
}
).Result;
The accumulated value (implemented as anonymous type) contains two fields: Result contains the result list build up so far. Working contains the last period-1 elements. The aggregate function adds the current value to the Working list, builds the current average and adds it to the result and then removes the first (i.e. oldest) value from the working list.
The "seed" (i.e. the starting value for the accumulation) is build by putting the first period-1 elements into Working and initializing Result to an empty list.
Consequently tha aggregation starts with element period (by skipping (period-1) elements at the beginning)
In functional programming this is a typical usage pattern for the aggretate (or fold) function, btw.
Two remarks:
The solution is not "functionally" clean in that the same list objects (Working and Result) are reused in every step. I'm not sure if that might cause problems if some future compilers try to parallellize the Aggregate function automatically (on the other hand I'm also not sure, if that's possible after all...). A purely functional solution should "create" new lists at every step.
Also note that C# lacks powerful list expressions. In some hypothetical Python-C#-mixed pseudocode one could write the aggregation function like
(list, item)=>
new {
Result = list.Result + [(item.Key, (list.Working+[item.Value]).Average())],
Working=list.Working[1::]+[item.Value]
}
which would be a bit more elegant in my humble opinion :)
For the most efficient way possible to compute a Moving Average with LINQ, you shouldn't use LINQ!
Instead I propose creating a helper class which computes a moving average in the most efficient way possible (using a circular buffer and causal moving average filter), then an extension method to make it accessible to LINQ.
First up, the moving average
public class MovingAverage
{
private readonly int _length;
private int _circIndex = -1;
private bool _filled;
private double _current = double.NaN;
private readonly double _oneOverLength;
private readonly double[] _circularBuffer;
private double _total;
public MovingAverage(int length)
{
_length = length;
_oneOverLength = 1.0 / length;
_circularBuffer = new double[length];
}
public MovingAverage Update(double value)
{
double lostValue = _circularBuffer[_circIndex];
_circularBuffer[_circIndex] = value;
// Maintain totals for Push function
_total += value;
_total -= lostValue;
// If not yet filled, just return. Current value should be double.NaN
if (!_filled)
{
_current = double.NaN;
return this;
}
// Compute the average
double average = 0.0;
for (int i = 0; i < _circularBuffer.Length; i++)
{
average += _circularBuffer[i];
}
_current = average * _oneOverLength;
return this;
}
public MovingAverage Push(double value)
{
// Apply the circular buffer
if (++_circIndex == _length)
{
_circIndex = 0;
}
double lostValue = _circularBuffer[_circIndex];
_circularBuffer[_circIndex] = value;
// Compute the average
_total += value;
_total -= lostValue;
// If not yet filled, just return. Current value should be double.NaN
if (!_filled && _circIndex != _length - 1)
{
_current = double.NaN;
return this;
}
else
{
// Set a flag to indicate this is the first time the buffer has been filled
_filled = true;
}
_current = _total * _oneOverLength;
return this;
}
public int Length { get { return _length; } }
public double Current { get { return _current; } }
}
This class provides a very fast and lightweight implementation of a MovingAverage filter. It creates a circular buffer of Length N and computes one add, one subtract and one multiply per data-point appended, as opposed to the N multiply-adds per point for the brute force implementation.
Next, to LINQ-ify it!
internal static class MovingAverageExtensions
{
public static IEnumerable<double> MovingAverage<T>(this IEnumerable<T> inputStream, Func<T, double> selector, int period)
{
var ma = new MovingAverage(period);
foreach (var item in inputStream)
{
ma.Push(selector(item));
yield return ma.Current;
}
}
public static IEnumerable<double> MovingAverage(this IEnumerable<double> inputStream, int period)
{
var ma = new MovingAverage(period);
foreach (var item in inputStream)
{
ma.Push(item);
yield return ma.Current;
}
}
}
The above extension methods wrap the MovingAverage class and allow insertion into an IEnumerable stream.
Now to use it!
int period = 50;
// Simply filtering a list of doubles
IEnumerable<double> inputDoubles;
IEnumerable<double> outputDoubles = inputDoubles.MovingAverage(period);
// Or, use a selector to filter T into a list of doubles
IEnumerable<Point> inputPoints; // assuming you have initialised this
IEnumerable<double> smoothedYValues = inputPoints.MovingAverage(pt => pt.Y, period);
You already have an answer showing you how you can use LINQ but frankly I wouldn't use LINQ here as it will most likely perform poorly compared to your current solution and your existing code already is clear.
However instead of calculating the total of the previous period elements on every step, you can keep a running total and adjust it on each iteration. That is, change this:
total = 0;
for (int x = i; x > (i - period); x--)
total += series.Values[x];
to this:
if (i >= period) {
total -= series.Values[i - period];
}
total += series.Values[i];
This will mean that your code will take the same amount of time to execute regardless of the size of period.
This block
double total = 0;
for (int x = i; x > (i - period); x--)
total += series.Values[x];
double average = total / period;
can be rewritten as:
double average = series.Values.Skip(i - period + 1).Take(period).Sum() / period;
Your method may look like:
series.Skip(period - 1)
.Select((item, index) =>
new
{
item.Key,
series.Values.Skip(index).Take(period).Sum() / period
});
As you can see, linq is very expressive. I recommend to start with some tutorial like Introducing LINQ and 101 LINQ Samples.
To do this in a more functional way, you'd need a Scan method which exists in Rx but not in LINQ.
Let's look how it would look like if we'd have a scan method
var delta = 3;
var series = new [] {1.1, 2.5, 3.8, 4.8, 5.9, 6.1, 7.6};
var seed = series.Take(delta).Average();
var smas = series
.Skip(delta)
.Zip(series, Tuple.Create)
.Scan(seed, (sma, values)=>sma - (values.Item2/delta) + (values.Item1/delta));
smas = Enumerable.Repeat(0.0, delta-1).Concat(new[]{seed}).Concat(smas);
And here's the scan method, taken and adjusted from here:
public static IEnumerable<TAccumulate> Scan<TSource, TAccumulate>(
this IEnumerable<TSource> source,
TAccumulate seed,
Func<TAccumulate, TSource, TAccumulate> accumulator
)
{
if (source == null) throw new ArgumentNullException("source");
if (seed == null) throw new ArgumentNullException("seed");
if (accumulator == null) throw new ArgumentNullException("accumulator");
using (var i = source.GetEnumerator())
{
if (!i.MoveNext())
{
throw new InvalidOperationException("Sequence contains no elements");
}
var acc = accumulator(seed, i.Current);
while (i.MoveNext())
{
yield return acc;
acc = accumulator(acc, i.Current);
}
yield return acc;
}
}
This should have better performance than the brute force method since we are using a running total to calculate the SMA.
What's going on here?
To start we need to calculate the first period which we call seed here. Then, every subsequent value we calculate from the accumulated seed value. To do that we need the old value (that is t-delta) and the newest value for which we zip together the series, once from the beginning and once shifted by the delta.
At the end we do some cleanup by adding zeroes for the length of the first period and adding the initial seed value.
Another option is to use MoreLINQ's Windowed method, which simplifies the code significantly:
var averaged = mySeries.Windowed(period).Select(window => window.Average(keyValuePair => keyValuePair.Value));
I use this code to calculate SMA:
private void calculateSimpleMA(decimal[] values, out decimal[] buffer)
{
int period = values.Count(); // gets Period (assuming Period=Values-Array-Size)
buffer = new decimal[period]; // initializes buffer array
var sma = SMA(period); // gets SMA function
for (int i = 0; i < period; i++)
buffer[i] = sma(values[i]); // fills buffer with SMA calculation
}
static Func<decimal, decimal> SMA(int p)
{
Queue<decimal> s = new Queue<decimal>(p);
return (x) =>
{
if (s.Count >= p)
{
s.Dequeue();
}
s.Enqueue(x);
return s.Average();
};
}
Here is an extension method:
public static IEnumerable<double> MovingAverage(this IEnumerable<double> source, int period)
{
if (source is null)
{
throw new ArgumentNullException(nameof(source));
}
if (period < 1)
{
throw new ArgumentOutOfRangeException(nameof(period));
}
return Core();
IEnumerable<double> Core()
{
var sum = 0.0;
var buffer = new double[period];
var n = 0;
foreach (var x in source)
{
n++;
sum += x;
var index = n % period;
if (n >= period)
{
sum -= buffer[index];
yield return sum / period;
}
buffer[index] = x;
}
}
}

How to sort a part of an array with int64 indicies in C#?

The .Net framework has an Array.Sort overload that allows one to specify the starting and ending indicies for the sort to act upon. However these parameters are only 32 bit. So I don't see a way to sort a part of a large array when the indicies that describe the sort range can only be specified using a 64-bit number. I suppose I could copy and modify the the framework's sort implementation, but that is not ideal.
Update:
I've created two classes to help me around these and other large-array issues. One other such issue was that long before I got to my memory limit, I start getting OutOfMemoryException's. I'm assuming this is because the requested memory may be available but not contiguous. So for that, I created class BigArray, which is a generic, dynamically sizable list of arrays. It has a smaller memory footprint than the framework's generic list class, and does not require that the entire array be contiguous. I haven't tested the performance hit, but I'm sure its there.
public class BigArray<T> : IEnumerable<T>
{
private long capacity;
private int itemsPerBlock;
private int shift;
private List<T[]> blocks = new List<T[]>();
public BigArray(int itemsPerBlock)
{
shift = (int)Math.Ceiling(Math.Log(itemsPerBlock) / Math.Log(2));
this.itemsPerBlock = 1 << shift;
}
public long Capacity
{
get
{
return capacity;
}
set
{
var requiredBlockCount = (value - 1) / itemsPerBlock + 1;
while (blocks.Count > requiredBlockCount)
{
blocks.RemoveAt(blocks.Count - 1);
}
while (blocks.Count < requiredBlockCount)
{
blocks.Add(new T[itemsPerBlock]);
}
capacity = (long)itemsPerBlock * blocks.Count;
}
}
public T this[long index]
{
get
{
Debug.Assert(index < capacity);
var blockNumber = (int)(index >> shift);
var itemNumber = index & (itemsPerBlock - 1);
return blocks[blockNumber][itemNumber];
}
set
{
Debug.Assert(index < capacity);
var blockNumber = (int)(index >> shift);
var itemNumber = index & (itemsPerBlock - 1);
blocks[blockNumber][itemNumber] = value;
}
}
public IEnumerator<T> GetEnumerator()
{
for (long i = 0; i < capacity; i++)
{
yield return this[i];
}
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
And getting back to the original issue of sorting... What I really needed was a way to act on each element of an array, in order. But with such large arrays, it is prohibitive to copy the data, sort it, act on it and then discard the sorted copy (the original order must be maintained). So I created static class OrderedOperation, which allows you to perform an arbitrary operation on each element of an unsorted array, in a sorted order. And do so with a low memory footprint (trading memory for execution time here).
public static class OrderedOperation
{
public delegate void WorkerDelegate(int index, float progress);
public static void Process(WorkerDelegate worker, IEnumerable<int> items, int count, int maxItem, int maxChunkSize)
{
// create a histogram such that a single bin is never bigger than a chunk
int binCount = 1000;
int[] bins;
double binScale;
bool ok;
do
{
ok = true;
bins = new int[binCount];
binScale = (double)(binCount - 1) / maxItem;
int i = 0;
foreach (int item in items)
{
bins[(int)(binScale * item)]++;
if (++i == count)
{
break;
}
}
for (int b = 0; b < binCount; b++)
{
if (bins[b] > maxChunkSize)
{
ok = false;
binCount *= 2;
break;
}
}
} while (!ok);
var chunkData = new int[maxChunkSize];
var chunkIndex = new int[maxChunkSize];
var done = new System.Collections.BitArray(count);
var processed = 0;
var binsCompleted = 0;
while (binsCompleted < binCount)
{
var chunkMax = 0;
var sum = 0;
do
{
sum += bins[binsCompleted];
binsCompleted++;
} while (binsCompleted < binCount - 1 && sum + bins[binsCompleted] <= maxChunkSize);
Debug.Assert(sum <= maxChunkSize);
chunkMax = (int)Math.Ceiling((double)binsCompleted / binScale);
var chunkCount = 0;
int i = 0;
foreach (int item in items)
{
if (item < chunkMax && !done[i])
{
chunkData[chunkCount] = item;
chunkIndex[chunkCount] = i;
chunkCount++;
done[i] = true;
}
if (++i == count)
{
break;
}
}
Debug.Assert(sum == chunkCount);
Array.Sort(chunkData, chunkIndex, 0, chunkCount);
for (i = 0; i < chunkCount; i++)
{
worker(chunkIndex[i], (float)processed / count);
processed++;
}
}
Debug.Assert(processed == count);
}
}
The two classes can work together (that's how I use them), but they don't have to. I hope someone else finds them useful. But I'll admit, they are fringe case classes. Questions welcome. And if my code sucks, I'd like to hear tips, too.
One final thought: As you can see in OrderedOperation, I'm using ints and not longs. Currently that is sufficient for me despite the original question I had (the application is in flux, in case you can't tell). But the class should be able to handle longs as well, should the need arise.
You'll find that even on the 64-bit framework, the maximum number of elements in an array is int.MaxValue.
The existing methods that take or return Int64 just cast the long values to Int32 internally and, in the case of parameters, will throw an ArgumentOutOfRangeException if a long parameter isn't between int.MinValue and int.MaxValue.
For example the LongLength property, which returns an Int64, just casts and returns the value of the Length property:
public long LongLength
{
get { return (long)this.Length; } // Length is an Int32
}
So my suggestion would be to cast your Int64 indicies to Int32 and then call one of the existing Sort overloads.
Since Array.Copy takes Int64 params, you could pull out the section you need to sort, sort it, then put it back. Assuming you're sorting less than 2^32 elements, of course.
Seems like if you are sorting more than 2^32 elements then it would be best to write your own, more efficient, sort algorithm anyway.

Categories