C# Increase Array Find Loop Performance

I've got a DataPoint[] file = new DataPoint[2592000] array. The array is filled with timestamps and random values, and creating it takes about 2 s. In another function, prepareData(), I'm preparing 240 values for another array, TempBuffer.
In the prepareData() function I search the file array for matching values. If I can't find any, I take the timestamp and set the value to 0; otherwise I take the found value together with the same timestamp.
The function looks like this:
public void prepareData()
{
stopWatch.Reset();
stopWatch.Start();
Int32 unixTimestamp = (Int32)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;
for (double i = unixTimestamp; unixTimestamp - 240 < i; i--)
{
bool exists = true;
if (exists != (Array.Exists(file, element => element.XValue == i)))
{
TempBuffer = TempBuffer.Skip(1).Concat(new DataPoint[] { new DataPoint(UnixTODateTime(i).ToOADate(), 0) }).ToArray();
}
else
{
DataPoint point = Array.Find(file, element => element.XValue == i);
TempBuffer = TempBuffer.Skip(1).Concat(new DataPoint[] { new DataPoint(UnixTODateTime(i).ToOADate(), point.YValues) }).ToArray();
}
}
stopWatch.Stop();
TimeSpan ts = stopWatch.Elapsed;
}
Now the problem is that with this amount of data in file (2'592'000 points) the function needs about 40 seconds! With smaller amounts like 10'000 it's no problem and it works fine and fast. But as soon as I set the file size to my preferred 2'592'000 points, CPU usage is pushed to 99% and the function takes way too long.
TempBuffer Sample Value:
X = Unix timestamp converted to DateTime, and the DateTime converted to OADate
{X=43285.611087963, Y=23}
File Sample Value:
X = Unixtimestamp
{X=1530698090, Y=24}
It's important that the TempBuffer values are converted to OADate, since the data inside the TempBuffer array is displayed in an MSChart.
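(For reference, the UnixTODateTime helper isn't shown in the post; a minimal sketch of what it presumably looks like, assuming a plain seconds-since-epoch conversion:)
public static DateTime UnixTODateTime(double unixTimestamp)
{
    // assumption: unixTimestamp is seconds since 1970-01-01 00:00:00 UTC
    return new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc).AddSeconds(unixTimestamp);
}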
Is there a way to improve my code so I've got better performance?

This is the most performant way to do your task (this is just a template, not the final code):
public void prepareData()
{
// it will be initialized with null values
var tempbuffer = new DataPoint[240];
var timestamp = (int)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;
var oldest = timestamp - 240 + 1;
// fill tempbuffer with existing DataPoints
for (int i = 0; i < file.Length; i++)
{
if (file[i].XValue <= timestamp && file[i].XValue > timestamp - 240)
{
tempbuffer[(int)file[i].XValue - oldest] = new DataPoint(file[i].XValue, file[i].YValues);
}
}
// fill null values in tempbuffer with 'empty' DataPoints
for (int i = 0; i < tempbuffer.Length; i++)
{
tempbuffer[i] = tempbuffer[i] ?? new DataPoint(oldest + i, 0);
}
}
With this approach it takes about 10 ms.
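Note that this template keeps raw Unix timestamps in X, while the asker's chart needs OADate values; the conversion can be folded into the final fill pass. A sketch only, reusing the asker's UnixTODateTime helper:
// sketch: convert X to OADate in the final pass (assumes the asker's UnixTODateTime helper)
for (int i = 0; i < tempbuffer.Length; i++)
{
    double unix = oldest + i;
    double y = tempbuffer[i] != null ? tempbuffer[i].YValues[0] : 0;
    tempbuffer[i] = new DataPoint(UnixTODateTime(unix).ToOADate(), y);
}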
Update from comments:
If you want to fetch multiple DataPoints per timestamp and reduce them with some function (e.g. average), then:
public void prepareData()
{
// use array of lists of YValues
var tempbuffer = new List<double>[240];
// initialize it
for (int i = 0; i < tempbuffer.Length; i++)
{
tempbuffer[i] = new List<double>(); //set capacity for better performance
}
var timestamp = (int)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;
var oldest = timestamp - 240 + 1;
// fill tempbuffer with existing DataPoint's YValues
for (int i = 0; i < file.Length; i++)
{
if (file[i].XValue <= timestamp && file[i].XValue > timestamp - 240)
{
tempbuffer[(int)file[i].XValue - oldest].Add(file[i].YValues[0]);
}
}
// get result
var result = new DataPoint[tempbuffer.Length];
for (int i = 0; i < result.Length; i++)
{
result[i] = new DataPoint(oldest + i, tempbuffer[i].Count == 0 ? 0 : tempbuffer[i].Average());
}
}

You haven't given us a complete picture of your code. I would ideally like sample data and the full class definitions. But given the limited information available, I think you'll find something like this works:
public void prepareData()
{
    Int32 unixTimestamp = (Int32)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;
    var map = file.ToLookup(x => x.XValue);
    TempBuffer =
        Enumerable
            .Range(0, 240)
            .Select(x => unixTimestamp - x)
            .SelectMany(x =>
                map[x]
                    .Select(dp => new DataPoint(UnixTODateTime(x).ToOADate(), dp.YValues))
                    .DefaultIfEmpty(new DataPoint(UnixTODateTime(x).ToOADate(), 0))
                    .Take(1))
            .ToArray();
}

Array.Exists() and Array.Find() are O(N) operations, and you are performing them M (240) times, for O(N·M) overall.
Try LINQ Join instead:
DataPoint[] dataPoints; // your "file" variable
var seekedTimestamps = Enumerable.Range(0, 240).Select(i => unixTimestamp - i);
var matchingDataPoints = dataPoints.Join(seekedTimestamps, dp => dp.XValue, sts => (double)sts, (dp, sts) => dp);
var missingTimestamps = seekedTimestamps.Except(matchingDataPoints.Select(mdp => (int)mdp.XValue));
// do your logic with found and missing here
// ...
LINQ Join uses hashing (on the selected keys) and is close to O(N).
Alternatively, assuming Timestamps in input are unique and you plan to do multiple operations on the input, construct a Dictionary<int (Timestamp), DataPoint> (expensive), which will give you O(1) retrieval of a wanted data point: var dataPoint = dict[wantedTimestamp];
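For example (a sketch only, assuming XValue holds unique whole-second Unix timestamps, so the cast to int is lossless):
// build once, O(N); reuse for many queries
var byTimestamp = new Dictionary<int, DataPoint>(file.Length);
foreach (var dp in file)
    byTimestamp.Add((int)dp.XValue, dp);

// each of the 240 lookups is then O(1)
for (int t = unixTimestamp - 239; t <= unixTimestamp; t++)
{
    if (byTimestamp.TryGetValue(t, out DataPoint dp))
    {
        // found: use dp's Y value together with this timestamp
    }
    else
    {
        // missing: use 0 for this timestamp
    }
}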

If each DataPoint is unique (no two instances with identical values), you should switch the file array to a dictionary. A dictionary lookup is way faster than iterating over potentially all members of the array.
Of course, you need to implement GetHashCode and Equals, or define a unique key for each DataPoint.
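A minimal sketch of what that could look like, using a hypothetical point type keyed on its timestamp (names are illustrative, not from the original post):
// hypothetical point type whose identity is its timestamp, making it
// usable as a dictionary key or with hash-based LINQ operators
public class TimedPoint
{
    public int Timestamp { get; set; }
    public double Value { get; set; }

    public override bool Equals(object obj) =>
        obj is TimedPoint other && other.Timestamp == Timestamp;

    public override int GetHashCode() => Timestamp;
}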

Calculate max on a sliding window for TimeSeries

Input:
public class MyObject
{
public double Value { get; set; }
public DateTime Date { get; set; }
}
Method to generate test objects:
public static MyObject[] GetTestObjects()
{
var rnd = new Random();
var date = new DateTime(2021, 1, 1, 0, 0, 0);
var result = new List<MyObject>();
for (int i = 0; i < 50000; i++)
{
//this is to simulate real data having gaps
if (rnd.Next(100) < 25)
{
continue;
}
var myObject = new MyObject()
{
Value = rnd.NextDouble(),
Date = date.AddMinutes(15 * i)
};
result.Add(myObject);
}
return result.ToArray();
}
Given this, I need to calculate the maximum Value over the previous 12 months for each myObject. I could just do this in parallel, but maybe there is a more optimized solution?
Sorry for being unclear, this is what I use right now to get what I want:
public MyObject[] BruteForceBackward(MyObject[] testData)
{
return testData.AsParallel().Select(point =>
{
var max = testData.Where(x => x.Date <= point.Date && x.Date >= point.Date.AddYears(-1)).Max(x => x.Value);
return new MyObject() { Date = point.Date, Value = point.Value / max };
}).OrderBy(r => r.Date).ToArray();
}
This works, but it is slow and eats processor resources (imagine you have 100k objects). I believe there must be something better.
I had a similar project where I had to calculate such statistics on tons of sensor data.
You can now find a little more refined version in my Github repository, which should be ready to use (.Net):
https://github.com/forReason/Statistics-Helper-Library
In general you want to reduce the number of loops going over all your data. Ideally, you want to touch each element only once.
Process Array (equivalent of BruteForceBackward)
public static MyObject[] FlowThroughForward(ref MyObject[] testData)
{
// generate return array
MyObject[] returnData = new MyObject[testData.Length];
// keep track to minimize processing
double currentMaximum = 0;
List<MyObject> maximumValues = new List<MyObject>();
// go through the elements
for (int i = 0; i < testData.Length; i++)
{
// calculate the oldest date to keep in tracking list
DateTime targetDate = testData[i].Date.AddYears(-1);
// maximum logic
if (testData[i].Value >= currentMaximum)
{
// new maximum found, clear tracking list
// this is the best case scenario
maximumValues.Clear();
currentMaximum = testData[i].Value;
}
else
{
// unfortunately, no new maximum was found
// go backwards the maximum tracking list and check for smaller values
// clear the list of all smaller values. The list should therefore always
// be in descending order
for (int b = maximumValues.Count - 1; b >= 0; b--)
{
if (maximumValues[b].Value <= testData[i].Value)
{
// a lower value has been found. We have a newer, higher value
// clear this waste value from the tracking list
maximumValues.RemoveAt(b);
}
else
{
// there are no more lower values.
// stop looking for smaller values to save time
break;
}
}
}
// append new value to tracking list, no matter if higher or lower
// all future values might be lower
maximumValues.Add(testData[i]);
// check if the oldest value is too old to be kept in the tracking list
while (maximumValues[0].Date < targetDate)
{
// oldest value is to be removed
maximumValues.RemoveAt(0);
// update maximum
currentMaximum = maximumValues[0].Value;
}
// add object to result list
returnData[i] = new MyObject() { Date = testData[i].Date, Value = testData[i].Value / currentMaximum };
}
return returnData;
}
Real Time Data or Streamed Data
Note: if you have really large lists, you might run into memory issues with the approach of passing a full array. In that case, pass the values in one at a time, from oldest to newest, and store the results back one at a time.
This function can also be used on real-time data.
The test method is included in the code below.
static void Main(string[] args)
{
int length = 50000;
Stopwatch stopWatch1 = new Stopwatch();
stopWatch1.Start();
var result = new List<MyObject>();
var date = new DateTime(2021, 1, 1, 0, 0, 0);
for (int i = 0; i < length; i++)
{
//this is to simulate real data having gaps
if (rnd.Next(100) < 25)
{
continue;
}
// create a fresh object each iteration; reusing one shared instance would
// corrupt the tracking list, which stores object references
var myObject = new MyObject
{
Value = rnd.NextDouble(),
Date = date.AddMinutes(15 * i)
};
result.Add(CalculateNextObject(ref myObject));
}
stopWatch1.Stop();
Console.WriteLine("test code executed in " + stopWatch1.ElapsedMilliseconds + " ms");
Thread.Sleep(1000000);
}
private static Random rnd = new Random();
private static double currentMaximum = 0;
private static List<MyObject> maximumValues = new List<MyObject>();
public static MyObject CalculateNextObject(ref MyObject input)
{
// calculate the oldest date to keep in tracking list
DateTime targetDate = input.Date.AddYears(-1);
// maximum logic
if (input.Value >= currentMaximum)
{
// new maximum found, clear tracking list
// this is the best case scenario
maximumValues.Clear();
currentMaximum = input.Value;
}
else
{
// unfortunately, no new maximum was found
// go backwards the maximum tracking list and check for smaller values
// clear the list of all smaller values. The list should therefore always
// be in descending order
for (int b = maximumValues.Count - 1; b >= 0; b--)
{
if (maximumValues[b].Value <= input.Value)
{
// a lower value has been found. We have a newer, higher value
// clear this waste value from the tracking list
maximumValues.RemoveAt(b);
}
else
{
// there are no more lower values.
// stop looking for smaller values to save time
break;
}
}
}
// append new value to tracking list, no matter if higher or lower
// all future values might be lower
maximumValues.Add(input);
// check if the oldest value is too old to be kept in the tracking list
while (maximumValues[0].Date < targetDate)
{
// oldest value is to be removed
maximumValues.RemoveAt(0);
// update maximum
currentMaximum = maximumValues[0].Value;
}
// add object to result list
MyObject returnData = new MyObject() { Date = input.Date, Value = input.Value / currentMaximum };
return returnData;
}
Test Method
static void Main(string[] args)
{
MyObject[] testData = GetTestObjects();
Stopwatch stopWatch1 = new Stopwatch();
Stopwatch stopWatch2 = new Stopwatch();
stopWatch1.Start();
MyObject[] testresults1 = BruteForceBackward(testData);
stopWatch1.Stop();
Console.WriteLine("BruteForceBackward executed in " + stopWatch1.ElapsedMilliseconds + " ms");
stopWatch2.Start();
MyObject[] testresults2 = FlowThroughForward(ref testData);
stopWatch2.Stop();
Console.WriteLine("FlowThroughForward executed in " + stopWatch2.ElapsedMilliseconds + " ms");
Console.WriteLine();
Console.WriteLine("Comparing some random test results: ");
var rnd = new Random();
for (int i = 0; i < 10; i++)
{
int index = rnd.Next(0, testData.Length);
Console.WriteLine("Index: " + index + " brute: " + testresults1[index].Value + " flow: " + testresults2[index].Value);
}
Thread.Sleep(1000000);
}
Test result
Tests were performed on a machine with 32 cores, so in theory the multithreaded approach should have an advantage, but you'll see ;)

Function             Time      Time %
BruteForceBackward   5334 ms   99.9%
FlowThroughForward   5 ms      0.094%

Performance improvement factor: ~1000x
console output with data validation:
BruteForceBackward executed in 5264 ms
FlowThroughForward executed in 5 ms
Comparing some random test results:
Index: 25291 brute: 0.989688139105413 flow: 0.989688139105413
Index: 11945 brute: 0.59670821976193 flow: 0.59670821976193
Index: 30282 brute: 0.413238225210297 flow: 0.413238225210297
Index: 33898 brute: 0.38258761939139 flow: 0.38258761939139
Index: 8824 brute: 0.833512217105447 flow: 0.833512217105447
Index: 22092 brute: 0.648052464067263 flow: 0.648052464067263
Index: 24633 brute: 0.35859417692481 flow: 0.35859417692481
Index: 24061 brute: 0.540642018793402 flow: 0.540642018793402
Index: 34219 brute: 0.498785766613022 flow: 0.498785766613022
Index: 2396 brute: 0.151471808392111 flow: 0.151471808392111
CPU usage was a lot higher for BruteForceBackward due to the parallelisation.
The worst-case scenario is long periods of decreasing values. The code can still be vastly optimized, but I guess this should be sufficient. For further optimisation, one might look to reduce the list shuffling when removing/adding elements to maximumValues.
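For instance, RemoveAt(0) on a List<T> shifts every remaining element, so one option (a sketch only, not part of the original answer) is to keep the tracking window in a LinkedList<T>, which removes from both ends in O(1):
// sketch: the same monotonic-maximum logic, with O(1) evictions at both ends
class SlidingYearMax
{
    private readonly LinkedList<MyObject> maximumValues = new LinkedList<MyObject>();

    public double Add(MyObject item)
    {
        // drop smaller values from the back; the list stays in descending order
        while (maximumValues.Count > 0 && maximumValues.Last.Value.Value <= item.Value)
            maximumValues.RemoveLast();
        maximumValues.AddLast(item);

        // evict entries older than one year from the front
        DateTime targetDate = item.Date.AddYears(-1);
        while (maximumValues.First.Value.Date < targetDate)
            maximumValues.RemoveFirst();

        // the front of the list is always the current maximum
        return maximumValues.First.Value.Value;
    }
}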
An interesting and challenging problem. I put together a solution using a dynamic programming approach (first learned in my CS algorithms class back in '78). First, a tree is constructed containing pre-calculated local max values over recursively defined ranges. Once constructed, the max value for an arbitrary range can be efficiently calculated, mostly using the pre-calculated values. Only at the fringes of the range does the calculation drop down to the element level.
It is not as fast as julian bechtold's FlowThroughForward method, but random access to ranges may be a plus.
Code to add to Main:
Console.WriteLine();
Stopwatch stopWatch3 = new Stopwatch();
stopWatch3.Start();
MyObject[] testresults3 = RangeTreeCalculation(ref testData, 10);
stopWatch3.Stop();
Console.WriteLine($"RangeTreeCalculation executed in {stopWatch3.ElapsedMilliseconds} ms");
... test comparison
Console.WriteLine($"Index: {index} brute: {testresults1[index].Value} flow: {testresults2[index].Value} rangeTree: {testresults3[index].Value}");
Test function:
public static MyObject[] RangeTreeCalculation(ref MyObject[] testDataArray, int partitionThreshold)
{
// For this implementation, we need to convert the array to a List, because we need a
// reference type object that can be shared.
List<MyObject> testDataList = testDataArray.ToList();
// Construct a tree containing recursive collections of pre-calculated values
var rangeTree = new RangeTree(testDataList, partitionThreshold);
MyObject[] result = new MyObject[testDataList.Count];
Parallel.ForEach(testDataList, (item, state, i) =>
{
var max = rangeTree.MaxForDateRange(item.Date.AddYears(-1), item.Date);
result[i] = new MyObject() { Date = item.Date, Value = item.Value / max };
});
return result;
}
Supporting class:
// Class used to divide and conquer using dynamic programming.
public class RangeTree
{
public List<MyObject> Data; // This reference is shared by all members of the tree
public int Start { get; } // Index of first element covered by this node.
public int Count { get; } // Number of elements covered by this node.
public DateTime FirstDateTime { get; }
public DateTime LastDateTime { get; }
public double MaxValue { get; } // Pre-calculated max for all elements covered by this node.
List<RangeTree> ChildRanges { get; }
// Top level node constructor
public RangeTree(List<MyObject> data, int partitionThreshold)
: this(data, 0, data.Count, partitionThreshold)
{
}
// Child node constructor, which covers a recursively decreasing range of elements.
public RangeTree(List<MyObject> data, int start, int count, int partitionThreshold)
{
Data = data;
Start = start;
Count = count;
FirstDateTime = Data[Start].Date;
LastDateTime = Data[Start + Count - 1].Date;
if (count <= partitionThreshold)
{
// If the range is smaller than the threshold, just calculate the local max
// directly from the items. No child ranges are defined.
MaxValue = Enumerable.Range(Start, Count).Select(i => Data[i].Value).Max();
}
else
{
// We still have a significant range. Decide how to further divide them up into sub-ranges.
// (There may be room for improvement here to better balance the tree.)
int partitionSize = (count - 1) / partitionThreshold + 1;
int partitionCount = (count - 1) / partitionSize + 1;
if (count < partitionThreshold * partitionThreshold)
{
// When one away from leaf nodes, prefer fewer full leaf nodes over more
// less populated leaf nodes.
partitionCount = (count - 1) / partitionThreshold + 1;
partitionSize = (count - 1) / partitionCount + 1;
}
ChildRanges = Enumerable.Range(0, partitionCount)
.Select(partitionNum => new {
ChildStart = Start + partitionNum * partitionSize,
ChildCount = Math.Min(partitionSize, Count - partitionNum * partitionSize)
})
.Where(part => part.ChildCount > 0) // Defensive
.Select(part => new RangeTree(Data, part.ChildStart, part.ChildCount, partitionThreshold))
.ToList();
// Now is the dynamic programming part:
// Calculate the local max as the max of all child max values.
MaxValue = ChildRanges.Max(child => child.MaxValue);
}
}
// Get the max value for a given range of dates within this RangeTree node.
// This uses the precalculated values as much as possible.
// Only at the fringes of the date range do we calculate at the element level.
public double MaxForDateRange(DateTime fromDate, DateTime thruDate)
{
double calculatedMax = Double.MinValue;
if (fromDate > this.LastDateTime || thruDate < this.FirstDateTime)
{
// Entire range is excluded. Nothing of interest here folks.
calculatedMax = Double.MinValue;
}
else if (fromDate <= this.FirstDateTime && thruDate >= this.LastDateTime)
{
// Entire range is included. Use the already-calculated max.
calculatedMax = this.MaxValue;
}
else if (ChildRanges != null)
{
// We have child ranges. Recurse and accumulate.
// Possible optimization: Calculate max for middle ranges first, and only bother
// with extreme partial ranges if their local max values exceed the preliminary result.
for (int i = 0; i < ChildRanges.Count; ++i)
{
double childMax = ChildRanges[i].MaxForDateRange(fromDate, thruDate);
if (childMax > calculatedMax)
{
calculatedMax = childMax;
}
}
}
else
{
// Leaf range. Loop through just this limited range of elements, checking individually for
// date in range and accumulating the result.
for (int i = 0; i < this.Count; ++i)
{
var element = Data[this.Start + i];
if (fromDate <= element.Date && element.Date <= thruDate && element.Value > calculatedMax)
{
calculatedMax = element.Value;
}
}
}
return calculatedMax;
}
}
There's plenty of room for improvement, such as parameterizing the types and generalizing the functionality to support more than just Max(Value), but the framework is there.
Assuming you meant you need the maximum Value for each of the last 12 months from result, you can use LINQ (MaxBy requires .NET 6+ or MoreLINQ):
var beginDateTime = DateTime.Now.AddMonths(-12);
var ans = result.Where(r => r.Date >= beginDateTime).GroupBy(r => r.Date.Month).Select(mg => mg.MaxBy(r => r.Value)).ToList();
Running some timings, I get that putting AsParallel after result changes the run time from around 16 ms (first run) to around 32 ms, so it is actually slower. It is about the same after the Where, and about 23 ms after the GroupBy (processing the 12 groups in parallel). On my PC at least, there isn't enough data or complex enough operations to benefit from parallelism, and the GroupBy isn't the most efficient approach anyway.
Using an array and testing each element, I get the results in about 1.2ms:
var maxMOs = new MyObject[12];
foreach (var r in result.Where(r => r.Date >= beginDateTime)) {
var monthIndex = r.Date.Month-1;
if (maxMOs[monthIndex] == null || r.Value > maxMOs[monthIndex].Value)
maxMOs[monthIndex] = r;
}
Note that the results are not chronological; you could offset monthIndex by today's month to order the results if desired. The wrap-around needs a modulo so that months from last year don't index out of range:
var maxMOs = new MyObject[12];
var currentMonth = DateTime.Now.Month;
foreach (var r in result.Where(r => r.Date >= beginDateTime)) {
var monthIndex = (r.Date.Month - currentMonth + 11) % 12; // oldest month -> 0, current month -> 11
if (maxMOs[monthIndex] == null || r.Value > maxMOs[monthIndex].Value)
maxMOs[monthIndex] = r;
}
A micro-optimization (mostly useful on repeated runs) is to invert the test and use the null-conditional operator; comparing against a null Value yields false, so the negated test is true when the slot is still empty:
if (!(r.Value <= maxMOs[monthIndex]?.Value))
This saves about 0.2 ms on the first run, but up to 0.5 ms on subsequent runs.
Here is a solution similar to julian bechtold's answer. The difference is that the maximum (and all related state) is kept hidden away from the main implementation, in a separate class whose sole purpose is to keep track of the maximum over the past year. The algorithm is the same; I just use a few LINQ expressions here and there.
We keep track of the maximum in the following class:
public class MaxSlidingWindow
{
private readonly List<MyObject> _maximumValues;
private double _max;
public MaxSlidingWindow()
{
_maximumValues = new List<MyObject>();
_max = double.NegativeInfinity;
}
public double Max => _max;
public void Add(MyObject myObject)
{
if (myObject.Value >= _max)
{
_maximumValues.Clear();
_max = myObject.Value;
}
else
{
RemoveValuesSmallerThan(myObject.Value);
}
_maximumValues.Add(myObject);
RemoveObservationsBefore(myObject.Date.AddYears(-1));
_max = _maximumValues[0].Value;
}
private void RemoveObservationsBefore(DateTime targetDate)
{
var toRemoveFromFront = 0;
while (toRemoveFromFront <= _maximumValues.Count - 1 && _maximumValues[toRemoveFromFront].Date < targetDate)
{
toRemoveFromFront++;
}
_maximumValues.RemoveRange(0, toRemoveFromFront);
}
private void RemoveValuesSmallerThan(double targetValue)
{
var maxEntry = _maximumValues.Count - 1;
var toRemoveFromBack = 0;
while (toRemoveFromBack <= maxEntry && _maximumValues[maxEntry - toRemoveFromBack].Value <= targetValue)
{
toRemoveFromBack++;
}
_maximumValues.RemoveRange(maxEntry - toRemoveFromBack + 1, toRemoveFromBack);
}
}
It can be used as follows:
public static MyObject[] GetTestObjects_MaxSlidingWindow()
{
var rnd = new Random();
var date = new DateTime(2021, 1, 1, 0, 0, 0);
var result = new List<MyObject>();
var maxSlidingWindow = new MaxSlidingWindow();
for (int i = 0; i < 50000; i++)
{
//this is to simulate real data having gaps
if (rnd.Next(100) < 25)
{
continue;
}
var myObject = new MyObject()
{
Value = rnd.NextDouble(),
Date = date.AddMinutes(15 * i)
};
maxSlidingWindow.Add(myObject);
var max = maxSlidingWindow.Max;
result.Add(new MyObject { Date = myObject.Date, Value = myObject.Value / max });
}
return result.ToArray();
}
See the relative timings below; the above solution is slightly faster (timed over 10 million runs), but barely noticeably so.
[Relative timings chart]

Making an X length List fit in between a 0 - 1 scale

I'm having an issue making my list (which could have any number of elements) correspond to another object that takes in a range of 0-1.
What are the steps involved to convert my list's data so that when my slider is at 0 it's at the start of my list, and when it's at 1 it's at the end of my list?
All the code that corresponds to my list and how I'm filling it out is as follows:
private List<DateTime> days = new List<DateTime>();
private string debugAreaString = "";
// Use this for initialization
void Start ()
{
Slider ();
sliderElement = sliderObject.GetComponent<UISlider>();
}
// Update is called once per frame
void Update ()
{
sliderElement.numberOfSteps = Convert.ToInt32(days.Count - 1);
for( int p = 0; p < sliderElement.numberOfSteps - 1; p++)
{
debugAreaString = Convert.ToString(days[p]);
//Debug.Log(days[p]);
}
Debug.Log(sliderElement.numberOfSteps);
}
void Slider()
{
startTime = new DateTime(startYear, startMonth, startDay);
endTime = new DateTime(endYear, endMonth, endDay);
TimeSpan elapsed = endTime.Subtract(startTime);
startString = startDay.ToString();
elapsedString = elapsed.TotalDays.ToString();
int totalDays = (int)endTime.Subtract(startTime).TotalDays;
days.Add(startTime);
for (var i = 1; i < totalDays; i++)
{
days.Add(startTime.AddDays(i));
}
days.Add(endTime);
}
The list gets filled with every single day between two points.
"What are the steps involved so I can covert my lists data so that when my slider is at 0, it's at the start of my list and when its at 1, it's at the end of my list?"
myList[(int)Math.Round(sliderValue*myList.Count)]
No conversion necessary.
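A small sketch of the same mapping wrapped in a helper, clamped so float rounding at the edges can't index out of range (names are illustrative):
// illustrative helper: map a 0-1 slider value to a valid list index
int IndexForSlider(float sliderValue, int count)
{
    int index = (int)Math.Round(sliderValue * (count - 1));
    return Math.Max(0, Math.Min(index, count - 1)); // clamp against float drift
}

// usage, assuming the slider exposes its 0-1 position as 'value':
// DateTime selectedDay = days[IndexForSlider(sliderElement.value, days.Count)];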

How to populate two separate arrays from one comma-delimited list?

I have a comma-delimited text file that contains 20 numbers separated by commas. These numbers represent earned points and possible points for ten different assignments. We're to use these to calculate a final score for the course.
Normally, I'd iterate through the numbers, creating two sums, divide and be done with it. However, our assignment dictates that we load the list of numbers into two arrays.
so this:
10,10,20,20,30,35,40,50,45,50,45,50,50,50,20,20,45,90,85,85
becomes this:
int[] earned = {10,20,30,40,45,45,50,20,45,85};
int[] possible = {10,20,35,50,50,50,50,20,90,85};
Right now, I'm using
for (x = 0; x < 10; x++)
{
earned[x] = scores[x*2];
poss[x] = scores[(x*2)+1];
}
which gives me the results I want, but seems excessively clunky.
Is there a better way?
The following splits the alternating items of the list into the two arrays.
int[] scores = {10,10,20,20,30,35,40,50,45,50,45,50,50,50,20,20,45,90,85,85};
int[] earned = new int[10];
int[] possible = new int[10];
int a = 0;
for(int x=0; x<10; x++)
{
earned[x] = scores[a++];
possible[x] = scores[a++];
}
You can use LINQ here:
var arrays = csv.Split(',')
.Select((v, index) => new {Value = int.Parse(v), Index = index})
.GroupBy(g => g.Index % 2,
g => g.Value,
(key, values) => values.ToArray())
.ToList();
and then
var earned = arrays[0];
var possible = arrays[1];
Get rid of the "magic" multiplications and illegible array index computations.
var earned = new List<int>();
var possible = new List<int>();
for (x=0; x<scores.Length; x += 2)
{
earned.Add(scores[x + 0]);
possible.Add(scores[x + 1]);
}
This has very little that would need a text comment. This is the gold standard for self-documenting code.
I initially thought the question was a C question because of all the incomprehensible indexing. It looked like pointer magic. It was too clever.
In my codebases I usually have an AsChunked extension available that splits a list into chunks of the given size.
var earned = new List<int>();
var possible = new List<int>();
foreach (var pair in scores.AsChunked(2)) {
earned.Add(pair[0]);
possible.Add(pair[1]);
}
Now the meaning of the code is apparent. The magic is gone.
Even shorter:
var pairs = scores.AsChunked(2);
var earned = pairs.Select(x => x[0]).ToArray();
var possible = pairs.Select(x => x[1]).ToArray();
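The AsChunked implementation itself isn't shown in the answer; a minimal sketch of such an extension (on .NET 6+, the built-in Enumerable.Chunk does the same job):
// minimal sketch of an AsChunked extension; the answer's own version is not shown
public static class ChunkingExtensions
{
    public static IEnumerable<List<T>> AsChunked<T>(this IEnumerable<T> source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        var chunk = new List<T>(size);
        foreach (T item in source)
        {
            chunk.Add(item);
            if (chunk.Count == size)
            {
                yield return chunk;
                chunk = new List<T>(size); // start a fresh chunk
            }
        }
        if (chunk.Count > 0)
            yield return chunk; // final, possibly partial chunk
    }
}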
I suppose you could do it like this:
int[] earned = new int[10];
int[] possible = new int[10];
int resultIndex = 0;
for (int i = 0; i < scores.Length; i = i + 2)
{
earned[resultIndex] = scores[i];
possible[resultIndex] = scores[i + 1];
resultIndex++;
}
You would have to be sure that an equal number of values are stored in scores.
I would leave your code as is. You are expressing your intent very directly: every 2nd element goes into each array.
The only way to improve that solution is to comment why you are multiplying. But I would expect someone to quickly recognize the trick, or easily reproduce what it is doing. Here is an excessive example of how to comment it. I wouldn't recommend using this directly.
for (x = 0; x < 10; x++)
{
//scores contains the elements inline, one after the other
earned[x] = scores[x*2]; //Get the even elements into earned
poss[x] = scores[(x*2)+1]; //And the odd into poss
}
However if you really don't like the multiplication, you can track the scores index separately.
int i = 0;
for (int x = 0; x < 10; x++)
{
earned[x] = scores[i++];
poss [x] = scores[i++];
}
But I would probably prefer your version since it does not depend on the order of the operations.
var res = grades.Select((x, i) => new {x, i}).ToLookup(y => y.i % 2, y => y.x);
int[] earned = res[0].ToArray();
int[] possible = res[1].ToArray();
This will group all grades into two buckets based on index, then you can just do ToArray if you need result in array form.
Here is an example, following my comment, so you do not need to change the code regardless of the list size:
string[] test = "10,10,20,20,30,35,40,50,45,50,45,50,50,50,20,20,45,90,85,85".Split(',');
int[] earned = new int[test.Length / 2];
int[] possible = new int[test.Length / 2];
int counter = 0; // even positions go into earned, odd positions into possible
foreach (string testRow in test)
{
    if (counter % 2 == 0)
    {
        earned[counter / 2] = Convert.ToInt32(testRow);
    }
    else
    {
        possible[counter / 2] = Convert.ToInt32(testRow);
    }
    counter++;
}

LINQ get x amount of elements from a list

I have a query which I get as:
var query = Data.Items
.Where(x => criteria.IsMatch(x))
.ToList<Item>();
This works fine.
However now I want to break up this list into x number of lists, for example 3. Each list will therefore contain 1/3 the amount of elements from query.
Can it be done using LINQ?
You can use PLINQ partitioners to break the results into separate partitions.
var partitioner = Partitioner.Create<Item>(query);
var partitions = partitioner.GetPartitions(3);
You'll need to reference the System.Collections.Concurrent namespace. partitions will be a list of IEnumerator<Item>, where each enumerator returns a portion of the query.
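A minimal consumption sketch: because an enumerable-based partitioner hands out elements on demand, the partitions are meant to be drained concurrently (one per task); a single consumer draining the first enumerator sequentially would receive most of the elements.
// sketch: consume the three partitions concurrently, one task per enumerator
var tasks = partitions.Select(enumerator => Task.Run(() =>
{
    var list = new List<Item>();
    using (enumerator)
    {
        while (enumerator.MoveNext())
            list.Add(enumerator.Current);
    }
    return list;
})).ToArray();
List<Item>[] chunks = Task.WhenAll(tasks).Result;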
I think something like this could work, splitting the list into IGroupings.
const int numberOfGroups = 3;
var groups = query
.Select((item, i) => new { item, i })
.GroupBy(e => e.i % numberOfGroups);
You can use Skip and Take in a simple for to accomplish what you want
var groupSize = (int)Math.Ceiling(query.Count() / 3d);
var result = new List<List<Item>>();
for (var j = 0; j < 3; j++)
result.Add(query.Skip(j * groupSize).Take(groupSize).ToList());
If the order of the elements doesn't matter, using an IGrouping as suggested by Daniel Imms is probably the most elegant way (add .Select(gr => gr.Select(e => e.item)) to get an IEnumerable<IEnumerable<T>>).
If however you want to preserve the order you need to know the total number of elements. Otherwise you wouldn't know when to start the next group. You can do this with LINQ but it requires two enumerations: one for counting and another for returning the data (as suggested by Esteban Elverdin).
If enumerating the query is expensive you can avoid the second enumeration by turning the query into a list and then use the GetRange method:
public static IEnumerable<List<T>> SplitList<T>(List<T> list, int numberOfRanges)
{
int sizeOfRanges = list.Count / numberOfRanges;
int remainder = list.Count % numberOfRanges;
int startIndex = 0;
for (int i = 0; i < numberOfRanges; i++)
{
int size = sizeOfRanges + (remainder > 0 ? 1 : 0);
yield return list.GetRange(startIndex, size);
if (remainder > 0)
{
remainder--;
}
startIndex += size;
}
}
static void Main()
{
List<int> list = Enumerable.Range(0, 10).ToList();
IEnumerable<List<int>> result = SplitList(list, 3);
foreach (List<int> values in result)
{
string s = string.Join(", ", values);
Console.WriteLine("{{ {0} }}", s);
}
}
The output is:
{ 0, 1, 2, 3 }
{ 4, 5, 6 }
{ 7, 8, 9 }
You can create an extension method:
public static IList<List<T>> GetChunks<T>(this IList<T> items, int numOfChunks)
{
if (items.Count < numOfChunks)
throw new ArgumentException("The number of elements is lower than the number of chunks");
int div = items.Count / numOfChunks;
int rem = items.Count % numOfChunks;
var listOfLists = new List<T>[numOfChunks];
for (int i = 0; i < numOfChunks; i++)
listOfLists[i] = new List<T>();
int currentGrp = 0;
int currRemainder = rem;
foreach (var el in items)
{
int currentElementsInGrp = listOfLists[currentGrp].Count;
if (currentElementsInGrp == div && currRemainder > 0)
{
currRemainder--;
}
else if (currentElementsInGrp >= div)
{
currentGrp++;
}
listOfLists[currentGrp].Add(el);
}
return listOfLists;
}
then use it like this :
var chunks = query.GetChunks(3);
N.B.: in case the number of elements is not divisible by the number of groups, the first groups will be bigger, e.g. [0,1,2,3,4] --> [0,1] - [2,3] - [4]

Subtracting two adjacent values in a list C#

There is a list called cardReaderHistory. It contains time records in an ordered fashion, as follows:
InTime1
OutTime1
InTime2
OutTime2
InTime3
OutTime3
InTime4
OutTime4 ... and so on.
What I need is to calculate the working time as (OutTime1 - InTime1) + (OutTime2 - InTime2) + ..., as a double:
double hr = (outTime1 - inTime1) + (outTime2 - inTime2) + ...;
How can I do this in C#?
Thank you.
Poor Beginner
You can filter the input sequence based on the index, and then zip the two sequences:
var inTimes = source.Where((x, index) => index % 2 == 0);
var outTimes = source.Where((x, index) => index % 2 == 1);
var result = inTimes.Zip(outTimes, (inTime, outTime) => outTime - inTime).Sum();
If you don't need the intermediary values, you can also do this:
var result = source.Select((x, index) => (index % 2 == 0) ? -x : x).Sum();
Assuming your cardReaderHistory list is a list of doubles:
List<double> cardReaderHistory = new List<double>(); // fill it somewhere
double result = 0;
for (int i = 0; i < cardReaderHistory.Count; i++)
{
    if (i % 2 == 0) // even index: an in-time, so subtract
        result -= cardReaderHistory[i];
    else // odd index: an out-time, so add
        result += cardReaderHistory[i];
}
You just loop over your values and add or subtract based on whether the index is even or odd.
Something like this...
List<double> l = new List<double> { 1, 2, 3, 4, 5, 6, 7, 8 };
double hr = 0;
for (int i = 0; i < l.Count; i++)
{
hr += i%2 == 0 ? -l[i] : l[i];
}
It seems plausible that the list contains datetimes and not hours. Currently none of the other answers handles that. Here's my take.
var cardReaderHistory = new List<DateTime> {DateTime.Now.AddMinutes(-120), DateTime.Now.AddMinutes(-100), DateTime.Now.AddMinutes(-20), DateTime.Now};
var hours = cardReaderHistory.Split(2).Select(h => (h.Last() - h.First()).TotalHours).Sum();
Split is an extension method
static class ExtensionMethods
{
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> seq, int size)
{
while (seq.Any())
{
yield return seq.Take(size);
seq = seq.Skip(size);
}
}
}
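One caveat on this Split: each Skip call layers another deferred enumeration over the sequence, so splitting a long history this way costs roughly O(n²/size). That's irrelevant for a handful of card-reader records, but for large sequences a single-pass chunker is the safer choice.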
