Hello, I have an array of enumerations, and I'm trying to sort that array by enumeration value and get the array indexes of the top items.
private enum Values
{
LOW,
MEDIUM,
HIGH
}
private Values[] Settings = new Values[10];
Settings[0] = Values.LOW;
Settings[1] = Values.HIGH;
Settings[2] = Values.MEDIUM;
Settings[3] = Values.LOW;
Settings[4] = Values.LOW;
Settings[5] = Values.LOW;
Settings[6] = Values.LOW;
Settings[7] = Values.LOW;
Settings[8] = Values.MEDIUM;
Settings[9] = Values.MEDIUM;
Basically now, with what I have above, I need to sort the settings array by enumeration value and get the array indexes of the top (let's say 3) items;
So I'd get back values 1, 2, 8
The platform I'm using does not support LINQ so those handy functions are not available.
I've been trying to wrap my head around this but it would help to have another pair of eyes.
thanks.
Implement a wrapper reference type,
class ValueWrapper : IComparable<ValueWrapper>
{
public Values Value { get; set; }
public int Index { get; set; }
public int CompareTo(ValueWrapper other)
{
return this.Value.CompareTo(other.Value) * -1; // Negating since you want reversed order
}
}
Usage -
ValueWrapper[] WrappedSettings = new ValueWrapper[10];
for(int i = 0; i < WrappedSettings.Length; i++)
{
WrappedSettings[i] = new ValueWrapper { Value = Settings[i], Index = i };
}
Array.Sort(WrappedSettings);
WrappedSettings will be sorted as you specified, preserving the indexes they were in the original array.
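If all you need is the top three indexes, you can then read them off the front of the sorted array. A minimal sketch (the variable name topThree is mine):
// After Array.Sort(WrappedSettings) the highest values come first,
// because CompareTo negates the comparison.
int[] topThree = new int[3];
for (int i = 0; i < 3; i++)
{
    topThree[i] = WrappedSettings[i].Index;
}
// For the sample data this yields the HIGH entry (index 1) followed by two of
// the MEDIUM entries. Note that Array.Sort is not stable, so ties (the three
// MEDIUMs) can come back in any relative order; if you need the lowest index
// first among equal values, break ties on Index inside CompareTo.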
How about this:
Values first = Values.LOW, second = Values.LOW, third = Values.LOW;
int firstI = -1, secondI = -1, thirdI = -1;
for (int i = 0; i < Settings.Length; i++)
{
if(Settings[i] > first || firstI == -1)
{
third = second;
thirdI = secondI;
second = first;
secondI = firstI;
first = Settings[i];
firstI = i;
}
else if(Settings[i] > second || secondI == -1)
{
third = second;
thirdI = secondI;
second = Settings[i];
secondI = i;
}
else if(Settings[i] > third || thirdI == -1)
{
third = Settings[i];
thirdI = i;
}
}
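After the loop, firstI, secondI and thirdI hold the indexes of the top three entries. A quick check (my own line, not part of the loop above):
// For the sample data this prints 1, 2, 8.
Console.WriteLine(firstI + ", " + secondI + ", " + thirdI);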
So, since you say you work in an environment where LINQ is not available, I assume that other things like generics, nullables and so on may not be available either. So here is a very low-tech solution.
Basic idea:
For every possible enum value, from highest to lowest, go through the list. If we find that value, output it and remember how many we have output. If we reach 3, stop the algorithm.
So, we first look for HIGH in the list, then for MEDIUM and so on.
class Program
{
private enum Values
{
LOW,
MEDIUM,
HIGH
}
static void Main(string[] args)
{
// Sample data
Values[] settings = new Values[10];
settings[0] = Values.LOW;
settings[1] = Values.HIGH;
settings[2] = Values.MEDIUM;
settings[3] = Values.LOW;
settings[4] = Values.LOW;
settings[5] = Values.LOW;
settings[6] = Values.LOW;
settings[7] = Values.LOW;
settings[8] = Values.MEDIUM;
settings[9] = Values.MEDIUM;
// Get Values of the enum type
// This list is sorted ascending by value but may contain duplicates
Array enumValues = Enum.GetValues(typeof(Values));
// Number of results found so far
int numberFound = 0;
// The enum value we used during the last outer loop, so
// we skip duplicate enum values
int lastValue = -1;
// For each enum value starting with the highest to the lowest
for (int i = enumValues.Length - 1; i >= 0; i--)
{
// Get this enum value
int enumValue = (int)enumValues.GetValue(i);
// Check whether we had the same value in the previous loop
// If yes, skip it.
if(enumValue == lastValue)
{
continue;
}
lastValue = enumValue;
// For each entry in the list where we are searching
for (int j = 0; j < settings.Length; j++)
{
// Check to see whether it is the currently searched value
if (enumValue == (int)settings[j])
{
// if yes, then output it.
Console.WriteLine(j);
numberFound++;
// Stop after 3 found entries
if (numberFound == 3)
{
goto finished;
}
}
}
}
finished:
Console.ReadLine();
}
}
Output is as requested: 1, 2, 8.
I'm not sure if this is exactly what you want, because it doesn't sort the original array, but one way to get the indexes of the top three values is to store those indexes in a separate array. Then loop through the original array and, for each item, check whether it's larger than any of the items at the indexes stored so far. If it is, swap it in.
For example:
// Start the topIndexes array with all invalid indexes
var topIndexes = new[] {-1, -1, -1};
for (var settingIndex = 0; settingIndex < Settings.Length; settingIndex++)
{
var setting = Settings[settingIndex];
var topIndexLessThanSetting = -1;
// Find the smallest topIndex setting that's less than this setting
for (int topIndex = 0; topIndex < topIndexes.Length; topIndex++)
{
if (topIndexes[topIndex] == -1)
{
topIndexLessThanSetting = topIndex;
break;
}
if (setting <= Settings[topIndexes[topIndex]]) continue;
if (topIndexLessThanSetting == -1 ||
Settings[topIndexes[topIndex]] < Settings[topIndexes[topIndexLessThanSetting]])
{
topIndexLessThanSetting = topIndex;
}
}
topIndexes[topIndexLessThanSetting] = settingIndex;
}
// topIndexes = { 1, 2, 8 }
Input:
public class MyObject
{
public double Value { get; set; }
public DateTime Date { get; set; }
}
Method to generate test objects:
public static MyObject[] GetTestObjects()
{
var rnd = new Random();
var date = new DateTime(2021, 1, 1, 0, 0, 0);
var result = new List<MyObject>();
for (int i = 0; i < 50000; i++)
{
//this is to simulate real data having gaps
if (rnd.Next(100) < 25)
{
continue;
}
var myObject = new MyObject()
{
Value = rnd.NextDouble(),
Date = date.AddMinutes(15 * i)
};
result.Add(myObject);
}
return result.ToArray();
}
Given this, I need to calculate the maximum Value over the previous 12 months for each MyObject. I could just do this in parallel, but maybe there is a more optimized solution?
Sorry for being unclear, this is what I use right now to get what I want:
public MyObject[] BruteForceBackward(MyObject[] testData)
{
return testData.AsParallel().Select(point =>
{
var max = testData.Where(x => x.Date <= point.Date && x.Date >= point.Date.AddYears(-1)).Max(x => x.Value);
return new MyObject() { Date = point.Date, Value = point.Value / max };
}).OrderBy(r => r.Date).ToArray();
}
This works, but it is slow and eats processor resources (imagine you have 100k objects). I believe there must be something better.
I had a similar project where I had to calculate this kind of thing on tons of sensor data.
You can now find a slightly more refined version in my GitHub repository, which should be ready to use (.NET):
https://github.com/forReason/Statistics-Helper-Library
In general you want to reduce the number of loops going over all your data. Ideally, you touch each element only once.
Process Array (equivalent of BruteForceBackward)
public static MyObject[] FlowThroughForward(ref MyObject[] testData)
{
// generate return array
MyObject[] returnData = new MyObject[testData.Length];
// keep track to minimize processing
double currentMaximum = 0;
List<MyObject> maximumValues = new List<MyObject>();
// go through the elements
for (int i = 0; i < testData.Length; i++)
{
// calculate the oldest date to keep in tracking list
DateTime targetDate = testData[i].Date.AddYears(-1);
// maximum logic
if (testData[i].Value >= currentMaximum)
{
// new maximum found, clear tracking list
// this is the best case scenario
maximumValues.Clear();
currentMaximum = testData[i].Value;
}
else
{
// unfortunately, no new maximum was found
// go backwards the maximum tracking list and check for smaller values
// clear the list of all smaller values. The list should therefore always
// be in descending order
for (int b = maximumValues.Count - 1; b >= 0; b--)
{
if (maximumValues[b].Value <= testData[i].Value)
{
// a lower value has been found. We have a newer, higher value
// clear this waste value from the tracking list
maximumValues.RemoveAt(b);
}
else
{
// there are no more lower values.
// stop looking for smaller values to save time
break;
}
}
}
// append new value to tracking list, no matter if higher or lower
// all future values might be lower
maximumValues.Add(testData[i]);
// check if the oldest value is too old to be kept in the tracking list
while (maximumValues[0].Date < targetDate)
{
// oldest value is to be removed
maximumValues.RemoveAt(0);
// update maximum
currentMaximum = maximumValues[0].Value;
}
// add object to result list
returnData[i] = new MyObject() { Date = testData[i].Date, Value = testData[i].Value / currentMaximum };
}
return returnData;
}
Real Time Data or Streamed Data
Note: if you have really large lists, you might run into memory issues with the approach of passing a full array. In that case, pass one value at a time, from oldest to newest, and store the results back one at a time.
This function can also be used on real-time data.
A test method is included in the code below.
static void Main(string[] args)
{
int length = 50000;
Stopwatch stopWatch1 = new Stopwatch();
stopWatch1.Start();
var myObject = new MyObject();
var result = new List<MyObject>();
var date = new DateTime(2021, 1, 1, 0, 0, 0);
for (int i = 0; i < length; i++)
{
//this is to simulate real data having gaps
if (rnd.Next(100) < 25)
{
continue;
}
// use a fresh object each iteration; the tracking list stores references,
// so reusing one mutated instance would corrupt it
myObject = new MyObject() { Value = rnd.NextDouble(), Date = date.AddMinutes(15 * i) };
result.Add(CalculateNextObject(ref myObject));
}
stopWatch1.Stop();
Console.WriteLine("test code executed in " + stopWatch1.ElapsedMilliseconds + " ms");
Thread.Sleep(1000000);
}
private static Random rnd = new Random();
private static double currentMaximum = 0;
private static List<MyObject> maximumValues = new List<MyObject>();
public static MyObject CalculateNextObject(ref MyObject input)
{
// calculate the oldest date to keep in tracking list
DateTime targetDate = input.Date.AddYears(-1);
// maximum logic
if (input.Value >= currentMaximum)
{
// new maximum found, clear tracking list
// this is the best case scenario
maximumValues.Clear();
currentMaximum = input.Value;
}
else
{
// unfortunately, no new maximum was found
// go backwards the maximum tracking list and check for smaller values
// clear the list of all smaller values. The list should therefore always
// be in descending order
for (int b = maximumValues.Count - 1; b >= 0; b--)
{
if (maximumValues[b].Value <= input.Value)
{
// a lower value has been found. We have a newer, higher value
// clear this waste value from the tracking list
maximumValues.RemoveAt(b);
}
else
{
// there are no more lower values.
// stop looking for smaller values to save time
break;
}
}
}
// append new value to tracking list, no matter if higher or lower
// all future values might be lower
maximumValues.Add(input);
// check if the oldest value is too old to be kept in the tracking list
while (maximumValues[0].Date < targetDate)
{
// oldest value is to be removed
maximumValues.RemoveAt(0);
// update maximum
currentMaximum = maximumValues[0].Value;
}
// add object to result list
MyObject returnData = new MyObject() { Date = input.Date, Value = input.Value / currentMaximum };
return returnData;
}
Test Method
static void Main(string[] args)
{
MyObject[] testData = GetTestObjects();
Stopwatch stopWatch1 = new Stopwatch();
Stopwatch stopWatch2 = new Stopwatch();
stopWatch1.Start();
MyObject[] testresults1 = BruteForceBackward(testData);
stopWatch1.Stop();
Console.WriteLine("BruteForceBackward executed in " + stopWatch1.ElapsedMilliseconds + " ms");
stopWatch2.Start();
MyObject[] testresults2 = FlowThroughForward(ref testData);
stopWatch2.Stop();
Console.WriteLine("FlowThroughForward executed in " + stopWatch2.ElapsedMilliseconds + " ms");
Console.WriteLine();
Console.WriteLine("Comparing some random test results: ");
var rnd = new Random();
for (int i = 0; i < 10; i++)
{
int index = rnd.Next(0, testData.Length);
Console.WriteLine("Index: " + index + " brute: " + testresults1[index].Value + " flow: " + testresults2[index].Value);
}
Thread.Sleep(1000000);
}
Test result
Tests were performed on a machine with 32 cores, so in theory the multithreaded approach should be at an advantage, but you'll see ;)
Function             Time      Time %
BruteForceBackward   5334 ms   99.9%
FlowThroughForward   5 ms      0.094%
Performance improvement factor: roughly 1000x.
Console output with data validation:
BruteForceBackward executed in 5264 ms
FlowThroughForward executed in 5 ms
Comparing some random test results:
Index: 25291 brute: 0.989688139105413 flow: 0.989688139105413
Index: 11945 brute: 0.59670821976193 flow: 0.59670821976193
Index: 30282 brute: 0.413238225210297 flow: 0.413238225210297
Index: 33898 brute: 0.38258761939139 flow: 0.38258761939139
Index: 8824 brute: 0.833512217105447 flow: 0.833512217105447
Index: 22092 brute: 0.648052464067263 flow: 0.648052464067263
Index: 24633 brute: 0.35859417692481 flow: 0.35859417692481
Index: 24061 brute: 0.540642018793402 flow: 0.540642018793402
Index: 34219 brute: 0.498785766613022 flow: 0.498785766613022
Index: 2396 brute: 0.151471808392111 flow: 0.151471808392111
CPU usage was a lot higher for BruteForceBackward due to parallelisation.
The worst-case scenario is long periods of decreasing values. The code can still be optimized further, but I guess this should be sufficient. For further optimisation, one might look at reducing the list shuffles when removing/adding elements to maximumValues, as sketched below.
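One way to avoid the RemoveAt shuffling is to keep head and tail indexes over a pre-sized buffer instead of physically removing elements. A minimal sketch of that idea (my own code, not from the repository above; a production version would wrap or compact the indexes instead of requiring a capacity equal to the total item count):
// Monotonic-deque style tracking: expiring old entries and dropping dominated
// entries only move an index, so there are no array shuffles.
public class MaxWindowDeque
{
    private readonly MyObject[] buffer;
    private int head; // index of the current maximum (oldest surviving entry)
    private int tail; // index one past the newest entry

    // capacity must cover the total number of items pushed in this simple sketch
    public MaxWindowDeque(int capacity) { buffer = new MyObject[capacity]; }

    public double CurrentMax => buffer[head].Value;

    public void Add(MyObject item)
    {
        // Drop dominated entries from the back: they can never become the maximum.
        while (tail > head && buffer[tail - 1].Value <= item.Value)
            tail--;
        buffer[tail++] = item;
        // Expire entries older than one year from the front.
        DateTime limit = item.Date.AddYears(-1);
        while (buffer[head].Date < limit)
            head++;
    }
}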
An interesting and challenging problem. I put together a solution using a dynamic programming approach (which I first learned in a CS algorithms class back in '78). First, a tree is constructed containing pre-calculated local max values over recursively defined ranges. Once constructed, the max value for an arbitrary range can be efficiently calculated mostly from the pre-calculated values. Only at the fringes of the range does the calculation drop down to the element level.
It is not as fast as julian bechtold's FlowThroughForward method, but random access to ranges may be a plus.
Code to add to Main:
Console.WriteLine();
Stopwatch stopWatch3 = new Stopwatch();
stopWatch3.Start();
MyObject[] testresults3 = RangeTreeCalculation(ref testData, 10);
stopWatch3.Stop();
Console.WriteLine($"RangeTreeCalculation executed in {stopWatch3.ElapsedMilliseconds} ms");
... test comparison
Console.WriteLine($"Index: {index} brute: {testresults1[index].Value} flow: {testresults2[index].Value} rangeTree: {testresults3[index].Value}");
Test function:
public static MyObject[] RangeTreeCalculation(ref MyObject[] testDataArray, int partitionThreshold)
{
// For this implementation, we need to convert the array to a List, because we need a
// reference-type object that can be shared.
List<MyObject> testDataList = testDataArray.ToList();
// Construct a tree containing recursive collections of pre-calculated values
var rangeTree = new RangeTree(testDataList, partitionThreshold);
MyObject[] result = new MyObject[testDataList.Count];
Parallel.ForEach(testDataList, (item, state, i) =>
{
var max = rangeTree.MaxForDateRange(item.Date.AddYears(-1), item.Date);
result[i] = new MyObject() { Date = item.Date, Value = item.Value / max };
});
return result;
}
Supporting class:
// Class used to divide and conquer using dynamic programming.
public class RangeTree
{
public List<MyObject> Data; // This reference is shared by all members of the tree
public int Start { get; } // Index of first element covered by this node.
public int Count { get; } // Number of elements covered by this node.
public DateTime FirstDateTime { get; }
public DateTime LastDateTime { get; }
public double MaxValue { get; } // Pre-calculated max for all elements covered by this node.
List<RangeTree> ChildRanges { get; }
// Top level node constructor
public RangeTree(List<MyObject> data, int partitionThreshold)
: this(data, 0, data.Count, partitionThreshold)
{
}
// Child node constructor, which covers a recursively decreasing range of elements.
public RangeTree(List<MyObject> data, int start, int count, int partitionThreshold)
{
Data = data;
Start = start;
Count = count;
FirstDateTime = Data[Start].Date;
LastDateTime = Data[Start + Count - 1].Date;
if (count <= partitionThreshold)
{
// If the range is smaller than the threshold, just calculate the local max
// directly from the items. No child ranges are defined.
MaxValue = Enumerable.Range(Start, Count).Select(i => Data[i].Value).Max();
}
else
{
// We still have a significant range. Decide how to further divide them up into sub-ranges.
// (There may be room for improvement here to better balance the tree.)
int partitionSize = (count - 1) / partitionThreshold + 1;
int partitionCount = (count - 1) / partitionSize + 1;
if (count < partitionThreshold * partitionThreshold)
{
// When one away from leaf nodes, prefer fewer full leaf nodes over more
// less populated leaf nodes.
partitionCount = (count - 1) / partitionThreshold + 1;
partitionSize = (count - 1) / partitionCount + 1;
}
ChildRanges = Enumerable.Range(0, partitionCount)
.Select(partitionNum => new {
ChildStart = Start + partitionNum * partitionSize,
ChildCount = Math.Min(partitionSize, Count - partitionNum * partitionSize)
})
.Where(part => part.ChildCount > 0) // Defensive
.Select(part => new RangeTree(Data, part.ChildStart, part.ChildCount, partitionThreshold))
.ToList();
// Now is the dynamic programming part:
// Calculate the local max as the max of all child max values.
MaxValue = ChildRanges.Max(child => child.MaxValue);
}
}
// Get the max value for a given range of dates within this rangeTree node.
// This uses the precalculated values as much as possible.
// Only at the fringes of the date range do we calculate at the element level.
public double MaxForDateRange(DateTime fromDate, DateTime thruDate)
{
double calculatedMax = Double.MinValue;
if (fromDate > this.LastDateTime || thruDate < this.FirstDateTime)
{
// Entire range is excluded. Nothing of interest here folks.
calculatedMax = Double.MinValue;
}
else if (fromDate <= this.FirstDateTime && thruDate >= this.LastDateTime)
{
// Entire range is included. Use the already-calculated max.
calculatedMax = this.MaxValue;
}
else if (ChildRanges != null)
{
// We have child ranges. Recurse and accumulate.
// Possible optimization: Calculate max for middle ranges first, and only bother
// with extreme partial ranges if their local max values exceed the preliminary result.
for (int i = 0; i < ChildRanges.Count; ++i)
{
double childMax = ChildRanges[i].MaxForDateRange(fromDate, thruDate);
if (childMax > calculatedMax)
{
calculatedMax = childMax;
}
}
}
else
{
// Leaf range. Loop through just this limited range of elements, checking individually for
// date in range and accumulating the result.
for (int i = 0; i < this.Count; ++i)
{
var element = Data[this.Start + i];
if (fromDate <= element.Date && element.Date <= thruDate && element.Value > calculatedMax)
{
calculatedMax = element.Value;
}
}
}
return calculatedMax;
}
}
There's plenty of room for improvement, such as parameterizing the types and generalizing the functionality to support more than just Max(Value), but the framework is there.
Assuming you meant you need the maximum Value for each of the last 12 months from result, then you can use LINQ:
var beginDateTime = DateTime.Now.AddMonths(-12);
var ans = result.Where(r => r.Date >= beginDateTime).GroupBy(r => r.Date.Month).Select(mg => mg.MaxBy(r => r.Value)).ToList();
Running some timing, I get that putting AsParallel after result changes the run time from around 16ms (first run) to around 32ms, so it is actually slower. It is about the same after the Where and about 23ms after the GroupBy (processing the 12 groups in parallel). On my PC at least, there isn't enough data or complex operations for parallelism, but the GroupBy isn't the most efficient.
Using an array and testing each element, I get the results in about 1.2ms:
var maxMOs = new MyObject[12];
foreach (var r in result.Where(r => r.Date >= beginDateTime)) {
var monthIndex = r.Date.Month-1;
if (maxMOs[monthIndex] == null || r.Value > maxMOs[monthIndex].Value)
maxMOs[monthIndex] = r;
}
Note that the results are not chronological; you could offset monthIndex by today's month to order the results if desired.
var maxMOs = new MyObject[12];
var offset = DateTime.Now.Month-11;
foreach (var r in result.Where(r => r.Date >= beginDateTime)) {
var monthIndex = (r.Date.Month-offset) % 12; // wrap so the oldest month maps to index 0
if (maxMOs[monthIndex] == null || r.Value > maxMOs[monthIndex].Value)
maxMOs[monthIndex] = r;
}
A micro-optimization (mostly useful on repeat runnings) is to invert the test and use the null-propagating operator:
if (!(r.Value <= maxMOs[monthIndex]?.Value))
This saves about 0.2ms on the first run but up to 0.5ms on subsequent runs.
Here is a solution similar to julian bechtold's answer. The difference is that the maximum (and all related variables) is kept hidden away from the main implementation, in a separate class whose sole purpose is to keep track of the maximum over the past year. The algorithm is the same; I just use a few LINQ expressions here and there.
We keep track of the maximum in the following class:
public class MaxSlidingWindow
{
private readonly List<MyObject> _maximumValues;
private double _max;
public MaxSlidingWindow()
{
_maximumValues = new List<MyObject>();
_max = double.NegativeInfinity;
}
public double Max => _max;
public void Add(MyObject myObject)
{
if (myObject.Value >= _max)
{
_maximumValues.Clear();
_max = myObject.Value;
}
else
{
RemoveValuesSmallerThan(myObject.Value);
}
_maximumValues.Add(myObject);
RemoveObservationsBefore(myObject.Date.AddYears(-1));
_max = _maximumValues[0].Value;
}
private void RemoveObservationsBefore(DateTime targetDate)
{
var toRemoveFromFront = 0;
while (toRemoveFromFront <= _maximumValues.Count - 1 && _maximumValues[toRemoveFromFront].Date < targetDate)
{
toRemoveFromFront++;
}
_maximumValues.RemoveRange(0, toRemoveFromFront);
}
private void RemoveValuesSmallerThan(double targetValue)
{
var maxEntry = _maximumValues.Count - 1;
var toRemoveFromBack = 0;
while (toRemoveFromBack <= maxEntry && _maximumValues[maxEntry - toRemoveFromBack].Value <= targetValue)
{
toRemoveFromBack++;
}
_maximumValues.RemoveRange(maxEntry - toRemoveFromBack + 1, toRemoveFromBack);
}
}
It can be used as follows:
public static MyObject[] GetTestObjects_MaxSlidingWindow()
{
var rnd = new Random();
var date = new DateTime(2021, 1, 1, 0, 0, 0);
var result = new List<MyObject>();
var maxSlidingWindow = new MaxSlidingWindow();
for (int i = 0; i < 50000; i++)
{
//this is to simulate real data having gaps
if (rnd.Next(100) < 25)
{
continue;
}
var myObject = new MyObject()
{
Value = rnd.NextDouble(),
Date = date.AddMinutes(15 * i)
};
maxSlidingWindow.Add(myObject);
var max = maxSlidingWindow.Max;
result.Add(new MyObject { Date = myObject.Date, Value = myObject.Value / max });
}
return result.ToArray();
}
See the relative timings below - above solution is slightly faster (timed over 10 million runs), but barely noticeable:
Relative timings
What I want to do is find groups of consecutive numbers (Excel row numbers) in a List and, for each chunk of consecutive numbers, delete the rows en masse rather than one at a time, since at times I will be iterating through up to 9K rows and growing. The issue I am running into is that, with the way I have tweaked the foreach loop, I would need to reuse the last non-consecutive variable that was checked.
Ex: The list is rows {23,22,21,17,16,15}; I would need to pull 23-21, then 17-15, out and delete the chunks (that's why they are in descending order: work from the bottom up). The loop enters the != if statement on 17, and works, but then 17 is already used and 16 is the next iteration of the loop, so 17 is never captured as the start of the next consecutively numbered grouping.
My question: Is there a way to hold on to the 17, and any other start of a new consecutive group, in this manner or am I barking up the wrong tree?
Code:
public void FindMatchingBlocks(string stateId, string[] rangeNames)
{
Excel.Worksheet wksht = wkbk.Sheets["Sheet1"];
Excel.Range rng = wksht.Range["$A$15:$A$23"];
string val;
string val2;
List<int> rowNums = new List<int>();
string rngStart = rangeNames[0].ToString(); // gives me "$A$15"
string rngEnd = rangeNames[1].ToString(); // gives me "$A$23"
string[] tempArray = rngEnd.Split('$');
string end = tempArray[2].ToString();
List<int> rowsToDelete = new List<int>();
foreach (Excel.Range range in rng)
{
if (range.Row < Convert.ToInt32(end)+1)
{
//pulls out the first two characters of the cell value to
// match it to the stateID, if they match they are not to
// be added to the list and not be deleted.
val = range.Value.ToString();
val2 = val.Substring(0, 2);
if (Convert.ToInt32(val2) != Convert.ToInt32(stateId))
{
rowsToDelete.Add(range.Row); // ends up being
// {23,22,21,17,16,15}
}
}
}
int count = 0;
int firstItem = 0;
rowsToDelete.Reverse(); //delete from the bottom up
foreach (int x in rowsToDelete)
{
// First value in the ordered list: start of a sequence
if (count == 0)
{
firstItem = x;
count = 1;
}
// Value continues the consecutive (descending) run
else if (x == firstItem - count)
{
count++;
}
// Value breaks the run: delete the completed chunk
else if (x != firstItem - count)
{
int endRow = firstItem;
int startRow = firstItem - count + 1;
Excel.Range delRange = wksht.Rows[startRow.ToString() + ":" + endRow.ToString()];
delRange.Delete(Excel.XlDeleteShiftDirection.xlShiftUp);
count = 0;
firstItem = ????; //can I do something to keep the first
//non-consecutive number each time it is
// encountered. In this list it skips 17
}
}
}
Hopefully this is clear, took me a bit to figure out how to concisely explain what I need. Thanks.
What do we have? A sequence of integers.
What do we want? A sequence of integer ranges.
Start by representing that in the type system. We have IEnumerable<int> for a sequence of integers. Let's make a little type: (using C# 6 notation here)
struct MyRange
{
public int High { get; }
public int Low { get; }
public MyRange(int high, int low) : this()
{
High = high;
Low = low;
}
}
Easy. What is the signature of our method? We want integers in and ranges out, so:
static class MyExtensions
{
public static IEnumerable<MyRange> DescendingChunks(this IEnumerable<int> items)
Seems reasonable. Now what does this thing do? There are three cases. Either we've got no range at all because we're the first, or we're extending the current range, or we've got a new range. So one case for each:
{
bool first = true;
int high = 0;
int low = 0;
foreach(int item in items)
{
if (first)
{
high = item;
low = item;
first = false;
}
else if (item == low - 1)
{
low = item;
}
else
{
yield return new MyRange(high, low);
high = item;
low = item;
}
}
And we never yielded the last thing in the sequence...
yield return new MyRange(high, low);
}
Make sense? Now instead of your loop
foreach (int x in rowsToDelete)
we have
foreach(MyRange range in rowsToDelete.DescendingChunks())
and now you know what range to modify.
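For the original worksheet scenario, the consuming loop could then look roughly like this (a sketch reusing the wksht variable from the question's code):
foreach (MyRange range in rowsToDelete.DescendingChunks())
{
    // Each range is one contiguous block, e.g. 23-21 and then 17-15,
    // so the whole block can be deleted in a single call.
    Excel.Range delRange = wksht.Rows[range.Low.ToString() + ":" + range.High.ToString()];
    delRange.Delete(Excel.XlDeleteShiftDirection.xlShiftUp);
}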
Super bonus question: there is another case I did not enumerate, and as a result there is a small bug in this method. What is it?
It took some time, but I was able to come up with a compact way to take a list of numbers, find the consecutive numbers, and group them into a list. Hopefully someone finds this useful:
private void groupConsecutiveNumbers()
{
/* this could easily be changed to look for ascending numbered groups by switching some of the "-1" to "+1"
* and swapping the firstNum/endNum variables. */
int[] numArray = new int[]{ 50, 23, 22, 21, 15, 16, 14, 9, 5, 4, 3, 1};
int firstNum = 0;
int endNum = 0;
string grouping;
for (int i = 0; i < numArray.Length; i++)
{
//If there is only 1 member of the list, that will be the first and last member of the group
if (numArray.Length == 1)
{
firstNum = numArray[0];
endNum = numArray[0];
grouping = firstNum.ToString() + "-" + endNum.ToString();
lstGroups.Items.Add(grouping);
}
//if the number is the first one in the list then it automatically is the first one in the first list
else if (i == 0)
{
firstNum = numArray[0];
}
/* if its not the first one in the list and it is equal to the previous list item minus one
* (contiguously descending), then enter this loop */
else if (numArray[i] == (numArray[i-1] - 1))
{
//if this is the last member of the list, it automatically is the last item in the range
if ((i + 1) == numArray.Length)
{
endNum = numArray[i];
grouping = firstNum.ToString() + "-" + endNum.ToString();
lstGroups.Items.Add(grouping);
}
//if this isn't the last member of the list, exit the loop and continue with the next item.
else
{
continue;
}
}
/* if the item is not the first one in the list and does NOT equal the previous item minus one
* (not contiguously descending) then the previous item was the last contiguously descending
* item and the current item is the first item in the next group */
else if (numArray[i] != (numArray[i-1]-1))
{
endNum = numArray[i - 1];
grouping = firstNum.ToString() + "-" + endNum.ToString();
lstGroups.Items.Add(grouping);
firstNum = numArray[i];
endNum = 0;
}
/* After all that testing,if the item is the last item in the list AND the first number in the group
* is also the last item in the list then the current item in the list is both the first and last member
* in the current group. */
if ((i + 1) == numArray.Length && firstNum == numArray[i])
{
endNum = numArray[i];
grouping = firstNum.ToString() + "-" + endNum.ToString();
lstGroups.Items.Add(grouping);
}
}
}
Consider the following enumeration:
[Flags]
public enum EnumWithUniqueBitFlags
{
None = 0,
One = 1,
Two = 2,
Four = 4,
Eight = 8,
}
[Flags]
public enum EnumWithoutUniqueFlags
{
None = 0,
One = 1,
Two = 2,
Four = 4,
Five = 5,
}
The first one has values such that any combination will produce a unique bit combination, while the second one will not. I need to programmatically determine this. So far I had only been checking whether every value was a power of two, but since this code will be used to consume enums developed by others, that is not practical.
var values = Enum.GetValues(typeof(TEnum)).OfType<TEnum>().ToList();
for (int i = 0; i < values.Count; i++)
{
if (((int) ((object) values [i])) != ((int) Math.Pow(2, i)))
{
throw (new Exception("Whatever."));
}
}
As opposed to the code above, how could I programmatically determine that the following enum meets the bit-combination uniqueness objective (without assumptions such as values being powers of 2, etc.)?
[Flags]
public enum EnumWithoutUniqueFlags
{
Two = 2,
Four = 9,
Five = 64,
}
Please ignore which integral type the enum derives from as well as the fact that values could be negative.
You could compare the bitwise OR of enum values with the arithmetic sum of enum values:
var values = Enum.GetValues(typeof(EnumWithoutUniqueFlags))
.OfType<EnumWithoutUniqueFlags>().Select(val => (int)val).ToArray();
bool areBitsUnique = values.Aggregate(0, (acc, val) => acc | val) == values.Sum();
EDIT
As @usr mentioned, the code above works for positive values only.
Despite the fact that the OP requested to ignore:
Please ignore which integral type the enum derives from as well as the fact that values could be negative.
I'm glad to introduce a more optimal single-loop approach:
private static bool AreEnumBitsUnique<T>()
{
int mask = 0;
foreach (int val in Enum.GetValues(typeof(T)))
{
if ((mask & val) != 0)
return false;
mask |= val;
}
return true;
}
To make it work for other underlying type (uint, long, ulong), just change the type of the mask and val variables.
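For example, a ulong-based variant might look like this (a sketch; Convert.ToUInt64 is used instead of a direct cast because the boxed values can have different underlying types):
private static bool AreEnumBitsUnique64<TEnum>()
{
    ulong mask = 0;
    foreach (object val in Enum.GetValues(typeof(TEnum)))
    {
        ulong bits = Convert.ToUInt64(val); // assumes non-negative values, per the question
        if ((mask & bits) != 0)
            return false;
        mask |= bits;
    }
    return true;
}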
EDIT2
Since the Enum.GetValues method returns values in the ascending order of their unsigned magnitude, if you need to check for duplicates of zero value, you could use the following approach:
private static bool AreEnumBitsUnique<T>()
{
int mask = 0;
int index = 0;
foreach (int val in Enum.GetValues(typeof(T)))
{
if ((mask & val) != 0) // If `val` and `mask` have common bit(s)
return false;
if (val == 0 && index != 0) // If more than one zero value in the enum
return false;
mask |= val;
index += 1;
}
return true;
}
To be unique, each enum value must have no bits in common with the other enum values. The bitwise AND operation can be used to check this.
for (int i = 0; i < values.Count; ++i)
{
for (int j = 0; j < values.Count; ++j)
{
if (i != j && ((values[i] & values[j]) != 0))
throw new Exception(...);
}
}
I would cast the enums to uints, take the bitwise NOT of 0x0, XOR each value in, and if the next running value is less than or equal to the previous one, you have a collision:
public static bool CheckEnumClashing<TEnum>()
{
uint prev = 0;
uint curr = 0;
prev = curr = ~curr;
foreach(var target in Enum.GetValues(typeof(TEnum)).Cast<object>().Select(a => Convert.ToUInt32(a)))
{
curr ^=target;
if( curr <= prev )
return false;
prev = curr;
}
return true;
}
I want to get the index of the first bit that is equal to 1, and the index of the last bit that is equal to 1.
For example:
data=0x3E
first bit = 1
last bit = 5
How can I do this?
Take a look at the BitArray class and its Item indexer. Then you could create a BitArray and loop over its items to get what you want easily.
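A minimal sketch of that idea (my own code, not from the answer itself; BitArray stores the least significant bit at index 0):
using System;
using System.Collections;

class Program
{
    static void Main()
    {
        int data = 0x3E;
        var bits = new BitArray(new[] { data });
        int first = -1, last = -1;
        for (int i = 0; i < bits.Length; i++)
        {
            if (bits[i])
            {
                if (first == -1) first = i; // lowest set bit
                last = i;                   // keeps updating to the highest set bit
            }
        }
        Console.WriteLine(first); // 1
        Console.WriteLine(last);  // 5
    }
}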
I'm sure there are some fun bit-twiddling tricks you can use, but here's a naive solution:
public static void Main()
{
uint data = 0x3E;
uint firstMask = 1;
uint lastMask = 0x80000000;
int? first = null;
int? last = null;
for (int i=0; i < 32; i++) {
if ((firstMask & data) > 0 && first == null)
{
first = i;
}
if ((lastMask & data) != 0 && last == null)
{
last = i;
}
firstMask = firstMask << 1;
lastMask = lastMask >> 1;
}
last = 31-last;
Console.WriteLine(first);
Console.WriteLine(last);
}
See http://dotnetfiddle.net/FmN7QL
Manually build a lookup table:
var firstBit = new int[0x100] { -1, 0, 1, 0 ... };
var lastBit = new int[0x100] { -1, 0, 1, 1 ... };
Then use 'data' to index into those tables.
byte data = 0x3e;
var f = firstBit[data];
var l = lastBit[data];
I'll leave it as an exercise to automate the creation of the tables. :)
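If you do want to automate it, the tables can be filled with a straightforward scan (my own sketch, keeping the -1 convention for zero shown above):
// Build 256-entry lookup tables for the lowest and highest set bit of a byte.
var firstBit = new int[0x100];
var lastBit = new int[0x100];
for (int value = 0; value < 0x100; value++)
{
    firstBit[value] = -1;
    lastBit[value] = -1;
    for (int bit = 0; bit < 8; bit++)
    {
        if ((value & (1 << bit)) != 0)
        {
            if (firstBit[value] == -1)
                firstBit[value] = bit; // first (lowest) set bit
            lastBit[value] = bit;      // ends up as the last (highest) set bit
        }
    }
}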
The .NET framework has an Array.Sort overload that allows one to specify the starting index and length for the sort to act upon. However, these parameters are only 32-bit. So I don't see a way to sort a part of a large array when the indices that describe the sort range can only be specified using a 64-bit number. I suppose I could copy and modify the framework's sort implementation, but that is not ideal.
Update:
I've created two classes to help me around these and other large-array issues. One other such issue was that long before I got to my memory limit, I started getting OutOfMemoryExceptions. I'm assuming this is because the requested memory may be available but not contiguous. So for that, I created the BigArray class, which is a generic, dynamically sizable list of arrays. It has a smaller memory footprint than the framework's generic List class, and does not require that the entire array be contiguous. I haven't tested the performance hit, but I'm sure it's there.
public class BigArray<T> : IEnumerable<T>
{
private long capacity;
private int itemsPerBlock;
private int shift;
private List<T[]> blocks = new List<T[]>();
public BigArray(int itemsPerBlock)
{
shift = (int)Math.Ceiling(Math.Log(itemsPerBlock) / Math.Log(2));
this.itemsPerBlock = 1 << shift;
}
public long Capacity
{
get
{
return capacity;
}
set
{
var requiredBlockCount = (value - 1) / itemsPerBlock + 1;
while (blocks.Count > requiredBlockCount)
{
blocks.RemoveAt(blocks.Count - 1);
}
while (blocks.Count < requiredBlockCount)
{
blocks.Add(new T[itemsPerBlock]);
}
capacity = (long)itemsPerBlock * blocks.Count;
}
}
public T this[long index]
{
get
{
Debug.Assert(index < capacity);
var blockNumber = (int)(index >> shift);
var itemNumber = index & (itemsPerBlock - 1);
return blocks[blockNumber][itemNumber];
}
set
{
Debug.Assert(index < capacity);
var blockNumber = (int)(index >> shift);
var itemNumber = index & (itemsPerBlock - 1);
blocks[blockNumber][itemNumber] = value;
}
}
public IEnumerator<T> GetEnumerator()
{
for (long i = 0; i < capacity; i++)
{
yield return this[i];
}
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
And getting back to the original issue of sorting... What I really needed was a way to act on each element of an array, in order. But with such large arrays, it is prohibitive to copy the data, sort it, act on it and then discard the sorted copy (the original order must be maintained). So I created static class OrderedOperation, which allows you to perform an arbitrary operation on each element of an unsorted array, in a sorted order. And do so with a low memory footprint (trading memory for execution time here).
public static class OrderedOperation
{
public delegate void WorkerDelegate(int index, float progress);
public static void Process(WorkerDelegate worker, IEnumerable<int> items, int count, int maxItem, int maxChunkSize)
{
// create a histogram such that a single bin is never bigger than a chunk
int binCount = 1000;
int[] bins;
double binScale;
bool ok;
do
{
ok = true;
bins = new int[binCount];
binScale = (double)(binCount - 1) / maxItem;
int i = 0;
foreach (int item in items)
{
bins[(int)(binScale * item)]++;
if (++i == count)
{
break;
}
}
for (int b = 0; b < binCount; b++)
{
if (bins[b] > maxChunkSize)
{
ok = false;
binCount *= 2;
break;
}
}
} while (!ok);
var chunkData = new int[maxChunkSize];
var chunkIndex = new int[maxChunkSize];
var done = new System.Collections.BitArray(count);
var processed = 0;
var binsCompleted = 0;
while (binsCompleted < binCount)
{
var chunkMax = 0;
var sum = 0;
do
{
sum += bins[binsCompleted];
binsCompleted++;
} while (binsCompleted < binCount - 1 && sum + bins[binsCompleted] <= maxChunkSize);
Debug.Assert(sum <= maxChunkSize);
chunkMax = (int)Math.Ceiling((double)binsCompleted / binScale);
var chunkCount = 0;
int i = 0;
foreach (int item in items)
{
if (item < chunkMax && !done[i])
{
chunkData[chunkCount] = item;
chunkIndex[chunkCount] = i;
chunkCount++;
done[i] = true;
}
if (++i == count)
{
break;
}
}
Debug.Assert(sum == chunkCount);
Array.Sort(chunkData, chunkIndex, 0, chunkCount);
for (i = 0; i < chunkCount; i++)
{
worker(chunkIndex[i], (float)processed / count);
processed++;
}
}
Debug.Assert(processed == count);
}
}
The two classes can work together (that's how I use them), but they don't have to. I hope someone else finds them useful. But I'll admit, they are fringe case classes. Questions welcome. And if my code sucks, I'd like to hear tips, too.
One final thought: As you can see in OrderedOperation, I'm using ints and not longs. Currently that is sufficient for me despite the original question I had (the application is in flux, in case you can't tell). But the class should be able to handle longs as well, should the need arise.
You'll find that even on the 64-bit framework, the maximum number of elements in an array is int.MaxValue.
The existing methods that take or return Int64 just cast the long values to Int32 internally and, in the case of parameters, will throw an ArgumentOutOfRangeException if a long parameter isn't between int.MinValue and int.MaxValue.
For example the LongLength property, which returns an Int64, just casts and returns the value of the Length property:
public long LongLength
{
get { return (long)this.Length; } // Length is an Int32
}
So my suggestion would be to cast your Int64 indices to Int32 and then call one of the existing Sort overloads.
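In code that is just a guarded cast (a sketch; myArray, startIndex and length are placeholder names of mine):
// Fall back to the existing 32-bit overload when the range allows it.
if (startIndex > int.MaxValue || length > int.MaxValue)
    throw new NotSupportedException("Range too large for the built-in Sort overloads.");
Array.Sort(myArray, (int)startIndex, (int)length);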
Since Array.Copy takes Int64 params, you could pull out the section you need to sort, sort it, then put it back. Assuming you're sorting less than 2^32 elements, of course.
Seems like if you are sorting more than 2^32 elements then it would be best to write your own, more efficient, sort algorithm anyway.
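A sketch of that copy/sort/copy-back idea (my own helper; it assumes the section length fits in an Int32 so the temporary array can be allocated):
// Sort a section [start, start + length) of a large array by copying it out,
// sorting the copy, and copying it back.
static void SortSection<T>(T[] bigArray, long start, long length)
{
    var section = new T[length];                      // temporary buffer for the section
    Array.Copy(bigArray, start, section, 0, length);  // Int64 overload of Array.Copy
    Array.Sort(section);                              // sort just the copied section
    Array.Copy(section, 0, bigArray, start, length);  // write it back in place
}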