enum ToString and performance - c#

I was profiling some C# code and wound up investigating the implementation of ToString for enum values.
The ToString method ultimately calls System.Type.GetEnumName. Here's the relevant source code, taken from http://referencesource.microsoft.com/#mscorlib/system/type.cs
public virtual string GetEnumName(object value)
{
if (value == null)
throw new ArgumentNullException("value");
if (!IsEnum)
throw new ArgumentException(Environment.GetResourceString("Arg_MustBeEnum"), "enumType");
Contract.EndContractBlock();
Type valueType = value.GetType();
if (!(valueType.IsEnum || Type.IsIntegerType(valueType)))
throw new ArgumentException(Environment.GetResourceString("Arg_MustBeEnumBaseTypeOrEnum"), "value");
Array values = GetEnumRawConstantValues();
int index = BinarySearch(values, value);
if (index >= 0)
{
string[] names = GetEnumNames();
return names[index];
}
return null;
}
// Returns the enum values as an object array.
private Array GetEnumRawConstantValues()
{
string[] names;
Array values;
GetEnumData(out names, out values);
return values;
}
// This will return enumValues and enumNames sorted by the values.
private void GetEnumData(out string[] enumNames, out Array enumValues)
{
Contract.Ensures(Contract.ValueAtReturn<String[]>(out enumNames) != null);
Contract.Ensures(Contract.ValueAtReturn<Array>(out enumValues) != null);
FieldInfo[] flds = GetFields(BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static);
object[] values = new object[flds.Length];
string[] names = new string[flds.Length];
for (int i = 0; i < flds.Length; i++)
{
names[i] = flds[i].Name;
values[i] = flds[i].GetRawConstantValue();
}
// Insertion Sort these values in ascending order.
// We use this O(n^2) algorithm, but it turns out that most of the time the elements are already in sorted order and
// the common case performance will be faster than quick sorting this.
IComparer comparer = Comparer.Default;
for (int i = 1; i < values.Length; i++)
{
int j = i;
string tempStr = names[i];
object val = values[i];
bool exchanged = false;
// Since the elements are sorted we only need to do one comparision, we keep the check for j inside the loop.
while (comparer.Compare(values[j - 1], val) > 0)
{
names[j] = names[j - 1];
values[j] = values[j - 1];
j--;
exchanged = true;
if (j == 0)
break;
}
if (exchanged)
{
names[j] = tempStr;
values[j] = val;
}
}
enumNames = names;
enumValues = values;
}
// Convert everything to ulong then perform a binary search.
private static int BinarySearch(Array array, object value)
{
ulong[] ulArray = new ulong[array.Length];
for (int i = 0; i < array.Length; ++i)
ulArray[i] = Enum.ToUInt64(array.GetValue(i));
ulong ulValue = Enum.ToUInt64(value);
return Array.BinarySearch(ulArray, ulValue);
}
If I'm understanding this correctly, whenever an enum's name is requested:
The entire set of enum values and names are inserted into arrays using insertion sort.
The values are copied to an array of ulongs.
This array is searched with binary search.
Am right to think that this is grossly inefficient? (especially for enums with hundreds or thousands of values)
It seems like some sort of dictionary/hashtable structure would be ideal. But even a simple O(n) linear scan of the values would have accomplished the same goal without sorting or binary searching or allocating superfluous arrays.
I was able to precompute a dictionary for one particular case where ToString was being called many times in a loop; and that yielded significant performance gains. But is there any built-in, alternative, more holistic technique that would also facilitate things like JSON serialization?

Related

Sorting an array of enumeration

Hello I have an array of enumerations, and I'm trying sort that array by their enumeration value and get the array index's of the top
private enum Values
{
LOW,
MEDIUM,
HIGH
}
private Values[] Settings = new Values[10];
Settings[0] = LOW;
Settings[1] = HIGH;
Settings[2] = MEDIUM;
Settings[3] = LOW;
Settings[4] = LOW;
Settings[5] = LOW;
Settings[6] = LOW;
Settings[7] = LOW;
Settings[8] = MEDIUM;
Settings[9] = MEDIUM;
Basically now, with what I have above, I need to sort the settings array by enumeration value and get the array indexes of the top (let's say 3) items;
So I'd get back values 1, 2, 8
The platform I'm using does not support LINQ so those handy functions are not available.
I've been trying to wrap my head around this but it would help to have another pair of eyes.
thanks.
Implement a wrapper reference type,
class ValueWrapper : IComparable<ValueWrapper>
{
public Values Value { get; set; }
public int Index { get; set; }
public int CompareTo(ValueWrapper other)
{
return this.Value.CompareTo(other.Value) * -1; // Negating since you want reversed order
}
}
Usage -
ValueWrapper[] WrappedSettings = new ValueWrapper[10];
for(int i = 0; i < WrappedSettings.Length; i++)
{
WrappedSettings[i] = new ValueWrapper { Value = Settings[i], Index = i };
}
Array.Sort(WrappedSettings);
WrappedSettings will be sorted as you specified, preserving the indexes they were in the original array.
how about this:
Values first = Values.Low,second = Values.Low,third = Values.Low;
int firstI = -1,secondI = -1, thirdI = -1;
for(int i = 0;i < Settings.Length;i++)
{
if(Settings[i] > first || firstI == -1)
{
third = second;
thirdI = secondI;
second= first;
secondI= firstI;
first = Settings[i];
firstI = i;
}
else if(Settings[i] > second || secondI == -1)
{
third = second;
thirdI = secondI;
second = Settings[i];
secondI = i;
}
else if(Settings[i] > third || thirdI == -1)
{
third = Settings[i];
thirdI = i;
}
}
So, since you say you work in an environment where Linq is not available, I assume that other things like generics, nullables and so on will not be available either. A very low-tech solution.
Basic idea:
For every possible enum value, from highest to lowest, go through the list. If we find that value, output it and remember how many we have output. If we reach 3, stop the algorithm.
So, we first look for HIGH in the list, then for MEDIUM and so on.
class Program
{
private enum Values
{
LOW,
MEDIUM,
HIGH
}
static void Main(string[] args)
{
// Sample data
Values[] settings = new Values[10];
settings[0] = Values.LOW;
settings[1] = Values.HIGH;
settings[2] = Values.MEDIUM;
settings[3] = Values.LOW;
settings[4] = Values.LOW;
settings[5] = Values.LOW;
settings[6] = Values.LOW;
settings[7] = Values.LOW;
settings[8] = Values.MEDIUM;
settings[9] = Values.MEDIUM;
// Get Values of the enum type
// This list is sorted ascending by value but may contain duplicates
Array enumValues = Enum.GetValues(typeof(Values));
// Number of results found so far
int numberFound = 0;
// The enum value we used during the last outer loop, so
// we skip duplicate enum values
int lastValue = -1;
// For each enum value starting with the highest to the lowest
for (int i= enumValues.Length -1; i >= 0; i--)
{
// Get this enum value
int enumValue = (int)enumValues.GetValue(i);
// Check whether we had the same value in the previous loop
// If yes, skip it.
if(enumValue == lastValue)
{
continue;
}
lastValue = enumValue;
// For each entry in the list where we are searching
for(int j=0; j< settings.Length; j++)
{
// Check to see whether it is the currently searched value
if (enumValue == (int)settings[j])
{
// if yes, then output it.
Console.WriteLine(j);
numberFound++;
// Stop after 3 found entries
if (numberFound == 3)
{
goto finished;
}
}
}
}
finished:
Console.ReadLine();
}
}
Output is as requested 1,2,8
I'm not sure if this is exactly what you want because it doesn't sort the original array, but one way to get the indexes of the top three values is to simply store the indexes of the top values in another array. Then we can loop through the original array, and for each item, see if it's larger than any of the items at the indexes we've stored so far. If it is, then swap it with that item.
For example:
// Start the topIndexes array with all invalid indexes
var topIndexes = new[] {-1, -1, -1};
for (var settingIndex = 0; settingIndex < Settings.Length; settingIndex++)
{
var setting = Settings[settingIndex];
var topIndexLessThanSetting = -1;
// Find the smallest topIndex setting that's less than this setting
for (int topIndex = 0; topIndex < topIndexes.Length; topIndex++)
{
if (topIndexes[topIndex] == -1)
{
topIndexLessThanSetting = topIndex;
break;
}
if (setting <= Settings[topIndexes[topIndex]]) continue;
if (topIndexLessThanSetting == -1 ||
Settings[topIndexes[topIndex]] < Settings[topIndexes[topIndexLessThanSetting]])
{
topIndexLessThanSetting = topIndex;
}
}
topIndexes[topIndexLessThanSetting] = settingIndex;
}
// topIndexes = { 1, 2, 8 }

Determine if an Enum with FlagsAttribute has unique bit values

Consider the following enumeration:
[Flags]
public enum EnumWithUniqueBitFlags
{
None = 0,
One = 1,
Two = 2,
Four = 4,
Eight = 8,
}
[Flags]
public enum EnumWithoutUniqueFlags
{
None = 0,
One = 1,
Two = 2,
Four = 4,
Five = 5,
}
The first one has values such that any combination will produce a unique bit combination while the second one will not. I need to programmatically determine this. So far I had only been checking if every value was a power of two but this since this code will be used to consume Enums developed by others, it is not practical.
var values = Enum.GetValues(typeof(TEnum)).OfType<TEnum>().ToList();
for (int i = 0; i < values.Count; i++)
{
if (((int) ((object) values [i])) != ((int) Math.Pow(2, i)))
{
throw (new Exception("Whatever."));
}
}
As opposed to the code above how could programatically determine that the following Enum meets the bit combination uniqueness objective (without assumptions such as values being powers of 2, etc.)?
[Flags]
public enum EnumWithoutUniqueFlags
{
Two = 2,
Four = 9,
Five = 64,
}
Please ignore which integral type the enum derives from as well as the fact that values could be negative.
You could compare the bitwise OR of enum values with the arithmetic sum of enum values:
var values = Enum.GetValues(typeof(EnumWithoutUniqueFlags))
.OfType<EnumWithoutUniqueFlags>().Select(val => (int)val).ToArray();
bool areBitsUnique = values.Aggregate(0, (acc, val) => acc | val) == values.Sum();
EDIT
As the #usr mentioned, the code above works for positive values only.
Despite the fact that the OP requested to ignore:
Please ignore which integral type the enum derives from as well as the fact that values could be negative.
I'm glad to introduce more optimal single loop approach:
private static bool AreEnumBitsUnique<T>()
{
int mask = 0;
foreach (int val in Enum.GetValues(typeof(T)))
{
if ((mask & val) != 0)
return false;
mask |= val;
}
return true;
}
To make it work for other underlying type (uint, long, ulong), just change the type of the mask and val variables.
EDIT2
Since the Enum.GetValues method returns values in the ascending order of their unsigned magnitude, if you need to check for duplicates of zero value, you could use the following approach:
private static bool AreEnumBitsUnique<T>()
{
int mask = 0;
int index = 0;
foreach (int val in Enum.GetValues(typeof(T)))
{
if ((mask & val) != 0) // If `val` and `mask` have common bit(s)
return false;
if (val == 0 && index != 0) // If more than one zero value in the enum
return false;
mask |= val;
index += 1;
}
return true;
}
To be unique, the enum must have no bits in common with other enum values. The bitwise AND operation can be used for this.
for (int i = 0; i < values.Count; ++i)
{
for (int j = 0; j < values.Count; ++j)
{
if (i != j && ((values[i] & values[j]) != 0))
throw new Exception(...);
}
}
I would cast the enums to uints, do a bitwise not of a 0x0, use an xor and if the next value is less than the previous one you have a collision
public static bool CheckEnumClashing<TEnum>()
{
uint prev = 0;
uint curr = 0;
prev = curr = ~curr;
foreach(var target in Enum.GetValues(typeof(TEnum)).Select(a=>(uint)a))
{
curr ^=target;
if( curr <= prev )
return false;
prev = curr;
}
return true;
}

How do I get all possible combinations of an enum ( Flags )

[Flags]
public enum MyEnum
{
None = 0,
Setting1 = (1 << 1),
Setting2 = (1 << 2),
Setting3 = (1 << 3),
Setting4 = (1 << 4),
}
I need to be able to somehow loop over every posible setting and pass the settings combination to a function. Sadly I have been unable to figure out how to do this
Not tested, use at your own risk, but should solve the problem generically enough. System.Enum is not a valid restriction as technically C# only allow inheritance in/with class the backend bypasses this for Enum and ValueType. So sorry for the ugly casting. It is also not horribly efficient but unless you are running this against a dynamically generated type it should only ever have to be done once per execution (or once period if saved).
public static List<T> GetAllEnums<T>()
where T : struct
// With C# 7.3 where T : Enum works
{
// Unneeded if you add T : Enum
if (typeof(T).BaseType != typeof(Enum)) throw new ArgumentException("T must be an Enum type");
// The return type of Enum.GetValues is Array but it is effectively int[] per docs
// This bit converts to int[]
var values = Enum.GetValues(typeof(T)).Cast<int>().ToArray();
if (!typeof(T).GetCustomAttributes(typeof(FlagsAttribute), false).Any())
{
// We don't have flags so just return the result of GetValues
return values;
}
var valuesInverted = values.Select(v => ~v).ToArray();
int max = 0;
for (int i = 0; i < values.Length; i++)
{
max |= values[i];
}
var result = new List<T>();
for (int i = 0; i <= max; i++)
{
int unaccountedBits = i;
for (int j = 0; j < valuesInverted.Length; j++)
{
// This step removes each flag that is set in one of the Enums thus ensuring that an Enum with missing bits won't be passed an int that has those bits set
unaccountedBits &= valuesInverted[j];
if (unaccountedBits == 0)
{
result.Add((T)(object)i);
break;
}
}
}
//Check for zero
try
{
if (string.IsNullOrEmpty(Enum.GetName(typeof(T), (T)(object)0)))
{
result.Remove((T)(object)0);
}
}
catch
{
result.Remove((T)(object)0);
}
return result;
}
This works by getting all the values and ORing them together, rather than summing, in case there are composite numbers included. Then it takes every integer up to the maximum and masking them with the reverse of each Flag, this causes valid bits to become 0, allowing us to identify those bits that are impossible.
The check at the end is for missing zero from an Enum. You can remove it if you are fine with always including a zero enum in the results.
Gave the expected result of 15 when given an enum containing 2,4,6,32,34,16384.
Since it's a flagged enum, why not simply:
Get the highest vale of the enum.
Calculate the amound of combinations, i.e. the upper bound.
Iterate over every combination, i.e. loop from 0 to upper bound.
An example would look like this
var highestEnum = Enum.GetValues(typeof(MyEnum)).Cast<int>().Max();
var upperBound = highestEnum * 2;
for (int i = 0; i < upperBound; i++)
{
Console.WriteLine(((MyEnum)i).ToString());
}
Here is a solution particular to your code sample, using a simple for loop (don't use, see update below)
int max = (int)(MyEnum.Setting1 | MyEnum.Setting2 | MyEnum.Setting3 | MyEnum.Setting4);
for (int i = 0; i <= max; i++)
{
var value = (MyEnum)i;
SomeOtherFunction(value);
}
Update: Here is a generic method that will return all possible combinations. And thank #David Yaw for the idea to use a queue to build up every combination.
IEnumerable<T> AllCombinations<T>() where T : struct
{
// Constuct a function for OR-ing together two enums
Type type = typeof(T);
var param1 = Expression.Parameter(type);
var param2 = Expression.Parameter(type);
var orFunction = Expression.Lambda<Func<T, T, T>>(
Expression.Convert(
Expression.Or(
Expression.Convert(param1, type.GetEnumUnderlyingType()),
Expression.Convert(param2, type.GetEnumUnderlyingType())),
type), param1, param2).Compile();
var initalValues = (T[])Enum.GetValues(type);
var discoveredCombinations = new HashSet<T>(initalValues);
var queue = new Queue<T>(initalValues);
// Try OR-ing every inital value to each value in the queue
while (queue.Count > 0)
{
T a = queue.Dequeue();
foreach (T b in initalValues)
{
T combo = orFunction(a, b);
if (discoveredCombinations.Add(combo))
queue.Enqueue(combo);
}
}
return discoveredCombinations;
}
public IEnumerable<TEnum> AllCombinations<TEnum>() where TEnum : struct
{
Type enumType = typeof (TEnum);
if (!enumType.IsEnum)
throw new ArgumentException(string.Format("The type {0} does not represent an enumeration.", enumType), "TEnum");
if (enumType.GetCustomAttributes(typeof (FlagsAttribute), true).Length > 0) //Has Flags attribute
{
var allCombinations = new HashSet<TEnum>();
var underlyingType = Enum.GetUnderlyingType(enumType);
if (underlyingType == typeof (sbyte) || underlyingType == typeof (short) || underlyingType == typeof (int) || underlyingType == typeof (long))
{
long[] enumValues = Array.ConvertAll((TEnum[]) Enum.GetValues(enumType), value => Convert.ToInt64(value));
for (int i = 0; i < enumValues.Length; i++)
FillCombinationsRecursive(enumValues[i], i + 1, enumValues, allCombinations);
}
else if (underlyingType == typeof (byte) || underlyingType == typeof (ushort) || underlyingType == typeof (uint) || underlyingType == typeof (ulong))
{
ulong[] enumValues = Array.ConvertAll((TEnum[]) Enum.GetValues(enumType), value => Convert.ToUInt64(value));
for (int i = 0; i < enumValues.Length; i++)
FillCombinationsRecursive(enumValues[i], i + 1, enumValues, allCombinations);
}
return allCombinations;
}
//No Flags attribute
return (TEnum[]) Enum.GetValues(enumType);
}
private void FillCombinationsRecursive<TEnum>(long combination, int start, long[] initialValues, HashSet<TEnum> combinations) where TEnum : struct
{
combinations.Add((TEnum)Enum.ToObject(typeof(TEnum), combination));
if (combination == 0)
return;
for (int i = start; i < initialValues.Length; i++)
{
var nextCombination = combination | initialValues[i];
FillCombinationsRecursive(nextCombination, i + 1, initialValues, combinations);
}
}
private void FillCombinationsRecursive<TEnum>(ulong combination, int start, ulong[] initialValues, HashSet<TEnum> combinations) where TEnum : struct
{
combinations.Add((TEnum)Enum.ToObject(typeof(TEnum), combination));
if (combination == 0)
return;
for (int i = start; i < initialValues.Length; i++)
{
var nextCombination = combination | initialValues[i];
FillCombinationsRecursive(nextCombination, i + 1, initialValues, combinations);
}
}
First, grab a list of all the individual values. Since you've got 5 values, that's (1 << 5) = 32 combinations, so iterate from 1 to 31. (Don't start at zero, that would mean to include none of the enum values.) When iterating, examine the bits in the number, every one bit in the iteration variable means to include that enum value. Put the results into a HashSet, so that there aren't duplicates, since including the 'None' value doesn't change the resulting enum.
List<MyEnum> allValues = new List<MyEnum>(Enum.Getvalues(typeof(MyEnum)));
HashSet<MyEnum> allCombos = new Hashset<MyEnum>();
for(int i = 1; i < (1<<allValues.Count); i++)
{
MyEnum working = (MyEnum)0;
int index = 0;
int checker = i;
while(checker != 0)
{
if(checker & 0x01 == 0x01) working |= allValues[index];
checker = checker >> 1;
index++;
}
allCombos.Add(working);
}
I usually don't want to update each variable representing the max of an enumeration, when I add a new member to an enumeration.
For example I dislike the statement Greg proposed:
int max = (int)(MyEnum.Setting1 | MyEnum.Setting2 | ... | MyEnum.SettingN);
Consider when you have several of these variabeles scattered throughout your solution and you decide to modify your enumeration. That surely isn't a desirable scenario.
I will admit in advance that my code is slower, but it is automatically correct after an enumeration has been modified, and I strive to code in such a robust manner. I'm willing to pay some computational penalty for that, C# is all about that anywayz. I propose:
public static IEnumerable<T> GetAllValues<T>() where T : struct
{
if (!typeof(T).IsEnum) throw new ArgumentException("Generic argument is not an enumeration type");
int maxEnumValue = (1 << Enum.GetValues(typeof(T)).Length) - 1;
return Enumerable.Range(0, maxEnumValue).Cast<T>();
}
This assumes the enumeration contains members for all powers of 2 up to a certain power(including 0), exactly like a flag-enumeration is usually used.
I'm probably a little late to the party I would like to leave my solution which also includes the values plus the possible text of the combination in form of ("V1 | V2", "V1 | V2 | V3", etc).
I took some aspects of the solutions proposed above, so thanks to all that had posted the previous answers :D.
Note: only work with Enum set as base 2 combinations.
public static Dictionary<int,string> GetCombinations( this Enum enu)
{
var fields = enu.GetType()
.GetFields()
.Where(f => f.Name != "value__")
.DistinctBy(f=> Convert.ToInt32(f.GetRawConstantValue()));
var result = fields.ToDictionary(f=>Convert.ToInt32(f.GetRawConstantValue()), f => f.Name);
int max = Enum.GetValues(enu.GetType()).Cast<int>().Max();
int upperBound = max * 2;
for (int i = 0 ; i <= upperBound ; i += 2)
{
string s = Convert.ToString(i, 2).PadLeft(Math.Abs(i-max),'0');
Boolean[] bits = s.Select(chs => chs == '1' ? true : false)
.Reverse()
.ToArray();
if (!result.ContainsKey(i))
{
var newComb = string.Empty;
for (int j = 1; j < bits.Count(); j++)
{
var idx = 1 << j;
if (bits[j] && result.ContainsKey(idx))
{
newComb = newComb + result[idx] + " | ";
}
}
newComb = newComb.Trim(new char[] { ' ', '|' });
if (!result.ContainsValue(newComb) && !string.IsNullOrEmpty(newComb))
{
result.Add(i, newComb);
}
}
}
return result;
}
Concluded with this version:
Without checks:
public IEnumerable<T> AllCombinations<T>() where T : struct
{
var type = typeof(T);
for (var combination = 0; combination < Enum.GetValues(type).Cast<int>().Max()*2; combination++)
{
yield return (T)Enum.ToObject(type, combination);
}
}
With some checks:
public IEnumerable<T> AllCombinations<T>() where T : struct
{
var type = typeof(T);
if (!type.IsEnum)
{
throw new ArgumentException($"Type parameter '{nameof(T)}' must be an Enum type.");
}
for (var combination = 0; combination < Enum.GetValues(type).Cast<int>().Max()*2; combination++)
{
var result = (T)Enum.ToObject(type, combination);
// Optional check for legal combination.
// (and is not necessary if all flag a ascending exponent of 2 like 2, 4, 8...
if (result.ToString() == combination.ToString() && combination != 0)
{
continue;
}
yield return result;
}
}

Rebase a 1-based array in c#

I have an array in c# that is 1-based (generated from a call to get_Value for an Excel Range
I get a 2D array for example
object[,] ExcelData = (object[,]) MySheet.UsedRange.get_Value(Excel.XlRangeValueDataType.xlRangeValueDefault);
this appears as an array for example ExcelData[1..20,1..5]
is there any way to tell the compiler to rebase this so that I do not need to add 1 to loop counters the whole time?
List<string> RowHeadings = new List<string>();
string [,] Results = new string[MaxRows, 1]
for (int Row = 0; Row < MaxRows; Row++) {
if (ExcelData[Row+1, 1] != null)
RowHeadings.Add(ExcelData[Row+1, 1]);
...
...
Results[Row, 0] = ExcelData[Row+1, 1];
& other stuff in here that requires a 0-based Row
}
It makes things less readable since when creating an array for writing the array will be zero based.
Why not just change your index?
List<string> RowHeadings = new List<string>();
for (int Row = 1; Row <= MaxRows; Row++) {
if (ExcelData[Row, 1] != null)
RowHeadings.Add(ExcelData[Row, 1]);
}
Edit: Here is an extension method that would create a new, zero-based array from your original one (basically it just creates a new array that is one element smaller and copies to that new array all elements but the first element that you are currently skipping anyhow):
public static T[] ToZeroBasedArray<T>(this T[] array)
{
int len = array.Length - 1;
T[] newArray = new T[len];
Array.Copy(array, 1, newArray, 0, len);
return newArray;
}
That being said you need to consider if the penalty (however slight) of creating a new array is worth improving the readability of the code. I am not making a judgment (it very well may be worth it) I am just making sure you don't run with this code if it will hurt the performance of your application.
Create a wrapper for the ExcelData array with a this[,] indexer and do rebasing logic there. Something like:
class ExcelDataWrapper
{
private object[,] _excelData;
public ExcelDataWrapper(object[,] excelData)
{
_excelData = excelData;
}
public object this[int x, int y]
{
return _excelData[x+1, y+1];
}
}
Since you need Row to remain as-is (based on your comments), you could just introduce another loop variable:
List<string> RowHeadings = new List<string>();
string [,] Results = new string[MaxRows, 1]
for (int Row = 0, SrcRow = 1; SrcRow <= MaxRows; Row++, SrcRow++) {
if (ExcelData[SrcRow, 1] != null)
RowHeadings.Add(ExcelData[SrcRow, 1]);
...
...
Results[Row, 0] = ExcelData[SrcRow, 1];
}
Why not use:
for (int Row = 1; Row <= MaxRows; Row++) {
Or is there something I'm missing?
EDIT: as it turns out that something is missing, I would use another counter (starting at 0) for that purpose, and use a 1 based Row index for the array. It's not good practice to use the index for another use than the index in the target array.
Is changing the loop counter too hard for you?
for (int Row = 1; Row <= MaxRows; Row++)
If the counter's range is right, you don't have to add 1 to anything inside the loop so you don't lose readability. Keep it simple.
I agree that working with base-1 arrays from .NET can be a hassle. It is also potentially error-prone, as you have to mentally make a shift each time you use it, as well as correctly remember which situations will be base 1 and which will be base 0.
The most performant approach is to simply make these mental shifts and index appropriately, using base-1 or base-0 as required.
I personally prefer to convert the two dimensional base-1 arrays to two dimensional base-0 arrays. This, unfortunately, requires the performance hit of copying over the array to a new array, as there is no way to re-base an array in place.
Here's an extension method that can do this for the 2D arrays returned by Excel:
public static TResult[,] CloneBase0<TSource, TResult>(
this TSource[,] sourceArray)
{
If (sourceArray == null)
{
throw new ArgumentNullException(
"The 'sourceArray' is null, which is invalid.");
}
int numRows = sourceArray.GetLength(0);
int numColumns = sourceArray.GetLength(1);
TResult[,] resultArray = new TResult[numRows, numColumns];
int lb1 = sourceArray.GetLowerBound(0);
int lb2 = sourceArray.GetLowerBound(1);
for (int r = 0; r < numRows; r++)
{
for (int c = 0; c < numColumns; c++)
{
resultArray[r, c] = sourceArray[lb1 + r, lb2 + c];
}
}
return resultArray;
}
And then you can use it like this:
object[,] array2DBase1 = (object[,]) MySheet.UsedRange.get_Value(Type.Missing);
object[,] array2DBase0 = array2DBase1.CloneBase0();
for (int row = 0; row < array2DBase0.GetLength(0); row++)
{
for (int column = 0; column < array2DBase0.GetLength(1); column++)
{
// Your code goes here...
}
}
For massively sized arrays, you might not want to do this, but I find that, in general, it really cleans up your code (and mind-set) to make this conversion, and then always work in base-0.
Hope this helps...
Mike
For 1 based arrays and Excel range operations as well as UDF (SharePoint) functions I use this utility function
public static object[,] ToObjectArray(this Object Range)
{
Type type = Range.GetType();
if (type.IsArray && type.Name == "Object[,]")
{
var sourceArray = Range as Object[,];
int lb1 = sourceArray.GetLowerBound(0);
int lb2 = sourceArray.GetLowerBound(1);
if (lb1 == 0 && lb2 == 0)
{
return sourceArray;
}
else
{
int numRows = sourceArray.GetLength(0);
int numColumns = sourceArray.GetLength(1);
var resultArray = new Object[numRows, numColumns];
for (int r = 0; r < numRows; r++)
{
for (int c = 0; c < numColumns; c++)
{
resultArray[r, c] = sourceArray[lb1 + r, lb2 + c];
}
}
return resultArray;
}
}
else if (type.IsCOMObject)
{
// Get the Value2 property from the object.
Object value = type.InvokeMember("Value2",
System.Reflection.BindingFlags.Instance |
System.Reflection.BindingFlags.Public |
System.Reflection.BindingFlags.GetProperty,
null,
Range,
null);
if (value == null)
value = string.Empty;
if (value is string)
return new object[,] { { value } };
else if (value is double)
return new object[,] { { value } };
else
{
object[,] range = (object[,])value;
int rows = range.GetLength(0);
int columns = range.GetLength(1);
object[,] param = new object[rows, columns];
Array.Copy(range, param, rows * columns);
return param;
}
}
else
throw new ArgumentException("Not A Excel Range Com Object");
}
Usage
public object[,] RemoveZeros(object range)
{
return this.RemoveZeros(range.ToObjectArray());
}
[ComVisible(false)]
[UdfMethod(IsVolatile = false)]
public object[,] RemoveZeros(Object[,] range)
{...}
The first function is com visible and will accept an excel range or a chained call from another function (the chained call will return a 1 based object array), the second call is UDF enabled for Excel Services in SharePoint. All of the logic is in the second function. In this example we are just reformatting a range to replace zero with string.empty.
You could use a 3rd party Excel compatible component such as SpreadsheetGear for .NET which has .NET friendly APIs - including 0 based indexing for APIs such as IRange[int rowIndex, int colIndex].
Such components will also be much faster than the Excel API in almost all cases.
Disclaimer: I own SpreadsheetGear LLC

How to sort a part of an array with int64 indicies in C#?

The .Net framework has an Array.Sort overload that allows one to specify the starting and ending indicies for the sort to act upon. However these parameters are only 32 bit. So I don't see a way to sort a part of a large array when the indicies that describe the sort range can only be specified using a 64-bit number. I suppose I could copy and modify the the framework's sort implementation, but that is not ideal.
Update:
I've created two classes to help me around these and other large-array issues. One other such issue was that long before I got to my memory limit, I start getting OutOfMemoryException's. I'm assuming this is because the requested memory may be available but not contiguous. So for that, I created class BigArray, which is a generic, dynamically sizable list of arrays. It has a smaller memory footprint than the framework's generic list class, and does not require that the entire array be contiguous. I haven't tested the performance hit, but I'm sure its there.
public class BigArray<T> : IEnumerable<T>
{
private long capacity;
private int itemsPerBlock;
private int shift;
private List<T[]> blocks = new List<T[]>();
public BigArray(int itemsPerBlock)
{
shift = (int)Math.Ceiling(Math.Log(itemsPerBlock) / Math.Log(2));
this.itemsPerBlock = 1 << shift;
}
public long Capacity
{
get
{
return capacity;
}
set
{
var requiredBlockCount = (value - 1) / itemsPerBlock + 1;
while (blocks.Count > requiredBlockCount)
{
blocks.RemoveAt(blocks.Count - 1);
}
while (blocks.Count < requiredBlockCount)
{
blocks.Add(new T[itemsPerBlock]);
}
capacity = (long)itemsPerBlock * blocks.Count;
}
}
public T this[long index]
{
get
{
Debug.Assert(index < capacity);
var blockNumber = (int)(index >> shift);
var itemNumber = index & (itemsPerBlock - 1);
return blocks[blockNumber][itemNumber];
}
set
{
Debug.Assert(index < capacity);
var blockNumber = (int)(index >> shift);
var itemNumber = index & (itemsPerBlock - 1);
blocks[blockNumber][itemNumber] = value;
}
}
public IEnumerator<T> GetEnumerator()
{
for (long i = 0; i < capacity; i++)
{
yield return this[i];
}
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
And getting back to the original issue of sorting... What I really needed was a way to act on each element of an array, in order. But with such large arrays, it is prohibitive to copy the data, sort it, act on it and then discard the sorted copy (the original order must be maintained). So I created static class OrderedOperation, which allows you to perform an arbitrary operation on each element of an unsorted array, in a sorted order. And do so with a low memory footprint (trading memory for execution time here).
public static class OrderedOperation
{
public delegate void WorkerDelegate(int index, float progress);
public static void Process(WorkerDelegate worker, IEnumerable<int> items, int count, int maxItem, int maxChunkSize)
{
// create a histogram such that a single bin is never bigger than a chunk
int binCount = 1000;
int[] bins;
double binScale;
bool ok;
do
{
ok = true;
bins = new int[binCount];
binScale = (double)(binCount - 1) / maxItem;
int i = 0;
foreach (int item in items)
{
bins[(int)(binScale * item)]++;
if (++i == count)
{
break;
}
}
for (int b = 0; b < binCount; b++)
{
if (bins[b] > maxChunkSize)
{
ok = false;
binCount *= 2;
break;
}
}
} while (!ok);
var chunkData = new int[maxChunkSize];
var chunkIndex = new int[maxChunkSize];
var done = new System.Collections.BitArray(count);
var processed = 0;
var binsCompleted = 0;
while (binsCompleted < binCount)
{
var chunkMax = 0;
var sum = 0;
do
{
sum += bins[binsCompleted];
binsCompleted++;
} while (binsCompleted < binCount - 1 && sum + bins[binsCompleted] <= maxChunkSize);
Debug.Assert(sum <= maxChunkSize);
chunkMax = (int)Math.Ceiling((double)binsCompleted / binScale);
var chunkCount = 0;
int i = 0;
foreach (int item in items)
{
if (item < chunkMax && !done[i])
{
chunkData[chunkCount] = item;
chunkIndex[chunkCount] = i;
chunkCount++;
done[i] = true;
}
if (++i == count)
{
break;
}
}
Debug.Assert(sum == chunkCount);
Array.Sort(chunkData, chunkIndex, 0, chunkCount);
for (i = 0; i < chunkCount; i++)
{
worker(chunkIndex[i], (float)processed / count);
processed++;
}
}
Debug.Assert(processed == count);
}
}
The two classes can work together (that's how I use them), but they don't have to. I hope someone else finds them useful. But I'll admit, they are fringe case classes. Questions welcome. And if my code sucks, I'd like to hear tips, too.
One final thought: As you can see in OrderedOperation, I'm using ints and not longs. Currently that is sufficient for me despite the original question I had (the application is in flux, in case you can't tell). But the class should be able to handle longs as well, should the need arise.
You'll find that even on the 64-bit framework, the maximum number of elements in an array is int.MaxValue.
The existing methods that take or return Int64 just cast the long values to Int32 internally and, in the case of parameters, will throw an ArgumentOutOfRangeException if a long parameter isn't between int.MinValue and int.MaxValue.
For example the LongLength property, which returns an Int64, just casts and returns the value of the Length property:
public long LongLength
{
get { return (long)this.Length; } // Length is an Int32
}
So my suggestion would be to cast your Int64 indicies to Int32 and then call one of the existing Sort overloads.
Since Array.Copy takes Int64 params, you could pull out the section you need to sort, sort it, then put it back. Assuming you're sorting less than 2^32 elements, of course.
Seems like if you are sorting more than 2^32 elements then it would be best to write your own, more efficient, sort algorithm anyway.

Categories