Managing time series in C#

I'd like your opinion on the best way to manage time series in C#. I need a two-dimensional, matrix-like structure with DateTime objects as the row index (ordered and without duplicates), where each column represents a stock's value for the relevant DateTime. I would also like to know whether any of these objects can handle missing data for a date: adding a column or a time series would add any missing dates to the row index and insert "null" or "N/A" for the missing values at existing dates.
A lot is already available in C# compared to C++, and I don't want to miss something obvious.

TeaFiles.Net is a library for time series storage in flat files. As I understand it, you only want to have the data in memory, in which case you would use a MemoryStream and pass it to TeaFile<T>.Create / OpenRead, as shown below.
// the time series item type
struct Tick
{
    public DateTime Time;
    public double Price;
    public int Volume;
}

// create file and write some values
var ms = new MemoryStream();
using (var tf = TeaFile<Tick>.Create(ms))
{
    tf.Write(new Tick { Price = 5, Time = DateTime.Now, Volume = 700 });
    tf.Write(new Tick { Price = 15, Time = DateTime.Now.AddHours(1), Volume = 1700 });
    // ...
}
ms.Position = 0; // reset the stream

// read typed
using (var tf = TeaFile<Tick>.OpenRead(ms))
{
    Tick value = tf.Read();
    Console.WriteLine(value);
}
https://github.com/discretelogics/TeaFiles.Net
You can install the library via the NuGet Package Manager (package id "TeaFiles.Net").
A VSIX sample project is also available in the VS Gallery.

You could use a mapping between the date and the stock value, such as Dictionary<DateTime, decimal>. This way the dates can be sparse.
If you need the prices of multiple stocks at each date, and not every stock appears for every date, then you could choose between Dictionary<DateTime, Dictionary<Stock, decimal>> and Dictionary<Stock, Dictionary<DateTime, decimal>>, depending on how you want to access the values afterwards (or even both if you don't mind storing the values twice).
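As a rough illustration of the sparse, two-level mapping idea (the class and member names below are placeholders for this sketch, not from the answer), keyed first by date and then by ticker:
using System;
using System.Collections.Generic;

// Minimal sketch: a sparse two-level map keyed by date, then by ticker.
class SparsePriceTable
{
    private readonly Dictionary<DateTime, Dictionary<string, decimal>> _pricesByDate
        = new Dictionary<DateTime, Dictionary<string, decimal>>();

    public void Add(DateTime date, string ticker, decimal price)
    {
        if (!_pricesByDate.TryGetValue(date, out var row))
        {
            row = new Dictionary<string, decimal>();
            _pricesByDate[date] = row;
        }
        row[ticker] = price;
    }

    // A missing date or a missing ticker simply yields null, which plays the role of "N/A".
    public decimal? TryGet(DateTime date, string ticker)
    {
        return _pricesByDate.TryGetValue(date, out var row)
               && row.TryGetValue(ticker, out var price)
            ? price
            : (decimal?)null;
    }
}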

The DateTime type in C# is a value type, which means it initializes to its default value, DateTime.MinValue: year 1, month 1, day 1, at midnight (00:00:00).
If I understood you right, you need a data structure that holds DateTime objects ordered in some way, and when you insert a new object the adjacent DateTime objects change to retain your order.
In this case I would focus more on the data structure than on the DateTime object.
Write a simple class that inherits from List<>, for example, and include the functionality you want in an insert or delete operation.
Something like:
public class DateTimeList : List<DateTime>
{
    public void InsertDateTime(int position, DateTime dateTime)
    {
        // insert the new object
        this.Insert(position, dateTime);

        // then take the adjacent objects (take care of integrity checks, i.e.
        // does the index/object exist? is it not null? etc.)
        DateTime previous = this.ElementAt<DateTime>(position - 1);
        // modify the previous DateTime object according to your needs.
        DateTime next = this.ElementAt<DateTime>(position + 1);
        // modify the next DateTime object according to your needs.
    }
}

As you mentioned in your comment to Marc's answer, I believe the SortedList is a more appropriate structure to hold your time series data.
UPDATE
As zmbq mentioned in his comment to Marc's question, the SortedList is implemented as an array, so if faster insertion/removal times are needed then the SortedDictionary would be a better choice.
See Jon Skeet's answer to this question for an overview of the performance differences.
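For reference, a minimal sketch of the SortedList idea applied to the original question, with a nullable decimal standing in for "N/A" (the dates and values are made up for the example):
using System;
using System.Collections.Generic;

// One column of a time series: dates kept ordered and unique by SortedList,
// nulls standing in for missing observations.
var series = new SortedList<DateTime, decimal?>
{
    { new DateTime(2024, 1, 2), 101.5m },
    { new DateTime(2024, 1, 3), null },    // date present, value missing
    { new DateTime(2024, 1, 4), 103.0m },
};

// SortedDictionary<DateTime, decimal?> offers the same interface but is tree-based,
// so inserts/removals in the middle are O(log n) instead of O(n).
foreach (var point in series)
    Console.WriteLine($"{point.Key:d}: {point.Value?.ToString() ?? "N/A"}");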

There is a time series library called TimeFlow, which allows smart creation and handling of time series.
The central TimeSeries class knows its timezone and is internally based on a sorted list of DateTimeOffset/Decimal pairs with a specific frequency (Minute, Hour, Day, Month or even custom periods). The frequency can be changed during resample operations (e.g. hours -> days). It is also possible to combine time series using the standard operators (+, -, *, /) or advanced join operations using custom methods.
Furthermore, the TimeFrame class combines multiple time series of the same timezone and frequency (similar to Python's DataFrame, but restricted to time series) for easier access.
Additionally, there is the great TimeFlow.Reporting library that provides advanced reporting / visualization (currently Excel and WPF) of time frames.
Disclaimer: I am the creator of these libraries.

Related

Find blocks of same values in a collection and manipulate surrounding values

I really don't know how to summarize the question in the title, sorry. :)
Let's assume I have a collection (i.e. ObservableCollection) containing thousands of objects. These objects consist of an ascending timestamp and a FALSE boolean value (very simplified).
Like this:
[0] 0.01, FALSE
[1] 0.02, FALSE
[2] 0.03, FALSE
[3] 0.04, FALSE
...
Now, let's assume that within this collection, there are blocks that have their flag set to TRUE.
Like this:
[2345] 23.46, FALSE
[2346] 23.47, FALSE
[2347] 23.48, FALSE
[2348] 23.49, TRUE
[2349] 23.50, TRUE
[2350] 23.51, TRUE
[2351] 23.52, TRUE
[2352] 23.53, TRUE
[2353] 23.54, FALSE
[2354] 23.55, FALSE
...
I need to find the blocks and set all flags within 1.5 seconds before and after each block to TRUE as well.
How can I achieve this while maintaining a reasonable performance?
Matthias G's solution is correct, although quite slow: it seems to have n-squared complexity.
His algorithm first scans the input values to filter them by IsActive, retrieves the timestamps and puts them into a new list, which is O(n) at least. Then it scans the constructed list, which in the worst case may be the whole input, O(n), and for every timestamp retrieved it scans the input values again to modify the appropriate ones, giving O(n^2).
It also builds an additional list just to be scanned once and destroyed.
I'd propose a solution somewhat similar to merge sort. First scan the input values and for each active item push the appropriate time interval into a queue. You may delay pushing to see if the next interval overlaps the current one, in which case you extend the interval instead of pushing. When the input list is done, finally push the last delayed interval. This way your queue will contain the (almost) minimum number of time intervals you want to modify.
Then scan the values again and compare timestamps to the first interval in the queue. If the item's timestamp falls into the current interval, mark the item Active. If it falls past the interval, remove the interval from the queue and compare the timestamp to the next one, and so on, until the item is in or before the interval. Your input data are in chronological order, so the intervals will be in the same order, too. This allows accomplishing the task in a single parallel pass through both lists.
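A minimal sketch of that two-pass idea, assuming the Value class from the answer below (double TimeStamp, settable bool IsActive) and a list already sorted by timestamp; the merging is done inline rather than with a delayed push, which keeps the code short but is the same idea:
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch only: build merged [start, end] intervals around active items in one pass,
// then mark items in a second, parallel pass over values and intervals.
static void MarkAroundActiveBlocks(List<Value> values, double range)
{
    var intervals = new Queue<(double Start, double End)>();
    double? curStart = null, curEnd = null;

    foreach (var item in values.Where(v => v.IsActive))
    {
        double start = item.TimeStamp - range, end = item.TimeStamp + range;
        if (curEnd.HasValue && start <= curEnd.Value)
        {
            curEnd = end;                      // overlaps: extend the current interval
        }
        else
        {
            if (curStart.HasValue) intervals.Enqueue((curStart.Value, curEnd.Value));
            curStart = start; curEnd = end;    // start a new interval
        }
    }
    if (curStart.HasValue) intervals.Enqueue((curStart.Value, curEnd.Value));

    // Second pass: both the values and the intervals are in chronological order.
    foreach (var item in values)
    {
        while (intervals.Count > 0 && item.TimeStamp > intervals.Peek().End)
            intervals.Dequeue();               // interval lies entirely in the past
        if (intervals.Count > 0 && item.TimeStamp >= intervals.Peek().Start)
            item.IsActive = true;
    }
}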
Assuming you have a data structure like this:
Edit: Changed TimeStamp to double
public class Value
{
    public double TimeStamp { get; set; }
    public bool IsActive { get; set; }
}
And a list of these objects called values. Then you could search for active data sets and, for each of them, mark the values within a range around them as active:
double range = 1.5;

var activeTimeStamps = values.Where(value => value.IsActive)
                             .Select(value => value.TimeStamp)
                             .ToList();

foreach (var timeStamp in activeTimeStamps)
{
    var valuesToMakeActive =
        values.Where
        (
            value =>
                value.TimeStamp >= timeStamp - range &&
                value.TimeStamp <= timeStamp + range
        );

    foreach (var value in valuesToMakeActive)
    {
        value.IsActive = true;
    }
}
Anyway, I guess there will be a solution with better performance.

Find last occurrence of an event in DDay.iCal

Suppose I have an iCalendar with a single event. This has a recurrence rule (RRULE) set with a COUNT to limit it, but also has some exception dates, and some exception rules.
I want to calculate the date of the last occurrence.
If the rules only had UNTILs set, this would be easy as I would know that this bounded the possible dates, so I could do the following.
IICalendar calendar = LoadCalendar();
Event evt = calendar.Events.Single();
DateTime start = evt.Start;
DateTime end = evt.RecurrenceRules.Select(r => r.Until).Max();
var lastOccurrence = evt.GetOccurrences(start, end).Last();
However, this approach will not work with a COUNT, as the exceptions can push the last occurrence indefinitely into the future (e.g. assume the first 500 dates of a weekly occurrence have been excluded - this would push the end date about 10 years into the future).
Is there a straightforward way to determine the last occurrence in this scenario? (Ultimately, I could write my own rule parser, or reflect on the one built into DDay, but I'm hoping for an easier way!).
Background
For reference, I am aiming to build a Quartz.NET Trigger which uses an iCalendar file to determine when to fire.
The COUNT is associated only with the RRULE, not with the event as a whole. See rfc5545#section-3.8.5.3:
The final recurrence set is generated by gathering all of the start DATE-TIME values generated by any of the specified "RRULE" and "RDATE" properties, and then excluding any start DATE-TIME values specified by "EXDATE" properties.
You first build a set based on the RRULE (including its COUNT value), and then you remove the ones that are mentioned in EXDATE.
In other words, if you have an RRULE with a COUNT of 500 and 100 EXDATE instances, you end up with 400 instances.
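In practical terms the recurrence set therefore stays bounded, so, as a rough sketch building on the question's own code (not a tested DDay.iCal recipe, and the 100-year window is an arbitrary placeholder you would size to your rules), you can expand over a generously large window and take the last occurrence:
// Sketch: COUNT bounds the RRULE-generated set and EXDATE only removes dates,
// so a sufficiently large window is guaranteed to contain the last occurrence.
IICalendar calendar = LoadCalendar();
Event evt = calendar.Events.Single();
DateTime start = evt.Start;
var occurrences = evt.GetOccurrences(start, start.AddYears(100)); // window size is an assumption
var lastOccurrence = occurrences.LastOrDefault();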
Just FYI, you mention exception rules but EXRULE has been deprecated in RFC5545.

Fastest, Efficient, Elegant way of Parsing Strings to Dynamic types?

I'm looking for the fastest (generic) approach to converting strings into various data types on the go.
I am parsing large text data files generated by something (files are several megabytes in size). This particular function reads lines in the text file, parses each line into columns based on delimiters and places the parsed values into a .NET DataTable. This is later inserted into a database. My bottleneck by FAR is the string conversions (Convert and TypeConverter).
I have to go with a dynamic way (i.e. staying away from "Convert.ToInt32" etc.) because I never know what types are going to be in the files. The type is determined by earlier configuration during runtime.
So far I have tried the following, and both take several minutes to parse a file. Note that if I comment out this one line it runs in only a few hundred milliseconds.
row[i] = Convert.ChangeType(columnString, dataType);
AND
TypeConverter typeConverter = TypeDescriptor.GetConverter(type);
row[i] = typeConverter.ConvertFromString(null, cultureInfo, columnString);
If anyone knows of a faster way that is generic like this I would like to know about it. Or if my whole approach just sucks for some reason I'm open to suggestions. But please don't point me to non-generic approaches using hard coded types; that is simply not an option here.
UPDATE - Multi-threading to Improve Performance Test
In order to improve performance I have looked into splitting up parsing tasks to multiple threads. I found that the speed increased somewhat but still not as much as I had hoped. However, here are my results for those who are interested.
System:
Intel Xeon 3.3GHz Quad Core E3-1245
Memory: 12.0 GB
Windows 7 Enterprise x64
Test:
The test function is this:
(1) Receive an array of strings. (2) Split the string by delimitters. (3) Parse strings into data types and store them in a row. (4) Add row to data table. (5) Repeat (2)-(4) until finished.
The test included 1000 strings, each string being parsed into 16 columns, so that is 16000 string conversions total. I tested single thread, 4 threads (because of quad core), and 8 threads (because of hyper-threading). Since I'm only crunching data here I doubt adding more threads than this would do any good. So for the single thread it parses 1000 strings, 4 threads parse 250 strings each, and 8 threads parse 125 strings each. Also I tested a few different ways of using threads: thread creation, thread pool, tasks, and function objects.
Results:
Result times are in Milliseconds.
Single Thread:
Method Call: 17720
4 Threads
Parameterized Thread Start: 13836
ThreadPool.QueueUserWorkItem: 14075
Task.Factory.StartNew: 16798
Func BeginInvoke EndInvoke: 16733
8 Threads
Parameterized Thread Start: 12591
ThreadPool.QueueUserWorkItem: 13832
Task.Factory.StartNew: 15877
Func BeginInvoke EndInvoke: 16395
As you can see the fastest is using Parameterized Thread Start with 8 threads (the number of my logical cores). However it does not beat using 4 threads by much and is only about 29% faster than using a single thread. Of course results will vary by machine. Also I stuck with a Dictionary<Type, TypeConverter> cache for string parsing, as using arrays of type converters did not offer a noticeable performance increase, and one shared cached type converter is more maintainable than creating arrays all over the place when I need them.
ANOTHER UPDATE:
Ok so I ran some more tests to see if I could squeeze some more performance out and I found some interesting things. I decided to stick with 8 threads, all started from the Parameterized Thread Start method (which was the fastest of my previous tests). The same test as above was run, just with different parsing algorithms.
I noticed that Convert.ChangeType and TypeConverter take about the same amount of time. Type-specific converters like int.TryParse are slightly faster, but not an option for me since my types are dynamic. ricovox had some good advice about exception handling. My data does indeed have invalid data; some integer columns put a dash '-' for empty numbers, so type converters blow up on that, meaning every row I parse throws at least one exception. That's 1000 exceptions! Very time consuming.
Btw this is how I do my conversions with TypeConverter. Extensions is just a static class and GetTypeConverter just returns a cached TypeConverter. If an exception is thrown during the conversion, a default value is used.
public static Object ConvertTo(this String arg, CultureInfo cultureInfo, Type type, Object defaultValue)
{
    Object value;
    TypeConverter typeConverter = Extensions.GetTypeConverter(type);

    try
    {
        // Try converting the string.
        value = typeConverter.ConvertFromString(null, cultureInfo, arg);
    }
    catch
    {
        // If the conversion fails then use the default value.
        value = defaultValue;
    }

    return value;
}
Results:
Same test on 8 threads - parse 1000 lines, 16 columns each, 250 lines per thread.
So I did 3 new things.
1 - Run the test: check for known invalid types before parsing to minimize exceptions.
i.e. if(!Char.IsDigit(c)) value = 0; OR columnString.Contains('-') etc...
Runtime: 29ms
2 - Run the test: use custom parsing algorithms that have try catch blocks.
Runtime: 12424ms
3 - Run the test: use custom parsing algorithms checking for invalid types before parsing to minimize exceptions.
Runtime: 15ms
Wow! As you can see, eliminating the exceptions made a world of difference. I never realized how expensive exceptions really were! If I minimize my exceptions to TRULY unknown cases, the parsing algorithm runs three orders of magnitude faster. I'm considering this absolutely solved. I believe I will keep the dynamic type conversion with TypeConverter; it is only a few milliseconds slower. Checking for known invalid types before converting avoids exceptions and speeds things up incredibly. Thanks to ricovox for pointing that out, which made me test this further.
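For illustration, here is a minimal sketch of the "check before you convert" idea wrapped around the ConvertTo extension from above; the ConvertToWithPrecheck name and its validation rules (dashes and blanks meaning "no value" in numeric columns) are assumptions for this sketch, not part of the original code:
// Sketch: avoid the exception path for values we already know are invalid,
// returning the defaultValue directly instead of letting the converter throw.
// (Lives in the same static Extensions class as ConvertTo.)
public static Object ConvertToWithPrecheck(this String arg, CultureInfo cultureInfo,
                                           Type type, Object defaultValue)
{
    // Hypothetical pre-checks for this particular file format.
    bool looksNumeric = type == typeof(int) || type == typeof(double) || type == typeof(decimal);
    if (looksNumeric && (String.IsNullOrWhiteSpace(arg) || arg == "-"))
        return defaultValue;

    return arg.ConvertTo(cultureInfo, type, defaultValue); // still has try/catch for the rare surprises
}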
if you are primarily going to be converting the strings to the native data types (string, int, bool, DateTime etc) you could use something like the code below, which caches the TypeCodes and TypeConverters (for non-native types) and uses a fast switch statement to quickly jump to the appropriate parsing routine. This should save some time over Convert.ChangeType because the source type (string) is already known, and you can directly call the right parse method.
/* Get an array of Types for each of your columns.
 * Open the data file for reading.
 * Create your DataTable and add the columns.
 * (You have already done all of these in your earlier processing.)
 *
 * Note: For the sake of generality, I've used an IEnumerable<string>
 * to represent the lines in the file, although for large files,
 * you would use a FileStream or TextReader etc.
 */

IList<Type> columnTypes;        //array or list of the Type to use for each column
IEnumerable<string> fileLines;  //the lines to parse from the file.
DataTable table;                //the table you'll add the rows to

int colCount = columnTypes.Count;
var typeCodes = new TypeCode[colCount];
var converters = new TypeConverter[colCount];

//Fill up the typeCodes array with the Type.GetTypeCode() of each column type.
//If the TypeCode is Object, then get a custom converter for that column.
for (int i = 0; i < colCount; i++) {
    typeCodes[i] = Type.GetTypeCode(columnTypes[i]);
    if (typeCodes[i] == TypeCode.Object)
        converters[i] = TypeDescriptor.GetConverter(columnTypes[i]);
}

//Probably faster to build up an array of objects and insert them into the row all at once.
object[] vals = new object[colCount];
object val;

foreach (string line in fileLines) {
    //delineate the line into columns, however you see fit. I'll assume a tab character.
    var columns = line.Split('\t');
    for (int i = 0; i < colCount; i++) {
        switch (typeCodes[i]) {
            case TypeCode.String:
                val = columns[i]; break;
            case TypeCode.Int32:
                val = int.Parse(columns[i]); break;
            case TypeCode.DateTime:
                val = DateTime.Parse(columns[i]); break;
            //...list types that you expect to encounter often.
            //finally, deal with other objects
            case TypeCode.Object:
            default:
                val = converters[i].ConvertFromString(columns[i]);
                break;
        }
        vals[i] = val;
    }

    //Add all values to the row at one time.
    //This might be faster than adding each column one at a time.
    //There are two ways to do this:
    var row = table.Rows.Add(vals); //create new row on the fly.
    // OR
    //row.ItemArray = vals; //(e.g. allows setting an existing row, created previously)
}
There really ISN'T any other way that would be faster, because we're basically just using the raw string parsing methods defined by the types themselves. You could re-write your own parsing code for each output type yourself, making optimizations for the exact formats you'll encounter. But I assume that is overkill for your project. It would probably be better and faster to simply tailor the FormatProvider or NumberStyles in each case.
For example let's say that whenever you parse Double values, you know, based on your proprietary file format, that you won't encounter any strings that contain exponents etc, and you know that there won't be any leading or trailing space, etc. So you can clue the parser in to these things with the NumberStyles argument as follows:
//NOTE: using System.Globalization;
var styles = NumberStyles.AllowDecimalPoint | NumberStyles.AllowLeadingSign;
var d = double.Parse(text, styles);
I don't know for a fact how the parsing is implemented, but I would think that the NumberStyles argument allows the parsing routine to work faster by excluding various formatting possibilities. Of course, if you can't make any assumptions about the format of the data, then you won't be able to make these types of optimizations.
Of course, there's always the possibility that your code is slow simply because it takes time to parse a string into a certain data type. Use a performance analyzer (like the one in VS2010) to try to see where your actual bottleneck is. Then you'll be able to optimize better, or simply give up, e.g. in the case that there is nothing else to do short of writing the parsing routines in assembly :-)
Here is a quick piece of code to try:
Dictionary<Type, TypeConverter> _ConverterCache = new Dictionary<Type, TypeConverter>();

TypeConverter GetCachedTypeConverter(Type type)
{
    if (!_ConverterCache.ContainsKey(type))
        _ConverterCache.Add(type, TypeDescriptor.GetConverter(type));
    return _ConverterCache[type];
}
Then use the code below instead:
TypeConverter typeConverter = GetCachedTypeConverter(type);
Is it a little faster?
A technique I commonly use is:
var parserLookup = new Dictionary<Type, Func<string, dynamic>>();
parserLookup.Add(typeof(Int32), s => Int32.Parse(s));
parserLookup.Add(typeof(Int64), s => Int64.Parse(s));
parserLookup.Add(typeof(Decimal), s => Decimal.Parse(s, NumberStyles.Number | NumberStyles.Currency, CultureInfo.CurrentCulture));
parserLookup.Add(typeof(DateTime), s => DateTime.Parse(s, CultureInfo.CurrentCulture, DateTimeStyles.AssumeLocal));
// and so on for any other type you want to handle.
This assumes you can figure out what Type your data represents. The use of dynamic also implies .NET 4 or higher, but you can change that to object in most cases.
Cache your parser lookup for each file (or for your entire app) and you should get pretty good performance.
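A possible way to use such a lookup (the tab-delimited column layout and the fall-back-to-raw-string behaviour are just assumptions for this sketch):
// Sketch: parse one delimited line into typed values using the lookup above.
// columnTypes describes the expected Type of each column (from your runtime configuration).
object[] ParseLine(string line, Type[] columnTypes, Dictionary<Type, Func<string, dynamic>> parserLookup)
{
    var columns = line.Split('\t');
    var result = new object[columnTypes.Length];
    for (int i = 0; i < columnTypes.Length; i++)
    {
        result[i] = parserLookup.TryGetValue(columnTypes[i], out var parse)
            ? (object)parse(columns[i])
            : columns[i];   // fall back to the raw string for types without a registered parser
    }
    return result;
}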

Clean up algorithm

I've made a C# application which connects to my webcam and reads the images at the speed the webcam delivers them. I'm parsing the stream in such a way that I have a few JPEGs per second.
I don't want to write all the webcam data to disk; I want to store the images in memory. The application will also act as a webserver to which I can supply a datetime in the query string, and the webserver must serve the image closest to that time which it still has in memory.
In my code I have this:
Dictionary<DateTime, byte[]> cameraImages;
of which DateTime is the timestamp of the received image and the bytearray is the jpeg.
All of that works; handling the web request also works. Basically I want to clean up that dictionary by keeping images according to their age.
Now I need an algorithm that cleans up the older images.
I can't really figure out an algorithm for it. One reason is that the datetimes don't fall exactly on specific moments, and I can't be sure that an image always arrives (sometimes the image stream is aborted for several minutes). But what I want to do is:
Keep all images for the first minute.
Keep 2 images per second for the first half hour.
Keep only one image per second if it's older than 30 minutes.
Keep only one image per 30 seconds if it's older than 2 hours.
Keep only one image per minute if it's older than 12 hours.
Keep only one image per hour if it's older than 24 hours.
Keep only one image per day if it's older than two days.
Remove all images older than 1 weeks.
The above intervals are just an example.
Any suggestions?
I think @Kevin Holditch's approach is perfectly reasonable and has the advantage that it would be easy to get the code right.
If there were a large number of images, or you otherwise wanted to think about how to do this "efficiently", I would propose a thought process like the following:
Create 7 queues, representing your seven categories. We take care to keep the images in this queue in sorted time order. The queue data structure is able to efficiently insert at its front and remove from its back. .NET's Queue would be perfect for this.
Each Queue (call it Qi) has an "incoming" set and an "outgoing" set. The incoming set for Queue 0 is the images from the camera, and for any other queue it is equal to the outgoing set of Queue i-1.
Each queue has rules on both its input and output side which determine whether the queue will admit new items from its incoming set and whether it should eject items from its back into its outgoing set. As a specific example, if Q3 is the queue "Keep only one image per 30 seconds if it's older than 2 hours", then Q3 iterates over its incoming set (which is the outgoing set of Q2) and only admits item i where i's timestamp is 30 seconds or more away from Q3.first() (for this to work correctly the items need to be processed from highest to lowest timestamp). On the output side, we eject from Q3's tail any object older than 12 hours, and this becomes the input set for Q4.
Again, @Kevin Holditch's approach has the virtue of simplicity and is probably what you should do. I just thought you might find the above to be food for thought.
You could do this quite easily (although it may not be the most efficient way) by using LINQ.
E.g.
var firstMinImages = cameraImages.Where(
c => c.Key >= DateTime.Now.AddMinutes(-1));
Then do an equivalent query for every time interval. Combine them into one store of images and overwrite your existing store (presuming you don't want to keep the rest). This will work with your current criteria, as the number of images to keep gets progressively smaller over time.
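A rough sketch of what combining those interval queries might look like (the interval boundaries and the "group by whole second" thinning are simplified assumptions, not the exact rules from the question):
// Sketch: rebuild the store from per-interval queries and overwrite the old one.
// The intervals must not overlap, or ToDictionary will throw on duplicate keys.
var now = DateTime.Now;

var firstMinute = cameraImages.Where(c => c.Key >= now.AddMinutes(-1));

// e.g. "older than 30 minutes: keep one image per second", approximated by
// grouping on whole seconds and taking the first image of each group
var olderThan30Min = cameraImages
    .Where(c => c.Key < now.AddMinutes(-30) && c.Key >= now.AddHours(-2))
    .GroupBy(c => new DateTime(c.Key.Year, c.Key.Month, c.Key.Day, c.Key.Hour, c.Key.Minute, c.Key.Second))
    .Select(g => g.First());

// ...equivalent queries for the remaining intervals...

cameraImages = firstMinute
    .Concat(olderThan30Min)
    .ToDictionary(c => c.Key, c => c.Value);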
My strategy would be to group the elements into buckets that you plan to weed out, then pick one element from each bucket to keep. I have made an example of how to do this using a list of DateTimes and ints, but pictures would work exactly the same way.
My class used to store each Pic:
class Pic
{
    public DateTime when { get; set; }
    public int val { get; set; }
}
and a sample of a few items in the list...
List<Pic> intTime = new List<Pic>();
intTime.Add(new Pic() { when = DateTime.Now, val = 0 });
intTime.Add(new Pic() { when = DateTime.Now.AddDays(-1), val = 1 });
intTime.Add(new Pic() { when = DateTime.Now.AddDays(-1.01), val = 2 });
intTime.Add(new Pic() { when = DateTime.Now.AddDays(-1.02), val = 3 });
intTime.Add(new Pic() { when = DateTime.Now.AddDays(-2), val = 4 });
intTime.Add(new Pic() { when = DateTime.Now.AddDays(-2.1), val = 5 });
intTime.Add(new Pic() { when = DateTime.Now.AddDays(-2.2), val = 6 });
intTime.Add(new Pic() { when = DateTime.Now.AddDays(-3), val = 7 });
Now I create a helper function to bucket and remove...
private static void KeepOnlyOneFor(List<Pic> intTime, Func<Pic, int> Grouping, DateTime ApplyBefore)
{
    var groups = intTime.Where(a => a.when < ApplyBefore).OrderBy(a => a.when).GroupBy(Grouping);
    foreach (var r in groups)
    {
        var s = r.Where(a => a != r.LastOrDefault());
        intTime.RemoveAll(a => s.Contains(a));
    }
}
What this does is let you specify how to group the objects and set an age threshold on the grouping. Now finally, to use it...
This will remove all but 1 picture per Day for any pics greater than 2 days old:
KeepOnlyOneFor(intTime, a => a.when.Day, DateTime.Now.AddDays(-2));
This will remove all but 1 picture for each Hour after 1 day old:
KeepOnlyOneFor(intTime, a => a.when.Hour, DateTime.Now.AddDays(-1));
If you are on .NET 4 you could use a MemoryCache for each interval, with CacheItemPolicy objects to expire the items when you want them to expire and UpdateCallbacks to move some of them to the next interval.
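A minimal sketch of that idea for a single interval; it uses RemovedCallback rather than UpdateCallback to keep it short, and the ImageTier class, its lifetime value and the promotion delegate are placeholders invented for this sketch:
using System;
using System.Runtime.Caching;

// Sketch: one cache per retention interval; when an item expires it can be
// promoted (possibly thinned out) into the cache for the next interval.
class ImageTier
{
    private readonly MemoryCache _cache;
    private readonly TimeSpan _lifetime;
    private readonly Action<DateTime, byte[]> _promote; // hand the image to the next tier (or drop it)

    public ImageTier(string name, TimeSpan lifetime, Action<DateTime, byte[]> promote)
    {
        _cache = new MemoryCache(name);
        _lifetime = lifetime;
        _promote = promote;
    }

    public void Add(DateTime timestamp, byte[] jpeg)
    {
        var policy = new CacheItemPolicy
        {
            AbsoluteExpiration = DateTimeOffset.Now.Add(_lifetime),
            RemovedCallback = args =>
            {
                if (args.RemovedReason == CacheEntryRemovedReason.Expired)
                    _promote(DateTime.Parse((string)args.CacheItem.Key), (byte[])args.CacheItem.Value);
            }
        };
        _cache.Set(timestamp.ToString("o"), jpeg, policy);
    }
}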

A couple of questions about NHibernate's GuidCombGenerator

The following code can be found in the NHibernate.Id.GuidCombGenerator class. The algorithm creates sequential (comb) guids based on combining a "random" guid with a DateTime. I have a couple of questions related to the lines that I have marked with *1) and *2) below:
private Guid GenerateComb()
{
    byte[] guidArray = Guid.NewGuid().ToByteArray();

    // *1)
    DateTime baseDate = new DateTime(1900, 1, 1);
    DateTime now = DateTime.Now;

    // Get the days and milliseconds which will be used to build the byte string
    TimeSpan days = new TimeSpan(now.Ticks - baseDate.Ticks);
    TimeSpan msecs = now.TimeOfDay;

    // *2)
    // Convert to a byte array
    // Note that SQL Server is accurate to 1/300th of a millisecond so we divide by 3.333333
    byte[] daysArray = BitConverter.GetBytes(days.Days);
    byte[] msecsArray = BitConverter.GetBytes((long)(msecs.TotalMilliseconds / 3.333333));

    // Reverse the bytes to match SQL Servers ordering
    Array.Reverse(daysArray);
    Array.Reverse(msecsArray);

    // Copy the bytes into the guid
    Array.Copy(daysArray, daysArray.Length - 2, guidArray, guidArray.Length - 6, 2);
    Array.Copy(msecsArray, msecsArray.Length - 4, guidArray, guidArray.Length - 4, 4);

    return new Guid(guidArray);
}
First of all, for *1), wouldn't it be better to have a more recent date as the baseDate, e.g. 2000-01-01, so as to make room for more values in the future?
Regarding *2), why would we care about the accuracy for DateTimes in SQL Server, when we only are interested in the bytes of the datetime anyway, and never intend to store the value in an SQL Server datetime field? Wouldn't it be better to use all the accuracy available from DateTime.Now?
Re 1: there is no relevance to the actual day value; the two bytes used from the value simply roll over 65536 days after 1/1/1900. The only thing that matters is that the values are roughly sequential. The database is going to be a bit inefficient in the summer of 2079; nobody will notice.
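For the curious, the rollover date is easy to check with a throwaway snippet (not part of NHibernate):
// The two bytes of the day count wrap after 65536 days:
var rollover = new DateTime(1900, 1, 1).AddDays(65536);
Console.WriteLine(rollover); // 7 June 2079, i.e. the summer of 2079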
Re 2: yes, makes no sense. But same story, the actual value doesn't matter.
The algorithm is questionable; messing with the guaranteed uniqueness of Guids is a tricky proposition. You'll have to rely on somebody on the NHibernate team having insider knowledge that this works without problems. If you change it, you're liable to break it.
COMB was created specifically to efficiently use GUIDs as a clustered index in SQL Server. That's why it's written around SQL Server specific behavior.
I'm very late to this party but I thought I'd share the original intent of COMB.
I started using NHibernate in '04 and we wanted to use GUIDs for our IDs. After some research I found that SQL Server was not efficient at using completely random GUIDs as a primary key / clustered index because it wanted to store them in order and would have to insert into the middle of the table (page splits). I got the algorithm for COMB from this article: http://www.informit.com/articles/article.aspx?p=25862 by Jimmy Nilsson, which is a very comprehensive description of why COMBs are the way they are (and a good read). I started using a custom generator to generate COMBs and then NHibernate picked it up as a built-in generator.
COMB may not produce in-order IDs in other servers. I've never researched it.
