Classification of instances in Weka - C#

I'm trying to use Weka in my C# application. I've used IKVM to bring the Java parts into my .NET application. This seems to be working quite well. However, I am at a loss when it comes to Weka's API. How exactly do I classify instances if they are programmatically passed around in my application and not available as ARFF files?
Basically, I am trying to integrate a simple co-reference analysis using Weka's classifiers. I've built the classification model in Weka directly and saved it to disk, from where my .NET application opens it and uses the IKVM port of Weka to predict the class value.
Here is what I've got so far:
// This is the "entry" method for the classification method
public IEnumerable<AttributedTokenDecorator> Execute(IEnumerable<TokenPair> items)
{
TokenPair[] pairs = items.ToArray();
Classifier model = ReadModel(); // reads the Weka generated model
FastVector fv = CreateFastVector(pairs);
Instances instances = new Instances("licora", fv, pairs.Length);
CreateInstances(instances, pairs);
for(int i = 0; i < instances.numInstances(); i++)
{
Instance instance = instances.instance(i);
double classification = model.classifyInstance(instance); // array index out of bounds?
if(AsBoolean(classification))
MakeCoreferent(pairs[i]);
}
throw new NotImplementedException(); // TODO
}
// This is a helper method to create instances from the internal model files
private static void CreateInstances(Instances instances, IEnumerable<TokenPair> pairs)
{
instances.setClassIndex(instances.numAttributes() - 1);
foreach(var pair in pairs)
{
var instance = new Instance(instances.numAttributes());
instance.setDataset(instances);
for (int i = 0; i < instances.numAttributes(); i++)
{
var attribute = instances.attribute(i);
if (pair.Features.ContainsKey(attribute.name()) && pair.Features[attribute.name()] != null)
{
var value = pair.Features[attribute.name()];
if (attribute.isNumeric()) instance.setValue(attribute, Convert.ToDouble(value));
else instance.setValue(attribute, value.ToString());
}
else
{
instance.setMissing(attribute);
}
}
//instance.setClassMissing();
instances.add(instance);
}
}
// This creates the data set's attributes vector
private FastVector CreateFastVector(TokenPair[] pairs)
{
var fv = new FastVector();
foreach (var attribute in _features)
{
Attribute att;
if (attribute.Type.Equals(ArffType.Nominal))
{
var values = new FastVector();
ExtractValues(values, pairs, attribute.FeatureName);
att = new Attribute(attribute.FeatureName, values);
}
else
att = new Attribute(attribute.FeatureName);
fv.addElement(att);
}
{
var classValues = new FastVector(2);
classValues.addElement("0");
classValues.addElement("1");
var classAttribute = new Attribute("isCoref", classValues);
fv.addElement(classAttribute);
}
return fv;
}
// This extracts observed values for nominal attributes
private static void ExtractValues(FastVector values, IEnumerable<TokenPair> pairs, string featureName)
{
var strings = (from x in pairs
where x.Features.ContainsKey(featureName) && x.Features[featureName] != null
select x.Features[featureName].ToString())
.Distinct().ToArray();
foreach (var s in strings)
values.addElement(s);
}
private Classifier ReadModel()
{
return (Classifier) SerializationHelper.read(_model);
}
private static bool AsBoolean(double classifyInstance)
{
return classifyInstance >= 0.5;
}
For some reason, Weka throws an IndexOutOfRangeException when I call model.classifyInstance(instance). I have no idea why, nor can I come up with a way to rectify the issue.
I am hoping someone might know where I went wrong. The only documentation for Weka I found relies on ARFF files for prediction, and I don't really want to go there.

For some odd reason, this exception was raised by the DTNB classifier (I was using three classifiers in a majority-vote model). Apparently, not using DTNB "fixed" the issue.
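For later readers, here is a condensed, self-contained version of the ARFF-free flow sketched above (via IKVM; the model file name and the single numeric feature are placeholders, not the question's actual layout). Note that the attribute structure, including every nominal value, must match the data the classifier was trained on, or classifyInstance can fail:
// Condensed sketch; "model.bin" and the attribute layout are placeholders.
Classifier model = (Classifier)SerializationHelper.read("model.bin");
FastVector attributes = new FastVector();
attributes.addElement(new weka.core.Attribute("someNumericFeature"));
FastVector classValues = new FastVector(2);
classValues.addElement("0");
classValues.addElement("1");
attributes.addElement(new weka.core.Attribute("isCoref", classValues));
Instances dataset = new Instances("licora", attributes, 1);
dataset.setClassIndex(dataset.numAttributes() - 1);
Instance instance = new Instance(dataset.numAttributes());
instance.setDataset(dataset);
instance.setValue(dataset.attribute(0), 42.0);
instance.setClassMissing();
dataset.add(instance);
double prediction = model.classifyInstance(dataset.instance(0)); // 0.0 or 1.0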


How to compare simple and typed literals in dotnetrdf?

I'm comparing two graphs, one from a Turtle file with simple literal objects, the other from a file with explicit datatype IRIs. The graphs are otherwise equal.
Graph A:
<s> <p> "o"
Graph B:
<s> <p> "o"^^xsd:string
According to RDF 1.1 (3.3 Literals), "[s]imple literals are syntactic sugar for abstract syntax literals with the datatype IRI http://www.w3.org/2001/XMLSchema#string". This is reflected in the concrete syntax specifications as well (N-Triples, Turtle, RDF XML).
So I'd expect both my graphs to consist of a single triple with a URI node s subject, a URI node p predicate, and a literal node o with type xsd:string as object. Based on this, I'd expect there to be no difference between the two.
However this is not the case in practice:
var graphStringA = "<http://example.com/subject> <http://example.com/predicate> \"object\".";
var graphStringB = "<http://example.com/subject> <http://example.com/predicate> \"object\"^^<http://www.w3.org/2001/XMLSchema#string>.";
var graphA = new Graph();
var graphB = new Graph();
StringParser.Parse(graphA, graphStringA);
StringParser.Parse(graphB, graphStringB);
var diff = graphA.Difference(graphB);
There's one added and one removed triple in the difference report. The graphs are different, because the datatypes for the object nodes are different: graphA.Triples.First().Object.Datatype is nothing, while graphB.Triples.First().Object.Datatype is the correct URI.
It appears to me that to modify this behaviour I'd have to either
go all the way down to LiteralNode (and change its assumptions about literal nodes), or
create a new GraphDiff (that takes the default datatype of string literals into account).
A workaround is to remove the "default" datatypes:
private static void RemoveDefaultDatatype(IGraph g)
{
var triplesWithDefaultDatatype =
from triple in g.Triples
where triple.Object is ILiteralNode
let literal = triple.Object as ILiteralNode
where literal.DataType != null
where literal.DataType.AbsoluteUri == "http://www.w3.org/2001/XMLSchema#string" || literal.DataType.AbsoluteUri == "http://www.w3.org/2001/XMLSchema#langString"
select triple;
var triplesWithNoDatatype =
from triple in triplesWithDefaultDatatype
let literal = triple.Object as ILiteralNode
select new Triple(
triple.Subject,
triple.Predicate,
g.CreateLiteralNode(
literal.Value,
literal.Language));
var toAssert = triplesWithNoDatatype.ToArray();
var toRetract = triplesWithDefaultDatatype.ToArray(); // materialise both queries before mutating the graph
g.Assert(toAssert);
g.Retract(toRetract);
}
Is there a way in dotnetrdf to compare simple literals to typed literals in a way that's consistent with RDF 1.1, without resorting to major rewrite or workaround as above?
dotNetRDF is not RDF 1.1 compliant nor do we claim to be. There is a branch which is rewritten to be compliant but it is not remotely production ready.
Assuming that you control the parsing process you can customise the handling of incoming data using the RDF Handlers API. You can then strip the implicit xsd:string type off literals as they come into the system by overriding the HandleTriple(Triple t) method as desired.
Using the Handlers API as per RobV's answer:
class StripStringHandler : BaseRdfHandler, IWrappingRdfHandler
{
protected override bool HandleTripleInternal(Triple t)
{
if (t.Object is ILiteralNode)
{
var literal = t.Object as ILiteralNode;
if (literal.DataType != null && (literal.DataType.AbsoluteUri == "http://www.w3.org/2001/XMLSchema#string" || literal.DataType.AbsoluteUri == "http://www.w3.org/2001/XMLSchema#langString"))
{
var simpleLiteral = this.CreateLiteralNode(literal.Value, literal.Language);
t = new Triple(t.Subject, t.Predicate, simpleLiteral);
}
}
return this.handler.HandleTriple(t);
}
private IRdfHandler handler;
public StripStringHandler(IRdfHandler handler) : base(handler)
{
this.handler = handler;
}
public IEnumerable<IRdfHandler> InnerHandlers
{
get
{
return this.handler.AsEnumerable();
}
}
protected override void StartRdfInternal()
{
this.handler.StartRdf();
}
protected override void EndRdfInternal(bool ok)
{
this.handler.EndRdf(ok);
}
protected override bool HandleBaseUriInternal(Uri baseUri)
{
return this.handler.HandleBaseUri(baseUri);
}
protected override bool HandleNamespaceInternal(string prefix, Uri namespaceUri)
{
return this.handler.HandleNamespace(prefix, namespaceUri);
}
public override bool AcceptsAll
{
get
{
return this.handler.AcceptsAll;
}
}
}
Usage:
class Program
{
static void Main()
{
var graphA = Load("<http://example.com/subject> <http://example.com/predicate> \"object\".");
var graphB = Load("<http://example.com/subject> <http://example.com/predicate> \"object\"^^<http://www.w3.org/2001/XMLSchema#string>.");
var diff = graphA.Difference(graphB);
Debug.Assert(diff.AreEqual);
}
private static IGraph Load(string source)
{
var result = new Graph();
var graphHandler = new GraphHandler(result);
var strippingHandler = new StripStringHandler(graphHandler);
var parser = new TurtleParser();
using (var reader = new StringReader(source))
{
parser.Load(strippingHandler, reader);
}
return result;
}
}

Entity Framework Code First and SQL Server 2012 Sequences

I was in the middle of implementing a database audit trail whereby CRUD operations performed through my controllers in my Web API project would serialize the old and new POCOs and store their values for later retrieval (historical, rollback, etc.).
When I got it all working, I did not like how it made my controllers look during a POST because I ended up having to call SaveChanges() twice, once to get the ID for the inserted entity and then again to commit the audit record which needed to know that ID.
I set out to convert the project (still in its infancy) to use sequences instead of identity columns. This has the added bonus of further abstracting me from SQL Server, though that is not really an issue. More importantly, it reduces the number of commits and lets me pull that logic out of the controller and into my service layer, which abstracts my controllers from the repositories and lets me do work like this auditing in that "shim" layer.
Once the sequence object and a stored procedure to expose it were created, I wrote the following class:
public class SequentialIdProvider : ISequentialIdProvider
{
private readonly IService<SequenceValue> _sequenceValueService;
public SequentialIdProvider(IService<SequenceValue> sequenceValueService)
{
_sequenceValueService = sequenceValueService;
}
public int GetNextId()
{
var value = _sequenceValueService.SelectQuery("GetSequenceIds #numberOfIds", new SqlParameter("numberOfIds", SqlDbType.Int) { Value = 1 }).ToList();
if (value.First() == null)
{
throw new Exception("Unable to retrieve the next id's from the sequence.");
}
return value.First().FirstValue;
}
public IList<int> GetNextIds(int numberOfIds)
{
var values = _sequenceValueService.SelectQuery("GetSequenceIds #numberOfIds", new SqlParameter("numberOfIds", SqlDbType.Int) { Value = numberOfIds }).ToList();
if (values.First() == null)
{
throw new Exception("Unable to retrieve the next id's from the sequence.");
}
var list = new List<int>();
for (var i = values.First().FirstValue; i <= values.First().LastValue; i++)
{
list.Add(i);
}
return list;
}
}
Which simply provides two ways to get IDs, a single and a range.
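As an aside, on SQL Server 2012 the sequence can also be read without a wrapping stored procedure, using EF's raw SQL support. A rough sketch, assuming EF 5/6 and a hypothetical sequence named dbo.EntityIdSequence (requires using System.Data.Entity and System.Linq):
public int GetNextIdDirect(DbContext context)
{
    // NEXT VALUE FOR is SQL Server 2012+ syntax; the sequence name is a placeholder.
    return context.Database
        .SqlQuery<int>("SELECT NEXT VALUE FOR dbo.EntityIdSequence")
        .Single();
}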
This all worked great during the first set of unit tests, but as soon as I started testing it in a real-world scenario, I quickly realized that a single call to GetNextId() would return the same value for the life of that context, until SaveChanges() was called, thus negating any real benefit.
I am not sure if there is a way around this short of creating a second context (not an option) or going old-school ADO.NET, making direct SQL calls, and using AutoMapper to get to the same net result. Neither of these appeals to me, so I am hoping someone else has an idea.
Don't know if this might help you, but this is how I did my audit log trail using code first.
The following is coded into a class inheriting from DbContext.
In my constructor I have the following:
IObjectContextAdapter objectContextAdapter = (this as IObjectContextAdapter);
objectContextAdapter.ObjectContext.SavingChanges += SavingChanges;
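Pieced together, the skeleton might look like this (the context name and DbSet property are hypothetical; the handler body follows below):
public class AuditingContext : DbContext
{
    public DbSet<AuditLogEntry> AuditLogEntries { get; set; }

    public AuditingContext()
    {
        IObjectContextAdapter objectContextAdapter = (this as IObjectContextAdapter);
        objectContextAdapter.ObjectContext.SavingChanges += SavingChanges;
    }

    void SavingChanges(object sender, EventArgs e) { /* shown below */ }
}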
This is the SavingChanges method wired up above:
void SavingChanges(object sender, EventArgs e) {
Debug.Assert(sender != null, "Sender can't be null");
Debug.Assert(sender is ObjectContext, "Sender not instance of ObjectContext");
ObjectContext context = (sender as ObjectContext);
IEnumerable<ObjectStateEntry> modifiedEntities = context.ObjectStateManager.GetObjectStateEntries(EntityState.Modified);
IEnumerable<ObjectStateEntry> addedEntities = context.ObjectStateManager.GetObjectStateEntries(EntityState.Added);
addedEntities.ToList().ForEach(a => {
//Assign ids to objects that don't have
if (a.Entity is IIdentity && (a.Entity as IIdentity).Id == Guid.Empty)
(a.Entity as IIdentity).Id = Guid.NewGuid();
this.Set<AuditLogEntry>().Add(AuditLogEntryFactory(a, _AddedEntry));
});
modifiedEntities.ToList().ForEach(m => {
this.Set<AuditLogEntry>().Add(AuditLogEntryFactory(m, _ModifiedEntry));
});
}
And these are the methods used previously to build up the audit log details:
private AuditLogEntry AuditLogEntryFactory(ObjectStateEntry entry, string entryType) {
AuditLogEntry auditLogEntry = new AuditLogEntry() {
EntryDate = DateTime.Now,
EntryType = entryType,
Id = Guid.NewGuid(),
NewValues = AuditLogEntryNewValues(entry),
Table = entry.EntitySet.Name,
UserId = _UserId
};
if (entryType == _ModifiedEntry) auditLogEntry.OriginalValues = AuditLogEntryOriginalValues(entry);
return auditLogEntry;
}
/// <summary>
/// Creates a string of all modified properties for an entity.
/// </summary>
private string AuditLogEntryOriginalValues(ObjectStateEntry entry) {
StringBuilder stringBuilder = new StringBuilder();
entry.GetModifiedProperties().ToList().ForEach(m => {
stringBuilder.Append(String.Format("{0} = {1},", m, entry.OriginalValues[m]));
});
return stringBuilder.ToString();
}
/// <summary>
/// Creates a string of all modified properties' new values for an entity.
/// </summary>
private string AuditLogEntryNewValues(ObjectStateEntry entry) {
StringBuilder stringBuilder = new StringBuilder();
for (int i = 0; i < entry.CurrentValues.FieldCount; i++) {
stringBuilder.Append(String.Format("{0} = {1},",
entry.CurrentValues.GetName(i), entry.CurrentValues.GetValue(i)));
}
return stringBuilder.ToString();
}
Hopefully this points you in a direction that helps you solve your problem.

How to pass an array created within a method to a dropdown list outside that method?

I need some help "fine-tuning" a block of code in my program which populates an array and binds that array to a dropdown list. Here is the code:
using ([SQL Data Connection])
{
var stakes = from st in ddl.STK_Stakes
where st.STK_EVT_FK == eventId
select new
{
st.STK_Description
};
string[] stakeDesc = new string[stakes.Count()];
int stakeCount = 0;
foreach (var stake in stakes)
{
stakeDesc[stakeCount] = stake.STK_Description;
stakeCount++;
}
foundDDL.DataSource = stakeDesc;
foundDDL.DataBind();
}
This code populates Drop-Down List "foundDDL" with options only when foundDDL is on the screen. This code works, but as it's currently used, it is executing every time an instance of foundDDL is created on the screen.
Since the options populated in foundDDL will always be the same while on that page, I want to move this code to its own method, which I can then run once at load time, populate my array, and then just feed that pre-populated array to foundDDL as needed. This will reduce the number of calls to my database and make my program that much more efficient.
The issue I'm having is that I can't figure out how to instantiate my array outside the method, since the number of elements I'll need in the array is subject to change.
Try using a static property?
private static string[] stakeDesc;
public static string[] StakeDesc {
get {
if (stakeDesc == null) {
// initialize it here
}
return stakeDesc;
}
}
You don't have to specify how big it is when you declare it. Just declare it outside the method and instantiate it as you are already doing; if it's already instantiated, just return it.
//In global scope somewhere
string[] stakeDesc;
if (stakeDesc==null || OtherReasonToRefreshData())
{
using ([SQL Data Connection])
{
var stakes = from st in ddl.STK_Stakes
where st.STK_EVT_FK == eventId
select new
{
st.STK_Description
};
stakeDesc = new string[stakes.Count()];
int stakeCount = 0;
foreach (var stake in stakes)
{
stakeDesc[stakeCount] = stake.STK_Description;
stakeCount++;
}
}
}
foundDDL.DataSource = stakeDesc;
foundDDL.DataBind();
Note you can instantiate the array with 10 elements, then reassign it to a new array with 50 elements without issue.
Because the information does not change, you should be caching this information instead of hitting the DB.
Roughly the code should go (have not compiled this):
class MyCodeBehindClass
{
private List<string> _StakeDesc;//use a dynamically sized array
private List<string> StakeDesc
{
get
{
if (Session[StakeDescKey] != null)
_StakeDesc = (List<string>)Session[StakeDescKey];
else
{
_StakeDesc = GetStakeDesc();
}
return _StakeDesc;
}
set { _StakeDesc = value; }
}
private List<string> GetStakeDesc()
{
using ([SQL Data Connection])
{
var stakes = from st in ddl.STK_Stakes
where st.STK_EVT_FK == eventId
select st.STK_Description;
var list = stakes.ToList();
Session[StakeDescKey] = list; // populate cache
return list;
}
}
private void PageLoad()
{
//if not postback...
foundDDL.DataSource = StakeDesc;//Smart property that checks cache
foundDDL.DataBind();
}
}

SmartThreadPool - Is it possible to pass delegate method with method parameters?

I have a long-running process called ImportProductInformation, called by a console app, that I'm trying to speed up. It appears to be an excellent candidate for thread pooling, so I did a little searching, came across SmartThreadPool on CodeProject, and am trying to implement it.
ImportProductInformation currently requires an "item", which is just a single Entity Framework row pulled from a list. SmartThreadPool uses a delegate called "WorkItemCallback", but if I build it like below, it complains about "Method name expected" in the foreach loop on smartThreadPool.QueueWorkItem, as it appears I can't pass my parameters to the delegated method. What am I missing here? I'm sure it's something simple; I lack experience with delegates. Any help would be appreciated:
public static void ImportProductInformation_Exec()
{
// List
List<productinformation> _list = GetProductInformation();
// Import
if (_list != null)
{
SmartThreadPool smartThreadPool = new SmartThreadPool();
foreach (var item in _list)
{
smartThreadPool.QueueWorkItem
(new WorkItemCallback
(ImportProductInformation(item)));
}
smartThreadPool.WaitForIdle();
smartThreadPool.Shutdown();
}
}
public void ImportProductInformation(productinformation item)
{
// Do work associated with "item" here
}
If I change the loop to this, I get "Method is used like a Type" as the build error:
foreach (var item in _list)
{
ImportProductInformation ipi =
new ImportProductInformation(item);
smartThreadPool.QueueWorkItem(new WorkItemCallback(ipi));
}
Ended up getting it to work with this:
public class ProductInformationTaskInfo
{
public productinformation ProductInformation;
public ProductInformationTaskInfo(productinformation pi)
{
ProductInformation = pi;
}
}
public class PI
{
// Wrapped in a method so it compiles; the method name is arbitrary.
public void QueueAll(SmartThreadPool smartThreadPool, List<productinformation> _list)
{
foreach (var item in _list)
{
ProductInformationTaskInfo pi =
new ProductInformationTaskInfo(item);
smartThreadPool.QueueWorkItem
(new WorkItemCallback
(ImportProductInformation), pi);
}
}
public static object ImportProductInformation(Object _pi)
{
ProductInformationTaskInfo pi = (ProductInformationTaskInfo)_pi;
var item = pi.ProductInformation;
// Do work here
return null;
}
}
I don't know or have the SmartThreadPool; the following is approximate:
foreach (var item in _list)
{
var itemCopy = item;
smartThreadPool.QueueWorkItem
(dummy => ImportProductInformation(itemCopy));
}
You may have to do some fixing.
This works because the lambda captures a variable from the containing method. And that's why you need itemCopy.
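To illustrate (a minimal, self-contained example; under C# 4 and earlier the foreach variable is shared across iterations, which is exactly why the copy is needed):
var actions = new List<Action>();
foreach (var item in new[] { 1, 2, 3 })
{
    var itemCopy = item; // one copy per iteration
    actions.Add(() => Console.WriteLine(itemCopy));
}
foreach (var a in actions) a(); // prints 1 2 3; capturing 'item' directly would print 3 3 3 on C# 4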
But note that the normal ThreadPool is not suited for long-running tasks, and the same may hold for the SmartThreadPool. It should also keep a limit on the number of threads, and when ImportProductInformation does mainly I/O, threading might not help at all.
You can use anonymous methods:
int a = 15;
String b = "hello world!";
ThreadPool.QueueUserWorkItem((state)=>SomeFunction(a,b));

Caching in C#/.Net

What is the best approach to implementing a cache in C#? Can I use existing .NET classes for this? Perhaps something like a dictionary that will remove some entries if it gets too large, but whose entries won't be removed by the garbage collector?
If you are using .NET 4 or later, you can use the MemoryCache class.
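A minimal sketch of the get-or-load pattern with MemoryCache (from System.Runtime.Caching; the key and LoadReport are placeholders for your own data and factory method):
ObjectCache cache = MemoryCache.Default;
string report = cache.Get("report") as string;
if (report == null)
{
    report = LoadReport(); // hypothetical expensive call
    cache.Set("report", report, new CacheItemPolicy
    {
        AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(10)
    });
}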
If you're using ASP.NET, you could use the Cache class (System.Web.Caching).
Here is a good helper class: c-cache-helper-class
If you mean caching in a Windows Forms app, it depends on what you're trying to do, and where you're trying to cache the data.
We've implemented a cache behind a web service for certain methods (using the System.Web.Caching object).
However, you might also want to look at the Caching Application Block (see here), which is part of the Enterprise Library for .NET Framework 2.0.
MemoryCache in the framework is a good place to start, but you might also like to consider the open-source library LazyCache, because it has a simpler API than MemoryCache and has built-in locking as well as some other developer-friendly features. It is also available on NuGet.
To give you an example:
// Create our cache service using the defaults (Dependency injection ready).
// Uses MemoryCache.Default as default so cache is shared between instances
IAppCache cache = new CachingService();
// Declare (but don't execute) a func/delegate whose result we want to cache
Func<ComplexObject> complexObjectFactory = () => methodThatTakesTimeOrResources();
// Get our ComplexObjects from the cache, or build them in the factory func
// and cache the results for next time under the given key
ComplexObject cachedResults = cache.GetOrAdd("uniqueKey", complexObjectFactory);
I recently wrote this article about getting started with caching in .NET that you may find useful.
(Disclaimer: I am the author of LazyCache)
The cache classes supplied with .NET are handy, but have a major problem: they cannot store much data (tens of millions of objects or more) for a long time without killing your GC. They work great if you cache a few thousand objects, but the moment you move into the millions and keep them around until they propagate into Gen 2, the GC pauses will eventually become noticeable once your system reaches a low-memory threshold and the GC needs to sweep all generations.
The practicality is this: if you need to store a few hundred thousand instances, use the MS cache. It does not matter whether your objects have 2 fields or 25 fields; it's about the number of references.
On the other hand, there are cases when the large amounts of RAM common these days, e.g. 64 GB, need to be utilized.
For that we have created a 100% managed memory manager and cache that sits on top of it.
Our solution can easily store 300,000,000 objects in memory, in process, without taxing the GC at all, because we store data in large (250 MB) byte[] segments.
Here is the code: NFX Pile (Apache 2.0)
And video:
NFX Pile Cache - YouTube
You can use the ObjectCache.
See http://msdn.microsoft.com/en-us/library/system.runtime.caching.objectcache.aspx
For Local Stores
.NET MemoryCache
NCache Express
AppFabric Caching
...
As mentioned in other answers, the default choice using the .NET Framework is MemoryCache and the various related implementations in Microsoft NuGet packages (e.g. Microsoft.Extensions.Caching.Memory). All of these caches bound their size in terms of memory used, and attempt to estimate memory used by tracking how total physical memory is increasing relative to the number of cached objects. A background thread then periodically "trims" entries.
MemoryCache etc. share some limitations:
Keys are strings, so if the key type is not natively string, you will be forced to constantly allocate strings on the heap. This can really add up in a server application when items are 'hot'.
Has poor 'scan resistance' - e.g. if some automated process is rapidly looping through all the items that exist, the cache size can grow too fast for the background thread to keep up. This can result in memory pressure, page faults, induced GC, or, when running under IIS, recycling of the process due to exceeding the private bytes limit.
Does not scale well with concurrent writes.
Contains perf counters that cannot be disabled (which incur overhead).
Your workload will determine the degree to which these things are problematic. An alternative approach to caching is to bound the number of objects in the cache (rather than estimating memory used). A cache replacement policy then determines which object to discard when the cache is full.
Below is the source code for a simple cache with least recently used eviction policy:
public sealed class ClassicLru<K, V>
{
private readonly int capacity;
private readonly ConcurrentDictionary<K, LinkedListNode<LruItem>> dictionary;
private readonly LinkedList<LruItem> linkedList = new LinkedList<LruItem>();
private long requestHitCount;
private long requestTotalCount;
public ClassicLru(int capacity)
: this(Defaults.ConcurrencyLevel, capacity, EqualityComparer<K>.Default)
{
}
public ClassicLru(int concurrencyLevel, int capacity, IEqualityComparer<K> comparer)
{
if (capacity < 3)
{
throw new ArgumentOutOfRangeException(nameof(capacity), "Capacity must be greater than or equal to 3.");
}
if (comparer == null)
{
throw new ArgumentNullException(nameof(comparer));
}
this.capacity = capacity;
this.dictionary = new ConcurrentDictionary<K, LinkedListNode<LruItem>>(concurrencyLevel, this.capacity + 1, comparer);
}
public int Count => this.linkedList.Count;
public double HitRatio => (double)requestHitCount / (double)requestTotalCount;
///<inheritdoc/>
public bool TryGet(K key, out V value)
{
Interlocked.Increment(ref requestTotalCount);
LinkedListNode<LruItem> node;
if (dictionary.TryGetValue(key, out node))
{
LockAndMoveToEnd(node);
Interlocked.Increment(ref requestHitCount);
value = node.Value.Value;
return true;
}
value = default(V);
return false;
}
public V GetOrAdd(K key, Func<K, V> valueFactory)
{
if (this.TryGet(key, out var value))
{
return value;
}
var node = new LinkedListNode<LruItem>(new LruItem(key, valueFactory(key)));
if (this.dictionary.TryAdd(key, node))
{
LinkedListNode<LruItem> first = null;
lock (this.linkedList)
{
if (linkedList.Count >= capacity)
{
first = linkedList.First;
linkedList.RemoveFirst();
}
linkedList.AddLast(node);
}
// Remove from the dictionary outside the lock. This means that the dictionary at this moment
// contains an item that is not in the linked list. If another thread fetches this item,
// LockAndMoveToEnd will ignore it, since it is detached. This means we potentially 'lose' an
// item just as it was about to move to the back of the LRU list and be preserved. The next request
// for the same key will be a miss. Dictionary and list are eventually consistent.
// However, all operations inside the lock are extremely fast, so contention is minimized.
if (first != null)
{
// TryRemove can fail if another thread already removed the item,
// so guard the dispose on its result.
if (dictionary.TryRemove(first.Value.Key, out var removed) && removed.Value.Value is IDisposable d)
{
d.Dispose();
}
}
return node.Value.Value;
}
return this.GetOrAdd(key, valueFactory);
}
public bool TryRemove(K key)
{
if (dictionary.TryRemove(key, out var node))
{
// If the node has already been removed from the list, ignore.
// E.g. thread A reads x from the dictionary. Thread B adds a new item, removes x from
// the List & Dictionary. Now thread A will try to move x to the end of the list.
if (node.List != null)
{
lock (this.linkedList)
{
if (node.List != null)
{
linkedList.Remove(node);
}
}
}
if (node.Value.Value is IDisposable d)
{
d.Dispose();
}
return true;
}
return false;
}
// Thread A reads x from the dictionary. Thread B adds a new item. Thread A moves x to the end. Thread B now removes the new first Node (removal is atomic on both data structures).
private void LockAndMoveToEnd(LinkedListNode<LruItem> node)
{
// If the node has already been removed from the list, ignore.
// E.g. thread A reads x from the dictionary. Thread B adds a new item, removes x from
// the List & Dictionary. Now thread A will try to move x to the end of the list.
if (node.List == null)
{
return;
}
lock (this.linkedList)
{
if (node.List == null)
{
return;
}
linkedList.Remove(node);
linkedList.AddLast(node);
}
}
private class LruItem
{
public LruItem(K k, V v)
{
Key = k;
Value = v;
}
public K Key { get; }
public V Value { get; }
}
}
This is just to illustrate a thread safe cache - it probably has bugs and can be a bottleneck under heavy concurrent workloads (e.g. in a web server).
A thoroughly tested, production-ready, scalable concurrent implementation is a bit beyond a Stack Overflow post. To solve this in my projects, I implemented a thread-safe pseudo LRU (think concurrent dictionary, but with constrained size). Performance is very close to a raw ConcurrentDictionary: ~10x faster than MemoryCache, with ~10x better concurrent throughput than ClassicLru above, and a better hit rate. A detailed performance analysis is provided in the GitHub link below.
Usage looks like this:
int capacity = 666;
var lru = new ConcurrentLru<int, SomeItem>(capacity);
var value = lru.GetOrAdd(1, (k) => new SomeItem(k));
GitHub: https://github.com/bitfaster/BitFaster.Caching
Install-Package BitFaster.Caching
Your question needs more clarification. C# is a language, not a framework, so you have to specify which framework you want to implement the caching in. Even if we assume ASP.NET, it still depends entirely on what you want from the cache. You can decide between an in-process cache (which keeps the data inside the heap of your application) and an out-of-process cache (which stores the data in memory outside your heap, such as an Amazon ElastiCache server). There is also a decision to make between client-side and server-side caching. Usually you have to develop different caching solutions for different data, because based on four factors (accessibility, persistency, size, cost) you have to decide which solution you need.
I wrote this some time ago and it seems to work well. It allows you to differentiate cache stores by using different types: ApplicationCaching<MyCacheType1>, ApplicationCaching<MyCacheType2>, ...
You can decide to allow some stores to persist after execution and others to expire.
You will need a reference to the Newtonsoft.Json serializer (or use an alternative one), and of course all objects or value types to be cached must be serializable.
Use MaxItemCount to set a limit to the number of items in any one store.
A separate Zipper class (see code below) uses System.IO.Compression. This minimises the size of the store and helps speed up loading times.
public static class ApplicationCaching<K>
{
//====================================================================================================================
public static event EventHandler InitialAccess = (s, e) => { };
//=============================================================================================
static Dictionary<string, byte[]> _StoredValues;
static Dictionary<string, DateTime> _ExpirationTimes = new Dictionary<string, DateTime>();
//=============================================================================================
public static int MaxItemCount { get; set; } = 0;
private static void OnInitialAccess()
{
//-----------------------------------------------------------------------------------------
_StoredValues = new Dictionary<string, byte[]>();
//-----------------------------------------------------------------------------------------
InitialAccess?.Invoke(null, EventArgs.Empty);
//-----------------------------------------------------------------------------------------
}
public static void AddToCache<T>(string key, T value, DateTime expirationTime)
{
try
{
//-----------------------------------------------------------------------------------------
if (_StoredValues is null) OnInitialAccess();
//-----------------------------------------------------------------------------------------
string strValue = JsonConvert.SerializeObject(value);
byte[] zippedValue = Zipper.Zip(strValue);
//-----------------------------------------------------------------------------------------
_StoredValues.Remove(key);
_StoredValues.Add(key, zippedValue);
//-----------------------------------------------------------------------------------------
_ExpirationTimes.Remove(key);
_ExpirationTimes.Add(key, expirationTime);
//-----------------------------------------------------------------------------------------
}
catch (Exception)
{
throw; // rethrow without resetting the stack trace
}
}
//=============================================================================================
public static T GetFromCache<T>(string key, T defaultValue = default)
{
try
{
//-----------------------------------------------------------------------------------------
if (_StoredValues is null) OnInitialAccess();
//-----------------------------------------------------------------------------------------
if (_StoredValues.ContainsKey(key))
{
//------------------------------------------------------------------------------------------
if (_ExpirationTimes[key] <= DateTime.Now)
{
//------------------------------------------------------------------------------------------
_StoredValues.Remove(key);
_ExpirationTimes.Remove(key);
//------------------------------------------------------------------------------------------
return defaultValue;
//------------------------------------------------------------------------------------------
}
//------------------------------------------------------------------------------------------
byte[] zippedValue = _StoredValues[key];
//------------------------------------------------------------------------------------------
string strValue = Zipper.Unzip(zippedValue);
T value = JsonConvert.DeserializeObject<T>(strValue);
//------------------------------------------------------------------------------------------
return value;
//------------------------------------------------------------------------------------------
}
else
{
return defaultValue;
}
//---------------------------------------------------------------------------------------------
}
catch (Exception)
{
throw; // rethrow without resetting the stack trace
}
}
//=============================================================================================
public static string ConvertCacheToString()
{
//-----------------------------------------------------------------------------------------
if (_StoredValues is null || _ExpirationTimes is null) return "";
//-----------------------------------------------------------------------------------------
List<string> storage = new List<string>();
//-----------------------------------------------------------------------------------------
string strStoredObject = JsonConvert.SerializeObject(_StoredValues);
string strExpirationTimes = JsonConvert.SerializeObject(_ExpirationTimes);
//-----------------------------------------------------------------------------------------
storage.AddRange(new string[] { strStoredObject, strExpirationTimes});
//-----------------------------------------------------------------------------------------
string strStorage = JsonConvert.SerializeObject(storage);
//-----------------------------------------------------------------------------------------
return strStorage;
//-----------------------------------------------------------------------------------------
}
//=============================================================================================
public static void InializeCacheFromString(string strCache)
{
try
{
//-----------------------------------------------------------------------------------------
List<string> storage = JsonConvert.DeserializeObject<List<string>>(strCache);
//-----------------------------------------------------------------------------------------
if (storage != null && storage.Count == 2)
{
//-----------------------------------------------------------------------------------------
_StoredValues = JsonConvert.DeserializeObject<Dictionary<string, byte[]>>(storage.First());
_ExpirationTimes = JsonConvert.DeserializeObject<Dictionary<string, DateTime>>(storage.Last());
//-----------------------------------------------------------------------------------------
if (_ExpirationTimes != null && _StoredValues != null)
{
//-----------------------------------------------------------------------------------------
for (int i = 0; i < _ExpirationTimes.Count; i++)
{
string key = _ExpirationTimes.ElementAt(i).Key;
//-----------------------------------------------------------------------------------------
if (_ExpirationTimes[key] < DateTime.Now)
{
ClearItem(key);
}
//-----------------------------------------------------------------------------------------
}
//-----------------------------------------------------------------------------------------
if (MaxItemCount > 0 && _StoredValues.Count > MaxItemCount)
{
IEnumerable<KeyValuePair<string, DateTime>> countedOutItems = _ExpirationTimes.OrderByDescending(o => o.Value).Skip(MaxItemCount);
for (int i = 0; i < countedOutItems.Count(); i++)
{
ClearItem(countedOutItems.ElementAt(i).Key);
}
}
//-----------------------------------------------------------------------------------------
return;
//-----------------------------------------------------------------------------------------
}
//-----------------------------------------------------------------------------------------
}
//-----------------------------------------------------------------------------------------
_StoredValues = new Dictionary<string, byte[]>();
_ExpirationTimes = new Dictionary<string, DateTime>();
//-----------------------------------------------------------------------------------------
}
catch (Exception)
{
throw;
}
}
//=============================================================================================
public static void ClearItem(string key)
{
//-----------------------------------------------------------------------------------------
if (_StoredValues.ContainsKey(key))
{
_StoredValues.Remove(key);
}
//-----------------------------------------------------------------------------------------
if (_ExpirationTimes.ContainsKey(key))
_ExpirationTimes.Remove(key);
//-----------------------------------------------------------------------------------------
}
//=============================================================================================
}
You can easily start using the cache on the fly with something like...
//------------------------------------------------------------------------------------------------------------------------------
string key = "MyUniqueKeyForThisItem";
//------------------------------------------------------------------------------------------------------------------------------
MyType obj = ApplicationCaching<MyCacheType>.GetFromCache<MyType>(key);
//------------------------------------------------------------------------------------------------------------------------------
if (obj == default)
{
obj = new MyType(...);
ApplicationCaching<MyCacheType>.AddToCache(key, obj, DateTime.Now.AddHours(1));
}
Note the actual types stored in the cache can be the same or different from the cache type. The cache type is ONLY used to differentiate different cache stores.
You can then decide to allow the cache to persist after execution terminates, using default settings:
string bulkCache = ApplicationCaching<MyType>.ConvertCacheToString();
//--------------------------------------------------------------------------------------------------------
if (bulkCache != "")
{
Properties.Settings.Default.*MyType*DataCachingStore = bulkCache;
}
//--------------------------------------------------------------------------------------------------------
try
{
Properties.Settings.Default.Save();
}
catch (IsolatedStorageException)
{
//handle Isolated Storage exceptions here
}
Handle the InitialAccess event to reinitialize the cache when you restart the app:
private static void ApplicationCaching_InitialAccess(object sender, EventArgs e)
{
//-----------------------------------------------------------------------------------------
string storedCache = Properties.Settings.Default.*MyType*DataCachingStore;
ApplicationCaching<MyCacheType>.InializeCacheFromString(storedCache);
//-----------------------------------------------------------------------------------------
}
Finally, here is the Zipper class:
public class Zipper
{
public static void CopyTo(Stream src, Stream dest)
{
byte[] bytes = new byte[4096];
int cnt;
while ((cnt = src.Read(bytes, 0, bytes.Length)) != 0)
{
dest.Write(bytes, 0, cnt);
}
}
public static byte[] Zip(string str)
{
var bytes = Encoding.UTF8.GetBytes(str);
using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream())
{
using (var gs = new GZipStream(mso, CompressionMode.Compress))
{
CopyTo(msi, gs);
}
return mso.ToArray();
}
}
public static string Unzip(byte[] bytes)
{
using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream())
{
using (var gs = new GZipStream(msi, CompressionMode.Decompress))
{
CopyTo(gs, mso);
}
return Encoding.UTF8.GetString(mso.ToArray());
}
}
}
If you are looking to cache something in ASP.NET, then I would look at the Cache class. For example:
Hashtable menuTable = new Hashtable();
menuTable.Add("Home", "default.aspx");
Cache["menu"] = menuTable;
Then to retrieve it again:
Hashtable menuTable = (Hashtable)Cache["menu"];
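If the entry should expire, Cache.Insert takes absolute and sliding expiration arguments; for example (a ten-minute absolute expiry, purely as an illustration):
Cache.Insert("menu", menuTable, null,
    DateTime.Now.AddMinutes(10), Cache.NoSlidingExpiration);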
A memory cache implementation for .NET Core:
public class CachePocRepository : ICachedEmployeeRepository
{
private readonly IEmployeeRepository _employeeRepository;
private readonly IMemoryCache _memoryCache;
public CachePocRepository(
IEmployeeRepository employeeRepository,
IMemoryCache memoryCache)
{
_employeeRepository = employeeRepository;
_memoryCache = memoryCache;
}
public async Task<Employee> GetEmployeeDetailsId(string employeeId)
{
_memoryCache.TryGetValue(employeeId, out Employee employee);
if (employee != null)
{
return employee;
}
employee = await _employeeRepository.GetEmployeeDetailsId(employeeId);
_memoryCache.Set(employeeId,
employee,
new MemoryCacheEntryOptions()
{
AbsoluteExpiration = DateTimeOffset.UtcNow.AddDays(7),
});
return employee;
}
}
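One way to wire this up with the built-in dependency injection container (EmployeeRepository is an assumed concrete implementation; the lifetimes are a judgment call):
// In Startup.ConfigureServices:
services.AddMemoryCache(); // registers IMemoryCache
services.AddScoped<IEmployeeRepository, EmployeeRepository>();
services.AddScoped<ICachedEmployeeRepository, CachePocRepository>();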
You could use a Hashtable. It has very fast lookups, no key collisions, and your data will not be garbage collected.
