I'm trying to expose an observable sequence that gives observers all existing records in a database table plus any future items. For the sake of argument, let's say it's log entries. Therefore, I'd have something like this:
public class LogService
{
private readonly Subject<LogEntry> entries;
public LogService()
{
this.entries = new Subject<LogEntry>();
this.entries
.Buffer(...)
.Subscribe(async x => await this.WriteLogEntriesToDatabaseAsync(x));
}
public IObservable<LogEntry> Entries
{
get { return this.entries; }
}
public IObservable<LogEntry> AllLogEntries
{
get
{
// how the heck?
}
}
public void Log(string message)
{
this.entries.OnNext(new LogEntry(message));
}
private async Task<IEnumerable<LogEntry>> GetLogEntriesAsync()
{
// reads existing entries from DB table and returns them
}
private async Task WriteLogEntriesToDatabaseAsync(IList<LogEntry> entries)
{
// writes entries to the database
}
}
My initial thought for the implementation of AllLogEntries was something like this:
return Observable.Create<LogEntry>(
async observer =>
{
var existingEntries = await this.GetLogEntriesAsync();
foreach (var existingEntry in existingEntries)
{
observer.OnNext(existingEntry);
}
return this.entries.Subscribe(observer);
});
But the problem with this is that there could be log entries that have been buffered and not yet written to the database. Those entries will be missed because they are not in the database and have already passed through the entries observable.
My next thought was to separate the buffered entries from the non-buffered and use the buffered when implementing AllLogEntries:
return Observable.Create<LogEntry>(
async observer =>
{
var existingEntries = await this.GetLogEntriesAsync();
foreach (var existingEntry in existingEntries)
{
observer.OnNext(existingEntry);
}
return this.bufferedEntries
.SelectMany(x => x)
.Subscribe(observer);
});
There are two problems with this:
It means clients of AllLogEntries also have to wait for the buffer timespan to pass before they receive their log entries. I want them to see log entries instantaneously.
There is still a race condition in that log entries could be written to the database between the point at which I finish reading the existing ones and the point at which I return the future entries.
So my question is: how would I actually go about achieving my requirements here with no possibility of race conditions, and avoiding any major performance penalties?
To do this via the client code, you will probably have to implement a solution using polling and then look for differences between calls. Combining Observable.Interval() (http://rxwiki.wikidot.com/101samples#toc28) with Observable.DistinctUntilChanged() may give you a sufficient solution.
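The polling idea could be sketched roughly as follows. This is only an illustration, assuming the System.Reactive package is referenced; PollForChanges and the pollOnce delegate are hypothetical names, with pollOnce standing in for a cheap query such as "highest log entry ID so far".

```csharp
using System;
using System.Reactive.Linq;

public static class DatabasePolling
{
    // pollOnce is a hypothetical delegate that runs one cheap query
    // against the database and returns a comparable value.
    public static IObservable<T> PollForChanges<T>(Func<T> pollOnce, TimeSpan period)
    {
        return Observable
            .Interval(period)         // tick once per period
            .Select(_ => pollOnce())  // run the query on every tick
            .DistinctUntilChanged();  // pass a value on only when it differs from the previous one
    }
}
```

A subscriber only hears about a poll whose result differs from the previous one, so it still has to fetch the new rows itself. The first value arrives after one full period, and changes are noticed with a delay of up to one period.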
Alternatively, I'd suggest you try to find a solution where the clients are notified when the DB/table is updated. In a web application, you could use something like SignalR to do this.
For example: http://techbrij.com/database-change-notifications-asp-net-signalr-sqldependency
If it's not a web application, a similar update mechanism via sockets may work.
See these links (these came from the accepted answer of SignalR polling database for updates):
http://xsockets.net/api/net-c#snippet61
https://github.com/codeplanner/XSocketsPollingLegacyDB
Related
Quite a few questions/answers on this topic (only listing a couple that I found. There were many more).
C# Parallel - Adding items to the collection being iterated over, or equivalent?
ConcurrentQueue with multithreading
Thanks to many of them I've come up with what I'm hoping is a possible solution for my problem. I may also be overthinking it. I have an API that needs to write to a text file for logging purposes. The API is called N+ times, and during each call it needs to log the request. What I don't want is for the request to have to wait for the log to be recorded before returning the requested data. The logs cannot just be dropped, though, so they must stack up on each request if the file is currently in use (I'm using ReaderWriterLock for this). Then, when the file isn't locked, I want to write the stacked logs.
I have come up with this in the hopes that it would satisfy the requirements but I think it will still cause a wait.
var wid = WindowsIdentity.GetCurrent().Token;
//add new log items
logs.Enqueue(helpers.createNewLog(requests));
string op;
while (logs.TryDequeue(out op))
{
using (WindowsIdentity.Impersonate(wid))
{
//write to text file, location on shared drive
var wrote = writers.WriteLog(op);
//item cannot be written since file locked, add back to queue to try again
if (!wrote)
{
logs.Enqueue(op);
}
}
}
Logs is a global like so
private static ConcurrentQueue<string> logs = new ConcurrentQueue<string>();
I feel like something isn't right, but I'm struggling to see what it is and what the best way would be to meet the requirements and still work in a web farm.
In my opinion, you should use a BlockingCollection instead of the ConcurrentQueue. It is built for the producer-consumer pattern, which is the same thing you are trying to do.
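A minimal sketch of that producer-consumer setup might look like this. BackgroundLogWriter and the writeLine delegate are hypothetical names; writeLine stands in for whatever code appends a line to the log file.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class BackgroundLogWriter : IDisposable
{
    private readonly BlockingCollection<string> queue = new BlockingCollection<string>();
    private readonly Task consumer;

    // writeLine is a hypothetical stand-in for the code that writes to the log file.
    public BackgroundLogWriter(Action<string> writeLine)
    {
        consumer = Task.Factory.StartNew(() =>
        {
            // Blocks while the queue is empty; the loop ends once
            // CompleteAdding() has been called and the queue is drained.
            foreach (var line in queue.GetConsumingEnumerable())
            {
                writeLine(line);
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Producer side: called from request threads, returns without waiting for the file.
    public void Log(string line)
    {
        queue.Add(line);
    }

    public void Dispose()
    {
        queue.CompleteAdding(); // no more items will be added
        consumer.Wait();        // let the consumer flush what is left
        queue.Dispose();
    }
}
```

Requests only pay for the Add call, and a single consumer thread does all the file writes in order. In a web farm each server would have its own queue, so the servers would still contend for a shared file.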
With ASP.NET you can also insert modules that intercept every request. If you want to save a log, I suggest registering a module instead of going with your approach. In your Global.asax.cs you have a Register method:
public class MvcApplication : System.Web.HttpApplication
{
public static void Register()
{
//registering an HttpModule
HttpApplication.RegisterModule(typeof(LogModule));
}
....
}
public class LogModule: IHttpModule
{
public void Dispose()
{
}
public void Init(HttpApplication context)
{
context.LogRequest += LogEvent;
}
private void LogEvent(object src, EventArgs args)
{
if (HttpContext.Current.CurrentNotification == RequestNotification.LogRequest)
{
if (HttpContext.Current.Handler is MvcHandler) // a direct cast would throw for non-MVC handlers
{
Debug.WriteLine("This was logged!");
//Save the information to your file
}
}
}
}
Hope this helps
I have a web method, upload Transaction (an ASMX web service), that takes an XML file, validates it, and stores its content in a SQL Server database. We noticed that certain users can submit the same file twice at the same time, so we can end up with the same codes twice in our database (we cannot use a unique index or do anything else at the database level; don't ask me why). I thought I could use the lock statement on the user ID string, but I don't know if this will solve the issue. Alternatively, I could use a cached object to store all user ID requests: if we have two requests from the same user ID, we execute the first one and block the second with an error message.
If anyone has any ideas, please help.
Blocking on strings is bad. Blocking your webserver is bad.
AsyncLocker is a handy class that I wrote to allow locking on any type that behaves nicely as a key in a dictionary. It also requires asynchronous awaiting before entering the critical section (as opposed to the normal blocking behaviour of locks):
public class AsyncLocker<T>
{
private LazyDictionary<T, SemaphoreSlim> semaphoreDictionary =
new LazyDictionary<T, SemaphoreSlim>();
public async Task<IDisposable> LockAsync(T key)
{
var semaphore = semaphoreDictionary.GetOrAdd(key, () => new SemaphoreSlim(1,1));
await semaphore.WaitAsync();
return new ActionDisposable(() => semaphore.Release());
}
}
It depends on the following two helper classes:
LazyDictionary:
public class LazyDictionary<TKey,TValue>
{
//here we use Lazy<TValue> as the value in the dictionary
//to guard against the fact that the initializer function
//in ConcurrentDictionary.GetOrAdd *can*, under some conditions,
//run more than once per key, with the result of all but one of
//the runs being discarded.
//If this happens, only uninitialized
//Lazy values are discarded. Only the Lazy that actually
//made it into the dictionary is materialized by accessing
//its Value property.
private ConcurrentDictionary<TKey, Lazy<TValue>> dictionary =
new ConcurrentDictionary<TKey, Lazy<TValue>>();
public TValue GetOrAdd(TKey key, Func<TValue> valueGenerator)
{
var lazyValue = dictionary.GetOrAdd(key,
k => new Lazy<TValue>(valueGenerator));
return lazyValue.Value;
}
}
ActionDisposable:
public sealed class ActionDisposable:IDisposable
{
//useful for making arbitrary IDisposable instances
//that perform an Action when Dispose is called
//(after a using block, for instance)
private Action action;
public ActionDisposable(Action action)
{
this.action = action;
}
public void Dispose()
{
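//swap the field to null atomically so the action runs at most once,
//even if Dispose is called twice (requires using System.Threading)
var action = Interlocked.Exchange(ref this.action, null);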
if(action != null)
{
action();
}
}
}
Now, if you keep a static instance of this somewhere:
static AsyncLocker<string> userLock = new AsyncLocker<string>();
you can use it in an async method, leveraging the delights of LockAsync's IDisposable return type to write a using statement that neatly wraps the critical section:
using(await userLock.LockAsync(userId))
{
//user with userId only allowed in this section
//one at a time.
}
If we need to wait before entering, it's done asynchronously, freeing up the thread to service other requests, instead of blocking until the wait is over and potentially messing up your server's performance under load.
Of course, when you need to scale to more than one webserver, this approach will no longer work, and you'll need to synchronize using a different means (probably via the DB).
In my WebApi controller I have the following (pseudo) code that receives update notifications from Instagram's real-time API:
[HttpPost]
public void Post(InstagramUpdate instagramUpdate)
{
var subscriptionId = instagramUpdate.SubscriptionId;
var lastUpdate = GetLastUpdate(subscriptionId);
// To avoid breaking my Instagram request limit, do not fetch new images too often.
if (lastUpdate.AddSeconds(5) < DateTime.UtcNow)
{
// More than 5 seconds ago since last update for this subscription. Get new images
GetNewImagesFromInstagram(subscriptionId);
UpdateLastUpdate(subscriptionId, DateTime.UtcNow);
}
}
This won't work very well if I receive two update notifications for the same subscription almost simultaneously, since lastUpdate won't have been updated until after the first request has been processed.
What would be the best way to tackle this problem? I'm thinking of using some kind of cache, but I'm not sure how. Are there any best practices for this kind of thing? I'm guessing it's a common problem: "receive notification, do something if something hasn't been done recently..."
Thanks to this answer I went with the following approach, using MemoryCache
[HttpPost]
public void Post(IEnumerable<InstagramUpdate> instagramUpdates)
{
foreach (var instagramUpdate in instagramUpdates)
{
if (WaitingToProcessSubscriptionUpdate(instagramUpdate.Subscription_id))
{
// Ongoing request, do nothing
}
else
{
// Process update
}
}
}
private bool WaitingToProcessSubscriptionUpdate(string subscriptionId)
{
// Check in the in memory cache if this subscription is in queue to be processed. Add it otherwise
var queuedRequest = _cache.AddOrGetExisting(subscriptionId, string.Empty, new CacheItemPolicy
{
// Automatically expire this item after 1 minute (if update failed for example)
AbsoluteExpiration = DateTime.Now.AddMinutes(1)
});
return queuedRequest != null;
}
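The "do something only if it hasn't been done recently" check can also be sketched without MemoryCache. The following is a minimal illustration, not the approach used above: SubscriptionThrottle and TryAcquire are hypothetical names, and the timestamp is recorded before the work starts rather than after it finishes.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class SubscriptionThrottle
{
    private readonly ConcurrentDictionary<string, DateTime> lastRun =
        new ConcurrentDictionary<string, DateTime>();
    private readonly TimeSpan minInterval;

    public SubscriptionThrottle(TimeSpan minInterval)
    {
        this.minInterval = minInterval;
    }

    // Returns true for at most one caller per subscription per interval,
    // even when several requests arrive at the same moment.
    public bool TryAcquire(string subscriptionId, DateTime utcNow)
    {
        while (true)
        {
            DateTime previous;
            if (!lastRun.TryGetValue(subscriptionId, out previous))
            {
                // First notification for this subscription: only one TryAdd can win.
                if (lastRun.TryAdd(subscriptionId, utcNow)) return true;
                continue; // another request got in first; re-read its timestamp
            }

            if (utcNow - previous < minInterval) return false; // too soon

            // Compare-and-swap: succeeds only if nobody changed the timestamp meanwhile.
            if (lastRun.TryUpdate(subscriptionId, utcNow, previous)) return true;
        }
    }
}
```

A controller would call TryAcquire(subscriptionId, DateTime.UtcNow) and fetch new images only when it returns true. Like the MemoryCache version, this only works within a single process.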
I am afraid it is an awful idea, but... maybe it is worth adding a lock to this method? Something like
// lock needs a reference type, so keep one lock object per subscription ID
private static readonly ConcurrentDictionary<int, object> subscriptionLocks = new ConcurrentDictionary<int, object>();
and then
int subscriptionId = 1; // add calculation here
lock (subscriptionLocks.GetOrAdd(subscriptionId, id => new object()))
{
//your method code
}
Feel free to criticize this approach )
I am currently using the Change Notifications in Active Directory Domain Services in .NET as described in this blog. This returns all events that happen on a selected object (or in the subtree of that object). I now want to filter the list of events for creation and deletion (and maybe undeletion) events.
I would like to tell the ChangeNotifier class to only observe create, delete, and undelete events. The other solution is to receive all events and filter them on my side. I know that when an object is deleted, the attribute list that is returned will contain the attribute isDeleted with the value True. But is there a way to see if the event represents the creation of an object? In my tests the value for usnchanged is always usncreated+1 for user objects, and both are equal for OUs, but can this be relied on in high-frequency ADs? It is also possible to compare the created and modified timestamps. And how can I tell if an object has been undeleted?
Just for the record, here is the main part of the code from the blog:
public class ChangeNotifier : IDisposable
{
static void Main(string[] args)
{
using (LdapConnection connect = CreateConnection("localhost"))
{
using (ChangeNotifier notifier = new ChangeNotifier(connect))
{
//register some objects for notifications (limit 5)
notifier.Register("dc=dunnry,dc=net", SearchScope.OneLevel);
notifier.Register("cn=testuser1,ou=users,dc=dunnry,dc=net", SearchScope.Base);
notifier.ObjectChanged += new EventHandler<ObjectChangedEventArgs>(notifier_ObjectChanged);
Console.WriteLine("Waiting for changes...");
Console.WriteLine();
Console.ReadLine();
}
}
}
static void notifier_ObjectChanged(object sender, ObjectChangedEventArgs e)
{
Console.WriteLine(e.Result.DistinguishedName);
foreach (string attrib in e.Result.Attributes.AttributeNames)
{
foreach (var item in e.Result.Attributes[attrib].GetValues(typeof(string)))
{
Console.WriteLine("\t{0}: {1}", attrib, item);
}
}
Console.WriteLine();
Console.WriteLine("====================");
Console.WriteLine();
}
LdapConnection _connection;
HashSet<IAsyncResult> _results = new HashSet<IAsyncResult>();
public ChangeNotifier(LdapConnection connection)
{
_connection = connection;
_connection.AutoBind = true;
}
public void Register(string dn, SearchScope scope)
{
SearchRequest request = new SearchRequest(
dn, //root the search here
"(objectClass=*)", //very inclusive
scope, //any scope works
null //we are interested in all attributes
);
//register our search
request.Controls.Add(new DirectoryNotificationControl());
//we will send this async and register our callback
//note how we would like to have partial results
IAsyncResult result = _connection.BeginSendRequest(
request,
TimeSpan.FromDays(1), //set timeout to a day...
PartialResultProcessing.ReturnPartialResultsAndNotifyCallback,
Notify,
request
);
//store the hash for disposal later
_results.Add(result);
}
private void Notify(IAsyncResult result)
{
//since our search is long running, we don't want to use EndSendRequest
PartialResultsCollection prc = _connection.GetPartialResults(result);
foreach (SearchResultEntry entry in prc)
{
OnObjectChanged(new ObjectChangedEventArgs(entry));
}
}
private void OnObjectChanged(ObjectChangedEventArgs args)
{
if (ObjectChanged != null)
{
ObjectChanged(this, args);
}
}
public event EventHandler<ObjectChangedEventArgs> ObjectChanged;
#region IDisposable Members
public void Dispose()
{
foreach (var result in _results)
{
//end each async search
_connection.Abort(result);
}
}
#endregion
}
public class ObjectChangedEventArgs : EventArgs
{
public ObjectChangedEventArgs(SearchResultEntry entry)
{
Result = entry;
}
public SearchResultEntry Result { get; set; }
}
I participated in a design review about five years back on a project that started out using AD change notification. Very similar questions to yours were asked. I can share what I remember, and I don't think things have changed much since then. We ended up switching to DirSync.
It didn't seem possible to get just creates and deletes from AD change notifications. We found that change notification produced enough events when monitoring a large directory that notification processing could bottleneck and fall behind. This API is not designed for scale, but as I recall performance and latency were not the primary reasons we switched.
Yes, the usn relationship for new objects generally holds, although I think there are multi-dc scenarios where you can get usncreated == usnchanged for a new user, but we didn't test that extensively, because...
The important thing for us was that change notification only gives you reliable object creation detection under the unrealistic assumption that your machine is up 100% of the time! In production systems there are always some cases where you need to reboot and catch up or re-synchronize, and we switched to DirSync because it has a robust way to handle those scenarios.
In our case it could block email to a new user for an indeterminate time if an object create were missed. That obviously wouldn't be good; we needed to be sure. For AD change notifications, getting that resync right would have taken more work and been hard to test. For DirSync it's more natural, and there's a fast-path resume mechanism that usually avoids a resync. For safety I think we triggered a full re-synchronize every day.
DirSync is not as real-time as change notification, but it's possible to get ~30-second average latency by issuing the DirSync query once a minute.
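The ~30-second figure follows from a simple averaging argument. Assume a change is equally likely to happen at any moment $t$ within a polling interval of length $T$, and ignore the time the query itself takes. The change then waits $T - t$ for the next poll:

```latex
\mathbb{E}[\text{latency}] = \frac{1}{T}\int_0^T (T - t)\,dt = \frac{T}{2},
\qquad T = 60\ \text{s} \;\Rightarrow\; \mathbb{E}[\text{latency}] = 30\ \text{s}
```

The worst case under the same assumptions is a full interval, that is, 60 seconds.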
What is the most efficient way to sync documents into RavenDB?
From an external source I get an IEnumerable of BlogPosts that I want to do the following with:
Add new objects that are new to RavenDB
Update existing objects
Remove objects that were removed in the external source
The code that needs implementation:
public void SyncIntoRaven(IEnumerable<BlogPost> postsToSync, IDocumentStore store) {
// TODO: Implement
// AddNewItems(postsToSync);
// TODO: Implement
// RemoveDeletedItems(postsToSync);
// TODO: Implement
// UpdateExistingItems(postsToSync);
}
One could just pull out all BlogPosts from RavenDB and sync locally to then push all the changes back, but I want to minimize traffic to RavenDB. But maybe that's not the right approach either?
If you are sharing the same ID between your external source and RavenDB, you can do this quite easily, in an ACID fashion, and within one transaction.
Keep track of the IDs that changed between sync operations. Once you have that list of IDs, you can easily do this:
Open a session, add the new documents using session.Store(), load all the documents that need updating or deleting using session.Load(string[]) (or lazily, via session.Advanced.Lazily), make the updates (and the deletions, using session.Advanced.Defer()), and once you are done call session.SaveChanges().
That should get you covered, and happen in only one roundtrip to the server.
Either way, you never want to do complete sync every time. You always want to use deltas.
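If the external source cannot report its own changes, one way to derive a delta is to compare ID sets. This is a rough sketch under that assumption, using only LINQ and no RavenDB calls; SyncPlan and SyncPlanner are hypothetical names, and it still requires the stored IDs to be known or tracked locally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class SyncPlan
{
    public List<string> ToAdd { get; set; }     // in the external source only
    public List<string> ToUpdate { get; set; }  // in both: candidates for an update
    public List<string> ToDelete { get; set; }  // in the store only
}

public static class SyncPlanner
{
    public static SyncPlan Plan(IEnumerable<string> externalIds, IEnumerable<string> storedIds)
    {
        // Hash sets make each membership test O(1).
        var external = new HashSet<string>(externalIds);
        var stored = new HashSet<string>(storedIds);

        return new SyncPlan
        {
            ToAdd = external.Where(id => !stored.Contains(id)).ToList(),
            ToUpdate = external.Where(id => stored.Contains(id)).ToList(),
            ToDelete = stored.Where(id => !external.Contains(id)).ToList()
        };
    }
}
```

Only the documents listed in ToUpdate need to be loaded from the server. The ToAdd and ToDelete lists can be turned directly into Store and deferred delete calls within the same session.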
With the help in description-form from synhershko I figured it out and wanted to share the code, simplified to show the concepts.
private void RefreshBlogPosts(IDocumentSession session, IList<BlogPost> parsedPosts) {
var parsedPostsIds = parsedPosts.Select(x => x.Id);
var storePosts = session.Load<BlogPost>(parsedPostsIds);
// Update existing or create new posts
for(int i = 0; i < storePosts.Count(); i++) {
var parsedPost = parsedPosts[i];
var storePost = storePosts[i];
if(storePost == null) {
storePost = parsedPost;
session.Store(storePost);
} else {
// Update post's properties
}
}
// Find posts IDs no longer in database
var removedPostIds = session.Query<BlogPost>().Select(x => x.Id)
.Where(postId => !parsedPostsIds.Contains(postId));
foreach(var removedPostId in removedPostIds) {
session.Advanced.Defer(new DeleteCommandData() { Key = removedPostId });
}
session.SaveChanges();
}