Recursive Async HttpWebRequests - C#

Suppose I have the following class:
public class FooBar
{
    List<Items> _items = new List<Items>();

    public List<Items> FetchItems(int parentItemId)
    {
        FetchSingleItem(parentItemId);
        return _items;
    }

    private void FetchSingleItem(int itemId)
    {
        Uri url = new Uri(String.Format("http://SomeURL/{0}.xml", itemId));
        HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
        webRequest.BeginGetResponse(ReceiveResponseCallback, webRequest);
    }

    void ReceiveResponseCallback(IAsyncResult result)
    {
        // End the call, extract the XML from the response, and add the item to the list
        _items.Add(itemFromXMLResponse);

        // If this item is linked to another item then fetch that item
        if (anotherItemIdExists)
        {
            FetchSingleItem(anotherItemId);
        }
    }
}
There could be any number of linked items that I will only know about at runtime.
What I want to do is make the initial call to FetchSingleItem, wait until all the calls have completed, and then return the List<Items> to the calling code.
Could someone point me in the right direction? I'm more than happy to refactor the whole thing if need be (which I suspect will be the case!)

Getting the hang of asynchronous coding is not easy, especially when there is some sequential dependency between one operation and the next. This is the exact sort of problem I wrote the AsyncOperationService to handle; it's a cunningly short bit of code.
First, a little light reading for you: Simple Asynchronous Operation Runner – Part 2. By all means read part 1, but it's a bit heavier than I had intended. All you really need is the AsyncOperationService code from it.
Now in your case you would convert your fetch code to something like the following.
private IEnumerable<AsyncOperation> FetchItems(int startId)
{
    XDocument itemDoc = null;
    int currentId = startId;
    while (currentId != 0)
    {
        yield return DownloadString(
            new Uri(String.Format("http://SomeURL/{0}.xml", currentId), UriKind.Absolute),
            itemXml => itemDoc = XDocument.Parse(itemXml));
        // Do stuff with itemDoc like creating your item and placing it in the list.
        // Assign the next linked ID to currentId, or 0 if there are no more items.
    }
}
Note the blog also has an implementation of DownloadString, which in turn uses WebClient, which simplifies things. However, the principles still apply if for some reason you must stick with HttpWebRequest. (Let me know if you are having trouble creating an AsyncOperation for this.)
You would then use this code like so:
int startId = GetSomeIDToStartWith();
Foo myFoo = new Foo();
myFoo.FetchItems(startId).Run((err) =>
{
    // Clear IsBusy
    if (err == null)
    {
        // All items are now fetched; continue doing stuff here.
    }
    else
    {
        // "Oops something bad happened" code here
    }
});
// Set IsBusy
Note that the call to Run is asynchronous: code execution will appear to jump past it before all the items are fetched. If the UI is useless to the user, or even dangerous, while this is happening, then you need to block it in a friendly way. The best way (IMO) to do this is with the BusyIndicator control from the toolkit: set its IsBusy property after the call to Run and clear it in the Run callback.
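As an aside, if you are on a framework where async/await and HttpClient are available, the same sequential-dependency problem collapses into a plain loop. A rough sketch, not the AsyncOperationService approach, assuming each XML document exposes the next linked id in an element (the "linkedItemId" element name below is made up; requires System.Net.Http and System.Xml.Linq):
public async Task<List<XDocument>> FetchItemsAsync(int startId)
{
    using (var client = new HttpClient())
    {
        var items = new List<XDocument>();
        int currentId = startId;
        while (currentId != 0)
        {
            string xml = await client.GetStringAsync(
                String.Format("http://SomeURL/{0}.xml", currentId));
            XDocument itemDoc = XDocument.Parse(xml);
            items.Add(itemDoc);
            // Hypothetical: read the linked id from the document; 0 means no more items.
            currentId = (int?)itemDoc.Root.Element("linkedItemId") ?? 0;
        }
        return items;
    }
}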

All you need is a thread sync thingy. I chose ManualResetEvent.
However, I don't see the point of using asynchronous IO since you always wait for the request to finish before starting a new one. But the example might not show the whole story?
public class FooBar
{
    private ManualResetEvent _completedEvent = new ManualResetEvent(false);
    List<Items> _items = new List<Items>();

    public List<Items> FetchItems(int parentItemId)
    {
        FetchSingleItem(parentItemId);
        _completedEvent.WaitOne();
        return _items;
    }

    private void FetchSingleItem(int itemId)
    {
        Uri url = new Uri(String.Format("http://SomeURL/{0}.xml", itemId));
        HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
        webRequest.BeginGetResponse(ReceiveResponseCallback, webRequest);
    }

    void ReceiveResponseCallback(IAsyncResult result)
    {
        // End the call, extract the XML from the response, and add the item to the list
        _items.Add(itemFromXMLResponse);

        // If this item is linked to another item then fetch that item
        if (anotherItemIdExists)
        {
            FetchSingleItem(anotherItemId);
        }
        else
        {
            _completedEvent.Set();
        }
    }
}
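If the requests really are strictly sequential like this, plain synchronous IO is simpler still. A rough sketch under that assumption (XML parsing and link extraction are left as a comment, as in the original; requires System.IO and System.Net):
private void FetchItemsSynchronously(int itemId)
{
    while (itemId != 0)
    {
        var url = new Uri(String.Format("http://SomeURL/{0}.xml", itemId));
        var webRequest = (HttpWebRequest)WebRequest.Create(url);
        using (var response = webRequest.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            string xml = reader.ReadToEnd();
            // Parse xml, add the item to _items, and set itemId to the linked
            // id, or 0 if there is none (same logic as the callback version).
            itemId = 0; // placeholder: replace with the extracted linked id
        }
    }
}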

Related

How can you wait on AppDomain to process async callback in C# and then return the results?

I have some code that loads up an AppDomain (call it domain), calling an object function within the domain. The purpose is to get a list of items from a USB device, using the device API to retrieve the information. The API requires a callback to return the information.
var domain = AppDomain.CreateDomain(
    $"BiometricsDomain{System.IO.Path.GetRandomFileName()}");
var proxy = domain.CreateInstanceAndUnwrap(typeof(Proxy).Assembly.FullName, typeof(Proxy).FullName
    ?? throw new InvalidOperationException()) as Proxy;
var ids = proxy.GetIdentifications();
The proxy code loaded into the domain is as follows
public class Proxy : MarshalByRefObject
{
    public List<String> GetIdentifications()
    {
        var control = new R100DeviceControl();
        control.OnUserDB += Control_OnUserDB;
        control.Open();
        int nResult = control.DownloadUserDB(out int count);
        // need to be able to return the list here but obviously that is not
        // going to work.
    }

    private void Control_OnUserDB(List<String> result)
    {
        // Get the list of string from here
    }
}
Is there a way to be able to wait on the device and return the information as needed when the callback is called? Since GetIdentifications() has already returned, I don't know how to get the list back to the caller.
You can consider wrapping the Event-Based Asynchronous Pattern (EAP) operation as a task by using a TaskCompletionSource<TResult> so that the event can be awaited.
public class Proxy : MarshalByRefObject
{
    public List<String> GetIdentifications()
    {
        var task = GetIdentificationsAsync();
        return task.Result;
    }

    private Task<List<String>> GetIdentificationsAsync()
    {
        var tcs = new TaskCompletionSource<List<string>>();
        try
        {
            var control = new R100DeviceControl();
            Action<List<string>> handler = null;
            handler = result =>
            {
                // Once the event is raised, set the Result property
                // on the underlying Task.
                control.OnUserDB -= handler; // optional: unsubscribe from the event
                tcs.TrySetResult(result);
            };
            control.OnUserDB += handler;
            control.Open();
            int count = 0;
            // call the async operation
            int nResult = control.DownloadUserDB(out count);
        }
        catch (Exception ex)
        {
            // Bubble the error up to be handled by the calling client
            tcs.TrySetException(ex);
        }
        // Return the underlying Task. The client code
        // waits on the Result property, and handles exceptions
        // in the try-catch block there.
        return tcs.Task;
    }
}
You can also improve on it by adding the ability to cancel, using a CancellationToken, for longer-than-expected callbacks (a sketch of this follows below).
With that in place, the proxy can then be called, with the event awaited internally:
List<string> ids = proxy.GetIdentifications();
Reference How to: Wrap EAP Patterns in a Task
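A rough sketch of that cancellation idea, not a definitive implementation (the token simply cancels the task if the device never raises OnUserDB; the 30-second timeout in the usage comment is arbitrary):
private Task<List<string>> GetIdentificationsAsync(CancellationToken token)
{
    var tcs = new TaskCompletionSource<List<string>>();
    // Cancel the task if the caller's token fires before the device calls back.
    token.Register(() => tcs.TrySetCanceled());
    var control = new R100DeviceControl();
    Action<List<string>> handler = null;
    handler = result =>
    {
        control.OnUserDB -= handler;
        tcs.TrySetResult(result);
    };
    control.OnUserDB += handler;
    control.Open();
    control.DownloadUserDB(out int count);
    return tcs.Task;
}

// Usage: give up if nothing arrives within 30 seconds.
// var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
// List<string> ids = GetIdentificationsAsync(cts.Token).Result;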
NOTE: Though there may be more elegant solutions to the problem of asynchronous processing, the fact that this occurs in a child AppDomain warrants child AppDomain best practices. (see links below)
i.e.
do not allow code meant for a child AppDomain to be executed in the parent domain
do not allow complex types to bubble to the parent AppDomain
do not allow exceptions to cross AppDomain boundaries in the form of custom exception types
OP:
I am using it for fault tolerance
First I would probably add an Open or similar method to give time for the data to materialise.
var proxy = domain.CreateInstanceAndUnwrap(typeof(Proxy).Assembly.FullName, typeof(Proxy).FullName
    ?? throw new InvalidOperationException()) as Proxy;
proxy.Open(); // <------ new method here
//
// ... some time later ...
//
var ids = proxy.GetIdentifications();
Then in your proxy, make these changes to allow data processing to occur in the background, so that by the time you call GetIdentifications the data may be ready.
public class Proxy : MarshalByRefObject
{
    ConcurrentBag<string> _results = new ConcurrentBag<string>();

    public void Open()
    {
        var control = new R100DeviceControl();
        control.OnUserDB += Control_OnUserDB;
        control.Open();
        // you may need to store nResult and count in a field?
        var nResult = control.DownloadUserDB(out int count);
    }

    public List<String> GetIdentifications()
    {
        var copy = new List<string>();
        while (_results.TryTake(out var x))
        {
            copy.Add(x);
        }
        return copy;
    }

    private void Control_OnUserDB(List<String> result)
    {
        // Get the list of strings from here
        foreach (var s in result)
        {
            _results.Add(s);
        }
    }
}
Now you could probably improve upon GetIdentifications to accept a timeout, in the event that it is called before data is ready, or called again before subsequent data arrives; a rough sketch follows.
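A hypothetical version of that timeout idea, polling the bag until something arrives or the deadline passes (a ManualResetEventSlim set from Control_OnUserDB would be cleaner, but this keeps the sketch short):
public List<string> GetIdentifications(TimeSpan timeout)
{
    var deadline = DateTime.UtcNow + timeout;
    var copy = new List<string>();
    do
    {
        while (_results.TryTake(out var x))
        {
            copy.Add(x);
        }
        if (copy.Count > 0) break;
        Thread.Sleep(50); // back off briefly before re-checking the bag
    } while (DateTime.UtcNow < deadline);
    return copy;
}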
More
How to: Run Partially Trusted Code in a Sandbox
Not sure why you just don't maintain a little state and then wait for the results in the call:
public class Proxy : MarshalByRefObject
{
    bool runningCommand;
    int lastResult;
    List<String> lastUserDB;
    R100DeviceControl deviceControl;

    R100DeviceControl DeviceControl
    {
        get
        {
            if (deviceControl == null)
            {
                deviceControl = new R100DeviceControl();
                deviceControl.OnUserDB += Control_OnUserDB;
            }
            return deviceControl;
        }
    }

    public List<String> GetIdentifications()
    {
        if (runningCommand) return null;
        DeviceControl.Open();
        runningCommand = true;
        lastResult = DeviceControl.DownloadUserDB(out int count);
        return lastUserDB;
    }

    private void Control_OnUserDB(List<String> result)
    {
        // Get the list of strings from here
        lastUserDB = result;
        runningCommand = false;
    }
}
Once you have a pattern like this you can easily switch between sync and async. Before, it was a little harder to follow because the async logic was integrated; this way you can implement the sync method and then make an async wrapper if you desire (see the sketch below).
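A minimal sketch of such a wrapper, simply pushing the synchronous method onto the thread pool:
public Task<List<String>> GetIdentificationsAsync()
{
    return Task.Run(() => GetIdentifications());
}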

Take all items from ConcurrentBag using a swap

I'm trying to take all items in one fell swoop from a ConcurrentBag. Since there's nothing like TryEmpty on the collection, I've resorted to using Interlocked.Exchange in the same fashion as described here: How to remove all Items from ConcurrentBag?
My code looks like this:
private ConcurrentBag<Foo> _allFoos; // Initialized in constructor.

public bool LotsOfThreadsAccessingThisMethod(Foo toInsert)
{
    this._allFoos.Add(toInsert);
    return true;
}

public void SingleThreadProcessingLoopAsALongRunningTask(object state)
{
    var token = (CancellationToken)state;
    var workingSet = new List<Foo>();
    while (!token.IsCancellationRequested)
    {
        if (!workingSet.Any())
        {
            workingSet = Interlocked.Exchange(ref this._allFoos, new ConcurrentBag<Foo>()).ToList();
        }
        var processingCount = (int)Math.Min(workingSet.Count, TRANSACTION_LIMIT);
        if (processingCount > 0)
        {
            using (var ctx = new MyEntityFrameworkContext())
            {
                ctx.BulkInsert(workingSet.Take(processingCount));
            }
            workingSet.RemoveRange(0, processingCount);
        }
    }
}
The problem is that this sometimes misses items that are added to the list. I've written a test application that feeds data to my ConcurrentBag.Add method and verified that it is sending all of the data. When I set a breakpoint on the Add call and check the count of the ConcurrentBag after, it's zero. The item just isn't being added.
I'm fairly positive that it's because the Interlocked.Exchange call doesn't use the internal locking mechanism of the ConcurrentBag so it's losing data somewhere in the swap, but I have no knowledge of what's actually happening.
How can I just grab all the items out of the ConcurrentBag at one time without resorting to my own locking mechanism? And why does Add ignore the item?
I think taking all the items from the ConcurrentBag is not needed. You can achieve exactly the same behavior you are trying to implement simply by changing the processing logic as follows (no need for your own synchronization or interlocked swaps):
public void SingleThreadProcessingLoopAsALongRunningTask(object state)
{
    var token = (CancellationToken)state;
    var buffer = new List<Foo>(TRANSACTION_LIMIT);
    while (!token.IsCancellationRequested)
    {
        Foo item;
        if (!this._allFoos.TryTake(out item))
        {
            // Bag is empty: nothing to flush unless we have buffered items.
            if (buffer.Count == 0) continue;
        }
        else
        {
            buffer.Add(item);
            // Keep filling the buffer until we hit the transaction limit.
            if (buffer.Count < TRANSACTION_LIMIT) continue;
        }
        using (var ctx = new MyEntityFrameworkContext())
        {
            ctx.BulkInsert(buffer);
        }
        buffer.Clear();
    }
}

Asp.Net caching pattern

There are a great number of articles available regarding thread safe caching, here's an example:
private static object _lock = new object();

public void CacheData()
{
    SPListItemCollection oListItems;
    oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
    if (oListItems == null)
    {
        lock (_lock)
        {
            // Ensure that the data was not loaded by a concurrent thread
            // while waiting for lock.
            oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
            if (oListItems == null)
            {
                oListItems = DoQueryToReturnItems();
                Cache.Add("ListItemCacheName", oListItems, ..);
            }
        }
    }
}
However, this example depends on the request for the cache also rebuilding the cache.
I'm looking for a solution where the request and rebuild are separate. Here's the scenario.
I have a web service that I want to monitor for certain types of error. If an error occurs, I create a monitor object and cache it - it is updatable and is locked accordingly during update. All's well so far.
Elsewhere, I check for the existence of the cached object, and the data it contains. This would work straight out of the box except for one particular scenario.
If the cache object is being updated - say a status change - I would like to wait and get the latest info rather than the current info, which, if returned, would be out of date. So for my fetch code, I need to check whether the object is currently being created/updated, and if so wait, then retry.
As I pointed out, there are many examples of cache locking patterns, but I can't seem to find one that fits this scenario. Any ideas as to how to go about this would be appreciated?
You can try the following code using two locks. The write lock in the setter is quite simple and protects the cache from being written by more than one thread. The getter uses a simple double-checked lock.
Now, the trick is in the Refresh() method, which uses the same lock as the getter. In its first step, the method takes the lock and removes the list from the cache. This will cause any getter to fail its first null check and wait for the lock. In the meantime, the method gets the items, sets the cache again, and releases the lock.
When control comes back to a getter, it reads the cache again, and now it contains the list.
public class CacheData
{
    private static object _readLock = new object();
    private static object _writeLock = new object();

    public SPListItemCollection ListItem
    {
        get
        {
            var oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
            if (oListItems == null)
            {
                lock (_readLock)
                {
                    oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
                    if (oListItems == null)
                    {
                        oListItems = DoQueryToReturnItems();
                        Cache.Add("ListItemCacheName", oListItems, ..);
                    }
                }
            }
            return oListItems;
        }
        set
        {
            lock (_writeLock)
            {
                Cache.Add("ListItemCacheName", value, ..);
            }
        }
    }

    public void Refresh()
    {
        lock (_readLock)
        {
            Cache.Remove("ListItemCacheName");
            var oListItems = DoQueryToReturnItems();
            ListItem = oListItems;
        }
    }
}
You can make the method and property static, if you do not need CacheData instance.

Is this thread safe? Breakpoints get hit multiple times

I have the following code:
public class EmailJobQueue
{
    private EmailJobQueue()
    {
    }

    private static readonly object JobsLocker = new object();
    private static readonly Queue<EmailJob> Jobs = new Queue<EmailJob>();
    private static readonly object ErroredIdsLocker = new object();
    private static readonly List<long> ErroredIds = new List<long>();

    public static EmailJob GetNextJob()
    {
        lock (JobsLocker)
        {
            lock (ErroredIdsLocker)
            {
                // If there are no jobs or they have all errored then get some new ones -
                // if jobs have previously been skipped then this will re-get them
                if (!Jobs.Any() || Jobs.All(j => ErroredIds.Contains(j.Id)))
                {
                    var db = new DBDataContext();
                    foreach (var emailJob in db.Emailing_SelectSend(1))
                    {
                        // Don't re-add jobs that already exist
                        if (Jobs.All(j => j.Id != emailJob.Id) && !ErroredIds.Contains(emailJob.Id))
                        {
                            Jobs.Enqueue(new EmailJob(emailJob));
                        }
                    }
                }
                while (Jobs.Any())
                {
                    var curJob = Jobs.Dequeue();
                    // Check the job has not previously errored - if they all have then
                    // eventually we will exit the loop
                    if (!ErroredIds.Contains(curJob.Id))
                        return curJob;
                }
                return null;
            }
        }
    }

    public static void ReInsertErrored(long id)
    {
        lock (ErroredIdsLocker)
        {
            ErroredIds.Add(id);
        }
    }
}
I then start 10 threads which do this:
var email = EmailJobQueue.GetNextJob();
if (email != null)
{
    // Breakpoint here
}
The thing is that if I put a breakpoint where the comment is and add one item to the queue, the breakpoint gets hit multiple times. Is this an issue with my code or a peculiarity of the VS debugger?
Thanks,
Joe
It appears as if you are getting your jobs from the database:
foreach (var emailJob in db.Emailing_SelectSend(1))
Is that database call marking the records as unavailable for selection in future queries? If not, I believe that's why you're hitting the breakpoint multiple times.
For example, if I replace that call to the database with the following, I see your behavior.
// MockDB is a static configured as `MockDB.Enqueue(new EmailJob{Id = 1})`
private static IEnumerable<EmailJob> GetJobFromDB()
{
    return new List<EmailJob> { MockDB.Peek() };
}
However, if I actually Dequeue from the mock db, it only hits the breakpoint once.
private static IEnumerable<EmailJob> GetJobFromDB()
{
    var list = new List<EmailJob>();
    if (MockDB.Any())
        list.Add(MockDB.Dequeue());
    return list;
}
This is a side effect of debugging a multi-threaded piece of your application.
You are seeing the breakpoint being hit on each thread. Debugging a multi-threaded piece of the application is tricky because you're actually debugging all threads at the same time. In fact, at times, it will jump between classes while you're stepping through because it's doing different things on all of those threads, depending on your application.
Now, to address whether or not it's thread-safe. That really depends on how you're using the resources on those threads. If you're just reading, it's likely that it's thread-safe. But if you're writing, you'll need to leverage at least the lock operation on shared objects:
lock (someLockObject)
{
    // perform the write operation
}

Pattern for concurrent cache sharing

Ok, I was a little unsure how best to name this problem :) But assume this scenario: you're going out and fetching some webpage (with various urls) and caching it locally. The cache part is pretty easy to solve, even with multiple threads.
However, imagine that one thread starts fetching a url, and a couple of milliseconds later another wants the same url. Is there any good pattern for making the second thread's method wait on the first one to fetch the page, insert it into the cache, and return it, so you don't have to do multiple requests? With little enough overhead that it's worth doing even for requests that take about 300-700 ms? And without locking requests for other urls.
Basically, when requests for identical urls come in tightly after each other, I want the second request to "piggyback" on the first request.
I had some loose idea of having a dictionary where you insert an object with the url as key when you start fetching a page, and lock on it. If there's already an object matching the key, the second thread gets that object, locks on it, and then tries the actual cache for the url.
I'm a little unsure of the particulars, however, to make it really thread-safe; using ConcurrentDictionary might be one part of it...
Are there any common patterns and solutions for scenarios like this?
Breakdown of the wrong behavior:
Thread 1: Checks the cache, it doesn't exist, so starts fetching the url
Thread 2: Starts fetching the same url since it still doesn't exist in the cache
Thread 1: Finishes and inserts into the cache, returns the page
Thread 2: Finishes and also inserts into the cache (or discards it), returns the page
Breakdown of the correct behavior:
Thread 1: Checks the cache, it doesn't exist, so starts fetching the url
Thread 2: Wants the same url, but sees it's currently being fetched, so waits on thread 1
Thread 1: Finishes and inserts into the cache, returns the page
Thread 2: Notices that thread 1 is finished and returns the page that thread 1 fetched
EDIT
Most solutions so far seem to misunderstand the problem and address only the caching. As I said, that isn't the problem; the problem is that when an external web fetch is in progress, a second fetch for the same url started before the first one has cached its result should use the result from the first fetch rather than doing a second one.
You could use a ConcurrentDictionary<K,V> and a variant of double-checked locking:
public static string GetUrlContent(string url)
{
    object value1 = _cache.GetOrAdd(url, new object());
    if (value1 == null)    // null check only required if content
        return null;       // could legitimately be a null string
    var urlContent = value1 as string;
    if (urlContent != null)
        return urlContent; // got the content
    // value1 isn't a string which means that it's an object to lock against
    lock (value1)
    {
        object value2 = _cache[url];
        // at this point value2 will *either* be the url content
        // *or* the object that we already hold a lock against
        if (value2 != value1)
            return (string)value2; // got the content
        urlContent = FetchContentFromTheWeb(url); // todo
        _cache[url] = urlContent;
        return urlContent;
    }
}

private static readonly ConcurrentDictionary<string, object> _cache =
    new ConcurrentDictionary<string, object>();
EDIT: My code is quite a bit uglier now, but uses a separate lock per URL. This allows different URLs to be fetched concurrently; however, each URL will only be fetched once.
public class UrlFetcher
{
    static Hashtable cache = Hashtable.Synchronized(new Hashtable());

    public static String GetCachedUrl(String url)
    {
        // exactly 1 fetcher is created per URL
        InternalFetcher fetcher = (InternalFetcher)cache[url];
        if (fetcher == null)
        {
            lock (cache.SyncRoot)
            {
                fetcher = (InternalFetcher)cache[url];
                if (fetcher == null)
                {
                    fetcher = new InternalFetcher(url);
                    cache[url] = fetcher;
                }
            }
        }
        // blocks all threads requesting the same URL
        return fetcher.Contents;
    }

    /// <summary>Each fetcher locks on itself and is initialized with null contents.
    /// The first thread to call fetcher.Contents will cause the fetch to occur, and
    /// block until completion.</summary>
    private class InternalFetcher
    {
        private String url;
        private String contents;

        public InternalFetcher(String url)
        {
            this.url = url;
            this.contents = null;
        }

        public String Contents
        {
            get
            {
                if (contents == null)
                {
                    lock (this) // "this" is an instance of InternalFetcher...
                    {
                        if (contents == null)
                        {
                            contents = FetchFromWeb(url);
                        }
                    }
                }
                return contents;
            }
        }
    }
}
Will the Semaphore please stand up! stand up! stand up!
Use a Semaphore; you can easily synchronize your threads with it.
It applies in both cases:
you are trying to load a page that is currently being cached
you are saving the cache to a file while a page is being loaded from it.
In both scenarios you will face trouble.
It is just like the readers-writers problem, a common racing issue in operating systems. When a thread wants to rebuild a cache or start caching a page, no other thread should read from it; if a thread is reading it, the writer should wait until the reading is finished before replacing the cache; and no two threads should cache the same page into the same file. Meanwhile, all readers can read from the cache at any time, as long as no writer is writing to it.
You should read some semaphore usage samples on MSDN; it is very easy to use. A thread that wants to do something simply acquires the semaphore: if the resource can be granted, it does the work; otherwise it sleeps and waits to be woken up when the resource is ready. (A rough C# sketch follows below.)
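As that sketch (one SemaphoreSlim per URL, so only the first requester fetches while later requesters for the same URL wait and then read the finished entry; the names and the FetchFromWeb helper are illustrative, not a definitive implementation):
static readonly ConcurrentDictionary<string, SemaphoreSlim> _gates =
    new ConcurrentDictionary<string, SemaphoreSlim>();
static readonly ConcurrentDictionary<string, string> _pages =
    new ConcurrentDictionary<string, string>();

public static string GetPage(string url)
{
    if (_pages.TryGetValue(url, out var cached))
        return cached; // readers can always read a finished entry
    var gate = _gates.GetOrAdd(url, _ => new SemaphoreSlim(1, 1));
    gate.Wait(); // only one thread per URL proceeds; the rest queue here
    try
    {
        // Re-check: a queued thread wakes up after the fetcher has finished.
        if (_pages.TryGetValue(url, out cached))
            return cached;
        string content = FetchFromWeb(url); // assumed to exist elsewhere
        _pages[url] = content;
        return content;
    }
    finally
    {
        gate.Release();
    }
}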
Disclaimer: This might be a n00bish answer. Please pardon me if it is.
I'd recommend using some shared dictionary object with locks to keep track of the urls currently being fetched or already fetched.
At every request, check the url against this object.
If an entry for the url is present, check the cache. (This means one of the threads has either fetched it or is currently fetching it.)
If it's available in the cache, use it; else, put the current thread to sleep for a while and check back again. (If it's not in the cache, some thread is still fetching it, so wait while it finishes.)
If the entry is not found in the dictionary object, add the url to it and send the request. Once it obtains a response, add it to the cache.
This logic should work; however, you would need to take care of cache expiration and removal of the entry from the dictionary object. (A rough sketch of this sleep-and-recheck loop follows below.)
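A rough sketch of that sleep-and-recheck loop (the names and the FetchFromWeb helper are illustrative; expiration handling is omitted, as noted above):
static readonly ConcurrentDictionary<string, bool> _inFlight =
    new ConcurrentDictionary<string, bool>();
static readonly ConcurrentDictionary<string, string> _fetched =
    new ConcurrentDictionary<string, string>();

public static string GetPagePolling(string url)
{
    while (true)
    {
        if (_fetched.TryGetValue(url, out var page))
            return page; // already fetched by some thread
        if (_inFlight.TryAdd(url, true)) // we won the race: fetch it ourselves
        {
            try
            {
                page = FetchFromWeb(url); // assumed to exist elsewhere
                _fetched[url] = page;
                return page;
            }
            finally
            {
                _inFlight.TryRemove(url, out _);
            }
        }
        Thread.Sleep(50); // another thread is fetching; sleep, then re-check
    }
}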
My solution is to use an AtomicBoolean to control access to the database when the cache has timed out or doesn't exist.
At any given moment, only one thread (call it the read thread) can access the database; the other threads spin until the read thread returns the data and writes it into the cache.
Here is the code, implemented in Java:
public class CacheBreakDownDefender<K, R> {

    /**
     * false = do not write null to the cache when a null value is read from the database
     */
    private final boolean writeNullToCache;

    /**
     * one flag per distinct query key
     */
    private final ConcurrentHashMap<K, AtomicBoolean> selectingDBTagMap = new ConcurrentHashMap<>();

    public static <K, R> CacheBreakDownDefender<K, R> getInstance(Class<K> keyType, Class<R> resultType) {
        return Singleton.get(keyType.getName() + resultType.getName(), () -> new CacheBreakDownDefender<>(false));
    }

    public static <K, R> CacheBreakDownDefender<K, R> getInstance(Class<K> keyType, Class<R> resultType, boolean writeNullToCache) {
        return Singleton.get(keyType.getName() + resultType.getName(), () -> new CacheBreakDownDefender<>(writeNullToCache));
    }

    private CacheBreakDownDefender(boolean writeNullToCache) {
        this.writeNullToCache = writeNullToCache;
    }

    public R readFromCache(K key, Function<K, ? extends R> getFromCache, Function<K, ? extends R> getFromDB, BiConsumer<K, R> writeCache) throws InterruptedException {
        R result = getFromCache.apply(key);
        if (result == null) {
            final AtomicBoolean selectingDB = selectingDBTagMap.computeIfAbsent(key, x -> new AtomicBoolean(false));
            if (selectingDB.compareAndSet(false, true)) {
                try {
                    result = getFromDB.apply(key);
                    if (result != null || writeNullToCache) {
                        writeCache.accept(key, result);
                    }
                } finally {
                    selectingDB.getAndSet(false);
                    selectingDBTagMap.remove(key);
                }
            } else {
                while (selectingDB.get()) {
                    TimeUnit.MILLISECONDS.sleep(0L);
                    // spin until the read thread finishes...
                }
                return getFromCache.apply(key);
            }
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> map = new ConcurrentHashMap<>();
        CacheBreakDownDefender<String, String> instance = CacheBreakDownDefender.getInstance(String.class, String.class, true);
        for (int i = 0; i < 9; i++) {
            int finalI = i;
            new Thread(() -> {
                String kele = null;
                try {
                    if (finalI == 6) {
                        kele = instance.readFromCache("kele2", map::get, key -> "helloword2", map::put);
                    } else
                        kele = instance.readFromCache("kele", map::get, key -> "helloword", map::put);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                log.info("result = {}", kele);
            }).start();
        }
        TimeUnit.SECONDS.sleep(2L);
    }
}
This is not exactly for concurrent caches but for all caches:
"A cache with a bad policy is another name for a memory leak" (Raymond Chen)
