GeckoFX file download - C#

I have some issues with GeckoFX file download.
I tried the LauncherDialog method described in a few places, such as "How to handle downloading in GeckoFX 29", and a WebRequest/WebClient method.
Both methods work, but they make two requests to the server: the first one triggers the LauncherDialog.Download event, and the LauncherDialog then makes a new request to get the actual file.
I use GeckoFX in a custom web client for a customer, and in this particular case the download request requires several seconds of processing on the server side and modifies the data state. The second request is therefore delayed, and the data it returns isn't the same as for the first request.
Also, this particular application doesn't need a download progress window of any sort.
Is there any way to get the data stream from the initial request? Modifying GeckoFX-Winforms is not a problem. I would prefer to avoid any modification to GeckoFX-Core, but I'll do it if required.

Well, I revisited my wrong assumptions about XPCOM programming, took a look at selected locations in the Firefox/Gecko source code, and found a solution. It may be obvious to someone with XPCOM/XUL programming experience, but it wasn't initially obvious to me, so I figure sharing my solution could help a few people.
In my case, the LauncherDialog method is definitely not the way to go.
Instead, I implemented the nsIFactory, nsIExternalHelperAppService and nsIStreamListener interfaces.
nsIStreamListener
internal class MyStreamListener : nsIStreamListener
{
public MyStreamListener(/*...*/) { }
public void OnStartRequest(nsIRequest aRequest, nsISupports aContext)
{
// This will get called once, when the download "begins".
// You can initialize your things here.
}
public void OnStopRequest(nsIRequest aRequest, nsISupports aContext, int aStatusCode)
{
// This will also get called once, when the download is
// complete or interrupted. You can perform the post-download
// actions here.
if (aStatusCode != GeckoError.NS_OK) {
// download interrupted
}
else {
// download completed
}
}
public void OnDataAvailable(nsIRequest aRequest, nsISupports aContext, nsIInputStream aInputStream, ulong aOffset, uint aCount)
{
// This gets called several times with small chunks of data.
// Do what you need with the stream. In my case, I read it
// in a small buffer, which then gets written to an output
// filestream (not shown).
// The aOffset parameter is the total number of bytes received in previous calls.
var lInput = InputStream.Create(aInputStream);
byte[] lBuffer = new byte[aCount];
lInput.Read(lBuffer, 0, (int)aCount);
}
}
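For reference, here is one way the file-writing part mentioned above (but "not shown") could look. This is a minimal sketch: the _output and _targetPath fields and their lifecycle are my own assumptions, not part of the original answer.
private string _targetPath; // hypothetical: where to save the download, e.g. passed to the constructor
private FileStream _output;

public void OnStartRequest(nsIRequest aRequest, nsISupports aContext)
{
    _output = File.Create(_targetPath); // open the output file when the download begins
}

public void OnDataAvailable(nsIRequest aRequest, nsISupports aContext, nsIInputStream aInputStream, ulong aOffset, uint aCount)
{
    var lInput = InputStream.Create(aInputStream);
    byte[] lBuffer = new byte[aCount];
    int lRead = lInput.Read(lBuffer, 0, (int)aCount);
    _output.Write(lBuffer, 0, lRead); // append this chunk to the output file
}

public void OnStopRequest(nsIRequest aRequest, nsISupports aContext, int aStatusCode)
{
    if (_output != null)
        _output.Dispose(); // flush and close, whether the download completed or was interrupted
}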
nsIExternalHelperAppService
public class MyExternalHelperAppService : nsIExternalHelperAppService
{
public MyExternalHelperAppService(/* ... */)
{
/* ... */
}
public nsIStreamListener DoContent(nsACStringBase aMimeContentType, nsIRequest aRequest, nsIInterfaceRequestor aWindowContext, bool aForceSave)
{
var request = Request.CreateRequest(aRequest);
var lChannel = request as HttpChannel;
try {
if (lChannel != null) {
var uri = lChannel.OriginalUri;
var contentType = lChannel.ContentType;
var contentLength = lChannel.ContentLength;
var dispositionFilename = lChannel.ContentDispositionFilename;
// Do your content-type validation, keeping only what you need.
// Make sure you clean dispositionFilename before using it.
// If you don't want to do anything with that file, you can return null;
return new MyStreamListener(/* ... */);
}
}
catch (COMException) {
/* ... */
}
return null;
}
}
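As for cleaning dispositionFilename, a minimal sketch (my own, not from the original answer) is to strip any path components and invalid characters before using it:
private static string CleanFileName(string dispositionFilename)
{
    // drop any directory components embedded in the name
    string name = Path.GetFileName(dispositionFilename);
    // strip characters that are invalid in Windows file names
    foreach (char c in Path.GetInvalidFileNameChars())
        name = name.Replace(c.ToString(), "");
    return name;
}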
nsIFactory (you can also derive from GenericOneClassNsFactory<TFactory,TType>):
public IntPtr CreateInstance(nsISupports aOuter, ref Guid iid)
{
// This is called when the content dispatcher gets a DISPOSITION_ATTACHMENT
// on the channel, or when it doesn't have any builtin handler
// for the content type. It needs an external helper to handle
// the content, so it creates one and calls DoContent on it.
MyExternalHelperAppService _myExternalHelperAppService = new MyExternalHelperAppService(...);
IntPtr result;
IntPtr iUnknownForObject = Marshal.GetIUnknownForObject(_myExternalHelperAppService);
Marshal.QueryInterface(iUnknownForObject, ref iid, out result);
Marshal.Release(iUnknownForObject);
return result;
}
public void LockFactory(bool @lock) {
// Do nothing here; it's not used and is only kept for backwards compatibility.
}
Then, somewhere in my initialization code (after Xpcom.Initialize), I registered my nsIFactory with the proper contract ID:
Xpcom.RegisterFactory(typeof(MyExternalHelperAppService).GUID,
"MyExternalHelperAppService",
"@mozilla.org/uriloader/external-helper-app-service;1",
new MyNsFactory());
And that's all.

Related

Is there a way to avoid using side effects to process this data

I have an application I'm writing that runs script plugins to automate what a user used to have to do manually through a serial terminal. So I am basically implementing the serial terminal's functionality in code. One of the functions of the terminal was to send a command that kicked off the terminal receiving continuously streamed data from a device until the user pressed the space bar, which would then stop the streaming. While the data was streaming, the user would set some values in another application on some other devices and watch the streamed data in the terminal change.
Now, the streamed data can take different shapes, depending on the particular command that's sent. For instance, one response may look like:
---RESPONSE HEADER---
HERE: 1
ARE: 2 SOME:3
VALUES: 4
---RESPONSE HEADER---
HERE: 5
ARE: 6 SOME:7
VALUES: 8
....
another may look like:
here are some values
in cols and rows
....
So, my idea is to have a different parser based on the command I send. So, I have done the following:
public class Terminal
{
private SerialPort port;
private IResponseHandler pollingResponseHandler;
private object locker = new object();
private List<Response1Clazz> response1;
private List<Response2Clazz> response2;
//setter omitted for brevity
//get a snapshot of the data at any point in time while the response is polling
public List<Response1Clazz> Response1 { get { lock (locker) return new List<Response1Clazz>(response1); } }
//setter omitted for brevity
public List<Response2Clazz> Response2 { get { lock (locker) return new List<Response2Clazz>(response2); } }
public Terminal()
{
port = new SerialPort(){/*initialize data*/}; //open port etc etc
}
void StartResponse1Polling()
{
Response1 = new List<Response1Clazz>();
Parser<List<Response1Clazz>> parser = new KeyValueParser(Response1); //parser is of type T
pollingResponseHandler = new PollingResponseHandler(parser);
//write command to start polling response 1 in a task
}
void StartResponse2Polling()
{
Response2 = new List<Response2Clazz>();
Parser<List<Response2Clazz>> parser = new RowColumnParser(Response2); //parser is of type T
pollingResponseHandler = new PollingResponseHandler(parser); // this accepts a parser of type T
//write command to start polling response 2
}
void OnSerialDataReceived(object sender, Args a)
{
lock(locker){
//do some processing yada yada
//we pass in the serial data to the handler, which in turn delegates to the parser.
pollingResponseHandler.Handle(processedSerialData);
}
}
}
The caller of the class would then be something like:
public class Plugin : BasePlugin
{
public override void PluginMain()
{
Terminal terminal = new Terminal();
terminal.StartResponse1Polling();
//update some other data;
Response1Clazz response = terminal.Response1;
//process response
//update more data
response = terminal.Response1;
//process response
//terminal1.StopPolling();
}
}
My question is quite general, but I'm wondering if this is the best way to handle the situation. Right now I am required to pass in an object/List that I want modified, and it's modified via a side effect. This feels a little ugly because there is really no indication in the code that this is what is happening. I am doing it purely because the "Start" method is the location that knows which parser to create and which data to update. Maybe this is kosher, but I figured it is worth asking if there is another/better way, or at least a better way to indicate that the "Handle" method produces side effects.
Thanks!
I don't see a problem in modifying List<>s that are received as parameters. It isn't the most beautiful thing in the world, but it is quite common. Sadly, C# doesn't have a const modifier for parameters (compare this with C/C++, where, unless you declare a parameter to be const, it is OK for the method to modify it). You only have to give the parameter a self-explanatory name (like outputList), and put a comment on the method (you know, an XML comment block, like /// <param name="outputList">This list will receive...</param>).
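For example, a sketch of that convention (the type names are from the question; the method shape is mine):
/// <summary>Parses the processed serial data.</summary>
/// <param name="outputList">This list will receive the parsed entries;
/// it is modified as a side effect of calling this method.</param>
public void Parse(string processedSerialData, List<Response1Clazz> outputList)
{
    // ... parse processedSerialData and outputList.Add(entry) ...
}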
To give a more complete response, I would need to see the whole code. You have omitted an example of Parser and an example of Handler.
I do, however, see a problem with your lock in { lock (locker) return new List<Response1Clazz>(response1); }. It also seems inconsistent that you then do Response1 = new List<Response1Clazz>();, when Response1 only has a getter.
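One way to make those two pieces consistent is a private setter that takes the same lock (a sketch based on the question's fields):
private List<Response1Clazz> response1;
public List<Response1Clazz> Response1
{
    // snapshot on read, swap the backing list under the same lock on write
    get { lock (locker) return new List<Response1Clazz>(response1); }
    private set { lock (locker) response1 = value; }
}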

C# TPL: Possible to restart a failed Pipeline at an arbitrary step?

I have a data processing job that consists of about 20 sequential steps. The steps all fall under one of three categories:
do some file manipulation
import / export data from a database
make a call to a 3rd party web API
I've refactored the code from one long, awful-looking method to a pipeline pattern, using examples here and here. All of the steps are TransformBlocks, such as:
var stepThirteenPostToWebApi = new TransformBlock<FileInfo, System.Guid>(async csv =>
{
dynamic task = await ApiUtils.SubmitData(csv.FullName);
return task.guid;
});
The code works most of the time, but occasionally a step in the pipeline fails for whatever reason - let's say a corrupt file can't be read in step 6 of 20 (just an example - any step could fail). The pipeline stops running further tasks, as it should.
However, the 3rd party web API introduces a challenge - we are charged for each job we initiate whether we execute all 20 steps or just the first one.
I would like to be able to fix whatever went wrong in the problem step (again, for our example let's say I fix the corrupt file in step 6 of 20), then pick back up at step 6. The 3rd party web API has a GUID for each job, and is asynchronous, so that should be fine - after the problem is fixed, it will happily let a job resume with remaining steps.
My question: Is it possible (and if so advisable?) to design a pipeline that could begin at any step, assuming the pre-requisites for that step were valid?
It would look something like:
job fails on step 6 and logs step 5 as the last successful step
a human comes along and fixes whatever caused step 6 to fail
a new pipeline is started at step 6
I realize a brute-force way would be to have StartAtStep2(), StartAtStep3(), StartAtStep4() methods. That doesn't seem like a good design, but I'm a bit new at this pattern so maybe that's acceptable.
The brute-force way is not that bad; for example, your above code would just need to be:
bool StartAtStepThirteen(FileInfo csv)
{
return stepThirteenPostToWebApi.Post(csv);
}
The setup of the chain should be in a separate method from the execution of the chain. You should save stepThirteenPostToWebApi in a class-level variable in a class that represents the entire chain; the setup of the chain could be done in the class's constructor.
Here is a simple three-step version of the process. When an error happens, instead of faulting the task chain I log the error and pass null along the chain for invalid entries. You could make that log method raise an event, and then the user can decide what to do with the bad entry.
public class WorkChain
{
private readonly TransformBlock<string, FileInfo> stepOneGetFileInfo;
private readonly TransformBlock<FileInfo, System.Guid?> stepTwoPostToWebApi;
private readonly ActionBlock<System.Guid?> stepThreeDisplayIdToUser;
public WorkChain()
{
stepOneGetFileInfo = new TransformBlock<string, FileInfo>(new Func<string, FileInfo>(GetFileInfo));
stepTwoPostToWebApi = new TransformBlock<FileInfo, System.Guid?>(new Func<FileInfo, Task<Guid?>>(PostToWebApi));
stepThreeDisplayIdToUser = new ActionBlock<System.Guid?>(new Action<Guid?>(DisplayIdToUser));
stepOneGetFileInfo.LinkTo(stepTwoPostToWebApi, new DataflowLinkOptions() {PropagateCompletion = true});
stepTwoPostToWebApi.LinkTo(stepThreeDisplayIdToUser, new DataflowLinkOptions() {PropagateCompletion = true});
}
public void PostToStepOne(string path)
{
bool result = stepOneGetFileInfo.Post(path);
if (!result)
{
throw new InvalidOperationException("Failed to post to stepOneGetFileInfo");
}
}
public void PostToStepTwo(FileInfo csv)
{
bool result = stepTwoPostToWebApi.Post(csv);
if (!result)
{
throw new InvalidOperationException("Failed to post to stepTwoPostToWebApi");
}
}
public void PostToStepThree(Guid id)
{
bool result = stepThreeDisplayIdToUser.Post(id);
if (!result)
{
throw new InvalidOperationException("Failed to post to stepThreeDisplayIdToUser");
}
}
public void CompleteAdding()
{
stepOneGetFileInfo.Complete();
}
public Task Completion { get { return stepThreeDisplayIdToUser.Completion; } }
private FileInfo GetFileInfo(string path)
{
try
{
return new FileInfo(path);
}
catch (Exception ex)
{
LogGetFileInfoError(ex, path);
return null;
}
}
private async Task<Guid?> PostToWebApi(FileInfo csv)
{
if (csv == null)
return null;
try
{
dynamic task = await ApiUtils.SubmitData(csv.FullName);
return task.guid;
}
catch (Exception ex)
{
LogPostToWebApiError(ex, csv);
return null;
}
}
private void DisplayIdToUser(Guid? obj)
{
if(obj == null)
return;
Console.WriteLine(obj.Value);
}
}
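Resuming after a fix is then just a matter of posting to the step you want to start from. A hypothetical usage sketch (the path and flow are made up):
var chain = new WorkChain();
// normal run: start from the beginning
chain.PostToStepOne(@"C:\jobs\data.csv");
chain.CompleteAdding();
chain.Completion.Wait();

// later, after a human fixes whatever made step two fail: a new chain, started at step two
var retryChain = new WorkChain();
retryChain.PostToStepTwo(new FileInfo(@"C:\jobs\data.csv"));
retryChain.CompleteAdding(); // completion still propagates from step one through the rest of the chain
retryChain.Completion.Wait();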

Awesomium Post Parameters

Currently I am working with Awesomium 1.7 in the C# environment.
I'm just using the Core and trying to define custom POST parameters.
I googled a lot and even posted on the Awesomium forums, but there was no answer.
I understand the concept, but the recent changes dropped the suggested mechanism and examples.
What I found:
http://support.awesomium.com/kb/general-use/how-do-i-send-form-values-post-data
The problem with this is that the WebView class does not contain the "OnResourceRequest" event anymore.
So far, I have implemented IResourceInterceptor and overridden the "OnRequest" function:
public ResourceResponse OnRequest(ResourceRequest request)
is the signature, but I have no way to hook in there to add request headers.
Does anyone here have any idea? I looked in the documentation but didn't find anything on this.
You need to attach your IResourceInterceptor to WebCore, not WebView. Here's a working example:
Resource interceptor:
public class CustomResourceInterceptor : ResourceInterceptor
{
protected override ResourceResponse OnRequest(ResourceRequest request)
{
request.Method = "POST";
var bytes = Encoding.UTF8.GetBytes("Appending some text to the request");
request.AppendUploadBytes(bytes, (uint) bytes.Length);
request.AppendExtraHeader("custom-header", "this is a custom header");
return null;
}
}
Main application:
public MainWindow()
{
WebCore.Started += WebCoreOnStarted;
InitializeComponent();
}
private void WebCoreOnStarted(object sender, CoreStartEventArgs coreStartEventArgs)
{
var interceptor = new CustomResourceInterceptor();
WebCore.ResourceInterceptor = interceptor;
//webView is a WebControl on my UI, but you should be able to create your own WebView off WebCore
webView.Source = new Uri("http://www.google.com");
}
HotN's answer above is good; in fact, it's what I based my answer on. However, I spent a week searching for this information and putting together something that will work. (The answer above has a couple of issues which, at the very least, make it unworkable with v1.7 of Awesomium.) What I was looking for was something that would work right out of the box.
And here is that solution. It needs improvement, but it suits my needs at the moment. I hope this helps someone else.
// CRI.CustomResourceInterceptor
//
// Author: Garison E Piatt
// Contact: {removed}
// Created: 11/17/14
// Version: 1.0.0
//
// Apparently, when Awesomium was first created, the programmers did not understand that someone would
// eventually want to post data from the application. So they made it incredibly difficult to upload
// POST parameters to the remote web site. We have to jump through hoops to get that done.
//
// This module provides that hoop-jumping in a simple-to-understand fashion. We hope. It overrides
// the current resource interceptor (if any), replacing both the OnRequest and OnFilterNavigation
// methods (we aren't using the latter yet).
//
// It also provides settable parameters. Once this module is attached to the WebCore, it is *always*
// attached; therefore, we can simply change the parameters before posting to the web site.
//
// File uploads are currently unhandled, and, once handled, will probably only upload one file. We
// will deal with that issue later.
//
// To incorporate this into your application, follow these steps:
// 1. Add this file to your project. You know how to do that.
// 2. Edit your MainWindow.cs file.
// a. At the top, add:
// using CRI;
// b. inside the main class declaration, near the top, add:
// private CustomResourceInterceptor cri;
// c. In the MainWindow method, add:
// WebCore.Started += OnWebCoreOnStarted;
// cri = new CustomResourceInterceptor();
// and (set *before* you set the Source value for the Web Control):
// cri.Enabled = true;
// cri.Parameters = String.Format("login={0}&password={1}", login, pw);
// (Choose your own parameters, but format them like a GET query.)
// d. Add the following method:
// private void OnWebCoreOnStarted(object sender, CoreStartEventArgs coreStartEventArgs) {
// WebCore.ResourceInterceptor = cri;
// }
// 3. Compile your application. It should work.
using System;
using System.Runtime.InteropServices;
using System.Text;
using Awesomium.Core;
using Awesomium.Windows.Controls;
namespace CRI {
//* CustomResourceInterceptor
// This object replaces the standard Resource Interceptor (if any; we still don't know) with something
// that allows posting data to the remote web site. It overrides both the OnRequest and OnFilterNavigation
// methods. Public variables allow for run-time configuration.
public class CustomResourceInterceptor : IResourceInterceptor {
// Since the default interceptor remains overridden for the remainder of the session, we need to disable
// the methods herein unless we are actually using them. Note that both methods are disabled by default.
public bool RequestEnabled = false;
public bool FilterEnabled = false;
// These are the parameters we send to the remote site. They are empty by default; another safeguard
// against sending POST data unnecessarily. Currently, both values allow for only one string. POST
// variables can be combined (by the caller) into one string, but this limits us to only one file
// upload at a time. Someday, we will have to fix that. And make it backward-compatible.
public String Parameters = null;
public String FilePath = null;
/** OnRequest
** This overrides the default OnRequest method of the standard resource interceptor. It receives
** the resource request object as a parameter.
**
** It first checks whether or not it is enabled, and returns NULL if not. Next it sees if any
** parameters are defined. If so, it converts them to a byte stream and appends them to the request.
** Currently, files are not handled, but we hope to add that someday.
*/
public ResourceResponse OnRequest(ResourceRequest request) {
// We do nothing at all if we aren't enabled. This is a stopgap that prevents us from sending
// POST data with every request.
if (RequestEnabled == false) return null;
// If the Parameters are defined, convert them to a byte stream and append them to the request.
if (Parameters != null) {
var bytes = Encoding.UTF8.GetBytes(Parameters);
request.AppendUploadBytes(bytes, (uint)bytes.Length);
}
// If either the parameters or file path are defined, this is a POST request. Someday, we'll
// figure out how to get Awesomium to understand Multipart Form data.
if (Parameters != null || FilePath != null) {
request.Method = "POST";
request.AppendExtraHeader("Content-Type", "application/x-www-form-urlencoded"); //"multipart/form-data");
}
// Once the data has been appended to the page request, we need to disable this process. Otherwise,
// it will keep adding the data to every request, including those that come from the web site.
RequestEnabled = false;
Parameters = null;
FilePath = null;
return null;
}
/** OnFilterNavigation
** Not currently used, but needed to keep VisualStudio happy.
*/
public bool OnFilterNavigation(NavigationRequest request) {
return false;
}
}
}

Is this a good/preferable pattern to Azure Queue construction for a T4 template?

I'm building a T4 template that will help people construct Azure queues in a consistent and simple manner. I'd like to make this self-documenting, and somewhat consistent.
First I put the queue name at the top of the file; queue names have to be in lowercase, so I added ToLower().
The public constructor uses the built-in StorageClient API to access the connection strings. I've seen many different approaches to this, and would like to get something that works in almost all situations. (Ideas? Do share.)
I dislike the unneeded HTTP requests to check whether the queues have been created, so I made it a static bool. I didn't implement a Lock(monitorObject) since I don't think one is needed.
Instead of using a string and parsing it with commas (like most MSDN documentation), I'm serializing the object when passing it into the queue.
For further optimization, I'm using a JSON serializer extension method to get the most out of the 8 KB limit (see the sketch after this list). Not sure if an encoding would help optimize this any more.
Added retry logic to handle certain scenarios that occur with the queue (see HTML link).
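For illustration, enqueueing would look something like this (a sketch; ToJSONString is assumed to be the serializing counterpart of the FromJSONString extension used in the code below, and AgentQueueEntry/AgentQueueActionEnum come from that code):
var entry = new AgentQueueEntry { ActionType = AgentQueueActionEnum.EnrollNew /* , ... */ };
// JSON packs more into the 8 KB message limit than a comma-separated string would
queueAgentQueueActions.AddMessage(new CloudQueueMessage(entry.ToJSONString()));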
Q: Is "DataContext" appropriate name for this class?
Q: Is it a poor practice to name the Queue Action Name in the manner I have done?
What additional changes do you think I should make?
public class AgentQueueDataContext
{
// Queue names must always be in lowercase
// Is named like a const, but isn't one because .ToLower won't compile...
static string AGENT_QUEUE_ACTION_NAME = "AgentQueueActions".ToLower();
static bool QueuesWereCreated { get; set; }
DataModel.SecretDataSource secDataSource = null;
CloudStorageAccount cloudStorageAccount = null;
CloudQueueClient cloudQueueClient = null;
CloudQueue queueAgentQueueActions = null;
static AgentQueueDataContext()
{
QueuesWereCreated = false;
}
public AgentQueueDataContext() : this(false)
{
}
public AgentQueueDataContext(bool CreateQueues)
{
// This pattern of setting up queues is from:
// http://convective.wordpress.com/2009/11/15/queues-azure-storage-client-v1-0/
//
this.cloudStorageAccount = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
this.cloudQueueClient = cloudStorageAccount.CreateCloudQueueClient();
this.secDataSource = new DataModel.SecretDataSource();
queueAgentQueueActions = cloudQueueClient.GetQueueReference(AGENT_QUEUE_ACTION_NAME);
if (QueuesWereCreated == false || CreateQueues)
{
queueAgentQueueActions.CreateIfNotExist();
QueuesWereCreated = true;
}
}
// This is the method that will be spawned using ThreadStart
public void CheckQueue()
{
while (true)
{
try
{
CloudQueueMessage msg = queueAgentQueueActions.GetMessage();
bool DoRetryDelayLogic = false;
if (msg != null)
{
// Deserialize using JSON (allows more data to be stored)
AgentQueueEntry actionableMessage = msg.AsString.FromJSONString<AgentQueueEntry>();
switch (actionableMessage.ActionType)
{
case AgentQueueActionEnum.EnrollNew:
{
// Add to
break;
}
case AgentQueueActionEnum.LinkToSite:
{
// Link within Agent itself
// Link within Site
break;
}
case AgentQueueActionEnum.DisableKey:
{
// Disable key in site
// Disable key in AgentTable (update modification time)
break;
}
default:
{
break;
}
}
//
// Only delete the message if the requested agent has been missing for
// at least 10 minutes
//
if (DoRetryDelayLogic)
{
if (msg.InsertionTime != null)
if (msg.InsertionTime > DateTime.UtcNow - new TimeSpan(0, 10, 10)) // not yet ~10 minutes old
continue;
// ToDo: Log error: AgentID xxx has not been found in table for xxx minutes.
// It is likely the result of the registration host crashing.
// Data is still consistent. Deleting queued message.
}
//
// If execution made it to this point, then we are either fully processed, or
// there is sufficient reason to discard the message.
//
try
{
queueAgentQueueActions.DeleteMessage(msg);
}
catch (StorageClientException ex)
{
// As of July 2010, this is the best way to detect this class of exception
// Description: http://blog.smarx.com/posts/deleting-windows-azure-queue-messages-handling-exceptions
if (ex.ExtendedErrorInformation.ErrorCode == "MessageNotFound")
{
// pop receipt must be invalid
// ignore or log (so we can tune the visibility timeout)
}
else
{
// not the error we were expecting
throw;
}
}
}
else
{
// allow control to fall to the bottom, where the sleep timer is...
}
}
catch (Exception e)
{
// Justification: Thread must not fail.
//Todo: Log this exception
// allow control to fall to the bottom, where the sleep timer is...
// Rationale: not doing so may cause queue thrashing on a specific corrupt entry
}
// todo: Thread.Sleep() is bad
// Replace with something better...
Thread.Sleep(9000);
}
}
}
Q: Is "DataContext" appropriate name for this class?
In .NET we have a lot of DataContext classes, so in the sense that you want names to appropriately communicate what the class does, I think XyzQueueDataContext properly communicates what the class does - although you can't query from it.
If you want to stay more aligned with accepted pattern languages, Patterns of Enterprise Application Architecture calls any class that encapsulates access to an external system a Gateway, while more specifically you may want to use the term Channel in the language of Enterprise Integration Patterns - that's what I would do.
Q: Is it a poor practice to name the Queue Action Name in the manner I have done?
Well, it certainly tightly couples the queue name to the class. This means that if you later decide that you want to decouple those, you can't.
As a general comment I think this class might benefit from trying to do less. Using the queue is not the same thing as managing it, so instead of having all of that queue management code there, I'd suggest injecting a CloudQueue into the instance. Here's how I implement my AzureChannel constructor:
private readonly CloudQueue queue;
public AzureChannel(CloudQueue queue)
{
if (queue == null)
{
throw new ArgumentNullException("queue");
}
this.queue = queue;
}
This better fits the Single Responsibility Principle and you can now implement queue management in its own (reusable) class.
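Wiring it up then happens at the composition root; a sketch using the same v1 StorageClient calls that appear in the question:
var account = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
var client = account.CreateCloudQueueClient();
var queue = client.GetQueueReference("agentqueueactions");
queue.CreateIfNotExist(); // queue management now lives out here, not in the channel
var channel = new AzureChannel(queue);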

Pattern for concurrent cache sharing

OK, I was a little unsure how best to name this problem :) But assume this scenario: you're going out and fetching some webpage (with various URLs) and caching it locally. The cache part is pretty easy to solve, even with multiple threads.
However, imagine that one thread starts fetching a URL, and a couple of milliseconds later another wants the same URL. Is there any good pattern for making the second thread's method wait on the first one to fetch the page, insert it into the cache, and return it, so you don't have to make multiple requests? With little enough overhead that it's worth doing even for requests that take about 300-700 ms? And without blocking requests for other URLs?
Basically, when requests for identical URLs come in tightly after each other, I want the second request to "piggyback" on the first request.
I had a loose idea of having a dictionary where you insert an object with the URL as the key when you start fetching a page, and lock on it. If there's already a matching key, the second thread gets the object, locks on it, and then tries to fetch the URL from the actual cache.
I'm a little unsure of the particulars of making it really thread-safe, however; using ConcurrentDictionary might be one part of it...
Is there any common pattern or solution for scenarios like this?
Breakdown of the wrong behavior:
Thread 1: Checks the cache; the page isn't there, so it starts fetching the URL
Thread 2: Starts fetching the same URL, since it still isn't in the cache
Thread 1: Finishes and inserts into the cache, returns the page
Thread 2: Finishes and also inserts into the cache (or discards it), returns the page
Breakdown of the correct behavior:
Thread 1: Checks the cache; the page isn't there, so it starts fetching the URL
Thread 2: Wants the same URL, but sees it's currently being fetched, so it waits on thread 1
Thread 1: Finishes and inserts into the cache, returns the page
Thread 2: Notices that thread 1 is finished and returns the page that thread 1 fetched
EDIT
Most solutions so far seem to misunderstand the problem and address only the caching. As I said, that isn't the problem; the problem is that when a second external web fetch for the same URL starts before the first one has cached its result, the second fetch should use the result from the first rather than performing a second request.
You could use a ConcurrentDictionary<K,V> and a variant of double-checked locking:
public static string GetUrlContent(string url)
{
object value1 = _cache.GetOrAdd(url, new object());
if (value1 == null) // null check only required if content
return null; // could legitimately be a null string
var urlContent = value1 as string;
if (urlContent != null)
return urlContent; // got the content
// value1 isn't a string which means that it's an object to lock against
lock (value1)
{
object value2 = _cache[url];
// at this point value2 will *either* be the url content
// *or* the object that we already hold a lock against
if (value2 != value1)
return (string)value2; // got the content
urlContent = FetchContentFromTheWeb(url); // todo
_cache[url] = urlContent;
return urlContent;
}
}
private static readonly ConcurrentDictionary<string, object> _cache =
new ConcurrentDictionary<string, object>();
EDIT: My code is quite a bit uglier now, but uses a separate lock per URL. This allows different URLs to be fetched asynchronously; however, each URL will only be fetched once.
public class UrlFetcher
{
static Hashtable cache = Hashtable.Synchronized(new Hashtable());
public static String GetCachedUrl(String url)
{
// exactly 1 fetcher is created per URL
InternalFetcher fetcher = (InternalFetcher)cache[url];
if( fetcher == null )
{
lock( cache.SyncRoot )
{
fetcher = (InternalFetcher)cache[url];
if( fetcher == null )
{
fetcher = new InternalFetcher(url);
cache[url] = fetcher;
}
}
}
// blocks all threads requesting the same URL
return fetcher.Contents;
}
/// <summary>Each fetcher locks on itself and is initialized with null contents.
/// The first thread to call fetcher.Contents will cause the fetch to occur, and
/// block until completion.</summary>
private class InternalFetcher
{
private String url;
private String contents;
public InternalFetcher(String url)
{
this.url = url;
this.contents = null;
}
public String Contents
{
get
{
if( contents == null )
{
lock( this ) // "this" is an instance of InternalFetcher...
{
if( contents == null )
{
contents = FetchFromWeb(url);
}
}
}
return contents;
}
}
}
}
Will the Semaphore please stand up! stand up! stand up!
Use a semaphore; you can easily synchronize your threads with it.
In both of the cases where
you are trying to load a page that is currently being cached, or
you are saving the cache to a file while a page is being loaded from it,
you will face trouble.
It is just like the readers-writers problem, a classic race-condition problem in operating systems. When a thread wants to rebuild the cache or start caching a page, no thread should be reading from it; if a thread is reading, the writer should wait until the reading is finished before replacing the cache, and no two threads should cache the same page into the same file. Conversely, all readers can read from the cache at any time as long as no writer is writing to it.
You should read some semaphore usage samples on MSDN; it is very easy to use. A thread that wants to do something acquires the semaphore; if the resource can be granted, it does the work, otherwise it sleeps and waits to be woken up when the resource is ready.
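A rough C# sketch of that idea, using one SemaphoreSlim per URL so the first requester fetches while later requesters for the same URL block and then read the cached result (all names and structure here are my own, not from this answer; FetchFromWeb is a placeholder):
using System.Collections.Concurrent;
using System.Threading;

public static class PiggybackingCache
{
    static readonly ConcurrentDictionary<string, SemaphoreSlim> semaphores =
        new ConcurrentDictionary<string, SemaphoreSlim>();
    static readonly ConcurrentDictionary<string, string> cache =
        new ConcurrentDictionary<string, string>();

    public static string GetPage(string url)
    {
        string content;
        if (cache.TryGetValue(url, out content))
            return content; // fast path: already cached
        var gate = semaphores.GetOrAdd(url, _ => new SemaphoreSlim(1, 1));
        gate.Wait(); // the first thread proceeds; piggybackers block here
        try
        {
            if (!cache.TryGetValue(url, out content)) // re-check once woken up
            {
                content = FetchFromWeb(url); // hypothetical fetch
                cache[url] = content;
            }
            return content;
        }
        finally
        {
            gate.Release();
        }
    }

    static string FetchFromWeb(string url) { /* ... */ return ""; }
}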
Disclaimer: This might be a n00bish answer. Please pardon me, if it is.
I'd recommend using a shared dictionary object with locks to keep track of the URLs currently being fetched or already fetched.
At every request, check the URL against this object.
If an entry for the URL is present, check the cache. (This means one of the threads has either fetched it or is currently fetching it.)
If it's available in the cache, use it; else put the current thread to sleep for a while and check back again. (If it's not in the cache, some thread is still fetching it, so wait while it finishes.)
If the entry is not found in the dictionary object, add the URL to it and send the request. Once it obtains a response, add it to the cache.
This logic should work; however, you would need to take care of cache expiration and removal of entries from the dictionary object. A rough sketch follows.
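Here is that sketch (all names are mine and FetchFromWeb is a placeholder; cache expiration and entry cleanup are deliberately left out):
using System.Collections.Generic;
using System.Threading;

public static class PollingCache
{
    static readonly object gate = new object();
    static readonly HashSet<string> inFlight = new HashSet<string>();
    static readonly Dictionary<string, string> cache = new Dictionary<string, string>();

    public static string GetPage(string url)
    {
        while (true)
        {
            lock (gate)
            {
                string cached;
                if (cache.TryGetValue(url, out cached))
                    return cached; // some thread already fetched it
                if (!inFlight.Contains(url))
                {
                    inFlight.Add(url); // we take responsibility for fetching
                    break;
                }
            }
            Thread.Sleep(50); // another thread is fetching; check back shortly
        }
        string content = FetchFromWeb(url); // hypothetical fetch
        lock (gate)
        {
            cache[url] = content; // expiration is not handled in this sketch
            inFlight.Remove(url);
        }
        return content;
    }

    static string FetchFromWeb(string url) { /* ... */ return ""; }
}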
My solution is to use an AtomicBoolean to control access to the database when the cache has timed out or the entry doesn't exist.
At any given moment, only one thread (call it the read thread) can access the database; the other threads spin until the read thread returns the data and writes it into the cache.
Here is the code, implemented in Java:
public class CacheBreakDownDefender<K, R> {
/**
* false = do not write null to cache when get null value from database;
*/
private final boolean writeNullToCache;
/**
* cache different query key
*/
private final ConcurrentHashMap<K, AtomicBoolean> selectingDBTagMap = new ConcurrentHashMap<>();
public static <K, R> CacheBreakDownDefender<K, R> getInstance(Class<K> keyType, Class<R> resultType) {
return Singleton.get(keyType.getName() + resultType.getName(), () -> new CacheBreakDownDefender<>(false));
}
public static <K, R> CacheBreakDownDefender<K, R> getInstance(Class<K> keyType, Class<R> resultType, boolean writeNullToCache) {
return Singleton.get(keyType.getName() + resultType.getName(), () -> new CacheBreakDownDefender<>(writeNullToCache));
}
private CacheBreakDownDefender(boolean writeNullToCache) {
this.writeNullToCache = writeNullToCache;
}
public R readFromCache(K key, Function<K, ? extends R> getFromCache, Function<K, ? extends R> getFromDB, BiConsumer<K, R> writeCache) throws InterruptedException {
R result = getFromCache.apply(key);
if (result == null) {
final AtomicBoolean selectingDB = selectingDBTagMap.computeIfAbsent(key, x -> new AtomicBoolean(false));
if (selectingDB.compareAndSet(false, true)) {
try {
result = getFromDB.apply(key);
if (result != null || writeNullToCache) {
writeCache.accept(key, result);
}
} finally {
selectingDB.getAndSet(false);
selectingDBTagMap.remove(key);
}
} else {
while (selectingDB.get()) {
TimeUnit.MILLISECONDS.sleep(0L);
//do nothing...
}
return getFromCache.apply(key);
}
}
return result;
}
public static void main(String[] args) throws InterruptedException {
Map<String, String> map = new ConcurrentHashMap<>();
CacheBreakDownDefender<String, String> instance = CacheBreakDownDefender.getInstance(String.class, String.class, true);
for (int i = 0; i < 9; i++) {
int finalI = i;
new Thread(() -> {
String kele = null;
try {
if (finalI == 6) {
kele = instance.readFromCache("kele2", map::get, key -> "helloword2", map::put);
} else
kele = instance.readFromCache("kele", map::get, key -> "helloword", map::put);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
log.info("result = {}", kele);
}).start();
}
TimeUnit.SECONDS.sleep(2L);
}
}
This is not exactly for concurrent caches but for all caches:
"A cache with a bad policy is another name for a memory leak" (Raymond Chen)
