I have been reading that the proper way to scale Redis is to add a separate instance (even on the same machine is fine, since each instance is CPU-bound). What I am wondering is whether there are any existing components out there that facilitate the round-robin read/write routing, similar to mongos, so that I could just call into it and it would properly write to / read from one of the underlying instances. I realize that it is more complicated than what I have represented above, but I didn't want to reinvent the wheel by trying to write my own proxy, etc. to handle this.
Any suggestions / tips, etc would be appreciated.
Thanks,
S
The approach will work for scaling reads, but not writes, as redis-cluster has not yet been released.
For load balancing reads, any TCP load balancer should work fine such as Balance. I link that one because it is software based and pretty simple to set up and use. Of course, if you have a hardware load balancer you could do it there, or use any of several other software based load balancers.
Another option is to implement round robin in your client code, though I prefer to not do that myself. Once redis-cluster is released it won't really matter which server you connect to.
For balancing writes, you'll need to go the route of sharding your data, which is described rather well IMO on Craigslist's Redis usage page. If you think you'll need to go this route, I'd recommend taking the line JZ takes and doing the underlying setup in advance. Ideally, once redis-cluster is ready there should be minimal, if any, code changes needed to move to the cluster handling it for you.
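To make the client-side option concrete, here is a minimal sketch of round-robin reads plus key-sharded writes. The IRedisConnection interface, the host lists and the simple modulo sharding are placeholders for illustration, not any particular client library's API:

```csharp
using System.Collections.Generic;
using System.Threading;

// Placeholder client interface -- stands in for whichever Redis client library
// you actually use; only Get/Set are shown for illustration.
public interface IRedisConnection
{
    string Get(string key);
    void Set(string key, string value);
}

public class ShardedRedisRouter
{
    private readonly IList<IRedisConnection> _writeMasters; // one master per shard
    private readonly IList<IRedisConnection> _readSlaves;   // slaves of those masters
    private int _nextSlave;

    public ShardedRedisRouter(IList<IRedisConnection> writeMasters,
                              IList<IRedisConnection> readSlaves)
    {
        _writeMasters = writeMasters;
        _readSlaves = readSlaves;
    }

    // Reads: round robin across the slaves (what a TCP balancer would otherwise do for you).
    public string Get(string key)
    {
        int ticket = Interlocked.Increment(ref _nextSlave) & int.MaxValue;
        return _readSlaves[ticket % _readSlaves.Count].Get(key);
    }

    // Writes: pick a shard from the key. A real setup should use a stable hash
    // rather than GetHashCode, which is not guaranteed to match across processes.
    public void Set(string key, string value)
    {
        int shard = (key.GetHashCode() & int.MaxValue) % _writeMasters.Count;
        _writeMasters[shard].Set(key, value);
    }
}
```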
If you want a single IP to handle both reads and writes as well as multiple sharded write masters you would likely need to write that "proxy" yourself, or put the code in the client code you write. Alternatively, this proxy announcement may hold what you need, though I don't see anything about routing writes in it.
Ultimately, I think you'd need to test and validate that you actually need that write scaling before implementing it. I've found that if I have all reads on one or more slaves, and have the slaves handle disk persistence, write performance is usually not an issue.
I'm working on a web application that uses a number of external data sources for data that we need to display on the front end. Some of the external calls are expensive and some also come with a monetary cost, so we need a way to persist the results of these external requests so they survive, e.g., an app restart.
I've started with a proof of concept, and my current solution is a combination of a persistent cache/storage (which stores serialized JSON in files on disk) and a runtime cache. When the app starts it populates the runtime cache from the persistent cache; if the persistent cache is empty it goes ahead and calls the web services. The next time the app restarts we load from the persistent cache, avoiding the calls to the external sources.
After the first population we want the cache to be updated in the background by some kind of update process on a given schedule. We also want this update process to be smart enough to only update the cache if the request to the web service was successful, and otherwise keep the old version. There's also a twist here: some web services might return a complete collection while others require one call per entity, so the update process might differ depending on the concrete web service.
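Here is a rough sketch of the shape I have so far. The class name and the file-per-key layout are just placeholders, keys are assumed to be valid file names, and the web service result is treated as an opaque JSON string:

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

// Runtime cache in front of a very naive file-based persistent cache.
public class TwoLevelCache
{
    private readonly ConcurrentDictionary<string, string> _runtime =
        new ConcurrentDictionary<string, string>();
    private readonly string _cacheDirectory;

    public TwoLevelCache(string cacheDirectory)
    {
        _cacheDirectory = cacheDirectory;
        Directory.CreateDirectory(cacheDirectory);

        // On startup, warm the runtime cache from whatever survived the last run.
        foreach (var file in Directory.GetFiles(cacheDirectory, "*.json"))
            _runtime[Path.GetFileNameWithoutExtension(file)] = File.ReadAllText(file);
    }

    public bool TryGet(string key, out string json)
    {
        return _runtime.TryGetValue(key, out json);
    }

    // Called by the scheduled update process. 'fetch' wraps the external call;
    // if it throws or returns null, the old version is kept.
    public void Refresh(string key, Func<string> fetch)
    {
        try
        {
            string fresh = fetch();
            if (fresh == null) return;               // keep the old value
            _runtime[key] = fresh;
            File.WriteAllText(Path.Combine(_cacheDirectory, key + ".json"), fresh);
        }
        catch
        {
            // Request failed: keep whatever is already cached.
        }
    }
}
```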
I'm thinking that this scenario can't be totally unique, so I've looked around and done a fair bit of Googling, but I haven't found any patterns or libraries that deal with something like this.
So what I'm looking for is any patterns that might be useful for us, and whether there are any C# libraries or articles on the subject. I don't want to "reinvent the wheel". If anyone has solved similar problems I would love to hear more about how you approached them.
Thank you so much!
I would like to make the following happen:
My application runs on a Windows machine (call it application A).
I can modify the source code of application A to introduce bandwidth throttling.
I would like to be able to reuse my bandwidth throttling code and drop it into any other applications that I have (in other words, I would like to throttle bandwidth at the application domain level so that I don't have to refactor existing applications for bandwidth throttling).
I want to throttle A's cumulative upload and download speed separately. For example, if A has a maximum of 5 Kbps allotted to upload, then all of A's upload streams will be capped to a cumulative amount of 5 Kbps.
My requirements:
I cannot use a kernel-mode driver.
I need to add throttling on an application domain level.
I have tried to research into this, especially on Stack Overflow but could not find anything useful for my case:
I have seen this example of using a ThrottledStream class wrapper around a Stream object that introduces throttling when the stream is used (a rough sketch of the idea is shown below), but I need this to work at the domain level; taking this approach is problematic because it would require me to refactor a lot of existing code in other applications.
I have seen this question whose answer talks about using the Windows Filtering Platform API. Unfortunately, a requirement I have is that I absolutely can't use a kernel-mode driver to accomplish this, and my understanding is that the WFP API requires one.
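For reference, the wrapper idea I mentioned looks roughly like this. This is my own simplified sketch of the approach, not the code from that example:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;

// Simplified per-stream throttle: every Read/Write is counted against a
// bytes-per-second budget and the caller is delayed once it gets ahead of it.
public class ThrottledStream : Stream
{
    private readonly Stream _inner;
    private readonly long _bytesPerSecond;
    private readonly Stopwatch _clock = Stopwatch.StartNew();
    private long _totalBytes;

    public ThrottledStream(Stream inner, long bytesPerSecond)
    {
        _inner = inner;
        _bytesPerSecond = bytesPerSecond;
    }

    private void Throttle(int count)
    {
        _totalBytes += count;
        double expectedSeconds = (double)_totalBytes / _bytesPerSecond;
        double aheadBySeconds = expectedSeconds - _clock.Elapsed.TotalSeconds;
        if (aheadBySeconds > 0)
            Thread.Sleep(TimeSpan.FromSeconds(aheadBySeconds));
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = _inner.Read(buffer, offset, count);
        Throttle(read);
        return read;
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        _inner.Write(buffer, offset, count);
        Throttle(count);
    }

    // Everything else just delegates to the wrapped stream.
    public override bool CanRead { get { return _inner.CanRead; } }
    public override bool CanSeek { get { return _inner.CanSeek; } }
    public override bool CanWrite { get { return _inner.CanWrite; } }
    public override long Length { get { return _inner.Length; } }
    public override long Position
    {
        get { return _inner.Position; }
        set { _inner.Position = value; }
    }
    public override void Flush() { _inner.Flush(); }
    public override long Seek(long offset, SeekOrigin origin) { return _inner.Seek(offset, origin); }
    public override void SetLength(long value) { _inner.SetLength(value); }
}
```

The problem is that every stream in every application would then have to be created through this wrapper, which is exactly the refactoring I'm trying to avoid.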
Does anyone know a way to implement my specific bandwidth throttling requirements in order to throttle applications on an application domain level?
I think I have found a solution using the QOS API:
Call TcRegisterClient first to get a client handle; from the looks of it you need that handle for TcEnumerateInterfaces and TcAddFlow anyway, and I think it may also be what makes the throttling application specific (this remains to be tested).
Figure out which interface you want to target via a call to TcEnumerateInterfaces, then get a handle to it with TcOpenInterface.
With your interface handle, call TcAddFlow along with a pointer to a TC_GEN_FLOW structure, which allows you to specify both a SendingFlowspec and a ReceivingFlowspec (both FLOWSPEC structures), each of which contains a PeakBandwidth member.
To make your interface utilize the flow you've just added, add a filter to the interface with a call to TcAddFilter; MSDN says that TcAddFilter associates a new filter with an existing flow, allowing packets matching the filter to be directed to that flow.
I found this useful example as well (haven't tested it).
Taken from MSDN, the PeakBandwidth member is the upper limit on time-based transmission permission for a given flow, in bytes per second. The PeakBandwidth member restricts flows that may have accrued a significant amount of transmission credits, or tokens from overburdening network resources with one-time or cyclical data bursts, by enforcing a per-second data transmission ceiling. Some intermediate systems can take advantage of this information, resulting in more efficient resource allocation.
Is it bad practice to use a MySQL database running on some remote server as a means of interfacing two remote computers? For example, box1 polls a specific row of the remote DB checking for values posted by box2; when box2 posts some value, box1 carries out a, b, c.
Thanks for any advice.
Consider using something like ZeroMQ, which is an easy-to-use abstraction over sockets with bindings for most languages. There is some nice intro documentation as well as many examples of various patterns you can use in your application.
I can understand the temptation of using a database for this, but the idea of continually writing/polling simply to signal between clients wastes IO, ties up connections, etc. and, more importantly, seems like it would be difficult to understand/debug for another person (or yourself in two years).
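As an illustration, a trivial push/pull setup between the two boxes could look something like this in C#. It assumes the NetMQ binding (a recent version, where sockets are created without an explicit context), and the host name and port are placeholders; each method would live in its own process:

```csharp
using NetMQ;
using NetMQ.Sockets;

public static class SignalExample
{
    // Runs on box2: push a notification whenever a new value is ready.
    public static void NotifyBox1(string payload)
    {
        using (var sender = new PushSocket())
        {
            sender.Connect("tcp://box1.example.local:5555"); // placeholder address
            sender.SendFrame(payload);
        }
    }

    // Runs on box1: block until a notification arrives instead of polling a DB row.
    public static void ListenForSignals()
    {
        using (var receiver = new PullSocket())
        {
            receiver.Bind("tcp://*:5555");
            while (true)
            {
                string message = receiver.ReceiveFrameString();
                // ... carry out a, b, c based on the message ...
            }
        }
    }
}
```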
You can. If you were building something complex I would caution against it, but for a simple case it's fine -- you need to deal with making sure each item is handled only once, but that's not that difficult.
What you are doing is known as a message queue and there are open-source projects specific to that -- including some built on MySql.
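If you do stay on MySQL, the usual trick for the "handled only once" part is an atomic claim on the row. Here is a sketch of that idea using the MySql.Data client; the jobs table and column names are made up for the example:

```csharp
using MySql.Data.MySqlClient;

// Assumed table (illustrative only):
//   CREATE TABLE jobs (id INT AUTO_INCREMENT PRIMARY KEY,
//                      payload TEXT,
//                      status VARCHAR(16) DEFAULT 'pending',
//                      claimed_by VARCHAR(64));
public static class MySqlQueue
{
    // Claims at most one pending item. The UPDATE is atomic, so even if several
    // workers poll the same table, an item is only ever handed out once.
    public static string TryClaim(string connectionString, string workerId)
    {
        using (var conn = new MySqlConnection(connectionString))
        {
            conn.Open();

            var claim = new MySqlCommand(
                "UPDATE jobs SET status = 'claimed', claimed_by = @worker " +
                "WHERE status = 'pending' ORDER BY id LIMIT 1;", conn);
            claim.Parameters.AddWithValue("@worker", workerId);
            if (claim.ExecuteNonQuery() == 0)
                return null;                         // nothing pending right now

            // Fetch the row we just claimed. A real implementation would also
            // mark items done after processing them.
            var fetch = new MySqlCommand(
                "SELECT payload FROM jobs WHERE claimed_by = @worker " +
                "AND status = 'claimed' ORDER BY id LIMIT 1;", conn);
            fetch.Parameters.AddWithValue("@worker", workerId);
            return (string)fetch.ExecuteScalar();
        }
    }
}
```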
Yes?
You're obfuscating the point of your code by placing a middleman in the situation. It sounds like you're trying to use something you know to do something you don't know. That's pretty normal, because then the problem seems solvable.
If there are only 2 computers (sender-receiver), then it is bad practice if you need fast response times. Otherwise it's fine... direct socket connection would be better, but don't waste time on it if you don't really need it.
On the other hand, if there are more than two machines and/or you need fault tolerance, then you actually need a middleman. Depending on the signalling you want between the machines, the middleman can be a simple key-value store (e.g. memcached, Redis) or a message queue (e.g. dedicated message queue software, though I have seen MySQL used as a queue at two different sites with big traffic).
I am about to develop a Windows service in C#. This service needs to keep track of events in the system and write some data to files from time to time. These ongoing events form a certain state, so I'll keep the state in memory and update it as events arrive. I don't want to over-complicate things, so I don't want the state to be persisted on disk, but I'm wondering if I could somehow make it persistent in memory, so that if the service crashes (and is auto-restarted by Windows) it could pick up where it left off and go on (possibly losing some events, which is not a big deal).
I was thinking along the line of creating a "shared" memory area, thus letting Windows manage it, and using it only in the service - but I'm not sure that object will persist after the service dies.
Any ideas?
EDIT: I'm not looking for an overkill solution. The data is somewhat important, so I'd like to keep it waiting in memory until the service is restarted, but it is not critical. It's more of a nice-to-have feature if I can persist the data easily, without working with files, external 3rd-party processes and so on. My ideal solution would be a simple built-in feature (in .NET or in Windows) that provides me with some in-memory persistence, just to recover from a crash.
You can use the Caching Application Block from the Microsoft Enterprise Library.
It is configurable and you can use many backing stores, such as a database or isolated storage.
I know you said that you don't want to over-complicate things by persisting to disk, but it's definitely going to be much more complicated to persist stuff into shared memory or any of the other solutions listed here. The reason why so many applications use databases or file storage is because it's the simplest solution.
I would recommend you keep all the state in a single object or object hierarchy, serialize this object to XML and write it to a file. It really doesn't get much simpler than that.
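For example, something along these lines covers both save and restore; ServiceState and its properties are just stand-ins for whatever your real state object looks like:

```csharp
using System.IO;
using System.Xml.Serialization;

public class ServiceState
{
    public int ProcessedEvents { get; set; }
    public string LastEventId { get; set; }
    // ... whatever else makes up the in-memory state ...
}

public static class StatePersistence
{
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(ServiceState));

    // Call this periodically and/or from a top-level exception handler.
    public static void Save(ServiceState state, string path)
    {
        using (var stream = File.Create(path))
            Serializer.Serialize(stream, state);
    }

    // Call this on service start; returns a fresh state if nothing was saved yet.
    public static ServiceState Load(string path)
    {
        if (!File.Exists(path))
            return new ServiceState();
        using (var stream = File.OpenRead(path))
            return (ServiceState)Serializer.Deserialize(stream);
    }
}
```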
You could use Memcached, or Redis (which also persists its data on disk, but handles it automatically).
http://code.google.com/p/redis/
You could also take a look at this question:
Memcached with Windows and .NET
I don't see why it'd be harder to persist to disk.
Using db4o you can persist the instances you are already working with.
How about using isolated storage and persisting the object that way?
Even if, for instance, you keep the data in the shared memory of some other networked PC, how would you "guarantee" that the networked PC won't hang/restart/halt/etc.? In that case your service would lose the persisted data anyway.
I would suggest storing the data on the local disk, and chances are you'll end up doing that anyway.
Note that, because of the volatile nature of memory (RAM), you cannot reload data that was there before the system restarted, unless you use some mechanism to store and reload it from disk.
--EDIT--
In that case, how about using MSMQ? You can push everything onto the queue, and even if your service gets restarted, it can read the items from the queue and continue onwards.
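A rough sketch of that idea with System.Messaging; the queue name is a placeholder and the payload is just a string here:

```csharp
using System.Messaging;

public static class StateQueue
{
    private const string QueuePath = @".\Private$\MyServiceState"; // placeholder name

    private static MessageQueue Open()
    {
        var queue = MessageQueue.Exists(QueuePath)
            ? new MessageQueue(QueuePath)
            : MessageQueue.Create(QueuePath);
        queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
        return queue;
    }

    // Push each event (or a periodic state snapshot) onto the durable queue.
    public static void Push(string payload)
    {
        using (var queue = Open())
            queue.Send(payload);
    }

    // After a restart, drain whatever is still sitting in the queue.
    public static string TryPop()
    {
        using (var queue = Open())
        {
            try
            {
                var message = queue.Receive(System.TimeSpan.FromSeconds(1));
                return (string)message.Body;
            }
            catch (MessageQueueException)
            {
                return null; // queue empty (receive timed out)
            }
        }
    }
}
```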
I am trying to work out how to calculate the latency of requests from a web app (JavaScript) to a .NET web service.
Currently I am essentially trying to sync both client and server time, so that when hitting the web service I can look at the offset (which would accurately show the 'up' latency).
The problem is that when you sync the times, you have to factor in the latency of the sync itself. So currently I am timing the sync request (round trip) and dividing by 2, in an attempt to estimate the 'up' latency, and then adjusting the sync accordingly.
This works on the assumption that latency is symmetrical, which it isn't. Does anyone know a procedure that would be able to determine specifically the up/down latency of a JS HTTP request to a .NET service? If it needs to involve multiple handshakes that's fine; whatever is as accurate as possible.
Thanks!!
I think this is a tough one - or impossible, to be honest.
There are probably a lot of things you can do to come more or less close to what you want. I can see two ways to tackle the problem:
Use something like NTP to synchronize the clocks and use absolute timestamps. This would be fairly easy, but is of course only possible if you control both server and client (which you probably do not).
Try to make an educated guess :) This would be along the lines what you are doing now. Maybe ping could be of some assistance in any way?
The following article might provide some additional idea(s): A Stream-based Time Synchronization Technique For Networked Computer Games.
Mainly it suggests making multiple measurements and discarding "outliers". But in the end it is not that far from your current implementation, if I understand correctly.
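Here is a sketch of that sample-and-filter idea, written in C# for illustration. GetServerTimeUtc stands in for whatever call returns the server clock, and keeping only the lowest-round-trip sample is one simple way of discarding outliers:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class ClockOffsetEstimator
{
    // Takes several samples and trusts the one with the smallest round trip,
    // on the theory that the least-delayed sample is also the least asymmetric.
    public static TimeSpan Estimate(Func<DateTime> getServerTimeUtc, int samples = 10)
    {
        var measurements = new List<Tuple<double, TimeSpan>>(); // (RTT ms, offset)

        for (int i = 0; i < samples; i++)
        {
            DateTime before = DateTime.UtcNow;
            var watch = Stopwatch.StartNew();
            DateTime serverTime = getServerTimeUtc();   // the actual request to the service
            watch.Stop();

            // Same assumption as in the question: half the round trip is the "up" latency.
            DateTime localAtServerReply =
                before + TimeSpan.FromMilliseconds(watch.Elapsed.TotalMilliseconds / 2.0);
            TimeSpan offset = serverTime - localAtServerReply;
            measurements.Add(Tuple.Create(watch.Elapsed.TotalMilliseconds, offset));
        }

        // Discard "outliers" by simply keeping the lowest-RTT measurement.
        return measurements.OrderBy(m => m.Item1).First().Item2;
    }
}
```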
Otherwise there is some academic material available for a more theoretical approach (by first reading some stuff, I mean). These are some things I found: Time Synchronization in Ad Hoc Networks and A clock-sampling mutual network time-synchronization algorithm for wireless ad hoc networks. Or you could have a look at the NTP-Protocol.
I have not read those though :)