I am new to Redis, so I don't know much about its technical details. Here is my scenario: I am running two websites on the same server and I want Redis to work for both. After searching, I found that I can do this by assigning a different database index to each site on the same server instance, like below:
// In my first website (development)
IDatabase dbOfDev = _conn.GetDatabase(0);
// In my second website (production)
IDatabase dbOfProd = _conn.GetDatabase(1);
This was ideal for me, since I could cache both of my databases in the same instance. But then I bumped into the links What's the Point of Multiple Redis Databases? and How do I change between redis database?, which say "Use of multiple database in same server instance is discouraged and deprecated". Though these links do try to explain the reason behind it, as a beginner I still cannot follow the deeper technical aspects.
Can anyone explain in simpler terms why using multiple Redis databases on the same server instance is discouraged? Also, in simpler terms, how can I manage caching for both my websites on the same server without the approach above?
how can I manage caching of both my websites on same server without the above said approach?
You can use a different key prefix (key tag) for each website. Say the two websites are named A and B. Give every key of website A the prefix A:, and every key of website B the prefix B:. This way, each website has its own key namespace.
SET A:key1 val1
SET A:key2 val2
LPUSH B:key1 1
SADD B:key2 val
Also check this answer for more solutions.
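The prefixing scheme above can be wrapped in a small helper so that application code never builds raw keys by hand. This is only a minimal sketch in Python: the function name namespaced_key and the site names are hypothetical, and a plain dict stands in for the shared Redis instance.

```python
def namespaced_key(site: str, key: str) -> str:
    """Return the key with the website's prefix, e.g. 'A:key1'."""
    if not site or ":" in site:
        raise ValueError("site prefix must be non-empty and contain no ':'")
    return f"{site}:{key}"


# A dict stands in for the shared Redis instance in this sketch.
store = {}
store[namespaced_key("A", "key1")] = "val1"
store[namespaced_key("B", "key1")] = "other"
```

Both websites can use the same logical key name (key1) without overwriting each other, because the stored keys differ.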
Can anyone explain the reason in simpler terms as why using multiple redis db of same server instance is discouraged.
AFAIK, the multiple databases feature is NOT discouraged or deprecated. It's a way to isolate key namespaces for different applications. However, the author of Redis considers "Redis multiple database errors my worst decision in Redis design at all", since it makes Redis internals more complex.
Redis executes commands on a single thread, so compared to multiple databases, multiple Redis instances can take advantage of multiple cores. If you have multiple databases in one Redis instance, you can still only use one core. Also, a Redis instance itself has a small memory footprint, so you don't need to worry that multiple Redis instances will cost you too much.
Redis is very fast, and normally the bottleneck is network bandwidth, NOT CPU. So normally you will NOT gain much by using multiple Redis instances. However, if one of your applications needs to run some slow commands on Redis and you don't want them to block other applications, you can have a separate Redis instance for the slow application and another Redis instance for the fast applications.
Also note that Redis Cluster doesn't support multiple databases.
Personally, I like the multiple database feature. Normally, if I run a single Redis instance rather than Redis Cluster, I put my data into some database other than the default one (database 0), to avoid accidentally logging in to Redis and doing something horrible to the default database. It's also very easy to implement a double buffer with multiple databases: write the data to a new database and, when that's done, use the SWAPDB command to swap the old and new databases atomically and efficiently.
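The double-buffer idea can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not Redis client code: the class TwoDatabases and its swapdb method are hypothetical stand-ins, and against a real server the swap step would be the SWAPDB command.

```python
class TwoDatabases:
    """Toy in-memory stand-in for two Redis logical databases."""

    def __init__(self):
        self.dbs = {0: {}, 1: {}}

    def swapdb(self, a: int, b: int) -> None:
        # Mirrors the effect of SWAPDB: the two databases trade contents in one step.
        self.dbs[a], self.dbs[b] = self.dbs[b], self.dbs[a]


server = TwoDatabases()
server.dbs[0]["report"] = "old"   # readers use database 0

server.dbs[1]["report"] = "new"   # the writer fills database 1 in the background
server.swapdb(0, 1)               # publish: readers of database 0 now see the new data
```

Readers never observe a half-written data set, because they keep reading database 0 until the swap.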
It is not. If you are building a multi-tenant application that supports multiple websites, it does make sense. And if one of the websites needs to scale more rapidly, you can set up a different instance (or cluster) for that one alone, and the migration is much simpler.
Related
We have a relatively large-scale application that uses a relational DB (MSSQL).
After a lot of reading I've decided that I want to examine using MongoDB instead of MSSQL, mainly because of performance and scale issues.
I have read and studied about Mongo but couldn't figure out the answers to the following questions:
Should we do it? Bear in mind we have the time to invest; the only question is "is it good for us?"
How to model our data?
My problem with mongo is that we have a lot of one to many relations in our DB.
After reading this great post (and the second part as well), I've realized a good practice will be to divide the decision into 3 scenarios:
1 to few
1 to many
1 to squillions.
In our db, most of the time we use one-to-many, but the problem is that most of the time it's the same "one".
For example, we have users and transactions tables.
Each user can perform transactions, so basically what I should do is model the user as follows:
{
"name": "John",
...,
"Transactions" : [ObjectId("..."), ObjectId("..."),...]
}
So far it's fine. The problem is that we have a lot more than just transactions; for example, we could have posts, requests and many more features like transactions, and then my users collection becomes huge (more than 25 "columns"). Also, when I want to retrieve a data set I have to run several queries, unlike MSSQL, where I just use a JOIN statement.
Another issue is that I'll have to save a lot of extra data. For example, for each transaction I have to save the terminal ID, and in the report I'll have to show the terminal name. In that case (as I understand it) I have two choices: one is to run two queries, and the other is to save the terminal name as well. In a relational DB this is a simple join.
So maybe for schemas like ours, Mongo (or any other document-based DB) is not the best choice?
I know these are newbie questions :)
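The two choices for the terminal name can be compared side by side. A minimal sketch in Python using plain dicts in place of collections; the field names terminal_id and terminal_name and the sample values are assumptions made for illustration.

```python
terminals = {"t1": {"name": "Front desk"}}

# Choice 1: store only the reference; the report needs a second lookup.
tx_ref = {"amount": 25, "terminal_id": "t1"}
name_via_lookup = terminals[tx_ref["terminal_id"]]["name"]

# Choice 2: copy the name into the transaction; one read, but the copy
# must be updated (or accepted as historical) if the terminal is renamed.
tx_denorm = {"amount": 25, "terminal_id": "t1", "terminal_name": "Front desk"}
name_via_copy = tx_denorm["terminal_name"]
```

Both choices give the report the same name; they differ in the number of reads and in who is responsible for keeping the copy current.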
We use c# for our server side (ASP.Net Web API)
Thanks in advance!
You can face some serious issues when modeling your data with approaches 2 and 3:
For one-to-many you may face data inconsistency and/or eventual consistency. Here you store an index (an array of references) to external documents inside the document. So, in your example, adding a new transaction takes two requests: create the transaction, then add its reference to the user (update the document). MongoDB originally guaranteed atomicity only at the single-document level (multi-document transactions were added in version 4.0), so without one your application can, for some reason, create a transaction but fail to add its reference to the user. The cause can be app failures, network problems, bugs and so on. Of course, you can simulate a db transaction in the app with a try/catch block that cleans up the data when an error occurs. It helps, but not fully, because the app can go down between the two requests.
So, if your app is under high load, after some time you can accumulate a number of "dead" transactions that are not linked to any user. This is not a big problem if your app never queries transactions directly, only via users: you will just have useless data in the db. Otherwise you will have data inconsistency.
To fix that you need to create a background job that performs the cleanup. So for some period of time your data can be inconsistent, which is eventual consistency. For some applications that is OK, for others it is not.
You can face the same problem when deleting transactions.
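The background cleanup described above can be sketched as follows. This is only an illustration in Python over in-memory data: the function name find_orphans and the document shapes are assumptions, and a real job would query the database instead.

```python
def find_orphans(users, transactions):
    """Return ids of transactions that no user document references."""
    referenced = set()
    for user in users:
        referenced.update(user.get("transactions", []))
    return sorted(tx_id for tx_id in transactions if tx_id not in referenced)


users = [{"name": "John", "transactions": ["t1", "t2"]}]
# "t3" was created, but the app crashed before the user document was updated.
transactions = {"t1": {}, "t2": {}, "t3": {}}

for tx_id in find_orphans(users, transactions):
    del transactions[tx_id]  # the periodic cleanup step
```

A real job would also skip very recent transactions, so that it does not delete one whose user update is still in flight.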
I agree that a document with 25 arrays of references ("columns") does not look very good. Working with such objects manually will be harder (testing, manual data fixes and so on).
One to squillions doesn't have this effect, but you need indexes to query efficiently. For a large and sharded db, performance can be bad.
In general, I'd say document dbs are pretty good if your app works mostly with one document (aggregate) at a time, doesn't have a lot of references to other docs, and doesn't need transactions across docs. Denormalization can also be a source of inconsistency.
Key-value data is very easy to scale. Document dbs are one step away from a key-value data store. Column-oriented dbs are even closer to key-value stores, so they can scale even better.
Also, I recommend considering the following measures to improve your SQL Server db performance:
Caching: perhaps you can cache some of your app's aggregates instead of assembling them (with joins) in the SQL db every time. For instance, Stack Overflow uses SQL Server and Redis for caching aggregates (questions with answers, comments and so on).
Tune query performance with indexes, db structure, denormalization and so on.
If your db is hosted on an on-premises SQL Server, then additional memory, SSD disks, table partitioning, data compression and replication can help. As a rule, SQL Server performs well with these approaches for dbs up to 1 TB.
CQRS approach.
Consider storing your app data in different databases. Every type of dbs has its own strong and weak sides. Document DB is good for storing aggregates, SQL db – for relational data and so on. Complex apps as a rule use a few db types.
I'm creating a website content management system which stores a whole bunch of website articles and lets users modify these articles through the system. I'm a typical SQL Server developer, but I'm thinking this system could be done in DocumentDB. We are using C# plus Web API to do the reads and writes. I'm testing different data access technologies to see which one performs better. I have been trying LINQ, LINQ lambda, SQL and stored procedures.
The thing is, all these query methods seem to take around 600ms to 700ms when I test via Postman. For example, one of my tests is a simple GET http://localhost:xxxxxx/multilanguage/resources/1, which takes 600ms+. That was only a 1 KB document, and there are only 5 documents stored in my collection so far. So what I want to ask is: is there a quicker way to query DocumentDB than this? The reason I ask is that I did something similar in SQL Server before (not querying documents; it was for relational tables). A much more complex query in a stored procedure on multiple joined tables takes only around 300ms. So I guess there should be a quicker way to do this. Thanks for any suggestions!
Most probably, if you change the implementation to a stub you will get the same performance, since what you are actually measuring is the connection time between your server and the client (Postman).
There are a couple of things you can do, but do keep in mind that DocumentDB and other NoSQL solutions behave very differently from standard SQL Server. For example, the more nodes and RAM available to DocumentDB, the better it will perform overall. The development instance of DocumentDB on Azure is understandably going to use fewer resources than a production instance. Since Azure takes care of scaling, one way to think about it is that the more data you have, the better it will perform.
That said, something you are probably not used to is sharing your connection object across your whole application. That avoids the start-up penalty every time you want to get your data. Summarizing Performance Tips:
Use TCP connection instead of HTTPS when you can
Use await client.OpenAsync() to avoid pausing on start up latency for the first request
Connect to the DocumentDB in the same region (keep in mind if you host across regions)
Use a singleton to access DocumentDB (it's threadsafe)
Cache your SelfLinks for quick access
Tune your page sizes so that you get only the data you intend to use
The more advanced performance tips cover index policies, etc. DocumentDB and other NoSQL databases behave differently than SQL databases. That also means your assumptions about how the APIs work are probably wrong. Make sure you are testing similar concepts. The SQL Server database connection object needs you to create/dispose of objects for each transaction so it can return those connections back to a connection pool. Treating DocumentDB the same way is going to cause the same kind of performance problems as if you didn't use a connection pool.
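The "use a singleton" tip can be sketched in a language-neutral way. The following Python example is an assumption-laden illustration: FakeDocumentClient is a hypothetical stand-in for a client object that is expensive to create, not the DocumentDB SDK, and the endpoint is a placeholder.

```python
from functools import lru_cache


class FakeDocumentClient:
    """Hypothetical stand-in for a database client that is costly to create."""

    instances_created = 0

    def __init__(self, endpoint: str):
        FakeDocumentClient.instances_created += 1
        self.endpoint = endpoint


@lru_cache(maxsize=None)
def get_client() -> FakeDocumentClient:
    # The body runs once; later calls return the cached instance.
    return FakeDocumentClient("https://example.invalid")


first = get_client()
second = get_client()
```

Every request handler calls get_client() and receives the same object, so the start-up cost is paid only on the first call.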
A co-worker and I are working on some Pharmacy software (in C#) which deals with the management of patient profiles, patient drug prescriptions, etc. All of these different sets of data are stored in a sql server database (we're using 2008 standard but future versions are fine too). Each store has its own sql server instance on a local machine.
Our Goal:
We want to have "Store A" be able to access "Store B's" databases if need be. Basically in the event that perhaps a pharmacy customer is out of town and visits one of the other pharmacy branches.
Things I've thought of:
My initial thought was to keep an online instance of SQL Server which could be accessed through a DNS name (or perhaps an IP). I was trying to figure out the best way to keep these in sync and I came across SQL Server replication. The problem is that I was going to use transactional replication with updating subscribers, but since it's deprecated it's not really a long-term option anymore. Microsoft suggests using p2p replication, but that requires Enterprise edition and we're really trying to avoid that if we can. I wanted to use a transactional type of replication since it does a much better job of keeping records consistent (not having to wait for something like a merge agent job to run every hour or so).
Something I've thought about more recently is maybe having an internet based sql server instance, which would contain nothing but linked servers back to each stores local machine. I wouldn't have to worry about sync problems if other stores just worked directly off each others local machines. But I've read of a lot of people saying that this is a horrible security vulnerability so I'm not sure if this is even a plausible idea but I think maybe there's some way to make this work?
Anyways so this is the basic gist of what we're trying to do. I don't know if replication or linked servers would be the better route to take.
Edit:
What about bi-directional replication? I was reading a little bit about this but I'm a little unsure about if this is what I need or not. I don't want to have to stagger primary keys between servers or anything, since they are pretty important in identifying prescription numbers and stuff like that. But if I could do bi-directional replication, that could be good too.
Not really an answer but I have more space...
SQL Azure is the 'cloud' version of SQL Server. A VPN is a way of creating your own private network over the internet. Do some research on these terms. Many applications are going to the cloud nowadays. You should also seriously consider how likely it is that a store will have no internet access.
With regards to replication, you can 'roll your own' replication if you own this application and you are happy to support it.
The basic premise is:
Create a trigger on every table which writes the PK of every change to a log table
Create a process which manages copying and merging only changed info (based on the log table) using subscribers and publishers
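The two steps above can be sketched with SQLite from the Python standard library standing in for SQL Server. The table and trigger names (patient, change_log, patient_ins, patient_upd) are hypothetical, and SQL Server trigger syntax differs; this only shows the shape of the idea.

```python
import sqlite3

store_a = sqlite3.connect(":memory:")
store_b = sqlite3.connect(":memory:")
for db in (store_a, store_b):
    db.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT)")

# Step 1: triggers record the primary key of every changed row.
store_a.executescript("""
    CREATE TABLE change_log (patient_id INTEGER);
    CREATE TRIGGER patient_ins AFTER INSERT ON patient
        BEGIN INSERT INTO change_log VALUES (NEW.id); END;
    CREATE TRIGGER patient_upd AFTER UPDATE ON patient
        BEGIN INSERT INTO change_log VALUES (NEW.id); END;
""")


def sync(source, target):
    """Step 2: copy only the rows listed in the log, then clear the log."""
    rows = source.execute(
        "SELECT id, name FROM patient "
        "WHERE id IN (SELECT patient_id FROM change_log)"
    ).fetchall()
    target.executemany("INSERT OR REPLACE INTO patient VALUES (?, ?)", rows)
    source.execute("DELETE FROM change_log")
    return len(rows)


store_a.execute("INSERT INTO patient VALUES (1, 'Ann')")
store_a.execute("UPDATE patient SET name = 'Anne' WHERE id = 1")
copied = sync(store_a, store_b)
```

The sketch ignores deletes and conflicting edits made at both stores, which is where most of the support effort in a home-grown scheme tends to go.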
I am making a member-based web app in ASP.NET MVC 3 and I am trying to plan ahead. At first our user base will not be huge, but as with any software, a sudden volume spike is always a possibility.
Thinking ahead to this scenario, I know that the database is the bottleneck in most web apps. We are using MSSQL 2008 R2. We will have dedicated servers with several client databases; each client has their own database, so if one server begins to bottleneck we can scale vertically or move some of the databases to a new server and begin filling it up.
To access the databases we primarily use LINQ to SQL, and we are currently refactoring some of our code to use the IQueryable mechanisms to lazy-load content. But each page contains quite a bit of content from various parts of the database.
We also have a few large databases that are used for widgets in the program that rarely change but have millions of rows. The goal with those is to somehow sync them to the primary source and distribute them across several machines and then load balance those servers.
With this layout should I even worry about caching, or will the built-in caching mechanisms in MSSQL be sufficient?
If so, where should I begin? I have looked briefly at AppFabric, but it looks as though it is for Azure only?
Resources:
How to cache data in a MVC application
http://stephenwalther.com/blog/archive/2008/08/28/asp-net-mvc-tip-39-use-the-velocity-distributed-cache.aspx
http://stephenwalther.com/blog/archive/2008/08/29/asp-net-mvc-tip-40-don-t-cache-pages-that-require-authentication.aspx
Lazy loading is a performance killer. It's better to load the entire object graph with one join than to lazy-load other properties. This is especially the case with a list of objects: if you iterate, you'll end up lazy loading for each item in the list. Furthermore, every call to the db has overhead. Fewer calls = better performance.
SO was a top 1000 website before it needed two database servers. I think you'll be OK.
If your revenue model says "each client will have its own database", then your scaling issues should be really easy to solve. It sounds like you already have a plan to scale up with more servers as your client base increases. What's the problem?
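The cost of lazy loading a list can be made concrete by counting round trips. A minimal sketch using SQLite from the Python standard library; the author and post tables and the function names are hypothetical and only illustrate the pattern.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO post VALUES (1, 1, 'First'), (2, 2, 'Second'), (3, 1, 'Third');
""")


def titles_lazy():
    """One query for the posts, then one more query per post (N+1 round trips)."""
    queries = 1
    result = []
    posts = db.execute("SELECT title, author_id FROM post ORDER BY id").fetchall()
    for title, author_id in posts:
        row = db.execute("SELECT name FROM author WHERE id = ?", (author_id,)).fetchone()
        queries += 1
        result.append((title, row[0]))
    return result, queries


def titles_joined():
    """The same data fetched in a single round trip."""
    rows = db.execute(
        "SELECT post.title, author.name FROM post "
        "JOIN author ON author.id = post.author_id ORDER BY post.id"
    ).fetchall()
    return rows, 1
```

Both functions return the same rows, but the lazy version issues one query per post on top of the initial one, and that count grows with the list.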
Caching on the web tier is usually the first scaling fix you'll have to worry about. You probably don't need to do a fresh db call with each page request.
Overall this sounds like a lot of premature optimization. Your traffic hasn't reached a point where you need to be worried about scaling. Make these kinds of decisions at the last second possible.
The database cache is different from most caches: it can of course load used data into memory and re-use query plans, but that isn't really a cache as such.
AppFabric is definitely not just Azure; after all, if it was, you wouldn't be able to install it (and use it) locally :) But in truth there is little between AppFabric, redis and memcached (the latter lacks persistence, of course).
But I think you should initially look at using the inbuilt ASP.NET caching: both data caching via HttpContext.Cache, and caching of entire responses (or, in MVC 3, partials). Obviously you should have a broad idea of what data is used heavily by lots of requests and is safe to re-use: cache that!
Just make sure you treat all cached data as immutable (if you need to update the cache, re-add a modified value; don't modify the existing objects). The reason: it won't work the same if you start needing to use distributed caching, as that uses serialization, and any changes you make won't be seen by the next request.
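The "re-add a modified value" rule can be sketched like this. The example is in Python with a dict as the cache; the Profile type and the key user:1 are hypothetical, and a frozen dataclass is just one way to make accidental mutation fail loudly.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Profile:
    name: str
    visits: int


cache = {"user:1": Profile("John", 1)}

# A request that read the value earlier keeps a consistent snapshot.
seen_by_request = cache["user:1"]

# Update by re-adding a modified copy, never by mutating the cached object.
cache["user:1"] = replace(cache["user:1"], visits=2)
```

Because the update goes through the cache's own "set" operation, the same code keeps working when the cache later becomes a distributed one that serializes values.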
background: we've got a number of server processes and client apps that are used entirely internally, in a fairly controlled environment. we capture a significant amount of data every day that goes into a couple database machines. most everything is c#, with a few c++ apps.
just about every app has some basic (if not extensive) dependence on database data, whether it's for historical data, daily-calculated values, or assorted parameters. as the whole environment has gotten a bit more sprawling, I've been wondering about the sense in sticking an intermediary in between all client and server apps and the database, a sort of "database data broker". any app that needs values from the db makes a request to the data broker, instead of a dll wrapper function that calls a stored proc.
one immediate downside is that the data would make two trips across the network: from db to broker, and from broker to calling app. seems like poor form, but the amount of data would be small enough in each request that I'm ok with it as far as performance goes.
one (seeming) upside is that it would be trivial to set up a test environment, as it would entail just setting up a test data broker, and there's no maintaining of db connection strings locally anywhere else. also, I've been pondering creating a mini request language so you wouldn't have to enumerate functions for each dataset you might request (instead of GetX() and GetY(), there would be Get("name = X")).
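a rough sketch of what such a Get("name = X") call could look like, in Python for brevity. the function name get, the request grammar and the sample datasets are all assumptions made for illustration; a real broker would fetch from the database instead of a dict.

```python
def get(request: str, datasets: dict):
    """Parse a request of the form 'name = X' and return dataset X."""
    field, sep, value = request.partition("=")
    if sep != "=" or field.strip() != "name":
        raise ValueError(f"unsupported request: {request!r}")
    return datasets[value.strip()]


# Stand-in for the data the broker would load from the database.
datasets = {"X": [1, 2, 3], "Y": [4, 5]}
```

adding a new dataset then means adding an entry on the broker side, not a new GetZ() wrapper in every client.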
am I over-engineering this, or is it possibly a worthy architecture?
edit: thanks for all the great comments so far, great food for thought.
It depends on what you're trying to accomplish with it. According to Rocky Lhotka, you should only add a tier if you are forced to, kicking and screaming all the way.
I agree with him: don't tier unless you need to. I think there are valid reasons to add additional tiers, usually for purposes of security, scalability and maintainability. The question becomes: is yours a valid reason?
It looks like the major reason is maintainability. Does it outweigh the benefits you get by not having the tier?
only you can answer these:
what are the benefits of doing this?
what are the problems/risks of doing this?
do you need this to make testing easier or even possible?
if you make this change and when it goes live and crashes will you be fired?
if you make the changes and it goes live will you get a promotion?
etc...
As the former architect of a system that also used a database heavily as a "hub," I can say that there are several drawbacks that you should be aware of. Our system used databases:
As a transaction store (typical OLTP stuff)
As a staging queue (submitted but unprocessed transactions)
As a historical data store (results of processed transactions)
As an interoperation layer (untranslated commands or transactions issued from other systems)
One of the major drawbacks is ownership costs. When your databases become the single point of failure for so many types of operations, it becomes necessary to ensure that they are all hosted in high-availability environments. This is not only expensive from a hardware perspective, but it is also expensive to support deployments to HA environments, since developers typically have very limited visibility into the internals.
A second drawback is that you have to seriously design integrity in to all of your tables. In a typical SOA environment, you have complete control over how data is modified. When you expose it through database tables, you must consider that any application with the right credentials will have the ability to modify data. Because of this, you must carefully consider utilitarian implementations of constraints. If you had a single service managing persistence, you could be much looser in constraints on the database and enforce them in code.
Third, if you ever want to expose any functionality that the database tables currently allow you to provide to outside parties, you must write service code anyway, so you might be better served doing it strategically as opposed to reacting to requests.
Fourth, UI interaction directly with the data layer creates security risks, especially if the client is a thick client.
Finally, writing code that responds to events (service calls) is much easier than polling code. Typically, organizations that rely heavily on database polling end up reinventing the wheel every time a new project requires a new "monitoring service." It can be avoided by creating a "framework," but those have their own pitfalls (primarily around prescription versus adoption).
This is just a laundry list of problems I have encountered. It's not necessarily meant to dissuade you from using databases for these functions, but it helps to know the dangers ahead of time so you can at least plan for them if they ever do become issues.
EDIT
Just thought of another scenario that caused us pains. Versioning your changes can be difficult. For example, if you need to change the shape of a table (normalize/denormalize), it has a cascading effect if multiple applications rely on it. In a SOA scenario, it is much easier, because you can keep your old API, change the internal interaction so that it works with the changed tables, and allow consumers to migrate to the new version on their own schedule.
A data broker sounds like a really good way to abstract out the multiple data sources for your apps. It would be easy to consolidate, change repositories, or otherwise move data around if needed in the future.
I may be misunderstanding something, but it seems to me that you should consider some entity framework. That is a framework you can use to "map" your interaction with the db to domain objects. That way you work locally on domain objects that get filled from your db, and when it is time to persist the state of your objects to the database, the framework handles all the connections back and forth. This way you can also easily mock these domain objects for unit testing without needing a db connection.
Check out NHibernate for a good entity framework alternative.
If you already have the database-related know-how, I think it's not a bad decision.
Good things that I can think of:
if the data model is consistent you can plug in new tools easily without making any changes in the other apps.
maybe you can keep the database running more reliably than your apps, so if one of them fails, the others can still keep working.
you can make backups and rollbacks using the database tools.
you can do emergency fixes manipulating the data directly with sql or some visual tool.
But if you have to learn new frameworks along the way, maybe the benefits are not worth the extra initial effort.
"any app that needs values from the db makes a request to the data broker"
When database technology was being invented over 40 years ago, the people doing that inventing had ideas along the lines of "any app that needs values from the db makes a request to the dbms".
Have you ever pondered the possibility that YOU ALREADY HAVE a "data broker", and that there might be very little added value in creating a second one of your own ?