Performing search in StackExchange.Redis - c#

I'm using the StackExchange.Redis .NET provider to store and retrieve values. I would like to know how I can search for certain records inside Redis (as with any database, the search needs to be executed on the Redis instance, not in the .NET application).
Example:
public class Employee
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public int Age { get; set; }
    public int Salary { get; set; }
}
If I have 100,000 employee records stored in the Redis cache server as a .NET "List<Employee> lstEmployee = new List<Employee>();" and would like to fetch only the records whose age > 50 and salary > 5000, how should I code for it?
Disclosure: I'm just getting started with Redis using this example.

First, a "cache server" is not intended to be used as a queryable store. If we assume instead that you mean simply a nosql backend, then ... well, frankly, that doesn't sound like the sort of query I would try and do via redis. The point of redis is that you build whatever indexes you need yourself. If you want ordered range queries (the age / salary), then a sorted set and ZRANGEBYSCORE is probably a viable option; however, intersecting these two queries is more difficult. You could try asking the same question ib the redisdb google-group, but just as a general redis question - not specific to any client library such as SE.Redis. If the operations exist ib redis, then you can use the client library to invoke them.
I'm wondering, however, whether "elastic" might be a better option for what you describe.
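To make the sorted-set idea concrete, here is a minimal sketch using StackExchange.Redis; the key names (emp:byAge, emp:bySalary) and the member format are my own assumptions for illustration, not part of the original question:

using System;
using System.Linq;
using StackExchange.Redis;

class RedisRangeQueryDemo
{
    static void Main()
    {
        var muxer = ConnectionMultiplexer.Connect("localhost");
        IDatabase db = muxer.GetDatabase();

        // Index each employee id in two sorted sets, scored by age and by salary
        // (the employee object itself could be stored under its own key or hash).
        db.SortedSetAdd("emp:byAge", "employee:42", 53);
        db.SortedSetAdd("emp:bySalary", "employee:42", 7500);

        // Each range query (ZRANGEBYSCORE) executes on the Redis server.
        RedisValue[] olderThan50 = db.SortedSetRangeByScore(
            "emp:byAge", 50, double.PositiveInfinity, Exclude.Start);
        RedisValue[] paidOver5000 = db.SortedSetRangeByScore(
            "emp:bySalary", 5000, double.PositiveInfinity, Exclude.Start);

        // Intersecting the two results still happens client-side here, which is
        // exactly the awkward part mentioned above.
        var matchingIds = olderThan50.Intersect(paidOver5000);
        Console.WriteLine(string.Join(", ", matchingIds));
    }
}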

Related

How to prevent to update changed data

I use SQL Azure and have an application which syncs data with an external resource. The data set is large, approximately 10K records, so I fetch it from the DB once, update whatever is necessary over the course of several minutes, and then save the changes. It works, but there is a problem with simultaneous access to the data: if another service makes changes during those minutes, its changes will be overwritten.
In most cases this concerns fields which my application does not even touch!
So, for example, my Device table:
public partial class Device : BaseEntity
{
    public string Name { get; set; }
    public string IMEI { get; set; }
    public string SN { get; set; }
    public string ICCID { get; set; }
    public string MacAddress { get; set; }
    public DeviceStatus Status { get; set; }
}
The first service (an application with a long-running process) can modify SN, ICCID and MacAddress, but not Status; the second service, vice versa, can modify only Status.
Code to update in the first service:
_allLocalDevicesWithIMEI = _context.GetAllDevicesWithImei().ToList();
(it gets entities, not DTOs, because there really are many fields that can be changed)
and then:
_context.Devices.Update(localDevice);
for every device that should be changed,
and, eventually:
await _context.SaveChangesAsync();
How can I mark the Status field so that it is excluded from tracking and never updated by this service?
One simple way to avoid updating the Status field from the first service is to create an update entity that does not include the Status field, and another update entity for the second service which does include it.
Another way to solve this is to override the SaveChangesAsync method and control the update logic yourself, but I think that is complex and the behavior is implicit; it will not be easy for others to understand your code.
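As a rough illustration of that second idea - overriding SaveChangesAsync so the first service never writes Status - here is a sketch, assuming EF Core (which the Update(...) call above suggests); it is only one way to structure this:

using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public class FirstServiceContext : DbContext
{
    public DbSet<Device> Devices => Set<Device>();

    public override Task<int> SaveChangesAsync(CancellationToken cancellationToken = default)
    {
        // The first service does not own Status, so never write it, even if
        // the tracked entity carries a stale value read minutes ago.
        foreach (var entry in ChangeTracker.Entries<Device>())
        {
            if (entry.State == EntityState.Modified)
            {
                entry.Property(nameof(Device.Status)).IsModified = false;
            }
        }

        return base.SaveChangesAsync(cancellationToken);
    }
}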
To avoid overwrites, you can add a RowVersion to your entities. This is so-called optimistic concurrency: it throws an error when an overwrite would happen, and you can retry the operation if someone has already changed something. Alternatively, you can raise your transaction isolation level to something like RepeatableRead/Serializable to lock these rows for the entire operation (which of course has a huge performance impact and causes timeouts). The second option is simple and good enough for background jobs and distributed transactions; the first one is more flexible and usually faster, but harder to implement across multiple endpoints/entities.
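A minimal sketch of the RowVersion approach (assuming EF with SQL Server/SQL Azure; the property name is illustrative):

using System.ComponentModel.DataAnnotations;

public partial class Device
{
    // Maps to a SQL Server rowversion column. EF adds it to the WHERE clause
    // of every UPDATE and throws DbUpdateConcurrencyException when another
    // service has changed the row in the meantime, so nothing is silently
    // overwritten; catch that exception, reload the current values and retry.
    [Timestamp]
    public byte[] RowVersion { get; set; }
}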

Update Database after years of utilisation

EDIT 1: story example at the end
Years ago, we created tables in order to count how many products there were in our boxes.
There are two simple tables:
product (
    code VARCHAR(16) PK,
    length INT,
    width INT,
    height INT
)
box (
    pkid INT IDENTITY(1,1),
    barcode VARCHAR(18),
    product_code VARCHAR(16) FK,
    quantity INT
)
And there are two associated classes:
public struct Product
{
    public string Code { get; set; }
    public int Length { get; set; }
    public int Width { get; set; }
    public int Height { get; set; }
}
public struct Box
{
    public int Id { get; set; }
    public string BarCode { get; set; }
    public Product Product { get; set; }
    public int Quantity { get; set; }
}
After years, we need to put multiple different products in the same box, so we now need this:
product (
    code VARCHAR(16) PK,
    length INT,
    width INT,
    height INT
)
-- box changed
box (
    pkid INT IDENTITY(1,1),
    barcode VARCHAR(18)
)
-- stock created
stock (
    box_pkid INT M-PK FK,
    product_code VARCHAR(16) M-PK FK,
    quantity INT
)
and this:
public struct Product
{
    public string Code { get; set; }
    public int Length { get; set; }
    public int Width { get; set; }
    public int Height { get; set; }
}
public struct Box
{
    public int Id { get; set; }
    public string BarCode { get; set; }
    public Dictionary<Product, int> Content { get; set; } // <-- this changed
    public int Quantity { get; set; }
}
But after years, we have a lot of code, maybe with duplicates in some dark places left behind by departing collaborators. I am a trainee, so I am asking for my future experience, in order to avoid this later.
What could be a solution to update our schema while keeping data integrity safe, even with millions of rows in the DB?
Example:
In 2014, we needed to store 10 Romeo and Juliet books in one box. If we had some Hamlet books, then we put them in another box. All 10 Romeo and Juliet books were the 'same' product (same cover, same content, same reference).
Today, we want to store, let's say, different Shakespeare books in the same box. Or maybe different love books. Or even Romeo and Juliet books AND figurines? So different products together: we should change the box table and the Box class, shouldn't we?
You have many challenges; I'd split them into two high-level groups.
Firstly, how do you change your application at the code level, and secondly, how do you migrate your data from the old schema to the new one.
How do you change your code?
The first question is: can you be 100% certain that the classes you list are the only ways the data is accessed and modified? Are there any triggers, stored procedures, batch jobs, or other applications? I'm not aware of any way of finding this out other than by trawling through both the database schema artifacts, and the code base.
Within your "own" application, you have a choice. It's usually better to extend than modify your interface. In practical terms, that means that instead of changing the public Product Product { get; set; } signature to handle a dictionary, you keep it around, and add public Dictionary<Product, int> Content { get; set; } - if you can guarantee that the old method still works. This would mean limited re-writing all the dependencies of your class - you only have to worry about clients that need to understand that there could be more than 1 product in a box.
This allows you to follow a "lots of small changes, but the existing code continues to work" model; you can manage this via feature toggles etc. It's much lower risk - so the lesson here is "design your solution to be open to extension, but closed to change".
In this case, it doesn't seem possible - the "set" method may be okay (you can default that to a "one product in a box" solution), but the "get" method would have no graceful way of handling the case where you have more than 1 product in a box. If that's true, you change the class, and look for all the instances where your code won't compile, and follow the chain of dependencies.
For instance, in a typical MVC framework you'd be changing the model here; this should cause the controller to report a compile error. In resolving that error, you will almost certainly modify the signature of the controller methods. This in turn should break the view. So you follow that chain; doing this means your schema change becomes a "big bang, all-or-nothing" release. This is typically stressful for all involved...
How do you release your change?
This depends hugely on which of the two options you've chosen. #gburton's answer covers the database steps; these are necessary in both code options.
The second challenge is releasing new versions of your software; if it's a desktop client, for instance, you must make sure all clients are updated at the same time as your database change. This is (usually) horrible. If it's a web application, it's usually a little easier - fewer client machines to worry about.
Safely updating a legacy system is a classic problem. I'm guessing from your post that there isn't a nice safe dev copy of the DB, or at least one that is up to date, or you would already have a process to apply here.
I've written this in a system agnostic way even though you're obviously using MS SQL Server.
The key is to use caution, and ensure you are never 100% stuck if something goes wrong.
1. Back up the old DB. Ensure you know how to do this without breaking anything.
2. Restore that backup into a new location.
3. Figure out a test plan (this can be the longest part of the job).
4. Make the changes to the new copy of the DB (don't touch the live one).
5. Run through your test plan to ensure nothing has been broken.
If step 5 showed some errors, you just have to work through them. Once this is done, you have the scary part. The backup restore drill is critical here.
6. Take a backup of the live database (your previous backup is probably out of date; you want as fresh a backup as possible to reduce data loss).
7. Run a backup restore drill to make 100% sure you can recover.
8. Apply the changes to the live database.
9. Re-run your tests.
Recovering a database down to the individual transaction is possible with many database engines. Consider using that process for step 6 if possible. How to achieve this would be a separate question.

Entity Framework Best Way To Represent Arbitrary Data

I'm looking for a database solution in which some parts of my data may become unstructured.
More specifically, I'm using Entity Framework 6, but really I suppose this is more of a SQL rooted question.
Suppose I have a collection of generic geometric objects (GeoObj), each of which can be represented by an arbitrary number of values (a collection of ints).
This is one solution, but I'm really not sure about the correctness/efficiency:
public class GeoObj
{
    public int ID { get; set; }
    public virtual ICollection<GeoValue> Values { get; set; }
}
public class GeoValue
{
    public int ID { get; set; }
    public int Value { get; set; }
}
Now the problem I see is that since I expect a lot of GeoObjs, and each GeoObj has a good number of GeoValues, the GeoValues table will get HUGE.
Will this slow down performance significantly?
Is there a better solution?
Thanks!
You're correct in your suspicion that the potential performance issue is ultimately rooted in your SQL database server. Are you using MS SQL Server?
The answer will be found in testing the performance of your specific version of SQL Server running on the hardware and configuration of the machine hosting the database.
As far as I know, Entity Framework doesn't have facilities that support control of table partitioning and things like that to work around the bottleneck of a HUGE table. That is in the scope of the database.
Check out https://msdn.microsoft.com/en-us/library/ms190787.aspx for some info on that.

Storing the value in a table column vs Calculating the value from navigation property

I need advice on using Entity Framework 6. Suppose the website features many products with many reviews (think Amazon.com). Assuming most visitors view products far more often than they write reviews, if I want to display the average user rating for each product, should I add an AverageReviewRating column to store its value and (supposedly) speed up query performance?
Is this a good or bad practice? The alternative would be to access each Review from the navigation property and calculate the average rating from that. What's the recommended approach?
public class Product
{
    public int ProductID { get; set; }
    public string Name { get; set; }
    // Should I store the rating in a pre-calculated column or not?
    public int AverageReviewRating { get; set; }
    public virtual ICollection<Review> Reviews { get; set; }
}
I would say that if it is something huge like Amazon, you can pre-store the calculated value.
This helps with performance while fetching. You can also delegate the task of updating the average to some kind of background process.
I would not store the average in a separate field because:
- I will need to update it frequently
- I will not be saving much in terms of performance
If you query through an Entity Framework context and request an average over the review properties, the reviews themselves will not be fetched; Entity Framework will generate a SQL SELECT statement returning the average.
Something like: SELECT AVG(ReviewScore) FROM ReviewsTable WHERE ReviewedProductId = ProductId
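In LINQ terms, that query could look like the sketch below; the Rating property and the StoreContext/Products names are assumptions, since the Review class and the context aren't shown in the question:

using System.Linq;

public static class ProductRatings
{
    public static double GetAverageRating(StoreContext context, int productId)
    {
        // EF translates this into a single SELECT with AVG(...); no Review
        // rows are materialized in the application.
        return context.Products
            .Where(p => p.ProductID == productId)
            .Select(p => p.Reviews.Average(r => r.Rating))
            .Single();
    }
}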
Furthermore, database denormalization should only be used when you run into serious performance problems.

Which ORM should I use together with ServiceStack and an existing database

I am currently developing a web service which provides basic CRUD operations on business objects. The service will be used by legacy applications which currently use direct database access.
I decided to use ServiceStack instead of WCF due to ServiceStack's great architecture.
However, now I am trying to decide whether to use OrmLite, nHibernate or Entity Framework to access the existing legacy database.
Requirements for the ORM are as follows
Support for joins
Support for stored procedures
I already tried OrmLite (as it's fast and already included with ServiceStack). The only way I managed to join two tables was by using SQL (not an option). Is there any better way?
// #stackoverflow: This is my POCO DTO
public class Country
{
    public long Id { get; set; }
    public string Alpha2 { get; set; }
    public string Alpha3 { get; set; }
    public string ShortText { get; set; }
    public string LongText { get; set; }
}

public class CountryRepository : ICountryRepository
{
    // #stackoverflow: This is the query to join countries with translated names stored in another table
    private const string CountriesSql =
        @"SELECT C.Id, C.Alpha2, C.Alpha3, L.ShortText, L.LongText FROM COUNTRY AS C INNER JOIN LOCALIZATION AS L ON C.LocId = L.Id WHERE (L.Lang_Id = {0})";
    private const string CountrySql = CountriesSql + " AND C.Id={1}";

    private IDbConnection db;

    public IDbConnectionFactory DbFactory { get; set; }

    private IDbConnection Db
    {
        get { return db ?? (db = DbFactory.Open()); }
    }

    public List<Country> GetAll()
    {
        return Db.Select<Country>(CountriesSql, 0);
    }

    public Country GetById(long id)
    {
        return Db.SingleOrDefault<Country>(CountrySql, 0, id);
    }
}
The example above shows one of the simple business objects. Most others require Insert, Update, Delete, multiple Joins, and Read with many filters.
If all you need are joins (lazy loading or eager loading) and stored procedure support, and you want to get set up quickly, then Entity Framework and nHibernate are great options. Here is a cool link about Entity Framework and the repository and unit of work patterns: http://blogs.msdn.com/b/adonet/archive/2009/06/16/using-repository-and-unit-of-work-patterns-with-entity-framework-4-0.aspx
If you are very concerned with performance and want more control over how your classes look (i.e. POCOs) and behave, then you can try something more lightweight like OrmLite or Dapper. These two are just thin wrappers with fewer features, but they will give you the best performance and the most flexibility -- even if that means writing some SQL every once in a while.
You can also use hybrid approaches. Don't be afraid to mix and match. This will be easiest when using POCOs.
I think the important thing is to code for your current database and current needs. However, do so using proper interfaces, so that if the time comes to switch to a different database or storage mechanism, you simply create a new data provider and plug it in.
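To illustrate the Dapper option mentioned above, here is a minimal sketch reusing the Country POCO and the join query from the question (the connection string is a placeholder and @langId is just an illustrative parameter name):

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using Dapper;

public class DapperCountryRepository
{
    private const string CountriesSql =
        @"SELECT C.Id, C.Alpha2, C.Alpha3, L.ShortText, L.LongText
          FROM COUNTRY AS C
          INNER JOIN LOCALIZATION AS L ON C.LocId = L.Id
          WHERE L.Lang_Id = @langId";

    public List<Country> GetAll(long langId)
    {
        // Connection string is illustrative only.
        using (var connection = new SqlConnection("Server=.;Database=Legacy;Trusted_Connection=True"))
        {
            // Dapper maps each row onto the Country POCO by column name.
            return connection.Query<Country>(CountriesSql, new { langId }).ToList();
        }
    }
}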
OrmLite supports primitive join functions using expressions. The new JoinSqlBuilder class can help with this. For SPs, I have added a new T4 file to generate the corresponding C# functions. Currently the SP generation code supports SQL Server only; if you are using any other DB, you can easily add support for it.
You might consider LLBLGen Pro -- it's got great support for database first design and also has designer tools that speed up getting started if you use nHibernate or EF. But it is $$.
http://llblgen.com
As a follow up to this Matt Cowan has created an AWESOME template generator for building this sort of thing with LLBLGen. Check out the blog post here:
http://www.mattjcowan.com/funcoding/2013/03/10/rest-api-with-llblgen-and-servicestack/
and demo here:
http://northwind.mattjcowan.com/
The demo is entirely autogenerated!
Also check this comparison from an OO perspective between NHibernate 3.x and Entity Framework 5/6
http://www.dennisdoomen.net/2013/03/entity-framework-56-vs-nhibernate-3.html
