I checked the ranking of 'Wide Column Store' databases on DB-Engines, and Cassandra seems to be the most popular choice at present.
If I understand correctly, 'Wide Column' means the columns of a row are dynamic (both the number of columns and their names), so no schema should be needed.
But in most articles and documentation online, I see there is always a 'CREATE TABLE (...)' CQL statement executed first, and then data is inserted against that schema. From my understanding, those are the 'static columns' in Cassandra, which have a fixed schema defined. So how do I insert data without creating a schema first?
I also found another term, 'Wide Row'. What exactly does it mean, and does it have any relation to 'Wide Column'?
Thanks a lot; these concepts have puzzled me.
There are two interfaces for accessing data in Cassandra: Thrift and CQL.
Thrift is fairly low level and gives you access to the "internal" rows (aka wide rows); it also allows you to use schemaless (dynamic) tables/column families.
CQL tables are built on top of the internal rows and can only be accessed via CQL. CQL tables allow you to use all the modern features, like collections, user-defined types, etc.
You can find more information there: http://www.datastax.com/dev/blog/thrift-to-cql3
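To make the mapping concrete, here is a minimal sketch (the table and column names are invented for illustration) of a CQL 3 table whose clustering column produces one wide storage-engine row per partition:

```sql
-- CQL 3: the partition key (sensor_id) identifies one wide storage row;
-- the clustering column (reading_time) becomes part of each cell name.
CREATE TABLE sensor_readings (
    sensor_id text,
    reading_time timestamp,
    value double,
    PRIMARY KEY (sensor_id, reading_time)
);

-- Each INSERT adds new cells to the same storage row; no ALTER TABLE needed,
-- even though the CQL schema itself is fixed.
INSERT INTO sensor_readings (sensor_id, reading_time, value)
VALUES ('s1', '2013-05-01 12:00:00', 21.5);
```

So the "dynamic columns" of the wide row still exist at the storage layer; CQL just presents them as ordinary rows under a fixed schema.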
Related
I inherited an application that talks to many different client databases.
Most of the tables in the client databases have identical schemas, but there are a handful of tables that have extra custom columns containing tax information (yes, a bad idea, I know … I didn't set it up).
These extra columns could be named anything. They are known at runtime as they can be looked up in another table.
I can set up EF so that it will read/write these tables (skipping the dynamic columns), but I really do need this information, as it is tax data.
I think my best route is to have a fixed model with extra properties that could be filled from these dynamic columns.
How can I get Entity Framework to dynamically read and write these columns without using custom SQL statements on every call?
I could do extra reads and writes for these extra columns separately (using custom SQL), but there must be some way to extend EF so that it knows about these extra columns and handles them correctly.
Any help would be appreciated.
As a first step, you could interrogate INFORMATION_SCHEMA (or other metadata tables) directly to find out whether the table you want your context to map has these columns. Based on that information, you can use a different DbContext (a generic one would probably work), creating it with a mapping configuration in which you either ignore the columns when they aren't there or map them to the POCO class your context expects.
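As a sketch of that metadata lookup (the table name and the list of known fixed columns are invented examples), the discovery query might look like:

```sql
-- Find the custom columns a particular client table actually has,
-- by excluding the columns every client's copy is known to share.
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'ClientOrders'
  AND COLUMN_NAME NOT IN ('Id', 'OrderDate', 'Amount');
```

You would run this once per client database at startup and use the result to decide which mapping configuration to build.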
This article discusses possible ways CQL 3 could be used for creating composite columns in Cassandra 1.1. They are just ideas. Nothing is official, and the Datastax documentation doesn't cover this (only composite keys).
As I understand it, composite columns are a number of columns that together have only one value.
How do you create them with CQL?
EDIT
I will be using C# to interface into Cassandra. CQL looks straightforward to use, which is why I want to use it.
You've got a couple concepts confused, I think. Quite possibly this is the fault of the Datastax documentation; if you have any good suggestions for making it clearer after you have a better picture, I'll be glad to send them on.
The "composite keys" stuff in the Datastax docs is actually talking about composite Cassandra columns. The reason for the confusion is that rows in CQL 3 do not map directly to storage engine rows (what you work with when you use the thrift interface). "Composite key" in the context of a CQL table just means a primary key which consists of multiple columns, which is implemented by composite columns at the storage layer.
This article is one of the better explanations as to how the mapping happens and why the CQL model is generally easier to think about.
With this sort of use, the first CQL column becomes the storage engine partition key.
As of Cassandra 1.2 (in development), it's also possible to create composite storage engine keys using CQL, by putting extra parentheses in the PRIMARY KEY definition around the CQL columns that will be stored in the partition key (see CASSANDRA-4179), but that's probably going to be the exception, not the rule.
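For illustration, a sketch of that 1.2-style syntax (the table and column names are invented):

```sql
-- The extra parentheses make (user_id, year) a composite storage-engine
-- partition key; month remains an ordinary clustering column.
CREATE TABLE events (
    user_id text,
    year int,
    month int,
    payload text,
    PRIMARY KEY ((user_id, year), month)
);
```

Without the inner parentheses, only user_id would be the partition key and both year and month would be clustering columns.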
With Cassandra, you store data in rows. Each row has a row key and some number of columns. Each column has a name and a value. Usually the column name and value (and row key, for that matter) are single values (int, long, UTF8, etc), but you can use composite values in row keys, column names and column values. A composite value is just some number of values that have been serialized together in some way.
Over time, a number of language-specific APIs have been developed. These APIs start from the model I describe above and provide access to a column family accordingly. Hector, the Java client API, is the one I'm most familiar with, but there are others.
CQL was introduced as a means to use Cassandra tables in an SQL/JDBC fashion. Not all Cassandra capabilities were supported through CQL at first, although CQL is getting more and more functional as time goes on.
I don't doubt your need for composite column names and values (I believe that's what you're asking for). The problem is that CQL has yet to evolve (as I understand it) to that level of native support. Whether or not it ever will, I don't know.
I suggest that you complete the definition of your desired column family schemas, complete with composite values if necessary. Once you've done that, look at the various APIs available for accessing Cassandra column families and choose the one that best supports your desired schema.
You haven't said what language you're using. If you were coding in java then I'd recommend Hector and not CQL.
Are you sure you want to create them with CQL? What is your use case?
I've been asked by my boss to replicate an MS Access feature that we're going to lose shortly after migrating our product to .NET.
The feature is the ability to view and update any data in the database, particularly Tables or Views, in a tabular grid.
I can do it for plain tables that have an identity column, because the SqlDataAdapter can auto-generate the relevant CRUD methods on the fly to fill / update via DataTables.
However, views are somewhat more tricky. SQL Server Management Studio does allow it. If you click 'Edit top xx rows' on a View, it allows you to edit the data in some columns in what looks to be a standard .NET DataGridView - though it feels a bit magical.
So, a few questions:
How does SSMS infer which primary key to use, even if the key is not in the view?
How does SSMS determine which column inside a view can or can not be edited / inserted / deleted etc.?
What would be my best option to replicate this inside a .NET application?
Is it possible to connect a DataGridView to an old-style OLE DB / ODBC connection that keeps a constant direct connection to the database?
Any guidance as normal will be highly appreciated.
Marlon
SQL Server views can be updated just as if they were a single table, as long as they conform to certain conditions.
From the documentation:
Updatable Views
You can modify the data of an underlying base table through a view, as
long as the following conditions are true:
Any modifications, including UPDATE, INSERT, and DELETE statements,
must reference columns from only one base table.
The columns being modified in the view must directly reference the
underlying data in the table columns. The columns cannot be derived in
any other way, such as through the following:
An aggregate function: AVG, COUNT, SUM, MIN, MAX, GROUPING, STDEV,
STDEVP, VAR, and VARP.
A computation. The column cannot be computed from an expression that
uses other columns. Columns that are formed by using the set operators
UNION, UNION ALL, CROSSJOIN, EXCEPT, and INTERSECT amount to a
computation and are also not updatable.
The columns being modified are not affected by GROUP BY, HAVING, or
DISTINCT clauses.
TOP is not used anywhere in the select_statement of the view together
with the WITH CHECK OPTION clause.
The previous restrictions apply to any subqueries in the FROM clause
of the view, just as they apply to the view itself. Generally, the
Database Engine must be able to unambiguously trace modifications from
the view definition to one base table. For more information, see
Modify Data Through a View.
I don't believe SSMS is doing anything special - editing the contents of a view offers exactly the same functionality as editing the contents of a table. If the user attempts to make a change that does not conform to the above conditions, SSMS will likely display an error.
How does SSMS infer which primary key to use, even if the key is not in the view?
It doesn't. SQL Server does since only one underlying table can be edited at a time.
How does SSMS determine which column inside a view can or can not be edited / inserted / deleted etc.?
Again, it's SQL Server that determines this, not SSMS.
What would be my best option to replicate this inside a .NET application?
As long as all your views conform to the above conditions, simply do the same as you're doing for tables, but be ready to handle errors when users attempt a change the view doesn't allow (this implies some user training will be required, just as it would be if they were using SSMS directly).
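As a minimal sketch of those conditions in practice (the table and view names are invented), a view that references a single base table with no derived columns can be updated directly:

```sql
-- All modified columns come from one base table and are not computed,
-- so SQL Server can trace the change back unambiguously.
CREATE TABLE dbo.Employees (
    Id       int IDENTITY PRIMARY KEY,
    Name     nvarchar(100),
    Salary   money,
    IsActive bit
);
GO
CREATE VIEW dbo.ActiveEmployees AS
SELECT Id, Name, Salary
FROM dbo.Employees
WHERE IsActive = 1;
GO
-- This UPDATE is applied to dbo.Employees through the view.
UPDATE dbo.ActiveEmployees SET Salary = Salary * 1.05 WHERE Id = 1;
```

Had the view instead selected, say, `AVG(Salary)` or joined two tables in the modified columns, the same UPDATE would fail with an error, which is exactly what your grid needs to surface to the user.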
We have a requirement on our project for custom fields. We have some standard fields on the table and each customer wants to be able to add their own custom fields. At the moment I am not interested in how this will work in the UI, but I want to know what the options are for the back end storage and retrieval of the data. The last time I did something like this was about 10 years ago in VB6 so I would be interested to know what the options are for this problem in today's .Net world.
The project is using SQL server for the backend, linq-to-sql for the ORM and a C# asp.net front end.
What are my options for this?
Thanks
There are four main options here:
actually change the schema (DDL) at runtime - however, pretty much no ORM will like that, and generally has security problems as your "app" account shouldn't normally be redefining the database; it does, however, avoid the "inner platform" effect inherent in the next two
use a key-value store as rows, i.e. a Customer table might have a CustomerValues table with pairs like "dfeeNumber"=12345 (one row per custom key/value pair) - but a pain to work with (instead of a "get", this is a "get" and a "list" per entity)
use a single hunk of data (xml, json, etc.) in a single CustomFields cell - again, not ideal to work with, but it is easier to store atomically with the main record (downside: forces you to load all the custom fields to read a single one)
use a document database (no schema at all) - but then: no ORM
I've used all 4 at different points. All 4 can work. YMMV.
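As a sketch of the second option (the table and column names are invented), the key/value companion table might look like:

```sql
-- One row per custom key/value pair, tied to the main Customer record.
CREATE TABLE CustomerValues (
    CustomerId int NOT NULL,           -- FK to the Customer table
    [Key]      nvarchar(100) NOT NULL, -- e.g. 'dfeeNumber'
    [Value]    nvarchar(max) NULL,     -- stored as text; cast on read
    PRIMARY KEY (CustomerId, [Key])
);

-- Loading one entity becomes a "get" on Customer plus a "list" here:
SELECT [Key], [Value] FROM CustomerValues WHERE CustomerId = 12345;
```

This shows the pain point mentioned above: every read of an entity needs the extra list query, and every typed value has to round-trip through a string.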
I have a similar situation on the project I'm working on now.
Forget about linq-to-sql when you have a flexible database schema. There is no way to update the linq-to-sql models on the fly when the DB schema changes.
Solutions:
Keep an extra table with the table name the values belong to, the column name, the value, etc.
Totally dynamically change your table schema each time they add a field.
Use a NOSQL solution like mongoDB or the Azure Table Storage. A NOSQL solution doesn't require a schema and can be changed on the fly.
This is a handy link to read:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056
You're referring to an EAV model (entity-attribute-value).
Here's an article: http://hanssens.org/post/Generic-Entity-Attribute-Value-Model-e28093-A-POCO-Implementation.aspx
What would be the best database/technique to use if I'd like to create a database that can "add", "remove" and "edit" tables and columns?
I'd like it to be scaleable and fast.
Should I use one table with five columns for this (Id, Table, Column, Type, Value)? Are there any good articles about this? Or are there other solutions?
Maybe three tables: One that holds the tables, one that holds the columns and one for the values?
Maybe someone already has created a db for this purpose?
My requirement is that I'm using .NET (I guess the database doesn't have to be on Windows, but I would prefer that).
Since (in comments on the question) you are aware of the pitfalls of the "inner platform effect", it is also true that this is a very common requirement - in particular to store custom user-defined columns. And indeed, most teams have needed this. Having tried various approaches, the one which I have found most successful is to keep the extra data in-line with the record - in particular, this makes it simple to obtain the data without requiring extra steps like a second complex query on an external table, and it means that all the values share things like timestamp/rowversion for concurrency.
In particular, I've found a CustomValues column (for example text or binary; typically json / xml, but could be more exotic) a very effective way to work, acting as a property-bag for the additional data. And you don't have to parse it (or indeed, SELECT it) until you know you need the extra data.
All you then need is a way to tie named keys to expected types, but you need that metadata anyway.
I will, however, stress the importance of making the data portable; don't (for example) store any specific platform-bespoke serialization (for example, BinaryFormatter for .NET) - things like xml / json are fine.
Finally, your RDBMS may also work with this column; for example, SQL Server has the xml data type that allows you to run specific queries and other operations on xml data. You must make your own decision whether that is a help or a hindrance ;p
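For illustration, a sketch (the schema is invented) of that approach using SQL Server's xml type, pulling one named key out of the property bag only when needed:

```sql
-- A property-bag column kept in-line with the main record.
CREATE TABLE Customers (
    Id           int PRIMARY KEY,
    Name         nvarchar(100),
    CustomValues xml NULL  -- e.g. <props><TaxCode>A1</TaxCode></props>
);

-- The xml type's value() method extracts a single typed value,
-- so the rest of the bag never needs parsing on the client.
SELECT Id,
       CustomValues.value('(/props/TaxCode)[1]', 'nvarchar(50)') AS TaxCode
FROM Customers;
```

The same column would work equally well as plain text holding json, with the parsing done in your .NET code instead of by the database.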
If you also need to add tables, I wonder if you are truly using the RDBMS as an RDBMS; at that point I would consider switching from an RDBMS to a document-database such as CouchDB or Raven DB