Mid-Tier Help Needed

Mid-Tier Help Needed - c#

In one sentence, what i ultimately need to know is how to share objects between mid-tier functions w/ out requiring the application tier to to pass the data model objects.
I'm working on building a mid-tier layer in our current environment for the company I am working for. Currently we are using primarily .NET for programming and have built custom data models around all of our various database systems (ranging from Oracle, OpenLDAP, MSSQL, and others).
I'm running into issues trying to pull our model from the application tier and move it into a series of mid-tier libraries. The main issue I'm running into is that the application tier has the ability to hang on to a cached object throughout the duration of a process and make updates based on the cached data, but the Mid-Tier operations do not.
I'm trying to keep the model objects out of the application as much as possible so that when we make a change to the underlying database structure, we can edit and redeploy the mid-tier easily and multiple applications will not need to be rebuilt. I'll give a brief update of what the issue is in pseudo-code, since that is what us developers understand best :)
main
{
MidTierServices.UpdateCustomerName("testaccount", "John", "Smith");
// since the data takes up to 4 seconds to be replicated from
// write server to read server, the function below is going to
// grab old data that does not contain the first name and last
// name update.... John Smith will be overwritten w/ previous
// data
MidTierServices.UpdateCustomerPassword("testaccount", "jfjfjkeijfej");
}
MidTierServices
{
void UpdateCustomerName(string username, string first, string last)
{
Customer custObj = DataRepository.GetCustomer(username);
/*******************
validation checks and business logic go here...
*******************/
custObj.FirstName = first;
custObj.LastName = last;
DataRepository.Update(custObj);
}
void UpdateCustomerPassword(string username, string password)
{
// does not contain first and last updates
Customer custObj = DataRepository.GetCustomer(username);
/*******************
validation checks and business logic go here...
*******************/
custObj.Password = password;
// overwrites changes made by other functions since data is stale
DataRepository.Update(custObj);
}
}
On a side note, options I've considered are building a home grown caching layer, which takes a lot of time and is a very difficult concept to sell to management. Use a different modeling layer that has built in caching support such as nHibernate: This would also be hard to sell to management, because this option would also take a very long time tear apart our entire custom model and replace it w/ a third party solution. Additionally, not a lot of vendors support our large array of databases. For example, .NET has LINQ to ActiveDirectory, but not a LINQ to OpenLDAP.
Anyway, sorry for the novel, but it's a more of an enterprise architecture type question, and not a simple code question such as 'How do I get the current date and time in .NET?'
Edit
Sorry, I forgot to add some very important information in my original post. I feel very bad because Cheeso went through a lot of trouble to write a very in depth response which would have fixed my issue were there not more to the problem (which I stupidly did not include).
The main reason I'm facing the current issue is in concern to data replication. The first function makes a write to one server and then the next function makes a read from another server which has not received the replicated data yet. So essentially, my code is faster than the data replication process.
I could resolve this by always reading and writing to the same LDAP server, but my admins would probably murder me for that. The specifically set up a server that is only used for writing and then 4 other servers, behind a load balancer, that are only used for reading. I'm in no way an LDAP administrator, so I'm not aware if that is standard procedure.

You are describing a very common problem.
The normal approach to address it is through the use of Optimistic Concurrency Control.
If that sounds like gobbledegook, it's not. It's pretty simple idea. The concurrency part of the term refers to the fact that there are updates happening to the data-of-record, and those updates are happening concurrently. Possibly many writers. (your situation is a degenerate case where a single writer is the source of the problem, but it's the same basic idea). The optimistic part I'll get to in a minute.
The Problem
It's possible when there are multiple writers that the read+write portion of two updates become interleaved. Suppose you have A and B, both of whom read and then update the same row in a database. A reads the database, then B reads the database, then B updates it, then A updates it. If you have a naive approach, then the "last write" will win, and B's writes may be destroyed.
Enter optimistic concurrency. The basic idea is to presume that the update will work, but check. Sort of like the trust but verify approach to arms control from a few years back. The way to do this is to include a field in the database table, which must be also included in the domain object, that provides a way to distinguish one "version" of the db row or domain object from another. The simplest is to use a timestamp field, named lastUpdate, which holds the time of last update. There are other more complex ways to do the consistency check, but timestamp field is good for illustration purposes.
Then, when the writer or updater wants to update the DB, it can only update the row for which the key matches (whatever your key is) and also when the lastUpdate matches. This is the verify part.
Since developers understand code, I'll provide some pseudo-SQL. Suppose you have a blog database, with an index, a headline, and some text for each blog entry. You might retrieve the data for a set of rows (or objects) like this:
SELECT ix, Created, LastUpdated, Headline, Dept FROM blogposts
WHERE CONVERT(Char(10),Created,102) = #targdate
This sort of query might retrieve all the blog posts in the database for a given day, or month, or whatever.
With simple optimistic concurrency, you would update a single row using SQL like this:
UPDATE blogposts Set Headline = #NewHeadline, LastUpdated = #NewLastUpdated
WHERE ix=#ix AND LastUpdated = #PriorLastUpdated
The update can only happen if the index matches (and we presume that's the primary key), and the LastUpdated field is the same as what it was when the data was read. Also note that you must insure to update the LastUpdated field for every update to the row.
A more rigorous update might insist that none of the columns had been updated. In this case there's no timestamp at all. Something like this:
UPDATE Table1 Set Col1 = #NewCol1Value,
Set Col2 = #NewCol2Value,
Set Col3 = #NewCol3Value
WHERE Col1 = #OldCol1Value AND
Col2 = #OldCol2Value AND
Col3 = #OldCol3Value
Why is it called "optimistic"?
OCC is used as an alternative to holding database locks, which is a heavy-handed approach to keeping data consistent. A DB lock might prevent anyone from reading or updating the db row, while it is held. This obviously has huge performance implications. So OCC relaxes that, and acts "optimistically", by presuming that when it comes time to update, the data in the table will not have been updated in the meantime. But of course it's not blind optimism - you have to check right before update.
Using Optimistic Cancurrency in practice
You said you use .NET. I don't know if you use DataSets for your data access, strongly typed or otherwise. But .NET DataSets, or specifically DataAdapters, include built-in support for OCC. You can specify and hand-code the UpdateCommand for any DataAdapter, and that is where you can insert the consistency checks. This is also possible within the Visual Studio design experience.
(source: asp.net)
If you get a violation, the update will return a result showing that ZERO rows were updated. You can check this in the DataAdapter.RowUpdated event. (Be aware that in the ADO.NET model, there's a different DataAdapter for each sort of database. The link there is for SqlDataAdapter, which works with SQL Server, but you'll need a different DA for different data sources.)
In the RowUpdated event, you can check for the number of rows that have been affected, and then take some action if the count is zero.
Summary
Verify the contents of the database have not been changed, before writing updates. This is called optimistic concurrency control.
Other links:
MSDN on Optimistic Concurrency Control in ADO.NET
Tutorial on using SQL Timestamps for OCC

Related

SQL - Better two queries instead of one big one

I am working on a C# application, which loads data from a MS SQL 2008 or 2008 R2 database. The table looks something like this:
ID | binary_data | Timestamp
I need to get only the last entry and only the binary data. Entries to this table are added irregular from another program, so I have no way of knowing if there is a new entry.
Which version is better (performance etc.) and why?
//Always a query, which might not be needed
public void ProcessData()
{
byte[] data = "query code get latest binary data from db"
}
vs
//Always a smaller check-query, and sometimes two queries
public void ProcessData()
{
DateTime timestapm = "query code get latest timestamp from db"
if(timestamp > old_timestamp)
data = "query code get latest binary data from db"
}
The binary_data field size will be around 30kB. The function "ProcessData" will be called several times per minutes, but sometimes can be called every 1-2 seconds. This is only a small part of a bigger program with lots of threading/database access, so I want to the "lightest" solution. Thanks.

Luckily, you can have both:
SELECT TOP 1 binary_data
FROM myTable
WHERE Timestamp > #last_timestamp
ORDER BY Timestamp DESC
If there is a no record newer than #last_timestamp, no record will be returned and, thus, no data transmission takes place (= fast). If there are new records, the binary data of the newest is returned immediately (= no need for a second query).

I would suggest you perform tests using both methods as the answer would depend on your usages. Simulate some expected behaviour.
I would say though, that you are probably okay to just do the first query. Do what works. Don't prematurely optimise, if the single query is too slow, try your second two-query approach.

Two-step approach is more efficient from overall workload of system point of view:
Get informed that you need to query new data
Query new data
There are several ways to implement this approach. Here are a pair of them.
Using Query Notifications which is built-in functionality of SQL Server supported in .NET.
Using implied method of getting informed of database table update, e.g. one described in this article at SQL Authority blog

I think that the better path is a storedprocedure that keeps the logic inside the database, Something with an output parameter with the data required and a return value like a TRUE/FALSE to signal the presence of new data

EF and Linq with self referential table

I have a self-referential table in my database that looks sort of like above. Basically its setup in such a way that each row has a unique ID (identity PK) and a DependentID to indicate any other record in the set that it is dependent on. It is very similar to the parent-child type examples you often see in SQL textbooks but my case is subtly unique in the sense that a given record can also be dependent upon itself (see row 1 above)
Two questions:
Can EF be made to represent this relationship properly? I've read several posts on here that suggest that it does not deal with this scenario gracefully so my initial thought was that it might not even be worth it, I might be better off just treating it as a normal table and writing the business logic to ensure the data gets inserted/updated correctly. In my scenario, I won't ever be querying these entities thru EF really, the app will basically load them all at startup and then I'll run linq queries against them at runtime to filter as needed
Assuming I cannot get it to work with EF and as I note in #1 I simply load em all up into memory at startup (there are only going to be 50-100 or so), what would be the most efficient way to join on this via linq? I would want to be able to pass in a DependentId and get all the records associated with it and their properties...so in this example I'd want to pass in '1' and get back:
1 - John - 10
2 - Mike - 25
3 - Bob - 5
thanks for the help

Indeed, the entity framework cannot represent such a relationship, certainly not in in a recursively queryable form.
But you are not asking for recursive queries, so you could treat DependentId as just another data column. Doing that, it would be trivial to build and execute your question-two query against the database.
UPDATE:
That query would look something like
int dependentIdToSearch = 1;
var q = from something in db.mytable
where something.DependentId == dependentIdToSearch
select new { something.Id, something.Name, something.Value };
END UPDATE
If you do need recursive queries (all direct and indirect dependencies of), you need a table valued function with a common table expression. The entity framework cannot deal with that either, at least not in the current version. If you need this support, you can wait for EF 5 or use Linq to SQL (which had support for table valued functions since the first version years ago).
You can indeed also read the entire table in memory, provided that it is read-only, or that there is only "one memory" (single server, not load-balanced or client app with local database).
If it's read-only, you have the option to build an object graph once at load time, enabling efficient execution later. For example, you could define a class with a collection of objects that are dependent on each object. Your query then becomes a trivial iteration over that collection.

High performance Custom user fields

looking for examples/tutorial for custom user fields, not via EAV
EAV is going to be problematic for various reasons such as performance
there are many base entities/tables with over 100000 records each
there will likely be over a dozen attributes
the records are to be displayed in a flat ui grid incl. custom fields so flattening them would be an issue while maintaining performance
Looking at enabling this via DDL where all custom fields would go into a matching table such as
<tablename>_custom_<userid>
and all user attributes would map to a column each and all their metadata stored in a metadata table
the retrieval would be simpler where the query would simply be
select *
from <tablename> A, tableName_custom_userid B
where B.KeyField = A.KeyField --( perhaps using outer join, haven't gone that far yet )
Wondering if there are any gotchas down the road that i need to be aware of ?
of course any samples/pointers would be helpful to kickstart the effort
specifically would appreciate any advice on using DDL for Sql Server compact 4

One technique I have seen used is to use a sort of 'hard-coded' EAV pattern. Don't hang up! It worked well with the dataset sizes you were talking about and didn't actually use EAV - it was only EAV-esque.
The idea is to have a set of tables to store these custom attributes within it, with some triggers (described below) on them. The custom attributes tablesets store metadata about the attribute (what table it goes with, data type, constraints, etc). You can get very fancy with this but I did not haev the need.
The triggers on your meta-tables are there to re-generate views that rollup base+extension into first class objects within the DB. So instead of table person + employee extension table, you have an employee view that includes both. When you drop a new value into the custom attributes tables, the triggers will re-roll the views and include the new stuff. If you wanted to go nuts, you could also have the triggers re-write stored procedures as well. Depending on how your mid-tier code is structured, you would still be forced to re-code some, however this would be the case anyway should you be applying rules that read the data.
In testing, I found that for the relatively small # of records you're talking about, performance was somewhat slower but followed roughly the same pattern of degradation (2x the number of records, ~2x as slow).
-- edits --
How I saw it done, you had a table that represented your first class objects, so a row for 'person' and a row for 'employee,' etc. We'll call that FCO. Then you had a secondary table that stored what tables represented the FCO. We'll call that Srcs.. For person, there would be one row, which is the person table. For Employee, there would be two rows, the person table and the Employee extension. There is a third table, called Attribs, which stores the columns from the tables that constitute the FCO. For simplicity, we'll say Employee has ID, Name and Address, and Employee has Hire Date and Department, and obviously PersonID referring back to Person table. So, 2 rows in FCO table (person and employee), 3 rows in Src table, 8 rows in Attribs.
The view, we'll call it vw_Employee, selects PersonID, Name, Address, Hire Date, Department from the two tables. It is built by a SQL stored procedure we'll call OnMetadataChange.
This SP is fired (by trigger or batch process), and its purpose is to generate the CREATE VIEW statements. It will iterate through every First Class Object, collect which fields from which tables constitute the view, and will issue a CREATE statement based on that. So OnMetadataChange produces a DROP and CREATE for each view, it generates a dynamic SQL statement that is executed once per entry in FCO table. It is preferable to do this with Triggers but not necessary. Hopefully your FCO definitions won't change too often, and when they do, there will probably be a code release as well. You can run your OnMetadataChange SP at that time.
The end result is a 2-layer database. The views constitute the First Class Object layer, which is meaningful to the application. The application only uses views. The tables constitute the 'physical' layer, which the application shouldn't care about. The meta-tables are essentially your mapping between the FCO layer and the physical layer. It takes some time to set it up, but it's quite effective, and gives you many of the benefits of EAV, while at the same time giving you the concrete benefits of 3nf tables (indexability, etc).
If you'd like I can throw some sample SQL out there.

Part of the problem you are having is that you are trying to store schema-less data in a SQL database, which is not its strength. There are three approaches that would make your life far easier:
1) Have a column which stores the serialized custom fields, with whatever format is mst convenient. For example, this column could store xml. Upsides are that you can use SQL Server Compact and pulling back a record is trivial. Downsides are that you always have to pull/push the entire xml blob to do an update, and it is difficult to impossible to query on any custom fields.
2) Upgrade to SQL Server Express, and use XML columns. This is nearly the same as the first suggestion, except that any server ready version of SQL Server has native support for XML data. These columns can have indexes added and fields within the data can be used in queries.
3) Use a Schema-less Database, like MongoDB or CouchDB. These databases are all about storing schemaless data, so your custom fields will be no different than any other field. As such, you can index and query custom fields. Upsides are that custom data is incredibly easy to work with, downsides are that you would have to spend some time rethinking how you store data to fit within their model.
If you do not need to query based on custom fields, or if you can query custom fields within business logic, then the first option can work for you. In any other case, I would err towards something with more capabilities than compact. If cost is the deciding factor, both SQL Server Express and MongoDB are free.

Storing Data from Forms without creating 100's of tables: ASP.NET and SQL Server

Let me first describe the situation. We host many Alumni events over the course of each year and provide online registration forms for each event. There is a large chunk of data that is common for each event:
An Event with dates, times, managers, internal billing info, etc.
A Registration record with info about the payment and total amount charged per form submission
Bio/Demographic and alumni data about the 1 or more attendees (name, address, degree, etc.)
We store all of the above data within columns in tables as you would expect.
The trouble comes with the 'extra' fields we are asked to put on the forms. Maybe it is a dinner and there is a Veggie or Carnivore option, perhaps there is lodging and there are bed or smoking options, or perhaps there is an optional transportation option. There are tons of weird little "can you add this to the form?" types of requests we receive.
Currently, we JSONify any non-standard data and store it all in one column (per attendee) called 'extras'. We can read this data out in code but it is not well suited to querying. Our internal staff would like to generate a quick report on Veggie dinners needed for instance.
Other than creating a separate table for each form that holds the specific 'extra' data items, are there any other approaches that could make my life (and reporting) easier? Anyone working in a simialr environment?

This is actually one of the toughest problem to solve efficiently. The SQL Server Customer Advisory Team has dedicated a white-paper to the topic which I highly recommend you read: Best Practices for Semantic Data Modeling for Performance and Scalability.
You basically have 3 options:
semantic database (entity-attribute-value)
XML column
sparse columns
Each solution comes with ups and downs. Out of the top of my hat I'd say XML is probably the one that gives you the best balance of power and flexibility, but the optimal solution really depends on lots of factors like data set sizes, frequency at which new attributes are created, the actual process (human operators) that create-populate-use these attributes etc, and not at least your team skill set (some might fare better with an EAV solution, some might fare better with an XML solution). If the attributes are created/managed under a central authority and adding new attributes is a reasonable rare event, then the sparse columns may be a better answer.

Well you could also have the following db structure:
Have a table to store custom attributes
AttributeID
AttributeName
Have a mapping table between events and attributes with:
AttributeID
EventID
AttributeValue
This means you will be able to store custom information per event. And you will be able to reuse your attributes. You can include some metadata as
AttributeType
AllowBlankValue
to the attribute to handle it easily afterwards

Have you considered using XML instead of JSON? Difference: XML is supported (special data type) and has query integration ;)

quick and dirty, but actually nice for querying: simply add new columns. it's not like the empty entries in the previous table should cost a lot.
more databasy solution: you'll have something like an event ID in your table. You can link this to an n:m table connecting events to additional fields. And then store the additional field data in a table with additional_field_id, record_id (from the original table) and the actual value. Probably creates ugly queries, but seems politically correct in terms of database design.
I understand "NoSQL" (not only sql ;) databases like couchdb let you store arbitrary fields per record, but since you're already with SQL Server, I guess that's not an option.

This is the solution that we first proposed in ASP.NET Forums (that later became Community Server), and that the ASP.NET team built a similar version of in the ASP.NET 2.0 Membership when they released it:
Property Bags on your domain objects
For example:
Event.Profile() or in your case, Event.Extras().
Basically, a property bag is a serialized collection of data stored in a name/value pair in a column (or columns). The ASP.NET 2.0 Membership went the route of storing names in a semi-colon delimited list, and values in the same:
Table: aspnet_Profile
Column: PropertyNames (separated by semi-colons, and has start index and end index)
Column: PropertyValues (separated by semi-colons, and only stores the string value)
The downside to that approach is it is all strings, and manually has to be parsed (even though the membership system does it for you automatically).
Recently, my current method is I've built FormCollection and NameValueCollection C# extension methods that automatically serialize the collections to an XML result. And I store that XML in the table in it's own column associated with that entity. I also have a deserializer C# extension on XElement that deserializes that data back to the collection at runtime.
This gives you the power of actually querying those properties in XML, via SQL (though, that can be slow though - always flatten out your read-only data).
The final note is runtime querying: The general rule we follow is, if you are going to query a property of an entity in normal application logic, then you move that property to an actual column on the table - and create the appropriate indexes. If that data will never be queried directly (for example, Linq-to-Sql or EF), then leave it in the XML Property Bag.
Property Bags gives you the power of extending your domain models however you like, without having to modify the db schema.

Programming pattern using typed datasets in VS 2008

I'm currently doing the following to use typed datasets in vs2008:
Right click on "app_code" add new dataset, name it tableDS.
Open tableDS, right click, add "table adapter"
In the wizard, choose a pre defined connection string, "use SQL statements"
select * from tablename and next + next to finish. (I generate one table adapter for each table in my DB)
In my code I do the following to get a row of data when I only need one:
cpcDS.tbl_cpcRow tr = (cpcDS.tbl_cpcRow)(new cpcDSTableAdapters.tbl_cpcTableAdapter()).GetData().Select("cpcID = " + cpcID)[0];
I believe this will get the entire table from the database and to the filtering in dotnet (ie not optimal), is there any way I can get the tableadapter to filer the result set on the database instead (IE what I want to is send select * from tbl_cpc where cpcID = 1 to the database)
And as a side note, I think this is a fairly ok design pattern for getting data from a database in vs2008. It's fairly easy to code with, read and mantain. But I would like to know it there are any other design patterns that is better out there? I use the datasets for read/update/insert and delete.

A bit of a shift, but you ask about different patterns - how about LINQ? Since you are using VS2008, it is possible (although not guaranteed) that you might also be able to use .NET 3.5.
A LINQ-to-SQL data-context provides much more managed access to data (filtered, etc). Is this an option? I'm not sure I'd go "Entity Framework" at the moment, though (see here).
Edit per request:
to get a row from the data-context, you simply need to specify the "predicate" - in this case, a primary key match:
int id = ... // the primary key we want to look for
using(var ctx = new MydataContext()) {
SomeType record = ctx.SomeTable.Single(x => x.SomeColumn == id);
//... etc
// ctx.SubmitChanges(); // to commit any updates
}
The use of Single above is deliberate - this particular usage [Single(predicate)] allows the data-context to make full use of local in-memory data - i.e. if the predicate is just on the primary key columns, it might not have to touch the database at all if the data-context has already seen that record.
However, LINQ is very flexible; you can also use "query syntax" - for example, a slightly different (list) query:
var myOrders = from row in ctx.Orders
where row.CustomerID = id && row.IsActive
orderby row.OrderDate
select row;
etc

There is two potential problem with using typed datasets,
one is testability. It's fairly hard work to set up the objects you want to use in a unit test when using typed datasets.
The other is maintainability. Using typed datasets is typically a symptom of a deeper problem, I'm guessing that all you business rules live outside the datasets, and a fair few of them take datasets as input and outputs some aggregated values based on them. This leads to business logic leaking all over the place, and though it will all be honky-dory the first 6 months, it will start to bite you after a while. Such a use of DataSets are fundamentally non-object oriented
That being said, it's perfectly possible to have a sensible architecture using datasets, but it doesn't come naturally. An ORM will be harder to set up initially, but will lend itself nicely to writing maintainable and testable code, so you don't have to look back on the mess you made 6 months from now.

You can add a query with a where clause to the tableadapter for the table you're interested in.
LINQ is nice, but it's really just shortcut syntax for what the OP is already doing.
Typed Datasets make perfect sense unless your data model is very complex. Then writing your own ORM would be the best choice. I'm a little confused as to why Andreas thinks typed datasets are hard to maintain. The only annoying thing about them is that the insert, update, and delete commands are removed whenever the select command is changed.
Also, the speed advantage of creating a typed dataset versus your own ORM lets you focus on the app itself and not the data access code.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.