List to Database


I might be way off here, and this question is probably bordering on subjective, but here goes anyway.
Currently I use IList<T> to cache information from the database in memory so I can query it with LINQ. I have an ORM-ish layer, written with the help of some questions here on SO, to easily query the information I need from the DB. For example:
IList<Customer> customers = DB.GetDataTable("Select * FROM Customers").ToList<Customer>();
It's been working fine. I also have extension methods to do CRUD updates on single items within these lists:
DB.Update<Customer>(customers[0]);
Again, this works quite well.
Now, in the GUI layer of my app, specifically when binding DataGridViews for the user to edit the data, I find myself bypassing this DAL and using TableAdapters directly within the forms, which breaks the layered architecture and smells a bit to me. Using TableAdapters here and ILists there also means differing standards are followed throughout my code, which I would like to consolidate into one.
Ideally, I would like to be able to bind to these lists and then have the DAL update the list's 'dirty' data for me. To me, this process would involve the following:
1. Traversing the list for any 'dirty' items
2. For each of these, see if there is already an item with the PK in the DB
3. If (2), then update, else insert
4. Finally, perform a Delete FROM * WHERE ID NOT IN('all ids in list') query
I'm not entirely sure how this is handled in a TableAdapter, but I can see the performance of this method dropping significantly and quite quickly with increasing items in the list.
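The steps above can be sketched without touching a database at all, which also shows where the cost goes. This is a minimal sketch, not the asker's DAL: the Customer class, its IsDirty flag, and the ListSync and SyncPlan names are all assumptions made for illustration. It fetches the existing keys once instead of probing the DB per item, and it treats any item whose key is missing from the DB as an insert, whether or not it is flagged dirty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical cached row; IsDirty is assumed to be set by the UI or by property setters.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public bool IsDirty { get; set; }
}

// The three groups of work a commit would have to perform.
public class SyncPlan
{
    public List<Customer> Inserts { get; } = new List<Customer>();
    public List<Customer> Updates { get; } = new List<Customer>();
    public List<int> DeleteIds { get; } = new List<int>();
}

public static class ListSync
{
    // existingIds: primary keys currently in the table, fetched once
    // (for example with "SELECT Id FROM Customers") rather than once per item.
    public static SyncPlan Plan(IEnumerable<Customer> cached, IEnumerable<int> existingIds)
    {
        var inDb = new HashSet<int>(existingIds);
        var inList = new HashSet<int>();
        var plan = new SyncPlan();

        foreach (var c in cached)
        {
            inList.Add(c.Id);
            if (!inDb.Contains(c.Id)) plan.Inserts.Add(c);  // key not in the DB yet
            else if (c.IsDirty) plan.Updates.Add(c);        // in the DB and modified
        }

        // Keys that are in the DB but no longer in the list were removed by the user.
        plan.DeleteIds.AddRange(inDb.Where(id => !inList.Contains(id)));
        return plan;
    }
}

public static class Program
{
    public static void Main()
    {
        var cached = new List<Customer>
        {
            new Customer { Id = 1, Name = "Ann" },
            new Customer { Id = 2, Name = "Bob (edited)", IsDirty = true },
            new Customer { Id = 4, Name = "New", IsDirty = true }
        };
        var plan = ListSync.Plan(cached, new[] { 1, 2, 3 });
        Console.WriteLine($"{plan.Inserts.Count} insert(s), {plan.Updates.Count} update(s), {plan.DeleteIds.Count} delete(s)");
    }
}
```

With hash sets the planning step is linear in the list size, so the remaining cost is the number of statements actually sent to the database.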
So my question is this:
Is there an easier way of committing List to a database? Note the word commit, as it may be an insert/update or delete.
Should I maybe convert to DataTable? e.g. here
I'm sure some of the more advanced ORMs will do this type of thing, but is there any mini-ORM (e.g. Dapper/PetaPoco/Simple.Data etc.) that can do this for me? I want to keep it simple (as with my current DAL) and flexible (I don't mind writing the SQL if it gets me exactly what I need).

Currently I use IList to cache information from the database in memory so I can use LINQ to query information from them.
LINQ also has a flavor called LINQ to DataSet, so this is not a compelling reason.
Better decide what you really want/need:
a full ORM like Entity Framework
use DataSets with DataAdapters
use basic ADO.NET (DataReader and List<>) and implement your own change-tracking.
You can mix them to some extent but like you noted it's better to pick one.
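As an illustration of the third option, here is one way "your own change-tracking" might look. It is only a sketch under assumptions: TrackedEntity, AcceptChanges and the Customer class are hypothetical names, not part of any library. BindingList<T> and INotifyPropertyChanged are the standard System.ComponentModel types that grids can bind to.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;
using System.Runtime.CompilerServices;

// Hypothetical base class: flags the object as dirty whenever a property really changes.
public abstract class TrackedEntity : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    // True once any tracked property has changed since the last AcceptChanges().
    public bool IsDirty { get; private set; }

    public void AcceptChanges() => IsDirty = false;

    protected void Set<T>(ref T field, T value, [CallerMemberName] string name = null)
    {
        if (EqualityComparer<T>.Default.Equals(field, value)) return; // no real change
        field = value;
        IsDirty = true;
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(name));
    }
}

public class Customer : TrackedEntity
{
    private string _name;
    public int Id { get; set; }
    public string Name { get => _name; set => Set(ref _name, value); }
}

public static class Program
{
    public static void Main()
    {
        // BindingList<T> can serve as a DataGridView data source and still supports LINQ.
        var customers = new BindingList<Customer>
        {
            new Customer { Id = 1, Name = "Ann" },
            new Customer { Id = 2, Name = "Bob" }
        };
        foreach (var c in customers) c.AcceptChanges(); // freshly loaded rows start clean

        customers[1].Name = "Robert"; // what a grid edit would do

        var dirtyIds = customers.Where(c => c.IsDirty).Select(c => c.Id);
        Console.WriteLine(string.Join(",", dirtyIds)); // prints "2"
    }
}
```

Deleted rows are not covered here; tracking them would need a separate list of removed keys or a comparison against the keys in the database.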

Related

How to design a Data Access Layer for a database table that may change in the future?

Introduction:
I'm refactoring (pretty much rewriting) a legacy application in my current internship. The part that this question will be concerned about is the database it uses and the way they retrieve data from it.
The database structure is:
There's a table that has the main records. Let's say each record is a measurement. It has some info about the measured material and different measurement information.
There's a table view they use that has the same information columns, plus some extra columns that contains data calculated from the given measurements. And it also filters some of the data from the table.
So let's say we have the main table with columns:
Measurement ID
Measurement A
Measurement B
The view has something like this:
Measurement ID
Measurement A
Measurement B
Some extra data (for example Measurement A * Measurement B)
The guy who is leading the development only knows some SQL, so he likes adding new columns that are calculated from columns in the main table for experimenting. And this is definitely a need at the moment.
Requirements are:
Different types of databases should be supported (like SQL Server, Oracle, and probably some others).
The frontend should be able to show the view, which means even though some main columns will always stay the same, there may be some new columns including newly calculated values.
My question is:
What kind of system should I use to accommodate the needs of this application? I wanted to use Entity Framework, but the fact that the view may have new columns in the future is I think a problem. As far as I understand, I should map my classes to the database before compiling.
The other thing that I'm considering is maybe using Entity Framework to get data from the main table and do the calculations and the filtering that is currently done in the table view directly in the frontend, and skip the view altogether. Which sounds fine, though I don't know if they will allow me to do that.
What would you do in my case? Please take into account that I have virtually no experience with databases and ORMs.

You are correct in that using Entity Framework will be a problem if the underlying DB schema is always changing. It will require you to update the EF model on your end every time to grab those new columns.
Ideally, all of your database access is hidden behind the interface to your DAL, so that your application doesn't need to know about which ORM is being used -- if any -- or which database it's connecting to.
I hate to say it, but given your requirements, an ORM might not make sense. You might want to go with something more generic without any strong-typing. You could just simply always return a DataTable to your application layer, and it could loop through the columns and values to display whatever is returned. If there are fields you know will never change, you could create a manual mapping for those fields only into your application object(s).
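A minimal sketch of the "loop through the columns" idea follows. The table is built in memory as a stand-in for whatever the view returns, and the column names (including the calculated AtimesB) are made up for illustration. The point is that FormatRow never names a column, so a newly added one shows up without a code change.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class Program
{
    // Formats whatever columns the row's table happens to have; nothing is hard-coded.
    public static string FormatRow(DataRow row)
    {
        var parts = new List<string>();
        foreach (DataColumn col in row.Table.Columns)
            parts.Add(col.ColumnName + "=" + row[col]);
        return string.Join(", ", parts);
    }

    public static void Main()
    {
        // Stand-in for the result of a query against the view.
        var view = new DataTable("MeasurementView");
        view.Columns.Add("MeasurementId", typeof(int));
        view.Columns.Add("MeasurementA", typeof(double));
        view.Columns.Add("MeasurementB", typeof(double));
        view.Columns.Add("AtimesB", typeof(double)); // a calculated column added later
        view.Rows.Add(1, 2.0, 3.0, 6.0);

        foreach (DataRow row in view.Rows)
            Console.WriteLine(FormatRow(row));
    }
}
```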

You could look at NoSQL systems, which are a lot more flexible about the schema, or at a document database like RavenDB. All these systems allow the schema to change dynamically. You need to check the pros and cons to see whether one can meet your requirements.
(This answer is a bit out of subject as it's about replacing the SQL server and not really creating a DAL, but other answers cover the subject well and I would like to propose another way that may help.)

If your schema is unstable, then using Entity Framework as a beginner is going to be a headache. The assumption is that you can just refresh the design canvas periodically to let the tool handle database table changes. You can try that for a time to see when it becomes too much of a pain, but without any prior experience using ORMs or Entity Framework it may not be worth the effort.
I would probably use something like Rob Conery's Massive ORM (https://github.com/robconery/massive). It gives you more flexibility with the underlying database schema and is a very small library. I remember it being ~300 lines of code and very easy to use. It uses C# dynamics so you'll have to be using >= C# 4.0 and be comfortable with that one concept but IMO it's worth it for the low-overhead. A full-fledged ORM like Entity Framework or NHibernate is going to cost a lot of learning cycles.
You could, of course, just stick to ADO.NET DataTables. They're a bit ugly and verbose, but they'll do the job.

You can use Entity Framework - Database First if the DB is changing. Of course, you will have to regenerate your classes whenever the DB schema changes and you want to access the new columns.
If you need to accommodate different database servers, then you should look into implementing a repository pattern and abstracting all your data access that way.

Your comment
it involves write operations to the main table but the main table never changes
confirms what I was hoping for. It means you can use Entity Framework as the core of your application and a different route to display data.
Suppose that for display (of the view) you use a classic DataTable (because all common grids support them, contrary to displaying dynamic objects). I don't know how create/update/delete will be done, but saving changes will at some point involve mapping a DataRow to a MainEntity object. You can write one method for that like
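MainEntity DataRowToEntity(DataRow row)
{
    var entity = new MainEntity();
    // row[...] returns object, so cast to the property's type (string is assumed here)
    entity.PropertyA = (string)row["PropertyA"];
    ....
    return entity;
}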
The MainEntity can be attached to a context, its status changed to Modified, and saved.

Confusion with 3 layer design

I've been reviewing examples on the web of 3 layer design and I've noticed that most samples return either datasets or data tables. What confuses me is: what if you would rather return a generic list of a type, so you can utilize properties or methods from the type your list is based on? For example, with a Name property that concatenates various fields in a specific way depending on the data, if the List is bound to a control on a form then the Name property can be used as the data field. To accomplish the same thing with a dataset or table, you'd have to return that data from the database to achieve the same (I try not to use datasets or datatables so I'm probably very wrong about this statement. :) )
The part that really confuses me is about reusing code. To me it seems the only way to reuse code is to retrieve the data into either a dataset or datatable and then loop through the data and add it to a List. Is this generally the best practice for 3 layer, or is there a way to do this without datasets and datatables?
The example in the link below demonstrates in essence using datasets or tables and then adding it to an object but I'm forced to ask if this is the best practice?
http://www.codeproject.com/Articles/36847/Three-Layer-Architecture-in-C-NET
Thanks

Using DataTables is a specific dotnetism. The reason behind it is that they contain metadata about the structure of the data, which lets DataGrid (and other such components) display the data automatically without using reflection or so. My guess is this is amongst other things a heritage of the MS Access approach to RAD, where the intent was enabling "business people" to create apps by generating the user interface directly from a SQL schema, essentially doing the opposite of a tiered design. This heritage then seems to have leaked into the hivemind.
There's nothing wrong about using "plain" data structures, as long as you're willing to give up the RAD features, and the trend lately seems to have been to get rid of this tradeoff too. (For instance with Web Forms' strongly typed data controls, and MVC's model binding features.)
Also, speaking more generally, Code Project articles from before MVC was established are not really a good source of wisdom on general software architecture.

What you should carry your data on depends entirely on your needs.
If you retrieve data from the DB and bind it to a datagrid, datasets might give you the perfect solution. If you want some other method where data tracks its own update status you should look into Entity Framework. If you retrieve data and send it through a web service for cross platform or cross domain processing you need to load your data onto some other serializable classes of your own.
Take a look at the article below. It is a little old and targeted at EF4, but it summarizes the pros and cons of different strategies very well. (There are three articles in the series; I suggest you read them all.)
http://msdn.microsoft.com/en-us/magazine/ee335715.aspx

I think the samples you're finding used data tables and datasets because it's a simple way to show 3-tier design. Nowadays Entity Framework has largely replaced the "data access layer" mentioned in the sample.
Before entity framework when I wrote a data access layer I would return a generic list that I built from the database. To run an update, delete, or insert I would pass an object in as the parameter to the methods, then use the object's properties as the values in the sql statement. I preferred doing it that way for the reasons you mentioned but also because it allowed me to change the object definitions or db schema (or even use a different db all together) independently of each other.
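The approach described above might look roughly like this. It is a sketch, not the answerer's code: the Customer class, the CustomerDal name, and the Customers table with Id and Name columns are assumptions. It uses only the provider-neutral ADO.NET interfaces (IDataReader, IDbConnection, IDbCommand), and the demo reads from an in-memory DataTable so it runs without a database. The Update method is shown for shape only and needs a real open connection.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CustomerDal
{
    // Builds a typed list from any IDataReader, whatever provider produced it.
    public static List<Customer> ReadAll(IDataReader reader)
    {
        var result = new List<Customer>();
        int id = reader.GetOrdinal("Id"), name = reader.GetOrdinal("Name");
        while (reader.Read())
            result.Add(new Customer { Id = reader.GetInt32(id), Name = reader.GetString(name) });
        return result;
    }

    // Uses the object's properties as parameter values. The @name syntax is
    // SQL Server style; other providers may expect a different prefix.
    public static int Update(IDbConnection connection, Customer customer)
    {
        using (var cmd = connection.CreateCommand())
        {
            cmd.CommandText = "UPDATE Customers SET Name = @Name WHERE Id = @Id";
            AddParameter(cmd, "@Name", customer.Name);
            AddParameter(cmd, "@Id", customer.Id);
            return cmd.ExecuteNonQuery();
        }
    }

    private static void AddParameter(IDbCommand cmd, string name, object value)
    {
        var p = cmd.CreateParameter();
        p.ParameterName = name;
        p.Value = value ?? DBNull.Value;
        cmd.Parameters.Add(p);
    }
}

public static class Program
{
    public static void Main()
    {
        // An in-memory DataTable stands in for a real query result here.
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "Ann");
        table.Rows.Add(2, "Bob");

        using (var reader = table.CreateDataReader())
        {
            List<Customer> customers = CustomerDal.ReadAll(reader);
            Console.WriteLine(customers.Count); // prints 2
        }
    }
}
```

Because callers only see List<Customer> and Customer, the SQL text or even the provider can change behind CustomerDal without touching them.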

Beginning learning SQL with C#/ASP.NET

Sorry if this has been asked elsewhere, but I couldn't find a clear answer anywhere.
I have decided to begin learning to use relational databases a bit more, namely SQL. This is very much a beginner's question, but it's probably essential to getting started.
I'm basically a little confused about the best practice for utilizing SQL (or another database). At college I have accessed databases (using JSON strings) for things such as mobile apps, but I have never actually designed and built a database myself, as my tutor built the database we accessed.
Lets say I have a C# application that holds genealogy information (i.e. families and their members) and i wanted to store each individual on a database. Would I, simply use the structure I already have but save to fields in a database instead of an xml or text document? Or does it work the other way, i.e. do I create a database with required fields then just retrieve this from the database in a c# application and manipulate the data as I so wish, so the application would be entirely different (so the c# application basically doesn't really hold/store any data and just works on whats fed from the database)?
What's troubling me is that usually where I would store my C# objects in a dictionary or list for example, would I instead just retrieve straight from the database? Or retrieve from the database and store the data in a normal structure and work from there (surely this would defeat the point of fast searching from a database)?
I may be over-thinking it slightly. Hope that makes sense. Thanks in advance

Would I, simply use the structure I already...
or
do I create a database with required fields...
I think that is the crux of your question.
Starting from the database
For me, when building an application that uses a backend database, an Entity-Relationship diagram is pretty crucial. I found quite a nice little tutorial for you here: http://www.sum-it.nl/cursus/dbdesign/english/index.php3 but you can easily find one that suits your learning style. The key point is that you are trying to model the problem domain (the real world out there that needs your application) in a way that your application can somehow capture. Once you have an E-R diagram of related tables, it is easier to figure out the details. Using SQL Management Studio for SQL Server 2008 (Express edition) you can create a few basic tables and build the E-R diagram right there and have it generate relationships for you. You can then, at your leisure, examine the SQL used to achieve that and refine accordingly.
Personally, I always start by examining the problem domain, then I build the E-R diagram, then I build the database. I start building the C# application when I'm reasonably confident the database reflects the problem domain.
Starting from your C# application
However, what really matters is that you model the real world in a meaningful and effective way. In your case you already have a starting point in structures you've created in C# and you can use them to give you a starting point to build the E-R diagram. If you find it easier to get a C# application going and then build a database that reflects it, that should be fine. Perhaps you already have an approach that helps you capture the problem domain effectively. It's an iterative process whatever you do: building the C# code might reveal problems with the underlying database design and vice versa.
Diagramming - E-R or UML?
I'm personally convinced that this whole business is so complicated that you really need some diagrams.
to visualise your database, use an E-R diagram
to visualise your C# application use a UML class diagram
As you head towards a working application, you'll see how these 2 diagrams begin to match or at least reflect each other pretty closely. In both cases (entities or classes), understanding the relationships between objects will be really important when you query the database, because it is crucial to understand relationships between tables (especially using 1-to-many relationships to resolve a complex many-to-many relationship) and the various techniques for joining tables in queries (INNER or OUTER joins etc.). No matter how clever your C# application is, you will at some point need to understand at least some of the complexities of the SQL language, and it is easier if you can refer to an E-R diagram.
Where to store?
What's troubling me is that usually where I would store my C# objects in a dictionary or list for example, would I instead just retrieve straight from the database?
In the database, without a doubt. A C# class called Family would have a property FamilyName, say, with a setter method built in. If you discover a spelling mistake and want to change the name, the setter method would open a connection to the database, run an UPDATE query with the specified family name, (and probably the family id) as a parameter, and update the underlying field accordingly. Retrieving data would involve running a SELECT query etc.
Conclusion
Do some tutorials on how to examine a problem domain, create an entity-relationship diagram and build a set of related tables based on the diagram. I'm convinced that way you'll find it much easier to keep track of the C# classes that you build to communicate with the backend database.
Here's an example of a simple E-R diagram for families and their members:
To begin with you might think members and family could be in one table, but then you discover that creates a lot of duplication so you separate that out into family and member table with a one-to-many relationship, but then you realise that, through marriage for instance, people can belong to more than one family and you need to create a many-to-many relationship. I think the E-R diagram is the best place to work out that kind of complexity.
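The many-to-many step can be mirrored on the C# side. The sketch below is an assumption-laden illustration, not the diagram from the answer: Family, Member and FamilyMembership are hypothetical class names, with FamilyMembership playing the role of the link table that splits the many-to-many relationship into two one-to-many relationships.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Family { public int Id { get; set; } public string FamilyName { get; set; } }
public class Member { public int Id { get; set; } public string FirstName { get; set; } }

// Junction row: one per (family, member) pair, mirroring a two-column link table.
public class FamilyMembership { public int FamilyId { get; set; } public int MemberId { get; set; } }

public static class Genealogy
{
    // In-memory equivalent of joining the link table to the family table.
    public static List<string> FamiliesOf(int memberId,
        IEnumerable<Family> families, IEnumerable<FamilyMembership> links)
    {
        return (from link in links
                where link.MemberId == memberId
                join family in families on link.FamilyId equals family.Id
                orderby family.FamilyName
                select family.FamilyName).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var families = new[]
        {
            new Family { Id = 1, FamilyName = "Smith" },
            new Family { Id = 2, FamilyName = "Jones" }
        };
        var mary = new Member { Id = 1, FirstName = "Mary" };
        // Mary belongs to both families, for instance by birth and by marriage.
        var links = new[]
        {
            new FamilyMembership { FamilyId = 1, MemberId = mary.Id },
            new FamilyMembership { FamilyId = 2, MemberId = mary.Id }
        };
        Console.WriteLine(string.Join(", ", Genealogy.FamiliesOf(mary.Id, families, links)));
        // prints "Jones, Smith"
    }
}
```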

Not knowing what your structures look like or how your DB will be designed this is hard to answer. But you should be able to use existing data structures, and just pipe the data from the database instead of the XML file.
Look into LINQ to SQL; C# has strong library support for interacting with SQL databases. It may be a bit confusing at first, but it is very powerful once you learn it.

If I am right you are asking also if you should retrieve all the records from the database and store them as objects in a collection or retrieve selected records from the database and use the dataset results without placing them in a purpose defined structure.
I tend to select the records I want from the database and then load the results into my purpose defined classes / structures. This allows you to add your manipulation methods to the class holding a record result etc. without needing to take in dataset results to each method. However you will find yourself doing singular updates all the time when a batch update might be more efficient... if that makes sense.

Take a look at Entity Framework's Code First. If your data structures are classes in your application, there are techniques to create your database schema from them. As for the data: store it in your database and populate your lists and dictionaries with it, or populate a list of a class representing a genealogy individual.

If you want to write your own data classes, there's a free tutorial here written by myself. What I would definitely not do is use the data sources in ASP.NET, as these wizards are the Barty Crouches of the ASP.NET world: they appear good, but turn out to be evil, as inevitably you'll want to tweak them and you won't understand how to do this.

repositories and querying with raw sql?

I am struggling to understand how to best query a repository.
The three factors that are throwing me through a loop right now are:
Return type of data
Columns to run query on
Number of records to return
Point 1
In regards to question one:
I have Repositories with lot of methods that return a combination of both Entities and scalar values. This seems to lead to "method explosion". Should I always return an Entity object? How should I query for objects where I only need one column?
Point 2
When running a query should I include every column in the table even if I only need one, or two columns? If I create specific queries for this it leads to more methods in the Repository
Point 3
How should I provide conditions for the query? I read about Specifications, but my understanding is that you loop through the returned records and filter out the ones that pass into a new collection. This doesn't seem like a good idea performance wise. Right Now I just make a new method in the Repo like getNameById() which encapsulates the condition.
Please note that I am not using an ORM; I just have raw SQL in my Repositories.
Update
Point 1:
Based on the answers and a bit more research would this be a good implementation?
Right now I have a large repository that returns a mix of scalar and entity type objects (all the same entity). I'm thinking I could reduce this greatly if I just use a GetUser(userId) method and forget writing methods that just return single column values.
For example if I need to return a user name I could call the GetUser(userId) method that hydrates the User object and then in the service layer just filter it down to the username.
Another way would be to use some sort of QueryBuilder class I could pass into the Repository which could be parsed to generate the proper sql.
Point 2
Looking back this is pretty similar to point one and my current solution would be to just grab all table fields. It's a tradeoff between performance and maintainability.
Point 3
I would need to provide some sort of where clause. I'm not sure if this make sense doing via Specification or just a sql string. My current solution is to make new methods for these types, but I would like something more generic for the Repository
Overall, still researching into this... I'd love to hear more input into this or links to books or references that kind of tie this all together.

I have Repositories with lot of methods that return a combination of both Entities and scalar values. This seems to lead to "method explosion". Should I always return an Entity object? How should I query for objects where I only need one column?
You can fight repository method explosion similar to how you would fight other SRP violations. You can create another repository for the same entity. See this answer to a similar question.
When running a query should I include every column in the table even if I only need one, or two columns? If I create specific queries for this it leads to more methods in the Repository
This is not a DDD question. Domain driven design does not deal with 'rows and columns'. There is always some redundancy in how much data you load to 'hydrate' the domain object, but you have to measure whether this really affects your performance. If this really is a performance bottleneck, then it may be a symptom of an incorrect domain model.
How should I provide conditions for the query? I read about Specifications, but my understanding is that you loop through the returned records and filter out the ones that pass into a new collection. This doesn't seem like a good idea performance wise. Right Now I just make a new method in the Repo like getNameById() which encapsulates the condition.
This again is a data access issue. Nothing in DDD says that your repository can not convert Specification to a SQL query. It is up to you whether you do this or iterate over records in memory (as long as repository consumer only sees Specification and Repository and stays unaware of the actual implementation).
Regarding 'Raw SQL vs. ORM in DDD' you may find this answer interesting.

I agree with everything Dmitry says, but perhaps think you should have a read of CQRS.
I used to ask similar questions when getting started with DDD (regarding 'method explosion', not your SQL issues), and this led me to CQRS. Personally, I don't really see how DDD is practical without it, and it answers a lot of these sorts of questions when it comes to querying data. Using its principles, what I'd suggest is:
Only use domain repositories when committing a transaction. That is, you don't use repositories to display data in the UI. You only fetch aggregates from your repository when you want to perform an operation against them.
Your repositories only return aggregates, not individual entities separately. This makes sense now as we are only using repositories in a transactional sense, and entities can only be mutated via atomic operations and persisted by the aggregate as a whole.
You create separate repositories (or 'query services') which provide tailor made queries and data types for whatever data you need. These can return dumb DTOs with no logic.
This keeps your proper domain & repositories clean, whilst providing the means to create a thin data access layer that provides high performing queries.
Regarding the specification pattern: rather than converting it to a SQL query in code, you could provide public properties on the specification that represent the criteria. These values could then be added in the where clause of your SQL or sent as parameters to a SPROC.
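A rough sketch of that last idea: the specification only carries criteria as public properties, and a separate builder turns whichever ones are set into a parameterized WHERE clause. Everything here is hypothetical (UserSpecification, SqlQuery, UserQueryBuilder, and a Users table with Age, Country and IsActive columns). The @name placeholders are SQL Server style.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical specification: plain criteria exposed as public properties.
public class UserSpecification
{
    public int? MinAge { get; set; }
    public string Country { get; set; }
    public bool? IsActive { get; set; }
}

// SQL text plus the parameter values to bind to it.
public class SqlQuery
{
    public string Text { get; set; }
    public Dictionary<string, object> Parameters { get; } = new Dictionary<string, object>();
}

public static class UserQueryBuilder
{
    public static SqlQuery Build(UserSpecification spec)
    {
        var query = new SqlQuery();
        var clauses = new List<string>();

        // Only criteria that are actually set contribute a clause and a parameter.
        if (spec.MinAge.HasValue)
        {
            clauses.Add("Age >= @MinAge");
            query.Parameters["@MinAge"] = spec.MinAge.Value;
        }
        if (!string.IsNullOrEmpty(spec.Country))
        {
            clauses.Add("Country = @Country");
            query.Parameters["@Country"] = spec.Country;
        }
        if (spec.IsActive.HasValue)
        {
            clauses.Add("IsActive = @IsActive");
            query.Parameters["@IsActive"] = spec.IsActive.Value;
        }

        query.Text = "SELECT Id, Name FROM Users"
            + (clauses.Count > 0 ? " WHERE " + string.Join(" AND ", clauses) : "");
        return query;
    }
}

public static class Program
{
    public static void Main()
    {
        var query = UserQueryBuilder.Build(new UserSpecification { MinAge = 18, Country = "NL" });
        Console.WriteLine(query.Text);
        // prints "SELECT Id, Name FROM Users WHERE Age >= @MinAge AND Country = @Country"
    }
}
```

Values never get concatenated into the SQL text, so the filtering runs in the database while the caller still only deals with the specification object.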

First of all, you haven't really explained what you are using all these queries for. Chances are it's for user interface needs. If so, there's no need to jump through all these hoops (service->repository->domain->dto->client), just query the database as directly as possible. And what do you know, gone are the questions whether you can query for scalars or just the columns you need. Just use plain sql and return what you need. Don't create abstractions that cause friction.

Chobo,
We need to remember two things about the Repository [Fowler PoEAA][Evans DDD] pattern:
Use the Repository pattern as a simple collection. A Repository abstracts away these infrastructure details because they are not part of the domain.
If a Repository is a collection, it is a cluster of objects of the same type.
Two other patterns may help your Repository: Query Object [Fowler PoEAA] and Data Mapper [Fowler PoEAA]. A Query Object aggregates criteria using object-oriented methods and knows how to translate them into a SQL statement. A Data Mapper knows how to map between object state in the application and table columns in the database.
You can use the Lazy Load pattern [Fowler PoEAA] to mitigate the problem of large objects in memory.
Good luck!

Win-based application (C#) needs a class for handling DB Connection

I think having a class clsConnection that we can use to execute every SQL query (select, insert, update, delete, ...) would be pretty good.
But how complete could it be, and how would I build it?

You could use LINQ to SQL as AB Kolan suggested or, if you don't have time for the learning curve, I'd suggest taking a look at the Microsoft Enterprise Library Data Access Application Blocks.

You can use the DAB (SqlHelper) from the Enterprise Library. This has all the methods/properties necessary for database operations. You don't need to write your own code.
Alternatively, you can use an ORM like LINQ to SQL or NHibernate.

It sounds to me like you're just re-writing the ADO.NET SqlConnection (which can already hand you a SqlCommand via CreateCommand()). Or LINQ to SQL (or even LINQ to Entities).

When doing data access I tend to split it into 2 tiers, purely for testability.
Totally separate the logic for getting key values and managing the really low-level data collection from the atomic inserts, updates, selects, deletes etc.
This way you can test the logic of the low-level data collection very easily without needing to read and write from a database.
One layer of classes effectively manages writes to individual tables, whilst the other is concerned with getting the data from lookups etc. to populate these tables.
The business logic layer that sits on top of these 2 DAL layers manages the actual business logic. This means that the data structure is as separated from the business logic as is realistically possible, i.e. you could replace the DAL and not feel the pain so much.
The 2 routes you can take that work well are:
ADO.Net
This is very powerful as you have total control, but at the same time it is time-consuming and feels repetitive. Also, it's old school, so most people are bored of it, hence all the LINQ to SQL comments. With this you open a connection to the DB and then execute a command against it.
Basically you create a class to interface with the database and use this to use stored procedures that are in the database. The lowest level class essentially fires off the command with its parameters and then populates itself with the returned values.
and Linq 2 SQL
This is a cool system. Essentially it makes SPs redundant for 90% of cases in return for allowing strongly typed SQL-like statements in your code, which save time and are more reliable. I still use 2 DAL layers with this, but take advantage of the fact that it will generate the basic class with properties for you, and simply add functionality to actually do the atomic operations. The higher level then implements the read and write logic for multiple objects.
The nicest part is that you can generate collections of collections easily with LINQ to SQL and then write all the inserts and updates with one command (although in reality you tend to do things separately).
L2S is powerful once you start playing with it, whereas generating a collection of objects from ADO.NET can be a real pain in comparison, especially when you have to do it again and again.
Another alternative is Linq 2 entities
I have had problems with this due to linked servers; it also doesn't like views much, and if your tables don't have PKs or constraints then it doesn't like life much either. I'd stay clear of it for a while.
Of course, if you mean that you want a generic class for writing and reading data from a database, I think you will be adding complexity rather than solving a problem. Really, you can't avoid writing code ;) Each bit of data access is unique, and trying to genericise it past ADO.NET or L2S is really asking for trouble, IMO.

Small project:
A singleton class (like DatabaseConnection) might be good for what you're doing.
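One possible shape for such a singleton, offered as a sketch rather than a recommendation: the DatabaseConnection class below is hypothetical and holds only the connection string, not an open connection. That choice is an assumption on my part; providers such as SqlClient pool connections, so opening a short-lived connection per operation is usually cheap.

```csharp
using System;

// Hypothetical singleton that centralizes connection settings for a small app.
public sealed class DatabaseConnection
{
    // Lazy<T> gives thread-safe, on-demand creation of the single instance.
    private static readonly Lazy<DatabaseConnection> _instance =
        new Lazy<DatabaseConnection>(() => new DatabaseConnection());

    public static DatabaseConnection Instance => _instance.Value;

    // Set once at startup, for example from configuration.
    public string ConnectionString { get; set; } = "";

    private DatabaseConnection() { } // prevents outside construction
}

public static class Program
{
    public static void Main()
    {
        DatabaseConnection.Instance.ConnectionString = "Server=.;Database=Demo;Integrated Security=true";

        // Every caller sees the same object and therefore the same settings.
        var a = DatabaseConnection.Instance;
        var b = DatabaseConnection.Instance;
        Console.WriteLine(ReferenceEquals(a, b)); // prints True
    }
}
```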
Large project:
Enterprise Library has some database code; NHibernate or Entity Framework, perhaps.
Your question wasn't specific enough to give a very definitive answer on this.
