How to insert 61,000 data objects into SQL Server 2005? - C#

I am trying to insert 61,000+ objects, obtained via a remote call, into a SQL Server 2005 database in the fastest method possible. Any suggestions?
I have looked at using SqlBulkCopy, but I am having a few problems working out how to get the data into the right format, since I am not starting with a DataTable but instead have a list of objects. So if answers could contain code samples, that would be appreciated.
I am trying to insert the data into a temp table before processing it to keep memory usage down.
Edit...
#JP - this is something that will run every night as a scheduled batch job with an IIS ASP.NET application.
Thanks.

If this is something you are doing one time or only periodically, you should look at using SSIS (it's basically DTS on steroids). You could build a package that gets the data from one datasource and inserts it into another. There are also features for stop/start and migration tracking. Without more details on your situation, I can't really provide code, but there are a lot of code samples out there on SSIS. You can learn more and play around with SSIS in Virtual Labs.

If you intend to use the SqlBulkCopy class, I would suggest that you create a custom class that implements IDataReader and is responsible for mapping the 61,000 source data objects to the appropriate columns in the destination table, and then use this custom class as a parameter to the SqlBulkCopy WriteToServer method.
The only tricky part will be implementing the IDataReader interface in your class. But even that shouldn't be too complicated. Just remember that your goal is to have this class map your 61,000 data objects to column names, and that your class will be called by the SqlBulkCopy class to provide the data. The rest should come together pretty easily.
class CustomReaderClass : IDataReader
{
    // make sure to implement the IDataReader interface in this class
    // and a method to load the 61,000 data objects
    public void Load()
    {
        // do whatever you have to do here to load the data..
        // with the remote call..?!
    }
}

// .. later you use it like so
CustomReaderClass customReader = new CustomReaderClass();
customReader.Load();

// open the destination connection
// .. and create a new instance of SqlBulkCopy with the dest connection
using (SqlBulkCopy bulkCopyInstance = new SqlBulkCopy(destinationConnection))
{
    bulkCopyInstance.WriteToServer(customReader);
}
// close the connection and you're done!
I hope the above "pseudo-code" makes some sense..

#Miky D had the right approach, but I would like to expand the details. Implementing IDataReader is not really that hard.
To get IDataReader working with a bulk inserter you should look at implementing:
void Dispose();
int FieldCount { get; }
object GetValue(int i);
DataTable GetSchemaTable();
bool Read();
The rest can be stubs that throw NotImplementedException; see this sample.
Getting the schema table is also pretty easy. Just select one row from the target table and call GetSchemaTable().
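A small sketch of that idea (the table name and the already-open destinationConnection are placeholders; requires System.Data and System.Data.SqlClient):
static DataTable LoadTargetSchema(SqlConnection destinationConnection)
{
    // read zero rows, keep only the metadata of the destination table
    using (var cmd = new SqlCommand("SELECT TOP 1 * FROM dbo.TargetTable", destinationConnection))
    using (var schemaReader = cmd.ExecuteReader(CommandBehavior.SchemaOnly | CommandBehavior.KeyInfo))
    {
        return schemaReader.GetSchemaTable();
    }
}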
To keep stuff clearer I like to have an abstract class that throws NotImplementedException on the non-essential methods; perhaps down the line that abstract class can implement the missing bits for added robustness.
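To make that shape concrete, here is a minimal sketch along those lines (all names and the column/accessor plumbing are illustrative assumptions, not the author's code): it wraps a list of objects, does real work only in the members listed above plus a few cheap name-lookup helpers, and stubs everything else.
using System;
using System.Collections.Generic;
using System.Data;

// Maps a list of in-memory objects to the destination columns.
public class ObjectListDataReader<T> : IDataReader
{
    private readonly IList<T> _items;
    private readonly string[] _columnNames;          // destination column names, in ordinal order
    private readonly Func<T, object>[] _accessors;   // one value accessor per destination column
    private readonly DataTable _schemaTable;         // e.g. loaded from the target table as shown above
    private int _index = -1;
    private bool _closed;

    public ObjectListDataReader(IList<T> items, string[] columnNames,
                                Func<T, object>[] accessors, DataTable schemaTable)
    {
        _items = items;
        _columnNames = columnNames;
        _accessors = accessors;
        _schemaTable = schemaTable;
    }

    // --- the essential members ---
    public int FieldCount { get { return _accessors.Length; } }
    public bool Read() { return ++_index < _items.Count; }
    public object GetValue(int i) { return _accessors[i](_items[_index]) ?? DBNull.Value; }
    public DataTable GetSchemaTable() { return _schemaTable; }
    public void Dispose() { _closed = true; }

    // a few cheap extras that make column mapping by name possible
    public int GetOrdinal(string name) { return Array.IndexOf(_columnNames, name); }
    public string GetName(int i) { return _columnNames[i]; }
    public void Close() { _closed = true; }
    public bool IsClosed { get { return _closed; } }
    public bool IsDBNull(int i) { return GetValue(i) == DBNull.Value; }
    public object this[int i] { get { return GetValue(i); } }
    public object this[string name] { get { return GetValue(GetOrdinal(name)); } }

    // --- stubs for the rest of the interface ---
    public int Depth { get { throw new NotImplementedException(); } }
    public int RecordsAffected { get { throw new NotImplementedException(); } }
    public bool NextResult() { throw new NotImplementedException(); }
    public string GetDataTypeName(int i) { throw new NotImplementedException(); }
    public Type GetFieldType(int i) { throw new NotImplementedException(); }
    public int GetValues(object[] values) { throw new NotImplementedException(); }
    public bool GetBoolean(int i) { throw new NotImplementedException(); }
    public byte GetByte(int i) { throw new NotImplementedException(); }
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferOffset, int length) { throw new NotImplementedException(); }
    public char GetChar(int i) { throw new NotImplementedException(); }
    public long GetChars(int i, long fieldOffset, char[] buffer, int bufferOffset, int length) { throw new NotImplementedException(); }
    public IDataReader GetData(int i) { throw new NotImplementedException(); }
    public DateTime GetDateTime(int i) { throw new NotImplementedException(); }
    public decimal GetDecimal(int i) { throw new NotImplementedException(); }
    public double GetDouble(int i) { throw new NotImplementedException(); }
    public float GetFloat(int i) { throw new NotImplementedException(); }
    public Guid GetGuid(int i) { throw new NotImplementedException(); }
    public short GetInt16(int i) { throw new NotImplementedException(); }
    public int GetInt32(int i) { throw new NotImplementedException(); }
    public long GetInt64(int i) { throw new NotImplementedException(); }
    public string GetString(int i) { throw new NotImplementedException(); }
}
You would then hand an instance of it straight to WriteToServer, as in the pseudo-code earlier. In practice the bulk copy seems to rely mostly on Read, FieldCount, GetValue and GetOrdinal, but as the caveats below say, exactly which members it calls is not documented.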
A couple of BIG caveats with this approach:
Which methods you need to implement is not documented for SqlBulkCopy.
The follow-on is that a later version of the framework, a hotfix or a service pack may break you. So if I had mission-critical code I would bite the bullet and implement the whole interface.
I think it's pretty poor that SqlBulkCopy does not accept an additional, minimal interface for bulk inserting data; IDataReader is way too fat.

Related

Sitecore DataProvider with multiple tables

I have been reading several examples of Sitecore DataProvider implementations on a single database and single table (using the config file parameters to specify the particular table and columns to integrate with). I just wonder whether it is possible to implement a data provider working on multiple tables instead of just one. I couldn't find any examples of this, so I'm asking for any ideas or possibilities.
The first problem I encounter when I try to deal with multiple tables is overriding the GetItemDefinition method, since this method returns only one item definition and needs to know which particular table it will get the item information from (this is specified in the config file when dealing with just one table). Basically I am looking for a way to switch (dynamically) between tables without changing the config file params every time.
If you're creating a custom data provider then the implementation is left entirely up to you. If you have been following some of the examples, such as the Northwind Dataprovider, then as you state the implementation acts on a single database as specified in config. But you can specify whatever you need in the methods that you implement, and run logic to switch the SELECT statement you call in methods such as GetItemDefinition() and GetItemFields(). You can see in the Northwind example that the SQL query is dynamically built:
StringBuilder sqlSelect = new StringBuilder();
sqlSelect.AppendFormat("SELECT {0} FROM {1}", nameField, table);
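As a purely illustrative sketch of that switching logic (the template IDs, table names and the BuildSelect helper below are made up; requires System.Collections.Generic and System.Text), the provider can resolve the table per item before building the same kind of dynamic SELECT:
// map whatever discriminator you have (template ID, item ID range, etc.) to a table name
private static readonly Dictionary<string, string> TablesByTemplate = new Dictionary<string, string>
{
    { "{11111111-1111-1111-1111-111111111111}", "Customers" },
    { "{22222222-2222-2222-2222-222222222222}", "Orders" }
};

private string BuildSelect(string templateId, string idField, string nameField)
{
    // switch tables dynamically instead of reading a single table name from config
    string table = TablesByTemplate[templateId];

    StringBuilder sqlSelect = new StringBuilder();
    sqlSelect.AppendFormat("SELECT {0}, {1} FROM {2} WHERE {0} = @itemId", idField, nameField, table);
    return sqlSelect.ToString();
}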
If you are building a read-only dataprovider then you might be able to make use of SQL views, allowing you to write a query that combines the results from several tables using the UNION operator. As long as each record has a unique ID across tables (i.e. if you are using GUIDs as the ID) then this should work fine.

Fetching SQL data from different tables with ADO.NET, Generics

I often come to a point, when developing an application that connects to a MS SQL database for basic CRUD functions, where I need to iterate through the returned results and populate a list of a particular data type that matches the table design.
Specific to .NET, I would just create a method (e.g. GetAllObjA()) and use ADO.NET to specify a SqlCommand that hooks up to a stored proc to fetch the data from the returned SqlDataReader. I would then iterate through the returned rows, creating a new object and adding it to a list for each.
If, however, I wanted to fetch data for a different data type, I would rewrite this for methods GetAllObjB(), GetAllObjC() and so forth with the list's data type being different, which feels like a complete waste of rewriting code.
I realise Generics have a purpose here, where the data type can be specified in the one method such as GetAll<T>(). Unfortunately this would still require me to define which table I would be fetching the data from, and still require me to match the column names with the member names in the object, which doesn't really solve the problem, as the application code has no way of knowing how the table is designed.
Is this the extent of Generics' usefulness in this instance? As the applications I am building are fairly small scale, I don't feel an ORM is warranted if it is possible to hand-code the solution instead.
I agree with the comments that micro-ORMs can either solve your scenario or give you some good ideas. Massive is just plain fun to read because it fits in one file. It uses the dynamic functionality of the Framework to solve your issue. Dapper is oriented towards the mapping aspect.
Also, take a look at the Data Access Application Block. Although this is now open source, it was originally maintained by Microsoft. It was an overall enterprise application framework, so there are several bloated dependencies that you do not need. But the data access block has some great prototypes for exactly what you are asking for: mapping an IDataReader resultset to a POCO using generics. So you write the mapping code only once, and the only thing you define per table is the actual reader-to-POCO property mapping.
Sometimes a table or its mapping may have some quirks, so you want to keep the mapping definition by hand. For other simple tables with basic primitives, the use of dynamics as demonstrated by Rob Connery's Massive combined with the generic rowset mapper can make for some very easy-to-read-and-maintain code.
Disclaimer: this is not a judgement on the merits of this approach over EF. It is simply suggesting a simple, non-EF approach.
So, you might have a small library defining an interface:
public interface IResultSetMapper<TResult>
{
    Task<List<TResult>> MapSetAsync(IDataReader reader);
}
and a generic ADO.Net helper class that processes any reader:
// a method within a class that manages the Command (_Command) and Connection objects
public async Task<List<TOut>> ExecuteReaderAsync<TOut>(IResultSetMapper<TOut> resultsMapper)
{
    List<TOut> results = null;
    try
    {
        using (var connection = await _DbContext.CreateOpenedConnectionAsync())
        {
            _Command.Connection = connection;

            // execute the reader and iterate the results; when done, the connection is closed
            using (var reader = await _Command.ExecuteReaderAsync())
            {
                results = await resultsMapper.MapSetAsync(reader);
            }
        }
        return results;
    }
    catch (Exception cmdEx)
    {
        // handle or log the exception...
        throw;
    }
}
So, the above code would be a helper library, written once. Then the mapper in your application might look like:
internal class ProductReaderMap : IResultSetMapper<Product>
{
    public Task<List<Product>> MapSetAsync(IDataReader reader)
    {
        List<Product> results = new List<Product>();

        // advance through each row of the result set and map it to a Product
        while (reader.Read())
        {
            results.Add(new Product
            {
                ProductId = reader.GetInt32(0),
                ProductName = reader.GetString(1),
                SupplierId = reader.GetInt32(2),
                UnitsInStock = reader.GetInt16(3)
            });
        }
        return Task.FromResult(results);
    }
}
You could break this out even further, defining a row mapper rather than a row-set mapper, since the iteration over the reader can be abstracted as well.
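As a rough sketch of that refinement (IRowMapper and RowSetMapper are names introduced here, not part of the code above): map a single IDataRecord, and let one shared generic class drive the iteration for every type.
using System.Collections.Generic;
using System.Data;
using System.Threading.Tasks;

public interface IRowMapper<TResult>
{
    TResult MapRow(IDataRecord record);
}

// reuses the IResultSetMapper<TResult> interface defined earlier
public class RowSetMapper<TResult> : IResultSetMapper<TResult>
{
    private readonly IRowMapper<TResult> _rowMapper;

    public RowSetMapper(IRowMapper<TResult> rowMapper)
    {
        _rowMapper = rowMapper;
    }

    public Task<List<TResult>> MapSetAsync(IDataReader reader)
    {
        var results = new List<TResult>();
        while (reader.Read())
        {
            // IDataReader extends IDataRecord, so the current row can be handed straight to the row mapper
            results.Add(_rowMapper.MapRow(reader));
        }
        return Task.FromResult(results);
    }
}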

Static vs. Instance Write Methods in Data Access Layer

I am creating a Data Access Layer in C# for an SQL Server database table. The data access layer contains a property for each column in the table, as well as methods to read and write the data from the database. It seems to make sense to have the read methods be instance based. The question I have is regarding handling the database generated primary key property getter/setter and the write method. As far as I know I have three options...
Option 1: Using a static method while only allowing a getter on the primary key would allow me to enforce writing all of the correct values into the database, but it is unwieldy to use as a developer.
Option 2: Using an instance-based write method would be more maintainable, but I am not sure how I would handle the get/set on the primary key, and I would probably have to implement some kind of validation of the instance prior to writing to the database.
Option 3: Something else, but I am wary of LINQ and drag/drop stuff, they have burned me before.
Is there a standard practice here? Maybe I just need a link to a solid tutorial?
You might want to read up on active record patterns and some examples of them, and then implement your own class/classes.
Here's a rough sketch of a simple class that contains some basic concepts (below).
Following this approach you can expand on the pattern to meet your needs. You might be OK with retrieving a record from the DB as an object, altering its values, then updating the record (Option 2). Or, if that is too much overhead, use a static method that directly updates the record in the database (Option 1). For an insert, the database (SP/query) should validate the natural/unique key on the table if you need it to, and probably return a specific value/code indicating a unique-constraint error. For updates, the same check would need to be performed if natural key fields are allowed to be updated.
A lot of this depends on what functionality your application will allow for the specific table.
I tend to prefer retrieving an object from the DB then altering values and saving, over static methods. For me, it's easier to use from calling code and can handle arcane business logic inside the class easier.
public class MyEntityClass
{
    private bool _isNew;
    private bool _isDirty;
    private int _pkValue;
    private string _colValue;

    public MyEntityClass()
    {
        _isNew = true;
    }

    public int PKValue
    {
        get { return _pkValue; }
    }

    public string ColValue
    {
        get { return _colValue; }
        set
        {
            if (value != _colValue)
            {
                _colValue = value;
                _isDirty = true;
            }
        }
    }

    public void Load(int pkValue)
    {
        _pkValue = pkValue;
        //TODO: query the database and set member vars based on the results (_colValue)
        // if data found
        _isNew = false;
        _isDirty = false;
    }

    public void Save()
    {
        if (_isNew)
        {
            //TODO: insert the record into the DB
            //TODO: return the DB-generated PK ID value from the SP/query and set it to _pkValue
        }
        else if (_isDirty)
        {
            //TODO: update the record in the DB
        }
    }
}
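Hypothetical usage of the sketch above (the PK value 42 is illustrative):
var entity = new MyEntityClass();
entity.Load(42);              // fetch the existing record; the object is no longer "new"
entity.ColValue = "updated";  // the setter flags the object as dirty
entity.Save();                // issues an UPDATE because the object is not new but is dirty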
Have you had a look at the Entity Framework? I know you said you are wary of LINQ, but EF4 takes care of a lot of the things you mentioned and is a fairly standard practice for DALs.
I would stick with an ORM Tool (EF, OpenAccess by Telerik, etc) unless you need a customized dal that you need (not want) total control over. For side projects I use an ORM - at work however we have our own custom DAL with provider abstractions and with custom mappings between objects and the database.
NHibernate is also a very solid, tried-and-true ORM with a large community backing it.
Entity Framework is the way to go for your initial DAL, then optimize where you need to: our company actually did some benchmarking comparing EF against a plain SQL reader, and found that when querying the database for one or two tables' worth of information, the performance is roughly the same (neither being appreciably faster than the other). After two tables there is a performance hit, but it's not terribly significant. The one place where writing your own SQL statements became worthwhile was in batch commit operations, at which point EF allows you to write the SQL queries directly. So save yourself some time and use EF for the basic heavy lifting, and then use its direct connection for the more complicated operations. (It's the best of both worlds.)
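For illustration, a hedged sketch of that EF4 "direct SQL" escape hatch, assuming an ObjectContext-derived context named MyModelEntities and an Orders table (both made up):
using (var context = new MyModelEntities())
{
    // ExecuteStoreCommand sends the SQL straight to the database, bypassing change tracking;
    // {0} is bound as a parameter, not concatenated into the string
    int rows = context.ExecuteStoreCommand(
        "UPDATE dbo.Orders SET Archived = 1 WHERE OrderDate < {0}",
        DateTime.Today.AddYears(-1));
}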

How to convert a DataSet object into an ObjectContext (Entity Framework) object on the fly?

I have an existing SQL Server database, where I store data from large specific log files (often 100 MB and more), one per database. After some analysis, the database is deleted again.
From the database, I have created both an Entity Framework model and a DataSet model via the Visual Studio designers. The DataSet is only for bulk importing data with SqlBulkCopy, after a quite complicated parsing process. All queries are then done using the Entity Framework model, whose CreateQuery method is exposed via an interface like this:
public IQueryable<TTarget> GetResults<TTarget>() where TTarget : EntityObject, new()
{
    return this.Context.CreateQuery<TTarget>(typeof(TTarget).Name);
}
Now, sometimes my files are very small and in such a case I would like to omit the import into the database, and instead just have an in-memory representation of the data, accessible as entities. The idea is to create the DataSet, but instead of bulk importing, to transfer it directly into an ObjectContext which is accessible via the interface.
Does this make sense?
Now here's what I have done for this conversion so far: I traverse all tables in the DataSet, convert the single rows into entities of the corresponding type and add them to an instantiated object of my typed entity context class, like so:
MyEntities context = new MyEntities(); // create a new in-memory context
//....

// get the item in the navigations table
MyDataSet.NavigationResultRow dataRow = ds.NavigationResult.First(); // here, a foreach would be necessary in a real-world scenario

NavigationResult entity = new NavigationResult
{
    Direction = dataRow.Direction,
    //...
    NavigationResultID = dataRow.NavigationResultID
}; // convert to an entity

context.AddToNavigationResult(entity); // add to the entities
//....
Very tedious work, as I would need to create a converter for each of my entity types and iterate over each table in the DataSet I have. And beware if I ever change my database model....
Also, I have found out that I can only instantiate MyEntities if I provide a valid connection string to a SQL Server database. Since I do not want to actually write to my fully fledged database each time, this hinders my intentions. I intend to have only some in-memory proxy database.
Can I do this more simply? Is there some automated way of doing such a conversion, like generating an ObjectContext out of a DataSet object?
P.S: I have seen a few questions about unit testing that seem somewhat related, but not quite exact.
There are tools that map between objects, such as AutoMapper, a very good open-source tool.
However, these tools sometimes have problems, for example generating duplicate entity keys, or trouble when the structures of the objects being mapped are very different.
If you are trying to automate it, I think that there is a greater chance of it working if you use EF 4 and POCO objects.
If you end up writing the mapping code manually, I would move it into a separate procedure with automated unit tests on it.
The way we do this is to create a static class with "Map" methods:
From DTO to EF object
From EF to DTO
Then write a test for each method in which we check that the fields were mapped correctly.
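For example, a sketch of such a static mapper for the NavigationResult type from the question might look like this (the typed-DataSet member names follow the usual New<Table>Row codegen pattern and are assumptions):
// Hypothetical static mapper; a unit test per method asserts that each field is copied correctly.
public static class NavigationResultMapper
{
    // DataSet row (DTO) -> EF entity
    public static NavigationResult ToEntity(MyDataSet.NavigationResultRow row)
    {
        return new NavigationResult
        {
            NavigationResultID = row.NavigationResultID,
            Direction = row.Direction
            // ... remaining columns
        };
    }

    // EF entity -> DataSet row (assumes the typed table's generated NewNavigationResultRow method)
    public static MyDataSet.NavigationResultRow ToRow(NavigationResult entity, MyDataSet.NavigationResultDataTable table)
    {
        var row = table.NewNavigationResultRow();
        row.NavigationResultID = entity.NavigationResultID;
        row.Direction = entity.Direction;
        // ... remaining columns
        return row;
    }
}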

What DAL strategy do you use or suggest?

My situation is that I screwed up essentially. I inherited my code base about 1.5 years ago when I took this position and rather than reinventing the wheel, even though I know now that I should have, I kept the DAL in pretty much the same structure as the previous developer.
Essentially there is one file (now at 15k lines of code) that serves as a go-between for a bunch of DAOs that use DataSets and TableAdapters to retrieve data. My XSD files have grown to such a size that they cause R# to crash Visual Studio every time it opens, and the intermediary class that is now 15k lines also takes forever for R# to analyze. Not to mention it is ugly, it works but not well, and it is an absolute nightmare to debug.
What I have tried thus far is switching to NHibernate. NHibernate is a great library, but unfortunately it was not adaptable enough to work with my application; from what the lead developer (Fabio Maulo) says, it is pretty much a combination of my application's requirements and the restrictions NHibernate imposes when using identity as a database PK strategy.
So now I am back to essentially designing my own DAL. I am looking at a few different patterns for this, but would like to get your DAL design strategies. There are so many ways and reasons to implement a DAL in a particular manner so if you could please explain your strategy and why it was best fit for you I would greatly appreciate it.
Thanks in advance!
Edit: Let me explain why NHibernate did not work, since that seems to be the immediate response. My users create a "job" that is actually just a transient representation of my Job class. Within this job they will give it one or a list of weight factors that are also transient at the time of creation. Finally they provide a list of job details that have a particular weight factor associated with them. Because weight factors are unique in the DB, when I go to persist the job and it cascades down to the weight factors, it dies when it finds a duplicate weight factor. I tried running a check before assigning the weight factor to the detail (which I didn't want to do because I don't want the extra calls to the DB), but calling CreateCriteria in NH also causes a flush in the session, according to Fabio, which destroys my cache and thus kills the entire in-memory representation of the job. Folks over at the NH mailing list said I should switch over to GUIDs, but that is not a feasible option as the conversion process would be a nightmare.
My experience with NHibernate is that, while it is packed with features and very high-performance, you will eventually need to become an NHibernate expert in order to fix some unexpected behavior. Reading through the pro-NHibernate answers and seeing
"Hmm, perhaps he uses long running Sessions (Session per Business Transaction model), and in such an approach, using identity is discouraged, since it breaks your unitofwork (it needs to flush directly after inserting a new entity). A solution could be to drop the identity, and use the HiLo identity generator."
illustrates exactly what I mean.
What I've done is create a base class modeled somewhat on the ActiveRecord pattern. I inherit from it and mark up the inherited class with attributes that attach it to a stored procedure for each of Select, Insert, Update and Delete. The base class uses reflection to read the attributes and assign the class's property values to SP parameters and, in the case of Select(), to assign the result SqlDataReader's column values to the properties of a list of generics.
This is the interface behind DataObjectBase:
interface IDataObjectBase<T>
{
    void Delete();
    void Insert();
    System.Collections.Generic.List<T> Select();
    void Update();
}
This is an example of a data class deriving from it:
[StoredProcedure("usp_refund_CustRefundDetailInsert", OperationType.Insert)]
[StoredProcedure("usp_refund_CustRefundDetailSelect", OperationType.Select)]
[StoredProcedure("usp_refund_CustRefundDetailUpdate", OperationType.Update)]
public class RefundDetail : DataObjectBase<RefundDetail>
{
[StoredProcedureParameter(null, OperationType.Update, ParameterDirection.Input)]
[StoredProcedureParameter(null, OperationType.Insert, ParameterDirection.Output)]
[StoredProcedureParameter(null, OperationType.Select, ParameterDirection.Input)]
[ResultColumn(null)]
public int? RefundDetailId
{ get; set; }
[StoredProcedureParameter(null, OperationType.Update, ParameterDirection.Input)]
[StoredProcedureParameter(null, OperationType.Insert, ParameterDirection.Input)]
[StoredProcedureParameter(null, OperationType.Select, ParameterDirection.Input)]
[ResultColumn(null)]
public int? RefundId
{ get; set; }
[StoredProcedureParameter(null, OperationType.Update, ParameterDirection.Input)]
[StoredProcedureParameter(null, OperationType.Insert, ParameterDirection.Input)]
[ResultColumn(null)]
public int RefundTypeId
{ get; set; }
[StoredProcedureParameter(null, OperationType.Update, ParameterDirection.Input)]
[StoredProcedureParameter(null, OperationType.Insert, ParameterDirection.Input)]
[ResultColumn(null)]
public decimal? RefundAmount
{ get; set; }
[StoredProcedureParameter(null, OperationType.Update, ParameterDirection.Input)]
[StoredProcedureParameter(null, OperationType.Insert, ParameterDirection.Input)]
[ResultColumn(null)]
public string ARTranId
{ get; set; }
}
I know it seems like I'm reinventing the wheel, but all of the libraries I found either had too much dependence on other libraries (ActiveRecord + NHibernate, for instance, which was a close second) or were too complicated to use and administer.
The library I made is very lightweight (maybe a couple of hundred lines of C#) and doesn't do anything more than assign values to parameters and execute the SP. It also lends itself very well to code generation, so eventually I expect to write no data access code. I also like that it uses a class instance instead of a static class, so that I can pass data to queries without some awkward criteria collection or HQL. Select() means "get more like me".
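For illustration only, the reflection step inside such a base class might look roughly like this; the sketch assumes the attribute classes shown above expose Name, Operation and Direction properties, which may not match the author's actual library.
using System;
using System.Data;
using System.Data.SqlClient;
using System.Linq;

// Hypothetical sketch of the kind of reflection DataObjectBase<T> could use.
public abstract class DataObjectBaseSketch<T>
{
    protected SqlCommand BuildCommand(OperationType operation, SqlConnection connection)
    {
        // find the stored procedure registered on the concrete class for this operation
        var spAttr = GetType()
            .GetCustomAttributes(typeof(StoredProcedureAttribute), true)
            .Cast<StoredProcedureAttribute>()
            .First(a => a.Operation == operation);

        var command = new SqlCommand(spAttr.Name, connection) { CommandType = CommandType.StoredProcedure };

        // add one parameter per property marked for this operation
        foreach (var property in GetType().GetProperties())
        {
            foreach (StoredProcedureParameterAttribute p in
                     property.GetCustomAttributes(typeof(StoredProcedureParameterAttribute), true))
            {
                if (p.Operation != operation)
                    continue;

                var parameter = command.Parameters.AddWithValue(
                    "@" + (p.Name ?? property.Name),              // null in the example above means "use the property name"
                    property.GetValue(this, null) ?? DBNull.Value);
                parameter.Direction = p.Direction;
            }
        }
        return command;
    }
}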
For me the best fit was a pretty simple concept - use DAO class definitions and with reflection create all SQL necessary to populate and save them. This way there is no mapping file, only simple classes. My DAO's require an Entity base class so it is not a POCO but that doesn't bother me. It does support any type of primary key, be it single identity column or multi column.
If your DAL is written to an interface, it would be much easier to switch to NHibernate or something comparable (I would prefer Fluent NHibernate, but I digress). So why not spend the time instead refactoring the DAL to use an interface, and then write a new implementation using NH or your ORM of choice?
In recent projects we have stopped programming a separate DAL.
Instead we use an Object Relational Mapper (in our case Entity Framework). We then let the business layer program directly against the ORM.
This has saved us over 90% of development effort in some cases.
My first step would be to break the code out of a 15 KLOC monster, then come up with a strategy for creating a new DAL.
LINQ to SQL is nice if you are using SQL Server. There is source out there for a LINQ to SQL provider for Access and MySQL, though I haven't tested it. LINQ to SQL follows the unit-of-work model, which is similar to the way ADO.NET functions: you make a series of changes to a local copy of the data, then commit all the changes with one update call. It's pretty clean, I think.
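For instance, a hypothetical LINQ to SQL unit of work, assuming a generated NorthwindDataContext with a Products table:
using (var db = new NorthwindDataContext())
{
    // make several local changes...
    var product = db.Products.Single(p => p.ProductID == 1);
    product.ProductName = "Renamed product";
    db.Products.InsertOnSubmit(new Product { ProductName = "Brand new product" });

    // ...then commit them all to the database in one call
    db.SubmitChanges();
}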
You can also extend the DataRow class yourself to provide strongly typed access to your fields. I used XSLT to generate the DataRow descendants based on the metadata of each table. I have a generic DataTable descendant, MyDataTable&lt;T&gt;, where T is my derived row. I know that MS's strongly typed DataSets do a similar thing, but I wanted a lightweight generic version that I have complete control of. Once you have this, you can write static access methods that query the db and fill the DataTable.
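A minimal sketch of that hand-rolled pair (CustomerRow and the "Name" column are illustrative):
using System;
using System.Data;

public class MyDataTable<T> : DataTable where T : DataRow
{
    protected override Type GetRowType()
    {
        return typeof(T);
    }

    protected override DataRow NewRowFromBuilder(DataRowBuilder builder)
    {
        // create the derived row type so every row in the table is strongly typed
        return (DataRow)Activator.CreateInstance(typeof(T), builder);
    }
}

public class CustomerRow : DataRow
{
    public CustomerRow(DataRowBuilder builder) : base(builder) { }

    // strongly typed access to a column; generated from table metadata in practice
    public string Name
    {
        get { return (string)this["Name"]; }
        set { this["Name"] = value; }
    }
}
You would still add the columns to the table (e.g. Columns.Add("Name", typeof(string))) before filling it; that is the part the XSLT-generated code would take care of.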
You would be in charge of writing the changes from the DataTable back to the data source. I would write a generic class or method that creates the updates, inserts and deletes.
Good Luck!
I use my own wrapper for SPs for the fastest data retrieval, and L2S when performance is not a goal. My DAL uses the repository pattern and encapsulated logic for TDD.
