Is there a better/faster way to access raw data? - c#

This isn't related to a particular issue BUT is a question regarding "best practice".
For a while now, when I need to get data straight from the database I've been using the following method - I was wondering if there's a faster method which I don't know about?
DataTable results = new DataTable();
using (SqlConnection connection = new SqlConnection(ConfigurationManager.ConnectionStrings["Name"].ConnectionString))
{
    connection.Open();
    using (SqlCommand command = new SqlCommand("StoredProcedureName", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        /* Optionally set command.Parameters here */
        results.Load(command.ExecuteReader());
    }
}
/*Do something useful with the results*/

There are indeed various ways of reading data; DataTable is quite a complex beast (with support for a number of complex scenarios - referential integrity, constraints, computed values, on-the-fly extra columns, indexing, filtering, etc). In a lot of cases you don't need all that; you just want the data. To do that, a simple object model can be more efficient, both in memory and performance. You could write your own code around IDataReader, but that is a solved problem, with a range of tools that do that for you. For example, you could do that via dapper with just:
class SomeTypeOfRow { // define something that looks like the results
    public int Id { get; set; }
    public string Name { get; set; }
    // ...
}
...
var rows = connection.Query<SomeTypeOfRow>("StoredProcedureName",
    /* optionalParameters, */ commandType: CommandType.StoredProcedure).ToList();
which then very efficiently populates a List<SomeTypeOfRow>, without all the DataTable overheads. Additionally, if you are dealing with very large volumes of data, you can do this in a fully streaming way, so you don't need to buffer 2M rows in memory:
var rows = connection.Query<SomeTypeOfRow>("StoredProcedureName",
    /* optionalParameters, */ commandType: CommandType.StoredProcedure,
    buffered: false); // an IEnumerable<SomeTypeOfRow>
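Note that the unbuffered result is lazy: rows are materialized from the reader as you iterate, so it must be consumed before the connection is disposed. For example (Process is just a placeholder for whatever per-row work you do):
foreach (var row in rows)
{
    // rows is lazy; keep the connection open until this loop completes
    Process(row); // hypothetical per-row handler
}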
For completeness, I should explain optionalParameters; if you wanted to pass @id=1, @name="abc", that would be just:
var rows = connection.Query<SomeTypeOfRow>("StoredProcedureName",
    new { id = 1, name = "abc" },
    commandType: CommandType.StoredProcedure).ToList();
which is, I think you'll agree, a pretty concise way of describing the parameters. This parameter is entirely optional, and can be omitted if no parameters are required.
As an added bonus, it means you get strong-typing for free, i.e.
foreach (var row in rows)
{
    Console.WriteLine(row.Id);
    Console.WriteLine(row.Name);
}
rather than having to talk about row["Id"], row["Name"] etc.
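For comparison, the DataTable equivalent is stringly keyed and only checked at runtime (a sketch, reusing the results table from the question):
foreach (DataRow row in results.Rows)
{
    Console.WriteLine((int)row["Id"]);      // cast required; fails at runtime if the type is wrong
    Console.WriteLine((string)row["Name"]); // a typo in the column name also only fails at runtime
}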

Related

Select * from two tables into two separate lists

Basic idea in pseudocode
Select(a, b) => new Tuple<List<Item1>, List<Item2>>(a, b)
I am trying to accomplish this in:
A single query to the db
obviously using linq (query or method syntax)
Here are the two classes involved
public class Bundle
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public HashSet<Inventory> Inventories { get; set; }
}
public class Inventory
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public int Stock { get; set; }
}
Right now all I can think of is
using (var context = new MyEntities())
{
    return new Tuple<IEnumerable<Inventory>, IEnumerable<Bundle>>(
        context.Inventories.OrderBy(a => a.Stock).ToList(),
        context.Bundles.Include(b => b.Inventories).OrderBy(c => c.Name).ToList());
}
However, this would hit the database twice.
I know UNION is used to combine result sets from 2 different queries but the two queries must have the same # of columns so I'm assuming it's best used when selecting the same data.
How can I select data from two different tables into two separate lists with only ONE hit on the db?
If you want two result sets, you can do it by issuing two queries. This can be done in a single database call without issue, but it won't magically be divided into the two sets of objects you are interested in.
In fact, asking more than one question and getting more than one result set is very common when the cost of establishing a connection (instantiation cost, latency cost, etc.) is great enough to warrant it. I have done it myself in a stored procedure, asking for everything a page needs in one query.
But, if performance is the key issue, caching is also very common. And, if these are drop-down lists, or something else where the list requested is small and does not change often, you can even pull it into memory when the application starts and let it sit on the web server, so you are not making the database trip at all.
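As a sketch of that idea (LookupItem, LookupCache and LoadFromDatabase are illustrative names, not from the question), a Lazy<T> gives you a simple load-once cache:
public class LookupItem
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class LookupCache
{
    // Loaded once, on first access, then served from memory for the app's lifetime.
    private static readonly Lazy<List<LookupItem>> Items =
        new Lazy<List<LookupItem>>(LoadFromDatabase);

    public static List<LookupItem> Get() { return Items.Value; }

    private static List<LookupItem> LoadFromDatabase()
    {
        var list = new List<LookupItem>();
        // ... one database trip here (SqlCommand/SqlDataReader, dapper, etc.) ...
        return list;
    }
}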
I am not fond of LINQ to SQL, as it creates a mess for DBAs, but you can do it with something like this (just one example; any method where you can chain commands will work):
var connString = "{conn string here}";
var commandString = "select * from tableA; select * from tableB";
var conn = new SqlConnection(connString);
var command = new SqlCommand(commandString, conn);
try
{
    conn.Open();
    using (var reader = command.ExecuteReader())
    {
        // work with the first result set (tableA) here
        while (reader.Read()) { /* ... */ }

        // advance to the second result set (tableB)
        reader.NextResult();
        while (reader.Read()) { /* ... */ }
    }
}
finally
{
    conn.Dispose();
}
I have not filled in all of the details, but you can do this with any number of commands. Once again, if the information does not change, consider a single hit and holding it in memory (caching through programming) or using some other type of cache mechanism.
If you are worried about performance, I don't think that reducing the number of queries is the way to go. In fact, if you want fine-grained optimizations, LINQ isn't the most appropriate tool, either.
That being said, you could make the two different objects match the same interface/columns, filling in dummy properties for those missing in the other type. This should theoretically be translated to a SQL union containing all of the columns.
var union = context.Inventories
    .Select(i => new { i.Id, i.Name, i.Stock, Inventories = (HashSet<Inventory>)null })
    .Concat(context.Bundles.Select(b => new { b.Id, b.Name, Stock = 0, b.Inventories }));
Note that in this case Concat is preferred over Union, as it doesn't alter the order of your rows and allows duplicate rows.
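If you go that route, you still need a way to split the combined rows back into two lists on the client. One option, assuming you are willing to project an extra discriminator column (which the snippet above does not include), is:
// Sketch: a discriminator column lets you partition the combined rows client-side.
var combined = context.Inventories
    .Select(i => new { i.Id, i.Name, i.Stock, IsBundle = false })
    .Concat(context.Bundles.Select(b => new { b.Id, b.Name, Stock = 0, IsBundle = true }))
    .ToList(); // a single database hit

var inventories = combined.Where(x => !x.IsBundle).ToList();
var bundles = combined.Where(x => x.IsBundle).ToList();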
What you are asking for is not possible.
If you are trying to get two distinct results, you need to hit the DB twice, once per result set. You can do an outer join, but this would make your result set larger than it should be, and judging from the fact that you want to hit the DB once, you care about performance.
Also, if performance were an issue, I would not use LINQ in the first place.
I think that you can do this by asking for multiple result sets (as Gregory pointed out already); I am not sure, though, what the "best" way to do this would be. Why don't you take a look at this MSDN article, for example?
How to: Use Stored Procedures Mapped for Multiple Result Shapes
Or this, for a linq-to-sql approach:
LINQ to SQL: returning multiple result sets
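For illustration, a rough plain ADO.NET sketch of the multiple-result-set idea applied to this question (connString is a placeholder, and ReadInventory/ReadBundle are hypothetical mappers you would write against the reader):
var inventories = new List<Inventory>();
var bundles = new List<Bundle>();
using (var conn = new SqlConnection(connString))
using (var cmd = new SqlCommand(
    "SELECT * FROM Inventories ORDER BY Stock; SELECT * FROM Bundles ORDER BY Name;", conn))
{
    conn.Open();
    using (var rdr = cmd.ExecuteReader())
    {
        while (rdr.Read())
            inventories.Add(ReadInventory(rdr)); // hypothetical mapper
        rdr.NextResult();                        // advance to the second result set
        while (rdr.Read())
            bundles.Add(ReadBundle(rdr));        // hypothetical mapper
    }
}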

What is the most efficient way of retrieving data from a DataReader?

I spend a lot of time querying a database and then building collections of objects from the query. For performance I tend to use a DataReader, and the code looks something like:
while (rdr.Read())
{
    var myObj = new myObj();
    myObj.Id = Int32.Parse(rdr["Id"].ToString());
    // more populating of myObj from rdr
    myObj.Created = (DateTime)rdr["Created"];
}
For objects like DateTime I simply cast the rdr value to the required class, but this can't be done for value types like int - hence the (IMHO) laborious ToString() followed by Int32.Parse(...).
Of course there is an alternative:
myObj.Id = rdr.GetInt32(rdr.GetOrdinal("Id"));
which looks cleaner and doesn't involve a call to ToString().
A colleague and I were discussing this today - he suggests that accessing rdr twice in the above code might be less efficient than doing it my old skool way - could anyone confirm or deny this and suggest which of the above is the best way of doing this sort of thing? I would especially welcome answers from @JonSkeet ;-)
I doubt there will be a very appreciable performance difference, but you can avoid the name lookup on every row simply by lifting it out of the loop. This is probably the best you'll be able to achieve:
int idIdx = rdr.GetOrdinal("Id");
int createdIdx = rdr.GetOrdinal("Created");
while (rdr.Read())
{
    var myObj = new myObj();
    myObj.Id = rdr.GetFieldValue<int>(idIdx);
    // more populating of myObj from rdr
    myObj.Created = rdr.GetFieldValue<DateTime>(createdIdx);
}
I usually introduce a RecordSet class for this purpose:
public class MyObjRecordSet
{
    private readonly IDataReader InnerDataReader;
    private readonly int OrdinalId;
    private readonly int OrdinalCreated;

    public MyObjRecordSet(IDataReader dataReader)
    {
        this.InnerDataReader = dataReader;
        this.OrdinalId = dataReader.GetOrdinal("Id");
        this.OrdinalCreated = dataReader.GetOrdinal("Created");
    }

    public int Id
    {
        get { return this.InnerDataReader.GetInt32(this.OrdinalId); }
    }

    public DateTime Created
    {
        get { return this.InnerDataReader.GetDateTime(this.OrdinalCreated); }
    }

    public MyObj ToObject()
    {
        return new MyObj
        {
            Id = this.Id,
            Created = this.Created
        };
    }

    public static IEnumerable<MyObj> ReadAll(IDataReader dataReader)
    {
        MyObjRecordSet recordSet = new MyObjRecordSet(dataReader);
        while (dataReader.Read())
        {
            yield return recordSet.ToObject();
        }
    }
}
Usage example:
List<MyObj> myObjects = MyObjRecordSet.ReadAll(rdr).ToList();
This makes the most sense to a reader. As for whether it's the most "efficient": you're literally calling two functions instead of one, and that's not going to be as significant as casting and then calling a function. Ideally you should go with the option that looks more readable, provided it doesn't hurt your performance.
var ordinal = rdr.GetOrdinal("Id");
var id = rdr.GetInt32(ordinal);
myObj.Id = id;
Actually, there are differences in performance in how you use SqlDataReader, but they are somewhere else. Namely, the ExecuteReader method accepts CommandBehavior.SequentialAccess:
Provides a way for the DataReader to handle rows that contain columns with large binary values. Rather than loading the entire row, SequentialAccess enables the DataReader to load data as a stream. You can then use the GetBytes or GetChars method to specify a byte location to start the read operation, and a limited buffer size for the data being returned.
When you specify SequentialAccess, you are required to read from the columns in the order they are returned, although you are not required to read each column. Once you have read past a location in the returned stream of data, data at or before that location can no longer be read from the DataReader. When using the OleDbDataReader, you can reread the current column value until reading past it. When using the SqlDataReader, you can read a column value only once.
If you do not use large binary values then it makes very little difference. Getting a string and parsing it is suboptimal, true; it is better to get the value with rdr.GetSqlInt32(column) rather than GetInt32() because of NULLs. But the difference should not be noticeable in most applications, unless your app is truly doing nothing else but reading huge datasets. Most apps do not behave that way. Focusing on optimising the database call itself (i.e. making the query execute fast) will reap far greater benefits 99.9999% of the time.
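For illustration, a rough sketch of the large-binary case the documentation describes (the column ordinals and buffer size are illustrative; assume column 0 is an id and column 1 is a large varbinary):
using (var reader = command.ExecuteReader(CommandBehavior.SequentialAccess))
{
    while (reader.Read())
    {
        // with SequentialAccess, columns must be read strictly in order
        int id = reader.GetInt32(0);

        // stream the blob in chunks instead of buffering the whole row
        var buffer = new byte[8040];
        long offset = 0;
        long read;
        while ((read = reader.GetBytes(1, offset, buffer, 0, buffer.Length)) > 0)
        {
            // write buffer[0..read) to a file/stream here
            offset += read;
        }
    }
}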
For objects like DateTime I simply cast the rdr value to the required class, but this can't be done for value types like int
This isn't true: DateTime is also a value type and both of the following work in the same way, provided the field is of the expected type and is not null:
myObj.Id = (int) rdr["Id"];
myObj.Created = (DateTime)rdr["Created"];
If it's not working for you, perhaps the field you're reading is NULL? Or not of the required type, in which case you need to cast twice. E.g. for a SQL NUMERIC field, you might need:
myObj.Id = (int) (decimal) rdr["Id"];

Enumerable Chaining and Reset

I'm trying to import a file into a database and learn a more efficient way of doing things along the way. This article suggested chaining enumerations yields low memory usage and good performance.
This is my first time chaining several enumerations, and I'm not quite sure how to handle a reset appropriately...
Short story:
I have an enumeration which reads from a text file and maps to a DTO (see the Map function), a Where filter, followed by an Import that takes an enumeration... It all works perfectly, except when the filter returns 0 records: in that case, SQL errors with System.ArgumentException: There are no records in the SqlDataRecord enumeration....
So I put an if (!list.Any()) return; at the top of my Import method, and it no longer errors. Except it now always skips all the rows up to (and including) the first valid row in the text file...
Do I need to somehow Reset() the enumerator after my Any() call? Why is this not necessary when the same construct is used in LINQ and other Enumerable implementations?
Code:
public IEnumerable<DTO> Map(TextReader sr)
{
    string line;
    while (null != (line = sr.ReadLine()))
    {
        var dto = new DTO();
        var row = line.Split('\t');
        // ... mapping logic goes here ...
        yield return dto;
    }
}
//Filter, called elsewhere
list.Where(x => Valid(x)).Select(x => LoadSqlRecord(x))
public void Import(IEnumerable<SqlDataRecord> list)
{
    using (var conn = new SqlConnection(...))
    {
        if (conn.State != ConnectionState.Open)
            conn.Open();
        var cmd = conn.CreateCommand();
        cmd.CommandText = "Import_Data";
        cmd.CommandType = System.Data.CommandType.StoredProcedure;
        var parm = new SqlParameter();
        cmd.Parameters.Add(parm);
        parm.ParameterName = "@Data";
        parm.TypeName = "udt_DTO";
        parm.SqlDbType = SqlDbType.Structured;
        parm.Value = list;
        cmd.ExecuteNonQuery();
        conn.Close();
    }
}
Sorry for the long example. Thanks for reading...
The issue you are seeing is likely not because of the IEnumerable/IEnumerator interfaces, but rather with the underlying resources you are using to produce values.
For most enumerables, adding a list.Any() would not cause future enumerations of list to skip items, because each call to list.GetEnumerator returns an independent object. You may have other reasons not to create multiple enumerators, such as multiple calls to the database via LINQ to SQL, but every enumerator will get all the items.
In this case, however, making multiple enumerators does not work because of the underlying resources. Based on the code you posted, I assume that the parameter passed to Import is based on a call to the Map method you posted. Each time through the enumerable returned from Map, you will "start" at the top of the method, but the TextReader and its current position are shared between all enumerators. Even if you did try to call Reset on the IEnumerators, this would not reset the TextReader. To solve your problem, you either need to buffer the enumerable (e.g. ToList()) or find a way to reset the TextReader.
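For example, buffering once (a sketch reusing the names from the question) makes the emptiness check safe, because the TextReader is consumed exactly one time:
// Buffer the whole pipeline; the TextReader is read only once.
var records = Map(sr).Where(x => Valid(x))
                     .Select(x => LoadSqlRecord(x))
                     .ToList();

if (records.Count == 0)
    return; // nothing to import; avoids the "no records" SqlDataRecord error

Import(records);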

Creating a generic retrieval method to return 1 record

I am working through a code sample, and I just want to hear some opinions on the way that they have done things. They use plain old ADO.NET. They have a generic function called Read that brings back 1 record. Here is the code:
public static T Read<T>(string storedProcedure, Func<IDataReader, T> make, object[] parms = null)
{
    using (SqlConnection connection = new SqlConnection())
    {
        connection.ConnectionString = connectionString;
        using (SqlCommand command = new SqlCommand())
        {
            command.Connection = connection;
            command.CommandType = CommandType.StoredProcedure;
            command.CommandText = storedProcedure;
            command.SetParameters(parms);
            connection.Open();
            T t = default(T);
            var reader = command.ExecuteReader();
            if (reader.Read())
                t = make(reader);
            return t;
        }
    }
}
I don't know why:
They use Func<IDataReader, T> make as part of the method signature? Is it efficient to do it like this? Is there a better way/best practice to do this?
I don't understand T t = default(T);? Is it efficient to do it like this? Is there a better way/best practice to do this?
What does t = make(reader); do? Is it efficient to do it like this? Is there a better way/best practice to do this?
The calling function would look something like this:
public Customer GetCustomer(int customerId)
{
    // Other code here
    string storedProcedure = "MyStoredProcedure";
    object[] parameters = { "@CustomerId", customerId };
    return Db.Read(storedProcedure, Make, parameters);
}

private static Func<IDataReader, Customer> Make = reader =>
    new Customer
    {
        CustomerId = reader["CustomerId"].AsId(),
        Company = reader["CompanyName"].AsString(),
        City = reader["City"].AsString()
    };
I don't understand the Func Make part? Can someone please explain to me what is happening here and whether this is good practice. Any changes would be appreciated, but please provide detailed sample code :)
The delegate for the make (materialization) is pretty versatile and flexible, but IMO makes for a bit of unnecessary work in the vast majority of cases. In terms of what it does - they use the delegate as a callback to get the caller to specify how to read the record.
Note, since they don't expect the consumer to change record, they should probably be exposing IDataRecord, not IDataReader (any reader also implements record).
Note that if there are any error messages in the TDS stream after the first record, the approach shown won't see them - but that is an edge case. If you wanted to mitigate against that, you could read to the end of the TDS stream:
while(reader.NextResult()) {}
Personally, though, I'd just use dapper-dot-net here - avoids having to write that per-type code manually:
var cust = connection.Query<Customer>("MyStoredProcedure",
    new { CustomerId = customerId },
    commandType: CommandType.StoredProcedure).Single();
this executes MyStoredProcedure as a sproc, passing in @CustomerId with the value from customerId, then applies a direct column<===>property/field match to create Customer records, then asserts that there is exactly one result - and returns it.
They use Func<IDataReader, T> make as part of the method signature? Is it efficient to do it like this? Is there a better way/best practice to do this?
Due to the generic nature of the method, T is unknown, so they don't know how to map the reader back to T's properties. So they leave this responsibility to the caller of the method. Actually this is efficient enough, and it's good practice.
I don't understand T t = default(T);? Is it efficient to do it like this? Is there a better way/best practice to do this?
Because T could be either a value or a reference type, you cannot simply assign null to it. default(T) returns the default value for the type. In the case of a reference type this will be null. In the case of a value type, for example an integer, it will be 0.
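For example:
int i = default(int);           // 0
string s = default(string);     // null
DateTime d = default(DateTime); // DateTime.MinValue (0001-01-01 00:00:00)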
What does t = make(reader); do? Is it efficient to do it like this? Is there a better way/best practice to do this?
It invokes the delegate that was passed in and assigns the result to t.

C# function to return generic objects / entities

In my application I need to display a list of records returned by different stored procedures. Each stored procedure returns different types of records (i.e. the number of columns and the column types are different).
My original thought was to create a class for each type of record and a function which would execute the corresponding stored procedure and return List<MyCustomClass>. Something like this:
public class MyCustomClass1
{
    public int Col1 { get; set; } // In reality the columns are NOT called Col1 and Col2 but have proper names
    public int Col2 { get; set; }
}
public static List<MyCustomClass1> GetDataForReport1(int Param1)
{
    List<MyCustomClass1> output = new List<MyCustomClass1>();
    using (SqlConnection cn = new SqlConnection("MyConnectionString"))
    using (SqlCommand cmd = new SqlCommand("MyProcNameForReport1", cn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@Param1", SqlDbType.Int).Value = Param1;
        cn.Open();
        SqlDataReader rdr = cmd.ExecuteReader();
        int Col1_Ordinal = rdr.GetOrdinal("Col1");
        int Col2_Ordinal = rdr.GetOrdinal("Col2");
        while (rdr.Read())
        {
            output.Add(new MyCustomClass1
            {
                Col1 = rdr.GetSqlInt32(Col1_Ordinal).Value,
                Col2 = rdr.GetSqlInt32(Col2_Ordinal).Value
            });
        }
        rdr.Close();
    }
    return output;
}
This works fine, but as I don't need to manipulate those records in my client code (I just need to bind them to a graphical control in my application layer), it doesn't really make sense to do it this way: I would end up with plenty of custom classes that I wouldn't actually use. I found this, which does the trick:
public static DataTable GetDataForReport1(int Param1)
{
    DataTable output = new DataTable();
    using (SqlConnection cn = new SqlConnection("MyConnectionString"))
    using (SqlCommand cmd = new SqlCommand("MyProcNameForReport1", cn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@Param1", SqlDbType.Int).Value = Param1;
        cn.Open();
        output.Load(cmd.ExecuteReader());
    }
    return output;
}
This returns a DataTable which I can bind to whatever control I use in my application layer. I'm wondering if using a DataTable is really needed.
Couldn't I return a list of objects created using anonymous classes:
public static List<object> GetDataForReport1(int Param1)
{
    List<object> output = new List<object>();
    using (SqlConnection cn = new SqlConnection("MyConnectionString"))
    using (SqlCommand cmd = new SqlCommand("MyProcNameForReport1", cn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@Param1", SqlDbType.Int).Value = Param1;
        cn.Open();
        SqlDataReader rdr = cmd.ExecuteReader();
        int Col1_Ordinal = rdr.GetOrdinal("Col1");
        int Col2_Ordinal = rdr.GetOrdinal("Col2");
        while (rdr.Read())
        {
            output.Add(new
            {
                Col1 = rdr.GetSqlInt32(Col1_Ordinal).Value,
                Col2 = rdr.GetSqlInt32(Col2_Ordinal).Value
            });
        }
        rdr.Close();
    }
    return output;
}
Any other ideas? Basically I just want the function to return 'something' which I can bind to a graphical control and I prefer not to have to create custom classes as they wouldn’t really be used. What would be the best approach?
The good news is: this isn't the first time the entities vs datasets issue has come up. It's a debate much older than my own programming experience. If you don't want to write DTO's or custom entities then your options are DataTables/DataSets or you could reinvent this wheel again. You wouldn't be the first and you won't be the last. The argument for DTO's/Entities is that you don't have the performance overheads that you get with DataSets. DataSets have to store a lot of extra information about the datatypes of the various columns, etc...
One thing, you might be happy to hear is that if you go the custom entity route and you are happy for your object property names to match the column names returned by your sprocs, you can skip writing all those GetDataForReport() mapping functions where you are using GetOrdinal to map columns to properties. Luckily, some smart monkeys have properly sussed that issue here.
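To give a rough idea of what such a mapper looks like (a minimal reflection-based sketch, not the code from the link; it assumes column names match writable public property names exactly, and skips the ordinal caching and type coercion a real version would need):
public static List<T> MapAll<T>(IDataReader reader) where T : new()
{
    var props = typeof(T).GetProperties();
    var list = new List<T>();
    while (reader.Read())
    {
        var item = new T();
        foreach (var prop in props)
        {
            // look the column up by the property's name; skip database NULLs
            var value = reader[prop.Name];
            if (value != DBNull.Value)
                prop.SetValue(item, value, null);
        }
        list.Add(item);
    }
    return list;
}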
EDIT: I was researching an entirely different problem today (binding datasets to silverlight datagrids) and came across this article by Vladimir Bodurov, which shows how to transform an IEnumerable of IDictionary to an IEnumerable of dynamically created objects (using IL). It occurred to me that you could easily modify the extension method to accept a datareader rather than the IEnumerable of IDictionary to solve your issue of dynamic collections. It's pretty cool. I think it would accomplish exactly what you are after, in that you no longer need either the dataset or the custom entities. In effect you end up with a custom entity collection, but you lose the overhead of writing the actual classes.
If you are lazy, here's a method that turns a datareader into Vladimir's dictionary collection (it's less efficient than actually converting his extension method):
public static IEnumerable<IDictionary> ToEnumerableDictionary(this IDataReader dataReader)
{
    var list = new List<Dictionary<string, object>>();
    Dictionary<int, string> keys = null;
    while (dataReader.Read())
    {
        if (keys == null)
        {
            keys = new Dictionary<int, string>();
            for (var i = 0; i < dataReader.FieldCount; i++)
                keys.Add(i, dataReader.GetName(i));
        }
        var dictionary = keys.ToDictionary(ordinalKey => ordinalKey.Value, ordinalKey => dataReader[ordinalKey.Key]);
        list.Add(dictionary);
    }
    return list.ToArray();
}
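Usage would then be something like this (the result is fully buffered, so the reader can be closed afterwards):
IEnumerable<IDictionary> rows;
using (var rdr = cmd.ExecuteReader())
{
    rows = rdr.ToEnumerableDictionary(); // buffered; safe to close the reader now
}
// rows can then be fed into the IL-based transformation from the linked article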
If you're going to XmlSerialize them and want that to be efficient, they can't be anonymous. Are you going to send back the changes? Anonymous classes won't be much use for that.
If you're really not going to do anything with the data other than gridify it, a DataTable may well be a good answer for your context.
In general, if there's any sort of interesting business logic going on, you should be using Data Transfer Objects etc. rather than just ferrying around grids.
If you use anonymous classes, you are still defining each type. You just get to use Type Inference to reduce the amount of typing, and the class type will not clutter any namespaces.
The downside of using them here is that you want to return them out of the method. As you noted, the only way to do this is to cast them to object, which makes all of their data inaccessible without reflection! Are you never going to do any processing or computation with this data? Display one of the columns as a picture instead of a value, or hide it based on some condition? Even if you are only ever going to have a single conditional statement, doing it with the anonymous class version will make you wish you hadn't.
