I spend a lot of time querying a database and then building collections of objects from the query. For performance I tend to use a Datareader and the code looks something like:
while(rdr.Read()){
var myObj = new myObj();
myObj.Id = Int32.Parse(rdr["Id"].ToString());
//more populating of myObj from rdr
myObj.Created = (DateTime)rdr["Created"];
}
For objects like DateTime I simply cast the rdr value to the required class, but this can't be done for value types like int, hence the (IMHO) laborious ToString() followed by Int32.Parse(...).
Of course there is an alternative:
myObj.Id = rdr.GetInt32(rdr.GetOrdinal("Id"));
which looks cleaner and doesn't involve a call to ToString().
A colleague and I were discussing this today - he suggests that accessing rdr twice in the above code might be less efficient than doing it my old skool way - could anyone confirm or deny this and suggest which of the above is the best way of doing this sort of thing? I would especially welcome answers from @JonSkeet ;-)
I doubt there will be a very appreciable performance difference, but you can avoid the name lookup on every row simply by lifting it out of the loop. This is probably the best you'll be able to achieve:
int idIdx = rdr.GetOrdinal("Id");
int createdIdx = rdr.GetOrdinal("Created");
while(rdr.Read())
{
var myObj = new myObj();
myObj.Id = rdr.GetFieldValue<int>(idIdx);
//more populating of myObj from rdr
myObj.Created = rdr.GetFieldValue<DateTime>(createdIdx);
}
I usually introduce a RecordSet class for this purpose:
public class MyObjRecordSet
{
private readonly IDataReader InnerDataReader;
private readonly int OrdinalId;
private readonly int OrdinalCreated;
public MyObjRecordSet(IDataReader dataReader)
{
this.InnerDataReader = dataReader;
this.OrdinalId = dataReader.GetOrdinal("Id");
this.OrdinalCreated = dataReader.GetOrdinal("Created");
}
public int Id
{
get
{
return this.InnerDataReader.GetInt32(this.OrdinalId);
}
}
public DateTime Created
{
get
{
return this.InnerDataReader.GetDateTime(this.OrdinalCreated);
}
}
public MyObj ToObject()
{
return new MyObj
{
Id = this.Id,
Created = this.Created
};
}
public static IEnumerable<MyObj> ReadAll(IDataReader dataReader)
{
MyObjRecordSet recordSet = new MyObjRecordSet(dataReader);
while (dataReader.Read())
{
yield return recordSet.ToObject();
}
}
}
Usage example:
List<MyObj> myObjects = MyObjRecordSet.ReadAll(rdr).ToList();
This makes the most sense to a reader. As for whether it's the most "efficient": you're literally calling two functions instead of one, and that difference is not going to be as significant as casting and then calling a function. Ideally you should go with the option that reads better, as long as it doesn't hurt your performance.
var ordinal = rdr.GetOrdinal("Id");
var id = rdr.GetInt32(ordinal);
myObj.Id = id;
Actually there are differences in performance in how you use SqlDataReader, but they are somewhere else. Namely, the ExecuteReader method accepts the CommandBehavior.SequentialAccess flag:
Provides a way for the DataReader to handle rows that contain columns with large binary values. Rather than loading the entire row, SequentialAccess enables the DataReader to load data as a stream. You can then use the GetBytes or GetChars method to specify a byte location to start the read operation, and a limited buffer size for the data being returned.
When you specify SequentialAccess, you are required to read from the columns in the order they are returned, although you are not required to read each column. Once you have read past a location in the returned stream of data, data at or before that location can no longer be read from the DataReader. When using the OleDbDataReader, you can reread the current column value until reading past it. When using the SqlDataReader, you can read a column value only once.
If you do not use large binary values then it makes very little difference. Getting a string and parsing it is suboptimal, true; it is better to get the value with rdr.GetSqlInt32(column) rather than GetInt32() because of NULL. But the difference should not be noticeable in most applications, unless your app is truly doing nothing else but read huge datasets. Most apps do not behave that way. Focusing on optimising the database call itself (i.e. making the query execute fast) will reap far greater benefits 99.9999% of the time.
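To illustrate the quoted behaviour, here is a minimal sketch of streaming a large binary column under SequentialAccess; the column layout (an int key at ordinal 0, a blob at ordinal 1) and the file naming are assumptions, not code from the question:

```csharp
// Columns MUST be read in ordinal order under SequentialAccess.
using (var reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
{
    while (reader.Read())
    {
        int id = reader.GetInt32(0);      // read ordinal 0 before touching ordinal 1
        var buffer = new byte[8040];
        long offset = 0;
        long bytesRead;
        using (var file = File.Create("blob_" + id + ".bin"))
        {
            // GetBytes streams the blob chunk by chunk instead of
            // materialising the whole row in memory.
            while ((bytesRead = reader.GetBytes(1, offset, buffer, 0, buffer.Length)) > 0)
            {
                file.Write(buffer, 0, (int)bytesRead);
                offset += bytesRead;
            }
        }
    }
}
```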
For objects like DateTime I simply cast the rdr value to the required class, but this can't be done for value types like int
This isn't true: DateTime is also a value type and both of the following work in the same way, provided the field is of the expected type and is not null:
myObj.Id = (int) rdr["Id"];
myObj.Created = (DateTime)rdr["Created"];
If it's not working for you, perhaps the field you're reading is NULL? Or not of the required type, in which case you need to cast twice. E.g. for a SQL NUMERIC field, you might need:
myObj.Id = (int) (decimal) rdr["Id"];
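If NULLs are the culprit, one common pattern (sketched here with illustrative names, not taken from the question) is to test IsDBNull against a pre-computed ordinal before casting:

```csharp
// Guard against DBNull before reading a value type.
int idIdx = rdr.GetOrdinal("Id");
while (rdr.Read())
{
    // Map SQL NULL to a nullable int instead of letting the cast throw.
    int? id = rdr.IsDBNull(idIdx) ? (int?)null : rdr.GetInt32(idIdx);
}
```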
Related
I've written a piece of code for converting a DataTable object (which is created from an uploaded Excel file) to a list of custom objects (ExcelTemplateRow in my case). While this method works fine when the values are supplied as expected (in terms of the data type of the corresponding column), the code breaks and throws the below error when I give a random value (and hence the data type changes):
Object of type 'System.String' cannot be converted to type 'System.Nullable`1[System.Double]'
Below is the method for converting DataTable object to list:
public static List<T> ConvertToList<T>(DataTable dt)
{
var columnNames = dt.Columns.Cast<DataColumn>().Select(c => c.ColumnName.ToLower()).ToList();
var trimmedColumnNames = new List<string>();
foreach (var columnName in columnNames)
{
trimmedColumnNames.Add(columnName.Trim().ToLower());
}
var properties = typeof(T).GetProperties();
return dt.AsEnumerable().Select(row => {
var objT = Activator.CreateInstance<T>();
foreach (var property in properties)
{
if (trimmedColumnNames.Contains(property.Name.Trim().ToLower()))
{
try
{
if(row[property.Name] != DBNull.Value)
{
property.SetValue(objT, row[property.Name]);
}
else
{
property.SetValue(objT, null);
}
}
catch
{
    throw; // rethrow without resetting the stack trace
}
}
}
return objT;
}).ToList();
}
My custom object looks somewhat like this:
public class ExcelTemplateRow
{
public string? Country {get; set;}
public double? Year {get; set;}
//....
//....
}
In the Excel file that I'm uploading, for the Year field, the code works fine when I give proper double values, e.g. 2020, 2021, 2022, but it breaks when I give something wrong, e.g. 2023g. It then assumes I'm passing a string, hence the error. I tried changing the declaration of the Year property to public object? Year {get; set;} but it doesn't help. I want to make the method robust enough to handle such scenarios. I'd be highly grateful for any help.
There's a few things to consider here, but I'll try to be as terse as possible. When you say:
but the code breaks when I give something wrong e.g 2023g
This means the C# type system is working exactly as intended. A double should never be able to accept the value of "2023g". You probably want to store the year as a string instead. This may involve an intermediate stage of validation, where you import all of your data as a string (ExcelTemplateRow should all be strings in this case).
Then your work is ahead of you to validate the data, and then once you've handled any errors, only then can you think about using types such as Double?. Although, you probably don't want to store your year as a double, an int might be more appropriate. Or maybe it isn't; perhaps you want to store the errors, because that's what a user has entered. Some careful consideration is required here. Don't rush with the type system, let it work for you, thinking about which datatypes to use will help you design the rest of your code.
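One way the intermediate validation stage could look (a sketch under the assumption above that everything is imported as strings first; RawExcelRow and ParseYear are illustrative names, not part of the question's code):

```csharp
// Stage 1: import the spreadsheet as raw strings - this can never fail on type.
public class RawExcelRow
{
    public string? Country { get; set; }
    public string? Year { get; set; }
}

// Stage 2: parse explicitly, capturing errors instead of throwing.
public static (double? Value, string? Error) ParseYear(string? raw)
{
    if (string.IsNullOrWhiteSpace(raw))
        return (null, null); // treat blank cells as missing, not as errors

    return double.TryParse(raw, out var year)
        ? (year, null)
        : (null, "'" + raw + "' is not a valid year");
}
```

An input of "2023g" then produces an error message you can report back to the user, rather than an exception deep inside reflection code.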
The issue here is the data coming from the Excel file, not your code, which behaves correctly.
Say you special-case T being double? and call double.TryParse on the cell; when it fails, you could take the first four characters of "2023g" - but what happens when another double? property expects only two or three digits and the user puts in some dummy value? How will you manage that? You can't cover every case.
Fix the Excel file, not your code ;)
Log the error and send a message to the user asking them to fix the data :)
What I'm trying to do is to safeguard my C# data retrieval code from IndexOutOfRangeException when using datareader.GetOrdinal(). This is no problem if my procedures don't change and they return just one result set. But if they return multiple result sets then I need to iterate the result sets with .NextResult(). That is no problem.
But what if somebody changes the procedure to have another select statement so that the order of my C# retrieval code changes and everything blows up?
But the question is: Can I check if the result is the result that I want?
Here below is the pseudo code for what I like to do.
using (SqlDataReader reader = cmd.ExecuteReader())
{
//The check I would like to be able to do
if(reader.Result != "The result with someColumnName")
{
//This is not the result Im looking for so I try the next one
reader.NextResult();
}
else //Get the result set I want.. If it blows up now it should..
{
if (reader.HasRows)
{
//Get all ordinals first. Faster than searching with index.
int someColumnNameOrdinal = reader.GetOrdinal("someColumnName");
while (reader.Read())
{
var someValue = reader.GetString(someColumnNameOrdinal );
}
}
}
}
I know that I could try to GetOrdinal, get an exception, catch it and then try the next result, but that is just too damn unclean (and wrong).
Using OR/M is a good practice, but there are exceptions, of course. For example, you may be forced to query SQL server which does not support stored procs (SQL Server Compact etc.). To increase performance you may opt to using SqlDataReader. To ensure that your field names are always correct (correspond to your entity class fields) you may use the following practice - instead of hard-coding field names use this code instead:
GetPropertyName((YourEntityClass c) => c.YourField)
Where GetPropertyName function contains the following code (generics are used):
public static string GetPropertyName<T, TReturn>(System.Linq.Expressions.Expression<Func<T, TReturn>> expression)
{
System.Linq.Expressions.MemberExpression body = (System.Linq.Expressions.MemberExpression)expression.Body;
return body.Member.Name;
}
YourEntityClass is a class name for your table in Entity Framework, YourField is a field name in this table. In this case you have the performance of SqlDataReader and the safety of Entity Framework at the same time.
Assuming each of the result sets has a uniquely named first column, for the following line in your pseudo code example:
if(reader.Result != "The result with someColumnName")
you could use the SqlDataReader.GetName function to check the name of the first column. For example:
if(reader.GetName(0) != "ExpectedColumnName")
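Putting that together with the pseudo code from the question, a sketch of the full loop might look like this (it scans every column name rather than just the first, and assumes the wanted column name is unique across result sets):

```csharp
using (SqlDataReader reader = cmd.ExecuteReader())
{
    do
    {
        // Check whether this result set exposes the expected column.
        bool isWantedResultSet = false;
        for (int i = 0; i < reader.FieldCount; i++)
        {
            if (reader.GetName(i) == "someColumnName")
            {
                isWantedResultSet = true;
                break;
            }
        }
        if (!isWantedResultSet)
            continue; // jumps to the while condition, i.e. NextResult()

        // Found it - read as in the question, then stop scanning.
        int someColumnNameOrdinal = reader.GetOrdinal("someColumnName");
        while (reader.Read())
        {
            var someValue = reader.GetString(someColumnNameOrdinal);
        }
        break;
    } while (reader.NextResult());
}
```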
I have a class where one of the properties returns a List<object>. Inside that list I put a set of anonymous objects.
Later on, I have a loop using that property's items as dynamic variables.
So my code looks like this:
private List<object> BookerTypes
{
get
{
if (this.bookerTypes == null)
{
this.bookerTypes = new List<object>();
var com = new SqlConnection(functions.ConnectionString).CreateCommand();
com.CommandText = @"
SELECT
BT.id
, BT.name
FROM dbo.BookerTypes AS BT
ORDER BY BT.name ASC
";
com.Connection.Open();
try
{
using (var dr = com.ExecuteReader())
{
while (dr.Read())
{
this.bookerTypes.Add(new { id = dr.GetInt32(0), name = dr.GetString(1) });
}
}
}
finally
{
com.Connection.Close();
}
}
return this.bookerTypes;
}
}
[...]
this.cblSBT.Items.Clear();
foreach(dynamic bt in this.BookerTypes)
{
this.cblSBT.Items.Add(new ListItem()
{
Value = bt.id.ToString()
, Text = bt.name
, Selected = this.competition.SubscriptionTypes.Contains((int)bt.id)
});
}
Aside from the obvious loss of strong typing, is there any reason I should not do this?
The primary reason not to do this is, as you said, that you've lost your static typing. There are also performance costs associated with it, but they're less important than the noticeable problems this code has in terms of readability and maintainability.
If it turns out that you misspell or mistype a variable name you don't get compile time checking (and it's easier to do without code completion support). You also don't have any effective means of knowing, at compile time, what variables might exist in the List<object> you're given. It becomes a non-trivial task to track down the source of that list to figure out what variables might be there to use.
It is almost certainly worth the time and effort to create a new named type instead of using an anonymous type when you're in this situation. The small up front cost of creating the new class is virtually always going to pay off.
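Concretely, that small up-front cost could look like this (BookerType is an illustrative name; the reader code mirrors the question's):

```csharp
// A named type in place of the anonymous one.
public class BookerType
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Population becomes:
this.bookerTypes.Add(new BookerType { Id = dr.GetInt32(0), Name = dr.GetString(1) });
```

The property can then return List&lt;BookerType&gt;, and the consuming loop gets compile-time checking and code completion back: foreach (BookerType bt in this.BookerTypes) { ... }.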
On top of the type-safety loss & other concerns which have already been pointed out, I feel using dynamic here is just plain wrong.
The general use-case for dynamic is for consuming data from external sources e.g. API/COM etc. basically scenarios where the type of information isn't already clearly defined. In your scenario, you have control over what data you are asking for and you know what type of data to expect therefore I can't really justify why you would want to use it over the benefits gained from having a clearly defined, type-safe model.
Is it unwise to use anonymous object + dynamic?
In your scenario, I would argue yes.
This isn't related to a particular issue BUT is a question regarding "best practise".
For a while now, when I need to get data straight from the database I've been using the following method - I was wondering if there's a faster method which I don't know about?
DataTable results = new DataTable();
using (SqlConnection connection = new SqlConnection(ConfigurationManager.ConnectionStrings["Name"]))
{
connection.Open();
using (SqlCommand command = new SqlCommand("StoredProcedureName",connection))
{
command.CommandType = CommandType.StoredProcedure;
/*Optionally set command.Parameters here*/
results.Load(command.ExecuteReader());
}
}
/*Do something useful with the results*/
There are indeed various ways of reading data; DataTable is quite a complex beast (with support for a number of complex scenarios - referential integrity, constraints, computed values, on-the-fly extra columns, indexing, filtering, etc). In a lot of cases you don't need all that; you just want the data. To do that, a simple object model can be more efficient, both in memory and performance. You could write your own code around IDataReader, but that is a solved problem, with a range of tools that do that for you. For example, you could do that via dapper with just:
class SomeTypeOfRow { // define something that looks like the results
public int Id {get;set;}
public string Name {get;set;}
//..
}
...
var rows = connection.Query<SomeTypeOfRow>("StoredProcedureName",
/* optionalParameters, */ commandType: CommandType.StoredProcedure).ToList();
which then very efficiently populates a List<SomeTypeOfRow>, without all the DataTable overheads. Additionally, if you are dealing with very large volumes of data, you can do
this in a fully streaming way, so you don't need to buffer 2M rows in memory:
var rows = connection.Query<SomeTypeOfRow>("StoredProcedureName",
/* optionalParameters, */ commandType: CommandType.StoredProcedure,
buffered: false); // an IEnumerable<SomeTypeOfRow>
For completeness, I should explain optionalParameters; if you wanted to pass @id=1, @name="abc", that would be just:
var rows = connection.Query<SomeTypeOfRow>("StoredProcedureName",
new { id = 1, name = "abc" },
commandType: CommandType.StoredProcedure).ToList();
which is, I think you'll agree, a pretty concise way of describing the parameters. This parameter is entirely optional, and can be omitted if no parameters are required.
As an added bonus, it means you get strong-typing for free, i.e.
foreach(var row in rows) {
Console.WriteLine(row.Id);
Console.WriteLine(row.Name);
}
rather than having to talk about row["Id"], row["Name"] etc.
I have a class that stores data in an ASP.NET C# application, and the data never changes. I really don't want to put this data in the database - I would like it to stay in the application. Here is my way to store data in the application:
public class PostVoteTypeFunctions
{
private List<PostVoteType> postVotes = new List<PostVoteType>();
public PostVoteTypeFunctions()
{
PostVoteType upvote = new PostVoteType();
upvote.ID = 0;
upvote.Name = "UpVote";
upvote.PointValue = PostVotePointValue.UpVote;
postVotes.Add(upvote);
PostVoteType downvote = new PostVoteType();
downvote.ID = 1;
downvote.Name = "DownVote";
downvote.PointValue = PostVotePointValue.DownVote;
postVotes.Add(downvote);
PostVoteType selectanswer = new PostVoteType();
selectanswer.ID = 2;
selectanswer.Name = "SelectAnswer";
selectanswer.PointValue = PostVotePointValue.SelectAnswer;
postVotes.Add(selectanswer);
PostVoteType favorite = new PostVoteType();
favorite.ID = 3;
favorite.Name = "Favorite";
favorite.PointValue = PostVotePointValue.Favorite;
postVotes.Add(favorite);
PostVoteType offensive = new PostVoteType();
offensive.ID = 4;
offensive.Name = "Offensive";
offensive.PointValue = PostVotePointValue.Offensive;
postVotes.Add(offensive);
PostVoteType spam = new PostVoteType();
spam.ID = 0;
spam.Name = "Spam";
spam.PointValue = PostVotePointValue.Spam;
postVotes.Add(spam);
}
}
When the constructor is called, the code above is run. I have some functions that can query the data above too. But is this the best way to store information in ASP.NET? If not, what would you recommend?
This is a candidate for an immutable struct that "looks like" an enumeration:
(Also, I noticed you used the same ID value for two of them, so I fixed that.)
You can use the following just as you would an enumeration...
PostVoteTypeFunctions myVar = PostVoteTypeFunctions.UpVote;
and the really nice thing is that this approach requires almost no instance storage (just an int and a bool, which live on the stack since it's a struct). All hard-coded values are stored in the type itself... of which only one will exist per AppDomain...
public struct PostVoteTypeFunctions
{
private int id;
private bool isDef;
// structs cannot declare an explicit parameterless constructor (before C# 10),
// so instances are only created through the static factory fields below
private PostVoteTypeFunctions(int value) { id = value; isDef = true; }
public bool HasValue { get { return isDef; } }
public bool isNull{ get { return !isDef; } }
public string Name
{
get
{ return
id==1? "UpVote":
id==2? "DownVote":
id==3? "SelectAnswer":
id==4? "Favorite":
id==5? "Offensive":
id==6? "Spam": "UnSpecified";
}
}
public int PointValue
{
get
{ return // Why not hard code these values here as well ?
id==1? PostVotePointValue.UpVote:
id==2? PostVotePointValue.DownVote:
id==3? PostVotePointValue.SelectAnswer:
id==4? PostVotePointValue.Favorite:
id==5? PostVotePointValue.Offensive:
id==6? PostVotePointValue.Spam:
0;
}
}
// Here Add additional property values as property getters
// with appropriate hardcoded return values using above pattern
// following region is the static factories that create your instances,
// .. in a way such that using them appears like using an enumeration
public static readonly PostVoteTypeFunctions UpVote = new PostVoteTypeFunctions(1);
public static readonly PostVoteTypeFunctions DownVote = new PostVoteTypeFunctions(2);
public static readonly PostVoteTypeFunctions SelectAnswer = new PostVoteTypeFunctions(3);
public static readonly PostVoteTypeFunctions Favorite = new PostVoteTypeFunctions(4);
public static readonly PostVoteTypeFunctions Offensive = new PostVoteTypeFunctions(5);
public static readonly PostVoteTypeFunctions Spam = new PostVoteTypeFunctions(6);
}
It is difficult to tell from the fragment of code you have posted whether you expose any of the data outside the class.
If not, then this would work. However, if you do expose it, there are several issues:
If you are exposing the List, you should only ever return a copy of it as an IEnumerable<PostVoteType> using the yield keyword.
Make sure your PostVoteType is immutable, otherwise the references can be changed and the fields used might be altered
Looking at your code, it looks like you're just trying to create a set of objects that put the enum PostVotePointValue into some sort of list, i.e. you already have what you need defined in the enum itself. I would encourage you not to define the same information in two places (this data store you are asking about and the enum). This is a common mistake I see people make: they create a lookup table/list, then create an enum that mirrors the rows of the table, and that means they have to modify two places for any change to the list.
If PostVotePointValue isn't an enum but just some constants or if there is more info you are planning on packing in, then this isn't relevant.
Here's some examples of how to work with Enums as 'lists' from http://www.csharp-station.com/Tutorials/Lesson17.aspx
// iterate through Volume enum by name
public void ListEnumMembersByName()
{
Console.WriteLine("\n---------------------------- ");
Console.WriteLine("Volume Enum Members by Name:");
Console.WriteLine("----------------------------\n");
// get a list of member names from Volume enum,
// figure out the numeric value, and display
foreach (string volume in Enum.GetNames(typeof(Volume)))
{
Console.WriteLine("Volume Member: {0}\n Value: {1}",
volume, (byte)Enum.Parse(typeof(Volume), volume));
}
}
// iterate through Volume enum by value
public void ListEnumMembersByValue()
{
Console.WriteLine("\n----------------------------- ");
Console.WriteLine("Volume Enum Members by Value:");
Console.WriteLine("-----------------------------\n");
// get all values (numeric values) from the Volume
// enum type, figure out member name, and display
foreach (byte val in Enum.GetValues(typeof(Volume)))
{
Console.WriteLine("Volume Value: {0}\n Member: {1}",
val, Enum.GetName(typeof(Volume), val));
}
}
You should be able to adapt the above into an approach that will give you a list that you can use for databinding if you need it.
I am wondering why you could not just use a simple enum for this?
public enum PostVoteType
{
UpVote = 0,
DownVote = 1,
SelectAnswer = 2,
Favorite = 3,
Offensive = 4,
Spam = 5
}
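A quick sketch of consuming such an enum as a list (e.g. for databinding); the projection shape here is just one illustrative option:

```csharp
// Turn the enum members into Id/Name pairs without a second data store.
var voteTypes = Enum.GetValues(typeof(PostVoteType))
    .Cast<PostVoteType>()
    .Select(v => new { Id = (int)v, Name = v.ToString() })
    .ToList();
```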
"Never" is a very hard word indeed.
In your particular case you are asserting that not only is your PostVoteType data absolute and immutable, but so is the container collection. Frankly I don't believe you can know that, because you are not the business (your interpretation of the requirement is imperfect) and you are not psychic (your knowledge of the future is imperfect).
I would suggest that you always store any data which cannot be expressed as an enumeration in some kind of repository. Where you expect relational and/or transactional and/or mutable needs that means a database, if you expect high read to write ratio that can be a config file (which I believe this case should be).
Edit: In terms of in-memory persistence I agree with others that the cache is the best place to store this, or rather a domain object which is backed by the cache.
Aside: your construction of PostVoteTypes is horrible - strongly suggest you want a refactor :)
If it doesn't change, is commonly accessed, and is the same for all users, then the .NET cache is the proper place. Have a property that yields these values. Inside, the property checks the cache for this list and returns the stored value; otherwise, it constructs it from scratch, adds to the cache, and returns the value.
This should still probably be configured in the database though, even if you cache it. I imagine that you'll need to use these value in conjunction with other data in your DB.
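The cache-backed property described above could be sketched like this (the cache key and the BuildPostVoteTypes helper are illustrative; this assumes classic ASP.NET's HttpRuntime.Cache):

```csharp
public static List<PostVoteType> PostVoteTypes
{
    get
    {
        // Return the cached list if present...
        var cached = HttpRuntime.Cache["PostVoteTypes"] as List<PostVoteType>;
        if (cached == null)
        {
            // ...otherwise construct it from scratch and cache it.
            cached = BuildPostVoteTypes();
            HttpRuntime.Cache.Insert("PostVoteTypes", cached);
        }
        return cached;
    }
}
```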
When you often need to access the same data, do not need to store it in the underlying database, and the data is about the same in every situation the application may encounter, then I suggest caching. Caching was born from these requirements. It is normally the fastest way of providing data, since the data is kept in memory (or something similar) to make access by the application easy.
Here is a nice caching tool provided with Microsoft Enterprise Library, the Caching Application Block.
I think it is worth taking the time to learn how to use it effectively.
Create a singleton class.
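For the data in the question, that terse suggestion might look something like this (a sketch; PostVoteTypeStore is an illustrative name, and Lazy&lt;T&gt; keeps the one-time initialization thread-safe):

```csharp
public sealed class PostVoteTypeStore
{
    // The single instance, created on first access.
    private static readonly Lazy<PostVoteTypeStore> instance =
        new Lazy<PostVoteTypeStore>(() => new PostVoteTypeStore());

    public static PostVoteTypeStore Instance
    {
        get { return instance.Value; }
    }

    public IReadOnlyList<PostVoteType> PostVoteTypes { get; private set; }

    private PostVoteTypeStore()
    {
        // Build the immutable list exactly once for the whole application.
        PostVoteTypes = new List<PostVoteType>
        {
            // ...populate as in the question's constructor...
        };
    }
}
```

Callers then read PostVoteTypeStore.Instance.PostVoteTypes, and the construction cost is paid only once per AppDomain.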