How to improve the speed of a generic loop while reading from a database - C#

I have a mapping function that reads the data loaded from the database and maps it onto a class supplied through a generic type parameter.
The problem is that it takes 3 minutes to loop through 10,000 records.
I would like to improve its performance; how can I do that?
public static List<T> GetList<T>(string query = null, string whereClause = null, Dictionary<string, string> Params = null)
{
    var results = new List<T>();
    var properties = GetReadableProperties<T>();
    query = QueryMaker.SelectQuery<T>(query, whereClause);
    using (SqlConnection Connection = new SqlConnection(ConnectionString))
    {
        Connection.Open();
        SqlCommand cmd = GetSqlCommandWithParams(query, Connection, Params);
        SqlDataReader reader = cmd.ExecuteReader();
        if (reader.HasRows)
        {
            // From here
            while (reader.Read())
            {
                var item = Activator.CreateInstance<T>();
                foreach (var property in properties)
                {
                    DBReader(item, property, reader);
                }
                results.Add(item);
            }
            // To here it takes 3 minutes. Reading 10,000 records from the database into the reader isn't as slow as this code is.
        }
        Connection.Close();
        return results;
    }
}
This is the DBReader function:
private static void DBReader(Object _Object, PropertyInfo property, SqlDataReader reader)
{
    if (!reader.IsDBNull(reader.GetOrdinal(property.Name)))
    {
        Type convertTo = Nullable.GetUnderlyingType(property.PropertyType) ?? property.PropertyType;
        property.SetValue(_Object, Convert.ChangeType(reader[property.Name], convertTo), null);
    }
    else
    {
        property.SetValue(_Object, null);
    }
}

I can see a number of things you could theoretically improve. For example:
- Store the result of GetOrdinal and reuse it when getting values, instead of looking the ordinal up for every property of every row.
- Use a generic new() constraint instead of Activator.CreateInstance<T>().
- Rather than using reflection, precompile an expression into a Func<> and invoke it for each row.
- Call the appropriate Get...() methods based on the desired type to avoid boxing and unboxing.
But all of these are micro-optimizations, and even combining them all is unlikely to make a significant difference. The amount of time you're describing (minutes for ten thousand rows) is unlikely to be caused by any of these issues; it is most likely coming from I/O operations somewhere. It's impossible to be certain without profiling the code to see where the time is actually being spent. Profile your code, and then use those results to inform what approach you take.
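For illustration, here is a minimal sketch of the precompiled-expression idea. Names like RowMapper and SampleRow are hypothetical, and DBNull handling is omitted for brevity: you build the Func<IDataRecord, T> once per query, then invoke the compiled delegate for every row instead of reflecting per property.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical POCO used in the usage example below.
public class SampleRow
{
    public int Id { get; set; }
}

public static class RowMapper<T> where T : new()
{
    // Compile record => new T { Prop = (PropType)record[ordinal], ... } once;
    // invoking the resulting delegate per row keeps reflection out of the loop.
    public static Func<IDataRecord, T> Build(IDataRecord schema)
    {
        ParameterExpression record = Expression.Parameter(typeof(IDataRecord), "record");
        MethodInfo indexer = typeof(IDataRecord).GetMethod("get_Item", new[] { typeof(int) });
        var bindings = new List<MemberBinding>();
        foreach (PropertyInfo prop in typeof(T).GetProperties())
        {
            int ordinal;
            try { ordinal = schema.GetOrdinal(prop.Name); }
            catch (IndexOutOfRangeException) { continue; } // no matching column

            // (PropType)record[ordinal] -- a cast/unbox, no Convert.ChangeType per row
            Expression value = Expression.Convert(
                Expression.Call(record, indexer, Expression.Constant(ordinal)),
                prop.PropertyType);
            bindings.Add(Expression.Bind(prop, value));
        }
        var body = Expression.MemberInit(Expression.New(typeof(T)), bindings);
        return Expression.Lambda<Func<IDataRecord, T>>(body, record).Compile();
    }
}
```

The loop then becomes `var map = RowMapper<T>.Build(reader); while (reader.Read()) results.Add(map(reader));`. But again: profile first, because if the time is going to I/O this will not rescue you.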


Convert SqlDataReader to LINQ expression

When looking at examples on the internet on how to use SqlDataReader I found things like:
var command = new SqlCommand(...); // initialize with query and connection
var reader = command.ExecuteReader();
while (reader.Read())
{
    var foo = reader["SomeColName"];
    // etc.
}
Can I use the following extension method:
public static IEnumerable<IDataReader> ToEnumerable(this IDataReader reader)
{
    while (reader.Read())
        yield return reader;
}
in order to execute queries using LINQ?
Would using this extension method be bad? If I care about performance, should I stick with the first implementation?
You just need to cast your IEnumerable to IEnumerable<IDataRecord>:
var enumerable = reader.Cast<IDataRecord>();
var col = enumerable.Select(record => record.GetOrdinal("SomeColName"));
I can't see how your proposed extension would help you, as you only get an enumeration that yields the same IDataReader instance for every row.
I'm not sure there is any good way to combine IDataReader with LINQ, but you may want something like this:
public static IEnumerable<Dictionary<string, object>> ToEnumerable(this IDataReader reader)
{
    while (reader.Read())
    {
        Dictionary<string, object> result = new Dictionary<string, object>();
        for (int column = 0; column < reader.FieldCount; column++)
            result.Add(reader.GetName(column), reader.GetValue(column));
        yield return result;
    }
}
This will return a sequence of dictionaries (one dictionary per row) which map the column name to the object in that "cell".
Unfortunately, you lose type safety and create memory overhead. The specific implementation would still depend on what you really want to achieve. Personally, I always use the column index instead of the name.
That said, I think the best approach is still to use the IDataReader directly to create POCOs and to use those POCOs with LINQ:
IEnumerable<string> GetSomeColValue(IDataReader reader)
{
    while (reader.Read())
        yield return reader.GetString(reader.GetOrdinal("SomeColName"));
}
Of course this is a shortened example. In real code I would read all the data out of the reader and close/dispose the reader inside the method that called ExecuteReader before returning anything.
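As a sketch of that last point (the Person class and the Name column are hypothetical), the method that owns the reader can materialize everything into POCOs and dispose the reader before returning:

```csharp
using System.Collections.Generic;
using System.Data;

// Hypothetical POCO for the example.
public class Person
{
    public string Name { get; set; }
}

public static class ReaderMaterializer
{
    // Read every row into a POCO *before* the reader is disposed, so callers
    // get a plain List<Person> they can safely run LINQ over later.
    public static List<Person> ReadPeople(IDataReader reader)
    {
        var results = new List<Person>();
        using (reader) // dispose even if mapping throws
        {
            int nameOrdinal = reader.GetOrdinal("Name"); // look up once, not per row
            while (reader.Read())
            {
                results.Add(new Person
                {
                    Name = reader.IsDBNull(nameOrdinal) ? null : reader.GetString(nameOrdinal)
                });
            }
        }
        return results;
    }
}
```

Because the returned list is fully populated, the connection and reader lifetimes no longer leak into the calling code.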

SELECT into Array

I use ExecuteReader(), but it returns only the last result. I want to display the results like an array, in tbid_1.Text, tbid_2.Text, tbid_3.Text, etc.
public void Select(FrmVIRGO frm)
{
    string query = "SELECT * FROM tb_patient_information";
    if (this.OpenConnection() == true)
    {
        // Create command
        MySqlCommand cmd = new MySqlCommand(query, connection);
        // Create a data reader and execute the command
        MySqlDataReader dataReader = cmd.ExecuteReader();
        while (dataReader.Read())
        {
            // I think it should work like this
            frm.tbid_1.Text = dataReader["id_patient"][1].ToString(); // ... id_patient1
            frm.tbid_2.Text = dataReader["id_patient"][2].ToString(); // ... id_patient2
            frm.tbid_3.Text = dataReader["id_patient"][3].ToString(); // ... id_patient3
        }
        // Close data reader
        dataReader.Close();
        // Close connection
        this.CloseConnection();
    }
}
Your code appears to be expecting that once you've called dataReader.Read(), you can access all of the records by an index.
Your data reader is an instance of IDataReader, the interface that most .NET data access libraries use to represent the concept of "reading the results of a query". IDataReader only gives you access to one record at a time. Each time you call dataReader.Read(), the IDataReader advances to the next record. When it returns false, it means that you have reached the end of the result set.
For example, you could transform your code above to something like this:
dataReader.Read(); // dataReader is at 1st record
frm.tbid_1.Text = dataReader["id_patient"].ToString();
dataReader.Read(); // dataReader is at 2nd record
frm.tbid_2.Text = dataReader["id_patient"].ToString();
dataReader.Read(); // dataReader is at 3rd record
frm.tbid_3.Text = dataReader["id_patient"].ToString();
Note that this is not the way you should do it, I'm just using it to illustrate the way a DataReader works.
If you are expecting exactly 3 records to be returned, you could use something similar to the code above. I would modify it to verify that dataReader.Read() returns true before you read data from each record, however, and handle a case where it doesn't in a meaningful way (e.g., throw an exception that explains the error, log the error, etc.).
Generally though, if I am working with raw ADO.Net (as opposed to using an OR/M) I prefer to convert each record in the IDataReader to a dictionary beforehand, and work with those.
For example, you could write the following extension method for DataReader:
public static class DataReaderExtensions
{
    public static IList<IDictionary<string, object>> ListRecordsAsDictionaries(this IDataReader reader)
    {
        var list = new List<IDictionary<string, object>>();
        while (reader.Read())
        {
            var record = new Dictionary<string, object>();
            for (var i = 0; i < reader.FieldCount; i++)
            {
                var key = reader.GetName(i);
                var value = reader[i];
                record.Add(key, value);
            }
            list.Add(record);
        }
        return list;
    }
}
This method iterates over the IDataReader, and sticks the values from each row into a Dictionary<string, object>. I have found this pattern to generally be fairly useful when dealing with raw ADO stuff.
This approach has a couple of caveats:
- You don't get access to the records (the Dictionary instances) until all of the data has been received from the server. If you handle records individually as the DataReader makes them available, you may be able to start processing the data while some of it is still in transit. (Note: this could be fixed by making the method return IEnumerable<IDictionary<string, object>> instead of IList<IDictionary<string, object>>, and using yield return to yield each record as it becomes available.)
- If you are iterating over a large data set, you may not want to instantiate that many dictionaries. It may be better to just handle each record individually instead.
- You lose access to some of the information that the DataReader can provide about the record(s) (e.g., you can't use DataReader.GetDataTypeName while iterating over the records).
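The first caveat mentions a streaming alternative; a sketch of what that could look like (the method name is illustrative), using yield return so each row's dictionary is available as soon as it is read:

```csharp
using System.Collections.Generic;
using System.Data;

public static class StreamingDataReaderExtensions
{
    // Lazy counterpart of ListRecordsAsDictionaries: rows are yielded one at
    // a time, so processing can start before the whole result set arrives.
    public static IEnumerable<IDictionary<string, object>> EnumerateRecordsAsDictionaries(this IDataReader reader)
    {
        while (reader.Read())
        {
            var record = new Dictionary<string, object>(reader.FieldCount);
            for (var i = 0; i < reader.FieldCount; i++)
                record[reader.GetName(i)] = reader[i];
            yield return record;
        }
    }
}
```

The trade-off is that the caller now controls the reader's lifetime: the reader must stay open until the enumeration has finished.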

Enumerable Chaining and Reset

I'm trying to import a file into a database and learn a more efficient way of doing things along the way. This article suggested chaining enumerations yields low memory usage and good performance.
This is my first time chaining several enumerations, and I'm not quite sure how to handle a reset appropriately...
Short story:
I have an enumeration which reads from a text file and maps to a DTO (see the Map function), a Where filter, followed by an Import that takes an enumeration... It all works perfectly, except when the filter returns 0 records... In that case, SQL errors with System.ArgumentException: There are no records in the SqlDataRecord enumeration....
So I put an if (!list.Any()) return; at the top of my Import method, and it no longer errors. Except it now always skips all the rows up to (and including) the first valid row in the text file...
Do I need to somehow Reset() the enumerator after my Any() call? Why is this not necessary when the same construct is used in LINQ and other Enumerable implementations?
Code:
public IEnumerable<DTO> Map(TextReader sr)
{
    string line;
    while (null != (line = sr.ReadLine()))
    {
        var dto = new DTO();
        var row = line.Split('\t');
        // ... mapping logic goes here ...
        yield return dto;
    }
}
//Filter, called elsewhere
list.Where(x => Valid(x)).Select(x => LoadSqlRecord(x))
public void Import(IEnumerable<SqlDataRecord> list)
{
    using (var conn = new SqlConnection(...))
    {
        if (conn.State != ConnectionState.Open)
            conn.Open();
        var cmd = conn.CreateCommand();
        cmd.CommandText = "Import_Data";
        cmd.CommandType = System.Data.CommandType.StoredProcedure;
        var parm = new SqlParameter();
        cmd.Parameters.Add(parm);
        parm.ParameterName = "Data";
        parm.TypeName = "udt_DTO";
        parm.SqlDbType = SqlDbType.Structured;
        parm.Value = list;
        cmd.ExecuteNonQuery();
        conn.Close();
    }
}
Sorry for the long example. Thanks for reading...
The issue you are seeing is likely not caused by the IEnumerable/IEnumerator interfaces, but by the underlying resource you are using to produce values.
For most enumerables, adding a list.Any() would not cause future enumerations of list to skip items, because each call to list.GetEnumerator() returns an independent object. You may have other reasons not to create multiple enumerators, such as multiple calls to the database via LINQ to SQL, but every enumerator will get all the items.
In this case, however, making multiple enumerators does not work because of the underlying resource. Based on the code you posted, I assume that the parameter passed to Import is based on a call to the Map method you posted. Each time through the enumerable returned from Map, you will "start" at the top of the method, but the TextReader and its current position are shared between all enumerators. Even if you did try to call Reset on the IEnumerators, that would not reset the TextReader. To solve your problem, you either need to buffer the enumerable (e.g. with ToList()) or find a way to reset the TextReader.
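A stripped-down repro of the effect, with a StringReader standing in for the file and Lines playing the role of Map: Any() pulls one line through the shared TextReader, so the next enumeration starts at the second line. Buffering with ToList() before the Any() check avoids it.

```csharp
using System.Collections.Generic;
using System.IO;

public static class LineSource
{
    // Iterator over a shared TextReader: each new enumeration re-enters the
    // method body from the top, but the reader's position carries over --
    // the underlying stream cannot rewind.
    public static IEnumerable<string> Lines(TextReader sr)
    {
        string line;
        while (null != (line = sr.ReadLine()))
            yield return line;
    }
}
```

With `var lazy = LineSource.Lines(new StringReader("a\nb\nc"));`, calling `lazy.Any()` consumes "a", and a subsequent `lazy.ToList()` yields only "b" and "c". Buffering first (`var buffered = LineSource.Lines(...).ToList();`) lets you call `buffered.Any()` and still pass all three lines on to Import.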

Enforce only single row returned from DataReader

I seem to write this quite a lot in my code:
using (var reader = cmd.ExecuteReader())
{
if (reader.Read())
{
result = new User((int)reader["UserId"], reader["UserName"].ToString());
}
if (reader.Read())
{
throw new DataException("multiple rows returned from query");
}
}
Is there some built in way to do this that I don't know about?
I don't know, but this code can be delegated into an extension method:
public static R Single<R>(this IDataReader reader, Func<IDataReader, R> selector)
{
    R result = default(R);
    if (reader.Read())
        result = selector(reader);
    if (reader.Read())
        throw new DataException("multiple rows returned from query");
    return result;
}
to be used like that:
using (var reader = cmd.ExecuteReader())
{
User u = reader.Single(r => new User((int)r["UserId"], r["UserName"].ToString()));
}
This saves you from duplicating that code everywhere.
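A self-contained version of that extension, written against IDataReader (there is no DataReader type, so the signature above would not compile as posted); writing it against the interface also means it can be exercised with an in-memory DataTableReader, not just a live SqlDataReader:

```csharp
using System;
using System.Data;

public static class SingleRowExtensions
{
    // Read exactly zero or one row; a second readable row is an error.
    public static R Single<R>(this IDataReader reader, Func<IDataReader, R> selector)
    {
        R result = default(R);
        if (reader.Read())
            result = selector(reader);
        if (reader.Read())
            throw new DataException("multiple rows returned from query");
        return result;
    }
}
```

Targeting IDataReader means the same helper works for SqlDataReader, MySqlDataReader, or any other provider's reader.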
This may or may not help depending on what your goal is. If you need to detect that multiple rows were returned in order to throw an appropriate exception, then this won't help.
If you just want to make sure that only one result is returned, you can potentially get a performance bump by using this method. From what I understand, data providers can use this to optimize the query in anticipation of a single row result.
In any case, what you'll want to do is use SqlCommand.ExecuteReader to create your data reader, but pass in an argument from the CommandBehavior enumeration (specifically CommandBehavior.SingleRow). ExecuteReader is overloaded to accept this.
CommandBehavior enum
SqlCommand.ExecuteReader overload
So your code might look like this:
using (var reader = cmd.ExecuteReader(CommandBehavior.SingleRow))
{
if (reader.Read())
{
result = new User((int)reader["UserId"], reader["UserName"].ToString());
}
}
If you are using SQL to fetch your data, this might help by letting you remove that kind of code in every place you need to use a data reader:
SELECT TOP ([number of rows you want selected]) *
FROM [Table Name]
WHERE [Condition]
For example:
SELECT TOP (1) *
FROM tblUsers
WHERE Username = 'Allan Chua'
Another tip: use stored procedures. Using them can minimize the repetition of SQL queries and unnecessary coding.

C# function to return generic objects / entities

In my application I need to display a list of records returned by different stored procedures. Each stored procedure returns a different type of record (i.e. the number of columns and the column types differ).
My original thought was to create a class for each type of record and a function which would execute the corresponding stored procedure and return List<MyCustomClass>. Something like this:
public class MyCustomClass1
{
    public int Col1 { get; set; } // In reality the columns are NOT called Col1 and Col2 but have proper names
    public int Col2 { get; set; }
}

public static List<MyCustomClass1> GetDataForReport1(int Param1)
{
    List<MyCustomClass1> output = new List<MyCustomClass1>();
    using (SqlConnection cn = new SqlConnection("MyConnectionString"))
    using (SqlCommand cmd = new SqlCommand("MyProcNameForReport1", cn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@Param1", SqlDbType.Int).Value = Param1;
        cn.Open();
        SqlDataReader rdr = cmd.ExecuteReader();
        int Col1_Ordinal = rdr.GetOrdinal("Col1");
        int Col2_Ordinal = rdr.GetOrdinal("Col2");
        while (rdr.Read())
        {
            output.Add(new MyCustomClass1
            {
                Col1 = rdr.GetSqlInt32(Col1_Ordinal).Value,
                Col2 = rdr.GetSqlInt32(Col2_Ordinal).Value
            });
        }
        rdr.Close();
    }
    return output;
}
This works fine but as I don't need to manipulate those records in my client code (I just need to bind them to a graphical control in my application layer) it doesn't really make sense to do it this way as I would end up with plenty of custom classes that I wouldn't actually use. I found this which does the trick:
public static DataTable GetDataForReport1(int Param1)
{
    DataTable output = new DataTable();
    using (SqlConnection cn = new SqlConnection("MyConnectionString"))
    using (SqlCommand cmd = new SqlCommand("MyProcNameForReport1", cn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@Param1", SqlDbType.Int).Value = Param1;
        cn.Open();
        output.Load(cmd.ExecuteReader());
    }
    return output;
}
This returns a DataTable which I can bind to whatever control I use in my application layer. I'm wondering if using a DataTable is really needed.
Couldn't I return a list of objects created using anonymous classes:
public static List<object> GetDataForReport1(int Param1)
{
    List<object> output = new List<object>();
    using (SqlConnection cn = new SqlConnection("MyConnectionString"))
    using (SqlCommand cmd = new SqlCommand("MyProcNameForReport1", cn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@Param1", SqlDbType.Int).Value = Param1;
        cn.Open();
        SqlDataReader rdr = cmd.ExecuteReader();
        int Col1_Ordinal = rdr.GetOrdinal("Col1");
        int Col2_Ordinal = rdr.GetOrdinal("Col2");
        while (rdr.Read())
        {
            output.Add(new
            {
                Col1 = rdr.GetSqlInt32(Col1_Ordinal).Value,
                Col2 = rdr.GetSqlInt32(Col2_Ordinal).Value
            });
        }
        rdr.Close();
    }
    return output;
}
Any other ideas? Basically I just want the function to return 'something' which I can bind to a graphical control and I prefer not to have to create custom classes as they wouldn’t really be used. What would be the best approach?
The good news is: this isn't the first time the entities vs datasets issue has come up. It's a debate much older than my own programming experience. If you don't want to write DTO's or custom entities then your options are DataTables/DataSets or you could reinvent this wheel again. You wouldn't be the first and you won't be the last. The argument for DTO's/Entities is that you don't have the performance overheads that you get with DataSets. DataSets have to store a lot of extra information about the datatypes of the various columns, etc...
One thing, you might be happy to hear is that if you go the custom entity route and you are happy for your object property names to match the column names returned by your sprocs, you can skip writing all those GetDataForReport() mapping functions where you are using GetOrdinal to map columns to properties. Luckily, some smart monkeys have properly sussed that issue here.
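If you'd rather not take a dependency, a simple convention-based mapper is easy to sketch. The ReportRow class is hypothetical, and this assumes the column types already match the property types: resolve column-to-property pairs once, then loop over the rows.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Reflection;

// Hypothetical report class whose property names match the sproc's columns.
public class ReportRow
{
    public int Col1 { get; set; }
    public string Col2 { get; set; }
}

public static class ConventionMapper
{
    // Pair up ordinals and writable properties once, outside the row loop,
    // then assign values by ordinal for every row.
    public static List<T> ToList<T>(IDataReader reader) where T : new()
    {
        var pairs = new List<KeyValuePair<int, PropertyInfo>>();
        PropertyInfo[] props = typeof(T).GetProperties();
        for (var i = 0; i < reader.FieldCount; i++)
        {
            PropertyInfo prop = Array.Find(props, p =>
                string.Equals(p.Name, reader.GetName(i), StringComparison.OrdinalIgnoreCase));
            if (prop != null && prop.CanWrite)
                pairs.Add(new KeyValuePair<int, PropertyInfo>(i, prop));
        }

        var results = new List<T>();
        while (reader.Read())
        {
            var item = new T();
            foreach (var pair in pairs)
                if (!reader.IsDBNull(pair.Key)) // leave the property at its default for NULLs
                    pair.Value.SetValue(item, reader.GetValue(pair.Key), null);
            results.Add(item);
        }
        return results;
    }
}
```

With this, each report needs only its class declaration; all the GetOrdinal plumbing lives in one place.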
EDIT: I was researching an entirely different problem today (binding datasets to Silverlight datagrids) and came across this article by Vladimir Bodurov, which shows how to transform an IEnumerable of IDictionary into an IEnumerable of dynamically created objects (using IL). It occurred to me that you could easily modify the extension method to accept a data reader rather than the IEnumerable of IDictionary to solve your issue of dynamic collections. It's pretty cool. I think it would accomplish exactly what you were after, in that you no longer need either the dataset or the custom entities. In effect you end up with a custom entity collection, but you lose the overhead of writing the actual classes.
If you are lazy, here's a method that turns a datareader into Vladimir's dictionary collection (it's less efficient than actually converting his extension method):
public static IEnumerable<IDictionary> ToEnumerableDictionary(this IDataReader dataReader)
{
    var list = new List<Dictionary<string, object>>();
    Dictionary<int, string> keys = null;
    while (dataReader.Read())
    {
        if (keys == null)
        {
            keys = new Dictionary<int, string>();
            for (var i = 0; i < dataReader.FieldCount; i++)
                keys.Add(i, dataReader.GetName(i));
        }
        var dictionary = keys.ToDictionary(ordinalKey => ordinalKey.Value, ordinalKey => dataReader[ordinalKey.Key]);
        list.Add(dictionary);
    }
    return list.ToArray();
}
If you're going to XmlSerialize them and want that to be efficient, they can't be anonymous. Are you going to send back the changes? Anonymous classes won't be much use for that.
If you're really not going to do anything with the data other than gridify it, a DataTable may well be a good answer for your context.
In general, if there's any sort of interesting business logic going on, you should be using Data Transfer Objects etc. rather than just ferrying around grids.
If you use anonymous classes, you are still defining each type. You just get to use Type Inference to reduce the amount of typing, and the class type will not clutter any namespaces.
The downside of using them here is that you want to return them out of the method. As you noted, the only way to do this is to cast them to object, which makes all of their data inaccessible without reflection! Are you never going to do any processing or computation with this data? Display one of the columns as a picture instead of a value, or hide it based on some condition? Even if you only ever have a single conditional statement, doing it with the anonymous-class version will make you wish you hadn't.
