Does ADO.NET go overboard with IDisposable? - c#

When .NET first came out, I was one of many who complained about .NET's lack of deterministic finalization (class destructors being called on an unpredictable schedule). The compromise Microsoft came up with at the time was the using statement.
Although not perfect, I think using using is important for ensuring that unmanaged resources are cleaned up in a timely manner.
However, I'm writing some ADO.NET code and noticed that almost every class implements IDisposable. That leads to code that looks like this.
using (SqlConnection connection = new SqlConnection(connectionString))
using (SqlCommand command = new SqlCommand(query, connection))
using (SqlDataAdapter adapter = new SqlDataAdapter(command))
using (SqlCommandBuilder builder = new SqlCommandBuilder(adapter))
using (DataSet dataset = new DataSet())
{
command.Parameters.AddWithValue("#FirstValue", 2);
command.Parameters.AddWithValue("#SecondValue", 3);
adapter.Fill(dataset);
DataTable table = dataset.Tables[0];
foreach (DataRow row in table.Rows) // search whole table
{
if ((int)row["Id"] == 4)
{
row["Value2"] = 12345;
}
else if ((int)row["Id"] == 5)
{
row.Delete();
}
}
adapter.Update(table);
}
I strongly suspect I do not need all of those using statements. But without understanding the code for each class in some detail, it's hard to be certain which ones I can leave out. The result is kind of ugly and detracts from the main logic in my code.
Does anyone know why all these classes need to implement IDisposable? (Microsoft has many code examples online that don't worry about disposing many of these objects.) Are other developers writing using statements for all of them? And, if not, how do you decide which ones can do without?

One part of the problem here is that ADO.NET is an abstract provider model. We don't know what a specific implementation (a specific ADO.NET provider) will need with regards to disposal. Sure, we can reasonably assume the connection and transaction need to be disposed, but the command? Maybe. Reader? Probably, not least because one of the CommandBehavior flags lets you tie the connection's lifetime to the reader (so the connection closes when the reader does, which should logically extend to disposal).
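That reader/connection coupling looks something like this (a minimal sketch; command is assumed to be a SqlCommand built against an open connection):
using (SqlDataReader reader = command.ExecuteReader(CommandBehavior.CloseConnection))
{
    while (reader.Read())
    {
        // process rows; disposing the reader also closes the underlying connection
    }
}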
So overall, I think it is probably fine.
Most of the time, people aren't messing with ADO.NET by hand, and any ORM tools (or micro-ORM tools, such as "Dapper") will get this right for you without you having to worry about it.
I will openly confess that on the few occasions when I've used DataTable (seriously, it is 2018 - that shouldn't be your default model for representing data, except for some niche scenarios): I haven't disposed them. That one kinda makes no sense :)

If a class implements IDisposable, then you should always make sure that it gets disposed, without making any assumptions about its implementation.
I think that's the minimal amount of code that you could possibly write that ensures that your objects will be disposed. I see absolutely no problem with it, and it does not distract me at all from the main logic of my code.
If reducing this code is of paramount importance to you, then you can come up with your own replacement (wrapper) of SqlConnection whose child classes do not implement IDisposable and instead get automatically destroyed by the connection once the connection is closed. However, that would be a huge amount of work, and you would lose some precision with regards to when something gets disposed.
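As a rough illustration of that idea (a hypothetical ConnectionScope class, not an existing API; error handling omitted; needs System, System.Collections.Generic and System.Data.SqlClient):
class ConnectionScope : IDisposable
{
    private readonly SqlConnection _connection;
    private readonly List<IDisposable> _children = new List<IDisposable>();

    public ConnectionScope(string connectionString)
    {
        _connection = new SqlConnection(connectionString);
        _connection.Open();
    }

    public SqlCommand CreateCommand(string query)
    {
        var command = new SqlCommand(query, _connection);
        _children.Add(command); // tracked so it is disposed along with the connection
        return command;
    }

    public void Dispose()
    {
        // dispose children first (in reverse order), then the connection itself
        for (int i = _children.Count - 1; i >= 0; i--)
            _children[i].Dispose();
        _connection.Dispose();
    }
}
With that, callers only ever write one using statement, at the cost of the less precise disposal timing mentioned above.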

Yes, almost everything in ADO.NET implements IDisposable, whether that's actually needed (in the case of, say, SqlConnection, because of connection pooling) or not (in the case of, say, DataTable; see "Should I Dispose() DataSet and DataTable?").
The problem is, as noted in the question:
without understanding the code for each class in some detail, it's hard to be certain which ones I can leave out.
I think that this alone is a good enough reason to keep everything inside a using statement - in one word: Encapsulation.
In many words:
It shouldn't be necessary to get intimately familiar, or even remotely familiar for that matter, with the implementation of every class you are working with. You only need to know the surface area, meaning public methods, properties, events, indexers (and fields, if the class has public fields). From the point of view of the user of a class, anything other than its public surface area is an implementation detail.
Regarding all the using statements in your code: you can write them only once by creating a method that accepts an SQL statement, an Action<DataTable> and a params array of parameters. Something like this:
void DoStuffWithDataTable(string query, Action<DataTable> action, params SqlParameter[] parameters)
{
using (SqlConnection connection = new SqlConnection(connectionString))
using (SqlCommand command = new SqlCommand(query, connection))
using (SqlDataAdapter adapter = new SqlDataAdapter(command))
using (SqlCommandBuilder builder = new SqlCommandBuilder(adapter))
using (var table = new DataTable())
{
foreach(var param in parameters)
{
command.Parameters.Add(param);
}
// SqlDataAdapter has a fill overload that only needs a data table
adapter.Fill(table);
action(table);
adapter.Update(table);
}
}
And you use it like that, for all the actions you need to do with your data table:
DoStuffWithDataTable(
"Select...",
table =>
{ // of course, that doesn't have to be a lambda expression here...
foreach (DataRow row in table.Rows) // search whole table
{
if ((int)row["Id"] == 4)
{
row["Value2"] = 12345;
}
else if ((int)row["Id"] == 5)
{
row.Delete();
}
}
},
new SqlParameter[]
{
new SqlParameter("#FirstValue", 2),
new SqlParameter("#SecondValue", 3)
}
);
This way your code is "safe" with regard to disposing every IDisposable, while you have written the plumbing code for that only once.

Related

Static Data Access Methods

I know that creating a custom data access layer is not a very good idea unless you: 1) Know exactly what you're doing, and/or 2) Have a very specific need. However, I am maintaining some legacy code that uses a custom data access layer where each method looks something like this:
using (SqlConnection cn = new SqlConnection(connectionString))
{
using (SqlDataAdapter da = new SqlDataAdapter("sp_select_details", cn))
{
using (DataSet ds = new DataSet())
{
da.SelectCommand.Parameters.Add("#blind", SqlDbType.Bit).Value = blind;
da.SelectCommand.CommandType = CommandType.StoredProcedure;
da.SelectCommand.CommandTimeout = CommandTimeout;
da.Fill(ds, "sp_select_details");
return ds;
}
}
}
Consequently, the usage looks something like this:
protected void Page_Load(object sender, EventArgs e) {
using (Data da = new Data ("SQL Server connection string")) {
DataSet ds = da.sp_select_blind_options(Session.SessionID); //opens a connection
Boolean result = da.sp_select_login_exists("someone");//opens another connection
}
}
I am thinking that using Microsoft's Enterprise Library would save me from setting up and tearing down, namely, the connection to SQL Server every method call. Am I correct in this thinking?
I've used Enterprise Library in the past very successfully, and Enterprise Library would hide some of the messy details from you, but essentially it would be using the same code internally as that demonstrated in your example.
As @tigran says, I wouldn't recommend trying to change an existing codebase unless there are fundamental issues with it.
Yes, it will definitely save you time, but you will pay in terms of performance and flexibility.
So creating a custom data layer is also a very good idea for gaining performance and flexibility.
Considering that you're talking about legacy code that, I suppose, works, I wouldn't change it to something modern (but less performant) only to have something fresh in my code.
A solid, workable data layer is a better choice than any other new technology you could push into legacy code.
In short, change it only if you have really serious reasons to do so. I understand your willingness to change the stuff, because it's always hard to understand code written by someone else, but believe me, very often not changing old legacy code is the best choice for the project.
Good luck.
Yep, by default connection pooling will be on. The application domain basically maintains a list of connections, and when you issue a call to create a connection, it returns an unused one from the pool, if it exists or creates one if not.
So when your connection cn goes out of scope in the using statement and gets disposed, what actually happens is that it goes back into the pool, ready for the next request, and hangs around in there based on various optimisation parameters.
Google "ADO.NET connection pooling" for more details; there's a lot in there.
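If you want to see or tune that behaviour, pooling is controlled through connection-string keywords (a sketch; the server and database names, and the pool sizes, are illustrative):
// Pooling is on by default in System.Data.SqlClient; these keywords tune it.
string connectionString =
    "Data Source=.;Initial Catalog=MyDb;Integrated Security=true;" +
    "Pooling=true;Min Pool Size=5;Max Pool Size=100";

using (var connection = new SqlConnection(connectionString))
{
    connection.Open(); // handed out from the pool if one is available
} // Dispose() returns the connection to the pool rather than closing the socket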

How to improve this reusable methods class?

I have a class containing methods to fill DropDowns, return a DataSet, return a Scalar or simply execute a query. In one of my older posts on StackOverflow, I submitted a buggy version of the same class. Based on the advice of the contributors, I have improved the code and want to know whether this class is suitable to be used in a high-concurrency environment:
public sealed class reuse
{
public void FillDropDownList(string Query, DropDownList DropDownName)
{
using (TransactionScope transactionScope = new TransactionScope())
{
using (SqlConnection con = new SqlConnection(ConfigurationManager.ConnectionStrings["MyDbConnection"].ConnectionString.ToString()))
{
SqlDataReader dr;
try
{
if (DropDownName.Items.Count > 0)
DropDownName.Items.Clear();
SqlCommand cmd = new SqlCommand(Query, con);
con.Open(); // the connection must be opened before ExecuteReader
dr = cmd.ExecuteReader();
while (dr.Read())
DropDownName.Items.Add(dr[0].ToString());
dr.Close();
}
catch (Exception ex)
{
CustomErrorHandler.GetScript(HttpContext.Current.Response,ex.Message.ToString());
}
}
}
}
}
I want to know whether to dispose the Command and DataReader objects as well, or whether they too will get disposed automatically with using?
Re the command/reader: they would be disposed by "using", but only if you use "using" for them, which you should.
Criticisms:
you are mixing UI and data access horribly - the exception handling in particular gives no indication to the calling code (although personally I'd keep the control code separate too), and assumes the caller always wants that script-based approach (to me, if this code fails, things are very wrong: let that exception bubble upwards!)
no mechanism for proper parameters; my suspicion then, is that you're concatenating strings to make a query - potential (but very real) risk of SQL injection
you mention high-concurrent; if so, I would expect to see some cache involvement here
for code maintenance reasons, I'd move all "create a connection" code to a central point - "DRY" etc; I wouldn't expect an individual method like this to concern itself with details like where the connection-string comes from
Frankly I'd just use dapper here, and avoid all these issues:
using(var connection = Config.OpenConnection()) {
return connection.Query<string>(tsql, args).ToList();
}
(and let the caller iterate over the list, or use AddRange, or data-binding, whatever)
Generally agree with Marc's answer but I have some other comments and different angle. Hope my answer will be useful for you.
First, there is nothing wrong with using static classes and methods in a concurrent environment as long as there is no need for any state information and no data is shared. In your case, filling up a DropDownList, it is perfectly fine because you only need a list of strings, and once that's done you can forget all about how you got it. There is also no interference between concurrent calls to a static method if they do not access any static fields. Static methods are common across the .NET framework and they are thread-safe.
In my example below I do use one static field - log4net logger. It is still thread-safe because it does not carry any state and is merely a jump point to log4net library which itself is thread-safe. Do recommend at least looking at log4net - great logging lib.
It could only be unsafe if you tried to fill the same drop-down list from two threads, but then it would also be unsafe if this class were not static. Make sure drop-downs are filled from one (main) thread.
Back to your code. Mixing UI and data retrieval is not a good practice as it makes code much less maintainable and less stable. Separate those two. Dapper library might be a good way to simplify things. I have not used it myself so all I can tell is that it looks very handy and efficient. If you want/need to learn how stuff works don't use it though. At least not at first.
Having non-parametrized query in one string is potentially prone to SQL injection attacks but if that query is not constructed based on any direct user input, it should be safe. Of course you can always adopt parametrization to be sure.
Handling exception using
CustomErrorHandler.GetScript(HttpContext.Current.Response, ex.Message.ToString());
feels flaky and too complex for this place and may result in another exception. Exception when handling another exception means panic. I would move that code outside. If you need something here let it be a simple log4net error log and re-throw that exception.
If you only do one DB read there is no need for an explicit transaction. As for the connection object, it should not be static in any situation and should be created on demand. There is no performance penalty in that, because .NET keeps a pool of ready-to-use connections and recycles those that were 'disposed'.
I believe that an example is always better than just explanations so here is how I would re-arrange your code.
public static class reuse
{
static public readonly log4net.ILog log = log4net.LogManager.GetLogger("GeneralLog");
public static void FillDropDownList(string query, SqlParameter[] parms, DropDownList dropDown)
{
dropDown.Items.Clear();
dropDown.DataSource = GetData(query, parms);
dropDown.DataBind();
}
private static IEnumerable<string> GetData(string query, SqlParameter[] parms)
{
using (SqlConnection con = new SqlConnection(GetConnString()))
{
try
{
List<string> result = new List<string>();
SqlCommand cmd = new SqlCommand(query, con);
cmd.Parameters.AddRange(parms); // AddRange expects SqlParameter objects, hence the SqlParameter[] signature
con.Open(); // the connection must be open before ExecuteReader
SqlDataReader dr = cmd.ExecuteReader();
if (dr.VisibleFieldCount > 0)
{
while (dr.Read())
result.Add(dr[0].ToString());
}
dr.Close();
return result;
}
catch (Exception ex)
{
log.Error("Exception in GetData()", ex);
throw;
}
}
}
private static string GetConnString()
{
return ConfigurationManager.ConnectionStrings["MyDbConnection"].ConnectionString.ToString(CultureInfo.InvariantCulture);
}
}

Trying to understand the 'using' statement better

I have read a couple of articles about the using statement to try and understand when it should be used. It sounds like most people reckon it should be used as much as possible, as it guarantees disposal of unused objects.
Problem is that all the examples always show something like this:
using (SqlCommand scmFetch = new SqlCommand())
{
// code
}
That makes sense, but it's such a small piece of code. What should I do when executing a query on a database? What are all the steps? Will it look something like this:
string sQuery = #"
SELECT [ID], [Description]
FROM [Zones]
ORDER BY [Description] ";
DataTable dtZones = new DataTable("Zones");
using (SqlConnection scnFetchZones = new SqlConnection())
{
scnFetchZones.ConnectionString = __sConnectionString;
scnFetchZones.Open();
using (SqlCommand scmdFetchZones = new SqlCommand())
{
scmdFetchZones.Connection = scnFetchZones;
scmdFetchZones.CommandText = sQuery;
using (SqlDataAdapter sdaFetch = new SqlDataAdapter())
{
sdaFetch.SelectCommand = scmdFetchZones;
sdaFetch.Fill(dtZones);
}
}
if (scnFetchZones.State == ConnectionState.Open)
scnFetchZones.Close();
}
What I want to know is:
• Is it okay to have 4, 5, 10 nested using statements to ensure all objects are disposed?
• At what point am I doing something wrong and should I consider revision?
• If revision is required due to too many nested using statements, what are my options?
You might end up with a formidable hierarchy, but your code should be quite efficient, right? Or should you only put, for instance, the SqlDataAdapter object in a using statement and it will somehow ensure that all the other objects get disposed as well?
Thanx.
It is perfectly valid to have many nested using statements:
using(A a = new A())
using(B b = new B())
{
a.SomeMethod(b);
}
You would never be wrong if you use using for every IDisposable that you use. There is no limit of how many nested using blocks you use.
The using statement is syntactic sugar in C#.
So the following code:
using(var someDisposableObject = new SomeDisposableObject())
{
// Do Something
}
actually expands to something like this:
var someDisposableObject = new SomeDisposableObject();
try
{
// Do Something
}
finally
{
if (someDisposableObject != null)
{
((IDisposable) someDisposableObject).Dispose();
}
}
Look at this article: http://msdn.microsoft.com/en-us/library/yh598w02.aspx
There's no limit on the depth, so that's not a concern. You should verify that the object of the using implements IDisposable. And an object being disposed doesn't dispose of all objects connected to it, just those it creates.
So, at what point are you doing it wrong: there's no limit, but generally the nesting is fairly shallow; you create the object, do a task, then the object is disposed. If you're nesting very deeply, I'd look at the design. I think you'd be hard pressed to go more than a few layers deep.
As for your options for a redesign, that really depends upon what you are doing, but you might use the same object for multiple tasks. Most likely you will end up breaking the task down into a function (passing in any surrounding objects that are needed).
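For example (a sketch; WithOpenConnection is a made-up helper name), the connection lifetime can be centralised in one method and the task passed in as a delegate:
void WithOpenConnection(string connectionString, Action<SqlConnection> work)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        work(connection); // the connection is disposed even if work throws
    }
}

// usage: the caller never sees the outer using at all
WithOpenConnection(connStr, connection =>
{
    using (var command = new SqlCommand("SELECT 1", connection))
    {
        command.ExecuteNonQuery();
    }
});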
• Is it okay to have 4, 5, 10 nested using statements to ensure all objects are disposed?
Reply: There is no limit on nesting "using" blocks.
• At what point am I doing something wrong and should I consider revision?
Reply: If you have many nested "using" blocks, try stacking them as below.
using (var con = new SqlConnection(connStr))
using (var cmd = new SqlCommand(queryStr, con))
{
con.Open(); // required before ExecuteReader
using (var rs = cmd.ExecuteReader())
{
while (rs.Read())
{
//Code.
}
}
}
Personally, I have used at least 3 layers (Connection, Command, Other) a number of times and I see absolutely no problem with it, but as you have already hinted at, eventually there will be a problem with readability. As with other nested constructs, you may need to balance efficiency with maintainability. That is, you don't need to necessarily sacrifice efficiency, but there is often 'more than one way to skin a cat'.
That said, you would be hard-pushed to generate 10 nested layers!
IMHO, what you need to ask yourself is: What are the alternatives? Try/finally blocks? Are they more readable? More maintainable? In almost all cases, the answer is going to be "no".
So use using. It's the closest thing C# has to C++'s RAII pattern and it's all good :-)
One time I can think of where you wouldn't want to use 'using' on connections would be on ClassFactories for connected objects such as DataReaders, e.g. consider the case
private IDataReader CreateReader(string queryString,
string connectionString)
{
SqlConnection connection = new SqlConnection(connectionString);
SqlCommand command = new SqlCommand(queryString, connection);
connection.Open();
return command.ExecuteReader(CommandBehavior.CloseConnection);
// Don't close connection
}
(Modified from MSDN - The example on MSDN is just plain stupid)
Another reason is on WCF ServiceReference 'clients' - if the channel becomes faulted, 'using' then hides the actual exception. But this is just a buggy implementation IMHO.

Is there a reason to check for null inside multiple using clauses in C#?

Is there a reason to check for null in the last using? Seems to me like it's most likely not gonna be needed?
using (var connection = new SqlConnection(connectionString))
{
using (var command = new SqlCommand(commandString, connection))
{
using (var reader = command.ExecuteReader())
{
if (reader != null) {
// Use the reader
}
}
}
}
EDIT:
Should I call reader.Close() and connection.Close() inside the using if I use it like this:
using (var varConnection = Locale.sqlConnectOneTime(Locale.sqlDataConnectionDetailsDZP))
using (var sqlWrite = new SqlCommand(preparedCommand, varConnection))
using (var reader = sqlWrite.ExecuteReader()) {
while (reader.Read()) {
//something to do.
}
reader.Close();
varConnection.Close();
}
public static SqlConnection sqlConnectOneTime(string varSqlConnectionDetails) {
SqlConnection sqlConnection = new SqlConnection(varSqlConnectionDetails);
sqlConnect(sqlConnection);
if (sqlConnection.State == ConnectionState.Open) {
return sqlConnection;
}
return null;
}
Is calling Close() necessary in the following example, or can I skip those two calls?
reader.Close();
varConnection.Close();
With regards,
MadBoy
No, this is not necessary.
But that follows from the definition of ExecuteReader; it is not related to the using clause. ExecuteReader either returns a (non-null) DataReader object or it throws an exception.
The expression in the if statement will always be true (if it is reached).
And by leaving out the if and all the superfluous brace-pairs you can make this much easier to read:
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(commandString, connection))
using (var reader = command.ExecuteReader())
{
// Use the reader
}
No. Why would you do that? The documentation doesn't suggest that it might be null. What would you do if it was null, anyway? Try it again?
Since you are specifically using SqlConnection and SqlCommand - no, there is no reason for this check.
However, the ADO.NET interfaces are designed so that you can plug in any other DB provider and use it via the ADO.NET base classes (DbConnection, DbCommand etc). It seems that some providers return null sometimes (MySQL maybe?), which may be why the programmer inserted this. Or just because ReSharper issues a warning here? These may be reasons for the check, even though it is not necessary if the providers used follow the contract.
The using statements are not going to check for null for you, so if command.ExecuteReader() can return null you have to check for it explicitly, just like in your code snippet.
EDIT: Looks like in this particular case ExecuteReader() should not return null, so you can avoid the check. Please remember that you need it in the general case though.
Seems like the null check is worth it simply to avoid that slim chance that the object is null. Take a look at my answer to How can I modify a queue collection in a loop?
In the example Queue.Dequeue() should never return a null but it does. It's an extreme case but I don't see why you wouldn't want to avoid an unhandled exception, especially if it's as easy as if (object != null)
For Henk (you can run the code yourself and get similar results):
(screenshot of the test run: http://www.ccswe.com/temp/Untitled.png)
Either way I was simply stating my opinion that just because the documentation says it will do one thing doesn't mean it always will. Not sure why I got the downvote simply because someone has a different opinion :-)
Edit: Stupid tool tip; either way, you can see the processed count is 9998065 and the null values encountered is 2264. If there is something fundamentally wrong with my example code, I'd be interested in hearing it. I'm gonna back away from this thread now.
In this case you should not check reader for null.
But you may use the Code Contracts library and use Contract.Assume (or Contract.Assert) to write your assumptions about the code. In this case you can use tools for static analysis, and easily remove these checks from your release builds.
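For instance (a sketch; requires referencing System.Diagnostics.Contracts, and the Code Contracts tools for the static analysis and release-build rewriting to kick in):
using (var reader = command.ExecuteReader())
{
    // Documents the assumption for the static checker instead of branching on it:
    Contract.Assume(reader != null);
    while (reader.Read())
    {
        // use the reader
    }
}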

C# IEnumerator/yield structure potentially bad?

Background: I've got a bunch of strings that I'm getting from a database, and I want to return them. Traditionally, it would be something like this:
public List<string> GetStuff(string connectionString)
{
List<string> categoryList = new List<string>();
using (SqlConnection sqlConnection = new SqlConnection(connectionString))
{
string commandText = "GetStuff";
using (SqlCommand sqlCommand = new SqlCommand(commandText, sqlConnection))
{
sqlCommand.CommandType = CommandType.StoredProcedure;
sqlConnection.Open();
SqlDataReader sqlDataReader = sqlCommand.ExecuteReader();
while (sqlDataReader.Read())
{
categoryList.Add(sqlDataReader["myImportantColumn"].ToString());
}
}
}
return categoryList;
}
But then I figure the consumer is going to want to iterate through the items and doesn't care about much else, and I'd like to not box myself into a List, per se, so if I return an IEnumerable everything is good/flexible. So I was thinking I could use a "yield return" type design to handle this...something like this:
public IEnumerable<string> GetStuff(string connectionString)
{
using (SqlConnection sqlConnection = new SqlConnection(connectionString))
{
string commandText = "GetStuff";
using (SqlCommand sqlCommand = new SqlCommand(commandText, sqlConnection))
{
sqlCommand.CommandType = CommandType.StoredProcedure;
sqlConnection.Open();
SqlDataReader sqlDataReader = sqlCommand.ExecuteReader();
while (sqlDataReader.Read())
{
yield return sqlDataReader["myImportantColumn"].ToString();
}
}
}
}
But now that I'm reading a bit more about yield (on sites like this...msdn didn't seem to mention this), it's apparently a lazy evaluator that keeps the state of the populator around, in anticipation of someone asking for the next value, and then only runs it until it returns the next value.
This seems fine in most cases, but with a DB call, this sounds a bit dicey. As a somewhat contrived example, if someone asks for an IEnumerable from that I'm populating from a DB call, gets through half of it, and then gets stuck in a loop...as far as I can see my DB connection is going to stay open forever.
Sounds like asking for trouble in some cases if the iterator doesn't finish...am I missing something?
It's a balancing act: do you want to force all the data into memory immediately so you can free up the connection, or do you want to benefit from streaming the data, at the cost of tying up the connection for all that time?
The way I look at it, that decision should potentially be up to the caller, who knows more about what they want to do. If you write the code using an iterator block, the caller can very easily turn that streaming form into a fully-buffered form:
List<string> stuff = new List<string>(GetStuff(connectionString));
If, on the other hand, you do the buffering yourself, there's no way the caller can go back to a streaming model.
So I'd probably use the streaming model and say explicitly in the documentation what it does, and advise the caller to decide appropriately. You might even want to provide a helper method to basically call the streamed version and convert it into a list.
Of course, if you don't trust your callers to make the appropriate decision, and you have good reason to believe that they'll never really want to stream the data (e.g. it's never going to return much anyway) then go for the list approach. Either way, document it - it could very well affect how the return value is used.
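The helper mentioned above can be tiny (a sketch; the name GetStuffBuffered is made up, and it assumes the streaming GetStuff from the question):
public List<string> GetStuffBuffered(string connectionString)
{
    // Materializes the stream so the connection is closed before returning.
    return new List<string>(GetStuff(connectionString));
}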
Another option for dealing with large amounts of data is to use batches, of course - that's thinking somewhat away from the original question, but it's a different approach to consider in the situation where streaming would normally be attractive.
You're not always unsafe with the IEnumerable. If you let the framework call GetEnumerator (which is what most people will do), then you're safe. Basically, you're as safe as the carefulness of the code using your method:
class Program
{
static void Main(string[] args)
{
// safe
var firstOnly = GetList().First();
// safe
foreach (var item in GetList())
{
if(item == "2")
break;
}
// safe
using (var enumerator = GetList().GetEnumerator())
{
for (int i = 0; i < 2; i++)
{
enumerator.MoveNext();
}
}
// unsafe
var enumerator2 = GetList().GetEnumerator();
for (int i = 0; i < 2; i++)
{
enumerator2.MoveNext();
}
}
static IEnumerable<string> GetList()
{
using (new Test())
{
yield return "1";
yield return "2";
yield return "3";
}
}
}
class Test : IDisposable
{
public void Dispose()
{
Console.WriteLine("dispose called");
}
}
Whether you can afford to leave the database connection open or not depends on your architecture as well. If the caller participates in a transaction (and your connection is auto-enlisted), then the connection will be kept open by the framework anyway.
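For illustration, auto-enlistment looks roughly like this (a sketch; requires a reference to System.Transactions):
using (var scope = new TransactionScope())
using (var connection = new SqlConnection(connectionString))
{
    connection.Open(); // auto-enlists in the ambient transaction
    // ... run commands; the pooled connection stays associated with
    // the transaction until the scope completes or rolls back
    scope.Complete();
}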
Another advantage of yield (when using a server-side cursor) is that your code doesn't have to read all the data (example: 1,000 items) from the database if your consumer wants to get out of the loop earlier (example: after the 10th item). This can speed up querying data, especially in an Oracle environment, where server-side cursors are the common way to retrieve data.
You are not missing anything. Your sample shows how NOT to use yield return. Add the items to a list, close the connection, and return the list. Your method signature can still return IEnumerable.
Edit: That said, Jon has a point (so surprised!): there are rare occasions where streaming is actually the best thing to do from a performance perspective. After all, if it's 100,000 (1,000,000? 10,000,000?) rows we're talking about here, you don't want to be loading that all into memory first.
As an aside - note that the IEnumerable<T> approach is essentially what the LINQ providers (LINQ-to-SQL, LINQ-to-Entities) do for a living. The approach has advantages, as Jon says. However, there are definite problems too - in particular (for me) in terms of (the combination of) separation | abstraction.
What I mean here is that:
in an MVC scenario (for example) you want your "get data" step to actually get data, so that you can test it works at the controller, not the view (without having to remember to call .ToList() etc)
you can't guarantee that another DAL implementation will be able to stream data (for example, a POX/WSE/SOAP call can't usually stream records); and you don't necessarily want to make the behaviour confusingly different (i.e. connection still open during iteration with one implementation, and closed for another)
This ties in a bit with my thoughts here: Pragmatic LINQ.
But I should stress - there are definitely times when the streaming is highly desirable. It isn't a simple "always vs never" thing...
A slightly more concise way to force evaluation of the iterator:
using System.Linq;
//...
var stuff = GetStuff(connectionString).ToList();
No, you are on the right path... the yield will hold the reader open... you can test it by making another database call while iterating the IEnumerable.
The only way this would cause problems is if the caller abuses the protocol of IEnumerable<T>. The correct way to use it is to call Dispose on it when it is no longer needed.
The implementation generated by yield return takes the Dispose call as a signal to execute any open finally blocks, which in your example will call Dispose on the objects you've created in the using statements.
There are a number of language features (in particular foreach) which make it very easy to use IEnumerable<T> correctly.
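For instance, foreach compiles down to roughly this pattern, which is why Dispose is reliably called (a simplified sketch of what the compiler emits):
IEnumerator<string> enumerator = GetStuff(connectionString).GetEnumerator();
try
{
    while (enumerator.MoveNext())
    {
        string item = enumerator.Current;
        // loop body
    }
}
finally
{
    // Disposing the iterator runs its pending finally blocks,
    // including the using statements inside the iterator method.
    enumerator.Dispose();
}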
You could always use a separate thread to buffer the data (perhaps to a queue) while also doing a yield to return the data. When the user requests data (returned via a yield), an item is removed from the queue. Data is also being continuously added to the queue via the separate thread. That way, if the user requests the data fast enough, the queue is never very full and you do not have to worry about memory issues. If they don't, then the queue will fill up, which may not be so bad. If there is some sort of limitation you would like to impose on memory, you could enforce a maximum queue size (at which point the other thread would wait for items to be removed before adding more to the queue). Naturally, you will want to make sure you handle resources (i.e., the queue) correctly between the two threads; a sketch of this follows.
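A rough sketch of that background-buffering idea using BlockingCollection (needs System.Collections.Concurrent and System.Threading.Tasks; the bound of 100 is arbitrary, GetStuffBuffered is a made-up name, and early abandonment by the consumer is not handled here):
public IEnumerable<string> GetStuffBuffered(string connectionString)
{
    using (var queue = new BlockingCollection<string>(boundedCapacity: 100))
    {
        // Producer: reads from the database on a worker thread.
        var producer = Task.Run(() =>
        {
            try
            {
                foreach (var item in GetStuff(connectionString))
                    queue.Add(item); // blocks while the queue is full
            }
            finally
            {
                queue.CompleteAdding(); // tell the consumer we're done
            }
        });

        // Consumer: yields items as they arrive.
        foreach (var item in queue.GetConsumingEnumerable())
            yield return item;

        producer.Wait(); // surface any exception from the producer
    }
}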
As an alternative, you could force the user to pass in a boolean to indicate whether or not the data should be buffered. If true, the data is buffered and the connection is closed as soon as possible. If false, the data is not buffered and the database connection stays open as long as the user needs it to be. Having a boolean parameter forces the user to make the choice, which ensures they know about the issue.
I've bumped into this wall a few times. SQL database queries are not easily streamable like files. Instead, query only as much as you think you'll need and return it as whatever container you want (IList<>, DataTable, etc.). IEnumerable won't help you here.
What you can do is use a SqlDataAdapter instead and fill a DataTable. Something like this:
public IEnumerable<string> GetStuff(string connectionString)
{
DataTable table = new DataTable();
using (SqlConnection sqlConnection = new SqlConnection(connectionString))
{
string commandText = "GetStuff";
using (SqlCommand sqlCommand = new SqlCommand(commandText, sqlConnection))
{
sqlCommand.CommandType = CommandType.StoredProcedure;
SqlDataAdapter dataAdapter = new SqlDataAdapter(sqlCommand);
dataAdapter.Fill(table);
}
}
foreach(DataRow row in table.Rows)
{
yield return row["myImportantColumn"].ToString();
}
}
This way, you're querying everything in one shot, and closing the connection immediately, yet you're still lazily iterating the result. Furthermore, the caller of this method can't cast the result to a List and do something they shouldn't be doing.
Don't use yield here. Your sample is fine.
