I have a class containing methods to fill DropDowns, return a DataSet, return a Scalar, or simply execute a query. In one of my older posts on Stack Overflow, I submitted a buggy version of the same class. Based on the advice of the contributors, I have improved the code and want to know whether this class is suitable for use in a highly concurrent environment:
public sealed class reuse
{
public void FillDropDownList(string Query, DropDownList DropDownName)
{
using (TransactionScope transactionScope = new TransactionScope())
{
using (SqlConnection con = new SqlConnection(ConfigurationManager.ConnectionStrings["MyDbConnection"].ConnectionString.ToString()))
{
SqlDataReader dr;
try
{
if (DropDownName.Items.Count > 0)
DropDownName.Items.Clear();
SqlCommand cmd = new SqlCommand(Query, con);
con.Open(); // note: the connection must be opened before ExecuteReader
dr = cmd.ExecuteReader();
while (dr.Read())
DropDownName.Items.Add(dr[0].ToString());
dr.Close();
}
catch (Exception ex)
{
CustomErrorHandler.GetScript(HttpContext.Current.Response,ex.Message.ToString());
}
}
}
}
}
I want to know whether I need to dispose the Command and DataReader objects as well, or whether they too will be disposed automatically by the using blocks?
Re the command/reader: they would be disposed by "using", but only if you use "using" for them, which you should.
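For illustration, a minimal sketch of the same fill with all three wrapped (assuming the same query and control as above):

using (SqlConnection con = new SqlConnection(connectionString))
using (SqlCommand cmd = new SqlCommand(query, con))
{
    con.Open();
    using (SqlDataReader dr = cmd.ExecuteReader())
    {
        while (dr.Read())
            dropDownName.Items.Add(dr[0].ToString());
    }
}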
Criticisms:
you are mixing UI and data access horribly - the exception handling in particular gives no indication to the calling code (although personally I'd keep the control code separate too), and assumes the caller always wants that script-based approach (to me, if this code fails, things are very wrong: let that exception bubble upwards!)
no mechanism for proper parameters; my suspicion, then, is that you're concatenating strings to make a query - a potential (and very real) risk of SQL injection (a parameterized sketch follows the dapper example below)
you mention high-concurrent; if so, I would expect to see some cache involvement here
for code maintenance reasons, I'd move all "create a connection" code to a central point - "DRY" etc; I wouldn't expect an individual method like this to concern itself with details like where the connection-string comes from
Frankly I'd just use dapper here, and avoid all these issues:
using(var connection = Config.OpenConnection()) {
return connection.Query<string>(tsql, args).ToList();
}
(and let the caller iterate over the list, or use AddRange, or data-binding, whatever)
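And if you stay on raw ADO.NET, parameterization looks like this - a hedged sketch with made-up table, column, and parameter names:

private static List<string> GetActiveNames(string connectionString)
{
    var result = new List<string>();
    using (var con = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("SELECT Name FROM Categories WHERE IsActive = @active", con))
    {
        // the value is passed as a parameter, never concatenated into the SQL
        cmd.Parameters.AddWithValue("@active", true);
        con.Open();
        using (var dr = cmd.ExecuteReader())
        {
            while (dr.Read())
                result.Add(dr.GetString(0));
        }
    }
    return result;
}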
I generally agree with Marc's answer, but I have some other comments and a different angle. I hope my answer will be useful for you.
First, there is nothing wrong with using static classes and methods in a concurrent environment, as long as no state information is needed and no data is shared. In your case - filling a DropDownList - it is perfectly fine, because you only need a list of strings, and once that's done you can forget all about how you got it. There is also no interference between concurrent calls to a static method if they do not access any static fields. Static methods are common across the .NET Framework, and they are thread-safe under that condition.
In my example below I do use one static field - the log4net logger. It is still thread-safe because it does not carry any state and is merely a jump point into the log4net library, which is itself thread-safe. I do recommend at least looking at log4net - a great logging lib.
It could only be unsafe if you tried to fill the same drop-down list from two threads, but then it would also be unsafe if this class were not static. Make sure drop-downs are filled from one (main) thread.
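A contrived sketch of the distinction (names are made up): the first method is safe to call from many threads, the second is not, because it mutates a shared static field:

public static class Example
{
    // Safe under concurrency: no shared state, everything is per-call.
    public static int Square(int x)
    {
        return x * x;
    }

    // NOT safe: concurrent callers race on the shared static field.
    private static int lastResult;
    public static int SquareAndRemember(int x)
    {
        lastResult = x * x; // shared mutable state is the hazard
        return lastResult;
    }
}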
Back to your code. Mixing UI and data retrieval is not a good practice, as it makes the code much less maintainable and less stable. Separate those two. The Dapper library might be a good way to simplify things. I have not used it myself, so all I can tell is that it looks very handy and efficient. If you want/need to learn how the stuff works, though, don't use it. At least not at first.
Having a non-parameterized query in one string is potentially prone to SQL injection attacks, but if that query is not constructed from any direct user input, it should be safe. Of course you can always adopt parameterization to be sure.
Handling the exception using
CustomErrorHandler.GetScript(HttpContext.Current.Response, ex.Message.ToString());
feels flaky and too complex for this place, and may itself result in another exception. An exception thrown while handling another exception means panic. I would move that code outside. If you need something here, let it be a simple log4net error log and a re-throw of the exception.
If you only do one DB read there is no need for an explicit transaction. As for the connection object, it should not be static in any situation; create it on demand. There is no performance penalty in that, because .NET keeps a pool of ready-to-use connections and recycles those that were 'disposed'.
I believe that an example is always better than just explanations so here is how I would re-arrange your code.
public static class reuse
{
public static readonly log4net.ILog log = log4net.LogManager.GetLogger("GeneralLog");
public static void FillDropDownList(string query, SqlParameter[] parms, DropDownList dropDown)
{
dropDown.Items.Clear();
dropDown.DataSource = GetData(query, parms);
dropDown.DataBind();
}
private static IEnumerable<string> GetData(string query, SqlParameter[] parms)
{
using (SqlConnection con = new SqlConnection(GetConnString()))
{
try
{
List<string> result = new List<string>();
SqlCommand cmd = new SqlCommand(query, con);
cmd.Parameters.AddRange(parms);
con.Open(); // open before ExecuteReader
SqlDataReader dr = cmd.ExecuteReader();
if (dr.VisibleFieldCount > 0)
{
while (dr.Read())
result.Add(dr[0].ToString());
}
dr.Close();
return result;
}
catch (Exception ex)
{
log.Error("Exception in GetData()", ex);
throw;
}
}
}
private static string GetConnString()
{
return ConfigurationManager.ConnectionStrings["MyDbConnection"].ConnectionString;
}
}
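Hypothetical usage of the class above (table, column, and parameter names are made up):

reuse.FillDropDownList(
    "SELECT CategoryName FROM Categories WHERE IsActive = @active",
    new[] { new SqlParameter("@active", true) },
    categoryDropDown);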
When .NET first came out, I was one of many who complained about .NET's lack of deterministic finalization (class destructors being called on an unpredictable schedule). The compromise Microsoft came up with at the time was the using statement.
Although not perfect, I think using "using" is important to ensuring that unmanaged resources are cleaned up in a timely manner.
However, I'm writing some ADO.NET code and noticed that almost every class implements IDisposable. That leads to code that looks like this.
using (SqlConnection connection = new SqlConnection(connectionString))
using (SqlCommand command = new SqlCommand(query, connection))
using (SqlDataAdapter adapter = new SqlDataAdapter(command))
using (SqlCommandBuilder builder = new SqlCommandBuilder(adapter))
using (DataSet dataset = new DataSet())
{
command.Parameters.AddWithValue("#FirstValue", 2);
command.Parameters.AddWithValue("#SecondValue", 3);
adapter.Fill(dataset);
DataTable table = dataset.Tables[0];
foreach (DataRow row in table.Rows) // search whole table
{
if ((int)row["Id"] == 4)
{
row["Value2"] = 12345;
}
else if ((int)row["Id"] == 5)
{
row.Delete();
}
}
adapter.Update(table);
}
I strongly suspect I do not need all of those using statements. But without understanding the code for each class in some detail, it's hard to be certain which ones I can leave out. The result is kind of ugly and detracts from the main logic of my code.
Does anyone know why all these classes need to implement IDisposable? (Microsoft has many code examples online that don't worry about disposing many of these objects.) Are other developers writing using statements for all of them? And, if not, how do you decide which ones can do without?
One part of the problem here is that ADO.NET is an abstract provider model. We don't know what a specific implementation (a specific ADO.NET provider) will need with regards to disposal. Sure, we can reasonably assume the connection and transaction need to be disposed, but the command? Maybe. Reader? Probably, not least because one of the CommandBehavior flags allows you to tie the connection's lifetime to the reader (so the connection closes when the reader does, which should logically extend to disposal).
So overall, I think it is probably fine.
Most of the time, people aren't messing with ADO.NET by hand, and any ORM tools (or micro-ORM tools, such as "Dapper") will get this right for you without you having to worry about it.
I will openly confess that on the few occasions when I've used DataTable (seriously, it is 2018 - that shouldn't be your default model for representing data, except for some niche scenarios): I haven't disposed them. That one kinda makes no sense :)
If a class implements IDisposable, then you should always make sure that it gets disposed, without making any assumptions about its implementation.
I think that's the minimal amount of code that you could possibly write that ensures that your objects will be disposed. I see absolutely no problem with it, and it does not distract me at all from the main logic of my code.
If reducing this code is of paramount importance to you, then you can come up with your own replacement (wrapper) of SqlConnection whose child classes do not implement IDisposable and instead get automatically destroyed by the connection once the connection is closed. However, that would be a huge amount of work, and you would lose some precision with regards to when something gets disposed.
Yes, almost everything in ADO.NET implements IDisposable, whether disposal is actually needed - as with, say, SqlConnection (because of connection pooling) - or not - as with, say, DataTable (see "Should I Dispose() DataSet and DataTable?").
The problem is, as noted in the question:
without understanding the code for each class in some detail, it's hard to be certain which ones I can leave out.
I think that this alone is a good enough reason to keep everything inside a using statement - in one word: Encapsulation.
In many words:
It shouldn't be necessary to get intimately familiar, or even remotely familiar for that matter, with the implementation of every class you are working with. You only need to know the surface area - meaning public methods, properties, events, indexers (and fields, if the class has public fields). From the point of view of a class's user, anything other than its public surface area is an implementation detail.
Regarding all the using statements in your code - you can write them only once by creating a method that accepts an SQL statement, an Action<DataTable>, and a params array of parameters. Something like this:
void DoStuffWithDataTable(string query, Action<DataTable> action, params SqlParameter[] parameters)
{
using (SqlConnection connection = new SqlConnection(connectionString))
using (SqlCommand command = new SqlCommand(query, connection))
using (SqlDataAdapter adapter = new SqlDataAdapter(command))
using (SqlCommandBuilder builder = new SqlCommandBuilder(adapter))
using (var table = new DataTable())
{
foreach(var param in parameters)
{
command.Parameters.Add(param);
}
// SqlDataAdapter has a fill overload that only needs a data table
adapter.Fill(table);
action(table); // let the caller work on the filled table
adapter.Update(table);
}
}
And you use it like that, for all the actions you need to do with your data table:
DoStuffWithDataTable(
"Select...",
table =>
{ // of course, that doesn't have to be a lambda expression here...
foreach (DataRow row in table.Rows) // search whole table
{
if ((int)row["Id"] == 4)
{
row["Value2"] = 12345;
}
else if ((int)row["Id"] == 5)
{
row.Delete();
}
}
},
new SqlParameter[]
{
new SqlParameter("#FirstValue", 2),
new SqlParameter("#SecondValue", 3)
}
);
This way your code is "safe" with regard to disposing any IDisposable, while you have written the plumbing code for it only once.
Let's say we have a method with a code like this:
Dictionary<int, int> results = new Dictionary<int, int>();
try
{
using (SqlConnection sqlConn = new SqlConnection("some connection string"))
{
SqlCommand sqlCmd = new SqlCommand("stored procedure's name here", sqlConn);
sqlCmd.CommandType = CommandType.StoredProcedure;
//sqlCmd.Parameters.Add lines here
sqlConn.Open();
using (SqlDataReader sqlDR = sqlCmd.ExecuteReader())
{
while (sqlDR.Read())
{
results.Add((int)sqlDR["keyColumnName"], (int)sqlDR["valueColumnName"]);
}
}
}
}
catch { }
return results;
The stored procedure is a select statement with a subselect, both from the same single table, returning multiple rows, no more than several hundred, usually less. Assuming that SP groups results by the key column (so no duplicate key problem in dictionary) and returns ints (so no problem with conversion), is it possible to have it return only partial results if any other error occurs?
I'm well aware it's an empty catch block there - if it wasn't empty, I probably wouldn't be asking this question. I also know this code can return an empty dictionary. I was wondering if an exception can break reading from SqlDataReader so that results are neither empty nor complete.
I was also told that switching from SqlDataReader.Read to loading query results at once with DataTable.Load(DataReader) and then filling results from DataTable outside of both using statements would avoid getting partial results (that is, if they are possible at all in the code above). Would it though? Does DataTable.Load really work differently from SqlDataReader.Read?
Yes, it is possible. Results are 'created' as the query executes and are sent back to the client as they are produced. The reader reads these results as they come and adds them to the Dictionary. When an error occurs in the engine - if it occurs - execution on the server side is aborted, the error information is sent back, and the SqlClient reacts by raising the exception. Read Understanding How SQL Server Executes a Query for more details.
So in your code it is absolutely possible to silently return a result that is not empty but also not complete. Aside from the empty catch block, this problem is just an example of the general anti-pattern of writing code that is not exception safe, in the sense that in the case of exceptions it leaves the application in a partially changed state that will only trigger more errors later in execution. There is an excellent book on the subject, Exceptional C++; even though it is C++, the principles apply.
The usual workaround is to mutate a temporary state and then swap the current state with the desired state in an operation that cannot raise exceptions. In your case that means reading into a temporary dictionary and then assigning it to the result only at the end, after the result set has been read entirely:
Dictionary<int, int> results = new Dictionary<int, int>();
try
{
Dictionary<int, int> temp = new Dictionary<int, int>();
using (SqlConnection sqlConn = new SqlConnection("some connection string"))
{
    SqlCommand sqlCmd = new SqlCommand("stored procedure's name here", sqlConn);
    sqlCmd.CommandType = CommandType.StoredProcedure;
    //sqlCmd.Parameters.Add lines here
    sqlConn.Open();
    using (SqlDataReader sqlDR = sqlCmd.ExecuteReader())
{
while (sqlDR.Read())
{
temp.Add((int)sqlDR["keyColumnName"], (int)sqlDR["valueColumnName"]);
}
}
}
results = temp;
}
catch { }
return results;
Another approach is to compensate the actions in the catch block. In your case it would mean clearing the results. But I much disfavor that approach because it requires keeping the state mutation actions in sync with the compensating actions and over time, if they drift apart, some actions are no longer compensated (undone).
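For contrast, a sketch of that compensating approach (shown only to illustrate why it is fragile; the swap approach above is preferable):

try
{
    // ... read into results directly, as in the original code ...
}
catch
{
    results.Clear(); // compensating action: undo the partial mutation
    // (and ideally log or rethrow rather than swallowing the exception)
}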
The diligent reader will notice that the two approaches are the code equivalent of the two approaches in database theory for achieving atomicity: shadow-paging vs. rollback in write-ahead logging. This is no coincidence, since what you're trying to achieve is atomicity (either all state changes occur or none occurs).
The data-reader API is a streaming API, so yes: if there is a connection issue, it could happen at any time - including in the middle of a result grid. There would be, however, absolutely no advantage in using a DataTable here, as that would fail identically to your existing code in this scenario - if it wasn't for your catch{}. It is only the catch{} that is causing the "so that results are neither empty nor complete" issue: without the catch{}, you would get notified (by an exception) about the problem.
I know that creating a custom data access layer is not a very good idea unless you: 1) Know exactly what you're doing, and/or 2) Have a very specific need. However, I am maintaining some legacy code that uses a custom data access layer where each method looks something like this:
using (SqlConnection cn = new SqlConnection(connectionString))
{
using (SqlDataAdapter da = new SqlDataAdapter("sp_select_details", cn))
{
using (DataSet ds = new DataSet())
{
da.SelectCommand.Parameters.Add("#blind", SqlDbType.Bit).Value = blind;
da.SelectCommand.CommandType = CommandType.StoredProcedure;
da.SelectCommand.CommandTimeout = CommandTimeout;
da.Fill(ds, "sp_select_details");
return ds;
}
}
}
Consequently, the usage looks something like this:
protected void Page_Load(object sender, EventArgs e) {
using (Data da = new Data ("SQL Server connection string")) {
DataSet ds = da.sp_select_blind_options(Session.SessionID); //opens a connection
Boolean result = da.sp_select_login_exists("someone");//opens another connection
}
}
I am thinking that using Microsoft's Enterprise Library would save me from setting up and tearing down the connection to SQL Server on every method call. Am I correct in this thinking?
I've used Enterprise Library in the past very successfully, and Enterprise Library would hide some of the messy details from you, but essentially it would be using the same code internally as that demonstrated in your example.
As @tigran says, I wouldn't recommend trying to change an existing codebase unless there are fundamental issues with it.
Yes, it will definitely save you time, but you will pay in terms of performance and flexibility.
So creating a custom data layer is also a very good idea, to gain performance and flexibility.
Considering that you're talking about legacy code that, I suppose, works, I wouldn't change it to something modern (but less performant) only to have something fresh in my code.
A solid, workable data layer is a better choice than any new technology you could introduce into legacy code.
In short, change it only if you have really serious reasons to do so. I understand your willingness to change the stuff, because it's always hard to understand code written by someone else, but believe me, very often not changing old legacy code is the best choice for the project.
Good luck.
Yep, connection pooling will be on by default. The application domain basically maintains a list of connections; when you issue a call to create a connection, it returns an unused one from the pool if one exists, or creates one if not.
So when your connection cn goes out of scope in the using statement and gets disposed, what actually happens is that it goes back into the pool, ready for the next request, and hangs around in there based on various optimisation parameters.
Google ADO.NET connection pooling for more details; there's a lot written about it.
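Pooling is tuned through the connection string itself; these SqlClient keywords are the usual knobs (the values here are illustrative):

Server=myServer;Database=myDb;Integrated Security=true;
Pooling=true;Min Pool Size=5;Max Pool Size=100;Connection Lifetime=300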
I'm using the MySQL .Net libraries in an older C# application I'm rewriting. The Data Access Layer is rather obsolete but I'm trying to make the best of it. But now I ran into some really nasty threading issues.
I have a series of about 20 Select statements which are used to process a report. They take about 5 seconds to complete and I'm displaying a progress bar while the Select statements run. I'm launching the operations via a simple ThreadPool call:
[LATER EDIT: What happens is that I called the method below twice due to a bug in my UI - this doesn't devalue the question though, merely explains why my threads were racing against each other.]
ThreadPool.QueueUserWorkItem(new WaitCallback(UpdateChart));
Sometimes it works.
Sometimes it crashes with a "possible IO stream race condition".
Sometimes it crashes with "connection should be open and valid".
Sometimes it crashes with "object reference not set...".
All classes in my DAL are Static because I thought this is a good way of improving performance (not having to create new class instances for every little operation).
And all my DAL classes use the same "root" DAL class which builds Connections:
public static class MySQLConnectionBuilder
{
private static MySqlConnectionStringBuilder ConnectionStringBuilder = new MySqlConnectionStringBuilder();
//I'm initializing the ConnectionStringBuilder with my server password & address.
public static MySqlConnection GetConnection ()
{
return new MySqlConnection(ConnectionStringBuilder.ConnectionString);
}
}
All my DAL classes have functions which are similar to the crashing function. The crashing function looks like this:
public static STDS.UserPresence.user_presenceDataTable GetPresence (int aUserID, DateTime aStart, DateTime aEnd)
{
// ta and ds are shared static fields (the table adapter and its result table)
ta.Connection = MySQLConnectionBuilder.GetConnection();
ds = ta.GetPresenceForUserBetweenDates(aUserID, aStart, aEnd);
ta.Connection.Close();
return ds;
}
Ideas? Tips on improvement? Will the threading issue go away if I switch to a more object-oriented (instance-driven) DAL?
The line
ta.Connection.Close()
closes the connection last assigned to ta.Connection - not necessarily the connection created in the same thread. This may close a connection on which a query is currently running in another thread.
If you want to quickly determine if this is what's happening, mark the connection variable with a [ThreadStatic] attribute in the class ta points to:
[ThreadStatic]
private static MySqlConnection connection;
I wouldn't use that approach for your final solution though, as it may cause the GC not to collect them.
A simple solution (for that problem, I can't determine if your classes have other multithreading issues) is to add the connection as a parameter to each of your DAL methods, allowing you to remove the class global Connection:
public static STDS.UserPresence.user_presenceDataTable GetPresence (int aUserID, DateTime aStart, DateTime aEnd)
{
using (MySqlConnection connection = MySQLConnectionBuilder.GetConnection())
{
var ds = ta.GetPresenceForUserBetweenDates(connection, aUserID, aStart, aEnd); // local now, not shared
return ds;
}
}
Threading issues never simply go away - they require attention. If you are unsure about what's happening, forget about the slight performance boost (if a query takes 5 seconds, any possible performance gain of using static classes would be below 1% anyway).
Background: I've got a bunch of strings that I'm getting from a database, and I want to return them. Traditionally, it would be something like this:
public List<string> GetStuff(string connectionString)
{
List<string> categoryList = new List<string>();
using (SqlConnection sqlConnection = new SqlConnection(connectionString))
{
string commandText = "GetStuff";
using (SqlCommand sqlCommand = new SqlCommand(commandText, sqlConnection))
{
sqlCommand.CommandType = CommandType.StoredProcedure;
sqlConnection.Open();
using (SqlDataReader sqlDataReader = sqlCommand.ExecuteReader())
{
    while (sqlDataReader.Read())
    {
        categoryList.Add(sqlDataReader["myImportantColumn"].ToString());
    }
}
}
}
return categoryList;
}
But then I figure the consumer is going to want to iterate through the items and doesn't care about much else, and I'd like to not box myself into a List, per se, so if I return an IEnumerable everything is good/flexible. So I was thinking I could use a "yield return" type design to handle this...something like this:
public IEnumerable<string> GetStuff(string connectionString)
{
using (SqlConnection sqlConnection = new SqlConnection(connectionString))
{
string commandText = "GetStuff";
using (SqlCommand sqlCommand = new SqlCommand(commandText, sqlConnection))
{
sqlCommand.CommandType = CommandType.StoredProcedure;
sqlConnection.Open();
using (SqlDataReader sqlDataReader = sqlCommand.ExecuteReader())
{
    while (sqlDataReader.Read())
    {
        yield return sqlDataReader["myImportantColumn"].ToString();
    }
}
}
}
}
But now that I'm reading a bit more about yield (on sites like this... MSDN didn't seem to mention this), it's apparently a lazy evaluator that keeps the state of the populator around in anticipation of someone asking for the next value, and then runs only until it returns the next value.
This seems fine in most cases, but with a DB call, this sounds a bit dicey. As a somewhat contrived example: if someone asks for an IEnumerable that I'm populating from a DB call, gets through half of it, and then gets stuck in a loop... as far as I can see, my DB connection is going to stay open forever.
Sounds like asking for trouble in some cases if the iterator doesn't finish...am I missing something?
It's a balancing act: do you want to force all the data into memory immediately so you can free up the connection, or do you want to benefit from streaming the data, at the cost of tying up the connection for all that time?
The way I look at it, that decision should potentially be up to the caller, who knows more about what they want to do. If you write the code using an iterator block, the caller can very easily turn that streaming form into a fully-buffered form:
List<string> stuff = new List<string>(GetStuff(connectionString));
If, on the other hand, you do the buffering yourself, there's no way the caller can go back to a streaming model.
So I'd probably use the streaming model and say explicitly in the documentation what it does, and advise the caller to decide appropriately. You might even want to provide a helper method to basically call the streamed version and convert it into a list.
Of course, if you don't trust your callers to make the appropriate decision, and you have good reason to believe that they'll never really want to stream the data (e.g. it's never going to return much anyway) then go for the list approach. Either way, document it - it could very well affect how the return value is used.
Another option for dealing with large amounts of data is to use batches, of course - that's thinking somewhat away from the original question, but it's a different approach to consider in the situation where streaming would normally be attractive.
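As for the helper method mentioned above, it could be as small as this (the name is hypothetical; requires System.Linq):

public List<string> GetStuffList(string connectionString)
{
    // Buffers the streamed results so the connection closes before returning.
    return GetStuff(connectionString).ToList();
}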
You're not always unsafe with the IEnumerable. If you let the framework call GetEnumerator (which is what most people will do), then you're safe. Basically, you're as safe as the carefulness of the code using your method:
class Program
{
static void Main(string[] args)
{
// safe
var firstOnly = GetList().First();
// safe
foreach (var item in GetList())
{
if(item == "2")
break;
}
// safe
using (var enumerator = GetList().GetEnumerator())
{
for (int i = 0; i < 2; i++)
{
enumerator.MoveNext();
}
}
// unsafe
var enumerator2 = GetList().GetEnumerator();
for (int i = 0; i < 2; i++)
{
enumerator2.MoveNext();
}
}
static IEnumerable<string> GetList()
{
using (new Test())
{
yield return "1";
yield return "2";
yield return "3";
}
}
}
class Test : IDisposable
{
public void Dispose()
{
Console.WriteLine("dispose called");
}
}
Whether you can afford to leave the database connection open or not depends on your architecture as well. If the caller participates in a transaction (and your connection is auto-enlisted), the connection will be kept open by the framework anyway.
Another advantage of yield (when using a server-side cursor) is that your code doesn't have to read all the data (example: 1,000 items) from the database if your consumer wants to exit the loop earlier (example: after the 10th item). This can speed up querying data, especially in an Oracle environment, where server-side cursors are the common way to retrieve data.
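For instance, with the streaming GetStuff from the question, something like this reads only the rows it actually consumes (a sketch; Take comes from System.Linq):

// Pulls rows until 10 have been read, then disposes the enumerator,
// which runs the iterator's using blocks and closes the connection.
foreach (string item in GetStuff(connectionString).Take(10))
{
    Console.WriteLine(item);
}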
You are not missing anything. Your sample shows how NOT to use yield return. Add the items to a list, close the connection, and return the list. Your method signature can still return IEnumerable.
Edit: That said, Jon has a point (so surprised!): there are rare occasions where streaming is actually the best thing to do from a performance perspective. After all, if it's 100,000 (1,000,000? 10,000,000?) rows we're talking about here, you don't want to be loading that all into memory first.
As an aside - note that the IEnumerable<T> approach is essentially what the LINQ providers (LINQ-to-SQL, LINQ-to-Entities) do for a living. The approach has advantages, as Jon says. However, there are definite problems too - in particular (for me) in terms of the combination of separation and abstraction.
What I mean here is that:
in an MVC scenario (for example) you want your "get data" step to actually get data, so that you can test it works at the controller, not the view (without having to remember to call .ToList() etc)
you can't guarantee that another DAL implementation will be able to stream data (for example, a POX/WSE/SOAP call can't usually stream records); and you don't necessarily want to make the behaviour confusingly different (i.e. connection still open during iteration with one implementation, and closed for another)
This ties in a bit with my thoughts here: Pragmatic LINQ.
But I should stress - there are definitely times when the streaming is highly desirable. It isn't a simple "always vs never" thing...
A slightly more concise way to force evaluation of the iterator:
using System.Linq;
//...
var stuff = GetStuff(connectionString).ToList();
No, you are on the right path... the yield will hold the reader open... you can test it by making another database call while iterating the IEnumerable.
The only way this would cause problems is if the caller abuses the protocol of IEnumerable<T>. The correct way to use it is to call Dispose on it when it is no longer needed.
The implementation generated by yield return takes the Dispose call as a signal to execute any open finally blocks, which in your example will call Dispose on the objects you've created in the using statements.
There are a number of language features (in particular foreach) which make it very easy to use IEnumerable<T> correctly.
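Roughly speaking, foreach expands to the equivalent of the following, which is why the using blocks inside the iterator still run on early exit (a sketch of the compiler's expansion):

using (IEnumerator<string> e = GetStuff(connectionString).GetEnumerator())
{
    while (e.MoveNext())
    {
        string item = e.Current;
        // loop body here; leaving via break or return still reaches
        // the Dispose call implied by the using statement above
    }
}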
You could always use a separate thread to buffer the data (perhaps to a queue) while also doing a yield to return the data. When the user requests data (returned via a yield), an item is removed from the queue. Data is also being continuously added to the queue via the separate thread. That way, if the user requests the data fast enough, the queue is never very full and you do not have to worry about memory issues. If they don't, then the queue will fill up, which may not be so bad. If there is some sort of limitation you would like to impose on memory, you could enforce a maximum queue size (at which point the other thread would wait for items to be removed before adding more to the queue). Naturally, you will want to make sure you handle resources (i.e., the queue) correctly between the two threads.
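One possible shape of that idea, using today's BlockingCollection (a sketch; GetStuff is the streaming method from the question, and the capacity value is illustrative):

// Requires System.Collections.Concurrent and System.Threading.Tasks.
private static IEnumerable<string> GetStuffBuffered(string connectionString)
{
    var buffer = new BlockingCollection<string>(boundedCapacity: 100);
    Task.Run(() =>
    {
        try
        {
            foreach (string item in GetStuff(connectionString))
                buffer.Add(item); // blocks while the buffer is full
        }
        finally
        {
            buffer.CompleteAdding(); // lets the consuming loop finish
        }
    });
    return buffer.GetConsumingEnumerable(); // yields items as they arrive
}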
As an alternative, you could force the user to pass in a boolean to indicate whether or not the data should be buffered. If true, the data is buffered and the connection is closed as soon as possible. If false, the data is not buffered and the database connection stays open as long as the user needs it to be. Having a boolean parameter forces the user to make the choice, which ensures they know about the issue.
I've bumped into this wall a few times. SQL database queries are not easily streamable like files. Instead, query only as much as you think you'll need and return it as whatever container you want (IList<>, DataTable, etc.). IEnumerable won't help you here.
What you can do is use a SqlDataAdapter instead and fill a DataTable. Something like this:
public IEnumerable<string> GetStuff(string connectionString)
{
DataTable table = new DataTable();
using (SqlConnection sqlConnection = new SqlConnection(connectionString))
{
string commandText = "GetStuff";
using (SqlCommand sqlCommand = new SqlCommand(commandText, sqlConnection))
{
sqlCommand.CommandType = CommandType.StoredProcedure;
SqlDataAdapter dataAdapter = new SqlDataAdapter(sqlCommand);
dataAdapter.Fill(table);
}
}
foreach(DataRow row in table.Rows)
{
yield return row["myImportantColumn"].ToString();
}
}
This way, you're querying everything in one shot, and closing the connection immediately, yet you're still lazily iterating the result. Furthermore, the caller of this method can't cast the result to a List and do something they shouldn't be doing.
Don't use yield here. Your sample is fine.