C# Parallel loop error

Can anyone help? When I write this code and run it, the program shows me an error stating "There is already an open DataReader associated with this Command which must be closed first".
This is my code:
Parallel.For(0, MthRange, i =>
{
    PSUpfrontFeeForMonth[i] = CommissionSummary
        .Where(s => s.TransDate == oReportCommonFilter.fromDate.AddMonths(i))
        .Sum(s => s.PSUpfrontFee);

    if (!PSUpfrontFeeForMonth[i].HasValue)
    {
        PSUpfrontFeeForMonth[i] = 0;
    }
});
Thanks.
Regards,
Jane

Parallelizing a database query like this is completely wrong, for the following reasons:
The query is issued against SQL Server from each worker thread, so multiple data readers are opened on the same connection, which is exactly the error you are seeing.
No performance gain is achieved. In fact the program becomes slower, because each thread is trying to talk to the database while no parallel processing is actually done; all of the query processing happens inside SQL Server. The normal serial query is faster in this case.
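A minimal serial sketch of that, assuming the same CommissionSummary queryable, oReportCommonFilter, MthRange and PSUpfrontFeeForMonth from the question:
// Serial version of the original loop: one query (and one data reader) at a time.
// PSUpfrontFee is assumed nullable, as the HasValue check in the question suggests.
for (int i = 0; i < MthRange; i++)
{
    DateTime month = oReportCommonFilter.fromDate.AddMonths(i);

    PSUpfrontFeeForMonth[i] = CommissionSummary
        .Where(s => s.TransDate == month)
        .Sum(s => s.PSUpfrontFee) ?? 0;
}
If the per-month round trips are still too slow, a single query filtered on the whole date range and grouped by month in memory would reduce it to one reader, but that is a separate optimisation.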

If you really need to have multiple database connections open simultaneously (as others have stated, not necessarily a good idea) then you can usually specify in the connection string that this is needed.
A typical scenario in which I've used this is when using a DataReader to stream rows from a database table (because I don't know how many rows I need in advance) while I then need to make additional queries on other database tables. I do this rather than a single query because that would require multiple complex joins, and my app has a good caching layer to reduce queries to the database.
For Microsoft SQL Server you add MultipleActiveResultSets=True; to the connection string.
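For example (a sketch; the server and database names are placeholders, only the MultipleActiveResultSets flag is the relevant part):
// Requires: using System.Data.SqlClient;
// Hypothetical connection string - only MultipleActiveResultSets matters here.
string connectionString =
    "Data Source=MyServer;Initial Catalog=MyDatabase;" +
    "Integrated Security=True;MultipleActiveResultSets=True;";

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    // Two commands/readers can now be active on this one connection at the same time.
}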

My guess would be that whatever PSUpfrontFeeForMonth is populated from is internally using data readers.
Since I have no idea how that works, the first thing I would try is to initialise PSUpfrontFeeForMonth within the loop. Maybe that will ensure a dedicated data reader for each iteration.

Related

Concurrent inserts into database

I'm building a program that takes push data from six different sources and inserts the data into a database. Each source has its own function to execute the inserts as soon as they come, but all sources write to the same table.
I have the following questions:
If one source is currently writing to the table and another source begins to write at the same time is there any chance the inserts will block each other?
The table is also constantly being used to read the data via a view that joins some more tables to show the data; can this pose any problems?
Currently each source has its own DB connection to write data, would it be better to have only one connection, or have each use its own?
If one source is currently writing to the table and another source begins to write at the same time, is there any chance the inserts will block each other?
It depends on the indexes. If the index keys have the same or contiguous values, you may see short-term blocking for the duration of the transaction.
The table is also constantly being used to read the data via a view that joins some more tables to show the data; can this pose any problems?
It depends on the isolation level. No blocking will occur if:
the SELECT queries are running in READ_COMMITTED isolation level and the READ_COMMITTED_SNAPSHOT database option is turned on
the SELECT queries don't touch uncommitted data
the SELECT queries run in READ_UNCOMMITTED isolation level
Even if blocking does occur, it may be short-lived if the INSERT transactions are short.
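As an illustration of the last condition above, a reader can avoid taking the shared locks that would block the writers by switching its session to READ UNCOMMITTED before running the SELECT, at the cost of accepting dirty reads. A rough sketch; connectionString and YourView are placeholders:
// Requires: using System.Data.SqlClient;
// Sketch: read the view without blocking the concurrent inserts.
// Dirty reads are possible at this isolation level.
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();

    using (var setIsolation = new SqlCommand(
        "SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;", connection))
    {
        setIsolation.ExecuteNonQuery();
    }

    using (var select = new SqlCommand("SELECT * FROM YourView;", connection))
    using (var reader = select.ExecuteReader())
    {
        while (reader.Read())
        {
            // consume rows
        }
    }
}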
Currently each source has its own DB connection to write data, would it be better to have only one connection, or have each use its own?
Depends on the problem you are trying to solve. A single connection will ensure the inserts don't block/deadlock with each other, but that might not be an issue anyway.
Please find inline answers below.
If one source is currently writing to the table and another source begins to write at the same time is there any chance the inserts will block each other?
In this case the other source will wait for it (the second insert will be in a waiting state until the first completes).
The table is also constantly being used to read the data via a view that join some more tables to show the data, can this pose any problems?
No problem.
Currently each source has its own DB connection to write data, would it be better to have only one connection, or have each use its own?
It's better to have one DB connection.
Block "each other" i.e. dead-lock is not possible.
No problem. Only if select is too slow, it can delay next insert.
No problem with different connections.

Why is .ToList() time consuming in LINQ to Entities?

We have a website which uses LINQ to Entities, and we found that it has become very slow recently. After troubleshooting, I found that whenever we use LINQ to Entities to search for data in the database it consumes a lot of CPU time, for example in the ToList() function. I know it might be because we have lots of data in the database, which causes the slow response, but I just wonder if there are any other reasons which might cause this problem?
What should I do to optimize this kind of problem? The following are possible reasons:
ToList() might load every object's foreign objects (foreign keys); how can I force it to load only the object itself?
Is my connection pool too small?
Please let me know if there are any other possible reasons, and point me in the right direction to solve this issue.
In LINQ, a query returns the results of a sequence of manipulations of its sources when the query is enumerated.
IQueryable<Customer> myQuery = ...

foreach (Customer c in myQuery) // enumerating the query causes it to be executed
{
}

List<Customer> customers = myQuery.ToList();
// ToList will enumerate the query, and put the results in a list.
// Enumerating the query causes it to be executed.
An executing query requires a few things (in no particular order)
A database connection is drawn from the pool.
The query is interpreted by the query provider (in this case, the provider is LINQ to Entities and the interpretation is some form of SQL).
The interpreted form is transmitted to the database, where it does what it does and returns data objects.
Some method must be generated to translate the incoming data objects into the desired query output.
The database connection is returned to the pool.
The desired query output may have state tracking done to it before it is returned to your code.
Additionally, the database has a few steps, here listed from the point of view of querying SQL Server:
The query text is received and checked against the query plan cache for an existing plan.
If no plan exists, a new one is created and stuck into the plan cache by the query optimizer.
The query plan is executed - IO/locks/CPU/Memory - any of these may be bottlenecks
Query results are returned - network may be a bottleneck, particularly if the resultset is large.
So - to find out where the problem with your query is, you need to start measuring. I'll order these targets in the order I'd check them. This is not a complete list.
Get the translated SQL text of the query (a sketch of this follows the list). You can use SQL Server Profiler for this. You can use the debugger. There are many ways to go about it. Make sure the query text returns what you require for your objects, no more, no less. Make sure the tables queried match your expectations. Run the query a couple of times.
Look at the result set. Is it reasonable or are we looking at 500 Gigs of results? Was a whole table queried when the whole thing wasn't needed? Was a Cartesian result generated unexpectedly?
Get the execution plan of the query (in Management Studio, click the show estimated execution plan button). Does the query use the indexes you expect it to? Does the plan look weird (possibly a bad plan came from the cache)? Does the query work on tables in the order you expect it to, and perform nested/merge/hash joins in the way you expect? Is there parallelization kicking in when the query doesn't deserve it (this is a sign of bad indexes/TONS of IO)?
Measure the IO of the query (in SQL Server, issue SET STATISTICS IO ON). Examine the logical IO per table. Which table stands out? Again, look for a wrong order of table access or an index that can support the query.
If you've made it this far, you've likely found and fixed the problem. I'll keep going though, in case you haven't.
Compare the execution time of the query to the execution time of the enumeration. If there's a large difference, it may be that the code which translates the incoming data objects is slow, or that generating that code was slow. It could also be that the translation of the query took a while. These are tricky problems to solve (in LinqToSql we use compiled queries to sort them out).
Measure Memory and CPU for the machine the code is running on. If you are capped there, use a code profiler or memory profiler to identify and resolve the issue.
Look at the network stats on the machine, in particular you may want to use TCPView to see the TCP socket connections on the machine. Socket resources may be mis-used (such as opening and closing thousands in a minute).
Examine the database for locks held by other connections.
I guess that's enough. Hope I didn't forget any obvious things to check.
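As a small illustration of the first measuring step above, with LINQ to Entities the translated SQL can usually be read straight off the query object. A sketch, assuming a DbContext-based model; context, Customers and Customer are placeholders for your own context and entity types:
// Sketch: inspect the SQL that LINQ to Entities generates for a query.
IQueryable<Customer> query = context.Customers
    .Where(c => c.Country == "UK");

// For DbContext (EF 4.1+), ToString() on the query returns the store command text.
string sql = query.ToString();
Console.WriteLine(sql);

// For the older ObjectContext API, the equivalent is ObjectQuery<T>.ToTraceString().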
You might find the solution to your problem in Performance Considerations (Entity Framework) on MSDN. In particular:
Return the correct amount of data
In some scenarios, specifying a query path using the Include method is much faster because it requires fewer round trips to the database. However, in other scenarios, additional round trips to the database to load related objects may be faster because the simpler queries with fewer joins result in less redundancy of data. Because of this, we recommend that you test the performance of various ways to retrieve related objects. For more information, see Loading Related Objects.
To avoid returning too much data in a single query, consider paging the results of the query into more manageable groups. For more information, see How to: Page Through Query Results.
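A minimal sketch of the paging suggestion, assuming a DbContext with a Customers set (the names and page size are placeholders); Skip/Take keeps each round trip small instead of materialising the whole table with ToList():
// Page through query results rather than calling ToList() on the whole set.
const int pageSize = 50;
int pageIndex = 0;                       // zero-based page number requested by the caller

List<Customer> page = context.Customers
    .OrderBy(c => c.Id)                  // paging requires a stable ordering
    .Skip(pageIndex * pageSize)
    .Take(pageSize)
    .ToList();                           // executes one bounded query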

Should I open and close db for each query?

I am using old school ADO.NET with C#, so there is a lot of this kind of code. Is it better to make one function per query and open and close the DB each time, or to run multiple queries with the same connection object? Below is just one query, for example purposes only.
using (SqlConnection connection = new SqlConnection(ConfigurationManager.ConnectionStrings["DBConnectMain"].ConnectionString))
{
    // Add user to database, so they can't vote multiple times
    string sql = "insert into PollRespondents (PollId, MemberId) values (@PollId, @MemberId)";
    SqlCommand sqlCmd = new SqlCommand(sql, connection);
    sqlCmd.Parameters.Add("@PollId", SqlDbType.Int);
    sqlCmd.Parameters["@PollId"].Value = PollId;
    sqlCmd.Parameters.Add("@MemberId", SqlDbType.Int);
    sqlCmd.Parameters["@MemberId"].Value = Session["MemberId"];
    try
    {
        connection.Open();
        Int32 rowsAffected = (int)sqlCmd.ExecuteNonQuery();
    }
    catch (Exception ex)
    {
        //Console.WriteLine(ex.Message);
    }
}
Well, you could measure; but as long as you are using the connections (so they are disposed even if you get an exception), and have pooling enabled (for SQL server it is enabled by default) it won't matter hugely; closing (or disposing) just returns the underlying connection to the pool. Both approaches work. Sorry, that doesn't help much ;p
Just don't keep an open connection while you do other lengthy non-db work. Close it and re-open it; you may actually get the same underlying connection back, but somebody else (another thread) might have made use of it while you weren't.
For most cases, opening and closing a connection per query is the way to go (as Chris Lively pointed out). However, there are some cases where you'll run into performance bottlenecks with this approach.
For example, when dealing with very large volumes of relatively quick to execute queries that are dependent on previous results, I might suggest executing multiple queries in a single connection. You might encounter this when doing batch processing of data, or data massaging for reporting purposes.
Always be sure to use the 'using' wrapper to avoid mem leaks though, regardless of which pattern you follow.
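For that batch-processing case, a rough sketch of several dependent commands sharing one open connection, still wrapped in a using block so the connection always goes back to the pool (connectionString and the table and column names are placeholders):
// Requires: using System.Data.SqlClient;
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();

    int batchId;
    using (var createBatch = new SqlCommand(
        "INSERT INTO Batches (StartedAt) OUTPUT INSERTED.Id VALUES (GETUTCDATE());",
        connection))
    {
        batchId = (int)createBatch.ExecuteScalar();   // assumes Id is an INT IDENTITY
    }

    using (var addItem = new SqlCommand(
        "INSERT INTO BatchItems (BatchId, Payload) VALUES (@BatchId, @Payload);",
        connection))
    {
        addItem.Parameters.AddWithValue("@BatchId", batchId);
        addItem.Parameters.AddWithValue("@Payload", "example");
        addItem.ExecuteNonQuery();
    }
}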
If the methods are structured such that a single command is executed within a single method, then Yes: instantiate and dispose of the connection for each command.
If the methods are structured such that you have multiple commands executed in the same block of code, then the outer block needs to be the using clause for the connection.
ADO is very good about connection pooling so instantiating and disposing of the command object is going to be extremely fast and really won't impact performance.
As an example, we have a few pages that will execute up to 50 queries in order to compose the page. Because there is branching code to determine the queries to run, we have each of them wrapped with their own using (connection...) clauses.
We once ripped those out and grabbed one connection object and passed it to the individual methods. This had exactly zero performance improvement while complicating the hell out of the code with all the exception clauses everywhere to ensure the connection was properly disposed at the end. At the end of the test, we rolled back the code to how it was before. Much cleaner to know exactly what was going on and when a connection was being used.
Well, as always, it depends. If you have 5 database call to make within the same method call, you should probably use a single connection.
However, holding onto connection while nothing is happening isn't usually advised from a scalability standpoint.
ADO.NET is old school now? Wow, you just made me feel old. To me Rogue Wave ODBC using Borland C++ on Windows 3.1 is old school.
To answer: in general you want to understand how your data drivers work. Understand concepts such as connection pooling and learn to profile the transaction costs associated with connecting/disconnecting and executing queries. Then take that knowledge and apply it to your situation.
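One simple way to get a feel for those costs is to time the open/execute cycle directly. A rough sketch with Stopwatch; connectionString and the query are placeholders:
// Requires: using System.Data.SqlClient; using System.Diagnostics;
// Sketch: measure how much time goes to opening the (pooled) connection
// versus executing the query.
var openTimer = Stopwatch.StartNew();
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    openTimer.Stop();

    var queryTimer = Stopwatch.StartNew();
    using (var command = new SqlCommand("SELECT COUNT(*) FROM PollRespondents;", connection))
    {
        command.ExecuteScalar();
    }
    queryTimer.Stop();

    Console.WriteLine("Open: {0} ms, Query: {1} ms",
        openTimer.ElapsedMilliseconds, queryTimer.ElapsedMilliseconds);
}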

Why does a database query only go slow in the application?

I have a webpage that takes 10 minutes to run one query against a database, but the same query returns in less than a second when run from SQL Server Management Studio.
The webpage is just firing SQL at the database that is executing a stored procedure, which in turn is performing a pretty simple select over four tables. Again the code is basic ADO, setting the CommandText on an SqlCommand and then performing an ExecuteReader to get the data.
The webpage normally works quickly, but when it slows down the only way to get it sped up again is to defragment the indexes on the tables being queried (different ones at different times), which doesn't seem to make sense when the same query executes so quickly manually.
I have had a look at this question but it doesn't apply as the webpage is literally just firing text at the database.
Does anyone have any good ideas why this is going slow one way and not the other?
Thanks
I would suspect parameter sniffing.
The cached execution plan used for your application's connection probably won't be usable by your SSMS connection due to different SET options, so SSMS will generate a new, different plan.
You can retrieve the cached plans for the stored procedure by using the query below. Then compare to see if they are different (e.g. is the slow one doing index seeks and bookmark lookups at a place where the other one does a scan?)
USE YourDatabase;
SELECT *
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle)
CROSS APPLY sys.dm_exec_query_plan(plan_handle)
CROSS APPLY sys.dm_exec_plan_attributes(plan_handle) AS epa
WHERE sys.dm_exec_sql_text.objectid = OBJECT_ID('YourProcName')
  AND attribute = 'set_options'
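One SET option that commonly differs is ARITHABORT: SSMS turns it on by default while ADO.NET connections leave it off, so the two sessions end up with different cached plans. As a hedged diagnostic (not a fix for the underlying parameter-sniffing problem), you can issue the option on the application's connection before calling the procedure and see whether the timings converge; connectionString, the procedure name and the parameter below are placeholders:
// Requires: using System.Data; using System.Data.SqlClient;
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();

    // Make this session's SET options match SSMS for comparison purposes.
    using (var setOption = new SqlCommand("SET ARITHABORT ON;", connection))
    {
        setOption.ExecuteNonQuery();
    }

    using (var command = new SqlCommand("dbo.YourProcName", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        command.Parameters.AddWithValue("@SomeParam", 42);   // placeholder parameter and value

        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                // consume rows
            }
        }
    }
}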
Is there any difference between the command text of the query in the app and the query you are executing manually? Since you said that reindexing helps performance (which also updates statistics), it sounds like it may be getting stuck on a bad execution plan.
You might want to run a sql trace and capture the showplanxml event to see what the execution plan looks like, and also capture sql statement complete (though this can slow the server down if a lot of statements are coming through the system so be careful) to be sure the statement sent to SQL server is the same one you are running manually.

Insert data into SQL Server with best performance

I have an application which intensively uses DB (SQL Server).
As it must have high performance, I would like to know the fastest way to insert records into the DB. Fastest from the standpoint of execution time.
What should I use ?
As far as I know, the fastest way is to create a stored procedure and call it from code (ADO.NET).
Please let me know if there is any better way, or maybe there are some other practices to increase performance.
A bulk insert would be the fastest since it is minimally logged; perhaps you can use the SqlBulkCopy class.
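A minimal SqlBulkCopy sketch, assuming the rows have already been collected into a DataTable whose columns match the destination table (connectionString, the table, columns and batch size are placeholders):
// Requires: using System.Data; using System.Data.SqlClient;
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Payload", typeof(string));
table.Rows.Add(1, "first");
table.Rows.Add(2, "second");

using (var bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "dbo.Records";
    bulkCopy.BatchSize = 5000;          // tune for your workload
    bulkCopy.WriteToServer(table);
}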
"It depends".
How many rows are you talking about inserting?
How frequently will they be inserted?
What other database operations will be taking place at the same time?
Will the rows be inserted because of user action (clicking a button), or because of some external stimulus?
Based on your update, I think you should consider mechanisms other than simple code. Look into SQL Server Integration Services, which are optimized for bulk database operations. It's possible that what you need is a simple SSIS job that runs periodically to do a bulk insert on all "new" data meeting particular criteria. It would allow modification over time to use things like staging tables or intermediate servers if that should prove necessary.
Please let me know if there is any better way, or maybe there are some other practices to increase performance.
Do not open one connection per record. Do learn how connection pooling generally stops you from inadvertently opening one connection per record.
If possible, do not open one transaction per record. Also do not leave the transaction open for undue periods of time.
Consider table design: narrow tables with few indexes/constraints and no triggers.
If you need a fast insert because you're a web application and need to return a page to the user NOW, or you're a WinForms app and are blocking on the UI thread, consider performing the insert asynchronously or on another thread (see the sketch after this list).
If you need a fast insert to import a million line file, consider doing a bulk insert.
If all you want to do is store the data, and not to query it... consider using a file-based solution instead.
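A minimal sketch of the asynchronous-insert suggestion above, using ExecuteNonQueryAsync (available since .NET 4.5); the table, column and parameter names are placeholders:
// Requires: using System.Data.SqlClient; using System.Threading.Tasks;
public static async Task InsertRecordAsync(string connectionString, string payload)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "INSERT INTO Records (Payload) VALUES (@Payload);", connection))
    {
        command.Parameters.AddWithValue("@Payload", payload);

        await connection.OpenAsync();
        await command.ExecuteNonQueryAsync();   // the caller's thread is not blocked
    }
}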
Have you done the math? 2M/day = 83k/hour = 1388/min = 23/second.
At 23 inserts per second SQL Server won't break a sweat.
