I am working on an existing application. The application reads data from a huge file and then, after doing some calculations, stores the data in another table.
But the loop doing this (see below) is taking a really long time. Since the file sometimes contains thousands of records, the entire process takes days.
Can I replace this foreach loop with something else? I tried using Parallel.ForEach and it did help. I am new to this, so I would appreciate your help.
foreach (record someRecord in someReport.r)
{
    try
    {
        using (var command = new SqlCommand("[procname]", sqlConn))
        {
            command.CommandTimeout = 0;
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.Add(…);
            IAsyncResult result = command.BeginExecuteReader();
            while (!result.IsCompleted)
            {
                System.Threading.Thread.Sleep(10);
            }
            command.EndExecuteReader(result);
        }
    }
    catch (Exception e)
    {
        …
    }
}
After reviewing the answers, I removed the async calls and edited the code as shown below, but this did not improve performance.
using (var command = new SqlCommand("[sp]", sqlConn))
{
    command.CommandTimeout = 0;
    command.CommandType = CommandType.StoredProcedure;
    foreach (record someRecord in someReport.r)
    {
        command.Parameters.Clear();
        command.Parameters.Add(....);
        command.Prepare();
        using (var dr = command.ExecuteReader())
        {
            while (dr.Read())
            {
                if ()
                {
                }
                else if ()
                {
                }
            }
        }
    }
}
Instead of opening the SQL connection in a loop so many times, have you considered extracting the whole set of data from SQL Server and processing it in memory via a dataset?
Edit: I decided to further explain what I meant.
You can do the following; pseudocode as follows:
Use a SELECT * to get all the information from the database and store it in a list of your class, or in a dictionary.
Do your foreach (record someRecord in someReport) and do the condition matching as usual, as sketched below.
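A minimal sketch of that idea (the table, column, and SomeRow class names here are assumptions for illustration, not from the original code):
// Pull everything in one round trip, then match in memory instead of
// executing a stored procedure per record.
var lookup = new Dictionary<int, SomeRow>();
using (var command = new SqlCommand("SELECT Id, Name, Amount FROM SomeTable", sqlConn))
using (var reader = command.ExecuteReader())
{
    while (reader.Read())
    {
        var row = new SomeRow
        {
            Id = reader.GetInt32(0),
            Name = reader.GetString(1),
            Amount = reader.GetDecimal(2)
        };
        lookup[row.Id] = row;
    }
}

foreach (record someRecord in someReport.r)
{
    SomeRow match;
    if (lookup.TryGetValue(someRecord.Id, out match))
    {
        // do the condition matching as usual, entirely in memory
    }
}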
Step 1: Ditch the attempt at async. It isn't implemented properly and you're blocking anyway. So just execute the procedure synchronously and see if that helps.
Step 2: Move the SqlCommand outside of the loop and reuse it for each iteration. That way you don't incur the cost of creating and destroying it for every item in your loop.
Warning: Make sure you reset/clear/remove parameters you don't need from the previous iteration. We did something like this with optional parameters and had 'bleed-through' from the previous iteration because we didn't clean up parameters we didn't need!
Your biggest problem is that you're looping over this:
IAsyncResult result = command.BeginExecuteReader();
while (!result.IsCompleted)
{
    System.Threading.Thread.Sleep(10);
}
command.EndExecuteReader(result);
The entire idea of the asynchronous model is that the calling thread (the one doing this loop) should spin up ALL of the asynchronous tasks using the Begin method before starting to collect the results with the End method. If you are using Thread.Sleep() within your main calling thread to wait for an asynchronous operation to complete (as you are here), you're doing it wrong: each command is being called and then waited for, one at a time, before the next one starts.
Instead, try something like this:
public void BeginExecutingCommands(Report someReport)
{
    foreach (record someRecord in someReport.r)
    {
        var command = new SqlCommand("[procname]", sqlConn);
        command.CommandTimeout = 0;
        command.CommandType = CommandType.StoredProcedure;
        command.Parameters.Add(…);
        command.BeginExecuteReader(ReaderExecuted,
            new object[] { command, someReport, someRecord });
    }
}

void ReaderExecuted(IAsyncResult result)
{
    var state = (object[])result.AsyncState;
    var command = state[0] as SqlCommand;
    var someReport = state[1] as Report;
    var someRecord = state[2] as Record;
    try
    {
        using (SqlDataReader reader = command.EndExecuteReader(result))
        {
            // work with reader, command, someReport and someRecord to do what you need.
        }
    }
    catch (Exception ex)
    {
        // handle exceptions that occurred during the async operation here
    }
}
In SQL Server, at the other end of a write is one disk. You can rarely write faster in parallel; in fact, parallelism often slows things down due to index fragmentation. If you can, sort the data by the primary (clustered) key prior to loading. For a big load, even consider disabling the other indexes, loading the data, and then rebuilding the indexes.
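A rough sketch of that load pattern (the table and index names are placeholders, not from the question):
// Disable nonclustered indexes before the big load...
using (var cmd = new SqlCommand("ALTER INDEX IX_MyTable_SomeColumn ON dbo.MyTable DISABLE", sqlConn))
{
    cmd.ExecuteNonQuery();
}

// ...load the data here, pre-sorted by the clustered (primary) key...

// ...then rebuild all indexes once, after the load completes.
using (var cmd = new SqlCommand("ALTER INDEX ALL ON dbo.MyTable REBUILD", sqlConn))
{
    cmd.ExecuteNonQuery();
}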
I'm not really sure what you are doing in the async code, but it was certainly not doing what you expected, as it was waiting on itself.
try
{
    using (var command = new SqlCommand("[procname]", sqlConn))
    {
        command.CommandTimeout = 0;
        command.CommandType = CommandType.StoredProcedure;
        foreach (record someRecord in someReport.r)
        {
            command.Parameters.Clear();
            command.Parameters.Add(…);
            using (var rdr = command.ExecuteReader())
            {
                while (rdr.Read())
                {
                    …
                }
            }
        }
    }
}
catch (…)
{
    …
}
As we were talking about in the comments, storing this data in memory and working with it there may be a more efficient approach.
So one easy way to do that is to start with Entity Framework. Entity Framework will automatically generate the classes for you based on your database schema. Then you can import a stored procedure which holds your SELECT statement. The reason I suggest importing a stored proc into EF is that this approach is generally more efficient than doing your queries in LINQ against EF.
Then run the stored proc and store the data in a List like this...
var data = db.MyStoredProc().ToList();
Then you can do anything you want with that data. Or, as I mentioned, if you're doing a lot of lookups on primary keys, then use ToDictionary(), something like this...
var data = db.MyStoredProc().ToDictionary(k => k.MyPrimaryKey);
Either way, you'll be working with your data in memory at this point.
It seems that executing your SQL command puts a lock on some required resources, and that is what forced you to use async methods (my guess).
If the database is not in use, try getting exclusive access to it. Even then, if there are internal transactions due to data-model complexity, consider consulting the database designer.
I need a little help here.
I'm facing a problem now with Oracle.ManagedDataAccess.Core.
I have a class written to centralize Oracle queries (Clases.Oracle()), which works perfectly, but with one query the memory usage rises up to 1 GB, which is not a real problem by itself, considering that the result set has about 260,000 rows in the worst scenario. The real problem is that it never frees that memory, and if I execute that query again it rises up to 2 GB, that being the highest limit so far.
I've tried adding GC.Collect() and GC.WaitForPendingFinalizers(), with no results.
My command execution function in Clases.Oracle() is:
private DataTable ExecuteReader(string package, ref OracleParameter[] parametros, string owner)
{
    var dt = new DataTable();
    using (var cn = new OracleConnection(_connection_string))
    {
        using var cmd = cn.CreateCommand();
        try
        {
            cn.Open();
            cmd.CommandText = $"{owner}.{package}";
            cmd.CommandType = CommandType.StoredProcedure;
            foreach (var par in parametros)
            {
                cmd.Parameters.Add(par);
            }
            using var rdr = cmd.ExecuteReader(CommandBehavior.CloseConnection);
            dt.Load(rdr);
        }
        catch (Exception ex)
        {
            throw new Exceptions.OracleException(ex.Message);
        }
        finally
        {
            cn?.Close();
            cmd?.Dispose();
            cn?.Dispose();
        }
    }
    return dt;
}
I'm using using blocks, so the objects are being disposed.
And I'm calling it with this function:
public List<AuditoriaUsuarios> ObtieneAuditoriaUsuarios(long incluyeCargoRol = 0)
{
    var ora = new Clases.Oracle();
    var param = new OracleParameter[]
    {
        ora.AddInParameter("PIN_INCLUYECARGOROL", OracleDbType.Decimal, incluyeCargoRol),
        ora.AddOutCursor("CUR_OUT"),
        ora.AddOutParameter("PON_CODE", OracleDbType.Decimal),
        ora.AddOutParameter("POV_ERROR", OracleDbType.Varchar2)
    };
    var result = ora.ExecuteReader<AuditoriaUsuarios>($"{_PCK}.p_AUDIT_USUARIOS", ref param);
    if (ora.HayError(param))
    {
        throw new Exceptions.OracleException(ora.CodigoError, ora.MensajeError);
    }
    //GC.Collect();
    //GC.WaitForPendingFinalizers();
    return result;
}
Clases.Oracle() doesn't need to be disposable, because all the objects it uses are disposable and are being disposed; the only other state it holds is two strings, the connection string and the name of the database owner.
This is a memory usage dump from VS:
You can see an Oracle-related object (OracleInternal.Common.ZoneValue) using a lot of memory long after ExecuteReader finished and the results were returned.
Don't know if I'm doing something wrong.
Edit:
I forgot to mention: this is an ASP.NET Core x64 WebAPI, using .NET Core 3.1 and C#, with Visual Studio 2019 Enterprise.
Edit2:
I know it's dirty, but adding this to ObtieneAuditoriaUsuarios made things a little better. (In this case I don't care about CPU usage, because this data extraction is supposed to be executed a few times a week and is not part of everyday operation.)
GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
GC.WaitForPendingFinalizers();
GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
Edit3:
I've sent 8 simultaneous requests and the memory usage rose up to 3 GB in some tests. But it only takes one request with a filter that returns fewer than 100 rows for the memory usage to drop to less than 1 GB.
For simple lookups, I need to perform some SQL statements on a DB2 machine. I'm not able to use an ORM at the moment. I have a working example in this code; however, I'm wondering if it can be optimized more, as this essentially creates a connection on each request, and that just seems like bad programming.
Is there a way I can optimize this Get() request to leave a connection open? Nesting using statements seems dirty as well. How should I handle the fact that Get() really wants to return a User object no matter what, even in error? Can I put this connection in the start of the program so that I can use it over and over again? What are some of the best practices for this?
public class UsersController : ApiController
{
    String constr = WebConfigurationManager.ConnectionStrings["DB2Connection"].ConnectionString;

    public User Get([FromUri] User cst)
    {
        if (cst == null)
        {
            throw new HttpResponseException(HttpStatusCode.NotFound);
        }
        else
        {
            using (OdbcConnection DB2Conn = new OdbcConnection(constr))
            {
                DB2Conn.Open();
                using (OdbcCommand com = new OdbcCommand(
                    // Generic SQL Statement
                    "SELECT * FROM [TABLE] WHERE customerNumber = ?", DB2Conn))
                {
                    com.Parameters.AddWithValue("@var", cst.customerNumber);
                    using (OdbcDataReader reader = com.ExecuteReader())
                    {
                        try
                        {
                            while (reader.Read())
                            {
                                cst.name = (string)reader["name"];
                                return cst;
                            }
                        }
                        catch
                        {
                            throw;
                        }
                    }
                }
            }
            return cst;
        }
    }
}
I found a great question that doesn't really have detailed answers; I feel like similar solutions exist for both of these questions...
And that just seems like bad programming.
Why do you think that?
The underlying system should be maintaining connections in a connection pool for you. Creating a connection should be very optimized already.
From a logical perspective, what you're doing now is exactly what you want to be doing. Create the connection, use it, and dispose of it immediately. This allows other threads/processes/etc. to use it from the connection pool now that you're done with it.
This also avoids the myriad of problems which arise from manually maintaining your open connections outside of the code that uses them.
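As a quick illustration of the pool at work (this is not from the original post, and it assumes connection pooling is enabled for your ODBC driver), timing repeated open/close cycles shows that only the first Open() pays the cost of establishing a physical connection:
var timer = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 100; i++)
{
    using (var conn = new OdbcConnection(constr))
    {
        conn.Open(); // after the first iteration, this is served from the pool
    }
}
timer.Stop();
Console.WriteLine("100 open/close cycles took {0}", timer.Elapsed);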
Is there a way I can optimize this Get() request to leave a connection open?
Have you measured an actual performance problem? If not, there's nothing to optimize.
And there's a very good chance that hanging on to open connections in a static context in your web application is going to have drastic performance implications.
In short... You're already doing this correctly. (Well, except for that unnecessary try/catch. You can remove that.)
Edit: If you're just looking to improve the readability of the code (which itself is a matter of personal preference), this seems readable to me:
public User Get([FromUri] User cst)
{
    if (cst == null)
        throw new HttpResponseException(HttpStatusCode.NotFound);

    using (var DB2Conn = new OdbcConnection(constr))
    using (var com = new OdbcCommand("SELECT * FROM [TABLE] WHERE customerNumber = ?", DB2Conn))
    {
        com.Parameters.AddWithValue("@var", cst.customerNumber);
        DB2Conn.Open();
        using (OdbcDataReader reader = com.ExecuteReader())
            while (reader.Read())
            {
                cst.name = (string)reader["name"];
                return cst;
            }
    }
    return cst;
}
Note that you can further improve it by re-addressing the logic of that SQL query. Since you're fetching one value from one record, you don't need to loop over a data reader. Just fetch a single value and return it. Note that this is free-hand and untested, but it might look something like this:
public User Get([FromUri] User cst)
{
    if (cst == null)
        throw new HttpResponseException(HttpStatusCode.NotFound);

    using (var DB2Conn = new OdbcConnection(constr))
    using (var com = new OdbcCommand("SELECT name FROM [TABLE] WHERE customerNumber = ? FETCH FIRST 1 ROWS ONLY", DB2Conn))
    {
        com.Parameters.AddWithValue("@var", cst.customerNumber);
        DB2Conn.Open();
        cst.name = (string)com.ExecuteScalar();
    }
    return cst;
}
@David's answer addresses your actual questions perfectly, but here are some other observations that may make your code a little more palatable to you:
Remove the try/catch block. All you're doing is re-throwing the exception, which is what will happen anyway if you don't use a try/catch at all. Don't catch an exception unless you can do something about it. (I see now that @David's answer addresses that; either it was added after I read it or I missed it. My apologies for the overlap, but it's worth reinforcing.)
Change your query to pull just name and use ExecuteScalar instead of ExecuteReader. You are taking the name value from the first record and exiting the while loop; ExecuteScalar returns the value from the first column of the first record, so you can eliminate the while loop and the using there.
I have a scenario in which I have to process multiple .sql files; every file contains 3-4 insert or update queries. When any query in a file fails, I roll back the whole transaction, meaning the whole file is rolled back, while all files executed before it remain committed. I want to give the user an option: either roll back the entire run (all queries in the failing file plus all files executed before it), or skip just the file that contains the error (roll back that single file while all other files get committed). I am using SqlTransaction right now, not TransactionScope, but obviously I can switch to TransactionScope() if needed and possible.
Currently, the pseudocode for what I want is as follows:
var Files[]
foreach (string query in Files)
{
    Execute(query)
    if (success)
        CommitQuery()
    else
        result = MBOX("Do you want to abort all files or skip this one?")
        if (result == abort)
            RollbackAll()
        else
            QueryRollBack()
}
It seems you are looking for savepoints, i.e. the option to partially roll back and then resume a larger transaction. AFAIK TransactionScope doesn't support savepoints, so you'll need to deal directly with the native provider (e.g. SqlClient if your RDBMS is SQL Server). (That is, you cannot leverage the ability of TransactionScope to implement a DTC equivalent of savepoints, e.g. across distributed databases, disparate RDBMSs, or parallel transactions.)
That said, I would suggest a strategy where the user elects to skip or abort up front, before transactional processing begins, as it will be expensive to wait for a UI response while a large number of rows are still locked; this will likely cause contention issues.
Edit
Here's a small sample of using SavePoints. Foo1 and Foo3 are inserted, Foo2 is rolled back to the preceding save point.
using (var conn = new SqlConnection(ConfigurationManager.ConnectionStrings["Foo"].ConnectionString))
{
    conn.Open();
    using (var txn = conn.BeginTransaction("Outer"))
    {
        txn.Save("BeforeFoo1");
        InsertFoo(txn, "Foo1");

        txn.Save("BeforeFoo2");
        InsertFoo(txn, "Foo2");
        txn.Rollback("BeforeFoo2");

        txn.Save("BeforeFoo3");
        InsertFoo(txn, "Foo3");

        txn.Commit();
    }
}
Where InsertFoo is:
private void InsertFoo(SqlTransaction txn, string fooName)
{
    using (var cmd = txn.Connection.CreateCommand())
    {
        cmd.Transaction = txn;
        cmd.CommandType = CommandType.Text;
        cmd.CommandText = "INSERT INTO FOO(Name) VALUES(@Name)";
        cmd.Parameters.Add(new SqlParameter("@Name", SqlDbType.VarChar)).Value = fooName;
        cmd.ExecuteNonQuery();
    }
}
And the underlying table is:
create table Foo
(
    FooId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Name NVARCHAR(50)
)
Keep all insert and update queries in a try {..} catch (..) {..} block, and if any exception occurs, roll the transaction back in the catch.
private void InsertFoo(SqlTransaction txn, string fooName)
{
    using (var cmd = txn.Connection.CreateCommand())
    {
        try
        {
            // do your processing here...
            txn.Commit();
        }
        catch (Exception ex)
        {
            txn.Rollback();
        }
    }
}
We need to index (in ASP.NET) all the records stored in a SQL Server table. That table has around 2M records, with text (nvarchar) data in each row.
Is it okay to fetch all records in one go as we need to index them (for search)? What is the other option (I want to avoid pagination)?
Note: I am not displaying these records, just need all of them in one go so that I can index them via a background thread.
Do I need to set any long time outs for my query? If yes, what is the most effective method for setting longer time outs if I am running the query from ASP.NET page?
If I needed something like this, just thinking about it from the database side, I'd probably export it to a file. That file can then be moved around pretty easily; moving around data sets that large is a huge pain for everyone involved. You can use SSIS, sqlcmd, or even bcp in a batch command to get it done, as in the example below.
Then you just have to worry about what you're doing with it on the app side; there are no worries about locking and everything else on the database side once you've exported it.
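For instance, a bcp invocation along these lines exports a query result to a flat file (the server, database, table, and path are placeholders): -c writes character data, -S names the server, and -T uses a trusted (Windows) connection:
bcp "SELECT Id, Title, Body FROM MyDb.dbo.Records" queryout C:\export\records.dat -c -S MYSERVER -T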
I don't think a page is a good place for this regardless. There should be a different process or program that does this. On a related note, maybe something like http://incubator.apache.org/lucene.net/ would help you?
Is it okay to fetch all records in one go as we need to index them (for search)? What is the other option (I want to avoid pagination)?
Memory management issue / performance issue
You can face a System.OutOfMemoryException if you bring 2 million records into memory at once, as you will be keeping all those records in a DataSet and that DataSet will live in RAM.
Do I need to set any long time outs for my query? If yes, what is the most effective method for setting longer time outs if I am running the query from ASP.NET page?
using (System.Data.SqlClient.SqlCommand cmd = new System.Data.SqlClient.SqlCommand())
{
    cmd.CommandTimeout = 0;
}
Suggestion
It's better to filter the records at the database level...
Or fetch all records from the database once and save them to a file; access that file for any intermediate operations.
What you describe is Extract, Transform, Load (ETL). There are 2 options I'm aware of:
SSIS, which is part of SQL Server
Rhino.ETL
I prefer Rhino.ETL as it's completely written in C#, you can create scripts in Boo, and it's much easier to test and compose ETL processes. The library is also built to handle large sets of data, so memory management is built in.
One final note: while ASP.NET might be the entry point to start the indexing process, I wouldn't run the process within ASP.NET, as it could take minutes or hours depending on the number of records and the processing.
Instead, have ASP.NET be the entry point that fires off a background task to process the records. Ideally it would be completely independent of ASP.NET, so you avoid any timeout or shutdown issues.
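One sketch of that hand-off for classic ASP.NET, assuming .NET 4.5.2 or later where HostingEnvironment.QueueBackgroundWorkItem is available (IndexAllRecords is a hypothetical worker method, not from the question):
using System.Web.Hosting;

public void StartIndexing()
{
    // Queue the work and return to the caller immediately; ASP.NET will try
    // to delay app-domain shutdown while registered work items are running.
    HostingEnvironment.QueueBackgroundWorkItem(cancellationToken =>
    {
        IndexAllRecords(cancellationToken); // hypothetical long-running worker
    });
}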
Process your records in batches. You are going to have two main issues: (1) you need to index all of the existing records, and (2) you will want to update the index with records that were added, updated, or deleted. It might sound easier to just drop the index and recreate it, but that should be avoided if possible. Below is an example of processing the [Production].[TransactionHistory] table from the AdventureWorks2008R2 database in batches of 10,000 records. It does not load all of the records into memory. Output on my local computer produces Processed 113443 records in 00:00:00.2282294. Obviously, this doesn't take into consideration a remote computer and processing time for each record.
class Program
{
    private static string ConnectionString
    {
        get { return ConfigurationManager.ConnectionStrings["db"].ConnectionString; }
    }

    static void Main(string[] args)
    {
        int recordCount = 0;
        int lastId = -1;
        bool done = false;
        Stopwatch timer = Stopwatch.StartNew();
        do
        {
            done = true;
            IEnumerable<TransactionHistory> transactionDataRecords = GetTransactions(lastId, 10000);
            foreach (TransactionHistory transactionHistory in transactionDataRecords)
            {
                lastId = transactionHistory.TransactionId;
                done = false;
                recordCount++;
            }
        } while (!done);
        timer.Stop();
        Console.WriteLine("Processed {0} records in {1}", recordCount, timer.Elapsed);
    }

    // Get a new open connection
    private static SqlConnection GetOpenConnection()
    {
        SqlConnection connection = new SqlConnection(ConnectionString);
        connection.Open();
        return connection;
    }

    private static IEnumerable<TransactionHistory> GetTransactions(int lastTransactionId, int count)
    {
        const string sql = "SELECT TOP(@count) [TransactionID],[TransactionDate],[TransactionType] FROM [Production].[TransactionHistory] WHERE [TransactionID] > @LastTransactionId ORDER BY [TransactionID]";
        return GetData<TransactionHistory>((connection) =>
        {
            SqlCommand command = new SqlCommand(sql, connection);
            command.Parameters.AddWithValue("@count", count);
            command.Parameters.AddWithValue("@LastTransactionId", lastTransactionId);
            return command;
        }, DataRecordToTransactionHistory);
    }

    // function to convert a data record to the TransactionHistory object
    private static TransactionHistory DataRecordToTransactionHistory(IDataRecord record)
    {
        TransactionHistory transactionHistory = new TransactionHistory();
        transactionHistory.TransactionId = record.GetInt32(0);
        transactionHistory.TransactionDate = record.GetDateTime(1);
        transactionHistory.TransactionType = record.GetString(2);
        return transactionHistory;
    }

    private static IEnumerable<T> GetData<T>(Func<SqlConnection, SqlCommand> commandBuilder, Func<IDataRecord, T> dataFunc)
    {
        using (SqlConnection connection = GetOpenConnection())
        {
            using (SqlCommand command = commandBuilder(connection))
            {
                using (IDataReader reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        T record = dataFunc(reader);
                        yield return record;
                    }
                }
            }
        }
    }
}

public class TransactionHistory
{
    public int TransactionId { get; set; }
    public DateTime TransactionDate { get; set; }
    public string TransactionType { get; set; }
}
I have a long-running service with several threads calling the following method hundreds of times per second:
void TheMethod()
{
    using (var c = new SqlConnection("..."))
    {
        c.Open();
        var ret1 = PrepareAndExecuteStatement1(c, args1);
        // some code
        var ret2 = PrepareAndExecuteStatement2(c, args2);
        // more code
    }
}
PrepareAndExecuteStatement is something like this:
void PrepareAndExecuteStatement*(SqlConnection c, args)
{
    var cmd = new SqlCommand("query", c);
    cmd.Parameters.Add("@param", type);
    cmd.Prepare();
    cmd.Parameters["@param"].Value = args;
    return cmd.execute().read().etc();
}
I want to reuse the prepared statements, preparing once per connection and executing them until the connection breaks. I hope this will improve performance.
Can I use the built-in connection pool to achieve this? Ideally, every time a new connection is made, all statements would be automatically prepared, and I would need access to the SqlCommand objects of these statements.
I suggest taking a slightly modified approach. Close your connection immediately after use. You can certainly reuse your SqlConnection.
The work being done at // some code may take a long time. Are you interacting with other network or disk resources, or spending any amount of time on calculations? Could you ever, in the future, need to do so? Perhaps the intervals between executing statements are, or could become, so long that you'd want to reopen the connection. Regardless, the connection should be opened late and closed early.
using (var c = new SqlConnection("..."))
{
    c.Open();
    PrepareAndExecuteStatement1(c, args);
    c.Close();

    // some code

    c.Open();
    PrepareAndExecuteStatement2(c, args);
    c.Close();

    // more code
}
"Open Late, Close Early," as covered in MSDN Magazine by John Papa.
Obviously we've now got a bunch of code duplication here. Consider refactoring your Prepare...() method to perform the opening and closing operations.
Perhaps you'd consider something like this:
using (var c = new SqlConnection("..."))
{
    var cmd1 = PrepareAndCreateCommand(c, args);
    // some code
    var cmd2 = PrepareAndCreateCommand(c, args);

    c.Open();
    cmd1.ExecuteNonQuery();
    cmd2.ExecuteNonQuery();
    c.Close();

    // more code
}