Why is the C# List<T> Add method so slow?

I have an ASP.NET MVC project using Dapper to read data from a database and I need to export to Excel.
Dapper is fast: ExecuteReader takes only 35 seconds.
But the loop containing list.Add(InStock) takes far too long, over 1020 seconds!
Do you have any idea why this is?
public List<InStock> GetList(string stSeId, string edSeId, string stSeDay, string edSeDay, string qDate)
{
    List<InStock> list = new List<InStock>();
    InStock InStock = null;
    IDataReader reader;
    using (var conn = _connection.GetConnection())
    {
        try
        {
            conn.Open();
            //****************** Only 35 seconds *****
            reader = conn.ExecuteReader(fileHelper.GetScriptFromFile("GetInStock"),
                new { STSeId = stSeId, EDSeId = edSeId, STSeDay = stSeDay, EDSeDay = edSeDay, qDate = qDate });
            //*************************************
            //****************** Over 1020 seconds **********
            while (reader.Read())
            {
                InStock = new InStock();
                InStock.ColA = reader.GetString(reader.GetOrdinal("ColA"));
                InStock.ColB = reader.GetString(reader.GetOrdinal("ColB"));
                InStock.ColC = reader.GetString(reader.GetOrdinal("ColC"));
                list.Add(InStock);
            }
            //*********************************************
            return list;
        }
        catch (Exception err)
        {
            throw err;
        }
    }
}

It's the database.
From the documentation topic Retrieve data using a DataReader:
The DataReader is a good choice when you're retrieving large amounts of data because the data is not cached in memory.
The key clue for your performance concern is the phrase "because the data is not cached in memory". While strictly an implementation detail, each call to Read() fetches new data from the database, whereas the List<InStock>.Add() call merely appends the already-materialized InStock to the list.
There are orders of magnitude of difference in processing time between disk access (even on SSDs) and RAM, and orders of magnitude again between network requests and disk access. There is no realistic way that anything other than the database access accounts for most of your run time.
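To convince yourself of this (and to simplify the code), you can let Dapper materialize the list in a single call. Query<T> is buffered by default, so the time spent pulling rows over the network shows up inside that one call instead of inside a Read() loop, and the total will still be dominated by the database. A minimal sketch reusing the names from the question (needs using Dapper; and using System.Linq;):

public List<InStock> GetList(string stSeId, string edSeId, string stSeDay, string edSeDay, string qDate)
{
    using (var conn = _connection.GetConnection())
    {
        conn.Open();
        // Buffered by default: the whole result set is read before Query returns,
        // and columns ColA/ColB/ColC are mapped to properties of the same name.
        return conn.Query<InStock>(
            fileHelper.GetScriptFromFile("GetInStock"),
            new { STSeId = stSeId, EDSeId = edSeId, STSeDay = stSeDay, EDSeDay = edSeDay, qDate = qDate })
            .ToList();
    }
}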
--
As a side note, you're going to exceed the maximum number of rows in an Excel worksheet (1,048,576 in current versions).

Related

Data collection program not collecting correctly

I created a data collection app for our company which collects data from our remote devices.
The data is collected from a DataMailbox, which is comparable to a database that acts as a 10-day buffer to store the data. This part is all working correctly.
The data is collected through POST API requests, for example:
var url = BuildUrl("syncdata");
var response = webClient.CallApi(url, new NameValueCollection() { { "createTransaction","" }, { "lastTransactionId", transactionId } });
var data = DynamicJson.Parse(response);
transactionId = data.transactionId;
I've been trying to collect from multiple devices in one run. The run starts by collecting the data from the first device, which works. Then the second device starts collecting, but it only picks up from where device one ended, so I've been losing 12 hours of data on each run. For performance we use transaction IDs (each set of data has its own ID).
The workflow should be like this :
When the data is retrieved for the first time, the user specifies only
the createTransaction filter. The DataMailbox returns all the data of
all devices gateways – with historical data – of the account along a
transaction ID. For the next calls to the API, the client specifies
both createTransaction and lastTransactionId filters. The
lastTransactionId is the ID of the transaction that was returned by
the latest request. The system returns all the historical
data that has been received by the DataMailbox since the last
transaction and a new transaction ID. deviceIds is an additional
filter on the returned result. You must be cautious when using the
combination of lastTransactionId, createTransaction and deviceIds.
lastTransactionId is first used to determine what set of data — newer
than this transaction ID and from all the Device gateways — must be
returned from the DataMailbox, then deviceIds filters this set of data
to send data only from the desired device gateways. If a first request
is called with lastTransactionId, createTransaction and deviceIds, the
following request — implying a new lastTransactionId — does not
contain values history from the previous lastTransactionId of the
device gateways that were not in the deviceId from previous request.
I'm really struggling with the data collection and have no clue how to use the transactionId and the lastTransactionId. This is the code for now:
try
{
    CheckLogin();
    using (var webClient = new MyWebClient())
    {
        bool moreDataAvailable;
        int samplesCount = 0;
        string transactionId = Properties.Settings.Default.TransactionId;
        string lastTransactionId = Properties.Settings.Default.LastTransactionId;
        do
        {
            var url = BuildUrl("syncdata");
            var response = webClient.CallApi(url, new NameValueCollection() { { "createTransaction", "" }, { "lastTransactionId", transactionId } });
            var data = DynamicJson.Parse(response);
            transactionId = data.transactionId;
            var talk2MMessage = getTalk2MMessageHeader(webClient);
            if (talk2MMessage != null)
            {
            }
            foreach (var ewon in data.ewons)
            {
                Directory.CreateDirectory(ewon.name);
                foreach (var tag in ewon.tags)
                {
                    try
                    {
                        Console.WriteLine(Path.Combine(ewon.name, tag.name + ""));
                        foreach (var sample in tag.history)
                        {
                            Console.WriteLine(ewon.name + " " + tag.name + " " + tag.description);
                            Console.WriteLine(transactionId);
                            samplesCount++;
                        }
                    }
                    catch (RuntimeBinderException)
                    {
                        // Tag has no history. If it's in the transaction, it's most likely because it has alarm history.
                        Console.WriteLine("Tag {0}.{1} has no history.", ewon.name, tag.name);
                    }
                }
            }
            Console.WriteLine("{0} samples written to disk", samplesCount);
            // Flush data received in this transaction
            if (Properties.Settings.Default.DeleteData)
            {
                //Console.WriteLine("Flushing received data from the DataMailbox...");
                url = BuildUrl("delete");
                webClient.CallApi(url, new NameValueCollection() { { "transactionId", transactionId } });
                Console.WriteLine("DataMailbox flushed.");
            }
            // Save the transaction id for the next run of this program
            Properties.Settings.Default.LastTransactionId = lastTransactionId;
            Properties.Settings.Default.Save();
            // Did we receive all data?
            try
            {
                moreDataAvailable = data.moreDataAvailable;
            }
            catch (RuntimeBinderException)
            {
                // The moreDataAvailable flag is not specified in the server response
                moreDataAvailable = false;
            }
            if (moreDataAvailable)
                Console.WriteLine("There's more data available. Let's get the next part...");
        }
        while (moreDataAvailable);
Here is how I prompt for my credentials and the other parameters used to start the collection:
static void CheckLogin()
{
    if (string.IsNullOrEmpty(Properties.Settings.Default.Talk2MDevId))
    {
        Properties.Settings.Default.Talk2MDevId = Prompt("Talk2MDevId");
        Properties.Settings.Default.APIToken = Prompt("API Token");
        string deleteInputString = Prompt("Delete data after synchronization? (yes/no)");
        Properties.Settings.Default.DeleteData = deleteInputString.ToLower().StartsWith("y");
        Properties.Settings.Default.TransactionId = "";
        Properties.Settings.Default.LastTransactionId = "";
        Properties.Settings.Default.Save();
    }
}
I think the problem is something with the transactionId and lastTransactionId, but I have no clue.
More information can be found here: https://developer.ewon.biz/system/files_force/rg-0005-00-en-reference-guide-for-dmweb-api.pdf
As I understand your question, your problem is that for the first few transactionIds you only get data from device 1, and then only data from device 2.
I'm assuming the following in my answer:
You didn't specify a filter on "ewonid" somewhere else in the code.
When you say you lose 12 hours of data, you are assuming that because "device 2" data is streamed after "device 1" data.
You did try without the /delete call and saw no change.
/syncdata is an endpoint that returns a block of data for an account since a given transactionId (or the oldest block if you didn't provide a transactionId). This data is sorted by storage date on the server side, which depends on multiple factors:
when the device was last online over VPN
how frequently the device pushes data to the DataMailbox
when that device's packet was digested by the DataMailbox service
You could technically have one-year-old data pushed by a device that reconnects to the VPN now, and that data would be registered in the most recent blocks.
For those reasons, the order of the data blocks is not the order of the device recording timestamps. You always have to look at the field ewons[].tags[].history[].date to know when that measurement was made.
foreach (var sample in tag.history)
{
    Console.WriteLine(ewon.name + " " + tag.name + " " + tag.description);
    Console.WriteLine(sample.value + " at " + sample.date);
    Console.WriteLine(transactionId);
    samplesCount++;
}
In your case, I would assume both devices are configured to push their data once a day, one pushing its backlog at, let's say, 6 AM and the other at 6 PM.
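If what matters to you is the recording time of each sample rather than the order in which the DataMailbox returns the blocks, you can sort each tag's history on that date field before processing it. A minimal sketch (needs using System.Linq;; it assumes sample.date and sample.value deserialize as strings and that sample.date parses with DateTime.Parse, so adjust the parsing to the API's actual date format):

var samples = new List<Tuple<DateTime, string>>();
foreach (var sample in tag.history)
{
    // sample.date is the recording timestamp; rely on it rather than on block order.
    samples.Add(Tuple.Create(DateTime.Parse((string)sample.date), (string)sample.value));
}
foreach (var s in samples.OrderBy(x => x.Item1))
{
    Console.WriteLine(ewon.name + " " + tag.name + " " + s.Item2 + " at " + s.Item1.ToString("o"));
}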

How to get the file size of all Data Factory datasets, especially Data Lake Store and Blob?

We have many different pipelines in Azure Data Factory with many datasets, mainly datasets of Azure Data Lake Store and Azure Blobs. I want to know the file size of all files (from all datasets of all pipelines). I am able to iterate over all the datasets from all the pipelines using DataFactoryManagementClient in C#, but when I try to read the fileName or folderName of a dataset, I get null. You can see my code below:
private static void GetDataSetSize(DataFactoryManagementClient dataFactoryManagementClient)
{
    string resourceGroupName = "resourceGroupName";
    foreach (var dataFactory in dataFactoryManagementClient.DataFactories.List(resourceGroupName).DataFactories)
    {
        var linkedServices = new List<LinkedService>(dataFactoryManagementClient.LinkedServices.List(resourceGroupName, dataFactory.Name).LinkedServices);
        var datasets = dataFactoryManagementClient.Datasets.List(resourceGroupName, dataFactory.Name).Datasets;
        foreach (var dataset in datasets)
        {
            var lsTypeProperties = linkedServices.First(ls => ls.Name == dataset.Properties.LinkedServiceName).Properties.TypeProperties;
            if (lsTypeProperties.GetType() == typeof(AzureDataLakeStoreLinkedService))
            {
                AzureDataLakeStoreLinkedService outputLinkedService = lsTypeProperties as AzureDataLakeStoreLinkedService;
                var folder = GetBlobFolderPathDL(dataset);
                var file = GetBlobFileNameDL(dataset);
            }
        }
    }
}
public static string GetBlobFolderPathDL(Dataset dataset)
{
    if (dataset == null || dataset.Properties == null)
    {
        return string.Empty;
    }
    AzureDataLakeStoreDataset dlDataset = dataset.Properties.TypeProperties as AzureDataLakeStoreDataset;
    if (dlDataset == null)
    {
        return string.Empty;
    }
    return dlDataset.FolderPath;
}
public static string GetBlobFileNameDL(Dataset dataset)
{
    if (dataset == null || dataset.Properties == null)
    {
        return string.Empty;
    }
    AzureDataLakeStoreDataset dlDataset = dataset.Properties.TypeProperties as AzureDataLakeStoreDataset;
    if (dlDataset == null)
    {
        return string.Empty;
    }
    return dlDataset.FileName;
}
With this, I want to build a monitoring tool that tells me how the data is growing for each file/dataset.
FYI, I am also going to monitor retries and failures of each slice. I can get that information without any issue, but the problem now is getting the file name and folder path, because the API returns null for them (it seems to be a bug in the API). Once I have the folder and file path, I will use DataLakeStoreFileSystemManagementClient to get the size of those files. I plan to ingest all this data (size, file name, retries, failures, etc.) into a SQL database and generate reports on top of it that show how my data is growing daily or hourly.
I want to make this generic, so that if I add a new dataset or pipeline in the future, I also get the size of the newly added datasets without changing any code.
Please help me understand how I can achieve this, and suggest an alternative approach if there is one.
Just place this code in your main method and execute it. You should be able to see your datasets' folder paths and file names. Use this and adapt it to your requirements.
Hope this helps!
foreach (var dataFactory in dataFactoryManagementClient.DataFactories.List(resourceGroupName).DataFactories)
{
    var datasets = dataFactoryManagementClient.Datasets.List(resourceGroupName, dataFactory.Name).Datasets;
    foreach (var dataset in datasets)
    {
        var datasetDetails = dataFactoryManagementClient.Datasets.Get(resourceGroupName, dataFactory.Name, dataset.Name);
        if (datasetDetails.Dataset.Properties.TypeProperties.GetType() == typeof(AzureDataLakeStoreDataset))
        {
            AzureDataLakeStoreDataset outputDataSet = datasetDetails.Dataset.Properties.TypeProperties as AzureDataLakeStoreDataset;
            Console.WriteLine(outputDataSet.FolderPath);
            Console.WriteLine(outputDataSet.FileName);
            Console.ReadKey();
        }
    }
}
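To go from the folder path and file name to an actual size, the asker's plan of using DataLakeStoreFileSystemManagementClient should work. A rough sketch of that step; the method and property names are from my memory of the Microsoft.Azure.Management.DataLake.Store SDK and may differ between versions, so treat them as assumptions and check your package:

// adlsClient is an authenticated DataLakeStoreFileSystemManagementClient,
// accountName is the Data Lake Store account name, and filePath is
// FolderPath + "/" + FileName resolved from the dataset above.
var fileStatus = adlsClient.FileSystem.GetFileStatus(accountName, filePath);
long sizeInBytes = fileStatus.FileStatus.Length ?? 0;
Console.WriteLine("{0} is {1} bytes", filePath, sizeInBytes);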

Need to log data very fast using Entity Framework

I need to develop software that monitors a value from a pressure transducer via a PLC and stores the values in a database. The problem is I need to read the values every 20 ms. I'm using the code below to save the data with Entity Framework and SQL, and a text box to check whether the timer can keep up with that speed and to compare against what lands in SQL.
Records made with the text box:
26/06/2017 - 10: 46:35.236
26/06/2017 - 10: 46:35.256
26/06/2017 - 10: 46:35.276
26/06/2017 - 10: 46:35.296
private void mmTimer_Tick(object sender, System.EventArgs e)
{
    counter++;
    lblCounter.Text = counter.ToString();
    txtDT.AppendText(DateTime.Now.ToString("dd/MM/yyyy - HH: mm:ss.FFF\n"));
    using (DatabaseContext db = new DatabaseContext())
    {
        storeDataBindingSource.DataSource = db.StoreDataList.ToList();
        StoreData objStoreData = storeDataBindingSource.Current as StoreData;
        {
            var _StoreData = new StoreData
            {
                DateTime = DateTime.Now.ToString("dd/MM/yyyy - HH: mm:ss.FFF")
            };
            db.StoreDataList.Add(_StoreData);
            db.SaveChanges();
        }
    }
}
But when I look at the SQL table, the time values don't keep the same 20 ms spacing on every insert, probably because of the amount of work being done on each tick. Maybe I should use a buffer and insert everything at once.
Any suggestions? Thanks in advance.
Any suggestion? ... use a buffer and insert all at once.
Definitely buffer your readings. As a further optimization, you can bypass SaveChanges() (which performs row-by-row inserts) and use a TVP or SqlBulkCopy to insert batches into SQL Server.
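A minimal sketch of the buffering approach with SqlBulkCopy, assuming a connection string is available and that the StoreData table's DateTime column is an actual datetime column (if it stores the formatted string used above, add the string instead). Table name, column name, and flush threshold are placeholders; adjust them to your schema. Needs using System.Data; and using System.Data.SqlClient;.

// Collect readings in memory and flush them in one bulk insert.
private readonly List<DateTime> _buffer = new List<DateTime>();
private const int FlushThreshold = 500; // roughly every 10 seconds at one reading per 20 ms

private void mmTimer_Tick(object sender, System.EventArgs e)
{
    _buffer.Add(DateTime.Now);
    if (_buffer.Count >= FlushThreshold)
    {
        FlushBuffer();
    }
}

private void FlushBuffer()
{
    // Shape the buffered readings as a DataTable matching the destination table.
    var table = new DataTable();
    table.Columns.Add("DateTime", typeof(DateTime));
    foreach (var reading in _buffer)
    {
        table.Rows.Add(reading);
    }

    using (var bulkCopy = new SqlBulkCopy(connectionString)) // connectionString: your SQL connection string
    {
        bulkCopy.DestinationTableName = "dbo.StoreData"; // assumed table name
        bulkCopy.WriteToServer(table);
    }
    _buffer.Clear();
}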

SQL Query pauses

My query seems to stall every so many passes through the loop.
status_text.Text = "Check existing records...";
status_text.Refresh();
using (StreamReader reader = new StreamReader(df_text_filename))
{
    using (StreamWriter writer = new StreamWriter(df_text_filename + "_temp"))
    {
        while ((product = reader.ReadLine()) != null)
        {
            if (product != _aff_svc.DFHeaderProd)
            {
                df_product = _product_factory.GetProductData(_vsi, product);
            }
            status_text.Text = "Checking for existing record of vendor record ID " + df_product.SKU;
            status_text.Refresh();
            if (_pctlr.GetBySKU(df_product.SKU) != null)
            {
                continue;
            }
            writer.WriteLine(product);
            Application.DoEvents();
        }
        writer.Close();
    }
    reader.Close();
}
System.IO.File.Delete(df_text_filename);
System.IO.File.Move(df_text_filename + "_temp", df_text_filename);
The code quickly runs through GetBySKU about 10 times, pauses for about a second or so, then quickly does another ten records. This occurs throughout my processes, not just with this particular query.
It also occurs whether or not the Application.DoEvents() call fires.
The other problem is that it is not consistent. It can behave like this for a few hours, then all of a sudden it will zip through the loop as intended (expected).
My SQL server is running on the same machine as the program.
I looked into dedicating resources to the server to mitigate this behavior, but have found nothing.
It looks as if your program is parsing a text file for product info and, as it parses, executing a couple of SQL queries inside a while loop. It's almost always a bad idea to make SQL round trips inside a loop.
Instead, I would look into parsing the file, gathering all the product IDs, closing the file, and then making one call to SQL, passing one or more TVPs (table-valued parameters) to a stored procedure and returning all the data you need from that procedure, possibly as multiple tables.
EDIT:
You mentioned in the comments that the file is very large with lots of processing. You could consider batching the SQL work in chunks of, let's say, something like 100 rows.
Also, if your SQL isn't tuned, it will continually slow down as more data is written. There's not enough info in the question to analyze the indexes, query plans, etc., but have a look at those as the data set grows.
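A rough sketch of the TVP approach under some assumptions: a user-defined table type dbo.SkuList with a single Sku column and a stored procedure dbo.GetExistingSkus that accepts it and returns the SKUs that already exist; both names are hypothetical, and parsedSkus stands for the SKUs gathered while reading the file.

// Build one table of all SKUs parsed from the file.
var skuTable = new DataTable();
skuTable.Columns.Add("Sku", typeof(string));
foreach (var sku in parsedSkus)
{
    skuTable.Rows.Add(sku);
}

var existingSkus = new HashSet<string>();
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.GetExistingSkus", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    var p = cmd.Parameters.AddWithValue("@Skus", skuTable);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.SkuList"; // hypothetical user-defined table type

    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            existingSkus.Add(reader.GetString(0));
        }
    }
}
// Afterwards, the file loop only checks existingSkus.Contains(sku) in memory.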
I will work on a batch solution later; however, this works much faster than the previous code. No pauses at all.
List<Product> _prod_list = new List<Product>();
_prod_list = ProductDataFactory.GetProductListByVendor(vendor_name);
if (_prod_list.Count() > 0)
{
    using (StreamReader reader = new StreamReader(df_text_filename))
    {
        using (StreamWriter writer = new StreamWriter(df_text_filename + "_temp"))
        {
            while ((product = reader.ReadLine()) != null)
            {
                if (product != _aff_svc.DFHeaderProd)
                {
                    df_product = _product_factory.GetProductData(_vsi, product);
                }
                if (_prod_list.Find(o => o.SKU == df_product.SKU) != null)
                {
                    continue;
                }
                writer.WriteLine(product);
            }
            writer.Close();
        }
        reader.Close();
    }
    System.IO.File.Delete(df_text_filename);
    System.IO.File.Move(df_text_filename + "_temp", df_text_filename);
}
I'm just pulling a list of product objects and querying it for existing records if there are any; if not, it skips the whole process, of course. There's no need to hit the database in the loop either.
Thanks.

What is wrong with my program logic

Please help me find the defect in my logic. I have two variables named "prev" and "next". What I am basically doing is reading the data from my database every 5 seconds and publishing it via the WebSync server if next and prev are NOT equal. I have two rows in my database. It looks like:
ID
8
10
Here is the link to the code http://pastebin.com/Hb3eH2Qv
When I run my program, I get this result:
8 10 8 10
8 10
8 10 8 10
8 10
..... (so on)
But the result should be just:
8 10
I don't know how 8 10 8 10 appears; the data gets concatenated twice.
NOTE: You only need to look at the code in the PublishLoop() function.
private void PublishLoop()
{
    String prev = String.Copy("");
    String next = String.Copy("");
    String ConnectionString = ConfigurationManager.ConnectionStrings["MyDbConn"].ToString();
    SqlConnection connection = new SqlConnection(ConnectionString);
    SqlCommand command = connection.CreateCommand();
    command.CommandText = "select ID from Tab1";
    command.Notification = null;
    while (Running)
    {
        connection.Open();
        using (SqlDataReader reader = command.ExecuteReader(CommandBehavior.CloseConnection))
        {
            StreamWriter sw1 = new StreamWriter("C:\\Users\\Thothathri\\Desktop\\next.txt");
            while ((reader.Read()))
            {
                //Response.Write(reader[0].ToString());
                next = String.Concat(next, reader[0].ToString());
                sw1.WriteLine(next);
            }
            sw1.Close();
            if (!prev.Equals(next))
            {
                Publisher publisher = new Publisher(new PublisherArgs
                {
                    DomainKey = "c80cb405-eb77-4574-9405-5ba51832f5e6",
                    DomainName = "localhost"
                });
                Publication publication = publisher.Publish("/test", JSON.Serialize(next));
                if (publication.Successful == true)
                {
                    StreamWriter sw = new StreamWriter("C:\\Users\\Thothathri\\Desktop\\error123.txt");
                    sw.WriteLine("success");
                    sw.WriteLine(next);
                    sw.Close();
                }
                else
                {
                    StreamWriter sw = new StreamWriter("C:\\Users\\Thothathri\\Desktop\\error123.txt");
                    sw.Write("failed");
                    sw.Close();
                }
                prev = String.Copy(next);
                next = String.Copy("");
            }
        }
        Thread.Sleep(5000);
    }
}
Renuiz answered it in a comment, but it is because you're not clearing next.
So you build the string "8 10" in next and store it in prev. The next time around, you concatenate "8 10" onto the existing next, making "8 10 8 10", which is different from prev, so you print it.
if (!prev.Equals(next))
{
    ....
    prev = String.Copy(next);
    next = String.Copy("");
}
This is at the end of that loop. You should really be clearing next at the beginning of that loop.
Also, you can simply assign
next = String.Empty;
I would declare next inside your while loop, as you don't need it in the wider scope, and I would call it current rather than next.
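Putting those suggestions together, the body of the while (Running) loop would look roughly like this (a sketch of the fix described above, not the full method; the publishing part is elided):

while (Running)
{
    // Declared inside the loop, so the accumulator is reset on every pass.
    string current = String.Empty;

    connection.Open();
    using (SqlDataReader reader = command.ExecuteReader(CommandBehavior.CloseConnection))
    {
        while (reader.Read())
        {
            current = String.Concat(current, reader[0].ToString());
        }
    }

    if (!prev.Equals(current))
    {
        // ... publish current as before ...
        prev = current;
    }

    Thread.Sleep(5000);
}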
What is really wrong with your program logic is that the logic is not obvious. It is so obscure that you can't see where the error is. So my advice is the following: if you can't find the error, try to simplify your code.
Currently your method has many responsibilities: it queries the database, it dumps data to a file, it publishes data somewhere, and it logs the results. And you are stuck with all of that. If someone needs to change the database query or the publishing logic, they will have to review all the other parts as well.
So, separate the logic first:
private void PublishLoop()
{
    string previousIDs = String.Empty;
    int timeout = Int32.Parse(ConfigurationManager.AppSettings["publishTimeout"]);
    while (Running)
    {
        string currentIDs = ConcatenateList(LoadIDs());
        Dump(currentIDs);
        if (!previousIDs.Equals(currentIDs))
        {
            try
            {
                Publish(currentIDs);
                _log.Info("Published successfully");
            }
            catch (PublicationException exception)
            {
                _log.Error("Publication failed");
            }
            previousIDs = currentIDs;
        }
        Thread.Sleep(timeout);
    }
}
Well, I don't know much about your domain, so you can probably think of better names for the variables and methods.
Here the data access logic is extracted to a separate method (which is fine as a first refactoring step and for small applications). Keep in mind that wrapping the connection object in a using block guarantees that the connection will be closed even in case of an exception:
private IList<int> LoadIDs()
{
    List<int> ids = new List<int>();
    String connectionString = ConfigurationManager.ConnectionStrings["MyDbConn"].ConnectionString;
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        SqlCommand command = connection.CreateCommand();
        command.CommandText = "select ID from Tab1";
        command.Notification = null;
        connection.Open();
        using (SqlDataReader reader = command.ExecuteReader(CommandBehavior.CloseConnection))
        {
            while ((reader.Read()))
                ids.Add((int)reader["ID"]);
        }
    }
    return ids;
}
Next, a simple method for concatenating the IDs into one string:
private string ConcatenateList(IList<int> values)
{
    return String.Join(" ", values.Select(value => value.ToString()).ToArray());
}
Dumping (note that the file name has moved to the configuration file):
private void Dump(string ids)
{
    using (StreamWriter writer = new StreamWriter(ConfigurationManager.AppSettings["dumpFilePath"]))
        writer.WriteLine(ids);
}
And publishing logic:
private void Publish(string ids)
{
    PublisherArgs args = new PublisherArgs
    {
        DomainKey = "c80cb405-eb77-4574-9405-5ba51832f5e6",
        DomainName = "localhost"
    };
    Publisher publisher = new Publisher(args);
    Publication publication = publisher.Publish("/test", JSON.Serialize(ids));
    if (!publication.Successful)
        throw new PublicationException();
}
I think failures are exceptional and do not occur very often (so I decided to use exceptions for that case). But if failure is an ordinary outcome, you can simply use a boolean method like TryPublish.
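A sketch of what that TryPublish variant could look like, reusing the Publish logic above; the caller then logs based on the returned value instead of catching an exception:

private bool TryPublish(string ids)
{
    PublisherArgs args = new PublisherArgs
    {
        DomainKey = "c80cb405-eb77-4574-9405-5ba51832f5e6",
        DomainName = "localhost"
    };
    Publisher publisher = new Publisher(args);
    Publication publication = publisher.Publish("/test", JSON.Serialize(ids));
    return publication.Successful;
}

// Usage inside the loop:
// if (TryPublish(currentIDs)) _log.Info("Published successfully"); else _log.Error("Publication failed");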
By the way, you can use a logging library like log4net for logging successful and failed publishing, or you can extract the logging logic into a separate method; this will make the primary logic cleaner and easier to understand.
PS: Try to avoid comparing boolean variables with true/false (publication.Successful == true); you can accidentally assign a value to your variable instead.
