How to reduce memory usage of a program? - c#

I am creating an SDF (spatial database file) from data fetched from a Geodatabase file. I am not storing the data anywhere, yet the memory and CPU usage of the program is very high. The source file contains a huge database; how can I improve performance? I have also tried using the Finalize and Dispose methods.
static void GetFeatureData(Geodatabase geodatabase, string tablename, string sdffilepath)
{
    string query = "select * from " + tablename;
    int no = 1;
    foreach (Row row in geodatabase.ExecuteSQL(query))
    {
        IConnection con = OpenFDOSDFConnection(sdffilepath);
        IInsert insertCommand = (IInsert)con.CreateCommand(OSGeo.FDO.Commands.CommandType.CommandType_Insert);
        insertCommand.SetFeatureClassName(tablename);
        for (int nFieldNumber = 0; nFieldNumber < row.FieldInformation.Count; nFieldNumber++)
        {
            string fieldName = row.FieldInformation.GetFieldName(nFieldNumber);
            switch (row.FieldInformation.GetFieldType(nFieldNumber))
            {
                case FieldType.SmallInteger:
                    if (row.IsNull(fieldName))
                    {
                        insertCommand.PropertyValues.Add(new PropertyValue(fieldName, null));
                    }
                    else
                    {
                        insertCommand.PropertyValues.Add(new PropertyValue(fieldName, new Int32Value(row.GetShort(fieldName))));
                    }
                    break;
                //All other datatypes
                case FieldType.Geometry:
                    if (!row.IsNull(fieldName))
                    {
                        switch (row.GetGeometry().geometryType.ToString())
                        {
                            case "Point":
                                insertCommand.PropertyValues.Add(new PropertyValue("Geometry", geometryValue));
                                break;
                            //All other Geometry cases
                        }
                    }
                    break;
            }
        }
        insertCommand.Execute();
        insertCommand.Dispose();
        con.Dispose();
        Console.WriteLine(no++);
    }
}
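One likely culprit is that a new FDO connection is opened for every row while the loop runs over a huge table. Below is a minimal sketch, not the poster's code, assuming OpenFDOSDFConnection returns a connection that can be reused across inserts and that Row and IInsert are disposable, as they are in the FDO .NET wrappers:

static void GetFeatureDataSketch(Geodatabase geodatabase, string tablename, string sdffilepath)
{
    string query = "select * from " + tablename;
    int no = 1;
    IConnection con = OpenFDOSDFConnection(sdffilepath); // opened once, not once per row
    try
    {
        foreach (Row row in geodatabase.ExecuteSQL(query))
        {
            using (IInsert insertCommand = (IInsert)con.CreateCommand(OSGeo.FDO.Commands.CommandType.CommandType_Insert))
            {
                insertCommand.SetFeatureClassName(tablename);
                // ... populate insertCommand.PropertyValues exactly as above ...
                insertCommand.Execute();
            }
            row.Dispose(); // assumption: release the native row eagerly instead of waiting for the GC
            Console.WriteLine(no++);
        }
    }
    finally
    {
        con.Dispose();
    }
}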

Related

Where do I have memory leaks and how do I fix them? Why does memory consumption increase?

I have been struggling for a few days with growing memory consumption by a console application in .NET Core 2.2, and I have just run out of ideas for what else I could improve.
In my application I have a method that triggers the StartUpdatingAsync method:
public MenuViewModel()
{
    if (File.Exists(_logFile))
        File.Delete(_logFile);
    try
    {
        StartUpdatingAsync("basic").GetAwaiter().GetResult();
    }
    catch (ArgumentException aex)
    {
        Console.WriteLine($"Caught ArgumentException: {aex.Message}");
    }
    Console.ReadKey();
}
StartUpdatingAsync creates a repository, and that instance gets from the DB a list of objects to be updated (around 200k):
private async Task StartUpdatingAsync(string dataType)
{
    _repo = new DataRepository();
    List<SomeModel> some_list = new List<SomeModel>();
    some_list = _repo.GetAllToBeUpdated();
    await IterateStepsAsync(some_list, _step, dataType);
}
And now, within IterateStepsAsync, we fetch updates, parse them against existing data, and update the DB. Inside each while iteration I was creating new instances of all the classes and lists, to be sure that the old ones release their memory, but it didn't help. I also called GC.Collect() at the end of the method, which is not helping either. Please note that the method below triggers lots of parallel Tasks, but they are supposed to be disposed within it, am I right?
private async Task IterateStepsAsync(List<SomeModel> some_list, int step, string dataType)
{
    List<Area> areas = _repo.GetAreas();
    int counter = 0;
    while (counter < some_list.Count)
    {
        _repo = new DataRepository();
        _updates = new HttpUpdates();
        List<Task> tasks = new List<Task>();
        List<VesselModel> vessels = new List<VesselModel>();
        SemaphoreSlim throttler = new SemaphoreSlim(_degreeOfParallelism);
        for (int i = counter; i < step; i++)
        {
            int iteration = i;
            bool skip = false;
            if (dataType == "basic" && (some_list[iteration].Mmsi == 0 || !some_list[iteration].Speed.HasValue)) //if could not be parsed with "full"
                skip = true;
            tasks.Add(Task.Run(async () =>
            {
                string scraped = "";
                await throttler.WaitAsync();
                try
                {
                    if (!skip)
                    {
                        Model model = await _updates.ScrapeSingleModelAsync(some_list[iteration].Mmsi);
                        while (Updating)
                        {
                            await Task.Delay(1000);
                        }
                        if (model != null)
                        {
                            lock (((ICollection)vessels).SyncRoot)
                            {
                                vessels.Add(model);
                                scraped = BuildData(model);
                            }
                        }
                    }
                    else
                    {
                        //do nothing
                    }
                }
                catch (Exception ex)
                {
                    Log("Scrape error: " + ex.Message);
                }
                finally
                {
                    while (Updating)
                    {
                        await Task.Delay(1000);
                    }
                    Console.WriteLine("Updates for " + counter++ + " of " + some_list.Count + scraped);
                    throttler.Release();
                }
            }));
        }
        try
        {
            await Task.WhenAll(tasks);
        }
        catch (Exception ex)
        {
            Log("Critical error: " + ex.Message);
        }
        finally
        {
            _repo.UpdateModels(vessels, dataType, counter, some_list.Count, _step);
            step = step + _step;
            GC.Collect();
        }
    }
}
Inside the method above we call _repo.UpdateModels, which updates the DB. I tried two approaches, using EF Core and SqlConnection, both with similar results. You can find both of them below.
EF Core
internal List<VesselModel> UpdateModels(List<Model> vessels, string dataType, int counter, int total, int _step)
{
    for (int i = 0; i < vessels.Count; i++)
    {
        Console.WriteLine("Parsing " + i + " of " + vessels.Count);
        Model existing = _context.Vessels.Where(v => v.id == vessels[i].Id).FirstOrDefault();
        if (vessels[i].LatestActivity.HasValue)
        {
            existing.LatestActivity = vessels[i].LatestActivity;
        }
        //and similar parsing several times, as above
    }
    Console.WriteLine("Saving ...");
    _context.SaveChanges();
    return new List<VesselModel>(_step);
}
SqlConnection
internal List<VesselModel> UpdateModels(List<Model> vessels, string dataType, int counter, int total, int _step)
{
    if (vessels.Count > 0)
    {
        using (SqlConnection connection = GetConnection(_connectionString))
        using (SqlCommand command = connection.CreateCommand())
        {
            connection.Open();
            StringBuilder querySb = new StringBuilder();
            for (int i = 0; i < vessels.Count; i++)
            {
                Console.WriteLine("Updating " + i + " of " + vessels.Count);
                //PARSE
                VesselAisUpdateModel existing = new VesselAisUpdateModel();
                if (vessels[i].Id > 0)
                {
                    //find existing
                }
                if (existing != null)
                {
                    //update for basic data
                    querySb.Append("UPDATE dbo." + _vesselsTableName + " SET Id = '" + vessels[i].Id + "'");
                    if (existing.Mmsi == 0)
                    {
                        if (vessels[i].MMSI.HasValue)
                        {
                            querySb.Append(" , MMSI = '" + vessels[i].MMSI + "'");
                        }
                    }
                    //and similar parsing several times, as above
                    querySb.Append(" WHERE Id = " + existing.Id + "; ");
                    querySb.AppendLine();
                }
            }
            try
            {
                Console.WriteLine("Sending SQL query to " + counter);
                command.CommandTimeout = 3000;
                command.CommandType = CommandType.Text;
                command.CommandText = querySb.ToString();
                command.ExecuteNonQuery();
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
            finally
            {
                connection.Close();
            }
        }
    }
    return new List<VesselModel>(_step);
}
The main problem is that after tens or hundreds of thousands of updated models, my console application's memory consumption increases continuously, and I have no idea why.
SOLUTION: my problem was inside the ScrapeSingleModelAsync method, where I was using HtmlAgilityPack incorrectly, which I was able to debug thanks to cassandrad.
Your code is messy, with a huge number of different objects with unknown lifetimes. It's hardly possible to figure out the problem just by looking at it.
Consider using profiling tools, for example Visual Studio's Diagnostic Tools; they will help you find out which objects are living too long in the heap. Here is an overview of its functions related to memory profiling. Highly recommended reading.
In short, you need to take two snapshots and look at which objects are taking the most memory. Let's look at a simple example.
int[] first = new int[10000];
Console.WriteLine(first.Length);
int[] second = new int[9999];
Console.WriteLine(second.Length);
Console.ReadKey();
Take the first snapshot after your function has run at least once. In my case, I took the snapshot when the first huge block had been allocated.
After that, let your app keep working for some time so that the difference in memory usage becomes noticeable, then take the second memory snapshot.
You'll notice that another snapshot is added, with info about how big the difference is. To get more specific info, click on one or the other blue label of the latest snapshot to open a comparison of the snapshots.
Following my example, we can see that there is a change in the count of int arrays. By default int[] wasn't visible in the table, so I had to uncheck Just My Code in the filtering options.
So, this is what needs to be done. After you figure out which objects increase in count or size over time, you can locate where these objects are created and optimize that operation.
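A quick programmatic complement to the profiler (a sketch, not from the original answer): forcing a collection before measuring means that any growth you still see is held by live references somewhere.

// Sketch: measure managed memory that survives a full GC around one batch of work.
long before = GC.GetTotalMemory(forceFullCollection: true);
RunOneBatch(); // hypothetical stand-in for one iteration of the update loop
long after = GC.GetTotalMemory(forceFullCollection: true);
Console.WriteLine($"Bytes surviving collection after this batch: {after - before}");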

Optimizing loading data into memory

I need to load data from SQLite into memory, and when the table contains more than 40k entries this process takes a few minutes. The data in the db is encrypted, so I decrypt it while loading it into memory.
Basically, I am:
loading all the data from the table
decrypting the values while they are being read, and adding them to a dictionary
The code:
internal static void LoadInfo(Dictionary<int, InfoMemory> dic)
{
    using (SQLiteConnection con = new SQLiteConnection(conString))
    {
        using (SQLiteCommand cmd = con.CreateCommand())
        {
            cmd.CommandText = "SELECT ID, Version, Code FROM Employee;";
            int ID, version, code;
            try
            {
                con.Open();
                using (SQLiteDataReader rdr = cmd.ExecuteReader())
                {
                    while (rdr.Read())
                    {
                        try { ID = Convert.ToInt32(Decrypt(rdr[0].ToString())); }
                        catch { ID = -1; }
                        try { version = Convert.ToInt32(Decrypt(rdr[1].ToString())); }
                        catch { version = -1; }
                        try { code = Convert.ToInt32(Decrypt(rdr[2].ToString())); }
                        catch { code = -1; }
                        if (ID != -1)
                        {
                            if (!dic.ContainsKey(ID)) dic.Add(ID, new InfoMemory(version, code));
                        }
                    }
                    rdr.Close();
                }
            }
            catch (Exception ex) { Log.Error(ex.Message); }
            finally { con.Close(); GC.Collect(); }
        }
    }
}
How can I make this process faster?
One approach I have tried is loading the encrypted data into memory and decrypting only when needed, but then I consume more memory. Since I am working on mobile devices, I want to keep memory consumption as low as possible.
One thing you can do is, instead of using try/catch:
int id, version, code;
if (!int.TryParse(Decrypt(rdr[0].ToString()), out id))
{
    id = -1;
}
if (!int.TryParse(Decrypt(rdr[1].ToString()), out version))
{
    version = -1;
}
if (!int.TryParse(Decrypt(rdr[2].ToString()), out code))
{
    code = -1;
}
A TryParse that fails simply returns false, whereas throwing and catching an exception for every unparsable value is comparatively very expensive.

Parsing performance of row data from files to SQL Server database

I have the PAF raw data in several files (a list of all addresses in the UK).
My goal is to create a PostCode lookup in our software.
I have created a new database, but there is no need to understand it for the moment.
Let's take one file; its extension is ".c01" and it can be opened with a text editor. The data in this file is in the following format:
0000000123A
With (according to the developer guide) 8 characters for the KEY and 50 characters for the NAME.
This file contains 2,449,652 rows (it's a small one!).
I created a parsing class for this:
private class SerializedBuilding
{
    public int Key { get; set; }
    public string Name { get; set; }
    public bool isValid = false;

    public Building ToBuilding()
    {
        Building b = new Building();
        b.BuildingKey = Key;
        b.BuildingName = Name;
        return b;
    }

    private readonly int KEYLENGTH = 8;
    private readonly int NAMELENGTH = 50;

    public SerializedBuilding(String line)
    {
        string KeyStr = null;
        string Name = null;
        try
        {
            KeyStr = line.Substring(0, KEYLENGTH);
        }
        catch (Exception e)
        {
            Console.WriteLine("error parsing key, line " + line);
            return;
        }
        try
        {
            Name = line.Substring(KEYLENGTH - 1, NAMELENGTH);
        }
        catch (Exception e)
        {
            Console.WriteLine("error parsing name, line " + line);
            return;
        }
        int value;
        if (!Int32.TryParse(KeyStr, out value))
            return;
        if (value == 0 || value == 99999999)
            return;
        this.Name = Name;
        this.Key = value;
        this.isValid = true;
    }
}
I use this method to read the file:
public void start()
{
    AddressDataContext d = new AddressDataContext();
    Count = 0;
    string line;
    // Read the file and process it line by line.
    System.IO.StreamReader file = new System.IO.StreamReader(filename);
    SerializedBuilding sb = null;
    Console.WriteLine("Number of lines detected: " + File.ReadLines(filename).Count());
    while ((line = file.ReadLine()) != null)
    {
        sb = new SerializedBuilding(line);
        if (sb.isValid)
        {
            d.Buildings.InsertOnSubmit(sb.ToBuilding());
            if (Count % 100 == 0)
                d.SubmitChanges();
        }
        Count++;
    }
    d.SubmitChanges();
    file.Close();
    Console.WriteLine("buildings added");
}
I use LINQ to SQL classes to insert data into my database. The connection string is the default one.
This seems to work; I have added 67,200 lines. It then crashed, but my questions are not about that.
My estimates:
33,647,015 rows to parse
Time needed for execution: 13 hours
It's a one-time job (it just needs to be run on my SQL Server and on the client's server later), so I don't really care about performance, but I think it can be interesting to know how it could be improved.
My questions are:
Are ReadLine() and Substring() the most efficient ways to read these huge files?
Can the performance be improved by modifying the connection string?
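As a hedged aside (no answer was included above): the file is currently read twice, once by File.ReadLines(filename).Count() and once line by line. A single streaming pass can parse and count at the same time, and larger batches reduce round-trips to the database. A sketch, reusing the SerializedBuilding class and fields from the question:

public void StartSinglePass()
{
    AddressDataContext d = new AddressDataContext();
    Count = 0;
    // File.ReadLines streams lazily, so the file is only read once.
    foreach (string line in File.ReadLines(filename))
    {
        SerializedBuilding sb = new SerializedBuilding(line);
        if (sb.isValid)
            d.Buildings.InsertOnSubmit(sb.ToBuilding());
        if (++Count % 1000 == 0)  // a larger batch size amortizes SubmitChanges overhead
            d.SubmitChanges();
    }
    d.SubmitChanges();
    Console.WriteLine("Number of lines read: " + Count);
}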

SQLite connection strategies

I have a database that may be on a network drive.
There are two things that I want to achieve:
When the first user connects to it in read-only mode (he doesn't have read-write access to the location, or the database is read-only), other users must use a read-only connection also (even if they have RW access).
When the first user connects to it in RW mode, others cannot connect to the database at all.
I'm using SQLite, and concurrency should not be a problem, as the database should never be used by more than 10 people at the same time.
UPDATE: This is a sample that I'm trying to make work, so I can implement it in the program itself. Almost everything can be changed.
UPDATE: Now that I finally understood what @CL. was telling me, I made it work, and this is the updated code.
using System.Data;
using System.Data.SQLite;
using System.Diagnostics;
using System.Linq;
using System.IO;
using DbSample.Domain;
using DbSample.Infrastructure;
using NHibernate.Linq;
using NHibernate.Util;

namespace DbSample.Console
{
    class Program
    {
        private static bool readOnly;

        static void Main(string[] args)
        {
            IDatabaseContext databaseContext = null;
            databaseContext = new SqliteDatabaseContext(args[1]);

            var connection = LockDbNew(args[1]);
            if (connection == null) return;

            var sessionFactory = databaseContext.CreateSessionFactory();
            if (sessionFactory != null)
            {
                int insertCount = 0;
                int deleteCount = 0;
                while (true)
                {
                    try
                    {
                        using (var session = sessionFactory.OpenSession(connection))
                        {
                            session.FlushMode = NHibernate.FlushMode.Never;
                            var command = session.Connection.CreateCommand();
                            command.CommandText = "PRAGMA locking_mode=EXCLUSIVE";
                            command.ExecuteNonQuery();
                            using (var transaction = session.BeginTransaction(IsolationLevel.ReadCommitted))
                            {
                                bool update = false;
                                bool delete = false;
                                bool read = false;
                                bool readall = false;
                                int op = 0;
                                System.Console.Write("\nMenu of the day:\n1: update\n2: delete\n3: read\n4: read all\n0: EXIT\n\nYour choice: ");
                                op = System.Convert.ToInt32(System.Console.ReadLine());
                                if (op == 1)
                                    update = true;
                                else if (op == 2)
                                    delete = true;
                                else if (op == 3)
                                    read = true;
                                else if (op == 4)
                                    readall = true;
                                else if (op == 0)
                                    break;
                                else System.Console.WriteLine("Invalid choice, please pick an option from the menu.");
                                if (delete)
                                {
                                    System.Console.Write("Enter the ID of the object to delete: ");
                                    var objectToRemove = session.Get<MyObject>(System.Convert.ToInt32(System.Console.ReadLine()));
                                    if (!(objectToRemove == null))
                                    {
                                        session.Delete(objectToRemove);
                                        System.Console.WriteLine("Deleted {0}, ID: {1}", objectToRemove.MyName, objectToRemove.Id);
                                        deleteCount++;
                                    }
                                    else
                                        System.Console.WriteLine("\nObject not present in the database!\n");
                                }
                                else if (update)
                                {
                                    System.Console.Write("How many objects to add/update? ");
                                    int number = System.Convert.ToInt32(System.Console.ReadLine());
                                    number += insertCount;
                                    for (; insertCount < number; insertCount++)
                                    {
                                        var myObject = session.Get<MyObject>(insertCount + 1);
                                        if (myObject == null)
                                        {
                                            myObject = new MyObject
                                            {
                                                MyName = "Object" + insertCount,
                                                IdLegacy = 0,
                                            };
                                            session.Save(myObject);
                                            System.Console.WriteLine("Added {0}, ID: {1}", myObject.MyName, myObject.Id);
                                        }
                                        else
                                        {
                                            session.Update(myObject);
                                            System.Console.WriteLine("Updated {0}, ID: {1}", myObject.MyName, myObject.Id);
                                        }
                                    }
                                }
                                else if (read)
                                {
                                    System.Console.Write("Enter the ID of the object to read: ");
                                    var objectToRead = session.Get<MyObject>(System.Convert.ToInt32(System.Console.ReadLine()));
                                    if (!(objectToRead == null))
                                        System.Console.WriteLine("Got {0}, ID: {1}", objectToRead.MyName, objectToRead.Id);
                                    else
                                        System.Console.WriteLine("\nObject not present in the database!\n");
                                }
                                else if (readall)
                                {
                                    System.Console.Write("How many objects to read? ");
                                    int number = System.Convert.ToInt32(System.Console.ReadLine());
                                    for (int i = 0; i < number; i++)
                                    {
                                        var objectToRead = session.Get<MyObject>(i + 1);
                                        if (!(objectToRead == null))
                                            System.Console.WriteLine("Got {0}, ID: {1}", objectToRead.MyName, objectToRead.Id);
                                        else
                                            System.Console.WriteLine("\nObject not present in the database! ID: {0}\n", i + 1);
                                    }
                                }
                                update = false;
                                delete = false;
                                read = false;
                                readall = false;
                                transaction.Commit();
                            }
                        }
                    }
                    catch (System.Exception)
                    {
                        throw;
                    }
                }
                sessionFactory.Close();
            }
        }

        private static SQLiteConnection LockDbNew(string database)
        {
            var fi = new FileInfo(database);
            if (!fi.Exists)
                return null;
            var builder = new SQLiteConnectionStringBuilder { DefaultTimeout = 1, DataSource = fi.FullName, Version = 3 };
            var connectionStr = builder.ToString();
            var connection = new SQLiteConnection(connectionStr) { DefaultTimeout = 1 };
            var cmd = new SQLiteCommand(connection);
            connection.Open();
            // try to get an exclusive lock on the database
            try
            {
                cmd.CommandText = "PRAGMA locking_mode = EXCLUSIVE; BEGIN EXCLUSIVE; COMMIT;";
                cmd.ExecuteNonQuery();
            }
            // if we can't get the exclusive lock, it could mean 3 things:
            // 1: someone else has locked the database
            // 2: we don't have write access to the database location
            // 3: the database itself is a read-only file
            // So, we try to connect as read-only
            catch (Exception)
            {
                // we try to hold the SHARED lock instead
                try
                {
                    // first we clear the locks
                    cmd.CommandText = "PRAGMA locking_mode = NORMAL";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = "SELECT COUNT(*) FROM MyObject";
                    cmd.ExecuteNonQuery();
                    // then take a SHARED lock that is kept: in EXCLUSIVE locking mode,
                    // the lock acquired by the next read is never released
                    cmd.CommandText = "PRAGMA locking_mode = EXCLUSIVE";
                    cmd.ExecuteNonQuery();
                    cmd.CommandText = "SELECT COUNT(*) FROM MyObject";
                    cmd.ExecuteNonQuery();
                    readOnly = true;
                }
                catch (Exception)
                {
                    // if we can't take an EXCLUSIVE or SHARED lock, someone else has opened
                    // the DB in read-write mode and we can't connect at all
                    connection.Close();
                    return null;
                }
            }
            return connection;
        }
    }
}
Set PRAGMA locking_mode=EXCLUSIVE to prevent SQLite from releasing its locks after a transaction ends.
I don't know if it can be done within the db itself, but in the application:
You can set a global flag (not sure if it's a web or desktop app) recording whether anyone is connected and whether they have write access.
After that, you can check the other clients' state.
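As a hedged addition, not from either answer: System.Data.SQLite also exposes a read-only flag on the connection string, which can complement the locking approach above for the read-only scenario:

// Sketch: open the database explicitly read-only with System.Data.SQLite.
var roBuilder = new SQLiteConnectionStringBuilder
{
    DataSource = fi.FullName, // 'fi' as in LockDbNew above
    Version = 3,
    ReadOnly = true           // this connection will refuse all writes
};
using (var roConnection = new SQLiteConnection(roBuilder.ToString()))
{
    roConnection.Open();
    // ... read-only queries ...
}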

Massive differences in re-execution

I wrote a little C# application that indexes a book and executes a boolean text-retrieval algorithm on the index. The class at the end of the post shows the implementation of both building the index and running the algorithm on it.
The code is called via a GUI button in the following way:
private void Execute_Click(object sender, EventArgs e)
{
    Stopwatch s;
    String output = "-----------------------\r\n";
    String sr = algoChoice.SelectedItem != null ? algoChoice.SelectedItem.ToString() : "";
    switch (sr)
    {
        case "Naive search":
            output += "Naive search\r\n";
            algo = NaiveSearch.GetInstance();
            break;
        case "Boolean retrieval":
            output += "boolean retrieval\r\n";
            algo = BooleanRetrieval.GetInstance();
            break;
        default:
            outputTextbox.Text = outputTextbox.Text + "Choose retrieval-algorithm!\r\n";
            return;
    }
    output += algo.BuildIndex("../../DocumentCollection/PilzFuehrer.txt") + "\r\n";
    postIndexMemory = GC.GetTotalMemory(true);
    s = Stopwatch.StartNew();
    output += algo.Start("../../DocumentCollection/PilzFuehrer.txt", new String[] { "Pilz", "blau", "giftig", "Pilze" });
    s.Stop();
    postQueryMemory = GC.GetTotalMemory(true);
    output += "\r\nTime elapsed:" + s.ElapsedTicks / (double)Stopwatch.Frequency + "\r\n";
    outputTextbox.Text = output + outputTextbox.Text;
}
The first execution of Start(...) takes about 700µs; every rerun takes less than 10µs.
The application is compiled with Visual Studio 2010 and the default 'Debug' build configuration.
I experimented a lot to find the reason for this, including profiling and different implementations, but the effect always stays the same.
I'd be happy if anyone could give me some new ideas about what I should try, or even an explanation.
class BooleanRetrieval : RetrievalAlgorithm
{
    protected static RetrievalAlgorithm theInstance;
    List<String> documentCollection;
    Dictionary<String, BitArray> index;

    private BooleanRetrieval()
        : base("BooleanRetrieval")
    {
    }

    public override String BuildIndex(string filepath)
    {
        documentCollection = new List<string>();
        index = new Dictionary<string, BitArray>();
        documentCollection.Add(filepath);
        for (int i = 0; i < documentCollection.Count; ++i)
        {
            using (StreamReader input = new StreamReader(documentCollection[i]))
            {
                var text = Regex.Split(input.ReadToEnd(), @"\W+").Distinct().ToArray();
                foreach (String wordToIndex in text)
                {
                    if (!index.ContainsKey(wordToIndex))
                    {
                        index.Add(wordToIndex, new BitArray(documentCollection.Count, false));
                    }
                    index[wordToIndex][i] = true;
                }
            }
        }
        return "Indexed " + index.Keys.Count + " words.";
    }

    public override String Start(String filepath, String[] search)
    {
        BitArray tempDecision = new BitArray(documentCollection.Count, true);
        List<String> res = new List<string>();
        foreach (String searchWord in search)
        {
            if (!index.ContainsKey(searchWord))
                return "No documents found!";
            tempDecision.And(index[searchWord]);
        }
        for (int i = 0; i < tempDecision.Count; ++i)
        {
            if (tempDecision[i] == true)
            {
                res.Add(documentCollection[i]);
            }
        }
        return res.Count > 0 ? res[0] : "Empty!";
    }

    public static RetrievalAlgorithm GetInstance()
    {
        Contract.Ensures(Contract.Result<RetrievalAlgorithm>() != null, "result is null.");
        if (theInstance == null)
            theInstance = new BooleanRetrieval();
        theInstance.Executions++;
        return theInstance;
    }
}
Cold/warm start of a .NET application is usually dominated by JIT time and the disk access time needed to load assemblies.
For an application that does a lot of disk IO, the very first access to data on disk will be much slower than a re-run over the same data, because the OS caches disk content (this also applies to assembly loading), provided the data is small enough to fit in the disk cache.
The first run of the task pays for disk IO for assemblies and data, plus JIT time.
A second run of the same task without restarting the application just reads data from the OS memory cache.
A second run of the application reads assemblies from the OS memory cache, but has to JIT again.
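To separate JIT and cache effects from the algorithm itself, a common measurement pattern (a sketch, not from the answer above) is to run the code path once before timing and then average over many runs:

// Sketch: warm up once so JIT compilation and first-touch disk IO do not
// pollute the measurement, then time the steady-state cost over many runs.
algo.Start("../../DocumentCollection/PilzFuehrer.txt", new String[] { "Pilz", "blau", "giftig", "Pilze" });
const int runs = 1000;
var s = Stopwatch.StartNew();
for (int i = 0; i < runs; i++)
    algo.Start("../../DocumentCollection/PilzFuehrer.txt", new String[] { "Pilz", "blau", "giftig", "Pilze" });
s.Stop();
Console.WriteLine("Mean time per query: " + (s.ElapsedTicks / (double)Stopwatch.Frequency / runs * 1e6) + " µs");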
