Cassandra C# driver memory leak

Using cassandra .net driver we are facing the following issue:
When inserting a lot of rows with parameterized INSERTs, the application memory usage continuously grows:
class Program
{
    static Cluster cluster = Cluster.Builder()
        .AddContactPoints(ConfigurationManager.AppSettings["address"])
        .Build();

    static Session session = cluster
        .Connect(ConfigurationManager.AppSettings["keyspace"]);

    static int counter = 0;

    static void Main(string[] args)
    {
        for (int i = 0; i < 50; i++)
        {
            new Thread(() =>
            {
                while (true)
                {
                    new Person()
                    {
                        Name = Interlocked.Increment(ref counter).ToString(),
                        ID = Guid.NewGuid(),
                        Data = new byte[4096],
                    }.Save(session);
                }
            }).Start();
        }
        Console.ReadLine();
    }
}
class Person
{
    public Guid ID { get; set; }
    public string Name { get; set; }
    public byte[] Data { get; set; }

    public void Save(Session session)
    {
        Stopwatch w = Stopwatch.StartNew();
        session.Execute(session.Prepare(
            "INSERT INTO Person(id, name, data) VALUES(?, ?, ?);")
            .Bind(this.ID, this.Name, this.Data));
        Console.WriteLine("{1} saved in {0} ms",
            w.Elapsed.TotalMilliseconds, this.Name);
    }
}
According to the created memory dump the managed heap contains a huge amount of small byte arrays (most of them being generation 2), which could be traced back to the cassandra driver's byte conversion methods (InvConvert*) in the internal TypeInterpreter class.
Do you have any advice or ideas how we could get rid of this issue?

For anyone else who runs into this: I was having memory problems when creating lots of Cassandra.ISession instances, even though I was disposing each one correctly via a using statement. Changing my code to reuse a single ISession seems to have fixed it. I don't know if this is the best solution.
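As a sketch of that approach (assuming the DataStax C# driver; the contact point and keyspace name here are illustrative), the cluster and session can be built lazily once and shared across the whole application:

```csharp
using System;
using Cassandra;

// Illustrative sketch: build one Cluster/ISession pair lazily and share it
// application-wide instead of creating a session per operation.
public static class CassandraSessionProvider
{
    private static readonly Lazy<ISession> _session = new Lazy<ISession>(() =>
        Cluster.Builder()
               .AddContactPoint("127.0.0.1")   // assumption: contact point from config
               .Build()
               .Connect("my_keyspace"));       // assumption: keyspace name

    public static ISession Session => _session.Value;
}
```

Sessions in the DataStax driver are designed to be long-lived and thread-safe, so one session per keyspace per application is the usual pattern.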

You are creating the prepared statement for every record you insert. Try to prepare the statement once, when you create the session, and reuse the resulting PreparedStatement for every insert.
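A sketch of that fix applied to the Save method from the question (assuming the static session from Program is reachable where the field is initialized):

```csharp
// Prepare the statement once; PreparedStatement is thread-safe, so a single
// shared instance can serve all worker threads.
static readonly PreparedStatement InsertPerson = session.Prepare(
    "INSERT INTO Person(id, name, data) VALUES(?, ?, ?);");

public void Save(Session session)
{
    Stopwatch w = Stopwatch.StartNew();
    // Bind only creates a lightweight BoundStatement; no re-preparation happens here.
    session.Execute(InsertPerson.Bind(this.ID, this.Name, this.Data));
    Console.WriteLine("{1} saved in {0} ms",
        w.Elapsed.TotalMilliseconds, this.Name);
}
```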

Try to use Dispose where possible; perhaps the garbage collector cannot clear the objects otherwise.
You should also write to the developers of the Cassandra driver. It looks like a bug to me.
Does this help you?

Related

How to insert 1M models in SQL

I have a file with 1 million lines. I read these lines and insert them into MS SQL. The reading operation takes about a second, but the insertion does not perform well (time: 00:03:36.1424842).
public async Task<int> InsertAsync(List<Model> models)
{
    var _connectionString =
        "Data Source=(localdb)\\MSSQLLocalDB;Initial Catalog=test;Integrated Security=True;Connect Timeout=30;Encrypt=False;TrustServerCertificate=False;ApplicationIntent=ReadWrite;MultiSubnetFailover=False";
    var result = 0;
    try
    {
        using (var sqlBulk = new SqlBulkCopy(_connectionString))
        {
            sqlBulk.BatchSize = 10000;
            sqlBulk.DestinationTableName = "Counterparty";
            var dt = DataTableHelpers.ListToDataTable(models);
            sqlBulk.WriteToServer(dt);
        }
    }
    catch (Exception e)
    {
        _logger.Debug($"{e.Message} >>> {e.StackTrace}");
    }
    return result;
}
My models:
public class Model
{
public int Id { get; set; }
public string Name { get; set; }
public string Comment { get; set; }
public string Address { get; set; }
public string Phone { get; set; }
public bool IsActive { get; set; }
}
and file lines:
TestIsert703,Comment694,Adress694,816,1
TestIsert704,Comment695,Adress695,817,1
I tried changing sqlBulk.BatchSize, but it did not help. How can I insert with better performance? Can I somehow use Parallel.For? Resource usage on the laptop is low (about 1 GB of RAM in use) and the CPU is mostly idle.
Are you complaining about inserting a million records in three and a half minutes? You're getting more than 4,500 records per second!
If you really need to speed this up: since CPU and RAM use are low, I bet at least some of the time is spent in the ListToDataTable() method. You might reduce the time by splitting up this part of the work to take more advantage of the hardware.
On the SQL Server side, you can improve this job by switching the recovery model from FULL to SIMPLE (or BULK_LOGGED), but that's not something I'd want to do all the time. I also see this is a LocalDB instance. Does SQL Server have access to enough RAM on the system? That can make a huge difference.
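As a sketch of the splitting idea (reusing the Model class, connection string, and DataTableHelpers.ListToDataTable helper from the question), the list can be partitioned into chunks with each chunk bulk-copied on its own connection:

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using System.Threading.Tasks;

public static class BulkInsertSketch
{
    // Illustrative: split the models into chunks and run one SqlBulkCopy per
    // chunk in parallel. Each SqlBulkCopy needs its own connection.
    public static async Task InsertChunkedAsync(
        List<Model> models, string connectionString, int chunkSize = 100000)
    {
        var chunks = models
            .Select((m, i) => new { m, i })
            .GroupBy(x => x.i / chunkSize, x => x.m);

        var tasks = chunks.Select(async chunk =>
        {
            using (var sqlBulk = new SqlBulkCopy(connectionString))
            {
                sqlBulk.BatchSize = 10000;
                sqlBulk.DestinationTableName = "Counterparty";
                // ListToDataTable is the helper from the question; it now only
                // converts one chunk at a time.
                var dt = DataTableHelpers.ListToDataTable(chunk.ToList());
                await sqlBulk.WriteToServerAsync(dt);
            }
        });

        await Task.WhenAll(tasks);
    }
}
```

Whether this actually helps depends on where the time goes; measure ListToDataTable separately before parallelizing, and avoid the TableLock option so the parallel copies don't serialize on a table lock.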

C# Dynamically create class instance

I am in a situation where "I don't know what I don't know" so I am unsure if this is even the correct way of approaching this problem, apologies if this comes off as plain ignorant.
I have a program which connects to Ethernet controllers. The program allows users to configure what is connected to the system and set up I/O communication.
Each controller is its own device, and may have different IO depending on what model it is. Controllers have their own API.
The program saves the configuration to an XML config file which is read at startup. I need to then connect to each unknown device and set up a connection to each, leaving me with a method of referring to each device at a later time.
Here is what I am trying to achieve:
using Brainboxes.IO;

public class BrainBoxes
{
    public string[] Devices = new string[] { "192.168.16.147", "192.168.16.148", "192.168.16.149", "192.168.16.150" };
    List<string> EDDeviceList = new List<string>();

    public BrainBoxes() // set up devices and connections to all devices connected in the constructor
    {
        foreach (string Device in Devices)
        {
            EDDevice BB400 = EDDevice.Create("192.168.16.147");
            // BB400 is a typical name but how do I make this dynamic at the same time making it
            // available for other members of the class?
            EDDeviceList.Add(BB400); // add the device to a list to refer to later in the constructor
        }
        for (int i = 0; i < EDDeviceList.Count - 1; i++) { BB400.Connect(); } // connect to each device in sequence.
    }

    public void Outputs(int Relay)
    {
        // this would be a switch statement
        BB400.Outputs[Relay].Value = 1;
        Thread.Sleep(75);
        BB400.Outputs[Relay].Value = 0;
    }

    ~BrainBoxes()
    {
        BB400.Disconnect();
    }
}
It sounds like you're trying to do quite a few things at once. To paraphrase what you want to achieve (looking at your question, your sample code, and your comment):
When your application starts, you want it to connect to a collection of different devices automatically
When running, users can connect to and configure (the right) device
Ensure that connections are closed when the application stops
Also, your question is rather open-ended, and from your first statement I'm going to assume that you're a beginner. I know it's quite dry, but you are going to have to look up the documentation for the hardware you're using. Luckily, it looks quite comprehensive.
You need to give your class a more representative name. E.g. BrainboxController or BrainboxManager as, by the sounds of it, that is what it's for.
It looks like the BB400 is one of the possible hardware devices and part of an inheritance hierarchy, so you don't want to restrict yourself to just that type
I would avoid doing a lot of work in the constructor, it makes it harder to find problems
Use a dictionary to store your devices, that's how you'll "refer to each device at a later time"
public class BrainboxController : IDisposable
{
    private readonly HashSet<string> _deviceIps; // potentially you can get away without having this if you call InitialiseDevices() in the constructor
    private Dictionary<string, EDDevice> _devices = new Dictionary<string, EDDevice>(); // possibly use IDevice<C, P> instead of EDDevice

    public BrainboxController(IEnumerable<string> devices)
    {
        _deviceIps = new HashSet<string>(devices);
    }

    public void InitialiseDevices()
    {
        foreach (string ip in _deviceIps)
            _devices.Add(ip, EDDevice.Create(ip));
    }

    public void AddDevice(string ip)
    {
        if (_deviceIps.Add(ip))
            _devices.Add(ip, EDDevice.Create(ip));
    }

    public void RemoveDevice(string ip)
    {
        if (_devices.ContainsKey(ip))
        {
            var device = _devices[ip];
            device.Disconnect();
            device.Dispose();
            _devices.Remove(ip);
            _deviceIps.Remove(ip);
        }
    }

    public EDDevice GetDevice(string deviceIp)
    {
        if (_devices.ContainsKey(deviceIp))
            return _devices[deviceIp];
        return null;
    }

    public string GetConfiguration(string deviceIp)
    {
        if (_devices.ContainsKey(deviceIp))
            return _devices[deviceIp].Describe(); // I'm assuming that this gets the config data
        return "Device not found";
    }

    public bool SetConfiguration(string deviceIp, string xml)
    {
        if (_devices.ContainsKey(deviceIp))
        {
            _devices[deviceIp].SendCommand(xml); // I'm assuming this is how the config data is set
            return true;
        }
        // log device not found
        return false;
    }

    public IOList<IOLine> GetOutputs(string deviceIp, int relay)
    {
        if (_devices.ContainsKey(deviceIp))
            return _devices[deviceIp].Outputs[relay];
        // log device not found
        return new IOList<IOLine>();
    }

    public void Dispose()
    {
        foreach (var device in _devices.Values)
        {
            device.Disconnect();
            device.Dispose();
        }
    }
}
Strictly speaking, if you follow the single responsibility principle, this class should just be managing your devices and their connections. The methods GetConfiguration(), SetConfiguration() and GetOutputs() are shown as examples and really should live somewhere else.
Your calling code could look like this (without dependency injection):
var deviceAddresses = new[] { "192.168.16.147", "192.168.16.148", "192.168.16.149", "192.168.16.150" };
var controller = new BrainboxController(deviceAddresses);
controller.InitialiseDevices();
var currentDevice = controller.GetDevice("192.168.16.147");
// do something with currentDevice
Finally, whatever it is you're trying to do with your Outputs method, that looks like business logic, and it too should live somewhere else.
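For example, the relay-pulsing logic from the original Outputs method could move into its own small class that takes the controller as a dependency (a sketch; the 75 ms pulse width is carried over from the question's code):

```csharp
using System.Threading;

// Illustrative: business logic kept separate from device management.
public class RelayService
{
    private readonly BrainboxController _controller;

    public RelayService(BrainboxController controller)
    {
        _controller = controller;
    }

    // Pulse a relay on one device: set it high, wait 75 ms, set it low.
    public void PulseRelay(string deviceIp, int relay)
    {
        var device = _controller.GetDevice(deviceIp);
        if (device == null)
            return; // log device not found

        device.Outputs[relay].Value = 1;
        Thread.Sleep(75);
        device.Outputs[relay].Value = 0;
    }
}
```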

How to serialize a .NET class to Avro.Generic.GenericRecord for publishing to a Kafka topic?

I am trying to find a way/helper to convert a .NET class to Avro.Generic.GenericRecord. Currently, I am manually adding field names and values to the generic record. Is there a serializer/converter I can use to convert the object to a generic record and publish it to a Kafka topic?
class Plant
{
public long Id { get; set; }
public string Name { get; set; }
public List<PlantProperties> PlantProperties{ get; set; }
}
class PlantProperties
{
public long Leaves{ get; set; }
public string Color{ get; set; }
}
Please suggest.
Assuming you are using the Confluent Schema Registry, you can use their .NET client [1]:
https://github.com/confluentinc/confluent-kafka-dotnet
Copied from the examples folder
using (var serdeProvider = new AvroSerdeProvider(avroConfig))
using (var producer = new Producer<string, GenericRecord>(producerConfig,
    serdeProvider.GetSerializerGenerator<string>(),
    serdeProvider.GetSerializerGenerator<GenericRecord>()))
{
    Console.WriteLine($"{producer.Name} producing on {topicName}. Enter user names, q to exit.");

    int i = 0;
    string text;
    while ((text = Console.ReadLine()) != "q")
    {
        var record = new GenericRecord(s); // s is the Avro RecordSchema declared earlier in the example
        record.Add("name", text);
        record.Add("favorite_number", i++);
        record.Add("favorite_color", "blue");
        producer
            .ProduceAsync(topicName, new Message<string, GenericRecord> { Key = text, Value = record })
            .ContinueWith(task => task.IsFaulted
                ? $"error producing message: {task.Exception.Message}"
                : $"produced to: {task.Result.TopicPartitionOffset}");
    }
}
In your case, update the record.Add calls accordingly.
However, since you have a class, you should try to use SpecificRecord rather than serializing back and forth between Avro and a .NET class via a GenericRecord. See the README section on the AvroGen tool for examples of this.
[1] I'm not aware of an alternative .NET library.
Below are the steps I took to solve the problem using the suggestion from @cricket_007.
To avoid the complexity of writing the Avro schema by hand, create the C# classes first, then use AvroSerializer to generate the schema:
AvroSerializer.Create<MYClass>().WriterSchema.ToString()
This will generate the schema JSON for the class.
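A sketch of that schema-generation step, assuming the Microsoft.Hadoop.Avro package (which infers the schema from data-contract attributes; the Plant class is the one from the question, trimmed):

```csharp
using System;
using System.Runtime.Serialization;
using Microsoft.Hadoop.Avro;

[DataContract]
class Plant
{
    [DataMember] public long Id { get; set; }
    [DataMember] public string Name { get; set; }
}

class SchemaDump
{
    static void Main()
    {
        // Reflection-based serializer for the class; WriterSchema is the
        // inferred Avro schema, printed here as JSON for saving to a file.
        var serializer = AvroSerializer.Create<Plant>();
        Console.WriteLine(serializer.WriterSchema.ToString());
    }
}
```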
Move it to a schema file and make all the types nullable where required.
Then use the avrogen.exe tool to regenerate the class files, which implement ISpecificRecord.
Then I used the code below to publish to the topic:
using (var serdeProvider = new AvroSerdeProvider(avroConfig))
using (var producer = new Producer<string, MYClass>(producerConfig,
    serdeProvider.GetSerializerGenerator<string>(),
    serdeProvider.GetSerializerGenerator<MYClass>()))
{
    Console.WriteLine($"{producer.Name} producing on {_appSettings.PullListKafka.Topic}.");
    producer.ProduceAsync(_appSettings.PullListKafka.Topic,
        new Message<string, MYClass> { Key = Guid.NewGuid().ToString(), Value = MYClassObject })
        .ContinueWith(task => task.IsFaulted
            ? $"error producing message: {task.Exception.Message}"
            : $"produced to: {task.Result.TopicPartitionOffset}");
}
Some links that helped:
https://shanidgafur.github.io/blog/apache-avro-on-dotnet
https://github.com/SidShetye/HelloAvro/tree/master/Avro

Fire event on database datetime field

I have a table in the database like this:
What is the best way to trigger an event somewhere (in the SQL Server database or in the C# application) at the time stored in the event field of the table?
Edit:
Traditionally I would have done something like this:
while (true)
{
    DataTable tbl = getRows("select * from table where event = '" + DateTime.Now + "'");
    if (tbl.Rows.Count > 0)
    {
        // do something
    }
    Thread.Sleep(1000);
}
Is there a more efficient way to achieve this?
(I don't want to poll the database periodically.)
I would go like this:
public class Notify
{
    public int ID { get; set; }
    public string Name { get; set; }
    public DateTime Time { get; set; }
}

public Timer tData = new Timer();
public List<Notify> Notifications = new List<Notify>();

public void Main()
{
    // check every second
    tData.Interval = 1000;
    tData.Tick += new EventHandler(CheckEvents);
    tData.Start();
}

public void GetAllEvents()
{
    DataTable results = new DataTable();
    /* results = YourDatabase(Select id, name, event FROM...); */
    Notifications.Clear();
    foreach (DataRow row in results.Rows)
    {
        Notifications.Add
        (
            new Notify
            {
                ID = int.Parse(row[0].ToString()),
                Name = row[1].ToString(),
                Time = DateTime.Parse(row[2].ToString())
            }
        );
    }
}

public void CheckEvents(object sender, EventArgs e)
{
    // Use <= rather than ==: an exact DateTime comparison would almost never match.
    List<Notify> eventsElapsed = Notifications.Where(notify => notify.Time <= DateTime.Now).ToList();
    foreach (Notify notify in eventsElapsed)
    {
        // Send your sms
        var id = notify.ID;
        var name = notify.Name;
        var time = notify.Time;
        Notifications.Remove(notify); // don't fire the same event again on the next tick
    }
}
To-Do's:
Care for the correct format when parsing the DateTime out of your database; it might look different.
You would have to think about how to get new events. I would do it either with a button click or by setting up another timer with an interval of around 5-10 minutes that simply calls GetAllEvents().
I'm assuming you are not inserting new events whose time falls within the next 30 seconds.
Another more advanced way:
You could also set up two apps for this.
First app:
Get all Events from Database
Optionally create an UI that allows to filter which events should be get
Link every event to Scheduled Tasks with start arguments like secondApp.exe "2016-06-26 10:13:56".
Second app:
On startup, fetch the passed arguments and send the SMS.
If your table has lots of data, it's not a bad idea to split the time-consuming process of hooking up events (first app) from the simple process of sending out an SMS (second app).
Don't mess with triggers at the database for this purpose.
In C# you write the code for your logic, the thing that needs to be done. Next, designate an object which stores the input parameters plus the time to start this logic. E.g. in an SMS system we need the subscriber number, the message, and the time of sending.
At your system startup (for example) you read the data to obtain the times. Start some threads and make them wait (Sleep) as long as needed, then execute the target method. Something like this:
class SmsDetails
{
    public string Subscriber { get; set; }
    public string Message { get; set; }
    public DateTime SendOn { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        SqlDataReader schedule = null; // Initialize the reader as appropriate.
        while (schedule.Read())
        {
            var det = new SmsDetails();
            //.
            //.
            //.
            det.SendOn = schedule.GetDateTime(2);
            ThreadPool.QueueUserWorkItem(ScheduleSending, det);
        }
    }

    static void ScheduleSending(object Details)
    {
        var smsd = Details as SmsDetails;
        if (smsd.SendOn > DateTime.Now)
        {
            var waitInterval = smsd.SendOn - DateTime.Now; // time remaining until the send moment
            Thread.Sleep(waitInterval);
            SendSms(smsd.Subscriber, smsd.Message);
        }
    }

    static void SendSms(string PhoneNumber, string Message)
    {
        // Send it out
    }
}
Of course this is not the complete code for such a solution, but I hope you get the idea. You need to take care to signal (in the DB?) whether the message was sent. You can also employ wait handles to interrupt the threads from the calling code.
Depending on the volume of records in the reader, you may want to poll the DB only for the messages due in the next 1, 6, or 12 hours, and so on. E.g. if you need to dispatch 10,000 messages over the next 3 days, don't schedule them all now; having too many threads can degrade performance.
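One way to avoid parking a thread per message (a sketch, not the answer's exact design) is a single worker that always sleeps until the earliest pending send, assuming the rows are read from the database in send-time order (ORDER BY SendOn):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Illustrative: one background thread serves all scheduled items by sleeping
// until the earliest due time instead of one blocked thread per message.
class SmsScheduler
{
    private readonly ConcurrentQueue<(DateTime SendOn, string Subscriber, string Message)> _queue
        = new ConcurrentQueue<(DateTime, string, string)>();

    // Enqueue in SendOn order (e.g. straight from a reader ordered by SendOn).
    public void Schedule(DateTime sendOn, string subscriber, string message)
        => _queue.Enqueue((sendOn, subscriber, message));

    public void Run()
    {
        while (true)
        {
            if (_queue.TryDequeue(out var item))
            {
                var wait = item.SendOn - DateTime.Now;
                if (wait > TimeSpan.Zero)
                    Thread.Sleep(wait);     // sleep until this message is due
                SendSms(item.Subscriber, item.Message);
            }
            else
            {
                Thread.Sleep(1000);         // nothing pending; check again shortly
            }
        }
    }

    private void SendSms(string subscriber, string message) { /* send it out */ }
}
```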

Is Redis a viable solution for a local cache?

In my scenario, I have a Winforms client that connects to WebApi2. The data is stored in a SQL Server database.
To speed up performance, I am researching if storing data in local cache is a viable solution. Preferably, the local cache should be stored in files instead of kept in-memory as RAM might be an issue. The data is all POCO classes, some being much more complex than others, and most classes being related to each other.
I have made a shortlist of which frameworks might be viable:
MemoryCache
MemCached
CacheManager
StackExchange.Redis
Local Database
Using MemoryCache, I would need to implement my own solution, but it will fit my initial requirements.
However, one common problem that I am seeing is the updating of related classes. For example, I have a relationship between CustomerAddress and PostCode. If I change some properties in a postcode object, I can easily update its local cache. But how is it possible to update/invalidate any other classes that use this postcode, in this case CustomerAddress?
Does any of the frameworks above have methods that help in this kind of situation, or is it totally dependent on the developer to handle such cache invalidation?
The CachingFramework.Redis library provides a mechanism to relate tags to keys and hashes so you can then invalidate them in a single operation.
I'm assuming that you will:
Store the Customer Addresses in Redis with keys like "Address:{AddressId}".
Store the Post Codes in Redis with keys like "PostCode:{PostCodeId}".
And that your model is something like this:
public class CustomerAddress
{
    public int CustomerAddressId { get; set; }
    public int CustomerId { get; set; }
    public int PostCodeId { get; set; }
}

public class PostCode
{
    public int PostCodeId { get; set; }
    public string Code { get; set; }
}
My suggestion is to:
Mark the Customer Addresses objects on Redis with tags like "Tag-PostCode:{PostCodeId}".
Use a cache-aside pattern to retrieve the Customer Addresses and Post Codes from cache/database.
Invalidate the cache objects by tag when a Post Code is changed.
Something like this should probably work:
public class DataAccess
{
    private Context _cacheContext = new CachingFramework.Redis.Context("localhost:6379");

    private string FormatPostCodeKey(int postCodeId)
    {
        return string.Format("PostCode:{0}", postCodeId);
    }

    private string FormatPostCodeTag(int postCodeId)
    {
        return string.Format("Tag-PostCode:{0}", postCodeId);
    }

    private string FormatAddressKey(int customerAddressId)
    {
        return string.Format("Address:{0}", customerAddressId);
    }

    public void InsertPostCode(PostCode postCode)
    {
        Sql.InsertPostCode(postCode);
    }

    public void UpdatePostCode(PostCode postCode)
    {
        Sql.UpdatePostCode(postCode);
        // Invalidate cache: remove the PostCode and related CustomerAddresses
        _cacheContext.Cache.InvalidateKeysByTag(FormatPostCodeTag(postCode.PostCodeId));
    }

    public void DeletePostCode(int postCodeId)
    {
        Sql.DeletePostCode(postCodeId);
        _cacheContext.Cache.InvalidateKeysByTag(FormatPostCodeTag(postCodeId));
    }

    public PostCode GetPostCode(int postCodeId)
    {
        // Get/Insert the postcode from/into Cache with key = PostCode:{PostCodeId}.
        // Mark the object with tag = Tag-PostCode:{PostCodeId}
        return _cacheContext.Cache.FetchObject(
            FormatPostCodeKey(postCodeId),            // Redis key to use
            () => Sql.GetPostCode(postCodeId),        // Delegate to get the value from the database
            new[] { FormatPostCodeTag(postCodeId) }); // Related tags
    }

    public void InsertCustomerAddress(CustomerAddress customerAddress)
    {
        Sql.InsertCustomerAddress(customerAddress);
    }

    public void UpdateCustomerAddress(CustomerAddress customerAddress)
    {
        var updated = Sql.UpdateCustomerAddress(customerAddress);
        if (updated.PostCodeId != customerAddress.PostCodeId)
        {
            var addressKey = FormatAddressKey(customerAddress.CustomerAddressId);
            _cacheContext.Cache.RenameTagForKey(addressKey,
                FormatPostCodeTag(customerAddress.PostCodeId),
                FormatPostCodeTag(updated.PostCodeId));
        }
    }

    public void DeleteCustomerAddress(CustomerAddress customerAddress)
    {
        Sql.DeleteCustomerAddress(customerAddress.CustomerAddressId);
        // Clean-up: remove the postcode tag from the CustomerAddress
        _cacheContext.Cache.RemoveTagsFromKey(FormatAddressKey(customerAddress.CustomerAddressId),
            new[] { FormatPostCodeTag(customerAddress.PostCodeId) });
    }

    public CustomerAddress GetCustomerAddress(int customerAddressId)
    {
        // Get/Insert the address from/into Cache with key = Address:{CustomerAddressId}.
        // Mark the object with tag = Tag-PostCode:{PostCodeId}
        return _cacheContext.Cache.FetchObject(
            FormatAddressKey(customerAddressId),
            () => Sql.GetCustomerAddress(customerAddressId),
            a => new[] { FormatPostCodeTag(a.PostCodeId) });
    }
}
To speed up performance, I am researching if storing data in local cache is a viable solution. Preferably, the local cache should be stored in files instead of kept in-memory as RAM might be an issue
The whole point is to avoid storing data in files, precisely to avoid slow disk operations; that is why Redis keeps its data in RAM.
Does any of the frameworks above have methods that help in this kind of situation, or is it totally dependent on the developer to handle such cache invalidation?
You can save the entire object as JSON instead of writing logic that disassembles the objects, which would also be slow and error-prone when applying changes.
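A sketch of that whole-object approach, assuming StackExchange.Redis and Newtonsoft.Json (the key format and expiry are illustrative, and PostCode is the model from the accepted answer):

```csharp
using System;
using Newtonsoft.Json;
using StackExchange.Redis;

public class PostCodeCache
{
    private static readonly ConnectionMultiplexer _redis =
        ConnectionMultiplexer.Connect("localhost:6379");
    private readonly IDatabase _db = _redis.GetDatabase();

    // Store the whole object as one JSON string under a single key.
    public void Set(PostCode postCode)
    {
        _db.StringSet("PostCode:" + postCode.PostCodeId,
                      JsonConvert.SerializeObject(postCode),
                      TimeSpan.FromMinutes(30)); // illustrative expiry
    }

    // Read it back and deserialize; returns null on a cache miss.
    public PostCode Get(int postCodeId)
    {
        RedisValue json = _db.StringGet("PostCode:" + postCodeId);
        return json.HasValue ? JsonConvert.DeserializeObject<PostCode>(json) : null;
    }
}
```

The trade-off is that a change to any property rewrites the whole value, but there is no field-mapping logic to keep in sync with the model.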
