Confluent.Kafka - Topic Log Compaction

I'm currently building publisher and consumer assets using Confluent.Kafka, and I'm trying to understand whether I need to do anything differently in code. I'm able to create a topic with log compaction enabled, but I do not fully understand how to work with it in C# .NET Core.
My main question: after creating a topic with log compaction enabled, is there anything that must be done IN CODE to use it, or is it all handled under the hood?
If there are code-specific aspects to write, does anyone have an example they can point me to? I've been looking into it for a couple of days, and I find plenty of information on how to create a topic with log compaction enabled (which I've already achieved) but nothing on how that might affect code usage for the producer and consumer.
Any help would be much appreciated.

No, you don't need to change your code to use log compaction; you only need to configure the topic.
The only difference in code is that you can delete the events for a given key by producing a tombstone: a message with that key and a null value. In C# that is simply null.
Make sure you really understand how log compaction works; the Kafka documentation covers it in detail. To activate log compaction, set cleanup.policy=compact when creating the topic. Also consider the other topic configurations that affect how often the topic is compacted and how long tombstones are kept: delete.retention.ms, segment.ms, min.cleanable.dirty.ratio.
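To make the compaction semantics concrete, here is a minimal in-memory sketch of what a compacted topic converges to: the latest value per key, with a null value (tombstone) removing the key. It is only a model. The CompactionModel class and its Compact method are hypothetical names, not part of Confluent.Kafka, and a real broker compacts lazily, so a consumer may still see older values and tombstones for a while.

```csharp
using System;
using System.Collections.Generic;

// In-memory model of the end state of log compaction; not Kafka client code.
public static class CompactionModel
{
    // Replays (key, value) records in log order and returns the latest
    // surviving value per key. A null value is treated as a tombstone.
    public static Dictionary<string, string> Compact(
        IEnumerable<KeyValuePair<string, string>> log)
    {
        var latest = new Dictionary<string, string>();
        foreach (var record in log)
        {
            if (record.Value == null)
                latest.Remove(record.Key);          // tombstone: the key disappears
            else
                latest[record.Key] = record.Value;  // a newer value replaces the older one
        }
        return latest;
    }
}
```

The practical consequence for producer code is that every message needs a meaningful key, because the key is what compaction groups on. Consumer code should be prepared to receive a message whose value is null.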

Related

Might GC.Collect() be warranted in this particular case?

Disclaimer: Yes, I know that the general answer to whether or not to use GC.Collect() is a resounding "NO!". This is the first time in several years of programming that I ever consider using it at all.
Well then, here's the situation: We have developed a C# scripting tool based on the Microsoft.CodeAnalysis.CSharp.Scripting libraries (v3.6.0). It's a WinForms GUI with editor etc., not unlike others out there. We use it for the validation of integrated circuits, meaning that its primary task is interfacing lab equipment such as power supplies, pattern generators, meters and the like. For the communication with said instruments we predominantly rely on National Instruments' VISA framework, albeit not exclusively. Some devices are controlled directly via DLLs from their respective manufacturers. In general, this system is working beautifully and by now it is successfully used by quite a lot of design engineers who do not know the first thing about the intricacies of .NET and C#.
At this point I should explain that the user can simply write a method (i.e. on "top-level") and then execute it. The Roslyn-part behind this is that the input is fed to CSharpScript.Create() and then compiled. The execution of a method is done via Script.ContinueWith("method name"). Inside of such a method the user can construct an object like, say, new VISA("connection string"), which connects to the device and then communicate with the device via this object. Nothing forces him or her to care about disposing the object (i.e. closing the connection).
Now, the problem is this: recently, very sporadic crashes of the GUI application have occurred with no feedback at all from the system - the form just closes and that's it. By trial and error we are currently 99% sure that if all connection objects are explicitly disposed within a method, the crashes do not occur. So, rewriting the method to something like this fixes the problem:
using (var device = new VISA("connection string"))
{
    device.Query("IDN?");
}
The reason why I look into the GC's direction at all is that there is no discernible correlation to any actions from the user. The guys might run such methods for an hour without a problem and then, when scrolling in the editor, when no method is currently being executed, the GUI closes without comment. And that's why I'd like to get some input from people more knowledgeable about Roslyn and the GC:
Are there known issues with this scripting library and GC? (I would very much assume that there aren't)
Since the explicit disposal of objects seems to prevent the issue, might this be one of the extremely scarce situations where the use of GC.Collect() might be warranted? (admittedly, I could not yet test whether that also prevents the problem because of home office)
Any ideas what can cause a .NET application to crash without any kind of feedback and how to obtain more information about such a crash? (the scripting engine is a separate DLL, as are the device drivers; the GUI only handles the graphics)
I am fully aware that this is a rather vague description of the problem with very little source code. This is due to the fact that the application comprises quite a lot of source code and I have no idea what might be relevant here. Also, all namespaces in the above text refer to Microsoft.CodeAnalysis.CSharp.Scripting, except for VISA, which is self-defined. Obviously, I will gladly answer any follow-up questions for getting to the bottom of this.
Thanks in advance.

Short answer: No. It's not only not warranted, it's completely missing the actual issue.
Further explanation: @canton7 instantly hit the nail on the head when writing:
"I'd argue that your application shouldn't crash even if a finalizer does end up being called"
The root issue hid inside a 3rd party DLL in form of an, at the very least, suboptimal implementation of IDisposable. Once I zoomed in on that, it was rather easy to produce a workaround for that.
My original question is so very misguided that I'd like to state the one that I should have asked:
How do I trace a crash of my C# application when my application's logging does not show anything?
This question has been answered comprehensively in a number of posts. In my case, the crash could be seen in the Windows event log.
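Besides the Windows event log, a process-wide handler can capture managed exceptions that nothing else catches, including ones thrown on the finalizer thread. The following is a minimal sketch, assuming a plain text file is an acceptable sink. CrashLog, Format and Install are hypothetical names. A crash inside native driver code may never reach this handler, in which case the event log remains the place to look.

```csharp
using System;
using System.IO;

public static class CrashLog
{
    // Builds the text written for one unhandled exception.
    public static string Format(object exceptionObject, bool isTerminating)
    {
        return $"Unhandled exception (terminating={isTerminating}):" +
               Environment.NewLine + exceptionObject;
    }

    // Call once at startup, before any script or driver code runs.
    public static void Install(string logPath)
    {
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
        {
            // Runs on whichever thread faulted; keep the work here minimal.
            File.AppendAllText(
                logPath,
                Format(e.ExceptionObject, e.IsTerminating) + Environment.NewLine);
        };
    }
}
```

The handler only records the failure; it does not keep the process alive. Its value is that the stack trace usually names the type whose finalizer or Dispose implementation threw.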

Reusing the session of the thread with NHibernate

I know several topics on the subject have been discussed, because I have been reading a lot to try to resolve my issue, but somehow they happen to not fulfill my needs (maybe for the lack of detail). Anyway, if you think some specific 'topic' might be useful, please link it.
I'm developing a desktop application with WPF (and MVVM) and I'm using NHibernate. After researching possible ways to manage my session, I have decided to use the session-per-form approach. This way, I think I can fully use the features of NHibernate like lazy-loading, cache and so on.
As I'm working with a database, I don't want to freeze my UI while I'm loading or saving my entities, so I thought I should use a dedicated thread (in each form, which I think simplifies the development) to handle the database interaction. The problem, though, is how I should 'reuse' the thread (supposing I have a session associated with that thread) to make my 'database calls'.
I think I couldn't use TPL because I'm not guaranteed that the two tasks would run in the same thread (it's not even guaranteed that they will be run in different threads than the invoker)
I would prefer to use session-per-form, as I have seen similar discussions that end by using session-per-conversation or something like that. But anyway, if you find that session-per-conversation would be better, please tell me (and hopefully explain why)
Threads don't provide a way to directly run more than one method, so I think I would have to 'listen' for requests, but I'm still unsure if I really have to do this and how I would 'use' the session (and save it) only inside the thread.
EDIT:
Maybe I'm having this problem because I'm confusing thread-safety with something else.
When the NHibernate documentation says that ISession instances are not thread-safe, does it mean that I will (or could) get into trouble only if two threads attempt to use the session at the same time? In my case, if I use TPL, different threads could use the same session, but I wouldn't perform more than one operation in the same session at the same time. So, would I get into trouble in that situation?

If I may make a suggestion, desktop applications are poorly suited to interact with the database directly. The communication is not encrypted and it's really easy for someone with even the slightest amount of know-how to grab the database password and begin messing with records using a SQL connection and corrupt your database.
It would be better to create a web service with authentication that stands between the desktop application and the database as you could create credentials for each person and every transaction would be forcibly subjected to your various business rules.
This would also take care of your threading issue as you would be able to create HTTP connections on another thread with little to no trouble concerning session management. A cookie value is likely all that would be required and RestSharp makes this fairly trivial.
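For the "dedicated thread that listens for requests" idea raised in the question, a minimal sketch could look like the following. It assumes nothing about NHibernate itself: SingleThreadWorker is a hypothetical helper that runs every queued delegate on one background thread and hands the result back as a Task, so a session created and used only inside those delegates never leaves that thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Runs every queued delegate, in order, on one dedicated background thread.
public sealed class SingleThreadWorker : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public SingleThreadWorker()
    {
        _thread = new Thread(() =>
        {
            // Blocks while the queue is empty; the loop ends after CompleteAdding().
            foreach (var work in _queue.GetConsumingEnumerable())
                work();
        });
        _thread.IsBackground = true;
        _thread.Start();
    }

    // Queues func and returns a Task that completes with its result or exception.
    public Task<T> Run<T>(Func<T> func)
    {
        var tcs = new TaskCompletionSource<T>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        _queue.Add(() =>
        {
            try { tcs.SetResult(func()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task;
    }

    // Stops accepting work, lets queued work finish, then joins the thread.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _thread.Join();
        _queue.Dispose();
    }
}
```

A form could own one worker and call something like await worker.Run(() => session.Get<Customer>(id)), where Customer is a hypothetical entity. One caveat: a lazy-loaded property touched later from the UI thread would still reach the session outside the worker, so such access would also have to go through Run.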

Good remote application logging/monitoring software

I'm sure this has already been done, but Google isn't helping me - I'm getting swamped with answers for similar but different problems:
My boss has asked me to find or build a system that will log uses of our kiosk installations. We build kiosks using Java, native C++, C#, Python and tools like Unity. We saw another company we worked with using a simple system where a POST call with data was logged on a remote site to be checked later. The system allowed the application programmer to decide the contents of the message, and to assign it to either debug or release according to the programmer's wishes.
An example of the log output might be:
[Debug] 28-11-2011 10:10:20 Kiosk1: Pulse
[Debug] 28-11-2011 10:10:25 Kiosk1: Button pressed
[Debug] 28-11-2011 10:10:45 Kiosk1: Widget used
[Debug] 28-11-2011 10:11:20 Kiosk1: Pulse
I looked at log4net/log4j, but that doesn't seem to be compatible with native c++ or python. I'm probably mistaken there :).
Does anyone know of a system that works like this, or that will otherwise be suitable for logging from such diverse languages? If not, I can write my own easily enough. I just don't want to have to support it :)
Regards,
Steve
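The log lines in the example above follow a fixed layout, so every language involved only needs to produce the same string before sending it. A minimal C# sketch of such a formatter, assuming the layout is exactly "[level] dd-MM-yyyy HH:mm:ss kiosk: message" as in the example (KioskLogLine is a hypothetical name):

```csharp
using System;
using System.Globalization;

public static class KioskLogLine
{
    // Produces one line in the layout shown in the question's example output.
    public static string Format(string level, DateTime timestamp, string kiosk, string message)
    {
        // Invariant culture keeps the ':' and '-' separators stable on any machine.
        string stamp = timestamp.ToString("dd-MM-yyyy HH:mm:ss", CultureInfo.InvariantCulture);
        return $"[{level}] {stamp} {kiosk}: {message}";
    }
}
```

The resulting string could serve as the body of the POST call or as a line in a local file, whichever transport ends up being used.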

I'm not sure, but I think what you're looking for is Splunk. It can parse almost every log and display it in a unified manner. It can listen on ports, read log files via polling, and parse and index anything you throw at it at any point in time.
You can use this to set up your own multi-language logging server/system. We've been using it and it works seamlessly in our distributed environment.

While writing a specialized logging backend to handle logging both locally and to the network is quite possible, I would advise against it. The reason is that network latency can be too long, so it either stalls your application, or log messages pile up in a queue if another process/thread does the actual network pushing.
A much simpler solution is to use a little script that is scheduled to run once or a couple of times per day and copies the log file(s) to the remote location.

For C++ I highly recommend Poco logging. It allows you to specify the formatting and log level/output using e.g. a properties file.

The logging library that is included with Python is quite similar to log4net, so if you are used to one, the other will be quite easy to understand, but they do not share code (as far as I know).

Use log4j/log4net with a socket appender or log remotely via rsyslog.

You might be interested in something like web beacons. I know it's not exactly what you're asking for, but you ought to think about it for the same reason that web developers do: it's good to know what users are doing.

A method for high-load web site logging to file?

I need to build in click and conversion tracking (more specific and focused than IIS log files) to an existing web site. I am expecting pretty high load. I have investigated using log4net, specifically the FileAppender Class, but the docs explicitly state: "This type is not safe for multithreaded operations."
Can someone suggest a robust approach for a solution for this type of heavy logging? I really like the flexibility log4net would give me. Can I get around the lack of safe multi-threading using lock? Would this introduce performance/contention concerns?

While FileAppender itself may not be thread-safe, I'd certainly expect the normal access routes to it via log4net to be thread-safe.
From the FAQ:
log4net is thread-safe.
In other words, either the main log4net framework does enough locking, or it has a dedicated logging thread servicing a producer/consumer queue of log messages.
Any logging framework which wasn't thread-safe wouldn't survive for long.
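To illustrate the "enough locking" option, and the lock the question asks about, here is a minimal sketch of a file log whose appends are serialized by a single lock. This is not how log4net is implemented; LockedFileLog is a hypothetical stand-in that only shows the technique.

```csharp
using System;
using System.IO;

// Serializes appends from many threads with one lock object.
public sealed class LockedFileLog
{
    private readonly object _gate = new object();
    private readonly string _path;

    public LockedFileLog(string path)
    {
        _path = path;
    }

    // Safe to call concurrently; only one thread writes at a time.
    public void Append(string line)
    {
        lock (_gate)
        {
            File.AppendAllText(_path, line + Environment.NewLine);
        }
    }
}
```

Every caller waits for the file write while holding the lock, so under heavy load this becomes a contention point. The producer/consumer alternative mentioned above moves the write to a single background thread and lets callers merely enqueue the line.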

You could check out the Logging Application Block available in the Microsoft Enterprise Library. It offers a whole host of different types of loggers, as well as a handy GUI configurator that you can point to your app.config/web.config in order to modify it. So there's no need to sift through the XML yourself.
Here's a link to a nice tutorial on how to get started with it:
http://elegantcode.com/2009/01/20/enterprise-library-logging-101/

I'm also interested in the answer, but I'll tell you what I was told when I tried to find a solution.
An easy way around it would be to use something like an SQL database. If the data you want isn't well suited for that, you could have each page access write its own log file and then periodically merge the log files.
However, I'm sure there's a better solution.

With syslog you won't have any threading issues. Syslog sends the log lines over UDP to a log daemon (which could be on the same machine).
This works especially well if you have several running processes/services, since all log lines are aggregated in one viewing tool.
If you expect really heavy loads, look at how the guys from Facebook do it: http://developers.facebook.com/scribe/ You can use their open-source log tool. I don't think you'll hit their kind of load just yet, so you should be safe for some time to come!
R
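As a rough illustration of the syslog-over-UDP approach, the sketch below builds a simplified BSD-style syslog payload and sends it as a single datagram. SyslogSketch is a hypothetical name, and the payload deliberately omits the timestamp and hostname fields that a complete syslog message carries, so whether a given daemon accepts it as-is is an assumption.

```csharp
using System;
using System.Net.Sockets;
using System.Text;

public static class SyslogSketch
{
    // The priority value is facility * 8 + severity, wrapped in angle brackets.
    public static string BuildPayload(int facility, int severity, string tag, string message)
    {
        int pri = facility * 8 + severity;
        return $"<{pri}>{tag}: {message}";
    }

    // Fire-and-forget: UDP gives no delivery guarantee and no back-pressure.
    public static void Send(string host, int port, string payload)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(payload);
        using (var client = new UdpClient())
        {
            client.Send(bytes, bytes.Length, host, port);
        }
    }
}
```

For example, BuildPayload(16, 6, "clicktracker", "click id=1") uses facility local0 (16) and severity informational (6), and Send("localhost", 514, payload) would target the conventional syslog UDP port. Each call is independent, which is why no shared file handle needs locking.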

Using Performance Counters to track windows services

I'm working with a system that consists of several applications and services, almost all using a SQL database.
Windows services do different things at different times, and I would like to track them. On some deployed systems we see the machine running high on CPU and the SQL process running high, but we can't be sure which service is responsible.
I wonder if Performance Counters are good for this job.
Basically I would like to be able to see at a certain moment which service woke up and is processing something.
It seems to me that I can end up having a perfcounter that only has the value 0 or 1 for each service to show if it is doing something, but this doesn't seem like a normal usage for perfcounters.
Are performance counters suitable?
Do you think I should track this in a different way?

If your monitoring framework/approach already centers around monitoring performance counters, this is a viable approach.
Personally I find more detailed instrumentation necessary to really understand what's happening in my services (though maybe that has to do with the nature of my services).
I use .NET Logging Framework because it's simple and can write to multiple targets including log files, the event log, and a TCP socket (I have a simple monitor that listens on the logging socket for each app server and shows me in real-time what's happening).

Performance Counters are attractive because they are really lightweight, but as you say, they only allow you to capture numeric values. Sure, there's a slew of different types of values you can record, such as averages, deltas and totals, but they have to be numbers.
If you need more information than that, you must resort to some other type of instrumentation. In your case, it sounds like your need goes more in that direction.
If your services don't wake up and suspend themselves too often, an informational message written to the event log might be a good idea. Create a custom event log for the application if you expect a fair amount of these, so as not to flood the regular Application event log.
The .NET Trace API will be a better option if you expect the instrumentation to generate too much data for the normal event log. You can configure your application(s) to trace or not based on app/web.config, although a change will require a restart of the app. This is a good option if you only wish to use the instrumentation for troubleshooting, but it otherwise generates too much data or if tracing itself degrades performance too much. Another good thing about the Tracing API is that you can Trace on multiple levels, so even if you have written code to Trace very verbosely, you will only see that verbose trace data if you enable verbose tracing. That gives you better control of just what is being traced.
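The level-based filtering described above can be sketched with System.Diagnostics.TraceSource. In this minimal example the level is set in code rather than in app/web.config, and MemoryListener, ServiceTracing and the source name "MyService" are hypothetical. It also assumes the TRACE symbol is defined at compile time, since the TraceSource calls are compiled in only when it is.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Keeps traced messages in memory so the effect of the level is easy to observe.
public sealed class MemoryListener : TraceListener
{
    public List<string> Messages { get; } = new List<string>();

    public override void TraceEvent(TraceEventCache eventCache, string source,
        TraceEventType eventType, int id, string message)
    {
        Messages.Add($"{eventType}: {message}");
    }

    // Required overrides; unused because TraceEvent is handled above.
    public override void Write(string message) { }
    public override void WriteLine(string message) { }
}

public static class ServiceTracing
{
    // Emits one Information and one Verbose event; returns what got through.
    public static List<string> Demo(SourceLevels level)
    {
        var listener = new MemoryListener();
        var source = new TraceSource("MyService");
        source.Switch.Level = level;   // events below this level are dropped
        source.Listeners.Clear();      // remove the default listener
        source.Listeners.Add(listener);

        source.TraceEvent(TraceEventType.Information, 1, "woke up");
        source.TraceEvent(TraceEventType.Verbose, 2, "processed row 1");
        return listener.Messages;
    }
}
```

With SourceLevels.Information only the "woke up" event is recorded; with SourceLevels.Verbose both are. The verbose calls can therefore stay in the code and cost little until verbose tracing is switched on.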

Eric J has a good point. I think if you really want to capture "timing" performance, you'll have to use some other sort of logging and log start and stop times. I personally like log4net, though it can be a pain to configure the first time around.
