SslStream responds differently when accessed as COM object - c#

I am working on a project where the bulk of the code is C++. The shop is migrating to C# in the long run, so where possible we are writing new code in C# and exposing it to C++ via COM.
I have wrapped a System.Net.Security.SslStream and a little bit of other functionality in a COM object that is intended to send and then receive messages. When calling the functions on this class from a test C# program I am able to send and receive messages without issue. Making the same calls from C++, where they are exposed via COM, seems to work as well, except that I cannot receive data immediately after I send.
In the C# test, data is sent and received quickly and I get the responses I expect. From C++ I always get timeout errors. The data passed into the functions that call Read is identical in the C# test and in the C++ program. Much of the state in the SslStream is identical too. Just after the Write call and before the Read, the only differences appear to be several handles and what appear to be memory addresses, which I assume are not important. At the same point I noticed that inside an item called 'base', then inside an item called 'InnerStream', then inside 'System.Net.Sockets.NetworkStream', there is a property called 'DataAvailable'. This is true in the C# test where it works and false in the C++ program where it fails. I am not aware of any meaningful difference between these projects beyond what I have described.
I can provide further details about troubleshooting or snippets of code. I have not included code here because just the pieces related to the problem would still be immense. I hope there is some kind of magic answer as to what is going on, though the error is almost certainly deep and very specific. I would appreciate anything that provides insight on further troubleshooting steps.
What kinds of complications does calling C# through COM impose that I may not have taken into account?
Where did the other side's message go if not into the SslStream buffer?
Where should I be looking?

I have discovered the answer, and it is unrelated to COM, C# or C++. Instead it has to do with the formatting of the message being sent between the systems involved. The other system uses an extra carriage return to indicate the end of the message. When it is missing, the remote system simply stops responding until another SSL session is started.
At a previous point in the project I was including an extra line break at the end of messages sent to the server. I had copied one of these messages to produce my pure C# test. Several times I also took the messages being sent from both codebases and put them into a merge/diff viewer. I never noticed this difference because I had disabled whitespace matching.
From now on when I compare raw output to other raw output, I will make sure that none of my tools will hide differences from me.
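One way to keep tooling from hiding such differences is to compare the raw bytes and print control characters explicitly. The following is only a minimal sketch: the class and method names (RawDiff, FirstDifference, Visible) and the two payloads are invented for illustration and are not taken from the project.

```csharp
using System;
using System.Text;

public static class RawDiff
{
    // Index of the first differing byte, or -1 if both arrays are identical.
    // If one array is a strict prefix of the other, the index is the shorter length.
    public static int FirstDifference(byte[] a, byte[] b)
    {
        int common = Math.Min(a.Length, b.Length);
        for (int i = 0; i < common; i++)
        {
            if (a[i] != b[i]) return i;
        }
        return a.Length == b.Length ? -1 : common;
    }

    // Renders CR, LF and TAB as visible escape sequences.
    public static string Visible(string s)
    {
        return s.Replace("\r", "\\r").Replace("\n", "\\n").Replace("\t", "\\t");
    }

    public static void Main()
    {
        // Hypothetical payloads: one ends with an extra line break, one does not.
        string working = "PING\r\n\r\n";
        string failing = "PING\r\n";

        int index = FirstDifference(Encoding.ASCII.GetBytes(working),
                                    Encoding.ASCII.GetBytes(failing));

        Console.WriteLine(Visible(working));
        Console.WriteLine(Visible(failing));
        Console.WriteLine(index < 0 ? "identical" : "first difference at byte " + index);
    }
}
```

Comparing bytes rather than strings also sidesteps any newline normalisation a viewer or editor might apply before showing the text.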

Related

Might GC.Collect() be warranted in this particular case?

Disclaimer: Yes, I know that the general answer to whether or not to use GC.Collect() is a resounding "NO!". This is the first time in several years of programming that I have ever considered using it at all.
Well then, here's the situation: We have developed a C# scripting tool based on the Microsoft.CodeAnalysis.CSharp.Scripting libraries (v3.6.0). It's a Winform GUI with editor etc., not unlike others out there. We use it for the validation of integrated circuits, meaning that its primary task is interfacing lab equipment such as power supplies, pattern generators, meters and the like. For the communication to said instruments we predominantly rely on National Instrument's VISA framework, albeit not exclusively. Some devices are controlled directly via DLLs from their respective manufacturers. In general, this system is working beautifully and by now it is successfully used by quite a lot of design engineers who do not know the first thing about the intricacies of .NET and C#.
At this point I should explain that the user can simply write a method (i.e. on "top-level") and then execute it. The Roslyn-part behind this is that the input is fed to CSharpScript.Create() and then compiled. The execution of a method is done via Script.ContinueWith("method name"). Inside of such a method the user can construct an object like, say, new VISA("connection string"), which connects to the device and then communicate with the device via this object. Nothing forces him or her to care about disposing the object (i.e. closing the connection).
Now, the problem is this: recently, very sporadic crashes of the GUI application have occurred with no feedback at all from the system. The form just closes and that's it. By trial and error we are currently 99% sure that if all connection objects are explicitly disposed within a method, the crashes do not occur. So, rewriting the method to something like this fixes the problem:
    using (var device = new VISA("connection string"))
    {
        device.Query("IDN?");
    }
The reason why I look into the GC's direction at all is that there is no discernible correlation to any actions from the user. The guys might run such methods for an hour without a problem and then, when scrolling in the editor, when no method is currently being executed, the GUI closes without comment. And that's why I'd like to get some input from people more knowledgeable about Roslyn and the GC:
Are there known issues with this scripting library and GC? (I would very much assume that there aren't)
Since the explicit disposal of objects seems to prevent the issue, might this be one of the extremely scarce situations where the use of GC.Collect() might be warranted? (admittedly, I could not yet test whether that also prevents the problem, because I am working from home)
Any ideas what can cause a .NET application to crash without any kind of feedback and how to obtain more information about such a crash? (the scripting engine is a separate DLL, as are the device drivers; the GUI only handles the graphics)
I am fully aware that this is a rather vague description of the problem with very little source code. This is because the application comprises quite a lot of source code and I have no idea what might be relevant here. Also, all namespaces in the above text refer to Microsoft.CodeAnalysis.CSharp.Scripting, except for VISA, which is self-defined. Obviously, I will gladly answer any follow-up questions to get to the bottom of this.
Thanks in advance.
Short answer: No. It's not only not warranted, it's completely missing the actual issue.
Further explanation: @canton7 instantly hit the nail on the head when writing
I'd argue that your application shouldn't crash even if a finalizer does end up being called
The root issue was hidden inside a 3rd party DLL, in the form of an implementation of IDisposable that is suboptimal at the very least. Once I zoomed in on that, it was rather easy to produce a workaround.
My original question is so very misguided that I'd like to state the one that I should have asked:
How do I trace a crash of my C# application when my application's logging does not show anything?
This question has been answered comprehensively in a number of posts. In my case, the crash could be seen in the Windows event log.
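Besides the Windows event log, a last-chance handler installed at startup is one common way to capture at least a stack trace for unhandled managed exceptions, including those thrown on the finalizer thread. What follows is a minimal sketch under assumptions: the class name CrashLog, the log file name and its location are made up, and the process will still terminate after the handler has run.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class CrashLog
{
    // Hypothetical log location; pick whatever suits the application.
    public static readonly string LogPath =
        Path.Combine(Path.GetTempPath(), "scripting-tool-crash.log");

    // Appends one timestamped line to the log file.
    public static void Write(string source, object error)
    {
        string line = string.Format("{0:o} [{1}] {2}{3}",
            DateTime.UtcNow, source, error, Environment.NewLine);
        File.AppendAllText(LogPath, line);
    }

    // Call once at startup, before any forms are shown.
    public static void Install()
    {
        // Raised for unhandled exceptions on any managed thread.
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
            Write("UnhandledException", e.ExceptionObject);

        // Raised when a faulted Task is collected without its exception being observed.
        TaskScheduler.UnobservedTaskException += (sender, e) =>
            Write("UnobservedTaskException", e.Exception);
    }

    public static void Main()
    {
        Install();
        Write("Startup", "crash logging installed");
        Console.WriteLine("Logging to " + LogPath);
    }
}
```

In a WinForms application one would typically also subscribe to Application.ThreadException for exceptions on the UI thread. Crashes that originate in native code may bypass these handlers entirely, so the event log and crash dumps remain worth checking.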

Inconsistent behavior from ServicePointManager.SetTcpKeepAlive

I've been trying to get tcp keep alive packets to send using System.Net.Http.HttpClient. As far as I can see the only way to do this is using ServicePointManager.SetTcpKeepAlive(true, X, Y). When testing this in a LinqPad script I have got it working although it's not consistent. For example if I do a manual call to ServicePointManager.FindServicePoint(myUrl) before I call ServicePointManager.SetTcpKeepAlive then it won't work, I assume because of caching, however if it's the first thing that happens it usually works (I'm fairly certain it's still inconsistent here). Note that I am checking whether it works or not using Wireshark.
However, when I try to use this in my real UWP application it fails. I've tried setting it as the first thing in the App.xaml.cs constructor, just before instantiating the HttpClient, and in various places in between, without any luck.
Am I missing something?
Note that I realise it is possible to use HttpWebRequest directly and set this on its ServicePoint instance, but I'd like to know why this isn't working first before resorting to something like that.
EDIT: I ended up trying to implement this with HttpWebRequest and ServicePoint.SetTcpKeepAlive, and while it compiles, it fails when called with an "Operation is not supported on this platform" exception. I guess this means the main problem here might be that UWP just doesn't support sending TCP keep alive packets?
EDIT2: I've created a minimal reproducible example here: https://github.com/csuzw/TcpKeepAliveTest . In this case the HttpWebRequest approach does not fail with the "Operation is not supported on this platform" exception, but it doesn't send keep alive packets either. I wonder if the difference is that my real app is a Xamarin.Forms UWP target app, whereas this test app is a straight UWP app. Regardless, both approaches used in this test app fail to produce keep alive packets.

If I don't use NetworkComms.Shutdown will I break something?

Hey, I recently created a text message application in C# that sends messages back and forth in a console. I used NetworkCommsDotNet & NetworkCommsDotNet.Connections.
When I was researching it I found a command, NetworkComms.Shutdown(): http://www.networkcomms.net/api/html/M_NetworkCommsDotNet_NetworkComms_Shutdown.htm
I'm also new to programming, so I really didn't completely understand what they were saying and was still left wondering: if I don't use this in my program, will it break something or mess up my router in any way?
PS: the program works and I had success testing it between two computers on my home network.
I haven't used this, nor do I even know what it is, but I am good at reading documentation and believe what it tells me (for the most part):
Shutdown all connections, threads and execute OnCommsShutdown event.
Any packet handlers are left unchanged. If any network activity has
taken place this should be called on application close.
The reason it's telling you this is that it is most likely using unmanaged resources and wants to shut them down or clean them up gracefully. Since there is no open source for this project, we can only go by what the documentation says.

Handle Pure Virtual Function Call in C#

I am using C# to run a directshow graph and a third party filter by MainConcept errors with a Pure Virtual Function Call.
Is it possible to handle c++ runtime pure virtual function calls in C# gracefully?
There are no other exceptions provided as a popup displays over the app pointing to the directshow filter. Nothing logged in event viewer either.
The source of the problem is in the third party component you have as a binary. It hits some internal problem, displays the message box and then terminates the process. You can of course send the relevant information to the component vendor (MainConcept) so that they can possibly fix it on their side.
There is little you can do here except one thing. Apparently the problem is related to certain specific external behavior or data you stream through this component. Examples of this include: specific order of termination calls, ill-formed input, calls from multiple threads. If you happen to see the pattern of what might be causing the problem exactly, then you can possibly prevent the scenario from taking place.

ZeroMQ subscriber fails to initialize using 1000+ publishers

I am trying to evaluate ZeroMQ for a larger monitoring and data gathering system. On a smaller scale everything works nice but stepping up the load and scale a bit seems tricky.
Right now I am using a C# wrapper (clrzmq, 3.0.0-rc1) to create both a publisher and a subscriber application. I am binding the Publisher socket (1 socket, 1 context) to 1000 endpoints (localhost + a range of ports) and letting the Subscriber application's socket (again 1 socket, 1 context) connect to the publisher endpoints.
This sometimes works, and sometimes not (I guess it relates to the max number of sockets handled by the process somehow). It seems to depend on in which order I start the applications but I cannot tell for sure. The only thing I see is nasty SEHExceptions, containing no details at all. If I create simple console applications I sometimes see low level C++ Asserts like:
Assertion failed: fds.size () <= FD_SETSIZE (......\src\select.cpp:70)
Assertion failed: Permission denied (......\src\signaler.cpp:281)
Assertion failed: Connection reset by peer (......\src\signaler.cpp:124)
Not very helpful to me. In the C# wrapper, the Context creation fails. It does not even get a chance to begin connecting to or even creating sockets. I would expect low level ZeroMQ errors to be handled by throwing exceptions, maybe I just have not understood how to deal with errors yet.
The questions I have right now are:
How do I create a (somewhat) realistic test setup to simulate 1000 separate publishers on a single machine (in the real world 1 publisher = 1 machine) and a couple of Subscribers on another machine, all using C#? Is that even possible?
More importantly, how do I trap ZeroMQ errors in C# code to be able to understand what goes wrong?
Since ZeroMQ seems pretty stable and mature I have a hard time believing 1000 publishers should be a problem to handle. However, I need better error support than currently available (unless I completely missed something here) in order to use ZeroMQ over C#.
Update:
After digging into the source, I end up with a zmq_assert(...) leading to RaiseException (0x40000015, EXCEPTION_NONCONTINUABLE, 1, extra_info);. This abruptly terminates the application after dumping the original assert statement to the console. This seems a bit harsh, but may well be the best option given that the condition is really unrecoverable. However, a somewhat better error message would not hurt. Not everyone knows what fds.size () <= FD_SETSIZE means. The comment in the source gives some clues, and it would be nice to have that comment in the error message. Anyway, given that my application is not a console app, this just leaves me with an unhandled SEHException, which does not seem to contain even the assert statement or line/file info. I wonder how many other bugs I will create that will result in similarly cryptic errors.
After looking into this a bit more, it seems the default number of sockets is set to 1024. The C# wrapper has a property on the Context object that should be able to change this setting, but it is not working, at least not as expected. Also, the native zmqlib does not have this setting on the context object.
Running a setup like the one in the description does not seem possible, at least not using the clrzmq C# ZeroMQ wrapper. I solved it by running 500 publishers on a separate machine and another 500 plus 1000 subscribers on another machine. This worked nicely without any errors.
The other topic is also a bit disappointing. When the maximum number of sockets is reached, ZeroMQ simply throws an uncatchable exception, causing the application to crash abruptly. This is a fail-fast approach that avoids any further data/state corruption, but unfortunately it also leaves very few clues as to what happened to make the application die. Judging from other posts, it seems very hard to gather data for a post-mortem when this happens. Catching the exception in the C# code seems impossible or very hard, and hooking into stdout to capture the printed assert also seems very hard to achieve (unless we are running from a command prompt, in which case the assert message is printed just before the application dies).
All in all, this makes low-level troubleshooting and post-mortem analysis in a non-console C# setting very hard when ZeroMQ terminates via the zmq_assert(...) call. Hopefully this was an extreme case. Not all failure modes seem to cause termination in this abrupt way.
The default FD_SETSIZE is 1024 (defined in the MSVC libzmq project), so you will hit this about half-way through your test case. The other asserts tumble on from that.
Increase this in your libzmq project, to 4K or 8K, and things should work better.
As for the assert() call, it's too brutal on Windows, for sure. On Linux this gives a decent stack dump and enough information to trace the problem. Feel free to improve the assert macro so that it does something smarter, e.g. launch the debugger. In any case if you hit an assert you can't reasonably continue.
Asserting when the FD set is full, well, that could be handled better. If you know anything about C/C++, feel free to take a look at the code. We do depend on peoples' patches.
Also, if you feel 1024 is too small, feel free to raise this in the project and send us the patch.
A quick and dirty look into this problem suggests that you're creating too many socket connections for your computer. Check out this link on the max number of sockets from MSDN. The errors you are getting look relevant enough for this to be a possible source of your problem.
To be honest, having 1000 separate publishers suggests you are tackling the problem a little incorrectly for zmq. Why not have 1 publisher, use 'namespaces', and have each subscriber SUBSCRIBE only to what it needs, to split out which messages each subscriber gets?
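The single-publisher idea relies on ZeroMQ's subscription filter, which is a byte-prefix match against the start of each message. The sketch below only imitates that matching rule in plain C# to show how 'namespaces' would split the traffic. It does not call clrzmq, and the class name TopicFilter and the topic names are invented for illustration.

```csharp
using System;
using System.Text;

public static class TopicFilter
{
    // True if the message starts with the subscription bytes (prefix match).
    // An empty subscription matches every message.
    public static bool Matches(byte[] message, byte[] subscription)
    {
        if (subscription.Length > message.Length) return false;
        for (int i = 0; i < subscription.Length; i++)
        {
            if (message[i] != subscription[i]) return false;
        }
        return true;
    }

    public static void Main()
    {
        // Hypothetical messages from one publisher, each prefixed with a topic.
        string[] published =
        {
            "sensor.0001 temp=21.5",
            "sensor.0002 temp=19.0",
            "sensor.0001 temp=21.7"
        };

        // The trailing space acts as a delimiter so that "sensor.0001 "
        // does not also match a topic such as "sensor.00010".
        byte[] subscription = Encoding.UTF8.GetBytes("sensor.0001 ");

        foreach (string message in published)
        {
            if (Matches(Encoding.UTF8.GetBytes(message), subscription))
            {
                Console.WriteLine(message);
            }
        }
    }
}
```

Because the filter is a plain prefix match, the choice of topic format and delimiter matters more than it might first appear.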
