Should I use exceptions in C# to enforce base class compatibility?

On one hand, I'm told that exceptions in C# are 'expensive', but on the other, I'm stuck on how to implement this.
My problem is this: I'm making a Stream derivative that wraps a NetworkStream. The issue I'm facing is with Read(byte[] buffer, int offset, int count). From the Stream docs for the method:
Returns:
... or zero (0) if the end of the stream has been reached.
The problem is that in the protocol I'm implementing, the remote side can send an "end of record" token or a "please respond" token. Obviously, if this happens at the start of the Read() it causes problems: I need to return from the function, but I haven't read anything, so I'd have to return 0, which means the stream is finished, but it isn't... Is an EndOfRecordException or similar justified in this case? And if so, should it always be thrown when this token is encountered (at the start of the Read() call, making sure these tokens are always at the start by returning early), so that there is some sort of pattern to how these tokens are handled?
Edit: For what it's worth, these tokens generally come through 3-10 times a second. At the most, I wouldn't expect more than 25 a second.

Exceptions aren't really all that expensive - but they also aren't necessarily the best way to manage expected/normal flow.
To me, it sounds like you aren't actually implementing a Stream - you are encapsulating a stream into a "reader". I might be inclined to write a protocol-specific reader class with suitable methods to detect the end of a record, or Try... methods to get data or return false.
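For example, a minimal sketch of what that reader could look like (RecordReader, TokenKind and TryRead are invented names for illustration, not from any library):

using System.IO;

public enum TokenKind { None, EndOfRecord, PleaseRespond }

public sealed class RecordReader
{
    private readonly Stream _inner;

    public RecordReader(Stream inner) { _inner = inner; }

    // Returns true when payload bytes were read; returns false and sets 'token'
    // when a control token was encountered instead of data.
    public bool TryRead(byte[] buffer, int offset, int count,
                        out int bytesRead, out TokenKind token)
    {
        // The real protocol parsing goes here: inspect the incoming bytes,
        // recognise tokens, and only hand payload bytes back to the caller.
        bytesRead = 0;
        token = TokenKind.None;
        return false;
    }
}

That keeps the Stream contract intact for anything that really is a stream, while the tokens surface through a richer, protocol-aware API.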

It sounds like you shouldn't really be deriving from Stream if your class is concerned with records. Streams don't generally interpret their data at all - they're just a transport mechanism of data from one place to another.
There have been cases like ZipInputStream in Java which end up being very confusing when a single InputStream effectively has several streams within it, and you can skip between them. Such APIs have been awful to use in my experience. Providing a separate class to implement the "record splitting" which can provide a stream for the data within a record sounds cleaner to me. Then each stream can behave consistently with normal streams. No need for new exceptions.
However, I'm just guessing at your context based on the limited information available. If you could give more details of the bigger picture, that would help.

It's not such a big deal performance-wise, but still... exceptions are intended for, well, exceptions: situations that are "unusual". If that is the way the underlying stream behaves, then your stream should be able to deal with it. If it can handle the token itself, it should do so on its own. If not, you can have the user set some callback which will get called when you receive a "please respond" token.

I believe that a Stream-derived class should deal only with streaming issues and adhere to the Stream semantic contract. All higher-level logic (interpreting EOF and EOR tokens) should be placed in some other class.

Maybe you can create an enum to return; it could contain items for EndOfRecord, EndOfStream, ReadOk, or whatever you need.
The actual read data can be passed as an out parameter.
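A minimal sketch of that shape (ReadStatus, ReadChunk and the buffer size are made up for illustration):

public enum ReadStatus { ReadOk, EndOfRecord, PleaseRespond, EndOfStream }

public ReadStatus ReadChunk(int maxBytes, out byte[] data)
{
    // read from the wrapped NetworkStream here, watching for protocol tokens
    data = Array.Empty<byte>();
    return ReadStatus.EndOfStream;
}

The caller then switches on the status:

var status = reader.ReadChunk(4096, out var data);
switch (status)
{
    case ReadStatus.ReadOk:        /* consume data */      break;
    case ReadStatus.PleaseRespond: /* send the response */ break;
    case ReadStatus.EndOfRecord:   /* finish the record */ break;
    case ReadStatus.EndOfStream:   /* tear down */         break;
}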

Related

Get contiguous Memory<byte> from ReadOnlySequence<byte>

After a wonderful introduction to pipes on Marc Gravell's blog post on the subject, I am tinkering with implementing pipes with sockets.
I know that Marc has already come up with Pipelines.Sockets.Unofficial, and I am using that as a reference, but I have a question.
It seems that SocketAsyncEventArgs has a new overload to the SetBuffer() method: SetBuffer(Memory<byte>)
It seems that the intent here is to integrate nicely with Pipes.
My confusion arises from the fact that Pipe.Reader.ReadAsync() returns a ReadResult containing a ReadOnlySequence<byte> (ReadResult.Buffer)
In the case that Buffer.IsSingleSegment == true, it's fairly obvious what to do:
SocketAsyncEventArgs.SetBuffer(Buffer.First)
But in the case where there are multiple segments, I'm not entirely sure what the best course of action is.
I could of course just get a byte[] from the pipe and be done with it, but that would incur a copy (possibly more than one, even).
What is the intended use of ReadOnlySequence<byte> here? Is there a way to get a Memory<byte> representing the whole contents of the sequence?
Perhaps I need to re-re-read Marc's blog post...
From what I see:
There is ReadOnlySequence<>.First, which in the IsSingleSegment case should give you the ReadOnlyMemory<> you are looking for.
For sequences with more segments, I think you should iterate through them with:
var iSegment = seq.Start;
ReadOnlyMemory<byte> readMemory;
while (seq.TryGet(ref iSegment, out readMemory, advance: true))
{
    // do something with readMemory
}
Alternatively, I think this does the same thing:
foreach (var memory in seq)
{
    // do something with memory
}
It makes sense to iterate through the segments if you care about performance and don't want to copy the whole buffer (which is what the .ToArray() extension method does).
The intention, IMHO, was to represent non-contiguous buffers efficiently.
I am not sure about the Sequence<> implementation details, but it is common in buffering scenarios to hold a couple of small buffers (2 or 3) and switch between them so as not to allocate new memory unnecessarily. That is sometimes abstracted as a "circular" or "ring" buffer.
It leads to the kind of "sequential" memory access pattern you see in Sequence<>.
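And if you really do need one contiguous Memory<byte>, a copy is unavoidable. A sketch of that, assuming a pooled array is acceptable for the destination (it uses ReadOnlySequence<T>.Length and the BuffersExtensions.CopyTo extension):

using System;
using System.Buffers;

static Memory<byte> CopyToContiguous(in ReadOnlySequence<byte> seq, out byte[] rented)
{
    rented = ArrayPool<byte>.Shared.Rent((int)seq.Length);
    seq.CopyTo(rented);                          // CopyTo walks all segments
    return rented.AsMemory(0, (int)seq.Length);  // caller must return 'rented' to the pool
}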

.Net streams: Returning vs Providing

I have always wondered what the best practice for using a Stream class in C# .Net is. Is it better to provide a stream that has been written to, or be provided one?
i.e.:
public Stream DoStuff(...)
{
    var retStream = new MemoryStream();
    // Write to retStream
    return retStream;
}
as opposed to:
public void DoStuff(Stream myStream, ...)
{
    // write to myStream directly
}
I have always used the former for the sake of lifecycle control, but I have a feeling that it is a poor way of "streaming" with Streams, for lack of a better word.
I would prefer "the second way" (operate on a provided stream) since it has a few distinct advantages:
You can have polymorphism (assuming, as your signature suggests, that you can do your operations on any type of Stream provided).
It's easily abstracted into a Stream extension method now or later.
You clearly divide responsibilities. This method should not care about how to construct a stream, only about how to apply a certain operation to it.
Also, if you're returning a new stream (option 1), it feels a bit strange that you would have to Seek again first in order to read from it (unless you do that in the method itself, which is suboptimal once more, since it might not always be required: the stream might not be read from afterwards in all cases). Having to Seek after passing an already existing stream to a method that clearly writes to it does not seem so awkward.
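For example, a tiny sketch of the extension-method form (WriteStuff and its empty body are placeholders):

using System.IO;

public static class StreamExtensions
{
    public static void WriteStuff(this Stream stream)
    {
        // write to the provided stream; works the same for a FileStream,
        // a NetworkStream or a MemoryStream
    }
}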
I see the benefit of Streams is that you don't need to know what you're streaming to.
In the second example, your code could be writing to memory, it could be writing directly to file, or to some network buffer. From the function's perspective, the actual output destination can be decided by the caller.
For this reason, I would prefer the second option.
The first function just writes to memory. In my opinion, it would be clearer if it did not return a stream but the actual memory buffer. The caller can then wrap it in a MemoryStream if they wish.
public byte[] DoStuff(...)
{
    var retStream = new MemoryStream();
    // Write to retStream
    return retStream.ToArray();
}
100% the second one. You don't want to make assumptions about what kind of stream they want. Do they want to stream to the network or to disk? Do they want it to be buffered? Leave these up to them.
They may also want to reuse the stream to avoid creating new buffers over and over. Or they may want to stream multiple things end-to-end on the same stream.
If they provide the stream, they have control over its type as well as its lifetime. Otherwise, you might as well just return something like a string or array. The stream isn't really giving you any benefit over these.
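To illustrate with the question's second signature (file name invented, extra parameters elided):

using (var file = File.Create("output.bin"))
{
    DoStuff(file);   // streams straight to disk
}

using (var mem = new MemoryStream())
{
    DoStuff(mem);    // or to memory; the method doesn't care
}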

OOP data structure advice

I am writing a log file decoder which should be capable of reading many different structures of files. My question is how best to represent this data. I am using C#, but am new to OOP.
An example:
The log files have a range of sensor values. One sensor reading can be called A, another B. Obviously, there are many more than 2 entry types.
In different log files, they could be stored either as ABABABABAB or AAAAABBBBB.
I was thinking of describing this as blocks of entries. So in the first case, a block would be 'AB', with 5 blocks. In the second case, the first block is 'A', read 5 times. This is followed by a block of 'B', read 5 times.
This is quite a simplification (there are actually 40 different types of log file, each with up to 40 sensor values in a block). No log has more than 300 blocks.
At the moment, I store all of this in a datatable. I have a column for each entry, with a property of how many to read. If this is set to -1, it continues to the next column in the block. If not, it will assume that it has reached the end of the block.
This all seems quite clumsy. Can anyone suggest a better way of doing this?
I think you should first start here, and then here to learn a little bit about what object oriented programming is. Don't worry about your current problem while learning about OOP.
As you are learning about OO concepts, you should begin to understand that code is not data, and data is not code. It does not matter how you represent your data from an OOP stance. You can write OO code to consume your data, or you could write procedural code to consume your data; that part is irrelevant to the format of the data.
So then, getting back to your question:
My question is how best to represent this data
It depends on your needs. What is writing the log file? Do you have control over the writer and the reader? If I did, I would rely on the built-in serialization methods to minimize the amount of code I need to write. Is the log file going to be really long? If so, the "datatable" approach you described is usually better. If the log file isn't going to be huge in file size, XML is really easy to work with.
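For instance, if you control both sides, the built-in XML serialization could be as simple as this sketch (LogEntry and the file name are invented for the example):

using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class LogEntry
{
    public string Sensor { get; set; }
    public double Value { get; set; }
}

// writing
var entries = new List<LogEntry> { new LogEntry { Sensor = "A", Value = 1.0 } };
var serializer = new XmlSerializer(typeof(List<LogEntry>));
using (var file = File.Create("log.xml"))
    serializer.Serialize(file, entries);

// reading
List<LogEntry> loaded;
using (var file = File.OpenRead("log.xml"))
    loaded = (List<LogEntry>)serializer.Deserialize(file);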
Very basic and straightforward:
Define an interface IEntry with properties like string EntryBlock and int Count
Define a class which represents an Entry and implements IEntry
Code doing binary serialization should work against the interfaces; for instance, it should refer to IEnumerable<IEntry>
The Entry class could override ToString() to return something like [ABAB-2], if that would be helpful during serialization
The IEntry interface could provide a method void CreateFromRawString(string rawDataFromLog) if that would be helpful; decide yourself
If you want more info, please share the code you are using for serialization/deserialization.
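Put together, that shape might look like this (a sketch of the bullet points above, not a finished design):

public interface IEntry
{
    string EntryBlock { get; }
    int Count { get; }
    void CreateFromRawString(string rawDataFromLog);
}

public class Entry : IEntry
{
    public string EntryBlock { get; private set; }
    public int Count { get; private set; }

    public void CreateFromRawString(string rawDataFromLog)
    {
        // parse the raw log fragment into EntryBlock and Count here
    }

    public override string ToString() => $"[{EntryBlock}-{Count}]";
}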
In addition to what Bob has offered, I highly recommend Head First Design Patterns as a gentle, but robust introduction to OO for a C# programmer. The samples are in Java, which translate easily to C#.
As for OOP, you want to learn SOLID.
I would suggest you build this using Test Driven Development.
Start small, with a simple fragment of your log data, and write a test like the following (you'll find a better way to do this with experience and apply it to your situation):
[Test]
public void ReadSequence_FiveA_ReturnsProperList()
{
    // Arrange
    string sequenceStub = "AAAAA";

    // Act
    MyFileDecoder decoder = new MyFileDecoder();
    List<string> results = decoder.ReadSequence(sequenceStub);

    // Assert
    Assert.AreEqual(5, results.Count);
    Assert.AreEqual("A", results[0]);
}
That test code snippet is just a starting point, and I've tried to be rather verbose in the assertions. You can come up with more creative ways over time. The point is to start small. Once this test passes, add another test where you mix "AB" and change your decoder to handle this properly. Eventually, you'll have a large set of tests that handle your different formats. Using TDD, you'll be on the path to using SOLID properly. Whenever you find something you can't test, you should review the rules and see if you can't make it simpler and inject dependencies.
Eventually you'll get into mocking. For example, you might find that you'd rather INJECT the ability for your MyFileDecoder class to have a dependency that will read your log file. In that case, you would create a mock object and pass that into the constructor and set the mock to return the sequenceStub when a method is called.
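A sketch of that idea using the Moq library (ILogSource and the MyFileDecoder constructor overload are hypothetical):

using Moq;

public interface ILogSource
{
    string ReadAll();
}

var source = new Mock<ILogSource>();
source.Setup(s => s.ReadAll()).Returns("AAAAA");

// MyFileDecoder takes the dependency via its constructor and uses it
// instead of touching the file system.
var decoder = new MyFileDecoder(source.Object);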

Cancelling some data processing with return values or exceptions. What pattern is more suitable?

I have lots of code which is doing some data processing (in C#, to be more specific). Very often the data may only be processed if some criteria are met. Since the criteria can be rather complex, they are checked in a lazy fashion rather than beforehand. Thus: if during the processing some criterion does not match, the processing needs to be cancelled or short-circuited.
Since this can occur quite frequently, and there are cases where this is considered nothing exceptional, I work with return values and a pattern like:
if (string.IsNullOrEmpty(customerSecondaryAddress))
{
    LogCustomerHasNoSecondaryAddress(entry);
    return ProcessingStatus.IsProcessed;
}
ProcessSecondaryAddress(customerSecondaryAddress);
return ProcessingStatus.Continue;
I was hesitant to use exceptions instead, because non-matching criteria like in the example above are nothing exceptional. Besides, it yields method signatures that communicate their purpose quite clearly:
public ProcessingStatus WriteRecipientListFor(Customer customer)
The problem is that I have to pass the status code (ProcessingStatus) around. This is especially cumbersome when the processing logic is deeply nested. In that case I need to pass the status code up the call stack, and return statements are scattered around the code.
My questions are: Is the return value approach appropriate? Should I switch to Exceptions instead? Are there other patterns or approaches which I can use instead?
The guidelines I like are:
Return Values
When the purpose is to attempt to do something. TryParse, FirstOrDefault, SendAttempt work for this.
When failing a criterion is not necessarily a total failure, and the calling method will need to behave differently depending on what did get done. Enums work here.
Exceptions:
When a method's purpose is to achieve something (not attempt to achieve something), and it has no way to complete its task or recover from its state.
I especially use exceptions when I have a class that has properties where the status can be checked, and an operation is called when the status is in a bad state. That method isn't going to know how the calling method is going to want to fix the class's state.
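Side by side, those two guidelines might look like this (the Message type and _connection field are illustrative):

// Attempt: a failed criterion is a normal outcome, reported via the return value.
public bool TrySend(Message message)
{
    if (!_connection.IsOpen) return false;
    _connection.Write(message);
    return true;
}

// Achieve: the method cannot complete its task, so it throws.
public void Send(Message message)
{
    if (!_connection.IsOpen)
        throw new InvalidOperationException("Connection is closed.");
    _connection.Write(message);
}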
In these cases I usually use exceptions, just because they make it easy to stop processing in any place and don't clutter your code with error-handling code. You shouldn't think about exceptions as something that happens rarely; it is more appropriate to think of them as something that breaks the positive processing path. So I use them to signal a situation where it is not possible to continue processing, which is exactly your case.
It is obviously good practice to define your own exception type and catch it in a single place, so that it's obvious what it is used for.
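For example (the exception type and surrounding names are illustrative):

public class ProcessingCancelledException : Exception
{
    public ProcessingCancelledException(string reason) : base(reason) { }
}

try
{
    ProcessEntries(entries);   // nested code can throw from any depth
}
catch (ProcessingCancelledException ex)
{
    Log(ex.Message);           // one obvious place where cancellation is handled
}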

CRUD operations: do you notify whether the insert, update, etc. went well?

I have a simple question for you (I hope) :)
I have pretty much always used void as a "return" type when doing CRUD operations on data.
E.g., consider this code:
public void Insert(IAuctionItem item) {
    if (item == null) {
        AuctionLogger.LogException(new ArgumentNullException("item is null"));
    }
    _dataStore.DataContext.AuctionItems.InsertOnSubmit((AuctionItem)item);
    _dataStore.DataContext.SubmitChanges();
}
and then consider this code:
public bool Insert(IAuctionItem item) {
    if (item == null) {
        AuctionLogger.LogException(new ArgumentNullException("item is null"));
    }
    _dataStore.DataContext.AuctionItems.InsertOnSubmit((AuctionItem)item);
    _dataStore.DataContext.SubmitChanges();
    return true;
}
It really just comes down to this: should you notify the caller that something was inserted (and went well), or not?
I typically go with the first option there.
Given your code, if something goes wrong with the insert there will be an Exception thrown.
Since you have no try/catch block around the Data Access code, the calling code will have to handle that Exception...thus it will know both if and why it failed. If you just returned true/false, the calling code will have no idea why there was a failure (it may or may not care).
I think it would make more sense if, in the case where item == null, you returned false. That would indicate a case that you expect to happen not infrequently, and that you therefore don't want to raise an exception for; the calling code could then handle the false return value.
As it stands, you'll return true or there'll be an exception; that doesn't really help you much.
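In code, that suggestion would look something like this (same members as in the question):

public bool Insert(IAuctionItem item) {
    if (item == null) {
        AuctionLogger.LogException(new ArgumentNullException("item is null"));
        return false;                       // the expected, non-exceptional case
    }
    _dataStore.DataContext.AuctionItems.InsertOnSubmit((AuctionItem)item);
    _dataStore.DataContext.SubmitChanges(); // still throws if the insert itself fails
    return true;
}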
Don't fight the framework you happen to be in. If you are writing C code, where return values are the most common mechanism for communicating errors (for lack of a better built-in construct), then use that.
The .NET base class libraries use exceptions to communicate errors, and their absence means everything is okay. Because almost all code uses the BCL, much of it will be written to expect exceptions; if it then hits a library written as if C# were C with no support for exceptions, each invocation will need to be wrapped in an if (!myObject.DoSomething()) { Console.WriteLine("Damn"); } block.
For the next developer to use your code (which could be you after a few years, when you've forgotten how you originally did it), it will be a pain to start writing all the calling code to take advantage of having error conditions passed as return values, as changes to values in an output parameter, as custom events, as callbacks, as messages to a queue, or any of the other imaginable ways to communicate failure or the lack thereof.
I think it depends. Imagine that your user wants to add a new post to a forum, and the add fails for some reason. If you don't tell the user, they will never know that something went wrong. The best way is to throw another exception with a nice message for them.
And if it does not relate to the user, and you have already logged it to the database log, you shouldn't care about returning anything any more.
I think it is a good idea to notify the user whether the operation went well or not. Regardless of how much you test your code and try to think outside the box, it is most likely that during its existence the software will encounter a problem you did not cater for, thus making it behave incorrectly. The use of notifications, in my opinion, allows the user to take action, a sort of Plan B, if you like, for when the program fails. This action can either be a simple workaround or, alternatively, informing people from the IT department so that they can fix it.
I'd rather click that extra "OK" button than learn that something went wrong when it is too late.
You should stick with void; if you need more data, use out parameters for it, since you may need specific data (and it can be more than one number/string), and the exception mechanism is a good solution for handling errors.
So, if you want to know how many rows were affected, whether a stored procedure returned something, etc., a bool return type will limit you.
