As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Should I make my classes immutable where possible?
I once read the book "Effective Java" by Joshua Bloch and he recommended to make all business objects immutable for various reasons. (for example thread safety)
Does this apply for C# too?
Do you try to make your objects immutable, so you have less problems when working with them?
Or is it not worth the inconvenience you have to create them?
The immutable Eric Lippert has written a whole series of blog posts on the topic. Part one is here.
Quoting from the earlier post that he links to:
ASIDE: Immutable data structures are the way of the future in C#. It is much easier to reason about a data structure if you know that it will never change. Since they cannot be modified, they are automatically threadsafe. Since they cannot be modified, you can maintain a stack of past “snapshots” of the structure, and suddenly undo-redo implementations become trivial. On the down side, they do tend to chew up memory, but hey, that’s what garbage collection was invented for, so don’t sweat it.
This is going to be more of an opinion type answer but...
I find that the ease of understanding a program, i.e. maintaining and debugging said application, is inversly proportional to the amount of stateful transitions that occur during the processing of each component. The less state I need to cart around in my head, the more focus I can pay attention to the logic within the algorithms as it is written.
Immutable objects are the central feature of functional programming; it has its own advantages and disadvantages. (E.g. linked lists are practically impossible to be immutable, but immutable objects make parallelism a piece of cake.) So as a comment on your post noted, the answer is "it depends".
Off the top of my head, I can't think of a reason for immutable objects making thread safe code somehow "better".
If I want an object to be thread safe, I will either put a lock around it or I will make a copy of it and update the reference once I'm done working on it. I typically wouldn't want a new object for every little change.
For me, immutable strings create more headaches for threading than it helps.
I actually went out of my way to make an "in-place" "ToUpper" using unsafe code isntead of the built in String.ToUpper(). It runs about 4 times faster and consumes 1/2 the peak memory.
Another nice benefit of immutable structures is that you can locally cache instances of them and reuse them across multiple threads without fear of unexpected behaviors as would be the case if they were mutable.
For instance, suppose you are using an external caching service such as memcached or Velocity or some other equally simplistic distributed hashtable service. You could just use the C# client library and call it good enough. However, that is being wasteful with resources given a short-lived context like a web request scenario. What you really want is to pull each object from the cache once and only once in your context.
The safest way to get this job done is to place a local hashtable in your process in front of the cache provider. On the first request for the cache key you'd pull down the serialized byte stream that represents the object you wish to use and store that byte stream in your local hashtable. On subsequent requests for the same cache key, just look up the byte stream in the local hashtable and deserialize the object to a new instance for each request. This is to prevent multiple redundant trips to the cache server node for the same information that assumedly has not changed over the lifetime of your context.
With immutable structures, you could deserialize the byte stream only once on the first request and get away with storing the deserialized instance in the hashtable instead of the byte stream and just share that one single immutable instance of your object. This obviously cuts down on deserialization penalties which can add up rather quickly if your consuming code is written in such a fashion that it does not care how many calls it makes to the caching provider, assuming the cache is faster than querying your underlying data store.
Perhaps this is more of a subjective answer, but it's a specific problem that can be solved uniquely by using immutable structures so I thought it was relevant to share.
Related
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
Before down-voting let me explain my question. I have a little experience in designing architectures and try to progress. Ones, when I was fixing a bug, I came up with a conclusion that we need to make our private method to be public and than use it. That was the fastest way to make my job done, and have a bug fixed. I went to my team-leader and said it. After I've got a grimace from him, I was explained that every public method is a very expensive pleasure. I was told that every public method should be supported throughout the lifetime of a project. And much more..
I was wondering. Indeed! Why it wasn't so clearly when I was looking in the code. It wasn't also so evidently when I designed my own architectures. I remember my thoughts about it:
Ahh, I will leave this method public, who knows, maybe it will come usefull when the system grows.
I was confused, and thought that I made scaleable systems, but in fact got tons of garbage in my interfaces.
My question:
How can you explain to yourself if a method is really important and worthy to be public? Are any counterexamples for checking it? How you get trained to make private/public choise without spending hours in astral?
I suggest you read up on YAGNI http://c2.com/cgi/wiki?YouArentGonnaNeedIt
You should write code to suit actual requirements because writing code to suit imagined requirements leads to bloated code which is harder to maintain.
My favourite quote
Perfection is achieved, not when there is nothing more to add, but
when there is nothing left to take away.
-- Antoine de Saint-Exupery French writer (1900 - 1944)
This question need a deep and thorough discussion on OOP design, but my simple answer is anything with public visibility can be used by other classes. Hence if you're not building method for others to use, do not make it public.
One pitfall of unecessarily making private method public is when other classes did use it, it makes it harder for you to refactor / change the method, you have to maintain the downstream (think if this happen to hundreds of classes)
But nevertheless maybe this discussion will never end. You should spend more time reading OOP design pattern books, it will give you heaps more idea
There are a few questions you can ask yourself about the domain in which the object exists:
Does this member (method, property, etc.) need to be accessed by other objects?
Do other objects have any business accessing this member?
Encapsulation is often referred to as "data hiding" or "hiding members" which I believe leads to a lot of confusion. Inexperienced developers would rightfully ask, "Why would I want to hide anything from the rest of my code? If it's there, I should be able to use it. It's my code after all."
And while I'm not really convinced with the way in which your team leader worded his response, he has a very good point. When you have too many connection points between your objects, you end up with too many connections. Objects become more and more tightly coupled and fuse into one big unsupportable mess.
Clearly and strictly maintaining a separation of concerns throughout the architecture can significantly help prevent this. When you design your objects, think in terms of what their public interfaces would look like. What kind of outwardly-visible attributes and functionality would they have? Anything which wouldn't reasonably be expected as part of that functionality shouldn't be public.
For example, consider an object called a Customer. You would reasonably expect some attributes which describe a Customer, such as:
Name
Address
Phone Number
List of orders processed
etc.
You might also expect some functionality available:
Process Payment
Hold all Orders
etc.
Suppose you also have some technical considerations within that Customer. For example, maybe the methods on the Customer object directly access the database via a class-level connection object. Should that connection object be public? Well, in the real world, a customer doesn't have a database connection associated with it. So, clearly, no it should not be public. It's an internal implementation concern which isn't part of the outwardly-visible interface for a Customer.
This is a pretty obvious example, of course, but illustrates the point. Whenever you expose a public member, you add to the outwardly-visible "contract" of functionality for that object. What if you need to replace that object with another one which satisfies the same contract? In the above example, suppose you wanted to create a version of the system which stores data in XML files instead of a database. If other objects outside of the Customer are using its public database connection, that's a problem. You'd have to change a lot more about the overall design than just the internal implementation of the Customer.
As a general rule it's usually best to prefer the strictest member visibilities first and open them up as needed. Combine that guideline with an approach of thinking of your objects in terms of what real-world entities they represent and what functionality would be visible on those entities and you should be able to determine the correct course of action for any given situation.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I would like to design my own database engine for educational purposes, for the time being. Designing a binary file format is not hard nor the question, I've done it in the past, but while designing a database file format, I have come across a very important question:
How to handle the deletion of an item?
So far, I've thought of the following two options:
Each item will have a "deleted" bit which is set to 1 upon deletion.
Pro: relatively fast.
Con: potentially sensitive data will remain in the file.
0x00 out the whole item upon deletion.
Pro: potentially sensitive data will be removed from the file.
Con: relatively slow.
Recreating the whole database.
Pro: no empty blocks which makes the follow-up question void.
Con: it's a really good idea to overwrite the whole 4 GB database file because a user corrected a typo. I will sell this method to Twitter ASAP!
Now let's say you already have a few empty blocks in your database (deleted items). The follow-up question is how to handle the insertion of a new item?
Append the item to the end of the file.
Pro: fastest possible.
Con: file will get huge because of all the empty blocks that remain because deleted items aren't actually deleted.
Search for an empty block exactly the size of the one you're inserting.
Pro: may get rid of some blocks.
Con: you may end up scanning the whole file at each insert only to find out it's very unlikely to come across a perfectly fitting empty block.
Find the first empty block which is equal or larger than the item you're inserting.
Pro: you probably won't end up scanning the whole file, as you will find an empty block somewhere mid-way; this will keep the file size relatively low.
Con: there will still be lots of leftover 0x00 bytes at the end of items which were inserted into bigger empty blocks than they are.
Rigth now, I think the first deletion method and the last insertion method are probably the "best" mix, but they would still have their own small issues. Alternatively, the first insertion method and scheduled full database recreation. (Probably not a good idea when working with really large databases. Also, each small update in that method will clone the whole item to the end of the file, thus accelerating file growth at a potentially insane rate.)
Unless there is a way of deleting/inserting blocks from/to the middle of the file in a file-system approved way, what's the best way to do this? More importantly, how do databases currently used in production usually handle this?
the engines you name are very different... and your engine seems to have not so much in common with them... your engine sound similar to the good old dBase format...
For deletion the idea with the bit is good... make the part with overwriting deleted items with 0x00 configurable...
For Insertion you should keep a list of free blocks with their respective size... this list gets updated when you delete an item and when you grow the file and when you shrink the filt... this way you can determine very fast how to handle an insertion...
Why not start by looking at how existing systems work? If this is for your own education that will benefit you more in the long run.
Look at the tried and true B-Tree/B+Tree for starters. Then look at some others like Fractal Tree indexes, SSTables, Hash Tables, Merge Tables, etc.
Start by understanding how a 'database' stores and indexes data. There are great open source and documented examples of this both in the NoSQL space as well as the more traditional RDBMS world. Take apart something that exists, understand it, modify it, improve it.
I've been down this road, though not for educational purposes. The .NET space lacked any thread-safe B+Tree that was disk-based, so I wrote one. You can read some about it on my blog at http://csharptest.net/projects/bplustree/ or go download the source and take it apart: http://code.google.com/p/csharptest-net/downloads/list
There are open source databases why dont you look at them first. MySQL source code can be a good start. You can download the source and get into it.
Also, you can start investigating the data structures being used by databases, then look at persistence strategies and so forth.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I am currently learning C# and I have a situation where I have a class that contains a ISet. I don't wish clients to modify this set directly, and most clients only Add and Remove, and I provide accessors through my class to do this.
However, I have one client that wishes to know more about this set and its contents. I don't really want to muddy the wrapper class itself with lots of methods for this one client, so I would prefer to be able to return the set itself in a immutable way.
I found I can't - well, not really. The only options I seem to have are:
Return an IEnumerable (No: restrictive functionality);
ReadOnlyCollection (No: It's a LIST);
Return a copy (No: Bad form IMHO, allows clients to modify the returned collection perhaps unaware that it's not going to change the real object, plus it has performance overhead);
Implement my own ReadOnlySet (No: Would need to derive from ISet and thus meaning I need to implement mutators, probably firing exceptions, I would rather compile time errors - not runtime).
Am I missing something? Am I being unreasonable? Is my only option to provide the full set of accessors on my wrapper? Am I incorrect in my original intent to keep the wrapper clean for the vast majority of clients?
So two questions:
Why isn't there an standard C# immutable Collection interface? It seems like a fairly reasonable requirement?
Why is ReadOnlyCollection annoyingly called ReadOnlyCollection when it is really a ReadOnlyList? I was going to bite the bullet and use that until I found out it was a List (and I use a Set).
Why isn't there a standard C# immutable interface? It seems like a
fairly reasonable requirement?
A standard C# immutable¹ interface already exists: it's called IEnumerable and all containers implement it.
More powerful immutable interfaces are problematic, because there are many kinds of immutability. If the BCL team decided to pick one definition of immutability and elevate it to the immutability status it's certain that down the road people looking for a different kind of immutability would complain about the choice.
Satisfying everyone would mean not only sorting all of the immutability mess out but creating lots of interfaces (good luck picking good names for them too) and baking all these immutability concepts into the language well enough to make immutability a first-class citizen -- remember that there are no second chances here, once you ship a public class its public interface is immutable forever (pun intended). While all of this might be good to have, I 'm really skeptical about the cost/benefit ratio.
It's not difficult to define IReadOnlyList, IReadOnlySet and such if you do require them. I assume that they do not already exist because again, minus 100 points.
ReadOnlyCollection is IMHO either a concession or a class that was required internally for the BCL and exposed to the world because hey, free functionality at really low cost for the BCL team (since it would have to be implemented, documented and tested anyway). In any case I don't think that it does not live in the glamorous System.Collections.Generic neighborhood by chance.
Why is ReadOnlyCollection annoyingly called ReadOnlyCollection when it
is really a ReadOnlyList? I was going to bite the bullet and use that
until I found out it was a List (and I use a Set).
I 'm sure the BCL team would love to be able to go back in time and fix that, because it's almost certainly one of those little inconsistencies that unavoidably sneak into any library of comparable scope. Since ReadOnlyCollection implements IList it should definitely have been called ReadOnlyList.
However, given that a "list" offers more functionality than a "collection", I don't see how this would stop you. Neither is a Set, so you would have to build set-related functionality on top of them in any case (which is not a good idea; just build read-only semantics on top of Set).
¹ We 're tossing around "immutable" a lot here, but that word does not have a singular meaning. I think it would be more appropriate to use "read-only", but I 'll go with your choice of word for consistency.
This may help,
http://blogs.msdn.com/b/jaredpar/archive/2008/04/22/api-design-readonlycollection-t.aspx
I think the only way for you to provide a read-only 'copy' of the set without actually copying the data into another instance of the same or a different structure, is to go with the wrapper and implement all the item-adding-and-removing methods to throw an exception.
If your set is exposed only as an ISet anyway, consumers are only going to see the members defined on the interface, no matter what your wrapper contains - that doesn't seem like it's a bad thing.
I agree it would be nice if there were better support in .net for both immutability and read-only wrappers, though I think it's important to note that there is a huge difference between the concepts. A read-only wrapper promises its creator that consumers of it won't be able to change the underlying object, but makes no promise to consumers that the underlying object itself won't change. By contrast, an immutable object promises its creator and consumers that its values won't change.
I'm not sure why the notion that there are many different types of immunity should be a problem. If I have a generic ImmutableList<T> which takes an unqualified T my expectation would be that it will always contain the same T's as it did when it was created. The collection could in no way affect whether any of the properties of the T's could change, and thus it shouldn't be expected to.
If I had my druthers, most of the collection-related interfaces would include readable, mutable, and immutable variants (mutable and immutable would both extend from readable). I'd also add a write-only contravariant IAppendable interface, as well as an IImmutableEnumerable derived from IEnumerable (I'd add a ToImmutable method to IEnumerable (and IImmutableEnumerable); an implementation could construct an immutable collection, but in some cases that might not be the best approach. For example, a mutable object might implement IEnumerable by return a mutable number of copies of a mutable element. If the number of copies is large, converting to a simple collection could be very wasteful.
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How to share objects across processes in .Net?
I can do this for a single process (single .exe) but how can I do it between processes?
You can do this via remoting. You class needs to inherit from MarshalByRefObject, which will give your clients a proxy to the real object.
You'd need to use some sort of distributed hash table or caching mechanism.
Try to avoid things like remoting if you can, because calls to a remote object can get expensive and start to really hurt performance. If you do go with .net remoting, then carefully consider the interface of the remote object. You should be passing coarse grained data across the process boundry, so avoid chatty interfaces with lots of calls with little bits of data.
What are the requirements of the class that you want to act as a singleton? There might be a totally different way of looking at it. Currently the thinking is that singletons are undesirable because they are difficult to unit test reliably, so avoiding the singleton concept could be the direction to take.
using .Net remoting
(see answers above or by this url: http://msdn.microsoft.com/en-us/library/kwdt6w2k%28VS.71%29.aspx)
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
What's the best way for a common library to know what context - a.k.a. the calling app - it is in? I'm in a very controlled enterprise environment... is there a better way for the library to know what application it is getting called from than reading a setting in the config file? What do you use for this type of thing?
//the rest of the story
I work on the Intranet team for a Fortune 500 manufacturing company. I have created a common library that all of our new .Net applications will make use of. It queries a common database for information about the application and a bunch of other things that are irrelevant to the question. As you can imagine, the common library needs to know what application is calling it. I could just force every application to set a property on some static class or something, but instead I wanted to make it a little more behind the scenes. Currently it requires the developer to put a setting in the app.config or web.config with a key of ApplicationName and a value of - you guessed it - the application name (which is a unique non changing id for us). It then uses Currently it uses ConfigurationManager.AppSettings["ApplicationName"] to pull this in.
There may be a way to do it. I will most likely get down votes for this since I don't plan to answer your actual question at all, but I just couldn't move on without saying something. To me this is an example of the worst sort of coupling possible. Your actually has to look at a DB and behave differently depending on the application that is calling it?
You could also just call Assembly.GetEntryAssembly within the common library class.
Then use the .Name property from the returned assembly.
That means those, that your appsettings table (or whatever it is) needs to be keyed by the assembly name, and that if assembly name should change it'd all break. It means you're slightly less flexible on your naming/key choices here.
getenv() will get you environment variables, which in turn should should give you what you want. But generally, having different behavior depending on the name of the calling program is not considered a best practice. An exception would be if you wanted to print out the calling program's name in a log message. There are of course other exceptions, and your situation may be one of them.
You can also probably get the information via the process id (w/ getpid()).
I'm agreeing with EBGreen here. This is a red flag question to me.
That said, I suggest to do exactly the opposite of what you're suggesting, and simply pass the key (application name or whatever) as a parameter to the function you're calling. You could bake it into the program as public static property on whatever you're entry point is, and make a little helper function that basically overloads the call to the repository. That would hide it and reduce error prone repetition..
Better yet, you could also make your entry point classes implement an interface that has a method to get the application name (it just returns a string constant) and methods for setting the values returned from the database.. Call it "IIntranetApp" or something. Then you just pass "this" to a function expecting "IIntranetApp", and it magically fills in the blanks needed from your central repo.
Something like:
public interface IIntranetApp
{
string GetApplicationName();
void SetConnectionString(string connectionString);
// etc... add methods as necessary
}