Is there a way to check for the size of a class in C#?
My reason for asking is:
I have a routine that stores a class's data in a file, and a different routine that loads this object (class) from that same file. Each attribute is stored in a specific order, and if you change this class, you have to be reminded that these export/import routines need changing.
An example in C++ (no matter how clumsy or bad this style might be) would be the following:
#define PERSON_CLASS_SIZE 8

class Person
{
    char *firstName;
};

...

bool ExportPerson(Person p)
{
    if (sizeof(Person) != PERSON_CLASS_SIZE)
    {
        CatastrophicAlert("You have changed the Person class and not fixed this export routine!");
    }
    ...
}
Thus you need to know the size of Person at compile time, and modify the export/import routines with this size accordingly.
Is there a way to do something similar to this in C#, or are there other ways of "making sure" a different developer changes the import/export routines if he changes a class?
... Apart from the obvious "just comment this in the class, this guarantees that a developer never screws things up"-answer.
Thanks in advance.
Each attribute is stored in a specific order, and if you change this class, you have to be reminded that these export/import routines need changing.
It sounds like you're writing your own serialization mechanism. If that's the case, you should probably include some sort of "fingerprint" of the expected properties in the right order, and validate that at read time. You can then include the current fingerprint in a unit test, which will then fail if a property is added. The appropriate action can then be taken (e.g. migrating existing data) and the unit test updated.
Just checking the size of the class certainly wouldn't find all errors - if you added one property and deleted one of the same size in the same change, you could break data without noticing it.
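A minimal sketch of such a fingerprint, using reflection (SchemaFingerprint is a hypothetical helper, and relying on MetadataToken to approximate declaration order is an assumption that holds for typical single-file classes):

using System;
using System.Linq;
using System.Reflection;

static class SchemaFingerprint
{
    // Describes a type's public properties, in declaration order, as one string.
    // Any added, removed, renamed or retyped property changes the fingerprint.
    public static string Of(Type type)
    {
        var parts = type
            .GetProperties(BindingFlags.Instance | BindingFlags.Public)
            .OrderBy(p => p.MetadataToken) // approximates declaration order
            .Select(p => p.PropertyType.FullName + " " + p.Name);
        return string.Join(";", parts);
    }
}

A unit test can then pin the expected value, e.g. Assert.AreEqual("System.String FirstName", SchemaFingerprint.Of(typeof(Person))), and will fail the moment someone changes the class without revisiting the export/import code.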
Apart from the fact that this is probably not the best way to achieve what you need,
I think the fastest way is to use Cecil. You can get the IL body of the entire class.
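For instance, a rough sketch with Mono.Cecil (the assembly and type names here are placeholders, and the "expected" numbers would be whatever you record when writing the export routine):

using System;
using Mono.Cecil;

// Load the compiled assembly and inspect the shape of the Person type.
var module = ModuleDefinition.ReadModule("MyAssembly.dll");
var person = module.GetType("MyNamespace.Person");

// Compare the field count (or summed IL body size) against a recorded value.
int ilSize = 0;
foreach (var method in person.Methods)
    if (method.HasBody)
        ilSize += method.Body.CodeSize;

if (person.Fields.Count != 1 /* expected */)
    throw new InvalidOperationException(
        "Person has changed and the export routine was not updated!");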
In PostScript you have VM to store the values of composite objects.
They can be stored in local or global VM depending on the VM allocation mode of the interpreter.
I'm working on an interpreter in C# (a language a bit similar to Java), and I can't figure out how to represent local and global VM.
Let's say I have an object:
public class StringObj : Composite {
    public string Data { get; set; }
}
The Data property (the value of StringObj) is stored in either local or global VM. But how could this be represented in C# (or Java)?
C# already has its own memory management (stack/heap/...), but these are internals of the language and the .NET Framework, which I can't control.
Do I need to create my own memory structure? If so, how would/could that be represented?
Or would it be ok to just store a bool property on each Composite object to know if it is local or global, something like this:
public class StringObj : Composite {
    public string Data { get; set; }
    public bool IsGlobal { get; set; }
}
Update:
Maybe if I know how the "save" operator works, I might better understand how to implement the memory management.
What exactly does the "save" operator save?
"creates a snapshot of the current state of virtual memory"
From reading about the restore operator, I think it saves the following:
The array packing mode (packing)
VM allocation mode (boolean)
object output format (?)
user interpreter parameters (?)
saves a copy of the current graphics state on the graphics state stack
What else does it save? The phrase "current state of virtual memory" is not very well defined.
Should I also check every object on the stack to verify whether it is composite or not, and save the value of each object on the stack? Or are stacks/dictionaries untouched? Or..?
This isn't really a question anyone else can answer for you, if you are intent on writing your own PostScript interpreter.
You will need to fully understand the memory management of PostScript objects, and their lifetime, and design your own memory management around that. I think it very unlikely that you can get away without designing your own memory structure(s), I've certainly never seen a PostScript interpreter which didn't.
Again I can't answer a vague question like "how would/could that be represented?", that's much too general. There are many ways you could design a PostScript memory manager, the choice is entirely yours. If memory management interests you then presumably you will have a preferred approach, if it doesn't then stick with something simple, just make sure it covers all the basics.
By the way, is there a reason you are writing your own interpreter, other than personal satisfaction? The general consensus is that it's more than five man-years of work to implement a PostScript interpreter (potentially less if you use a pre-existing graphics library for rendering, assuming you have a PostScript-compatible one). That seems like a lot of work for a 30+ year old language that is not in wide usage.
Regarding save:
save saves, well, everything.... Just like the sentence you quoted, a save is a mark that you can later return to, and encapsulates everything in the PostScript VM.
I think you've missed the fact that I can make changes to a composite object, and those changes are subject to save and restore.
Try this:
%!
/mydict <</Test (this is a string)>> def
save
mydict /Test (This is not a string) put
mydict /Test get == flush
restore
mydict /Test get == flush
Notice that the content of the dictionary reverts after the restore.
One interpreter I knew of used save as, essentially, a 'high-water mark', combined with a copy-on-write architecture. If you performed an operation on an existing composite object which was below a save mark, then the object was copied, and the alteration made to the copy. Then a restore simply freed everything back to the last save mark. However, details of implementation are not specified; provided the interpreter behaves correctly you can do anything you like.
Note that objects in global VM are not subject to save and restore. You also need to be careful about the stack contents when doing a restore: if the stack holds objects that the restore would discard, an invalidrestore error is triggered.
Note 2: the job server loop's save and restore will affect global VM.
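Purely as an illustration of that copy-on-write idea (every name here is invented; this is a sketch of one possible design, not how any real interpreter must work), a C# shape for save/restore might look like:

using System.Collections.Generic;

// Every composite object lives in a VM and remembers the save level at
// which its last snapshot was journaled.
abstract class Composite
{
    public int LastJournaledLevel;
    public abstract object TakeSnapshot();           // deep copy of current state
    public abstract void RestoreSnapshot(object s);  // roll back to that copy
}

class LocalVM
{
    private int saveLevel;
    // One journal per outstanding save: (object, old state, old journal level).
    private readonly Stack<List<(Composite, object, int)>> journals =
        new Stack<List<(Composite, object, int)>>();

    public void Save()
    {
        saveLevel++;
        journals.Push(new List<(Composite, object, int)>());
    }

    // A composite object calls this before mutating itself.
    // Objects in global VM would simply never call it.
    public void RecordWrite(Composite obj)
    {
        if (saveLevel > 0 && obj.LastJournaledLevel < saveLevel)
        {
            journals.Peek().Add((obj, obj.TakeSnapshot(), obj.LastJournaledLevel));
            obj.LastJournaledLevel = saveLevel;
        }
    }

    public void Restore()
    {
        foreach (var (obj, snapshot, oldLevel) in journals.Pop())
        {
            obj.RestoreSnapshot(snapshot);
            obj.LastJournaledLevel = oldLevel;
        }
        saveLevel--;
    }
}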
I'm writing the simple card game "War" for homework and now that the game works, I'm trying to make it more modular and organized. Below is a section of Main() containing the bulk of the program. I should mention, the course is being taught in C#, but it is not a C# course. Rather, we're learning basic logic and OOP concepts so I may not be taking advantage of some C# features.
bool sameCard = true;
while (sameCard)
{
    sameCard = false;
    card1.setVal(random.Next(1, 14));       // set card value
    val1 = determineFace(card1.getVal());   // assign 'face' cards accordingly
    suit = suitArr[random.Next(0, 4)];      // choose suit string from array
    card1.setSuit(suit);                    // set card suit

    card2.setVal(random.Next(1, 14));       // rinse, repeat for card2...
    val2 = determineFace(card2.getVal());
    suit = suitArr[random.Next(0, 4)];
    card2.setSuit(suit);

    // check if same card is drawn twice:
    catchDuplicate(ref card1, ref card2, ref sameCard);
}

Console.WriteLine("Player: {0} of {1}", val1, card1.getSuit());
Console.WriteLine("Computer: {0} of {1}", val2, card2.getSuit());

// compare card values, display winner:
determineWinner(card1, card2);
So here are my questions:
Can I use loops in Main() and still consider it modular?
Is the card-drawing process written well/contained properly?
Is it considered bad practice to print messages in a method (i.e.: determineWinner())?
I've only been programming for two semesters and I'd like to form good habits at this stage. Any input/advice would be much appreciated.
Edit:
catchDuplicate() is now a boolean method and the call looks like this:
sameCard = catchDuplicate(card1, card2);
thanks to @Douglas.
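For reference, a plausible body for that boolean version (assuming Card exposes getVal() and getSuit() as in the snippet above):

static bool catchDuplicate(Card card1, Card card2)
{
    // Two draws are the same card when both value and suit match.
    return card1.getVal() == card2.getVal()
        && card1.getSuit() == card2.getSuit();
}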
Can I use loops in Main() and still consider it modular?
Yes, you can. However, more often than not, Main in OOP programs contains only a handful of method calls that initiate the core functionality, which is then stored in other classes.
Is the card-drawing process written well/contained properly?
Partially. If I understand your code correctly (you only show Main), you undertake some actions that, when done in the wrong order or with the wrong values, may not end up well. Think of it this way: if you sell your class library (not the whole product, but only your classes), what would be the clearest way to use your library for an uninitiated user?
E.g., consider a class Deck that contains a deck of cards. On creation it creates all cards and shuffles them. Give it a method Shuffle to shuffle the deck when the user of your class needs to, and add methods like DrawCard for dealing cards.
Further: you have methods that are not contained within a class of their own, yet have functionality that would be better off in a class. E.g., determineFace is better suited to be a method on class Card (assuming card2 is of type Card). A sketch of both ideas follows.
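A rough sketch of that Deck/Card split (the details are assumptions to illustrate the shape, not requirements):

using System;
using System.Collections.Generic;

public class Card
{
    public int Value { get; private set; }   // 1..13
    public string Suit { get; private set; }

    public Card(int value, string suit)
    {
        Value = value;
        Suit = suit;
    }

    // determineFace moved onto the class it describes.
    public string Face()
    {
        switch (Value)
        {
            case 1:  return "Ace";
            case 11: return "Jack";
            case 12: return "Queen";
            case 13: return "King";
            default: return Value.ToString();
        }
    }
}

public class Deck
{
    private readonly List<Card> cards = new List<Card>();
    private readonly Random random = new Random();

    public Deck()
    {
        foreach (string suit in new[] { "Hearts", "Diamonds", "Clubs", "Spades" })
            for (int v = 1; v <= 13; v++)
                cards.Add(new Card(v, suit));
        Shuffle();
    }

    public void Shuffle()
    {
        // Fisher-Yates shuffle.
        for (int i = cards.Count - 1; i > 0; i--)
        {
            int j = random.Next(i + 1);
            Card tmp = cards[i];
            cards[i] = cards[j];
            cards[j] = tmp;
        }
    }

    // Drawing removes the card, so duplicates are impossible by construction.
    public Card DrawCard()
    {
        Card top = cards[cards.Count - 1];
        cards.RemoveAt(cards.Count - 1);
        return top;
    }

    public int NumCards { get { return cards.Count; } }
}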
Is it considered bad practice to print messages in a method (i.e.: determineWinner())?
Yes and no. If you only want messages to be visible during testing, use Debug.WriteLine. In a production build, these will be no-ops. However, when you write messages in a production version, make sure that this is clear from the name of the method, e.g. WriteWinnerToConsole or something.
It's more common not to do this because: in what format would you print the information? What text should come with it? How do you handle localization? However, when you write a program, obviously it must contain methods that write stuff to the screen (or form, or web page). These are usually contained in specific classes for that purpose. Here, that could be the class CardGameX, for instance.
General thoughts
Think about the principle that one method/function should have one task and one task only, and should not have side effects. (If a method calculates a square and also prints the result, the printing is the side effect.)
The principle for classes is, very high-level: a class contains methods that logically belong together and operate on the same set of properties/fields. An example of the opposite: Shuffle should not be a method in class Card. However, it would belong logically in the class Deck.
If the main goal of your homework is to create a modular application, you must encapsulate all logic in specialized classes.
Each class should do only one job.
Functions that operate on a card belong in a Card class.
Functions that draw cards should be in another class.
I think that is the goal of your homework. Good luck!
Take all advices on "best practices" with a grain of salt. Always think for yourself.
That said:
Can I use loops in Main() and still consider it modular?
The two concepts are independent. If your Main() only does high-level logic (i.e. calls other methods), then it does not matter whether it does so in a loop; after all, the algorithm requires a loop. (You wouldn't add a loop unnecessarily, would you?)
As a rule of thumb, if possible/practical, make your program self-documenting. Make it "readable" so, if a new person (or even you, a few months from now) looks at it they can understand it at any level.
Is the card-drawing process written well/contained properly?
No. First of all, a card should never be selected twice. For a more "modular" approach I would have something like this:
while (Deck.NumCards >= 2)
{
    Card card1 = Deck.GetACard();
    Card card2 = Deck.GetACard();
    PrintSomeStuffAboutACard(GetWinner(card1, card2));
}
Is it considered bad practice to print messages in a method (ie: determineWinner())?
Is the purpose of determineWinner to print a message? If the answer is "No", then it is not a matter of "bad practice": your function is plain wrong.
That said, there is such a thing as a "debug" build and a "release" build. To aid you in debugging the application and figuring out what works and what doesn't it is a good idea to add logging messages.
Make sure they are relevant and that they are not executed in the "release" build.
Q: Can I use loops in Main() and still consider it modular?
A: Yes, you can use loops, that doesn't really have an impact on modularity.
Q: Is the card-drawing process written well/contained properly?
A: If you want to be more modular, turn DrawCard into a function/method. Maybe just write DrawCards instead of DrawCard, but then there's an optimization-versus-modularity question there.
Q: Is it considered bad practice to print messages in a method (ie: determineWinner())?
A: I wouldn't say printing messages in a method is bad practice, it just depends on context. Ideally, the game itself doesn't handle anything but game logic. The program can have some kind of game object and it can read state from the game object. This way, you could technically change the game from being text-based to being graphical. I mean, that's ideal for modularity, but it may not be practical given a deadline. You always have to decide when you have to sacrifice a best practice because there isn't enough time. Sadly, this is all too often a common occurrence.
Separate game logic from the presentation of it. With a simple game like this, it's an unnecessary dependency.
I am writing a log file decoder which should be capable of reading many different structures of files. My question is how best to represent this data. I am using C#, but am new to OOP.
An example:
The log files have a range of sensor values. One sensor reading can be called A, another B. Obviously, there are many more than 2 entry types.
In different log files, they could be stored either as ABABABABAB or AAAAABBBBB.
I was thinking of describing this as blocks of entries. So in the first case, a block would be 'AB', with 5 blocks. In the second case, the first block is 'A', read 5 times. This is followed by a block of 'B', read 5 times.
This is quite a simplification (there are actually 40 different types of log file, each with up to 40 sensor values in a block). No log has more than 300 blocks.
At the moment, I store all of this in a datatable. I have a column for each entry, with a property of how many to read. If this is set to -1, it continues to the next column in the block. If not, it will assume that it has reached the end of the block.
This all seems quite clumsy. Can anyone suggest a better way of doing this?
I think you should first start here, and then here to learn a little bit about what object oriented programming is. Don't worry about your current problem while learning about OOP.
As you are learning about OO concepts, you should begin to understand that code is not data, and data is not code. It does not matter how you represent your data from an OOP stance. You can write OO code to consume your data, or you could write procedural code to consume your data; that part is irrelevant to the format of the data.
So then getting back to your question
My question is how best to represent this data
It depends on your needs. What is writing the log file? Do you have control over the writer and reader? If I did, I would rely on the built-in serialization methods to minimize the amount of code I need to write. Is the log file going to be really long? If so, the "datatable" approach you described is usually better. If the log file isn't going to be huge in file size, XML is really easy to work with.
Very basic and straightforward:
Define an interface IEntry with properties like string EntryBlock and int Count
Define a class which represents an entry and implements IEntry (see the sketch below)
Code which does binary serialization should be aware of interfaces; for instance, it should refer to IEnumerable<IEntry>
Class Entry could override ToString() to return something like [ABAB-2], if that would be helpful during serialization
Interface IEntry could provide a method void CreateFromRawString(string rawDataFromLog) if that would be helpful; decide yourself
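A bare-bones sketch of those points (the property and class names come from the list above; everything else is an assumption):

using System.Collections.Generic;

public interface IEntry
{
    string EntryBlock { get; }   // e.g. "AB" or "A"
    int Count { get; }           // how many times the block repeats
}

public class Entry : IEntry
{
    public string EntryBlock { get; private set; }
    public int Count { get; private set; }

    public Entry(string entryBlock, int count)
    {
        EntryBlock = entryBlock;
        Count = count;
    }

    // Handy while debugging serialization: [ABAB-2] etc.
    public override string ToString()
    {
        return "[" + EntryBlock + "-" + Count + "]";
    }
}

// Serialization code should depend on the interface, e.g.:
// void Serialize(IEnumerable<IEntry> entries) { ... }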
If you want more info, please share the code you are using for serialization/deserialization.
In addition to what Bob has offered, I highly recommend Head First Design Patterns as a gentle, but robust introduction to OO for a C# programmer. The samples are in Java, which translate easily to C#.
As for OOP, you want to learn SOLID.
I would suggest you build this using Test Driven Development.
Start small, with a simple fragment of your log data and write a test like (you'll find a better way to do this with experience and apply it to your situation):
using System.Collections.Generic;
using NUnit.Framework;

[Test]
public void ReadSequence_FiveA_ReturnsProperList()
{
    // Arrange
    string sequenceStub = "AAAAA";

    // Act
    MyFileDecoder decoder = new MyFileDecoder();
    List<string> results = decoder.ReadSequence(sequenceStub);

    // Assert
    Assert.AreEqual(5, results.Count);
    Assert.AreEqual("A", results[0]);
}
That test code snippet is just a starting point, and I've tried to be rather verbose in the assertions. You can come up with more creative ways over time. The point is to start small. Once this test passes, add another test where you mix "AB" and change your decoder to handle this properly. Eventually, you'll have a large set of tests that handle your different formats. Using TDD, you'll be on the path to using SOLID properly. Whenever you find something you can't test, you should review the rules and see if you can't make it simpler and inject dependencies.
Eventually you'll get into mocking. For example, you might find that you'd rather INJECT the ability for your MyFileDecoder class to have a dependency that will read your log file. In that case, you would create a mock object and pass that into the constructor and set the mock to return the sequenceStub when a method is called.
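A sketch of what that injection could look like (ILogSource and ReadAll are invented names; a mocking library would replace the hand-rolled stub):

// The decoder depends on an abstraction instead of opening files itself.
public interface ILogSource
{
    string ReadAll();
}

public class MyFileDecoder
{
    private readonly ILogSource source;

    public MyFileDecoder(ILogSource source)
    {
        this.source = source;
    }

    // ... ReadSequence etc. operate on source.ReadAll() ...
}

// Hand-rolled stub for tests:
class StubLogSource : ILogSource
{
    public string ReadAll() { return "AAAAA"; }
}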
Some background information:
I am working on a C#/WPF application, which basically is about creating, editing, saving and loading some data model.
The data model consists of a hierarchy of various objects. There is a "root" object of class A, which has a list of objects of class B, which each has a list of objects of class C, etc. Around 30 classes are involved in total.
Now my problem is that I want to prompt the user with the usual "you have unsaved changes, save?" dialog, if he tries to exit the program. But how do I know if the data in current loaded model is actually changed?
There are of course ways to solve this, e.g. reloading the model from file and comparing it against the one in memory value by value, or making every UI control set a flag indicating the model has been changed. Instead, I want to create a hash value based on the model state on load, generate a new value when the user tries to exit, and compare the two.
Now the question:
So inspired by that, I was wondering if there exists some way to generate a hash value from the (value) state of some arbitrary complex object? Preferably in a generic way, i.e. with no need to apply attributes to each involved class/field.
One idea could be to use some of .NET's serialization functionality (assuming it will work out of the box in this case) and apply a hash function to the content of the resulting file. However, I guess there exists a more suitable approach.
Thanks in advance.
Edit:
Point taken about the hashing and possible collisions. Instead, I am going for deep comparing value by value. I am already using the XML serializer for persistence, so I am just going to serialize and compare chars. Not pretty, but it does the trick in this case.
OK, you can of course use reflection and some sort of recursive function. But keep in mind that every object is a model of a particular thing; there may be a lot of "unimportant" fields and properties.
And, thanks to @compie:
You can create a hash function just for your domain, but this requires strong mathematical skills.
Or you can try to use classic hash functions like SHA. Just treat your object as a string or byte array.
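Combining that with the serialization idea from the question, a sketch might look like this (it assumes the model is XML-serializable, which the asker's edit suggests it is):

using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Xml.Serialization;

static class ModelState
{
    public static byte[] Hash<T>(T model)
    {
        using (var stream = new MemoryStream())
        {
            // Serialize the whole object graph to bytes...
            new XmlSerializer(typeof(T)).Serialize(stream, model);
            stream.Position = 0;

            // ...and hash those bytes. Equal hashes suggest equal state,
            // but collisions are possible, as another answer here notes.
            using (var sha = SHA256.Create())
                return sha.ComputeHash(stream);
        }
    }
}

// On load:  var baseline = ModelState.Hash(model);
// On exit:  if (!baseline.SequenceEqual(ModelState.Hash(model))) PromptToSave();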
Because this is a WPF app, it may be easier than you think to be notified of changes as they happen. The event architecture of WPF allows you to create event handlers at a level somewhere above where the event actually originates. So, you could create event handlers for the various "change" events of your UI elements in the root window of your interface and set the "changed" flag at that scope.
WPF Routed Events Overview
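For example (a sketch; MainWindow and isDirty are invented names, and TextBoxBase.TextChangedEvent is just one of several change events you could hook this way at the window level):

using System.Windows;
using System.Windows.Controls;
using System.Windows.Controls.Primitives;

public partial class MainWindow : Window
{
    private bool isDirty;

    public MainWindow()
    {
        InitializeComponent();

        // handledEventsToo: true means the flag is set even when a child
        // control has already marked the event as handled.
        AddHandler(TextBoxBase.TextChangedEvent,
                   new TextChangedEventHandler((s, e) => isDirty = true),
                   handledEventsToo: true);
    }
}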
I would advise against this. Different objects can have the same hash. It's not safe to rely on this to check whether changes have to be saved.
We have a lot of code that passes about “Ids” of data rows; these are mostly ints or guids. I could make this code safer by creating a different struct for the id of each database table. Then the type checker will help to find cases when the wrong ID is passed.
E.g. the Person table has a column called PersonId, and we have code like:
DeletePerson(int personId)
DeleteCar(int carId)
Would it be better to have:
struct PersonId
{
    private int id;
    // GetHashCode etc....
}

DeletePerson(PersonId personId)
DeleteCar(CarId carId)
Has anyone got real-life experience of doing this?
Is it worth the overhead?
Or is it more pain than it is worth?
(It would also make it easier to change the data type of the primary key in the database; that is why I thought of this idea in the first place.)
Please don't say "use an ORM" or suggest some other big change to the system design, as I know an ORM would be a better option, but that is not within my power at present. However, I can make minor changes like the above to the module I am working on at present.
Update:
Note this is not a web application and the IDs are kept in memory and passed about with WCF, so there is no conversion to/from strings at the edge. There is no reason the WCF interface can't use the PersonId type etc. The PersonId type could even be used in the WPF/WinForms UI code.
The only inherently "untyped" bit of the system is the database.
This seems to be down to the cost/benefit of spending time writing code that the compiler can check better, or spending the time writing more unit tests. I am coming down more on the side of spending the time on testing, as I would like to see at least some unit tests in the code base.
It's hard to see how it could be worth it: I recommend doing it only as a last resort and only if people are actually mixing identifiers during development or reporting difficulty keeping them straight.
In web applications in particular it won't even offer the safety you're hoping for: typically you'll be converting strings into integers anyway. There are just too many cases where you'll find yourself writing silly code like this:
int personId;
if (Int32.TryParse(Request["personId"], out personId)) {
    this.person = this.PersonRepository.Get(new PersonId(personId));
}
Dealing with complex state in memory certainly improves the case for strongly-typed IDs, but I think Arthur's idea is even better: to avoid confusion, demand an entity instance instead of an identifier. In some situations, performance and memory considerations could make that impractical, but even those should be rare enough that code review would be just as effective without the negative side-effects (quite the reverse!).
I've worked on a system that did this, and it didn't really provide any value. We didn't have ambiguities like the ones you're describing, and in terms of future-proofing, it made it slightly harder to implement new features without any payoff. (No ID's data type changed in two years, at any rate. It could certainly happen at some point, but as far as I know, the return on investment for that is currently negative.)
I wouldn't make a special id for this. This is mostly a testing issue. You can test the code and make sure it does what it is supposed to.
You can create a standard way of doing things in your system that helps future maintenance (similar to what you mention) by passing in the whole object to be manipulated. Of course, if you named your parameter (int personId) and had documentation, then any non-malicious programmer should be able to use the code effectively when calling that method. Passing a whole object will do the type matching that you are looking for, and that should be enough of a standardized way.
I just see having a special structure made to guard against this as adding more work for little benefit. Even if you did this, someone could come along and find a convenient way to make a 'helper' method and bypass whatever structure you put in place anyway so it really isn't a guarantee.
You can just opt for GUIDs, like you suggested yourself. Then, you won't have to worry about passing a person ID of "42" to DeleteCar() and accidentally delete the car with ID of 42. GUIDs are unique; if you pass a person GUID to DeleteCar in your code because of a programming typo, that GUID will not be a PK of any car in the database.
You could create a simple Id class which can help differentiate in code between the two:
public class Id<T>
{
    private int RawValue { get; set; }

    public Id(int value)
    {
        this.RawValue = value;
    }

    // Getting the raw value back out requires an explicit cast.
    public static explicit operator int(Id<T> id) { return id.RawValue; }

    // This cast is optional and can be excluded for further strictness.
    public static implicit operator Id<T>(int value) { return new Id<T>(value); }
}
Used like so:
class SomeClass
{
    public Id<Person> PersonId { get; set; }
    public Id<Car> CarId { get; set; }
}
Assuming your values would only be retrieved from the database, unless you explicitly cast the value to an integer, it is not possible to use the two in each other's place.
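To illustrate the compile-time safety this buys (DeletePerson here is a hypothetical signature):

void DeletePerson(Id<Person> personId) { /* ... */ }

var ids = new SomeClass { PersonId = 1, CarId = 2 };  // implicit casts in
DeletePerson(ids.PersonId);    // compiles
// DeletePerson(ids.CarId);    // compile-time error: Id<Car> is not Id<Person>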
I don't see much value in custom checking in this case. You might want to beef up your testing suite to check that two things are happening:
Your data access code always works as you expect (i.e., you aren't loading inconsistent Key information into your classes and getting misuse because of that).
That your "round trip" code is working as expected (i.e., that loading a record, making a change and saving it back isn't somehow corrupting your business logic objects).
Having a data access (and business logic) layer you can trust is crucial to being able to address the bigger-picture problems you will encounter when implementing the actual business requirements. If your data layer is unreliable, you will spend a lot of effort tracking (or worse, working around) problems at that level that surface when you put load on the subsystem.
If instead your data access code is robust in the face of incorrect usage (what your test suite should be proving to you) then you can relax a bit on the higher levels and trust they will throw exceptions (or however you are dealing with it) when abused.
The reason you hear people suggesting an ORM is that many of these issues are dealt with in a reliable way by such tools. If your implementation is far enough along that such a switch would be painful, just keep in mind that your low-level data access layer needs to be as robust as a good ORM if you really want to be able to trust (and thus, to a certain extent, forget about) your data access.
Instead of custom validation, your testing suite could inject code (via dependency injection) that does robust tests of your Keys (hitting the database to verify each change) as the tests run and that injects production code that omits or restricts such tests for performance reasons. Your data layer will throw errors on failed keys (if you have your foreign keys set up correctly there) so you should also be able to handle those exceptions.
My gut says this just isn't worth the hassle. My first question to you would be whether you have actually found bugs where the wrong int was being passed (a Car ID instead of a Person ID, in your example). If so, it is probably a symptom of a weaker overall architecture, in that your domain objects have too much coupling and pass too many arguments around in method parameters rather than acting on internal variables.