Iterating through diff changes in LibGit2Sharp


What could be the best (as in performant, simple) way to iterate over TreeChanges in LibGit2Sharp?
If I access the .Patch property, I retrieve the full text of the changes. This is not quite enough for me: ideally I would like to iterate over the diff lines, retrieve the status of each line (modified, added, deleted), and build my own output from it.
Update:
Let's say I want to build my own diff output. What I'd like to do is to iterate over the changed lines, and during iteration I would check for the type of change (added, removed), and construct my output.
For example:
var diff = "";
foreach (LineChange line in changes) // Bogus class "LineChange"
{
    if (line.Type == LineChange.TYPE_ADDED)
        diff += "+";
    else
        diff += "-";
    diff += line.Content;
    diff += "\n";
}
The above is just a simple example of the kind of flexibility I'm looking for: being able to go through the changes and run some logic depending on the type of each line change. The Patch property is already "built". One way would be to parse it, but it seems silly that the library first builds the output and then I parse it. I'd rather use the building ingredients directly.
I need this kind of functionality so that I can display a visual diff of changes which involves far more code and logic than the simple example I gave above.

As far as I can see, this information is not exposed by libgit2sharp, but it's provided by libgit2 in the case of blob diffs (but not for tree diffs). The relevant code is in ContentChanges.cs, specifically in the constructor and in the LineCallback() method (the code for tree diffs is in TreeChanges.cs).
Because of this, I think you have two options:
Invoke the method git_diff_blobs(), that's used internally by ContentChanges, yourself, either using reflection (it's an internal method in NativeMethods), or by copying the PInvoke signature to your project. You will most likely also need Utf8Marshaler.
Modify the code of ContentChanges, so that it fits your needs. If you do this, it might make sense to create a pull request for that change, so that others could use it too.

@svick is right. It's not exposed.
It might be useful to open an issue/feature request to further discuss this topic. Indeed, exposing a full blown line based diffgram might not fit the current "grain" of the library. However, provided you can come up with a scenario/use case that would benefit most of the users, some research may be invested in order to widen the API.
Besides this option, there might be other solutions: post-process the currently produced patch against the previous version of the file.
See this SO question for potential leads
Neil Fraser's "Diff Strategies" paper is also a great source of strategies and potential caveats regarding what a diff tool might aim at
DiffPlex, as a working visualization tool, might be inspirational as well
With some more work, one might even achieve something similar to the kind of visualization offered by the Perforce 4 viewer (image source: macworld.com).
Note: In order to ease this, it might be useful to expose in C# the libgit2 diffing options.
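As a rough illustration of the post-processing route, the sketch below classifies the lines of a unified diff text, such as the string obtained from the .Patch property, into added, removed and context lines. It is plain string handling and assumes git-style patch text; the names PatchLineKind, PatchLine and PatchParser are made up for this example and are not part of LibGit2Sharp.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper types; none of these names exist in LibGit2Sharp.
public enum PatchLineKind { Context, Added, Removed }

public sealed class PatchLine
{
    public PatchLineKind Kind { get; }
    public string Content { get; }

    public PatchLine(PatchLineKind kind, string content)
    {
        Kind = kind;
        Content = content;
    }
}

public static class PatchParser
{
    // Walks unified diff text and yields only the lines inside hunks,
    // tagged by their leading '+', '-' or ' ' marker.
    public static IEnumerable<PatchLine> Parse(string patch)
    {
        bool inHunk = false;
        foreach (string raw in patch.Split('\n'))
        {
            string line = raw.TrimEnd('\r');
            if (line.StartsWith("@@", StringComparison.Ordinal))
            {
                inHunk = true;   // hunk header: content lines follow
                continue;
            }
            if (line.StartsWith("diff --git", StringComparison.Ordinal))
            {
                inHunk = false;  // next file: skip its "---" and "+++" header lines
                continue;
            }
            if (!inHunk || line.Length == 0)
                continue;

            switch (line[0])
            {
                case '+': yield return new PatchLine(PatchLineKind.Added, line.Substring(1)); break;
                case '-': yield return new PatchLine(PatchLineKind.Removed, line.Substring(1)); break;
                case ' ': yield return new PatchLine(PatchLineKind.Context, line.Substring(1)); break;
                // anything else (e.g. "\ No newline at end of file") is ignored
            }
        }
    }
}
```

A caller could then write foreach (var line in PatchParser.Parse(patchText)) and branch on line.Kind, which is close to the loop sketched in the question. This still parses output the library has already built, so it is a workaround rather than the direct access the question asks for.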

Related

Programmatically check for a change in a class in C#

Is there a way to check for the size of a class in C#?
My reason for asking is:
I have a routine that stores a class's data in a file, and a different routine that loads this object (class) from that same file. Each attribute is stored in a specific order, and if you change this class you have to be reminded that these export/import routines need changing.
An example in C++ (no matter how clumsy or bad programming this might be) would be the following:
#define PERSON_CLASS_SIZE 8
class Person
{
    char *firstName;
};
...
bool ExportPerson(Person p)
{
    if (sizeof(Person) != PERSON_CLASS_SIZE)
    {
        CatastrophicAlert("You have changed the Person class and not fixed this export routine!");
    }
}
Thus, before compile time you need to know the size of Person, and modify the export/import routines accordingly.
Is there a way to do something similar to this in C#, or are there other ways of "making sure" a different developer changes the import/export routines if he changes a class?
... Apart from the obvious "just comment this in the class, this guarantees that a developer never screws things up"-answer.
Thanks in advance.

Each attribute is stored in a specific order, and if you change this class you have to be reminded of these export/import routines needs changing.
It sounds like you're writing your own serialization mechanism. If that's the case, you should probably include some sort of "fingerprint" of the expected properties in the right order, and validate that at read time. You can then include the current fingerprint in a unit test, which will then fail if a property is added. The appropriate action can then be taken (e.g. migrating existing data) and the unit test updated.
Just checking the size of the class certainly wouldn't find all errors - if you added one property and deleted one of the same size in the same change, you could break data without noticing it.
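A minimal sketch of such a fingerprint, assuming it is enough to detect added, removed, renamed or retyped public properties. The names TypeFingerprint and Person are hypothetical. The properties are sorted by name so that the result is deterministic, which means this version does not detect a pure reordering; the export routine itself remains the authority on field order.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class TypeFingerprint
{
    // Builds a string such as "Age:System.Int32;FirstName:System.String"
    // from the public instance properties of T, sorted by name.
    public static string Of<T>()
    {
        var parts = typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .OrderBy(p => p.Name, StringComparer.Ordinal)
            .Select(p => p.Name + ":" + p.PropertyType.FullName);
        return string.Join(";", parts);
    }
}

// Hypothetical class standing in for the one being exported.
public class Person
{
    public string FirstName { get; set; }
    public int Age { get; set; }
}
```

A unit test would compare TypeFingerprint.Of<Person>() with a constant stored next to the export routine, here "Age:System.Int32;FirstName:System.String", and fail as soon as someone changes the class without updating both.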

Apart from the fact that this is probably not the best way to achieve what you need,
I think the fastest way is to use Cecil. You can get the IL body of the entire class.

How to detect if element exist using a lambda expression in c#?

I've been using a try/catch statement to run through whether or not an element exists when I parse through it. Obviously this isn't the best way of doing it. I've been using LINQ (lambda expressions) for the majority of my parsing, but I just don't know how to detect if an element is there or not.
One big problem with some solutions I found is that they take 3-4 times more code than using the try/catch block, which kind of defeats the purpose.
I would assume the code would look something like this:
if(document.Element("myElement").Exists())
{
    var myValue = document.Element("myElement").Value;
}
I did find this link, but the looping is unnecessary in my case, as I can guarantee that the element will only show up once if it exists. It also requires creating a dummy element, which seems unnecessary as well. It doesn't seem like the best way (or a good way) of checking. Any ideas?

XElement e = document.Element("myElement");
if (e != null)
{
    var myValue = e.Value;
}
http://msdn.microsoft.com/en-us/library/system.xml.linq.xcontainer.element.aspx
"Gets the first (in document order) child element with the specified XName."
"Returns Nothing if there is no element with the specified name."

Any() is the LINQ method for this.
Assert.IsFalse( new [] { 1, 2, 3, 4 }.Any( i => i == 5 ));

By the way, the comment above about "try / catch" can be true, but in almost all cases it isn't. It depends on how you build your solution. In your Release build, turn off as many flags as possible that smell like "Debug" even from a distance. The less the runtime has been told to memorize stack traces and the like during building, the faster "try / catch" becomes.
By the way, #2: the well-known design principles "Tell, Don't Ask" (TDA) and the "Open/Closed Principle" (OCP) forbid infamous code such as "if (!(fp = fopen(...))". They don't just encourage you to use "try / catch" but force you to do so, because the OCP demands obedience not only within your own code but also when calling foreign code (i.e. libraries like stdio).
Why OCP, not TDA in the last sentence? Because you're not allowed to widen the meaning of existing code. Sticking to the simple "fopen" example, what are you going to do when the result is zero? Why exactly did "fopen" fail? You could check whether there's enough empty space left, or if the file system is writeable. Or if the file name is valid. Or whatnot. Still, your goal cannot be achieved: open the file. Imagine a headless application, so no intervention by the user is possible. Now what? There is no reason whatsoever to fumble further with it, because "fopen" failed. You'll need a fallback strategy. Period. If "fopen" has failed, it has failed.
Rule of thumb: Think of your code as always succeeding (KIS). If your code may legitimately 'fail', in the sense that a result set may or may not contain elements, put the logic into the class. Perhaps you have to distribute data, properties, questions, and methods across different classes (TDA). Perhaps you have to readjust your code according to SLA.
In your case, though, make sure the element exists. If you cannot, it's not your fault. Deep down in your code (a wrapper where all the mistakes of former coders get beautified), transform the data needed into another entity such that further up there is no need for an "if".

Any() is the simplest way to check if an element exists.
If you have to make sure that the element is unique, you'll have to do something like .Count() == 1. Alternatively, you could implement your own extension method, but this would only be a wrapper around .Count() == 1.
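Applied to the XML case from the question, the two checks might look like the sketch below. The class and method names (ElementChecks, Exists, ExistsOnce) are made up for illustration; Elements(), Any(), Take() and Count() are the standard LINQ to XML and LINQ calls.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ElementChecks
{
    // True when at least one child element with the given name exists.
    public static bool Exists(XContainer parent, XName name)
    {
        return parent.Elements(name).Any();
    }

    // True only when exactly one such child element exists.
    public static bool ExistsOnce(XContainer parent, XName name)
    {
        // Take(2) stops after two matches, so a long run of siblings is not fully counted.
        return parent.Elements(name).Take(2).Count() == 1;
    }
}
```

With these in place, the check from the question becomes if (ElementChecks.Exists(document, "myElement")) { ... }.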

OOP data structure advice

I am writing a log file decoder which should be capable of reading many different structures of files. My question is how best to represent this data. I am using C#, but am new to OOP.
An example:
The log files have a range of sensor values. One sensor reading can be called A, another B. Obviously, there are many more than 2 entry types.
In different log files, they could be stored either as ABABABABAB or AAAAABBBBB.
I was thinking of describing this as blocks of entries. So in the first case, a block would be 'AB', with 5 blocks. In the second case, the first block is 'A', read 5 times. This is followed by a block of 'B', read 5 times.
This is quite a simplification (there are actually 40 different types of log file, each with up to 40 sensor values in a block). No log has more than 300 blocks.
At the moment, I store all of this in a datatable. I have a column for each entry, with a property of how many to read. If this is set to -1, it continues to the next column in the block. If not, it will assume that it has reached the end of the block.
This all seems quite clumsy. Can anyone suggest a better way of doing this?

I think you should first start here, and then here to learn a little bit about what object oriented programming is. Don't worry about your current problem while learning about OOP.
As you are learning about OO concepts, you should begin to understand that code is not data, and data is not code. How you represent your data does not matter from an OOP standpoint. You can write OO code to consume your data, or you could write procedural code to consume it; that part is irrelevant to the format of the data.
So then getting back to your question
My question is how best to represent this data
It depends on your needs. What is writing the log file? Do you have control over the writer and reader? If I did, I would rely on the built-in serialization methods to minimize the amount of code I need to write. Is the log file going to be really long? If so, the "datatable" approach you described is usually better. If the log file isn't going to be huge, XML is really easy to work with.

Very basic and straightforward:
Define an interface IEntry with properties like string EntryBlock, int Count
Define a class Entry which represents an entry and implements IEntry
Code which does the binary serialization should work against the interface; for instance, it should refer to IEnumerable<IEntry>
Class Entry could override ToString() to return something like [ABAB-2], if this would be helpful during serialization
Interface IEntry could provide a method void CreateFromRawString(string rawDataFromLog) if it would be helpful; decide for yourself
If you want more info, please share the code you are using for serialization/deserialization
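A rough sketch of the first few points, under the assumption that EntryBlock holds the sensor codes of one block and Count says how often the block repeats. The LogLayout.Expand helper is an addition for illustration only; it shows how a list of entries describes the two layouts from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IEntry
{
    string EntryBlock { get; }   // sensor codes in one block, e.g. "AB"
    int Count { get; }           // how many times the block repeats
}

public sealed class Entry : IEntry
{
    public string EntryBlock { get; }
    public int Count { get; }

    public Entry(string entryBlock, int count)
    {
        EntryBlock = entryBlock;
        Count = count;
    }

    public override string ToString()
    {
        return "[" + EntryBlock + "-" + Count + "]";
    }
}

public static class LogLayout
{
    // Hypothetical helper: expands block descriptions into the order
    // in which the sensor values appear in the file.
    public static string Expand(IEnumerable<IEntry> layout)
    {
        return string.Concat(layout.Select(e =>
            string.Concat(Enumerable.Repeat(e.EntryBlock, e.Count))));
    }
}
```

A single Entry("AB", 5) expands to ABABABABAB, while Entry("A", 5) followed by Entry("B", 5) expands to AAAAABBBBB, so each log format reduces to a short list of entries instead of a column-per-sensor table.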

In addition to what Bob has offered, I highly recommend Head First Design Patterns as a gentle, but robust introduction to OO for a C# programmer. The samples are in Java, which translate easily to C#.

As for OOP, you want to learn SOLID.
I would suggest you build this using Test Driven Development.
Start small, with a simple fragment of your log data and write a test like (you'll find a better way to do this with experience and apply it to your situation):
[Test]
public void ReadSequence_FiveA_ReturnsProperList()
{
    // Arrange
    string sequenceStub = "AAAAA";
    // Act
    MyFileDecoder decoder = new MyFileDecoder();
    List<string> results = decoder.ReadSequence(sequenceStub);
    // Assert
    Assert.AreEqual(5, results.Count);
    Assert.AreEqual("A", results[0]);
}
That test code snippet is just a starting point, and I've tried to be rather verbose in the assertions. You can come up with more creative ways over time. The point is to start small. Once this test passes, add another test where you mix "AB" and change your decoder to handle this properly. Eventually, you'll have a large set of tests that handle your different formats. Using TDD, you'll be on the path to using SOLID properly. Whenever you find something you can't test, you should review the rules and see if you can't make it simpler and inject dependencies.
Eventually you'll get into mocking. For example, you might find that you'd rather INJECT the ability for your MyFileDecoder class to have a dependency that will read your log file. In that case, you would create a mock object and pass that into the constructor and set the mock to return the sequenceStub when a method is called.
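To make the injection idea concrete, here is one possible shape, using a hand-written stub instead of a mocking library. ILogSource and StubLogSource are hypothetical names, and this MyFileDecoder takes its input through the constructor, which differs from the parameterless version used in the "AAAAA" test above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical abstraction over "where the raw log text comes from".
public interface ILogSource
{
    string ReadAll();
}

// Hand-written stand-in for a mock: returns a canned sequence.
public sealed class StubLogSource : ILogSource
{
    private readonly string _content;
    public StubLogSource(string content) { _content = content; }
    public string ReadAll() { return _content; }
}

public sealed class MyFileDecoder
{
    private readonly ILogSource _source;

    public MyFileDecoder(ILogSource source) { _source = source; }

    // Treats every character as one reading, as in the "AAAAA" test.
    public List<string> ReadSequence()
    {
        return _source.ReadAll().Select(c => c.ToString()).ToList();
    }
}
```

A test then constructs new MyFileDecoder(new StubLogSource("AAAAA")) and asserts on ReadSequence(), while production code passes an implementation that reads the real file.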

We have a graphical designer, now they want a text based designer. Suggestions?

I'm sorry I could not think of a better title.
The problem is the following:
For our customer we have created (as part of a larger application) a
graphical designer which they can use to build "scenarios".
These scenarios consist of "Composites", which in turn consist
of "Commands". These command objects all derive from CommandBase and
implement an interface called ICompilable.
The scenario class also implements ICompilable. When Compile() is called
on a command, an array of bytes is returned which can then be sent to the device
for which it is intended (I can't disclose too much info about that hardware, sorry).
Just to give you an idea:
var scenario = new Scenario();
scenario.Add(new DelayCommand(1));
scenario.Add(new CountWithValueCommand(1,ActionEnum.Add,1));
scenario.Add(new DirectPowerCommand(23,false,150));
scenario.Add(new WaitCommand(3));
scenario.Add(new DirectPowerCommand(23,false,150));
scenario.Add(new SkipIfCommand(1,OperatorEnum.SmallerThan,10));
scenario.Add(new JumpCommand(2));
byte[] compiledData = scenario.Compile();
The graphical designer abstracts all this from the user and allows
him (or her) to simply drag and drop composites onto the designer surface.
(Composites can group commands so we can provide building blocks for recurring tasks.)
Recently our customer came to us and said, "well the designer is really cool,
but we have some people who would rather have some kind of programming language,
just something simple."
(Simple to them of course)
I would very much like to provide them with a simple language,
that can call various commands and also replace SkipIfCommand with
a nicer structure, etc...
I have no idea where to start or what my options are (without breaking what we have)
I have heard about people embedding languages such as Python,
people writing their own language and parsers, etc...
Any suggestions?
PS: Users only work with composites, never with commands.
Composites are loaded dynamically at runtime (along with their graphical designer)
and may be provided by third parties in separate assemblies.

From what I think I've understood, you have two options.
You could use an XML-style "markup" to let them define entities and their groupings, but that may not be best.
Your alternative is, yes, to embed a language, but do you really need to? Wouldn't that be overkill, and how would you control it?
If you only need really simple syntax, then perhaps write your own language. It's actually not that hard to create a simple interpreter, as long as you have a strict, unambiguous language. Have a look for some examples of compilers in whatever you're using (C#?).
I wrote a very simple interpreter in Java at uni; it wasn't as hard as you'd think.

If you really just want a dirt simple language, you want a 'recursive descent parser'.
For example, a language like this:
SCENARIO MyScenario
DELAY 1
COUNT 1 ADD 1
DIRECT_POWER 23, False, 150
WAIT 3
...
END_SCENARIO
You might have a grammar like:
scenario :: 'SCENARIO' label newline _cmds END_SCENARIO
cmds:: _delay or _count or _direct_power or...
delay:: 'DELAY' number
Which gives code like:
def scenario():
    match_word('SCENARIO')
    scenario_name = match_label()
    emit('var scenario = new Scenario();')
    cmds()
    match_word('END_SCENARIO')
    emit('byte[] ' + scenario_name + ' = scenario.Compile();')

def delay():
    match_word('DELAY')
    length = match_number()
    emit('scenario.Add(new DelayCommand(' + str(length) + '));')

def cmds():
    word = peek_next_word()
    if word == 'DELAY':
        delay()
    elif ...
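Since the application itself is in C#, the same idea can be sketched there as well. The class below is a hypothetical mini-parser for the toy grammar above: it handles only DELAY and WAIT, treats newlines as ordinary whitespace, and emits C# source lines in the same spirit as the emit() calls. It could just as well build the command objects directly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mini-parser; only DELAY and WAIT are implemented.
public sealed class ScenarioParser
{
    private readonly string[] _tokens;
    private int _pos;
    private readonly List<string> _output = new List<string>();

    private ScenarioParser(string source)
    {
        _tokens = source.Split(new[] { ' ', ',', '\t', '\r', '\n' },
                               StringSplitOptions.RemoveEmptyEntries);
    }

    public static List<string> Parse(string source)
    {
        var parser = new ScenarioParser(source);
        parser.Scenario();
        return parser._output;
    }

    // scenario :: 'SCENARIO' label cmds 'END_SCENARIO'
    private void Scenario()
    {
        Match("SCENARIO");
        string name = Next();
        _output.Add("var scenario = new Scenario();");
        while (Peek() != "END_SCENARIO")
            Command();   // throws at end of input, so the loop always terminates
        Match("END_SCENARIO");
        _output.Add("byte[] " + name + " = scenario.Compile();");
    }

    // cmds :: delay | wait
    private void Command()
    {
        string word = Next();
        switch (word)
        {
            case "DELAY": _output.Add("scenario.Add(new DelayCommand(" + Number() + "));"); break;
            case "WAIT": _output.Add("scenario.Add(new WaitCommand(" + Number() + "));"); break;
            default: throw new FormatException("Unknown command: " + word);
        }
    }

    private int Number() { return int.Parse(Next()); }

    private string Peek() { return _pos < _tokens.Length ? _tokens[_pos] : null; }

    private string Next()
    {
        if (_pos >= _tokens.Length) throw new FormatException("Unexpected end of input");
        return _tokens[_pos++];
    }

    private void Match(string expected)
    {
        string actual = Next();
        if (actual != expected)
            throw new FormatException("Expected " + expected + " but found " + actual);
    }
}
```

Each grammar rule maps to one method, and adding a command means adding one case to Command(). Calling ScenarioParser.Parse on a script with a DELAY 1 and a WAIT 3 line returns four lines of generated code.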

This looks like a perfect scenario for a simple DSL. See http://msdn.microsoft.com/en-us/library/bb126235(VS.80).aspx for some information.
You could also use a scripting language such as lua.Net.

Here's a Pythonic solution for building a DSL that you can use to compile and create byte code arrays.
Write a simple module that makes your C# structures available to Python. The goal is to define each C# class that users are allowed to work with (Composites or Commands or whatever) as a Python class.
Usually, this involves implementing a minimal set of methods with different conversions from C# types to native Python types and vice versa.
Write some nice demos showing how to use these Python class definitions to create their scripts. You should be able to create things like this in Python.
from someInterfaceModule import *
scenario = Scenario(
    Delay(1),
    Repeat(Range(10),
        DirectPower(23, False, 150),
        Wait(3),
        DirectPower(23, False, 150)
    )
)
scenario.compile()
These are relatively simple classes to define. Each class here should be reasonably easy to implement as a Python module that directly calls your base C# modules.
The syntax is pure Python with no additional parsing or lexical scanning required.

To add to S.Lott's comment, here's how you eval a Python script from C#

While it might be great fun to create this mini-language and code it all up, the real questions you need to ask are:
What is the business case for adding this feature / facility?
Who is going to pay for this feature?
Who is going to "sign off" on this feature if you build it?
"Really neat" features have a way of getting built when the reality might indicate the true answer to such a request is "no".
See if you have a stakeholder willing to sponsor this before proceeding. Then check with the end users to see what they really want before committing to the project.
Cheers,
-R

How do you flag code so that you can come back later and work on it?

In C# I use the #warning and #error directives,
#warning This is dirty code...
#error Fix this before everything explodes!
This way, the compiler will let me know that I still have work to do. What technique do you use to mark code so you won't forget about it?

Mark them with // TODO, // HACK or other comment tokens that will show up in the task pane in Visual Studio.
See Using the Task List.

Todo comment as well.
We've also added a special keyword, NOCHECKIN, along with a commit hook in our source control system (very easy to do with at least CVS or SVN) that scans all files and refuses the check-in if it finds the text NOCHECKIN anywhere.
This is very useful if you just want to test something out and be certain that it doesn't accidentally get checked in (having slipped past the watchful eyes during the diff of everything that's committed to source control).
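The scanning part of such a hook is small. The sketch below is an assumption about how it could be written in C#, not the hook described above: NoCheckinScanner is a made-up name, and the wiring into CVS or SVN (collecting the changed files and turning a non-empty result into a failing exit code) is left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class NoCheckinScanner
{
    public const string Marker = "NOCHECKIN";

    // Returns the paths whose text contains the marker; a hook would
    // reject the commit when this list is not empty.
    public static List<string> FindOffenders(IEnumerable<string> paths)
    {
        return paths
            .Where(path => File.ReadAllText(path).Contains(Marker))
            .ToList();
    }
}
```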

I use a combination of //TODO: //HACK: and throw new NotImplementedException(); on my methods to denote work that was not done. Also, I add bookmarks in Visual Studio on lines that are incomplete.

//TODO: Person's name - please fix this.
This is in Java, you can then look at tasks in Eclipse which will locate all references to this tag, and can group them by person so that you can assign a TODO to someone else, or only look at your own.

If I've got to drop everything in the middle of a change, then
#error finish this
If it's something I should do later, it goes into my bug tracker (which is used for all tasks).

'To do' comments are great in theory, but not so good in practice, at least in my experience. If you are going to be pulled away for long enough to need them, then they tend to get forgotten.
I favor Jon T's general strategy, but I usually do it by just plain breaking the code temporarily - I often insert a deliberately undefined method reference and let the compiler remind me about what I need to get back to:
PutTheUpdateCodeHere();

An approach that I've really liked is "Hack Bombing", as demonstrated by Oren Eini here.
try
{
    //do stuff
    return true;
}
catch // no idea how to prevent an exception here at the moment, this makes it work for now...
{
    if (DateTime.Today > new DateTime(2007, 2, 7))
        throw new InvalidOperationException("fix me already!! no catching exceptions like this!");
    return false;
}

Add a test in a disabled state. They show up in all the build reports.
If that doesn't work, I file a bug.
In particular, I haven't seen TODO comments ever decrease in quantity in any meaningful way. If I didn't have time to do it when I wrote the comment, I don't know why I'd have time later.

//TODO: Finish this
If you use VS you can set up your own Task Tags under Tools>Options>Environment>Task List

gvim highlights both "// XXX" and "// TODO" in yellow, which amazed me the first time I marked some code that way to remind myself to come back to it.

I'm a C++ programmer, but I imagine my technique could be easily implemented in C# or any other language for that matter:
I have a ToDo(msg) macro that expands into constructing a static object at local scope whose constructor outputs a log message. That way, the first time I execute unfinished code, I get a reminder in my log output that tells me that I can defer the task no longer.
It looks like this:
class ToDo_helper
{
public:
    ToDo_helper(const std::string& msg, const char* file, int line)
    {
        std::string header(79, '*');
        Log(LOG_WARNING) << header << '\n'
            << " TO DO:\n"
            << " Task: " << msg << '\n'
            << " File: " << file << '\n'
            << " Line: " << line << '\n'
            << header;
    }
};
#define TODO_HELPER_2(X, file, line) \
    static Error::ToDo_helper tdh##line(X, file, line)
#define TODO_HELPER_1(X, file, line) TODO_HELPER_2(X, file, line)
#define ToDo(X) TODO_HELPER_1(X, __FILE__, __LINE__)
... and you use it like this:
void some_unfinished_business() {
    ToDo("Take care of unfinished business");
}
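For the C# side, a comparable effect can be approximated with caller-information attributes, which let a method learn the file and line of its call site. The sketch below is one possible rendering and not a port of the macro: the Todo class and its Log delegate are hypothetical, and the once-only behaviour comes from a set of call sites already seen rather than from a static local object.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class Todo
{
    private static readonly HashSet<string> Seen = new HashSet<string>();
    private static readonly object Gate = new object();

    // Hypothetical log sink; swap in whatever logger the project uses.
    public static Action<string> Log = Console.Error.WriteLine;

    // Logs the reminder the first time each call site is reached.
    // Returns true when a reminder was written.
    public static bool Remind(
        string message,
        [CallerFilePath] string file = "",
        [CallerLineNumber] int line = 0)
    {
        lock (Gate)
        {
            if (!Seen.Add(file + ":" + line))
                return false;   // this call site has already reported
        }
        Log("TO DO: " + message + " (" + file + ", line " + line + ")");
        return true;
    }
}
```

Usage mirrors the C++ version: put Todo.Remind("Take care of unfinished business"); in the unfinished method, and the first execution writes the task, file and line to the log.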

It's not a perfect world, and we don't always have infinite time to refactor or ponder the code.
I sometimes put //REVIEW in the code if it's something I want to come back to later. i.e. code is working, but perhaps not convinced it's the best way.
// REVIEW - RP - Is this the best way to achieve x? Could we use algorithm y?
Same goes for //REFACTOR
// REFACTOR - should pull this method up and remove near-dupe code in XYZ.cs

I use // TODO: or // HACK: as a reminder that something is unfinished with a note explaining why.
I often (read 'rarely') go back and finish those things due to time constraints.
However, when I'm looking over the code I have a record of what was left uncompleted and more importantly WHY.
One more comment I use often at the end of the day or week:
// START HERE CHRIS
^^^^^^^^^^^^^^^^^^^^
Tells me where I left off so I can minimize my bootstrap time on Monday morning.

// TODO: <explanation>
if it's something that I haven't gotten around to implementing, and don't want to forget.
// FIXME: <explanation>
if it's something that I don't think works right, and want to come back later or have other eyes look at it.
Never thought of the #error/#warning options. Those could come in handy too.

I use //FIXME: xxx for broken code, and //CHGME: xxx for code that needs attention but works (perhaps only in a limited context).

Todo Comment.

These are the three different ways I have found helpful to flag something that needs to be addressed.
Place a comment flag next to the code that needs to be looked at. Most IDEs can recognize common flags and display them in an organized fashion, usually in a window specifically designed for these flags. The most common comment flag is //TODO. This is how you would use it:
//TODO: Fix this before it is released. This causes an access violation because it is using memory that isn't created yet.
One way to flag something that needs to be addressed before release would be to create a useless variable. Most compilers will warn you if you have a variable that isn't used. Here is how you could use this technique:
int This_Is_An_Access_Violation = 0;
IDE bookmarks. Most products come with a way to place a bookmark in your code for future reference. This is a good idea, except that it can only be seen by you. When you share your code, most IDEs won't share your bookmarks. You can check the help system of your IDE to see how to use its bookmarking features.

I also use TODO: comments. I understand the criticism that they rarely actually get fixed, and that they'd be better off reported as bugs. However, I think that misses a couple points:
I use them most during heavy development, when I'm constantly refactoring and redesigning things. So I'm looking at them all the time. In situations like that, most of them actually do get addressed. Plus it's easy to do a search for TODO: to make sure I didn't miss anything.
It can be very helpful for people reading your code to know the spots that you think were poorly written or hacked together. If I'm reading unfamiliar code, I tend to look for organizational patterns, naming conventions, consistent logic, etc. If that consistency had to be violated one or two times for expediency, I'd rather see a note to that effect. That way I don't waste time trying to find logic where there is none.

If it's some long term technical debt, you can comment like:
// TODO: This code loan causes an annual interest rate of 7.5% developer/hour. Upfront fee as stated by the current implementation. This contract is subject of prior authorization from the DCB (Developer's Code Bank), and tariff may change without warning.
... err. I guess a TODO will do it, as long as you don't simply ignore them.

This is my list of temporary comment tags I use:
//+TODO Usual meaning.
//+H Where I was working last time.
//+T Temporary/test code.
//+B Bug.
//+P Performance issue.
To indicate different priorities, e.g.: //+B vs //+B+++
Advantages:
Easy to search-in/remove-from the code (look for //+).
Easy to filter on a priority basis, e.g.: search for //+B to find all bugs, search for //+B+++ to only get high priority ones.
Can be used with C++, C#, Java, ...
Why the //+ notation? Because the + symbol looks like a little t, for temporary.
Note: this is not a Standard recommendation, just a personal one.

As most programmers seem to do here, I use TODO comments. Additionally, I use Eclipse's task interface Mylyn. When a task is active, Mylyn remembers all resources I have opened. This way I can track
where in a file I have to do something (and what),
in which files I have to do it, and
to what task they are related.

Besides keying off the "TODO:" comment, many IDEs also key off the "TASK:" comment. Some IDEs even let you configure your own special identifier.

It is probably not a good idea to sprinkle your code base with uninformative TODOs, especially if you have multiple contributors over time. This can be quite confusing to the newcomers. However, what seems to me to work well in practice is to state the author and when the TODO was written, with a header (50 characters max) and a longer body.
Whatever you pack into the TODO comments, I'd recommend to be systematic in how you track them. For example, there is a service that examines the TODO comments in your repository based on git blame (http://www.tickgit.com).
I developed my own command-line tool to enforce the consistent style of the TODO comments using ideas from the answers here (https://github.com/mristin/opinionated-csharp-todos). It was fairly easy to integrate it into the continuous integration so that the task list is re-generated on every push to the master.
It also makes sense to have the task list separate from your IDE for situations when you discuss the TODOs in a meeting with other people, when you want to share it by email etc.
