Seeing Trends in Code Metrics with NDepend


I have a version of NDepend for build servers and have automated the NDepend report generation. So, every night the build does its thing and NDepend reports/XML are generated. What I now want to do is track some metrics as a function of time.
So, for instance, it might be nice to have a graph of a particular type or namespace's, say, afferent coupling, on the y axis with time on the x axis. I know that I can compare two NDepend builds and have code and metric diffs, but what I'm looking to do is compare the same single metric or metrics over N builds to see ongoing trends.
I'm sort of assuming that there isn't a tool that does this currently and that I'll have to roll my own, but if there is one out there, I'd sure love to hear about it before investing the time. So, does NDepend itself support anything like this, or is there some sort of utility that already exists that I could use?
I'm also open to suggestions for other technologies that would accomplish this besides NDepend, though I have a strong preference for NDepend due to already having invested in it and being familiar with how it works.
Thanks in advance.

With NDepend, you can write a Code Query over LINQ (CQLinq) to track the evolution of any code metric between builds. For example, you could start with this query:
from t in JustMyCode.Types
where t.IsPresentInBothBuilds() &&
t.CodeWasChanged()
let tOld = t.OlderVersion()
let newLoC = t.NbLinesOfCode
let oldLoC = tOld.NbLinesOfCode
let newCC = t.CyclomaticComplexity
let oldCC = tOld.CyclomaticComplexity
let newCov = t.PercentageCoverage
let oldCov = tOld.PercentageCoverage
where newLoC > oldLoC || newCC > oldCC || newCov < oldCov
select new { t, newLoC, oldLoC, newCC, oldCC, newCov, oldCov }
...and get an instant result in Visual Studio. Such a rule can be integrated into your CI (TFS) build process, and its results can also be shown in an HTML+JavaScript report.
Several default code rules are provided to check code metric trends:
Avoid making complex methods even more complex
Types that used to be 100% covered but not anymore
From now, all types added or refactored should respect basic quality principles
Avoid adding methods to a type that already had many methods
Avoid making large methods even larger
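If you do end up rolling your own trend tracking over N nightly builds, one piece you will likely need is a way to reduce a metric's per-build values to a single trend figure. A minimal sketch, assuming you have already extracted one value per build (for example the afferent coupling of one type, read from each build's XML report; how to read that XML depends on its layout, which isn't shown here): an ordinary least-squares slope in metric units per day. `MetricTrend` and `SlopePerDay` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MetricTrend
{
    // Least-squares slope of one metric against build date, in metric units per day.
    // A positive slope means the metric is growing over the sampled builds.
    public static double SlopePerDay(IReadOnlyList<(DateTime BuildDate, double Value)> samples)
    {
        if (samples.Count < 2) throw new ArgumentException("Need at least two builds.");

        DateTime origin = samples[0].BuildDate;
        double[] xs = samples.Select(s => (s.BuildDate - origin).TotalDays).ToArray();
        double[] ys = samples.Select(s => s.Value).ToArray();
        double meanX = xs.Average(), meanY = ys.Average();

        double num = 0, den = 0;
        for (int i = 0; i < xs.Length; i++)
        {
            num += (xs[i] - meanX) * (ys[i] - meanY);
            den += (xs[i] - meanX) * (xs[i] - meanX);
        }
        if (den == 0) throw new ArgumentException("All samples share the same build date.");
        return num / den;
    }
}
```

Plotting the raw (date, value) pairs gives the graph described in the question; the slope is a compact number you could threshold in the build to flag a metric that keeps drifting.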

Related

Ideal performance with filtering generic collections in large Unity/C# game

I'm working on a commercial game with Unity/C# and I'm reviewing and familiarizing myself with the code. And I started wondering about this, because there are a lot of cases where you're holding a List full of things like GameObjects, Components or ScriptableObjects ... for instance you might have a field like this in an RTS game class:
protected List<Unit> allUnits = new List<Unit>();
Now, suppose an event is triggered at runtime where you need to cycle through allUnits and operate on certain ones or select only certain ones you want or even determine the number of them that satisfy a certain condition. Bear with me on this contrived example, but the 3 standard approaches are for-loop, for-each or a Linq extension method / query statement. Let's say I want to find all units that are not critically wounded or dead and are low on ammo. I can use the following ways to deal with this:
for ( int i = 0; i < allUnits.Count; i++ ) {
    var next = allUnits[i];
    // Skip anything that does not match the filter
    if ( next.IsDead ) continue;
    if ( next.IsCriticallyWounded ) continue;
    if ( next.AmmoCount >= Magazine.MaxCapacity * 3 ) continue;
    unitsLowOnAmmo.Add( next );
}
You could use the same logic in a foreach( var next in allUnits ) loop, so I won't repeat the same code again. But another approach would be like this, using Linq extensions:
unitsLowOnAmmo = allUnits.Where( u =>
    !u.IsDead &&
    !u.IsCriticallyWounded &&
    u.AmmoCount < Magazine.MaxCapacity * 3 ).ToList();
You could also use this syntax to find everything under AI control, for example:
var aiUnits = (from Unit u in allUnits
where u.AIControlled
select u).ToList();
In another situation, you might need to find the total number of units satisfying a set of conditions. Say I want to find all of the AI-controlled units that are critically wounded, so the AI can send medics or try to bring them to safety in a field hospital. I could use an accumulator: declare int count = 0 before a for ( int i = 0; i < allUnits.Count; i++ ) loop, continue past any unit that fails a condition, and let the rest fall through to a count++; statement. Or, obviously, I could use the LINQ extension int count = allUnits.Count( u => cond1 && cond2 ); ...
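The accumulator approach described above, next to the LINQ `Count` call, might look like this minimal sketch. `Unit` is a hypothetical stand-in with only the fields used here, and the two conditions mirror the AI-controlled, critically wounded example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the game's Unit type.
public class Unit
{
    public bool AIControlled;
    public bool IsCriticallyWounded;
    public bool IsDead;
    public int AmmoCount;
}

public static class UnitQueries
{
    // Accumulator version: skip units that fail a condition, count the rest.
    public static int CountWoundedAiLoop(List<Unit> allUnits)
    {
        int count = 0; // declared outside the loop so it is still in scope afterwards
        for (int i = 0; i < allUnits.Count; i++)
        {
            var u = allUnits[i];
            if (!u.AIControlled) continue;
            if (!u.IsCriticallyWounded) continue;
            count++;
        }
        return count;
    }

    // LINQ version: Enumerable.Count with a predicate (not List<T>.Count, which is a property).
    public static int CountWoundedAiLinq(List<Unit> allUnits) =>
        allUnits.Count(u => u.AIControlled && u.IsCriticallyWounded);
}
```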
Now, obviously, the Linq stuff is much cleaner and more expressive, and the code is a bit easier to read and maintain, but over the years we've all heard the wise masters of C# say "Linq has bad performance! Never use Linq!" I'm wondering how true that prejudice against Linq really is, and how these approaches actually compare "in the field". I'm working on a very large, quite complicated project with about 24GB of assets, code and data, and many classes hold List instances storing all kinds of stuff that regularly needs to be iterated through and filtered or counted. In some cases that happens every frame; in others it happens after a specific amount of time or upon an event or method call. Performance is already a major concern in this project, and we want the game to run on most people's computers.
Can anyone shed some light on performance comparisons, and on the best approach to cycling through collections to filter, select and count them in the most performant way? Perhaps there's even a superior approach I didn't mention here? I've read some articles on here (and other sites) that didn't answer the question in enough detail for me to feel satisfied. I'm also just catching up on the latest C# and .NET after being away for a while, and I'm not sure whether there have been any changes to the framework (or language) that would change what people used to say about Linq. I've heard that Microsoft is boasting of performance improvements in many areas of .NET, and I wonder whether any of those gains apply to this situation. In any case, I just want to find the ideal approach to filtering, operating on and counting my big (and small) collections as quickly and with as little memory overhead as possible.
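There is no substitute for measuring on your own target hardware (for example with the Unity Profiler), but a rough sketch of the two ideas usually compared here may help. The loop version refills a caller-owned list, so repeated calls stop allocating once the list has grown to its working size; `Where(...).ToList()` allocates a new list, an iterator and, because `threshold` is captured, a closure on every call. `Unit` is again a hypothetical stand-in, `threshold` stands in for `Magazine.MaxCapacity * 3`, and the `Stopwatch` helper is only a crude timer, not a benchmarking framework.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical stand-in for the game's Unit type.
public class Unit
{
    public bool IsDead;
    public bool IsCriticallyWounded;
    public int AmmoCount;
}

public static class FilterBench
{
    // Loop version: clears and refills a reusable buffer instead of creating a new list.
    public static void LowAmmoInto(List<Unit> source, List<Unit> buffer, int threshold)
    {
        buffer.Clear();
        for (int i = 0; i < source.Count; i++)
        {
            var u = source[i];
            if (!u.IsDead && !u.IsCriticallyWounded && u.AmmoCount < threshold)
                buffer.Add(u);
        }
    }

    // LINQ version: allocates an iterator, a closure and a new List<Unit> per call.
    public static List<Unit> LowAmmoLinq(List<Unit> source, int threshold) =>
        source.Where(u => !u.IsDead && !u.IsCriticallyWounded && u.AmmoCount < threshold).ToList();

    // Crude average time per call in milliseconds, after one warm-up call for JIT.
    public static double MeasureMs(Action action, int iterations)
    {
        action();
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) action();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }
}
```

For example, comparing `MeasureMs(() => FilterBench.LowAmmoInto(units, buffer, 30), 1000)` with `MeasureMs(() => FilterBench.LowAmmoLinq(units, 30), 1000)` on a list of a few thousand units gives a first impression; in a game, the allocation difference (GC pressure) often matters more than raw iteration time.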

Excel-DNA: grouping rows via C API feature of Excel-DNA

I'm familiar with how to group a range in Excel VSTO/COM interop:
ws.EnableOutlining = true;
ws.Outline.SummaryRow = XlSummaryRow.xlSummaryAbove;
var rng = GetRangeSomeHow();
rng.EntireRow.Group();
rng.EntireRow.OutlineLevel = someLevel;
What is the most efficient way to do this in Excel-DNA? I would imagine there must be a C-API way to do it, encapsulated cleverly in Excel-DNA somehow, but for the life of me, I can't figure it out via online documentation (incl. Google).
There are a lot of posts using code similar to my sample above, but these are pretty expensive calls, especially since I need to do this ~5000 times overall (I have a really big data set).
EDIT:
So there seems to be this method call:
XlCall.Excel(XlCall.xlfGroup...)
The only problem is, I have no idea what the parameters are. It seems an ExcelReference should be passed in, but how is the .EntireRow resolved? Will the C API just handle it for me - in which case I just need to pass a new ExcelReference(1,100,1,1) and be done with it... or is there more to this?
Thanks in advance to anyone who can answer my question!

I don't think the C API GROUP function is the one you're looking for. The documentation says:
GROUP
Creates a single object from several selected objects and returns the
object identifier of the group (for example, "Group 5"). Use GROUP to
combine a number of objects so that you can move or resize them
together.
If no object is selected, only one object is selected, or a group is
already selected, GROUP returns the #VALUE! error value and interrupts
the macro.
I'd suggest you use the COM object model for this kind of thing, even in an Excel-DNA add-in. The C API has not really been updated over the years for the general sheet manipulation like this case, so you're likely to run into some features that don't work right or are incomplete relative to the COM object model.
From your Excel-DNA add-in, just make sure you get hold of the right Application root object with a call to ExcelDnaUtil.Application.
For improved performance of this kind of sheet editing, you pretty much have to use the same tricks as from VBA or VSTO - disable screen updating and calculations etc.
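One way to cut the number of COM calls for ~5000 groups, sketched under stated assumptions: sort the target rows, merge contiguous rows into runs, and issue one `Group()` per run while screen updating and automatic calculation are switched off. This assumes adjacent rows are meant to form one group at the same outline level (nested levels would need one pass per level), uses late binding through `dynamic`, and the class and method names are hypothetical. In an Excel-DNA add-in, `app` would be `ExcelDnaUtil.Application` and `ws` a worksheet obtained from it.

```csharp
using System;
using System.Collections.Generic;

public static class OutlineBatcher
{
    // Merge 1-based row numbers into contiguous (First, Last) runs,
    // so each run can be grouped with a single COM call.
    public static List<(int First, int Last)> MergeRuns(IEnumerable<int> rows)
    {
        var runs = new List<(int First, int Last)>();
        int first = 0, last = 0;
        bool open = false;
        foreach (int r in new SortedSet<int>(rows))
        {
            if (open && r == last + 1) { last = r; continue; }
            if (open) runs.Add((first, last));
            first = last = r;
            open = true;
        }
        if (open) runs.Add((first, last));
        return runs;
    }

    // Late-bound COM calls; restores the original settings even if a call fails.
    public static void ApplyGroups(dynamic app, dynamic ws, IEnumerable<int> rows)
    {
        const int xlCalculationManual = -4135;
        bool oldScreenUpdating = app.ScreenUpdating;
        int oldCalculation = app.Calculation;
        app.ScreenUpdating = false;
        app.Calculation = xlCalculationManual;
        try
        {
            foreach (var (first, last) in MergeRuns(rows))
                ws.Range[$"A{first}:A{last}"].EntireRow.Group();
        }
        finally
        {
            app.Calculation = oldCalculation;
            app.ScreenUpdating = oldScreenUpdating;
        }
    }
}
```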

natural language query processing

I have an NLP (natural language processing) application that gives me a parse tree for a sentence. The question is how I should proceed from there.
What is the time
\-SBAR - Subordinate clause
  |-WHNP - Wh-noun phrase
  | \-WP - Wh-pronoun
  |   \-What
  \-S - Simple declarative clause
    \-VP - Verb phrase
      |-VBZ - Verb, 3rd person singular present
      | \-is
      \-NP - Noun phrase
        |-DT - Determiner
        | \-the
        \-NN - Noun, singular or mass
          \-time
The application has a built-in JavaScript interpreter, and I was trying to turn the phrase into a simple function such as:
function getReply() {
return Resource.Time();
}
In basic terms, "what" is a request (so create a function), "is" would be the returned object, and "the time" would reference the time. It would be easy to write a simple parser for that one phrase, but then there are also variants such as "what is the time now" or "do you know what time it is". I need the approach to be extensible to more of the English language as the project grows.
The source is C# (.NET 4.5).
Thanks in advance.

As far as I can see, using dependency parse trees will be more helpful. Often, the number of ways a question is asked is limited (I mean statistically significant variations are limited ... there will probably be corner cases that people ordinarily do not use), and are expressed through words like who, what, when, where, why and how.
Dependency parsing will enable you to extract the nominal subject and the direct and indirect objects in a query. Typically, these express the basic intent of the query. Consider two equivalent queries:
What is the time?
Do you know what the time is?
Their dependency parse structures are as follows:
root(ROOT-0, What-1)
cop(What-1, is-2)
det(time-4, the-3)
nsubj(What-1, time-4)
and
aux(know-3, Do-1)
nsubj(know-3, you-2)
root(ROOT-0, know-3)
dobj(is-7, what-4)
det(time-6, the-5)
nsubj(is-7, time-6)
ccomp(know-3, is-7)
Both are what-queries, and both contain "time" as a nominal subject. The latter also contains "you" as a nominal subject, but I think expressions like "do you know", "can you please tell me", etc. can be removed based on heuristics.
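The heuristic above can be sketched in C#, assuming the dependencies arrive as Stanford-style text lines like `nsubj(What-1, time-4)`. The class and method names are hypothetical, the wh-word list comes from the answer, and the only subject filtered out is "you"; a real system would grow these lists. Written for a current C# compiler.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// One typed dependency, e.g. nsubj(What-1, time-4).
public sealed class Dependency
{
    public string Relation, Governor, Dependent;
    public int GovernorIndex, DependentIndex;
}

public static class QueryIntent
{
    // rel(governor-i, dependent-j); greedy groups keep hyphens inside words.
    static readonly Regex Pattern = new Regex(@"^(\w+)\((.+)-(\d+), (.+)-(\d+)\)$");

    static readonly HashSet<string> WhWords = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "who", "what", "when", "where", "why", "how" };

    // Subjects dropped by the "do you know ..." heuristic.
    static readonly HashSet<string> IgnoredSubjects = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "you" };

    public static Dependency Parse(string line)
    {
        Match m = Pattern.Match(line.Trim());
        if (!m.Success) throw new FormatException($"Not a typed dependency: {line}");
        return new Dependency
        {
            Relation = m.Groups[1].Value,
            Governor = m.Groups[2].Value, GovernorIndex = int.Parse(m.Groups[3].Value),
            Dependent = m.Groups[4].Value, DependentIndex = int.Parse(m.Groups[5].Value)
        };
    }

    // Returns the question word and the first nominal subject that is not ignored (null if absent).
    public static (string WhWord, string Subject) Extract(IEnumerable<string> lines)
    {
        List<Dependency> deps = lines.Select(Parse).ToList();
        string wh = deps.SelectMany(d => new[] { d.Governor, d.Dependent })
                        .FirstOrDefault(w => WhWords.Contains(w));
        string subject = deps.Where(d => d.Relation == "nsubj")
                             .Select(d => d.Dependent)
                             .FirstOrDefault(w => !IgnoredSubjects.Contains(w));
        return (wh?.ToLowerInvariant(), subject?.ToLowerInvariant());
    }
}
```

For the two example queries above, `Extract` returns ("what", "time") in both cases, which is the shared intent you could then map to something like `Resource.Time()`.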
You will find the Stanford Parser helpful for this approach. They also have an online demo if you want to see more examples at work.

Minimization of f(x,y) where x and y are integers

I was wondering if anyone had any suggestions for minimizing a function, f(x,y), where x and y are integers. I have researched lots of minimization and optimization techniques, like BFGS and others from GSL, and methods from Numerical Recipes. So far, I have tried implementing a couple of different schemes. The first works by picking the direction of steepest descent among f(x+1,y), f(x-1,y), f(x,y+1) and f(x,y-1), and following that direction with a line minimization. I have also tried a downhill simplex (Nelder-Mead) method. Both methods get stuck far away from a minimum. They both appear to work on simpler functions, like finding the minimum of a paraboloid, but I think that both, and especially the former, are designed for functions where x and y are real-valued (doubles). One more problem is that I need to call f(x,y) as few times as possible: it talks to external hardware and takes a couple of seconds for each call. Any ideas would be greatly appreciated.
Here's an example of the error function. Sorry I didn't post this before. This function takes a couple of seconds to evaluate. Also, a value we query from the device does not add to the error if it is below our desired value, only if it is above it.
double Error(int x, int y)
{
    SetDeviceParams(x, y);
    double a = QueryParamA();
    double b = QueryParamB();
    double c = QueryParamC();
    double _fReturnable = 0;
    // Only readings at or above their target contribute to the error
    if (a >= A_desired)
    {
        _fReturnable += (A_desired - a) * (A_desired - a);
    }
    if (b >= B_desired)
    {
        _fReturnable += (B_desired - b) * (B_desired - b);
    }
    if (c >= C_desired)
    {
        _fReturnable += (C_desired - c) * (C_desired - c);
    }
    return Math.Sqrt(_fReturnable);
}

There are many, many solutions here. In fact, there are entire books and academic disciplines based on the subject. I am reading an excellent one right now: How to Solve It: Modern Heuristics.
There is no one solution that is correct - different solutions have different advantages based on specific knowledge of your function. It has even been proven that there is no one heuristic that performs the best at all optimization tasks.
If you know that your function is quadratic, you can use Newton-Gauss to find the minimum in one step. A genetic algorithm can be a great general-purpose tool, or you can try simulated annealing, which is less complicated.
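As a rough illustration of the simulated annealing suggestion, here is a minimal sketch adapted to integer (x, y): it steps to a random neighbouring lattice point, always accepts improvements, accepts worse points with probability exp(-delta/T), and cools T geometrically. Because each call in the question takes seconds, it caches every evaluated point and stops after a fixed budget of distinct evaluations. The class name and the defaults for `startTemp`, `cooling` and `seed` are arbitrary assumptions that would need tuning for the real device.

```csharp
using System;
using System.Collections.Generic;

public static class IntegerAnnealing
{
    // Simulated annealing over integer points (x, y). 'budget' caps the number of
    // distinct points passed to f, because each real evaluation may take seconds.
    public static (int X, int Y, double Value, int Evaluations) Minimize(
        Func<int, int, double> f, int x0, int y0, int budget,
        double startTemp = 10.0, double cooling = 0.95, int seed = 1)
    {
        var rng = new Random(seed);
        var cache = new Dictionary<(int, int), double>();
        double Eval(int x, int y)
        {
            if (!cache.TryGetValue((x, y), out double value))
            {
                value = f(x, y);
                cache[(x, y)] = value;
            }
            return value;
        }

        int cx = x0, cy = y0;
        double current = Eval(cx, cy);
        int bx = cx, by = cy;
        double best = current;
        double temp = startTemp;

        // Stop when the budget is used up; the iteration cap guards against
        // endlessly revisiting already-cached points.
        for (int iter = 0; cache.Count < budget && iter < 20 * budget; iter++)
        {
            // Propose a random step to one of the 8 neighbouring lattice points.
            int nx = cx + rng.Next(-1, 2), ny = cy + rng.Next(-1, 2);
            if (nx == cx && ny == cy) continue;

            double v = Eval(nx, ny);
            // Always accept improvements; accept worse points with probability exp(-delta / T).
            if (v <= current || rng.NextDouble() < Math.Exp((current - v) / temp))
            {
                cx = nx; cy = ny; current = v;
                if (v < best) { best = v; bx = nx; by = ny; }
            }
            temp = Math.Max(temp * cooling, 1e-9); // geometric cooling with a floor
        }
        return (bx, by, best, cache.Count);
    }
}
```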

Have you looked at genetic algorithms? They are very, very good at finding minima and maxima while avoiding local minima/maxima.

How do you define f(x,y)? Minimisation can be a hard problem, depending on the complexity of your function.
Genetic Algorithms could be a good candidate.
Resources:
Genetic Algorithms in Search, Optimization, and Machine Learning
Implementing a Genetic Algorithms in C#
Simple C# GA

If it's an arbitrary function, there's no neat way of doing this.
Suppose we have a function defined as:
$$f(x, y) = \begin{cases} 0 & \text{if } x = 100 \text{ and } y = 100 \\ 100 & \text{otherwise} \end{cases}$$
How could any algorithm realistically find (100, 100) as the minimum? It could be any possible combination of values.
Do you know anything about the function you're testing?

What you are generally looking for is called an optimisation technique in mathematics. In general, these techniques apply to functions of real variables, but many can be adapted to functions of integer variables.
In particular, I would recommend looking into non-linear programming and gradient descent. Both would seem quite suitable for your application.
If you could provide more details, I might be able to suggest something a little more specific.

Jon Skeet's answer is correct. You really do need information about f and its derivatives, even if f is everywhere continuous.
The easiest way to appreciate the difficulty of what you ask (minimization of f at integer values only) is to think about a real-valued function of one real variable, f: R -> R, that makes large excursions between individual integers. You can easily construct such a function so that there is NO correlation between the local minima on the real line and the minima at the integers, and no relationship to the first derivative either.
For an arbitrary function I see no way except brute force.

So let's look at your problem in math-speak. This is all assuming I understand your problem fully; feel free to correct me if I am mistaken.
We want to minimize the following:
$$\sqrt{(a - a_{\text{desired}})^2 + (b - b_{\text{desired}})^2 + (c - c_{\text{desired}})^2}$$
or, in other notation,
$$\lVert \operatorname{Pos}(x - x_{\text{desired}}) \rVert_2$$
where $x = (a, b, c)$ and $\operatorname{Pos}(y) = \max(y, 0)$, applied componentwise, means we want the "positive part" (this accounts for your if statements). Finally, we wish to restrict ourselves to solutions where $x$ is integer valued.
Unlike the above posters, I don't think genetic algorithms are what you want at all. In fact, I think the solution is much easier (assuming I understand your problem).
1) Run any optimization routine on the function above. This will give you the solution $x^* = (a^*, b^*, c^*)$. As this function is increasing with respect to the variables, the best integer solution you can hope for is $(\lceil a^* \rceil, \lceil b^* \rceil, \lceil c^* \rceil)$.
Now, you say that your function is expensive to evaluate. There exist tools for this which are not based on heuristics; they go under the name Derivative-Free Optimization. People use these tools to optimize objectives based on simulations (I have even heard of a case where the objective function is based on crop growing yields!). Each of these methods has different properties, but in general they attempt to minimize not only the objective but also the number of objective function evaluations.
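Compass (coordinate pattern) search is one of the simplest derivative-free methods and maps naturally onto integer points. The sketch below is an illustration, not one of the specific tools the answer alludes to, and `CompassSearch` is a hypothetical name: it polls the four neighbours at the current step size, moves to the best improving one, halves the step when nothing improves, and stops once the step drops below 1. Results are memoized so no (x, y) pair is sent to the hardware twice. Like any local method, it can stop at a local minimum, or immediately on a plateau like the f(x, y) = 100 counterexample above.

```csharp
using System;
using System.Collections.Generic;

public static class CompassSearch
{
    // Minimizes f over the integer lattice. 'initialStep' should be >= 1
    // (a power of two halves cleanly down to 1).
    public static (int X, int Y, double Value, int Evaluations) Minimize(
        Func<int, int, double> f, int x0, int y0, int initialStep)
    {
        // Memoize: each distinct point is evaluated at most once.
        var cache = new Dictionary<(int, int), double>();
        double Eval(int x, int y)
        {
            if (!cache.TryGetValue((x, y), out double value))
            {
                value = f(x, y);
                cache[(x, y)] = value;
            }
            return value;
        }

        int cx = x0, cy = y0;
        double best = Eval(cx, cy);
        int[] dx = { 1, -1, 0, 0 };
        int[] dy = { 0, 0, 1, -1 };

        for (int step = initialStep; step >= 1; )
        {
            // Poll the four compass neighbours and remember the best strict improvement.
            int bx = cx, by = cy;
            double candidate = best;
            for (int k = 0; k < 4; k++)
            {
                int nx = cx + dx[k] * step, ny = cy + dy[k] * step;
                double v = Eval(nx, ny);
                if (v < candidate) { candidate = v; bx = nx; by = ny; }
            }

            if (candidate < best) { cx = bx; cy = by; best = candidate; } // move, keep step
            else step /= 2; // no improvement at this scale: refine
        }
        return (cx, cy, best, cache.Count);
    }
}
```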


C#: How to parse arbitrary strings into expression trees?

In a project that I'm working on I have to work with a rather weird data source. I can give it a "query" and it will return me a DataTable. But the query is not a traditional string. It's more like... a set of method calls that define the criteria that I want. Something along these lines:
var tbl = MySource.GetObject("TheTable");
tbl.AddFilterRow(new FilterRow("Column1", 123, FilterRow.Expression.Equals));
tbl.AddFilterRow(new FilterRow("Column2", 456, FilterRow.Expression.LessThan));
var result = tbl.GetDataTable();
In essence, it supports all the standard stuff (boolean operators, parentheses, a few functions, etc.), but the syntax for writing it is quite verbose and uncomfortable for everyday use.
I wanted to make a little parser that would parse a given expression (like "Column1 = 123 AND Column2 < 456") and convert it to the above function calls. Also, it would be nice if I could add parameters there, so I would be protected against injection attacks. The last little piece of sugar on the top would be if it could cache the parse results and reuse them when the same query is to be re-executed on another object.
So I was wondering: are there any existing solutions that I could use for this, or will I have to roll my own expression parser? It's not too complicated, but if I can save myself two or three days of coding and a heap of bugs to fix, it would be worth it.

Try out Irony. Though the documentation is lacking, the samples will get you up and running very quickly. Irony is a project for parsing code and building abstract syntax trees, but you might have to write a little logic to create a form that suits your needs. The DLR may be the complement for this, since it can dynamically generate / execute code from abstract syntax trees (it's used for IronPython and IronRuby). The two should make a good pair.
Oh, and they're both first-class .NET solutions and open source.

Bison or JavaCC or the like will generate a parser from a grammar. You can then augment the nodes of the tree with your own code to transform the expression.
OP comments:
I really don't want to ship 3rd-party executables with my software. I want it to be compiled into my code.
Both tools generate source code, which you link with.

I wrote a parser by hand for exactly this usage and complexity level. It took about 2 days. I'm glad I did it, but I wouldn't do it again; I'd use ANTLR or F#'s Fslex.
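For comparison with the hand-written route, here is a minimal recursive-descent sketch for the kind of expression in the question (comparisons combined with AND, OR and parentheses). Everything in it is hypothetical: it only builds a small syntax tree, keeps parameters such as `@max` as names so values can be bound later instead of being spliced into the text, and caches trees by query string in a `ConcurrentDictionary`. Mapping the tree onto the `FilterRow` calls is left out, because the question doesn't show how that API expresses AND/OR grouping.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public abstract class Node { }

public sealed class Logical : Node
{
    public string Op; // "AND" or "OR"
    public Node Left, Right;
    public override string ToString() => $"({Left} {Op} {Right})";
}

public sealed class Comparison : Node
{
    public string Column, Op;
    public string Operand; // raw literal (123, 'text') or parameter name (@max)
    public override string ToString() => $"[{Column} {Op} {Operand}]";
}

public sealed class FilterParser
{
    // Parsed trees are cached by query text; treat cached nodes as read-only.
    static readonly ConcurrentDictionary<string, Node> Cache = new ConcurrentDictionary<string, Node>();
    static readonly Regex Token = new Regex(@"\s*(<=|>=|<>|[=<>()]|@\w+|'[^']*'|\d+(?:\.\d+)?|\w+)");
    static readonly HashSet<string> Ops = new HashSet<string> { "=", "<>", "<", "<=", ">", ">=" };

    readonly List<string> _tokens;
    int _pos;
    FilterParser(List<string> tokens) { _tokens = tokens; }

    public static Node Parse(string text) => Cache.GetOrAdd(text, s =>
    {
        var parser = new FilterParser(Tokenize(s));
        Node root = parser.ParseOr();
        if (parser._pos != parser._tokens.Count)
            throw new FormatException($"Unexpected token '{parser._tokens[parser._pos]}'");
        return root;
    });

    static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        text = text.TrimEnd();
        for (int pos = 0; pos < text.Length; )
        {
            Match m = Token.Match(text, pos);
            if (!m.Success || m.Index != pos)
                throw new FormatException($"Unexpected character at position {pos}");
            tokens.Add(m.Groups[1].Value);
            pos += m.Length;
        }
        return tokens;
    }

    string Peek() => _pos < _tokens.Count ? _tokens[_pos] : null;
    string Next() => _pos < _tokens.Count ? _tokens[_pos++] : throw new FormatException("Unexpected end of query");

    bool Accept(string token)
    {
        if (!string.Equals(Peek(), token, StringComparison.OrdinalIgnoreCase)) return false;
        _pos++;
        return true;
    }

    // or-expr := and-expr { OR and-expr }
    Node ParseOr()
    {
        Node left = ParseAnd();
        while (Accept("OR")) left = new Logical { Op = "OR", Left = left, Right = ParseAnd() };
        return left;
    }

    // and-expr := factor { AND factor }
    Node ParseAnd()
    {
        Node left = ParseFactor();
        while (Accept("AND")) left = new Logical { Op = "AND", Left = left, Right = ParseFactor() };
        return left;
    }

    // factor := "(" or-expr ")" | column op operand
    Node ParseFactor()
    {
        if (Accept("("))
        {
            Node inner = ParseOr();
            if (!Accept(")")) throw new FormatException("Expected ')'");
            return inner;
        }
        string column = Next();
        string op = Next();
        if (!Ops.Contains(op)) throw new FormatException($"Unknown operator '{op}'");
        return new Comparison { Column = column, Op = op, Operand = Next() };
    }
}
```

For example, `FilterParser.Parse("Column1 = 123 AND (Column2 < @max OR Column3 <> 'x')")` returns a tree whose `ToString()` is `([Column1 = 123] AND ([Column2 < @max] OR [Column3 <> 'x']))`, and a second call with the same text returns the cached instance.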
