C# - Calculations as objects - Many Calculations with Many Dependencies

I have developed a fairly complex spreadsheet in Excel, and I am tasked with converting it to a C# program.
What I am trying to figure out is how to represent the calculations from my spreadsheet in C#.
The calculations have many dependencies, to the point that it would almost appear to be a web, rather than a nice neat hierarchy.
The design solution I can think of is this:
Create an object to represent each calculation.
Each object has an integer or double, which holds the result of the calculation.
Each calculation has inputs from other objects, so those objects must be evaluated before it can be performed.
Each object has a second integer, "completed", which is set to 1 once its calculation has been performed successfully.
Each object has a third integer, "ready", which is set to 1 only when the "completed" integers of all precedent objects are 1; if they are not, the loop skips this object.
A loop runs through all objects until all of the "completed" integers equal 1.
I hope this makes sense. I am typing up the code for this, but I am still pretty green with C#, so at least knowing I'm on the right track is a boon :)
To clarify, this is a design query, I'm simply looking for someone more experienced with C# than myself, to verify that my method is sensible.
I appreciate any help with this issue, and I'm keen to hear your thoughts! :)
edit*
I believe the "completed" state and "ready" state are required for the loop state check to prevent errors that might occur from attempts to evaluate a calculation where precedents aren't evaluated. Is this necessary?
edit*
For example, one object would be a line "V_dist"
It has length, as a property.
Its length "V_dist.calc_formula" is calculated from two other objects "hpc*Tan(dang)"
public class inputs
{
public string input_name;
public int input_angle;
public int input_length;
}
public class calculations
{
public string calc_name; ///calculation name
public string calc_formula; ///this is just a string containing formula
public double calculationdoub; ///this is the calculation
public int completed; ///this will be set to 1 when "calculationdoub" is nonzero
public int ready; ///this will be set to 1 when dependent object's "completed" property = 1
}
public class Program
{
public static void Main()
{
///Horizontal Length
inputs hpc = new inputs();
hpc.input_name = "Horizontal \"P\" Length";
hpc.input_angle = 0;
hpc.input_length = 200000;
///Discharge Angle
inputs dang = new inputs();
dang.input_name = "Discharge Angle";
dang.input_angle = 12;
dang.input_length = 0;
///First calculation object
calculations V_dist = new calculations();
V_dist.calc_name = "Vertical distance using discharge angle";
V_dist.calc_formula = "hpc*Tan(dang)";
// Math.Tan expects radians; this assumes input_angle is given in degrees.
V_dist.calculationdoub = hpc.input_length * Math.Tan(dang.input_angle * Math.PI / 180.0);
V_dist.completed = 0;
V_dist.ready = 0;
}
}
Note that I have yet to add the other features, such as the loop and the logic controlling the "completed" and "ready" flags.

You have some good ideas, but if I understand what you are trying to do, I think there is a more idiomatic -- more OOP way to solve this that is also much less complicated. I am presupposing you have a standard spreadsheet, where there are many rows on the spreadsheet that all effectively have the same columns. It may also be you have different columns in different sections of the spreadsheet.
I've converted several spreadsheets to applications, and I have settled on this approach. I think you will love it.
For each set of headers, I would model that as a single object class. Each column would be a property of the class, and each row would be one object instance.
In all but very rare cases, I would say simply model your properties to include the calculations. A simplistic example of a box would be something like this:
public class Box
{
public double Length { get; set; }
public double Width { get; set; }
public double Height { get; set; }
public double Area
{
get { return 2*Height*Width + 2*Length*Height + 2*Length*Width; }
}
public double Volume
{
get { return Length * Width * Height; }
}
}
And the idea here is if there are properties (columns in Excel) that use other calculated properties/columns as input, just use the property itself:
public bool IsHuge
{
get { return Volume > 50; }
}
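.NET will handle all of the heavy lifting and dependencies for you: each getter calls the getters it depends on, so the values are always evaluated in the right order whenever a property is read.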
In most cases, this will FLY in C# compared to Excel, and I don't think you'll have to worry about computational speed in the way you've set up your cascading objects.
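Applied to the V_dist example from the question, the same idea might look like the sketch below. The class name DischargeGeometry and its property names are made up for illustration, and the angle is assumed to be entered in degrees (Math.Tan expects radians).

```csharp
using System;

// Hypothetical model of one "row" of the spreadsheet; all names are illustrative.
public class DischargeGeometry
{
    // Inputs (plain cells in the spreadsheet).
    public double HorizontalLength { get; set; }
    public double DischargeAngleDegrees { get; set; }

    // Math.Tan expects radians, so convert first (assumes the input is in degrees).
    public double DischargeAngleRadians
    {
        get { return DischargeAngleDegrees * Math.PI / 180.0; }
    }

    // Equivalent of "hpc*Tan(dang)"; evaluated each time it is read.
    public double VerticalDistance
    {
        get { return HorizontalLength * Math.Tan(DischargeAngleRadians); }
    }
}
```

Reading VerticalDistance calls the getter for DischargeAngleRadians first, so no "ready" or "completed" flags are needed, and changing an input followed by another read gives the updated value.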
As for the rare cases: if you have properties that are very computationally expensive, you can give those properties private setters and trigger the calculations explicitly.
public class Box
{
public double Length { get; set; }
public double Width { get; set; }
public double Height { get; set; }
public double Area { get; private set; }
public double Volume { get; private set; }
public bool IsHuge { get; private set; }
public void Calculate()
{
Area = 2*Height*Width + 2*Length*Height + 2*Length*Width;
Volume = Length * Width * Height;
IsHuge = Volume > 50;
}
}
Before you go down this path, I'd recommend you do performance testing. Unless you have millions of rows and/or very complex calculations, I doubt this second approach would be worthwhile. With the first approach you have the benefit of not needing to define when to calculate: it happens when, and only when, the property is accessed.

Related

C# beginner has a problem with class attributes

I am a C# beginner and I've been working on object-orientation for the past few days.
So please excuse me if the question is dumb.
I wrote 2 attributes for a class. Can anyone tell me what the difference is between the first and the second?
public class house
{
private int Height;
public int _Height
{
get { return Height; }
}
public int height { get; }
}
Is there a difference between them?
C# knows fields and properties. A field stores data, a property accesses it. In the basic form, this looks as follows:
public class House
{
private int _height; // a field storing an integer
public int Height // A property that can be used to access the _height field
{
get
{
return _height;
}
set
{
_height = value;
}
}
}
The above is, for an outside viewer (almost) equivalent to:
public class House
{
public int Height; // a public field storing an integer
}
but this is discouraged, because fields should not be public. If you want to change something inside your class later, that gets more difficult.
The property has several advantages, one of them being that you can debug when the value gets changed, or you can verify that the value is in range (e.g. that no one is setting a negative height). You can also omit the setter, which allows the users of the class to only read the field, but not set it.
Since properties are so common in C#, the following abbreviation is allowed:
public class House
{
public int Height // An auto-implemented property
{
get;
set;
}
}
These properties are called auto-implemented. Again, for an outsider, this looks exactly the same and the compiler actually converts this to exactly the same code as the first example above. The only difference is that you cannot directly access the field. And you cannot add verification code with this syntax.
So basically, the three variants achieve almost the same result and it is mostly a matter of taste which one to use. By convention, the last variant is mostly used if no verification needs to be done, otherwise variant 1.
Per comment, here is an example with verification:
public class House
{
private int _height; // a field storing an integer
public int Height // A property that can be used to access the _height field
{
get
{
return _height;
}
set
{
if (value < 0)
{
throw new ArgumentOutOfRangeException(nameof(value), "The height of a house cannot be less than 0");
}
_height = value;
}
}
}
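The question's second form, `public int height { get; }`, is a getter-only auto-implemented property (available from C# 6). A minimal sketch of how it behaves, using a hypothetical Tower class: the compiler generates a hidden read-only backing field that can only be assigned in a constructor or an initializer.

```csharp
using System;

// Hypothetical class used only to illustrate a getter-only auto-property.
public class Tower
{
    // Getter-only auto-property: the hidden backing field is read-only.
    public int Height { get; }

    public Tower(int height)
    {
        if (height < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(height), "Height cannot be negative.");
        }
        Height = height; // allowed here, in the constructor
    }

    // public void Grow() { Height++; } // would not compile: Height has no setter
}
```

By contrast, the question's first form wraps an explicit private field, which code inside the class can still change at any time even though outside callers can only read it.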

Specify columns as both feature and label in multiple combined regression models (ML.NET)

I'm using ML.NET to predict a series of values using a regression model. I am only interested in one column being predicted (the score column). However, the values of some of the other columns are not available for the prediction class. I can't leave them at 0 as this would upset the prediction, so I guess they would have to also be predicted.
I saw a similar question here on predicting multiple values. The answer suggests creating two models, but I can see that the feature columns specified in each model do not include the label column of the other model. So this implies that those columns would not be used when making the prediction. Am I wrong, or should the label column of each model also be included in the feature column of the other model?
Here's some example code to try and explain in code:
public class FooInput
{
public float Feature1 { get; set; }
public float Feature2 { get; set; }
public float Bar {get; set; }
public float Baz {get; set; }
}
public class FooPrediction : FooInput
{
public float BarPrediction { get; set; }
public float BazPrediction { get; set; }
}
public ITransformer Train(IEnumerable<FooInput> data)
{
var mlContext = new MLContext(0);
var trainTestData = mlContext.Data.TrainTestSplit(mlContext.Data.LoadFromEnumerable(data));
var pipelineBar = mlContext.Transforms.CopyColumns("Label", "Bar")
.Append(mlContext.Transforms.CopyColumns("Score", "BarPrediction"))
.Append(mlContext.Transforms.Concatenate("Features", "Feature1", "Feature2", "Baz"))
.Append(mlContext.Regression.Trainers.FastTree());
var pipelineBaz = mlContext.Transforms.CopyColumns("Label", "Baz")
.Append(mlContext.Transforms.CopyColumns("Score", "BazPrediction"))
.Append(mlContext.Transforms.Concatenate("Features", "Feature1", "Feature2", "Bar"))
.Append(mlContext.Regression.Trainers.FastTree());
return pipelineBar.Append(pipelineBaz).Fit(trainTestData.TestSet);
}
This is effectively the same as the aforementioned answer, but with the addition of Baz as a feature for the model where Bar is to be predicted, and conversely the addition of Bar as a feature for the model where Baz is to be predicted.
Is this the correct approach, or does the answer on the other question achieve the desired result, being that the prediction of each column will utilise the values of the other predicted column from the loaded dataset?
One technique you can use is called "Imputation", which replaces these unknown values with some "guessed" value. Imputation is simply the process of substituting the missing values of our dataset.
In ML.NET, what you're looking for is the ReplaceMissingValues transform. You can find samples on learn.microsoft.com.
The technique you are discussing above is also a form of imputation, where your unknowns are replaced by predicting the value from the other known values. This can work as well. I guess I would try both forms and see what works best for your dataset.
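To make the idea of imputation concrete, here is a hand-rolled sketch in plain C# that replaces missing values (represented here as float.NaN) with the mean of the known values. It is not the ML.NET transform itself; the MeanImputer class and its method name are invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative helper, not part of ML.NET.
public static class MeanImputer
{
    // Returns a copy of the column where every NaN is replaced by the mean
    // of the non-NaN entries. If every entry is NaN, the copy is returned unchanged.
    public static float[] ReplaceMissingWithMean(IEnumerable<float> column)
    {
        float[] values = column.ToArray();
        float[] known = values.Where(v => !float.IsNaN(v)).ToArray();
        if (known.Length == 0)
        {
            return values;
        }
        float mean = known.Average();
        return values.Select(v => float.IsNaN(v) ? mean : v).ToArray();
    }
}
```

Mean replacement is only one possible strategy; predicting the missing column from the other columns, as described in the question, is a model-based variant of the same idea.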

Destroy objects per instance

There are several much more complicated answers out there to a simple question I have, so I'll ask the question in regards to my situation because I can't quite figure out what to do based off of those other answers. Garbage collection seems like a danger zone, so I'll err on the side of caution.
I have a Measurement object that contains a Volume object and a Weight object. Depending on which constructor is used, I would like to destroy the opposite object, that is to say, if a user adds a volumetric measurement, I would like to eradicate the weight element of that instance, as it is just bloat at that point. What should be done?
Edited for clarification:
public class RawIngredient
{
public string name { get; set; }
public Measurement measurement;
public RawIngredient(string n, double d, Measurement.VolumeUnits unit)
{
name = n;
measurement.volume.amount = (decimal)d;
measurement.volume.unit = unit;
//I want to get rid of the weight object on this instance of measurement
}
public RawIngredient(string n, double d, Measurement.WeightUnits unit)
{
name = n;
measurement.weight.amount = (decimal)d;
measurement.weight.unit = unit;
//I want to get rid of the volume object on this instance of measurement
}
}
Edited again to show Measurement
public class Measurement
{
public enum VolumeUnits { tsp, Tbsp, oz, cup, qt, gal }
public enum WeightUnits { oz, lb }
public Volume volume;
public Weight weight;
}
Volume and Weight are simple classes with two fields.
First of all, what needs to be destroyed? This is happening in the ctor, so just don't create the one you don't want.
class Measurement
{
public Volume Volume {get; set;}
public Weight Weight {get; set;}
public Measurement (Volume v) { Volume = v; Weight = null;}
public Measurement (Weight w) { Volume = null; Weight = w;}
}
If you are in the Measurement constructor, then simply don't create the non-required type; it will remain as the default value of null (as long as Volume and Weight are reference types and not structs), and any attempted reference to the wrong type would throw an exception.
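A small sketch of that behaviour, assuming Volume and Weight are classes (reference types). The Amount fields and the IsVolumetric helper are hypothetical additions for the example.

```csharp
using System;

// Simplified stand-ins for the question's Volume and Weight classes.
public class Volume { public decimal Amount; }
public class Weight { public decimal Amount; }

public class Measurement
{
    public Volume Volume { get; }
    public Weight Weight { get; }

    public Measurement(Volume v) { Volume = v; } // Weight is never created and stays null
    public Measurement(Weight w) { Weight = w; } // Volume is never created and stays null

    // Hypothetical convenience check so callers do not have to test for null themselves.
    public bool IsVolumetric
    {
        get { return Volume != null; }
    }
}
```

With this shape, `new Measurement(new Volume { Amount = 2m })` leaves Weight as null, so reading `Weight.Amount` on that instance throws a NullReferenceException, and there is no unused Weight object for the garbage collector to worry about.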
As long as the Measurement instance is reachable, the garbage collector cannot collect any object that its fields reference, regardless of whether you actually intend to use that object. A field that was never assigned simply holds null, so there is nothing to collect.
If the objects implement IDisposable, you should call their Dispose method and ensure that they're not referenced or disposed-of again.
After Dispose-ing (if necessary), you can set the unused object to null.
Reference: http://blog.stephencleary.com/2010/02/q-should-i-set-variables-to-null-to.html

Design principles/pattern using repository, data transformation, and math formula

Update
I have a RiskReport type, which gets data from IReportRepository, manipulates the data, and calculates risk according to predefined formula.
One might argue that the RiskReport type should get the data in the exact format it needs and not perform the data manipulation. RiskReport should only be concerned with how to calculate data according to the formula, whereas IReportRepository should only return the data required by the RiskReport class.
Should a new class be introduced between IReportRepository and RiskReport? Because, currently, the data returned from IReportRepository is manipulated to the required format to calculate the risk.
class RiskReport
{
private IReportRepository reportRepository;
public RiskReport(IReportRepository reportRepository)
{
this.reportRepository = reportRepository;
}
public decimal CalculateDataBasedOnFormula()
{
var result = from d in reportRepository.GetReportRelatedData()
group d by d.Id into dgp //potentially complex grouping
select new
{
TotalPage = dgp.Sum(x=>x.Pages) //potentially complex projection
};
decimal risk = 0m; // placeholder: use the result variable to calculate data based on complex formula not shown here
return risk;
}
}
interface IReportRepository
{
IEnumerable<ReportRelatedData> GetReportRelatedData();
}
public class ReportRepository: IReportRepository
{
public IEnumerable<ReportRelatedData> GetReportRelatedData()
{
//return data from underlying data source
return new BindingList<ReportRelatedData>();
}
}
public class ReportRelatedData
{
public int Id { get; set; }
public int Name { get; set; }
public int Pages { get; set; }
//... more properties here
}
Any idea would be appreciated!
"I have a Report type, which gets data from IReportRepository, manipulates the data, and calculates rate according to predefined formula."
I think the answer is in your first sentence. If you want the code to be good, make it SOLID. "S" stands for Single Responsibility Principle. In other words, if you describe what a class does, don't use the word "and". Change your design accordingly.
I think this is one of those questions where if you ask 1000 devs you could get 1000 answers, but yes, I would say that another class should be used. Here's my justification:
A "math"-ish class could be tested independently
A separate class could be reused, keeping the rest of your code DRY
If the formulas change, refactoring won't be nestled into your report code
If I had to inherit the code base, I would like to see three classes here, so that's what I would like to leave behind for the next dev if I was developing it.
Cheers.
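One possible split along those lines is sketched below. It reuses ReportRelatedData and IReportRepository from the question (with the Name property omitted). PageTotalsTransformer, RiskFormula, and the formula itself (average total pages per Id, divided by 100) are placeholders, since the real formula is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ReportRelatedData
{
    public int Id { get; set; }
    public int Pages { get; set; }
}

public interface IReportRepository
{
    IEnumerable<ReportRelatedData> GetReportRelatedData();
}

// Shapes the raw rows into what the formula needs (hypothetical class).
public class PageTotalsTransformer
{
    public IReadOnlyList<int> TotalPagesPerId(IEnumerable<ReportRelatedData> rows)
    {
        return rows.GroupBy(r => r.Id)
                   .Select(g => g.Sum(r => r.Pages))
                   .ToList();
    }
}

// Pure math, testable without a repository (placeholder formula).
public static class RiskFormula
{
    public static decimal Calculate(IReadOnlyList<int> totalPages)
    {
        if (totalPages.Count == 0)
        {
            return 0m;
        }
        return (decimal)totalPages.Average() / 100m;
    }
}

// The report only wires the pieces together.
public class RiskReport
{
    private readonly IReportRepository repository;
    private readonly PageTotalsTransformer transformer;

    public RiskReport(IReportRepository repository, PageTotalsTransformer transformer)
    {
        this.repository = repository;
        this.transformer = transformer;
    }

    public decimal CalculateRisk()
    {
        var totals = transformer.TotalPagesPerId(repository.GetReportRelatedData());
        return RiskFormula.Calculate(totals);
    }
}
```

With this layout the formula can be unit tested by passing a plain list of totals, and a change to the grouping logic does not touch the math.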

Anonymous classes, temporary data, and collections of anonymous classes

I'm new to anonymous classes, and today I think I ran into the first case where I felt like I could really use them. I'm writing a method that would benefit from storing temporary data inside of a class, and since that class doesn't have any meaning outside of that method, using an anonymous class sure made sense to me (at least at the time it did).
After starting on the coding, it sure seemed like I was going to have to make some concessions. I like to assign things like calculations to temporary variables, so that during debugging I can verify bits of calculations at a time in logical chunks. Then I want to assign something simpler to the final value. This value would be in the anonymous class.
The problem is that in order to implement my code with anonymous classes concisely, I'd like to use LINQ. The problem here is that I don't think you can do such temporary calculations inside of the statement. Or can you?
Here is a contrived example of what I want to do:
namespace AnonymousClassTest
{
/// <summary>
/// Interaction logic for Window1.xaml
/// </summary>
public partial class Window1 : Window
{
ObservableCollection<RectanglePoints> Points { get; set; }
public class RectanglePoints
{
public Point UL { get; set; }
public Point UR { get; set; }
public Point LL { get; set; }
public Point LR { get; set; }
}
public class DontWantThis
{
public double Width { get; set; }
public double Height { get; set; }
}
private Dictionary<string,string> properties = new Dictionary<string,string>();
private Dictionary<string,double> scaling_factors = new Dictionary<string,double>();
private void Sample()
{
// not possible to do temp variables, so need to have
// longer, more unreadable assignments
var widths_and_heights = from rp in Points
select new
{
Width = (rp.UR.X - rp.UL.X) * scaling_factors[properties["dummy"]],
Height = (rp.LL.Y - rp.UL.Y) * scaling_factors[properties["yummy"]]
};
// or do it in a for loop -- but then you have to use a concrete
// class to deal with the Width and Height storage
List<DontWantThis> other_widths_and_heights = new List<DontWantThis>();
foreach( RectanglePoints rp in Points) {
double base_width = rp.UR.X - rp.UL.X;
double width_scaling_factor = scaling_factors[properties["dummy"]];
double base_height = rp.LL.Y - rp.UL.Y;
double height_scaling_factor = scaling_factors[properties["yummy"]];
other_widths_and_heights.Add( new DontWantThis
{
Width=base_width * width_scaling_factor,
Height=base_height * height_scaling_factor
});
}
// now we want to use the anonymous class, or concrete class, in the same function
foreach( var wah in widths_and_heights)
Console.WriteLine( String.Format( "{0} {1}", wah.Width, wah.Height));
foreach( DontWantThis dwt in other_widths_and_heights)
Console.WriteLine( String.Format( "{0} {1}", dwt.Width, dwt.Height));
}
public Window1()
{
InitializeComponent();
Points = new ObservableCollection<RectanglePoints>();
Random rand = new Random();
for( int i=0; i<10; i++) {
Points.Add( new RectanglePoints { UL=new Point { X=rand.Next(), Y=rand.Next() },
UR=new Point { X=rand.Next(), Y=rand.Next() },
LL=new Point { X=rand.Next(), Y=rand.Next() },
LR=new Point { X=rand.Next(), Y=rand.Next() }
} );
}
Sample();
}
}
}
NOTE: don't try to run this unless you actually add the keys to the Dictionary :)
The creation of the anonymous class in LINQ is awesome, but forces me to do the calculation in one line. Imagine that the calc is way longer than what I've shown. But it is similar in that I will do some Dictionary lookups to get specific values. Debugging could be painful.
The usage of a concrete class gets around this problem of using temporary variables, but then I can't do everything concisely. Yes, I realize that I'm being a little contradictory in saying that I'm looking for conciseness, while asking to be able to save temp variables in my LINQ statement.
I was starting to try to create an anonymous class when looping over Points, but soon realized that I had no way to store it! You can't use a List because that just loses the entire anonymity of the class.
Can anyone suggest a way to achieve what I'm looking for? Or some middle ground? I've read a few other questions here on StackOverflow, but none of them are exactly the same as mine.
Assuming I understand you correctly, the problem is that you have to set all the properties in a single expression. That's definitely the case with anonymous types.
However, you don't have to do it all inline in that expression. I would suggest that if your properties are based on complex expressions, you break those expressions out into helper methods:
var complex = new {
First = ComputeFirstValue(x, y),
Second = ComputeSecondValue(a, b)
...
};
This has the additional potential benefit that you can unit test each of the helper methods individually, if you're a fan of white-box testing (I am).
This isn't going to avoid there being one big anonymous type initializer expression, but it means the work will be broken up.
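A minimal sketch of the helper-method approach inside a LINQ query. The Rect class, the ScalingDemo class, and the method names are hypothetical stand-ins for the question's RectanglePoints and dictionary lookups.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified replacement for the question's RectanglePoints.
public class Rect
{
    public double Left, Right, Top, Bottom;
}

public static class ScalingDemo
{
    // Temporaries live in the helper, where a debugger can inspect them one by one.
    public static double ScaledLength(double start, double end, double factor)
    {
        double baseLength = end - start;
        double scaled = baseLength * factor;
        return scaled;
    }

    // The anonymous type never leaves this method; only the areas are returned.
    public static List<double> ScaledAreas(IEnumerable<Rect> rects, double widthFactor, double heightFactor)
    {
        var sizes = from r in rects
                    select new
                    {
                        Width = ScaledLength(r.Left, r.Right, widthFactor),
                        Height = ScaledLength(r.Top, r.Bottom, heightFactor)
                    };
        return sizes.Select(s => s.Width * s.Height).ToList();
    }
}
```

The initializer is still a single expression, but each property is one short call, and a breakpoint inside ScaledLength shows the intermediate values.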
Anonymous classes are really intended to simplify stuff dealing with lambdas, not least LINQ. What you're trying to do sounds much more suited to a nested private class. That way, only your class really knows about your temp class. Trying to muck around with anonymous classes seems only to complicate your code.
