I have an algorithm that returns a list of classifications (strings) dependent on the two arguments given to it: a type variable, and an extra category string that allows certain special classifications to be added to the result list.
The current implementation is unreadable and unscalable because the rules are expressed as ifs and switch statements. The rules are also hard-coded.
A simplified version of the code:
private static List<string> DetermineTypes(Type x, object category) {
List<string> Types = new List<string>();
if (category is DateTime) {
Types.Add("1");
Types.Add("2");
Types.Add("3");
} else if (category is string) {
switch ((string)category) {
case "A":
Types.Add("4");
break;
case "B":
case "C":
case "D":
Types.Add("5");
break;
case "":
Types = DetermineTypesFromX(Types, x);
break;
default:
Types.Add("6");
break;
}
}
return Types;
}
private static List<string> DetermineTypesFromX(List<string> Types, Type x) {
if (x.Equals(typeof(int))) {
Types.Add("7");
} else if (x.Equals(typeof(double))) {
Types.Add("8");
} else if (x.Equals(typeof(System.DateTime))) {
Types.Add("9");
Types.Add("10");
}
return Types;
}
I was thinking that it might be good to specify these in XML, so that a code change isn't needed for new types/rules, but that is most probably too heavyweight for the situation. Basically I am trying to solve the problem that a new 'Type' may be added at any time: the common case would be that it falls under one of the 'rules' above, and the unlikely edge case is that a new 'rule' branch has to be added.
I have yet to determine whether the work needed to make it fully dynamic using XML-defined rules (or any other way) is worth it, given the likelihood of the edge cases ever happening and the business environment (schedules etc.).
But the main point of my question is: how could you elegantly simplify the nested conditional code above, maybe incorporating more flexibility into the design for increased scalability?
I was wondering whether F# pattern matching might be an appropriate solution? (NB: I have never used F# before but have been curious about it lately, which is why I am asking.)
A pattern known as dispatch tables has been recently discussed in the following two blog posts and will probably be of interest to you:
Aaron Feng
K. Scott Allen
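As a rough illustration of the dispatch-table idea applied to the question's DetermineTypesFromX, a dictionary keyed by Type could replace the if/else chain. This is only a sketch: the class name TypeDispatchTable and its Lookup method are made up here, and the values are the ones from the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a dispatch table replacing the if/else chain in DetermineTypesFromX.
public static class TypeDispatchTable
{
    // One row per known type; supporting a new type means adding a row, not a branch.
    private static readonly Dictionary<Type, string[]> Table = new Dictionary<Type, string[]>
    {
        { typeof(int),      new[] { "7" } },
        { typeof(double),   new[] { "8" } },
        { typeof(DateTime), new[] { "9", "10" } },
    };

    // Returns the classifications for x, or an empty list when x has no row.
    public static List<string> Lookup(Type x)
    {
        string[] found;
        return Table.TryGetValue(x, out found)
            ? new List<string>(found)
            : new List<string>();
    }
}
```

Calling TypeDispatchTable.Lookup(typeof(DateTime)) yields the two entries "9" and "10", and an unknown type yields an empty list, which mirrors the original fall-through behaviour.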
I wouldn't shy away from a config-based option; it usually has the advantage of not requiring a rebuild. If you don't want that, another option might be type-metadata via an attribute. This would make it trivial to add data for new types (that you write), and you can (indirectly) add attributes to existing types (int etc) via TypeDescriptor.AddAttributes - as long as you use TypeDescriptor.GetAttributes to get them back out again ;-p
Whether this is a good idea or not... well, reflection (and the twin, TypeDescriptor) can be slow, so if you want to use this in a tight loop I'd look first at something involving a dictionary.
Your problem may be coded in terms of a decision tree or a decision table.
Also, there are posts on Chris Smith's blog about decision trees:
Awesome F# - Decision Trees – Part I and
Awesome F# - Decision Trees – Part II
I would suggest you look at a business rules/inference engine. NxBRE has a good community around it and is quite mature. This may be beyond your immediate requirements but if you expect these rules to increase in complexity over time a BRE will provide a nice framework to keep things under control.
Since you mention F#, here is some F# code with very similar behavior to the C# code:
open System

let DetermineTypesFromX(x:Type) =
    if x.Equals(typeof<int>) then
        ["7"]
    elif x.Equals(typeof<double>) then
        ["8"]
    elif x.Equals(typeof<DateTime>) then
        ["9"; "10"]
    else
        []

let DetermineTypes(x:Type, category:obj) =
    match category with
    | :? DateTime -> ["1"; "2"; "3"]
    | :? string as s ->
        match s with
        | "A" -> ["4"]
        | "B" | "C" | "D" -> ["5"]
        | "" -> DetermineTypesFromX(x)
        | _ -> ["6"]
    | _ -> []
That said, I would recommend considering a table-driven approach as an alternative to hard-coded if/switch logic, regardless of whether you move the logic out of the code and into a config file.
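One way the table-driven idea might look in C# is an ordered list of rules, each pairing a predicate with a result, where the first matching row wins. This is a sketch under assumptions: RuleTable, Rule and Row are hypothetical names, and the rows simply restate the question's branches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: the question's rules as an ordered table; the first matching row wins.
public static class RuleTable
{
    private sealed class Rule
    {
        public Func<Type, object, bool> Matches;   // does this row apply?
        public Func<Type, string[]> Result;        // classifications produced by the row
    }

    // Lookup used by the empty-category row (the old DetermineTypesFromX).
    private static readonly Dictionary<Type, string[]> ByType = new Dictionary<Type, string[]>
    {
        { typeof(int),      new[] { "7" } },
        { typeof(double),   new[] { "8" } },
        { typeof(DateTime), new[] { "9", "10" } },
    };

    private static Rule Row(Func<Type, object, bool> matches, Func<Type, string[]> result)
    {
        return new Rule { Matches = matches, Result = result };
    }

    // Order matters: the catch-all string row must come after the specific string rows.
    private static readonly Rule[] Rules =
    {
        Row((x, c) => c is DateTime, x => new[] { "1", "2", "3" }),
        Row((x, c) => "A".Equals(c), x => new[] { "4" }),
        Row((x, c) => new[] { "B", "C", "D" }.Contains(c as string), x => new[] { "5" }),
        Row((x, c) => "".Equals(c), x => ByType.TryGetValue(x, out var r) ? r : new string[0]),
        Row((x, c) => c is string, x => new[] { "6" }),
    };

    public static List<string> DetermineTypes(Type x, object category)
    {
        Rule rule = Rules.FirstOrDefault(r => r.Matches(x, category));
        return rule == null ? new List<string>() : new List<string>(rule.Result(x));
    }
}
```

Because the rows are plain data plus small delegates, the same shape could later be populated from a config file without touching DetermineTypes itself.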
I came across a similar situation and have previously asked a few questions about similar problems that may help you.
The system I built was a configuration-driven, rule-based dynamic system. All configurations and rules were saved in a database. Decision tables were constructed dynamically based on the values and rules retrieved from the database. Values were then converted and compared in C#. Here's the question I asked about dynamic decision tables in C#. And the question regarding dynamically converting and comparing values retrieved from the database.
So I ended up with something similar to this in terms of the config table (just an example):
Conditions  IsDecision  LHS       Operator  RHS
TTFF        False       PostCode  >         100
TFTF        False       PostCode  <         10000
FTTT        True
Note: LHS is the property name of the object.
The above table in plain English:
Condition 1  PostCode > 100    Yes  Yes  No   No
Condition 2  PostCode < 10000  Yes  No   Yes  No
Outcome 1                      Yes
Outcome 2                           Yes
Outcome 3                                Yes
Outcome 4                                     Yes
Then you have other tables/configs to determine the action for each outcome.
The core parts of the implementation are how to dynamically construct the decision table and how to dynamically convert and compare string values; I have linked to specific implementations of both in the paragraph above. I believe you can apply similar concepts in your situation, and I hope I've explained the general concept.
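To make the idea concrete, here is a minimal sketch of one possible reading of the table above: each condition row is evaluated against an object, the T/F results are joined into a key, and the key selects an outcome. The names Condition, DecisionTable and Address are hypothetical, the rows are hard-coded here rather than read from a database, and only the ">" and "<" operators are handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of a config-driven decision table; rows would normally come from a database.
public sealed class Condition
{
    public string Lhs;       // property name on the evaluated object
    public string Operator;  // ">" or "<" in this sketch
    public int Rhs;          // value to compare against

    public bool Evaluate(object target)
    {
        // Read the property named by Lhs via reflection and compare it numerically.
        object raw = target.GetType().GetProperty(Lhs).GetValue(target, null);
        int value = Convert.ToInt32(raw);
        switch (Operator)
        {
            case ">": return value > Rhs;
            case "<": return value < Rhs;
            default: throw new NotSupportedException("Unknown operator: " + Operator);
        }
    }
}

public static class DecisionTable
{
    private static readonly Condition[] Conditions =
    {
        new Condition { Lhs = "PostCode", Operator = ">", Rhs = 100 },
        new Condition { Lhs = "PostCode", Operator = "<", Rhs = 10000 },
    };

    // Each key is one column of the table: the T/F result of every condition, in order.
    private static readonly Dictionary<string, string> Outcomes = new Dictionary<string, string>
    {
        { "TT", "Outcome 1" },
        { "TF", "Outcome 2" },
        { "FT", "Outcome 3" },
        { "FF", "Outcome 4" },
    };

    public static string Decide(object target)
    {
        string key = string.Concat(Conditions.Select(c => c.Evaluate(target) ? "T" : "F"));
        return Outcomes[key];
    }
}

// Stand-in for whatever object carries the PostCode property.
public sealed class Address
{
    public int PostCode { get; set; }
}
```

With these rows, a PostCode of 2000 satisfies both conditions and maps to "Outcome 1", while 20000 satisfies only the first and maps to "Outcome 2".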
Other Resources:
Martin Fowler's decision tree article.
Luke's post on decision tree.
I have a program that needs to support "User Options" to determine how it will overwrite files. The user can choose from "Options" that can be combined in many ways, which makes it hard to code all the possible IF...ELSE statements. This complex result evaluation is hard to code, is getting too long, and is driving me nuts!
I'm looking to solve this with some sort of "parsing" to evaluate all the possible results in a faster and more organic way without long chains of IF...ELSE blocks
Here is what I have in my program options:
For example: a user has selected to overwrite files, picked the option "FILE SIZE" with ">=" as its criterion, also selected "FILE DATE" with "<=", and picked "OR". The selected options will result in something like "FILE SIZE >= x" OR "FILE DATE <= x".
Given the options above on the screen shot, a user can create all sorts of possible logical options and combine them using "OR" and "AND", and also pick the ">, <, >=, <=, =, <>".
The complexity behind this little screen is huge. I've been researching how to tackle it and have heard about lambda expressions and binary trees, but I have no clue whether they apply to my problem. I would like somebody to at least point me in the right direction; I don't even know how to correctly classify my "issue" when googling for answers :)
Thanks in advance to all!
I don't think your issue would be solved using expression trees. Expression trees are functional expressions that can be analyzed, improved or evaluated before they're compiled. They're a good solution when you want to create a fluent configuration in which the developer indicates which properties supply some setting (there are other use cases, but they go beyond your question):
config.EnableWhatever(x => x.HasWhatever)
Your choice should be around enumerations with [FlagsAttribute]:
[Flags]
public enum FileSizeOptions
{
None = 0,
IfFileSizeIsGreaterThan = 1,
IfFileSizeIsLowerThan = 2,
OtherOption = 4,
All = IfFileSizeIsGreaterThan | IfFileSizeIsLowerThan | OtherOption
}
FileSizeOptions options = FileSizeOptions.IfFileSizeIsGreaterThan | FileSizeOptions.OtherOption;
if(options.HasFlag(FileSizeOptions.All))
{
// Do stuff
} else if(options.HasFlag(FileSizeOptions.IfFileSizeIsGreaterThan))
{
// Do stuff
} // and so on...
I mean, you should use masks instead of booleans and .NET has enums as the recommended way to implement masks or flags.
That is, you can evaluate if an enumeration value has 1 or N possible flags defined in the whole enumeration.
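A small sketch of how the mask behaves may help here. It redeclares the enum so that it compiles on its own; FlagsDemo and CountSelected are hypothetical names added purely for illustration.

```csharp
using System;

[Flags]
public enum FileSizeOptions
{
    None = 0,
    IfFileSizeIsGreaterThan = 1,
    IfFileSizeIsLowerThan = 2,
    OtherOption = 4,
    All = IfFileSizeIsGreaterThan | IfFileSizeIsLowerThan | OtherOption
}

public static class FlagsDemo
{
    // Counts how many of the three individual options are set in the mask.
    public static int CountSelected(FileSizeOptions options)
    {
        int count = 0;
        foreach (FileSizeOptions flag in new[]
                 {
                     FileSizeOptions.IfFileSizeIsGreaterThan,
                     FileSizeOptions.IfFileSizeIsLowerThan,
                     FileSizeOptions.OtherOption
                 })
        {
            // HasFlag is true when every bit of 'flag' is present in 'options'.
            if (options.HasFlag(flag)) count++;
        }
        return count;
    }
}
```

Note that HasFlag(FileSizeOptions.All) is true only when all three bits are set, so a mask of IfFileSizeIsGreaterThan | OtherOption fails that test but still passes HasFlag for each of its two individual options.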
I am working on an ASPX page that needs to handle multiple different kinds of data. I came up with a potentially ideal fashion to fetch the information I need, but am unsure if it is as good an idea as it feels. Basically, I need to filter a set into a subset, but which values I filter by will differ by circumstance. I constructed the following code snippet that seems to work fine.
List<string> lStr = new List<string>() {
"Category",
"Document Number", //Case 1 Only
"Document Title", //Case 1 Only
"Picture Title", //Case 2 Only
"Picture Number", //Case 2 Only
"Issue",
"Issue Date",
"Issue Title",
"Notes",
"High Priority" //Case 1 Only
};
AddControls(bigDataInput.Fields.OfType<FieldObject>().Where(x => lStr.Contains(x.Title)).ToArray());
bigDataInput is an object that has a property called Fields, which is a Collection of objects called FieldObject. I need to get a subset of these FieldObjects based on their title, and pass all of them into a method AddControls(params FieldObject[] fields). The issue is that which titles I need to filter by will differ based on the bigDataInput itself. There are only two case scenarios currently, and these are the fields I need to filter out.
bigDataInput Case 1: Category, Document Number, Document Title, Issue, Issue Date, Issue Title, Notes, High Priority
bigDataInput Case 2: Category, Picture Title, Picture Number, Issue, Issue Date, Issue Title, Notes
The bigDataInput will have additional fields besides those that I need. However, the field collection will only have one of the filtered fields if and only if I will actually need the field for that particular case. For example, Case 1 does not have the Picture Title and Picture Number fields, and Case 2 does not have the Document Number, Document Title, and High Priority fields. This restriction will also apply to all future cases, however many there might be.
I at first considered constructing a List based on the specific case scenario, but the switch case to build it could get quite large, and repetitive to a degree. That's when I came up with the idea for the above code snippet. But is this actually a good idea? Or is there a better method than this but much more concise than a potentially humongous switch case?
It's hard to wrap my head around your exact issue, but from what I got it might make sense to attack this a little differently using some design patterns. One that comes to mind is the Strategy pattern. Basically, this is a way of separating the algorithm (logic) from its host.
Wikipedia has an entry on the Strategy pattern here: http://en.wikipedia.org/wiki/Strategy_pattern
I'm thinking about a flow somewhat like this:
You have an interface called IDataInputTransformer with 2 methods: bool AcceptsInput(bigDataInput i) and FieldObject[] TransformInput(bigDataInput i)
The calling class has an IEnumerable of IDataInputTransformers that is set somehow -- either manually on instantiation or using dependency injection or something
Upon being passed a bigDataInput, the calling class iterates over each IDataInputTransformer and calls AcceptsInput with the bigDataInput. If the input is not accepted, it just tries the next one, and so on (if none accepts the input, perhaps an exception is thrown). If the IDataInputTransformer does accept the input, you can call TransformInput and get the FieldObject[] that will be passed to AddControls
You could take this further, but this is the basic idea. The benefits here are:
Readability -- it's really easy to read FinanceDepartmentInputTransformer.TransformInput(), it's really hard to read a huge switch
Able to change -- you can add new InputTransformers easily (and it won't ruin the readability or functionality)
Rules are isolated -- you can easily check the business rules in the AcceptsInput methods
Portability -- you could use this elsewhere too
Testability -- you can easily unit test this to make sure it works right
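The flow described above could be sketched roughly as follows. FieldObject and BigDataInput are minimal stand-ins for the question's types, and DocumentInputTransformer and TransformerDispatcher are hypothetical names; recognising "Case 1" by the presence of a Document Number field is an assumption made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-ins for the question's types, just enough to make the sketch compile.
public sealed class FieldObject { public string Title; }
public sealed class BigDataInput { public List<FieldObject> Fields = new List<FieldObject>(); }

public interface IDataInputTransformer
{
    bool AcceptsInput(BigDataInput input);
    FieldObject[] TransformInput(BigDataInput input);
}

// Hypothetical strategy for "Case 1"; it recognises its input by the Document Number field.
public sealed class DocumentInputTransformer : IDataInputTransformer
{
    private static readonly HashSet<string> Wanted = new HashSet<string>
    {
        "Category", "Document Number", "Document Title", "Issue",
        "Issue Date", "Issue Title", "Notes", "High Priority"
    };

    public bool AcceptsInput(BigDataInput input)
    {
        return input.Fields.Any(f => f.Title == "Document Number");
    }

    public FieldObject[] TransformInput(BigDataInput input)
    {
        return input.Fields.Where(f => Wanted.Contains(f.Title)).ToArray();
    }
}

public static class TransformerDispatcher
{
    // Tries each strategy in turn and uses the first one that accepts the input.
    public static FieldObject[] Transform(IEnumerable<IDataInputTransformer> transformers, BigDataInput input)
    {
        IDataInputTransformer match = transformers.FirstOrDefault(t => t.AcceptsInput(input));
        if (match == null) throw new InvalidOperationException("No transformer accepts this input.");
        return match.TransformInput(input);
    }
}
```

A second case would be another small class implementing the same interface, and the result of TransformerDispatcher.Transform would be what gets passed to AddControls.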
One of F#'s claims is that it allows for interactive scripting and data manipulation / exploration. I've been playing around with F# trying to get a sense for how it compares with Matlab and R for data analysis work. Obviously F# does not have all practical functionality of these ecosystems, but I am more interested in the general advantages / disadvantages of the underlying language.
For me the biggest change, even over the functional style, is that F# is statically typed. This has some appeal, but also often feels like a straightjacket. For instance, I have not found a convenient way to deal with heterogeneous rectangular data -- think dataframe in R. Assume I'm reading a CSV file with names (string) and weights (float). Typically I load data in, perform some transformations, add variables, etc, and then run analysis. In R, the first part might look like:
df <- read.csv('weights.csv')
df$logweight <- log(df$weight)
In F#, it's not clear what structure I should use to do this. As far as I can tell I have two options: 1) I can define a class first that is strongly typed (Expert F# 9.10) or 2) I can use a heterogeneous container such as ArrayList. A statically typed class doesn't seem feasible because I need to add variables in an ad-hoc manner (logweight) after loading the data. A heterogeneous container is also inconvenient because every time I access a variable I will need to unbox it. In F#:
let df = readCsv("weights.csv")
df.["logweight"] <- log(double df.["weight"])
If this were once or twice, it might be okay, but specifying a type every time I use a variable doesn't seem reasonable. I often deal with surveys with 100s of variables that are added/dropped, split into new subsets and merged with other dataframes.
Am I missing some obvious third choice? Is there some fun and light way to interact and manipulate heterogeneous data? If I need to do data analysis on .Net, my current sense is that I should use IronPython for all the data exploration / transformation / interaction work, and only use F#/C# for numerically intensive parts. Is F# inherently the wrong tool for quick and dirty heterogeneous data work?
Is F# inherently the wrong tool for quick and dirty heterogeneous data work?
For completely ad hoc, exploratory data mining, I wouldn't recommend F# since the types would get in your way.
However, if your data is very well defined, then you can hold disparate data types in the same container by mapping all of your types to a common F# union:
> #r "FSharp.PowerPack";;
--> Referenced 'C:\Program Files\FSharp-1.9.6.16\bin\FSharp.PowerPack.dll'
> let rawData =
"Name: Juliet
Age: 23
Sex: F
Awesome: True"
type csv =
    | Name of string
    | Age of int
    | Sex of char
    | Awesome of bool

let parseData data =
    String.split ['\n'] data
    |> Seq.map (fun s ->
        let parts = String.split [':'] s
        match parts.[0].Trim(), parts.[1].Trim() with
        | "Name", x -> Name(x)
        | "Age", x -> Age(int x)
        | "Sex", x -> Sex(x.[0])
        | "Awesome", x -> Awesome(System.Convert.ToBoolean(x))
        | data, _ -> failwithf "Unknown %s" data)
    |> Seq.to_list;;
val rawData : string =
"Name: Juliet
Age: 23
Sex: F
Awesome: True"
type csv =
    | Name of string
    | Age of int
    | Sex of char
    | Awesome of bool
val parseData : string -> csv list
> parseData rawData;;
val it : csv list = [Name "Juliet"; Age 23; Sex 'F'; Awesome true]
csv list is strongly typed and you can pattern match over it, but you have to define all of your union constructors up front.
I personally prefer this approach, since it is orders of magnitude better than working with an untyped ArrayList. However, I'm not really sure what your requirements are, and I don't know a good way to represent ad-hoc variables (except maybe as a Map<string, obj>), so YMMV.
I think that there are a few other options.
(?) operator
As Brian mentioned, you can use the (?) operator:
type dict<'a,'b> = System.Collections.Generic.Dictionary<'a,'b>
let (?) (d:dict<_,_>) key = unbox d.[key]
let (?<-) (d:dict<_,_>) key value = d.[key] <- box value
let df = new dict<string,obj>()
df?weight <- 50.
df?logWeight <- log(df?weight)
This does use boxing/unboxing on each access, and at times you may need to add type annotations:
(* need annotation here, since we could try to unbox at any type *)
let fltVal = (df?logWeight : float)
Top level identifiers
Another possibility is that rather than dynamically defining properties on existing objects (which F# doesn't support particularly well), you can just use top level identifiers.
let dfLogWeight = log(dfWeight)
This has the advantage that you will almost never need to specify types, though it may clutter your top-level namespace.
Property objects
A final option which requires a bit more typing and uglier syntax is to create strongly typed "property objects":
type 'a property = System.Collections.Generic.Dictionary<obj,'a>
let createProp() : property<'a> = new property<'a>()
let getProp o (prop:property<'a>) : 'a = prop.[o]
let setProp o (prop:property<'a>) (value:'a) = prop.[o] <- value
let df = new obj()
let (weight : property<double>) = createProp()
let (logWeight : property<double>) = createProp()
setProp df weight 50.
setProp df logWeight (log (getProp df weight))
let fltVal = getProp df logWeight
This requires each property to be explicitly created (and requires a type annotation at that point), but no type annotations would be required after that. I find this much less readable than the other options, although perhaps defining an operator to replace getProp would alleviate that somewhat.
I am not sure if F# is a great tool here or not. But there is a third option - the question mark operator. I've been meaning to blog about this for a while now; Luca's recent PDC talk demo'd a CSV reader with C# 'dynamic', and I wanted to code a similar thing with F# using the (?) operator. See
F# operator "?"
for a short description. You can try to blaze ahead and play around with this on your own, or wait for me to blog about it. I have not tried it myself in earnest so I'm not sure exactly how well it will work out.
EDIT
I should add that Luca's talk shows how 'dynamic' in C# addresses at least a portion of this question for that language.
EDIT
See also
http://cs.hubfs.net/forums/thread/12622.aspx
where I post some basic starter CSV code.
I'm interested in both style and performance considerations. My choice is to do either of the following ( sorry for the poor formatting but the interface for this site is not WYSIWYG ):
One:
string value = "ALPHA";
switch ( value.ToUpper() )
{
case "ALPHA":
// do something
break;
case "BETA":
// do something else
break;
default:
break;
}
Two:
public enum GreekLetters
{
UNKNOWN= 0,
ALPHA= 1,
BETA = 2,
// etc...
}
string value = "Alpha";
GreekLetters letter = (GreekLetters)Enum.Parse( typeof( GreekLetters ), value.ToUpper() );
switch( letter )
{
case GreekLetters.ALPHA:
// do something
break;
case GreekLetters.BETA:
// do something else
break;
default:
break;
}
Personally, I prefer option TWO above, but I don't have any real reason other than basic style reasons. However, I'm not even sure there really is a style reason. Thanks for your input.
The second option is marginally faster, as the first option may require a full string comparison. The difference will be too small to measure in most circumstances, though.
The real advantage of the second option is that you've made it explicit that the valid values for value fall into a narrow range. In fact, it will throw an exception at Enum.Parse if the string value isn't in the expected range, which is often exactly what you want.
Option #1 is faster because if you look at the code for Enum.Parse, you'll see that it goes through each item one by one, looking for a match. In addition, there is less code to maintain and keep consistent.
One word of caution is that you shouldn't use ToUpper, but rather ToUpperInvariant() because of Turkey Test issues.
If you insist on Option #2, at least use the overload that allows you to specify to ignore case. This will be faster than converting to uppercase yourself. In addition, be advised that the Framework Design Guidelines encourage that all enum values be PascalCase instead of SCREAMING_CAPS.
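For illustration, a case-insensitive parse that avoids both ToUpper and exceptions might look like the sketch below. It assumes .NET Framework 4 or later, where Enum.TryParse is available; on older frameworks the Enum.Parse(Type, string, bool) overload is the closest equivalent. GreekLetter and GreekLetterParser are hypothetical names.

```csharp
using System;

// PascalCase members, as the Framework Design Guidelines suggest.
public enum GreekLetter
{
    Unknown = 0,
    Alpha = 1,
    Beta = 2
}

public static class GreekLetterParser
{
    // Parses case-insensitively; falls back to Unknown instead of throwing.
    public static GreekLetter Parse(string value)
    {
        GreekLetter letter;
        // TryParse also accepts numeric strings such as "7", so reject undefined values explicitly.
        bool ok = Enum.TryParse(value, true, out letter)
                  && Enum.IsDefined(typeof(GreekLetter), letter);
        return ok ? letter : GreekLetter.Unknown;
    }
}
```

The resulting GreekLetter value can then be used in the switch statement exactly as in option two.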
I can't comment on the performance part of the question but as for style I prefer option #2. Whenever I have a known set of values and the set is reasonably small (less than a couple of dozen or so) I prefer to use an enum. I find an enum is a lot easier to work with than a collection of string values and anyone looking at the code can quickly see what the set of allowed values is.
This actually depends on the number of items in the enum, and you would have to test it for each specific scenario - not that it is likely to make a big difference. But it is a great question.
With very few values, the Enum.Parse is going to take more time than anything else in either example, so the second should be slower.
With enough values, the switch statement will be implemented as a hashtable, which should work the same speed with strings and enums, so again, Enum.Parse will probably make the second solution slower, but not by relatively as much.
Somewhere in the middle, I would expect the cost of comparing strings being higher than comparing enums would make the first solution faster.
I wouldn't even be surprised if it were different on different compiler versions or different options.
I would definitely say #1. Enum.Parse() uses reflection, which is relatively expensive. Plus, Enum.Parse() will throw an exception if the value is not defined, and since there's no TryParse() (Enum.TryParse() only arrived in .NET Framework 4) you'd need to wrap it in a try/catch block.
Not sure if there is a performance difference when switching on a string value versus an enum.
One thing to consider is would you need the values used for the case statements elsewhere in your code. If so, then using an enum would make more sense as you have a singular definition of the values. Const strings could also be used.
I am building a fun little app to determine if I should bike to work.
I would like to test to see if it is either Raining or Thunderstorm(ing).
public enum WeatherType : byte
{ Sunny = 0, Cloudy = 1, Thunderstorm = 2, Raining = 4, Snowing = 8, MostlyCloudy = 16 }
I was thinking I could do something like:
WeatherType _badWeatherTypes = WeatherType.Thunderstorm | WeatherType.Raining;
if(currentWeather.Type == _badWeatherTypes)
{
return false;//don't bike
}
but this doesn't work because _badWeatherTypes is a combination of both types. I would like to keep them separated out because this is supposed to be a learning experience, and having them separate may be useful in other situations (e.g. invoice-not-paid reasons etc.).
I would also rather not do: (this would remove the ability to be configured for multiple people)
if(WeatherType.Thunderstorm)
{
return false; //don't bike
}
etc...
Your current code will say whether it's exactly "raining and thundery". To find out whether it's "raining and thundery and possibly something else" you need:
if ((currentWeather.Type & _badWeatherTypes) == _badWeatherTypes)
To find out whether it's "raining or thundery, and possibly something else" you need:
if ((currentWeather.Type & _badWeatherTypes) != 0)
EDIT (for completeness):
It would be good to use the FlagsAttribute, i.e. decorate the type with [Flags]. This is not necessary for the sake of this bitwise logic, but affects how ToString() behaves. The C# compiler ignores this attribute (at least at the moment; the C# 3.0 spec doesn't mention it) but it's generally a good idea for enums which are effectively flags, and it documents the intended use of the type. At the same time, the convention is that when you use flags, you pluralise the enum name - so you'd change it to WeatherTypes (because any actual value is effectively 0 or more weather types).
It would also be worth thinking about what "Sunny" really means. It's currently got a value of 0, which means it's the absence of everything else; you couldn't have it sunny and raining at the same time (which is physically possible, of course). Please don't write code to prohibit rainbows! ;) On the other hand, if in your real use case you genuinely want a value which means "the absence of all other values" then you're fine.
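Putting the two checks together, a sketch with the pluralised, [Flags]-decorated enum could look like this. BikeAdvisor and its method names are hypothetical; the enum values are the ones from the question.

```csharp
using System;

[Flags]
public enum WeatherTypes : byte
{
    Sunny = 0,
    Cloudy = 1,
    Thunderstorm = 2,
    Raining = 4,
    Snowing = 8,
    MostlyCloudy = 16
}

public static class BikeAdvisor
{
    private const WeatherTypes BadWeather = WeatherTypes.Thunderstorm | WeatherTypes.Raining;

    // True when none of the bad-weather flags is set (the "raining or thundery" check, negated).
    public static bool ShouldBike(WeatherTypes current)
    {
        return (current & BadWeather) == 0;
    }

    // True only when every bad-weather flag is set (the "raining and thundery" check).
    public static bool IsAllBadWeather(WeatherTypes current)
    {
        return (current & BadWeather) == BadWeather;
    }
}
```

Keeping BadWeather as a separate value means it could be made configurable per person without changing either check.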
I'm not sure that it should be a flag - I think that you should have a range input for:
Temperature
How much it's raining
Wind strength
any other input you fancy (e.g. thunderstorm)
you can then use an algorithm to determine if the conditions are sufficiently good.
I think you should also have an input for how likely the weather is to remain the same for cycling home. The criteria may be different - you can shower and change more easily when you get home.
If you really want to make it interesting, collect the input data from a weather service API, and evaluate the decision each day - yes, I should have cycled, or no, it was a mistake. Then perhaps you can have the app learn to make better decisions.
The next step is to "socialize" your decision, and see whether other people near you are making the same decisions.
Use the FlagsAttribute. That will allow you to use the enum as a bit mask.
You need to use the [Flags] attribute (check here) on your enum; then you can use bitwise and to check for individual matches.
You should be using the Flags attribute on your enum. Beyond that, you also need to test to see if a particular flag is set by:
((currentWeather.Type & WeatherType.Thunderstorm) == WeatherType.Thunderstorm)
This will test if currentWeather.Type has the WeatherType.Thunderstorm flag set.
I wouldn't limit yourself to the bit world. Enums and bitwise operators are, as you found out, not the same thing. If you want to solve this using bitwise operators, I'd stick to just them, i.e. don't bother with enums. However, I'd do something like the following:
WeatherType[] badWeatherTypes = new WeatherType[]
{
WeatherType.Thunderstorm,
WeatherType.Raining
};
if (Array.IndexOf(badWeatherTypes, currentWeather.Type) >= 0)
{
return false;
}