I am working on a small part of a matching system that uses boolean conditional expressions.
These conditional expressions are constrained to a single variable and a single operator (with an edge case of an Inclusive Between).
I am interested in:
Equal To "="
Greater than ">"
Greater Than Or Equal To ">="
Less Than "<"
Less Than Or Equal To "<="
Inclusive Between ">= AND <="
I have a requirement to compare two conditional expressions and evaluate:
1) Is there an overlap of possible values?
Does "X > 1000" overlap with "X > 999"? Yes.
2) If there is an overlap, return the overlap:
The overlap of "X > 1000" with "X > 999" is "X > 1000"
3) Is a conditional expression constrained by another?
"X < 999" is constrained by "X < 1000" ; "X < 1001" is not constrained by "X < 1000"
What I have done so far is build up a truth table of all possible combinations and return the results, but I was wondering if there was an easier way to calculate these?
Any Theory / Reference material / C# libraries out there?
I haven't heard of any, but you can easily do without them if you represent the constraints as intervals:
x > 1000 becomes (1000, double.PositiveInfinity)
x == 1000 becomes [1000, 1000]
etc.
This way you need only one class
class Constraint
{
double Lower; bool isLowerStrict;
double Upper; bool isUpperStrict;
bool isIn(double d)
{
return (isLowerStrict ? Lower < d : Lower <= d) &&
(isUpperStrict ? Upper > d : Upper >= d);
}
Constraint intersect(Constraint other)
{
Constraint result = new Constraint();
if (Lower > other.Lower)
{
result.Lower = Lower;
result.isLowerStrict = isLowerStrict;
}
else if (Lower < other.Lower)
{
result.Lower = other.Lower;
result.isLowerStrict = other.isLowerStrict;
}
else
{
result.Lower = Lower;
result.isLowerStrict = isLowerStrict || other.isLowerStrict;
}
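// the same for upper, mirrored: the smaller upper bound wins
if (Upper < other.Upper)
{
result.Upper = Upper;
result.isUpperStrict = isUpperStrict;
}
else if (Upper > other.Upper)
{
result.Upper = other.Upper;
result.isUpperStrict = other.isUpperStrict;
}
else
{
result.Upper = Upper;
result.isUpperStrict = isUpperStrict || other.isUpperStrict;
}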
return result;
}
public bool isEmpty()
{
if (Lower > Upper) return true;
if (Lower == Upper && (isLowerStrict || isUpperStrict)) return true;
return false;
}
public bool Equals(Constraint other)
{
if (isEmpty()) return other.isEmpty();
return (Lower == other.Lower) && (Upper == other.Upper) &&
(isLowerStrict == other.isLowerStrict) &&
(isUpperStrict == other.isUpperStrict);
}
// construction:
static Constraint GreaterThan(double d)
{
return new Constraint()
{
Lower = d,
isLowerStrict = true,
Upper = double.PositiveInfinity,
isUpperStrict = false
};
}
static Constraint IsEqualTo(double d)
{
return new Constraint()
{
Lower = d,
isLowerStrict = false,
Upper = d,
isUpperStrict = false
};
}
// etc.
}
With this code, you can answer the questions:
1) overlap: !a.intersect(b).isEmpty()
2) intersect: a.intersect(b)
3) constrain: a.intersect(b).Equals(a) (a is constrained by b when intersecting with b leaves a unchanged)
EDIT:
As @CodeInChaos suggests, you should consider replacing double with decimal. Mind that decimal lacks infinite values, so you should use decimal.MaxValue and decimal.MinValue instead.
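The interval idea above can be sketched as one compact, compilable class. This is a minimal sketch, not a library API: the names Interval, Overlaps, IsConstrainedBy and SameAs are hypothetical and chosen for illustration, and double is kept for the bounds so that infinite values are available.

```csharp
using System;

// Hypothetical illustration type; not part of any library.
public sealed class Interval
{
    public double Lower { get; }
    public bool LowerStrict { get; }   // true: the lower bound itself is excluded
    public double Upper { get; }
    public bool UpperStrict { get; }   // true: the upper bound itself is excluded

    public Interval(double lower, bool lowerStrict, double upper, bool upperStrict)
    {
        Lower = lower; LowerStrict = lowerStrict;
        Upper = upper; UpperStrict = upperStrict;
    }

    // One factory per operator from the question.
    public static Interval GreaterThan(double d) { return new Interval(d, true, double.PositiveInfinity, false); }
    public static Interval GreaterOrEqual(double d) { return new Interval(d, false, double.PositiveInfinity, false); }
    public static Interval LessThan(double d) { return new Interval(double.NegativeInfinity, false, d, true); }
    public static Interval LessOrEqual(double d) { return new Interval(double.NegativeInfinity, false, d, false); }
    public static Interval EqualTo(double d) { return new Interval(d, false, d, false); }
    public static Interval Between(double lo, double hi) { return new Interval(lo, false, hi, false); }

    public bool IsEmpty
    {
        get { return Lower > Upper || (Lower == Upper && (LowerStrict || UpperStrict)); }
    }

    public Interval Intersect(Interval other)
    {
        // Lower bound: the larger value wins; on a tie the strict bound wins.
        double lo; bool loStrict;
        if (Lower > other.Lower) { lo = Lower; loStrict = LowerStrict; }
        else if (Lower < other.Lower) { lo = other.Lower; loStrict = other.LowerStrict; }
        else { lo = Lower; loStrict = LowerStrict || other.LowerStrict; }

        // Upper bound: the smaller value wins; on a tie the strict bound wins.
        double hi; bool hiStrict;
        if (Upper < other.Upper) { hi = Upper; hiStrict = UpperStrict; }
        else if (Upper > other.Upper) { hi = other.Upper; hiStrict = other.UpperStrict; }
        else { hi = Upper; hiStrict = UpperStrict || other.UpperStrict; }

        return new Interval(lo, loStrict, hi, hiStrict);
    }

    // 1) Is there an overlap of possible values?
    public bool Overlaps(Interval other) { return !Intersect(other).IsEmpty; }

    // 3) "this" is constrained by "other" when every value allowed by
    //    "this" is also allowed by "other", i.e. the intersection is "this".
    public bool IsConstrainedBy(Interval other) { return Intersect(other).SameAs(this); }

    public bool SameAs(Interval other)
    {
        if (IsEmpty || other.IsEmpty) return IsEmpty && other.IsEmpty;
        return Lower == other.Lower && Upper == other.Upper
            && LowerStrict == other.LowerStrict && UpperStrict == other.UpperStrict;
    }
}
```

With this sketch, Interval.GreaterThan(1000).Overlaps(Interval.GreaterThan(999)) is true, the intersection of the two is the same interval as Interval.GreaterThan(1000), and Interval.LessThan(999).IsConstrainedBy(Interval.LessThan(1000)) is true while Interval.LessThan(1001).IsConstrainedBy(Interval.LessThan(1000)) is false, matching the three examples in the question.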
I wrote some sample code quickly; I hope it makes sense:
enum SignType
{
More, Less, Equal
}
public class Representation
{
public SignType sign;
public int value;
}
public class Range
{
public bool infinityNegative;
public bool infinityPositive;
public int minValue;
public int maxValue;
public Range(List<Representation> values)
{
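infinityNegative=true;
infinityPositive=true;
// start from the widest possible range so the comparisons below can narrow it
minValue=int.MinValue;
maxValue=int.MaxValue;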
foreach(var value in values)
{
if (value.sign==SignType.More)
{
infinityNegative=false;
if (value.value>minValue)
minValue=value.value;
}
else if (value.sign==SignType.Less)
{
infinityPositive=false;
if (value.value<maxValue)
maxValue=value.value;
}
else if (value.sign==SignType.Equal)
{
infinityPositive=infinityNegative=false;
minValue=maxValue=value.value;
break;
}
}
}
public bool Overlaps(Range checkRange)
{
if (checkRange.infinityPositive)
return CompareUpperLevelValue(checkRange); //this method should compare upper value overlapping
else if (checkRange.infinityNegative)
return CompareLowerLevelValue(checkRange); //this method should compare lower value overlapping
else
return CompareInterval(checkRange); //this method should compare interval
}
public bool CompareUpperLevelValue(Range checkRange)
{
if (checkRange.maxValue<maxValue)
return true;
else
return false;
}
public bool CompareLowerLevelValue(Range checkRange)
{
if (checkRange.minValue>minValue)
return true;
else
return false;
}
public bool CompareInterval(Range checkRange)
{
if ((checkRange.minValue>minValue)&&(checkRange.maxValue<maxValue))
return true;
else
return false;
}
}
Related
I'm trying to make a class that contains 4 functions: isLong, isDouble, stringToLong, and stringToDouble. I am trying to do this without using a TryParse function. Ideally this class would receive a string and return the appropriate type (bool, bool, long, and double, in that order).
For instance, if I enter the number 100000, isLong returns true (bool).
Below is an example of how I did isLong, but I am having difficulty doing the same for isDouble (which has to accept decimals) and for both stringToLong and stringToDouble.
public static bool isLong(string s)
{
bool ret = true;
int i;
s = s.Trim();
if (s[0] == '-')
{
i = 1;
}
else
{
i = 0;
}
for (; (i < s.Length); i = i + 1)
{
ret = ret && ((s[i] >= '0') && (s[i] <= '9'));
}
return (ret);
}
You could use the MinValue and MaxValue constants to check numeric types; for instance, you could define a method like this:
public bool IsLong(decimal value)
{
return value >= long.MinValue && value <= long.MaxValue && value == (long)value;
}
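For the string-based functions the question asks about, a hand-written digit loop is one option. The following is a minimal sketch under stated assumptions: the class name ManualParse is hypothetical, only an optional leading '-' and plain decimal digits with at most one '.' are accepted (no exponents, thousands separators, or culture-specific formats), and invalid input is reported with FormatException or OverflowException.

```csharp
using System;

// Hypothetical helper class; a sketch, not a replacement for long.Parse/double.Parse.
public static class ManualParse
{
    public static bool IsLong(string s)
    {
        try { StringToLong(s); return true; }
        catch (FormatException) { return false; }
        catch (OverflowException) { return false; }
    }

    public static long StringToLong(string s)
    {
        if (s == null) throw new FormatException("null input");
        s = s.Trim();
        int i = 0;
        bool negative = false;
        if (s.Length > 0 && s[0] == '-') { negative = true; i = 1; }
        if (i == s.Length) throw new FormatException("no digits");

        long result = 0;
        for (; i < s.Length; i++)
        {
            char c = s[i];
            if (c < '0' || c > '9') throw new FormatException("unexpected character");
            // Accumulate as a negative number so that long.MinValue fits;
            // checked() turns silent wrap-around into an OverflowException.
            result = checked(result * 10 - (c - '0'));
        }
        return negative ? result : checked(-result);
    }

    public static bool IsDouble(string s)
    {
        try { StringToDouble(s); return true; }
        catch (FormatException) { return false; }
    }

    public static double StringToDouble(string s)
    {
        if (s == null) throw new FormatException("null input");
        s = s.Trim();
        int i = 0;
        bool negative = false;
        if (s.Length > 0 && s[0] == '-') { negative = true; i = 1; }

        double mantissa = 0;   // all digits, ignoring the decimal point
        double divisor = 1;    // 10^(number of digits after the point)
        int digits = 0;
        bool seenPoint = false;
        for (; i < s.Length; i++)
        {
            char c = s[i];
            if (c == '.' && !seenPoint) { seenPoint = true; continue; }
            if (c < '0' || c > '9') throw new FormatException("unexpected character");
            mantissa = mantissa * 10 + (c - '0');
            if (seenPoint) divisor *= 10;
            digits++;
        }
        if (digits == 0) throw new FormatException("no digits");

        double value = mantissa / divisor;
        return negative ? -value : value;
    }
}
```

Using exceptions to implement IsLong and IsDouble keeps the sketch short; if these checks run in a hot path, a variant that returns a success flag instead of throwing would likely be preferable. Very long digit strings lose precision in StringToDouble, as any double would.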
I have tried the comparison for two integer values by using two types
Type 1 :
int val1 = 1;
int val2 = 2;
var returnValue = val1.CompareTo(val2); // -1: the first int is smaller
returnValue = val2.CompareTo(val1); // 1: the first int is larger
returnValue = val1.CompareTo(val1); // 0: the ints are equal
if (returnValue == 1)
{
//Success
}
else
{
//Failure
}
Type 2:
int val1 = 1;
int val2 = 2;
if (val1 < val2)
{
//return -1 //Failure
}
else if (val2 < val1)
{
//return 2 //Success
}
else
{
// return 0 // Same
}
What is the difference between these methods?
Which type is better as a coding standard?
Is there any difference in performance between the two?
When I take a peek at the internals of int's CompareTo() method (using ReSharper), I see this:
public int CompareTo(int value)
{
if (this < value)
return -1;
return this > value ? 1 : 0;
}
So it would appear, in the case of an int anyway, that the CompareTo() function is doing exactly what your second example does.
If we remove the ternary operator, it looks identical to your example:
public int CompareTo(int value)
{
if (this < value)
return -1;
if (this > value)
return 1;
return 0;
}
In my opinion, the CompareTo method is good in case you need to separate the logic that checks for equality and another logic that uses the result from the comparison. In your example, when you do your code like:
int val1 = 1;
int val2 = 2;
if (val1 < val2)
{
//return -1 //Failure
}
else if (val2 < val1)
{
//return 2 //Success
}
else
{
// return 0 // Same
}
You cannot pass the comparison result on to another function. Here is code taken from MSDN:
enum Comparison {
LessThan=-1, Equal=0, GreaterThan=1};
public class ValueComparison
{
public static void Main()
{
int mainValue = 16325;
int zeroValue = 0;
int negativeValue = -1934;
int positiveValue = 903624;
int sameValue = 16325;
Console.WriteLine("Comparing {0} and {1}: {2} ({3}).",
mainValue, zeroValue,
mainValue.CompareTo(zeroValue),
(Comparison) mainValue.CompareTo(zeroValue));
}
}
In this case, the comparison result is represented as an enum and can be passed between functions.
Another case is that you could even serialize the comparison result over the wire as a numeric value (-1, 0, 1), for example as the return value of an AJAX call.
There may not be much else to do with a numeric comparison like this, but as Patryk Ćwiek noted in his comment, CompareTo may be used via an interface, which can be implemented by other data types, including your own.
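To illustrate the interface point, here is a minimal sketch of a custom type that implements IComparable<T> and a generic helper that works through the interface. Temperature, CompareDemo and Max are hypothetical names used only for this example.

```csharp
using System;

// Hypothetical custom type that takes part in comparisons via IComparable<T>.
public sealed class Temperature : IComparable<Temperature>
{
    public double Degrees { get; }

    public Temperature(double degrees) { Degrees = degrees; }

    public int CompareTo(Temperature other)
    {
        // By convention, any instance sorts after null.
        if (other == null) return 1;
        // Delegate to double's own CompareTo.
        return Degrees.CompareTo(other.Degrees);
    }
}

public static class CompareDemo
{
    // Works for int, string, Temperature, or any other type
    // that implements IComparable<T>; '<' and '>' would not compile here.
    public static T Max<T>(T a, T b) where T : IComparable<T>
    {
        return a.CompareTo(b) >= 0 ? a : b;
    }
}
```

CompareDemo.Max(1, 2) returns 2, and the same method accepts two Temperature instances. Generic code like this cannot use the < and > operators on T, which is where CompareTo earns its place.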
blockProperty is a Dictionary<string, string[]>.
bool BlockMatch(IList<string> container, string block, int cut)
{
string[] blockArray = blockProperty[block];
int length = blockArray.Length - cut;
if (length > container.Count)
{
return false;
}
for (int i = 0; i < length; i++)
{
if (blockArray[length - 1 - i] != container[container.Count - 1 - i])
{
return false;
}
}
return true;
}
Columns are: inclusive Elapsed time, exclusive Elapsed time, inclusive percentage (of the whole program), exclusive percentage, number of calls.
How can I optimize the method according to the profiling breakdown? I find it strange that the exclusive elapsed time of the method (6042) is more than half of the inclusive one (10095).
To break this down any further, you may (for testing purposes) split up the function to small "one-line-subfunctions" so you can see the profiling broken down even more. Another help will be a double click on the profiling line to show you the individual function calls with their relative time.
Try:
if (!blockArray[length - 1 - i].Equals(container[container.Count - 1 - i])) { return false; }
It's possible that traversing the array in reverse order is doing this: from what I know, arrays are optimized for forward/sequential access. Furthermore, you might be preventing the JIT from doing bounds-check elimination with all that arithmetic. Try this:
for (int i = cut; i < blockArray.Length; i++)
{
if (!StringComparer.Ordinal.Equals(blockArray[i], container[i]))
return false;
}
However, in the end it depends on how many items you have in the array - if there are a lot there isn't much you can do (except use unsafe code, but that is only going to give you a tiny bit extra).
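Note that the loop above indexes blockArray and container with the same i, whereas the question's code compares the first blockArray.Length - cut elements of blockArray against the tail of container. A forward-iterating sketch that keeps the question's semantics might look like the following; BlockMatcher and TailMatches are hypothetical names, and whether this is actually faster would have to be measured with the profiler.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; same comparisons as the question's loop, in forward order.
public static class BlockMatcher
{
    public static bool TailMatches(IList<string> container, string[] blockArray, int cut)
    {
        int length = blockArray.Length - cut;
        if (length > container.Count)
        {
            return false;
        }

        // Index in container where the compared tail starts.
        int offset = container.Count - length;
        for (int i = 0; i < length; i++)
        {
            // Ordinal comparison matches what != does for strings.
            if (!string.Equals(blockArray[i], container[offset + i], StringComparison.Ordinal))
            {
                return false;
            }
        }
        return true;
    }
}
```

For example, with container { "a", "b", "c", "d" }, blockArray { "c", "d", "x" } and cut = 1, the first two block entries are compared with the last two container entries and the method returns true.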
Edit: you can improve the speed of negative cases using HashCodes; at the cost of memory.
class StringAndHash
{
public int HashCode;
public string Value;
public StringAndHash(string value)
{
if (value == null)
HashCode = 0;
else
HashCode = StringComparer.Ordinal.GetHashCode(value);
Value = value;
}
public static implicit operator string(StringAndHash value)
{
return value.Value;
}
public static implicit operator StringAndHash(string value)
{
return new StringAndHash(value);
}
public override int GetHashCode()
{
return HashCode;
}
public override bool Equals(object obj)
{
var sah = obj as StringAndHash;
if (!object.ReferenceEquals(sah, null))
{
return Equals(sah);
}
return base.Equals(obj);
}
public bool Equals(StringAndHash obj)
{
return obj.HashCode == HashCode // This will improve perf in negative cases.
&& StringComparer.Ordinal.Equals(obj.Value, Value);
}
}
public Dictionary<string, StringAndHash[]> blockProperty;
bool BlockMatch(IList<StringAndHash> container, string block, int cut)
{
var blockArray = blockProperty[block];
var length = blockArray.Length - cut;
if (length > container.Count)
{
return false;
}
for (int i = cut; i < blockArray.Length; i++)
{
if (!blockArray[i].Equals(container[i]))
{
return false;
}
}
return true;
}
I've written the following IComparer but I need some help. I'm trying to sort a list of numbers but some of the numbers may not have been filled in. I want these numbers to be sent to the end of the list at all times.. for example...
[EMPTY], 1, [EMPTY], 3, 2
would become...
1, 2, 3, [EMPTY], [EMPTY]
and reversed this would become...
3, 2, 1, [EMPTY], [EMPTY]
Any ideas?
public int Compare(ListViewItem x, ListViewItem y)
{
int comparison = int.MinValue;
ListViewItem.ListViewSubItem itemOne = x.SubItems[subItemIndex];
ListViewItem.ListViewSubItem itemTwo = y.SubItems[subItemIndex];
if (!string.IsNullOrEmpty(itemOne.Text) && !string.IsNullOrEmpty(itemTwo.Text))
{
uint itemOneComparison = uint.Parse(itemOne.Text);
uint itemTwoComparison = uint.Parse(itemTwo.Text);
comparison = itemOneComparison.CompareTo(itemTwoComparison);
}
else
{
// ALWAYS SEND TO BOTTOM/END OF LIST.
}
// Calculate correct return value based on object comparison.
if (OrderOfSort == SortOrder.Descending)
{
// Descending sort is selected, return negative result of compare operation.
comparison = (-comparison);
}
else if (OrderOfSort == SortOrder.None)
{
// Return '0' to indicate they are equal.
comparison = 0;
}
return comparison;
}
Cheers.
Your logic is slightly off: your else will be entered if either of them are empty, but you only want the empty one to go to the end of the list, not the non-empty one. Something like this should work:
public int Compare(ListViewItem x, ListViewItem y)
{
ListViewItem.ListViewSubItem itemOne = x.SubItems[subItemIndex];
ListViewItem.ListViewSubItem itemTwo = y.SubItems[subItemIndex];
// if they're both empty, return 0
if (string.IsNullOrEmpty(itemOne.Text) && string.IsNullOrEmpty(itemTwo.Text))
return 0;
// if itemOne is empty, it comes second
if (string.IsNullOrEmpty(itemOne.Text))
return 1;
// if itemTwo is empty, it comes second
if (string.IsNullOrEmpty(itemTwo.Text))
return -1;
uint itemOneComparison = uint.Parse(itemOne.Text);
uint itemTwoComparison = uint.Parse(itemTwo.Text);
// Calculate correct return value based on object comparison.
int comparison = itemOneComparison.CompareTo(itemTwoComparison);
if (OrderOfSort == SortOrder.Descending)
comparison = (-comparison);
return comparison;
}
(I might've got the "1" and "-1" for when they're empty back to front, I can never remember :)
I'd actually approach this a completely different way: remove the empty slots, sort the list, then add the empty ones back at the end of the list.
static void Main(string[] args)
{
List<string> ints = new List<string> { "3", "1", "", "5", "", "2" };
CustomIntSort(ints, (x, y) => int.Parse(x) - int.Parse(y)); // Ascending
ints.ForEach(i => Console.WriteLine("[{0}]", i));
CustomIntSort(ints, (x, y) => int.Parse(y) - int.Parse(x)); // Descending
ints.ForEach(i => Console.WriteLine("[{0}]", i));
}
private static void CustomIntSort(List<string> ints, Comparison<string> Comparer)
{
int emptySlots = CountAndRemove(ints);
ints.Sort(Comparer);
for (int i = 0; i < emptySlots; i++)
ints.Add("");
}
private static int CountAndRemove(List<string> ints)
{
int emptySlots = 0;
int i = 0;
while (i < ints.Count)
{
if (string.IsNullOrEmpty(ints[i]))
{
emptySlots++;
ints.RemoveAt(i);
}
else
i++;
}
return emptySlots;
}
This question caught my attention recently; this comparer will do it too:
class CustomComparer
: IComparer<string>
{
private bool isAscending;
public CustomComparer(bool isAscending = true)
{
this.isAscending = isAscending;
}
public int Compare(string x, string y)
{
long ix = CustomParser(x) * (isAscending ? 1 : -1);
long iy = CustomParser(y) * (isAscending ? 1 : -1);
return ix.CompareTo(iy) ;
}
private long CustomParser(string s)
{
if (string.IsNullOrEmpty(s))
return isAscending ? int.MaxValue : int.MinValue;
else
return int.Parse(s);
}
}
Your // ALWAYS SEND TO BOTTOM/END OF LIST. branch is being executed when either the x or y parameters are empty, i.e. a non-empty value will be sorted according to this rule if it is being compared to an empty value. You probably want something more like this:
if (!string.IsNullOrEmpty(itemOne.Text) && !string.IsNullOrEmpty(itemTwo.Text))
{
uint itemOneComparison = uint.Parse(itemOne.Text);
uint itemTwoComparison = uint.Parse(itemTwo.Text);
comparison = itemOneComparison.CompareTo(itemTwoComparison);
}
else if (!string.IsNullOrEmpty(itemOne.Text))
{
comparison = -1;
}
else
{
comparison = 1;
}
Always return 1 for your empty x values and -1 for your empty y values. This will mean that the comparer sees empty values as the greater value in all cases so they should end up at the end of the sorted list.
Of course, if both are empty, you should return 0 as they are equal.
else
{
//ALWAYS SEND TO BOTTOM/END OF LIST.
if (string.IsNullOrEmpty(itemOne.Text) && string.IsNullOrEmpty(itemTwo.Text))
{
return 0;
}
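else if (string.IsNullOrEmpty(itemOne.Text))
{
return 1; // empty x is treated as greater, so it sorts last
}
else if (string.IsNullOrEmpty(itemTwo.Text))
{
return -1; // empty y is treated as greater, so x sorts first
}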
}
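The rule described in these answers can be sketched on plain strings, independent of ListView. EmptyLastComparer is a hypothetical name, and the sketch assumes every non-empty entry parses as a uint, as in the question. The empty checks return before the descending flip, so empties stay at the end in both directions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: numeric order, with empty entries always last.
public sealed class EmptyLastComparer : IComparer<string>
{
    private readonly bool descending;

    public EmptyLastComparer(bool descending = false)
    {
        this.descending = descending;
    }

    public int Compare(string x, string y)
    {
        bool xEmpty = string.IsNullOrEmpty(x);
        bool yEmpty = string.IsNullOrEmpty(y);

        if (xEmpty && yEmpty) return 0;   // two empties are equal
        if (xEmpty) return 1;             // empty x sorts after y
        if (yEmpty) return -1;            // x sorts before empty y

        // Only non-empty values are affected by the sort direction.
        int result = uint.Parse(x).CompareTo(uint.Parse(y));
        return descending ? -result : result;
    }
}
```

Sorting the list "", "1", "", "3", "2" with new EmptyLastComparer() gives 1, 2, 3 followed by the two empties; with new EmptyLastComparer(descending: true) it gives 3, 2, 1 followed by the two empties.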
Is there any simple algorithm to determine the likeliness of 2 names representing the same person?
I'm not asking for something of the level that a customs department might be using. Just a simple algorithm that would tell me if 'James T. Clark' is most likely the same name as 'J. Thomas Clark' or 'James Clerk'.
If there is an algorithm in C# that would be great, but I can translate from any language.
Sounds like you're looking for a phonetic algorithm, such as Soundex, NYSIIS, or Double Metaphone. The first is actually what several government departments use, and is trivial to implement (with many implementations readily available). The second is a slightly more complicated and more precise version of the first. The last works with some non-English names and alphabets.
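As an illustration of how small Soundex is, here is a minimal sketch of the common American Soundex rules: keep the first letter, map the remaining consonants to digits, collapse adjacent equal codes (H and W do not break a run, vowels do), and pad or cut to four characters. The class and method names are hypothetical, and the sketch assumes plain A-Z input; other characters are simply ignored.

```csharp
using System;
using System.Text;

// Hypothetical helper implementing the common American Soundex rules.
public static class Soundex
{
    public static string Encode(string name)
    {
        if (name == null) return "";
        var sb = new StringBuilder();
        char lastCode = '0';

        foreach (char raw in name)
        {
            char c = char.ToUpperInvariant(raw);
            if (c < 'A' || c > 'Z') continue;      // skip spaces, dots, digits
            char code = CodeFor(c);

            if (sb.Length == 0)
            {
                sb.Append(c);                      // first letter is kept as-is
                lastCode = code;
            }
            else if (c == 'H' || c == 'W')
            {
                continue;                          // skipped; does not break a run
            }
            else if (code == '0')
            {
                lastCode = '0';                    // vowel: a repeated code counts again
            }
            else if (code != lastCode)
            {
                sb.Append(code);
                lastCode = code;
            }

            if (sb.Length == 4) break;
        }

        return sb.Length == 0 ? "" : sb.ToString().PadRight(4, '0');
    }

    private static char CodeFor(char c)
    {
        switch (c)
        {
            case 'B': case 'F': case 'P': case 'V': return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z': return '2';
            case 'D': case 'T': return '3';
            case 'L': return '4';
            case 'M': case 'N': return '5';
            case 'R': return '6';
            default: return '0';                   // A, E, I, O, U, Y, H, W
        }
    }
}
```

Soundex.Encode("Clark") and Soundex.Encode("Clerk") both return "C462", so the two surnames from the question compare as equal. Encoding is usually applied per name part (for example to the surname only), so initials and middle names still need separate handling.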
Levenshtein distance is a definition of distance between two arbitrary strings. It gives you a distance of 0 between identical strings and non-zero between different strings, which might also be useful if you decide to make a custom algorithm.
Levenshtein is close, although maybe not exactly what you want.
I faced a similar problem and tried Levenshtein distance first, but it did not work well for me. I came up with an algorithm that gives you a "similarity" value between two strings (a higher value means more similar strings, "1" for identical strings). This value is not very meaningful by itself (if not "1", it is always 0.5 or less), but it works quite well when you throw in the Hungarian algorithm to find matching pairs from two lists of strings.
Use like this:
PartialStringComparer cmp = new PartialStringComparer();
tbResult.Text = cmp.Compare(textBox1.Text, textBox2.Text).ToString();
The code behind:
public class SubstringRange {
string masterString;
public string MasterString {
get { return masterString; }
set { masterString = value; }
}
int start;
public int Start {
get { return start; }
set { start = value; }
}
int end;
public int End {
get { return end; }
set { end = value; }
}
public int Length {
get { return End - Start; }
set { End = Start + value;}
}
public bool IsValid {
get { return MasterString.Length >= End && End >= Start && Start >= 0; }
}
public string Contents {
get {
if(IsValid) {
return MasterString.Substring(Start, Length);
} else {
return "";
}
}
}
public bool OverlapsRange(SubstringRange range) {
return !(End < range.Start || Start > range.End);
}
public bool ContainsRange(SubstringRange range) {
return range.Start >= Start && range.End <= End;
}
public bool ExpandTo(string newContents) {
if(MasterString.Substring(Start).StartsWith(newContents, StringComparison.InvariantCultureIgnoreCase) && newContents.Length > Length) {
Length = newContents.Length;
return true;
} else {
return false;
}
}
}
public class SubstringRangeList: List<SubstringRange> {
string masterString;
public string MasterString {
get { return masterString; }
set { masterString = value; }
}
public SubstringRangeList(string masterString) {
this.MasterString = masterString;
}
public SubstringRange FindString(string s){
foreach(SubstringRange r in this){
if(r.Contents.Equals(s, StringComparison.InvariantCultureIgnoreCase))
return r;
}
return null;
}
public SubstringRange FindSubstring(string s){
foreach(SubstringRange r in this){
if(r.Contents.StartsWith(s, StringComparison.InvariantCultureIgnoreCase))
return r;
}
return null;
}
public bool ContainsRange(SubstringRange range) {
foreach(SubstringRange r in this) {
if(r.ContainsRange(range))
return true;
}
return false;
}
public bool AddSubstring(string substring) {
bool result = false;
foreach(SubstringRange r in this) {
if(r.ExpandTo(substring)) {
result = true;
}
}
if(FindSubstring(substring) == null) {
bool patternfound = true;
int start = 0;
while(patternfound){
patternfound = false;
start = MasterString.IndexOf(substring, start, StringComparison.InvariantCultureIgnoreCase);
patternfound = start != -1;
if(patternfound) {
SubstringRange r = new SubstringRange();
r.MasterString = this.MasterString;
r.Start = start++;
r.Length = substring.Length;
if(!ContainsRange(r)) {
this.Add(r);
result = true;
}
}
}
}
return result;
}
private static bool SubstringRangeMoreThanOneChar(SubstringRange range) {
return range.Length > 1;
}
public float Weight {
get {
if(MasterString.Length == 0 || Count == 0)
return 0;
float numerator = 0;
int denominator = 0;
foreach(SubstringRange r in this.FindAll(SubstringRangeMoreThanOneChar)) {
numerator += r.Length;
denominator++;
}
if(denominator == 0)
return 0;
return numerator / denominator / MasterString.Length;
}
}
public void RemoveOverlappingRanges() {
SubstringRangeList l = new SubstringRangeList(this.MasterString);
l.AddRange(this);//create a copy of this list
foreach(SubstringRange r in l) {
if(this.Contains(r) && this.ContainsRange(r)) {
Remove(r);//try to remove the range
if(!ContainsRange(r)) {//see if the list still contains "superset" of this range
Add(r);//if not, add it back
}
}
}
}
public void AddStringToCompare(string s) {
for(int start = 0; start < s.Length; start++) {
for(int len = 1; start + len <= s.Length; len++) {
string part = s.Substring(start, len);
if(!AddSubstring(part))
break;
}
}
RemoveOverlappingRanges();
}
}
public class PartialStringComparer {
public float Compare(string s1, string s2) {
SubstringRangeList srl1 = new SubstringRangeList(s1);
srl1.AddStringToCompare(s2);
SubstringRangeList srl2 = new SubstringRangeList(s2);
srl2.AddStringToCompare(s1);
return (srl1.Weight + srl2.Weight) / 2;
}
}
The Levenshtein distance one is much simpler (adapted from http://www.merriampark.com/ld.htm):
public class Distance {
/// <summary>
/// Compute Levenshtein distance
/// </summary>
/// <param name="s">String 1</param>
/// <param name="t">String 2</param>
/// <returns>Distance between the two strings.
/// The larger the number, the bigger the difference.
/// </returns>
public static int LD(string s, string t) {
int n = s.Length; //length of s
int m = t.Length; //length of t
int[,] d = new int[n + 1, m + 1]; // matrix
int cost; // cost
// Step 1
if(n == 0) return m;
if(m == 0) return n;
// Step 2
for(int i = 0; i <= n; d[i, 0] = i++) ;
for(int j = 0; j <= m; d[0, j] = j++) ;
// Step 3
for(int i = 1; i <= n; i++) {
//Step 4
for(int j = 1; j <= m; j++) {
// Step 5
cost = (t.Substring(j - 1, 1) == s.Substring(i - 1, 1) ? 0 : 1);
// Step 6
d[i, j] = System.Math.Min(System.Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost);
}
}
// Step 7
return d[n, m];
}
}
I doubt there is, considering even the Customs Department doesn't seem to have a satisfactory answer...
If there is a solution to this problem I seriously doubt it's a part of core C#. Off the top of my head, it would require a database of first, middle and last name frequencies, as well as account for initials, as in your example. This is fairly complex logic that relies on a database of information.
Seconding Levenshtein distance. What language do you want? I was able to find an implementation in C# on CodeProject pretty easily.
In an application I worked on, the Last name field was considered reliable.
So we presented all the records with the same last name to the user.
User could sort by the other fields to look for similar names.
This solution was good enough to greatly reduce the issue of users creating duplicate records.
Basically looks like the issue will require human judgement.