System of the Unicode Box Drawing table


I'm implementing a C# function that, given which kind of line enters from each side, returns the matching character from the Unicode Box Drawing block (0x2500-0x257F). However, I have so far failed to find a system in how these characters are positioned in the table, one that would allow a significantly simpler function than mapping every possible input to an output in one enormous if-then-else block.
I've noted that there are 9 different line styles (thin, double, thick, double-dashed, triple-dashed, quad-triple-dashed, thick double-dashed, ...) in that table. With the four directions, and adding the "no line" state, that gives 10 states per side and so 10^4 = 10000 combinations, or 9999 excluding the "no side has a line" case, which in my case would be a space character.
The easiest way I've found to implement this is to make one freakin' huge array containing all 10000 possible outcomes (where the first digit denotes North, the second East, then South and West), but I believe this is the second worst approach I've found, and that there is a much more elegant solution. (BTW, this would be hilarious if you're not the one planning to implement it this way. That is how I feel about it, anyway.)
This question is probably not suitable here, but considering the size of this task, I'll take that risk:
Is there a system to how the Box Drawing table arranges its characters, and/or is there a simpler algorithm that does exactly what I want?

The simplest/shortest solution I see needs an array/list of 128 elements.
You declare a struct/class like this:
// I use consts instead of enum to shorten the code below
const int thin = 1;
const int dbl = 2; // "double" is a reserved keyword in C#, so it cannot be used as a name
const int thick = 3;
... // other line styles
struct BoxDrawingChar{
int UpLine, DownLine, LeftLine, RightLine;
BoxDrawingChar(int UpLine, int DownLine, int LeftLine, int RightLine)
{ ... }
};
Then you describe appearance of each character:
BoxDrawingChar[] BoxDrawingCharList =
{
new BoxDrawingChar(0, 0, thin, thin), // 0x2500
new BoxDrawingChar(0, 0, thick, thick), // 0x2501
...
new BoxDrawingChar(...), // 0x257F
};
Then your function will be quite simple:
int GetCharCode(int UpLine, int DownLine, int LeftLine, int RightLine)
{
for(int i = 0; i < BoxDrawingCharList.Length; ++i){
BoxDrawingChar ch = BoxDrawingCharList[i];
if (ch.UpLine == UpLine && ch.DownLine == DownLine && ...)
return i + 0x2500;
}
return 0;
}
Of course you can add diagonal lines, rounded angles etc and refactor the code in many ways. I gave only a general idea.
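As a variation on the same idea, here is a minimal sketch that replaces the linear search with a dictionary lookup. It is only an illustration: the names BoxChars, Key and Get are made up for this example, the side order (up, right, down, left) is an arbitrary choice, and only the eleven light-line characters are filled in. The remaining characters of the block would have to be added the same way.

```csharp
using System;
using System.Collections.Generic;

static class BoxChars
{
    public const int None = 0, Thin = 1; // further styles (up to 3) would fit in 2 bits

    // Pack the four per-side styles into one integer key, 2 bits per side.
    static int Key(int up, int right, int down, int left)
    {
        return up | (right << 2) | (down << 4) | (left << 6);
    }

    // Hypothetical partial table: only the light-line characters are listed.
    static readonly Dictionary<int, char> Table = new Dictionary<int, char>
    {
        { Key(None, Thin, None, Thin), '\u2500' }, // ─
        { Key(Thin, None, Thin, None), '\u2502' }, // │
        { Key(None, Thin, Thin, None), '\u250C' }, // ┌
        { Key(None, None, Thin, Thin), '\u2510' }, // ┐
        { Key(Thin, Thin, None, None), '\u2514' }, // └
        { Key(Thin, None, None, Thin), '\u2518' }, // ┘
        { Key(Thin, Thin, Thin, None), '\u251C' }, // ├
        { Key(Thin, None, Thin, Thin), '\u2524' }, // ┤
        { Key(None, Thin, Thin, Thin), '\u252C' }, // ┬
        { Key(Thin, Thin, None, Thin), '\u2534' }, // ┴
        { Key(Thin, Thin, Thin, Thin), '\u253C' }, // ┼
    };

    // Returns the matching character, or a space when no entry exists.
    public static char Get(int up, int right, int down, int left)
    {
        char ch;
        return Table.TryGetValue(Key(up, right, down, left), out ch) ? ch : ' ';
    }
}
```

With this, BoxChars.Get(0, 1, 1, 0) looks up the "down and right" corner in one step, and any combination that is not in the table falls back to a space.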

Related

Unity formatting multiple numbers

So I'm a complete newb to Unity and C# and I'm trying to make my first mobile incremental game. I know how to format a variable from (e.g.) 1000 >>> 1k; however, I have several variables that can go up to decillion+, so I imagine having to check every variable's value separately up to decillion+ would be quite inefficient. Being a newb, I'm not sure how to go about it. Maybe a for loop or something?
EDIT: I'm checking if x is greater than a certain value. For example if it's greater than 1,000, display 1k. If it's greater than 1,000,000, display 1m...etc etc
This is my current code for checking if x is greater than 1000 however I don't think copy pasting this against other values would be very efficient;
if (totalCash > 1000)
{
totalCashk = totalCash / 1000;
totalCashTxt.text = "$" + totalCashk.ToString("F1") + "k";
}

So, I agree that copying code is not efficient. That's why people invented functions!
How about simply wrapping your formatting in a function, e.g. one named prettyCurrency?
Then you can simply write:
totalCashTxt.text = prettyCurrency(totalCash);
Also, instead of writing a ton of ifs, you can use the base-10 logarithm to determine the number of digits, and from that the suffix. Example in pure C# below:
using System.IO;
using System;
class Program
{
// Very simple example; throws IndexOutOfRangeException for values of 10^12 or more (no suffix defined)
static readonly string[] suffixes = {"", "k", "M", "G"};
static string prettyCurrency(long cash, string prefix="$")
{
int k;
if(cash == 0)
k = 0; // log10 of 0 is not valid
else
k = (int)(Math.Log10(cash) / 3); // get number of digits and divide by 3
var dividor = Math.Pow(10,k*3); // 1000^k, the value we divide by before printing
var text = prefix + (cash/dividor).ToString("F1") + suffixes[k];
return text;
}
static void Main()
{
Console.WriteLine(prettyCurrency(0));
Console.WriteLine(prettyCurrency(333));
Console.WriteLine(prettyCurrency(3145));
Console.WriteLine(prettyCurrency(314512455));
Console.WriteLine(prettyCurrency(31451242545));
}
}
OUTPUT:
$0.0
$333.0
$3.1k
$314.5M
$31.5G
Also, you might think about introducing a new type which implements this function as its ToString() override.
EDIT:
I forgot about 0 as input; that is fixed now. And indeed, as @Draco18s said in his comment, neither int nor long will handle really big numbers, so you can either use a big-number type like BigInteger (System.Numerics) or switch to double, which loses precision as the numbers get bigger and bigger (e.g. 1000000000000000.0 + 1 might be equal to 1000000000000000.0). If you choose the latter, you should change my function to handle numbers in the range (0.0,1.0), for which log10 is negative.
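A minimal sketch of the double-based variant mentioned above, under a few assumptions of mine: the class name CashFormat is made up, the suffix list is the same short one as in the example, and values beyond the last suffix simply reuse it instead of throwing. The logarithm is only taken for values of 1000 or more, which keeps 0, negative numbers and the range (0.0,1.0) away from Log10.

```csharp
using System;
using System.Globalization;

static class CashFormat
{
    // Same suffixes as the example above; extend the array for larger magnitudes.
    static readonly string[] Suffixes = { "", "k", "M", "G" };

    public static string PrettyCurrency(double cash, string prefix = "$")
    {
        int k = 0;
        // Only take the logarithm when a suffix can apply at all.
        if (cash >= 1000.0)
        {
            k = (int)(Math.Log10(cash) / 3);
            // Clamp so values beyond the last suffix reuse it instead of throwing.
            k = Math.Min(k, Suffixes.Length - 1);
        }
        double scaled = cash / Math.Pow(1000.0, k);
        // InvariantCulture keeps the decimal separator a '.' on every machine.
        return prefix + scaled.ToString("F1", CultureInfo.InvariantCulture) + Suffixes[k];
    }
}
```

Clamping is just one possible policy; for a game that goes up to decillion you would more likely extend the suffix array than rely on the clamp.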

How to generalize my algorithm to detect if one string is a rotation of another

So I've been going through various problems to review for upcoming interviews and one I encountered is determining whether two strings are rotations of each other. Obviously, I'm hardly the first person to solve this problem. In fact, I did discover that my idea for solving this seems similar to the approach taken in this question.
Full disclosure: I do have a related question on Math SE that focuses on the properties from a more mathematical perspective (although it's worth noting that the way I tried to formulate the ideas there ended up being incorrect, for reasons that are explained there).
Here's the idea (and this is similar to the approach taken in the linked question): suppose you have a string abcd and the rotation cdab. Clearly, both cd and ab are substrings of cdab, but if you concatenate them together you get abcd.
So basically, a rotation simply entails moving a substring from the end of the string to the beginning (e.g. we constructed cdab from abcd by moving cd from the end of the string to the beginning of the string).
I came up with an approach that works in a very restricted case (if both of the substrings consist of consecutive letters, like they do in the example there), but it fails otherwise (and I give an example of passing and failing cases and inputs/outputs below the code). I'm trying to figure out if it's possible (or even worthwhile) to try to fix it to work in the general case.
public bool AreRotations(string a, string b)
{
if (a == null)
throw new ArgumentNullException("a");
else if (b == null)
throw new ArgumentNullException("b");
else if (a.Trim().Length == 0)
throw new ArgumentException("a is empty or consists only of whitespace");
else if (b.Trim().Length == 0)
throw new ArgumentException("b is empty or consists only of whitespace");
// Obviously, if the strings are of different lengths, they can't possibly be rotations of each other
if (a.Length != b.Length)
return false;
int[] rotationLengths = new int[a.Length];
/* For rotations of length -2, -2, -2, 2, 2, 2, the distinct rotation lengths are -2, 2
*
* In the example I give below of a non-working input, this contains -16, -23, 16, 23
*
* On the face of it, that would seem like a useful pattern, but it seems like this
* could quickly get out of hand as I discover more edge cases
*/
List<int> distinctRotationLengths = new List<int>();
for (int i = 0; i < a.Length; i++)
{
rotationLengths[i] = a[i] - b[i];
if (i == 0)
distinctRotationLengths.Add(rotationLengths[0]);
else if (rotationLengths[i] != rotationLengths[i - 1])
{
distinctRotationLengths.Add(rotationLengths[i]);
}
}
return distinctRotationLengths.Count == 2;
}
And now for the sample inputs/outputs:
StringIsRotation rot = new StringIsRotation();
// This is the case that doesn't work right - it gives "false" instead of "true"
bool success = rot.AreRotations("acqz", "qzac");
// True
success = rot.AreRotations("abcdef", "cdefab");
// True
success = rot.AreRotations("ablm", "lmab");
// False, but should be true - this is another illustration of the bug
success = rot.AreRotations("baby", "byba");
// True
success = rot.AreRotations("abcdef", "defabc");
//True
success = rot.AreRotations("abcd", "cdab");
// True
success = rot.AreRotations("abc", "cab");
// False
success = rot.AreRotations("abcd", "acbd");
// This is an odd situation - right now it returns "false" but you could
// argue about whether that's correct
success = rot.AreRotations("abcd", "abcd");
Is it possible/worthwhile to salvage this approach and have it still be O(n), or should I just go with one of the approaches described in the post I linked to? (Note that this isn't actually production code or homework, it's purely for my own learning).
Edit: For further clarification based on the comments, there are actually two questions here - first, is this algorithm fixable? Secondly, is it even worth fixing it (or should I just try another approach like one described in the answers or the other question I linked to)? I thought of a few potential fixes but they all involved either inelegant special-case reasoning or making this algorithm O(n^2), both of which would kill the point of the algorithm in the first place.

Suppose the first string is S and the second is S'. Clearly, if they have different lengths, we output that they are not rotations of each other. Otherwise, create the string S'' = SS, i.e. the concatenation of S with itself. If S and S' are rotations of each other, then S' occurs as a substring of S'', which the KMP algorithm can check in O(n); if it does not occur, we output that they are not rotations of each other. BTW, if you are looking for a fast practical algorithm, use the Boyer-Moore algorithm instead of KMP.
To address the question more explicitly: I don't expect there to be an easy algorithm for this special case of the string matching problem. With that background in mind, I don't think an easy modification of your algorithm can work. The field of string matching algorithms is very well developed, so even if there is a somewhat simpler algorithm than something like KMP or suffix-tree-based algorithms for this special case, I still think studying those general algorithms will help.
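For illustration, here is a rough sketch of the S'' = SS idea with a hand-written KMP search. The names Rotation, IsRotation and BuildFailure are mine, and instead of actually building SS the loop reads S cyclically, which amounts to the same thing. Treating two empty strings as rotations of each other is an arbitrary choice of this sketch.

```csharp
using System;

static class Rotation
{
    // True when t is a rotation of s; runs in O(n) time with O(n) extra space.
    public static bool IsRotation(string s, string t)
    {
        if (s == null || t == null || s.Length != t.Length) return false;
        int n = s.Length;
        if (n == 0) return true; // assumption: two empty strings count as rotations

        int[] fail = BuildFailure(t);
        int matched = 0; // number of characters of t matched so far
        for (int i = 0; i < 2 * n; i++)
        {
            char c = s[i % n]; // character i of the virtual string s + s
            while (matched > 0 && c != t[matched]) matched = fail[matched - 1];
            if (c == t[matched]) matched++;
            if (matched == n) return true; // t occurs inside s + s
        }
        return false;
    }

    // KMP failure table: fail[i] is the length of the longest proper prefix
    // of p[0..i] that is also a suffix of it.
    static int[] BuildFailure(string p)
    {
        var fail = new int[p.Length];
        int k = 0;
        for (int i = 1; i < p.Length; i++)
        {
            while (k > 0 && p[i] != p[k]) k = fail[k - 1];
            if (p[i] == p[k]) k++;
            fail[i] = k;
        }
        return fail;
    }
}
```

This version reports the two failing inputs from the question ("acqz"/"qzac" and "baby"/"byba") as rotations, and it treats "abcd"/"abcd" as a full rotation.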

Would something like this work?:
private bool IsRotation(string a, string b)
{
if (a.Length != b.Length) { return false; }
for (int i = 0; i < b.Length; ++i)
{
if (GetCharactersLooped(b, i).SequenceEqual(a))
{
return true;
}
}
return false;
}
private IEnumerable<char> GetCharactersLooped(string data, int startPos)
{
for (int i = startPos; i < data.Length; ++i)
{
yield return data[i];
}
for (int i = 0; i < startPos; ++i)
{
yield return data[i];
}
}
P.S. This will return true for abcd = abcd, since you could consider it a full rotation. If this is not desired, change the start of the loop from 0 to 1 in the first function.

If you're looking just for a method that will check if a string is a rotation of another string, this is a C# implementation of the function you linked (which as far as I know is about the fastest way to solve this particular problem):
bool IsRotation(string a, string b)
{
if (a == null || b == null || a.Length != b.Length)
return false;
return (a + a).Contains(b);
}
If you're asking for feedback on your algorithm, I'm not sure I understand what your algorithm is trying to do. It seems like you are trying to detect a rotation by storing the difference of the char values in the string and seeing if they sum to 0? Or if the list of unique differences contains mirror pairs (pairs (x,y) where x = -y)? Or simply if the number of unique differences is even? Or something else entirely that I am missing from your description?
I'm not sure if what you're doing can be generalized, simply because it depends so heavily on the characters within the words that it may not adequately check for the order in which they are presented. And even if you could, it would be a scholarly exercise only, as the above method will be far faster and more efficient than your method could ever be.

Loop through every possible combination of values in a BitArray

I'm trying to solve a larger problem. As part of this, I have created a BitArray to represent a series of binary decisions taken sequentially. I know that all valid decision series will have half of all decisions true, and half of all false, but I don't know the order:
ttttffff
[||||||||]
Or:
tftftftf
[||||||||]
Or:
ttffttff
[||||||||]
Or any other combination where half of all bits are true, and half false.
My BitArray is quite a bit longer than this, and I need to move through each set of possible decisions (each possible combination of half true, half false), making further checks on their validity. I'm struggling to conceptually work out how to do this with a loop, however. It seems like it should be simple, but my brain is failing me.
EDIT: Because the BitArray wasn't massive, I used usr's suggestion and implemented a bitshift loop. Based on some of the comments and answers, I re-googled the problem with the key-word "permutations" and found this Stack Overflow question which is very similar.

I'd do this using a recursive algorithm. Each level sets the next bit. You keep track of how many zeroes and ones have been decided already. If one of those counters goes above N / 2 you abort the branch and backtrack. This should give quite good performance because it will tend to cut off infeasible branches quickly. For example, after setting tttt only f choices are viable.
A simpler, less well-performing version would be to loop through all possible N-bit integers with a for loop and discard the ones that do not fulfill the condition. This is easy to implement for up to 63 bits: loop a ulong from 0 up to (but not including) 1UL << N. (Write the constant as 1UL; in C# a plain int 1 << 63 only uses the low 5 bits of the shift count and is not 2^63.) Clearly, with high bit counts this is too slow.
You are looking for all permutations of N / 2 zeroes and N / 2 ones. There are algorithms for generating those. If you can find one implemented this should give the best possible performance. I believe those algorithms use clever math tricks to only visit viable combinations.
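A minimal sketch of the recursive approach from the first paragraph, using a plain bool[] rather than a BitArray to keep it short. The names BalancedBits, Generate and Walk are made up for this example. Each level decides one position and only descends into a branch while that value still has budget left, so infeasible branches are never entered.

```csharp
using System;
using System.Collections.Generic;

static class BalancedBits
{
    // Lazily yields every sequence of n decisions with exactly n/2 true values.
    public static IEnumerable<bool[]> Generate(int n)
    {
        if (n < 0 || n % 2 != 0)
            throw new ArgumentException("n must be a non-negative even number", "n");
        return Walk(new bool[n], 0, n / 2, n / 2);
    }

    static IEnumerable<bool[]> Walk(bool[] current, int pos, int truesLeft, int falsesLeft)
    {
        if (pos == current.Length)
        {
            // Hand out a copy, because 'current' is reused while backtracking.
            yield return (bool[])current.Clone();
            yield break;
        }
        if (truesLeft > 0)
        {
            current[pos] = true;
            foreach (var r in Walk(current, pos + 1, truesLeft - 1, falsesLeft))
                yield return r;
        }
        if (falsesLeft > 0)
        {
            current[pos] = false;
            foreach (var r in Walk(current, pos + 1, truesLeft, falsesLeft - 1))
                yield return r;
        }
    }
}
```

For 8 decisions this yields C(8,4) = 70 sequences. Because the sequences are produced lazily, the validity checks can run inside a foreach without storing them all.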

If you're OK with using the bits in an integer instead of a BitArray, this is a general solution to generate all patterns with some constant number of bits set.
Start with the lowest valid value, which is with all the ones at the right side of the number, which you can calculate as low = ~(-1 << k) (doesn't work for k=32, but that's not an issue in this case).
Then take Gosper's Hack (also shown in this answer), which is a way to generate the next highest integer with equally many bits set, and keep applying it until you reach the highest valid value, low << k in this case.
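Put together, the two steps could look roughly like this. It is a sketch restricted to at most 31 bits so everything fits in a uint; the names FixedPopCount, Patterns and Iterate are mine. The starting value is written as (1u << k) - 1, which is the same number as ~(-1 << k).

```csharp
using System;
using System.Collections.Generic;

static class FixedPopCount
{
    // Yields every n-bit value with exactly k bits set, in increasing order.
    public static IEnumerable<uint> Patterns(int n, int k)
    {
        if (n < 1 || n > 31 || k < 1 || k > n)
            throw new ArgumentOutOfRangeException("k", "this sketch requires 1 <= k <= n <= 31");
        return Iterate(n, k);
    }

    static IEnumerable<uint> Iterate(int n, int k)
    {
        uint limit = 1u << n;       // first value that no longer fits in n bits
        uint v = (1u << k) - 1;     // lowest pattern: k ones at the right
        while (v < limit)
        {
            yield return v;
            // Gosper's hack: next larger value with the same number of set bits.
            uint c = v & (~v + 1);            // lowest set bit of v
            uint r = v + c;                   // carry the lowest run of ones upward
            v = (((r ^ v) >> 2) / c) | r;     // put the leftover ones back at the bottom
        }
    }
}
```

For the half-true, half-false case you would call Patterns(n, n / 2); the last value produced is then low << k, as described above.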

This will result in duplicates, but you could check for duplicates before adding to the List if you want to.
static void Main(string[] args)
{
// Set your bits here:
bool[] bits = { true, true, false };
BitArray original_bits = new BitArray(bits);
permuteBits(original_bits, 0, original_bits.Length - 1);
foreach (BitArray ba in permutations)
{
// You can check Validity Here
foreach (bool i in ba)
{
Console.Write(Convert.ToInt32(i));
}
Console.WriteLine();
}
}
static List<BitArray> permutations = new List<BitArray>();
static void permuteBits(BitArray bits, int minIndex, int maxIndex)
{
int current_index;
if (minIndex == maxIndex)
{
permutations.Add(new BitArray(bits));
}
else
{
for (current_index = minIndex; current_index <= maxIndex; current_index++)
{
swap(bits, minIndex, current_index);
permuteBits(bits, minIndex + 1, maxIndex);
swap(bits, minIndex, current_index);
}
}
}
private static void swap(BitArray bits, int i, int j)
{
bool temp = bits[i];
bits[i] = bits[j];
bits[j] = temp;
}

If you want to get the concept of finding all permutations of a string with duplicate entries (i.e. zeros and ones) clear, you can read this article.
It uses a recursive solution to solve this problem, and the explanation is also good.

A more elegant way to write decision making code which evaluates multiple inputs with different priorities?

I'm writing some decision-making AI for a game, and I've come up with the following piece of code.
if(pushedLeft && leftFree && leftExists)
GoLeft();
else if(pushedRight && rightFree && rightExists)
GoRight();
else if(leftFree && leftExists)
GoLeft();
else if(rightFree && rightExists)
GoRight();
else if(pushedLeft && leftExists)
GoLeft();
else if(pushedRight && rightExists)
GoRight();
else if(leftExists)
GoLeft();
else if(rightExists)
GoRight();
// else do nothing...
That's a pretty long stream of if statements, with similar conditionals!
Note that it makes this nice pattern:
L1 L2 L3 -> L
R1 R2 R3 -> R
L2 L3 -> L
R2 R3 -> R
L1 L3 -> L
R1 R3 -> R
L3 -> L
R3 -> R
(nothing) -> 0
The aim of this code is to decide whether the object should move left or right (or not at all), based on some incoming state information. Each piece of information has a different priority. I could write it in an ordered list like this:
Highest Priority
----------------
Don't ever move into an invalid space
Prefer to move into an unoccupied space
Prefer to move in the push direction
Prefer to move left
----------------
Lowest Priority
It seems obvious that adding additional information inputs upon which to make this decision will double the number of conditionals. And doubling the number of potential values for those inputs (eg: allowing up/down/left/right) will double the number of conditionals as well. (So this is n×m² conditionals, right?)
So my question is:
Is there a nice, satisfying, elegant way to code this?
I'm thinking that there must be a nice "n×m" way to do it (edit: I had "n+m" here originally, but that seems impossible as there are n×m input conditions). Something that is applicable to both my code here, and to the problem in general?
Preferably something that will perform just as well or better than the conditional version above. Ideally something that avoids heap allocations - important for use in game development scenarios (although these can always be optimised away with caching and the like, if necessary).
And also: Are there any "Googleable terms" for this problem? I suspect that this is not an uncommon problem - but I don't know of a name for it.
Update: An idea thanks to Superpig's answer, is to calculate a score for the various options. Something like this:
int nothingScore = 1 << 4;
int leftScore = (1 << 1) + (pushedLeft ? 1 << 2 : 0) + (leftFree ? 1 << 3 : 0) + (leftExists ? 1 << 5 : 0);
int rightScore = (pushedRight ? 1 << 2 : 0) + (rightFree ? 1 << 3 : 0) + (rightExists ? 1 << 5 : 0);
There's certainly a nicer way to write the scoring code (and alternate ways to score it, too). And then there's still the matter of selecting what to do once the score is calculated. And, of course, there may be a better method entirely that doesn't involve scoring.
Update 2: I've posted and accepted my own answer here (because Superpig's isn't a complete solution, and so far no other answer is even remotely on the right track). Rather than scoring the various outputs, I've chosen an option-elimination approach using a bit-field. This allows a decision to be made using only a single integer for memory.

This is essentially a classification problem; you want something like a decision tree (or behaviour tree). You're trying to take a bunch of discrete inputs for the situation (validity, freeness, push direction, etc) and classify the result as "up, down, left or right."
I suspect that if you want something of greater or equal performance to the long chain of if statements - at least in terms of instruction count / number of comparisons done - then you will have to make your comparisons in the manner you're doing there. Approaches like calculating a score for all directions and then checking the maximum, or recursively partitioning a list of moves into preferred and non-preferred, will all end up doing more work than a pure comparison sequence.
You could just build a lookup table, I think. You've got 4 bits indicating whether a direction is valid, 4 bits indicating whether a direction is occupied, and 2 bits indicating the push direction, for 10 bits in total - so that's 1024 different situations, and the behaviour in each one can be described with just 2 bits (so, 1 byte) - making the total table size 1024 bytes.
A single entry would be a structure like this:
union DecisionSituation
{
unsigned short Index;
struct
{
bool ValidLeft : 1;
bool ValidRight : 1;
bool ValidUp : 1;
bool ValidDown : 1;
bool OccupiedLeft : 1;
bool OccupiedRight : 1;
bool OccupiedUp : 1;
bool OccupiedDown : 1;
Direction PushDirection : 2;
} Flags;
};
You'd describe your situation by filling out the flags in that structure, and then reading the 'Index' value to get your lookup table index.
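Since the question is in C#, here is a rough C# sketch of the same lookup-table idea, cut down to the left/right case from the question (6 flags, so 64 entries). The bit layout and the names Move, MoveTable, Chain and Decide are assumptions of this sketch; the table is filled once from the original if-chain and afterwards a decision is a single array read.

```csharp
using System;

enum Move : byte { None, Left, Right }

static class MoveTable
{
    // Bit layout of the index (an assumption of this sketch):
    // 0 leftExists, 1 rightExists, 2 leftFree, 3 rightFree, 4 pushedLeft, 5 pushedRight
    static int Index(bool leftExists, bool rightExists, bool leftFree,
                     bool rightFree, bool pushedLeft, bool pushedRight)
    {
        return (leftExists ? 1 : 0) | (rightExists ? 2 : 0) | (leftFree ? 4 : 0)
             | (rightFree ? 8 : 0) | (pushedLeft ? 16 : 0) | (pushedRight ? 32 : 0);
    }

    // The if-chain from the question; it is only evaluated while building the table.
    public static Move Chain(bool leftExists, bool rightExists, bool leftFree,
                             bool rightFree, bool pushedLeft, bool pushedRight)
    {
        if (pushedLeft && leftFree && leftExists) return Move.Left;
        if (pushedRight && rightFree && rightExists) return Move.Right;
        if (leftFree && leftExists) return Move.Left;
        if (rightFree && rightExists) return Move.Right;
        if (pushedLeft && leftExists) return Move.Left;
        if (pushedRight && rightExists) return Move.Right;
        if (leftExists) return Move.Left;
        if (rightExists) return Move.Right;
        return Move.None;
    }

    static readonly Move[] Table = Build();

    static Move[] Build()
    {
        var table = new Move[64];
        for (int i = 0; i < table.Length; i++)
            table[i] = Chain((i & 1) != 0, (i & 2) != 0, (i & 4) != 0,
                             (i & 8) != 0, (i & 16) != 0, (i & 32) != 0);
        return table;
    }

    // Constant-time decision: pack the flags, read one byte.
    public static Move Decide(bool leftExists, bool rightExists, bool leftFree,
                              bool rightFree, bool pushedLeft, bool pushedRight)
    {
        return Table[Index(leftExists, rightExists, leftFree, rightFree, pushedLeft, pushedRight)];
    }
}
```

The table is allocated once at type initialization, so the per-decision path does no heap allocation.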
Edit: Also, regarding your scoring function, because you're doing strict bit-patterns, I think you can skip all the ternary operators:
int leftScore = (leftExists << 4) | (leftFree << 3) | (pushedLeft << 2) | 1;
int rightScore = (rightExists << 4) | (rightFree << 3) | (pushedRight << 2) | 0;
// Find the highest scoring direction here
// If none of the scores are at least (1 << 4) it means none of them existed
if(highest score < (1 << 4)) return nothing;
// otherwise just return the highest scoring direction
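In C# a bool cannot be shifted directly, so a version of the scoring idea that compiles there still needs the ternaries. A minimal sketch for the left/right case follows; the names Direction, Scoring, Score and Decide are made up for this example, and bit 0 is used as the "prefer left" tie-break.

```csharp
using System;

enum Direction { None, Left, Right }

static class Scoring
{
    // Higher-priority inputs occupy higher bits, so comparing two scores
    // compares the inputs in priority order.
    static int Score(bool exists, bool free, bool pushed, int tieBreak)
    {
        return (exists ? 1 << 4 : 0) | (free ? 1 << 3 : 0) | (pushed ? 1 << 2 : 0) | tieBreak;
    }

    public static Direction Decide(bool leftExists, bool rightExists, bool leftFree,
                                   bool rightFree, bool pushedLeft, bool pushedRight)
    {
        int left = Score(leftExists, leftFree, pushedLeft, 1);    // bit 0: prefer left on ties
        int right = Score(rightExists, rightFree, pushedRight, 0);

        // If neither score reaches 1 << 4, neither direction exists.
        if (Math.Max(left, right) < (1 << 4)) return Direction.None;
        return left > right ? Direction.Left : Direction.Right;
    }
}
```

The two scores can never be equal, because only the left score has bit 0 set, so the final comparison needs no extra tie handling.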

The most important thing is to have the code that declares what the inputs are and their relative priorities be simple, short and elegant. Here is one way to write that code:
PreferencedDecisionMaker pdm = new PreferencedDecisionMaker();
pdm.Push(false, leftExists, rightExists, upExists, downExists);
pdm.Push(0);
pdm.Push(false, leftFree, rightFree, upFree, downFree );
pdm.Push(false, pushedLeft, pushedRight, pushedUp, pushedDown);
pdm.Push(1);
switch(pdm.Decision)
{
case 1: GoLeft(); break;
case 2: GoRight(); break;
case 3: GoUp(); break;
case 4: GoDown(); break;
}
Here the inputs are declared in essentially a tabular format. The priority of each input is defined by the ordering of the rows. Each column corresponds to a possible output.
The "complexity" of this code is n×m.
(Although I've used indentation to make this look like a table, more complicated input conditions won't allow each row to exist neatly on a single line. This doesn't matter: the important bit is that there are only n×m declarations. Being able to make it look like a table when the conditions are short is just a nice bonus.)
Less important is the actual behind-the-scenes code to make the decision (the PreferencedDecisionMaker type). There are a few ways to calculate the best output decision based on priority. Superpig suggested scoring, which is good. But I've ended up going for an option-elimination approach using a bit-field. I've posted my code for this below.
Using a bit-field has the big advantage of not needing to allocate heap memory for arrays. The only downside is that it's limited to 32 options.
The following code hasn't been thoroughly tested. And I haven't filled out all 32 versions of the Push method. It uses a mutable struct, which is "naughty" - converting it to an immutable struct should be straightforward. Or you could make it a class - but then you lose the benefit of avoiding heap allocation.
struct PreferencedDecisionMaker
{
private uint availableOptionsBits;
private static readonly int[] MultiplyDeBruijnBitPosition = {
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
public int Decision
{
get
{
uint v = availableOptionsBits;
// Find position of lowest set bit in constant time
// http://stackoverflow.com/a/757266/165500
return MultiplyDeBruijnBitPosition[((uint)((v & -v) * 0x077CB531U)) >> 27];
}
}
private void InternalPush(uint preference)
{
if(availableOptionsBits == 0)
availableOptionsBits = preference;
else
{
uint combinedBits = availableOptionsBits & preference;
if(combinedBits != 0)
availableOptionsBits = combinedBits;
}
}
public void Push(int option)
{
if(option < 0 || option >= 32) throw new ArgumentOutOfRangeException("option", "Option must be between 0 and 31");
InternalPush(1u << option);
}
// ... etc ...
public void Push(bool p0, bool p1, bool p2, bool p3, bool p4) { InternalPush((p0?1u:0u) | ((p1?1u:0u)<<1) | ((p2?1u:0u)<<2) | ((p3?1u:0u)<<3) | ((p4?1u:0u)<<4)); }
// ... etc ...
}

When you have a bunch of if statements, usually they can be refactored using polymorphism combined with the state pattern:
As an introduction, please watch the following video from Misko Hevery (you will love it)
http://www.youtube.com/watch?v=4F72VULWFvc&feature=player_embedded#!
This is a summary from the presentation:
Most if's can be replaced by polymorphism (subclassing)
This is desirable because:
Functions without ifs are easier to read than with ifs
Functions without ifs are easier to test
Related branches of ifs end up in the same subclass
Use Polymorphism (Subclasses)
If you are checking an object should behave differently based on its state
If you have to check the same if condition in multiple places
Use if
Bounds checking primitive objects (>,<, ==, !=)
... other uses but today we focus on avoiding if
Polymorphic solution is often better because
New behavior can be added without having the original source code
Each operation / concern is separated in a separate file
Makes it easy to test / understand
EDIT
At first sight, using the state pattern with polymorphism makes the solution look more complex, because you need more classes than before. The trade-off is well worth it, though: once you start writing tests for this kind of code you will find it easier to test, easier to read and understand, and therefore easier to maintain and extend. (Right now you only posted about moving right or left, but imagine that you later need to move up and down; if you do not refactor your code now, adding new functionality will be a real PITA.)
You would have something like:
// represents the current position
class MyClass
{
public int X;
public int Y;
}
abstract class NodeMoving
{
abstract void Move();
abstract bool IsValid(MyClass myclass);
}
abstract class NodeMovingLeft : NodeMoving
{
override void Move()
{
// add code to move left
if(this.IsValid(myclass)) // myclass: the current position, assumed to be available here (e.g. as a field)
{
// move
}
}
}
abstract class NodeMovingRight : NodeMoving
{
override void Move()
{
// add code to move right
if(this.IsValid(myclass)) // myclass: the current position, assumed to be available here (e.g. as a field)
{
// move
}
}
}
// and then start modeling the different states
class RightFree : NodeMovingRight
{
override bool IsValid(MyClass myclass)
{
// add condition to validate if the right is free
}
}
// combining conditions
class PushedLeft : NodeMovingLeft
{
override bool IsValid(MyClass myclass)
{
// code to determine if it has been pushed to the left
}
}
class LeftFree : PushedLeft
{
override bool IsValid(MyClass myclass)
{
// get condition to indicate if the left is free
var currentCondition = GetCondition();
// combining the conditions
return currentCondition && base.IsValid(myclass);
}
}
You will need to add the properties needed to calculate the conditions and perform the movement.
It's worth noting how small the methods are (yes, you will have more of them than before), but they can be tested in isolation really easily.
Edit 2 -- adding priority
Well now that we have a simple state machine, we need to evaluate the priorities, one way to do it (and I would like to hear ways to improve it) is by using a priority queue:
http://www.codeproject.com/Articles/13295/A-Priority-Queue-in-C
It would look something like:
// you could place the priorities in a service to reuse it
// a List keeps insertion order; the enumeration order of a HashSet is not guaranteed
var priorities = new List<NodeMoving>();
priorities.Add(new RightExists());
priorities.Add(new PushedLeft());
var currentPosition = new MyClass { X = 1, Y = 2 };
foreach (var priority in priorities)
{
if (priority.IsValid(currentPosition))
{
priority.Move();
break;
}
}
// output is: RightExists
// now changing the priority order (Enumerable.Reverse needs System.Linq)
foreach (var priority in Enumerable.Reverse(priorities))
{
if (priority.IsValid(currentPosition))
{
priority.Move();
break;
}
}
// output is: PushedLeft

Use the state pattern. Draw a state diagram that contains all your different states and the allowed transitions between them. When you code it, have a state subclass for each node. Each state node / class will decide on the inputs and drive the appropriate transition to the next allowed state. I don't think you can avoid the multiplicity of states you mention.

Would this not work:
if(!leftExists) {
if(rightExists) {
GoRight();
} else {
// Panic!
}
} else if(!rightExists) {
GoLeft();
} else if(rightFree != leftFree) { // exactly one side is free
if(rightFree) {
GoRight();
} else {
GoLeft();
}
} else if(pushedRight) {
// Assumption: pushedLeft and pushedRight cannot both be true?
GoRight();
} else {
// PushedLeft == true, or we just prefer to move left by default
GoLeft();
}
Right now it is a similar amount of code, but the difference is that we've eliminated the common conditions, so adding additional conditions no longer affects each branch - just insert the new one at the desired priority.

I'd stick with a modification of your solution.
Have you thought about making GoLeft into a function which also returns whether or not left exists? Unless it's a complicated function, and you are calling this a LOT and testing shows that it needs to be optimized, that's what I'd do.
If you do that, then this becomes the following. It's basically what you're doing, but easier to read.
I'm sure I'll get downvotes for this since it's not OO and doesn't use the command pattern, but it does answer the question, and it is easy to read ;) For a more complicated problem, I'd think about using those answers, but for this particular problem, I would stick with a simple answer.
if(pushedLeft && leftFree && GoLeft()) ;
else if(pushedRight && rightFree && GoRight()) ;
else if(leftFree && GoLeft()) ;
else if(rightFree && GoRight()) ;
else if(pushedLeft && GoLeft()) ;
else if(pushedRight && GoRight()) ;
else if(GoLeft()) ;
else if(GoRight()) ;
// else do nothing...

To get a grip on this, I recommend the following measures:
decide which of the conditions that are combined in the if statements weigh more
untangle the if statements, resulting in nested if statements like
if (moveRight)
{
if (pushedleft)
{
/// ...
}
}
when you have this, start packing all conditional logic into methods, e.g. HandleMoveRight
having done this, you can start extracting classes, ending up with a command pattern.
Now you have no more problems in adding extra functionality and complexity is flat.

I think the state pattern is the right approach, as recommended by @john2lob. You also want some sort of decision tree approach to determine the next transition on the events 'push' / 'free' / 'exists'. BTW: what's the difference between 'free' and 'exists'?

Looking for a way to optimize this algorithm for parsing a very large string

The following class parses through a very large string (an entire novel of text) and breaks it into consecutive 4-character strings that are stored as a Tuple. Then each tuple can be assigned a probability based on a calculation. I am using this as part of a monte carlo/ genetic algorithm to train the program to recognize a language based on syntax only (just the character transitions).
I am wondering if there is a faster way of doing this. It takes about 400ms to look up the probability of any given 4-character tuple. The relevant method _Probability() is at the end of the class.
This is a computationally intensive problem related to another post of mine: Algorithm for computing the plausibility of a function / Monte Carlo Method
Ultimately I'd like to store these values in a 4d-matrix. But given that there are 26 letters in the alphabet that would be a HUGE task. (26x26x26x26). If I take only the first 15000 characters of the novel then performance improves a ton, but my data isn't as useful.
Here is the method that parses the text 'source':
private List<Tuple<char, char, char, char>> _Parse(string src)
{
var _map = new List<Tuple<char, char, char, char>>();
for (int i = 0; i < src.Length - 3; i++)
{
int j = i + 1;
int k = i + 2;
int l = i + 3;
_map.Add
(new Tuple<char, char, char, char>(src[i], src[j], src[k], src[l]));
}
return _map;
}
And here is the _Probability method:
private double _Probability(char x0, char x1, char x2, char x3)
{
var subset_x0 = map.Where(x => x.Item1 == x0);
var subset_x0_x1_following = subset_x0.Where(x => x.Item2 == x1);
var subset_x0_x2_following = subset_x0_x1_following.Where(x => x.Item3 == x2);
var subset_x0_x3_following = subset_x0_x2_following.Where(x => x.Item4 == x3);
int count_of_x0 = subset_x0.Count();
int count_of_x1_following = subset_x0_x1_following.Count();
int count_of_x2_following = subset_x0_x2_following.Count();
int count_of_x3_following = subset_x0_x3_following.Count();
decimal p1;
decimal p2;
decimal p3;
if (count_of_x0 <= 0 || count_of_x1_following <= 0 || count_of_x2_following <= 0 || count_of_x3_following <= 0)
{
p1 = e;
p2 = e;
p3 = e;
}
else
{
p1 = (decimal)count_of_x1_following / (decimal)count_of_x0;
p2 = (decimal)count_of_x2_following / (decimal)count_of_x1_following;
p3 = (decimal)count_of_x3_following / (decimal)count_of_x2_following;
p1 = (p1 * 100) + e;
p2 = (p2 * 100) + e;
p3 = (p3 * 100) + e;
}
//more calculations omitted
return _final;
}
}
EDIT - I'm providing more details to clear things up:
1) Strictly speaking I've only worked with English so far, but it's true that different alphabets will have to be considered. Currently I only want the program to recognize English, similar to what's described in this paper: http://www-stat.stanford.edu/~cgates/PERSI/papers/MCMCRev.pdf
2) I am calculating the probabilities of n-tuples of characters where n <= 4. For instance, if I am calculating the total probability of the string "that", I would break it down into these independent tuples and calculate the probability of each individually first:
[t][h]
[t][h][a]
[t][h][a][t]
[t][h] is given the most weight, then [t][h][a], then [t][h][a][t]. Since I am not just looking at the 4-character tuple as a single unit, I wouldn't be able to just divide the instances of [t][h][a][t] in the text by the total number of 4-tuples in the text.
The value assigned to each 4-tuple can't overfit to the text, because by chance many real English words may never appear in the text and they shouldn't get disproportionately low scores. Emphasizing first-order character transitions (2-tuples) ameliorates this issue. Moving to the 3-tuple and then the 4-tuple just refines the calculation.
I came up with a Dictionary that simply tallies how often each tuple occurs in the text (similar to what Vilx suggested), rather than repeating identical tuples, which is a waste of memory. That got me from about 400ms per lookup to about 40ms, which is a pretty great improvement. I still have to look into some of the other suggestions, however.
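A minimal sketch of what such a tally could look like, for illustration only. The class name NGramCounts and its members are hypothetical (they are not from the original class), and it assumes plain string keys for every n-gram of length 1 to 4. Unlike the tuple list, it also counts the shorter n-grams at the very end of the text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: tallies every n-gram of length 1..maxLength once,
// so later lookups are dictionary reads instead of scans over a tuple list.
public sealed class NGramCounts
{
    private readonly Dictionary<string, int> _counts = new Dictionary<string, int>();

    public NGramCounts(string src, int maxLength = 4)
    {
        for (int i = 0; i < src.Length; i++)
        {
            // Stop early near the end of the text, where long n-grams no longer fit.
            for (int n = 1; n <= maxLength && i + n <= src.Length; n++)
            {
                string gram = src.Substring(i, n);
                _counts.TryGetValue(gram, out int seen); // seen is 0 when the key is missing
                _counts[gram] = seen + 1;
            }
        }
    }

    // Number of times the n-gram occurs in the text (0 if it never occurs).
    public int Count(string gram)
    {
        return _counts.TryGetValue(gram, out int seen) ? seen : 0;
    }

    // Estimate of P(last character | preceding characters),
    // e.g. Conditional("th") = count("th") / count("t").
    public double Conditional(string gram)
    {
        if (gram.Length < 2)
            throw new ArgumentException("Need at least two characters.", nameof(gram));
        int prefix = Count(gram.Substring(0, gram.Length - 1));
        return prefix == 0 ? 0.0 : (double)Count(gram) / prefix;
    }
}
```

With this, the three ratios p1, p2 and p3 for "that" would be Conditional("th"), Conditional("tha") and Conditional("that"); how they are weighted afterwards is left to the omitted calculations.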

In your probability method you are iterating the map many times: the Where queries are lazily evaluated, so every Count() call re-runs its whole filter chain over the entire list. Adding a .ToList() at the end of each query would (potentially) speed things up. That said, I think your main problem is that the structure you've chosen to store the data in is not suited to the purpose of the probability method. You could create a one-pass version where the structure you store your data in calculates the tentative distribution on insert. That way, when you're done with the insert (which shouldn't be slowed down too much), you're done; or you could do as the code below does and have a cheap calculation of the probability when you need it.
As an aside, you might want to take punctuation and whitespace into account. The first letter or word of a sentence and the first letter of a word give a clear indication of what language a given text is written in; by taking punctuation characters and whitespace as part of your distribution, you include those characteristics of the sample data. We did that some years back and showed that using just three characters was almost as exact as using more (we tested up to 7), and the speed of three letters made that the best choice. (We had no failures with three on our test data; "almost as exact" is an assumption, given that there must be some weird text where the lack of information would yield an incorrect result.)
EDIT
Here's an example of how I think I would do it in C#
class TextParser{
private Node Parse(string src){
var top = new Node(null);
for (int i = 0; i < src.Length - 3; i++){
var first = src[i];
var second = src[i+1];
var third = src[i+2];
var fourth = src[i+3];
var firstLevelNode = top.AddChild(first);
var secondLevelNode = firstLevelNode.AddChild(second);
var thirdLevelNode = secondLevelNode.AddChild(third);
thirdLevelNode.AddChild(fourth);
}
return top;
}
}
public class Node{
private readonly Node _parent;
private readonly Dictionary<char,Node> _children
= new Dictionary<char, Node>();
private int _count;
public Node(Node parent){
_parent = parent;
}
public Node AddChild(char value){
if (!_children.ContainsKey(value))
{
_children.Add(value, new Node(this));
}
var levelNode = _children[value];
levelNode._count++;
return levelNode;
}
public decimal Probability(string substring){
var node = this;
foreach (var c in substring){
if(!node.Contains(c))
return 0m;
node = node[c];
}
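// Probability of the last character given the preceding ones: this node's
// count over the total count of all its siblings (Sum needs using System.Linq).
return ((decimal) node._count)/node._parent._children.Values.Sum(n => n._count);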
}
public Node this[char value]{
get { return _children[value]; }
}
private bool Contains(char c){
return _children.ContainsKey(c);
}
}
The usage (from inside TextParser, since Parse is private) would then be:
var top = Parse(src);
top.Probability("test");

I would suggest changing the data structure to make that faster.
I think a Dictionary<char,Dictionary<char,Dictionary<char,Dictionary<char,double>>>> would be much more efficient, since you would be accessing each "level" (Item1...Item4) when calculating, and you could cache the result in the innermost Dictionary so that next time you don't have to calculate at all.

Ok, I don't have time to work out details, but this really calls for
neural classifier nets (just take any off the shelf; even the Controllable Regex Mutilator would do the job, with way more scalability) -- heuristics over brute force
you could use tries (Patricia tries, a.k.a. radix trees) to make a space-optimized version of your data structure that can be sparse (the Dictionary of Dictionaries of Dictionaries of Dictionaries... looks like an approximation of this to me)

There's not much you can do with the parse function as it stands. However, the tuples appear to be four consecutive characters from a large body of text. Why not just replace the tuple with an int and then use the int to index the large body of text when you need the character values? Your tuple-based method is effectively consuming four times the memory the original text would use, and since memory is usually the bottleneck to performance, it's best to use as little as possible.
You then try to find the number of matches in the body of text against a set of characters. I wonder how a straightforward linear search over the original body of text would compare with the LINQ statements you're using? The .Where will be doing memory allocation (which is a slow operation) and the LINQ statement will have its own overhead (but the compiler might do something clever here). Having a good understanding of the search space will make it easier to find an optimal algorithm.
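For comparison, a straightforward linear search over the original text might look like the sketch below. The names LinearScan and CountOccurrences are made up for illustration; the loop index i plays the role of the int that replaces the tuple, and the text is compared in place without allocating substrings.

```csharp
using System;

// Hypothetical helper: counts matches by scanning the original text directly.
public static class LinearScan
{
    // Number of positions at which pattern occurs in text (overlapping matches count).
    public static int CountOccurrences(string text, string pattern)
    {
        if (pattern.Length == 0) return 0;
        int count = 0;
        for (int i = 0; i + pattern.Length <= text.Length; i++)
        {
            // Ordinal comparison of text[i .. i + pattern.Length) against pattern.
            if (string.CompareOrdinal(text, i, pattern, 0, pattern.Length) == 0)
                count++;
        }
        return count;
    }
}
```

Each call is still one pass over the whole text, so this mainly serves as a baseline to time against the LINQ version rather than as a final solution.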
But then, as has been mentioned in the comments, using a 26^4 matrix would be the most efficient. Parse the input text once and create the matrix as you parse. You'd probably want a set of dictionaries:
SortedDictionary <int,int> count_of_single_letters; // key = single character
SortedDictionary <int,int> count_of_double_letters; // key = char1 + char2 * 32
SortedDictionary <int,int> count_of_triple_letters; // key = char1 + char2 * 32 + char3 * 32 * 32
SortedDictionary <int,int> count_of_quad_letters; // key = char1 + char2 * 32 + char3 * 32 * 32 + char4 * 32 * 32 * 32
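The key layout in those comments can be sketched as follows. This is only an illustration under stated assumptions: the names PackedNGrams, Key and CountAll are hypothetical, the text is assumed to be lowercase a-z (anything else is skipped), and a plain Dictionary is swapped in for SortedDictionary on the assumption that ordered keys are not needed.

```csharp
using System.Collections.Generic;

// Hypothetical helper: packs n-grams into int keys, 5 bits per letter.
public static class PackedNGrams
{
    // Returns char1 + char2 * 32 + char3 * 32 * 32 + ... with 'a' = 0 .. 'z' = 25,
    // or -1 if any character in the range is not a lowercase letter.
    public static int Key(string text, int start, int length)
    {
        int key = 0;
        for (int n = length - 1; n >= 0; n--)
        {
            int letter = text[start + n] - 'a';
            if (letter < 0 || letter > 25) return -1;
            key = key * 32 + letter;
        }
        return key;
    }

    // One pass over the text; result[0] holds single letters, result[3] holds 4-grams.
    public static Dictionary<int, int>[] CountAll(string text)
    {
        var counts = new Dictionary<int, int>[4];
        for (int n = 0; n < 4; n++) counts[n] = new Dictionary<int, int>();

        for (int i = 0; i < text.Length; i++)
        {
            for (int length = 1; length <= 4 && i + length <= text.Length; length++)
            {
                int key = Key(text, i, length);
                if (key < 0) break; // longer n-grams at i contain the same bad character
                counts[length - 1].TryGetValue(key, out int seen);
                counts[length - 1][key] = seen + 1;
            }
        }
        return counts;
    }
}
```

A lookup for "th" is then counts[1][PackedNGrams.Key("th", 0, 2)], guarded with TryGetValue for n-grams that never occur.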
Finally, a note on data types. You're using the decimal type. This is not an efficient type, as there is no direct mapping to a CPU-native type and there is overhead in processing the data. Use a double instead; I think the precision will be sufficient. The most precise way would be to store the probability as two integers, the numerator and the denominator, and do the division as late as possible.

The best approach here is to use sparse storage and to prune after, for example, every 10000 characters. The best storage structure in this case is a prefix tree; it allows fast calculation of probabilities, fast updating, and sparse storage. You can find more of the theory in this javadoc: http://alias-i.com/lingpipe/docs/api/com/aliasi/lm/NGramProcessLM.html
