Code example:
using System;
public class Test {
public static void Main() {
int a = 0;
if(a++ == 0){
Console.WriteLine(a);
}
}
}
In this code the Console will write: 1. I can write this code in another way:
public static void Main() {
int a = 0;
if(a == 0){
a++;
Console.WriteLine(a);
}
}
These two examples work exactly the same (from what I know about postfix).
The problem is with this example coming from the Microsoft tutorials:
using System;
public class Document {
// Class allowing to view the document as an array of words:
public class WordCollection {
readonly Document document;
internal WordCollection (Document d){
document = d;
}
// Helper function -- search character array "text", starting
// at character "begin", for word number "wordCount". Returns
//false if there are less than wordCount words. Sets "start" and
//length to the position and length of the word within text
private bool GetWord(char[] text, int begin, int wordCount,
out int start, out int length) {
int end = text.Length;
int count = 0;
int inWord = -1;
start = length = 0;
for (int i = begin; i <= end; ++i){
bool isLetter = i < end && Char.IsLetterOrDigit(text[i]);
if (inWord >= 0) {
if (!isLetter) {
if (count++ == wordCount) {//PROBLEM IS HERE!!!!!!!!!!!!
start = inWord;
length = i - inWord;
return true;
}
inWord = -1;
}
} else {
if (isLetter) {
inWord = i;
}
}
}
return false;
}
//Indexer to get and set words of the containing document:
public string this[int index] {
get
{
int start, length;
if(GetWord(document.TextArray, 0, index, out start,
out length)) {
return new string(document.TextArray, start, length);
} else {
throw new IndexOutOfRangeException();
}
}
set {
int start, length;
if(GetWord(document.TextArray, 0, index, out start,
out length))
{
//Replace the word at start/length with
// the string "value"
if(length == value.Length){
Array.Copy(value.ToCharArray(), 0,
document.TextArray, start, length);
}
else {
char[] newText = new char[document.TextArray.Length +
value.Length - length];
Array.Copy(document.TextArray, 0, newText, 0, start);
Array.Copy(value.ToCharArray(), 0, newText, start, value.Length);
Array.Copy(document.TextArray, start + length, newText,
start + value.Length, document.TextArray.Length - start - length);
document.TextArray = newText;
}
} else {
throw new IndexOutOfRangeException();
}
}
}
public int Count {
get {
int count = 0, start = 0, length = 0;
while (GetWord(document.TextArray, start + length,
0, out start, out length)) {
++count;
}
return count;
}
}
}
// Class allowing the document to be viewed like an array
// of character
public class CharacterCollection {
readonly Document document;
internal CharacterCollection(Document d) {
document = d;
}
//Indexer to get and set character in the containing
//document
public char this[int index] {
get {
return document.TextArray[index];
}
set {
document.TextArray[index] = value;
}
}
//get the count of character in the containing document
public int Count {
get {
return document.TextArray.Length;
}
}
}
//Because the types of the fields have indexers,
//these fields appear as "indexed properties":
public WordCollection Words;
public readonly CharacterCollection Characters;
private char[] TextArray;
public Document(string initialText) {
TextArray = initialText.ToCharArray();
Words = new WordCollection(this);
Characters = new CharacterCollection(this);
}
public string Text {
get {
return new string(TextArray);
}
}
class Test {
static void Main() {
Document d = new Document(
"peter piper picked a peck of pickled peppers. How many pickled peppers did peter piper pick?"
);
//Change word "peter" to "penelope"
for(int i = 0; i < d.Words.Count; ++i){
if (d.Words[i] == "peter") {
d.Words[i] = "penelope";
}
}
for (int i = 0; i < d.Characters.Count; ++i) {
if (d.Characters[i] == 'p') {
d.Characters[i] = 'P';
}
}
Console.WriteLine(d.Text);
}
}
}
If I change the code marked above to this:
if (count == wordCount) {//PROBLEM IS HERE
start = inWord;
length = i - inWord;
count++;
return true;
}
I get an IndexOutOfRangeException, but I don't know why.
Your initial assumption is incorrect (that the two examples work exactly the same). In the following version, count is incremented regardless of whether or not it is equal to wordCount:
if (count++ == wordCount)
{
// Code omitted
}
In this version, count is ONLY incremented when it is equal to wordCount
if (count == wordCount)
{
// Other code omitted
count++;
}
EDIT
The reason this is causing you a failure is that, when you are searching for the second word (when wordCount is 1), the variable count will never equal wordCount (because it never gets incremented), and therefore the GetWord method returns false, which then triggers the else clause in your get method, which throws an IndexOutOfRangeException.
In your version of the code, count is only being incremented when count == wordCount; in the Microsoft version, it's being incremented whether the condition is met or not.
using System;
public class Test {
public static void Main() {
int a = 0;
if(a++ == 0){
Console.WriteLine(a);
}
}
}
Is not quite the same as:
public static void Main() {
int a = 0;
if(a == 0){
a++;
Console.WriteLine(a);
}
}
In the second case a++ is executed only if a == 0. In the first case a++ is executed every time we check the condition.
There is your mistake:
public static void Main() {
int a = 0;
if(a == 0){
a++;
Console.WriteLine(a);
}
}
It should be like this:
public static void Main() {
int a = 0;
if(a == 0){
a++;
Console.WriteLine(a);
}
else
a++;
}
a gets alwasy increased. This means, that in your code example count will get only increased when count == wordCount (In which case the method will return true anyway...). You basicly never increasing count.
Related
I have to make a string which consists a string like - AAA0009, and once it reaches AAA0009, it will generate AA0010 to AAA0019 and so on.... till AAA9999 and when it will reach to AAA9999, it will give AAB0000 to AAB9999 and so on till ZZZ9999.
I want to use static class and static variables so that it can auto increment by itself on every hit.
I have tried some but not even close, so help me out thanks.
Thanks for being instructive I was trying as I Said already but anyways you already want to put negatives over there without even knowing the thing:
Code:
public class GenerateTicketNumber
{
private static int num1 = 0;
public static string ToBase36()
{
const string base36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
var sb = new StringBuilder(9);
do
{
sb.Insert(0, base36[(byte)(num1 % 36)]);
num1 /= 36;
} while (num1 != 0);
var paddedString = "#T" + sb.ToString().PadLeft(8, '0');
num1 = num1 + 1;
return paddedString;
}
}
above is the code. this will generate a sequence but not the way I want anyways will use it and thanks for help.
Though there's already an accepted answer, I would like to share this one.
P.S. I do not claim that this is the best approach, but in my previous work we made something similar using Azure Table Storage which is a no sql database (FYI) and it works.
1.) Create a table to store your running ticket number.
public class TicketNumber
{
public string Type { get; set; } // Maybe you want to have different types of ticket?
public string AlphaPrefix { get; set; }
public string NumericPrefix { get; set; }
public TicketNumber()
{
this.AlphaPrefix = "AAA";
this.NumericPrefix = "0001";
}
public void Increment()
{
int num = int.Parse(this.NumericPrefix);
if (num + 1 >= 9999)
{
num = 1;
int i = 2; // We are assuming that there are only 3 characters
bool isMax = this.AlphaPrefix == "ZZZ";
if (isMax)
{
this.AlphaPrefix = "AAA"; // reset
}
else
{
while (this.AlphaPrefix[i] == 'Z')
{
i--;
}
char iChar = this.AlphaPrefix[i];
StringBuilder sb = new StringBuilder(this.AlphaPrefix);
sb[i] = (char)(iChar + 1);
this.AlphaPrefix = sb.ToString();
}
}
else
{
num++;
}
this.NumericPrefix = num.ToString().PadLeft(4, '0');
}
public override string ToString()
{
return this.AlphaPrefix + this.NumericPrefix;
}
}
2.) Make sure you perform row-level locking and issue an error when it fails.
Here's an oracle syntax:
SELECT * FROM TICKETNUMBER WHERE TYPE = 'TYPE' FOR UPDATE NOWAIT;
This query locks the row and returns an error if the row is currently locked by another session.
We need this to make sure that even if you have millions of users generating a ticket number, it will not mess up the sequence.
Just make sure to save the new ticket number before you perform a COMMIT.
I forgot the MSSQL version of this but I recall using WITH (ROWLOCK) or something. Just google it.
3.) Working example:
static void Main()
{
TicketNumber ticketNumber = new TicketNumber();
ticketNumber.AlphaPrefix = "ZZZ";
ticketNumber.NumericPrefix = "9999";
for (int i = 0; i < 10; i++)
{
Console.WriteLine(ticketNumber);
ticketNumber.Increment();
}
Console.Read();
}
Output:
Looking at your code that you've provided, it seems that you're backing this with a number and just want to convert that to a more user-friendly text representation.
You could try something like this:
private static string ValueToId(int value)
{
var parts = new List<string>();
int numberPart = value % 10000;
parts.Add(numberPart.ToString("0000"));
value /= 10000;
for (int i = 0; i < 3 || value > 0; ++i)
{
parts.Add(((char)(65 + (value % 26))).ToString());
value /= 26;
}
return string.Join(string.Empty, parts.AsEnumerable().Reverse().ToArray());
}
It will take the first 4 characters and use them as is, and then for the remainder of the value if will convert it into characters A-Z.
So 9999 becomes AAA9999, 10000 becomes AAB0000, and 270000 becomes ABB0000.
If the number is big enough that it exceeds 3 characters, it will add more letters at the start.
Here's an example of how you could go about implementing it
void Main()
{
string template = #"AAAA00";
var templateChars = template.ToCharArray();
for (int i = 0; i < 100000; i++)
{
templateChars = IncrementCharArray(templateChars);
Console.WriteLine(string.Join("",templateChars ));
}
}
public static char Increment(char val)
{
if(val == '9') return 'A';
if(val == 'Z') return '0';
return ++val;
}
public static char[] IncrementCharArray(char[] val)
{
if (val.All(chr => chr == 'Z'))
{
var newArray = new char[val.Length + 1];
for (int i = 0; i < newArray.Length; i++)
{
newArray[i] = '0';
}
return newArray;
}
int length = val.Length;
while (length > -1)
{
char lastVal = val[--length];
val[length] = Increment(lastVal);
if ( val[length] != '0') break;
}
return val;
}
I'm working on a hacker rank problem and I believe my solution is correct. However, most of my test cases are being timed out. Could some one suggest how to improve the efficiency of my code?
The important part of the code starts at the "Trie Class".
The exact question can be found here: https://www.hackerrank.com/challenges/contacts
using System;
using System.Collections.Generic;
using System.IO;
class Solution
{
static void Main(String[] args)
{
int N = Int32.Parse(Console.ReadLine());
string[,] argList = new string[N, 2];
for (int i = 0; i < N; i++)
{
string[] s = Console.ReadLine().Split();
argList[i, 0] = s[0];
argList[i, 1] = s[1];
}
Trie trie = new Trie();
for (int i = 0; i < N; i++)
{
switch (argList[i, 0])
{
case "add":
trie.add(argList[i, 1]);
break;
case "find":
Console.WriteLine(trie.find(argList[i, 1]));
break;
default:
break;
}
}
}
}
class Trie
{
Trie[] trieArray = new Trie[26];
private int findCount = 0;
private bool data = false;
private char name;
public void add(string s)
{
s = s.ToLower();
add(s, this);
}
private void add(string s, Trie t)
{
char first = Char.Parse(s.Substring(0, 1));
int index = first - 'a';
if(t.trieArray[index] == null)
{
t.trieArray[index] = new Trie();
t.trieArray[index].name = first;
}
if (s.Length > 1)
{
add(s.Substring(1), t.trieArray[index]);
}
else
{
t.trieArray[index].data = true;
}
}
public int find(string s)
{
int ans;
s = s.ToLower();
find(s, this);
ans = findCount;
findCount = 0;
return ans;
}
private void find(string s, Trie t)
{
if (t == null)
{
return;
}
if (s.Length > 0)
{
char first = Char.Parse(s.Substring(0, 1));
int index = first - 'a';
find(s.Substring(1), t.trieArray[index]);
}
else
{
for(int i = 0; i < 26; i++)
{
if (t.trieArray[i] != null)
{
find("", t.trieArray[i]);
}
}
if (t.data == true)
{
findCount++;
}
}
}
}
EDIT: I did some suggestions in the comments but realized that I can't replace s.Substring(1) with s[0]... because I actually need s[1..n]. AND s[0] returns a char so I'm going to need to do .ToString on it anyways.
Also, to add a little more information. The idea is that it needs to count ALL names after a prefix for example.
Input: "He"
Trie Contains:
"Hello"
"Help"
"Heart"
"Ha"
"No"
Output: 3
I could just post a solution here that gives you 40 points but i guess this wont be any fun.
Make use of Enumerator<char> instead of any string operations
Count while adding values
any charachter minus 'a' makes a good arrayindex (gives you values between 0 and 25)
I think, this link must be very helpfull
There, you can find the good implementation of a trie data structure, but it's a java implementation. I implemented trie on c# with that great article one year ago, and i tried to find solution to an another hackerrank task too ) It was successful.
In my task, i had to a simply trie (i mean my alphabet equals 2) and only one method for adding new nodes:
public class Trie
{
//for integer representation in binary system 2^32
public static readonly int MaxLengthOfBits = 32;
//alphabet trie size
public static readonly int N = 2;
class Node
{
public Node[] next = new Node[Trie.N];
}
private Node _root;
}
public void AddValue(bool[] binaryNumber)
{
_root = AddValue(_root, binaryNumber, 0);
}
private Node AddValue(Node node, bool[] val, int d)
{
if (node == null) node = new Node();
//if least sagnificient bit has been added
//need return
if (d == val.Length)
{
return node;
}
// get 0 or 1 index of next array(length 2)
int index = Convert.ToInt32(val[d]);
node.next[index] = AddValue(node.next[index], val, ++d);
return node;
}
I have a c# class like so
internal class QueuedMinimumNumberFinder : ConcurrentQueue<int>
{
private readonly string _minString;
public QueuedMinimumNumberFinder(string number, int takeOutAmount)
{
if (number.Length < takeOutAmount)
{
throw new Exception("Error *");
}
var queueIndex = 0;
var queueAmount = number.Length - takeOutAmount;
var numQueue = new ConcurrentQueue<int>(number.ToCharArray().Where(m => (int) Char.GetNumericValue(m) != 0).Select(m=>(int)Char.GetNumericValue(m)).OrderBy(m=>m));
var zeroes = number.Length - numQueue.Count;
while (queueIndex < queueAmount)
{
int next;
if (queueIndex == 0)
{
numQueue.TryDequeue(out next);
Enqueue(next);
} else
{
if (zeroes > 0)
{
Enqueue(0);
zeroes--;
} else
{
numQueue.TryDequeue(out next);
Enqueue(next);
}
}
queueIndex++;
}
var builder = new StringBuilder();
while (Count > 0)
{
int next = 0;
TryDequeue(out next);
builder.Append(next.ToString());
}
_minString = builder.ToString();
}
public override string ToString() { return _minString; }
}
The point of the program is to find the minimum possible integer that can be made by taking out any x amount of characters from a string(example 100023 is string, if you take out any 3 letters, the minimum int created would be 100). My question is, is this the correct way to do this? Is there a better data structure that can be used for this problem?
First Edit:
Here's how it looks now
internal class QueuedMinimumNumberFinder
{
private readonly string _minString;
public QueuedMinimumNumberFinder(string number, int takeOutAmount)
{
var queue = new Queue<int>();
if (number.Length < takeOutAmount)
{
throw new Exception("Error *");
}
var queueIndex = 0;
var queueAmount = number.Length - takeOutAmount;
var numQueue = new List<int>(number.Where(m=>(int)Char.GetNumericValue(m)!=0).Select(m=>(int)Char.GetNumericValue(m))).ToList();
var zeroes = number.Length - numQueue.Count;
while (queueIndex < queueAmount)
{
if (queueIndex == 0)
{
var nextMin = numQueue.Min();
numQueue.Remove(nextMin);
queue.Enqueue(nextMin);
} else
{
if (zeroes > 1)
{
queue.Enqueue(0);
zeroes--;
} else
{
var nextMin = numQueue.Min();
numQueue.Remove(nextMin);
queue.Enqueue(nextMin);
}
}
queueIndex++;
}
var builder = new StringBuilder();
while (queue.Count > 0)
{
builder.Append(queue.Dequeue().ToString());
}
_minString = builder.ToString();
}
public override string ToString() { return _minString; }
}
A pretty simple and efficient implementation can be made, once you realize that your input string digits map to the domain of only 10 possible values: '0' .. '9'.
This can be encoded as the number of occurrences of a specific digit in your input string using a simple array of 10 integers: var digit_count = new int[10];
#MasterGillBates describes this idea in his answer.
You can then regard this array as your priority queue from which you can dequeue the characters you need by iteratively removing the lowest available character (decreasing its occurrence count in the array).
The code sample below provides an example implementation for this idea.
public static class MinNumberSolver
{
public static string GetMinString(string number, int takeOutAmount)
{
// "Add" the string by simply counting digit occurrance frequency.
var digit_count = new int[10];
foreach (var c in number)
if (char.IsDigit(c))
digit_count[c - '0']++;
// Now remove them one by one in lowest to highest order.
// For the first character we skip any potential leading 0s
var selected = new char[takeOutAmount];
var start_index = 1;
selected[0] = TakeLowest(digit_count, ref start_index);
// For the rest we start in digit order at '0' first.
start_index = 0;
for (var i = 0; i < takeOutAmount - 1; i++)
selected[1 + i] = TakeLowest(digit_count, ref start_index);
// And return the result.
return new string(selected);
}
private static char TakeLowest(int[] digit_count, ref int start_index)
{
for (var i = start_index; i < digit_count.Length; i++)
{
if (digit_count[i] > 0)
{
start_index = ((--digit_count[i] > 0) ? i : i + 1);
return (char)('0' + i);
}
}
throw new InvalidDataException("Input string does not have sufficient digits");
}
}
Just keep a count of how many times each digit appears. An array of size 10 will do. Count[i] gives the count of digit i.
Then pick the smallest non-zero i first, then pick the smallest etc and form your number.
Here's my solution using LINQ:
public string MinimumNumberFinder(string number, int takeOutAmount)
{
var ordered = number.OrderBy(n => n);
var nonZero = ordered.SkipWhile(n => n == '0');
var zero = ordered.TakeWhile(n => n == '0');
var result = nonZero.Take(1)
.Concat(zero)
.Concat(nonZero.Skip(1))
.Take(number.Length - takeOutAmount);
return new string(result.ToArray());
}
You could place every integer into a list and find all possible sequences of these values. From the list of sequences, you could sort through taking only the sets which have the number of integers you want. From there, you can write a quick function which parses a sequence into an integer. Next, you could store all of your parsed sequences into an array or an other data structure and sort based on value, which will allow you to select the minimum number from the data structure. There may be simpler ways to do this, but this will definitely work and gives you options as far as how many digits you want your number to have.
If I'm understanding this correctly, why don't you just pick out your numbers starting with the smallest number greater than zero. Then pick out all zeroes, then any remaining number if all the zeroes are picked up. This is all depending on the length of your ending result
In your example you have a 6 digit number and you want to pick out 3 digits. This means you'll only have 3 digits left. If it was a 10 digit number, then you would end up with a 7 digit number, etc...
So have an algorithm that knows the length of your starting number, how many digits you plan on removing, and the length of your ending number. Then just pick out the numbers.
This is just quick and dirty code:
string startingNumber = "9999903040404"; // "100023";
int numberOfCharactersToRemove = 3;
string endingNumber = string.Empty;
int endingNumberLength = startingNumber.Length - numberOfCharactersToRemove;
while (endingNumber.Length < endingNumberLength)
{
if (string.IsNullOrEmpty(endingNumber))
{
// Find the smallest digit in the starting number
for (int i = 1; i <= 9; i++)
{
if (startingNumber.Contains(i.ToString()))
{
endingNumber += i.ToString();
startingNumber = startingNumber.Remove(startingNumber.IndexOf(i.ToString()), 1);
break;
}
}
}
else if (startingNumber.Contains("0"))
{
// Add any zeroes
endingNumber += "0";
startingNumber = startingNumber.Remove(startingNumber.IndexOf("0"), 1);
}
else
{
// Add any remaining numbers from least to greatest
for (int i = 1; i <= 9; i++)
{
if (startingNumber.Contains(i.ToString()))
{
endingNumber += i.ToString();
startingNumber = startingNumber.Remove(startingNumber.IndexOf(i.ToString()), 1);
break;
}
}
}
}
Console.WriteLine(endingNumber);
100023 starting number resulted in 100 being the end result
9999903040404 starting number resulted in 3000044499 being the end result
Here's my version to fix this problem:
DESIGN:
You can sort your list using a binary tree , there are a lot of
implementations , I picked this one
Then you can keep track of the number of the Zeros you have in your
string Finally you will end up with two lists: I named one
SortedDigitsList and the other one ZeroDigitsList
perform a switch case to determine which last 3 digits should be
returned
Here's the complete code:
class MainProgram2
{
static void Main()
{
Tree theTree = new Tree();
Console.WriteLine("Please Enter the string you want to process:");
string input = Console.ReadLine();
foreach (char c in input)
{
// Check if it's a digit or not
if (c >= '0' && c <= '9')
{
theTree.Insert((int)Char.GetNumericValue(c));
}
}
//End of for each (char c in input)
Console.WriteLine("Inorder traversal resulting Tree Sort without the zeros");
theTree.Inorder(theTree.ReturnRoot());
Console.WriteLine(" ");
//Format the output depending on how many zeros you have
Console.WriteLine("The final 3 digits are");
switch (theTree.ZeroDigitsList.Count)
{
case 0:
{
Console.WriteLine("{0}{1}{2}", theTree.SortedDigitsList[0], theTree.SortedDigitsList[1], theTree.SortedDigitsList[2]);
break;
}
case 1:
{
Console.WriteLine("{0}{1}{2}", theTree.SortedDigitsList[0], 0, theTree.SortedDigitsList[2]);
break;
}
default:
{
Console.WriteLine("{0}{1}{2}", theTree.SortedDigitsList[0], 0, 0);
break;
}
}
Console.ReadLine();
}
}//End of main()
}
class Node
{
public int item;
public Node leftChild;
public Node rightChild;
public void displayNode()
{
Console.Write("[");
Console.Write(item);
Console.Write("]");
}
}
class Tree
{
public List<int> SortedDigitsList { get; set; }
public List<int> ZeroDigitsList { get; set; }
public Node root;
public Tree()
{
root = null;
SortedDigitsList = new List<int>();
ZeroDigitsList = new List<int>();
}
public Node ReturnRoot()
{
return root;
}
public void Insert(int id)
{
Node newNode = new Node();
newNode.item = id;
if (root == null)
root = newNode;
else
{
Node current = root;
Node parent;
while (true)
{
parent = current;
if (id < current.item)
{
current = current.leftChild;
if (current == null)
{
parent.leftChild = newNode;
return;
}
}
else
{
current = current.rightChild;
if (current == null)
{
parent.rightChild = newNode;
return;
}
}
}
}
}
//public void Preorder(Node Root)
//{
// if (Root != null)
// {
// Console.Write(Root.item + " ");
// Preorder(Root.leftChild);
// Preorder(Root.rightChild);
// }
//}
public void Inorder(Node Root)
{
if (Root != null)
{
Inorder(Root.leftChild);
if (Root.item > 0)
{
SortedDigitsList.Add(Root.item);
Console.Write(Root.item + " ");
}
else
{
ZeroDigitsList.Add(Root.item);
}
Inorder(Root.rightChild);
}
}
I want to select the text that is between the last '{' and '}' of a richtextbox text.
I have the next code, but I have an error on the "LastIndexOf" function and I don't know how to fix it. Can someone give me some help?
private void highlightText()
{
mRtbxOperations.SelectionStart = mRtbxOperations.Text.LastIndexOf(#"{", 1, mRtbxOperations.SelectionStart);
mRtbxOperations.SelectionLength = mRtbxOperations.Text.IndexOf(#"}", mRtbxOperations.SelectionStart, mRtbxOperations.Text.Length - 1);
mRtbxOperations.SelectionBackColor = Color.LightBlue;
mRtbxOperations.SelectionFont = new Font(mRtbxOperations.SelectionFont, FontStyle.Underline);
mRtbxOperations.SelectionLength = 0;
}
LastIndexOf Error:
The count must be positive and must refer to a location within the
string, array or collection. Parameter name: count
You LastIndexOf parameters are messed up, as well as the Length of the selection, where you need to substract the starting point in order to get the proper length.
Try a simpler version:
int textStart = mRtbxOperations.Text.LastIndexOf(#"{",
mRtbxOperations.SelectionStart);
if (textStart > -1) {
int textEnd = mRtbxOperations.Text.IndexOf(#"}", textStart);
if (textEnd > -1) {
mRtbxOperations.Select(textStart, textEnd - textStart + 1);
mRtbxOperations.SelectionBackColor = Color.LightBlue;
}
}
Seems that you're getting out of the text bounds. When you are getting a substring or an index, you always should use the string bounds, or a substring bounds. Also, you need to check that the selection is valid.
I would rewrite your code as follows:
private void highlightText()
{
Selection selection = GetSelection(mRtbxOperations.Text);
if (selection == null)
return;
mRtbxOperations.SelectionStart = selection.Start;
mRtbxOperations.SelectionLength = selection.Length;
mRtbxOperations.SelectionBackColor = Color.LightBlue;
mRtbxOperations.SelectionFont = new Font(mRtbxOperations.SelectionFont,
FontStyle.Underline);
}
private static Selection GetSelection(string text)
{
int sIndex = text.LastIndexOf(#"{");
if (sIndex == -1)
return null;
int eIndex = text.IndexOf(#"}", sIndex);
if (eIndex == -1)
return null;
return new Selection(sIndex + 1, eIndex);
}
public class Selection
{
public int Start { get; set; }
public int End { get; set; }
public int Length
{
get
{
return End - Start;
}
}
public Selection(int startIndex, int endIndex)
{
this.Start = startIndex;
this.End = endIndex;
}
}
Is there any simple algorithm to determine the likeliness of 2 names representing the same person?
I'm not asking for something of the level that Custom department might be using. Just a simple algorithm that would tell me if 'James T. Clark' is most likely the same name as 'J. Thomas Clark' or 'James Clerk'.
If there is an algorithm in C# that would be great, but I can translate from any language.
Sounds like you're looking for a phonetic-based algorithms, such as soundex, NYSIIS, or double metaphone. The first actually is what several government departments use, and is trivial to implement (with many implementations readily available). The second is a slightly more complicated and more precise version of the first. The latter-most works with some non-English names and alphabets.
Levenshtein distance is a definition of distance between two arbitrary strings. It gives you a distance of 0 between identical strings and non-zero between different strings, which might also be useful if you decide to make a custom algorithm.
Levenshtein is close, although maybe not exactly what you want.
I've faced similar problem and tried to use Levenstein distance first, but it did not work well for me. I came up with an algorithm that gives you "similarity" value between two strings (higher value means more similar strings, "1" for identical strings). This value is not very meaningful by itself (if not "1", always 0.5 or less), but works quite well when you throw in Hungarian Matrix to find matching pairs from two lists of strings.
Use like this:
PartialStringComparer cmp = new PartialStringComparer();
tbResult.Text = cmp.Compare(textBox1.Text, textBox2.Text).ToString();
The code behind:
public class SubstringRange {
string masterString;
public string MasterString {
get { return masterString; }
set { masterString = value; }
}
int start;
public int Start {
get { return start; }
set { start = value; }
}
int end;
public int End {
get { return end; }
set { end = value; }
}
public int Length {
get { return End - Start; }
set { End = Start + value;}
}
public bool IsValid {
get { return MasterString.Length >= End && End >= Start && Start >= 0; }
}
public string Contents {
get {
if(IsValid) {
return MasterString.Substring(Start, Length);
} else {
return "";
}
}
}
public bool OverlapsRange(SubstringRange range) {
return !(End < range.Start || Start > range.End);
}
public bool ContainsRange(SubstringRange range) {
return range.Start >= Start && range.End <= End;
}
public bool ExpandTo(string newContents) {
if(MasterString.Substring(Start).StartsWith(newContents, StringComparison.InvariantCultureIgnoreCase) && newContents.Length > Length) {
Length = newContents.Length;
return true;
} else {
return false;
}
}
}
public class SubstringRangeList: List<SubstringRange> {
string masterString;
public string MasterString {
get { return masterString; }
set { masterString = value; }
}
public SubstringRangeList(string masterString) {
this.MasterString = masterString;
}
public SubstringRange FindString(string s){
foreach(SubstringRange r in this){
if(r.Contents.Equals(s, StringComparison.InvariantCultureIgnoreCase))
return r;
}
return null;
}
public SubstringRange FindSubstring(string s){
foreach(SubstringRange r in this){
if(r.Contents.StartsWith(s, StringComparison.InvariantCultureIgnoreCase))
return r;
}
return null;
}
public bool ContainsRange(SubstringRange range) {
foreach(SubstringRange r in this) {
if(r.ContainsRange(range))
return true;
}
return false;
}
public bool AddSubstring(string substring) {
bool result = false;
foreach(SubstringRange r in this) {
if(r.ExpandTo(substring)) {
result = true;
}
}
if(FindSubstring(substring) == null) {
bool patternfound = true;
int start = 0;
while(patternfound){
patternfound = false;
start = MasterString.IndexOf(substring, start, StringComparison.InvariantCultureIgnoreCase);
patternfound = start != -1;
if(patternfound) {
SubstringRange r = new SubstringRange();
r.MasterString = this.MasterString;
r.Start = start++;
r.Length = substring.Length;
if(!ContainsRange(r)) {
this.Add(r);
result = true;
}
}
}
}
return result;
}
private static bool SubstringRangeMoreThanOneChar(SubstringRange range) {
return range.Length > 1;
}
public float Weight {
get {
if(MasterString.Length == 0 || Count == 0)
return 0;
float numerator = 0;
int denominator = 0;
foreach(SubstringRange r in this.FindAll(SubstringRangeMoreThanOneChar)) {
numerator += r.Length;
denominator++;
}
if(denominator == 0)
return 0;
return numerator / denominator / MasterString.Length;
}
}
public void RemoveOverlappingRanges() {
SubstringRangeList l = new SubstringRangeList(this.MasterString);
l.AddRange(this);//create a copy of this list
foreach(SubstringRange r in l) {
if(this.Contains(r) && this.ContainsRange(r)) {
Remove(r);//try to remove the range
if(!ContainsRange(r)) {//see if the list still contains "superset" of this range
Add(r);//if not, add it back
}
}
}
}
public void AddStringToCompare(string s) {
for(int start = 0; start < s.Length; start++) {
for(int len = 1; start + len <= s.Length; len++) {
string part = s.Substring(start, len);
if(!AddSubstring(part))
break;
}
}
RemoveOverlappingRanges();
}
}
public class PartialStringComparer {
public float Compare(string s1, string s2) {
SubstringRangeList srl1 = new SubstringRangeList(s1);
srl1.AddStringToCompare(s2);
SubstringRangeList srl2 = new SubstringRangeList(s2);
srl2.AddStringToCompare(s1);
return (srl1.Weight + srl2.Weight) / 2;
}
}
Levenstein distance one is much simpler (adapted from http://www.merriampark.com/ld.htm):
public class Distance {
/// <summary>
/// Compute Levenshtein distance
/// </summary>
/// <param name="s">String 1</param>
/// <param name="t">String 2</param>
/// <returns>Distance between the two strings.
/// The larger the number, the bigger the difference.
/// </returns>
public static int LD(string s, string t) {
int n = s.Length; //length of s
int m = t.Length; //length of t
int[,] d = new int[n + 1, m + 1]; // matrix
int cost; // cost
// Step 1
if(n == 0) return m;
if(m == 0) return n;
// Step 2
for(int i = 0; i <= n; d[i, 0] = i++) ;
for(int j = 0; j <= m; d[0, j] = j++) ;
// Step 3
for(int i = 1; i <= n; i++) {
//Step 4
for(int j = 1; j <= m; j++) {
// Step 5
cost = (t.Substring(j - 1, 1) == s.Substring(i - 1, 1) ? 0 : 1);
// Step 6
d[i, j] = System.Math.Min(System.Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost);
}
}
// Step 7
return d[n, m];
}
}
I doubt there is, considering even the Customs Department doesn't seem to have a satisfactory answer...
If there is a solution to this problem I seriously doubt it's a part of core C#. Off the top of my head, it would require a database of first, middle and last name frequencies, as well as account for initials, as in your example. This is fairly complex logic that relies on a database of information.
Second to Levenshtein distance, what language do you want? I was able to find an implementation in C# on codeproject pretty easily.
In an application I worked on, the Last name field was considered reliable.
So presented all the all the records with the same last name to the user.
User could sort by the other fields to look for similar names.
This solution was good enough to greatly reduce the issue of users creating duplicate records.
Basically looks like the issue will require human judgement.