Short Id generation and maximum number of combinations


Okay, this is probably more of a maths question, but since it's related to programming and my web application I'll ask here first:
I'm trying to create short IDs that are 8 characters long. The "pool" to draw the ID from is a combination of numbers and upper- and lower-case letters.
string charPool = "ABCDEFGOPQRSTUVWXY1234567890ZabcdefghijklmHIJKLMNnopqrstuvwxyz";
And if you're interested, here's the method:
private string GenerateRandomCode(int length)
{
string charPool = "ABCDEFGOPQRSTUVWXY1234567890ZabcdefghijklmHIJKLMNnopqrstuvwxyz";
StringBuilder rs = new StringBuilder();
for (int i = 0; i < length; i++)
{
rs.Append(charPool[(int)(_random.NextDouble() * charPool.Length)]);
}
return rs.ToString();
}
How many possible combinations are there for 8-character IDs? Grateful if you can post the equation as well :)
Thanks

options per slot ^ number of slots = number of combinations
a-z is 26, times 2 (for uppers as well) is 52, plus 10 (0-9) is 62. Each ID is 8 chars long, so the result is 62^8, which is pretty big:
218,340,105,584,896 possible unique IDs
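I would suggest doing:
_random.Next(charPool.Length)
instead of:
_random.NextDouble() * charPool.Length
Random.Next(maxValue) returns an integer from 0 up to but not including maxValue, so it yields a valid index directly, without the multiplication and cast. Note that the upper bound is exclusive: passing charPool.Length - 1 would mean the last character in the pool is never picked. NextDouble() returns a value greater than or equal to 0.0 and less than 1.0, so the original expression should not index past the end of the string, but Next saves you from having to reason about that floating-point edge case at all.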
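As a minimal sketch of the formula and of index selection with Random.Next, something like the following could be used. The class and method names (ShortIdSketch, CountCombinations, Generate) are made up for illustration, and the pool is assumed to contain each of the 62 characters exactly once.

```csharp
using System;
using System.Numerics;
using System.Text;

public static class ShortIdSketch
{
    // Assumed pool: 26 upper-case + 26 lower-case + 10 digits = 62 characters.
    private const string CharPool =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // options per slot ^ number of slots
    public static BigInteger CountCombinations(int poolSize, int length)
    {
        return BigInteger.Pow(poolSize, length);
    }

    public static string Generate(Random random, int length)
    {
        var sb = new StringBuilder(length);
        for (int i = 0; i < length; i++)
        {
            // Next(n) returns 0..n-1, so every pool index is reachable
            // and none is out of range.
            sb.Append(CharPool[random.Next(CharPool.Length)]);
        }
        return sb.ToString();
    }
}
```

Calling ShortIdSketch.CountCombinations(62, 8) gives the 62^8 figure quoted above. BigInteger is used only so the same helper stays exact for longer IDs; a long would be enough for 8 characters.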

Related

Calculate number of distinct values between two numbers at a given precision

Context: I am building a random-number generating user interface where a user can enter values for the following:
lowerLimit: the lower limit for each randomly generated number
upperLimit: the upper limit for each randomly generated number
maxPrecision: the maximum precision each randomly generated number
Quantity: the maximum number of random number values to be generated
The question is: how can I ensure that at a given lowerLimit/upperLimit range and at a given precision, that the user does not request a greater quantity than is possible?
Example:
lowerLimit: 1
upperLimit: 1.01
maxPrecision: 3
Quantity: 50
At this precision level (3), there are 11 possible values between 1 and 1.01: 1.000, 1.001, 1.002, 1.003, 1.004, 1.005, 1.006, 1.007, 1.008, 1.009, 1.010, yet the user is asking for 50.
In one version of the function that returns only distinct values that match user criteria, I am using a dictionary object to store already-generated values and if the value already exists, try another random number until I have found X distinct random number values where X is the user-desired quantity. The problem is, my logic allows for a never-ending loop if the number of possible values is less than the user-entered quantity.
While I could probably employ logic to detect runaway condition, I thought it would be a nicer approach to somehow calculate the quantity of possible return values in advance to make sure it is possible. But that logic is eluding me. (Haven't tried anything because I can't think of how to do it).
Please note: I did see the question Generating random, unique values C# but it does not address the specifics of my question relating to the number of possible values at a given precision and the subsequent runaway condition.
private Random RandomSeed = new Random();
public double GetRandomDouble(double lowerBounds, double upperBounds, int maxPrecision)
{
//Return a randomly-generated double between lowerBounds and upperBounds
//with maximum precision of maxPrecision
double x = (RandomSeed.NextDouble() * ((upperBounds - lowerBounds))) + lowerBounds;
return Math.Round(x, maxPrecision);
}
public double[] GetRandomDoublesUnique(double lowerBounds, double upperBounds, int maxPrecision, int quantity)
{
//This method returns an array of doubles containing randomly-generated numbers
//between user-entered lowerBounds and upperBounds with a maximum precision of
//maxPrecision. The array size is capped at user-entered quantity.
//Create Dictionary to store number values already generated so we can ensure
//we don't have duplicates
Dictionary<double, int> myDoubles = new Dictionary<double, int>();
double[] returnValues = new double[quantity];
double nextValue;
for (int i = 0; i < quantity; i++)
{
nextValue = GetRandomDouble(lowerBounds, upperBounds, maxPrecision);
if (!myDoubles.ContainsKey(nextValue))
{
myDoubles.Add(nextValue, i);
returnValues[i] = nextValue;
}
else
{
i -= 1;
}
}
return returnValues;
}

The number of items can be computed by subtracting the "position" of the first value from that of the last (pseudo-code below; use Math.Pow to compute 10^x):
(int)(last * 10 ^ precision) - (int)(first * 10 ^ precision)
This may need to be adjusted depending on whether you want to include the boundaries and whether you take decimal (precise) or float/double as input: some +/-1 and Math.Round may need to be sprinkled in to get the desired results for all expected values.
Once you have the number of items, there are essentially two cases:
There are significantly more choices than desired results (e.g. 1 to 100, take 5 random numbers): use the code you have to filter out duplicates.
The number of choices is close to or less than the desired number of results (e.g. 1 to 10, return 11 random numbers): pre-generate the list of all values and shuffle.
Experiment with the boundary between "significantly more" and "close". I'd use 25% as the boundary (e.g. 1 to 100, take 76: use shuffling) to avoid excessive retries close to the end (which is the exact reason for the slowness/infinite retries of the basic approach).
A correct implementation of shuffle is in Randomize a List<T> (check out similar posts like Generating random, unique values C# for more discussion).
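A rough sketch of both pieces, counting the possible values and the pre-generate-and-shuffle case, might look like this. It assumes decimal inputs and inclusive boundaries, and the names (DistinctValuesSketch, CountPossibleValues, TakeShuffled) are hypothetical; the shuffle is a plain Fisher-Yates.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctValuesSketch
{
    private static decimal Scale(int precision)
    {
        decimal scale = 1m;
        for (int i = 0; i < precision; i++) scale *= 10m;
        return scale;
    }

    // Number of distinct values in [lower, upper] at `precision` decimal places
    // (assumption: both boundaries count as valid values).
    public static long CountPossibleValues(decimal lower, decimal upper, int precision)
    {
        decimal scale = Scale(precision);
        long first = (long)Math.Ceiling(lower * scale);
        long last = (long)Math.Floor(upper * scale);
        return Math.Max(0L, last - first + 1);
    }

    // Pre-generate every value, shuffle, and take at most `quantity` of them.
    public static List<decimal> TakeShuffled(
        decimal lower, decimal upper, int precision, int quantity, Random random)
    {
        decimal scale = Scale(precision);
        long first = (long)Math.Ceiling(lower * scale);
        long count = CountPossibleValues(lower, upper, precision);

        var all = new List<decimal>();
        for (long k = 0; k < count; k++) all.Add((first + k) / scale);

        // Fisher-Yates shuffle
        for (int i = all.Count - 1; i > 0; i--)
        {
            int j = random.Next(i + 1);
            decimal tmp = all[i];
            all[i] = all[j];
            all[j] = tmp;
        }
        return all.GetRange(0, Math.Min(quantity, all.Count));
    }
}
```

With the example from the question (1 to 1.01 at precision 3), CountPossibleValues returns 11, so a request for 50 values can be rejected or capped before any loop starts.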

The easiest way would probably be to convert the values to integers by multiplying them by 10 ^ precision and then subtract:
int lowerInt = (int)(lower * (decimal)Math.Pow(10, precision));
int higherInt = (int)(higher * (decimal)Math.Pow(10, precision));
int possibleValues = higherInt - lowerInt + 1;
I feel like it would defeat the purpose of your project to require the user to know how many possible values there are in advance, since it seems like that's what they are hitting this function for in the first place. I'm assuming that requirement was just to alleviate the technical issues you were having. You can just change your loop to this now:
for (int i = 0; i < possibleValues; i++)

This is what worked based on Josh Williard's answer.
public double[] GetRandomDoublesUnique(double lowerBounds, double upperBounds, int maxPrecision, int quantity)
{
if (lowerBounds >= upperBounds)
{
throw new Exception("Error in GetRandomDoublesUnique is: LowerBounds is greater than or equal to UpperBounds!");
}
//These next few lines are for the purpose of determining the maximum possible number of return values
//possibleValues is populated to prevent a runaway condition that could occur if the
//max possible values--at the given precision level--is less than the user-selected quantity.
//i.e. if user selects 1 to 1.01, precision of 3, and quantity of 50, there would be a problem
// if we didn't limit loop to the 11 possible values at precision of 3:
//1.000, 1.001, 1.002, 1.003, 1.004, 1.005, 1.006, 1.007, 1.008, 1.009, 1.010
int lowerInt = (int)(lowerBounds * (double)Math.Pow(10, maxPrecision));
int higherInt = (int)(upperBounds * (double)Math.Pow(10, maxPrecision));
int possibleValues = higherInt - lowerInt + 1;
//Create Dictionary to store number values already generated so we can ensure
//we don't have duplicates
Dictionary<double, int> myDoubles = new Dictionary<double, int>();
double[] returnValues = new double[(quantity>possibleValues?possibleValues:quantity)];
double NextValue;
//Iterate through and generate values--limiting to both the user-selected quantity and # of possible values
for (int i = 0; (i < quantity)&&(i<possibleValues); i++)
{
NextValue = GetRandomDouble(lowerBounds, upperBounds, maxPrecision);
if (!myDoubles.ContainsKey(NextValue))
{
myDoubles.Add(NextValue, i);
returnValues[i] = NextValue;
}
else
{
i -= 1;
}
}
return returnValues;
}

Algorithm for generating all combinations with 2 potential values in 5 variables

Apologies if this has been answered before but I can't come up with a good name to search for what I'm looking for. I have the potential for between 1-5 string variables (we'll call them A,B,C,D,E) that can have one of two values represented by 'P' and 'S'. These are for pluralized and singular word forms.
The data will always be in the same order, ABCDE, so that is not a concern but it may not contain all five (could be only A, AB, ABC or ABCD). I'm looking for an algorithm that will handle that possibility while generating all potential plural/singular combinations. So in the case of a 5 variable string the results would be:
SSSSS,
SPSSS,
SPPSS,
SPSPS,
...
PPPPP
I have the logic to pluralize and to store the data it's just a question of what is the logic that will generate all those combinations. If it matters, I am working in C#. Any help would be greatly appreciated!

So there are only two possible values, 0 and 1. Wait a minute... Zeroes and ones... Why does that sound familiar...? Ah, binary to the rescue!
Let's count a little in binary, starting with 0.
0000 = 0
0001 = 1
0010 = 2
0011 = 3
0100 = 4
0101 = 5
0110 = 6
0111 = 7
1000 = 8
...etc
If you look at the rightmost bit of the first two rows, we have all the possible combinations for 1 bit, 0 and 1.
If you then look at the two rightmost bits of the first four rows, you get all 2 bit combinations: 00, 01, 10 and 11.
The first eight rows have all three bit combinations, etc.
If you want all possible combinations of x bits, count all numbers from 0 to (2^x)-1 and look at the last x bits of the numbers written in binary.
(Likewise, if you instead have three possible values (0, 1 and 2), you can count between 0 and (3^x)-1 and look at the last x digits when written in ternary, and so on for all possible amounts of values.)
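To make the counting idea concrete for the 'S'/'P' case, here is a small sketch. The names BinaryCountingSketch and AllForms are invented for this example, and it assumes a 0 bit means singular and a 1 bit means plural.

```csharp
using System;
using System.Collections.Generic;

public static class BinaryCountingSketch
{
    // Returns all 2^slots strings of 'S'/'P' by counting from 0 to 2^slots - 1
    // and reading the last `slots` bits of each number.
    public static List<string> AllForms(int slots)
    {
        var result = new List<string>();
        int total = 1 << slots; // 2^slots
        for (int n = 0; n < total; n++)
        {
            var chars = new char[slots];
            for (int pos = 0; pos < slots; pos++)
            {
                // The leftmost character maps to the highest of the `slots` bits.
                bool isSet = (n & (1 << (slots - 1 - pos))) != 0;
                chars[pos] = isSet ? 'P' : 'S';
            }
            result.Add(new string(chars));
        }
        return result;
    }
}
```

Because the slot count is a parameter, the same call covers the shorter inputs from the question (only A, AB, ABC or ABCD) by passing 1 to 4 instead of 5.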

"Recursive permutations C#" will do the trick for a google search. But I thought I'd attempt a solution for you using simple counting and bit masking. Here is some code that will do "binary" counting and, using bitshifting, determine if the position in the words should be pluralized (you mention you have those details already):
string input = "red bag";
string[] tokens = input.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
string[] test = new string[tokens.Length];
int size = 1 << tokens.Length; // 2^n combinations for n tokens
for (int i = 0; i < size; i++)
{
for (int j = 0; j < tokens.Length; j++)
{
int mask = (1 << j);
if ((mask & i) != 0)
{
test[j] = Pluralize(tokens[j]);
}
else
{
test[j] = Singularize(tokens[j]);
}
}
Console.WriteLine(string.Join(" ", test));
}
Output:
red bag
reds bag
red bags
reds bags

I would advise a recursive algorithm. For example, an algorithm like this could be the answer to your problem (I don't really know exactly what return values you want):
public void getAllWords(List<string> result, string prefix, int wordLength)
{
    // Base case: the word is complete, so record it.
    if (wordLength == 0)
        result.Add(prefix);
    else
    {
        // Extend the prefix with each possible value and recurse.
        getAllWords(result, prefix + "0", wordLength - 1);
        getAllWords(result, prefix + "1", wordLength - 1);
    }
}
to be called with
List<string> result = new List<string>();
getAllWords(result, "", 5);
I hope this works, I'm on mobile at the moment.
You can change that to account for a different alphabet (for example values 0, 1, 2, ...) as you like.

You can enumerate all integers from 0 to 2^5-1 (i.e. from 0 to 31) and represent each integer as a bool[]. Maybe this will be helpful:
static bool[][] GetCombinations(int wordCount) {
int length = (int) Math.Pow(2, wordCount);
bool[][] res = new bool[length][];
for (int i = 0; i < length; i++)
{
res [i] = new bool[wordCount];
for (int j = 0; j < wordCount; j++) {
res [i] [j] = ((i & (int)Math.Pow (2, j)) != 0);
}
}
return res;
}

Roman numbers subtract without conversion

Is it possible to subtract Roman numerals without converting them to decimal numbers?
For example:
X - III = VII
So the input is X and III, and the output is VII.
I need an algorithm that does not convert to decimal numbers.
So far I don't have any idea how to do it.

The simplest algorithm would be to create a decrement (--) function for Roman numerals. Subtracting A-B then means repeatedly applying A-- and B-- together until nothing is left in B.
But I wanted to do something more efficient.
Roman numerals can be looked at as positional in a very weak way. We'll use that.
Let's make short tables of subtraction:
X-V=V
X-I=IX
IX-I=VIII
VIII-I=VII
VII-I=VI
VI-I=V
V-I=IV
IV-I=III
III-I=II
II-I=I
I-I=_
And addition:
V+I=VI
And the same for CLX and MDC levels. Of course, you could create only one table, but to use it on different levels by substitution of letters.
Let's take two numbers, for example A=MMDCVI=2606 and B=CCCXLIII=343
Let's distribute them into levels (powers of 10). The next several operations happen inside levels only.
A=MM+DC+VI, B=CCC+XL+III
Then subtracting
A-B= MM+(DC-CCC)+(-XL)+(VI-III)
At every level we have three possible letters: units, five-units and ten-units. The combinations (unit, five-units) and (unit, ten-units) will be translated into differences
A-B= MM+(DC-CCC)+(-L+X)+(VI-III)
The normal combinations (where senior symbol is before junior one), will be translated into sums.
A-B= MM+(D+C-C-C-C)+(-L+X)+(V+I-I-I-I)
Shorten the combinations of same symbols
A-B= MM+(D-C-C)+(-L+X)+(V-I-I)
If some level is negative, borrow a unit from the senior level. Of course, it could work through empty level.
A-B= MM+(D-C-C-C)+(C-L+X)+(V-I-I)
Now, in every level we'll apply the subtraction table we have made, subtracting every minused symbol, starting from the top of the table and repeating until no minused members remain.
A-B= MM+(CD-C-C)+(L+X)+(IV-I)
A-B= MM+(CCC-C)+(L+X)+(III)
A-B= MM+(CC)+(L+X)+(III)
Now, use the addition table
A-B= MM+(CC)+(LX)+(III)
Now we'll open the parentheses. If there is a '_' in some level, there will be nothing in its place.
A-B=MMCCLXIII =2263
The result is correct.

There is a more elegant solution than simply unrolling the whole roman number. The disadvantage of this would be a complexity in O(n) as opposed to O(log n) where n is the input number.
I found this task quite interesting. It is indeed possible without a conversion. Basically, you just have to look at the last digit of each number. If they match, take them away; if not, replace the bigger one. However, the whole task gets a lot more complicated with numbers like "IV", because you need a lookahead.
Here is the code. Since this is most likely a homework assignment, I took out some code so you have to think for yourself about what the rest should look like.
private static char[] romanLetters = { 'I', 'V', 'X', 'L', 'C', 'D', 'M' };
private static string[] vals = { "IIIII", "VV", "XXXXX", "LL", "CCCCC", "DD" };
static string RomanSubtract(string a, string b)
{
var _a = new StringBuilder(a);
var _b = new StringBuilder(b);
var aIndex = a.Length - 1;
var bIndex = b.Length - 1;
while (_a.Length > 0 && _b.Length > 0)
{
if (characters match)
{
if (lookahead for a finds a smaller char)
{
aIndex = ReplaceRomans(_a, aIndex, aChar);
continue;
}
if (lookahead for b finds a smaller char)
{
bIndex = ReplaceRomans(_b, bIndex, bChar);
continue;
}
_a.Remove(aIndex, 1);
_b.Remove(bIndex, 1);
aIndex--;
bIndex--;
}
else if (aChar > bChar)
{
aIndex = ReplaceRomans(_a, aIndex, aChar);
}
else
{
bIndex = ReplaceRomans(_b, bIndex, bChar);
}
}
return _a.Length > 0 ? _a.ToString() : "-" + _b.ToString();
}
private static int ReplaceRomans(StringBuilder roman, int index, int charIndex)
{
if (index > 0)
{
var beforeChar = Array.IndexOf(romanLetters, roman[index - 1]);
if (beforeChar < charIndex)
{
Replace e.g. IX with VIIII
}
}
Replace e.g. V with IIIII
}

Apart from checking every possible combination of input numbers - assuming the input is bounded - there is no way to do what you're asking. Roman numerals are awful in terms of mathematical operations.
You could write an algorithm that doesn't convert them, but it'd have to use decimal numbers at some point. Or you could normalize them to e.g. "IIIII...", but again you'd need to write some equivalences like "50 chars = L".

Rough idea:
Create a "map" or list of how each roman numeral relates to simpler numerals, for instance IV corresponds to (II + II), while V corresponds to (III + II), and X corresponds to (V + V).
When calculating e.g. X - III, treat this not as a mathematical term, but a string, which can be changed in several steps, where you each time check for something to remove from both sides of the minus operator:
X - III // Nothing to remove
(V + V) - III // Still nothing to remove
(III + II + III + II) - III // NOW we can remove a "III" from both sides
// while still treating these as roman numerals.
Result: III + II + II
Rejoined: V + II = VII.
If you make each number correspond to something as simple as possible in the "map" (e.g. III can correspond to (II + I), so you don't get stuck with left-overs), then I'm pretty sure you can figure out some kind of solution here.
Of course this requires a bunch of string-operations, comparisons, and a map from which your algorithm can "know" how to compare or switch values. Not exactly traditional maths, but then again, I suppose this is how roman numerals do work.

The basic sketch of my idea is to build up simple converters that chain together via either iterators or observables.
So, for instance, on the input side of things you have a CConverter that performs the transformations of the combinations CD, CM, D and M into CCCC, CCCCCCCCC, CCCCC, and CCCCCCCCCC respectively. All other received inputs are passed through unmolested. Then the next converter in line, XConverter, converts XL, XC, L and X into the appropriate number of Xs, and so on until you just have a stream of all Is.
Then you perform the subtraction by consuming both of these streams of Is, in lockstep. If the minuend runs out first, then the answer is 0 or negative, in which case everything has gone wrong. Otherwise, when the subtrahend runs out, you just start emitting all remaining Is from the minuend.
Now you need to convert back. So the first INormalizer queues up Is until it's received five of them, then it emits a V. If it reaches the end of the stream and it received four, then it emits IV. Otherwise it just emits as many Is as it received until the end of the stream, and then ends its own stream.
Next, the VNormalizer queues up Vs until it's received two, and then emits an X. If it receives an IV and it has one queued V then it emits IX, otherwise it emits IV.
And if the stream it's receiving ends or just starts sending Is and it still has a V queued, then it emits that, then whatever else the sending stream wanted to send, and then ends its own stream.
And so on, building back up into the correct roman numerals.

Parse the input strings to group the digits in the mixed 5/10 base (M, D, C, L, X, V, I). I.e. MMXVII yields MM||||X|V|II.
Now subtract from right to left, by canceling the digits in pairs. I.e. V|III - II = V|II - I = V|I.
When required, do a borrow, i.e. split the next highest digit (V splits to IIIII, X to VV...). Example: V|I - III = V| - II = IIIII - II = III. Borrows may need to be recursive, like X||I - III = X|| - II = VV| - II = V|IIIII - II = V|III.
The prefix notation (IV, IX, XL, XC...) makes it a little more complicated. An approach is to preprocess the string to remove them on input (substitute with IIII, VIIII, XXXX, LXXXX...) and postprocess to restore them on output.
Example:
XCIX - LVI = LXXXXVIIII - LVI = L|XXXX|V|IIII - L|V|I = L|XXXX|V|III - L|V| = L|XXXX||III - L|| = XXXX||III = XXXXXIII = XLIII
Pure character processing, no arithmetic involved.
Digits= "MDCLXVI"
Divided= ["DD", "CCCCC", "LL", "XXXXX", "VV", "IIIII"]
def In(Input):
return Input.replace("CM", "DCCCC").replace("CD", "CCCC").replace("XC", "LXXXX").replace("XL", "XXXX").replace("IX", "VIIII").replace("IV", "IIII")
def Group(Input):
Groups= []
for Digit in Digits:
# Split after the last digit
m= Input.rfind(Digit) + 1
Groups.append(Input[:m])
Input= Input[m:]
return Groups
def Decrement(A, i):
if len(A[i]) == 0:
# Borrow
Decrement(A, i - 1)
A[i]= Divided[i - 1] + A[i]
A[i]= A[i][:-1]
def Subtract(A, B):
for i in range(len(Digits) - 1, -1, -1):
while len(B[i]) > 0:
Decrement(A, i)
B[i]= B[i][:-1]
def Out(Input):
return Input.replace("DCCCC", "CM").replace("CCCC", "CD").replace("LXXXX", "XC").replace("XXXX", "XL").replace("VIIII", "IX").replace("IIII", "IV")
A= Group(In("MMDCVI"))
B= Group(In("CCCXLIII"))
Subtract(A, B)
print(Out("".join(A)))
>>>
MMCCLXIII

How about an Enum?
public enum RomanNumber
{
I = 1,
II = 2,
III = 3,
IV = 4,
V = 5,
VI = 6,
VII = 7,
VIII = 8,
IX = 9,
X = 10
}
Then using it like this:
int newRomanNumber = (int) RomanNumber.X - (int) RomanNumber.III;
If your input is 'X - III = VII', then you will also have to parse this string.
But I won't do this work for you. ;-)

Does a 2-char check digit for a barcode use the first or second char?

Based on my understanding of how check digits are supposed to be calculated for barcodes, namely:
0) Sum the values of all the characters at odd indexes (1, 3, etc.)
1) Multiply that sum by 3
2) Sum the values of all the characters at even indexes (0, 2, etc.)
3) Combine the two sums from steps 1 and 2
4) Calculate the check digit by subtracting the modulus 10 of the combined sum from 10
So for example, with a barcode "04900000634" the combined sum is 40*; To get the check sum, the modulus (40 % 10) == 0, and then 10 - 0 == 10.
Odd characters == 7; X3 = 21; Even characters == 19, for a combined sum of 40.
Since a check digit is a scalar value, what if the result of the check digit calculation is 10? Does one use "0" or "1"?
Here is the code I'm using (thanks to some help from here: Why does 1 + 0 + 0 + 0 + 3 == 244?); I'm assuming that the formula pseudocoded above applies regardless of the length (8 chars, 12 chars, etc.) and type (128, EAN8, EAN12, etc.) of the barcode.
private void button1_Click(object sender, EventArgs e)
{
string barcodeWithoutCzechSum = textBox1.Text.Trim();
string czechSum = GetBarcodeChecksum(barcodeWithoutCzechSum);
string barcodeWithCzechSum = string.Format("{0}{1}", barcodeWithoutCzechSum, czechSum);
label1.Text = barcodeWithCzechSum;
}
public static string GetBarcodeChecksum(string barcode)
{
int oddTotal = sumOddVals(barcode);
int oddTotalTripled = oddTotal*3;
int evenTotal = sumEvenVals(barcode);
int finalTotal = oddTotalTripled + evenTotal;
int czechSum = 10 - (finalTotal % 10);
return czechSum.ToString();
}
private static int sumEvenVals(string barcode)
{
int cumulativeVal = 0;
for (int i = 0; i < barcode.Length; i++)
{
if (i%2 == 0)
{
cumulativeVal += Convert.ToInt16(barcode[i] - '0');
}
}
return cumulativeVal;
}
private static int sumOddVals(string barcode)
{
int cumulativeVal = 0;
for (int i = 0; i < barcode.Length; i++)
{
if (i % 2 != 0)
{
cumulativeVal += Convert.ToInt16(barcode[i] - '0');
}
}
return cumulativeVal;
}
UPDATE
The calculator here: http://www.gs1us.org/resources/tools/check-digit-calculator claims that the check digit for 04900000634 is 6
How is that being arrived at?
UPDATE 2
This http://www.gs1.org/barcodes/support/check_digit_calculator
revises my understanding of the last part of the equation/formula, where it says, "Subtract the sum from nearest equal or higher multiple of ten = 60- 57 = 3 (Check Digit)"
So, in the case of 04900000634, the combined sum is 40. Based on that formula, the "nearest equal or higher multiple of ten" of 40 is 40, so 40-40=0, and I would expect that to be the check sum (not 6)...so, still confused...
UPDATE 3
I'm not understanding why yet, but mike z's comment must be correct, because when I reverse the "==" and "!=" logic in the sumOddVals() and sumEvenVals() functions, my results correspond to those generated by http://www.gs1us.org/resources/tools/check-digit-calculator
UPDATE 4
Apparently, based on http://en.wikipedia.org/wiki/European_Article_Number, the powers that be behind check digit calculations don't consider the first position to be position 0, but position 1. Confusing for developers, trained to see the first item as residing at index 0, not 1!

The check digit is always last.
Starting with the digit immediately to the left of the check digit and moving LEFT, sum each digit, applying a weight of 3 and 1 alternately.
The check digit is then the number which needs to be added to produce a result that is a multiple of 10.
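The rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the input holds the digits without the check digit, contains nothing but '0'-'9', and the names Gs1CheckDigitSketch and Compute are invented here.

```csharp
using System;

public static class Gs1CheckDigitSketch
{
    // `payload` is the code WITHOUT its check digit, digits only (assumption).
    public static int Compute(string payload)
    {
        int sum = 0;
        int weight = 3; // the digit immediately left of the check digit gets weight 3
        for (int i = payload.Length - 1; i >= 0; i--)
        {
            sum += (payload[i] - '0') * weight;
            weight = (weight == 3) ? 1 : 3; // alternate 3, 1, 3, 1, ... moving left
        }
        // The amount needed to reach the next multiple of 10;
        // the outer % 10 maps a result of 10 to 0.
        return (10 - sum % 10) % 10;
    }
}
```

For the barcode in the question, Compute("04900000634") sums to 64 and returns 6, which agrees with the GS1 calculator result quoted there. Walking from the right means the length of the code never has to be checked.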
This works for ALL EAN/UPC codes: UPC-E, EAN-8 (which is all valid 8-digit codes except those which start with 0, 6 or 7), UPC-A (12-digit), EAN-13, EAN-14 (sometimes called "TUN" or "Carton" codes) and SSCCs (actually 18-digit, but implemented as part of the EAN128 standard with an AI of '00', misleading some into believing they're 20-digit codes).
When UPC-E was introduced, the original scheme was [language][company][product][check]. 0,6 and 7 were assigned to English and the remainder unassigned. [company] and [product] were variable-length with total 6 digits; short company numbers for companies with many products, long for companies with few products.
EAN used the remainder of the numbers, but assigned [country][company][product][check] where country was 2-digit.
That system soon ran out of puff, but is still occasionally assigned for very small products - and the original products that had numbers before UPC-A/EAN-13 was introduced.
UPC-A used the same schema as UPC-E, but lost the reference to 'language'. 0,6 and 7 were assigned to US/Canada. The company+product was extended to 10 digits.
EAN-13 extended the scheme to 13 digits, 2 for country, 10 for company+product, 1 to check. UPC-A was compatible by prefixing a leading "0".
By implementing the 13-digit scheme, US companies could track each of these codes and UPC-As did not need to be issued on products that already had an EAN-13 assigned. This was scheduled for completion about 8 years ago, but some companies still lag behind.
EAN-14s are used for carton outers. The leading digit is normally referred to as a "Trade Unit Identifier/Number", hence the entire code is sometimes called a TUN. At first, there was an attempt to codify the leading digit (1=1doz, 2=2doz, etc.) but this was soon abandoned. Most companies use the number as a packaging level (1=cluster of individual items, 2=tray of clusters, 3=box of trays), depending on each company's preference. 9 is reserved. It is not a good idea to use 0 (though some companies have) since it produces the same check-digit as the 13-digit code. I've used this for EAN128 codes bearing the batch number on non-retail goods; AI=01;EAN-14 (=EAN13 with TUN=0);AI=10;batch-number.
SSCCs are another can of worms. They're 18-digit - the first digit was originally used as a logistical descriptor, then there's the country-code, manufacturer-code and package-number with a check-digit. Originally, "3" meant an "external" pallet and "4" an "Internal" pallet, but this fell into disuse as impractical as an "Internal" pallet then has to be re-numbered if it gets sent "outside" and vice-versa.
And of course 2-digit country-codes have been supplanted by 3-digit as more countries have adopted the system.

There are different weights for different barcode formats. You have described the format for the EAN format - a 1313 weighting. Whereas UPC uses a 3131 weighting scheme. ISBN-10 uses a completely different scheme - the weights are different and the calculation is done modulo 11.
I think the reference you are using is assuming that the digits are indexed starting at 1 not 0. The effect is that you have mixed up odd and even characters. So the sum is 3 x 19 + 7 = 64 and therefore the check digit is 6 not 0. For EAN and UPC, the check digit is the value that must be added to the sum to get a number evenly divisible by 10.
Update
Your description of the check digit algorithm is accurate only for certain classes of EAN barcodes because the weights are aligned such that the last digit is always weighted by 3 (see EAN Number). Therefore, depending on the exact EAN scheme (8,12,13 or 14 digit) odd or even digits are weighted differently.
Thus the proper weights are
0 4 9 0 0 0 0 0 6 3 4
3 1 3 1 3 1 3 1 3 1 3
Giving a sum of 64 and a check digit of 6.

Based on this: http://www.gs1.org/barcodes/support/check_digit_calculator, barcode calculation formulas can either start with 1, or start with 3, based on whether the ultimate length of the barcode (including the checksum val) is even or odd. If the total number of chars, including the checksum, is even, the 1st digit has a weight of three; otherwise (total char count is odd), the 1st digit has a weight of 1. In either case, 3s and 1s alternate, as "13131313..." or "31313131..."
But they always seem to end with a weight of 3; so, it shouldn't matter how long the barcode is, or whether it is odd or even. Simply calculate the value "backwards," assuming the last digit has a weight of 3; HOWEVER, whether the barcode is of even or odd length, that is to say, whether the last digit and those that alternate with it are even or odd makes all the difference in the world, so that has to be noted, too. The "inside" ordinals begin with the penultimate character in the barcode, and skip one backwards; the "outside" ordinals are the last one and then every other one. Anyway, here is the code which, AFAIK, should work to generate and validate/verify check digits for all barcode types:
private void button1_Click(object sender, EventArgs e)
{
string barcodeWithoutCheckSum = textBox1.Text.Trim();
string checkSum = GetBarcodeChecksum(barcodeWithoutCheckSum);
string barcodeWithCheckSum = string.Format("{0}{1}", barcodeWithoutCheckSum, checkSum);
label1.Text = barcodeWithCheckSum;
textBox1.Focus();
}
public static string GetBarcodeChecksum(string barcode)
{
int oddTotal;
int oddTotalTripled;
int evenTotal;
// Which positions are odd or even depend on the length of the barcode,
// or more specifically, whether its length is odd or even, so:
if (isStringOfEvenLen(barcode))
{
oddTotal = sumInsideOrdinals(barcode);
oddTotalTripled = oddTotal * 3;
evenTotal = sumOutsideOrdinals(barcode);
}
else
{
oddTotal = sumOutsideOrdinals(barcode);
oddTotalTripled = oddTotal * 3;
evenTotal = sumInsideOrdinals(barcode);
}
int finalTotal = oddTotalTripled + evenTotal;
int modVal = finalTotal%10;
int checkSum = 10 - modVal;
if (checkSum == 10)
{
return "0";
}
return checkSum.ToString();
}
private static bool isStringOfEvenLen(string barcode)
{
return (barcode.Length % 2 == 0);
}
// Sums the digits at odd zero-based indexes (1, 3, 5, ...), which the
// checkdigitmeisters count as the Second, Fourth, Sixth, ... positions
private static int sumInsideOrdinals(string barcode)
{
int cumulativeVal = 0;
for (int i = barcode.Length-1; i > -1; i--)
{
if (i % 2 != 0)
{
cumulativeVal += Convert.ToInt16(barcode[i] - '0');
}
}
return cumulativeVal;
}
// Sums the digits at even zero-based indexes (0, 2, 4, ...), which the
// checkdigitmeisters count as the First, Third, Fifth, ... positions
private static int sumOutsideOrdinals(string barcode)
{
int cumulativeVal = 0;
for (int i = barcode.Length - 1; i > -1; i--)
{
if (i % 2 == 0)
{
cumulativeVal += Convert.ToInt16(barcode[i] - '0');
}
}
return cumulativeVal;
}
UPDATE
With the above code, it is easy enough to add a function to verify that a barcode (with appended checkdigit) is valid:
private static bool isValidBarcodeWithCheckDigit(string barcodeWithCheckDigit)
{
string barcodeSansCheckDigit = barcodeWithCheckDigit.Substring(0, barcodeWithCheckDigit.Length - 1);
string checkDigit = barcodeWithCheckDigit.Substring(barcodeWithCheckDigit.Length - 1, 1);
return GetBarcodeChecksum(barcodeSansCheckDigit) == checkDigit;
}

Efficient algorithm for finding the largest overlapping range given a list of ranges

Consider the following interface that describes a continuous range of integer values.
public interface IRange {
int Minimum { get;}
int Maximum { get;}
IRange LargestOverlapRange(IEnumerable<IRange> ranges);
}
I am looking for an efficient algorithm to find the largest overlap range given a list of IRange objects. The idea is briefly outlined in the following diagram, where the top numbers represent the integer values and each |-----| represents an IRange object with a min and max value. I stacked the IRange objects so that the solution is easy to visualize.
0123456789 ... N
|-------| |------------| |-----|
|---------| |---|
|---| |------------|
|--------| |---------------|
|----------|
Here, the LargestOverlapRange method would return:
|---|
Since that range has a total of 4 'overlaps'. If there are two separate IRange with the same number of overlaps, I want to return null.
Here is some brief code of what I tried.
public class Range : IRange
{
public int Minimum { get; set; }
public int Maximum { get; set; }
public IRange LargestOverlapRange(IEnumerable<IRange> ranges) {
int maxInt = 20000;
// Create a histogram of the counts
int[] histogram = new int[maxInt];
foreach(IRange range in ranges) {
for(int i=range.Minimum; i <= range.Maximum; i++) {
histogram[i]++;
}
}
// Find the mode of the histogram
int mode = 0;
int bin = 0;
for(int i =0; i < maxInt; i++) {
if(histogram[i] > mode) {
mode = histogram[i];
bin = i;
}
}
// Construct a new range of the mode values, if they are continuous
Range result = null;
for(int i = bin; i < maxInt; i++) {
if(histogram[i] == mode) {
if(result != null)
return null; // violates two ranges with the same mode
result = new Range();
result.Minimum = i;
while(i < maxInt && histogram[i] == mode)
i++;
result.Maximum = i - 1; // last index that still had the mode count
}
}
return result;
}
}
This involves four loops and is easily O(n^2) if not higher. Is there a more efficient algorithm (speed wise) to find the largest overlap range from a list of other ranges?
EDIT
Yes, the O(n^2) is not correct, I was thinking about it incorrectly. It should be O(N * M) as was pointed out in the comments.
EDIT 2
Let me stipulate a few things. First, the integer values will lie between 0 and 20000. Secondly, the average number of IRange objects will be on the order of 100. I don't know if this will change the way the algorithm is designed.
EDIT 3
I am implementing this algorithm on a scientific instrument (a mass spectrometer) in which the speed of the data processing is paramount to the quality of data (faster analysis time = more spectra collected in time T). The firmware language (proprietary) only has arrays and is not object oriented. I chose C# since I am decent at porting concepts between the two languages and thought that, in the interest of the SO community, a good answer would have a wider audience.

Convert your list of ranges to a list of start and stop points. Sort the list with an O(n log n) algorithm. Now you can iterate through the list and increment or decrement a counter depending on whether it's a start or stop point, which will give you the current overlap depth.
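A minimal sketch of that sweep, assuming C# 7 tuples stand in for IRange and that the result should match the histogram approach from the question: the run of integers covered by the most ranges, or null when that depth is reached in more than one separate run. The names OverlapSweep and DeepestOverlap are invented for this example.

```csharp
using System;
using System.Collections.Generic;

public static class OverlapSweep
{
    // Returns the inclusive span covered by the most ranges, or null when the
    // input is empty or the maximum depth occurs in more than one separate span.
    public static (int Min, int Max)? DeepestOverlap(IEnumerable<(int Min, int Max)> ranges)
    {
        // An inclusive range [min, max] adds 1 at min and removes 1 just past max.
        var events = new List<(int Pos, int Delta)>();
        foreach (var r in ranges)
        {
            events.Add((r.Min, +1));
            events.Add((r.Max + 1, -1));
        }
        events.Sort((x, y) => x.Pos.CompareTo(y.Pos)); // O(n log n)

        int depth = 0, bestDepth = 0, bestMin = 0, bestMax = 0, spansAtBest = 0;
        int i = 0;
        while (i < events.Count)
        {
            int pos = events[i].Pos;
            // Apply every event at this position before reading the depth.
            while (i < events.Count && events[i].Pos == pos)
            {
                depth += events[i].Delta;
                i++;
            }
            if (i == events.Count) break;      // past the last range, depth is 0
            int segEnd = events[i].Pos - 1;    // depth holds on [pos, segEnd]

            if (depth > bestDepth)
            {
                bestDepth = depth; bestMin = pos; bestMax = segEnd; spansAtBest = 1;
            }
            else if (depth == bestDepth && depth > 0)
            {
                if (pos == bestMax + 1)
                {
                    bestMax = segEnd;          // the same span simply continues
                }
                else
                {
                    spansAtBest++;             // a separate span ties the best depth
                    bestMax = segEnd;
                }
            }
        }
        if (spansAtBest != 1) return null;
        return (bestMin, bestMax);
    }
}
```

For the ranges 0 to 5, 2 to 6, 4 to 8 and 6 to 9 this returns (4, 6): every integer from 4 to 6 is covered by three ranges, although not by the same three. Since the firmware only has arrays, the same idea should port to two sorted arrays of start and stop positions.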

As I understood OP's question, the solution given the 3 ranges
A: 012
B: 123
C: 34
would be the range 12 (a common subset of A and B), not range 123 (because it isn't a common subset of any pair).
Think about the algorithm on paper before writing any code. How about a dynamic programming solution? (If you don't know dynamic programming, it's worth reading about it in a book). The idea of dynamic programming is to build up solutions of simpler subproblems.
Let f_i(n, k) be the size of the longest interval starting at n common to at least k of the first i given ranges.
You can work out f_1 from f_0, and f_2 from f_1 and so on. Updating the functions just depends on the one extra range considered.
Suppose there are M ranges. The values of f_M will tell us the answer to your problem.
The deepest depth you talked about is the greatest k such that f_M(n, k) is non zero for some n. Let's call that maximal depth K. Then we look for the maximum of f_M(n, K) over n. Its maximum is the size of your largest range, which begins at the maximising n.
The maximising n must be the lower bound of some range, so we only need to calculate f for those values of n. There are M ranges, so at most M lower bounds. With M update steps, at most M values of n, and depths up to K, this algorithm has complexity O(M·M·K) = O(M^2 K).
Let the ith range be from a to b. Start from f_0(n,k) = 0 for every k >= 1.
If n is outside a to b, then there is no change:
f_i(n,k) = f_(i-1)(n,k)
If n is within a to b, we test the k-deep solution made by combining the fresh interval with our old (k-1)-deep solution. We only use it if it's better than what we already had:
f_i(n,k) = max( f_(i-1)(n,k) , min( f_(i-1)(n,k-1) , b-n+1 ) )
For k = 1, treat f_(i-1)(n,0) as unbounded, so the second term is just b-n+1.
Example! For ranges 0 to 5, 2 to 6, 4 to 8, and 6 to 9.
n         0123456789
          ......      range 0 to 5
f_1(n,1)  6543210000
            .....     range 2 to 6
f_2(n,1)  6554321000
f_2(n,2)  0043210000
              .....   range 4 to 8
f_3(n,1)  6554543210
f_3(n,2)  0043321000
f_3(n,3)  0000210000
                ....  range 6 to 9
f_4(n,1)  6554544321
f_4(n,2)  0043323210
f_4(n,3)  0000211000
f_4(n,4)  0000000000
Thus the deepest depth K is 3, and the longest range at that depth is 4 to 5. We can also see that the longest range of depth 2 has size 4 and starts at 2.
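The recurrence can be sketched in code as follows. For simplicity this version fills the table for every n from 0 to size-1 instead of only the lower bounds, updates the table in place, and uses int.MaxValue as the "unbounded" value for k = 0. The names OverlapDp, BuildTable and LongestAtMaxDepth are invented for the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class OverlapDp
{
    // f[n, k] = length of the longest interval starting at n that is common
    // to at least k of the ranges processed so far (0 when there is none).
    public static int[,] BuildTable(IList<(int Min, int Max)> ranges, int size)
    {
        int m = ranges.Count;
        var f = new int[size, m + 1];
        // k = 0 places no constraint, so that column holds a large sentinel.
        for (int n = 0; n < size; n++) f[n, 0] = int.MaxValue;

        foreach (var (a, b) in ranges)
        {
            for (int n = a; n <= b; n++)
            {
                // Walk k downward so f[n, k - 1] still holds the previous step's value.
                for (int k = m; k >= 1; k--)
                {
                    int candidate = Math.Min(f[n, k - 1], b - n + 1);
                    f[n, k] = Math.Max(f[n, k], candidate);
                }
            }
        }
        return f;
    }

    // Finds the greatest depth k with a non-zero entry, then the longest
    // interval at that depth. Ties go to the smallest starting n.
    public static (int Min, int Max)? LongestAtMaxDepth(IList<(int Min, int Max)> ranges, int size)
    {
        var f = BuildTable(ranges, size);
        for (int k = ranges.Count; k >= 1; k--)
        {
            int bestN = -1;
            for (int n = 0; n < size; n++)
            {
                if (f[n, k] > 0 && (bestN < 0 || f[n, k] > f[bestN, k])) bestN = n;
            }
            if (bestN >= 0) return (bestN, bestN + f[bestN, k] - 1);
        }
        return null;
    }
}
```

Called with the four example ranges and size 10, BuildTable reproduces the f_4 rows above, and LongestAtMaxDepth returns (4, 5).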
