Summary:
I'm beginning with some details about alignment algorithms, and at the end, I ask my question. If you know about alignment algorithm pass the beginning.
Consider we have two strings like:
ACCGAATCGA
ACCGGTATTAAC
There is some algorithms like: Smith-Waterman Or Needleman–Wunsch, that align this two sequence and create a matrix. take a look at the result in the following section:
Smith-Waterman Matrix
§ § A C C G A A T C G A
§ 0 0 0 0 0 0 0 0 0 0 0
A 0 4 0 0 0 4 4 0 0 0 4
C 0 0 13 9 4 0 4 3 9 4 0
C 0 0 9 22 17 12 7 3 12 7 4
G 0 0 4 17 28 23 18 13 8 18 13
G 0 0 0 12 23 28 23 18 13 14 18
T 0 0 0 7 18 23 28 28 23 18 14
A 0 4 0 2 13 22 27 28 28 23 22
T 0 0 3 0 8 17 22 32 27 26 23
T 0 0 0 2 3 12 17 27 31 26 26
A 0 4 0 0 2 7 16 22 27 31 30
A 0 4 4 0 0 6 11 17 22 27 35
C 0 0 13 13 8 3 6 12 26 22 30
Optimal Alignments
A C C G A - A T C G A
A C C G G A A T T A A
Question:
My question is simple, but maybe the answer is not easy as it looks. I want to use a group of character as a single one like: [A0][C0][A1][B1]. But in these algorithms, we have to use individual characters. How can we achieve that?
P.S. Consider we have this sequence: #read #write #add #write. Then I convert this to something like that: #read to A .... #write to B.... #add to C. Then my sequence become to: ABCB. But I have a lot of different words that start with #. And the ASCII table is not enough to convert all of them. Then I need more characters. the only way is to use something like [A0] ... [Z9] for each word. OR to use numbers.
P.S: some sample code for Smith-Waterman is exist in this link
P.S: there is another post that want something like that, but what I want is different. In this question, we have a group of character that begins with a [ and ends with ]. And no need to use semantic like ee is equal to i.
I adapted this Python implementation (GPL version 3 licensed) of both the Smith-Waterman and the Needleman-Wunsch algorithms to support sequences with multiple character groups:
#This software is a free software. Thus, it is licensed under GNU General Public License.
#Python implementation to Smith-Waterman Algorithm for Homework 1 of Bioinformatics class.
#Forrest Bao, Sept. 26 <http://fsbao.net> <forrest.bao aT gmail.com>
# zeros() was origianlly from NumPy.
# This version is implemented by alevchuk 2011-04-10
def zeros(shape):
retval = []
for x in range(shape[0]):
retval.append([])
for y in range(shape[1]):
retval[-1].append(0)
return retval
match_award = 10
mismatch_penalty = -5
gap_penalty = -5 # both for opening and extanding
gap = '----' # should be as long as your group of characters
space = ' ' # should be as long as your group of characters
def match_score(alpha, beta):
if alpha == beta:
return match_award
elif alpha == gap or beta == gap:
return gap_penalty
else:
return mismatch_penalty
def finalize(align1, align2):
align1 = align1[::-1] #reverse sequence 1
align2 = align2[::-1] #reverse sequence 2
i,j = 0,0
#calcuate identity, score and aligned sequeces
symbol = []
found = 0
score = 0
identity = 0
for i in range(0,len(align1)):
# if two AAs are the same, then output the letter
if align1[i] == align2[i]:
symbol.append(align1[i])
identity = identity + 1
score += match_score(align1[i], align2[i])
# if they are not identical and none of them is gap
elif align1[i] != align2[i] and align1[i] != gap and align2[i] != gap:
score += match_score(align1[i], align2[i])
symbol.append(space)
found = 0
#if one of them is a gap, output a space
elif align1[i] == gap or align2[i] == gap:
symbol.append(space)
score += gap_penalty
identity = float(identity) / len(align1) * 100
print 'Identity =', "%3.3f" % identity, 'percent'
print 'Score =', score
print ''.join(align1)
# print ''.join(symbol)
print ''.join(align2)
def needle(seq1, seq2):
m, n = len(seq1), len(seq2) # length of two sequences
# Generate DP table and traceback path pointer matrix
score = zeros((m+1, n+1)) # the DP table
# Calculate DP table
for i in range(0, m + 1):
score[i][0] = gap_penalty * i
for j in range(0, n + 1):
score[0][j] = gap_penalty * j
for i in range(1, m + 1):
for j in range(1, n + 1):
match = score[i - 1][j - 1] + match_score(seq1[i-1], seq2[j-1])
delete = score[i - 1][j] + gap_penalty
insert = score[i][j - 1] + gap_penalty
score[i][j] = max(match, delete, insert)
# Traceback and compute the alignment
align1, align2 = [], []
i,j = m,n # start from the bottom right cell
while i > 0 and j > 0: # end toching the top or the left edge
score_current = score[i][j]
score_diagonal = score[i-1][j-1]
score_up = score[i][j-1]
score_left = score[i-1][j]
if score_current == score_diagonal + match_score(seq1[i-1], seq2[j-1]):
align1.append(seq1[i-1])
align2.append(seq2[j-1])
i -= 1
j -= 1
elif score_current == score_left + gap_penalty:
align1.append(seq1[i-1])
align2.append(gap)
i -= 1
elif score_current == score_up + gap_penalty:
align1.append(gap)
align2.append(seq2[j-1])
j -= 1
# Finish tracing up to the top left cell
while i > 0:
align1.append(seq1[i-1])
align2.append(gap)
i -= 1
while j > 0:
align1.append(gap)
align2.append(seq2[j-1])
j -= 1
finalize(align1, align2)
def water(seq1, seq2):
m, n = len(seq1), len(seq2) # length of two sequences
# Generate DP table and traceback path pointer matrix
score = zeros((m+1, n+1)) # the DP table
pointer = zeros((m+1, n+1)) # to store the traceback path
max_score = 0 # initial maximum score in DP table
# Calculate DP table and mark pointers
for i in range(1, m + 1):
for j in range(1, n + 1):
score_diagonal = score[i-1][j-1] + match_score(seq1[i-1], seq2[j-1])
score_up = score[i][j-1] + gap_penalty
score_left = score[i-1][j] + gap_penalty
score[i][j] = max(0,score_left, score_up, score_diagonal)
if score[i][j] == 0:
pointer[i][j] = 0 # 0 means end of the path
if score[i][j] == score_left:
pointer[i][j] = 1 # 1 means trace up
if score[i][j] == score_up:
pointer[i][j] = 2 # 2 means trace left
if score[i][j] == score_diagonal:
pointer[i][j] = 3 # 3 means trace diagonal
if score[i][j] >= max_score:
max_i = i
max_j = j
max_score = score[i][j];
align1, align2 = [], [] # initial sequences
i,j = max_i,max_j # indices of path starting point
#traceback, follow pointers
while pointer[i][j] != 0:
if pointer[i][j] == 3:
align1.append(seq1[i-1])
align2.append(seq2[j-1])
i -= 1
j -= 1
elif pointer[i][j] == 2:
align1.append(gap)
align2.append(seq2[j-1])
j -= 1
elif pointer[i][j] == 1:
align1.append(seq1[i-1])
align2.append(gap)
i -= 1
finalize(align1, align2)
If we run this with the following input:
seq1 = ['[A0]', '[C0]', '[A1]', '[B1]']
seq2 = ['[A0]', '[A1]', '[B1]', '[C1]']
print "Needleman-Wunsch"
needle(seq1, seq2)
print
print "Smith-Waterman"
water(seq1, seq2)
We get this output:
Needleman-Wunsch
Identity = 60.000 percent
Score = 20
[A0][C0][A1][B1]----
[A0]----[A1][B1][C1]
Smith-Waterman
Identity = 75.000 percent
Score = 25
[A0][C0][A1][B1]
[A0]----[A1][B1]
For the specific changes I made, see: this GitHub repository.
Imagine we have a log file with alphabetic sequences. Like something you said, I converted sequences to A0A1... . For example, if there was a sequence like #read #write #add #write, it converted to A0A1A2A1. Every time, I read two character and compare them but keep score matrix like before. Here is my code in C# for smith-waterman string alignment.
Notice that Cell is a user defined class.
private void alignment()
{
string strSeq1;
string strSeq2;
string strTemp1;
string strTemp2;
scoreMatrix = new int[Log.Length, Log.Length];
// Lists That Holds Alignments
List<char> SeqAlign1 = new List<char>();
List<char> SeqAlign2 = new List<char>();
for (int i = 0; i<Log.Length; i++ )
{
for (int j=i+1 ; j<Log.Length; j++)
{
strSeq1 = "--" + logFile.Sequence(i);
strSeq2 = "--" + logFile.Sequence(j);
//prepare Matrix for Computing optimal alignment
Cell[,] Matrix = DynamicProgramming.Intialization_Step(strSeq1, strSeq2, intSim, intNonsim, intGap);
// Trace back matrix from end cell that contains max score
DynamicProgramming.Traceback_Step(Matrix, strSeq1, strSeq2, SeqAlign1, SeqAlign2);
this.scoreMatrix[i, j] = DynamicProgramming.intMaxScore;
strTemp1 = Reverse(string.Join("", SeqAlign1));
strTemp2 = Reverse(string.Join("", SeqAlign2));
}
}
}
class DynamicProgramming
{
public static Cell[,] Intialization_Step(string Seq1, string Seq2,int Sim,int NonSimilar,int Gap)
{
int M = Seq1.Length / 2 ;//Length+1//-AAA //Changed: /2
int N = Seq2.Length / 2 ;//Length+1//-AAA
Cell[,] Matrix = new Cell[N, M];
//Intialize the first Row With Gap Penality Equal To Zero
for (int i = 0; i < Matrix.GetLength(1); i++)
{
Matrix[0, i] = new Cell(0, i, 0);
}
//Intialize the first Column With Gap Penality Equal To Zero
for (int i = 0; i < Matrix.GetLength(0); i++)
{
Matrix[i, 0] = new Cell(i, 0, 0);
}
// Fill Matrix with each cell has a value result from method Get_Max
for (int j = 1; j < Matrix.GetLength(0); j++)
{
for (int i = 1; i < Matrix.GetLength(1); i++)
{
Matrix[j, i] = Get_Max(i, j, Seq1, Seq2, Matrix,Sim,NonSimilar,Gap);
}
}
return Matrix;
}
public static Cell Get_Max(int i, int j, string Seq1, string Seq2, Cell[,] Matrix,int Similar,int NonSimilar,int GapPenality)
{
Cell Temp = new Cell();
int intDiagonal_score;
int intUp_Score;
int intLeft_Score;
int Gap = GapPenality;
//string temp1, temp2;
//temp1 = Seq1[i*2].ToString() + Seq1[i*2 + 1]; temp2 = Seq2[j*2] + Seq2[j*2 + 1].ToString();
if ((Seq1[i * 2] + Seq1[i * 2 + 1]) == (Seq2[j * 2] + Seq2[j * 2 + 1])) //Changed: +
{
intDiagonal_score = Matrix[j - 1, i - 1].CellScore + Similar;
}
else
{
intDiagonal_score = Matrix[j - 1, i - 1].CellScore + NonSimilar;
}
//Calculate gap score
intUp_Score = Matrix[j - 1, i].CellScore + GapPenality;
intLeft_Score = Matrix[j, i - 1].CellScore + GapPenality;
if (intDiagonal_score<=0 && intUp_Score<=0 && intLeft_Score <= 0)
{
return Temp = new Cell(j, i, 0);
}
if (intDiagonal_score >= intUp_Score)
{
if (intDiagonal_score>= intLeft_Score)
{
Temp = new Cell(j, i, intDiagonal_score, Matrix[j - 1, i - 1], Cell.PrevcellType.Diagonal);
}
else
{
Temp = new Cell(j, i, intDiagonal_score, Matrix[j , i - 1], Cell.PrevcellType.Left);
}
}
else
{
if (intUp_Score >= intLeft_Score)
{
Temp = new Cell(j, i, intDiagonal_score, Matrix[j - 1, i], Cell.PrevcellType.Above);
}
else
{
Temp = new Cell(j, i, intDiagonal_score, Matrix[j , i - 1], Cell.PrevcellType.Left);
}
}
if (MaxScore.CellScore <= Temp.CellScore)
{
MaxScore = Temp;
}
return Temp;
}
public static void Traceback_Step(Cell[,] Matrix, string Sq1, string Sq2, List<char> Seq1, List<char> Seq2)
{
intMaxScore = MaxScore.CellScore;
while (MaxScore.CellPointer != null)
{
if (MaxScore.Type == Cell.PrevcellType.Diagonal)
{
Seq1.Add(Sq1[MaxScore.CellColumn * 2 + 1]); //Changed: All of the following lines with *2 and +1
Seq1.Add(Sq1[MaxScore.CellColumn * 2]);
Seq2.Add(Sq2[MaxScore.CellRow * 2 + 1]);
Seq2.Add(Sq2[MaxScore.CellRow * 2]);
}
if (MaxScore.Type == Cell.PrevcellType.Left)
{
Seq1.Add(Sq1[MaxScore.CellColumn * 2 + 1]);
Seq1.Add(Sq1[MaxScore.CellColumn * 2]);
Seq2.Add('-');
}
if (MaxScore.Type == Cell.PrevcellType.Above)
{
Seq1.Add('-');
Seq2.Add(Sq2[MaxScore.CellRow * 2 + 1]);
Seq2.Add(Sq2[MaxScore.CellRow * 2]);
}
MaxScore = MaxScore.CellPointer;
}
}
}
I need an algorithm to convert an Excel Column letter to its proper number.
The language this will be written in is C#, but any would do or even pseudo code.
Please note I am going to put this in C# and I don't want to use the office dll.
For 'A' the expected result will be 1
For 'AH' = 34
For 'XFD' = 16384
public static int ExcelColumnNameToNumber(string columnName)
{
if (string.IsNullOrEmpty(columnName)) throw new ArgumentNullException("columnName");
columnName = columnName.ToUpperInvariant();
int sum = 0;
for (int i = 0; i < columnName.Length; i++)
{
sum *= 26;
sum += (columnName[i] - 'A' + 1);
}
return sum;
}
int result = colName.Select((c, i) =>
((c - 'A' + 1) * ((int)Math.Pow(26, colName.Length - i - 1)))).Sum();
int col = colName.ToCharArray().Select(c => c - 'A' + 1).
Reverse().Select((v, i) => v * (int)Math.Pow(26, i)).Sum();
Loop through the characters from last to first. Multiply the value of each letter (A=1, Z=26) times 26**N, add to a running total. My string manipulation skill in C# is nonexistent, so here is some very mixed pseudo-code:
sum=0;
len=length(letters);
for(i=0;i<len;i++)
sum += ((letters[len-i-1])-'A'+1) * pow(26,i);
Here is a solution I wrote up in JavaScript if anyone is interested.
var letters = "abc".toUpperCase();
var sum = 0;
for(var i = 0; i < letters.length;i++)
{
sum *= 26;
sum += (letters.charCodeAt(i) - ("A".charCodeAt(0)-1));
}
alert(sum);
Could you perhaps treat it like a base 26 number, and then substitute letters for a base 26 number?
So in effect, your right most digit will always be a raw number between 1 and 26, and the remainder of the "number" (the left part) is the number of 26's collected? So A would represent one lot of 26, B would be 2, etc.
As an example:
B = 2 = Column 2
AB = 26 * 1(A) + 2 = Column 28
BB = 26 * 2(B) + 2 = Column 54
DA = 26 * 4(D) + 1 = Column 105
etc
Shorter version:
int col = "Ab".Aggregate(0, (a, c) => a * 26 + c & 31); // 28
To ignore non A-Za-z characters:
int col = " !$Af$3 ".Aggregate(0, (a, c) => (uint)((c | 32) - 'a') > 25 ? a : a * 26 + (c & 31)); // 32
Here is a basic c++ answer for those who are intrested in c++ implemention.
int titleToNumber(string given) {
int power=0;
int res=0;
for(int i=given.length()-1;i>=0;i--)
{
char c=given[i];
res+=pow(26,power)*(c-'A'+1);
power++;
}
return res;
}
in Excel VBA you could use the .Range Method to get the number, like so:
Dim rng as Range
Dim vSearchCol as variant 'your input column
Set rng.Thisworkbook.worksheets("mySheet").Range(vSearchCol & "1:" & vSearchCol & "1")
Then use .column property:
debug.print rng.column
if you need full code see below:
Function ColumnbyName(vInput As Variant, Optional bByName As Boolean = True) As Variant
Dim Rng As Range
If bByName Then
If Not VBA.IsNumeric(vInput) Then
Set Rng = ThisWorkbook.Worksheets("mytab").Range(vInput & "1:" & vInput & "1")
ColumnbyName = Rng.Column
Else
MsgBox "Please enter valid non Numeric column or change paramter bByName to False!"
End If
Else
If VBA.IsNumeric(vInput) Then
ColumnbyName = VBA.Chr(64 + CInt(vInput))
Else
MsgBox "Please enter valid Numeric column or change paramter bByName to True!"
End If
End If
End Function
I guess this essentially works out pretty much the same as some of the other answers, but it may make a little more clear what's going on with the alpha equivalent of a numeric digit. It's not quite a base 26 system because there is no 0 placeholder. That is, the 26th column would be 'A0' or something instead of Z in base 26. And it's not base 27 because the 'alpha-gits' don't represent powers of 27. Man, it really makes you appreciate what a mess arithmetic must have been before the Babylonians invented the zero!
UInt32 sum = 0, gitVal = 1;
foreach (char alphagit in ColumnName.ToUpperInvariant().ToCharArray().Reverse())
{
sum += gitVal * (UInt32)(alphagit - 'A' + 1)
gitVal *= 26;
}
Like some others, I reversed the character array so I don't need to know anything about exponents.
For this purpose I use only one line:
int ColumnNumber = Application.Range[MyColumnName + "1"].Column;
How do you convert a numerical number to an Excel column name in C# without using automation getting the value directly from Excel.
Excel 2007 has a possible range of 1 to 16384, which is the number of columns that it supports. The resulting values should be in the form of excel column names, e.g. A, AA, AAA etc.
Here's how I do it:
private string GetExcelColumnName(int columnNumber)
{
string columnName = "";
while (columnNumber > 0)
{
int modulo = (columnNumber - 1) % 26;
columnName = Convert.ToChar('A' + modulo) + columnName;
columnNumber = (columnNumber - modulo) / 26;
}
return columnName;
}
If anyone needs to do this in Excel without VBA, here is a way:
=SUBSTITUTE(ADDRESS(1;colNum;4);"1";"")
where colNum is the column number
And in VBA:
Function GetColumnName(colNum As Integer) As String
Dim d As Integer
Dim m As Integer
Dim name As String
d = colNum
name = ""
Do While (d > 0)
m = (d - 1) Mod 26
name = Chr(65 + m) + name
d = Int((d - m) / 26)
Loop
GetColumnName = name
End Function
You might need conversion both ways, e.g from Excel column adress like AAZ to integer and from any integer to Excel. The two methods below will do just that. Assumes 1 based indexing, first element in your "arrays" are element number 1.
No limits on size here, so you can use adresses like ERROR and that would be column number 2613824 ...
public static string ColumnAdress(int col)
{
if (col <= 26) {
return Convert.ToChar(col + 64).ToString();
}
int div = col / 26;
int mod = col % 26;
if (mod == 0) {mod = 26;div--;}
return ColumnAdress(div) + ColumnAdress(mod);
}
public static int ColumnNumber(string colAdress)
{
int[] digits = new int[colAdress.Length];
for (int i = 0; i < colAdress.Length; ++i)
{
digits[i] = Convert.ToInt32(colAdress[i]) - 64;
}
int mul=1;int res=0;
for (int pos = digits.Length - 1; pos >= 0; --pos)
{
res += digits[pos] * mul;
mul *= 26;
}
return res;
}
Sorry, this is Python instead of C#, but at least the results are correct:
def ColIdxToXlName(idx):
if idx < 1:
raise ValueError("Index is too small")
result = ""
while True:
if idx > 26:
idx, r = divmod(idx - 1, 26)
result = chr(r + ord('A')) + result
else:
return chr(idx + ord('A') - 1) + result
for i in xrange(1, 1024):
print "%4d : %s" % (i, ColIdxToXlName(i))
I discovered an error in my first post, so I decided to sit down and do the the math. What I found is that the number system used to identify Excel columns is not a base 26 system, as another person posted. Consider the following in base 10. You can also do this with the letters of the alphabet.
Space:.........................S1, S2, S3 : S1, S2, S3
....................................0, 00, 000 :.. A, AA, AAA
....................................1, 01, 001 :.. B, AB, AAB
.................................... …, …, … :.. …, …, …
....................................9, 99, 999 :.. Z, ZZ, ZZZ
Total states in space: 10, 100, 1000 : 26, 676, 17576
Total States:...............1110................18278
Excel numbers columns in the individual alphabetical spaces using base 26. You can see that in general, the state space progression is a, a^2, a^3, … for some base a, and the total number of states is a + a^2 + a^3 + … .
Suppose you want to find the total number of states A in the first N spaces. The formula for doing so is A = (a)(a^N - 1 )/(a-1). This is important because we need to find the space N that corresponds to our index K. If I want to find out where K lies in the number system I need to replace A with K and solve for N. The solution is N = log{base a} (A (a-1)/a +1). If I use the example of a = 10 and K = 192, I know that N = 2.23804… . This tells me that K lies at the beginning of the third space since it is a little greater than two.
The next step is to find exactly how far in the current space we are. To find this, subtract from K the A generated using the floor of N. In this example, the floor of N is two. So, A = (10)(10^2 – 1)/(10-1) = 110, as is expected when you combine the states of the first two spaces. This needs to be subtracted from K because these first 110 states would have already been accounted for in the first two spaces. This leaves us with 82 states. So, in this number system, the representation of 192 in base 10 is 082.
The C# code using a base index of zero is
private string ExcelColumnIndexToName(int Index)
{
string range = string.Empty;
if (Index < 0 ) return range;
int a = 26;
int x = (int)Math.Floor(Math.Log((Index) * (a - 1) / a + 1, a));
Index -= (int)(Math.Pow(a, x) - 1) * a / (a - 1);
for (int i = x+1; Index + i > 0; i--)
{
range = ((char)(65 + Index % a)).ToString() + range;
Index /= a;
}
return range;
}
//Old Post
A zero-based solution in C#.
private string ExcelColumnIndexToName(int Index)
{
string range = "";
if (Index < 0 ) return range;
for(int i=1;Index + i > 0;i=0)
{
range = ((char)(65 + Index % 26)).ToString() + range;
Index /= 26;
}
if (range.Length > 1) range = ((char)((int)range[0] - 1)).ToString() + range.Substring(1);
return range;
}
This answer is in javaScript:
function getCharFromNumber(columnNumber){
var dividend = columnNumber;
var columnName = "";
var modulo;
while (dividend > 0)
{
modulo = (dividend - 1) % 26;
columnName = String.fromCharCode(65 + modulo).toString() + columnName;
dividend = parseInt((dividend - modulo) / 26);
}
return columnName;
}
Easy with recursion.
public static string GetStandardExcelColumnName(int columnNumberOneBased)
{
int baseValue = Convert.ToInt32('A');
int columnNumberZeroBased = columnNumberOneBased - 1;
string ret = "";
if (columnNumberOneBased > 26)
{
ret = GetStandardExcelColumnName(columnNumberZeroBased / 26) ;
}
return ret + Convert.ToChar(baseValue + (columnNumberZeroBased % 26) );
}
I'm surprised all of the solutions so far contain either iteration or recursion.
Here's my solution that runs in constant time (no loops). This solution works for all possible Excel columns and checks that the input can be turned into an Excel column. Possible columns are in the range [A, XFD] or [1, 16384]. (This is dependent on your version of Excel)
private static string Turn(uint col)
{
if (col < 1 || col > 16384) //Excel columns are one-based (one = 'A')
throw new ArgumentException("col must be >= 1 and <= 16384");
if (col <= 26) //one character
return ((char)(col + 'A' - 1)).ToString();
else if (col <= 702) //two characters
{
char firstChar = (char)((int)((col - 1) / 26) + 'A' - 1);
char secondChar = (char)(col % 26 + 'A' - 1);
if (secondChar == '#') //Excel is one-based, but modulo operations are zero-based
secondChar = 'Z'; //convert one-based to zero-based
return string.Format("{0}{1}", firstChar, secondChar);
}
else //three characters
{
char firstChar = (char)((int)((col - 1) / 702) + 'A' - 1);
char secondChar = (char)((col - 1) / 26 % 26 + 'A' - 1);
char thirdChar = (char)(col % 26 + 'A' - 1);
if (thirdChar == '#') //Excel is one-based, but modulo operations are zero-based
thirdChar = 'Z'; //convert one-based to zero-based
return string.Format("{0}{1}{2}", firstChar, secondChar, thirdChar);
}
}
Same implementation in Java
public String getExcelColumnName (int columnNumber)
{
int dividend = columnNumber;
int i;
String columnName = "";
int modulo;
while (dividend > 0)
{
modulo = (dividend - 1) % 26;
i = 65 + modulo;
columnName = new Character((char)i).toString() + columnName;
dividend = (int)((dividend - modulo) / 26);
}
return columnName;
}
int nCol = 127;
string sChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
string sCol = "";
while (nCol >= 26)
{
int nChar = nCol % 26;
nCol = (nCol - nChar) / 26;
// You could do some trick with using nChar as offset from 'A', but I am lazy to do it right now.
sCol = sChars[nChar] + sCol;
}
sCol = sChars[nCol] + sCol;
Update: Peter's comment is right. That's what I get for writing code in the browser. :-) My solution was not compiling, it was missing the left-most letter and it was building the string in reverse order - all now fixed.
Bugs aside, the algorithm is basically converting a number from base 10 to base 26.
Update 2: Joel Coehoorn is right - the code above will return AB for 27. If it was real base 26 number, AA would be equal to A and the next number after Z would be BA.
int nCol = 127;
string sChars = "0ABCDEFGHIJKLMNOPQRSTUVWXYZ";
string sCol = "";
while (nCol > 26)
{
int nChar = nCol % 26;
if (nChar == 0)
nChar = 26;
nCol = (nCol - nChar) / 26;
sCol = sChars[nChar] + sCol;
}
if (nCol != 0)
sCol = sChars[nCol] + sCol;
..And converted to php:
function GetExcelColumnName($columnNumber) {
$columnName = '';
while ($columnNumber > 0) {
$modulo = ($columnNumber - 1) % 26;
$columnName = chr(65 + $modulo) . $columnName;
$columnNumber = (int)(($columnNumber - $modulo) / 26);
}
return $columnName;
}
Just throwing in a simple two-line C# implementation using recursion, because all the answers here seem far more complicated than necessary.
/// <summary>
/// Gets the column letter(s) corresponding to the given column number.
/// </summary>
/// <param name="column">The one-based column index. Must be greater than zero.</param>
/// <returns>The desired column letter, or an empty string if the column number was invalid.</returns>
public static string GetColumnLetter(int column) {
if (column < 1) return String.Empty;
return GetColumnLetter((column - 1) / 26) + (char)('A' + (column - 1) % 26);
}
Although there are already a bunch of valid answers1, none get into the theory behind it.
Excel column names are bijective base-26 representations of their number. This is quite different than an ordinary base 26 (there is no leading zero), and I really recommend reading the Wikipedia entry to grasp the differences. For example, the decimal value 702 (decomposed in 26*26 + 26) is represented in "ordinary" base 26 by 110 (i.e. 1x26^2 + 1x26^1 + 0x26^0) and in bijective base-26 by ZZ (i.e. 26x26^1 + 26x26^0).
Differences aside, bijective numeration is a positional notation, and as such we can perform conversions using an iterative (or recursive) algorithm which on each iteration finds the digit of the next position (similarly to an ordinary base conversion algorithm).
The general formula to get the digit at the last position (the one indexed 0) of the bijective base-k representation of a decimal number m is (f being the ceiling function minus 1):
m - (f(m / k) * k)
The digit at the next position (i.e. the one indexed 1) is found by applying the same formula to the result of f(m / k). We know that for the last digit (i.e. the one with the highest index) f(m / k) is 0.
This forms the basis for an iteration that finds each successive digit in bijective base-k of a decimal number. In pseudo-code it would look like this (digit() maps a decimal integer to its representation in the bijective base -- e.g. digit(1) would return A in bijective base-26):
fun conv(m)
q = f(m / k)
a = m - (q * k)
if (q == 0)
return digit(a)
else
return conv(q) + digit(a);
So we can translate this to C#2 to get a generic3 "conversion to bijective base-k" ToBijective() routine:
class BijectiveNumeration {
private int baseK;
private Func<int, char> getDigit;
public BijectiveNumeration(int baseK, Func<int, char> getDigit) {
this.baseK = baseK;
this.getDigit = getDigit;
}
public string ToBijective(double decimalValue) {
double q = f(decimalValue / baseK);
double a = decimalValue - (q * baseK);
return ((q > 0) ? ToBijective(q) : "") + getDigit((int)a);
}
private static double f(double i) {
return (Math.Ceiling(i) - 1);
}
}
Now for conversion to bijective base-26 (our "Excel column name" use case):
static void Main(string[] args)
{
BijectiveNumeration bijBase26 = new BijectiveNumeration(
26,
(value) => Convert.ToChar('A' + (value - 1))
);
Console.WriteLine(bijBase26.ToBijective(1)); // prints "A"
Console.WriteLine(bijBase26.ToBijective(26)); // prints "Z"
Console.WriteLine(bijBase26.ToBijective(27)); // prints "AA"
Console.WriteLine(bijBase26.ToBijective(702)); // prints "ZZ"
Console.WriteLine(bijBase26.ToBijective(16384)); // prints "XFD"
}
Excel's maximum column index is 16384 / XFD, but this code will convert any positive number.
As an added bonus, we can now easily convert to any bijective base. For example for bijective base-10:
static void Main(string[] args)
{
BijectiveNumeration bijBase10 = new BijectiveNumeration(
10,
(value) => value < 10 ? Convert.ToChar('0'+value) : 'A'
);
Console.WriteLine(bijBase10.ToBijective(1)); // prints "1"
Console.WriteLine(bijBase10.ToBijective(10)); // prints "A"
Console.WriteLine(bijBase10.ToBijective(123)); // prints "123"
Console.WriteLine(bijBase10.ToBijective(20)); // prints "1A"
Console.WriteLine(bijBase10.ToBijective(100)); // prints "9A"
Console.WriteLine(bijBase10.ToBijective(101)); // prints "A1"
Console.WriteLine(bijBase10.ToBijective(2010)); // prints "19AA"
}
1 This generic answer can eventually be reduced to the other, correct, specific answers, but I find it hard to fully grasp the logic of the solutions without the formal theory behind bijective numeration in general. It also proves its correctness nicely. Additionally, several similar questions link back to this one, some being language-agnostic or more generic. That's why I thought the addition of this answer was warranted, and that this question was a good place to put it.
2 C# disclaimer: I implemented an example in C# because this is what is asked here, but I have never learned nor used the language. I have verified it does compile and run, but please adapt it to fit the language best practices / general conventions, if necessary.
3 This example only aims to be correct and understandable ; it could and should be optimized would performance matter (e.g. with tail-recursion -- but that seems to require trampolining in C#), and made safer (e.g. by validating parameters).
I wanted to throw in my static class I use, for interoping between col index and col Label. I use a modified accepted answer for my ColumnLabel Method
public static class Extensions
{
public static string ColumnLabel(this int col)
{
var dividend = col;
var columnLabel = string.Empty;
int modulo;
while (dividend > 0)
{
modulo = (dividend - 1) % 26;
columnLabel = Convert.ToChar(65 + modulo).ToString() + columnLabel;
dividend = (int)((dividend - modulo) / 26);
}
return columnLabel;
}
public static int ColumnIndex(this string colLabel)
{
// "AD" (1 * 26^1) + (4 * 26^0) ...
var colIndex = 0;
for(int ind = 0, pow = colLabel.Count()-1; ind < colLabel.Count(); ++ind, --pow)
{
var cVal = Convert.ToInt32(colLabel[ind]) - 64; //col A is index 1
colIndex += cVal * ((int)Math.Pow(26, pow));
}
return colIndex;
}
}
Use this like...
30.ColumnLabel(); // "AD"
"AD".ColumnIndex(); // 30
private String getColumn(int c) {
String s = "";
do {
s = (char)('A' + (c % 26)) + s;
c /= 26;
} while (c-- > 0);
return s;
}
Its not exactly base 26, there is no 0 in the system. If there was, 'Z' would be followed by 'BA' not by 'AA'.
if you just want it for a cell formula without code, here's a formula for it:
IF(COLUMN()>=26,CHAR(ROUND(COLUMN()/26,1)+64)&CHAR(MOD(COLUMN(),26)+64),CHAR(COLUMN()+64))
In Delphi (Pascal):
function GetExcelColumnName(columnNumber: integer): string;
var
dividend, modulo: integer;
begin
Result := '';
dividend := columnNumber;
while dividend > 0 do begin
modulo := (dividend - 1) mod 26;
Result := Chr(65 + modulo) + Result;
dividend := (dividend - modulo) div 26;
end;
end;
A little late to the game, but here's the code I use (in C#):
private static readonly string _Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
public static int ColumnNameParse(string value)
{
// assumes value.Length is [1,3]
// assumes value is uppercase
var digits = value.PadLeft(3).Select(x => _Alphabet.IndexOf(x));
return digits.Aggregate(0, (current, index) => (current * 26) + (index + 1));
}
In perl, for an input of 1 (A), 27 (AA), etc.
sub excel_colname {
my ($idx) = #_; # one-based column number
--$idx; # zero-based column index
my $name = "";
while ($idx >= 0) {
$name .= chr(ord("A") + ($idx % 26));
$idx = int($idx / 26) - 1;
}
return scalar reverse $name;
}
Though I am late to the game, Graham's answer is far from being optimal. Particularly, you don't have to use the modulo, call ToString() and apply (int) cast. Considering that in most cases in C# world you would start numbering from 0, here is my revision:
public static string GetColumnName(int index) // zero-based
{
const byte BASE = 'Z' - 'A' + 1;
string name = String.Empty;
do
{
name = Convert.ToChar('A' + index % BASE) + name;
index = index / BASE - 1;
}
while (index >= 0);
return name;
}
More than 30 solutions already, but here's my one-line C# solution...
public string IntToExcelColumn(int i)
{
return ((i<16926? "" : ((char)((((i/26)-1)%26)+65)).ToString()) + (i<2730? "" : ((char)((((i/26)-1)%26)+65)).ToString()) + (i<26? "" : ((char)((((i/26)-1)%26)+65)).ToString()) + ((char)((i%26)+65)));
}
After looking at all the supplied Versions here, I decided to do one myself, using recursion.
Here is my vb.net Version:
Function CL(ByVal x As Integer) As String
If x >= 1 And x <= 26 Then
CL = Chr(x + 64)
Else
CL = CL((x - x Mod 26) / 26) & Chr((x Mod 26) + 1 + 64)
End If
End Function
Refining the original solution (in C#):
public static class ExcelHelper
{
private static Dictionary<UInt16, String> l_DictionaryOfColumns;
public static ExcelHelper() {
l_DictionaryOfColumns = new Dictionary<ushort, string>(256);
}
public static String GetExcelColumnName(UInt16 l_Column)
{
UInt16 l_ColumnCopy = l_Column;
String l_Chars = "0ABCDEFGHIJKLMNOPQRSTUVWXYZ";
String l_rVal = "";
UInt16 l_Char;
if (l_DictionaryOfColumns.ContainsKey(l_Column) == true)
{
l_rVal = l_DictionaryOfColumns[l_Column];
}
else
{
while (l_ColumnCopy > 26)
{
l_Char = l_ColumnCopy % 26;
if (l_Char == 0)
l_Char = 26;
l_ColumnCopy = (l_ColumnCopy - l_Char) / 26;
l_rVal = l_Chars[l_Char] + l_rVal;
}
if (l_ColumnCopy != 0)
l_rVal = l_Chars[l_ColumnCopy] + l_rVal;
l_DictionaryOfColumns.ContainsKey(l_Column) = l_rVal;
}
return l_rVal;
}
}
Here is an Actionscript version:
private var columnNumbers:Array = ['A', 'B', 'C', 'D', 'E', 'F' , 'G', 'H', 'I', 'J', 'K' ,'L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'];
private function getExcelColumnName(columnNumber:int) : String{
var dividend:int = columnNumber;
var columnName:String = "";
var modulo:int;
while (dividend > 0)
{
modulo = (dividend - 1) % 26;
columnName = columnNumbers[modulo] + columnName;
dividend = int((dividend - modulo) / 26);
}
return columnName;
}
JavaScript Solution
/**
* Calculate the column letter abbreviation from a 1 based index
* #param {Number} value
* #returns {string}
*/
getColumnFromIndex = function (value) {
var base = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.split('');
var remainder, result = "";
do {
remainder = value % 26;
result = base[(remainder || 26) - 1] + result;
value = Math.floor(value / 26);
} while (value > 0);
return result;
};
These my codes to convert specific number (index start from 1) to Excel Column.
public static string NumberToExcelColumn(uint number)
{
uint originalNumber = number;
uint numChars = 1;
while (Math.Pow(26, numChars) < number)
{
numChars++;
if (Math.Pow(26, numChars) + 26 >= number)
{
break;
}
}
string toRet = "";
uint lastValue = 0;
do
{
number -= lastValue;
double powerVal = Math.Pow(26, numChars - 1);
byte thisCharIdx = (byte)Math.Truncate((columnNumber - 1) / powerVal);
lastValue = (int)powerVal * thisCharIdx;
if (numChars - 2 >= 0)
{
double powerVal_next = Math.Pow(26, numChars - 2);
byte thisCharIdx_next = (byte)Math.Truncate((columnNumber - lastValue - 1) / powerVal_next);
int lastValue_next = (int)Math.Pow(26, numChars - 2) * thisCharIdx_next;
if (thisCharIdx_next == 0 && lastValue_next == 0 && powerVal_next == 26)
{
thisCharIdx--;
lastValue = (int)powerVal * thisCharIdx;
}
}
toRet += (char)((byte)'A' + thisCharIdx + ((numChars > 1) ? -1 : 0));
numChars--;
} while (numChars > 0);
return toRet;
}
My Unit Test:
[TestMethod]
public void Test()
{
Assert.AreEqual("A", NumberToExcelColumn(1));
Assert.AreEqual("Z", NumberToExcelColumn(26));
Assert.AreEqual("AA", NumberToExcelColumn(27));
Assert.AreEqual("AO", NumberToExcelColumn(41));
Assert.AreEqual("AZ", NumberToExcelColumn(52));
Assert.AreEqual("BA", NumberToExcelColumn(53));
Assert.AreEqual("ZZ", NumberToExcelColumn(702));
Assert.AreEqual("AAA", NumberToExcelColumn(703));
Assert.AreEqual("ABC", NumberToExcelColumn(731));
Assert.AreEqual("ACQ", NumberToExcelColumn(771));
Assert.AreEqual("AYZ", NumberToExcelColumn(1352));
Assert.AreEqual("AZA", NumberToExcelColumn(1353));
Assert.AreEqual("AZB", NumberToExcelColumn(1354));
Assert.AreEqual("BAA", NumberToExcelColumn(1379));
Assert.AreEqual("CNU", NumberToExcelColumn(2413));
Assert.AreEqual("GCM", NumberToExcelColumn(4823));
Assert.AreEqual("MSR", NumberToExcelColumn(9300));
Assert.AreEqual("OMB", NumberToExcelColumn(10480));
Assert.AreEqual("ULV", NumberToExcelColumn(14530));
Assert.AreEqual("XFD", NumberToExcelColumn(16384));
}
Sorry, this is Python instead of C#, but at least the results are correct:
def excel_column_number_to_name(column_number):
output = ""
index = column_number-1
while index >= 0:
character = chr((index%26)+ord('A'))
output = output + character
index = index/26 - 1
return output[::-1]
for i in xrange(1, 1024):
print "%4d : %s" % (i, excel_column_number_to_name(i))
Passed these test cases:
Column Number: 494286 => ABCDZ
Column Number: 27 => AA
Column Number: 52 => AZ
For what it is worth, here is Graham's code in Powershell:
function ConvertTo-ExcelColumnID {
param (
[parameter(Position = 0,
HelpMessage = "A 1-based index to convert to an excel column ID. e.g. 2 => 'B', 29 => 'AC'",
Mandatory = $true)]
[int]$index
);
[string]$result = '';
if ($index -le 0 ) {
return $result;
}
while ($index -gt 0) {
[int]$modulo = ($index - 1) % 26;
$character = [char]($modulo + [int][char]'A');
$result = $character + $result;
[int]$index = ($index - $modulo) / 26;
}
return $result;
}
Another VBA way
Public Function GetColumnName(TargetCell As Range) As String
GetColumnName = Split(CStr(TargetCell.Cells(1, 1).Address), "$")(1)
End Function
Here's my super late implementation in PHP. This one's recursive. I wrote it just before I found this post. I wanted to see if others had solved this problem already...
public function GetColumn($intNumber, $strCol = null) {
if ($intNumber > 0) {
$intRem = ($intNumber - 1) % 26;
$strCol = $this->GetColumn(intval(($intNumber - $intRem) / 26), sprintf('%s%s', chr(65 + $intRem), $strCol));
}
return $strCol;
}