C#: Convert COMP-3 Packed Decimal to Human-Readable Value

I have a series of ASCII flat files coming in from a mainframe to be processed by a C# application. A new feed has been introduced with a Packed Decimal (COMP-3) field, which needs to be converted to a numerical value.
The files are being transferred via FTP, using ASCII transfer mode. I am concerned that the binary field may contain what will be interpreted as very-low ASCII codes or control characters instead of a value - Or worse, may be lost in the FTP process.
What's more, the fields are being read as strings. I may have the flexibility to work around this part (i.e. a stream of some sort), but the business will give me pushback.
The requirement read "Convert from HEX to ASCII", but clearly that didn't yield the correct values. Any help would be appreciated; it need not be language-specific as long as you can explain the logic of the conversion process.

I have been watching the posts on numerous boards concerning converting Comp-3 BCD data from "legacy" mainframe files to something useable in C#. First, I would like to say that I am less than enamoured by the responses that some of these posts have received - especially those that have said essentially "why are you bothering us with these non-C#/C++ related posts" and also "If you need an answer about some sort of COBOL convention, why don't you go visit a COBOL oriented site". This, to me, is complete BS as there is going to be a need for probably many years to come, (unfortunately), for software developers to understand how to deal with some of these legacy issues that exist in THE REAL WORLD. So, even if I get slammed on this post for the following code, I am going to share with you a REAL WORLD experience that I had to deal with regarding COMP-3/EBCDIC conversion (and yes, I am he who talks of "floppy disks, paper-tape, Disc Packs etc... - I have been a software engineer since 1979").
First - understand that any file that you read from a legacy main-frame system like IBM is going to present the data to you in EBCDIC format and in order to convert any of that data to a C#/C++ string you can deal with you are going to have to use the proper code page translation to get the data into ASCII format. A good example of how to handle this would be:
StreamReader readFile = new StreamReader(path, Encoding.GetEncoding(37)); // code page 37 = IBM EBCDIC (US-Canada); text is translated as it is read
This will ensure that anything that you read from this stream is translated out of EBCDIC and can be used in a string format. This includes "Zoned Decimal" (Pic 9) and "Text" (Pic X) fields as declared by COBOL. However, this does not convert COMP-3 fields to the correct "binary" equivalent when read into a char[] or byte[] array. No code page (UTF-8, UTF-16, Default or whatever) is ever going to translate those properly; to get at the raw bytes you are going to want to open the file like this:
FileStream fileStream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
Of course, the "FileShare.Read" option is "optional".
When you have isolated the field that you want to convert to a decimal value (and then subsequently to an ASCII string if need be), you can use the following code - and this has been basically stolen from the Microsoft "UnpackDecimal" posting that you can get at:
http://www.microsoft.com/downloads/details.aspx?familyid=0e4bba52-cc52-4d89-8590-cda297ff7fbd&displaylang=en
I have isolated (I think) what are the most important parts of this logic and consolidated them into a method that you can do with what you want. For my purposes, I chose to leave this as returning a Decimal value which I could then do with what I wanted. Basically, the method is called "Unpack" and you pass it a byte[] array (no longer than 12 bytes) and the scale as an int, which is the number of decimal places you want to have returned in the Decimal value. I hope this works for you as well as it did for me.
private Decimal Unpack(byte[] inp, int scale)
{
long lo = 0;
long mid = 0;
long hi = 0;
bool isNegative;
// this nybble stores only the sign, not a digit.
// "C" hex is positive, "D" hex is negative, and "F" hex is unsigned.
switch (nibble(inp, 0))
{
case 0x0D:
isNegative = true;
break;
case 0x0F:
case 0x0C:
isNegative = false;
break;
default:
throw new Exception("Bad sign nibble");
}
long intermediate;
long carry;
long digit;
for (int j = inp.Length * 2 - 1; j > 0; j--)
{
// multiply by 10
intermediate = lo * 10;
lo = intermediate & 0xffffffff;
carry = intermediate >> 32;
intermediate = mid * 10 + carry;
mid = intermediate & 0xffffffff;
carry = intermediate >> 32;
intermediate = hi * 10 + carry;
hi = intermediate & 0xffffffff;
carry = intermediate >> 32;
// By limiting input length to 14, we ensure overflow will never occur
digit = nibble(inp, j);
if (digit > 9)
{
throw new Exception("Bad digit");
}
intermediate = lo + digit;
lo = intermediate & 0xffffffff;
carry = intermediate >> 32;
if (carry > 0)
{
intermediate = mid + carry;
mid = intermediate & 0xffffffff;
carry = intermediate >> 32;
if (carry > 0)
{
intermediate = hi + carry;
hi = intermediate & 0xffffffff;
carry = intermediate >> 32;
// carry should never be non-zero. Back up with validation
}
}
}
return new Decimal((int)lo, (int)mid, (int)hi, isNegative, (byte)scale);
}
private int nibble(byte[] inp, int nibbleNo)
{
int b = inp[inp.Length - 1 - nibbleNo / 2];
return (nibbleNo % 2 == 0) ? (b & 0x0000000F) : (b >> 4);
}
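For comparison, here is a shorter sketch of the same idea that lets System.Decimal do the multiply-by-ten arithmetic instead of carrying three 32-bit words by hand. It is not the Microsoft code; the class and method names are made up for illustration, and it assumes the same sign convention as Unpack above (0xC and 0xF non-negative, 0xD negative).

```csharp
using System;

public static class PackedDecimalSketch
{
    // Decodes a COMP-3 field: two BCD digits per byte, with the sign
    // in the low nibble of the last byte.
    public static decimal Decode(byte[] packed, int scale)
    {
        if (packed == null || packed.Length == 0)
            throw new ArgumentException("Packed field is empty.", nameof(packed));

        decimal value = 0m;
        for (int i = 0; i < packed.Length; i++)
        {
            int high = packed[i] >> 4;    // upper half of the byte
            int low = packed[i] & 0x0F;   // lower half of the byte

            if (high > 9) throw new FormatException("Bad digit in byte " + i);
            value = value * 10 + high;

            if (i < packed.Length - 1)
            {
                if (low > 9) throw new FormatException("Bad digit in byte " + i);
                value = value * 10 + low;
            }
            else if (low == 0x0D)
            {
                value = -value;
            }
            else if (low != 0x0C && low != 0x0F)
            {
                throw new FormatException("Bad sign nibble");
            }
        }

        // Move the implied decimal point left by `scale` places.
        for (int s = 0; s < scale; s++) value /= 10m;
        return value;
    }
}
```

Calling PackedDecimalSketch.Decode(new byte[] { 0x12, 0x34, 0x5C }, 2) walks the digits 1, 2, 3, 4, 5, sees the positive sign C, and returns 123.45. A field with too many digits for a Decimal raises an OverflowException from the multiplication rather than silently wrapping.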
If you have any questions, post them on here - because I suspect that I am going to get "flamed" like everyone else who has chosen to post questions that are pertinent to today's issues...
Thanks,
John - The Elder.

First of all you must eliminate the end of line (EOL) translation problems that will be caused by ASCII transfer mode. You are absolutely right to be concerned about data corruption when the BCD values happen to correspond to EOL characters. The worst aspect of this problem is that it will occur rarely and unexpectedly.
The best solution is to change the transfer mode to BIN. This is appropriate since the data you are transferring is binary. If it is not possible to use the correct FTP transfer mode, you can undo the ASCII mode damage in code. All you have to do is convert \r\n pairs back to \n. If I were you I would make sure this is well tested.
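As a rough sketch of that repair step, the method below collapses every 0x0D 0x0A pair back to a single 0x0A. The names are hypothetical. It assumes the only change the transfer made was inserting a CR in front of each LF; it cannot tell an inserted CR from one that was genuinely in the data, and it does nothing about any character-set translation the transfer may also have applied.

```csharp
using System;
using System.Collections.Generic;

public static class AsciiModeRepairSketch
{
    // Drops the CR from every CR LF pair, leaving lone CR and lone LF bytes alone.
    public static byte[] CollapseCrLf(byte[] input)
    {
        var output = new List<byte>(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            bool isCrBeforeLf = input[i] == 0x0D
                                && i + 1 < input.Length
                                && input[i + 1] == 0x0A;
            if (!isCrBeforeLf) output.Add(input[i]);
        }
        return output.ToArray();
    }
}
```

Because a packed field can legitimately contain the bytes 0D 0A next to each other, comparing the repaired record lengths against the file format specification is a sensible check after running something like this.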
Once you've dealt with the EOL problem, the COMP-3 conversion is pretty straightforward. I was able to find this article in the MS knowledgebase with sample code in BASIC. See below for a VB.NET port of this code.
Since you're dealing with COMP-3 values, the file format you're reading almost surely has fixed record sizes with fixed field lengths. If I were you, I would get my hands on a file format specification before you go any further with this. You should be using a BinaryReader to work with this data. If someone is pushing back on this point, I would walk away. Let them find someone else to indulge their folly.
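To make the BinaryReader suggestion concrete, here is a minimal C# sketch that reads fixed-length records as raw bytes and cuts a field out of a record by offset. The class name, method names and the idea of passing the record length as a parameter are illustrative assumptions; the real lengths and offsets have to come from the file format specification.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FixedRecordReaderSketch
{
    // Reads the whole stream as consecutive records of recordLength bytes.
    public static List<byte[]> ReadRecords(Stream source, int recordLength)
    {
        if (recordLength <= 0) throw new ArgumentOutOfRangeException(nameof(recordLength));

        var records = new List<byte[]>();
        using (var reader = new BinaryReader(source))
        {
            while (true)
            {
                // ReadBytes returns fewer bytes than requested only at end of stream.
                byte[] record = reader.ReadBytes(recordLength);
                if (record.Length == 0) break;
                if (record.Length < recordLength)
                    throw new InvalidDataException(
                        "Trailing partial record of " + record.Length + " bytes.");
                records.Add(record);
            }
        }
        return records;
    }

    // Copies one field out of a record; offset is zero-based.
    public static byte[] Slice(byte[] record, int offset, int length)
    {
        var field = new byte[length];
        Array.Copy(record, offset, field, 0, length);
        return field;
    }
}
```

The bytes returned by Slice can go straight into a COMP-3 unpack routine, while text fields from the same record are decoded separately. A partial record at the end is treated as an error here because it usually means the transfer changed the file length.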
Here's a VB.NET port of the BASIC sample code. I haven't tested this because I don't have access to a COMP-3 file. If this doesn't work, I would refer back to the original MS sample code for guidance, or to references in the other answers to this question.
Imports Microsoft.VisualBasic
Module Module1
'Sample COMP-3 conversion code
'Adapted from http://support.microsoft.com/kb/65323
'This code has not been tested
Sub Main()
Dim Digits%(15) 'Holds the digits for each number (max = 16).
Dim Basiceqv#(1000) 'Holds the Basic equivalent of each COMP-3 number.
'Added to make code compile
Dim MyByte As Char, HighPower%, HighNibble%
Dim LowNibble%, Digit%, E%, Decimal%, FileName$
'Clear the screen, get the filename and the amount of decimal places
'desired for each number, and open the file for sequential input:
FileName$ = InputBox("Enter the COBOL data file name: ")
Decimal% = InputBox("Enter the number of decimal places desired: ")
FileOpen(1, FileName$, OpenMode.Binary)
Do Until EOF(1) 'Loop until the end of the file is reached.
Input(1, MyByte)
If MyByte = Chr(0) Then 'Check if byte is 0 (ASC won't work on 0).
Digits%(HighPower%) = 0 'Make next two digits 0. Increment
Digits%(HighPower% + 1) = 0 'the high power to reflect the
HighPower% = HighPower% + 2 'number of digits in the number
'plus 1.
Else
HighNibble% = Asc(MyByte) \ 16 'Extract the high and low
LowNibble% = Asc(MyByte) And &HF 'nibbles from the byte. The
Digits%(HighPower%) = HighNibble% 'high nibble will always be a
'digit.
If LowNibble% <= 9 Then 'If low nibble is a
'digit, assign it and
Digits%(HighPower% + 1) = LowNibble% 'increment the high
HighPower% = HighPower% + 2 'power accordingly.
Else
HighPower% = HighPower% + 1 'Low nibble was not a digit but a
Digit% = 0 '+ or - signals end of number.
'Start at the highest power of 10 for the number and multiply
'each digit by the power of 10 place it occupies.
For Power% = (HighPower% - 1) To 0 Step -1
Basiceqv#(E%) = Basiceqv#(E%) + (Digits%(Digit%) * (10 ^ Power%))
Digit% = Digit% + 1
Next
'If the sign read was negative, make the number negative.
If LowNibble% = 13 Then
Basiceqv#(E%) = Basiceqv#(E%) - (2 * Basiceqv#(E%))
End If
'Give the number the desired amount of decimal places, print
'the number, increment E% to point to the next number to be
'converted, and reinitialize the highest power.
Basiceqv#(E%) = Basiceqv#(E%) / (10 ^ Decimal%)
Print(Basiceqv#(E%))
E% = E% + 1
HighPower% = 0
End If
End If
Loop
FileClose() 'Close the COBOL data file, and end.
End Sub
End Module

If the original data was in EBCDIC your COMP-3 field has been garbled. The FTP process has done an EBCDIC to ASCII translation of the byte values in the COMP-3 field which isn't what you want. To correct this you can:
1) Use BINARY mode for the transfer so you get the raw EBCDIC data. Then you convert the COMP-3 field to a number and translate any other EBCDIC text on the record to ASCII. A packed field stores each digit in a half byte, with the lowest half byte holding the sign (C is positive, F is unsigned and treated as positive, D is negative). Storing 123.4 in a PIC 999V9 USAGE COMP-3 field gives X'01234F' (three bytes), and -123 in the signed equivalent (PIC S999V9) gives X'01230D'. The V marks an implied decimal point; it is not stored.
2) Have the sender convert the field into a USAGE IS DISPLAY SIGN IS LEADING (or TRAILING) SEPARATE numeric field. This stores the number as a string of EBCDIC numeric digits with the sign as a separate + or - character. All digits and the sign translate correctly to their ASCII equivalent on the FTP transfer.
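The nibble layout described in option 1 can be checked by going the other way and packing a number. This is only an illustrative sketch with made-up names; it takes the value with the decimal point already removed (123.4 at one implied decimal place is 1234) and assumes 0xF as the default sign for non-negative values.

```csharp
using System;

public static class PackSketch
{
    // unscaled: the value with the implied decimal point removed.
    // positiveSign: 0x0F for an unsigned field, 0x0C for a signed one.
    public static byte[] Pack(long unscaled, int byteLength, byte positiveSign = 0x0F)
    {
        if (byteLength <= 0) throw new ArgumentOutOfRangeException(nameof(byteLength));

        var result = new byte[byteLength];
        decimal magnitude = Math.Abs((decimal)unscaled);
        int signNibble = unscaled < 0 ? 0x0D : (int)positiveSign;

        // Nibble 0 is the low half of the last byte; fill from right to left.
        for (int n = 0; n < byteLength * 2; n++)
        {
            int nibble;
            if (n == 0)
            {
                nibble = signNibble;
            }
            else
            {
                nibble = (int)(magnitude % 10);
                magnitude = decimal.Truncate(magnitude / 10);
            }

            int index = byteLength - 1 - n / 2;
            result[index] |= (byte)(n % 2 == 0 ? nibble : nibble << 4);
        }

        if (magnitude != 0)
            throw new OverflowException("Value has too many digits for the field.");
        return result;
    }
}
```

BitConverter.ToString(PackSketch.Pack(1234, 3)) gives "01-23-4F" and BitConverter.ToString(PackSketch.Pack(-1230, 3)) gives "01-23-0D", matching the two hex values quoted in option 1.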

I apologize if I am way off base here, but perhaps this code sample I'll paste here could help you. This came from VBRocks...
Imports System
Imports System.IO
Imports System.Text
Imports System.Text.Encoding
'4/20/07 submission includes a line spacing addition when a control character is used:
' The line spacing is calculated off of the 3rd control character.
'
' Also includes the 4/18 modification of determining end of file.
'4/26/07 submission includes an addition of 6 to the record length when the 4th control
' character is an 8. This is because these records were being truncated.
'Authored by Gary A. Lima, aka. VBRocks
''' <summary>
''' Translates an EBCDIC file to an ASCII file.
''' </summary>
''' <remarks></remarks>
Public Class EBCDIC_to_ASCII_Translator
#Region " Example"
Private Sub Example()
'Set your source file and destination file paths
Dim sSourcePath As String = "c:\Temp\MyEBCDICFile"
Dim sDestinationPath As String = "c:\Temp\TranslatedFile.txt"
Dim trans As New EBCDIC_to_ASCII_Translator()
'If your EBCDIC file uses Control records to determine the length of a record, then set this to True
trans.UseControlRecord = True
'If the first record of your EBCDIC file is filler (junk), then set this to True
trans.IgnoreFirstRecord = True
'EBCDIC files are written in block lengths, set your block length (Example: 134, 900, Etc.)
trans.BlockLength = 900
'This method will actually translate your source file and output it to the specified destination file path
trans.TranslateFile(sSourcePath, sDestinationPath)
'Here is a alternate example:
'No Control record is used
'trans.UseControlRecord = False
'Translate the whole file, including the first record
'trans.IgnoreFirstRecord = False
'Set the block length
'trans.BlockLength = 134
'Translate...
'trans.TranslateFile(sSourcePath, sDestinationPath)
'*** Some additional methods that you can use are:
'Trim off leading characters from left side of string (position 0 to...)
'trans.LTrim = 15
'Translate 1 EBCDIC character to an ASCII character
'Dim strASCIIChar as String = trans.TranslateCharacter("S")
'Translate an EBCDIC character array to an ASCII string
'trans.TranslateCharacters(chrEBCDICArray)
'Translates an EBCDIC string to an ASCII string
'Dim strASCII As String = trans.TranslateString("EBCDIC String")
End Sub
#End Region 'Example
'Translate characters from EBCDIC to ASCII
Private ASCIIEncoding As Encoding = Encoding.ASCII
Private EBCDICEncoding As Encoding = Encoding.GetEncoding(37) 'EBCDIC
'Block Length: Can be fixed (Ex: 134).
Private miBlockLength As Integer = 0
Private mbUseControlRec As Boolean = True 'If set to False, will return exact block length
Private mbIgnoreFirstRecord As Boolean = True 'Will Ignore first record if set to true (First record may be filler)
Private miLTrim As Integer = 0
''' <summary>
''' Translates SourceFile from EBCDIC to ASCII. Writes output to file path specified by DestinationFile parameter.
''' Set the BlockLength Property to designate block size to read.
''' </summary>
''' <param name="SourceFile">Enter the path of the Source File.</param>
''' <param name="DestinationFile">Enter the path of the Destination File.</param>
''' <remarks></remarks>
Public Sub TranslateFile(ByVal SourceFile As String, ByVal DestinationFile As String)
Dim iRecordLength As Integer 'Stores length of a record, not including the length of the Control Record (if used)
Dim sRecord As String = "" 'Stores the actual record
Dim iLineSpace As Integer = 1 'LineSpace: 1 for Single Space, 2 for Double Space, 3 for Triple Space...
Dim iControlPosSix As Byte() 'Stores the 6th character of a Control Record (used to calculate record length)
Dim iControlRec As Byte() 'Stores the EBCDIC Control Record (First 6 characters of record)
Dim bEOR As Boolean 'End of Record Flag
Dim bBOF As Boolean = True 'Beginning of file
Dim iConsumedChars As Integer = 0 'Stores the number of consumed characters in the current block
Dim bIgnoreRecord As Boolean = mbIgnoreFirstRecord 'Ignores the first record if set.
Dim ControlArray(5) As Char 'Stores Control Record (first 6 bytes)
Dim chrArray As Char() 'Stores characters just after read from file
Dim sr As New StreamReader(SourceFile, EBCDICEncoding)
Dim sw As New StreamWriter(DestinationFile)
'Set the RecordLength to the RecordLength Property (below)
iRecordLength = miBlockLength
'Loop through entire file
Do Until sr.EndOfStream = True
'If using a Control Record, then check record for valid data.
If mbUseControlRec = True Then
'Read the Control Record (first 6 characters of the record)
sr.ReadBlock(ControlArray, 0, 6)
'Update the value of consumed (read) characters
iConsumedChars += ControlArray.Length
'Get the bytes of the Control Record Array
iControlRec = EBCDICEncoding.GetBytes(ControlArray)
'Set the line spacing (position 3 divided by 64)
' (64 decimal = Single Spacing; 128 decimal = Double Spacing)
iLineSpace = iControlRec(2) / 64
'Check the Control record for End of File
'If the Control record has an 8 or 10 in position 1, and a 1 in position 2, then it is the end of the file
If (iControlRec(0) = 8 OrElse iControlRec(0) = 10) AndAlso _
iControlRec(1) = 1 Then
If bBOF = False Then
Exit Do
Else
'The Beginning of file flag is set to true by default, so when the first
' record is encountered, it is bypassed and the bBOF flag is set to False
bBOF = False
End If 'If bBOF = Fals
End If 'If (iControlRec(0) = 8 OrElse
'Set the default value for the End of Record flag to True
' If the Control Record has all zeros, then it's True, else False
bEOR = True
'If the Control record contains all zeros, bEOR will stay True, else it will be set to False
For i As Integer = 0 To 5
If iControlRec(i) > 0 Then
bEOR = False
Exit For
End If 'If iControlRec(i) > 0
Next 'For i As Integer = 0 To 5
If bEOR = False Then
'Convert EBCDIC character to ASCII
'Multiply the 6th byte by 6 to get record length
' Why multiply by 6? Because it works.
iControlPosSix = EBCDICEncoding.GetBytes(ControlArray(5))
'If the 4th position of the control record is an 8, then add 6
' to the record length to pick up remaining characters.
If iControlRec(3) = 8 Then
iRecordLength = CInt(iControlPosSix(0)) * 6 + 6
Else
iRecordLength = CInt(iControlPosSix(0)) * 6
End If
'Add the length of the record to the Consumed Characters counter
iConsumedChars += iRecordLength
Else
'If the Control Record had all zeros in it, then it is the end of the Block.
'Consume the remainder of the block so we can continue at the beginning of the next block.
ReDim chrArray(miBlockLength - iConsumedChars - 1)
'ReDim chrArray(iRecordLength - iConsumedChars - 1)
'Consume (read) the remaining characters in the block.
' We are not doing anything with them because they are not actual records.
'sr.ReadBlock(chrArray, 0, iRecordLength - iConsumedChars)
sr.ReadBlock(chrArray, 0, miBlockLength - iConsumedChars)
'Reset the Consumed Characters counter
iConsumedChars = 0
'Set the Record Length to 0 so it will not be processed below.
iRecordLength = 0
End If ' If bEOR = False
End If 'If mbUseControlRec = True
If iRecordLength > 0 Then
'Resize our array, dumping previous data. Because Arrays are Zero (0) based, subtract 1 from the Record length.
ReDim chrArray(iRecordLength - 1)
'Read the specfied record length, without the Control Record, because we already consumed (read) it.
sr.ReadBlock(chrArray, 0, iRecordLength)
'Copy Character Array to String Array, Converting in the process, then Join the Array to a string
sRecord = Join(Array.ConvertAll(chrArray, New Converter(Of Char, String)(AddressOf ChrToStr)), "")
'If the record length was 0, then the Join method may return Nothing
If IsNothing(sRecord) = False Then
If bIgnoreRecord = True Then
'Do nothing - bypass record
'Reset flag
bIgnoreRecord = False
Else
'Write the line out, LTrimming the specified number of characters.
If sRecord.Length >= miLTrim Then
sw.WriteLine(sRecord.Remove(0, miLTrim))
Else
sw.WriteLine(sRecord.Remove(0, sRecord.Length))
End If ' If sRecord.Length >= miLTrim
'Write out the number of blank lines specified by the 3rd control character.
For i As Integer = 1 To iLineSpace - 1
sw.WriteLine("")
Next 'For i As Integer = 1 To iLineSpace
End If 'If bIgnoreRecord = True
'Obviously, if we have read more characters from the file than the designated size of the block,
' then subtract the number of characters we have read into the next block from the block size.
If iConsumedChars > miBlockLength Then
'If iConsumedChars > iRecordLength Then
iConsumedChars = iConsumedChars - miBlockLength
'iConsumedChars = iConsumedChars - iRecordLength
End If
End If 'If IsNothing(sRecord) = False
End If 'If iRecordLength > 0
'Allow computer to process (works in a class module, not in a dll)
'Application.DoEvents()
Loop
'Destroy StreamReader (sr)
sr.Close()
sr.Dispose()
'Destroy StreamWriter (sw)
sw.Close()
sw.Dispose()
End Sub
''' <summary>
''' Translates 1 EBCDIC Character (Char) to an ASCII String
''' </summary>
''' <param name="chr"></param>
''' <returns></returns>
''' <remarks></remarks>
Private Function ChrToStr(ByVal chr As Char) As String
Dim sReturn As String = ""
'Convert character into byte
Dim EBCDICbyte As Byte() = EBCDICEncoding.GetBytes(chr)
'Convert EBCDIC byte to ASCII byte
Dim ASCIIByte As Byte() = Encoding.Convert(EBCDICEncoding, ASCIIEncoding, EBCDICbyte)
sReturn = Encoding.ASCII.GetString(ASCIIByte)
Return sReturn
End Function
''' <summary>
''' Translates an EBCDIC String to an ASCII String
''' </summary>
''' <param name="sStringToTranslate"></param>
''' <returns>String</returns>
''' <remarks></remarks>
Public Function TranslateString(ByVal sStringToTranslate As String) As String
Dim i As Integer = 0
Dim sReturn As New System.Text.StringBuilder()
'Loop through the string and translate each character
For i = 0 To sStringToTranslate.Length - 1
sReturn.Append(ChrToStr(sStringToTranslate.Substring(i, 1)))
Next
Return sReturn.ToString()
End Function
''' <summary>
''' Translates 1 EBCDIC Character (Char) to an ASCII String
''' </summary>
''' <param name="sCharacterToTranslate"></param>
''' <returns>String</returns>
''' <remarks></remarks>
Public Function TranslateCharacter(ByVal sCharacterToTranslate As Char) As String
Return ChrToStr(sCharacterToTranslate)
End Function
''' <summary>
''' Translates an EBCDIC Character (Char) Array to an ASCII String
''' </summary>
''' <param name="sCharacterArrayToTranslate"></param>
''' <returns>String</returns>
''' <remarks>Remarks</remarks>
Public Function TranslateCharacters(ByVal sCharacterArrayToTranslate As Char()) As String
Dim sReturn As String = ""
'Copy Character Array to String Array, Converting in the process, then Join the Array to a string
sReturn = Join(Array.ConvertAll(sCharacterArrayToTranslate, _
New Converter(Of Char, String)(AddressOf ChrToStr)), "")
Return sReturn
End Function
''' <summary>
''' Block Length must be set. You can set the BlockLength for specific block sizes (Ex: 134).
''' Set UseControlRecord = False for files with specific block sizes (Default is True)
''' </summary>
''' <value>0</value>
''' <returns>Integer</returns>
''' <remarks></remarks>
Public Property BlockLength() As Integer
Get
Return miBlockLength
End Get
Set(ByVal value As Integer)
miBlockLength = value
End Set
End Property
''' <summary>
''' Determines whether a ControlKey is used to calculate RecordLength of valid data
''' </summary>
''' <value>Default value is True</value>
''' <returns>Boolean</returns>
''' <remarks></remarks>
Public Property UseControlRecord() As Boolean
Get
Return mbUseControlRec
End Get
Set(ByVal value As Boolean)
mbUseControlRec = value
End Set
End Property
''' <summary>
''' Ignores first record if set (Default is True)
''' </summary>
''' <value>Default is True</value>
''' <returns>Boolean</returns>
''' <remarks></remarks>
Public Property IgnoreFirstRecord() As Boolean
Get
Return mbIgnoreFirstRecord
End Get
Set(ByVal value As Boolean)
mbIgnoreFirstRecord = value
End Set
End Property
''' <summary>
''' Trims the left side of every string the specfied number of characters. Default is 0.
''' </summary>
''' <value>Default is 0.</value>
''' <returns>Integer</returns>
''' <remarks></remarks>
Public Property LTrim() As Integer
Get
Return miLTrim
End Get
Set(ByVal value As Integer)
miLTrim = value
End Set
End Property
End Class

Some useful links for EBCDIC translation:
Translation table - useful to do check some of the values in the packed decimal fields:
http://www.simotime.com/asc2ebc1.htm
List of code pages in msdn:
http://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx
And a piece of code to convert the byte array fields in C#:
// 500 is the code page for IBM EBCDIC International
System.Text.Encoding enc = System.Text.Encoding.GetEncoding(500);
string value = enc.GetString(byteArrayField);
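A slightly fuller sketch of the same text-field conversion follows. Encoding is an abstract class, so the instance comes from Encoding.GetEncoding. The provider registration line is an assumption about the runtime: on .NET Core and .NET 5 or later the EBCDIC code pages are not available until CodePagesEncodingProvider is registered, while on the older .NET Framework they are built in and that line may need to be removed. The class and method names are made up.

```csharp
using System;
using System.Text;

public static class EbcdicTextSketch
{
    // Decodes a text (PIC X) field; packed fields must NOT be passed through this.
    public static string Decode(byte[] field, int codePage = 500)
    {
        // Assumption: running on .NET Core 3.0+ / .NET 5+, where legacy code
        // pages such as 37 and 500 have to be enabled explicitly.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding ebcdic = Encoding.GetEncoding(codePage);
        return ebcdic.GetString(field);
    }
}
```

For example, the EBCDIC bytes C1 C2 C3 F1 decode to the string "ABC1" under code page 500.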

The packed fields are the same in EBCDIC or ASCII. Do not run the EBCDIC to ASCII conversion on them. In .Net dump them into a byte[].
You use bitwise masks and shifts to pack/unpack.
-- But bitwise ops only apply to integer types in .Net so you need to jump through some hoops!
A good COBOL or C artist can point you in the right direction.
Find one of the old guys and pay your dues (about three beers should do it).
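In case no old guy is at hand, the masks and shifts look roughly like this in C#. The hoops are small: a byte is promoted to int before the operator is applied, so only the result needs a cast back to byte. The names below are illustrative.

```csharp
using System;

public static class NibbleSketch
{
    // Upper four bits of the byte (the first of its two BCD digits).
    public static int HighNibble(byte b) { return b >> 4; }

    // Lower four bits of the byte (second digit, or the sign in the last byte).
    public static int LowNibble(byte b) { return b & 0x0F; }

    // Packs two 4-bit values back into one byte.
    public static byte Combine(int high, int low)
    {
        return (byte)(((high & 0x0F) << 4) | (low & 0x0F));
    }
}
```

For the byte 0x3D this gives a high nibble of 3 and a low nibble of 13 (hex D), which in the last byte of a packed field would be a final digit 3 followed by a negative sign.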

The “ASCII transfer type” transfers files as regular text files, so files containing packed decimal or binary data become corrupt when transferred that way. The “Binary transfer type” handles the file as binary data instead of text data, so that is the transfer type to use here.
Reference : https://www.codeproject.com/Tips/673240/EBCDIC-to-ASCII-Converter
Once your file is ready, here is the code to convert packed decimal to human readable decimal.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApp2
{
class Program
{
static void Main(string[] args)
{
var path = @"C:\FileName.BIN.dat";
var templates = new List<Template>
{
new Template{StartPos=1,CharLength=4,Type="AlphaNum"},
new Template{StartPos=5,CharLength=1,Type="AlphaNum"},
new Template{StartPos=6,CharLength=8,Type="AlphaNum"},
new Template{StartPos=14,CharLength=1,Type="AlphaNum"},
new Template{StartPos=46,CharLength=4,Type="Packed",DecimalPlace=2},
new Template{StartPos=54,CharLength=5,Type="Packed",DecimalPlace=0},
new Template{StartPos=60,CharLength=4,Type="Packed",DecimalPlace=2},
new Template{StartPos=64,CharLength=1,Type="AlphaNum"}
};
var allBytes = File.ReadAllBytes(path);
for (int i = 0; i < allBytes.Length; i += 66)
{
var IsLastline = (allBytes.Length - i) < 66;
var lineLength = IsLastline ? 64 : 66;
byte[] lineBytes = new byte[lineLength];
Array.Copy(allBytes, i, lineBytes, 0, lineLength);
var outArray = new string[templates.Count];
int index = 0;
foreach (var temp in templates)
{
byte[] amoutBytes = new byte[temp.CharLength];
Array.Copy(lineBytes, temp.StartPos - 1, amoutBytes, 0,
temp.CharLength);
var final = "";
if (temp.Type == "Packed")
{
final = Unpack(amoutBytes, temp.DecimalPlace).ToString();
}
else
{
final = ConvertEbcdicString(amoutBytes);
}
outArray[index] = final;
index++;
}
Console.WriteLine(string.Join(" ", outArray));
}
Console.ReadLine();
}
private static string ConvertEbcdicString(byte[] ebcdicBytes)
{
if (ebcdicBytes.All(p => p == 0x00 || p == 0xFF))
{
//Every byte is either 0x00 or 0xFF (fillers)
return string.Empty;
}
Encoding ebcdicEnc = Encoding.GetEncoding("IBM037");
string result = ebcdicEnc.GetString(ebcdicBytes); // convert EBCDIC bytes -> Unicode string
return result;
}
private static Decimal Unpack(byte[] inp, int scale)
{
long lo = 0;
long mid = 0;
long hi = 0;
bool isNegative;
// this nybble stores only the sign, not a digit.
// "C" hex is positive, "D" hex is negative, and "F" hex is unsigned.
var ff = nibble(inp, 0);
switch (ff)
{
case 0x0D:
isNegative = true;
break;
case 0x0F:
case 0x0C:
isNegative = false;
break;
default:
throw new Exception("Bad sign nibble");
}
long intermediate;
long carry;
long digit;
for (int j = inp.Length * 2 - 1; j > 0; j--)
{
// multiply by 10
intermediate = lo * 10;
lo = intermediate & 0xffffffff;
carry = intermediate >> 32;
intermediate = mid * 10 + carry;
mid = intermediate & 0xffffffff;
carry = intermediate >> 32;
intermediate = hi * 10 + carry;
hi = intermediate & 0xffffffff;
carry = intermediate >> 32;
// By limiting input length to 14, we ensure overflow will never occur
digit = nibble(inp, j);
if (digit > 9)
{
throw new Exception("Bad digit");
}
intermediate = lo + digit;
lo = intermediate & 0xffffffff;
carry = intermediate >> 32;
if (carry > 0)
{
intermediate = mid + carry;
mid = intermediate & 0xffffffff;
carry = intermediate >> 32;
if (carry > 0)
{
intermediate = hi + carry;
hi = intermediate & 0xffffffff;
carry = intermediate >> 32;
// carry should never be non-zero. Back up with validation
}
}
}
return new Decimal((int)lo, (int)mid, (int)hi, isNegative, (byte)scale);
}
private static int nibble(byte[] inp, int nibbleNo)
{
int b = inp[inp.Length - 1 - nibbleNo / 2];
return (nibbleNo % 2 == 0) ? (b & 0x0000000F) : (b >> 4);
}
class Template
{
public string Name { get; set; }
public string Type { get; set; }
public int StartPos { get; set; }
public int CharLength { get; set; }
public int DecimalPlace { get; set; }
}
}
}

Files must be transferred as binary. Here's a much shorter way to do it:
using System.Linq;
namespace SomeNamespace
{
public static class SomeExtensionClass
{
/// <summary>
/// computes the actual decimal value from an IBM "Packed Decimal" 9(x)v9 (COBOL) format
/// </summary>
/// <param name="value">byte[]</param>
/// <param name="precision">byte; decimal places, default 2</param>
/// <returns>decimal</returns>
public static decimal FromPackedDecimal(this byte[] value, byte precision = 2)
{
if (value.Length < 1)
{
throw new System.InvalidOperationException("Cannot unpack empty bytes.");
}
double power = System.Math.Pow(10, precision);
if (power > long.MaxValue)
{
throw new System.InvalidOperationException(
$"Precision too large for valid calculation: {precision}");
}
string hex = System.BitConverter.ToString(value).Replace("-", "");
var bytes = Enumerable.Range(0, hex.Length)
.Select(x => System.Convert.ToByte($"0{hex.Substring(x, 1)}", 16))
.ToList();
long place = 1;
decimal ret = 0;
for (int i = bytes.Count - 2; i > -1; i--)
{
ret += (bytes[i] * place);
place *= 10;
}
ret /= (long)power;
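// The final nibble is the sign: 0xD (or the less common 0xB) means negative.
// Each entry in bytes is a single nibble (0-15), so testing bit 7 would never match.
int sign = bytes.Last();
return (sign == 0x0D || sign == 0x0B) ? ret * -1 : ret;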
}
}
}

Related

Having trouble unpacking Comp-3 in .Net. There are letter characters aside from sign character inside Comp-3 value

I am trying to import a Mainframe EDI File back to SQL Server using .NET and I am having problems unpacking some comp-3 fields.
This file was from one of our clients and I have the Copy Book layout for the following fields:
05 EH-GROSS-INVOICE-AMT PIC S9(07)V9999 COMP-3.
05 EH-CASH-DISCOUNT-AMT PIC S9(07)V9999 COMP-3.
05 EH-CASH-DISCOUNT-PCT PIC S9(03)V9999 COMP-3.
I will just be focusing on these 3 fields as all other fields are PIC(X) and are already Unicode values. I loaded everything up with the help of this Tool Ebcdic2Ascii that was created by Max Vagner. I just did a bit of modification on the "Unpack" function and have modified it to
private string Unpack(byte[] packedBytes, int decimalPlaces, out bool isParsedSuccessfully)
{
isParsedSuccessfully = true;
return BitConverter.ToString(packedBytes);
}
in order for me to get the following sample data:
EH-GROSS-INVOICE-AMT EH-CASH-DISCOUNT-AMT EH-CASH-DISCOUNT-PCT
----------------------------------------------------------------------
00-1A-1A-03-26-0C 00-00-00-00-00-0C 00-00-00-0C
00-0A-1A-1A-00-0C 00-00-1A-1A-2D-0C 00-1A-00-0C
00-09-10-20-00-0C 00-00-10-1A-1A-0C 00-1A-00-0C
Here is a sample code that I created for Unpacking these values based on my understanding of Comp-3 values:
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
var result1 = UnpackMod("00-1A-1A-03-26-0C", 4);
var result2 = UnpackMod("00-00-00-00-00-0C", 4);
var result3 = UnpackMod("00-00-00-0C", 4);
Console.WriteLine($"{result1}\n{result2}\n{result3}\n");
var result4 = UnpackMod("00-0A-1A-1A-00-0C", 4);
var result5 = UnpackMod("00-00-1A-1A-2D-0C", 4);
var result6 = UnpackMod("00-1A-00-0C", 4);
Console.WriteLine($"{result4}\n{result5}\n{result6}\n");
var result7 = UnpackMod("00-09-10-20-00-0C", 4);
var result8 = UnpackMod("00-00-10-1A-1A-0C", 4);
var result9 = UnpackMod("00-1A-00-0C", 4);
Console.WriteLine($"{result7}\n{result8}\n{result9}");
Console.ReadLine();
}
/// <summary>
/// Method for unpacking Comp-3 fields.
/// </summary>
/// <param name="hexString"></param>
/// <param name="decimalPlaces"></param>
/// <returns>Returns numeric string if parse was successful; else Return input hex string</returns>
private static string UnpackMod(string inputString, int decimalPlaces)
{
var outputString = inputString;
// Remove "-".
outputString = outputString.Replace("-", "");
// Check last character for sign.
string lastChar = outputString.Substring(outputString.Length - 1, 1);
bool isNegative = (lastChar == "D" || lastChar == "B");
// Remove sign character.
if (lastChar == "C" || lastChar == "A" || lastChar == "E" || lastChar == "F" || lastChar == "D" || lastChar == "B")
{
outputString = outputString.Substring(0, outputString.Length - 1);
}
// Place decimal point.
outputString = outputString.Insert(outputString.Length - decimalPlaces, ".");
// Check if parsed value is numeric. This will also eliminate all leading 0.
var isParsedSuccessfully = decimal.TryParse(outputString, out decimal decimalValue);
// If isParsedSuccessfully is true then return the numeric string, else return "NULL".
string result = "NULL";
if (isParsedSuccessfully)
{
// Convert value to negative.
if (isNegative)
{
decimalValue = decimalValue * -1;
}
result = decimalValue.ToString();
}
return result;
}
}
}
After running the sample code I was able to get the following results:
EH-GROSS-INVOICE-AMT EH-CASH-DISCOUNT-AMT EH-CASH-DISCOUNT-PCT
----------------------------------------------------------------------
NULL 0.0000 0.0000
NULL NULL NULL
9102.0000 NULL NULL
As you can see, I was only able to get the following 3 values correctly:
00-09-10-20-00-0C -> 9102.0000
00-00-00-00-00-0C -> 0.0000
00-00-00-0C -> 0.0000
Based on this source (http://www.3480-3590-data-conversion.com/article-packed-fields.html), I have the following understanding of Comp-3:
COBOL Comp-3 is a binary field type that puts ("packs") two digits into each byte, using a notation called Binary Coded Decimal, or BCD.
The Binary Coded Decimal (BCD) data type is just as its name suggests: it is a value stored in decimal (base ten) notation, and each digit is binary coded. Since a digit only has ten possible values (0-9), each digit fits in four bits (a nibble), so two digits fit in one byte.
The low nibble of the least significant byte is used to store the sign for the number. This nibble stores only the sign, not a digit. "C" hex is positive, "D" hex is negative, and "F" hex is unsigned.
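The layout described above can be sketched in C# as follows. This is a minimal illustration, not a definitive implementation: the class and method names are hypothetical, the input is assumed to be the raw, untranslated bytes of the field, and the sign handling (B and D negative; A, C, E, F positive) follows the convention used in the sample code above.

```csharp
using System;

public static class PackedDecimalSketch
{
    // Unpacks a COMP-3 field from its raw bytes (not from translated text).
    // decimalPlaces is the implied scale of the field (an assumption supplied by the caller).
    public static decimal Unpack(byte[] packed, int decimalPlaces)
    {
        if (packed == null || packed.Length == 0)
            throw new ArgumentException("Packed field is empty.", nameof(packed));

        decimal value = 0m;
        for (int i = 0; i < packed.Length; i++)
        {
            int high = packed[i] >> 4;
            int low = packed[i] & 0x0F;

            // Every high nibble must be a digit.
            if (high > 9)
                throw new FormatException($"Byte {i} (0x{packed[i]:X2}) has a non-digit high nibble.");
            value = value * 10 + high;

            if (i < packed.Length - 1)
            {
                // Low nibbles are digits too, except in the last byte.
                if (low > 9)
                    throw new FormatException($"Byte {i} (0x{packed[i]:X2}) has a non-digit low nibble.");
                value = value * 10 + low;
            }
            else
            {
                // The last low nibble is the sign, never a digit.
                if (low < 0x0A)
                    throw new FormatException($"Last byte (0x{packed[i]:X2}) has no valid sign nibble.");
                if (low == 0x0B || low == 0x0D)
                    value = -value;
            }
        }

        // Apply the implied decimal point.
        for (int i = 0; i < decimalPlaces; i++)
            value /= 10m;
        return value;
    }
}
```

With the bytes 00-09-10-20-00-0C and four decimal places this returns 9102, the same value the sample code produced. A field containing a byte such as 0x1A is rejected with a FormatException, because A is not a digit and is not in the sign position.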
Since I know that BCD digits should only be values 0-9, and that only the final nibble should hold a sign ("C", "D" or "F"), I don't know how to unpack the following values:
00-1A-1A-03-26-0C
00-0A-1A-1A-00-0C
00-00-1A-1A-2D-0C
00-1A-00-0C
00-00-10-1A-1A-0C
00-1A-00-0C
These values have other hex characters besides the sign character. I have a feeling that the data has already been converted, because otherwise there should be no readable values there unless you apply an encoding. I am still not sure about this and would love any insights. Thanks.
First, PIC X is not Unicode in COBOL.
Quoting myself from here...
It is common for mainframe data to include both text and binary data
in a single record, for example a name, a currency amount, and a
quantity:
Hopper Grace ar% .
...which would be...
x'C8969797859940404040C799818385404040404081996C004B'
...in hex. This is code page 37, commonly referred to as EBCDIC.
[...]Converting to code page 1250, commonly in use on Microsoft
Windows, you would end up with...
x'486F707065722020202047726163652020202020617225002E'
...where the text data is translated but the packed data is destroyed.
The packed data no longer has a valid sign in the last nibble (the
lower half of the last byte), the currency amount itself has been
changed as has the quantity (from decimal 75 to decimal 11,776 due to
both code page conversion and mangling of a big endian number as a
little endian number).
Likely your data was code page converted on transfer from the mainframe. If you know the original code page and the code page it was converted to, then you might be able to unscramble the packed data.
I say might because, if you're lucky, the hex values you have will have been mapped one-to-one with hex values in the original code page. Note that it is common for both EBCDIC x'15' and x'0D' to be mapped to ASCII x'0D'.
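The damage described in the quoted example can be checked with a few lines of C#. This is only a sketch: the helper names are hypothetical, and it assumes the last five bytes of the quoted record are a 3-byte packed amount (x'81996C') followed by a 2-byte binary quantity (x'004B'), which is how the quote describes them.

```csharp
using System;

public static class CodePageDamageSketch
{
    // Parses a hex string such as "81996C" into bytes.
    public static byte[] FromHex(string hex)
    {
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return bytes;
    }

    // True when the low nibble of the last byte is a sign nibble (A-F).
    public static bool HasValidSign(byte[] packed)
    {
        return packed.Length > 0 && (packed[packed.Length - 1] & 0x0F) >= 0x0A;
    }

    // Reads two bytes as a big-endian number (mainframe byte order).
    public static int ReadBigEndian16(byte[] data)
    {
        return (data[0] << 8) | data[1];
    }

    // Reads the same two bytes as a little-endian number (x86 byte order).
    public static int ReadLittleEndian16(byte[] data)
    {
        return (data[1] << 8) | data[0];
    }
}
```

HasValidSign returns true for the original x'81996C' and false for the converted x'617225', whose last nibble is 5. ReadBigEndian16 on the original x'004B' gives 75, while ReadLittleEndian16 on the converted x'002E' gives 11,776, matching the figures in the quote.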

How do I convert 'Single' to binary?

I'm putting together a small handful of extension methods, for converting numeric values to their binary equivalents.
But I'm running into a problem.
We may use a Convert.ToString() overload to convert these types to binary:
Byte
Short
Integer
Long
For example:
Dim iInteger As Integer
Dim sBinary As String
iInteger = Integer.MaxValue
sBinary = Convert.ToString(iInteger, 2)
But there isn't an overload for Single that accepts the base value. The single-argument overload returns scientific notation, not the binary value.
I've tried this code, adapted from this answer:
Public Function ToBinary(Value As Single) As String
Dim aBits As Integer()
Dim _
iLength,
iIndex As Integer
Select Case Value
Case < BitLengths.BYTE : iLength = 7
Case < BitLengths.WORD : iLength = 15
Case < BitLengths.DWORD : iLength = 31
Case < BitLengths.QWORD : iLength = 63
End Select
aBits = New Integer(iLength) {}
For iIndex = 0 To iLength
aBits(iLength - iIndex) = Value Mod 2
Value \= 2
Next
ToBinary = String.Empty
aBits.ForEach(Sub(Bit)
ToBinary &= Bit
End Sub)
End Function
Unfortunately, however, it returns inaccurate results:
Input: 1361294667
Result: Assert.AreEqual failed. Expected:<01010001001000111011010101001011>. Actual:<01010001001000111011010110000000>.
We can get the expected value from the Programmer View of the old Windows 7 calculator:
Given these, how may we reliably convert a Single value to a binary string?
--EDIT--
I found this statement: "There is not an exact binary representation of 0.1 or 0.01." That pretty much says it all. I've decided to abandon this, as it's become clear that the effort is a fruitless pursuit.
The expected value in your example is just the plain binary representation of the integer 1361294667. If you want the IEEE-754 representation of the Single instead, you could use BitConverter.GetBytes, as in the following example (probably not the most efficient way):
Sub Main
Dim i As Int32 = 1361294667
Console.WriteLine(ObjectAsBinary(i))
Dim s As Single = 1361294667
Console.WriteLine(ObjectAsBinary(s))
End Sub
Private Function ObjectAsBinary(o As Object) As String
Dim bytes = BitConverter.GetBytes(o)
If BitConverter.IsLittleEndian Then
Array.Reverse(bytes)
End If
Dim result As String = ""
For Each b In bytes
result &= Convert.ToString(b, 2).PadLeft(8, "0")
Next
Return result
End Function
That code outputs the following:
01010001001000111011010101001011 - Matches your example
01001110101000100100011101101011 - Matches IEEE-754 from IEEE-754 Floating Point Converter
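For readers working in C#, the same idea can be sketched without late binding. The class and method names here are hypothetical; the sketch reinterprets the four bytes of the float as an Int32 and prints that integer in base 2.

```csharp
using System;

public static class SingleBits
{
    // Returns the 32-bit IEEE-754 pattern of a float as a binary string.
    public static string ToBinary(float value)
    {
        // GetBytes and ToInt32 both use the machine's byte order,
        // so no manual Array.Reverse is needed here.
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(value), 0);
        return Convert.ToString(bits, 2).PadLeft(32, '0');
    }
}
```

SingleBits.ToBinary(1361294667f) returns 01001110101000100100011101101011, the IEEE-754 pattern shown above.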

Pascal to C# conversion

I am trying to convert this pascal code into C# in order to communicate with a peripheral device attached to a comm port. This piece of code should calculate the Control Byte, however I'm not getting the right hex Value therefore I'm wondering if I'm converting the code in the right way.
Pascal:
begin
check := 255;
for i:= 3 to length(sequence)-4 do
check := check xor byte(sequence[i]);
end;
C#:
int check = 255;
for (int x = 3; x < (sequence.Length - 4); x++)
{
check = check ^ (byte)(sequence[x]);
}
Pascal function:
{ *** conversion of number into string ‘hex’ *** }
function word_to_hex (w: word) : string;
var
i : integer;
s : string;
b : byte;
c : char;
begin
s := '';
for i:= 0 to 3 do
begin
b := (hi(w) shr 4) and 15;
case b of
0..9 : c := char(b+$30);
10..15 : c := char(b+$41-10);
end;
s := s + c;
w := w shl 4;
end;
word_to_hex := s;
end;
C# Equivalent:
public string ControlByte(string check)
{
string s = "";
byte b;
char c = '\0';
//shift = check >> 4 & 15;
for (int x = 0; x <= 3; x++)
{
b = (byte)((Convert.ToInt32(check) >> 4) & 15);
if (b >= 0 && b <= 9)
{
c = (char)(b + 0x30);
}
else if (b >= 10 && b <= 15)
{
c = (char)(b + 0x41 - 10);
}
s = s + c;
check = (Convert.ToInt32(check) << 4).ToString();
}
return s;
}
And last pascal:
function byte_to_hex (b:byte) : string;
begin
byte_to_hex := copy(word_to_hex(word(b)),3,2);
end;
I am not sure how this one takes a substring of the function result. Please let me know if there is something wrong with the code conversion and whether I need to convert the function result into bytes. I appreciate your help, UF.
Further info (EDIT): Initially I send a string sequence containing the command and the information that the printer is supposed to print. Since every sequence has a unique control byte (in hex), I have to calculate it from the sequence (sequence = "P1;1$l201PrinterPrinterPrinter1B/100.00/100.00/0/\"), which is what the first code block does. According to POSNET: "cc – control byte, encoded as 2 HEX digits (EXOR of all characters after ESC P to this byte with #255 initial quantity), according to the following algorithm in PASCAL language:" (see first code block). "The check number calculated in the above loop, which constitutes the control byte, should be recoded into two HEX characters (ASCII characters from scope: ‘0’..’9’,’A’..’F’,’a’..’f’), utilizing the following byte_to_hex function:" (see third code block). "{* conversion of byte into 2 characters *}" (see fifth code block).
The most obvious problem that I can see is that the Pascal code operates on 1-based 8 bit encoded strings, and the C# code operates on 0-based 16 bit encoded strings. To convert the Pascal/Delphi code that you use to C# you need to address the mis-match. Perhaps like this:
byte[] bytes = Encoding.Default.GetBytes(sequence);
int check = 255;
for (int i = 2; i < bytes.Length-4; i++)
{
check ^= bytes[i];
}
Now, in order to write this I've had to make quite a few assumptions, because you did not include anywhere near enough code in the question. Here's what I assumed:
The Pascal sequence variable is a 1-based 8 bit ANSI encoded Delphi AnsiString.
The Pascal check variable is a Delphi 32 bit signed Integer.
The C# sequence variable is a C# string.
If any of those assumptions prove to be false, then the code above will be no good. For instance, perhaps the Pascal check is really Byte. In which case I guess the C# code should be:
byte[] bytes = Encoding.Default.GetBytes(sequence);
byte check = 255;
for (int i = 2; i < bytes.Length - 4; i++)
{
check ^= bytes[i];
}
I hope that this persuades you of the importance of supplying complete information.
That's really all the meat of this question. The rest of the code concerns converting values to hex strings in C# code. That has been covered again and again here on Stack Overflow. For instance:
C# convert integer to hex and back again
How do you convert Byte Array to Hexadecimal String, and vice versa?
There are many many more such questions.
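Putting the two parts together, a minimal sketch could look like the following. It assumes the same things as the code above (a 1-based 8-bit Pascal string, a Byte check value) and additionally assumes the sequence is plain ASCII; the class and method names are hypothetical. The Pascal byte_to_hex calls copy(s, 3, 2) on the four hex digits of the word, which keeps the two digits of the low byte, and that is what the "X2" format produces directly.

```csharp
using System;
using System.Text;

public static class ControlByteSketch
{
    // XORs the bytes from Pascal index 3 to length-4 (0-based 2 to Length-5)
    // with an initial value of 255 and returns the result as two upper-case hex digits.
    public static string ControlByteHex(string sequence)
    {
        // Assumption: the sequence contains only ASCII characters.
        byte[] bytes = Encoding.ASCII.GetBytes(sequence);
        byte check = 255;
        for (int i = 2; i < bytes.Length - 4; i++)
        {
            check ^= bytes[i];
        }
        // "X2" yields exactly two hex digits, like byte_to_hex in the Pascal code.
        return check.ToString("X2");
    }
}
```

For example, ControlByteHex("abAwxyz") XORs only the byte for 'A' (0x41) into 255 and returns "BE".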

In SQL Server, I need to pack 2 characters into 1 character, similar to HEX. How?

I have a SQL Server table that has a column in it that is defined as Binary(7).
It is updated with data from a Cobol program that has Comp-3 data (packed decimal).
I wrote a C# program to take a number and create the Comp-3 value. I have it available to SQL Server via CLR Integration. I'm able to access it like a stored procedure.
My problem is, I need to take the value from this program and save it in the binary column. When I select a row of data that is already in there, I am seeing a value like the following:
0x00012F0000000F
The value shown is COBOL comp-3 (packed decimal) data, stored in the SQL table. Remember, this field is defined as Binary(7). There are two values concatenated and stored here. Unsigned value 12, and unsigned value 0.
I need to concatenate 0x00012F (length of 3 characters) and 0x0000000F (length of 4 characters) together and write it to the column.
My question is two part.
1) I am able to return a string representation of the Comp-3 value from my program. But, I'm not sure if this is the format I need to return to make this work. What format should I return to SQL, so it can be used correctly?
2) What do I need to do to convert this to make it work?
I hope I was clear enough. It's a lot to digest...Thanks!
I figured it out!
I needed to change the output to byte[], and reference it coming out of the program in SQL as varbinary.
This is the code, if anyone else in the future needs it. I hope this helps others that need to create Comp-3 (packed decimal) in SQL. I'll outline the steps to use it below.
Below is the source for the C# program. Compile it as a dll.
using System;
using System.Collections.Generic;
using System.Data;
using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;
namespace Numeric2Comp3
{
//PackedDecimal conversions
public class PackedDecimal
{
[Microsoft.SqlServer.Server.SqlProcedure]
public static void ToComp3(string numberin, out byte[] hexarray, out string hexvalue)
{
long value;
bool result = Int64.TryParse(numberin, out value);
if (!result)
{
hexarray = null;
hexvalue = null;
return;
}
Stack<byte> comp3 = new Stack<byte>(10);
byte currentByte;
if (value < 0)
{
currentByte = 0x0d; //signed -
value = -value;
}
else if (numberin.Trim().StartsWith("+"))
{
currentByte = 0x0c; //signed +
}
else
{
currentByte = 0x0f; //unsigned
}
bool byteComplete = false;
while (value != 0)
{
if (byteComplete)
currentByte = (byte)(value % 10);
else
currentByte |= (byte)((value % 10) << 4);
value /= 10;
byteComplete = !byteComplete;
if (byteComplete)
comp3.Push(currentByte);
}
if (!byteComplete)
comp3.Push(currentByte);
hexarray = comp3.ToArray();
hexvalue = bytesToHex(comp3.ToArray());
}
private static string bytesToHex(byte[] buf)
{
const string HexChars = "0123456789ABCDEF";
System.Text.StringBuilder sb = new System.Text.StringBuilder(buf.Length * 2);
for (int i = 0; i < buf.Length; i++)
{
// Work on the byte directly: Convert.ToSByte throws an OverflowException for values above 0x7F.
sb.Append(HexChars[buf[i] >> 4]); // high nibble
sb.Append(HexChars[buf[i] & 0x0F]); // low nibble
}
return sb.ToString();
}
}
}
Save the dll somewhere in a folder on the SQL Server machine. I used 'C:\NTA\Libraries\Numeric2Comp3.dll'.
Next, you'll need to enable CLR Integration on SQL Server. Read about it on Microsoft's website here: Introduction to SQL Server CLR Integration. Open SQL Server Management Studio and execute the following to enable CLR Integration:
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'clr enabled', 1;
GO
RECONFIGURE;
GO
Once that is done, execute the following in Management Studio:
CREATE ASSEMBLY Numeric2Comp3 from 'C:\NTA\Libraries\Numeric2Comp3.dll' WITH PERMISSION_SET = SAFE
You can execute the following to remove the assembly, if you need to for any reason:
drop assembly Numeric2Comp3
Next, in Management studio, execute the following to create the stored procedure to reference the dll:
CREATE PROCEDURE Numeric2Comp3
@numberin nchar(27), @hexarray varbinary(27) OUTPUT, @hexstring nchar(27) OUTPUT
AS
EXTERNAL NAME Numeric2Comp3.[Numeric2Comp3.PackedDecimal].ToComp3
If everything above runs successfully, you're done!
Here is some SQL to test it out:
DECLARE @in nchar(27), @hexstring nchar(27), @hexarray varbinary(27)
set @in = '20120123'
EXEC Numeric2Comp3 @in, @hexarray out, @hexstring out
select len(@hexarray), @hexarray
select len(@hexstring), @hexstring
This will return the following values:
(No column name) (No column name)
5 0x020120123F
(No column name) (No column name)
10 020120123F
In my case, what I need is the value coming out of @hexarray. This will be written to the Binary column in my table.
I hope this helps others that may need it!
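One detail worth noting: ToComp3 returns only as many bytes as the digits require (5 bytes for 20120123 above), while the Binary(7) column in the question holds two fixed-width fields of 3 and 4 bytes. A possible way to build the full column value is sketched below; the class and method names are hypothetical, and the field widths are taken from the question.

```csharp
using System;

public static class PackedFieldLayout
{
    // Left-pads a packed value with zero bytes up to the fixed field width.
    public static byte[] PadLeft(byte[] packed, int width)
    {
        if (packed.Length > width)
            throw new ArgumentException("Packed value is wider than the field.", nameof(packed));
        var result = new byte[width];
        Buffer.BlockCopy(packed, 0, result, width - packed.Length, packed.Length);
        return result;
    }

    // Concatenates two fixed-width fields into one column value.
    public static byte[] Concat(byte[] first, byte[] second)
    {
        var result = new byte[first.Length + second.Length];
        Buffer.BlockCopy(first, 0, result, 0, first.Length);
        Buffer.BlockCopy(second, 0, result, first.Length, second.Length);
        return result;
    }
}
```

Padding the unsigned 12 (bytes 01 2F) to 3 bytes and the unsigned 0 (byte 0F) to 4 bytes, then concatenating, gives 00 01 2F 00 00 00 0F, the value shown in the question.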
If you have Comp-3 stored in a binary field as a hex string, well, I wonder whether the process that created this is working as it should.
Be that as it may, the best solution would be to cast them in the select; the cast syntax is simple, but I don't know if a Comp-3 cast is available.
Here are examples on MSDN.
So let's work with the string: To transform the string you use this:
string in2 = "020120123C";
long iOut = Convert.ToInt64(in2.Substring(0, in2.Length - 1))
* (in2.Substring(in2.Length - 1, 1)=="D"? -1 : 1 ) ;
It treats the last character as the sign, with 'D' being the one negative sign. Both 'F' and 'C' would be positive.
Will you also need to write the data back?
I am curious: What string representation comes out for fractional numbers like 123.45?
( I'll leave the original answer for reference..:)
Here are a few lines of code to show how you can work with bit and bytes.
The operations to use are:
shift the data n bits right or left: << n or >> n
masking/clearing unwanted high bits: e.g. set all to 0 except the last 4 bits: & 0xF
adding bitwise: |
If you have a string representation like the one you have shown, the out3 and out4 bytes would be the result. The other conversions are just examples of how to process bits; you can't possibly have decimals stored as binary, or binary values that look like decimals. Maybe you get integers; then out7 and out8 would be the results.
To combine two bytes into one integer look at the last calculation!
// 3 possible inputs:
long input = 0x00012F0000071F;
long input2 = 3143;
string inputS = "0x00012F0000071F";
// take binary input as such
byte out1 = (byte)((input >> 4) & 0xFFFFFF );
byte out2 = (byte)(input >> 36);
// take string as decimals
byte out3 = Convert.ToByte(inputS.Substring(5, 2));
byte out4 = Convert.ToByte(inputS.Substring(13, 2));
// take binary as decimal
byte out5 = (byte)(10 * ((input >> 40) & 0xF) + (byte)((input >> 36) & 0xF));
byte out6 = (byte)(10 * ((input >> 8) & 0xF) + (byte)((input >> 4) & 0xF));
// take integer and pick out 3rd and last byte
byte out7 = (byte)(input2 >> 8);
byte out8 = (byte)(input2 & 0xFF);
// combine two bytes to one integer
int byte1and2 = (byte)(12) << 8 | (byte)(71) ;
Console.WriteLine(out1.ToString());
Console.WriteLine(out2.ToString());
Console.WriteLine(out3.ToString());
Console.WriteLine(out4.ToString());
Console.WriteLine(out5.ToString());
Console.WriteLine(out6.ToString());
Console.WriteLine(out7.ToString());
Console.WriteLine(out8.ToString());
Console.WriteLine(byte1and2.ToString());

Variable length encoding of an integer

What's the best way of doing variable length encoding of an unsigned integer value in C#?
"The actual intent is to append a variable length encoded integer (bytes) to a file header."
For ex: "Content-Length" - Http Header
Can this be achieved with some changes in the logic below.
I have written some code which does that ....
A method I have used, which makes smaller values use fewer bytes, is to encode 7 bits of data + 1 bit of overhead per byte.
The encoding works only for positive values starting with zero, but can be modified if necessary to handle negative values as well.
The way the encoding works is like this:
Grab the lowest 7 bits of your value and store them in a byte, this is what you're going to output
Shift the value 7 bits to the right, getting rid of those 7 bits you just grabbed
If the value is non-zero (ie. after you shifted away 7 bits from it), set the high bit of the byte you're going to output before you output it
Output the byte
If the value is non-zero (ie. same check that resulted in setting the high bit), go back and repeat the steps from the start
To decode:
Start at bit-position 0
Read one byte from the file
Store whether the high bit is set, and mask it away
OR in the rest of the byte into your final value, at the bit-position you're at
If the high bit was set, increase the bit-position by 7, and repeat the steps, skipping the first one (don't reset the bit-position)
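The two step lists above can be traced on an in-memory byte array. This is a small sketch with hypothetical names, kept separate from any stream handling so the byte layout is easy to inspect; it uses uint simply to avoid dealing with negative values.

```csharp
using System;
using System.Collections.Generic;

public static class VarintSketch
{
    // Encodes a value 7 bits at a time, least significant group first.
    public static byte[] Encode(uint value)
    {
        var output = new List<byte>();
        do
        {
            byte current = (byte)(value & 0x7F); // grab the lowest 7 bits
            value >>= 7;                         // drop them from the value
            if (value != 0)
                current |= 0x80;                 // high bit set: more bytes follow
            output.Add(current);
        } while (value != 0);
        return output.ToArray();
    }

    // Decodes a value produced by Encode.
    public static uint Decode(byte[] data)
    {
        uint value = 0;
        int shift = 0;
        foreach (byte b in data)
        {
            value |= (uint)(b & 0x7F) << shift;  // OR in 7 bits at the current position
            if ((b & 0x80) == 0)
                break;                           // high bit clear: this was the last byte
            shift += 7;
        }
        return value;
    }
}
```

For example, 300 encodes to the two bytes 0xAC 0x02: the low 7 bits (0x2C) with the high bit set, followed by the remaining bits (0x02). Decoding those bytes gives 300 again.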
39 32 31 24 23 16 15 8 7 0
value: |DDDDDDDD|CCCCCCCC|BBBBBBBB|AAAAAAAA|
encoded: |0000DDDD|xDDDDCCC|xCCCCCBB|xBBBBBBA|xAAAAAAA| (note, stored in reverse order)
As you can see, the encoded value might occupy one additional byte that is just half-way used, due to the overhead of the control bits. If you expand this to a 64-bit value, the additional byte will be completely used, so there will still only be one byte of extra overhead.
Note: Since the encoding stores values one byte at a time, always in the same order, big- or little-endian systems will not change the layout of this. The least significant byte is always stored first, etc.
Ranges and their encoded size:
0 - 127 : 1 byte
128 - 16.383 : 2 bytes
16.384 - 2.097.151 : 3 bytes
2.097.152 - 268.435.455 : 4 bytes
268.435.456 - max-int32 : 5 bytes
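The table above follows directly from counting how many 7-bit groups a value needs. A small sketch (the class and method names are hypothetical) that reproduces those sizes:

```csharp
using System;

public static class VarintSize
{
    // Returns the number of bytes the 7-bit encoding needs for a non-negative value.
    public static int EncodedSize(int value)
    {
        if (value < 0)
            throw new ArgumentOutOfRangeException(nameof(value), value, "value must be 0 or greater");
        int size = 1;                 // every value needs at least one byte
        while ((value >>= 7) != 0)    // one more byte for each further 7-bit group
            size++;
        return size;
    }
}
```

EncodedSize(127) is 1, EncodedSize(128) is 2, and EncodedSize(int.MaxValue) is 5, in line with the ranges listed.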
Here are C# implementations for both:
void Main()
{
using (FileStream stream = new FileStream(@"c:\temp\test.dat", FileMode.Create))
using (BinaryWriter writer = new BinaryWriter(stream))
writer.EncodeInt32(123456789);
using (FileStream stream = new FileStream(@"c:\temp\test.dat", FileMode.Open))
using (BinaryReader reader = new BinaryReader(stream))
reader.DecodeInt32().Dump();
}
// Define other methods and classes here
public static class Extensions
{
/// <summary>
/// Encodes the specified <see cref="Int32"/> value with a variable number of
/// bytes, and writes the encoded bytes to the specified writer.
/// </summary>
/// <param name="writer">
/// The <see cref="BinaryWriter"/> to write the encoded value to.
/// </param>
/// <param name="value">
/// The <see cref="Int32"/> value to encode and write to the <paramref name="writer"/>.
/// </param>
/// <exception cref="ArgumentNullException">
/// <para><paramref name="writer"/> is <c>null</c>.</para>
/// </exception>
/// <exception cref="ArgumentOutOfRangeException">
/// <para><paramref name="value"/> is less than 0.</para>
/// </exception>
/// <remarks>
/// See <see cref="DecodeInt32"/> for how to decode the value back from
/// a <see cref="BinaryReader"/>.
/// </remarks>
public static void EncodeInt32(this BinaryWriter writer, int value)
{
if (writer == null)
throw new ArgumentNullException("writer");
if (value < 0)
throw new ArgumentOutOfRangeException("value", value, "value must be 0 or greater");
do
{
byte lower7bits = (byte)(value & 0x7f);
value >>= 7;
if (value > 0)
lower7bits |= 128;
writer.Write(lower7bits);
} while (value > 0);
}
/// <summary>
/// Decodes a <see cref="Int32"/> value from a variable number of
/// bytes, originally encoded with <see cref="EncodeInt32"/> from the specified reader.
/// </summary>
/// <param name="reader">
/// The <see cref="BinaryReader"/> to read the encoded value from.
/// </param>
/// <returns>
/// The decoded <see cref="Int32"/> value.
/// </returns>
/// <exception cref="ArgumentNullException">
/// <para><paramref name="reader"/> is <c>null</c>.</para>
/// </exception>
public static int DecodeInt32(this BinaryReader reader)
{
if (reader == null)
throw new ArgumentNullException("reader");
bool more = true;
int value = 0;
int shift = 0;
while (more)
{
byte lower7bits = reader.ReadByte();
more = (lower7bits & 128) != 0;
value |= (lower7bits & 0x7f) << shift;
shift += 7;
}
return value;
}
}
You should first make a histogram of your values. If the distribution is uniform (that is, every bin of the histogram has roughly the same count), then you will not be able to encode the values more efficiently than their plain binary representation.
If your histogram is unbalanced (that is, some values occur more often than others), then it might make sense to choose an encoding that uses fewer bits for the frequent values and more bits for the other, unlikely values.
For example, if the numbers you need to encode are twice as likely to fit in 15 bits as not, you can use the 16th bit as a flag and store/send only 16 bits in the common case (if the flag is zero, the 16 bits hold a number that fits in a 32-bit integer).
If it's 1, then the upcoming 25 bits will form a 32-bit number.
You lose one bit here, but because that case is unlikely, over many numbers you save more bits than you spend.
Obviously, this is a trivial case; the extension of this idea to more than two cases is the Huffman algorithm, which assigns each value a "code word" that is close to optimal given the probability of that value appearing.
There is also the arithmetic coding algorithm, which does this too (and probably others).
In all cases, there is no solution that can store random values more efficiently than what is already done in computer memory.
You have to weigh how long and how hard the implementation of such a solution will be against the savings you'll get in the end to know whether it's worth it. The language itself is not relevant here.
If small values are more common than large ones you can use Golomb coding.
I know this question was asked quite a few years ago, however for MIDI developers I thought to share some code from a personal midi project I'm working on. The code block is based on a segment from the book Maximum MIDI by Paul Messick (This example is a tweaked version for my own needs however, the concept is all there...).
public struct VariableLength
{
// Variable Length byte array to int
public VariableLength(byte[] bytes)
{
int index = 0;
int value = 0;
byte b;
do
{
value = (value << 7) | ((b = bytes[index]) & 0x7F);
index++;
} while ((b & 0x80) != 0);
Length = index;
Value = value;
Bytes = new byte[Length];
Array.Copy(bytes, 0, Bytes, 0, Length);
}
// Variable Length int to byte array
public VariableLength(int value)
{
Value = value;
byte[] bytes = new byte[4];
int index = 0;
int buffer = value & 0x7F;
while ((value >>= 7) > 0)
{
buffer <<= 8;
buffer |= 0x80;
buffer += (value & 0x7F);
}
while (true)
{
bytes[index] = (byte)buffer;
index++;
if ((buffer & 0x80) > 0)
buffer >>= 8;
else
break;
}
Length = index;
Bytes = new byte[index];
Array.Copy(bytes, 0, Bytes, 0, Length);
}
// Number of bytes used to store the variable length value
public int Length { get; private set; }
// Variable Length Value
public int Value { get; private set; }
// Bytes representing the integer value
public byte[] Bytes { get; private set; }
}
How to use:
public void Example()
{
//Convert an integer into a variable length byte
int varLenVal = 480;
VariableLength v = new VariableLength(varLenVal);
byte[] bytes = v.Bytes;
//Convert a variable length byte array into an integer
byte[] varLenByte = new byte[2]{131, 96};
VariableLength v2 = new VariableLength(varLenByte);
int result = v2.Value; // 480
}
As Grimbly pointed out, there exists BinaryReader.Read7BitEncodedInt and BinaryWriter.Write7BitEncodedInt. However, in .NET Framework and .NET Core these are protected methods that one cannot call from outside a BinaryReader or BinaryWriter subclass (they became public in .NET 5).
However, what you can do is take the internal implementation and copy it from the reader and the writer:
public static int Read7BitEncodedInt(this BinaryReader br) {
// Read out an Int32 7 bits at a time. The high bit
// of the byte when on means to continue reading more bytes.
int count = 0;
int shift = 0;
byte b;
do {
// Check for a corrupted stream. Read a max of 5 bytes.
// In a future version, add a DataFormatException.
if (shift == 5 * 7) // 5 bytes max per Int32, shift += 7
throw new FormatException("Format_Bad7BitInt32");
// ReadByte handles end of stream cases for us.
b = br.ReadByte();
count |= (b & 0x7F) << shift;
shift += 7;
} while ((b & 0x80) != 0);
return count;
}
public static void Write7BitEncodedInt(this BinaryWriter br, int value) {
// Write out an int 7 bits at a time. The high bit of the byte,
// when on, tells reader to continue reading more bytes.
uint v = (uint)value; // support negative numbers
while (v >= 0x80) {
br.Write((byte)(v | 0x80));
v >>= 7;
}
br.Write((byte)v);
}
When you include this code in any class of your project, you'll be able to use the methods on any BinaryReader/BinaryWriter object. They've only been slightly modified to make them work outside of their original classes (for example by changing ReadByte() to br.ReadByte()). The comments are from the original source.
BinaryReader.Read7BitEncodedInt Method ?
BinaryWriter.Write7BitEncodedInt Method ?
