Reading a Cobol generated file

Reading a Cobol generated file - c#

I’m currently working on a C# application which is going to sit between two existing apps. All I know about the second application is that it processes files generated by the first one. The first application is written in Cobol.
Steps:
1) Cobol application, writes some files and copies to a directory.
2) The second application picks these files up and processes them.
My C# app would sit between 1) and 2). It would have to pick up the file generated by 1), read it, modify it and save it, so that application 2) wouldn’t know I have even been there.
I have a few problems.
First of all, if I open a file generated by 1) in Notepad, most of it is unreadable, while other parts are readable.
If I read the file, modify it and save, I must save the file with the same notation used by the Cobol application, so that app 2) doesn't know I've been there.
I've tried reading the file this way, but it's still unreadable:
Code:
string ss = @"filename";
using (FileStream fs = new FileStream(ss, FileMode.Open))
{
    StreamReader sr = new StreamReader(fs);
    string gg = sr.ReadToEnd();
}
Also, if I find a way of making it readable (using some sort of encoding technique), I'm afraid that when I save the file again, I may change its original format.
Any thoughts? Suggestions?

To read the COBOL-genned file, you'll need to know a few things.
First, you'll need the record layout (copybook) for the file. A COBOL record layout will look something like this:
01  PATIENT-TREATMENTS.
    05  PATIENT-NAME                PIC X(30).
    05  PATIENT-SS-NUMBER           PIC 9(9).
    05  NUMBER-OF-TREATMENTS        PIC 99 COMP-3.
    05  TREATMENT-HISTORY OCCURS 0 TO 50 TIMES
            DEPENDING ON NUMBER-OF-TREATMENTS
            INDEXED BY TREATMENT-POINTER.
        10  TREATMENT-DATE.
            15  TREATMENT-DAY       PIC 99.
            15  TREATMENT-MONTH     PIC 99.
            15  TREATMENT-YEAR      PIC 9(4).
        10  TREATING-PHYSICIAN      PIC X(30).
        10  TREATMENT-CODE          PIC 99.
You'll also need a copy of IBM's Principles of Operation (S/360, S/370, z/OS, doesn't really matter for our purposes). The latest is available from IBM at
http://www-01.ibm.com/support/docview.wss?uid=isg2b9de5f05a9d57819852571c500428f9a (but you'll need an IBM account).
An older edition is available, gratis, at http://www.hack.org/mc/texts/principles-of-operation.pdf
Chapters 8 (Decimal Instructions) and 9 (Floating Point Overview and Support Instructions) are the interesting bits for our purposes.
Without that, you're pretty much lost.
Then, you need to understand COBOL data types. For instance:
PIC defines an alphameric formatted field (PIC 9(4), for example, is 4 decimal digits, which might be filled with four space characters if the value is missing). PIC 999V99 is 5 decimal digits, with an implied decimal point. And so on and so forth.
BINARY is [usually] a signed fixed point binary integer. Usual sizes are halfword (2 octets) and fullword (4 octets).
COMP-1 is single precision floating point.
COMP-2 is double precision floating point.
If the data source is an IBM mainframe, COMP-1 and COMP-2 likely won't be IEEE floating point: they will be in IBM's base-16 excess-64 floating point format. You'll need something like the S/370 Principles of Operation to help you understand it.
COMP-3 is 'packed decimal', of varying lengths. Packed decimal is a compact way of representing a decimal number. The declaration will look something like this: PIC S9999V99 COMP-3. This says that it is signed and consists of 6 decimal digits with an implied decimal point. Packed decimal represents each decimal digit as a nibble of an octet (hex values 0-9). The high-order digit is the upper nibble of the leftmost octet. The low nibble of the rightmost octet is a hex value A-F representing the sign. So the above PIC clause will require ceil( (6+1)/2 ) or 4 octets. The value -345.67, as represented by the above PIC clause, will look like 0x0034567D. The actual sign value may vary (the default is C/positive, D/negative, but A, C, E and F are treated as positive, while only B and D are treated as negative). Again, see the S/370 Principles of Operation for details on the representation.
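To make the nibble layout concrete, here is a minimal C# sketch of decoding a COMP-3 field along the lines described above. The class name `PackedDecimal`, the method `Decode` and the `scale` parameter (digits after the implied decimal point, taken from the copybook) are my own names for illustration, not part of any library.

```csharp
using System;

public static class PackedDecimal
{
    // Hypothetical helper: decodes one COMP-3 field.
    // 'scale' is the number of digits after the implied decimal point
    // (2 for PIC S9999V99); it has to come from the copybook.
    public static decimal Decode(byte[] field, int scale)
    {
        if (field == null || field.Length == 0)
            throw new ArgumentException("Empty field.", nameof(field));

        decimal value = 0m;
        for (int i = 0; i < field.Length; i++)
        {
            int high = field[i] >> 4;
            int low = field[i] & 0x0F;

            // The high nibble is always a digit.
            if (high > 9) throw new FormatException("Bad digit nibble.");
            value = value * 10 + high;

            if (i < field.Length - 1)
            {
                // Low nibble of every octet except the last is a digit.
                if (low > 9) throw new FormatException("Bad digit nibble.");
                value = value * 10 + low;
            }
            else
            {
                // Low nibble of the last octet is the sign (A-F).
                if (low < 0xA) throw new FormatException("Bad sign nibble.");
                // B and D mean negative; A, C, E and F mean positive.
                if (low == 0xB || low == 0xD) value = -value;
            }
        }

        // Apply the implied decimal point.
        for (int i = 0; i < scale; i++) value /= 10m;
        return value;
    }
}
```

Calling `PackedDecimal.Decode(new byte[] { 0x00, 0x34, 0x56, 0x7D }, 2)` decodes the example octets from the paragraph above. If you only change some fields, copying every other octet of the record through untouched is probably the safest way to keep the file format intact.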
Related to COMP-3 is zoned decimal. This might be declared as `PIC S9999V99' (signed, 6 decimal digits, with an implied decimal point). Decimal digits, in EBCDIC, are the hex values 0xF0 - 0xF9. 'Unpack' (mainframe machine instruction) takes a packed decimal field and turns it into a character field. The process is:
Start with the rightmost octet. Invert it, so the sign nibble is on top, and place it into the rightmost octet of the destination field.
Working from right to left (source and target both), strip off each remaining nibble of the packed decimal field, and place it into the low nibble of the next available octet in the destination. Fill the high nibble with a hex F.
The operation ends when either the source or destination field is exhausted.
If the destination field is not exhausted, it is left-padded with zeroes by filling the remaining octets with decimal '0' (0xF0).
So our example value, -345.67, if stored with the default sign value (hex D), would get unpacked as 0xF0F0F0F3F4F5F6D7 ('0003456P', in EBCDIC).
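Going the other way, an EBCDIC zoned decimal field can be read back into a number with something like the following C# sketch. It assumes the sign lives in the high nibble of the last octet, as in the unpacked example above; ASCII files with overpunched signs use a different convention that this does not handle. `ZonedDecimal`, `Decode` and `scale` are names I made up for the example.

```csharp
using System;

public static class ZonedDecimal
{
    // Hypothetical helper: decodes one EBCDIC zoned decimal field.
    // 'scale' is the number of digits after the implied decimal point.
    public static decimal Decode(byte[] field, int scale)
    {
        if (field == null || field.Length == 0)
            throw new ArgumentException("Empty field.", nameof(field));

        decimal value = 0m;
        foreach (byte b in field)
        {
            // The digit is the low nibble of every octet.
            int digit = b & 0x0F;
            if (digit > 9) throw new FormatException("Bad digit nibble.");
            value = value * 10 + digit;
        }

        // The high nibble of the rightmost octet carries the sign.
        int sign = field[field.Length - 1] >> 4;
        if (sign == 0xB || sign == 0xD) value = -value;

        // Apply the implied decimal point.
        for (int i = 0; i < scale; i++) value /= 10m;
        return value;
    }
}
```

For instance, `ZonedDecimal.Decode(new byte[] { 0xF0, 0xF0, 0xF0, 0xF3, 0xF4, 0xF5, 0xF6, 0xD7 }, 2)` reads the unpacked octets from the example.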
[There you go. There's a quiz later]
If the COBOL app lives on an IBM mainframe, has the file been converted from its native EBCDIC to ASCII? If not, you'll have to do the mapping yourself. (Hint: it's not necessarily as straightforward as that might seem, since this might be a selective process: only character fields get converted. COMP-1, COMP-2, COMP-3 and BINARY fields get excluded since they are a sequence of binary octets.) Worse, there are multiple flavors of EBCDIC representations, due to the varying national implementations and varying print chains in use on different printers.
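As a rough illustration of the selective conversion, the C# sketch below decodes only a chosen character field of a raw record and leaves everything else as bytes. It assumes US EBCDIC (code page 37), which may well be wrong for your file, and the `EbcdicText` class and its offsets are hypothetical. On .NET Core and later the code page has to be registered first; on the older .NET Framework it is built in.

```csharp
using System;
using System.Text;

public static class EbcdicText
{
    private static readonly Encoding Ebcdic;

    static EbcdicText()
    {
        // Makes legacy code pages available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        // Assumption: code page 37 (US EBCDIC). Other variants exist.
        Ebcdic = Encoding.GetEncoding(37);
    }

    // Decodes only the octets of one character field; binary fields
    // elsewhere in the record are not touched.
    public static string DecodeField(byte[] record, int offset, int length)
    {
        return Ebcdic.GetString(record, offset, length);
    }

    // Encodes text back to EBCDIC so it can be written over the
    // same octets of the record.
    public static byte[] EncodeField(string text)
    {
        return Ebcdic.GetBytes(text);
    }
}
```

The record itself would come from something like `File.ReadAllBytes(path)` rather than a `StreamReader`, since reading the whole file as text can alter the binary fields before you ever get to look at them.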
Oh...one last thing. The mainframe hardware tends to like different things aligned on halfword, word or doubleword boundaries, so the record layout may not map directly to the octets in the file as there may be padding octets inserted between fields to maintain the needed word alignment.
Good Luck.

I see from comments attached to your question that you are dealing with the “classic” COBOL batch file structure: Header record, detail records and trailer record.
This is probably bad news if you are responsible for creating the trailer record! The typical “trailer” record is used to identify the end-of-file and provides control information such as the number of records that precede it and various check sums and/or grand totals for “detail” records. In other words, you may need to read and summarize the entire file in order to create the trailer. Add to this the possibility that much of the data in the file is in Packed Decimal, Zoned Decimal or other COBOLish numeric data types, you could be in for a rough time.
You might want to question why you are adding trailer records to these files. Typically the “trailer” is produced by the same program or application that created the “detail” records. The trailer is supposed to act as a verification that the sending application/program wrote all of the data it was supposed to. The summary totals, counts etc. are used by the receiving application to verify that the trailer tallies with the preceding detail records. This is supposed to serve as another verification that the sending application didn't muff up the data or that it was not corrupted en route (no, that wasn't a joke, but maybe it should be). When a "man in the middle" creates the trailers, it kind of defeats the entire purpose of the exercise (no matter how flawed it might have been to begin with).

It would be useful to know which Cobol dialect you are dealing with, because there is no single Cobol format. Some Cobol compilers (Micro Focus) put a "File Description" at the front of files (for Micro Focus VB / Indexed files).
Have a look at the RecordEditor (http://record-editor.sourceforge.net/). It has a File Wizard which could be very useful for you.
In the File Wizard set the file as Fixed-Width File (most common in Cobol). The program lets you try out different Record Lengths. When you get the correct record length, the Text fields should line up.
Later on in the Wizard there is a field search which can look for Binary, Comp-3 and Text fields.
There are some notes on using the RecordEditor's Wizard with an unknown file here:
http://record-editor.sourceforge.net/Unkown.htm
Unless the file is coming from a Mainframe / AS400 it is unlikely to use EBCDIC (cp037 - Code Page 37 is US EBCDIC); any text is most likely in ASCII.
The file probably contains Packed-Decimal (Comp-3) and Binary-Integer data. Most Cobols use Big-Endian (for Comp integers) even on Intel (little-endian hardware).
One thing to remember with Cobol: PIC s9(6)V99 comp is stored as a Binary Integer, with x'0001' representing 0.01. So unless you have the Cobol definition, you cannot tell whether a binary 1 is 1, 0.1, 0.01 etc.
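A small C# sketch of that last point: the same four octets give a different value depending on the scale you get from the Cobol definition. It assumes a 4-octet big-endian signed integer, which is a common but not universal layout for PIC s9(6)V99 comp; `CompBinary` and `DecodeInt32` are names invented for the example.

```csharp
using System;

public static class CompBinary
{
    // Hypothetical helper: reads a 4-octet big-endian signed integer
    // and applies the implied decimal point from the PIC clause.
    public static decimal DecodeInt32(byte[] field, int scale)
    {
        if (field == null || field.Length != 4)
            throw new ArgumentException("Expected 4 octets.", nameof(field));

        // Assemble the integer by hand so the result does not depend
        // on the endianness of the machine running this code.
        int raw = (field[0] << 24) | (field[1] << 16) | (field[2] << 8) | field[3];

        decimal value = raw;
        for (int i = 0; i < scale; i++) value /= 10m;
        return value;
    }
}
```

With the octets `00 00 00 01`, a scale of 2 gives 0.01 while a scale of 0 gives 1, which is exactly the ambiguity described above.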

Related

Decompression of 2 byte to 4 byte float

I have some animation data (x,y,z), which is represented as 2-byte structures and written in Little Endian. I know that they should be 4-byte floating point values, so I have to unpack them. I collected a few sample values as precisely as possible (they don't represent the packed values exactly, but are very close to them) and roughly divided the packed values into a few ranges.
Sample values (Little Endian):
0.048879981 - 0x0046
0.056879997 - 0x0047
0.253880024 - 0x0050
0.313879967 - 0x0051
0.623880029 - 0x0055
1.003879905 - 0x0058
-0.066120029 - 0x00C8
-0.1561199428 - 0x00CD
-0.8691199871 - 0x00D7
Ranges:
0x0000 : zero
[0x0000,0x0014] : invisible changes (increasing probably)
[0x0014, ....] : increasing (visible)
0x0080 : zero, probably the point of sign change
[0x0080,0x00B0] : invisible changes (decreasing probably)
[0x00B0, ....] : decreasing (visible)
There are gaps (....) at the ends of the ranges because it is hard to check them correctly, but I assume that values this large, lying close to those ends, aren't used in practice.
Also, there appears to be a symmetry between the positive and negative ranges. For example, I tested 0x0058, which gave 1.003879905, and 0x00D8, which gave a value close to -1.003879905 but not exactly. Maybe this is because of a slight offset after 0x0080: visible decreasing starts from 0x00B0, but it should start at about 0x0094 if the two ranges were symmetric. It could also be slight measurement inaccuracy.
So, how can I write a function in C# that converts the source data to 4-byte floating point?

Some initial comments based on the information in the question so far:
byte[] buffer = new byte[4]; is a bad approach because it addresses bytes individually while the other code manipulates bits using shifts within words, and C# does not define endianness. Simply use an unsigned 32-bit integer for all the work. The code will actually be simpler.
The code does not handle subnormal values properly. If num2 is zero and num3 is not zero, the significand (num3) must be shifted and the exponent (num2) must be adjusted.

Which data type should I use to handle nine-digit account numbers and why?

Which data type should I use to handle 9-digit account numbers and why?
varchar(9) or int or decimal or something else ?
I'm talking from a database perspective — and the DBMS is Informix.

TL;DR Use CHAR(9).
You have a number of options, most of them mentioned in the comments. The options have different trade-offs. They include:
CHAR(9). This uses 9 bytes of storage, but can store leading zeros and that can save on formatting in the applications. You can write a check constraint that ensures that the value always contains 9 digits. If you later need to use longer numbers, you can extend the type easily to CHAR(13) or CHAR(16) or whatever.
INTEGER. This uses 4 bytes of storage. If you need leading zeros, you will have to format them yourself. If you later need more digits, you will need to change the type to BIGINT.
SERIAL. This could be used on one table and would automatically generate new values when you insert a zero into the column. Cross-referencing tables would use the INTEGER type.
DECIMAL(9,0). This uses 5 bytes of storage, and does not store leading zeros so you will have to format them yourself. If you later need more digits, you can change the type to DECIMAL(13,0) or DECIMAL(16,0) or whatever.
BIGINT and BIGSERIAL. These are 8-byte integers that can take you to 16 digits without problem. You have to provide leading zeros yourself.
INT8 and SERIAL8 — do not use these types.
VARCHAR(9). Not really appropriate since the length is not variable. It would require 10 bytes on disk where 9 is sufficient.
LVARCHAR(9). This is even less appropriate than VARCHAR(9).
NCHAR(9). This could be used as essentially equivalent to CHAR(9), but if you're only going to store digits, you may as well use CHAR(9).
NVARCHAR(9). Not appropriate for the same reasons that VARCHAR(9) and NCHAR(9) are not appropriate.
MONEY(9,0). Basically equivalent to DECIMAL(9,0) but might attract currency symbols — it would be better to use DECIMAL(9,0).
Any other type is rather quickly inappropriate, unless you design an extended type that uses INTEGER for storage but provides a conversion function to CHAR(9) that adds the leading zeros.

Is it possible to keep original trailing zeroes from C# decimal type when saving in SQL Server?

I would like to find a simple way to keep trailing zeroes from C# decimal type when saving in SQL Server.
Example:
5.3 and save, the system should display 5.3 after reloading.
5.30 and save, the system should display 5.30 after reloading.
5.300 and save, the system should display 5.300 after reloading.
The C# decimal type seems to do it well but the SQL Server decimal type does not.
For example, I would define the SQL Server column as decimal(9,3) and all 3 values would be saved as 5.300.
Of course, I could convert to string but I just wonder if there is any more elegant solution if any computing is needed on this field.

I think it is not a good idea to mix the DB and UI layers. How SQL Server stores the data is a DB problem, and how to show the value to the user is a UI problem.
C# stores a Decimal as a 96-bit integer together with a base-10 scale factor:
http://msdn.microsoft.com/en-us/library/system.decimal.getbits.aspx
Internal representation:
1m : 0x00000001 0x00000000 0x00000000 0x00000000
1.0000m : 0x00002710 0x00000000 0x00000000 0x00040000
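If you want to check such a listing yourself, a short C# sketch like this prints the four 32-bit words returned by `decimal.GetBits` in the same left-to-right order (low, middle and high parts of the integer, then the flags word that holds the scale and sign). `DecimalBits.Dump` is just a name I picked for the example.

```csharp
using System;
using System.Linq;

public static class DecimalBits
{
    // Hypothetical helper: formats the four words from decimal.GetBits
    // as hex, in the order lo, mid, hi, flags.
    public static string Dump(decimal value)
    {
        int[] bits = decimal.GetBits(value);
        return string.Join(" ", bits.Select(b => "0x" + b.ToString("x8")));
    }
}
```

Running `Console.WriteLine(DecimalBits.Dump(1m))` and the same for `1.0000m` shows that the two values differ both in the integer part and in the scale stored in the last word, which is how C# remembers the trailing zeroes.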
AFAIK, the internal representation of decimal in MSSQL is not documented. And, if they are using the following floating point format, http://en.wikipedia.org/wiki/IEEE_754-1985, then it is impossible.
But there are parameters of the decimal in MSSQL, like precision and scale. One can try to use ADO.NET to manipulate these parameters in code, like this:
var cmd = new SqlCommand("command", new SqlConnection("connection"));
cmd.Parameters.Add("@p1", SqlDbType.Decimal, 18);
cmd.Parameters["@p1"].Precision = 18;
cmd.Parameters["@p1"].Scale = 8;
Then it may be possible, but it is really a hack, and you should not use it in production.

If you want the number of significant figures (or precision, if that's all you're interested in) for a numeric value to vary on a row-by-row basis you'll need to have a separate column that stores that number as a single numeric column isn't going to store that information.
If the number of significant figures (or precision) is consistent for all of the rows, then you can simply store the data in the database with as much precision as the database supports and then convert it back to what it should be within your application before presenting it to the user.
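For the row-by-row case, one way to get the number to store in that extra column is to read the scale out of the C# decimal before saving and reapply it when displaying. This is only a sketch of the idea; `DecimalScale`, `GetScale` and `Restore` are hypothetical names, and how you map the scale to a database column is up to you.

```csharp
using System;
using System.Globalization;

public static class DecimalScale
{
    // Number of digits after the decimal point that the C# value
    // carries, e.g. 2 for 5.30m. Stored in bits 16-23 of the flags word.
    public static byte GetScale(decimal value)
    {
        return (byte)((decimal.GetBits(value)[3] >> 16) & 0xFF);
    }

    // Formats a value read back from the database with the scale
    // that was saved alongside it.
    public static string Restore(decimal stored, byte scale)
    {
        return stored.ToString("F" + scale, CultureInfo.InvariantCulture);
    }
}
```

So a user entering 5.30 would be saved as the value plus a scale of 2, and `DecimalScale.Restore(5.300m, 2)` would turn the decimal(9,3) value back into the text the user typed.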

You can define the column type of SQL Server as sql_variant.
When you set a value of decimal(C#) to that column via SqlParameter, SQL Server keeps the metadata including scale of the value.

Driver License Barcode Field Data Types

I am given a task to develop a small library which needs to be able to read PDF417 barcode located on the back of the Driver's License card and parse the data out to our custom object.
However, I need to know what these data types denote:
4/ANS, 10/ANS, 5/ANS, etc.
The complete documentation can be found at: http://www.dol.wa.gov/external/docs/barcodeCalibration-basic.pdf

Guessing here, but <field length>/ANS, where A is alphabetic, N numeric and S spaces?
For example, 3/A is 3 alphabetic characters like USA.
Funny that weight and sex are both 1/N, but the example given (2 in both cases) fits my hypothesis.

The Washington spec is based on the AAMVA standard here:
http://www.aamva.org/DL-ID-Card-Design-Standard/
The 2013 ID Card Design Standard is here: http://www.aamva.org/WorkArea/DownloadAsset.aspx?id=4435
The PDF 417 barcode specifications start on page 51 (65) of that document. On page 58 (72) they list the type definitions: "A=alpha A-Z, N=numeric 0-9, S=special, F=fixed length, V=variable length"
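Based on those type definitions, a field check in C# might look roughly like the sketch below. It rests on several assumptions of mine that the spec excerpt does not confirm: that the number before the slash is a maximum length, and that "special" means any character that is neither A-Z nor 0-9. `FieldSpec.Matches` is a made-up helper, not part of any barcode library.

```csharp
using System;

public static class FieldSpec
{
    // Hypothetical helper: checks a value against a spec such as "4/ANS".
    // Assumptions: the number is a maximum length; A = A-Z, N = 0-9,
    // S = any other character.
    public static bool Matches(string spec, string value)
    {
        string[] parts = spec.Split('/');
        if (parts.Length != 2 || !int.TryParse(parts[0], out int maxLength))
            throw new FormatException("Expected something like 4/ANS.");

        string classes = parts[1].ToUpperInvariant();
        if (value.Length > maxLength) return false;

        foreach (char c in value)
        {
            bool isAlpha = c >= 'A' && c <= 'Z';
            bool isDigit = c >= '0' && c <= '9';
            bool allowed =
                (isAlpha && classes.IndexOf('A') >= 0) ||
                (isDigit && classes.IndexOf('N') >= 0) ||
                (!isAlpha && !isDigit && classes.IndexOf('S') >= 0);
            if (!allowed) return false;
        }
        return true;
    }
}
```

For example, `FieldSpec.Matches("3/A", "USA")` accepts the country code, while `FieldSpec.Matches("3/A", "US1")` rejects it because a digit is not allowed in an alpha-only field.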

6 A/N means it is a 6-character (A)lpha/(N)umeric variable, spaces included. For example, 5'7" could be expressed as a variable that fits the format as "067 in" (the quotation marks only enclose the actual value). This is a very common definition of terms, usually found in database programming. Your variable will always be 6 characters long (including the space character): 3 alpha ( in) and 3 numeric (067).

Reading 39 digit number from excel sheet using c#

Excel cannot process numbers larger than 64 bits (it stores anything bigger as a power of 10), but our application needs 128-bit data. For that, I have formatted the particular cell as text in the Excel sheet, so that I can enter a very big number. Now I am able to enter the big number, but I am not able to read that particular cell in code; reading it gives an error.
I am using OleDbConnection in C#.

You mention that it is some sort of card ID, which to me says it's a string rather than a true number.
But, if you really have to manipulate as an integer, have you looked at BigInteger?
BigInteger, GetFiles, and More
Update in response to comments: @Shashikiran: you seem to be treating the symptoms rather than the cause. Your real problem appears to be reading a string longer than 14 chars when Excel is treating the cell contents as a number rather than a string (due to all numeric chars). Sounds like you need to tell Excel it's a string rather than a number. I believe you do this by pre-fixing with 'A'

Can you read that cell as a string and then convert it to a biginteger?
C# has no built-in 128-bit integer data type.
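Assuming you can get the cell contents back as a string, converting it with `System.Numerics.BigInteger` could look like this C# sketch. The range check against 2^128 - 1 is my assumption about what "128-bit data" means here (unsigned), and `BigCellValue.TryParseUInt128` is a hypothetical helper name.

```csharp
using System.Globalization;
using System.Numerics;

public static class BigCellValue
{
    // Largest unsigned 128-bit value: 2^128 - 1 (39 decimal digits).
    public static readonly BigInteger MaxUInt128 = (BigInteger.One << 128) - 1;

    // Hypothetical helper: parses cell text made of digits only and
    // rejects anything that would not fit in 128 bits.
    public static bool TryParseUInt128(string cellText, out BigInteger value)
    {
        value = BigInteger.Zero;
        if (string.IsNullOrWhiteSpace(cellText)) return false;

        // NumberStyles.None allows digits only: no sign, no exponent.
        if (!BigInteger.TryParse(cellText.Trim(), NumberStyles.None,
                CultureInfo.InvariantCulture, out BigInteger parsed))
            return false;

        if (parsed > MaxUInt128) return false;

        value = parsed;
        return true;
    }
}
```

If Excel has already turned the cell into scientific notation, the parse fails rather than silently returning a rounded number, which is probably what you want for an ID.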
