Regex Headache, String formatting, datagrid

I've been battling this for hours and admit defeat. I have coded a C# WinForms client which receives Telnet data in the following format:
Decimal AlphaNum Int "dB" Int "WPM" AN Int + "Z"
14048.4 XY3CVI 19 dB 29 WPM 1700Z
14092.6 XY3CVI 19 dB 29 WPM XZ 1700Z
Periodically due to an upstream bottleneck I get a 'double-hit' without the CRLF.
14048.4 XY3CVI 19 dB 29 WPM 1700Z14048.4 XY3CVI 19 dB 29 WPM 1700Z
The server (not me) pads the incoming data with varying amounts of whitespace to keep the columns perfectly aligned, accounting for the varying number of characters in 'AlphaNum' and in the 'Int' which precedes dB.
I need a means, not necessarily regex, to add the CRLF when a string comes in doubled-up (sometimes tripled), preserving the extra data.
Since I'm already pleading for help I'd just as well go the whole hog, as it's likely the combined solution will be more elegant to implement as one:
To take the above 'problem', and format it for entry into a datagrid with four columns; from the above; columns 1, 2, 3 and 5. Your help would be greatly appreciated!

As far as I understand it, your lines with the CRLF are always the same size, right?
Just check the size of every line. If it's longer than the expected size, add the CRLF using line.Insert(expectedLineSize, "\r\n"). You will then have a new line to check; continue like that until you don't have any more lines.
As for the second part, you have fixed-size columns, so just use Substring.
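The exact column widths are not given in the question, so the following is only a minimal sketch that swaps in a different technique from the Insert/Substring approach above: it assumes every record ends with a four-digit time followed by "Z", and treats a digit immediately after that token as the start of a glued-on record. The class and method names (SpotParser, SplitRecords, GridColumns) are hypothetical, and a callsign that happens to contain four digits followed by "Z" and another digit would break the assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SpotParser
{
    // Zero-width boundary: just after "ddddZ" and just before the next digit.
    // Assumption: this only occurs where two records were glued together.
    private static readonly Regex RecordBoundary = new Regex(@"(?<=\d{4}Z)(?=\d)");

    // Splits a raw line into one string per record (handles doubles and triples).
    public static List<string> SplitRecords(string raw)
    {
        var records = new List<string>();
        foreach (string part in RecordBoundary.Split(raw))
        {
            string trimmed = part.Trim();
            if (trimmed.Length > 0)
            {
                records.Add(trimmed);
            }
        }
        return records;
    }

    // Returns columns 1, 2, 3 and 5 (frequency, callsign, dB value, WPM value).
    public static string[] GridColumns(string record)
    {
        // A null separator array means "split on any whitespace".
        string[] tokens = record.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length < 5)
        {
            throw new FormatException("Expected at least five columns: " + record);
        }
        return new[] { tokens[0], tokens[1], tokens[2], tokens[4] };
    }
}
```

Each array returned by GridColumns could then be passed to something like DataGridView.Rows.Add, one call per record.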

Related

What is the time format 'MMMMMMMMSS'?

When processing a file from a telecom company, I came across this in the specifications :
When reading in that data, how can I convert that format to something usable in C#? I have no idea what the MMMMMMMMSS format is!

The only logical explanation I can think of is the following:
Since this is a call duration representation, let's say that a call duration was 10:10:5. I assume they want to represent this in minutes and seconds only. Hence considering the given format, it would be represented like this: 61005 which is 610 minutes and 5 seconds, then the 5 remaining bytes can be filled with trailing zeros, or with space characters (since you mentioned that's what they used to represent a value).
Hope that helps.

I would expect each of these to be zero-padded. Regardless, split the last two characters off to derive seconds and cents, respectively. The first 8 characters represent minutes and dollars. A one minute (exactly) call would be 7 zeros followed by a 1 followed by two zeros. A ten minute and ten second call would be 6 zeros followed by 1010.
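As a rough sketch of that reading of the format: the helper below (CallDuration.Parse is a hypothetical name) assumes the field is ten characters, eight for minutes and two for seconds, and that any padding is either leading zeros or spaces. If the file pads some other way, the parsing would need to change.

```csharp
using System;
using System.Globalization;

public static class CallDuration
{
    // Parses an MMMMMMMMSS field: 8 digits of minutes followed by 2 digits of seconds.
    // Assumption: padding is spaces (trimmed here) or leading zeros.
    public static TimeSpan Parse(string field)
    {
        string digits = field.Trim();
        if (digits.Length < 3 || digits.Length > 10)
        {
            throw new FormatException("Expected 3 to 10 digits: '" + field + "'");
        }
        digits = digits.PadLeft(10, '0');

        int minutes = int.Parse(digits.Substring(0, 8), NumberStyles.None, CultureInfo.InvariantCulture);
        int seconds = int.Parse(digits.Substring(8, 2), NumberStyles.None, CultureInfo.InvariantCulture);

        // TimeSpan normalises minutes above 59 into hours.
        return new TimeSpan(0, minutes, seconds);
    }
}
```

With these assumptions, "0000001010" parses to ten minutes and ten seconds, and "0000000100" to exactly one minute.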

How does String.Format work in this situation?

I have a website where you can buy stuff, and we want to format the orderID that goes to our portal in certain way. I am using the string.format method to format it like this:
Portal.OrderID = string.Format( "{0}{1:0000000}-{2:000}",
"Z",
this.Order.OrderID,
"000");
So we want it to look like this basically Z0545698-001. My question is, if I am using string.format will it blow up if this.Order.OrderID is greater than 7 characters?
If so, how can I keep the same formatting (i.e. Z 1234567 - 000) but have the first set of numbers (the 1-7) be a minimum of 7 (with any numbers less than 7 in length have leading 0's). And then have anything greater than 7 in length just extend the formatting so I could get an order number like Z12345678-001?

how can I keep the same formatting (i.e. Z 1234567 - 000) but have the first set of numbers (the 1-7) be a minimum of 7 (with any numbers less than 7 in length have leading 0's). And then have anything greater than 7 in length just extend the formatting so I could get an order number like Z12345678-001?
Use exactly the code that you have, because that's what it does.
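A small sketch of why that works, assuming OrderID is a numeric type such as int (if it were a string, the 0000000 format would be ignored and no padding would happen). OrderIdFormatter is a hypothetical helper, and the suffix is passed as a number here rather than the literal "000" string from the question.

```csharp
using System.Globalization;

public static class OrderIdFormatter
{
    // "0000000" is a custom numeric format: each 0 is a digit placeholder,
    // so shorter numbers are zero-padded and longer numbers are never truncated.
    public static string Format(int orderId, int suffix)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "Z{0:0000000}-{1:000}",
            orderId,
            suffix);
    }
}
```

Under that assumption, Format(545698, 1) gives Z0545698-001 and Format(12345678, 1) gives Z12345678-001.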

Date Time Encoding

Any ideas or implementations floating about for encoding the current date including the milliseconds into the shortest possible string length?
e.g I want 31/10/2011 10:41:45 in the shortest string possible (ideally 5 characters) - obviously decodable.
If it is impossible to get down to 5 characters, then the year is optional.
edit: it doesn't actually need to be decodable. It just needs to be a unique string.

A time_t is 31 bits. Add 10 bits for up to 1000 milliseconds: that's 41 bits. You want 5 characters: that's 8 bits for each of the 1st 4 characters + 9 bits for the last one.
Using Chinese ideograms, you should easily be able to find a range of 256 consecutive chars for each of the 1st 4 chars and a range of 512 for the last one.
Needless to say your encoded date will look... chinese! But it should do the trick ;-)
BTW, you don't have to stick to Chinese. You might even want to choose a different Unicode 256 chars range for each character. Of course, you'll want to find sequences of 256/512 printable chars.
Now let's say we skip the year. We're down to 86400 x 366 seconds per year = 31622400 seconds. Including millisecs : 31622400000. That's 35 bits. Great: We're down at 7 bits per character. Easy! :-)
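The 41-bit scheme above could be sketched roughly as follows. Everything here is an assumption made for illustration: the CompactTimestamp name, the choice of U+4E00 (the start of the CJK Unified Ideographs block) as the base for every character, and the requirement that the input be a UTC time between 1970 and early 2038.

```csharp
using System;

public static class CompactTimestamp
{
    // Assumption: one contiguous ideograph range is used for all five characters.
    private const int BaseCodePoint = 0x4E00;
    private static readonly DateTime Epoch =
        new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Packs 31 bits of seconds and 10 bits of milliseconds into 5 chars (8+8+8+8+9 bits).
    public static string Encode(DateTime utc)
    {
        long totalMs = (utc - Epoch).Ticks / TimeSpan.TicksPerMillisecond;
        long seconds = totalMs / 1000;
        long millis = totalMs % 1000;
        if (totalMs < 0 || seconds >= (1L << 31))
        {
            throw new ArgumentOutOfRangeException(nameof(utc));
        }

        long packed = seconds * 1024 + millis;      // 41 bits in total
        var chars = new char[5];
        chars[4] = (char)(BaseCodePoint + (int)(packed % 512));   // low 9 bits
        packed /= 512;
        for (int i = 3; i >= 0; i--)
        {
            chars[i] = (char)(BaseCodePoint + (int)(packed % 256)); // 8 bits each
            packed /= 256;
        }
        return new string(chars);
    }

    public static DateTime Decode(string text)
    {
        if (text == null || text.Length != 5)
        {
            throw new FormatException("Expected exactly 5 characters.");
        }
        long packed = 0;
        for (int i = 0; i < 4; i++)
        {
            packed = packed * 256 + (text[i] - BaseCodePoint);
        }
        packed = packed * 512 + (text[4] - BaseCodePoint);

        long totalMs = (packed / 1024) * 1000 + (packed % 1024);
        return new DateTime(Epoch.Ticks + totalMs * TimeSpan.TicksPerMillisecond, DateTimeKind.Utc);
    }
}
```

The result is five UTF-16 characters, not five bytes, so whether this counts as "short" depends on how the string is stored or transmitted.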

you can use the Ticks:
var ticks = System.DateTime.Now.Ticks;
this is a 64bit number. You get the Time back by calling:
var timeBack = new System.DateTime(ticks);
Of course this is 8 bytes, but I don't think you can get it more compact (easily).

No can do: the total ms in a year (365 days) is 31,536,000,000 (=365*24*60*60*1000). You need 34.87628063 bits of information to store that value (log2 31,536,000,000). You probably meant "printable characters", BUT you would need 7 bits/character to store 35 bits in 5 characters. As an example, base64 carries 6 bits/character of information, so 6 characters. Ascii85 would be a little better, but you would still need around 5.5 characters, so 6 characters.
Clearly if you meant 5 BYTES, everything changes. You can store 34.84 years (in ms) in that space.
And if you meant 5 C# PRINTABLE AND UNPRINTABLE CHARACTERS (each C# character is 16 bits), then it's even better. 10 bytes! DateTime in C# is only 8 bytes and it uses ticks (they are a VERY VERY VERY small part of a second)!
BUT if you meant 5 C# PRINTABLE CHARACTERS, then use Serge's response. It's very good and shows us that the world is a big place (and why good questions are so important: they let us see the world in new ways).

You can use ASCII characters to represent the numbers and drop the formatting, for example:
31/10/2011 10:41:45
*/*/** *:*:*
*******
That's 7, you can drop 2 if you don't want to include the full year. Obviously the * are actual characters relating to a number, A could be 1 etc, or even use the proper ASCII codes.

Reading a Cobol generated file

I'm currently on the task of writing a C# application which is going to sit between two existing apps. All I know about the second application is that it processes files generated by the first one. The first application is written in Cobol.
Steps:
1) Cobol application, writes some files and copies to a directory.
2) The second application picks these files up and processes them.
My C# app would sit between 1) and 2). It would have to pick up the file generated by 1), read it, modify it and save it, so that application 2) wouldn't know I have even been there.
I have a few problems.
First of all if I open a file generated by 1) in notepad, most of it is unreadable while other parts are.
If I read the file, modify it and save, I must save the file with the same notation used by the Cobol application, so that app 2) doesn't know I've been there.
I've tried reading the file this way, but it's still unreadable:
Code:
string ss = @"filename";
using (FileStream fs = new FileStream(ss, FileMode.Open))
{
StreamReader sr = new StreamReader(fs);
string gg = sr.ReadToEnd();
}
Also, if I find a way of making it readable (using some sort of encoding technique), I'm afraid that when I save the file again, I may change its original format.
Any thoughts? Suggestions?

To read the COBOL-generated file, you'll need a few things.
First, you'll need the record layout (copybook) for the file. A COBOL record layout will look something like this:
01  PATIENT-TREATMENTS.
    05  PATIENT-NAME                PIC X(30).
    05  PATIENT-SS-NUMBER           PIC 9(9).
    05  NUMBER-OF-TREATMENTS        PIC 99 COMP-3.
    05  TREATMENT-HISTORY OCCURS 0 TO 50 TIMES
            DEPENDING ON NUMBER-OF-TREATMENTS
            INDEXED BY TREATMENT-POINTER.
        10  TREATMENT-DATE.
            15  TREATMENT-DAY       PIC 99.
            15  TREATMENT-MONTH     PIC 99.
            15  TREATMENT-YEAR      PIC 9(4).
        10  TREATING-PHYSICIAN      PIC X(30).
        10  TREATMENT-CODE          PIC 99.
You'll also need a copy of IBM's Principles of Operation (S/360, S/370, z/OS; it doesn't really matter for our purposes). The latest is available from IBM at
http://www-01.ibm.com/support/docview.wss?uid=isg2b9de5f05a9d57819852571c500428f9a (but you'll need an IBM account).
An older edition is available, gratis, at http://www.hack.org/mc/texts/principles-of-operation.pdf
Chapters 8 (Decimal Instructions) and 9 (Floating Point Overview and Support Instructions) are the interesting bits for our purposes.
Without that, you're pretty much lost.
Then, you need to understand COBOL data types. For instance:
PIC defines an alphameric formatted field (PIC 9(4), for example, is 4 decimal digits, which might be filled with four space characters if missing). PIC 999V99 is 5 decimal digits, with an implied decimal point. And so on and so forth.
BINARY is [usually] a signed fixed point binary integer. Usual sizes are halfword (2 octets) and fullword (4 octets).
COMP-1 is single precision floating point.
COMP-2 is double precision floating point.
If the data source is an IBM mainframe, COMP-1 and COMP-2 likely won't be IEEE floating point: they will be in IBM's base-16 excess-64 floating point format. You'll need something like the S/370 Principles of Operation to help you understand it.
COMP-3 is 'packed decimal', of varying lengths. Packed decimal is a compact way of representing a decimal number. The declaration will look something like this: PIC S9999V99 COMP-3. This says that it is signed and consists of 6 decimal digits with an implied decimal point. Packed decimal represents each decimal digit as a nibble of an octet (hex values 0-9). The high-order digit is the upper nibble of the leftmost octet. The low nibble of the rightmost octet is a hex value A-F representing the sign. So the above PIC clause will require ceil( (6+1)/2 ) or 4 octets. The value -345.67, as represented by the above PIC clause, will look like 0x0034567D. The actual sign value may vary (the default is C/positive, D/negative, but A, C, E and F are treated as positive, while only B and D are treated as negative). Again, see the S/370 Principles of Operation for details on the representation.
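A minimal sketch of decoding such a field in C#, following the layout described above. The PackedDecimal class name is hypothetical, and the number of implied decimal places (scale) has to come from the copybook since it is not stored in the data.

```csharp
using System;

public static class PackedDecimal
{
    // Decodes a COMP-3 field: two digits per octet, except the last octet,
    // whose low nibble is the sign (B or D negative; A, C, E, F positive).
    // 'scale' is the number of digits after the implied decimal point (the V in the PIC).
    public static decimal Decode(byte[] data, int scale)
    {
        if (data == null || data.Length == 0)
        {
            throw new ArgumentException("Field is empty.", nameof(data));
        }

        decimal value = 0;
        for (int i = 0; i < data.Length; i++)
        {
            int high = data[i] >> 4;
            int low = data[i] & 0x0F;

            if (high > 9) throw new FormatException("Invalid digit nibble.");
            value = value * 10 + high;

            if (i < data.Length - 1)
            {
                if (low > 9) throw new FormatException("Invalid digit nibble.");
                value = value * 10 + low;
            }
            else
            {
                if (low < 0x0A) throw new FormatException("Invalid sign nibble.");
                if (low == 0x0B || low == 0x0D) value = -value;
            }
        }

        for (int i = 0; i < scale; i++)
        {
            value /= 10;
        }
        return value;
    }
}
```

For the example in the text, the four octets 00 34 56 7D with a scale of 2 decode to -345.67.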
Related to COMP-3 is zoned decimal. This might be declared as 'PIC S9999V99' (signed, 6 decimal digits, with an implied decimal point). Decimal digits, in EBCDIC, are the hex values 0xF0 - 0xF9. 'Unpack' (a mainframe machine instruction) takes a packed decimal field and turns it into a character field. The process is:
start with the rightmost octet. Invert it, so the sign nibble is on top and place it into the rightmost octet of the destination field.
Working from right to left (source and the target both), strip off each remaining nibble of the packed decimal field, and place it into the low nibble of the next available octet in the destination. Fill the high nibble with a hex F.
The operation ends when either the source or destination field is exhausted.
If the destination field is not exhausted, it is left-padded with zeroes by filling the remaining octets with decimal '0' (0xF0).
So our example value, -345.67, if stored with the default sign value (hex D), would get unpacked as 0xF0F0F0F3F4F5F6D7 ('0003456P', in EBCDIC).
[There you go. There's a quiz later]
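The unpack steps described above can be sketched in C# like this. It is an illustration of the described process only, not an emulation of the real machine instruction's edge cases; ZonedDecimal.Unpack is a hypothetical name.

```csharp
using System;

public static class ZonedDecimal
{
    // Converts a packed decimal field into a zoned decimal field of destLength octets.
    public static byte[] Unpack(byte[] packed, int destLength)
    {
        var dest = new byte[destLength];

        // Pre-fill with zoned zeros (0xF0) so unused leading octets are left-padded.
        for (int i = 0; i < dest.Length; i++)
        {
            dest[i] = 0xF0;
        }
        if (packed == null || packed.Length == 0 || destLength == 0)
        {
            return dest;
        }

        int d = destLength - 1;

        // Rightmost octet: swap nibbles so the sign ends up in the high nibble.
        byte last = packed[packed.Length - 1];
        dest[d--] = (byte)(((last & 0x0F) << 4) | (last >> 4));

        // Remaining nibbles, right to left, each placed under a 0xF zone.
        for (int s = packed.Length - 2; s >= 0 && d >= 0; s--)
        {
            dest[d--] = (byte)(0xF0 | (packed[s] & 0x0F));
            if (d >= 0)
            {
                dest[d--] = (byte)(0xF0 | (packed[s] >> 4));
            }
        }
        return dest;
    }
}
```

Unpacking 00 34 56 7D into an eight-octet destination yields F0 F0 F0 F3 F4 F5 F6 D7, matching the worked example.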
If the COBOL app lives on an IBM mainframe, has the file been converted from its native EBCDIC to ASCII? If not, you'll have to do the mapping yourself. Hint: it's not necessarily as straightforward as that might seem, since this might be a selective process: only character fields get converted (COMP-1, COMP-2, COMP-3 and BINARY get excluded since they are a sequence of binary octets). Worse, there are multiple flavors of EBCDIC representations, due to the varying national implementations and varying print chains in use on different printers.
Oh...one last thing. The mainframe hardware tends to like different things aligned on halfword, word or doubleword boundaries, so the record layout may not map directly to the octets in the file as there may be padding octets inserted between fields to maintain the needed word alignment.
Good Luck.

I see from comments attached to your question that you are dealing with the “classic” COBOL batch file structure: Header record, detail records and trailer record.
This is probably bad news if you are responsible for creating the trailer record! The typical “trailer” record is used to identify the end-of-file and provides control information such as the number of records that precede it and various check sums and/or grand totals for “detail” records. In other words, you may need to read and summarize the entire file in order to create the trailer. Add to this the possibility that much of the data in the file is in Packed Decimal, Zoned Decimal or other COBOLish numeric data types, you could be in for a rough time.
You might want to question why you are adding trailer records to these files. Typically the “trailer” is produced by the same program or application that created the “detail” records. The trailer is supposed to act as a verification that the sending application/program wrote all of the data it was supposed to. The summary totals, counts etc. are used by the receiving application to verify that the detail records tally with the preceding details. This is supposed to serve as another verification that the sending application didn't muff up the data or that it was not corrupted en-route (no that wasn't a joke – but maybe it should be). When a "man in the middle" creates the trailers it kind of defeats the entire purpose of the exercise (no matter how flawed it might have been to begin with).

It would be useful to know which Cobol dialect you are dealing with, because there is no single Cobol format. Some Cobol compilers (Micro Focus) put a "File Description" at the front of files (for Micro Focus VB / Indexed files).
Have a look at the RecordEditor (http://record-editor.sourceforge.net/). It has a File Wizard which could be very useful for you.
In the File Wizard set the file as Fixed-Width File (most common in Cobol). The program lets you try out different Record Lengths. When you get the correct record length, the Text fields should line up.
Later on in the Wizard there is a field search which can look for Binary, Comp-3 and Text fields.
There are some notes on using the RecordEditor's Wizard with an unknown file here:
http://record-editor.sourceforge.net/Unkown.htm
Unless the file is coming from a Mainframe / AS400 it is unlikely to use EBCDIC (cp037, Code Page 37, is US EBCDIC); any text is most likely in ASCII.
The file probably contains Packed-Decimal (Comp-3) and Binary-Integer data. Most Cobols use Big-Endian (for Comp integers) even on Intel (little-endian hardware).
One thing to remember with Cobol: PIC s9(6)V99 comp is stored as a Binary Integer, with x'0001' representing 0.01. So unless you have the Cobol definition you cannot tell whether a binary 1 is 1, 0.1, 0.01 etc.
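To illustrate that last point, here is a rough sketch of reading a big-endian binary (COMP) field and applying an implied scale. It assumes two's-complement big-endian storage of up to 8 octets, which is common but should be checked against the actual compiler; BinaryComp is a hypothetical name, and the scale must be taken from the Cobol definition.

```csharp
using System;

public static class BinaryComp
{
    // Decodes a signed big-endian binary integer and applies 'scale' implied decimal places.
    public static decimal Decode(byte[] data, int scale)
    {
        if (data == null || data.Length == 0 || data.Length > 8)
        {
            throw new ArgumentException("Expected 1 to 8 octets.", nameof(data));
        }

        // Sign-extend from the first (most significant) octet.
        long raw = unchecked((sbyte)data[0]);
        for (int i = 1; i < data.Length; i++)
        {
            raw = (raw << 8) | data[i];
        }

        decimal value = raw;
        for (int i = 0; i < scale; i++)
        {
            value /= 10;
        }
        return value;
    }
}
```

The same four octets 00 00 00 01 decode to 1 with a scale of 0 and to 0.01 with a scale of 2, which is why the copybook is needed.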

Can't figure out what this SubString.PadLeft is doing

In this code I am debugging, I have this code snippet:
ddlExpYear.SelectedItem.Value.Substring(2).PadLeft(2, '0');
What does this return? I really can't run this too much as it is part of a live credit card application. The DropDownList as you could imagine from the name contains the 4-digit year.
UPDATE: Thanks everyone. I don't do a lot of .NET development so setting up a quick test isn't as quick for me.

It takes the last two digits of the year and pads the left side with zeroes to a minimum of 2 characters. Looks like a "just in case" for expiration years ending in 08, 07, etc., making sure that the leading zero is present.

This prints "98" to the console.
using System;

class Program
{
    static void Main(string[] args)
    {
        // Substring(2) drops "19"; PadLeft(2, '0') leaves "98" unchanged.
        Console.Write("1998".Substring(2).PadLeft(2, '0'));
        Console.Read();
    }
}

Of course you can run this. You just can't run it in the application you're debugging. To find out what it's doing, and not just what it looks like it's doing, make a new web application, put in a DropDownList, put a few static years in it, and then put in the code you've mentioned and see what it does. Then you'll know for certain.

Something stupid. It's getting the value of the selected item and taking everything after the first two characters. If that is only one character, then it adds a '0' to the beginning of it, and if it is zero characters, then it returns '00'. The reason I say this is stupid is because if you need the value to be two characters long, why not just set it like that to begin with when you are creating the drop down list?

It looks like it's grabbing the substring from index 2 (zero-based, i.e. the 3rd character) to the end, then if the substring has a length less than 2 it's making the length equal to 2 by adding 0 to the left side.

PadLeft ensures that you receive at least two characters from the input, padding the input (on the left side) with the appropriate character. So input, in this case, might be 12. You get "12" back. Or input might be 9, in which case, you get "09" back.
This is an example of complex chaining (see "Is there any benefit in Chaining" post) gone awry, and making code appear overly complex.

The substring returns the value with the first two characters skipped, the padleft pads the result with leading zeros:
string s = "2014";
MessageBox.Show(s.Substring(2).PadLeft(2, 'x')); //14
string s2 = "14";
MessageBox.Show(s2.Substring(2).PadLeft(2, 'x')); //xx
My guess is the code is trying to convert the year to a 2 digit value.

The PadLeft only does something if the user enters a year that is either 2 or 3 digits long.
With a 1-digit year, you get an exception (Substring throws).
With a 2-digit year (07, 08, etc), it will return 00. I would say this is an error.
With a 3-digit year (207, 208), which the author may have assumed to be typos, it would return the last digit padded with a zero -- 207 -> 07; 208 -> 08.
As long as the user must choose a year and isn't allowed to enter a year, the PadLeft is unnecessary -- the Substring(2) does exactly what you need given a 4-digit year.

This code seems to be trying to grab a 2 digit year from a four digit year (ddlexpyear is the hint)
It takes strings and returns strings, so I will eschew the string delimiters:
1998 -> 98
2000 -> 00
2001 -> 01
2012 -> 12
Problem is that it doesn't do a good job. In these cases, the padding doesn't actually help. Removing the pad code does not affect the cases it gets correct.
So the code works (with or without the pad) for 4 digit years, what does it do for strings of other lengths?
null: exception
0: exception
1: exception
2: always returns "00". e.g. the year 49 (when the Jews were expelled from Rome) becomes "00". This is bad.
3: saves the last digit, and puts a "0" in front of it. Correct in 10% of cases (when the second digit is actually a zero, like 304, or 908), but quite wrong in the remainder (like 915, 423, and 110)
5: just drops the first two digits and keeps the rest, which is also wrong: "10549" should probably be "49" but is instead "549".
as you can expect the problem continues in higher digits.
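The cases listed above can be checked without touching the live application. This is a small stand-alone sketch; ExpiryYear.TwoDigit is a hypothetical wrapper around the expression from the question, with a plain string standing in for the drop-down value.

```csharp
using System;

public static class ExpiryYear
{
    // Same expression as in the question, applied to a plain string.
    public static string TwoDigit(string year)
    {
        return year.Substring(2).PadLeft(2, '0');
    }
}

// Behaviour by input length:
//   ExpiryYear.TwoDigit("1998")  -> "98"   (PadLeft does nothing)
//   ExpiryYear.TwoDigit("49")    -> "00"   (empty substring, fully padded)
//   ExpiryYear.TwoDigit("304")   -> "04"   (one character, padded with one zero)
//   ExpiryYear.TwoDigit("10549") -> "549"  (PadLeft never truncates)
//   ExpiryYear.TwoDigit("7")     -> throws ArgumentOutOfRangeException
```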

OK so it's taking the value from the drop down, ABCD
Then it takes the substring from position 2, CD
And then it, err, left-pads it with zeros to a length of 2 if it needs to, CD
Or, if the substring were just X, it would pad it to 0X

It's taking the last two digits of the year, then padding to the left with a "0".
So 2010 would be 10, 2009 would be 09.
Not sure why the developer didn't just set the value on the dropdown to the last two digits, or why you would need to left pad it (unless you were dealing with years 0-9 AD).
