Verifying modular sum checksum in C#

I'm working with an embedded system that returns ASCII data that includes (what I believe to be) a modular sum checksum. I would like to verify this checksum, but I've been unable to do so based on the manufacturer's specification. I've also been unable to accomplish the opposite and calculate the same checksum from the description.
Each response from the device is in the following format:
╔═════╦═══════════════╦════════════╦════╦══════════╦═════╗
║ SOH ║ Function Code ║ Data Field ║ && ║ Checksum ║ ETX ║
╚═════╩═══════════════╩════════════╩════╩══════════╩═════╝
Example:
SOHi11A0014092414220&&FBEA
Where SOH is ASCII 1, e.g.
#define SOH "\x01"
The description of the checksum is as follows:
The Checksum is a series of four ASCII-hexadecimal characters which provide a check on the integrity of all the characters preceding it, including the control
characters. The four characters represent a 16-bit binary count which is the 2's complemented sum of the 8-bit binary representation of the message characters after the parity bit (if enabled) has been cleared. Overflows are ignored. The data integrity check can be done by converting the four checksum characters to the 16-bit
binary number and adding the 8-bit binary representation of the message characters to it. The binary result should be zero.
I've tried a few different interpretations of the specification, including ignoring SOH as well as the ampersands, and even the function code. At this point I must be missing something very obvious in either my interpretation of the spec, or the code I've been using to test. Below you'll find a simple example (the data was taken from a live system); if the checksum were correct, the lower word of the validate variable would be 0:
static void Main(string[] args)
{
    unchecked
    {
        var data = String.Format("{0}{1}", (char) 1, @"i11A0014092414220&&");
        const string checkSum = "FBEA";
        // Checksum is 16 bit word
        var checkSumValue = Convert.ToUInt16(checkSum, 16);
        // Sum of message chars preceding checksum
        var mySum = data.TakeWhile(c => c != '&').Aggregate(0, (current, c) => current + c);
        var validate = checkSumValue + mySum;
        Console.WriteLine("Data: {0}", data);
        Console.WriteLine("Checksum: {0:X4}", checkSumValue);
        Console.WriteLine("Sum of chars: {0:X4}", mySum);
        Console.WriteLine("Validation: {0}", Convert.ToString(validate, 2));
        Console.ReadKey();
    }
}
Edit
While the solution provided by @tinstaafl works for this particular example, it doesn't work when given a larger record such as the one below:
SOHi20100140924165011000007460904004608B40045361000427DDD6300000000427C3C66000000002200000745B4100045B3D8004508C00042754B900000000042774D8D0000000033000007453240004531E000459F5000420EA4E100000000427B14BB000000005500000744E0200044DF4000454AE000421318A0000000004288A998000000006600000744E8C00044E7200045469000421753E600000000428B4DA50000000&&
BA6C
Theoretically you could keep incrementing/decrementing a value in the string until the checksum matched; it just so happened that using the character '1' rather than the ASCII SOH control character gave the sum just the right value, a coincidence in this case.

Not sure if this is exactly what you're looking for, but by using the character '1' for the SOH instead of a char with value 1, taking the sum of all the characters, and truncating the validate variable to a 16-bit integer, I was able to get validate to equal 0:
var data = (#"1i11A0014092414220&&");
const string checkSum = "FBEA";
// Checksum is 16 bit word
var checkSumValue = Convert.ToUInt16(checkSum, 16);
// Sum of message chars preceeding checksum
var mySum = data.Sum<char>(c => c);
var validate = (UInt16)( checkSumValue + mySum);
Console.WriteLine("Data: {0}", data);
Console.WriteLine("Checksum: {0:X4}", checkSumValue);
Console.WriteLine("Sum of chars: {0:X4}", mySum);
Console.WriteLine("Validation: {0}", Convert.ToString(validate, 2));
Console.ReadKey();
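For completeness, here is a minimal verification helper distilled from the spec; the exact span of characters to sum is the open question, so this sketch assumes everything from the first byte through the trailing &&. With the character '1' as the first byte, the message bytes sum to 0x0416, and 0x0416 + 0xFBEA = 0x10000, whose low 16 bits are zero; with a true SOH (0x01) the sum is 0x03E6 and the check fails.

static bool VerifyChecksum(string message, string checksumHex)
{
    int sum = 0;
    foreach (char c in message)
        sum += c & 0x7F;                      // 8-bit value with the parity bit cleared
    var checksum = Convert.ToUInt16(checksumHex, 16);
    return (ushort)(sum + checksum) == 0;     // overflows beyond 16 bits are ignored
}

VerifyChecksum("1i11A0014092414220&&", "FBEA") returns true for the short example, but per the edit above the same interpretation does not hold for the longer record, so the spec's summed span may need further investigation.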

Related

Having trouble unpacking Comp-3 in .Net. There are letter characters aside from the sign character inside the Comp-3 value

I am trying to import a Mainframe EDI File back to SQL Server using .NET and I am having problems unpacking some comp-3 fields.
This file was from one of our clients and I have the Copy Book layout for the following fields:
05 EH-GROSS-INVOICE-AMT PIC S9(07)V9999 COMP-3.
05 EH-CASH-DISCOUNT-AMT PIC S9(07)V9999 COMP-3.
05 EH-CASH-DISCOUNT-PCT PIC S9(03)V9999 COMP-3.
I will just be focusing on these 3 fields as all other fields are PIC(X) and are already Unicode values. I loaded everything up with the help of the tool Ebcdic2Ascii created by Max Vagner. I did a bit of modification on the "Unpack" function and changed it to
private string Unpack(byte[] packedBytes, int decimalPlaces, out bool isParsedSuccessfully)
{
    isParsedSuccessfully = true;
    return BitConverter.ToString(packedBytes);
}
in order for me to get the following sample data:
EH-GROSS-INVOICE-AMT EH-CASH-DISCOUNT-AMT EH-CASH-DISCOUNT-PCT
----------------------------------------------------------------------
00-1A-1A-03-26-0C 00-00-00-00-00-0C 00-00-00-0C
00-0A-1A-1A-00-0C 00-00-1A-1A-2D-0C 00-1A-00-0C
00-09-10-20-00-0C 00-00-10-1A-1A-0C 00-1A-00-0C
Here is some sample code that I created for unpacking these values, based on my understanding of Comp-3 values:
namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            var result1 = UnpackMod("00-1A-1A-03-26-0C", 4);
            var result2 = UnpackMod("00-00-00-00-00-0C", 4);
            var result3 = UnpackMod("00-00-00-0C", 4);
            Console.WriteLine($"{result1}\n{result2}\n{result3}\n");
            var result4 = UnpackMod("00-0A-1A-1A-00-0C", 4);
            var result5 = UnpackMod("00-00-1A-1A-2D-0C", 4);
            var result6 = UnpackMod("00-1A-00-0C", 4);
            Console.WriteLine($"{result4}\n{result5}\n{result6}\n");
            var result7 = UnpackMod("00-09-10-20-00-0C", 4);
            var result8 = UnpackMod("00-00-10-1A-1A-0C", 4);
            var result9 = UnpackMod("00-1A-00-0C", 4);
            Console.WriteLine($"{result7}\n{result8}\n{result9}");
            Console.ReadLine();
        }

        /// <summary>
        /// Method for unpacking Comp-3 fields.
        /// </summary>
        /// <param name="inputString"></param>
        /// <param name="decimalPlaces"></param>
        /// <returns>Returns a numeric string if the parse was successful; else returns "NULL"</returns>
        private static string UnpackMod(string inputString, int decimalPlaces)
        {
            var outputString = inputString;
            // Remove "-".
            outputString = outputString.Replace("-", "");
            // Check last character for sign.
            string lastChar = outputString.Substring(outputString.Length - 1, 1);
            bool isNegative = (lastChar == "D" || lastChar == "B");
            // Remove sign character.
            if (lastChar == "C" || lastChar == "A" || lastChar == "E" || lastChar == "F" || lastChar == "D" || lastChar == "B")
            {
                outputString = outputString.Substring(0, outputString.Length - 1);
            }
            // Place decimal point.
            outputString = outputString.Insert(outputString.Length - decimalPlaces, ".");
            // Check if parsed value is numeric. This will also eliminate all leading 0.
            var isParsedSuccessfully = decimal.TryParse(outputString, out decimal decimalValue);
            // If isParsedSuccessfully is true then return numeric string else return "NULL".
            string result = "NULL";
            if (isParsedSuccessfully)
            {
                // Convert value to negative.
                if (isNegative)
                {
                    decimalValue = decimalValue * -1;
                }
                result = decimalValue.ToString();
            }
            return result;
        }
    }
}
After running the sample code I was able to get the following results:
EH-GROSS-INVOICE-AMT EH-CASH-DISCOUNT-AMT EH-CASH-DISCOUNT-PCT
----------------------------------------------------------------------
NULL 0.0000 0.0000
NULL NULL NULL
9102.0000 NULL NULL
As you can see, I was only able to get the following 3 values correctly:
00-09-10-20-00-0C -> 9102.0000
00-00-00-00-00-0C -> 0.0000
00-00-00-0C -> 0.0000
Based on this source: http://www.3480-3590-data-conversion.com/article-packed-fields.html, I have the following understanding of Comp-3:
COBOL Comp-3 is a binary field type that puts ("packs") two digits into each byte, using a notation called Binary Coded Decimal, or BCD.
The Binary Coded Decimal (BCD) data type is just as its name suggests -- it is a value stored in decimal (base ten) notation, with each digit binary coded, since a digit only has ten possible values (0-9).
The low nibble of the least significant byte is used to store the sign for the number. This nibble stores only the sign, not a digit. "C" hex is positive, "D" hex is negative, and "F" hex is unsigned.
Since I know that BCD should only contain the values 0-9, and that there should be just one sign character at the end, which could be either "C", "D" or "F", I don't know how to unpack the following values:
00-1A-1A-03-26-0C
00-0A-1A-1A-00-0C
00-00-1A-1A-2D-0C
00-1A-00-0C
00-00-10-1A-1A-0C
00-1A-00-0C
These values have other characters besides the sign character. I have a feeling that the data has already been converted, because if it had not, there should be no readable values there unless you apply an encoding. I am still not sure about this and would love any insights. Thanks.
First, PIC X is not Unicode in COBOL.
Quoting myself from here...
It is common for mainframe data to include both text and binary data
in a single record, for example a name, a currency amount, and a
quantity:
Hopper Grace ar% .
...which would be...
x'C8969797859940404040C799818385404040404081996C004B'
...in hex. This is code page 37, commonly referred to as EBCDIC.
[...]Converting to code page 1250, commonly in use on Microsoft
Windows, you would end up with...
x'486F707065722020202047726163652020202020617225002E'
...where the text data is translated but the packed data is destroyed.
The packed data no longer has a valid sign in the last nibble (the
lower half of the last byte), the currency amount itself has been
changed as has the quantity (from decimal 75 to decimal 11,776 due to
both code page conversion and mangling of a big endian number as a
little endian number).
Likely your data was code page converted on transfer from the mainframe. If you know the original code page and the code page it was converted to, then you might be able to unscramble the packed data.
I say might because, if you're lucky, the hex values you have will have been mapped one-to-one with hex values in the original code page. Note that it is common for both EBCDIC x'15' and x'0D' to be mapped to ASCII x'0D'.
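For reference, if the file can be re-transferred in binary mode so the packed bytes arrive intact, unpacking COMP-3 is straightforward nibble arithmetic. A minimal sketch (not the Ebcdic2Ascii implementation; the method name is illustrative): each byte holds two BCD digits except the last, whose low nibble is the sign, so x'00-09-10-20-00-0C' with four implied decimals unpacks to 9102.0000. Code-page-mangled bytes such as x'1A' fail the digit check.

static decimal UnpackComp3(byte[] packed, int decimalPlaces)
{
    long value = 0;
    for (int i = 0; i < packed.Length; i++)
    {
        int high = packed[i] >> 4;
        int low = packed[i] & 0x0F;
        if (i < packed.Length - 1)
        {
            // Every byte except the last packs two BCD digits.
            if (high > 9 || low > 9)
                throw new FormatException("Not valid BCD - likely code page converted");
            value = value * 100 + high * 10 + low;
        }
        else
        {
            // Last byte: one digit in the high nibble, sign in the low nibble.
            if (high > 9 || low < 0x0A)
                throw new FormatException("Bad final digit or sign nibble");
            value = value * 10 + high;
            if (low == 0x0D || low == 0x0B)      // x'D' and x'B' mean negative
                value = -value;
        }
    }
    return value / (decimal)Math.Pow(10, decimalPlaces);
}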

How to convert a base-10 number to base 3 in .NET (special case)

I'm looking for a routine in C# that gives me the following output when putting in numbers:
0 - A
1 - B
2 - C
3 - AA
4 - AB
5 - AC
6 - BA
7 - BB
8 - BC
9 - CA
10 - CB
11 - CC
12 - AAA
13 - etc
I'm using letters, so that it's not so confusing with zeros.
I've seen other routines, but they will give me BA for the value of 3 and not AA.
Note: The other routines I found were Quickest way to convert a base 10 number to any base in .NET? and http://www.drdobbs.com/architecture-and-design/convert-from-long-to-any-number-base-in/228701167, but as I said, they don't give me exactly what I'm looking for.
Converting between number systems is a basic programming task, and the logic doesn't differ from other bases (such as hexadecimal or binary). Please find the code below:
// here you choose the base used for the conversion; you wanted 3, so I assigned this value here
int systemNumber = 3;
// pick a number to convert (you can feed a text box value here)
int numberToParse = 5;
// Note below
numberToParse++;
string convertedNumber = "";
List<char> letters = new List<char> { 'A', 'B', 'C' };
// basic algorithm for converting numbers between systems
while (numberToParse > 0)
{
    // Note below
    numberToParse--;
    // append the corresponding letter to our "number"
    convertedNumber = letters[numberToParse % systemNumber] + convertedNumber;
    numberToParse = (int)Math.Floor((decimal)numberToParse / systemNumber);
}
// show converted number
MessageBox.Show(convertedNumber);
NOTE: I didn't read carefully at first and got it wrong. I added two lines to the previous solution, marked with "Note below": incrementation and decrementation of the parsed number. The decrementation enables A (which is zero, and thus would be omitted at the beginning of numbers) to be treated as a relevant leading digit. But this way, the numbers that can be converted are shifted and begin with 1. To compensate, we increment the number at the beginning.
Additionally, if you want to use other bases like this, you have to expand the list with letters. Now we have A, B and C, because you wanted a base-3 system. In fact, you can always use the full alphabet:
List<char> letters = new List<char> {'A','B','C', 'D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'};
and only change systemNumber.
Based on code from https://stackoverflow.com/a/182924 the following should work:
private string GetWeirdBase3Value(int input)
{
    int dividend = input + 1;
    string output = String.Empty;
    int modulo;

    while (dividend > 0)
    {
        modulo = (dividend - 1) % 3;
        output = Convert.ToChar('A' + modulo).ToString() + output;
        dividend = (int)((dividend - modulo) / 3);
    }

    return output;
}
The code should hopefully be pretty easy to read. It essentially iteratively calculates character by character until the dividend is reduced to 0.
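If you also need to go back the other way, here is a sketch of the inverse (the method name is mine): A, B and C act as bijective base-3 digits 1, 2 and 3, and the trailing subtraction undoes the input + 1 from the encoder.

private int GetWeirdBase3Number(string input)
{
    int total = 0;
    foreach (char c in input)
        total = total * 3 + (c - 'A' + 1);   // A/B/C act as digits 1/2/3
    return total - 1;                        // undo the input + 1 from encoding
}

GetWeirdBase3Number("AC") gives 5 and GetWeirdBase3Number("CA") gives 9, matching the table in the question.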

Bit shifting with hex in Python

I am trying to understand how to perform bit shift operations in Python. Coming from C#, I find it doesn't work the same way.
The C# code is:
var plain=0xabcdef0000000; // plaintext
var key=0xf0f0f0f0f123456; // encryption key
var L = plain;
var R = plain>>32;
The output is:
000abcdef0000000 00000000000abcde
What is the equivalent in Python? I have tried:
plain = 0xabcdef0000000
key = 0xf0f0f0f0f123456
print plain
left = plain
right = plain >> 32
print hex(left)
print hex(right)
However, it doesn't work. The output is different in Python; the zero padding is missing. Any help would be appreciated!
The hex() function does not pad numbers with leading zeros, because Python integers are unbounded. C# integers have a fixed size (64 bits in this case), so they have an upper bound and can therefore be padded out. This doesn't mean those extra padding zeros carry any meaning; the integer value is the same.
You'll have to explicitly add those zeros, using the format() function to produce the output:
print format(left, '#018x')
print format(right, '#018x')
The # tells format() to include the 0x prefix, and the leading 0 before the field width asks format() to pad the output:
>>> print format(left, '#018x')
0x000abcdef0000000
>>> print format(right, '#018x')
0x0000000000abcde
Note that the width includes the 0x prefix; there are 16 hex digits in that output, representing 64 bits of data.
If you wanted to use a dynamic width based on the number of characters used in key, then calculate that from int.bit_length(); every 4 bits produce a hex character:
format(right, '#0{}x'.format((key.bit_length() + 3) // 4 + 2))
Demo:
>>> (key.bit_length() + 3) // 4 + 2
17
>>> print format(right, '#0{}x'.format((key.bit_length() + 3) // 4 + 2))
0x0000000000abcde
But note that the key is only 60 bits in length, so C# would pad that value with a 0 as well.
I see no problem with what you tried:
>>> hex(0xabcdef0000000)
'0xabcdef0000000'
>>> hex(0xabcdef0000000 >> 32)
'0xabcde'
In [83]: plain=0xabcdef0000000
In [84]: plain>>32
Out[84]: 703710
In [85]: plain
Out[85]: 3022415462400000
In [87]: hex(plain)
Out[87]: '0xabcdef0000000'
if
In [134]: left = plain
In [135]: right = plain >> 32
Then
In [140]: '{:0x}'.format(left)
Out[140]: 'abcdef0000000'
In [143]: '{:018x}'.format(right)
Out[143]: '0000000000000abcde'

How to create byte[] with length 16 using FromBase64String [duplicate]

This question already has an answer here:
Calculate actual data size from Base64 encoded string length
(1 answer)
I have a requirement to create a byte[] with length 16. (A byte array that has 128 bit to be used as Key in AES encryption).
Following is a valid string
"AAECAwQFBgcICQoLDA0ODw=="
What is the algorithm that determines whether a string will decode to 128 bits? Or is trial and error the only way to create such 128-bit strings?
CODE
static void Main(string[] args)
{
    string firstString = "AAECAwQFBgcICQoLDA0ODw=="; // String Length = 24
    string secondString = "ABCDEFGHIJKLMNOPQRSTUVWX"; // String Length = 24
    int test = secondString.Length;
    byte[] firstByteArray = Convert.FromBase64String(firstString);
    byte[] secondByteArray = Convert.FromBase64String(secondString);
    int firstLength = firstByteArray.Length;
    int secondLength = secondByteArray.Length;
    Console.WriteLine("First Length: " + firstLength.ToString());
    Console.WriteLine("Second Length: " + secondLength.ToString());
    Console.ReadLine();
}
Findings:
For 256 bits, we need 256/6 = 42.67 chars. That is rounded up to 43 chars. [To make the length divisible by 4, add =]
For 512 bits, we need 512/6 = 85.33 chars. That is rounded up to 86 chars. [To make the length divisible by 4, add ==]
For 128 bits, we need 128/6 = 21.33 chars. That is rounded up to 22 chars. [To make the length divisible by 4, add ==]
A base64 string for 16 bytes will always be 24 characters and have == at the end, as padding.
(At least when it's decodable using the .NET method. The padding is not always included in all uses of base64 strings, but the .NET implementation requires it.)
In Base64 encoding, '=' is a special symbol that is added to the end of the Base64 string to indicate that there is no data for those chars in the original value.
Each char encodes 6 bits of the original data, so to produce 8-bit values the string length has to be divisible by 4 without remainder (4 chars * 6 bits = 3 bytes * 8 bits). When the resulting Base64 string would be shorter than 4n, '=' characters are added at the end to make it valid.
Update
Last char before '==' encodes only 2 bits of information, so by replacing it with all possible Base64 chars will give you only 4 different keys out of 64 possible combinations. In other words, by generating strings in format "bbbbbbbbbbbbbbbbbbbbbb==" (where 'b' is valid Base64 character) you'll get 15 duplicate keys per each unique key.
You can use PadRight() to pad the string with a char that you will later remove once decrypted.
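If the underlying goal is just a valid 128-bit AES key rather than hand-crafting strings, a simpler sketch is to generate 16 random bytes and let Base64 produce the 24-character, ==-terminated string:

var key = new byte[16];                       // 128 bits
using (var rng = System.Security.Cryptography.RandomNumberGenerator.Create())
    rng.GetBytes(key);

string encoded = Convert.ToBase64String(key); // always 24 chars ending in "=="
byte[] decoded = Convert.FromBase64String(encoded);
Console.WriteLine("{0} -> {1} bytes", encoded, decoded.Length);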

Compressing big number (or string) to small value

My ASP.NET page has following query string parameter:
…?IDs=1000000012,1000000021,1000000013,1000000022&...
Here the IDs parameter will always have numbers separated by something, in this case ,. Currently there are 4 numbers, but normally there would be between 3 and 7.
Now, I am looking for a method to convert each big number above into the smallest possible value; specifically, compressing the value of the IDs query string parameter. Approaches that compress each number individually or that compress the whole IDs value are both welcome.
Encoding or decoding is not an issue; the goal is just compressing the value of the IDs query string parameter.
Creating some unique small value for IDs and then retrieving its value from some data source is out of scope.
Is there an algorithm to compress such big numbers to small values, or to compress the value of the IDs query string parameter altogether?
You basically need so much room for your numbers because you are using base 10 to represent them. An improvement would be to use base 16 (hex). So for example, you could represent 255 (3 digits) as ff (2 digits).
You can take that concept further by using a much larger number base... the set of all characters that are valid query string parameters:
A-Z, a-z, 0-9, '.', '-', '~', '_', '+'
That gives you a base of 67 characters to work with (see Wikipedia on QueryString).
Have a look at this SO post for approaches to converting base 10 to arbitrary number bases.
EDIT:
In the linked SO post, look at this part:
string xx = IntToString(42,
new char[] { '0','1','2','3','4','5','6','7','8','9',
'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x'});
That's almost what you need. Just expand it by adding the few characters it is missing:
yz.-~_+
That post is missing a method to go back to base 10. I'm not going to write it :-) but the procedure is like this:
Define a counter I'll call TOTAL.
Look at the right-most character and find its position in the array.
TOTAL = (the position of the character in the array)
Example: Input is BA1. TOTAL is now 1 (since '1' is in position 1 in the array)
Now look at the next character left of the first one and find its position in the array.
TOTAL += 67 * (the position of the character in the array)
Example: Input is BA1. TOTAL is now (67 * 10) + 1 = 671 (since 'A' is in position 10)
Now look at the next character left of the previous one and find its position in the array.
TOTAL += 67 * 67 * (the position of the character in the array)
Example: Input is BA1. TOTAL is now (67 * 67 * 11) + (67 * 10) + 1 = 50050 (since 'B' is in position 11)
And so on.
I suggest you write a unit test that converts a bunch of base-10 numbers into base 67 and then back again to make sure your conversion code works properly.
Note how you represented a 5-digit base-10 number in just 3 digits of base 67 :-)
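A sketch of that reverse procedure in code (the method name is mine); accumulating left to right with total = total * base + position is equivalent to the positional sums above:

static long StringToInt(string value, char[] baseChars)
{
    long total = 0;
    foreach (char c in value)
    {
        int pos = Array.IndexOf(baseChars, c);
        if (pos < 0)
            throw new ArgumentException("Character is not a valid digit: " + c);
        total = total * baseChars.Length + pos;   // same as summing pos * base^n
    }
    return total;
}

For the BA1 example with the 67-character array, this returns 50050, matching the hand calculation.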
What is the range of your numbers? Assuming they can fit in a 16-bit integer, I would:
Store all your numbers as 16-bit integers (2 bytes per number, range -32,768 to 32,767)
Build a bytestream of 16-bit integers (XDR might be a good option here; at very least, make sure to handle endianness correctly)
Base64 encode the bytestream, using the modified base64 encoding for URLs (net is about 3 characters per number)
As an added bonus you don't need comma characters anymore because you know each number is 2 bytes.
Alternatively, if that isn't good enough, I'd use zlib to compress your stream of integers and then base64 the zlib-compressed stream. You can also switch to 32-bit integers if 16-bit isn't a large enough range (i.e. if you really need numbers in the 1,000,000,000 range).
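A rough sketch of that zlib variant, assuming 32-bit IDs and using DeflateStream (the closest built-in .NET equivalent); the URL-safe character substitutions mirror the implementation below:

static string CompressIds(int[] ids)
{
    using (var ms = new System.IO.MemoryStream())
    {
        using (var deflate = new System.IO.Compression.DeflateStream(
                   ms, System.IO.Compression.CompressionMode.Compress, true))
        {
            foreach (var id in ids)
            {
                var bytes = BitConverter.GetBytes(id);   // 4 bytes per ID
                deflate.Write(bytes, 0, bytes.Length);
            }
        }   // disposing the DeflateStream flushes the compressed data
        return Convert.ToBase64String(ms.ToArray())
                      .Replace('+', '-').Replace('/', '_').Replace('=', '.');
    }
}

Note that deflate has fixed overhead, so for only three to seven IDs it can come out larger than the raw bytes; the gzip experiment further down makes the same observation.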
Edit:
Maybe too late, but here's an implementation that might do what you need:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Scratch {
    class Program {
        static void Main(string[] args) {
            //var ids = new[] { 1000000012, 1000000021, 1000000013, 1000000022 };
            var rand = new Random();
            var ids = new int[rand.Next(20)];
            for(var i = 0; i < ids.Length; i++) {
                ids[i] = rand.Next();
            }

            WriteIds(ids);

            var s = IdsToString(ids);
            Console.WriteLine("\nResult string is: {0}", s);

            var newIds = StringToIds(s);
            WriteIds(newIds);

            Console.ReadLine();
        }

        public static void WriteIds(ICollection<Int32> ids) {
            Console.Write("\nIDs: ");
            bool comma = false;
            foreach(var id in ids) {
                if(comma) {
                    Console.Write(",");
                } else {
                    comma = true;
                }
                Console.Write(id);
            }
            Console.WriteLine();
        }

        public static string IdsToString(ICollection<Int32> ids) {
            var allbytes = new List<byte>();
            foreach(var id in ids) {
                var bytes = BitConverter.GetBytes(id);
                allbytes.AddRange(bytes);
            }
            var str = Convert.ToBase64String(allbytes.ToArray(), Base64FormattingOptions.None);
            return str.Replace('+', '-').Replace('/', '_').Replace('=', '.');
        }

        public static ICollection<Int32> StringToIds(string idstring) {
            var result = new List<Int32>();
            var str = idstring.Replace('-', '+').Replace('_', '/').Replace('.', '=');
            var bytes = Convert.FromBase64String(str);
            for(var i = 0; i < bytes.Length; i += 4) {
                var id = BitConverter.ToInt32(bytes, i);
                result.Add(id);
            }
            return result;
        }
    }
}
Here's another really simple scheme that should give good compression for a set of numbers of the form N + delta, where N is a large constant (the example below is Java):
public int[] compress(int[] input) {
    int[] res = input.clone();
    Arrays.sort(res);
    for (int i = 1; i < res.length; i++) {
        res[i] = res[i] - res[i - 1];
    }
    return res;
}
This should reduce the set {1000000012,1000000021,1000000013,1000000022} to the list [1000000012,1,8,1], which you can then compress further by representing the numbers in base-67 encoding as described in another answer.
Using simple decimal encoding, this goes from 44 characters to 16 characters; i.e. 63%. (And using base47 will give even more compression).
If it is unacceptable to sort the ids, you don't get quite as good compression. For this example, {1000000012,1000000021,1000000013,1000000022} compresses to the list [1000000012,9,-8,9]. That is just one character longer for this example.
Either way, this is better than a generic compression algorithm or encoding schemes ... FOR THIS KIND OF INPUT.
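The decode side is just a running sum over the deltas; a C# sketch (the compress example above is Java, but the idea is identical):

static int[] Decompress(int[] deltas)
{
    var ids = new int[deltas.Length];
    int running = 0;
    for (int i = 0; i < deltas.Length; i++)
    {
        running += deltas[i];   // each entry is an offset from the previous ID
        ids[i] = running;
    }
    return ids;
}

Decompress(new[] { 1000000012, 1, 8, 1 }) yields the sorted set {1000000012, 1000000013, 1000000021, 1000000022}.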
If the only issue is the URL length, you can convert the numbers to base64 characters, then convert them back to numbers on the server side.
How patterned are the IDs you are getting? If, digit by digit, the IDs are random, then the method I am about to propose won't be very efficient, but if the IDs you gave as an example are representative of the types you'd be getting, then perhaps the following could work.
I motivate this idea by example.
You have, for example, 1000000012 as an ID that you'd like to compress. Why not store it as [{1},{0,7},{12}]? This would mean that the first digit is a 1, followed by 7 zeros, followed by a 12. Thus, if we use the notation {x}, that represents one instance of x, while {x,y} means that x occurs y times in a row.
You could extend this with a little bit of pattern matching and/or function fitting.
For example, pattern matching: 1000100032 would be [{1000,2},{32}].
For example, function fitting:
If your IDs are 10 digits, then split the ID into two 5-digit numbers and store the equation of the line that goes through both points. If ID = 1000000012, then you have y1 = 10000 and y2 = 12. Therefore, your slope is -9988 and your intercept is 10000 (assuming x1 = 0, x2 = 1). In this case it's not an improvement, but if the numbers were more random, it could be. Equivalently, you could store the sequence of IDs with piecewise linear functions.
In any case, this mostly depends on the structure of your IDs.
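A rough sketch of the digit run-length part of this idea, assuming IDs arrive as strings; the output format ("1{0,7}12" for 1000000012) is illustrative only:

static string RunLengthEncode(string id)
{
    var sb = new System.Text.StringBuilder();
    int i = 0;
    while (i < id.Length)
    {
        int j = i;
        while (j < id.Length && id[j] == id[i])
            j++;                                   // extend the run of equal digits
        int runLength = j - i;
        if (runLength > 2)
            sb.Append('{').Append(id[i]).Append(',').Append(runLength).Append('}');
        else
            sb.Append(id, i, runLength);           // short runs stay literal
        i = j;
    }
    return sb.ToString();
}

As the answer notes, this only pays off when the IDs contain long digit runs.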
I assume you are doing this as a workaround for request URL length restrictions ...
Other answers have suggested encoding the decimal id numbers in hex, base 67 or base64, but you can (in theory) do a lot better than that by using LZW (or similar) to compress the id list. Depending on how much redundancy there is in your ID lists, you could get significantly more than a 40% reduction, even after re-encoding the compressed bytes as text.
In a nutshell, I suggest that you find an off-the-shelf text compression library implemented in Javascript and use it client side to compress the ID list. Then encode the compressed bytestring using base67/base64, and pass the encoded string as the URL parameter. On the server side do the reverse; i.e. decode followed by decompress.
EDIT: As an experiment, I created a list of 36 different identifiers like the ones you supplied and compressed it using gzip. The original file is 396 bytes, the compressed file is 101 bytes, and the compressed + base64 file is 138 bytes. That is a 65% reduction overall. And the compression ratio could actually improve for larger files. However, when I tried this with a small input set (e.g. just the 4 original identifiers), I got no compression, and after encoding the size was larger than the original.
Google "lzw library javascript"
In theory, there might be a simpler solution: send the parameters as POST data rather than in the request URL, and get the browser to apply compression using one of the content encodings it understands. That would give you more savings too, since there is no need to encode the compressed data into legal URL characters.
The problem is getting the browser to compress the request ... and doing that in a browser-independent way.
