How is the space complexity of this algorithm O(1)? - c#

I found an algorithm here to remove duplicate characters from a string with O(1) space complexity (SC). The algorithm converts the string to a character array, whose size is not constant: it changes with the input size. Yet they claim that it runs with an SC of O(1). How?
// Function to remove duplicates
static string removeDuplicatesFromString(string string1)
{
    // keeps track of visited characters
    int counter = 0;
    char[] str = string1.ToCharArray();
    int i = 0;
    int size = str.Length;
    // gets character value
    int x;
    // keeps track of length of resultant String
    int length = 0;
    while (i < size) {
        x = str[i] - 97;
        // check if Xth bit of counter is unset
        if ((counter & (1 << x)) == 0) {
            str[length] = (char)('a' + x);
            // mark current character as visited
            counter = counter | (1 << x);
            length++;
        }
        i++;
    }
    return (new string(str)).Substring(0, length);
}
It seems that I don't understand Space Complexity.

The implementation as given does not run in O(1) extra space.
The algorithm takes as its input an arbitrarily sized string drawn from an alphabet of only 26 characters, so the output is never longer than 26 characters, and the output array need not be the size of the input.
You are correct to point out that the implementation given on the site allocates O(n) extra space unnecessarily for the char array.
Exercise: Can you fix the char array problem?
Harder Exercise: Can you describe and implement a string data structure that implements the contract of a string efficiently but allows this algorithm to be implemented actually using only O(1) extra space for arbitrary strings?
Better exercise: The fact that we are restricted to an alphabet of 26 characters is what enables the cheesy "let's just use an int as a set of flags" solution. Instead of saying that n is the size of the input string, what if we allow arbitrary sequences of arbitrary values that have an equality relation; can you come up with a solution to this problem that is O(n) in the size of the output sequence, not the input sequence?
That is, can you implement public static IEnumerable<T> Distinct<T>(this IEnumerable<T> t) such that the output is deduplicated but otherwise in the same order as the input, using O(n) storage where n is the size of the output sequence?
This is a better exercise because this function is actually implemented in the base class library. It's useful, unlike the toy problem.
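One possible approach to the "better exercise" is sketched below. It is not the base class library's actual implementation; the class name DistinctSketch and the method name DistinctInOrder are made up for illustration. It assumes the default equality comparer for T and does no argument checking.

```csharp
using System.Collections.Generic;

public static class DistinctSketch
{
    // Yields each element the first time it is seen, preserving input order.
    // The HashSet holds one entry per distinct element, so the extra storage
    // is proportional to the size of the output, not the input.
    public static IEnumerable<T> DistinctInOrder<T>(this IEnumerable<T> source)
    {
        var seen = new HashSet<T>();
        foreach (T item in source)
        {
            // HashSet<T>.Add returns false when the item is already present.
            if (seen.Add(item))
            {
                yield return item;
            }
        }
    }
}
```

For example, "banana".DistinctInOrder() yields b, a, n in that order. Because the method is lazy, nothing is stored until the result is enumerated.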
I note also that the problem statement assumes that there is only one relevant alphabet with lowercase characters, and that there are 26 of them. This assumption is false.

Related

Selecting set of binary sequences to avoid similarity

I want to be able to programmatically generate a set of binary sequences of a given length whilst avoiding similarity between any two sequences.
I'll define 'similar' between two sequences thus:
If sequence A can be converted to sequence B (or B to A) by bit-shifting A (non-circularly) and padding with 0s, A and B are similar (note: bit-shifting is allowed on only one of the sequences otherwise both could always be shifted to a sequence of just 0s)
For example: A = 01010101 B = 10101010 C = 10010010
In this example, A and B are similar because a single left-shift of A results in B (A << 1 = B). A and C are not similar because no bit-shifting of one can result in the other.
A set of sequences is defined as dissimilar if no subset of size 2 is similar.
I believe there could be multiple sets for a given sequence length and presumably the size of the set will be significantly less than the total possibilities (total possibilities = 2 ^ sequence length).
I need a way to generate a set for a given sequence length. Does an algorithm exist that can achieve this? Selecting sequences one at a time and checking against all previously selected sequences is not acceptable for my use case (but may have to be if a better method doesn't exist!).
I've tried generating sets of integers based on prime numbers and also the golden ratio, then converting to binary. This seemed like it might be a viable method, but I have been unable to get it to work as expected.
Update: I have written a function in C# that uses a prime number modulo to generate the set, without success. I've also tried using the Fibonacci sequence, which finds a mostly dissimilar set, but of a size that is very small compared to the number of possibilities:
private List<string> GetSequencesFib(int sequenceLength)
{
    var sequences = new List<string>();
    long current = 21;
    long prev = 13;
    long prev2 = 8;
    long size = (long)Math.Pow(2, sequenceLength);
    while (current < size)
    {
        current = prev + prev2;
        sequences.Add(current.ToBitString(sequenceLength));
        prev2 = prev;
        prev = current;
    }
    return sequences;
}
This generates a set of sequences of size 41 that is roughly 60% dissimilar (sequenceLength = 32). It is started at 21 since lower values produce sequences of mostly 0s which are similar to any other sequence.
By relaxing the conditions of similarity to only allowing a small number of successive bit-shifts, the proportion of dissimilar sequences approaches 100%. This may be acceptable in my use case.
Update 2:
I've implemented a function following DCHE's suggestion, by selecting all odd numbers greater than half the maximum value for a given sequence length:
private static List<string> GetSequencesOdd(int length)
{
    var sequences = new List<string>();
    long max = (long)(Math.Pow(2, length));
    long quarterMax = max / 4;
    for (long n = quarterMax * 2 + 1; n < max; n += 2)
    {
        sequences.Add(n.ToBitString(length));
    }
    return sequences;
}
This produces an entirely dissimilar set as per my requirements. I can see why this works mathematically as well.
I can't prove it, but from my experimenting, I think that your set is the odd integers greater than half of the largest number in binary. E.g. for bit sets of length 3, max integer is 7, so the set is 5 and 7 (101, 111).
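The claim can be checked by brute force for small lengths. The sketch below is illustrative only: the names ShiftSimilarity, AreSimilar, OddUpperHalf and IsDissimilarSet are invented here, sequences are held as ulong values rather than bit strings, and lengths are assumed to be between 1 and 63. The intuition it tests is that a left shift always clears the lowest bit (giving an even value) and a right shift always clears the top bit, so neither result can be another odd value with the top bit set.

```csharp
using System;
using System.Collections.Generic;

public static class ShiftSimilarity
{
    // True if shifting one value left or right by k >= 1 bits (zero-padded and
    // truncated to 'length' bits) produces the other value.
    public static bool AreSimilar(ulong a, ulong b, int length)
    {
        if (length < 1 || length > 63) throw new ArgumentOutOfRangeException(nameof(length));
        ulong mask = (1UL << length) - 1;
        for (int k = 1; k <= length; k++)
        {
            if (((a << k) & mask) == b || (a >> k) == b) return true;
            if (((b << k) & mask) == a || (b >> k) == a) return true;
        }
        return false;
    }

    // All odd values whose top bit (bit length-1) is set.
    public static List<ulong> OddUpperHalf(int length)
    {
        if (length < 1 || length > 63) throw new ArgumentOutOfRangeException(nameof(length));
        var result = new List<ulong>();
        ulong max = 1UL << length;
        for (ulong n = (max >> 1) | 1UL; n < max; n += 2)
        {
            result.Add(n);
        }
        return result;
    }

    // Pairwise check: O(n^2) comparisons, so only practical for small lengths.
    public static bool IsDissimilarSet(IReadOnlyList<ulong> set, int length)
    {
        for (int i = 0; i < set.Count; i++)
            for (int j = i + 1; j < set.Count; j++)
                if (AreSimilar(set[i], set[j], length)) return false;
        return true;
    }
}
```

The pairwise check is exactly the kind of test the question wants to avoid at run time; here it only serves to validate the generator for small lengths.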

Encrypt int generating unique string and Decrypt string for getting int again

I need to generate unique strings starting from int number (id). The length must be proportionally incremental, so for tiny ids I have to generate unique strings of four characters. For big ids I have to generate strings much more complex, with growing size when needed (max 8 digit) in order to accomplish the uniqueness.
All this procedure must be done with two opposite functions:
from id -> obtain string
from string -> obtain id
Unique strings must be composed of digits and letters from a specific set (379CDEFHKJLMNPQRTUWXY)
Is there a well-known algorithm to do this? I need to do this in C# or, better, in T-SQL. Ideas are also appreciated.
Edit
I "simply" need to encode (and then decode) the number. I've implemented these routines for my alphabet (21 symbols long):
Encode:
public static String BfEncode(long input)
{
    if (input < 0) throw new ArgumentOutOfRangeException("input", input, "input cannot be negative");
    char[] clistarr = BfCharList.ToCharArray();
    var result = new Stack<char>();
    // do/while so that an input of 0 yields the first symbol instead of an empty string
    do
    {
        // least significant digit first; the stack reverses the order on output
        result.Push(clistarr[input % 21]);
        input /= 21;
    } while (input != 0);
    return new string(result.ToArray());
}
And decode:
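public static Int64 BfDecode(string input)
{
    long result = 0;
    // Read the most significant digit first. Pure integer arithmetic avoids
    // going through double, as (long)Math.Pow(21, pos) does.
    foreach (char c in input.ToUpper())
    {
        int digit = BfCharList.IndexOf(c);
        if (digit < 0) throw new ArgumentException("input contains a character outside the alphabet", "input");
        result = result * 21 + digit;
    }
    return result;
}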
I've generated example strings in a loop from 10000 to 10000000 (!). Starting from 10K, the generated strings are 4 characters long. After the generation I put all the strings in a list and checked for uniqueness (I did it with a parallel foreach...). At the number 122291 the routine threw an exception because there is a duplicate! Is that possible?
Is base conversion to a custom alphabet not a good solution?
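Base conversion maps every non-negative integer to exactly one digit string, so the encoding on its own should not produce duplicates. The sketch below is one way to check that; the names Base21Sketch, Encode and Decode and the minLength parameter are invented for illustration, and the alphabet is the one from the question. It also left-pads with the alphabet's "zero" symbol (3) so that small ids still give four characters, which is an assumption about what "strings of four characters for tiny ids" should mean.

```csharp
using System;
using System.Text;

public static class Base21Sketch
{
    // The 21-symbol alphabet from the question; index 0 ('3') acts as the zero digit.
    private const string Alphabet = "379CDEFHKJLMNPQRTUWXY";

    public static string Encode(long id, int minLength = 4)
    {
        if (id < 0) throw new ArgumentOutOfRangeException(nameof(id));
        var sb = new StringBuilder();
        do
        {
            // Prepend the next least significant digit.
            sb.Insert(0, Alphabet[(int)(id % Alphabet.Length)]);
            id /= Alphabet.Length;
        } while (id != 0);
        // Leading zero digits do not change the decoded value.
        while (sb.Length < minLength) sb.Insert(0, Alphabet[0]);
        return sb.ToString();
    }

    public static long Decode(string text)
    {
        long result = 0;
        foreach (char c in text)
        {
            int digit = Alphabet.IndexOf(char.ToUpperInvariant(c));
            if (digit < 0) throw new FormatException("Character not in alphabet: " + c);
            result = checked(result * Alphabet.Length + digit);
        }
        return result;
    }
}
```

Encoding a range of ids into a HashSet<string> on a single thread and decoding each one back is a simple way to confirm uniqueness. If that passes, a reported duplicate would have to come from somewhere other than the conversion, for instance from how the results were collected.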

Cryptography .NET, Avoiding Timing Attack

I was browsing crackstation.net website and came across this code which was commented as following:
Compares two byte arrays in length-constant time. This comparison method is used so that password hashes cannot be extracted from on-line systems using a timing attack and then attacked off-line.
private static bool SlowEquals(byte[] a, byte[] b)
{
    uint diff = (uint)a.Length ^ (uint)b.Length;
    for (int i = 0; i < a.Length && i < b.Length; i++)
        diff |= (uint)(a[i] ^ b[i]);
    return diff == 0;
}
Can anyone please explain how this function actually works, why we need to convert the length to an unsigned integer, and how this method avoids a timing attack? What does the line diff |= (uint)(a[i] ^ b[i]); do?
This sets diff based on whether there's a difference between a and b.
It avoids a timing attack by always walking through the entirety of the shorter of the two of a and b, regardless of whether there's a mismatch sooner than that or not.
The line diff |= (uint)(a[i] ^ b[i]) takes the exclusive-or of a byte of a with the corresponding byte of b. That will be 0 if the two bytes are the same, or non-zero if they're different. It then ors that with diff.
Therefore, diff will be set to non-zero in an iteration if a difference was found between the inputs in that iteration. Once diff is given a non-zero value at any iteration of the loop, it will retain the non-zero value through further iterations.
Therefore, the final result in diff will be non-zero if any difference is found between corresponding bytes of a and b, and 0 only if all bytes (and the lengths) of a and b are equal.
Unlike a typical comparison, however, this will always execute the loop until all the bytes in the shorter of the two inputs have been compared to bytes in the other. A typical comparison would have an early-out where the loop would be broken as soon as a mismatch was found:
// Typical early-exit comparison, shown for contrast: this is the version that leaks timing
static bool Equal(byte[] a, byte[] b)
{
    if (a.Length != b.Length)
        return false;
    for (int i = 0; i < a.Length; i++)
        if (a[i] != b[i])
            return false;
    return true;
}
With this, based on the amount of time consumed to return false, we can learn (at least an approximation of) the number of bytes that matched between a and b. Let's say the initial test of length takes 10 ns, and each iteration of the loop takes another 10 ns. Based on that, if it returns false in 50 ns, we can quickly guess that we have the right length and that four iterations of the loop ran: the first three bytes of a and b match, and the mismatch was found on the fourth.
Even without knowing the exact amounts of time, we can still use the timing differences to determine the correct string. We start with a string of length 1, and increase that one byte at a time until we see an increase in the time taken to return false. Then we run through all the possible values in the first byte until we see another increase, indicating that it has executed another iteration of the loop. Continue with the same for successive bytes until all bytes match and we get a return of true.
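Real timings are noisy, but the leak can be modelled deterministically by counting loop iterations in place of nanoseconds. The sketch below does that; the class name TimingLeakSketch and the out parameter iterations are invented for illustration and are not part of the crackstation code.

```csharp
public static class TimingLeakSketch
{
    // Early-exit comparison: the iteration count depends on where the first
    // mismatch occurs, which is what an attacker observes as time.
    public static bool EarlyExitEquals(byte[] a, byte[] b, out int iterations)
    {
        iterations = 0;
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
        {
            iterations++;
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    // Same logic as SlowEquals: the iteration count depends only on the
    // lengths of the inputs, never on their contents.
    public static bool SlowEquals(byte[] a, byte[] b, out int iterations)
    {
        iterations = 0;
        uint diff = (uint)a.Length ^ (uint)b.Length;
        for (int i = 0; i < a.Length && i < b.Length; i++)
        {
            diff |= (uint)(a[i] ^ b[i]);
            iterations++;
        }
        return diff == 0;
    }
}
```

With a four-byte secret, a guess that matches only the first two bytes stops the early-exit version after three iterations, while SlowEquals runs four iterations for every four-byte guess, right or wrong.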
The original is still open to a little bit of a timing attack -- although we can't easily determine the contents of the correct string based on timing, we can at least find the string length based on timing. Since it only compares up to the shorter of the two strings, we can start with a string of length 1, then 2, then 3, and so on until the time becomes stable. As long as the time is increasing our proposed string is shorter than the correct string. When we give it longer strings, but the time remains constant, we know our string is longer than the correct string. The correct length of string will be the shortest one that takes that maximum duration to test.
Whether this is useful or not depends on the situation, but it's clearly leaking some information, regardless. For truly maximum security, we'd probably want to append random garbage to the end of the real string to make it the length of the user's input, so the time stays proportional to the length of the input, regardless of whether it's shorter, equal to, or longer than the correct string.
This version runs for the full length of the input a:
private static bool SlowEquals(byte[] a, byte[] b)
{
    uint diff = (uint)a.Length ^ (uint)b.Length;
    byte[] c = new byte[] { 0 };
    for (int i = 0; i < a.Length; i++)
        diff |= (uint)(GetElem(a, i, c, 0) ^ GetElem(b, i, c, 0));
    return diff == 0;
}
private static byte GetElem(byte[] x, int i, byte[] c, int i0)
{
    // fall back to the dummy array c when i is past the end of x
    bool ok = (i < x.Length);
    return (ok ? x : c)[ok ? i : i0];
}

List<T> capacity increasing vs Dictionary<K,V> capacity increasing?

Why does List<T> increase its capacity by a factor of 2?
private void EnsureCapacity(int min)
{
    if (this._items.Length < min)
    {
        int num = (this._items.Length == 0) ? 4 : (this._items.Length * 2);
        if (num < min)
        {
            num = min;
        }
        this.Capacity = num;
    }
}
Why does Dictionary<K,V> use prime numbers as capacity?
private void Resize()
{
    int prime = HashHelpers.GetPrime(this.count * 2);
    int[] numArray = new int[prime];
    for (int i = 0; i < numArray.Length; i++)
    {
        numArray[i] = -1;
    }
    Entry<TKey, TValue>[] destinationArray = new Entry<TKey, TValue>[prime];
    Array.Copy(this.entries, 0, destinationArray, 0, this.count);
    for (int j = 0; j < this.count; j++)
    {
        int index = destinationArray[j].hashCode % prime;
        destinationArray[j].next = numArray[index];
        numArray[index] = j;
    }
    this.buckets = numArray;
    this.entries = destinationArray;
}
Why doesn't it also just multiply by 2? Both are dealing with finding a contiguous memory location... correct?
It's common to use prime numbers for hash table sizes because it reduces the probability of collisions.
Hash tables typically use the modulo operation to find the bucket where an entry belongs, as you can see in your code:
int index = destinationArray[j].hashCode % prime;
Suppose your hashCode function produces, among others, the hash codes {x, 2x, 3x, 4x, 5x, 6x...}. All of these are going to be clustered in just m buckets, where m = table_length/GreatestCommonFactor(table_length, x). (It is trivial to verify/derive this.) Now you can do one of the following to avoid clustering:
Make sure that you don't generate too many hashCodes that are multiples of another hashCode, as in {x, 2x, 3x, 4x, 5x, 6x...}. But this may be difficult if your hash table is supposed to have millions of entries.
Or simply make m equal to table_length by making GreatestCommonFactor(table_length, x) equal to 1, i.e. by making table_length coprime with x. And if x can be just about any number, then make sure that table_length is a prime number.
(from http://srinvis.blogspot.com/2006/07/hash-table-lengths-and-prime-numbers.html)
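The clustering formula quoted above is easy to check numerically. The sketch below is illustrative only (the names BucketSpreadSketch, DistinctBuckets and Gcd are made up): it counts how many different buckets the hash codes x, 2x, 3x, ... land in for a given table length.

```csharp
using System.Collections.Generic;

public static class BucketSpreadSketch
{
    // Number of distinct buckets hit by the hash codes x, 2x, ..., count*x
    // when the bucket index is hashCode % tableLength.
    public static int DistinctBuckets(int x, int count, int tableLength)
    {
        var used = new HashSet<int>();
        for (int k = 1; k <= count; k++)
        {
            // long arithmetic so that k * x cannot overflow
            used.Add((int)((long)k * x % tableLength));
        }
        return used.Count;
    }

    // Greatest common factor by Euclid's algorithm.
    public static int Gcd(int a, int b)
    {
        while (b != 0)
        {
            int t = a % b;
            a = b;
            b = t;
        }
        return a;
    }
}
```

With x = 8 and a table of length 32, only 32 / Gcd(32, 8) = 4 buckets are ever used, however many codes are inserted. With a prime length such as 37, the same codes spread over all 37 buckets.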
HashHelpers.GetPrime(this.count * 2)
should return a prime number. Look at the definition of HashHelpers.GetPrime().
Dictionary puts all its objects into buckets depending on their GetHashCode value, i.e.
Bucket[object.GetHashCode() % DictionarySize] = object;
It uses a prime number for size to avoid the chance of collisions. Presumably a size with many divisors would be bad for poorly designed hash codes.
From a question on SO:
Dictionary or hash table relies on hashing the key to get a smaller index to look up into the corresponding store (array). So the choice of hash function is very important. A typical choice is to get the hash code of a key (so that we get a good random distribution) and then divide the code by a prime number and use the remainder to index into a fixed number of buckets. This allows converting arbitrarily large hash codes into a bounded set of small numbers for which we can define an array to look up into. So it's important to have the array size be a prime number, and the best choice for the size becomes the prime number that is larger than the required capacity. And that's exactly what the Dictionary implementation does.
List<T> employs arrays to store data, and increasing the capacity of an array requires copying the array to a new memory location, which is time consuming. I guess that, in order to reduce how often arrays are copied, the list doubles its capacity.
I'm not a computer scientist, but ...
Most probably it's related to a hash table's load factor. To avoid confusion for a non-mathematical audience, it's important to define it:
loadFactor = UsedCells/AllCells
which, in terms of buckets, we can write as
loadFactor = UsedBuckets/AllBuckets
loadFactor determines the probability of a collision in a hash map.
So by using a prime number, a number that
..is a natural number greater than 1 that
has no positive divisors other than 1 and itself.
we decrease (but do not eliminate) the risk of collision in our hash map.
If loadFactor tends to 0, we have a safer hash map, so we want to keep it low. According to an MS blog, the optimal value of loadFactor was found to be around 0.72, so when it becomes bigger, the capacity is increased to the next suitable prime number.
EDIT
To be more clear on this: having a prime number ensures, as much as possible, a uniform distribution of the hashes in the concrete hash implementation we have in the .NET dictionary. It's not about the efficiency of retrieving values, but about the efficiency of the memory used and the reduction of collision risk.
Hope this helps.
Dictionary needs some heuristic so that hash code distribution among buckets is more uniform.
.NET's Dictionary uses prime number of buckets to do that, and then calculates bucket index like this:
int num = this.comparer.GetHashCode(key) & 2147483647; // make hash code positive
// get the remainder from division - that's our bucket index
int num2 = this.buckets[num % ((int)this.buckets.Length)];
When it grows, it doubles the number of buckets and then adds some more to make the number prime again.
It's not the only heuristic possible. Java's HashMap, for example, takes another approach. The number of buckets there is always a power of 2 and on grow it just doubles the number of buckets:
resize(2 * table.length);
But when calculating bucket index it modifies hash:
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
static int indexFor(int h, int length) {
    return h & (length-1);
}
// from put() method
int hash = hash(key.hashCode()); // get modified hash
int i = indexFor(hash, table.length); // trim the hash to the bucket count
List on the other hand doesn't need any heuristic, so they didn't bother.
Addition: Grow behavior doesn't influence Add's complexity at all. Dictionary, HashMap and List each have amortized Add complexity of O(1).
A grow operation takes O(N) but happens only on every N-th Add, so N calls to Add are needed to trigger it. For N=8, the time taken by N Adds is
O(1)+O(1)+O(1)+O(1)+O(1)+O(1)+O(1)+O(N) = O(N)+O(N) = O(2N) = O(N)
So N Adds take O(N), which means one Add takes amortized O(1).
Increasing the capacity by a constant factor (instead of, for example, increasing the capacity by an additive constant) when resizing is required to guarantee some amortized running times. For example, adding to or removing from the end of an array-based list requires O(1) time, except when you have to increase or decrease the capacity, which requires copying the list content and therefore O(n) time. Changing the capacity by a constant factor guarantees that the amortized runtime is still O(1). The optimal value of the factor depends on the expected usage. Some more information on Wikipedia.
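The difference between growing by a factor and growing by a fixed amount can be seen by counting how many element copies the resizes cause. The sketch below is a simplified model, not List<T> itself; the names GrowthSketch, CopiesWithDoubling and CopiesWithFixedStep are invented, and the initial capacity of 4 mirrors the EnsureCapacity code shown in the question.

```csharp
public static class GrowthSketch
{
    // Total element copies caused by resizing while appending n items,
    // when the backing array doubles each time it fills up (starting at 4).
    public static long CopiesWithDoubling(int n)
    {
        long copies = 0;
        int capacity = 0;
        for (int count = 0; count < n; count++)
        {
            if (count == capacity)
            {
                copies += count; // every existing item moves to the new array
                capacity = capacity == 0 ? 4 : capacity * 2;
            }
        }
        return copies;
    }

    // Same model, but the capacity grows by a fixed number of slots.
    public static long CopiesWithFixedStep(int n, int step)
    {
        long copies = 0;
        int capacity = 0;
        for (int count = 0; count < n; count++)
        {
            if (count == capacity)
            {
                copies += count;
                capacity += step;
            }
        }
        return copies;
    }
}
```

For 1024 appends, doubling copies 1020 elements in total (fewer than one copy per Add), while growing by 4 slots at a time copies 130560. The first total grows linearly with n, the second quadratically.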
Choosing the capacity of a hash table to be prime improves the distribution of the items: if hash is not uniformly distributed, bucket[hash % capacity] will yield a more uniform distribution when capacity is prime. (I cannot give the math behind that, but I am looking for a good reference.) The combination of this with the first point is exactly what the implementation does: increasing the capacity by a factor of (at least) 2 while also ensuring that the capacity is prime.

Compressing big number (or string) to small value

My ASP.NET page has following query string parameter:
…?IDs=1000000012,1000000021,1000000013,1000000022&...
Here IDs parameter will always have numbers separated by something, in this case ,. Currently there are 4 numbers but normally they would be in between 3 and 7.
Now, I am looking for a method to convert each big number above into the smallest possible value; specifically, to compress the value of the IDs query string parameter. Algorithms that compress each number individually or that compress the whole value of the IDs parameter are both welcome.
Encoding or decoding is not an issue; the goal is just compressing the value of the IDs query string parameter.
Creating some unique small value for IDs and then retrieving its value from some data source is out of scope.
Is there an algorithm to compress such big numbers to small values or to compress value of the IDs query string parameter all together?
You basically need so much room for your numbers because you are using base 10 to represent them. An improvement would be to use base 16 (hex). So for example, you could represent 255 (3 digits) as ff (2 digits).
You can take that concept further by using a much larger number base... the set of all characters that are valid query string parameters:
A-Z, a-z, 0-9, '.', '-', '~', '_', '+'
That gives you a base of 67 characters to work with (see Wikipedia on QueryString).
Have a look at this SO post for approaches to converting base 10 to arbitrary number bases.
EDIT:
In the linked SO post, look at this part:
string xx = IntToString(42,
new char[] { '0','1','2','3','4','5','6','7','8','9',
'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x'});
That's almost what you need. Just expand it by adding the few characters it is missing:
yz.-~_+
That post is missing a method to go back to base 10. I'm not going to write it :-) but the procedure is like this:
Define a counter I'll call TOTAL.
Look at the rightmost character and find its position in the array.
TOTAL = (the position of the character in the array)
Example: Input is BA1. TOTAL is now 1 (since "1" is in position 1 in the array)
Now look at the next character left of the first one and find its position in the array.
TOTAL += 47 * (the position of the character in the array)
Example: Input is BA1. TOTAL is now (47 * 10) + 1 = 471 (since "A" is in position 10)
Now look at the next character left of the previous one and find its position in the array.
TOTAL += 47 * 47 * (the position of the character in the array)
Example: Input is BA1. TOTAL is now (47 * 47 * 11) + (47 * 10) + 1 = 24770 (since "B" is in position 11)
And so on.
The multiplier 47 in these examples stands for the number base, which must equal the length of the character array; with the full 67-character array described above, use 67 instead.
I suggest you write a unit test that converts a bunch of base 10 numbers into your chosen base and then back again to make sure your conversion code works properly.
Note how you represented a 5 digit base 10 number in just 3 digits of base 47 :-)
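The decoding procedure described above can be sketched as follows. This is not the code from the linked post: the names BaseNSketch, Digits, Encode and Decode are invented for illustration, and the radix is taken to be the length of the full 67-character array (digits, upper case, lower case, then . - ~ _ +). Processing the characters from left to right and multiplying the running total by the radix at each step gives the same result as summing position times power of the radix from the right.

```csharp
using System;
using System.Text;

public static class BaseNSketch
{
    // 10 digits + 26 upper case + 26 lower case + 5 punctuation = 67 symbols.
    public const string Digits =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.-~_+";

    public static string Encode(long value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
        var sb = new StringBuilder();
        do
        {
            // Prepend the next least significant digit.
            sb.Insert(0, Digits[(int)(value % Digits.Length)]);
            value /= Digits.Length;
        } while (value != 0);
        return sb.ToString();
    }

    public static long Decode(string text)
    {
        long total = 0;
        foreach (char c in text)
        {
            int position = Digits.IndexOf(c);
            if (position < 0) throw new FormatException("Character not in array: " + c);
            // Shift the digits read so far one place left, then add this digit.
            total = checked(total * Digits.Length + position);
        }
        return total;
    }
}
```

With this array, each of the 10-digit ids from the question encodes to 5 characters, and decoding the result returns the original number.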
What is the range of your numbers? Assuming they can fit in a 16-bit integer, I would:
Store all your numbers as 16-bit integers (2 bytes per number, range -32,768 to 32,767)
Build a bytestream of 16-bit integers (XDR might be a good option here; at very least, make sure to handle endianness correctly)
Base64 encode the bytestream, using the modified base64 encoding for URLs (net is about 3 characters per number)
As an added bonus you don't need comma characters anymore because you know each number is 2 bytes.
Alternatively, if that isn't good enough, I'd use zlib to compress your stream of integers and then base64 the zlib-compressed stream. You can also switch to 32-bit integers if 16-bit isn't a large enough range (i.e. if you really need numbers in the 1,000,000,000 range).
Edit:
Maybe too late, but here's an implementation that might do what you need:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Scratch {
    class Program {
        static void Main(string[] args) {
            //var ids = new[] { 1000000012, 1000000021, 1000000013, 1000000022 };
            var rand = new Random();
            var ids = new int[rand.Next(20)];
            for(var i = 0; i < ids.Length; i++) {
                ids[i] = rand.Next();
            }
            WriteIds(ids);
            var s = IdsToString(ids);
            Console.WriteLine("\nResult string is: {0}", s);
            var newIds = StringToIds(s);
            WriteIds(newIds);
            Console.ReadLine();
        }
        public static void WriteIds(ICollection<Int32> ids) {
            Console.Write("\nIDs: ");
            bool comma = false;
            foreach(var id in ids) {
                if(comma) {
                    Console.Write(",");
                } else {
                    comma = true;
                }
                Console.Write(id);
            }
            Console.WriteLine();
        }
        public static string IdsToString(ICollection<Int32> ids) {
            var allbytes = new List<byte>();
            foreach(var id in ids) {
                var bytes = BitConverter.GetBytes(id);
                allbytes.AddRange(bytes);
            }
            var str = Convert.ToBase64String(allbytes.ToArray(), Base64FormattingOptions.None);
            return str.Replace('+', '-').Replace('/', '_').Replace('=', '.');
        }
        public static ICollection<Int32> StringToIds(string idstring) {
            var result = new List<Int32>();
            var str = idstring.Replace('-', '+').Replace('_', '/').Replace('.', '=');
            var bytes = Convert.FromBase64String(str);
            for(var i = 0; i < bytes.Length; i += 4) {
                var id = BitConverter.ToInt32(bytes, i);
                result.Add(id);
            }
            return result;
        }
    }
}
Here's another really simple scheme that should give good compression for a set of numbers of the form N + delta where N is a large constant.
public int[] compress(int[] input) {
    int[] res = input.clone();
    Arrays.sort(res);
    for (int i = 1; i < res.length; i++) {
        res[i] = res[i] - res[i - 1];
    }
    return res;
}
This should reduce the set {1000000012,1000000021,1000000013,1000000022} to the list [1000000012,1,8,1], which you can then compress further by representing the numbers in base47 encoding as described in another answer.
Using simple decimal encoding, this goes from 43 characters to 16 characters; i.e. a reduction of about 63%. (And using base47 will give even more compression.)
If it is unacceptable to sort the ids, you don't get quite as good compression. For this example, {1000000012,1000000021,1000000013,1000000022} compresses to the list [1000000012,9,-8,9]. That is just one character longer for this example.
Either way, this is better than a generic compression algorithm or encoding schemes ... FOR THIS KIND OF INPUT.
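Since the question is about ASP.NET, here is a rough C# sketch of the sorted delta scheme together with the reverse step, which the snippet above does not show. The names DeltaSketch, Compress and Decompress are invented for illustration. Each delta is computed from the already sorted neighbours, and the ids are assumed to fit in a long.

```csharp
using System;
using System.Linq;

public static class DeltaSketch
{
    // Sorts the ids, then stores the first id followed by the gaps between
    // consecutive sorted ids. The original ordering is not preserved.
    public static long[] Compress(long[] ids)
    {
        long[] sorted = ids.OrderBy(x => x).ToArray();
        var deltas = new long[sorted.Length];
        for (int i = 0; i < sorted.Length; i++)
        {
            deltas[i] = i == 0 ? sorted[0] : sorted[i] - sorted[i - 1];
        }
        return deltas;
    }

    // Rebuilds the sorted ids with a running sum over the deltas.
    public static long[] Decompress(long[] deltas)
    {
        var ids = new long[deltas.Length];
        long running = 0;
        for (int i = 0; i < deltas.Length; i++)
        {
            running += deltas[i];
            ids[i] = running;
        }
        return ids;
    }
}
```

Joining the result of Compress with commas gives the shortened query string value; on the server side, splitting on commas and calling Decompress recovers the ids in ascending order.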
If the only issue is the URL length, you can convert numbers to base64 characters, then convert them back to numbers at the server side
How patterned are the IDs you are getting? If the IDs are random digit by digit, then the method I am about to propose won't be very efficient. But if the IDs you gave as an example are representative of the types you'd be getting, then perhaps the following could work.
I'll motivate this idea by example.
You have, for example, 1000000012 as an ID that you'd like to compress. Why not store it as [{1},{0,7},{12}]? This would mean that the first digit is a 1, followed by 7 zeros, followed by a 12. Thus the notation {x} represents one instance of x, while {x,y} means that x occurs y times in a row.
You could extend this with a little bit of pattern matching and/or function fitting.
For example, pattern matching: 1000100032 would be [{1000,2},{32}].
For example, function fitting:
If your IDs are 10 digits, then split the ID into two 5-digit numbers and store the equation of the line that goes through both points. If ID = 1000000012, then you have y1 = 10000 and y2 = 12. Therefore, your slope is -9988 and your intercept is 10000 (assuming x1 = 0, x2 = 1). In this case, it's not an improvement, but if the numbers were more random, it could be. Equivalently, you could store the sequence of IDs with piecewise linear functions.
In any case, this mostly depends on the structure of your IDs.
I assume you are doing this as a workaround for request URL length restrictions ...
Other answers have suggested encoding the decimal id numbers in hex, base47 or base64, but you can (in theory) do a lot better than that by using LZW (or similar) to compress the id list. Depending on how much redundancy there is in your ID lists, you could get significantly more than 40% reduction, even after re-encoding the compressed bytes as text.
In a nutshell, I suggest that you find an off-the-shelf text compression library implemented in JavaScript and use it client side to compress the ID list. Then encode the compressed bytestring using base47/base64, and pass the encoded string as the URL parameter. On the server side do the reverse; i.e. decode followed by decompress.
EDIT: As an experiment, I created a list of 36 different identifiers like the ones you supplied and compressed it using gzip. The original file is 396 bytes, the compressed file is 101 bytes, and the compressed + base64 file 138 bytes. That is a 65% reduction overall. And the compression ratio could actually improve for larger files. However, when I tried this with a small input set (e.g. just the 4 original identifiers), I got no compression, and after encoding the size was larger than the original.
Google "lzw library javascript"
In theory, there might be a simpler solution. Send the parameters as "post data" rather than in the request URL, and get the browser to apply the compression using one of the encodings that it understands. That will give you more savings too, since there is no need to encode the compressed data into legal URL characters.
The problem is getting the browser to compress the request ... and doing that in a browser independent way.
