How to create a smaller hash with fewer characters in a URL - C#

I have the following Hash.EncodeMD5(p) that takes a value p and encodes it; the result eventually gets passed into a URL like this:
www.mysite/test/test.aspx?perm=?|1098951-c2bcc0d267304a3d7d663007dbf801bc|1011796-3af44ad8442000232390799c367a06ed|
My problem is that my URL can get very long. How can I reduce its length? I believe the issue is Hash.EncodeMD5(p):
string perm = "";  // each ID appended below adds 32 hex characters of MD5 on top of the ID itself
string[] x = (Request.Form["ID"]).Split(',');
foreach (string pp in x)
{
    perm += pp + "-" + Hash.EncodeMD5(pp);
}

Try one of these without shortening the hash (a sketch of the first option follows):
Store the very long hash value in a cookie (or in session state) that you read back later server-side.
Hide the values in hidden ASP.NET/HTML controls.
Hide the values client-side via window.localStorage. This might not work in every setup, but it's worth a try.
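A rough sketch of the first option, assuming ASP.NET WebForms (which Request.Form suggests) and the Hash.EncodeMD5 helper from the question: stash the long string server-side and put only a short token in the URL.

protected void BuildShortLink()
{
    // build the long perm string exactly as before
    string perm = "";
    foreach (string pp in Request.Form["ID"].Split(','))
        perm += "|" + pp + "-" + Hash.EncodeMD5(pp);

    string token = Guid.NewGuid().ToString("N");  // always 32 chars, regardless of how many IDs
    Session[token] = perm;                        // read back later server-side

    Response.Redirect("~/test/test.aspx?perm=" + token);
}

// On test.aspx:
// string perm = (string)Session[Request.QueryString["perm"]];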

Related

Passing array into query string

I need to send an array as a query parameter, and I do it like this:
StringBuilder Ids = new StringBuilder();
for (int i = 0; i < array.Count; i++)
{
    // the loop bound and the indexed collection should be the same collection ("array" here)
    Ids.Append(String.Format("&id[{0}]={1}", i, array[i].ID));
}
ifrDocuments.Attributes.Add("src", "Download.aspx?arrayCount=" + array.Count + Ids);
After this I have the string:
Download.aspx?arrayCount=8&id[0]=106066&id[1]=106065&id[2]=106007&id[3]=105284&id[4]=105283&id[5]=105235&id[6]=105070&id[7]=103671
It can contain 100 elements, and in that case I'm getting an error.
Maybe I can do it another way, not by sending it in the query string?
There is a limit on URL length at multiple levels (browsers, proxy servers, etc.). You can raise maxQueryString (*1), but I would not recommend it if you expect real users to use your system.
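For reference, raising the limits is a web.config change roughly along these lines (values are illustrative; both the ASP.NET and the IIS request-filtering limits apply):

<configuration>
  <system.web>
    <httpRuntime maxQueryStringLength="4096" maxUrlLength="8192" />
  </system.web>
  <system.webServer>
    <security>
      <requestFiltering>
        <requestLimits maxQueryString="4096" />
      </requestFiltering>
    </security>
  </system.webServer>
</configuration>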
It looks like Download.aspx is your page. Put all those ids in temporary storage (a cache or database) and pass the key to that new entity in the request.
*1: https://blog.elmah.io/fix-max-url-and-query-string-length-with-web-config-and-iis/
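A rough sketch of that temporary-storage idea, assuming .NET Framework's built-in System.Runtime.Caching and the collection from the question:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.Caching;

// When building the link: cache the ids under a short random key.
string key = Guid.NewGuid().ToString("N");
List<int> ids = array.Select(a => a.ID).ToList();  // 'array' and 'ID' as in the question
MemoryCache.Default.Add(key, ids, DateTimeOffset.UtcNow.AddMinutes(10));
ifrDocuments.Attributes.Add("src", "Download.aspx?key=" + key);

// In Download.aspx: look the ids up again by the key.
var requestedIds = MemoryCache.Default.Get(Request.QueryString["key"]) as List<int>;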
The query string is not the way to pass an array, because of these limits.
If you control the endpoint, you should consider sending your array in a POST body.
Regards

How to build a fast cache with fast searching for objects in a list

I have lots of data to write into database tables (Oracle).
Writing takes a long time, and I want to avoid posting data sets that are already in the table. Therefore I need a cache.
At first I used a generic List and Dictionary<key, value> as the cache.
I tried IMemoryCache from .NET, but I got the feeling it does not fit my problem.
I also tried using a hash, but that does not work, because an object in my cache gets a different hash than another object with the same values.
My current solution is faster (nearly double the speed) than posting every object to the database, but still far too slow.
When I post an object to the database I get its key as the return value. I also need this key later in the code.
string dataRecordKey = dataRecord.MetaDataRecordId.ToString() + "|" + dataRecord.Profile + "|"
                     + dataRecord.Group + "|" + dataRecord.FirstName + "|"
                     + dataRecord.FamilyName + "|" + dataRecord.City;

// one dictionary lookup instead of ContainsKey followed by the indexer
if (!dictDataRecord.TryGetValue(dataRecordKey, out int dataRecordId))
{
    dataRecordId = await dataRecordRepository.CreateDataRecordAsync(dataRecord);
    dictDataRecord.Add(dataRecordKey, dataRecordId);
}
Posting 115 data sets costs 6 seconds.
With the code above it takes 3.6 seconds.
But I need to get it below 1 second.
Usually, problems like this are solved with hashing.
First:
I also tried using a hash, but that does not work, because an object in my cache gets a different hash than another object with the same values
Did you override the hashing function? You can define how objects are hashed into a Dictionary if the default hash doesn't fit your needs. First, I recommend overriding GetHashCode:
public override int GetHashCode()
{
    // same composite key as before, built from this object's own properties
    string key = MetaDataRecordId.ToString() + "|" + Profile + "|" + Group + "|"
               + FirstName + "|" + FamilyName + "|" + City;
    return key.GetHashCode();
}
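One caveat worth adding: a Dictionary only treats two keys as equal when their hash codes match and Equals returns true, so pair the override with an Equals override. A sketch, assuming a DataRecord class with the properties used in the question:

public override bool Equals(object obj)
{
    // compare the same fields that feed GetHashCode above
    return obj is DataRecord other
        && MetaDataRecordId == other.MetaDataRecordId
        && Profile == other.Profile
        && Group == other.Group
        && FirstName == other.FirstName
        && FamilyName == other.FamilyName
        && City == other.City;
}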
Now, if for some reason this hash is too expensive to compute, the common solution is to use multiple hashes:
A fast but unreliable hash.
A slow but more reliable hash.
Use two hash sets (in C#, simply use a Dictionary with any value type you like; we only care about whether the key exists, not about the value).
For the first dictionary, use the fast hash (for example, the length of the dataRecordKey string, the length of one of its individual strings such as dataRecord.FamilyName, or simply dataRecord.GetHashCode()).
To make a check:
First, check for the record in the first (fast) dictionary. If the key is found there, remember that this hash is unreliable, so check the second dictionary using the GetHashCode override above.
If the key is not found in the second dictionary, add it to the second dictionary and to the database. If it is found in the second, skip it.
If the key is not found in the first dictionary, you can be absolutely sure it was never added to the database. Add it to the database and then to both dictionaries.
How many collisions the first hash produces will affect performance: you are trading collisions for computation speed.
Edit
You say you need the key, so both Dictionaries can have the key as their value. Just remember not to retrieve the key from the first hashmap, as multiple records will hash to the same value there (which is intended).
Edit 2
Sorry, one small optimization: if nothing is found in the first dictionary, you don't even need to check the second, because you can be sure it isn't there.
Also, to keep the first dictionary lightweight, use a bool as its value, set to either true or false; the value is irrelevant. (A sketch of the whole scheme follows.)
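A sketch of the whole scheme, assuming the dataRecordKey string and the repository from the question; the fast hash here is just a cheap property-derived number and is expected to collide:

using System.Collections.Generic;
using System.Threading.Tasks;

var fastSet = new Dictionary<int, bool>();     // cheap hash -> present (the bool value is irrelevant)
var slowDict = new Dictionary<string, int>();  // full composite key -> database id

async Task<int> GetOrCreateAsync(DataRecord dataRecord, string dataRecordKey)
{
    int fastHash = (dataRecord.FamilyName ?? "").Length;  // fast but collision-prone

    // only a fast-hash hit pays for the reliable lookup
    if (fastSet.ContainsKey(fastHash)
        && slowDict.TryGetValue(dataRecordKey, out int existingId))
    {
        return existingId;  // true duplicate: skip the database insert
    }

    int newId = await dataRecordRepository.CreateDataRecordAsync(dataRecord);
    fastSet[fastHash] = true;
    slowDict[dataRecordKey] = newId;
    return newId;
}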
My current solution looks like this:
int dataRecordId = -1;
string dataRecordKey = dataRecord.MetaDataRecordId.ToString() + "|" + dataRecord.Profile + "|"
                     + dataRecord.Group + "|" + dataRecord.FirstName + "|"
                     + dataRecord.FamilyName + "|" + dataRecord.City;
int theHash = dataRecordKey.GetHashCode();  // note: two different keys can share this hash
if (dictDataRecord1.ContainsKey(theHash))
{
    dataRecordId = dictDataRecord1[theHash];
}
else
{
    dataRecordId = await dataRecordRepository.CreateDataRecordAsync(dataRecord);
    dictDataRecord1.Add(theHash, dataRecordId);
}
Now it is only a little faster: 2.75 seconds instead of 3.6 per data set.
@André Santos: You talk about two Dictionaries. Do you mean two Dictionary objects with the same content? That makes no sense to me. Or is the first Dictionary keyed with (dataRecordKey.Length, dataRecordId)?

Using C# to read from a text file

I am reading from a text file using the code below.
if (!allLines.Contains(":70"))
{
    // locate each field tag once...
    var firstIndex = allLines.IndexOf(":20");
    var secondIndex = allLines.IndexOf(":23B");
    var thirdIndex = allLines.IndexOf(":59");
    var fourthIndex = allLines.IndexOf(":71A");
    var fifthIndex = allLines.IndexOf(":72");
    var sixthIndex = allLines.IndexOf("-}");
    // ...then slice the text between consecutive tags
    var firstValue = allLines.Substring(firstIndex + 4, secondIndex - firstIndex - 5).TrimEnd();
    var secondValue = allLines.Substring(thirdIndex + 4, fourthIndex - thirdIndex - 5).TrimEnd();
    var thirdValue = allLines.Substring(fifthIndex + 4, sixthIndex - fifthIndex - 5).TrimEnd();
    var len1 = firstValue.Length;  // lengths kept for debugging
    var len2 = secondValue.Length;
    var len3 = thirdValue.Length;
    inflow103.REFERENCE = firstValue.TrimEnd();
    pointer = 1;
    inflow103.BENEFICIARY_CUSTOMER = secondValue;
    inflow103.RECEIVER_INFORMATION = thirdValue;
}
else  // the message contains a :70 field, which shifts the later tags
{
    var firstIndex = allLines.IndexOf(":20");
    var secondIndex = allLines.IndexOf(":23B");
    var thirdIndex = allLines.IndexOf(":59");
    var fourthIndex = allLines.IndexOf(":70");
    var fifthIndex = allLines.IndexOf(":71");
    var sixthIndex = allLines.IndexOf(":72");
    var seventhIndex = allLines.IndexOf("-}");
    var firstValue = allLines.Substring(firstIndex + 4, secondIndex - firstIndex - 5).TrimEnd();
    var secondValue = allLines.Substring(thirdIndex + 5, fourthIndex - thirdIndex - 5).TrimEnd();
    var thirdValue = allLines.Substring(sixthIndex + 4, seventhIndex - sixthIndex - 5).TrimEnd();
    var len1 = firstValue.Length;
    var len2 = secondValue.Length;
    var len3 = thirdValue.Length;
    inflow103.REFERENCE = firstValue.TrimEnd();
    pointer = 1;
    inflow103.BENEFICIARY_CUSTOMER = secondValue;
    inflow103.RECEIVER_INFORMATION = thirdValue;
}
Below is the format of the text file I am reading.
{1:F21DBLNNGLAAXXX4695300820}{4:{177:1405260906}{451:0}}{1:F01DBLNNGLAAXXX4695300820}{2:O1030859140526SBICNGLXAXXX74790400761405260900N}{3:{103:NGR}{108:AB8144573}{115:3323774}}{4:
:20:SBICNG958839-2
:23B:CRED
:23E:SDVA
:32A:140526NGN168000000,
:50K:IHS PLC
:53A:/3000025296
SBICNGLXXXX
:57A:/3000024426
DBLNNGLA
:59:/0040186345
SONORA CAPITAL AND INVSTMENT LTD
:71A:OUR
:72:/CODTYPTR/001
-}{5:{MAC:00000000}{PAC:00000000}{CHK:42D0D867739F}}{S:{SPD:}{SAC:}{FAC:}{COP:P}}
The format above represents one transaction in a single text file, but while testing with live files I came across a situation where a file can have more than one transaction. An example is the file below.
{1:F21DBLNNGLAAXXX4694300150}{4:{177:1405231923}{451:0}}{1:F01DBLNNGLAAXXX4694300150}{2:O1031656140523FCMBNGLAAXXX17087957771405231916N}{3:{103:NGR}{115:3322817}}{4:
:20:TRONGN3RDB16
:23B:CRED
:23E:SDVA
:26T:001
:32A:140523NGN1634150,00
:50K:/2206117013
SUNLEK INVESTMENT LTD
:53A:/3000024763
FCMBNGLA
:57A:/3000024426
DBLNNGLA
:59:/0022617678
GOLDEN DC INT'L LTD
:71A:OUR
:72:/CODTYPTR/001
//BNF/TRSF
-}{5:{MAC:00000000}{PAC:00000000}{CHK:C21000C4ECBA}{DLM:}}{S:{SPD:}{SAC:}{FAC:}{COP:P}}${1:F21DBLNNGLAAXXX4694300151}{4:{177:1405231923}{451:0}}{1:F01DBLNNGLAAXXX4694300151}{2:O1031656140523FCMBNGLAAXXX17087957781405231916N}{3:{103:NGR}{115:3322818}}{4:
:20:TRONGN3RDB17
:23B:CRED
:23E:SDVA
:26T:001
:32A:140523NGN450000,00
:50K:/2206117013
SUNLEK INVESTMENT LTD
:53A:/3000024763
FCMBNGLA
:57A:/3000024426
DBLNNGLA
:59:/0032501697
SUNSTEEL INDUSTRIES LTD
:71A:OUR
:72:/CODTYPTR/001
//BNF/TRSF
-}{5:{MAC:00000000}{PAC:00000000}{CHK:01C3B7B3CA53}{DLM:}}{S:{SPD:}{SAC:}{FAC:}{COP:P}}
My challenge is that my code locates each value by fixed indices within allLines. When I need to pick up the second transaction from a file, the same tags occur again, just as before. How can I manage this situation?
This is a simple problem obscured by excess code. All you are doing is extracting three values from a chunk of text whose precise layout can vary from one chunk to another.
There are three things I think you need to do.
Refactor the code. Instead of two hefty if blocks inline, write functions that extract the required text.
Use regular expressions. A single regular expression can extract each value in one line instead of several.
Separate the code from the data. The logic of these two blocks is identical; only the data changes. So write one function and pass in the regular expression(s) needed to extract the data items you need.
Unfortunately this calls for a significant lift in the abstraction level of the code, which may be beyond what you're ready for. However, if you can do this and (say) you have a function Extract() that takes regular expressions as arguments, you can apply that function once, twice, or as often as needed to handle variations in your basic transaction.
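A sketch of what Extract() could look like, assuming the field tags from the sample files; each pattern captures the text between a tag and the next line starting with ':' (or the closing '-}'):

using System.Text.RegularExpressions;

static string Extract(string allLines, string pattern)
{
    // Singleline lets '.' span multi-line fields such as :59:
    Match m = Regex.Match(allLines, pattern, RegexOptions.Singleline);
    return m.Success ? m.Groups[1].Value.Trim() : "";
}

// Usage, once per transaction:
string reference   = Extract(allLines, @":20:(.*?)\r?\n:");    // REFERENCE
string beneficiary = Extract(allLines, @":59:(.*?)\r?\n:");    // BENEFICIARY_CUSTOMER
string receiverInf = Extract(allLines, @":72:(.*?)\r?\n-\}");  // RECEIVER_INFORMATION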
You can perhaps use the code below to handle multiple records with your existing code:
// assuming fileText is all the text read from the text file
string[] fileData = fileText.Split('$');
foreach (string allLines in fileData)
{
    // your existing per-transaction code goes here
}
Maybe indexing works, but given the particular structure of the format, I highly doubt it is a good solution. Still, if it works for you, that's great. You can simply split on $ and then pass each substring into a method; that way the index for each substring starts at the beginning of the entry.
However, if you run into a situation where the indices are no longer static, then before you even start writing a parser for any format, you need to understand the format. If you don't have any documentation and are basically reverse-engineering it, that's what you have to do. Maybe someone else has specifications. Maybe the source of this data has them somewhere. But I will proceed under the assumption that none of this information is available and you have been given a task with absolutely no support and are expected to reverse-engineer it.
Any format that is meant to be parsed and written by a computer will, 9 times out of 10, be well-formed. I'd say 9.9 out of 10, since there are cases where people make things unnecessarily complex for the sake of "security".
When I look at your sample data, I see "chunks" of data enclosed within curly braces, as well as nested chunks.
For example, you have things like
{tag1:value1} // simple chunk
{tag2:{tag3: value3}{tag4:value4}} // nested chunk
Multiple transactions are apparently delimited by a $. You may be able to split on $ signs in this case, but again, you need to be sure that $ is a special character and doesn't appear in tags or values themselves.
Do not be fixated on what a "chunk" is or why I use the term. All you need to know is that there are "tags" and each tag comes with a particular "value".
The value can be anything: a primitive such as a string or number, or another chunk. This suggests you first need to figure out what type of value each tag accepts. For example, the 1 tag takes a string. The 4 tag takes multiple chunks, possibly representing different companies. There are chunks like DLM that have an empty value.
From these two samples, I would assume that you need to consume each chunk, check the tag, and then parse the value. Since chunks can nest, you likely need to store them in a way that preserves that nesting.
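A minimal sketch of that approach: read "{tag:value}" chunks, keeping a nested chunk's raw text as its value so the same routine can be applied to it again. It assumes braces only ever appear as chunk delimiters, which you would have to verify against real data:

using System.Collections.Generic;
using System.Text;

static List<(string Tag, string Value)> ParseChunks(string text, ref int pos)
{
    var chunks = new List<(string, string)>();
    while (pos < text.Length && text[pos] == '{')   // stops at anything else, e.g. the '$' separator
    {
        pos++;                                      // consume '{'
        int colon = text.IndexOf(':', pos);
        string tag = text.Substring(pos, colon - pos);
        pos = colon + 1;

        var value = new StringBuilder();
        int depth = 0;                              // track nested braces inside the value
        while (pos < text.Length && !(depth == 0 && text[pos] == '}'))
        {
            if (text[pos] == '{') depth++;
            if (text[pos] == '}') depth--;
            value.Append(text[pos++]);
        }
        pos++;                                      // consume the closing '}'
        chunks.Add((tag, value.ToString()));
    }
    return chunks;
}

// Usage: int pos = 0; var topLevel = ParseChunks(fileText, ref pos);
// A value that itself starts with '{' can be fed back into ParseChunks.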

Deserializing data: best practices

I have been given the task of deserializing some data. The data has all been munged into a string which is in the following format:
InternalNameA8ValueDisplay NameA¬InternalNameB8ValueDisplay NameB¬ etc etc.
(i.e. an internal name, then '8', then the value, then the display name, followed by '¬'). For example, you'd have FirstName8JoeFirst Name¬
I have no control over how this data is serialized, its legacy stuff.
I've thought of doing a bunch of splits on the string, or breaking it up into a char array and working through the text that way. But this just seems horrible; there is too much that could go wrong (e.g. the value of a phone number could begin with '8').
What I want to know is: what would your approach be? Is there anything cleverer I can do to break the data down?
note: '¬' isn't actually the character; it looks more like an arrow pointing left, but I'm away from my machine at the moment. Doh!
Thanks.
Instead of using splits, I would recommend a simple state machine. Walk over each character until you hit a delimiter; then you know you're on the next field. That takes care of issues like an '8' in a phone number.
NOTE - untested code ahead.
var records = new List<string[]>();  // one entry per record: [internal name, value + display name]
var current = new string[2];
var currentField = 0;
var line = "InternalNameA8ValueDisplay NameA¬InternalNameB8ValueDisplay NameB¬";
foreach (var c in line)
{
    if (c == '8' && currentField == 0)
    {
        currentField = 1; continue;  // the first '8' separates the internal name from the rest
    }
    if (c == '¬')
    {
        records.Add(current);        // '¬' closes the record; reset for the next one
        current = new string[2];
        currentField = 0;
        continue;
    }
    current[currentField] += c;      // an '8' in the value is now harmless, since currentField != 0
}
Dealing with wonky formats - always a good time!
Good luck,
Erick

StringBuilder vs. Lists

I am reading in multiple files with millions of lines, and I am creating a list of all line numbers that have a specific issue, for example a specific field being left blank or containing an invalid value.
So my question is: what would be the most efficient data type to keep track of a list of numbers that could run to upwards of a million rows? Would a StringBuilder, a List, or something else be more efficient?
My end goal is to output a message like "Specific field is blank on 1-32, 40, 45, 47, 49-51", etc. So in the StringBuilder case, I would check the previous value, and if it is only 1 more I would change "1" to "1-2"; if it were more than one, I would separate the entries with a comma. With the List, I would just add each number to the list and then combine them once the file has been completely read. However, in that case I could have multiple lists containing millions of numbers.
Here is the current code I am using to combine a list of numbers with a StringBuilder:
string currentLine = sbCurrentLineNumbers.ToString();
string currentLineSub;
int indexLastSpace = currentLine.LastIndexOf(' ');
int indexLastDash = currentLine.LastIndexOf('-');
int currentStringInt = 0;
if (sbCurrentLineNumbers.Length == 0)
{
    // first number overall
    sbCurrentLineNumbers.Append(lineCount);
}
else if (indexLastSpace == -1 && indexLastDash == -1)
{
    // exactly one number so far
    currentStringInt = Convert.ToInt32(currentLine);
    if (currentStringInt == lineCount - 1)
        sbCurrentLineNumbers.Append("-" + lineCount);
    else
    {
        sbCurrentLineNumbers.Append(", " + lineCount);
        commaCounter++;
    }
}
else if (indexLastSpace > indexLastDash)
{
    // last entry is a single number
    currentLineSub = currentLine.Substring(indexLastSpace);
    currentStringInt = Convert.ToInt32(currentLineSub);
    if (currentStringInt == lineCount - 1)
        sbCurrentLineNumbers.Append("-" + lineCount);
    else
    {
        sbCurrentLineNumbers.Append(", " + lineCount);
        commaCounter++;
    }
}
else
{
    // last entry is a range; extend it in place if the new number is consecutive.
    // (The original Replace(charOld, charNew) was a bug: it would replace every
    // occurrence of those digits in the string, not just the trailing number.)
    currentLineSub = currentLine.Substring(indexLastDash + 1);
    currentStringInt = Convert.ToInt32(currentLineSub);
    if (currentStringInt == lineCount - 1)
    {
        sbCurrentLineNumbers.Remove(indexLastDash + 1, currentLineSub.Length);
        sbCurrentLineNumbers.Append(lineCount);
    }
    else
    {
        sbCurrentLineNumbers.Append(", " + lineCount);
        commaCounter++;
    }
}
My end goal is to out put a message like "Specific field is blank on 1-32, 40, 45, 47, 49-51
If that's the end goal, there is no point in going through an intermediate representation such as a List<int>; just go with a StringBuilder. You will save on both memory and CPU that way.
StringBuilder serves your purpose, so stick with that; if you ever need the actual line numbers, you can easily change the code then.
It depends on how you can (or want to) break the code up.
Given that you are reading in line order, I'm not sure you need a list at all.
Your desired output implies that you can't output anything until the file is completely scanned. The size of the file suggests a one-pass analysis phase would be a good idea as well, given that you are going to use buffered input as opposed to reading the entire thing into memory.
I'd be tempted to use an enum to describe the issue, e.g. Field??? is blank, and then use that as the key to a dictionary of StringBuilders.
As a first thought, anyway.
Is your output supposed to be human readable? If so, you'll hit the limit of what is reasonable to read, long before you have any performance/memory issues from your data structure. Use whatever is easiest for you to work with.
If the output is supposed to be machine readable, then that output might suggest an appropriate data structure.
As others have pointed out, I would probably use StringBuilder. The List may have to resize (and copy) many times; the newer chunk-based implementation of StringBuilder does not.
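For what it's worth, a sketch of the single-pass range building these answers point toward, assuming the line numbers arrive in ascending order:

using System.Collections.Generic;
using System.Text;

static string BuildRanges(IEnumerable<int> lineNumbers)
{
    var sb = new StringBuilder();
    int? start = null, prev = null;

    void Flush()  // append the finished run as "a" or "a-b"
    {
        if (start == null) return;
        if (sb.Length > 0) sb.Append(", ");
        sb.Append(start == prev ? $"{start}" : $"{start}-{prev}");
    }

    foreach (int n in lineNumbers)
    {
        if (prev != null && n == prev + 1) { prev = n; continue; }  // extend the current run
        Flush();                                                    // close the previous run
        start = prev = n;                                           // start a new run
    }
    Flush();
    return sb.ToString();
}

// BuildRanges(new[] { 1, 2, 3, 40, 45, 49, 50, 51 }) -> "1-3, 40, 45, 49-51"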
