I have an old Paradox database (I can convert it to Access 2007) which contains more than 200,000 records. This database has two columns: the first one is named "Word" and the second one is named "Mean". It is a dictionary database and my client wants to convert this old database to ASP.NET and SQL.
However, we don't know what key or method is used to encrypt or encode the "Mean" column, which is in Unicode format. The software itself was written in Delphi 7 and we don't have the source code. My client only knows the credentials for logging in to the database. The problem is decoding the Mean column.
What I do have is the compiled Windows application and the Paradox database. This software can decode the "Mean" column for each "Word", so the method and/or key must be in its compiled code (.exe) or in one of the files in its directory.
For example, we know that in the following row "Zymurgy" means exactly "مبحث عمل تخمیر در شیمی علمی, تخمیر شناسی", since the application translates it that way. Here is what the record looks like when I open the database in Access:
Word Mean
Zymurgy 5OBnGguKPdDAd7L2lnvd9Lnf1mdd2zDBQRxngsCuirK5h91sVmy0kpRcue/+ql9ORmP99Mn/QZ4=
Therefore we're trying to discover how the value in the Mean column is converted to "مبحث عمل تخمیر در شیمی علمی, تخمیر شناسی". I think the "Mean" value in the above row is a Base64-encoded string, but simply decoding the Base64 does not produce the expected text.
The extensions for files in the win app directory are dll, CCC, DAT, exe (other than the main app file), SYS, FAM, MB, PX, TV, VAL.
Any kind of help is appreciated.
Here are two more examples; note that the double quotes at the start and end are not part of the strings:
word: "abdominal"
coded value: "vwtj0bmj7jdF9SS8sbrIalBoKMDvTbpraFgG4gP/G9GLx5iU/E98rQ=="
translation in Farsi: "شکمی, بطنی, وریدهای شکمی, ماهیان بطنی"
word: "cart"
coded value: "KHoCkDsIndb6OKjxVxsh+Ti+iA/ZqP9sz28e4/cQzMyLI+ToPbiLOaECWQ8XKXTz"
translation in Farsi: "ارابه, گاری, دوچرخه, چرخ, با گاری بردن"
Here are the results of decoding the Base64 string in different encodings (a sketch of the decoding follows the list):
1- in unicode the result is: "ᩧ訋퀽矀箖�柖�섰᱁艧껀늊螹泝汖銴岔也捆鹁"
2- in utf32 the result is: "��������������"
3- in utf7 the result is: "äàg\v=ÐÀw²ö{Ýô¹ßÖg]Û0ÁAgÀ®²¹ÝlVl´\\¹ïþª_NFcýôÉÿA"
4- in utf8 the result is: "��g\v�=��w���{����g]�0�Ag��������lVl���\\����_NFc����A�"
5- in 1256 the result is: "نàg\vٹ=ذہw²ِ–{فô¹كضg]غ0ءAg‚ہ®ٹ²¹‡فlVl´’”\\¹ïھ_NFcôةےA"
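In case anyone wants to reproduce these attempts, they boil down to something like the following C# sketch: decode the Base64 value to raw bytes, then re-interpret those bytes in each encoding. (On .NET Core the Windows-1256 code page additionally needs the System.Text.Encoding.CodePages provider to be registered.)

using System;
using System.Text;

class DecodeAttempts
{
    static void Main()
    {
        // Base64 value stored in the "Mean" column for "Zymurgy"
        byte[] raw = Convert.FromBase64String(
            "5OBnGguKPdDAd7L2lnvd9Lnf1mdd2zDBQRxngsCuirK5h91sVmy0kpRcue/+ql9ORmP99Mn/QZ4=");

        Console.WriteLine(Encoding.Unicode.GetString(raw));            // 1- UTF-16 LE ("unicode")
        Console.WriteLine(Encoding.UTF32.GetString(raw));              // 2- UTF-32
        Console.WriteLine(Encoding.UTF7.GetString(raw));               // 3- UTF-7
        Console.WriteLine(Encoding.UTF8.GetString(raw));               // 4- UTF-8
        Console.WriteLine(Encoding.GetEncoding(1256).GetString(raw));  // 5- Windows-1256 (Arabic)
    }
}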
I have also discovered that the Paradox database system is very complex when it comes to key management; most of the time the keys are "compound keys", which is part of why it is so problematic and why it has been abandoned.
UPDATE: I'm trying to automate this with AutoIt v3, because as I understand it the decryption can't be worked out in a day or two. Now I have another problem, which is related to text/fonts. When I copy the translated text into Notepad, it turns into unrecognizable text unless I change Notepad's font to the font used by the translation software. If I type something in Farsi into Notepad, it is shown correctly regardless of which font I have chosen. More interestingly, when I copy the text into another program such as MS Office Word, it is shown correctly no matter what font I choose.
So how can I get around this?
In this situation, I would think about writing a script/program to simply pull all the data out through the existing program.
You could write an application to send keypresses to the app which would select and copy each value in turn.
It would take a while to run, but you could just leave it overnight (how big is your database?) and it only has to run once.
Not sure how easy this would be, since I haven't seen this app of course - might this work?
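A very rough C# sketch of that idea, purely to illustrate the loop; the window title, key sequence, sleep time and record count are placeholders that would have to be adapted to the actual Delphi application (references to System.Windows.Forms and Microsoft.VisualBasic are required):

using System;
using System.Threading;
using System.Windows.Forms;

class DictionaryScraper
{
    [STAThread]   // clipboard access needs an STA thread
    static void Main()
    {
        // "Dictionary" is a placeholder for the real window title of the app.
        Microsoft.VisualBasic.Interaction.AppActivate("Dictionary");

        using (var writer = new System.IO.StreamWriter("dump.txt", append: true))
        {
            for (int i = 0; i < 200000; i++)       // roughly one pass per record
            {
                SendKeys.SendWait("{DOWN}");       // move to the next word (adapt to the UI)
                SendKeys.SendWait("^a^c");         // select and copy the translation
                Thread.Sleep(100);                 // give the app time to fill the clipboard
                writer.WriteLine(Clipboard.GetText());
            }
        }
    }
}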
Take a debugger like OllyDbg or SoftICE. Find the place where the Mean value is decoded/encoded, then step through the instructions one by one and check all the registers to find out what is being done. I have done this numerous times. That should help you get started, since you have the application that is able to decode this data, and you also have a reference word. That's all you need.
Also take into consideration: Unicode can be little or big endian, so you might try swapping the bytes. UTF-8 can be a pain, since some characters are stored as one byte and others as two or more bytes.
You can also try taking words that are almost identical in Farsi and comparing the outputs. That could lead to the reconstruction of a custom code page, if there is one.
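For the byte-order suggestion, a quick check could look like this (C#, just a sketch using the "abdominal" value from the question):

using System;
using System.Text;

class EndianCheck
{
    static void Main()
    {
        byte[] raw = Convert.FromBase64String(
            "vwtj0bmj7jdF9SS8sbrIalBoKMDvTbpraFgG4gP/G9GLx5iU/E98rQ==");

        Console.WriteLine(Encoding.Unicode.GetString(raw));           // UTF-16 little endian
        Console.WriteLine(Encoding.BigEndianUnicode.GetString(raw));  // UTF-16 big endian
    }
}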
Related
Alright, so I'm making a WPF application that has a loading window, and on that loading window I want the text to be random every time the app is started. Any idea how to do this?
Depending on what you mean by "Random" there could be multiple ways to do this:
Have a hardcoded list of loading texts in the application (or a resource such as an XML file or database) and read a random line from this store (see the sketch after this answer).
Do a WebApi call to fetch the loading text (You need to create/find relevant WebApi).
Want truly random, runtime-generated text? OK - generate a random number in the range −2,147,483,648 to 2,147,483,647, convert it to a UTF-32 character, and keep appending more characters and/or spaces until a minimum string length is reached. Use this as the loading text.
If you are looking to generate a random integer, look at this.
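A minimal sketch of the first option; the texts and class name are made up, and LoadingTextBlock stands for whatever element shows the text in your loading window:

using System;

static class LoadingTexts
{
    static readonly string[] Texts =
    {
        "Warming up...",
        "Loading modules...",
        "Almost there..."
    };

    static readonly Random Rng = new Random();

    // Returns a random entry from the hardcoded list.
    public static string Next()
    {
        return Texts[Rng.Next(Texts.Length)];
    }
}

Then, in the loading window's constructor, something like LoadingTextBlock.Text = LoadingTexts.Next(); picks a new text on every start.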
EDIT FOR FUTURE VIEWERS - Fixed by changing the character set to CP1252 in php.ini
So recently I took on the task of redoing an old website we created a number of years ago, updating it to be faster, more efficient and, most importantly, not on the WordPress back end.
We are almost done, but I have run into a snag. The old database was done in the CP1252 encoding format and when updating the code, we have converted to the UTF-8 standard. This has naturally caused a number of database entries to be formatted improperly and, with 42,000+ entries in one table alone, it's not super easy to re-enter all the data.
I worked with a developer to create a simple PHP script that checks whether an entry's 'id' is below a certain number and, if so, converts the old data to UTF-8 upon display.
Here's an example as it pulls an obituary:
function convert_charset($input) {
    // Convert a legacy CP1252-encoded string to UTF-8 for display
    return iconv('CP1252', 'UTF-8', $input);
}
…
if ($row["id"] > "42362") {
    // Newer entries are already stored as UTF-8
    return $row["obituary"];
} else {
    // Legacy entries still need the CP1252 -> UTF-8 fix-up
    return stripslashes(convert_charset($row["obituary"]));
}
This works perfectly. But now I have to convert the mobile site (of course the project lead doesn't want to do a responsive site. OF COURSE HE DOESN'T, THAT WOULD MAKE TOO MUCH SENSE), and it's written in ASP.NET, which I have no experience working with and don't even know where to start.
It is pulling the information as such:
HTML += "<a href='http://twitter.com/share?text=Obituary for " + firstName + "'>";
Can I just get it to load the PHP queries in the header and copy how I have been doing it in PHP over to ASP.NET? And if I can't, given that the information is now separated, is there a way I can convert it after a certain point?
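For what it's worth, the closest ASP.NET/C# equivalent of the iconv() call would be something along these lines; this is only a sketch and assumes you can get at the raw stored bytes of the old rows (the ConvertCharset name and the id cut-off mirror the PHP version, the rest is made up):

using System.Text;

static string ConvertCharset(byte[] cp1252Bytes)
{
    // Decode the legacy Windows-1252 bytes into a .NET string;
    // ASP.NET will then emit it as UTF-8 in the page output.
    return Encoding.GetEncoding(1252).GetString(cp1252Bytes);
}

// Hypothetical usage with the same id threshold as the PHP code:
// string obituary = id > 42362
//     ? storedText                      // already UTF-8
//     : ConvertCharset(storedBytes);    // legacy CP1252 row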
I am trying to read the data stored in an ICMT tag on a WAV file generated by a noise monitoring device.
The RIFF parsing code all seems to work fine, except for the fact that the ICMT tag seems to have data after the declared size. As luck would have it, it's the timestamp, which is the one absolutely critical piece of info for my application.
SYN is hex 16, which gives a size of 22, which is up to and including the NUL before the timestamp. The monitor documentation is no help; it says that the tag includes the time, but their example also has the same issue.
It is the last tag in the enclosing list, and the size of the list does include it - does that mean it doesn't need a chunk ID? I'm struggling to find decent RIFF docs, but I can't find anything that suggests that's the case; also I can't see how it'd be possible to determine that it was the last chunk and so know to read it with no chunk ID.
Alternatively, the ICMT comment chunk is the last thing in the file - is that a special case? Can I just get the time by reading everything from the end of the declared length ICMT to the end of the file and assume that will always work?
The current parser behaviour is that it reads this data after the channel / dB information as a chunk ID + size, and then complains that there is not enough data left in the file to fulfil the request.
No, it would still need its own ID. No, being the last thing in the file is no special case either. What you're showing here is malformed.
Your current parser errors out correctly, as the next thing to expect is again a 4-byte ID followed by 4 bytes for the length. The potential ID _10: is unknown and would be skipped, but interpreting 51:4 as a DWORD for the length is of course asking for trouble.
The device is the culprit. Do you have other INFO fields which use NULL bytes? If not, then I assume the device is naive enough to consider a NULL the end of a string, despite itself producing strings with multiple NULLs.
Since I have encountered countless files that do not stick to the standards, I can only say your parser is too naive as well: it knows how long the encapsulating list is and could therefore easily detect field lengths that no longer fit. It could ignore garbage like that, or, in your case, offer the very specific option "add to last field".
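A sketch of what such a defensive read of the INFO list might look like in C#; the surrounding stream handling and storage of the fields are assumed:

using System.IO;
using System.Text;

static class InfoListReader
{
    public static void Read(BinaryReader r, long listEnd)
    {
        while (r.BaseStream.Position + 8 <= listEnd)
        {
            string id = Encoding.ASCII.GetString(r.ReadBytes(4));
            uint size = r.ReadUInt32();

            if (size > listEnd - r.BaseStream.Position)
            {
                // Declared size no longer fits in the enclosing list: the header is
                // garbage (as with the stray timestamp here). Ignore the rest, or
                // append the remaining bytes to the previously read field instead.
                break;
            }

            byte[] data = r.ReadBytes((int)size);
            if ((size & 1) == 1) r.ReadByte();   // chunks are word-aligned: skip the pad byte

            // ... store id/data ...
        }
    }
}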
I've Googled and read quite a bit about QR codes and the maximum data that can be stored based on the various settings, all of it presented in tabular format. I can't seem to find anything giving a formula or a proper explanation of how these values are calculated.
What I would like to do is this:
Present the user with a form, allowing them to choose Format, EC & Version.
Then they can type in some data and generate a QR code.
Done deal. That part is easy.
The addition I would like to include is a "remaining character count" so that they (the user) can see how much more data they can type in, as well as what effect the properties have on the storage capacity of the QR code.
Does anyone know where I can find the formula(s)? Or do I need to purchase ISO 18004:2006?
A formula to calculate the amount of data you could put in a QR code would be quite complex to devise, not to mention that it would need some approximations for the calculation to be possible. The formula would have to calculate the number of modules dedicated to the data in your QR code based on its version, and then calculate how many codewords (which are sets of 8 modules) will be used for the error correction.
To calculate the number of modules that will be used for the data, you need to know how many modules will be used for the function patterns. While this is not a problem for the three finder patterns, the timing patterns or the version/format information, there is a problem with the alignment patterns, as their number depends on the QR code's version, meaning you would have to use a table at that point anyway.
For the second part, I have to say I don't know how to calculate the number of error-correcting codewords based on the correction capacity. For some reason, more error-correcting codewords are used than would be needed to match the stated error correction capacity; for example, a 6-H QR code can correct up to 32.6% of the data, instead of the 30% set by the H correction level.
In any case, as you can see, a formula would be quite complex to implement. Using a table, as already suggested, is probably the best thing you can do.
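For example, the table-driven "remaining characters" counter could look something like this C# sketch; the capacities listed are the standard byte-mode figures for version 1, and the remaining versions and modes would be filled in from the published capacity tables:

using System.Collections.Generic;
using System.Text;

enum EcLevel { L, M, Q, H }

static class QrCapacity
{
    // Byte-mode data capacity (in characters) for a version 1 symbol;
    // extend with the remaining versions from the capacity tables.
    static readonly Dictionary<EcLevel, int> Version1Bytes = new Dictionary<EcLevel, int>
    {
        { EcLevel.L, 17 },
        { EcLevel.M, 14 },
        { EcLevel.Q, 11 },
        { EcLevel.H, 7 }
    };

    public static int RemainingChars(EcLevel ec, string text)
    {
        // In byte mode the text is stored as bytes, so count encoded bytes, not chars.
        int used = Encoding.UTF8.GetByteCount(text);
        return Version1Bytes[ec] - used;
    }
}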
I wrote the original AIM specification for QR Code back in the '90s for Denso Corporation, and was also project editor for both editions of the ISO/IEC 18004 standard. It was felt to be much easier for people producing code printing software to use a look-up table rather than calculate capacities from a formula - no easy job as there are several independent variables that have to be taken into account iteratively when parsing the text to be encoded to minimise its length in bits, in order to achieve the smallest symbol. The most crucial factor is the mix of characters in the data, the sequence and lengths of sub-strings of numeric, alphanumeric, Kanji data, with the overhead needed to signal each change of character set, then the required level of error correction. I did produce a guidance section for this which is contained in the ISO standard.
The storage is determined by the QR mode and the version/type that you are using. More specifically, the calculation is based on how 'compressible' the characters are and what algorithm the QR generator is allowed to use on the content present.
More information can be found at http://en.wikipedia.org/wiki/QR_code#Storage
In my application I use WpfLocalization to provide translations while the application is running. The library will basically maintain a list of properties and their assigned localization keywords and use DependencyObject.SetValue() to update their values when the active language is changed.
The scenario in which I noticed my problem is this: I have a simple TextBlock and have assigned a localization keyword for its Text property. Now when my application starts, it will write the initial value into it and it will display just fine on screen. Now I switch the language and the new value is set as the Text property but only half the text will actually display on screen. Switching the languages back and forth does not have any effect. The first language is always displayed fine, the second is cut off (in the middle of words, but always full characters).
The relative length of the two languages to each other does not seem to have anything to do with it. In my test case the working language string is 498 bytes, while the one that gets cut off is 439 bytes and is truncated after 257 bytes.
When I inspect the current value of the Text property of said TextBlock right before I change its value through the localization code, it will always have the expected value (not cut off) in either language.
When inspecting the TextBlock at runtime through WPF Inspector it will display the cut off text as the Text property in the second language.
This makes no sense to me at all thus far. But now it gets better.
The original WpfLocalization library reads the localized strings from standard resource files, but we use a modified version that can also read those string from an Excel file. It does that by opening an OleDbConnection using the Microsoft OLE DB driver and reading the strings through that. In the debugger I can see that all the values are read just fine.
Now I was really surprised when a colleague found the fix for the "cut off text" issue. He re-ordered the rows in the Excel sheet. I don't see how that could be relevant, but switching between the two versions of that file has an impact on the issue.
That does actually make sense: the OLE DB driver for Excel has to sample the data in a column to assign it a type and, in the case of strings, also a length. If it only samples values below the 255-character threshold, you will get a string(255) type and truncated text; if it has sampled a longer string, it will assign the column as a memo column and allow longer strings to be retrieved/stored. By re-ordering, you are changing which rows are sampled.
If you read about transferring data from SQL Server to Excel using OLE DB, you will find this is a known issue: http://msdn.microsoft.com/en-us/library/ms141683.aspx - since you are using the same OLE DB driver, I would expect the situation to apply to you as well.
From the docs:
Truncated text. When the driver determines that an Excel column contains text data, the driver selects the data type (string or memo) based on the longest value that it samples. If the driver does not discover any values longer than 255 characters in the rows that it samples, it treats the column as a 255-character string column instead of a memo column. Therefore, values longer than 255 characters may be truncated. To import data from a memo column without truncation, you must make sure that the memo column in at least one of the sampled rows contains a value longer than 255 characters, or you must increase the number of rows sampled by the driver to include such a row. You can increase the number of rows sampled by increasing the value of TypeGuessRows under the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel registry key. For more information, see PRB: Transfer of Data from Jet 4.0 OLEDB Source Fails w/ Error.
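Applied to your localization loader, that means either making sure one of the first rows in the Excel sheet contains a string longer than 255 characters, or raising TypeGuessRows under that registry key (0 tells the driver to sample all rows, up to an internal limit). For reference, a connection along these lines is presumably what the modified library opens; the file, sheet and column names below are placeholders:

using System.Data.OleDb;

class LocalizationSheetReader
{
    static void Main()
    {
        // Jet driver for .xls files; adapt the path and sheet/column names.
        string connStr =
            "Provider=Microsoft.Jet.OLEDB.4.0;" +
            "Data Source=Translations.xls;" +
            "Extended Properties=\"Excel 8.0;HDR=YES\"";

        using (var conn = new OleDbConnection(connStr))
        using (var cmd = new OleDbCommand("SELECT [Key], [Text] FROM [Sheet1$]", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Strings longer than 255 characters only survive if the column
                    // was sampled as a memo column (see the quoted documentation).
                    string key = reader.GetString(0);
                    string value = reader.GetString(1);
                    // ...
                }
            }
        }
    }
}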