Store huge data in metro app [closed] - c#

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I would like to store a list of world cities, since a Metro app can't have a local database, but I am not sure it is possible (I've found a text file with more than 3 million cities).
I wonder how they did it in the Weather app. Since there is no latency in the suggested results (in the search charm or on the "favorite places" screen), I don't think they use a web service, and I also want my app to be able to propose a list of cities even if no connection is available.
Any ideas?

Simply stick it in a text file, one city per line. This is not a large amount of data; you can likely hold it all in RAM in one go.
Using a database just for this one list seems a little overkill.
With some rough calculations, assuming names of around 20 characters each, that's in the region of 100 MB of city data. Not insignificant for one list in memory, granted, but not a lot to have to contend with.
You may even be able to use something like a Linq to Text provider.
How they may have done it in the charm is to only worry about a few cities: the favourites and whatever your location service last reported. Handling fewer than 10 is easier than handling 3 million.
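The question is about C#, but the load-everything-and-filter idea is language-agnostic; here it is sketched in Python (the one-name-per-line file format and the `load_cities`/`suggest` helpers are illustrative assumptions):

```python
# Sketch: read every city name into memory once, then answer
# prefix queries with a simple linear scan.
def load_cities(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def suggest(cities, prefix, limit=10):
    prefix = prefix.lower()
    return [c for c in cities if c.lower().startswith(prefix)][:limit]

cities = ["Paris", "Parma", "Prague", "Porto"]   # stand-in for the real file
print(suggest(cities, "Pa"))  # ['Paris', 'Parma']
```

In C# this would be a `List<string>` plus a LINQ `Where` clause; keeping the list sorted and binary-searching the prefix would cut lookup time further.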

3 million cities might sound like a lot. But is it a lot for your Metro app?
These are very rough estimates. Let's use an average city name length of 20 Unicode characters:
20 characters × 3 million cities = 60 million characters
60 million characters × 2 bytes per UTF-16 character = 120 million bytes
120 million bytes / 1024 = 117,187.5 kilobytes
117,187.5 kilobytes / 1024 ≈ 114 megabytes
~115 MB isn't exactly 'small', but depending on your other requirements you can probably handle loading that much into memory. You can use whatever .NET objects you'd typically use, like a List<string>, and use LINQ to get the matching cities.
That's not to say this is your only option. It's just probably a viable one. There is a lot of very clever stuff you could do to avoid pulling all of it into memory at once; but if you want to eliminate/minimize lag - that's going to be your best bet.
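For what it's worth, the estimate above checks out; here is the same back-of-envelope arithmetic re-run (illustrative Python):

```python
# Re-run the rough memory estimate from the answer.
cities = 3_000_000
chars_per_name = 20      # assumed average name length
bytes_per_char = 2       # UTF-16, as .NET strings use
total_bytes = cities * chars_per_name * bytes_per_char
total_mb = total_bytes / 1024 / 1024
print(round(total_mb, 1))  # 114.4
```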

I would suggest looking into SQLite if you need a database. Metro applications obviously do not have access to SQL Server and other Win32-based DBMSs, but SQLite works as a lightweight alternative.
Try this: https://github.com/doo/SQLite3-WinRT

We recommend using an SQLite database with LinqConnect, Devart's LINQ to SQL-compatible solution that supports SQLite. You can employ LINQ and ADO.NET interfaces with our product. Starting from version 4.0, LinqConnect supports Windows Metro applications: http://blogs.devart.com/dotconnect/linqconnect-for-metro-quick-start-guide.html.

Related

Is there a better permalink solution [closed]

Closed 10 years ago.
I am developing a website in C# and ASP.NET MVC where people can manage their own web pages. At the moment I am using the permalink solution of StackOverflow but I am not sure if this will work in my situation because people will add and delete pages constantly. This means that the id in the pages table will grow very large.
Example: mydomain.com/page/17745288223/my-page-title
Is there a better solution?
I think that for your case (users creating pages) it's actually more user friendly to put all pages created by a single user under his/her own path i.e.:
mydomain.com/page/{username/nickname/some-name-selected-by-user}/my-page-title
If you don't want to use such a format, an int or long in the URL will probably do.
Well, you could use some kind of hash to make lookups more efficient. You could, for instance, compute a SHA-1 hash of the page title, creation date, user information, etc., just like Git does for commit ids.
Or you could use simple numbers, but convert them into some compact representation using hexadecimal or alphanumeric characters, like some URL-shortening services do.
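As a sketch of that second idea (the alphabet and helper names are made up; many URL shorteners use base 62 like this):

```python
# Sketch: re-encode a numeric page id in base 62, the way many
# URL shorteners compress their ids.
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode(n):
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

def decode(s):
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n

# The 11-digit id from the question shrinks to 6 characters.
print(encode(17745288223))
```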
Though this started as a comment, I decided it was growing larger, so here it is again:
The page id solution seems just fine.
What are you worried about? If you are expecting a few million pages, that's 7 characters. If you are expecting more than a few billion pages, that's 9-10 characters. Pretty manageable, I think.
You could also represent it as hex and reduce it to a maximum of 8 characters to fit up to 2^32 different ids.
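A quick check of those digit counts (illustrative Python):

```python
# Decimal vs hexadecimal length for the largest 32-bit id.
n = 2**32 - 1
print(len(str(n)))          # 10 decimal digits
print(len(format(n, "x")))  # 8 hex digits
```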
This means that the id in the pages table will grow very large.
What's the problem with that?
The largest value for an int is also very large (just over 2 billion) so I doubt it will hit any limit unless you are planning to have millions of users with thousands of pages each.
If you are still worried then you can use a long (64-bit integer). It can handle trillions of users with millions of pages each. Note that the population of the Earth is only a few billion.
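The limits this answer refers to, spelled out (C# `int` is 32-bit signed and `long` is 64-bit signed; the workload numbers are illustrative):

```python
# Capacity check for 32-bit and 64-bit signed ids.
int_max = 2**31 - 1    # 2,147,483,647: "just over 2 billion"
long_max = 2**63 - 1   # about 9.2 * 10**18
trillion_users_million_pages = 10**12 * 10**6   # 10**18 ids needed
print(trillion_users_million_pages < long_max)  # True
```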

Designing a database file format [closed]

Closed 10 years ago.
I would like to design my own database engine, for educational purposes for the time being. Designing a binary file format is neither hard nor the question (I've done it in the past), but while designing a database file format I have come across a very important question:
How to handle the deletion of an item?
So far, I've thought of the following options:
Each item will have a "deleted" bit which is set to 1 upon deletion.
Pro: relatively fast.
Con: potentially sensitive data will remain in the file.
0x00 out the whole item upon deletion.
Pro: potentially sensitive data will be removed from the file.
Con: relatively slow.
Recreating the whole database.
Pro: no empty blocks, which makes the follow-up question moot.
Con: it's a really good idea to overwrite the whole 4 GB database file because a user corrected a typo. I will sell this method to Twitter ASAP!
Now let's say you already have a few empty blocks in your database (deleted items). The follow-up question is how to handle the insertion of a new item?
Append the item to the end of the file.
Pro: fastest possible.
Con: file will get huge because of all the empty blocks that remain because deleted items aren't actually deleted.
Search for an empty block exactly the size of the one you're inserting.
Pro: may get rid of some blocks.
Con: you may end up scanning the whole file at each insert only to find out it's very unlikely to come across a perfectly fitting empty block.
Find the first empty block which is equal or larger than the item you're inserting.
Pro: you probably won't end up scanning the whole file, as you will find an empty block somewhere mid-way; this will keep the file size relatively low.
Con: there will still be lots of leftover 0x00 bytes at the end of items which were inserted into bigger empty blocks than they are.
Right now, I think the first deletion method and the last insertion method are probably the "best" mix, but they would still have their own small issues. Alternatively, the first insertion method combined with scheduled full database recreation. (Probably not a good idea when working with really large databases. Also, with that method each small update will clone the whole item to the end of the file, accelerating file growth at a potentially insane rate.)
Unless there is a way of deleting/inserting blocks from/to the middle of the file in a file-system approved way, what's the best way to do this? More importantly, how do databases currently used in production usually handle this?
The engines you name are very different, and your engine seems to have little in common with them; it sounds similar to the good old dBase format.
For deletion, the idea with the bit is good; make overwriting deleted items with 0x00 configurable.
For insertion, you should keep a list of free blocks with their respective sizes. This list gets updated when you delete an item, and when you grow or shrink the file. This way you can determine very quickly how to handle an insertion.
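A minimal sketch of that free-list idea, using first fit and keeping the leftover tail as a smaller hole (the class and method names are made up):

```python
# Track (offset, size) of deleted blocks; reuse the first hole that fits.
class FreeList:
    def __init__(self):
        self.blocks = []                  # list of (offset, size) holes

    def free(self, offset, size):
        self.blocks.append((offset, size))

    def allocate(self, size, end_of_file):
        for i, (off, sz) in enumerate(self.blocks):
            if sz >= size:                # first fit
                del self.blocks[i]
                if sz > size:             # keep the tail as a smaller hole
                    self.blocks.append((off + size, sz - size))
                return off
        return end_of_file                # no hole fits: append to the file

fl = FreeList()
fl.free(100, 64)                  # a 64-byte item was deleted at offset 100
print(fl.allocate(40, 1000))      # 100: reuses the hole, leaves 24 bytes free
print(fl.allocate(40, 1000))      # 1000: the 24-byte tail is too small
```

A real engine would also merge adjacent holes and persist this list in the file header so it survives restarts.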
Why not start by looking at how existing systems work? If this is for your own education that will benefit you more in the long run.
Look at the tried and true B-Tree/B+Tree for starters. Then look at some others like Fractal Tree indexes, SSTables, Hash Tables, Merge Tables, etc.
Start by understanding how a 'database' stores and indexes data. There are great open source and documented examples of this both in the NoSQL space as well as the more traditional RDBMS world. Take apart something that exists, understand it, modify it, improve it.
I've been down this road, though not for educational purposes. The .NET space lacked any thread-safe B+Tree that was disk-based, so I wrote one. You can read some about it on my blog at http://csharptest.net/projects/bplustree/ or go download the source and take it apart: http://code.google.com/p/csharptest-net/downloads/list
There are open-source databases; why don't you look at them first? The MySQL source code can be a good start. You can download the source and dig into it.
Also, you can start investigating the data structures being used by databases, then look at persistence strategies and so forth.

How to test 500 Trillion combinations in less than 6 hours of execution time [closed]

Closed 11 years ago.
I have a PHP script looping through combinations of a set of arrays. I can test 6.1 billion of the 500 trillion total combinations in 1 hour with a simple PHP script. Is it possible to write a program, in any language, running on today's average PC, that would be able to test all 500 trillion combinations of multiple arrays in less than ~6 hours?
Also, I do not have the resources to use distributed or cluster computing for this task. What kind of gains could I expect from converting the code to multithreaded Java/C#?
Thank you
Let's start simple: do you use threading? If not, a modern higher-end Intel today has 12 hardware threads per processor, so you get a factor of up to 12 from threading alone.
If you get a server specifically for this, you could easily have 24-32 hardware threads for relatively low cost.
If the arrays are semi-static and you assume a decent graphics card, you may find that having 800 to 3,000 processor cores is a huge time saver. Nothing beats this, and even average machines have quite some parallel capability in their CPUs or graphics cards these days.
500 trillion comparisons in 6 hours
= 83.3 trillion comparisons in 1 hour
= 1.4 trillion comparisons per minute
= 23.1 billion comparisons per second
Assuming you've got an Intel Core i7-2600 CPU (3.4 GHz), which is 4 cores + hyperthreading = 8 logical cores, and generously crediting it with about 6 cores' worth of real throughput at one comparison per clock cycle, you'd need a per-core speed of
23.1 / 6 ≈ 3.9 GHz
which is at the extreme end of possibility for basic overclocking.
Once you factor in other overhead, what you want is not possible: your CPU can't spend every cycle doing nothing but comparisons.
If you don't have the resources then I'm afraid to say, with the numbers you want, you are buggered.
You'll need to rethink your data structures and the algorithms working on them to have any chance of completing your puzzle within the time limit - using PHP or any other language.
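Re-running that arithmetic (illustrative; the divisor of 6 matches the answer's 23.1/6 step, i.e. roughly 6 effective cores):

```python
# The throughput chain from the answer, checked.
total = 500e12                       # 500 trillion comparisons
per_second = total / (6 * 3600)      # in a 6-hour budget
print(per_second / 1e9)              # about 23.1 billion per second
per_core_ghz = per_second / 6 / 1e9  # one comparison per cycle, ~6 cores
print(round(per_core_ghz, 1))        # 3.9
```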
I know nothing about the process you want to run, and maybe there is no way to achieve your goals with your current resources, but since you are asking for a language, and it is true that PHP is not the best one for tackling parallelism, I should say that Erlang is famous for that kind of workload.

isn't number localization just unnecessary? [closed]

Closed 11 years ago.
I've just read this page http://weblogs.asp.net/scottgu/archive/2010/06/10/jquery-globalization-plugin-from-microsoft.aspx
One of the things they did was to convert dates to the Arabic calendar for Arabic locales. I'm wondering if it is a good idea at all to do so. Will it actually be annoying/confusing for the user (even if the user is Arab)?
Also, my second question: do we really need to change 3,899.99 to 3.899,99 for cultures like German? I mean, it doesn't hurt to do so, since the library already does it for us, but wouldn't this actually cause more confusion for the user (even if he is German)?
I'm sure that whatever culture people come from, if I give them the number 3,899.99 there's no way they'd get it wrong, right? (They've probably learned the "universal" format anyway.)
Your problem here seems to be a bad assumption. There is no "universal format" for numbers. 3,899.99 is valid in some places, and confusing in others. Same for the converse. People can often figure out what they need to (especially if it's in software that is clearly doing a shoddy job of localization otherwise. :) ), but that's not the point.
Except in certain scientific and technical domains that general software doesn't usually address, there's no universal format for any of these things. If you want your software to be accepted on native terms anywhere but your own place, you'll need to work for it.
To me it seems like it would be much less confusing to see dates and numbers in the format you're used to (in your country or language) - why do you think it would be the other way around?
The point of localization is to make your application look more natural for the user. It is definitely advisable to do this in your application if you use it internationally. While you can use US standards, that is not very customer-friendly way of doing things.
How would it be more confusing to a person to see the format they are familiar with? Meet people where they are with your application. If their standard is 10.000,00 and you are showing them 10,000.00, even if they understand it, it does make it a bit disconcerting. Reverse the situation and think what you would like. Would you like a developer using 10.000,00 for their application because you can understand it just fine?
Depends. 3.899,99 to me looks like two numbers: 3.899 and 99. I imagine our number formatting looks similarly funny to foreigners. Sure, I could guess what it means here, but what if you had a whole bunch of numbers like this clustered together? The winning lotto numbers are 45,26,21,56,94,13. Is that one big number, or six 2-digit numbers?
Date formatting is especially important. 01/02/03: is that Jan 2 2003, Feb 1 2003, Feb 3 2001, or something else? Different cultures order day, month, and year differently. Also, when spelled out, months obviously have different names in different languages.
If you have the time and resources to internationalize it, I think you should.
As a foreigner myself, I can assure you that localization helps a lot in terms of user satisfaction. Commas or dots in numbers may induce big mistakes. Another one is the relative position of days and months.
To improve even further, create translations and add an option to choose the locale. That way you will get close to 100% customer satisfaction.
Another important thing is input. If you don't have localization, take the user input "1.234": what does the user mean, 1.234 or 1234? There may be users who don't like their values being off by a factor of 1000... who knows? ;)
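The input ambiguity is easy to demonstrate (naive converters, purely illustrative; real code should use the platform's locale-aware parsing, e.g. `CultureInfo` in .NET):

```python
# "1.234" parses to different numbers under different conventions.
def parse_en(s):   # "," groups thousands, "." is the decimal separator
    return float(s.replace(",", ""))

def parse_de(s):   # "." groups thousands, "," is the decimal separator
    return float(s.replace(".", "").replace(",", "."))

print(parse_en("1.234"))  # 1.234
print(parse_de("1.234"))  # 1234.0, off by a factor of 1000
```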

Piece of code that can kill computer performance [closed]

Closed 10 years ago.
I'm searching for C# code that can kill computer performance (CPU performance, and maybe CPU-to-memory bandwidth too) as much as possible. It will run on a 4-core box, so I'm going to create 4 threads and run them simultaneously.
Should it work on int/double/numeric data types? Should it have some crazy data structures? (It should not take too much memory.)
Do you have any suggestions ?
Calculate PI using all processors.
You could use parallel Linq to generate a Mandelbrot (Jon Skeet has the code readily available).
Have a program that writes copies of its executable to the drive multiple times for each thread. Have each of these copies of the program then triggered by the program. :)
If you want to kill a machine's performance, try hitting the disk, because IO interrupts tend to affect everything even on a good CPU scheduler. Something like enumerating a directory of many little files, or writing a lot of big files to disk would do the trick.
Why re-invent the wheel? Use existing Load Testing software.
Calculate a long sequence of prime numbers. The following link contains code that can be modified to do this:
Program to find prime numbers
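A deliberately naive version of that idea (illustrative Python; run one instance per core to load the whole CPU):

```python
# Trial-division prime count: pure CPU work with a tiny memory footprint.
def count_primes(limit):
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

print(count_primes(10_000))  # 1229
```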
Call Bitmap.GetPixel, in a loop, in an image processing application.
I would say: a naive (brute-force) travelling salesman implementation:
(from wikipedia):
The Travelling Salesman Problem (TSP) is an NP-hard problem in combinatorial optimization studied in operations research and theoretical computer science. Given a list of cities and their pairwise distances, the task is to find a shortest possible tour that visits each city exactly once.
Brute-force solving of N-Queens (see Wikipedia) for, say, 64 queens, because a simple loop like this can be optimized away (sometimes only after it has already been running for a few minutes):
while(true) {
    i++;
}
You could also try to brute-force a message encrypted with a large key, say 2048 bits. That's a killer.
An open-source, multithreaded 3D modeling program rendering an extremely complex lighted scene will pound the strongest system into submission.
Okay, how about some infinite recursion in the spirit of StackOverflow?
void deathToAllRobots(int someMeaninglessValue) {
    deathToAllRobots(someMeaninglessValue+1);
}
int *x;
while(1)
{
    x = new int[10];
}
