Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
My system will save ~20-40 million image files
Each file is 150-300KB
My application will run on Windows Server 2012 R2, and the files will be saved on storage (I don't know which kind yet)
My application is written in C#
My requirements are:
- The system will constantly delete old files and save new files (around 100K files per day)
- The most recent images will be automatically displayed to users in web and WPF applications
- I need fast access to recent files (last week) for report purposes
What is the best practice for storing / organizing this amount of files?
Broad question much? If you're asking how to organize them for efficient access, that's a bit harder to answer without knowing why you're storing that many files.
Let me explain:
Let's say you're storing a ton of log files. Odds are your users are going to be most interested in the logs from the last week or so, so storing your data on disk in a way that lets you easily access the files by day (e.g. yyyy-mm-dd.log) will speed up access to a specific day's log.
Now instead think of it like a phone book where you're looking up people's names. Storing entries by the time you inserted them really isn't going to help you get to the result you want quickly; you'd be better off sorting by name.
Essentially, look at how your data will be accessed and organize it logically, so that you can run a binary search (or better) over it.
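For the date-driven access described above, a minimal C# sketch of date-based folder partitioning might look like the following; the yyyy\MM\dd layout, the .jpg extension, and the GUID file names are illustrative assumptions, not a fixed standard.

using System;
using System.Collections.Generic;
using System.IO;

class ImageStore
{
    private readonly string _root;

    public ImageStore(string root) => _root = root;

    // Save under a per-day folder so "last week" lookups only touch seven directories.
    public string Save(byte[] imageBytes, DateTime takenAtUtc)
    {
        string dir = Path.Combine(_root,
            takenAtUtc.ToString("yyyy"), takenAtUtc.ToString("MM"), takenAtUtc.ToString("dd"));
        Directory.CreateDirectory(dir); // no-op if the folder already exists

        string path = Path.Combine(dir, Guid.NewGuid().ToString("N") + ".jpg");
        File.WriteAllBytes(path, imageBytes);
        return path;
    }

    // Enumerate one day's images without scanning the whole store.
    public IEnumerable<string> GetDay(DateTime dayUtc)
    {
        string dir = Path.Combine(_root,
            dayUtc.ToString("yyyy"), dayUtc.ToString("MM"), dayUtc.ToString("dd"));
        return Directory.Exists(dir)
            ? Directory.EnumerateFiles(dir, "*.jpg")
            : Array.Empty<string>();
    }
}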
I'd highly recommend rewording your question so it is clearer though.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 months ago.
I am developing a lightweight web app in ASP.NET with C# and I would like to store some data, such as site analytics.
Instead of using a SQL Server database, could I use a JSON file instead? I would open the JSON file and write to it when done. Could there be problems with simultaneous connections?
Is this standard/ common practice?
Any help will be highly appreciated.
It is probably a VERY bad idea. I would only do this if it was some text file to display a wee bit of information.
If you need updating? Then what happens if two users are on the site? The last person to update will overwrite the text-based JSON file.
I suppose you could write out an "id" and a JSON file for each record you edit, so that saving one record's value would not overwrite the others. But if you put more than one record in a JSON file and users are able to edit those values, then no, it will not work, since a simple save of that text JSON file will overwrite any changes made by any other user who happens to hit the same button.
It is VERY hard to find web site hosting, even the super cheap kind at less than $10 per month, that does not include a database server. And be it MySQL, SQL Server, PostgreSQL or others, they all have free versions you can use.
And I suppose you could consider file-based SQLite. That would be even better. While it is not considered thread safe, it can work if you only have a few users, say 2-4, working at the same time.
Because you have OH SO MANY choices here, the only reason to use a text-based JSON file is if no other option exists - and boatloads of options exist in nearly all cases.
And if some database were not available? Then I would simply include SQLite in the project and use that.
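If you do go the SQLite route, a minimal sketch might look like this; it assumes the Microsoft.Data.Sqlite NuGet package and a hypothetical PageViews table, so the schema and file name are placeholders rather than a recommended design.

using System;
using Microsoft.Data.Sqlite;

using var conn = new SqliteConnection("Data Source=analytics.db");
conn.Open();

// Create the table once; SQLite handles the locking that a hand-rolled JSON file would not.
var create = conn.CreateCommand();
create.CommandText =
    "CREATE TABLE IF NOT EXISTS PageViews (Path TEXT NOT NULL, ViewedAt TEXT NOT NULL)";
create.ExecuteNonQuery();

// Record one page view.
var insert = conn.CreateCommand();
insert.CommandText = "INSERT INTO PageViews (Path, ViewedAt) VALUES ($path, $at)";
insert.Parameters.AddWithValue("$path", "/home");
insert.Parameters.AddWithValue("$at", DateTime.UtcNow.ToString("o"));
insert.ExecuteNonQuery();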
I'm not super experienced, but I would like to explain why I would never do that.
First of all, anything (if you are brave enough) can be a database, as long as it is some kind of file that gives persistence to your data. Databases are basically single-purpose software optimized for storing data. The main problem with your solution is that you would need to read the file, load it into memory as an object, write data to it like you would with a static factory object, and then serialise it back to JSON when you are done. Personally I don't like this idea because I think it is very prone to human error (like deleting the file by accident during maintenance), and it can get harder to debug once it accumulates a sizeable chunk of data. There are very lightweight data-persistence solutions built on SQLite, which is a database for small applications, such as PocketBase. Since you are already developing a backend, it would take you next to no effort to add a little table to store the analytics.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I am trying to store a software "Calendar" (complex logs that are saved and browsed by date).
I tried using both .ini and XML, but when the application read the entire file to find the info for 1 specific day out of roughly 100 days, it took almost 9 seconds to get 5 variables out of 500. The actual file might eventually hold more than 40 variables per day.
Also, I would rather not make a file for each day; that seems a little unprofessional and messy.
I am asking to find out whether there is an alternative that keeps things fast and neat. The data includes different types of variables in varying amounts. I know I am kind of overdoing it with the logging, but the program needs the logs to do its work.
If the data must be stored, it has to be a file or a database (local or remote). I'd go for SQLite: it ends up in a single file, but you can query the data with SELECT, JOIN, etc.
EDIT:
You can use SQLite3 from C# if you include this package:
https://www.nuget.org/packages/System.Data.SQLite/
You'll need to learn some SQL, but after that you'll just use something like:
select Message from Logs where Date > '2015-11-01' and Date < '2015-11-25';
which is easier, faster and clearer than messing with XML, and it will not load the whole file.
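A minimal sketch of that query from C#, assuming the System.Data.SQLite package linked above and a hypothetical Logs table with Date and Message columns (the table layout is inferred from the example query, not prescribed):

using System;
using System.Data.SQLite;

using (var conn = new SQLiteConnection("Data Source=calendar.db;Version=3;"))
{
    conn.Open();
    var cmd = new SQLiteCommand(
        "SELECT Message FROM Logs WHERE Date > @from AND Date < @to", conn);
    cmd.Parameters.AddWithValue("@from", "2015-11-01");
    cmd.Parameters.AddWithValue("@to", "2015-11-25");

    using (var reader = cmd.ExecuteReader())
    {
        // Only the matching rows are read; the rest of the file is never parsed.
        while (reader.Read())
            Console.WriteLine(reader.GetString(0));
    }
}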
As mentioned above, SQLite is a great option, since neither you nor, frankly, most people out here will be able to write a database management system that is as efficient as the ones already out there.
https://www.sqlite.org
The whole point of using an RDBMS is that it's far more efficient than dealing with files yourself.
SQLite is lightweight and easy to deploy. But remember that:
SQLite only supports a single writer at a time (meaning the execution of an individual transaction). SQLite locks the entire database when it needs a lock (either read or write) and only one writer can hold a write lock at a time. Due to its speed this actually isn't a problem for low to moderate size applications, but if you have a higher volume of writes (hundreds per second) then it could become a bottleneck.
Reference this question
If this is an enterprise-level application requirement, I would go for an Azure Table storage based solution, which is well suited to this sort of scenario.
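A minimal sketch of the Azure Table storage idea, assuming the Azure.Data.Tables package and a hypothetical CalendarLogs table; the connection string, the partition-by-day scheme, and the entity properties are placeholders.

using System;
using Azure.Data.Tables;

var client = new TableClient("<storage-connection-string>", "CalendarLogs");
client.CreateIfNotExists();

// Partition by day so a single day's entries can be fetched with one query.
var entry = new TableEntity(partitionKey: "2015-11-25", rowKey: Guid.NewGuid().ToString())
{
    ["Message"] = "Backup completed",
    ["Severity"] = "Info"
};
client.AddEntity(entry);

// Fetch everything logged on a given day.
foreach (var e in client.Query<TableEntity>(filter: "PartitionKey eq '2015-11-25'"))
    Console.WriteLine(e.GetString("Message"));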
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
As part of the discovery process for an upcoming project, I am trying to find a way of taking a representative sample of the PPT files on our network. So far, I have collected and organized all of the PPT files that we have; however, I've realized that there is an overwhelming volume of documents, so I need to find a way to reduce it. To this end, I was thinking that it'd be helpful to delete all "duplicate" files.
Our company does not have any sort of version control system for files on our network. As such, users often create copies of files in order to make small, minor changes. This has led to a high volume of "duplicate" files with no real naming convention, etc. Ideally, I'd be able to make a best guess as to which files are "duplicates" and keep the most recent version. Since I just need a representative sample, I do not need to be 100% accurate regarding the save/delete decision, and it's also OK if I lose a chunk of the files along the way (there are currently 135K files, and I expect to end up with 3-5K). I am not sure how to go about this, as tools like http://www.easyduplicatefinder.com/ seem to look for truly identical documents rather than making a more nuanced comparison.
Here are a couple of additional details:
File names do not follow any standard convention
I think it's fair to assume that many of the PPT properties would remain unchanged across versions
Versions of files are always located in the same folder, however other PPT files may also exist in the same folder
I'm open to addressing this problem in any of the following languages/technologies: C#, VB, Ruby, Python, IronPython, PowerShell
I would approach it like this:
extract all visible text strings from each .ppt file
dump the strings into text files, one per .ppt
run diff across all pairs of text files (in the same directory?) to get min edit distance (a sketch of this step follows the list)
run the resulting distance matrix through a clustering algorithm
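A minimal sketch of the distance step, assuming the slide text has already been dumped to one .txt file per presentation in a single folder; Levenshtein distance stands in for diff, and the fixed 0.9 similarity threshold is a stand-in for a real clustering pass.

using System;
using System.IO;
using System.Linq;

class SimilarityScan
{
    static int Levenshtein(string a, string b)
    {
        // Two-row dynamic programming so memory stays proportional to one string.
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev);
        }
        return prev[b.Length];
    }

    static void Main()
    {
        var files = Directory.GetFiles(@"C:\ppt-text-dumps", "*.txt");
        foreach (var (a, b) in files.SelectMany((f, i) => files.Skip(i + 1).Select(g => (f, g))))
        {
            string ta = File.ReadAllText(a), tb = File.ReadAllText(b);
            double distance = Levenshtein(ta, tb);
            double similarity = 1.0 - distance / Math.Max(ta.Length, Math.Max(tb.Length, 1));

            if (similarity > 0.9) // likely versions of the same deck
                Console.WriteLine($"{Path.GetFileName(a)} ~ {Path.GetFileName(b)}: {similarity:P0}");
        }
    }
}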
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Disclaimer: Yes, I am fairly new to storing data in files. But this is something I'm willing to learn, and take the time to learn
I am working on creating a program for someone who keeps track of volunteers and how much time they spend working, and what they are working on.
For each volunteer I will have their Address Information, email, phone, etc. I will also have what show they worked, how long they worked it for, and what position they were working.
Later, the user will be able to print out monthly reports of this information.
I am wondering if anyone is willing to give me a nudge in the right direction. How should I store all this data? I've heard of XML, SQL, JSON, and other things here and there. I need something that can handle large amounts of data, as there are about 200 volunteers right now, and data will need to be constantly added to this file (or files). Are there any suggestions? If you need me to clarify something, please just ask.
Also, I am using a Windows Forms application in C#.
There are different ways to accomplish this; the most preferred would be either a SQL database (MySQL, MSSQL, SQLite, etc.), JSON storage (with gzipping for good measure, to save space), or binary storage (i.e. knowing the complete size of the data you're storing, such as how many Int32s, and reading it as bytes from a file).
I'd go with the JSON storage if you're not planning on accessing this data from more than 1 computer.
If portability is what you want, you're gonna want to go with SQL.
If you're just looking for a simple way to store data, then it'd be either JSON or binary saving.
These are probably the best choices you have, on how to save data.
#Edit: Seeing how you've specified large amounts of data, your best shot would be either SQL or binary. Binary, because you can jump around in the file when you know the size of each object, so you won't have to read all of them; SQL, because it has this ability built into its core. JSON simply won't be a very good choice for large amounts of data.
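A minimal sketch of that fixed-size-record idea; the 64-byte layout (Id, hours, padded name) is purely illustrative, not a recommended format.

using System;
using System.IO;
using System.Text;

class FixedRecordFile
{
    const int RecordSize = 64; // 4-byte Id + 8-byte hours + 52-byte name

    public static void Write(FileStream fs, int index, int id, double hours, string name)
    {
        fs.Seek((long)index * RecordSize, SeekOrigin.Begin);
        using var w = new BinaryWriter(fs, Encoding.UTF8, leaveOpen: true);
        w.Write(id);
        w.Write(hours);
        var nameBytes = new byte[52];
        var raw = Encoding.UTF8.GetBytes(name);
        Array.Copy(raw, nameBytes, Math.Min(raw.Length, 52));
        w.Write(nameBytes);
    }

    public static (int Id, double Hours, string Name) Read(FileStream fs, int index)
    {
        // Jump straight to the record without reading anything before it.
        fs.Seek((long)index * RecordSize, SeekOrigin.Begin);
        using var r = new BinaryReader(fs, Encoding.UTF8, leaveOpen: true);
        int id = r.ReadInt32();
        double hours = r.ReadDouble();
        string name = Encoding.UTF8.GetString(r.ReadBytes(52)).TrimEnd('\0');
        return (id, hours, name);
    }
}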
XML, you're gonna want to avoid entirely, unless you need to export data because other applications require it in XML format. XML simply uses too much storage space, due to the whole structure of the language.
#Edit2: Would whoever downvoted my answer, explain why please? .__.'
SQL can handle pretty much any amount of data. It is easy for a beginner to use, and there is a lot of support for it.
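A minimal sketch of what that could look like with the Microsoft.Data.Sqlite package; the Volunteers/Shifts tables, the column names, and the monthly report query are assumptions to illustrate the shape, not a finished design.

using System;
using Microsoft.Data.Sqlite;

using var conn = new SqliteConnection("Data Source=volunteers.db");
conn.Open();

var ddl = conn.CreateCommand();
ddl.CommandText = @"
    CREATE TABLE IF NOT EXISTS Volunteers (
        Id      INTEGER PRIMARY KEY,
        Name    TEXT NOT NULL,
        Email   TEXT,
        Phone   TEXT,
        Address TEXT);
    CREATE TABLE IF NOT EXISTS Shifts (
        Id          INTEGER PRIMARY KEY,
        VolunteerId INTEGER NOT NULL REFERENCES Volunteers(Id),
        Show        TEXT NOT NULL,
        Position    TEXT NOT NULL,
        Hours       REAL NOT NULL,
        WorkedOn    TEXT NOT NULL   -- ISO date, e.g. 2015-06-14
    );";
ddl.ExecuteNonQuery();

// Monthly report: total hours per volunteer for a given month.
var report = conn.CreateCommand();
report.CommandText = @"
    SELECT v.Name, SUM(s.Hours) AS TotalHours
    FROM Shifts s JOIN Volunteers v ON v.Id = s.VolunteerId
    WHERE strftime('%Y-%m', s.WorkedOn) = '2015-06'
    GROUP BY v.Name";
using var reader = report.ExecuteReader();
while (reader.Read())
    Console.WriteLine($"{reader.GetString(0)}: {reader.GetDouble(1)} hours");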
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I am working on a C# Windows Forms project with a friend and we are using TFS for our source control. The program we have written scrapes data off a couple of webpages at regular time intervals, and this is displayed in the UI.
However, we would like to be able to store this data for future reference. We will be gathering around 1000 bits of data every few seconds for several hours a day.
I'm relatively new to programming, know very little about databases, and couldn't find anything on the internet about how to use them with Team Foundation Server. What's the best way to store this data so it is accessible to both me and my colleague?
So far, we've used local XML files but as I say we need somewhere to centrally store the data.
Sorry if there's any information you need that I've missed off - this is the first question I've asked on a forum - but let me know and I'll provide any info I can.
I look forward to your help,
There is nothing specific about TFS that would keep you from using databases.
If you have installed Visual Studio as your IDE to work with TFS as your source control, you probably have a copy of SQL Server Express installed. I would look into how to utilize that. You may want to look at Linq to SQL or ADO.Net to provide connectivity to your database from your application (a small ADO.Net sketch follows the links below). Depending on how much data you are collecting, you will likely get a great deal more flexibility and performance if you keep it there vs files.
ADO.Net Tutorial
Linq to SQL Tutorials
Entity Framework
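A minimal sketch of the ADO.Net route, assuming a local SQL Server Express instance and a hypothetical ScrapedValues table; the connection string and schema are placeholders.

using System;
using System.Data.SqlClient;

class ScrapeStore
{
    const string ConnectionString =
        @"Server=.\SQLEXPRESS;Database=Scraping;Integrated Security=true";

    public static void SaveValue(string source, decimal value, DateTime scrapedAtUtc)
    {
        using (var conn = new SqlConnection(ConnectionString))
        using (var cmd = new SqlCommand(
            "INSERT INTO ScrapedValues (Source, Value, ScrapedAtUtc) VALUES (@source, @value, @at)", conn))
        {
            cmd.Parameters.AddWithValue("@source", source);
            cmd.Parameters.AddWithValue("@value", value);
            cmd.Parameters.AddWithValue("@at", scrapedAtUtc);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}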
Create a .txt file and every time the service runs append a new line to the text file.
How to add new line into txt file
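A minimal sketch of that approach; the file path and the line format are placeholders.

using System;
using System.IO;

// Append one line per run; AppendAllText creates the file if it does not exist yet.
string line = $"{DateTime.UtcNow:o}\t1023.5"; // timestamp + scraped value
File.AppendAllText(@"C:\data\scrape-log.txt", line + Environment.NewLine);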