My website uses a single CSV file containing millions of records.
Based on a search key I need to select one record.
That part is done.
My doubt is: if multiple users (say 1,000) access my website at the same time, with only that one CSV file available, can they all read the same file concurrently?
1M records is not a lot. Frankly, I'd just load it all into structured data once and reference that. Any number of users can access it once it is in memory (especially for read-only access).
But ultimately the ideal answer here is: use a database. SQL Server Express is free and will cope with that effortlessly.
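The load-everything-into-memory suggestion can be sketched as follows. The file name, delimiter, and key column are assumptions, and the sample file is written only to keep the sketch self-contained:

```csharp
// Sketch: load the CSV once at startup into a dictionary keyed by the
// search column, then serve every request from memory. File name, column
// layout, and key position are assumptions; adjust to your data.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

var csvPath = "records.csv"; // hypothetical path
// Build a tiny sample file so the sketch is self-contained.
File.WriteAllLines(csvPath, new[] { "key1,Alice", "key2,Bob" });

// Load once. A plain Dictionary is safe for any number of concurrent
// readers as long as nobody writes to it after this point.
Dictionary<string, string[]> index = File.ReadLines(csvPath)
    .Select(line => line.Split(','))
    .ToDictionary(fields => fields[0], fields => fields);

// Per-request lookup: O(1), no file I/O, no locking needed for reads.
bool found = index.TryGetValue("key2", out var record);
Console.WriteLine(found ? record[1] : "not found"); // prints "Bob"
```

In a web app the dictionary would live in application state (or a static field) so it is built once, not per request.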
As long as the application only reads the file, you will not have a problem. However, it would be more efficient to use a database for this task: you can create indexes and use SQL for easy access, there is no need to parse the file on each request, and you can even add or change data while your site is running.
Related
When working with ASP.NET MVC and SQL Server, we are wondering whether caching to XML is still worth considering, or whether there are other possibilities.
For instance, we have a table called Customers. If you hit this table every time someone clicks on Customers, or sorts or filters in the app, why not store this information in an XML file?
Then you work only with the XML file rather than the database, and update the XML after committing changes to the Customers table.
It is an absolutely brilliant idea.
If:
You only have 1 client
Or you have multiple clients, but they don't mind seeing old data
You have a database system that doesn't provide caching possibilities
You do not use database access frameworks that can handle caching for you
In short, no, it actually is almost never a good idea.
Databases are made to be used. Most of them can handle a much higher load than programmers think, as long as you treat them well. And many provide perfectly good caching facilities of their own to improve performance if needed.
Any useful type of caching in your application should involve refreshing that cache when anything changes. Implementing that yourself is usually not a good idea. If you do want a very simple cache of data that was just on the screen before the user clicked away, memory is the place for it, not the file system. Unless you need a centralised session cache, but that goes way beyond "let's write some XML".
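A minimal sketch of what "cache in memory, with invalidation on change" means in practice. Real code should use a ready-made cache (e.g. ASP.NET's built-in cache or MemoryCache) rather than this hand-rolled one, and the customer lookup here is a made-up stand-in for the database query:

```csharp
// Minimal sketch of an in-memory read cache with expiry and invalidation.
// In a real app, use a ready-made cache class instead of rolling your own.
using System;
using System.Collections.Concurrent;

var cache = new ConcurrentDictionary<string, (DateTime Expires, string Value)>();
int dbHits = 0;

string GetCustomer(string id)
{
    if (cache.TryGetValue(id, out var entry) && entry.Expires > DateTime.UtcNow)
        return entry.Value; // served from memory, no DB round trip

    dbHits++; // stand-in for the real database query
    var value = $"customer-{id}";
    cache[id] = (DateTime.UtcNow.AddSeconds(30), value);
    return value;
}

// Invalidate on write so readers never see stale data longer than needed.
void OnCustomerUpdated(string id) => cache.TryRemove(id, out _);

Console.WriteLine(GetCustomer("42")); // goes to the "database"
Console.WriteLine(GetCustomer("42")); // served from cache
Console.WriteLine(dbHits);            // prints 1
```

The invalidation hook is the part that an XML-file cache makes hard: a file on disk has no cheap way to stay in sync with the table.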
Caching to an XML file is a bad choice. A database can comfortably serve 100 concurrent users against a table of 50,000 records. If you need more speed than that, try an in-memory database engine, which keeps data in RAM for fast access, though for that you need plenty of RAM on the server.
The company I work for is running a C# project that crawls data from around 100 websites, saves it to the DB, and runs some procedures and calculations on that data.
Each of those 100 websites has around 10,000 events, and each event is saved to the DB.
After that, the saved data is aggregated and an XML document is generated per event, so each of those 10,000 saved events is now stored as an XML file in the DB.
This design looks like that:
1) Crawl 100 websites to collect the data and save it to the DB.
2) Collect the data that was saved to the DB and generate an XML file for each event.
3) Save the XML files to the DB.
The main issue for this post is the selection of the saved XML files.
Each XML file is about 1 MB, and considering that there are around 10,000 events, I am not sure SQL Server 2008 R2 is the right option.
I tried Redis, and saving works very well (and fast!), but querying to get those XMLs back is very slow (even locally, so network traffic won't be the issue).
I was wondering what your thoughts are. Please take into consideration that this is a real-time system, so caching is not an option here.
Any idea will be welcomed.
Thanks.
Instead of using a DB you could try a cloud-based store (Azure Blob Storage or Amazon S3); it seems a perfect fit. See this post: azure blob storage effectiveness. It's the same situation, except you have XML files instead of images. You can use a DB for storing the metadata (i.e. the source and event type of the XML, and its path in the cloud), but not the data itself.
You may also zip the files. I don't know the exact method your stack would use, but it can certainly be handled on the client side; static data is often sent to the client in zipped form by default.
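The zipping idea can be sketched with the framework's GZipStream. The sample XML string is made up, and a toy input this small won't shrink much, but real 1 MB event payloads would:

```csharp
// Sketch: compress an XML payload with GZip before storing or transferring
// it, and decompress it on the way back. Framework classes only.
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

byte[] Compress(string text)
{
    var bytes = Encoding.UTF8.GetBytes(text);
    using var output = new MemoryStream();
    using (var gzip = new GZipStream(output, CompressionMode.Compress))
        gzip.Write(bytes, 0, bytes.Length);
    return output.ToArray();
}

string Decompress(byte[] data)
{
    using var gzip = new GZipStream(new MemoryStream(data), CompressionMode.Decompress);
    using var reader = new StreamReader(gzip, Encoding.UTF8);
    return reader.ReadToEnd();
}

var xml = "<Event id=\"1\"><Name>sample</Name></Event>";
var packed = Compress(xml);
Console.WriteLine(Decompress(packed) == xml); // prints True
```

XML compresses very well (lots of repeated tag names), so this can cut both storage cost and transfer time.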
Your question is missing some details, such as how long your data needs to remain in the database.
I'd avoid storing XML in the database if you already have the raw data. Why not have an application that queries the database and generates XML reports on demand? This will save you a lot of space.
10 GB of data per day is something SQL Server 2008 R2 can handle with the right hardware and good structural optimization. You'll need to investigate whether Standard Edition will be enough or whether you'll need Enterprise or Datacenter licensing.
In any case the answer is yes: SQL Server is capable of handling this amount of data, but I'd evaluate other solutions as well to see whether the costs can be reduced.
Your basic architecture doesn't seem to be at fault; it's the way you've used Redis. If you design your key => value mapping right, there is no reason retrieval from Redis should be slow.
For example, say I have to store a million objects in Redis, each stored against an ID that is nothing but a GUID. The save will be really quick, but retrieval depends on whether I know the key. If I know the key, lookup is fast; if I don't, and I try to retrieve my data not by key but by some value inside my objects, then of course it will be slow.
The point is: for retrieval you should only ever work against the key, and nothing else, so design your key as a precomputed value in itself. Then, whenever I need some data from Redis/memcached, I can construct the key and get the data in a single hit.
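A sketch of the precomputed-key idea. A plain Dictionary stands in for Redis/memcached here so the example runs on its own; with a real client such as StackExchange.Redis, the same MakeKey value would simply be passed to StringSet/StringGet. The key parts (source, eventId) are assumed names:

```csharp
// Sketch: derive the cache key from values the caller already knows, so
// every retrieval is one O(1) lookup and never a scan over stored values.
using System;
using System.Collections.Generic;

string MakeKey(string source, int eventId) => $"event:{source}:{eventId}";

var store = new Dictionary<string, string>(); // stand-in for Redis

// Save: addressed directly by the precomputed key.
store[MakeKey("siteA", 1001)] = "<Event>...</Event>";

// Retrieve: the caller can rebuild the exact key, so this is a single hit,
// not a search through a million objects.
bool hit = store.TryGetValue(MakeKey("siteA", 1001), out var xml);
Console.WriteLine(hit); // prints True
```

If you need to look events up by more than one attribute, store a second key per attribute pointing at the same value, rather than scanning.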
If you could put more details, we'll be able to help you better.
Is there a more effective way than this to implement a user-saved file in my application?
(I'm using C# / .Net with Sql Server)
MY GOAL:
I wish to allow users to save their created datapoints (along with some other structured data) to a "project file" with an arbitrary extension.
PROPOSED METHOD:
Store each datapoint in the database along with a FileID column
When user saves the file, grab all the datapoints: "SELECT * ... WHERE FileID = #CurrentFileID".
Export all of those datapoints to an XML file.
Delete all of those datapoints from the database.
Save the XML file as (or as part of) the project file.
Every time user loads their project file, import the data from the XML back into the database.
Display the datapoints from the database that have FileId = Current file ID.
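Steps 2 and 3 of the proposed method (fetch the current file's datapoints, export them to XML) might look roughly like this. The Datapoint shape is hypothetical, and an in-memory list stands in for the "SELECT * ... WHERE FileID" query:

```csharp
// Sketch: select the current file's datapoints and export them to XML.
// The tuple shape and sample rows are made up; in practice the list would
// come from the database query filtered on FileID.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

var datapoints = new List<(int FileId, double X, double Y)>
{
    (1, 0.0, 1.5), (1, 1.0, 2.5), (2, 0.0, 9.9)
};
int currentFileId = 1;

XDocument doc = new XDocument(
    new XElement("Project",
        datapoints.Where(p => p.FileId == currentFileId)
                  .Select(p => new XElement("Datapoint",
                      new XAttribute("x", p.X),
                      new XAttribute("y", p.Y)))));

// doc.Save("project.xml") would then write the project file.
Console.WriteLine(doc.Root.Elements("Datapoint").Count()); // prints 2
```

Loading a project is the reverse: parse the XML with XDocument.Load and insert the datapoints back with the current FileID.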
ALTERNATIVE:
Use SQLite and create a separate SQLite database file for each of the user's projects?
The "best" answer depends on several factors. You can help yourself arrive at the best implementation for you by asking some probing questions about the use of the data.
The first question I would ask is: is there any reason you can think of, now or in the future, to store the data points as discrete fields in the database?
Think about this question in the context of what needs to consume those data points.
If, for example, you need to be able to pull them into a report or export only a portion of them at a time based on some criteria, then you almost certainly need to store them as discrete values in the database.
However, if the points will only ever be used in your application as a complete set and you have to disassemble and reassemble the XML every time, you may want to just store the complete XML file in the database as a blob. This will make storage, backup, and update very simple operations, but will also limit future flexibility.
I tend to lean toward the first approach (discrete fields), but you could easily start with the second if it is a stretch to conceive of any other use for the data; should the need for discrete fields arise later, converting the data in the blobs into tabled data should be a relatively easy exercise.
You can refine the answer by also asking additional questions:
What is the magnitude of the data (hundreds, thousands, millions, billions)?
What is the frequency of inserts and updates (x per second, minute, day, year)?
What is the desired response time (milliseconds, seconds, doesn't matter)?
If you need to insert hundreds of thousands of points per second, you might be better off storing the data as a blob.
However, if you only need to insert a few hundred points a couple of times a day, you might be better off starting with the tabled approach.
It can get complex, but by probing how the data is used in your scenario, you should be able to arrive at a pretty good answer for your system.
I have one SQL server, A, located at a remote site, which contains a huge number of records (raw data). A continuously running C# .NET process there notifies me via a web service (WCF) when there are records that need to be processed, and I need to move those records to my SQL server B for processing. What is an elegant and efficient way to do that?
I have a couple of thoughts on that:
1) Send the records in batches from one server to the other via WCF.
2) Save the records to a file and upload it to FTP. Then I can download it from there and load the records into my DB.
Is there any other better way to do that?
This really depends on how real-time the data needs to be. In our organization we use a lot of MQs (message queues) to keep data synchronized, because it needs to be updated in real time between differing applications.
REAL-TIME
If the data needs to be real-time and you can set up an MQ, that's what I'd recommend. They are fast, lightweight, and durable. They do take some work to set up, but here is a link that can get you started.
BATCH
If the data can be updated in batch, you're going to be better off. Real-time data, and the triage issues that come along with it, is a lot more complex and cumbersome in practice. With a batch file you can validate and sanitize the data up front to ensure the CRUD operations will succeed. For batch, use a text file, delimited or fixed-width, and import it with an SSIS job; SSIS can pull it down from the FTP server and import it, all in one fell swoop.
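For the batch route, the core loop (chunk the pending records and hand each chunk to a sender, which in practice would be the WCF client or the file writer) might be sketched as follows. The batch size and the in-memory record source are assumptions:

```csharp
// Sketch: transfer pending records in fixed-size batches rather than one
// giant payload or one call per row. The record source is a stand-in list.
using System;
using System.Collections.Generic;
using System.Linq;

const int BatchSize = 500;
var records = Enumerable.Range(1, 1234).ToList(); // stand-in for pending rows
var sentBatches = new List<int>();

void SendBatch(IReadOnlyList<int> batch)
{
    // In practice: one WCF call (or one file append) per batch, ideally
    // with retry on failure so a single bad batch doesn't stall the run.
    sentBatches.Add(batch.Count);
}

for (int i = 0; i < records.Count; i += BatchSize)
    SendBatch(records.Skip(i).Take(BatchSize).ToList());

Console.WriteLine(sentBatches.Count); // 3 batches: 500 + 500 + 234
```

Batching keeps each call small enough to validate and retry independently, which is exactly what makes the batch approach easier to operate than real-time.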
Background:
I have one Access database (.mdb) file with half a dozen tables in it. The file is ~300 MB: not huge, but big enough that I want to be efficient. There is one major table, a client table; the other tables store data like consultations made, a few extra many-to-one fields, that sort of thing.
Task:
I have to write a program to convert this Access database to a set of XML files, one per client. This is a database conversion application.
Options:
(As I see it)
Load the entire Access database into memory as lists of immutable objects, then use LINQ to look up the associated data I need in those lists.
Benefits:
Easily parallelised. Start up a ThreadPool thread for each client. Because all the objects are immutable, they can be freely shared between threads, which means every thread has access to all the data at all times, and it is all loaded exactly once.
(Possible) Cons:
May use extra memory, loading orphaned items, items that aren't needed anymore, etc.
Use Jet to run queries on the database to extract data as needed.
Benefits:
Potentially lighter weight. Only loads data that is needed, and as it is needed.
(Possible) Cons:
Potentially heavier! May load items more than once and hence use more memory.
Possibly hard to parallelise, unless Jet/OleDb supports concurrent queries (can someone confirm or deny this?)
Some other idea?
What are Stack Overflow's thoughts on the best way to approach this problem?
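Option 1 (load everything once as immutable data, then generate one XML document per client in parallel) might look roughly like this. The record shapes and sample rows are made up, standing in for the real Access tables:

```csharp
// Sketch: load-once-then-parallelise. After loading, the lists are only
// read, so worker threads can share them without any locking.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Xml.Linq;

// Stand-ins for data loaded once from the Access tables.
var clients = new List<(int Id, string Name)> { (1, "Ann"), (2, "Ben") };
var consultations = new List<(int ClientId, string Note)>
{
    (1, "intake"), (1, "follow-up"), (2, "intake")
};
// Pre-index the child table so per-client lookups are O(1), not scans.
var byClient = consultations.ToLookup(c => c.ClientId);

var results = new ConcurrentDictionary<int, XDocument>();
Parallel.ForEach(clients, client =>
{
    results[client.Id] = new XDocument(
        new XElement("Client",
            new XAttribute("name", client.Name),
            byClient[client.Id].Select(c => new XElement("Consultation", c.Note))));
});

Console.WriteLine(results[1].Root.Elements("Consultation").Count()); // prints 2
```

In the real conversion, each XDocument would be saved to its per-client file instead of collected in a dictionary.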
Generate the XML fragments in SQL, and write each fetched record to the file as you fetch it.
Sample:
SELECT '<Node><Column1>' + Column1 + '</Column1><Column2>' + Column2 + '</Column2></Node>' FROM MyTable
If your objective is to convert your database to XML files, you can:
connect to your database through an ADO/OLEDB connection
successively open each of your tables as an ADO recordset
save each recordset as an XML file:
myRecordset.Save myXMLFile, adPersistXML
If you are working from within the Access file, use currentProject.accessConnection as your ADO connection.
From the sounds of this, it would be a one-time operation. I strongly discourage loading the entire database into memory; that does not seem like an efficient way to do this at all.
Also, depending on your needs, you might be able to extract directly from Access -> XML if that is your true end game.
Regardless, with a database this small, processing clients one at a time with a few specifically written queries would, in my opinion, be easier to manage, faster to write, and less error-prone.
I would lean towards Jet, since you can be more specific about what data you want to pull.
I also noticed the large file size; this is a problem I have recently come across at work. Is this an Access 95 or 97 DB? If so, converting the DB to the 2000 or 2003 format and then back to 97 will reduce the size; it seems to be a bug in some cases. The DB I was dealing with claimed to be 70 MB; after I converted it to 2000 and back again, it was 8 MB.