ElasticSearch vs SQL Full Text Search [closed] - c#

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
I want to use full-text search in my project. Can anyone explain the difference between Elasticsearch and SQL Full Text Search?
Or why is SQL Full Text Search better (or worse) than Elasticsearch?
Documentation, presentations, schemas...

Define "better"... SQL full-text search is fairly trivial to get working (both indexing and querying), but it has penalties:
- very little (virtually no) control over how things are indexed (what the index keys are, which lexers/stemmers are used, etc.)
- it runs on the SQL server, which is usually your least scalable infrastructure
Elasticsearch requires more work. You need to set up and maintain a dedicated cluster of nodes, and then provide code that performs the actual index operations. That may also involve a scheduled job that works from a change log (processing new and edited data) and builds the fragments to be indexed. Equally, you need to spend more time building the query. In return you get a lot of control over the index and query, plus scalability (a cluster can be whatever size you need). If it helps any, Stack Overflow grew up on SQL full-text search, then moved to Elasticsearch when the limitations (both features and performance) proved prohibitive.
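The change-log job described above can be sketched roughly as follows. This is a minimal illustration in Python, not real Elasticsearch client code: `TinyIndex` is a hypothetical in-memory stand-in for the cluster, and the change-log entry shape `(seq, op, doc_id, text)` is an assumption. The part that carries over is the high-water mark, so each scheduled run only processes new or edited rows.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on whitespace; a real analyzer would also stem."""
    return text.lower().split()

class TinyIndex:
    """Hypothetical in-memory inverted index standing in for the search cluster."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.docs = {}                    # id -> last indexed text

    def upsert(self, doc_id, text):
        self.delete(doc_id)               # drop stale postings before re-indexing
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        old = self.docs.pop(doc_id, None)
        if old is not None:
            for term in tokenize(old):
                self.postings[term].discard(doc_id)

    def search(self, query):
        """Return ids of documents that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

def process_change_log(index, change_log, last_seen):
    """Apply entries newer than last_seen (log assumed sorted by seq).

    Returns the new high-water mark to store for the next scheduled run.
    """
    for seq, op, doc_id, text in change_log:
        if seq <= last_seen:
            continue                      # already indexed in an earlier run
        if op == "delete":
            index.delete(doc_id)
        else:
            index.upsert(doc_id, text)
        last_seen = seq
    return last_seen
```

A scheduled job would load the stored mark, call `process_change_log`, and save the returned value. With a real cluster, the `upsert` and `delete` calls would become index requests.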

The answer depends on what goal you're trying to achieve and what resources you have to reach it. SQL Server full-text search needs less administration but is limited in functionality. Elasticsearch is at the other end of the spectrum.
SQL Server full-text search:
- can prove efficient if your data is not growing considerably and/or your schema is not changing over time
- requires less effort to maintain and has less of a learning curve (less need for new skills)
Elasticsearch:
- needs a data ingestion step if your master DB receives frequent incremental updates (Logstash and other alternatives)
- scales better horizontally
- can use advanced features to improve performance for very large data sets (e.g. routing)

Related

Save millions of files on windows [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
My system will save ~20-40 million image files.
Each file is 150-300 KB.
My application will run on Windows Server 2012 R2 and the files will be saved on storage (I don't know which one yet).
My application is written in C#.
My requirements are:
- The system will constantly delete old files and save new files (around 100K files per day)
- The most recent images will be automatically displayed to users in web and WPF applications
- I need fast access to recent files (last week) for reporting purposes
What is the best practice for storing / organizing this many files?
Broad question much? If you're asking how to organize them for efficient access, that's hard to answer without knowing why you're storing that many files.
Let me explain:
Let's say you're storing a ton of log files. Odds are your users are going to be most interested in the logs from the last week or so. So storing your data on disk in a way that lets you easily access the files by day (e.g. yyyy-mm-dd.log) will speed up access to a specific day's log.
Now instead think of it like a phone book where you're looking up people's names. Storing entries by the time you inserted each name in the phone book really isn't going to help you get to the result you want quickly. You'd need a better sort order.
Essentially, look at how your data will be accessed and try to sort it in a logical manner so that you can run a binary search, or something better, on it.
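For the date-driven access pattern in the question (recent files read most, old files purged), one possible layout is a directory per day. The sketch below is in Python and only builds paths; the function names, the `.jpg` extension and the `images` root are assumptions for illustration. `PurePosixPath` is used so the output is the same on every platform; on Windows you would likely use `pathlib.Path` instead.

```python
from datetime import date, datetime, timedelta
from pathlib import PurePosixPath

def image_path(root, taken_at, image_id):
    """Build root/yyyy/mm/dd/<id>.jpg so each day gets its own directory."""
    return PurePosixPath(root) / f"{taken_at:%Y/%m/%d}" / f"{image_id}.jpg"

def recent_day_dirs(root, today, days=7):
    """Directories covering the last `days` days, newest first."""
    return [
        PurePosixPath(root) / f"{today - timedelta(days=n):%Y/%m/%d}"
        for n in range(days)
    ]

def is_expired(day, today, keep_days):
    """True when a day directory is older than the retention window."""
    return (today - day).days > keep_days
```

With this layout, a "last week" report touches seven directories, and deleting old files becomes removing whole day directories. At around 100K files per day, an extra hour level under each day may be worth considering, though that depends on the storage and file system actually chosen.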
I'd highly recommend rewording your question so it is clearer though.

C#: method for storing large and complex data? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I am trying to store a software "calendar" (complex logs that are saved and browsed by date).
I tried using both .ini and XML, but when the application tried to read the entire file to find info for 1 specific day out of the 100 days (or so it seemed), it took almost 9 seconds to get 5 variables out of 500 variables. The size of the actual file might eventually be more than 40 variables per day.
Also, I would rather not make a file for each day; that seems a little unprofessional and messy.
I am asking to find out whether there is an alternative that keeps things fast and neat. The data includes different types of variables and different amounts of them. I know I am kind of overdoing it with the logging, but the program needs logs to do its work.
If the data must be stored, it has to go in a file or a database (local or remote). I'd go for SQLite: it ends up as a single file, but you can query the data with SELECT, JOIN, etc.
EDIT:
You can use SQLite3 from C# if you include this package:
https://www.nuget.org/packages/System.Data.SQLite/
You'll need to learn some SQL, but after that you'll just use something like:
select Message from Logs where Date > '2015-11-01' and Date < '2015-11-25';
which is easier, faster and clearer than messing with XML, and it will not load the whole file.
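To make the date-range query concrete, here is a small sketch using Python's built-in `sqlite3` module (the same SQL applies from C#). The `Logs` table with `Date` and `Message` columns comes from the query above; the index name and the sample rows are assumptions. Dates are stored as ISO `yyyy-mm-dd` text so that string comparison matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # pass a file path instead to persist the data
conn.execute("CREATE TABLE Logs (Date TEXT NOT NULL, Message TEXT)")
# An index on Date lets SQLite seek to the range instead of scanning every row.
conn.execute("CREATE INDEX idx_logs_date ON Logs (Date)")

sample_rows = [  # made-up entries for illustration
    ("2015-10-31", "before the range"),
    ("2015-11-01", "on the lower boundary"),
    ("2015-11-10", "inside the range"),
    ("2015-11-24", "late in the range"),
    ("2015-11-25", "on the upper boundary"),
]
with conn:  # one transaction for the whole batch
    conn.executemany("INSERT INTO Logs (Date, Message) VALUES (?, ?)", sample_rows)

def messages_between(conn, start, end):
    """Messages dated strictly between start and end, oldest first."""
    cur = conn.execute(
        "SELECT Message FROM Logs WHERE Date > ? AND Date < ? ORDER BY Date",
        (start, end),
    )
    return [row[0] for row in cur]
```

Calling `messages_between(conn, "2015-11-01", "2015-11-25")` returns only the rows strictly inside the range, because the query uses `>` and `<` rather than `>=` and `<=`.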
As mentioned above, SQLite is a great option. You, like most people here, are unlikely to write a database management system that is as efficient as the ones already out there.
https://www.sqlite.org
The whole point of using an RDBMS is that it's far more efficient than dealing with files.
SQLite is lightweight and easy to deploy. But remember that:
SQLite only supports a single writer at a time (meaning the execution
of an individual transaction). SQLite locks the entire database when
it needs a lock (either read or write) and only one writer can hold a
write lock at a time. Due to its speed this actually isn't a problem
for low to moderate size applications, but if you have a higher volume
of writes (hundreds per second) then it could become a bottleneck.
Reference this question
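One common way to live with the single-writer limit is to group many small writes into one transaction, so the write lock is taken once per batch rather than once per row. A minimal sketch with Python's built-in `sqlite3` module follows; the `Logs` schema and the function names are assumptions for illustration.

```python
import sqlite3

def open_log_db(path=":memory:"):
    """Open the database and make sure the (assumed) Logs table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Logs (Date TEXT NOT NULL, Message TEXT)"
    )
    return conn

def write_batch(conn, entries):
    """Insert all entries in a single transaction.

    `with conn` commits when the block succeeds and rolls back if any
    row fails, so a batch is stored completely or not at all.
    """
    with conn:
        conn.executemany("INSERT INTO Logs (Date, Message) VALUES (?, ?)", entries)

def count_logs(conn):
    return conn.execute("SELECT COUNT(*) FROM Logs").fetchone()[0]
```

An application that logs continuously could buffer entries in memory and call `write_batch` every second or so. Whether that is fast enough depends on the actual write volume.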
If this is an enterprise-level application requirement, I would go for an Azure Table storage based solution, which is ideal for this sort of scenario.

Performance evaluation of Web API calls with Database LINQ queries [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have a UI that calls Web APIs (Web API 2.0). The Web APIs are basically LINQ queries (against an MS SQL database) plus some processing logic for the data. I want to evaluate the performance of the complete flow (click in the UI, call to the API, back to the UI to display data) against a huge DB with 30K - 60K records in it.
How can this be done? Let me know which methods/tools are used for this.
Currently I am tracking the time taken in the Chrome debug window, which shows the total time for each network call.
Wow. This is a subject in its own right but here's an approach:
The bits are independent so you break it down. You measure your LINQ queries without any of the logic or web api stuff getting in the way. If LINQ is against stored procedures then measure those first. Then you measure the cost of the logic, then you measure the cost of sending X rows of data using WebAPI. You should avoid including the cost of actually retrieving the rows from the database so you're checking just the connectivity. I'd also consider writing a browserless test client (i.e. GETS/POSTS or whatever) to eliminate the browser as a variable.
Now you've got a fairly good picture of where the time gets spent. You know if you've got DB issues, query issues, network issues or application server issues.
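Measuring each piece separately can be sketched like this. The example is in Python for brevity; `StageTimer` and the stage names are hypothetical, and in the real system each `with` block would wrap the LINQ query, the processing logic, or the Web API call on its own.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class StageTimer:
    """Collects wall-clock samples per named stage."""

    def __init__(self):
        self.samples = defaultdict(list)  # stage name -> durations in seconds

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the sample even when the stage raises.
            self.samples[name].append(time.perf_counter() - start)

    def summary(self):
        """Run count, mean and worst case per stage."""
        return {
            name: {
                "runs": len(values),
                "mean_s": sum(values) / len(values),
                "max_s": max(values),
            }
            for name, values in self.samples.items()
        }

def measure(timer, stages, repeats=5):
    """Run each (name, callable) stage `repeats` times under the timer."""
    for name, call in stages:
        for _ in range(repeats):
            with timer.stage(name):
                call()
    return timer.summary()
```

Passing stand-in callables, for example `measure(StageTimer(), [("query", run_query), ("logic", apply_logic)])`, gives a per-stage breakdown that shows which layer deserves attention first. `run_query` and `apply_logic` are placeholders for your own code.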
Assuming it all goes well, now add a bunch of instances to your test harness so you're testing concurrent access, load testing and the like. Often if you get something wrong you can't surface that with a single user so this is important.
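A browserless harness with several simulated users might look like the sketch below (Python, standard library only). The `call` argument is an assumption: it stands for whatever issues one request against the API. The percentile is a simple approximation rather than a statistically careful one.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def timed_call(call):
    """Return the wall-clock duration of one call in seconds."""
    start = time.perf_counter()
    call()
    return time.perf_counter() - start

def run_load(call, users, requests_per_user):
    """Run `users` concurrent workers, each issuing requests back to back."""

    def one_user(_):
        return [timed_call(call) for _ in range(requests_per_user)]

    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))

    latencies = sorted(t for batch in per_user for t in batch)
    p95_index = min(len(latencies) - 1, int(0.95 * len(latencies)))
    return {
        "requests": len(latencies),
        "median_s": latencies[len(latencies) // 2],
        "p95_s": latencies[p95_index],  # approximate 95th percentile
        "max_s": latencies[-1],
    }
```

Running the same `call` with `users=1` and then with a larger number makes contention visible: if the median stays flat but the p95 climbs, something is being serialized under load.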
Break it down into chunks and have a data set that you can consistently bring back to a known state.
As for tools, it really depends on what you use. VS comes with a bunch of useful things but there are tons of third party ones too. If you have a dedicated test team this might be part of their setup. SQL Server has a huge chunk of monitoring capability. Ask your DBAs. If you've got to find your own way, just keep in mind that you want to be able to do this by pressing a button, not by setting up a complex environment.

Does frequent INSERT/UPDATE query compromise the database safety? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
Hi, I am relatively new to using databases in a web application and I am trying to develop a dynamic application for my users.
So I was wondering how safe it is to have your application frequently (say every 2 seconds) execute an INSERT/UPDATE query from the same user.
I'm aware that INSERT/UPDATE queries are quite slow to execute (I've read it's 12 INSERTs per second), but that's not my question. My question concerns the safety of my database.
The frequency of executing INSERTs or UPDATEs against your database is in no way related to the safety of your database. It might impact its performance, though.
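As an illustration, the sketch below repeats the same user's write hundreds of times and then asks the database to check itself. It uses Python's built-in `sqlite3` module only because it needs no server; the question does not say which database is in use, and the `user_state` table and function names are assumptions. What keeps writes safe is running each one in a transaction with bound parameters, however often it runs.

```python
import sqlite3

def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_state ("
        "user_id TEXT PRIMARY KEY, value TEXT, updates INTEGER NOT NULL)"
    )
    return conn

def save_state(conn, user_id, value):
    """Update the user's row, or insert it on first use, atomically."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE user_state SET value = ?, updates = updates + 1 "
            "WHERE user_id = ?",
            (value, user_id),
        )
        if cur.rowcount == 0:  # no existing row for this user yet
            conn.execute(
                "INSERT INTO user_state (user_id, value, updates) VALUES (?, ?, 1)",
                (user_id, value),
            )

def integrity_ok(conn):
    """SQLite's built-in consistency check reports 'ok' for a healthy file."""
    return conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
```

Because the values are passed as parameters rather than concatenated into the SQL text, even hostile input is stored as plain data.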
I think the OP's concern is more about the robustness of the database, i.e. its ability to handle load without becoming corrupt, than about issues like SQL injection. (Those comments are valuable to take note of, though.) The volumes suggested per user should be no concern.
Database integrity for any DB should be checked with regular maintenance. Make sure you are doing the following and you should have no problems with performance and reliability.
- Back up your DB: full backups and transaction log backups.
- Rebuild and/or reorganize indexes.
- Delete old data backups and check free space often.
- Check the maintenance logs for issues on your DB.
- Then monitor performance for sufficient memory, disk I/O, and CPU.
Without a "with(nolock)" hint after the table name, the table being inserted into or updated is always locked.
If so, how can it perform well?

dotNetRDF vs plain SQL [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I am working on a collaborative-filtering recommender system. I built such a system before in a parallel-threaded environment, querying RDF with SPARQL. This worked well because of the parallel nature of SPARQL and RDF graphs. However, I am now working on a standard desktop PC, and am wondering if using SPARQL is still the way to go in a largely serial environment. I've looked at dotNetRDF, as I'm using C#, and am wondering if it is any more efficient than simple SQL, especially now that dotNetRDF seems to be moving away from a SQL back-end.
So as far as performance on a few threads goes, SQL or dotNetRDF? Tables or graphs?
The two things are not really comparable. dotNetRDF is a programming API that provides support for a variety of storage backends, in addition to a pure in-memory solution which we mainly recommend for testing and development. (Disclaimer: I'm the lead developer.)
The different backends have a wide variety of performance characteristics so if your problem is expressible in RDF then likely there is an appropriate backend for you.
SQL is a query language, so really you should be comparing SQL to SPARQL, and ultimately which you choose comes down to what your data model looks like. If it's regular then you likely want to use an RDBMS and SQL; if it's irregular and/or graph-like then you likely want to use a triple store and SPARQL. The two have different pros and cons, as your own answer implies.
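The regular versus irregular distinction can be illustrated with a toy example. The sketch below is plain Python, not dotNetRDF or SPARQL, and all the data and function names are made up. Ratings fit a fixed three-column table, while the triple list can mix unrelated kinds of facts without any schema change.

```python
# Regular data: every row has the same columns (user, item, score).
ratings = [
    ("alice", "item1", 5),
    ("alice", "item2", 3),
    ("bob", "item1", 4),
]

def users_who_rated(ratings, item):
    """Table-style lookup: filter rows on one column."""
    return sorted(user for user, rated_item, _ in ratings if rated_item == item)

# Irregular data: (subject, predicate, object) triples of mixed kinds.
triples = [
    ("alice", "rated", "item1"),
    ("bob", "rated", "item1"),
    ("item1", "inCategory", "books"),
    ("alice", "follows", "bob"),
]

def match(triples, s=None, p=None, o=None):
    """Triple-pattern lookup: None acts as a wildcard, as a variable would."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]
```

If every fact in the system looks like a `ratings` row, the table form is simpler and usually cheaper to query. Once facts such as `follows` or `inCategory` start to matter, the triple form absorbs them without a schema change.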
This seems to answer it well enough. Triple Stores vs Relational Databases
Essentially, RDF is much more flexible, but expensive. Since I'm just doing collaborative filtering with data that fits pretty well into a table, I don't think I need the extra expense, as much as I like graphs.
