We are evaluating alternatives to our static file storage (which is hosted across multiple geographic locations).
We are on the Microsoft .NET platform (C#, ASP.NET, Web API, SQL Server).
We would like to store digital assets, mostly binary files (AI, PSD, JPG, PNG, PDF, XLS, DOC, ...), in a NoSQL database.
For images, an asset can range from a thumbnail (small) to the original artwork (large: from 300 MB to more than 1 GB).
The thumbnail would appear on the web page, while the original would be available as an attachment with an option to edit (a user could download the original, edit it in the appropriate program, and upload a new version).
Multiple versions of each thumbnail and original need to be stored.
We will not be hosting these digital assets on a third-party platform (such as Amazon S3 or Azure) or a CDN.
An asset could be hosted in a different geographic environment depending on the user's system configuration (a user in the USA could store to US-, Europe-, or Asia-based servers/databases).
Each storage location needs to be replicated.
We are looking into MongoDB for this. Could anyone suggest pros and cons based on the assumptions above, or any other alternatives?
Some of our MongoDB research reveals:
Disk space consumption can be 3 times larger than the raw data.
Space consumption could be cut down with the --oplogSize parameter.
Reading chunks and streaming them to the browser could be about 6 times slower than reading from static file storage.
Replication is not bidirectional; it works as master/slave.
I have prototyped reading a digital asset from the static file system and storing it in MongoDB GridFS with the default chunk size. What is the better approach for storing thumbnails and originals in MongoDB? A thumbnail will always be under the 16 MB document limit, but an original may or may not exceed 16 MB, so should I store all image assets in GridFS by default?
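Roughly, the prototype does something like the sketch below (this uses the 2.x C# driver style; the connection string, database and bucket names, file path, and metadata fields are all placeholders, and the chunk size is just an example override of the default):

using System;
using System.IO;
using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.GridFS;

class GridFsPrototype
{
    static void Main()
    {
        var client = new MongoClient("mongodb://localhost:27017"); // placeholder host
        var database = client.GetDatabase("assets");               // placeholder db name

        // One bucket for originals; chunk size raised from the default (255 KB) to 1 MB.
        var bucket = new GridFSBucket(database, new GridFSBucketOptions
        {
            BucketName = "originals",
            ChunkSizeBytes = 1024 * 1024
        });

        using (var stream = File.OpenRead(@"C:\assets\artwork.psd")) // placeholder path
        {
            var options = new GridFSUploadOptions
            {
                Metadata = new BsonDocument
                {
                    { "contentType", "image/vnd.adobe.photoshop" },
                    { "version", 1 } // naive version tag; real versioning needs its own scheme
                }
            };
            ObjectId id = bucket.UploadFromStream("artwork.psd", stream, options);
            Console.WriteLine("Stored as " + id);
        }
    }
}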
I could envision creating separate databases by content type, for example one for PDF, Excel, and Word files and another for images.
How can we replicate across different servers?
How can we store assets across different MongoDB instances in different regions?
I would really appreciate any input.
Thank you.
Some of our MongoDB research reveals: disk space consumption can be 3 times larger than the raw data; space consumption could be cut down with the --oplogSize parameter; reading chunks and streaming them to the browser could be about 6 times slower than reading from static file storage; replication is not bidirectional and works as master/slave.
Did you try to store data yourself, or did you just find that information somewhere? There is always overhead when using a database (no matter which one) compared with plain file storage. Why? Well, you have indexes and meta information.
MongoDB is a shared-nothing, strongly consistent database. You write your data to one node and it then gets replicated. You can use write concerns (http://docs.mongodb.org/manual/core/write-operations/#write-concern) to wait until your data has been written to a given number (or a majority) of the nodes in a replica set. With replication you can do rolling upgrades without downtime, and it is also very easy to scale using sharding, including shard tags to 'pin' documents to specific shards; see here: http://www.kchodorow.com/blog/2012/07/25/controlling-collection-distribution/
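A rough sketch of how a write concern is attached from the 2.x C# driver (the host names, database, and collection are placeholders):

using MongoDB.Bson;
using MongoDB.Driver;

class WriteConcernSketch
{
    static void Main()
    {
        // Placeholder replica-set connection string.
        var settings = MongoClientSettings.FromUrl(
            new MongoUrl("mongodb://node1,node2,node3/?replicaSet=rs0"));

        // Wait until a majority of replica-set members acknowledge each write.
        settings.WriteConcern = WriteConcern.WMajority;

        var client = new MongoClient(settings);
        var files = client.GetDatabase("assets")
                          .GetCollection<BsonDocument>("fs.files");

        // The concern can also be overridden per collection handle.
        var majorityFiles = files.WithWriteConcern(WriteConcern.WMajority);
        majorityFiles.InsertOne(new BsonDocument("status", "uploaded"));
    }
}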
The following link explains the maximum amount of data allowed to roam between devices, and also that once the 100 KB limit is exceeded, ALL roaming functionality stops.
https://msdn.microsoft.com/en-us/library/windows/apps/windows.storage.applicationdata.roamingsettings.aspx
Does anyone happen to know whether the size of the file being roamed is counted as the actual file size or as the size the file occupies on disk?
Just in case that isn't clear: I'm writing a JSON file with settings and data that is 736 bytes of actual data, which turns into 4 KB on disk. Which of these values does Microsoft use when calculating the remaining available space?
Also, does anyone know of a framework for querying the amount of space left? I know Microsoft doesn't offer native support for that, but I thought there might be a third-party solution.
Many thanks guys!
The size on disk only applies to your machine. Only the bare bytes are transmitted over the web.
You can just check the size of the settings file. It's located in your app's settings folder (%home%\AppData\Local\Packages\%appid%\Settings).
(But it is not accessible from the app's sandbox...)
On the other hand, you know you can only store about 100 KB including keys, so if you get anywhere near that limit you should reconsider your roaming mechanism or the kind of data you store there.
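If you want to guard against that in code, a small sketch (the file name is made up; note that RoamingStorageQuota reports the total quota in KB, not the space remaining):

using System.Text;
using System.Threading.Tasks;
using Windows.Storage;

static class RoamingGuard
{
    // Measures the raw bytes about to roam (not the size on disk) against the quota.
    public static async Task<bool> TryRoamJsonAsync(string fileName, string json)
    {
        ulong payloadBytes = (ulong)Encoding.UTF8.GetByteCount(json);
        ulong quotaBytes = ApplicationData.Current.RoamingStorageQuota * 1024;

        if (payloadBytes > quotaBytes)
            return false; // too big to roam; fall back to another sync mechanism

        StorageFile file = await ApplicationData.Current.RoamingFolder
            .CreateFileAsync(fileName, CreationCollisionOption.ReplaceExisting);
        await FileIO.WriteTextAsync(file, json);
        return true;
    }
}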
I would like to design a C# application to store employee data; I have around 500 employees. I also want to store a scanned PDF profile for each employee. I am planning to use PostgreSQL. Is it practical to store the scanned PDF profiles in the database? Do I need to use a blob data type?
Assuming the PDFs are not going to be very large (probably less than 5 MB, I assume), it is fine. You should use the BYTEA type for this.
Read more about how to use Npgsql, the .NET PostgreSQL driver (scroll to 'Working with binary data and bytea datatype').
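A minimal sketch of what that looks like with Npgsql (the table name, column names, and connection string are made up for illustration):

using System.IO;
using Npgsql;

class ProfileStore
{
    // Assumes: CREATE TABLE employee_profile (employee_id int PRIMARY KEY, profile_pdf bytea);
    public static void SaveProfile(string connString, int employeeId, string pdfPath)
    {
        byte[] pdfBytes = File.ReadAllBytes(pdfPath);

        using (var conn = new NpgsqlConnection(connString))
        {
            conn.Open();
            using (var cmd = new NpgsqlCommand(
                "INSERT INTO employee_profile (employee_id, profile_pdf) VALUES (@id, @pdf)", conn))
            {
                cmd.Parameters.AddWithValue("id", employeeId);
                cmd.Parameters.AddWithValue("pdf", pdfBytes); // byte[] maps to bytea
                cmd.ExecuteNonQuery();
            }
        }
    }
}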
Yes, you need to save them as large objects (BLOBs) or in bytea or text columns, and you need to consider the PostgreSQL limits: large objects are limited to 2 GB per entry and 4 billion per database, while bytea and text are limited to 1 GB per entry and 4 billion entries per table. If I were you, though, I would store a reference in the database to where the PDF is located in the local file system, and stream the file when it is needed.
For the PostgreSQL limits, check the following link: http://wiki.postgresql.org/wiki/BinaryFilesInDB
I'm in charge of building an ASP.NET MVC document management system. It has to be able to do basic document management tasks like adding, editing, and searching entries, and also perform versioning.
Anyway, I'm targeting PDF, Office, and many image formats for the file attached to each document entry in the database. My question is: what design guidelines do the pros follow when building the storage mechanism? Do they store the document files in the file system? In the database? How is file uploading handled?
I used to upload the files to a temporary location while the user was editing the data and move them to permanent storage once the user confirmed the entry creation. Is this good? Any suggestions for improvement?
Files should generally be stored on a filesystem, rather than a database.
You will, however, have to consider some other things:
Are you planning on ever supporting load-balancing, replication, etc for your system?
If so, you'll need to support saving / loading files from a network location of some sort.
This can be trickier than you may imagine.
Are you planning to secure access to the files?
If so, you'll need to ensure they can't be read by someone who happens to know the URL.
e.g. by returning the file as an attachment to a request (see the sketch below).
This also prevents user-provided files from being executed on your server, e.g. someone uploading an .aspx or .exe file and then accessing it.
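As a sketch of the attachment idea (the controller name, storage root, and lookups are all invented; the real path and file name would come from your database):

using System.IO;
using System.Web.Mvc;

public class DocumentsController : Controller
{
    // Hypothetical storage root outside the web root, ideally read from configuration.
    private static readonly string StorageRoot = @"D:\DocStore";

    [Authorize] // gate access however your system requires
    public ActionResult Download(int id)
    {
        // Look up the stored relative path and original file name from your database;
        // hard-coded here purely for illustration.
        string relativePath = id + ".bin";
        string originalName = "document.pdf";

        string fullPath = Path.Combine(StorageRoot, relativePath);
        if (!System.IO.File.Exists(fullPath))
            return HttpNotFound();

        // Served as an attachment: the file is never reachable (or executable) via a direct URL.
        return File(fullPath, "application/octet-stream", originalName);
    }
}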
Background information: this application is a .NET 4/C# Windows Forms app using SQLite as its backend. There is only one user of the database, and it does not interact over a network in any way.
My software needs to save images associated with a Project record. Should I save the image as binary data in the database itself, or should I save the path to the picture on the file system and use that to retrieve it?
My concern with saving a path is that someone might change the file name of a picture, which would essentially break my application.
Can anyone give some suggestions?
"It depends". If there are a lot of images, then all that BLOB weight may make backups increasingly painful (and indeed, may preclude some database implementations that only support limited sizes). But it works, and works well. The file system is fine as long as you only store the path relative to some unknown root, i.e. you store "foo/blah/blip.png", which is combined with configuration data to get the full path - then you can relocate the path easily. File systems have simpler backup options in some cases, but you need to marry the file-system and database backups.
In general, it is better to store them on the filesystem, with a path stored in the DB.
However, Microsoft published a white paper some time ago with research showing that files up to 150K can benefit from being put inside the DB (of course, this only pertains to SQL Server).
The question has been asked here many many times before:
Exact Duplicate: User Images: Database or filesystem storage?
Exact Duplicate: Storing images in database: Yea or nay?
Exact Duplicate: Should I store my images in the database or folders?
Exact Duplicate: Would you store binary data in database or folders?
Exact Duplicate: Store pictures as files or in the database for a web app?
Exact Duplicate: Storing a small number of images: blob or fs?
Exact Duplicate: store image in filesystem or database?
First of all, have you checked the SQLite limits? Even if they are of no concern for your application, I would still choose the file system for storage, simply due to the overhead of getting large BLOBs from the DB versus reading a file from the file system. You can mark the files as read-only and hidden to lessen the chance of them being renamed... You can also store a file hash (such as MD5) in the DB so you have a secondary lookup option in case someone does rename the file (of course, they could move it as well, in which case this would not help much)...
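A small sketch of computing such a hash for the secondary lookup (the class and method names are arbitrary):

using System;
using System.IO;
using System.Security.Cryptography;

static class FileFingerprint
{
    // Store this value next to the path so a renamed file can still be matched
    // by re-hashing the files found in the folder.
    public static string Md5Hex(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            byte[] hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}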
I have a database which stores .png images using the SQL "image" type. I have some code which retrieves these images as a byte[] and sends them to the page via the FileContentResult object in .NET. Performance is key in this application, and the images have to be retrieved and displayed as quickly as possible. My question is: can this operation be performed more quickly by passing a byte stream from the database to the browser, without ever holding the whole byte array in memory? If this is possible and worthwhile, how do I do it?
Here is the code I have so far:
// Get: /Image/Get/5
public FileResult Get(int id)
{
    Response.Cache.SetExpires(DateTime.Now.AddSeconds(300));
    Response.Cache.SetCacheability(HttpCacheability.Public);
    Response.Cache.SetValidUntilExpires(true);

    // Get full size image by PageId.
    return base.File(page.getFullsizeImage(id), "image/png");
}

And

public byte[] getFullsizeImage(int pageId)
{
    return (from t in tPage
            // Filter on pageId.
            where t.PageId == pageId
            select t.Image).Single().ToArray();
}
Thanks for any help!
A nice question.
The reality is that the code required to send the image as a stream is minimal: it is just writing the byte array to the response and setting the HTTP content-type header, which should be very fast.
To make it quicker you would essentially have to open up your database to the world. That is probably possible using features that let SQL Server serve HTTP/interact with IIS (I looked at this a long time ago), but it is not a good idea, so I do not believe you should take that risk.
You are already using caching, which is good, but since the files are large the cache gets purged frequently.
One thing you can do is keep a local file cache on the IIS server: the first time an image is used, it is written to a file on the web server, and from then on (until the cache is cleared, perhaps the next day) a URL to that static asset is returned, so requests do not have to go through the ASP.NET layer. It is not a great idea, but it will achieve what you need with the least risk.
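A sketch of that file-cache idea in the controller (the cache folder name and the database lookup stub are invented, and you would still need a policy for clearing and refreshing the cached files):

using System;
using System.Web.Mvc;

public class ImageController : Controller
{
    // Hypothetical folder under the site root that IIS serves as static content.
    private const string CacheFolder = "~/imagecache";

    public ActionResult Get(int id)
    {
        string fileName = id + ".png";
        string physicalPath = Server.MapPath(CacheFolder + "/" + fileName);

        // First request: pull the bytes from the database and write them to disk.
        if (!System.IO.File.Exists(physicalPath))
        {
            byte[] bytes = GetFullsizeImageFromDb(id);
            System.IO.File.WriteAllBytes(physicalPath, bytes);
        }

        // Later requests are redirected to the static file, which IIS serves
        // without going through the ASP.NET pipeline.
        return Redirect(Url.Content(CacheFolder + "/" + fileName));
    }

    // Stand-in for the existing LINQ lookup from the question.
    private byte[] GetFullsizeImageFromDb(int pageId)
    {
        throw new NotImplementedException("plug in the existing getFullsizeImage query here");
    }
}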
Changing the LINQ from Single to First should give you nicer SQL; if PageId is the primary key, you can safely assume First and Single will return the same result.
Edit: Based on your comments, I think you should consider using DeepZoom from Microsoft. Essentially, what this allows you to do is generate a specialized image file on the server. When a user is browsing the image in full view, just the couple of million or so pixels that are displayed on the screen are sent to the browser via AJAX. Then, when the user zooms in, the appropriate pixels for the zoom level and the x and y position are streamed out.
There is a DeepZoom Composer which can be accessed via the command line to generate these image files on demand and write them to a network share. Your users will be really impressed.
Take a look at this example. It is a massive image, gigabytes in size. About the middle of the image you will see some newspaper pages; you can zoom right in and read the articles.
End of Edit
Do you have to have images with a large file size? If they are only meant for displaying in the browser, they should be optimized for the web. All main image editing applications have this ability.
If you do need the large file size, then you could provide optimized images and then when the user clicks on the image, allow them to download the full file. They should expect this download to take some time.
In Photoshop, the task is "Save for web". There is a similarly named plugin for Gimp.
I know that this doesn't answer your direct question ("can this operation be performed quicker by passing a byte stream"), but it might help solve your problem.