I have a FileForUploading class which should be uploaded to a database.
public class FileForUploading
{
    public FileForUploading(string filename, Stream stream)
    {
        this.Filename = filename;
        this.Stream = stream;
    }

    public string Filename { get; private set; }
    public Stream Stream { get; private set; }
}
I am using Entity Framework to convert it to a FileForUploadingEntity, a very simple class that contains only the Filename property. I don't want to hold the Stream in memory; I would rather upload it directly to the database.
What would be the best way to 'stream' the Stream directly to the database?
So far I have come up with this
private void UploadStream(string name, Stream stream)
{
    var sqlQuery = @"UPDATE dbo.FilesForUpload SET Content = @content WHERE Name = @name;";
    var nameParameter = new SqlParameter()
    {
        ParameterName = "@name",
        Value = name
    };
    var contentParameter = new SqlParameter()
    {
        ParameterName = "@content",
        Value = ConvertStream(stream),
        SqlDbType = SqlDbType.Binary
    };
    // the database context used throughout the application.
    this.context.Database.ExecuteSqlCommand(sqlQuery, contentParameter, nameParameter);
}
And here is my ConvertStream, which converts the Stream to a byte[]. (It is stored as a varbinary(MAX) in the database.)
private static byte[] ConvertStream(Stream stream)
{
    using (var memoryStream = new MemoryStream())
    {
        stream.CopyTo(memoryStream);
        return memoryStream.ToArray();
    }
}
Is the above solution good enough? Will it perform well if the Stream is large?
"I don't want to store the Stream in memory but rather upload it directly to the database."
With the solution you proposed, the whole content of the stream still ends up in your application's memory (ConvertStream copies it into a byte[]), which is exactly what you said you wanted to avoid.
Your best bet is to go around EF and use the async ADO.NET API to upload the stream. The following example is taken from the MSDN article SqlClient Streaming Support.
// Application transferring a large BLOB to SQL Server in .Net 4.5
private static async Task StreamBLOBToServer() {
    using (SqlConnection conn = new SqlConnection(connectionString)) {
        await conn.OpenAsync();
        using (SqlCommand cmd = new SqlCommand("INSERT INTO [BinaryStreams] (bindata) VALUES (@bindata)", conn)) {
            using (FileStream file = File.Open("binarydata.bin", FileMode.Open)) {
                // Add a parameter which uses the FileStream we just opened
                // Size is set to -1 to indicate "MAX"
                cmd.Parameters.Add("@bindata", SqlDbType.Binary, -1).Value = file;
                // Send the data to the server asynchronously
                await cmd.ExecuteNonQueryAsync();
            }
        }
    }
}
You could convert this sample to the following to make it work for you. Note that you should change the signature on your method to make it async so you can take advantage of not having a thread blocked during a long lasting database update.
// Change the signature to async so the thread can be released during the database update/insert.
private async Task UploadStreamAsync(string name, Stream stream) {
    // The DbContext exposes a DbConnection; cast it to SqlConnection (assumes SQL Server).
    var conn = (SqlConnection)this.context.Database.Connection;
    if (conn.State != ConnectionState.Open)
        await conn.OpenAsync();
    using (SqlCommand cmd = new SqlCommand("UPDATE dbo.FilesForUpload SET Content = @content WHERE Name = @name;", conn)) {
        cmd.Parameters.Add(new SqlParameter() { ParameterName = "@name", Value = name });
        // Size is set to -1 to indicate "MAX"
        cmd.Parameters.Add("@content", SqlDbType.Binary, -1).Value = stream;
        // Send the data to the server asynchronously
        await cmd.ExecuteNonQueryAsync();
    }
}
One more note. If you want to save large unstructured data sets (i.e. the Streams you are getting uploaded), it might be a better idea not to save them in the database. There are numerous reasons why, but foremost are that relational databases were not really designed with this in mind, that it's cumbersome to work with the data, and that large blobs can chew up database space really fast, making other operations (e.g. backups and restores) more difficult.
There is an alternative that still natively allows you to save a pointer in the record but have the actual unstructured data reside on disk. You can do this using the SQL Server FILESTREAM feature. In ADO.NET you would be working with SqlFileStream. Here is a good walk-through on how to configure your SQL Server instance and database to allow FILESTREAM data. It also has some VB.NET examples on how to use the SqlFileStream class.
An Introduction to SQL Server FileStream
I did assume you were using Microsoft SQL Server as your data repository. If this assumption is not correct, please update your question and add a tag for the database service you are connecting to.
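As a rough C# counterpart to the VB.NET examples mentioned above, the sketch below copies a Stream into a FILESTREAM column through SqlFileStream. It rests on several assumptions that are not taken from the walk-through: that dbo.FilesForUpload.Content has been declared as a varbinary(max) FILESTREAM column, that a row for the given name already exists with a non-NULL Content value (for example 0x), and the helper name FileStreamUploader itself, which is made up for illustration.

```csharp
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper; table and column names are assumptions.
public static class FileStreamUploader
{
    public static async Task UploadAsync(string connectionString, string name, Stream source)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            await conn.OpenAsync();

            // SqlFileStream can only be used inside a transaction.
            using (SqlTransaction tx = conn.BeginTransaction())
            {
                string path;
                byte[] txContext;

                const string sql =
                    "SELECT Content.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() " +
                    "FROM dbo.FilesForUpload WHERE Name = @name;";

                using (var cmd = new SqlCommand(sql, conn, tx))
                {
                    cmd.Parameters.Add("@name", SqlDbType.NVarChar, 260).Value = name;
                    using (SqlDataReader reader = await cmd.ExecuteReaderAsync())
                    {
                        if (!await reader.ReadAsync())
                            throw new FileNotFoundException("No row found for " + name);

                        // PathName() is NULL when Content is NULL, hence the non-NULL assumption.
                        path = reader.GetString(0);
                        txContext = (byte[])reader.GetValue(1);
                    }
                }

                // Copy the upload to the file system location managed by SQL Server.
                using (var target = new SqlFileStream(path, txContext, FileAccess.Write))
                {
                    await source.CopyToAsync(target);
                }

                tx.Commit();
            }
        }
    }
}
```

The copy runs in buffered chunks through CopyToAsync, so the upload is never materialized as a single byte[] in the application.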
I have a project where I need to copy the contents of the .xlsx file I received in Web API Controller (in the form of the Stream from MultipartReader) to SQL Server Database. I'm using SqlBulkCopy for copying itself (I already did a similar task for .csv files), but all of the solutions I was able to find suffer from one or more of the following problems:
Require saving the file to the disk first (not possible in my case)
Don't have any way of reading the file asynchronously
Load entire file into memory first (I'm expecting to deal with fairly large files, so this is not acceptable for me)
Are commercially licensed
Are there any ways of doing this?
Jeroen is correct, in that it is not possible to handle Excel files in a purely streaming manner. While it might require loading the entire .xlsx file in memory, the efficiency of the library can have an even larger impact on the memory usage than the file size. I say this as the author of the most efficient Excel reader for .NET: Sylvan.Data.Excel.
In benchmarks comparing it to other libraries, you can see that not only is it significantly faster than other implementations, but it also uses only a tiny fraction of the memory that other libraries consume.
With the exception of "Load entire file into memory first", it should satisfy all of your requirements. It can process data out of a MemoryStream, it doesn't need to write to disk. It implements DbDataReader which provides ReadAsync. The ReadAsync implementation defaults to the base DbDataReader implementation which defers to the synchronous Read() method, but when the file is buffered in a MemoryStream this doesn't present a problem, and allows the SqlBulkCopy.WriteToServerAsync to process it asynchronously. Finally, it is MIT licensed, so you can do whatever you want with it.
using Sylvan.Data;
using Sylvan.Data.Excel;
using System.Data.Common;
using System.Data.SqlClient;
// provide a schema that maps the columns in the Excel file to the names/types in your database.
var opts = new ExcelDataReaderOptions
{
    Schema = MyDataSchemaProvider.Instance
};
var filename = "mydata.xlsx";
var ms = new MemoryStream();
// asynchronously load the file into memory
// this might be loading from an Asp.NET IFormFile instead
using (var f = File.OpenRead(filename))
{
    await f.CopyToAsync(ms);
    ms.Seek(0, SeekOrigin.Begin);
}
// determine the workbook type from the file-extension
var workbookType = ExcelDataReader.GetWorkbookType(filename);
var edr = ExcelDataReader.Create(ms, workbookType, opts);
// "select" the columns to load. This extension method comes from the Sylvan.Data library.
var dataToLoad = edr.Select("PartNumber", "ServiceDate");
// bulk copy the data to the server.
var conn = new SqlConnection("Data Source=.;Initial Catalog=mydb;Integrated Security=true;");
conn.Open();
var bc = new SqlBulkCopy(conn);
bc.DestinationTableName = "MyData";
bc.EnableStreaming = true;
await bc.WriteToServerAsync(dataToLoad);
// Implement an ExcelSchemaProvider that maps the columns in the excel file
sealed class MyDataSchemaProvider : ExcelSchemaProvider
{
    public static ExcelSchemaProvider Instance = new MyDataSchemaProvider();

    static readonly DbColumn PartNumber = new MyColumn("PartNumber", typeof(int));
    static readonly DbColumn ServiceDate = new MyColumn("ServiceDate", typeof(DateTime));
    // etc...

    static readonly Dictionary<string, DbColumn> Mapping = new Dictionary<string, DbColumn>(StringComparer.OrdinalIgnoreCase)
    {
        { "partnumber", PartNumber },
        { "number", PartNumber },
        { "prt_nmbr", PartNumber },
        { "servicedate", ServiceDate },
        { "service_date", ServiceDate },
        { "svc_dt", ServiceDate },
        { "sd", ServiceDate },
    };

    public override DbColumn? GetColumn(string sheetName, string? name, int ordinal)
    {
        if (string.IsNullOrEmpty(name))
        {
            // There was no name in the header row, can't map to anything.
            return null;
        }
        if (Mapping.TryGetValue(name, out DbColumn? col))
        {
            return col;
        }
        // header name is unknown. Might be better to throw in this case.
        return null;
    }

    class MyColumn : DbColumn
    {
        public MyColumn(string name, Type type, bool allowNull = false)
        {
            this.ColumnName = name;
            this.DataType = type;
            this.AllowDBNull = allowNull;
        }
    }

    public override bool HasHeaders(string sheetName)
    {
        return true;
    }
}
The most complicated part of this is probably the "schema provider" which is used to provide header name mappings and define the column types, which are required for SqlBulkCopy to operate correctly.
I also maintain the Sylvan.Data.Csv library, which provides very similar capabilities for CSV files and is a fully asynchronous streaming CSV reader implementation. The API it provides is nearly identical to the Sylvan ExcelDataReader. It is also the fastest CSV reader for .NET.
If you end up trying these libraries and have any trouble, open an issue in the GitHub repo and I can take a look.
In my database table, the PDFs are saved as BLOB data.
What I'm trying to do now is to create a PDF file out of this data.
My code looks like this:
SqlConnection con = new SqlConnection(connectionString);
con.Open();
if (con.State == ConnectionState.Open)
{
    string query = // fancy SELECTION string goes here... reads only one by the way
    using (SqlCommand command = new SqlCommand(query, con))
    {
        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                Byte[] bytes = (Byte[])reader["File BLOB-Contents"];
                Console.WriteLine(bytes.Length); // prints the correct file size in Bytes
                using (FileStream fstream = new FileStream(@"C:\Users\myUsername\Desktop\test3.pdf", FileMode.OpenOrCreate, FileAccess.ReadWrite))
                {
                    fstream.Write(bytes, 0, bytes.Length);
                }
            }
        }
    }
}
The PDF gets created in the end, but the problem is that I can't open it. I get the following (German) message in Adobe Reader:
Does anyone have an idea, or is there something I'm doing wrong? The file size is OK; it's not 0.
When storing something like a PDF file in SQL Server, I would recommend converting the PDF file into a byte array and putting it into a column that is varbinary(max) instead of image.
Honestly, I think the recommended way of doing this is to keep the file out of the DB entirely, in either local file storage or a storage service like an AWS S3 bucket, and to store only its location in the database.
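One way to narrow down a problem like the unreadable PDF in the question is to check whether the bytes coming back from the database start with the PDF signature %PDF- before writing them out. The sketch below is only an illustration of that idea; the class and method names are invented for this example. It also writes with File.WriteAllBytes, which replaces an existing file, whereas FileMode.OpenOrCreate keeps the old trailing bytes if a longer file already existed at that path.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical diagnostic helper; names are made up for this sketch.
public static class PdfBlobCheck
{
    // A well-formed PDF file starts with the ASCII bytes "%PDF-".
    private static readonly byte[] Signature = Encoding.ASCII.GetBytes("%PDF-");

    public static bool LooksLikePdf(byte[] bytes)
    {
        if (bytes == null || bytes.Length < Signature.Length)
            return false;

        for (int i = 0; i < Signature.Length; i++)
        {
            if (bytes[i] != Signature[i])
                return false;
        }
        return true;
    }

    public static void SavePdf(byte[] bytes, string path)
    {
        if (!LooksLikePdf(bytes))
            throw new InvalidDataException(
                "The BLOB does not start with %PDF-; it may be wrapped or encoded in some other format.");

        // Creates the file or overwrites and truncates an existing one.
        File.WriteAllBytes(path, bytes);
    }
}
```

If LooksLikePdf returns false even though the length is right, the column probably holds something other than the raw PDF bytes, and the fix belongs on the side that wrote the data.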
I am using CouchDB as a content management store and upload files to it as binary data. Unlike MongoDB, it has no GridFS support for uploading large files, so I need to upload files in chunks and then retrieve them as one file.
Here is my code:
public string InsertDataToCouchDb(string dbName, string id, string filename, byte[] image)
{
    var connection = System.Configuration.ConfigurationManager.ConnectionStrings["CouchDb"].ConnectionString;
    using (var db = new MyCouchClient(connection, dbName))
    {
        // HERE I NEED TO UPLOAD MY IMAGE BYTE[] AS CHUNKS
        var artist = new couchdb
        {
            _id = id,
            filename = filename,
            Image = image
        };
        var response = db.Entities.PutAsync(artist);
        return response.Result.Content._id;
    }
}

public byte[] FetchDataFromCouchDb(string dbName, string id)
{
    var connection = System.Configuration.ConfigurationManager.ConnectionStrings["CouchDb"].ConnectionString;
    using (var db = new MyCouchClient(connection, dbName))
    {
        // HERE I NEED TO RETRIEVE MY FULL IMAGE BYTE[] FROM CHUNKS
        var test = db.Documents.GetAsync(id, null);
        var doc = db.Serializer.Deserialize<couchdb>(test.Result.Content);
        return doc.Image;
    }
}
THANK YOU
Putting image data in a CouchDB document is a terrible idea. Just don't. This is the purpose of CouchDB attachments.
The potential of bloating the database with redundant blob data via document updates alone will surely have major, negative consequences for anything other than a toy database.
Further, there seems to be a misunderstanding of how async/await works: the code in the OP invokes async methods, e.g. db.Entities.PutAsync(artist), without an await and then blocks on .Result. That compiles, but it ties up the calling thread for the whole call and can deadlock when a synchronization context is present. I highly recommend reading the Microsoft document Asynchronous programming with async and await.
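To make the difference concrete, here is a small, self-contained sketch that contrasts blocking on a task with awaiting it. SaveAsync is a stand-in invented for this example that merely simulates an asynchronous call such as db.Entities.PutAsync; it is not part of MyCouch.

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitDemo
{
    // Hypothetical stand-in for an asynchronous database call.
    public static async Task<string> SaveAsync(string id)
    {
        await Task.Delay(50); // simulate network latency
        return id;
    }

    // Blocking style, as in the question: the calling thread waits inside .Result.
    public static string InsertBlocking(string id)
    {
        return SaveAsync(id).Result;
    }

    // Awaited style: the calling thread is released while the operation is in flight.
    public static async Task<string> InsertAsync(string id)
    {
        return await SaveAsync(id);
    }
}
```

Applied to the question's code, this would mean changing InsertDataToCouchDb to return Task<string> and awaiting the PutAsync call, with callers awaiting in turn.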
Now as for "chunking": If the image data is so large that it needs to be otherwise streamed, the business of passing it around via a byte array looks bad. If the images are relatively small, just use Attachment.PutAsync as it stands.
Although Attachment.PutAsync at MyCouch v7.6 does not support streams (effectively chunking) there exists the Support Streams for attachments #177 PR, which does, and it looks pretty good.
Here's a one page C# .Net Core console app that uploads a given file as an attachment to a specific document using the very efficient streaming provided by PR 177. Although the code uses PR 177, it most importantly uses Attachments for blob data. Replacing a stream with a byte array is rather straightforward.
MyCouch + PR 177
In a console get MyCouch sources and then apply PR 177
$ git clone https://github.com/danielwertheim/mycouch.git
$ cd mycouch
$ git pull origin 15a1079502a1728acfbfea89a7e255d0c8725e07
(I don't know git so there's probably a far better way to get a PR)
MyCouchUploader
With VS2019
Create a new .Net Core console app project and solution named "MyCouchUploader"
Add the MyCouch project pulled with PR 177 to the solution
Add the MyCouch project as MyCouchUploader dependency
Add the Nuget package "Microsoft.AspNetCore.StaticFiles" as a MyCouchUploader dependency
Replace the content of Program.cs with the following code:
using Microsoft.AspNetCore.StaticFiles;
using MyCouch;
using MyCouch.Requests;
using MyCouch.Responses;
using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Security.Cryptography;
using System.Threading.Tasks;

namespace MyCouchUploader
{
    class Program
    {
        static async Task Main(string[] args)
        {
            // args: scheme, database, file path of asset to upload.
            if (args.Length < 3)
            {
                Console.WriteLine("\nUsage: MyCouchUploader scheme dbname filepath\n");
                return;
            }
            var opts = new
            {
                scheme = args[0],
                dbName = args[1],
                filePath = args[2]
            };
            Action<Response> check = (response) =>
            {
                if (!response.IsSuccess) throw new Exception(response.Reason);
            };
            try
            {
                // canned doc id for this app
                const string docId = "SO-68998781";
                const string attachmentName = "Image";
                DbConnectionInfo cnxn = new DbConnectionInfo(opts.scheme, opts.dbName)
                {   // timely fail if scheme is bad
                    Timeout = TimeSpan.FromMilliseconds(3000)
                };
                MyCouchClient client = new MyCouchClient(cnxn);
                // ensure db is there
                GetDatabaseResponse info = await client.Database.GetAsync();
                check(info);
                // delete doc for successive program runs
                DocumentResponse doc = await client.Documents.GetAsync(docId);
                if (doc.StatusCode == HttpStatusCode.OK)
                {
                    DocumentHeaderResponse del = await client.Documents.DeleteAsync(docId, doc.Rev);
                    check(del);
                }
                // sniff file for content type
                FileExtensionContentTypeProvider provider = new FileExtensionContentTypeProvider();
                if (!provider.TryGetContentType(opts.filePath, out string contentType))
                {
                    contentType = "application/octet-stream";
                }
                // create a hash for silly verification
                using var md5 = MD5.Create();
                using Stream stream = File.OpenRead(opts.filePath);
                byte[] fileHash = md5.ComputeHash(stream);
                stream.Position = 0;
                // Use PR 177, sea-locks:stream-attachments.
                DocumentHeaderResponse put = await client.Attachments.PutAsync(new PutAttachmentStreamRequest(
                    docId,
                    attachmentName,
                    contentType,
                    stream // :-D
                ));
                check(put);
                // verify
                AttachmentResponse verify = await client.Attachments.GetAsync(docId, attachmentName);
                check(verify);
                if (fileHash.SequenceEqual(md5.ComputeHash(verify.Content)))
                {
                    Console.WriteLine("Attachment verified.");
                }
                else
                {
                    throw new Exception(String.Format("Attachment failed verification with status code {0}", verify.StatusCode));
                }
            }
            catch (Exception e)
            {
                Console.WriteLine("Fail! {0}", e.Message);
            }
        }
    }
}
To run:
$ MyCouchUploader http://name:password@localhost:5984 dbname path-to-local-image-file
Use Fauxton to visually verify the attachment for the doc.
I want to stream a record from SQL Server. The record contains a byte array (image) and metadata about the image. All I've seen so far is how to stream the column, not the record.
I have 2 questions:
How can I stream the whole record and populate my class?
Why is ExecuteSprocAccessor<>, which I assume buffers, just as performant as the stream with a 65MB image (about 90 seconds)?
I inherited this code and think it's too slow and resource intensive, so I'm looking for alternatives and this makes the most sense, I think.
My attempt at streaming (it works):
System.IO.Stream stream;
using (var cmd = _mgr.GetStoredProcCommand("dbo.uspUserDocument_Select", paramValues))
{
    cmd.Connection = _mgr.CreateConnection();
    cmd.CommandTimeout = 180;
    await cmd.Connection.OpenAsync();
    using (var dr = await cmd.ExecuteReaderAsync(System.Data.CommandBehavior.SequentialAccess).ConfigureAwait(false))
    {
        while (await dr.ReadAsync())
        {
            stream = dr.GetStream(4);
            // I NEED TO GET THE WHOLE RECORD AND POPULATE A CLASS
        }
    }
}
Using ExecuteSprocAccessor:
return _mgr.ExecuteSprocAccessor<UserDocument>("dbo.uspUserDocument_Select",rowMapper, paramValues).ToList();
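With CommandBehavior.SequentialAccess, columns have to be read in increasing ordinal order, so one way to populate a whole record is to read the metadata columns first and take the stream from the BLOB column last. The sketch below assumes the BLOB sits at ordinal 4, as in the streaming attempt above. The UserDocumentRecord class and the four metadata columns are hypothetical, since the real layout of UserDocument and of the stored procedure's result set is not shown.

```csharp
using System;
using System.Data.Common;
using System.IO;
using System.Threading.Tasks;

// Hypothetical record type; the real UserDocument properties are unknown.
public sealed class UserDocumentRecord
{
    public int Id { get; set; }
    public string FileName { get; set; }
    public string ContentType { get; set; }
    public DateTime Created { get; set; }
}

public static class UserDocumentReader
{
    // Reads the row the reader is currently positioned on:
    // metadata columns first (ordinals 0-3), then the BLOB (ordinal 4).
    public static async Task<UserDocumentRecord> ReadRecordAsync(DbDataReader dr, Stream destination)
    {
        var record = new UserDocumentRecord();
        record.Id = dr.GetInt32(0);
        record.FileName = dr.GetString(1);
        record.ContentType = dr.GetString(2);
        record.Created = dr.GetDateTime(3);

        // The stream is only usable while the reader stays on this row,
        // so it is copied to the destination before returning.
        using (Stream blob = dr.GetStream(4))
        {
            await blob.CopyToAsync(destination);
        }
        return record;
    }
}
```

Inside the while (await dr.ReadAsync()) loop this could be called as await UserDocumentReader.ReadRecordAsync(dr, output), where output is a FileStream or the HTTP response body, so the image is never held in a single byte array.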
How do I read an MP3 from a SQL database? I have stored the file in SQL in binary format. Now I want to retrieve the MP3 file stored in SQL and play it in my .aspx page. How can I do that?
Please help.
In its simplest form this is how you would get the raw bytes, can't really show any more without knowing what you want it for...
private byte[] GetMp3Bytes(string connString)
{
    using (var conn = new SqlConnection(connString))
    using (var cmd = new SqlCommand("SELECT TOP 1 Mp3_File FROM MP3_Table", conn))
    {
        conn.Open();
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            // Return null when the table is empty instead of throwing.
            return reader.Read() ? reader["Mp3_File"] as byte[] : null;
        }
    }
}
You'd probably want to use a Generic ASHX Handler that retrieves the binary data and streams it to the response stream with the correct content-type header ("audio/mpeg").
If you look at the article Displaying Images in ASP.NET Using HttpHandlers then you should see the basic principle. You just need to change the content-type output.
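A minimal sketch of such a handler might look like the following. It reuses the MP3_Table and Mp3_File names from the snippet above; the handler name Mp3Handler and the connection string name "Media" are assumptions made up for this example, and a real handler would normally select the row by an id taken from the query string.

```csharp
using System.Configuration;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Web;

// Hypothetical handler, e.g. exposed as Mp3Handler.ashx.
public class Mp3Handler : IHttpHandler
{
    public bool IsReusable
    {
        get { return true; }
    }

    public void ProcessRequest(HttpContext context)
    {
        // "Media" is an assumed connection string name from web.config.
        string connString = ConfigurationManager.ConnectionStrings["Media"].ConnectionString;

        using (var conn = new SqlConnection(connString))
        using (var cmd = new SqlCommand("SELECT TOP 1 Mp3_File FROM MP3_Table", conn))
        {
            conn.Open();

            // SequentialAccess lets the BLOB be read as a stream instead of one big array.
            using (SqlDataReader reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
            {
                if (!reader.Read() || reader.IsDBNull(0))
                {
                    context.Response.StatusCode = 404;
                    return;
                }

                context.Response.ContentType = "audio/mpeg";
                context.Response.BufferOutput = false;

                using (Stream blob = reader.GetStream(0))
                {
                    blob.CopyTo(context.Response.OutputStream);
                }
            }
        }
    }
}
```

The .aspx page can then point an audio element or a plain link at the handler's URL, for example Mp3Handler.ashx.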