Stream record containing large byte array from SQL Server - c#

I want to stream a record from SQL Server. The record contains a byte array (image) and metadata about the image. All I've seen so far is how to stream the column, not the record.
I have two questions:
How can I stream the whole record and populate my class?
Why is ExecuteSprocAccessor<>, which I assume buffers the result, just as fast as the streaming version with a 65 MB image (about 90 seconds)?
I inherited this code and think it's too slow and resource intensive, so I'm looking for alternatives and this makes the most sense, I think.
My attempt at streaming (it works):
System.IO.Stream stream;
using (var cmd = _mgr.GetStoredProcCommand("dbo.uspUserDocument_Select", paramValues))
{
cmd.Connection = _mgr.CreateConnection();
cmd.CommandTimeout = 180;
await cmd.Connection.OpenAsync();
using (var dr = await cmd.ExecuteReaderAsync(System.Data.CommandBehavior.SequentialAccess).ConfigureAwait(false))
{
while (await dr.ReadAsync())
{
stream = dr.GetStream(4);
// I NEED TO GET THE WHOLE RECORD AND POPULATE A CLASS
}
}
}
Using ExecuteSprocAccessor:
return _mgr.ExecuteSprocAccessor<UserDocument>("dbo.uspUserDocument_Select",rowMapper, paramValues).ToList();
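A minimal sketch of reading the metadata and the image from the same row, assuming the reader was opened with CommandBehavior.SequentialAccess as in the attempt above. With sequential access the columns have to be read in increasing ordinal order and the stream has to be consumed before the reader advances. The UserDocumentInfo type, its properties, and the column order (0 to 3 metadata, 4 image) are hypothetical, because the question shows neither the UserDocument class nor the result set of the stored procedure.

```csharp
using System;
using System.Data.Common;
using System.IO;
using System.Threading.Tasks;

// Hypothetical metadata type; the real UserDocument class is not shown in the question.
public sealed class UserDocumentInfo
{
    public int Id { get; set; }
    public string FileName { get; set; }
    public string ContentType { get; set; }
    public DateTime CreatedOn { get; set; }
}

public static class UserDocumentReader
{
    // Assumed column order: 0 = Id, 1 = FileName, 2 = ContentType, 3 = CreatedOn, 4 = image bytes.
    // The reader must already be positioned on a row (Read/ReadAsync returned true).
    public static async Task<UserDocumentInfo> ReadRowAsync(DbDataReader reader, Stream destination)
    {
        var info = new UserDocumentInfo();

        // Read the small columns first, strictly in ordinal order.
        info.Id = reader.GetInt32(0);
        info.FileName = reader.GetString(1);
        info.ContentType = reader.GetString(2);
        info.CreatedOn = reader.GetDateTime(3);

        // Read the large column last and copy it out before the reader moves on.
        using (Stream image = reader.GetStream(4))
        {
            await image.CopyToAsync(destination).ConfigureAwait(false);
        }

        return info;
    }
}
```

Inside the while (await dr.ReadAsync()) loop this would be called as await UserDocumentReader.ReadRowAsync(dr, someOutputStream), where someOutputStream is a file or response stream rather than a MemoryStream if the goal is to avoid holding the image in memory.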

Related

Converting Blob Data (PDF) from SQL Database to a PDF-File

In my database table the PDFs are saved as BLOB data, for example:
What I'm trying to do now is to create a PDF file out of this data.
My code is like that:
SqlConnection con = new SqlConnection(connectionString);
con.Open();
if (con.State == ConnectionState.Open)
{
string query = // fancy SELECTION string goes here... reads only one by the way
using (SqlCommand command = new SqlCommand(query, con))
{
using (SqlDataReader reader = command.ExecuteReader())
{
while (reader.Read())
{
Byte[] bytes = (Byte[])reader["File BLOB-Contents"];
Console.WriteLine(bytes.Length); // prints the correct file size in Bytes
using (FileStream fstream = new FileStream(@"C:\Users\myUsername\Desktop\test3.pdf", FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
fstream.Write(bytes, 0, bytes.Length);
}
}
}
}
}
The PDF gets created in the end, but the problem is that I can't open it. I get the following (German) message in Adobe Reader:
Does anyone have an idea, or is there something I'm doing wrong? The file size is fine; it's not 0.
When storing something like a PDF file in SQL Server, I would recommend converting the PDF file into a byte array and then putting it into a column of type varbinary(max) instead of image.
Honestly, I think the recommended way of doing this is having the file reside not in the DB, but instead in either local file storage or some storage service like an AWS S3 bucket and have the location be stored in the database instead.

Appending to a Response Body in C# .NET Core

I am working on a project where I populate some PDFs on the back end, convert those PDFs into a List of byte[], merge that list into one very large array, and finally send it back via the response body as a MemoryStream.
My issue is that this is a large amount of data, and while merging the list of byte arrays I am using a lot of memory.
I am wondering if, instead of converting the final merged byte[] into a MemoryStream and adding that to the response body, I could create several MemoryStream objects that I can append to Response.Body as they are created. Alternatively, is there a way to use a single MemoryStream and just keep adding to it as I create each new byte[] for each PDF document?
Edit: This is probably a little long winded but I was too vague with my original post. At the core of what I am trying to do I have several pdf documents, they are each several pages long. Each of them is represented in the code below as one of the byte[] items in the filesToMerge List. Ideally, I would like to go through these one by one and convert them into a memory stream and send them to the client one right after the other in a loop. However, when I try to do this I get errors that the Response body has already been sent. Is there a way to append something to the response body so it is updated each time through the loop?
[HttpGet("template/{formId}/fillforms")]
public void FillForms(string formId/*, [FromBody] IList<IDictionary<string, string>> fieldDictionaries*/)
{
List<byte[]> filesToMerge = new List<byte[]>();
// For testing
var mockData = new MockData();
IList<IDictionary<string, string>> fieldDictionaries = mockData.GetMock1095Dictionaries();
foreach(IDictionary<string, string> dictionary in fieldDictionaries)
{
var populatedForm = this.dataRepo.PopulateForm(formId, dictionary);
// write to rb
filesToMerge.Add(populatedForm);
}
byte[] mergedFilesAsByteArray = this.dataRepo.GetMergedByteArray(filesToMerge);
this.SendResponse(formId + "_filled.pdf", new MemoryStream(mergedFilesAsByteArray));
}
private void SendResponse(string formName, MemoryStream ms, IDictionary<string, string> fieldData = null)
{
Response.Clear();
Response.ContentType = "application/pdf";
Response.Headers.Add("content-disposition", $"attachment;filename={formName}.pdf");
ms.WriteTo(Response.Body);
}
Memory streams are really just byte arrays with a bunch of nice methods on top, so switching to byte arrays won't help much. A problem that a lot of people run into with byte arrays and memory streams is not releasing the memory when they are done with the data. Both occupy the memory of the machine you are running on, so you can easily run out. Dispose of data as soon as you no longer need it, for example with using statements. MemoryStream has a Dispose method that releases all resources used by the stream.
If you want to transfer the data from your application as quickly as possible, the best approach is to cut the stream into smaller parts and reassemble them in the correct order at the destination. You could cut them to 1 MB or 126 KB, really whatever you want. When you send each part to the destination you also need to pass its order number, because this method allows you to POST the parts in parallel and there is no guarantee of arrival order.
To split a stream into multiple streams
private static List<MemoryStream> CreateChunks(Stream stream)
{
    byte[] buffer = new byte[4000000]; // maximum size of each chunk, in bytes
    var returnStreams = new List<MemoryStream>();
    while (true) // loop to the end of the source stream
    {
        // Read may return fewer bytes than requested, so a chunk can be smaller than the buffer.
        int read = stream.Read(buffer, 0, buffer.Length);
        if (read <= 0)
        {
            return returnStreams; // end of the source stream
        }
        var returnStream = new MemoryStream();
        returnStream.Write(buffer, 0, read); // copy only the bytes that were actually read
        returnStream.Position = 0; // rewind so the consumer reads from the start
        returnStreams.Add(returnStream);
    }
}
I then looped through the streams that were created to create tasks to post to the service and each task would post to the server. I would await all of the tasks to finish then call my server again to tell it I had finished uploading and it could combine all of the data into one in the correct order. My service has the concept of an upload session to keep track of all of the parts and which order they would go in. It would also save each part to the database as they came in; in my case Azure Blob storage.
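The reassembly step described above can be sketched as follows. This is only an illustration of "number each part, then combine by number"; the ChunkAssembler name and the use of KeyValuePair<int, byte[]> to carry the order number are assumptions, not the upload-session service described in the answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ChunkAssembler
{
    // Splits data into numbered parts of at most chunkSize bytes (key = order number).
    public static List<KeyValuePair<int, byte[]>> Split(byte[] data, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        var parts = new List<KeyValuePair<int, byte[]>>();
        for (int offset = 0, index = 0; offset < data.Length; offset += chunkSize, index++)
        {
            int length = Math.Min(chunkSize, data.Length - offset);
            var part = new byte[length];
            Buffer.BlockCopy(data, offset, part, 0, length);
            parts.Add(new KeyValuePair<int, byte[]>(index, part));
        }
        return parts;
    }

    // Rebuilds the original bytes from parts that may have arrived in any order.
    public static byte[] Combine(IEnumerable<KeyValuePair<int, byte[]>> parts)
    {
        using (var output = new MemoryStream())
        {
            foreach (var part in parts.OrderBy(p => p.Key))
            {
                output.Write(part.Value, 0, part.Value.Length);
            }
            return output.ToArray();
        }
    }
}
```

Because each part carries its own order number, the receiver does not depend on arrival order, which is what makes the parallel POSTs safe.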
It's not clear why you would be getting errors copying the contents of multiple MemoryStreams to the Response.Body. You should certainly be able to do this, although you'll need to be sure not to try and change response headers or the status code after you begin writing data (also don't try to call Response.Clear() after you begin writing data).
Here is a simple example of starting a response and then writing data:
[ApiController]
[Route("[controller]")]
public class RandomDataController : ControllerBase {
private readonly ILogger<RandomDataController> logger;
private const String CharacterData = "abcdefghijklmnopqrstuvwxyz0123456789 ";
public RandomDataController(ILogger<RandomDataController> logger) {
this.logger = logger;
}
[HttpGet]
public async Task Get(CancellationToken cancellationToken) {
this.Response.ContentType = "text/plain";
this.Response.ContentLength = 1000;
await this.Response.StartAsync(cancellationToken);
logger.LogInformation("Response Started");
var rand = new Random();
for (var i = 0; i < 1000; i++) {
// You should be able to copy the contents of a MemoryStream or other buffer here instead of sending random data like this does.
await this.Response.Body.WriteAsync(Encoding.UTF8.GetBytes(CharacterData[rand.Next(0, CharacterData.Length)].ToString()), cancellationToken);
await Task.Delay(50, cancellationToken); // This is just to demonstrate that data is being sent to the client as it is written
cancellationToken.ThrowIfCancellationRequested();
if (i % 100 == 0 && i > 0) {
logger.LogInformation("Response In Flight {PercentComplete}", (Double)i / 1000);
}
}
logger.LogInformation("Response Complete");
}
}
You can verify that this streams data back to the client using netcat:
% nc -nc 127.0.0.1 5000
GET /randomdata HTTP/1.1
Host: localhost:5000
Connection: Close
(Enter an extra blank line after Connection: Close to begin the request). You should see data appear in netcat as it is written to Response.Body on the server.
One thing to note is that this approach involves calculating the length of the data to be sent up front. If you are unable to calculate the size of the response up front, or prefer not to, you can look into Chunked Transfer Encoding, which ASP.Net should automatically use if you start writing data to the Response.Body without specifying the Content-Length.
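Applied to the question, the same idea can be sketched as a helper that writes each buffer to the output stream as soon as it is available, so that no merged array is ever built. SequentialWriter and WriteAllAsync are hypothetical names; in a controller the output argument would be Response.Body, with headers set before the first write. Note that this only shows the stream mechanics: byte-concatenating separate PDF files does not by itself produce a valid merged PDF, so the merge logic from the question would still have to produce the bytes being written.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class SequentialWriter
{
    // Writes every buffer to the output in order and returns the total byte count.
    // If buffers is a lazy sequence (for example an iterator using yield return),
    // only one buffer has to be alive at a time.
    public static async Task<long> WriteAllAsync(
        IEnumerable<byte[]> buffers,
        Stream output,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        long total = 0;
        foreach (byte[] buffer in buffers)
        {
            await output.WriteAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false);
            total += buffer.Length;
        }
        await output.FlushAsync(cancellationToken).ConfigureAwait(false);
        return total;
    }
}
```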

Sending GZipped data via TcpClient [duplicate]

I've got a pesky problem with gzipstream targeting .Net 3.5. This is my first time working with gzipstream, however I have modeled after a number of tutorials including here and I'm still stuck.
My app serializes a datatable to xml and inserts it into a database, storing the compressed data in a varbinary(max) field along with the original length of the uncompressed buffer. Then, when I need it, I retrieve this data, decompress it, and recreate the datatable. The decompress is what seems to fail.
EDIT: Sadly after changing the GetBuffer to ToArray as suggested, my issue remains. Code Updated below
Compress code:
DataTable dt = new DataTable("MyUnit");
//do stuff with dt
//okay... now compress the table
using (MemoryStream xmlstream = new MemoryStream())
{
//instead of stream, use xmlwriter?
System.Xml.XmlWriterSettings settings = new System.Xml.XmlWriterSettings();
settings.Encoding = Encoding.GetEncoding(1252);
settings.Indent = false;
System.Xml.XmlWriter writer = System.Xml.XmlWriter.Create(xmlstream, settings);
try
{
dt.WriteXml(writer);
writer.Flush();
}
catch (ArgumentException)
{
//likely an encoding issue... okay, base64 encode it
var base64 = Convert.ToBase64String(xmlstream.ToArray());
xmlstream.Write(Encoding.GetEncoding(1252).GetBytes(base64), 0, Encoding.GetEncoding(1252).GetBytes(base64).Length);
}
using (MemoryStream zipstream = new MemoryStream())
{
GZipStream zip = new GZipStream(zipstream, CompressionMode.Compress);
log.DebugFormat("Compressing commands...");
zip.Write(xmlstream.GetBuffer(), 0, xmlstream.ToArray().Length);
zip.Flush();
float ratio = (float)zipstream.ToArray().Length / (float)xmlstream.ToArray().Length;
log.InfoFormat("Resulting compressed size is {0:P2} of original", ratio);
using (SqlCommand cmd = new SqlCommand())
{
cmd.CommandText = "INSERT INTO tinydup (lastid, command, compressedlength) VALUES (@lastid,@compressed,@length)";
cmd.Connection = db;
cmd.Parameters.Add("@lastid", SqlDbType.Int).Value = lastid;
cmd.Parameters.Add("@compressed", SqlDbType.VarBinary).Value = zipstream.ToArray();
cmd.Parameters.Add("@length", SqlDbType.Int).Value = xmlstream.ToArray().Length;
cmd.ExecuteNonQuery();
}
}
Decompress Code:
/* This is an encapsulation of what I get from the database
public class DupUnit{
public uint lastid;
public uint complength;
public byte[] compressed;
}*/
//I have already retrieved my list of work to do from the database in a List<Dupunit> dupunits
foreach (DupUnit unit in dupunits)
{
DataSet ds = new DataSet();
//DataTable dt = new DataTable();
//uncompress and extract to original datatable
try
{
using (MemoryStream zipstream = new MemoryStream(unit.compressed))
{
GZipStream zip = new GZipStream(zipstream, CompressionMode.Decompress);
byte[] xmlbits = new byte[unit.complength];
//WHY ARE YOU ALWAYS 0!!!!!!!!
int bytesdecompressed = zip.Read(xmlbits, 0, unit.compressed.Length);
MemoryStream xmlstream = new MemoryStream(xmlbits);
log.DebugFormat("Uncompressed XML against {0} is: {1}", m_source.DSN, Encoding.GetEncoding(1252).GetString(xmlstream.ToArray()));
try{
ds.ReadXml(xmlstream);
}catch(Exception)
{
//it may have been base64 encoded... decode first.
ds.ReadXml(Encoding.GetEncoding(1254).GetString(
Convert.FromBase64String(
Encoding.GetEncoding(1254).GetString(xmlstream.ToArray())))
);
}
xmlstream.Dispose();
}
}
catch (Exception e)
{
log.Error(e);
Thread.Sleep(1000);//sleep a sec!
continue;
}
Note the comment above... bytesdecompressed is always 0. Any ideas? Am I doing it wrong?
EDIT 2:
So this is weird. I added the following debug code to the decompression routine:
GZipStream zip = new GZipStream(zipstream, CompressionMode.Decompress);
byte[] xmlbits = new byte[unit.complength];
int offset = 0;
while (zip.CanRead && offset < xmlbits.Length)
{
while (zip.Read(xmlbits, offset, 1) == 0) ;
offset++;
}
When debugging, sometimes that loop would complete, but other times it would hang. When I'd stop the debugging, it would be at byte 1600 out of 1616. I'd continue, but it wouldn't move at all.
EDIT 3: The bug appears to be in the compress code. For whatever reason, it is not saving all of the data. When I try to decompress the data using a third party gzip mechanism, I only get part of the original data.
I'd start a bounty, but I really don't have much reputation to give as of now :-(
Finally found the answer. The compressed data wasn't complete because GZipStream.Flush() does absolutely nothing to ensure that all of the data is out of the buffer - you need to use GZipStream.Close() as pointed out here. Of course, if you get a bad compress, it all goes downhill - if you try to decompress it, you will always get 0 returned from the Read().
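A minimal round-trip sketch of that fix, assuming only System.IO.Compression. The GZipHelper class and its method names are made up for illustration. The GZipStream is closed (here by the end of its using block) before the compressed bytes are taken from the MemoryStream, and the decompress side loops on Read instead of expecting a single call to fill the buffer.

```csharp
using System.IO;
using System.IO.Compression;

public static class GZipHelper
{
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen: true keeps the MemoryStream usable after the GZipStream is closed.
            using (var zip = new GZipStream(output, CompressionMode.Compress, true))
            {
                zip.Write(data, 0, data.Length);
            } // closing the GZipStream here completes the compressed data

            // Take the bytes only after the GZipStream has been closed.
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var zip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            var buffer = new byte[4096];
            int read;
            // Read can return fewer bytes than requested; keep reading until it returns 0.
            while ((read = zip.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            return output.ToArray();
        }
    }
}
```

With the loop on the decompress side, the stored original length is no longer needed to size a buffer up front.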
I'd say this line, at least, is the most wrong:
cmd.Parameters.Add("@compressed", SqlDbType.VarBinary).Value = zipstream.GetBuffer();
MemoryStream.GetBuffer:
Note that the buffer contains allocated bytes which might be unused. For example, if the string "test" is written into the MemoryStream object, the length of the buffer returned from GetBuffer is 256, not 4, with 252 bytes unused. To obtain only the data in the buffer, use the ToArray method.
It should be noted that in the zip format, it first works by locating data stored at the end of the file - so if you've stored more data than was required, the required entries at the "end" of the file don't exist.
As an aside, I'd also recommend a different name for your compressedlength column - I'd initially taken it (despite your narrative) as being intended to store, well, the length of the compressed data (and written part of my answer to address that). Maybe originalLength would be a better name?
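The GetBuffer versus ToArray difference quoted from the documentation can be checked with a few lines. BufferLengths.Measure is a hypothetical helper written for this illustration; the exact buffer size is an implementation detail, so the sketch only relies on the buffer being at least as long as the data.

```csharp
using System.IO;
using System.Text;

public static class BufferLengths
{
    // Returns { ToArray length, GetBuffer length } after writing text to a MemoryStream.
    public static int[] Measure(string text)
    {
        using (var ms = new MemoryStream())
        {
            byte[] bytes = Encoding.ASCII.GetBytes(text);
            ms.Write(bytes, 0, bytes.Length);

            // ToArray copies only the bytes written; GetBuffer exposes the whole
            // internal buffer, including any unused capacity at the end.
            return new[] { ms.ToArray().Length, ms.GetBuffer().Length };
        }
    }
}
```

Storing the GetBuffer result therefore appends unused trailing bytes to the compressed data, which is why ToArray (or GetBuffer together with the stream's Length) is the safer choice for the parameter value.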

Uploading Stream to Database

I have a FileForUploading class which should be uploaded to a database.
public class FileForUploading
{
public FileForUploading(string filename, Stream stream)
{
this.Filename = filename;
this.Stream = stream;
}
public string Filename { get; private set; }
public Stream Stream { get; private set; }
}
I am using the Entity Framework to convert it to a FileForUploadingEntity
which is a very simple class that however only contains the Filename property. I don't want to store the Stream in memory but rather upload it directly to the database.
What would be the best way to 'stream' the Stream directly to the database?
So far I have come up with this
private void UploadStream(string name, Stream stream)
{
var sqlQuery = @"UPDATE dbo.FilesForUpload SET Content =@content WHERE Name=@name;";
var nameParameter = new SqlParameter()
{
ParameterName = "@name",
Value = name
};
var contentParameter = new SqlParameter()
{
ParameterName = "@content",
Value = ConvertStream(stream),
SqlDbType = SqlDbType.Binary
};
// the database context used throughout the application.
this.context.Database.ExecuteSqlCommand(sqlQuery, contentParameter, nameParameter);
}
And here is my ConvertStream, which converts the Stream to a byte[]. (It is stored as a varbinary(MAX) in the database.)
private static byte[] ConvertStream(Stream stream)
{
using (var memoryStream = new MemoryStream())
{
stream.CopyTo(memoryStream);
return memoryStream.ToArray();
}
}
Is the above solution good enough? Will it perform well if the Stream is large?
I don't want to store the Stream in memory but rather upload it directly to the database.
With the above solution you proposed you still have the content of the stream in memory in your application which you mentioned initially is something you were trying to avoid.
Your best bet is to go around EF and use the async function to upload the stream. The following example is taken from MSDN article SqlClient Streaming Support.
// Application transferring a large BLOB to SQL Server in .Net 4.5
private static async Task StreamBLOBToServer() {
using (SqlConnection conn = new SqlConnection(connectionString)) {
await conn.OpenAsync();
using (SqlCommand cmd = new SqlCommand("INSERT INTO [BinaryStreams] (bindata) VALUES (@bindata)", conn)) {
using (FileStream file = File.Open("binarydata.bin", FileMode.Open)) {
// Add a parameter which uses the FileStream we just opened
// Size is set to -1 to indicate "MAX"
cmd.Parameters.Add("@bindata", SqlDbType.Binary, -1).Value = file;
// Send the data to the server asynchronously
await cmd.ExecuteNonQueryAsync();
}
}
}
}
You could convert this sample to the following to make it work for you. Note that you should change the signature on your method to make it async so you can take advantage of not having a thread blocked during a long lasting database update.
// Change your signature to async so the thread can be released during the database update/insert.
private async Task UploadStreamAsync(string name, Stream stream) {
    // Database.Connection is typed as DbConnection; the cast assumes the DbContext is backed by SQL Server.
    var conn = (SqlConnection)this.context.Database.Connection;
    if (conn.State != ConnectionState.Open)
        await conn.OpenAsync();
    using (SqlCommand cmd = new SqlCommand("UPDATE dbo.FilesForUpload SET Content = @content WHERE Name = @name;", conn)) {
        cmd.Parameters.Add(new SqlParameter() { ParameterName = "@name", Value = name });
        // Size is set to -1 to indicate "MAX"
        cmd.Parameters.Add("@content", SqlDbType.Binary, -1).Value = stream;
        // Send the data to the server asynchronously
        await cmd.ExecuteNonQueryAsync();
    }
}
One more note. If you want to save large unstructured data sets (i.e. the Streams you are getting uploaded), it might be a better idea not to save them in the database. There are numerous reasons why, but foremost is that relational databases were not really designed with this in mind: it's cumbersome to work with the data, and it can chew up database space really fast, making other operations (e.g. backups and restores) more difficult.
There is an alternative that still natively allows you to save a pointer in the record but have the actual unstructured data reside on disk. You can do this using the SQL Server FILESTREAM feature. In ADO.NET you would work with SqlFileStream. Here is a good walkthrough on how to configure your SQL Server and database to allow FILESTREAM. It also has some VB.NET examples of how to use the SqlFileStream class.
An Introduction to SQL Server FileStream
I did assume you were using Microsoft Sql Server as your data repository. If this assumption is not correct please update your question and also add a tag for the correct database service you are connecting to.

Convert a byte[] to Image without using a MemoryStream

I am having a problem exporting SQL images to files. I first initialize a List<MyRecord>. MyRecord is a class with GraphicName and Graphic properties. When I try to go through the list and save MyRecord.Graphic to disk, I get a first chance exception of type 'System.ObjectDisposedException'. I realize this is because I used a using statement with the MemoryStream when I converted the bytes from the database to an Image. If I leave out the using statement it all works, but I am worried about memory usage and memory leaks with up to 6,000 records. Is there another way to convert the bytes to an image, or is there a better design for this?
... prior code
using (SqlDataReader reader = sqlCommand.ExecuteReader())
{
while (reader.Read())
{
MyRecord record = new MyRecord();
record.GraphicId = reader["GRAPHIC_ID"].ToString();
record.Graphic = !reader.IsDBNull(reader.GetOrdinal("IMAGE")) ? GetImage((byte[])reader["IMAGE"]) : null;
records.Add(record);
}
... more code
private Image GetImage(byte[] rawImage)
{
using (System.IO.MemoryStream ms = new System.IO.MemoryStream(rawImage))
{
Image image = Image.FromStream(ms);
return image;
}
}
You shouldn't use a using statement with a stream that will be passed to Image.FromStream, as the Image class is basically responsible for the stream from then on. From the documentation:
You must keep the stream open for the lifetime of the Image.
Just change your code to:
private Image GetImage(byte[] rawImage)
{
var stream = new MemoryStream(rawImage);
return Image.FromStream(stream);
}
... but then make sure you dispose of your Image objects later. That will dispose of the stream, allowing the memory to be garbage collected. Then there shouldn't be any memory leaks - but you need to work out whether you can really load all 6000 images into memory at a time anyway.
(If you don't dispose of the Image objects, they're likely to be finalized anyway at some point - but it would be better to dispose of them deterministically.)
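The root cause can be reproduced without System.Drawing. The sketch below, with the hypothetical name DisposedStreamDemo, only shows that a MemoryStream rejects reads once it has been disposed, which is the state the stream is in after the using block in the original GetImage has ended.

```csharp
using System;
using System.IO;

public static class DisposedStreamDemo
{
    // Returns true if reading from a disposed MemoryStream throws ObjectDisposedException.
    public static bool ReadAfterDisposeThrows()
    {
        var ms = new MemoryStream(new byte[] { 1, 2, 3 });
        ms.Dispose(); // same effect as leaving a using block

        try
        {
            ms.ReadByte(); // any later read on the disposed stream fails
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }
}
```

This is why the stream has to stay open for as long as the Image is in use, and why the cleanup belongs on the Image object instead.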
