Retrieve large file from database CLOB - c#

I am working with asp.net and an Oracle SQL database.
I have a simple procedure in the database which returns a file (in this case an XML file) based on a given ID. In the .NET application I open an OracleConnection and read the file into a string with an OracleDataReader.
This works fine until the file becomes very large (360 MB), at which point I get a 'System.OutOfMemoryException', which I am guessing happens because the process goes over 800 MB of memory usage.
Is there a better way of retrieving the file, or is it possible to increase the 800 MB limit? Time is not an issue here.
Procedure in database
PROCEDURE get_xml(xml_id IN NUMBER,
p_records OUT SYS_REFCURSOR)
IS
BEGIN
OPEN p_records FOR
SELECT xml
FROM allxml
WHERE id = xml_id;
END get_xml;
c# code
using (OracleConnection oConn = new OracleConnection(ora_connection))
{
Procedure proc = null;
OracleParameter result = null;
oConn.Open();
OracleDataReader dr = null;
proc = Procedure.CreateProcedure("get_xml", oConn)
.Number("xml_id", id)
.RefCursor("p_records", out result)
.ExecuteReader(out dr);
if (dr.Read())
{
xml = dr.GetString(0);
}
}
XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
return doc;
As you can see, I load the file into a string and then create an XmlDocument from it so I can process it.
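One way to avoid building a 360 MB string is to stream the CLOB to disk in chunks and then process it with a forward-only XmlReader instead of XmlDocument. This is a sketch, assuming ODP.NET and a raw OracleCommand in place of the Procedure helper; the method name and temp-file path are illustrative:

```csharp
using System;
using System.Data;
using System.IO;
using System.Xml;
using Oracle.ManagedDataAccess.Client;

static XmlReader GetXmlReader(OracleConnection oConn, int id, string tempPath)
{
    using (var cmd = new OracleCommand("get_xml", oConn) { CommandType = CommandType.StoredProcedure })
    {
        cmd.Parameters.Add("xml_id", OracleDbType.Int32).Value = id;
        cmd.Parameters.Add("p_records", OracleDbType.RefCursor).Direction = ParameterDirection.Output;

        // SequentialAccess lets the driver stream the LOB instead of buffering the whole value.
        using (var dr = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
        using (var writer = new StreamWriter(tempPath))
        {
            if (dr.Read())
            {
                var buffer = new char[8192];
                long offset = 0;
                long read;
                // GetChars pulls the CLOB in 8 KB chunks; only one chunk is in memory at a time.
                while ((read = dr.GetChars(0, offset, buffer, 0, buffer.Length)) > 0)
                {
                    writer.Write(buffer, 0, (int)read);
                    offset += read;
                }
            }
        }
    }
    // XmlReader walks the document forward-only without loading it all into memory.
    return XmlReader.Create(tempPath);
}
```

The trade-off is that you can no longer use XmlDocument's random access; if you need DOM-style processing of a file this size, memory pressure is unavoidable, so restructuring the processing around XmlReader is usually the practical fix.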

For a multi-DB application, the best approach can be to persist only a path/link in the database and store the large blob itself in the file system.

Related

Converting Blob Data (PDF) from SQL Database to a PDF-File

In my Database table the PDFs are saved as BLOB data, for example:
What I'm trying to do now is to create a PDF file out of this data.
My code is like that:
SqlConnection con = new SqlConnection(connectionString);
con.Open();
if (con.State == ConnectionState.Open)
{
string query = // fancy SELECTION string goes here... reads only one by the way
using (SqlCommand command = new SqlCommand(query, con))
{
using (SqlDataReader reader = command.ExecuteReader())
{
while (reader.Read())
{
Byte[] bytes = (Byte[])reader["File BLOB-Contents"];
Console.WriteLine(bytes.Length); // prints the correct file size in Bytes
using (FileStream fstream = new FileStream(@"C:\Users\myUsername\Desktop\test3.pdf", FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
fstream.Write(bytes, 0, bytes.Length);
}
}
}
}
}
The PDF gets created in the end, but the problem is that I can't open it. I get the following (German) message in Adobe Reader:
Does anyone here have an idea, or is there something I'm doing wrong? The file size is okay; it's not 0.
When storing something like a PDF file in SQL Server, I would recommend converting the PDF file into a byte array and putting it into a varbinary(max) column instead of image.
Honestly, I think the recommended way of doing this is having the file reside not in the DB, but instead in either local file storage or some storage service like an AWS S3 bucket and have the location be stored in the database instead.
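If the file has to stay in the database, one detail worth checking in the original code is FileMode.OpenOrCreate: it does not truncate an existing file, so leftover bytes from an earlier, longer file can corrupt the PDF. A sketch that avoids both that pitfall and buffering the whole BLOB in memory, assuming .NET 4.5+ SqlClient (the path and column index are from the question):

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;

using (var con = new SqlConnection(connectionString))
using (var command = new SqlCommand(query, con))
{
    con.Open();
    // SequentialAccess tells SqlClient not to buffer the whole row in memory.
    using (var reader = command.ExecuteReader(CommandBehavior.SequentialAccess))
    {
        while (reader.Read())
        {
            using (Stream blob = reader.GetStream(0))
            // File.Create uses FileMode.Create, which truncates any existing file.
            using (var fstream = File.Create(@"C:\Users\myUsername\Desktop\test3.pdf"))
            {
                blob.CopyTo(fstream); // copies in small internal chunks
            }
        }
    }
}
```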

Uploading Stream to Database

I have a FileForUploading class which should be uploaded to a database.
public class FileForUploading
{
public FileForUploading(string filename, Stream stream)
{
this.Filename = filename;
this.Stream = stream;
}
public string Filename { get; private set; }
public Stream Stream { get; private set; }
}
I am using the Entity Framework to convert it to a FileForUploadingEntity
which is a very simple class that however only contains the Filename property. I don't want to store the Stream in memory but rather upload it directly to the database.
What would be the best way to 'stream' the Stream directly to the database?
So far I have come up with this
private void UploadStream(string name, Stream stream)
{
var sqlQuery = @"UPDATE dbo.FilesForUpload SET Content = @content WHERE Name = @name;";
var nameParameter = new SqlParameter()
{
ParameterName = "@name",
Value = name
};
var contentParameter = new SqlParameter()
{
ParameterName = "@content",
Value = ConvertStream(stream),
SqlDbType = SqlDbType.Binary
};
// the database context used throughout the application.
this.context.Database.ExecuteSqlCommand(sqlQuery, contentParameter, nameParameter);
}
And here is my ConvertStream, which converts the Stream to a byte[]. (It is stored as a varbinary(MAX) in the database.)
private static byte[] ConvertStream(Stream stream)
{
using (var memoryStream = new MemoryStream())
{
stream.CopyTo(memoryStream);
return memoryStream.ToArray();
}
}
Is the above solution good enough? Will it perform well if the Stream is large?
I don't want to store the Stream in memory but rather upload it directly to the database.
With the above solution you proposed you still have the content of the stream in memory in your application which you mentioned initially is something you were trying to avoid.
Your best bet is to go around EF and use the async function to upload the stream. The following example is taken from MSDN article SqlClient Streaming Support.
// Application transferring a large BLOB to SQL Server in .Net 4.5
private static async Task StreamBLOBToServer() {
using (SqlConnection conn = new SqlConnection(connectionString)) {
await conn.OpenAsync();
using (SqlCommand cmd = new SqlCommand("INSERT INTO [BinaryStreams] (bindata) VALUES (@bindata)", conn)) {
using (FileStream file = File.Open("binarydata.bin", FileMode.Open)) {
// Add a parameter which uses the FileStream we just opened
// Size is set to -1 to indicate "MAX"
cmd.Parameters.Add("@bindata", SqlDbType.Binary, -1).Value = file;
// Send the data to the server asynchronously
await cmd.ExecuteNonQueryAsync();
}
}
}
}
You could adapt this sample as follows to make it work for you. Note that you should change your method's signature to async so you can take advantage of not having a thread blocked during a long-running database update.
// change your signature to async so the thread can be released during the database update/insert act
private async Task UploadStreamAsync(string name, Stream stream) {
var conn = (SqlConnection)this.context.Database.Connection; // Database.Connection is a DbConnection; cast to SqlConnection for SqlCommand
if(conn.State != ConnectionState.Open)
await conn.OpenAsync();
using (SqlCommand cmd = new SqlCommand("UPDATE dbo.FilesForUpload SET Content = @content WHERE Name = @name;", conn)) {
cmd.Parameters.Add(new SqlParameter() { ParameterName = "@name", Value = name });
// Size is set to -1 to indicate "MAX"
cmd.Parameters.Add("@content", SqlDbType.Binary, -1).Value = stream;
// Send the data to the server asynchronously
await cmd.ExecuteNonQueryAsync();
}
}
One more note. If you want to save large unstructured data sets (i.e. the streams being uploaded), it might be a better idea not to save them in the database at all. There are numerous reasons why, but foremost is that relational databases were not really designed with this in mind: it is cumbersome to work with the data, and large blobs can chew up database space very fast, making other operations (backups, restores, etc.) more difficult.
There is an alternative that still natively lets you save a pointer in the record while the actual unstructured data resides on disk: SQL Server FILESTREAM. In ADO.NET you would work with SqlFileStream. Here is a good walkthrough on how to configure your SQL Server and database to allow SQL file streams; it also has some VB.NET examples of how to use the SqlFileStream class.
An Introduction to SQL Server FileStream
I assumed you were using Microsoft SQL Server as your data repository. If this assumption is not correct, please update your question and add a tag for the database service you are actually connecting to.
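For reference, reading a FILESTREAM column through SqlFileStream looks roughly like this. This is a sketch, assuming a FILESTREAM-enabled varbinary(max) column; the table, column, and destination path are illustrative:

```csharp
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;

// SqlFileStream access must happen inside a transaction.
using (var tx = conn.BeginTransaction())
{
    string path;
    byte[] txContext;
    using (var cmd = new SqlCommand(
        "SELECT FileData.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() " +
        "FROM dbo.Files WHERE Id = @id", conn, tx))
    {
        cmd.Parameters.AddWithValue("@id", id);
        using (var reader = cmd.ExecuteReader())
        {
            if (!reader.Read()) return;
            path = reader.GetString(0);          // logical NTFS path to the blob
            txContext = (byte[])reader[1];       // transaction token for the stream
        }
    }
    // The data is streamed through the NTFS file system, not the TDS connection.
    using (var sqlStream = new SqlFileStream(path, txContext, FileAccess.Read))
    using (var dest = File.Create(@"C:\temp\copy.bin"))
    {
        sqlStream.CopyTo(dest);
    }
    tx.Commit();
}
```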

read word doc file stored as blob and convert its content to string

I have a Word document stored as a BLOB in MySQL and I am trying to read it using C# and get the text inside it. Can someone show me briefly how to do that? So far I have managed to read the bytes from the database using:
using (MySqlConnection conn = new MySqlConnection())
{
conn.ConnectionString = "connection string is here";
conn.Open();
MySqlCommand command = new MySqlCommand("select filename, document_content from job_db.person_documents where doc_type = 'application/msword' limit 1;", conn);
using (MySqlDataReader reader = command.ExecuteReader())
{
// while there is another record present
while (reader.Read())
{
Byte[] bytData = (byte[])reader[1];
}
}
conn.Close();
}
There is an Office Talk article about reading Open XML documents from memory.
https://msdn.microsoft.com/en-us/library/ee945362%28v=office.11%29.aspx
To access these methods, you need the Open XML SDK.
http://www.microsoft.com/en-au/download/details.aspx?id=30425
Hopefully that's enough to get you started.
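As a starting point, you can wrap the bytes in a MemoryStream and extract the text with the Open XML SDK. Note the caveat: this works for .docx (Open XML) files only; your MIME type 'application/msword' suggests legacy binary .doc files, which the Open XML SDK cannot read. A minimal sketch, with the method name being illustrative:

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

static string ExtractDocxText(byte[] bytData)
{
    using (var ms = new MemoryStream(bytData))
    using (var doc = WordprocessingDocument.Open(ms, false)) // false = read-only
    {
        // InnerText concatenates all text nodes in the document body.
        return doc.MainDocumentPart.Document.Body.InnerText;
    }
}
```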

FileStream error - The process cannot access the file...used by another process

I have created a method to read from my database, the database has a blob field that stores images.
This method is used to read the Image as well as every other field.
When I run my application it works and the form displays all the details, but if I close the form and reopen it, I end up with this error.
IO Exception was unhandled
An unhandled exception of type 'System.IO.IOException' occurred in mscorlib.dll
Additional information: The process cannot access the file 'C:\Users\MyName\Documents\Visual Studio 2013\Projects\MyApp\MyApp\bin\Debug\pic0jpg' because it is being used by another process."
I've tried putting the FileStream object in a using statement and other solutions, but the error persists.
Any help would be appreciated, thanks.
public void loadPlane()
{
//convert picture to jpg and load other details
MySqlDataAdapter sqladapt;
DataSet dataSet;
DatabaseConnector dbcon = new DatabaseConnector();
dbcon.OpenConnection();
sqladapt = new MySqlDataAdapter();
sqladapt.SelectCommand = dbcon.InitSqlCommand("SELECT * FROM plane");
dataSet = new DataSet("dst");
sqladapt.Fill(dataSet);
dbcon.CloseConnection();
DataTable tableData = dataSet.Tables[0];
//use from iplane class
IPlane.countPlane = tableData.Rows.Count;
for (int i = 0; i < IPlane.countPlane; i++)
{
IPlane.planeObj[i] = new NormalPlane();
DataRow drow = tableData.Rows[i];
string plName = "pic" + Convert.ToString(i);
FileStream FSNew = new FileStream(plName+"jpg",FileMode.Create);
byte[] blob = (byte[])drow[5];
FSNew.Write(blob,0,blob.Length);
FSNew.Close();
FSNew = null;
IPlane.planeObj[i].PlaneImage=(Image.FromFile(plName + "jpg"));
IPlane.planeObj[i].ModelNumber = drow[0].ToString();
IPlane.planeObj[i].EngineType = drow[1].ToString();
IPlane.planeObj[i].FlySpeed = Convert.ToInt32( drow[2]);
IPlane.planeObj[i].SpecialFeature = drow[3].ToString();
IPlane.planeObj[i].Price = Convert.ToDouble( drow[4]);
IPlane.planeObj[i].EconSeats = Convert.ToInt32(drow[6]);
IPlane.planeObj[i].BussSeats = Convert.ToInt32(drow[7]);
IPlane.planeObj[i].FirstSeats = Convert.ToInt32(drow[8]);
//drow 5 is the image
}
}
This is the main problem:
IPlane.planeObj[i].PlaneImage=(Image.FromFile(plName + "jpg"));
You are opening the image and leaving it open.
You have two ways to go:
create the image from a memory stream over the blob, not from a file
create the image as a file and store the path of the image in your property IPlane.planeObj[i].PlaneImage, loading the file only when you need it. (I don't understand why you save the image as a blob in the DB and then create a file anyway.)
If you don't have many images, they are not too big, and you don't have memory problems, I think the first is better.
By the way, your code has a lot of resources that are never disposed (connections, streams, etc.).
You really should be disposing of your DatabaseConnector after closing it. The best practice for this would be to use a Using Statement:
using (DatabaseConnector dbcon = new DatabaseConnector())
{
// Code that uses connection.
}
Alternatively, just call dbcon.Dispose() after you call .CloseConnection()
This will ensure ALL resources used by DatabaseConnection are released.
As @giammin noted, you are opening a file but never closing it. The code also creates a lot of resources that should be disposed.
Moreover, you don't need to save the file at all unless you want to use it at a later time. You can create an Image object from a stream using Image.FromStream: wrap the buffer in a MemoryStream and pass it in, e.g.:
using (var stream = new MemoryStream((byte[])drow[5]))
using (var image = Image.FromStream(stream))
{
    // Copy into a new Bitmap so the stream can be disposed safely;
    // GDI+ requires the stream to stay open for the lifetime of the source image.
    IPlane.planeObj[i].PlaneImage = new Bitmap(image);
}

How to process large excel files?

I'm having trouble uploading large Excel files (300 MB+) using a data reader. With this code I open the Excel file and load each row separately. Using breakpoints I noticed that one statement takes 30s+. The memory usage also increases steadily.
Specifying the CommandBehavior parameter (e.g. SequentialAccess) of the ExecuteReader() method has no effect.
What am I doing wrong here? Are there alternative ways of processing large (excel) files?
const string inputFilePath = @"C:\largefile.xlsx";
const string connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Extended Properties=\"Excel 12.0;IMEX=1;HDR=YES;\";Data Source=" + inputFilePath;
using (var connection = new OleDbConnection(connectionString))
{
connection.Open();
var command = new OleDbCommand("largesheet$", connection) {CommandType = CommandType.TableDirect};
var reader = command.ExecuteReader(); // <-- Completely loads file/sheet into memory
while (reader.HasRows)
{
reader.Read();
}
connection.Close();
}
Can you try loading the file into memory with this (where fileBuffer is the file's byte[]):
Stream exportData = new MemoryStream(fileBuffer);
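Loading the whole file into a MemoryStream will not reduce memory usage; what usually helps for files this size is bypassing the ACE OLE DB provider entirely and reading the sheet with the Open XML SDK's forward-only OpenXmlReader, which parses one element at a time. A sketch, assuming the file path from the question and the first worksheet (shared-string lookup for text cells is omitted):

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

using (var doc = SpreadsheetDocument.Open(@"C:\largefile.xlsx", false))
{
    WorksheetPart sheetPart = doc.WorkbookPart.WorksheetParts.First();
    // OpenXmlReader is SAX-style: it never materializes the whole sheet DOM.
    using (OpenXmlReader reader = OpenXmlReader.Create(sheetPart))
    {
        while (reader.Read())
        {
            if (reader.ElementType == typeof(Cell))
            {
                var cell = (Cell)reader.LoadCurrentElement();
                // process cell.CellValue?.Text here; text cells may instead hold
                // an index into the shared string table (not handled in this sketch)
            }
        }
    }
}
```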
