How to process large Excel files? - c#

I'm having trouble uploading large Excel files (300 MB+) using a data reader. The code below opens the Excel file and loads each row separately. Using breakpoints, I noticed that one statement, ExecuteReader(), takes 30+ seconds. Memory usage also increases steadily.
Specifying the CommandBehavior parameter (e.g. SequentialAccess) of the ExecuteReader() method has no effect.
What am I doing wrong here? Are there alternative ways of processing large (Excel) files?
const string inputFilePath = @"C:\largefile.xlsx";
const string connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Extended Properties=\"Excel 12.0;IMEX=1;HDR=YES;\";Data Source=" + inputFilePath;
using (var connection = new OleDbConnection(connectionString))
{
    connection.Open();
    var command = new OleDbCommand("largesheet$", connection) { CommandType = CommandType.TableDirect };
    var reader = command.ExecuteReader(); // <-- Completely loads file/sheet into memory
    // Read() advances one row and returns false at the end; looping on HasRows would never terminate
    while (reader.Read())
    {
    }
    connection.Close();
}

Can you try loading the file into memory first, like this:
byte[] fileBuffer = File.ReadAllBytes(inputFilePath);
Stream exportData = new MemoryStream(fileBuffer);
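As for alternative ways of processing the file: an .xlsx workbook is a ZIP package of XML parts, so one option is to skip the OLE DB provider and stream the worksheet XML directly. The following is only a minimal sketch of that idea using ZipArchive and XmlReader, not what the code in the question does. XlsxRowCounter and CountRows are hypothetical names, and the entry name "xl/worksheets/sheet1.xml" is an assumption about where the sheet lives inside the package.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml;

public static class XlsxRowCounter
{
    // Streams one worksheet part out of an .xlsx package and counts its <row> elements.
    // The sheet is read forward-only, so it is never held in memory as a whole.
    // entryName is an assumption: the first sheet is commonly stored under this path.
    public static long CountRows(Stream xlsx, string entryName = "xl/worksheets/sheet1.xml")
    {
        using (var archive = new ZipArchive(xlsx, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = archive.GetEntry(entryName);
            if (entry == null)
                throw new FileNotFoundException("Worksheet part not found: " + entryName);

            long rows = 0;
            using (Stream sheet = entry.Open())
            using (XmlReader reader = XmlReader.Create(sheet))
            {
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "row")
                        rows++;
                }
            }
            return rows;
        }
    }
}
```

It could be called with a FileStream, for example XlsxRowCounter.CountRows(File.OpenRead(inputFilePath)). Reading actual cell values needs more work than counting rows, because string cells usually hold an index into the package's shared strings part rather than the text itself.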

Related

Converting Blob Data (PDF) from SQL Database to a PDF-File

In my database table, the PDFs are saved as BLOB data.
What I'm trying to do now is to create a PDF file out of this data.
My code looks like this:
SqlConnection con = new SqlConnection(connectionString);
con.Open();
if (con.State == ConnectionState.Open)
{
    string query = // fancy SELECTION string goes here... reads only one by the way
    using (SqlCommand command = new SqlCommand(query, con))
    {
        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                Byte[] bytes = (Byte[])reader["File BLOB-Contents"];
                Console.WriteLine(bytes.Length); // prints the correct file size in Bytes
                using (FileStream fstream = new FileStream(@"C:\Users\myUsername\Desktop\test3.pdf", FileMode.OpenOrCreate, FileAccess.ReadWrite))
                {
                    fstream.Write(bytes, 0, bytes.Length);
                }
            }
        }
    }
}
The PDF gets created in the end, but the problem is that I can't open it. I get the following (German) message in Adobe Reader:
Does anyone here have an idea, or is there something I'm doing wrong? The file size is OK; it's not 0.
When storing something like a PDF file in SQL Server, I would recommend converting the PDF file into a byte array and then putting it into a column of type varbinary(max) instead of image.
Honestly, I think the recommended way of doing this is having the file reside not in the DB, but instead in either local file storage or some storage service like an AWS S3 bucket and have the location be stored in the database instead.
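One way to narrow the problem down is to check what the column actually contains before writing it. The following is a minimal diagnostic sketch, not a confirmed fix: BlobFileWriter, LooksLikePdf and WriteAllBytesFresh are hypothetical names. It relies on the fact that a PDF file normally begins with the ASCII signature "%PDF-", and it writes with FileMode.Create, because the FileMode.OpenOrCreate used in the question does not truncate an existing, longer file.

```csharp
using System;
using System.IO;
using System.Text;

public static class BlobFileWriter
{
    // Returns true when the buffer starts with the "%PDF-" signature.
    public static bool LooksLikePdf(byte[] bytes)
    {
        byte[] signature = Encoding.ASCII.GetBytes("%PDF-");
        if (bytes == null || bytes.Length < signature.Length)
            return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (bytes[i] != signature[i])
                return false;
        }
        return true;
    }

    // FileMode.Create truncates an existing file, so no stale bytes
    // from an earlier, longer file are left behind after the new content.
    public static void WriteAllBytesFresh(string path, byte[] bytes)
    {
        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            stream.Write(bytes, 0, bytes.Length);
        }
    }
}
```

If LooksLikePdf returns false for the bytes read from the database, the column probably holds something other than the raw PDF (for instance a wrapped or encoded copy), and the fix belongs on the side that stored the data rather than in the file-writing code.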

Uploading Stream to Database

I have a FileForUploading class which should be uploaded to a database.
public class FileForUploading
{
    public FileForUploading(string filename, Stream stream)
    {
        this.Filename = filename;
        this.Stream = stream;
    }
    public string Filename { get; private set; }
    public Stream Stream { get; private set; }
}
I am using Entity Framework to convert it to a FileForUploadingEntity, which is a very simple class that contains only the Filename property. I don't want to store the Stream in memory but rather upload it directly to the database.
What would be the best way to 'stream' the Stream directly to the database?
So far I have come up with this:
private void UploadStream(string name, Stream stream)
{
    var sqlQuery = @"UPDATE dbo.FilesForUpload SET Content = @content WHERE Name = @name;";
    var nameParameter = new SqlParameter()
    {
        ParameterName = "@name",
        Value = name
    };
    var contentParameter = new SqlParameter()
    {
        ParameterName = "@content",
        Value = ConvertStream(stream),
        SqlDbType = SqlDbType.Binary
    };
    // the database context used throughout the application.
    this.context.Database.ExecuteSqlCommand(sqlQuery, contentParameter, nameParameter);
}
And here is my ConvertStream, which converts the Stream to a byte[]. (It is stored as a varbinary(MAX) in the database.)
private static byte[] ConvertStream(Stream stream)
{
    using (var memoryStream = new MemoryStream())
    {
        stream.CopyTo(memoryStream);
        return memoryStream.ToArray();
    }
}
Is the above solution good enough? Will it perform well if the Stream is large?
You wrote: "I don't want to store the Stream in memory but rather upload it directly to the database."
The solution you proposed still holds the entire content of the stream in your application's memory, which is what you said you were trying to avoid.
Your best bet is to go around EF and use the async functions to upload the stream. The following example is taken from the MSDN article SqlClient Streaming Support.
// Application transferring a large BLOB to SQL Server in .Net 4.5
private static async Task StreamBLOBToServer() {
    using (SqlConnection conn = new SqlConnection(connectionString)) {
        await conn.OpenAsync();
        using (SqlCommand cmd = new SqlCommand("INSERT INTO [BinaryStreams] (bindata) VALUES (@bindata)", conn)) {
            using (FileStream file = File.Open("binarydata.bin", FileMode.Open)) {
                // Add a parameter which uses the FileStream we just opened
                // Size is set to -1 to indicate "MAX"
                cmd.Parameters.Add("@bindata", SqlDbType.Binary, -1).Value = file;
                // Send the data to the server asynchronously
                await cmd.ExecuteNonQueryAsync();
            }
        }
    }
}
You could convert this sample to the following to make it work for you. Note that you should change the signature on your method to make it async so you can take advantage of not having a thread blocked during a long lasting database update.
// Change your signature to async so the thread can be released during the database update/insert
private async Task UploadStreamAsync(string name, Stream stream) {
    // Database.Connection is typed as DbConnection; the cast assumes the DbContext targets SQL Server
    var conn = (SqlConnection)this.context.Database.Connection;
    if (conn.State != ConnectionState.Open)
        await conn.OpenAsync();
    using (SqlCommand cmd = new SqlCommand("UPDATE dbo.FilesForUpload SET Content = @content WHERE Name = @name;", conn)) {
        cmd.Parameters.Add(new SqlParameter() { ParameterName = "@name", Value = name });
        // Size is set to -1 to indicate "MAX"
        cmd.Parameters.Add("@content", SqlDbType.Binary, -1).Value = stream;
        // Send the data to the server asynchronously
        await cmd.ExecuteNonQueryAsync();
    }
}
One more note: if you want to save large unstructured data sets (i.e. the Streams you are getting uploaded), it might be a better idea not to save them in the database. There are numerous reasons why, but foremost is that relational databases were not really designed with this in mind, it's cumbersome to work with the data, and they can chew up database space really fast, making other operations (e.g. backups, restores) more difficult.
There is an alternative that still natively allows you to save a pointer in the record but have the actual unstructured data reside on disk. You can do this using SQL Server FILESTREAM. In ADO.NET you would be working with SqlFileStream. Here is a good walkthrough on how to configure your SQL Server instance and database to allow FILESTREAM data. It also has some VB.NET examples on how to use the SqlFileStream class:
An Introduction to SQL Server FileStream
I did assume you were using Microsoft SQL Server as your data repository. If this assumption is not correct, please update your question and also add a tag for the database service you are connecting to.

read word doc file stored as blob and convert its content to string

I have a Word document stored as a BLOB in MySQL, and I am trying to read it using C# and get the text inside it. Can someone give me a short code sample showing how to do that? So far I have managed to read the bytes from the database using:
using (MySqlConnection conn = new MySqlConnection())
{
    conn.ConnectionString = "connection string is here";
    conn.Open();
    MySqlCommand command = new MySqlCommand("select filename, document_content from job_db.person_documents where doc_type = 'application/msword' limit 1;", conn);
    using (MySqlDataReader reader = command.ExecuteReader())
    {
        // while there is another record present
        while (reader.Read())
        {
            Byte[] bytData = (byte[])reader[1];
        }
    }
    conn.Close();
}
There is an Office Talk article about reading Open XML documents from memory:
https://msdn.microsoft.com/en-us/library/ee945362%28v=office.11%29.aspx
To access these methods, you need the Open XML SDK.
http://www.microsoft.com/en-au/download/details.aspx?id=30425
Hopefully that's enough to get you started.
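As a rough illustration of that approach, here is a minimal sketch that turns the bytes read from the database into plain text with the Open XML SDK. DocxText and ExtractText are hypothetical names. It assumes the stored bytes are an Open XML (.docx) file; the doc_type 'application/msword' in the query normally denotes the legacy binary .doc format, which the Open XML SDK cannot open.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class DocxText
{
    // Opens an in-memory .docx read-only and returns the text of its body.
    // Assumption: docxBytes holds an Open XML document, not a legacy .doc file.
    public static string ExtractText(byte[] docxBytes)
    {
        using (var stream = new MemoryStream(docxBytes))
        using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, false))
        {
            // InnerText concatenates all text runs without paragraph separators.
            return doc.MainDocumentPart.Document.Body.InnerText;
        }
    }
}
```

In the reader loop from the question this would be called as DocxText.ExtractText(bytData). If paragraph boundaries matter, the paragraphs of the body would have to be walked individually instead of taking InnerText of the whole body.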

Retrieve large file from database CLOB

I am working with ASP.NET and an Oracle SQL database.
I have a simple procedure in the database which returns a file (in this case an XML file) based on a given ID. In the .NET application I open an OracleConnection and read the file into a string with the OracleDataReader.
This works fine until the file becomes very large (360 MB), which causes a 'System.OutOfMemoryException'. I am guessing this happens because the process goes over 800 MB of memory usage.
Is there a better way of retrieving the file, or is it possible to increase the 800 MB limit? Time is not an issue here.
Procedure in database:
PROCEDURE get_xml(xml_id    IN  NUMBER,
                  p_records OUT SYS_REFCURSOR)
IS
BEGIN
    OPEN p_records FOR
        SELECT xml
        FROM allxml
        WHERE id = xml_id;
END get_xml;
C# code:
using (OracleConnection oConn = new OracleConnection(ora_connection))
{
    Procedure proc = null;
    OracleParameter result = null;
    oConn.Open();
    OracleDataReader dr = null;
    proc = Procedure.CreateProcedure("get_xml", oConn)
        .Number("xml_id", id)
        .RefCursor("p_records", out result)
        .ExecuteReader(out dr);
    if (dr.Read())
    {
        xml = dr.GetString(0);
    }
}
XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
return doc;
As you can see, I load the file into a string and then create an XmlDocument so I can process it.
For an app that targets multiple databases, a better approach can be to persist only a path/link to the file in the DB and store the large blob itself in the file system.
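Wherever the XML ends up being stored, the code in the question holds it twice: once as a string and once as an XmlDocument. A forward-only XmlReader avoids both copies. The following is a minimal sketch of that technique, assuming the XML can be obtained as a TextReader (for a file on disk, a StreamReader; for a data reader, DbDataReader.GetTextReader exists, but whether the Oracle provider actually streams through it is an assumption to verify). XmlStreaming, CountElements and the element name used in the usage line are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlStreaming
{
    // Walks the document forward-only and counts elements with the given local name.
    // Only the current node is held in memory, never the whole document.
    public static int CountElements(TextReader source, string localName)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(source))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName)
                    count++;
            }
        }
        return count;
    }
}
```

For example, XmlStreaming.CountElements(new StreamReader(path), "record") would count the record elements of a file. The trade-off is that the processing has to be written as a single pass over the nodes, since random access to the tree as with XmlDocument is no longer available.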

how to read and write MP3 to database

How can I read an MP3 from a SQL database? I have stored the file in SQL in binary format. Now I want to retrieve the MP3 file stored in SQL and show it in my ASPX page. How can I do that?
In its simplest form this is how you would get the raw bytes, can't really show any more without knowing what you want it for...
private byte[] GetMp3Bytes(string connString)
{
    SqlConnection conn = null;
    SqlCommand cmd = null;
    SqlDataReader reader = null;
    using (conn = new SqlConnection(connString))
    {
        conn.Open();
        using (cmd = new SqlCommand("SELECT TOP 1 Mp3_File FROM MP3_Table", conn))
        using (reader = cmd.ExecuteReader())
        {
            reader.Read();
            return reader["Mp3_File"] as byte[];
        }
    }
}
You'd probably want to use a Generic ASHX Handler that retrieves the binary data and streams it to the response stream with the correct content-type header ("audio/mpeg").
If you look at the article Displaying Images in ASP.NET Using HttpHandlers then you should see the basic principle. You just need to change the content-type output.
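A minimal sketch of such a handler could look like the following. It is an illustration under assumptions, not a finished implementation: Mp3Handler and the connection string name "Mp3Db" are hypothetical, and the query is the one from the answer above, so it always returns the first row of MP3_Table.

```csharp
using System.Configuration;
using System.Data.SqlClient;
using System.Web;

// Hypothetical generic handler, e.g. the code-behind of an Mp3Handler.ashx file.
public class Mp3Handler : IHttpHandler
{
    public bool IsReusable { get { return false; } }

    public void ProcessRequest(HttpContext context)
    {
        // "Mp3Db" is an assumed connection string name from web.config.
        string connString = ConfigurationManager.ConnectionStrings["Mp3Db"].ConnectionString;
        byte[] mp3 = GetMp3Bytes(connString);
        if (mp3 == null)
        {
            context.Response.StatusCode = 404;
            return;
        }
        // Tell the browser that the response body is MP3 audio.
        context.Response.ContentType = "audio/mpeg";
        context.Response.BinaryWrite(mp3);
    }

    // Returns the first stored MP3, or null when there is no row or the column is NULL.
    private static byte[] GetMp3Bytes(string connString)
    {
        using (var conn = new SqlConnection(connString))
        using (var cmd = new SqlCommand("SELECT TOP 1 Mp3_File FROM MP3_Table", conn))
        {
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                return reader.Read() ? reader["Mp3_File"] as byte[] : null;
            }
        }
    }
}
```

The ASPX page would then point an audio element or a link at the handler's URL instead of embedding the bytes itself. Selecting a specific file would need an identifier passed on the query string and a parameterized WHERE clause, which is left out here.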
