Serialization using Memory Stream uses a lot of memory - c#

I would like some tips to improve the memory efficiency of my application when I serialize and deserialize an object using a memory stream.
For this example I would like to convert a class which contains a large DataTable into bytes in order to send it over TCP.
Let's assume I have the following class which I want to serialize:
[Serializable]
public class DataContainer
{
public string TableName { get; set; }
public DataTable DataTableData { get; set; }
}
And the following Form application:
1) Create a table and store it in a DataContainer
2) Serialize the DataContainer
3) Deserialize the DataContainer
public partial class SerialiseDesirialise : Form
{
private DataContainer dc;
private byte[] byteSD;
public SerialiseDesirialise()
{
InitializeComponent();
dc = new DataContainer();
}
private void runBtn_Click(object sender, EventArgs e)
{
dc.TableName = "Memory Usage Test";
CreateTable();
SerialiseObj();
DeserialiseObj();
dc = null;
byteSD = null;
int k = 0;
}
private void CreateTable()
{
DataTable dt = new DataTable();
dt.Columns.Add("Column 1", typeof(string));
dt.Columns.Add("Column 2", typeof(string));
dt.Columns.Add("Column 3", typeof(string));
string one = new Guid().ToString();
string two = new Guid().ToString();
string three = new Guid().ToString();
for (int i = 0; i < 1000000; i++)
{
dt.Rows.Add(one, two, three);
}
dc.DataTableData = dt;
}
private void SerialiseObj()
{
BinaryFormatter f = new BinaryFormatter();
using (MemoryStream ms = new MemoryStream())
{
f.Serialize(ms, dc);
byteSD = new byte[ms.Length];
byteSD = ms.ToArray();
ms.Dispose();
}
}
private void DeserialiseObj()
{
BinaryFormatter f = new BinaryFormatter();
using (MemoryStream ms = new MemoryStream())
{
ms.Write(byteSD, 0, byteSD.Length);
ms.Position = 0;
DataContainer _dc = f.Deserialize(ms) as DataContainer;
_dc = null;
ms.Dispose();
}
}
}
I recorded the following Process Memory values:
1) When I run the application the Process Memory = 17MB
2) When CreateTable() is completed the Process Memory = 141MB (which is understandable since it's a big table)
3) When the line f.Serialize(ms, dc) is completed the Process Memory = 3GB (why? I would expect a much smaller value, since ms.Length is 338,779,361 bytes, which is roughly 338MB)
4) After SerialiseObj() is completed the Process Memory = 1.6GB
5) Again when entering DeserialiseObj() the Process Memory reaches 3GB and drops to 1.6GB
6) Finally, after the whole code is completed, even if I set every variable to null, the Process Memory = 1.6GB (why does it not drop back to 17MB?)
I was wondering if you could explain why the above occurs, and how I can improve my application so that it does not reach such high Process Memory and returns to the initial level when the code is completed.
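(One commonly suggested direction for this kind of setup, offered as a sketch rather than a verified fix for the numbers above: set the DataTable's RemotingFormat to SerializationFormat.Binary before serializing, so BinaryFormatter writes the rows in binary rather than the much larger XML form, and serialize straight to the destination stream instead of building the full byte[] in a MemoryStream and copying it with ToArray(). The TcpClient/NetworkStream names below are assumptions standing in for however the data is actually sent.)
// Sketch (assumed names): serialize directly to the TCP stream so no
// intermediate byte[] copy of the whole table is needed, and use the
// DataTable's binary remoting format to shrink the payload.
private void SendContainer(TcpClient client, DataContainer container)
{
    container.DataTableData.RemotingFormat = SerializationFormat.Binary;
    BinaryFormatter formatter = new BinaryFormatter();
    using (NetworkStream ns = client.GetStream())
    {
        formatter.Serialize(ns, container);
    }
}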

Related

CS1503 Argument 1: cannot convert from 'BoughtUpgrade.AvailableUpgrade' to 'AvailableUpgrade'

I am making a Unity Engine based game.
I have 2 classes. One is "BoughtUpgrade", the other is "RotateBase".
Inside "RotateBase" I am trying to save some data and store it in the file "/savedBoughtUpgrades.doma". But when I try to add an object from the deserialized List<BoughtUpgrade.AvailableUpgrade> into List<AvailableUpgrade> availableUpgradeList, which has the same attributes, I get error CS1503. How do I convert from one type to the other?
Here is my code:
public class RotateBase : MonoBehaviour
{
public List<AvailableUpgrade> availableUpgradeList = new List <AvailableUpgrade>();
[System.Serializable]
public class SaveDataUpgrades
{
public float coins;
public List<BoughtUpgrade.AvailableUpgrade> availableUpgradeList = new List<BoughtUpgrade.AvailableUpgrade>();
}
public void Load()
{
string fileLocation02 = Application.persistentDataPath + "/savedBoughtUpgrades.doma";
if (File.Exists(fileLocation02))
{
BinaryFormatter bf = new BinaryFormatter();
FileStream stream = new FileStream(fileLocation02, FileMode.Open);
stream.Position = 0;
SaveDataUpgrades data = (SaveDataUpgrades)bf.Deserialize(stream);
stream.Close();
coins = data.coins;
//Reads the list and stores values in the original List
for (int i = 0; i < 10; i++)
{
availableUpgradeList.Add(data.availableUpgradeList[i]); //Error CS1503 shows up here
}
}
else if (!File.Exists(fileLocation02))
{
SaveDataUpgrades saveData = new SaveDataUpgrades();
for (int i = 0; i < 10; i++)
{
AvailableUpgrade availableUpgrade = new AvailableUpgrade();
availableUpgrade.alreadyBoughtUpgradeName = "Tsoko";
availableUpgrade.alredyBoughtUpgradeAmount = 0;
availableUpgradeList.Add(availableUpgrade);
}
saveData.coins = coins + points;
BinaryFormatter bf = new BinaryFormatter();
FileStream stream = new FileStream(fileLocation02, FileMode.Create);
bf.Serialize(stream, saveData);
stream.Close();
}
}
}
and here is my other class
public class BoughtUpgrade : MonoBehaviour
{
[System.Serializable]
public class AvailableUpgrade
{
public float alreadyBoughtUpgradePrice;
public float alredyBoughtUpgradeAmount;
public string alreadyBoughtUpgradeName;
}
public RotateBase.SaveDataUpgrades saveData;
public List<AvailableUpgrade> availableUpgradeList = new List<AvailableUpgrade>();
public float coins;
public void Save()
{
RotateBase.SaveDataUpgrades saveData = new RotateBase.SaveDataUpgrades();
//saveData.availableUpgradeList = availableUpgradeList;
//saveData.availableUpgradeList = availableUpgradeList;
string fileLocation02 = Application.persistentDataPath + "/savedBoughtUpgrades.doma";
BinaryFormatter bf = new BinaryFormatter();
FileStream stream = new FileStream(fileLocation02, FileMode.Create);
bf.Serialize(stream, saveData);
stream.Close();
}
public void Load()
{
string fileLocation02 = Application.persistentDataPath + "/savedBoughtUpgrades.doma";
if (File.Exists(fileLocation02))
{
BinaryFormatter bf = new BinaryFormatter();
FileStream stream = new FileStream(fileLocation02, FileMode.Open);
stream.Position = 0;
RotateBase.SaveDataUpgrades data = (RotateBase.SaveDataUpgrades)bf.Deserialize(stream);
stream.Close();
//coins = data.coins;
//Reads the list and stores values in the original List
for (int i = 0; i < 10; i++)
{
//availableUpgradeList.Add(data.availableUpgradeList[i]);
availableUpgradeList.Add(data.availableUpgradeList[i]);
}
}
else if (!File.Exists(fileLocation02))
{
RotateBase.SaveDataUpgrades saveData = new RotateBase.SaveDataUpgrades();
for (int i = 0; i < 10; i++)
{
BoughtUpgrade.AvailableUpgrade availableUpgrade = new BoughtUpgrade.AvailableUpgrade();
availableUpgrade.alreadyBoughtUpgradeName = "Tsoko";
availableUpgrade.alredyBoughtUpgradeAmount = 0;
availableUpgradeList.Add(availableUpgrade);
}
for (int i = 0; i < availableUpgradeList.Count; i++)
{
saveData.availableUpgradeList.Add(availableUpgradeList[i]);
}
saveData.coins = coins;
BinaryFormatter bf = new BinaryFormatter();
FileStream stream = new FileStream(fileLocation02, FileMode.Create);
bf.Serialize(stream, saveData);
stream.Close();
}
}
}
There are a lot of ways to do this, but a few common ones that come to mind are:
Write a method to convert one object to another (there is a lot of info about this online)
Brute force example:
var availableUpgrades = originList.Select(x => new AvailableUpgrade() { alreadyBoughtUpgradePrice = x.alreadyBoughtUpgradePrice, alredyBoughtUpgradeAmount = x.alredyBoughtUpgradeAmount, alreadyBoughtUpgradeName = x.alreadyBoughtUpgradeName}).ToList();
Use an existing mapping NuGet package/library (such as AutoMapper) to do this; a sketch follows below.
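For the library route, a minimal sketch with AutoMapper (one common mapping library; the mapping direction matches the error in the question and assumes the two AvailableUpgrade classes shown above):
// Requires the AutoMapper NuGet package.
var config = new MapperConfiguration(cfg =>
    cfg.CreateMap<BoughtUpgrade.AvailableUpgrade, AvailableUpgrade>());
var mapper = config.CreateMapper();

// Convert the deserialized list into the element type RotateBase expects.
List<AvailableUpgrade> converted = mapper.Map<List<AvailableUpgrade>>(data.availableUpgradeList);
availableUpgradeList.AddRange(converted);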

How to improve performance of CSV upload via datatable

I have a working solution for uploading a CSV file. Currently, I use the IFormCollection for a user to upload multiple CSV files from a view.
The CSV files are saved as a temp file as follows:
List<string> fileLocations = new List<string>();
foreach (var formFile in files)
{
filePath = Path.GetTempFileName();
if (formFile.Length > 0)
{
using (var stream = new FileStream(filePath, FileMode.Create))
{
await formFile.CopyToAsync(stream);
}
}
fileLocations.Add(filePath);
}
I send the list of file locations to another method (just below). I loop through the file locations and stream the data from the temp files; I then use a DataTable and SqlBulkCopy to insert the data. I currently upload between 50 and 200 files at a time, and each file is around 330KB. Inserting a hundred files (around 30-35MB of data) takes around 6 minutes.
public void SplitCsvData(string fileLocation, Guid uid)
{
MetaDataModel MetaDatas;
List<RawDataModel> RawDatas;
var reader = new StreamReader(File.OpenRead(fileLocation));
List<string> listRows = new List<string>();
while (!reader.EndOfStream)
{
listRows.Add(reader.ReadLine());
}
var metaData = new List<string>();
var rawData = new List<string>();
foreach (var row in listRows)
{
var rowName = row.Split(',')[0];
bool parsed = int.TryParse(rowName, out int result);
if (parsed == false)
{
metaData.Add(row);
}
else
{
rawData.Add(row);
}
}
//Assigns the vertical header name and value to the object by splitting string
RawDatas = GetRawData.SplitRawData(rawData);
SaveRawData(RawDatas);
MetaDatas = GetMetaData.SplitRawData(rawData);
SaveRawData(RawDatas);
}
This code then passes the object on to create the DataTable and insert the data.
private DataTable CreateRawDataTable
{
get
{
var dt = new DataTable();
dt.Columns.Add("Id", typeof(int));
dt.Columns.Add("SerialNumber", typeof(string));
dt.Columns.Add("ReadingNumber", typeof(int));
dt.Columns.Add("ReadingDate", typeof(string));
dt.Columns.Add("ReadingTime", typeof(string));
dt.Columns.Add("RunTime", typeof(string));
dt.Columns.Add("Temperature", typeof(double));
dt.Columns.Add("ProjectGuid", typeof(Guid));
dt.Columns.Add("CombineDateTime", typeof(string));
return dt;
}
}
public void SaveRawData(List<RawDataModel> data)
{
DataTable dt = CreateRawDataTable;
var count = data.Count;
for (var i = 1; i < count; i++)
{
DataRow row = dt.NewRow();
row["Id"] = data[i].Id;
row["ProjectGuid"] = data[i].ProjectGuid;
row["SerialNumber"] = data[i].SerialNumber;
row["ReadingNumber"] = data[i].ReadingNumber;
row["ReadingDate"] = data[i].ReadingDate;
row["ReadingTime"] = data[i].ReadingTime;
row["CombineDateTime"] = data[i].CombineDateTime;
row["RunTime"] = data[i].RunTime;
row["Temperature"] = data[i].Temperature;
dt.Rows.Add(row);
}
using (var conn = new SqlConnection(connectionString))
{
conn.Open();
using (SqlTransaction tr = conn.BeginTransaction())
{
using (var sqlBulk = new SqlBulkCopy(conn, SqlBulkCopyOptions.Default, tr))
{
sqlBulk.BatchSize = 1000;
sqlBulk.DestinationTableName = "RawData";
sqlBulk.WriteToServer(dt);
}
tr.Commit();
}
}
}
Is there another way to do this, or a better way to improve performance, so that the upload time is reduced? It can take a long time, and I am seeing ever-increasing memory use, up to around 500MB.
TIA
You can improve performance by removing the DataTable and reading from the input stream directly.
SqlBulkCopy has a WriteToServer overload that accepts an IDataReader instead of an entire DataTable.
CsvHelper can parse CSV files using a StreamReader as input. It provides CsvDataReader as an IDataReader implementation on top of the CSV data. This allows reading directly from the input stream and writing to SqlBulkCopy.
The following method will read from an IFormFile, parse the stream using CsvHelper and use the CSV's fields to configure a SqlBulkCopy instance:
public async Task ToTable(IFormFile file, string table)
{
using (var stream = file.OpenReadStream())
using (var tx = new StreamReader(stream))
using (var reader = new CsvReader(tx))
using (var rd = new CsvDataReader(reader))
{
var headers = reader.Context.HeaderRecord;
var bcp = new SqlBulkCopy(_connection)
{
DestinationTableName = table
};
//Assume the file headers and table fields have the same names
foreach(var header in headers)
{
bcp.ColumnMappings.Add(header, header);
}
await bcp.WriteToServerAsync(rd);
}
}
This way nothing is ever written to a temp file or cached in memory. The uploaded files are parsed and written to the database directly.
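A possible way to call this from the upload action, sketched with names borrowed from the question (the "RawData" table name and the action signature are assumptions):
public async Task<IActionResult> UploadCsvFiles(ICollection<IFormFile> files)
{
    foreach (var formFile in files)
    {
        // No temp file: each upload is streamed straight into SQL Server.
        await ToTable(formFile, "RawData");
    }
    return Ok();
}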
In addition to @Panagiotis's answer, why don't you interleave your file processing with the file upload? Wrap your file processing logic in an async method, change the loop to a Parallel.ForEach, and process each file as it arrives instead of waiting for all of them?
private static readonly object listLock = new Object(); // only once at class level
List<string> fileLocations = new List<string>();
Parallel.ForEach(files, async (formFile) => // async lambda so the awaits below compile; note it runs fire-and-forget
{
var filePath = Path.GetTempFileName(); // declared locally so parallel iterations don't share one variable
if (formFile.Length > 0)
{
using (var stream = new FileStream(filePath, FileMode.Create))
{
await formFile.CopyToAsync(stream);
}
await ProcessFileInToDbAsync(filePath);
}
// Added lock for thread safety of the List
lock (listLock)
{
fileLocations.Add(filePath);
}
});
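Since an async lambda inside Parallel.ForEach is effectively fire-and-forget (the loop will not wait for the awaits to finish), here is a sketch of the same idea with Task.WhenAll, which keeps the awaits observable. ProcessFileInToDbAsync is the same hypothetical method referenced above; this is illustrative, not a drop-in replacement.
// Requires using System.Linq; must be called from an async method.
var tasks = files.Select(async formFile =>
{
    var filePath = Path.GetTempFileName();
    if (formFile.Length > 0)
    {
        using (var stream = new FileStream(filePath, FileMode.Create))
        {
            await formFile.CopyToAsync(stream);
        }
        await ProcessFileInToDbAsync(filePath);
    }
    return filePath;
});
string[] fileLocations = await Task.WhenAll(tasks);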
Thanks to @Panagiotis Kanavos, I was able to work out what to do. Firstly, the way I was calling the methods was leaving them in memory. The CSV file I have is in two parts: vertical metadata and then the usual horizontal information. So I needed to split them into two. Saving them as tmp files was also causing an overhead. It has gone from taking 5-6 minutes to now taking a minute, which for 100 files containing 8,500 rows isn't bad, I suppose.
Calling the method:
public async Task<IActionResult> UploadCsvFiles(ICollection<IFormFile> files, IFormCollection fc)
{
foreach (var f in files)
{
var getData = new GetData(_configuration);
await getData.SplitCsvData(f, uid);
}
return whatever;
}
This is the method doing the splitting:
public async Task SplitCsvData(IFormFile file, string uid)
{
var data = string.Empty;
var m = new List<string>();
var r = new List<string>();
var records = new List<string>();
using (var stream = file.OpenReadStream())
using (var reader = new StreamReader(stream))
{
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var header = line.Split(',')[0].ToString();
bool parsed = int.TryParse(header, out int result);
if (!parsed)
{
m.Add(line);
}
else
{
r.Add(line);
}
}
}
//TODO: Validation
//This splits the list into the Meta data model. This is just a single object, with static fields.
var metaData = SplitCsvMetaData.SplitMetaData(m, uid);
DataTable dtm = CreateMetaData(metaData);
var serialNumber = metaData.LoggerId;
await SaveMetaData("MetaData", dtm);
//
var lrd = new List<RawDataModel>();
foreach (string row in r)
{
lrd.Add(new RawDataModel
{
Id = 0,
SerialNumber = serialNumber,
ReadingNumber = Convert.ToInt32(row.Split(',')[0]),
ReadingDate = Convert.ToDateTime(row.Split(',')[1]).ToString("yyyy-MM-dd"),
ReadingTime = Convert.ToDateTime(row.Split(',')[2]).ToString("HH:mm:ss"),
RunTime = row.Split(',')[3].ToString(),
Temperature = Convert.ToDouble(row.Split(',')[4]),
ProjectGuid = uid.ToString(),
CombineDateTime = Convert.ToDateTime(row.Split(',')[1] + " " + row.Split(',')[2]).ToString("yyyy-MM-dd HH:mm:ss")
});
}
await SaveRawData("RawData", lrd);
}
I then use a DataTable for the metadata (which takes 20 seconds for 100 files) as I map the field names to the columns.
public async Task SaveMetaData(string table, DataTable dt)
{
using (SqlBulkCopy sqlBulk = new SqlBulkCopy(_configuration.GetConnectionString("DefaultConnection"), SqlBulkCopyOptions.Default))
{
sqlBulk.DestinationTableName = table;
await sqlBulk.WriteToServerAsync(dt);
}
}
I then use FastMember for the large data parts for the raw data, which is more like a traditional CSV.
public async Task SaveRawData(string table, IEnumerable<LogTagRawDataModel> lrd)
{
using (SqlBulkCopy sqlBulk = new SqlBulkCopy(_configuration.GetConnectionString("DefaultConnection"), SqlBulkCopyOptions.Default))
using (var reader = ObjectReader.Create(lrd, "Id","SerialNumber", "ReadingNumber", "ReadingDate", "ReadingTime", "RunTime", "Temperature", "ProjectGuid", "CombineDateTime"))
{
sqlBulk.DestinationTableName = table;
await sqlBulk.WriteToServerAsync(reader);
}
}
I am sure this can be improved on, but for now, this works really well.

Takes 1GB RAM to parse JSON object and gives System.OutOfMemoryException after performing any other filter [duplicate]

This question already has answers here:
How to parse huge JSON file as stream in Json.NET?
(5 answers)
Closed 4 years ago.
public void ReadJsonFile()
{
try
{
string json = string.Empty;
using (StreamReader r = new StreamReader(val))
{
json = r.ReadToEnd();
var test = JObject.Parse(json);
JArray items = (JArray)test["locations"];
int length = items.Count;
data = new List<Info>();
for (int i = 0; i < items.Count; i++)
{
var d = test["locations"][i]["timestampMs"];
double dTimeSpan = Convert.ToDouble(d);
DateTime dtReturn = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc).AddSeconds(Math.Round(dTimeSpan / 1000d)).ToLocalTime();
string printDate = dtReturn.DayOfWeek.ToString() + "," + " " + dtReturn.ToShortDateString() + " " + dtReturn.ToShortTimeString();
day = dtReturn.DayOfWeek.ToString();
date = dtReturn.ToShortDateString();
time = dtReturn.ToShortTimeString();
var e = test["locations"][i]["latitudeE7"];
var f = test["locations"][i]["longitudeE7"];
var n = test["locations"][i]["accuracy"];
accuracy = n.ToString();
// getLocationByGeoLocation(e.ToString(), f.ToString());
var g = test["locations"][i]["activity"] != null;
if (g == true)
{
JArray items1 = (JArray)test["locations"][i]["activity"];
int length1 = items1.Count;
while (j < items1.Count)
{
if (j == 0)
{
var h = test["locations"][i]["activity"][j]["activity"][j]["type"];
type = h.ToString();
j = 1;
}
else { }
j++;
}
j = 0;
}
else { }
Info ddm = new Info(day, date, time, lat, longi, address, accuracy, type);
data.Add(ddm);
type = "";
}
}
return;
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
I am trying to parse a JSON file. val is the name of the file to parse. Using StreamReader I read the whole file; when I then try to parse it with JObject, it takes around 1GB of memory and gives System.OutOfMemoryException. How can I parse the JSON using less memory?
Please help me with this; I don't have much experience with JSON.
Please read about JSON thoroughly. Newtonsoft.Json is a very well-known library and it is well documented. Let us get back to your problem. As mentioned in the comments, you have lots of unnecessary middle steps while trying to parse your file. Moreover, you are trying to parse a big file in one go! First things first, this is the layout for your JSON:
public partial class Data
{
[JsonProperty("locations")]
public Location[] Locations { get; set; }
}
public partial class Location
{
[JsonProperty("timestampMs")]
public string TimestampMs { get; set; }
[JsonProperty("latitudeE7")]
public long LatitudeE7 { get; set; }
[JsonProperty("longitudeE7")]
public long LongitudeE7 { get; set; }
[JsonProperty("accuracy")]
public long Accuracy { get; set; }
}
And while you are deserializing you should do it object by object, not all at once.
The following assumes that your stream is made of Data type objects; if it is made up of Location type objects, you have to change it:
using (StreamReader streamReader = new StreamReader(val))
using (JsonTextReader reader = new JsonTextReader(streamReader))
{
reader.SupportMultipleContent = true;
var serializer = new JsonSerializer();
while (reader.Read())
{
if (reader.TokenType == JsonToken.StartObject)
{
var data = serializer.Deserialize<Data>(reader);
//data.locations etc etc..
}
}
}
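If the file is instead one big object containing a locations array, as in the question, here is a sketch of streaming each Location element individually (it reuses the Location class above and the question's val file name; checking reader.Path is just one way to position the reader on the array elements):
using (var streamReader = new StreamReader(val))
using (var reader = new JsonTextReader(streamReader))
{
    var serializer = new JsonSerializer();
    while (reader.Read())
    {
        // Each element of the "locations" array starts with a StartObject token
        // whose path looks like "locations[0]", "locations[1]", ...
        if (reader.TokenType == JsonToken.StartObject && reader.Path.StartsWith("locations["))
        {
            var location = serializer.Deserialize<Location>(reader);
            // process one Location at a time instead of holding the whole file in memory
        }
    }
}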
I could fix the System.OutOfMemoryException with the following steps:
If you are not using Visual Studio Hosting Process:
Uncheck the option:
Project->Properties->Debug->Enable the Visual Studio Hosting Process
If the problem still remains:
Go to Project->Properties->Build Events->Post-Build Event Command line and paste these 2 lines
call "$(DevEnvDir)..\..\vc\vcvarsall.bat" x86
"$(DevEnvDir)..\..\vc\bin\EditBin.exe" "$(TargetPath)" /LARGEADDRESSAWARE
Now, build the project

Memory leak when reading from file

I have this method to read from a .dbf file:
public DataTable ReadBulkDBF(string dbfFile, Dictionary<string, string> columnKeys, int maxRows, string dynamicValue, int nextId)
{
long start = DateTime.Now.Ticks;
DataTable dt = new DataTable();
BinaryReader recReader;
string number;
string year;
string month;
string day;
long lDate;
long lTime;
DataRow row;
int fieldIndex;
bool foundLastColumn = false;
List<string> keys = new List<string>(columnKeys.Keys);
List<string> values = new List<string>(columnKeys.Values);
// For testing purposes
int rowCount = 0;
// If there isn't even a file, just return an empty DataTable
if ((!File.Exists(dbfFile)))
{
return dt;
}
BinaryReader br = null;
try
{
// Will allow shared open as long as the other application using it allows it too.
// Read the header into a buffer
br = new BinaryReader(File.Open(dbfFile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite));
byte[] buffer = br.ReadBytes(Marshal.SizeOf(typeof(DBFHeader)));
// Marshall the header into a DBFHeader structure
GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
DBFHeader header = (DBFHeader)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(DBFHeader));
handle.Free();
// Read in all the field descriptors. Per the spec, 13 (0D) marks the end of the field descriptors
ArrayList fields = new ArrayList();
while ((13 != br.PeekChar()))
{
buffer = br.ReadBytes(Marshal.SizeOf(typeof(FieldDescriptor)));
handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
fields.Add((FieldDescriptor)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(FieldDescriptor)));
handle.Free();
}
// Read in the first row of records, we need this to help determine column types below
((FileStream)br.BaseStream).Seek(header.headerLen + 1, SeekOrigin.Begin);
buffer = br.ReadBytes(header.recordLen);
recReader = new BinaryReader(new MemoryStream(buffer));
// Create the columns in our new DataTable
DataColumn col = null;
dt.Columns.Add(new DataColumn("updateId", typeof(int)));
if (!dbfFile.Contains("con_compania")) { dt.Columns.Add(new DataColumn("dynamic", typeof(string))); }
dt.Columns.Add(new DataColumn("fechasync", typeof(DateTime)));
foreach (FieldDescriptor field in fields)
{
// Adds columns to DataTable dt
}
// Skip past the end of the header.
((FileStream)br.BaseStream).Seek(header.headerLen, SeekOrigin.Begin);
// Read in all the records
for (int counter = 0; counter < header.numRecords && dt.Rows.Count < maxRows; counter++)
{
// First we'll read the entire record into a buffer and then read each field from the buffer
// This helps account for any extra space at the end of each record and probably performs better
buffer = br.ReadBytes(header.recordLen);
recReader = new BinaryReader(new MemoryStream(buffer));
// All dbf field records begin with a deleted flag field. Deleted - 0x2A (asterisk) else 0x20 (space)
if (recReader.ReadChar() == '*')
{
continue;
}
// Loop through each field in a record
fieldIndex = 2;
rowCount = dt.Rows.Count;
row = dt.NewRow();
foreach (FieldDescriptor field in fields)
{
switch (field.fieldType)
{
// Casts field's value according to its type and saves it in the dt.
}
fieldIndex++;
}
// Looks for key-value combination in every row until
// it finds it to know where to start reading the new rows.
if (!foundLastColumn && columnKeys.Keys.Count > 0)
{
foundLastColumn = true;
int i = 3;
if (dbfFile.Contains("con_compania")) { i = 2; }
for (; i < keys.Count && foundLastColumn; i++)
{
if (!row[keys[i]].ToString().Equals(values[i]))
{
foundLastColumn = false;
}
}
}
else
{
dt.Rows.Add(row);
nextId++;
}
}
}
catch (Exception e)
{
throw e;
}
finally
{
if (null != br)
{
br.Close();
br.Dispose();
}
}
long count = DateTime.Now.Ticks - start;
return dt;
}
The problem is that somewhere I am leaving some kind of reference behind, so I'm getting an OOM exception.
The method is called with something like:
DataTable dt = new ParseDBF().ReadBulkDBF(...);
//Use dt
dt.Dispose();
dt = null;
If I only call Dispose() it keeps the reference, and if I set it to null then dt becomes null, but the reference to the ParseDBF object is still there somewhere.
Any idea where the leak might be? I have looked all over the internet for ideas and tried calling Dispose() and Close(), and setting as null everything I can think of after I use it and it keeps happening.
I notice that recReader may not be getting freed.
I would strongly suggest making use of using blocks within this code to ensure that IDisposable objects are cleaned up when execution leaves the using scope.
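As a hedged sketch of what that might look like (the helper name ReadRecord is hypothetical; the point is simply that the per-record MemoryStream and BinaryReader get disposed as soon as the record has been read):
// Hypothetical helper: reads one record and disposes its readers deterministically.
private static DataRow ReadRecord(BinaryReader br, DBFHeader header, DataTable dt)
{
    byte[] buffer = br.ReadBytes(header.recordLen);
    using (var ms = new MemoryStream(buffer))
    using (var recReader = new BinaryReader(ms))
    {
        // Deleted records start with '*'; signal the caller to skip them.
        if (recReader.ReadChar() == '*')
        {
            return null;
        }
        DataRow row = dt.NewRow();
        // ... read each field from recReader into row, as in the original loop ...
        return row;
    }
}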

How to append data in a serialized file on disk

I have a program written in C# that serializes data into binary and writes it to disk. If I want to add more data to this file, first I have to deserialise the whole file and then append more serialized data to it. Is it possible to append data to this serialized file without deserialising the existing data, so that I can save some time during the whole process?
You don't have to read all the data in the file to append data.
You can open it in append mode and write the data.
var fileStream = File.Open(fileName, FileMode.Append, FileAccess.Write, FileShare.Read);
var binaryWriter = new BinaryWriter(fileStream);
binaryWriter.Write(data);
Now that we know (comments) that we're talking about a DataTable/DataSet via BinaryFormatter, it becomes clearer. If your intention is for that to appear as extra rows in the existing table, then no: that isn't going to work. What you could do is append, but deserialize each table in turn, then manually merge the contents. That is probably your best bet with what you describe. Here's an example just using 2, but obviously you'd repeat the deserialize/merge until EOF:
var dt = new DataTable();
dt.Columns.Add("foo", typeof (int));
dt.Columns.Add("bar", typeof(string));
dt.RemotingFormat = SerializationFormat.Binary;
var ser = new BinaryFormatter();
using(var ms = new MemoryStream())
{
dt.Rows.Add(123, "abc");
ser.Serialize(ms, dt); // batch 1
dt.Rows.Clear();
dt.Rows.Add(456, "def");
ser.Serialize(ms, dt); // batch 2
ms.Position = 0;
var table1 = (DataTable) ser.Deserialize(ms);
// the following is the merge loop that you'd repeat until EOF
var table2 = (DataTable) ser.Deserialize(ms);
foreach(DataRow row in table2.Rows) {
table1.ImportRow(row);
}
// show the results
foreach(DataRow row in table1.Rows)
{
Console.WriteLine("{0}, {1}", row[0], row[1]);
}
}
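For reference, the "repeat until EOF" merge that the code comment alludes to could be sketched like this, used in place of the two explicit Deserialize calls above (still inside the same using block, with the stream reset to the start; checking Position against Length works for a seekable stream such as MemoryStream or FileStream):
ms.Position = 0;
var merged = (DataTable)ser.Deserialize(ms); // first batch
while (ms.Position < ms.Length)              // keep merging until end of stream
{
    var next = (DataTable)ser.Deserialize(ms);
    foreach (DataRow row in next.Rows)
    {
        merged.ImportRow(row);
    }
}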
However! Personally I have misgivings about both DataTable and BinaryFormatter. If you know what your data is, there are other techniques. For example, this could be done very simply with "protobuf", since protobuf is inherently appendable. In fact, you need to do extra work to not append (although that is simple enough too):
[ProtoContract]
class Foo
{
[ProtoMember(1)]
public int X { get; set; }
[ProtoMember(2)]
public string Y { get; set; }
}
[ProtoContract]
class MyData
{
private readonly List<Foo> items = new List<Foo>();
[ProtoMember(1)]
public List<Foo> Items { get { return items; } }
}
then:
var batch1 = new MyData { Items = { new Foo { X = 123, Y = "abc" } } };
var batch2 = new MyData { Items = { new Foo { X = 456, Y = "def" } } };
using(var ms = new MemoryStream())
{
Serializer.Serialize(ms, batch1);
Serializer.Serialize(ms, batch2);
ms.Position = 0;
var merged = Serializer.Deserialize<MyData>(ms);
foreach(var row in merged.Items) {
Console.WriteLine("{0}, {1}", row.X, row.Y);
}
}
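And since the appendable behavior is the point, a sketch of the same pattern against the file on disk (path is a placeholder for your actual file location):
// Append a new batch without touching the existing data.
using (var file = File.Open(path, FileMode.Append, FileAccess.Write))
{
    Serializer.Serialize(file, new MyData { Items = { new Foo { X = 789, Y = "ghi" } } });
}
// Reading the whole file back merges every appended batch into a single MyData.
MyData all;
using (var file = File.OpenRead(path))
{
    all = Serializer.Deserialize<MyData>(file);
}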
