Stream data is lost when declaring new TextFieldParser()

I am passing a Stream from a CSV file from my controller to my business layer for processing. The stream reaches the method fine, but as soon as I declare my TextFieldParser and pass in the stream, the data is gone and is therefore not processed.
public CsvRecordReportModel processCsvStream(Stream dataStream, RecordSource recordSource, string fileName)
{
//Create instance of the report.
var report = new CsvRecordReportModel();
report.InsertedRecordCount = 0;
report.FileName = fileName;
using (TextFieldParser csvParser = new TextFieldParser(dataStream))
{
csvParser.CommentTokens = new string[] {"#"};
csvParser.SetDelimiters(new string[] {","});
csvParser.HasFieldsEnclosedInQuotes = true;
// Skip the row with the column names
csvParser.ReadLine();
while (!csvParser.EndOfData)
{
//Do stuff
}
}
return report;
}
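One possible cause, offered as an assumption rather than a diagnosis: if something upstream (for example the controller) has already read the stream, its Position sits at the end, and TextFieldParser then finds no data. The sketch below rewinds a seekable stream before parsing. The names CsvStreamSketch and ReadRows are hypothetical and not part of the original code.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvStreamSketch
{
    // Reads all data rows from a CSV stream, skipping the header row.
    public static List<string[]> ReadRows(Stream dataStream)
    {
        // Assumption: an earlier read left the position at the end of the stream.
        // Rewinding is only possible when the stream supports seeking.
        if (dataStream.CanSeek)
        {
            dataStream.Position = 0;
        }

        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(dataStream))
        {
            parser.CommentTokens = new[] { "#" };
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            // Skip the row with the column names, if there is one.
            if (!parser.EndOfData)
            {
                parser.ReadLine();
            }

            while (!parser.EndOfData)
            {
                // ReadFields advances the parser, so the loop terminates.
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

If the stream is not seekable (CanSeek is false), copying it into a MemoryStream first and rewinding that copy is one alternative.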

Related

how to create api to receive raw nmea messages

I have a requirement to create an API to receive GPS data from an IOT device and insert that data into a SQL table. Below is the raw data format. I have tried to create an API controller, but that will work only if the data format is Json.
$GPGLL,4826.67566,N,12322.19605,W,022314.000,A,A*4A
$GPGSA,A,3,30,29,10,21,24,26,15,,,,,,2.9,1.9,2.2*3D
$GPGST,022314.000,8.8,13.0,6.1,65.6,7.1,11.1,14.0*63
$GPGSV,3,1,11,05,09,179,,02,10,072,25,30,28,194,38,29,77,118,42*72
$GPGSV,3,2,11,10,42,059,36,16,24,315,27,21,45,256,43,24,84,024,40*79*
Can anyone help to solve this?
This is the API controller I tried:
public class GPSStatusController : ApiController
{
public HttpResponseMessage Post([FromBody]GPSStatu Bts)
{
using (LocationEntities entities = new LocationEntities())
{
var ins = new GPSStatu();
ins.ID = Bts.EvID;
ins.GPSData = Bts.GPSData;
entities.BatteryStatus.Add(ins);
entities.SaveChanges();
}
return Request.CreateResponse(HttpStatusCode.OK);
}
}

If the raw data is just a csv string then you can do:
public async Task<IActionResult> Post()
{
    var list = new List<GPSStatu>();
    using (var mem = new MemoryStream())
    {
        await Request.Body.CopyToAsync(mem);
        // Rewind so the parser reads from the start of the copied data.
        mem.Position = 0;
        // Parse the MemoryStream.
        var table = GenericCsvParse(mem, ',', 0, false, '"');
        foreach (DataRow row in table.Rows)
        {
            list.Add(new GPSStatu { ID = row[0].ToString(), /* ..... */ });
        }
    }
    return Ok();
}
public static DataTable GenericCsvParse(Stream fs, char delimiter, int skip, bool firstrowheader, char textqualifier)
{
using (StreamReader sr = new StreamReader(fs))
{
using (GenericParserAdapter parser = new GenericParserAdapter(sr)
{
ColumnDelimiter = delimiter,
SkipStartingDataRows = skip,
FirstRowHasHeader = firstrowheader,
TextQualifier = textqualifier
})
{
return parser.GetDataTable();
}
}
}
Here I used https://www.nuget.org/packages/GenericParsing, but you can also do it with a plain string split.
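The plain string split mentioned above could look roughly like the sketch below. It only separates one NMEA sentence into its fields and the text after the `*` and does not validate the checksum. NmeaSplitSketch and SplitSentence are hypothetical names used for illustration.

```csharp
using System;

public static class NmeaSplitSketch
{
    // Splits one NMEA sentence into its comma-separated fields.
    // The text after '*' (the checksum, if present) is returned separately.
    public static string[] SplitSentence(string sentence, out string checksum)
    {
        string body = sentence.Trim();

        // Drop the leading '$' that starts each sentence.
        if (body.StartsWith("$", StringComparison.Ordinal))
        {
            body = body.Substring(1);
        }

        checksum = null;
        int star = body.IndexOf('*');
        if (star >= 0)
        {
            checksum = body.Substring(star + 1);
            body = body.Substring(0, star);
        }

        return body.Split(',');
    }
}
```

Each line of the request body can be passed through SplitSentence, with the first field (for example GPGLL) used to decide how the remaining fields are interpreted.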

How to improve performance of CSV upload via datatable

I have a working solution for uploading a CSV file. Currently, I use the IFormCollection for a user to upload multiple CSV files from a view.
The CSV files are saved as a temp file as follows:
List<string> fileLocations = new List<string>();
foreach (var formFile in files)
{
filePath = Path.GetTempFileName();
if (formFile.Length > 0)
{
using (var stream = new FileStream(filePath, FileMode.Create))
{
await formFile.CopyToAsync(stream);
}
}
fileLocations.Add(filePath);
}
I send the list of file locations to another method (just below). I loop through the file locations and stream the data from the temp files, then use a DataTable and SqlBulkCopy to insert the data. I currently upload between 50 and 200 files at a time, and each file is around 330KB. Inserting a hundred files (around 30-35MB in total) takes around 6 minutes.
public void SplitCsvData(string fileLocation, Guid uid)
{
MetaDataModel MetaDatas;
List<RawDataModel> RawDatas;
var reader = new StreamReader(File.OpenRead(fileLocation));
List<string> listRows = new List<string>();
while (!reader.EndOfStream)
{
listRows.Add(reader.ReadLine());
}
var metaData = new List<string>();
var rawData = new List<string>();
foreach (var row in listRows)
{
var rowName = row.Split(',')[0];
bool parsed = int.TryParse(rowName, out int result);
if (parsed == false)
{
metaData.Add(row);
}
else
{
rawData.Add(row);
}
}
//Assigns the vertical header name and value to the object by splitting string
RawDatas = GetRawData.SplitRawData(rawData);
SaveRawData(RawDatas);
MetaDatas = GetMetaData.SplitRawData(rawData);
SaveRawData(RawDatas);
}
This code then passes the object on to create the DataTable and insert the data.
private DataTable CreateRawDataTable
{
get
{
var dt = new DataTable();
dt.Columns.Add("Id", typeof(int));
dt.Columns.Add("SerialNumber", typeof(string));
dt.Columns.Add("ReadingNumber", typeof(int));
dt.Columns.Add("ReadingDate", typeof(string));
dt.Columns.Add("ReadingTime", typeof(string));
dt.Columns.Add("RunTime", typeof(string));
dt.Columns.Add("Temperature", typeof(double));
dt.Columns.Add("ProjectGuid", typeof(Guid));
dt.Columns.Add("CombineDateTime", typeof(string));
return dt;
}
}
public void SaveRawData(List<RawDataModel> data)
{
DataTable dt = CreateRawDataTable;
var count = data.Count;
for (var i = 1; i < count; i++)
{
DataRow row = dt.NewRow();
row["Id"] = data[i].Id;
row["ProjectGuid"] = data[i].ProjectGuid;
row["SerialNumber"] = data[i].SerialNumber;
row["ReadingNumber"] = data[i].ReadingNumber;
row["ReadingDate"] = data[i].ReadingDate;
row["ReadingTime"] = data[i].ReadingTime;
row["CombineDateTime"] = data[i].CombineDateTime;
row["RunTime"] = data[i].RunTime;
row["Temperature"] = data[i].Temperature;
dt.Rows.Add(row);
}
using (var conn = new SqlConnection(connectionString))
{
conn.Open();
using (SqlTransaction tr = conn.BeginTransaction())
{
using (var sqlBulk = new SqlBulkCopy(conn, SqlBulkCopyOptions.Default, tr))
{
sqlBulk.BatchSize = 1000;
sqlBulk.DestinationTableName = "RawData";
sqlBulk.WriteToServer(dt);
}
tr.Commit();
}
}
}
Is there another or better way to do this that reduces the upload time? It can take a long time, and I am seeing memory use climb steadily to around 500MB.
TIA

You can improve performance by removing the DataTable and reading from the input stream directly.
SqlBulkCopy has a WriteToServer overload that accepts an IDataReader instead of an entire DataTable.
CsvHelper can parse CSV files using a StreamReader as input. It provides CsvDataReader as an IDataReader implementation on top of the CSV data. This allows reading directly from the input stream and writing to SqlBulkCopy.
The following method reads from an IFormFile, parses the stream using CsvHelper, and uses the CSV's fields to configure a SqlBulkCopy instance:
public async Task ToTable(IFormFile file, string table)
{
using (var stream = file.OpenReadStream())
using (var tx = new StreamReader(stream))
using (var reader = new CsvReader(tx))
using (var rd = new CsvDataReader(reader))
{
var headers = reader.Context.HeaderRecord;
var bcp = new SqlBulkCopy(_connection)
{
DestinationTableName = table
};
//Assume the file headers and table fields have the same names
foreach(var header in headers)
{
bcp.ColumnMappings.Add(header, header);
}
await bcp.WriteToServerAsync(rd);
}
}
This way nothing is ever written to a temp table or cached in memory. The uploaded files are parsed and written to the database directly.

In addition to @Panagiotis's answer, why don't you interleave your file processing with the file upload? Wrap up your file processing logic in an async method, change the loop to a Parallel.ForEach, and process each file as it arrives instead of waiting for all of them.
private static readonly object listLock = new object(); // only once at class level
List<string> fileLocations = new List<string>();
// Parallel.ForEach does not await async lambdas, so the body blocks on the async calls.
Parallel.ForEach(files, formFile =>
{
    // Local variable so each iteration has its own path.
    var filePath = Path.GetTempFileName();
    if (formFile.Length > 0)
    {
        using (var stream = new FileStream(filePath, FileMode.Create))
        {
            formFile.CopyTo(stream);
        }
        ProcessFileInToDbAsync(filePath).GetAwaiter().GetResult();
    }
    // Added lock for thread safety of the List
    lock (listLock)
    {
        fileLocations.Add(filePath);
    }
});

Thanks to @Panagiotis Kanavos, I was able to work out what to do. Firstly, the way I was calling the methods was leaving them in memory. The CSV file I have is in two parts: vertical metadata and then the usual horizontal information, so I needed to split them into two. Saving them as temp files was also causing overhead. It has gone from taking 5-6 minutes to now taking a minute, which for 100 files containing 8,500 rows isn't bad I suppose.
Calling the method:
public async Task<IActionResult> UploadCsvFiles(ICollection<IFormFile> files, IFormCollection fc)
{
foreach (var f in files)
{
var getData = new GetData(_configuration);
await getData.SplitCsvData(f, uid);
}
return whatever;
}
This is the method doing the splitting:
public async Task SplitCsvData(IFormFile file, string uid)
{
var data = string.Empty;
var m = new List<string>();
var r = new List<string>();
var records = new List<string>();
using (var stream = file.OpenReadStream())
using (var reader = new StreamReader(stream))
{
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var header = line.Split(',')[0].ToString();
bool parsed = int.TryParse(header, out int result);
if (!parsed)
{
m.Add(line);
}
else
{
r.Add(line);
}
}
}
//TODO: Validation
//This splits the list into the Meta data model. This is just a single object, with static fields.
var metaData = SplitCsvMetaData.SplitMetaData(m, uid);
DataTable dtm = CreateMetaData(metaData);
var serialNumber = metaData.LoggerId;
await SaveMetaData("MetaData", dtm);
//
var lrd = new List<RawDataModel>();
foreach (string row in r)
{
lrd.Add(new RawDataModel
{
Id = 0,
SerialNumber = serialNumber,
ReadingNumber = Convert.ToInt32(row.Split(',')[0]),
ReadingDate = Convert.ToDateTime(row.Split(',')[1]).ToString("yyyy-MM-dd"),
ReadingTime = Convert.ToDateTime(row.Split(',')[2]).ToString("HH:mm:ss"),
RunTime = row.Split(',')[3].ToString(),
Temperature = Convert.ToDouble(row.Split(',')[4]),
ProjectGuid = uid.ToString(),
CombineDateTime = Convert.ToDateTime(row.Split(',')[1] + " " + row.Split(',')[2]).ToString("yyyy-MM-dd HH:mm:ss")
});
}
await SaveRawData("RawData", lrd);
}
I then use a DataTable for the metadata (which takes 20 seconds for 100 files), as I map the field names to the columns.
public async Task SaveMetaData(string table, DataTable dt)
{
using (SqlBulkCopy sqlBulk = new SqlBulkCopy(_configuration.GetConnectionString("DefaultConnection"), SqlBulkCopyOptions.Default))
{
sqlBulk.DestinationTableName = table;
await sqlBulk.WriteToServerAsync(dt);
}
}
I then use FastMember for the larger raw data part, which is more like a traditional CSV.
public async Task SaveRawData(string table, IEnumerable<RawDataModel> lrd)
{
using (SqlBulkCopy sqlBulk = new SqlBulkCopy(_configuration.GetConnectionString("DefaultConnection"), SqlBulkCopyOptions.Default))
using (var reader = ObjectReader.Create(lrd, "Id","SerialNumber", "ReadingNumber", "ReadingDate", "ReadingTime", "RunTime", "Temperature", "ProjectGuid", "CombineDateTime"))
{
sqlBulk.DestinationTableName = table;
await sqlBulk.WriteToServerAsync(reader);
}
}
I am sure this can be improved on, but for now, this works really well.

Read text file from specific position and store in two arrays

I have a text file which contains lines like this:
#relation SMILEfeatures
#attribute pcm_LOGenergy_sma_range numeric
#attribute pcm_LOGenergy_sma_maxPos numeric
#attribute pcm_LOGenergy_sma_minPos numeric...
There are about 6000 lines of these attributes. After the attributes there are lines like this:
#data
1.283827e+01,3.800000e+01,2.000000e+00,5.331364e+00
1.850000e+02,4.054457e+01,4.500000e+01,3.200000e+01...
I need to separate these strings into two different arrays. So far I have only managed to store everything in one array.
Here is my code for storing everything in one array:
using (var stream = new FileStream(filePath, FileMode.OpenOrCreate))
{
using (var sr = new StreamReader(stream))
{
String line;
while ((line = sr.ReadLine()) != null)
{
sb.AppendLine(line);
}
}
string allines = sb.ToString();
Console.WriteLine(sb);
}

All lines after #relation SMILEfeatures that start with #attribute are stored in the first collection. All the values after #data are stored in the second. I hope this is what you wanted.
var relationLineNumbers = new List<int>();
var dataLineNumbers = new List<int>();
var relation = new StringBuilder();
var data = new List<string>();
using (var stream = new FileStream(filepath, FileMode.OpenOrCreate))
{
using (var sr = new StreamReader(stream))
{
string line;
bool isRelation = false;
bool isData = false;
int lineNumber = 0;
while ((line = sr.ReadLine()) != null)
{
lineNumber++;
if (line.StartsWith("#relation SMILEfeatures"))
{
isRelation = true;
isData = false;
continue;
}
if (line.StartsWith("#data"))
{
isData = true;
isRelation = false;
continue;
}
if (isRelation)
{
if (line.StartsWith("#attribute"))
{
relation.AppendLine(line);
relationLineNumbers.Add(lineNumber);
}
}
if (isData)
{
data.AddRange(line.Split(','));
dataLineNumbers.Add(lineNumber);
}
}
}
}
Console.WriteLine("Relation");
Console.WriteLine(relation.ToString());
Console.WriteLine("Data");
data.ForEach(Console.WriteLine);

All strings which start with #relation SMILEfeatures or contain #attribute should be stored in the first array. Numbers which come after #data should be stored in the second array.
Use string.Contains() and string.StartsWith() for checking.
Read every line and decide in which array / list you want to put it:
void ReadAndSortInArrays(string fileLocation)
{
List<string> noData = new List<string>();
List<string> Data = new List<string>();
using(StreamReader sr = new StreamReader(fileLocation))
{
string line;
while(!sr.EndOfStream)
{
line = sr.ReadLine();
if(line.StartsWith("#relation") || line.StartsWith("#attribute"))
{
noData.Add(line);
}
else if(line.StartsWith("#data"))
{
Data.Add(line);
}
else
{
// This is strange
}
}
}
}
var noDataArray = noData.ToArray();
var DataArray = Data.ToArray();
}
But I think that not every line begins with "#data".
So you may want to read all lines and do something like this:
string allLines;
using(StreamReader sr = new StreamReader(yourfile))
{
allLines = sr.ReadToEnd();
}
var arrays = allLines.Split(new[] { "#data" }, StringSplitOptions.None);
// arrays[0] is the part before #data
// arrays[1] is the part after #data (the numbers)
// But arrays[1] does not contain #data

The question is not really very clear. But my take is, collect all lines that start with #relation or #attribute in one bucket, then collect all number lines in another bucket. I have chosen to ignore the #data lines, as they do not seem to contain any extra information.
Error checking may be performed by making sure that the data lines (i.e. number lines) contain comma separated lists of parsable numerical values.
var dataLines = new List<string>();
var relAttLines = new List<string>();
foreach (var line in File.ReadAllLines(filePath))
{
if (line.StartsWith("#relation") || line.StartsWith("#attribute"))
relAttLines.Add(line);
else if (line.StartsWith("#data"))
//ignore these
continue;
else
dataLines.Add(line);
}
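The error checking suggested above (making sure the data lines hold parsable numbers) could be sketched as follows. This is only an illustration under assumptions: header lines are everything before #data, values use the invariant culture, and a bad value raises a FormatException. SectionSplitSketch and Split are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class SectionSplitSketch
{
    // Splits the input into header lines (before "#data") and numeric rows (after it).
    public static void Split(TextReader reader, out string[] headerLines, out double[][] dataRows)
    {
        var headers = new List<string>();
        var rows = new List<double[]>();
        bool inData = false;
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
            {
                continue; // ignore blank lines
            }
            if (line.StartsWith("#data", StringComparison.Ordinal))
            {
                inData = true; // everything after this marker is numeric data
                continue;
            }
            if (!inData)
            {
                headers.Add(line);
                continue;
            }

            string[] parts = line.Split(',');
            var values = new double[parts.Length];
            for (int i = 0; i < parts.Length; i++)
            {
                // NumberStyles.Float accepts exponent notation such as 1.283827e+01.
                if (!double.TryParse(parts[i], NumberStyles.Float, CultureInfo.InvariantCulture, out values[i]))
                {
                    throw new FormatException("Not a number: " + parts[i]);
                }
            }
            rows.Add(values);
        }

        headerLines = headers.ToArray();
        dataRows = rows.ToArray();
    }
}
```

Passing a TextReader keeps the sketch independent of the file system; a StreamReader over the file from the question would work the same way.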

How to put data from List<string []> to dataGridView

I am trying to put some data from a List<string[]> into a DataGridView, but I am having a problem with it.
I currently have a method that returns the required list (please see the picture below).
Code:
public List<string[]> ReadFromFileBooks()
{
List<string> myIdCollection = new List<string>();
List<string[]> resultColl = new List<string[]>();
if (chooise == "all")
{
if (File.Exists(filePath))
{
using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
{
StreamReader sr = new StreamReader(fs);
string[] line = sr.ReadToEnd().Split(new string[] { Environment.NewLine },
StringSplitOptions.RemoveEmptyEntries);
foreach (string l in line)
{
string[] result = l.Split(',');
foreach (string element in result)
{
myIdCollection.Add(element);
}
resultColl.Add(new string[] { myIdCollection[0], myIdCollection[1], myIdCollection[2], myIdCollection[3] });
myIdCollection.Clear();
}
sr.Close();
return resultColl;
}
}
....
This returns the required data in the required form (a list of arrays).
After this, I try to move it to the DataGridView, which already has 4 named columns (because I'm sure that no more than 4 columns are required); please see the picture below.
I try to put the data into the DataGridView using the following code:
private void radioButtonViewAll_CheckedChanged(object sender, EventArgs e)
{
TxtLibrary myList = new TxtLibrary(filePathBooks);
myList.chooise = "all";
//myList.ReadFromFileBooks();
DataTable table = new DataTable();
foreach (var array in myList.ReadFromFileBooks())
{
table.Rows.Add(array);
}
dataGridViewLibrary.DataSource = table;
}
But as a result I get the error "required more rows than exist in dataGridView", even though, according to what I see (picture above), the number of rows (4) equals the number of array elements in the List (4).
I tried to check the result by adding temporary variables, and it looks fine (please see the picture below).
Where am I going wrong? Maybe I am not using the DataGridView correctly?
EDIT
example of file (simple csv)
11111, Author, Name, Categories
11341, Author1, Name1, Categories1

You need to add columns to your DataTable first before adding rows:
private void radioButtonViewAll_CheckedChanged(object sender, EventArgs e)
{
TxtLibrary myList = new TxtLibrary(filePathBooks);
myList.chooise = "all";
DataTable table = new DataTable();
//add columns first
table.Columns.Add("ID");
table.Columns.Add("Author");
table.Columns.Add("Caption");
table.Columns.Add("Categories");
//then add rows
foreach (var array in myList.ReadFromFileBooks()) {
table.Rows.Add(array);
}
dataGridViewLibrary.DataSource = table;
}
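The same idea can be tried without a form. The sketch below builds the DataTable from CSV lines, adding the columns first and then the rows. BookTableSketch and BuildTable are hypothetical names, and trimming the fields is an assumption based on the spaces after the commas in the sample file.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class BookTableSketch
{
    // Builds a four-column table from lines such as "11111, Author, Name, Categories".
    public static DataTable BuildTable(IEnumerable<string> csvLines)
    {
        var table = new DataTable();

        // Columns must exist before rows are added; otherwise Rows.Add throws
        // because the input array is longer than the number of columns.
        table.Columns.Add("ID");
        table.Columns.Add("Author");
        table.Columns.Add("Caption");
        table.Columns.Add("Categories");

        foreach (var line in csvLines)
        {
            // Trim each field, since the sample file has a space after every comma.
            string[] fields = line.Split(',').Select(f => f.Trim()).ToArray();
            table.Rows.Add(fields);
        }

        return table;
    }
}
```

The resulting table can then be assigned to dataGridViewLibrary.DataSource as in the answer above.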

I think your code is too complex. If you simply want to see all the data from the file in the table, you can do this:
if (!System.IO.File.Exists("file.txt"))
return;
dgvDataGridView.ColumnCount = 4;
dgvDataGridView.Columns[0].HeaderCell.Value = "ID";
dgvDataGridView.Columns[1].HeaderCell.Value = "Author";
dgvDataGridView.Columns[2].HeaderCell.Value = "Caption";
dgvDataGridView.Columns[3].HeaderCell.Value = "Categories";
using (System.IO.StreamReader sr = new System.IO.StreamReader("file.txt"))
while (sr.Peek() > -1)
dgvDataGridView.Rows.Add(sr.ReadLine().Split(','));

How to append data in a serialized file on disk

I have a program written in C# that serializes data into binary and writes it to disk. If I want to add more data to this file, I first have to deserialize the whole file and then append more serialized data to it. Is it possible to append data to this serialized file without deserializing the existing data, so that I can save some time during the whole process?

You don't have to read all the data in the file to append data.
You can open it in append mode and write the data.
using (var fileStream = File.Open(fileName, FileMode.Append, FileAccess.Write, FileShare.Read))
using (var binaryWriter = new BinaryWriter(fileStream))
{
binaryWriter.Write(data);
}
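As a rough illustration of appending without rereading, the sketch below writes simple records (an int and a string) with BinaryWriter in append mode and later reads them all back with BinaryReader. The record layout and the names AppendSketch, AppendRecord and ReadAll are assumptions made for this example. It does not apply to data written by BinaryFormatter.

```csharp
using System.Collections.Generic;
using System.IO;

public static class AppendSketch
{
    // Appends one record to the end of the file without reading existing content.
    public static void AppendRecord(string fileName, int id, string name)
    {
        using (var stream = File.Open(fileName, FileMode.Append, FileAccess.Write, FileShare.Read))
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(id);
            writer.Write(name); // written with a length prefix, so it can be read back
        }
    }

    // Reads every record in the file, in the order they were appended.
    public static List<KeyValuePair<int, string>> ReadAll(string fileName)
    {
        var records = new List<KeyValuePair<int, string>>();
        using (var stream = File.OpenRead(fileName))
        using (var reader = new BinaryReader(stream))
        {
            while (stream.Position < stream.Length)
            {
                int id = reader.ReadInt32();
                string name = reader.ReadString();
                records.Add(new KeyValuePair<int, string>(id, name));
            }
        }
        return records;
    }
}
```

Because every record is self-delimiting, later appends never need to touch the earlier bytes.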

Now that we know (comments) that we're talking about a DataTable/DataSet via BinaryFormatter, it becomes clearer. If your intention is for that to appear as extra rows in the existing table, then no: that isn't going to work. What you could do is append, but deserialize each table in turn, then manually merge the contents. That is probably your best bet with what you describe. Here's an example just using 2, but obviously you'd repeat the deserialize/merge until EOF:
var dt = new DataTable();
dt.Columns.Add("foo", typeof (int));
dt.Columns.Add("bar", typeof(string));
dt.RemotingFormat = SerializationFormat.Binary;
var ser = new BinaryFormatter();
using(var ms = new MemoryStream())
{
dt.Rows.Add(123, "abc");
ser.Serialize(ms, dt); // batch 1
dt.Rows.Clear();
dt.Rows.Add(456, "def");
ser.Serialize(ms, dt); // batch 2
ms.Position = 0;
var table1 = (DataTable) ser.Deserialize(ms);
// the following is the merge loop that you'd repeat until EOF
var table2 = (DataTable) ser.Deserialize(ms);
foreach(DataRow row in table2.Rows) {
table1.ImportRow(row);
}
// show the results
foreach(DataRow row in table1.Rows)
{
Console.WriteLine("{0}, {1}", row[0], row[1]);
}
}
However! Personally I have misgivings about both DataTable and BinaryFormatter. If you know what your data is, there are other techniques. For example, this could be done very simply with "protobuf", since protobuf is inherently appendable. In fact, you need to do extra to not append (although that is simple enough too):
[ProtoContract]
class Foo
{
[ProtoMember(1)]
public int X { get; set; }
[ProtoMember(2)]
public string Y { get; set; }
}
[ProtoContract]
class MyData
{
private readonly List<Foo> items = new List<Foo>();
[ProtoMember(1)]
public List<Foo> Items { get { return items; } }
}
then:
var batch1 = new MyData { Items = { new Foo { X = 123, Y = "abc" } } };
var batch2 = new MyData { Items = { new Foo { X = 456, Y = "def" } } };
using(var ms = new MemoryStream())
{
Serializer.Serialize(ms, batch1);
Serializer.Serialize(ms, batch2);
ms.Position = 0;
var merged = Serializer.Deserialize<MyData>(ms);
foreach(var row in merged.Items) {
Console.WriteLine("{0}, {1}", row.X, row.Y);
}
}
