Create a custom method that reads Excel columns as is - c#

I have the property below, which I need to populate from an Excel report column named ENTRY NUMBER. I'm looking for a custom method that takes a string, so that I can assign the spreadsheet column name to the property.
public string EntryNumber { get; set; }
I looked for libraries that work like System.Xml.Serialization, where XmlAttribute(nameof()) lets the property name in the class differ from the name used in the actual XML file, hoping to apply the same idea to an .xlsx file.
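One way to get the XmlAttribute-style behaviour described above is a small custom attribute plus reflection. The sketch below is only an illustration: ExcelColumnNameAttribute, Entry, and ExcelRowMapper are hypothetical names, and it assumes each row has already been read into a header-to-text dictionary by whatever Excel library is in use. (LinqToExcel's ExcelColumn attribute, shown in the related question below, serves the same purpose.)

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute: carries the spreadsheet header for a property.
[AttributeUsage(AttributeTargets.Property)]
public sealed class ExcelColumnNameAttribute : Attribute
{
    public string Name { get; }
    public ExcelColumnNameAttribute(string name) => Name = name;
}

// Hypothetical model: the header "ENTRY NUMBER" differs from the property name.
public class Entry
{
    [ExcelColumnName("ENTRY NUMBER")]
    public string EntryNumber { get; set; }
}

public static class ExcelRowMapper
{
    // Builds one T from a row given as header -> cell text.
    // Properties without the attribute fall back to their own name.
    // Only writable string properties are handled in this sketch.
    public static T Map<T>(IReadOnlyDictionary<string, string> row) where T : new()
    {
        var item = new T();
        foreach (PropertyInfo prop in typeof(T).GetProperties())
        {
            if (prop.PropertyType != typeof(string) || !prop.CanWrite) continue;

            string header =
                prop.GetCustomAttribute<ExcelColumnNameAttribute>()?.Name ?? prop.Name;

            if (row.TryGetValue(header, out string value))
                prop.SetValue(item, value);
        }
        return item;
    }
}
```

Calling ExcelRowMapper.Map<Entry>(row) with a dictionary that contains the key "ENTRY NUMBER" fills EntryNumber with that cell's text. Type conversion for non-string properties is deliberately left out.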

Related

LinqToExcel returning blank rows for .csv files

I've been using LinqToExcel to import data from .xlsx files successfully for a while. Recently, however, I was sent a .csv file that I'm unable to read the data of.
Let's say that the file contains the following data:
Col1 Col2 Col3
A B C
D E F
I've created a class for mapping the columns as such:
public class Test
{
    [ExcelColumn("Col1")]
    public string Col1 { get; set; }
    [ExcelColumn("Col2")]
    public string Col2 { get; set; }
    [ExcelColumn("Col3")]
    public string Col3 { get; set; }
}
Then I try to read the data like so:
var test = from c in excel.Worksheet<Test>()
select c;
The query successfully returns two Test-objects, but all property values are null.
I even tried to read the data without class and header:
var test = from c in excel.WorksheetNoHeader()
select c;
In this case, the query also returns two rows, both with three cells/values. But again all of these values are null. What could be the issue here?
I should also note that the file opens and looks perfectly fine in Excel. Furthermore using StreamReader, I'm able to read all of its rows and values.
What type of data is in each of those columns? (string, numeric, ...)
According to "Initializing the Microsoft Excel Driver":
TypeGuessRows
The number of rows to be checked for the data type. The data type is
determined given the maximum number of kinds of data found. If there
is a tie, the data type is determined in the following order: Number,
Currency, Date, Text, Boolean. If data is encountered that does not
match the data type guessed for the column, it is returned as a Null
value. On import, if a column has mixed data types, the entire column
will be cast according to the ImportMixedTypes setting. The default
number of rows to be checked is 8. Values are of type REG_DWORD.
See post Can I specify the data type for a column rather than letting linq-to-excel decide?
The post Setting TypeGuessRows for excel ACE Driver states how to change the value for TypeGuessRows.
When the driver determines that an Excel column contains text data,
the driver selects the data type (string or memo) based on the longest
value that it samples. If the driver does not discover any values
longer than 255 characters in the rows that it samples, it treats the
column as a 255-character string column instead of a memo column.
Therefore, values longer than 255 characters may be truncated. To
import data from a memo column without truncation, you must make sure
that the memo column in at least one of the sampled rows contains a
value longer than 255 characters, or you must increase the number of
rows sampled by the driver to include such a row. You can increase the
number of rows sampled by increasing the value of TypeGuessRows under
the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel
registry key.
One more thing to keep in mind is that the registry value
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows
only applies to Excel 97-2003. For Excel 2007 and later, the
Excel Open XML format (.XLSX extension) uses the ACE OLE DB provider
rather than the JET provider. If you want to keep the file extension as .XLSX,
you need to modify the following registry key according to your Excel
version:
Excel 2007: HKEY_LOCAL_MACHINE\Software\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
Excel 2010: HKEY_LOCAL_MACHINE\Software\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
Excel 2013: HKEY_LOCAL_MACHINE\Software\Microsoft\Office\15.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
Did you try to materialize your query by calling ToList or ToArray at the end?
I tried to recreate your case and had no trouble reading the data from the Excel file using the following code snippet:
var excel = new ExcelQueryFactory(FilePath);
List<Test> tests = (
        from c in excel.Worksheet<Test>()
        select c
    )
    .ToList();
It returns two objects with all properties filled properly.
One minor thing: when I first added ToList, I got the following exception:
The 'Microsoft.ACE.OLEDB.12.0' provider is not registered on the local machine.
According to the official docs this is expected, since the Microsoft Access Database Engine 2010 Redistributable was missing on my machine.

Epplus: How to make "LoadFromCollection" use [DisplayFormat()] attributes in model

I have an ActionResult (in an MVC 5 website) that successfully uses Epplus to export a data set to an Excel document.
In the ActionResult code block, I use code like this to format the column as a short date. Without this code, the date column appears as int values.
// format date columns as date
worksheet.Column(6).Style.Numberformat.Format =
DateTimeFormatInfo.CurrentInfo.ShortDatePattern;
However, in the Model, I already have this defined as an attribute. It works great on the View page, but doesn't format the column as a short date -- hence, the code shown above.
[Column(TypeName = "date")]
[DisplayName("Exit Date")]
[DisplayFormat(DataFormatString = "{0:d}")]
public DateTime ExitDate { get; set; }
To provide further context, I load my worksheet from a collection.
worksheet.Cells["A5"].LoadFromCollection(Collection: exportQuery, PrintHeaders: true);
Is there a way to extend LoadFromCollection so that the worksheet loads not only the Collection content, but also the formatting attributes that exist in the model? The "DisplayName" attributes are picked up, so is there a way to also pick up "DisplayFormat" attributes without having to write separate code?
EPPlus unfortunately didn't implement support for DisplayFormat.
The library is open source, so you can find the full implementation of LoadFromCollection and its overloads on CodePlex.
To answer your question: I'm afraid you will have to write separate code for this.
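The "separate code" can at least be made generic with reflection. The following is a minimal sketch, not part of EPPlus: DisplayFormatReader and ExportRow are hypothetical names, and it assumes that LoadFromCollection writes one column per public instance property in the order that GetProperties returns them, which should be checked against the EPPlus version in use.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Reflection;

// Hypothetical export model with one formatted column.
public class ExportRow
{
    public string Name { get; set; }

    [DisplayFormat(DataFormatString = "{0:d}")]
    public DateTime ExitDate { get; set; }
}

public static class DisplayFormatReader
{
    // Returns (zero-based property index, DataFormatString) for every public
    // instance property of T that carries a [DisplayFormat] attribute.
    public static List<(int Index, string Format)> GetFormats<T>()
    {
        var result = new List<(int Index, string Format)>();
        PropertyInfo[] props =
            typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);

        for (int i = 0; i < props.Length; i++)
        {
            var attr = props[i].GetCustomAttribute<DisplayFormatAttribute>();
            if (attr?.DataFormatString != null)
                result.Add((i, attr.DataFormatString));
        }
        return result;
    }
}
```

Each returned index can then be offset by the first worksheet column and used with worksheet.Column(...).Style.Numberformat.Format, as in the question. Note that a .NET format string such as "{0:d}" is not an Excel number format, so it still has to be translated (for "{0:d}", the ShortDatePattern used in the question is one option).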

c# CSVHelper read CSV with variable headers

First time using the csvReader - note it requires a custom class that defines the Headers found in the CSV file.
class DataRecord
{
    //Should have properties which correspond to the Column Names in the file
    public String Amount { get; set; }
    public String InvoiceDate { get; set; }......
}
The example given then uses the class such:-
using (var sr = new StreamReader(@"C:\Data\Invoices.csv"))
{
    var reader = new CsvReader(sr);
    //GetRecords returns a lazy enumerable; rows are read as it is iterated
    IEnumerable<DataRecord> records = reader.GetRecords<DataRecord>();
    //First 5 records in CSV file will be printed to the Output Window
    foreach (DataRecord record in records.Take(5))
    {
        Debug.Print("{0} {1}, {2}", record.Amount, record.InvoiceDate, ....);
    }
}
Two questions:
1. The app will be loading in files with differing headers so I need to be able to update this class on the fly - is this possible & how?
(I am able to extract the headers from the CSV file.)
2. The CSV file is potentially multiple millions of rows (GB size), so is this the best / most efficient way of importing the file?
The destination is a SQLite DB; the debug line is only used as an example.
Thanks
The app will be loading in files with differing headers so I need to be able to update this class on the fly - is this possible & how?
Although it is definitely possible with reflection or third-party libraries, creating an object for each row will be inefficient for such big files. Moreover, using C# for such a scenario is a bad idea (unless you have some business data transformation to do). I would consider something like this, or perhaps an SSIS package.
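If the import does stay in C#, the record class can be skipped entirely by reading the header row and then each field by index. This is a minimal sketch assuming a recent CsvHelper version (where the CsvReader constructor takes a CultureInfo); CsvStreaming and ReadRows are hypothetical names, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using CsvHelper;

public static class CsvStreaming
{
    // Lazily yields each data row as a string array aligned with the header.
    // The header names are passed to onHeader once, before the first row.
    // Only one row is held in memory at a time.
    public static IEnumerable<string[]> ReadRows(TextReader source, Action<string[]> onHeader)
    {
        using var csv = new CsvReader(source, CultureInfo.InvariantCulture);

        // Read() advances to the first row; ReadHeader() treats it as the header.
        if (!csv.Read() || !csv.ReadHeader()) yield break;

        string[] headers = csv.HeaderRecord;
        onHeader(headers);

        while (csv.Read())
        {
            var row = new string[headers.Length];
            for (int i = 0; i < headers.Length; i++)
                row[i] = csv.GetField(i); // throws by default if the row is short
            yield return row;
        }
    }
}
```

Because the rows are produced lazily, the caller can forward them to the database in batches instead of loading the whole file first.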

c# mongodb large file within a document

I am not entirely sure how GridFS works in MongoDB. All the examples I have seen so far just involve grabbing a file and uploading it to a db through the API, but I want to know:
a) can you have large files embedded in your typical JSON style documents or do they have to be stored in their own special GridFS collection or db?
b) how can I handle this type of situation where I have an object which has some typical fields in it, strings ints etc but also has a collection of attachment files which could be anything from small txt files to fairly large video files?
for example
class bug
{
    public int Id { get; protected set; }
    public string Name { get; protected set; }
    public string Description { get; protected set; }
    public string StackTrace { get; protected set; }
    public List<File> Attachments { get; protected set; } //pictures/videos of the bug in action or text files with user config data in it etc.
}
a) can you have large files embedded in your typical JSON style
documents or do they have to be stored in their own special GridFS
collection or db?
You can, as long as the file size does not exceed MongoDB's 16 MB document size limit. But you will need to serialize/deserialize the file and do the other extra work yourself.
b) how can I handle this type of situation where I have an object
which has some typical fields in it, strings ints etc but also has a
collection of attachment files which could be anything from small txt
files to fairly large video files?
If you do decide to store your attachments in MongoDB, GridFS is the better way to go. You can simply store the file in GridFS and keep the id of that file, plus any metadata (file name, size, etc.), in the Attachments collection. Then you can easily get the file content by id from the inner Attachments collection.
MongoDB GridFS is a simple layer on top of MongoDB that splits big files into chunks, stores them in MongoDB, and reads the files back from those chunks. To get started with C# and GridFS, read this answer.
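The "store the id in the Attachments collection" idea could look roughly like the sketch below. It assumes the official MongoDB C# driver with its GridFSBucket API; AttachmentRef, Bug, and BugAttachments are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.GridFS;

// Hypothetical reference stored inside the bug document instead of the file itself.
public class AttachmentRef
{
    public ObjectId FileId { get; set; }   // id of the file in GridFS
    public string FileName { get; set; }
    public long Length { get; set; }
}

public class Bug
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<AttachmentRef> Attachments { get; set; } = new List<AttachmentRef>();
}

public static class BugAttachments
{
    // Uploads the file to GridFS and records its id and metadata on the bug.
    // The caller still has to save the bug document to its own collection.
    public static async Task<AttachmentRef> AddAsync(IMongoDatabase db, Bug bug, string path)
    {
        var bucket = new GridFSBucket(db);
        using FileStream stream = File.OpenRead(path);
        ObjectId fileId = await bucket.UploadFromStreamAsync(Path.GetFileName(path), stream);

        var reference = new AttachmentRef
        {
            FileId = fileId,
            FileName = Path.GetFileName(path),
            Length = stream.Length
        };
        bug.Attachments.Add(reference);
        return reference;
    }

    // Streams the attachment content back out of GridFS by its stored id.
    public static Task DownloadAsync(IMongoDatabase db, AttachmentRef reference, Stream destination)
        => new GridFSBucket(db).DownloadToStreamAsync(reference.FileId, destination);
}
```

With this shape the bug document stays small regardless of attachment size, because only the reference lives in it.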

Storing the RecordString with the FileHelper Class

We are using FileHelpers 2.0 in our project. I have my record defined and have the data being imported correctly. After getting my array of generic objects:
var engine = new FileHelperEngine<UserRecord>();
engine.ErrorManager.ErrorMode = ErrorMode.SaveAndContinue;
UserRecord[] importedUsers = engine.ReadFile(_filepath);
After getting the records that errored due to formatting issues, I am iterating through the importedUsers array and doing validation to see if the information being imported is valid.
If the data is not valid, I want to be able to log the entire string from the original record from my file.
Is there a way to store the entire "RecordString" in the UserRecord class when the FileHelperEngine reads each of the records?
We do that often at work handling the BeforeRead event and storing it in a field mOriginalString that is marked this way:
[FieldNotInFile]
public string mOriginalString;
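Wired up, that approach might look like the sketch below. Several things here are assumptions: it targets a FileHelpers 3.x release, where the attribute is named FieldHidden (the 2.x name used above is FieldNotInFile); it hooks AfterReadRecord rather than the BeforeRead event mentioned above, since both expose the raw line as RecordLine; and the Name and Email fields and the comma delimiter are made up for illustration.

```csharp
using FileHelpers;

[DelimitedRecord(",")]
public class UserRecord
{
    // Hypothetical fields; the real record layout is not shown in the question.
    public string Name;
    public string Email;

    // Not read from or written to the file; filled by the event handler below.
    [FieldHidden]
    public string mOriginalString;
}

public static class UserImport
{
    public static UserRecord[] Read(string path)
    {
        var engine = new FileHelperEngine<UserRecord>();
        engine.ErrorManager.ErrorMode = ErrorMode.SaveAndContinue;

        // Copy the raw source line onto each successfully parsed record.
        engine.AfterReadRecord += (eng, e) => e.Record.mOriginalString = e.RecordLine;

        return engine.ReadFile(path);
    }
}
```

During later validation, each UserRecord then carries its own source line in mOriginalString for logging.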
You must use the latest version of the library from here:
http://teamcity.codebetter.com/repository/download/bt65/20313:id/FileHelpers_2.9.9_ReleaseBuild.zip
Cheers
