I have been using DbfDataReader to read DBF files in my C# application. So far, I can read the column names and column indexes and iterate through the records successfully. However, there does not appear to be a way to read a specific column's data without using the column index. For example, I can get at the FIRSTNAME value with a statement like:
using DbfDataReader;
var dbfPath = "/CONTACTS.DBF";
using (var dbfTable = new DbfTable(dbfPath, EncodingProvider.UTF8))
{
    var dbfRecord = new DbfRecord(dbfTable);
    while (dbfTable.Read(dbfRecord))
    {
        Console.WriteLine(dbfRecord.Values[1].ToString()); // would prefer to use something like dbfRecord.Values["FIRSTNAME"].ToString()
        Console.WriteLine(dbfRecord.Values[2].ToString()); // would prefer to use something like dbfRecord.Values["LASTNAME"].ToString()
    }
}
Here 1 is the index of the FIRSTNAME column and 2 is the index of the LASTNAME column. Is there any way to use "FIRSTNAME" (or any other column name) as the key (or accessor) for what is essentially a name/value pair? My goal is to get all of the columns I care about without having to build this map myself each time. (Please forgive me if the terms I am using are not exactly right.)
Thanks so much for taking a look at this...
Use the DbfDataReader class as below:
var dbfPath = "/CONTACTS.DBF";
var options = new DbfDataReaderOptions
{
    SkipDeletedRecords = true,
    Encoding = EncodingProvider.UTF8
};
using (var dbfDataReader = new DbfDataReader.DbfDataReader(dbfPath, options))
{
    while (dbfDataReader.Read())
    {
        // The indexer accepts a column name as well as an ordinal
        Console.WriteLine(dbfDataReader["FIRSTNAME"]);
        Console.WriteLine(dbfDataReader["LASTNAME"]);
    }
}
I have been working on extracting text from a CSV file and storing the data in a string. Now I would like to extract the text from specific columns only. I would like the wordDocContents variable to contain just those columns and their data, namely bank_account, bank_name and customer_name. Currently, wordDocContents holds the entire contents of my CSV file. Is there a way to filter out the specific columns and their data and store the result in wordDocContents? Thanks.
Here is what I tried so far -
public void button1Clicked(object sender, EventArgs args)
{
    button1.Text = "You clicked me";
    var textExtractor = new TextExtractor();
    var wordDocContents = textExtractor.Extract("t.csv");
    Console.WriteLine(wordDocContents);
    Console.ReadLine();
}
The contents of wordDocContents:-
ACCOUNT_NUMBER,CUSTOMER_NAMES,VALUE_DATE,BOOKING_DATE,TRANSACTION,ACCOUNT_TYPE,BALANCE_TYPE,REFERENCE,MONEY.OUT,MONEY.IN,RUNNING.BALANCE,BRANCH,EMAIL,ACTUAL.BALANCE,AVAILABLE.BALANCE
1000000001,TEST,,2847899,KES,Account,,,10/10/2016,9/11/2016,15181800,UPPER HILL BRANCH,another#yahoo.com,5403.75,5403.75,
1000000001,,9/11/2016,9/11/2016,Opening Balance,,,,,,4643.22,,,,,
1000000001,,12/10/2016,12/10/2016,Mobile Mpesa Transfer,,,,1533,,3110.22,,,,,
1000000001,,17-10-2016,17-10-2016,ATM Withdrawal,,,6.29006E+11,1000,,2110.22,,,,,
1000000001,,17-10-2016,17-10-2016,ATM Withdrawal,,,6.29118E+11,2000,,110.22,,,,,
1000000001,,17-10-2016,17-10-2016,Mobile Mpesa Transfer,,,,2083,,-1972.78,,,,,
1000000001,,17-10-2016,17-10-2016,Transfer from Mpesa,,,,0,4000,2027.22,,,,,
1000000001,,18-10-2016,18-10-2016,Mobile Mpesa Transfer,,,,333,,1694.22,,,,,
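Based on how CSV files are usually structured, something like this should work. (Maybe post the first 2 lines of your output?)
// Split into lines, ignoring blank ones such as a trailing empty line
string[] lines = wordDocContents.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
string[] columns = lines[0].Split(',');
// Jagged array: data[i] holds the fields of line i (requires using System.Linq)
string[][] data = lines.Select(line => line.Split(',')).ToArray();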
Now let's say customer_name is under columns[2], you can try to:
List<string> customerNames = new List<string>();
// Start at 1 to skip the header line
for (int i = 1; i < lines.Length; i++) {
    customerNames.Add(data[i][2]);
}
Edit: I just saw the output; this code might need some adjusting for your particular case. By default, string.Split(',') returns an empty string between consecutive commas, so the column positions stay aligned. Just change the [2] to whichever column you need.
The indexes are zero-based: [0], [1], [2], and so on.
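As a rough sketch of the same idea with columns looked up by header name rather than by a hard-coded index: the class and method names below (CsvColumnFilter, SelectColumns) are made up for illustration, and the code assumes a simple CSV with no quoted fields that contain commas or line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvColumnFilter
{
    // Returns the CSV text reduced to the requested columns, header included.
    // Assumption: plain CSV, no quoted fields containing commas or newlines.
    public static string SelectColumns(string csvText, params string[] wanted)
    {
        string[] lines = csvText.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        if (lines.Length == 0) return string.Empty;

        string[] header = lines[0].Split(',');

        // Resolve each requested name to its position in the header row.
        int[] indices = wanted
            .Select(name => Array.FindIndex(header, h => h.Trim().Equals(name, StringComparison.OrdinalIgnoreCase)))
            .ToArray();
        if (indices.Any(i => i < 0))
            throw new ArgumentException("A requested column is not in the header row.");

        // Rows may be shorter than the header, so missing cells become empty strings.
        IEnumerable<string> filtered = lines.Select(line =>
        {
            string[] fields = line.Split(',');
            return string.Join(",", indices.Select(i => i < fields.Length ? fields[i] : ""));
        });

        return string.Join("\n", filtered);
    }
}
```

With the header shown in the question, a call could look like CsvColumnFilter.SelectColumns(wordDocContents, "ACCOUNT_NUMBER", "CUSTOMER_NAMES"). For files with quoted fields, a real CSV parser is the safer choice.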
I am new to Deedle. I searched everywhere looking for examples that can help me to complete the following task:
Index data frame using multiple columns (3 in the example - Date, ID and Title)
Add numeric columns in multiple data frames together (Sales column in the example)
Group and add together sales occurred on the same day
My current approach is given below. First, it does not work because of the missing values, and I don't know how to handle them easily when adding data frames. Second, I wonder if there is a better, more elegant way to do it.
// Remove unused columns
var df = dfRaw.Columns[new[] { "Date", "ID", "Title", "Sales" }];
// Index data frame using 3 columns
var dfIndexed = df.IndexRowsUsing(r => Tuple.Create(r.GetAs<DateTime>("Date"), r.GetAs<string>("ID"), r.GetAs<string>("Title")) );
// Remove indexed columns
dfIndexed.DropColumn("Date");
dfIndexed.DropColumn("ID");
dfIndexed.DropColumn("Title");
// Add data frames. Does not work as it will add only
// keys existing in both data frames
dfTotal += dfIndexed;
Table 1
Date,ID,Title,Sales,Market
2014-03-01,ID1,Title1,1,US
2014-03-01,ID1,Title1,2,CA
2014-03-03,ID2,Title2,3,CA
Table 2
Date,ID,Title,Sales,Market
2014-03-02,ID1,Title1,2,US
2014-03-03,ID2,Title2,2,CA
Expected Results
Date,ID,Title,Sales
2014-03-01,ID1,Title1,3
2014-03-02,ID1,Title1,2
2014-03-03,ID2,Title2,5
I think that your approach with using tuples makes sense.
It is a bit unfortunate that there is no easy way to specify default values when adding!
The easiest solution I can think of is to realign both series to the same set of keys and use a fill operation to provide defaults. Using simple series as an example, something like this should do the trick:
var allKeys = series1.Keys.Union(series2.Keys);
var aligned1 = series1.Realign(allKeys).FillMissing(0.0);
var aligned2 = series2.Realign(allKeys).FillMissing(0.0);
var res = aligned1 + aligned2;
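To make the "union of keys, default to zero" idea concrete without depending on Deedle, here is a minimal sketch using plain dictionaries and LINQ. It is not the Deedle API; SalesTotals, SumByKey and AddWithDefault are hypothetical names chosen for this example, and the tuple key mirrors the (Date, ID, Title) index from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SalesTotals
{
    // Sums Sales over rows that share the same (Date, ID, Title) key.
    public static Dictionary<(DateTime, string, string), double> SumByKey(
        IEnumerable<(DateTime Date, string Id, string Title, double Sales)> rows)
    {
        return rows.GroupBy(r => (r.Date, r.Id, r.Title))
                   .ToDictionary(g => g.Key, g => g.Sum(r => r.Sales));
    }

    // Adds two keyed totals over the union of their keys;
    // a key missing from one side counts as 0.0.
    public static Dictionary<TKey, double> AddWithDefault<TKey>(
        IDictionary<TKey, double> first, IDictionary<TKey, double> second)
    {
        return first.Keys.Union(second.Keys).ToDictionary(
            key => key,
            key => ValueOrZero(first, key) + ValueOrZero(second, key));
    }

    private static double ValueOrZero<TKey>(IDictionary<TKey, double> source, TKey key)
    {
        return source.TryGetValue(key, out double value) ? value : 0.0;
    }
}
```

Calling SumByKey on each table and then AddWithDefault on the two results corresponds to the realign-and-fill steps above; the Market column is simply left out of the key.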
I am trying to retrieve data from an Excel spreadsheet using C#. The data in the spreadsheet has the following characteristics:
no column names are assigned
the rows can have varying column lengths
some rows are metadata, and these rows label the content of the columns in the next row
Therefore, the objects I need to construct will always have their name in the very first column, and its parameters are contained in the next columns. It is important that the parameter names are retrieved from the row above. An example:
row1|---------|FirstName|Surname|
row2|---Person|Bob------|Bloggs-|
row3|---------|---------|-------|
row4|---------|Make-----|Model--|
row5|------Car|Toyota---|Prius--|
So unfortunately the data is heterogeneous, and the only way to determine which rows "belong together" is to check whether the first column in the row is empty. If it is not, read all the data in the row and find the parameter names that apply by checking the row above.
At first I thought the straightforward approach would be to simply loop through
1) the dataset containing all sheets, then
2) the datatables (i.e. sheets) and
3) the row.
However, I found that trying to extract this data with nested loops and if statements results in horrible, unreadable and inflexible code.
Is there a way to do this in LINQ? I had a look at this article to start by filtering the empty rows between data, but didn't really get anywhere. Could someone point me in the right direction with a few code snippets please?
Thanks in advance!
hiro
I see that you've already accepted an answer, but I think a more generic solution is possible using reflection.
Let's say you have your data as a List<string[]>, where each element of the list is an array of strings holding all the cells of the corresponding row.
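List<string[]> data = LoadData();
var results = new List<object>();
// Most recent header row; stays null until the first header row is seen
string[] headerRow = null;
foreach (var row in data)
{
    if (string.IsNullOrEmpty(row[0]))
    {
        // Empty first cell: this row names the properties of the rows that follow
        headerRow = row.Skip(1).ToArray();
    }
    else
    {
        // Type.GetType needs the namespace-qualified type name
        // (assembly-qualified for types in other assemblies)
        Type objType = Type.GetType(row[0]);
        object newItem = Activator.CreateInstance(objType);
        for (int i = 0; i < headerRow.Length; i++)
        {
            // Set the property named in the header to the cell below it
            objType.GetProperty(headerRow[i]).SetValue(newItem, row[i + 1]);
        }
        results.Add(newItem);
    }
}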
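If the target types are not known up front, the same header-tracking loop can collect the parameters into dictionaries instead of using reflection. The following is only a sketch under that assumption; HeaderedRows and Parse are hypothetical names, and duplicate parameter names within one header row are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HeaderedRows
{
    // Turns rows into (name, parameters) pairs. A row whose first cell is empty
    // is treated as a header row: its remaining cells name the parameters of the
    // object rows that follow it.
    public static List<KeyValuePair<string, Dictionary<string, string>>> Parse(IEnumerable<string[]> rows)
    {
        var results = new List<KeyValuePair<string, Dictionary<string, string>>>();
        string[] header = new string[0];

        foreach (string[] row in rows)
        {
            if (row.Length == 0 || row.All(string.IsNullOrWhiteSpace))
                continue; // blank separator row

            if (string.IsNullOrWhiteSpace(row[0]))
            {
                header = row.Skip(1).ToArray();
                continue;
            }

            // Pair each header cell with the cell below it. Zip stops at the
            // shorter side, which copes with rows of varying length.
            var parameters = header
                .Zip(row.Skip(1), (name, value) => new { name, value })
                .Where(p => !string.IsNullOrWhiteSpace(p.name))
                .ToDictionary(p => p.name, p => p.value);

            results.Add(new KeyValuePair<string, Dictionary<string, string>>(row[0], parameters));
        }
        return results;
    }
}
```

For the sample sheet in the question, this would yield one entry named Person and one named Car, each with its own parameter dictionary.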
I have a csv file I am going to read from disk. I do not know up front how many columns or the names of the columns.
Any thoughts on how I should represent the fields? Ideally I want to say something like
string Val = DataStructure.GetValue(i, ColumnName);
where i is the index of the row.
Oh just as an aside I will be parsing using the TextFieldParser class
http://msdn.microsoft.com/en-us/library/cakac7e6(v=vs.90).aspx
That sounds as if you would need a DataTable which has a Rows and Columns property.
So you can say:
string Val = table.Rows[i].Field<string>(ColumnName);
A DataTable is a table of in-memory data. It can be used in a strongly typed way (as suggested with the Field method), but internally it stores its data as objects.
You could use this parser to convert the csv to a DataTable.
Edit: I've only just seen that you want to use the TextFieldParser. Here's a possible simple approach to convert a csv to a DataTable:
var table = new DataTable();
using (var parser = new TextFieldParser(File.OpenRead(path)))
{
    parser.Delimiters = new[] { "," };
    parser.HasFieldsEnclosedInQuotes = true;
    // load DataColumns from first line
    String[] headers = parser.ReadFields();
    foreach (var h in headers)
        table.Columns.Add(h);
    // load all other lines as data
    String[] fields;
    while ((fields = parser.ReadFields()) != null)
    {
        table.Rows.Add().ItemArray = fields;
    }
}
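Once a table is filled, a lookup by row index and column name could be sketched as follows. The sample data and the names DataTableLookup, BuildSample and GetValue are invented for illustration; on .NET Framework the Field extension method additionally needs a reference to System.Data.DataSetExtensions.

```csharp
using System;
using System.Data;

public static class DataTableLookup
{
    // Builds a small in-memory table; columns added by name default to type string.
    public static DataTable BuildSample()
    {
        var table = new DataTable();
        table.Columns.Add("Name");
        table.Columns.Add("City");
        table.Rows.Add("Alice", "Nairobi");
        table.Rows.Add("Bob", "Mombasa");
        return table;
    }

    // Looks up one cell by zero-based row index and column name.
    public static string GetValue(DataTable table, int rowIndex, string columnName)
    {
        return table.Rows[rowIndex].Field<string>(columnName);
    }
}
```

This gives roughly the GetValue(i, ColumnName) shape asked for in the question, with the DataTable doing the name-to-index mapping.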
If the column names are in the first row, read that row and store the names in a Dictionary<string, int> that maps each column name to its column index.
You could then store the remaining rows in a simple structure like List<string[]>.
To get the value of a column in a given row, you'd write csv[rowIndex][nameToIndex[ColumnName]].
Here nameToIndex[ColumnName] gets the column index from the name, and csv[rowIndex] gets the row (a string array) we want.
This could of course be wrapped in a class.
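One possible shape for such a wrapper is sketched below. CsvTable is a hypothetical name, and the constructor assumes the records have already been split into string arrays (for example by TextFieldParser.ReadFields), with the header row first.

```csharp
using System;
using System.Collections.Generic;

public sealed class CsvTable
{
    private readonly Dictionary<string, int> nameToIndex = new Dictionary<string, int>();
    private readonly List<string[]> rows = new List<string[]>();

    // The first array is the header row; every later array is a data row.
    public CsvTable(IEnumerable<string[]> records)
    {
        bool isHeader = true;
        foreach (string[] record in records)
        {
            if (isHeader)
            {
                // Map each column name to its position in the row.
                for (int i = 0; i < record.Length; i++)
                    nameToIndex[record[i]] = i;
                isHeader = false;
            }
            else
            {
                rows.Add(record);
            }
        }
    }

    public int RowCount => rows.Count;

    // Returns the cell at the given zero-based data row and named column.
    public string GetValue(int rowIndex, string columnName)
    {
        return rows[rowIndex][nameToIndex[columnName]];
    }
}
```

An unknown column name throws KeyNotFoundException here; whether to return null instead is a design choice left open.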
Use the CSV parser if you want, but a text parser is easy to write yourself if you need customization.
For your needs, I would use one (or more) Dictionary: at least one mapping PropertyString --> column index, and maybe the reverse one, column index --> PropertyString, if needed.
When I parse a CSV file, I usually collect the results in a List while parsing and convert it to an array once parsing is complete, for speed reasons (List.ToArray()).