I have been using DbfDataReader to read DBF files in my C# application. So far, I can read column name, column index, and iterate through the records successfully. There does not appear to be a way to read specific column data I'd like without using the column index. For example, I can get at the FIRSTNAME value with a statement like:
using DbfDataReader;
var dbfPath = "/CONTACTS.DBF";
using (var dbfTable = new DbfTable(dbfPath, EncodingProvider.UTF8))
{
    var dbfRecord = new DbfRecord(dbfTable);
    while (dbfTable.Read(dbfRecord))
    {
        Console.WriteLine(dbfRecord.Values[1].ToString()); // would prefer to use something like dbfRecord.Values["FIRSTNAME"].ToString()
        Console.WriteLine(dbfRecord.Values[2].ToString()); // would prefer to use something like dbfRecord.Values["LASTNAME"].ToString()
    }
}
Where 1 is the index of the FIRSTNAME column and 2 is the index of the LASTNAME column. Is there any way to use "FIRSTNAME" (or the column name) as the key (or accessor) for what is essentially a name/value pair? My goal is to get all of the columns I care about without having to first build this map each time. (Please forgive me if the terms I am using are not exactly right.)
Thanks so much for taking a look at this...
Use the DbfDataReader class as below:
var dbfPath = "/CONTACTS.DBF";
var options = new DbfDataReaderOptions
{
    SkipDeletedRecords = true,
    Encoding = EncodingProvider.UTF8
};
using (var dbfDataReader = new DbfDataReader.DbfDataReader(dbfPath, options))
{
    while (dbfDataReader.Read())
    {
        Console.WriteLine(dbfDataReader["FIRSTNAME"]);
        Console.WriteLine(dbfDataReader["LASTNAME"]);
    }
}
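If you prefer typed access, the same reader should also expose the usual ADO.NET-style members (a sketch, assuming DbfDataReader derives from DbDataReader; the column names are taken from the question):
// Sketch only - assumes dbfDataReader exposes the standard DbDataReader members.
var firstNameOrdinal = dbfDataReader.GetOrdinal("FIRSTNAME");
var lastNameOrdinal = dbfDataReader.GetOrdinal("LASTNAME");
while (dbfDataReader.Read())
{
    Console.WriteLine(dbfDataReader.GetString(firstNameOrdinal));
    Console.WriteLine(dbfDataReader.GetString(lastNameOrdinal));
}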
parquet-dotnet has an example I'm trying to work with that looks like this:
using (Stream fileStream = System.IO.File.OpenRead("c:\\test.parquet"))
{
    using (var parquetReader = new ParquetReader(fileStream))
    {
        DataField[] dataFields = parquetReader.Schema.GetDataFields();
        for (int i = 0; i < parquetReader.RowGroupCount; i++)
        {
            using (ParquetRowGroupReader groupReader = parquetReader.OpenRowGroupReader(i))
            {
                DataColumn[] columns = dataFields.Select(groupReader.ReadColumn).ToArray();
            }
        }
    }
}
The concern I have is with the columns line. If I have data that looks like this, from a table perspective:
ID | Name
1  | Test1
1  | Test2
I want to map this data from the parquet file to a model that looks exactly like that. The issue that I have is that the data comes out from columns looking like this:
columns[0].Data[0] - 1
columns[0].Data[1] - 1
columns[1].Data[0] - Test1
columns[1].Data[1] - Test2
This might be a little hard to understand, but essentially, the columns variable is a collection of columns, each of which holds an array of values; that array contains every value in the table for that column. So I'm having a hard time figuring out how to match the data in each array position with the data in the same array position in a different column and still keep everything together.
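To make that concrete, the rows could be rebuilt by walking the column arrays with a shared index (a sketch using the columns array from the code above; MyModel is a hypothetical class with Id and Name properties):
// Sketch: zip the column arrays back into rows by index.
// Assumes columns[0] holds the ID values and columns[1] holds the Name values.
var rowCount = columns[0].Data.Length;
var models = new List<MyModel>(rowCount);
for (int i = 0; i < rowCount; i++)
{
    models.Add(new MyModel
    {
        Id = Convert.ToInt32(columns[0].Data.GetValue(i)),
        Name = (string)columns[1].Data.GetValue(i)
    });
}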
Also, I'm unable to use the normal deserialization because the parquet file has oddly named properties like __$something, so I can't map those to similarly named C# properties. Any ideas?
I have 3 Excel files containing data related to Client details, Company of Stocks and Order Details of Stocks Purchase. I want to parse all the data into a Multi-layer Dictionary using C# and run "Sorting" and "Searching" Functions on the same. I am a novice when it comes to C# and was wondering what would be the code for the same.
Data e.g.:
Stock Symbol | Company Name | S&P Sector
AAPL         | Apple Inc.   | IT
I could be barking up the wrong tree, but with what you've given us to work with, I'm assuming you want to take the data matrix in the relevant worksheet and, from that data, create an enumerable list of a suitable type so you can perform operations over it like sorting, filtering, etc. If that's what you want, then the below is an example of that.
This is the workbook I created with some test data ...
You said you're a novice with C#, but to make the below work, create a new .NET Framework project and add the NuGet package ... Microsoft.Office.Interop.Excel. I called the project ExcelInteropDotNet, but you can change that to whatever you want.
using Microsoft.Office.Interop.Excel;
using System.Collections.Generic;
using System.Linq;

namespace ExcelInteropDotNet
{
    public class CompanyStockInfo
    {
        public string StockSymbol { get; set; }
        public string CompanyName { get; set; }
        public string SPSector { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // Change the below variables to the relevant values for your needs.
            string workbookName = @"c:\temp\Source Data.xlsx";
            string worksheetName = "CompanyStockData";

            // Create a new list with the type being the CompanyStockInfo type.
            var companyStockInfoList = new List<CompanyStockInfo>();

            // Create an instance of Excel, open the workbook, fetch the sheet and then
            // find the last row in column A.
            var xlApplication = new Application();
            var xlWorkbook = xlApplication.Workbooks.Open(workbookName, ReadOnly: true);
            var xlSrcSheet = xlWorkbook.Worksheets[worksheetName] as Worksheet;
            var lastRow = xlSrcSheet.Cells[xlSrcSheet.Rows.Count, 1].End[XlDirection.xlUp].Row;

            // There may be a better way to do this but essentially, the below will loop through
            // all cells from the 2nd row to the last row and create a new item in the list
            // that stores all of the data.
            for (long row = 2; row <= lastRow; row++)
            {
                companyStockInfoList.Add(new CompanyStockInfo()
                {
                    StockSymbol = (xlSrcSheet.Cells[row, 1] as Range).Text,
                    CompanyName = (xlSrcSheet.Cells[row, 2] as Range).Text,
                    SPSector = (xlSrcSheet.Cells[row, 3] as Range).Text
                });
            }

            xlApplication.Quit();

            // You can use Linq to sort and search the list for the data you're wanting to
            // get your hands on.

            // Will filter all entries that have Inc. in the company name.
            var filteredList = companyStockInfoList.Where(item => item.CompanyName.Contains("Inc."));

            // Orders all entries by the company name in alphabetical order.
            var orderedList = companyStockInfoList.OrderBy(item => item.CompanyName);
        }
    }
}
Now, having given you the above, you should understand that the Excel library in C# does allow you to perform operations over the workbook directly, like you can do within Excel, such as SORT and FILTER. That may be another way to achieve what you're wanting.
Sort
AdvancedFilter
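As a rough sketch of the Sort option (reusing xlSrcSheet and lastRow from the code above; the optional parameters of Range.Sort can vary slightly between Excel/Interop versions):
// Sketch only: sort rows 2..lastRow by the Company Name column (B), ascending.
var dataRange = xlSrcSheet.Range["A2:C" + lastRow];
dataRange.Sort(Key1: xlSrcSheet.Range["B2"], Order1: XlSortOrder.xlAscending);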
I'm not sure if all of that helps or not but I hope it does.
Good luck ...!
I have an ASP.NET MVC web application.
The SQL table has one column ProdNum and it contains data such as 4892-34-456-2311.
The user needs a form to search the database that includes this field.
The problem is that the user wants 4 separate fields in the Razor view, where each field should match one of the 4 parts of the value above, separated by -.
For example, the ProdNum1, ProdNum2, ProdNum3 and ProdNum4 fields should match 4892, 34, 456 and 2311 respectively.
Since the entire search form contains many fields including these 4 fields, the search logic is based on a predicate which is inherited from the PredicateBuilder class.
Something like this:
...other field to be filtered
if (!string.IsNullOrEmpty(ProdNum1)) {
    predicate = predicate.And(
        t => t.ProdNum.ToString().Split('-')[0].Contains(ProdNum1));
...other fields to be filtered
But the above code throws a run-time error:
The LINQ expression node type 'ArrayIndex' is not supported in LINQ to Entities
Does anybody know how to resolve this issue?
Thanks a lot for all the responses; I finally found an easy way to resolve it.
Instead of rebuilding the models and changing the database tables, I just adjusted the search strings to match the stored format. Since the data format is always 4892-34-456-2311, I use StartsWith(ProdNum1) to search the first part, Contains("-" + ProdNum2 + "-") to search the second and third parts (replacing ProdNum2 with ProdNum3 for the third), and EndsWith("-" + ProdNum4) to search the fourth part. This way, I don't need to change anything else; it is simple.
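In code, that approach looks roughly like this (a sketch; it assumes ProdNum is mapped as a string and that predicate is the same PredicateBuilder predicate as above):
// Sketch of the string-based matching; it relies on the value always having
// the fixed format 4892-34-456-2311.
if (!string.IsNullOrEmpty(ProdNum1))
    predicate = predicate.And(t => t.ProdNum.StartsWith(ProdNum1));
if (!string.IsNullOrEmpty(ProdNum2))
    predicate = predicate.And(t => t.ProdNum.Contains("-" + ProdNum2 + "-"));
if (!string.IsNullOrEmpty(ProdNum3))
    predicate = predicate.And(t => t.ProdNum.Contains("-" + ProdNum3 + "-"));
if (!string.IsNullOrEmpty(ProdNum4))
    predicate = predicate.And(t => t.ProdNum.EndsWith("-" + ProdNum4));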
Again, thanks a lot for all responses, much appreciated.
If I understand this correctly, you have one column which you want to act like 4 different columns? This isn't worth it. For that, you would need to split each row's column data, create a class to handle the split data, and finally use a List. That's a clumsy workaround; I would rather suggest you use 4 columns instead.
But if you still want to go with your existing approach, you first need to split as I mentioned earlier. For that, here's an example:
public void Test(SqlDataReader dataReader) // reader obtained from SqlCommand.ExecuteReader()
{
    while (dataReader.Read())
    {
        string part1 = dataReader.GetString(1).Split('-')[0]; // the 1st part of your column data
        string part2 = dataReader.GetString(1).Split('-')[1]; // the 2nd part of your column data
    }
}
Now, as mentioned in the comments, you could rather use a class to handle all the data. For example, let's call it MyData:
public class MyData
{
    public string Part1 { get; set; }
    public string Part2 { get; set; }
    public string Part3 { get; set; }
    public string Part4 { get; set; }
}
Now, within the while loop of the SqlDataReader, declare a new instance of this class and pass the values to it. An example:
public void Test(SqlDataReader dataReader)
{
    while (dataReader.Read())
    {
        MyData allData = new MyData();
        allData.Part1 = dataReader.GetString(1).Split('-')[0];
        allData.Part2 = dataReader.GetString(1).Split('-')[1];
    }
}
Create a list of the class at class level:
public class MyForm
{
    List<MyData> storedData = new List<MyData>();
}
Within the while loop of the SqlDataReader, add this at the end:
storedData.Add(allData);
So finally, you have a list of all the split data, so you can write your filtering logic easily :)
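For example (a sketch using the storedData list above; prodNum1 stands for the user's first search input):
// All rows whose first segment matches the user's first search field.
var matches = storedData.Where(d => d.Part1 == prodNum1).ToList();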
As already mentioned in a comment, the error means that accessing data via index (see [0]) is not supported when translating your expression to SQL. Split('-') is also not supported hence you have to resort to the supported functions Substring() and IndexOf(startIndex).
You could do something like the following to first transform the string into 4 number strings ...
.Select(t => new {
    t.ProdNum,
    FirstNumber = t.ProdNum.Substring(0, t.ProdNum.IndexOf("-")),
    Remainder = t.ProdNum.Substring(t.ProdNum.IndexOf("-") + 1)
})
.Select(t => new {
    t.ProdNum,
    t.FirstNumber,
    SecondNumber = t.Remainder.Substring(0, t.Remainder.IndexOf("-")),
    Remainder = t.Remainder.Substring(t.Remainder.IndexOf("-") + 1)
})
.Select(t => new {
    t.ProdNum,
    t.FirstNumber,
    t.SecondNumber,
    ThirdNumber = t.Remainder.Substring(0, t.Remainder.IndexOf("-")),
    FourthNumber = t.Remainder.Substring(t.Remainder.IndexOf("-") + 1)
})
... and then you could simply write something like
if (!string.IsNullOrEmpty(ProdNum3))
{
    predicate = predicate.And(t => t.ThirdNumber.Contains(ProdNum3));
}
See code:
var lines = new List<PosLine>(){
    new PosLine{ Name="John", Address="dummy1", Tstamp=DateTime.Now },
    new PosLine{ Name="Jane", Address="dummy2", Tstamp=DateTime.Now }
};

using (var db = new LiteDatabase(@"test.db"))
{
    var posLines = db.GetCollection<PosLine>("POS");
    foreach (var line in lines)
    {
        var id = posLines.Insert(line);
        Console.WriteLine("id=" + id.ToString());
    }

    var names = posLines.FindAll().Select(p => p.Name).ToList();
    foreach (var name in names)
    {
        Console.WriteLine("name=" + name);
    }
}
The line var names = posLines.FindAll().Select(p => p.Name).ToList(); tries to get a list of "Name", but in this case, it's a full table scan. Is there a way to avoid full table scan, like if I create an index on "Name" property, and then fetch all names from that index?
If you are reading all documents you will never avoid a full scan. Using an index on Name you can do a full index scan (avoiding a full "table" scan). The difference between these two full scans is the deserialization time and the amount of data read (a full index scan is much cheaper).
Unfortunately, in the current version of LiteDB you have no option to get the index key only. It's quite easy to implement, so open an issue on GitHub so that it can be added in a future version.
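For reference, creating that index would look something like this (a sketch against the collection from the question; as noted above, the documents themselves are still deserialized):
var posLines = db.GetCollection<PosLine>("POS");
posLines.EnsureIndex(p => p.Name); // create the index on Name if it doesn't exist yet
var names = posLines.FindAll().Select(p => p.Name).ToList(); // still reads full documents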
I am currently reading in an HTML document using CsQuery. This document has several HTML tables and I need to read in the data while preserving the structure. At the moment, I simply have a List of List of List of strings. This is a list of tables containing a list of rows containing a list of cells containing the content as a string.
List<List<List<string>>> page_tables = document_div.Cq().Find("TABLE")
    .Select(table => table.Cq().Find("TR")
        .Select(tr => tr.Cq().Find("td")
            .Select(td => td.InnerHTML).ToList())
        .ToList())
    .ToList();
Is there a better way to store this data, so I can easily access particular tables, and specific rows and cells? I'm writing several methods that deal with this page_tables object so I need to nail down its formulation first.
Is there a better way to store this data, so I can easily access particular tables, and specific rows and cells?
On most occasions, well-formed HTML fits nicely into an XML structure, so you could store it as an XML document. LINQ to XML would make querying very easy:
XDocument doc = XDocument.Parse("<html>...</html>");
var cellData = doc.Descendants("td").Select(x => x.Value);
Based on the comments, I feel obliged to point out that there are a couple of other scenarios where this can fall over, such as:
When HTML-encoded content like &nbsp; is used
Valid HTML which doesn't require a closing tag e.g. <br> is used
(With that said, these things can be handled by some pre-processing)
To summarise, it's by no means the most robust approach; however, if you can be sure that the HTML you are parsing fits the bill, then it would be a pretty neat solution.
You could go fully OOP and write some model classes:
// Code kept short, minimal ctors
public class Cell
{
    public string Contents { get; set; }
    public Cell() { this.Contents = string.Empty; }
}

public class Row
{
    public List<Cell> Cells { get; set; }
    public Row() { this.Cells = new List<Cell>(); }
}

public class Table
{
    public List<Row> Rows { get; set; }
    public Table() { this.Rows = new List<Row>(); }
}
And then fill them up, for example like this:
var tables = new List<Table>();
foreach (var table in document_div.Cq().Find("TABLE"))
{
    var t = new Table();
    foreach (var tr in table.Cq().Find("TR"))
    {
        var r = new Row();
        foreach (var td in tr.Cq().Find("td"))
        {
            var c = new Cell();
            c.Contents = td.InnerHTML;
            r.Cells.Add(c);
        }
        t.Rows.Add(r);
    }
    tables.Add(t);
}

// Assuming the HTML was correct, now you have a cleanly organized
// class structure representing the tables!
var aTable = tables.First();
var firstRow = aTable.Rows.First();
var firstCell = firstRow.Cells.First();
var firstCellContents = firstCell.Contents;
...
I'd probably choose this approach because I always prefer to know exactly what my data looks like, especially if/when I'm parsing from external/unsafe/unreliable sources.
Is there a better way to store this data, so I can easily access particular tables, and specific rows and cells?
If you want to easily access table data, then create a class which holds the data from a table row, with nicely named properties for the corresponding columns. E.g. if you have a users table
<table>
<tr><td>1</td><td>Bob</td></tr>
<tr><td>2</td><td>Joe</td></tr>
</table>
I would create the following class to hold row data:
public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}
The second step would be parsing users from the HTML. I suggest using HtmlAgilityPack (available from NuGet) for parsing HTML:
HtmlDocument doc = new HtmlDocument();
doc.Load("index.html");

var users = from r in doc.DocumentNode.SelectNodes("//table/tr")
            let cells = r.SelectNodes("td")
            select new User
            {
                Id = Int32.Parse(cells[0].InnerText),
                Name = cells[1].InnerText
            };
// NOTE: you can check the cell count before accessing cells by index
Now you have a collection of strongly-typed user objects (you can save them to a list, an array or a dictionary, depending on how you are going to use them). E.g.
var usersDictionary = users.ToDictionary(u => u.Id);
// Getting user by id
var user = usersDictionary[2];
// now you can read user.Name
Since you're parsing an HTML table, could you use an ADO.NET DataTable? If the content doesn't have too many row or col spans this may be an option: you wouldn't have to roll your own, and it could easily be saved to a database or a list of entities or whatever. Plus you get the benefit of strongly typed data types. As long as the HTML tables are consistent I would prefer an approach like this, to make interoperability with the rest of the framework seamless and a ton less work.
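A rough sketch of that idea, reusing the CsQuery traversal from the question (the column names below are made up, and AsEnumerable() assumes a reference to System.Data.DataSetExtensions; a real version would take the column names from the table's header row):
// Sketch: load the first HTML table into a System.Data.DataTable.
var dataTable = new DataTable();
dataTable.Columns.Add("Id", typeof(string));
dataTable.Columns.Add("Name", typeof(string));
var firstTable = document_div.Cq().Find("TABLE").First();
foreach (var tr in firstTable.Cq().Find("TR"))
{
    var cells = tr.Cq().Find("td").Select(td => td.InnerHTML).ToArray();
    if (cells.Length >= 2)
        dataTable.Rows.Add(cells[0], cells[1]);
}
// Strongly typed access, e.g. via LINQ to DataSet:
var names = dataTable.AsEnumerable().Select(r => r.Field<string>("Name"));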