Extract text from specific columns in c#? - c#
I have been working on extracting text from a CSV file and storing the data in a string. Now I would like to extract text from only some specific columns and store that data in the string. I want the wordDocContents variable to contain only the specific columns bank_account, bank_name and customer_name, together with the data in those columns. Currently, wordDocContents holds the entire contents of my CSV file. Is there a way to filter out those specific columns and their data and store the result in wordDocContents? Thanks.
Here is what I tried so far -
public void button1Clicked(object sender, EventArgs args)
{
button1.Text = "You clicked me";
var textExtractor = new TextExtractor();
var wordDocContents = textExtractor.Extract("t.csv");
Console.WriteLine(wordDocContents);
Console.ReadLine();
}
The contents of wordDocContents:-
ACCOUNT_NUMBER,CUSTOMER_NAMES,VALUE_DATE,BOOKING_DATE,TRANSACTION,ACCOUNT_TYPE,BALANCE_TYPE,REFERENCE,MONEY.OUT,MONEY.IN,RUNNING.BALANCE,BRANCH,EMAIL,ACTUAL.BALANCE,AVAILABLE.BALANCE
1000000001,TEST,,2847899,KES,Account,,,10/10/2016,9/11/2016,15181800,UPPER HILL BRANCH,another#yahoo.com,5403.75,5403.75,
1000000001,,9/11/2016,9/11/2016,Opening Balance,,,,,,4643.22,,,,,
1000000001,,12/10/2016,12/10/2016,Mobile Mpesa Transfer,,,,1533,,3110.22,,,,,
1000000001,,17-10-2016,17-10-2016,ATM Withdrawal,,,6.29006E+11,1000,,2110.22,,,,,
1000000001,,17-10-2016,17-10-2016,ATM Withdrawal,,,6.29118E+11,2000,,110.22,,,,,
1000000001,,17-10-2016,17-10-2016,Mobile Mpesa Transfer,,,,2083,,-1972.78,,,,,
1000000001,,17-10-2016,17-10-2016,Transfer from Mpesa,,,,0,4000,2027.22,,,,,
1000000001,,18-10-2016,18-10-2016,Mobile Mpesa Transfer,,,,333,,1694.22,,,,,
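This is based on my understanding of how CSV files are laid out. (Maybe post the first 2 lines of your output?)
// Split the text into lines, then split every line on commas.
string[] lines = wordDocContents.Split('\n');
string[] columns = lines[0].Split(',');        // header names
string[][] data = new string[lines.Length][];  // jagged array: one string[] per line
for (int i = 0; i < lines.Length; i++)
    data[i] = lines[i].Split(',');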
Now let's say customer_name is under columns[2], you can try to:
List<string> customerNames = new List<string>();
for (int i = 1; i < lines.Length; i++) {
customerNames.Add(data[i][2]);
}
Edit: I just saw the output, so this code might need some adjusting for your particular case. By default, string.Split(',') keeps an empty entry between consecutive commas, so the column positions stay aligned with the header. Just change the [2] to whichever column you need.
Column indexes are zero-based, so they go [0], [1], [2], etc.
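As a minimal sketch of the filtering the question asks for, the columns can be looked up by header name instead of by a hard-coded index. The class and method names here (CsvColumnFilter, SelectColumns) are hypothetical, and the sketch assumes a simple CSV with no quoted fields that contain commas; a real parser would be needed for those.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvColumnFilter
{
    // Hypothetical helper: returns the CSV text reduced to the wanted columns,
    // in the order the names are passed. Assumes no quoted fields with commas.
    public static string SelectColumns(string csvText, params string[] wanted)
    {
        string[] lines = csvText.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        if (lines.Length == 0) return string.Empty;

        // Map each wanted name to its position in the header row (names not found are skipped).
        string[] header = lines[0].Split(',');
        int[] indexes = wanted
            .Select(name => Array.FindIndex(header,
                h => string.Equals(h.Trim(), name, StringComparison.OrdinalIgnoreCase)))
            .Where(i => i >= 0)
            .ToArray();

        var output = new List<string>();
        foreach (string line in lines)
        {
            string[] fields = line.Split(',');
            // Short rows yield an empty value rather than an exception.
            output.Add(string.Join(",", indexes.Select(i => i < fields.Length ? fields[i] : "")));
        }
        return string.Join(Environment.NewLine, output);
    }
}
```

With the header shown in the question, a call such as CsvColumnFilter.SelectColumns(textExtractor.Extract("t.csv"), "ACCOUNT_NUMBER", "CUSTOMER_NAMES", "BRANCH") would keep only those three columns; the names must match whatever the header row of the file actually contains.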
Related
C# WinForms ListView - delete column doesn't actually remove data
I have a WinForms ListView table, which I load from a text file, which is comma delimited. This is then split, line by line, by comma. The resulting string is then added into a ListViewItem.
[Image: example of what my ListView looks like after insert]
I have a couple of issues, the main one being that I have set up a listener on column header click, which allows deletion of the column. This does result in the removal of the column data. The code for this is (I'm still learning C# so I appreciate this may not be the correct way of doing things):
private void DataTable_ColClick(object sender, ColumnClickEventArgs e)
{
    Int32 colIndex = Convert.ToInt32(e.Column.ToString());
    PublicVars.ColClicked = colIndex;
    contextMenuStrip1.Show(Cursor.Position);
}
private void DeleteMenu_Click(object sender, EventArgs e)
{
    if (PublicVars.ColClicked >= 0)
    {
        RawCSVData.BeginUpdate();
        RawCSVData.Columns.RemoveAt(PublicVars.ColClicked);
        RawCSVData.EndUpdate();
        PublicVars.ColClicked = -1;
    }
}
My problem starts when I want to add the contents of the visual ListView into a ListView which I pass to another class; when that happens it also includes all the column data that was "deleted".
This is the code used to create a new List<string>, using the visual ListView sub-items, created as a comma delimited string (this is the only way I could find of doing it - again, I'm still learning so this may not be the best way of achieving it).
public void UpdatedDataOkBtn_Click(object sender, EventArgs e)
{
    try
    {
        List<string> StringLV = new List<string>();
        for (int i = 0; i < RawCSVData.Items.Count; i++)
        {
            string EachLine = "";
            for (int j = 0; j < RawCSVData.Items[i].SubItems.Count; j++)
            {
                string element = RawCSVData.Items[i].SubItems[j].Text;
                if (element.Length > 0)
                {
                    EachLine += element + ",";
                }
            }
            StringLV.Add(EachLine.Remove(EachLine.Length - 1));
        }
        string[] StrArr = StringLV.ToArray();
        string FirstLineInArray = StrArr[0];
        List<string> Lines = new List<string>();
        Lines.AddRange(Regex.Split(FirstLineInArray, "[,]{1}(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))"));
        ActiveForm.Close();
        ConvertForm convertForm = new ConvertForm();
        int[] DataPosition = CheckCSVData.ScanArray(Lines);
        var (CSVListData, TotalAmount, NumRows) = PopulateLV.Populate(StringLV, DataPosition);
        convertForm.Show();
        convertForm.BuildDataTable(CSVListData, TotalAmount, NumRows);
    }
    catch (Exception error)
    {
        ConvertForm.ErrorState(error.ToString());
    }
}
How do I just create a new ListView with only the visible items?
As a side question - I want to start at the first Col0 (I create the number of columns based on the number of comma delimited values), but as you can see, I end up with an initial Col0 (which is set in the ListView ColumnHeader Collection Editor), but then my data imported from the file begins in the next col (I assign the Col value using a counter).
And finally, I did attempt to allow column reordering, which worked OK, but again, when inserting into a new ListView, it didn't retain the visual ordering. I have attempted several different possible solutions, but none seem to take only the remaining column data into the new ListView.
DataRow.SetField() gives a null ref exception when adding data to a column I previously deleted then added back
UPDATE: I think I have found what is causing the issue here: https://stackoverflow.com/a/5665600/19393524
I believe my issue lies with my use of .DefaultView. The post suggests that a sort on it is technically a write operation on the DataTable object and might not propagate changes properly or entirely. It is an interesting read and seems to answer my question of why passing valid data to a DataRow throws this exception AFTER I make changes to the DataTable.
UPDATE: Let me be crystal clear. I have already solved my problem. I would just like to know why it is throwing an error. In my view the code should work, and it does... the first run through. The error only appears AFTER I have already deleted the column and then added it back (that is, after running this code once).
When I debug my code line by line in Visual Studio and stop at the line:
data.Rows[i].SetField(sortColumnNames[k], value);
- the row exists
- the column exists
- value is not null
- sortColumnNames[k] is not null and contains the correct column name
- i is 0
Yet it still throws an exception. I would like to know why. What am I missing?
Sorry for the long explanation, but this one needs some context unfortunately.
So my problem is this: I have code that sorts data in a DataTable object by column. The user picks the column they want to sort by and then my code sorts it.
I ran into an issue where I needed numbers to sort as numbers, not strings (all data in the table is strings); for example, string sorting would result in 1000 coming before 500.
So my solution was to create a temporary column that uses the correct datatype, so that numbers get sorted properly while the original string data of the number remains unchanged. This worked perfectly. I could sort string numeric data as numeric data without changing the formatting of the number or the data type. I delete the column I used to sort afterwards, because I use DefaultView to sort and copy data to another DataTable object.
That part all works fine the first time.
The issue is when the user needs to do a different sort on the same column. My code adds back the column (same name), then tries to add values to the column, but then I get a null reference exception: "Object not set to an instance of an object".
Here is what I've tried:
- I've tried using AcceptChanges() after deleting a column, but this did nothing.
- I've tried using the column index, the name, and the column object returned by DataTable.Columns.Add() in the first parameter of SetField(), in case it was somehow referencing the "old" column object I deleted (this is what I think the problem is, more than likely).
- I've tried changing the value of .ItemArray[] directly, but this does not work even the first time.
Here is the code. This is how the column names are passed:
private void SortByColumn()
{
    if (cbAscDesc.SelectedIndex != -1)//if the user has selected ASC or DESC order
    {
        //clears the datatable object that stores the sorted defaultview
        sortedData.Clear();
        //grabs column names the user has selected to sort by and copies them to a string[]
        string[] lbItems = new string[lbColumnsToSortBy.Items.Count];
        lbColumnsToSortBy.Items.CopyTo(lbItems, 0);
        //adds temp columns to data to sort numerical strings properly
        string[] itemsToSort = AddSortColumns(lbItems);
        //creates parameters for defaultview sort
        string columnsToSortBy = String.Join(",", itemsToSort);
        string sortDirection = cbAscDesc.SelectedItem.ToString();
        data.DefaultView.Sort = columnsToSortBy + " " + sortDirection;
        //copies the defaultview to the sorted table object
        sortedData = data.DefaultView.ToTable();
        RemoveSortColumns(itemsToSort);//removes temp sorting columns
    }
}
This is where the temp columns are added:
//adds columns to data that will be used to sort
//(ensures numbers are sorted as numbers and strings are sorted as strings)
private string[] AddSortColumns(string[] items)
{
    string[] sortColumnNames = new string[items.Length];
    for (int k = 0; k < items.Length; k++)
    {
        int indexOfOrginialColumn = Array.IndexOf(columns, items[k]);
        Type datatype = CheckDataType(indexOfOrginialColumn);
        if (datatype == typeof(double))
        {
            sortColumnNames[k] = items[k] + "Sort";
            data.Columns.Add(sortColumnNames[k], typeof(double));
            for (int i = 0; i < data.Rows.Count; i++)
            {
                //these three lines add the values in the original column to the column used to sort formated to the proper datatype
                NumberStyles styles = NumberStyles.Any;
                double value = double.Parse(data.Rows[i].Field<string>(indexOfOrginialColumn), styles);
                bool test = data.Columns.Contains("QtySort");
                data.Rows[i].SetField(sortColumnNames[k], value);//this is line that throws a null ref exception
            }
        }
        else
        {
            sortColumnNames[k] = items[k];
        }
    }
    return sortColumnNames;
}
This is the code that deletes the columns afterward:
private void RemoveSortColumns(string[] columnsToRemove)
{
    for (int i = 0; i < columnsToRemove.Length; i++)
    {
        if (columnsToRemove[i].Contains("Sort"))
        {
            sortedData.Columns.Remove(columnsToRemove[i]);
        }
    }
}
NOTE: I've been able to fix the problem by just keeping the column in data and only deleting the column from sortedData, as I use .Clear() on the sorted table, which seems to ensure the exception is not thrown.
I would still like an answer, though, as to why this is throwing an exception. If I use .Contains() on the line right before the one where the exception is thrown, it says the column exists and returns true, and in case anyone is wondering, the params sortColumnNames[k] and value are never null either.
Your problem is probably here:
private void RemoveSortColumns()
{
    for (int i = 0; i < data.Columns.Count; i++)
    {
        if (data.Columns[i].ColumnName.Contains("Sort"))
        {
            data.Columns.RemoveAt(i);
            sortedData.Columns.RemoveAt(i);
        }
    }
}
If you have 2 columns, and the first one matches the if, you will never look at the second. This is because it will run:
i = 0
is i < columns.Count (which is 2)? => yes
is col[0].Contains("sort") true? => yes
remove col[0]
i = 1
is i < columns.Count (which is 1)? => no
The solution is to readjust i after the removal:
private void RemoveSortColumns()
{
    for (int i = 0; i < data.Columns.Count; i++)
    {
        if (data.Columns[i].ColumnName.Contains("Sort"))
        {
            data.Columns.RemoveAt(i);
            sortedData.Columns.RemoveAt(i);
            i--;//removed 1 element, go back 1
        }
    }
}
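An alternative to readjusting i is to walk the collection from the end, so a removal never shifts an element that has not been visited yet. This is only a sketch of that variation, not the code from the answer; the class and method names (ColumnCleanup, RemoveColumnsContaining) are made up for illustration, and it works on a single table.

```csharp
using System;
using System.Data;

public static class ColumnCleanup
{
    // Hypothetical helper: removes every column whose name contains the marker.
    public static void RemoveColumnsContaining(DataTable table, string marker)
    {
        // Walk from the last index to the first so that removing a column
        // never shifts a column we have not inspected yet.
        for (int i = table.Columns.Count - 1; i >= 0; i--)
        {
            if (table.Columns[i].ColumnName.Contains(marker))
            {
                table.Columns.RemoveAt(i);
            }
        }
    }
}
```

Calling ColumnCleanup.RemoveColumnsContaining(data, "Sort") would then remove adjacent matching columns as well, without the index bookkeeping.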
I fixed my original issue by changing a few lines of code in my SortByColumn() method:
private void SortByColumn()
{
    if (cbAscDesc.SelectedIndex != -1)//if the user has selected ASC or DESC order
    {
        //clears the datatable object that stores the sorted defaultview
        sortedData.Clear();
        //grabs column names the user has selected to sort by and copies them to a string[]
        string[] lbItems = new string[lbColumnsToSortBy.Items.Count];
        lbColumnsToSortBy.Items.CopyTo(lbItems, 0);
        //adds temp columns to data to sort numerical strings properly
        string[] itemsToSort = AddSortColumns(lbItems);
        //creates parameters for defaultview sort
        string columnsToSortBy = String.Join(",", itemsToSort);
        string sortDirection = cbAscDesc.SelectedItem.ToString();
        DataView userSelectedSort = data.AsDataView();
        userSelectedSort.Sort = columnsToSortBy + " " + sortDirection;
        //copies the defaultview to the sorted table object
        sortedData = userSelectedSort.ToTable();
        RemoveSortColumns(itemsToSort);//removes temp sorting columns
    }
}
Instead of sorting on data.DefaultView, I create a new DataView object, assign data.AsDataView() as its value, and then sort on that. This completely gets rid of the issue in my original code.
For anyone wondering, I still believe it is a bug with .DefaultView in the .NET Framework that Microsoft will probably never fix.
I hope this will help someone with a similar issue in the future.
Here is the link again to where I figured out a solution to my problem: https://stackoverflow.com/a/5665600
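For comparison, the temporary sort column can be avoided altogether by ordering the rows with LINQ to DataSet and a numeric key. This is a different technique from the DataView approach above, sketched under assumptions: every cell in the chosen column is a non-null string that double.Parse accepts, and the names NumericSort and SortByNumericString are hypothetical. On .NET Framework this needs a reference to System.Data.DataSetExtensions.

```csharp
using System;
using System.Data;
using System.Globalization;
using System.Linq;

public static class NumericSort
{
    // Hypothetical helper: returns a sorted copy of the table, ordering a
    // string column by its numeric value. The source table is not modified.
    public static DataTable SortByNumericString(DataTable source, string columnName, bool ascending)
    {
        // CopyToDataTable throws on an empty sequence, so return an empty clone instead.
        if (source.Rows.Count == 0) return source.Clone();

        Func<DataRow, double> key = row =>
            double.Parse(row.Field<string>(columnName), NumberStyles.Any, CultureInfo.InvariantCulture);

        var ordered = ascending
            ? source.AsEnumerable().OrderBy(key)
            : source.AsEnumerable().OrderByDescending(key);

        return ordered.CopyToDataTable();
    }
}
```

Because no column is added to or removed from the source table, the add/remove cycle that triggered the exception never happens in this variant.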
When adding a row to DataGridView, a new blank row is added and data is added to the end of the current row
I'm trying to add data manually to a DataGridView (displaying a grid view of a student's attendance for the year). The problem is that when I add a row of data to the DataGridView, instead of a new row being created and the data being added to it, a new blank row is made and the data is added to the top row. Here is the relevant code:
foreach (IndividualAttendanceRecord rec in DatabaseInterfacer.GetRecords("pi404"))
{
    if (dataGrid.ColumnCount < rec.Attendance.Count)
        dataGrid.ColumnCount = rec.Attendance.Count;
    List<String> row = new List<string>();
    foreach (string entry in rec.Attendance)
        row.Add(entry);
    string[] rowArray = row.ToArray<string>();
    dataGrid.Rows.Add(rowArray);
}
This code produces a DataGridView with all the data in one line, then two blank lines at the bottom. Any help?
EDIT: Still completely stumped on this. I've simplified my code and added a few test rows to the foreach statement, and I don't understand why it's outputting the way it is at all. Here is my new code:
foreach (IndividualAttendanceRecord rec in DatabaseInterfacer.GetRecords("pi404"))
{
    if (dataGrid.ColumnCount < rec.Attendance.Count)
        dataGrid.ColumnCount = rec.Attendance.Count;
    string[] row = rec.Attendance.ToArray<string>();
    dataGrid.Rows.Add(row);
    dataGrid.Rows.Add("1", "2", "3");
    dataGrid.Rows.Add("One", "Two", "Three");
}
And here is what it outputs: http://i.imgur.com/f45mlod.png
I don't see why it is still putting all the information in the IndividualAttendanceRecord in a single line on its own, and then creating a blank line and putting the "1 2 3" and "One Two Three" after it. Can anyone see why this is happening? I'm probably being really stupid.
The control is showing what you told it to show. First, you told the grid to create some columns by setting ColumnCount to the number of items in your list:
dataGrid.ColumnCount = rec.Attendance.Count;
Then you add a row containing some values using the Add(params object[] values) method. When you pass an array to the method, it adds one row and uses those values as the cells of that row:
string[] rowArray = row.ToArray<string>();
dataGrid.Rows.Add(rowArray);
If you want to add all the values in a single column, as an option you can:
dataGrid.ColumnCount = 1;
foreach (string entry in rec.Attendance)
    dataGrid.Rows.Add(entry);
I looked through the rest of my code and found the problem. There was no problem with the display code, the problem was actually in the database. For some reason all the data was actually in one line of the database with two blank lines underneath.
Retrieve "row pairs" from Excel
I am trying to retrieve data from an Excel spreadsheet using C#. The data in the spreadsheet has the following characteristics:
- no column names are assigned
- the rows can have varying column lengths
- some rows are metadata, and these rows label the content of the columns in the next row
Therefore, the objects I need to construct will always have their name in the very first column, and their parameters are contained in the next columns. It is important that the parameter names are retrieved from the row above. An example:
row1 |        | FirstName | Surname |
row2 | Person | Bob       | Bloggs  |
row3 |        |           |         |
row4 |        | Make      | Model   |
row5 | Car    | Toyota    | Prius   |
So unfortunately the data is heterogeneous, and the only way to determine which rows "belong together" is to check whether the first column in the row is empty. If it is not empty, the row describes an object: read all data in the row, and check which parameter names apply by looking at the row above.
At first I thought the straightforward approach would be to simply loop through 1) the dataset containing all sheets, then 2) the datatables (i.e. sheets) and 3) the rows. However, I found that trying to extract this data with nested loops and if statements results in horrible, unreadable and inflexible code.
Is there a way to do this in LINQ? I had a look at this article to start by filtering the empty rows between data but didn't really get anywhere. Could someone point me in the right direction with a few code snippets please?
Thanks in advance!
hiro
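I see that you've already accepted the answer, but I think that a more generic solution is possible, using reflection. Let's say you got your data as a List<string[]>, where each element in the list is an array of strings with all cells from the corresponding row.
using System;
using System.Collections.Generic;
using System.Linq;

List<string[]> data = LoadData();
var results = new List<object>();
// Initialised so the compiler does not reject the first use below.
string[] headerRow = new string[0];
var en = data.GetEnumerator();
while (en.MoveNext())
{
    var row = en.Current;
    if (string.IsNullOrEmpty(row[0]))
    {
        // Empty first cell: this row names the properties of the next row.
        headerRow = row.Skip(1).ToArray();
    }
    else
    {
        // Type.GetType needs a name it can resolve (namespace-qualified, and
        // assembly-qualified if the type lives in another assembly).
        Type objType = Type.GetType(row[0]);
        object newItem = Activator.CreateInstance(objType);
        for (int i = 0; i < headerRow.Length; i++)
        {
            // Assumes every named property exists and is of type string.
            objType.GetProperty(headerRow[i]).SetValue(newItem, row[i + 1]);
        }
        results.Add(newItem);
    }
}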
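If concrete classes are not available for reflection, the same header/object pairing can be sketched with plain dictionaries. This is an illustration only: RowPairs and Parse are hypothetical names, rows are assumed to arrive as string arrays with empty strings for blank cells, and header names within one header row are assumed to be unique (ToDictionary throws on duplicates). It needs C# 7 or later for the tuple syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RowPairs
{
    // Hypothetical helper: pairs each object row with the most recent header row.
    public static List<(string Name, Dictionary<string, string> Fields)> Parse(IEnumerable<string[]> rows)
    {
        var results = new List<(string Name, Dictionary<string, string> Fields)>();
        string[] header = new string[0];

        foreach (string[] row in rows)
        {
            // Skip completely blank rows between blocks.
            if (row.Length == 0 || row.All(string.IsNullOrWhiteSpace)) continue;

            if (string.IsNullOrWhiteSpace(row[0]))
            {
                // Empty first cell: remember the property names for the next object row.
                header = row.Skip(1).ToArray();
                continue;
            }

            // Zip stops at the shorter sequence, so ragged rows do not throw.
            var fields = header
                .Zip(row.Skip(1), (key, value) => new { key, value })
                .Where(pair => !string.IsNullOrWhiteSpace(pair.key))
                .ToDictionary(pair => pair.key, pair => pair.value);

            results.Add((row[0], fields));
        }
        return results;
    }
}
```

For the example sheet in the question this would yield one entry named Person and one named Car, each with its values keyed by the header above it.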
how to represent a CSV File as a data structure in a C# program
I have a CSV file I am going to read from disk. I do not know up front how many columns there are or what their names are. Any thoughts on how I should represent the fields? Ideally I want to say something like string Val = DataStructure.GetValue(i, ColumnName), where i is the ith row. Oh, just as an aside, I will be parsing using the TextFieldParser class http://msdn.microsoft.com/en-us/library/cakac7e6(v=vs.90).aspx
That sounds as if you would need a DataTable, which has a Rows and a Columns property. So you can say:
string Val = table.Rows[i].Field<string>(ColumnName);
A DataTable is a table of in-memory data. It can be used strongly typed (as suggested with the Field method), but it actually stores its data as objects internally.
You could use this parser to convert the csv to a DataTable.
Edit: I've only just seen that you want to use the TextFieldParser. Here's a possible simple approach to convert a csv to a DataTable:
var table = new DataTable();
using (var parser = new TextFieldParser(File.OpenRead(path)))
{
    parser.Delimiters = new[] { "," };
    parser.HasFieldsEnclosedInQuotes = true;
    // load DataColumns from first line
    String[] headers = parser.ReadFields();
    foreach (var h in headers)
        table.Columns.Add(h);
    // load all other lines as data
    String[] fields;
    while ((fields = parser.ReadFields()) != null)
    {
        table.Rows.Add().ItemArray = fields;
    }
}
If the column names are in the first row read that and store in a Dictionary<string, int> that maps the column name to the column index. You could then store the remaining rows in a simple structure like List<string[]>. To get a column for a row you'd do csv[rowIndex][nameToIndex[ColumnName]]; nameToIndex[ColumnName] gets the column index from the name, csv[rowIndex] gets the row (string array) we want. This could of course be wrapped in a class.
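Wrapped in a class, that idea might look like the sketch below. CsvTable and its members are hypothetical names; the sketch assumes the records have already been parsed into string arrays (for example by TextFieldParser), with the first record holding the column names, and it treats column names as case-insensitive.

```csharp
using System;
using System.Collections.Generic;

public sealed class CsvTable
{
    // Maps a column name to its index in each row.
    private readonly Dictionary<string, int> nameToIndex =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    private readonly List<string[]> rows = new List<string[]>();

    // The first record is treated as the header; every later record is a data row.
    public CsvTable(IEnumerable<string[]> records)
    {
        bool isHeader = true;
        foreach (string[] record in records)
        {
            if (isHeader)
            {
                for (int i = 0; i < record.Length; i++) nameToIndex[record[i]] = i;
                isHeader = false;
            }
            else
            {
                rows.Add(record);
            }
        }
    }

    public int RowCount => rows.Count;

    // Throws KeyNotFoundException for an unknown column name;
    // returns an empty string when the row is shorter than the header.
    public string GetValue(int rowIndex, string columnName)
    {
        int column = nameToIndex[columnName];
        string[] row = rows[rowIndex];
        return column < row.Length ? row[column] : string.Empty;
    }
}
```

This gives the call shape asked for in the question, string Val = table.GetValue(i, ColumnName), while keeping the storage as a plain list of string arrays.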
Use the csv parser if you want, but a text parser is very easy to write yourself if you need customization. For your needs, I would use one (or more) Dictionary: at least one mapping PropertyString --> column index, and maybe the reverse one, column index --> PropertyString, if needed. When I parse a csv file, I usually put the results in a List while parsing, and then convert it to an array once complete for speed reasons (List.ToArray()).