Passing data between Collections and Regex.Match method - c#

I am building a BP object that does the following:
Captures text from some rows in an Excel file into a Blueprism collection using the standard Excel VBO.
Puts that collection's contents through a Regex.Match code stage in order to extract a particular value from each row in the collection.
The code stage is then supposed to populate another (output) collection with the extracted values.
Blueprism seems to use C# DataTable objects for collections. I tried using List<> objects, but BP will not implicitly cast those (or anything else, for that matter), so I guess I have to use DataTable.
The input and output variables in the code below (first and last lines), are the wrappers for the actual BP collections in the BP object.
DataTable localList = input;
DataTable localOutput = new DataTable();
DataRow[] rows = localList.Select();
while (localList != null)
{
for (int i = 0; i < rows.Length; i++)
{
string indexValue = Convert.ToString(localList.Rows[i]);
//example Regex
string sPattern1 = "[0-9]{5,8}[A-Z][A-Z]";
Match match1 = Regex.Match(indexValue, sPattern1);
///same thing with sPattern2, 3, 4, 5 & 6... with corresponding match2, 3, etc...
if (match1.Success || match2.Success || match3.Success || match4.Success ||
match5.Success || match6.Success)
{
localOutput.NewRow();
localOutput.Rows.Add(indexValue);
}
else
{
localOutput.NewRow();
localOutput.Rows.Add("Not Found");
}
}
}
output = localOutput.Select();
So, this code compiles OK, i.e. doesn't throw any type-cast errors or anything like when I tried to use List<>, but then when I run the object BP throws the following runtime exception:
Internal : Could not execute code stage because exception thrown by code stage: Input array is longer than the number of columns in this table.
What am I doing wrong? I'd appreciate any help with this, thank you.

First, you have to define the columns. A new DataTable has none, so Rows.Add fails with that exception as soon as you pass it a value:
DataTable localOutput = new DataTable();
localOutput.Columns.Add("Column1", typeof (string));
Then you can add new rows.
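Putting the answer together with the code from the question, the whole stage might look like the sketch below. It is only an illustration under a few assumptions: the helper name ExtractMatches, the column names "Text" and "Result", and the choice to store the matched value (rather than the whole cell) are all hypothetical, and only one of the six patterns is shown. Note that it reads the cell from each row; Convert.ToString on the DataRow itself would yield the type name, not the cell text.

```csharp
using System;
using System.Data;
using System.Text.RegularExpressions;

public static class CodeStageSketch
{
    // Hypothetical helper: reads one text column and returns a new one-column table.
    public static DataTable ExtractMatches(DataTable input, string columnName, string pattern)
    {
        DataTable output = new DataTable();
        // Define the column first; without it Rows.Add(value) throws
        // "Input array is longer than the number of columns in this table".
        output.Columns.Add("Result", typeof(string));

        foreach (DataRow row in input.Rows)
        {
            // Read the cell value, not the DataRow object itself.
            string cell = Convert.ToString(row[columnName]);
            Match match = Regex.Match(cell, pattern);
            // Rows.Add creates and appends the row; a separate NewRow() call is not needed.
            output.Rows.Add(match.Success ? match.Value : "Not Found");
        }
        return output;
    }

    public static void Main()
    {
        DataTable input = new DataTable();
        input.Columns.Add("Text", typeof(string));
        input.Rows.Add("Ref 12345AB received");
        input.Rows.Add("no reference here");

        DataTable result = ExtractMatches(input, "Text", "[0-9]{5,8}[A-Z][A-Z]");
        foreach (DataRow row in result.Rows)
        {
            Console.WriteLine(row["Result"]);
        }
    }
}
```

In the code stage itself, the returned DataTable would be assigned to the output variable directly, assuming (as the question suggests) that BP collections map to DataTable.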

Related

DataRow.SetField() gives a null ref exception when adding data to a column I previously deleted then added back

UPDATE
I think I have found what is causing the issue here https://stackoverflow.com/a/5665600/19393524
I believe my issue lies with my use of .DefaultView. The linked post suggests that sorting on it is technically a write operation on the DataTable object, and that later changes may not propagate properly or entirely. It is an interesting read and seems to answer my question of why passing valid data to a DataRow throws this exception AFTER I make changes to the DataTable.
UPDATE:
Let me be crystal clear. I have already solved my problem. I would just like to know why it is throwing an error. In my view the code should work, and it does, on the first run through.
AFTER I have already deleted the column then added it back (run this code once)
When I debug my code line by line in Visual Studio and stop at the line:
data.Rows[i].SetField(sortColumnNames[k], value);
the row exists
the column exists
value is not null
sortColumnNames[k] is not null and contains the correct column name
i is 0
Yet it still throws an exception. I would like to know why. What am I missing?
Sorry for the long explanation but this one needs some context unfortunately.
So my problem is this, I have code that sorts data in a DataTable object by column. The user picks the column they want to sort by and then my code sorts it.
I ran into an issue where I needed numbers to sort as numbers, not strings (all data in the table is strings). For example, string sorting would put 1000 before 500.
So my solution was to create a temporary column that uses the correct datatype so that numbers get sorted properly and the original string data of the number remains unchanged but is now sorted properly. This worked perfectly. I could sort string numeric data as numeric data without changing the formatting of the number or data type.
I delete the column I used to sort afterwards because I use defaultview to sort and copy data to another DataTable object.
That part all works fine the first time.
The issue arises when the user needs to do a different sort on the same column. My code adds the column back (same name) and then tries to add values to it, but I get a null reference exception: "Object not set to an instance of an object".
Here is what I've tried:
I've tried using AcceptChanges() after deleting a column but this did nothing.
I've tried using column index, name, and column object returned by DataTable.Columns.Add() in the first parameter of SetField() in case it was somehow referencing the "old" column object I deleted (this is what I think the problem is more than likely)
I've tried changing the value of the .ItemArray[] directly but this does not work even the first time
Here is the code:
This is how the column names are passed:
private void SortByColumn()
{
if (cbAscDesc.SelectedIndex != -1)//if the user has selected ASC or DESC order
{
//clears the datatable object that stores the sorted defaultview
sortedData.Clear();
//grabs column names the user has selected to sort by and copies them to a string[]
string[] lbItems = new string[lbColumnsToSortBy.Items.Count];
lbColumnsToSortBy.Items.CopyTo(lbItems, 0);
//adds temp columns to data to sort numerical strings properly
string[] itemsToSort = AddSortColumns(lbItems);
//creates parameters for defaultview sort
string columnsToSortBy = String.Join(",", itemsToSort);
string sortDirection = cbAscDesc.SelectedItem.ToString();
data.DefaultView.Sort = columnsToSortBy + " " + sortDirection;
//copies the defaultview to the sorted table object
sortedData = data.DefaultView.ToTable();
RemoveSortColumns(itemsToSort);//removes temp sorting columns
}
}
This is where the temp columns are added:
private string[] AddSortColumns(string[] items)//adds columns to data that will be used to sort
//(ensures numbers are sorted as numbers and strings are sorted as strings)
{
string[] sortColumnNames = new string[items.Length];
for (int k = 0; k < items.Length; k++)
{
int indexOfOrginialColumn = Array.IndexOf(columns, items[k]);
Type datatype = CheckDataType(indexOfOrginialColumn);
if (datatype == typeof(double))
{
sortColumnNames[k] = items[k] + "Sort";
data.Columns.Add(sortColumnNames[k], typeof(double));
for (int i = 0; i < data.Rows.Count; i++)
{
//these three lines add the values in the original column to the column used to sort formated to the proper datatype
NumberStyles styles = NumberStyles.Any;
double value = double.Parse(data.Rows[i].Field<string>(indexOfOrginialColumn), styles);
bool test = data.Columns.Contains("QtySort");
data.Rows[i].SetField(sortColumnNames[k], value);//this is line that throws a null ref exception
}
}
else
{
sortColumnNames[k] = items[k];
}
}
return sortColumnNames;
}
This is the code that deletes the columns afterward:
private void RemoveSortColumns(string[] columnsToRemove)
{
for (int i = 0; i < columnsToRemove.Length; i++)
{
if (columnsToRemove[i].Contains("Sort"))
{
sortedData.Columns.Remove(columnsToRemove[i]);
}
}
}
NOTE:
I've been able to fix the problem by just keeping the column in data and just deleting the column from sortedData as I use .Clear() on the sorted table which seems to ensure the exception is not thrown.
I would still like an answer, though, as to why this is throwing an exception. If I use .Contains() on the line right before the one where the exception is thrown, it says the column exists and returns true. In case anyone is wondering, the parameters sortColumnNames[k] and value are never null either.
Your problem is probably here:
private void RemoveSortColumns()
{
for (int i = 0; i < data.Columns.Count; i++)
{
if (data.Columns[i].ColumnName.Contains("Sort"))
{
data.Columns.RemoveAt(i);
sortedData.Columns.RemoveAt(i);
}
}
}
If you have 2 columns, and the first one matches the if, you will never look at the second.
This is because it will run:
i = 0
is i < columns.Count which is 2 => yes
is col[0].Contains("sort") true => yes
remove col[0]
i = 1
is i < columns.Count which is 1 => no
The solution is to readjust i after the removal
private void RemoveSortColumns()
{
for (int i = 0; i < data.Columns.Count; i++)
{
if (data.Columns[i].ColumnName.Contains("Sort"))
{
data.Columns.RemoveAt(i);
sortedData.Columns.RemoveAt(i);
i--;//removed 1 element, go back 1
}
}
}
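An alternative to readjusting i is to walk the columns from the last index down to the first, so that a removal never shifts a column that has not been checked yet. The sketch below shows this for a single table; the helper name RemoveColumnsContaining and the sample column names are made up for the example.

```csharp
using System;
using System.Data;

public static class ColumnRemovalSketch
{
    // Removes every column whose name contains the marker, iterating backwards
    // so that removals only shift indexes that were already visited.
    public static void RemoveColumnsContaining(DataTable table, string marker)
    {
        for (int i = table.Columns.Count - 1; i >= 0; i--)
        {
            if (table.Columns[i].ColumnName.Contains(marker))
            {
                table.Columns.RemoveAt(i);
            }
        }
    }

    public static void Main()
    {
        DataTable table = new DataTable();
        table.Columns.Add("QtySort", typeof(double));
        table.Columns.Add("PriceSort", typeof(double));
        table.Columns.Add("Name", typeof(string));

        RemoveColumnsContaining(table, "Sort");
        Console.WriteLine(table.Columns.Count); // prints 1
    }
}
```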
I fixed my original issue by changing a few lines of code in my SortByColumn() method:
private void SortByColumn()
{
if (cbAscDesc.SelectedIndex != -1)//if the user has selected ASC or DESC order
{
//clears the datatable object that stores the sorted defaultview
sortedData.Clear();
//grabs column names the user has selected to sort by and copies them to a string[]
string[] lbItems = new string[lbColumnsToSortBy.Items.Count];
lbColumnsToSortBy.Items.CopyTo(lbItems, 0);
//adds temp columns to data to sort numerical strings properly
string[] itemsToSort = AddSortColumns(lbItems);
//creates parameters for defaultview sort
string columnsToSortBy = String.Join(",", itemsToSort);
string sortDirection = cbAscDesc.SelectedItem.ToString();
DataView userSelectedSort = data.AsDataView();
userSelectedSort.Sort = columnsToSortBy + " " + sortDirection;
//copies the defaultview to the sorted table object
sortedData = userSelectedSort.ToTable();
RemoveSortColumns(itemsToSort);//removes temp sorting columns
}
}
Instead of sorting on data.DefaultView, I create a new DataView object, assign data.AsDataView() to it, and sort on that. This completely gets rid of the issue in my original code. For anyone wondering, I still believe it is a bug with .DefaultView in the .NET Framework that Microsoft will probably never fix. I hope this will help someone with a similar issue in the future.
Here is the link again to where I figured out a solution to my problem.
https://stackoverflow.com/a/5665600
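For comparison, here is a minimal sketch of the temporary-column technique that never touches DefaultView and never adds or removes columns on the original table, because it works on a copy. This is a different trade-off from the fix above (it costs a full copy of the data), and the names SortAsNumber and "Qty" are hypothetical. It uses new DataView(table) rather than AsDataView() only to keep the example free of extension-method dependencies.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class NumericSortSketch
{
    // Returns a sorted copy of source, ordering the string column by its numeric value.
    public static DataTable SortAsNumber(DataTable source, string column, bool ascending)
    {
        DataTable working = source.Copy(); // structure and data; source stays untouched
        string sortColumn = column + "Sort";
        working.Columns.Add(sortColumn, typeof(double));

        foreach (DataRow row in working.Rows)
        {
            // Parse the original string so the temporary column sorts numerically.
            row[sortColumn] = double.Parse(Convert.ToString(row[column]),
                NumberStyles.Any, CultureInfo.InvariantCulture);
        }

        DataView view = new DataView(working);
        view.Sort = "[" + sortColumn + "] " + (ascending ? "ASC" : "DESC");

        DataTable sorted = view.ToTable();
        sorted.Columns.Remove(sortColumn); // drop the helper column from the result only
        return sorted;
    }

    public static void Main()
    {
        DataTable data = new DataTable();
        data.Columns.Add("Qty", typeof(string));
        data.Rows.Add("1000");
        data.Rows.Add("500");
        data.Rows.Add("75");

        foreach (DataRow row in SortAsNumber(data, "Qty", true).Rows)
        {
            Console.WriteLine(row["Qty"]); // 75, 500, 1000
        }
    }
}
```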

How to get Column Names using Linq from DataTable

I'm trying to use LINQ on a DataTable that gets its data from SQL. So I have a data table with its usual columns and rows, and it appears exactly like the result of a SQL select statement. Now I need to get certain rows and columns (including column names) from this data.
I converted the DataTable to something LINQ can use with AsEnumerable, but I'm not sure what exactly it does. Does it convert the data into an array of objects where each row becomes an object?
I'm used to working with JavaScript and its newer arrow functions, so I'd like to use LINQ with lambdas to keep it consistent.
I'm trying to get rows and column names where the first column has a value equal to 2018:
DataTable myTable = getData(); // populates the datatable and I've verified the data
var linqTable = myTable.AsEnumerable().Select( x => x[0] = 2018);
I need to get the rows and column names, e.g. as an object or an array of objects. However, the code above doesn't return the data or column names, just two rows with 2018 in them.
My goal is to eventually serialize this data as json and send it to web page.
To Get the column names:
myTable.Columns.Cast<DataColumn>().Select(dc =>dc.ColumnName).ToList();
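The problem is that Select() projects each row into a new form instead of filtering. You are seeing 2018 because the lambda uses '=' (assignment) instead of '==' (comparison), so every row is projected to the assigned value. To filter, you need to use Where():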
var linqTable = myTable.AsEnumerable().Where( x => x.Field<int>(0) == 2018);
You will still end up with a list of DataRows, though. LINQ isn't strictly needed for a key lookup, because the DataTable already provides one; note that Rows.Find only works if the table has a primary key defined, and it returns the single matching row:
myTable.Rows.Find(2018);
If you are trying to convert it to a list of objects you should use the Select() method something like:
var linqTable = myTable.AsEnumerable().Where(x => x.Field<int>(0) == 2018)
.Select(x => new
{
year = x[0],
p1 = x[1],
p2 = x[2] // etc...
});
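Since the stated goal is to keep the column names and eventually serialize the result, one option is to project each matching row into a dictionary keyed by column name. The sketch below is an assumption-laden illustration: the helper name RowsWhereFirstColumnEquals and the sample columns "Year" and "P1" are invented, and it uses Rows.Cast<DataRow>() in place of AsEnumerable() purely to avoid an extra assembly reference.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class RowProjectionSketch
{
    // Returns one dictionary (column name -> cell value) per row whose first column equals year.
    public static List<Dictionary<string, object>> RowsWhereFirstColumnEquals(DataTable table, int year)
    {
        string[] columnNames = table.Columns.Cast<DataColumn>()
            .Select(c => c.ColumnName)
            .ToArray();

        return table.Rows.Cast<DataRow>()
            .Where(r => !r.IsNull(0) && Convert.ToInt32(r[0]) == year)
            .Select(r => columnNames.ToDictionary(name => name, name => r[name]))
            .ToList();
    }

    public static void Main()
    {
        DataTable table = new DataTable();
        table.Columns.Add("Year", typeof(int));
        table.Columns.Add("P1", typeof(string));
        table.Rows.Add(2018, "a");
        table.Rows.Add(2017, "b");
        table.Rows.Add(2018, "c");

        foreach (Dictionary<string, object> row in RowsWhereFirstColumnEquals(table, 2018))
        {
            Console.WriteLine(string.Join(", ", row.Select(kv => kv.Key + "=" + kv.Value)));
        }
    }
}
```

A list of dictionaries like this generally maps onto a JSON array of objects with most serializers, though how DBNull values are written depends on the serializer in use.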
You can create the following function:
public static DataTable CreateDataTableFromAnyCollection<T>(IEnumerable<T> list)
{
Type type = typeof(T);
var properties = type.GetProperties();
DataTable dataTable = new DataTable();
foreach (PropertyInfo info in properties)
{
dataTable.Columns.Add(new DataColumn(info.Name, Nullable.GetUnderlyingType(info.PropertyType) ?? info.PropertyType));
}
foreach (T entity in list)
{
object[] values = new object[properties.Length];
for (int i = 0; i < properties.Length; i++)
{
values[i] = properties[i].GetValue(entity,null);
}
dataTable.Rows.Add(values);
}
return dataTable;
}
and pass it whatever collection of objects your LINQ query returns:
DataTable dt = CreateDataTableFromAnyCollection(query);
I hope this will help you.
See also: Creating a DataTable From a Query (LINQ to DataSet)

Test for an empty DataRow in C#

What I'm trying to do: I have a large DataTable, and I'm going through a list of strings, some of which are in the DataTable and some of which aren't. I need to make a list of those that are, and count those that aren't.
This is my code part:
DataRow[] foundRows;
foundRows = DTgesamt.Select("SAP_NR like '%"+SAP+"%'");
if (AreAllCellsEmpty(foundRows[0]) == false && !(foundRows[0]==null))
{
list.Add(SAP);
}
else
{
notfound++;
}
public static bool AreAllCellsEmpty(DataRow row)
{
if (row == null) throw new ArgumentNullException("row");
for (int i = row.Table.Columns.Count - 1; i >= 0; i--)
{
if (!row.IsNull(i))
{
return false;
}
}
return true;
}
DTgesamt is a large DataTable. "SAP" is a string that may appear in the first column of the DataTable, but not all of them are included. I want to count the ones that are not found with the int "notfound".
The problem is that Select returns an empty DataRow array {System.Data.DataRow[0]} when it finds nothing.
I'm getting the error message "Index out of array area" (that is, the index was outside the bounds of the array).
The two statements in the if-clause are what I read on the internet but they don't work. With only the 2nd statement it just adds all numbers to the list, with the first it still gives this error.
Thanks for any help :)
Check the number of items in the foundRows array before indexing into it, to avoid the IndexOutOfRange exception:
foundRows = DTgesamt.Select("SAP_NR like '%"+SAP+"%'");
if (foundRows.Length > 0 && AreAllCellsEmpty(foundRows[0])==false)
list.Add(SAP);
else
notfound++;
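The rows that Select returns cannot be empty, because each of them matched the filter on SAP_NR; if one were empty, your select statement would be wrong. So what you actually need is (Any() requires using System.Linq):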
if (DTgesamt.Select("SAP_NR like '%"+SAP+"%'").Any())
{
list.Add(SAP);
}
else
{
notfound++;
}
You probably don't even need the counter, when you can calculate the missed records based on how many SAP numbers you had and how many results you got in list.
If you have an original list or array of SAP numbers, you could shorten your whole loop to:
var numbersInTable = originalNumbers.Where(sap => DTgesamt.Select("SAP_NR like '%"+sap+"%'").Any()).ToList();
var notFound = originalNumbers.Count - numbersInTable.Count;
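As a self-contained illustration of the same check, the sketch below splits a list of numbers into found and not found using only the length of the array that Select returns. The helper name FindPresent and the sample data are hypothetical. It also doubles single quotes in the search value, which is how string literals are escaped in DataTable filter expressions; wildcard characters inside the value are not handled here.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class LookupSketch
{
    // Returns the numbers that occur in the SAP_NR column; counts the rest in notFound.
    public static List<string> FindPresent(DataTable table, IEnumerable<string> numbers, out int notFound)
    {
        var found = new List<string>();
        notFound = 0;
        foreach (string number in numbers)
        {
            // Double any single quote so the value cannot end the string literal early.
            string escaped = number.Replace("'", "''");
            DataRow[] rows = table.Select("SAP_NR LIKE '%" + escaped + "%'");

            // An empty array means no match; never index into it without checking.
            if (rows.Length > 0)
            {
                found.Add(number);
            }
            else
            {
                notFound++;
            }
        }
        return found;
    }

    public static void Main()
    {
        DataTable table = new DataTable();
        table.Columns.Add("SAP_NR", typeof(string));
        table.Rows.Add("A100");
        table.Rows.Add("B200");

        int notFound;
        List<string> found = FindPresent(table, new[] { "100", "300" }, out notFound);
        Console.WriteLine(found.Count + " found, " + notFound + " not found"); // 1 found, 1 not found
    }
}
```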

Retrieve "row pairs" from Excel

I am trying to retrieve data from an Excel spreadsheet using C#. The data in the spreadsheet has the following characteristics:
no column names are assigned
the rows can have varying column lengths
some rows are metadata, and these rows label the content of the columns in the next row
Therefore, the objects I need to construct will always have their name in the very first column, and its parameters are contained in the next columns. It is important that the parameter names are retrieved from the row above. An example:
row1|---------|FirstName|Surname|
row2|---Person|Bob------|Bloggs-|
row3|---------|---------|-------|
row4|---------|Make-----|Model--|
row5|------Car|Toyota---|Prius--|
So unfortunately the data is heterogeneous, and the only way to determine which rows "belong together" is to check whether the first column in the row is empty. If it is not, the row describes an object: read all data in the row, and determine which parameter names apply by checking the row above.
At first I thought the straightforward approach would be to simply loop through
1) the dataset containing all sheets, then
2) the datatables (i.e. sheets) and
3) the row.
However, I found that trying to extract this data with nested loops and if statements results in horrible, unreadable and inflexible code.
Is there a way to do this in LINQ ? I had a look at this article to start by filtering the empty rows between data but didn't really get anywhere. Could someone point me in the right direction with a few code snippets please ?
Thanks in advance !
hiro
I see that you've already accepted an answer, but I think that a more generic solution is possible, using reflection.
Let's say you have your data as a List<string[]>, where each element in the list is an array of strings holding all cells from the corresponding row.
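// Requires: using System; using System.Collections.Generic; using System.Linq;
List<string[]> data = LoadData();
var results = new List<object>();
// Initialized so the compiler accepts its use below; holds the most recent header row.
string[] headerRow = new string[0];
foreach (string[] row in data)
{
if (string.IsNullOrEmpty(row[0]))
{
// Empty first cell: this row names the parameters of the next object row.
headerRow = row.Skip(1).ToArray();
}
else
{
// Non-empty first cell: it names the type to instantiate.
// Type.GetType needs a namespace-qualified name (plus the assembly, if the type lives elsewhere).
Type objType = Type.GetType(row[0]);
object newItem = Activator.CreateInstance(objType);
for (int i = 0; i < headerRow.Length; i++)
{
objType.GetProperty(headerRow[i]).SetValue(newItem, row[i + 1]);
}
results.Add(newItem);
}
}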
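If no matching classes such as Person or Car exist, the same header-then-data pairing can be captured without reflection by storing each object as a name plus a dictionary of parameters. This is a minimal sketch rather than the answer's method; SheetRecord, Parse and the sample rows are invented for the example, and the rows are assumed to be already loaded as string arrays.

```csharp
using System;
using System.Collections.Generic;

public class SheetRecord
{
    public string TypeName;
    public Dictionary<string, string> Values = new Dictionary<string, string>();
}

public static class RowPairSketch
{
    // Pairs each object row with the most recent header row above it.
    public static List<SheetRecord> Parse(IEnumerable<string[]> rows)
    {
        var records = new List<SheetRecord>();
        string[] header = new string[0];

        foreach (string[] row in rows)
        {
            if (row.Length == 0) continue;

            if (string.IsNullOrEmpty(row[0]))
            {
                // Empty first cell: remember this row as the current header.
                header = row;
                continue;
            }

            var record = new SheetRecord { TypeName = row[0] };
            // Rows may differ in length, so stop at the shorter of the two.
            for (int i = 1; i < row.Length && i < header.Length; i++)
            {
                if (!string.IsNullOrEmpty(header[i]))
                {
                    record.Values[header[i]] = row[i];
                }
            }
            records.Add(record);
        }
        return records;
    }

    public static void Main()
    {
        var rows = new List<string[]>
        {
            new[] { "", "FirstName", "Surname" },
            new[] { "Person", "Bob", "Bloggs" },
            new[] { "", "", "" },
            new[] { "", "Make", "Model" },
            new[] { "Car", "Toyota", "Prius" }
        };

        foreach (SheetRecord record in Parse(rows))
        {
            Console.WriteLine(record.TypeName + " has " + record.Values.Count + " parameters");
        }
    }
}
```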

How do I load a list via a data set from SQL Server into ListView?

I have what seems to be a simple question, but it's killing me trying to figure it out.
I have a form in which I have a ListView. In this ListView I would like to populate it with data from a SQL Server 2008 database table.
public void LoadList()
{
DataTable dtable = budget_MainDataSetReceipt.Tables["Receipt"];
listView1.Items.Clear();
for (int i = 0; i < dtable.Rows.Count; i++)
{
DataRow drow = dtable.Rows[i];
if (drow.RowState != DataRowState.Deleted)
{
ListViewItem lvi = new ListViewItem(drow["ReceiptID"].ToString());
lvi.SubItems.Add(drow["DateCleared"].ToString());
lvi.SubItems.Add(drow["CategoryID"].ToString());
lvi.SubItems.Add(drow["Amount"].ToString());
lvi.SubItems.Add(drow["Store"].ToString());
lvi.SubItems.Add(drow["DateEntered"].ToString());
listView1.Items.Add(lvi);
}
}
}
I keep getting an
Object reference not set to an instance of an object
error, and I can't figure out why. There are about 5 rows of data in my database, so in my mind, there should be 5 rows of data within the list view.
Can anyone tell me what I am missing? I can post more code if that would be helpful.
I have tried calling the LoadList() method in several ways:
Before the method itself
With the InitializeComponent() method
I have tried the following syntax
this.LoadList();
this.Form1.LoadList();
I have also tried to initialize the DataTables type with the following:
DataTables dt = new DataTables //did not work
My hunch would be: you're assuming for all columns in your DataRow that they're present and not null - that's a bit of a dangerous assumption.
I would change your assignments to use a method that checks for DBNull before returning the string:
public string SafeGetString(DataRow row, string columnName)
{
if(row[columnName] != null && row[columnName] != DBNull.Value)
{
return row[columnName].ToString();
}
return string.Empty;
}
so your code would look like:
ListViewItem lvi = new ListViewItem(SafeGetString(drow, "ReceiptID"));
lvi.SubItems.Add(SafeGetString(drow, "DateCleared"));
// and so forth
This way, if any of the columns should contain a NULL, you would get back an empty string - instead of running into a NULL.ToString() that causes the error you're seeing.
