Summarize table using C# and JScript - c#

I've written code in C# to summarize values of InvoiceTable and move those summarized values to to GroupTable in Abbyy FlexiCapture. The Software is comparatively new and does not show any error when I run it.
There are two sets of code to be written:
In TechField.
In EventHandlers.
InvoiceTable consists of:
TariffNumber
ShipQty
Amount
COO
GroupTable consists of:
HSCode
Qty
Amt
CountryOO
EventHandlers code is as follows (C#):
if (ChangedStates.Has(7)) {
int currentRow = 0;
int i;
for (i = 0; i < Document.Field("Invoice2\\InvoiceTable").Items.Count; i++) {
if (Document.Field("Invoice2\\InvoiceTable").Cell("TariffNumber", i).Value == "") {
Document.Field("Invoice2\\GroupTable").Cell("HSCode", currentRow).Value = Document.Field("Invoice2\\InvoiceTable").Cell("TariffNumber", i).Value;
Document.Field("Invoice2\\GroupTable").Cell("Amt", currentRow).Value = Document.Field("Invoice2\\InvoiceTable").Cell("Amount", i).Value;
Document.Field("Invoice2\\GroupTable").Cell("Qty", currentRow).Value = Document.Field("Invoice2\\InvoiceTable").Cell("ShipQty", i).Value;
currentRow++;
}
}
}
TechField is as follows (JScript):
for (i = 0; i < Field("ShipQty").Items.Count - 1; i++) {
for (j = i + 1; j < Field("ShipQty").Items.Count; j++) {
// if same new items are found
if (Field("TariffNumber").Items(i).Value == Field("TariffNumber").Items(j).Value && Field("CoO").Items(i).Value == Field("CoO").Items(j).Value)
{
// summarise quantities
Field("ShipQty").Items(i).Value = parseInt(Field("ShipQty").Items(i).Value) + parseInt(Field("ShipQty").Items(j).Value);
// and weights
Field("Amount").Items(i).Value = parseFloat(Field("Amount").Items(i).Value) + parseFloat(Field("Amount").Items(j).Value);
}
}
}
Condition:
In the InvoiceTable, where ever TariffNumber and COO are equal, values of ShipQty and Amount should be summarized and put into GroupTable.
The code does not show any errors but does not give the output as well. Would be great if anyone of you can help me out on this.

One thing you could try is adding a summary section to the document definition. This may require you to create a document set.
I've found it easier to create rules which are checked when the field is recognised. There's some info here: https://help.abbyy.com/en-us/flexicapture/12/distributed_administrator/docsets_settings/

Related

Average loss stuck on local minima (Self developed Perceptron in C#)

I am trying to build a custom machine learning library in C# , I have researched fairly about the topic. My first example (XOR estimator) was a success, I was able to lower the average loss to almost cero. Then I tried to build a model to classify handwritten digits (using MNIST text database).The problem is ,no matter how I configure the model, I always get stuck on a certain average loss over the data set.Second problem, because the MNIST dataset is very large, the model takes a lot of time to compute, maybe I can use some advice on how to carry on the slowest parts of the algorithm (I am using stochastic gradient descent).I am going to show the main method that performs most of the work.
I have tried using MSE and CrossEntropy loss functions, also tanh , sigmoid , reLu and softPlus activation functions. The model I am trying to build is a 4 layer one. First layer, 784 input neurons ; Second , 16 neurons,sigmoid ; Third , 16 neurons , sigmoid and Output layer, 10 neurons (one hot encoded digits) with sigmoid. I am aware that the code below may not be a minimal reproducible example, but it represents the algorithm I am trying to figure. I have also uploaded the solution to GitHub, maybe somebody can give me a hand figuring the problem. This is the link https://github.com/juan-carvajal/MachineLearningFramework
Running the Main method of the app will first, execute the XOR classifier, that runs good. Then the MNIST classifier.
The model is best represented here:
DataSet dataSet = new DataSet("mnist2.txt", ' ', 10, false);
//This creates a model with batching=128 , learningRate=0.5 and
//CrossEntropy loss function
var p = new Perceptron(128, 0.5, ErrorFunction.CrossEntropy())
.Layer(784, ActivationFunction.Sigmoid())
.Layer(16, ActivationFunction.Sigmoid())
.Layer(16, ActivationFunction.Sigmoid())
.Layer(10, ActivationFunction.Sigmoid());
//1000 is the number of epochs
p.Train2(dataSet, 1000);
Actual Algorithm (stochastic gradiente descent):
Console.WriteLine("Initial Loss:"+ CalculateMeanErrorOverDataSet(dataSet));
for (int i = 0; i < epochs; i++)
{
//Shuffle the data in every step
dataSet.Shuffle();
List<DataRow> batch = dataSet.NextBatch(this.Batching);
//Gets random batch from the dataSet
int count = 0;
foreach (DataRow example in batch)
{
count++;
double[] result = this.FeedForward(example.GetFeatures());
double[] labels = example.GetLabels();
if (result.Length != labels.Length)
{
throw new Exception("Inconsistent array size, Incorrect implementation.");
}
else
{
//What follows is the calculation of the gradient for this example, every example affects the current gradient, then all those changes are averaged an every parameter is updated.
double error = CalculateExampleLost(example);
for (int l = this.Layers.Count - 1; l > 0; l--)
{
if (l == this.Layers.Count - 1)
{
for (int j = 0; j < this.Layers[l].CostDerivatives.Length; j++)
{
this.Layers[l].CostDerivatives[j] = ErrorFunction.GetDerivativeValue(labels[j], this.Layers[l].Activations[j]);
}
}
else
{
for (int j = 0; j < this.Layers[l].CostDerivatives.Length; j++)
{
double acum = 0;
for (int j2 = 0; j2 < Layers[l + 1].Size; j2++)
{
acum += Layers[l + 1].WeightMatrix[j2, j] * this.Layers[l+1].ActivationFunction.GetDerivativeValue(Layers[l + 1].WeightedSum[j2]) * Layers[l + 1].CostDerivatives[j2];
}
this.Layers[l].CostDerivatives[j] = acum;
}
}
for (int j = 0; j < this.Layers[l].Activations.Length; j++)
{
this.Layers[l].BiasVectorChangeRecord[j] += this.Layers[l].ActivationFunction.GetDerivativeValue(Layers[l].WeightedSum[j]) * Layers[l].CostDerivatives[j];
for (int k = 0; k < Layers[l].WeightMatrix.GetLength(1); k++)
{
this.Layers[l].WeightMatrixChangeRecord[j, k] += Layers[l - 1].Activations[k]
* this.Layers[l].ActivationFunction.GetDerivativeValue(Layers[l].WeightedSum[j])
* Layers[l].CostDerivatives[j];
}
}
}
}
}
TakeGradientDescentStep(batch.Count);
if ((i + 1) % (epochs / 10) == 0)
{
Console.WriteLine("Epoch " + (i + 1) + ", Avg.Loss:" + CalculateMeanErrorOverDataSet(dataSet));
}
}
This is an example of what the local minima looks like in the current model.
In my research I found out that similar models may archieve accuracy up to 90%. My model barely got 10%.

Find a string in a Range using ClosedXML C#

I want to be able to find if a particular string exists in a range using ClosedXML, however, I cannot find any find command in the documentation. Currently I loop through 1000s of rows to find if the string exists. Is there a more efficient way to do this?
Here is an example of my code:
for (int j = 3; j <= PipeSheet.LastRowUsed().RowNumber(); j ++)
{
if ((PipeSheet.Cell(j, ProdCodeColumnInPipe).Value.ToString().Trim() == SheetToEdit.Cell(i, ProdCodeColumnInMain).Value.ToString().Trim() & PipeSheet.Cell(j, 3).Value.ToString().Trim() == SheetToEdit.Cell(i, RegionCodeInMain).Value.ToString().Trim()))
{
SheetToEdit.Cell(i, ColumnToEdit).Value = "+";
if ((new[] { "Open", "Under Review" }).Contains(PipeSheet.Cell(j, 5).Value.ToString().Trim()) & (new[] { "Forecast"}).Contains(PipeSheet.Cell(j, 4).Value.ToString().Trim()))
{
if (FirstColumnHighlight > 1 & LastColumnHighlight > 1)
{
for (int k = FirstColumnHighlight; k <= LastColumnHighlight; k++)
{
SheetToEdit.Cell(i, k).Style.Fill.BackgroundColor = XLColor.FromArgb(255, 255, 0);
}
}
}
}
}
Firstly, your goal is best solved using conditional formatting.
But to answer your question, you can search for a string:
sheet.CellsUsed(cell => cell.GetString() == searchstring)
Reference: https://github.com/ClosedXML/ClosedXML/wiki/Better-lambdas
-- UPDATE --
There is a pull request at https://github.com/ClosedXML/ClosedXML/pull/399 to help with this, for example:
foundCells = ws.Search("searchText", CompareOptions.OrdinalIgnoreCase);

Getting the column totals in a 2D array but it always throws FormatException using C#

I am planning to get an array of the averages of each column.
But my app crashes at sum[j] += int.Parse(csvArray[i,j]); due to a FormatException. I have tried using Convert.ToDouble and Double.Parse but it still throws that exception.
The increments in the for loop start at 1 because Row 0 and Column 0 of the CSV array are strings (names and timestamps). For the divisor or total count of the fields that have values per column, I only count the fields that are not BLANK, hence the IF statement. I think I need help at handling the exception.
Below is the my existing for the method of getting the averages.
public void getColumnAverages(string filePath)
{
int col = colCount(filePath);
int row = rowCount(filePath);
string[,] csvArray = csvToArray(filePath);
int[] count = new int[col];
int[] sum = new int[col];
double[] average = new double[col];
for (int i = 1; i < row; i++)
{
for (int j = 1; j < col; j++)
{
if (csvArray[i,j] != " ")
{
sum[j] += int.Parse(csvArray[i,j]);
count[j]++;
}
}
}
for (int i = 0; i < average.Length; i++)
{
average[i] = (sum[i] + 0.0) / count[i];
}
foreach(double d in average)
{
System.Diagnostics.Debug.Write(d);
}
}
}
I have uploaded the CSV file that I use when I test the prototype. It has BLANK values on some columns. Was my existing IF statement unable to handle that case?
There are also entries like this 1.324556-e09due to the number of decimals I think. I guess I have to trim it in the csvToArray(filePath) method or are there other efficient ways? Thanks a million!
So there are a few problems with your code. The main reason for your format exception is that after looking at your CSV file your numbers are surrounded by quotes. Now I can't see from your code exactly how you convert your CSV file to an array but I'm guessing that you don't clear these out - I didn't when I first ran with your CSV and experienced the exact same error.
I then ran into an error because some of the values in your CSV are decimal, so the datatype int can't be used. I'm assuming that you still want the averages of these columns so in my slightly revised verion of your method I change the arrays used to be of type double.
AS #musefan suggested, I have also changed the check for empty places to use the IsNullOrWhiteSpace method.
Finally when you output your results you receive a NaN for the first value in the averages column, this is because when you don't take into account that you never populate the first position of your arrays so as not to process the string values. I'm unsure how you'd best like to correct this behaviour as I'm not sure of the intended purpose - this might be okay - so I've not made any changes to this for the moment, pop a mention in the comments if you want help on how to sort this!
So here is the updated method:
public static void getColumnAverages(string filePath)
{
// Differs from the current implementation, reads a file in as text and
// splits by a defined delim into an array
var filePaths = #"C:\test.csv";
var csvArray = File.ReadLines(filePaths).Select(x => x.Split(',')).ToArray();
// Differs from the current implementation
var col = csvArray[0].Length;
var row = csvArray.Length;
// Update variables to use doubles
double[] count = new double[col];
double[] sum = new double[col];
double[] average = new double[col];
Console.WriteLine("Started");
for (int i = 1; i < row; i++)
{
for (int j = 1; j < col; j++)
{
// Remove the quotes from your array
var current = csvArray[i][j].Replace("\"", "");
// Added the Method IsNullOrWhiteSpace
if (!string.IsNullOrWhiteSpace(current))
{
// Parse as double not int to account for dec. values
sum[j] += double.Parse(current);
count[j]++;
}
}
}
for (int i = 0; i < average.Length; i++)
{
average[i] = (sum[i] + 0.0) / count[i];
}
foreach (double d in average)
{
System.Diagnostics.Debug.Write(d + "\n");
}
}

How to calculate current value on object based on previous conditions?

I am looking for optimization for the following.
I have an object that has a few properties. For simplicity i will show the only 2 that matter in this scenario.
public class DomainData
{
public long Id{ get; set; }
public long SoldTickets { get; set; }
}
I make a call to my db and get a list of the above object:
var databaseData = _iOvervatchManager.GetDomainData(id);// gets date from db
var model = new List<DomainModel>();
Now I need to calculate how many tickets have been sold between list items.
current = current - previous
I do the calculation with the following code:
for (int i = 0; i <= databaseData.Count() -1; i++)
{
if (i != 0)
{
long ticket = databaseData[i].SoldTickets- databaseData[i - 1].SoldTickets;
model.Add(new DomainModel
{
DomainId = databaseData[i].DomainId ,
SoldTicketsLastUpdate = ticket
});
}
}
And now set the value on the original object.
for(int i= 0; i< databaseData.Count(); i++)
{
if(databaseData[i].Id == model[i].DomainId )
databaseData[i].SoldTickets = model[i].SoldTicketsLastUpdate;
}
I consider this an awful way of accomplishing this since I recon I can do this in the first loop but I can't figure out how.
my first attempt was :
for (int i = 0; i <= databaseData.Count() -1; i++)
{
if (i != 0)
{
databaseData[i].SoldTickets
= databaseData[i].SoldTickets - databaseData[i - 1].SoldTickets;
}
}
But this was wrong since i change the object value and when I come to the next increment the previous value has been modified so the current for that loop increment gets a wrong calculation.
How can I optimize this?
Assuming that no new data will be added after this runs (!!), execute the for-loop backwards:
for (int i = databaseData.Count() - 1; i > 0; i--)
{
databaseData[i].SoldTickets = databaseData[i].SoldTickets - databaseData[i - 1].SoldTickets;
}
And because the loop only runs for i > 0, the check for i != 0 is unneeded.

C# Sorting, return multiple text file string entries line at a time

I have a C# console window program and I am trying to sort "File3" (contains numbers) in ascending and output lines from 3 text files.
So the outcome looks something like this:
===========================================================================
field1.....................field2.....................field3
===========================================================================
[FILE1_LINE1]..............[FILE2_LINE1]..............[FILE3_LINE1]
[FILE1_LINE2]..............[FILE2_LINE2]..............[FILE3_LINE2]
[FILE1_LINE3]..............[FILE2_LINE3]..............[FILE3_LINE3]
and so on...
At the moment, it kinda works I think but it duplicates the first two lines it seems. Could someone give an example of better coding please?
Here is the code that I have atm:
string[] File1 = System.IO.File.ReadAllLines(#"FILE1.txt");
string[] File2 = System.IO.File.ReadAllLines(#"FILE2.txt");
string[] File3 = System.IO.File.ReadAllLines(#"FILE3.txt");
decimal[] File3_1 = new decimal[File3.Length];
for(int i=0; i<File3.Length; i++)
{
File3_1[i] = decimal.Parse(File3[i]);
}
decimal[] File3_2 = new decimal[File3.Length];
for(int i=0; i<File3.Length; i++)
{
File3_2[i] = decimal.Parse(File3[i]);
}
decimal number = 0;
for (double i = 0.00; i < File3_1.Length; i++)
{
for (int sort = 0; sort < File3_1.Length - 1; sort++)
{
if (File3_1[sort] > File3_1[sort + 1])
{
number = File3_1[sort + 1];
File3_1[sort + 1] = File3_1[sort];
File3_1[sort] = number;
}
}
}
if (SortChoice2 == 1)
{
for (int y = 0; y < File3_2.Length; y++)
{
for (int s = 0; s < File3_2.Length; s++)
{
if (File3_1[y] == File3_2[s])
{
Console.WriteLine(File1[s] + File2[s] + File3_1[y]);
}
}
}
}
Just for more info, most of this code was used for another program and worked but in my new program, this doesn't as I've said above - ("it repeats a couple of lines for some reason"). I'm kinda an amateur/ rookie at C# so I only get stuff like this to work with examples.
Thanks in advance :)
Ok, if I understand correctly, what you are trying to do is read the lines from 3 different files, each of them representing a different "field" in a table. You then want to sort this table based on the value of one of the field (in you code, this seems to be the field which values are contained in File3. Well, if I got that right, here's what I suggest you do:
// Read data from files
List<string> inputFileNames = new List<string> {"File1.txt", "File2.txt", "File3.txt"};
decimal[][] fieldValues = new decimal[inputFileNames.Count][];
for (int i = 0; i < inputFileNames.Count; i++)
{
string currentInputfileName = inputFileNames[i];
string[] currentInputFileLines = File.ReadAllLines(currentInputfileName);
fieldValues[i] = new decimal[currentInputFileLines.Length];
for (int j = 0; j < currentInputFileLines.Length; j++)
{
fieldValues[i][j] = decimal.Parse(currentInputFileLines[j]);
}
}
// Create table
DataTable table = new DataTable();
DataColumn field1Column = table.Columns.Add("field1", typeof (decimal));
DataColumn field2Column = table.Columns.Add("field2", typeof (decimal));
DataColumn field3Column = table.Columns.Add("field3", typeof (decimal));
for (int i = 0; i < fieldValues[0].Length; i++)
{
var newTableRow = table.NewRow();
newTableRow[field1Column.ColumnName] = fieldValues[0][i];
newTableRow[field2Column.ColumnName] = fieldValues[1][i];
newTableRow[field3Column.ColumnName] = fieldValues[2][i];
table.Rows.Add(newTableRow);
}
// Sorting
table.DefaultView.Sort = field1Column.ColumnName;
// Output
foreach (DataRow row in table.DefaultView.ToTable().Rows)
{
foreach (var item in row.ItemArray)
{
Console.Write(item + " ");
}
Console.WriteLine();
}
Now, I tried to keep the code above as LINQ free as I could, since you do not seem to be using it in your example, and therefore might not know about it. That being said, while there is a thousand way to do I/O in C#, LINQ would help you a lot in this instance (and in pretty much any other situation really), so I suggest you look it up if you don't know about it already.
Also, the DataTable option I proposed is just to provide a way for you to visualize and organize the data in a more efficient way. That being said, you are in no way obliged to use a DataTable: you could stay with a more direct approach and use more common data structures (such as lists, arrays or even dictionaries if you know what they are) to store the data, depending on your needs. It's just that with a DataTable, you don't, for example, need to do the sorting yourself, or deal with columns indexed only by integers. With time, you'll come to learn about the myriad of useful data structure and native functionalities the C# language offers you and how they can save you doing the work yourself in a lot of cases.

Categories