C#: sorting and outputting entries from multiple text files, one line at a time

I have a C# console program and I am trying to sort "File3" (which contains numbers) in ascending order and output the corresponding lines from all 3 text files.
So the outcome looks something like this:
===========================================================================
field1.....................field2.....................field3
===========================================================================
[FILE1_LINE1]..............[FILE2_LINE1]..............[FILE3_LINE1]
[FILE1_LINE2]..............[FILE2_LINE2]..............[FILE3_LINE2]
[FILE1_LINE3]..............[FILE2_LINE3]..............[FILE3_LINE3]
and so on...
At the moment it sort of works, but it seems to duplicate the first two lines. Could someone give an example of better code, please?
Here is the code that I have at the moment:
string[] File1 = System.IO.File.ReadAllLines(@"FILE1.txt");
string[] File2 = System.IO.File.ReadAllLines(@"FILE2.txt");
string[] File3 = System.IO.File.ReadAllLines(@"FILE3.txt");
decimal[] File3_1 = new decimal[File3.Length];
for(int i=0; i<File3.Length; i++)
{
File3_1[i] = decimal.Parse(File3[i]);
}
decimal[] File3_2 = new decimal[File3.Length];
for(int i=0; i<File3.Length; i++)
{
File3_2[i] = decimal.Parse(File3[i]);
}
decimal number = 0;
for (double i = 0.00; i < File3_1.Length; i++)
{
for (int sort = 0; sort < File3_1.Length - 1; sort++)
{
if (File3_1[sort] > File3_1[sort + 1])
{
number = File3_1[sort + 1];
File3_1[sort + 1] = File3_1[sort];
File3_1[sort] = number;
}
}
}
if (SortChoice2 == 1)
{
for (int y = 0; y < File3_2.Length; y++)
{
for (int s = 0; s < File3_2.Length; s++)
{
if (File3_1[y] == File3_2[s])
{
Console.WriteLine(File1[s] + File2[s] + File3_1[y]);
}
}
}
}
Just for more info: most of this code was used for another program and worked, but in my new program it doesn't, as I've said above (it repeats a couple of lines for some reason). I'm kind of an amateur/rookie at C#, so I only get stuff like this to work from examples.
Thanks in advance :)

Ok, if I understand correctly, what you are trying to do is read the lines from 3 different files, each of them representing a different "field" in a table. You then want to sort this table based on the value of one of the fields (in your code, this seems to be the field whose values are contained in File3). Well, if I got that right, here's what I suggest you do:
// Read data from files
List<string> inputFileNames = new List<string> {"File1.txt", "File2.txt", "File3.txt"};
decimal[][] fieldValues = new decimal[inputFileNames.Count][];
for (int i = 0; i < inputFileNames.Count; i++)
{
string currentInputfileName = inputFileNames[i];
string[] currentInputFileLines = File.ReadAllLines(currentInputfileName);
fieldValues[i] = new decimal[currentInputFileLines.Length];
for (int j = 0; j < currentInputFileLines.Length; j++)
{
fieldValues[i][j] = decimal.Parse(currentInputFileLines[j]);
}
}
// Create table
DataTable table = new DataTable();
DataColumn field1Column = table.Columns.Add("field1", typeof (decimal));
DataColumn field2Column = table.Columns.Add("field2", typeof (decimal));
DataColumn field3Column = table.Columns.Add("field3", typeof (decimal));
for (int i = 0; i < fieldValues[0].Length; i++)
{
var newTableRow = table.NewRow();
newTableRow[field1Column.ColumnName] = fieldValues[0][i];
newTableRow[field2Column.ColumnName] = fieldValues[1][i];
newTableRow[field3Column.ColumnName] = fieldValues[2][i];
table.Rows.Add(newTableRow);
}
// Sorting
table.DefaultView.Sort = field3Column.ColumnName;
// Output
foreach (DataRow row in table.DefaultView.ToTable().Rows)
{
foreach (var item in row.ItemArray)
{
Console.Write(item + " ");
}
Console.WriteLine();
}
Now, I tried to keep the code above as LINQ-free as I could, since you do not seem to be using it in your example and therefore might not know about it. That being said, while there are a thousand ways to do I/O in C#, LINQ would help you a lot in this instance (and in pretty much any other situation, really), so I suggest you look it up if you don't know about it already.
Also, the DataTable option I proposed is just a way for you to visualize and organize the data more efficiently. That being said, you are in no way obliged to use a DataTable: you could stay with a more direct approach and use more common data structures (such as lists, arrays, or even dictionaries if you know what they are) to store the data, depending on your needs. It's just that with a DataTable you don't, for example, need to do the sorting yourself or deal with columns indexed only by integers. With time, you'll come to learn about the myriad of useful data structures and native functionality C# offers you, and how they can save you doing the work yourself in a lot of cases.
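For what it's worth, here is a minimal sketch of the LINQ route mentioned above (the inline arrays stand in for the question's ReadAllLines calls; names are illustrative only). Sorting the row indices by the File3 value, instead of sorting a copy of the values and matching them back with nested loops, also avoids the duplicated lines: equal values in File3 can no longer match the same row twice.

```csharp
using System;
using System.Linq;

static class SortedPrinter
{
    // Returns the combined lines of three parallel "column" files,
    // ordered by the decimal value in the third file.
    // Sketch only: assumes the files are line-aligned and File3 parses as decimal.
    public static string[] SortedRows(string[] file1, string[] file2, string[] file3)
    {
        return Enumerable.Range(0, file3.Length)
            .OrderBy(i => decimal.Parse(file3[i]))        // sort indices by field3
            .Select(i => $"{file1[i]} {file2[i]} {file3[i]}")
            .ToArray();
    }
}
```

Usage would be something like `foreach (var line in SortedPrinter.SortedRows(File1, File2, File3)) Console.WriteLine(line);` after the three ReadAllLines calls.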

Related

Rearranging a datatable

I'm importing an excel file into a datatable (dtImport) and rearranging that data into another datatable (dtImportParsed).
Here's what that datatable (dtImport) looks like when I first import it.
And this is how I'm trying to rearrange that datatable (dtImportParsed):
I'm currently accomplishing this by using some nested for loops, but this takes a very long time. For example, a sheet with 36 columns and 4,000 rows takes about 30-40 minutes to complete. Is there an alternative method of accomplishing this that would speed things up?
Here's my code:
for (int c = 2; c < dtImport.Columns.Count; c++) //for each date column
{
for (int r = 1; r < dtImport.Rows.Count; r++)
{
if (dtImportParsed.Rows.Count == 0)
{
DataRow dataRowImport = dtImportParsed.NewRow();
dataRowImport["Date"] = dtImport.Columns[c].ColumnName.ToString().Trim();
dataRowImport["account_id"] = dtImport.Rows[r]["account_id"].ToString().Trim();
dataRowImport[dtImport.Rows[r]["Event Name"].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
dtImportParsed.Rows.Add(dataRowImport);
}
else
{
for (int i = 0; i < dtImportParsed.Rows.Count; i++)
{
if (dtImportParsed.Rows[i]["account_id"].ToString() == dtImport.Rows[r]["account_id"].ToString())
{
if (dtImportParsed.Rows[i]["Date"].ToString() == dtImport.Columns[c].ColumnName.ToString())
{
dtImportParsed.Rows[i][dtImport.Rows[r]["Event Name"].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
break;
}
}
else if (i == dtImportParsed.Rows.Count - 1)
{
DataRow dataRowImport = dtImportParsed.NewRow();
dataRowImport["Date"] = dtImport.Columns[c].ColumnName.ToString().Trim();
dataRowImport["account_id"] = dtImport.Rows[r]["account_id"].ToString().Trim();
dataRowImport[dtImport.Rows[r]["Event Name"].ToString().Trim()] = dtImport.Rows[r][c].ToString().Trim();
dtImportParsed.Rows.Add(dataRowImport);
}
}
}
}
}
The algorithm you use to generate your expected result is too expensive! It runs in the order of (c × r × i) operations, where i > r because empty fields are injected into the final table; effectively it is an O(n³) algorithm! You also perform it on DataTables by iterating DataRows, which is probably not efficient for your requirement.
If your source data set is not large (as you mentioned) and you have no memory restriction, I propose arranging the expected data set in memory using index-based data structures. Something like this:
var arrangeDragon = new Dictionary<string, Dictionary<string, Dictionary<string, string>>>();
The dragon enters! And eats the inner for loop.
for (int c = 2; c < dtImport.Columns.Count; c++) //for each date column
{
for (int r = 1; r < dtImport.Rows.Count; r++)
{
// ...
// instead of: for (int i = 0; i < dtImportParsed.Rows.Count; i++) ...
string date = dtImport.Columns[c].ColumnName.ToString().Trim();
string accountId = dtImport.Rows[r]["account_id"].ToString();
string eventName = dtImport.Rows[r]["Event Name"].ToString().Trim();
if (!arrangeDragon.ContainsKey(date))
arrangeDragon.Add(date, new Dictionary<string, Dictionary<string, string>>());
if (!arrangeDragon[date].ContainsKey(accountId))
arrangeDragon[date][accountId] = new Dictionary<string, string>();
if (!arrangeDragon[date][accountId].ContainsKey(eventName))
arrangeDragon[date][accountId][eventName] = dtImport.Rows[r][c].ToString().Trim();
// ...
}
}
These checks execute in O(1) instead of O(i), so the total overhead decreases to O(n²), which is the nature of iterating a table :)
Retrieval is also O(1):
string data_field = arrangeDragon["1/1/2022"]["account1"]["Event1"];
Assert.AreEqual(data_field, "42");
Now you can iterate the nested dictionaries once and build dtImportParsed.
If your data set is large or host memory is low, you'll need other solutions, which, as you mentioned, is not your problem here ;)
Good luck
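As a hedged sketch of that final step (column names follow the question; the event-name list is passed in here, whereas real code would collect it from dtImport's rows):

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class DragonFlattener
{
    // Flattens arrangeDragon[date][accountId][eventName] = value into a table
    // with one row per (date, account) pair and one column per event name.
    public static DataTable Flatten(
        Dictionary<string, Dictionary<string, Dictionary<string, string>>> dragon,
        IEnumerable<string> eventNames)
    {
        var table = new DataTable();
        table.Columns.Add("Date", typeof(string));
        table.Columns.Add("account_id", typeof(string));
        foreach (var e in eventNames)
            table.Columns.Add(e, typeof(string));

        foreach (var dateEntry in dragon)
            foreach (var accountEntry in dateEntry.Value)
            {
                var row = table.NewRow();
                row["Date"] = dateEntry.Key;
                row["account_id"] = accountEntry.Key;
                foreach (var eventEntry in accountEntry.Value)
                    row[eventEntry.Key] = eventEntry.Value;   // O(1) dictionary reads
                table.Rows.Add(row);
            }
        return table;
    }
}
```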

Reading CSV of unknown number of rows/columns into Unity array

I want a 2D array generated from a CSV file with an unknown number of rows/columns. The column count is fixed based on the header data. I need to be able to process it as a grid, going both across rows and down columns, hence needing an array.
At the moment, I can split the data into rows, then split each row into components. I then add each row to a list. This all seems to work fine.
What I can't do is convert a list of string arrays into a 2D array.
It currently fails on the line string[,] newCSV = csvFile.ToArray(); with the error Cannot implicitly convert type 'string[][]' to 'string[*,*]', so I'm obviously not declaring something properly - I've just no idea what!
List<string[]> csvFile = new List<string[]>();
void Start()
{
// TODO: file picker
TextAsset sourceData = Resources.Load<TextAsset>("CSVData");
if (sourceData != null)
{
// Each piece of data in a CSV has a newline at the end
// Split the base data into an array each time the newline char is found
string[] data = sourceData.text.Split(new char[] {'\n'} );
for (int i = 0; i < data.Length; i ++)
{
string[] row = data[i].Split(new char[] {','} );
Debug.Log(row[0] + " " + row[1]);
csvFile.Add(row);
}
string[,] newCSV = csvFile.ToArray();
} else {
Debug.Log("Can't open source file");
}
Since your data is in the form of a table, I highly suggest using a DataTable instead of the 2D array you're currently using to model/hold the data from your CSV.
There's a ton of pre-baked functionality that comes with this data structure that will make working with your data much easier.
If you take this route, you could then also use this, which will copy CSV data into a DataTable, using the structure of your CSV data to create the DataTable.
It's very easy to configure and use.
Just a small tip: you should always try to use the data structures that best fit your task, whenever possible. Think of the data structures and algorithms you use as tools for building a house; while you could certainly use a screwdriver to pound in a nail, it's much easier and more efficient to use a hammer.
You can use this function to get a 2D array.
static public string[,] SplitCsvGrid(string csvText)
{
string[] lines = csvText.Split("\n"[0]);
// finds the max width of row
int width = 0;
for (int i = 0; i < lines.Length; i++)
{
string[] row = SplitCsvLine(lines[i]);
width = Mathf.Max(width, row.Length);
}
// creates new 2D string grid to output to
string[,] outputGrid = new string[width + 1, lines.Length + 1];
for (int y = 0; y < lines.Length; y++)
{
string[] row = SplitCsvLine(lines[y]);
for (int x = 0; x < row.Length; x++)
{
outputGrid[x, y] = row[x];
// This line was to replace "" with " in my output.
// Include or edit it as you wish.
outputGrid[x, y] = outputGrid[x, y].Replace("\"\"", "\"");
}
}
return outputGrid;
}
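If all you need is to fix the compile error on csvFile.ToArray(), a plain copy loop turns the jagged List<string[]> into the rectangular string[,] the declaration expects. A sketch only: padding ragged rows with null is an assumption here, since CSV rows can differ in length.

```csharp
using System;
using System.Collections.Generic;

static class Csv2D
{
    // Copies a jagged list of rows into a rectangular 2D array.
    public static string[,] ToRectangular(List<string[]> rows)
    {
        int height = rows.Count;
        int width = 0;
        foreach (var row in rows)
            width = Math.Max(width, row.Length);   // widest row wins

        var grid = new string[height, width];      // short rows leave null cells
        for (int y = 0; y < height; y++)
            for (int x = 0; x < rows[y].Length; x++)
                grid[y, x] = rows[y][x];
        return grid;
    }
}
```

In the question's Start method, the failing line would become `string[,] newCSV = Csv2D.ToRectangular(csvFile);`.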

Getting the column totals in a 2D array but it always throws FormatException using C#

I am planning to get an array of the averages of each column.
But my app crashes at sum[j] += int.Parse(csvArray[i,j]); due to a FormatException. I have tried using Convert.ToDouble and Double.Parse, but it still throws that exception.
The increments in the for loops start at 1 because row 0 and column 0 of the CSV array are strings (names and timestamps). For the divisor, the total count of fields that have values per column, I only count the fields that are not blank, hence the if statement. I think I need help handling the exception.
Below is my existing code for the method that gets the averages.
public void getColumnAverages(string filePath)
{
int col = colCount(filePath);
int row = rowCount(filePath);
string[,] csvArray = csvToArray(filePath);
int[] count = new int[col];
int[] sum = new int[col];
double[] average = new double[col];
for (int i = 1; i < row; i++)
{
for (int j = 1; j < col; j++)
{
if (csvArray[i,j] != " ")
{
sum[j] += int.Parse(csvArray[i,j]);
count[j]++;
}
}
}
for (int i = 0; i < average.Length; i++)
{
average[i] = (sum[i] + 0.0) / count[i];
}
foreach(double d in average)
{
System.Diagnostics.Debug.Write(d);
}
}
}
I have uploaded the CSV file that I use when I test the prototype. It has blank values in some columns. Was my existing if statement unable to handle that case?
There are also entries like 1.324556e-09, due to the number of decimals I think. I guess I have to trim them in the csvToArray(filePath) method, or are there more efficient ways? Thanks a million!
So there are a few problems with your code. The main reason for your FormatException is that, looking at your CSV file, your numbers are surrounded by quotes. Now, I can't see from your code exactly how you convert your CSV file to an array, but I'm guessing that you don't clear these out - I didn't when I first ran with your CSV, and I experienced the exact same error.
I then ran into an error because some of the values in your CSV are decimal, so the datatype int can't be used. I'm assuming that you still want the averages of these columns, so in my slightly revised version of your method I changed the arrays to type double.
As @musefan suggested, I have also changed the check for empty places to use the IsNullOrWhiteSpace method.
Finally, when you output your results you receive a NaN for the first value in the averages column. This is because you never populate the first position of your arrays (so as not to process the string values). I'm unsure how best to correct this behaviour as I'm not sure of the intended purpose - this might be okay - so I've not made any changes to it for the moment; pop a mention in the comments if you want help sorting it!
So here is the updated method:
public static void getColumnAverages(string filePath)
{
// Differs from the current implementation, reads a file in as text and
// splits by a defined delim into an array
var filePaths = @"C:\test.csv";
var csvArray = File.ReadLines(filePaths).Select(x => x.Split(',')).ToArray();
// Differs from the current implementation
var col = csvArray[0].Length;
var row = csvArray.Length;
// Update variables to use doubles
double[] count = new double[col];
double[] sum = new double[col];
double[] average = new double[col];
Console.WriteLine("Started");
for (int i = 1; i < row; i++)
{
for (int j = 1; j < col; j++)
{
// Remove the quotes from your array
var current = csvArray[i][j].Replace("\"", "");
// Added the Method IsNullOrWhiteSpace
if (!string.IsNullOrWhiteSpace(current))
{
// Parse as double not int to account for dec. values
sum[j] += double.Parse(current);
count[j]++;
}
}
}
for (int i = 0; i < average.Length; i++)
{
average[i] = (sum[i] + 0.0) / count[i];
}
foreach (double d in average)
{
System.Diagnostics.Debug.Write(d + "\n");
}
}
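On the 1.324556e-09-style entries the question mentions: no trimming is needed, since double.Parse understands scientific notation out of the box. The usual pitfall is instead the decimal separator, which depends on the thread culture, so pinning a culture is safer. A hedged sketch (ParseCell is a hypothetical helper name, and assuming '.' is the file's decimal separator) that also handles the quotes and blanks discussed above:

```csharp
using System;
using System.Globalization;

static class ParseDemo
{
    // Parses one CSV cell that may be quoted, blank, or in scientific
    // notation (e.g. "1.324556e-09"). Returns null for blanks.
    public static double? ParseCell(string cell)
    {
        var cleaned = cell?.Replace("\"", "").Trim();
        if (string.IsNullOrWhiteSpace(cleaned))
            return null;
        // NumberStyles.Float permits a sign, a decimal point, and an exponent.
        return double.Parse(cleaned, NumberStyles.Float, CultureInfo.InvariantCulture);
    }
}
```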

Summarize table using C# and JScript

I've written code in C# to summarize values from InvoiceTable and move those summarized values to GroupTable in ABBYY FlexiCapture. The software is comparatively new and does not show any errors when I run it.
There are two sets of code to be written:
In TechField.
In EventHandlers.
InvoiceTable consists of:
TariffNumber
ShipQty
Amount
COO
GroupTable consists of:
HSCode
Qty
Amt
CountryOO
EventHandlers code is as follows (C#):
if (ChangedStates.Has(7)) {
int currentRow = 0;
int i;
for (i = 0; i < Document.Field("Invoice2\\InvoiceTable").Items.Count; i++) {
if (Document.Field("Invoice2\\InvoiceTable").Cell("TariffNumber", i).Value == "") {
Document.Field("Invoice2\\GroupTable").Cell("HSCode", currentRow).Value = Document.Field("Invoice2\\InvoiceTable").Cell("TariffNumber", i).Value;
Document.Field("Invoice2\\GroupTable").Cell("Amt", currentRow).Value = Document.Field("Invoice2\\InvoiceTable").Cell("Amount", i).Value;
Document.Field("Invoice2\\GroupTable").Cell("Qty", currentRow).Value = Document.Field("Invoice2\\InvoiceTable").Cell("ShipQty", i).Value;
currentRow++;
}
}
}
TechField is as follows (JScript):
for (i = 0; i < Field("ShipQty").Items.Count - 1; i++) {
for (j = i + 1; j < Field("ShipQty").Items.Count; j++) {
// if same new items are found
if (Field("TariffNumber").Items(i).Value == Field("TariffNumber").Items(j).Value && Field("CoO").Items(i).Value == Field("CoO").Items(j).Value)
{
// summarise quantities
Field("ShipQty").Items(i).Value = parseInt(Field("ShipQty").Items(i).Value) + parseInt(Field("ShipQty").Items(j).Value);
// and weights
Field("Amount").Items(i).Value = parseFloat(Field("Amount").Items(i).Value) + parseFloat(Field("Amount").Items(j).Value);
}
}
}
Condition:
In the InvoiceTable, where ever TariffNumber and COO are equal, values of ShipQty and Amount should be summarized and put into GroupTable.
The code does not show any errors, but it does not give the output either. It would be great if any of you could help me out with this.
One thing you could try is adding a summary section to the document definition. This may require you to create a document set.
I've found it easier to create rules which are checked when the field is recognised. There's some info here: https://help.abbyy.com/en-us/flexicapture/12/distributed_administrator/docsets_settings/

How do I write data column by column to a CSV file

I have a big problem with writing some data to a CSV file. I have a lot of measurement values, and every value is described by a name, a unit, and a value, so I want to build a column with these three properties for every value.
I want to store it in the CSV file like this:
Width   Cell Wall Thickness   Coarseness   Curl-Index   etc.
mm      mm                    mg/m         %            etc.
16,2    3,2                   0,000        11,7         etc.
Until now I was coding a header for the names, another for the units, and then just writing the values (previously stored in a string array) in one line.
So far my CSV file looks like this:
Width;Cell Wall Thickness;Coarseness;Curl-Index;etc.
mm;mm;mg/m;%;etc.
16,2;3,2;0,000;11,7;etc.
If there weren't many values I wouldn't care, but there are a lot, so when I open the CSV file the headers don't line up with the values and units. It's not organized; I cannot match the values to the headers.
I would like everything to be organized in columns. Any help would be strongly appreciated!
That's the code that i have till now:
StreamWriter sw = new StreamWriter("test2.csv");
int RowCount = 3;
int ColumnCount = 4;
string[][] Transfer_2D = new string[RowCount][];
Transfer_2D[0] = new string[4] {"Width", "CWT", "Coarseness", "Curl-Index"};//name of the values
Transfer_2D[1] = new string[4] {"mm", "mm", "mg/m", "%"}; //units
Transfer_2D[2] = new string[4] { TransferData[0], TransferData[1], TransferData[2], TransferData[3] };
for (int i = 0; i < RowCount; i++)
{
for (int j = 0; j < ColumnCount; j++)
{
sw.Write(Transfer_2D[i][j]);//write one row separated by columns
if (j < ColumnCount)
{
sw.Write(";");//use ; as separator between columns
}
}
if (i < RowCount)
{
sw.Write("\n");//use \n to separate between rows
}
}
sw.Close();
}
You can pad each string to a fixed length.
For an example, see .NET Format a string with fixed spaces:
int iWantedStringLength = 20;
string sFormatInstruction = "{0,-" + iWantedStringLength.ToString() + "}";
for (int i = 0; i < RowCount; i++)
{
for (int j = 0; j < ColumnCount; j++)
{
sw.Write(String.Format(sFormatInstruction, Transfer_2D[i][j]));//write one row separated by columns
if (j < ColumnCount)
{
sw.Write(";");//use ; as separator between columns
}
}
if (i < RowCount)
{
sw.Write("\n");//use \n to separate between rows
}
}
For CSV work I use http://joshclose.github.io/CsvHelper/, which is a very nice helper library, but it would require you to change your approach a little bit.
I would advise you to create a class to store each entry in, and then create a "mapper" to map it to CSV fields. Then you can just pass a collection of objects and your mapping class to the helper, and it produces a structured CSV.
There are a lot of examples on that page, so it should be straightforward for you to work through.
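If the CsvHelper dependency is unwanted, the same idea - a class per measurement plus an explicit field order - is only a few lines by hand. A sketch under assumed names (Measurement and Write are illustrative, not CsvHelper's API), using the question's ';' separator:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// One measurement column: name, unit, value (mirrors the question's triples).
public class Measurement
{
    public string Name, Unit, Value;
    public Measurement(string name, string unit, string value)
    {
        Name = name; Unit = unit; Value = value;
    }
}

public static class MeasurementCsv
{
    // Emits three ';'-separated rows - names, units, values - building each
    // row from the same list, so column k always belongs to measurement k.
    public static string Write(IEnumerable<Measurement> measurements)
    {
        var list = measurements.ToList();
        var sw = new StringWriter();
        sw.WriteLine(string.Join(";", list.Select(m => m.Name)));
        sw.WriteLine(string.Join(";", list.Select(m => m.Unit)));
        sw.WriteLine(string.Join(";", list.Select(m => m.Value)));
        return sw.ToString();
    }
}
```

Because every row is projected from the same list in the same order, the columns can never drift out of alignment the way hand-written header and value loops can.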
