I have one DataTable that I need to split by rows into multiple DataTables with the same structure.
The way I need to split the tables is as follows:
if I have a table with 40 rows, each new table can have at most 17 rows. So the first DataTable should contain rows 1-17, the second rows 18-34, and the third rows 35-40. Finally, I would add these tables to the DataSet.
I tried creating copies of the table and deleting rows by index, but that didn't work.
You can use table.AsEnumerable(), with Skip(startRowIndex) for the index of the first row and Take(size) for the size of each table:
var t1 = table.AsEnumerable().Skip(0).Take(17).CopyToDataTable();
var t2 = table.AsEnumerable().Skip(17).Take(17).CopyToDataTable();
...
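If you don't want to write one Skip/Take line per chunk, the same idea can be put in a loop. This is a minimal sketch; the TableChunker class and the SplitWithSkipTake name are made up for illustration. It assumes System.Data.DataSetExtensions is available (it provides AsEnumerable and CopyToDataTable). The loop stops as soon as no rows remain, because CopyToDataTable throws InvalidOperationException on an empty sequence, so an empty input table yields an empty list.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class TableChunker
{
    // Splits 'table' into consecutive chunks of at most 'chunkSize' rows
    // using Skip/Take, one chunk per loop iteration.
    public static List<DataTable> SplitWithSkipTake(DataTable table, int chunkSize)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var result = new List<DataTable>();
        // Looping only while rows remain keeps every chunk non-empty,
        // which CopyToDataTable requires.
        for (int start = 0; start < table.Rows.Count; start += chunkSize)
        {
            result.Add(table.AsEnumerable()
                            .Skip(start)
                            .Take(chunkSize)
                            .CopyToDataTable());
        }
        return result;
    }
}
```

With the 40-row table from the question and a chunk size of 17, this gives chunks of 17, 17 and 6 rows. Each Skip re-enumerates the rows from the start, so for very large tables an index-based copy loop is cheaper.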
A reusable method that also handles empty tables and tables with fewer rows than the split count:
public static IEnumerable<DataTable> SplitTable(DataTable table, int splitCount)
{
    if (table.Rows.Count <= splitCount)
    {
        yield return table.Copy(); // always create a new table
        yield break;
    }
    for (int i = 0; i + splitCount <= table.Rows.Count; i += splitCount)
    {
        yield return CreateCloneTable(table, i, splitCount);
    }
    int remaining = table.Rows.Count % splitCount;
    if (remaining > 0)
    {
        yield return CreateCloneTable(table, table.Rows.Count - remaining, splitCount);
    }
}
private static DataTable CreateCloneTable(DataTable mainTable, int startIndex, int length)
{
    DataTable tClone = mainTable.Clone(); // empty but same schema
    for (int r = startIndex; r < Math.Min(mainTable.Rows.Count, startIndex + length); r++)
    {
        tClone.ImportRow(mainTable.Rows[r]);
    }
    return tClone;
}
Here is your case with 40 rows and a split count of 17:
DataTable table = new DataTable();
table.Columns.Add();
for (int i = 1; i <= 40; i++) table.Rows.Add(i.ToString());
DataSet ds = new DataSet();
ds.Tables.AddRange(SplitTable(table, 17).ToArray());
Demo: https://dotnetfiddle.net/8XUBjH
I'm trying to read data from an Excel table, which this piece of code does perfectly.
However, it reads all columns from the table, and I don't need all of them; I only need specific columns from the Excel table.
int fabricHeaderRow = findFabricHeader(excelRange); //row number 26
int rows = excelRange.Rows.Count;
int cols = excelRange.Columns.Count;
myNewRow3 = null;
for (int i = fabricHeaderRow + 2; i <= rows; i++)
{
    myNewRow3 = bomTable.NewRow();
    if (excelRange.Cells[i, 1].Value2 != null) //Checks that the item column isn't empty
    {
        for (int j = 1; j <= cols; j++)
        {
            if (excelRange.Cells[i, j].Value2 == null)
            {
                myNewRow3[j - 1] = string.Empty; //inserts empty string into datagrid row[i] if the cell is empty
            }
            else
            {
                myNewRow3[j - 1] = excelRange.Cells[i, j].Value2.ToString();
            }
        }
        if (myNewRow3 != null)
        {
            bomTable.Rows.Add(myNewRow3); //adds a new row to datable if not null
        }
    }
    else
    {
        break; //break out of the outer for-loop as it reached an empty excel row
    }
}
Excel table example
For example, from the table I only need the ID, COLOR, DESCRIPTION, SUPPLIER and COST columns in my DataRow. I'm stuck trying to figure out how to read specific columns, across multiple rows, into a DataRow.
public string[] bomArr = { "ID", "COLOR", "DESCRIPTION", "SUPPLIER", "COST"};
//for loop here
if (bomArr.Any(x => excelRange.Cells[27, j].Value2.Equals(x)))
{
    myNewRow[currentCount] = excelRange.Cells[i, j].Value2.ToString(); //string
    currentCount++;
}
I'm thinking of trying this: basically, it only adds to the DataRow if a value in the array matches a column header. But I can only make it work for tables with a single row; multiple rows make it more complicated.
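One way to handle many rows is to resolve the wanted header names to column indexes once, before the row loop, and then read only those indexes for every row. The sketch below assumes the cells have already been pulled into an object[,] (for example with `object[,] values = (object[,])excelRange.Value2;`) instead of being read cell by cell through excelRange.Cells, and that data rows start directly below the header row. HeaderColumnReader and ReadColumns are hypothetical names; headers that are not found are simply skipped, and every value is stored as a string, like the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class HeaderColumnReader
{
    // Builds a DataTable holding only the 'wanted' columns of a 2-D cell array.
    // Bounds come from GetLowerBound/GetUpperBound, so both Excel's 1-based
    // Value2 arrays and ordinary 0-based arrays work.
    public static DataTable ReadColumns(object[,] cells, int headerRow, string[] wanted)
    {
        int firstCol = cells.GetLowerBound(1), lastCol = cells.GetUpperBound(1);
        int lastRow = cells.GetUpperBound(0);

        // Step 1: map each wanted header to its column index, once.
        var columnIndexes = new List<int>();
        var table = new DataTable();
        foreach (string name in wanted)
        {
            for (int c = firstCol; c <= lastCol; c++)
            {
                if (string.Equals(cells[headerRow, c]?.ToString(), name,
                                  StringComparison.OrdinalIgnoreCase))
                {
                    columnIndexes.Add(c);
                    table.Columns.Add(name); // string column, in 'wanted' order
                    break;
                }
            }
        }

        // Step 2: copy only those columns for each row, stopping at the first
        // row whose first cell is empty (the same stop rule as the question).
        for (int r = headerRow + 1; r <= lastRow; r++)
        {
            if (cells[r, firstCol] == null) break;
            DataRow row = table.NewRow();
            for (int k = 0; k < columnIndexes.Count; k++)
                row[k] = cells[r, columnIndexes[k]]?.ToString() ?? string.Empty;
            table.Rows.Add(row);
        }
        return table;
    }
}
```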
I'm stuck on some logic in my C# program, as I am new to C# programming.
I have a DataTable with duplicate header (column) names. I need to rename each duplicate header by concatenating the name of the preceding column, so that all headers are unique. The table names come in dynamically.
Current datatable dTable
ID |Name|Age| School | Name | state| Part| Country|Division|Part
Expected dTable
ID |Name|Age| School | Name+school | state| Part| Country|Division|Part+Division
What I have tried, and where I am blocked:
public DataTable RemoveDuplicateRows(DataTable dTable)
{
    string[] columnNames = dTable.Columns.Cast<DataColumn>().Select(x => x.ColumnName).ToArray();
    for (int i = 0; i < columnNames.Length; i++)
    {
        columnNames[i] = columnNames[i].Split('.')[0].Trim();
    }
    for (int i = 0; i < columnNames.Length; i++)
    {
        // create nested loop for compare current values with actual value of arr
        for (int j = i + 1; j < columnNames.Length; j++)
        {
            if (columnNames[i] == columnNames[j])
            {
                var previous = columnNames[i - 1];
                var current = columnNames[i];
                columnNames[i] = current + previous;
                // blocked here
                // only one header is concatenating
                // how can I add these newly edited columns to my datatable
            }
        }
    }
    return dTable; //can't get updated column headers
}
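A minimal sketch of the renaming step, assuming (as the Split('.') in the question suggests) that duplicates arrive with a suffix such as "Name.1", since a DataTable will not hold two columns with exactly the same name. The idea is to compute all new names first, checking each base name against every earlier one rather than comparing pairs, and then write them back through DataColumn.ColumnName, which is what actually changes the table. ColumnRenamer and MakeHeadersUnique are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class ColumnRenamer
{
    // Renames every repeated header (after its first occurrence) to
    // "<name><preceding column's name>".
    public static DataTable MakeHeadersUnique(DataTable table)
    {
        // Base names: strip anything after the first '.', as the question does.
        string[] baseNames = table.Columns.Cast<DataColumn>()
            .Select(c => c.ColumnName.Split('.')[0].Trim())
            .ToArray();

        var seen = new HashSet<string>(StringComparer.Ordinal);
        var newNames = new string[baseNames.Length];
        for (int i = 0; i < baseNames.Length; i++)
        {
            // HashSet.Add returns false when the base name appeared earlier,
            // which implies i > 0, so baseNames[i - 1] exists.
            newNames[i] = seen.Add(baseNames[i])
                ? baseNames[i]
                : baseNames[i] + baseNames[i - 1];
        }

        // Two passes: give every column a throwaway unique name first, so an
        // intermediate rename never clashes with a name that is about to change.
        for (int i = 0; i < newNames.Length; i++)
            table.Columns[i].ColumnName = "__tmp_" + Guid.NewGuid().ToString("N");
        for (int i = 0; i < newNames.Length; i++)
            table.Columns[i].ColumnName = newNames[i];

        return table;
    }
}
```

With the headers from the question, the second "Name" becomes "NameSchool" and the second "Part" becomes "PartDivision"; put a separator between the two parts if you prefer. If a generated name already exists, setting ColumnName throws DuplicateNameException.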
I have a program that I'm writing to extract certain data from various Excel spreadsheets.
The process so far is:
for each spreadsheet identified:
-read in the data as a multidimensional array using interop excel (Even though it is slow, it is the best choice due to all of the different file formats I need to read in)
Sample: object[,] cellValues = (object[,])range.Value2;
-Identify the columns that I actually need and what order I need them in. This is stored in a jagged array of bytes:
byte[][] targetColumns
-The jagged array is essentially (columnIndexFromSpreadsheet, preferredColumnOrder), e.g. if the first column in the spreadsheet should be read in as column 10, it would be (1, 10)
-I sort the jagged array by the preferred column order (that way I can just loop through the array in that order and extract those columns):
public static byte[][] SortTargetColumns(byte[][] targetColumns)
{
    return targetColumns.OrderBy(x => x.Skip(1).First()).ToArray();
}
-I then extract that column by creating an array from that column index of the multidimensional array. This is the method that is called:
public static object[] ExtractColumn(object[,] dataArray, byte columnIndex)
{
    return Enumerable.Range(ArrayIndexStart, dataArray.GetLength(0)).Select(x => dataArray[x, columnIndex]).ToArray();
}
Usage:
array = ExtractColumn(dataArray, (byte) colIndex);
Now I am trying to piece these extracted arrays back together into something readable. I need to do some manipulation on some of the columns and then write to a text file after consolidating. The problem is that I have no idea how to do this correctly. I have tried the following, but I keep getting a NullReferenceException:
// Get Row Count of dataArray
int rowCount = dataArray.GetLength(0);
// Create List to store extracted arrays
List<object[]> extractedDataList = new List<object[]>();
// Loop through target columns and extract the column as an array
for (byte colIndex = 1; colIndex <= targetColumns.Length + 1; colIndex++)
{
    object[] array = ExtractColumn(dataArray, (byte) colIndex);
    extractedDataList.Add(array);
}
// Create jagged array
object[][] extractedDataArray = new object[rowCount][];
for (int i = 0; i < extractedDataArray.GetLength(0); i++)
{
    List<object> row = new List<object>();
    for (int j = 0; j < extractedDataList.Count; j++)
    {
        row.Add(extractedDataList[j][i].ToString());
        //extractedDataArray[i][j] = extractedDataList[j][i].ToString(); <-- null reference
    }
    extractedDataArray[i] = row.ToArray();
}
I'm at a loss as to what else to try to put these column arrays back into a form that I can easily work with. Any tips or recommendations would be greatly appreciated.
Whenever you get confused like this, break the problem down into small pieces, and use meaningful names.
Let's say you have an array of columns, each of which has one element per row. That might be declared like this:
object[][] columns;
First, let's get the row and column counts:
var columnCount = columns.Length;
var rowCount = columns[0].Length;
Now write a small local function to accept a row and column index and return the right cell. In case not all of your columns have the same number of rows, you can include a boundary check and just return null if a cell isn't there.
object Getter(int row, int col)
{
    bool outOfBounds = (row >= columns[col].Length);
    return outOfBounds ? null : columns[col][row];
}
Now all we have to do is iterate over the rows to create the inner arrays:
object[][] target = new object[rowCount][];
for (int row = 0; row < rowCount; row++)
{
    target[row] = new object[columnCount];
}
And add in the code that uses the getter to populate the cells:
object[][] target = new object[rowCount][];
for (int row = 0; row < rowCount; row++)
{
    target[row] = new object[columnCount];
    for (int col = 0; col < columnCount; col++)
    {
        var cellValue = Getter(row, col);
        target[row][col] = cellValue; // index by col; columnCount is one past the end
    }
}
All together, it is simple to read:
var columnCount = columns.Length;
var rowCount = columns[0].Length;
object Getter(int row, int col)
{
    bool outOfBounds = (row >= columns[col].Length);
    return outOfBounds ? null : columns[col][row];
}
object[][] target = new object[rowCount][];
for (int row = 0; row < rowCount; row++)
{
    target[row] = new object[columnCount];
    for (int col = 0; col < columnCount; col++)
    {
        var cellValue = Getter(row, col);
        target[row][col] = cellValue;
    }
}
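In the question's code, the commented-out line fails because extractedDataArray[i] has not been allocated yet, and row.Add(...ToString()) fails on empty cells, which Value2 returns as null. Since the data already sits in the object[,] from range.Value2, another option is to skip the per-column arrays and build each row directly. This sketch assumes orderedColumns holds column indexes in the array's own index space (1-based for Excel), already sorted into the preferred order, for example taken from the first element of each pair after SortTargetColumns. RowBuilder and BuildRows are hypothetical names.

```csharp
public static class RowBuilder
{
    // Builds one object[] per data row directly from a 2-D array such as the
    // one returned by range.Value2, taking only the listed columns in order.
    public static object[][] BuildRows(object[,] dataArray, int[] orderedColumns)
    {
        int firstRow = dataArray.GetLowerBound(0); // 1 for Excel's Value2 arrays
        int rowCount = dataArray.GetLength(0);
        var rows = new object[rowCount][];
        for (int r = 0; r < rowCount; r++)
        {
            // Allocate the inner array before writing into it.
            rows[r] = new object[orderedColumns.Length];
            for (int c = 0; c < orderedColumns.Length; c++)
            {
                // Empty cells stay null here; convert them only when writing
                // out, e.g. with value?.ToString() ?? "".
                rows[r][c] = dataArray[firstRow + r, orderedColumns[c]];
            }
        }
        return rows;
    }
}
```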
I am finding the index of a cell in a Word table with a for loop, which takes a lot of time for bigger tables. Is there any way to do this without a for loop?
public static int[] GetColumnIndex(Xceed.Words.NET.Table table, string columnName, int endRow, int k)
{
    int[] data = { -1, -1 };
    for (int j = k; j < endRow; j++)
    {
        for (int i = 0; i < table.Rows[j].Cells.Count; ++i)
        {
            if (table.Rows[j].Cells[i].Paragraphs[0].Text.Equals("«" + columnName + "»"))
            {
                data[0] = j;
                data[1] = i;
                return data;
            }
        }
    }
    return data;
}
and I am calling this function from another function:
int startRow = 0, endRow = 0;
int[] ind;
DocX doc;
doc = DocX.Load(fileName);
Xceed.Words.NET.Table t;
t = doc.Tables[0];
endRow = t.Rows.Count;
System.Data.DataTable dt = new DataTable();
dt = reader(report.Query);
foreach (DataColumn col in dt.Columns)
{
    ind = GetColumnIndex(t, col.ColumnName, endRow, 2);
    //...more code here...
}
Looking at your access pattern, you search the same table many times: once per column name, and each search scans the table cell by cell, so the total work grows quickly as the table gets bigger. It is therefore worth transforming the table's content once into a data structure indexed by the cell text (e.g. a SortedDictionary).
First, create a class that holds the content of the table. That way, whenever you search the same table, you can reuse the same instance and avoid rebuilding the sorted dictionary:
public class XceedTableAdapter
{
    private readonly SortedDictionary<string, (int row, int column)> dict;
    public XceedTableAdapter(Xceed.Words.NET.Table table)
    {
        this.dict = new SortedDictionary<string, (int, int)>();
        // Copy the content of the table into the dict.
        // If you have duplicate words you need a SortedDictionary<string, List<(int, int)>> type. This is not clear in your question.
        for (var i = 0; i < table.Rows.Count; i++)
        {
            for (var j = 0; j < table.Rows[i].Cells.Count; j++)
            {
                var text = table.Rows[i].Cells[j].Paragraphs[0].Text;
                // keep the first occurrence, like the original search, which returns on the first match:
                if (!this.dict.ContainsKey(text))
                {
                    this.dict[text] = (i, j);
                }
            }
        }
    }
    public (int, int) GetColumnIndex(string searchText)
    {
        if (this.dict.TryGetValue(searchText, out var index))
        {
            return index;
        }
        return (-1, -1);
    }
}
Now you loop over the entire table only once, and subsequent searches run in O(log n); a plain Dictionary would give average O(1) lookups instead. If Xceed has a function to transform a table into a dictionary, that would be quite handy; I'm not familiar with this library.
Now you can search it like:
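var searchableTable = new XceedTableAdapter(doc.Tables[0]);
foreach (DataColumn col in dt.Columns)
{
    // the placeholders in the document are wrapped in « », as in the original search
    var ind = searchableTable.GetColumnIndex("«" + col.ColumnName + "»");
}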
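The build-once, look-up-many idea can be tried without Xceed. In this sketch a string[][] stands in for the cell texts (each rows[i][j] plays the role of table.Rows[i].Cells[j].Paragraphs[0].Text), and a plain Dictionary replaces the SortedDictionary, giving average O(1) lookups. CellIndex and Find are hypothetical names; firstRow mirrors the k argument of the original GetColumnIndex.

```csharp
using System.Collections.Generic;

public sealed class CellIndex
{
    private readonly Dictionary<string, (int Row, int Column)> positions =
        new Dictionary<string, (int Row, int Column)>();

    // Scans the grid once, starting at firstRow.
    public CellIndex(string[][] rows, int firstRow = 0)
    {
        for (int i = firstRow; i < rows.Length; i++)
        {
            for (int j = 0; j < rows[i].Length; j++)
            {
                string text = rows[i][j];
                // Keep the first occurrence, like the original nested-loop search.
                if (text != null && !positions.ContainsKey(text))
                {
                    positions.Add(text, (i, j));
                }
            }
        }
    }

    // Returns (-1, -1) when the text is absent, matching the original contract.
    public (int Row, int Column) Find(string text) =>
        text != null && positions.TryGetValue(text, out var pos) ? pos : (-1, -1);
}
```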
I need to shuffle the rows of a DataTable, because randomly accessing indexes would not work in my scenario. dt1 holds the base data that I have to shuffle, and dt is the DataTable that receives the shuffled data. My code is:
int j;
for (int i = 0; i < dt1.Rows.Count - 1; i++)
{
    j = rnd.Next(0, dt1.Rows.Count - 1);
    DataRow row = dt1.Rows[j];
    dt.ImportRow(row);
}
There is no syntax error, but when I run the code and later access dt, some of the same rows have been imported twice. What am I doing wrong here?
A DataRow can belong to only one DataTable, so create a new row with the values from the existing DataRow:
dt.Rows.Add(row.ItemArray);
Or
dt.ImportRow(row);
Update:
Another approach, which randomizes any collection (from this link):
public static class Extensions
{
    private static Random random = new Random();
    public static IEnumerable<T> OrderRandomly<T>(this IEnumerable<T> items)
    {
        List<T> randomly = new List<T>(items);
        while (randomly.Count > 0)
        {
            Int32 index = random.Next(randomly.Count);
            yield return randomly[index];
            randomly.RemoveAt(index);
        }
    }
}
Now you can randomize any collection just by calling this extension function.
var dt = dt1.AsEnumerable()
.OrderRandomly()
.CopyToDataTable();
Check this Example
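The duplicates in the question come from sampling: each iteration calls rnd.Next independently, so the same index can be picked more than once while other indexes are never picked. In addition, the upper bound of Random.Next is exclusive, so rnd.Next(0, dt1.Rows.Count - 1) never returns the last index, and the loop only runs Count - 1 times. A sketch that keeps the question's ImportRow loop but permutes the indexes first with a Fisher-Yates shuffle, so each row is imported exactly once (RowShuffler and ShuffledCopy are illustrative names):

```csharp
using System;
using System.Data;

public static class RowShuffler
{
    // Copies every row of 'source' into a new table with the same schema,
    // in a random order, without repeating or skipping any row.
    public static DataTable ShuffledCopy(DataTable source, Random rnd)
    {
        int n = source.Rows.Count;
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;

        // Fisher-Yates: swap position i with a random position in [0, i].
        for (int i = n - 1; i > 0; i--)
        {
            int j = rnd.Next(i + 1); // upper bound is exclusive
            (order[i], order[j]) = (order[j], order[i]);
        }

        DataTable result = source.Clone(); // same columns, no rows
        foreach (int index in order)
            result.ImportRow(source.Rows[index]);
        return result;
    }
}
```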
Here's an extension method I wrote for DataTables. It goes in a static DataTableExtensions class:
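// Random.Range in the original is Unity's UnityEngine.Random API;
// outside Unity, a shared System.Random instance does the same job.
private static readonly Random random = new Random();

public static DataTable Shuffle(this DataTable table) {
    int n = table.Rows.Count;
    List<DataRow> shuffledRows = new List<DataRow>();
    foreach (DataRow row in table.Rows) {
        shuffledRows.Add(row);
    }
    // Fisher-Yates shuffle: swap each position with a random position at or before it.
    while (n > 1) {
        n--;
        int k = random.Next(0, n + 1); // upper bound is exclusive, so k is in [0, n]
        DataRow value = shuffledRows[k];
        shuffledRows[k] = shuffledRows[n];
        shuffledRows[n] = value;
    }
    DataTable shuffledTable = table.Clone();
    foreach (DataRow row in shuffledRows) {
        shuffledTable.ImportRow(row);
    }
    return shuffledTable;
}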
Probably not the most efficient, but it works.
Usage:
DataTable shuffledTable = otherDataTable.Shuffle();
Here is my solution.
Just pass your table to the function, and it will shuffle the rows within the table in place.
public static void RandomizeTable(DataTable RPrl)
{
    // The byte-based rejection loop (accept box[0] < n * (Byte.MaxValue / n)) never
    // terminates for tables with more than 255 rows, because Byte.MaxValue / n is 0.
    // RandomNumberGenerator.GetInt32 (.NET Core 3.0 and later) returns an unbiased
    // cryptographic random value in [0, n) directly.
    int n = RPrl.Rows.Count;
    while (n > 1)
    {
        int k = System.Security.Cryptography.RandomNumberGenerator.GetInt32(n);
        n--;
        // Fisher-Yates step: swap the values of rows k and n.
        object[] tmp = RPrl.Rows[k].ItemArray;
        RPrl.Rows[k].ItemArray = RPrl.Rows[n].ItemArray;
        RPrl.Rows[n].ItemArray = tmp;
    }
}