Blocking Collection shows Duplicate entries - c#

I first retrieve the total number of rows in my table (say 100) and divide them into chunks (say 25 rows each). Then I create a task (taskFetch) that uses Parallel.ForEach() to fetch rows from MyTable in chunks, each into its own DataTable (containing 25 records). A nested Parallel.ForEach() uses Partitioner.Create() to read the rows from each DataTable and add them to a BlockingCollection (sourceCollection). For testing purposes I printed the results to the console, and they looked fine.
But when I tried to retrieve data from sourceCollection, I found duplicate entries. In reality I have over 1,500,000 records in the table. I don't think duplicate entries are being added; rather, I'm doubtful about the way I'm taking them.
Code
public async Task BulkMigrationAsync(string clearPVK, string EncZPK)
{
BlockingCollection<MigrationObject> sourceCollection = new BlockingCollection<MigrationObject>();
List<Task> tasksGeneratedPinBlock = new List<Task>();
int rangeFrom = 1;
int recordsInSet = 25;
int rangeTo = 25;
int setsCount = 0;
Dictionary<int, Tuple<int, int>> chunks = new Dictionary<int, Tuple<int, int>>();
SqlConnection conn = new SqlConnection(_Database.ConnectionString);
SqlCommand cmd = new SqlCommand("SELECT COUNT(*) FROM MyTable", conn); // Suppose it retrieves 100 rows
// getting total records from MyTable
using (conn)
{
conn.Open();
setsCount = Convert.ToInt32(cmd.ExecuteScalar()) / recordsInSet ; // then setsCount will be 4
conn.Close();
}
for (int i = 0; i < setsCount; i++)
{
chunks.Add(i, new Tuple<int, int>(rangeFrom, rangeTo));
rangeFrom = rangeTo + 1;
rangeTo = rangeTo + recordsInSet;
}
// Each chunk would contain 100000 records to be processed later
// chunks => {0, (1, 25)}
// {1, (26, 50)} // a chunk, chunk.Value.Item1 = 26 and chunk.Value.Item2 = 50
// {2, (51, 75)}
// {3, (76, 100)}
// fetching results in dataTable from DB in chunks and ADDING to sourceCollection
Task taskFetch = Task.Factory.StartNew(() =>
{
Parallel.ForEach(chunks, (chunk) =>
{
DataTable dt = new DataTable();
SqlConnection localConn = new SqlConnection(_Database.ConnectionString);
string command = @"SELECT * FROM ( SELECT RELATIONSHIP_NUM, CUST_ID, CODE, BLOCK_NEW, ROW_NUMBER() over (
order by RELATIONSHIP_NUM, CUST_ID) as RowNum FROM MyTable) SUB
WHERE SUB.RowNum BETWEEN " + chunk.Value.Item1 + " AND " + chunk.Value.Item2;
SqlDataAdapter da = new SqlDataAdapter(command, localConn);
try
{
using (da)
using (localConn)
{
da.Fill(dt);
}
}
finally
{
if (localConn.State != ConnectionState.Closed)
localConn.Close();
localConn.Dispose();
}
Parallel.ForEach(Partitioner.Create(0, dt.Rows.Count),
(range, state) =>
{
MigrationObject migSource = new MigrationObject();
for (int i = range.Item1; i < range.Item2; i++)
{
migSource.PAN = dt.Rows[i]["CUST_ID"].ToString();
migSource.PinOffset = dt.Rows[i]["CODE"].ToString();
migSource.PinBlockNew = dt.Rows[i]["BLOCK_NEW"].ToString();
migSource.RelationshipNum = dt.Rows[i]["RELATIONSHIP_NUM"].ToString();
Console.WriteLine(@"PAN " + migSource.PAN + " Rel " + migSource.RelationshipNum
+ " for ranges : " + range.Item1 + " TO " + range.Item2);
sourceCollection.TryAdd(migSource);
}
});
});
});
await taskFetch;
sourceCollection.CompleteAdding();
while (!sourceCollection.IsCompleted)
{
MigrationObject mig;
if (sourceCollection.TryTake(out mig)) // Seems to be the problem area because may be im not handling out
{
await Task.Delay(50);
Console.WriteLine(" Rel " + mig.RelationshipNum + " PAN " + mig.PAN);
}
}
}

My bad. The actual problem area is:
Parallel.ForEach(Partitioner.Create(0, dt.Rows.Count),
(range, state) =>
{
MigrationObject migSource = new MigrationObject(); // creating the object outside For loop.
for (int i = range.Item1; i < range.Item2; i++)
{
migSource.PAN = dt.Rows[i]["CUST_ID"].ToString();
migSource.PinOffset = dt.Rows[i]["CODE"].ToString();
migSource.PinBlockNew = dt.Rows[i]["BLOCK_NEW"].ToString();
migSource.RelationshipNum = dt.Rows[i]["RELATIONSHIP_NUM"].ToString();
Console.WriteLine(@"PAN " + migSource.PAN + " Rel " + migSource.RelationshipNum
+ " for ranges : " + range.Item1 + " TO " + range.Item2);
sourceCollection.TryAdd(migSource);
}
});
Instead, I should have put 'MigrationObject migSource = new MigrationObject();' inside the for loop, so that every row gets its own instance:
Parallel.ForEach(Partitioner.Create(0, dt.Rows.Count),
(range, state) =>
{
for (int i = range.Item1; i < range.Item2; i++)
{
MigrationObject migSource = new MigrationObject();
migSource.PAN = dt.Rows[i]["CUST_ID"].ToString();
migSource.PinOffset = dt.Rows[i]["CODE"].ToString();
migSource.PinBlockNew = dt.Rows[i]["BLOCK_NEW"].ToString();
migSource.RelationshipNum = dt.Rows[i]["RELATIONSHIP_NUM"].ToString();
Console.WriteLine(@"PAN " + migSource.PAN + " Rel " + migSource.RelationshipNum
+ " for ranges : " + range.Item1 + " TO " + range.Item2);
sourceCollection.TryAdd(migSource);
}
});
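To see why the placement of the 'new' matters, here is a minimal sketch of the effect, outside of any database code. The Item class and the ReuseDemo helper are hypothetical stand-ins (not part of the original code); the point is only that a collection stores references, so adding the same instance repeatedly yields entries that all show the last values written.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

// Hypothetical stand-in for MigrationObject; one property is enough here.
public class Item
{
    public string Pan { get; set; }
}

public static class ReuseDemo
{
    // Reuses ONE instance for every Add: the collection ends up holding the
    // same reference several times, so every entry shows the last value.
    public static string[] AddWithSharedInstance(string[] pans)
    {
        using (var collection = new BlockingCollection<Item>())
        {
            var shared = new Item();
            foreach (var pan in pans)
            {
                shared.Pan = pan;
                collection.Add(shared);
            }
            collection.CompleteAdding();
            return collection.GetConsumingEnumerable().Select(x => x.Pan).ToArray();
        }
    }

    // Creates a NEW instance per element, as in the corrected loop.
    public static string[] AddWithFreshInstances(string[] pans)
    {
        using (var collection = new BlockingCollection<Item>())
        {
            foreach (var pan in pans)
            {
                collection.Add(new Item { Pan = pan });
            }
            collection.CompleteAdding();
            return collection.GetConsumingEnumerable().Select(x => x.Pan).ToArray();
        }
    }
}
```

With the input { "A", "B", "C" }, the shared-instance version returns three entries that all read "C", while the fresh-instance version returns "A", "B", "C". GetConsumingEnumerable() is also a simpler way to drain the collection than polling IsCompleted with TryTake.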

Related

C# / VB .Net Performance tuning, Generate all possible Lottery combinations

I'm trying to generate all possible draws (combinations) for a lottery of 6 unique numbers out of 42.
I'm looking for the most efficient way to do this, so that the actual generation does not take days.
Aside from the processing load (which is to be expected), I'm running into a memory limitation: my machine with 12 GB of RAM can't hold 10% of the combinations, let alone all of them.
So I decided to look into a database alternative.
But with that I have the problem of duplicates, since I do not have the whole list in memory to check for existence.
I tried a lot of code versions, but all of them are resource-consuming.
I'm currently looking for alternatives that actually work :)
Here's my latest code sample, which uses a database for later record processing, filtering and duplicate removal:
public List<Draw> getDrawsContaining(List<int> initialBalls)
{
if (initialBalls == null)
initialBalls = new List<int>();
if (initialBalls.Count >= 6)
return new List<Draw> { new Draw(initialBalls) };
List<Draw> toReturn = new List<Draw>();
for (int i = 1; i <= 42; i++)
{
if (initialBalls.IndexOf(i) != -1)
continue;
initialBalls.Add(i);
toReturn.AddRange(getDrawsContaining(initialBalls));
initialBalls.Remove(i);
}
return toReturn;//.Distinct(dc).ToList();
}
And, say, in Page_Load I run this:
try
{
using (SqlConnection connection = new SqlConnection(sqlConnectionString))
{
connection.Open();
String query = "TRUNCATE TABLE Draws";
SqlCommand command = new SqlCommand(query, connection);
//command.Parameters.Add("@id", "abc");
command.ExecuteNonQuery();
connection.Close();
}
DataTable dt = new DataTable("Draws");
dt.Columns.Add("Ball1");
dt.Columns.Add("Ball2");
dt.Columns.Add("Ball3");
dt.Columns.Add("Ball4");
dt.Columns.Add("Ball5");
dt.Columns.Add("Ball6");
for (int j = 1, k = 1; j <= 42 && k <= 42; )
{
List<Draw> drawsPart = getDrawsContaining(new List<int> { j, k });
if (drawsPart.Count > 0)
{
foreach (Draw d in drawsPart)
{
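// Note: OrderBy returns a new sequence; its result is discarded here, so Balls is not actually reordered.
d.Balls.OrderBy(c => c);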
DataRow dr = dt.NewRow();
dr["Ball1"] = d.Balls[0];
dr["Ball2"] = d.Balls[1];
dr["Ball3"] = d.Balls[2];
dr["Ball4"] = d.Balls[3];
dr["Ball5"] = d.Balls[4];
dr["Ball6"] = d.Balls[5];
dt.Rows.Add(dr);
}
DataTable tmp = dt.Copy();
dt.Rows.Clear();
AsyncDBSave AsyncDBSaveInstance = new AsyncDBSave(tmp, AsyncDBSaveDispose);
Thread t = new Thread(new ThreadStart(AsyncDBSaveInstance.commit));
t.Start();
}
k++;
if (k == 43) { j++; k = 1; }
}
}
catch (Exception ex)
{
var v = ex.Message;
throw;
}
Here we go... all very fast and efficient:
using System;
using System.Diagnostics;
static class Program
{
static void Main(string[] args)
{
byte[] results = new byte[6 * 5245786];
byte[] current = new byte[6];
int offset = 0;
var watch = Stopwatch.StartNew();
Populate(results, ref offset, current, 0);
watch.Stop();
Console.WriteLine("Time to generate: {0}ms", watch.ElapsedMilliseconds);
Console.WriteLine("Data size: {0}MiB",
(results.Length * sizeof(byte)) / (1024 * 1024));
Console.WriteLine("All generated; press any key to show them");
Console.ReadKey();
for (int i = 0; i < 5245786; i++)
{
Console.WriteLine(Format(results, i));
}
}
static string Format(byte[] results, int index)
{
int offset = 6 * index;
return results[offset++] + "," + results[offset++] + "," +
results[offset++] + "," + results[offset++] + "," +
results[offset++] + "," + results[offset++];
}
static void Populate(byte[] results, ref int offset, byte[] current, int level)
{
// pick a new candidate; note since we're doing C not P, assume ascending order
int last = level == 0 ? 0 : current[level - 1];
for (byte i = (byte)(last + 1); i <= 42; i++)
{
current[level] = i;
if (level == 5)
{
// write the results
results[offset++] = current[0];
results[offset++] = current[1];
results[offset++] = current[2];
results[offset++] = current[3];
results[offset++] = current[4];
results[offset++] = current[5];
}
else
{
// dive down
Populate(results, ref offset, current, level + 1);
}
}
}
}
Just for fun, a non-recursive version that is about 2-3 times faster:
static byte[] Populate2()
{
byte[] results = new byte[6 * 5245786];
int offset = 0;
for (byte a1 = 1; a1 <= 37; ++a1)
for (byte a2 = a1; ++a2 <= 38;)
for (byte a3 = a2; ++a3 <= 39;)
for (byte a4 = a3; ++a4 <= 40;)
for (byte a5 = a4; ++a5 <= 41;)
for (byte a6 = a5; ++a6 <= 42;)
{
results[offset] = a1;
results[offset+1] = a2;
results[offset+2] = a3;
results[offset+3] = a4;
results[offset+4] = a5;
results[offset+5] = a6;
offset += 6;
}
return results;
}
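The constant 5245786 used in both versions is the number of ways to choose 6 balls out of 42, i.e. C(42, 6). As a small sanity check, here is a sketch that computes it; the Combinatorics class and Choose method are names made up for this illustration, not part of the answers above.

```csharp
using System;

public static class Combinatorics
{
    // Multiplicative formula: after step i the running value equals C(n - k + i, i),
    // which is an integer, so each division is exact.
    public static long Choose(int n, int k)
    {
        if (k < 0 || k > n) return 0;
        if (k > n - k) k = n - k; // use symmetry to keep the loop short
        long result = 1;
        for (int i = 1; i <= k; i++)
        {
            result = result * (n - k + i) / i;
        }
        return result;
    }
}
```

Combinatorics.Choose(42, 6) gives 5245786, so at 6 bytes per draw the result buffer is 31474716 bytes (about 30 MiB), which is why the byte-array approach fits in memory where a List of Draw objects does not.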

Multithreading List AddRange

The issue is that TotalRows is about 71800, whereas workList only contains 718 items, which is just the result of the first task. I have the WaitAll there, but it seems to finish as soon as the first task is done.
TotalRows = GetRowCount();
var lastRecord = 0;
List<tmpWBITEMALL> workList = new List<tmpWBITEMALL>();
for (int i = 0; i < 100; i++)
{
var tmpI = i;
gatherTmpTasks.Add(Task.Factory.StartNew(() =>
{
var context = new AS400_PIM5ContextDataContext();
context.CommandTimeout = 0;
int amount = (TotalRows / 100);
int tmplastRecord = lastRecord;
Interlocked.Add(ref lastRecord, amount);
Console.WriteLine("Getting rows " + tmplastRecord+ " to " + (tmplastRecord + amount));
var pagedResult = context.ExecuteQuery<tmpWBITEMALL>("SELECT * FROM (SELECT ROW_NUMBER() OVER ( ORDER BY Id ) AS RowNum, * from tmpWBITEMALL) AS RowConstrainedResult WHERE RowNum >= " + tmplastRecord+ " AND RowNum < " + amount + " ORDER BY RowNum");
lock (listLock)
workList.AddRange(pagedResult);
context.Dispose();
}));
}
Task.WaitAll(gatherTmpTasks.ToArray());
Console.WriteLine("total work: " + workList.Count + " tasks: " + gatherTmpTasks.Count);
For reference, gatherTmpTasks.Count returns 100 but workList.Count is only 718; listLock is just a static new object(). In case you didn't notice already, I am using LINQ to SQL.
Does anyone have an idea why my list isn't the same size as TotalRows?
The problem is " AND RowNum < " + amount: amount is always 718, so you are asking the query to always return rows between tmplastRecord and 718, NOT between tmplastRecord and tmplastRecord + amount. I think you just need to change it to " AND RowNum < " + (tmplastRecord + amount).
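As a possible refinement, the page bounds can be derived from the loop index (tmpI) instead of the shared lastRecord counter. In the question, reading lastRecord and calling Interlocked.Add are two separate steps, so two tasks could in principle observe the same start value. The sketch below assumes ROW_NUMBER() is 1-based and lets the last page absorb any remainder; Paging and PageBounds are hypothetical names.

```csharp
using System;

public static class Paging
{
    // Returns the inclusive [First, Last] ROW_NUMBER range for one page.
    // ROW_NUMBER() starts at 1; the final page takes any leftover rows.
    public static (int First, int Last) PageBounds(int totalRows, int pageCount, int pageIndex)
    {
        int pageSize = totalRows / pageCount;
        int first = pageIndex * pageSize + 1;
        int last = (pageIndex == pageCount - 1) ? totalRows : first + pageSize - 1;
        return (first, last);
    }
}
```

Inside the task this would be used as var bounds = Paging.PageBounds(TotalRows, 100, tmpI); with the query filtering on RowNum >= bounds.First AND RowNum <= bounds.Last. For 71800 rows and 100 pages, page 0 covers rows 1 to 718 and page 99 covers rows 71083 to 71800.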

Sum from dataGridView's column not calculating the right way in WindowsFormApplication

I'm struggling to calculate the sums of 3 columns in my dataGridView. The function that computes the sums seems to be correct: when I call it for the first time, on load, it shows the right sums. But if I call it from the button that inserts data into the grid view, after the insert it shows a seemingly random number, and I can't understand where the problem is. Thanks!
Here is the button handler that inserts a row and calculates the sums afterwards:
private void button1_Click(object sender, EventArgs e)
{
if (checkBox1.Checked)
{
label5.Text = ("1");
}
if (checkBox2.Checked)
{
label5.Text = ("0");
}
textBox1.Text = (Convert.ToInt32(textBox5.Text) - Convert.ToInt32(textBox6.Text)).ToString();
string startPath = Application.StartupPath;
var filepath = startPath + "\\" + "Grupe.sdf";
var connString = (@"Data Source=" + filepath);
using (var conn = new SqlCeConnection(connString))
{
try
{
conn.Open();
var query = "INSERT INTO copii(prezenta, Nume, Prenume, Program, Taxa, Achitat, Diferenta, Grupa) VALUES('" + label5.Text + "', '" + textBox2.Text.Trim() + "', '" + textBox3.Text.Trim() + "', '" + textBox4.Text.Trim() + "', '" + textBox5.Text.Trim() + "', '" + textBox6.Text.Trim() + "', '" + textBox1.Text.Trim() + "', '" + textBox7.Text.Trim() + "')";
var command = new SqlCeCommand(query, conn);
command.ExecuteNonQuery();
refresh();
sume(); //calling the function for the sum
this.dataGridView1.Sort(this.dataGridView1.Columns["Nume"], ListSortDirection.Ascending);
colorRows();
}
catch (Exception ex)
{
MessageBox.Show(ex.ToString());
}
}
}
Here is the function which calculates the sums of 3 different columns of the dataGridView:
public void sume()
{
int sum1 = 0;
int sum2 = 0;
int sum3 = 0;
for (int i = 0; i <= dataGridView1.Rows.Count - 1; i++)
{
if (label17.Text.Length != 0)
{
sum1 = Convert.ToInt32(label17.Text) + (Convert.ToInt32(dataGridView1.Rows[i].Cells[5].Value));
label17.Text = sum1.ToString();
}
if (label18.Text.Length != 0)
{
sum2 = Convert.ToInt32(label18.Text) + (Convert.ToInt32(dataGridView1.Rows[i].Cells[6].Value));
label18.Text = sum2.ToString();
}
if(label19.Text.Length != 0)
{
sum3 = Convert.ToInt32(label19.Text) + (Convert.ToInt32(dataGridView1.Rows[i].Cells[7].Value));
label19.Text = sum3.ToString();
}
}
MessageBox.Show("done");
}
Your sume() method is incorrect. On every iteration of the for loop it overwrites each sum with the label text plus the current cell and writes the result back to the label. Instead, accumulate the cell values in local variables and update the labels once after the loop, like this (assuming you need to add the values of labels 17, 18 and 19 only once):
public void sume()
{
int sum1 = 0;
int sum2 = 0;
int sum3 = 0;
for (int i = 0; i <= dataGridView1.Rows.Count - 1; i++)
{
if (label17.Text.Length != 0)
{
sum1 += (Convert.ToInt32(dataGridView1.Rows[i].Cells[5].Value));
}
if (label18.Text.Length != 0)
{
sum2+= (Convert.ToInt32(dataGridView1.Rows[i].Cells[6].Value));
}
if(label19.Text.Length != 0)
{
sum3+= (Convert.ToInt32(dataGridView1.Rows[i].Cells[7].Value));
}
}
sum1 += Convert.ToInt32(label17.Text);
sum2 += Convert.ToInt32(label18.Text);
sum3 += Convert.ToInt32(label19.Text);
label17.Text = sum1.ToString();
label18.Text = sum2.ToString();
label19.Text = sum3.ToString();
MessageBox.Show("done");
}
Also, performing the summation on the data source is better than performing it on the DataGridView.
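A minimal sketch of summing on the data source, assuming the grid is bound to a DataTable. The GridTotals class and SumColumn method are hypothetical names, and the column names used in the usage note ("Taxa", "Achitat", "Diferenta") are taken from the INSERT statement in the question; whether they correspond to cells 5, 6 and 7 is an assumption.

```csharp
using System;
using System.Data;

public static class GridTotals
{
    // Sums one column of the table backing the grid, skipping NULL cells.
    // The total is computed from scratch on every call, so calling it
    // again after an insert cannot accumulate onto an old total.
    public static int SumColumn(DataTable table, string columnName)
    {
        int sum = 0;
        foreach (DataRow row in table.Rows)
        {
            object value = row[columnName];
            if (value != DBNull.Value)
            {
                sum += Convert.ToInt32(value);
            }
        }
        return sum;
    }
}
```

After refreshing the data, the labels could then be set with something like label17.Text = GridTotals.SumColumn(table, "Taxa").ToString(); where table is the DataTable the grid is bound to.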

How to copy DataTable to Excel File using OLEDB? [duplicate]

This question already has answers here:
Excel Interop - Efficiency and performance
(7 answers)
Closed 9 years ago.
I am writing a program that sends data to Excel using OLEDB.
I used an UPDATE statement like this:
OleDbConnection MyConnection = new OleDbConnection(@"provider=Microsoft.Jet.OLEDB.4.0;Data Source='" + GeneralData.excelPath + "';Extended Properties=Excel 8.0;");
MyConnection.Open();
OleDbCommand myCommand = new OleDbCommand();
myCommand.Connection = MyConnection;
myCommand.CommandType = System.Data.CommandType.Text;
string sql = "Update [test$] set press = " + pointsProperties[i].Pressure + ", temp = " + pointsProperties[i].Temperature + " where id= " + id;
myCommand.CommandText = sql;
myCommand.ExecuteNonQuery();
The problem is that I will run the SQL statement more than 100 times, which takes a long time. I thought that using a DataTable would take less time, so I wrote code that saves my data in a DataTable like this:
public static System.Data.DataTable ExcelDataTable = new System.Data.DataTable("Steam Properties");
static System.Data.DataColumn columnID = new System.Data.DataColumn("ID", System.Type.GetType("System.Int32"));
static System.Data.DataColumn columnPress = new System.Data.DataColumn("Press", System.Type.GetType("System.Int32"));
static System.Data.DataColumn columnTemp = new System.Data.DataColumn("Temp", System.Type.GetType("System.Int32"));
public static void IntializeDataTable() // Called one time in MDIParent1.Load()
{
columnID.DefaultValue = 0;
columnPress.DefaultValue = 0;
columnTemp.DefaultValue = 0;
ExcelDataTable.Columns.Add(columnID);
ExcelDataTable.Columns.Add(columnPress);
ExcelDataTable.Columns.Add(columnTemp);
}
public static void setPointInDataTable(StreamProperties Point)
{
System.Data.DataRow ExcelDataRow = ExcelDataTable.NewRow(); // Must be declared inside the function;
// it will raise an exception if declared outside the function
ExcelDataRow["ID"] = Point.ID;
ExcelDataRow["Press"] = Point.Pressure;
ExcelDataRow["Temp"] = Point.Temperature;
ExcelDataTable.Rows.Add(ExcelDataRow);
}
The problem is that I don't know:
1- Is the second way faster?
2- How do I copy the DataTable to the Excel file?
Thanks.
//Dump the datatable onto the sheet in one operation
public void InsertDataTableIntoExcel(Application xlApp, DataTable dt, Rectangle QueryDataArea)
{
TurnOnOffApplicationSettings(xlApp, false);
using (var rn = xlApp.Range[ColumnNumberToName(QueryDataArea.X) + QueryDataArea.Y + ":" + ColumnNumberToName(QueryDataArea.X + QueryDataArea.Width - 1) + (QueryDataArea.Y + QueryDataArea.Height)].WithComCleanup())
{
rn.Resource.Value2 = Populate2DArray(dt);
}
TurnOnOffApplicationSettings(xlApp, true);
}
private object[,] Populate2DArray(DataTable dt)
{
object[,] values = (object[,])Array.CreateInstance(typeof(object), new int[2] { dt.Rows.Count + 1, dt.Columns.Count + 1}, new int[2] { 1, 1 });
for (int i = 0; i < dt.Rows.Count; i++)
{
for (int j = 0; j < dt.Columns.Count; j++)
{
values[i + 1, j + 1] = dt.Rows[i][j] == DBNull.Value ? "" : dt.Rows[i][j];
}
}
return values;
}
public static string ColumnNumberToName(Int32 columnNumber)
{
Int32 dividend = columnNumber;
String columnName = String.Empty;
Int32 modulo;
while (dividend > 0)
{
modulo = (dividend - 1)%26;
columnName = Convert.ToChar(65 + modulo).ToString() + columnName;
dividend = (Int32) ((dividend - modulo)/26);
}
return columnName;
}
public static Int32 ColumnNameToNumber(String columnName)
{
if (String.IsNullOrEmpty(columnName)) throw new ArgumentNullException("columnName");
char[] characters = columnName.ToUpperInvariant().ToCharArray();
Int32 sum = 0;
for (Int32 i = 0; i < characters.Length; i++)
{
sum *= 26;
sum += (characters[i] - 'A' + 1);
}
return sum;
}
private static XlCalculation xlCalculation = XlCalculation.xlCalculationAutomatic;
public void TurnOnOffApplicationSettings(Excel.Application xlApp, bool on)
{
xlApp.ScreenUpdating = on;
xlApp.DisplayAlerts = on;
if (on)
{
xlApp.Calculation = xlCalculation;
}
else
{
xlCalculation = xlApp.Calculation;
xlApp.Calculation = XlCalculation.xlCalculationManual;
}
xlApp.UserControl = on;
xlApp.EnableEvents = on;
}
WithComCleanup() is from the VSTO Contrib library.

Threadpooling assistance

I have read up on thread pooling and found an example of it. I am currently trying to recreate that example for my own project, and I keep getting this error whenever I input any number from the UI.
ManualResetEvent[] doneReadEvents = new ManualResetEvent[Read];
ManualResetEvent[] doneWriteEvents = new ManualResetEvent[Write];
ReadWrite[] ReadArray = new ReadWrite[Read];
ReadWrite[] WriteArray = new ReadWrite[Write];
for (int i = 0; i < Read; i++)
{
doneReadEvents[i] = new ManualResetEvent(false);
ReadWrite Rw = new ReadWrite(Read, doneReadEvents[i]);
ReadArray[i] = Rw;
ThreadPool.QueueUserWorkItem(Rw.ThreadPoolCallBackRead, i);
}
for (int i = 0; i < Write; i++)
{
doneReadEvents[i] = new ManualResetEvent(false);
ReadWrite rW = new ReadWrite(Write, doneWriteEvents[i]);
ReadArray[i] = rW;
ThreadPool.QueueUserWorkItem(rW.ThreadPoolCallBackWrite, i);
}
WaitHandle.WaitAny(doneReadEvents);
WaitHandle.WaitAny(doneWriteEvents);
temp.Items.Add("Complete");
temp.Items.Add("Closing");
Output.DataSource = ReadWrite.MyList;
Work.DataSource = ReadWrite.MyList2;
ReadWrite.ReadData(Read);
}
On the first line in the first loop I get an error saying the index is out of the bounds of the array. Once that error is cleared, I don't know whether there will be any more errors.
namespace MultiThreadingReaderWriter
{
class ReadWrite
{
public int _rw;
public ManualResetEvent _doneEvents;
public List<string> myList = new List<string>();
public List<string> myList2 = new List<string>();
public List<string> MyList{ get { return myList; } }
public List<string> MyList2{ get { return myList2; } }
public int RW { get { return _rw; } }
//Constructor
public ReadWrite(int rw, ManualResetEvent doneEvents)
{
_rw = rw;
_doneEvents = doneEvents;
}
public void ThreadPoolCallBackRead(Object threadContext)
{
int threadindex = (int) threadContext;
myList.Add("Thread Read " + threadindex+ " started");
ReadData(_rw);
myList.Add("Thread Read " + threadindex + " done");
_doneEvents.Set();
}
public void ThreadPoolCallBackWrite(Object threadContext)
{
int threadindex = (int)threadContext;
myList.Add("Thread Write " + threadindex + " started");
WriteData(_rw);
myList.Add("Thread Write " + threadindex + " done");
_doneEvents.Set();
}
public void ReadData(int reader)
{
myList.Add("Reader " + reader + " has entered Critical Section");
myList.Add("Reader " + reader + " is Reading");
myList.Add("Reader " + reader + " is leaving Critical Section");
}
public void WriteData(int writer)
{
myList.Add("Writer " + writer + " has entered Critical Section");
myList.Add("Writer " + writer + " is writing");
myList.Add("Writer " + writer + " is leaving Critical Section");
}
}
}
This is the class used by the form code above.
Array indices start from zero, so the right way to iterate is:
for (int i = 0; i < Read; i++)
{
}
for (int i = 0; i < Write; i++)
{
}
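Once the loop bounds are right, two other things in the question's code may be worth a look: WaitHandle.WaitAny returns as soon as any one handle is signaled, and List<string>.Add is not safe to call from several threads at once. Below is a rough sketch of one alternative, not the original example's approach: a single CountdownEvent replaces the array of ManualResetEvent objects, and a ConcurrentQueue collects the log lines. PoolDemo and RunWorkers are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class PoolDemo
{
    // Queues `workers` items on the thread pool and blocks until ALL of them
    // have finished, then returns the collected log lines.
    public static string[] RunWorkers(int workers)
    {
        var log = new ConcurrentQueue<string>(); // safe for concurrent Enqueue
        using (var done = new CountdownEvent(workers))
        {
            for (int i = 0; i < workers; i++) // valid indices are 0 .. workers - 1
            {
                ThreadPool.QueueUserWorkItem(state =>
                {
                    int index = (int)state;
                    log.Enqueue("Thread " + index + " done");
                    done.Signal(); // one signal per finished work item
                }, i);
            }
            done.Wait(); // returns only after every work item has signaled
        }
        return log.ToArray();
    }
}
```

The order of the log lines depends on scheduling, so only the count is predictable: RunWorkers(5) returns five entries. Note that blocking the UI thread with Wait() freezes the form until the work is done, the same as the WaitHandle calls in the question.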
