Splitting of dataset based on number of rows in to multiple tables - c#

I have a situation where I need to split a dataset's results into multiple tables, and eventually into arrays, based on the number of rows.
Example: my dataset has 34 rows with a Url column. I need to split the 34 rows into 4 data tables (10, 10, 10 and the remaining 4) and eventually add them to arrays. I am using a Windows Forms application. I tried something like the code below, but every time I add records to an array it adds the entire dataset. Any help would be appreciated.
private DataSet Process(DataSet ds)
{
string[] Array1 = new string[10];
string[] Array2 = new string[10];
string[] Array3 = new string[10];
string[] Array4 = new string[10];
int COunt = ds.Tables[0].DefaultView.Count;
int NoOfArraysToCreate = COunt/10 + 1;
for (int i = 0; i <= NoOfArraysToCreate; i++ )
{
if (i == 0)
{
foreach (DataRow drs in ds.Tables[0].Rows)
{
List<String> myList = new List<string>();
myList.Add(drs["Url"].ToString());
Array1 = myList.ToArray();
}
}
else if (i == 1)
{
foreach (DataRow drs in ds.Tables[0].Rows)
{
List<String> myList = new List<string>();
myList.Add(drs["Url"].ToString());
Array2 = myList.ToArray();
}
}
else if (i == 2)
{
foreach (DataRow drs in ds.Tables[0].Rows)
{
List<String> myList = new List<string>();
myList.Add(drs["Url"].ToString());
Array3 = myList.ToArray();
}
}
else if (i == 3)
{
foreach (DataRow drs in dsURLsList.Tables[0].Rows)
{
List<String> myList = new List<string>();
myList.Add(drs["Url"].ToString());
Array4 = myList.ToArray();
}
}
}

It seems you are looping through all DataTable rows for each of your pages.
I would suggest you only loop over your rows once:
private List<List<string>> Process(DataSet ds, int pageSize)
{
List<List<string>> result = new List<List<string>>();
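int rowCount = ds.Tables[0].DefaultView.Count;
int NoOfArraysToCreate = (rowCount + pageSize - 1) / pageSize; // round up, so an exact multiple of pageSize does not yield an empty trailing page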
IEnumerable<DataRow> collection = ds.Tables[0].Rows.Cast<DataRow>(); //I find it easier to work with enumerables as it allows for LINQ expressions as below
for (int i = 0; i < NoOfArraysToCreate; i++)
{
result.Add(collection.Skip(i*pageSize)
.Take(pageSize)
.Select(r => r["Url"].ToString())
.ToList());
}
Parallel.ForEach(result, (page) =>
{
Parallel.ForEach(page, (url) => {}); // process your strings in parallel
});
return result;//I see you convert your string arrays back to DataSet, but since I don't know the table definition, I'm just returning the lists
}
void Main()
{
// this is just a test code to illustrate my point, yours will be different
var ds = new DataSet();
var dt = new DataTable();
dt.Columns.Add("Url", typeof(string));
for (int i = 0; i < 34; i++) {
dt.Rows.Add(Guid.NewGuid().ToString());//generating some random strings, ignore me
}
ds.Tables.Add(dt);
//---------------------------------------
Process(ds, 10);// calling your method
}
Of course there are ways to do it with for loops as well, but I'll leave that for you to explore.
I would also say that hardcoding table indexes into your method is usually considered a code smell, but since I don't know your context I will not make any further changes.
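Since the question asks for multiple data tables rather than lists of strings, a loop-based variant could be sketched as follows. This is only a minimal sketch: the class name TableSplitter and the method SplitTable are hypothetical names introduced for illustration, and it assumes the pages should keep the schema of the source table.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class TableSplitter
{
    // Hypothetical helper: splits `source` into tables of at most `pageSize` rows each.
    public static List<DataTable> SplitTable(DataTable source, int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        var pages = new List<DataTable>();
        DataTable current = null;
        foreach (DataRow row in source.Rows)
        {
            // Start a new page when there is none yet or the current one is full.
            if (current == null || current.Rows.Count == pageSize)
            {
                current = source.Clone(); // Clone copies the schema only, not the rows
                pages.Add(current);
            }
            current.ImportRow(row); // copies the row's values into the current page
        }
        return pages;
    }
}
```

Each row is visited exactly once, and no empty trailing table is created when the row count is an exact multiple of the page size.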

Related

Filter Data table to list using Linq

DataTable dttoexcel=some data source;
String[] pro = { "Az","Bz","X" };
for (int f = 0; f < pro.Length; f++)
{
var LoginDetails = dttoexcel.Rows
.Cast<DataRow>()
.Where((r => r.Field<string>("Subcategoryname") == pro[f]))
.ToList();
}
Every time the list is updated with new data, the old data is overwritten. I want to store all the data in the list without losing the old data.
Please help me to solve this.
DataTable dttoexcel=some data source;
String[] pro = { "Az","Bz","X" };
List<dynamic> LoginDetails = new List<dynamic>();
for (int f = 0; f < pro.Length; f++)
{
LoginDetails.AddRange(dttoexcel.Rows
.Cast<DataRow>()
.Where((r => r.Field<string>("Subcategoryname") ==
pro[f])).ToList());
}
Every time your loop executes or iterates, it declares a new variable on this line:
var LoginDetails = ....
Instead, you need to concatenate your results so that your data is not overwritten every time the loop iterates:
List<DataRow> LoginDetails = new List<DataRow>(); // declared once, outside the loop
for (int f = 0; f < pro.Length; f++)
{
LoginDetails = LoginDetails.Concat(dttoexcel.Rows
.Cast<DataRow>()
.Where(r => r.Field<string>("Subcategoryname") ==
pro[f]).ToList()).ToList();
}
Just copying DataRow references to a List won't prevent their content from being overwritten, so you should copy the existing DataRow objects (see this question for details). You can also use a HashSet to speed up the filtering:
DataTable dttoexcel = null; // placeholder: assign your data source here
string[] pro = { "Az", "Bz", "X" };
var proSet = new HashSet<string>(pro);
// Storage for copied data. Copy metadata from the original table
DataTable dttoexcelOld = dttoexcel.Clone();
foreach (var row in dttoexcel.Rows
.Cast<DataRow>()
.Where(r => proSet.Contains(r.Field<string>("Subcategoryname"))))
// Clone the row to prevent further overwriting and add it to dttoexcelOld table
dttoexcelOld.ImportRow(row);
With every iteration you are creating a new LoginDetails object, so you need to store your filtered data somewhere, for example in a List of the same type, using the AddRange or Concat methods. You can try something like this:
DataTable dttoexcel = some data source;
String[] pro = { "Az", "Bz", "X" };
List<DataRow> loginDetails = new List<DataRow>();
for (int f = 0; f < pro.Length; f++)
{
loginDetails.AddRange(dttoexcel //or Concat method
.AsEnumerable()
.Where((r => r.Field<string>("Subcategoryname") == pro[f]))
.ToList());
}
The AddRange and Concat methods have different semantics:
AddRange - modifies the source list by adding the items to it.
Concat - returns a new sequence containing the source items followed by the new items, without modifying the source list.
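The difference can be illustrated with a small self-contained sketch. The class AddRangeVsConcat and its Compare method are hypothetical names used only for this illustration; plain strings stand in for the DataRow objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AddRangeVsConcat
{
    // Returns { source count after Concat, combined count, source count after AddRange }.
    public static int[] Compare()
    {
        var source = new List<string> { "Az" };
        var extra = new List<string> { "Bz", "X" };

        // Concat builds a new lazy sequence; source itself is left untouched.
        IEnumerable<string> combined = source.Concat(extra);
        int sourceAfterConcat = source.Count;
        int combinedCount = combined.Count();

        // AddRange appends the items to source in place.
        source.AddRange(extra);
        int sourceAfterAddRange = source.Count;

        return new[] { sourceAfterConcat, combinedCount, sourceAfterAddRange };
    }
}
```

Because Concat is evaluated lazily, its result has to be assigned or materialized (for example with ToList) for the combined data to be kept.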
Update
To create a new DataTable with the filtered results, simply do this:
DataTable filteredTable = loginDetails.CopyToDataTable();
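One caveat: CopyToDataTable throws an InvalidOperationException when the sequence contains no rows, which can happen if none of the filter values match. A possible guard, sketched under the assumption that an empty table with the original schema is an acceptable result (FilterHelper and ToTableOrEmpty are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class FilterHelper
{
    // Hypothetical helper: copies `rows` into a new table, or returns an
    // empty table with the schema of `schemaSource` when there are no rows.
    public static DataTable ToTableOrEmpty(DataTable schemaSource, IEnumerable<DataRow> rows)
    {
        List<DataRow> list = rows.ToList();
        return list.Count > 0
            ? list.CopyToDataTable()   // would throw on an empty sequence
            : schemaSource.Clone();    // schema only, zero rows
    }
}
```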

How do i add data to a specific column in datagridview?

I have already made 3 lists of elements that I want to insert into my DataGridView,
but the problem is that when I try to insert data into the grid I get something like this,
so to insert the data I used the following code:
IList<string> ruas = new List<string>();
foreach (var element in Gdriver.FindElements(By.ClassName("search-title")))
{
//ruas.Add(element.Text);
table.Rows.Add(element.Text);
}
IList<string> codps = new List<string>();
foreach(var Codpelement in Gdriver.FindElements(By.ClassName("cp")))
{
table.Rows.Add("",Codpelement.Text);
}
IList<string> Distritos = new List<string>();
foreach (var Distritoelement in Gdriver.FindElements(By.ClassName("local")))
{
//Distritos.Add(Distritoelement.Text);
table.Rows.Add("","",Distritoelement.Text.Substring(Distritoelement.Text.LastIndexOf(',') + 1));
}
Could you kindly tell me a better way to make the data appear from top to bottom?
Thanks.
The problem is that each call to table.Rows.Add creates a new row holding a single value, so you end up with three times as many rows as you need, each with only one column filled. Instead of your current approach I recommend the following:
IList<string> ruas = new List<string>();
foreach (var element in Gdriver.FindElements(By.ClassName("search-title")))
{
ruas.Add(element.Text);
}
IList<string> codps = new List<string>();
foreach(var Codpelement in Gdriver.FindElements(By.ClassName("cp")))
{
codps.Add(Codpelement.Text);
}
IList<string> Distritos = new List<string>();
foreach (var Distritoelement in Gdriver.FindElements(By.ClassName("local")))
{
Distritos.Add(Distritoelement.Text.Substring(Distritoelement.Text.LastIndexOf(',') + 1));
}
for (int i = 0; i < ruas.Count; i++) {
table.Rows.Add(ruas.ElementAt(i), codps.ElementAt(i), Distritos.ElementAt(i));
}
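The final loop indexes all three lists with the same i, so it throws if the page yields fewer "cp" or "local" elements than "search-title" elements. A defensive variant might look like the sketch below. It assumes that table is a DataTable (the question does not say), and the column names and the RowBuilder class are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class RowBuilder
{
    // Hypothetical helper: builds one row per record from three parallel lists.
    public static DataTable BuildTable(IList<string> ruas, IList<string> codps, IList<string> distritos)
    {
        var table = new DataTable();
        table.Columns.Add("Rua", typeof(string));          // assumed column names
        table.Columns.Add("CodigoPostal", typeof(string));
        table.Columns.Add("Distrito", typeof(string));

        // Only go as far as the shortest list so no index is out of range.
        int rowCount = Math.Min(ruas.Count, Math.Min(codps.Count, distritos.Count));
        for (int i = 0; i < rowCount; i++)
        {
            table.Rows.Add(ruas[i], codps[i], distritos[i]);
        }
        return table;
    }
}
```

Whether silently truncating to the shortest list is appropriate depends on the page being scraped; reporting the mismatch may be the better choice.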

Generate combinations of elements held in multiple list of strings in C#

I'm trying to automate the nested foreach provided that there is a Master List holding List of strings as items for the following scenario.
Here for example I have 5 list of strings held by a master list lstMaster
List<string> lst1 = new List<string> { "1", "2" };
List<string> lst2 = new List<string> { "-" };
List<string> lst3 = new List<string> { "Jan", "Feb" };
List<string> lst4 = new List<string> { "-" };
List<string> lst5 = new List<string> { "2014", "2015" };
List<List<string>> lstMaster = new List<List<string>> { lst1, lst2, lst3, lst4, lst5 };
List<string> lstRes = new List<string>();
foreach (var item1 in lst1)
{
foreach (var item2 in lst2)
{
foreach (var item3 in lst3)
{
foreach (var item4 in lst4)
{
foreach (var item5 in lst5)
{
lstRes.Add(item1 + item2 + item3 + item4 + item5);
}
}
}
}
}
I want to automate the nested foreach loops above regardless of the number of lists held by the master list lstMaster.
Just do a cross-join with each successive list:
IEnumerable<string> lstRes = new List<string> {null};
foreach(var list in lstMaster)
{
// cross join the current result with each member of the next list
lstRes = lstRes.SelectMany(o => list.Select(s => o + s));
}
results:
List<String> (8 items)
------------------------
1-Jan-2014
1-Jan-2015
1-Feb-2014
1-Feb-2015
2-Jan-2014
2-Jan-2015
2-Feb-2014
2-Feb-2015
Notes:
Declaring lstRes as an IEnumerable<string> prevents the unnecessary creation of additional lists that would be thrown away with each iteration.
The initial null is used so that the first cross-join has something to build on (with strings, null + s = s).
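The cross-join above can be wrapped in a reusable method, roughly as sketched here. CrossJoin and Combine are hypothetical names. Two deliberate deviations from the snippet above: the seed is an empty string instead of null, and the loop variable is copied to a local, because compilers before C# 5 shared one foreach variable across all iterations and the lazily evaluated lambdas would otherwise all see the last list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CrossJoin
{
    // Hypothetical helper: concatenates one item from each list, in every combination.
    public static List<string> Combine(IEnumerable<IEnumerable<string>> lists)
    {
        IEnumerable<string> result = new[] { "" }; // seed so the first join has something to extend
        foreach (var list in lists)
        {
            var current = list; // local copy captured by the lambda below
            result = result.SelectMany(prefix => current.Select(item => prefix + item));
        }
        return result.ToList(); // the query is evaluated once, here
    }
}
```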
To make this truly dynamic you need two arrays of int loop variables (index and count):
int numLoops = lstMaster.Count;
int[] loopIndex = new int[numLoops];
int[] loopCnt = new int[numLoops];
Then you need the logic to iterate through all these loopIndexes.
Init to start value (optional)
for(int i = 0; i < numLoops; i++) loopIndex[i] = 0;
for(int i = 0; i < numLoops; i++) loopCnt[i] = lstMaster[i].Count;
Finally a big loop that works through all combinations.
bool finished = false;
while(!finished)
{
// access current element
string line = "";
for(int i = 0; i < numLoops; i++)
{
line += lstMaster[i][loopIndex[i]];
}
lstRes.Add(line);
int n = numLoops-1;
for(;;)
{
// increment innermost loop
loopIndex[n]++;
// if at Cnt: reset, increment outer loop
if(loopIndex[n] < loopCnt[n]) break;
loopIndex[n] = 0;
n--;
if(n < 0)
{
finished=true;
break;
}
}
}
public static IEnumerable<IEnumerable<T>> GetPermutations<T>(this IEnumerable<IEnumerable<T>> lists)
{
IEnumerable<IEnumerable<T>> result = new List<IEnumerable<T>> { new List<T>() };
// Concat (not Union) so that repeated values such as "-" are kept
return lists.Aggregate(result, (current, list) => current.SelectMany(o => list.Select(s => o.Concat(new[] { s }))));
}
var totalCombinations = 1;
foreach (var l in lstMaster)
{
totalCombinations *= l.Count == 0 ? 1 : l.Count;
}
var res = new string[totalCombinations];
// blockSize = how many consecutive result entries receive the same item of the current list
int blockSize = totalCombinations;
for (int i = 0; i < lstMaster.Count; ++i)
{
blockSize /= lstMaster[i].Count;
for (int k = 0; k < totalCombinations; ++k)
{
// the item index advances once per block and wraps around at the end of the list
res[k] += lstMaster[i][(k / blockSize) % lstMaster[i].Count];
}
}
The algorithm starts by calculating how many combinations all the sub lists produce.
Once we know that, we create a result array with exactly that number of entries. Then the algorithm iterates through the sub lists. For each sub list it calculates the block size, that is, how many consecutive result entries should receive the same item (the product of the sizes of all the later lists). It then appends the matching item to every entry of the result, moving to the next item of the sub list after each block and starting again from the first item when the list is exhausted. It continues like this through all the sub lists and all their items.
One area though that needs improvement is when the list is empty. There is a risk of DivideByZeroException. I didn't add that. I'd prefer to focus on conveying the idea behind the calculations and didn't want to obfuscate it with additional checks.
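One way to add the missing check is to compute the number of combinations up front and treat any empty sub list as "no combinations", which matches what the nested foreach loops would produce. This is only a sketch: CombinationGuard and CountCombinations are hypothetical names, and the checked multiplication is an extra precaution not present in the answer.

```csharp
using System;
using System.Collections.Generic;

public static class CombinationGuard
{
    // Hypothetical helper: number of combinations, or 0 if any sub list is empty.
    public static int CountCombinations(List<List<string>> lists)
    {
        int total = 1;
        foreach (var list in lists)
        {
            if (list.Count == 0) return 0;       // an empty list means no combination can be formed
            total = checked(total * list.Count); // throws OverflowException instead of wrapping around
        }
        return total;
    }
}
```

Calling this first and returning an empty result when it yields 0 avoids the division by zero entirely.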

C# Divide the values of one list by the values of the other to produce a new list

I wish to divide the corresponding values of each list in order to produce a new list containing the divided results of each corresponding index in my two lists. Both my lists are 8 values long and are both int lists. I am currently creating two lists that are to be divided like so:
private void StartSchedule_Click(object sender, EventArgs e)
{
string ConnectionString = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=F:\A2 Computing\C# Programming Project\TriHard.accdb";
string SelectQuery = "SELECT Time.AthleteID, Athlete.AthleteName, Time.EventTime, Event.EventDistance FROM Event INNER JOIN (Athlete INNER JOIN [Time] ON Athlete.[AthleteID] = Time.[AthleteID]) ON Event.[EventID] = Time.[EventID];";
OleDbConnection Connection = new OleDbConnection(ConnectionString);
OleDbCommand Command = new OleDbCommand(SelectQuery, Connection);
Command.Connection.Open();
OleDbDataReader Reader = Command.ExecuteReader(CommandBehavior.CloseConnection);
PaceCalculator pace = new PaceCalculator();
List<int> Distancelist = new List<int>();
List<int> Secondslist = new List<int>();
List<int> Pacelist = new List<int>();
while (Reader.Read())
{
pace = new PaceCalculator();
pace.Distance = (int)Reader["EventDistance"];
int DistanceInt = Convert.ToInt32(pace.Distance);
Distancelist.Add(DistanceInt);
pace = new PaceCalculator();
pace.Time = (string)Reader["EventTime"]; //Reads in EventTime
double Seconds = TimeSpan.Parse(pace.Time).TotalSeconds; //Converts the string into HH:MM:SS as a double
int SecondsInt = Convert.ToInt32(Seconds); //Converts the double into an integer, returning the seconds in the total time
Secondslist.Add(SecondsInt); //Adds the Seconds for each time to the list;
//Need to fix this currently returns just 0
var Pacelist2 = PaceCalc(Distancelist, Secondslist);
listBox3.DisplayMember = "PaceInt";
listBox3.DataSource = (Pacelist2);
}
listBox1.DisplayMember = "DistanceInt";
listBox1.DataSource = Distancelist;
listBox2.DisplayMember = "SecondsInt";
listBox2.DataSource = Secondslist;
Here is the function I am calling which attempts to divide the lists, but doesn't seem to be working:
public List<int> PaceCalc(List<int> Dlist, List<int> Slist)
{
PaceCalculator pace = new PaceCalculator();
List<int> Plist = new List<int>();
pace = new PaceCalculator();
for (int i = 0; i == Dlist.Count; i++)
{
int PaceInt = Dlist[i] / Slist[i];
Plist.Add(PaceInt);
}
return Plist;
}
I wish to display the outcomes of the division in listBox3. Am I dividing the lists correctly and how can I display it in the list box?
Your for loop is never executing because you're testing if i == Dlist.Count. It should be:
for (int i = 0;i < Dlist.Count; i++)
Alternatively, you could do this with LINQ:
public List<int> PaceCalc(List<int> Dlist, List<int> Slist)
{
return Dlist.Zip(Slist, (a, b) => a / b).ToList();
}
Couple of issues:
First you need to modify your check in for loop to i < Dlist.Count. Your current check i == Dlist.Count is wrong.
So your method would be:
public List<int> PaceCalc(List<int> Dlist, List<int> Slist)
{
List<int> Plist = new List<int>();
for (int i = 0; i < Dlist.Count; i++)
{
int PaceInt = Dlist[i] / Slist[i];
Plist.Add(PaceInt);
}
return Plist;
}
(I have removed PaceCalculator pace = new PaceCalculator();, since you don't need that at all in your method)
Second. You don't have to specify DisplayMember for ListBox3
var Pacelist2 = PaceCalc(Distancelist, Secondslist);
listBox3.DataSource = Pacelist2;
The second issue will not cause any error or exception, though: since the DisplayMember will not be found, the list box falls back to the default ToString and you will still get the number.
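Two further points may be worth checking once the loop runs. Dividing two int values truncates the result (for example 100 / 3000 is 0), and a time of 0 seconds would throw a DivideByZeroException. A hedged sketch of a variant that returns doubles and marks undefined results as NaN (the Pace class and its Calc method are hypothetical names, and using NaN as the marker is an assumption, not something the question asks for):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Pace
{
    // Hypothetical helper: element-wise distance / seconds as floating point.
    public static List<double> Calc(IEnumerable<int> distances, IEnumerable<int> seconds)
    {
        return distances
            .Zip(seconds, (d, s) => s == 0 ? double.NaN : (double)d / s) // cast avoids integer truncation
            .ToList();
    }
}
```

Zip stops at the end of the shorter sequence, so lists of different lengths do not cause an index error.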

C# - Looking for the list of duplicated rows (need optimization)

Please, I would like to optimize this code in C#, if possible.
When there are less than 1000 lines, it's fine. But when we have at least 10000, it starts to take some time...
Here a little benchmark :
5000 lines => ~2s
15000 lines => ~20s
25000 lines => ~50s
Indeed, I'm looking for duplicated lines.
Method SequenceEqual to check values may be a problem (in my "benchmark", I have 4 fields considered as "keyField" ...).
Here is the code :
private List<DataRow> GetDuplicateKeys(DataTable table, List<string> keyFields)
{
Dictionary<List<object>, int> keys = new Dictionary<List<object>, int>(); // List of key values + their index in table
List<List<object>> duplicatedKeys = new List<List<object>>(); // List of duplicated keys values
List<DataRow> duplicatedRows = new List<DataRow>(); // Rows that are duplicated
foreach (DataRow row in table.Rows)
{
// Find keys fields values for the row
List<object> rowKeys = new List<object>();
keyFields.ForEach(keyField => rowKeys.Add(row[keyField]));
// Check if those keys are already defined
bool alreadyDefined = false;
foreach (List<object> keyValue in keys.Keys)
{
if (rowKeys.SequenceEqual(keyValue))
{
alreadyDefined = true;
break;
}
}
if (alreadyDefined)
{
duplicatedRows.Add(row);
// If first duplicate for this key, add the first occurence of this key
if (!duplicatedKeys.Contains(rowKeys))
{
duplicatedKeys.Add(rowKeys);
int i = keys[keys.Keys.First(key => key.SequenceEqual(rowKeys))];
duplicatedRows.Add(table.Rows[i]);
}
}
else
{
keys.Add(rowKeys, table.Rows.IndexOf(row));
}
}
return duplicatedRows;
}
Any ideas ?
I think this is the fastest and shortest way to find duplicate rows:
For 100.000 rows it executes in about 250ms.
Main and test data:
static void Main(string[] args)
{
var dt = new DataTable();
dt.Columns.Add("Id");
dt.Columns.Add("Value1");
dt.Columns.Add("Value2");
var rnd = new Random(DateTime.Now.Millisecond);
for (int i = 0; i < 100000; i++)
{
var dr = dt.NewRow();
dr[0] = rnd.Next(1, 1000);
dr[1] = rnd.Next(1, 1000);
dr[2] = rnd.Next(1, 1000);
dt.Rows.Add(dr);
}
Stopwatch sw = new Stopwatch();
sw.Start();
var duplicates = GetDuplicateRows(dt, "Id", "Value1", "Value2");
sw.Stop();
Console.WriteLine(
"Found {0} duplicates in {1} milliseconds.",
duplicates.Count,
sw.ElapsedMilliseconds);
Console.ReadKey();
}
GetDuplicateRows with LINQ:
private static List<DataRow> GetDuplicateRows(DataTable table, params string[] keys)
{
var duplicates =
table
.AsEnumerable()
.GroupBy(dr => String.Join("-", keys.Select(k => dr[k])), (groupKey, groupRows) => new { Key = groupKey, Rows = groupRows })
.Where(g => g.Rows.Count() > 1)
.SelectMany(g => g.Rows)
.ToList();
return duplicates;
}
Explanation (for those who are new to LINQ):
The trickiest part is probably the GroupBy. Its first parameter takes a DataRow, and for each row I create a group key by joining the values of the specified key columns into a string like 1-1-2. The second parameter just projects the group key and the group rows into a new anonymous object. Then I check whether there is more than 1 row and flatten the groups back into a list with SelectMany.
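A joined string key can collide when the values themselves contain the separator: the pairs ("a-b", "c") and ("a", "b-c") both produce a-b-c. If that matters for your data, one alternative is to group on the raw values with a custom comparer, as sketched below. KeyComparer and DuplicateFinder are hypothetical names, and the hash function is just one simple choice. The original code was slow for a related reason: List<object> keys are hashed by reference, so the dictionary could not look them up by content and every row had to be compared against all earlier keys.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Compares two key arrays element by element.
public sealed class KeyComparer : IEqualityComparer<object[]>
{
    public bool Equals(object[] x, object[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(object[] key)
    {
        unchecked
        {
            int hash = 17;
            foreach (object value in key)
                hash = hash * 31 + (value == null ? 0 : value.GetHashCode());
            return hash;
        }
    }
}

public static class DuplicateFinder
{
    // Returns every row whose key column values occur more than once.
    public static List<DataRow> GetDuplicateRows(DataTable table, params string[] keys)
    {
        return table.Rows.Cast<DataRow>()
            .GroupBy(row => keys.Select(k => row[k]).ToArray(), new KeyComparer())
            .Where(g => g.Skip(1).Any()) // at least two rows in the group
            .SelectMany(g => g)
            .ToList();
    }
}
```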
Try this. It uses more LINQ, which may improve performance; also try PLINQ if possible.
Regards
private List<DataRow> GetDuplicateKeys(DataTable table, List<string> keyFields)
{
Dictionary<List<object>, int> keys = new Dictionary<List<object>, int>(); // List of key values + their index in table
List<List<object>> duplicatedKeys = new List<List<object>>(); // List of duplicated keys values
List<DataRow> duplicatedRows = new List<DataRow>(); // Rows that are duplicated
foreach (DataRow row in table.Rows)
{
// Find keys fields values for the row
List<object> rowKeys = new List<object>();
keyFields.ForEach(keyField => rowKeys.Add(row[keyField]));
// Check if those keys are already defined
bool alreadyDefined = false;
foreach (List<object> keyValue in keys.Keys)
{
if (rowKeys.SequenceEqual(keyValue)) // Any(keyValue) does not compile: Any expects a predicate, not a list
{
alreadyDefined = true;
break;
}
}
if (alreadyDefined)
{
duplicatedRows.Add(row);
// If first duplicate for this key, add the first occurence of this key
if (!duplicatedKeys.Contains(rowKeys))
{
duplicatedKeys.Add(rowKeys);
int i = keys[keys.Keys.First(key => key.SequenceEqual(rowKeys))];
duplicatedRows.Add(table.Rows[i]);
}
}
else
{
keys.Add(rowKeys, table.Rows.IndexOf(row));
}
}
return duplicatedRows;
}
