I'm writing a C# application that compares whether the results from two different selects are the same, and their execution times, for optimization purposes.
Currently I'm using a Stopwatch to get the execution time, and then I convert the OracleDataReaders into DataTables and compare rows, independently of order, like this:
var tableA = new DataTable();
tableA.Load(readerA);
var tableB = new DataTable();
tableB.Load(readerB);
if (tableA.Rows.Count != tableB.Rows.Count)
{
    return false; // different row counts can never be equal
}
bool equals = true;
for (int i = 0; i < tableA.Rows.Count; i++)
{
    if (!DataRowComparer.Default.Equals(tableA.Rows[i], tableB.Rows[i]))
    {
        equals = false;
        break;
    }
}
return equals;
But by converting the OracleDataReaders into DataTables and then comparing rows in a loop, I'm assuming both results hold the same rows in the same order.
Is there any prebuilt method in C# and Oracle to compare the results of two selects, with or without taking row order into account?
Thanks
Here is an attempt at writing a generic data comparison method for two OracleDataReaders. The code compares the readers row by row, column by column, to spot any differences. It takes into account that the readers may contain results from more than one query. The code will need to be enhanced if more complex datatypes (binary etc.) are to be compared. It also assumes that the order of the data matters; if the readers are to be considered equal even when sorted differently, the code would need to be rewritten to put rows into lists or dictionaries etc. (a sketch of that follows after the method).
private bool ReadersContainEqualData(OracleDataReader readerA, OracleDataReader readerB)
{
    bool moreResultsA = false;
    bool moreResultsB = false;
    do
    {
        if (readerA.FieldCount != readerB.FieldCount)
        {
            return false; // the readers have a different number of columns
        }
        bool hasRowA = readerA.Read();
        bool hasRowB = readerB.Read();
        while (hasRowA && hasRowB)
        {
            for (int i = 0; i < readerA.FieldCount; i++)
            {
                if (readerA.GetName(i) != readerB.GetName(i)) // different column names; remove this check if it is not important to you
                {
                    return false;
                }
                if (!Equals(readerA[i], readerB[i])) // the columns are string, numeric or boolean, so a simple Equals comparison works (note that != would compare boxed references). If more complex columns like varbinary etc. are used, this check will need to be enhanced
                {
                    return false;
                }
            }
            hasRowA = readerA.Read();
            hasRowB = readerB.Read();
        }
        if (hasRowA || hasRowB) // one of the readers still has rows while the other is exhausted
        {
            return false;
        }
        // check if the readers contain results from another query beyond the one just processed
        moreResultsA = readerA.NextResult();
        moreResultsB = readerB.NextResult();
        if (moreResultsA != moreResultsB)
        {
            return false;
        }
    } while (moreResultsA && moreResultsB);
    return true;
}
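Picking up the note above about differently sorted results: one way to compare while ignoring row order is to count how often each row occurs. Below is a minimal sketch of that idea (not part of the original answer); it serializes each row into a string key, which is only safe for simple scalar columns:
private bool ReadersContainEqualDataIgnoringOrder(OracleDataReader readerA, OracleDataReader readerB)
{
    if (readerA.FieldCount != readerB.FieldCount)
    {
        return false;
    }
    // count the occurrences of every row in A...
    var rowCounts = new Dictionary<string, int>();
    while (readerA.Read())
    {
        string key = BuildRowKey(readerA);
        rowCounts[key] = rowCounts.TryGetValue(key, out int countA) ? countA + 1 : 1;
    }
    // ...then consume the counts while reading B
    while (readerB.Read())
    {
        string key = BuildRowKey(readerB);
        if (!rowCounts.TryGetValue(key, out int left) || left == 0)
        {
            return false; // B has a row that A lacks (or has fewer of)
        }
        rowCounts[key] = left - 1;
    }
    foreach (int left in rowCounts.Values)
    {
        if (left != 0) return false; // A had rows that B lacks
    }
    return true;
}

private static string BuildRowKey(IDataRecord record)
{
    var parts = new string[record.FieldCount];
    for (int i = 0; i < record.FieldCount; i++)
    {
        parts[i] = record.IsDBNull(i) ? "<NULL>" : record.GetValue(i).ToString();
    }
    return string.Join("\u0001", parts); // separator unlikely to appear in the data
}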
Related
I am writing code that compares two columns of DGV roles. The first DGV (DGV1) has the raw data with duplicate roles, and the second DGV (DGV4) is a dictionary with all existing roles (no duplicates). The code has to go through each row of the dictionary, and if the role exists in DGV1 it should be removed from the dictionary, leaving in the dictionary only the roles that are not currently being used in the raw data. My code is removing the roles, but when the dictionary has a value that doesn't exist in DGV1 it stops working (DGV1 continues to loop until it hits an index error). Any suggestions?
NOTE: When a row is removed, the remaining dictionary rows automatically move up to the first index, so there is no need to increment int i.
int eliminado = 0;
int filasDGV1 = dataGridView1.Rows.Count;
int filasDGV4 = dataGridView4.Rows.Count;
int i = 0;
int j = 0;
do
{
    string perfilVacio = dataGridView4["GRANTED_ROLE", i].Value.ToString();
    string perfiles = dataGridView1["GRANTED_ROLE", j].Value.ToString();
    if (perfiles != perfilVacio)
    {
        j++;
    }
    else if (perfiles == perfilVacio)
    {
        dataGridView4.Rows.RemoveAt(i);
    }
}
while (eliminado <= filasDGV4);
The first Excel screenshot is DGV1 and the other is DGV4; I highlighted where the code is currently looping.
The orange highlight is where the loop advances in DGV1, but the value doesn't exist in the dictionary, so it gets stuck there.
Change your loop condition to include a test for the changing index j and also to check whether there are rows left to be eliminated.
int filasDGV1 = dataGridView1.Rows.Count;
int j = 0;
while (j < filasDGV1 && dataGridView4.Rows.Count > 0)
{
    string perfilVacio = dataGridView4["GRANTED_ROLE", 0].Value.ToString();
    string perfiles = dataGridView1["GRANTED_ROLE", j].Value.ToString();
    if (perfiles == perfilVacio)
    {
        dataGridView4.Rows.RemoveAt(0);
    }
    else
    {
        j++;
    }
}
If you test perfiles != perfilVacio in the if, you don't have to test perfiles == perfilVacio in the else if, because that is automatically the case. Either they are equal or they are not; there is no other possibility.
Also, it is generally more readable to ask a positive question in the if, like ==, instead of a negative one like !=.
Since i is always 0, I replaced it with the constant 0. The variable eliminado is not required (unless it is incremented when rows are removed, to display the number of deleted rows).
The number of rows in dataGridView4 should not be stored in filasDGV4 as this number changes.
Update
According to your comments and the new screenshots, you need two loops. (The code above only works if both lists are sorted). We could use two nested loops; however, this is slow. Therefore, I suggest collecting the unwanted roles in a HashSet<string> first. Testing whether an item is in a HashSet is extremely fast. Then we can loop through the rows of the dictionary and delete the unwanted ones.
var unwanted = new HashSet<string>();
for (int i = 0; i < dataGridView1.Rows.Count; i++)
{
    unwanted.Add(dataGridView1["GRANTED_ROLE", i].Value.ToString());
}
int row = 0;
while (row < dataGridView4.Rows.Count)
{
    string perfilVacio = dataGridView4["GRANTED_ROLE", row].Value.ToString();
    if (unwanted.Contains(perfilVacio))
    {
        dataGridView4.Rows.RemoveAt(row);
    }
    else
    {
        row++;
    }
}
Suggestion: Using data binding to bind your DataGridViews to generic lists would enable you to work on these lists instead of working on the DGVs. This would simplify the data handling considerably.
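A minimal sketch of that suggestion, reusing the unwanted set from the code above (RoleRow is a hypothetical type for this example; BindingList lives in System.ComponentModel):
// hypothetical row type; one public property per grid column
public class RoleRow
{
    public string GRANTED_ROLE { get; set; }
}

// bind once; the grid then mirrors every change made to the list
var roles = new BindingList<RoleRow>();
dataGridView4.DataSource = roles;

// removing the unwanted roles becomes a plain list operation
for (int i = roles.Count - 1; i >= 0; i--)
{
    if (unwanted.Contains(roles[i].GRANTED_ROLE))
    {
        roles.RemoveAt(i);
    }
}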
I have two variants of code:
first:
struct pair_fiodat { public string fio; public string dat; }
List<pair_fiodat> list_fiodat = new List<pair_fiodat>();
// list filled with 200,000 records, omitted.
foreach (string fname in XML_files)
{
    // get FullName and BirthDate from file. Omitted.
    var usersLookUp = list_fiodat.ToLookup(u => u.fio, u => u.dat); // create map
    var dates = usersLookUp[FullName];
    if (dates.Count() > 0)
    {
        foreach (var dt in dates)
        {
            if (dt == BirthDate) return true;
        }
    }
}
and second:
struct pair_fiodat { public string fio; public string dat; }
List<pair_fiodat> list_fiodat = new List<pair_fiodat>();
// list filled with 200,000 records, omitted.
foreach (string fname in XML_files)
{
    // get FullName and BirthDate from file. Omitted.
    var members = from s in list_fiodat where s.fio == FullName && s.dat == BirthDate select s;
    if (members.Count() > 0) return true;
}
They do the same job: searching for a user by name and birthday.
The first one works very quickly.
The second one is very slow (10x-50x slower).
Please tell me, is it possible to accelerate the second one?
I mean, maybe the list needs some special preparation?
I tried sorting: list_fiodat_sorted = list_fiodat.OrderBy(x => x.fio).ToList();, but...
I skip your first test and change Count() to Any() (Count() iterates the whole list while Any() stops as soon as there is an element):
public bool Test1(List<pair_fiodat> list_fiodat)
{
    foreach (string fname in XML_files)
    {
        var members = from s in list_fiodat
                      where s.fio == fname && s.dat == BirthDate
                      select s;
        if (members.Any())
            return true;
    }
    return false;
}
If you want to optimize something, you must give up the comfortable things the language offers you, because these things are usually not free; they have a cost.
For example, for is faster than foreach. It is a bit uglier and you need an extra statement to get the variable, but it is faster. If you iterate a very big collection, every iteration adds up.
LINQ is very powerful and it's wonderful to work with, but it has a cost. If you replace it with another for, you save time.
public bool Test2(List<pair_fiodat> list_fiodat)
{
    for (int i = 0; i < XML_files.Count; i++)
    {
        string fname = XML_files[i];
        for (int j = 0; j < list_fiodat.Count; j++)
        {
            var s = list_fiodat[j];
            if (s.fio == fname && s.dat == BirthDate)
            {
                return true;
            }
        }
    }
    return false;
}
With normal collections there is no difference and you would usually use foreach, LINQ, etc., but in extreme cases you must go to a lower level.
In your first test, ToLookup is the key. It takes a long time. Think about this: you are iterating your whole list, creating and filling the map. That is bad in any case, but consider the case where the item you are looking for is at the start of the list: you would only need a few iterations to find it, yet you spend time on every item of your list creating the map. Only in the worst case is the time similar, and it is always worse with the map creation because of the cost of the creation itself.
The map is interesting if, for example, you need all the items that match some condition, getting a list instead of finding a single item. You spend time creating the map once, but you use the map many times, and each time you save time (the map is "direct access" versus the for loop, which is "sequential").
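Applied to the first variant above, this means hoisting the ToLookup out of the loop, so the map is built once and then reused for every file. A minimal sketch using the question's own names (file parsing still omitted):
// build the map once, before iterating the files
var usersLookUp = list_fiodat.ToLookup(u => u.fio, u => u.dat);
foreach (string fname in XML_files)
{
    // get FullName and BirthDate from file. Omitted.
    foreach (var dt in usersLookUp[FullName])
    {
        if (dt == BirthDate) return true;
    }
}
return false;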
Problem Summary:
We have a set of databases on multiple servers that "should" all have the same SQL objects. Over the years, our developers have added/modified objects in various databases such that they no longer match. I need to obtain a list of all SQL objects (tables, views, stored procedures, user defined functions) from multiple databases on multiple servers that are exactly the SAME (later I'll also need the list of unique items, and later a list of modified items). My current solution works but is pretty slow. I wanted to know if there was a better existing alternative, but I can't find one.
Current Solution:
For now I have been using SMO in C# to get the urns of all objects and script them. When I attempt to script them 1 object at a time, the process is slow (lots of calls to the server). If I try to script them by packing their urns into an array, the process is faster but I just get an Enumerable or StringCollection of the resulting scripts without organization as to which object the script came from, etc. What would be a better way to approach this (I know of existing tools such as ApexSQL or Red-Gate, they are out of the question for the moment). My current solution is to group them up by names (and split by server) and script them in those smaller by-name batches.
Excuse my current code, I've been all over the place trying different methods. Maybe there is a solution that doesn't even need to analyse the code. Two things to note:
I have a class called SqlObjectInfo which stores just some basic info on each object such as: Name, Server, DB, Schema, Type, Urn
items is a SqlObjectInfoCollection, a class that contains a list of SqlObjectInfo plus some helper functions to add objects from servers and databases. Filling this collection with all of the SqlObjectInfos is fast, so that's not the problem.
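For context, a hypothetical shape of that class, reconstructed purely from the description above (the real one may differ):
// reconstructed sketch; property names follow the description in the question
public class SqlObjectInfo
{
    public string Name { get; set; }
    public Server Server { get; set; }   // Microsoft.SqlServer.Management.Smo.Server
    public string Database { get; set; }
    public string Schema { get; set; }
    public string Type { get; set; }
    public Urn Urn { get; set; }         // Microsoft.SqlServer.Management.Sdk.Sfc.Urn
}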
//Create DataTable
var table = new DataTable("Equal Objects");
table.Columns.Add("Name");
table.Columns.Add("Type");
//Create DataRows
int dbCount = items.SqlObjects.GroupBy(obj => obj.Database).Count();
DMP dmp = DiffMatchPatchModule.Default;
var rows = new List<DataRow>();
foreach (IGrouping<string, SqlObjectInfo> nameGroup in items.SqlObjects.GroupBy(obj => obj.Name))
{
    var likeNamedObjs = nameGroup.ToList();
    if (likeNamedObjs.Count != dbCount)
    {
        continue; //object not in all databases
    }
    //Script Objects
    var rawScripts = new List<string>();
    bool scriptingSucceeded = true;
    foreach (IGrouping<Server, SqlObjectInfo> serverGroup in nameGroup.GroupBy(obj => obj.Server))
    {
        Server server = serverGroup.Key;
        Urn[] urns = serverGroup.Select(obj => obj.Urn).ToArray();
        var scripter = new Scripter(server)
        {
            Options = items.ScriptingOptions
        };
        IEnumerable<string> results;
        try
        {
            results = scripter.EnumScript(urns);
        }
        catch (FailedOperationException)
        {
            scriptingSucceeded = false;
            break; //the object is probably encrypted
        }
        rawScripts.AddRange(results);
    }
    if (!scriptingSucceeded)
    {
        continue;
    }
    if (rawScripts.Count % nameGroup.Count() != 0)
    {
        continue;
    }
    var allScripts = new List<string>();
    int stringsPerScript = rawScripts.Count / nameGroup.Count();
    for (int i = 0; i < rawScripts.Count; i += stringsPerScript) //0, 3, 6, 9
    {
        IEnumerable<string> scriptParts = rawScripts.Skip(i).Take(stringsPerScript);
        allScripts.Add(string.Join(Environment.NewLine, scriptParts));
    }
    //Compare Scripts
    bool allEqual = true;
    for (int i = 1; i < allScripts.Count; i++)
    {
        (string lineScript0, string lineScriptCurr, _) = dmp.DiffLinesToChars(allScripts[0], allScripts[i]).ToValueTuple();
        List<Diff> diffs = dmp.DiffMain(lineScript0, lineScriptCurr, false);
        if (!diffs.TrueForAll(diff => diff.Operation.IsEqual))
        {
            allEqual = false;
            break; //scripts not equal
        }
    }
    //If all scripts are equal, create data row for object
    if (allEqual)
    {
        DataRow row = table.NewRow();
        row["Name"] = likeNamedObjs[0].Name;
        row["Type"] = likeNamedObjs[0].Type;
        rows.Add(row);
    }
}
//Add DataRows to DataTable
foreach (DataRow row in rows.OrderBy(r => r["Type"]).ThenBy(r => r["Name"]))
{
    table.Rows.Add(row);
}
//Write DataTable to csv
var builder = new StringBuilder();
builder.AppendLine(string.Join(",", table.Columns.Cast<DataColumn>().Select(col => col.ColumnName)));
foreach (DataRow row in table.Rows)
{
    builder.AppendLine(string.Join(",", row.ItemArray.Select(field => field.ToString())));
}
File.WriteAllText("equalObjects.csv", builder.ToString());
The code works. I can get my expected resulting csv file of (Name|Type) of all the objects that are exactly the same in all DB's across multiple servers. It's just so darn slow. Am I approaching this the right way? Is there a better/more modern solution?
There are system tables in all databases that list their objects; in SQL Server it is sys.objects (sysobjects on older versions). First you need to create a master list: you could UNION this view from all DBs and do a DISTINCT. Then outer join that master list to each DB's sys.objects.
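A rough sketch of how such a master-list query could be assembled from C#. The database names and the is_ms_shipped filter are assumptions for the example, and note this only reports which objects exist in every database; comparing the definitions themselves still needs the scripting/diff step:
var databases = new[] { "DbA", "DbB", "DbC" }; // hypothetical database names
// one SELECT per database over its sys.objects, glued together with UNION ALL
string unionAll = string.Join(
    "\nUNION ALL\n",
    databases.Select(db =>
        $"SELECT name, type_desc FROM [{db}].sys.objects WHERE is_ms_shipped = 0"));
// keep only the (name, type) pairs that occur in every database
string query =
    "SELECT name, type_desc FROM (\n" + unionAll + "\n) AS all_objs\n" +
    "GROUP BY name, type_desc\n" +
    $"HAVING COUNT(*) = {databases.Length}";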
In the code below I want to compare the values of a column from two datasets, but even when they don't match, this condition still evaluates to true. So how do I really compare them?
if (dsEmp.Tables[0].Columns["EmpName"].ToString() == dsAllTables.Tables[2].Columns["EmpName"].ToString())
{
}
You are comparing two column-names, so "EmpName" with "EmpName" which is always true. Tables[0].Columns["EmpName"] returns a DataColumn with that name and ToString returns the name of the column which is "EmpName". So that's pointless.
If you instead want to know if two tables contain the same EmpName value in one of their rows you can use LINQ:
var empRowsEmpName = dsEmp.Tables[0].AsEnumerable().Select(r => r.Field<string>("EmpName"));
var allRowsEmpName = dsAllTables.Tables[2].AsEnumerable().Select(r => r.Field<string>("EmpName"));
IEnumerable<string> allIntersectingEmpNames = empRowsEmpName.Intersect(allRowsEmpName);
if (allIntersectingEmpNames.Any())
{
}
Now you even know which EmpName values are contained in both tables. You could use a foreach-loop:
foreach (string empName in allIntersectingEmpNames)
    Console.WriteLine(empName);
If you want to find out if a specific value is contained in both:
bool containsName = allIntersectingEmpNames.Contains("SampleName");
If you just want to get the first matching:
string firstIntersectingEmpName = allIntersectingEmpNames.FirstOrDefault();
if(firstIntersectingEmpName != null){
// yes, there was at least one EmpName that was in both tables
}
If you have a single row, this should work:
if (dsEmp.Tables[0].Rows[0]["EmpName"].ToString() == dsAllTables.Tables[2].Rows[0]["EmpName"].ToString())
{
}
For multiple rows you have to iterate through the tables:
for (int i = 0; i < dsEmp.Tables[0].Rows.Count; i++)
{
    for (int j = 0; j < dsAllTables.Tables[2].Rows.Count; j++)
    {
        if (dsEmp.Tables[0].Rows[i]["EmpName"].ToString() == dsAllTables.Tables[2].Rows[j]["EmpName"].ToString())
        {
        }
    }
}
I have two datatables - dtbl and mtbl, and I use this to return records that have a difference, as another DataTable.
//compare the two datatables and output any differences into a new datatable, to return
var differences = dtbl.AsEnumerable().Except(mtbl.AsEnumerable(), DataRowComparer.Default);
return differences.Any() ? differences.CopyToDataTable() : new DataTable();
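Note that Except is one-directional: the call above only yields rows of dtbl that are missing from mtbl. If rows existing only in mtbl should also count as differences, you would run it the other way around too, along these lines (assuming both tables share the same schema):
var inDtblOnly = dtbl.AsEnumerable().Except(mtbl.AsEnumerable(), DataRowComparer.Default);
var inMtblOnly = mtbl.AsEnumerable().Except(dtbl.AsEnumerable(), DataRowComparer.Default);
// rows from both tables combined; same schema, so they can still be copied out together
var allDifferences = inDtblOnly.Concat(inMtblOnly);
return allDifferences.Any() ? allDifferences.CopyToDataTable() : new DataTable();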
What I'm trying to do: I have a large DataTable, and I'm going through a list of strings where some of them are in the DataTable and some aren't. I need to make a list of those that are, and count those that aren't.
This is my code part:
DataRow[] foundRows;
foundRows = DTgesamt.Select("SAP_NR like '%"+SAP+"%'");
if (AreAllCellsEmpty(foundRows[0]) == false && !(foundRows[0] == null))
{
    list.Add(SAP);
}
else
{
    notfound++;
}
public static bool AreAllCellsEmpty(DataRow row)
{
    if (row == null) throw new ArgumentNullException("row");
    for (int i = row.Table.Columns.Count - 1; i >= 0; i--)
    {
        if (!row.IsNull(i))
        {
            return false;
        }
    }
    return true;
}
DTgesamt is a large DataTable. "SAP" is a string that appears in the first column of the DataTable, but not all of them are included. I want to count the unfound ones with the int notfound.
The problem is that Select returns an empty DataRow array, {System.Data.DataRow[0]}, when it finds nothing.
I'm getting the error message "Index was outside the bounds of the array."
The two conditions in the if clause are what I read on the internet, but they don't work. With only the 2nd condition it just adds all numbers to the list; with the first it still gives this error.
Thanks for any help :)
Check the count of items in the foundRows array to avoid the IndexOutOfRange exception:
foundRows = DTgesamt.Select("SAP_NR like '%"+SAP+"%'");
if (foundRows.Length > 0 && AreAllCellsEmpty(foundRows[0]) == false)
    list.Add(SAP);
else
    notfound++;
The found cells cannot be empty, otherwise your select statement would be wrong. So all you actually need is:
if (DTgesamt.Select("SAP_NR like '%"+SAP+"%'").Any())
{
    list.Add(SAP);
}
else
{
    notfound++;
}
You probably don't even need the counter, since you can calculate the missed records from how many SAP numbers you had and how many results ended up in list.
If you have an original list or array of SAP numbers, you could shorten your whole loop to:
var numbersInTable = originalNumbers.Where(sap => DTgesamt.Select("SAP_NR like '%"+sap+"%'").Any()).ToList();
var notFound = originalNumbers.Count - numbersInTable.Count;
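One caveat with both versions: the filter string is built by concatenation, so a SAP value containing characters that are special in DataTable filter expressions (%, *, [, ') would break or distort the LIKE. A sketch of an alternative that avoids the filter syntax by using LINQ over the rows (assuming SAP_NR is a string column; AsEnumerable and Field come from System.Data.DataSetExtensions):
var numbersInTable = originalNumbers
    .Where(sap => DTgesamt.AsEnumerable()
        .Any(row => row.Field<string>("SAP_NR")?.Contains(sap) == true))
    .ToList();
var notFound = originalNumbers.Count - numbersInTable.Count;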