Very slow C#/LINQ loop, need improvement suggestions - C#

I have the following piece of code that I use to check whether copying data from one table to another missed some records.
There are reasons why this can happen, but I won't go into the details here.
Fortunately, this code runs against a few hundred records at a time, so I can allow myself to load them into memory and use LINQ to Objects.
As I expected, my code is very slow, and I'm wondering if anyone could suggest any way to improve the speed.
void Main()
{
    var crossed_data = from kv in key_and_value_table
                       from ckv in copy_of_key_and_value_table
                       where kv.Key != ckv.Key
                       select new { KeyTable = kv, copyKeyTable = ckv };

    List<Key_and_value> difference = new List<Key_and_value>();
    foreach (var v in crossed_data)
    {
        if (crossed_data.Select(s => s.KeyTable.Key).ToList()
                        .Contains(v.copyKeyTable.Key) == false)
        {
            difference.Add(v.copyKeyTable);
        }
    }
}
public class Key_and_value
{
    public string Key { get; set; }
    public decimal Value { get; set; }
}
many thanks in advance
B

You are doing your Select on every iteration when you do not need to. You can move it outside the loop, like so:
var keys = crossed_data.Select(s => s.copyKeyTable.Key).ToList();
foreach (var v in crossed_data)
{
    if (keys.Contains(v.KeyTable.Key) == false)
    {
        difference.Add(v.KeyTable);
    }
}
This should improve the speed a fair bit.
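Going a step further: List&lt;T&gt;.Contains is still a linear scan, so the loop above remains roughly O(n²). A minimal sketch of the same comparison with a HashSet&lt;string&gt; (assuming records are identified by Key alone, and skipping the cross join entirely):
var copiedKeys = new HashSet<string>(copy_of_key_and_value_table.Select(ckv => ckv.Key)); // HashSet lookups are O(1)

// Records whose key is missing from the copy were not copied.
var difference = key_and_value_table
    .Where(kv => !copiedKeys.Contains(kv.Key))
    .ToList();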

Related

Best Way to compare 1 million List of object with another 1 million List of object in c#

I am diffing a list of one million objects against another list of one million objects.
I am using for and foreach, but it takes too much time to iterate those lists.
Can anyone suggest the best way to do this?
var SourceList = new List<object>(); // one million
var TargetList = new List<object>(); // one million

// getting data from database here
// SourceList with list of one million
// TargetList with list of one million

var DifferentList = new List<object>();

// ForEach
SourceList.ToList().ForEach(m =>
{
    if (!TargetList.Any(s => s.Name == m.Name))
        DifferentList.Add(m);
});

// for
for (int i = 0; i < SourceList.Count; i++)
{
    if (!TargetList.Any(s => s.Name == SourceList[i].Name))
        DifferentList.Add(SourceList[i]);
}
It may seem like a bad idea, but IEnumerable magic will help you.
For starters, simplify your expression so it looks like this:
var result = sourceList.Where(s => !targetList.Any(t => t.Equals(s)));
I recommend making a comparison in the Equals method:
public class CompareObject
{
    public string prop { get; set; }

    // Note: 'new' hides Object.Equals rather than overriding it; it is only
    // picked when the compile-time type is CompareObject, as in the lambdas below.
    public new bool Equals(object o)
    {
        if (o.GetType() == typeof(CompareObject))
            return this.prop == ((CompareObject)o).prop;
        return this.GetHashCode() == o.GetHashCode();
    }
}
Next, add AsParallel(). This can both speed up and slow down your program. In your case, you can add it like this:
var result = sourceList.AsParallel().Where(s => !targetList.Any(t => t.Equals(s)));
The CPU is 100% loaded if you try to list everything at once, like this:
var cnt = result.Count();
But it is quite tolerable if you take the results in small portions:
result.Skip(10000).Take(10000).ToList();
Full code:
static Random random = new Random();

public class CompareObject
{
    public string prop { get; private set; }

    public CompareObject()
    {
        prop = random.Next(0, 100000).ToString();
    }

    public new bool Equals(object o)
    {
        if (o.GetType() == typeof(CompareObject))
            return this.prop == ((CompareObject)o).prop;
        return this.GetHashCode() == o.GetHashCode();
    }
}
void Main()
{
    var sourceList = new List<CompareObject>();
    var targetList = new List<CompareObject>();
    for (int i = 0; i < 10000000; i++)
    {
        sourceList.Add(new CompareObject());
        targetList.Add(new CompareObject());
    }

    var stopWatch = new Stopwatch();
    stopWatch.Start();

    var result = sourceList.AsParallel().Where(s => !targetList.Any(t => t.Equals(s)));
    var lr = result.Skip(10000).Take(10000).ToList();

    stopWatch.Stop();
    Console.WriteLine(stopWatch.Elapsed);
}
Update
I remembered that you can use a Hashtable. Take the unique values from targetList and from sourceList, then fill the result with the values that are not in targetList.
Example:
static Random random = new Random();

public class CompareObject
{
    public string prop { get; private set; }

    public CompareObject()
    {
        prop = random.Next(0, 1000000).ToString();
    }

    public new int GetHashCode()
    {
        return prop.GetHashCode();
    }
}
void Main()
{
    var sourceList = new List<CompareObject>();
    var targetList = new List<CompareObject>();
    for (int i = 0; i < 10000000; i++)
    {
        sourceList.Add(new CompareObject());
        targetList.Add(new CompareObject());
    }

    var stopWatch = new Stopwatch();
    stopWatch.Start();

    var sourceHashtable = new Hashtable();
    var targetHashtable = new Hashtable();

    foreach (var element in targetList)
    {
        var hash = element.GetHashCode();
        if (!targetHashtable.ContainsKey(hash))
            targetHashtable.Add(hash, element);
    }

    var result = new List<CompareObject>();
    foreach (var element in sourceList)
    {
        var hash = element.GetHashCode();
        if (!sourceHashtable.ContainsKey(hash))
        {
            sourceHashtable.Add(hash, element);
            if (!targetHashtable.ContainsKey(hash))
            {
                result.Add(element);
            }
        }
    }

    stopWatch.Stop();
    Console.WriteLine(stopWatch.Elapsed);
}
Scanning the target list to match the name is an O(n) operation, thus your loop is O(n^2). If you build a HashSet<string> of all the distinct names in the target list, you can check whether a name exists in the set in O(1) time using the Contains method.
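As a minimal sketch of that suggestion (assuming the lists are declared with a concrete type that exposes a Name property, rather than List&lt;object&gt;):
// Build the set of distinct target names once: O(n).
var targetNames = new HashSet<string>(TargetList.Select(t => t.Name));

// Each Contains check is O(1), so the whole diff becomes O(n) instead of O(n^2).
var differentList = SourceList
    .Where(s => !targetNames.Contains(s.Name))
    .ToList();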
//getting data from database here
You are getting the data out of a system that specializes in matching, sorting and filtering data, and into your RAM, which by default cannot do any of that. And then you try to sort, filter and match it yourself.
That will fail. No matter how hard you try, it is extremely unlikely that your computer, with a single programmer working on a matching algorithm, will outperform your specialized database server at the one operation this software is supposed to be really good at, software that was programmed by teams of experts and optimized for years.
You don't go into a fancy restaurant and ask them to give you huge bags of raw ingredients so you can throw them into a big bowl unpeeled and microwave them at home. No. You order a nice dish because it will be way better than anything you could do yourself.
The simple answer is: do not do that. Do not take the raw data and rummage around in it for hours. Leave that job to the database; it's the one thing it is supposed to be good at. Use its power. Write a query that will give you the result; don't fetch the raw data and then play database yourself.
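As a hedged illustration of that advice for the diff scenario above (the connection string, table and column names here are made up, not from the question), the set difference can be computed entirely by the database:
using System;
using System.Data.SqlClient;

// Hypothetical table/column names; the point is that the comparison
// happens inside the database, not in C#.
const string sql = @"
    SELECT s.Name
    FROM   SourceTable s
    WHERE  NOT EXISTS (SELECT 1 FROM TargetTable t WHERE t.Name = s.Name);";

using (var connection = new SqlConnection(connectionString)) // placeholder connection string
using (var command = new SqlCommand(sql, connection))
{
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
            Console.WriteLine(reader.GetString(0)); // names missing from the target table
    }
}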
Foreach performs a null check before each iteration, so using a standard for loop will provide slightly better performance that will be hard to beat.
If it is taking too long, can you break down the collection into smaller sets and/or process them in parallel?
Also, you could look at PLINQ (Parallel LINQ) using .AsParallel().
Other areas to improve are the actual comparison logic you are using and how the data is stored in memory; depending on your problem, you may not have to load the entire object into memory for every iteration.
Please provide a code example so that we can assist further; when such large amounts of data are involved, some performance degradation is to be expected.
Again, depending on the times we are talking about here, you could upload the data into a database and use that for the comparison rather than trying to do it natively in C#. This type of solution is better suited to data sets that are already in a database, or where the data changes much less frequently than you need to perform the comparison.

Grouping and sum

I have a list that will contain the following POCO class.
public class BoxReportView
{
    public DateTime ProductionPlanWeekStarting { get; set; }
    public DateTime ProductionPlanWeekEnding { get; set; }
    public string BatchNumber { get; set; }
    public string BoxRef { get; set; }
    public string BoxName { get; set; }
    public decimal Qty { get; set; }
    public FUEL_KitItem KitItem { get; set; }
    public decimal Multiplier { get; set; }
}
I want to group the report by BoxName and sum the Qty, so I tried the following:
var results = from line in kitItemsToGroup
              group line by line.BoxName into g
              select new BoxReportView
              {
                  BoxRef = g.First().BoxRef,
                  BoxName = g.First().BoxName,
                  Qty = g.Count()
              };
In my old report I was just doing this:
var multiplier = finishedItem.SOPOrderReturnLine.LineQuantity -
                 finishedItem.SOPOrderReturnLine.StockUnitDespatchReceiptQuantity;

foreach (KitItem kItem in kitItems.Cast<KitItem>().Where(z => z.IsBox == true).ToList())
{
    kitItemsToGroup.Add(new BoxReportView()
    {
        BatchNumber = _batchNumber,
        ProductionPlanWeekEnding = _weekEndDate,
        ProductionPlanWeekStarting = _weekStartDate,
        BoxRef = kItem.StockCode,
        KitItem = kItem,
        Multiplier = multiplier,
        Qty = kItem.Qty
    });
}
Then I was just returning
return kitItemsToGroup;
But as I am using a var, I am not sure what the best way is to handle the grouping and the sum by box name and qty.
Whether it is the best way depends upon your priorities. Is processing speed important, or is it more important that the code is easy to understand, easy to test, easy to change and easy to debug?
One of the advantages of LINQ is, that it tries to avoid enumeration of the source more than necessary.
Are you sure that the users of this code will always need the complete collection? Could it be that, now or in the near future, someone only wants the first element? Or decides to stop enumerating after fetching the 20th element and seeing that there was nothing of interest?
When using LINQ, try to return IEnumerable<...> as long as possible. Let only the end-user who will interpret your LINQed data decide whether he wants to take only the FirstOrDefault(), or Count() everything, or put it in a Dictionary, or whatever. It is a waste of processing power to create a List if it is not going to be used as a List.
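A small sketch of that point, using a hypothetical helper over the BoxReportView items from the question (HeavyBoxNames and the Qty threshold are made up for illustration):
// Deferred: the Where/Select only run as far as the caller enumerates.
IEnumerable<string> HeavyBoxNames(IEnumerable<BoxReportView> items) =>
    items.Where(item => item.Qty > 100)
         .Select(item => item.BoxName);

// The caller decides how much work is actually done:
var first = HeavyBoxNames(kitItemsToGroup).FirstOrDefault(); // stops at the first match
var all   = HeavyBoxNames(kitItemsToGroup).ToList();         // enumerates everything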
Your LINQ code and your foreach do some completely different things. Alas, it is quite common here on StackOverflow for people to ask for LINQ statements without really specifying their requirements, so I'll have to guess something in between your LINQ statement and your foreach.
Requirement: group the input sequence of kitItems, which are expected to be Fuel_KitItems, into groups of BoxReportViews with the same BoxName, and select several properties from every Fuel_KitItem in each group.
var kitItemGroups = kitItems
.Cast<Fuel_KitItem>() // only needed if kitItems is not IEnumerable<Fuel_KitItem>
// make groups of Fuel_KitItems with same BoxName:
.GroupBy(fuelKitItem => fuelKitItem.BoxName,
// ResultSelector, take the BoxName and all fuelKitItems with this BoxName:
(boxName, fuelKitItemsWithThisBoxName) => new
{
// Select only the properties you plan to use:
BoxName = boxName,
FuelKitItems = fuelKitItemsWithThisBoxName.Select(fuelKitItem => new
{
// Only Select the properties that you plan to use
BatchNumber = fuelKitItem.BatchNumber,
Qty = fuelKitItem.Qty,
...
// Not needed, they are all equal to boxName:
// BoxName = fuelKitItem.BoxName
})
// only do ToList if you are certain that the user of the result
// will need the complete list of fuelKitItems in this group
.ToList(),
});
Usage:
var kitItemGroups = ...
// I only need the KitItemGroups with a BoxName starting with "A"
var result1 = kitItemGroups.Where(group => group.BoxName.StartsWith("A"))
.ToList();
// Or I only want the first three after sorting by group size
var result2 = kitItemGroups.OrderBy(group => group.FuelKitItems.Count())
.Take(3)
.ToList();
Efficiency improvements: as long as you don't know how your LINQ will be used, don't make it a List. If you know that chances are high that the Count of group.FuelKitItems will be needed, do a ToList.
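For the "sum by BoxName" that the question actually asks about, a minimal sketch (assuming a materialized List&lt;BoxReportView&gt; really is what the caller needs) could be:
var results = kitItemsToGroup
    .GroupBy(line => line.BoxName)
    .Select(g => new BoxReportView
    {
        BoxName = g.Key,
        BoxRef  = g.First().BoxRef,
        Qty     = g.Sum(line => line.Qty)   // sum the quantities instead of counting rows
    })
    .ToList();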

How to fetch, process and save a huge record set in C# efficiently?

I am trying to achieve the following:
Get the data from the SQL DB.
Pass the data to the PerformStuff method, which uses a third-party method
MethodforResponse (it checks the input and provides a response).
Save the response (XML) back to the SQL DB.
Below is the sample code. Performance-wise it is not good; if there are 1,000,000 records in the DB it is very slow.
Is there a better way of doing it? Any ideas or hints to make it better?
Please help.
using thirdpartylib;
public class Program
{
static void Main(string[] args)
{
var response = PerformStuff();
Save(response);
}
public class TestRequest
{
public int col1 { get; set; }
public bool col2 { get; set; }
public string col3 { get; set; }
public bool col4 { get; set; }
public string col5 { get; set; }
public bool col6 { get; set; }
public string col7 { get; set; }
}
public class TestResponse
{
public int col1 { get; set; }
public string col2 { get; set; }
public string col3 { get; set; }
public int col4 { get; set; }
}
public TestRequest GetDataId(int id)
{
TestRequest testReq = null;
try
{
SqlCommand cmd = DB.GetSqlCommand("proc_name");
cmd.AddInSqlParam("#Id", SqlDbType.Int, id);
SqlDataReader dr = new SqlDataReader(DB.GetDataReader(cmd));
while (dr.Read())
{
testReq = new TestRequest();
testReq.col1 = dr.GetInt32("col1");
testReq.col2 = dr.GetBoolean("col2");
testReq.col3 = dr.GetString("col3");
testReq.col4 = dr.GetBoolean("col4");
testReq.col5 = dr.GetString("col5");
testReq.col6 = dr.GetBoolean("col6");
testReq.col7 = dr.GetString("col7");
}
dr.Close();
}
catch (Exception ex)
{
throw;
}
return testReq;
}
public static TestResponse PerformStuff()
{
var response = new TestResponse();
//give ids in list
var ids = thirdpartylib.Methodforid();
foreach (int id in ids)
{
var request = GetDataId(id);
var output = thirdpartylib.MethodforResponse(request);
foreach (var data in output.Elements())
{
response.col4 = Convert.ToInt32(data.Id().Class());
response.col2 = data.Id().Name().ToString();
}
}
//request details
response.col1 = request.col1;
response.col2 = request.col2;
response.col3 = request.col3;
return response;
}
public static void Save(TestResponse response)
{
var Sb = new StringBuilder();
try
{
Sb.Append("<ROOT>");
Sb.Append("<id");
Sb.Append(" col1='" + response.col1 + "'");
Sb.Append(" col2='" + response.col2 + "'");
Sb.Append(" col3='" + response.col3 + "'");
Sb.Append(" col4='" + response.col4 + "'");
Sb.Append("></Id>");
Sb.Append("</ROOT>");
var cmd = DB.GetSqlCommand("saveproc");
cmd.AddInSqlParam("#Data", SqlDbType.VarChar, Sb.ToString());
DB.ExecuteNoQuery(cmd);
}
catch (Exception ex)
{
throw;
}
}
}
Thanks!
I think the root of your problem is that you get and insert data record-by-record. There is no possible way to optimize it. You need to change the approach in general.
You should think of a solution that:
1. Gets all the data in one command to the database.
2. Process it.
3. Save it back to the database in one command, using a technique like BULK INSERT. Please be aware that BULK INSERT has certain limitations, so read the documentation carefully.
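A minimal sketch of step 3 from the C# side, using SqlBulkCopy over a DataTable (the connection string and the destination table/column names are placeholders, not from the original question):
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static void BulkSave(IEnumerable<TestResponse> responses, string connectionString)
{
    // Stage the rows in a DataTable whose columns match the target table.
    var table = new DataTable();
    table.Columns.Add("col1", typeof(int));
    table.Columns.Add("col2", typeof(string));
    table.Columns.Add("col3", typeof(string));
    table.Columns.Add("col4", typeof(int));

    foreach (var r in responses)
        table.Rows.Add(r.col1, r.col2, r.col3, r.col4);

    using (var connection = new SqlConnection(connectionString))
    using (var bulk = new SqlBulkCopy(connection))
    {
        connection.Open();
        bulk.DestinationTableName = "dbo.ResponseTable"; // placeholder table name
        bulk.WriteToServer(table);                       // one round trip for all rows
    }
}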
Your question is very broad, and the method PerformStuff() will be fundamentally slow because it pays O(n) * db_lookup_time before producing the next piece of output. So, to me it seems you're going about this problem the wrong way.
Database query languages are made to optimize data traversal, so iterating by id and then checking values works around this and produces the slowest lookup time possible.
Instead, leverage SQL's powerful query language and use clauses like where id < 10 and value > 100, because you ultimately want to limit the size of the data set that needs to be processed by C#.
So:
Read just the smallest amount of data you need from the DB.
Process this data as a unit; parallelism might help.
Write back modifications in one DB connection.
Hope this sets you in the right direction.
Based on your comment, there are multiple things you can enhance in your solution, from memory consumption to CPU usage.
Take advantage of paging at the database level. Do not fetch all records at once; to avoid memory leaks and/or high memory consumption with 1+ million records, take them chunk by chunk and do whatever you need to do with each chunk.
Since you don't need to save the XML into a database, you can choose to save the response to a file instead. Saving the XML to a file gives you an opportunity to stream the data onto your local disk.
Instead of assembling XML by yourself, use XmlSerializer to do that job for you. XmlSerializer works nicely with XmlWriter which in the end can work with any stream including FileStream. There is a thread about it, which you can take as an example.
To conclude, the PerformStuff method won't only be faster, it will also require far fewer resources (memory, CPU), and, most importantly, you'll easily be able to constrain the resource consumption of your program (by changing the size of the database page).
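A minimal sketch of that XmlSerializer suggestion, serializing the TestResponse type from the question straight to a file stream (the file path is a placeholder):
using System.IO;
using System.Xml;
using System.Xml.Serialization;

static void SaveToFile(TestResponse response, string path)
{
    var serializer = new XmlSerializer(typeof(TestResponse));
    using (var stream = new FileStream(path, FileMode.Create))
    using (var writer = XmlWriter.Create(stream, new XmlWriterSettings { Indent = true }))
    {
        // XmlSerializer writes the whole element tree for us; no manual string building.
        serializer.Serialize(writer, response);
    }
}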
Observation: your requirement looks like it matches the map / reduce pattern.
If the values in your ids collection returned by thirdpartylib.Methodforid() are reasonably dense, and the table behind your proc_name stored procedure has close to the same number of rows as there are items in the ids collection, you can retrieve all the records you need with a single SQL query (and a many-row result set) rather than retrieving them one by one. That might look something like this:
public static TestResponse PerformStuff()
{
var response = new TestResponse();
var idHash = new HashSet<int> (thirdpartylib.Methodforid());
SqlCommand cmd = DB.GetSqlCommand("proc_name_for_all_ids");
using (SqlDataReader dr = new SqlDataReader(DB.GetDataReader(cmd))) {
while (dr.Read()) {
var id = dr.GetInt32("id");
if (idHash.Contains(id)) {
var testReq = new TestRequest();
testReq.col1 = dr.GetInt32("col1");
testReq.col2 = dr.GetBoolean("col2");
testReq.col3 = dr.GetString("col3");
testReq.col4 = dr.GetBoolean("col4");
testReq.col5 = dr.GetString("col5");
testReq.col6 = dr.GetBoolean("col6");
testReq.col7 = dr.GetString("col7");
var output = thirdpartylib.MethodforResponse(testReq);
foreach (var data in output.Elements()) {
response.col4 = Convert.ToInt32(data.Id().Class());
response.col2 = data.Id().Name().ToString();
}
} /* end if hash.Contains(id) */
} /* end while dr.Read() */
} /* end using() */
return response;
}
Why might this be faster? It makes many fewer database queries, and instead streams in the multiple rows of data to process. This will be far more efficient than your example.
Why might it not work?
if the id values must be processed in the same order produced by thirdpartylib.Methodforid() it won't work.
if there's no way to retrieve all the rows, that is, if no proc_name_for_all_ids stored procedure is available, you won't be able to stream the rows.

Checking foreach loop list results C#

I have this bit of code in a class:
public class TicketSummary
{
    // get all the development tickets
    public List<IncidentSummary> AllDevelopmentTickets { get; set; }

    public List<string> TicketNames()
    {
        List<string> v = new List<string>();
        foreach (var developmentTicket in AllDevelopmentTickets)
        {
            var ticketIds = developmentTicket.id.ToString(CultureInfo.InvariantCulture);
            v.Add(ticketIds);
        }
        return v;
    }
}
And I am trying to see if my API connection (plus all the code) did its job and pulled back the tickets and their info, more specifically the IDs.
In my main program I have no clue how to check whether it did the job. I tried something, but it isn't quite right and doesn't return anything (I know I need a Console.WriteLine).
static void Main(string[] args)
{
Console.ReadLine();
var tickets = new TicketSummary();
tickets.TicketNames();
while ( tickets != null )
{
Console.WriteLine(tickets);
}
}
Any suggestions, please?
Thank you!
You've dropped the returned result: tickets.TicketNames() returns a List<string> that you have to assign and then iterate:
var tickets = new TicketSummary();
var names = tickets.TicketNames(); // <- names, List<String> according to the code
// printing out all the names
foreach(var name in names)
Console.WriteLine(name);
Do you mean you just want to print all the tickets out?
foreach (var ticket in tickets.TicketNames())
{
Console.WriteLine(ticket);
}
You have several problems in your code that should keep it from even compiling, but aside from that, it seems what you're really after is transforming the data in AllDevelopmentTickets rather than moving it somewhere. So you could probably do it with a Select call (from LINQ). So, in your main method:
var tickets = new TicketSummary();
// add some tickets to tickets.AllDevelopmentTickets here...
var ticketNames = tickets.AllDevelopmentTickets.Select(ticket => ticket.id.ToString());
// Yes, you should probably use an UI culture in the ToString call.
// I'm just trying to limit my line width =)
Now, ticketNames should be an IEnumerable<string> holding all the ticket ids. To, for example, print them out, you can iterate over them and write to console output:
foreach (var name in ticketNames) {
Console.WriteLine(name);
}
You need to assign / use the return value of the TicketNames() method. This seems like a lot of work just to return the string version of the ticket ID. It can be reduced:
public List<string> TicketNames()
{
return AllDevelopmentTickets
.Select(t => t.id.ToString(CultureInfo.InvariantCulture))
.ToList();
}
var ticketSummary = new TicketSummary();
var ticketNames = ticketSummary.TicketNames();
foreach(var ticketName in ticketNames)
{
Console.WriteLine(ticketName);
}
Or even just:
foreach(var ticketName in AllDevelopmentTickets
.Select(t => t.id.ToString(CultureInfo.InvariantCulture)))
{
Console.WriteLine(ticketName);
}
You're ignoring the returned value.
static void Main(string[] args)
{
Console.ReadLine();
var tickets = new TicketSummary();
var res = tickets.TicketNames();
foreach (var r in res)
{
Console.WriteLine(r);
}
}

Edit Web Config file and apply changes

I am writing a Windows application in C#, where I read web.config files inside a folder and load the appSettings so that users can edit them and apply changes.
I store the setting 'key' and 'value' in a dictionary and the affected values in a separate dictionary. It works well, but it takes a lot of time to apply the changes.
How can I speed it up?
Here is my code:
public List<AppSettings> OldAppSetting;
public List<AppSettings> NewAppSetting;
foreach (var oldSetList in OldAppSetting)
{
    Document = XDocument.Load(oldSetList.FilePathProp);
    var appSetting = Document.Descendants("add").Select(add => new
    {
        Key = add.Attribute("key"),
        Value = add.Attribute("value")
    }).ToArray();

    foreach (var oldSet in appSetting)
    {
        foreach (var newSet in NewAppSetting)
        {
            if (oldSet.Key != null)
            {
                if (oldSet.Key.Value == newSet.AppKey)
                {
                    oldSet.Value.Value = newSet.AppValue;
                }
            }
            Document.Save(oldSetList.FilePathProp);
        }
    }
}
Here is the AppSettings class:
public class AppSettings
{
    public string AppKey { get; set; }
    public string AppValue { get; set; }
    public string FilePathProp { get; set; }
}
I think your primary speed concern is that you're saving the document after checking every item. Seems like you could change your code to reduce the number of times you call save. For example:
foreach (var oldSetList in OldAppSetting)
{
    Document = XDocument.Load(oldSetList.FilePathProp);
    var appSetting = Document.Descendants("add").Select(add => new
    {
        Key = add.Attribute("key"),
        Value = add.Attribute("value")
    }).ToArray();

    foreach (var oldSet in appSetting)
    {
        foreach (var newSet in NewAppSetting)
        {
            if (oldSet.Key != null)
            {
                if (oldSet.Key.Value == newSet.AppKey)
                {
                    oldSet.Value.Value = newSet.AppValue;
                }
            }
        }
    }

    Document.Save(oldSetList.FilePathProp);
}
Also, you could use a Dictionary keyed by the setting key rather than an array for your appSetting. That would speed things up quite a bit if the number of items is large. It would take some restructuring of your code. I don't know what all of your types are, so I can't give you the exact code, but it would look something like this:
var appSetting = Document.Descendants("add")
                         .Where(add => add.Attribute("key") != null)
                         .ToDictionary(add => add.Attribute("key").Value);

foreach (var newSet in NewAppSetting)
{
    if (appSetting.ContainsKey(newSet.AppKey))
    {
        var oldSet = appSetting[newSet.AppKey];
        oldSet.SetAttributeValue("value", newSet.AppValue);
    }
}
Your code is a little bit confusing, but I think that's right. The idea here is to build a dictionary of the old values so that we can look them up directly when scanning the new values. It turns your O(n^2) algorithm into an O(n) algorithm, which will make a difference if there are a lot of settings. Plus, the code is smaller and easier to follow.
Put the
Document.Save(oldSetList.FilePathProp);
outside the loop!
