C# need help optimizing string array function

The following code works perfectly fine on a small data set. However, GetMatchCount and BuildMatchArray are very sluggish on large result sets. Can anyone recommend a different approach to save processing time? Would it be better to write the array to a file? Are lists just generally slow and not the best option?
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
public class Client
{
public int Id;
public string FirstName
{
get
{
var firstName = //<call to get from database via Id>
return firstName;
}
}
public string MiddleName
{
get
{
var middleName = //<call to get from database via Id>
return middleName;
}
}
public string LastName
{
get
{
var lastName = //<call to get from database via Id>
return lastName;
}
}
public string FullName
{
get
{
return FirstName + " " + MiddleName + " " + LastName;
}
}
public int GetMatchCount(IEnumerable<string> clientFirstNames, IEnumerable<string> clientMiddleNames, IEnumerable<string> clientLastNames)
{
var clientFullNames = BuildMatchArray(clientFirstNames, clientMiddleNames, clientLastNames);
return clientFullNames.Count(x => x == FullName);
}
public string[] BuildMatchArray(IEnumerable<string> clientFirstNames, IEnumerable<string> clientMiddleNames, IEnumerable<string> clientLastNames)
{
Debug.Assert(clientFirstNames.Count() == clientMiddleNames.Count() && clientMiddleNames.Count() == clientLastNames.Count());
var clientFullNames = new List<string>();
for (int i = 0; i < clientFirstNames.Count(); i++)
{
clientFullNames.Add(clientFirstNames.ElementAt(i) + " " + clientMiddleNames.ElementAt(i) + " " + clientLastNames.ElementAt(i));
}
return clientFullNames.ToArray();
}
}

Where are you getting these strings? If you are using lazy sequences, every time you call Count() you will have to iterate the entire sequence to count how many objects are in the sequence. If the IEnumerable<T> is really a T[] or List<T>, then Count() is optimized to just call the Length or Count property, which isn't expensive. Similarly, ElementAt is also very inefficient and iterates the collection. So with an in-memory lazy sequence this performance will be bad, but if you are streaming results from SQL or an external source, it will be really bad or possibly even incorrect.
A more performant implementation of BuildMatchArray would be like this:
public IEnumerable<string> ZipNames(IEnumerable<string> firsts,
IEnumerable<string> middles, IEnumerable<string> lasts)
{
using(var e1 = firsts.GetEnumerator())
using(var e2 = middles.GetEnumerator())
using(var e3 = lasts.GetEnumerator())
{
var stop = false;
while (!stop)
{
var hasNext1 = e1.MoveNext();
var hasNext2 = e2.MoveNext();
var hasNext3 = e3.MoveNext();
if (hasNext1 && hasNext2 && hasNext3)
{
yield return $"{e1.Current} {e2.Current} {e3.Current}";
}
else
{
stop = true;
Debug.Assert(!(hasNext1 || hasNext2 || hasNext3));
}
}
}
}
This requires only one iteration of each input collection, and doesn't need to copy elements to a new List<T>. Another point to note is that a List<T> grows on demand: its first allocation has room for 4 elements, and each time it fills up it copies all elements to a new internal array with double the capacity. So if you have a large sequence, you will copy many times.
This implementation is very similar to System.Linq.Enumerable.Zip.
In your case, you also shouldn't call ToArray on your sequence. This requires another copy, and the result can potentially be a huge array. If you are only sending that array to .Count(x => x == y), then keeping a lazy IEnumerable would be better, because Count operates lazily on lazy sequences: it streams the data in and counts elements as it sees them, without ever requiring the full collection to be in memory.
See IEnumerable vs List - What to Use? How do they work?
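As a rough sketch of the Zip-based approach, assuming the three sequences have the same length: the helper below counts matches without building a list or an array. NameMatching and CountMatches are made-up names for illustration, not part of the original code. The name to compare against is passed in as a parameter, so the FullName property (which goes to the database on every access) is read once by the caller instead of once per element inside the Count lambda.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original Client class.
public static class NameMatching
{
    // Lazily pairs up the three name sequences and counts the full names equal to target.
    // Note: Zip stops at the shortest input, so mismatched lengths are silently truncated.
    public static int CountMatches(
        IEnumerable<string> firsts,
        IEnumerable<string> middles,
        IEnumerable<string> lasts,
        string target)
    {
        return firsts
            .Zip(middles, (first, middle) => first + " " + middle)
            .Zip(lasts, (firstAndMiddle, last) => firstAndMiddle + " " + last)
            .Count(fullName => fullName == target);
    }
}
```

Inside Client, GetMatchCount could then read FullName into a local variable once and pass it as target.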

Related

Permutation algorithm Optimization

I have this permutation code working perfectly, but it does not generate the output fast enough. I need help optimizing the code to run faster. It is important that the result remains the same. I have seen other algorithms, but they don't take into consideration the output length and repetition of the same character, which are all valid output. If I could have this converted into a for loop with 28 alphanumeric characters, that would be awesome. Below is the current code I am looking to optimize.
namespace CSharpPermutations
{
public interface IPermutable<T>
{
ISet<T> GetRange();
}
public class Digits : IPermutable<int>
{
public ISet<int> GetRange()
{
ISet<int> set = new HashSet<int>();
for (int i = 0; i < 10; ++i)
set.Add(i);
return set;
}
}
public class AlphaNumeric : IPermutable<char>
{
public ISet<char> GetRange()
{
ISet<char> set = new HashSet<char>();
set.Add('0');
set.Add('1');
set.Add('2');
set.Add('3');
set.Add('4');
set.Add('5');
set.Add('6');
set.Add('7');
set.Add('8');
set.Add('9');
set.Add('a');
set.Add('b');
return set;
}
}
public class PermutationGenerator<T,P> : IEnumerable<string>
where P : IPermutable<T>, new()
{
public PermutationGenerator(int number)
{
this.number = number;
this.range = new P().GetRange();
}
public IEnumerator<string> GetEnumerator()
{
foreach (var item in Permutations(0,0))
{
yield return item.ToString();
}
}
IEnumerator IEnumerable.GetEnumerator()
{
foreach (var item in Permutations(0,0))
{
yield return item;
}
}
private IEnumerable<StringBuilder> Permutations(int n, int k)
{
if (n == number)
yield return new StringBuilder();
foreach (var element in range.Skip(k))
{
foreach (var result in Permutations(n + 1, k + 1))
{
yield return new StringBuilder().Append(element).Append(result);
}
}
}
private int number;
private ISet<T> range;
}
class MainClass
{
public static void Main(string[] args)
{
foreach (var element in new PermutationGenerator<char, AlphaNumeric>(2))
{
Console.WriteLine(element);
}
}
}
}
Thanks for your effort in advance.
What you're outputting there is the cartesian product of two sets; the first set is the characters "0123456789ab" and the second set is the characters "123456789ab".
Eric Lippert wrote a well-known article demonstrating how to use Linq to solve this.
We can apply this to your problem like so:
using System;
using System.Collections.Generic;
using System.Linq;
namespace Demo;
static class Program
{
static void Main(string[] args)
{
char[][] source = new char[2][];
source[0] = "0123456789ab".ToCharArray();
source[1] = "0123456789ab".ToCharArray();
foreach (var perm in Combine(source))
{
Console.WriteLine(string.Concat(perm));
}
}
public static IEnumerable<IEnumerable<T>> Combine<T>(IEnumerable<IEnumerable<T>> sequences)
{
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
return sequences.Aggregate(
emptyProduct,
(accumulator, sequence) =>
from accseq in accumulator
from item in sequence
select accseq.Concat(new[] { item }));
}
}
You can extend this to 28 characters by modifying the source data:
source[0] = "0123456789abcdefghijklmnopqr".ToCharArray();
source[1] = "0123456789abcdefghijklmnopqr".ToCharArray();
If you want to know how this works, read Eric Lippert's excellent article, which I linked above.
Consider
foreach (var result in Permutations(n + 1, k + 1))
{
yield return new StringBuilder().Append(element).Append(result);
}
Permutations is a recursive function that implements an iterator. So each call to MoveNext() on the outermost iterator advances its loop by one step, which calls MoveNext() on the nested iterator in turn, and so on, resulting in N calls to MoveNext(), new StringBuilder(), Append() etc. for every item produced. This is quite inefficient.
I also cannot see that StringBuilder gives any advantage here. It is a benefit if you concatenate many strings, but as far as I can see you only add two strings together.
The first thing you should do is add code to measure the performance, or even better, use a profiler. That way you can tell whether any change actually improves the situation or not.
The second change I would try is to rewrite the recursion as an iterative implementation. This probably means that you need to keep track of an explicit stack of the numbers to process. Or, if this is too difficult, stop using iterator blocks and let the recursive method take a list that it adds results to.
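One way the iterative rewrite could look, as a sketch rather than a drop-in replacement: instead of an explicit stack it keeps one index per output position and advances them like an odometer. ProductGenerator and Generate are made-up names, and the sketch assumes you pass one alphabet string per output position.

```csharp
using System.Collections.Generic;

// Hypothetical helper illustrating an iterative (non-recursive) generator.
public static class ProductGenerator
{
    // Yields every string whose i-th character comes from alphabets[i].
    public static IEnumerable<string> Generate(IReadOnlyList<string> alphabets)
    {
        foreach (var alphabet in alphabets)
        {
            if (alphabet.Length == 0) yield break; // an empty position means no results
        }

        int length = alphabets.Count;
        var indices = new int[length]; // current choice for each position
        var buffer = new char[length]; // reused output buffer

        while (true)
        {
            for (int i = 0; i < length; i++)
            {
                buffer[i] = alphabets[i][indices[i]];
            }
            yield return new string(buffer);

            // Advance the rightmost position; carry to the left on overflow.
            int pos = length - 1;
            while (pos >= 0 && ++indices[pos] == alphabets[pos].Length)
            {
                indices[pos] = 0;
                pos--;
            }
            if (pos < 0) yield break; // every position wrapped around: done
        }
    }
}
```

Calling Generate(new[] { "0123456789ab", "123456789ab" }) should produce the same 132 two-character strings as the two sets described in the other answer. Each result costs one string allocation and no nested MoveNext() chain, but measure it against the original before relying on it.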

Best Way to compare 1 million List of object with another 1 million List of object in c#

I am comparing a list of one million objects with another list of one million objects to find the differences.
I am using for and foreach, but it takes too much time to iterate over those lists.
Can anyone suggest the best way to do this?
var SourceList = new List<object>(); //one million
var TargetList = new List<object>(); // one million
//getting data from database here
//SourceList with List of one million
//TargetList with List of one million
var DifferentList = new List<object>();
//ForEach
SourceList.ToList().ForEach(m =>
{
if (!TargetList.Any(s => s.Name == m.Name))
DifferentList.Add(m);
});
//for
for (int i = 0; i < SourceList .Count; i++)
{
if (!TargetList.Any(s => s.Name == SourceList[i].Name))
DifferentList .Add(SourceList [i]);
}
It may seem like a bad idea, but a bit of IEnumerable magic will help you.
For starters, simplify your expression. It looks like this:
var result = sourceList.Where(s => !targetList.Any(t => t.Equals(s)));
I recommend making a comparison in the Equals method:
public class CompareObject
{
public string prop { get; set; }
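public override bool Equals(object o)
{
// Override rather than hide with "new", so the comparison also applies through object references.
var other = o as CompareObject;
return other != null && this.prop == other.prop;
}
public override int GetHashCode()
{
// Equal objects must report equal hash codes.
return prop == null ? 0 : prop.GetHashCode();
}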
}
Next add AsParallel. This can both speed up and slow down your program. In your case, you can add ...
var result = sourceList.AsParallel().Where(s => !targetList.Any(t => t.Equals(s)));
The CPU will be 100% loaded if you try to enumerate everything at once like this:
var cnt = result.Count();
But it is quite tolerable if you fetch the results in small portions:
result.Skip(10000).Take(10000).ToList();
Full code:
static Random random = new Random();
public class CompareObject
{
public string prop { get; private set; }
public CompareObject()
{
prop = random.Next(0, 100000).ToString();
}
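public override bool Equals(object o)
{
// Override rather than hide with "new", so the comparison also applies through object references.
var other = o as CompareObject;
return other != null && this.prop == other.prop;
}
public override int GetHashCode()
{
// Equal objects must report equal hash codes.
return prop == null ? 0 : prop.GetHashCode();
}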
}
void Main()
{
var sourceList = new List<CompareObject>();
var targetList = new List<CompareObject>();
for (int i = 0; i < 10000000; i++)
{
sourceList.Add(new CompareObject());
targetList.Add(new CompareObject());
}
var stopWatch = new Stopwatch();
stopWatch.Start();
var result = sourceList.AsParallel().Where(s => !targetList.Any(t => t.Equals(s)));
var lr = result.Skip(10000).Take(10000).ToList();
stopWatch.Stop();
Console.WriteLine(stopWatch.Elapsed);
}
Update
I remembered that you can use a Hashtable. Collect the unique values from targetList, then go through sourceList and fill the result with the unique values that are not in targetList.
Example:
static Random random = new Random();
public class CompareObject
{
public string prop { get; private set; }
public CompareObject()
{
prop = random.Next(0, 1000000).ToString();
}
public new int GetHashCode() {
return prop.GetHashCode();
}
}
void Main()
{
var sourceList = new List<CompareObject>();
var targetList = new List<CompareObject>();
for (int i = 0; i < 10000000; i++)
{
sourceList.Add(new CompareObject());
targetList.Add(new CompareObject());
}
var stopWatch = new Stopwatch();
stopWatch.Start();
var sourceHashtable = new Hashtable();
var targetHashtable = new Hashtable();
foreach (var element in targetList)
{
var hash = element.GetHashCode();
if (!targetHashtable.ContainsKey(hash))
targetHashtable.Add(element.GetHashCode(), element);
}
var result = new List<CompareObject>();
foreach (var element in sourceList)
{
var hash = element.GetHashCode();
if (!sourceHashtable.ContainsKey(hash))
{
sourceHashtable.Add(hash, element);
if(!targetHashtable.ContainsKey(hash)) {
result.Add(element);
}
}
}
stopWatch.Stop();
Console.WriteLine(stopWatch.Elapsed);
}
Scanning the target list to match the name is an O(n) operation, thus your loop is O(n^2). If you build a HashSet<string> of all the distinct names in the target list, you can check whether a name exists in the set in O(1) time using the Contains method.
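A minimal sketch of that idea, assuming the objects expose a string Name property as in the question. Item, ListDiff and NotInTarget are placeholder names invented for this example.

```csharp
using System.Collections.Generic;

// Placeholder type standing in for the objects loaded from the database.
public class Item
{
    public string Name { get; set; }
}

public static class ListDiff
{
    // Returns the source items whose Name does not occur anywhere in target.
    // One pass over each list, so roughly O(n + m) instead of O(n * m).
    public static List<Item> NotInTarget(IEnumerable<Item> source, IEnumerable<Item> target)
    {
        // Build the set of distinct target names once.
        var targetNames = new HashSet<string>();
        foreach (var t in target)
        {
            targetNames.Add(t.Name);
        }

        var different = new List<Item>();
        foreach (var s in source)
        {
            // Contains on a HashSet is a hash lookup, not a scan.
            if (!targetNames.Contains(s.Name))
            {
                different.Add(s);
            }
        }
        return different;
    }
}
```

With a million entries per list this does about two million hash operations instead of up to a trillion comparisons.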
//getting data from database here
You are getting the data out of a system that specializes in matching and sorting and filtering data, into your RAM that by default cannot yet do that task at all. And then you try to sort, filter and match yourself.
That will fail. No matter how hard you try, it is extremely unlikely that a single programmer working on a matching algorithm will outperform a database server at the one operation it is supposed to be really good at, software that was programmed by teams of experts and optimized for years.
You don't go into a fancy restaurant and ask them to give you huge bags of raw ingredients so you can throw them into a big bowl unpeeled and microwave them at home. No. You order a nice dish because it will be way better than anything you could do yourself.
The simple answer is: Do not do that. Do not take the raw data and rummage around in it for hours. Leave that job to the database. It's the one thing it's supposed to be good at. Use its power. Write a query that will give you the result; don't get the raw data and then play database yourself.
foreach over a List<T> goes through an enumerator that checks on every step whether the list was modified, so a standard for loop can provide slightly better performance, although the gain is small compared with the cost of the comparison itself.
If it is taking too long, can you break down the collection into smaller sets and/or process them in parallel?
Also, you could look at PLINQ (Parallel LINQ) using .AsParallel().
Other areas to improve are the actual comparison logic that you are using and how the data is stored in memory. Depending on your problem, you may not have to load the entire object into memory for every iteration.
Please provide a code example so that we can assist further. When such large amounts of data are involved, some performance degradation is to be expected.
Again depending on the time that we are talking about here, you could upload the data into a database and use that for the comparison rather than trying to do it natively in C#, this type of solution is better suited to data sets that are already in a database or where the data changes much less frequently than the times you need to perform the comparison.

Really slow load speed Neo4jClient C# LoadCsv

The code I use now is really slow with about 20 inserts per second and uses a splitter to create multiple csv files to load. Is there a way to use "USING PERIODIC COMMIT 1000" in a proper way using the Neo4jClient for dotnet?
public async Task InsertEdgesByName(List<string> nodeListA, List<string> nodeListB,
List<int> weightList, string type)
{
for (var i = 0; i < nodeListA.Count; i += 200)
{
using (var sw = new StreamWriter(File.OpenWrite($"tempEdge-{type}.csv")))
{
sw.Write("From,To,Weight\n");
for (var j = i;
j < i + 200 &
j < nodeListA.Count;
j++)
{
sw.Write($"{nodeListA[j]}," +
$"{nodeListB[j]}," +
$"{weightList[j]} + id:{j}" +
$"\n");
}
}
var f = new FileInfo($"tempEdge-{type}.csv");
await Client.Cypher
.LoadCsv(new Uri("file://" + f.FullName), "rels", true)
.Match("(from {label: rels.From}), (to {label: rels.To})")
.Create($"(from)-[:{type} {{weight: rels.Weight}}]->(to);")
.ExecuteWithoutResultsAsync();
_logger.LogDebug($"{DateTime.Now}\tEdges inserted\t\tedges inserted: {i}");
}
}
To create the nodes I use
await Client.Cypher
.Create("INDEX ON :Node(label);")
.ExecuteWithoutResultsAsync();
await Client.Cypher
.LoadCsv(new Uri("file://" + f.FullName), "csvNode", true)
.Create("(n:Node {label:csvNode.label, source:csvNode.source})")
.ExecuteWithoutResultsAsync();
The indexing on label does not seem to change the speed of either insert statement. I have about 200.000 edges to insert, at 20 per second this would take hours. Being able to add the USING PERIODIC COMMIT 1000 would clean up my code but wouldn't improve performance by much.
Is there a way to speed up inserts? I know the neo4jclient is not the fastest but I would really like to stay within the asp.net environment.
SimpleNode class
public class SimpleNodeModel
{
public long id { get; set; }
public string label { get; set; }
public string source { get; set; } = "";
public override string ToString()
{
return $"label: {label}, source: {source}, id: {id}";
}
public SimpleNodeModel(string label, string source)
{
this.label = label;
this.source = source;
}
public SimpleNodeModel() { }
public static string Header => "label,source";
public string ToCSVWithoutID()
{
return $"{label},{source}";
}
}
Cypher code
USING PERIODIC COMMIT 500
LOAD CSV FROM 'file://F:/edge.csv' AS rels
MATCH (from {label: rels.From}), (to {label: rels.To})
CREATE (from)-[:edge {{weight: rels.Weight}}]->(to);
Regarding the slow speed of the Cypher code at the bottom: that's because you're not using labels in your MATCH, so it never uses the index to find the nodes quickly. Instead it must scan every node in your database TWICE, once for from, and again for to.
Your use of label in the node properties is not the same as the node label. Since you created the nodes with the :Node label, please reuse this label in your match:
...
MATCH (from:Node {label: rels.From}), (to:Node {label: rels.To})
...
Periodic commit isn't supported in Neo4jClient in the version you're using.
I've just committed a change that will be published shortly (2.0.0.7) which you can then use:
.LoadCsv(new Uri("file://" + f.FullName), "rels", true, periodicCommit:1000)
which will generate the correct cypher.
It's on its way, and should be 5 mins or so depending on indexing time for nuget.

Get specific values of a struct/List

I'm creating a game in Unity3D + C#.
What I've got at the moment: an SQL datatable, consisting of 8 columns holding a total of 3 entries and a list "_WeapList" that holds every entry (as shown below).
public struct data
{
public string Name;
public int ID, dmg, range, magazin, startammo;
public float tbtwb, rltimer;
}
List<data> _WeapList;
public Dictionary<int, data>_WeapoList; //probable change
[...]
//reading the SQL Table + parse it into a new List-entry
while (rdr.Read())
{
data itm = new data();
itm.Name = rdr["Name"].ToString();
itm.ID = int.Parse (rdr["ID"].ToString());
itm.dmg = int.Parse (rdr["dmg"].ToString());
itm.range = int.Parse (rdr["range"].ToString());
itm.magazin = int.Parse (rdr["magazin"].ToString());
itm.startammo = int.Parse (rdr["startammo"].ToString());
itm.tbtwb = float.Parse(rdr["tbtwb"].ToString());
itm.rltimer = float.Parse(rdr["rltimer"].ToString());
_WeapList.Add(itm);
_WeapoList.Add(itm.ID, itm);//probable change
}
Now I want to create a "Weapon"-Class that will have the same 8 fields, feeding them via a given ID
How do I extract the values of a specific item (determined by the int ID, which is always unique) in the list/struct?
public class Weapons : MonoBehaviour
{
public string _Name;
public int _ID, _dmg, _range, _magazin, _startammo;
public float _tbtwb, _rltimer;
void Start()
{//Heres the main problem
_Name = _WeapoList...?
_dmg = _WeapoList...?
}
}
If your collection of weapons may become quite large or you need to frequently look up weapons in it, I would suggest using a Dictionary instead of a List for this (using the weapon ID as the key). A lookup will be much quicker using a Dictionary key than searching through a List using a loop or LINQ.
You can do this by modifying your code to do this as follows:
public Dictionary<int, data>_WeapList;
[...]
//reading the SQL Table + parse it into a new List-entry
while (rdr.Read())
{
data itm = new data();
itm.Name = rdr["Name"].ToString();
itm.ID = int.Parse (rdr["ID"].ToString());
itm.dmg = int.Parse (rdr["dmg"].ToString());
itm.range = int.Parse (rdr["range"].ToString());
itm.magazin = int.Parse (rdr["magazin"].ToString());
itm.startammo = int.Parse (rdr["startammo"].ToString());
itm.tbtwb = float.Parse(rdr["tbtwb"].ToString());
itm.rltimer = float.Parse(rdr["rltimer"].ToString());
_WeapList.Add(itm.ID, itm);//probable change
}
Then, to access elements in the dictionary, just use the syntax:
_WeapList[weaponID].dmg; // To access the damage of the weapon with the given weaponID
Guarding against invalid IDs:
If there's a risk of the weaponID supplied not existing, you can use the .ContainsKey() method to check for it first before trying to access its members:
if (_WeapList.ContainsKey(weaponID))
{
// Retrieve the weapon and access its members
}
else
{
// Weapon doesn't exist, default behaviour
}
Alternatively, if you're comfortable using out arguments, you can use .TryGetValue() instead for validation - this is even quicker than calling .ContainsKey() separately:
data weaponData;
if (_WeapList.TryGetValue(weaponID, out weaponData))
{
// weaponData is now populated with the weapon and you can access members on it
}
else
{
// Weapon doesn't exist, default behaviour
}
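Putting this together for the Weapon class from the question, a sketch could look like the following. It leaves out MonoBehaviour so it compiles outside Unity, and WeaponData, Weapon and TryLoad are names chosen for this example (WeaponData mirrors the data struct from the question).

```csharp
using System.Collections.Generic;

// Mirrors the "data" struct from the question.
public struct WeaponData
{
    public string Name;
    public int ID, dmg, range, magazin, startammo;
    public float tbtwb, rltimer;
}

// Plain class for illustration; in Unity this would derive from MonoBehaviour.
public class Weapon
{
    public string Name;
    public int ID, Dmg, Range, Magazin, StartAmmo;
    public float Tbtwb, RlTimer;

    // Copies the values for weaponId out of the lookup table.
    // Returns false and leaves this object untouched when the ID is unknown.
    public bool TryLoad(Dictionary<int, WeaponData> table, int weaponId)
    {
        WeaponData d;
        if (!table.TryGetValue(weaponId, out d))
        {
            return false;
        }

        Name = d.Name;
        ID = d.ID;
        Dmg = d.dmg;
        Range = d.range;
        Magazin = d.magazin;
        StartAmmo = d.startammo;
        Tbtwb = d.tbtwb;
        RlTimer = d.rltimer;
        return true;
    }
}
```

In Start() you would call TryLoad with the dictionary and the ID you want, and handle the false case however suits your game.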
Hope this helps! Let me know if you have any questions.
Let specificWeapon be the weapon to search for in the list. You can use the following code to select that item from the list of weapons. Note that data is a struct, so if nothing is found FirstOrDefault returns a default-initialized value (ID 0, Name null) rather than null. Hope that this is what you are looking for:
var selectedWeapon = _WeapList.FirstOrDefault(x => x.ID == specificWeapon.ID);
if (selectedWeapon.Name != null)
{
// this is your weapon proceed
}
else
{
// not found your weapon
}
You can use LINQ to find a specific object by its weaponId:
var Weapon = _WeapList.FirstOrDefault(w=> w.ID == weaponId);

How can I take objects from the second set of objects which don't exist in the first set of objects in fast way?

I have records in two databases. That is the entity in the first database:
public class PersonInDatabaseOne
{
public string Name { get; set; }
public string Surname { get; set; }
}
That is the entity in the second database:
public class PersonInDatabaseTwo
{
public string FirstName { get; set; }
public string LastName { get; set; }
}
How can I get records from the second database which don't exist in the first database (the first name and the last name must be different than in the first database). Now I have something like that but that is VERY SLOW, too slow:
List<PersonInDatabaseOne> peopleInDatabaseOne = new List<PersonInDatabaseOne>();
// Here I generate objects, but in reality I take them from the database:
for (int i = 0; i < 100000; i++)
{
peopleInDatabaseOne.Add(new PersonInDatabaseOne { Name = "aaa" + i, Surname = "aaa" + i });
}
List<PersonInDatabaseTwo> peopleInDatabaseTwo = new List<PersonInDatabaseTwo>();
// Here I generate objects, but in reality I take them from the database:
for (int i = 0; i < 10000; i++)
{
peopleInDatabaseTwo.Add(new PersonInDatabaseTwo { FirstName = "aaa" + i, LastName = "aaa" + i });
}
for (int i = 0; i < 10000; i++)
{
peopleInDatabaseTwo.Add(new PersonInDatabaseTwo { FirstName = "bbb" + i, LastName = "bbb" + i });
}
List<PersonInDatabaseTwo> peopleInDatabaseTwoWhichNotExistInDatabaseOne = new List<PersonInDatabaseTwo>();
// BELOW CODE IS VERY SLOW:
foreach (PersonInDatabaseTwo personInDatabaseTwo in peopleInDatabaseTwo)
{
if (!peopleInDatabaseOne.Any(x => x.Name == personInDatabaseTwo.FirstName && x.Surname == personInDatabaseTwo.LastName))
{
peopleInDatabaseTwoWhichNotExistInDatabaseOne.Add(personInDatabaseTwo);
}
};
The fastest way is dependent on the number of entities, and what indexes you already have.
If there are only a few entities, what you already have performs better, because multiple scans of a small set take less time than creating HashSet objects.
If all of your entities fit in memory, the best way is to build a HashSet out of them and use Except, which is detailed nicely by @alex.feigin.
If you can't afford to load all entities into memory, you need to divide them into batches based on the comparison key, load each batch into memory, and apply the HashSet method repeatedly. Note that the batches can't be based on the number of records, only on the comparison key. For example, load all entities with names starting with 'A', then 'B', and so on.
If you already have an index on the database on the comparison key (like, in your case, FirstName and LastName) in one of the databases, you can retrieve a sorted list from the database. This will help you do binary search (http://en.wikipedia.org/wiki/Binary_search_algorithm) on the sorted list for comparison. See https://msdn.microsoft.com/en-us/library/w4e7fxsh(v=vs.110).aspx
If you already have an index on the database on the comparison key on both databases, you can get to do this in O(n), and in a scalable way (any number of records). You need to loop through both lists and find the differences only once. See https://stackoverflow.com/a/161535/187996 for more details.
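To illustrate the last case, here is a sketch of the single-pass comparison over two sorted inputs. It works on plain string keys for brevity, and it assumes both lists are sorted ascending with ordinal comparison; if the database sorts with a different collation, the comparison in the code has to match it. SortedDiff and OnlyInSecond are names made up for this example.

```csharp
using System.Collections.Generic;

// Hypothetical helper: difference of two sorted key lists in one pass.
public static class SortedDiff
{
    // Returns the keys of second that do not occur in first.
    // Both inputs must already be sorted ascending by ordinal comparison.
    public static List<string> OnlyInSecond(IReadOnlyList<string> first, IReadOnlyList<string> second)
    {
        var result = new List<string>();
        int i = 0; // position in first; only ever moves forward

        foreach (var key in second)
        {
            // Skip everything in first that sorts before the current key.
            while (i < first.Count && string.CompareOrdinal(first[i], key) < 0)
            {
                i++;
            }

            // Either first is exhausted or first[i] >= key; keep key unless they are equal.
            if (i == first.Count || string.CompareOrdinal(first[i], key) != 0)
            {
                result.Add(key);
            }
        }
        return result;
    }
}
```

Because neither position ever moves backwards, the work is proportional to the combined length of the two lists, and the same loop works on streamed rows without holding either result set fully in memory.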
Edit: with respect to the comments - using a real model and a dictionary instead of a simple set:
Try hashing your list into a Dictionary to hold your people objects. As the key, try a Tuple instead of comparing name1==name2 && lname1==lname2.
This will potentially then look like this:
// Some people1 and people2 lists of models already exist:
var sw = Stopwatch.StartNew();
var removeThese = people1.Select(x=>Tuple.Create(x.FirstName,x.LastName));
var dic2 = people2.ToDictionary(x=>Tuple.Create(x.Name,x.Surname),x=>x);
var result = dic2.Keys.Except(removeThese).Select(x=>dic2[x]).ToList();
Console.WriteLine(sw.Elapsed);
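For reference, a variant of the same idea written against the two classes from the question might look like this. It is a sketch: PeopleDiff and OnlyInSecond are invented names, and it uses a HashSet of tuples instead of a dictionary so that duplicate names in either database do not cause an exception.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class PersonInDatabaseOne
{
    public string Name { get; set; }
    public string Surname { get; set; }
}

public class PersonInDatabaseTwo
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Hypothetical helper for illustration.
public static class PeopleDiff
{
    // Returns the people from database two whose (first name, last name) pair
    // does not occur in database one.
    public static List<PersonInDatabaseTwo> OnlyInSecond(
        IEnumerable<PersonInDatabaseOne> one,
        IEnumerable<PersonInDatabaseTwo> two)
    {
        // Tuple compares its items by value, so it works as a composite hash key.
        var known = new HashSet<Tuple<string, string>>(
            one.Select(p => Tuple.Create(p.Name, p.Surname)));

        return two
            .Where(p => !known.Contains(Tuple.Create(p.FirstName, p.LastName)))
            .ToList();
    }
}
```

With the test data from the question this should return the 10,000 "bbb" people, since every "aaa" pair exists in the first list.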
I hope this helps.
