I have a function that gets a collection of entities and then appends quotes and commas to build a string used to update the collection in the DB. This takes an insane amount of time; it's very inefficient, but I can't think of an alternative:
IEntityCollection c = Transactions.EvalToEntityCollection<ITransactions>(Store, key, item);
int max = transes.Count <= 750 ? transes.Count : 750; // DB times out if there are more than 750, so 750 is the limit
int i = 0;
int t = transes.Count;
StringBuilder sb = new StringBuilder();
foreach (ITransactions trans in transes)
{
    sb.Append("'");
    sb.Append(trans.GUID);
    sb.Append("',");
    i++;
    t--;
    if (i == max || t == 0)
    {
        sb.Remove(sb.Length - 1, 1); // trim the trailing comma
        //in here, code updates a bunch of transactions (if <= 750 transactions)
        i = 0;
        sb = new StringBuilder();
    }
}
Something like this perhaps?
var str = String.Join(",", transes.Select(t => string.Format("'{0}'", t.GUID)));
But since you have the comment in your code that it times out with > 750 records, your "insane amount of time" might be from the database, not your code.
String.Join is a really handy method when you want to concatenate a list of stuff together because it automatically handles the ends for you (so you don't end up with leading or trailing delimiters).
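For instance, a small sketch (with made-up GUID strings) showing that no trimming is needed:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Demo
{
    static void Main()
    {
        var guids = new List<string> { "aaa", "bbb", "ccc" };
        // Each value gets wrapped in single quotes, then joined with commas.
        var joined = string.Join(",", guids.Select(g => $"'{g}'"));
        Console.WriteLine(joined); // 'aaa','bbb','ccc' - no trailing comma to remove
    }
}
```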
Seems you want to do this:
Group the transaction numbers into batches of maximum 750 entities
Put all those transaction numbers in one group into one string delimited by comma and surrounded by single quotes
If so then here's the code to build the batches:
const int batchSize = 750;

List<List<Transaction>> batches = transes
    .Select((transaction, index) => new { transaction, index })
    .GroupBy(indexedTransaction => indexedTransaction.index / batchSize)
    .Select(group => group.Select(indexedTransaction => indexedTransaction.transaction).ToList())
    .ToList();

foreach (var batch in batches)
{
    // batch here is List<Transaction>, not just the GUIDs
    var guids = string.Join(", ", batch.Select(transaction => "'" + transaction.GUID + "'"));
    // process transaction or guids here
}
StringBuilder is efficient. Doing it 750 times (which is your max) will definitely NOT take a measurable amount longer than any other technique available.
Please comment out the StringBuilder part and run the project
sb.Append("'");
sb.Append("',");
I bet it will take exactly the same time to complete.
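A minimal sketch of how one might time the StringBuilder part with Stopwatch (the 750-iteration count mirrors the batch limit above):

```csharp
using System;
using System.Diagnostics;
using System.Text;

class Benchmark
{
    static void Main()
    {
        var sw = Stopwatch.StartNew();
        var sb = new StringBuilder();
        for (int i = 0; i < 750; i++)
        {
            sb.Append("'");
            sb.Append(Guid.NewGuid());
            sb.Append("',");
        }
        sw.Stop();
        // Building the string is fast; the slow part is almost
        // certainly the database call, not the StringBuilder.
        Console.WriteLine(sw.Elapsed);
    }
}
```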
I need to concatenate these values; I've seen examples using StringBuilder but I can't quite figure it out.
I am trying to recreate the linestrings of https://api.tfl.gov.uk/Line/140/Route/Sequence/Inbound
However, the results I have to return have more than one string of co-ords, hence the adding of "[" and "]".
//for (int i = 0; i < R.geometry.coordinates.Count; i++)
int i = 0;
foreach (List<List<double>> C in R.geometry.coordinates)
{
    RS.lineStrings.Add(i++.ToString());
    RS.lineStrings.Add("[");
    foreach (List<double> a in C)
    {
        // These values are to be concatenated; I want to create a string of the form $"[{a[1]},{a[0]}]"
        RS.lineStrings.Add($"[{a[1]},{a[0]}]");
    }
    RS.lineStrings.Add("]");
    RS.lineStrings.Add(",");
}
Considering that in your code C is a List<List<double>>, you can use LINQ to concatenate:

var sb = new StringBuilder(C.Count * 20); // approx length, to avoid resizing
C.ForEach(item => sb.AppendFormat("[{0},{1}]", item[1], item[0]));
var str = sb.ToString(); // This is the concatenation.

If you want a list of strings:

C.Select(item => $"[{item[1]},{item[0]}]").ToList();

Based on your new update (I am trying to return "[[0,1],[2,3],[4,5]]"), do this:

var result = "[" + string.Join(",", C.Select(item => $"[{item[1]},{item[0]}]")) + "]";

Which method you choose should depend on the details of your list. You can still do it with a StringBuilder for better memory management:

var sb = new StringBuilder(C.Count * 20); // approx length, to avoid resizing
C.ForEach(item => sb.AppendFormat("[{0},{1}],", item[1], item[0])); // note the trailing comma - ],
sb.Insert(0, "[").Replace(',', ']', sb.Length - 1, 1); // this removes the last comma and adds the closing bracket
You can use string.Join() to join them (projecting each coordinate pair to a string first, since joining C directly would just print the list type names):

string result = string.Join(",", C.Select(a => $"[{a[1]},{a[0]}]"));
Strings are immutable. So if you do a lot of string concatenation, that can leave a lot of dead strings in memory. The GC will deal with them, but it is still a performance issue; especially on a web server it should be avoided. And then there are things like string interning too: a lot of minor optimisations that will get in the way if you do mass operations on strings.
StringBuilder is the closest we can get to a mutable string, and it gets around those optimisations (which may be a hindrance here). The only usage difference is that you call "Append" rather than "Add".
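A small sketch of the difference being described (purely illustrative):

```csharp
using System.Text;

class Sketch
{
    static void Main()
    {
        // Each += allocates a brand-new string; the old one becomes garbage.
        string s = "";
        for (int i = 0; i < 1000; i++)
            s += i;

        // StringBuilder appends into one growable internal buffer instead.
        var sb = new StringBuilder();
        for (int i = 0; i < 1000; i++)
            sb.Append(i);
        string result = sb.ToString();
    }
}
```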
There are multiple possibilities:

RS.lineStrings.Add(string.Concat($"[{a[1]}", ",", $"{a[0]}]"));
RS.lineStrings.Add(string.Concat("[", a[1], ",", a[0], "]"));
Documentation: https://learn.microsoft.com/en-us/dotnet/csharp/how-to/concatenate-multiple-strings
Here's a solution with a StringBuilder. It's wordy as all get out, but it should be much faster and produce much less garbage for a large number of items. It uses no string concatenation.
var listOfLists = new List<List<double>>
{
    new List<double> {1.0, 2.0, 3.0},
    new List<double> {3.14, 42.0}
};

var buffer = new StringBuilder();
buffer.Append('[');
var firstOuter = true;
foreach (var list in listOfLists)
{
    if (!firstOuter)
    {
        buffer.Append(','); // separator goes before the opening bracket
    }
    buffer.Append('[');
    var firstInner = true;
    foreach (var item in list)
    {
        if (!firstInner)
        {
            buffer.Append(',');
        }
        firstInner = firstOuter = false;
        buffer.Append(item);
    }
    buffer.Append(']');
}
buffer.Append(']');
var concatenated = buffer.ToString(); // "[[1,2,3],[3.14,42]]"
This is a two-part question.
I have programmatically determined a range of double values:
public static void Main(string[] args)
{
    var startRate = 0.0725;
    var rateStep = 0.001;
    var maxRate = 0.2;

    var stepsFromStartToMax = (int)Math.Ceiling((maxRate - startRate) / rateStep);

    var allRateSteps = Enumerable.Range(0, stepsFromStartToMax)
        .Select(i => startRate + (maxRate - startRate) * ((double)i / (stepsFromStartToMax - 1)))
        .ToArray();

    foreach (var i in allRateSteps)
    {
        Console.WriteLine(i); // this prints the correct values
    }
}
I would like to divide this list of numbers up into chunks based on the processor count, which I can get from Environment.ProcessorCount (usually 8). Ideally, I would end up with something like a List of Tuples, where each Tuple contains the start and end values for each chunk:
[(0.725, 0.813), (0.815, 0.955), ...]
1) How do you select out the inner ranges in less code, without having to know how many tuples I will need? I've come up with a long way to do this with loops, but I'm hoping LINQ can help here:
var counter = 0;
var listOne = new List<double>();
//...
var listEight = new List<double>();

foreach (var i in allRateSteps)
{
    counter++;
    if (counter < allRateSteps.Length / 8)
    {
        listOne.Add(i);
    }
    //...
    else if (counter < allRateSteps.Length / 1)
    {
        listEight.Add(i);
    }
}

// Now that I have lists, I can get their First() and Last() to create tuples
var tupleList = new List<Tuple<double, double>>{
    new Tuple<double, double>(listOne.First(), listOne.Last()),
    //...
    new Tuple<double, double>(listEight.First(), listEight.Last())
};
Once I have this new list of range Tuples, I want to use each of these as a basis for a parallel loop which writes to a ConcurrentDictionary during certain conditions. I'm not sure how to get this code into my loop...
I've got this piece of code working on multiple threads, but 2) how do I evenly distribute the work across all processors based on the ranges I've defined in tupleList:
var maxRateObj = new ConcurrentDictionary<string, double>();
var startTime = DateTime.Now;

Parallel.For(0,
    stepsFromStartToMax,
    new ParallelOptions
    {
        MaxDegreeOfParallelism = Environment.ProcessorCount
    },
    x =>
    {
        var i = (x * rateStep) + startRate;
        Console.WriteLine("{0} : {1} : {2} ",
            i,
            DateTime.Now - startTime,
            Thread.CurrentThread.ManagedThreadId);

        if (!maxRateObj.Any())
        {
            maxRateObj["highestRateSoFar"] = i;
        }
        else
        {
            if (i > maxRateObj["highestRateSoFar"])
            {
                maxRateObj["highestRateSoFar"] = i;
            }
        }
    });
This prints out, e.g.:
...
0.1295 : 00:00:00.4846470 : 5
0.0825 : 00:00:00.4846720 : 8
0.1645 : 00:00:00.4844220 : 6
0.0835 : 00:00:00.4847510 : 8
...
Thread1 needs to handle the ranges in the first tuple, thread2 handles the ranges defined in the second tuple, etc., where i is defined by the range in the loop. Again, the number of range tuples will depend on the number of processors. Thanks.
I would like to divide this list of numbers up into chunks based on the processor count
There are many possible implementations for a LINQ Batch method.
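For reference, one minimal sketch of such a Batch extension (MoreLINQ's version is more careful about edge cases):

```csharp
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Yields the source in lists of at most batchSize elements;
    // the final batch may be shorter.
    public static IEnumerable<List<T>> Batch<T>(this IEnumerable<T> source, int batchSize)
    {
        var bucket = new List<T>(batchSize);
        foreach (var item in source)
        {
            bucket.Add(item);
            if (bucket.Count == batchSize)
            {
                yield return bucket;
                bucket = new List<T>(batchSize);
            }
        }
        if (bucket.Count > 0)
            yield return bucket;
    }
}
```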
How do you select out the inner ranges in less code, without having to know how many tuples I will need?
Here's one way to handle that:
var batchRanges = from batch in allRateSteps.Batch(anyNumberGoesHere)
                  let first = batch.First()
                  let last = batch.Last()
                  select Tuple.Create(first, last);
(0.0725, 0.0795275590551181)
(0.0805314960629921, 0.0875590551181102)
(0.0885629921259842, 0.0955905511811024)
...
how do I evenly distribute the work across all processors based on the ranges I've defined in tupleList
This part of your example doesn't reference tupleList so it's hard to see the desired behavior.
Thread1 needs to handle the ranges in the first tuple, thread2 handles the ranges defined in the second tuple, etc...
Unless you have some hard requirement that certain threads process certain batches, I would strongly suggest generating your work as a single "stream" and using a higher-level abstraction for parallelism e.g. PLINQ.
If you just want to do work in batches, you can still do that but not care about which thread(s) the work is being done on:
static void Work(IEnumerable<int> ints) {
    var sum = ints.Sum();
    Thread.Sleep(sum);
    Console.WriteLine(sum); // reuse the sum rather than enumerating twice
}

public static void Main (string[] args) {
    var inputs = from i in Enumerable.Range(0, 100)
                 select i + i;

    var batches = inputs.Batch(8);

    var tasks = from batch in batches
                select Task.Run(() => Work(batch));

    Task.WaitAll(tasks.ToArray());
}
The default TaskScheduler is coordinating the work for you behind the scenes, and it'll likely outperform hand-rolling your own threading scheme.
Also consider something like this:
static int Work(IEnumerable<int> ints) {
    Console.WriteLine("Work on thread " + Thread.CurrentThread.ManagedThreadId);
    var sum = ints.Sum();
    Thread.Sleep(sum);
    return sum;
}

public static void Main (string[] args) {
    var inputs = from i in Enumerable.Range(0, 100)
                 select i + i;

    var batches = inputs.Batch(8);

    var tasks = from batch in batches
                select Work(batch);

    foreach (var task in tasks.AsParallel()) {
        Console.WriteLine(task);
    }
}
/*
Work on thread 6
Work on thread 4
56
Work on thread 4
184
Work on thread 4
Work on thread 4
312
440
...
*/
I am writing code which generates a lot of combinations ("combinations" might not be the right word here; rather, sequences of characters in the order they actually appear in the string) that already exist in a string. The loop starts adding combinations to a List<string>, but unfortunately it takes a lot of time when dealing with any file over 200 bytes. I want to be able to work with hundreds of MBs here.
Let me explain what I actually want in the simplest of ways.
Let's say I have a string that is "Afnan is awesome" (-> main string). What I want is a list of strings which encompasses the different substring sequences of the main string. For example -> A,f,n,a,n, ,i,s, ,a,w,e,s,o,m,e. Now this is just the first iteration of the loop. With each iteration, my substring length increases, yielding these results for the second iteration -> Af,fn,na,n , i,is,s , a,aw,we,es,so,om,me. The third iteration would look like this: Afn,fna,nan,an ,n i, is,is ,s a, aw, awe, wes, eso, som, ome. This will keep going on until my substring length reaches half the length of my main string.
My code is as follows:
string data = File.ReadAllText("MyFilePath");

//Creating my dictionary
List<string> dictionary = new List<string>();

int stringLengthIncrementer = 1;
for (int v = 0; v < (data.Length / 2); v++)
{
    for (int x = 0; x < data.Length; x++)
    {
        if ((x + stringLengthIncrementer) > data.Length) break; //So index does not go out of bounds
        if (dictionary.Contains(data.Substring(x, stringLengthIncrementer)) == false) //So no repetition takes place
        {
            dictionary.Add(data.Substring(x, stringLengthIncrementer)); //To add the substring to my List<string> -> dictionary
        }
    }
    stringLengthIncrementer++; //To increase substring length with each iteration
}
I use data.Length / 2 because I only need combinations at most half the length of the entire string. Note that I search the entire string for combinations, not half of it.
To further simplify what I am trying to do -> Suppose I have an input string =
"abcd"
the output would be =
a, b, c, d, ab, bc, cd. The rest will be cut out, as it is longer than half the length of my primary string -> //abc, bcd, abcd
I was hoping some regex method might help me achieve this. Anything that doesn't consist of loops? Anything that is exponentially faster than this? Some simple code with less complexity which is more efficient?
Update
When I used a HashSet<string> instead of List<string> for my dictionary, I did not experience any change in performance, and I also got an OutOfMemoryException:
You can use LINQ to simplify the code and very easily parallelize it, but it's not going to be orders of magnitude faster, which is what you would need to run it on files of hundreds of MBs (that's very likely impossible).
var data = File.ReadAllText("MyFilePath");

var result = Enumerable.Range(1, data.Length / 2)
    .AsParallel()
    .Select(len => new HashSet<string>(
        Enumerable.Range(0, data.Length - len + 1) //Adding the +1 here made it work perfectly
            .Select(x => data.Substring(x, len))))
    .SelectMany(t => t)
    .ToList();
General improvements that you can make in your code to improve the performance (I don't consider whether there are other, more optimal solutions):
calculate data.Substring(x, stringLengthIncrementer) only once
as you do a search, use a SortedList; it will be faster
initialize the List (or SortedList, or whatever) with a calculated number of items, like new List(CalculatedCapacity)
Or you can try to write an algorithm that produces the combinations without checking for duplicates.
You may be able to use HashSet combined with MoreLINQ's Batch feature (available on NuGet) to simplify the code a little.
public static void Main()
{
    string data = File.ReadAllText("MyFilePath");
    //string data = "Afnan is awesome";
    var dictionary = new HashSet<string>();

    for (var stringLengthIncrementer = 1; stringLengthIncrementer <= (data.Length / 2); stringLengthIncrementer++)
    {
        foreach (var skipper in Enumerable.Range(0, stringLengthIncrementer))
        {
            var batched = data.Skip(skipper).Batch(stringLengthIncrementer);
            foreach (var batch in batched)
            {
                dictionary.Add(new string(batch.ToArray()));
            }
        }
    }

    foreach (var z in dictionary) // HashSet has no ForEach, so enumerate it directly
        Console.WriteLine(z);
    Console.ReadLine();
}
For this input:
"Afnan is awesome askdjkhaksjhd askjdhaksjsdhkajd asjsdhkajshdkjahsd asksdhkajshdkjashd aksjdhkajsshd98987ad asdhkajsshd98xcx98asdjaksjsd askjdakjshcc98z98asdsad"
performance is roughly 10x faster than your current code.
I am trying to read a text file to check whether all the rows have the same number of columns. In local code it's working fine, but on the network shared folder (with permission set to Everyone) it only works for small files (5 MB); when I select a 10 MB or 500 MB file, the same code is not working (not working means it takes some time, but after a few minutes the page refreshes, that's it). It does not give any error or show any message. Below is the code to read the file and get the column count:
LinesLst = File.ReadLines(_fileName, Encoding.UTF8)
    .Select((line, index) =>
    {
        var count = line.Split(Delimiter).Length;
        if (NumberOfColumns < 0)
            NumberOfColumns = count;
        return new
        {
            line = line,
            count = count,
            index = index
        };
    })
    .Where(colCount => colCount.count != NumberOfColumns)
    .Select(colCount => colCount.line).ToList();
Perhaps you got an OutOfMemoryException on the large file. The fact is that many objects are created in the code on each iteration: the string array from line.Split and an anonymous object. Meanwhile, the anonymous object is in fact not needed. I would rewrite the code like so:
LinesLst = File.ReadLines(_fileName, Encoding.UTF8)
    .Where(line =>
    {
        var count = line.Split(Delimiter).Length;
        if (NumberOfColumns < 0)
            NumberOfColumns = count;
        return count != NumberOfColumns;
    })
    .ToList();
In addition, you can try to get rid of the creation of the string array when you call line.Split. Try to replace the line

var count = line.Split(Delimiter).Length;

with
// Assume that Delimiter is char[]
var count = line.Count(c => Delimiter.Contains(c)) + 1;
// Assume that Delimiter is char
var count = line.Count(c => Delimiter == c) + 1;
I have added AsyncPostBackTimeout="36000", which solved my problem.
I have a C# Queue<TimeSpan> containing 500 elements.
I need to reduce those into 50 elements by taking groups of 10 TimeSpans and selecting their average.
Is there a clean way to do this? I'm thinking LINQ will help, but I can't figure out a clean way. Any ideas?
I would use the Chunk function and a loop.
foreach (var set in source.ToList().Chunk(10))
{
    target.Enqueue(TimeSpan.FromMilliseconds(
        set.Average(t => t.TotalMilliseconds)));
}
Chunk is part of my standard helper library.
http://clrextensions.codeplex.com/
Source for Chunk
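On .NET 6 and later, LINQ ships a built-in Enumerable.Chunk with the same shape, so the loop above works there without any helper library; for example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Demo
{
    static void Main()
    {
        var source = new Queue<TimeSpan>(
            Enumerable.Range(1, 500).Select(i => TimeSpan.FromMilliseconds(i)));
        var target = new Queue<TimeSpan>();

        foreach (var set in source.ToList().Chunk(10)) // built into LINQ since .NET 6
        {
            target.Enqueue(TimeSpan.FromMilliseconds(
                set.Average(t => t.TotalMilliseconds)));
        }
        Console.WriteLine(target.Count); // 50
    }
}
```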
Take a look at the .Skip() and .Take() extension methods to partition your queue into sets. You can then use .Average(t => t.Ticks) to get the new TimeSpan that represents the average. Just jam each of those 50 averages into a new Queue and you are good to go.
Queue<TimeSpan> allTimeSpans = GetQueueOfTimeSpans();
Queue<TimeSpan> averages = new Queue<TimeSpan>(50);
int partitionSize = 10;

for (int i = 0; i < 50; i++)
{
    var avg = allTimeSpans.Skip(i * partitionSize).Take(partitionSize).Average(t => t.Ticks);
    averages.Enqueue(new TimeSpan((long)avg));
}
I'm a VB.NET guy, so there may be some syntax that isn't 100% right in that example. Let me know and I'll fix it!
Probably nothing beats a good old procedural execution in a method call in this case. It's not fancy, but it's easy, and it can be maintained by Jr. level devs.
public static Queue<TimeSpan> CompressTimeSpan(Queue<TimeSpan> original, int interval)
{
    Queue<TimeSpan> newQueue = new Queue<TimeSpan>();
    if (original.Count == 0) return newQueue;

    int current = 0;
    TimeSpan runningTotal = TimeSpan.Zero;
    while (original.Count > 0)
    {
        runningTotal += original.Dequeue();
        if (++current >= interval)
        {
            newQueue.Enqueue(TimeSpan.FromTicks(runningTotal.Ticks / interval));
            runningTotal = TimeSpan.Zero;
            current = 0;
        }
    }
    if (current > 0) // average any leftover partial group
        newQueue.Enqueue(TimeSpan.FromTicks(runningTotal.Ticks / current));
    return newQueue;
}
You could just use
static public TimeSpan[] Reduce(TimeSpan[] spans, int blockLength)
{
    TimeSpan[] avgSpan = new TimeSpan[spans.Length / blockLength];

    int currentIndex = 0;
    for (int outputIndex = 0;
         outputIndex < avgSpan.Length;
         outputIndex++)
    {
        long totalTicks = 0;
        for (int sampleIndex = 0; sampleIndex < blockLength; sampleIndex++)
        {
            totalTicks += spans[currentIndex].Ticks;
            currentIndex++;
        }
        avgSpan[outputIndex] =
            TimeSpan.FromTicks(totalTicks / blockLength);
    }
    return avgSpan;
}
It's a little more verbose (it doesn't use LINQ), but it's pretty easy to see what it's doing... (you can convert a Queue to/from an array pretty easily)
I'd use a loop, but just for fun:
IEnumerable<TimeSpan> AverageClumps(Queue<TimeSpan> lots, int clumpSize)
{
    while (lots.Any())
    {
        var portion = Math.Min(clumpSize, lots.Count);
        yield return Enumerable.Range(1, portion).Aggregate(TimeSpan.Zero,
            (t, x) => t.Add(lots.Dequeue()),
            (t) => new TimeSpan(t.Ticks / portion));
    }
}
That only examines each element once, so the performance is a lot better than the other LINQ offerings. Unfortunately, it mutates the queue, but maybe it's a feature and not a bug?
It does have the nice bonus of being an iterator, so it gives you the averages one at a time.
Zipping it together with the integers (0..n) and grouping by the sequence number div 10?
I'm not a LINQ user, but I believe it would look something like this:
for (n,item) from Enumerable.Range(0, queue.length).zip(queue) group by n/10
The Take(10) solution is probably better.
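In actual C#, that zip-and-group idea might look like this (a sketch; Select's index overload stands in for the explicit zip):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Demo
{
    static void Main()
    {
        var queue = new Queue<TimeSpan>(
            Enumerable.Range(0, 500).Select(i => TimeSpan.FromSeconds(i)));

        var averages = queue
            .Select((t, n) => new { t, n })   // pair each element with its index
            .GroupBy(x => x.n / 10)           // sequence number div 10
            .Select(g => TimeSpan.FromTicks((long)g.Average(x => x.t.Ticks)))
            .ToList();

        Console.WriteLine(averages.Count); // 50
    }
}
```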
How is the grouping going to be performed?
Assuming something very simple (take 10 at a time ), you can start with something like:
List<TimeSpan> input = Enumerable.Range(0, 500)
    .Select(i => new TimeSpan(0, 0, i))
    .ToList();

var res = input.Select((t, i) => new { time = t.Ticks, index = i })
    .GroupBy(v => v.index / 10, v => v.time)
    .Select(g => new TimeSpan((long)g.Average()));

int n = 0;
foreach (var t in res) {
    Console.WriteLine("{0,3}: {1}", ++n, t);
}
Notes:
Use the overload of Select to get the index, then use it with integer division to pick up groups of 10. You could use modulus to take every 10th element into one group, every 10th+1 into another, ...
The result of the grouping is a sequence of enumerations with a Key property, but we just need the separate sequences here.
There is no Enumerable.Average overload for IEnumerable<TimeSpan> so use Ticks (a long).
EDIT: Take groups of 10 to fit better with question.
EDIT2: Now with tested code.