I am trying to make the deserialization process in our application more performant, so I created the simple test below. When I check the results, it seems that JsonConvert.DeserializeObject runs much faster after the first iteration.
[TestMethod]
public void DeserializeObjectTest()
{
    int count = 5;
    for (int i = 0; i < count; i++)
    {
        CookieCache cookieCache = new CookieCache()
        {
            added = DateTime.UtcNow.AddDays(-1),
            VisitorId = Guid.NewGuid().ToString(),
            campaigns = new List<string>() { "qqq", "www", "eee" },
            target_dt = "3212018",
            updated = DateTime.UtcNow
        };
        Stopwatch stopwatch = Stopwatch.StartNew();
        string serializeObject = JsonConvert.SerializeObject(cookieCache);
        CookieCache deserializeObject = JsonConvert.DeserializeObject<CookieCache>(serializeObject);
        stopwatch.Stop();
        double stopwatchElapsedMilliseconds = stopwatch.Elapsed.TotalMilliseconds;
        Debug.WriteLine("iteration " + i + ": " + stopwatchElapsedMilliseconds);
    }
}
And my results:
I think I am using Stopwatch correctly. So is JSON.NET using some sort of internal caching or optimization on subsequent deserialization calls?
Since this is a web application, I am, of course, seeing similar results in my logs (280.6466 ms) for every single web request.
So am I missing something on my test? Or is it expected behavior?
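For what it's worth, Json.NET builds and caches a serialization contract (reflection metadata) for each type the first time it sees it, so the first round-trip per type pays a one-time setup cost. A minimal sketch of how the test could be restructured to measure steady-state cost instead, reusing the CookieCache type and cookieCache instance from the test above:

```csharp
// Untimed warm-up: the first call per type pays the one-time contract/reflection setup.
var warmup = JsonConvert.SerializeObject(new CookieCache());
JsonConvert.DeserializeObject<CookieCache>(warmup);

// Now time many iterations and report the average per round-trip.
const int iterations = 1000;
Stopwatch stopwatch = Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
{
    string json = JsonConvert.SerializeObject(cookieCache);
    JsonConvert.DeserializeObject<CookieCache>(json);
}
stopwatch.Stop();
Debug.WriteLine("avg ms per round-trip: " + stopwatch.Elapsed.TotalMilliseconds / iterations);
```

In a web application the same idea applies: a warm-up call at startup moves the one-time cost out of the first user request.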
Related
I have been using Stopwatch in one of my DevExpress-based applications. I created a FlaUI-based test case that activates the keyboard, enters a value, and then moves to the next column. There are 20+ columns in a row in the DevExpress grid I am using. When I run that test case 5 times to establish a baseline timing, I see wildly different results.
I am pasting an extract of my current code here:
public bool CreateNewGdistVoyageForPerformanceTesting(IEnumerable<GdistBenchVoyageParameters> gdistparameters,
string gridAutomationId)
{
var watch = System.Diagnostics.Stopwatch
.StartNew(); //This line of code is being used to monitor the time taken by each Keyboard operation
long TotalTimeConsumed = 0;
int MaxAllowedTime = 0;
int HasTimeExceeded = 0;
bool TimeHasNotExceeded = true;
_logger.Info("Creating a new GDIST Voyage");
TabItem ParentControl = VoyageEditorTabControl.SelectedTabItem;
var CurrentSelectedTab = VoyageEditorTabControl.SelectedTabItemIndex;
var ParentGrid = VoyageEditorTabControl.TabItems[CurrentSelectedTab]
.FindFirstDescendant(cf => cf.ByAutomationId(gridAutomationId)).AsGrid();
_controlAction.Highlight(ParentGrid);
var Pattern = ParentGrid.Patterns.Grid;
var RowCount = Pattern.Pattern.RowCount;
var ColumnCount = Pattern.Pattern.ColumnCount;
_logger.Info("======================================================================================");
if (ParentGrid.AutomationId.Equals("ParentGridControl"))
{
_logger.Info($"Performance Testing on GDIST's View Main Grid :{gridAutomationId}");
_logger.Info($"Current Grid Row count is: {RowCount}");
_logger.Info("Creating a new voyage for GDIST Bench");
}
else
{
_logger.Info($"Performance Testing on GDIST's Similar Voyages Panel Grid: {gridAutomationId}");
_logger.Info($"Current Grid Row count is: {RowCount}");
_logger.Info("Editing an existing voyage for GDIST Bench's Similar Voyages Panel");
}
for (int i = 0; i < ColumnCount; i++)
{
var cell = ParentGrid.Patterns.Grid.Pattern.GetItem(ParentGrid.RowCount - 1, i);
if (cell == null)
{
_logger.Warning("No Columns found with matching Automation Ids");
break;
}
if (cell.AutomationId.Equals("Vessel"))
{
MaxAllowedTime = 1500;
gdistparameters.ToList().ForEach(voyageDetailsField =>
{
if (voyageDetailsField.VesselId != null)
{
_logger.Info("Adding Data in Vessel ID ");
cell.Focus();
cell.Click();
_logger.Info($"Entered value is:{voyageDetailsField.VesselId}");
watch.Restart(); // restart so that only the Keyboard.Type call below is timed
Keyboard.Type(voyageDetailsField.VesselId.Trim());
watch.Stop();
Keyboard.Press(VirtualKeyShort.TAB);
// _controlAction.WaitFor(new TimeSpan(0, 0, 2));
Wait.UntilInputIsProcessed();
_logger.Info($"Execution Time: {watch.ElapsedMilliseconds} ms");
if (watch.ElapsedMilliseconds > MaxAllowedTime)
{
HasTimeExceeded++;
_logger.Warning($"The data entry time has exceeded beyond the fixed value by {watch.ElapsedMilliseconds - MaxAllowedTime} ms");
}
TotalTimeConsumed = TotalTimeConsumed + watch.ElapsedMilliseconds;
}
});
}
if (cell.AutomationId.Equals("LoadDate")) //Load Date
{
MaxAllowedTime = 500;
gdistparameters.ToList().ForEach(voyageDetailsField =>
{
// _logger.Info("Adding data into the Load Date field");
if (voyageDetailsField.LoadDate != null)
{
_logger.Info("Adding Data in Load Date ");
cell.Focus();
cell.Click();
_logger.Info($"Entered value is:{voyageDetailsField.LoadDate}");
watch.Restart(); // restart so that only the Keyboard.Type call below is timed
Keyboard.Type(voyageDetailsField.LoadDate.Trim());
watch.Stop();
Keyboard.Press(VirtualKeyShort.TAB);
// _controlAction.WaitFor(new TimeSpan(0, 0, 2));
Wait.UntilInputIsProcessed();
_logger.Info($"Execution Time: {watch.ElapsedMilliseconds} ms");
if (watch.ElapsedMilliseconds > MaxAllowedTime)
{
HasTimeExceeded++;
_logger.Warning(
$"The data entry time has exceeded beyond the fixed value by {watch.ElapsedMilliseconds - MaxAllowedTime} ms");
}
TotalTimeConsumed = TotalTimeConsumed + watch.ElapsedMilliseconds;
}
});
}
The timings I have been observing via the logger are below.
I have run this on multiple PCs and in multiple environments, but the results vary widely, and the 5th run takes much longer in every single case.
Note also that all the data being entered is populated when the application loads, so network latency shouldn't be a problem here.
Moreover, I read that Stopwatch timings include a lot of JIT noise on the first run, and I did experience that every time the code ran for the first time, so I have already given it a warm-up (false start) in my code.
This test is a performance test and requires benchmarking, but we can't settle on a baseline with such big differences in the numbers.
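One common way to get a usable baseline out of noisy UI timings is to collect every sample and report the median (plus min/max) rather than comparing individual runs; the median is far less sensitive to a single outlier run (JIT warm-up, GC pause, focus change) than the mean or any one measurement. A rough sketch, reusing the Stopwatch/logger style of the code above (`value` stands in for the text being typed and is a hypothetical placeholder):

```csharp
// Collect one sample per run, then summarize with order statistics.
var samples = new List<long>();
for (int run = 0; run < 5; run++)
{
    var watch = Stopwatch.StartNew();
    Keyboard.Type(value);          // the operation under test
    watch.Stop();
    Wait.UntilInputIsProcessed();  // settle outside the timed region
    samples.Add(watch.ElapsedMilliseconds);
}
samples.Sort();
long median = samples[samples.Count / 2];
_logger.Info($"median: {median} ms, min: {samples[0]} ms, max: {samples[samples.Count - 1]} ms");
```

Comparing medians across machines is usually much more stable than comparing fifth runs.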
My program lets me send requests via a WSDL service; the class below is generated from the WSDL:
CreateCustomerNoteRequest createCustomerNotesRequestInfo = new CreateCustomerNoteRequest();
Using this class I have to set the variables like this:
//First, write a note to the old account saying it has been compromised and show the new customer number:
createCustomerNotesRequestInfo.UserName = username;
createCustomerNotesRequestInfo.Password = password;
createCustomerNotesRequestInfo.SystemToken = "sysToken";
createCustomerNotesRequestInfo.Note = new CustomerNote();
createCustomerNotesRequestInfo.Note.CustomerNumber = cloneCustomerNumber;
createCustomerNotesRequestInfo.Note.Category = new CustomerServiceWSDL.LookupItem();
createCustomerNotesRequestInfo.Note.Category.Code = "GEN";
createCustomerNotesRequestInfo.Note.Details = "Account Takeover – Fraud. Acc – " + customerNumberTextBox.Text + " closed as compromised and new account " + newCloneCustomerNumber + " created matching existing data";
And to finish off I use this to get my response:
createCustomerNotesResponse = soapClient.CreateCustomerNote(createCustomerNotesRequestInfo);
Everything works fine. Because I have multiple Notes, I now want to loop this process so that, depending on how many Notes there are, it creates that many request instances.
I successfully get all the Notes into a list like this, using notesCount (provided by the WSDL), which tells me how many notes there are, so all is good so far:
try
{
for (int i = 0; i <= notesCount; i++)
{
customerNotesArrayList.Add(getCustomerNotesResponse.Notes.Items[i]);
//i++;
}
}
What I want to do: now, depending on the notes count, I want to create that many of these:
CreateCustomerNoteRequest createCustomerNotesRequestInfo = new CreateCustomerNoteRequest();
I tried this:
for (int i=0; i<=notesCount;i++)
{
CreateCustomerNoteRequest a[i] = new CreateCustomerNoteRequest();
}
But it's not as easy as that, so how can I loop to make this happen?
So I want a1, a2, a3, which I'll then loop all the notes into later; that shouldn't be a problem. But creating these instances in the first place is the problem.
[EDIT]
//Create Notes and copy over array contents...
CreateCustomerNoteRequest request = new CreateCustomerNoteRequest();
for (int i = 0; i <= notesCount; i++)
{
request.UserName = username;
request.Password = password;
request.SystemToken = systemToken;
request.Note = new CustomerNote();
request.Note.CustomerNumber = newCloneCustomerNumber;
request.Note.Category = new CustomerServiceWSDL.LookupItem();
request.Note.Category.Code = customerNotesArrayList[i].NoteCategory.Code.ToString();
request.Note.Details = customerNotesArrayList[i].NoteText;
var response = soapClient.CreateCustomerNote(request);
}
You're declaring the array inside the loop, which means it won't be available afterwards. Furthermore you need to declare the array size beforehand:
CreateCustomerNoteRequest[] a = new CreateCustomerNoteRequest[notesCount];
for (int i = 0; i < notesCount; i++)
{
a[i] = new CreateCustomerNoteRequest();
}
// now you can use the array outside the loop as well
Instead of an array you could choose to use a List<CreateCustomerNoteRequest>, which doesn't need a size declaration first.
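For illustration, a minimal sketch of that List alternative (no size needed up front):

```csharp
// A List grows as needed, so notesCount doesn't have to be known at declaration time.
var requests = new List<CreateCustomerNoteRequest>();
for (int i = 0; i < notesCount; i++)
{
    requests.Add(new CreateCustomerNoteRequest());
}
// requests[0], requests[1], ... now play the role of a1, a2, a3, ...
```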
Note that if you're planning to get the notes inside the same loop, you won't need the array at all:
for (int i = 0; i < notesCount; i++)
{
CreateCustomerNoteRequest request = new CreateCustomerNoteRequest();
var response = soapClient.CreateCustomerNote(request);
// todo process response
}
I was optimizing my code, and I noticed that using properties (even auto properties) has a profound impact on the execution time. See the example below:
[Test]
public void GetterVsField()
{
PropertyTest propertyTest = new PropertyTest();
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
propertyTest.LoopUsingCopy();
Console.WriteLine("Using copy: " + stopwatch.ElapsedMilliseconds / 1000.0);
stopwatch.Restart();
propertyTest.LoopUsingGetter();
Console.WriteLine("Using getter: " + stopwatch.ElapsedMilliseconds / 1000.0);
stopwatch.Restart();
propertyTest.LoopUsingField();
Console.WriteLine("Using field: " + stopwatch.ElapsedMilliseconds / 1000.0);
}
public class PropertyTest
{
public PropertyTest()
{
NumRepet = 100000000;
_numRepet = NumRepet;
}
int NumRepet { get; set; }
private int _numRepet;
public int LoopUsingGetter()
{
int dummy = 314;
for (int i = 0; i < NumRepet; i++)
{
dummy++;
}
return dummy;
}
public int LoopUsingCopy()
{
int numRepetCopy = NumRepet;
int dummy = 314;
for (int i = 0; i < numRepetCopy; i++)
{
dummy++;
}
return dummy;
}
public int LoopUsingField()
{
int dummy = 314;
for (int i = 0; i < _numRepet; i++)
{
dummy++;
}
return dummy;
}
}
In Release mode on my machine I get:
Using copy: 0.029
Using getter: 0.054
Using field: 0.026
which in my case is a disaster - the most critical loop just can't use any properties if I want to get maximum performance.
What am I doing wrong here? I was thinking that these would be inlined by the JIT optimizer.
Getters/setters are syntactic sugar for methods with a few special conventions (the implicit "value" parameter in a setter, and no visible parameter list).
According to this article, "If any of the method's formal arguments are structs, the method will not be inlined." Since ints are structs, I think this limitation applies.
I haven't looked at the IL produced by the following code, but I did get some interesting results that I think show it working this way...
using System;
using System.Diagnostics;
public static class Program{
public static void Main()
{
PropertyTest propertyTest = new PropertyTest();
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
propertyTest.LoopUsingField();
Console.WriteLine("Using field: " + stopwatch.ElapsedMilliseconds / 1000.0);
stopwatch.Restart();
propertyTest.LoopUsingBoxedGetter();
Console.WriteLine("Using boxed getter: " + stopwatch.ElapsedMilliseconds / 1000.0);
stopwatch.Restart();
propertyTest.LoopUsingUnboxedGetter();
Console.WriteLine("Using unboxed getter: " + stopwatch.ElapsedMilliseconds / 1000.0);
}
}
public class PropertyTest
{
public PropertyTest()
{
_numRepeat = 1000000000L;
_field = 1;
Property = 1;
IntProperty = 1;
}
private long _numRepeat;
private object _field = null;
private object Property {get;set;}
private int IntProperty {get;set;}
public void LoopUsingBoxedGetter()
{
for (long i = 0; i < _numRepeat; i++)
{
var f = Property;
}
}
public void LoopUsingUnboxedGetter()
{
for (long i = 0; i < _numRepeat; i++)
{
var f = IntProperty;
}
}
public void LoopUsingField()
{
for (long i = 0; i < _numRepeat; i++)
{
var f = _field;
}
}
}
This produces, ON MY MACHINE, running OS X with a recent version of Mono, these results (in seconds):
Using field: 2.606
Using boxed getter: 2.585
Using unboxed getter: 2.71
You say you are optimizing your code, but I am curious how, what the functionality is supposed to be, and what the source data coming into this is, as well as its size, as this is clearly not "real" code. If you are searching a large list of data, consider using BinarySearch (on a sorted list); it is significantly faster than, say, Contains() on very large data sets.
List<int> myList = GetOrderedList();
if (myList.BinarySearch(someValue) < 0)
// List does not contain data
Perhaps you are simply looping through data. If you are looping through data and returning values, you may want to use the yield keyword. Additionally, consider the parallel library if you can, or manage your own threads.
This does not seem like what you want judging by the posted source, but it is very generic, so I figured it was worth mentioning.
public IEnumerable<int> LoopUsingGetter()
{
int dummy = 314;
for (int i = 0; i < NumRepet; i++)
{
dummy++;
yield return dummy;
}
}
[ThreadStatic]
private static int dummy = 314;
public static int Dummy
{
    get
    {
        if (dummy != 314) // or whatever your condition
        {
            return dummy;
        }
        Parallel.ForEach(LoopUsingGetter(), (i) =>
        {
            // DoWork(); not ideal for the given example, but due to the
            // generic context this may help
            dummy += i;
        });
        return dummy;
    }
}
Follow the 80/20 performance rule instead of micro-optimizing.
Write code for maintainability instead of performance.
Perhaps assembly language is the fastest, but that does not mean we should use assembly for everything.
You are running the loop 100 million times and the total difference is only about 28 milliseconds, a fraction of a nanosecond per iteration. Calling a function has some overhead, but in most cases it does not matter; you can trust the compiler to inline and to do more advanced things.
Directly accessing fields will be problematic in 99% of cases: you lose control over where the variable is referenced, and when something does go wrong you end up fixing it in too many places.
You should stop the stopwatch when the loop completes. Your stopwatch is still running while you are writing to the console, which adds extra time and can skew your results.
[Test]
public void GetterVsField()
{
PropertyTest propertyTest = new PropertyTest();
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
propertyTest.LoopUsingCopy();
stopwatch.Stop();
Console.WriteLine("Using copy: " + stopwatch.ElapsedMilliseconds / 1000.0);
stopwatch.Reset();
stopwatch.Start();
propertyTest.LoopUsingGetter();
stopwatch.Stop();
Console.WriteLine("Using getter: " + stopwatch.ElapsedMilliseconds / 1000.0);
stopwatch.Reset();
stopwatch.Start();
propertyTest.LoopUsingField();
stopwatch.Stop();
Console.WriteLine("Using field: " + stopwatch.ElapsedMilliseconds / 1000.0);
}
You have to check whether the "Optimize code" checkbox is checked.
If it is not checked, property access is still a method call.
If it is checked, the property is inlined and the performance is the same as with direct field access, because the JITed code will be the same.
There are more restrictions on inlining in the x64 JIT compiler. More information about JIT64 inlining optimization is here:
David Broman's CLR Profiling API Blog: Tail call JIT conditions.
See point #3: "The caller or callee return a value type."
If your property returns a reference type, the property getter will be inlined.
That means the property int NumRepet { get; set; } is not inlined, but object NumRepet { get; set; } will be inlined if you don't break another restriction.
The x64 JIT's optimizer is poor, and this is why a new one is being introduced, as John mentions.
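As a side note (this is not in the original code): since .NET 4.5 you can hint the JIT with MethodImplOptions.AggressiveInlining. It is a hint, not a guarantee, and it may not override every hard inlining restriction, but it is worth trying on hot getters. A sketch:

```csharp
using System.Runtime.CompilerServices;

public class PropertyTest
{
    private int _numRepet;

    public int NumRepet
    {
        // Hint (not a guarantee) that the JIT should inline this getter.
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get { return _numRepet; }
        set { _numRepet = value; }
    }
}
```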
I have a function which I use for stress testing, and I want to stress-test multiple functions. The function I am stress testing here is GetParameters(reportUri, SessionContext).
How can I add a helper function to which I can pass something like an action body or a delegate (a delegate is good, but I have multiple functions with different parameters), and have it execute all the steps, replacing only the line this.RemoteReportingServiceFactory.CreateReportParameterProvider().GetParameters(reportUri, SessionContext); dynamically? The entire function body is going to be the same except for that line.
public void GetParameters()
{
for (int i = 0; i < 100; i++)
{
Log.Message(TraceEventType.Information, "Start of {0} sequential iteration with 5 parallel stress runs".InvariantFormat(i));
Parallel.For(0, 2, parameterIteration =>
{
Log.Message(TraceEventType.Information, "Stress run count : {0}".InvariantFormat(parameterIteration + 1));
string reportUrl = TeamFoundationTestConfig.TeamFoundationReportPath("TaskGroupStatus");
ReportUri reportUri = ReportUri.Create(reportUrl);
Log.Message(TraceEventType.Information, "ReportUri = {0}".InvariantFormat(reportUri.UriString));
IList<Parameter> parameters = this.RemoteReportingServiceFactory.CreateReportParameterProvider().GetParameters(reportUri, SessionContext);
});
}
}
Let me know if I am not clear enough; I can edit my question as requested.
How about changing the method to something like
public void GetParameters(Func<ReportUri, SessionContext, IList<Parameter>> returnStuff)
{
for (int i = 0; i < 100; i++)
{
Log.Message(TraceEventType.Information, "Start of {0} sequential iteration with 5 parallel stress runs".InvariantFormat(i));
Parallel.For(0, 2, parameterIteration =>
{
Log.Message(TraceEventType.Information, "Stress run count : {0}".InvariantFormat(parameterIteration + 1));
string reportUrl = TeamFoundationTestConfig.TeamFoundationReportPath("TaskGroupStatus");
ReportUri reportUri = ReportUri.Create(reportUrl);
Log.Message(TraceEventType.Information, "ReportUri = {0}".InvariantFormat(reportUri.UriString));
IList<Parameter> parameters = returnStuff(reportUri, SessionContext);
});
}
}
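Calling it then looks like this (a sketch; the lambda simply wraps the one line that varies between stress tests):

```csharp
// Stress-test the original provider call from the question:
GetParameters((reportUri, sessionContext) =>
    this.RemoteReportingServiceFactory
        .CreateReportParameterProvider()
        .GetParameters(reportUri, sessionContext));

// Any other call that returns IList<Parameter> from the same inputs
// can be plugged in the same way, without duplicating the loop body.
```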
struct mydata
{
public int id;
public string data;
}
class Program
{
static void Main(string[] args)
{
List<mydata> myc = new List<mydata>();
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
for (int i = 0; i < 1000000; i++)
{
mydata d = new mydata();
d.id = i;
d.data = string.Format("DataValue {0}",i);
myc.Add(d);
}
stopwatch.Stop();
Console.WriteLine("End: {0}", stopwatch.ElapsedMilliseconds);
}
}
Why is the code above so slow?
On an older laptop the times are:
C# code above: 1500ms
Similar code in Delphi: 450ms....
I then changed the code to use a KeyValuePair (see below):
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
var list = new List<KeyValuePair<int , string>>();
for (int i = 0; i < 1000000; i++)
{
list.Add(new KeyValuePair<int,string>(i, "DataValue" + i));
}
stopwatch.Stop();
Console.WriteLine("End: {0}", stopwatch.ElapsedMilliseconds);
Console.ReadLine();
This improved the time to 1150ms.
If I remove the '+ i', the time is < 300ms.
If I try to replace it with a StringBuilder, the timing is similar.
StringBuilder sb = new StringBuilder();
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
var list = new List<KeyValuePair<int, string>>();
for (int i = 0; i < 1000000; i++)
{
sb.Append("DataValue");
sb.Append(i);
list.Add(new KeyValuePair<int, string>(i, sb.ToString()));
sb.Clear();
}
stopWatch.Stop();
Console.WriteLine("End: {0}", stopWatch.ElapsedMilliseconds);
Console.ReadLine();
This is slightly better. If you remove the sb.Append(i), it's very fast.
It would appear that any time you have to append an int to a string/StringBuilder, it's VERY slow.
Can I speed this up in any way?
[EDIT]
The code below is the quickest I can get after applying the suggestions:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Threading;
namespace ConsoleApplication1
{
struct mydata
{
public int id;
public string data;
}
class Program
{
static void Main(string[] args)
{
List<mydata> myc = new List<mydata>();
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
for (int i = 0; i < 1000000; i++)
{
mydata d = new mydata();
d.id = i;
d.data = "DataValue " + i.ToString();
myc.Add(d);
}
stopwatch.Stop();
Console.WriteLine("End: {0}", stopwatch.ElapsedMilliseconds);
Console.ReadLine();
}
}
}
If I replace the line:
d.data = "DataValue " + i.ToString();
with:
d.data = "DataValue ";
On my home machine this goes from 660ms to 31ms.
Yes, it's 630ms slower with the '+ i.ToString()'.
But that's still 2x faster than boxing/string.Format, etc.
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
var list = new List<KeyValuePair<int, string>>();
for (int i = 0; i < 1000000; i++)
{
list.Add(new KeyValuePair<int, string>(i, "DataValue" +i.ToString()));
}
stopwatch.Stop();
Console.WriteLine("End: {0}", stopwatch.ElapsedMilliseconds);
Console.ReadLine();
is 612ms (there is no difference in speed if the list is pre-initialised as new List<KeyValuePair<int, string>>(1000000)).
The problem with your first two examples is that the integer must first be boxed and then converted to a string. The boxing causes the code to be slower.
For example, in this line:
d.data = string.Format("DataValue {0}", i);
the second parameter to string.Format is object, which causes boxing of i. See the intermediate language code for confirmation of this:
...
box int32
call string [mscorlib]System.String::Format(string, object)
...
Similarly this code:
d.data = "DataValue " + i;
is equivalent to this:
d.data = String.Concat("DataValue ", i);
This uses the overload of String.Concat with parameters of type object so again this involves a boxing operation. This can be seen in the generated intermediate language code:
...
box int32
call string [mscorlib]System.String::Concat(object, object)
...
For better performance this approach avoids the boxing:
d.data = "DataValue " + i.ToString();
Now the intermediate language code doesn't include the box instruction and it uses the overload of String.Concat that takes two strings:
...
call instance string [mscorlib]System.Int32::ToString()
call string [mscorlib]System.String::Concat(string, string)
...
On my machine:
... String.Format("DataValue {0}", i ) // ~1650ms
... String.Format("DataValue {0}", "") // ~1250ms
... new MyData {Id = i, Data = "DataValue {0}" + i} // ~1200ms
As Mark said, there's a boxing operation involved.
For this specific case, where DataValue is derived from the id, you could create a get-only property or override the ToString() method to do that work only when you need it:
public override string ToString()
{
    return "DataValue " + Id;
}
There are a lot of things wrong with the above that will be affecting your results.
First, none of the comparisons you've made are like-for-like. In both cases you have a list and call Add; what you add to the list won't affect the time, and changing the declaration of the list to var won't affect the time either.
I'm not fully convinced by the boxing argument Mark put up. Boxing can be a problem, but I'm pretty certain that in the first case there is an implicit call to ToString. That has its own overhead, and it would be needed even if the int were boxed.
Format is quite an expensive operation.
The second version uses string concatenation, which is probably cheaper than Format.
The third is just expensive all the way through. Using a StringBuilder like that is not efficient: internally a StringBuilder keeps a buffer of the appended pieces, and when you call ToString on it you essentially pay for a big concat operation at that point.
The reason some of the operations might suddenly run really quickly when you take out a critical line is that the compiler can optimise bits of code away. If the code appears to do the same thing over and over with no observable effect, it may simply not do it (a gross oversimplification).
Right, so here's my suggestion:
The first version is probably the nearest to being "right" in my mind. What you could do is defer some of the processing: give mydata a string property AND an int property, then produce the combined output via a concat only when you actually read the string. That saves work if you are not going to print every item, though it won't necessarily be quicker in the way you expect.
Another major performance killer in this code is the List. Internally it stores the items in an array. When you call Add, it checks whether the new item fits into the array (EnsureCapacity). When it needs more room, it creates a NEW array with double the size and then COPIES the items from the old array into the new one. You can see all this going on if you inspect List.Add in Reflector.
So with 1,000,000 items, your code needs to copy the array around 20 times, and the array is bigger each time.
If your change your code to
var list = new List<KeyValuePair<int, string>>(1000000);
you should see a dramatic increase in speed. Let us know how you fare!