WCF maxes CPU when waiting on _TransparentProxyStub_CrossContext function during call - c#

I'm getting heavy CPU usage when making calls to Cisco's AXL SOAP API using WCF. I start by creating a ClientBase-derived client from classes generated from the WSDL. I'm using BasicHttpBinding with TransferMode.Buffered. When executing a call, the CPU maxes out, and a CPU profile shows that 96% of CPU time is spent in _TransparentProxyStub_CrossContext#0 from clr.dll, which is called after calls such as base.Channel.getPhone(request);. More precisely, the call maxes out the CPU core that the process is running on.
Here's a snippet of the client that was generated from the WSDL:
[System.CodeDom.Compiler.GeneratedCodeAttribute("System.ServiceModel", "")]
public partial class AXLPortClient : System.ServiceModel.ClientBase<AxlNetClient.AXLPort>, AxlNetClient.AXLPort
{
    public AXLPortClient() { }
    public AXLPortClient(string endpointConfigurationName) :
        base(endpointConfigurationName) { }
    // ... remaining generated constructors and operations omitted
}
This is how I create the client:
public class AxlClientFactory : IAxlClientFactory
{
    private const string AxlEndpointUrlFormat = "https://{0}:8443/axl/";

    public AXLPortClient CreateClient(IUcClientSettings settings)
    {
        ServicePointManager.ServerCertificateValidationCallback = (sender, certificate, chain, errors) => true;
        ServicePointManager.Expect100Continue = false;

        var basicHttpBinding = new BasicHttpBinding(BasicHttpSecurityMode.Transport);
        basicHttpBinding.Security.Transport.ClientCredentialType = HttpClientCredentialType.Basic;
        basicHttpBinding.MaxReceivedMessageSize = 20000000;
        basicHttpBinding.MaxBufferSize = 20000000;
        basicHttpBinding.MaxBufferPoolSize = 20000000;
        basicHttpBinding.ReaderQuotas.MaxDepth = 32;
        basicHttpBinding.ReaderQuotas.MaxArrayLength = 20000000;
        basicHttpBinding.ReaderQuotas.MaxStringContentLength = 20000000;
        basicHttpBinding.TransferMode = TransferMode.Buffered;
        //basicHttpBinding.UseDefaultWebProxy = false;

        var axlEndpointUrl = string.Format(AxlEndpointUrlFormat, settings.Server);
        var endpointAddress = new EndpointAddress(axlEndpointUrl);
        var axlClient = new AXLPortClient(basicHttpBinding, endpointAddress);
        axlClient.ClientCredentials.UserName.UserName = settings.User;
        axlClient.ClientCredentials.UserName.Password = settings.Password;
        return axlClient;
    }
}
The generated wsdl code for the AXL API is very large. Both initial and subsequent calls have the CPU issue, although subsequent calls are faster. Is there anything else I can do to debug this issue? Is there a way to reduce this high CPU usage?
A bit more info with the bounty:
I've created the C# classes like so:
svcutil AXLAPI.wsdl AXLEnums.xsd AXLSoap.xsd /t:code /l:C# /o:Client.cs /n:*,AxlNetClient
You have to download the WSDL for Cisco's AXL API from a Call Manager system. I'm using the 10.5 version of the API. I believe a major part of the slowdown is related to XML processing. The WSDL for the API is huge, and the resulting classes amount to 538406 lines of code!
Update 2
I've turned on WCF tracing with all levels. The largest time difference is in the process action activity between "A message was written" and "Sent a message over a channel" in which nearly a full minute passes between these two actions. Other activities (construct channel, open clientbase and close clientbase) all execute relatively fast.
Update 3
I've made two changes to the generated client classes. First, I removed the ServiceKnownTypeAttribute from all the operation contracts. Second, I removed the XmlIncludeAttribute from some of the serializable classes. These two changes reduced the file size of the generated client by more than 50% and had a small impact on test times (a reduction of about 10s on a 70s test result).
I also noticed that I have roughly 900 operation contracts for a single service interface and endpoint. This is due to the wsdl for the AXL API grouping all operations under a single namespace. I'm thinking about breaking this up, but that would mean creating multiple clientbases that would each implement a reduced interface and end up breaking everything that implements this wcf library.
Update 4
It looks like the number of operations is the central problem. I was able to separate out operations and interface definitions by verb (e.g. gets, adds, etc) into their own clientbase and interface (a very slow process using sublime text and regex as resharper and codemaid couldn't handle the large file that's still 250K+ lines). A test of the "Get" client with about 150 operations defined resulted in a 10 second execution for getPhone compared to a previous 60 second result. This is still a lot slower than it should be as simply crafting this operation in fiddler results in a 2 second execution. The solution will probably be reducing the operation count even more by trying to separate operations further. However, this adds a new problem of breaking all systems that used this library as a single client.

I've finally nailed down this problem. The root cause does appear to be the number of operations. After splitting up the generated client from 900+ operations to 12 each (following guidance from this question) I was able to reduce the processor time spent on generating requests to nearly zero.
This is the final process for optimizing the generated service client from Cisco's AXL wsdl:
Generate client code using wsdl like so:
svcutil AXLAPI.wsdl AXLEnums.xsd AXLSoap.xsd /t:code /l:C# /o:Client.cs /n:*,AxlNetClient
Process the generated client file to break up into sub clients:
I created this script to process the generated code. This script does the following:
Remove ServiceKnownType, FaultContract, and XmlInclude attributes.
These are useful for XML processing, but the generated attributes appear to be incorrect from what I understand. The ServiceKnownType list, for example, is identical for all operations even though many of the known types are unique to each operation. Removing them reduces the total size of the generated file from 500K+ lines to 250K+, with a minor performance increase in client instantiation time.
Separate out the operation contracts from the interface, and the methods from the ClientBase that implements the interface.
Create subclients each with 12 operations and their respective implementation.
These subclients have three main parts. The first part is a partial class of the original ClientBase client. I want this solution to be backwards compatible, so I've got methods here that reference the subclient so that calls to the old super-client still work by calling the new subclient. A static get accessor will instantiate the subclient if any of its implemented operations are referenced. There are also events added for when close or abort is called so that subclients can still run these operations.
The second and third parts of the subclient are the interface and the subclient class that implements the 12 operations.
I then removed the interface and client methods from the original generated client. I replaced the client constructors for the original client to simply store the binding and endpoint data for subclients to use when needed. Close and abort calls were recreated as event invokers that each subclient would subscribe to when instantiated.
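As a rough illustration of the backwards-compatible layout described above, here is a minimal sketch. All type names (SuperClient, PhoneSubClient, IPhoneOperations) are hypothetical stand-ins for the generated AXL types, the sub-client is a plain class rather than a ClientBase<T>, and it uses a per-instance Lazy<T> instead of the static accessor mentioned in the post.

```csharp
using System;

// Hypothetical reduced interface: one small slice of the original contract.
public interface IPhoneOperations
{
    string GetPhone(string name);
}

// Hypothetical sub-client; the real one would derive from ClientBase<IPhoneOperations>.
public sealed class PhoneSubClient : IPhoneOperations
{
    public bool IsClosed { get; private set; }

    public PhoneSubClient(SuperClient owner)
    {
        // Subscribe so that closing the facade also closes this sub-client.
        owner.Closing += (sender, e) => IsClosed = true;
    }

    public string GetPhone(string name)
    {
        return "phone:" + name; // placeholder for the real service call
    }
}

// Facade that keeps the old single-client surface and forwards to sub-clients.
public sealed class SuperClient
{
    private readonly Lazy<PhoneSubClient> phone;

    public event EventHandler Closing;

    public SuperClient()
    {
        // The sub-client is only constructed when one of its operations is first used.
        phone = new Lazy<PhoneSubClient>(() => new PhoneSubClient(this));
    }

    public bool PhoneClientCreated
    {
        get { return phone.IsValueCreated; }
    }

    // Old entry point, forwarded to the sub-client that owns the operation.
    public string GetPhone(string name)
    {
        return phone.Value.GetPhone(name);
    }

    public void Close()
    {
        var handler = Closing;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}
```

Callers keep using the facade as before (for example `new SuperClient().GetPhone("SEP001")`), and only the sub-clients that are actually touched pay their construction cost.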
Lastly, I've moved authentication to a custom endpoint behavior similar to what's described here. Using an IClientMessageInspector to send the authentication header immediately saves one round trip to the server, since WCF likes to send an anonymous request first before authenticating. This gives me roughly a 2-second improvement, depending on the server.
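A message inspector along those lines might look roughly like the sketch below. The class names BasicAuthInspector and BasicAuthBehavior are made up for illustration, and this is not the author's actual code; the post also does not say whether the binding's ClientCredentialType was changed alongside it.

```csharp
using System;
using System.Net;
using System.ServiceModel;
using System.ServiceModel.Channels;
using System.ServiceModel.Description;
using System.ServiceModel.Dispatcher;
using System.Text;

// Hypothetical inspector that attaches a Basic Authorization header to every request.
public sealed class BasicAuthInspector : IClientMessageInspector
{
    private readonly string headerValue;

    public BasicAuthInspector(string user, string password)
    {
        headerValue = BuildHeader(user, password);
    }

    // "Basic " followed by base64("user:password").
    public static string BuildHeader(string user, string password)
    {
        var token = Convert.ToBase64String(Encoding.UTF8.GetBytes(user + ":" + password));
        return "Basic " + token;
    }

    public object BeforeSendRequest(ref Message request, IClientChannel channel)
    {
        object existing;
        HttpRequestMessageProperty http;
        if (request.Properties.TryGetValue(HttpRequestMessageProperty.Name, out existing))
        {
            http = (HttpRequestMessageProperty)existing;
        }
        else
        {
            http = new HttpRequestMessageProperty();
            request.Properties.Add(HttpRequestMessageProperty.Name, http);
        }
        // Send credentials up front instead of waiting for a 401 challenge.
        http.Headers[HttpRequestHeader.Authorization] = headerValue;
        return null;
    }

    public void AfterReceiveReply(ref Message reply, object correlationState) { }
}

// Hypothetical endpoint behavior that installs the inspector on the client runtime.
public sealed class BasicAuthBehavior : IEndpointBehavior
{
    private readonly BasicAuthInspector inspector;

    public BasicAuthBehavior(string user, string password)
    {
        inspector = new BasicAuthInspector(user, password);
    }

    public void ApplyClientBehavior(ServiceEndpoint endpoint, ClientRuntime clientRuntime)
    {
        clientRuntime.MessageInspectors.Add(inspector);
    }

    public void AddBindingParameters(ServiceEndpoint endpoint, BindingParameterCollection bindingParameters) { }
    public void ApplyDispatchBehavior(ServiceEndpoint endpoint, EndpointDispatcher endpointDispatcher) { }
    public void Validate(ServiceEndpoint endpoint) { }
}
```

It would be attached before the first call, for example with `axlClient.Endpoint.Behaviors.Add(new BasicAuthBehavior(settings.User, settings.Password));`.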
Overall, I've got a performance increase from 70s to 2.5s.


Bulk upload via REST api

I have the goal of uploading a Products CSV of ~3000 records to my e-commerce site. I want to utilise the REST API that my e-comm platform provides so I have something I can re-use and build upon for future sites that I may create.
My main issue that I am having trouble working through is:
- System.Threading.ThreadAbortException
Which I can only attribute to how long it takes to process through all 3K records via a POST request. My code:
public ActionResult WriteProductsFromFile()
{
    string fileNameIN = "19107.txt";
    string fileNameOUT = "19107_output.txt";
    string jsonUrl = $"/api/products";
    List<string> ls = new List<string>();
    var engine = new FileHelperAsyncEngine<Prod1>();
    using (engine.BeginReadFile(fileNameIN))
    {
        foreach (Prod1 prod in engine)
        {
            outputProduct output = new outputProduct();
            if (!string.IsNullOrEmpty(prod.name))
            {
                output.product.name = prod.name;
            }
            string productJson = JsonConvert.SerializeObject(output);
            // Queue the serialized product for the POST loop below.
            ls.Add(productJson);
        }
    }
    foreach (String s in ls)
    {
        nopApiClient.Post(jsonUrl, s);
    }
    return RedirectToAction("GetProducts");
}
Since I'm new to web-coding, am I going about this the wrong way? Is there a preferred way to bulk-upload that I haven't come across?
I've attempted to use the TaskCreationOptions.LongRunning flag, which helps the cause slightly but doesn't get me anywhere near my goal.
Web and API controller actions are not meant to run long-running tasks. Besides locking up the UI/thread, you introduce a series of opportunities for failure that you will have little recourse in recovering from.
But it's not all bad: you have a lot of options here, and there is a lot of literature on async/cloud architecture that explains how to deal with files and these sorts of scenarios.
What you want to do is disconnect the processing of your file from the API request (in your application, not the third party's).
It will take a little more work but will ultimately create a more reliable application.
Step 1:
Drop the file to disk immediately. I see you already have the file on disk; I'm not sure how it gets there, but either way it will work out the same.
Step 2:
Use a process running as
- a console app (easiest)
- a service (requires some sort of install/uninstall of the service)
- or even a thread in your web app (but you will struggle to know when it fails)
Whichever way you choose, the process will watch a directory for file changes; when there is a change, it will kick off your method to process the file as you like.
Check out FileSystemWatcher; here is a basic example: https://www.dotnetperls.com/filesystemwatcher
If you are interested in running a thread in your API/web app, take a look at https://www.hanselman.com/blog/HowToRunBackgroundTasksInASPNET.aspx for some options.
You don't have to use a FileSystemWatcher, of course; you could trigger via a flag in a DB that is checked periodically, or via a system event.
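To make the directory-watching idea concrete, a minimal sketch could look like the following. DropFolderWatcher and the processFile callback are hypothetical names; the callback stands in for whatever reads the CSV and posts the products.

```csharp
using System;
using System.IO;

// Hypothetical wrapper: watches a drop folder and hands each new file to a callback.
public sealed class DropFolderWatcher : IDisposable
{
    private readonly FileSystemWatcher watcher;

    public DropFolderWatcher(string directory, string filter, Action<string> processFile)
    {
        watcher = new FileSystemWatcher(directory, filter);

        // Created fires as soon as the file appears, which can be before the writer
        // has finished; a real processor may need to retry opening the file.
        watcher.Created += (sender, e) => processFile(e.FullPath);

        watcher.EnableRaisingEvents = true;
    }

    public void Dispose()
    {
        watcher.Dispose();
    }
}
```

In a console app this could be as small as `using (new DropFolderWatcher(@"C:\drop", "*.txt", ProcessFile)) { Console.ReadLine(); }`, where the path and ProcessFile are placeholders.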

That async-ing feeling - httpclient and mvc thread blocking

Dilemma, dilemma...
I've been working up a solution to a problem that uses async calls to the HttpClient library (GetAsync=>ConfigureAwait(false) etc). In a console app, my dll is very responsive and the mixture of using the async await calls and the Parallel.ForEach(=>) really makes me glow.
Now for the issue. After moving from this test harness to the target app, things have become problematic. I'm using asp.net mvc 4 and have hit a few issues. The main issue really is that calling my process on a controller action actually blocks the main thread until the async actions are complete. I've tried using an async controller pattern, I've tried using Task.Factory, I've tried using new Threads. You name it, I've tried all the flavours - and then some!
Now, I appreciate that the nature of http is not designed to facilitate long processes like this and there are a number of articles here on SO that say don't do it. However, there are mitigating reasons why i NEED to use this approach. The main reason that I need to run this in mvc is due to the fact that I actually update the live data cache (on the mvc app) in realtime via raising an event in my dll's code. This means that fragments of the 50-60 data feeds can be pushed out live before the entire async action is complete. Therefore, client apps can receive partial updates within seconds of the async action being instigated. If I were to delegate the process out to a console app that ran the entire process in the background, I'd no longer be able to harness those fragment partial updates and this is the raison d'etre behind the entire choice of this architecture.
Can anyone shed light on a solution that would allow me to mitigate the blocking of the thread whilst, at the same time, allowing each async fragment to be consumed by my object model and fed out to the client apps (I'm using SignalR to make these client updates)? A kind of nirvana would be a scenario where an out-of-process cache object could be shared between numerous processes - the cache update could then be triggered and consumed by my mvc process (aka - http://devproconnections.com/aspnet-mvc/out-process-caching-aspnet). And so back to reality...
I have also considered using a secondary webservice to achieve this, but would welcome other options before once again over engineering my solution (there are already many moving parts and a multitude of async Actions going on).
Sorry not to have added any code. I'm hoping for practical philosophy/insights rather than code help on this, though I would of course welcome coded examples that illustrate a solution to my problem.
I'll update the question as we move in time, as my thinking process is still maturing on this.
[edit] - for the sake of clarity, the snippet below is my brothers grimm code collision (extracted from a larger body of work):
Parallel.ForEach(scrapeDataBases, new ParallelOptions()
    {
        MaxDegreeOfParallelism = Environment.ProcessorCount * 15
    },
    async dataBase =>
    {
        await dataBase.ScrapeUrlAsync().ConfigureAwait(false);
        await UpdateData(dataType, (DataCheckerScrape)dataBase);
    });
async and Parallel.ForEach do not mix naturally, so I'm not sure what your console solution looks like. Furthermore, Parallel should almost never be used on ASP.NET at all.
It sounds like what you would want is to just use Task.WhenAll.
On a side note, I think your reasoning around background processing on ASP.NET is incorrect. It is perfectly possible to have a separate process that updates the clients via SignalR.
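For what it's worth, the Task.WhenAll approach could be sketched as below. ScrapeRunner and its scrape/publish parameters are hypothetical stand-ins for ScrapeUrlAsync and whatever updates the cache or SignalR clients; this is a minimal sketch, not the asker's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ScrapeRunner
{
    // scrape: hypothetical async fetch for one source.
    // publish: hypothetical partial-update hook; it may be invoked from several
    // threads, so it needs to be thread-safe.
    public static async Task<string[]> RunAllAsync(
        IEnumerable<string> sources,
        Func<string, Task<string>> scrape,
        Action<string> publish)
    {
        // Start every scrape without dedicating a thread to each one.
        var tasks = sources.Select(async source =>
        {
            var result = await scrape(source).ConfigureAwait(false);
            publish(result); // each fragment goes out as soon as it is ready
            return result;
        }).ToList();

        // Asynchronously wait for the whole set; results keep the source order.
        return await Task.WhenAll(tasks).ConfigureAwait(false);
    }
}
```

An async controller action can then `await ScrapeRunner.RunAllAsync(...)`, which releases the request thread while the scrapes are in flight.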
Since your question is pretty high level without a lot of code, you could try Reactive Extensions.
Something like
private IEnumerable<Task<Scraper>> ScrappedUrls()
{
    // Return the 50 to 60 tasks, one for each website, here.
    // I assume they all return the same type.
    // return .ScrapeUrlAsync().ConfigureAwait(false);
    throw new NotImplementedException();
}

public async Task<IEnumerable<ScrapeOdds>> GetOdds()
{
    var results = new Collection<ScrapeOdds>();
    var urlRequest = ScrappedUrls();
    var observerableUrls = urlRequest.Select(u => u.ToObservable()).Merge();
    var publisher = observerableUrls.Publish();
    var hubContext = GlobalHost.ConnectionManager.GetHubContext<OddsHub>();
    publisher.Subscribe(scraper =>
    {
        // Whatever you do to convert to the result set
        var scrapedOdds = scraper.GetOdds();
        // update anything else you want when it arrives.
        // Update SignalR here
    });
    // Will fire off subscriptions and not continue until they are done.
    await publisher;
    return results;
}
The merge option will process the results as they come in. You can then update the signalR hubs plus whatever else you need to update as they come in. The controller action will have to wait for them all to come in. That's why there is an await on the publisher.
I don't really know if HttpClient is going to like having 50 - 60 web calls all at once or not. If it doesn't, you can just turn the IEnumerable into an array and break it down into smaller chunks. There should also be some error checking in there. With Rx you can also tell it to SubscribeOn and ObserveOn different threads, but I think with everything being pretty much async that wouldn't be necessary.
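If the calls do need to be throttled, breaking the work into smaller chunks might look roughly like this. Batching.RunInBatchesAsync is a hypothetical helper; it takes task factories rather than already-started tasks so that only one batch is in flight at a time, and it uses plain Task.WhenAll rather than Rx.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class Batching
{
    // Runs the work items batchSize at a time and returns results in input order.
    public static async Task<List<T>> RunInBatchesAsync<T>(
        IReadOnlyList<Func<Task<T>>> work, int batchSize)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException("batchSize");

        var results = new List<T>(work.Count);
        for (int start = 0; start < work.Count; start += batchSize)
        {
            // Start only the calls in this batch, then wait for all of them to finish.
            var batch = work.Skip(start).Take(batchSize).Select(f => f()).ToList();
            results.AddRange(await Task.WhenAll(batch).ConfigureAwait(false));
        }
        return results;
    }
}
```

One trade-off of this simple form is that each batch waits for its slowest call before the next batch starts.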

How does PubSub work in BookSleeve/ Redis?

I wonder what the best way is to publish and subscribe to channels using BookSleeve. I currently implement several static methods (see below) that let me publish content to a specific channel with the newly created channel being stored in private static Dictionary<string, RedisSubscriberConnection> subscribedChannels;.
Is this the right approach, given that I want to publish to channels and subscribe to channels within the same application (note: my wrapper is a static class)? Is it enough to create one channel even if I want to both publish and subscribe? Obviously I would not publish to the same channel that I subscribe to within the same application. But I tested it with:
RedisClient.Publish("Test", "Test Message");
and it worked.
Here are my questions:
1) Will it be more efficient to setup a dedicated publish channel and a dedicated subscribe channel rather than using one channel for both?
2) What is the difference between "channel" and "PatternSubscription" semantically? My understanding is that I can subscribe to several "topics" through PatternSubscription() on the same channel, correct? But if I want to have different callbacks invoked for each "topic" I would have to setup a channel for each topic correct? Is that efficient or would you advise against that?
Here are the code snippets.
public static Task<long> Publish(string channel, byte[] message)
{
    return connection.Publish(channel, message);
}

public static Task SubscribeToChannel(string channelName)
{
    string subscriptionString = ChannelSubscriptionString(channelName);
    RedisSubscriberConnection channel = connection.GetOpenSubscriberChannel();
    subscribedChannels[subscriptionString] = channel;
    return channel.PatternSubscribe(subscriptionString, OnSubscribedChannelMessage);
}

public static Task UnsubscribeFromChannel(string channelName)
{
    string subscriptionString = ChannelSubscriptionString(channelName);
    if (subscribedChannels.Keys.Contains(subscriptionString))
    {
        RedisSubscriberConnection channel = subscribedChannels[subscriptionString];
        Task task = channel.PatternUnsubscribe(subscriptionString);
        //remove channel subscription
        return task;
    }
    return null;
}

private static string ChannelSubscriptionString(string channelName)
{
    return channelName + "*";
}
1: there is only one channel in your example (Test); a channel is just the name used for a particular pub/sub exchange. It is, however, necessary to use 2 connections due to specifics of how the redis API works. A connection that has any subscriptions cannot do anything else except:
listen to messages
manage its own subscriptions (subscribe, psubscribe, unsubscribe, punsubscribe)
However, I don't understand this:
private static Dictionary<string, RedisSubscriberConnection>
You shouldn't need more than one subscriber connection unless you are catering for something specific to you. A single subscriber connection can handle an arbitrary number of subscriptions. A quick check on client list on one of my servers, and I have one connection with (at time of writing) 23,002 subscriptions. Which could probably be reduced, but: it works.
2: pattern subscriptions support wildcards; so rather than subscribing to /topic/1, /topic/2/ etc you could subscribe to /topic/*. The name of the actual channel used by publish is provided to the receiver as part of the callback signature.
Either can work. It should be noted that the performance of publish is impacted by the total number of unique subscriptions - but frankly it is still stupidly fast (as in: 0ms) even if you have tens of thousands of subscribed channels using subscribe rather than psubscribe.
But, from the PUBLISH documentation:
Time complexity: O(N+M) where N is the number of clients subscribed to the receiving channel and M is the total number of subscribed patterns (by any client).
I recommend reading the redis documentation of pub/sub.
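Because the callback receives the actual channel name, a single pattern subscription on one subscriber connection can still fan out to different handlers per topic. A minimal sketch of that idea, independent of any Redis client, is below; ChannelRouter is a hypothetical helper, and the (string, byte[]) callback shape is an assumption about what the subscriber callback looks like.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical router: one subscription callback, many per-topic handlers.
public sealed class ChannelRouter
{
    private readonly List<KeyValuePair<string, Action<string, byte[]>>> routes =
        new List<KeyValuePair<string, Action<string, byte[]>>>();

    // Register routes up front; this sketch is not safe for concurrent Map calls.
    public void Map(string channelPrefix, Action<string, byte[]> handler)
    {
        routes.Add(new KeyValuePair<string, Action<string, byte[]>>(channelPrefix, handler));
    }

    // Intended to be the single callback handed to a pattern subscription.
    public void OnMessage(string channel, byte[] payload)
    {
        foreach (var route in routes)
        {
            if (channel.StartsWith(route.Key, StringComparison.Ordinal))
            {
                route.Value(channel, payload);
                return; // first matching prefix wins
            }
        }
    }
}
```

With the wrapper from the question, `router.OnMessage` would take the place of OnSubscribedChannelMessage, so one connection subscribed to /topic/* serves every topic.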
Edit for follow on questions:
a) I assume I would have to "publish" synchronously (using Result or Wait()) if I want to guarantee the order of sending items from the same publisher is preserved when receiving items, correct?
that won't make any difference at all; since you mention Result / Wait(), I assume you're talking about BookSleeve - in which case the multiplexer already preserves command order. Redis itself is single threaded, and will always process commands on a single connection in order. However: the callbacks on the subscriber may be executed asynchronously and may be handed (separately) to a worker thread. I am currently investigating whether I can force this to be in-order from RedisSubscriberConnection.
Update: from 1.3.22 onwards you can set the CompletionMode to PreserveOrder - then all callbacks will be completed sequentially rather than concurrently.
b) after making adjustments according to your suggestions I get a great performance when publishing few items regardless of the size of the payload. However, when sending 100,000 or more items by the same publisher performance drops rapidly (down to 7-8 seconds just to send from my machine).
Firstly, that time sounds high - testing locally I get (for 100,000 publications, including waiting for the response for all of them) 1766ms (local) or 1219ms (remote) (that might sound counter-intuitive, but my "local" isn't running the same version of redis; my "remote" is 2.6.12 on Centos; my "local" is 2.6.8-pre2 on Windows).
I can't make your actual server faster or speed up the network, but: in case this is packet fragmentation, I have added (just for you) a SuspendFlush() / ResumeFlush() pair. This disables eager-flushing (i.e. when the send-queue is empty; other types of flushing still happen); you might find this helps:
connection.SuspendFlush();
try {
    // start lots of operations...
} finally {
    connection.ResumeFlush();
}
Note that you shouldn't Wait until you have resumed, because until you call ResumeFlush() there could be some operations still in the send-buffer. With that all in place, I get (for 100,000 operations):
local: 1766ms (eager-flush) vs 1554ms (suspend-flush)
remote: 1219ms (eager-flush) vs 796ms (suspend-flush)
As you can see, it helps more with remote servers, as it will be putting fewer packets through the network.
I cannot use transactions because later on the to-be-published items are not all available at once. Is there a way to optimize with that knowledge in mind?
I think that is addressed by the above - but note that recently CreateBatch was added too. A batch operates a lot like a transaction - just: without the transaction. Again, it is another mechanism to reduce packet fragmentation. In your particular case, I suspect the suspend/resume (on flush) is your best bet.
Do you recommend having one general RedisConnection and one RedisSubscriberConnection or any other configuration to have such wrapper perform desired functions?
As long as you're not performing blocking operations (blpop, brpop, brpoplpush etc), or putting oversized BLOBs down the wire (potentially delaying other operations while it clears), then a single connection of each type usually works pretty well. But YMMV depending on your exact usage requirements.

HttpWebRequest timing out on third try, only two connections allowed HTTP 1.1 [duplicate]

I'm developing an application (winforms C# .NET 4.0) where I access a lookup functionality from a 3rd party through a simple HTTP request. I call an url with a parameter, and in return I get a small string with the result of the lookup. Simple enough.
The challenge, however, is that I have to do lots of these lookups (a couple of thousand), and I would like to limit the time needed. Therefore I would like to run requests in parallel (say 10-20). I use a ThreadPool to do this, and the short version of my code looks like this:
public void startAsyncLookup(Action<LookupResult> returnLookupResult)
{
    this.returnLookupResult = returnLookupResult;
    foreach (string number in numbersToLookup)
    {
        ThreadPool.QueueUserWorkItem(lookupNumber, number);
    }
}

public void lookupNumber(Object threadContext)
{
    string numberToLookup = (string)threadContext;
    string url = @"http://some.url.com/?number=" + numberToLookup;
    WebClient webClient = new WebClient();
    Stream responseData = webClient.OpenRead(url);
    LookupResult lookupResult = parseLookupResult(responseData);
}
I fill up numbersToLookup (a List<String>) from another place, call startAsyncLookup and provide it with a call-back function returnLookupResult to return each result. This works, but I found that I'm not getting the throughput I want.
Initially I thought it might be the 3rd party having a poor system on their end, but I excluded this by trying to run the same code from two different machines at the same time. Each of the two took as long as one did alone, so I could rule out that one.
A colleague then tipped me that this might be a limitation in Windows. I googled a bit, and found amongst others this post saying that by default Windows limits the number of simultaneous request to the same web server to 4 for HTTP 1.0 and to 2 for HTTP 1.1 (for HTTP 1.1 this is actually according to the specification (RFC2068)).
The same post referred to above also provided a way to increase these limits. By adding two registry values to [HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings] (MaxConnectionsPerServer and MaxConnectionsPer1_0Server), I could control this myself.
So, I tried this (set both to 20), restarted my computer, and tried to run my program again. Sadly though, it didn't seem to help at all. I also kept an eye on the Resource Monitor while running my batch lookup, and I noticed that my application (the one with the title blacked out) was still only using two TCP connections.
So, the question is, why isn't this working? Is the post I linked to using the wrong registry values? Is this perhaps not possible to "hack" in Windows any longer (I'm on Windows 7)?
And just in case anyone should wonder, I have also tried with different settings for MaxThreads on ThreadPool (everything from 10 to 100), and this didn't seem to affect my throughput at all, so the problem shouldn't be there either.
It is a matter of ServicePoint, which provides connection management for HTTP connections.
The default maximum number of concurrent connections allowed by a ServicePoint object is 2.
So if you need to increase it, you can use the ServicePointManager.DefaultConnectionLimit property. Check the MSDN documentation for a sample, and set the value you need.
For quicker reference: to increase the connection limit per host, you can do this in your Main() or any time before you begin making the HTTP requests.
System.Net.ServicePointManager.DefaultConnectionLimit = 1000; //or some other number > 4
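A small sketch of how that setting could be applied and checked is below. ConnectionLimits is a hypothetical helper and the URL passed to it is a placeholder; FindServicePoint does not send a request, it only returns the ServicePoint object that manages connections for that host.

```csharp
using System;
using System.Net;

public static class ConnectionLimits
{
    // Raises the process-wide default; call this before the first request to a host.
    public static void Raise(int limit)
    {
        ServicePointManager.DefaultConnectionLimit = limit;
    }

    // Reports the connection limit that applies to the host in the given URL.
    public static int LimitFor(string url)
    {
        return ServicePointManager.FindServicePoint(new Uri(url)).ConnectionLimit;
    }
}
```

Calling `ConnectionLimits.Raise(20)` at the top of Main() and then logging `ConnectionLimits.LimitFor("http://some.url.com/")` is one way to confirm that the lookups are no longer capped at two connections.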
Call this method once from your main method. Icognito is correct: by default only 2 connections are allowed to play at the same time.
private static void openServicePoint()
{
    ServicePointManager.UseNagleAlgorithm = true;
    ServicePointManager.Expect100Continue = true;
    ServicePointManager.CheckCertificateRevocationList = true;
    ServicePointManager.DefaultConnectionLimit = 10000;
    Uri MS = new Uri("http://My awesome web site");
    ServicePoint servicePoint = ServicePointManager.FindServicePoint(MS);
}
For Internet Explorer 8:
Run Registry Editor and navigate to the following key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_MAXCONNECTIONSPERSERVER
If FEATURE_MAXCONNECTIONSPERSERVER and FEATURE_MAXCONNECTIONSPER1_0SERVER are missing, create them. Now create a DWORD value called iexplore.exe under both subkeys (listed above) and set the value to 10 or whatever number you want.

First WCF connection made in new AppDomain is very slow

I have a library that I use that uses WCF to call an http service to get settings. Normally the first call takes ~100 milliseconds and subsequent calls takes only a few milliseconds. But I have found that when I create a new AppDomain the first WCF call from that AppDomain takes over 2.5 seconds.
Does anyone have an explanation or fix for why the first creation of a WCF channel in a new AppDomain would take so long?
These are the benchmark results (running a 64-bit release build without a debugger attached). Notice how, in the second set of numbers, the first connection takes over 25x longer:
Running in initial AppDomain
First Connection: 92.5018 ms
Second Connection: 2.6393 ms
Running in new AppDomain
First Connection: 2457.8653 ms
Second Connection: 4.2627 ms
This isn't a complete example but shows most of how I produced these numbers:
class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Running in initial AppDomain");
        new DomainRunner().Run();
        Console.WriteLine("Running in new thread and AppDomain");
    }
}

class DomainRunner : MarshalByRefObject
{
    public static void RunInNewAppDomain(string runnerName)
    {
        var newAppDomain = AppDomain.CreateDomain(runnerName);
        var runnerProxy = (DomainRunner)newAppDomain.CreateInstanceAndUnwrap(typeof(DomainRunner).Assembly.FullName, typeof(DomainRunner).FullName);
    }

    public void Run()
    {
        var test = string.Empty;
        var sw = Stopwatch.StartNew();
        test += AppServSettings.ServiceBaseUrlBatch;
        Console.WriteLine("First Connection: {0}", sw.Elapsed.TotalMilliseconds);
        sw = Stopwatch.StartNew();
        test += AppServSettings.ServiceBaseUrlBatch;
        Console.WriteLine("Second Connection: {0}", sw.Elapsed.TotalMilliseconds);
    }
}
The call to AppServSettings.ServiceBaseUrlBatch creates a channel to a service and calls a single method. I have used Wireshark to watch the call, and it only takes milliseconds to get a response from the service. It creates the channel with the following code:
public static ISettingsChannel GetClient()
{
    EndpointAddress address = new EndpointAddress(SETTINGS_SERVICE_URL);
    BasicHttpBinding binding = new BasicHttpBinding
    {
        MaxReceivedMessageSize = 1024,
        OpenTimeout = TimeSpan.FromSeconds(2),
        SendTimeout = TimeSpan.FromSeconds(5),
        ReceiveTimeout = TimeSpan.FromSeconds(5),
        ReaderQuotas = { MaxStringContentLength = 1024},
        UseDefaultWebProxy = false,
    };
    cf = new ChannelFactory<ISettingsChannel>(binding, address);
    return cf.CreateChannel();
}
Profiling the app shows that in the first case, constructing the channel factory, creating the channel, and calling the method take less than 100 milliseconds in total.
In the new AppDomain, constructing the channel factory took 763 milliseconds, creating the channel took 521 milliseconds, and calling the method on the interface took 1,098 milliseconds.
TestSettingsRepoInAppDomain.DomainRunner.Run() 2,660.00
TestSettingsRepoInAppDomain.AppServSettings.get_ServiceBaseUrlBatch() 2,543.47
Tps.Core.Settings.Retriever.GetSetting(string,!!0,!!0,!!0) 2,542.66
Tps.Core.Settings.Retriever.TryGetSetting(string,!!0&) 2,522.03
Tps.Core.Settings.ServiceModel.WcfHelper.GetClient() 1,371.21
Tps.Core.Settings.ServiceModel.IClientChannelExtensions.CallWithRetry(class System.ServiceModel.IClientChannel) 1,098.83
After using perfmon with the .NET CLR Loading object I can see that when it loads the second AppDomain it is loading way more classes into memory than it does initially. The first flat line is a pause I put in after the first appdomain, there it has 218 classes loaded. The second AppDomain causes 1,944 total classes to be loaded.
I assume it's the loading of all these classes that is taking up all of the time, so now the question is: what classes is it loading, and why?
The answer turns out to be that only one AppDomain is able to take advantage of the native image system DLLs. The slowness in the second AppDomain came from having to re-JIT all of the System.* DLLs used by WCF. The first AppDomain could use the pre-NGen'd native versions of those DLLs, so it didn't have the same startup cost.
After investigating the LoaderOptimizationAttribute that Petar suggested, that indeed seemed to fix the issue. Using either MultiDomain or MultiDomainHost results in the second AppDomain taking the same amount of time as the first to access things over WCF.
Here you can see the default option. Note how in the second AppDomain none of the assemblies say Native, meaning they all had to be re-JITted, which is what was taking all of the time.
Here is the result after adding LoaderOptimization(LoaderOptimization.MultiDomain) to Main. You can see that everything is loaded into the shared AppDomain.
Here is the result after using LoaderOptimization(LoaderOptimization.MultiDomainHost) on Main. You can see that all system DLLs are shared, but my own DLLs and any not in the GAC are loaded separately into each AppDomain.
So for the service that prompted this question, MultiDomainHost is the answer, because it has a fast startup time and I can unload AppDomains to remove the dynamically built assemblies that the service uses.
You can decorate your Main with LoaderOptimization attribute to tell the CLR loader how to load classes.
MultiDomain - Indicates that the application will probably have many domains that use the same code, and the loader must share maximal internal resources across application domains.
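Applying the attribute might look like the sketch below. DomainHost and the "worker" domain name are made up for illustration, and the example assumes .NET Framework, since AppDomain.CreateDomain is not supported on .NET Core or .NET 5+.

```csharp
using System;

public static class DomainHost
{
    // MultiDomainHost asks the loader to share GAC (strong-named) assemblies
    // across AppDomains while loading everything else separately per domain.
    [LoaderOptimization(LoaderOptimization.MultiDomainHost)]
    public static void Main(string[] args)
    {
        // .NET Framework only: creating additional AppDomains.
        var child = AppDomain.CreateDomain("worker");
        try
        {
            // Runs Report inside the child domain.
            child.DoCallBack(Report);
        }
        finally
        {
            // Unloading the domain releases the assemblies that were loaded only into it.
            AppDomain.Unload(child);
        }
    }

    private static void Report()
    {
        Console.WriteLine("Running in " + AppDomain.CurrentDomain.FriendlyName);
    }
}
```

The attribute only has an effect on the process entry point, so it belongs on Main rather than on the code that creates the child domains.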
Do you have an HTTP proxy defined in IE (maybe an auto-configure script)? This can be a cause.
Otherwise I would guess it is the time that it takes to load all the DLLs. Try to separate the proxy creation from the actual call to the service, to see what's taking the time.
I found the following article that talks about how only the first AppDomain can use native image DLLs, so a child AppDomain will always be forced to JIT lots of stuff that the initial AppDomain doesn't have to. This could lead to the performance impact I am seeing, but would it be possible to somehow avoid this performance penalty?
If there is a native image for the assembly, only the first AppDomain
can use the native image. All other AppDomains will have to
JIT-compile the code which can result in a significant CPU cost.
