Background
I have written a utility that watches a directory for new files and then copies them to defined target locations on remote machines. It also has a feature that stops specified services so that the files can be copied to the target.
In our work environment, these remote machines are typically VMs (we use VMware Workstation). The machines are part of a VM sub-domain and are configured to use NAT networking (they share the host machine's IP address). So when I say "remote", I really mean a VM running on the host.
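To make the moving parts concrete, here is a minimal sketch of the watch-and-copy step. It is Python rather than the .NET the utility is presumably written in, it polls instead of using an event-based watcher such as .NET's FileSystemWatcher, and `copy_new_files` and its parameters are hypothetical. A UNC target like \\server-1\c$\destinationfolder is treated as an ordinary path, so the intermittent "directory can't be found" failure would surface in the copy call.

```python
import os
import shutil


def copy_new_files(watch_dir, targets, seen):
    """Copy files that appeared in watch_dir since the last pass to every target.

    watch_dir: directory being watched
    targets:   destination directories (local paths or UNC paths)
    seen:      set of file names already handled; updated in place
    Returns the names of the files copied on this pass.
    """
    copied = []
    for entry in sorted(os.scandir(watch_dir), key=lambda e: e.name):
        if not entry.is_file() or entry.name in seen:
            continue
        for target in targets:
            # copy2 also preserves timestamps; a UNC target is just another path.
            shutil.copy2(entry.path, os.path.join(target, entry.name))
        seen.add(entry.name)
        copied.append(entry.name)
    return copied
```

Create `seen = set()` once and call the function in a loop with a short `time.sleep` between passes, for example with a hypothetical drop folder `C:\drop` and the target `\\server-1\c$\destinationfolder`.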
Problem
For my utility, I copy files using a UNC path to the target directory, and I get the list of services on the machine by name using the ServiceController.GetServices(string machineName) method.
So if you had a VM named server-1, you might be copying a file to \\server-1\c$\destinationfolder. Most of the time this works, but sometimes I see an exception because the target directory can't be found. When this happens, we also see an error when trying to get the services on the remote machine: "The RPC server is unavailable."
When the VM is restarted, everything works fine... for a while.
I'm having a hard time nailing down the issue, because it's sporadic and doesn't affect most people. I'm wondering if it's an IP issue, where VMware changes the IP and the old one is stale in the host's cache? (If I sound like I don't really know what I'm talking about here, it's only because I don't... my networking knowledge is fairly basic.) When I look up the "RPC server is unavailable" error, I see a lot of answers about firewalls, which I don't believe is the case here. We don't run anything like McAfee internally, and since it works most of the time, a firewall doesn't seem like the cause.
Actual Questions
Does anyone have any thoughts on what might cause this problem? As a follow-up, if it is a stale IP issue, how could I recreate it for debugging purposes, so I can come up with a good way to resolve it going forward?
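One way to test the stale-IP theory is to record what the VM's name resolves to every time the utility touches it, and see whether the failures line up with a change. Below is a minimal Python sketch of that idea. `AddressTracker` and its injectable `resolver` are hypothetical names, and it only sees what the OS name resolution hands back to an application, which may not be the whole story for SMB. To try to provoke the condition, you could temporarily give the VM a different static IP address and check whether the host keeps resolving the old one; if it does, `ipconfig /flushdns` on the host clears its DNS client cache.

```python
import socket
import time


class AddressTracker:
    """Remember what each host name resolved to last time and flag changes."""

    def __init__(self, resolver=None):
        # resolver(host) -> set of IP strings; defaults to a real IPv4 lookup.
        self._resolve = resolver or self._lookup_ipv4
        self._last = {}

    @staticmethod
    def _lookup_ipv4(host):
        # getaddrinfo goes through the OS's normal name resolution.
        infos = socket.getaddrinfo(host, None, socket.AF_INET)
        return {sockaddr[0] for _, _, _, _, sockaddr in infos}

    def check(self, host):
        """Return (addresses, changed); changed is False the first time a host is seen."""
        current = self._resolve(host)
        previous = self._last.get(host)
        self._last[host] = current
        changed = previous is not None and previous != current
        if changed:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} {host} moved: {sorted(previous)} -> {sorted(current)}")
        return current, changed
```

Calling `tracker.check("server-1")` right before each copy, and logging the result next to any failure, shows whether "directory not found" and "RPC server is unavailable" coincide with an address change or happen while the name still resolves to the same IP.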
Related
This is my first Stack Overflow question, so apologies if it isn't great...
I'm sure this is either something super simple that I'm missing or something very complex that I've gotten myself into, but I am using ClickOnce for the first time to create an automated updater for a company application I developed.
The application itself was originally written in VB, but I have translated it into C#. We use it to automate a database of assets, which changes very frequently. I have been tasked with making it update automatically, so that some of the techs aren't confused by uninstalling and reinstalling the application every week.
I volunteered to set up an FTP server on a personal server machine I use at home. Normally this machine is used for local networking, but I've wanted to create an FTP server for some time (this is my first FTP server too).
So I went on my way and set the publish location for the build to ftp://[IP.ADDRESS]:21/Folder/Subfolder and the Installation folder URL to http://[IP.ADDRESS]:21/Folder/Subfolder.
Long story short, when I try to test an update (changing only the assembly version), I get an error:
System.Deployment.Application.DeploymentDownloadException: Downloading http://[IP.ADDRESS]:21/Folder/Subfolder/applciation.application did not succeed ---> System.Net.WebException: The server committed a protocol violation.
I did some research, added an SSL certificate, changed the update path to https://[IP.ADDRESS]:21/Folder/Subfolder/, and tested again. This time I get this error:
System.Deployment.Application.DeploymentDownloadException: Downloading http://[IP.ADDRESS]:21/Folder/Subfolder/applciation.application did not succeed ---> System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a send. --> System.IO.IOException: The handshake failed due to an unexpected format.
I can't tell if this is progress or if I've moved backwards LOL. I've been jumping back and forth between many threads trying to figure out where this is going wrong. I'm also having a pretty tricky time working out whether the problem is in how I've set up ClickOnce or in how I've set up FTP with IIS.
Apologies if this is not enough information, I can provide more if necessary. Also apologies if this is too much information! Any help or guidance is appreciated!
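Both errors fit one explanation that is worth ruling out first (an inference, not something confirmed in this thread): port 21 is FTP's control port, so an HTTP or HTTPS client connecting to it gets an FTP greeting back instead of an HTTP response or a TLS handshake. That would match both "protocol violation" and "handshake failed due to an unexpected format". As far as I know, ClickOnce can publish over FTP but needs an HTTP(S), UNC, or file path as the location users install and update from. The sketch below is a hypothetical Python check that flags URLs pairing one scheme with another scheme's well-known port.

```python
from urllib.parse import urlsplit

# Well-known default ports: FTP control 21, HTTP 80, HTTPS 443.
DEFAULT_PORTS = {"ftp": 21, "http": 80, "https": 443}


def scheme_port_mismatch(url):
    """Return a warning if url uses another scheme's default port, else None."""
    parts = urlsplit(url)
    port = parts.port
    if port is None:
        return None  # no explicit port, so the scheme's own default applies
    for other, other_port in DEFAULT_PORTS.items():
        if other != parts.scheme and port == other_port:
            return f"{parts.scheme}:// URL uses port {port}, the default port for {other}://"
    return None
```

Run against the Installation folder URL from the question, it flags `http://[IP.ADDRESS]:21/...` and `https://[IP.ADDRESS]:21/...`, while the `ftp://...:21` publish location passes.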
I'm guessing you're working for a small company and infrastructure/resources are at a premium. With that in mind I'll offer some suggestions:
Does your company have a shared network drive? I don't like ClickOnce, but I have deployed it to network shares in the past with success. This has the benefit that you don't need to deal with security.
Have you considered migrating this to a web application? Web development seemed really daunting when I was a native app developer, but with Blazor and ASP.NET Core it's become a lot more accessible. This would completely get rid of the need for updating the application.
Consider an alternative deployment route. ClickOnce is not incredibly well supported.
I'd be remiss if I didn't throw a red flag on security. FTP is a very old protocol and is basically insecure by design. Hosting it on your home server means that you're transmitting the app over the public internet... What would happen if someone outside your company installed the application?
I constantly run into problems where a program runs fine on my system (both installed and in development environments) but fails on a user's system.
The best solution I can conceive of is to debug the program on the user's computer, but that's not practical: it would require installing VS on their system and moving the source code over (and doing so would compromise the integrity of the test environment anyway, since it might make the bug vanish).
I want to remote debug, but what I'm reading on MSDN says to me "Nope, you can't do that unless both systems are on the same network" (all. my. rage.):
Network configuration
The remote computer and the Visual Studio computer must be connected over a network, workgroup, or homegroup, or else connected directly through an Ethernet cable. Debugging over the Internet is not supported.
Is there another option I can use to connect to the process that is running on another system outside my network?
A partial solution (for the case that throws an exception).
Instead of trying to reach the client app over the Internet, generate a crash dump and debug it locally, in the comfort of your home :). Please see MSDN for details.
You can use a tunneling application (like Hamachi). This lets you connect to the other party as if you were on the same network, and then you can use the remote debugging option. But even on a local network, remote debugging is slow and sometimes fails with optimized builds. Before resorting to the remote debugger (assuming exceptions are being thrown), I suggest logging all exceptions to a log file and then watching the system. It will also help to put the .pdb files on the remote system so the logged stack traces map back to your code.
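In the spirit of the log-everything advice, here is a minimal sketch of a catch-all crash logger. It is Python for illustration only; in a .NET application the equivalent hook is the AppDomain.CurrentDomain.UnhandledException event. `install_crash_logger` and the log path are hypothetical.

```python
import logging
import sys


def install_crash_logger(log_path):
    """Write every unhandled exception, with its full traceback, to log_path."""
    logger = logging.getLogger("crash")
    logger.setLevel(logging.ERROR)
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)

    def hook(exc_type, exc_value, exc_tb):
        # Let Ctrl+C behave normally instead of being logged as a crash.
        if issubclass(exc_type, KeyboardInterrupt):
            sys.__excepthook__(exc_type, exc_value, exc_tb)
            return
        logger.error("Unhandled exception", exc_info=(exc_type, exc_value, exc_tb))
        handler.flush()

    sys.excepthook = hook
    return hook
```

Call it once at startup (for example with a hypothetical `C:\ProgramData\MyApp\crash.log`). It only covers the main thread; threads need `threading.excepthook` (Python 3.8+).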
I am using Visual Studio 2010. I have searched online for help and found other people using DirectoryEntry("WinNT:"), but it doesn't seem to work for me. I can see my network workgroups, but if I use DirectoryEntry("WinNT://MYWORKGROUP") I can't see any computers listed.
Please help; I'm not sure why it isn't working for me.
Thanks
Getting computer names from my network places:
Do not use DirectoryServices unless you're sure you're in a domain environment. The System.DirectoryServices namespace is an ADSI wrapper that doesn't work without an Active Directory to query against. NetServerEnum() works on workgroups and domains but doesn't guarantee the most reliable data (not all machines may show up), because it relies on the Computer Browser service.
To browse the local Windows network, NetBIOS name resolution must be running and correctly configured. In a corporate network that often means the presence of a WINS server. The required components are not enabled by default on modern Windows installations.
Before trying to do anything from your own code, make sure the infrastructure is in place. Open Windows Explorer and expand the "Network" node. If Windows network browsing is working correctly, you should see the list of computers on the network there. If the list is empty, the problem isn't in your code.
(Sorry if this is a really long question, it said to be specific)
The company I work for has a number of sites, which have been running for some time with no problems. The applications are a mix of ASP.NET 2.0, 3.5, and 4.0, all using ADO.NET to connect to a SQL Server Standard instance (on the same web server), and all hosted with IIS7.
The problem began when we moved to an upgraded web server. We made every effort to set up the server, DB instance, and IIS with exactly the same settings (except for the different machine name and the fact that we had upgraded from SQL Express to Standard), and as far as we could tell, we did. Both servers are running Windows Server 2008 R2 (all current updates applied) and received a default install.
The problem is very apparent when starting up one of these applications. When you reach our login page, the page itself loads extremely fast, even from a new machine that could not possibly have it cached, with IIS caching disabled. The problem becomes visible when you enter your login information and click the login button. Because of the (not great) design of our databases, the login process must access a number of databases: theoretically up to 150 separate DBs, but in practice usually 2. The problem occurs even when only 2 databases (the minimum) are opened. Not a great design, but we have to live with it for now.
When the application first opens a connection to the database, the entire process stops for about 20 seconds every time, regardless of whether it is connecting to 2 DBs or 40. I have run a .NET profiler (JetBrains dotTrace) against the process, and the only information I could get from it was that one or all of the calls to SqlConnection.Open() accounted for 90% of the time. This only happens on first use of the application, but the problem is compounded by the fact that IIS seems to disregard the recycling settings we have set and recycles the application after a few minutes of idle time, causing the problem to occur again.
I also tried to use SQL Server Profiler to see which database operations were causing the slowdown, but because of all the other DB activity (and the fact that I had to do this on our production server, because the problem doesn't occur in our test environments), I couldn't pin down the exact operation causing the stall. I will try coming in late at night and shutting down the production sites to run SQL Server Profiler, but I might not be able to do this right away.
In the course of researching the problem, I have tried a couple of solutions:
Thinking it might be a name resolution problem, I tried both modifying the hosts file on the web server and giving the connection strings an IP address instead of the server name, with no difference. I have heard of the LLMNR protocol causing problems like this, but I think connecting by IP or resolving through the hosts file should have eliminated that possibility, though I admit I never tried actually turning off LLMNR.
I have increased the idle timeouts, recycling intervals, etc. in IIS, but these don't even seem to be respected, much less solve the problem. This leads me to believe some setting on the machine is overriding the IIS application settings.
Multiple other code fixes, none of which made any difference. Is a SQL Server setting causing the problem?
Other things that I've forgotten by now.
Any ideas, experience or whatevers would be greatly appreciated in helping me solve this problem!
I would advise using a non-TCP connection if you are still running the SQL instance on the local machine. SQL Server supports several protocols; TCP, named pipes, and shared memory are the most common.
Named Pipes
Data Source=np:computer\instance
Shared Memory
Data Source=lpc:computer\instance
Personally, I prefer shared memory. Remember that you need to enable these protocols, and to avoid configuration mistakes I suggest you disable any you are not using.
See http://msdn.microsoft.com/en-us/library/ms187892.aspx
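To make the prefixes concrete, here is a small sketch that builds a connection string forcing one protocol. The server, instance, and database names are placeholders, `connection_string` is a hypothetical helper, and shared memory (`lpc:`) only works when the client runs on the same machine as the SQL Server instance, as it does in the question.

```python
# Prefixes that force a client protocol in a SQL Server "Data Source" value.
PROTOCOL_PREFIXES = {"tcp": "tcp:", "named_pipes": "np:", "shared_memory": "lpc:"}


def connection_string(server, database, protocol="shared_memory", options=None):
    """Build an ADO.NET-style connection string that forces one client protocol.

    server:   "computer\\instance" (or just "computer" for a default instance)
    database: value for Initial Catalog
    options:  extra key/value pairs, e.g. {"Integrated Security": "SSPI"}
    """
    settings = {"Data Source": PROTOCOL_PREFIXES[protocol] + server,
                "Initial Catalog": database}
    settings.update(options or {})
    return ";".join(f"{key}={value}" for key, value in settings.items())
```

For example, `connection_string(r"WEB01\SQLSTD", "Assets", options={"Integrated Security": "SSPI"})` produces `Data Source=lpc:WEB01\SQLSTD;Initial Catalog=Assets;Integrated Security=SSPI`.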
IIS Reset
In IIS7 there are two ways to configure the idle timeout. Both begin by clicking the "Application Pools" section and right-clicking the appropriate application pool. If you click the "Recycling..." option, there is one setting. The other is in "Advanced Settings...": under the "Process Model" section you will find "Idle Time-out (minutes)", which, when set to zero, disables the idle timeout. This latter option is the one that works for us.
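Since the question suspects something is overriding the idle timeout, one way to see what IIS will actually use is to read applicationHost.config (normally %windir%\System32\inetsrv\config\applicationHost.config) and resolve each pool's processModel idleTimeout, falling back to applicationPoolDefaults and then to the IIS 7 default of 20 minutes. A minimal Python sketch; `idle_timeouts` is a hypothetical name, and it ignores anything configured outside the pool's processModel element (such as the Recycling settings).

```python
import xml.etree.ElementTree as ET

IIS_DEFAULT_IDLE_TIMEOUT = "00:20:00"  # IIS 7 default: 20 minutes


def idle_timeouts(config_text):
    """Map each application pool name to its effective processModel idleTimeout."""
    root = ET.fromstring(config_text)
    pools = next(root.iter("applicationPools"))
    default = IIS_DEFAULT_IDLE_TIMEOUT
    defaults_pm = pools.find("applicationPoolDefaults/processModel")
    if defaults_pm is not None:
        default = defaults_pm.get("idleTimeout", default)
    result = {}
    for pool in pools.findall("add"):
        pm = pool.find("processModel")
        value = pm.get("idleTimeout") if pm is not None else None
        # A pool-level value wins; otherwise the defaults apply.
        result[pool.get("name")] = value or default
    return result
```

A value of `00:00:00` for a pool means the idle timeout is disabled, which corresponds to setting "Idle Time-out (minutes)" to zero in the UI.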
If I were you I'd solve this problem first as restarting the appdomain and/or worker process is always painful even if you don't have a 20 second lag.
Some ideas:
From the web server, can you ping the DB server and get a "normal" response, or are you seeing a similar delay?
If you're seeing a delay, run a tracert to see if you can nail down where the slowness is occurring.
Try using a tool like QueryExpress (http://www.albahari.com/queryexpress.aspx), which doesn't require an install to run. You can download this EXE and run it from your web server. See if you can connect to your DB using it and run queries normally.
Try something like Sysinternals' TcpView (http://technet.microsoft.com/en-us/sysinternals/bb897437) to look at your open connections and see what activity is happening on your server and how much data is being sent to and received from your DB server.
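Ping and tracert test ICMP; a complementary check is to time the two steps underneath SqlConnection.Open() that happen before SQL Server is involved at all: name resolution and the TCP handshake. A minimal Python sketch; `time_connect` is hypothetical, and port 1433 is only the default for a default instance (named instances often listen on a dynamic port). If the 20-second stall shows up in one of these numbers, the problem is below SQL Server; if both are fast, the delay is more likely in the login itself.

```python
import socket
import time


def time_connect(host, port, timeout=30.0):
    """Return (resolve_seconds, connect_seconds, address) for one TCP connection."""
    start = time.perf_counter()
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    resolved = time.perf_counter()
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(sockaddr)  # completes the TCP handshake only
    connected = time.perf_counter()
    return resolved - start, connected - resolved, sockaddr[0]
```

For example, `time_connect("dbserver", 1433)` with a hypothetical server name, run once by name and once by IP address, separates a name-resolution delay from a connection delay.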
Just some initial thoughts on where I'd start to look based upon your problem description. I hope this helps. Good luck with things!
With IIS not respecting recycling settings: did restarting IIS/rebooting change the behavior?
I am trying to solve a persistent IO problem when we read from or write to a Windows 2003 clustered file share. It happens regularly and seems to be triggered by traffic. We are writing via .NET's FileStream object.
Basically we are writing from a Windows 2003 Server running IIS to a Windows 2003 file share cluster. When writing to the file share, the IIS server often gets two errors. One is an Application Popup from Windows, the other is a warning from MRxSmb. Both say the same thing:
[Delayed Write Failed] Windows was unable to save all the data for the file \Device\LanmanRedirector. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.
On reads, we are also getting errors, which are System.IO.IOException errors: "The specified network name is no longer available."
We have other servers writing more and larger files to this File Share Cluster without an issue. It's only coming from the one group of servers that the issue comes up. So it doesn't seem related to writing large files. We've applied all the hotfixes referenced in articles online dealing with this issue, and yet it continues.
Our network team ran Network Monitor and didn't see any packet loss, from what I understand, but as I wasn't present for that test I can't say that for certain.
Any ideas of where to check? I'm out of avenues to explore or tests to run. I'm guessing the issue is some kind of network problem, but as it's only happening when these servers connect to that File Share cluster, I'm not sure what kind of problem it might be.
This issue is awfully specific, and potentially hardware related, but any help you can give would be of assistance.
Eric Sipple
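A retry wrapper is a common stopgap for errors like "The specified network name is no longer available" (in .NET it arrives as an IOException). Treat it as a mitigation, not a fix: as I understand it, "Delayed Write Failed" means the data had already been accepted into the client's write-behind cache and the later flush to the share failed, so a retry around the write call never sees that failure. Retrying is only safe if the operation rewrites the whole file from scratch. A minimal Python sketch; `with_retries` and its parameters are hypothetical.

```python
import time


def with_retries(operation, attempts=4, first_delay=0.5, retry_on=(OSError,)):
    """Call operation(); on a listed exception, wait and retry, doubling the delay.

    Re-raises the last exception if every attempt fails.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)
            delay *= 2
```

Wrap the complete "open, write everything, close" step, not individual writes, so a retry never leaves a half-written file behind.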
I've heard of AutoDisconnect causing similar issues (even if the device isn't idle). You may want to try disabling that on the server.
I am having similar problems:
writing to a machine that is also part of a Windows 2003 R2 NLB cluster sometimes results in "Delayed Write Failed" or "the semaphore has timed out" or "the specified network name is no longer available"
this is reproducible for the same files, even after rebooting all machines involved
if I rename the problem-files (some of which are quite small), the problem remains
if I write the files to another location (physical disk) on the same machine, the problem remains
I uninstalled all anti-virus software, problem remains
I have reset the tcp-ip stack, problem temporarily disappears, but after some time the problem returns for the same files
PARTLY SOLVED the problem:
I deleted (not stopped) the host from the NLB cluster. Problem solved.
It seems to have something to do with writing to a share on a server that is also part of a Network Load Balancing cluster.
I have not yet found other people posting NLB cluster related file write problems. However, I did find many posts complaining about similar problems, none of which seem to have been solved.
Anne
I've seen other people reporting the "Delayed Write Failed" error. One recommendation was to adjust the size of the cache; there's a utility from Sysinternals (http://technet.microsoft.com/en-us/sysinternals/bb897561.aspx) that will let you do that.