I have made a simple search utility that searches for files on your computer.
It has a search function that finds the files and passes the list of matched items to a mainloop function, which in turn calls a displayForm function that displays the search results in a new form.
Whenever I run the application for the first time after startup, the search function completes in about 1 second, but displaying the result window takes considerable time (about 10 seconds). This happens only for the first search after startup.
I am not providing the code for the search function because I don't think it matters: it takes almost the same amount of time whether it is the first run or a subsequent one, and the working of displayForm is very simple.
public void displayForm()
{
    // Do some stuff here
    // Make a listbox and add items to display.
    SearchForm.ShowDialog();
}
Also, from experimenting with a few cases, I can tell you that creating the list box takes the same amount of time whether it is the first run or a subsequent one.
What could be the possible reasons for this?
This is entirely normal; it has little to do with your code. Cold start time is dominated by the speed of the hard disk, which can be disappointing when it has to locate the many DLLs that are required to start a .NET app. This is not a problem exclusive to .NET apps; large unmanaged apps like the Office programs and Adobe Reader have the same issue. They tend to cheat by installing an "optimizer", a program that runs at login and makes your machine slow by pre-loading the DLLs the program needs so they are available in the file system cache, defeating SuperFetch in the process.
The operating system's file system cache is a pretty effective solution for the slow disk, but like a mile-long freight train it takes a while to get up to speed. Filling it from scratch with useful data takes time; the effective disk transfer rate when the disk has to seek is at best a few megabytes per second. This is also the core reason users like an SSD so much: it provides a much more fundamental solution. Once you've experienced one, you can never go back.
This is covered very well in many excellent articles; the best way to find them is by googling ".NET cold start time".
EDIT: Context
I have to develop an ASP.NET web application that will allow users to create a "conversion request" for one or several CAO files.
My application should just upload files to a directory where I can convert them.
I then want a service that will check the database (updated by the web application) to see whether a conversion is waiting to be done. I then have to call a batch command on the file with some arguments.
The thing is that those conversions can freeze if the user provides a badly formed CAO file. In that case, we want the user or an admin to be able to cancel the conversion process.
The batch command used to convert is a third-party tool that I can't change. It needs a licence token to convert, and I can multithread as long as I have more than one token available. Another application can use those tokens at the same moment, so I can't just have a thread pool pre-sized according to the number of tokens I have.
The only way to know whether I can convert is to start the conversion with the command and check whether the logs report a licence problem, which means either "No token available" or "Current licence doesn't accept the given input format". Since I only allow supported input formats in the web application, the second problem cannot occur.
The web application is almost done: I can upload files and download the results and conversion logs at the end. Now I need to build the service that will take the input files, convert them, update the conversion status in the database and finally put the files in the correct download directory.
I have to work on a service that will poll the database at a high frequency (maybe every 5 or 10 seconds) to see whether a row is set to "Ready to convert" or "Must be canceled".
If the row is set to "Ready to convert" I must try to convert it, but I do so using a third-party DLL with a licence token system that currently allows me to run only 5 conversions simultaneously.
If the row is set to "Must be canceled" I must kill the conversion, either because it froze and an admin has to stop it or because the user canceled his own task.
Also, conversion times can be very long, from 1 or 2 seconds to several hours depending on the file size and how it has been created.
I was thinking about a pooling system, as I saw here:
Stackoverflow answer
A pooling system gives me the advantage of isolating the database-reading part from the conversion process. But I have the feeling that I lose some control over the background processes, which is maybe just because I'm not used to them.
I'm not very used to services, and even if this pool system seems good, I don't know how I could cancel a task if needed.
The tool I use to convert works as a simple batch command that just returns an error if no licence is available at the moment. Using a pool, how can I make the conversion thread wait until the conversion can be done when no licence is available? Is a simple infinite while loop an appropriate answer? It seems quite bad to me.
Finally, I can't just use a fixed pool of 5 threads, as those licences are also used by 2 other applications, so I never know at any given time how many of them are available.
The idea of using a pool may itself be wrong; as I said, I'm not very used to services, and before starting something stupid I prefer to ask about the right way to do it.
Moreover, regarding the database reading/writing, even though I think the second option is better, should I:
Use the big model files that I already use in my ASP.NET web application, which will create a lot of objects (one for each row, since they are entity models), or
Not use entity models but lighter ones, which will be less entity-oriented but probably less resource-demanding. This will also be harder to maintain.
So I'm asking more for advice on how I should do it than for a pure code answer, but some examples could be very useful.
EDIT: to be more precise, I need to find a way to:
(For the moment, I assume only one licence is available; I will evolve this later if needed.)
Have a service that runs as a loop and will, if possible, start a new thread for the given request. The service must keep running, as the status can be set to "Must be canceled" at any time.
At the moment I'm thinking about a task with a cancellation token, which would probably achieve this.
But if the task finds that no licence token is currently available, how can I tell the service's main loop that it can't process right now? I was thinking about returning just an integer value where the return code indicates the ending reason: Cancellation / No token / Ended... Is that a good way to do it?
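A minimal sketch of what I have in mind, assuming a hypothetical converter.exe command and a placeholder check for the licence message in its output (the real command, arguments and log text would differ):

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

enum ConversionOutcome { Completed, Canceled, NoTokenAvailable, Failed }

static class ConversionRunner
{
    // Starts the third-party batch command and watches for cancellation.
    public static Task<ConversionOutcome> RunConversionAsync(string inputFile, CancellationToken token)
    {
        return Task.Run(() =>
        {
            var psi = new ProcessStartInfo("converter.exe", "\"" + inputFile + "\"")
            {
                UseShellExecute = false,
                RedirectStandardOutput = true,
                CreateNoWindow = true
            };

            using (var process = Process.Start(psi))
            {
                // Kill the external process if cancellation is requested (admin or user).
                using (token.Register(() => { try { process.Kill(); } catch { /* already exited */ } }))
                {
                    string output = process.StandardOutput.ReadToEnd();
                    process.WaitForExit();

                    if (token.IsCancellationRequested)
                        return ConversionOutcome.Canceled;
                    if (output.Contains("licence"))          // placeholder for the real log message
                        return ConversionOutcome.NoTokenAvailable;
                    return process.ExitCode == 0 ? ConversionOutcome.Completed : ConversionOutcome.Failed;
                }
            }
        }, token);
    }
}

The service's main loop could then react to NoTokenAvailable by leaving the row in "Ready to convert" and simply retrying it on a later poll, instead of busy-waiting in the conversion thread.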
What I'm hearing is that the biggest bottleneck in your process is the conversion... pooling / object mapping / direct SQL doesn't sound as important as the conversion bottleneck.
There are several ways to solve this depending on your environment and what your constraints are. Good, fast, and cheap... pick 2.
As far as "best practice" go, there are large scale solutions (ESBs, Message Queue based etc), there are small scale solutions (console apps, batch files, powershell scripts on Windows scheduler, etc) and the in-between solutions (this is typically where the "Good Enough" answer is found). The volume of the stuff you need to process should decide which one is the best fit for you.
Regardless of which you choose above...
My first step will be to eliminate variables...
How much volume will you be processing? If you wrote something that's not optimized but works, would that be enough to process your current volume? (e.g. a console app run by the Windows Task Scheduler every 10-15 seconds and killed if it runs for more than, say, 15 minutes)
Does the third party dll support multi-threading? If no, that eliminates all your multi-threading related questions and narrows down your options on how to scale out.
You can then determine what approach will fit your problem domain...
will it be the same service deployed on several machines, each hitting the database every 10-15 seconds? or
will it be one service on one machine using multi-threading?
will it be something else (pooling might or might not be in play)?
Once you have the answers above, the next question is:
Will the work required fit within the allocated budget and time constraints of your project? If not, go back to the questions above and see whether any of them can be answered differently in a way that would change the answer to this question to yes.
I hope that these questions help you get to a good answer.
In our .NET 4.0 WinForms application, some users (all Win7 x64) recently experienced very long wait times (compared to others) when the application saves its settings using this code:
Properties.Settings.Default.Save();
Typical durations: 0.5 to 1 seconds
Extreme durations: 15 to 20 seconds
The application's settings (scope: User, everything saved in user.config under AppData\Local\\) consist of several custom classes as well as two classes representing printer settings:
System.Drawing.Printing.PageSettings and
System.Drawing.Printing.PrinterSettings
Using GlowCode profiler on one of those machines, I found the following function to take 17 seconds:
<Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationWriterPrinterSettings_x003A__x003A_Write9_PrinterSettings Nodes="1" Visits="1" percent_in_Child="100.00 %" Time_in_Child="17.456" Time="17.456" Avg._Time_in_Child_="17.456" Avg._Time="17.456" Blocks_net="12" Bytes_net="1024" Blocks_gross="1087" Bytes_gross="494146" />
The duration was almost equally split across three getters (taken from the GlowCode viewer):
PrinterSettings::get_PaperSizes
PrinterSettings::get_PaperSources
PrinterSettings::get_PrinterResolutions
Doing some research revealed the following pages:
https://social.msdn.microsoft.com/Forums/vstudio/en-US/8fd2132a-63e8-498e-ab27-d95cdb45ba87/printersettings-are-very-slow
and
http://www.pcreview.co.uk/forums/papersources-and-papersizes-really-slow-some-systems-t3660593.html, quote:
On some systems, particularly Vista x64 systems, it takes forever (5 to 15
seconds if compiled for x64, 10-20 seconds if compiled for x86) to enumerate
either the papersources or papersizes collection of a printersettings object.
Using a small test app that just saves PrinterSettings revealed a saving time of around 3.5 seconds on one of those "slow" machines, while the other took a mere 0.2 seconds, which matches my fast development machine.
Any ideas on the reasons and how to improve this?
How can I find the real reasons for these delays?
Edit: Thanks for pointing out that the printer settings are acquired through the driver; this might explain the delays on certain machines.
Updating the printer drivers is not possible, since I cannot access the machines where this will be installed in the future.
Also, I won't (I know, I know) reduce the PrinterSettings information being saved, and possibly break backward compatibility, just because some people might experience a lag...
Maybe if I run the serialization in the background (after the user has changed the printer settings?) it might speed things up...
First suggestion:
The calls to retrieve paper sources and paper sizes are being passed through to the driver. Your best bet is going to be making sure that the newest version of the driver is installed. It's possible that older versions of the driver (in particular, those from the CD that came in the box) are buggy. If you haven't already, hit the manufacturer's website and grab the latest.
Second suggestion:
Apart from that, it's going to be a pain, but you could try using the underlying Win32 APIs instead of the CLR counterparts. In this case, you'd call GetPrinter, requesting a PRINTER_INFO_2 struct. Once you have that, you can examine pDevMode to get a DEVMODE struct that has all of the information you're looking for.
This question or this question should be helpful.
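For illustration, here is a rough sketch of that route (a sketch only, with minimal error handling; parsing the DEVMODE that pDevMode points to is omitted, and it would have to be copied out before the native buffer is freed):

using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

static class NativePrinterInfo
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
    public struct PRINTER_INFO_2
    {
        [MarshalAs(UnmanagedType.LPTStr)] public string pServerName;
        [MarshalAs(UnmanagedType.LPTStr)] public string pPrinterName;
        [MarshalAs(UnmanagedType.LPTStr)] public string pShareName;
        [MarshalAs(UnmanagedType.LPTStr)] public string pPortName;
        [MarshalAs(UnmanagedType.LPTStr)] public string pDriverName;
        [MarshalAs(UnmanagedType.LPTStr)] public string pComment;
        [MarshalAs(UnmanagedType.LPTStr)] public string pLocation;
        public IntPtr pDevMode;                          // points to a DEVMODE structure
        [MarshalAs(UnmanagedType.LPTStr)] public string pSepFile;
        [MarshalAs(UnmanagedType.LPTStr)] public string pPrintProcessor;
        [MarshalAs(UnmanagedType.LPTStr)] public string pDatatype;
        [MarshalAs(UnmanagedType.LPTStr)] public string pParameters;
        public IntPtr pSecurityDescriptor;
        public uint Attributes;
        public uint Priority;
        public uint DefaultPriority;
        public uint StartTime;
        public uint UntilTime;
        public uint Status;
        public uint cJobs;
        public uint AveragePPM;
    }

    [DllImport("winspool.drv", CharSet = CharSet.Auto, SetLastError = true)]
    static extern bool OpenPrinter(string pPrinterName, out IntPtr phPrinter, IntPtr pDefault);

    [DllImport("winspool.drv", CharSet = CharSet.Auto, SetLastError = true)]
    static extern bool GetPrinter(IntPtr hPrinter, uint level, IntPtr pPrinter, uint cbBuf, out uint pcbNeeded);

    [DllImport("winspool.drv", SetLastError = true)]
    static extern bool ClosePrinter(IntPtr hPrinter);

    // Queries PRINTER_INFO_2 for the named printer and returns the driver name as a simple example.
    public static string QueryDriver(string printerName)
    {
        IntPtr hPrinter;
        if (!OpenPrinter(printerName, out hPrinter, IntPtr.Zero))
            throw new Win32Exception(Marshal.GetLastWin32Error());
        try
        {
            uint needed;
            GetPrinter(hPrinter, 2, IntPtr.Zero, 0, out needed);   // first call: ask for the buffer size
            IntPtr buffer = Marshal.AllocHGlobal((int)needed);
            try
            {
                if (!GetPrinter(hPrinter, 2, buffer, needed, out needed))
                    throw new Win32Exception(Marshal.GetLastWin32Error());
                var info = (PRINTER_INFO_2)Marshal.PtrToStructure(buffer, typeof(PRINTER_INFO_2));
                // info.pDevMode points into 'buffer'; read or copy the DEVMODE here,
                // before the buffer is freed below.
                return info.pDriverName;
            }
            finally { Marshal.FreeHGlobal(buffer); }
        }
        finally { ClosePrinter(hPrinter); }
    }
}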
Instead of persisting the entire PrinterSettings class instance, only persist the individual settings as their base types. Keep it simple -- strings, ints, bools, etc. Clearly the serializer is triggering communication with the printer, and that's what is introducing the latency. I'm willing to bet that if you grab the individual class members and serialize them yourself, you'll see an improvement.
Obviously, this means that when you load settings, you'll need to deserialize all of these settings back into a new PrinterSettings class, and apply them.
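A rough sketch of that idea, assuming hypothetical user-scoped settings named PrinterName, Copies, Landscape and PaperSizeName have been added to the project (they are not existing members):

using System.Drawing.Printing;

static class PrinterSettingsPersistence
{
    public static void SavePrinterChoices(PrinterSettings ps)
    {
        // Only strings/ints/bools are written, so no printer round-trip happens during Save().
        Properties.Settings.Default.PrinterName   = ps.PrinterName;
        Properties.Settings.Default.Copies        = (int)ps.Copies;
        Properties.Settings.Default.Landscape     = ps.DefaultPageSettings.Landscape;
        Properties.Settings.Default.PaperSizeName = ps.DefaultPageSettings.PaperSize.PaperName;
        Properties.Settings.Default.Save();
    }

    public static PrinterSettings LoadPrinterChoices()
    {
        var ps = new PrinterSettings { PrinterName = Properties.Settings.Default.PrinterName };
        ps.Copies = (short)Properties.Settings.Default.Copies;
        ps.DefaultPageSettings.Landscape = Properties.Settings.Default.Landscape;

        // Re-apply the paper size by name; this does query the driver, but only on load.
        foreach (PaperSize size in ps.PaperSizes)
        {
            if (size.PaperName == Properties.Settings.Default.PaperSizeName)
            {
                ps.DefaultPageSettings.PaperSize = size;
                break;
            }
        }
        return ps;
    }
}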
EDIT 1, in response to question edit
That's true - you could have the Save() run async in the background. Your only issue would be if the user attempts to end the process (close the app) before the save is complete. You'd have to maintain a bool as to whether a save is occurring (set to false when the callback fires). If the user attempts to exit the app and the bool is true, put up "Please wait while settings are saved..." until the bool goes false.
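A minimal sketch of that pattern, assuming a WinForms main form; the field and method names here are illustrative:

using System;
using System.Threading.Tasks;
using System.Windows.Forms;

public partial class MainForm : Form
{
    private volatile bool _saveInProgress;

    // Kick off the (potentially slow) settings save without blocking the UI thread.
    private void BeginSaveSettings()
    {
        _saveInProgress = true;
        Task.Factory.StartNew(() =>
        {
            try
            {
                Properties.Settings.Default.Save();   // the slow call
            }
            finally
            {
                _saveInProgress = false;              // the "callback fired" flag from the answer
            }
        });
    }

    // If the user closes the app while a save is still running, wait for it to finish.
    private void MainForm_FormClosing(object sender, FormClosingEventArgs e)
    {
        if (!_saveInProgress)
            return;

        Cursor = Cursors.WaitCursor;
        Text = "Please wait while settings are saved...";
        while (_saveInProgress)
        {
            Application.DoEvents();                   // keep the message visible
            System.Threading.Thread.Sleep(100);
        }
        Cursor = Cursors.Default;
    }
}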
So, it seems some machines take a long time querying the page and printer settings through the installed driver. I couldn't find any more specifics about that.
To shorten the shutdown time, the aforementioned parts of the settings are now assigned and saved in a background thread after the user has changed the printer settings. That takes about 10 seconds.
During shutdown (form close), these settings are not assigned again, but we still save everything (using Properties.Settings.Default.Save()), and the serializer apparently recognizes that there are no changes to query, so the saving finishes very quickly:
between 0.02 and 0.05 seconds, and all settings are still saved properly!
Fun fact: this issue was first reported in the week when we got a new office printer :)
I have been given a windows service written by a previous intern at my current internship that monitors an archive and alerts specific people through emails and pop-ups should one of the recorded values go outside a certain range. It currently uses a timer to check the archive every 30 seconds, and I have been asked if I would be able to update it to allow a choice of time depending on what "tag" is being monitored. It uses an XML file to keep track of which tags are being monitored. Would creating multiple timers in the service be the most efficient way of going about this? I'm not really sure what approach to take.
The service is written in C# using .NET 3.5.
Depending on the granularity, you could use a single timer whose interval is a common factor of the timing intervals they want. Say they want to put in the XML file that each archive is to be checked every so many minutes. You set up a timer that goes off once a minute, and on each tick you check how long it has been since each tag was last checked and whether it is due again.
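A minimal sketch of that single-timer approach, assuming the per-tag intervals have already been read from the XML file; the class and member names here are made up:

using System;
using System.Collections.Generic;
using System.Timers;

class TagSchedule
{
    public TimeSpan Interval { get; set; }     // how often this tag should be checked
    public DateTime LastRun  { get; set; }     // when it was last checked
}

class ArchiveMonitor
{
    private readonly Dictionary<string, TagSchedule> _tags;
    private readonly Timer _timer;

    public ArchiveMonitor(Dictionary<string, TagSchedule> tags)
    {
        _tags = tags;
        // One timer firing at the common factor of all configured intervals (here: one minute).
        _timer = new Timer(60 * 1000);
        _timer.Elapsed += OnTick;
        _timer.Start();
    }

    private void OnTick(object sender, ElapsedEventArgs e)
    {
        foreach (var pair in _tags)
        {
            if (DateTime.UtcNow - pair.Value.LastRun >= pair.Value.Interval)
            {
                CheckArchive(pair.Key);               // the existing check-and-alert logic goes here
                pair.Value.LastRun = DateTime.UtcNow;
            }
        }
    }

    private void CheckArchive(string tag) { /* query archive, send alerts if out of range */ }
}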
If you're getting a chance to re-architect, I would move away from a service to a set of scheduled tasks. Write it so one task does one archive. Then write a controller program that sets up the scheduled tasks (and can stop them, change them etc.) The API for scheduled tasks on Windows 7 is nice and understandable, and unlike a service you can impose restrictions like "don't do it if the computer is on battery" or "only do it if the machine is idle" along with your preferences for what to do if a chance to run the task was missed. 7 or 8 scheduled tasks, each on their own schedule, using the same API of yours, passing in the archive path and the email address, is a lot neater than one service trying to juggle everything at once. Plus the machine will start up faster when you don't have yet another autostart service on it.
Efficient? Possibly not - especially if you have lots of tags, as each timer takes a tiny but finite amount of resources.
An alternative approach might be to have one timer that fires every second, and when that happens you check a list of outstanding requests.
This has the benefit of being easier to debug if things go wrong as there's only one active thread.
As in most code maintenance situations, however, it depends on your existing code, your ability, and how you feel more comfortable.
I would suggest just using one timer scheduled at the greatest common divisor of the intervals.
For example, configure your timer to fire every second; you can then handle any interval (1 second, 2 seconds, ...) by counting the corresponding number of timer ticks.
One of the analytics I had to have for my program was "How much time do users spend in my program?" It is basically a measure of how useful users find my program, in that they actively keep using it, and it is used to encourage users to actively start using the application.
I initially thought of using the time span between when they start the application and when they close it, but the problem is that users could just keep the application open and not use it.
I currently use TotalProcessorTime (C#/VB.NET) to let management know how much time users actively spend in the application. TotalProcessorTime gives the amount of CPU time the application uses, but this does not translate well for management: even when a user actively uses the application for a few minutes, the TotalProcessorTime will be far less.
Any out of the box thinking / suggestions?
Since you want to know how much people use your software, as opposed to how long your software uses the CPU (they aren't always the same thing), the way I'd do it (and I have actually used this before) is to use GetLastInputInfo.
You can have a timer in your application that checks every, say, 500 ms whether your application is the active application and whether GetLastInputInfo reports that the system has been idle for less than some threshold (5-10 seconds, depending on what your application does). As long as both of these conditions hold, you add 500 ms to your application's active usage.
Of course, you can still track total CPU usage as a separate statistic, but I think my way provides a more... focused usage counter for your application.
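A sketch of that counter, assuming a WinForms application; the 500 ms tick and the 10-second idle threshold come from the description above, everything else is illustrative:

using System;
using System.Runtime.InteropServices;
using System.Windows.Forms;

public partial class MainForm : Form
{
    [StructLayout(LayoutKind.Sequential)]
    private struct LASTINPUTINFO { public uint cbSize; public uint dwTime; }

    [DllImport("user32.dll")]
    private static extern bool GetLastInputInfo(ref LASTINPUTINFO plii);

    private TimeSpan _activeUsage = TimeSpan.Zero;

    public MainForm()
    {
        InitializeComponent();
        var timer = new Timer { Interval = 500 };   // System.Windows.Forms.Timer
        timer.Tick += (s, e) =>
        {
            var lii = new LASTINPUTINFO { cbSize = (uint)Marshal.SizeOf(typeof(LASTINPUTINFO)) };
            GetLastInputInfo(ref lii);
            int idleMs = Environment.TickCount - (int)lii.dwTime;

            // Count the tick only if one of our forms is focused and the user was recently active.
            bool appIsActive = Form.ActiveForm != null;
            if (appIsActive && idleMs < 10000)
                _activeUsage += TimeSpan.FromMilliseconds(500);
        };
        timer.Start();
    }
}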
I have a faulty hard drive that works intermittently. After cold booting, I can access it for about 30-60 seconds, then the hard drive fails. I'm willing to write software to back up this drive to a new and bigger disk. I can develop it under GNU/Linux or Windows, I don't care.
The problem is: I can only access the disk for a short time, and some files are big and will take longer than that to copy. For this reason, I'm thinking of backing up the entire hard disk in smaller pieces, something like BitTorrent does: I'll read a few megabytes and store them before trying to read the next chunk. My main loop would be something like this:
while (1) {
    if (!check_harddrive()) { sleep(100ms); continue; }
    read_some_megabytes();
    if (!check_harddrive()) { sleep(100ms); continue; }
    save_data();
    update_reading_pointer();
    if (all_done) { break; }
}
The problem is the check_harddrive() function. I'm willing to write this in C/C++ for maximum API/library compatibility. I'll need some control over my file handles to check whether they are still valid, and I need the read to return (even with bad data) rather than hang if the drive fails during the copy process.
Maybe C# would give me better results if I abuse "hardcoded" hardware exceptions?
Another approach would be to measure how long the hard drive stays usable after a power cycle, write a program that reads only during that window, and have it flag me when to power cycle again.
What would you do in this case? Are there any tools/utilities that already do this?
Oh, there is a GREAT app for reading bad optical media, it's called IsoPuzzle; it's not mine, I just wanted to share something related to my problem.
!EDIT!
Some clarifications: I'm a home user, a student of computer engineering at college, and I'd rather lose the data than spend thousands of dollars recovering it. The hard drive is still covered by Seagate's warranty, and since they gave me 5 years of warranty, I want to try everything possible before the time runs out.
When I say cold booting, I mean booting after some seconds without power. Hot booting would be rebooting your computer; cold booting would be shutting it down, waiting a few seconds and then booting it up again. Since the hard disk in question is internal SATA, I can just disconnect the power cable, wait a few seconds and connect it again.
For now I'll go with robocopy; I'm just reading up on it to see how I can use it. If I don't need to write code, just a script, it will be even easier.
!EDIT2!
I wasn't clear: my drive is a Seagate 7200.11. It's known to have bad firmware, and it's not always fixable with a simple firmware update (not after this bug appears). The drive is physically in 100% working condition; just the firmware is screwed up, making it enter an infinite busy state after a few seconds.
I would work this from the hardware angle first. Is it an external drive - if so, can you try it in a different case?
You mention cold-booting works, then it quits. Is this heat related? Have you tried using the hard drive for an extended period in something like a freezer?
From the software side, I'd have a second thread keep an eye on a progress counter updated by a loop that repeatedly reads small amounts of data; the watcher could then signal failure via a timeout you define.
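A rough sketch of that idea in C# (the question mentions C/C++ or C#), assuming placeholder paths and a 4 MB chunk size; note that opening a raw device such as \\.\PhysicalDrive1 needs administrator rights and sector-aligned reads, so a mounted partition or individual files may be simpler:

using System;
using System.IO;
using System.Threading.Tasks;

class ChunkedRescueCopy
{
    const int ChunkSize = 4 * 1024 * 1024;                   // 4 MB per read
    static readonly TimeSpan ReadTimeout = TimeSpan.FromSeconds(10);

    static void Main()
    {
        string source = @"\\.\PhysicalDrive1";               // placeholder: raw device or file path
        string target = @"D:\rescue\disk.img";
        string offsetFile = target + ".offset";

        // Resume from the last saved position if a previous run was interrupted.
        long offset = File.Exists(offsetFile) ? long.Parse(File.ReadAllText(offsetFile)) : 0;

        using (var dst = new FileStream(target, FileMode.OpenOrCreate, FileAccess.Write))
        {
            while (true)
            {
                // Run each read on a worker task; the source is reopened per chunk so a
                // hung handle is never reused, and a hung drive only costs us the timeout.
                long currentOffset = offset;
                var readTask = Task.Run(() =>
                {
                    var chunk = new byte[ChunkSize];
                    using (var src = new FileStream(source, FileMode.Open, FileAccess.Read))
                    {
                        src.Seek(currentOffset, SeekOrigin.Begin);
                        int n = src.Read(chunk, 0, chunk.Length);
                        Array.Resize(ref chunk, n);
                        return chunk;
                    }
                });

                byte[] data = null;
                try { if (readTask.Wait(ReadTimeout)) data = readTask.Result; }
                catch (AggregateException) { /* read error: treat it like a hang */ }

                if (data == null)
                {
                    Console.WriteLine("Drive stopped responding at offset {0}; power-cycle it and press Enter.", offset);
                    Console.ReadLine();
                    continue;                                // retry the same chunk
                }
                if (data.Length == 0) break;                 // reached the end

                dst.Seek(offset, SeekOrigin.Begin);
                dst.Write(data, 0, data.Length);
                offset += data.Length;
                File.WriteAllText(offsetFile, offset.ToString());
            }
        }
    }
}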
I think the simplest way for you is to copy the entire disk image. Under Linux your disk will appear as a block device, /dev/sdb1 for example.
Start copying the disk image until a read error appears. Then wait for the user to "repair" the disk and resume reading from the last position.
You can easily mount the disk image file and read its contents; see the -o loop option for mount.
Cool the disk down before use; I've heard that helps.
You might be interested in robocopy ("Robust File Copy"). Robocopy is a command-line tool that can tolerate network outages and resume copying where it previously left off (incomplete files are marked with a date stamp corresponding to 1980-01-01 and contain a recovery record, so Robocopy knows where to continue from).
You know... I like being "lazy"... Here is what I would do:
I would write 2 simple scripts. One would start robocopy (with its persistence features turned off) and begin the copying, while the other would periodically check whether the drive is still working (maybe by trying to list the contents of the root directory; if that takes more than a few seconds, the drive is dead... again). If the HDD stopped working, the second script would restart the machine. Have them start after login and set up auto-login, so that when the machine reboots it automatically continues.
From a "I need to get my data back" perspective, if your data is really valuable to you, I would recommend sending the drive to a data recovery specialist. Depending on how valuable the data is, the cost (probably several hundred dollars) is trivial. Ideally, you would find a data recovery specialist that doesn't just run some software to do the recovery - if the software approach doesn't work, they should be able to do things like replace the circiut board on the drive, and probably other things (I am not a data recover specialist).
If the value of the data on the drive doesn't quite rise to that level, you should consider purchasing one of the many pieces of software for data recovery. For example, I personally have used and would recommend GetDataBack from Runtime software http://www.runtime.org. I've used it to recover a failing drive, it worked for me.
And now on to more general information... The standard process for data recovery off of a failing drive is to do as little as possible on the drive itself. You should unplug the drive, and stop attempting to do anything. The drive is failing, and it is likely to get worse and worse. You don't want to play around with it. You need to maximize your chances of getting the data off.
The way the process works is to use software that reads the drive block by block (not file by file) and makes an image copy of the drive. The software attempts to read every block, retrying reads that fail, and writes out an image file of the entire hard drive.
Once the hard drive has been imaged, the software then works against the image to identify the various logical parts of the drive - the partitions, directories, and files. And then it enables you to copy the files off of the image.
The software can typically "deduce" structures from the image. For example, if the partition table is damaged or missing, the software will scan through the entire image looking for things that might be partitions, and if they look enough like partitions, it will treat them as such and see whether it can find directories and files. So good recovery software is written using a lot of knowledge about the different structures on the drive.
If you want to learn how to write such software, good for you! My recommendation is that you start with books about how various operating systems organize data on hard drives, so that you can start to get an intuitive feel for how such software might work with drive images to pull data from them.