I need to zip each text file and copy it to another server. File sizes vary from 500 MB to 8 GB, and there is no dependency between the files. I have approximately 35 files.
My current code takes approximately 3-4 hours for this. To reduce the time, I am thinking of implementing threading. Do you think threading will reduce the time, or is there a better way to do this?
.NET 4.0 has the new System.Threading.Tasks namespace, which makes it much easier to schedule tasks without getting deep into thread scheduling.
It allows you to queue up subsequent tasks to run once the previous one has completed (regardless of success or failure).
http://msdn.microsoft.com/en-us/library/system.threading.tasks.aspx
http://www.codethinked.com/net-40-and-systemthreadingtasks
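As a rough sketch of that continuation style (the file paths here are made up, and GZipStream stands in for whatever zip approach you actually use), compressing a file and then copying it once the first task finishes might look like this:

    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Threading.Tasks;

    class ZipThenCopy
    {
        static void Main()
        {
            // Made-up paths for the example.
            string source = @"C:\data\log1.txt";
            string zipped = source + ".gz";
            string target = @"\\otherserver\share\log1.txt.gz";

            // First task compresses the file.
            Task compress = Task.Factory.StartNew(() =>
            {
                using (var input = File.OpenRead(source))
                using (var output = File.Create(zipped))
                using (var gzip = new GZipStream(output, CompressionMode.Compress))
                {
                    input.CopyTo(gzip);
                }
            });

            // The continuation runs whether or not compression threw;
            // check the antecedent's outcome before copying.
            Task copy = compress.ContinueWith(t =>
            {
                if (t.Exception == null)
                    File.Copy(zipped, target, true);
            });

            copy.Wait();
        }
    }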
But, as previous commenters have suggested, if the bottleneck isn't the CPU doing the file compression but rather the network transfer, then it may not help much.
I would recommend using Task.Factory.StartNew: by default the scheduler runs roughly one thread per core and queues up the rest of the work.
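A minimal sketch of that approach, assuming a folder of input files and a placeholder method for the per-file zip-and-copy work:

    using System;
    using System.IO;
    using System.Linq;
    using System.Threading.Tasks;

    class ParallelZip
    {
        static void Main()
        {
            // Hypothetical folder containing the ~35 text files.
            string[] files = Directory.GetFiles(@"C:\data", "*.txt");

            // One task per file; the default scheduler decides how many
            // run concurrently based on the available cores.
            Task[] tasks = files
                .Select(f => Task.Factory.StartNew(() => CompressAndCopy(f)))
                .ToArray();

            Task.WaitAll(tasks);
        }

        static void CompressAndCopy(string path)
        {
            // Placeholder for zipping and copying a single file.
            Console.WriteLine("Processing {0}", path);
        }
    }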
In my experience in working with large files, multi-threading does not speed up the process due to the limitations on the Hard Drive read/write itself and/or network.
You are not only doing a lot of reading and writing with your hard drive, but also copying large files to another computer over the network.
If your average file size is 4.25 GB, that comes out to 148.75 GB of data to deal with (at a 35-file count). That is a lot of data, and not only are you reading all of it into memory (hopefully not all at once, otherwise virtual memory will start kicking in and write even more out to your hard drive), you are also writing a good part of it back out as zip files.
Add to that the file transfer over a network, and I am not surprised at all at the times you are getting, if your network is typical of the networks I have to deal with. Megabit and gigabit speeds are never what they claim to be.
If you are using an external utility for zipping (e.g. 7-Zip), and process spin-up is not a concern for your application, I would keep it simple and just Process.Start() as many 7-Zip EXEs as you need to do the tasks in (quasi) parallel, or do some number at a time, like 5. Up to you.
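A crude batching loop along those lines might look like the following (the 7-Zip path, arguments, and batch size of 5 are assumptions for illustration):

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.Linq;

    class SevenZipRunner
    {
        static void Main()
        {
            // Assumed 7-Zip location and a hypothetical input folder.
            const string sevenZip = @"C:\Program Files\7-Zip\7z.exe";
            string[] files = Directory.GetFiles(@"C:\data", "*.txt");
            const int batchSize = 5; // run at most 5 processes at a time

            for (int i = 0; i < files.Length; i += batchSize)
            {
                var batch = files.Skip(i).Take(batchSize)
                    .Select(f => Process.Start(sevenZip,
                        string.Format("a \"{0}.7z\" \"{1}\"", f, f)))
                    .ToList();

                // Wait for this batch to finish before starting the next.
                foreach (Process p in batch)
                    p.WaitForExit();
            }
        }
    }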
I have a 500+ GB text file. It has to be searched for duplicates, the duplicates removed, and the result sorted and saved to a final file. Of course, for such a big file LINQ and the like are no good at all and will not work, so external sorting has to be used. There is an app called "Send-Safe List Manager" whose speed is super fast: for a 200 MB txt file it gives the result in less than 10 seconds. After examining the exe with the "Greatis WinDowse" app, I found that it was written in Delphi. There are some external sorting classes written in C#; I have tested a 200 MB file with them and all took over a minute. So my question is: for this kind of processing, is Delphi faster than C#? If I have to write my own, should I use Delphi, and can I reach that speed with C# at all?
Properly written sorting code for a large file must be disk bound; at that point there is essentially no difference what language you use.
Delphi generates native code and also allows for inline assembly, so in theory, maximum speed for a specific algorithm could be easier to reach in Delphi.
However, the performance of what you describe will be tied to the I/O performance, and the performance difference between possible algorithms will be several orders of magnitude larger than the Delphi vs. .NET difference.
The language is probably the last thing you should look at if trying to speed that up.
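To make the disk-bound point concrete, here is a rough external merge sort sketch in C# (the paths, chunk size, and line-based record format are assumptions, not from the question); almost all of the time goes into the chunk reads and writes rather than into anything language-specific:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    class ExternalSort
    {
        const int LinesPerChunk = 1000000; // tune to available memory

        static void Main()
        {
            var chunks = new List<string>();

            // Phase 1: read the big file in chunks, sort each in memory, spill to disk.
            using (var reader = new StreamReader(@"C:\data\big.txt"))
            {
                var buffer = new List<string>(LinesPerChunk);
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    buffer.Add(line);
                    if (buffer.Count == LinesPerChunk)
                        chunks.Add(SpillChunk(buffer));
                }
                if (buffer.Count > 0)
                    chunks.Add(SpillChunk(buffer));
            }

            // Phase 2: k-way merge of the sorted chunks, skipping duplicates,
            // which are adjacent in the merged (globally sorted) stream.
            var readers = chunks.Select(c => new StreamReader(c)).ToList();
            var heads = readers.Select(r => r.ReadLine()).ToList();
            string previous = null;
            using (var writer = new StreamWriter(@"C:\data\sorted.txt"))
            {
                while (true)
                {
                    int min = -1;
                    for (int i = 0; i < heads.Count; i++)
                        if (heads[i] != null &&
                            (min == -1 || string.CompareOrdinal(heads[i], heads[min]) < 0))
                            min = i;
                    if (min == -1) break;

                    if (heads[min] != previous)
                    {
                        writer.WriteLine(heads[min]);
                        previous = heads[min];
                    }
                    heads[min] = readers[min].ReadLine();
                }
            }
            readers.ForEach(r => r.Dispose());
        }

        static string SpillChunk(List<string> buffer)
        {
            // Sort the chunk in memory and write it to a temp file.
            buffer.Sort(StringComparer.Ordinal);
            string path = Path.GetTempFileName();
            File.WriteAllLines(path, buffer);
            buffer.Clear();
            return path;
        }
    }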
I'm implementing a SOAP web service for sending thousands of emails and storing thousands of XML response records in a local database (C#/.NET, Visual Studio 2012).
I would like to make my service consumer as fast and lightweight as possible.
I need to know some of the considerations. I always have a feeling that my code should run faster than it does.
E.g.
I've read that using datasets increases overhead. So should I use lists of objects instead?
Does using ORM introduce slowness into my code?
Is a console application faster than a winform? Because the user needs no GUI to deal with. There are simply some parameters sent to the app that invoke some methods.
What are the most efficient ways to deal with a SOAP Web Service?
Make it work, then worry about making it fast. If you try to guess where the bottlenecks will be, you will probably guess wrong. The best way to optimize something is to measure real code before and after.
Datasets, ORMs, WinForms apps, and console apps can all run plenty fast. Use the technologies that suit you, then tune for speed if you actually need to.
Finally if you do have a performance problem, changing your choice of algorithms to better suit your problem will likely yield much greater performance impact than changing any of the technologies you mentioned.
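For the "measure before and after" advice, even a simple Stopwatch harness around the real code path is enough (the method below is just a stand-in for whatever you are tuning):

    using System;
    using System.Diagnostics;

    class MeasureIt
    {
        static void Main()
        {
            // Measure the real code path instead of guessing where the time goes.
            var sw = Stopwatch.StartNew();

            SendBatchOfEmails(); // stand-in for the actual work being tuned

            sw.Stop();
            Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
        }

        static void SendBatchOfEmails()
        {
            // Placeholder for the SOAP calls and database writes.
            System.Threading.Thread.Sleep(250);
        }
    }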
Considering my personal experience with SOAP, in this scenario I would say your main concern should be how you retrieve this information from your database (procedures, views, triggers, indexes, etc.).
The difference between a console app, a WinForms app, and a web app isn't that relevant.
After the app is done, you should run a serious stress test on it to see where your performance problem lies, if it exists at all.
I'm currently working on a to-do-list system that acts like a calendar to store your tasks and lets you check them off once you are done with them. We should also be able to undo changes that were made.
Currently, my project mate is suggesting that we store the data into files with different dates and when we want to search for a particular task, just search whether the file exists, then edit and directly manipulate the files when needed.
However, I feel that it is better to store the data in one large file and load it into memory (possibly as a list of tasks) when our program is executed. I can't explain why, though.
Does OOP come into the picture when dealing with this?
Sorry if I am a bit confused as I am still learning.
It is a perfect task for a database solution. I suggest that you use the SQL Server database that was included with your Visual Studio for this task.
Store each task as a row in a table, and select dates and subjects for the calendar view and all the values of one task when editing. VS has some pretty good tools to create such an application in a few minutes (for an experienced user).
Handling files is always a mess when several people need to edit the data at the same time.
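As a very small sketch of that table approach (the connection string, database name, and table layout here are all assumptions for the example, not something from the question):

    using System;
    using System.Data.SqlClient;

    class TaskStore
    {
        // Hypothetical connection string; point it at whatever SQL Server
        // instance came with your Visual Studio install.
        const string ConnectionString =
            @"Data Source=(LocalDB)\v11.0;Initial Catalog=TodoDb;Integrated Security=True";

        static void AddTask(DateTime due, string subject)
        {
            using (var conn = new SqlConnection(ConnectionString))
            using (var cmd = new SqlCommand(
                "INSERT INTO Tasks (DueDate, Subject, IsDone) VALUES (@due, @subject, 0)", conn))
            {
                cmd.Parameters.AddWithValue("@due", due);
                cmd.Parameters.AddWithValue("@subject", subject);
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }

        static void Main()
        {
            AddTask(DateTime.Today.AddDays(1), "Finish the report");
            Console.WriteLine("Task stored.");
        }
    }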
The best practice always depends on the work you are doing. For a to-do list you have to perform multiple operations on the data, so it is better to use client-side storage such as an .sdf file (SQL Server Compact) instead of plain files: the .sdf file works as a database, is lightweight, and handles large amounts of data more easily than flat files.
Firstly, this is a persistence problem and should be done using well-known patterns. You could use a database and the repository pattern to solve this. NoSQL databases are also an option, as they are easy to set up and lack the overhead associated with SQL databases.
But if flat files are your option, then holding all the data in memory has the flaw that when an exception occurs or the program shuts down, you lose all your data. Persistence should happen through create/read/update/delete (CRUD) cycles.
That way you persist in small chunks as you go, and you only lose a small amount of data if you crash.
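If flat files are kept anyway, one way to persist in small chunks as you go is an append-only journal of CRUD operations (the file name and record format here are made up for the example):

    using System;
    using System.IO;

    class TaskJournal
    {
        // Hypothetical journal file: every change is appended immediately,
        // so a crash only loses the change that was in flight.
        const string JournalPath = "tasks.log";

        public static void Append(string operation, string taskId, string payload)
        {
            string line = string.Format("{0}\t{1}\t{2}\t{3:o}",
                operation, taskId, payload, DateTime.UtcNow);
            File.AppendAllText(JournalPath, line + Environment.NewLine);
        }

        static void Main()
        {
            Append("CREATE", "42", "Buy groceries");
            Append("UPDATE", "42", "Buy groceries and milk");
            Append("DELETE", "42", "");
        }
    }

A journal like this also helps with the undo requirement from the question, since every change is recorded in order and can be replayed or rolled back.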
I have a website with very high load, and I am keeping my test app in a hidden iframe to make sure the target framework is a good choice for my use case. I first tried a SignalR test app and then PokeIn under the same server config. Currently we are using Flash remoting solutions, but soon we are planning to change that.
I spent some time making my SignalR-based test application handle concurrent client updates under the high load of my website. It was working well in the scenario where only some of the clients request messages, but when most of the connected clients requested messages at the same time it failed dramatically (I needed to remove it from the iframe call). I suspected my server configuration was the problem, but the same scenario works under the paid solution PokeIn without any issue.
Is there any trick I am missing?
Feb.10.2012 Update:
Although we decided to implement PokeIn in our solution, I tried the latest SignalR code on GitHub (this might be helpful for others), and the result was the same.
March.13.2012 Update:
Scenario (one more time):
Try to send a message to thousands of connected clients at a given interval, let's say 1 second. It won't be hard to test and see the result. I feel like I am the only person around stressing these libraries with this type of very common usage.
Details (how to reproduce - tested with 0.5 from GitHub):
- Server 2008 R2, 32 GB DDR3, i7-2600 3.4 GHz, 2x256 GB Crucial M4
- ASP.NET 3.5
- A single-page app updates the time on the client side from the server every second.
- This page is embedded in a hidden iframe loaded by several websites in order to make a real-life load test.
Issues
The system locks up at some point (approx. 800 users) and most of the clients don't get the updated time from the server.
Once the system locks up, that single app page stops responding.
I also tried increasing the interval to 5 seconds. This time the system was more responsive (approx. 950 users) but the result was the same. I tried this with .NET 2 and .NET 4 application pools.
I hope these details are enough. Repeating this test is quite easy for me, and as soon as I find some free time, I will repeat the test with a future version.
I want to get a good grasp of multi-threading in C#. I've read some articles, like Joseph Albahari's tutorials, that explain the concepts, but as you know, no matter how much you read, most of it goes to waste if you don't practice. I need something with instructive and pragmatic code examples related to real-life practice, not examples that just print a few lines. Do you have any suggestions?
Guys, I think I found a good site: planet-source-code.com. Searching the .NET code with the "thread" keyword seems to return some good examples, like:
multi-threaded folder synchronization
multi-threaded TCP server
background file downloader
async. socket
P2P file sharing
simple POP3 console mail checker and lots of others!
yay!
Some kind of random number-crunching is a good test for this. I taught myself threading by writing a prime number finder, then breaking my "search" numbers into blocks and using a thread to work through each one.
This let me set variables for block size, number of threads to use, wait time between firing threads, etc., to test how each of these affects performance.
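As one concrete way to set up that exercise, the sketch below splits the search range into blocks and gives each block its own thread; block size and thread count are the knobs to experiment with (the numbers here are arbitrary):

    using System;
    using System.Collections.Generic;
    using System.Threading;

    class PrimeBlocks
    {
        static void Main()
        {
            const int limit = 1000000;  // search range, tweak to taste
            const int threadCount = 4;  // number of worker threads
            int blockSize = limit / threadCount;

            var results = new List<int>[threadCount];
            var threads = new Thread[threadCount];

            for (int i = 0; i < threadCount; i++)
            {
                int start = i * blockSize + 1;
                int end = (i == threadCount - 1) ? limit : start + blockSize - 1;
                int slot = i;
                results[slot] = new List<int>();

                // Each thread scans its own block and collects primes into its own list.
                threads[i] = new Thread(() =>
                {
                    for (int n = start; n <= end; n++)
                        if (IsPrime(n))
                            results[slot].Add(n);
                });
                threads[i].Start();
            }

            foreach (var t in threads)
                t.Join();

            int total = 0;
            foreach (var list in results)
                total += list.Count;
            Console.WriteLine("Primes found: {0}", total);
        }

        static bool IsPrime(int n)
        {
            if (n < 2) return false;
            for (int d = 2; d * d <= n; d++)
                if (n % d == 0) return false;
            return true;
        }
    }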
If you're doing any winforms or wpf development, you'll quickly run across issues when you try to do "stuff" in the UI thread.
Let's say that you need to read and parse the contents of a large (2 GB) XML file. If the work were performed in the UI thread, the interface would hang until the work had completed. Conversely, if you do the work correctly in a worker thread, you can keep the UI responsive via messaging and let the user know what you're currently doing (a status bar (ugh), or a text display of the current step: "Reading XML...", etc.).
A good simple example would be to make a sample application and have it fire off a BackgroundWorker to handle some arbitrary work in the background (it could even be Thread.Sleep(10000), or something trivial like that.)
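A minimal sketch of that idea in WinForms, with Thread.Sleep standing in for the real work (parsing the XML, etc.), so the UI stays responsive and reports progress:

    using System;
    using System.ComponentModel;
    using System.Threading;
    using System.Windows.Forms;

    class WorkerForm : Form
    {
        private readonly Button startButton = new Button { Text = "Start", Dock = DockStyle.Top };
        private readonly Label status = new Label { Text = "Idle", Dock = DockStyle.Bottom };
        private readonly BackgroundWorker worker =
            new BackgroundWorker { WorkerReportsProgress = true };

        public WorkerForm()
        {
            Controls.Add(startButton);
            Controls.Add(status);

            startButton.Click += (s, e) => worker.RunWorkerAsync();

            worker.DoWork += (s, e) =>
            {
                // Stand-in for the long-running work (e.g. parsing a large file).
                for (int i = 1; i <= 10; i++)
                {
                    Thread.Sleep(500);
                    worker.ReportProgress(i * 10);
                }
            };

            // ProgressChanged and RunWorkerCompleted run on the UI thread,
            // so it is safe to touch controls here.
            worker.ProgressChanged += (s, e) =>
                status.Text = "Working: " + e.ProgressPercentage + "%";
            worker.RunWorkerCompleted += (s, e) => status.Text = "Done";
        }

        [STAThread]
        static void Main()
        {
            Application.Run(new WorkerForm());
        }
    }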
I'd say this is one of the many good starting points out there on the subject.
http://msdn.microsoft.com/en-us/library/cc221403%28VS.95%29.aspx
This site has a few sample applications that I think would be decent practice applications to implement. However, it seems like the links to the source code are broken. Nonetheless, I believe the applications presented represent very practical examples. A few include:
Desktop Search
Download Manager
FTP Client
File Compression
Multiple RSS Feeds