I've got a WPF application written in C#. It has to instantiate thousands of objects. After pulling data from the database server, it has to run a ton of calculations that take time. The whole process takes up to 20-30 seconds, with 80% of that coming from the calculations.
So to help resolve this issue, I wrote a WCF service that keeps a copy of the already instantiated objects with the calcs already run, and then upon request, transfers the instantiated objects to the calling client.
It works! However, it's slow...really slow. Much slower than the original way. It takes 3-4 minutes to transfer all the objects from the WCF service, thus defeating its purpose.
I've tried streaming instead of buffering the service and increasing or decreasing the different service options in the client and server config files, but haven't found settings that make a real difference yet.
Is this slow speed to be expected, or should it be fast and I just need to modify some options? If so, what options?
WCF isn't necessarily slow but if the application isn't designed properly, the application can be slow. It could be compared to loading up a few thousand pounds of weight on a sports car. The car is a fast car, but it isn't really being used properly.
First, I would say you have to minimize the amount of data that is being sent on the wire (more about this later). Once on the wire, you'll get a lot better performance if you use TCP or named pipes instead of HTTP. See Choosing a Transport. HTTP is easy since most networks are configured to let it pass easily, but it isn't designed for large data sets.
If the delay is coming from the calculations, then the only thing the WCF service will accomplish is offloading the processing from the server to the client. Ultimately this might be a good thing - or even necessary - if you plan on having a high volume of concurrent requests to the server but as you have noticed, it doesn't necessarily mean shorter times for the end user. What you should focus on doing is minimizing the calculation time.
It is hard to give specifics since you haven't revealed much about what is being queried, what is being returned and what the calculations are doing. However, I have had impressive results with large data sets by offloading code from the application server to the database server via Visual Studio SQL Server Projects. Since SQL Server can host the .NET CLR (SQL CLR integration), you can write database objects (like user-defined functions) in C# or VB or any other CLR language and deploy them directly into the database. Then you can use these functions in your queries, and they can be very fast since they run inside the database engine, next to the data. I've seen orders of magnitude in difference between running C# in the application vs running the same function in the database.
If 80% of your application's work comes from the calculations, then it might be a great idea to parallelize some parts of it, for example with the Task Parallel Library.
I have a WPF app that makes some WCF calls (about 5-6 per minute). It has about 100 users. These calls come in bursts (The user presses save, that calls a WCF "Broker" service, which then calls several other WCF Services.)
I was looking into duplex communication and I saw that WCF can support TCP communication. I also saw that IIS 7 can support TCP hosting.
From what I have read, there can be some performance gains by using TCP.
But my understanding of TCP is that it is more for systems that are going to be making many hundreds of calls per minute.
Would my less chatty system see real benefits from taking the time to switch from HTTP to TCP?
As a matter of opinion, I would say that if your current system works well and you're not experiencing any particular problem using HTTP, then you probably shouldn't change it. Why would you inject uncertainty into your project for no particular reason?
If you're making five or six calls per minute, then I can't see how converting to TCP will gain you much. Sure, your data transmission time will be slightly less, but what's the point? If your messages are huge--megabytes in size--then I might worry about improving data transmission speed. Otherwise, there's just no point to it.
Now, if you expect that your traffic will increase a thousandfold in the near future, then you probably should look at converting to TCP rather than HTTP. Beyond that, I'd recommend that you spend your time and effort on improvements that add value to your product.
I'm considering using WCF or mORMot as the framework for a RESTful service, where the business / legacy code that needs to be accessed is written in Delphi. Performance is a key requirement of the project.
The application must be prepared for load balancing. The clients of the REST service are Windows desktop applications. These desktop clients allow the user to view large volumes of data, with huge result sets from SQL statements. What is the best way to implement a service that caches a recordset and lets the client consume it gradually through the REST service? Can you show a good example? The recordset must be cached in the session until the client has finished the consultation or decided to do the full fetch. What is the right architecture?
Will load balancing work in WCF? Since the recordset is cached on a single server, the row fetch requests, if any, must land on that same server.
Both WCF and mORMot share the same high-performance kernel-mode http.sys server. Both feature IOCP and multi-threading.
For performance, mORMot will be lighter, will allocate (much) less memory, won't be affected by Garbage Collector freezes, and is able to get JSON content directly from the database engine (by-passing most temporary data conversion and allocation) - so that you can achieve amazing speed. In short, mORMot was designed for performance of serving REST/JSON content from the ground up - with a multi-threaded kernel (whereas e.g. node.js is mono-threaded). If your purpose is also to cache some data, mORMot works very well as 64 bit native services, giving access to all your system RAM if needed, and has built-in real-time content compression.
WCF is a great general-purpose communication library, which can be RESTful, but is not RESTful from its (historical) roots. The main issue I saw with WCF is the difficulty to configure it between applications (.exe.config tuning may be confusing), and that it is a big black box. For instance, it was not possible to implement Cross-origin resource sharing with WCF when the server is hosted as a Windows service (the Access-Control-Allow-Origin: HTTP headers are deleted by WCF!): you have to host it within IIS - and can't fix the issue, whereas with a full Open Source solution, you can fix any issue.
Load-balancing can be implemented in mORMot and WCF with the same algorithm. Instead of using a round-robin algorithm in your case, a simple routing based on the content may be enough.
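To illustrate the content-based routing idea, here is a minimal Python sketch, not tied to mORMot or WCF. It assumes each request carries a session identifier; the server names and the `route` function are hypothetical. Hashing the session ID always sends the same session to the same server, so the cached recordset is found again.

```python
import hashlib

# Hypothetical back-end names; substitute your own server list.
SERVERS = ["app-server-1", "app-server-2", "app-server-3"]


def route(session_id: str, servers=SERVERS) -> str:
    """Pick a server deterministically from the session ID (content-based routing)."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer and map it onto the list.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

Note that with this simple modulo scheme, adding or removing a server remaps most sessions, so it suits a fixed server list best.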
Using WCF to serve business logic written in Delphi will be slow, error prone and difficult to maintain. Mixing technologies induces unneeded complexity. I would not go into this direction.
If you have an existing Delphi code base, and some Delphi skills, I guess mORMot may be a better choice. It was reported e.g. that a single server in production is able to handle more than one million requests per day, serving thousands of concurrent clients, with a dedicated JavaScript process on the server side. One of the mORMot design goals was to help working with existing code and legacy projects. But I'm not 100% fair, since I'm the main maintainer of this open source project. :)
EDIT: As result of the answers so far I like to add more focus in what I like to zero in on: A database that allows writing in-memory (could be simple C# code) with persistence to storage options in order to access the data from within R. Redis so far looks the most promising. I also consider to actually use something similar to Lockfree++ or ZeroMQ, in order to avoid writing data concurrently to the database, but rather sending all to be persisted data over a message bus/other implementation and to have one "actor" handle all write operations to an in-memory db or other solution. Any more ideas aside Redis (some mentioned SQLite and I will need to still test its performance). Any other suggestions?
I am searching for the ideal database structure/solution that meets most of my below requirements but so far I utterly failed. Can you please help?
My tasks: I run a process in .Net 4.5 (C#) and generate (generally) value types that I want to use for further analysis in other applications and therefore like to either preserve in-memory or persist on disk. More below. The data is generated within different tasks/threads and thus a row based data format does not lend itself well to match this situation (because the data generated in different threads is generated at different times and is thus not aligned). Thus I thought a columnar data structure may be suitable but please correct me if I am wrong.
Tasks/Thread #1 generates the following data at given time stamps
datetime.ticks / value of output data
1000000001 233.23
1000000002 233.34
1000000006 234.23
Task/Thread #2 generates the following data at given time stamps
datetime.ticks / value of output data
1000000002 33.32
1000000005 34.34
1000000015 54.32
I do not need to align the time stamps at the .Net run-time, I am first and foremost after preserving the data and to process the data within R or Python at a later point.
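For reference, aligning the two sample series later in Python could look like the sketch below. It uses only the standard library and the sample values listed above; the function name `outer_align` is made up for illustration. pandas offers the same kind of outer join on an index with far more options.

```python
# Sample data from the two tasks/threads above: ticks -> value.
thread1 = {1000000001: 233.23, 1000000002: 233.34, 1000000006: 234.23}
thread2 = {1000000002: 33.32, 1000000005: 34.34, 1000000015: 54.32}


def outer_align(*series):
    """Outer-join several ticks->value mappings; missing values become None."""
    ticks = sorted(set().union(*series))
    return [(t, *(s.get(t) for s in series)) for t in ticks]


aligned = outer_align(thread1, thread2)
for row in aligned:
    print(row)
```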
My requirements:
Fast writes, fast writes, fast writes: It can happen that I generate 100,000 - 1,000,000 data points per second and need to persist (worst case) or retain the data in memory. It's OK to run the writes on their own thread, so this process can lag behind the data generation process, but the limit is 16 GB RAM (64-bit code), more below.
Preference is for columnar db format as it lends itself well to how I want to query the data later but I am open to any other structure if it makes sense in regards to the examples above (document/key-value also ok if all other requirements are met, especially in terms of write speed).
API that can be referenced from within .Net. Example: HDF5 may be considered capable by some but I find their .Net port horrible. Something that supports .Net a little better would be a plus but if all other requirements are met then I can deal with something similar to the HDF5 .Net port.
Concurrent writes if possible: As described earlier I like to write data concurrently from different tasks/threads.
I am constrained by 16gb memory (run .Net process in 64bit) and thus I probably look for something that is not purely in-memory as I may sometimes generate more data than that. Something in-memory which persists at times or a pure persistence model is probably preferable.
Preference for embedded but if a server in a client/server solution can run as a windows service then no issue.
In terms of data access I have a strong preference for a db solution for which interfaces from R and Python already exist, because I like to use the pandas library within Python for time series alignments and other analysis and run analyses within R.
If the API/library supports in addition SQL/SQL-like/Linq/ like queries that would be terrific but generally I just need the absolute bare bones such as load columnar data in between start and end date (given the "key"/index is in such format) because I analyze and run queries within R/Python.
If it comes with a management console or data visualizer that would be a plus but not a must.
Should be open source or priced within "reach" (no, KDB does not qualify in that regards ;-)
OK, here is what I have so far, and again it's all I've got because most db solutions simply fail already on the write performance requirement:
Infobright and Db4o. I like what I read so far but I admit I have not checked into any performance stats
Something done myself. I can easily store value types in binary format and index the data by datetime.ticks , I just would need to somehow write scripts to load/deserialize the data in Python/R. But it would be a massive tasks if I wanted to add concurrency, a query engine, and other goodies. Thus I look for something already out there.
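As a rough idea of what the Python side of such a home-grown format could look like, here is a minimal sketch. It assumes fixed-width records of one 64-bit integer (the ticks) followed by one 64-bit float, little-endian, written in ascending ticks order; that layout and the function names are assumptions, not an existing format.

```python
import bisect
import struct

# Assumed record layout: int64 ticks + float64 value, little-endian, 16 bytes.
RECORD = struct.Struct("<qd")


def write_records(stream, rows):
    """Append (ticks, value) pairs to a binary stream."""
    for ticks, value in rows:
        stream.write(RECORD.pack(ticks, value))


def read_range(stream, start_ticks, end_ticks):
    """Return all records with start_ticks <= ticks <= end_ticks.

    Relies on the records having been written in ascending ticks order.
    """
    data = stream.read()
    last = len(data) - RECORD.size + 1  # ignore a trailing partial record
    rows = [RECORD.unpack_from(data, off) for off in range(0, last, RECORD.size)]
    ticks = [row[0] for row in rows]
    lo = bisect.bisect_left(ticks, start_ticks)
    hi = bisect.bisect_right(ticks, end_ticks)
    return rows[lo:hi]
```

Reading the whole file before searching keeps the sketch short; since the records are fixed-width and sorted, a real reader could binary-search on file offsets instead.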
I can't comment -- low rep (I'm new here) -- so you get a full answer instead...
First, are you sure you need a database at all? If fast write speed and portability to R are your biggest concerns, then have you just considered a flat file mechanism? According to your comments you're willing to batch writes out but you need persistence; if those were my requirements I'd write a straight-to-disk buffering system that was lightning fast, then build a separate task that periodically took the disk files and moved them into a data store for R, and that's only if R reading the flat files wasn't sufficient in the first place.
If you can do alignment after-the-fact, then you could write the threads to separate files in your main parallel loop, cutting each file off every so often, and leave the alignment and database loading to the subprocess.
So (in crappy pseudo_code), build a thread process that you'd call with backgroundworker or some such and include a threadname string uniquely identifying each worker and thus each filestream (task/thread):
file_name = threadname + '0001.csv' // or something
open(file_name for writing)
while (generating_data) {
    while (buffer_not_full and very_busy) {
        add data to buffer
    }
    write buffer to file_name
    if (file is big enough or enough time has passed or we're not too busy) {
        close(file_name)
        move(file_name to bob's folder)
        increment file_name
        open(file_name for writing)
    }
}
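The pseudocode above could be sketched in Python roughly as follows. This is only an illustration of the rotate-and-move idea under simplifying assumptions: rotation is triggered by row count alone, the class name `RotatingWriter` and its parameters are made up, and both folders are on the same volume so the move is a cheap rename.

```python
import os


class RotatingWriter:
    """Writes CSV rows for one worker and moves full files to a 'done' folder."""

    def __init__(self, thread_name, work_dir, done_dir, max_rows=100_000):
        self.thread_name = thread_name
        self.work_dir = work_dir
        self.done_dir = done_dir
        self.max_rows = max_rows
        self.sequence = 1
        self.rows = 0
        self.file = open(self._path(), "w", newline="")

    def _path(self):
        # e.g. threadA0001.csv, threadA0002.csv, ...
        name = f"{self.thread_name}{self.sequence:04d}.csv"
        return os.path.join(self.work_dir, name)

    def write(self, ticks, value):
        self.file.write(f"{ticks},{value}\n")
        self.rows += 1
        if self.rows >= self.max_rows:
            self._hand_over()
            self.sequence += 1
            self.rows = 0
            self.file = open(self._path(), "w", newline="")

    def _hand_over(self):
        # Close the current file and move it where the loader process looks.
        self.file.close()
        path = self._path()
        os.replace(path, os.path.join(self.done_dir, os.path.basename(path)))

    def close(self):
        if self.rows:
            self._hand_over()
        else:
            self.file.close()
            os.remove(self._path())  # nothing was written to this file
```

Each task/thread would own one writer, so no locking is needed between them; the loader only ever sees files that are already closed.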
Efficient and speedy file I/O and buffering is a straightforward and common problem. Nothing is going to be faster than this. Then you can just write another process to do the database loads and not sweat the performance there:
while(file_name in list of files in bob's folder sorted by date for good measure)
read bob's file
load bob's file to database
align dates, make pretty
And I wouldn't write that part in C#, I'd batch script it and use the database's native loader which is going to be as fast as anything you can build from scratch.
You'll have to make sure the two loops don't interfere much if you're running on the same hardware. That is, run the task threads at a higher priority, or build in some mutex or performance limiters so that the database load doesn't hog resources while the threads are running. I'd definitely segregate the database server and hardware so that file I/O to the flat files isn't compromised.
FIFO queues would work if you're on Unix, but you're not. :-)
Also, hardware is going to have more of a performance impact for you than the database engine, I'd imagine. If you're on a budget I'm guessing you're on COTS hardware, so springing for a solid state drive may up performance fairly cheaply. As I said, separating the DB storage from the flat file storage would help, and the CPU/RAM for R, the Database, and your Threads should all be segregated ideally.
What I'm saying is that choice of DB vendor probably isn't your biggest issue, unless you have a lot of money to spend. You'll be hardware bound most of the time otherwise. Database tuning is an art, and while you can eke out minor performance gains at the top end, having a good database administrator will keep most databases in the same ballpark for performance. I'd look at what R and Python support well and that you're comfortable with. If you think in columnar fashion then look at R and C#'s support for Cassandra (my vote), Hana, Lucid, HBase, Infobright, Vertica and others and pick one based on price and support. For traditional databases on a single commodity machine, I haven't seen anything that MySQL can't handle.
This is not to answer my own question but to keep track of all data bases which I tested so far and why they have not met my requirements (yet): each time I attempted to write 1 million single objects (1 long, 2 floats) to the database. For ooDBs, I stuck the objects into a collection and wrote the collection itself, similar story for key/value such as Redis but also attempted to write simple ints (1mil) to columnar dbs such as InfoBright.
Db4o, awfully slow writes: 1mil objects within a collection took about 45 seconds. I later optimized the collection structure and also wrote each object individually, not much love here.
InfoBright: Same thing, very slow in terms of write speed, which surprised me quite a bit as it organizes data in columnar format but I think the "knowledge tree" only kicks in when querying data rather than when saving flat data structures/tables-like structures.
Redis (through BookSleeve): Great API for .Net: Full Redis functionality (though couple drawbacks to run the server on Windows machines vs. a Linux or Unix box). Performance was very fast...North of 1 million items per second. I serialized all objects using Protocol Buffers (protobuf-net, both written by Marc Gravell), still need to play a lot more with the library but R and Python both have full access to the Redis DB, which is a big plus. Love it so far. The Async framework that Marc wrote around the Redis base functions is awesome, really neat and it works so far. I wanna spend a little more time to experiment with the Redis Lists/Collection types as well, as I so far only serialized to byte arrays.
SQLite: I ran purely in-memory and managed to write 1 million value type elements in around 3 seconds. Not bad for a pure RDBMS, obviously the in-memory option really speeds things up. I only created one connection, one transaction, created one command, one parameter, and simply adjusted the value of the parameter within a loop and ran the ExecuteNonQuery on each iteration. The transaction commit was then run outside the loop.
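The same pattern (one connection, one transaction, one parameterized statement reused for every row) can be sketched with Python's built-in `sqlite3` module. The table name, column names and the `bulk_insert` helper are made up for illustration, and timings will differ from the .Net test described above.

```python
import sqlite3


def bulk_insert(rows):
    """Insert (ticks, a, b) rows into an in-memory SQLite table in one transaction."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (ticks INTEGER, a REAL, b REAL)")
    with conn:  # commits once when the block exits without an error
        conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)
    return conn


conn = bulk_insert((i, i * 0.5, i * 0.25) for i in range(100_000))
print(conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0])
```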
HDF5: Though there is a .Net port and there also exists a library to somehow work with HDF5 files out of R, I strongly discourage anyone from doing so. It's a pure nightmare. The .Net port is very badly written, heck, the whole HDF5 concept is more than questionable. It's a very old and in my opinion outgrown solution to store vectorized/columnar data. This is 2012 not 1995. If one cannot completely delete datasets and vectors out of the file in which they were stored before then I do not call that an annoyance but a major design flaw. The API in general (not just .Net) is very badly designed and written imho, there are tons of class objects that nobody, without having spent hours and hours of studying the file structure, understands how to use. I think that is somewhat evidenced by the very sparse amount of documentation and example code that is out there. Furthermore, the h5r R library is a drama, an absolute nightmare. It's badly written as well (often the file is not correctly closed upon writing due to a faulty flush, which corrupts files), the library has issues to even be properly installed on 32 bit OSs...and it goes on and on. I write the most about HDF5 because I spent the most of my time on this piece of .... and ended up with the most frustration. The idea to have a fast columnar file storage system, accessible from R and .Net was enticing but it just does not deliver what it promised in terms of API integration and usability or lack thereof.
Update: I ditched testing velocityDB simply because there does not seem any adapter to access the db from within R available. I currently contemplate writing my own GUI with charting library which would access the generated data either from a written binary file or have it sent over a broker-less message bus (zeroMQ) or sent through LockFree++ to an "actor" (my gui). I could then call R from within C# and have results returned to my GUI. That would possibly allow me the most flexibility and freedom, but would obviously also be the most tedious to code. I am running into more and more limitations during my tests that with each db test I befriend this idea more and more.
RESULT: Thanks for the participation. In the end I awarded the bounty points to Chipmonkey because he suggested partly what I considered important points to the solution to my problem (though I chose my own, different solution in the end).
I ended up with a hybrid between Redis in memory storage and direct calls out of .Net to the R.dll. Redis allows access to its data stored in memory by different processes. This makes it a convenient solution to quickly store the data as key/value in Redis and to then access the same data out of R. Additionally I directly send data and invoke functions in R through its .dll and the excellent R.Net library. Passing a collection of 1 million value types to R takes about 2.3 seconds on my machine which is fast enough given that I get the convenience to just pass in the data, invoke computational functions within R out of the .Net environment and getting the results back sync or async.
Just a note: I once had a similar problem posted by a fellow in a delphi forum. I could help him with a simple ID-key-value database backend I wrote at that time (kind of a NoSQL engine). Basically, it uses a B-Tree to store triplets (32bit ObjectID, 32bit PropertyKey, 64bit Value). I could manage to save about 500k/sec Values in real time (about 5 years ago). Of course, the data was indexed on all three values (ID, property-ID and value). You could optimize this by ignoring the value index.
The source I still have is in Delphi, but I would think about implementing something like that using C#. I cannot tell you whether it will meet your needs for performance, but if all else fails, give it a try. Using a buffered write should also drastically improve performance.
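To make the triplet layout concrete, here is a small Python sketch. It is not the Delphi engine mentioned above: a sorted list with binary search stands in for the B-Tree, only the (ObjectID, PropertyKey) pair is indexed, the 64-bit value is assumed to be a signed integer, and the class name `TripletStore` is made up.

```python
import bisect
import struct

# 32-bit ObjectID, 32-bit PropertyKey, 64-bit Value: 16 bytes per triplet.
TRIPLET = struct.Struct("<IIq")


class TripletStore:
    """Keeps triplets sorted by (object_id, property_key) for fast lookups."""

    def __init__(self):
        self._keys = []    # sorted (object_id, property_key) pairs
        self._values = []  # values, parallel to _keys

    def put(self, object_id, property_key, value):
        key = (object_id, property_key)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite an existing property
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, object_id, property_key):
        key = (object_id, property_key)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def dump(self) -> bytes:
        """Serialize all triplets in key order, ready for one buffered write."""
        return b"".join(
            TRIPLET.pack(o, p, v) for (o, p), v in zip(self._keys, self._values)
        )
```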
I would go with a way that combines persistent storage (I personally prefer db4o, but you can use files as well, as mentioned above) with storing objects in memory, like this:
use BlockingCollection<T> to store objects in memory (I believe you will achieve better performance than 1000000/s when storing objects in memory), and then have one or more processing threads which consume the objects and store them into the persistent database
// Producing thread
for (int i = 0; i < 1000000; i++)
{
    blockingCollection.Add(new MyObject()); // MyObject stands in for your data type
}
// Consuming threads
while (true)
{
    var myObject = blockingCollection.Take();
    db4oSession.Store(myObject); // or write it to files or whatever
}
BlockingCollection pretty much solves the Producer-Consumer workflow, and if you use multiple instances of it together with AddToAny/TakeFromAny, you can spread the load across as many producer and consumer threads as you need
each consuming thread could have different db4o session (file) to reach desired performance (db4o is singlethreaded).
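For readers who want to try the producer-consumer pattern outside .Net, the sketch below shows the same idea in Python, with `queue.Queue` playing the role of BlockingCollection<T>. It is an analogy rather than a port: the `run_pipeline` name is made up, and a plain list stands in for the db4o session or file each consumer would write to.

```python
import queue
import threading


def run_pipeline(items, consumer_count=2):
    """Push items through a bounded queue to several consumer threads."""
    q = queue.Queue(maxsize=10_000)  # bounded, so producers cannot outrun memory
    stored = []                      # stand-in for the persistent store
    lock = threading.Lock()

    def consume():
        while True:
            item = q.get()           # blocks until an item is available
            if item is None:         # sentinel: no more work for this consumer
                break
            with lock:
                stored.append(item)  # here you would write to the db or a file

    workers = [threading.Thread(target=consume) for _ in range(consumer_count)]
    for w in workers:
        w.start()
    for item in items:               # producing side
        q.put(item)                  # blocks while the queue is full
    for _ in workers:
        q.put(None)                  # one sentinel per consumer
    for w in workers:
        w.join()
    return stored
```

The bounded queue is the important design choice: when the consumers fall behind, the producer blocks instead of filling up RAM.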
Since you want to use ZeroMQ why not use memcache over Redis?
ZeroMQ offers no persistence as far as I know. Memcache also offers no persistence and is a bit faster than Redis.
Or perhaps the other way, if you use Redis why not use beanstalk MQ?
If you want to use Redis (for the persistence) you might want to switch from ZeroMQ to beanstalk MQ (also a fast in memory queue, but also has persistence via logging). Beanstalk also has C# libs.
I want it to work on windows servers.
It will be a cloud type server - it'll consist of modules\parts running on different machines all over the world using http\tcp + upnp to connect to each other
There are going to be controlling\monitoring\observing modules on each machine to provide stats on performance
This net is going to be working with large amounts of VIDEO\AUDIO live streaming\broadcasting data
It is going to use FFMPEG for re-encoding and OpenGL, OpenCV and such for filtering (.NET wrappers exist and work BTW)
It will not use any WCF or IIS
I want to develop it in team of 2-4 developers, smart students.
So is it OK to create this in C# .Net or I shall not waste my time on promises of ease it could provide to a developer and go C\C++?
So is it reasonable to write a server application in C# in my case?
Offtop - why not WCF
Warning: it gets way too subjective in here.
WCF is great when you have a big corp with relatively small data exchange per session of a service.
When you have video, LIVE video, it all gets complicated. Large amounts of data, lots of users stream in and out from your service at the same time.
Try to do live video streaming over an HTTP binding, then try it with the others, and you'll see why I do not like the idea of live streaming with WCF: it is slow and carries way too much information that live streaming does not need. And after all, have you ever seen a live video streaming app on WCF? No, you haven't. Maybe you have seen more-or-less live video on the Silverlight + IIS pair, which I do not like because it is a video streaming solution just for Silverlight\WindowsMediaPlayer, while I want more than that.
I love to have cross-platform clients with rich UIs. And I do not like (this is all my personal opinion, so it is subjective) the Silverlight+IIS+WCF group. So what shall I do? Right: go to sockets, streams in such old and simple formats as FLV, and Flash as the client. It is simpler to develop in some parts, and a more conservative way of doing live video over the web than the one you get from MS today.
I love Flash FLV live streaming because you just open a socket and start sending live FLV video data onto it (for each user the FLV header and then FLV "TAGs", one by one: video tag, audio tag, video tag, audio tag etc) and Flash plays it! With no special\unusual code. It is fast, easy to support, and does not make the client need anything new\unusual. And on the server side you can make great use of that "TAG" form of video\audio data representation.
So that is in short why I just do not want to use WCF - hard to get live video playing out from it on client side, no general benefits for live video server.
And when most of live data goes thru sockets why to bother with using WCF for service management.
During the last half of 2009 and the first half of 2010 I was getting into WCF, live video streaming, Silverlight and Flash, comparing the process of client\server creation and reading different formats with a team of very interesting developers. In general, at the end of the project we had lots of mini servers streaming live data and lots of different clients receiving it. Comparing all we've done, we came to conclusions which are near the one I present to you here.
That is why I do not want to use WCF in my nearest project - I do not want to think about how to deliver media data, I want to focus on its filtering\editing.
Why the question appeared
We started playing with FFmpeg\OpenCV in C, and it is pretty simple to manipulate data using them... in C... on Linux...
But when we started to play with their .Net bindings (we are now playing with Tao.FFmpeg) we found that in most cases we end up playing with C# Marshal a lot, and having 2 variables for each C analog (the problem of pointers) and so on. I hope we will not see such problems with Emgu CV, but still it makes me a little bit afraid...
I think it's entirely reasonable. The benefits of C# with regard to ease of development will greatly outweigh any performance drawbacks of not using C++.
C# is generally more cross-platform than C++. True, C++ is a cross-platform language, but there are large differences between the APIs that C++ programs use to interact with the system. C# and .Net/Mono have a much more standardized interface to the socket layer.
Finally, with ambitious projects like this, getting the project into a usable form is a much more important goal than getting the highest performance possible. Performance only matters if the project is complete. Write it in C# because that will give you the greatest odds of completion. Then worry about performance.
I'm not exactly sure why people have brought up Cross Platform concerns as clearly the OP has stated the app will run on Windows.
As to the actual questions.
Can you build a server application that communicates via tcp/http in C# that does not have to run in IIS. -> Yes.
Can you build a server application that is performant and scales in C# -> Yes.
Can you do so with Students -> Maybe. Depends on the students... ;) But that is irrespective of the language in use.
Is this something I would do? Yes. We've done that. We have a c# app running on approximately 20,000 machines right now that are communicating effectively over tcp. We aren't using WCF, but we did decide to use RESTful style services over http for the data transfer.
Our biggest issue was simply tuning the app to transfer the "right" amount of data over the wire at a time. This network is for data collection and storage. It's averaging around 200GB of data collected a day..
I wanted to clarify a bit about the above app. The 20,000 machines at the above installation are clients (XP, Vista, 7, 2003 Server, and 2008 Servers). There's only one data collection point server in the mix. The clients post data to the server, when connected to a network, once every 45 seconds. Roughly 97% of the machines stay connected in this manner, the rest connect a couple times a week.
This works out to the server processing about 37 million requests a day.
Now, to be sure, each request is relatively small at around 5KB to 6KB each. However, the sheer number of requests shows that a C# application can handle managing those connections, which is the bigger part of the OP's problem.
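The figures above can be checked with a few lines of Python. The only assumptions are that exactly 97% of the machines post every 45 seconds around the clock, that the occasional connections of the remaining 3% are ignored, and that sizes use decimal units (1 GB = 1,000,000 KB).

```python
machines = 20_000
connected = machines * 97 // 100        # machines posting continuously
posts_per_machine = 24 * 60 * 60 // 45  # one post every 45 seconds, per day

requests_per_day = connected * posts_per_machine
print(requests_per_day)                 # about 37 million

# Daily volume at 5 KB and 6 KB per request, in GB.
low_gb = requests_per_day * 5 / 1_000_000
high_gb = requests_per_day * 6 / 1_000_000
print(low_gb, high_gb)                  # brackets the ~200GB/day quoted above
```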
Because the OP's files are large (Video), then the real issue is simply in data transfer. Which will be hindered more by hard drive speeds, as well as network speed and latency. Those issues are irrespective of which language you are working in and will limit the number of connections per server based on available bandwidth.
Working this out, let's limit it to one server as an example. If you have a video rate of 400kb/s and a 25MB connection to the internet, then that box could physically only handle around 62 simultaneous connections (25,000 / 400 = 62.5, taking both figures in the same units). Which is so FAR below the number of connections our app is doing as to be a rounding error.
Assuming perfect network conditions (which don't exist), pumping that internet connection up to 100MB (which can be expensive) means a 4x increase in simultaneous connections to about 250; still completely manageable.
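As a quick check of the connection estimate, the sketch below assumes the stream rate and the link capacity are expressed in the same units (kilobytes per second here), and ignores protocol overhead. The function name is made up.

```python
def max_streams(link_kb_per_s: int, stream_kb_per_s: int) -> int:
    """Number of full-rate streams that fit into the link, ignoring overhead."""
    return link_kb_per_s // stream_kb_per_s


print(max_streams(25_000, 400))   # 25 MB/s link, 400 kB/s per stream
print(max_streams(100_000, 400))  # 100 MB/s link
```

If the link is actually rated in megabits while the stream is in kilobytes, the counts drop by a factor of eight, which does not change the conclusion that the network, not the language, is the limit.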
However, the network is only one side of the equation. Drive speed on the servers matters a lot. You better have a good disk array capable of continuously delivering that amount of data. I know drives claim 3GB data transfers, but a drive which can saturate the channel has never been built. Which means serious planning and money in the server setup.
The point of all of this is to say that the language doesn't matter one bit in your situation. You have other much larger contention issues. With that being the case, go with the language that will help you get the project done faster.
Why stop at C#, if you (possibly) want cross-platform, write it in Python or similar, you'll find that the networking aspects of a scripting language are far better than C# (as that's pretty much the role scripting languages are put to nowadays, running web-based servers).
You'll find developer productivity is much improved over C# (just as C# has better productivity over C++), and there are lots of people who know and want to work on these systems. It sounds like performance of the servers themselves is of less importance than the networking, so it appears that script would be your best choice. Plus ffmpeg libraries are more tightly integrated with python using pyffmpeg than C# (well, mostly).
And it'd be a lot cooler, more fun, and very much cross-platform!
If you want C# and also cross-platform abilities, your development will have to target the Mono platform (or another cross-platform .NET runtime, if you can find one). You might have to give up VisualStudio, and maybe some Microsoft-specific libraries and tools, but you can still have C# on multiple platforms. Just make sure you start the multi-platform building and testing EARLY in the process or it will be hell to change things later.
If the application is meant to run only on Windows platforms, I would definitely write it in C#. Many applications like that are probably running right now without us even knowing it.
If the target is to run on multiple platforms, you should first isolate all the problems that a non-Windows platform can bring to your application.
Why do you have to write it in C++ if, in this case, C# is capable of doing everything that C++ does? I would use C++ for hardware-level programming, like a robot or something similar. For a server application, C# will fit what you want very well; it was designed for these things.
And C# is cross-platform, you just need the right tool to make it work on a specific platform.
I currently have an application that sends XML over TCP sockets from a Windows client to a Windows server.
We are rewriting the architecture and our servers are going to be in Java. One architecture we are looking at is a REST architecture over HTTP, which the C# WinForms clients would use to send info. We are looking for high throughput and low latency.
Does anyone have any performance metrics on this approach versus other options for C# client to Java server communication?
This isn't really well-enough defined to make any metric statements; how big are the messages, how often would you be hitting the REST service, is it straight HTTP or do you need to secure it with SSL? In other words, what can you tell us about the workload parameters?
(I say this over and over again on performance questions: unless you can tell me something about the workload, I can't -- nobody really can -- tell you what will give better performance. That's why they used to say you couldn't consider performance until you had an implementation: it's not that you can't think about performance, it's that people often couldn't or at least wouldn't think about workload.)
That said, though, you can make some good estimates simply by looking at how many messages you want to exchange, because setup time for TCP/IP often dominates the cost of a REST call. REST offers two advantages here: first, the TCP/IP time often does dominate the message transmission, and that's pretty well optimized in production web servers like Apache or lighttpd; second, a RESTful architecture enhances scalability by eliminating session state. That means you can scale freely using just a simple TCP/IP load balancer.
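To make that kind of estimate concrete, here is a rough back-of-the-envelope sketch. The function name and the sample numbers (a 2000-byte message, a 1 Gb/s link, a 0.5 ms round trip) are my own assumptions, not measurements from anyone's system, and the model ignores HTTP headers, TLS, and server processing time.

```python
def message_time_ms(payload_bytes, bandwidth_bits_per_s, rtt_ms, new_connection):
    """Very rough per-message latency estimate in milliseconds (hypothetical model)."""
    # A TCP three-way handshake costs about one round trip before data can flow.
    handshake_ms = rtt_ms if new_connection else 0.0
    # Time to push the payload onto the wire at the given link speed.
    transfer_ms = payload_bytes * 8 / bandwidth_bits_per_s * 1000
    # One more round trip for the request and its response.
    return handshake_ms + rtt_ms + transfer_ms


# Assumed workload: 2000-byte message, 1 Gb/s link, 0.5 ms round trip.
fresh = message_time_ms(2000, 1e9, 0.5, new_connection=True)
reused = message_time_ms(2000, 1e9, 0.5, new_connection=False)
print(f"new connection: {fresh:.3f} ms, reused connection: {reused:.3f} ms")
```

With these made-up numbers the wire time is a tiny fraction of the total, which is why connection reuse (HTTP keep-alive) tends to matter more than the message format for small messages.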
I would set up a test to try it and see. I understand that the only part of your application you're changing is the client/server communication. So analyse what you're sending now, and put together a test client/server setup sending messages which are representative of what you think your final solution is going to be doing (perhaps representative only in terms of size/throughput).
As noted in the previous post, there's not enough detail to really judge what the performance is going to be like. For example:
Is your message structure/format going to be the same, but merely sent over HTTP rather than raw sockets?
Are you going to be sending subsets of the XML data? Processing large quantities of XML can be memory intensive (e.g. if you're using a DOM-based approach).
What overhead is your chosen REST framework going to introduce? (Hopefully very little, but at the moment we don't know.)
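On the DOM point, the difference between building the whole tree and streaming through the document can be sketched like this. It uses Python's standard `xml.etree.ElementTree` purely for illustration; the `<orders>`/`<order>` document shape and the function names are invented, and the same idea applies to StAX in Java or `XmlReader` in .NET.

```python
import io
import xml.etree.ElementTree as ET


def count_orders_dom(xml_bytes: bytes) -> int:
    # DOM-style: the whole document is parsed into memory before we look at it.
    root = ET.fromstring(xml_bytes)
    return len(root.findall("order"))


def count_orders_streaming(xml_bytes: bytes) -> int:
    # Streaming: handle each <order> as soon as its end tag is seen.
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "order":
            count += 1
            # Drop the element's text and children; the emptied element
            # itself still hangs off the root, so this reduces rather than
            # eliminates memory growth.
            elem.clear()
    return count


sample = b'<orders><order id="1"><item>a</item></order><order id="2"/></orders>'
print(count_orders_dom(sample), count_orders_streaming(sample))
```

Both functions give the same answer; the point is only that the streaming version never needs the full content of every order in memory at once.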
The best solution is to set something up using (say) Jersey and spend some time testing various scenarios. If you're re-architecting a solution, it's going to be worth a few days investigating performance (let alone functionality, ease of development etc.)
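A throwaway harness for that kind of test does not need to be elaborate. The sketch below is only a starting point under my own assumptions: it runs an echo server on the loopback interface, sends newline-terminated messages over a single connection, and reports round trips per second. `EchoHandler` and `run_benchmark` are hypothetical names, the sample message is made up, and loopback numbers say nothing about a real network or a real REST framework.

```python
import socket
import socketserver
import threading
import time


class EchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Echo every newline-terminated message straight back to the client.
        for line in self.rfile:
            self.wfile.write(line)


def run_benchmark(message: bytes, count: int) -> float:
    """Send `count` messages over one loopback connection; return round trips per second.

    Assumes `message` contains no newline, since newline is the frame delimiter.
    """
    with socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler) as server:
        server.daemon_threads = True
        threading.Thread(target=server.serve_forever, daemon=True).start()
        host, port = server.server_address
        with socket.create_connection((host, port)) as sock, sock.makefile("rb") as reader:
            start = time.perf_counter()
            for _ in range(count):
                sock.sendall(message + b"\n")
                reply = reader.readline()
                if reply != message + b"\n":
                    raise RuntimeError("echo mismatch")
            elapsed = time.perf_counter() - start
        server.shutdown()
    return count / elapsed


if __name__ == "__main__":
    rate = run_benchmark(b"<order id='1'><item>widget</item></order>", 1000)
    print(f"{rate:.0f} round trips per second on loopback")
```

Swap in messages that match your real sizes, then repeat the same measurement against the candidate REST stack so the two numbers are comparable.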
It's going to be plenty fast, unless you have a very, very large number of concurrent clients hitting those servers. The XML shredding keeps getting faster in both Java and .NET. If you are on CLR2 and Java 5 or above, you will be fine. But of course you still need to do the tests to verify.
We've tested REST and SOAP transactions in our lab, and they are faster than you might think: tens of thousands of messages per second. A small number of modern CPUs generating XML messages can easily saturate a gigabit network. In other words, the bottleneck is the network (transmitting the data), not the CPU (serializing and deserializing XML).
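The arithmetic behind "saturating a gigabit network" is easy to check. This is a sketch with an assumed 2 KB message size (my number, not one from the lab tests above), and it ignores TCP/IP and HTTP framing overhead, so real figures would be somewhat lower.

```python
def max_messages_per_second(link_bits_per_s: float, message_bytes: int) -> float:
    # Upper bound: link capacity in bytes per second divided by message size.
    return link_bits_per_s / 8 / message_bytes


# Assumed: 1 Gb/s link, 2048-byte XML messages.
print(int(max_messages_per_second(1_000_000_000, 2048)))  # 61035
```

So a link ceiling in the tens of thousands of messages per second is what you would expect at that message size.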
And if you design your software properly, then in the very unlikely situation where XML over REST is not sufficient, swapping out the message format layer (XML => protobufs) will get you better transmission performance with minimal disruption.
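What "design it properly" could look like in practice is keeping the wire format behind a small interface. The sketch below is an illustration under my own assumptions: `MessageCodec`, `XmlCodec`, `JsonCodec`, and `send` are invented names, messages are flat string-to-string dictionaries whose keys are valid XML tag names, and JSON stands in for protobufs only because it ships with the standard library.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable, Protocol


class MessageCodec(Protocol):
    """Hypothetical seam between application code and the wire format."""

    def encode(self, message: dict) -> bytes: ...
    def decode(self, data: bytes) -> dict: ...


class XmlCodec:
    def encode(self, message: dict) -> bytes:
        root = ET.Element("message")
        for key, value in message.items():
            ET.SubElement(root, key).text = value
        return ET.tostring(root, encoding="utf-8")

    def decode(self, data: bytes) -> dict:
        return {child.tag: child.text or "" for child in ET.fromstring(data)}


class JsonCodec:
    # Stand-in for a more compact format such as protobufs.
    def encode(self, message: dict) -> bytes:
        return json.dumps(message).encode("utf-8")

    def decode(self, data: bytes) -> dict:
        return json.loads(data.decode("utf-8"))


def send(message: dict, codec: MessageCodec, transport: Callable[[bytes], None]) -> None:
    # Callers never touch the wire format directly, so the codec can be swapped.
    transport(codec.encode(message))


order = {"symbol": "ABC", "quantity": "100"}
for codec in (XmlCodec(), JsonCodec()):
    sent = []
    send(order, codec, sent.append)
    print(type(codec).__name__, len(sent[0]), "bytes")
```

Because the rest of the application only sees `send` and a codec, changing the format later is a one-line change at the call site rather than a rewrite.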
But before you need to go there, you will be able to send some money to Cisco and get lots more headroom.