I am connecting to a Gmail account using a TCP client to read emails. The TCP connection returns an SslStream. It works fine in a single-threaded environment, but performance is very poor in terms of speed.
I need to optimize the project so that its speed can be increased. I have implemented multithreading, which increases the speed, but the application hangs at some point.
Is it thread safe to use a TCP connection (global member)?
Or can I create multiple TCP connections and pass them to the thread method to increase the speed?
Or is there another, better way of doing this?
TcpClient m_TCPclient;
SslStream sslStream;
private void createTCP()
{
// creating tcp and sslstream
}
private void authenticateUser()
{
// authenticating the user
}
private void getUserdata()
{
    // iterating folders and their items
    foreach (string emailID in IDList)
    {
        // thread implementation
    }
}
With regard to thread safety, take a quick glance at the documentation for the TcpClient and SslStream:
Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.
I think what you may want to look at is using the async methods of the stream to deal with the hanging when you perform IO.
Neither TcpClient nor SslStream is thread safe. You would have to add thread synchronization to avoid race conditions and the resulting hangs. However, your application's speed would still be bound to the single TCP client, which essentially renders your multithreading useless in terms of TCP throughput.
Have each thread create its own connection and stream objects instead. This will in turn increase your tcp throughput which is most likely the bottleneck of your application.
To synchronize the threads so they don't read the same information, have the main thread fetch a list of emails and pass a subset of the email list to each of the threads which in turn fetch those emails using their own connections.
You can also use caching to avoid getting the same information every time you restart the application.
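The per-thread-connection approach the answer describes could be sketched roughly as follows. This is only a sketch (it assumes `using System.Collections.Generic;`, `using System.Threading;`, and `using System.Net.Security;`), and `CreateAuthenticatedStream` and `FetchEmail` are hypothetical stand-ins for the createTCP/authenticateUser and fetch logic from the question:

```csharp
// Each worker thread opens its own TcpClient/SslStream and fetches a
// disjoint slice of the email IDs, so no stream is ever shared.
int workerCount = 4;
var threads = new List<Thread>();

for (int w = 0; w < workerCount; w++)
{
    int worker = w; // capture the loop variable for the closure
    var t = new Thread(() =>
    {
        using (SslStream stream = CreateAuthenticatedStream()) // per-thread connection
        {
            // Strided partition: worker 0 takes IDs 0, 4, 8, ...; worker 1 takes 1, 5, 9, ...
            for (int i = worker; i < IDList.Count; i += workerCount)
                FetchEmail(stream, IDList[i]); // no shared state, no locks needed
        }
    });
    threads.Add(t);
    t.Start();
}

threads.ForEach(t => t.Join()); // wait for all workers to finish
```

Because each thread owns its connection and its slice of the ID list, no synchronization is needed between workers.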
Related
I'm creating a network protocol based on TCP, and I am using Berkeley sockets via C#.
Is the socket buffer going to get mixed up when two threads try to send data via the Socket.Send method at the same time?
Should I use a lock so only one thread accesses the socket at a time?
According to MSDN, the Socket class is thread safe. That means you don't need to lock the Send method, and you can safely send and receive on different threads. But be aware that the order is not guaranteed, so the data won't be mixed only if you send each message in a single call to Send instead of in chunked calls.
In addition to that, I would recommend locking and flushing in case you don't want the server swapping responses across multiple concurrent requests. But that doesn't seem to be the case here.
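When a message cannot be sent in a single call, the locking the answer recommends might look like this minimal sketch (`SendMessage` and the two-part message are hypothetical; `socket` is assumed to be a connected System.Net.Sockets.Socket shared by several threads):

```csharp
private readonly object _sendLock = new object();

void SendMessage(Socket socket, byte[] header, byte[] body)
{
    // Without the lock, another thread's header could land on the wire
    // between these two Send calls and interleave the messages.
    lock (_sendLock)
    {
        socket.Send(header);
        socket.Send(body);
    }
}
```

If instead you concatenate header and body into one buffer and make a single Send call, the lock is unnecessary, which is the point the answer makes.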
When talking sockets programming in C# what does the term blocking mean?
I need to build a server component (possibly a Windows service) that will receive data, do some processing and return data back to the caller. The caller can wait for the reply but I need to ensure that multiple clients can call in at the same time.
If client 1 connects and I take say 10 seconds to process their request, will the socket be blocked for client 2 calling in 2 seconds later? Or will the service start processing a second request on a different thread?
In summary, my clients can wait for a response but I must be able to handle multiple requests simultaneously.
Blocking means that the call you make (send/receive) does not return ('blocks') until the underlying socket operation has completed.
For a read, that means until some data has been received or the socket has been closed.
For a write, it means that all data in the buffer has been sent out.
For dealing with multiple clients, start a new thread for each client, or give the work to a thread in a thread pool.
Connected TCP sockets cannot be shared, so it must be one socket per client anyway.
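The thread-pool approach from the answer above could be sketched like this (a minimal sketch, assuming `using System.Net;`, `using System.Net.Sockets;`, and `using System.Threading;`; `HandleClient` is a hypothetical method containing the request processing):

```csharp
TcpListener listener = new TcpListener(IPAddress.Any, 9000);
listener.Start();

while (true)
{
    // AcceptTcpClient blocks until a client connects.
    TcpClient client = listener.AcceptTcpClient();

    // Hand the connection to a pool thread, so client 2 is accepted and
    // served even while client 1's request is still being processed.
    ThreadPool.QueueUserWorkItem(_ => HandleClient(client));
}
```

The accept loop itself stays blocking, but each client's 10-second processing happens on its own pool thread, so later callers are not held up.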
This means the currently executing thread can't do anything else while the call is blocking.
It has nothing to do with the server side.
It means the thread pauses whilst it waits for a response from the socket.
If you don't want it to pause, use the async methods.
Read more: http://www.developerfusion.com/article/28/introduction-to-tcpip/8/
A blocking call will hold the currently executing thread until the call completes.
For example, if you wish to read 10 bytes from a network stream call the Read method as follows
byte[] buf = new byte[10];
int bytesRead = stream.Read(buf, 0, buf.Length);
The currently executing thread will block on the Read call until at least one byte is available (or the ReadTimeout has expired). Note that Read may return fewer than 10 bytes; if you need exactly 10, you must loop.
There are async variants of Read and Write to prevent blocking the current thread. These follow the standard APM pattern in .NET. The async variants save you from dedicating a thread (which would be blocked) to each client, which increases your scalability.
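Since Read may return fewer bytes than requested, a read-exactly-N loop is the usual companion to the snippet above. A minimal sketch (assuming `using System.IO;`; `ReadExactly` is a hypothetical helper name):

```csharp
// Loops until the buffer is completely filled, or throws if the
// stream ends first. Read returns 0 only when the peer has closed.
static void ReadExactly(Stream stream, byte[] buf)
{
    int offset = 0;
    while (offset < buf.Length)
    {
        int n = stream.Read(buf, offset, buf.Length - offset);
        if (n == 0)
            throw new EndOfStreamException("Connection closed mid-read");
        offset += n;
    }
}
```

Usage would mirror the original example: `byte[] buf = new byte[10]; ReadExactly(stream, buf);` guarantees all 10 bytes are present before you touch them.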
Blocking operations are usually those that send or receive data and those that establish connections (i.e. listen for new clients or connect to other listeners).
To answer your question, blocking basically means that control stays within a function or block of code (such as ReadFile() in C++) until it returns, and does not move on to the code following that block.
This can happen in either a single-threaded or a multi-threaded context, though having blocking calls in single-threaded code is basically a recipe for an unresponsive application.
Solution:
To solve this in C#, you can simply use asynchronous methods, for example BeginReceive() and EndReceive() in the sockets context, which will not block your calls. This is called the asynchronous programming model (APM).
You can also call BeginInvoke() and EndInvoke() on either a delegate or a control, depending on which asynchronous approach you follow.
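The delegate flavor of BeginInvoke/EndInvoke mentioned above works like this minimal sketch (.NET Framework; the delegate and values are hypothetical):

```csharp
// BeginInvoke starts the call on a thread-pool thread and returns
// immediately with an IAsyncResult handle.
Func<int, int> work = x => x * 2;
IAsyncResult ar = work.BeginInvoke(21, null, null);

// ... the calling thread is free to do other work here ...

// EndInvoke collects the result, blocking only if the call
// hasn't finished yet.
int result = work.EndInvoke(ar);
```

The same Begin/End shape applies to sockets: BeginReceive returns immediately, and EndReceive is called from the completion callback.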
You can use the function Socket.Select()
Select(IList checkRead, IList checkWrite, IList checkError, int microSeconds)
to check multiple Sockets for both readability or writability. The advantage is that this is simple. It can be done from a single thread and you can specify how long you want to wait, either forever (-1 microseconds) or a specific duration. And you don't have to make your sockets asynchronous (i.e.: keep them blocking).
It also works for listening sockets: it will report readability when there is a connection to accept. From experimenting, I can say that it also reports readability for graceful disconnects.
It's probably not as fast as asynchronous sockets. It's also not ideal for detecting errors; I haven't had much use for the third parameter, because it doesn't detect an ungraceful disconnect.
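A minimal sketch of the single-threaded Select loop described above (assuming `using System.Collections.Generic;` and `using System.Net.Sockets;`; `GetConnectedSockets` and `HandleReadable` are hypothetical helpers):

```csharp
List<Socket> sockets = GetConnectedSockets();

while (true)
{
    // Select removes the non-ready sockets from the list you pass in,
    // so hand it a fresh copy on every iteration.
    var checkRead = new List<Socket>(sockets);
    Socket.Select(checkRead, null, null, 500000); // wait up to 0.5 s

    foreach (Socket s in checkRead)
    {
        // Readable here means data is available, the peer disconnected
        // gracefully, or (for a listener) a connection is pending.
        HandleReadable(s);
    }
}
```

The sockets themselves stay blocking; Select just tells you which calls would complete without blocking.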
You should use one socket per thread. Blocking sockets (synchronous) wait for a response before returning. Non-blocking (asynchronous) can peek to see if any data received and return if no data there yet.
I would like to use the C# asynchronous io model for my socket. I have multiple
threads that need to send over the socket. What is the best way of handling this?
A pool of N sockets, with access controlled by a lock? Or is an async send thread-safe
for multiple threads accessing a single socket?
Thanks!
Jacko
The async methods already dispatch the send to other threads for you, which would probably add unnecessary overhead to your application. Since you already have multiple threads, you can create an IDisposable type of object to represent access to the socket, plus a manager that controls checking the socket out and back in. The socket would check itself back in when the IDisposable Dispose method is called. This way you can also control which methods your threads can perform on the socket.
If the socket is already checked out by another thread, the manager would simply block until it's available.
using (SharedSocket socket = SocketManager.GetSocket())
{
//do things with socket
}
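One way the SharedSocket/SocketManager pair used above could be sketched, assuming a semaphore-guarded pool (all names besides Socket, SemaphoreSlim, and ConcurrentBag are hypothetical, and this is only one possible design):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net.Sockets;
using System.Threading;

public sealed class SharedSocket : IDisposable
{
    public Socket Socket { get; private set; }
    private readonly SocketManager _owner;

    internal SharedSocket(Socket s, SocketManager owner)
    {
        Socket = s;
        _owner = owner;
    }

    // Leaving the using-block checks the socket back in.
    public void Dispose() { _owner.CheckIn(Socket); }
}

public sealed class SocketManager
{
    private readonly ConcurrentBag<Socket> _pool;
    private readonly SemaphoreSlim _available;

    public SocketManager(IEnumerable<Socket> sockets)
    {
        _pool = new ConcurrentBag<Socket>(sockets);
        _available = new SemaphoreSlim(_pool.Count);
    }

    public SharedSocket GetSocket()
    {
        _available.Wait();      // blocks while every socket is checked out
        Socket s;
        _pool.TryTake(out s);   // guaranteed to succeed after Wait
        return new SharedSocket(s, this);
    }

    internal void CheckIn(Socket s)
    {
        _pool.Add(s);
        _available.Release();   // wake one waiting GetSocket caller
    }
}
```

The semaphore count mirrors the pool size, so GetSocket blocks exactly when the pool is empty, which matches the "block until it's available" behavior described above.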
System.Threading.Semaphore is something that comes in handy for this kind of synchronization and for avoiding race conditions.
We are implementing a C# application that needs to make large numbers of socket connections to legacy systems. We will (likely) be using a 3rd party component to do the heavy lifting around terminal emulation and data scraping. We have the core functionality working today, now we need to scale it up.
During peak times this may be thousands of concurrent connections - aka threads (and even tens of thousands several times a year) that need to be opened. These connections mainly sit idle (no traffic other than a periodic handshake) for minutes (or hours) until the legacy system 'fires an event' we care about, we then scrape some data from this event, perform some workflow, and then wait for the next event. There is no value in pooling (as far as we can tell) since threads will rarely need to be reused.
We are looking for any good patterns or tools that will help use this many threads efficiently. Running on high-end server hardware is not an issue, but we do need to limit the application to just a few servers, if possible.
In our testing, creating a new thread and init'ing the 3rd party control seems to use a lot of CPU initially, but then drops to near zero. Memory use is about 800 MB per 1,000 threads.
Is there anything better / more efficient than just creating and starting the number of threads needed?
PS - Yes, we know it is bad to create this many threads, but since we have no control over the legacy applications, this seems to be our only alternative. There is no option for multiple events to come across a single socket/connection.
Thanks for any help or pointers!
Vans
You say this:
There is no value in pooling (as far
as we can tell) since threads will
rarely need to be reused.
But then you say this:
Is there anything better / more
efficient than just creating and
starting the number of threads needed?
Why the discrepancy? Do you care about the number of threads you are creating or not? Thread pooling is the proper way to handle large numbers of mostly-idle connections. A few busy threads can handle many idle connections easily and with fewer resources required.
Use the socket's asynchronous BeginReceive and BeginSend. These dispatch the IO operation to the operating system and return immediately.
You pass a delegate and some state to those methods that will be called when an IO operation completes.
Generally once you are done processing the IO then you immediately call BeginX again.
Socket sock = GetSocket();
State state = new State { Socket = sock, Buffer = new byte[1024], ThirdPartyControl = GetControl() };
sock.BeginReceive(state.Buffer, 0, state.Buffer.Length, SocketFlags.None, ProcessAsyncReceive, state);
void ProcessAsyncReceive(IAsyncResult iar)
{
    State state = iar.AsyncState as State;
    int bytesRead = state.Socket.EndReceive(iar);
    // Process the received data in state.Buffer here
    state.ThirdPartyControl.ScrapeScreen(state.Buffer);
    state.Socket.BeginReceive(state.Buffer, 0, state.Buffer.Length, SocketFlags.None, ProcessAsyncReceive, state);
}
public class State
{
    public Socket Socket { get; set; }
    public byte[] Buffer { get; set; }
    public ThirdPartyControl ThirdPartyControl { get; set; }
}
BeginSend is used in a similar fashion, as well as BeginAccept if you are accepting incoming connections.
With low-throughput operations, async communication can easily handle thousands of clients simultaneously.
I would really look into MPI.NET (more info: MPI). MPI.NET also has some parallel reduction support, so it will work well for aggregating results.
I would suggest utilizing the Socket.Select() method, and pooling the handling of multiple socket connections within a single thread.
You could, for example, create a master thread for every 50 connections to the legacy system. These master threads would just keep calling Socket.Select(), waiting for data to arrive. Each master thread could then pass the sockets that have data to a thread pool for the actual processing, and once the processing is complete, the socket would be handed back to the master thread.
There are a number of patterns using Microsoft's Coordination and Concurrency Runtime (CCR) that make dealing with IO easy and light. It allows us to grab and process well over 6,000 web pages a minute (it could go much higher, but there's no need) in a crawler we are developing. Definitely worth the time investment required to shift your head into the CCR way of doing things. There's a great article here:
http://msdn.microsoft.com/en-us/magazine/cc163556.aspx
A few words about an ongoing design and implementation
I send a lot of requests to the remote application (running on a different
host, of course), and the application send back data.
About client
The client is a UI that spawns a separate thread to submit and process the requests. Once it submits all the requests, it calls Wait, and Wait will parse all events coming from the app and invoke the client's callbacks.
Below is the implementation of Wait.
public void Wait (uint milliseconds)
{
while(_socket.IsConnected)
{
if (_socket.Poll(milliseconds, SelectMode.SelectRead))
{
// read info of the buffer and calls registered callbacks for the client
if(_socket.IsAvailable > 0)
ProcessSocket(socket);
}
else
return; //returns after Poll has expired
}
}
The Wait is called from a separate thread, responsible for managing network connection: both inbound and outbound traffic:
_Receiver = new Thread(DoWork);
_Receiver.IsBackground = true;
_Receiver.Start(this);
This thread is created from UI component of the application.
The issue:
The client sometimes sees delays in callbacks even though the main application has sent the data on time. Notably, one of the messages in Poll was delayed until the client disconnected, at which point internally I called:
_socket.Shutdown(SocketShutdown.Both);
I think something funky is happening in the Poll
Any suggestions on how to fix the issue or an alternative workaround?
Thanks
please let me know if anything is unclear
A couple of things. First, in your example, is there a difference between "_socket" and "socket"? Second, you are using the System.Net.Sockets.Socket class, right? I don't see IsConnected or IsAvailable properties on that class in the MSDN documentation for any .NET version going back to 1.1. I assume these are both typing mistakes, right?
Have you tried putting an "else" clause on the "IsAvailable > 0" test and writing a message to the Console/Output window, e.g.,
if (_socket.IsAvailable > 0) {
ProcessSocket(socket);
} else {
Console.WriteLine("Poll() returned true but there is no data");
}
This might give you an idea of what might be going on in the larger context of your program.
Aside from that, I'm not a big fan of polling sockets for data. As an alternative, is there a reason not to use the asynchronous Begin/EndReceive functions on the socket? I think it'd be straightforward to convert to the asynchronous model given the fact that you're already using a separate thread to send and receive your data. Here is an example from MSDN. Additionally, I've added the typical implementation that I use of this mechanism to this SO post.
What thread is calling the Wait() method? If you're just throwing it into the UI threadpool, that may be why you experience delays sometimes. If this is your problem, then either use the system threadpool, create a new one just for the networking parts of your application, or spawn a dedicated thread for it.
Beyond this, it's hard to help you much without seeing more code.