I am in the process of creating a TCP remote desktop broadcasting application (something like TeamViewer or VNC).
The server application will
1. run on a PC, listening for multiple clients on one thread
2. record the desktop every second on another thread
3. and broadcast the desktop to each connected client.
I need this application to work over a DSL connection with 12 KB/s upload and 50 KB/s download (on both the client's end and the server's).
So I have to reduce the size of the data/image I send per second.
I tried to reduce it in the following ways.
I. First I send a bitmap frame of the desktop, and each subsequent time I send only the difference from the previously sent frame.
II. The second way I tried was sending a JPEG frame each time.
I was unsuccessful in sending a JPEG frame and then sending only the difference from the previously sent JPEG frame each subsequent time.
I tried LZMA compression (the 7-Zip SDK) when transmitting the difference of the bitmap.
But I was unable to reduce the data to 12 KB/s; the lowest I was able to achieve was around 50 KB/s.
Can someone advise me on an algorithm/procedure for doing this?
What you want to do is what image compression formats do, but in a custom way: send only the changes, not the whole image over and over. Here is what I would do, in two phases (phase 1: get it done, prove it works; phase 2: optimize).
Proof of concept phase
1) Capture an image of the screen in bitmap format
2) Section the image into blocks of contiguous bytes. You will need to experiment to find the optimal block size; it will vary with uplink/downlink speed.
3) Get a short hash (CRC32, maybe MD5; experiment with this as well) for each block
4) Compress (don't forget to do this!) and transfer each changed block (if the hash changed, the block changed and needs to be transferred). Stitch the image together at the receiving end to display it. See the sketch after this list.
5) Use UDP packets for data transfer.
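Here is a minimal C# sketch of steps 2-4, under some assumptions of mine: the screen has already been captured into a raw pixel buffer, the block size is a placeholder to tune, MD5 stands in for whatever hash you settle on, and `transmit` is a hypothetical callback that compresses and sends one block.

using System;
using System.Security.Cryptography;

class BlockDiffSender
{
    const int BlockSize = 4096;          // bytes per block; tune this per link speed
    byte[][] lastHashes;                 // hash of each block as last transmitted

    // frame: raw pixel buffer of the captured screen (same size every call).
    // transmit: hypothetical callback that compresses and sends one block.
    public void SendChangedBlocks(byte[] frame, Action<int, byte[]> transmit)
    {
        int blockCount = (frame.Length + BlockSize - 1) / BlockSize;
        if (lastHashes == null) lastHashes = new byte[blockCount][];

        using (MD5 md5 = MD5.Create())
        {
            for (int i = 0; i < blockCount; i++)
            {
                int offset = i * BlockSize;
                int length = Math.Min(BlockSize, frame.Length - offset);
                byte[] hash = md5.ComputeHash(frame, offset, length);

                // Only blocks whose hash changed since the last frame go out.
                if (lastHashes[i] == null || !HashesEqual(lastHashes[i], hash))
                {
                    byte[] block = new byte[length];
                    Buffer.BlockCopy(frame, offset, block, 0, length);
                    transmit(i, block);
                    lastHashes[i] = hash;
                }
            }
        }
    }

    static bool HashesEqual(byte[] a, byte[] b)
    {
        for (int i = 0; i < a.Length; i++)
            if (a[i] != b[i]) return false;
        return true;
    }
}

On the receiving end, keep a mirror buffer of the same size and overwrite block i at offset i * BlockSize as blocks arrive, then redraw.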
Optimization phase
These are things you can do to optimize for speed:
1) Gather stats and hard code transfer speed vs frame size and hash method for optimal transfer speed
2) Make a self-adjusting mechanism for #1
3) Images compress better in square areas rather than in contiguous runs of bytes like the blocks from step 2 of the first phase above. Change your algorithm so each block covers a visual square area rather than sequential runs of lines. This square method is how the image and video compression people do it.
4) Play around with the compression algorithm. This will give you lots of variables to play with (CPU load vs internet access speed vs compression algorithm choice vs frequency of screen updates)
This is basically a summary of how (roughly) compressed video streaming works (you can see the similarities with your task if you think about it), so it's not an unproven concept.
HTH
EDIT: One more thing you can experiment with: after you capture a bitmap of the screen, reduce the number of colors in it. You can save half the image size if you go from 32-bit color depth to 16-bit, for example.
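A minimal sketch of that color reduction, assuming GDI+ (System.Drawing) is what you capture with: redraw the 32bpp screenshot into a 16bpp RGB565 bitmap, halving the raw pixel data.

using System.Drawing;
using System.Drawing.Imaging;

static Bitmap ReduceTo16Bit(Bitmap source)
{
    // Redrawing into an RGB565 surface halves the bytes per pixel (4 -> 2).
    Bitmap reduced = new Bitmap(source.Width, source.Height, PixelFormat.Format16bppRgb565);
    using (Graphics g = Graphics.FromImage(reduced))
    {
        g.DrawImage(source, 0, 0, source.Width, source.Height);
    }
    return reduced;
}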
I am using the MagicLeap headset and the MLCamera API to do a raw video capture. The output format is YUV_420_888, which I am assuming is YUV420P. The API returns the yBuffer, uBuffer, and vBuffer separately. I am having trouble combining these channels in C# without Bitmap, since I am using Unity, which runs on Mono. What I am trying to do is combine these channels and send the result to my remote Python server to process the image I have captured; to be processed, it needs to be a full image.
I have tried using just the Y plane to create a grayscale image, but the server couldn't process it, so I need to combine all three channels on the client and then compress the result, preferably to JPEG, since that decreases the size drastically. I am processing the images at 420x420, although the camera output is 1920x1080. I have been trying different methods for the last week and a half but couldn't find anything decent. There are a couple of methods, especially for Android, but I don't want to convert to NV21 if I don't have to. I have also seen one using ARCore, but I can't use that either since I am using MagicLeap.
PS: Latency and processing time are super important, so if there is a way to convert YCbCr to JPEG directly, without converting it to RGB first, I think that would suit my case better, but I don't know if it's possible. In general, I think I lack some basic knowledge that is preventing me from going further.
Any help is greatly appreciated!
I've tried something similar in the past and was beating my head against the YUV420 stuff for weeks, but I couldn't solve it. In the end, I bought the OpenCV for Unity library. It has custom parts just for the MagicLeap, including reading frames from the camera at reduced resolution for a speed-up.
I'm not sure, however, whether it managed real time. Maybe at the reduced resolution, yes.
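If you do want to combine the planes yourself rather than use a library, and they are tightly packed (real MLCamera planes may carry a row stride you must honor, so treat that as an assumption), concatenating them into one I420 buffer is a plain copy. A minimal sketch:

static byte[] CombineYuv420(byte[] yBuffer, byte[] uBuffer, byte[] vBuffer)
{
    // I420 layout is simply the Y plane, then the U plane, then the V plane.
    byte[] combined = new byte[yBuffer.Length + uBuffer.Length + vBuffer.Length];
    System.Buffer.BlockCopy(yBuffer, 0, combined, 0, yBuffer.Length);
    System.Buffer.BlockCopy(uBuffer, 0, combined, yBuffer.Length, uBuffer.Length);
    System.Buffer.BlockCopy(vBuffer, 0, combined, yBuffer.Length + uBuffer.Length, vBuffer.Length);
    return combined; // the Python side can decode this given the known width/height
}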
I'm making a project using C# 2013 and Windows Forms, and this project will use an IP camera to display video for long periods using CGI commands.
I know from the articles I've read that the IP camera returns the streaming video as a continuous multi-part stream, and I found some samples that display the video, like this one: Writing an IP Camera Viewer in C# 5.0.
But I see a lot of code to extract the single part that represents a single image, display it, and so on.
I also tried taking continuous snapshots from the camera using the following code.
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://192.168.1.200/snap1080");
using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
using (Stream strm = res.GetResponseStream())
{
    MemoryStream ms = new MemoryStream();
    strm.CopyTo(ms); // buffer the JPEG; GDI+ needs the image's stream kept alive
    image.Image = Image.FromStream(ms);
}
I repeated this code in a loop that runs for one second and counts the number of snapshots taken, and it gives me between 88 and 114 snapshots per second.
IMHO the first example, which displays the video, does a lot of processing to extract the single part of the multi-part response and display it, which may be as slow as the other method of taking continuous snapshots.
So I'm asking for other developers' experience with this issue: do you see other differences between the two methods of displaying the video? I also want to know the effect of receiving a continuous multi-part stream on memory: is it safe, or will it generate out-of-memory errors?
Thanks in advance
If you are taking more than one JPEG every 1-3 seconds, you are better off capturing an H.264 video stream; it will take less bandwidth and CPU.
Usually an MJPEG stream is 10-20 times bigger than the same stream in H.264, so 80+ snapshots per second is a really big amount of data.
As long as you dispose of the image and stream correctly, you should not have memory issues (see the sketch below). I have done a similar thing in the past with an IP camera, even converting all the snapshot images back into a video, using ffmpeg (I think it was).
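To make "dispose correctly" concrete, a hypothetical helper (not from the camera's API, just an illustration): swap in the new frame first, then dispose the old one, so repeated snapshots don't accumulate unmanaged GDI+ memory.

using System.Drawing;
using System.Windows.Forms;

static void ShowFrame(PictureBox box, Image newFrame)
{
    Image old = box.Image;
    box.Image = newFrame;   // display the new snapshot
    if (old != null)
        old.Dispose();      // release the previous frame's GDI+ handle
}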
I'm trying to implement a map of the world that the user can scale and pan around in. I'm running into a problem where my program runs out of memory.
Here's my situation:
0) I'm writing my program in C# and OpenGL
1) I downloaded a JPG image off the web: a detailed image of the world that is 19 MB in size. Each pixel is 24 bits.
2) I chop this image up into tiles at run time when my program initializes. I feed each tile into OpenGL using glGenTextures, glBindTexture, and glTexImage2D.
3) In my drawing routine, I draw the tiles one by one and try to draw all of them regardless of whether they're on the screen (that's another problem, I know).
For a small image, everything works fine. For large images, however, when I decode the JPG into a bitmap so that I can chop it up, my program's memory footprint explodes to over a gigabyte. I know that JPGs are compressed, and the .NET JpegBitmapDecoder object is doing its job decompressing a huge image, which is probably what gets me into trouble.
My question is: what exactly can one do in this kind of situation? Since the C# decoding code is the first party responsible for blowing up memory, is there an alternate method of creating tiles and feeding them to OpenGL that I should pursue instead (one that avoids bitmaps)?
Obviously you should buy more RAM!
You can store textures in a compressed format on the GPU. See this page (also the S3TC/DXT section). You may also want to swap textures in and out of video memory as needed. Don't keep the uncompressed data around if you don't need it.
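A minimal sketch of the compressed-texture idea, assuming OpenTK bindings (your GL wrapper may differ): pick a compressed internal format at upload time and let the driver do the S3TC/DXT1 compression.

using OpenTK.Graphics.OpenGL;

static int UploadCompressedTile(byte[] bgrPixels, int width, int height)
{
    int tex = GL.GenTexture();
    GL.BindTexture(TextureTarget.Texture2D, tex);
    // The driver compresses the 24-bit data to DXT1, roughly 6:1 vs raw RGB.
    GL.TexImage2D(TextureTarget.Texture2D, 0,
                  PixelInternalFormat.CompressedRgbS3tcDxt1Ext,
                  width, height, 0,
                  PixelFormat.Bgr, PixelType.UnsignedByte, bgrPixels);
    GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMinFilter,
                    (int)TextureMinFilter.Linear);
    return tex;
}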
I'd look into decoding the JPG to a bitmap file on disk and using a stream over that file to asynchronously load only the tile data currently in view (or as much as memory can handle), similar to, e.g., Google Maps on Android.
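A rough sketch of reading one tile straight from a 24-bit BMP file without decoding the whole image; the header size, bottom-up row order, and tile size here are assumptions of mine, so real code should parse the BMP header instead of hard-coding it.

using System.IO;

static byte[] ReadTile(string bmpPath, int imgWidth, int imgHeight,
                       int tileX, int tileY, int tileSize)
{
    const int headerSize = 54;                   // standard BITMAPINFOHEADER file
    int rowStride = ((imgWidth * 3) + 3) & ~3;   // BMP rows are padded to 4 bytes
    byte[] tile = new byte[tileSize * tileSize * 3];

    using (FileStream fs = File.OpenRead(bmpPath))
    {
        for (int row = 0; row < tileSize; row++)
        {
            // BMP stores rows bottom-up, so map the tile row to a file row.
            int srcY = imgHeight - 1 - (tileY * tileSize + row);
            long offset = headerSize + (long)srcY * rowStride + (long)tileX * tileSize * 3;
            fs.Seek(offset, SeekOrigin.Begin);
            fs.Read(tile, row * tileSize * 3, tileSize * 3);
        }
    }
    return tile; // feed this straight to glTexImage2D as one tile
}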
I also saw this, but cannot confirm it: "OOM from a Bitmap is very suspect. This class will throw the memory exception for just about anything that goes wrong, including bad parameters, etc."
Recently, I finished my conference application, in which people can talk to and see each other. For this I capture images (an IntPtr buffer converted to JPEG) from the webcam (via the DirectShow library). Right now I do not have any problems, since the program has only been used on a LAN, but I'm planning to implement an internet version of it.
So my question is: should I use something other than JPEG? Should I compare image x and image x+1 and only send the differences? Should I use Motion JPEG? (Sorry, I do not know anything about Motion JPEG, but it sounds relevant.)
You are on the right track in recognizing that images change little from frame to frame and that sending a sequence of JPEGs is not the way to go. I believe MJPEG sends a sequence of JPEGs and is a poor choice. I do not use C#, but I believe ffmpeg (a video compression library) has a C# wrapper.
FFmpeg is extremely fast, but it is not really well documented and is pure ANSI C. I think a better approach in your case is, as you already thought, to compress the difference between image x and image x-1; this should be enough to provide a significant bandwidth saving.
You should also include a way to compress the whole frame every once in a while, or to send the whole image whenever the difference from the previous one is above a certain threshold. A sketch of that idea follows.
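A minimal sketch of the diff-plus-threshold idea, assuming both frames are raw byte buffers of equal length (e.g. 32bpp BGRA): XOR-ing leaves zeros wherever nothing changed, which any general-purpose compressor then shrinks very well.

using System;

static byte[] Diff(byte[] previous, byte[] current, out double changedFraction)
{
    byte[] diff = new byte[current.Length];
    long changed = 0;
    for (int i = 0; i < current.Length; i++)
    {
        diff[i] = (byte)(current[i] ^ previous[i]);   // zero where pixels match
        if (diff[i] != 0) changed++;
    }
    changedFraction = (double)changed / current.Length;
    return diff; // compress and send this; send a full frame if changedFraction is high
}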
I am solving the problem of transferring images from a camera in a loop from a client (a robot with a camera) to a server (a PC).
I am trying to come up with ideas for maximizing the transfer speed so I can get the best possible FPS (I want to create a live video stream out of the transferred images). Disregarding the physical limitations of the Wi-Fi stick on the robot, what would you suggest?
So far I have decided:
to use YUV colorspace instead of RGB
to use UDP protocol instead of TCP/IP
Is there anything else I could do to get the maximum fps possible?
This might be quite a bit of work, but if your client can handle the computation in real time, you could use the same method that video encoders use: send a key frame every, say, 5 frames, and in between send only the information that changed, not the whole frame. I don't know the details of how this is done, but try googling P-frames or video compression.
Compress the difference between successive images. Add some checksum. Provide some way for the receiver to request full image data for the case where things get out of sync.
There are probably a host of protocols doing that already, so search for live video stream protocols. A rough sketch of the receiver side of such a scheme is below.
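The packet layout and "resync" message here are invented for illustration, not any real protocol: verify a checksum on each UDP packet and ask the sender for a full key frame whenever a frame is corrupt or there is no baseline to apply a diff to.

using System;
using System.Net;
using System.Net.Sockets;

class FrameReceiver
{
    readonly UdpClient udp;
    byte[] currentFrame;   // last fully reconstructed frame, or null

    public FrameReceiver(int port) { udp = new UdpClient(port); }

    public void ReceiveOne(IPEndPoint sender)
    {
        byte[] packet = udp.Receive(ref sender);
        int frameNumber = BitConverter.ToInt32(packet, 0);   // useful to spot drops
        bool keyFrame = packet[4] == 1;
        uint checksum = BitConverter.ToUInt32(packet, 5);

        byte[] payload = new byte[packet.Length - 9];
        Buffer.BlockCopy(packet, 9, payload, 0, payload.Length);

        // Corrupt payload, or a diff with nothing to apply it to: resync.
        if (checksum != Checksum(payload) || (!keyFrame && currentFrame == null))
        {
            RequestKeyFrame(sender);
            return;
        }

        if (keyFrame)
            currentFrame = payload;
        else
            for (int i = 0; i < payload.Length; i++)
                currentFrame[i] ^= payload[i];   // apply XOR diff to the last frame
    }

    void RequestKeyFrame(IPEndPoint sender)
    {
        byte[] request = { 0xFF };               // hypothetical "resync" message
        udp.Send(request, request.Length, sender);
    }

    static uint Checksum(byte[] data)
    {
        uint sum = 0;
        foreach (byte b in data) sum += b;
        return sum;
    }
}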
Cheers & hth.,