Unity: How can I get realtime image capturing at ~60fps? - c#

I am writing an application in Unity which will be required to capture an image from a camera every frame (at ~60fps), and send the resultant data to another service running locally.
The issue is, I am aware that capturing the rendered data from the camera can cause massive frame rate drops (as explained in this article) when using the GetPixels() method. The article explains that "GetPixels() blocks for ReadPixels() to complete" and "ReadPixels() blocks while flushing the GPU", which is why the GPU and CPU have to sync up, resulting in lag.
I have produced a sample project with a script attached which simply writes each frame to a PNG file, to replicate the functionality of the program I wish to create. I have done my best to implement what is described in the article, namely letting the GPU render a frame and then waiting a few frames before calling GetPixels(), so as not to force the GPU and CPU to sync up. However, I really haven't made any progress with it: the project still runs at about 10-15fps.
How can I achieve a realtime capture of 60 frames per second in Unity?
using System;
using System.Collections;
using System.IO;
using UnityEngine;

namespace Assets
{
    public class MyClass : MonoBehaviour
    {
        private const float reportInterval = 0.5f;
        private int screenshotCount = 0;
        private const float maxElapsedSecond = 20;
        private string screenshotsDirectory = "UnityHeadlessRenderingScreenshots";
        public Camera camOV;
        public RenderTexture currentRT;
        private int frameCount = 0;
        private Texture2D resultantImage;

        public void Start()
        {
            camOV.forceIntoRenderTexture = true;
            if (Directory.Exists(screenshotsDirectory))
            {
                Directory.Delete(screenshotsDirectory, true);
            }
            if (!Application.isEditor)
            {
                Directory.CreateDirectory(screenshotsDirectory);
                camOV.targetTexture = currentRT;
            }
        }

        // Update is called once per frame
        public void Update()
        {
            // Taking screenshots
            frameCount += 1;
            if (frameCount == 1)
            {
                TakeScreenShot();
            }
            else if (frameCount == 3)
            {
                ReadPixelsOut("SS_" + screenshotCount + ".png");
            }
            if (frameCount >= 3)
            {
                frameCount = 0;
            }
        }

        public void TakeScreenShot()
        {
            screenshotCount += 1;
            RenderTexture.active = camOV.targetTexture;
            camOV.Render();
            resultantImage = new Texture2D(camOV.targetTexture.width, camOV.targetTexture.height, TextureFormat.RGB24, false);
            resultantImage.ReadPixels(new Rect(0, 0, camOV.targetTexture.width, camOV.targetTexture.height), 0, 0);
            resultantImage.Apply();
        }

        private void ReadPixelsOut(string filename)
        {
            if (resultantImage != null)
            {
                resultantImage.GetPixels();
                RenderTexture.active = currentRT;
                byte[] bytes = resultantImage.EncodeToPNG();
                // save on disk
                var path = screenshotsDirectory + "/" + filename;
                File.WriteAllBytes(path, bytes);
                Destroy(resultantImage);
            }
        }
    }
}
The article implies that it is possible, but I haven't managed to get it to work.
Many thanks in advance for your help.

I am not sure if the OP still needs an answer, but in case someone runs into the same problem in the future, let me share what I found.
https://github.com/unity3d-jp/FrameCapturer
This is a plugin designed for rendering animation videos in the Unity editor, but it can also work in standalone builds. In my case, I took part of it and made my app stream Motion JPEG. I did that at 30fps; I never tried 60fps.
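Not part of the original answer, but worth noting: Unity 2018.1 and later expose AsyncGPUReadback (in UnityEngine.Rendering), which requests the frame data without the ReadPixels/GetPixels stall described in the question. A rough sketch of how it could be wired up (camOV and currentRT reuse the names from the question; SendToService is a hypothetical stand-in for the "send to another service" step):
using UnityEngine;
using UnityEngine.Rendering;

// Sketch: request a non-blocking readback of the camera's RenderTexture every
// frame. The callback fires a few frames later, once the GPU has the data
// ready, so the main thread never stalls waiting for a GPU flush.
public class AsyncFrameCapture : MonoBehaviour
{
    public Camera camOV;            // camera rendering into a RenderTexture
    public RenderTexture currentRT; // assigned as camOV.targetTexture

    void Start()
    {
        camOV.targetTexture = currentRT;
        if (!SystemInfo.supportsAsyncGPUReadback)
            Debug.LogWarning("Async GPU readback is not supported on this platform.");
    }

    void Update()
    {
        // Queue a readback; do not wait for it here.
        AsyncGPUReadback.Request(currentRT, 0, TextureFormat.RGB24, OnReadback);
    }

    void OnReadback(AsyncGPUReadbackRequest request)
    {
        if (request.hasError)
        {
            Debug.LogWarning("GPU readback failed.");
            return;
        }

        byte[] raw = request.GetData<byte>().ToArray();
        // Hand the raw RGB24 bytes to the local service here (e.g. over a socket).
        SendToService(raw, currentRT.width, currentRT.height);
    }

    void SendToService(byte[] rgbData, int width, int height)
    {
        // Placeholder for the questioner's "send to another service" step.
    }
}
Encoding every frame to PNG on the main thread would reintroduce a large per-frame cost, so sending the raw bytes (or encoding on a worker thread) is what makes 60fps plausible.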

Related

Issue accessing HoloLens 2 audio data using Microsoft WindowsMicrophoneStream

I am trying to access the raw (float[]) values of the HoloLens 2's embedded microphone in real time. I do not need to record or play back this data, purely to sample whether the user is speaking at any given slice in time, as recorded by the HL2. I am using the MicrophoneAmplitudeDemo.cs demo here nearly verbatim. I attach this script to a Unity GameObject, and have modified the script only to print the average amplitude every update, purely as a way to debug the output. When the script runs, the returned float values are always 0. I have already double-checked the permissions for the microphone in the manifest, and the initial popup permission windows are answered "yes". The code, modified from the original MS sample only to print the average amplitude, is below.
To try to fix the problem, I have already disabled all other functionality in this program (eye tracking, onboard ML inference, etc.) to ensure that none of it was the cause. I have also tried another MS sample (MicStreamDemo) with the exact same result. The debugging window throws no errors, but merely prints zeros when I print the current values of the mic stream.
using System.Collections;
using System.Collections.Generic;
using UnityEngine;

[RequireComponent(typeof(AudioSource))]
public class AudioCaptureUtility : MonoBehaviour
{
    [SerializeField]
    [Tooltip("Gain to apply to the microphone input.")]
    [Range(0, 10)]
    private float inputGain = 1.0f;

    [SerializeField]
    [Tooltip("Stream Type")]
    public WindowsMicrophoneStreamType streamType = WindowsMicrophoneStreamType.HighQualityVoice;

    /// <summary>
    /// Class providing microphone stream management support on Microsoft Windows based devices.
    /// </summary>
    private WindowsMicrophoneStream micStream = null;

    /// <summary>
    /// The average amplitude of the sound captured during the most recent microphone update.
    /// </summary>
    private float averageAmplitude = 0.0f;

    private void Awake()
    {
        // We do not wish to play the ambient room sound from the audio source.
        //gameObject.GetComponent<AudioSource>().volume = 0.0f;
        micStream = new WindowsMicrophoneStream();
        if (micStream == null)
        {
            Debug.Log("Failed to create the Windows Microphone Stream object");
        }
        micStream.Gain = inputGain;

        // Initialize the microphone stream.
        WindowsMicrophoneStreamErrorCode result = micStream.Initialize(streamType);
        if (result != WindowsMicrophoneStreamErrorCode.Success)
        {
            Debug.Log($"Failed to initialize the microphone stream. {result}");
            return;
        }

        // Start the microphone stream.
        // Do not keep the data and do not preview.
        result = micStream.StartStream(false, false);
        if (result != WindowsMicrophoneStreamErrorCode.Success)
        {
            Debug.Log($"Failed to start the microphone stream. {result}");
        }
    }

    private void OnDestroy()
    {
        if (micStream == null) { return; }

        // Stop the microphone stream.
        WindowsMicrophoneStreamErrorCode result = micStream.StopStream();
        if (result != WindowsMicrophoneStreamErrorCode.Success)
        {
            Debug.Log($"Failed to stop the microphone stream. {result}");
        }

        // Uninitialize the microphone stream.
        micStream.Uninitialize();
        micStream = null;
    }

    // Start is called before the first frame update
    void Start()
    {
    }

    // Update is called once per frame
    private void Update()
    {
        if (micStream == null) { return; }

        // Update the gain, if changed.
        if (micStream.Gain != inputGain)
        {
            micStream.Gain = inputGain;
        }

        float[] tempBuffer = new float[5];
        OnAudioFilterRead(tempBuffer, 2);

        if (averageAmplitude == 0.0f)
        {
            Debug.Log("Average Amp is Zero");
            //Debug.Log(averageAmplitude.ToString("F9"));
        }
    }

    private void OnAudioFilterRead(float[] buffer, int numChannels)
    {
        if (micStream == null) { return; }

        // Read the microphone stream data.
        WindowsMicrophoneStreamErrorCode result = micStream.ReadAudioFrame(buffer, numChannels);
        if (result != WindowsMicrophoneStreamErrorCode.Success)
        {
            Debug.Log($"Failed to read the microphone stream data. {result}");
        }

        float sumOfValues = 0;

        // Calculate this frame's average amplitude.
        for (int i = 0; i < buffer.Length; i++)
        {
            if (float.IsNaN(buffer[i]))
            {
                buffer[i] = 0;
            }

            buffer[i] = Mathf.Clamp(buffer[i], -1.0f, 1.0f);
            sumOfValues += Mathf.Clamp01(Mathf.Abs(buffer[i]));
        }

        averageAmplitude = sumOfValues / buffer.Length;
    }
}
EDIT: The pictures below are screenshots of the errors. I was able to get some raw float data printed, but the data stream ends during initialization each time. I simply print the current value of averageAmplitude each Update(). The InitializeFrameReader messages are from a Windows MediaCapture instance. To ensure that this wasn't the culprit, I removed that functionality, and the issue remains. The float values cease and never return; I have waited as long as 5 minutes to make sure they never come back.
I ran into an issue after a few tests, and I am not sure if it is the same issue you have. When initializing micStream, it sometimes returns 'AlreadyRunning', and OnAudioFilterRead() then returns 'NotEnoughData'. I made some modifications to the micStream initialization to work around this; you can refer to the snippet below.
WindowsMicrophoneStreamErrorCode result = micStream.Initialize(WindowsMicrophoneStreamType.HighQualityVoice);
if (result != WindowsMicrophoneStreamErrorCode.Success && result != WindowsMicrophoneStreamErrorCode.AlreadyRunning)
{
    Debug.Log($"Failed to initialize the microphone stream. {result}");
    return;
}
Also, you can get some log information on the Windows Device Portal to troubleshoot the issue, or you can try remote debugging in Unity.
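Not from the original answer, but putting the pieces together: a sketch where Unity's audio engine drives OnAudioFilterRead through the attached AudioSource (instead of calling it by hand from Update() with a 5-element buffer), and AlreadyRunning is treated as non-fatal. Only the WindowsMicrophoneStream API already shown above is assumed:
using UnityEngine;

// Sketch: let Unity call OnAudioFilterRead on the audio thread with a properly
// sized buffer, and only read the cached amplitude from Update().
[RequireComponent(typeof(AudioSource))]
public class MicAmplitudeSketch : MonoBehaviour
{
    private WindowsMicrophoneStream micStream;
    private volatile float averageAmplitude;

    private void Awake()
    {
        micStream = new WindowsMicrophoneStream();

        var result = micStream.Initialize(WindowsMicrophoneStreamType.HighQualityVoice);
        // Treat AlreadyRunning as non-fatal, per the fix above.
        if (result != WindowsMicrophoneStreamErrorCode.Success &&
            result != WindowsMicrophoneStreamErrorCode.AlreadyRunning)
        {
            Debug.Log($"Failed to initialize the microphone stream. {result}");
            return;
        }

        result = micStream.StartStream(false, false);
        if (result != WindowsMicrophoneStreamErrorCode.Success)
        {
            Debug.Log($"Failed to start the microphone stream. {result}");
        }
    }

    // Unity invokes this on the audio thread; do not call it manually from Update().
    private void OnAudioFilterRead(float[] buffer, int numChannels)
    {
        if (micStream == null) return;

        if (micStream.ReadAudioFrame(buffer, numChannels) != WindowsMicrophoneStreamErrorCode.Success)
            return;

        float sum = 0f;
        for (int i = 0; i < buffer.Length; i++)
            sum += Mathf.Abs(buffer[i]);
        averageAmplitude = sum / buffer.Length;
    }

    private void Update()
    {
        // Read the value cached by the audio thread.
        Debug.Log(averageAmplitude.ToString("F9"));
    }

    private void OnDestroy()
    {
        if (micStream == null) return;
        micStream.StopStream();
        micStream.Uninitialize();
        micStream = null;
    }
}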

Unity 2018 - OnAudioFilterRead() realtime playback from buffer

Let me start by saying that this is my first time opening up a question here. If I'm unclear in my code formatting or have expressed myself poorly, please let me know and I'd be happy to adjust.
I'm doing conceptual design for a tool to be used in Unity (C#) to allow us to "stream" the output of an AudioSource in real time, and then have that same audio be played back from a different GameObject. In essence, we would have a parallel signal stored in a buffer while the original AudioSource functions as one would expect, sending its playing clip into the mixer as normal.
To achieve this I'm trying to use the audio thread and the OnAudioFilterRead() function. To extract the floating point audio data to pipe into OnAudioFilterRead(), I'm using AudioSource.GetOutputData, storing that into an array and then supplying the array to the audio filter. I'm also creating an empty AudioClip, setting its data from the same array and playing that AudioClip on the new GameObject.
Right now I have the audio being played from the new GameObject, but the result is very distorted and unpleasant, which I'm chalking up to one or more of the following:
The audio buffer is being written to/read from in an unsynchronized manner.
Unity's sample rate is causing problems with the clip. I have tried both 44.1 kHz and 48 kHz with subpar results, as well as playing around with import settings on clips. Documentation for Unity 2018.2 is very weak and a lot of older methods are now deprecated.
Aliens.
Ideally, I would be able to stream back the audio without audible artefacts. Some latency I can live with (~30-50 ms), but not poor audio quality.
Below is the code that is being used. This script is attached to the GameObject that is to receive this audio signal from the original emitter, and it has its own AudioSource for playback and positioning.
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using UnityEngine;

[RequireComponent(typeof(AudioSource))]
public class ECO_receiver : MonoBehaviour {

    public AudioSource emitterSend;
    private AudioSource audioSource;
    public AudioClip streamedClip;
    public bool debugSampleData;
    private float[] sampleDataArray;
    public float sampleSize;
    public float sampleFreq;
    private int outputSampleRate;
    public bool _bufferReady;

    void Start () {
        audioSource = GetComponent<AudioSource>();
        sampleSize = emitterSend.clip.samples;
        sampleFreq = emitterSend.clip.frequency;
        sampleDataArray = new float[2048];
        streamedClip = AudioClip.Create("audiostream", (int)sampleSize, 1, (int)sampleFreq, false);
        audioSource.clip = streamedClip;
        audioSource.Play();
        _bufferReady = true;
    }

    private void FixedUpdate()
    {
        if (emitterSend.isPlaying && _bufferReady == true)
        {
            FillAudioBuffer();
        }
        else if (!emitterSend.isPlaying)
        {
            Debug.Log("Emitter is not playing!");
        }

        if (debugSampleData && sampleDataArray != null && Input.GetKeyDown("p"))
        {
            for (int i = 0; i < sampleDataArray.Length; i++)
            {
                Debug.Log(sampleDataArray[i]);
            }
        }
        else if (sampleDataArray == null)
        {
            Debug.Log("No data in array!");
        }
    }

    void FillAudioBuffer()
    {
        emitterSend.GetOutputData(sampleDataArray, 0);
        streamedClip.SetData(sampleDataArray, 0);
        _bufferReady = false;
    }

    void OnAudioFilterRead(float[] data, int channels)
    {
        if (!_bufferReady)
        {
            for (int i = 0; i < data.Length; i++)
            {
                data[i] = (float)sampleDataArray[i];
            }
            _bufferReady = true;
        }
    }
}
Greatly appreciate any wisdom I might be granted! Thank you!
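One way to attack the first suspicion above (unsynchronized writes and reads) is to put a locked ring buffer between GetOutputData and OnAudioFilterRead, so the audio thread never reads a half-written array. A rough sketch, not a drop-in fix; it does not address the mismatch between the polling rate and the audio callback rate:
using UnityEngine;

// Sketch only: a locked ring buffer between polled GetOutputData snapshots and
// the audio-thread callback, so the two never touch a half-written array.
[RequireComponent(typeof(AudioSource))]
public class BufferedReceiver : MonoBehaviour
{
    public AudioSource emitterSend;   // the original emitter, as in the question

    private readonly float[] ring = new float[8192];
    private readonly float[] grab = new float[2048];
    private readonly object gate = new object();
    private int writePos, readPos;

    void Update()
    {
        if (emitterSend == null || !emitterSend.isPlaying) return;

        emitterSend.GetOutputData(grab, 0);   // snapshot of the emitter's output
        lock (gate)
        {
            for (int i = 0; i < grab.Length; i++)
            {
                ring[writePos] = grab[i];
                writePos = (writePos + 1) % ring.Length;
            }
        }
    }

    // Called by Unity on the audio thread for this GameObject's AudioSource.
    void OnAudioFilterRead(float[] data, int channels)
    {
        lock (gate)
        {
            for (int i = 0; i < data.Length; i++)
            {
                data[i] = ring[readPos];
                readPos = (readPos + 1) % ring.Length;
            }
        }
    }
}
A lock is the simplest correct choice for a sketch; on the audio thread a lock-free queue would be preferable in production.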

Converting Texture2D into a video

I've done a lot of research, but I can't find a suitable solution that works with Unity3D/C#. I'm using a FOVE HMD and would like to record a video from the integrated camera. So far I manage to take a snapshot of the camera every update, but I can't find a way to merge these snapshots into a video. Does someone know a way of converting them, or can someone point me in a direction in which I could continue my research?
using System.Collections.Generic;
using UnityEngine;

public class FoveCamera : SingletonBase<FoveCamera>
{
    private bool camAvailable;
    private WebCamTexture foveCamera;
    private List<Texture2D> snapshots;

    void Start ()
    {
        // The snapshot list has to exist before Update() starts adding to it.
        snapshots = new List<Texture2D>();

        //-------------just checking if webcam is available
        WebCamDevice[] devices = WebCamTexture.devices;
        if (devices.Length == 0)
        {
            Debug.LogError("FoveCamera could not be found.");
            camAvailable = false;
            return;
        }
        foreach (WebCamDevice device in devices)
        {
            if (device.name.Equals("FOVE Eyes"))
                foveCamera = new WebCamTexture(device.name); // screen.width and screen.height
        }
        if (foveCamera == null)
        {
            Debug.LogError("FoveCamera could not be found.");
            return;
        }
        //-------------camera found, start with the video
        foveCamera.Play();
        camAvailable = true;
    }

    void Update () {
        if (!camAvailable)
        {
            return;
        }
        // loading snap from camera
        Texture2D snap = new Texture2D(foveCamera.width, foveCamera.height);
        snap.SetPixels(foveCamera.GetPixels());
        snapshots.Add(snap);
    }
}
The code works so far. The first part of the Start method is just for finding and enabling the camera. In the Update method I take a snapshot of the video every update.
After I "stop" the Update method, I would like to convert the gathered Texture2D objects into a video.
Thanks in advance
Create a MediaEncoder:
using UnityEditor;       // VideoBitrateMode
using UnityEditor.Media; // MediaEncoder

var vidAttr = new VideoTrackAttributes
{
    bitRateMode = VideoBitrateMode.Medium,
    frameRate = new MediaRational(25),
    width = 320,
    height = 240,
    includeAlpha = false
};
var audAttr = new AudioTrackAttributes
{
    sampleRate = new MediaRational(48000),
    channelCount = 2
};
var enc = new MediaEncoder("sample.mp4", vidAttr, audAttr);
Convert each snapshot to a Texture2D, then call AddFrame for each snapshot to add it to the MediaEncoder:
enc.AddFrame(tex);
Once done, call Dispose to close the file:
enc.Dispose();
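Tying those steps together, a rough end-to-end sketch (editor-only, since MediaEncoder lives in UnityEditor.Media; snapshots is assumed to be the List<Texture2D> collected in the question, and the audio track from above is omitted for brevity):
using System.Collections.Generic;
using UnityEditor;        // VideoBitrateMode
using UnityEditor.Media;  // MediaEncoder, VideoTrackAttributes
using UnityEngine;

public static class SnapshotVideoWriter
{
    // Writes the captured snapshots to an .mp4 using the encoder set up above.
    public static void Write(List<Texture2D> snapshots, string path)
    {
        var vidAttr = new VideoTrackAttributes
        {
            bitRateMode = VideoBitrateMode.Medium,
            frameRate = new MediaRational(25),
            width = (uint)snapshots[0].width,
            height = (uint)snapshots[0].height,
            includeAlpha = false
        };

        using (var enc = new MediaEncoder(path, vidAttr))
        {
            foreach (Texture2D tex in snapshots)
                enc.AddFrame(tex);   // one snapshot per video frame
        }                            // Dispose() closes the file
    }
}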
I see two methods here: one is fast to implement, dirty and not for all platforms; the second one is harder but prettier. Both rely on FFmpeg.
1) Save every frame into an image file (snap.EncodeToPNG()) and then call FFmpeg to create a video from the images (FFmpeg create video from images) - slow due to the many disk operations.
2) Use FFmpeg via the wrapper implemented in AForge and supply its VideoFileWriter class with the images you have.
Image sequence to video stream?
The problem here is that it uses System.Drawing.Bitmap, so in order to convert a Texture2D to a Bitmap you can use: How to create bitmap from byte array?
So you end up with something like:
Bitmap bmp;
Texture2D snap;
using (var ms = new MemoryStream(snap.EncodeToPNG()))
{
    bmp = new Bitmap(ms);
}
vFWriter.WriteVideoFrame(bmp);
Neither method is the fastest though, so if performance is an issue you might want to operate on lower-level data like DirectX or OpenGL textures.
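For option 1, a minimal sketch of the Unity side; the ffmpeg command in the comment is the usual image-sequence invocation, with the frame rate and codec as placeholders you would adjust:
using System.Collections.Generic;
using System.IO;
using UnityEngine;

public static class PngSequenceDumper
{
    // Writes each snapshot as frame_0000.png, frame_0001.png, ... into outputDir.
    public static void Dump(List<Texture2D> snapshots, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        for (int i = 0; i < snapshots.Count; i++)
        {
            byte[] png = snapshots[i].EncodeToPNG();
            File.WriteAllBytes(Path.Combine(outputDir, $"frame_{i:D4}.png"), png);
        }
        // Then, outside Unity (or via System.Diagnostics.Process), something like:
        //   ffmpeg -framerate 25 -i frame_%04d.png -c:v libx264 -pix_fmt yuv420p out.mp4
    }
}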

Texture load using string to int replace issue

According to my debug log, my int counts up with no problem; however, the int-to-string conversion keeps being applied to the original value, not the updated counter, at runtime. (There are some unused privates here for testing, and the frame values are all good.)
A screenshot of my debug log: http://c2n.me/39GlfuI - as you can see, the counter increases but 'frame' doesn't.
Hopefully this is self-explanatory.
using UnityEngine;
using System.Collections;

public class imagecycle : MonoBehaviour
{
    public string Startingframe;
    private string Nextframe;
    private int framecomp = 0;
    private int frameint;
    private int framestep = 1;
    private int maxframe = 119;
    private string framestring;

    // Use this for initialization
    void Start ()
    {
        Nextframe = ("frame_000");
        frameint = 20; // currently adding one to this and resetting on update
    }

    // Update is called once per frame
    void Update ()
    {
        frameint += framestep;

        // Converts frameint to its string representation - updating framestring
        framestring = frameint.ToString();
        Debug.Log (frameint);

        // replaces loaded texture resource with frame string:
        Nextframe = Nextframe.Replace ("000", framestring);

        // Loads texture into Currentframe:
        Texture Currentframe = Resources.Load (Nextframe) as Texture;

        // Applies texture:
        renderer.material.mainTexture = Currentframe;
        Debug.Log (Currentframe);

        if (frameint > 119)
        {
            frameint = 1;
        }
    }

    void LateUpdate()
    {
    }
}
That is because at first your Nextframe is "frame_000", so the Replace method replaces "000" with "21", as you can see. But after that your Nextframe variable is "frame_21", so there is no "000" left in the string, the Replace method won't do anything, and Nextframe stays at "frame_21".
Nextframe = Nextframe.Replace ("000", framestring); won't do anything after the first replace, because the string no longer contains "000".
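A minimal fix (a sketch, not from the original answer) is to rebuild the resource name from the counter every frame instead of mutating Nextframe. This assumes the textures are named with three-digit padding (frame_001 ... frame_119); drop the "D3" if they are not:
void Update ()
{
    frameint += framestep;
    if (frameint > maxframe)
    {
        frameint = 1;
    }

    // Build the name fresh each frame; "D3" pads to three digits (e.g. frame_021).
    string frameName = "frame_" + frameint.ToString("D3");
    Texture currentFrame = Resources.Load(frameName) as Texture;

    // The legacy 'renderer' shortcut matches the Unity version used in the question;
    // newer versions need GetComponent<Renderer>() instead.
    renderer.material.mainTexture = currentFrame;
}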
Ah, many thanks; so it was my understanding of the Replace function that was incorrect - I assumed it would reset to frame_000 on each update. Many thanks, guys.
And yeah, I'll try making it more efficient.
Also, sorry I can't vote up yet; not enough 'rep'.

Real-time object tracking in Java (some Java API) or C# (EmguCV, DShowNET, AForge.NET)

I am doing a project called "user-initiated real-time object tracking system". Here is what I want to happen in the project:
1) Take a continuous stream from a web camera.
2) Using the mouse, a user can draw a square around an object of interest.
3) Then from there onwards, the square moves along with the object of interest, thereby tracking each and every place the object moves, hence the term object tracking.
Current Progress
I have used DShowNET (a .NET wrapper for DirectShow) to take input from the web camera, and I am in the process of splitting the video into frames. I have 4 techniques in mind for the project:
Technique 1
There is a saved video
I load it.
When the video is running, I pause it (using a pause button) at a particular scene and draw a square around an object.
And when I press the play button, the square will move along with the object with no (or up to 5 seconds of) processing time, [OR] I will give the application some processing time (e.g. 3 minutes) and then it will play from that point onwards with the tracking taking place.
Technique 2
There is a saved video
I load it.
When the video is running I don't pause it, but quickly draw a square around an object (while the object is still at some point).
Then the object will be tracked after that with no processing time, [OR] with some processing time (a 10-second delay), making the file play for a slightly longer time.
Technique 3
I take an input from a web cam for 1 min.
Save that video to a file
And perform Technique 1 or Technique 2
Technique 4 - (apparently this seems a lot harder)
Take input from a web cam continuously
Draw a square around the object without any pausing, when the object shows no movement (e.g. when a person is sitting on a chair).
And then show the tracking by moving the square along with the object with no processing time, [OR] with a slight processing time of 2 seconds such that the delay is not significantly apparent.
Objects to track:
Basically I can track anything, since I use the mouse to draw
I am planning to use the whole body (but if this is troublesome.. next option)
I would try to track the face of an individual (obviously by drawing the area with a mouse.)
Time to code: 1 and 1/2 months
Progress: Still getting errors with the splitting. (Someone suggested starting by splitting a saved video first, and I am in the process of trying that now.)
MY QUESTIONS
1) Which technique (out of the four) could I possibly implement in a 1.5-month time frame?
2) To code this, is Java plus some Java framework good, or C#.NET with EmguCV/AForge.NET/DShowNET? (By the way, my knowledge of Java is good and not so good in C#.NET.)
Thanks in advance
Techniques 1, 2 and 3 you could implement in Java using the Java Media Framework and ImageJ libraries. Technique 4 you would be better off implementing in C++ or another non-interpreted language, given the time constraints.
This example basically implements what you mentioned as Technique 4. The user draws a rect around a pattern or object to be tracked. In this case the tracked element is used to control the paddle in the Pong game, so the user can use objects to play the game in front of the camera.
I think it solves most parts of your problem.
Source code:
package video.trackingPong;
import java.awt.BorderLayout;
import java.awt.Color;
import java.awt.Container;
import java.awt.event.MouseEvent;
import java.awt.event.MouseListener;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JSlider;
import javax.swing.event.ChangeEvent;
import javax.swing.event.ChangeListener;
import marvin.gui.MarvinImagePanel;
import marvin.image.MarvinImage;
import marvin.image.MarvinImageMask;
import marvin.io.MarvinImageIO;
import marvin.plugin.MarvinImagePlugin;
import marvin.util.MarvinAttributes;
import marvin.util.MarvinPluginLoader;
import marvin.video.MarvinJavaCVAdapter;
import marvin.video.MarvinVideoInterface;
import marvin.video.MarvinVideoInterfaceException;
public class TrackingPong extends JFrame implements Runnable{
private final static int BALL_INITIAL_PX=100;
private final static int BALL_INITIAL_PY=100;
private final static int BALL_INITIAL_SPEED=3;
private MarvinVideoInterface videoInterface;
private MarvinImagePanel videoPanel;
private Thread thread;
private MarvinImage imageIn,
imageOut;
private JPanel panelSlider;
private JSlider sliderSensibility;
private JLabel labelSlider;
private int regionPx,
regionPy,
regionWidth,
regionHeight;
private boolean regionSelected=false;
private int[] arrInitialRegion;
private int sensibility=30;
// Pong Game Attributes
private double ballPx=BALL_INITIAL_PX,
ballPy=BALL_INITIAL_PY;
private int ballSide=15;
double ballIncX=5;
private double ballIncY=5;
private int imageWidth,
imageHeight;
private Paddle paddlePlayer,
paddleComputer;
private int playerPoints=0,
computerPoints=0;
private MarvinImagePlugin findColorPattern,
flip,
text;
private MarvinImage imageBall,
imagePaddlePlayer,
imagePaddleComputer;
private MarvinAttributes attributesOut;
public TrackingPong(){
videoPanel = new MarvinImagePanel();
try{
// 1. Connect to the camera device.
videoInterface = new MarvinJavaCVAdapter();
videoInterface.connect(0);
imageWidth = videoInterface.getImageWidth();
imageHeight = videoInterface.getImageHeight();
imageOut = new MarvinImage(imageWidth, imageHeight);
// 2. Load Graphical Interface.
loadGUI();
// 3. Load and set up Marvin plug-ins.
findColorPattern = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.pattern.findColorPattern");
flip = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.transform.flip");
text = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.render.text");
text.setAttribute("fontFile", MarvinImageIO.loadImage("./res/font.png"));
text.setAttribute("color", 0xFFFFFFFF);
// 3. Load game images
imageBall = MarvinImageIO.loadImage("./res/ball.png");
imagePaddlePlayer = MarvinImageIO.loadImage("./res/paddleA.png");
imagePaddleComputer = MarvinImageIO.loadImage("./res/paddleB.png");
attributesOut = new MarvinAttributes(null);
// Set up plater and computer paddle properties.
paddlePlayer = new Paddle();
paddlePlayer.px=100;
paddlePlayer.py=420;
paddlePlayer.width=100;
paddlePlayer.height=30;
paddleComputer = new Paddle();
paddleComputer.px=100;
paddleComputer.py=30;
paddleComputer.width=100;
paddleComputer.height=30;
thread = new Thread(this);
thread.start();
}
catch(MarvinVideoInterfaceException e){
e.printStackTrace();
}
}
private void loadGUI(){
setTitle("Video Sample - Tracking Pong");
videoPanel.addMouseListener(new MouseHandler());
sliderSensibility = new JSlider(JSlider.HORIZONTAL, 0, 60, 30);
sliderSensibility.setMinorTickSpacing(2);
sliderSensibility.setPaintTicks(true);
sliderSensibility.addChangeListener(new SliderHandler());
labelSlider = new JLabel("Sensibility");
panelSlider = new JPanel();
panelSlider.add(labelSlider);
panelSlider.add(sliderSensibility);
Container container = getContentPane();
container.setLayout(new BorderLayout());
container.add(videoPanel, BorderLayout.NORTH);
container.add(panelSlider, BorderLayout.SOUTH);
setSize(videoInterface.getImageWidth()+20,videoInterface.getImageHeight()+100);
setVisible(true);
}
public void run(){
long time = System.currentTimeMillis();
int ticks=0;
// The game loop.
try{
while(true){
ticks++;
if(System.currentTimeMillis() - time > 1000){
System.out.println("FPS: "+ticks+" ");
ticks=0;
time = System.currentTimeMillis();
}
// 1. Get the current video frame.
imageIn = videoInterface.getFrame();
MarvinImage.copyColorArray(imageIn, imageOut);
// 2. Flip the frame horizontally so the player will see him on the screen like looking at the mirror.
flip.process(imageOut, imageOut);
if(regionSelected){
// 3. Find the player paddle position.
findColorPattern.setAttribute("differenceColorRange", sensibility);
findColorPattern.process(imageOut, imageOut, attributesOut, MarvinImageMask.NULL_MASK, false);
regionPx = (Integer)attributesOut.get("regionPx");
regionPy = (Integer)attributesOut.get("regionPy");
regionWidth = (Integer)attributesOut.get("regionWidth");
regionHeight = (Integer)attributesOut.get("regionHeight");
// 4. Invoke the game logic
pongGame();
// 5. Draw the detected region
imageOut.drawRect(regionPx, regionPy, regionWidth, regionHeight, Color.red);
// 6. Draw the player and computer points.
text.setAttribute("x", 105);
text.setAttribute("y", 3);
text.setAttribute("text", "PLAYER:"+playerPoints);
text.process(imageOut, imageOut);
text.setAttribute("x", 105);
text.setAttribute("y", 460);
text.setAttribute("text", "COMPUTER:"+computerPoints);
text.process(imageOut, imageOut);
}
videoPanel.setImage(imageOut);
}
}
catch(MarvinVideoInterfaceException e){
e.printStackTrace();
}
}
private void pongGame(){
// 1. Move the ball
ballIncX*=1.001;
ballIncY*=1.001;
ballPx+=ballIncX;
ballPy+=ballIncY;
// 2. Set the player paddle position to the the coordinates of the detected region.
paddlePlayer.px = regionPx+((regionWidth-paddlePlayer.width)/2);
// 3. Invoke simple computer AI
computerAI();
// 4. Check object positions and collisions.
checkPaddlePosition(paddlePlayer);
checkPaddlePosition(paddleComputer);
collisionScreen();
collisionTap();
// 5. Draw the game elements.
imageOut.fillRect(horizontalMargin, 0, 5, imageHeight, Color.black);
imageOut.fillRect(imageWidth-horizontalMargin, 0, 5, imageHeight, Color.black);
combineImage(imagePaddlePlayer, paddlePlayer.px, paddlePlayer.py);
combineImage(imagePaddleComputer, paddleComputer.px, paddleComputer.py);
combineImage(imageBall,(int)ballPx, (int)ballPy);
}
private void checkPaddlePosition(Paddle a_paddle){
if(a_paddle.px < horizontalMargin){
a_paddle.px = horizontalMargin;
}
if(a_paddle.px+a_paddle.width > imageWidth-horizontalMargin){
a_paddle.px = imageWidth-horizontalMargin-a_paddle.width;
}
}
private void computerAI(){
if(ballPx < paddleComputer.px+(paddleComputer.width/2)-10){
paddleComputer.px-=4;
}
if(ballPx > paddleComputer.px+(paddleComputer.width/2)+10){
paddleComputer.px+=4;
}
}
private int horizontalMargin = 100;
private void collisionScreen(){
if(ballPx < horizontalMargin){
ballPx = horizontalMargin;
ballIncX*=-1;
}
if(ballPx+ballSide >= imageWidth-horizontalMargin){
ballPx=(imageWidth-horizontalMargin)-ballSide;
ballIncX*=-1;
}
if(ballPy < 0){
playerPoints++;
ballPx = BALL_INITIAL_PX;
ballPy = BALL_INITIAL_PY;
ballIncY=BALL_INITIAL_SPEED;
ballIncX=BALL_INITIAL_SPEED;
} else if(ballPy+ballSide >= imageHeight){
computerPoints++;
ballPx = BALL_INITIAL_PX;
ballPy = BALL_INITIAL_PY;
ballIncY=BALL_INITIAL_SPEED;
ballIncX=BALL_INITIAL_SPEED;
}
}
private void collisionTap(){
if(ballCollisionTap(paddlePlayer)){
ballIncY*=-1;
ballPy = paddlePlayer.py-ballSide;
}
if(ballCollisionTap(paddleComputer)){
ballIncY*=-1;
ballPy = paddleComputer.py+paddleComputer.height;
}
}
private boolean ballCollisionTap(Paddle a_tap){
if
(
(
ballPx >= a_tap.px && ballPx <= a_tap.px+a_tap.width ||
ballPx <= a_tap.px && ballPx+ballSide >= a_tap.px
)
&&
(
ballPy >= a_tap.py && ballPy <= a_tap.py+a_tap.height ||
ballPy <= a_tap.py && ballPy+ballSide >= a_tap.py
)
)
{
return true;
}
return false;
}
private void combineImage(MarvinImage img, int x, int y){
int rgb;
int width = img.getWidth();
int height = img.getHeight();
for(int iy=0; iy<height; iy++){
for(int ix=0; ix<width; ix++){
if
(
ix+x > 0 && ix+x < imageWidth &&
iy+y > 0 && iy+y < imageHeight
)
{
rgb=img.getIntColor(ix, iy);
if(rgb != 0xFFFFFFFF){
imageOut.setIntColor(ix+x, iy+y, rgb);
}
}
}
}
}
public static void main(String args[]){
TrackingPong trackingPong = new TrackingPong();
trackingPong.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
}
private class SliderHandler implements ChangeListener{
public void stateChanged(ChangeEvent a_event){
sensibility = (60-sliderSensibility.getValue());
}
}
private class MouseHandler implements MouseListener{
public void mouseEntered(MouseEvent a_event){}
public void mouseExited(MouseEvent a_event){}
public void mousePressed(MouseEvent a_event){}
public void mouseClicked(MouseEvent a_event){}
public void mouseReleased(MouseEvent event){
if(!regionSelected){
if(arrInitialRegion == null){
arrInitialRegion = new int[]{event.getX(), event.getY(),0,0};
}
else{
arrInitialRegion[2] = event.getX()-arrInitialRegion[0];
arrInitialRegion[3] = event.getY()-arrInitialRegion[1];
findColorPattern.setAttribute("regionPx", arrInitialRegion[0]);
findColorPattern.setAttribute("regionPy", arrInitialRegion[1]);
findColorPattern.setAttribute("regionWidth", arrInitialRegion[2]);
findColorPattern.setAttribute("regionHeight", arrInitialRegion[3]);
regionSelected = true;
}
}
}
}
private class Paddle{
public int px,py,width,height;
}
}
This article fully explains an algorithm very similar to what you want, and the accompanying source code is here. You can see it in action in this video. The part you would need to add would be, when the user draws a box, to identify which objects (already found by the algorithm) the box is around, and then simply follow the object with that int ID throughout the frames (the algorithm correlates the objects frame-by-frame to know it is the same object throughout the video).
Disclaimer: I'm the author, but I do think this is very useful, and have successfully used the algorithm a lot myself.
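The "attach the user's box to one of the tracked objects" step described above could look roughly like this in C#. ObjectBox is a hypothetical stand-in for whatever the linked algorithm outputs per frame (a stable int ID plus a bounding rectangle):
using System.Collections.Generic;
using System.Drawing; // Rectangle

// Hypothetical per-frame detection: an ID that is stable across frames,
// plus the object's bounding box in that frame.
public class ObjectBox
{
    public int Id;
    public Rectangle Bounds;
}

public static class BoxSelector
{
    // Pick the detected object whose bounding box overlaps the user's rectangle the most.
    public static int SelectId(Rectangle userBox, IEnumerable<ObjectBox> detections)
    {
        int bestId = -1;
        long bestArea = 0;
        foreach (var d in detections)
        {
            Rectangle overlap = Rectangle.Intersect(userBox, d.Bounds);
            long area = (long)overlap.Width * overlap.Height;
            if (area > bestArea)
            {
                bestArea = area;
                bestId = d.Id;
            }
        }
        return bestId; // -1 means the box did not land on any detected object
    }

    // In later frames, just look up the same ID to keep following the object.
    public static ObjectBox Follow(int id, IEnumerable<ObjectBox> detections)
    {
        foreach (var d in detections)
            if (d.Id == id) return d;
        return null;
    }
}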
When it comes to commercial computer vision applications, OpenCV and the Point Cloud Library (aka PCL) are your best friends. Articles like the one linked explain how to use tools like OpenCV to accomplish full-stack motion tracking. (The pure Java implementation shows how it works down to the individual pixels.)
