Gesture Recognition Algorithm - Kinect - c#

I'm developing an application for the Kinect for my final year university project, and I have a requirement to develop a number of gesture recognition algorithms. I'd appreciate some advice on this.
My initial algorithm detects the user's hand moving closer to the Kinect within a certain time frame. For now, I'll say this is an arbitrary 500ms.
My idea is as follows:
Record the z-axis position every 100ms and store it in a List.
Each time a new position is recorded, check the z-position of each of the previous 4 positions in the List.
If the z-position has varied by the required distance between any of those, individually or collectively, fire off a gesture-recognised event.
If a gesture is recognised, clear the List and start again.
This is the first time that I have tried anything like this, and I would like some advice on my initial naive implementation.
Thanks.
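The polling scheme described above can be sketched as a small sliding-window check. This is only an illustration, written in Python for brevity (the logic ports directly to a C# List or Queue). The class name `PushDetector`, the 0.15 m travel threshold, and the assumption that z is the hand's distance from the sensor in metres are all placeholders, not values from any Kinect SDK.

```python
from collections import deque


class PushDetector:
    """Detects the hand moving toward the sensor within a sliding window."""

    def __init__(self, min_travel_m=0.15, window_size=5):
        # Both defaults are illustrative assumptions, not tuned values.
        # window_size=5 means the new sample plus the previous 4.
        self.min_travel_m = min_travel_m
        self.samples = deque(maxlen=window_size)  # oldest sample drops out

    def add_sample(self, z_m):
        """Record one z reading; return True if a push is recognised."""
        self.samples.append(z_m)
        # Compare the new reading with the farthest reading still in the
        # window: a smaller z means the hand is now closer to the sensor.
        travel = max(self.samples) - z_m
        if travel >= self.min_travel_m:
            self.samples.clear()  # start again after firing
            return True
        return False
```

Calling `add_sample` once per 100ms tick gives a window of roughly 400ms between the oldest and newest sample. Because a single noisy reading can trigger the check, smoothing the z values before feeding them in is probably worthwhile.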

Are you going to use the official Kinect SDK or open-source drivers (libfreenect or OpenNI)?
If you're using the Kinect SDK you can start by having a look at something like:
Kinect SDK Dynamic Time Warping (DTW) Gesture Recognition
Candescent NUI
(Candescent NUI focuses more on finger detection though)
If you're planning to use open-source drivers, try OpenNI and NITE.
NITE comes with hand tracking and gestures (swipe, circle control, 2D sliders, etc.).
The idea is to at least have hand detection and carry on from there. If you've got that, you could implement something like an adaptation of the Unistroke Gesture Recognizer, or look into other techniques like Motion Templates/MotionHistory, adapting them to the new data you can play with now.
Good luck!

If you're just trying to recognise the user swinging her hand towards you, your approach should work (despite being very susceptible to misfiring due to noisy data). What you're trying to do falls very nicely in the field of pattern recognition. For this, and very similar tasks, people very often use hidden Markov models with great success. You might want to check the Wikipedia article. I'm not a C# person, but as far as I know, Microsoft has very nice statistical inference libraries for C#, and they will definitely include HMM implementations.
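To make the hidden Markov model suggestion a little more concrete, here is a minimal sketch of the forward algorithm, which scores how likely an observation sequence is under a given model. It is written in Python for brevity. Everything specific in it is an assumption for illustration: the three observation symbols (hand moved toward the sensor, stayed still, moved away), the two toy models `PUSH_MODEL` and `IDLE_MODEL`, and all of their probabilities are made up. In practice the parameters would be learned from recorded gestures.

```python
def forward_likelihood(obs, start_p, trans_p, emit_p):
    """Return P(obs | model) using the HMM forward algorithm."""
    n_states = len(start_p)
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = [start_p[s] * emit_p[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans_p[p][s] for p in range(n_states)) * emit_p[s][o]
            for s in range(n_states)
        ]
    return sum(alpha)


# Observation symbols (assumed): 0 = toward sensor, 1 = still, 2 = away.
TOWARD, STILL, AWAY = 0, 1, 2

# Hypothetical "push" model: state 0 = resting, state 1 = pushing.
PUSH_MODEL = (
    [0.8, 0.2],                             # start probabilities
    [[0.5, 0.5], [0.1, 0.9]],               # transition probabilities
    [[0.1, 0.8, 0.1], [0.8, 0.15, 0.05]],   # emission probabilities
)

# Hypothetical "idle" model: a single state that mostly emits STILL.
IDLE_MODEL = ([1.0], [[1.0]], [[0.15, 0.7, 0.15]])


def looks_like_push(obs):
    """Classify by comparing the likelihood under the two models."""
    return forward_likelihood(obs, *PUSH_MODEL) > forward_likelihood(obs, *IDLE_MODEL)
```

With these toy numbers, `looks_like_push([STILL, TOWARD, TOWARD, TOWARD])` is true and `looks_like_push([STILL, STILL, STILL, STILL])` is false. For long sequences the probabilities underflow, so a real implementation would normally work in log space or rescale `alpha` at each step.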

Related

Gesture recognition with Unity, Oculus Quest and the Oculus Quest Integration package

I'm trying to implement hand gesture recognition for Oculus Quest with Unity and the Unity Oculus integration package.
I've read the "Hand Tracking in Unity" documentation on the Oculus developer website, but they only talk about getting the current pinch of the fingers, which is not what I want:
https://developer.oculus.com/documentation/unity/unity-handtracking/
I thought about getting the flexion of each finger (as a value between 0 and 1, for example), and then training a k-NN model on those 5 features to recognize the nearest gesture. But I've been searching for hours and haven't found anything about getting finger positions; the only thing I found is how to get the pinch.
By looking in the OVRSkeleton.cs file (from the Oculus Integration package), I've been able to get the current Transform for each bone (so the position as a vector and the rotation as a quaternion), but I don't really know how to calculate or estimate finger flexion from that (or anything else useful for gesture recognition).
OVRSkeleton skeleton = GetComponent<OVRSkeleton>();
// Position and rotation of the first index-finger bone
Vector3 position = skeleton.Bones[(int) OVRPlugin.BoneId.Hand_Index1].Transform.position;
Quaternion rotation = skeleton.Bones[(int) OVRPlugin.BoneId.Hand_Index1].Transform.rotation;
The list of bone IDs is in the "Hand Tracking in Unity" documentation page.
In fact, what I want to implement seems to look exactly like this package:
https://assetstore.unity.com/packages/tools/integration/vr-hand-gesture-recognizer-oculus-quest-hand-tracking-168685
Any help, ideas or comments about how to calculate finger flexion, or any other way to implement gesture recognition, would be greatly appreciated!
Thanks
A few things/links I've explored so far:
https://www.reddit.com/r/OculusQuest/comments/elrn7a/unity_hand_tracking_and_different_gestures/
https://forums.oculusvr.com/developer/discussion/89615/detect-custom-hand-gestures
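One possible way to turn bone positions into the flexion features described above is to sum the bend angles between consecutive finger segments and normalise the result to the range 0 to 1. The sketch below is in Python only to show the arithmetic; it is not part of the Oculus Integration package. The function names, the 240 degree "fully curled" constant, and the choice of k are assumptions, and the joint positions are expected to come from the bone Transforms, ordered from knuckle to fingertip.

```python
import math
from collections import Counter


def _angle_between(u, v):
    """Angle in radians between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def finger_flexion(joints, max_bend_rad=math.radians(240)):
    """Return 0.0 for a straight finger, up to 1.0 for a fully curled one.

    joints: 3D positions along one finger, knuckle first, tip last.
    max_bend_rad: assumed total bend of a fully curled finger (a guess).
    """
    segments = [tuple(b - a for a, b in zip(p, q))
                for p, q in zip(joints, joints[1:])]
    bend = sum(_angle_between(s, t) for s, t in zip(segments, segments[1:]))
    return min(1.0, bend / max_bend_rad)


def classify_knn(sample, labelled, k=3):
    """Majority vote among the k labelled feature vectors nearest to sample.

    labelled: list of (features, gesture_name) pairs, 5 flexions per hand.
    """
    nearest = sorted(labelled, key=lambda item: math.dist(sample, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Since only angles between segments are used, the feature does not depend on where the hand is or how it is oriented, which is usually what a pose classifier needs.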

https://github.com/jorgejgnz/HandTrackingGestureRecorder is something I'm trying currently. The docs don't say it has a dependency, but apparently it does, and I don't have it working yet.
You can access the bone rotations at runtime and compare them to a recorded gesture. I think it's closer to "template matching" than machine learning: you measure the error between two poses.
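As a rough sketch of that template-matching idea (not the code of the linked project), each pose can be stored as a list of bone rotations and compared to the live pose by the mean angular difference per bone. The example is in Python for brevity. The function names and the 0.35 radian acceptance threshold are assumptions, and quaternions are assumed to be unit length in (x, y, z, w) order.

```python
import math


def rotation_error(q1, q2):
    """Angle in radians between two unit quaternions (x, y, z, w)."""
    # abs() because q and -q describe the same rotation.
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    return 2.0 * math.acos(min(1.0, dot))


def match_pose(current, templates, max_mean_error_rad=0.35):
    """Return the name of the closest recorded pose, or None if none is close.

    current: list of bone rotations for the live hand.
    templates: dict mapping pose name to a list of recorded bone rotations
               in the same bone order as `current`.
    """
    best_name, best_error = None, float("inf")
    for name, recorded in templates.items():
        error = sum(rotation_error(a, b)
                    for a, b in zip(current, recorded)) / len(recorded)
        if error < best_error:
            best_name, best_error = name, error
    return best_name if best_error <= max_mean_error_rad else None
```

Using rotations relative to the wrist rather than world-space rotations should keep the match independent of how the whole hand is oriented.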

different images from different point of view

I want different images to be displayed from different points of view. For an explanation of the whole concept, please look at the images; they illustrate my idea.
As you can see in the first image, three people are looking at the monitor from different angles. I want the webcam to track their eyes and show a specific predefined image to each user. For example: if the user is at a 45 degree angle, show image1.png.
The computer should show the image that matches the user's viewing perspective.
(the lady is the game character for representation purpose)
Can you please guide me on what steps I can take to accomplish this? Is there any plugin available for Unity that tracks faces?
Also thanks for the compliments on my sketching skills xD

Stack Overflow is not really meant for recommending plugins, since the choice is usually opinion-based, so there is no exact answer.
That being said, one of the most commonly used APIs for computer vision (meaning interpreting images, including face recognition) is OpenCV, so it could be a good place to start.
And fortunately for you, there is a Unity plugin for OpenCV.
It is too broad to give you more details about how it works here. You should try to make it work, and if you have a problem with your code, open a new question with the code portion that you struggle with.
PS: nice sketching skills

Perhaps an easier option would be to use a Kinect
(trying to detect a face or eyes from that far away might be shaky).
With a Kinect you can get skeletons for multiple people, and getting the angle between the target and those Kinect avatars would be easy.
If there is no space to put the Kinect in a good position,
you could consider placing it on the ceiling above (and then use only depth data to detect people in its view).
The only issue is that Microsoft has apparently stopped Kinect support for Windows,
so you would need to find second-hand units. (The Unity Asset Store still has some Kinect plugins and examples available.)
https://www.polygon.com/2018/1/2/16842072/xbox-one-kinect-adapter-out-of-stock-production-ended
Or look for Kinect alternatives that work with Unity, such as RealSense cameras:
https://www.intel.sg/content/www/xa/en/architecture-and-technology/realsense-overview.html
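Whichever sensor reports a viewer's position, the angle calculation itself is short. The sketch below is in Python purely to show the arithmetic. It assumes the sensor sits at the screen and reports each person's sideways offset x and distance z in metres. The angle bands and the file names other than image1.png are hypothetical placeholders.

```python
import math

# Hypothetical angle bands in degrees mapped to image files. Only image1.png
# at roughly 45 degrees comes from the question; the rest are placeholders.
IMAGE_BANDS = [
    (-90.0, -20.0, "image_left.png"),
    (-20.0, 20.0, "image_front.png"),
    (20.0, 90.0, "image1.png"),
]


def viewing_angle_deg(x_m, z_m):
    """Horizontal angle of the viewer relative to the screen normal.

    x_m: sideways offset from the screen centre (positive to one side).
    z_m: distance straight out from the screen.
    """
    return math.degrees(math.atan2(x_m, z_m))


def image_for_angle(angle_deg):
    """Pick the image whose band contains the angle, or None if no band does."""
    for low, high, filename in IMAGE_BANDS:
        if low <= angle_deg < high:
            return filename
    return None
```

A viewer standing 1 m to the side and 1 m away is at 45 degrees and falls in the image1.png band. With several viewers at once, some rule for choosing which person drives the display would also be needed.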

Get user inputs from the webcam for the game

I'm creating a simple game using Unity which uses the arrow keys to move the player. What I want to do now is use a webcam as a movement-detecting device: track the user's movements and move the player accordingly. (For example, when the user moves a hand to the right, the webcam tracks it and the player moves to the right.)
So, is this possible? If so, what techniques or APIs should I use for this?
Thanks!

Have a look at OpenCV. It is used a lot in the field of body and head tracking, and there's a Unity plugin that implements it, which might be useful.
Video Demo
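Once some tracker reports a hand position for each frame, turning it into player input is a small step. The sketch below, in Python for brevity, assumes the tracker gives a horizontal position normalised to the range 0 to 1 across the image. The function name and the dead-zone value are placeholders, and the tracking itself is not shown.

```python
def movement_command(prev_x, curr_x, dead_zone=0.05):
    """Map the change in hand position between two frames to a command.

    prev_x, curr_x: horizontal hand position, 0.0 (left edge) to 1.0 (right).
    dead_zone: assumed minimum change that counts as deliberate movement,
               so that tracking jitter does not move the player.
    """
    dx = curr_x - prev_x
    if dx > dead_zone:
        return "right"
    if dx < -dead_zone:
        return "left"
    return "none"
```

Webcam images are often mirrored relative to the user, so the left and right results may need to be swapped depending on how the frames are captured.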

It can't. But there is a lot of stuff out there on the internet.
This one has some interesting looking links.
Emgu CV looks interesting too.
There is a JavaScript hand-tracking tool too.
And of course there's the Kinect, but you need the 3D sensor.
You could also use Leap Motion.

Detecting fingers in Kinect for windows sdk 1.5 c#

I'm now detecting the whole skeleton in a WPF application. How can I detect the fingers so that they appear with the skeleton? I'm using the Microsoft Kinect for Windows SDK version 1.5.
Many thanks

The Kinect unfortunately is not sensitive enough to recognize fingers so the library will not provide that as part of the skeleton. Maybe the Kinect 2.0 rumored to come out with the Xbox 720 will be able to provide that level of detail.

Candescent NUI might be what you're looking for. As OpenUserX03 said, however, the Kinect isn't ideal for this task. Perhaps you should have a look at the upcoming LEAP technology, which specializes in finger detection.

The cameras on the Kinect are not meant to do joint tracking for the hands at that level of detail. Tracking the individual fingers is possible but won't be very reliable. To represent a player's hand in the skeleton, you can check whether the hand is open or closed. A possible way to do this is to run pixel checks in an area surrounding the hand. With some tuning you could calculate how much of that area is the hand (using the depth and color streams) and how much is not. For example: if 40% of that area is at the same depth as the hand joint, the hand is closed in a fist. If 70% of that area is at the same depth as the hand joint, the hand is open. Then you could possibly use the angle of the elbow and wrist joints to represent a closed or open hand at that angle on the skeleton.
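The pixel check described above could look roughly like this. The sketch is in Python only to show the logic and does not use any Kinect API. The depth patch is assumed to be a small 2D grid of depth values in millimetres cut out around the hand joint. The 40% and 70% figures come from the example above, while the 40 mm depth tolerance and the "unknown" result for in-between values are assumptions.

```python
def hand_state(depth_patch_mm, hand_depth_mm, tolerance_mm=40):
    """Classify a hand as "open", "closed" or "unknown" from a depth patch.

    depth_patch_mm: 2D list of depth readings around the hand joint.
    hand_depth_mm: depth reported for the hand joint itself.
    tolerance_mm: assumed band within which a pixel counts as "hand".
    """
    pixels = [d for row in depth_patch_mm for d in row]
    hand_pixels = sum(1 for d in pixels if abs(d - hand_depth_mm) <= tolerance_mm)
    fraction = hand_pixels / len(pixels)
    if fraction >= 0.70:
        return "open"     # spread fingers fill most of the patch
    if fraction <= 0.40:
        return "closed"   # a fist covers much less of it
    return "unknown"      # in between: needs tuning or more evidence
```

The patch size would have to scale with the hand's distance from the sensor, since a hand farther away covers fewer pixels.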

People Counting System

I want to develop a "People Counting System" using OpenCV (or Emgu CV).
Please guide me on how to implement or lead me to some examples or open source projects.
(I have done some work: extracting diff then threshold to delete background, using motion history and like that; still no good results.)
Edit 1: I am counting a high people flow (a dozen of them may come through simultaneously).
Edit 2: It must be at least 80% accurate. People are walking through a door that is almost 5 meters wide. The problem is that I have no control over the position or angle of the camera. The camera is shooting the place from a distance of 10m at a height of 2.5m.
Thank you

If by a people counting system you mean a system that counts the people in a room, then I recommend building the hardware from a microcontroller with 2 lasers (ordinary toy lasers work) and 2 photoresistors. For the microcontroller, I recommend an Arduino. Then write a C# application that has a SerialPort object and reads the data that the Arduino sends over USB. The Arduino could send 1 for "someone entered the room" and 0 for "someone left the room", for example. The logging and statistics can then be done easily in C#.
Arduino site: here
Photoresistor for $1: here
This solution is a lot cheaper and easier to implement than using a fairly good quality camera.
Hope I helped you.
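The bookkeeping on the PC side of this suggestion is small. Here is a sketch in Python (the same loop is easy to write in C# around SerialPort). It assumes the events arrive as the characters "1" and "0" as described above. The function name is hypothetical, and reading from the actual serial port is left out.

```python
def count_people(events):
    """Return (currently_inside, total_entered) for a stream of events.

    events: iterable of characters, "1" = someone entered, "0" = someone left.
    Any other character (line endings, noise) is ignored.
    """
    inside = 0
    total_entered = 0
    for event in events:
        if event == "1":
            inside += 1
            total_entered += 1
        elif event == "0":
            # Never go below zero if an exit is seen without a matching entry.
            inside = max(0, inside - 1)
    return inside, total_entered
```

Note that two beams only tell direction for one person at a time, so this kind of setup would probably struggle with several people crossing a wide doorway side by side.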

Check out the HOG pedestrian detector that comes with recent versions of OpenCV (>= 2.2).
See modules/objdetect/src/hog.cpp and samples/cpp/peopledetect.cpp in the OpenCV sources. Unfortunately there is no official documentation about it yet.

This would help you to count moving things including people: Motion Detection project on CodeProject

Are people the only kind of "entities" in the scene? If not, do you mind counting some other kind of moving thing as a person? If people are the only moving things (or you don't mind the occasional miscount), you could just count blobs that enter or leave the scene. It may sound a bit naive, but I would take some kind of motion image and group the motion pixels into clusters by distance. Your distance metric could take some restrictions into account, such as that people will "often" be standing, so the pixels in a cluster should group around some kind of regression line (a vertical line if the camera is aligned with the floor). It shouldn't be necessary to track them through the scene, just to notice when they enter or leave, though you'd get some issues with, for example, people entering the scene on their own and leaving in pairs or in groups... Good luck :)
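As a simplified stand-in for the clustering step, the sketch below counts connected groups of motion pixels in a binary mask using a flood fill. It is plain Python for illustration and does not use OpenCV. The function name and the `min_pixels` noise filter are assumptions, and the regression-line constraint mentioned above is not implemented.

```python
from collections import deque


def count_blobs(mask, min_pixels=3):
    """Count 4-connected groups of truthy cells in a 2D motion mask.

    mask: 2D list where a truthy value marks a pixel that changed.
    min_pixels: assumed minimum size for a group to count (filters noise).
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited motion pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_pixels:
                blobs += 1
    return blobs
```

Two people walking close together will merge into one blob here, which is the same pairs-and-groups problem noted above.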

I think that if you have a dense crowd with a lot of occlusions, you have to use some machine learning algorithm; for example, you can use the Implicit Shape Model for features.

It really depends on the position of the camera. Assuming that you can get front facing profiles of the people in the images:
This problem is basically face detection and recognition.
There are many ways to go about finding faces, but this is the approach that I'm a little more familiar with.
For the face detection you need to do image segmentation on the skin tone color. This will extract skin regions. [Arms, the chest (for those wearing V cut tops), face, legs, etc] Then you would need to line up the profiles of the skin regions to the profile of your trained faces.
[You'll need to use Eigenfaces to create a generic profile of what a face looks like]
If the skin region lines up and doesn't deviate too far from the profile, then it is considered a face. Once the face is confirmed, add it to the eigenfaces data store [for recognition]. To save processing, you might want to consider limiting the search area if you are looking for a previously seen face. [Given the frame rate, and the last time the person was seen]
If you are referring to "Crowd flow" I think you just mean the density of faces in a crowd.
Now you've confirmed that a moving object in the video is a person. Now you just need to note that and then make sure that you don't consider them as a new person again.
This approach really depends on your ability to detect face regions. It may not work if the people in the video are looking down, don't fit the profile of the trained data, etc. It may also be affected if a person puts on sunglasses during the video. [They would probably be considered a "new face"]
