machine-learning
flask
opencv
mediapipe
All about how to create a simple hand sign recognition Flask app using MediaPipe and OpenCV, and how to use it to control and give commands in games
In this blog, I will discuss how to create a Flask app that can recognize the hand signs you make and perform the corresponding actions. I will be using the Google MediaPipe and OpenCV libraries to implement a machine learning hand sign recognition model. By the end of this blog, you will be able to make your own Flask app that implements hand sign recognition in real time.
In this blog, we'll be going over:
Setting up a basic Flask app
Capturing a live feed from your webcam using OpenCV
Manipulating the captured video frames
Applying the MediaPipe hand-tracking model
Analyzing the landmark data to recognize hand signs
Streaming the webcam feed through a Flask app
Let’s get right into the first step!
Flask is a Python-based framework, so first you have to install Python in order to get Flask up and running. You can do that by going to python.org and downloading the latest version of Python; for this blog I have used Python 3.8.
Now that you have Python up and running, the next step is to install Flask. You can do that simply by going to your terminal or cmd and typing in the command
pip install Flask
This command will install Flask on your system. Now that you have Flask, let's learn to create a simple Flask app that prints "Hello World". I will be using VS Code as my code editor for this blog, but feel free to use any editor you want.
For setting up a Flask app you need to create the following files and folders: an app.py file, a static folder, and a templates folder.
Your app.py file basically contains your backend code, for example the routes, the functions, or the computation part. The static folder will contain your CSS, JavaScript, images, or any other static files you want for your app. The templates folder will contain all your HTML pages.
You can also refer to the Flask documentation for making a minimal Flask application.
Now, go to your app.py file and write the following code out
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello_world():
    return "<p>Hello, World!</p>"

if __name__ == '__main__':
    app.run(debug=True)
In the above code we imported Flask, created a Flask app instance with app = Flask(__name__), and finally added a route, / (the home page), which returns the response Hello, World!
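To make the idea of routes a bit more concrete, here is a minimal sketch that adds a second page to the same app; the /about path and its message are only illustrative, not part of the project:
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello_world():
    # the string returned here becomes the HTTP response for the home page
    return "<p>Hello, World!</p>"

@app.route("/about")
def about():
    # each route is just a Python function bound to a URL path
    return "<p>This app will soon recognize hand signs!</p>"

if __name__ == '__main__':
    app.run(debug=True)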
Now go ahead and run your app.py file
python3 app.py
Congratulations!!! You have successfully created your first Flask app.
Now that you know how to create a basic Flask app, let's start with our hand recognition model so that you can implement it in Flask.
Prerequisite
What are we going to discuss
How to capture a live feed from your webcam using OpenCV
Okay, first let's create a Python file and import the OpenCV library. In case you don't have OpenCV installed, you can install it by using the following command
pip install opencv-python
Now that we have OpenCV set up, let's understand how we can use it to get a live feed from our webcam. Go ahead and type out the following code
import cv2 as cv
vid = cv.VideoCapture(0)
So, this will basically create a VideoCapture instance, vid. To VideoCapture() you can pass a video file path, a device index, a video stream URL, or a sequence of images. VideoCapture will store the video stream frame by frame in the vid instance. These frames are then decoded into NumPy arrays, and those arrays can be manipulated to do all kinds of cool stuff like color transformation, pixel rendering, object detection, and much more. You can refer to the OpenCV documentation if you want to take a deep dive into the world of image processing; for this blog we will just stick to our aim, i.e., getting the feed from our webcam using OpenCV. As you can see, I have passed 0 as the VideoCapture parameter here, which means it will take the feed from my internal webcam. In case you want to access an external webcam, you have to pass 1, 2, 3, and so on.
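As a quick illustration of the kinds of sources VideoCapture accepts, here is a small sketch; the file name and the stream URL are placeholders I made up, not real resources:
import cv2 as cv

cap_webcam = cv.VideoCapture(0)                      # default internal webcam
cap_file = cv.VideoCapture("my_clip.mp4")            # a video file on disk (placeholder path)
cap_stream = cv.VideoCapture("http://192.168.1.10:8080/video")  # a network stream URL (placeholder)

# every capture object exposes the same interface, e.g. isOpened() and read()
print(cap_webcam.isOpened(), cap_file.isOpened(), cap_stream.isOpened())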
Now that you have the camera feed, let's decode it and display it using the OpenCV imshow function. Go ahead and type out the following code
import cv2 as cv

vid = cv.VideoCapture(0)
print(vid.isOpened())
vid.set(cv.CAP_PROP_FPS, 60)  # set the video FPS to 60

while vid.grab():
    state, frame = vid.read()
    cv.imshow("video", frame)
    if cv.waitKey(16) & 0xFF == ord('q'):
        break

vid.release()
print(vid.isOpened())  # value becomes False as vid is released
cv.destroyAllWindows()
Description of each function in brief:
VideoCapture(0) opens the default webcam and returns a capture object.
set(cv.CAP_PROP_FPS, 60) requests a capture frame rate of 60 frames per second.
grab() grabs the next frame and returns False when no more frames are available, which ends the loop.
read() grabs and decodes a frame, returning a success flag and the frame as a NumPy array.
imshow() displays the frame in a window with the given title.
waitKey(16) waits 16 ms for a key press; pressing q breaks out of the loop.
release() frees the camera, which is why isOpened() returns False afterwards.
destroyAllWindows() closes all the OpenCV windows.
How to manipulate captured video frames
We have successfully captured our video using OpenCV; now the next thing is how we can manipulate the video frames that we just captured. As discussed above, an image/frame is decoded into a NumPy array (a Mat object) by the read method in OpenCV. So, in order to manipulate the video, we actually have to manipulate the NumPy array, which in turn will transform the video.
So, as you can see from the above diagram, in a Mat object each pixel is a vector containing 3 values: B, G, and R. It's clear that if we change the values of this vector we can perform color transformations on the image. Similarly, if we zero out some of these values we can remove entire color channels from the image. This is a very basic overview of image processing. Now that you know how images are manipulated in OpenCV, let's write some code and test it
import cv2 as cv

img = cv.imread('../../images/ninetail.jpg')
imgGray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)  # convert the BGR image to grayscale
img[:, :, 1], img[:, :, 0] = 0, 0              # zero out the G and B channels, leaving only R
cv.imshow("gray", imgGray)
cv.imshow("red", img)
cv.waitKey(0)
So, here I have used the cvtColor method to convert the image to a grayscale image. You can also use simple array manipulation, like img[:,:,1], img[:,:,0] = 0, 0, to get a red-only image; here we are simply setting the B and G channels of the image to 0 using Python slicing.
You can try a bunch of other image processing functions, check this out.
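For example, here is a small sketch of a few other common OpenCV functions you could experiment with; it reuses the same placeholder image path as above:
import cv2 as cv

img = cv.imread('../../images/ninetail.jpg')

blurred = cv.GaussianBlur(img, (15, 15), 0)    # smooth the image with a 15x15 Gaussian kernel
edges = cv.Canny(img, 100, 200)                # detect edges with the Canny algorithm
small = cv.resize(img, None, fx=0.5, fy=0.5)   # scale the image down to half its size

cv.imshow("blurred", blurred)
cv.imshow("edges", edges)
cv.imshow("small", small)
cv.waitKey(0)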
How to apply the MediaPipe hand-tracking model
Okay, so we have covered all the major concepts; now it's time to have some fun with what we have learned. We will be learning how to use the Google MediaPipe library in our project to implement hand detection on the frames captured using OpenCV. First you have to install the MediaPipe library, for that use the following command
pip install mediapipe
Understanding Hand landmarks
So, what the MediaPipe hand-tracking model basically does is take an image/frame as a parameter and check whether there's any hand present in the frame. If a hand is detected, it returns the coordinates of the hand landmarks shown in the figure.
An example of the data returned for the above image is:
[[0, 572, 66], [1, 547, 105], [2, 514, 117], [3, 488, 116], [4, 464, 113], [5, 497, 130], [6, 451, 144], [7, 427, 141], [8, 413, 134], [9, 489, 107], [10, 433, 107], [11, 423, 96], [12, 423, 87], [13, 482, 82], [14, 432, 82], [15, 437, 76], [16, 445, 73], [17, 478, 57], [18, 443, 61], [19, 449, 63], [20, 460, 63]]
Here we can see that a total of 21 arrays are returned, each containing 3 elements: the first element represents the landmark number, the second represents the x coordinate of the landmark, and the third represents the y coordinate.
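Just to show how you can pick individual landmarks out of a list in this format, here is a tiny sketch that uses the example data above; in MediaPipe's hand landmark numbering, 8 is the index fingertip and 12 is the middle fingertip:
# example landmark list in the [id, x, y] format shown above
lm_list = [[0, 572, 66], [1, 547, 105], [2, 514, 117], [3, 488, 116], [4, 464, 113],
           [5, 497, 130], [6, 451, 144], [7, 427, 141], [8, 413, 134], [9, 489, 107],
           [10, 433, 107], [11, 423, 96], [12, 423, 87], [13, 482, 82], [14, 432, 82],
           [15, 437, 76], [16, 445, 73], [17, 478, 57], [18, 443, 61], [19, 449, 63],
           [20, 460, 63]]

index_tip_x, index_tip_y = lm_list[8][1], lm_list[8][2]      # landmark 8: index fingertip
middle_tip_x, middle_tip_y = lm_list[12][1], lm_list[12][2]  # landmark 12: middle fingertip
print("index fingertip:", index_tip_x, index_tip_y)
print("middle fingertip:", middle_tip_x, middle_tip_y)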
Now, that you have a basic understanding of hand landmarks, go ahead and try out the following code
import mediapipe as mp
import cv2 as cv
import numpy as np

mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands

cap = cv.VideoCapture(0)
with mp_hands.Hands(
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5) as hands:
    while cap.grab():
        ret, frame = cap.read()
        cv.imshow('hand-tracking', frame)
        image = cv.cvtColor(cv.flip(frame, 1), cv.COLOR_BGR2RGB)
        image.flags.writeable = False   # mark the frame read-only to improve performance
        results = hands.process(image)
        image.flags.writeable = True
        print(results.multi_hand_landmarks)
        cv.imshow('hand-tracking-show', image)
        if cv.waitKey(5) & 0xFF == 27:
            break
cap.release()
cv.destroyAllWindows()
Here we create two MediaPipe instances: mp_drawing = mp.solutions.drawing_utils, which will be used to draw the outline over the detected object, and mp_hands = mp.solutions.hands, which we will use to track our hand. I have also created a VideoCapture instance, cap, that will store my video frames. As you can see, I have converted my BGR image to RGB; that is because MediaPipe does its image processing on RGB images. With mp_hands.Hands(min_detection_confidence=0.5, min_tracking_confidence=0.5) you can pass parameters like the tracking confidence and the maximum number of hands you want to detect, and you can finally get the coordinates of the landmarks by calling the process method on the Hands object.
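As a small aside, here is a sketch of what those parameters look like when processing a single frame instead of a live stream; the max_num_hands value and the static_image_mode flag are only illustrative:
import cv2 as cv
import mediapipe as mp

mp_hands = mp.solutions.hands

# grab one still frame from the webcam, just for illustration
cap = cv.VideoCapture(0)
ret, frame = cap.read()
cap.release()

if ret:
    with mp_hands.Hands(static_image_mode=True,
                        max_num_hands=2,
                        min_detection_confidence=0.5) as hands:
        results = hands.process(cv.cvtColor(frame, cv.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                # each landmark carries normalized x, y values (0..1) plus a relative z
                for idx, lm in enumerate(hand_landmarks.landmark):
                    print(idx, lm.x, lm.y, lm.z)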
Now that we are able to detect our hand, let's draw over the detected region. For that, we use the mp_drawing instance.
Try out the following code
import mediapipe as mp
import cv2 as cv
import numpy as np

mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands

cap = cv.VideoCapture(0)
with mp_hands.Hands(
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5) as hands:
    while cap.grab():
        ret, frame = cap.read()
        cv.imshow('hand-tracking', frame)
        image = cv.cvtColor(cv.flip(frame, 1), cv.COLOR_BGR2RGB)
        image.flags.writeable = False
        results = hands.process(image)
        image.flags.writeable = True
        image = cv.cvtColor(image, cv.COLOR_RGB2BGR)
        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(
                    image, hand_landmarks, mp_hands.HAND_CONNECTIONS)
        cv.imshow('hand-tracking-show', image)
        if cv.waitKey(5) & 0xFF == 27:
            break
cap.release()
cv.destroyAllWindows()
In this code we simply iterate over our results variable; results.multi_hand_landmarks gives one hand's landmarks per iteration if multiple hands are present in the frame. mp_drawing.draw_landmarks is used to draw a circle on each landmark, and it also draws the connections between the landmarks.
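If you want to change the drawing style, draw_landmarks also accepts DrawingSpec objects for the landmark dots and the connection lines. Here is a small sketch on a single frame, with BGR colors chosen purely for illustration (they match the ones used in the class later in this blog):
import cv2 as cv
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands

landmark_style = mp_drawing.DrawingSpec(color=(255, 206, 85), thickness=2, circle_radius=3)
connection_style = mp_drawing.DrawingSpec(color=(240, 171, 0), thickness=2)

cap = cv.VideoCapture(0)
ret, frame = cap.read()
cap.release()

if ret:
    with mp_hands.Hands(min_detection_confidence=0.5, min_tracking_confidence=0.5) as hands:
        results = hands.process(cv.cvtColor(frame, cv.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS,
                                          landmark_style, connection_style)
        cv.imshow('styled-hand', frame)
        cv.waitKey(0)
        cv.destroyAllWindows()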
So, that's how easy it is to get your hand detection model up and running using the MediaPipe library.
How to analyze the data you get from the hand-tracking model and calculate hand signs with it
Now that our hand detection model is working, let's see how we can use the data we get from it to predict hand signs.
Try out the following code
import cv2
import mediapipe as mp
import numpy as np
import time


class handDetector():
    def __init__(self, mode=False, maxHands=1, detectionCon=0.5, trackCon=0.5):
        self.mode = mode
        self.maxHands = maxHands
        self.detectionCon = detectionCon
        self.trackCon = trackCon
        self.mpHands = mp.solutions.hands
        self.hands = self.mpHands.Hands(static_image_mode=self.mode,
                                        max_num_hands=self.maxHands,
                                        min_detection_confidence=self.detectionCon,
                                        min_tracking_confidence=self.trackCon)
        self.mpDraw = mp.solutions.drawing_utils

    def findHands(self, img, draw=True):
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms,
                                               self.mpHands.HAND_CONNECTIONS,
                                               self.mpDraw.DrawingSpec(color=(255, 206, 85)),
                                               self.mpDraw.DrawingSpec(color=(240, 171, 0)))
        return img

    def findPosition(self, img, handNo=0, draw=True):
        lmList = []
        if self.results.multi_hand_landmarks:
            myHand = self.results.multi_hand_landmarks[handNo]
            for id, lm in enumerate(myHand.landmark):
                # convert the normalized landmark coordinates to pixel coordinates
                h, w, c = img.shape
                cx, cy = int(lm.x * w), int(lm.y * h)
                lmList.append([id, cx, cy])
                font = cv2.FONT_HERSHEY_SIMPLEX
                if id == 12:
                    cv2.putText(img, '12', (cx, cy), font, .5, (0, 0, 255), 2, cv2.LINE_AA)
                elif id == 11:
                    cv2.putText(img, '11', (cx, cy), font, .5, (0, 0, 255), 2, cv2.LINE_AA)
        return lmList


cap = cv2.VideoCapture(0)
detector = handDetector()
handsign = 'none'

while True:
    success, img = cap.read()
    if not success:
        break
    img = cv2.flip(img, 1)
    img = detector.findHands(img)
    font = cv2.FONT_HERSHEY_SIMPLEX
    lmList = detector.findPosition(img)
    if len(lmList) != 0:
        # landmark 12 is the middle fingertip, landmark 11 the joint below it;
        # a larger y value means lower in the image
        if lmList[12][2] > lmList[11][2]:
            cv2.putText(img, 'down', (10, 200), font, 5, (0, 0, 255), 2, cv2.LINE_AA)
        else:
            cv2.putText(img, 'up', (10, 200), font, 5, (0, 0, 255), 2, cv2.LINE_AA)
    cv2.imshow("hand", img)
    if cv2.waitKey(5) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
So, in this code I have basically created a class that detects my hand and returns the landmarks, and I am using these landmarks to calculate the hand sign. Say I want to check whether the middle finger is open or closed; for that, I compare the y coordinate of landmark 12 with the y coordinate of landmark 11, that is:
Here you can see that the y coordinate of landmark 12 is 80 and the y coordinate of landmark 11 is 114 when my middle finger is up; that is, y12 < y11 when the middle finger is up (remember that y grows downwards in image coordinates).
But as I put my middle finger down, the value of y12 becomes greater than the value of y11, so by comparing the values of y12 and y11 we can determine whether the middle finger is up or down. Similarly, we can check multiple landmarks to recognize different kinds of hand signs.
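If you find the raw comparisons hard to read, you could wrap the check in a small helper; finger_is_up here is just a name I'm introducing for illustration, not something provided by MediaPipe:
def finger_is_up(lmList, tip_id, pip_id):
    # lmList entries are [id, x, y]; smaller y means higher up in the image,
    # so a finger counts as "up" when its tip is above the joint below it
    return lmList[tip_id][2] < lmList[pip_id][2]

# example: the middle finger uses landmark 12 (tip) and landmark 11 (the joint below it)
# if finger_is_up(lmList, 12, 11):
#     print("middle finger up")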
Fun to do for you:
import cv2
import mediapipe as mp
import numpy as np
import time


class handDetector():
    def __init__(self, mode=False, maxHands=1, detectionCon=0.5, trackCon=0.5):
        self.mode = mode
        self.maxHands = maxHands
        self.detectionCon = detectionCon
        self.trackCon = trackCon
        self.mpHands = mp.solutions.hands
        self.hands = self.mpHands.Hands(static_image_mode=self.mode,
                                        max_num_hands=self.maxHands,
                                        min_detection_confidence=self.detectionCon,
                                        min_tracking_confidence=self.trackCon)
        self.mpDraw = mp.solutions.drawing_utils

    def findHands(self, img, draw=True):
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms,
                                               self.mpHands.HAND_CONNECTIONS,
                                               self.mpDraw.DrawingSpec(color=(255, 206, 85)),
                                               self.mpDraw.DrawingSpec(color=(240, 171, 0)))
        return img

    def findPosition(self, img, handNo=0, draw=True):
        lmList = []
        if self.results.multi_hand_landmarks:
            myHand = self.results.multi_hand_landmarks[handNo]
            for id, lm in enumerate(myHand.landmark):
                h, w, c = img.shape
                cx, cy = int(lm.x * w), int(lm.y * h)
                lmList.append([id, cx, cy])
        return lmList


cap = cv2.VideoCapture(0)
detector = handDetector()
handsign = 'none'

while True:
    success, img = cap.read()
    if not success:
        break
    img = cv2.flip(img, 1)
    img = detector.findHands(img)
    lmList = detector.findPosition(img)
    if len(lmList) != 0:
        if lmList[12][2] > lmList[11][2] and lmList[16][2] > lmList[15][2] and lmList[4][1] < lmList[5][1] and lmList[7][2] > lmList[8][2] and lmList[19][2] > lmList[20][2]:
            handsign = "yo"
        elif lmList[12][2] > lmList[11][2] and lmList[16][2] > lmList[15][2] and lmList[4][1] < lmList[5][1] and lmList[7][2] < lmList[8][2] and lmList[19][2] > lmList[20][2]:
            handsign = "thulu"
        elif lmList[12][2] > lmList[11][2] and lmList[16][2] > lmList[15][2] and lmList[4][1] < lmList[5][1] and lmList[7][2] > lmList[8][2] and lmList[19][2] < lmList[20][2]:
            handsign = "L"
        elif lmList[12][2] < lmList[11][2] and lmList[16][2] < lmList[15][2] and lmList[4][1] < lmList[5][1] and lmList[7][2] > lmList[8][2] and lmList[19][2] > lmList[20][2]:
            handsign = "open"
        elif lmList[12][2] > lmList[11][2] and lmList[16][2] < lmList[15][2] and lmList[4][1] < lmList[5][1] and lmList[7][2] > lmList[8][2] and lmList[19][2] > lmList[20][2]:
            handsign = "MidDown"
        elif lmList[12][2] < lmList[11][2] and lmList[16][2] > lmList[15][2] and lmList[4][1] < lmList[5][1] and lmList[7][2] > lmList[8][2] and lmList[19][2] > lmList[20][2]:
            handsign = "MidcloseDown"
        elif lmList[12][2] > lmList[11][2] and lmList[4][1] > lmList[5][1] and lmList[16][2] > lmList[15][2] and lmList[7][2] < lmList[8][2] and lmList[19][2] < lmList[20][2]:
            handsign = "fist"
        elif lmList[12][2] < lmList[11][2] and lmList[16][2] > lmList[15][2] and lmList[4][1] < lmList[5][1] and lmList[7][2] > lmList[8][2] and lmList[19][2] < lmList[20][2]:
            handsign = "LL"
        else:
            handsign = "no move"
        print(handsign)
    cv2.imshow("hand", img)
    if cv2.waitKey(5) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
Try out this code on your own and try to think through the logic behind each hand sign condition!!!
The final step is to set up webcam streaming so that we can implement hand tracking in our Flask app. I have already discussed how to set up a basic Flask app; now let's go further.
Try out the following code
app.py
from flask import Flask, render_template, Response
import cv2 as cv

app = Flask(__name__)


def generate_frames():
    cap = cv.VideoCapture(0)
    while cap.grab():
        success, frame = cap.read()  # read the camera frame
        if not success:
            break
        else:
            image = cv.flip(frame, 1)
            ret, buffer = cv.imencode('.jpg', image)
            image = buffer.tobytes()
            yield (b'--frame\r\n'
                   b'Content-Type: image/jpeg\r\n\r\n' + image + b'\r\n')


@app.route("/")
def index():
    return render_template('index.html')


@app.route("/video")
def video():
    return Response(generate_frames(), mimetype='multipart/x-mixed-replace; boundary=frame')


if __name__ == "__main__":
    app.run(debug=True)
index.html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
    <div class="container">
        <div class="row">
            <div class="col-lg-8 offset-lg-2" style="display:flex;flex-direction:column;">
                <div style="border: 10px;width: 100%;"><img src="{{ url_for('video') }}" width="50%"></div>
            </div>
        </div>
    </div>
</body>
</html>
Here, in our Flask app, we basically created a /video route that generates frames using the generate_frames method. generate_frames reads and encodes frames using OpenCV, as we discussed above, and continuously yields them to the video route using the yield keyword, so the response is streamed as multipart/x-mixed-replace. In the HTML code, I have used Jinja2 templating (url_for('video')) to point the img tag at the video stream coming from our backend.
Congratulations!!! You have completed your first video streaming Flask app.
Fun to do for you
Try to create a Flask app that can perform hand tracking in real time.
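If you want a nudge in the right direction, here is a rough sketch of how the pieces could fit together, assuming you save the handDetector class from earlier in a file called hand_detector.py (that file name is my own choice); treat it as a starting point, not the finished solution:
from flask import Flask, render_template, Response
import cv2 as cv
from hand_detector import handDetector  # the class defined earlier, saved as hand_detector.py (assumed name)

app = Flask(__name__)
detector = handDetector()


def generate_frames():
    cap = cv.VideoCapture(0)
    while cap.grab():
        success, frame = cap.read()
        if not success:
            break
        frame = cv.flip(frame, 1)
        frame = detector.findHands(frame)      # draw the hand landmarks on the frame
        lmList = detector.findPosition(frame)  # landmark list you can use for sign detection
        ret, buffer = cv.imencode('.jpg', frame)
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + buffer.tobytes() + b'\r\n')


@app.route("/")
def index():
    return render_template('index.html')


@app.route("/video")
def video():
    return Response(generate_frames(), mimetype='multipart/x-mixed-replace; boundary=frame')


if __name__ == "__main__":
    app.run(debug=True)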
I created this Naruto Jutsu Battle game for a hackathon.
Here's the GitHub repository for this project.
Thank you for reading my blog; if you found it useful, do share it with your friends. Also, for more awesome blogs, make sure to follow the TechHub Community. TechHub is a great community to learn and explore new technologies. We also have a Discord server, join today to get the latest updates.