Check that your Python interpreter is linked to Jupyter Notebook by printing a simple statement.
print('HOG-descriptor-from-scratch')
The folder that holds this notebook also contains a 'data' folder, which has the following structure:
data
->pedestrians128x64
->pedestrians_neg
->img_test
You can download the data folder from:
https://drive.google.com/file/d/1YCXkb2muHz-m-nqNWtxCfuxa817DaWB6/view?usp=sharing
datadir = "data"
dataset = "pedestrians128x64"
datafile = "%s/%s.tar.gz" % (datadir, dataset)
extractdir = "%s/%s" % (datadir, dataset)
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
for i in range(5):
    filename = "%s/per0010%d.ppm" % (extractdir, i)
    img = cv2.imread(filename)
    plt.subplot(1, 5, i + 1)
    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    plt.axis('off')
In OpenCV, the HOG descriptor is created by calling cv2.HOGDescriptor. It takes the following input arguments: the detection window size, the block size, the block stride, the cell size, and the number of orientation bins.
win_size = (48, 96)
block_size = (16, 16)
block_stride = (8, 8)
cell_size = (8, 8)
num_bins = 9
hog = cv2.HOGDescriptor(win_size, block_size, block_stride,
                        cell_size, num_bins)
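As a sanity check on these parameters, the descriptor length for one window can be worked out by hand: the number of block positions in the window, times the cells per block, times the bins per cell. The sketch below assumes the usual HOG layout of overlapping blocks; `hog_descriptor_length` is a hypothetical helper name, not part of OpenCV.

```python
def hog_descriptor_length(win_size, block_size, block_stride, cell_size, num_bins):
    """Number of HOG features for one detection window (hypothetical helper)."""
    # Block positions along the width and the height of the window.
    blocks_x = (win_size[0] - block_size[0]) // block_stride[0] + 1
    blocks_y = (win_size[1] - block_size[1]) // block_stride[1] + 1
    # Cells inside a single block.
    cells_per_block = (block_size[0] // cell_size[0]) * (block_size[1] // cell_size[1])
    return blocks_x * blocks_y * cells_per_block * num_bins


n_features = hog_descriptor_length((48, 96), (16, 16), (8, 8), (8, 8), 9)
print(n_features)  # 5 * 11 blocks * 4 cells * 9 bins = 1980
```

With OpenCV loaded, `hog.getDescriptorSize()` should report the same number for the object created above.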
The X_pos list holds randomly picked positive samples of pedestrians. I apply the HOG descriptor to each of them.
import numpy as np
import random
random.seed(42)
X_pos = []
for i in random.sample(range(900), 400):
    filename = "%s/per%05d.ppm" % (extractdir, i)
    img = cv2.imread(filename)
    if img is None:
        print('Could not find image %s' % filename)
        continue
    X_pos.append(hog.compute(img, (64, 64)))
I end up with 399 training images (one of the 400 sampled files could not be loaded), each with 1980 HOG feature values.
X_pos = np.array(X_pos, dtype=np.float32)
y_pos = np.ones(X_pos.shape[0], dtype=np.int32)
X_pos.shape, y_pos.shape
negdir = "%s/pedestrians_neg" % datadir
I loop through all the negative images in the directory using os.listdir(), resize each one to 512 x 512, and cut out five random 64 x 128 regions of interest (ROIs) per image:
import os
hroi = 128
wroi = 64
X_neg = []
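for negfile in os.listdir(negdir):
    filename = '%s/%s' % (negdir, negfile)
    img = cv2.imread(filename)
    if img is None:
        # skip files that OpenCV cannot read as images
        print('Could not read image %s' % filename)
        continue
    img = cv2.resize(img, (512, 512))
    for j in range(5):
        # pick a random top-left corner so the ROI stays inside the image
        rand_y = random.randint(0, img.shape[0] - hroi)
        rand_x = random.randint(0, img.shape[1] - wroi)
        roi = img[rand_y:rand_y + hroi, rand_x:rand_x + wroi, :]
        X_neg.append(hog.compute(roi, (64, 64)))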
X_neg = np.array(X_neg, dtype=np.float32)
y_neg = -np.ones(X_neg.shape[0], dtype=np.int32)
X_neg.shape, y_neg.shape
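The random cropping above relies on `random.randint` being inclusive at both ends, so the largest valid corner is the image size minus the ROI size. A minimal sketch of that bound, using only the standard library; `random_roi_origin` is a hypothetical helper and the sizes are the ones used in this notebook.

```python
import random


def random_roi_origin(img_h, img_w, roi_h, roi_w, rng=random):
    """Pick a top-left corner so a roi_h x roi_w patch fits in the image."""
    # randint includes both endpoints, so size - roi is the last valid origin.
    y = rng.randint(0, img_h - roi_h)
    x = rng.randint(0, img_w - roi_w)
    return y, x


rng = random.Random(42)
origins = [random_roi_origin(512, 512, 128, 64, rng) for _ in range(5)]
# Every patch stays inside the 512 x 512 image.
in_bounds = all(y + 128 <= 512 and x + 64 <= 512 for y, x in origins)
print(in_bounds)
```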
X = np.concatenate((X_pos, X_neg))
y = np.concatenate((y_pos, y_neg))
from sklearn import model_selection as ms
X_train, X_test, y_train, y_test = ms.train_test_split(
X, y, test_size=0.2, random_state=42
)
Here I train the SVM and define a helper that scores it:
def train_svm(X_train, y_train):
    svm = cv2.ml.SVM_create()
    svm.train(X_train, cv2.ml.ROW_SAMPLE, y_train)
    return svm
def score_svm(svm, X, y):
    from sklearn import metrics
    _, y_pred = svm.predict(X)
    return metrics.accuracy_score(y, y_pred)
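The score computed by `score_svm` is plain accuracy: the fraction of predictions that match the labels. The sketch below reproduces it with NumPy on made-up labels, assuming the predictions arrive as a float column vector, which is the shape `svm.predict` typically returns; `accuracy` is a hypothetical helper.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of matching labels (hypothetical helper)."""
    # Flatten both so an (N, 1) prediction column compares with an (N,) label vector.
    return float(np.mean(np.ravel(y_true) == np.ravel(y_pred)))


y_true = np.array([1, -1, 1, -1], dtype=np.int32)
y_pred = np.array([[1.0], [-1.0], [-1.0], [-1.0]], dtype=np.float32)  # toy predictions
print(accuracy(y_true, y_pred))  # 3 of 4 correct -> 0.75
```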
After training the SVM, I compute its scores on the training set and on the test set:
svm = train_svm(X_train, y_train)
score_svm(svm, X_train, y_train)
score_svm(svm, X_test, y_test)
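The training score is much higher than the testing score, which indicates overfitting. To tackle this, I find the false positives in the test set, append them to the training set, and retrain. The loop allows up to three rounds but stops as soon as no false positives remain; here it took two rounds. As you can see, I got 64% accuracy in the first round and 100% accuracy in the second round. Because the appended samples come from the test set itself, the later test scores are measured on data the model has partly been trained on.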
score_train = []
score_test = []
for j in range(3):
    svm = train_svm(X_train, y_train)
    score_train.append(score_svm(svm, X_train, y_train))
    score_test.append(score_svm(svm, X_test, y_test))
    _, y_pred = svm.predict(X_test)
    false_pos = np.logical_and((y_test.ravel() == -1),
                               (y_pred.ravel() == 1))
    if not np.any(false_pos):
        print('no more false positives: done')
        break
    X_train = np.concatenate((X_train,
                              X_test[false_pos, :]),
                             axis=0)
    y_train = np.concatenate((y_train, y_test[false_pos]),
                             axis=0)
score_train
score_test
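The core of the retraining loop is the boolean mask that picks out false positives, a step often called hard negative mining. A minimal sketch on toy arrays (all names ending in `_toy` are made up for illustration) shows what the mask selects:

```python
import numpy as np

# Toy labels: -1 is background, +1 is pedestrian (same convention as above).
y_test_toy = np.array([-1, -1, 1, 1, -1], dtype=np.int32)
y_pred_toy = np.array([[1.0], [-1.0], [1.0], [-1.0], [1.0]], dtype=np.float32)
X_test_toy = np.arange(10, dtype=np.float32).reshape(5, 2)  # 5 samples, 2 features

# A false positive is a true negative that the classifier called positive.
false_pos = np.logical_and(y_test_toy.ravel() == -1, y_pred_toy.ravel() == 1)
hard_negatives = X_test_toy[false_pos, :]

print(false_pos.tolist())    # [True, False, False, False, True]
print(hard_negatives.shape)  # (2, 2)
```

Only samples 0 and 4 are selected: sample 3 is a missed pedestrian (a false negative), which this loop does not feed back into training.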
Now that I have a fairly accurate model, I move on to detection. The idea is to slide a 64 x 128 window over the image and decide for each position whether it contains a pedestrian. The 'stride' is the number of pixels the window moves between positions. I make sure the window does not cross the image boundary by using
if ystart + hroi > img_test.shape[0]:
if xstart + wroi > img_test.shape[1]:
After this I cut out the region of interest. If the ROI is classified as a pedestrian, I add it to the list of successes using this code
if np.allclose(ypred, 1): found.append((ystart, xstart, hroi, wroi))
stride = 16
found = []
img_test = cv2.imread('img_test.jpg')
for ystart in np.arange(0, img_test.shape[0], stride):
    for xstart in np.arange(0, img_test.shape[1], stride):
        if ystart + hroi > img_test.shape[0]:
            continue
        if xstart + wroi > img_test.shape[1]:
            continue
        roi = img_test[ystart:ystart + hroi,
                       xstart:xstart + wroi, :]
        feat = np.array([hog.compute(roi, (64, 64))])
        _, ypred = svm.predict(feat)
        if np.allclose(ypred, 1):
            found.append((ystart, xstart, hroi, wroi))
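The two boundary checks in the loop above can also be folded into the loop limits. The sketch below is an equivalent way to enumerate the window positions, not the code this notebook runs; `sliding_windows` is a hypothetical helper and the 160 x 96 image size is made up.

```python
def sliding_windows(img_h, img_w, win_h, win_w, stride):
    """Yield (ystart, xstart) for every window that fits entirely in the image."""
    # Stopping the ranges at size - win + 1 makes the boundary checks implicit.
    for ystart in range(0, img_h - win_h + 1, stride):
        for xstart in range(0, img_w - win_w + 1, stride):
            yield ystart, xstart


# Toy 160 x 96 image with the 128 x 64 window and stride 16 used above.
origins = list(sliding_windows(160, 96, 128, 64, 16))
print(len(origins))  # 3 row positions * 3 column positions = 9
```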
I pass the trained SVM's parameters (its support vectors and the bias term rho) to the 'hog' object
rho, _, _ = svm.getDecisionFunction(0)
sv = svm.getSupportVectors()
hog.setSVMDetector(np.append(sv.ravel(), rho))
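To see what is being handed to `setSVMDetector`, the sketch below builds the same vector from stand-in arrays. It assumes a linear SVM whose support vectors collapse into a single row of weights, so the result has one entry per HOG feature plus one for the bias; `sv_toy` and `rho_toy` are made-up placeholders, not values from the trained model.

```python
import numpy as np

descriptor_size = 1980  # HOG length for the window configured above

# Stand-ins for svm.getSupportVectors() and the rho from svm.getDecisionFunction(0).
# Assumption: a single row of weights, as for a linear SVM.
sv_toy = np.zeros((1, descriptor_size), dtype=np.float32)
rho_toy = 0.5

# Flatten the weights and append the bias as the last element.
detector = np.append(sv_toy.ravel(), rho_toy)
print(detector.shape)  # (1981,)
```

If the real `sv` has more than one row, the flattened vector will be longer than this, which is worth checking before calling `setSVMDetector`.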
People can appear at different sizes, so I use detectMultiScale, which returns the bounding boxes together with their detection weights
found, _ = hog.detectMultiScale(img_test)
from matplotlib import patches
fig = plt.figure()
ax = fig.add_subplot(111)
ax.imshow(cv2.cvtColor(img_test, cv2.COLOR_BGR2RGB))
for f in found:
    ax.add_patch(patches.Rectangle((f[0], f[1]), f[2], f[3],
                                   color='y', linewidth=3,
                                   fill=False))
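Note that the boxes drawn here are read as (x, y, width, height), whereas the manual sliding-window loop earlier stored (ystart, xstart, hroi, wroi). A minimal sketch of a conversion, in case the manually found windows are drawn with the same plotting code; `window_to_xywh` is a hypothetical helper and the sample windows are made up.

```python
def window_to_xywh(window):
    """Convert (ystart, xstart, height, width) to (x, y, width, height)."""
    ystart, xstart, height, width = window
    return xstart, ystart, width, height


manual_found = [(0, 16, 128, 64), (32, 48, 128, 64)]  # toy sliding-window hits
boxes = [window_to_xywh(w) for w in manual_found]
print(boxes)  # [(16, 0, 64, 128), (48, 32, 64, 128)]
```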