Python HTML5 canvas game web automation example using Selenium and OpenCV

Branimir Valentic | 05.03.2019.

While looking for good Selenium exercises for our youngest member, Danijel, we came up with the idea of automating games with Python, Selenium and OpenCV. We decided to play Galaga. Since the project now needed more than just Selenium knowledge, Danijel took on the task of enabling user input from Selenium while I took a look at the rest of the challenge. As Danijel quickly learned, long-pressing a key through ChromeDriver and Selenium is no trivial task. After some trial and error, Danijel decided that the pyautogui library was the way to go, at least at this stage of the project.
Using Selenium we imported the HTML5 canvas into OpenCV, defined a method to differentiate the relevant objects within the Galaga game, and built a simple autopilot for the game.
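Long-pressing with pyautogui boils down to pairing a key-down event with a sleep and a key-up event. The sketch below wraps that pattern in a small helper; the helper name, the callable-injection style and the half-second duration are our own illustration, not code from the project:

```python
import time

def long_press(key_down, key_up, key, seconds):
    """Hold `key` for `seconds` using the supplied press/release callables."""
    key_down(key)
    time.sleep(seconds)
    key_up(key)

# With pyautogui this becomes, for example:
#   import pyautogui
#   long_press(pyautogui.keyDown, pyautogui.keyUp, "left", 0.5)
```

Injecting the press/release callables keeps the timing logic testable without a display attached.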

The program in action:

Here is how to import the HTML5 canvas to OpenCV:

import base64

import cv2
import numpy as np
from selenium import webdriver

driver = webdriver.Chrome("./chromedriver")

def get_driver_image(driver):
    # read the game canvas as a base64-encoded PNG
    canvas = driver.find_element_by_tag_name("canvas")
    canvas_base64 = driver.execute_script(
        "return arguments[0].toDataURL('image/png').substring(21);", canvas)
    cap = base64.b64decode(canvas_base64)
    global image
    global gray_image
    image = cv2.imdecode(np.frombuffer(cap, np.uint8), 1)
    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

After some tinkering we settled on cv2.Laplacian, cv2.dilate, cv2.threshold, cv2.connectedComponentsWithStats and cv2.SimpleBlobDetector_Params to easily differentiate the game objects by size:

def prepare_image(image, gray_image):
    global crop_img
    global threshold
    # crop to the playfield, then highlight edges and merge them into solid blobs
    crop_img = gray_image[0:240, 0:205]
    binary_image = cv2.Laplacian(crop_img, cv2.CV_8UC1)
    dilated_image = cv2.dilate(binary_image, np.ones((6, 6), np.uint8))
    _, thresh = cv2.threshold(dilated_image, BINARY_THRESHOLD, 255, cv2.THRESH_BINARY)
    components = cv2.connectedComponentsWithStats(thresh, CONNECTIVITY, cv2.CV_32S)
    centers = components[3]
    retval, threshold = cv2.threshold(thresh, 200, 255, cv2.THRESH_BINARY_INV)

Player’s ship parameters example:

player_params = cv2.SimpleBlobDetector_Params()
player_params.filterByArea = True
player_params.minArea = 411
player_params.maxArea = 420
player_params.filterByCircularity = False
player_params.filterByConvexity = False
player_params.filterByInertia = False
player_detector = cv2.SimpleBlobDetector_create(player_params)

keypoints_player = player_detector.detect(threshold)

def draw_keypoints(image, *args):
    for keypoint in args:
        if keypoint == keypoints_player:
            point_color = [0, 255, 0]       # player in green
        elif keypoint == keypoints_enemy:
            point_color = [0, 0, 255]       # enemies in red
        else:
            point_color = [255, 255, 255]   # everything else in white
        image = cv2.drawKeypoints(
            image, keypoint, np.array([]), point_color,
            cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
    return image

im_with_keypoints = draw_keypoints(
    image, keypoints_player, keypoints_enemy)

After using this method to go through all the elements, the OpenCV output looks like this:



The exact location of the game objects was extracted from the keypoints using this method:

player_axis_x = keypoints_player[i].pt[0]
player_axis_y = keypoints_player[i].pt[1]
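With player and enemy coordinates in hand, a simple autopilot reduces to comparing x positions. Below is a hypothetical steering rule; the function name, the dead zone, and the nearest-enemy targeting are our own illustration rather than the project's exact logic:

```python
def steer(player_x, enemy_xs, dead_zone=4):
    """Return -1 to move left, 1 to move right, 0 to hold position.

    Chases the horizontally nearest enemy, ignoring differences smaller
    than `dead_zone` pixels to avoid jittering around the target.
    """
    if not enemy_xs:
        return 0
    target = min(enemy_xs, key=lambda x: abs(x - player_x))
    diff = target - player_x
    if abs(diff) <= dead_zone:
        return 0
    return 1 if diff > 0 else -1
```

The return value would then be translated into a left or right key press via pyautogui.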

The complete code is available on GitHub:

What’s next?
- Figure out which input method and driver work best for our application, since the native ChromeDriver/Selenium methods did not work for continuous input (key press and release) and we would also like this application to run in headless mode
- Pinpoint and log the lag time in the loop
- Try the code out with parallel processing and multiple threads to speed things up
- Add machine learning

