Calibrating a Smartphone Camera for 3D Vision
As discussed in the previous blog post, a camera, at its core, is a device that squashes our 3D world into a 2D image. This projection loses vital information, most importantly depth. To reverse the process and build a 3D model from an image, we need two things: a depth map for the image and the camera intrinsic matrix characterising our camera's sensor and lens.
We'll demystify camera calibration for an arbitrary phone (I am using a Samsung Galaxy S25) using Zhang's method, and we'll display the calibration pattern on a laptop screen to avoid taking printouts.
Why Bother Calibrating?
An uncalibrated camera gives you a pretty picture. A calibrated camera gives you data. The goal of calibration is to find the Intrinsic Matrix (K) and Distortion Coefficients (D).
- The Intrinsic Matrix (K): This 3x3 matrix is like the camera's birth certificate. It tells us:
  - Focal Length (fx, fy): How "zoomed in" the lens is, measured in pixels.
  - Principal Point (cx, cy): The true optical center of the image, which is rarely the exact pixel center.
- Distortion Coefficients (D): No lens is perfect. They all bend light in slightly imperfect ways, causing straight lines in the real world to appear curved in an image (think of the "fisheye" effect). These five or more coefficients quantify that distortion, allowing us to mathematically un-warp our images.
[ fx 0 cx ]
K = [ 0 fy cy ]
[ 0 0 1 ]
Once we have K and D, we can take a 2D pixel (u, v) from an image, combine it with its estimated depth, and accurately project it back into 3D space (X, Y, Z). This is the bedrock of 3D reconstruction, AR, robotics, etc.
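To make that concrete, the standard pinhole model relates a 3D point (X, Y, Z) in the camera frame to a pixel (u, v) like so (ignoring lens distortion for a moment):

u = fx * (X / Z) + cx
v = fy * (Y / Z) + cy

Invert it with a known depth Z, and you get the back-projection we'll use at the end of this post:

X = (u - cx) * Z / fx
Y = (v - cy) * Z / fy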
The Tool: A Checkerboard on Your Screen
Traditionally, calibration involves printing a checkerboard pattern on a perfectly flat surface. But we can do better. By displaying the pattern on a high-resolution screen like a MacBook's, we gain two advantages:
- Perfect Flatness: A screen is inherently flatter than any paper glued to cardboard.
- Perfect Precision: We can calculate the exact physical size of the checkerboard squares down to the micrometer, just by knowing our screen's specs.
However, this method comes with a challenge: glare. A glossy screen is a mirror, and reflections are the enemy of corner-detection algorithms. The solution? A controlled environment.
I will be using my personal MacBook Pro's 14-inch Retina display (3024 × 1964 native pixels), which runs at a default scaled resolution of 1512 × 982. We use this specification to compute the "pixel width in mm", which is necessary to know the precise 3D coordinates of each inner corner of the checkerboard pattern. The following shows the calculation of the "pixel width in mm" for the MacBook Pro 14-inch screen from its specification.
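Apple lists this panel at 254 pixels per inch, so:

pixel pitch = 25.4 mm/inch ÷ 254 px/inch ≈ 0.1 mm per native pixel

At the default scaled resolution (1512 × 982), each on-screen pixel covers 2 × 2 native pixels, so one on-screen pixel is roughly 2 × 0.1 mm = 0.2 mm wide.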
Given the pixel width in mm, we know precisely the size and 3D location (X, Y, Z) of every square (and therefore every inner corner) of the checkerboard pattern, in a coordinate system where the screen plane is Z = 0.
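As a quick sketch, here is how those 3D object points can be laid out with NumPy; the square size below is illustrative and depends on how many on-screen pixels wide you draw each square:

```python
import numpy as np

pattern_size = (9, 6)        # inner corners (columns, rows)
square_mm = 100 * 0.2        # e.g. 100 on-screen pixels/square x 0.2 mm/pixel = 20 mm

# One (X, Y, Z=0) coordinate per inner corner, in the plane of the screen
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_mm
```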
The following image shows a checkerboard pattern with 9x6 inner corners, surrounded by a white border.
Fig 1: A checkerboard with a white border, displayed on a MacBook Pro screen in a dark room.
The Process: A Step-by-Step Guide
Step 1: Generate the Calibration Target
First, we need a perfect checkerboard image with a wide, white border. The white border is crucial because it gives the detection algorithm a "quiet zone" to work in. We can generate this with a simple Python script using OpenCV.
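Here is a minimal sketch of such a script; the square size and border width are illustrative, and note that a board with 9x6 inner corners needs 10x7 squares:

```python
import numpy as np
import cv2

squares_x, squares_y = 10, 7         # 10x7 squares -> 9x6 inner corners
square_px = 150                      # on-screen pixels per square (illustrative)
border_px = 2 * square_px            # wide white "quiet zone" around the board

# Start with an all-white board, then blacken alternating squares
board = np.full((squares_y * square_px, squares_x * square_px), 255, np.uint8)
for y in range(squares_y):
    for x in range(squares_x):
        if (x + y) % 2 == 0:
            board[y*square_px:(y+1)*square_px, x*square_px:(x+1)*square_px] = 0

# Add the white border and save
canvas = cv2.copyMakeBorder(board, border_px, border_px, border_px, border_px,
                            cv2.BORDER_CONSTANT, value=255)
cv2.imwrite("checkerboard.png", canvas)
```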
Step 2: Set the Scene
This is the most critical part for getting good results.
- Go Dark: Find a dark room and turn off all the lights. The screen should be the only major light source.
- Max Brightness, No "True Tone": On your Mac, turn off any automatic brightness or color adjustments and set the brightness to maximum.
- Display Full-Screen: Open the checkerboard.png file and view it in full-screen mode.
Step 3: Capture the Images
Now, grab your phone (a Samsung Galaxy S25, in my case). Switch to "Pro Mode" to lock the focus, ISO, and shutter speed; consistency is key.
Take about 20-25 photos of the checkerboard displayed on your screen. The trick is to move the phone, not the screen. Capture the board from a wide variety of angles and positions:
- Tilt the phone up, down, left, and right.
- Position the board in every corner of the camera's view.
- Vary the distance.
- Avoid reflections! Check your phone's screen before each shot to make sure you can't see a reflection of the lens.
Fig 2: A collage of images captured from many different angles, with the checkerboard displayed on the MacBook Pro 14-inch screen in a dark room.
Step 4: Run the Calibration Algorithm
With our images ready, we can feed them to OpenCV's powerful calibrateCamera function. The algorithm, based on a technique known as Zhang's Method, performs some incredible mathematical detective work:
- It finds the precise pixel coordinates of the checkerboard corners in each of your 20+ images.
- It knows the ideal 3D geometry of the board (since we defined it).
- By comparing the ideal 3D points to the observed 2D image points, it estimates a homography between the board plane and each view, then solves for the one set of intrinsic and distortion parameters that best explains all the views simultaneously.
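A minimal version of the detection-plus-calibration script could look like this, assuming the photos live in a hypothetical calib_images/ folder and objp is laid out as in the earlier snippet:

```python
import glob
import numpy as np
import cv2

pattern_size = (9, 6)                                  # inner corners
square_mm = 20.0                                       # illustrative square size
objp = np.zeros((54, 3), np.float32)                   # 9 * 6 corners, Z = 0
objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2) * square_mm

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
obj_points, img_points = [], []                        # 3D (board frame) / 2D (image)

for path in glob.glob("calib_images/*.jpg"):           # hypothetical folder name
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if not found:
        continue                                       # skip shots with glare etc.
    corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
    obj_points.append(objp)
    img_points.append(corners)

rms, K, D, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print(f"RMS re-projection error: {rms:.4f} px\nK =\n{K}\nD = {D}")
```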
After running our calibration script, we get our intrinsic parameters and distortion coefficients (for the Samsung Galaxy S25):
Calibration SUCCEEDED!
--- Intrinsic Matrix (K) ---
[[2772.19 0.0 1491.16]
[ 0.0 2771.59 2014.83]
[ 0.0 0.0 1.0 ]]
--- Distortion Coefficients ---
[[ 0.0529 -0.1161 0.0001 -0.0007 0.0911]]
Total Re-projection Error: 0.1345 pixels
(This is a good result!)
The extremely low re-projection error (under 0.5 pixels is great, and 0.13 is excellent) tells us that our model is a near-perfect digital twin of our phone's camera.
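If you want to sanity-check that number yourself, the re-projection error can be recomputed from the calibration outputs; this sketch (using the variables from the script above) mirrors the RMS value calibrateCamera already returns:

```python
import numpy as np
import cv2

sq_err, n_pts = 0.0, 0
for op, ip, rvec, tvec in zip(obj_points, img_points, rvecs, tvecs):
    proj, _ = cv2.projectPoints(op, rvec, tvec, K, D)   # ideal corners -> pixels
    sq_err += cv2.norm(ip, proj, cv2.NORM_L2) ** 2      # sum of squared residuals
    n_pts += len(proj)
print(f"RMS re-projection error: {np.sqrt(sq_err / n_pts):.4f} px")
```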
The Payoff: Creating a 3D Point Cloud
With our calibrated parameters, we can now achieve our ultimate goal. We take a new image and its corresponding depth map (which can be generated by AI models like Depth Anything V2).
For every pixel in the image, we do the following:
- Undistort: Use the distortion coefficients to correct the pixel's location.
- Back-Project: Use the intrinsic matrix (fx, fy, cx, cy) and the depth value to calculate the pixel's real 3D (X, Y, Z) coordinate.
When we do this for every pixel, we transform a flat image into a rich, vibrant 3D point cloud.
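Here is a minimal sketch of that loop, vectorised with NumPy; image_to_point_cloud is a hypothetical helper, and it assumes the depth map is metric and pixel-aligned with the image:

```python
import numpy as np
import cv2

def image_to_point_cloud(img, depth, K, D):
    """Back-project every pixel into 3D. Assumes 'depth' is an HxW metric
    depth map aligned with the HxWx3 BGR image 'img'."""
    h, w = depth.shape
    # Step 1 - Undistort: map raw pixel coords to normalised, distortion-free ones
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v], axis=-1).reshape(-1, 1, 2).astype(np.float32)
    norm = cv2.undistortPoints(pixels, K, D).reshape(-1, 2)   # (X/Z, Y/Z) pairs
    # Step 2 - Back-project: scale the normalised rays by the depth Z
    Z = depth.reshape(-1)
    points = np.stack([norm[:, 0] * Z, norm[:, 1] * Z, Z], axis=-1)
    colors = img.reshape(-1, 3)[:, ::-1] / 255.0              # BGR -> RGB in [0, 1]
    return points, colors
```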
This process, which started with a simple checkerboard on a screen, unlocks the true potential of our smartphone camera, turning it from a picture-taker into a 3D scanner. It's the first and most important step on the road to building immersive digital worlds from the one we see every day.