Calibrating a Smartphone Camera for 3D Vision
As discussed in the previous blog post, a camera, at its core, is a device that squashes our 3D world into a 2D image. This projection loses vital information, most importantly depth. To reverse the process and build a 3D model from an image, we need two things: a depth map for the image and the camera intrinsic matrix characterising our camera's sensor and lens.
We'll demystify camera calibration for an arbitrary phone (I am using a Samsung Galaxy S25) using Zhang's method, and we'll display the calibration pattern on a laptop screen to avoid taking printouts.
Why Bother Calibrating?
An uncalibrated camera gives you a pretty picture. A calibrated camera gives you data. The goal of calibration is to find the Intrinsic Matrix (K) and Distortion Coefficients (D).
- The Intrinsic Matrix (K): This 3x3 matrix is like the camera's birth certificate. It tells us:
  - Focal Length (fx, fy): How "zoomed in" the lens is, measured in pixels.
  - Principal Point (cx, cy): The true optical center of the image, which is rarely the exact pixel center.
- Distortion Coefficients (D): No lens is perfect. They all bend light in slightly imperfect ways, causing straight lines in the real world to appear curved in an image (think of the "fisheye" effect). These five or more coefficients quantify that distortion, allowing us to mathematically un-warp our images.
[ fx 0 cx ]
K = [ 0 fy cy ]
[ 0 0 1 ]
Once we have K and D, we can take a 2D pixel (u, v) from an image, combine it with its estimated depth, and accurately project it back into 3D space (X, Y, Z). This is the bedrock of 3D reconstruction, AR, robotics, etc.
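To make that concrete, the standard pinhole model relates a 3D point (X, Y, Z) in the camera frame to a pixel (u, v) like so (ignoring lens distortion for a moment):

u = fx * (X / Z) + cx
v = fy * (Y / Z) + cy

Invert it with a known depth Z, and you get the back-projection we'll use at the end of this post:

X = (u - cx) * Z / fx
Y = (v - cy) * Z / fy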
The Tool: A Checkerboard on Your Screen
Traditionally, calibration involves printing a checkerboard pattern on a perfectly flat surface. But we can do better. By displaying the pattern on a high-resolution screen like a MacBook's, we gain two advantages:
- Perfect Flatness: A screen is inherently flatter than any paper glued to cardboard.
- Perfect Precision: We can calculate the exact physical size of the checkerboard squares down to the micrometer, just by knowing our screen's specs.
However, this method comes with a challenge: glare. A glossy screen is a mirror, and reflections are the enemy of corner-detection algorithms. The solution? A controlled environment.
I will be using my personal MacBook Pro's 14-inch Retina display (3024 × 1964 native pixels), which runs at a default scaled resolution of 1512 × 982. We use this specification to compute the "pixel width in mm", which is necessary to know the precise 3D coordinates of each inner corner of the checkerboard pattern. The following shows the calculation of the "pixel width in mm" for the MacBook Pro 14-inch screen from its specification.
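Apple lists this panel at 254 pixels per inch, so:

pixel pitch = 25.4 mm/inch ÷ 254 px/inch ≈ 0.1 mm per native pixel

At the default scaled resolution (1512 × 982), each on-screen pixel covers 2 × 2 native pixels, so one on-screen pixel is roughly 2 × 0.1 mm = 0.2 mm wide.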
Given the pixel width in mm, we know precisely the size and 3D location (X, Y, Z) of every square (and therefore every inner corner) of the checkerboard pattern, in a coordinate system where the screen plane is Z = 0.
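As a quick sketch, here is how those 3D object points can be laid out with NumPy; the square size below is illustrative and depends on how many on-screen pixels wide you draw each square:

```python
import numpy as np

pattern_size = (9, 6)        # inner corners (columns, rows)
square_mm = 100 * 0.2        # e.g. 100 on-screen pixels/square x 0.2 mm/pixel = 20 mm

# One (X, Y, Z=0) coordinate per inner corner, in the plane of the screen
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_mm
```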
The following image shows a checkerboard pattern with 9x6 inner corners, surrounded by a white border.
Fig 1: A checkerboard with a white border, displayed on a MacBook Pro screen in a dark room.
The Process: A Step-by-Step Guide
Step 1: Generate the Calibration Target
First, we need a perfect checkerboard image with a wide, white border. The white border is crucial because it gives the detection algorithm a "quiet zone" to work in. We can generate this with a simple Python script using OpenCV.
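Here is a minimal sketch of such a script; the square size and border width are illustrative, and note that a board with 9x6 inner corners needs 10x7 squares:

```python
import numpy as np
import cv2

squares_x, squares_y = 10, 7         # 10x7 squares -> 9x6 inner corners
square_px = 150                      # on-screen pixels per square (illustrative)
border_px = 2 * square_px            # wide white "quiet zone" around the board

# Start with an all-white board, then blacken alternating squares
board = np.full((squares_y * square_px, squares_x * square_px), 255, np.uint8)
for y in range(squares_y):
    for x in range(squares_x):
        if (x + y) % 2 == 0:
            board[y*square_px:(y+1)*square_px, x*square_px:(x+1)*square_px] = 0

# Add the white border and save
canvas = cv2.copyMakeBorder(board, border_px, border_px, border_px, border_px,
                            cv2.BORDER_CONSTANT, value=255)
cv2.imwrite("checkerboard.png", canvas)
```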
Step 2: Set the Scene
This is the most critical part for getting good results.
- Go Dark: Find a dark room and turn off all the lights. The screen should be the only major light source.
- Max Brightness, No "True Tone": On your Mac, turn off any automatic brightness or color adjustments and set the brightness to maximum.
- Display Full-Screen: Open the checkerboard.png file and view it in full-screen mode.
Step 3: Capture the Images
Now, grab your phone (a Samsung Galaxy S25, in my case). Switch to "Pro Mode" to lock the focus, ISO, and shutter speed; consistency is key.
Take about 20-25 photos of the checkerboard displayed on your screen. The trick is to move the phone, not the screen. Capture the board from a wide variety of angles and positions:
- Tilt the phone up, down, left, and right.
- Position the board in every corner of the camera's view.
- Vary the distance.
- Avoid reflections! Check your phone's screen before each shot to make sure you can't see a reflection of the lens.
Fig 2: A collage of images captured from many different angles, with the checkerboard displayed on the MacBook Pro 14-inch screen in a dark room.
Step 4: Run the Calibration Algorithm
With our images ready, we can feed them to OpenCV's powerful calibrateCamera function. The algorithm, based on a technique known as Zhang's Method, performs some incredible mathematical detective work:
- It finds the precise pixel coordinates of the checkerboard corners in each of your 20+ images.
- It knows the ideal 3D geometry of the board (since we defined it).
- By comparing the ideal 3D points to the observed 2D image points, it estimates a homography between the board plane and each view, then solves for the one set of intrinsic and distortion parameters that best explains all the views simultaneously.
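A minimal version of the detection-plus-calibration script could look like this, assuming the photos live in a hypothetical calib_images/ folder and objp is laid out as in the earlier snippet:

```python
import glob
import numpy as np
import cv2

pattern_size = (9, 6)                                  # inner corners
square_mm = 20.0                                       # illustrative square size
objp = np.zeros((54, 3), np.float32)                   # 9 * 6 corners, Z = 0
objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2) * square_mm

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
obj_points, img_points = [], []                        # 3D (board frame) / 2D (image)

for path in glob.glob("calib_images/*.jpg"):           # hypothetical folder name
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if not found:
        continue                                       # skip shots with glare etc.
    corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
    obj_points.append(objp)
    img_points.append(corners)

rms, K, D, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print(f"RMS re-projection error: {rms:.4f} px\nK =\n{K}\nD = {D}")
```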
After running our calibration script, we get our intrinsic parameters and distortion coefficients (for the Samsung Galaxy S25):
Calibration SUCCEEDED!
--- Intrinsic Matrix (K) ---
[[2772.19 0.0 1491.16]
[ 0.0 2771.59 2014.83]
[ 0.0 0.0 1.0 ]]
--- Distortion Coefficients ---
[[ 0.0529 -0.1161 0.0001 -0.0007 0.0911]]
Total Re-projection Error: 0.1345 pixels
(This is a good result!)
The extremely low re-projection error (under 0.5 pixels is great, and 0.13 is excellent) tells us that our model is a near-perfect digital twin of our phone's camera.
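If you want to sanity-check that number yourself, the re-projection error can be recomputed from the calibration outputs; this sketch (using the variables from the script above) mirrors the RMS value calibrateCamera already returns:

```python
import numpy as np
import cv2

sq_err, n_pts = 0.0, 0
for op, ip, rvec, tvec in zip(obj_points, img_points, rvecs, tvecs):
    proj, _ = cv2.projectPoints(op, rvec, tvec, K, D)   # ideal corners -> pixels
    sq_err += cv2.norm(ip, proj, cv2.NORM_L2) ** 2      # sum of squared residuals
    n_pts += len(proj)
print(f"RMS re-projection error: {np.sqrt(sq_err / n_pts):.4f} px")
```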
The Payoff: Creating a 3D Point Cloud
With our calibrated parameters, we can now achieve our ultimate goal. We take a new image and its corresponding depth map (which can be generated by AI models like Depth Anything V2).
For every pixel in the image, we do the following:
- Undistort: Use the distortion coefficients to correct the pixel's location.
- Back-Project: Use the intrinsic matrix (fx, fy, cx, cy) and the depth value to calculate the pixel's real 3D (X, Y, Z) coordinate.
When we do this for every pixel, we transform a flat image into a rich, vibrant 3D point cloud.
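Here is a minimal sketch of that loop, vectorised with NumPy; image_to_point_cloud is a hypothetical helper, and it assumes the depth map is metric and pixel-aligned with the image:

```python
import numpy as np
import cv2

def image_to_point_cloud(img, depth, K, D):
    """Back-project every pixel into 3D. Assumes 'depth' is an HxW metric
    depth map aligned with the HxWx3 BGR image 'img'."""
    h, w = depth.shape
    # Step 1 - Undistort: map raw pixel coords to normalised, distortion-free ones
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v], axis=-1).reshape(-1, 1, 2).astype(np.float32)
    norm = cv2.undistortPoints(pixels, K, D).reshape(-1, 2)   # (X/Z, Y/Z) pairs
    # Step 2 - Back-project: scale the normalised rays by the depth Z
    Z = depth.reshape(-1)
    points = np.stack([norm[:, 0] * Z, norm[:, 1] * Z, Z], axis=-1)
    colors = img.reshape(-1, 3)[:, ::-1] / 255.0              # BGR -> RGB in [0, 1]
    return points, colors
```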
This process, which started with a simple checkerboard on a screen, unlocks the true potential of our smartphone camera, turning it from a picture-taker into a 3D scanner. It's the first and most important step on the road to building immersive digital worlds from the one we see every day.