Computer vision has changed how machines understand the world. From unlocking your phone with your face to self-driving cars navigating streets, these applications rely on one thing: the ability to process and interpret visual data. OpenCV (Open Source Computer Vision Library) makes this power accessible to Python developers.
This tutorial walks you through building real computer vision applications. You’ll start with reading images, move through transformations and filters, and finish by detecting faces in real-time video. Each step builds on the last, giving you hands-on experience with techniques used in production systems.
Prerequisites
Before writing any code, you need OpenCV installed. The library works best with Python 3.8 or higher.
pip install opencv-contrib-python
pip install numpy
pip install matplotlib
The opencv-python package contains the core functionality; opencv-contrib-python includes everything in it plus extra modules like advanced feature detectors, so install one or the other, not both — the two packages conflict when installed together. NumPy handles array operations (images are just arrays of pixels), and Matplotlib helps visualize results.
Test your installation:
import cv2
import numpy as np
print(f"OpenCV version: {cv2.__version__}")
You should see version 4.x or higher. If you get import errors, check that you’re using the correct Python environment.
Step 1: Reading and Displaying Images
Every computer vision project starts with loading images. OpenCV represents images as NumPy arrays, where each pixel has intensity values.
import cv2
import matplotlib.pyplot as plt
# Read an image
image = cv2.imread('sample.jpg')
# Check if image loaded successfully
if image is None:
    print("Error: Could not load image")
    exit()
# Get image dimensions
height, width, channels = image.shape
print(f"Image size: {width}x{height}, Channels: {channels}")
# OpenCV uses BGR color format, convert to RGB for display
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# Display using matplotlib
plt.figure(figsize=(10, 6))
plt.imshow(image_rgb)
plt.axis('off')
plt.title('Original Image')
plt.show()
OpenCV uses BGR (Blue, Green, Red) format instead of the standard RGB. This dates back to early camera sensors but causes confusion. Always convert to RGB when displaying with Matplotlib or most other libraries.
You can also write images back to disk:
# Convert to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Save the result
cv2.imwrite('output_gray.jpg', gray_image)
Grayscale images have one channel (intensity) instead of three color channels, so they take a third of the memory, and many algorithms work better on grayscale input.
Step 2: Image Transformations and Filters
Raw images often need preprocessing. Transformations adjust size, rotation, and perspective. Filters enhance details or remove noise.
Resizing and Rotation
# Resize to specific dimensions
resized = cv2.resize(image, (400, 300))
# Resize by scale factor
scaled = cv2.resize(image, None, fx=0.5, fy=0.5)
# Get image center for rotation
center = (width // 2, height // 2)
angle = 45
# Create rotation matrix
rotation_matrix = cv2.getRotationMatrix2D(center, angle, 1.0)
# Apply rotation
rotated = cv2.warpAffine(image, rotation_matrix, (width, height))
The resize() function can take absolute dimensions or scale factors. warpAffine() applies affine transformations (rotation, scaling, translation) using a transformation matrix.
Blurring and Sharpening
Blurring removes noise and reduces detail. Different kernels create different effects:
# Gaussian blur: smooth, natural-looking
gaussian = cv2.GaussianBlur(image, (5, 5), 0)
# Median blur: removes salt-and-pepper noise
median = cv2.medianBlur(image, 5)
# Bilateral filter: smooths while preserving edges
bilateral = cv2.bilateralFilter(image, 9, 75, 75)
Gaussian blur works well for general smoothing. Median blur excels at removing random noise. Bilateral filters preserve edges, making them useful before edge detection.
For sharpening, create a custom kernel:
# Sharpening kernel
kernel = np.array([[-1, -1, -1],
                   [-1,  9, -1],
                   [-1, -1, -1]])
sharpened = cv2.filter2D(image, -1, kernel)
This kernel amplifies differences between a pixel and its neighbors, enhancing edges and details.
Step 3: Edge Detection and Contours
Edge detection finds boundaries where pixel intensity changes rapidly. These boundaries often correspond to object edges in the scene.
Canny Edge Detection
Canny edge detection is the gold standard. It uses multiple stages to detect strong edges while suppressing noise:
# Convert to grayscale first
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Apply Gaussian blur to reduce noise
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
# Canny edge detection
edges = cv2.Canny(blurred, 50, 150)
# Display result
plt.figure(figsize=(12, 5))
plt.subplot(121)
plt.imshow(gray, cmap='gray')
plt.title('Original Grayscale')
plt.axis('off')
plt.subplot(122)
plt.imshow(edges, cmap='gray')
plt.title('Canny Edges')
plt.axis('off')
plt.show()
The two numbers (50 and 150) are thresholds. Pixels with gradient values above 150 are strong edges. Values below 50 are discarded. Values between 50 and 150 are kept only if connected to strong edges.
Adjust these values based on your image. Higher thresholds detect fewer edges. Lower thresholds detect more but include noise.
Finding and Drawing Contours
Contours are continuous curves along edges. They’re useful for shape detection and object counting:
# Find contours
contours, hierarchy = cv2.findContours(edges,
                                       cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
# Create a copy to draw on
image_contours = image.copy()
# Draw all contours
cv2.drawContours(image_contours, contours, -1, (0, 255, 0), 2)
print(f"Found {len(contours)} contours")
# Filter by area
large_contours = [c for c in contours if cv2.contourArea(c) > 500]
print(f"Contours larger than 500 pixels: {len(large_contours)}")
RETR_EXTERNAL retrieves only outer contours. RETR_TREE gets all contours and their relationships. CHAIN_APPROX_SIMPLE compresses contours by removing redundant points.
You can analyze each contour:
for contour in large_contours:
    # Get bounding rectangle
    x, y, w, h = cv2.boundingRect(contour)
    cv2.rectangle(image_contours, (x, y), (x+w, y+h), (255, 0, 0), 2)
    # Calculate area and perimeter
    area = cv2.contourArea(contour)
    perimeter = cv2.arcLength(contour, True)
    print(f"Area: {area:.2f}, Perimeter: {perimeter:.2f}")
This forms the basis of object detection and measurement in images.
Step 4: Face Detection with Haar Cascades
Face detection demonstrates computer vision’s power. Haar Cascades are pre-trained classifiers that detect specific objects.
Loading the Classifier
OpenCV includes pre-trained cascades for faces, eyes, and smiles:
# Load the face cascade
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
)
# Load eye cascade (optional)
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_eye.xml'
)
# Verify cascades loaded
if face_cascade.empty():
    print("Error: Could not load face cascade")
    exit()
These XML files contain trained parameters. The face cascade works for frontal faces. Side profiles need different cascades.
Detecting Faces
# Read image
image = cv2.imread('people.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Detect faces
faces = face_cascade.detectMultiScale(
    gray,
    scaleFactor=1.1,
    minNeighbors=5,
    minSize=(30, 30)
)
print(f"Found {len(faces)} faces")
# Draw rectangles around faces
for (x, y, w, h) in faces:
    # Draw face rectangle
    cv2.rectangle(image, (x, y), (x+w, y+h), (255, 0, 0), 2)
    # Detect eyes within face region
    roi_gray = gray[y:y+h, x:x+w]
    eyes = eye_cascade.detectMultiScale(roi_gray, 1.1, 5)
    # Draw eye rectangles
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(image, (x+ex, y+ey), (x+ex+ew, y+ey+eh), (0, 255, 0), 2)
# Display result
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(10, 8))
plt.imshow(image_rgb)
plt.axis('off')
plt.show()
Parameters explained:
- scaleFactor: how much the image size is reduced at each scale (1.1 means a 10% reduction per step)
- minNeighbors: how many neighbors each candidate rectangle should have (higher values reduce false positives)
- minSize: the minimum face size to detect, in pixels
If you get too many false positives, increase minNeighbors. If you miss faces, decrease scaleFactor or minNeighbors.
Step 5: Real-Time Video Processing
Processing video means processing frames in sequence. Each frame is just an image, so you can apply any technique from previous steps.
Capturing from Webcam
# Open webcam (0 is usually the default camera)
cap = cv2.VideoCapture(0)
# Check if camera opened successfully
if not cap.isOpened():
    print("Error: Could not open camera")
    exit()
# Load face cascade
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
)
print("Press 'q' to quit")
while True:
    # Read frame
    ret, frame = cap.read()
    if not ret:
        print("Error: Could not read frame")
        break
    # Convert to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Detect faces
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    # Draw rectangles
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)
        cv2.putText(frame, 'Face', (x, y-10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (255, 0, 0), 2)
    # Display frame
    cv2.imshow('Face Detection', frame)
    # Break on 'q' key
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
# Clean up
cap.release()
cv2.destroyAllWindows()
waitKey(1) waits 1 millisecond for a key press between frames. The loop then runs as fast as the camera delivers frames and your per-frame processing allows — typically the camera's native rate, around 30 frames per second for most webcams.
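You can measure the frame rate you actually achieve with a timer. This sketch simulates the per-frame work so it runs without a camera; simulate_processing is a hypothetical stand-in for your real processing step (e.g., detectMultiScale):

```python
import time

def simulate_processing():
    # Stand-in for per-frame work; pretend each frame takes ~10 ms
    time.sleep(0.01)

num_frames = 20
start = time.perf_counter()
for _ in range(num_frames):
    simulate_processing()
elapsed = time.perf_counter() - start

fps = num_frames / elapsed
print(f"Processed {num_frames} frames in {elapsed:.2f}s -> {fps:.1f} FPS")
```

Dropping the same timing code into the webcam loop tells you whether your processing keeps up with the camera or is the bottleneck.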
Processing Video Files
The same code works for video files:
# Open video file instead of camera
cap = cv2.VideoCapture('video.mp4')
# Get video properties
fps = int(cap.get(cv2.CAP_PROP_FPS))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
print(f"Video: {width}x{height} at {fps} FPS")
# Optional: Save processed video
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter('output.mp4', fourcc, fps, (width, height))
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # Process frame (edge detection example)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    # Convert edges back to BGR for video writer
    edges_bgr = cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR)
    # Write to output file
    out.write(edges_bgr)
    # Display (optional)
    cv2.imshow('Processed', edges_bgr)
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break
cap.release()
out.release()
cv2.destroyAllWindows()
Processing video takes time. A 1-minute video at 30 FPS has 1,800 frames. Each frame needs processing.
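Quick back-of-the-envelope math helps you budget processing time. Assuming a hypothetical cost of 50 ms per frame:

```python
duration_s = 60       # a 1-minute clip
fps = 30
per_frame_ms = 50     # assumed processing cost per frame

total_frames = duration_s * fps
processing_time_s = total_frames * per_frame_ms / 1000

print(total_frames)       # 1800 frames
print(processing_time_s)  # 90.0 seconds -- longer than the clip itself
```

When the estimate exceeds the clip's duration, you can't process it in real time at full rate, which is exactly when frame skipping (shown below) becomes useful.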
Common Pitfalls
BGR vs RGB Color Space
OpenCV reads images in BGR format. Most other libraries (Matplotlib, PIL, TensorFlow) use RGB. Forgetting to convert causes weird colors:
# Wrong: Colors look wrong
plt.imshow(cv2.imread('image.jpg'))
# Right: Convert first
image = cv2.imread('image.jpg')
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image_rgb)
Image Not Loading
imread() returns None if the file doesn’t exist or can’t be read. Always check:
image = cv2.imread('file.jpg')
if image is None:
    print("Error: Could not load image")
    exit()
Common causes: wrong file path, unsupported format, file permissions.
Window Not Closing
When displaying images with imshow(), windows might freeze. Always use waitKey():
cv2.imshow('Image', image)
cv2.waitKey(0) # Wait for key press
cv2.destroyAllWindows()
waitKey(0) waits indefinitely. waitKey(1) waits 1 millisecond (useful in loops).
Memory Leaks in Video Processing
Video processing can consume memory fast. Release resources when done:
cap = cv2.VideoCapture('video.mp4')
# ... processing code ...
# Always release
cap.release()
cv2.destroyAllWindows()
For long videos, process in batches or use frame skipping:
frame_count = 0
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # Process every 5th frame
    if frame_count % 5 == 0:
        process_frame(frame)
    frame_count += 1
Haar Cascade False Positives
Face detection sometimes detects non-faces. Adjust parameters:
# More strict detection (fewer false positives)
faces = face_cascade.detectMultiScale(
    gray,
    scaleFactor=1.05,  # Smaller steps
    minNeighbors=8,    # More neighbors required
    minSize=(50, 50)   # Larger minimum size
)
For better accuracy, consider deep learning approaches such as OpenCV's DNN face detector (available through the cv2.dnn module) or third-party libraries like MTCNN.
Summary
You’ve learned the foundations of computer vision with OpenCV and Python. Starting with reading and displaying images, you progressed through transformations, edge detection, and face detection. The final step covered real-time video processing, showing how these techniques work on live camera feeds.
These skills apply across many domains. Security systems use face detection. Manufacturing uses edge detection for quality control. Medical imaging uses filters to enhance X-rays and MRIs. Self-driving cars use everything you learned here (and more) to navigate safely.
The OpenCV documentation covers each topic in more detail. Experiment with different parameters. Try combining techniques (blur before edge detection, detect faces then analyze eye movements). Computer vision becomes more powerful when you chain operations together.
Your next steps: Build a project. Maybe track objects in video, create a document scanner app, or build a gesture recognition system. The best way to learn is by solving real problems.