Stream ESP32CAM video

The ESP32Cam is a tiny module that can stream video from its camera to a web browser. It’s a great way to easily add a POV (point-of-view) camera to your robotics project. The firmware that comes loaded on the ESP32Cam includes a web server, a self-hosted Wi-Fi hotspot and a built-in video streamer.

We’ll use the video streamer to capture the video stream in Python on a Raspberry Pi 5, and then process that video in realtime.


What is RTSP?

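The ESP32Cam video stream is often described as RTSP, which is explained below. Strictly speaking, though, the standard CameraWebServer firmware serves the stream used in this project (the http:// URL on port 81) as Motion JPEG (MJPEG) over HTTP: a continuous series of JPEG images sent in a single HTTP response. OpenCV opens either kind of stream in the same way, so the code in this project works regardless.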

RTSP stands for Real-Time Streaming Protocol. It’s a network protocol, a set of rules, used for controlling the streaming of audio and video data over the Internet in real-time. Think of it as a ‘remote control’ for live video feeds.


How Does RTSP Work?

  1. Connection: First, your device (like your computer or smartphone) contacts the server (where the video is stored or being broadcast from) using RTSP. It’s like dialing a phone number to start a call.

  2. Control Commands: Once connected, RTSP allows you to send control commands to the server. You can tell the server to do things like ‘play the video’, ‘pause’, ‘rewind’, or ‘fast forward’ – similar to how you use a remote control with your TV.

  3. Streaming: Unlike downloading a file, where you wait for the entire file to download before viewing, RTSP allows the video or audio to be played as it’s being transmitted. This is known as streaming.

  4. Separate Data Transport: RTSP itself doesn’t send the video or audio data. Instead, it works alongside other protocols (like RTP - Real-Time Transport Protocol) that handle the actual transmission of the audio and video data.
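To make the ‘remote control’ idea concrete, the sketch below only builds the text of RTSP/1.0 requests, following the message format in RFC 2326: a request line, header lines such as CSeq and Session, and a blank line, each ending in CRLF. It doesn't open a network connection, and build_rtsp_request, the rtsp:// URL and the session id are made-up examples for illustration.

```python
def build_rtsp_request(method, url, cseq, session=None, extra_headers=None):
    """Build the text of an RTSP/1.0 request (message layout from RFC 2326).

    RTSP requests look like HTTP: a request line, header lines and a blank
    line, each terminated by CRLF. Every request carries a CSeq number.
    """
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    if session is not None:
        # After SETUP the server assigns a Session id that later requests echo
        lines.append(f"Session: {session}")
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical stream URL and session id, for illustration only.
url = "rtsp://192.168.1.50:554/stream"
# DESCRIBE asks the server what the stream contains (as an SDP description)
describe = build_rtsp_request("DESCRIBE", url, 1, extra_headers={"Accept": "application/sdp"})
# A SETUP request (CSeq 2, with a Transport header) would normally come next; omitted here
play = build_rtsp_request("PLAY", url, 3, session="12345678")
pause = build_rtsp_request("PAUSE", url, 4, session="12345678")
print(describe)
```

The ‘play’ and ‘pause’ buttons of the remote control are just these small text messages; the video itself arrives separately over RTP.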


Why Use RTSP?

  • Live Control: RTSP is great for situations where you need real-time control over streaming, like in security camera feeds, live broadcasts, or video conferencing.

  • Efficiency: It’s well suited to live content because the media is delivered as it is produced (typically over RTP), which keeps delay low while still allowing interactive control over the stream.

  • Flexibility: RTSP supports various media types and can be used with different kinds of networks and devices.


We can use the cv2 (OpenCV) library in Python to capture the video stream from the ESP32Cam. We pass the URL of the ESP32Cam video stream to cv2.VideoCapture(), which returns a VideoCapture object; calling its read() method then gives us the video frames one at a time.

```python
import cv2

rtsp_url = 'http://192.168.4.1:81/stream'

# Capture the video stream
cap = cv2.VideoCapture(rtsp_url)
```
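cv2.VideoCapture hides the details of the stream. If your firmware serves the port 81 /stream URL as Motion JPEG over HTTP (the stock ESP32 CameraWebServer example does, as a multipart response of back-to-back JPEG images), the frames can be pulled out of the raw bytes by scanning for the JPEG start-of-image (FF D8) and end-of-image (FF D9) markers. This is a minimal sketch that works on a byte buffer only; split_jpeg_frames is a hypothetical helper, and it assumes the frames carry no embedded thumbnails (whose own end marker would confuse this simple scan).

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def split_jpeg_frames(buffer):
    """Return (complete_frames, leftover_bytes) found in an MJPEG byte buffer.

    Anything between frames (multipart boundaries, Content-Type headers) is
    skipped. Leftover bytes should be prepended to the next chunk read from
    the network, so a frame split across two chunks is not lost.
    """
    frames = []
    pos = 0
    while True:
        start = buffer.find(SOI, pos)
        if start == -1:
            # No more frame starts; keep a trailing 0xFF in case it is half of an SOI marker
            leftover = buffer[-1:] if buffer.endswith(b"\xff") else b""
            return frames, leftover
        end = buffer.find(EOI, start + 2)
        if end == -1:
            # Frame has started but not finished yet: keep it for the next chunk
            return frames, buffer[start:]
        frames.append(buffer[start:end + 2])
        pos = end + 2


# Synthetic example: tiny fake "JPEGs" wrapped in multipart-style text, with the
# second frame cut off as if the network chunk ended there.
chunk = (b"--frame\r\nContent-Type: image/jpeg\r\n\r\n" + b"\xff\xd8AAA\xff\xd9"
         + b"\r\n--frame\r\n\r\n" + b"\xff\xd8BB")
frames, leftover = split_jpeg_frames(chunk)
print(len(frames), leftover)
```

With a real stream, each complete frame could be decoded into an OpenCV image with cv2.imdecode(numpy.frombuffer(frame, numpy.uint8), cv2.IMREAD_COLOR).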

Detecting Objects in the Video Stream

Before we write a simple program to capture the video stream and process it, we need to set up a new Python environment on the Raspberry Pi 5. We’ll use the cvzone module to detect objects in the video stream. cvzone is a helper library built on top of OpenCV (cv2) and, for face detection, MediaPipe; it makes common detection tasks much easier.


Creating a virtual environment

```bash
python3 -m venv venv
source venv/bin/activate
```

Install CVZone

```bash
pip3 install cvzone mediapipe opencv-python
```

Face Detection

To detect faces, we’ll use the cvzone module and its FaceDetector class. Its findFaces() method detects the faces in each frame and returns the frame with the detections drawn on it, together with a list of the faces it found. The cv2.imshow() function then displays the video stream in a window on the screen, with a box around each detected face along with a percentage confidence score.
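A robot usually needs to pick one face to follow. In recent cvzone releases each entry in the list returned by findFaces() is a dictionary with keys such as "bbox" (x, y, width, height in pixels) and "center" (x, y); treat that layout as an assumption and print one entry to check your installed version. This sketch, with a hypothetical largest_face helper, chooses the detection with the biggest box, on the reasoning that it is probably the nearest person.

```python
def largest_face(faces):
    """Return the detection with the largest bounding-box area, or None.

    Assumes each detection is a dict whose "bbox" entry is (x, y, w, h),
    the layout cvzone's FaceDetector.findFaces() is assumed to return.
    """
    if not faces:
        return None
    # Area = width * height of each bounding box
    return max(faces, key=lambda face: face["bbox"][2] * face["bbox"][3])


# Made-up detections standing in for the list returned by findFaces().
faces = [
    {"id": 0, "bbox": (10, 20, 40, 40), "center": (30, 40)},
    {"id": 1, "bbox": (200, 50, 90, 100), "center": (245, 100)},
]
target = largest_face(faces)
print(target["id"], target["center"])
```

Inside the capture loop this would be called as face = largest_face(list_faces), skipping the frame when it returns None.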


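```python
# Main loop: cap and face_detector are created as in the full program at the end
while True:
    ret, frame = cap.read()
    if not ret:
        print("Can't receive frame (stream end?). Exiting ...")
        break

    # Detect faces and draw a box around each one
    frame, list_faces = face_detector.findFaces(frame)
    cv2.imshow("Face Detection", frame)

    # waitKey() gives the window time to redraw; press 'q' to quit
    if cv2.waitKey(1) == ord('q'):
        break
```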

Why this is cool

The ESP32Cam is low power, simple to set up and configure, and pretty low cost too (£0.57 each on AliExpress, plus shipping). By offloading the video processing to the Raspberry Pi 5, we don’t need to change the ESP32Cam firmware, and we can take advantage of the Pi 5’s much greater processing power for image processing.


What else can we do?

We can use the image data to decide how to control the robot remotely, for example making it move towards objects or turn to look at a face.
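As a starting point for ‘look at a face’, here’s a minimal sketch that turns a face’s horizontal position into a left/right/straight decision. The steer_towards function and its 10% dead zone are illustrative choices, not part of the original project, and sending the result to the robot’s motors depends on your hardware, so it is left out. The frame width should come from frame.shape[1], since the ESP32Cam resolution is configurable.

```python
def steer_towards(center_x, frame_width, dead_zone=0.1):
    """Decide which way to turn so a target at center_x ends up centred.

    dead_zone is the fraction of the half-width either side of the middle
    that counts as "close enough", so the robot doesn't twitch back and forth.
    """
    # Offset from the frame centre, scaled to -1.0 (far left) .. +1.0 (far right)
    half_width = frame_width / 2
    offset = (center_x - half_width) / half_width
    if offset < -dead_zone:
        return "left"
    if offset > dead_zone:
        return "right"
    return "straight"


# Example: a 640-pixel-wide frame with the face centre at x = 245
print(steer_towards(245, 640))
```

A wider dead zone makes the robot calmer but less precise; a narrower one tracks more tightly but may oscillate.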



Bill of Materials

| Item | Description | Price per item | Qty | Cost |
|------|-------------|----------------|-----|------|
| ESP32CAM | ESP32CAM Module | £0.57 | 1 | £0.57 |

Python code

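```python
import cv2
from cvzone import FaceDetectionModule

# Face detector from cvzone (uses MediaPipe under the hood)
face_detector = FaceDetectionModule.FaceDetector()

# Replace with your ESP32Cam stream URL
rtsp_url = 'http://192.168.4.1:81/stream'

# Open the video stream and stop early if it can't be reached
cap = cv2.VideoCapture(rtsp_url)
if not cap.isOpened():
    raise SystemExit(f"Could not open stream at {rtsp_url}")

while True:
    # Grab the next frame; ret is False if no frame arrived
    ret, frame = cap.read()
    if not ret:
        print("Can't receive frame (stream end?). Exiting ...")
        break

    # Detect faces; findFaces() returns the frame with the boxes drawn on it
    frame, list_faces = face_detector.findFaces(frame)
    cv2.imshow("Face Detection", frame)

    # waitKey() lets the window refresh; press 'q' to quit
    if cv2.waitKey(1) == ord('q'):
        break

# Release the stream and close the window when finished
cap.release()
cv2.destroyAllWindows()
```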
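To check whether the Pi 5 is really keeping up with the stream in real time, it helps to measure frames per second inside the loop. This is a minimal sketch of a rolling FPS counter; FpsMeter and its window size are hypothetical additions, not part of the original program.

```python
import time
from collections import deque


class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window=30):
        # Timestamps of recent frames; the oldest drop off automatically
        self.times = deque(maxlen=window)

    def tick(self, now=None):
        """Record one frame and return the current FPS estimate (0.0 until known)."""
        self.times.append(time.perf_counter() if now is None else now)
        if len(self.times) < 2:
            return 0.0
        elapsed = self.times[-1] - self.times[0]
        # N timestamps span N - 1 frame intervals
        return (len(self.times) - 1) / elapsed if elapsed > 0 else 0.0


# Example with made-up timestamps 0.1 s apart, i.e. 10 frames per second
meter = FpsMeter(window=5)
for t in [0.0, 0.1, 0.2, 0.3]:
    fps = meter.tick(now=t)
print(round(fps, 1))
```

In the main loop you would call fps = meter.tick() once per frame and could draw it with cv2.putText(frame, f"{fps:.1f} FPS", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2) before cv2.imshow().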

Did you find this content useful?


If you found this high-quality content useful, please consider supporting my work so I can continue to create more content for you.

I give away all my content for free: weekly video content on YouTube, 3D printable designs, programs and code, reviews and project write-ups. But 98% of visitors don't give back; they simply read or watch, download and go. If everyone who reads or watches my content and likes it helps fund it just a little, my future would be more secure for years to come. The price of a cup of coffee is all I ask.


Thank you again for your support and helping me grow my hobby into a business I can sustain.
- Kevin McAleer