Annotab Blog

Tl;dr

A step-by-step introduction to mean average precision and how it works.

Tl;dr

A step-by-step introduction to mean average precision and how it works.

Tl;dr

A step-by-step introduction to mean average precision and how it works.

Tl;dr

A step-by-step introduction to mean average precision and how it works.

Introduction

As a newbie to the field of deep learning, you embark on your initial step in object detection. Learning the ropes of building an object detection model can be a challenging endeavor. After investing significant effort in understanding the techniques involved, you eventually manage to successfully implement them. The sense of accomplishment fills you with joy, but there's still an essential aspect you cannot overlook: Evaluating the model's performance! The widely recognized metric for this purpose is mean Average Precision (mAP). This metric plays an essential role in assessing the effectiveness of object detection models.

You tried to read many documents about mAP, but... you felt concerned about something not being clear and perhaps got lost in it. Although it is difficult, you haven't given up and want to continue. Don't worry! You're not flying solo! This article aims to help you understand this topic from scratch: from what mean average precision is to how it works. Let’s get started!

Introduction

As a newbie to the field of deep learning, you embark on your initial step in object detection. Learning the ropes of building an object detection model can be a challenging endeavor. After investing significant effort in understanding the techniques involved, you eventually manage to successfully implement them. The sense of accomplishment fills you with joy, but there's still an essential aspect you cannot overlook: Evaluating the model's performance! The widely recognized metric for this purpose is mean Average Precision (mAP). This metric plays an essential role in assessing the effectiveness of object detection models.

You tried to read many documents about mAP, but... you felt concerned about something not being clear and perhaps got lost in it. Although it is difficult, you haven't given up and want to continue. Don't worry! You're not flying solo! This article aims to help you understand this topic from scratch: from what mean average precision is to how it works. Let’s get started!

Introduction

As a newbie to the field of deep learning, you embark on your initial step in object detection. Learning the ropes of building an object detection model can be a challenging endeavor. After investing significant effort in understanding the techniques involved, you eventually manage to successfully implement them. The sense of accomplishment fills you with joy, but there's still an essential aspect you cannot overlook: Evaluating the model's performance! The widely recognized metric for this purpose is mean Average Precision (mAP). This metric plays an essential role in assessing the effectiveness of object detection models.

You tried to read many documents about mAP, but... you felt concerned about something not being clear and perhaps got lost in it. Although it is difficult, you haven't given up and want to continue. Don't worry! You're not flying solo! This article aims to help you understand this topic from scratch: from what mean average precision is to how it works. Let’s get started!

Introduction

As a newbie to the field of deep learning, you embark on your initial step in object detection. Learning the ropes of building an object detection model can be a challenging endeavor. After investing significant effort in understanding the techniques involved, you eventually manage to successfully implement them. The sense of accomplishment fills you with joy, but there's still an essential aspect you cannot overlook: Evaluating the model's performance! The widely recognized metric for this purpose is mean Average Precision (mAP). This metric plays an essential role in assessing the effectiveness of object detection models.

You tried to read many documents about mAP, but... you felt concerned about something not being clear and perhaps got lost in it. Although it is difficult, you haven't given up and want to continue. Don't worry! You're not flying solo! This article aims to help you understand this topic from scratch: from what mean average precision is to how it works. Let’s get started!

What is object detection?

Now, let’s take an overview of object detection and its applications. Object detection is a computer vision technique that combines two tasks: Localizing and identifying objects of interest in images or video frames.

Localizing is the process of determining the location or coordinate of objects. In other words, it answers the question: "Where is the object in the image?".
Identifying is the task of recognizing or classifying the object once it has been localized. It addresses the inquiry: "What is the object? What category or class does it belong to?"

Object detection is a fundamental component of the visual problem in AI and this technology has a wide range of applications:

Self-driving: This is an object detection-based method. Information is gathered from the surrounding environment by various sensors such as cameras, LiDAR, radar, GPS, temperature, and weather sensors,… These sensors provide crucial inputs: other vehicles, pedestrians, traffic signs, and obstacles,… Object detection allows autonomous vehicles to perceive, and decision-making systems are built upon this perception, enabling them to interact with the environment effectively.
Security: Object detection plays a crucial role in this application. Firstly, it is used for access control through face recognition. In detail, it identifies criminals, unauthorized accesses, or suspicious activities through CCTV cameras. Furthermore, it is useful for the early detection of fire and smoke, along with the analysis of crowd behavior in public spaces. In summary, object detection strengthens security systems to identify and respond to potential threats swiftly and effectively.
Retail: Object detection enhances the management process, with one popular application being cashier-less systems. This function allows customers to self-checkout, where products are automatically identified and the bill is also calculated. Another benefit is checking the effective placement of products, including identifying products that are out of stock or incorrectly positioned.
Healthcare: The well-known usage is face mask recognition during the COVID-19 pandemic. Additionally, object detection is not only used for detecting harmful areas such as tumors or disease regions but is also valuable for tracking the movement of objects of interest. This technology aids radiologists and clinicians in faster and more accurate disease detection and diagnosis. Moreover, object detection can monitor patients and provide early fall detection.

Figure 1: Example of applications of object detection in self-driving, healthcare and retail

Before we dive deep into mAP, let's thoroughly explore some related definitions: confusion matrix and Intersection over Union (IoU).

What is object detection?

Now, let’s take an overview of object detection and its applications. Object detection is a computer vision technique that combines two tasks: Localizing and identifying objects of interest in images or video frames.

Localizing is the process of determining the location or coordinate of objects. In other words, it answers the question: "Where is the object in the image?".
Identifying is the task of recognizing or classifying the object once it has been localized. It addresses the inquiry: "What is the object? What category or class does it belong to?"

Object detection is a fundamental component of the visual problem in AI and this technology has a wide range of applications:

Self-driving: This is an object detection-based method. Information is gathered from the surrounding environment by various sensors such as cameras, LiDAR, radar, GPS, temperature, and weather sensors,… These sensors provide crucial inputs: other vehicles, pedestrians, traffic signs, and obstacles,… Object detection allows autonomous vehicles to perceive, and decision-making systems are built upon this perception, enabling them to interact with the environment effectively.
Security: Object detection plays a crucial role in this application. Firstly, it is used for access control through face recognition. In detail, it identifies criminals, unauthorized accesses, or suspicious activities through CCTV cameras. Furthermore, it is useful for the early detection of fire and smoke, along with the analysis of crowd behavior in public spaces. In summary, object detection strengthens security systems to identify and respond to potential threats swiftly and effectively.
Retail: Object detection enhances the management process, with one popular application being cashier-less systems. This function allows customers to self-checkout, where products are automatically identified and the bill is also calculated. Another benefit is checking the effective placement of products, including identifying products that are out of stock or incorrectly positioned.
Healthcare: The well-known usage is face mask recognition during the COVID-19 pandemic. Additionally, object detection is not only used for detecting harmful areas such as tumors or disease regions but is also valuable for tracking the movement of objects of interest. This technology aids radiologists and clinicians in faster and more accurate disease detection and diagnosis. Moreover, object detection can monitor patients and provide early fall detection.

Figure 1: Example of applications of object detection in self-driving, healthcare and retail

Before we dive deep into mAP, let's thoroughly explore some related definitions: confusion matrix and Intersection over Union (IoU).

What is object detection?

Now, let’s take an overview of object detection and its applications. Object detection is a computer vision technique that combines two tasks: Localizing and identifying objects of interest in images or video frames.

Localizing is the process of determining the location or coordinate of objects. In other words, it answers the question: "Where is the object in the image?".
Identifying is the task of recognizing or classifying the object once it has been localized. It addresses the inquiry: "What is the object? What category or class does it belong to?"

Object detection is a fundamental component of the visual problem in AI and this technology has a wide range of applications:

Self-driving: This is an object detection-based method. Information is gathered from the surrounding environment by various sensors such as cameras, LiDAR, radar, GPS, temperature, and weather sensors,… These sensors provide crucial inputs: other vehicles, pedestrians, traffic signs, and obstacles,… Object detection allows autonomous vehicles to perceive, and decision-making systems are built upon this perception, enabling them to interact with the environment effectively.
Security: Object detection plays a crucial role in this application. Firstly, it is used for access control through face recognition. In detail, it identifies criminals, unauthorized accesses, or suspicious activities through CCTV cameras. Furthermore, it is useful for the early detection of fire and smoke, along with the analysis of crowd behavior in public spaces. In summary, object detection strengthens security systems to identify and respond to potential threats swiftly and effectively.
Retail: Object detection enhances the management process, with one popular application being cashier-less systems. This function allows customers to self-checkout, where products are automatically identified and the bill is also calculated. Another benefit is checking the effective placement of products, including identifying products that are out of stock or incorrectly positioned.
Healthcare: The well-known usage is face mask recognition during the COVID-19 pandemic. Additionally, object detection is not only used for detecting harmful areas such as tumors or disease regions but is also valuable for tracking the movement of objects of interest. This technology aids radiologists and clinicians in faster and more accurate disease detection and diagnosis. Moreover, object detection can monitor patients and provide early fall detection.

Figure 1: Example of applications of object detection in self-driving, healthcare and retail

Before we dive deep into mAP, let's thoroughly explore some related definitions: confusion matrix and Intersection over Union (IoU).

What is object detection?

Now, let’s take an overview of object detection and its applications. Object detection is a computer vision technique that combines two tasks: Localizing and identifying objects of interest in images or video frames.

Localizing is the process of determining the location or coordinate of objects. In other words, it answers the question: "Where is the object in the image?".
Identifying is the task of recognizing or classifying the object once it has been localized. It addresses the inquiry: "What is the object? What category or class does it belong to?"

Object detection is a fundamental component of the visual problem in AI and this technology has a wide range of applications:

Self-driving: This is an object detection-based method. Information is gathered from the surrounding environment by various sensors such as cameras, LiDAR, radar, GPS, temperature, and weather sensors,… These sensors provide crucial inputs: other vehicles, pedestrians, traffic signs, and obstacles,… Object detection allows autonomous vehicles to perceive, and decision-making systems are built upon this perception, enabling them to interact with the environment effectively.
Security: Object detection plays a crucial role in this application. Firstly, it is used for access control through face recognition. In detail, it identifies criminals, unauthorized accesses, or suspicious activities through CCTV cameras. Furthermore, it is useful for the early detection of fire and smoke, along with the analysis of crowd behavior in public spaces. In summary, object detection strengthens security systems to identify and respond to potential threats swiftly and effectively.
Retail: Object detection enhances the management process, with one popular application being cashier-less systems. This function allows customers to self-checkout, where products are automatically identified and the bill is also calculated. Another benefit is checking the effective placement of products, including identifying products that are out of stock or incorrectly positioned.
Healthcare: The well-known usage is face mask recognition during the COVID-19 pandemic. Additionally, object detection is not only used for detecting harmful areas such as tumors or disease regions but is also valuable for tracking the movement of objects of interest. This technology aids radiologists and clinicians in faster and more accurate disease detection and diagnosis. Moreover, object detection can monitor patients and provide early fall detection.

Figure 1: Example of applications of object detection in self-driving, healthcare and retail

Before we dive deep into mAP, let's thoroughly explore some related definitions: confusion matrix and Intersection over Union (IoU).

Clarification of the confusion in confusion matrix

Ground truths and predictions

In most cases, you need a labeled dataset to build an object detection model. Labeling involves annotating each image in the dataset to specify the location and category (a class) of interested objects you want the model to detect. This information is crucial for training the model because it allows the model to learn patterns and features associated with different objects‘s categories and their positions. Bounding box is one of the types of annotations commonly used in object detection. This is a rectangular area of an image containing the object of the class to be detected. It is called a ‘ground truth bounding box’ or ‘actual bounding box’, simply are ‘ground truth box’ or ‘actual box’.

The output of detection model consists of a bounding box, a class, and a confidence value. The predicted bounding box made by model is simply named ‘predicted box’.

In Figure 3, these green boxes are actual boxes and pink boxes are predicted boxes made by the model.

Precision and Recall

First of all, let’s talk about “positive” and “negative” in the context of object detection.

“Positive” typically represents to object of interest – it means a class we want to detect. When we have multiple classes, each class is considered “positive” sequentially, and the rest are considered “negative”.
“Negative" refers to a bounding box or region that does not contain the target class. It can be other classes or regions that do not cover the target class. Simply, negatives are the rest of the image except for the positives.

Following these definitions, "positive" and "negative" help clarify the difference between detecting the object of interest and what is not the object in object detection scenarios.

True positive: The detection is identified as positive and it is truly positive.
False positive: The detection is identified as positive but it's actually a negative.
True negative: The detection is identified as negative and it is truly negative.
False negative: The detection is identified as negative but it's actually a positive.

Figure 2: Visualization of True positive, False positive, True negative, False negative

Precision and recall are typically calculated for each class separately. This allows you to evaluate the model's performance for each individual class in terms of its ability to correctly detect and classify objects of that class.

Precision is the ratio of true positive detections (TP) to the total number of positive detections the model made (TP+FP). It measures how accurate detection is. i.e. the percentage of detections that are correct.

Recall is the ratio of true positive detections (TP) to the total number of all actual positives (all actual positives = total number of ground truth = TP + FN). On the other hand, Recall measures how well the model finds all the positives.

Ok, let’s take a basic example. Assume we have a dog detection model, and we evaluate it on this image:

Figure 3: Predicted boxes and ground truth boxes by dog detection model

+ Number of true positives = 4

+ Number of false positives = 1

+ Number of false negative = 0

As you can see, while the Recall is 1, the Precision is 0.8. It means the model succeeds in detecting all the dogs, but it is also incorrect in detecting some cases.

Intersection over Union (IoU)

The IoU measures the overlap between the predicted bounding box (the region the model believes contains the object) and the ground truth bounding box (the actual region where the object is located).

It is calculated as the ratio of the area of intersection (the overlapping region) to the area of the union of the predicted and ground truth bounding boxes.

Figure 4: Intersection over Union

The value of IoU ranges from 0 to 1, where:

+ IoU = 0: No overlapping between the predicted and ground truth bounding box.

+ IoU ≈ 1: Perfect matching between the predicted and ground truth bounding boxes, meaning they nearly completely overlap.

In section 2, we discuss precision and recall without the context of IoU. However, object detection requires to use the IoU value as a measurement when identifying true positives and false positives. In general, the common value used for IoU is 0.5, 0.75 or 0.9,… depending on individual purposes.

For example, when you choose an IoU value of 0.5, it means that only the predicted bounding boxes whose IoU with their respective ground truth bounding boxes is equal to or greater than 0.5 are considered true positives. If the IoU is smaller than 0.5, they are considered as false positives.

Figure 5: Example of determining true positives based on IoU

Look at the Husky’s image, assume the IoU of the predicted box and ground truth box is 0.72.

If the IoU threshold is 0.7, this predicted box is considered true positive. But if we choose the IoU threshold as 0.75, it becomes a false positive.

So, based on the definition of object detection, when a predicted bounding box is ‘false positive’, it can be:

+ Correctly predict the category or class it belongs to, but have an IoU with its ground truth box smaller than the IoU threshold (localization issue)

+ Have an IoU with its ground truth box greater than the IoU threshold but incorrectly predict its category or class (identification issue).