A trained machine learning model must be evaluated to estimate how it will perform on unseen, real-world data. The appropriate evaluation metrics depend on the problem type. For classification tasks, accuracy is popular because of its simplicity: it is the ratio of correctly predicted instances to the total number of instances. However, accuracy can be misleading when classes are imbalanced, since a model that always predicts the majority class can score highly while never identifying the minority class, so more detailed metrics such as precision and recall are also essential. Precision is the proportion of true positives among all instances predicted as positive, while recall is the proportion of true positives among all actual positives. The two metrics provide different insights: precision reflects how trustworthy the positive predictions are, while recall reflects how many of the actual positives the model captures. These metrics also extend beyond binary classification: in multi-class settings they can be computed per class and then averaged.
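To make the definitions concrete, here is a minimal sketch in plain Python that computes accuracy, precision, and recall from confusion counts. It assumes binary labels encoded as 0 and 1; the function name `binary_metrics` and the choice to return 0.0 when a denominator is zero are illustrative conventions, not something the text prescribes.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for 0/1 labels.

    Hypothetical helper for illustration; assumes both sequences
    have the same non-zero length and contain only 0 or 1.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Confusion counts: true/false positives and negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Assumed convention: report 0.0 when a metric is undefined
    # (no predicted positives, or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall}


if __name__ == "__main__":
    # Imbalanced toy data: one positive among ten instances.
    y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    # A model that always predicts the negative class.
    y_pred = [0] * 10
    print(binary_metrics(y_true, y_pred))
```

On this toy data the always-negative predictor is correct on nine of ten instances, yet its recall is zero because it never finds the single positive. This is the situation in which accuracy alone gives a misleading picture.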