OVERFLOW TAG PREDICTOR
CONTENTS
● REAL BUSINESS PROBLEM
● BUSINESS OBJECTIVES & CONSTRAINTS
● DATA OVERVIEW
● TYPE OF MACHINE LEARNING PROBLEM
● PERFORMANCE METRICS
● ANALYSIS OF TAGS
● DATA PREPROCESSING
● FEATURIZATION
● CLASSIFIERS TO BE USED
REAL BUSINESS PROBLEM
Each question in the provided dataset has three segments: Title, Description and Tags.
Tags: The tags associated with the question, in a space-separated format
TYPE OF MACHINE LEARNING PROBLEM
Multi-class classification problem:
Multi-label Classification:
PERFORMANCE METRICS
We can use the F1 score here because it is high only when both Precision and Recall are high. For the multi-label setting, the F1 score is modified as:
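Two common multi-label variants are micro-averaged F1 (pool the true-positive, false-positive and false-negative counts over all tags, then compute one score) and macro-averaged F1 (compute F1 per tag, then take the plain average). The sketch below illustrates both in plain Python. The function names and the toy tag sets are hypothetical, chosen only for illustration, and which variant this project reports is an assumption.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """y_true and y_pred are lists of tag sets, one set per question."""
    tags = sorted(set().union(*y_true, *y_pred))
    # counts[tag] = [true positives, false positives, false negatives]
    counts = {tag: [0, 0, 0] for tag in tags}
    for true_tags, pred_tags in zip(y_true, y_pred):
        for tag in tags:
            if tag in pred_tags and tag in true_tags:
                counts[tag][0] += 1
            elif tag in pred_tags:
                counts[tag][1] += 1
            elif tag in true_tags:
                counts[tag][2] += 1
    # Micro: pool the counts across tags, then compute a single F1.
    totals = [sum(c[i] for c in counts.values()) for i in range(3)]
    micro = f1_from_counts(*totals)
    # Macro: compute F1 per tag, then average with equal weight per tag.
    macro = sum(f1_from_counts(*c) for c in counts.values()) / len(tags) if tags else 0.0
    return micro, macro


# Toy example (hypothetical data): three questions, three distinct tags.
y_true = [{"python", "pandas"}, {"java"}, {"python"}]
y_pred = [{"python"}, {"java", "python"}, {"python"}]
micro, macro = micro_macro_f1(y_true, y_pred)
```

Micro-averaging is dominated by frequent tags, while macro-averaging gives a rare tag the same weight as a common one, so the two can differ noticeably on a skewed tag distribution.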
DATA PREPROCESSING
iii. Removed special characters from the question title and description (not in code)
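A minimal sketch of the special-character removal step, assuming "special characters" means anything other than ASCII letters and digits. The function name `remove_special_characters` is hypothetical, and the exact character set used in the project is not stated in the slides.

```python
import re


def remove_special_characters(text):
    """Replace every run of non-alphanumeric characters with a single space.

    Assumption: only ASCII letters and digits are kept. Tag-like tokens such
    as "c++" or "c#" would lose their symbols under this rule.
    """
    return re.sub(r"[^A-Za-z0-9]+", " ", text).strip()


# Hypothetical question title used only to show the effect.
cleaned_title = remove_special_characters("How to parse JSON in Python?!")
```

Because this rule also strips the symbols in names like "c++", a real pipeline may want to whitelist a few extra characters.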
FEATURIZATION
Bag of Words:
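Bag of words represents each question as a vector of word counts over a fixed vocabulary, ignoring word order. The sketch below shows the idea in plain Python. The helper names and the toy questions are hypothetical, and whitespace tokenization with lowercasing is an assumption rather than the project's actual tokenizer.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct lowercase token to a column index (alphabetical order)."""
    tokens = sorted({word for doc in docs for word in doc.lower().split()})
    return {word: index for index, word in enumerate(tokens)}


def bag_of_words(doc, vocab):
    """Return the count of each vocabulary word in doc; unknown words are ignored."""
    counts = Counter(doc.lower().split())
    return [counts.get(word, 0) for word in vocab]


# Hypothetical, already-cleaned question texts.
docs = ["python list sort", "java list"]
vocab = build_vocabulary(docs)          # columns: java, list, python, sort
vectors = [bag_of_words(doc, vocab) for doc in docs]
new_vector = bag_of_words("list list python", vocab)
```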
CLASSIFIERS TO BE USED
Our One-vs-Rest classifier can wrap any base model
Preferred:
Logistic Regression
Not Preferred:
Random Forest
GBDT
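The One-vs-Rest idea can be sketched as follows: train one independent binary model per tag, each answering "does this question carry the tag or not", and combine their outputs into a label vector. This is an illustrative plain-Python sketch, not the project's implementation. `TinyLogisticRegression`, `OneVsRest` and the toy data are hypothetical names and values; in practice a library implementation would normally be used instead.

```python
import math


class TinyLogisticRegression:
    """Hypothetical minimal binary logistic regression (batch gradient descent)."""

    def __init__(self, lr=1.0, epochs=1000):
        self.lr = lr
        self.epochs = epochs
        self.w = []
        self.b = 0.0

    def _prob(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for x, target in zip(X, y):
                err = self._prob(x) - target  # gradient of log loss w.r.t. z
                for j in range(d):
                    grad_w[j] += err * x[j]
                grad_b += err
            self.w = [wj - self.lr * g / n for wj, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict(self, X):
        return [1 if self._prob(x) >= 0.5 else 0 for x in X]


class OneVsRest:
    """Trains one binary model per tag; make_model is any factory with fit/predict."""

    def __init__(self, make_model):
        self.make_model = make_model
        self.models = []

    def fit(self, X, Y):
        n_tags = len(Y[0])
        # Column t of Y is the binary target for tag t.
        self.models = [
            self.make_model().fit(X, [row[t] for row in Y]) for t in range(n_tags)
        ]
        return self

    def predict(self, X):
        per_tag = [model.predict(X) for model in self.models]
        return [list(row) for row in zip(*per_tag)]


# Toy data (hypothetical): bag-of-words columns are [python, pandas, java],
# label columns are the tags [python, java].
X = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]]
Y = [[1, 0], [1, 0], [0, 1], [1, 1]]
classifier = OneVsRest(TinyLogisticRegression).fit(X, Y)
predictions = classifier.predict(X)
```

Because the wrapper only relies on `fit` and `predict`, any binary model with that interface can be passed as `make_model`. Since one model is trained per tag, the training cost of the base model is multiplied by the number of tags.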