Why does my XGBoost classifier reach perfect accuracy despite dropping correlated features?


I am trying to do binary classification on ticket cancellation data from Kaggle.

I know this question has been asked before, for example here and here.

Summary of what I learned in those references:

Unbalanced data: this can happen if the classes are unbalanced.
Data leakage: one of the input features is actually a direct proxy for the target variable.

My data is unbalanced but not extremely unbalanced. Since this is binary classification:

y.sum()/len(y) = 0.151
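
As a reference point, imbalance at this level would not by itself produce a perfect score: a model that always predicts the majority class would be right about 1 - 0.151 = 0.849 of the time. A minimal sketch of that baseline follows; the label list is a synthetic stand-in built to match the 0.151 rate, not the actual Kaggle data, and `majority_baseline_accuracy` is a name made up for illustration.

```python
def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class (0/1 labels)."""
    positive_rate = sum(labels) / len(labels)
    return max(positive_rate, 1.0 - positive_rate)


# Synthetic stand-in for y: 151 positives out of 1000 rows (assumed, not the real data).
y_example = [1] * 151 + [0] * 849
print(majority_baseline_accuracy(y_example))
```

Any accuracy well above this baseline has to come from the features, which is why leakage is the more likely suspect.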

Thus I have about 15% in one category. This is imbalanced but not extreme. To check for data leakage, I looked at the correlation matrix, which is as follows:

The feature importance is

The target variable is "Cancel". None of the variables has an extremely high correlation with it. The model is:

   model = XGBClassifier(objective='multi:softmax', num_class=3)

Yet my classification report is perfect:

              precision    recall  f1-score   support

           0       1.00      1.00      1.00     21455
           1       1.00      1.00      1.00      3781

    accuracy                           1.00     25236
   macro avg       1.00      1.00      1.00     25236
weighted avg       1.00      1.00      1.00     25236
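
A modest linear correlation does not rule out leakage: a feature can separate the classes exactly at some threshold while its correlation coefficient stays unremarkable. One way to probe this is to score each feature on its own. The following is a minimal, dependency-free sketch that finds the best one-threshold rule for a single feature. The column names `refund_amount` and `party_size` and their values are hypothetical toy data, not columns from the actual dataset.

```python
def best_stump_accuracy(values, labels):
    """Best accuracy of any single-threshold rule on one feature (0/1 labels)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(label for _, label in pairs)
    # Start from the constant prediction (always the majority class).
    best = max(total_pos, n - total_pos)
    pos_left = 0
    for i, (value, label) in enumerate(pairs):
        pos_left += label
        # Only place a threshold between two distinct feature values.
        if i + 1 < n and pairs[i + 1][0] == value:
            continue
        left_size = i + 1
        neg_left = left_size - pos_left
        pos_right = total_pos - pos_left
        neg_right = (n - left_size) - pos_right
        # Either "x > t predicts 1" or "x <= t predicts 1".
        best = max(best, neg_left + pos_right, pos_left + neg_right)
    return best / n


# Hypothetical toy data: eight bookings, two of them cancelled.
cancel = [0, 0, 0, 0, 0, 0, 1, 1]
refund_amount = [0, 0, 0, 0, 0, 0, 5, 900]   # only non-zero for cancelled rows
party_size = [1, 2, 3, 4, 1, 2, 3, 4]        # unrelated to the target

for name, column in [("refund_amount", refund_amount), ("party_size", party_size)]:
    print(name, best_stump_accuracy(column, cancel))
```

A feature that scores close to 1.0 on its own is a candidate leak. On real data the rule should be scored on the held-out split rather than on the rows used to pick the threshold.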

How can I solve this?
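
Another cheap check, sketched under the assumption that rows can be compared as tuples of feature values: count how many test rows also appear verbatim in the training rows, since records duplicated across the split can inflate held-out scores. The rows below are made-up examples and `overlap_fraction` is a hypothetical helper name.

```python
def overlap_fraction(train_rows, test_rows):
    """Fraction of test rows whose feature values also occur among the training rows."""
    if not test_rows:
        return 0.0
    seen = {tuple(row) for row in train_rows}
    hits = sum(1 for row in test_rows if tuple(row) in seen)
    return hits / len(test_rows)


# Made-up rows: (customer id, price, booking channel).
train = [(1, 20.0, "web"), (2, 35.5, "app"), (3, 12.0, "web")]
test = [(2, 35.5, "app"), (4, 50.0, "web")]
print(overlap_fraction(train, test))
```

A fraction near zero suggests the split itself is clean, which points back at the features.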

asked Oct 7, 2023 at 15:05

wander95


Is this classification report calculated using a separate test set?
– usεr11852, commented Oct 18, 2023 at 20:37
Yes, on the test portion of a train/test split.
– wander95, commented Oct 19, 2023 at 0:36
