
I'm trying to calculate the ROC curve for a set of predictions like this:

 fpr, tpr, thresholds = roc_curve(y_test, probas)

Here is the y_test array

 array([-10.54, -9.49, -9.4, -9.37, -9.36, -9.31, -9.28, -9.14, -9.11,
       -9.03, -9.01, -9.0, -8.99, -8.98, -8.96, -8.91, -8.9, -8.9, -8.9,
       -8.89, -8.88, -8.86, -8.86, -8.84, -8.83, -8.78, -8.76, -8.74,
       -8.74, -8.69, -8.69, -8.69, -8.67, -8.64, -8.61, -8.57, -8.51, -8.5,
       -8.49, -8.48, -8.4, -8.34, -8.33, -8.3, -8.29, -8.29, -8.27, -8.26,
       -8.25, -8.22, -8.15, -8.12, -8.1, -8.08, -8.04, -8.04, -7.96, -7.94,
       -7.94, -7.85, -7.83, -7.82, -7.82, -7.81, -7.76, -7.74, -7.71,
       -7.65, -7.57, -7.54, -7.47, -7.4, -7.39, -7.34, -7.33, -7.32, -7.27,
       -7.23, -7.16, -7.08, -7.05, -6.92, -6.9, -6.89, -6.86, -6.86, -6.83,
       -6.78, -6.73, -6.69, -6.59, -6.57, -6.4, -6.37, -6.21, -6.19, -6.16,
       -6.04, -6.04, -5.57, -5.54, -5.35, -5.24, -5.0, -4.92], dtype=object)

And here is the probas array

 array([1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=object)

Now when I try to run

fpr, tpr, thresholds = roc_curve(y_test, probas)

I get a ValueError

--> 318     raise ValueError("{0} format is not supported".format(y_type))
    319
    320     check_consistent_length(y_true, y_score, sample_weight)

ValueError: continuous format is not supported

How can I solve this?

  • Those aren't probabilities. Please give us the code of the classifier. Commented Sep 21, 2018 at 11:00
  • They aren't probabilities, they are protein/ligand docking scores (Gibbs free energy). Maybe I should have made that clear, sorry. Commented Sep 21, 2018 at 11:02

1 Answer

It looks like you switched the target scores and the binary labels. As per the scikit-learn documentation, the first argument of roc_curve (y_true) is the binary labels, taking values in {0, 1}, and the second argument (y_score) is the target scores. You were passing y_test, which holds the continuous docking scores, as the binary labels, and probas, which holds the 0/1 labels, as the target scores. That is why the error complains about a continuous format. I also had to remove dtype=object from your arrays to make it work. The working version follows.
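To see why the original call fails, it can help to ask scikit-learn how it classifies each array. The sketch below is only an illustration: `docking_scores` and `is_active` are hypothetical names holding the first few values from the question, and `type_of_target` is used here on the assumption that it is the same check that produces the "continuous format is not supported" message.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Shortened stand-ins for the question's arrays (hypothetical names).
docking_scores = np.array([-10.54, -9.49, -9.4, -9.37, -9.36, -9.31], dtype=object)
is_active = np.array([1, 1, 1, 1, 0, 0], dtype=object)

# Cast away dtype=object so scikit-learn sees plain numeric arrays.
docking_scores = docking_scores.astype(float)
is_active = is_active.astype(int)

# Only a binary target is accepted as the first argument of roc_curve.
scores_kind = type_of_target(docking_scores)
labels_kind = type_of_target(is_active)
print(scores_kind, labels_kind)
```

The float scores are reported as a continuous target and the 0/1 array as a binary one, so only the latter can go in the first position of roc_curve.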

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

y_test = np.asarray([-10.54, -9.49, -9.4, -9.37, -9.36, -9.31, -9.28, -9.14, -9.11, -9.03, -9.01, -9.0, -8.99, -8.98, -8.96, -8.91, -8.9, -8.9, -8.9, -8.89, -8.88, -8.86, -8.86, -8.84, -8.83, -8.78, -8.76, -8.74, -8.74, -8.69, -8.69, -8.69, -8.67, -8.64, -8.61, -8.57, -8.51, -8.5, -8.49, -8.48, -8.4, -8.34, -8.33, -8.3, -8.29, -8.29, -8.27, -8.26, -8.25, -8.22, -8.15, -8.12, -8.1, -8.08, -8.04, -8.04, -7.96, -7.94, -7.94, -7.85, -7.83, -7.82, -7.82, -7.81, -7.76, -7.74, -7.71, -7.65, -7.57, -7.54, -7.47, -7.4, -7.39, -7.34, -7.33, -7.32, -7.27, -7.23, -7.16, -7.08, -7.05, -6.92, -6.9, -6.89, -6.86, -6.86, -6.83, -6.78, -6.73, -6.69, -6.59, -6.57, -6.4, -6.37, -6.21, -6.19, -6.16, -6.04, -6.04, -5.57, -5.54, -5.35, -5.24, -5.0, -4.92])
probas = np.asarray([1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
# Binary labels first, continuous scores second (despite the variable names)
fpr, tpr, thresholds = roc_curve(probas, y_test)
plt.plot(fpr, label='fpr')
plt.plot(tpr, label='tpr')
plt.legend(fontsize=16)
plt.show()

Output

[Image: line plot of the fpr and tpr arrays]
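One further point may matter for docking scores. roc_curve treats larger scores as stronger evidence for the positive class, whereas in the question's data the actives sit at the most negative energies. The following is a minimal sketch, assuming that a more negative docking score means a stronger predicted binder. It uses the first twelve values from the question under the hypothetical names `docking_scores` and `is_active`, and compares the AUC with and without flipping the sign.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# First twelve values from the question's arrays (a shortened sample).
docking_scores = np.array([-10.54, -9.49, -9.4, -9.37, -9.36, -9.31,
                           -9.28, -9.14, -9.11, -9.03, -9.01, -9.0])
is_active = np.array([1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0])

# Assumption: more negative energy = more likely active, so flip the sign
# to make "larger score" mean "more positive", as roc_curve expects.
ranking_scores = -docking_scores

fpr, tpr, thresholds = roc_curve(is_active, ranking_scores)
auc_flipped = roc_auc_score(is_active, ranking_scores)
auc_raw = roc_auc_score(is_active, docking_scores)
print(f"AUC with flipped sign: {auc_flipped:.3f}")
print(f"AUC with raw scores:   {auc_raw:.3f}")
```

On this sample the flipped scores give an AUC above 0.5 and the raw scores give its mirror image below 0.5. To draw the ROC curve itself, plot tpr against fpr, for example with plt.plot(fpr, tpr), rather than plotting each array against its index.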
