TypeError: Singleton array while trying to split the dataset in Python using train_test_split()

Question

This is the format of the dataset: (image not shown)

This is my code:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

#Importing the dataset
dataset1 = pd.read_csv('DATASETS/movielens movie recommender/ml-25m/ratings.csv')

#Splitting into dependent and independent variables
X1 = dataset1.iloc[:,[0,3]].values
y1 = dataset1.iloc[:, 1:3].values

#Encoding
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0,1])], remainder='passthrough')
ct2 = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
y1 = np.array(ct.fit_transform(y1))
X1 = np.array(ct2.fit_transform(X1))


#Splitting into training set and test set
from sklearn.model_selection import train_test_split
X1_train, X1_test, y1_train, y1_test = train_test_split(X1, y1, test_size = 0.2, random_state = 1)

I get the following error:

TypeError: Singleton array array(<25000095x162542 sparse matrix of type '<class 'numpy.float64'>'
    with 50000190 stored elements in Compressed Sparse Row format>,
      dtype=object) cannot be considered a valid collection.

Could someone tell me what this means and how I could solve it?

Can you try this? ColumnTransformer(transformers=[('encoder', OneHotEncoder(sparse=False), [0,1])], remainder='passthrough')
– erentknn, Commented Oct 25, 2020 at 8:44

Slava Rozhnev · Accepted Answer · 2021-07-21 13:15:21Z

2

Instead of this:

y1 = np.array(ct.fit_transform(y1))
X1 = np.array(ct2.fit_transform(X1))

you can use:

y1 = ct.fit_transform(y1).toarray()
X1 = ct2.fit_transform(X1).toarray()

It works for me!
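A note on why the error appears, with a small sketch. OneHotEncoder returns a SciPy sparse matrix by default, and np.array() does not densify it: it wraps the whole matrix in a 0-dimensional object array, which is the "singleton array" that train_test_split rejects. Calling .toarray() avoids that, but for the shape in the question (25000095 x 162542 float64 values) a dense copy would need roughly 32 TB, so it is unlikely to fit in memory. Since train_test_split accepts sparse matrices, another option may be to skip the conversion entirely. The data below is a made-up stand-in for the ratings columns, not the real file.

```python
import numpy as np
from scipy import sparse
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the categorical columns in the question.
ids = np.array([[1], [2], [3], [1], [2], [3], [1], [2], [3], [1]])
labels = np.array([[10], [20], [10], [20], [10], [20], [10], [20], [10], [20]])

# OneHotEncoder produces a SciPy sparse matrix by default.
X_sparse = OneHotEncoder().fit_transform(ids)
y_sparse = OneHotEncoder().fit_transform(labels)

# np.array() wraps the sparse matrix in a 0-d object array instead of densifying it.
wrapped = np.array(X_sparse)
print(wrapped.shape, wrapped.dtype)

# Passing the sparse matrices straight to train_test_split keeps them sparse.
X_train, X_test, y_train, y_test = train_test_split(
    X_sparse, y_sparse, test_size=0.2, random_state=1
)
print(X_train.shape, X_test.shape, sparse.issparse(X_train))
```

Applied to the question's code, this would mean writing y1 = ct.fit_transform(y1) and X1 = ct2.fit_transform(X1) without np.array() or .toarray(). Whether the later steps accept sparse input depends on the estimator used.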

edited Jul 21, 2021 at 13:15

Slava Rozhnev

answered Jul 21, 2021 at 12:49

ALKESH KUMAR
