
This is the format of the dataset: [screenshot of the dataset not available]

This is my code:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

#Importing the dataset
dataset1 = pd.read_csv('DATASETS/movielens movie recommender/ml-25m/ratings.csv')

#Splitting into dependent and independent variables
X1 = dataset1.iloc[:,[0,3]].values
y1 = dataset1.iloc[:, 1:3].values

#Encoding
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0,1])], remainder='passthrough')
ct2 = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
y1 = np.array(ct.fit_transform(y1))
X1 = np.array(ct2.fit_transform(X1))


#Splitting into training set and test set
from sklearn.model_selection import train_test_split
X1_train, X1_test, y1_train, y1_test = train_test_split(X1, y1, test_size = 0.2, random_state = 1)

I get the following error:

TypeError: Singleton array array(<25000095x162542 sparse matrix of type '<class 'numpy.float64'>'
    with 50000190 stored elements in Compressed Sparse Row format>,
      dtype=object) cannot be considered a valid collection.

Could someone tell me what this means and how I could solve it?

Comment (Oct 25, 2020 at 8:44): Can you try this? ColumnTransformer(transformers=[('encoder', OneHotEncoder(sparse=False), [0,1])], remainder='passthrough')

1 Answer


Instead of this:

y1 = np.array(ct.fit_transform(y1))
X1 = np.array(ct2.fit_transform(X1))

you can use:

y1 = ct.fit_transform(y1).toarray()
X1 = ct2.fit_transform(X1).toarray()

It works for me!
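As for what the error means, here is a minimal sketch on a hypothetical 6×4 toy matrix (not the MovieLens data). When the ColumnTransformer output is a SciPy sparse matrix, np.array() does not convert it. Instead it builds a 0-dimensional object array that holds the whole matrix as a single element, which is the array(<... sparse matrix ...>, dtype=object) shown in the traceback. train_test_split needs something with a length, so it rejects this "singleton array". Calling .toarray() avoids the wrapping by densifying the matrix. Another option, shown last, is to pass the sparse matrix to train_test_split unchanged.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the encoded matrix: 6 rows, 4 columns, CSR format.
X_sparse = sp.csr_matrix(np.eye(6, 4))
y = np.arange(6)

# np.array() on a SciPy sparse matrix does NOT densify it: NumPy creates a
# 0-d object array whose single element is the sparse matrix itself.
wrapped = np.array(X_sparse)
print(wrapped.shape, wrapped.dtype)  # () object

# That 0-d "singleton" array has no length, so train_test_split raises TypeError.
try:
    train_test_split(wrapped, y, test_size=0.5, random_state=1)
    rejected = False
except TypeError as err:
    rejected = True
    print("TypeError:", err)

# Passing the sparse matrix itself works, and the resulting splits stay sparse.
X_train, X_test, y_train, y_test = train_test_split(
    X_sparse, y, test_size=0.5, random_state=1
)
print(type(X_train).__name__, X_train.shape, X_test.shape)
```

For scale, densifying the 25000095x162542 float64 matrix from the error message would take about 25,000,095 × 162,542 × 8 bytes ≈ 32.5 TB. On the full ml-25m ratings file, keeping the output sparse and using a model that accepts sparse input is probably the more practical route.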
