ColumnTransformer Fails With CountVectorizer In A Pipeline
Solution 1:
You can use make_column_transformer and do something like the following. The remainder parameter decides what happens to the columns you did not list: they can be dropped, passed through unchanged, or sent to another transformer. By default, remainder is set to 'drop', which means the remaining columns are removed from the output. Setting it to 'passthrough' keeps them as they are:
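from sklearn.compose import make_column_transformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
preprocess = make_column_transformer((CountVectorizer(), 'text_feat'),
                                     remainder='passthrough')
make_pipeline(preprocess).fit_transform(X)  # X: DataFrame with a 'text_feat' column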
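A self-contained version of the snippet above, as a sketch: the DataFrame and its columns `text_feat` and `num_feat` are made-up toy data, not taken from the question. It shows that with `remainder='passthrough'` the numeric column survives alongside the word counts.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: one text column plus one numeric column.
X = pd.DataFrame({
    "text_feat": ["red apple", "green apple", "red car"],
    "num_feat": [1.0, 2.0, 3.0],
})

# 'text_feat' is passed as a string so CountVectorizer receives a 1-D column;
# remainder='passthrough' keeps num_feat unchanged instead of dropping it.
preprocess = make_column_transformer(
    (CountVectorizer(), "text_feat"),
    remainder="passthrough",
)
Xt = make_pipeline(preprocess).fit_transform(X)

# Vocabulary is apple, car, green, red (4 columns) plus num_feat (1 column).
print(Xt.shape)  # (3, 5)
```

With the default remainder='drop', the same call would return only the 4 count columns.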
The following blog post goes into more detail: https://jorisvandenbossche.github.io/blog/2018/05/28/scikit-learn-columntransformer/
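A few tips on your code: when transforming features, you do not need to (read: shouldn't) pass y (i.e. the target). The issue in your code is that you pass the text column as a list of names (e.g. ['text_feat']) instead of as a single column name ('text_feat'). ColumnTransformer hands the transformer a 1-D column only when the column is given as a scalar string or int; with a list it passes a 2-D array, while CountVectorizer expects a 1-D iterable of documents. If you change your code slightly, you should get the same results: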
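from sklearn.compose import ColumnTransformer
# text_transformer is the transformer from your own code (e.g. a CountVectorizer)
preprocessor = ColumnTransformer(
    transformers=[('text', text_transformer, 'text_feat')])  # a string, not a list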
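To see why the string-versus-list distinction matters, here is a small sketch using toy data (assumed, not from the question). For a DataFrame input, ColumnTransformer selects columns much like pandas indexing does: a scalar key gives a 1-D Series, a list gives a 2-D DataFrame. Iterating over a DataFrame yields its column labels rather than its rows, so CountVectorizer would treat the single label as the only "document".

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy frame with a single text column.
df = pd.DataFrame({"text_feat": ["red apple", "green apple", "red car"]})

# A scalar key selects a 1-D Series; iterating over it yields the documents.
print(list(df["text_feat"]))    # ['red apple', 'green apple', 'red car']

# A list key selects a 2-D DataFrame; iterating over it yields column labels,
# so a text vectorizer would see one "document" instead of three.
print(list(df[["text_feat"]]))  # ['text_feat']

# With the 1-D column, CountVectorizer produces one row per document.
counts = CountVectorizer().fit_transform(df["text_feat"])
print(counts.shape)  # (3, 4)
```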
Solution 2:
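from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
preprocessor = ColumnTransformer(transformers=[('text', CountVectorizer(), 'text_feat')])  # wrap in ColumnTransformer
pipeline = Pipeline(steps=[('preprocessor', preprocessor)])  # pipeline wrapping the preprocessor
X_test = pipeline.fit_transform(X)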
This works and seems the simplest approach to me.
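Since the question is about using this inside a pipeline, here is a sketch of Solution 2 extended with a final estimator. The LogisticRegression step, the toy data, and the `num_feat` column are assumptions for illustration only. Note that y is passed once, to `pipeline.fit`; the CountVectorizer step ignores it and only the classifier learns from it. Because ColumnTransformer's default is remainder='drop', `num_feat` never reaches the classifier.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data and labels.
X = pd.DataFrame({
    "text_feat": ["good movie", "bad movie", "good plot", "bad acting"],
    "num_feat": [5, 1, 4, 2],
})
y = [1, 0, 1, 0]

# Default remainder='drop': only the CountVectorizer output reaches the model,
# so num_feat is discarded.
preprocessor = ColumnTransformer(
    transformers=[("text", CountVectorizer(), "text_feat")]
)

pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("clf", LogisticRegression()),
])

# y goes to the pipeline as a whole; CountVectorizer ignores it during fitting.
pipeline.fit(X, y)
preds = pipeline.predict(X)
print(preds)
```

If the other columns should reach the model, add remainder='passthrough' as in Solution 1 (assuming those columns are already numeric).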