AttributeError when creating a document-term matrix

Problem description:

I am trying to create a document-term matrix in the form of a pandas DataFrame. Here is my code so far:

df_profession['Athlete_Clean'] = df_profession['Athlete Biographies'].str.lower() 
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].apply(lambda x: ''.join([i for i in x if not i.isdigit()])) 
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].str.split() 
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in punctuation] 
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in stopwords.words('english')] 

profession_dtm_athlete = pandas.DataFrame(countvec.fit_transform(df_profession['Athlete_Clean']).toarray(), columns=countvec.get_feature_names(), index = df.index) 
profession_dtm_athlete

When I run this code, I get the following error:

'list' object has no attribute 'lower'

How can I get rid of this error?

Answer

Wrap the list objects in str() to convert them to strings:

df_profession['Athlete_Clean'] = str(df_profession['Athlete Biographies']).lower() 
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].apply(lambda x: ''.join([i for i in x if not i.isdigit()])) 
df_profession['Athlete_Clean'] = str(df_profession['Athlete_Clean']).split() 
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in punctuation] 
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in stopwords.words('english')] 

profession_dtm_athlete = pandas.DataFrame(countvec.fit_transform(df_profession['Athlete_Clean']).toarray(), columns=countvec.get_feature_names(), index = df.index) 
profession_dtm_athlete
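The two list comprehensions that filter punctuation and stop words (they appear in both the question and the answer) iterate over the column itself, so in the question's version each word is a whole row's token list rather than a single token. A list never equals any entry of NLTK's stop-word list, so nothing is removed, and if punctuation is string.punctuation, the in test with a list on the left raises a TypeError. A minimal sketch of per-row filtering with .apply, assuming punctuation means string.punctuation and using a small hypothetical stop-word set in place of stopwords.words('english') so it runs without downloading NLTK data:

```python
import string
from typing import List

import pandas as pd

# Hypothetical sample data standing in for df_profession
df_profession = pd.DataFrame(
    {
        "Athlete Biographies": [
            "He won 2 gold medals - the first in 2012",
            "She is the top scorer of the league",
        ]
    }
)

# Hypothetical stop-word set; the question uses nltk's stopwords.words('english')
STOP_WORDS = {"the", "of", "is", "in", "he", "she"}
PUNCTUATION = set(string.punctuation)


def clean_tokens(text: str) -> List[str]:
    """Lower-case, drop digits, split on whitespace, then filter each token."""
    text = "".join(ch for ch in text.lower() if not ch.isdigit())
    return [
        tok
        for tok in text.split()
        if tok not in PUNCTUATION and tok not in STOP_WORDS
    ]


# .apply calls clean_tokens once per row, so filtering happens inside each row's tokens
df_profession["Athlete_Clean"] = df_profession["Athlete Biographies"].apply(clean_tokens)
print(df_profession["Athlete_Clean"].tolist())
```

As in the original, only tokens that are a single punctuation character are removed; punctuation attached to a word (for example "medals,") would need a separate stripping step.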

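So this seems to have gotten past the problem, but now I get "ValueError: Length of values does not match length of index". Any suggestions as to why this is occurring? – Jberk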

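This error is internal to the pandas library, so I'm not sure. It might be worth asking as a new question. If you do post it as a new question, I'd suggest using the dataframe tag. – JacobIRR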

OK, thanks JacobIRR. I'll go ahead and create a new question for this new error. – Jberk
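The ValueError most likely comes from the str() calls in the answer rather than from pandas internals: str(df_profession['Athlete_Clean']) does not convert each cell, it returns one string holding the printed representation of the whole column (index labels, the Name/dtype footer, and "..." for long columns). Splitting that string gives one flat list of words whose length generally differs from the number of rows, and assigning it as a column raises the ValueError. The sketch below uses a small hypothetical column named bio to show this, followed by a row-wise alternative using the .str accessor.

```python
import pandas as pd

# Hypothetical three-row column standing in for 'Athlete Biographies'
df = pd.DataFrame(
    {"bio": ["Won 3 Grand Slams", "Scored 2 goals in 2014", "Ran the 100 metres"]}
)

# str() on a Series returns ONE string: the printed column, index and footer included
whole_column_text = str(df["bio"])
words = whole_column_text.split()
print(len(words), "words vs", len(df), "rows")

# Assigning a list whose length differs from the row count fails
try:
    df["clean"] = words
except ValueError as exc:
    print("ValueError:", exc)

# Row-wise alternative: the .str accessor applies each step to every cell separately
df["clean"] = (
    df["bio"]
    .str.lower()
    .str.replace(r"\d", "", regex=True)  # drop digits, like the isdigit() filter
    .str.split()
)
print(df["clean"].tolist())
```

The row-wise version produces one token list per row, so its length always matches the index.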
