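AttributeError when creating a document-term matrix
Problem description:
I am trying to create a document-term matrix represented as a pandas DataFrame. Here is my code so far: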
df_profession['Athlete_Clean'] = df_profession['Athlete Biographies'].str.lower()
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].apply(lambda x: ''.join([i for i in x if not i.isdigit()]))
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].str.split()
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in punctuation]
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in stopwords.words('english')]
profession_dtm_athlete = pandas.DataFrame(countvec.fit_transform(df_profession['Athlete_Clean']).toarray(), columns=countvec.get_feature_names(), index = df.index)
profession_dtm_athlete
When I run this code, I get the following error:
'list' object has no attribute 'lower'
How can I get rid of this error?
Answer
Wrap the list objects in str() to convert them to strings:
df_profession['Athlete_Clean'] = str(df_profession['Athlete Biographies']).lower()
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].apply(lambda x: ''.join([i for i in x if not i.isdigit()]))
df_profession['Athlete_Clean'] = str(df_profession['Athlete_Clean']).split()
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in punctuation]
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in stopwords.words('english')]
profession_dtm_athlete = pandas.DataFrame(countvec.fit_transform(df_profession['Athlete_Clean']).toarray(), columns=countvec.get_feature_names(), index = df.index)
profession_dtm_athlete
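A note on the likely cause of the original error: after .str.split(), every cell of Athlete_Clean holds a list of tokens, and a vectorizer that lowercases each document by default will call .lower() on that list. The sketch below keeps each row as a single string instead. It is a minimal sketch, not the asker's actual data or setup: the sample biographies, the clean_text helper and the small stop_words set are hypothetical stand-ins (the question uses NLTK's English stop words).

```python
import string

import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame
df_profession = pd.DataFrame({
    "Athlete Biographies": [
        "Born in 1985, she won 3 Gold medals!",
        "He is a sprinter.",
    ]
})

# Small illustrative stop list (assumption); the question uses
# stopwords.words('english') from NLTK instead
stop_words = {"in", "she", "he", "is", "a"}

# Translation table that deletes every ASCII punctuation character
strip_punctuation = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Lowercase, drop digits, punctuation and stop words; return ONE string."""
    text = text.lower()
    text = "".join(ch for ch in text if not ch.isdigit())
    text = text.translate(strip_punctuation)
    tokens = [word for word in text.split() if word not in stop_words]
    # Join the tokens back together so each row stays a string, not a list
    return " ".join(tokens)


# apply() runs clean_text once per row, so the column keeps its length
df_profession["Athlete_Clean"] = df_profession["Athlete Biographies"].apply(clean_text)
print(df_profession["Athlete_Clean"].tolist())
```

Assuming countvec is a scikit-learn CountVectorizer, a column of strings like this can be passed to countvec.fit_transform directly. In recent scikit-learn releases get_feature_names() has been replaced by get_feature_names_out(). The question also builds the index from df rather than df_profession, which may be worth checking.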
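So this seems to have gotten me past the original problem, but now I get "ValueError: Length of values does not match length of index". Any suggestions as to why this is appearing? – Jberk
That error comes from inside the pandas library, so I'm not sure. It may be worth a new question. If you do post it as a new question, I'd suggest using the dataframe tag. – JacobIRR
OK, thanks JacobIRR. I'll go ahead and create a new question about this new error. – Jberk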
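For readers who hit the same ValueError: one plausible explanation is that str() applied to a whole Series returns the printed representation of the column as a single string, so splitting it yields a flat token list whose length has nothing to do with the number of rows. The sketch below reproduces that mismatch on a hypothetical three-row DataFrame (the column name bio and its contents are made up for illustration) and contrasts it with the row-wise .str accessor.

```python
import pandas as pd

# Hypothetical three-row example
df = pd.DataFrame({"bio": ["First athlete bio", "Second athlete bio", "Third one"]})

# str() on a Series gives ONE string: the index, the values and the dtype footer
as_text = str(df["bio"])
tokens = as_text.lower().split()

# The number of tokens is unrelated to the number of rows
print(len(df), len(tokens))

# Assigning that list to a column therefore fails with a length mismatch
try:
    df["clean"] = tokens
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# The .str accessor works row by row and preserves the length of the column
per_row = df["bio"].str.lower()
print(len(per_row) == len(df))
```

Whether this is exactly what happened in the asker's case cannot be confirmed from the thread, but the error message matches this pattern.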