Converting strings in a pandas Series to numeric values
Question:
I want to plot "MJD" against "MJD_DUPLICATE" using the (13 MB) dataset "DR14Q_pruned_repeats.csv", which can be found here: https://www.dropbox.com/s/1dyong27bre3p9j/DR14Q_pruned_repeats.csv?dl=0
Here is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from astropy.table import Table
from astropy.io import ascii
from astropy.io import fits
filename = 'DR14Q_pruned_repeats.csv'
df = pd.read_csv(filename)
multiples = df[df["N_SPEC"] >2]
multiples.plot.scatter(x='MJD', y='N_SPEC')
plt.show()
multiples.plot.scatter(x='MJD', y='MJD_DUPLICATE')
plt.show()
The line that plots MJD against MJD_DUPLICATE returns an error:
ValueError: scatter requires y column to be numeric
and a pd.to_numeric call on the column returns only NaNs.
Answer
You need to convert the MJD_DUPLICATE column from strings to tuples, and then select a value by position, e.g. str[1] for the second value of each tuple:
import ast
doubles = df[df["N_SPEC"] == 2].copy()
multiples = df[df["N_SPEC"] > 2].copy()
repeats = df[df["N_SPEC"] > 1].copy()
multiples.plot.scatter(x='MJD', y='N_SPEC')
plt.show()
print (multiples['MJD_DUPLICATE'].head(10))
5 (0, 56279, 0, 56539, 0, 56957, -1, -1, -1, -1,...
85 (0, 56243, 0, 56543, 0, 57328, -1, -1, -1, -1,...
170 (0, 52262, 0, 55447, 0, 57011, -1, -1, -1, -1,...
200 (0, 52262, 0, 55443, 0, 57006, -1, -1, -1, -1,...
262 (0, 52525, 0, 55443, 0, 57011, -1, -1, -1, -1,...
277 (0, 51793, 0, 55531, 0, 57006, -1, -1, -1, -1,...
287 (0, 55182, 0, 55184, 0, 55443, -1, -1, -1, -1,...
313 (0, 56248, 0, 56245, 0, 56572, -1, -1, -1, -1,...
314 (0, 55182, 0, 55184, 0, 55444, -1, -1, -1, -1,...
324 (0, 52261, 0, 55184, 0, 55444, -1, -1, -1, -1,...
Name: MJD_DUPLICATE, dtype: object
ser = multiples['MJD_DUPLICATE'].apply(ast.literal_eval).str[1]
multiples['MJD_DUPLICATE'] = pd.to_numeric(ser, errors='coerce')
print (multiples['MJD_DUPLICATE'].head(10))
5 56279
85 56243
170 52262
200 52262
262 52525
277 51793
287 55182
313 56248
314 55182
324 52261
Name: MJD_DUPLICATE, dtype: int64
multiples.plot.scatter(x='MJD', y='MJD_DUPLICATE')
plt.show()
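For anyone who wants to try the conversion without downloading the full file, here is a minimal self-contained sketch of the same steps. The three sample strings are hypothetical: they are shaped like the MJD_DUPLICATE cells printed above but shortened to eight entries, and the variable names `raw`, `direct`, `parsed` and `second` are only for illustration.

```python
import ast

import pandas as pd

# Hypothetical sample shaped like the MJD_DUPLICATE cells above,
# shortened to eight entries per row.
raw = pd.Series(
    [
        "(0, 56279, 0, 56539, 0, 56957, -1, -1)",
        "(0, 56243, 0, 56543, 0, 57328, -1, -1)",
        "(0, 52262, 0, 55447, 0, 57011, -1, -1)",
    ],
    index=[5, 85, 170],
    name="MJD_DUPLICATE",
)

# Each cell is the text of a whole tuple, not a single number,
# so a direct conversion coerces every row to NaN.
direct = pd.to_numeric(raw, errors="coerce")

# Parse each string into a real Python tuple ...
parsed = raw.apply(ast.literal_eval)

# ... then take the element at position 1 and make the result numeric.
second = pd.to_numeric(parsed.str[1], errors="coerce")

print(direct)
print(second)
```

This should also account for the NaNs mentioned in the question: `pd.to_numeric` is handed the whole tuple text at once, so there is nothing it can parse as a single number.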
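This works, but it doesn't do what I'm after. I need to keep all the numeric data in MJD_DUPLICATE, not just the second column. – npross
Yes, then create a new column under a new name with 'multiples['MJD_DUPLICATE_NEW'] = pd.to_numeric(ser, errors='coerce')' and plot it with 'multiples.plot.scatter(x='MJD', y='MJD_DUPLICATE_NEW')' – jezrael
Tuples cannot be plotted at all; scalars are needed. – jezrael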
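If the goal is to keep every value in the tuple rather than one position, one possible approach (not proposed in the thread) is to reshape the parsed tuples into long form with `DataFrame.explode`, so that each tuple element becomes its own scalar row. The sketch below uses a tiny hypothetical frame, and it assumes that only entries greater than 0 are real MJD values and that 0 and -1 are placeholders; the thread does not confirm that.

```python
import ast

import pandas as pd

# Hypothetical two-row frame; the real data comes from DR14Q_pruned_repeats.csv.
df = pd.DataFrame(
    {
        "MJD": [56279, 56243],
        "MJD_DUPLICATE": [
            "(0, 56279, 0, 56539, -1, -1)",
            "(0, 56243, 0, 56543, -1, -1)",
        ],
    }
)

# Turn each string into a tuple, then give every tuple element its own row.
df["MJD_DUPLICATE"] = df["MJD_DUPLICATE"].apply(ast.literal_eval)
long = df.explode("MJD_DUPLICATE")

# The exploded column has object dtype, so convert it to a numeric dtype.
long["MJD_DUPLICATE"] = pd.to_numeric(long["MJD_DUPLICATE"], errors="coerce")

# Assumption: 0 and -1 are placeholders, so keep only positive entries.
long = long[long["MJD_DUPLICATE"] > 0]

print(long)

# Every row now holds scalars, so a scatter plot is possible, e.g.:
# long.plot.scatter(x="MJD", y="MJD_DUPLICATE")
```

Each original row is repeated once per retained tuple entry, with the original index preserved, so the values can still be traced back to their source row.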