Python Data Visualization - Pandas + DataFrame (Plotting)
Common functions in the Pandas module
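- pandas.read_csv("path")
  - When a file is read, the dtype of each column is inferred automatically. If a column mixes several kinds of values (for example numbers and text), .info() typically reports that column as object
  - Use a["column_name"].value_counts() to count how often each distinct value occurs in such an object column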
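A minimal sketch of the behavior described above. The CSV text, the column names `id` and `score`, and the use of `io.StringIO` in place of a real file path are all assumptions made for illustration; they are not from the original notes.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = "id,score\n1,90\n2,absent\n3,75\n4,absent\n"
a = pd.read_csv(io.StringIO(csv_text))

# "id" holds only integers, so it is inferred as an integer column.
# "score" mixes numbers and text, so pandas keeps every entry as text
# (typically reported as object by .info()).
a.info()

# Count how often each distinct value appears in the mixed column.
counts = a["score"].value_counts()
print(counts)
```

Because the whole `score` column falls back to text, the entries `90` and `75` are counted as strings here, not as numbers.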
data = DataFrame(np.arange(20).reshape(4,5),index = list("ABCD"),columns=list("abcde"))
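- data.head()
  - Show the first five rows
- data.info()
  - Show information about each column (name, non-null count, dtype)
- data.describe()
  - Return summary statistics for each numeric column (count, mean, std, min, quartiles, max)
- data.shape[0] / len(data)
  - Number of rows
- data.shape[1] / data.columns.size
  - Number of columns
- data.iloc[1:3,1:3]
  - Slice by integer position (left-closed, right-open)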
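- data.mean(axis=0) / data.mean(axis=1)
  - axis=0 averages down the rows and returns the mean of each column; axis=1 averages across the columns and returns the mean of each row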
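The calls in the list above can be tried on the 4x5 `data` frame defined earlier. This is only a sketch; the variable names `n_rows`, `n_cols`, `block`, `col_means`, and `row_means` are introduced here for illustration.

```python
import numpy as np
from pandas import DataFrame

# Same frame as in the notes: values 0..19, rows A-D, columns a-e.
data = DataFrame(np.arange(20).reshape(4, 5), index=list("ABCD"), columns=list("abcde"))

print(data.head())      # first five rows (all four rows here)
print(data.describe())  # count, mean, std, min, quartiles, max per column

n_rows, n_cols = data.shape      # (4, 5)
block = data.iloc[1:3, 1:3]      # rows B-C, columns b-c (end positions excluded)

col_means = data.mean(axis=0)    # one mean per column
row_means = data.mean(axis=1)    # one mean per row
print(col_means)
print(row_means)
```

Note that `mean` is a method, so it needs parentheses; `data.mean[0]` raises a `TypeError`.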
DataFrame plotting:
1> Line plot (plot)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
a = pd.DataFrame(np.arange(15).reshape(3,5),columns=['Data-1','Data-2','Data-3','Data-4','Data-5']) # Row_Name:Index
b = a.describe()
print(b)
b.plot()
plt.legend(['Data-1','Data-2','Data-3','Data-4','Data-5'],loc="upper left")
plt.show()
2> Histogram (hist)
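The notes give no demo for the histogram, so the following is a minimal sketch rather than the author's example. The column name `A`, the seed, the sample size, and the bin count are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: 100 uniform random values in [0, 100).
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.random(100) * 100})

# Series.plot.hist draws the histogram and returns the matplotlib Axes.
ax = df["A"].plot.hist(bins=10, alpha=0.5)
ax.set_xlabel("A")
```

Call `plt.show()` afterwards to display the figure in an interactive session.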
3> Scatter plot (the demo also adds rows and columns to a DataFrame)
import pandas as pd
from pandas import DataFrame
import numpy as np
import matplotlib.pyplot as plt
data = DataFrame([{"A":1,"B":2,"C":3}])
#print(data)
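# DataFrame.append() was removed in pandas 2.0; pd.concat() is the replacement
data = pd.concat([data, DataFrame([{"A":11,"B":22,"C":33},{"A":29.558,"B":55,"C":89}])])
#print(data)
for i in range(20):
    # build a one-row frame of random values and add it to data
    b = DataFrame([{"A":np.random.rand()*100,"B":np.random.rand()*100,"C":np.random.rand()*100}])
    data = pd.concat([data, b], ignore_index=True)
#print(data)
data["D"]=np.random.rand(len(data))*100  # one value per row (23 rows here)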
#print(data)
data.plot.scatter(x="B",y="C",color="red",alpha=0.3)
plt.show()
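Inserting a row and a column into a DataFrame:
1> Insert a row
Older pandas versions used append() for this. It was removed in pandas 2.0, so use pd.concat() instead:
1. data = pd.concat([data, DataFrame([{"A":1,"B":2,"C":3}, {"A":11,"B":22,"C":33}, {"A":111,"B":222,"C":333}])])
2. data = pd.concat([data, new_data], ignore_index=True)  # ignore_index=True renumbers the rows from 0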
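`DataFrame.append()` was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions row insertion is usually written with `pd.concat()`. A minimal sketch, where the name `new_rows` and the sample values are assumptions for illustration:

```python
import pandas as pd

data = pd.DataFrame([{"A": 1, "B": 2, "C": 3}])

# Add several rows at once; ignore_index=True renumbers the index 0..n-1.
new_rows = pd.DataFrame([{"A": 11, "B": 22, "C": 33}, {"A": 111, "B": 222, "C": 333}])
data = pd.concat([data, new_rows], ignore_index=True)

# Add a single row in place by assigning to the next integer label.
data.loc[len(data)] = [7, 8, 9]
print(data)
```

Concatenating inside a long loop copies the frame on every pass, so for many rows it is usually cheaper to collect them in a list and call `pd.concat()` once.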
2> Insert a column (raises an error if the list has fewer or more values than the DataFrame has rows)
data["New_Name"] = [..., ..., ...]
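A short sketch of the length rule for column assignment. The column names `D` and `E` and the variable `mismatch_error` are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"A": [1, 11, 111], "B": [2, 22, 222]})

# Works: exactly one value per existing row.
data["D"] = np.random.rand(len(data)) * 100

# Fails: two values for three rows raises ValueError, and no column is added.
try:
    data["E"] = [1, 2]
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(mismatch_error)
print(data.columns.tolist())
```

Sizing the new values with `len(data)` avoids hard-coding the row count.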
import pandas as pd
from pandas import DataFrame
import numpy as np
import matplotlib.pyplot as plt
data = DataFrame([{"A":1,"B":2,"C":3}])
print(data)
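# DataFrame.append() was removed in pandas 2.0; pd.concat() is the replacement
data = pd.concat([data, DataFrame([{"A":11,"B":22,"C":33},{"A":29.558,"B":55,"C":89}])])
print(data)
for i in range(20):
    # build a one-row frame of random values and add it to data
    b = DataFrame([{"A":np.random.rand()*100,"B":np.random.rand()*100,"C":np.random.rand()*100}])
    data = pd.concat([data, b], ignore_index=True)
print(data)
data["D"]=np.random.rand(len(data))*100  # one value per row (23 rows here)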
#print(data)
Converting a DataFrame to a list:
https://blog.****.net/qq_42292831/article/details/89182921
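The linked post is not reproduced here. As a rough sketch of common ways to do the conversion (not necessarily the approach the link describes), with a small hypothetical frame:

```python
import numpy as np
from pandas import DataFrame

# Hypothetical 2x3 frame used only for this illustration.
data = DataFrame(np.arange(6).reshape(2, 3), columns=list("abc"))

rows = data.values.tolist()         # list of rows, each row a list
col_a = data["a"].tolist()          # a single column as a flat list
records = data.to_dict("records")   # list of {column: value} dicts, one per row

print(rows)
print(col_a)
print(records)
```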