2018年8月

  • 1. Upload failure

The usr/uploads folder was not given 777 permissions:

chmod -R 777 uploads
  • 2. Uploading a larger image gives:

413 Request Entity Too Large

This is usually an nginx limit.
Edit the http{} section of nginx.conf
and add client_max_body_size 10M;

Best kept close to the upload limits in php.ini, if present:
upload_max_filesize
post_max_size
Locate nginx with ps -ef|grep nginx
Check the config with ./nginx -t
Reload nginx with ./nginx -s reload
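For reference, the http{} change might look like the sketch below (10M mirrors the value above; pick whatever limit fits your site):

```nginx
http {
    # cap request bodies (file uploads) at 10 MB
    client_max_body_size 10M;

    # rest of the http block unchanged
}
```

If php.ini is in play, keeping upload_max_filesize and post_max_size in the same ballpark avoids one layer accepting what the other rejects.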

  • 3. The EditorMD plugin conflicts with the Affinity theme: with EditorMD enabled, the theme stops loading thumbnails... no known fix yet.

import requests
import time
from random import randint
from bs4 import BeautifulSoup as bs

def req_info(i):
    url = 'https://smtebooks.us/getfile/{}'.format(i)
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
    }

    res = requests.get(url, headers=headers, allow_redirects=False)
    # a Location header means this id redirects away, i.e. there is no book page
    if 'Location' in res.headers:
        return 'no content'
    res_decode = res.content.decode('utf-8')
    soup = bs(res_decode, 'lxml')
    url = 'no'
    title = 'no'
    dl_url = 'no'
    for _ in soup.find_all('div', class_='priging_thumb'):
        url = 'https://smtebooks.us' + _.a.get('href')
        title = _.a.get('title')
    for _ in soup.find_all('a', class_='Download btn btn-block btn-lg btn-success'):
        dl_url = _.get('href')
    print(i, url, title, dl_url)
    with open('smtebooks.csv', 'a', encoding='utf-8') as f:
        #f.write('{},{},{},{}\n'.format(i,title,url,dl_url))
        # quote the title field, since titles may contain commas
        f.write('{},"{}",{},{}\n'.format(i, title, url, dl_url))
    return url, title, dl_url

for i in range(20, 13517):  # last id is 13516
    print(i)
    req_info(i)
    k = randint(2, 10)  # random pause to be gentle on the server
    print('sleep', k, 's')
    time.sleep(k)
9
9 https://smtebooks.us/book/9/beginning-f-40-2nd-edition-pdf Beginning F# 4.0 2nd Edition https://drive.google.com/uc?export=download&id=0B4hhbFaItiPxUGRKVnQtWDhBMWM
sleep 3 s
10
10 https://smtebooks.us/book/10/head-first-ajax-pdf Head First Ajax 1st Edition https://drive.google.com/uc?export=download&id=0B4hhbFaItiPxbUlPdnlYckJ6MG8
sleep 3 s
11
11 https://smtebooks.us/book/11/c-24-hour-trainer-2nd-edition-pdf C# 24-Hour Trainer 2nd Edition https://drive.google.com/uc?export=download&id=0B4hhbFaItiPxcnRkMkNLSE44TFU
sleep 2 s
12
sleep 2 s
13
sleep 1 s
14
sleep 2 s
15
sleep 2 s
16
sleep 2 s
17
sleep 3 s
18
sleep 1 s
19
19 https://smtebooks.us/book/19/professional-aspnet-4-c-vb-pdf Professional ASP.NET 4 in C# and VB 1st Edition /Book/DownloadFile/19/professional-aspnet-4-c-vb-pdf
sleep 2 s
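The manual quoting in the f.write above still breaks if a title itself contains a double quote; the stdlib csv module escapes fields automatically. A minimal sketch (the row values here are made up for illustration; the filename mirrors the script above):

```python
import csv

# fields mirroring the script's (i, title, url, dl_url) tuple
row = (19, 'A "Quoted" Title, with a comma', 'https://smtebooks.us/book/19', '/Book/DownloadFile/19')

with open('smtebooks.csv', 'a', newline='', encoding='utf-8') as f:
    # csv.writer quotes fields containing commas and doubles embedded quotes
    csv.writer(f).writerow(row)
```

Passing newline='' is what the csv docs recommend for files opened for csv.writer, so the module controls line endings itself.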

First, the problems I ran into and their fixes.

Error importing pandas_datareader

import pandas_datareader

gives

...
ImportError: cannot import name 'is_list_like'

Per a comment under a Stack Overflow question, the monkey-patch

import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like

clears the error.

Error fetching Alibaba stock data

alibaba = pdr.get_data_yahoo('BABA')

gives

ImmediateDeprecationError: Yahoo Actions has been immediately
deprecated due to large breaks in the API without the introduction of
a stable replacement. Pull Requests to re-enable these data connectors
are welcome.

On GitHub there is talk of patches and code edits, and of a pandas_datareader branch that already has the fix.
In the end I went with installing the dev version of pandas_datareader.
GitHub's suggested command is

pip install git+https://github.com/pydata/pandas-datareader.git

Note: all of this is managed by Anaconda, but the dev package cannot be installed directly through conda.
And I don't have git installed, so the steps were as follows:

  • First open Anaconda Prompt, the terminal bundled with Anaconda; on Windows it is available directly from the Start menu.
  • Before installing the dev version, uninstall the already-installed pandas-datareader package

pip uninstall pandas-datareader

  • Download the zip, extract it, and change into the extracted directory
  • Then install the dev package from source

python setup.py install

pdr.get_data_yahoo('APPL') returns no data for Apple. I thought it was an API problem, but Apple's ticker is actually AAPL, not APPL, so the typo is the likely cause; I dropped the Apple data either way.

Calling sns.distplot() produces a warning

C:\Users\weimo\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py:6462: UserWarning: The 'normed' kwarg is deprecated, and has been replaced by the 'density' kwarg.
warnings.warn("The 'normed' kwarg is deprecated, and has been "

A comment on GitHub attributes it to a version change, but it seems harmless; a Chinese blog post suggests editing the seaborn source directly:

seaborn/distributions.py
hist_kws.setdefault("normed", norm_hist)

change to

hist_kws.setdefault("density", norm_hist)

In the end I decided to leave it alone; it is only a warning.
My guess is the underlying issue is with the data source.

Summary: this feels a lot like learning MATLAB plotting... The course mentions the book Matplotlib for Python Developers, now in its second edition (2018); the first edition dates back to 2009. I plan to study it properly, and maybe translate it along the way?
Official link to the book's color-image PDF:
https://www.packtpub.com/sites/default/files/downloads/MatplotlibforPythonDevelopersSecondEdition_ColorImages.pdf
Chapter code on GitHub:
https://github.com/PacktPublishing/Matplotlib-for-Python-Developers-Second-Edition/
Book homepage:
Matplotlib for Python Developers, 2nd Edition

The notes themselves

import numpy as np
import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like
from pandas import Series, DataFrame
import pandas_datareader as pdr

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from datetime import datetime
start = datetime(2014, 9, 20)
alibaba = pdr.get_data_yahoo('BABA', start=start)
amazon = pdr.get_data_yahoo('AMZN', start=start)
alibaba.head()

High Low Open Close Volume Adj Close
Date
2014-09-19 99.699997 89.949997 92.699997 93.889999 271879400 93.889999
2014-09-22 92.949997 89.500000 92.699997 89.889999 66657800 89.889999
2014-09-23 90.480003 86.620003 88.940002 87.169998 39009800 87.169998
2014-09-24 90.570000 87.220001 88.470001 90.570000 32088000 90.570000
2014-09-25 91.500000 88.500000 91.089996 88.919998 28598000 88.919998

amazon.head()

High Low Open Close Volume Adj Close
Date
2014-09-19 332.760010 325.570007 327.600006 331.320007 6886200 331.320007
2014-09-22 329.489990 321.059998 328.489990 324.500000 3109700 324.500000
2014-09-23 327.600006 321.250000 322.459991 323.630005 2352600 323.630005
2014-09-24 329.440002 319.559998 324.170013 328.209991 2642200 328.209991
2014-09-25 328.540009 321.399994 327.989990 321.929993 2928800 321.929993

#alibaba.shape
#alibaba.describe()
alibaba.to_csv("alibaba.csv")
amazon.to_csv("amazon.csv")
alibaba['Adj Close'].plot(legend=True)
<matplotlib.axes._subplots.AxesSubplot at 0x2eb3813a128>

output_5_1.png

for _ in alibaba:
    if _ == 'Volume':
        continue
    alibaba[_].plot(legend=True)

output_6_0.png

alibaba['high-low'] = alibaba['High'] - alibaba['Low']
alibaba.head()

High Low Open Close Volume Adj Close high-low
Date
2014-09-19 99.699997 89.949997 92.699997 93.889999 271879400 93.889999 9.750000
2014-09-22 92.949997 89.500000 92.699997 89.889999 66657800 89.889999 3.449997
2014-09-23 90.480003 86.620003 88.940002 87.169998 39009800 87.169998 3.860001
2014-09-24 90.570000 87.220001 88.470001 90.570000 32088000 90.570000 3.349998
2014-09-25 91.500000 88.500000 91.089996 88.919998 28598000 88.919998 3.000000

alibaba['high-low'].plot(figsize=(25,5))
<matplotlib.axes._subplots.AxesSubplot at 0x2eb388cd6d8>

output_9_1.png

alibaba['daily-return'] = alibaba['Adj Close'].pct_change()
alibaba['daily-return'].plot(figsize=(25,5),linestyle='--',marker='o')
<matplotlib.axes._subplots.AxesSubplot at 0x2eb38b609b0>

output_11_1.png
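pct_change() is just price[t] / price[t-1] - 1, with NaN for the first row since there is no previous price; a minimal check on made-up prices:

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0])
returns = prices.pct_change()
# first value is NaN; then 110/100 - 1 = 0.10 and 99/110 - 1 = -0.10
print(returns)
```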

sns.distplot(alibaba['daily-return'].dropna(),bins=100,color='red')
C:\Users\weimo\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py:6462: UserWarning: The 'normed' kwarg is deprecated, and has been replaced by the 'density' kwarg.
  warnings.warn("The 'normed' kwarg is deprecated, and has been "

<matplotlib.axes._subplots.AxesSubplot at 0x2eb3a9bfac8>

output_12_2.png

start = datetime(2015, 1, 1)
company = ['GOOG', 'MSFT', 'AMZN', 'FB']
#company = 'APPL'
top_tech_df = pdr.get_data_yahoo(company,start=start)['Adj Close']
top_tech_df.head()

Symbols AMZN FB GOOG MSFT
Date
2014-12-31 310.350006 78.019997 523.521423 42.663837
2015-01-02 308.519989 78.449997 521.937744 42.948578
2015-01-05 302.190002 77.190002 511.057617 42.553627
2015-01-06 295.290009 76.150002 499.212799 41.929050
2015-01-07 298.420013 76.150002 498.357513 42.461777

top_tech_df.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x180d6e03f60>

output_15_1.png

top_tech_dr = top_tech_df.pct_change()
top_tech_df[['FB', 'MSFT']].plot()
<matplotlib.axes._subplots.AxesSubplot at 0x180d6de7550>

output_17_1.png

sns.jointplot('AMZN','GOOG',top_tech_dr,kind='scatter')
<seaborn.axisgrid.JointGrid at 0x180d6859f98>

output_18_1.png

sns.pairplot(top_tech_dr.dropna())
<seaborn.axisgrid.PairGrid at 0x180d6db60f0>

output_19_1.png

top_tech_dr['MSFT'].quantile(0.02)
-0.029942679770481886


import numpy as np
import pandas as pd
from pandas import Series, DataFrame
s1 = Series([1,2,3],index=['A','B','C'])
s1
A    1
B    2
C    3
dtype: int64
s2 = Series([4,5,6,7],index=['B','C','D','E'])
s2
B    4
C    5
D    6
E    7
dtype: int64
s1 + s2
A    NaN
B    6.0
C    8.0
D    NaN
E    NaN
dtype: float64

Addition aligns on the index; labels present in only one Series come out as NaN.
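If NaN is not what you want, Series.add with fill_value treats the missing side as a default value instead. A quick sketch with the same s1/s2 as above:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
s2 = pd.Series([4, 5, 6, 7], index=['B', 'C', 'D', 'E'])

# unmatched labels use 0 instead of producing NaN
print(s1.add(s2, fill_value=0))
# A 1.0, B 6.0, C 8.0, D 6.0, E 7.0
```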

DataFrame arithmetic

df_a = DataFrame(np.arange(4).reshape(2,2),index=['A','B'],columns=['北京','上海'])
df_b = DataFrame(np.arange(9).reshape(3,3),index=['A','B','C'],columns=['北京','上海','广州'])
df_a

北京 上海
A 0 1
B 2 3

df_b

北京 上海 广州
A 0 1 2
B 3 4 5
C 6 7 8

df_a + df_b

上海 北京 广州
A 2.0 0.0 NaN
B 7.0 5.0 NaN
C NaN NaN NaN

Similarly, cells whose index and columns both match are added; everything else comes out as NaN.

df_c = DataFrame([[1,2,3],[4,5,np.nan],[7,8,9]],index=['A','B','C'],columns=['c1','c2','c3'])
df_c

c1 c2 c3
A 1 2 3.0
B 4 5 NaN
C 7 8 9.0

df_c.sum()
c1    12.0
c2    15.0
c3    12.0
dtype: float64
df_c.sum(axis = 1)
A     6.0
B     9.0
C    24.0
dtype: float64
type(df_c.sum())
pandas.core.series.Series

Summation in a DataFrame skips NaN values.

axis = 1 makes the computation run along rows.
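The NaN skipping is controllable: sum(skipna=False) propagates NaN instead. A sketch using the same df_c as above:

```python
import numpy as np
import pandas as pd

df_c = pd.DataFrame([[1, 2, 3], [4, 5, np.nan], [7, 8, 9]],
                    index=['A', 'B', 'C'], columns=['c1', 'c2', 'c3'])

print(df_c.sum())               # c3 -> 12.0, NaN skipped (the default)
print(df_c.sum(skipna=False))   # c3 -> NaN, NaN propagates
```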

df_c.describe()

c1 c2 c3
count 3.0 3.0 2.000000
mean 4.0 5.0 6.000000
std 3.0 3.0 4.242641
min 1.0 2.0 3.000000
25% 2.5 3.5 4.500000
50% 4.0 5.0 6.000000
75% 5.5 6.5 7.500000
max 7.0 8.0 9.000000

s1.index
Index(['A', 'B', 'C'], dtype='object')
s1.sort_values()  # ascending by value
A    1
B    2
C    3
dtype: int64
s1.sort_values(ascending=False)  # descending by value
C    3
B    2
A    1
dtype: int64
s1.sort_index(ascending=False)  # descending by index
C    3
B    2
A    1
dtype: int64
df_d = DataFrame(np.random.randn(35).reshape(7,5),columns=['A','B','C','D','E'])
df_d

A B C D E
0 -0.694245 -0.302792 0.667865 0.447782 -0.413812
1 -0.502081 -1.849090 1.885715 -1.117864 0.406936
2 0.384877 0.076701 -1.052755 -0.709675 0.272562
3 -1.194740 -0.518320 -0.139549 -0.745238 1.270952
4 -1.266443 -1.163004 -0.644873 -0.333446 0.349508
5 -0.695937 -0.589887 1.475200 0.278659 2.207159
6 -0.712247 0.171372 0.268192 0.138490 0.604858

df_d.sort_values('A',ascending=False)  # sort by column A, descending

A B C D E
2 0.384877 0.076701 -1.052755 -0.709675 0.272562
1 -0.502081 -1.849090 1.885715 -1.117864 0.406936
0 -0.694245 -0.302792 0.667865 0.447782 -0.413812
5 -0.695937 -0.589887 1.475200 0.278659 2.207159
6 -0.712247 0.171372 0.268192 0.138490 0.604858
3 -1.194740 -0.518320 -0.139549 -0.745238 1.270952
4 -1.266443 -1.163004 -0.644873 -0.333446 0.349508

df_d.sort_index(ascending=False)  # descending by index

A B C D E
6 -0.712247 0.171372 0.268192 0.138490 0.604858
5 -0.695937 -0.589887 1.475200 0.278659 2.207159
4 -1.266443 -1.163004 -0.644873 -0.333446 0.349508
3 -1.194740 -0.518320 -0.139549 -0.745238 1.270952
2 0.384877 0.076701 -1.052755 -0.709675 0.272562
1 -0.502081 -1.849090 1.885715 -1.117864 0.406936
0 -0.694245 -0.302792 0.667865 0.447782 -0.413812

The so-called homework --> extract three columns from an existing csv file, sort descending by one of them, and do it all in one line (seriously...).

import pandas as pd
csv_file = open("E:/Python3数据科学入门与实战/project/o25mso/homework/movie_metadata.csv","r",encoding="utf-8")
read_csv_in = pd.read_csv(csv_file)
df = read_csv_in[['imdb_score','director_name','movie_title']].sort_values('imdb_score',ascending=False)
df.to_csv('new_imdb.csv')
#pd.read_csv(open("E:/Python3数据科学入门与实战/project/o25mso/homework/movie_metadata.csv"))[['imdb_score','director_name','movie_title']].sort_values('imdb_score',ascending=False)
!ls
csv_practice.ipynb
demo.ipynb
new_imdb.csv
practice-2018-08-11.ipynb

When the path contains Chinese characters, first open the file with open("filename","r",encoding="utf-8"),

then pass the handle to pandas.read_csv(). Read it as utf-8 or problems will follow.
The main benefit of this approach is that paths with Chinese characters work.
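The same select-then-sort chain, demonstrated on a tiny synthetic frame (column names mirror movie_metadata.csv; the rows here are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'imdb_score': [8.9, 9.3, 9.2],
    'director_name': ['Quentin Tarantino', 'Frank Darabont', 'Francis Ford Coppola'],
    'movie_title': ['Pulp Fiction', 'The Shawshank Redemption', 'The Godfather'],
    'duration': [154, 142, 175],  # an extra column dropped by the selection
})

# pick three columns, then sort descending on one of them, in a single expression
top = df[['imdb_score', 'director_name', 'movie_title']].sort_values('imdb_score', ascending=False)
print(top)  # three columns, highest imdb_score first
```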

df.head()

imdb_score director_name movie_title
2765 9.5 John Blanchard Towering Inferno
1937 9.3 Frank Darabont The Shawshank Redemption
3466 9.2 Francis Ford Coppola The Godfather
4409 9.1 John Stockwell Kickboxer: Vengeance
2824 9.1 NaN Dekalog