Articles tagged python

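The browser cookie manager extension I use imports and exports cookies as a list of dictionaries, and Python does not handle loading cookies in the [{}, {}] form very gracefully.
So, based on RFC 6265 and the RequestsCookieJar source code, I wrote a conversion script for cookies in this form so that they can be loaded easily in Python.
The code is as follows: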

'''
@Author: weimo
@Created: 2020-04-11 21:45:46
@Last edited: 2020-04-11 22:37:48
@A person's fate depends, of course, on one's own struggle, but...
'''
import json
from pathlib import Path
from http.cookiejar import Cookie
from requests.cookies import RequestsCookieJar

def convert(cookies_path: str = "cookies.json"):
    # Convert cookies in [{}, {}] form into a RequestsCookieJar.
    try:
        _cookies = json.loads(Path(cookies_path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing or unreadable file, or invalid JSON: nothing to convert.
        return None
    BASECOOKIE = {
        "version": 0,
        "name": "",
        "value": "",
        "port": None,
        "port_specified": False,
        "domain": "",
        "domain_specified": False,
        "domain_initial_dot": False,
        "path": "/",
        "path_specified": False,
        "secure": False,
        "expires": None,
        "discard": False,
        "comment": None,
        "comment_url": None,
        "rest": {},
    }
    cookies = RequestsCookieJar()
    for c in _cookies:
        # Start from a fresh copy so values from one cookie never leak into the next.
        cookie = dict(BASECOOKIE, rest={})
        cookie["name"] = c["name"]
        cookie["value"] = c["value"]
        domain = c.get("domain", "")
        if domain != "":
            cookie["domain"] = domain
            cookie["domain_specified"] = True
            # A leading dot marks a cookie that also applies to subdomains.
            cookie["domain_initial_dot"] = domain.startswith(".")
        cookie["secure"] = c.get("secure", False)
        cookie["expires"] = c.get("expirationDate")
        path = c.get("path", "")
        if path != "":
            cookie["path"] = path
            cookie["path_specified"] = True
        if c.get("httpOnly"):
            cookie["rest"]["httpOnly"] = None
        if c.get("hostOnly"):
            cookie["rest"]["hostOnly"] = None
        cookies.set_cookie(Cookie(**cookie))
    return cookies

if __name__ == "__main__":
    convert("cookies.json")
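
For reference, here is a minimal sketch of the kind of input the script expects and how one entry maps onto a Cookie. It uses only the standard library (http.cookiejar.CookieJar instead of RequestsCookieJar) so that it runs without requests. The cookie names, values and example.com domains are made up, and to_cookie is a hypothetical helper, not part of the script above.

```python
from http.cookiejar import Cookie, CookieJar

# Made-up entries in the [{}, {}] shape exported by the browser extension.
SAMPLE = [
    {"name": "sid", "value": "abc", "domain": ".example.com", "path": "/",
     "secure": True, "httpOnly": True, "expirationDate": 1900000000.5},
    {"name": "theme", "value": "dark", "domain": "www.example.com", "path": "/"},
]


def to_cookie(c: dict) -> Cookie:
    """Build one http.cookiejar.Cookie from one exported dictionary."""
    domain = c.get("domain", "")
    path = c.get("path", "")
    return Cookie(
        version=0, name=c["name"], value=c["value"],
        port=None, port_specified=False,
        domain=domain, domain_specified=bool(domain),
        domain_initial_dot=domain.startswith("."),
        path=path or "/", path_specified=bool(path),
        secure=bool(c.get("secure")),
        expires=c.get("expirationDate"),  # Cookie truncates this to an int
        discard=False, comment=None, comment_url=None,
        rest={"httpOnly": None} if c.get("httpOnly") else {},
    )


jar = CookieJar()
for item in SAMPLE:
    jar.set_cookie(to_cookie(item))

for cookie in jar:
    print(cookie.name, cookie.domain, cookie.domain_initial_dot, cookie.expires)
```

Building every Cookie from its own arguments, rather than mutating one shared dictionary, keeps the fields of one cookie from carrying over into the next.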

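Why is this needed?

To trace how a variable changes in a piece of js, the usual way is to set breakpoints on the js in the Sources tab of the developer tools.
A few things about this are annoying:

  1. The js file is minified, so setting breakpoints directly is awkward; it is usually easier after formatting the file.
  2. js files keep getting bigger and the browser already uses plenty of memory, so opening a js file in the Sources tab and formatting it often takes a long time or simply stops responding.
  3. Setting breakpoints in many places is inconvenient, and some breakpoints are never hit.

This approach therefore uses the Gooreplacer extension to redirect specific js requests to local js files, which removes the annoyances above.
So that the local js can be returned with specific headers, SimpleHTTPRequestHandler is overridden; this also makes sure the browser can request the local file without problems.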

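Example: getting the key used for DRM decryption on Xigua Video (西瓜视频)

For the custom response header script, see here or the end of this post.
Key points for getting the Xigua Video DRM decryption key
URL: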

https://www.ixigua.com/cinema/album/7MzYdtWv46X_7MBDgA7bPWt/

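  • Open the address above, press F12, and filter the js files in the Network tab by the keyword xgplayer_encrypt
  • You can see that this js is returned with some specific response headers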

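  • First write a configuration file in the form below and name it config.json. Because the js has been formatted, content-encoding and content-length must be removed. In practice not all of the headers are needed; access-control-allow-origin alone is enough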

Minimal configuration:

{
    "host": "127.0.0.1",
    "port": 22222,
    "scripts_path": "scripts",
    "vendors~xgplayer_encrypt.b05f677a.chunk.js": {
        "access-control-allow-origin": "*"
    }
}
  • Create a scripts folder and put vendors~xgplayer_encrypt.b05f677a.chunk.js in it
  • Save the configuration and run the cheat_server script; visiting http://127.0.0.1:22222/vendors~xgplayer_encrypt.b05f677a.chunk.js shows that the response headers all match the configured ones

  • Open vendors~xgplayer_encrypt.b05f677a.chunk.js, format it, and add a debugger; statement before window.Module.UTF8ToString(p)

  • Set up a redirect rule in the Gooreplacer extension and enable it; remember to turn it off when you are not debugging

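  • Now open the developer tools (F12) in advance and refresh the Xigua Video page, then wait for execution to stop at the debugger statement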

  • The key used for DRM decryption can now be read off
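
The core of the steps above, serving a local file with per-file response headers and then checking them, can be reduced to a small sketch. This is not the cheat_server code: DemoHandler, CUSTOM_HEADERS, fetch_headers and demo.js are hypothetical names, the body is a placeholder, and the server listens on a random free port instead of 22222.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical per-file header table, shaped like the entries in config.json.
CUSTOM_HEADERS = {"demo.js": {"access-control-allow-origin": "*"}}


class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"debugger;\n"
        self.send_response(200)
        # Send only the headers configured for the requested file name.
        for key, value in CUSTOM_HEADERS.get(self.path.lstrip("/"), {}).items():
            self.send_header(key, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_headers(name: str) -> dict:
    """Serve one request on a throwaway local server and return the response headers."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: let the OS pick
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Bypass any system proxy so the request really goes to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        url = "http://127.0.0.1:%d/%s" % (server.server_port, name)
        with opener.open(url, timeout=5) as resp:
            return {k.lower(): v for k, v in resp.headers.items()}
    finally:
        server.shutdown()
        server.server_close()


headers = fetch_headers("demo.js")
print(headers.get("access-control-allow-origin"))
```

A file name that is not listed in the table gets no extra headers, which is also how the configuration-driven lookup is meant to behave.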

cheat_server implementation

For the complete cheat_server script, see:
https://github.com/xhlove/cheat_server

#!/usr/bin/env python3.7
# coding=utf-8
'''
# Author: weimo
# Created: 2020-01-18 01:01:09
# Last edited: 2020-02-22 18:14:01
# A person's fate depends, of course, on one's own struggle, but...
'''
import os
import sys
import json
import chardet
import datetime
import email.utils
import urllib.parse
from http import HTTPStatus
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

def load_config():
    config = {}
    config_path = "config.json"
    if os.path.isfile(config_path):
        with open(config_path, "rb") as f:
            # Only the first 256 bytes are read for detection, in case the file is large (it usually is not).
            _encoding = chardet.detect(f.read(256))["encoding"] or "utf-8"
        with open(config_path, "r", encoding=_encoding) as f:
            config = json.loads(f.read())
    return config

class MyHandler(SimpleHTTPRequestHandler):

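    def __init__(self, *args, config: dict = None, **kwargs):
        # Avoid a shared mutable default argument; fall back to the "scripts" folder.
        self.config = config or {}
        kwargs["directory"] = os.path.join(os.getcwd(), self.config.get("scripts_path", "scripts"))
        super().__init__(*args, **kwargs)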

    def send_head(self):
        path = self.translate_path(self.path)
        f = None
        if os.path.isdir(path):
            parts = urllib.parse.urlsplit(self.path)
            if not parts.path.endswith('/'):
                # redirect browser - doing basically what apache does
                self.send_response(HTTPStatus.MOVED_PERMANENTLY)
                new_parts = (parts[0], parts[1], parts[2] + '/',
                             parts[3], parts[4])
                new_url = urllib.parse.urlunsplit(new_parts)
                self.send_header("Location", new_url)
                self.end_headers()
                return None
            for index in "index.html", "index.htm":
                index = os.path.join(path, index)
                if os.path.exists(index):
                    path = index
                    break
            else:
                return self.list_directory(path)
        ctype = self.guess_type(path)
        try:
            f = open(path, 'rb')
        except OSError:
            self.send_error(HTTPStatus.NOT_FOUND, "File not found")
            return None

        try:
            fs = os.fstat(f.fileno())
            # Use browser cache if possible
            if ("If-Modified-Since" in self.headers
                    and "If-None-Match" not in self.headers):
                # compare If-Modified-Since and time of last file modification
                try:
                    ims = email.utils.parsedate_to_datetime(
                        self.headers["If-Modified-Since"])
                except (TypeError, IndexError, OverflowError, ValueError):
                    # ignore ill-formed values
                    pass
                else:
                    if ims.tzinfo is None:
                        # obsolete format with no timezone, cf.
                        # https://tools.ietf.org/html/rfc7231#section-7.1.1.1
                        ims = ims.replace(tzinfo=datetime.timezone.utc)
                    if ims.tzinfo is datetime.timezone.utc:
                        # compare to UTC datetime of last modification
                        last_modif = datetime.datetime.fromtimestamp(
                            fs.st_mtime, datetime.timezone.utc)
                        # remove microseconds, like in If-Modified-Since
                        last_modif = last_modif.replace(microsecond=0)

                        if last_modif <= ims:
                            self.send_response(HTTPStatus.NOT_MODIFIED)
                            self.end_headers()
                            f.close()
                            return None

            self.send_response(HTTPStatus.OK)
            # self.send_header("Content-type", ctype)
            # self.send_header("Content-Length", str(fs[6]))
            # self.send_header("Last-Modified", self.date_time_string(fs.st_mtime))
            self.send_custom_header()
            self.end_headers()
            return f
        except:
            f.close()
            raise

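    def send_custom_header(self):
        # Look up the requested file name (without query string) in the config
        # and send every header configured for it.
        js_path = urllib.parse.unquote(urllib.parse.urlsplit(self.path).path).lstrip("/")
        headers = self.config.get(js_path)
        if not isinstance(headers, dict):
            # No entry for this file, or the name collides with a setting such as "host".
            return
        for key, value in headers.items():
            self.send_header(key, value)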

    def send_response(self, code, message=None):
        self.log_request(code)
        self.send_response_only(code, message)
        # self.send_header('Server', self.version_string())
        # self.send_header('Date', self.date_time_string())

    def send_header(self, keyword, value):
        if self.request_version != 'HTTP/0.9':
            if not hasattr(self, '_headers_buffer'):
                self._headers_buffer = []
            self._headers_buffer.append(
                ("%s: %s\r\n" % (keyword, value)).encode('latin-1', 'strict'))

        if keyword.lower() == 'connection':
            if value.lower() == 'close':
                self.close_connection = True
            elif value.lower() == 'keep-alive':
                self.close_connection = False

def main():
    config = load_config()
    Handler = partial(MyHandler, config=config)
    server = HTTPServer((config["host"], config["port"]), Handler)
    print("Starting server, listen at: http://{host}:{port}".format(**config))
    server.serve_forever()

if __name__ == '__main__':
    main()

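The advantage of this API is that it needs no cookies and no computed parameters; just replace the tvid (the tv_id parameter) and it works.
As for the sign parameter in the headers, it usually appears together with t and stays valid for a long time; at the moment the request also works without sign and t.
Sample code: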

import requests

url = "http://iface2.iqiyi.com/video/3.0/v_download"

querystring = {"api":"nil","app_t":"1","app_p":"gphone","app_k":"69842642483add0a63503306d63f0443","app_v":"10.9.0","dev_ua":"Redmi","dev_os":"10","dev_hw":"null","album_id":"239229801","tv_id":"8502453100","platform_id":"10","play_core":"nil","net_sts":"16","usr_res":"512","secure_v":"1","secure_p":"GPhone","qdv":"1","api_v":"9.8","req_sn":"1571529770886"}

headers = {
    'sign': "abf536b8e1203ae3f2e6c5c8dbb9b65d",
    'User-Agent': "iqiyi/com.qiyi.video/10.9.0/NetLib-okhttp/3.12.5.1",
    't': "529636367",
    'Accept': "*/*",
    'Host': "iface2.iqiyi.com",
    'Accept-Encoding': "gzip, deflate"
    }

response = requests.request("GET", url, headers=headers, params=querystring)

print(response.text)

Sample response

{
    "code": 0,
    "album": {
        "_blk": 0,
        "_cid": 2,
        "_ct": "2019-03-18 10:34:23",
        "_id": 239229801,
        "_img": "http://m.iqiyipic.com/image/20191010/5f/ad/a_100272782_m_601_m2_120_160.jpg",
        "_pc": 2,
        "_pid": 239229801,
        "_t": "光荣时代",
        "_tvct": 1,
        "_tvs": 46,
        "ctype": 0,
        "fst_time": "2019-04-25",
        "tv_id": 8502453100,
        "cpt_l": 1,
        "cpt_r": 2,
        "t_pc": 1,
        "_dn": "2736",
        "h1_img": "http://m.iqiyipic.com/image/20191010/5f/ad/a_100272782_m_601_m2_180_236.jpg",
        "boss_type": 99,
        "clm": "",
        "tvfcs": "张译黄志忠明暗对决",
        "year": "2019",
        "t": "光荣时代",
        "v2_img": "http://m.iqiyipic.com/image/20191010/5f/ad/a_100272782_m_601_m2_284_160.jpg",
        "logo": 1,
        "logo_position": -1,
        "logo_hidden": [],
        "business_type": []
    },
    "video": {
        "_id": 8502453100,
        "_n": "光荣时代第13集",
        "subtitle": "朝阳冒充中统",
        "_od": 13,
        "desc": "北平开始打击“粮老虎”,很多奸商被抓。郑朝阳从奸商的往来账目中发现了杨凤刚的驻地——平西沙子口旧矿场。郑朝阳化妆粮食商人一路做了记号到旧矿场见到杨凤刚,杨凤刚知道郑朝阳是警察。郑朝阳则冒充自己是中统骗过杨凤刚,但还是被关押起来。在关押处发现了冼怡竟然也关在这里。矿场外,包围圈完成。矿场里,杨凤刚获悉郑朝阳说谎。下令连冼怡一起处决。这时,冼登奎带着委任状来和杨凤刚谈判。城里,郑朝山从旧警察多门的口中探听到郑朝阳带人抓捕杨凤刚的情报,紧急给杨凤刚发报撤离。\n",
        "_img": "http://m.iqiyipic.com/image/20191014/77/39/v_139408871_m_601_160_90.jpg",
        "_dn": "2705",
        "s_t": 90,
        "e_t": 2530,
        "ts_res": {
            "16": {
                "vid": "becedab20c5e7a6f4a436d039cd0b103",
                "len": 368685759,
                "f4v_url": "",
                "m3u8_url": "",
                "dolby_len": 0,
                "h265_len": 0,
                "audio_aac_len": 0
            },
            "512": {
                "vid": "a509b538d47e0e59c0ac51c16863950e",
                "len": 704308245,
                "f4v_url": "",
                "m3u8_url": "",
                "dolby_len": 0,
                "h265_len": 0,
                "audio_aac_len": 0
            },
            "4": {
                "vid": "bc4a5528adf8ead809255ff2b5b183ea",
                "len": 121820901,
                "f4v_url": "",
                "m3u8_url": "",
                "dolby_len": 0,
                "h265_len": 0,
                "audio_aac_len": 0
            },
            "8": {
                "vid": "e633dfc4ecb17a5824a86c728ca2ed4e",
                "len": 163529602,
                "f4v_url": "",
                "m3u8_url": "",
                "dolby_len": 0,
                "h265_len": 0,
                "audio_aac_len": 0
            },
            "128": {
                "vid": "79fb300726b67a3d0dcdc0afd81d45b3",
                "len": 65016514,
                "f4v_url": "",
                "m3u8_url": "",
                "dolby_len": 0,
                "h265_len": 0,
                "audio_aac_len": 0
            }
        },
        "mp4_res": {
            "1": {
                "vid": "716a30d6ec16b88b819f991d5a83857f",
                "url": "",
                "len": 77340238
            },
            "32": {
                "vid": "ff9417948c6a9d9f96e7c8778228c27d",
                "url": "",
                "len": 128380783
            },
            "2": {
                "vid": "ff9417948c6a9d9f96e7c8778228c27d",
                "url": "",
                "len": 128380783
            }
        },
        "video_ctype": 0,
        "boss_type": 1,
        "pre_img": {
            "pre_img_url": "http://preimage2.iqiyipic.com/preimage/20191018/87/fe/v_139408871_m_612_m1_220_124.jpg",
            "rule": "1-30",
            "interval": 10
        },
        "ta": {
            "215235805": {
                "id": 215235805,
                "img": "http://pic2.iqiyipic.com/image/20191010/66/09/p_2001251_m_601_m3_128_128.jpg",
                "name": "黄志忠"
            },
            "204240905": {
                "id": 204240905,
                "img": "http://pic6.iqiyipic.com/image/20181229/94/97/p_1037073_m_601_m2_128_128.jpg",
                "name": "李添诺"
            },
            "203609805": {
                "id": 203609805,
                "img": "http://pic6.iqiyipic.com/image/20181229/08/12/p_1030762_m_601_m4_128_128.jpg",
                "name": "张译"
            },
            "234334505": {
                "id": 234334505,
                "img": "http://pic3.iqiyipic.com/image/20181228/ed/19/p_5232901_m_601_m2_128_128.jpg",
                "name": "张隽溢"
            },
            "215381805": {
                "id": 215381805,
                "img": "http://pic1.iqiyipic.com/image/20181229/ed/0d/p_2013408_m_601_m3_128_128.jpg",
                "name": "潘之琳"
            },
            "214463605": {
                "id": 214463605,
                "img": "http://pic7.iqiyipic.com/image/20181228/08/81/p_2008149_m_601_m6_128_128.jpg",
                "name": "黄品沅"
            },
            "214432805": {
                "id": 214432805,
                "img": "http://pic4.iqiyipic.com/image/20181229/da/a2/p_2006722_m_601_m3_128_128.jpg",
                "name": "王骁"
            }
        },
        "vip_type": "0",
        "pay_mark": 2,
        "business_type": [],
        "play_mode": 1,
        "fst_time": "20191019",
        "video_tail_start_point": 2530000,
        "bullet_num": "300"
    },
    "setting": {
        "dl_type": 4,
        "ts": "16,4,512,8,128",
        "authResult": {
            "auth_success": false
        }
    }
}
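
To pull the per-stream entries out of a response like the one above, something along these lines should work. It is a sketch on a trimmed copy of the sample (three of the five ts_res entries, values copied from the response). list_streams is a hypothetical helper, and reading len as a size in bytes is an assumption.

```python
import json

# Trimmed copy of the sample response: only the fields used below.
SAMPLE = json.loads("""
{
    "video": {
        "_od": 13,
        "ts_res": {
            "16": {"vid": "becedab20c5e7a6f4a436d039cd0b103", "len": 368685759},
            "512": {"vid": "a509b538d47e0e59c0ac51c16863950e", "len": 704308245},
            "4": {"vid": "bc4a5528adf8ead809255ff2b5b183ea", "len": 121820901}
        }
    },
    "setting": {"ts": "16,4,512,8,128"}
}
""")


def list_streams(data: dict) -> list:
    """Return (stream id, vid, len) tuples in the order given by setting.ts."""
    ts_res = data["video"]["ts_res"]
    order = data["setting"]["ts"].split(",")
    # Skip ids that setting.ts lists but ts_res does not contain.
    return [(sid, ts_res[sid]["vid"], ts_res[sid]["len"])
            for sid in order if sid in ts_res]


for sid, vid, size in list_streams(SAMPLE):
    # len is assumed to be a byte count; shown here in MiB for readability.
    print(sid, vid, "%.1f MiB" % (size / 1024 / 1024))
```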

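Requirement

Count how often each style is used in ass subtitle files

Background

Recently I had to modify several dozen subtitle files. Each file predefines a number of styles, and different styles are used as the situation requires. A few styles are used nearly everywhere, but now and then a rarely used style shows up in a handful of places. The encoding script I finished last night was expected to process 20+ videos, but when I got up this morning it had stopped after 9. The 10th subtitle file used a style that was not predefined (earlier I had replaced the predefined styles in all subtitle files in one go and dropped those used only once or twice)...

Code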

# -*- coding: utf-8 -*-

import os


def count_assstyle(ass_name="filename.ass"):
    com_num = 0  # number of Comment lines (counted but not printed)
    count_flag = False
    style_dict = dict()
    # utf-8-sig also handles files that start with a BOM
    with open(ass_name, "r", encoding="utf-8-sig") as ass:
        for line in ass:
            line = line.strip()
            if line == "[Events]":
                # everything after this line is subtitle content
                count_flag = True
                continue
            elif not count_flag:
                continue
            if line.startswith("Dialogue:"):
                # Layer, Start, End, Style, ... -> the style name is field 3
                style = line.split(",")[3]
                style_dict[style] = style_dict.get(style, 0) + 1
            elif line.startswith("Comment:"):
                com_num += 1
    print(ass_name, dict(sorted(style_dict.items())))


# process every .ass file in the current folder
for name in os.listdir():
    if name.endswith(".ass"):
        count_assstyle(name)

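Other notes

The subtitle content starts at

[Events]

and the line after it is

Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text

where the Style field is the style name.
At first I used a regular expression

style = re.findall(".*?,.*?,.*?,(.*?),.*",line.strip())[0]

but that is a bit clumsy; splitting the line directly is better.

style = line.strip().split(",")[3]

The script counts the styles of every subtitle file in the current folder, so the result can be seen at a glance.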
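
Since the original trouble was a Dialogue line using a style that was never defined, the same split can also be used to flag such styles directly. The following is only a sketch on a made-up, heavily trimmed subtitle text (real Style lines have many more fields); scan_styles is a hypothetical helper and is not part of the script above.

```python
from collections import Counter

# Made-up, heavily trimmed subtitle file; real Style lines have many more fields.
SAMPLE_ASS = """\
[V4+ Styles]
Format: Name, Fontname, Fontsize
Style: Default,Arial,20

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:01.00,0:00:02.00,Default,,0,0,0,,Hello, world
Dialogue: 0,0:00:02.00,0:00:03.00,Note,,0,0,0,,A note
Comment: 0,0:00:03.00,0:00:04.00,Default,,0,0,0,,ignored
Dialogue: 0,0:00:04.00,0:00:05.00,Default,,0,0,0,,Bye
"""


def scan_styles(text: str):
    """Return (styles defined by Style: lines, Counter of styles used by Dialogue lines)."""
    defined, used = set(), Counter()
    in_events = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            in_events = line == "[Events]"
        elif line.startswith("Style:"):
            # "Style: Name,Fontname,..." -> the name is the first field
            defined.add(line[len("Style:"):].split(",", 1)[0].strip())
        elif in_events and line.startswith("Dialogue:"):
            # Layer, Start, End, Style, ... -> the style is the fourth field;
            # maxsplit keeps commas inside the Text field from mattering.
            used[line.split(",", 4)[3].strip()] += 1
    return defined, used


defined, used = scan_styles(SAMPLE_ASS)
undefined = sorted(set(used) - defined)
print(dict(used), undefined)
```

Running a check like this over every file before starting a long batch job would surface the missing style up front rather than partway through.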