CodeQL Pitfall Notes (Part 2)

Posted on March 31, 2020

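The first thing to deal with is the problem left over from last time: adding a custom taint-tracking step.
The bundled tests already contain an example; see ql/python/ql/test/library-tests/taint/extensions/ExtensionsLib.qll for reference.

The end result looks roughly like this: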

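// Extra flow step: from the receiver of a method call to the call itself.
class AnyCallFlow extends DataFlowExtension::DataFlowNode {
    AnyCallFlow() {
        // `this` is the object part of some `obj.attr(...)` call
        exists(CallNode call |
            call.getFunction().(AttrNode).getObject() = this
        )
    }

    // The successor is the call node whose receiver is `this`
    override ControlFlowNode getASuccessorNode() {
        result.(CallNode).getFunction().(AttrNode).getObject() = this
    }
}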

In other words: if a function call has the form val.attr(...) and val is tainted, then the whole CallNode becomes tainted.
After that, it only needs to be added to the Configuration:

override predicate isExtension(TaintTracking::Extension extension) {
    extension instanceof AnyCallFlow
}
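To make the propagation rule concrete outside of QL, here is a minimal Python sketch built on the standard `ast` module. It is not how CodeQL implements the step; the function name `tainted_method_calls` and the idea of passing tainted variable names as a plain set are assumptions made for illustration. It only handles the simplest case, where the receiver is a bare variable name.

```python
import ast

def tainted_method_calls(source, tainted_names):
    """Return (method, lineno) for each call of the form name.method(...)
    whose receiver `name` is in tainted_names."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only `receiver.method(...)` matches; plain calls like len(x) do not.
        if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
            if func.value.id in tainted_names:
                hits.append((func.attr, node.lineno))
    return sorted(hits, key=lambda hit: hit[1])

sample = "a = tmp.split('|')\nb = other.strip()\nc = len(tmp)\n"
print(tainted_method_calls(sample, {"tmp"}))  # [('split', 1)]
```

Only `tmp.split('|')` is reported: `other` is not tainted, and `len(tmp)` is not a method call on the tainted value, which mirrors the `AttrNode` receiver condition in the QL class.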

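With the extension registered, methods such as split are now recognized, although this certainly raises the false-positive rate.
As an aside, I have recently been watching the software analysis course that Nanjing University published on Bilibili, and it is very well taught. What we have here is really the soundness vs. completeness question. For security work, soundness is preferable, so it is better to accept a higher false-positive rate in exchange for a lower false-negative rate.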
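The cost of this over-approximation can be seen directly in Python. The sketch below is only an illustration: `carries_text` is a hypothetical helper and the payload is made up. It checks which method results still contain attacker-controlled text. The extension treats all four results as tainted, even though only some of them can carry a payload onward.

```python
def carries_text(value):
    """True if value is a non-empty string, or a list/tuple containing one."""
    if isinstance(value, str):
        return value != ""
    if isinstance(value, (list, tuple)):
        return any(carries_text(item) for item in value)
    return False

# Stands in for request.args.get('c'); the content is invented for this sketch.
payload = "ls|cat /etc/passwd"

results = {
    "split": payload.split("|"),     # list of attacker-controlled strings
    "upper": payload.upper(),        # attacker-controlled string
    "isdigit": payload.isdigit(),    # just a boolean
    "count": payload.count("|"),     # just an integer
}
survives = {name: carries_text(value) for name, value in results.items()}
print(survives)
# {'split': True, 'upper': True, 'isdigit': False, 'count': False}
```

Tainting the results of `isdigit` and `count` is where extra alerts can come from; the rule accepts that in order not to miss cases like `split`.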

Finally, I wrapped everything up the way the official library does. The final result:

import python
import semmle.python.security.TaintTracking
import semmle.python.web.flask.Request

// Extra flow step: from the receiver of a method call to the call itself.
class AnyCallFlow extends DataFlowExtension::DataFlowNode {
    AnyCallFlow() {
        exists(CallNode call |
            call.getFunction().(AttrNode).getObject() = this
        )
    }

    override ControlFlowNode getASuccessorNode() {
        result.(CallNode).getFunction().(AttrNode).getObject() = this
    }
}

// Functions whose first positional argument must not be attacker-controlled.
class DangerousFunctionArg0 extends Value {
    DangerousFunctionArg0() {
        this = Value::named("subprocess.check_output") or
        this = Value::named("os.system") or
        this = Value::named("os.popen") or
        this = Value::named("eval") or
        this = Value::named("exec") or
        this = Value::named("flask.render_template_string")
    }
}

// Sink: argument 0 of a call that points to one of the dangerous functions.
class DangerousFunctionArg0Sink extends TaintSink {
    DangerousFunctionArg0Sink() {
        exists(CallNode call, DangerousFunctionArg0 dangerous_func |
            call.getFunction().pointsTo(dangerous_func) and
            call.getArg(0) = this
        )
    }

    // Accept every kind of taint.
    override predicate sinks(TaintKind taint) {
        any()
    }
}

class SystemCommandExecution extends TaintTracking::Configuration {
    SystemCommandExecution() { this = "SystemCommandExecution Tracking" }

    override predicate isSource(DataFlow::Node src, TaintKind kind) {
        src.asCfgNode() instanceof FlaskRequestArgs
    }

    override predicate isSink(DataFlow::Node sink, TaintKind kind) {
        sink.asCfgNode() instanceof DangerousFunctionArg0Sink
    }

    // Register the custom receiver-to-call step.
    override predicate isExtension(TaintTracking::Extension extension) {
        extension instanceof AnyCallFlow
    }
}

from SystemCommandExecution config, DataFlow::Node src, DataFlow::Node sink
where config.hasSimpleFlow(src, sink)
select sink, src

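Running it against the sample below, which contains 10 vulnerabilities in total, finds all of them, which is not bad. (The sample has 12 routes; index9 and index10 pass the tainted value only as a template variable alongside a constant template string, so they are presumably the two that do not count.)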

import flask
import subprocess
from subprocess import check_output
from flask import request

app = flask.Flask(__name__)

def passby(i):
    return i.split('123')

@app.route('/index')
def index():
    return subprocess.check_output(flask.request.args.get('c', 'ls'))

@app.route('/index2')
def index2():
    tmp = flask.request.args.get('c', 'ls')
    tmp = tmp.split('|')
    return subprocess.check_output(tmp)

@app.route('/index3')
def index3():
    tmp = flask.request.args.get('c', 'ls')
    tmp = tmp.split('|')
    return check_output(tmp)

@app.route('/index4')
def index4():
    tmp = request.args.get('c', 'ls')
    tmp = tmp.split('|')
    return subprocess.check_output(tmp)

@app.route('/index5')
def index5():
    tmp = flask.request.args.get('c', 'ls')
    tmp = tmp + "i"
    return subprocess.check_output(tmp)

@app.route('/index6')
def index6():
    tmp = request.args.get('c', 'ls')
    tmp = tmp + "i"
    return subprocess.check_output(tmp)

@app.route('/index7')
def index7():
    tmp = request.args.get('c', 'ls')
    tmp = tmp + "i"
    return check_output(tmp)

@app.route('/index8')
def index8():
    tmp = request.args.get('c', 'ls')
    tmp = tmp + "i"
    return flask.render_template_string(tmp)

@app.route('/index9')
def index9():
    tmp = request.args.get('c', 'ls')
    tmp = tmp + "i"
    return flask.render_template_string("asd", t=tmp)

@app.route('/index10')
def index10():
    tmp = request.args.get('c', 'ls')
    tmp = passby(tmp + "i")
    return flask.render_template_string("asd", t=tmp)

@app.route('/index11')
def index11():
    tmp = request.args.get('c', 'ls')
    tmp = passby(tmp + "i")
    return eval(tmp)

@app.route('/index12')
def index12():
    tmp = request.args.get('c', 'ls')
    tmp = passby(tmp + "i")
    return flask.render_template_string(tmp)

app.run()
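The sink in the query only looks at positional argument 0, so a call whose first argument is a constant is never a sink, regardless of its keyword arguments. The following is a rough, purely syntactic approximation of that check in Python, assuming Python 3.8 or later (where literals parse as `ast.Constant`). The names `DANGEROUS`, `dotted_name` and `non_constant_arg0_calls` are hypothetical, and unlike CodeQL's `pointsTo` this sketch does not resolve imports such as `from subprocess import check_output`.

```python
import ast

# Hypothetical list mirroring the names used in DangerousFunctionArg0.
DANGEROUS = {
    "subprocess.check_output", "os.system", "os.popen",
    "eval", "exec", "flask.render_template_string",
}

def dotted_name(node):
    """Rebuild 'a.b.c' from nested Name/Attribute nodes, or return None."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return None if base is None else base + "." + node.attr
    return None

def non_constant_arg0_calls(source):
    """Line numbers of dangerous calls whose first positional argument
    is not a literal constant."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and dotted_name(node.func) in DANGEROUS:
            if node.args and not isinstance(node.args[0], ast.Constant):
                lines.append(node.lineno)
    return sorted(lines)

sample = (
    "flask.render_template_string(tmp)\n"
    "flask.render_template_string('asd', t=tmp)\n"
    "eval(tmp)\n"
)
print(non_constant_arg0_calls(sample))  # [1, 3]
```

The second call is skipped because its template is the literal 'asd'; the tainted value only arrives through the keyword argument `t`.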

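Finally, I feel the hardest part of writing queries is the shift in mindset. A declarative language like this works like SQL: you tell the program that you want xx at some location, and that the yy inside xx is zz.
That shift takes some time. Beyond that, the official Python library itself feels somewhat messy (ducks and runs): there are many similar-looking objects, such as PythonFunctionCall and CallNode, and the same goal can be reached in countless different ways. It really is not very friendly to newcomers. Hopefully the documentation will catch up later.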
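The declarative mindset described above can be loosely imitated in Python. This is only an analogy, not QL semantics: the `Call` record and the three facts below are invented for illustration and stand in for what CodeQL would extract from a code base. The loop spells out how to search, while the comprehension only states the condition a result must satisfy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Call:
    function: str  # name of the called function
    arg0: str      # name of the variable passed as the first argument

# Hypothetical facts for this sketch.
calls = [Call("os.system", "tmp"), Call("print", "tmp"), Call("eval", "safe")]
tainted = {"tmp"}
dangerous = {"os.system", "eval"}

# Imperative: describe the search step by step.
found_loop = []
for call in calls:
    if call.function in dangerous:
        if call.arg0 in tainted:
            found_loop.append(call)

# Declarative: describe what a result looks like.
found_query = [
    c for c in calls
    if c.function in dangerous and c.arg0 in tainted
]

print(found_query == found_loop)  # True
```

Both versions return the single `os.system` call; the second reads much closer to a `from ... where ... select` query.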