SekaiCTF 2024 writeups
My writeups for SekaiCTF 2024
CalcQL
Use CodeQL to find the function that returns 42.
Author: irogir
Nice, a chance to use CodeQL.
Challenge
Write a CodeQL query that detects functions that return 42. The CodeQL query must run on Sekai’s challenge platform over 2 files, that are randomly generated and ~3500 lines long.
The backend codebase and Dockerfile were provided to the participants for simplicity’s sake, and CodeQL is available publicly if you dare to install it.
Here’s a sample of one of those generated challenge files:
def entry_c9d32a0799c677bada16():
return fn_41b2662cc07e9d415aad() + fn_dc74661a89fa9d7b781c() + fn_a5069863a1ada4cf4f31() + fn_75ac78910b7fb9f71882() + fn_9137f86648eb1cd605c1() + fn_3f62afd14b26d481a243() + fn_1db0ac419dffc1ffbf19() + 2 + 1 + 2 + 1
def entry_b0a098a549a34c848bd2():
return fn_b3e40891831cb0d37873() + fn_1f2290ad8a4960f0965b() + fn_a7cf0d75d6e192bcb819() + fn_f7037d3e4977b7e94d1c() + fn_3d1903c8851663911b6f() + fn_589da486391aaea98883() + fn_06311901fd087a41787b() + fn_19fb39766eeca40a2b07() + fn_20859e1436314cf04f76() + 5 + 1
def fn_7c4eb1a35da5224979b0():
return fn_1727ecb33f8b858b5175() + 2 + fn_3f62afd14b26d481a243() + 1 + 2 + 2 + 1 + 2 + 2 + 2 + 1 + 1
...
Solution
This is the solution that printed the flag in the live environment around 30 seconds after the end time. Not because it was slow or anything, just we got it working 5 minutes before the end of the CTF.
/**
* Alerts on functions that return 42 and whose name starts with "entry_".
*
* Assumes that:
* - your functions are one-liners with a return statement
* - your return statement's inner expression is a binary expression
* - binary expressions are all add expressions
* - operands are either function call expressions without indirection, other binary expressions, or integer literals
*
* @name summer2
* @kind problem
* @problem.severity warning
* @id python/example/hello-world
*/
import python
int value(Expr e) {
e instanceof IntegerLiteral and result = e.(IntegerLiteral).getValue()
or
e instanceof BinaryExpr and
e.(BinaryExpr).getOp() instanceof Add and
result = value(e.(BinaryExpr).getLeft()) + value(e.(BinaryExpr).getRight())
or
e instanceof Call and
exists(Function f |
f.getName() = e.(Call).getFunc().(Name).getId() and
result = value(f.getBody().getItem(0).(Return).getASubExpression())
)
}
from Function g
where
value(g.getBody().getItem(0).(Return).getASubExpression()) = 42 and
g.getName().matches("entry_%")
select g, "called " + g.getName()
Now, before arriving at the solution, here’s where we started:
Setting up VS Code to run CodeQL
- Install the CodeQL VS Code extension
- Write a small python test file with a few simple function definitions
- Run
CodeQL: Create query
- It triggers an internal installation of CodeQL - use it
- Find the path of ^ above CodeQL and use it to generate a code database into a nearby folder
- Right click the folder in the file browser and select
CodeQL: Set current database
- Run
CodeQL: Create query
again - See that it dumps some files and an
example.ql
inside - Open
example.ql
, type Cmd-Shift-P and runCodeQL: Run query on selected database
Hey presto, it executes!
Use the CodeQL side button in the left sidebar to get about 20% of an ergonomic query editor.
Editing the query
There are 2 types of queries that you can try for this challenge: queries, and path queries. Queries look like logic programming meets SQL, and execute over the AST.
Path queries look similar but execute over the data flow graph. Importantly, path queries are not the fit here - they can easily find you whether a start and end point meet, but will not easily give you the path between them.
Throughout the challenge, you have to remember that this is a predicate logic based language, so think of the entire system as a “narrowing down” of the possible values a variable can have. It also helps to go iteratively, as the way to search some things can be quite upside down when compared to “normal programming”.
Here’s the first try to get something working and test the IDE. Read it like this:
- Look at all possible statements in the code file
- Keep only statements that are instanceof Return
- => goes into variable
s
- Look at all possible integer literals
- Keep only integer literals that are inside the aforementioned statements
s
- … and whose value is 42
- Keep only such Return statements where we found one of those integer literals inside
- Return 2 columns: (1) the code fragment that matched the alert, and (2) a message to go with it
/**
* This is an automatically generated file
*
* @name Hello world
* @kind problem
* @problem.severity warning
* @id python/example/hello-world
*/
import python
from Stmt s
where
s instanceof Return and
exists(IntegerLiteral i | s.contains(i) and i.getValue() = 42)
select s, "lol found it"
Moving on, the next thing was to try and build a predicate logic function that can evaluate those expressions. The VS Code CodeQL AST viewer can help a bit, but it doesn’t reload automatically when the source file changes.
- Open a CodeQL file
- Make it return at least 1 finding
- Run the query on the current database
- Click on the location of the finding in the CodeQL Query Results pane
- You are now viewing a copy of the original script, mounted from a zip file
- You can only run Cmd-Shift-P
CodeQL: View AST
on these mounted zipped source files - Now the left panel “AST Viewer” works
- If you change the source file, you have to delete and regenerate the database, and
run
CodeQL: Restart query server
, and click the clear button on the AST view. It sometimes still doesn’t work.
This next one works but doesn’t resolve function calls. Almost there.
/**
* This is an automatically generated file
*
* @name summer
* @kind problem
* @problem.severity warning
* @id python/example/hello-world
*/
import python
import semmle.python.dataflow.new.DataFlow
int value(Expr e) {
if e instanceof IntegerLiteral
then result = e.(IntegerLiteral).getValue()
else
if e instanceof BinaryExpr and e.(BinaryExpr).getOp() instanceof Add
then result = value(e.(BinaryExpr).getLeft()) + value(e.(BinaryExpr).getRight())
else
if e instanceof Call
then result = 999
else result = 0
}
from DataFlow::Node input
where
input.asExpr().(BinaryExpr).getOp() instanceof Add and
value(input.asExpr()) = 42
select input.asExpr(), "This is a binary addition expression."
After some attempts, it was possible to break the problem down to individual steps.
Evaluating a BinaryExpr
This can be done simply by a recursive value-returning predicate:
// BinaryExpr
// Op (Add, ...)
// Left
// Expr
// Right
// Expr
int value(Expr e) {
e instanceof IntegerLiteral and result = e.(IntegerLiteral).getValue()
or
e instanceof BinaryExpr and
e.(BinaryExpr).getOp() instanceof Add and
result = value(e.(BinaryExpr).getLeft()) + value(e.(BinaryExpr).getRight())
// Function calls...??? Read more!
}
Getting the Return Expression of a Function
This was tricky and makes a lot of assumptions but it worked fine for the chall:
// Function f
// Body
// Stmt[]
// [0]
// Return
// [Expr]
f.getBody().getItem(0).(Return).getASubExpression()
Converting a Function Call to a Function
Took a while to figure out how to make function calls work in a quick and dirty way:
- a
Call
node appears to consist of aName
node and aArguments
node e.(Call).getFunc()
gives you theName
of the function, as a typeExpr
so you need to downcast- In Python mode, you can’t dig through a call expression to the function it is referring to.
- Even
pointsTo
didn’t help much sadly, not sure why it didn’t work. - Maybe we can look at all functions, get their names, and look at the call expression and figure out what the name of the function is, and put the two together?
Here’s how this clause could look like:
exists(Function f |
f.getName() = e.(Call).getFunc().(Name).getId()
)
Putting it all together
Now how to make it useful for us? It is possible to assign a result of a predicate inside the aggregate clause for some reason. This lets us write the 3rd clause for our value() function:
int value(Expr e) {
//
// Integer literals - easy
e instanceof IntegerLiteral and result = e.(IntegerLiteral).getValue()
or
//
// Binary expressions - left + right
e instanceof BinaryExpr and
e.(BinaryExpr).getOp() instanceof Add and
result = value(e.(BinaryExpr).getLeft()) + value(e.(BinaryExpr).getRight())
or
//
// Call expressions - a bit cursed
// "func2()" == Call { Name { getId() = "func2" } }
// "def func2(): return 2 + 3" == Function { getName() = "func2", Body { Stmt[0] { Return { Expr ... } } } }
e instanceof Call and
exists(Function f |
f.getName() = e.(Call).getFunc().(Name).getId() and
result = value(f.getBody().getItem(0).(Return).getASubExpression())
)
}
And now nothing left to do than to utilise this function in our query.
// find all functions
from Function g
where
// get the return statement,
// calculate the value,
// check if it's 42
// and that the function name matches entry_...
value(g.getBody().getItem(0).(Return).getASubExpression()) = 42 and
g.getName().matches("entry_%")
select g, "called " + g.getName()
The challenge check scripts were generating 2 arbitrary Python scripts on boot, and allowed you to upload a CodeQL file that would be run over these scripts. The CodeQL file was supposed to output a list of locations where the query matched, and the right function name would trigger the flag to be displayed.
Summary
A brain bending challenge for sure, due to the logic programming involved. Probably not the best fit to coerce Java-like syntax onto a logic programming language, but after getting over the initial hump, it was a fun challenge to solve.
Also, CodeQL will come in handy as a skill in the future.
Bonus content
The right way to find out that path queries are not the right fit: make one, and try to coerce it to work.
In any case, you can use this as a template to analyze data flow. It alerts you when an integer literal flows into a function named “entry_%”. :)
The plan was to compute the data flow graph, hope that it “looks through” function calls,
and walk the graph backwards summing up all the integers. Turns out you can’t ask for the
path between source
and sink
like that.
/**
* This is an automatically generated file
*
* @name summer2
* @kind path-problem
* @problem.severity warning
* @id python/example/hello-world
*/
import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.Concepts
private module MyConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
source.asExpr().isConstant() and source.asExpr() instanceof IntegerLiteral
}
predicate isSink(DataFlow::Node sink) {
exists(Function f | f.getName().matches("entry_%") and f.contains(sink.asExpr())) and
sink.asExpr().getParent() instanceof Return
}
}
module Flow = DataFlow::Global<MyConfig>;
import Flow::PathGraph
from Flow::PathNode source, Flow::PathNode sink
where Flow::flowPath(source, sink)
select sink.getNode(), source, sink, "<message>"