This commit was merged in pull request #105.
@@ -1,9 +1,9 @@
 """Run the eval set against one candidate and print a scorecard.
 
 Usage:
-    python run_eval.py candidates/current_model
-    python run_eval.py candidates/swapped_model
-    python run_eval.py candidates/current_model --threshold 0.9
+    python3 run_eval.py candidates/current_model
+    python3 run_eval.py candidates/swapped_model
+    python3 run_eval.py candidates/current_model --threshold 0.9
 
 A "candidate" is a directory containing a tasks.py that an agent produced. The
 runner imports that tasks.py, runs every case in eval_set.py against it, prints
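
Note: the docstring above says the runner imports a candidate's tasks.py and runs every eval case against it. The hunk does not show how run_eval.py does this, so the following is only a minimal sketch of that pattern under stated assumptions. The helper names (load_tasks, score, demo), the (function name, args, expected) case format, and the add function are all hypothetical and are not taken from run_eval.py or eval_set.py.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_tasks(candidate_dir):
    """Import <candidate_dir>/tasks.py as a standalone module (hypothetical helper)."""
    path = Path(candidate_dir) / "tasks.py"
    spec = importlib.util.spec_from_file_location("candidate_tasks", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def score(module, cases):
    """Return the fraction of cases that pass.

    Each case is assumed to be (function name, args tuple, expected value);
    the real eval_set.py may use a different shape.
    """
    passed = 0
    for func_name, args, expected in cases:
        try:
            if getattr(module, func_name)(*args) == expected:
                passed += 1
        except Exception:
            # A crash or a missing function counts as a failed case.
            pass
    return passed / len(cases)


def demo():
    """Build a throwaway candidate directory and score it."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "tasks.py").write_text("def add(a, b):\n    return a + b\n")
        module = load_tasks(tmp)
    cases = [
        ("add", (1, 2), 3),       # passes
        ("add", (0, 0), 0),       # passes
        ("add", (2, 2), 5),       # wrong expected value, fails
        ("missing", (), None),    # function not defined, fails
    ]
    return score(module, cases)


if __name__ == "__main__":
    print(f"pass rate: {demo():.0%}")
```

Loading by file path with importlib.util keeps each candidate isolated from sys.path, so two candidates that both ship a tasks.py do not shadow each other. Whether the real runner works this way is not confirmed by this commit.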