Developing algorithms
Install
This process is meant for developers. To launch, first download the code. It’s possible to download a ZIP file of Salmon’s source, or if Git is installed, to run this command:
$ git clone https://github.com/stsievert/salmon.git
Then, to launch a local version of Salmon you’ll need Docker Compose. After that dependency is installed, run the following commands:
$ cd salmon
$ docker-compose build
$ docker-compose up
$ # visit http://localhost:8421/init or http://localhost:8421/docs
If you make changes to this code, run these commands:
$ docker-compose stop
$ docker-compose build
$ docker-compose up
If you want to log into the Docker container, execute these commands:
$ docker ps # to get list of running containers
CONTAINER ID IMAGE ... [more info] ... NAMES
08b96fbcc4c3 salmon_server ... [more info] ... salmon_server_1
57cb3b7652d9 redislabs/rejson ... [more info] ... salmon_redis_1
$ docker exec -it 08b96fbcc4c3 /bin/bash
(base) root@08b96fbcc4c3:/salmon# conda activate salmon
(salmon) root@08b96fbcc4c3:/salmon#
Note
This is an alternative way to install Salmon’s dependencies. If you create a file in the Docker container in /salmon, it will also be written to /path/to/salmon on your local machine.
If you run the command export SALMON_DEBUG=1, the Salmon server will watch for changes in the source and re-launch as necessary. This won’t be perfect, but it will reduce the number of times required to run docker-compose {stop, build, up}.
If you run the command export SALMON_NO_AUTH=1, the Salmon server will not require a username/password.
Basics
First, write an algorithm on your machine. Briefly, Salmon expects two functions:
process_answers, a function to process answers (which might involve updating the model).
A function to get queries. There are two choices for this:
  get_query, which returns a single query/score.
  get_queries, which returns a list of queries and scores. These are saved in the database, and popped when a user requests a query.
Use of get_queries is strongly recommended. With get_queries, Salmon’s backend relies on Dask, which allows for higher throughput (more concurrent users). get_query uses a single worker process, so it may get overloaded with a moderate number of concurrent users.
For complete documentation, see API. In short, your algorithm should be a class that implements get_query and process_answers.
After you have developed these functions, look at other algorithms in salmon/triplets/samplers (e.g., _adaptive_runners.py or _round_robin.py) to figure out inheritance details. In short, the following details are important:
Inheriting from Sampler, which enables Salmon to work with custom algorithms.
Accepting an ident: str keyword argument in __init__ and passing that argument to super().__init__. (ident is passed to all algorithms and is the unique identifier in the database.)
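The two inheritance details above can be sketched as follows. This is a minimal illustration, not Salmon's own code: the Sampler class here is a stand-in defined locally so the example runs without Salmon installed, and RandomTriplets, its n argument, and the fixed score of 0.0 are hypothetical. The return value of process_answers is not specified on this page, so the sketch returns nothing. In a real algorithm you would inherit from Salmon's Sampler instead.

```python
import random
from typing import Dict, List, Tuple


class Sampler:
    """Stand-in for Salmon's Sampler base class (assumption, defined here
    only so the sketch is self-contained)."""

    def __init__(self, ident: str = ""):
        self.ident = ident


class RandomTriplets(Sampler):
    """Hypothetical sampler that asks uniformly random triplet queries."""

    def __init__(self, n: int = 10, ident: str = ""):
        # Pass ident through to the base class, as described above.
        super().__init__(ident=ident)
        self.n = n
        self.answers: List[dict] = []

    def get_query(self) -> Tuple[Dict[str, int], float]:
        # Pick three distinct items; the score is a placeholder.
        head, left, right = random.sample(range(self.n), 3)
        return {"head": head, "left": left, "right": right}, 0.0

    def process_answers(self, answers: List[dict]) -> None:
        # A real algorithm would update its model here.
        self.answers.extend(answers)
```

The query keys "head", "left" and "right" match the ones used by the script in the Debugging section.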
I recommend the following when developing your algorithm. These aren’t necessary but are highly encouraged:
Have your algorithm be serializable: pickle.loads(pickle.dumps(alg)) should work for your algorithm. Otherwise, your algorithm can’t be restored on a new machine.
Ensure query searches are fast enough. Users will have to wait if thousands of them come to Salmon and deplete all the searched queries.
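Both recommendations can be checked with a few lines of standard-library Python. The sketch below is one possible way to do it, under stated assumptions: ToySampler is a hypothetical stand-in for your own algorithm, and check_roundtrip and seconds_per_query are illustrative helper names that are not part of Salmon.

```python
import pickle
import time


class ToySampler:
    """Hypothetical stand-in for a custom algorithm."""

    def __init__(self, n=10, ident=""):
        self.n = n
        self.ident = ident
        self.num_answers = 0

    def get_query(self):
        # Fixed query and placeholder score, for illustration only.
        return {"head": 0, "left": 1, "right": 2}, 0.0

    def process_answers(self, answers):
        self.num_answers += len(answers)


def check_roundtrip(alg):
    """Return a copy of alg restored from its pickled bytes.

    Raises an exception if alg cannot be pickled.
    """
    return pickle.loads(pickle.dumps(alg))


def seconds_per_query(alg, repeats=100):
    """Average wall-clock time of one get_query call, in seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        alg.get_query()
    return (time.perf_counter() - start) / repeats
```

Calling check_roundtrip(alg) after a few calls to process_answers is a stricter check than calling it on a fresh instance, because the model state then has to survive the round trip as well.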
Debugging
Let’s say you’ve integrated most of your algorithm into Sampler. Now, you’d like to make sure everything is working properly.
This script will help:
from copy import copy
import random

from salmon.triplets.samplers import STE


def random_answer(q):
    # Simulate a user who picks "left" or "right" at random.
    ans = copy(q)
    winner = random.choice(["left", "right"])
    ans["winner"] = q[winner]
    return ans


params = {
    "optimizer__lr": 0.1,
    "optimizer__momentum": 0.75,
}
alg = STE(n=10, **params)  # or your custom alg

for k in range(1000):
    query, score = alg.get_query()
    if query is None:
        # No single query available: take the highest-scoring one from get_queries.
        queries, scores = alg.get_queries()
        h, a, b = queries[scores.argmax()]
        query = {"head": h, "left": a, "right": b, "score": scores.max()}

    answer = random_answer(query)
    alg.process_answers([answer])