`salmon.triplets.offline`.OfflineEmbedding

class salmon.triplets.offline.OfflineEmbedding(n=None, d=None, max_epochs=400000, opt=None, verbose=1000, ident='', noise_model='CKL', random_state=None, **kwargs)

Generate an embedding offline (after responses are downloaded from Salmon).

Parameters

n (int): Number of targets.
d (int): Embedding dimension.
max_epochs (int): Number of epochs or passes through the dataset to run for.
opt (Optional[Optimizer])
verbose (int): Interval at which to score.
random_state (Optional[int]): Random state for initialization.
noise_model (str, optional, default: 'CKL'): Noise model to optimize over.
kwargs (dict, optional, default: {}): Arguments for OGD. Only used when opt is None.

Attributes

embedding_: The current embedding.
history_: The history that’s recorded during fit.
meta_: Meta-information about this estimator.

Methods

`fit`(X_train, X_test[, embedding, get_stats])	Fit the embedding with train and validation data.
`get_params`([deep])	Get parameters for this estimator.
`initialize`(X_train[, embedding])	Initialize this optimizer.
`partial_fit`(X_train)	Fit this optimizer for (approximately) one pass through the training data.
`score`(X)	Score the responses against the current embedding.
`set_params`(**params)	Set the parameters of this estimator.

property embedding_: The current embedding. If there are n objects being embedded into d dimensions, then embedding_.shape == (n, d).

fit(X_train, X_test, embedding=None, get_stats=None, **stats_kwargs)

Fit the embedding with train and validation data.

Parameters

X_train (array-like)

Data to fit the embedding to.

The responses with shape (n_questions, 3). Each question is organized as [head, winner, loser].

X_test (array-like)

Data to score the embedding on.

The responses with shape (n_questions, 3). Each question is organized as [head, winner, loser].

embedding (np.ndarray, optional)

The embedding to initialize with.

Note

This is particularly useful when embedding is the online embedding from the CSV:

import pandas as pd
em = pd.read_csv("embedding.csv")  # from dashboard
df = pd.read_csv("responses.csv")  # from dashboard
X = df[["head", "winner", "loser"]]

from salmon.triplets.offline import OfflineEmbedding
est = OfflineEmbedding(...)
est.initialize(X, embedding=em)

get_stats (Callable)
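The [head, winner, loser] layout that fit expects for X_train and X_test can be sketched with NumPy alone. The response values, the 25% hold-out fraction, and the variable names below are illustrative assumptions, not part of Salmon.

```python
import numpy as np

# Hypothetical responses: each row is [head, winner, loser],
# with every entry an index into n = 5 targets.
responses = np.array(
    [
        [0, 1, 2],
        [3, 4, 0],
        [1, 0, 4],
        [2, 3, 1],
        [4, 2, 3],
        [0, 3, 4],
        [1, 2, 3],
        [2, 0, 4],
    ]
)
assert responses.shape == (8, 3)  # (n_questions, 3)

# Shuffle the rows, then hold out a quarter of them for validation.
rng = np.random.default_rng(0)
order = rng.permutation(len(responses))
n_test = len(responses) // 4
X_test = responses[order[:n_test]]
X_train = responses[order[n_test:]]

# One way to obtain the number of targets: the largest index plus one.
n = int(responses.max()) + 1
```

Arrays built this way have the shape documented above and could be passed as X_train and X_test.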

property history_: The history that’s recorded during fit. Available keys include score_test and loss_test.
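As an illustration of how the recorded keys might be used, the sketch below picks the scoring step with the highest score_test. It assumes history_ behaves like a list of dicts, one per scoring step; that structure and all of the numbers are assumptions made for the example, not something this page states.

```python
# Hypothetical stand-in for est.history_: one dict per scoring step,
# using the two keys documented above. The values are made up.
history = [
    {"score_test": 0.61, "loss_test": 0.69},
    {"score_test": 0.74, "loss_test": 0.58},
    {"score_test": 0.72, "loss_test": 0.60},
]

# Index of the scoring step with the best validation accuracy.
best_step = max(range(len(history)), key=lambda i: history[i]["score_test"])
best = history[best_step]
```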

initialize(X_train, embedding=None)

Initialize this optimizer.

Parameters

X_train (np.ndarray): Responses organized to be [head, winner, loser].
embedding (np.ndarray, optional): If specified, initialize the embedding with the given values.

property meta_: Meta-information about this estimator. Available keys include score_train and loss_train.

partial_fit(X_train)

Fit this optimizer for (approximately) one pass through the training data.

Parameters

X_train (array-like): The responses with shape (n_questions, 3). Each question is organized as [head, winner, loser].
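Because partial_fit covers roughly one pass, it lends itself to a manual training loop that checks the validation score between passes. The helper below is a sketch under assumptions: train_with_checkpoints and _StubEstimator are hypothetical names, the stub only mimics the documented method names so the example runs without Salmon, and calling initialize once before the first partial_fit is assumed rather than confirmed by this page.

```python
def train_with_checkpoints(est, X_train, X_test, n_passes=5):
    """Call partial_fit repeatedly and record the score after each pass."""
    est.initialize(X_train)  # assumed to be required once, up front
    scores = []
    for _ in range(n_passes):
        est.partial_fit(X_train)          # (approximately) one pass
        scores.append(est.score(X_test))  # accuracy on held-out responses
    return scores


class _StubEstimator:
    """Hypothetical stand-in exposing the documented method names."""

    def initialize(self, X_train, embedding=None):
        self.passes = 0
        return self

    def partial_fit(self, X_train):
        self.passes += 1
        return self

    def score(self, X):
        # Fake accuracy that grows with each pass; not a real model.
        return min(1.0, 0.5 + 0.1 * self.passes)


scores = train_with_checkpoints(
    _StubEstimator(), X_train=[[0, 1, 2]], X_test=[[1, 0, 2]], n_passes=3
)
```

With a real estimator in place of the stub, the returned list gives one validation score per pass, which is enough to decide when to stop.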

score(X)

Score the responses against the current embedding. Record the loss and accuracy, and return the accuracy.

Parameters

X (array-like)

The responses to score against the current embedding.

The responses with shape (n_questions, 3). Each question is organized as [head, winner, loser].
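To make the notion of accuracy concrete, here is a minimal NumPy sketch. It assumes a response [head, winner, loser] counts as correct when the head is closer to the winner than to the loser in Euclidean distance; this page does not confirm that Salmon computes its score this way, and triplet_accuracy is a hypothetical helper.

```python
import numpy as np


def triplet_accuracy(embedding, X):
    """Fraction of [head, winner, loser] rows with head nearer to winner."""
    embedding = np.asarray(embedding, dtype=float)
    X = np.asarray(X, dtype=int)
    head = embedding[X[:, 0]]
    winner = embedding[X[:, 1]]
    loser = embedding[X[:, 2]]
    d_winner = np.linalg.norm(head - winner, axis=1)
    d_loser = np.linalg.norm(head - loser, axis=1)
    return float(np.mean(d_winner < d_loser))


# Three targets on a line (n = 3, d = 1): 0 and 1 are close, 2 is far away.
embedding = np.array([[0.0], [1.0], [5.0]])
X = np.array(
    [
        [0, 1, 2],  # agrees: 0 is closer to 1 than to 2
        [2, 0, 1],  # disagrees: 2 is closer to 1 than to 0
        [1, 0, 2],  # agrees: 1 is closer to 0 than to 2
        [1, 2, 0],  # disagrees: 1 is closer to 0 than to 2
    ]
)
acc = triplet_accuracy(embedding, X)
```

Two of the four responses agree with this embedding, so acc is 0.5.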
