salmon.triplets.offline.OfflineEmbedding

class salmon.triplets.offline.OfflineEmbedding(n=None, d=None, max_epochs=400000, opt=None, verbose=1000, ident='', noise_model='CKL', random_state=None, **kwargs)

Generate an embedding offline (after responses are downloaded from Salmon).

Parameters
n : int

Number of targets.

d : int

Embedding dimension.

max_epochs : int

Number of epochs or passes through the dataset to run for.

opt : Optional[Optimizer]

verbose : int

Interval at which to score.

random_state : Optional[int]

Random state for initialization.

noise_model : str, optional (default: 'CKL', per the class signature)

Noise model to optimize over.
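
As a rough illustration of what a noise model is, the sketch below computes a per-response loss for the two models named on this page, CKL and SOE, using their commonly published forms. The function names, the `mu` and `margin` parameters, and their default values are assumptions made for this example; Salmon's own implementation may differ in detail.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ckl_loss(embedding, triplet, mu=1.0):
    """Negative log-likelihood of one [head, winner, loser] response (CKL form)."""
    h, w, l = triplet
    d_w = sq_dist(embedding[h], embedding[w])
    d_l = sq_dist(embedding[h], embedding[l])
    # Probability that the winner is judged closer to the head than the loser.
    prob = (mu + d_l) / (2 * mu + d_w + d_l)
    return -math.log(prob)


def soe_loss(embedding, triplet, margin=1.0):
    """Squared hinge on distances: zero once the winner is closer by `margin`."""
    h, w, l = triplet
    d_w = math.sqrt(sq_dist(embedding[h], embedding[w]))
    d_l = math.sqrt(sq_dist(embedding[h], embedding[l]))
    return max(0.0, margin + d_w - d_l) ** 2


# Three items on a line: item 1 is near item 0, item 2 is far away.
embedding = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]]
agree = [0, 1, 2]     # response consistent with the embedding
disagree = [0, 2, 1]  # response that contradicts the embedding

print(ckl_loss(embedding, agree) < ckl_loss(embedding, disagree))
print(soe_loss(embedding, agree), soe_loss(embedding, disagree))
```

Both losses are small when a response agrees with the embedding and large when it does not; they differ in how strongly they penalize a violated response.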

kwargs : dict, optional, default: {}

Arguments for OGD. Only used when opt is None.

Attributes
embedding_

The current embedding.

history_

The history that’s recorded during fit.

meta_

Meta-information about this estimator.

Methods

fit(X_train, X_test[, embedding, get_stats])

Fit the embedding with train and validation data.

get_params([deep])

Get parameters for this estimator.

initialize(X_train[, embedding])

Initialize this optimizer.

partial_fit(X_train)

Fit this optimizer for (approximately) one pass through the training data.

score(X)

Score the responses against the current embedding.

set_params(**params)

Set the parameters of this estimator.

property embedding_

The current embedding. If there are n objects being embedded into d dimensions, then embedding_.shape == (n, d).

fit(X_train, X_test, embedding=None, get_stats=None, **stats_kwargs)

Fit the embedding with train and validation data.

Parameters
X_train : array-like

Data to fit the embedding to.

The responses with shape (n_questions, 3). Each question is organized as [head, winner, loser].

X_test : array-like

Data to score the embedding on.

The responses with shape (n_questions, 3). Each question is organized as [head, winner, loser].

embedding : np.ndarray, optional

The embedding to initialize with.

Note

This is particularly useful when embedding is the online embedding from the CSV:

import pandas as pd
from salmon.triplets.offline import OfflineEmbedding

em = pd.read_csv("embedding.csv")  # online embedding, from dashboard
df = pd.read_csv("responses.csv")  # responses, from dashboard
X = df[["head", "winner", "loser"]]

est = OfflineEmbedding(...)
est.initialize(X, embedding=em)

get_stats : Callable

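
Because fit takes separate training and validation responses, the downloaded responses have to be split first. The following is a minimal sketch of one way to do that with the standard library; `split_responses`, `test_fraction`, and `seed` are hypothetical names introduced for this example and are not part of Salmon.

```python
import random


def split_responses(responses, test_fraction=0.2, seed=0):
    """Shuffle [head, winner, loser] rows and split them into (train, test)."""
    rows = [list(r) for r in responses]
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(rows)
    n_test = max(1, int(round(test_fraction * len(rows))))
    return rows[n_test:], rows[:n_test]


# Toy responses over three targets, each row [head, winner, loser].
responses = [[0, 1, 2], [1, 0, 2], [2, 1, 0], [0, 2, 1], [1, 2, 0]]
X_train, X_test = split_responses(responses, test_fraction=0.2, seed=42)
print(len(X_train), len(X_test))
```

The two lists can then be passed as X_train and X_test. Any other splitting utility that keeps each row intact would serve equally well.
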
property history_

The history that’s recorded during fit. Available keys include score_test and loss_test.
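
A short sketch of how the recorded history might be inspected. It assumes history_ behaves like a list of dictionaries, one per scoring interval, which this page does not state explicitly. The records below are made-up placeholders rather than real output, and `best_record` is a hypothetical helper.

```python
def best_record(history, key="score_test"):
    """Return the record with the highest value for `key`."""
    return max(history, key=lambda record: record[key])


# Made-up records standing in for est.history_; real values will differ.
history = [
    {"score_test": 0.61, "loss_test": 0.69},
    {"score_test": 0.74, "loss_test": 0.55},
    {"score_test": 0.72, "loss_test": 0.57},
]

best = best_record(history)
print(best["score_test"], best["loss_test"])
```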

initialize(X_train, embedding=None)

Initialize this optimizer.

Parameters
X_train : np.ndarray

Responses organized to be [head, winner, loser].

embedding : np.ndarray, optional

If specified, initialize the embedding with the given values.
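
Before calling initialize with a custom embedding, it can help to confirm that the inputs match the documented layout: rows of [head, winner, loser] and an embedding of shape (n, d). The sketch below is one way to check this. `check_inputs` is a hypothetical helper, and it assumes target indices are zero-based integers in the range 0 to n - 1.

```python
def check_inputs(responses, n, d, embedding=None):
    """Raise ValueError if responses or embedding do not match (n, d)."""
    for i, row in enumerate(responses):
        if len(row) != 3:
            raise ValueError(f"row {i} has {len(row)} entries, expected 3")
        # Assumption: indices are zero-based, so valid values are 0..n-1.
        if any(not 0 <= int(idx) < n for idx in row):
            raise ValueError(f"row {i} has an index outside [0, {n})")
    if embedding is not None:
        if len(embedding) != n or any(len(point) != d for point in embedding):
            raise ValueError(f"embedding must have shape ({n}, {d})")
    return True


responses = [[0, 1, 2], [2, 0, 1]]
embedding = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # n=3 targets, d=2 dimensions
print(check_inputs(responses, n=3, d=2, embedding=embedding))
```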

property meta_

Meta-information about this estimator. Available keys include score_train and loss_train.

partial_fit(X_train)

Fit this optimizer for (approximately) one pass through the training data.

Parameters
X_train : array-like

The responses with shape (n_questions, 3). Each question is organized as [head, winner, loser].
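
One plausible reading of "(approximately) one pass" is that responses are drawn in mini-batches at random, so that roughly n_questions samples are seen without every response being visited exactly once. The sketch below illustrates only that idea. The batching scheme, `one_pass_batches`, and `batch_size` are assumptions for this example and are not taken from Salmon.

```python
import random


def one_pass_batches(n_questions, batch_size, seed=0):
    """Yield index batches that together cover about n_questions samples."""
    rng = random.Random(seed)
    n_batches = max(1, n_questions // batch_size)
    for _ in range(n_batches):
        # Sampling with replacement is what makes the pass only approximate.
        yield [rng.randrange(n_questions) for _ in range(batch_size)]


batches = list(one_pass_batches(n_questions=10, batch_size=4))
print(len(batches), sum(len(b) for b in batches))
```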

score(X)

Score the responses against the current embedding. Record the loss and accuracy, and return the accuracy.

Parameters
X : array-like

The responses to score against the current embedding.

The responses with shape (n_questions, 3). Each question is organized as [head, winner, loser].
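
To make the accuracy figure concrete, the sketch below counts a response as correct when the winner lies closer to the head than the loser does in the embedding. This is the usual definition of triplet accuracy and is assumed here rather than confirmed from Salmon's source. `triplet_accuracy` is a hypothetical helper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_accuracy(embedding, responses):
    """Fraction of [head, winner, loser] rows where the winner is closer."""
    correct = sum(
        sq_dist(embedding[h], embedding[w]) < sq_dist(embedding[h], embedding[l])
        for h, w, l in responses
    )
    return correct / len(responses)


# Three items on a line at positions 0, 1 and 4.
embedding = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]]
responses = [[0, 1, 2], [0, 2, 1], [2, 1, 0], [1, 0, 2]]
accuracy = triplet_accuracy(embedding, responses)
print(accuracy)  # 0.75: only [0, 2, 1] contradicts the embedding
```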