salmon.triplets.offline.OfflineEmbedding
- class salmon.triplets.offline.OfflineEmbedding(n=None, d=None, max_epochs=400000, opt=None, verbose=1000, ident='', noise_model='CKL', random_state=None, **kwargs)
Generate an embedding offline (after responses are downloaded from Salmon).
- Parameters
- n : int
Number of targets.
- d : int
Embedding dimension.
- max_epochs : int
Number of epochs or passes through the dataset to run for.
- opt : Optional[Optimizer]
- verbose : int
Interval at which to score.
- random_state : Optional[int]
Random state for initialization.
- noise_model : str, optional (default: SOE)
Noise model to optimize over.
- kwargs : dict, optional, default: {}
Arguments for OGD. Only used when opt is None.
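For intuition about what a noise model expresses, here is a minimal sketch of the crowd kernel (CKL) response probability as it is commonly stated in the literature: the probability that a respondent picks `winner` over `loser` for a given `head`. The function name `ckl_probability`, the `mu` default, and the toy embedding are assumptions for illustration; Salmon's own implementation may be parameterized differently.

```python
def ckl_probability(embedding, head, winner, loser, mu=1.0):
    """Probability that `winner` is judged closer to `head` than `loser`.

    Hypothetical helper, not part of Salmon. Uses the commonly cited
    form (mu + d_lose) / (2 * mu + d_win + d_lose), where d_* are
    squared Euclidean distances from the head.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    d_win = sq_dist(embedding[head], embedding[winner])
    d_lose = sq_dist(embedding[head], embedding[loser])
    return (mu + d_lose) / (2 * mu + d_win + d_lose)


# Toy embedding: n = 3 targets in d = 2 dimensions (made-up values).
embedding = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
p = ckl_probability(embedding, head=0, winner=1, loser=2)
```

Under this form, the probabilities of the two possible answers to one question sum to one, and a larger `mu` pulls both toward 0.5.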
- Attributes
embedding_
The current embedding.
history_
The history that’s recorded during fit.
meta_
Meta-information about this estimator.
Methods
fit(X_train, X_test[, embedding, get_stats]): Fit the embedding with train and validation data.
get_params([deep]): Get parameters for this estimator.
initialize(X_train[, embedding]): Initialize this optimizer.
partial_fit(X_train): Fit this optimizer for (approximately) one pass through the training data.
score(X): Score the responses against the current embedding.
set_params(**params): Set the parameters of this estimator.
- property embedding_
The current embedding. If there are n objects being embedded into d dimensions, then embedding_.shape == (n, d).
- fit(X_train, X_test, embedding=None, get_stats=None, **stats_kwargs)
Fit the embedding with train and validation data.
- Parameters
- X_train : array-like
Data to fit the embedding to. The responses with shape (n_questions, 3). Each question is organized as [head, winner, loser].
- X_test : array-like
Data to score the embedding on. The responses with shape (n_questions, 3). Each question is organized as [head, winner, loser].
- embedding : np.ndarray, optional
The embedding to initialize with.
Note
This is particularly useful when embedding is the online embedding from the CSV:

    import pandas as pd
    from salmon.triplets.offline import OfflineEmbedding

    em = pd.read_csv("embedding.csv")  # from dashboard
    df = pd.read_csv("responses.csv")  # from dashboard
    X = df[["head", "winner", "loser"]]

    est = OfflineEmbedding(...)
    est.initialize(X, embedding=em)
- get_stats : Callable
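As a minimal sketch of preparing X_train and X_test, the snippet below shuffles a list of [head, winner, loser] responses and holds out a fraction for validation. The helper name `split_responses`, the 80/20 ratio, and the toy responses are illustrative assumptions, not part of Salmon.

```python
import random


def split_responses(responses, test_fraction=0.2, seed=0):
    """Shuffle [head, winner, loser] rows and split them into train/test lists.

    Hypothetical helper for illustration; not part of Salmon.
    """
    rows = [list(r) for r in responses]
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


# Toy responses over 4 targets; each row is [head, winner, loser].
responses = [
    [0, 1, 2],
    [0, 1, 3],
    [1, 0, 3],
    [2, 3, 0],
    [3, 2, 1],
    [1, 2, 3],
    [2, 1, 0],
    [3, 0, 1],
    [0, 2, 3],
    [1, 0, 2],
]
X_train, X_test = split_responses(responses)
```

The two resulting lists, or arrays built from them, could then be passed as est.fit(X_train, X_test).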
- property history_
The history that’s recorded during fit. Available keys include score_test and loss_test.
- initialize(X_train, embedding=None)
Initialize this optimizer.
- Parameters
- X_train : np.ndarray
Responses organized to be [head, winner, loser].
- embedding : np.ndarray, optional
If specified, initialize the embedding with the given values.
- property meta_
Meta-information about this estimator. Available keys include score_train and loss_train.
- partial_fit(X_train)
Fit this optimizer for (approximately) one pass through the training data.
- Parameters
- X_train : array-like
The responses with shape (n_questions, 3). Each question is organized as [head, winner, loser].
- score(X)
Score the responses against the current embedding. Record the loss and accuracy, and return the accuracy.
- Parameters
- X : array-like
The responses to score against the current embedding, with shape (n_questions, 3). Each question is organized as [head, winner, loser].
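To make the idea of accuracy concrete, the sketch below counts a response as correct when the winner lies strictly closer to the head than the loser does under Euclidean distance. This is an assumption about how agreement between a response and an embedding can be measured; the function name `triplet_accuracy` and the toy data are hypothetical, and Salmon's score may differ in its details.

```python
import math


def triplet_accuracy(X, embedding):
    """Fraction of [head, winner, loser] rows that the embedding agrees with.

    A row counts as correct when the winner is strictly closer to the head
    than the loser is. Illustrative only; not Salmon's implementation.
    """
    correct = 0
    for head, winner, loser in X:
        d_win = math.dist(embedding[head], embedding[winner])
        d_lose = math.dist(embedding[head], embedding[loser])
        if d_win < d_lose:
            correct += 1
    return correct / len(X)


# Toy embedding: n = 3 targets in d = 2 dimensions (made-up values).
embedding = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
X = [[0, 1, 2], [2, 0, 1], [1, 0, 2], [0, 2, 1]]
accuracy = triplet_accuracy(X, embedding)
```

Here two of the four toy responses agree with the embedding, so the computed accuracy is one half.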