salmon.triplets.manager.Sampling

pydantic settings salmon.triplets.manager.Sampling

Settings to configure how more than two samplers are used.

These settings are used in the HTML but are not stylistic. The exception is common, which is passed to every sampler during initialization and will likely hold optional arguments for Adaptive.

The default configuration is specified completely below and is rendered into a YAML file in an example (https://github.com/stsievert/salmon/blob/master/examples/default.yaml).

JSON schema

{
   "title": "Sampling",
   "description": "Settings to configure how more than two samplers are used.\n\nThese settings are used in the HTML but are not stylistic. The\nexception is ``common``, which is passed to every ``sampler`` during\ninitialization and will likely be optional arguments for\n:class:`~salmon.triplets.samplers.Adaptive`.\n\nThe default configuration is specified completely below, which is rendered into a YAML file in `an example`_.\n\n.. _an example: https://github.com/stsievert/salmon/blob/master/examples/default.yaml",
   "type": "object",
   "properties": {
      "common": {
         "title": "Common",
         "description": "Arguments to pass to every sampler for initialization (likely\n        values for :class:`~salmon.triplets.samplers.Adaptive`; note that\n        values for ``n`` and ``ident`` are already specified). Any\n        values specified in this field will be overwritten by\n        sampler-specific arguments.",
         "default": {
            "d": 2
         },
         "env_names": "{'common'}",
         "type": "object"
      },
      "probs": {
         "title": "Probs",
         "description": "The percentage to sample each ``sampler`` when given the\n        opportunity (which depends on ``samplers_per_user``). The percentages\n        in this field must add up to 100.\n        If not specified (default), choose each sampler with equal\n        probability.",
         "env_names": "{'probs'}",
         "type": "object",
         "additionalProperties": {
            "type": "integer"
         }
      },
      "samplers_per_user": {
         "title": "Samplers Per User",
         "description": "The number of samplers to assign to each user. Setting\n        ``samplers_per_user=1`` means any user only sees queries generated\n        from one sampler, and ``samplers_per_user=0`` means the user sees a\n        new sampler every query.",
         "default": 0,
         "env_names": "{'samplers_per_user'}",
         "type": "integer"
      },
      "details": {
         "title": "Details",
         "description": "Different options for a deterministic choice of samplers.\n\n        This dictionary is of the form ``{query_shown_to_user: options}``. The\n        ``options`` is a dictionary with up to two keys:\n\n        - ``sampler`` (required), which names the sampler that receives the\n          responses.\n\n        - ``query`` (optional), which is a list of length 3 giving the indices\n          of the targets that appear in the query.\n\n        For example, this YAML customizes the 1st and 10th queries the\n        crowdsourcing user sees:\n\n        .. code-block:: yaml\n\n          targets: [zero, one, two, three, four, five, six]\n          # ^ list of (textual) targets; target \"zero\" has index 0 and\n          # is targets[0] in Python\n\n          samplers:\n            ARR: {}\n            Validation: {}\n            valid2:\n              class: Validation\n              n_queries: 3\n\n          sampling:\n            probs: {ARR: 100, Validation: 0, valid2: 0}\n            details:\n              # Each key \"n\" is the n-th query the user sees\n              # So here the 1st and 10th queries the user sees are customized\n              1: {sampler: \"Validation\", query: [0, 2, 3]}\n              10: {sampler: \"valid2\"}\n\n          html:\n            # ask 10 queries according to \"sampling.probs\".\n            # The probabilistic sampling will be overridden by sampling.details\n            max_queries: 10\n\n        In this case, the crowdsourcing user will see the following:\n\n        * The 1st query shown will have head \"zero\" and feet \"two\" and \"three\".\n        * Queries 2 through 9 will be generated by the :class:`~salmon.triplets.samplers.ARR` sampler.\n        * The 10th query they see (also the last query) will be one of three\n          (random) fixed/static queries.\n\n        The sampler ``valid2`` will receive answers to 3 fixed/static queries,\n        and the ``Validation`` sampler will receive answers to the query\n        ``[0, 2, 3]``.\n        ",
         "default": {},
         "env_names": "{'details'}",
         "type": "object"
      }
   },
   "additionalProperties": false
}

Fields

common (Dict[str, Any])
details (Dict[int, Any])
probs (Optional[Dict[str, int]])
samplers_per_user (int)

field common: Dict[str, Any] = {'d': 2}

Arguments to pass to every sampler for initialization (likely values for Adaptive; note that values for n and ident are already specified). Any values specified in this field will be overwritten by sampler-specific arguments.
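The override rule for common can be pictured as a plain dictionary merge. This is a minimal sketch and not Salmon's implementation: the helper name build_sampler_kwargs and the alpha argument are hypothetical and exist only for illustration.

```python
from typing import Any, Dict


def build_sampler_kwargs(
    common: Dict[str, Any], specific: Dict[str, Any]
) -> Dict[str, Any]:
    """Merge arguments so sampler-specific values win over ``common``.

    Hypothetical helper for illustration; not part of Salmon's API.
    """
    # In a dict merge the later mapping takes precedence,
    # so ``specific`` overrides any key it shares with ``common``.
    return {**common, **specific}


common = {"d": 2}  # the documented default for ``common``
# "alpha" is a made-up sampler-specific argument.
kwargs = build_sampler_kwargs(common, {"d": 3, "alpha": 1.0})
```

Here the sampler-specific d=3 replaces the common d=2, and common itself is left unmodified.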

field details: Dict[int, Any] = {}

Different options for a deterministic choice of samplers.

This dictionary is of the form {query_shown_to_user: options}. The options is a dictionary with up to two keys:

sampler (required), which names the sampler that receives the responses.
query (optional), which is a list of length 3 giving the indices of the targets that appear in the query.

For example, this YAML customizes the 1st and 10th queries the crowdsourcing user sees:

targets: [zero, one, two, three, four, five, six]
# ^ list of (textual) targets; target "zero" has index 0 and
# is targets[0] in Python

samplers:
  ARR: {}
  Validation: {}
  valid2:
    class: Validation
    n_queries: 3

sampling:
  probs: {ARR: 100, Validation: 0, valid2: 0}
  details:
    # Each key "n" is the n-th query the user sees
    # So here the 1st and 10th queries the user sees are customized
    1: {sampler: "Validation", query: [0, 2, 3]}
    10: {sampler: "valid2"}

html:
  # ask 10 queries according to "sampling.probs".
  # The probabilistic sampling will be overridden by sampling.details
  max_queries: 10

In this case, the crowdsourcing user will see the following:

The 1st query shown will have head “zero” and feet “two” and “three”.
Queries 2 through 9 will be generated by the ARR sampler.
The 10th query they see (also the last query) will be one of three (random) fixed/static queries.

The sampler valid2 will receive answers to 3 fixed/static queries, and the Validation sampler will receive answers to the query [0, 2, 3].
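The interaction between details and probs in the example above can be sketched in a few lines of Python. This is an illustration under assumptions, not Salmon's code: the function pick_sampler is hypothetical, and it assumes the query counter is 1-indexed as in the YAML.

```python
import random
from typing import Any, Dict, Optional


def pick_sampler(
    n: int,
    details: Dict[int, Dict[str, Any]],
    probs: Dict[str, int],
    rng: Optional[random.Random] = None,
) -> str:
    """Return the sampler name for the n-th query (1-indexed) a user sees.

    Hypothetical helper for illustration; not part of Salmon's API.
    """
    if n in details:
        # Deterministic override taken from ``sampling.details``.
        return details[n]["sampler"]
    rng = rng or random.Random()
    names = list(probs)
    # Otherwise fall back to a random draw weighted by ``sampling.probs``.
    return rng.choices(names, weights=[probs[k] for k in names], k=1)[0]


details = {
    1: {"sampler": "Validation", "query": [0, 2, 3]},
    10: {"sampler": "valid2"},
}
probs = {"ARR": 100, "Validation": 0, "valid2": 0}
schedule = [pick_sampler(n, details, probs) for n in range(1, 11)]
```

With these values, schedule starts with Validation, continues with ARR for queries 2 through 9, and ends with valid2, matching the list above.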

field probs: Optional[Dict[str, int]] = None

The percentage to sample each sampler when given the opportunity (which depends on samplers_per_user). The percentages in this field must add up to 100. If not specified (default), each sampler is chosen with equal probability.
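The two rules for probs (sum to 100, equal weights by default) can be sketched as follows. The function resolve_probs is a hypothetical helper written for this illustration; how Salmon itself validates the field is not shown here.

```python
from typing import Dict, List, Optional


def resolve_probs(
    probs: Optional[Dict[str, int]], samplers: List[str]
) -> Dict[str, float]:
    """Return a percentage for every sampler, defaulting to equal weights.

    Hypothetical helper for illustration; not part of Salmon's API.
    """
    if probs is None:
        # Default: every sampler is equally likely.
        return {name: 100 / len(samplers) for name in samplers}
    total = sum(probs.values())
    if total != 100:
        raise ValueError(f"probs must add up to 100, got {total}")
    return {name: float(p) for name, p in probs.items()}


equal = resolve_probs(None, ["ARR", "Validation"])
explicit = resolve_probs({"ARR": 100, "Validation": 0}, ["ARR", "Validation"])
```

A mapping such as {"ARR": 60, "Validation": 30} would raise ValueError in this sketch because the percentages sum to 90.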

field samplers_per_user: int = 0

The number of samplers to assign to each user. Setting samplers_per_user=1 means any user only sees queries generated from one sampler, and samplers_per_user=0 means the user sees a new sampler every query.
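One way to picture samplers_per_user is as the size of a per-user pool of samplers. The sketch below is an assumption-laden illustration, not Salmon's implementation: samplers_for_user is a hypothetical helper, and drawing the pool uniformly at random is an assumption (the real assignment may take probs into account).

```python
import random
from typing import List, Optional


def samplers_for_user(
    samplers: List[str],
    samplers_per_user: int = 0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return the pool of samplers that one user's queries may come from.

    Hypothetical helper for illustration; not part of Salmon's API.
    """
    if samplers_per_user == 0:
        # 0 means no restriction: a sampler is drawn anew for every query.
        return list(samplers)
    rng = rng or random.Random()
    # Otherwise fix a subset once per user (uniform draw is an assumption).
    return rng.sample(samplers, samplers_per_user)


names = ["ARR", "Validation", "valid2"]
everyone = samplers_for_user(names, 0)
single = samplers_for_user(names, 1, random.Random(0))
```

In this sketch, single always holds exactly one sampler name, so every query for that user would come from the same sampler.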