prompting.validators.reward.diversity#
Module Contents#
Classes#

- DiversityRewardEvent
- DiversityRewardModel

Functions#

- mean_pooling(model_output, attention_mask) – Applies mean pooling to the token embeddings generated by the model.
- prompting.validators.reward.diversity.mean_pooling(model_output, attention_mask)#
Applies mean pooling to the token embeddings generated by the model.
- Parameters:
model_output (torch.Tensor) – Embedding model output, where the first element contains token embeddings.
attention_mask (torch.Tensor) – Attention mask to indicate valid tokens.
- Returns:
Mean-pooled representation of the token embeddings.
- Return type:
torch.Tensor

Notes

The function calculates the mean-pooled representation using the attention mask for valid tokens. input_mask_expanded is created by expanding the attention mask to match the size of the token embeddings. The result is obtained by summing the element-wise multiplication of the embeddings and input_mask_expanded, then dividing by the sum of input_mask_expanded after clamping its values to a minimum of 1e-9.
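A minimal sketch of the computation described in the notes, assuming the standard Hugging Face output layout (token embeddings as the first element of model_output):

```python
import torch

def mean_pooling(model_output, attention_mask):
    # First element of the model output holds the per-token embeddings:
    # shape (batch, seq_len, hidden_dim).
    token_embeddings = model_output[0]
    # Expand the attention mask to the embedding dimension so padded
    # tokens contribute nothing to the sum.
    input_mask_expanded = (
        attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    )
    # Sum the valid token embeddings and divide by the number of valid
    # tokens, clamped to a minimum of 1e-9 to avoid division by zero.
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
        input_mask_expanded.sum(1), min=1e-9
    )
```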
- class prompting.validators.reward.diversity.DiversityRewardEvent#
- class prompting.validators.reward.diversity.DiversityRewardModel(device)#
Bases:
prompting.validators.reward.reward.BaseRewardModel
- Parameters:
device (str) –
- diversity_model_path = 'sentence-transformers/all-mpnet-base-v2'#
- get_embeddings(sentences)#
Runs a forward pass through the model.
- Parameters:
sentences (List[str]) – Text messages to be encoded.
- Returns:
Embedding for the message.
- Return type:
torch.FloatTensor
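The encoding internals are not shown in this reference; the sketch below is a plausible reading based on the diversity_model_path attribute and the mean_pooling helper above. The use of Hugging Face transformers, the padding/truncation settings, and the final L2 normalization are assumptions.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# diversity_model_path from the class definition above.
MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

def get_embeddings(sentences):
    # Tokenize the batch of sentences (padding/truncation settings are assumptions).
    encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    # Forward pass without gradients; the first output element holds the token embeddings.
    with torch.no_grad():
        model_output = model(**encoded)
    # Mean-pool the token embeddings over valid tokens (same scheme as mean_pooling above).
    mask = encoded["attention_mask"].unsqueeze(-1).float()
    pooled = (model_output[0] * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
    # L2-normalize so downstream similarity comparisons are well behaved (assumption).
    return F.normalize(pooled, p=2, dim=1)

# Example: two completions encoded into a (2, 768) FloatTensor.
embeddings = get_embeddings(["first completion", "second completion"])
```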
- update_historic_embeddings(embeddings)#
- Parameters:
embeddings (torch.FloatTensor) –
- get_historic_rewards(embeddings)#
- Parameters:
embeddings (torch.FloatTensor) –
- Return type:
torch.FloatTensor
- get_batch_rewards(embeddings)#
- Parameters:
embeddings (torch.FloatTensor) –
- Return type:
torch.FloatTensor
- get_rewards(prompt, completions, name)#
- Parameters:
- Return type:
List[DiversityRewardEvent]
- normalize_rewards(raw_rewards)#
This method normalizes the given rewards by updating the moving mean and variance statistics. The rewards are first standardized and then scaled to the 0-1 range using a cumulative distribution function (CDF) to ensure they are in a comparable range across different environments.
- Parameters:
raw_rewards (torch.FloatTensor) – The reward values to be normalized.
- Returns:
The normalized reward values.
- Return type:
torch.FloatTensor

Notes
- This function uses Welford’s online algorithm to update the mean and variance.
- It standardizes the reward values using the updated mean and variance.
- It then scales the standardized values to the 0-1 range using the error function (erf) as a CDF.
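A sketch of this normalization scheme under stated assumptions: the running statistics are kept as plain count/mean/variance attributes (the attribute names and the exact parallel-variance merge are assumptions), and the standardized values are squashed to [0, 1] with the Gaussian CDF expressed through erf, as the note describes.

```python
import torch

class RunningRewardNormalizer:
    """Standardizes rewards against running statistics, then maps them to [0, 1]."""

    def __init__(self):
        self.count = 0      # number of rewards seen so far
        self.mean = 0.0     # running mean
        self.var = 0.0      # running (population) variance

    def normalize_rewards(self, raw_rewards: torch.FloatTensor) -> torch.FloatTensor:
        new_count = raw_rewards.numel()
        if new_count == 0:
            return raw_rewards

        # Welford-style (parallel) merge of the batch statistics into the
        # running mean and variance.
        batch_mean = raw_rewards.mean().item()
        batch_var = raw_rewards.var(unbiased=False).item()
        total = self.count + new_count
        delta = batch_mean - self.mean
        self.mean += delta * new_count / total
        self.var = (
            self.var * self.count
            + batch_var * new_count
            + delta ** 2 * self.count * new_count / total
        ) / total
        self.count = total

        # Standardize with the updated statistics, then map to [0, 1]
        # using the Gaussian CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
        std = self.var ** 0.5 if self.var > 0 else 1.0
        z = (raw_rewards - self.mean) / std
        return 0.5 * (1.0 + torch.erf(z / (2.0 ** 0.5)))
```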