PepTransformerModel

elfragmentador.model

Attributes

elfragmentador.model.LiteralFalse = Literal[False] module-attribute

Classes

elfragmentador.model.PepTransformerModel(num_decoder_layers: int = 6, num_encoder_layers: int = 6, nhid: int = 2024, d_model: int = 516, nhead: int = 4, dropout: float = 0.1, combine_embeds: bool = True, combine_encoders: bool = True, final_decoder: str = 'linear', lr: float = 0.0001, scheduler: str = 'plateau', lr_ratio: float | int = 200, steps_per_epoch: None = None, loss_ratio: float = 5, *args, **kwargs) -> None

Bases: pl.LightningModule

PepTransformerModel Predicts retention times and HCD spectra from peptides.

__init__ Instantiates the class.

Generates a new instance of the PepTransformerModel

PARAMETER DESCRIPTION
num_decoder_layers

int, optional Number of layers in the transformer decoder, by default 6

DEFAULT: 6

num_encoder_layers

int, optional Number of layers in the transformer encoder, by default 6

DEFAULT: 6

nhid

int, optional Number of dimensions used in the feedforward networks inside the transformer encoder and decoders, by default 2024

DEFAULT: 2024

d_model

int, optional Number of features to pass to the transformer encoder. The embedding transforms the input to this dimension, by default 516

DEFAULT: 516

nhead

int, optional Number of multi-attention heads in the transformer, by default 4

DEFAULT: 4

dropout

float, optional Dropout probability, by default 0.1

DEFAULT: 0.1

combine_embeds

bool, optional Whether the embeddings for modifications and sequences should be shared for irt and fragment predictions

TYPE: bool DEFAULT: True

combine_encoders

bool, optional Whether the transformer encoders for irt and fragments should be shared.

TYPE: bool DEFAULT: True

final_decoder

str, optional What kind of final layer the decoder should use to output a single number, options are 'mlp' and 'linear', by default 'linear'

DEFAULT: 'linear'

lr

float, optional Learning rate, by default 1e-4

DEFAULT: 0.0001

scheduler

str, optional What scheduler to use, check the available ones with PepTransformerModel.accepted_schedulers, by default "plateau"

DEFAULT: 'plateau'

lr_ratio

Union[float, int], optional For cosine annealing: ratio of the initial learning rate to the minimum one; for instance, a lr of 1 and a ratio of 10 would give a minimum learning rate of 0.1.

For onecycle: ratio of the maximum learning rate to the initial one; for instance, if lr is 0.1 and the ratio is 10, the max learning rate would be 1.0. See the sketch after this parameter list.

by default 200

DEFAULT: 200

steps_per_epoch

None, optional Expected number of steps per epoch, used internally to calculate learning rates when using the onecycle scheduler, by default None

DEFAULT: None

loss_ratio

float, optional The weight of the spectrum loss relative to the retention time loss when the two are added before being passed to the optimizer. Higher values give more weight to the spectra with respect to the retention time. By default 5

TYPE: float DEFAULT: 5
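
A minimal sketch of how lr_ratio is interpreted by the cosine and onecycle schedulers, based on the scheduler-configuration code further down this page (the variable names are illustrative):

>>> lr = 0.1
>>> lr_ratio = 10
>>> min_lr_cosine = lr / lr_ratio    # cosine: the floor passed as eta_min (~0.01)
>>> max_lr_onecycle = lr * lr_ratio  # onecycle: the peak passed as max_lr (1.0)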

Source code in elfragmentador/model/__init__.py
def __init__(
    self,
    num_decoder_layers: int = 6,
    num_encoder_layers: int = 6,
    nhid: int = 2024,
    d_model: int = 516,
    nhead: int = 4,
    dropout: float = 0.1,
    combine_embeds: bool = True,
    combine_encoders: bool = True,
    final_decoder: str = "linear",
    lr: float = 1e-4,
    scheduler: str = "plateau",
    lr_ratio: float | int = 200,
    steps_per_epoch: None = None,
    loss_ratio: float = 5,
    *args,
    **kwargs,
) -> None:
    """
    __init__ Instantiates the class.

    Generates a new instance of the PepTransformerModel

    Parameters:
        num_decoder_layers : int, optional
            Number of layers in the transformer decoder, by default 6
        num_encoder_layers : int, optional
            Number of layers in the transformer encoder, by default 6
        nhid : int, optional
            Number of dimensions used in the feedforward networks inside
            the transformer encoder and decoders, by default 2024
        d_model : int, optional
            Number of features to pass to the transformer encoder.
            The embedding transforms the input to this dimension, by default 516
        nhead : int, optional
            Number of multi-attention heads in the transformer, by default 4
        dropout : float, optional
            Dropout probability, by default 0.1
        combine_embeds: bool, optional
            Whether the embeddings for modifications and sequences
            should be shared for irt and fragment predictions
        combine_encoders: bool, optional
            Whether the transformer encoders for irt and
            fragments should be shared.
        final_decoder: str, optional
            What kind of final layer the decoder should use to output a
            single number, options are 'mlp' and 'linear', by default "linear"
        lr : float, optional
            Learning rate, by default 1e-4
        scheduler : str, optional
            What scheduler to use, check the available ones with
            `PepTransformerModel.accepted_schedulers`, by default "plateau"
        lr_ratio : Union[float, int], optional
            For cosine annealing:
            Ratio of the initial learning rate to the minimum one; for
            instance, a lr of 1 and a ratio of 10 would give a minimum
            learning rate of 0.1.

            For onecycle:
            Ratio of the maximum learning rate to the initial one;
            for instance, if lr is 0.1 and the ratio is 10, the max
            learning rate would be 1.0.

            by default 200
        steps_per_epoch : None, optional
            Expected number of steps per epoch, used internally to calculate
            learning rates when using the onecycle scheduler, by default None
        loss_ratio: float, optional
            The ratio of the spectrum to retention time loss to use when adding
            before passing to the optimizer. Higher values mean more weight to
            spectra with respect to the retention time. By default 5
    """
    super().__init__()
    self.ms2ml_config = CONFIG
    self.NUM_FRAGMENT_EMBEDDINGS = self.ms2ml_config.num_fragment_embeddings
    self.save_hyperparameters()
    logger.info(
        f"num_decoder_layers {num_decoder_layers} "
        f"num_encoder_layers {num_encoder_layers} "
        f"nhid {nhid} d_model {d_model} "
        f"nhead {nhead} dropout {dropout}"
        f"combined embeds {combine_embeds} combined encoders {combine_encoders}"
    )
    self.main_model = PepTransformerBase(
        num_fragments=self.NUM_FRAGMENT_EMBEDDINGS,
        num_decoder_layers=num_decoder_layers,
        num_encoder_layers=num_encoder_layers,
        nhid=nhid,
        d_model=d_model,
        nhead=nhead,
        dropout=dropout,
        combine_embeds=combine_embeds,
        combine_encoders=combine_encoders,
        final_decoder=final_decoder,
    )

    self.metric_calculator = MetricCalculator()
    self.mse_loss = nn.MSELoss(reduction="none")
    self.cosine_loss = CosineLoss(dim=1, eps=1e-8)
    self.angle_loss = SpectralAngleLoss(dim=1, eps=1e-8)

    # Training related things
    self.lr = lr

    assert (
        scheduler in self.accepted_schedulers
    ), f"Passed scheduler '{scheduler} is not one of {self.accepted_schedulers}"
    self.scheduler = scheduler
    self.lr_ratio = lr_ratio
    self.steps_per_epoch = steps_per_epoch
    self.loss_ratio = loss_ratio

    self.irt_metric = MissingDataAverager()
    self.loss_metric = MissingDataAverager()
    self.spectra_metric = MissingDataAverager()
    self.spectra_metric2 = MissingDataAverager()
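
A minimal construction sketch using the documented defaults (purely illustrative; any of the keyword arguments above can be overridden):

>>> from elfragmentador.model import PepTransformerModel
>>> model = PepTransformerModel(
...     num_decoder_layers=6,
...     nhid=2024,
...     d_model=516,
...     nhead=4,
...     scheduler="plateau",
...     lr=1e-4,
... )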
Attributes
accepted_schedulers = ['plateau', 'cosine', 'onecycle'] class-attribute
__version__ = elfragmentador.__version__ class-attribute
ms2ml_config = CONFIG instance-attribute
NUM_FRAGMENT_EMBEDDINGS = self.ms2ml_config.num_fragment_embeddings instance-attribute
main_model = PepTransformerBase(num_fragments=self.NUM_FRAGMENT_EMBEDDINGS, num_decoder_layers=num_decoder_layers, num_encoder_layers=num_encoder_layers, nhid=nhid, d_model=d_model, nhead=nhead, dropout=dropout, combine_embeds=combine_embeds, combine_encoders=combine_encoders, final_decoder=final_decoder) instance-attribute
metric_calculator = MetricCalculator() instance-attribute
mse_loss = nn.MSELoss(reduction='none') instance-attribute
cosine_loss = CosineLoss(dim=1, eps=1e-08) instance-attribute
angle_loss = SpectralAngleLoss(dim=1, eps=1e-08) instance-attribute
lr = lr instance-attribute
scheduler = scheduler instance-attribute
lr_ratio = lr_ratio instance-attribute
steps_per_epoch = steps_per_epoch instance-attribute
loss_ratio = loss_ratio instance-attribute
irt_metric = MissingDataAverager() instance-attribute
loss_metric = MissingDataAverager() instance-attribute
spectra_metric = MissingDataAverager() instance-attribute
spectra_metric2 = MissingDataAverager() instance-attribute
Functions
summarize(max_depth = 3)
Source code in elfragmentador/model/__init__.py
def summarize(self, max_depth=3):
    return summarize(self, max_depth)
forward(seq: Tensor, mods: Tensor, charge: Tensor, nce: Tensor)
Source code in elfragmentador/model/__init__.py
def forward(
    self,
    seq: Tensor,
    mods: Tensor,
    charge: Tensor,
    nce: Tensor,
):
    return self.main_model.forward(seq=seq, mods=mods, charge=charge, nce=nce)
predict_from_seq(seq: str, nce: float, as_spectrum = False) -> PredictionResults | Spectrum
Source code in elfragmentador/model/__init__.py
def predict_from_seq(
    self,
    seq: str,
    nce: float,
    as_spectrum=False,
) -> PredictionResults | Spectrum:
    return self.main_model.predict_from_seq(
        seq=seq,
        nce=nce,
        as_spectrum=as_spectrum,
    )
torch_batch_from_seq(*args, **kwargs) -> ForwardBatch staticmethod
Source code in elfragmentador/model/__init__.py
@staticmethod
def torch_batch_from_seq(*args, **kwargs) -> ForwardBatch:
    # mirrors the module-level torch_batch_from_seq helper
    return torch_batch_from_seq(*args, **kwargs)
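
A short usage sketch combining torch_batch_from_seq with forward, mirroring the call used inside to_torchscript below (it assumes model is an initialized PepTransformerModel; the ForwardBatch field names follow the batch attributes used elsewhere in this module):

>>> batch = PepTransformerModel.torch_batch_from_seq(
...     seq="MYM[U:35]DIFIEDPEPTYDE", charge=3, nce=27.0
... )
>>> irt, spectra = model.forward(
...     seq=batch.seq, mods=batch.mods, charge=batch.charge, nce=batch.nce
... )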
to_torchscript()

Convert the model to torchscript.

Example:

>>> model = PepTransformerModel()
>>> ts = model.to_torchscript()
>>> type(ts)
<class 'torch.jit._trace.TopLevelTracedModule'>

Source code in elfragmentador/model/__init__.py
def to_torchscript(self):
    """
    Convert the model to torchscript.

    Example:
    >>> model = PepTransformerModel()
    >>> ts = model.to_torchscript()
    >>> type(ts)
    <class 'torch.jit._trace.TopLevelTracedModule'>
    """
    _fake_input_data_torchscript = self.torch_batch_from_seq(
        seq="MYM[U:35]DIFIEDPEPTYDE", charge=3, nce=27.0
    )

    backup_calculator = self.metric_calculator
    self.metric_calculator = None

    bkp_1 = self.main_model.decoder.nce_encoder.static_size
    self.main_model.decoder.nce_encoder.static_size = self.NUM_FRAGMENT_EMBEDDINGS
    bkp_2 = self.main_model.decoder.charge_encoder.static_size
    self.main_model.decoder.charge_encoder.static_size = (
        self.NUM_FRAGMENT_EMBEDDINGS
    )

    script = super().to_torchscript(
        example_inputs=_fake_input_data_torchscript, method="trace"
    )

    self.main_model.decoder.nce_encoder.static_size = bkp_1
    self.main_model.decoder.charge_encoder.static_size = bkp_2
    self.metric_calculator = backup_calculator

    return script
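
A sketch of persisting the traced module (the file name is illustrative; torch.jit.save is the standard TorchScript serialization entry point):

>>> import torch
>>> model = PepTransformerModel()
>>> ts = model.to_torchscript()
>>> torch.jit.save(ts, "elfragmentador_traced.pt")  # illustrative path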
add_model_specific_args(parser: _ArgumentGroup) -> _ArgumentGroup staticmethod

Add_model_specific_args Adds arguments to a parser.

It is used to add the command line arguments for the training/generation of the model.

PARAMETER DESCRIPTION
parser

An argparse parser (anything that has the .add_argument method) to which the arguments will be added

TYPE: _ArgumentGroup

RETURNS DESCRIPTION
_ArgumentGroup

_ArgumentGroup, the same parser with the added arguments

Source code in elfragmentador/model/__init__.py
@staticmethod
def add_model_specific_args(parser: _ArgumentGroup) -> _ArgumentGroup:
    """
    Add_model_specific_args Adds arguments to a parser.

    It is used to add the command line arguments for the training/generation
    of the model.

    Args:
        parser (_ArgumentGroup):
            An argparse parser (anything that has the `.add_argument` method) to
            which the arguments will be added

    Returns:
        _ArgumentGroup, the same parser with the added arguments
    """
    parser.add_argument(
        "--num_queries",
        default=150,
        type=int,
        help="Expected encoding length of the spectra",
    )
    parser.add_argument(
        "--num_decoder_layers",
        default=6,
        type=int,
        help="Number of sub-encoder-layers in the encoder",
    )
    parser.add_argument(
        "--num_encoder_layers",
        default=6,
        type=int,
        help="Number of sub-encoder-layers in the decoder",
    )
    parser.add_argument(
        "--nhid",
        default=1024,
        type=int,
        help="Dimension of the feedforward networks",
    )
    parser.add_argument(
        "--d_model",
        default=516,
        type=int,
        help="Number of input features to the transformer encoder",
    )
    parser.add_argument(
        "--nhead", default=12, type=int, help="Number of attention heads"
    )
    parser.add_argument("--dropout", default=0.1, type=float)
    parser.add_argument(
        "--combine_embeds",
        action=argparse.BooleanOptionalAction,
        help=(
            "Whether the embeddings for aminoacid and modifications"
            " should be shared between the irt and fragment sections"
        ),
    )
    parser.add_argument(
        "--combine_encoders",
        action=argparse.BooleanOptionalAction,
        help=(
            "Whether the encoders for aminoacid and modifications"
            " should be shared between the irt and fragment sections"
        ),
    )
    parser.add_argument(
        "--final_decoder",
        default="mlp",
        type=str,
        help=(
            "What kind of final layer should the docer have to"
            " output a single number, options are 'mlp' and 'linear'"
        ),
    )
    parser.add_argument("--lr", default=1e-4, type=float)
    parser.add_argument(
        "--scheduler",
        default="plateau",
        type=str,
        help=(
            "Scheduler to use during training, "
            f"either of {PepTransformerModel.accepted_schedulers}"
        ),
    )
    parser.add_argument(
        "--lr_ratio",
        default=200.0,
        type=float,
        help=(
            "For cosine annealing: "
            "Ratio of the initial learning rate to use with cosine annealing"
            " for instance a lr or 1 and a ratio of 10 would have a minimum"
            " learning rate of 0.1\n"
            "For onecycle: "
            "Ratio of the initial lr and and maximum one, "
            "for instance if lr is 0.1 and ratio is 10, the max learn rate"
            "would be 1.0"
        ),
    )
    parser.add_argument(
        "--loss_ratio",
        default=5.0,
        type=float,
        help=(
            "Ratio between the retention time and the spectrum loss"
            " (higher values mean more weight to the spectra loss"
            " with respect to the retention time loss)"
        ),
    )

    return parser
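
A hedged sketch of wiring these arguments into a script (the group name and CLI values are illustrative; arguments not given on the command line keep the defaults above):

>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> group = parser.add_argument_group("model")  # anything with .add_argument works
>>> _ = PepTransformerModel.add_model_specific_args(group)
>>> args = parser.parse_args(["--nhead", "8", "--scheduler", "onecycle"])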
configure_scheduler_plateau(optimizer, lr_ratio) staticmethod
Source code in elfragmentador/model/__init__.py
@staticmethod
def configure_scheduler_plateau(optimizer, lr_ratio):
    assert lr_ratio < 1
    scheduler_dict = {
        "scheduler": torch.optim.lr_scheduler.ReduceLROnPlateau(
            optimizer=optimizer,
            mode="min",
            factor=lr_ratio,
            patience=2,
            verbose=False,
        ),
        "interval": "epoch",
        "monitor": "val_l",
    }
    return scheduler_dict
configure_scheduler_cosine(optimizer, lr_ratio, min_lr) staticmethod
Source code in elfragmentador/model/__init__.py
@staticmethod
def configure_scheduler_cosine(optimizer, lr_ratio, min_lr):
    assert lr_ratio > 1
    scheduler_dict = {
        "scheduler": CosineAnnealingWarmRestarts(
            optimizer=optimizer,
            T_0=1,
            T_mult=2,
            eta_min=min_lr,
            last_epoch=-1,
            verbose=False,
        ),
        "interval": "step",
    }
    return scheduler_dict
configure_scheduler_oncecycle(optimizer, lr_ratio, learning_rate, steps_per_epoch, accumulate_grad_batches, max_epochs) staticmethod
Source code in elfragmentador/model/__init__.py
@staticmethod
def configure_scheduler_oncecycle(
    optimizer,
    lr_ratio,
    learning_rate,
    steps_per_epoch,
    accumulate_grad_batches,
    max_epochs,
):
    max_lr = learning_rate * lr_ratio
    spe = steps_per_epoch // accumulate_grad_batches
    pct_start = 0.3

    logger.info(
        f">> Scheduler setup: max_lr {max_lr}, "
        f"Max Epochs: {max_epochs}, "
        f"Steps per epoch: {steps_per_epoch}, "
        f"SPE (after accum grad batches) {spe}, "
        f"Percent Warmup {pct_start}, "
        f"Accumulate Batches {accumulate_grad_batches}, "
    )

    scheduler_dict = {
        "scheduler": torch.optim.lr_scheduler.OneCycleLR(
            optimizer=optimizer,
            max_lr=max_lr,
            epochs=max_epochs,
            pct_start=pct_start,
            steps_per_epoch=spe,
        ),
        "interval": "step",
    }
    return scheduler_dict
configure_optimizers() -> tuple[list[AdamW], list[dict[str, ReduceLROnPlateau | str]]] | tuple[list[AdamW], list[dict[str, CosineAnnealingWarmRestarts | str]]] | tuple[list[AdamW], list[dict[str, OneCycleLR | str]]]

Configure_optimizers Configures the optimizers for training.

It is internally used by pytorch_lightning during training, so far I implemented 3 options (set when making the module).

OneCycleLR seems to give the best results overall in the least amount of time. The only tradeoff that I see is that resuming training does not seem to be really easy.

Check the pytorch_lightning documentation to see how this is used in the training loop

RETURNS DESCRIPTION
tuple[list[AdamW], list[dict[str, ReduceLROnPlateau | str]]] | tuple[list[AdamW], list[dict[str, CosineAnnealingWarmRestarts | str]]] | tuple[list[AdamW], list[dict[str, OneCycleLR | str]]]

Two lists, one containing the optimizer and another containing the scheduler.

Source code in elfragmentador/model/__init__.py
def configure_optimizers(
    self,
) -> (
    tuple[list[AdamW], list[dict[str, ReduceLROnPlateau | str]]]
    | tuple[list[AdamW], list[dict[str, CosineAnnealingWarmRestarts | str]]]
    | tuple[list[AdamW], list[dict[str, OneCycleLR | str]]]
):
    """
    Configure_optimizers Configures the optimizers for training.

    It is internally used by pytorch_lightning during training, so far I
    implemented 3 options (set when making the module).

    OneCycleLR seems to give the best results overall in the least amount
    of time. The only tradeoff that I see is that resuming training does
    not seem to be really easy.

    Check the pytorch_lightning documentation to see how this is used in the
    training loop

    Returns:
      Two lists, one containing the optimizer and another containing the scheduler.
    """
    opt = torch.optim.AdamW(
        filter(lambda p: p.requires_grad, self.parameters()),
        lr=self.lr,
        betas=(0.9, 0.98),
    )

    if self.scheduler == "plateau":
        sched_dict = self.configure_scheduler_plateau(
            optimizer=opt, lr_ratio=self.lr_ratio
        )
    elif self.scheduler == "cosine":
        sched_dict = self.configure_scheduler_cosine(
            optimizer=opt, lr_ratio=self.lr_ratio, min_lr=self.lr / self.lr_ratio
        )
    elif self.scheduler == "onecycle":
        assert self.steps_per_epoch is not None, "Please set steps_per_epoch"
        if self.trainer.max_epochs == 1000:
            warnings.warn("Max epochs was 1000, make sure you want this")
        if self.lr_ratio > 20:
            warnings.warn(
                f"Provided LR ratio '{self.lr_ratio}' seems a lil high,"
                " make sure you want that for the OneCycleLR scheduler"
            )
            time.sleep(3)  # just so the user has time to see the message...

        sched_dict = self.configure_scheduler_oncecycle(
            optimizer=opt,
            lr_ratio=self.lr_ratio,
            learning_rate=self.lr,
            steps_per_epoch=self.steps_per_epoch,
            accumulate_grad_batches=self.trainer.accumulate_grad_batches,
            max_epochs=self.trainer.max_epochs,
        )

    else:
        raise ValueError(
            "Scheduler should be one of 'plateau' or 'cosine', passed: ",
            self.scheduler,
        )
    # TODO check if using different optimizers for different parts of the
    # model would work better
    logger.info(f"\n\n>>> Setting up schedulers:\n\n{sched_dict}")

    return [opt], [sched_dict]
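
A minimal sketch of how this comes together in a Lightning run with the onecycle scheduler (the Trainer arguments and steps_per_epoch value are illustrative, and trainer.fit would need a real datamodule):

>>> import pytorch_lightning as pl
>>> model = PepTransformerModel(
...     scheduler="onecycle",
...     lr=1e-4,
...     lr_ratio=10,
...     steps_per_epoch=500,  # required by the onecycle branch above
... )
>>> trainer = pl.Trainer(max_epochs=20, accumulate_grad_batches=2)
>>> # trainer.fit(model, datamodule=...)  # datamodule not shown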
plot_scheduler_lr()

Plot the learning rate of the scheduler.

This is useful to see how the learning rate changes during training, and to make sure that the scheduler is working as intended.

Source code in elfragmentador/model/__init__.py
def plot_scheduler_lr(self):
    """
    Plot the learning rate of the scheduler.

    This is useful to see how the learning rate changes during training,
    and to make sure that the scheduler is working as intended.

    """
    steps_per_epoch = self.steps_per_epoch
    if steps_per_epoch is None:
        steps_per_epoch = 1000
    try:
        accumulate_grad_batches = self.trainer.accumulate_grad_batches
        max_epochs = self.trainer.max_epochs
    except RuntimeError:
        accumulate_grad_batches = 1
        max_epochs = 10
    spe = steps_per_epoch // accumulate_grad_batches

    optimizer, schedulers = self.configure_optimizers()
    optimizer = optimizer[0]
    scheduler = schedulers[0]["scheduler"]

    xs = list(range(spe * max_epochs))
    lrs = []
    for i in xs:
        optimizer.step()
        lrs.append(optimizer.param_groups[0]["lr"])
        scheduler.step()

    str_list = uniplot.plot_to_string(
        np.log1p(np.array(lrs)), xs, title="Learning Rate Schedule"
    )
    plot_str = "\n".join(str_list)
    logger.info(f"\n\n{plot_str}\n\n")
training_step(batch: TrainBatch, batch_idx: int | None = None) -> Tensor

See pytorch_lightning documentation.

Source code in elfragmentador/model/__init__.py
def training_step(self, batch: TrainBatch, batch_idx: int | None = None) -> Tensor:
    """See pytorch_lightning documentation."""
    step_out = self._step(batch, batch_idx=batch_idx)
    log_dict = {"train_" + k: v for k, v in step_out.items()}
    log_dict.update({"LR": self.trainer.optimizers[0].param_groups[0]["lr"]})

    self.log_dict(
        log_dict,
        prog_bar=True,
        # reduce_fx=nanmean,
    )

    return step_out["l"]
on_train_start() -> None
Source code in elfragmentador/model/__init__.py
def on_train_start(self) -> None:
    logger.info("Weights before the start of the training epoch:")
    logger.info(copy.deepcopy(self.state_dict()))
    return super().on_train_start()
validation_step(batch: TrainBatch, batch_idx: int | None = None) -> Tensor

See pytorch_lightning documentation.

Source code in elfragmentador/model/__init__.py
def validation_step(
    self, batch: TrainBatch, batch_idx: int | None = None
) -> Tensor:
    """See pytorch_lightning documentation."""
    step_out = self._step(batch, batch_idx=batch_idx)

    self.irt_metric.update(step_out["irt_l"])
    self.loss_metric.update(step_out["l"])
    self.spectra_metric.update(step_out["spec_l"])
    self.spectra_metric2.update(step_out["spec_l2"])

    return step_out["l"]
validation_epoch_end(outputs: list[Tensor]) -> list[Tensor]

See pytorch lightning documentation.

Source code in elfragmentador/model/__init__.py
def validation_epoch_end(self, outputs: list[Tensor]) -> list[Tensor]:
    """See pytorch lightning documentation."""
    log_dict = {
        "val_irt_l": self.irt_metric.compute(),
        "val_l": self.loss_metric.compute(),
        "val_spec_l": self.spectra_metric.compute(),
        "val_spec_l2": self.spectra_metric2.compute(),
    }

    self.log_dict(
        log_dict,
        prog_bar=True,
    )

    self.irt_metric.reset()
    self.loss_metric.reset()
    self.spectra_metric.reset()
    self.spectra_metric2.reset()

    return super().validation_epoch_end(outputs)
test_step(batch, batch_idx: int | None = None) -> tuple[dict[str, Tensor], Tensor, Tensor]
Source code in elfragmentador/model/__init__.py
def test_step(
    self, batch, batch_idx: int | None = None
) -> tuple[dict[str, Tensor], Tensor, Tensor]:
    losses, pred_out = self._evaluation_step(batch=batch, batch_idx=batch_idx)
    return losses, pred_out.irt, batch.irt
test_epoch_end(results: list)
Source code in elfragmentador/model/__init__.py
def test_epoch_end(self, results: list):
    self.metric_calculator.trainer = self.trainer
    self.metric_calculator.log_dict = self.log_dict
    return self.metric_calculator.test_epoch_end(results)
predict_step(batch: TrainBatch, batch_idx: int | None = None)
Source code in elfragmentador/model/__init__.py
def predict_step(self, batch: TrainBatch, batch_idx: int | None = None):
    yhat_irt, yhat_spectra = self.forward(
        seq=batch.seq, mods=batch.mods, charge=batch.charge, nce=batch.nce
    )
    pred_out = PredictionResults(irt=yhat_irt, spectra=torch.relu(yhat_spectra))
    return pred_out
on_after_backward()
Source code in elfragmentador/model/__init__.py
def on_after_backward(self):
    msg = []
    global_step = self.global_step
    if (global_step % 50) == 0:
        for name, param in self.named_parameters():
            if "weight" in name and "norm" not in name:
                if param.requires_grad:
                    try:
                        if param.grad is None:
                            raise AttributeError
                        if any(
                            x in name
                            for x in [
                                "aa_encoder.weight",
                                "mod_encoder.weight",
                                "trans_decoder_embedding.weight",
                            ]
                        ):
                            val = param.grad.abs().mean()
                            if torch.any(torch.isnan(val)):
                                logger.error(
                                    f"nan mean gradient for {name}: {param.grad}"
                                )
                            self.log(name, val, prog_bar=True, on_step=True)
                    except AttributeError:
                        msg.append(name)
                    except ValueError:
                        msg.append(name)

    if len(msg) > 0:
        logger.warning(
            " ".join(msg) + "Did not have gradients in step {global_step}"
        )
on_train_epoch_end() -> None
Source code in elfragmentador/model/__init__.py
def on_train_epoch_end(self) -> None:
    evaluate_landmark_rt(self)
    return super().on_train_epoch_end()
load_from_checkpoint(*args, **kwargs) classmethod
Source code in elfragmentador/model/__init__.py
@classmethod
def load_from_checkpoint(cls, *args, **kwargs):
    mod = super().load_from_checkpoint(*args, **kwargs)
    evaluate_landmark_rt(mod)
    return mod

Functions

elfragmentador.model.evaluate_landmark_rt(model: PepTransformerModel)

Checks the prediction of the model on the iRT peptides.

Predicts all the procal and Biognosys iRT peptides and checks the correlation of the theoretical iRT values and the predicted ones

PARAMETER DESCRIPTION
model

PepTransformerModel A model to test the predictions on

TYPE: PepTransformerModel

Source code in elfragmentador/model/__init__.py
def evaluate_landmark_rt(model: PepTransformerModel):
    """Checks the prediction of the model on the iRT peptides.

    Predicts all the procal and Biognosys iRT peptides and checks the correlation
    of the theoretical iRT values and the predicted ones

    Parameters:
        model: PepTransformerModel
            A model to test the predictions on
    """
    model.eval()
    real_rt = []
    pred_rt = []
    for seq, desc in IRT_PEPTIDES.items():
        with torch.no_grad():
            out = model.predict_from_seq(f"{seq}/2", 25)
            pred_rt.append(out.irt.clone().cpu().numpy())
            real_rt.append(np.array(desc["irt"]))

    fit = polyfit(np.array(real_rt).flatten(), np.array(pred_rt).flatten())
    logger.info(fit)
    plot_str = uniplot.plot_to_string(
        xs=np.array(real_rt).flatten(),
        ys=np.array(pred_rt).flatten(),
        title="Prediction vs real iRT of biognosys and procal peptides",
    )
    logger.info("\n" + "\n".join(plot_str), "\n")
    return fit, plot_str
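
Usage sketch (assuming model is an initialized PepTransformerModel; the fit and plot are also logged as a side effect):

>>> fit, plot_str = evaluate_landmark_rt(model)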

elfragmentador.model.peptransformer

Classes

elfragmentador.model.peptransformer.PepTransformerBase(num_fragments, num_decoder_layers: int = 6, num_encoder_layers: int = 6, nhid: int = 2024, d_model: int = 516, nhead: int = 4, dropout: float = 0.1, combine_embeds: bool = True, combine_encoders: bool = True, final_decoder = 'linear') -> None

Bases: nn.Module

Source code in elfragmentador/model/peptransformer.py
def __init__(
    self,
    num_fragments,
    num_decoder_layers: int = 6,
    num_encoder_layers: int = 6,
    nhid: int = 2024,
    d_model: int = 516,
    nhead: int = 4,
    dropout: float = 0.1,
    combine_embeds: bool = True,
    combine_encoders: bool = True,
    final_decoder="linear",
) -> None:
    super().__init__()
    # Peptide encoder
    self.encoder = PeptideTransformerEncoder(
        d_model=d_model,
        dropout=dropout,
        nhead=nhead,
        nhid=nhid,
        layers=num_encoder_layers,
    )

    # Peptide decoder
    self.decoder = FragmentTransformerDecoder(
        d_model=d_model,
        nhead=nhead,
        nhid=nhid,
        layers=num_decoder_layers,
        dropout=dropout,
        num_fragments=num_fragments,
        final_decoder=final_decoder,
    )

    self.irt_decoder = IRTDecoder(
        d_model=d_model,
        dim_feedforward=nhid,
        nhead=nhead,
        n_layers=num_encoder_layers,
        dropout=dropout,
        final_decoder=final_decoder,
    )

    if combine_embeds:
        self.irt_decoder.aa_embed = self.encoder.aa_embed

    if combine_encoders:
        self.irt_decoder.encoder = self.encoder.encoder
Attributes
encoder = PeptideTransformerEncoder(d_model=d_model, dropout=dropout, nhead=nhead, nhid=nhid, layers=num_encoder_layers) instance-attribute
decoder = FragmentTransformerDecoder(d_model=d_model, nhead=nhead, nhid=nhid, layers=num_decoder_layers, dropout=dropout, num_fragments=num_fragments, final_decoder=final_decoder) instance-attribute
irt_decoder = IRTDecoder(d_model=d_model, dim_feedforward=nhid, nhead=nhead, n_layers=num_encoder_layers, dropout=dropout, final_decoder=final_decoder) instance-attribute
Functions
forward(seq: Tensor, mods: Tensor, charge: Tensor, nce: Tensor) -> PredictionResults

Forward Generate predictions.

Provides the function for the forward pass to the model.

PARAMETER DESCRIPTION
seq

Encoded peptide sequence [B, L] (view details)

TYPE: Tensor

mods

Encoded modification sequence [B, L], by default None

TYPE: Tensor

nce

float Tensor with the NCEs [B, 1]

TYPE: Tensor

charge

long Tensor with the charges [B, 1], by default None

TYPE: Tensor

Details

seq: The peptide is encoded as integers for the aminoacids. "AAA" encoded for a max length of 5 would be torch.Tensor([1, 1, 1, 0, 0]).long()

nce: Normalized collision energy to use during the prediction.

charge: A tensor corresponding to the charges of each of the peptide precursors (long).

mods: Modifications encoded as integers.

Source code in elfragmentador/model/peptransformer.py
def forward(
    self,
    seq: Tensor,
    mods: Tensor,
    charge: Tensor,
    nce: Tensor,
) -> PredictionResults:
    """
    Forward Generate predictions.

    Provides the function for the forward pass to the model.

    Parameters:
        seq (Tensor): Encoded peptide sequence [B, L] (view details)
        mods (Tensor): Encoded modification sequence [B, L], by default None
        nce (Tensor): float Tensor with the NCEs [B, 1]
        charge (Tensor): long Tensor with the charges [B, 1], by default None

    Details:
        seq:
            The peptide is encoded as integers for the aminoacid.
            "AAA" encoded for a max length of 5 would be
            torch.Tensor([ 1,  1,  1,  0,  0]).long()
        nce:
            Normalized collision energy to use during the prediction.
        charge:
            A tensor corresponding to the charges of each of the
            peptide precursors (long)
        mods:
            Modifications encoded as integers
    """

    trans_encoder_output, mem_mask = self.encoder(seq=seq, mods=mods)

    rt_output = self.irt_decoder(seq=seq, mods=mods)

    spectra_output = self.decoder(
        memory=trans_encoder_output,
        charge=charge,
        nce=nce,
        memory_key_padding_mask=mem_mask,
    )

    return PredictionResults(irt=rt_output, spectra=spectra_output)
predict_from_seq(seq: str, nce: float, as_spectrum = False) -> PredictionResults | AnnotatedPeptideSpectrum

Predict_from_seq Predicts spectra from a sequence as a string.

Utility method that gets a sequence as a string, encodes it internally to the correct input form and outputs the predicted spectra.

Note that the spectra are not decoded as an output; please check elfragmentador.encoding_decoding.decode_fragment_tensor for the decoding.

The irt is scaled by 100 and is in the Biognosys scale.

TODO: consider if the output should be decoded ...

PARAMETER DESCRIPTION
seq

Sequence to use for prediction, supports modifications in the form of S[PHOSPHO], S[+80] and T[181]

TYPE: str

nce

Normalized collision energy to use during the prediction

TYPE: float

as_spectrum

Whether to return a Spectrum object instead of the raw tensor predictions (Default value = False)

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
PredictionResults | AnnotatedPeptideSpectrum

PredictionResults: A named tuple with two named results; irt and spectra

AnnotatedPeptideSpectrum: A spectrum object with the predicted spectrum (returned when as_spectrum=True)

Examples:

>>> import pytorch_lightning as pl
>>> from elfragmentador.config import CONFIG
>>> pl.seed_everything(42)
42
>>> my_model = PepTransformerBase(num_fragments=CONFIG.num_fragment_embeddings) # Or load the model from a checkpoint
>>> _ = my_model.eval()
>>> my_model.predict_from_seq("MYPEPT[U:21]IDEK/3", 27)
PredictionResults(irt=tensor(...), spectra=tensor([...]))
>>> out = my_model.predict_from_seq("MYPEPT[U:21]IDEK/3", 27, as_spectrum=True)
>>> type(out)
<class 'ms2ml.spectrum.AnnotatedPeptideSpectrum'>
>>> # my_model.predict_from_seq("MYPEPT[U:21]IDEK/3", 27)
Source code in elfragmentador/model/peptransformer.py
@torch.no_grad()
def predict_from_seq(
    self,
    seq: str,
    nce: float,
    as_spectrum=False,
) -> PredictionResults | AnnotatedPeptideSpectrum:
    """
    Predict_from_seq Predicts spectra from a sequence as a string.

    Utility method that gets a sequence as a string, encodes it internally
    to the correct input form and outputs the predicted spectra.

    Note that the spectra are not decoded as an output; please check
    `elfragmentador.encoding_decoding.decode_fragment_tensor` for the
    decoding.

    The irt is scaled by 100 and is in the Biognosys scale.

    TODO: consider if the output should be decoded ...

    Parameters:
        seq (str):
            Sequence to use for prediction, supports modifications in the form
            of S[PHOSPHO], S[+80] and T[181]
        nce (float):
            Normalized collision energy to use during the prediction
        as_spectrum (bool, optional):
            Whether to return a Spectrum object instead of the raw tensor predictions
            (Default value = False)

    Returns:
      PredictionResults: A named tuple with two named results; irt and spectra
      AnnotatedPeptideSpectrum: A spectrum object with the predicted spectrum

    Examples:
        >>> import pytorch_lightning as pl
        >>> from elfragmentador.config import CONFIG
        >>> pl.seed_everything(42)
        42
        >>> my_model = PepTransformerBase(num_fragments=CONFIG.num_fragment_embeddings) # Or load the model from a checkpoint
        >>> _ = my_model.eval()
        >>> my_model.predict_from_seq("MYPEPT[U:21]IDEK/3", 27)
        PredictionResults(irt=tensor(...), spectra=tensor([...]))
        >>> out = my_model.predict_from_seq("MYPEPT[U:21]IDEK/3", 27, \
        as_spectrum=True)
        >>> type(out)
        <class 'ms2ml.spectrum.AnnotatedPeptideSpectrum'>
        >>> # my_model.predict_from_seq("MYPEPT[U:21]IDEK/3", 27)
    """  # noqa

    in_batch = Tensorizer().convert_string(data=seq, nce=nce)
    device = next(self.parameters()).device
    in_batch_dict = {k: v.clone().to(device) for k, v in in_batch._asdict().items()}

    out = self.forward(**in_batch_dict)
    logger.debug(out)

    if as_spectrum:
        spec = DeTensorizer.make_spectrum(
            seq=in_batch.seq,
            mod=in_batch.mods,
            charge=in_batch.charge,
            fragment_vector=out.spectra,
            irt=out.irt,
        )
        out = spec

    return out

elfragmentador.model.ms_transformer_layers

Attributes

elfragmentador.model.ms_transformer_layers.CONFIG = get_default_config() module-attribute

Classes

elfragmentador.model.ms_transformer_layers.IRTDecoder(d_model, dim_feedforward = 224, nhead = 4, n_layers = 3, dropout = 0.05, final_decoder = 'linear')

Bases: nn.Module

Source code in elfragmentador/model/ms_transformer_layers.py
def __init__(
    self,
    d_model,
    dim_feedforward=224,
    nhead=4,
    n_layers=3,
    dropout=0.05,
    final_decoder="linear",
):
    """Decode iRTs.

    It is technically an encoder-decoder...

    Args:
        d_model (int):
            Number of dimensions to expect as input
        nhead (int):
            Number of heads in the attention layers that decode the input.
            defaults to 4
        dim_feedforward (int, optional):
            Number of hidden dimensions in the FFN that decodes the sequence.
            Defaults to 224
        n_layers (int, optional):
            Number of transformer encoder layers.
            Defaults to 3
        dropout (float, optional):
            Dropout to use in the multihead attention.
            Defaults to 0.05
    """
    super().__init__()

    self.aa_embed = AASequenceEmbedding(
        d_model=d_model,
        aa_names=CONFIG.encoding_aa_order,
        mod_names=CONFIG.encoding_mod_order,
        max_length=100,
    )
    encoder_layers = nn.TransformerEncoderLayer(
        d_model=d_model,
        nhead=nhead,
        dim_feedforward=dim_feedforward,
        dropout=dropout,
        activation="gelu",
    )
    self.encoder = nn.TransformerEncoder(
        encoder_layer=encoder_layers, num_layers=n_layers
    )
    self.decoder = _LearnableEmbedTransformerDecoder(
        d_model=d_model,
        nhead=nhead,
        nhid=dim_feedforward,
        layers=n_layers,
        dropout=dropout,
        num_outputs=1,
        final_decoder=final_decoder,
    )
Attributes
aa_embed = AASequenceEmbedding(d_model=d_model, aa_names=CONFIG.encoding_aa_order, mod_names=CONFIG.encoding_mod_order, max_length=100) instance-attribute
encoder = nn.TransformerEncoder(encoder_layer=encoder_layers, num_layers=n_layers) instance-attribute
decoder = _LearnableEmbedTransformerDecoder(d_model=d_model, nhead=nhead, nhid=dim_feedforward, layers=n_layers, dropout=dropout, num_outputs=1, final_decoder=final_decoder) instance-attribute
Functions
forward(seq, mods)
Source code in elfragmentador/model/ms_transformer_layers.py
def forward(self, seq, mods):
    # seq [N, S], mods [N, S]
    trans_encoder_mask = torch.zeros_like(seq, dtype=torch.float)
    trans_encoder_mask = trans_encoder_mask.masked_fill(
        seq <= 0, float("-inf")
    ).masked_fill(seq > 0, float(0.0))
    # mask [N, S]

    embed_seq = self.aa_embed(seq=seq, mods=mods)  # [S, N, d_model]

    memory = self.encoder(embed_seq, src_key_padding_mask=trans_encoder_mask)
    out = self.decoder(memory, trans_encoder_mask)
    return out

elfragmentador.model.ms_transformer_layers.PeptideTransformerEncoder(d_model: int, dropout: float, nhead: int, nhid: int, layers: int) -> None

Bases: torch.nn.Module

Source code in elfragmentador/model/ms_transformer_layers.py
def __init__(
    self, d_model: int, dropout: float, nhead: int, nhid: int, layers: int
) -> None:
    super().__init__()

    # Aminoacid embedding
    self.aa_embed = AASequenceEmbedding(
        d_model=d_model,
        aa_names=CONFIG.encoding_aa_order,
        mod_names=CONFIG.encoding_mod_order,
        max_length=100,
    )

    # Transformer encoder sections
    encoder_layers = nn.TransformerEncoderLayer(
        d_model=d_model,
        nhead=nhead,
        dim_feedforward=nhid,
        dropout=dropout,
        activation="gelu",
    )
    self.encoder = nn.TransformerEncoder(encoder_layers, layers)
Attributes
aa_embed = AASequenceEmbedding(d_model=d_model, aa_names=CONFIG.encoding_aa_order, mod_names=CONFIG.encoding_mod_order, max_length=100) instance-attribute
encoder = nn.TransformerEncoder(encoder_layers, layers) instance-attribute
Functions
forward(seq: Tensor, mods: Tensor) -> tuple[Tensor, Tensor]
Source code in elfragmentador/model/ms_transformer_layers.py
def forward(self, seq: Tensor, mods: Tensor) -> tuple[Tensor, Tensor]:
    # For the mask ....
    # If a BoolTensor is provided, positions with True are not allowed
    # to attend while False values will be unchanged <- from the pytorch docs

    # [1,1,0]
    # bool [True, True, False]
    # ~    [False, False, True]
    # [Attend, Attend, Dont Attend]

    # seq shape [N, S]
    trans_encoder_mask = torch.zeros_like(seq, dtype=torch.float)
    trans_encoder_mask = trans_encoder_mask.masked_fill(
        seq <= 0, float("-inf")
    ).masked_fill(seq > 0, float(0.0))

    x = self.aa_embed(seq=seq, mods=mods)
    # x shape [S, N, d_model]

    trans_encoder_output = self.encoder(x, src_key_padding_mask=trans_encoder_mask)
    # trans_encoder_output shape [S, N, d_model]

    return trans_encoder_output, trans_encoder_mask
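
A tiny sketch of the padding-mask construction used in this forward pass (toy values, shapes as in the comments above):

>>> import torch
>>> seq = torch.tensor([[1, 1, 1, 0, 0]])  # [N, S]; 0 marks padding
>>> mask = torch.zeros_like(seq, dtype=torch.float)
>>> mask = mask.masked_fill(seq <= 0, float("-inf")).masked_fill(seq > 0, 0.0)
>>> mask
tensor([[0., 0., 0., -inf, -inf]])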

elfragmentador.model.ms_transformer_layers.FragmentTransformerDecoder(d_model: int, nhead: int, nhid: int, layers: int, dropout: float, num_fragments: int, charge_dims_pct: float = 0.05, nce_dims_pct: float = 0.05, final_decoder: str = 'linear') -> None

Bases: _LearnableEmbedTransformerDecoder

Source code in elfragmentador/model/ms_transformer_layers.py
def __init__(
    self,
    d_model: int,
    nhead: int,
    nhid: int,
    layers: int,
    dropout: float,
    num_fragments: int,
    charge_dims_pct: float = 0.05,
    nce_dims_pct: float = 0.05,
    final_decoder: str = "linear",
) -> None:
    charge_dims = math.ceil(d_model * charge_dims_pct)
    nce_dims = math.ceil(d_model * nce_dims_pct)
    n_embeds = d_model - (charge_dims + nce_dims)

    super().__init__(
        d_model=d_model,
        embed_dims=n_embeds,
        nhead=nhead,
        nhid=nhid,
        layers=layers,
        dropout=dropout,
        num_outputs=num_fragments,
        final_decoder=final_decoder,
    )

    self.charge_encoder = ConcatenationEncoder(
        dims_add=charge_dims, max_val=10.0, scaling=math.sqrt(d_model)
    )
    self.nce_encoder = ConcatenationEncoder(
        dims_add=nce_dims, max_val=100.0, scaling=math.sqrt(d_model)
    )
    self.init_weights()
Attributes
charge_encoder = ConcatenationEncoder(dims_add=charge_dims, max_val=10.0, scaling=math.sqrt(d_model)) instance-attribute
nce_encoder = ConcatenationEncoder(dims_add=nce_dims, max_val=100.0, scaling=math.sqrt(d_model)) instance-attribute
Functions
init_weights()
Source code in elfragmentador/model/ms_transformer_layers.py
def init_weights(self):
    initrange = 0.1
    nn.init.uniform_(self.trans_decoder_embedding.weight, -initrange, initrange)
preprocess_query(query, charge, nce)
Source code in elfragmentador/model/ms_transformer_layers.py
def preprocess_query(self, query, charge, nce):
    # [T, B, E2]
    trans_decoder_tgt = self.charge_encoder(query, charge)
    # [T, B, E1]
    trans_decoder_tgt = self.nce_encoder(trans_decoder_tgt, nce)
    # [T, B, E]
    return trans_decoder_tgt
forward(memory: Tensor, memory_key_padding_mask: Tensor, charge: Tensor, nce: Tensor) -> Tensor
Source code in elfragmentador/model/ms_transformer_layers.py
def forward(
    self,
    memory: Tensor,
    memory_key_padding_mask: Tensor,
    charge: Tensor,
    nce: Tensor,
) -> Tensor:
    trans_decoder_tgt = self.get_learnable_query(batch_size=charge.size(0))
    trans_decoder_tgt = self.preprocess_query(
        trans_decoder_tgt, nce=nce, charge=charge
    )
    # [T, B, E]

    output = self.decoder_forward(
        trans_decoder_tgt=trans_decoder_tgt,
        memory=memory,
        memory_key_padding_mask=memory_key_padding_mask,
    )
    return output

Functions

elfragmentador.model.nn_encoding

Implements torch models to handle encoding and decoding of positions as well as.

learnable embeddings for the aminoacids and ions.

Attributes

elfragmentador.model.nn_encoding.LiteralFalse = Literal[False] module-attribute

Classes

elfragmentador.model.nn_encoding.ConcatenationEncoder(dims_add: int, max_val: Union[float, int] = 200.0, static_size: bool = False, scaling = 1) -> None

Bases: torch.nn.Module

ConcatenationEncoder concatenates information into the embedding.

Adds information on continuous variables into an embedding by concatenating an n number of dimensions to it.

It is meant to add different information to every element in a batch, but the same information (number of dimensions) to every element of a sequence inside an element of the batch: x[i_1, j, -y:] = x[i_2, j, -y:], where y is the number of added dimensions.

PARAMETER DESCRIPTION
dims_add

Number of dimensions to add as an encoding

TYPE: int

max_val

maximum expected value of the variable that will be encoded, by default 200.0

TYPE: float DEFAULT: 200.0

static_size

Optional integer to pass in order to make the size deterministic. This is only required if you want to export your model to torchscript, by default False

TYPE: Union[Literal[False], float] DEFAULT: False

Examples:

>>> x1 = torch.zeros((5, 1, 20))
>>> x2 = torch.zeros((5, 2, 20))
>>> encoder = ConcatenationEncoder(dims_add = 10, max_val=10)
>>> output = encoder(x1, torch.tensor([[7]]))
>>> output = encoder(x2, torch.tensor([[7], [4]]))
Source code in elfragmentador/model/nn_encoding.py
def __init__(
    self,
    dims_add: int,
    max_val: Union[float, int] = 200.0,
    static_size: bool = False,
    scaling=1,
) -> None:
    r"""ConcatenationEncoder concatenates information into the embedding.

    Adds information on continuous variables into an embedding by concatenating
    an n number of dimensions to it.

    It is meant to add different information to every element in a batch, but the
    same information (number of dimensions) to every element of a sequence inside
    an element of the batch. \(x[i_1,j,-y:] = x[i_2,j,-y:]\) ; being \(y\) the
    number of added dimensions.

    Args:
        dims_add (int): Number of dimensions to add as an encoding
        max_val (float, optional):
            maximum expected value of the variable that will be encoded,
            by default 200.0
        static_size (Union[Literal[False], float], optional):
            Optional integer to pass in order to make the size deterministic.
            This is only required if you want to export your model to torchscript,
            by default False

    Examples:
        >>> x1 = torch.zeros((5, 1, 20))
        >>> x2 = torch.zeros((5, 2, 20))
        >>> encoder = ConcatenationEncoder(dims_add = 10, max_val=10)
        >>> output = encoder(x1, torch.tensor([[7]]))
        >>> output = encoder(x2, torch.tensor([[7], [4]]))
    """
    super().__init__()

    # pos would be a variable ...
    div_term = torch.exp(
        torch.arange(0, dims_add, 2).float()
        * (-math.log(float(2 * max_val)) / (dims_add))
    )
    self.register_buffer("div_term", div_term)
    # TODO add option to make trainable
    self.static_size = static_size
    self.dims_add = dims_add
    self.scaling = scaling
Attributes
static_size = static_size instance-attribute
dims_add = dims_add instance-attribute
scaling = scaling instance-attribute
Functions
forward(x: Tensor, val: Tensor) -> Tensor

Forward pass through the encoder.

PARAMETER DESCRIPTION
x

the sequence fed to the encoder model (required). shape is [sequence length, batch size, embed dim].

TYPE: Tensor

val

value to be encoded into the sequence (required). Shape is [batch size, 1].

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

Tensor of shape [sequence length, batch size, embed_dim + added_dims]

Examples:

>>> x1 = torch.zeros((5, 1, 20))
>>> x2 = torch.cat([x1, x1+1], axis = 1)
>>> encoder = ConcatenationEncoder(10, max_val = 10)
>>> output = encoder(x1, torch.tensor([[7]]))
>>> output.shape
torch.Size([5, 1, 30])
>>> output = encoder(x2, torch.tensor([[7], [4]]))
Source code in elfragmentador/model/nn_encoding.py
def forward(self, x: Tensor, val: Tensor) -> Tensor:
    """
    Forward pass through the encoder.

    Parameters:
        x (Tensor):
            the sequence fed to the encoder model (required).
            shape is **[sequence length, batch size, embed dim]**.
        val (Tensor):
            value to be encoded into the sequence (required).
            Shape is **[batch size, 1]**.

    Returns:
        Tensor of shape **[sequence length, batch size, embed_dim + added_dims]**

    Examples:
        >>> x1 = torch.zeros((5, 1, 20))
        >>> x2 = torch.cat([x1, x1+1], axis = 1)
        >>> encoder = ConcatenationEncoder(10, max_val = 10)
        >>> output = encoder(x1, torch.tensor([[7]]))
        >>> output.shape
        torch.Size([5, 1, 30])
        >>> output = encoder(x2, torch.tensor([[7], [4]]))
    """

    e_sin = torch.sin(val * self.div_term)
    e_cos = torch.cos(val * self.div_term)
    e = torch.cat([e_sin, e_cos], axis=-1)

    assert (
        e.shape[-1] < self.dims_add + 2
    ), "Internal error in concatenation encoder"

    e = e[..., : self.dims_add]
    e = e.unsqueeze(0).expand(x.size(0), -1, -1) / self.scaling
    x = torch.cat((x, e), axis=-1)

    return x

elfragmentador.model.nn_encoding.FourierPositionalEncoding(d_model: int, max_len: int = 5000, static_size: Union[LiteralFalse, int] = False) -> None

Bases: torch.nn.Module

FourierPositionalEncoding adds positional information to tensors.

Inject some information about the relative or absolute position of the tokens in the sequence. The positional encodings have the same dimension as the embeddings, so that the two can be summed. Here, we use sine and cosine functions of different frequencies.

PosEncoder(pos, 2i) = sin(pos/10000^(2i/d_model))
PosEncoder(pos, 2i+1) = cos(pos/10000^(2i/d_model))

where pos is the word position and i is the embed idx.

PARAMETER DESCRIPTION
d_model

the embed dim (required), must be even.

TYPE: int

max_len

the max. length of the incoming sequence (default=5000).

TYPE: int DEFAULT: 5000

static_size

If it is an integer it is the size of the inputs that will be given, it is used only when tracing the model for torchscript (since torchscript needs fixed length inputs), by default False

TYPE: Union[LiteralFalse, int] DEFAULT: False

Note

Therefore encodings are (seq_length, batch, encodings)

Examples:

>>> posencoder = FourierPositionalEncoding(20, max_len=20)
>>> x = torch.ones((2,1,20)).float()
>>> x.shape
torch.Size([2, 1, 20])
>>> posencoder(x).shape
torch.Size([2, 1, 20])
Source code in elfragmentador/model/nn_encoding.py
def __init__(
    self,
    d_model: int,
    max_len: int = 5000,
    static_size: Union[LiteralFalse, int] = False,
) -> None:
    r"""FourierPositionalEncoding adds positional information to tensors.

    Inject some information about the relative or absolute position of the tokens
    in the sequence. The positional encodings have the same dimension as
    the embeddings, so that the two can be summed. Here, we use sine and cosine
    functions of different frequencies.

    \({PosEncoder}(pos, 2i) = \sin(pos/10000^{2i/d_{model}})\)
    \({PosEncoder}(pos, 2i+1) = \cos(pos/10000^{2i/d_{model}})\)

    where pos is the word position and i is the embed idx

    Args:
        d_model (int):
            the embed dim (required), must be even.
        max_len (int):
            the max. length of the incoming sequence (default=5000).
        static_size (Union[LiteralFalse, int], optional):
            If it is an integer it is the size of the inputs that will
            be given, it is used only when tracing the model for torchscript
            (since torchscript needs fixed length inputs), by default False

    Note:
        Therefore encodings are **(seq_length, batch, encodings)**

    Examples:
        >>> posencoder = FourierPositionalEncoding(20, max_len=20)
        >>> x = torch.ones((2,1,20)).float()
        >>> x.shape
        torch.Size([2, 1, 20])
        >>> posencoder(x).shape
        torch.Size([2, 1, 20])
    """
    super().__init__()
    self.d_model = d_model

    pe = torch.zeros(max_len, d_model)
    position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float)
        * (-math.log(10000.0) / d_model)
    )
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    pe = pe.unsqueeze(0).transpose(0, 1) / math.sqrt(d_model)
    # Pe has [shape max_len, 1, d_model]
    self.register_buffer("pe", pe)
    self.static_size = static_size
Attributes
d_model = d_model instance-attribute
static_size = static_size instance-attribute
Functions
forward(x: Tensor) -> Tensor

Forward pass through the encoder.

PARAMETER DESCRIPTION
x

the sequence fed to the positional encoder model (required). Shape [sequence length, batch size, embed dim]

TYPE: Tensor

RETURNS DESCRIPTION
Tensor

Tensor (Tensor), of shape [sequence length, batch size, embed dim]

Examples:

>>> import pytorch_lightning as pl
>>> pl.seed_everything(42)
42
>>> x = torch.ones((4,1,6)).float()
>>> pos_encoder = FourierPositionalEncoding(6, max_len=10)
>>> output = pos_encoder(x)
>>> output.shape
torch.Size([4, 1, 6])
>>> output
tensor([[[...]],
    [[...]],
    [[...]],
    [[...]]])
Source code in elfragmentador/model/nn_encoding.py
def forward(self, x: Tensor) -> Tensor:
    r"""Forward pass though the encoder.

    Args:
        x (Tensor):
            the sequence fed to the positional encoder model (required).
            Shape **[sequence length, batch size, embed dim]**

    Returns:
        Tensor (Tensor), of shape **[sequence length, batch size, embed dim]**

    Examples:
        >>> import pytorch_lightning as pl
        >>> pl.seed_everything(42)
        42
        >>> x = torch.ones((4,1,6)).float()
        >>> pos_encoder = FourierPositionalEncoding(6, max_len=10)
        >>> output = pos_encoder(x)
        >>> output.shape
        torch.Size([4, 1, 6])
        >>> output
        tensor([[[...]],
            [[...]],
            [[...]],
            [[...]]])
    """
    if self.static_size:
        end_position = self.static_size
    else:
        end_position = x.size(0)

    x = x + self.pe[:end_position, :]
    return x
plot_encoding(ax = None)
Source code in elfragmentador/model/nn_encoding.py
def plot_encoding(self, ax=None):
    if ax is None:
        raise ValueError("Must pass an axis to plot on")
    ax.imshow(self.pe.clone().detach().cpu().numpy().squeeze())

elfragmentador.model.nn_encoding.AASequenceEmbedding(d_model, max_length, aa_names, mod_names, mod_pad_index = 0, aa_pad_index = 0)

Bases: torch.nn.Module

Source code in elfragmentador/model/nn_encoding.py
def __init__(
    self, d_model, max_length, aa_names, mod_names, mod_pad_index=0, aa_pad_index=0
):
    logger.info("Initializing AASequenceEmbedding")
    super().__init__()
    # Positional information additions
    self.aa_names = aa_names
    self.mod_names = mod_names

    self.position_embed = FourierPositionalEncoding(
        d_model=d_model,
        max_len=max_length,
    )

    # Aminoacid embedding
    self.aa_encoder = nn.Embedding(
        len(self.aa_names), d_model, padding_idx=aa_pad_index
    )
    logger.debug(
        "Aminoacid embedding will use:"
        f" {aa_names[aa_pad_index]} as the padding index",
    )

    # PTM embedding
    self.mod_encoder = nn.Embedding(
        len(self.mod_names), d_model, padding_idx=mod_pad_index
    )
    logger.debug(
        "Modification embedding will use:"
        f" {mod_names[aa_pad_index]} as the padding index",
    )

    # Weight Initialization
    self.init_weights()
Attributes
aa_names = aa_names instance-attribute
mod_names = mod_names instance-attribute
position_embed = FourierPositionalEncoding(d_model=d_model, max_len=max_length) instance-attribute
aa_encoder = nn.Embedding(len(self.aa_names), d_model, padding_idx=aa_pad_index) instance-attribute
mod_encoder = nn.Embedding(len(self.mod_names), d_model, padding_idx=mod_pad_index) instance-attribute
Functions
init_weights() -> None
Source code in elfragmentador/model/nn_encoding.py
def init_weights(self) -> None:
    logger.info("Initializing weights on AASequenceEmbedding")
    initrange = 0.1
    ptm_initrange = initrange * 0.01
    torch.nn.init.uniform_(self.aa_encoder.weight, -initrange, initrange)
    self.initial_aa_weights = self.aa_encoder.weight.clone().detach().cpu()
    torch.nn.init.uniform_(self.mod_encoder.weight, -ptm_initrange, ptm_initrange)
    self.initial_mod_weights = self.mod_encoder.weight.clone().detach().cpu()
forward(seq, mods)
Source code in elfragmentador/model/nn_encoding.py
def forward(self, seq, mods):
    # seq and mod are [N, S] shaped
    mods = F.pad(mods, (0, seq.size(1) - mods.size(1)), "constant")
    seq = self.aa_encoder(seq.permute(1, 0))
    mods = self.mod_encoder(mods.permute(1, 0))
    seq = seq + mods

    # TODO consider if this line is needed, it is used in attention is all you need
    seq = seq * math.sqrt(self.aa_encoder.num_embeddings)
    seq = self.position_embed(seq)

    return seq
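
A short sketch of embedding a toy batch, reusing the constructor call from the as_DataFrames example below (shapes follow the comments in forward):

>>> import torch
>>> from elfragmentador.config import CONFIG
>>> embed = AASequenceEmbedding(
...     d_model=20,
...     aa_names=CONFIG.encoding_aa_order,
...     mod_names=CONFIG.encoding_mod_order,
...     max_length=100,
... )
>>> seq = torch.tensor([[1, 1, 1, 0, 0]])  # [N, S]
>>> mods = torch.zeros_like(seq)           # no modifications
>>> out = embed(seq=seq, mods=mods)        # shape [S, N, d_model]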
as_DataFrames() -> Tuple[DataFrame, DataFrame]

Returns the weights as data frames.

RETURNS DESCRIPTION
Tuple[DataFrame, DataFrame]

Tuple[DataFrame, DataFrame]: A data frame of the aminoacid embeddings and the modification embeddings

Examples:

>>> from elfragmentador.config import CONFIG
>>> embed = AASequenceEmbedding(
...     d_model=20,
...     aa_names=CONFIG.encoding_aa_order,
...     mod_names=CONFIG.encoding_mod_order,
...     max_length=100,)
>>> aa_embed, mod_embed = embed.as_DataFrames()
>>> list(aa_embed)
['__missing__', 'A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K',
 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'X', 'Y',
 'c_term', 'n_term']
>>> list(mod_embed)
[None, '[UNIMOD:4]', '[UNIMOD:35]', '[UNIMOD:21]', '[UNIMOD:121]',
'[UNIMOD:737]', '[UNIMOD:1]', '[UNIMOD:34]', '[UNIMOD:36]',
'[UNIMOD:37]', '[UNIMOD:354]', '[UNIMOD:7]', '__unknown1__']
Source code in elfragmentador/model/nn_encoding.py
def as_DataFrames(self) -> Tuple[DataFrame, DataFrame]:
    """
    Returns the weights as data frames.

    Returns:
        Tuple[DataFrame, DataFrame]:
            A data frame of the aminoacid embeddings
            and the modification embeddings

    Examples:
        >>> from elfragmentador.config import CONFIG
        >>> embed = AASequenceEmbedding(
        ...     d_model=20,
        ...     aa_names=CONFIG.encoding_aa_order,
        ...     mod_names=CONFIG.encoding_mod_order,
        ...     max_length=100,)
        >>> aa_embed, mod_embed = embed.as_DataFrames()
        >>> list(aa_embed)
        ['__missing__', 'A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K',
         'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'X', 'Y',
         'c_term', 'n_term']
        >>> list(mod_embed)
        [None, '[UNIMOD:4]', '[UNIMOD:35]', '[UNIMOD:21]', '[UNIMOD:121]',
        '[UNIMOD:737]', '[UNIMOD:1]', '[UNIMOD:34]', '[UNIMOD:36]',
        '[UNIMOD:37]', '[UNIMOD:354]', '[UNIMOD:7]', '__unknown1__']
    """
    df_aa = pd.DataFrame(data=self.aa_encoder.weight.detach().cpu().numpy().T)
    df_aa.columns = self.aa_names

    df_mod = pd.DataFrame(data=self.mod_encoder.weight.detach().cpu().numpy().T)
    df_mod.columns = self.mod_names

    return df_aa, df_mod