PepTransformerModel
elfragmentador.model
Attributes
elfragmentador.model.LiteralFalse = Literal[False]
module-attribute
Classes
elfragmentador.model.PepTransformerModel(num_decoder_layers: int = 6, num_encoder_layers: int = 6, nhid: int = 2024, d_model: int = 516, nhead: int = 4, dropout: float = 0.1, combine_embeds: bool = True, combine_encoders: bool = True, final_decoder: str = 'linear', lr: float = 0.0001, scheduler: str = 'plateau', lr_ratio: float | int = 200, steps_per_epoch: None = None, loss_ratio: float = 5, *args, **kwargs) -> None
Bases: pl.LightningModule
Predicts retention times and HCD spectra from peptides.
Generates a new instance of the PepTransformerModel.
PARAMETER | DESCRIPTION |
---|---|
num_decoder_layers | int, optional. Number of layers in the transformer decoder, by default 6. |
num_encoder_layers | int, optional. Number of layers in the transformer encoder, by default 6. |
nhid | int, optional. Number of dimensions used in the feedforward networks inside the transformer encoder and decoders, by default 2024. |
d_model | int, optional. Number of features to pass to the transformer encoder. The embedding transforms the input to this size, by default 516. |
nhead | int, optional. Number of attention heads in the transformer, by default 4. |
dropout | float, optional. Dropout rate, by default 0.1. |
combine_embeds | bool, optional. Whether the embeddings for modifications and sequences should be shared for iRT and fragment predictions, by default True. |
combine_encoders | bool, optional. Whether the transformer encoders for iRT and fragments should be shared, by default True. |
lr | float, optional. Learning rate, by default 1e-4. |
scheduler | str, optional. What scheduler to use, check the available ones with |
lr_ratio | Union[float, int], optional. For cosine annealing: ratio between the initial learning rate and the minimum one; for instance, an lr of 1 and a ratio of 10 would give a minimum learning rate of 0.1. For onecycle: ratio between the maximum learning rate and the initial one; for instance, if lr is 0.1 and the ratio is 10, the maximum learning rate would be 1.0. By default 200. |
steps_per_epoch | None, optional. Expected number of steps per epoch, used internally to calculate learning rates when using the onecycle scheduler, by default None. |
loss_ratio | float, optional. The ratio of the spectrum loss to the retention time loss used when adding them before passing to the optimizer. Higher values give more weight to the spectra relative to the retention time. By default 5. |
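The arithmetic behind lr_ratio can be sketched in a few lines. This is only an illustration of the two worked examples in the parameter description, not the library's implementation; the helper names `cosine_min_lr` and `onecycle_max_lr` are hypothetical.

```python
def cosine_min_lr(lr: float, lr_ratio: float) -> float:
    """Lowest learning rate reached by cosine annealing (hypothetical helper)."""
    return lr / lr_ratio


def onecycle_max_lr(lr: float, lr_ratio: float) -> float:
    """Peak learning rate reached by the onecycle schedule (hypothetical helper)."""
    return lr * lr_ratio


# The two examples from the parameter description:
print(cosine_min_lr(lr=1.0, lr_ratio=10))    # an lr of 1 with a ratio of 10
print(onecycle_max_lr(lr=0.1, lr_ratio=10))  # an lr of 0.1 with a ratio of 10
```

With the default lr of 1e-4 and lr_ratio of 200, the same arithmetic would give a cosine floor of 5e-7 and a onecycle peak of 0.02.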
Source code in elfragmentador/model/__init__.py
Attributes
accepted_schedulers = ['plateau', 'cosine', 'onecycle']
class-attribute
__version__ = elfragmentador.__version__
class-attribute
ms2ml_config = CONFIG
instance-attribute
NUM_FRAGMENT_EMBEDDINGS = self.ms2ml_config.num_fragment_embeddings
instance-attribute
main_model = PepTransformerBase(num_fragments=self.NUM_FRAGMENT_EMBEDDINGS, num_decoder_layers=num_decoder_layers, num_encoder_layers=num_encoder_layers, nhid=nhid, d_model=d_model, nhead=nhead, dropout=dropout, combine_embeds=combine_embeds, combine_encoders=combine_encoders, final_decoder=final_decoder)
instance-attribute
metric_calculator = MetricCalculator()
instance-attribute
mse_loss = nn.MSELoss(reduction='none')
instance-attribute
cosine_loss = CosineLoss(dim=1, eps=1e-08)
instance-attribute
angle_loss = SpectralAngleLoss(dim=1, eps=1e-08)
instance-attribute
lr = lr
instance-attribute
scheduler = scheduler
instance-attribute
lr_ratio = lr_ratio
instance-attribute
steps_per_epoch = steps_per_epoch
instance-attribute
loss_ratio = loss_ratio
instance-attribute
irt_metric = MissingDataAverager()
instance-attribute
loss_metric = MissingDataAverager()
instance-attribute
spectra_metric = MissingDataAverager()
instance-attribute
spectra_metric2 = MissingDataAverager()
instance-attribute
Functions
summarize(max_depth = 3)
forward(seq: Tensor, mods: Tensor, charge: Tensor, nce: Tensor)
predict_from_seq(seq: str, nce: float, as_spectrum = False) -> PredictionResults | Spectrum
torch_batch_from_seq(*args, **kwargs) -> ForwardBatch
staticmethod
to_torchscript()
Convert the model to torchscript.
Example:
>>> model = PepTransformerModel()
>>> ts = model.to_torchscript()
>>> type(ts)
Source code in elfragmentador/model/__init__.py
add_model_specific_args(parser: _ArgumentGroup) -> _ArgumentGroup
staticmethod
Adds arguments to a parser.
It is used to add the command line arguments for the training/generation of the model.
PARAMETER | DESCRIPTION |
---|---|
parser | An argparser parser (anything that has the |
RETURNS | DESCRIPTION |
---|---|
_ArgumentGroup | The same parser with the added arguments |
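The pattern of handing an argument group to a function that registers options on it and returns the same object can be sketched with the standard library alone. The function `add_model_args` and the flag names below are hypothetical stand-ins chosen to mirror the constructor defaults; the flags the real method registers are not listed on this page.

```python
import argparse


def add_model_args(parser):
    """Hypothetical stand-in that registers a few model options on a parser."""
    parser.add_argument("--num_encoder_layers", type=int, default=6)
    parser.add_argument("--num_decoder_layers", type=int, default=6)
    parser.add_argument("--lr", type=float, default=1e-4)
    parser.add_argument(
        "--scheduler",
        type=str,
        default="plateau",
        choices=["plateau", "cosine", "onecycle"],
    )
    return parser  # the same object, now carrying the extra arguments


root = argparse.ArgumentParser(description="training options (illustrative)")
group = root.add_argument_group("model")
returned = add_model_args(group)

args = root.parse_args(["--lr", "0.001", "--scheduler", "onecycle"])
print(args.lr, args.scheduler, args.num_encoder_layers)
```

Anything exposing `add_argument` works here, which is why both a full parser and an argument group can be passed.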
Source code in elfragmentador/model/__init__.py
configure_scheduler_plateau(optimizer, lr_ratio)
staticmethod
Source code in elfragmentador/model/__init__.py
configure_scheduler_cosine(optimizer, lr_ratio, min_lr)
staticmethod
Source code in elfragmentador/model/__init__.py
configure_scheduler_oncecycle(optimizer, lr_ratio, learning_rate, steps_per_epoch, accumulate_grad_batches, max_epochs)
staticmethod
Source code in elfragmentador/model/__init__.py
configure_optimizers() -> tuple[list[AdamW], list[dict[str, ReduceLROnPlateau | str]]] | tuple[list[AdamW], list[dict[str, CosineAnnealingWarmRestarts | str]]] | tuple[list[AdamW], list[dict[str, OneCycleLR | str]]]
Configures the optimizers for training.
It is used internally by pytorch_lightning during training. So far I have implemented 3 options (set when creating the module).
OneCycleLR seems to give the best results overall in the least amount of time. The only tradeoff that I see is that resuming training does not seem to be easy.
Check the pytorch_lightning documentation to see how this is used in the training loop.
RETURNS | DESCRIPTION |
---|---|
tuple[list[AdamW], list[dict[str, ReduceLROnPlateau | str]]] | tuple[list[AdamW], list[dict[str, CosineAnnealingWarmRestarts | str]]] | tuple[list[AdamW], list[dict[str, OneCycleLR | str]]] | Two lists, one containing the optimizer and another containing the scheduler. |
Source code in elfragmentador/model/__init__.py
plot_scheduler_lr()
Plot the learning rate of the scheduler.
This is useful to see how the learning rate changes during training, and to make sure that the scheduler is working as intended.
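What such a plot would show for the cosine option can be approximated without a trainer. The sketch below uses the standard cosine-annealing-with-warm-restarts formula and assumes, following the lr_ratio description, that the minimum learning rate is lr / lr_ratio. The cycle length `t_0` is an assumed value, and the function name is hypothetical; the scheduler settings used by the model are not shown on this page.

```python
import math


def cosine_warm_restart_lr(step, max_lr, min_lr, t_0):
    """Learning rate at `step` for cosine annealing that restarts every `t_0` steps."""
    t_cur = step % t_0  # position inside the current cycle
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t_cur / t_0))


lr, lr_ratio, t_0 = 1e-4, 200, 10  # t_0 is an assumed cycle length
min_lr = lr / lr_ratio
schedule = [cosine_warm_restart_lr(s, lr, min_lr, t_0) for s in range(2 * t_0)]

# Crude text plot: one row per step, bar length proportional to the learning rate.
for step, value in enumerate(schedule):
    print(f"{step:2d} {value:.2e} " + "#" * round(40 * value / lr))
```

The bars shrink along a half cosine and jump back to full length at step 10, which is the restart.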
Source code in elfragmentador/model/__init__.py
training_step(batch: TrainBatch, batch_idx: int | None = None) -> Tensor
See pytorch_lightning documentation.
Source code in elfragmentador/model/__init__.py
on_train_start() -> None
validation_step(batch: TrainBatch, batch_idx: int | None = None) -> Tensor
See pytorch_lightning documentation.
Source code in elfragmentador/model/__init__.py
validation_epoch_end(outputs: list[Tensor]) -> list[Tensor]
See pytorch_lightning documentation.
Source code in elfragmentador/model/__init__.py
test_step(batch, batch_idx: int | None = None) -> tuple[dict[str, Tensor], PredictionResults]
test_epoch_end(results: list)
predict_step(batch: TrainBatch, batch_idx: int | None = None)
Source code in elfragmentador/model/__init__.py
on_after_backward()
Source code in elfragmentador/model/__init__.py
on_train_epoch_end() -> None
Functions
elfragmentador.model.evaluate_landmark_rt(model: PepTransformerModel)
Checks the prediction of the model on the iRT peptides.
Predicts all the procal and Biognosys iRT peptides and checks the correlation between the theoretical iRT values and the predicted ones.
PARAMETER | DESCRIPTION |
---|---|
model | PepTransformerModel. A model to test the predictions on. |
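The correlation check itself can be sketched independently of the model. The snippet computes a Pearson correlation between reference and predicted values; the numbers are toy placeholders, not real iRT values or model outputs, and whether the function reports Pearson or another correlation measure is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy numbers only: placeholders standing in for reference and predicted iRT values.
theoretical = [-25.0, 0.0, 12.0, 20.0, 29.0, 42.0, 55.0, 70.0, 87.0, 100.0]
predicted = [t * 0.9 + 3.0 for t in theoretical]  # a perfectly linear "prediction"

print(pearson(theoretical, predicted))
```

Because correlation ignores linear rescaling, a prediction that is shifted or scaled relative to the reference scale can still score close to 1.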
Source code in elfragmentador/model/__init__.py
elfragmentador.model.peptransformer
Classes
elfragmentador.model.peptransformer.PepTransformerBase(num_fragments, num_decoder_layers: int = 6, num_encoder_layers: int = 6, nhid: int = 2024, d_model: int = 516, nhead: int = 4, dropout: float = 0.1, combine_embeds: bool = True, combine_encoders: bool = True, final_decoder = 'linear') -> None
Bases: nn.Module
Source code in elfragmentador/model/peptransformer.py
Attributes
encoder = PeptideTransformerEncoder(d_model=d_model, dropout=dropout, nhead=nhead, nhid=nhid, layers=num_encoder_layers)
instance-attribute
decoder = FragmentTransformerDecoder(d_model=d_model, nhead=nhead, nhid=nhid, layers=num_decoder_layers, dropout=dropout, num_fragments=num_fragments, final_decoder=final_decoder)
instance-attribute
irt_decoder = IRTDecoder(d_model=d_model, dim_feedforward=nhid, nhead=nhead, n_layers=num_encoder_layers, dropout=dropout, final_decoder=final_decoder)
instance-attribute
Functions
forward(seq: Tensor, mods: Tensor, charge: Tensor, nce: Tensor) -> PredictionResults
Generate predictions.
Provides the function for the forward pass to the model.
PARAMETER | DESCRIPTION |
---|---|
seq | Encoded peptide sequence [B, L] (view details) |
mods | Encoded modification sequence [B, L], by default None |
nce | float Tensor with the normalized collision energies [B, 1] |
charge | long Tensor with the charges [B, 1], by default None |
Details
seq: The peptide is encoded as integers for the amino acids. "AAA" encoded for a max length of 5 would be torch.Tensor([ 1, 1, 1, 0, 0]).long()
nce: Normalized collision energy to use during the prediction.
charge: A tensor corresponding to the charges of each of the peptide precursors (long)
mods: Modifications encoded as integers
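The integer encoding described for seq can be sketched with plain lists. The alphabet order is taken from the AASequenceEmbedding.as_DataFrames example on this page; the helper `encode_sequence` is hypothetical, and whether the library also inserts the n_term and c_term tokens is not stated here, so the sketch follows the "AAA" example literally.

```python
# Alphabet order as printed by the AASequenceEmbedding.as_DataFrames example;
# index 0 doubles as the padding value.
AA_ORDER = ["__missing__", "A", "C", "D", "E", "F", "G", "H", "I", "K",
            "L", "M", "N", "P", "Q", "R", "S", "T", "V", "W", "X", "Y",
            "c_term", "n_term"]
AA_INDEX = {aa: i for i, aa in enumerate(AA_ORDER)}


def encode_sequence(seq, max_length):
    """Map residues to integers and right-pad with zeros (hypothetical helper)."""
    if len(seq) > max_length:
        raise ValueError("sequence is longer than max_length")
    encoded = [AA_INDEX[aa] for aa in seq]
    return encoded + [0] * (max_length - len(seq))


print(encode_sequence("AAA", 5))  # [1, 1, 1, 0, 0]
```

Wrapping the result in `torch.tensor(...).long()` and stacking several peptides would give the [B, L] shape listed for seq.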
Source code in elfragmentador/model/peptransformer.py
predict_from_seq(seq: str, nce: float, as_spectrum = False) -> PredictionResults | AnnotatedPeptideSpectrum
Predicts spectra from a sequence as a string.
Utility method that gets a sequence as a string, encodes it internally to the correct input form and outputs the predicted spectra.
Note that the spectra are not decoded as an output; please check elfragmentador.encoding_decoding.decode_fragment_tensor for the decoding.
The irt is scaled by 100 and is in the Biognosys scale.
TODO: consider if the output should be decoded ...
PARAMETER | DESCRIPTION |
---|---|
seq | Sequence to use for prediction, supports modifications in the form of S[PHOSPHO], S[+80] and T[181] |
nce | Normalized collision energy to use during the prediction |
as_spectrum | Whether to return a Spectrum object instead of the raw tensor predictions (Default value = False) |
RETURNS | DESCRIPTION |
---|---|
PredictionResults | A named tuple with two named results; irt and spectra |
Spectrum | A spectrum object with the predicted spectrum |
Examples:
>>> import pytorch_lightning as pl
>>> from elfragmentador.config import CONFIG
>>> pl.seed_everything(42)
42
>>> my_model = PepTransformerBase(num_fragments=CONFIG.num_fragment_embeddings) # Or load the model from a checkpoint
>>> _ = my_model.eval()
>>> my_model.predict_from_seq("MYPEPT[U:21]IDEK/3", 27)
PredictionResults(irt=tensor(...), spectra=tensor([...]))
>>> out = my_model.predict_from_seq("MYPEPT[U:21]IDEK/3", 27, as_spectrum=True)
>>> type(out)
<class 'ms2ml.spectrum.AnnotatedPeptideSpectrum'>
>>> # my_model.predict_from_seq("MYPEPT[U:21]IDEK/3", 27)
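The input string used in the examples, "MYPEPT[U:21]IDEK/3", packs residues, a bracketed modification and a precursor charge into one value. The sketch below shows one way to pull those three pieces apart with a regular expression. It is a hypothetical illustration, not the parser the library uses, and it does not handle terminal modifications.

```python
import re

# One uppercase residue, optionally followed by a bracketed modification.
TOKEN = re.compile(r"([A-Z])(?:\[([^\]]+)\])?")


def split_peptide(peptide):
    """Split 'SEQ[MOD]UENCE/charge' into residues, per-residue mods and charge."""
    seq_part, _, charge_part = peptide.partition("/")
    charge = int(charge_part) if charge_part else None
    residues, mods = [], []
    for aa, mod in TOKEN.findall(seq_part):
        residues.append(aa)
        mods.append(mod or None)  # an empty match means "no modification"
    return residues, mods, charge


residues, mods, charge = split_peptide("MYPEPT[U:21]IDEK/3")
print("".join(residues), mods, charge)
```

Keeping one modification slot per residue mirrors the parallel seq and mods tensors that the forward pass expects.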
Source code in elfragmentador/model/peptransformer.py
elfragmentador.model.ms_transformer_layers
Attributes
elfragmentador.model.ms_transformer_layers.CONFIG = get_default_config()
module-attribute
Classes
elfragmentador.model.ms_transformer_layers.IRTDecoder(d_model, dim_feedforward = 224, nhead = 4, n_layers = 3, dropout = 0.05, final_decoder = 'linear')
Bases: nn.Module
Source code in elfragmentador/model/ms_transformer_layers.py
Attributes
aa_embed = AASequenceEmbedding(d_model=d_model, aa_names=CONFIG.encoding_aa_order, mod_names=CONFIG.encoding_mod_order, max_length=100)
instance-attribute
encoder = nn.TransformerEncoder(encoder_layer=encoder_layers, num_layers=n_layers)
instance-attribute
decoder = _LearnableEmbedTransformerDecoder(d_model=d_model, nhead=nhead, nhid=dim_feedforward, layers=n_layers, dropout=dropout, num_outputs=1, final_decoder=final_decoder)
instance-attribute
Functions
forward(seq, mods)
Source code in elfragmentador/model/ms_transformer_layers.py
elfragmentador.model.ms_transformer_layers.PeptideTransformerEncoder(d_model: int, dropout: float, nhead: int, nhid: int, layers: int) -> None
Bases: torch.nn.Module
Source code in elfragmentador/model/ms_transformer_layers.py
Attributes
aa_embed = AASequenceEmbedding(d_model=d_model, aa_names=CONFIG.encoding_aa_order, mod_names=CONFIG.encoding_mod_order, max_length=100)
instance-attribute
encoder = nn.TransformerEncoder(encoder_layers, layers)
instance-attribute
Functions
forward(seq: Tensor, mods: Tensor) -> Tensor
Source code in elfragmentador/model/ms_transformer_layers.py
elfragmentador.model.ms_transformer_layers.FragmentTransformerDecoder(d_model: int, nhead: int, nhid: int, layers: int, dropout: float, num_fragments: int, charge_dims_pct: float = 0.05, nce_dims_pct: float = 0.05, final_decoder: str = 'linear') -> None
Bases: _LearnableEmbedTransformerDecoder
Source code in elfragmentador/model/ms_transformer_layers.py
Attributes
charge_encoder = ConcatenationEncoder(dims_add=charge_dims, max_val=10.0, scaling=math.sqrt(d_model))
instance-attribute
nce_encoder = ConcatenationEncoder(dims_add=nce_dims, max_val=100.0, scaling=math.sqrt(d_model))
instance-attribute
Functions
init_weights()
preprocess_query(query, charge, nce)
Source code in elfragmentador/model/ms_transformer_layers.py
forward(memory: Tensor, memory_key_padding_mask: Tensor, charge: Tensor, nce: Tensor) -> Tensor
Source code in elfragmentador/model/ms_transformer_layers.py
Functions
elfragmentador.model.nn_encoding
Implements torch models to handle encoding and decoding of positions, as well as learnable embeddings for the amino acids and ions.
Attributes
elfragmentador.model.nn_encoding.LiteralFalse = Literal[False]
module-attribute
Classes
elfragmentador.model.nn_encoding.ConcatenationEncoder(dims_add: int, max_val: Union[float, int] = 200.0, static_size: bool = False, scaling = 1) -> None
Bases: torch.nn.Module
ConcatenationEncoder concatenates information into the embedding.
Adds information on continuous variables into an embedding by concatenating an n number of dimensions to it.
It is meant to add different information to every element in a batch, but the same information (number of dimensions) to every element of a sequence inside an element of the batch: x[i_1, j, -y:] = x[i_2, j, -y:], where y is the number of added dimensions.
PARAMETER | DESCRIPTION |
---|---|
dims_add | Number of dimensions to add as an encoding |
max_val | Maximum expected value of the variable that will be encoded, by default 200.0 |
static_size | Optional integer to pass in order to make the size deterministic. This is only required if you want to export your model to torchscript, by default False |
Examples:
>>> x1 = torch.zeros((5, 1, 20))
>>> x2 = torch.zeros((5, 2, 20))
>>> encoder = ConcatenationEncoder(dims_add = 10, max_val=10)
>>> output = encoder(x1, torch.tensor([[7]]))
>>> output = encoder(x2, torch.tensor([[7], [4]]))
Source code in elfragmentador/model/nn_encoding.py
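The concatenation idea can be sketched with nested lists in place of tensors. The way the scalar is turned into `dims_add` numbers (`encode_value`, a sinusoidal mapping) is an assumption made for illustration, and both function names are hypothetical; the class may encode the value differently. What the sketch does reproduce is the shape change and the property that every sequence position of a batch element receives the same extra dimensions.

```python
import math


def encode_value(val, dims_add, max_val):
    """Assumed sinusoidal encoding of a scalar into `dims_add` numbers."""
    return [math.sin(val / max_val * (k + 1)) for k in range(dims_add)]


def concat_encode(x, vals, dims_add, max_val):
    """x is nested lists shaped [seq_len][batch][embed]; vals holds one scalar per batch item."""
    extras = [encode_value(v, dims_add, max_val) for v in vals]
    return [
        [x[i][j] + extras[j] for j in range(len(vals))]  # list concatenation
        for i in range(len(x))
    ]


seq_len, batch, embed = 5, 2, 20
x = [[[0.0] * embed for _ in range(batch)] for _ in range(seq_len)]
out = concat_encode(x, vals=[7, 4], dims_add=10, max_val=10)

print(len(out), len(out[0]), len(out[0][0]))  # 5 2 30
```

Here `out[0][1][-10:]` equals `out[3][1][-10:]` (same batch element, different positions), while the two batch elements differ because they encode 7 and 4.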
Attributes
static_size = static_size
instance-attribute
dims_add = dims_add
instance-attribute
scaling = scaling
instance-attribute
Functions
forward(x: Tensor, val: Tensor) -> Tensor
Forward pass through the encoder.
PARAMETER | DESCRIPTION |
---|---|
x | The sequence fed to the encoder model (required). Shape is [sequence length, batch size, embed dim]. |
val | Value to be encoded into the sequence (required). Shape is [batch size, 1]. |
RETURNS | DESCRIPTION |
---|---|
Tensor | Tensor of shape [sequence length, batch size, embed_dim + added_dims] |
Examples:
>>> x1 = torch.zeros((5, 1, 20))
>>> x2 = torch.cat([x1, x1+1], axis = 1)
>>> encoder = ConcatenationEncoder(10, max_val = 10)
>>> output = encoder(x1, torch.tensor([[7]]))
>>> output.shape
torch.Size([5, 1, 30])
>>> output = encoder(x2, torch.tensor([[7], [4]]))
Source code in elfragmentador/model/nn_encoding.py
elfragmentador.model.nn_encoding.FourierPositionalEncoding(d_model: int, max_len: int = 5000, static_size: Union[LiteralFalse, int] = False) -> None
Bases: torch.nn.Module
FourierPositionalEncoding adds positional information to tensors.
Inject some information about the relative or absolute position of the tokens in the sequence. The positional encodings have the same dimension as the embeddings, so that the two can be summed. Here, we use sine and cosine functions of different frequencies.
$$\text{PosEncoder}(pos, 2i) = \sin\left(pos / 10000^{2i / d_{model}}\right)$$
$$\text{PosEncoder}(pos, 2i+1) = \cos\left(pos / 10000^{2i / d_{model}}\right)$$
where pos is the word position and i is the embed idx.
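The two formulas translate directly into a lookup table. The sketch below builds it with the standard library only; the function name is hypothetical, and how the module combines the table with its input (for example any scaling or dropout) is not shown on this page.

```python
import math


def fourier_positional_encoding(max_len, d_model):
    """Table of shape [max_len][d_model] following the sine/cosine formulas."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(max_len):
        row = [0.0] * d_model
        for i in range(d_model // 2):
            angle = pos / 10000 ** (2 * i / d_model)
            row[2 * i] = math.sin(angle)      # even indices use the sine
            row[2 * i + 1] = math.cos(angle)  # odd indices use the cosine
        table.append(row)
    return table


pe = fourier_positional_encoding(max_len=10, d_model=6)
print(pe[0])  # position 0: every sine is 0.0 and every cosine is 1.0
```

Each pair of columns oscillates at a lower frequency than the previous pair, so nearby positions differ mostly in the first columns and distant positions in the later ones.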
PARAMETER | DESCRIPTION |
---|---|
d_model | The embed dim (required), must be even. |
max_len | The max. length of the incoming sequence (default=5000). |
static_size | If it is an integer it is the size of the inputs that will be given, it is used only when tracing the model for torchscript (since torchscript needs fixed length inputs), by default False |
Note
Encodings therefore have shape (seq_length, batch, encodings).
Examples:
>>> posencoder = FourierPositionalEncoding(20, max_len=20)
>>> x = torch.ones((2,1,20)).float()
>>> x.shape
torch.Size([2, 1, 20])
>>> posencoder(x).shape
torch.Size([2, 1, 20])
Source code in elfragmentador/model/nn_encoding.py
Attributes
d_model = d_model
instance-attribute
static_size = static_size
instance-attribute
Functions
forward(x: Tensor) -> Tensor
Forward pass through the encoder.
PARAMETER | DESCRIPTION |
---|---|
x | The sequence fed to the positional encoder model (required). Shape [sequence length, batch size, embed dim] |
RETURNS | DESCRIPTION |
---|---|
Tensor | Tensor of shape [sequence length, batch size, embed dim] |
Examples:
>>> import pytorch_lightning as pl
>>> pl.seed_everything(42)
42
>>> x = torch.ones((4,1,6)).float()
>>> pos_encoder = FourierPositionalEncoding(6, max_len=10)
>>> output = pos_encoder(x)
>>> output.shape
torch.Size([4, 1, 6])
>>> output
tensor([[[...]],
[[...]],
[[...]],
[[...]]])
Source code in elfragmentador/model/nn_encoding.py
elfragmentador.model.nn_encoding.AASequenceEmbedding(d_model, max_length, aa_names, mod_names, mod_pad_index = 0, aa_pad_index = 0)
Bases: torch.nn.Module
Source code in elfragmentador/model/nn_encoding.py
Attributes
aa_names = aa_names
instance-attribute
mod_names = mod_names
instance-attribute
position_embed = FourierPositionalEncoding(d_model=d_model, max_len=max_length)
instance-attribute
aa_encoder = nn.Embedding(len(self.aa_names), d_model, padding_idx=aa_pad_index)
instance-attribute
mod_encoder = nn.Embedding(len(self.mod_names), d_model, padding_idx=mod_pad_index)
instance-attribute
Functions
init_weights() -> None
Source code in elfragmentador/model/nn_encoding.py
forward(seq, mods)
Source code in elfragmentador/model/nn_encoding.py
as_DataFrames() -> Tuple[DataFrame, DataFrame]
Returns the weights as data frames.
RETURNS | DESCRIPTION |
---|---|
Tuple[DataFrame, DataFrame] | A data frame of the amino acid embeddings and a data frame of the modification embeddings |
Examples:
>>> from elfragmentador.config import CONFIG
>>> embed = AASequenceEmbedding(
... d_model=20,
... aa_names=CONFIG.encoding_aa_order,
... mod_names=CONFIG.encoding_mod_order,
... max_length=100,)
>>> aa_embed, mod_embed = embed.as_DataFrames()
>>> list(aa_embed)
['__missing__', 'A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K',
'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'X', 'Y',
'c_term', 'n_term']
>>> list(mod_embed)
[None, '[UNIMOD:4]', '[UNIMOD:35]', '[UNIMOD:21]', '[UNIMOD:121]',
'[UNIMOD:737]', '[UNIMOD:1]', '[UNIMOD:34]', '[UNIMOD:36]',
'[UNIMOD:37]', '[UNIMOD:354]', '[UNIMOD:7]', '__unknown1__']