Usage of the different command line interface options ...
output
usage: elfragmentador append_pin [-h] [--pin PIN] [--nce NCE] [--rawfile_locations RAWFILE_LOCATIONS] [--out OUT] [--model_checkpoint MODEL_CHECKPOINT] [--threads THREADS]
options:
  -h, --help            show this help message and exit
  --pin PIN             Input percolator file
  --nce NCE             Collision energy to use for the prediction
  --rawfile_locations RAWFILE_LOCATIONS
                        Locations to look for the raw files
  --out OUT             Input percolator file
  --model_checkpoint MODEL_CHECKPOINT
                        Model checkpoint to use for the prediction, if nothing is passed will download a pretrained model
  --threads THREADS     Number of threads to use during inference
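For reference, a hypothetical invocation using the options listed above could look like the following (mokapot.pin, ./raw_files and rescored.pin are placeholder names for illustration, not files that ship with the package):
$ elfragmentador append_pin --pin mokapot.pin --nce 32 --rawfile_locations ./raw_files --out rescored.pin --threads 4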
Evaluating predictions with your own data!
output
usage: elfragmentador evaluate [-h] [--input INPUT] [--nce NCE] [--out OUT] [--assure_notrain ASSURE_NOTRAIN] [--model_checkpoint MODEL_CHECKPOINT] [--threads THREADS]
options:
  -h, --help            show this help message and exit
  --input INPUT         Path to a file to use as a reference for the evaluation (.sptxt generally)
  --nce NCE             Comma delimited series of collision energies to use
  --out OUT             csv file to output results to
  --assure_notrain ASSURE_NOTRAIN
                        Whether to remove all sequences that could be assigned to the training set
  --model_checkpoint MODEL_CHECKPOINT
                        Model checkpoint to use for the prediction, if nothing is passed will download a pretrained model
  --threads THREADS     Number of threads to use during inference
Example
You can use several spectral library formats to compare the predictions from ElFragmentador with your data.
In this case we will use the .peptides.txt file that mokapot writes as its default output! This also requires having the .mzML files with the spectra in the same directory (if they are not there, it will try to find them in a couple of other directories).
This will go over the different NCEs provided and find the one that best matches the data (using the first couple hundred spectra). It will then use that NCE to predict all spectra in the file and compare them to the real ones. Finally, it shows some "plots" of the performance and writes a csv file with the calculated metrics.
Note that the --assure_notrain flag can be used to exclude from the similarity calculations all peptides that could have been part of the model's training set.
$ poetry run elfragmentador evaluate --input mokapot.peptides.txt.evaluation.log --nce 24,28,30,32,34,38,42 --out evaluation.csv --assure_notrain 1
Global seed set to 2020
2022-11-21 07:42:50.131 | INFO | elfragmentador.cli:greeting:72 - ElFragmentador version: 0.55.0a1
2022-11-21 07:42:50.131 | INFO | elfragmentador.cli:setup_model:39 - Loading model from https://github.com/jspaezp/elfragmentador-modelzoo/raw/9e6ee76cde441d2459ec52418ec6f874e69f9a7b/0.55.0a2/0.55.0a2_ef_21e_4h_64_120_nel5ndl6_500_lmlp_sim01_val_l%3D0.129946_epoch%3D020.ckpt
2022-11-21 07:42:50.159 | INFO | elfragmentador.model:__init__:126 - num_decoder_layers 6 num_encoder_layers 5 nhid 120 d_model 64 nhead 4 dropout 0.02combined embeds True combined encoders False
2022-11-21 07:42:50.159 | INFO | elfragmentador.model.nn_encoding:__init__:210 - Initializing AASequenceEmbedding
2022-11-21 07:42:50.164 | INFO | elfragmentador.model.nn_encoding:init_weights:243 - Initializing weights on AASequenceEmbedding
2022-11-21 07:42:50.167 | INFO | elfragmentador.model.transformer_layers:__init__:87 - Creating TransformerDecoder nhid=120, d_model=64 nhead=4 layers=6
2022-11-21 07:42:50.171 | INFO | elfragmentador.model.transformer_layers:__init__:109 - Creating embedding for spectra of length 174
2022-11-21 07:42:50.172 | INFO | elfragmentador.model.nn_encoding:__init__:210 - Initializing AASequenceEmbedding
2022-11-21 07:42:50.172 | INFO | elfragmentador.model.nn_encoding:init_weights:243 - Initializing weights on AASequenceEmbedding
2022-11-21 07:42:50.175 | INFO | elfragmentador.model.transformer_layers:__init__:87 - Creating TransformerDecoder nhid=120, d_model=64 nhead=4 layers=5
2022-11-21 07:42:50.180 | INFO | elfragmentador.model.transformer_layers:__init__:109 - Creating embedding for spectra of length 1
2022-11-21 07:42:50.804 | INFO | elfragmentador.model:evaluate_landmark_rt:769 - {'polynomial': [0.010075807176896115, -0.0008703060814287248], 'determination': 0.9961272999162974}
2022-11-21 07:42:50.820 | INFO | elfragmentador.model:evaluate_landmark_rt:775 -
Prediction vs real iRT of biognosys and procal peptides
┌────────────────────────────────────────────────────────────┐
│ │ ▗▖▐│ 1
│ │ ▗▖ ▗ ▝│
│ │ ▖▞ ▀ │
│ │ ▄▄▚▖ │
│ │ ▄▝▖ ▘ │
│ │ ▗▙▝▝ │
│ │ ▖▝▘▘ │
│ │ ▞▝ ▘ │
│ │ ▟▗▘ │
│ │ ▖ ▐▗ │
│ │ ▝▐ │
│ │ ▗▄▝▝▝ │
│▁▁▁▁▁▁▁▁▁▁▁▁▁▁▗▁▚▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│ 0
│ ▖ ▞ │ │
│ ▗ │ │
│ ▘ │ │
│▖▗ ▘ │ │
└────────────────────────────────────────────────────────────┘
-20 0 20 40 60 80 100
2022-11-21 07:42:50.821 | INFO | ms2ml.data.adapters:read_data:52 - Reading data from evaluation/results/CElegansGG/mokapot/mokapot.peptides.txt using <class 'ms2ml.data.adapters.mokapot.MokapotPSMAdapter'>
2022-11-21 07:42:55.007 | INFO | elfragmentador.data.predictor:adapter_out_hook_predict_factory:218 - Setting up the adapter to drop training spectra
2022-11-21 07:43:03.947 | INFO | elfragmentador.data.predictor:compare:119 - Skipped 1955/2456 spectra
2022-11-21 07:43:03.953 | INFO | ms2ml.data.adapters:read_data:52 - Reading data from evaluation/results/CElegansGG/mokapot/mokapot.peptides.txt using <class 'ms2ml.data.adapters.mokapot.MokapotPSMAdapter'>
2022-11-21 07:43:07.883 | INFO | elfragmentador.data.predictor:adapter_out_hook_predict_factory:218 - Setting up the adapter to drop training spectra
2022-11-21 07:43:16.347 | INFO | elfragmentador.data.predictor:compare:119 - Skipped 1955/2456 spectra
2022-11-21 07:43:16.349 | INFO | ms2ml.data.adapters:read_data:52 - Reading data from evaluation/results/CElegansGG/mokapot/mokapot.peptides.txt using <class 'ms2ml.data.adapters.mokapot.MokapotPSMAdapter'>
2022-11-21 07:43:20.260 | INFO | elfragmentador.data.predictor:adapter_out_hook_predict_factory:218 - Setting up the adapter to drop training spectra
2022-11-21 07:43:28.969 | INFO | elfragmentador.data.predictor:compare:119 - Skipped 1955/2456 spectra
2022-11-21 07:43:28.977 | INFO | ms2ml.data.adapters:read_data:52 - Reading data from evaluation/results/CElegansGG/mokapot/mokapot.peptides.txt using <class 'ms2ml.data.adapters.mokapot.MokapotPSMAdapter'>
2022-11-21 07:43:32.877 | INFO | elfragmentador.data.predictor:adapter_out_hook_predict_factory:218 - Setting up the adapter to drop training spectra
2022-11-21 07:43:41.633 | INFO | elfragmentador.data.predictor:compare:119 - Skipped 1955/2456 spectra
2022-11-21 07:43:41.647 | INFO | ms2ml.data.adapters:read_data:52 - Reading data from evaluation/results/CElegansGG/mokapot/mokapot.peptides.txt using <class 'ms2ml.data.adapters.mokapot.MokapotPSMAdapter'>
2022-11-21 07:43:45.604 | INFO | elfragmentador.data.predictor:adapter_out_hook_predict_factory:218 - Setting up the adapter to drop training spectra
2022-11-21 07:43:54.175 | INFO | elfragmentador.data.predictor:compare:119 - Skipped 1955/2456 spectra
2022-11-21 07:43:54.190 | INFO | ms2ml.data.adapters:read_data:52 - Reading data from evaluation/results/CElegansGG/mokapot/mokapot.peptides.txt using <class 'ms2ml.data.adapters.mokapot.MokapotPSMAdapter'>
2022-11-21 07:43:58.125 | INFO | elfragmentador.data.predictor:adapter_out_hook_predict_factory:218 - Setting up the adapter to drop training spectra
2022-11-21 07:44:06.524 | INFO | elfragmentador.data.predictor:compare:119 - Skipped 1955/2456 spectra
2022-11-21 07:44:06.526 | INFO | ms2ml.data.adapters:read_data:52 - Reading data from evaluation/results/CElegansGG/mokapot/mokapot.peptides.txt using <class 'ms2ml.data.adapters.mokapot.MokapotPSMAdapter'>
2022-11-21 07:44:10.412 | INFO | elfragmentador.data.predictor:adapter_out_hook_predict_factory:218 - Setting up the adapter to drop training spectra
2022-11-21 07:44:18.825 | INFO | elfragmentador.data.predictor:compare:119 - Skipped 1955/2456 spectra
2022-11-21 07:44:18.827 | INFO | elfragmentador.data.predictor:screen_nce:74 - Best NCE: 30.0, with median spectral angle: 0.4119728918060316
2022-11-21 07:44:18.827 | INFO | ms2ml.data.adapters:read_data:52 - Reading data from evaluation/results/CElegansGG/mokapot/mokapot.peptides.txt using <class 'ms2ml.data.adapters.mokapot.MokapotPSMAdapter'>
2022-11-21 07:44:22.865 | INFO | elfragmentador.data.predictor:adapter_out_hook_predict_factory:218 - Setting up the adapter to drop training spectra
100%|██████████| 3618/3618 [00:13<00:00, 273.69it/s]
2022-11-21 07:44:36.086 | INFO | elfragmentador.data.predictor:compare:119 - Skipped 2790/3618 spectra
Pred RT (y) vs RT (x)
┌────────────────────────────────────────────────────────────┐
│ ▗▛▛▘ ▝│
│ █▙ ▖│ 6,000
│ ▄ ▗▟▘▝ │
│ ▗ ▟▞▐▙▜█▛▘ │
│ ▗▖▐▄▟██▛▛▛▘▝▝ │
│ ▞ ▄▄▝▙█▞██▐ ▖ │ 4,000
│ ▖ ▗ ▐▟▄▟█▞▌██ ▘ │
│ ▘ ▗▄▌▛████▀▛▀ │
│ ▝ ▞▐▟█▛▟█▛▛▛▖ ▘ │ 2,000
│ ▗▖ ▝▙█▐███▛▛▘▝▝ │
│ ▖ ▖ ▄▖█▗██▛▜▀▘ │
│▁▁▗▁▖▁▁▖▁▁▗▙▖██████▀▌▝▘▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│ 0
│ ▗▖▖ ▟▄▞▟███▀▀▝▘▘ │
│ ▌▙▀▜▗█▟▛▛▀▖ ▗ ▝ │
│ ▐▙███▜▘▘ ▘ ▝ │
│▀▝▛▘ │ -2,000
│ ▗ │
└────────────────────────────────────────────────────────────┘
1,000 2,000 3,000 4,000 5,000
Histogram of the spectral angles
Median: 0.33
Q1: 0.22
Q3: 0.47
┌────────────────────────────────────────────────────────────┐
│ │ ▛▀▀▀▀▀▌ │ 80
│ │ ▄▄▄▌ ▀▀▀▌ │
│ │ ▄▄▄▌ ▀▀▀▀▀▜ │
│ │ ▌ ▐ │
│ │ ▌ ▐ │
│ │ ▌ ▐ │
│ │ ▌ ▝▀▀▜ │
│ │ ▛▀▀▘ ▐▄▄▄ │
│ │ ▌ ▝▀▀▜ │ 40
│ │ ▌ ▝▀▀▜ │
│ │ ▌ ▐ │
│ │ ▌ ▝▀▀▜ │
│ │ ▌ ▐ │
│ │ ▄▄▄▌ ▐ │
│ │ ▌ ▝▀▀▜ │
│ │▄▄▄▌ ▐▄▄▄ │
│▄▄▄▌▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▐▄▄▄▄▄▄│ 0
└────────────────────────────────────────────────────────────┘
0 0.2 0.4 0.6 0.8
Histogram of the spectral angles of only the fragment ions
Median: 0.85
Q1: 0.79
Q3: 0.90
┌────────────────────────────────────────────────────────────┐
│ ▐▀▀▜ │
│ ▗▄▄▟ ▐ │ 200
│ ▐ ▐ │
│ ▐ ▐ │
│ ▐ ▐ │
│ ▐ ▐ │
│ ▐ ▝▀▀▜ │
│ ▗▄▄▟ ▐ │
│ ▐ ▐ │
│ ▐ ▐ │ 100
│ ▐ ▐ │
│ ▐ ▐ │
│ ▐▀▀▀ ▐ │
│ ▐ ▐ │
│ ▐ ▐ │
│ ▄▄▄▄▄▟▀▀▀ ▐ │
│▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▛▀▀▙▄▄▄▄▄▌▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▐▄▄▄▄▄▄│ 0
└────────────────────────────────────────────────────────────┘
0.2 0.5 0.8 1
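The calculated metrics are also written to the csv file passed to --out (evaluation.csv in this run). A quick way to peek at it from the shell, assuming a plain comma-separated layout, is:
$ column -s, -t < evaluation.csv | head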
Predicting peptides directly from a fasta file
To get the help for this command, run ...
output
usage: elfragmentador predict [-h] [--fasta FASTA] [--enzyme ENZYME] [--nce NCE] [--charges CHARGES] [--missed_cleavages MISSED_CLEAVAGES] [--min_length MIN_LENGTH] [--min_mz MIN_MZ] [--max_mz MAX_MZ] [--out OUT] [--model_checkpoint MODEL_CHECKPOINT] [--threads THREADS]
options:
  -h, --help            show this help message and exit
  --fasta FASTA         Input fasta file
  --enzyme ENZYME       Enzyme to use to digest the fasta file
  --nce NCE             Collision energy to use for the prediction
  --charges CHARGES     Comma delimited series of charges to use
  --missed_cleavages MISSED_CLEAVAGES
                        Maximum number of missed clevages
  --min_length MIN_LENGTH
                        Minimum peptide length to consider
  --min_mz MIN_MZ       Minimum precursor mz to use
  --max_mz MAX_MZ       Maximum precursor mz to use
  --out OUT             Output .dlib file
  --model_checkpoint MODEL_CHECKPOINT
                        Model checkpoint to use for the prediction, if nothing is passed will download a pretrained model
  --threads THREADS     Number of threads to use during inference
Example
$ elfragmentador predict --fasta tests/data/fasta/uniprot-proteome_UP000464024_reviewed_yes.fasta --nce 32 --charges 2 --missed_cleavages 0 --min_length 7 --out foo.dlib
Global seed set to 2020
2022-11-26 21:39:39.072 | INFO | elfragmentador.cli:greeting:72 - ElFragmentador version: 0.55.0a1
2022-11-26 21:39:39.072 | INFO | elfragmentador.cli:setup_model:39 - Loading model from https://github.com/jspaezp/elfragmentador-modelzoo/raw/9e6ee76cde441d2459ec52418ec6f874e69f9a7b/0.55.0a2/0.55.0a2_ef_21e_4h_64_120_nel5ndl6_500_lmlp_sim01_val_l%3D0.129946_epoch%3D020.ckpt
2022-11-26 21:39:39.092 | INFO | elfragmentador.model:__init__:126 - num_decoder_layers 6 num_encoder_layers 5 nhid 120 d_model 64 nhead 4 dropout 0.02combined embeds True combined encoders False
2022-11-26 21:39:39.092 | INFO | elfragmentador.model.nn_encoding:__init__:210 - Initializing AASequenceEmbedding
2022-11-26 21:39:39.104 | INFO | elfragmentador.model.nn_encoding:init_weights:243 - Initializing weights on AASequenceEmbedding
2022-11-26 21:39:39.107 | INFO | elfragmentador.model.transformer_layers:__init__:87 - Creating TransformerDecoder nhid=120, d_model=64 nhead=4 layers=6
2022-11-26 21:39:39.110 | INFO | elfragmentador.model.transformer_layers:__init__:109 - Creating embedding for spectra of length 174
2022-11-26 21:39:39.111 | INFO | elfragmentador.model.nn_encoding:__init__:210 - Initializing AASequenceEmbedding
2022-11-26 21:39:39.111 | INFO | elfragmentador.model.nn_encoding:init_weights:243 - Initializing weights on AASequenceEmbedding
2022-11-26 21:39:39.113 | INFO | elfragmentador.model.transformer_layers:__init__:87 - Creating TransformerDecoder nhid=120, d_model=64 nhead=4 layers=5
2022-11-26 21:39:39.115 | INFO | elfragmentador.model.transformer_layers:__init__:109 - Creating embedding for spectra of length 1
2022-11-26 21:39:39.797 | INFO | elfragmentador.model:evaluate_landmark_rt:769 - {'polynomial': [0.010075807176896115, -0.0008703060814287248], 'determination': 0.9961272999162974}
2022-11-26 21:39:39.813 | INFO | elfragmentador.model:evaluate_landmark_rt:775 -
Prediction vs real iRT of biognosys and procal peptides
┌────────────────────────────────────────────────────────────┐
│ │ ▗▖▐│ 1
│ │ ▗▖ ▗ ▝│
│ │ ▖▞ ▀ │
│ │ ▄▄▚▖ │
│ │ ▄▝▖ ▘ │
│ │ ▗▙▝▝ │
│ │ ▖▝▘▘ │
│ │ ▞▝ ▘ │
│ │ ▟▗▘ │
│ │ ▖ ▐▗ │
│ │ ▝▐ │
│ │ ▗▄▝▝▝ │
│▁▁▁▁▁▁▁▁▁▁▁▁▁▁▗▁▚▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│ 0
│ ▖ ▞ │ │
│ ▗ │ │
│ ▘ │ │
│▖▗ ▘ │ │
└────────────────────────────────────────────────────────────┘
-20 0 20 40 60 80 100
2022-11-26 21:39:39.816 | INFO | ms2ml.data.adapters:read_data:52 - Reading data from tests/data/fasta/uniprot-proteome_UP000464024_reviewed_yes.fasta using <class 'ms2ml.data.adapters.fasta.FastaAdapter'>
2022-11-26 21:39:39.816 | INFO | ms2ml.data.parsing.fasta:parse_file:52 - Processing file tests/data/fasta/uniprot-proteome_UP000464024_reviewed_yes.fasta, with enzyme=trypsin, missed_cleavages=0 min_length=7 max_length=30
2022-11-26 21:39:39.823 | INFO | ms2ml.data.parsing.fasta:parse_file:82 - Done, 468 sequences
2022-11-26 21:39:39.823 | INFO | ms2ml.data.parsing.fasta:parse_file:84 - Removed 205 duplicates
2022-11-26 21:39:39.823 | INFO | ms2ml.data.adapters.fasta:parse:86 - Number of peptides: 468
2022-11-26 21:39:39.824 | INFO | elfragmentador.data.predictor:adapter_out_hook_predict_factory:228 - Setting up the adapter to keep training spectra
0%| | 0/468 [00:00<?, ?it/s]2022-11-26 21:39:39.831 | INFO | ms2ml.data.parsing.fasta:parse_file:52 - Processing file tests/data/fasta/uniprot-proteome_UP000464024_reviewed_yes.fasta, with enzyme=trypsin, missed_cleavages=0 min_length=7 max_length=30
98%|█████████▊| 457/468 [00:04<00:00, 114.90it/s]2022-11-26 21:39:44.393 | INFO | ms2ml.data.parsing.fasta:parse_file:82 - Done, 468 sequences
2022-11-26 21:39:44.393 | INFO | ms2ml.data.parsing.fasta:parse_file:84 - Removed 205 duplicates
2022-11-26 21:39:44.393 | INFO | ms2ml.data.adapters.fasta:parse:86 - Number of peptides: 468
100%|██████████| 468/468 [00:04<00:00, 102.56it/s]
2022-11-26 21:39:44.393 | INFO | ms2ml.data.parsing.encyclopedia:write_encyclopedia:182 - Finished writing EncyclopeDIA database to foo.dlib
2022-11-26 21:39:44.393 | INFO | ms2ml.data.parsing.encyclopedia:write_encyclopedia:183 - Wrote 468 spectra
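The generated foo.dlib is an EncyclopeDIA-style library, which is an SQLite file, so it can be inspected with any sqlite client. A minimal sketch, assuming the standard EncyclopeDIA schema where spectra are stored in an entries table:
$ sqlite3 foo.dlib 'SELECT COUNT(*) FROM entries;'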
Training the model
output
usage: elfragmentador train [-h] [--run_name RUN_NAME] [--wandb_project WANDB_PROJECT] [--terminator_patience TERMINATOR_PATIENCE] [--from_checkpoint FROM_CHECKPOINT] [--num_queries NUM_QUERIES] [--num_decoder_layers NUM_DECODER_LAYERS] [--num_encoder_layers NUM_ENCODER_LAYERS] [--nhid NHID] [--d_model D_MODEL] [--nhead NHEAD] [--dropout DROPOUT] [--combine_embeds | --no-combine_embeds] [--combine_encoders | --no-combine_encoders] [--final_decoder FINAL_DECODER] [--lr LR] [--scheduler SCHEDULER] [--lr_ratio LR_RATIO] [--loss_ratio LOSS_RATIO] [--batch_size BATCH_SIZE] [--data_dir DATA_DIR] [--logger [LOGGER]] [--enable_checkpointing [ENABLE_CHECKPOINTING]] [--default_root_dir DEFAULT_ROOT_DIR] [--gradient_clip_val GRADIENT_CLIP_VAL] [--gradient_clip_algorithm GRADIENT_CLIP_ALGORITHM] [--num_nodes NUM_NODES] [--num_processes NUM_PROCESSES] [--devices DEVICES] [--gpus GPUS] [--auto_select_gpus [AUTO_SELECT_GPUS]] [--tpu_cores TPU_CORES] [--ipus IPUS] [--enable_progress_bar [ENABLE_PROGRESS_BAR]] [--overfit_batches OVERFIT_BATCHES] [--track_grad_norm TRACK_GRAD_NORM] [--check_val_every_n_epoch CHECK_VAL_EVERY_N_EPOCH] [--fast_dev_run [FAST_DEV_RUN]] [--accumulate_grad_batches ACCUMULATE_GRAD_BATCHES] [--max_epochs MAX_EPOCHS] [--min_epochs MIN_EPOCHS] [--max_steps MAX_STEPS] [--min_steps MIN_STEPS] [--max_time MAX_TIME] [--limit_train_batches LIMIT_TRAIN_BATCHES] [--limit_val_batches LIMIT_VAL_BATCHES] [--limit_test_batches LIMIT_TEST_BATCHES] [--limit_predict_batches LIMIT_PREDICT_BATCHES] [--val_check_interval VAL_CHECK_INTERVAL] [--log_every_n_steps LOG_EVERY_N_STEPS] [--accelerator ACCELERATOR] [--strategy STRATEGY] [--sync_batchnorm [SYNC_BATCHNORM]] [--precision PRECISION] [--enable_model_summary [ENABLE_MODEL_SUMMARY]] [--num_sanity_val_steps NUM_SANITY_VAL_STEPS] [--resume_from_checkpoint RESUME_FROM_CHECKPOINT] [--profiler PROFILER] [--benchmark [BENCHMARK]] [--reload_dataloaders_every_n_epochs RELOAD_DATALOADERS_EVERY_N_EPOCHS] [--auto_lr_find [AUTO_LR_FIND]] [--replace_sampler_ddp [REPLACE_SAMPLER_DDP]] [--detect_anomaly [DETECT_ANOMALY]] [--auto_scale_batch_size [AUTO_SCALE_BATCH_SIZE]] [--plugins PLUGINS] [--amp_backend AMP_BACKEND] [--amp_level AMP_LEVEL] [--move_metrics_to_cpu [MOVE_METRICS_TO_CPU]] [--multiple_trainloader_mode MULTIPLE_TRAINLOADER_MODE] [--inference_mode [INFERENCE_MODE]]
options:
  -h, --help            show this help message and exit
Program Parameters: Program level parameters, these should not change the outcome of the run
  --run_name RUN_NAME   Name to be given to the run (logging)
  --wandb_project WANDB_PROJECT
                        Wandb project to log to, check out wandb... please
Model Parameters: Parameters that modify the model or its training (learn rate, scheduler, layers, dimension ...)
  --num_queries NUM_QUERIES
                        Expected encoding length of the spectra
  --num_decoder_layers NUM_DECODER_LAYERS
                        Number of sub-encoder-layers in the encoder
  --num_encoder_layers NUM_ENCODER_LAYERS
                        Number of sub-encoder-layers in the decoder
  --nhid NHID           Dimension of the feedforward networks
  --d_model D_MODEL     Number of input features to the transformer encoder
  --nhead NHEAD         Number of attention heads
  --dropout DROPOUT
  --combine_embeds, --no-combine_embeds
                        Whether the embeddings for aminoacid and modifications should be shared between the irt and fragment sections
  --combine_encoders, --no-combine_encoders
                        Whether the encoders for aminoacid and modifications should be shared between the irt and fragment sections
  --final_decoder FINAL_DECODER
                        What kind of final layer should the docer have to output a single number, options are 'mlp' and 'linear'
  --lr LR
  --scheduler SCHEDULER
                        Scheduler to use during training, either of ['plateau', 'cosine', 'onecycle']
  --lr_ratio LR_RATIO   For cosine annealing: Ratio of the initial learning rate to use with cosine annealing, for instance a lr of 1 and a ratio of 10 would have a minimum learning rate of 0.1. For onecycle: Ratio of the initial lr and the maximum one, for instance if lr is 0.1 and ratio is 10, the max learn rate would be 1.0
  --loss_ratio LOSS_RATIO
                        Ratio between the retention time and the spectrum loss (higher values mean more weight to the spectra loss with respect to the retention time loss)
Data Parameters: Parameters for the loading of data
  --batch_size BATCH_SIZE
  --data_dir DATA_DIR
Trainer Parameters: Parameters that modify the model or its training
  --terminator_patience TERMINATOR_PATIENCE
                        Patience for early termination
  --from_checkpoint FROM_CHECKPOINT
                        The path of a checkpoint to copy weights from before training
pl.Trainer:
  --logger [LOGGER]     Logger (or iterable collection of loggers) for experiment tracking. A True value uses the default TensorBoardLogger if it is installed, otherwise CSVLogger. False will disable logging. If multiple loggers are provided, local files (checkpoints, profiler traces, etc.) are saved in the log_dir of the first logger. Default: True.
  --enable_checkpointing [ENABLE_CHECKPOINTING]
                        If True, enable checkpointing. It will configure a default ModelCheckpoint callback if there is no user-defined ModelCheckpoint in pytorch_lightning.trainer.trainer.Trainer.callbacks. Default: True.
  --default_root_dir DEFAULT_ROOT_DIR
                        Default path for logs and weights when no logger/ckpt_callback passed. Default: os.getcwd(). Can be remote file paths such as s3://mybucket/path or 'hdfs://path/'.
  --gradient_clip_val GRADIENT_CLIP_VAL
                        The value at which to clip gradients. Passing gradient_clip_val=None disables gradient clipping. If using Automatic Mixed Precision (AMP), the gradients will be unscaled before. Default: None.
  --gradient_clip_algorithm GRADIENT_CLIP_ALGORITHM
                        The gradient clipping algorithm to use. Pass gradient_clip_algorithm="value" to clip by value, and gradient_clip_algorithm="norm" to clip by norm. By default it will be set to "norm".
  --num_nodes NUM_NODES
                        Number of GPU nodes for distributed training. Default: 1.
  --num_processes NUM_PROCESSES
                        Number of processes for distributed training with accelerator="cpu". Default: 1. Deprecated: num_processes has been deprecated in v1.7 and will be removed in v2.0. Please use accelerator='cpu' and devices=x instead.
  --devices DEVICES     Will be mapped to either gpus, tpu_cores, num_processes or ipus, based on the accelerator type.
  --gpus GPUS           Number of GPUs to train on (int) or which GPUs to train on (list or str) applied per node. Default: None. Deprecated: gpus has been deprecated in v1.7 and will be removed in v2.0. Please use accelerator='gpu' and devices=x instead.
  --auto_select_gpus [AUTO_SELECT_GPUS]
                        If enabled and gpus or devices is an integer, pick available gpus automatically. This is especially useful when GPUs are configured to be in "exclusive mode", such that only one process at a time can access them. Default: False. Deprecated: auto_select_gpus has been deprecated in v1.9.0 and will be removed in v2.0.0. Please use the function lightning_fabric.accelerators.cuda.find_usable_cuda_devices instead.
  --tpu_cores TPU_CORES
                        How many TPU cores to train on (1 or 8) / Single TPU to train on (1). Default: None. Deprecated: tpu_cores has been deprecated in v1.7 and will be removed in v2.0. Please use accelerator='tpu' and devices=x instead.
  --ipus IPUS           How many IPUs to train on. Default: None. Deprecated: ipus has been deprecated in v1.7 and will be removed in v2.0. Please use accelerator='ipu' and devices=x instead.
  --enable_progress_bar [ENABLE_PROGRESS_BAR]
                        Whether to enable the progress bar by default. Default: True.
  --overfit_batches OVERFIT_BATCHES
                        Overfit a fraction of training/validation data (float) or a set number of batches (int). Default: 0.0.
  --track_grad_norm TRACK_GRAD_NORM
                        -1 no tracking. Otherwise tracks that p-norm. May be set to 'inf' infinity-norm. If using Automatic Mixed Precision (AMP), the gradients will be unscaled before logging them. Default: -1.
  --check_val_every_n_epoch CHECK_VAL_EVERY_N_EPOCH
                        Perform a validation loop after every N training epochs. If None, validation will be done solely based on the number of training batches, requiring val_check_interval to be an integer value. Default: 1.
  --fast_dev_run [FAST_DEV_RUN]
                        Runs n if set to n (int) else 1 if set to True batch(es) of train, val and test to find any bugs (ie: a sort of unit test). Default: False.
  --accumulate_grad_batches ACCUMULATE_GRAD_BATCHES
                        Accumulates grads every k batches or as set up in the dict. Default: None.
  --max_epochs MAX_EPOCHS
                        Stop training once this number of epochs is reached. Disabled by default (None). If both max_epochs and max_steps are not specified, defaults to max_epochs = 1000. To enable infinite training, set max_epochs = -1.
  --min_epochs MIN_EPOCHS
                        Force training for at least these many epochs. Disabled by default (None).
  --max_steps MAX_STEPS
                        Stop training after this number of steps. Disabled by default (-1). If max_steps = -1 and max_epochs = None, will default to max_epochs = 1000. To enable infinite training, set max_epochs to -1.
  --min_steps MIN_STEPS
                        Force training for at least these number of steps. Disabled by default (None).
  --max_time MAX_TIME   Stop training after this amount of time has passed. Disabled by default (None). The time duration can be specified in the format DD:HH:MM:SS (days, hours, minutes seconds), as a datetime.timedelta, or a dictionary with keys that will be passed to datetime.timedelta.
  --limit_train_batches LIMIT_TRAIN_BATCHES
                        How much of training dataset to check (float = fraction, int = num_batches). Default: 1.0.
  --limit_val_batches LIMIT_VAL_BATCHES
                        How much of validation dataset to check (float = fraction, int = num_batches). Default: 1.0.
  --limit_test_batches LIMIT_TEST_BATCHES
                        How much of test dataset to check (float = fraction, int = num_batches). Default: 1.0.
  --limit_predict_batches LIMIT_PREDICT_BATCHES
                        How much of prediction dataset to check (float = fraction, int = num_batches). Default: 1.0.
  --val_check_interval VAL_CHECK_INTERVAL
                        How often to check the validation set. Pass a float in the range [0.0, 1.0] to check after a fraction of the training epoch. Pass an int to check after a fixed number of training batches. An int value can only be higher than the number of training batches when check_val_every_n_epoch=None, which validates after every N training batches across epochs or during iteration-based training. Default: 1.0.
  --log_every_n_steps LOG_EVERY_N_STEPS
                        How often to log within steps. Default: 50.
  --accelerator ACCELERATOR
                        Supports passing different accelerator types ("cpu", "gpu", "tpu", "ipu", "hpu", "mps", "auto") as well as custom accelerator instances.
  --strategy STRATEGY   Supports different training strategies with aliases as well custom strategies. Default: None.
  --sync_batchnorm [SYNC_BATCHNORM]
                        Synchronize batch norm layers between process groups/whole world. Default: False.
  --precision PRECISION
                        Double precision (64), full precision (32), half precision (16) or bfloat16 precision (bf16). Can be used on CPU, GPU, TPUs, HPUs or IPUs. Default: 32.
  --enable_model_summary [ENABLE_MODEL_SUMMARY]
                        Whether to enable model summarization by default. Default: True.
  --num_sanity_val_steps NUM_SANITY_VAL_STEPS
                        Sanity check runs n validation batches before starting the training routine. Set it to -1 to run all batches in all validation dataloaders. Default: 2.
  --resume_from_checkpoint RESUME_FROM_CHECKPOINT
                        Path/URL of the checkpoint from which training is resumed. If there is no checkpoint file at the path, an exception is raised. If resuming from mid-epoch checkpoint, training will start from the beginning of the next epoch. Deprecated: resume_from_checkpoint is deprecated in v1.5 and will be removed in v2.0. Please pass the path to Trainer.fit(..., ckpt_path=...) instead.
  --profiler PROFILER   To profile individual steps during training and assist in identifying bottlenecks. Default: None.
  --benchmark [BENCHMARK]
                        The value (True or False) to set torch.backends.cudnn.benchmark to. The value for torch.backends.cudnn.benchmark set in the current session will be used (False if not manually set). If pytorch_lightning.trainer.Trainer.deterministic is set to True, this will default to False. Override to manually set a different value. Default: None.
  --reload_dataloaders_every_n_epochs RELOAD_DATALOADERS_EVERY_N_EPOCHS
                        Set to a non-negative integer to reload dataloaders every n epochs. Default: 0.
  --auto_lr_find [AUTO_LR_FIND]
                        If set to True, will make trainer.tune() run a learning rate finder, trying to optimize initial learning for faster convergence. trainer.tune() method will set the suggested learning rate in self.lr or self.learning_rate in the LightningModule. To use a different key set a string instead of True with the key name. Default: False.
  --replace_sampler_ddp [REPLACE_SAMPLER_DDP]
                        Explicitly enables or disables sampler replacement. If not specified this will be toggled automatically when DDP is used. By default it will add shuffle=True for train sampler and shuffle=False for val/test sampler. If you want to customize it, you can set replace_sampler_ddp=False and add your own distributed sampler.
  --detect_anomaly [DETECT_ANOMALY]
                        Enable anomaly detection for the autograd engine. Default: False.
  --auto_scale_batch_size [AUTO_SCALE_BATCH_SIZE]
                        If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory. The result will be stored in self.batch_size in the LightningModule or LightningDataModule depending on your setup. Additionally, can be set to either power that estimates the batch size through a power search or binsearch that estimates the batch size through a binary search. Default: False.
  --plugins PLUGINS     Plugins allow modification of core behavior like ddp and amp, and enable custom lightning plugins. Default: None.
  --amp_backend AMP_BACKEND
                        The mixed precision backend to use ("native" or "apex"). Default: 'native'. Deprecated: Setting amp_backend inside the Trainer is deprecated in v1.8.0 and will be removed in v2.0.0. This argument was only relevant for apex which is being removed.
  --amp_level AMP_LEVEL
                        The optimization level to use (O1, O2, etc...). By default it will be set to "O2" if amp_backend is set to "apex". Deprecated: Setting amp_level inside the Trainer is deprecated in v1.8.0 and will be removed in v2.0.0.
  --move_metrics_to_cpu [MOVE_METRICS_TO_CPU]
                        Whether to force internal logged metrics to be moved to cpu. This can save some gpu memory, but can make training slower. Use with attention. Default: False.
  --multiple_trainloader_mode MULTIPLE_TRAINLOADER_MODE
                        How to loop over the datasets when there are multiple train loaders. In 'max_size_cycle' mode, the trainer ends one epoch when the largest dataset is traversed, and smaller datasets reload when running out of their data. In 'min_size' mode, all the datasets reload when reaching the minimum length of datasets. Default: "max_size_cycle".
  --inference_mode [INFERENCE_MODE]
                        Whether to use torch.inference_mode or torch.no_grad during evaluation (validate/test/predict).
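As a rough sketch only (the data directory and hyperparameter values below are placeholders for illustration, not a recommended configuration), a training run combining the program, model and pl.Trainer flags above could be launched like:
$ elfragmentador train --run_name my_test_run --data_dir ./prepared_data --batch_size 64 --max_epochs 10 --accelerator gpu --devices 1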