Usage of the different command line interface options ...
output
usage: elfragmentador append_pin [-h] [--pin PIN] [--nce NCE] [--rawfile_locations RAWFILE_LOCATIONS] [--out OUT] [--model_checkpoint MODEL_CHECKPOINT] [--threads THREADS]
options:
  -h, --help            show this help message and exit
  --pin PIN             Input percolator file
  --nce NCE             Collision energy to use for the prediction
  --rawfile_locations RAWFILE_LOCATIONS
                        Locations to look for the raw files
  --out OUT             Input percolator file
  --model_checkpoint MODEL_CHECKPOINT
                        Model checkpoint to use for the prediction, if nothing is passed will download a pretrained model
  --threads THREADS     Number of threads to use during inference
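For reference, a hypothetical invocation using the options listed above could look like the following (mokapot.pin, ./raw_files and rescored.pin are placeholder names for illustration, not files that ship with the package):
$ elfragmentador append_pin --pin mokapot.pin --nce 32 --rawfile_locations ./raw_files --out rescored.pin --threads 4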
Evaluating predictions with your own data!
output
usage: elfragmentador evaluate [-h] [--input INPUT] [--nce NCE] [--out OUT] [--assure_notrain ASSURE_NOTRAIN] [--model_checkpoint MODEL_CHECKPOINT] [--threads THREADS]
options:
  -h, --help            show this help message and exit
  --input INPUT         Path to a file to use as a reference for the evaluation (.sptxt generally)
  --nce NCE             Comma delimited series of collision energies to use
  --out OUT             csv file to output results to
  --assure_notrain ASSURE_NOTRAIN
                        Whether to remove all sequences that could be assigned to the training set
  --model_checkpoint MODEL_CHECKPOINT
                        Model checkpoint to use for the prediction, if nothing is passed will download a pretrained model
  --threads THREADS     Number of threads to use during inference
Example
You can use several spectral library formats to compare the predictions from ElFragmentador with your data.
In this case we will use the .peptides.txt file that mokapot writes as its default output! This also requires having the .mzML files with the spectra in the same directory (if they are not there, it will try to find them in a couple of other directories).
This will go over the different NCEs provided and find the one that best matches the data (using the first couple hundred spectra). It will then use that NCE to predict all spectra in the file and compare them to the real ones. Finally, it shows some "plots" of the performance and writes a csv file with the calculated metrics.
Note that the --assure_notrain flag can be used to exclude from the similarity calculations all peptides that could have been part of the model's training set.
$ poetry run elfragmentador evaluate --input mokapot.peptides.txt.evaluation.log --nce 24,28,30,32,34,38,42 --out evaluation.csv --assure_notrain 1
Global seed set to 2020
2022-11-21 07:42:50.131 | INFO | elfragmentador.cli:greeting:72 - ElFragmentador version: 0.55.0a1
2022-11-21 07:42:50.131 | INFO | elfragmentador.cli:setup_model:39 - Loading model from https://github.com/jspaezp/elfragmentador-modelzoo/raw/9e6ee76cde441d2459ec52418ec6f874e69f9a7b/0.55.0a2/0.55.0a2_ef_21e_4h_64_120_nel5ndl6_500_lmlp_sim01_val_l%3D0.129946_epoch%3D020.ckpt
2022-11-21 07:42:50.159 | INFO | elfragmentador.model:__init__:126 - num_decoder_layers 6 num_encoder_layers 5 nhid 120 d_model 64 nhead 4 dropout 0.02combined embeds True combined encoders False
2022-11-21 07:42:50.159 | INFO | elfragmentador.model.nn_encoding:__init__:210 - Initializing AASequenceEmbedding
2022-11-21 07:42:50.164 | INFO | elfragmentador.model.nn_encoding:init_weights:243 - Initializing weights on AASequenceEmbedding
2022-11-21 07:42:50.167 | INFO | elfragmentador.model.transformer_layers:__init__:87 - Creating TransformerDecoder nhid=120, d_model=64 nhead=4 layers=6
2022-11-21 07:42:50.171 | INFO | elfragmentador.model.transformer_layers:__init__:109 - Creating embedding for spectra of length 174
2022-11-21 07:42:50.172 | INFO | elfragmentador.model.nn_encoding:__init__:210 - Initializing AASequenceEmbedding
2022-11-21 07:42:50.172 | INFO | elfragmentador.model.nn_encoding:init_weights:243 - Initializing weights on AASequenceEmbedding
2022-11-21 07:42:50.175 | INFO | elfragmentador.model.transformer_layers:__init__:87 - Creating TransformerDecoder nhid=120, d_model=64 nhead=4 layers=5
2022-11-21 07:42:50.180 | INFO | elfragmentador.model.transformer_layers:__init__:109 - Creating embedding for spectra of length 1
2022-11-21 07:42:50.804 | INFO | elfragmentador.model:evaluate_landmark_rt:769 - {'polynomial': [0.010075807176896115, -0.0008703060814287248], 'determination': 0.9961272999162974}
2022-11-21 07:42:50.820 | INFO | elfragmentador.model:evaluate_landmark_rt:775 -
Prediction vs real iRT of biognosys and procal peptides
┌────────────────────────────────────────────────────────────┐
│ │ ▗▖▐│ 1
│ │ ▗▖ ▗ ▝│
│ │ ▖▞ ▀ │
│ │ ▄▄▚▖ │
│ │ ▄▝▖ ▘ │
│ │ ▗▙▝▝ │
│ │ ▖▝▘▘ │
│ │ ▞▝ ▘ │
│ │ ▟▗▘ │
│ │ ▖ ▐▗ │
│ │ ▝▐ │
│ │ ▗▄▝▝▝ │
│▁▁▁▁▁▁▁▁▁▁▁▁▁▁▗▁▚▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│ 0
│ ▖ ▞ │ │
│ ▗ │ │
│ ▘ │ │
│▖▗ ▘ │ │
└────────────────────────────────────────────────────────────┘
-20 0 20 40 60 80 100
2022-11-21 07:42:50.821 | INFO | ms2ml.data.adapters:read_data:52 - Reading data from evaluation/results/CElegansGG/mokapot/mokapot.peptides.txt using <class 'ms2ml.data.adapters.mokapot.MokapotPSMAdapter'>
2022-11-21 07:42:55.007 | INFO | elfragmentador.data.predictor:adapter_out_hook_predict_factory:218 - Setting up the adapter to drop training spectra
2022-11-21 07:43:03.947 | INFO | elfragmentador.data.predictor:compare:119 - Skipped 1955/2456 spectra
2022-11-21 07:43:03.953 | INFO | ms2ml.data.adapters:read_data:52 - Reading data from evaluation/results/CElegansGG/mokapot/mokapot.peptides.txt using <class 'ms2ml.data.adapters.mokapot.MokapotPSMAdapter'>
2022-11-21 07:43:07.883 | INFO | elfragmentador.data.predictor:adapter_out_hook_predict_factory:218 - Setting up the adapter to drop training spectra
2022-11-21 07:43:16.347 | INFO | elfragmentador.data.predictor:compare:119 - Skipped 1955/2456 spectra
2022-11-21 07:43:16.349 | INFO | ms2ml.data.adapters:read_data:52 - Reading data from evaluation/results/CElegansGG/mokapot/mokapot.peptides.txt using <class 'ms2ml.data.adapters.mokapot.MokapotPSMAdapter'>
2022-11-21 07:43:20.260 | INFO | elfragmentador.data.predictor:adapter_out_hook_predict_factory:218 - Setting up the adapter to drop training spectra
2022-11-21 07:43:28.969 | INFO | elfragmentador.data.predictor:compare:119 - Skipped 1955/2456 spectra
2022-11-21 07:43:28.977 | INFO | ms2ml.data.adapters:read_data:52 - Reading data from evaluation/results/CElegansGG/mokapot/mokapot.peptides.txt using <class 'ms2ml.data.adapters.mokapot.MokapotPSMAdapter'>
2022-11-21 07:43:32.877 | INFO | elfragmentador.data.predictor:adapter_out_hook_predict_factory:218 - Setting up the adapter to drop training spectra
2022-11-21 07:43:41.633 | INFO | elfragmentador.data.predictor:compare:119 - Skipped 1955/2456 spectra
2022-11-21 07:43:41.647 | INFO | ms2ml.data.adapters:read_data:52 - Reading data from evaluation/results/CElegansGG/mokapot/mokapot.peptides.txt using <class 'ms2ml.data.adapters.mokapot.MokapotPSMAdapter'>
2022-11-21 07:43:45.604 | INFO | elfragmentador.data.predictor:adapter_out_hook_predict_factory:218 - Setting up the adapter to drop training spectra
2022-11-21 07:43:54.175 | INFO | elfragmentador.data.predictor:compare:119 - Skipped 1955/2456 spectra
2022-11-21 07:43:54.190 | INFO | ms2ml.data.adapters:read_data:52 - Reading data from evaluation/results/CElegansGG/mokapot/mokapot.peptides.txt using <class 'ms2ml.data.adapters.mokapot.MokapotPSMAdapter'>
2022-11-21 07:43:58.125 | INFO | elfragmentador.data.predictor:adapter_out_hook_predict_factory:218 - Setting up the adapter to drop training spectra
2022-11-21 07:44:06.524 | INFO | elfragmentador.data.predictor:compare:119 - Skipped 1955/2456 spectra
2022-11-21 07:44:06.526 | INFO | ms2ml.data.adapters:read_data:52 - Reading data from evaluation/results/CElegansGG/mokapot/mokapot.peptides.txt using <class 'ms2ml.data.adapters.mokapot.MokapotPSMAdapter'>
2022-11-21 07:44:10.412 | INFO | elfragmentador.data.predictor:adapter_out_hook_predict_factory:218 - Setting up the adapter to drop training spectra
2022-11-21 07:44:18.825 | INFO | elfragmentador.data.predictor:compare:119 - Skipped 1955/2456 spectra
2022-11-21 07:44:18.827 | INFO | elfragmentador.data.predictor:screen_nce:74 - Best NCE: 30.0, with median spectral angle: 0.4119728918060316
2022-11-21 07:44:18.827 | INFO | ms2ml.data.adapters:read_data:52 - Reading data from evaluation/results/CElegansGG/mokapot/mokapot.peptides.txt using <class 'ms2ml.data.adapters.mokapot.MokapotPSMAdapter'>
2022-11-21 07:44:22.865 | INFO | elfragmentador.data.predictor:adapter_out_hook_predict_factory:218 - Setting up the adapter to drop training spectra
100%|██████████| 3618/3618 [00:13<00:00, 273.69it/s]
2022-11-21 07:44:36.086 | INFO | elfragmentador.data.predictor:compare:119 - Skipped 2790/3618 spectra
Pred RT (y) vs RT (x)
┌────────────────────────────────────────────────────────────┐
│ ▗▛▛▘ ▝│
│ █▙ ▖│ 6,000
│ ▄ ▗▟▘▝ │
│ ▗ ▟▞▐▙▜█▛▘ │
│ ▗▖▐▄▟██▛▛▛▘▝▝ │
│ ▞ ▄▄▝▙█▞██▐ ▖ │ 4,000
│ ▖ ▗ ▐▟▄▟█▞▌██ ▘ │
│ ▘ ▗▄▌▛████▀▛▀ │
│ ▝ ▞▐▟█▛▟█▛▛▛▖ ▘ │ 2,000
│ ▗▖ ▝▙█▐███▛▛▘▝▝ │
│ ▖ ▖ ▄▖█▗██▛▜▀▘ │
│▁▁▗▁▖▁▁▖▁▁▗▙▖██████▀▌▝▘▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│ 0
│ ▗▖▖ ▟▄▞▟███▀▀▝▘▘ │
│ ▌▙▀▜▗█▟▛▛▀▖ ▗ ▝ │
│ ▐▙███▜▘▘ ▘ ▝ │
│▀▝▛▘ │ -2,000
│ ▗ │
└────────────────────────────────────────────────────────────┘
1,000 2,000 3,000 4,000 5,000
Histogram of the spectral angles
Median: 0.33
Q1: 0.22
Q3: 0.47
┌────────────────────────────────────────────────────────────┐
│ │ ▛▀▀▀▀▀▌ │ 80
│ │ ▄▄▄▌ ▀▀▀▌ │
│ │ ▄▄▄▌ ▀▀▀▀▀▜ │
│ │ ▌ ▐ │
│ │ ▌ ▐ │
│ │ ▌ ▐ │
│ │ ▌ ▝▀▀▜ │
│ │ ▛▀▀▘ ▐▄▄▄ │
│ │ ▌ ▝▀▀▜ │ 40
│ │ ▌ ▝▀▀▜ │
│ │ ▌ ▐ │
│ │ ▌ ▝▀▀▜ │
│ │ ▌ ▐ │
│ │ ▄▄▄▌ ▐ │
│ │ ▌ ▝▀▀▜ │
│ │▄▄▄▌ ▐▄▄▄ │
│▄▄▄▌▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▐▄▄▄▄▄▄│ 0
└────────────────────────────────────────────────────────────┘
0 0.2 0.4 0.6 0.8
Histogram of the spectral angles of only the fragment ions
Median: 0.85
Q1: 0.79
Q3: 0.90
┌────────────────────────────────────────────────────────────┐
│ ▐▀▀▜ │
│ ▗▄▄▟ ▐ │ 200
│ ▐ ▐ │
│ ▐ ▐ │
│ ▐ ▐ │
│ ▐ ▐ │
│ ▐ ▝▀▀▜ │
│ ▗▄▄▟ ▐ │
│ ▐ ▐ │
│ ▐ ▐ │ 100
│ ▐ ▐ │
│ ▐ ▐ │
│ ▐▀▀▀ ▐ │
│ ▐ ▐ │
│ ▐ ▐ │
│ ▄▄▄▄▄▟▀▀▀ ▐ │
│▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▛▀▀▙▄▄▄▄▄▌▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▐▄▄▄▄▄▄│ 0
└────────────────────────────────────────────────────────────┘
0.2 0.5 0.8 1
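The calculated metrics are also written to the csv file passed to --out (evaluation.csv in this run). A quick way to peek at it from the shell, assuming a plain comma-separated layout, is:
$ column -s, -t < evaluation.csv | head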
Predicting peptides directly from a fasta file
To get the help for this command, run ...
output
usage: elfragmentador predict [-h] [--fasta FASTA] [--enzyme ENZYME] [--nce NCE] [--charges CHARGES] [--missed_cleavages MISSED_CLEAVAGES] [--min_length MIN_LENGTH] [--min_mz MIN_MZ] [--max_mz MAX_MZ] [--out OUT] [--model_checkpoint MODEL_CHECKPOINT] [--threads THREADS]
options:
  -h, --help            show this help message and exit
  --fasta FASTA         Input fasta file
  --enzyme ENZYME       Enzyme to use to digest the fasta file
  --nce NCE             Collision energy to use for the prediction
  --charges CHARGES     Comma delimited series of charges to use
  --missed_cleavages MISSED_CLEAVAGES
                        Maximum number of missed clevages
  --min_length MIN_LENGTH
                        Minimum peptide length to consider
  --min_mz MIN_MZ       Minimum precursor mz to use
  --max_mz MAX_MZ       Maximum precursor mz to use
  --out OUT             Output .dlib file
  --model_checkpoint MODEL_CHECKPOINT
                        Model checkpoint to use for the prediction, if nothing is passed will download a pretrained model
  --threads THREADS     Number of threads to use during inference
Example
$ elfragmentador predict --fasta tests/data/fasta/uniprot-proteome_UP000464024_reviewed_yes.fasta --nce 32 --charges 2 --missed_cleavages 0 --min_length 7 --out foo.dlib
Global seed set to 2020
2022-11-26 21:39:39.072 | INFO | elfragmentador.cli:greeting:72 - ElFragmentador version: 0.55.0a1
2022-11-26 21:39:39.072 | INFO | elfragmentador.cli:setup_model:39 - Loading model from https://github.com/jspaezp/elfragmentador-modelzoo/raw/9e6ee76cde441d2459ec52418ec6f874e69f9a7b/0.55.0a2/0.55.0a2_ef_21e_4h_64_120_nel5ndl6_500_lmlp_sim01_val_l%3D0.129946_epoch%3D020.ckpt
2022-11-26 21:39:39.092 | INFO | elfragmentador.model:__init__:126 - num_decoder_layers 6 num_encoder_layers 5 nhid 120 d_model 64 nhead 4 dropout 0.02combined embeds True combined encoders False
2022-11-26 21:39:39.092 | INFO | elfragmentador.model.nn_encoding:__init__:210 - Initializing AASequenceEmbedding
2022-11-26 21:39:39.104 | INFO | elfragmentador.model.nn_encoding:init_weights:243 - Initializing weights on AASequenceEmbedding
2022-11-26 21:39:39.107 | INFO | elfragmentador.model.transformer_layers:__init__:87 - Creating TransformerDecoder nhid=120, d_model=64 nhead=4 layers=6
2022-11-26 21:39:39.110 | INFO | elfragmentador.model.transformer_layers:__init__:109 - Creating embedding for spectra of length 174
2022-11-26 21:39:39.111 | INFO | elfragmentador.model.nn_encoding:__init__:210 - Initializing AASequenceEmbedding
2022-11-26 21:39:39.111 | INFO | elfragmentador.model.nn_encoding:init_weights:243 - Initializing weights on AASequenceEmbedding
2022-11-26 21:39:39.113 | INFO | elfragmentador.model.transformer_layers:__init__:87 - Creating TransformerDecoder nhid=120, d_model=64 nhead=4 layers=5
2022-11-26 21:39:39.115 | INFO | elfragmentador.model.transformer_layers:__init__:109 - Creating embedding for spectra of length 1
2022-11-26 21:39:39.797 | INFO | elfragmentador.model:evaluate_landmark_rt:769 - {'polynomial': [0.010075807176896115, -0.0008703060814287248], 'determination': 0.9961272999162974}
2022-11-26 21:39:39.813 | INFO | elfragmentador.model:evaluate_landmark_rt:775 -
Prediction vs real iRT of biognosys and procal peptides
┌────────────────────────────────────────────────────────────┐
│ │ ▗▖▐│ 1
│ │ ▗▖ ▗ ▝│
│ │ ▖▞ ▀ │
│ │ ▄▄▚▖ │
│ │ ▄▝▖ ▘ │
│ │ ▗▙▝▝ │
│ │ ▖▝▘▘ │
│ │ ▞▝ ▘ │
│ │ ▟▗▘ │
│ │ ▖ ▐▗ │
│ │ ▝▐ │
│ │ ▗▄▝▝▝ │
│▁▁▁▁▁▁▁▁▁▁▁▁▁▁▗▁▚▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁│ 0
│ ▖ ▞ │ │
│ ▗ │ │
│ ▘ │ │
│▖▗ ▘ │ │
└────────────────────────────────────────────────────────────┘
-20 0 20 40 60 80 100
2022-11-26 21:39:39.816 | INFO | ms2ml.data.adapters:read_data:52 - Reading data from tests/data/fasta/uniprot-proteome_UP000464024_reviewed_yes.fasta using <class 'ms2ml.data.adapters.fasta.FastaAdapter'>
2022-11-26 21:39:39.816 | INFO | ms2ml.data.parsing.fasta:parse_file:52 - Processing file tests/data/fasta/uniprot-proteome_UP000464024_reviewed_yes.fasta, with enzyme=trypsin, missed_cleavages=0 min_length=7 max_length=30
2022-11-26 21:39:39.823 | INFO | ms2ml.data.parsing.fasta:parse_file:82 - Done, 468 sequences
2022-11-26 21:39:39.823 | INFO | ms2ml.data.parsing.fasta:parse_file:84 - Removed 205 duplicates
2022-11-26 21:39:39.823 | INFO | ms2ml.data.adapters.fasta:parse:86 - Number of peptides: 468
2022-11-26 21:39:39.824 | INFO | elfragmentador.data.predictor:adapter_out_hook_predict_factory:228 - Setting up the adapter to keep training spectra
0%| | 0/468 [00:00<?, ?it/s]2022-11-26 21:39:39.831 | INFO | ms2ml.data.parsing.fasta:parse_file:52 - Processing file tests/data/fasta/uniprot-proteome_UP000464024_reviewed_yes.fasta, with enzyme=trypsin, missed_cleavages=0 min_length=7 max_length=30
98%|█████████▊| 457/468 [00:04<00:00, 114.90it/s]2022-11-26 21:39:44.393 | INFO | ms2ml.data.parsing.fasta:parse_file:82 - Done, 468 sequences
2022-11-26 21:39:44.393 | INFO | ms2ml.data.parsing.fasta:parse_file:84 - Removed 205 duplicates
2022-11-26 21:39:44.393 | INFO | ms2ml.data.adapters.fasta:parse:86 - Number of peptides: 468
100%|██████████| 468/468 [00:04<00:00, 102.56it/s]
2022-11-26 21:39:44.393 | INFO | ms2ml.data.parsing.encyclopedia:write_encyclopedia:182 - Finished writing EncyclopeDIA database to foo.dlib
2022-11-26 21:39:44.393 | INFO | ms2ml.data.parsing.encyclopedia:write_encyclopedia:183 - Wrote 468 spectra
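The generated foo.dlib is an EncyclopeDIA-style library, which is an SQLite file, so it can be inspected with any sqlite client. A minimal sketch, assuming the standard EncyclopeDIA schema where spectra are stored in an entries table:
$ sqlite3 foo.dlib 'SELECT COUNT(*) FROM entries;'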
Training the model
output
usage: elfragmentador train [-h] [--run_name RUN_NAME] [--wandb_project WANDB_PROJECT] [--terminator_patience TERMINATOR_PATIENCE] [--from_checkpoint FROM_CHECKPOINT] [--num_queries NUM_QUERIES] [--num_decoder_layers NUM_DECODER_LAYERS] [--num_encoder_layers NUM_ENCODER_LAYERS] [--nhid NHID] [--d_model D_MODEL] [--nhead NHEAD] [--dropout DROPOUT] [--combine_embeds | --no-combine_embeds] [--combine_encoders | --no-combine_encoders] [--final_decoder FINAL_DECODER] [--lr LR] [--scheduler SCHEDULER] [--lr_ratio LR_RATIO] [--loss_ratio LOSS_RATIO] [--batch_size BATCH_SIZE] [--data_dir DATA_DIR] [--logger [LOGGER]] [--enable_checkpointing [ENABLE_CHECKPOINTING]] [--default_root_dir DEFAULT_ROOT_DIR] [--gradient_clip_val GRADIENT_CLIP_VAL] [--gradient_clip_algorithm GRADIENT_CLIP_ALGORITHM] [--num_nodes NUM_NODES] [--num_processes NUM_PROCESSES] [--devices DEVICES] [--gpus GPUS] [--auto_select_gpus [AUTO_SELECT_GPUS]] [--tpu_cores TPU_CORES] [--ipus IPUS] [--enable_progress_bar [ENABLE_PROGRESS_BAR]] [--overfit_batches OVERFIT_BATCHES] [--track_grad_norm TRACK_GRAD_NORM] [--check_val_every_n_epoch CHECK_VAL_EVERY_N_EPOCH] [--fast_dev_run [FAST_DEV_RUN]] [--accumulate_grad_batches ACCUMULATE_GRAD_BATCHES] [--max_epochs MAX_EPOCHS] [--min_epochs MIN_EPOCHS] [--max_steps MAX_STEPS] [--min_steps MIN_STEPS] [--max_time MAX_TIME] [--limit_train_batches LIMIT_TRAIN_BATCHES] [--limit_val_batches LIMIT_VAL_BATCHES] [--limit_test_batches LIMIT_TEST_BATCHES] [--limit_predict_batches LIMIT_PREDICT_BATCHES] [--val_check_interval VAL_CHECK_INTERVAL] [--log_every_n_steps LOG_EVERY_N_STEPS] [--accelerator ACCELERATOR] [--strategy STRATEGY] [--sync_batchnorm [SYNC_BATCHNORM]] [--precision PRECISION] [--enable_model_summary [ENABLE_MODEL_SUMMARY]] [--num_sanity_val_steps NUM_SANITY_VAL_STEPS] [--resume_from_checkpoint RESUME_FROM_CHECKPOINT] [--profiler PROFILER] [--benchmark [BENCHMARK]] [--reload_dataloaders_every_n_epochs RELOAD_DATALOADERS_EVERY_N_EPOCHS] [--auto_lr_find [AUTO_LR_FIND]] [--replace_sampler_ddp [REPLACE_SAMPLER_DDP]] [--detect_anomaly [DETECT_ANOMALY]] [--auto_scale_batch_size [AUTO_SCALE_BATCH_SIZE]] [--plugins PLUGINS] [--amp_backend AMP_BACKEND] [--amp_level AMP_LEVEL] [--move_metrics_to_cpu [MOVE_METRICS_TO_CPU]] [--multiple_trainloader_mode MULTIPLE_TRAINLOADER_MODE] [--inference_mode [INFERENCE_MODE]]
options:
  -h, --help            show this help message and exit
Program Parameters: Program level parameters, these should not change the outcome of the run
  --run_name RUN_NAME   Name to be given to the run (logging)
  --wandb_project WANDB_PROJECT
                        Wandb project to log to, check out wandb... please
Model Parameters: Parameters that modify the model or its training (learn rate, scheduler, layers, dimension ...)
  --num_queries NUM_QUERIES
                        Expected encoding length of the spectra
  --num_decoder_layers NUM_DECODER_LAYERS
                        Number of sub-encoder-layers in the encoder
  --num_encoder_layers NUM_ENCODER_LAYERS
                        Number of sub-encoder-layers in the decoder
  --nhid NHID           Dimension of the feedforward networks
  --d_model D_MODEL     Number of input features to the transformer encoder
  --nhead NHEAD         Number of attention heads
  --dropout DROPOUT
  --combine_embeds, --no-combine_embeds
                        Whether the embeddings for aminoacid and modifications should be shared between the irt and fragment sections
  --combine_encoders, --no-combine_encoders
                        Whether the encoders for aminoacid and modifications should be shared between the irt and fragment sections
  --final_decoder FINAL_DECODER
                        What kind of final layer should the docer have to output a single number, options are 'mlp' and 'linear'
  --lr LR
  --scheduler SCHEDULER
                        Scheduler to use during training, either of ['plateau', 'cosine', 'onecycle']
  --lr_ratio LR_RATIO   For cosine annealing: Ratio of the initial learning rate to use with cosine annealing, for instance a lr of 1 and a ratio of 10 would have a minimum learning rate of 0.1. For onecycle: Ratio of the initial lr and the maximum one, for instance if lr is 0.1 and ratio is 10, the max learn rate would be 1.0
  --loss_ratio LOSS_RATIO
                        Ratio between the retention time and the spectrum loss (higher values mean more weight to the spectra loss with respect to the retention time loss)
Data Parameters: Parameters for the loading of data
  --batch_size BATCH_SIZE
  --data_dir DATA_DIR
Trainer Parameters: Parameters that modify the model or its training
  --terminator_patience TERMINATOR_PATIENCE
                        Patience for early termination
  --from_checkpoint FROM_CHECKPOINT
                        The path of a checkpoint to copy weights from before training
pl.Trainer:
  --logger [LOGGER]     Logger (or iterable collection of loggers) for experiment tracking. A True value uses the default TensorBoardLogger if it is installed, otherwise CSVLogger. False will disable logging. If multiple loggers are provided, local files (checkpoints, profiler traces, etc.) are saved in the log_dir of the first logger. Default: True.
  --enable_checkpointing [ENABLE_CHECKPOINTING]
                        If True, enable checkpointing. It will configure a default ModelCheckpoint callback if there is no user-defined ModelCheckpoint in pytorch_lightning.trainer.trainer.Trainer.callbacks. Default: True.
  --default_root_dir DEFAULT_ROOT_DIR
                        Default path for logs and weights when no logger/ckpt_callback passed. Default: os.getcwd(). Can be remote file paths such as s3://mybucket/path or 'hdfs://path/'.
  --gradient_clip_val GRADIENT_CLIP_VAL
                        The value at which to clip gradients. Passing gradient_clip_val=None disables gradient clipping. If using Automatic Mixed Precision (AMP), the gradients will be unscaled before. Default: None.
  --gradient_clip_algorithm GRADIENT_CLIP_ALGORITHM
                        The gradient clipping algorithm to use. Pass gradient_clip_algorithm="value" to clip by value, and gradient_clip_algorithm="norm" to clip by norm. By default it will be set to "norm".
  --num_nodes NUM_NODES
                        Number of GPU nodes for distributed training. Default: 1.
  --num_processes NUM_PROCESSES
                        Number of processes for distributed training with accelerator="cpu". Default: 1. Deprecated: num_processes has been deprecated in v1.7 and will be removed in v2.0. Please use accelerator='cpu' and devices=x instead.
  --devices DEVICES     Will be mapped to either gpus, tpu_cores, num_processes or ipus, based on the accelerator type.
  --gpus GPUS           Number of GPUs to train on (int) or which GPUs to train on (list or str) applied per node. Default: None. Deprecated: gpus has been deprecated in v1.7 and will be removed in v2.0. Please use accelerator='gpu' and devices=x instead.
  --auto_select_gpus [AUTO_SELECT_GPUS]
                        If enabled and gpus or devices is an integer, pick available gpus automatically. This is especially useful when GPUs are configured to be in "exclusive mode", such that only one process at a time can access them. Default: False. Deprecated: auto_select_gpus has been deprecated in v1.9.0 and will be removed in v2.0.0. Please use the function lightning_fabric.accelerators.cuda.find_usable_cuda_devices instead.
  --tpu_cores TPU_CORES
                        How many TPU cores to train on (1 or 8) / Single TPU to train on (1). Default: None. Deprecated: tpu_cores has been deprecated in v1.7 and will be removed in v2.0. Please use accelerator='tpu' and devices=x instead.
  --ipus IPUS           How many IPUs to train on. Default: None. Deprecated: ipus has been deprecated in v1.7 and will be removed in v2.0. Please use accelerator='ipu' and devices=x instead.
  --enable_progress_bar [ENABLE_PROGRESS_BAR]
                        Whether to enable the progress bar by default. Default: True.
  --overfit_batches OVERFIT_BATCHES
                        Overfit a fraction of training/validation data (float) or a set number of batches (int). Default: 0.0.
  --track_grad_norm TRACK_GRAD_NORM
                        -1 no tracking. Otherwise tracks that p-norm. May be set to 'inf' infinity-norm. If using Automatic Mixed Precision (AMP), the gradients will be unscaled before logging them. Default: -1.
  --check_val_every_n_epoch CHECK_VAL_EVERY_N_EPOCH
                        Perform a validation loop after every N training epochs. If None, validation will be done solely based on the number of training batches, requiring val_check_interval to be an integer value. Default: 1.
  --fast_dev_run [FAST_DEV_RUN]
                        Runs n if set to n (int) else 1 if set to True batch(es) of train, val and test to find any bugs (ie: a sort of unit test). Default: False.
  --accumulate_grad_batches ACCUMULATE_GRAD_BATCHES
                        Accumulates grads every k batches or as set up in the dict. Default: None.
  --max_epochs MAX_EPOCHS
                        Stop training once this number of epochs is reached. Disabled by default (None). If both max_epochs and max_steps are not specified, defaults to max_epochs = 1000. To enable infinite training, set max_epochs = -1.
  --min_epochs MIN_EPOCHS
                        Force training for at least these many epochs. Disabled by default (None).
  --max_steps MAX_STEPS
                        Stop training after this number of steps. Disabled by default (-1). If max_steps = -1 and max_epochs = None, will default to max_epochs = 1000. To enable infinite training, set max_epochs to -1.
  --min_steps MIN_STEPS
                        Force training for at least these number of steps. Disabled by default (None).
  --max_time MAX_TIME   Stop training after this amount of time has passed. Disabled by default (None). The time duration can be specified in the format DD:HH:MM:SS (days, hours, minutes seconds), as a datetime.timedelta, or a dictionary with keys that will be passed to datetime.timedelta.
  --limit_train_batches LIMIT_TRAIN_BATCHES
                        How much of training dataset to check (float = fraction, int = num_batches). Default: 1.0.
  --limit_val_batches LIMIT_VAL_BATCHES
                        How much of validation dataset to check (float = fraction, int = num_batches). Default: 1.0.
  --limit_test_batches LIMIT_TEST_BATCHES
                        How much of test dataset to check (float = fraction, int = num_batches). Default: 1.0.
  --limit_predict_batches LIMIT_PREDICT_BATCHES
                        How much of prediction dataset to check (float = fraction, int = num_batches). Default: 1.0.
  --val_check_interval VAL_CHECK_INTERVAL
                        How often to check the validation set. Pass a float in the range [0.0, 1.0] to check after a fraction of the training epoch. Pass an int to check after a fixed number of training batches. An int value can only be higher than the number of training batches when check_val_every_n_epoch=None, which validates after every N training batches across epochs or during iteration-based training. Default: 1.0.
  --log_every_n_steps LOG_EVERY_N_STEPS
                        How often to log within steps. Default: 50.
  --accelerator ACCELERATOR
                        Supports passing different accelerator types ("cpu", "gpu", "tpu", "ipu", "hpu", "mps", "auto") as well as custom accelerator instances.
  --strategy STRATEGY   Supports different training strategies with aliases as well custom strategies. Default: None.
  --sync_batchnorm [SYNC_BATCHNORM]
                        Synchronize batch norm layers between process groups/whole world. Default: False.
  --precision PRECISION
                        Double precision (64), full precision (32), half precision (16) or bfloat16 precision (bf16). Can be used on CPU, GPU, TPUs, HPUs or IPUs. Default: 32.
  --enable_model_summary [ENABLE_MODEL_SUMMARY]
                        Whether to enable model summarization by default. Default: True.
  --num_sanity_val_steps NUM_SANITY_VAL_STEPS
                        Sanity check runs n validation batches before starting the training routine. Set it to -1 to run all batches in all validation dataloaders. Default: 2.
  --resume_from_checkpoint RESUME_FROM_CHECKPOINT
                        Path/URL of the checkpoint from which training is resumed. If there is no checkpoint file at the path, an exception is raised. If resuming from mid-epoch checkpoint, training will start from the beginning of the next epoch. Deprecated: resume_from_checkpoint is deprecated in v1.5 and will be removed in v2.0. Please pass the path to Trainer.fit(..., ckpt_path=...) instead.
  --profiler PROFILER   To profile individual steps during training and assist in identifying bottlenecks. Default: None.
  --benchmark [BENCHMARK]
                        The value (True or False) to set torch.backends.cudnn.benchmark to. The value for torch.backends.cudnn.benchmark set in the current session will be used (False if not manually set). If pytorch_lightning.trainer.Trainer.deterministic is set to True, this will default to False. Override to manually set a different value. Default: None.
  --reload_dataloaders_every_n_epochs RELOAD_DATALOADERS_EVERY_N_EPOCHS
                        Set to a non-negative integer to reload dataloaders every n epochs. Default: 0.
  --auto_lr_find [AUTO_LR_FIND]
                        If set to True, will make trainer.tune() run a learning rate finder, trying to optimize initial learning for faster convergence. trainer.tune() method will set the suggested learning rate in self.lr or self.learning_rate in the LightningModule. To use a different key set a string instead of True with the key name. Default: False.
  --replace_sampler_ddp [REPLACE_SAMPLER_DDP]
                        Explicitly enables or disables sampler replacement. If not specified this will be toggled automatically when DDP is used. By default it will add shuffle=True for train sampler and shuffle=False for val/test sampler. If you want to customize it, you can set replace_sampler_ddp=False and add your own distributed sampler.
  --detect_anomaly [DETECT_ANOMALY]
                        Enable anomaly detection for the autograd engine. Default: False.
  --auto_scale_batch_size [AUTO_SCALE_BATCH_SIZE]
                        If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory. The result will be stored in self.batch_size in the LightningModule or LightningDataModule depending on your setup. Additionally, can be set to either power that estimates the batch size through a power search or binsearch that estimates the batch size through a binary search. Default: False.
  --plugins PLUGINS     Plugins allow modification of core behavior like ddp and amp, and enable custom lightning plugins. Default: None.
  --amp_backend AMP_BACKEND
                        The mixed precision backend to use ("native" or "apex"). Default: 'native'. Deprecated: Setting amp_backend inside the Trainer is deprecated in v1.8.0 and will be removed in v2.0.0. This argument was only relevant for apex which is being removed.
  --amp_level AMP_LEVEL
                        The optimization level to use (O1, O2, etc...). By default it will be set to "O2" if amp_backend is set to "apex". Deprecated: Setting amp_level inside the Trainer is deprecated in v1.8.0 and will be removed in v2.0.0.
  --move_metrics_to_cpu [MOVE_METRICS_TO_CPU]
                        Whether to force internal logged metrics to be moved to cpu. This can save some gpu memory, but can make training slower. Use with attention. Default: False.
  --multiple_trainloader_mode MULTIPLE_TRAINLOADER_MODE
                        How to loop over the datasets when there are multiple train loaders. In 'max_size_cycle' mode, the trainer ends one epoch when the largest dataset is traversed, and smaller datasets reload when running out of their data. In 'min_size' mode, all the datasets reload when reaching the minimum length of datasets. Default: "max_size_cycle".
  --inference_mode [INFERENCE_MODE]
                        Whether to use torch.inference_mode or torch.no_grad during evaluation (validate/test/predict).
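As a rough sketch only (the data directory and hyperparameter values below are placeholders for illustration, not a recommended configuration), a training run combining the program, model and pl.Trainer flags above could be launched like:
$ elfragmentador train --run_name my_test_run --data_dir ./prepared_data --batch_size 64 --max_epochs 10 --accelerator gpu --devices 1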