FindMarkers.Rd
Finds markers (differentially expressed genes) for identity classes
```r
FindMarkers(object, ...)

# S3 method for default
FindMarkers(
  object,
  slot = "data",
  counts = numeric(),
  cells.1 = NULL,
  cells.2 = NULL,
  features = NULL,
  reduction = NULL,
  logfc.threshold = 0.25,
  test.use = "wilcox",
  min.pct = 0.1,
  min.diff.pct = -Inf,
  verbose = TRUE,
  only.pos = FALSE,
  max.cells.per.ident = Inf,
  random.seed = 1,
  latent.vars = NULL,
  min.cells.feature = 3,
  min.cells.group = 3,
  pseudocount.use = 1,
  ...
)

# S3 method for Seurat
FindMarkers(
  object,
  ident.1 = NULL,
  ident.2 = NULL,
  group.by = NULL,
  subset.ident = NULL,
  assay = NULL,
  slot = "data",
  reduction = NULL,
  features = NULL,
  logfc.threshold = 0.25,
  test.use = "wilcox",
  min.pct = 0.1,
  min.diff.pct = -Inf,
  verbose = TRUE,
  only.pos = FALSE,
  max.cells.per.ident = Inf,
  random.seed = 1,
  latent.vars = NULL,
  min.cells.feature = 3,
  min.cells.group = 3,
  pseudocount.use = 1,
  ...
)
```
Argument | Description |
---|---|
object | An object |
... | Arguments passed to other methods and to specific DE methods |
slot | Slot to pull data from; note that if test.use is "negbinom", "poisson", or "DESeq2", slot will be set to "counts" |
counts | Count matrix if using scale.data for DE tests. This is used for computing pct.1 and pct.2 and for filtering features based on the fraction of cells expressing them |
cells.1 | Vector of cell names belonging to group 1 |
cells.2 | Vector of cell names belonging to group 2 |
features | Genes to test. Default is to use all genes |
reduction | Reduction to use in differential expression testing; if set, DE is tested on the cell embeddings |
logfc.threshold | Limit testing to genes which show, on average, at least a logfc.threshold difference (log scale) between the two groups of cells. Default is 0.25. Increasing logfc.threshold speeds up the function, but can miss weaker signals. |
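test.use | Denotes which test to use. Available options are: "wilcox" (default), Wilcoxon rank sum test; "bimod", likelihood-ratio test for single-cell gene expression (McDavid et al., 2013); "roc", ROC analysis that evaluates, for each gene, the AUC of a classifier built on that gene alone and ranks genes by predictive power, abs(AUC - 0.5) * 2; "t", Student's t-test; "negbinom", negative binomial generalized linear model (UMI-based datasets only); "poisson", Poisson generalized linear model (UMI-based datasets only); "LR", logistic regression predicting group membership from each feature individually, compared with a null model by a likelihood ratio test; "MAST", hurdle model tailored to scRNA-seq data, run through the MAST package; "DESeq2", negative binomial model run through the DESeq2 package (Love et al., 2014) |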
min.pct | Only test genes that are detected in at least a fraction min.pct of cells in either of the two populations. Meant to speed up the function by not testing genes that are very infrequently expressed. Default is 0.1 |
min.diff.pct | Only test genes that show a minimum difference in the fraction of detection between the two groups. Set to -Inf by default |
verbose | Print a progress bar once expression testing begins |
only.pos | Only return positive markers (FALSE by default) |
max.cells.per.ident | Downsample each identity class to at most this many cells. Not activated by default (set to Inf) |
random.seed | Random seed for downsampling |
latent.vars | Variables to test, used only when test.use is one of "LR", "negbinom", "poisson", or "MAST" |
min.cells.feature | Minimum number of cells expressing the feature in at least one of the two groups, currently only used for poisson and negative binomial tests |
min.cells.group | Minimum number of cells in one of the groups |
pseudocount.use | Pseudocount to add to averaged expression values when calculating logFC. 1 by default. |
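ident.1 | Identity class to define markers for; pass an object of class phylo or 'clustertree' to find markers for a node in a cluster tree; passing 'clustertree' requires BuildClusterTree to have been run |
ident.2 | A second identity class for comparison; if NULL, use all other cells for comparison; if an object of class phylo or 'clustertree' is passed to ident.1, must pass a node to find markers for |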
group.by | Regroup cells into a different identity class prior to performing differential expression (see example) |
subset.ident | Subset a particular identity class prior to regrouping. Only relevant if group.by is set (see example) |
assay | Assay to use in differential expression testing |
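The pre-filtering arguments above (min.pct, min.diff.pct, logfc.threshold, only.pos) can be sketched as a single predicate over per-gene statistics. This is a minimal illustration, not Seurat's implementation: `passes_prefilters` is a hypothetical helper, and the strict `>` comparisons are an assumption about how boundary values are treated.

```r
# Hypothetical helper (not a Seurat function): applies the pre-filters described
# by min.pct, min.diff.pct, logfc.threshold and only.pos to per-gene statistics.
# Assumes strict ">" comparisons; Seurat's exact boundary handling may differ.
passes_prefilters <- function(pct.1, pct.2, logfc,
                              min.pct = 0.1, min.diff.pct = -Inf,
                              logfc.threshold = 0.25, only.pos = FALSE) {
  pct.max <- pmax(pct.1, pct.2)             # detection in the higher-detecting group
  pct.diff <- pct.max - pmin(pct.1, pct.2)  # gap in detection between the groups
  # only.pos keeps genes up in group 1; otherwise either direction counts
  fc.ok <- if (only.pos) logfc > logfc.threshold else abs(logfc) > logfc.threshold
  pct.max > min.pct & pct.diff > min.diff.pct & fc.ok
}

# Four made-up genes: rarely detected, up in group 1, unchanged, up in group 2
pct.1 <- c(rare = 0.02, up = 0.90, flat = 0.50, down = 0.10)
pct.2 <- c(rare = 0.01, up = 0.20, flat = 0.50, down = 0.80)
logfc <- c(rare = 1.50, up = 1.20, flat = 0.05, down = -0.90)

names(which(passes_prefilters(pct.1, pct.2, logfc)))
names(which(passes_prefilters(pct.1, pct.2, logfc, only.pos = TRUE)))
```

Here `rare` fails min.pct, `flat` fails logfc.threshold, and `down` is dropped only when only.pos = TRUE.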
Returns a data.frame with a ranked list of putative markers as rows and associated statistics as columns (p-values, ROC score, etc., depending on test.use). The following columns are always present:

- avg_logFC: log fold-change of the average expression between the two groups. Positive values indicate that the gene is more highly expressed in the first group
- pct.1: the fraction of cells in the first group in which the gene is detected
- pct.2: the fraction of cells in the second group in which the gene is detected
- p_val_adj: adjusted p-value, based on Bonferroni correction using all genes in the dataset
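A minimal sketch of how the avg_logFC, pct.1 and pct.2 columns can be computed from a log-normalized genes x cells matrix. `marker_columns` is hypothetical; it assumes natural-log data produced by log1p, averages on the linear scale (via expm1) before taking the log with pseudocount.use, treats any value above 0 as detected, and rounds fractions to three decimals as in the example output. Seurat's own calculation may differ in detail.

```r
# Hypothetical helper (not a Seurat function): computes avg_logFC, pct.1 and
# pct.2 for every gene of a log-normalized genes x cells matrix.
# Assumptions: natural-log data from log1p, "detected" means value > 0,
# fractions rounded to 3 decimals.
marker_columns <- function(expr, cells.1, cells.2, pseudocount.use = 1) {
  g1 <- expr[, cells.1, drop = FALSE]
  g2 <- expr[, cells.2, drop = FALSE]
  # Average on the linear scale (expm1 undoes log1p), then log with pseudocount
  avg.1 <- log(rowMeans(expm1(g1)) + pseudocount.use)
  avg.2 <- log(rowMeans(expm1(g2)) + pseudocount.use)
  data.frame(
    avg_logFC = avg.1 - avg.2,                  # positive: higher in group 1
    pct.1 = round(rowSums(g1 > 0) / ncol(g1), 3),
    pct.2 = round(rowSums(g2 > 0) / ncol(g2), 3)
  )
}

# Two made-up genes across three cells; c1 and c2 form group 1, c3 is group 2
expr <- matrix(c(log1p(3), 0, 0,
                 0,        0, log1p(1)),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("geneA", "geneB"), c("c1", "c2", "c3")))
marker_columns(expr, cells.1 = c("c1", "c2"), cells.2 = "c3")
```

In this toy input geneA has a positive avg_logFC (log(2.5)) because it is detected only in group 1, while geneB has a negative one (-log(2)).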
P-value adjustment is performed using Bonferroni correction based on the total number of genes in the dataset. Other correction methods are not recommended, as Seurat pre-filters genes using the arguments above, reducing the number of tests performed. Lastly, as Aaron Lun has pointed out, p-values should be interpreted cautiously, as the genes used for clustering are the same genes tested for differential expression.
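The effect of correcting over all genes, rather than only the genes that survive pre-filtering, can be checked with base R's `p.adjust`. The value n = 230 is an assumption inferred from the example output at the end of this page, where each p_val_adj equals 230 times p_val, capped at 1; substitute the number of genes in your own dataset.

```r
# Two raw p-values taken from the example output on this page
p_val <- c("HLA-DPB1" = 2.667056e-08, "GSTP1" = 0.01601528)

# Assumed total number of genes in the dataset (inferred from the example output)
n.genes <- 230

# Bonferroni over all genes: pmin(1, n.genes * p_val)
adj.all <- p.adjust(p_val, method = "bonferroni", n = n.genes)

# Bonferroni over only the genes actually tested here (n = length(p_val) = 2);
# this is far less conservative, which is why the total gene count is used
adj.tested <- p.adjust(p_val, method = "bonferroni")

adj.all
adj.tested
```

GSTP1's product 230 * 0.01601528 exceeds 1, so its adjusted value is capped at 1, matching the second example's p_val_adj column.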
McDavid A, Finak G, Chattopadhyay PK, et al. (2013). Data exploration, quality control and testing in single-cell qPCR-based gene expression experiments. Bioinformatics 29(4):461-467. doi:10.1093/bioinformatics/bts714
Trapnell C, et al. (2014). The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nature Biotechnology 32:381-386.
McDavid A, Finak G, Yajima M (2017). MAST: Model-based Analysis of Single Cell Transcriptomics. R package version 1.2.1. https://github.com/RGLab/MAST/
Love MI, Huber W, Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. https://bioconductor.org/packages/release/bioc/html/DESeq2.html
```r
library(Seurat)

# Find markers for cluster 2
markers <- FindMarkers(object = pbmc_small, ident.1 = 2)
head(x = markers)
#>                 p_val avg_logFC pct.1 pct.2    p_val_adj
#> HLA-DPB1 2.667056e-08  1.749094 0.947 0.410 6.134228e-06
#> MS4A1    7.532290e-08  4.533790 0.526 0.033 1.732427e-05
#> HLA-DQB1 2.436852e-07  2.049598 0.789 0.246 5.604759e-05
#> HLA-DRB1 1.341885e-06  1.659090 0.842 0.410 3.086335e-04
#> HLA-DRA  4.618445e-06  1.812419 0.895 0.623 1.062242e-03
#> TCL1A    6.943137e-06  4.079870 0.368 0.016 1.596922e-03

# Take all cells in cluster 2, and find markers that separate cells in the 'g1'
# group (metadata variable 'groups')
suppressWarnings(markers <- FindMarkers(pbmc_small, ident.1 = "g1", group.by = 'groups', subset.ident = "2"))
head(x = markers)
#>                p_val avg_logFC pct.1 pct.2 p_val_adj
#> GSTP1     0.01601528  1.760034   0.7 0.111         1
#> TPM4      0.02048683  3.037198   0.5 0.000         1
#> LINC00936 0.02048683  2.835565   0.5 0.000         1
#> IFI30     0.04515259  3.241233   0.4 0.000         1
#> LGALS2    0.04515259  2.980942   0.4 0.000         1
#> RHOC      0.04515259  2.726791   0.4 0.000         1

# Pass 'clustertree' or an object of class phylo to ident.1 and
# a node to ident.2 as a replacement for FindMarkersNode
# pbmc_small <- BuildClusterTree(object = pbmc_small)
# markers <- FindMarkers(object = pbmc_small, ident.1 = 'clustertree', ident.2 = 5)
# head(x = markers)
```
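The returned data.frame can be filtered and re-ranked with ordinary base R. This sketch rebuilds the first example's output by hand so it runs without Seurat; the cut-offs (p_val_adj < 0.05, avg_logFC > 2) are illustrative choices, not Seurat defaults.

```r
# Hand-copied from the first example's output; in a live session this would be
# markers <- FindMarkers(object = pbmc_small, ident.1 = 2)
markers <- data.frame(
  p_val     = c(2.667056e-08, 7.532290e-08, 2.436852e-07, 1.341885e-06, 4.618445e-06, 6.943137e-06),
  avg_logFC = c(1.749094, 4.533790, 2.049598, 1.659090, 1.812419, 4.079870),
  pct.1     = c(0.947, 0.526, 0.789, 0.842, 0.895, 0.368),
  pct.2     = c(0.410, 0.033, 0.246, 0.410, 0.623, 0.016),
  p_val_adj = c(6.134228e-06, 1.732427e-05, 5.604759e-05, 3.086335e-04, 1.062242e-03, 1.596922e-03),
  row.names = c("HLA-DPB1", "MS4A1", "HLA-DQB1", "HLA-DRB1", "HLA-DRA", "TCL1A")
)

# Keep significant genes with a large positive fold-change (illustrative cut-offs)
sig <- markers[markers$p_val_adj < 0.05 & markers$avg_logFC > 2, ]

# Rank the survivors by effect size rather than by p-value
sig <- sig[order(sig$avg_logFC, decreasing = TRUE), ]
rownames(sig)
```

With these cut-offs only MS4A1, TCL1A and HLA-DQB1 remain, ordered by decreasing avg_logFC.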