With text files¶
The generic way to set up your own cell browser is to start from tab-separated (tsv) or comma-separated (csv) format text files. The steps on this page assume that you have already gone through the process of clustering your cells.
The files you will need¶
You will need the first three files described below in tsv or csv format, the fourth is optional:
- Expression matrix: one row per gene and one column per cell, ideally gzipped. The first column must be the gene identifier or gene symbol, or ideally geneId|symbol. Ensembl/GENCODE gene identifiers starting with ENSG and ENSMUSG will be translated automatically to symbols. The other columns are expression values as numbers, one per cell. The number type will be auto-detected (float or int). The first line of the file must be a header that includes the cell identifiers.
- Cell annotation metadata table, one row per cell. No need to gzip this relatively small file. The first column is the name of the cell and it has to match the name in the expression matrix. There should be at least two columns: one with the name of the cell and one with the name of the cluster. To speed up processing of both your expression matrix and metadata file, these files should describe the same numbers of cells and be in the same order. This allows cbBuild process these files without needing to trim the matrix and reorder the metadata file. The metadata file also must have a header line.
- Cell coordinates, often t-SNE or UMAP coordinates. This file
always has three columns, (cellName, x, y). The cellName must be the same as in
the expression matrix and cell annotation metadata file. If you have run
multiple dimensionality reduction algorithms, you can specify multiple
coordinate files in this format. The number rows in these coordinates doesn’t
need to match that of your expression matrix or metadata files, allowing you to
specify only a subset of the cells. In this way, you can use a single dimensionality
reduction algorithm, but include multiple subsets and view of the cells,
e.g. one coordinates file per tissue. Note, if R has changed your
cell identifiers (e.g. by adding dots), you may be able to fix them by running
- Cluster-specific marker genes (optional). The first column is
the cluster name (from the cell annotation metadata file), the second column
contains the gene symbol (or Ensembl gene ID, which will automatically be mapped
to the gene symbol), and
the third column is some numeric score (e.g. p-Value or FDR). You can add
as many other columns as you like with additional information about this
gene. You can also run
cbMarkerAnnotateon this file to add information from various gene-centric databases and link-outs to other resources to your existing table. See Annotate genes on how to add link to external gene-databases (like Allan Brain Atlas or OMIM) to your marker genes. If you used Seurat for your clustering, you can just provide the raw Seurat marker gene output. You can also specify multiple files of cluster-specific marker genes, e.g. in case that you are also doing differential gene expression analysis or have results from multiple algorithms.
Make sure that all your input files have Unix line endings and fix the line endings if necessary with mac2unix or dos2unix:
file *.txt *.csv *.tsv *.tab
Setting up your
After you have all of these files in place, go to the directory go to the
directory containing all of these files and run the following command to
copy a sample
cellbrowser.conf into your current directory:
cellbrowser.conf includes the required settings plus some other useful settings.
The current list of all possible cellbrowser.conf statements can be found in our example cellbrowser.conf.
In your cellbrowser.conf, replace the default values in the config statements:
exprMatrix- expression matrix file name
meta- cell annotation metadata file name
coords- coordinate file names with a layout method label for each
markers- cluster-specific marker gene file namew with a label for each
clusterField- name of cluster field from header line of metadata file
From the directory where your
cellbrowser.conf is located, run:
cbBuild -o /tmp/cb -p 8888
Point your internet browser to the name of the server (or localhost, if you’re running this on your own machine) followed by :8888, e.g. http://localhost:8888.
The cell browser output directory (/tmp/cb in this example) can hold multiple datasets. If you have a second dataset in another directory that contains cellbrowser.conf, just make sure that the other cellbrowser.conf specifies a different dataset name with name=xxx. Then run cbBuild -o /tmp/cb -p 8888 in the other directory to add the second dataset to your cell browser output directory /tmp/cb.