Describing datasets

The file desc.conf is a key-value file, similar to cellbrowser.conf, but it describes the dataset.

A sample file can be created with the command cbBuild --init or copied from our GitHub repo. Each tag in the file either names an HTML file or directly contains the relevant text, URL, accession or, in rare cases, key-value information.

The most important tags are title, abstract, methods, unitDesc, image, pmid, paper_url, lab and submitter. All tags are described below, grouped by the type of content they hold.
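As a quick sanity check, the list of important tags can be turned into a small script. This is only a sketch: it assumes desc.conf uses Python-style key = value assignments (as in the hideDownload=True and supplFiles=[...] examples in this section), and the helpers parse_conf and missing_tags as well as the sample values are hypothetical, not part of the Cell Browser.

```python
# Tags the text above calls the most important ones.
IMPORTANT_TAGS = ["title", "abstract", "methods", "unitDesc", "image",
                  "pmid", "paper_url", "lab", "submitter"]

def parse_conf(text):
    """Evaluate Python-style key = value lines into a dict (assumed syntax)."""
    conf = {}
    exec(text, {}, conf)  # only do this with files you trust
    return conf

def missing_tags(conf):
    """Return the important tags that are absent or empty, in listed order."""
    return [tag for tag in IMPORTANT_TAGS if not conf.get(tag)]

# Hypothetical, deliberately incomplete desc.conf content.
sample = '''
title = "Example dataset"
submitter = "Jane Doe"
lab = "Example Lab, Example University"
'''

print(missing_tags(parse_conf(sample)))
```

For the sample above, the script reports the six important tags that are not set yet.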

These tags contain longer text and can include HTML markup:

  • title: title of the dataset, often the paper title
  • abstract: a big picture summary of the dataset, as a string
  • methods: the methods for the dataset, as a string
  • unitDesc: a description of the values / the unit in the expression matrix (e.g. ‘TPM’ or ‘log-transformed counts’)

Instead of putting long strings with HTML content into abstract and methods, you can create the files abstract.html and methods.html; if they exist, they are used instead. To use other file names, specify them with the statements abstractFile and methodsFile. In the HTML, you can use text like <section>some subtitle</section> to split the text into sections.
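A minimal sketch of how the long-text tags might look in desc.conf, assuming the Python-style key = value syntax used by the other examples in this section. Every value, including the file names in the commented-out lines, is a placeholder.

```python
# Hypothetical desc.conf fragment; every value is a placeholder.
title = "Single-cell atlas of an example tissue"
unitDesc = "log-transformed counts"

# Short HTML strings can be given inline.
abstract = "We profiled <b>example</b> cells from an example tissue."

# <section> splits the text into titled sections, as described above.
methods = (
    "<section>Sample preparation</section>"
    "Cells were dissociated and sorted. "
    "<section>Sequencing</section>"
    "Libraries were sequenced on an example platform."
)

# Alternative: keep the text in separate files with custom names.
# abstractFile = "summary.html"
# methodsFile = "protocol.html"
```

For anything longer than a sentence or two, the separate HTML files are usually easier to edit than a quoted string.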

These tags contain a file name:

  • image: usually a 400px-wide thumbnail of the dimensionality reduction. You can use a command like convert graphical_abstract.png -sampling-factor 4:2:0 -strip -resize 400 -quality 85 -interlace JPEG -colorspace sRGB thumb.jpg to create thumb.jpg, a version of the image only 400 pixels wide that loads faster.
  • rawMatrixFile: the file name of the raw unprocessed matrix. Usually a .zip or .gz file. Also see rawMatrixNote.

The following tags can contain URLs and optionally, separated with a space, a label for the link. If you do not specify the label, a default label will be used (e.g. ‘Biorxiv Preprint’):

  • biorxiv_url: URL of the pre-print
  • paper_url: URL to any website with the fulltext
  • other_url: URL to a website that describes the dataset
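The "URL, then a space, then an optional label" convention can be illustrated with a small helper. This is a sketch of the convention only, not the Cell Browser's own parsing code; the function split_url_label and the example.org URLs are hypothetical.

```python
def split_url_label(value, default_label):
    """Split 'URL optional label' at the first run of whitespace."""
    parts = value.strip().split(None, 1)
    url = parts[0]
    label = parts[1] if len(parts) > 1 else default_label
    return url, label

# Hypothetical tag values: one without and one with a custom label.
biorxiv_url = "https://example.org/preprint"
paper_url = "https://example.org/article Full text at Example Journal"

print(split_url_label(biorxiv_url, "Biorxiv Preprint"))
print(split_url_label(paper_url, "Paper"))
```

Because the label starts at the first space, the URL itself must not contain unescaped spaces.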

The following tags contain accession IDs and will be translated to links. Besides a single string, each of them can also be a list of strings in the usual JSON format, e.g. ["123", "234"]:

  • pmid: Pubmed ID of the publication (CIRM TagsV5)
  • geo_series: NCBI GEO series ID (CIRM TagsV5)
  • sra: NCBI SRA accession
  • arrayexpress: EBI Arrayexpress accession
  • ena_project: EBI ENA project accession, ENAPxxxx
  • sra_study: NCBI SRA SRPxxxx accession
  • doi: DOI of paper fulltext
  • dbgap: NCBI dbGaP accession, starts with phs
  • bioproject: NCBI Bioproject accession, PRJNAxxxx. Can be included with or without the PRJNA prefix.
  • ega_study: EGA accession
  • cirm_dataset: CIRM CDW dataset name
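The sketch below shows the two accepted forms, a single string or a list of strings, and the optional PRJNA prefix. The accessions are made-up placeholders, and the helpers as_list and with_prjna are hypothetical illustrations rather than Cell Browser functions.

```python
# Hypothetical desc.conf fragment; the accessions are made-up placeholders.
pmid = "12345678"                         # a single ID as a string
geo_series = ["GSE000001", "GSE000002"]   # or several IDs as a list
bioproject = "123456"                     # PRJNA prefix omitted

def as_list(value):
    """Wrap a single string in a list so both forms can be handled alike."""
    return [value] if isinstance(value, str) else list(value)

def with_prjna(acc):
    """Add the PRJNA prefix if it is missing."""
    return acc if acc.startswith("PRJNA") else "PRJNA" + acc

print(as_list(pmid))
print(as_list(geo_series))
print([with_prjna(acc) for acc in as_list(bioproject)])
```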

The following tags contain just text:

  • submitter: name and/or email of submitter
  • lab: lab and university of the submitter
  • submission_date: ideally in format year-month-day
  • rawMatrixNote: text to describe the raw matrix, see rawMatrixFile
  • version: version of the dataset, a simple number (1,2,3,…) that should be increased each time a major change (usually to the metadata) is received from the lab
  • wrangler and shepherd: these are mostly used at UCSC. We store the name of the person on our team who loaded the data (wrangler) and sometimes the name of the person who was in contact with the lab and did quality control on the data (shepherd).
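A sketch of the plain-text tags, again assuming Python-style key = value syntax, together with a small check for the recommended year-month-day date format. The names, dates and the helper is_ymd are placeholders for illustration.

```python
from datetime import datetime

# Hypothetical desc.conf fragment; names and dates are placeholders.
submitter = "Jane Doe, jane.doe@example.org"
lab = "Example Lab, Example University"
submission_date = "2020-03-15"
version = 2  # a simple number, increased after each major change

def is_ymd(date_text):
    """Return True if the text is a valid year-month-day date."""
    try:
        datetime.strptime(date_text, "%Y-%m-%d")
        return True
    except ValueError:
        return False

print(is_ymd(submission_date))
print(is_ymd("15/03/2020"))
```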

The following tags contain key-value information:

  • custom: anything about the dataset that does not have an existing tag, e.g. {'taxon_id':'9606'}
  • urls: any URL you want to show, e.g. { 'Raw data on synapse':'https://www.synapse.org/#!Synapse:syn21560407' }.
  • algParams: algorithm parameters, e.g. { 'louvainRes':'0.7' }. This tag is generated by cbScanpy.

The following tags contain a list of key-value information:

  • supplFiles: additional files that should be copied and shown in the ‘Download’ section, like protocols, Seurat or Scanpy files, e.g. supplFiles=[{'file':'seurat3.rds', 'label':'Seurat3 RDS'}]
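With several supplementary files, a quick check that each entry has both keys can catch typos early. This is a sketch only: the second entry and the helper bad_entries are hypothetical, and the 'file' and 'label' keys are taken from the example above.

```python
# The example entry from the text, plus one hypothetical extra entry.
supplFiles = [
    {"file": "seurat3.rds", "label": "Seurat3 RDS"},
    {"file": "protocol.pdf", "label": "Lab protocol"},
]

def bad_entries(entries):
    """Return the entries that lack a non-empty 'file' or 'label' key."""
    return [e for e in entries if not e.get("file") or not e.get("label")]

print(bad_entries(supplFiles))
```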

The following tags contain only True or False:

  • hideDownload=True: do not show the download instructions.