Describing datasets
The file ``desc.conf`` is a key-value file, similar to ``cellbrowser.conf``, but it describes the dataset. A sample file can be created with the command ``cbBuild --init`` or copied from our GitHub repo.

The tags in the file either refer to HTML files or directly contain the relevant text, URL, accession or, in rare cases, key-value information.

The most important tags are ``title``, ``abstract``, ``methods``, ``unitDesc``, ``image``, ``pmid``, ``paper_url``, ``lab`` and ``submitter``. All tags are described below, grouped by the type of content they hold.
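As a rough illustration, a minimal ``desc.conf`` with the most important tags might look like the sketch below. It assumes the file uses Python-style ``key = value`` assignments, as the ``supplFiles`` example further down suggests. Every value (names, file name, URL, PubMed ID) is a made-up placeholder, not a real dataset.

```python
# Hypothetical desc.conf sketch; every value here is a placeholder.
title = "Single-cell atlas of an example tissue"
abstract = "A big-picture summary of the dataset. <b>HTML markup</b> is allowed."
methods = "How the cells were collected, sequenced and analyzed."
unitDesc = "TPM"
image = "thumb.jpg"                      # file name of the thumbnail
pmid = "12345678"                        # placeholder PubMed ID
paper_url = "https://example.org/paper"  # placeholder URL
lab = "Example Lab, Example University"
submitter = "Jane Doe, jane.doe@example.org"
```

Tags that are not needed can simply be left out.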
These tags contain longer text and can include HTML markup:

- ``title``: title of the dataset, often the paper title
- ``abstract``: a big-picture summary of the dataset, as a string
- ``methods``: the methods for the dataset, as a string
- ``unitDesc``: a description of the values or the unit in the expression matrix (e.g. ‘TPM’ or ‘log’ed counts’)

Instead of long strings with HTML content for ``abstract`` and ``methods``, you can also create the files ``abstract.html`` and ``methods.html``; they will be used instead. Alternatively, use the statements ``abstractFile`` and ``methodsFile`` to specify other file names. In the HTML, you can use text like ``<section>some subtitle</section>`` to split the text into sections.
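
The sketch below shows one way to generate such a file. It is a stand-alone Python script, not part of ``desc.conf``, and it assumes that ``methods.html`` belongs in the same directory as ``desc.conf``. The HTML text is placeholder content, and ``write_methods`` is a hypothetical helper name.

```python
from pathlib import Path
import tempfile

# Hypothetical methods text; each <section> element acts as a subtitle.
METHODS_HTML = """<section>Sample collection</section>
<p>Placeholder text describing how the samples were obtained.</p>
<section>Sequencing and analysis</section>
<p>Placeholder text describing library preparation and clustering.</p>
"""

def write_methods(dataset_dir):
    """Write methods.html into dataset_dir and return the path of the new file."""
    path = Path(dataset_dir) / "methods.html"
    path.write_text(METHODS_HTML, encoding="utf-8")
    return path

# A temporary directory stands in for the real dataset directory.
methods_path = write_methods(tempfile.mkdtemp())
print(methods_path.read_text(encoding="utf-8"))
```

If the file is stored under a different name, ``methodsFile`` in ``desc.conf`` would have to point to that name.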

These tags contain a file name:

- ``image``: usually a 400px-wide thumbnail of the dimensionality reduction. You can use a command like ``convert graphical_abstract.png -sampling-factor 4:2:0 -strip -resize 400 -quality 85 -interlace JPEG -colorspace sRGB thumb.jpg`` to create ``thumb.jpg``, a version of the image only 400 pixels wide that loads faster.
- ``rawMatrixFile``: the file name of the raw unprocessed matrix, usually a .zip or .gz file. Also see ``rawMatrixNote``.

The following tags can contain URLs and optionally, separated with a space, a label for the link. If you do not specify the label, a default label will be used (e.g. ‘Biorxiv Preprint’):

- ``biorxiv_url``: URL of the pre-print
- ``paper_url``: URL to any website with the fulltext
- ``other_url``: URL to a website that describes the dataset
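
The URL-plus-label convention can be sketched as follows. The three assignments are a ``desc.conf`` fragment with placeholder URLs and labels. The helper ``split_url_label`` is purely illustrative, does not belong in ``desc.conf`` and is not Cell Browser's own code; it only shows how a value separates into URL and label at the first space.

```python
# desc.conf fragment with placeholder URLs and labels.
biorxiv_url = "https://example.org/preprint"                     # no label: a default label is used
paper_url = "https://example.org/paper Fulltext at the journal"  # URL, one space, then the label
other_url = "https://example.org/project Project website"

def split_url_label(value):
    """Illustrative helper (not part of Cell Browser): split at the first space."""
    url, _, label = value.partition(" ")
    # An empty label means that none was given.
    return url, label or None

print(split_url_label(paper_url))
print(split_url_label(biorxiv_url))
```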

The following tags contain accession IDs and will be translated to links. In addition to a string, each of them can also be a list of strings in the usual JSON format, e.g. ``['123', '234']``:

- ``pmid``: PubMed ID of the publication (CIRM TagsV5)
- ``geo_series``: NCBI GEO series ID (CIRM TagsV5)
- ``sra``: NCBI SRA accession
- ``arrayexpress``: EBI ArrayExpress accession
- ``ena_project``: EBI ENA project accession, ENAPxxxx
- ``sra_study``: NCBI SRA SRPxxxx accession
- ``doi``: DOI of paper fulltext
- ``dbgap``: NCBI dbGaP accession, starts with phs
- ``bioproject``: NCBI BioProject accession, PRJNAxxxx. Can be included with or without the PRJNA prefix.
- ``ega_study``: EGA accession
- ``cirm_dataset``: CIRM CDW dataset name
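
A short ``desc.conf`` fragment, again assuming Python-style assignments, may make the string-or-list rule more concrete. All accessions below are invented placeholders and do not point to real records.

```python
# desc.conf fragment; all accessions below are made-up placeholders.
pmid = "12345678"                        # a single ID as a string
geo_series = ["GSE000001", "GSE000002"]  # several IDs as a list of strings
bioproject = "000000"                    # given here without the PRJNA prefix
doi = "10.1234/example.doi"
```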

The following tags contain just text:

- ``submitter``: name and/or email of submitter
- ``lab``: lab and university of submitter
- ``submission_date``: ideally in the format year-month-day
- ``rawMatrixNote``: text to describe the raw matrix, see ``rawMatrixFile``
- ``version``: version of dataset, a simple number (1, 2, 3, …) that should be increased each time a major change (usually meta data) is received from the lab
- ``wrangler`` and ``shepherd``: these are mostly used at UCSC. We store the name of the person on our team who loaded the data (wrangler) and sometimes the name of the person who was in contact with the lab and did quality control on the data (shepherd).

The following tags contain key-value information:
- ``custom``: anything about the dataset that does not have an existing tag, e.g. {'taxon_id':'9606'}
- ``urls``: any url you want to show, e.g. { 'Raw data on synapse':'https://www.synapse.org/#!Synapse:syn21560407' }.
- ``algParams``: algorithm parameters, e.g. { 'louvainRes':'0.7' }. This tag is generated by cbScanpy.
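
Before running a build, it can help to check that the key-value tags really are dictionaries. The sketch below is one possible check, not a Cell Browser feature: it assumes that ``desc.conf`` consists of Python-style assignments with literal values, and ``parse_conf`` is a hypothetical helper. The sample text reuses the examples from the list above.

```python
import ast

# Sample desc.conf content, taken from the examples in the list above.
SAMPLE = """
custom = {'taxon_id': '9606'}
urls = {'Raw data on synapse': 'https://www.synapse.org/#!Synapse:syn21560407'}
algParams = {'louvainRes': '0.7'}
"""

def parse_conf(text):
    """Collect simple 'name = literal' assignments into a dict, without executing code."""
    conf = {}
    for node in ast.parse(text).body:
        is_simple = (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        )
        if is_simple:
            # literal_eval only accepts literals such as strings, lists and dicts.
            conf[node.targets[0].id] = ast.literal_eval(node.value)
    return conf

conf = parse_conf(SAMPLE)
for tag in ("custom", "urls", "algParams"):
    kind = "ok" if isinstance(conf.get(tag), dict) else "not a dictionary"
    print(tag, kind)
```

Using ``ast.literal_eval`` rather than ``exec`` keeps the check from running arbitrary code contained in the file.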

The following tags contain a list of key-value information:

- ``supplFiles``: additional files that should be copied and shown in the ‘Download’ section, like protocols, Seurat or Scanpy files, e.g. ``supplFiles=[{'file':'seurat3.rds', 'label':'Seurat3 RDS'}]``

The following tags contain only ``True`` or ``False``:
- ``hideDownload=True``: do not show the download instructions.