Loading a dataset
-----------------

Once you have found a dataset of interest on https://cells.ucsc.edu, you can load it into your favorite
analysis environment with a few commands. (Let us know if the commands below do not work in your environment.)

First, download the expression matrix and the meta data, usually in a Unix terminal::

    mkdir adultPancreas
    cd adultPancreas
    wget https://cells.ucsc.edu/adultPancreas/exprMatrix.tsv.gz
    wget https://cells.ucsc.edu/adultPancreas/meta.tsv

Replace "adultPancreas" above with the name of the dataset you are interested in. The name is shown in the URL
after "ds=" when you open a dataset, in the download instructions, or on the dataset page as the
"CellBrowser dataset identifier".

Then open your favorite tool (e.g. RStudio or Jupyter) and follow the instructions below.

Seurat
^^^^^^

Run these commands if you have downloaded the files as above::

    require(Seurat)
    require(data.table)
    setwd("adultPancreas")
    mat <- fread("exprMatrix.tsv.gz")
    meta <- read.table("meta.tsv", header=TRUE, sep="\t", as.is=TRUE, row.names=1)
    # the first column holds the gene names
    genes = mat[,1][[1]]
    # keep only the symbol if names have the format identifier|symbol
    genes = gsub(".+[|]", "", genes)
    mat = data.frame(mat[,-1], row.names=genes, check.names=FALSE)
    so <- CreateSeuratObject(counts = mat, project = "adultPancreas", meta.data=meta)

Or you can download directly into R, without wget, by replacing the ``fread`` and ``read.table`` commands
above with these::

    mat <- fread("https://cells.ucsc.edu/adultPancreas/exprMatrix.tsv.gz")
    meta <- data.frame(fread("https://cells.ucsc.edu/adultPancreas/meta.tsv"), row.names=1)

If your version of data.table does not support .gz files yet, the ``fread`` commands can be changed to this::

    # from current directory
    mat <- fread("zcat < exprMatrix.tsv.gz")
    # or direct download:
    mat <- fread("curl https://cells.ucsc.edu/adultPancreas/exprMatrix.tsv.gz | zcat")

If the matrix name is not ``exprMatrix.tsv.gz`` but ``matrix.mtx``, you have to use Seurat's MTX loader.
In addition to ``matrix.mtx``, make sure to also download the files ``barcodes.tsv`` and ``genes.tsv``
(sometimes called ``features.tsv``). If you downloaded these three files and ``meta.tsv`` into a directory
``downloadDir``, load them like this::

    require(Seurat)
    setwd("downloadDir")
    mat = Read10X(".")
    meta = read.table("meta.tsv", header=TRUE, sep="\t", as.is=TRUE, row.names=1)
    so <- CreateSeuratObject(counts = mat, project = "myProjectName", meta.data=meta)

Scanpy
^^^^^^

To create an anndata object in Scanpy if the expression matrix is a .tsv.gz file::

    import scanpy as sc
    import pandas as pd
    ad = sc.read_text("exprMatrix.tsv.gz")
    # the file has genes in rows, anndata expects cells in rows
    ad = ad.T
    meta = pd.read_csv("meta.tsv", sep="\t", index_col=0)
    ad.obs = meta

If the expression matrix is an MTX file::

    import scanpy as sc
    import pandas as pd
    ad = sc.read_mtx("matrix.mtx.gz")
    # assuming the usual 10x layout with genes in rows: transpose to cells x genes
    ad = ad.T
    meta = pd.read_csv("meta.tsv", sep="\t", index_col=0)
    ad.obs = meta

Some datasets use the format ``identifier|symbol`` for the gene names (e.g. "ENSG0123123.3|HOX3").
After the steps above, the gene names are in ``ad.var``. To keep only the symbol::

    ad.var.index = [x.split("|")[1] for x in ad.var.index.tolist()]
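
The list comprehension above raises an ``IndexError`` for any gene name that does not contain a ``|``.
As a minimal sketch of a more tolerant variant, the helper below leaves such names unchanged. The function
name ``strip_gene_ids`` and its ``sep`` parameter are hypothetical and not part of Scanpy or the Cell Browser:

```python
def strip_gene_ids(names, sep="|"):
    """Return the symbol part of 'identifier|symbol' gene names.

    Hypothetical helper for illustration. Names without the separator,
    or with nothing after it, are returned unchanged.
    """
    symbols = []
    for name in names:
        # rpartition splits on the last separator; if the separator is
        # absent, the whole name ends up in the third element
        _, _, symbol = name.rpartition(sep)
        symbols.append(symbol if symbol else name)
    return symbols


if __name__ == "__main__":
    example = ["ENSG0123123.3|HOX3", "ACTB"]
    print(strip_gene_ids(example))
```

To use it, assign the result to the index that holds the gene names in your anndata object. Several
identifiers can map to the same symbol, so the resulting names may contain duplicates; anndata's
``var_names_make_unique()`` is one way to deal with that if the genes are stored in ``ad.var``.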