As part of the Oz Single Cell 2018 conference, we are hosting a single cell data analysis challenge. The challenge aims to foster the development of ideas and approaches to better utilise data arising from single cell sequencing technology, specifically using one or more of three single cell datasets. We encourage innovation in any aspect of single cell analytics, from the extraction of simple biological insight through to complex analytical approaches.


Possible entries to the challenge include:

  • Technological innovation, including better quality control, normalization pipelines and frameworks for comparing pipelines,

  • Biological discovery or confirmation, 

  • Analytical innovation including integration with other data sets, high-level visualisation and others. 


Please note that entries are not limited to the above suggestions and participants are strongly encouraged to come up with new ways for utilizing these types of data.


Challenge Entrants


Teams can consist of any number of people, but we encourage teams to have a mix of students, postdocs, and more senior academic staff. To register your team, please fill in our form here. Note that registration is not binding and you can change your mind later. One week before the single cell conference, we ask each team entering the challenge to provide us with a 200-word description of their entry.


Judging criteria 


All participants are invited to present their analysis during the Oz Single Cell conference. Each team will be allocated a 3-minute rapid-fire talk, and a panel will judge the entries according to the novelty of the technique or biological insight and its potential impact on future single cell work. A number of prizes will be given (more details to come).



You must base your analysis on at least one of the datasets below.


  1. Nestorowa et al., 2016. This dataset consists of 1,654 mouse haematopoietic stem and progenitor cells. You can find the normalized gene counts along with flow cytometry data here.

  2. Nguyen et al., 2018. Paper is here. This dataset consists of 18,787 human induced pluripotent stem cells. We provide the data on three levels:

    • Raw fastq files.

    • Pre-QC read count expression matrix here

    • Post-QC and normalized expression matrix. All the data can be downloaded as one file here. Note that the dataset is ~120 GB. There is a small issue with the raw count matrix in this gzipped file; the correct raw count matrix can be downloaded using the link above.

  3. Su et al., 2017. Paper is here. This dataset consists of 507 developing mouse liver cells in a time course experiment. We provide the data on three levels:

    1. Raw fastq files can be found on SRA here

    2. Pre-QC read count expression matrix here

    3. Post-QC and normalized expression matrix based on Su et al (2017) can be found on GEO with the ID GSE87795. Link to the dataset here
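As a starting point for working with the expression matrices listed above, the following is a minimal sketch of reading a gene-by-cell count matrix and computing per-cell library sizes using only the Python standard library. The file layout (tab-delimited, genes in rows, cells in columns, IDs in the first row and column) is an assumption and should be checked against the actual downloads; the function names and the tiny example data are made up for illustration.

```python
import csv
import io


def read_count_matrix(handle, delimiter="\t"):
    """Read a gene-by-cell count matrix from an open text handle.

    Assumed layout (hypothetical, check the real file): the first row holds
    a corner cell followed by cell IDs; each later row holds a gene ID
    followed by one count per cell.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    cell_ids = header[1:]
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene_id, values = row[0], row[1:]
        if len(values) != len(cell_ids):
            raise ValueError(
                f"row for {gene_id} has {len(values)} values, "
                f"expected {len(cell_ids)}"
            )
        counts[gene_id] = [float(v) for v in values]
    return cell_ids, counts


def library_sizes(cell_ids, counts):
    """Sum counts over all genes for each cell."""
    totals = [0.0] * len(cell_ids)
    for values in counts.values():
        for i, value in enumerate(values):
            totals[i] += value
    return dict(zip(cell_ids, totals))


# Tiny made-up matrix standing in for a downloaded file.
example = io.StringIO("\tcell_1\tcell_2\nGeneA\t3\t0\nGeneB\t1\t5\n")
cell_ids, counts = read_count_matrix(example)
sizes = library_sizes(cell_ids, counts)
```

For a real file, the same function can be called on a handle from `open(path, newline="")`, or from `gzip.open(path, "rt", newline="")` if the matrix is gzipped. For the larger datasets a dedicated array or sparse-matrix library is likely to be more practical than plain dictionaries.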


Each of the three datasets is described in more detail below.

You are able to augment your data analysis with any public datasets (not limited to single cell data), especially if it helps to gain novel biological insight. 




Details of each dataset


Nguyen et al., 2018. 


This study aimed to use single-cell sequencing to uncover heterogeneity of cell states in pluripotent stem cells. The authors sequenced 18,787 WTC CRISPRi human iPSCs and used an unsupervised clustering method to identify four subpopulations based on cell state: core pluripotent (48.3%), proliferative (47.8%), early-primed for differentiation (2.8%) and late-primed for differentiation (1.1%). Sequencing was performed using 10X Genomics Single Cell 3’ Chips. Four biological replicates were sequenced to an average depth of 44,506 reads per cell.
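To give a feel for the scale of each subpopulation, the reported percentages can be converted into approximate cell counts. The sketch below uses only the figures quoted above; the counts are rounded estimates derived from the percentages, not numbers reported by the authors, and the total read estimate assumes the average depth applies to all 18,787 cells.

```python
TOTAL_CELLS = 18_787
MEAN_READS_PER_CELL = 44_506

# Percentages as quoted in the dataset description.
SUBPOPULATION_PERCENT = {
    "core pluripotent": 48.3,
    "proliferative": 47.8,
    "early-primed for differentiation": 2.8,
    "late-primed for differentiation": 1.1,
}


def approximate_cell_counts(total_cells, percentages):
    """Convert percentages into rounded, approximate cell counts."""
    return {
        name: round(total_cells * percent / 100)
        for name, percent in percentages.items()
    }


approx_counts = approximate_cell_counts(TOTAL_CELLS, SUBPOPULATION_PERCENT)

# Rough estimate only: assumes the average depth holds across all cells.
approx_total_reads = TOTAL_CELLS * MEAN_READS_PER_CELL

for name, n_cells in approx_counts.items():
    print(f"{name}: ~{n_cells} cells")
print(f"approximate total reads: {approx_total_reads:,}")
```

The two primed subpopulations together account for only a few hundred cells, which is worth keeping in mind when choosing methods that need many cells per group.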


Nestorowa et al., 2016.


This study aimed to use single-cell technologies to provide insights into the gene expression profiles observed during blood stem cell differentiation in mice. The experiment generated gene-expression profiles for 1,654 FACS-sorted cells, resulting in the quantification of the expression of 4,291 genes and the acquisition of abundance measures for 9 surface-marker proteins. The authors used diffusion maps to reduce the dimensionality of their data down to three dimensions and hierarchical clustering to identify four main cell clusters. These projections were used to visualize the genes driving the cell heterogeneity. Two external single-cell RNA-Seq experiments of mouse blood stem cells were also projected onto the space defined by the diffusion maps.


The authors provide their processed data and code used to perform their analysis here in addition to links to the three manuscripts that use this data.


Su et al., 2017.


The study aims to investigate the dynamic development of fetal liver stem/progenitor cells (LSPCs). The experiment generated single-cell RNA-seq data from 507 cells at seven stages of mouse liver development: embryonic day (E) 11.5, E12.5, E13.5, E14.5, E16.5, E18.5 and postnatal day (P) 2.5. ERCC spike-ins were included in each sample as controls. Stages E12.5, E14.5, E16.5, E18.5 and P2.5 each include cells from two biological replicates (two embryonic mice), while stages E11.5 and E13.5 each include cells from one embryonic mouse. Single-cell libraries were pooled and sequenced on a NextSeq 500 (Illumina).

Raw FASTQ files were mapped to the mm10 reference supplemented with ERCC sequences using STAR version 2.5.2, and reads were counted using featureCounts. Only reads mapping to a unique genomic location were used.
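For entrants who start from the raw FASTQ files, the sketch below illustrates two ideas from the processing described above: keeping only uniquely mapped reads, and separating ERCC spike-in controls from endogenous genes. This is not the authors' pipeline. It assumes alignments are in SAM text format with the `NH` tag (number of reported hits, which STAR writes by default) and that spike-in identifiers start with `ERCC-`; both assumptions, the function names and the example records are illustrative and should be checked against the actual files.

```python
def is_uniquely_mapped(sam_line):
    """Return True if a SAM alignment record reports exactly one hit.

    Assumption: the aligner writes the NH:i:<n> tag, where n is the
    number of reported alignments for the read.
    """
    if sam_line.startswith("@"):
        return False  # header line, not an alignment
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return False  # not a complete alignment record
    flag = int(fields[1])
    if flag & 0x4:
        return False  # read is flagged as unmapped
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[len("NH:i:"):]) == 1
    return False  # no NH tag, so uniqueness cannot be confirmed


def split_spike_ins(gene_counts, prefix="ERCC-"):
    """Split a {feature_id: count} mapping into endogenous and spike-in."""
    endogenous = {}
    spike_ins = {}
    for feature_id, count in gene_counts.items():
        target = spike_ins if feature_id.startswith(prefix) else endogenous
        target[feature_id] = count
    return endogenous, spike_ins


# Made-up records for illustration only.
unique_read = "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1\tHI:i:1"
multi_read = "r2\t0\tchr1\t200\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2\tHI:i:1"
kept = [is_uniquely_mapped(line) for line in (unique_read, multi_read)]

endogenous, spike_ins = split_spike_ins({"Alb": 120, "ERCC-00002": 35})
```

Keeping spike-in counts separate makes it easier to use them as controls, for example when comparing technical variation across cells, without letting them influence gene-level summaries.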




If you have any questions about this challenge, feel free to email us at: