# Generating citations for Census slices

This notebook demonstrates how to generate a citation string for all datasets contained in a Census slice.

**Contents**

1. Requirements
1. Generating citation strings
   1. Via cell metadata query
   1. Via AnnData query

⚠️ Note that the Census RNA data includes duplicate cells that are present across multiple datasets. Duplicate cells can be filtered in or out using the cell metadata variable `is_primary_data`, which is described in the [Census schema](https://github.com/chanzuckerberg/cellxgene-census/blob/main/docs/cellxgene_census_schema.md#repeated-data).
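
As a minimal sketch of that filtering, the toy table below stands in for Census cell metadata. Only the column names `dataset_id`, `cell_type`, and `is_primary_data` come from Census; every row and value is invented for illustration. With a real Census handle, a similar condition could presumably be passed through `value_filter` (for example `is_primary_data == True`); that usage is not shown in this notebook and is stated here as an assumption.

```python
import pandas as pd

# Toy stand-in for Census cell metadata; all values are invented.
obs = pd.DataFrame(
    {
        "dataset_id": ["ds-a", "ds-a", "ds-b", "ds-b"],
        "cell_type": ["B cell", "T cell", "B cell", "T cell"],
        "is_primary_data": [True, True, False, True],
    }
)

# Keep only the rows flagged as primary data, dropping the duplicate copy.
primary_obs = obs[obs["is_primary_data"]]

print(len(obs), "cells before filtering")
print(len(primary_obs), "cells after filtering")
```

Filtering changes which cells are counted, but a dataset still needs to be cited if any of its cells remain in the slice.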

## Requirements

This notebook requires:

- `cellxgene_census` Python package.
- Census data release with [schema version](https://github.com/chanzuckerberg/cellxgene-census/blob/main/docs/cellxgene_census_schema.md) 1.3.0 or greater.

## Generating citation strings

First we open a handle to the Census data. To ensure we open a data release with schema version 1.3.0 or greater, we use `census_version="latest"`.

In [1]:
import cellxgene_census

In [2]:
census = cellxgene_census.open_soma(census_version="latest")

In [3]:
census["census_info"]["summary"].read().concat().to_pandas()

| | soma_joinid | label | value |
|---|---|---|---|
| 0 | 0 | census_schema_version | 2.1.0 |
| 1 | 1 | census_build_date | 2024-07-29 |
| 2 | 2 | dataset_schema_version | 5.1.0 |
| 3 | 3 | total_cell_count | 118836776 |
| 4 | 4 | unique_cell_count | 62904117 |
| 5 | 5 | number_donors_homo_sapiens | 18138 |
| 6 | 6 | number_donors_mus_musculus | 4640 |
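
The summary table can also be used to check the schema version requirement programmatically. The sketch below is one possible way to do it, not part of the `cellxgene_census` API: the `summary` DataFrame is a hand-typed copy of two rows from the output above, and `parse_version` is a hypothetical helper introduced only for this example.

```python
import pandas as pd

# Hand-typed copy of two rows of the summary table shown above.
summary = pd.DataFrame(
    {
        "label": ["census_schema_version", "census_build_date"],
        "value": ["2.1.0", "2024-07-29"],
    }
)


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '2.1.0' into a tuple of ints (hypothetical helper)."""
    return tuple(int(part) for part in text.split("."))


# Look up the schema version by its label, then compare it numerically.
schema_version = summary.loc[
    summary["label"] == "census_schema_version", "value"
].iloc[0]
meets_requirement = parse_version(schema_version) >= parse_version("1.3.0")

print(schema_version, meets_requirement)
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where `"1.10.0"` would sort before `"1.3.0"`.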


Then we load the dataset table, which contains a `"citation"` column with a citation string for each dataset included in Census.

In [4]:
datasets = census["census_info"]["datasets"].read().concat().to_pandas()

In [6]:
datasets["citation"].head()

0    Publication: https://doi.org/10.1002/hep4.1854...
1    Publication: https://doi.org/10.1126/sciimmuno...
2    Publication: https://doi.org/10.1038/s41593-02...
3    Publication: https://doi.org/10.1038/s41467-02...
4    Publication: https://doi.org/10.1038/s41590-02...
Name: citation, dtype: object

For shorter, cross-ref style citations, you can use the column `"collection_doi_label"`.

In [7]:
datasets["collection_doi_label"].head()

0    Andrews et al. (2022) Hepatology Communications
1                   King et al. (2021) Sci. Immunol.
2                    Leng et al. (2021) Nat Neurosci
3          Rodríguez-Ubreva et al. (2022) Nat Commun
4                   Triana et al. (2021) Nat Immunol
Name: collection_doi_label, dtype: object

In [8]:
datasets.head()

| | soma_joinid | citation | collection_id | collection_name | collection_doi | collection_doi_label | dataset_id | dataset_version_id | dataset_title | dataset_h5ad_path | dataset_total_cell_count |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Publication: https://doi.org/10.1002/hep4.1854... | 44531dd9-1388-4416-a117-af0a99de2294 | Single-Cell, Single-Nucleus, and Spatial RNA S... | 10.1002/hep4.1854 | Andrews et al. (2022) Hepatology Communications | 0895c838-e550-48a3-a777-dbcd35d30272 | aaab3abd-624a-442e-b62b-3f2edb10b45e | Healthy human liver: B cells | 0895c838-e550-48a3-a777-dbcd35d30272.h5ad | 146 |
| 1 | 1 | Publication: https://doi.org/10.1126/sciimmuno... | 3a2af25b-2338-4266-aad3-aa8d07473f50 | Single-cell analysis of human B cell maturatio... | 10.1126/sciimmunol.abe6291 | King et al. (2021) Sci. Immunol. | 00ff600e-6e2e-4d76-846f-0eec4f0ae417 | 50c1d621-995d-4386-9fcb-5c70fcdf8d66 | Human tonsil nonlymphoid cells scRNA | 00ff600e-6e2e-4d76-846f-0eec4f0ae417.h5ad | 363 |
| 2 | 2 | Publication: https://doi.org/10.1038/s41593-02... | 180bff9c-c8a5-4539-b13b-ddbc00d643e6 | Molecular characterization of selectively vuln... | 10.1038/s41593-020-00764-7 | Leng et al. (2021) Nat Neurosci | bdacc907-7c26-419f-8808-969eab3ca2e8 | e95b54b1-8656-4fe8-9f53-6fdd97f397ba | Molecular characterization of selectively vuln... | bdacc907-7c26-419f-8808-969eab3ca2e8.h5ad | 3799 |
| 3 | 3 | Publication: https://doi.org/10.1038/s41467-02... | bf325905-5e8e-42e3-933d-9a9053e9af80 | Single-cell Atlas of common variable immunodef... | 10.1038/s41467-022-29450-x | Rodríguez-Ubreva et al. (2022) Nat Commun | a5d95a42-0137-496f-8a60-101e17f263c8 | d6e742c5-f6e5-42f4-8064-622783542f6b | Steady-state B cells - scRNA-seq | a5d95a42-0137-496f-8a60-101e17f263c8.h5ad | 1324 |
| 4 | 4 | Publication: https://doi.org/10.1038/s41590-02... | 93eebe82-d8c3-41bc-a906-63b5b5f24a9d | Single-cell proteo-genomic reference maps of t... | 10.1038/s41590-021-01059-0 | Triana et al. (2021) Nat Immunol | d3566d6a-a455-4a15-980f-45eb29114cab | 61f15353-e598-43b5-bb5a-80ac44a0cf0b | blood and bone marrow from a healthy young donor | d3566d6a-a455-4a15-980f-45eb29114cab.h5ad | 15502 |


Now we can use the column `"dataset_id"`, which is present in both the dataset table and the Census cell metadata, to create citation strings for any Census slice.
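
The idea of linking the two tables through `"dataset_id"` can be sketched on toy data. In the example below both DataFrames are invented stand-ins (the IDs and citation strings are placeholders, not real Census values), and counting cells per dataset is an optional extra that the notebook itself does not do.

```python
import pandas as pd

# Toy stand-in for the Census dataset table; values are placeholders.
datasets = pd.DataFrame(
    {
        "dataset_id": ["ds-a", "ds-b", "ds-c"],
        "citation": ["Publication: doi-a", "Publication: doi-b", "Publication: doi-c"],
    }
)

# Toy stand-in for the cell metadata of a slice; values are placeholders.
cell_metadata = pd.DataFrame(
    {
        "dataset_id": ["ds-a", "ds-a", "ds-c"],
        "cell_type": ["B cell", "T cell", "B cell"],
    }
)

# Count the cells that each dataset contributes to the slice.
cells_per_dataset = (
    cell_metadata.groupby("dataset_id").size().rename("n_cells").reset_index()
)

# Attach the citation string through the shared "dataset_id" column.
slice_citations = cells_per_dataset.merge(
    datasets[["dataset_id", "citation"]], on="dataset_id", how="left"
)

print(slice_citations)
```

Here `ds-b` contributes no cells to the slice, so it does not appear in the result and would not need to be cited.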

### Via cell metadata query

In [9]:
# Query cell metadata
cell_metadata = cellxgene_census.get_obs(
    census, "homo_sapiens",
    value_filter="tissue == 'cardiac atrium'",
    column_names=["dataset_id", "cell_type"]
)

In [10]:
# Select the rows of the dataset table whose dataset_id appears in the slice
slice_datasets = datasets[datasets["dataset_id"].isin(cell_metadata["dataset_id"])]

In [11]:
print(*set(slice_datasets["citation"]), sep="\n\n")

Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/3149e7d3-1ae4-4b59-a54b-73e9f591b699.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5

Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/dbcbe0a6-918a-4440-9a56-6d03f0f22df5.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5

Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/07900e47-7ab4-48d4-a26e-abdd010f4bbf.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5

Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.

In [12]:
print(*set(slice_datasets["collection_doi_label"]), sep="\n\n")

The Tabula Sapiens Consortium* et al. (2022) Science
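
The `set(...)` calls above remove duplicates but do not guarantee an order, so the printed order of citations may differ between runs. If a stable order matters (for example when pasting into a manuscript), one option is to sort the unique values. The sketch below uses a hypothetical helper, `unique_citations`, and a toy DataFrame with placeholder strings; it is not part of `cellxgene_census`.

```python
import pandas as pd


def unique_citations(slice_datasets: pd.DataFrame, column: str = "citation") -> list:
    """Return the distinct, non-missing values of `column`, sorted alphabetically."""
    return sorted(slice_datasets[column].dropna().unique())


# Toy stand-in for `slice_datasets`; the strings are placeholders.
toy_slice_datasets = pd.DataFrame(
    {
        "citation": [
            "Publication: doi-b",
            "Publication: doi-a",
            "Publication: doi-b",
            None,
        ]
    }
)

citations = unique_citations(toy_slice_datasets)
print(*citations, sep="\n\n")
```

The same helper could be pointed at `"collection_doi_label"` through its `column` argument.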


### Via AnnData query

In [13]:
# Fetch an AnnData object
adata = cellxgene_census.get_anndata(
    census=census,
    organism="homo_sapiens",
    measurement_name="RNA",
    obs_value_filter="tissue == 'cardiac atrium'",
    var_value_filter="feature_name == 'MYBPC3'",
    obs_column_names=["dataset_id", "cell_type"],
)

In [14]:
# Select the rows of the dataset table whose dataset_id appears in the slice
slice_datasets = datasets[datasets["dataset_id"].isin(adata.obs["dataset_id"])]

In [17]:
slice_datasets.head()

| | soma_joinid | citation | collection_id | collection_name | collection_doi | collection_doi_label | dataset_id | dataset_version_id | dataset_title | dataset_h5ad_path | dataset_total_cell_count |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 413 | 413 | Publication: https://doi.org/10.1126/science.a... | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | The Tabula Sapiens Consortium* et al. (2022) S... | e6a11140-2545-46bc-929e-da243eed2cae | dbcbe0a6-918a-4440-9a56-6d03f0f22df5 | Tabula Sapiens - Heart | e6a11140-2545-46bc-929e-da243eed2cae.h5ad | 11505 |
| 572 | 572 | Publication: https://doi.org/10.1126/science.a... | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | The Tabula Sapiens Consortium* et al. (2022) S... | 5a11f879-d1ef-458a-910c-9b0bdfca5ebf | 07900e47-7ab4-48d4-a26e-abdd010f4bbf | Tabula Sapiens - Endothelial | 5a11f879-d1ef-458a-910c-9b0bdfca5ebf.h5ad | 31691 |
| 718 | 718 | Publication: https://doi.org/10.1126/science.a... | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | The Tabula Sapiens Consortium* et al. (2022) S... | a68b64d8-aee3-4947-81b7-36b8fe5a44d2 | cb872c2c-64a4-405f-96c3-03124405cc6c | Tabula Sapiens - Stromal | a68b64d8-aee3-4947-81b7-36b8fe5a44d2.h5ad | 82478 |
| 745 | 745 | Publication: https://doi.org/10.1126/science.a... | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | The Tabula Sapiens Consortium* et al. (2022) S... | 97a17473-e2b1-4f31-a544-44a60773e2dd | 3149e7d3-1ae4-4b59-a54b-73e9f591b699 | Tabula Sapiens - Epithelial | 97a17473-e2b1-4f31-a544-44a60773e2dd.h5ad | 104148 |
| 799 | 799 | Publication: https://doi.org/10.1126/science.a... | e5f58829-1a66-40b5-a624-9046778e74f5 | Tabula Sapiens | 10.1126/science.abl4896 | The Tabula Sapiens Consortium* et al. (2022) S... | c5d88abe-f23a-45fa-a534-788985e93dad | 50a18e6a-797b-40bd-aa07-6ed50a1f2cf6 | Tabula Sapiens - Immune | c5d88abe-f23a-45fa-a534-788985e93dad.h5ad | 264824 |


In [15]:
print(*set(slice_datasets["citation"]), sep="\n\n")

Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/3149e7d3-1ae4-4b59-a54b-73e9f591b699.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5

Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/dbcbe0a6-918a-4440-9a56-6d03f0f22df5.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5

Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/07900e47-7ab4-48d4-a26e-abdd010f4bbf.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5

Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.

In [16]:
print(*set(slice_datasets["collection_doi_label"]), sep="\n\n")

The Tabula Sapiens Consortium* et al. (2022) Science


Finally, don't forget to close the Census handle.

In [18]:
census.close()