ChIP-seq Genomic Range Annotation#

import genomic_features as gf

import bioframe
import seaborn
import matplotlib.pyplot as plt
import numpy as np

Download example data#

The data used in this tutorial is ChIP-seq IDR ranked peaks with target CTCF, downloaded from ENCODE (accession: ENCFF417KHZ). The data is from CRISPR modified homo sapiens cell line HCT116 and more information about the dataset can be found on the ENCODE website here. Column names are assigned by bioframe using the function read_table() based on the column contents.

ctcf_peak = bioframe.read_table(
    "https://www.encodeproject.org/files/ENCFF417KHZ/@@download/ENCFF417KHZ.bed.gz",
    schema="narrowPeak",
)

ctcf_peak

	chrom	start	end	name	score	strand	fc	-log10p	-log10q	relSummit
0	chr10	99392395	99392827	.	1000	.	306.78095	-1.0	4.96085	230
1	chr3	194929204	194929707	.	1000	.	284.09197	-1.0	4.96085	234
2	chr8	97559907	97560401	.	1000	.	284.16880	-1.0	4.96085	242
3	chr8	66609310	66609747	.	1000	.	265.66492	-1.0	4.96085	238
4	chr15	66903598	66904095	.	1000	.	263.27660	-1.0	4.96085	252
...	...	...	...	...	...	...	...	...	...	...
129238	chr17	55794221	55794717	.	78	.	5.91739	-1.0	0.35122	248
129239	chr5	39135451	39135947	.	78	.	6.62387	-1.0	0.51343	248
129240	chr20	16241885	16242381	.	78	.	8.24784	-1.0	0.96486	248
129241	chr2	39231186	39231682	.	78	.	6.21278	-1.0	0.42335	248
129242	chr16	84553939	84554435	.	78	.	8.06121	-1.0	0.91185	248

129243 rows × 10 columns

Download Ensembl annotations using genomic_features#

Using the genomic ranges provided in the bed file’s chrom, start, and end columns we can ascribe an ensembl gene id from EnsemblDB to each peak. Read in the correct ensembl annotation version to match to the data. It’s possible to add more columns to the returned dataframe if exon, protein, or sequence information that is not included by default is needed. The function list_columns() shows all possible columns.

ensdb = gf.ensembl.annotation(species="Hsapiens", version="108")

ensdb.list_columns()

['seq_name',
 'seq_length',
 'is_circular',
 'gene_id',
 'entrezid',
 'exon_id',
 'exon_seq_start',
 'exon_seq_end',
 'gene_id',
 'gene_name',
 'gene_biotype',
 'gene_seq_start',
 'gene_seq_end',
 'seq_name',
 'seq_strand',
 'seq_coord_system',
 'description',
 'gene_id_version',
 'canonical_transcript',
 'name',
 'value',
 'tx_id',
 'protein_id',
 'protein_sequence',
 'protein_id',
 'protein_domain_id',
 'protein_domain_source',
 'interpro_accession',
 'prot_dom_start',
 'prot_dom_end',
 'tx_id',
 'tx_biotype',
 'tx_seq_start',
 'tx_seq_end',
 'tx_cds_seq_start',
 'tx_cds_seq_end',
 'gene_id',
 'tx_support_level',
 'tx_id_version',
 'gc_content',
 'tx_external_name',
 'tx_is_canonical',
 'tx_id',
 'exon_id',
 'exon_idx',
 'protein_id',
 'uniprot_id',
 'uniprot_db',
 'uniprot_mapping_type']

genes = ensdb.genes(
    filter=gf.filters.GeneBioTypeFilter("protein_coding"),
)

genes

	gene_id	gene_name	gene_biotype	gene_seq_start	gene_seq_end	seq_name	seq_strand	seq_coord_system	description	gene_id_version	canonical_transcript
0	ENSG00000000003	TSPAN6	protein_coding	100627108	100639991	X	-1	chromosome	tetraspanin 6 [Source:HGNC Symbol;Acc:HGNC:11858]	ENSG00000000003.15	ENST00000373020
1	ENSG00000000005	TNMD	protein_coding	100584936	100599885	X	1	chromosome	tenomodulin [Source:HGNC Symbol;Acc:HGNC:17757]	ENSG00000000005.6	ENST00000373031
2	ENSG00000000419	DPM1	protein_coding	50934867	50959140	20	-1	chromosome	dolichyl-phosphate mannosyltransferase subunit...	ENSG00000000419.14	ENST00000371588
3	ENSG00000000457	SCYL3	protein_coding	169849631	169894267	1	-1	chromosome	SCY1 like pseudokinase 3 [Source:HGNC Symbol;A...	ENSG00000000457.14	ENST00000367771
4	ENSG00000000460	C1orf112	protein_coding	169662007	169854080	1	1	chromosome	chromosome 1 open reading frame 112 [Source:HG...	ENSG00000000460.17	ENST00000359326
...	...	...	...	...	...	...	...	...	...	...	...
22890	ENSG00000290318		protein_coding	49357176	49412051	3	-1	chromosome	novel protein	ENSG00000290318.1	ENST00000704381
22891	ENSG00000290320	H2BN1	protein_coding	32895433	32906586	17	1	chromosome	H2B.N variant histone 1 [Source:HGNC Symbol;Ac...	ENSG00000290320.1	ENST00000704639
22892	ENSG00000291237	SOD2	protein_coding	159669069	159762529	6	-1	chromosome	superoxide dismutase 2 [Source:HGNC Symbol;Acc...	ENSG00000291237.1	ENST00000538183
22893	ENSG00000291239		protein_coding	36850858	36998682	19	1	chromosome	novel protein	ENSG00000291239.1	ENST00000706165
22894	ENSG00000291240		protein_coding	21008247	21032572	22	-1	chromosome	novel protein	ENSG00000291240.1	ENST00000706202

22895 rows × 11 columns

Mapping chromosome annotation styles#

In order to annotate the ranges in the dataframe with gene symbols, ensembl ids, strand direction, etc. the two dataframes need to be merged. To make this seamless we use the bioframe assembly_info to map the numeric Ensembl chromosome annotation with the “chr” notation of the CHIP-Seq dataset and remove any peaks with unmapped chromosome values. This step is optional but makes joining the two dataframes much simpler in the next step.

from bioframe import assembly_info

hg38 = assembly_info("hg38")

genes["seq_name"] = genes["seq_name"].map(hg38.alias_dict)

genes = genes.dropna(subset="seq_name")
genes

	gene_id	gene_name	gene_biotype	gene_seq_start	gene_seq_end	seq_name	seq_strand	seq_coord_system	description	gene_id_version	canonical_transcript
0	ENSG00000000003	TSPAN6	protein_coding	100627108	100639991	chrX	-1	chromosome	tetraspanin 6 [Source:HGNC Symbol;Acc:HGNC:11858]	ENSG00000000003.15	ENST00000373020
1	ENSG00000000005	TNMD	protein_coding	100584936	100599885	chrX	1	chromosome	tenomodulin [Source:HGNC Symbol;Acc:HGNC:17757]	ENSG00000000005.6	ENST00000373031
2	ENSG00000000419	DPM1	protein_coding	50934867	50959140	chr20	-1	chromosome	dolichyl-phosphate mannosyltransferase subunit...	ENSG00000000419.14	ENST00000371588
3	ENSG00000000457	SCYL3	protein_coding	169849631	169894267	chr1	-1	chromosome	SCY1 like pseudokinase 3 [Source:HGNC Symbol;A...	ENSG00000000457.14	ENST00000367771
4	ENSG00000000460	C1orf112	protein_coding	169662007	169854080	chr1	1	chromosome	chromosome 1 open reading frame 112 [Source:HG...	ENSG00000000460.17	ENST00000359326
...	...	...	...	...	...	...	...	...	...	...	...
22890	ENSG00000290318		protein_coding	49357176	49412051	chr3	-1	chromosome	novel protein	ENSG00000290318.1	ENST00000704381
22891	ENSG00000290320	H2BN1	protein_coding	32895433	32906586	chr17	1	chromosome	H2B.N variant histone 1 [Source:HGNC Symbol;Ac...	ENSG00000290320.1	ENST00000704639
22892	ENSG00000291237	SOD2	protein_coding	159669069	159762529	chr6	-1	chromosome	superoxide dismutase 2 [Source:HGNC Symbol;Acc...	ENSG00000291237.1	ENST00000538183
22893	ENSG00000291239		protein_coding	36850858	36998682	chr19	1	chromosome	novel protein	ENSG00000291239.1	ENST00000706165
22894	ENSG00000291240		protein_coding	21008247	21032572	chr22	-1	chromosome	novel protein	ENSG00000291240.1	ENST00000706202

20012 rows × 11 columns

Merging genomic ranges to annotate Ensembl ids and gene names#

When joining the two dataframes we can use the overlap() function to join the dataframes based on the gene region the peak overlaps with the most. The cols1 option passes synonymous column names from the ensembl dataframe to the corresponding ctcf peak columns for matching. Now each remaining peak has an ensembl id and gene name.

gene_ov = bioframe.overlap(
    ctcf_peak, genes, cols2=["seq_name", "gene_seq_start", "gene_seq_end"], how="inner"
)

gene_ov

	chrom	start	end	name	score	strand	fc	-log10p	-log10q	relSummit	...	gene_name_	gene_biotype_	gene_seq_start_	gene_seq_end_	seq_name_	seq_strand_	seq_coord_system_	description_	gene_id_version_	canonical_transcript_
0	chr22	17563181	17563677	.	748	.	23.66046	-1.0	4.96085	248	...	SLC25A18	protein_coding	17563450	17590995	chr22	1	chromosome	solute carrier family 25 member 18 [Source:HGN...	ENSG00000182902.14	ENST00000327451
1	chr22	19179353	19179849	.	800	.	37.47772	-1.0	4.96085	248	...	CLTCL1	protein_coding	19179473	19291719	chr22	-1	chromosome	clathrin heavy chain like 1 [Source:HGNC Symbo...	ENSG00000070371.16	ENST00000427926
2	chr22	19756445	19756941	.	885	.	35.76666	-1.0	4.96085	248	...	TBX1	protein_coding	19756703	19783593	chr22	1	chromosome	T-box transcription factor 1 [Source:HGNC Symb...	ENSG00000184058.16	ENST00000649276
3	chr22	20016770	20017266	.	176	.	11.34425	-1.0	2.13939	248	...	TANGO2	protein_coding	20017014	20067164	chr22	1	chromosome	transport and golgi organization 2 homolog [So...	ENSG00000183597.16	ENST00000327374
4	chr22	20916962	20917458	.	484	.	22.82968	-1.0	4.96085	248	...	CRKL	protein_coding	20917407	20953747	chr22	1	chromosome	CRK like proto-oncogene, adaptor protein [Sour...	ENSG00000099942.13	ENST00000354336
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
73773	chr8	144895624	144896120	.	78	.	8.95825	-1.0	1.20680	248	...	ZNF250	protein_coding	144876497	144902168	chr8	-1	chromosome	zinc finger protein 250 [Source:HGNC Symbol;Ac...	ENSG00000196150.14	ENST00000417550
73774	chr8	144900092	144900588	.	1000	.	63.47950	-1.0	4.96085	248	...	ZNF250	protein_coding	144876497	144902168	chr8	-1	chromosome	zinc finger protein 250 [Source:HGNC Symbol;Ac...	ENSG00000196150.14	ENST00000417550
73775	chr8	144901897	144902393	.	454	.	37.27992	-1.0	4.96085	248	...	ZNF250	protein_coding	144876497	144902168	chr8	-1	chromosome	zinc finger protein 250 [Source:HGNC Symbol;Ac...	ENSG00000196150.14	ENST00000417550
73776	chr8	144934494	144934990	.	385	.	12.88503	-1.0	2.82659	248	...	ZNF16	protein_coding	144930358	144950888	chr8	-1	chromosome	zinc finger protein 16 [Source:HGNC Symbol;Acc...	ENSG00000170631.15	ENST00000394909
73777	chr8	145065907	145066403	.	78	.	10.13864	-1.0	1.63946	248	...	C8orf33	protein_coding	145052465	145066685	chr8	1	chromosome	chromosome 8 open reading frame 33 [Source:HGN...	ENSG00000182307.14	ENST00000331434

73778 rows × 21 columns

Annotating exons and introns#

Ensembl DB also has an exon table that we can use to select exon/intron peaks if they were not already loaded via the gene table. To keep consistent chromosome style, we again map seq_name using bioframe and drop unmapped chromosome names.

exons = ensdb.exons(filter=gf.filters.GeneIDFilter(gene_ov["gene_id_"]))

exons["seq_name"] = exons["seq_name"].map(hg38.alias_dict)

exons = exons.dropna(subset="seq_name")
exons

	exon_id	exon_seq_start	exon_seq_end	seq_name	gene_id
0	ENSE00001855382	100636608	100636806	chrX	ENSG00000000003
1	ENSE00003662440	100635558	100635746	chrX	ENSG00000000003
2	ENSE00003654571	100635178	100635252	chrX	ENSG00000000003
3	ENSE00003658810	100633931	100634029	chrX	ENSG00000000003
4	ENSE00003554016	100633405	100633539	chrX	ENSG00000000003
...	...	...	...	...	...
513101	ENSE00003995056	21029711	21030351	chr22	ENSG00000291240
513102	ENSE00003995060	21029336	21029444	chr22	ENSG00000291240
513103	ENSE00003995062	21013042	21013184	chr22	ENSG00000291240
513104	ENSE00003995061	21011384	21011590	chr22	ENSG00000291240
513105	ENSE00003995059	21008247	21009449	chr22	ENSG00000291240

513106 rows × 5 columns

We can overlap our peaks again to annotate the peaks that are overlapping exon regions and subsequently label non-overlapping peaks as introns.

exon_ov = bioframe.overlap(
    gene_ov,
    exons,
    cols2=["seq_name", "exon_seq_start", "exon_seq_end"],
    how="left",
)

exon_ov

	chrom	start	end	name	score	strand	fc	-log10p	-log10q	relSummit	...	seq_strand_	seq_coord_system_	description_	gene_id_version_	canonical_transcript_	exon_id_	exon_seq_start_	exon_seq_end_	seq_name_	gene_id_
0	chr22	17563181	17563677	.	748	.	23.66046	-1.0	4.96085	248	...	1	chromosome	solute carrier family 25 member 18 [Source:HGN...	ENSG00000182902.14	ENST00000327451	ENSE00001540287	17563450	17563713	chr22	ENSG00000182902
1	chr22	17563181	17563677	.	748	.	23.66046	-1.0	4.96085	248	...	1	chromosome	solute carrier family 25 member 18 [Source:HGN...	ENSG00000182902.14	ENST00000327451	ENSE00001430286	17563464	17563713	chr22	ENSG00000182902
2	chr22	17563181	17563677	.	748	.	23.66046	-1.0	4.96085	248	...	1	chromosome	solute carrier family 25 member 18 [Source:HGN...	ENSG00000182902.14	ENST00000327451	ENSE00001955974	17563498	17563713	chr22	ENSG00000182902
3	chr22	17563181	17563677	.	748	.	23.66046	-1.0	4.96085	248	...	1	chromosome	solute carrier family 25 member 18 [Source:HGN...	ENSG00000182902.14	ENST00000327451	ENSE00001866696	17563589	17563713	chr22	ENSG00000182902
4	chr22	19179353	19179849	.	800	.	37.47772	-1.0	4.96085	248	...	-1	chromosome	clathrin heavy chain like 1 [Source:HGNC Symbo...	ENSG00000070371.16	ENST00000427926	ENSE00003739095	19179473	19179969	chr22	ENSG00000070371
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
111113	chr8	144934494	144934990	.	385	.	12.88503	-1.0	2.82659	248	...	-1	chromosome	zinc finger protein 16 [Source:HGNC Symbol;Acc...	ENSG00000170631.15	ENST00000394909	None	<NA>	<NA>	None	None
111114	chr8	145065907	145066403	.	78	.	10.13864	-1.0	1.63946	248	...	1	chromosome	chromosome 8 open reading frame 33 [Source:HGN...	ENSG00000182307.14	ENST00000331434	ENSE00003839769	145065708	145066516	chr8	ENSG00000182307
111115	chr8	145065907	145066403	.	78	.	10.13864	-1.0	1.63946	248	...	1	chromosome	chromosome 8 open reading frame 33 [Source:HGN...	ENSG00000182307.14	ENST00000331434	ENSE00003840227	145065715	145066624	chr8	ENSG00000182307
111116	chr8	145065907	145066403	.	78	.	10.13864	-1.0	1.63946	248	...	1	chromosome	chromosome 8 open reading frame 33 [Source:HGN...	ENSG00000182307.14	ENST00000331434	ENSE00003838341	145065708	145066679	chr8	ENSG00000182307
111117	chr8	145065907	145066403	.	78	.	10.13864	-1.0	1.63946	248	...	1	chromosome	chromosome 8 open reading frame 33 [Source:HGN...	ENSG00000182307.14	ENST00000331434	ENSE00003835531	145065708	145066685	chr8	ENSG00000182307

111118 rows × 26 columns

Now we can label the peaks that are mapped to exons as exons and those that are not mapped as introns. A coarser definition is encouraged for introns depending on where the sequence lands in the gene position.

exon_ov["gene_feature"] = np.where(exon_ov["exon_id_"].isnull(), "intron", "exon")
exon_ov

	chrom	start	end	name	score	strand	fc	-log10p	-log10q	relSummit	...	seq_coord_system_	description_	gene_id_version_	canonical_transcript_	exon_id_	exon_seq_start_	exon_seq_end_	seq_name_	gene_id_	gene_feature
0	chr22	17563181	17563677	.	748	.	23.66046	-1.0	4.96085	248	...	chromosome	solute carrier family 25 member 18 [Source:HGN...	ENSG00000182902.14	ENST00000327451	ENSE00001540287	17563450	17563713	chr22	ENSG00000182902	exon
1	chr22	17563181	17563677	.	748	.	23.66046	-1.0	4.96085	248	...	chromosome	solute carrier family 25 member 18 [Source:HGN...	ENSG00000182902.14	ENST00000327451	ENSE00001430286	17563464	17563713	chr22	ENSG00000182902	exon
2	chr22	17563181	17563677	.	748	.	23.66046	-1.0	4.96085	248	...	chromosome	solute carrier family 25 member 18 [Source:HGN...	ENSG00000182902.14	ENST00000327451	ENSE00001955974	17563498	17563713	chr22	ENSG00000182902	exon
3	chr22	17563181	17563677	.	748	.	23.66046	-1.0	4.96085	248	...	chromosome	solute carrier family 25 member 18 [Source:HGN...	ENSG00000182902.14	ENST00000327451	ENSE00001866696	17563589	17563713	chr22	ENSG00000182902	exon
4	chr22	19179353	19179849	.	800	.	37.47772	-1.0	4.96085	248	...	chromosome	clathrin heavy chain like 1 [Source:HGNC Symbo...	ENSG00000070371.16	ENST00000427926	ENSE00003739095	19179473	19179969	chr22	ENSG00000070371	exon
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
111113	chr8	144934494	144934990	.	385	.	12.88503	-1.0	2.82659	248	...	chromosome	zinc finger protein 16 [Source:HGNC Symbol;Acc...	ENSG00000170631.15	ENST00000394909	None	<NA>	<NA>	None	None	intron
111114	chr8	145065907	145066403	.	78	.	10.13864	-1.0	1.63946	248	...	chromosome	chromosome 8 open reading frame 33 [Source:HGN...	ENSG00000182307.14	ENST00000331434	ENSE00003839769	145065708	145066516	chr8	ENSG00000182307	exon
111115	chr8	145065907	145066403	.	78	.	10.13864	-1.0	1.63946	248	...	chromosome	chromosome 8 open reading frame 33 [Source:HGN...	ENSG00000182307.14	ENST00000331434	ENSE00003840227	145065715	145066624	chr8	ENSG00000182307	exon
111116	chr8	145065907	145066403	.	78	.	10.13864	-1.0	1.63946	248	...	chromosome	chromosome 8 open reading frame 33 [Source:HGN...	ENSG00000182307.14	ENST00000331434	ENSE00003838341	145065708	145066679	chr8	ENSG00000182307	exon
111117	chr8	145065907	145066403	.	78	.	10.13864	-1.0	1.63946	248	...	chromosome	chromosome 8 open reading frame 33 [Source:HGN...	ENSG00000182307.14	ENST00000331434	ENSE00003835531	145065708	145066685	chr8	ENSG00000182307	exon

111118 rows × 27 columns

# Create a histogram
palette = seaborn.color_palette(
    "ch:s=.25,rot=-.25", len(exon_ov["gene_feature"].unique())
)

ax = seaborn.histplot(exon_ov["gene_feature"], stat="percent")

# Set ylim
plt.ylim(0, 100)

# Assigning different colors to each bar
for i, bar in enumerate(ax.patches):
    bar.set_color(palette[i])

# Add values on top of the bars
for p in ax.patches:
    ax.annotate(
        f"{p.get_height():.2f}%",
        (p.get_x() + p.get_width() / 2.0, p.get_height()),
        ha="center",
        va="center",
        fontsize=10,
        color="black",
        xytext=(0, 5),
        textcoords="offset points",
    )

# Show the plot
plt.show()

../_images/58c5d520331ce0d69b72df90e5c0e25dfaa7f274768a1d21645ca80537941e51.png

Similarly, it is possible to categorize peaks into promoter, gene body, etc. groups based on distance and upstream/downstream thresholds using bioframe. Promoter region peaks are filtered here using a maximum threshold of 2000kb upstream from promoter region.

out = bioframe.closest(
    genes,
    ctcf_peak,
    cols1=["seq_name", "gene_seq_start", "gene_seq_end"],
    ignore_overlaps=True,
    ignore_downstream=True,
    k=1,
)
out[
    out["distance"] < 2000
]  # filter for peaks 2000kb upstream or closer to promoter region
out

	gene_id	gene_name	gene_biotype	gene_seq_start	gene_seq_end	seq_name	seq_strand	seq_coord_system	description	gene_id_version	...	start_	end_	name_	score_	strand_	fc_	-log10p_	-log10q_	relSummit_	distance
0	ENSG00000000457	SCYL3	protein_coding	169849631	169894267	chr1	-1	chromosome	SCY1 like pseudokinase 3 [Source:HGNC Symbol;A...	ENSG00000000457.14	...	169809746	169810242	.	434.0	.	10.49881	-1.0	1.78549	248.0	39389
1	ENSG00000000460	C1orf112	protein_coding	169662007	169854080	chr1	1	chromosome	chromosome 1 open reading frame 112 [Source:HG...	ENSG00000000460.17	...	169655862	169656358	.	78.0	.	6.05988	-1.0	0.38208	248.0	5649
2	ENSG00000000938	FGR	protein_coding	27612064	27635185	chr1	-1	chromosome	FGR proto-oncogene, Src family tyrosine kinase...	ENSG00000000938.13	...	27609989	27610485	.	78.0	.	11.17754	-1.0	2.06285	248.0	1579
3	ENSG00000000971	CFH	protein_coding	196651754	196752476	chr1	1	chromosome	complement factor H [Source:HGNC Symbol;Acc:HG...	ENSG00000000971.17	...	196635316	196635812	.	419.0	.	23.07669	-1.0	4.96085	248.0	15942
4	ENSG00000001460	STPG1	protein_coding	24356999	24416934	chr1	-1	chromosome	sperm tail PG-rich repeat containing 1 [Source...	ENSG00000001460.18	...	24355241	24355737	.	78.0	.	8.59700	-1.0	1.08362	248.0	1262
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
20007	ENSG00000242389	RBMY1E	protein_coding	21903618	21918042	chrY	-1	chromosome	RNA binding motif protein Y-linked family 1 me...	ENSG00000242389.9	...	<NA>	<NA>	<NA>	NaN	<NA>	NaN	NaN	NaN	NaN	<NA>
20008	ENSG00000242875	RBMY1B	protein_coding	21511338	21525786	chrY	1	chromosome	RNA binding motif protein Y-linked family 1 me...	ENSG00000242875.7	...	<NA>	<NA>	<NA>	NaN	<NA>	NaN	NaN	NaN	NaN	<NA>
20009	ENSG00000244395	RBMY1D	protein_coding	21880076	21894526	chrY	-1	chromosome	RNA binding motif protein Y-linked family 1 me...	ENSG00000244395.6	...	<NA>	<NA>	<NA>	NaN	<NA>	NaN	NaN	NaN	NaN	<NA>
20010	ENSG00000258992	TSPY1	protein_coding	9466955	9469749	chrY	1	chromosome	testis specific protein Y-linked 1 [Source:HGN...	ENSG00000258992.7	...	<NA>	<NA>	<NA>	NaN	<NA>	NaN	NaN	NaN	NaN	<NA>
20011	ENSG00000280969	RPS4Y2	protein_coding	20756108	20781032	chrY	1	chromosome	ribosomal protein S4 Y-linked 2 [Source:HGNC S...	ENSG00000280969.2	...	<NA>	<NA>	<NA>	NaN	<NA>	NaN	NaN	NaN	NaN	<NA>

20012 rows × 22 columns

ChIP-seq Genomic Range Annotation

Contents