Since chicken is the de facto model avian species, we will assign avian gene nomenclature based upon standardized gene nomenclature for chicken. The chicken research community agreed to assign chicken gene nomenclature in consultation with researchers and in concordance with assigned human gene nomenclature, where such nomenclature exists for human:chicken ortholog pairs [1]. The CGNC will assign nomenclature for avians exactly as described for chicken [2] and will use chicken ortholog determination amongst avian species to ensure that names for orthologous genes are concordant.
Currently, the CGNC is working to identify these ortholog pairs, and CGNC biocurators are manually checking these assertions and adding nomenclature for genes that do not have a clear 1:1 ortholog with human. Data generated from both of these projects are integrated and freely available.
More detailed information about gene nomenclature assignment can be found in the current HGNC guidelines. Briefly, the guidelines state that: gene names should be brief and specific and should convey the character or function of the gene; the first letter of the symbol should be the same as that of the name in order to facilitate alphabetical listing and grouping; gene names should follow American spelling; tissue specificity and molecular weight designations should be avoided; and gene symbols must be unique, be representative of the descriptive gene name, contain only Latin letters and Arabic numerals, and should not contain punctuation symbols, a "G" for gene, or any reference to species (e.g., "c" or "ch" for chicken).
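The mechanically checkable parts of these rules can be sketched in a few lines of Python. This is an illustrative sketch only, not a CGNC or HGNC tool: the function name `check_symbol` and its messages are hypothetical, and it covers just two rules (allowed characters and matching first letters). Uniqueness, the "G for gene" rule, and species prefixes need database lookups or curator judgment and are not attempted here.

```python
import re

# Assumption: "Latin letters and Arabic numerals" is read as ASCII A-Z, a-z, 0-9.
_ALLOWED = re.compile(r"[A-Za-z0-9]+")


def check_symbol(symbol: str, name: str) -> list[str]:
    """Return a list of guideline issues for a proposed symbol/name pair."""
    issues = []
    name = name.strip()
    # Rule: only Latin letters and Arabic numerals, no punctuation.
    if not _ALLOWED.fullmatch(symbol):
        issues.append("symbol must contain only Latin letters and Arabic numerals")
    # Rule: first letter of the symbol should match the first letter of the name.
    if symbol and name and symbol[0].lower() != name[0].lower():
        issues.append("first letter of symbol differs from first letter of name")
    return issues


print(check_symbol("OVAL", "ovalbumin"))   # no issues reported
print(check_symbol("OV-AL", "ovalbumin"))  # punctuation is flagged
```

A check like this is best treated as a warning for a biocurator rather than a hard rejection, since some approved names do not follow the first-letter rule.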
We also capture aliases or synonyms for avian genes. In many cases where a standardized gene name is applied to an avian gene there will be other names used to report this gene, oftentimes based on separate reports of the gene in published literature. By making this data available, researchers will be able to better find and evaluate available literature for the gene(s) they are studying.
Since avian gene nomenclature is to be based on existing human gene nomenclature where possible [3], the first step is to identify strict 1:1 orthologs and assign these avian genes symbols and names based upon the human nomenclature. These orthologs will initially be identified using ortholog prediction and then manually verified by nomenclature biocurators.
In cases where the human ortholog is identified by its chromosome of origin, the letters "orf" for open reading frame and a number (C#orf#), we will prefix the human symbol with the relevant avian chromosome number. For example, the chicken ortholog for human C1orf26 (HGNC:16785) is located on chromosome 8 and is designated C8H1orf26, "chromosome 8 open reading frame, human C1orf26". These names will be replaced by more informative nomenclature as more becomes known about these genes and their function.
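The C#orf# convention can be illustrated with a short Python sketch. The function name `avian_orf_nomenclature` is hypothetical, and the pattern assumes human symbols of the form C, a chromosome (number, X, or Y), "orf", and a number. The output format is inferred from the single C8H1orf26 example above.

```python
import re

# Assumed shape of a human placeholder symbol, e.g. C1orf26 or CXorf21.
_HUMAN_ORF = re.compile(r"C(\d+|X|Y)orf(\d+)")


def avian_orf_nomenclature(human_symbol: str, avian_chromosome: str) -> tuple[str, str]:
    """Build the avian symbol and name for a human C#orf# ortholog."""
    match = _HUMAN_ORF.fullmatch(human_symbol)
    if match is None:
        raise ValueError(f"not a C#orf# symbol: {human_symbol!r}")
    human_chromosome, orf_number = match.groups()
    # Avian chromosome first, then "H" plus the human chromosome and orf number.
    symbol = f"C{avian_chromosome}H{human_chromosome}orf{orf_number}"
    name = f"chromosome {avian_chromosome} open reading frame, human {human_symbol}"
    return symbol, name


print(avian_orf_nomenclature("C1orf26", "8"))
# ('C8H1orf26', 'chromosome 8 open reading frame, human C1orf26')
```

The avian chromosome is passed as a string so that sex chromosomes such as Z or W could be supplied as well as numbered chromosomes.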
Novel avian genes fall into two broad categories: novel genes predicted by bioinformatic gene prediction programs and novel avian genes that have been studied prior to the completion of avian genome sequencing. Putative open reading frames from the NCBI gene prediction pipeline are designated with a locus number, for example LOC777587. In cases where there is no assigned nomenclature for orthologous genes, the LOC# or gene project equivalent will be used as the temporary gene symbol.
Avian genes that do not have strict 1:1 orthologs will be manually curated and assigned nomenclature on the basis of their current names. Only unique symbols and gene names will be approved. Where individual researchers have published gene names, they will be asked to provide feedback on nomenclature that falls within current nomenclature guidelines but is also as close to the original name as possible. Where genes have been published under more than one name, efforts will be made to contact all parties to decide upon a unique nomenclature for the gene. If the gene is a member of an established gene family, an alternative symbol may be approved, following consultation with researchers.
Chicken MHC genes that have similarity to a human HLA gene shall be assigned nomenclature that reflects this relationship. The gene name will follow the form:
Major histocompatibility complex class #
The symbol will remain as the assigned chicken designator.
For example, Entrez Gene: 693256 BLB2 becomes
Gene name: Major histocompatibility complex class II beta chain BLB2, (similar to HLA class II, D beta chain)
Gene symbol: BLB2
This is based on its similarity to HLA class II, D beta chain genes (e.g. HGNC IDs: 4945, 4937, 4953).
An exception to the rule of preferring feedback from publishing authors is the case of gene families. Hierarchical symbols for both structural and functional gene families will be used where possible because a stem (or root) symbol as a basis for a symbol series allows easy identification of other family members in both database searches and the literature. Examples of gene families include the G protein-coupled receptor genes (GPR1, GPR2, GPR3, etc.) and the cytochrome P450 superfamily (CYP1A1, CYP21A2, CYP51A1, etc.); the latter already has an established nomenclature for chicken (http://drnelson.uthsc.edu/cytochromeP450.html). For gene families, consecutive symbols take precedence over those published, but again this will be a consultative matter with the research community.
We expect that in the case of gene families, specialized knowledge will be required to correctly determine members of gene families, their order and nomenclature. For example, considerable work has already been done on providing nomenclature for the chicken major histocompatibility B complex genes [4, 5, 6, 7]. We expect to utilize the work done by experts in this field.
For genes with very well-recognized common names, the appropriate gene family name and symbol should be assigned and the common name appended in parentheses in the name field (e.g. ORM1 orosomucoid 1 (ovoglycoprotein)). In cases where chicken researchers and nomenclature experts have agreed that the common name should be kept, the appropriate gene family name should be appended in parentheses in the name field, e.g.:
OVAL ovalbumin (SERPINB14)
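Both conventions produce an entry of the same shape: a symbol, a primary name, and a parenthetical. A minimal Python sketch, with the hypothetical helper name `format_entry`, shows how the two examples above would be assembled. Which name is primary and which is parenthetical remains a curatorial decision that the code does not make.

```python
def format_entry(symbol: str, primary_name: str, parenthetical: str) -> str:
    """Join a symbol, its primary name, and an appended parenthetical name."""
    return f"{symbol} {primary_name} ({parenthetical})"


# Family nomenclature assigned, common name appended in parentheses.
print(format_entry("ORM1", "orosomucoid 1", "ovoglycoprotein"))
# Common name kept, gene family designation appended in parentheses.
print(format_entry("OVAL", "ovalbumin", "SERPINB14"))
```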