James Tauber : James Tauber's Blog 2008/11/02


Cell naming

My previous post introduced my adventures into C. elegans.

I've gone ahead and implemented my own little cell lineage browser using django-mptt. Once I've added more functionality, I'll put it online.
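For anyone unfamiliar with django-mptt: it stores a tree using modified preorder tree traversal (nested sets). Each node gets lft and rght numbers from a depth-first walk, so "all descendants of X" becomes a single range check, and a leaf is any node with rght == lft + 1. The sketch below shows that idea in plain Python on the top of the C lineage. It is not django-mptt's own code; the library exposes the same kinds of queries through model methods such as get_descendants() and is_leaf_node().

```python
# Plain-Python illustration of the nested-set (MPTT) numbering idea.
# The helper names here are hypothetical, not django-mptt's API.

def number_tree(children, root):
    """Assign (lft, rght) numbers to every node via a depth-first walk."""
    numbers = {}
    counter = 1

    def visit(node):
        nonlocal counter
        lft = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        numbers[node] = (lft, counter)
        counter += 1

    visit(root)
    return numbers


def descendants(numbers, node):
    """Every node whose interval lies strictly inside node's interval."""
    lft, rght = numbers[node]
    return sorted(n for n, (l, r) in numbers.items() if lft < l and r < rght)


def is_leaf(numbers, node):
    """A node with no children has consecutive lft/rght numbers."""
    lft, rght = numbers[node]
    return rght == lft + 1


children = {
    "C": ["Ca", "Cp"],
    "Ca": ["Caa", "Cap"],
    "Cp": ["Cpa", "Cpp"],
}
numbers = number_tree(children, "C")
print(descendants(numbers, "Cp"))  # ['Cpa', 'Cpp']
print(is_leaf(numbers, "Cap"))     # True
```

The payoff is that subtree queries need no recursion at read time, which suits a browser that mostly reads a tree that never changes.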

But for now, I'm intrigued by the naming of cells in the lineage. In particular, the majority of cells are named by appending either 'a' or 'p' to the name of the parent cell. What do 'a' and 'p' stand for?

As an example:

P0 -> P1' -> P2' -> C

but then

C divides into Ca, Cp
Ca divides into Caa and Cap; Cp divides into Cpa and Cpp
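Because each division just appends one letter, a name within a founder's lineage can be read backwards into its chain of ancestors. Here's a small sketch of that; lineage_path is a hypothetical helper of my own, and it assumes every step below the founder added exactly one letter, so it can't recover the P0 -> P1' -> P2' -> C steps or handle specially named cells.

```python
def lineage_path(name, founder):
    """Hypothetical helper: list the ancestors of an a/p-named cell, founder first.

    Assumes every division below `founder` appended exactly one letter, which
    holds for names like Cpap but not for special names such as PVR or hyp11.
    """
    if not name.startswith(founder):
        raise ValueError(f"{name!r} is not below founder {founder!r}")
    # Each prefix from the founder up to the full name is one generation.
    return [name[:i] for i in range(len(founder), len(name) + 1)]


print(lineage_path("Cpap", "C"))  # ['C', 'Cp', 'Cpa', 'Cpap']
```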

Caa and Cpa then progress slightly differently from Cap and Cpp:

Caa and Cpa split into Caaa and Caap, and Cpaa and Cpap, respectively
these then split into Caaaa, Caaap, Caapa, Caapp, Cpaaa, Cpaap, Cpapa and Cpapp
these then split into the 16 you'd expect except that Cpapp splits into what are called Cpappd and hyp11, Caapp splits into Caappd and PVR and Caapa splits into Caapap and DVC.

Cap and Cpp progress as follows:

they split into Capa, Capp, Cppa, Cppp as you'd expect
these split into Capaa, Capap, Cappa, Cappp, Cppaa, Cppap, Cpppa, Cpppp as you'd expect
these then split into Capaaa, Capaap, Capapa, Capapp, Cappaa, Cappap, Capppa, Capppp, Cppaaa, Cppaap, Cppapa, Cppapp, Cpppaa, Cpppap, Cppppa, Cppppp
and finally the 32 you would expect except Cppppp splits into what are called Cpppppd and Cpppppv
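The pattern above, a regular a/p rule plus a handful of exceptions, can be written down as a small sketch. The helpers divide and develop are hypothetical, and the daughters in each exception are listed in the order I gave them above, not necessarily in a/p order.

```python
def divide(names, overrides=None):
    """Split each cell into name+'a' and name+'p', unless an override says otherwise."""
    overrides = overrides or {}
    result = []
    for name in names:
        result.extend(overrides.get(name, [name + "a", name + "p"]))
    return result


def develop(founders, rounds, overrides=None):
    """Apply `rounds` successive divisions starting from the given cells."""
    cells = list(founders)
    for _ in range(rounds):
        cells = divide(cells, overrides)
    return cells


# Exceptions as described above (daughter order as listed in the text).
caa_cpa_exceptions = {
    "Cpapp": ["Cpappd", "hyp11"],
    "Caapp": ["Caappd", "PVR"],
    "Caapa": ["Caapap", "DVC"],
}
cap_cpp_exceptions = {"Cppppp": ["Cpppppd", "Cpppppv"]}

caa_cpa = develop(["Caa", "Cpa"], 3, caa_cpa_exceptions)
cap_cpp = develop(["Cap", "Cpp"], 4, cap_cpp_exceptions)
print(len(caa_cpa), len(cap_cpp))  # 16 32
```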

This is just the C lineage, which accounts for less than 10% of the cells. But I'd love to know what the 'a' and 'p' stand for, what the 'd' and 'v' stand for, and why hyp11, PVR and DVC get such distinct names.

UPDATE: I added a "cell type" field to my browser and it revealed a couple of useful things. The "leaf nodes" (i.e. final cells) from Cap and Cpp are all marked as cell type "muscle". The leaf nodes from Cpa (including hyp11) are all marked as cell type "hypodermis". The leaf nodes from Caa are a little more interesting: the Caaa... leaf nodes are all "hypodermis", but the leaf nodes from Caap are the most interesting of all. Caappd is "hypodermis", Caapap is marked as dying, and PVR and DVC are neurons.
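Finding the leaf nodes under a cell and grouping them by type is a simple tree walk. Here's a minimal sketch using only the Caap fragment described above, with plain dicts rather than my actual django-mptt models; the structure and the type labels (paraphrased from the text) are illustrative assumptions.

```python
from collections import defaultdict

# Fragment of the Caap sub-lineage as described above.
children = {
    "Caap": ["Caapa", "Caapp"],
    "Caapa": ["Caapap", "DVC"],
    "Caapp": ["Caappd", "PVR"],
}
cell_type = {
    "Caapap": "dying",
    "DVC": "neuron",
    "Caappd": "hypodermis",
    "PVR": "neuron",
}


def leaves(node):
    """Yield the final (leaf) cells below node, depth first."""
    kids = children.get(node, [])
    if not kids:
        yield node
    for kid in kids:
        yield from leaves(kid)


def leaf_types(node):
    """Group the leaf cells under node by their recorded cell type."""
    groups = defaultdict(list)
    for leaf in leaves(node):
        groups[cell_type.get(leaf, "unknown")].append(leaf)
    return dict(groups)


print(leaf_types("Caap"))
# {'dying': ['Caapap'], 'neuron': ['DVC', 'PVR'], 'hypodermis': ['Caappd']}
```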

UPDATE 2: Just as a point of comparison, there is another founder cell, D, whose descendants are a lot cleaner. D results in 20 cells, all of type "muscle", and all are named with a/p. The only reason the count isn't a power of 2 is that the two D{a|p}pp cells (Dapp and Dppp) each split into 4, whereas the others at that level split into only 2.
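The arithmetic checks out: three rounds of a/p division give 8 cells, and 6 × 2 + 2 × 4 = 20. As a quick sketch (counting only, since I'm not assuming what the extra daughters of Dapp and Dppp are called):

```python
from itertools import product

# Names after three rounds of a/p division below founder D: Daaa ... Dppp.
level3 = ["D" + "".join(letters) for letters in product("ap", repeat=3)]

# Dapp and Dppp each end up as 4 cells; every other level-3 cell as 2.
fanout = {name: 4 if name in ("Dapp", "Dppp") else 2 for name in level3}
print(len(level3), sum(fanout.values()))  # 8 20
```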

UPDATE 3: Based on http://en.wikipedia.org/wiki/Anatomical_terms_of_location I'm now convinced a, p, d, and v refer to anterior, posterior, dorsal and ventral respectively.
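With that reading, the suffix after the founder name can be spelled out as a sequence of directions. Here's a hypothetical decode helper that covers only the four letters discussed here and rejects anything else, so specially named cells like hyp11 or PVR don't decode.

```python
# Letter meanings as concluded above (anatomical terms of location).
AXIS = {"a": "anterior", "p": "posterior", "d": "dorsal", "v": "ventral"}


def decode(name, founder):
    """Hypothetical helper: read the division letters after `founder` as directions."""
    if not name.startswith(founder):
        raise ValueError(f"{name!r} is not below founder {founder!r}")
    suffix = name[len(founder):]
    try:
        return [AXIS[letter] for letter in suffix]
    except KeyError as err:
        raise ValueError(f"unrecognised division letter in {name!r}") from err


print(decode("Cpappd", "C"))
# ['posterior', 'anterior', 'posterior', 'posterior', 'dorsal']
```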

Created on Nov. 2, 2008 : Last modified Nov. 2, 2008

C. elegans

I don't normally talk about biology because I don't know much about it. Growing up, I was the physicist and my sisters were the biologists. But I'm interested in the computational modeling of just about anything, so I've long been interested in biological simulations, artificial life, etc., and have recently been getting into computational neuroscience in a fairly big way.

I can't remember when I first read about Caenorhabditis elegans (henceforth abbreviated, as it is by biologists, to C. elegans) but it was probably about a year ago and it totally blew my mind.

C. elegans is a tiny roundworm, about one millimeter long, but what is remarkable is just how much we know about it. How much? Well, we know every single cell and how it develops from the single-cell zygote. We know every single neuron and how the entire nervous system is wired. That's pretty incredible. Oh, and of course we've sequenced the entire genome.

C. elegans, along with fruit flies and zebrafish, is an example of a model organism. Model organisms are those that have been studied in great depth in the hope of understanding organisms in general (including humans). Numerous characteristics make a particular organism suitable as a model. In the case of C. elegans, I think it's how quickly they reproduce, and the fact that they have a highly defined development and a fixed number of cells. They can also be revived after being frozen.

Now, C. elegans are almost always hermaphrodites, although a tiny fraction are male. The hermaphrodites have 959 somatic cells and, as I mentioned, we know how each of them developed from the initial zygote. So P0 splits into AB and P1', P1' into EMS and P2', EMS into E and MS, E into Ea and Ep, and so on. This tree structure is called the cell lineage or pedigree, and it's available online at http://www.wormbase.org/db/searches/pedigree. For each cell there's also an information page, and that information is also available in an XML format (e.g. http://www.wormbase.org/db/cell/cell.cgi?name=EMS;class=Cell). Because I wanted to dig around a little more, I ended up writing a data-scraping script in Python to download all the XML files (parsing each one to find out what the daughter cells were, then recursing into them).
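The scraper is essentially a recursive walk. Here's a minimal sketch of the idea, not my actual script: the XML element names (cell, daughter) are hypothetical placeholders, since the real WormBase files would need their own parsing, and fetch is passed in as a parameter so the example runs against an offline stand-in instead of the live site.

```python
import xml.etree.ElementTree as ET


def daughters_from_xml(xml_text):
    """Pull daughter-cell names out of one cell's XML page.

    The <daughter> element name is a hypothetical placeholder for
    whatever the real files use.
    """
    root = ET.fromstring(xml_text)
    return [d.text.strip() for d in root.iter("daughter") if d.text]


def crawl(name, fetch, pages=None):
    """Fetch `name`, parse its daughters, then recurse into each one.

    `fetch` is any callable taking a cell name and returning its XML text;
    in a real scraper it would download the cell's page.
    """
    if pages is None:
        pages = {}
    if name in pages:  # already downloaded
        return pages
    xml_text = fetch(name)
    pages[name] = xml_text
    for daughter in daughters_from_xml(xml_text):
        crawl(daughter, fetch, pages)
    return pages


# Offline stand-in for the web site, covering the first few divisions.
FAKE_SITE = {
    "P0": "<cell><daughter>AB</daughter><daughter>P1'</daughter></cell>",
    "AB": "<cell/>",
    "P1'": "<cell><daughter>EMS</daughter><daughter>P2'</daughter></cell>",
    "EMS": "<cell><daughter>E</daughter><daughter>MS</daughter></cell>",
    "P2'": "<cell/>",
    "E": "<cell/>",
    "MS": "<cell/>",
}

pages = crawl("P0", FAKE_SITE.__getitem__)
print(sorted(pages))
```

Keeping the downloaded pages in a dict doubles as a visited set, so no cell is fetched twice.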

The data I've downloaded also includes the neuronal wiring. At some point I'd like to do a little Django app for navigating around the data in a way that's a little friendlier for the layperson. Might also be a good excuse for me to try out django-mptt.

The data is all in a format that is shared across different model organism research projects, and there is open source software for dealing with this data (especially the genomic data). For example, GBrowse is used for browsing and searching the genomes of both C. elegans and the fruit fly. GBrowse is part of the GMOD project. Most of it appears to be Perl CGI scripts.

Fascinated as I am by computer modeling, but completely ignorant of the state of biology, I wonder how far we are from cell-level simulations of organisms like C. elegans. Do we know enough to even begin thinking about this for a 959-cell organism? After all, isn't the Blue Brain project supposed to eventually simulate a 10,000-cell neocortical column? (edit: it already is, see comments below) Or how far are we from simulating the cell development of C. elegans, i.e. given P0 (including the genome), pressing play and getting the 959 cells of the adult C. elegans hermaphrodite at the end? Given that (edit: one of) the most powerful computer(s) in the world and a multi-year project are what it takes for 10,000 cells, I guess we're not going to be writing C. elegans simulators in Python on our desktops any time soon.

But hey, it would sure be cool.

Created on Nov. 2, 2008 : Last modified Nov. 2, 2008