Tests of evolution online, as many as you like


hecaterin
04-07-2008, 07:20 AM
I've just discovered NCBI's new "distance tree" feature when viewing blast results. Cute. And here you go, creationists, this is your chance!

Take a gene, look up its sequence ID, blast it against refseq_rna or nr, and view the distance results as a tree. Nifty. And entirely open to the public. So go, knock yourselves out, see if you can find a gene which violates the standard nested hierarchy. Hint: not CRYAA, human crystallin alpha A. One down, only 3 million genes more to check.

I wonder how many creationists have actually been to NCBI (http://www.ncbi.nlm.nih.gov/). I mean, this is just one website containing one subset of the mounds of data that they need to explain better than evolutionary theory... :D

Martin B
04-07-2008, 11:23 AM
Be careful what you ask for. Particularly using distance methods, you probably can find this -- especially for a single gene. Distance measures don't produce nested hierarchies in the same way that phylogenetic methods do. They produce clusters that minimize differences. If two distantly related taxa happen to share a lot of plesiomorphies (i.e. primitive features) or similarities simply due to chance ("long branches", for instance) they are going to pair up as sister groups in the tree.

So, I respectfully suggest that you withdraw or alter this challenge.
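To make this warning concrete, here is a minimal Python sketch. It assumes simple UPGMA average-linkage clustering, which is not necessarily the algorithm behind NCBI's distance-tree view, and four invented, hypothetical sequences. The assumed true tree is ((A,B),(C,D)): A and B share one derived change, and C and D share another. A and C evolve slowly, while B and D each pick up many unique changes. Because A and C both keep mostly ancestral states, they end up closest in raw distance and get paired.

```python
from itertools import combinations

def hamming(s, t):
    """Number of aligned sites at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t))

def upgma(seqs):
    """Minimal UPGMA: repeatedly merge the two clusters with the smallest
    average pairwise Hamming distance. Returns the topology as nested
    tuples of taxon names (branch lengths are ignored)."""
    # cluster label -> (topology so far, list of member taxa)
    clusters = {name: (name, [name]) for name in seqs}

    def dist(c1, c2):
        m1, m2 = clusters[c1][1], clusters[c2][1]
        total = sum(hamming(seqs[a], seqs[b]) for a in m1 for b in m2)
        return total / (len(m1) * len(m2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merged = ((clusters[c1][0], clusters[c2][0]),
                  clusters[c1][1] + clusters[c2][1])
        del clusters[c1], clusters[c2]
        clusters[c1 + "+" + c2] = merged
    return next(iter(clusters.values()))[0]

# Hypothetical toy alignment; assumed true tree is ((A,B),(C,D)).
# Site 0: change shared by A and B. Site 9: change shared by C and D.
# B and D each carry five extra unique changes (long branches).
toy_seqs = {
    "A": "GAAAAAAAAA",
    "B": "GCCCCCAAAA",
    "C": "AAAAAAAAAG",
    "D": "AAAATTTTTG",
}

print(upgma(toy_seqs))
```

With these toy sequences the result is ('D', ('B', ('A', 'C'))). A is grouped with C, even though the assumed history groups A with B.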

Dlx2
04-07-2008, 04:51 PM
Additionally, some genes are actually highly convergent in function, for example, rhodopsin.

Additionally, the NCBI distance tree function is built around NCBI's alignment protocol, which is not necessarily the best alignment program out there. A good alignment is pretty critical for a phylogenetic analysis.

VoxRat
04-07-2008, 05:20 PM
Additionally, some genes are actually highly convergent in function, for example, rhodopsin.

Though, even here, you will probably get a lot of third-codon-position variation that will generally confirm the nested hierarchies paradigm.

Dlx2
04-07-2008, 06:35 PM
Additionally, some genes are actually highly convergent in function, for example, rhodopsin.

Though, even here, you will probably get a lot of third-codon-position variation that will generally confirm the nested hierarchies paradigm.

....not really, considering that in most lineages, 3rd codon is uninformative.

Mike PSS
04-07-2008, 07:00 PM
Be careful what you ask for. Particularly using distance methods, you probably can find this -- especially for a single gene. Distance measures don't produce nested hierarchies in the same way that phylogenetic methods do. They produce clusters that minimize differences. If two distantly related taxa happen to share a lot of plesiomorphies (i.e. primitive features) or similarities simply due to chance ("long branches", for instance) they are going to pair up as sister groups in the tree.

So, I respectfully suggest that you withdraw or alter this challenge.
Is there an example from the results that would counter the OP claim?

Not meaning to be mean, but it IS a challenge. I know I can't meet it, but can you?

VoxRat
04-07-2008, 07:22 PM
Additionally, some genes are actually highly convergent in function, for example, rhodopsin.

Though, even here, you will probably get a lot of third-codon-position variation that will generally confirm the nested hierarchies paradigm.

....not really, considering that in most lineages, 3rd codon is uninformative.

"uninformative"? In what sense?

In that the amount of genetic information (i.e. how much it determines the amino acid) is relatively low? But of course - that's the whole point.

Or in the sense that the amount of information that it provides as to phylogenetic relatedness is low? Because there I would beg to differ.


F'rinstance...
My cytochrome C and a chimpanzee's cytochrome C (the protein) are identical, amino acid for amino acid. My cytochrome C and your cytochrome C are identical.

My coding sequence for cytochrome C, however, differs from a chimpanzee's in four (codon position #3) nucleotides. Mine and yours, however, are identical (I'm pretty sure).

I call that consistent with the "nested hierarchies" prediction, and informative.
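The cytochrome C point can be checked mechanically: two coding sequences can encode identical proteins and still differ, with every difference at a third codon position. Here is a minimal sketch. It assumes two invented four-codon fragments, which are hypothetical and not real human or chimpanzee sequence, and a deliberately partial codon table that covers only the codons used.

```python
# Deliberately partial codon table: only the codons used in this example.
CODONS = {
    "GGT": "G", "GGC": "G",   # glycine
    "GAA": "E", "GAG": "E",   # glutamate
    "AAA": "K", "AAG": "K",   # lysine
    "ATT": "I", "ATC": "I",   # isoleucine
}

def translate(seq):
    """Translate an in-frame coding sequence using the partial table above."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))

def diffs_by_codon_position(seq1, seq2):
    """Count mismatches at codon positions 1, 2 and 3 of two aligned,
    in-frame coding sequences of equal length."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("need two in-frame sequences of equal length")
    counts = {1: 0, 2: 0, 3: 0}
    for i, (a, b) in enumerate(zip(seq1, seq2)):
        if a != b:
            counts[i % 3 + 1] += 1
    return counts

# Hypothetical fragments, NOT real cytochrome c sequence.
seq_x = "GGTGAAAAAATT"
seq_y = "GGCGAGAAGATC"

print(translate(seq_x), translate(seq_y))      # GEKI GEKI
print(diffs_by_codon_position(seq_x, seq_y))   # {1: 0, 2: 0, 3: 4}
```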

Martin B
04-07-2008, 08:35 PM
F'rinstance...
My cytochrome C and a chimpanzee's cytochrome C (the protein) are identical, amino acid for amino acid. My cytochrome C and your cytochrome C are identical.

My coding sequence for cytochrome C, however, differs from a chimpanzee's in four (codon position #3) nucleotides. Mine and yours, however, are identical (I'm pretty sure).

I call that consistent with the "nested hierarchies" prediction, and informative.

Actually, that's just a phenetic argument and has nothing to do with nested hierarchy or relatedness at all. Which of the two is retaining the primitive state, the human's or the chimp's? Or, are they both derived in some way? How do you know? You need an outgroup.
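The outgroup comparison described here can be sketched as a simple rule: at each site where two ingroup taxa differ, treat the outgroup's state as ancestral. This rests on an assumption, since the outgroup can itself be misleading through homoplasy. The function name `polarize` and all sequences below are hypothetical.

```python
def polarize(ingroup, outgroup):
    """For each site where two ingroup taxa differ, name the taxon that
    carries the derived state, treating the outgroup's state as ancestral.

    ingroup  -- dict with exactly two entries: taxon name -> aligned sequence
    outgroup -- aligned outgroup sequence
    Returns {site_index: taxon_name or None}; None means the outgroup matches
    neither taxon, so this outgroup alone cannot polarize the site.
    """
    (name1, s1), (name2, s2) = ingroup.items()
    if not len(s1) == len(s2) == len(outgroup):
        raise ValueError("sequences must be aligned to the same length")
    result = {}
    for i, (a, b, o) in enumerate(zip(s1, s2, outgroup)):
        if a == b:
            continue              # no difference to polarize at this site
        if a == o:
            result[i] = name2     # taxon 2 departs from the outgroup state
        elif b == o:
            result[i] = name1     # taxon 1 departs from the outgroup state
        else:
            result[i] = None      # both differ from the outgroup: ambiguous
    return result

# Hypothetical sequences for two ingroup taxa and one outgroup.
print(polarize({"taxon1": "ACGTAC", "taxon2": "ACATAG"}, "ACGTAT"))
# {2: 'taxon2', 5: None}
```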

Martin B
04-07-2008, 08:44 PM
Is there an example from the results that would counter the OP claim?

Not meaning to be mean, but it IS a challenge. I know I can't meet it, but can you?

Not sure what you're asking. What results?

Are you asking if I could run the distance-based cluster analysis and get a tree that violates nested hierarchies? Well, yes and no.

Yes, in that the topology of the resultant dendrogram would not be congruent with an accepted topology for organismal interrelationships as a whole.

No, because the comparison is invalidated by the fact that a phenogram and a cladogram are not like objects. It would not technically invalidate nested hierarchies and, as such, never could. Not even in principle.

This is the issue here. A distance-based tree is neither a cladogram nor a phylogeny (even if the distances have evolutionary causes--and they do). Distance-based results presented on a bifurcating network (a tree) don't reflect relative relationships and can (and frequently do) produce extremely strange topologies if interpreted as a phylogeny.

ericmurphy
04-07-2008, 08:54 PM
What's the distinction between a phenogram and a cladogram? Is it that phenograms don't take into account evolutionary history or the relative importance of characters, and cladograms do?

Dlx2
04-07-2008, 08:59 PM
"uninformative"? In what sense?

In that the amount of genetic information (i.e. how much it determines the amino acid) is relatively low? But of course - that's the whole point.

Or in the sense that the amount of information that it provides as to phylogenetic relatedness is low? Because there I would beg to differ.

Uninformative in that 3rd codon positions evolve so fast that the signal-to-noise ratio is dangerously low. 3rd codon positions are deep in the Felsenstein Zone in most cases.
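One way to see why fast-evolving sites lose signal is the Jukes-Cantor (1969) substitution model. It is used here only as an illustrative assumption, since third positions in real genes usually need richer models. Under this model, the expected proportion of differing sites, p, levels off at 0.75 no matter how many substitutions actually occurred. Once sites are saturated, more change adds almost no observable difference. Here is a minimal sketch:

```python
import math

def expected_p_distance(d):
    """Expected proportion of differing sites after d substitutions per site
    under the Jukes-Cantor (1969) model."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc_distance(p):
    """Jukes-Cantor estimate of substitutions per site from an observed
    proportion of differing sites p (undefined once p reaches 0.75)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Observed difference barely moves once sites are saturated.
for d in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"true d = {d:5.1f}  ->  expected p = {expected_p_distance(d):.3f}")
```

Near saturation, the correction in `jc_distance` becomes very steep. A small error in the observed p then turns into a large error in the estimated distance, which is one way to read the signal-to-noise point above.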

Martin B
04-07-2008, 09:02 PM
What's the distinction between a phenogram and a cladogram? Is it that phenograms don't take into account evolutionary history or the relative importance of characters, and cladograms do?

Cladograms are based on the parsimonious distribution of similarities, while a phenogram is based on a metric of the absolute number of similarities.

The classic example: in a rooted phenogram a lizard and a crocodile would form a pair to the exclusion of a bird. In a rooted cladogram, a crocodile and a bird would form a pair to the exclusion of a lizard.

What matters in a cladogram are the shared unique (derived) characters and their distribution in a tree (nested hierarchies). What matters in a phenogram is the actual amount of similarity and difference, which is used to pair up branches and give those branches lengths.
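The lizard/crocodile/bird example can be reproduced with a toy character matrix. This minimal sketch assumes six invented binary characters (hypothetical, not real anatomy) and an all-zero outgroup. It also uses a simple three-taxon comparison rather than a full parsimony search. Raw similarity pairs the lizard with the crocodile, because the bird has many unique changes. Counting only shared derived states pairs the crocodile with the bird.

```python
from itertools import combinations

# Hypothetical 0/1 character matrix (0 = state seen in the outgroup).
# Characters are invented for illustration, not real anatomy:
# char 0 is shared by crocodile and bird, chars 1-4 are unique to the
# bird, char 5 is unique to the lizard.
MATRIX = {
    "lizard":    [0, 0, 0, 0, 0, 1],
    "crocodile": [1, 0, 0, 0, 0, 0],
    "bird":      [1, 1, 1, 1, 1, 0],
}
OUTGROUP = [0, 0, 0, 0, 0, 0]

def phenetic_pair(matrix):
    """Pair with the fewest total differences (what a phenogram groups)."""
    def n_diffs(pair):
        a, b = pair
        return sum(x != y for x, y in zip(matrix[a], matrix[b]))
    return min(combinations(matrix, 2), key=n_diffs)

def shared_derived(matrix, outgroup, pair):
    """Count characters where both taxa in `pair` differ from the outgroup
    while every other taxon keeps the outgroup state."""
    others = [t for t in matrix if t not in pair]
    count = 0
    for i, root_state in enumerate(outgroup):
        in_pair = all(matrix[t][i] != root_state for t in pair)
        outside = all(matrix[t][i] == root_state for t in others)
        if in_pair and outside:
            count += 1
    return count

def cladistic_pair(matrix, outgroup):
    """Pair supported by the most shared derived characters."""
    return max(combinations(matrix, 2),
               key=lambda pair: shared_derived(matrix, outgroup, pair))

print(phenetic_pair(MATRIX))             # ('lizard', 'crocodile')
print(cladistic_pair(MATRIX, OUTGROUP))  # ('crocodile', 'bird')
```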

Dlx2
04-07-2008, 09:02 PM
What's the distinction between a phenogram and a cladogram? Is it that phenograms don't take into account evolutionary history or the relative importance of characters, and cladograms do?

Phenograms are simple diagrams of distance. As such, you don't establish character polarities, and your major branches generally represent gaps in your sample rather than hierarchical data. Cladograms are supposed to be better about that, although there are still major issues with long-branch attraction (LBA). A cladogram does, however, analyze evolution as a process rather than a simple pattern, but we're still essentially looking at a cluster analysis of sorts for parsimony analysis. Statistical methods are probably the best choices, though, in my opinion.

Martin B
04-07-2008, 09:05 PM
... the Felsenstein Zone ...

*shudders at the thought of such a place*

Martin B
04-07-2008, 09:09 PM
A cladogram does, however, analyze evolution as a process rather than a simple pattern, but we're still essentially looking at a cluster analysis of sorts for parsimony analysis.

Care to explain this?

Dlx2
04-07-2008, 09:23 PM
A cladogram does, however, analyze evolution as a process rather than a simple pattern, but we're still essentially looking at a cluster analysis of sorts for parsimony analysis.

Care to explain this?

Objective function methods analyze dendrograms with the inference that character distance represents a discrete event rather than a distribution. This is not the case for algorithmic methods (ultrametric or neighbor-joining distance methods). So, while it is recovering a hierarchy, that hierarchy is the product of a process, rather than a simple distribution. The difference between statistical methods and parsimony methods is that parsimony methods assume that all change is essentially unique, as the null hypothesis of a parsimony analysis is that there is zero change. Statistical methods are in some ways preferable, but there are certainly methodological issues with those as well, most notably that you have to explicitly determine your character change/stasis probabilities a priori.
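On the parsimony side, Fitch's (1971) algorithm is one standard way to count the minimum number of state changes a fixed tree requires. Parsimony prefers the tree with the lower count. The sketch below assumes rooted binary trees written as nested tuples and a made-up two-site alignment. It only scores the topologies it is given; a real analysis searches over many.

```python
def fitch(tree, states):
    """Fitch (1971) small-parsimony pass for one character.

    tree   -- nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping leaf name -> observed state
    Returns (candidate state set at this node, minimum number of changes).
    """
    if isinstance(tree, str):          # a leaf: its observed state, no cost
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch(left, states)
    rset, rcost = fitch(right, states)
    common = lset & rset
    if common:
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1  # disjoint sets force one change

def tree_length(tree, alignment):
    """Sum of Fitch changes over every site (alignment: leaf -> sequence)."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )

alignment = {"A": "GA", "B": "GA", "C": "TA", "D": "TC"}  # hypothetical
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(tree_length(tree_1, alignment))  # 2
print(tree_length(tree_2, alignment))  # 3
```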

Martin B
04-07-2008, 09:28 PM
I fail to see how you're looking at anything other than a pattern. The mere inference that a pattern reflects a process is still analyzing a pattern. You're simply adding another level of inference.

You weren't there. You didn't see it happen. You're analyzing patterns in the present and inferring processes in the past.

ericmurphy
04-07-2008, 09:40 PM
What's the distinction between a phenogram and a cladogram? Is it that phenograms don't take into account evolutionary history or the relative importance of characters, and cladograms do?

Phenograms are simple diagrams of distance. As such, you don't establish character polarities, and your major branches generally represent gaps in your sample rather than hierarchical data. Cladograms are supposed to be better about that, although there are still major issues with LBA. A cladogram does, however, analyze evolution as a process rather than a simple pattern, but we're still essentially looking at a cluster analysis of sorts for parsimony analysis. Statistical methods are probably the best choices, though, in my opinion.

So if you have a group of, say, half a dozen different taxa (an amniote, a teleost fish, a lissamphibian, a fungus, etc.), a phenogram would just be a diagram of how closely or distantly related each taxon is to the others, without regard to polarities or the evolutionary processes that would get from one taxon to the next? I.e., a phenogram would tell you that an amniote is more closely related to a teleost than to a fungus, but it wouldn't tell you whether an amniote is derived from a teleost, or the other way around?

Dlx2
04-07-2008, 09:41 PM
I fail to see how you're looking at anything other than a pattern. The mere inference that a pattern reflects a process is still analyzing a pattern. You're simply adding another level of inference.

You weren't there. You didn't see it happen. You're analyzing patterns in the present and inferring processes in the past.

The choice between Euclidean and Hamming distance makes this inference for you. By choosing Hamming distance, you're already assuming substitution of one state for another, which immediately assumes a transformational process, whether it is a transformation of position in space or a transformation of character state in time. It's an implicit assumption of the method that you're looking at something process-based, so you're not actually adding a new level of inference. You are, however, adding an assumption, but it's an assumption that is reasonable to make and is sufficiently met by the data. There are other assumptions made by a maximum parsimony analysis that I feel differently about, but I don't have a problem with the assumption that character distance represents transformation.

ck1
04-08-2008, 01:30 AM
Uninformative in that 3rd codon positions evolve so fast that the signal-to-noise ratio is dangerously low. 3rd codon positions are deep in the Felsenstein Zone in most cases.

Can you explain this?

ck1
04-08-2008, 01:36 AM
What's the distinction between a phenogram and a cladogram? Is it that phenograms don't take into account evolutionary history or the relative importance of characters, and cladograms do?

Cladograms are based on the parsimonious distribution of similarities, while a phenogram is based on a metric of the absolute number of similarities.

The classic example: in a rooted phenogram a lizard and a crocodile would form a pair to the exclusion of a bird. In a rooted cladogram, a crocodile and a bird would form a pair to the exclusion of a lizard.

What matters in a cladogram are the shared unique (derived) characters and their distribution in a tree (nested hierarchies). What matters in a phenogram is the actual amount of similarity and difference, which is used to pair up branches and give those branches lengths.

How does this analysis vary when considering morphological features vs amino acid or nucleotide sequence differences/identities?

ck1
04-08-2008, 01:42 AM
I fail to see how you're looking at anything other than a pattern. The mere inference that a pattern reflects a process is still analyzing a pattern. You're simply adding another level of inference.

You weren't there. You didn't see it happen. You're analyzing patterns in the present and inferring processes in the past.

The choice between Euclidean and Hamming distance makes this inference for you. By choosing Hamming distance, you're already assuming substitution of one state for another, which immediately assumes a transformational process, whether it is a transformation of position in space or a transformation of character state in time. It's an implicit assumption of the method that you're looking at something process-based, so you're not actually adding a new level of inference. You are, however, adding an assumption, but it's an assumption that is reasonable to make and is sufficiently met by the data. There are other assumptions made by a maximum parsimony analysis that I feel differently about, but I don't have a problem with the assumption that character distance represents transformation.

It might be useful to translate this for the average poster on this forum.

You could start by explaining what you mean by Euclidean vs Hamming distance.

hecaterin
04-08-2008, 02:12 AM
OK, OK!

Yes, I'm going to have to read some more of these replies more carefully. But even I know the basic dangers of data fishing expeditions already.

My main point is just the trivial: gosh, wow, look at all this incredible amount of data, and AFAIK nothing in there falsifies common descent. And here it is totally open to the public. So go on, bring it on. Creationists have *THIS* much data that they need to explain better than current science. Are they even aware of how big *THIS* much data is?

If anyone *can* actually find a couple of distance trees that don't match 100% with the standard ideas of common descent, it's not going to cause a crisis. There are controversies over where particular ancient splits occur, so it could be an educational experience. Exceptions prove (test) the rule.

Dlx2
04-08-2008, 02:27 AM
Can you explain this?

Felsenstein Zone of Inconsistency. In circumstances with significantly heterogeneous evolutionary processes, you can get positively misleading results. A full discussion can be found in Felsenstein (1978):

http://www.jstor.org/pss/2412923

Dlx2
04-08-2008, 02:58 AM
It might be useful to translate this for the average poster on this forum.

You could start by explaining what you mean by Euclidean vs Hamming distance.

Consider a grid with three axes, X, Y, and Z.

Now, consider two points:

A (0,0,0)
B (1,1,1)

If we take a direct distance from A to B, this distance is sqrt(3). This is Euclidean distance.

However, consider that each axis was a separate change. On axis X, 0 had to become 1. On axis Y, 0 had to become 1, and on axis Z, 0 had to become 1. As such, for A to become B, there must be a transformation of 3 steps. This is Hamming distance.
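This example translates directly into code. Here is a minimal sketch. The extra point C = (0, 0, 3) is a hypothetical addition showing that the two measures can rank distances differently. Euclidean distance puts B closer to A than C is. Hamming distance, which counts the number of characters that changed, puts C closer.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))

def hamming(p, q):
    """Number of coordinates (characters) whose values differ."""
    return sum(a != b for a, b in zip(p, q))

A = (0, 0, 0)
B = (1, 1, 1)
C = (0, 0, 3)  # hypothetical extra point: one large change on one axis

print(euclidean(A, B), hamming(A, B))  # sqrt(3) ≈ 1.732, and 3 steps
print(euclidean(A, C), hamming(A, C))  # 3.0, and 1 step
```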

Obd
04-08-2008, 10:04 AM
It might be useful to clarify this a bit more. For two vectors X = (x1, x2, x3) and Y = (y1, y2, y3), the Euclidean distance is calculated as follows:
sqrt((y1-x1)^2 + (y2-x2)^2 + (y3-x3)^2).

Hamming distance is simply the number of positions at which the two vectors (strings is probably a more adequate name) differ.

Given my limited knowledge of genetics, I can't see why one would use a Euclidean distance measure, since as far as I know there is no quantitative difference between the distance from, say, A to C and from A to T. Maybe someone with a little more knowledge of genetics could enlighten me on that issue?

Dlx2
04-08-2008, 10:13 AM
Given my limited knowledge of genetics, I can't see why one would use a Euclidean distance measure, since as far as I know there is no quantitative difference between the distance from, say, A to C and from A to T. Maybe someone with a little more knowledge of genetics could enlighten me on that issue?

Well, there's a probabilistic difference between the rate of transitions and the rate of transversions. That is to say, replacing a pyrimidine with a pyrimidine (or a purine with a purine) happens more readily than replacing a pyrimidine with a purine. As such, it does make sense for these to represent different quantities.

At the same time, I agree that Euclidean distance makes jack shit sense here. Essentially, it requires that one reconstruct impossible character states for nodes. At least when you're using a maximum parsimony method, you're looking at a method which assesses character evolution as a discrete event, which is certainly a step above neighbor-joining (NJ).
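Distance corrections such as Kimura's (1980) two-parameter (K2P) model are built around the transition/transversion distinction. K2P is used here only as a named example, not as the method anyone in the thread ran. This minimal sketch assumes two short, invented aligned sequences containing only A, C, G and T.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_differences(seq1, seq2):
    """Return (transitions, transversions) between two aligned DNA
    sequences; assumes only the letters A, C, G, T (no gaps)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ts += 1   # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            tv += 1   # purine <-> pyrimidine
    return ts, tv

def k2p_distance(seq1, seq2):
    """Kimura (1980) two-parameter distance: the proportions of
    transitions (P) and transversions (Q) enter the correction separately."""
    ts, tv = classify_differences(seq1, seq2)
    n = len(seq1)
    P, Q = ts / n, tv / n
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)

# Hypothetical aligned sequences.
s1 = "AACCGGTTAC"
s2 = "AGCTGGTAAC"
print(classify_differences(s1, s2))  # (2, 1)
print(k2p_distance(s1, s2))
```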

hecaterin
04-08-2008, 02:05 PM
BTW, all the simple distance trees that I've seen match the standard phylogeny nicely. How common would a mismatch be, in reality?

And is a sequence distance tree an example of a "phenogram"?