One possible relation is {(a, b) | a and b share two biological parents}. This relation is an equivalence relation. (I'm assuming that everyone has exactly two biological parents.)

Another possible relation is {(a, b) | a and b share at least one biological parent}. This relation is reflexive and symmetric, but not transitive. There are cases where someone has two half-siblings who are neither full siblings nor half-siblings of each other.

Another possible relation is {(a, b) | a is a direct ancestor of b}. This relation is transitive, but neither reflexive nor symmetric.

Another possible relation is {(a, b) | the set of a's ancestors is a subset of the set of b's ancestors}. This relation is reflexive and transitive, but not symmetric.

What's cool about equivalence relations on a set X is that such a relation induces a partition of X.

Roughly, a partition of a nonempty set X is a way of chopping up X into distinct nonempty non-overlapping pieces. More precisely, a partition of X is a set P whose elements are nonempty subsets of X, such that each element of X is an element of exactly one element of P.

Let R be an equivalence relation on X. For each x ∈ X, we define the "equivalence class" of x (under R) by

[x]_{R} = {y ∈ X | (y, x) ∈ R}.

(We can omit the subscript from "[x]" if we make it clear from the context which relation is being specified.)

Then the set {[x]_{R} | x ∈ X} (which is sometimes denoted X/R) is a partition of X.

Let's prove that. (Here, I'll drop the subscript from the equivalence class notation, and I'll use "∼" to denote relatedness.) Every element of our alleged partition is an equivalence class of some element of X, and is therefore a subset of X. The reflexive property of R guarantees that x ∈ [x] for every x ∈ X, so every such equivalence class is nonempty, and every element of X is in at least one of those equivalence classes.

All that remains to show is that an element of X cannot be in two distinct equivalence classes. Suppose that x ∈ [y]. Then x ∼ y and so y ∼ x by the symmetric property. For any z ∈ [y], we have z ∼ y and hence z ∼ x by the transitive property, so z ∈ [x]. Thus [y] ⊆ [x]. Similarly, for any z ∈ [x], we have z ∼ x and hence z ∼ y by the transitive property, so z ∈ [y]. Thus [x] ⊆ [y]. It follows that [y] = [x], and thus [x] is the only equivalence class containing x. Done.
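If X is finite, we can compute X/R directly and watch the partition appear. Here's a quick Python sketch; the particular relation (congruence mod 3 on {1, ..., 9}) is just my example, not anything from the discussion above:

```python
# Sketch: computing X/R for a finite equivalence relation.
# Example relation (my choice): "same remainder mod 3" on X = {1, ..., 9}.
X = set(range(1, 10))
R = {(a, b) for a in X for b in X if a % 3 == b % 3}

# [x] = {y in X | (y, x) in R}
def eq_class(x):
    return frozenset(y for y in X if (y, x) in R)

quotient = {eq_class(x) for x in X}  # X/R, as a set of classes

# Check the partition properties: classes are nonempty, they cover X,
# and each element of X lies in exactly one class.
assert all(c for c in quotient)
assert set().union(*quotient) == X
assert all(sum(x in c for c in quotient) == 1 for x in X)

print(sorted(sorted(c) for c in quotient))
# → [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```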

So from an equivalence relation, the set of equivalence classes forms a partition. Conversely: given a partition, we can get an equivalence relation whose equivalence classes are just the elements of the partition:

R = {(x, y) | y belongs to the same element of the partition as x does}

or equivalently

R = {(x, y) | either x = y, or (x ≠ y and {x, y} is a subset of some element of the partition)}
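Going the other way is just as mechanical. A sketch, with a toy partition of my own choosing:

```python
# Sketch of the reverse direction: building R from a partition P of X.
# The set X and the partition P are my own toy example.
X = {1, 2, 3, 4, 5}
P = [{1, 2}, {3}, {4, 5}]

# R = {(x, y) | y belongs to the same element of P as x does}
R = {(x, y) for block in P for x in block for y in block}

# R is reflexive, symmetric, and transitive on X:
assert all((x, x) in R for x in X)
assert all((y, x) in R for (x, y) in R)
assert all((x, z) in R for (x, y) in R for (w, z) in R if y == w)
```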

The moral of the story: When you have an equivalence relation, you have a partition, and vice-versa.

Let f be a relation with the property that, for each x ∈ dom f, the set {y | (x, y) ∈ f} is a singleton. Then we call f a "function".

For any x ∈ dom f, we use the notation f(x) to denote the sole element of rng f having the property that (x, f(x)) ∈ f. (f(x) is read as "f of x".)
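For finite relations, the singleton condition is easy to test directly. A Python sketch (the example relations f and g, and the helper names, are my own):

```python
# Sketch: testing whether a finite relation (a set of ordered pairs)
# is a function, per the definition above.
def is_function(rel):
    dom = {x for (x, _) in rel}
    # For each x in the domain, {y | (x, y) in rel} must be a singleton.
    return all(len({y for (a, y) in rel if a == x}) == 1 for x in dom)

f = {(1, 'a'), (2, 'b'), (3, 'a')}   # each input has exactly one output
g = {(1, 'a'), (1, 'b')}             # 1 is paired with two outputs

assert is_function(f)
assert not is_function(g)

# For a function, f(x) is the sole y with (x, y) in f:
def apply(rel, x):
    (y,) = {y for (a, y) in rel if a == x}
    return y

assert apply(f, 3) == 'a'
```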

You can think of a function as a "rule of correspondence" that takes some object x as input, and consistently spits out some other object f(x) as output. The "input" object is sometimes called the "argument" of the function.

When we introduce a function, we often describe it as being "from" some set, "into" another set. For example, the function f could be "from X into Y" for some sets X and Y. This specification is often given in such notation as f : X → Y. It means simply that the domain of f is X, and that the range of f is some subset of Y.

If we described f as being "from X onto Y" (instead of "into Y"), that would mean that the range of f is precisely Y (as opposed to being merely some unspecified subset of Y).

When a function f : X → Y is described as a "surjection", this is just a fancy way of saying that it's a function "onto" the set to which the arrow is pointing. Given the way we've defined functions here, being a surjection is not really a property of the function itself, but of the way in which the function is presented. If Z ⊂ Y and rng f = Z, the function f is a surjection if we present it as f : X → Z, but the very same function could be presented instead as f : X → Y, in which case it is not a surjection. However, there are contexts in which we are concerned with a family of functions that are all from some set X into some other set Y. In such a context, it makes sense to highlight whether the range of such a function is all of Y or otherwise, so it makes sense to describe such a function as a surjection or not a surjection.

If f has the property that f(x) ≠ f(y) whenever x and y are two distinct (unequal) elements of dom f, then f can be described as "one-to-one". A one-to-one function is also called an "injection".

Recall that we defined earlier the "inverse" of a relation. Since we've defined a function as a type of relation, each function therefore has an inverse. If f is an injection, then f^{ -1} is also an injection. But if f is a function that is not an injection, then f^{ -1} is not a function at all.
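On finite examples, both claims can be checked by brute force. A sketch (again with my own example relations and helper names):

```python
# Sketch: the inverse of a relation, and the claims above about injections.
def inverse(rel):
    return {(y, x) for (x, y) in rel}

def is_function(rel):
    dom = {x for (x, _) in rel}
    return all(len({y for (a, y) in rel if a == x}) == 1 for x in dom)

# f is one-to-one exactly when f and its inverse are both functions.
def is_injection(rel):
    return is_function(rel) and is_function(inverse(rel))

f = {(1, 'a'), (2, 'b'), (3, 'c')}   # an injection
g = {(1, 'a'), (2, 'a')}             # a function, but not an injection

assert is_injection(f) and is_injection(inverse(f))   # f⁻¹ is an injection too
assert is_function(g) and not is_function(inverse(g)) # g⁻¹ is not a function
```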

A function that is both an injection and a surjection is also called a "bijection" or a "one-to-one correspondence".

Suppose f is a function whose domain is X. Let Z be a subset of X. Then the "restriction of f to Z", sometimes denoted f|_{Z}, is simply the function whose domain is Z, having the property that f|_{Z}(x) = f(x) for all x ∈ Z. Such restrictions are often useful insofar as a suitably chosen restriction of a function that is not an injection can be an injection, and can therefore have an inverse that is also a function.

For any function f and any set A, we can define the "image of A under f" as the set f(A) = {f(x) | x ∈ A ⋂ dom f} and also define the "inverse image of A under f" as the set f^{ -1}(A) = {x ∈ dom f | f(x) ∈ A}.
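Both definitions compute directly for a finite function. A sketch, with a function of my own invention (here represented as a Python dict, whose keys play the role of dom f):

```python
# Sketch: image and inverse image of a set under a finite function.
f = {1: 'a', 2: 'b', 3: 'a', 4: 'c'}   # dom f = {1, 2, 3, 4}

def image(f, A):
    # f(A) = {f(x) | x in A ∩ dom f}
    return {f[x] for x in A if x in f}

def preimage(f, A):
    # f⁻¹(A) = {x in dom f | f(x) in A}
    return {x for x in f if f[x] in A}

assert image(f, {1, 3, 99}) == {'a'}   # 99 is outside dom f, so it's ignored
assert preimage(f, {'a'}) == {1, 3}
assert preimage(f, {'z'}) == set()
```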

Let f be an injection. Then f^{ -1} is also a function, and we can refer to f^{ -1}(x) for any x ∈ rng f. If y = f(x), then x = f^{ -1}(y).

Recall that we defined "compositions" of relations. Since we've defined a function as a type of relation, we can similarly make compositions of functions. A composition of functions is also a function.

Perhaps the most useful case arises when we have functions f : X → Y and g : Y → Z, such that the range of f is a subset of the domain of g. Then the function g ∘ f has the properties:

dom (g ∘ f) = dom f = X
rng (g ∘ f) = g(rng f) ⊆ rng g
(g ∘ f)(x) = g(f(x)) for all x ∈ X
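For finite functions, all three properties can be verified mechanically. A sketch, where f and g are small examples of my own (again as dicts):

```python
# Sketch: composing two finite functions and checking the properties above.
f = {1: 'a', 2: 'b', 3: 'a'}        # f : X → Y, with X = {1, 2, 3}
g = {'a': 10, 'b': 20, 'c': 30}     # g : Y → Z, with dom g ⊇ rng f

gof = {x: g[f[x]] for x in f}       # (g ∘ f)(x) = g(f(x))

assert set(gof) == set(f)                          # dom(g∘f) = dom f
rng_f = set(f.values())
assert set(gof.values()) == {g[y] for y in rng_f}  # rng(g∘f) = g(rng f)
assert set(gof.values()) <= set(g.values())        # ... ⊆ rng g
assert all(gof[x] == g[f[x]] for x in f)
```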

The function i_{X} : X → X, defined by i_{X}(x) = x for all x ∈ X, is called the "identity function on X". It is clear that i_{X} is one-to-one (and "onto" X) and that i_{X}^{-1} = i_{X}.

If f : X → Y is one-to-one, then f^{ -1} ∘ f = i_{X} and f ∘ f^{ -1} = i_{f(X)}.

Let f be a function whose domain is a set of ordered pairs. If (a, b) ∈ dom f, then we would often write f(a, b), rather than f((a, b)), to denote the "output" that the function gives when (a, b) is the input. Although the argument of the function is actually an ordered pair, we can get away with referring to the coordinates of that ordered pair as arguments (plural) of the function. The first coordinate of the (ordered pair) argument would be called the "first argument" while the second coordinate of the (ordered pair) argument would be called the "second argument".

If f : X × X → X, then we can refer to f as a "binary operation on X". In a looser sense, we might even call f a "binary operation on X" if its domain is merely a subset of X × X, but I intend to stick with the pickier sense of the term.

Some common binary operations are expressed in a rather different notation. Instead of putting the name of the function before the two arguments, as in "f(a, b)", we often put the name of the function between the two arguments, as in "a f b". (In other words, we sometimes use "infix notation" instead of "prefix notation".) Such functions often have their own dedicated symbols, so we don't have to use letters.

Addition (of numbers) is one of these common binary operations, familiar to us from childhood. It uses the symbol "+". And customarily we use infix notation: That is, we write "a + b", rather than "+(a, b)", to refer to the result of applying this function to (a, b).

I'll probably have much more to say about addition later in this thread.

In mathematics, there are many kinds of "structures" (my word there, not a standard term) whose definitions take (roughly) the form "a set S, taken together with [some other stuff], such that [certain properties are obeyed]". A fairly typical "other stuff" would be a binary operation on the underlying set S.

For example, one of these types of "structures" is a "group". A group is defined as a set S, taken together with a binary operation (called the "group operation") on S (I'll use a dot, •, to represent this operation, and I'll use infix notation for it), such that (1) for all a, b, c ∈ S, (a • b) • c = a • (b • c) - in other words, the group operation is associative; (2) there exists a unique element e ∈ S (called the "identity"), such that for all a in S, a • e = e • a = a; (3) for each a ∈ S, there exists another element b ∈ S, called the "inverse" of a, such that a • b = b • a = e (where e is the identity as defined above).
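As a quick illustration (not part of the definition), the three axioms can be checked by brute force on a small finite example. My choice of example here is addition modulo 3, which happens to form a group:

```python
# Sketch: verifying the group axioms for S = {0, 1, 2} under addition mod 3.
from itertools import product

S = {0, 1, 2}
def op(a, b):
    return (a + b) % 3

# (1) the operation is associative
assert all(op(op(a, b), c) == op(a, op(b, c)) for a, b, c in product(S, S, S))

# (2) there is an identity element; here e = 0
e = 0
assert all(op(a, e) == op(e, a) == a for a in S)

# (3) every element has an inverse: some b with a • b = b • a = e
assert all(any(op(a, b) == op(b, a) == e for b in S) for a in S)
```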

Don't worry about the details for now. I don't intend to get into group theory in this thread (although it is a worthy subject, and you can fill bookshelves with it). My point is simply to present an example of a mathematical "structure".

We have defined a group in terms of two things: first, the underlying set S, and second, the binary operation •. We could write G = (S, •), thinking of the group G as being a special kind of ordered pair. It may be surprising, therefore, that in practice we often identify a group with its underlying set. In particular, elements of the underlying set S are referred to as elements of the group G. We even write things like a ∈ G, to mean that a ∈ S where G = (S, •). This practice is commonplace with most kinds of mathematical "structures".

When the uncool school gets going, we'll see "fields" and "vector spaces" as two important kinds of "structures".

We're at the point that I would consider as a bare minimum before embarking on the uncool school or on any similar course in maths. So I could stop here. But I won't! Instead, I'm going to blather on about numbers for a while, maybe a hundred posts or so.

Anyway, I was going from a distant memory of something I read 30 years ago, in a footnote in one of my textbooks. I ought to dig that book up and see what it actually says.

I looked up the brief footnote in question, which appears very early on in a book on algebra by Allenby. It's consistent with Quiz's response. After giving a very brief treatment of Russell's Paradox itself, Allenby says that schemes exist to restrict the kinds of propositions that can be used to define a set, thus avoiding such paradoxes. In particular he mentions Zermelo, and says that all the sets mentioned throughout his book are consistent with Zermelo's scheme. That's about it.

Hey guys - I had planned to write a new intro and second post this week, but technical difficulties got in the way. Hopefully next week, if technical corrections go well.

We have already used the symbol ℕ for the set of natural numbers. These have a bunch of familiar properties. It turns out that we can derive everything we need to know about the natural numbers from a small set of axioms. These (five) axioms are called the "Peano postulates":

P1: 1 ∈ ℕ. (That is, ℕ is nonempty, and there's a special element of ℕ which we happen to call "1".)

P2: For each n ∈ ℕ, there is a unique element n* ∈ ℕ called the "successor" of n. (We can think of the successor of n as the "next number" after n that we reach by ordinary counting.)

P3: For each n ∈ ℕ, n* ≠ 1. (So 1 is not the successor of any element of ℕ.)

P4: For each m, n ∈ ℕ such that m ≠ n, n* ≠ m*. (So distinct elements of ℕ have distinct successors.)

P5: Suppose that A is a subset of ℕ, such that 1 ∈ A, and having the property that for any n ∈ ℕ, n ∈ A implies n* ∈ A. Then A = ℕ.

We could have expressed P2 as defining a function s : ℕ → ℕ, where s(n) is the successor of n. P3 says that 1 ∉ rng s, and P4 says that s is an injection. But here we're using the notation n* instead of s(n).

P5 is worth chewing on. If some subset of ℕ includes 1 and includes the successor of every one of its own elements, then that subset of ℕ is really all of ℕ; nothing is left out. It follows that every element of ℕ \ {1} is the successor of some element of ℕ: the set A = {1} ∪ {n* | n ∈ ℕ} contains 1 and contains the successor of each of its elements, so P5 tells us that A = ℕ.

P5 gives us a really useful way of proving things about all natural numbers. Suppose we have some proposition P(n) that depends on n. Let A be the set of natural numbers n for which the proposition P(n) is true. Suppose that you prove that P(1) is true (i.e. that 1 ∈ A). Suppose further that you also prove that P(n*) is true whenever P(n) is true (i.e. that for any n ∈ ℕ, n ∈ A implies n* ∈ A). In that case, you have proved that P(n) is true for all n ∈ ℕ (i.e. that A = ℕ). This technique is called "mathematical induction". Hence the postulate P5 is sometimes called "the principle of mathematical induction". We'll see a bunch of examples in later posts.

Anyway, let us simply assume that the set ℕ, obeying the Peano postulates, exists. There's an astonishing amount of stuff that we can prove without adding any further assumptions.

Now we can give each natural number its conventional name.

We already have 1, from P1.

We know from P2 that 1* exists, and from P3 that 1* ≠ 1. So let's give 1* a different name. Let's define 2 = 1*.

We know from P2 that 2* exists, and from P3 that 2* ≠ 1, and from P4 that 2* ≠ 2 (because 2 = 1*, and 2 ≠ 1). So let's give 2* a different name. Let's define 3 = 2*.

We know from P2 that 3* exists, and from P3 that 3* ≠ 1, and from P4 that 3* ≠ 2 (because 2 = 1*, and 3 ≠ 1), and from P4 again that 3* ≠ 3 (because 3 = 2*, and 3 ≠ 2). So let's give 3* a different name. Let's define 4 = 3*.

Carrying on in this way - giving the successor of each newly-named number its own name - we are sure never to name the same number twice.

Our notation "10" consists of two symbols ("digits") strung together, but should be treated as a single symbol. Similarly, the conventional name for every natural number other than those in {1, 2, 3, 4, 5, 6, 7, 8, 9} will also consist of two or more digits strung together, but each such multi-digit name should be treated as a single symbol. Each digit will be one of {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}, but the leftmost digit will not be "0".

For convenience, let's define a function t : ℕ \ {1, 2, 3, 4, 5, 6, 7, 8, 9} → ℕ, where for any n ∈ dom t, the number t(n) is the number whose conventional name is given by taking the conventional name of n and deleting the rightmost digit.

Let n ∈ dom t. Then we have the following three cases, to derive the conventional name for n* from the conventional name for n:

Case 1: If the rightmost digit of the conventional name for n is "0", then the conventional name for n* is given by appending the digit "1" to the right of the conventional name for t(n).

Case 2: If the rightmost digit of the conventional name for n is one of {"1", "2", "3", "4", "5", "6", "7", "8"}, then let m be the number in {1, 2, 3, 4, 5, 6, 7, 8} whose name is the same as the rightmost digit of the conventional name for n. In this case, the conventional name for n* is given by appending the digit corresponding to the conventional name for m* to the right of the conventional name for t(n).

Case 3: If the rightmost digit of the conventional name for n is "9", then the conventional name for n* is given by appending the digit "0" to the right of the conventional name for t(n)*.

Case 3 is recursive, because the conventional name for t(n)* will itself fall into one of those three cases, unless it consists of only one digit, in which case the recursion is terminated.
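The three-case rule turns into a short program if we treat each conventional name as a string of digits. A sketch (the function name succ_name is my invention):

```python
# Sketch: the successor-naming rule, on decimal strings.
def succ_name(name):
    head, last = name[:-1], name[-1]
    if last == "9":
        # Case 3: recurse on the name of t(n); when the name has only
        # one digit, the recursion terminates and "9"* is named "10".
        return (succ_name(head) if head else "1") + "0"
    # Cases 1 and 2: replace the last digit with the next digit.
    return head + str(int(last) + 1)

assert succ_name("1") == "2"
assert succ_name("99") == "100"
assert succ_name("1099") == "1100"
# Agrees with ordinary arithmetic on a range:
assert all(succ_name(str(n)) == str(n + 1) for n in range(1, 2000))
```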

So for every natural number that is named by the process above, the successor of that number is also named.

Therefore, if we let A = {n ∈ ℕ | n is given a name according to the process in this post}, then P5 tells us that A = ℕ. So every natural number is given a name.

Now let's define two binary operations on ℕ. One of them is addition (symbol "+", using infix notation), and the other is multiplication (symbol "∙", using infix notation):

n + 1 = n* for each n ∈ ℕ;
n + p* = (n + p)* for each n ∈ ℕ and p ∈ ℕ;
n ∙ 1 = n for each n ∈ ℕ;
n ∙ p* = (n ∙ p) + n for each n ∈ ℕ and p ∈ ℕ.
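These recursive definitions translate directly into code. In this sketch I model each natural number by a Python int and use succ(n) = n + 1 only as the successor step; add and mul themselves use nothing but succ and recursion (writing p - 1 for the number whose successor is p):

```python
# Sketch: addition and multiplication defined by recursion on the successor.
def succ(n):
    return n + 1

def add(n, p):
    if p == 1:
        return succ(n)              # n + 1 = n*
    return succ(add(n, p - 1))      # n + p* = (n + p)*

def mul(n, p):
    if p == 1:
        return n                    # n ∙ 1 = n
    return add(mul(n, p - 1), n)    # n ∙ p* = (n ∙ p) + n

assert add(3, 4) == 7 and mul(3, 4) == 12

# The familiar laws hold on a small range (proofs for all of ℕ need induction):
rng = range(1, 8)
assert all(add(add(m, n), p) == add(m, add(n, p))
           for m in rng for n in rng for p in rng)   # associativity of +
assert all(add(m, n) == add(n, m) for m in rng for n in rng)
assert all(mul(p, add(m, n)) == add(mul(p, m), mul(p, n))
           for m in rng for n in rng for p in rng)   # distributivity
assert all(mul(m, n) == mul(n, m) for m in rng for n in rng)
```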

Addition is associative. That is, (m + n) + p = m + (n + p), for all m, n, p ∈ ℕ.

Proof: Let m, n ∈ ℕ. Let A = {q ∈ ℕ | (m + n) + q = m + (n + q)}. Clearly A ⊆ ℕ. Since (m + n) + 1 = (m + n)* = m + n* = m + (n + 1), we have 1 ∈ A. Now suppose that p ∈ A. Then (m + n) + p* = ((m + n) + p)* = (m + (n + p))* = m + (n + p)* = m + (n + p*), and so p* ∈ A also. Thus A = ℕ by induction (i.e., from axiom P5). So we have (m + n) + p = m + (n + p) for all p ∈ ℕ, given our choice of m and n. But m and n were arbitrary, so we're done.

Associativity of addition allows us to write things like m + n + p without ambiguity. This will be helpful in later proofs.

Addition is commutative. That is, m + n = n + m, for all m, n ∈ ℕ.

Proof: Let A = {p ∈ ℕ | 1 + p = p + 1}. Clearly A ⊆ ℕ. 1 ∈ A, trivially. If n ∈ A, then 1 + n* = 1 + n + 1 = n + 1 + 1 = n* + 1, so n* ∈ A. So A = ℕ by induction. So we have 1 + n = n + 1 for all n ∈ ℕ. Now let m ∈ ℕ. Let B = {p ∈ ℕ | m + p = p + m}. Clearly B ⊆ ℕ. We have already proved that 1 ∈ B. If n ∈ B, then m + n* = m + n + 1 = n + m + 1 = n + 1 + m = n* + m, so n* ∈ B. Thus B = ℕ by induction. So we have m + n = n + m for all n ∈ ℕ, given our choice of m. But m was arbitrary, so we're done.

Multiplication distributes over addition. That is, p ∙ (m + n) = (p ∙ m) + (p ∙ n), for all m, n, p ∈ ℕ, and (m + n) ∙ p = (m ∙ p) + (n ∙ p), for all m, n, p ∈ ℕ.

Proof of the first part: Let m, p ∈ ℕ. Let A = {q ∈ ℕ | p ∙ (m + q) = (p ∙ m) + (p ∙ q)}. Clearly A ⊆ ℕ. Notice that p ∙ (m + 1) = p ∙ m* = (p ∙ m) + p = (p ∙ m) + (p ∙ 1), so 1 ∈ A. If n ∈ A, then p ∙ (m + n*) = p ∙ (m + n)* = p ∙ (m + n) + p = (p ∙ m) + (p ∙ n) + p = (p ∙ m) + (p ∙ n*), so n* ∈ A. Thus A = ℕ by induction. So we have p ∙ (m + n) = (p ∙ m) + (p ∙ n) for all n ∈ ℕ, given our choice of m and p. But m and p were arbitrary, so we're done.

Proof of the second part: Let m, n ∈ ℕ. Let A = {q ∈ ℕ | (m + n) ∙ q = (m ∙ q) + (n ∙ q)}. Clearly A ⊆ ℕ. Notice that (m + n) ∙ 1 = m + n = (m ∙ 1) + (n ∙ 1), so 1 ∈ A. If p ∈ A, then (m + n) ∙ p* = ((m + n) ∙ p) + m + n = (m ∙ p) + (n ∙ p) + m + n = (n ∙ p) + (m ∙ p) + m + n = (n ∙ p) + (m ∙ p*) + n = (m ∙ p*) + (n ∙ p) + n = (m ∙ p*) + (n ∙ p*), so p* ∈ A. Thus A = ℕ by induction. So we have (m + n) ∙ p = (m ∙ p) + (n ∙ p) for all p ∈ ℕ, given our choice of m and n. But m and n were arbitrary, so we're done.

Multiplication is associative. That is, (m ∙ n) ∙ p = m ∙ (n ∙ p), for all m, n, p ∈ ℕ.

Proof: Let m, n ∈ ℕ. Let A = {q ∈ ℕ | (m ∙ n) ∙ q = m ∙ (n ∙ q)}. Clearly A ⊆ ℕ. Notice that (m ∙ n) ∙ 1 = m ∙ n = m ∙ (n ∙ 1), so 1 ∈ A. If p ∈ A, then (m ∙ n) ∙ p* = ((m ∙ n) ∙ p) + (m ∙ n) = (m ∙ (n ∙ p)) + (m ∙ n) = m ∙ ((n ∙ p) + n) = m ∙ (n ∙ p*), so p* ∈ A. Thus A = ℕ by induction. So we have (m ∙ n) ∙ p = m ∙ (n ∙ p) for all p ∈ ℕ, given our choice of m and n. But m and n were arbitrary, so we're done.

Associativity of multiplication allows us to write things like m ∙ n ∙ p without ambiguity. Again, this will be helpful in later proofs.

Multiplication is commutative. That is, m ∙ n = n ∙ m, for all m, n ∈ ℕ.

Proof: Let A = {p ∈ ℕ | p ∙ 1 = 1 ∙ p}. Clearly A ⊆ ℕ. 1 ∈ A, trivially. If n ∈ A, then 1 ∙ n* = (1 ∙ n) + 1 = (n ∙ 1) + 1 = n + 1 = n* = n* ∙ 1, so n* ∈ A. So A = ℕ by induction. So we have 1 ∙ n = n ∙ 1 for all n ∈ ℕ. Now let m ∈ ℕ. Let B = {p ∈ ℕ | m ∙ p = p ∙ m}. Clearly B ⊆ ℕ. We have already proved that 1 ∈ B. If n ∈ B, then m ∙ n* = (m ∙ n) + m = (m ∙ n) + (m ∙ 1) = (n ∙ m) + (1 ∙ m) = (n + 1) ∙ m = n* ∙ m, so n* ∈ B. Thus B = ℕ by induction. So we have m ∙ n = n ∙ m for all n ∈ ℕ, given our choice of m. But m was arbitrary, so we're done.

We have the following "cancellation law": p + m = p + n implies m = n, for all m, n, p ∈ ℕ.

Proof: Let A = {q ∈ ℕ | q + m = q + n implies m = n, for all m, n ∈ ℕ}. Let m, n ∈ ℕ, and suppose that 1 + m = 1 + n. Then m* = n*, and it follows from axiom P4 that m = n. So 1 ∈ A. Suppose that p ∈ A. Let m, n ∈ ℕ, and suppose that p* + m = p* + n. Then 1 + p + m = 1 + p + n, so p + m = p + n because 1 ∈ A, so m = n because p ∈ A. So p* ∈ A. Thus A = ℕ by induction. So we have p + m = p + n implies m = n, for all m, n, p ∈ ℕ. Done.

There's a useful relation on ℕ, symbolized by "<", that we can define as follows: For m, n ∈ ℕ, m < n if and only if there exists p ∈ ℕ such that m + p = n.

When m < n, we read that as "m is less than n".

Clearly, n < n* for all n ∈ ℕ. So the domain of "<" is (all of) ℕ.

This relation is transitive: If m, n, p ∈ ℕ and m < n and n < p, then we have q, r ∈ ℕ such that m + q = n and n + r = p, so p = m + q + r, so m < p.

Suppose m < n. Then m + p = n for some p ∈ ℕ, so n + 1 = m + p + 1 = m + p*. If we had m = n, then the cancellation law in the previous post would give 1 = p*, contradicting axiom P3. Thus, we cannot have both m = n and m < n.

Given the transitivity of "<", we cannot have both m < n and n < m, or else we would have m < m, and we have just seen that this is impossible.

It follows that the "<" relation is neither reflexive nor symmetric.

Some other relations can be defined in terms of this one. For any m, n ∈ ℕ, m > n (read as "m is greater than n") means that n < m. m ≤ n (read as "m is less than or equal to n") means that m < n or m = n. m ≥ n (read as "m is greater than or equal to n") means that n ≤ m.

All of these relations are transitive. None of them is symmetric. "≤" and "≥" are reflexive, but "<" and ">" are not.
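These properties can be checked by brute force on a finite range. A sketch, with the definition of "<" taken directly from the existential condition above (the bound N is an arbitrary choice of mine):

```python
# Sketch: "<" defined via "there exists p with m + p = n", checked on a range.
N = 30
def lt(m, n):
    # m < n iff m + p = n for some natural p (search up to N suffices here)
    return any(m + p == n for p in range(1, N + 1))

def le(m, n):
    return lt(m, n) or m == n

nums = range(1, N // 2)
assert all(not lt(m, m) for m in nums)                 # "<" is not reflexive
assert all(lt(m, p) for m in nums for n in nums for p in nums
           if lt(m, n) and lt(n, p))                   # "<" is transitive
assert all(not (lt(m, n) and lt(n, m))
           for m in nums for n in nums)                # never both m<n and n<m
assert all(le(m, m) for m in nums)                     # "≤" is reflexive
```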