Benefits of Publishing Family Genomes on the Internet

For a long while I have wanted to write about the work that Mike Cariaso, founder of SNPedia, has been doing with my family’s genotypes. Initially, he analyzed the data with Promethease, which assigns traits and annotations to the observed SNPs. More recently, he has also developed a tool for visualizing and comparing genotypes between different people, using my family’s and Manu Sporny’s genotypes as test cases.

This is an unanticipated benefit we have experienced as a family from publishing our genomes on the Internet. Using the Promethease report, we learned that dad is probably lactose intolerant. The fact that he did not like milk and had not drunk any in years made sense when we discovered that his genotypes at two SNPs, rs4988235(C;C) and rs182549(C;C), give him a 70% probability of being unable to digest lactose. This result was in fact in the 23andMe report as well, but we missed it.

It is clear that direct-to-consumer genetic companies try to cater to the non-expert, i.e. the majority of their customer base. The new SNPedia visualization tool will be a useful addition for those of us who strive to make our own discoveries about our personal genome data.

Using his visualization tool to compare all my SNPs with my sister’s, I find that 68% of mine are identical to hers, a total of 389,250 (see below).

SNP comparison between my sister and myself

Note that the graph uses a logarithmic scale. Of all our analyzed SNPs, 25% are halfmatches (i.e. one of the alleles is common to both of us) and about 2% are conflicts. Examples of conflicts may include different SNPs at the same position. This, according to Mike, may not be an accident. Because I know that we were analyzed on two different array platforms, version 2 and version 3 respectively, I can now tell the number of SNPs that are "different" between us, i.e. present in only one of the two genotypes. Of the more than 0.5 million SNPs in my genotype, 29,082 do not match hers in this sense.
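The four categories can be illustrated with a minimal Python sketch. It is not Mike's actual implementation: the function names `classify` and `compare`, the genotype strings, and the rule that a conflict means "no allele in common" are assumptions based on the description above. Under that rule a no-call such as `--` against `GG` also counts as a conflict. Apart from rs9729550, the SNP identifiers are made-up toy data.

```python
from collections import Counter

def classify(mine, hers):
    """Classify one SNP from two genotype strings (e.g. 'CT'), or None if absent."""
    if mine is None or hers is None:
        return "different"      # SNP present in only one of the two files
    a, b = sorted(mine), sorted(hers)
    if a == b:
        return "match"          # same two alleles, order ignored
    if set(a) & set(b):
        return "halfmatch"      # at least one allele in common
    return "conflict"           # no allele in common

def compare(me, sister):
    """Tally the categories over every SNP seen in either genotype."""
    counts = Counter()
    for rsid in me.keys() | sister.keys():
        counts[classify(me.get(rsid), sister.get(rsid))] += 1
    return counts

# Toy data: only rs9729550 is a real identifier, the rest are hypothetical.
me = {"rs9729550": "CC", "rs_toy_1": "CT", "rs_toy_2": "AG", "rs_toy_3": "GG"}
sister = {"rs9729550": "AA", "rs_toy_1": "TC", "rs_toy_2": "AA", "rs_toy_4": "TT"}
counts = compare(me, sister)
print(dict(counts))
```

Treating a missing SNP as "different" rather than as a conflict is what keeps the platform gap (v2 versus v3) from inflating the conflict count.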

Another nice feature of this tool is a graphical representation of each chromosome’s SNPs as a map of pixels, colored according to the categories above: light blue for a match, dark blue for a halfmatch, red for a conflict and grey for different SNPs:

Pixelated map of chromosome 1/chromosome 1 comparison between me and sister
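A pixel map of this kind could be produced roughly as follows. This is only a sketch under assumptions: the exact RGB values, the row-wrapping layout, and the names `COLORS`, `to_pixel_rows` and `write_ppm` are mine, not taken from the tool. The plain-text PPM format is used so that no imaging library is needed.

```python
# Assumed RGB values for the four categories; the real tool's shades may differ.
COLORS = {
    "match": (173, 216, 230),      # light blue
    "halfmatch": (0, 0, 139),      # dark blue
    "conflict": (255, 0, 0),       # red
    "different": (128, 128, 128),  # grey
}

def to_pixel_rows(categories, width):
    """One pixel per SNP in chromosomal order, wrapped every `width` pixels."""
    pixels = [COLORS[c] for c in categories]
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]

def write_ppm(rows, path):
    """Write the rows as a plain-text PPM (P3) image, padding short rows with white."""
    width = max(len(row) for row in rows)
    with open(path, "w") as fh:
        fh.write(f"P3\n{width} {len(rows)}\n255\n")
        for row in rows:
            padded = row + [(255, 255, 255)] * (width - len(row))
            fh.write(" ".join(f"{r} {g} {b}" for r, g, b in padded) + "\n")

# Toy sequence of categories for five consecutive SNPs.
rows = to_pixel_rows(["match", "match", "halfmatch", "conflict", "different"], width=2)
print(len(rows), "rows")
```

Calling `write_ppm(rows, "chr1.ppm")` would save the image; the file name is just an example.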

The above figure shows two representations of the chromosome/chromosome comparison between my chromosome 1 and my sister’s. Clearly most of the area is light blue, indicating a complete match. The numbers of differences, halfmatches and conflicts are also reported. Clicking on any of these links lists the actual SNPs involved; for the conflicts, the output looks like this:

1	rs9729550	1	1125105	CC	AA
2	rs12142199	1	1239050	GG	AA
3	rs7531583	1	1696020	GG	AA
4	rs6681938	1	1771080	CC	TT
5	rs41307846	1	1949559	GG	--
6	rs3128296	1	2058766	TT	GG
7	rs262654	1	2079386	AA	GG
8	rs262688	1	2103425	GG	TT
9	rs6659405	1	2362949	TT	GG
10	rs4648482	1	2739781	CC	TT
11	rs2483266	1	3225901	CC	TT
12	rs868688	1	3290667	TT	CC
13	rs10492939	1	3292731	AA	GG
14	rs2493268	1	3298358	TT	CC
15	rs871822	1	3302774	GG	TT
16	rs12024847	1	3310659	TT	CC
17	rs2821017	1	3510731	GG	AA
18	rs3765761	1	3620336	CC	TT
19	rs3765766	1	3624520	TT	CC
20	rs4233262	1	4136842	CC	TT
21	rs966321	1	4215064	GG	TT
22	rs964715	1	4216644	TT	CC
23	rs1390136	1	4241703	CC	TT
24	rs4654545	1	4425464	TT	CC
25	rs446529	1	4695274	CC	TT

Each row gives a running number, the SNP identifier, the chromosome, the position, my genotype and my sister’s genotype. For the first SNP, rs9729550, I have CC while my sister has AA.
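For anyone who wants to work with such a listing programmatically, here is a minimal Python sketch that parses it. The `Conflict` record and the `parse_conflicts` function are hypothetical names, and the sketch assumes six whitespace-separated columns with the fourth being the base-pair position. The two sample rows are copied from the listing above.

```python
from typing import NamedTuple

class Conflict(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    mine: str
    sisters: str

def parse_conflicts(text):
    """Parse rows of: number, rsid, chromosome, position, my genotype, sister's genotype."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        _, rsid, chrom, pos, mine, sisters = fields
        rows.append(Conflict(rsid, chrom, int(pos), mine, sisters))
    return rows

sample = """\
1\trs9729550\t1\t1125105\tCC\tAA
5\trs41307846\t1\t1949559\tGG\t--
"""
conflicts = parse_conflicts(sample)

# Rows where one side is a no-call ('--') rather than an opposite genotype.
no_calls = [c.rsid for c in conflicts if "-" in c.mine + c.sisters]
print(no_calls)
```

Separating out the no-call rows is one simple way to tell apart conflicts that are opposite homozygous calls from those where one platform reported nothing for the SNP.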

In conclusion, Promethease and the SNPedia visualization tool are helping me learn more about my SNP genotype results, complementing the information that I initially got from my direct-to-consumer provider. Hopefully I will be able to do some additional research based on the results obtained here.

If you want to see my family’s genomes with Mike Cariaso’s tool, you can find them here:

Don’t forget to send me any exciting findings that you might encounter!


  1. Conflicts are possible when someone was run on multiple platforms. An example is David Ewing Duncan: different platforms are reporting different things for his rs2660917. That is a real conflict.

    In the case of the Corpas family, in order to get the graphs the way I wanted (an M x N grid), I had to pool all of the family’s files together. This makes a correct family comparison but a meaningless top-level report. In particular, note the 22k genotypes, while any of the individual family members’ reports has only 13k genotypes, or 9k in the case of Manuel on the older v2.

    Future releases should either resolve this or make it clearer what’s going on. But I practice ‘release early, release often’ and with this release my goal was just to get the family comparison graphs working well.

    Some of this was explained to Manuel in emails between us as I was showing him my work in progress. The fact he announced it so widely was a bit of a (pleasant) surprise, but does lead to this sort of confusion. For this reason I’ve not yet announced the new family features anywhere on the Promethease page. When it’s working to my satisfaction, I’ll hype this a bit more.

    As for the images, the clearest example of crossover is in the comparison of the mother vs the aunt. However, in the Manuel vs sister image above, there do appear to be 5 bands of

    (light blue, dark blue)
    (light blue, dark blue, red)
    (light blue, dark blue)
    (light blue, dark blue, red)
    (light blue, dark blue)

    It’s subtle, and perhaps still not easy enough to see. Also, it’s subject to the choice of probes on the microarray. The fact that Manuel is on a v2 while his sister is on a v3 makes this quite difficult. However, chromosomes 2 & 3 show the effect much more clearly.

  2. How many of the “conflict” SNPs are real differences and how many are SNP-call errors from the platform? What would the result be if you had your data run twice (with different platforms)? How many differences would you see from yourself?

    The random pattern in the chr1 comparison does not look like crossover between parental genomes, but just noise.

  3. Those features are unique to the paid reports from 0.1.114 and later, which isn’t yet online. But what the heck, ok now it is.

    To make a report similar to the ones above, put all of your family members’ data in during step 1. Then move through the wizard, and pay the $2. When it’s ready, you’ll be able to step through a few more wizard screens. During the F1 report question, put all of your family members’ data in again. Hit Next a few more times, and then let it run.

    Most of the report won’t make any sense, since it is pooling all of your family members into one virtual person. But the very bottom of the page will have a link to an ‘experimental family report’. At that link you will find the ALL vs ALL comparison of the family members.

    PS. Version 0.1.115 and later may change this behavior, caveat emptor.

