A friend of mine in the genetic anthropology group at UNM needed a tool to help her mine data from the National Center for Biotechnology Information (NCBI), so I built her one. It was a fairly simple project, but it was my first attempt at web scraping, and it turned out to be more than a bit of fun.
She used the data to characterize the global pattern of high-frequency derived genetic variation in continental ancestry groups. The motivation was to understand the evolutionary opportunities for group-specific disease variants: this information can show where novel disease variants are likely to be found and how they may differ from what appears in current “worldwide genomic datasets”.
The work is not published yet, but it was submitted as:
AIMs and Ascertainment Bias in Genomic Diversity Sets - S. Niedbalski and J.C. Long