The Kilokluster is a large compute farm of 1024 Pentium III processors running GNU/Linux. UCSC computer scientists created the Kilokluster for genome-wide analyses. It is located at the School of Engineering under the supervision of Prof. David Haussler. The processors are housed in 8 racks, each holding 63 compute nodes and one intermediate server. Every machine has dual Pentium III processors, so a single rack holds 128 processors. This provides an exceptional amount of inexpensive computing power in a minimal space. The Kilokluster is principally used for genome assembly, analysis, and comparison. It uses a custom batch scheduling system, called Parasol, to manage the high volume of jobs. Batches with very large memory, disk, or database requirements can be scheduled to run only on the eight intermediate servers, each of which has 4 GB of RAM and 120 GB of disk. Access to the facility and technical support can be obtained by contacting the Genome Bioinformatics Group.

The Centicluster compute farm was originally built to assemble the first working draft of the Human Genome. It consisted of 100 Pentium III processors. Now reduced to 40 processors, it is principally used for protein sequence comparison and protein structure prediction (see SAM, below). This compute farm uses Condor, a batch scheduling system created at the University of Wisconsin. The Centicluster is also located at the School of Engineering under the supervision of Prof. Haussler. Access and technical support can be obtained through the Genome Bioinformatics Group.

Aligning whole genomes against each other is one of the most compute-intensive problems in bioinformatics. Breaking genomes into pieces and distributing the smaller jobs to many CPUs greatly reduces processing time. Unfortunately, this can leave hundreds of thousands of jobs queued for processing, and traditional schedulers cannot handle such large queues effectively.
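
The divide-and-distribute idea can be illustrated with a short Python sketch. It is not taken from the Kilokluster pipeline: the function names (`chunk_sequence`, `chunk_genome`, `pairwise_jobs`), the chunk sizes, and the sequence lengths are all hypothetical, and real pipelines typically overlap adjacent chunks so that alignments spanning a boundary are not lost. The point is only that the number of jobs is the product of the two chunk counts, which is why queues grow so quickly.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

# (sequence name, start, end) with half-open coordinates [start, end)
Chunk = Tuple[str, int, int]


def chunk_sequence(name: str, length: int, chunk_size: int) -> Iterator[Chunk]:
    """Yield fixed-size, non-overlapping windows covering one sequence."""
    for start in range(0, length, chunk_size):
        yield (name, start, min(start + chunk_size, length))


def chunk_genome(seq_lengths: Dict[str, int], chunk_size: int) -> List[Chunk]:
    """Chunk every sequence of a genome given a name -> length mapping."""
    chunks: List[Chunk] = []
    for name, length in seq_lengths.items():
        chunks.extend(chunk_sequence(name, length, chunk_size))
    return chunks


def pairwise_jobs(target: List[Chunk], query: List[Chunk]) -> List[Tuple[Chunk, Chunk]]:
    """One independent alignment job per (target chunk, query chunk) pair."""
    return list(product(target, query))


# Toy lengths, purely illustrative (not real chromosome sizes).
target = chunk_genome({"chrA": 2_500_000, "chrB": 1_200_000}, chunk_size=1_000_000)
query = chunk_genome({"chrX": 3_000_000}, chunk_size=500_000)
jobs = pairwise_jobs(target, query)
print(len(target), len(query), len(jobs))
```

With these toy inputs the script prints `5 6 30`. By the same arithmetic, a thousand chunks per genome would already yield a million independent jobs.
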
Parasol was designed by Jim Kent (UCSC Genome Bioinformatics Group) to schedule extremely large batches of jobs and to respond rapidly to the system failures that inevitably occur on such large clusters: it automatically removes a machine from service when a problem is detected. Although originally written for the Kilokluster, Parasol is portable to other operating systems. Parasol is an open source project and is available without restriction (credit to the author is required).

BLAT (Blast-like Alignment Tool)

BLAT was designed by Jim Kent for the Genome Browser Database to enable rapid sequence alignments. It can be run directly on the Kilokluster or via the Genome Browser. BLAT is more accurate and 500 times faster than popular tools for DNA sequence alignments, and 50 times faster for protein alignments when comparing vertebrate sequences.

Genome Browser Database

UCSC’s Genome Browser Database is used for genome-wide analysis and comparison. Originally created to support the Human Genome Project, it now also contains annotated sequence data for the mouse and rat genomes. Other vertebrate genomes will be added as they approach completion. The database represents the collaborative efforts of numerous institutions and individual annotators from around the world. The database is maintained on the Kilokluster computing facility at the SOE. Specially designed computational tools (e.g., BLAT, Parasol) enable a wide range of rapid and complex genomic analyses. The database can be accessed via the web for partial sequence analysis, while large-scale research projects generally take place at the Kilokluster itself. Accounts and instructions for using the database are available through the Genome Bioinformatics Group.

UCSC’s Genome Browser is an efficient, user-friendly, web-based tool for the display of data from the Genome Browser Database.
The Browser was designed by Jim Kent and is maintained and upgraded by the Genome Bioinformatics Group. The Browser can display any requested portion of a genome at any scale. It contains several dozen annotation tracks for sequence analysis and comparison of the human, mouse, and rat genomes. It displays assembly contigs and gaps, mRNA and expressed sequence tag alignments, multiple gene predictions, cross-species homologies, single nucleotide polymorphisms, sequence-tagged sites, radiation hybrid data, and transposon repeats as a stack of co-registered tracks. Users can also add custom tracks for educational or research purposes. Secondary links at the web site lead to sequence details and supplementary off-site databases. At present, the web site receives over 120,000 requests per day.

The Splicing Microarray Database is designed to implicitly store the graph structure of a gene, making it possible to relate individual probes to particular exons, introns, or splice junctions. The database supports standard and splicing microarrays, with flexibility for new experimental designs. It includes a suite of web CGI programs that allow users to load, normalize, and visualize their data. Of particular interest for splicing analyses, users can view gene structures from loci of interest over a series of experiments and browse results for particular exons, introns, or splice junctions.

The Improbizer is a software tool for detecting regulatory motifs in DNA or RNA sequences. It uses a variation of the expectation maximization (EM) algorithm. The tool finds sequence patterns that occur more frequently than would be expected by chance (i.e., at background levels). An assortment of hidden Markov models can be used to adjust for the varying nucleotide background and foreground levels of different species. Designed by Jim Kent for the SOE computing clusters, the program is also available for download.
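
The notion of a pattern occurring more often than chance can be made concrete with a log-odds score. The Python sketch below is not Improbizer's code and does not implement its EM variant. It only shows the foreground-versus-background comparison that such methods build on, and it assumes a zero-order background (single-base frequencies) for simplicity. All function names and the toy sequences are hypothetical, and input is assumed to be uppercase A, C, G, T only.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

ALPHABET = "ACGT"


def background_freqs(sequences: List[str], pseudocount: float = 1.0) -> Dict[str, float]:
    """Zero-order background: overall base frequencies, smoothed by a pseudocount."""
    counts = Counter(base for seq in sequences for base in seq if base in ALPHABET)
    total = sum(counts[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
    return {b: (counts[b] + pseudocount) / total for b in ALPHABET}


def motif_profile(sites: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-column base frequencies estimated from aligned, equal-length sites."""
    width = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    profile = []
    for i in range(width):
        column = Counter(site[i] for site in sites)
        profile.append({b: (column[b] + pseudocount) / total for b in ALPHABET})
    return profile


def log_odds(window: str, profile: List[Dict[str, float]], background: Dict[str, float]) -> float:
    """Sum over columns of log2(P(base | motif) / P(base | background))."""
    return sum(math.log2(profile[i][b] / background[b]) for i, b in enumerate(window))


def best_hit(seq: str, profile: List[Dict[str, float]], background: Dict[str, float]) -> Tuple[int, float]:
    """Return (start position, score) of the highest-scoring window in seq."""
    width = len(profile)
    scored = [(log_odds(seq[i:i + width], profile, background), i)
              for i in range(len(seq) - width + 1)]
    score, pos = max(scored)
    return pos, score


# Toy data: four aligned example sites and one sequence to scan.
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
sequence = "GCGCGCTATAATGCGC"
background = background_freqs([sequence])
profile = motif_profile(sites)
pos, score = best_hit(sequence, profile, background)
print(pos, score > 0)
```

A positive score means the window is better explained by the motif profile than by the background. An EM-style method would go further and re-estimate the profile from the best-scoring windows, repeating until the estimate stabilizes.
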

Protein Structure Prediction Webserver (SAM-T02)

SAM-T02 is a web-based tool for predicting the fold and secondary structure of a target protein sequence. It uses multi-track hidden Markov models and neural nets trained on multiple alignments generated by the SAM-T2K iterated search procedure. SAM was developed by the research groups of Kevin Karplus and Richard Hughey and is maintained at the SOE Center for Biomolecular Science and Engineering. The SAM webserver includes links to download stand-alone programs, which are free to academics, government laboratories, and non-profits.

The Yeast Intron Database is a web-based tool with genome-level information about the spliceosomal introns of the yeast Saccharomyces cerevisiae. Developed by the research groups of Professors Manuel Ares (Dept. of MCD Biology) and David Haussler (Dept. of Biomolecular Engineering), the database lists the known spliceosomal introns of yeast and documents the splice sites used by this organism. This information is used to understand splicing patterns, how they are regulated globally, and how they change during evolution. The website also contains graphs, histograms, images, and hidden Markov model information. Data can be both downloaded and submitted online.

The Intronerator

The Intronerator is a collection of web-based tools for exploring the molecular biology and genomics of C. elegans, with a special emphasis on alternative splicing. Developed at the Department of MCD Biology by Professor Al Zahler and Jim Kent (now in the Genome Bioinformatics Group), it includes a catalog of alternatively spliced genes, an intron database, software for genome alignment comparisons between species, and many other useful tools for molecular biology studies.

Biomedical Research Website by David M. States. Last updated April 15, 2005.