Consensus Higher Order Repeats and Frequency of String Distributions in Human Genome

نوع فایل : کتاب
زبان : انگلیسی
مؤلف : Vladimir Paar1,*, Ivan Basar1, Marija Rosandi..2 and Matko Glun..i..1
چاپ و سال / کشور: 2007

Description

Key string algorithm (KSA) could be viewed as robust computational generalization of restriction enzyme method. KSA enables robust and effective identification and structural analyzes of any given genomic sequences, like in the case of NCBI assembly for human genome. We have developed a method, using total frequency distribution of all r-bp key strings in dependence on the fragment length l, to determine the exact size of all repeats within the given genomic sequence, both of monomeric and HOR type. Subsequently, for particular fragment lengths equal to each of these repeat sizes we compute the partial frequency distribution of r-bp key strings; the key string with highest frequency is a dominant key string, optimal for segmentation of a given genomic sequence into repeat units. We illustrate how a wide class of 3-bp key strings leads to a key-string-dependent periodic cell which enables a simple identification and consensus length determinations of HORs, or any other highly convergent repeat of monomeric or HOR type, both tandem or dispersed. We illustrated KSA application for HORs in human genome and determined consensus HORs in the Build 35.1 assembly. In the next step we compute suprachromosomal family classification and CENP-B box / pJ.. distributions for HORs. In the case of less convergent repeats, like for example monomeric alpha satellite (20-40% divergence), we searched for optimal compact key string using frequency method and developed a concept of composite key string (GAAAC--CTTTG) or flexible relaxation (28 bp key string) which provides both monomeric alpha satellites as well as alpha monomer segmentation of internal HOR structure. This method is convenient also for study of R-strand (direct) / S-strand (reverse complement) alpha monomer alternations. Using KSA we identified 16 alternating regions of R-strand and S-strand monomers in one contig in choromosome 7. Use of CENP-B box and/or pJ.. motif as key string is suitable both for identification of HORs and monomeric pattern as well as for studies of CENP-B box / pJ.. distribution. As an example of application of KSA to sequences outside of HOR regions we present our finding of a tandem with highly convergent 3434-bp long monomer in chromosome 5 (divergence less then 0.3%).

Current Genomics, 2007, 8, 93-111 Received on: January 17, 2007 - Revised on: January 26, 2007 - Accepted on: January 30, 2007

Consensus Higher Order Repeats and Frequency of String Distributions in Human Genome

Description

اگر شما نسبت به این اثر یا عنوان محق هستید، لطفا از طریق "بخش تماس با ما" با ما تماس بگیرید و برای اطلاعات بیشتر، صفحه قوانین و مقررات را مطالعه نمایید.

دیدگاه کاربران