What I read this week
Li H, Homer N. A survey of sequence alignment algorithms for next-generation sequencing. Brief Bioinform 2010;11:473-483.
- hash tables, spaced seed, q-gram filter, multiple seed hits, seed extension, suffix/prefix tries, FM-index, inexact matches, gapped alignment
- Aligning sequence reads: gapped alignment, paired-end and mate-pair mapping, base quality, long-read aligner, capillary read aligner, SOLiD reads, bisulfite-treated reads, spliced reads, realignment
- Speed, memory considerations
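To make the hash-table seeding idea concrete for myself, here is a minimal sketch, not any particular aligner's method. It indexes every k-mer of a reference in a dictionary, looks up the k-mers of a read, and lets each seed hit vote for the reference offset it implies. The function names, the toy reference, and k = 4 are my own assumptions for illustration; a real aligner would follow this with an extension step.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map each k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def candidate_positions(read, index, k):
    """Count, for each implied read start offset, how many exact seed hits support it."""
    votes = Counter()
    for j in range(len(read) - k + 1):
        for i in index.get(read[j:j + k], ()):
            votes[i - j] += 1  # seed at read offset j, reference offset i
    return votes.most_common()


# Toy data (hypothetical): the read is an exact copy of reference[8:18].
reference = "ACGTACGTTAGCCGATAGGCTA"
read = "TAGCCGATAG"
index = build_kmer_index(reference, 4)
hits = candidate_positions(read, index, 4)
print(hits[0])  # best-supported offset and its number of seed hits
```

Offsets supported by many seeds are the candidates worth passing to a slower gapped-alignment step.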
Eres R, Landau GM, Parida L. Permutation pattern discovery in biosequences. J Comput Biol 2004;11(6):1050-1060.
- sliding window technique for pattern matching with examples
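A small sketch of the sliding-window idea as I understand it, assuming the simplest setting: report every window of the text whose characters are a permutation of a given pattern. The character counts are updated incrementally as the window moves, so each step costs one add and one remove. The function name and the example strings are my own, not taken from the paper.

```python
from collections import Counter


def permutation_occurrences(text, pattern):
    """Return start positions of windows in text that are a permutation of pattern."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    need = Counter(pattern)
    window = Counter(text[:m])
    hits = [0] if window == need else []
    for i in range(m, len(text)):
        window[text[i]] += 1          # character entering the window
        left = text[i - m]
        window[left] -= 1             # character leaving the window
        if window[left] == 0:
            del window[left]          # keep only non-zero counts so == is exact
        if window == need:
            hits.append(i - m + 1)
    return hits


positions = permutation_occurrences("abcdbacdcab", "abc")
print(positions)  # [0, 4, 8]
```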
Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol 1981;147:195-7.
- Similarity (homology) measure
- More detailed algorithm for the Smith-Waterman homology measure, comparison to Sellers and Needleman and Wunsch algorithms
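To check my understanding of the local similarity measure, here is a minimal sketch of the Smith-Waterman recurrence that returns only the best local score. It assumes a linear gap penalty and scores I picked for illustration (match = 2, mismatch = -1, gap = -2); the paper uses its own weights and a length-dependent gap penalty, so this is a simplification, not the paper's exact scheme.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty, assumed scores)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a fresh local alignment here
                prev[j - 1] + s,    # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTT", "GGACGGG"))  # 6, from the shared "ACG"
```

The floor at 0 is what makes the alignment local: a poorly scoring prefix is dropped instead of being carried along, which is the main difference from the global Needleman-Wunsch recurrence.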
MacIsaac KD, Fraenkel E (2006) Practical strategies for discovering regulatory DNA sequence motifs. PLoS Comput Biol 2(4): e36. DOI: 10.1371/journal.pcbi.0020036
- DNA Encoding Schemes with examples
o consensus sequence of preferred nucleotides (ACGT)
o position weight matrix (PWM)
o example: converting a set of aligned sequences to a PWM over 6 positions
- Clustering of DNA - techniques, dimensionality
o hierarchical clustering applied to the motifs; clusters with a similarity exceeding 70% are combined by computing a consensus sequence
- Distance / similarity measure for DNA sequences
o fraction of common bits as a similarity metric
o Pearson correlation coefficient between motif PWMs as the similarity measure
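To tie the encoding and similarity notes together, here is a rough sketch under my own assumptions: it builds a PWM of plain column frequencies (no pseudocounts or background correction) from aligned 6-mers, reads off a consensus sequence, and computes the Pearson correlation between two PWMs after flattening them. The example sites and all function names are hypothetical, and the paper may normalize or align motifs differently before comparing them.

```python
import math

BASES = "ACGT"


def build_pwm(sequences):
    """Column-wise base frequencies for equal-length aligned sequences."""
    length = len(sequences[0])
    pwm = []
    for pos in range(length):
        column = [seq[pos] for seq in sequences]
        pwm.append([column.count(b) / len(sequences) for b in BASES])
    return pwm


def consensus(pwm):
    """Most frequent base at each position (first base wins on ties)."""
    return "".join(BASES[row.index(max(row))] for row in pwm)


def pearson(pwm_a, pwm_b):
    """Pearson correlation between two same-shaped PWMs, flattened to vectors."""
    x = [v for row in pwm_a for v in row]
    y = [v for row in pwm_b for v in row]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (division by zero) for a perfectly uniform PWM


# Hypothetical aligned binding sites, 6 positions each.
sites_a = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
sites_b = ["TATAAT", "TATAAT", "TTTAAT", "TATACT"]
pwm_a = build_pwm(sites_a)
pwm_b = build_pwm(sites_b)
print(consensus(pwm_a))        # TATAAT
print(pearson(pwm_a, pwm_b))   # close to 1 for similar motifs
```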
I need to understand the basics of DNA, computational chemistry, and bioinformatics better. I have downloaded a few book chapters from the WVU Libraries electronic collection.