Class Utils

java.lang.Object
jebl.evolution.sequences.Utils

public class Utils extends Object
Version:
$Id: Utils.java 918 2008-06-04 01:28:08Z twobeers $
Author:
Andrew Rambaut, Alexei Drummond
  • Method Details

    • translate

      public static Sequence translate(Sequence sequence, GeneticCode geneticCode)
      Translates the given Sequence into the corresponding amino acid Sequence under the given genetic code. This is a convenience method that calls AminoAcidState[] translate(final State[] states, GeneticCode geneticCode).
      Parameters:
      sequence - the Sequence to translate.
      geneticCode - the genetic code to use for the translation
      Returns:
      the translated Sequence
    • translate

      public static Sequence translate(Sequence sequence, GeneticCode geneticCode, int readingFrame)
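      Translates the given Sequence into the corresponding amino acid Sequence under the given genetic code, using the given reading frame. This is a convenience method that delegates to the State[]-based translate method.
      Parameters:
      sequence - the Sequence to translate.
      geneticCode - the genetic code to use for the translation
      readingFrame - the reading frame to use for the translation
      Returns:
      the translated Sequence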
    • translate

      public static AminoAcidState[] translate(State[] states, GeneticCode geneticCode)
      Translates each of a given sequence of NucleotideStates or CodonStates to the AminoAcidState corresponding to it under the given genetic code. Translation doesn't stop at stop codons; these are translated to AminoAcids.STOP_STATE. If translating from NucleotideState and the number of states is not a multiple of 3, then the excess states at the end are silently dropped.
      Parameters:
      states - States to translate; must all be of the same type, either NucleotideState or CodonState.
      geneticCode - the genetic code to use for the translation
      Returns:
      the translated amino acid states
    • translate

      public static AminoAcidState[] translate(State[] states, GeneticCode geneticCode, int readingFrame)
      Translates each of a given sequence of NucleotideStates or CodonStates to the AminoAcidState corresponding to it under the given genetic code. Translation doesn't stop at stop codons; these are translated to AminoAcids.STOP_STATE. If translating from NucleotideState and the number of states is not a multiple of 3, then the excess states at the end are silently dropped.
      Parameters:
      states - States to translate; must all be of the same type, either NucleotideState or CodonState.
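      geneticCode - the genetic code to use for the translation
      readingFrame - the reading frame to use for the translation
      Returns:
      the translated amino acid states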
    • isPredominantlyRNA

      public static boolean isPredominantlyRNA(CharSequence sequenceString, int maximumNonGapsToLookAt)
      Determines whether the given nucleotide sequence is predominantly RNA, i.e. whether it contains more occurrences of "U" than of "T".
      Parameters:
      sequenceString - the sequence string to inspect to determine if it's RNA
      maximumNonGapsToLookAt - for performance reasons, only look at a maximum of this many non-gap residues in deciding if the sequence is predominantly RNA. Can be -1 or Integer.MAX_VALUE to look at the entire sequence.
      Returns:
      true if the given nucleotide sequence is predominantly RNA
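      The counting rule described above can be sketched as follows. This is a standalone illustration, not JEBL's source: the class and method names are hypothetical, and treating '-' as the only gap character and ignoring case are assumptions.

```java
// Standalone sketch of a "more U than T" check; not taken from JEBL.
final class RnaGuessSketch {
    private RnaGuessSketch() {}

    static boolean isPredominantlyRna(CharSequence seq, int maxNonGaps) {
        // A negative limit is treated as "no limit", mirroring the documented -1 convention.
        int limit = maxNonGaps < 0 ? Integer.MAX_VALUE : maxNonGaps;
        int inspected = 0;
        int uCount = 0;
        int tCount = 0;
        for (int i = 0; i < seq.length() && inspected < limit; i++) {
            char c = Character.toUpperCase(seq.charAt(i));
            if (c == '-') {
                continue; // assumed gap character; gaps do not count towards the limit
            }
            inspected++;
            if (c == 'U') {
                uCount++;
            } else if (c == 'T') {
                tCount++;
            }
        }
        // Strictly more U than T; a tie is reported as "not RNA" in this sketch.
        return uCount > tCount;
    }
}
```

      With a limit of 2, the string "TT--UUU" is judged on its first two non-gap residues only, so the sketch reports false; with -1 it sees all five residues and reports true.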
    • reverseComplement

      public static String reverseComplement(String nucleotideSequence)
    • reverseComplementWithGaps

      public static String reverseComplementWithGaps(String nucleotideSequence)
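      Neither reverseComplement(String) nor reverseComplementWithGaps(String) carries a description here. As a point of reference, a reverse complement is conventionally obtained by reversing the string and swapping each base for its pairing partner. The sketch below is independent of JEBL; the class name is hypothetical, and passing gaps and unrecognized characters through unchanged, mapping U to A, and preserving case are all assumptions.

```java
// Standalone sketch of a string reverse complement; not taken from JEBL.
final class ReverseComplementSketch {
    private ReverseComplementSketch() {}

    // Complements a single base; anything that is not A, C, G, T or U is returned unchanged.
    static char complement(char base) {
        char result;
        switch (Character.toUpperCase(base)) {
            case 'A': result = 'T'; break;
            case 'T': result = 'A'; break;
            case 'U': result = 'A'; break; // assumption: RNA input yields DNA-style output
            case 'C': result = 'G'; break;
            case 'G': result = 'C'; break;
            default: return base;          // gaps and ambiguity codes pass through
        }
        return Character.isLowerCase(base) ? Character.toLowerCase(result) : result;
    }

    static String reverseComplement(String nucleotides) {
        StringBuilder out = new StringBuilder(nucleotides.length());
        // Walk the input backwards and complement each base on the way.
        for (int i = nucleotides.length() - 1; i >= 0; i--) {
            out.append(complement(nucleotides.charAt(i)));
        }
        return out.toString();
    }
}
```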
    • translateCharSequence

      public static String translateCharSequence(CharSequence nucleotideSequence, GeneticCode geneticCode)
      Translates the given nucleotideSequence into an amino acid sequence string, using the given geneticCode. The translation is done triplet by triplet, starting with the triplet at indices 0..2 in nucleotideSequence, then the one at indices 3..5, and so on until fewer than 3 nucleotides are left.

      This method uses translate(State[],GeneticCode) to do the translation, hence it shares some properties with that method: 1.) Any excess nucleotides at the end will be silently discarded, 2.) Translation doesn't stop at stop codons; instead, they are translated to "*", which is AminoAcids.STOP_STATE's code.

      Parameters:
      nucleotideSequence - nucleotide sequence to translate
      geneticCode - genetic code to use for the translation
      Returns:
      A string with length nucleotideSequence.length() / 3 (rounded down), the translation of nucleotideSequence with the given genetic code
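      The triplet-by-triplet behaviour described above can be illustrated with a small standalone sketch. It does not use JEBL: instead of a GeneticCode argument it hard-codes the standard genetic code, the class name is hypothetical, and emitting 'X' for codons containing anything other than A, C, G, T or U is an assumption.

```java
// Standalone sketch of triplet-by-triplet translation; not taken from JEBL.
final class TranslateSketch {
    private TranslateSketch() {}

    // Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
    private static final String BASES = "TCAG";
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    static String translate(CharSequence nucleotides) {
        int codonCount = nucleotides.length() / 3; // excess bases at the end are dropped
        StringBuilder protein = new StringBuilder(codonCount);
        for (int c = 0; c < codonCount; c++) {
            int index = 0;
            boolean known = true;
            for (int k = 0; k < 3; k++) {
                char base = Character.toUpperCase(nucleotides.charAt(3 * c + k));
                if (base == 'U') {
                    base = 'T'; // treat RNA input like DNA
                }
                int b = BASES.indexOf(base);
                if (b < 0) {
                    known = false;
                    break;
                }
                index = index * 4 + b;
            }
            // Stop codons become '*' and translation simply continues past them.
            protein.append(known ? AMINO_ACIDS.charAt(index) : 'X');
        }
        return protein.toString();
    }
}
```

      For the 13-base input "ATGGCCTAAGGTA" the sketch returns "MA*G": four codons are translated, the stop codon TAA appears as "*" without ending the translation, and the trailing "A" is discarded.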
    • translate

      public static String translate(String nucleotideSequence, GeneticCode geneticCode)
      A wrapper for translateCharSequence(CharSequence,GeneticCode) that takes a nucleotide sequence as a String only rather than a CharSequence. This is to preserve backwards compatibility with existing compiled code.
      Parameters:
      nucleotideSequence - nucleotide sequence string to translate
      geneticCode - genetic code to use for the translation
      Returns:
      A string with length nucleotideSequence.length() / 3 (rounded down), the translation of nucleotideSequence with the given genetic code
    • stripGaps

      public static State[] stripGaps(State[] sequence)
      Strips a sequence of gaps
      Parameters:
      sequence - the sequence
      Returns:
      the stripped sequence
    • stripStates

      public static State[] stripStates(State[] sequence, List<State> stripStates)
      Strips a sequence of any states given
      Parameters:
      sequence - the sequence
      stripStates - the states to strip
      Returns:
      an array of states
    • replaceStates

      public static State[] replaceStates(State[] sequence, List<State> searchStates, State replaceState)
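      Searches a sequence for any of the given states and replaces each occurrence with the given replacement state
      Parameters:
      sequence - the sequence
      searchStates - the states to search for
      replaceState - the state to substitute for each matching state
      Returns:
      an array of states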
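      The filtering done by stripStates and the substitution done by replaceStates can be pictured with generic list operations. This is only a sketch under assumptions: it works on java.util.List rather than JEBL's State[] arrays, the class and method names are hypothetical, and matching is done with equals().

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of strip/replace over a list of states; not taken from JEBL.
final class StripReplaceSketch {
    private StripReplaceSketch() {}

    // Returns a new list containing every element of sequence that is not in toStrip.
    static <T> List<T> strip(List<T> sequence, List<T> toStrip) {
        List<T> kept = new ArrayList<>();
        for (T state : sequence) {
            if (!toStrip.contains(state)) {
                kept.add(state);
            }
        }
        return kept;
    }

    // Returns a new list of the same length with every element found in search swapped for replacement.
    static <T> List<T> replace(List<T> sequence, List<T> search, T replacement) {
        List<T> result = new ArrayList<>(sequence.size());
        for (T state : sequence) {
            result.add(search.contains(state) ? replacement : state);
        }
        return result;
    }
}
```

      Stripping shortens the sequence, whereas replacing preserves its length, which matters when positions must stay aligned with other sequences.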
    • reverse

      public static State[] reverse(State[] sequence)
    • complement

      public static NucleotideState[] complement(NucleotideState[] sequence)
    • reverseComplement

      public static NucleotideState[] reverseComplement(NucleotideState[] sequence)
    • getStateIndices

      public static byte[] getStateIndices(State[] sequence)
    • getGaplessLocation

      public static int getGaplessLocation(Sequence sequence, int gappedLocation)
      Gets the site location index for this sequence excluding any gaps. The location is indexed from 0.
      Parameters:
      sequence - the sequence
      gappedLocation - the location including gaps
      Returns:
      the location without gaps.
    • getGappedLocation

      public static int getGappedLocation(Sequence sequence, int gaplessLocation)
      Gets the site location index in this sequence, counting gaps, that corresponds to the given location counted without gaps. The first non-gapped site in the sequence has a gaplessLocation of 0.
      Parameters:
      sequence - the sequence
      gaplessLocation - the location excluding gaps
      Returns:
      the site location including gaps
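      The relationship between gapped and gapless locations in the two methods above can be sketched on plain strings. The code is independent of JEBL and rests on assumptions: '-' is the only gap character, a gapless location is the number of non-gap sites strictly before the gapped location, and -1 is returned when the requested gapless location does not exist. All names are hypothetical.

```java
// Standalone sketch of gapped <-> gapless index conversion; not taken from JEBL.
final class GapLocationSketch {
    private GapLocationSketch() {}

    private static final char GAP = '-'; // assumed gap character

    // Counts the non-gap sites that precede gappedLocation.
    static int gaplessLocation(CharSequence sequence, int gappedLocation) {
        int count = 0;
        for (int i = 0; i < gappedLocation; i++) {
            if (sequence.charAt(i) != GAP) {
                count++;
            }
        }
        return count;
    }

    // Finds the index, gaps included, of the non-gap site numbered gaplessLocation (0-based).
    static int gappedLocation(CharSequence sequence, int gaplessLocation) {
        int seen = -1;
        for (int i = 0; i < sequence.length(); i++) {
            if (sequence.charAt(i) != GAP && ++seen == gaplessLocation) {
                return i;
            }
        }
        return -1; // assumption: signal "no such site" rather than throw
    }
}
```

      For the string "--AC-G", the site at gapped index 5 has gapless location 2, and converting gapless location 2 back gives index 5 again.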
    • guessSequenceType

      public static SequenceType guessSequenceType(CharSequence seq)
      Guess type of sequence from contents.
      Parameters:
      seq - the sequence
      Returns:
      SequenceType.NUCLEOTIDE or SequenceType.AMINO_ACID, if sequence is believed to be of that type. If the sequence contains characters that are valid for neither of these two sequence types, then this method returns null.
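      The three possible outcomes (nucleotide, amino acid, or null) can be illustrated with a deliberately simplified sketch. It is not JEBL's heuristic: the alphabets below, the rule of preferring nucleotide whenever every character is a valid nucleotide code, and the Kind enum standing in for SequenceType are all assumptions made for the example.

```java
// Standalone, simplified sketch of sequence type guessing; not taken from JEBL.
final class GuessTypeSketch {
    private GuessTypeSketch() {}

    enum Kind { NUCLEOTIDE, AMINO_ACID } // hypothetical stand-in for SequenceType

    // Assumed alphabets: IUPAC nucleotide codes and common amino acid codes, plus gap and unknown.
    private static final String NUCLEOTIDE_CHARS = "ACGTURYMKSWHBVDN-?";
    private static final String AMINO_ACID_CHARS = "ACDEFGHIKLMNPQRSTVWYBZX*-?";

    static Kind guess(CharSequence seq) {
        boolean allNucleotide = true;
        boolean allAminoAcid = true;
        for (int i = 0; i < seq.length(); i++) {
            char c = Character.toUpperCase(seq.charAt(i));
            if (NUCLEOTIDE_CHARS.indexOf(c) < 0) {
                allNucleotide = false;
            }
            if (AMINO_ACID_CHARS.indexOf(c) < 0) {
                allAminoAcid = false;
            }
        }
        if (allNucleotide) {
            return Kind.NUCLEOTIDE; // preferred when both alphabets fit
        }
        if (allAminoAcid) {
            return Kind.AMINO_ACID;
        }
        return null; // some character is valid for neither type
    }
}
```

      Because a null result is possible, callers need to check for it before using the guessed type.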
    • getStopCodonCount

      public static int getStopCodonCount(Sequence sequence)
      Counts the number of stop codons in an amino acid sequence
      Parameters:
      sequence - the sequence in which to count stop codons
      Returns:
      the number of stop codons
    • cleanSequence

      public static State[] cleanSequence(CharSequence seq, SequenceType type)
      Produce a clean sequence filtered of spaces and digits.
      Parameters:
      seq - the sequence
      type - the sequence type
      Returns:
      An array of valid states of SequenceType (may be shorter than the original sequence)
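      The filtering step can be sketched on plain strings. Unlike cleanSequence, this standalone example returns a String rather than State[] and does not check residues against a sequence type; it only removes whitespace and digits, as found in numbered, space-separated sequence listings. The class name is hypothetical.

```java
// Standalone sketch of removing spaces and digits from sequence text; not taken from JEBL.
final class CleanSketch {
    private CleanSketch() {}

    static String clean(CharSequence seq) {
        StringBuilder out = new StringBuilder(seq.length());
        for (int i = 0; i < seq.length(); i++) {
            char c = seq.charAt(i);
            // Keep everything except whitespace (spaces, tabs, newlines) and digits.
            if (!Character.isWhitespace(c) && !Character.isDigit(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

      As with the documented method, the result may be shorter than the input.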
    • toString

      public static String toString(State[] states)