.TH "getseq" 1 "Dec 1998" "HMMER 2.1.1" "HMMER Manual"

.SH NAME
getseq \- get a sequence from a flatfile database

.SH SYNOPSIS
.B getseq
.I [options]
.I seqname

.SH DESCRIPTION

.B getseq
retrieves the sequence named
.I seqname
from a sequence database.

.PP
The database to search is selected with the
.B -d
option (for "little databases") or the
.B -D
option (for "big databases").
The directory location of "big databases" can
be specified by environment variables,
such as $SWDIR for Swissprot, and $GBDIR
for Genbank (see
.B -D 
for complete list). 
A complete file path must be specified
for "little databases".
By default, if neither option is specified
and the name looks like a Swissprot identifier
(for example, it contains a _ character),
.B getseq
uses the $SWDIR environment variable and attempts
to retrieve the sequence
.I seqname
from Swissprot.

.PP
A variety of other options are available which allow
retrieval of subsequences
.RB ( -f ", " -t );
retrieval by accession number instead of
by name
.RB ( -a );
reformatting the extracted sequence into a variety
of other formats
.RB ( -F );
etc.

.PP
If the database has been GSI indexed, sequence
retrieval is extremely efficient; otherwise,
retrieval may be painfully slow (the entire
database may have to be read into memory to
find 
.IR seqname ).
GSI indexing
is recommended for all large or permanent 
databases. 

.SH OPTIONS

.TP
.B -a 
Interpret 
.I seqname
as an accession number, not an identifier.

.TP 
.BI -d " <seqfile>"
Retrieve the sequence from a sequence file named
.IR <seqfile> .
If a GSI index 
.I <seqfile>.gsi
exists, it is used to speed up the retrieval.

.TP
.BI -f " <from>"
Extract a subsequence starting from position
.IR <from> ,
rather than from 1. See
.BR -t .
If
.I <from>
is greater than
.I <to>
(as specified by the
.B -t
option), then the sequence is extracted as
its reverse complement (it is assumed to be
a nucleic acid sequence).

.TP
.B -h
Print brief help; includes version number and summary of
all options, including expert options.

.TP
.BI -o " <outfile>" 
Direct the output to a file named
.IR <outfile> .
By default, output goes to stdout.

.TP
.BI -r " <newname>"
Rename the sequence to
.I <newname>
in the output after extraction. By default, the original
sequence identifier is retained. This is useful, for instance,
when retrieving a sequence fragment: the coordinates of
the fragment can be added to the name (this is what Pfam
does).

.TP
.BI -t " <to>"
Extract a subsequence that ends at position
.IR <to> ,
rather than at the end of the sequence. See
.BR -f .
If
.I <to>
is less than
.I <from>
(as specified by the
.B -f
option), then the sequence is extracted as
its reverse complement (it is assumed to be
a nucleic acid sequence).

.TP
.BI -D " <database>"
Retrieve the sequence from the main sequence database
coded
.IR <database> .
For each code, there is an environment
variable that specifies the directory path to that
database.
Recognized codes and their corresponding environment
variables are
.I -Dsw
(Swissprot, $SWDIR);
.I -Dpir
(PIR, $PIRDIR);
.I -Dem
(EMBL, $EMBLDIR);
.I -Dgb
(Genbank, $GBDIR);
.I -Dwp 
(Wormpep, $WORMDIR); and
.I -Dowl
(OWL, $OWLDIR).
Each database is read in its native flatfile format.

.TP
.BI -F " <format>"
Reformat the extracted sequence into a different format.
(By default, the sequence is extracted from the database
in the same format as the database.) Available formats
are
.B embl, fasta, genbank, gcg, strider, zuker, ig, pir, squid,
and
.B raw.
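
.SH EXAMPLES

.PP
The following invocations are illustrative sketches that combine
the options described above. The sequence names, the accession
number, and the file paths are hypothetical placeholders, not
entries that are known to exist; substitute your own.

.PP
Retrieve a sequence by name from a "little database" file,
convert it to FASTA format, and write it to a file:

.nf
getseq -d /path/to/myseqs.embl -F fasta -o myseq1.fa myseq1
.fi

.PP
Retrieve an entry from Swissprot as a "big database". This
assumes that $SWDIR has already been set to the directory that
holds the Swissprot flatfile:

.nf
getseq -D sw EXAMPLE_HUMAN
.fi

.PP
Retrieve an entry by accession number rather than by name:

.nf
getseq -a -D sw P12345
.fi

.PP
Extract positions 10 through 50 of a sequence and rename the
fragment so that its name records the coordinates:

.nf
getseq -d /path/to/myseqs.embl -f 10 -t 50 -r myseq1/10-50 myseq1
.fi

.PP
Extract the same region of a nucleic acid sequence as its
reverse complement, by giving a
.I <from>
that is greater than
.IR <to> :

.nf
getseq -d /path/to/myseqs.embl -f 50 -t 10 myseq1
.fi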

.SH SEE ALSO

.PP
.BR alistat (1),
.BR seqstat (1),
.BR sreformat (1)

.SH AUTHOR

This software and documentation is Copyright (C) 1992-1998 Washington
University School of Medicine.  It is freely distributable under terms
of the GNU General Public License. See COPYING in the source code
distribution for more details, or contact me.

.nf
Sean Eddy
Dept. of Genetics
Washington Univ. School of Medicine
4566 Scott Ave.
St Louis, MO 63110 USA
Phone: 1-314-362-7666
FAX  : 1-314-362-7855
Email: eddy@genetics.wustl.edu
.fi


