This is the old version of the documentation: New Version
ChIP-Seq Analysis: Step 1, Creating a "Tag Directory"
To facilitate the analysis of ChIP-Seq (or any other type of short-read
re-sequencing data), it is useful to first transform the sequence
alignment into a platform-independent data structure representing the
experiment, analogous to loading the data into a database. HOMER
does this by placing all relevant information about the experiment into
a "Tag Directory", which is essentially a directory on your computer
that contains several files describing your experiment.
To create a "Tag Directory", you must have alignment files in one of
the following formats:
- BED format
- *.eland_result.txt or *_export.txt format from the Illumina pipeline
- bowtie output format
If your alignment is in a different format, it is recommended that you
convert it to BED format, with the following columns:
Column1: chromosome
Column2: start position
Column3: end position
Column4: Name (or strand +/-)
Column5: Number of reads at this position
Column6: Strand +/-
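As an illustration of the six-column layout listed above, here is a minimal Python sketch that formats alignment records as tab-separated BED lines. The function name to_bed_line and the sample records are hypothetical and only serve the example; a real converter would read these values from whatever format your aligner produced.

```python
def to_bed_line(chrom, start, end, name, count, strand):
    """Format one alignment as a six-column, tab-separated BED line."""
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    if end < start:
        raise ValueError("end position must not be smaller than start position")
    # Column order follows the list above:
    # chromosome, start, end, name, number of reads, strand
    return "\t".join([chrom, str(start), str(end), name, str(count), strand])


# Hypothetical records: (chromosome, start, end, read name, read count, strand)
alignments = [
    ("chr1", 100, 136, "read1", 1, "+"),
    ("chr2", 500, 536, "read2", 3, "-"),
]

bed_lines = [to_bed_line(*record) for record in alignments]
print("\n".join(bed_lines))
```

Writing the resulting lines to a file, one per line, should give a BED file with the column layout described above.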
Alternatively (or in combination), you can make tag directories from
existing tag directories or from tag files (explained below).
To make a tag directory, run the following command:
makeTagDirectory <Output Directory Name> [options] <alignment file1> [alignment file 2] ...
The first argument (required) must be the output directory. If it does
not exist, it will be created. If it does exist, it will be
overwritten.
An example:
makeTagDirectory Macrophage-PU.1-ChIP-Seq/ pu1.lane1.bed pu1.lane2.bed pu1.lane3.bed
Several additional options exist for makeTagDirectory. The program
attempts to guess the format of your alignment files, but if it is
unsuccessful, you can force the format with "-format <X>".
To combine tag directories, for example when merging separate
experiments into one, do the following:
makeTagDirectory Combined-PU.1-ChIP-Seq/ -d Exp1-ChIP-Seq/ Exp2-ChIP-Seq/ Exp3-ChIP-Seq/
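If you script your analysis, the invocations shown above can be assembled programmatically. The following Python sketch only builds the argument list and does not run HOMER; the helper name build_make_tag_directory_cmd is hypothetical, and it uses only the -format and -d options mentioned in this document, placed after the alignment files.

```python
import shlex


def build_make_tag_directory_cmd(output_dir, alignment_files=(), fmt=None, tag_dirs=()):
    """Build a makeTagDirectory argument list (hypothetical helper).

    output_dir      -- the tag directory to create (first argument, required)
    alignment_files -- alignment files to load
    fmt             -- optional value for "-format" when auto-detection fails
    tag_dirs        -- existing tag directories to combine with "-d"
    """
    cmd = ["makeTagDirectory", output_dir]
    cmd.extend(alignment_files)
    if fmt is not None:
        cmd.extend(["-format", fmt])
    if tag_dirs:
        cmd.append("-d")
        cmd.extend(tag_dirs)
    return cmd


create_cmd = build_make_tag_directory_cmd(
    "Macrophage-PU.1-ChIP-Seq/",
    ["pu1.lane1.bed", "pu1.lane2.bed", "pu1.lane3.bed"],
    fmt="bed",
)
combine_cmd = build_make_tag_directory_cmd(
    "Combined-PU.1-ChIP-Seq/",
    tag_dirs=["Exp1-ChIP-Seq/", "Exp2-ChIP-Seq/", "Exp3-ChIP-Seq/"],
)

# Print the commands as they would be typed in a shell.
print(shlex.join(create_cmd))
print(shlex.join(combine_cmd))
```

On a machine where HOMER is installed, either list could be passed to subprocess.run(cmd, check=True) to execute it. Remember that an existing output directory will be overwritten.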
What does makeTagDirectory do?
makeTagDirectory parses the alignment file and splits the tags into
separate files based on their chromosome. As a result, several
*.tags.tsv files are created in the output directory. Organizing the
tags this way lets downstream analysis retrieve the data very
efficiently, and it helps speed up the analysis of very large
data sets without running out of memory.
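The idea of splitting tags by chromosome can be sketched in a few lines of Python. This is not HOMER's implementation, and it does not reproduce the real *.tags.tsv column layout, which is not described here; it simply groups BED lines in memory by the chromosome in their first column. The function name split_by_chromosome and the sample lines are hypothetical.

```python
from collections import defaultdict


def split_by_chromosome(bed_lines):
    """Group tab-separated BED lines by the chromosome in their first column."""
    per_chrom = defaultdict(list)
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        chrom = line.split("\t", 1)[0]
        per_chrom[chrom].append(line)
    return dict(per_chrom)


# Hypothetical input lines in the six-column BED layout.
bed_lines = [
    "chr1\t100\t136\tread1\t1\t+",
    "chr2\t500\t536\tread2\t1\t-",
    "chr1\t900\t936\tread3\t1\t-",
]

groups = split_by_chromosome(bed_lines)
for chrom, lines in sorted(groups.items()):
    print(f"{chrom}: {len(lines)} tag record(s)")
```

Because each chromosome ends up in its own group, a later step that only needs one chromosome can load just that group, which is the reason per-chromosome files keep memory use low on large data sets.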
In the end, your output directory will contain several *.tags.tsv
files, as well as a file named "tagInfo.txt". This file contains
information about your sequencing run, including the
total number of tags considered. It is used by later
programs to quickly reference information about the experiment, and can
be manually modified to set certain parameters for analysis.
makeTagDirectory also
performs several quality control steps which are covered in the next
section.
Command line options of the makeTagDirectory command:
Usage: makeTagDirectory <directory> <alignment file 1> [file 2] ... [options]
Creates a platform-independent 'tag directory' for later analysis.
Currently BED, Eland, and bowtie files are accepted. The program will try to
automatically detect the alignment format if not specified.
Existing tag directories can be added or combined to make a new one using -d/-t.
If more than one format is needed and the program cannot auto-detect it properly,
make separate tag directories by running the program separately, then combine them.
Options:
    -genome <genome name> (specify genome for later analysis)
        To list available genomes, run "??"
    -name <experiment name> (optional, names the experiment)
    -format <X> where X can be: (with column specifications underneath)
        bed - BED format files:
            (1:chr,2:start,3:end,4:+/- or read name,5:# tags,6:+/-)
        bowtie - output from bowtie (run with --best -k 2 options)
            (1:read name,2:+/-,3:chr,4:position,5:seq,6:quality,7:NA,8:mismatch info)
        eland_result - output from basic eland
            (1:read name,2:seq,3:code,4:#zeroMM,5:#oneMM,6:#twoMM,7:chr,8:position,9:F/R,10-:mismatches)
        eland_export - output from illumina pipeline (22 columns total)
            (1-5:read name info,9:sequence,10:quality,11:chr,13:position,14:strand)
    -C (color space mapping with bowtie)
    -keep (keep one mapping of each read regardless if multiple equal mappings exist)
    -forceBED (if 5th column of BED file contains stupid values, like mapping quality
        instead of number of tags, then ignore this column)
    -d <tag directory> [tag directory 2] ... (add Tag directory to new tag directory)
    -t <tag file> [tag file 2] ... (add tag file i.e. *.tags.tsv to new tag directory)
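To see why the -forceBED option listed above matters, consider a BED file whose fifth column holds a mapping quality rather than a tag count. The Python sketch below is an interpretation, not HOMER's code: it assumes that when the fifth column is ignored, each line counts as a single tag. The function name count_tags and the sample lines are hypothetical.

```python
def count_tags(bed_lines, ignore_fifth_column=False):
    """Sum the tags in six-column BED lines.

    By default, column 5 is read as the number of tags at each position.
    With ignore_fifth_column=True, each line counts as one tag
    (an assumption made for this sketch).
    """
    total = 0
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        total += 1 if ignore_fifth_column else int(fields[4])
    return total


# Hypothetical lines where column 5 is a mapping quality of 255, not a tag count.
bed_lines = [
    "chr1\t100\t136\tread1\t255\t+",
    "chr1\t200\t236\tread2\t255\t-",
]

counted = count_tags(bed_lines)                           # trusts column 5
forced = count_tags(bed_lines, ignore_fifth_column=True)  # ignores column 5
print(counted, forced)
```

Here two reads would be counted as 510 tags if the fifth column were trusted, which is the kind of inflation that ignoring the column avoids.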