Homer Software and Data Download

Step 1 - Build Index (takes a while, but only do this once):

After installing bowtie, the reference genome must first be "indexed" so that reads can be aligned quickly. You can download pre-made indexes from the bowtie website (check for those there first). Otherwise, to build your own from FASTA files, do the following:

Download FASTA files for the unmasked genome of interest if you haven't already (e.g. from UCSC).
From the directory containing the FASTA files, run the "bowtie-build" command. For example, for hg18:

/path-to-bowtie-programs/bowtie-build chr1.fa,chr2.fa,chr3.fa,...chrY.fa,chrM.fa hg18

Here "..." stands for the rest of the *.fa files. This command will take a long time to run, and will produce several files named hg18.*.ebwt
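Typing every chromosome file by hand is error-prone, so the comma-separated list can be assembled by the shell instead. The following is only a sketch: the function name fasta_list is made up for this example, and it assumes the directory holds only the *.fa files you actually want in the index. The files are listed in the shell's sorted order (chr1.fa, chr10.fa, ...), not in karyotype order.

```shell
#!/bin/sh
# Join every *.fa file in a directory into the comma-separated list
# that bowtie-build takes as its first argument.
# fasta_list is a hypothetical helper, not part of bowtie or HOMER.
fasta_list() {
    dir=${1:-.}
    list=""
    for f in "$dir"/*.fa; do
        [ -e "$f" ] || continue        # skip the unexpanded pattern if nothing matches
        list="${list:+$list,}$f"       # add a comma only between entries
    done
    printf '%s\n' "$list"
}

# Possible use, from the directory containing the FASTA files:
#   /path-to-bowtie-programs/bowtie-build "$(fasta_list .)" hg18
```

If the directory also contains files you do not want indexed (for example chr*_random.fa files), move them elsewhere first or write the list out by hand as shown above.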

Copy the *.ebwt files to the bowtie indexes directory so that bowtie knows where to find them later:

cp *.ebwt /path-to-bowtie-programs/indexes/
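Before moving on, it may be worth confirming that the whole index arrived in the indexes directory. The sketch below assumes the usual bowtie output of six files per index (NAME.1.ebwt to NAME.4.ebwt plus NAME.rev.1.ebwt and NAME.rev.2.ebwt); check this list against what your bowtie version actually wrote. The function name check_bowtie_index is made up for this example.

```shell
#!/bin/sh
# Report any index file that is missing or empty.
# check_bowtie_index is a hypothetical helper; the six suffixes are an
# assumption about bowtie-build output and may differ for your version.
check_bowtie_index() {
    dir=$1
    name=$2
    missing=0
    for part in 1 2 3 4 rev.1 rev.2; do
        if [ ! -s "$dir/$name.$part.ebwt" ]; then
            echo "missing or empty: $dir/$name.$part.ebwt" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible use:
#   check_bowtie_index /path-to-bowtie-programs/indexes hg18 && echo "index looks complete"
```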

Step 2 - Align sequences with bowtie (perform for each experiment):

The most common output format for high-throughput sequencing is FASTQ, which contains the sequence of each read (As, Cs, Gs and Ts) together with quality information describing how certain the sequencer is of each base call. In the case of Illumina sequencing, the output is usually an "s_1_sequence.txt" file. In addition, much of the data available in the SRA, the primary archive of high-throughput sequencing data, is in this format. To map this data, run the following command:
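A quick sanity check on the FASTQ file can save a wasted alignment run. The sketch below assumes the common four-line record layout (a header line starting with "@", the sequence, a line starting with "+", and the quality string) with no line wrapping; the function name count_fastq_reads is made up for this example.

```shell
#!/bin/sh
# Count reads in a four-line-per-record FASTQ file.
# Exits non-zero if the line count is not a multiple of four or if a
# header/separator line does not start with the expected character.
# count_fastq_reads is a hypothetical helper, not part of bowtie or HOMER.
count_fastq_reads() {
    awk 'NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }
         NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1 }
         END { if (bad || NR % 4 != 0) exit 1; print NR / 4 }' "$1"
}

# Possible use:
#   count_fastq_reads s_1_sequence.txt
```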

/path-to-bowtie-programs/bowtie -q --best -m 1 -p <# cpu> <genome> <fastq file> <output filename>
Where <genome> would be hg18 from the index made above, <fastq file> could be "s_1_sequence.txt", and <output filename> something like "s_1_sequence.hg18.alignment.txt"

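The parameters "--best" and "-m 1" are needed to make sure bowtie outputs only unique alignments: "-m 1" suppresses every read that has more than one reportable alignment, and "--best" makes bowtie report the best alignment it finds (fewest mismatches) rather than the first one. There are many options and many different ways to perform alignments, with different trade-offs for different types of projects, and these are well beyond the scope of what I am describing here.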
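Since the alignment is repeated for each experiment, it can help to generate the commands rather than type them. The sketch below only prints the command lines so they can be reviewed before running; the function names alignment_name and bowtie_commands are made up for this example, and the output naming simply follows the "s_1_sequence.hg18.alignment.txt" pattern suggested above.

```shell
#!/bin/sh
# Derive an output name from a FASTQ name and a genome:
#   s_1_sequence.txt + hg18 -> s_1_sequence.hg18.alignment.txt
# alignment_name and bowtie_commands are hypothetical helpers.
alignment_name() {
    base=${1%.*}                       # strip the last extension (.txt, .fastq, ...)
    printf '%s.%s.alignment.txt\n' "$base" "$2"
}

# Print (do not run) one bowtie command per FASTQ file.
# Arguments: <genome> <# cpu> <fastq file> [<fastq file> ...]
bowtie_commands() {
    genome=$1
    cpus=$2
    shift 2
    for fastq in "$@"; do
        printf '/path-to-bowtie-programs/bowtie -q --best -m 1 -p %s %s %s %s\n' \
            "$cpus" "$genome" "$fastq" "$(alignment_name "$fastq" "$genome")"
    done
}

# Possible use:
#   bowtie_commands hg18 4 s_1_sequence.txt s_2_sequence.txt
```

Printing first and running second keeps a record of exactly which options were used for each experiment.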

NOTE: HOMER contains automated parsing for uniquely aligned reads from output files generated with bowtie in this fashion. HOMER also accepts *eland_result.txt and *_export.txt formats from the Illumina pipeline. If different programs are used, or special parsing of output files is needed, please parse/reformat alignment files to general BED format, which is also accepted by HOMER.
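As an illustration of what reformatting to BED can look like, the sketch below converts bowtie's default tab-separated output into six-column BED. It is not needed for bowtie output itself, which HOMER parses directly, and it rests on assumptions you should verify: that columns 1 to 5 of the input are read name, strand, reference name, 0-based leftmost offset and read sequence, and that a chrom/start/end/name/score/strand layout is what you want. The function name bowtie_to_bed is made up for this example, and the score column is filled with a placeholder 0.

```shell
#!/bin/sh
# Convert bowtie default output (tab-separated) to six-column BED.
# Assumed input columns: 1 read name, 2 strand, 3 reference name,
# 4 0-based leftmost offset, 5 read sequence.
# bowtie_to_bed is a hypothetical helper, not part of bowtie or HOMER.
bowtie_to_bed() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
        { print $3, $4, $4 + length($5), $1, 0, $2 }' "$1"
}

# Possible use:
#   bowtie_to_bed s_1_sequence.hg18.alignment.txt > s_1_sequence.hg18.bed
```

For output from other aligners the column numbers will differ, so adjust the awk fields to match that program's format.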

HOMER

Alignment of High-throughput Sequencing Data

Which reference genome (version) should I map my reads to?

Should I trim my reads when mapping to the genome?

Example - Alignment with bowtie:
