Bah, xslt has problems with comments. It will bail out with

   file.xml:1602: parser error : Comment not terminated

when finding things like this (XML does not allow "--" inside a comment, and the ls output below contains it):

<!--

bla bla bla
-rw-r--r-- 1 bach bach     954473 2010-01-27 20:48 rel8593a_info_featuresummary.

-->


Therefore, instead of commenting out, things go to attic *sigh*




  <sect1 id="sect1_est_walkthroughs">
    <title>
      Walkthroughs
    </title>
    <para>
      These walkthroughs use "msd" as the project name (an acronym for My
      Simple Dataset). Please replace it with your own project name according
      to the MIRA naming convention.
    </para>
    <sect2 id="sect2_mira_with_jobest">
      <title>
	mira with "--job=est"
      </title>
      <sect3 id="sect3_input:_one_strain_sanger_without_adaptors_and_no_xml">
	<title>
	  Example: One strain, Sanger without vectors and no XML
	</title>
	<para>
	  Given are just a FASTA file and a FASTA quality file in which the
	  Sanger sequencing vector sequences and problematic regions (such as
	  bad quality) have either been removed completely or masked with
	  "X". Apart from that, no further processing (poly-A removal etc.)
	  was done. Your directory looks like this:
	</para>
	<screen>
<prompt>bach@arcadia:$</prompt> <userinput>ls -l</userinput>
-rwxr--r-- 1 bach bach 15486163 2009-02-22 21:01 msd_in.sanger.fasta
-rwxr--r-- 1 bach bach 38017687 2009-02-22 21:01 msd_in.sanger.fasta.qual</screen>
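	<para>
	  Since the input files have to follow the naming scheme shown in the
	  listing above, a quick check before starting the assembly can save
	  a failed run. The following is only a sketch: the function
	  <literal>check_mira_inputs</literal> is a hypothetical helper, not
	  part of MIRA, and it only tests that the FASTA and FASTA quality
	  files for a given project name and technology exist and are not
	  empty.
	</para>
	<programlisting language="shell">
# Hypothetical helper, not part of MIRA: check that the FASTA and FASTA
# quality files for a project exist in the current directory and are
# not empty.
# Usage: check_mira_inputs msd sanger
check_mira_inputs() {
  cmi_project=$1
  cmi_tech=$2
  cmi_status=0
  for cmi_file in "${cmi_project}_in.${cmi_tech}.fasta" \
                  "${cmi_project}_in.${cmi_tech}.fasta.qual"; do
    if [ ! -s "$cmi_file" ]; then
      echo "missing or empty: $cmi_file"
      cmi_status=1
    fi
  done
  return "$cmi_status"
}</programlisting>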
	<para>
	  Then, use this command:
	</para>
	<screen>
<prompt>bach@arcadia:$</prompt> <userinput>mira --project=msd
  --job=denovo,est,accurate,sanger
  SANGER_SETTINGS
  -CL:qc=no
  &gt;&amp; log_assembly.txt</userinput></screen>
	<para>
	  We switch off the Sanger quality clips because bad quality is
	  already trimmed away by your pipeline.
	</para>
      </sect3>
      <sect3 id="sect3_input:_one_strain_454_with_xml_ancillary_data">
	<title>
	  Example: One strain, 454 with XML ancillary data
	</title>
	<para>
	  Like above, but this time with 454 sequencing. The FASTA files
	  contain everything (including remaining adaptors and bad quality),
	  but there is an XML file with ancillary data which contains all
	  necessary clips (as generated by, e.g.,
	  <command>sff_extract</command>):
	</para>
	<screen>
<prompt>bach@arcadia:$</prompt> <userinput>ls -l</userinput>
-rwxr--r-- 1 bach bach 15486163 2009-02-22 21:01 msd_in.454.fasta
-rwxr--r-- 1 bach bach 38017687 2009-02-22 21:01 msd_in.454.fasta.qual
-rwxr--r-- 1 bach bach 10433244 2009-02-22 21:01 msd_traceinfo_in.454.xml</screen>
	<para>
	  Then, use this command:
	</para>
	<screen>
<prompt>bach@arcadia:$</prompt> <userinput>mira --project=msd
  --job=denovo,est,accurate,454
  454_SETTINGS
  -CL:qc=no
  &gt;&amp; log_assembly.txt</userinput></screen>
	<para>
	  We just switch off our quality clip for 454 (the quality clips are
	  loaded from the XML instead); poly-A removal is performed by MIRA.
	  Loading of TRACEINFO XML data does not need to be switched on
	  explicitly, as it is the default for 454 data.
	</para>
      </sect3>
      <sect3 id="sect3_input:_one_strain_454_with_xml_ancillary_data_polya_already_removed">
	<title>
	  Example: One strain, 454 with XML ancillary data, poly-A already removed
	</title>
	<para>
	  Like above, but this time the data was pre-processed by another program
	  to mask the poly-A stretches with X:
	</para>
	<screen>
<prompt>bach@arcadia:$</prompt> <userinput>ls -l</userinput>
-rwxr--r-- 1 bach bach 15486163 2009-02-22 21:01 msd_in.454.fasta
-rwxr--r-- 1 bach bach 38017687 2009-02-22 21:01 msd_in.454.fasta.qual
-rwxr--r-- 1 bach bach 10433244 2009-02-22 21:01 msd_traceinfo_in.454.xml</screen>
	<para>
	  Then, use this command:
	</para>
	<screen>
<prompt>bach@arcadia:$</prompt> <userinput>mira --project=msd
  --job=denovo,est,accurate,454
  454_SETTINGS
  -CL:qc=no:cpat=no
  &gt;&amp; log_assembly.txt</userinput></screen>
	<para>
	  We just switch off our quality clip (and load the quality clips from
	  the XML) and also switch off poly-A clipping. Remember, never
	  perform poly-A/T clipping twice on a data set.
	</para>
      </sect3>
      <sect3 id="sect3_input:_two_strains_454_with_xml_ancillary_data_polya_already_removed">
	<title>
	  Example: Two strains, 454 with XML ancillary data, poly-A already
	  removed
	</title>
	<para>
	  Like above, but this time we assign reads to different
	  strains. This can happen either by putting the strain information
	  into the XML file (using the <literal>strain</literal> field of the
	  NCBI TRACEINFO format definition) or by using a two column,
	  tab-delimited file which mira loads on request.
	</para>
	<para>
	  As noted above, when the strain information is in the XML, no
	  change to the command line from the last example is needed. This
	  example uses the extra file with strain information instead. The
	  file <filename>msd_straindata_in.txt</filename> contains key-value
	  pairs that assign reads to strains and looks like this (the gnlti*
	  entries are read names):
	</para>
	<screen>
<prompt>bach@arcadia:$</prompt> <userinput>cat msd_straindata_in.txt</userinput>
gnlti136478626 tom
gnlti136479357 tom
gnlti136479063 tom
gnlti136478624 jerry
gnlti136479522 jerry
gnlti136477918 jerry</screen>
	<para>
	  Then, use this command (note the additional <arg>-LR:lsd</arg>
	  option):
	</para>
	<screen>
<prompt>bach@arcadia:$</prompt> <userinput>mira --project=msd
  --job=denovo,est,accurate,454
  454_SETTINGS
  -LR:lsd=yes
  -CL:qc=no:cpat=no
  &gt;&amp; log_assembly.txt</userinput></screen>
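	<para>
	  Because the strain file is expected to have two tab-delimited
	  columns, it may be worth checking its layout before starting the
	  assembly. The following is a sketch using standard awk; the
	  function <literal>check_strain_file</literal> is a hypothetical
	  helper and not part of MIRA. It reports every line that does not
	  have exactly two tab-separated fields (blank lines included) and
	  returns a non-zero status if it finds one.
	</para>
	<programlisting language="shell">
# Hypothetical helper, not part of MIRA: verify that every line of a
# strain data file has exactly two tab-separated columns.
# Usage: check_strain_file msd_straindata_in.txt
check_strain_file() {
  awk -F'\t' 'NF != 2 { printf "line %d: expected 2 columns, found %d\n", NR, NF; bad = 1 }
              END { exit (bad ? 1 : 0) }' "$1"
}</programlisting>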
      </sect3>
    </sect2>
    <sect2 id="sect2_mirasearchestsnps">
      <title>
	miraSearchESTSNPs
      </title>
      <sect3 id="sect3_input:_two_strains_sanger_with_masked_sequences_no_xml">
	<title>
	  Example: Two strains, Sanger with masked sequences, no XML
	</title>
	<para>
	  Given are just a FASTA file and a FASTA quality file in which the
	  Sanger sequencing vectors and all sequencing-related problems (such
	  as bad quality) have either been removed completely or masked with
	  "X". Apart from that, no further processing (poly-A removal etc.)
	  was done.
	</para>
	<para>
	  You have <emphasis>n</emphasis> strains (in this
	  example <emphasis>n</emphasis>=2) called "tom" and "jerry".
	</para>
	<para>
	  Your directory looks like this:
	</para>
	<screen>
<prompt>bach@arcadia:$</prompt> <userinput>ls -l</userinput>
-rw-r--r-- 1 bach bach  5276 2009-02-22 21:23 msd_in.sanger.fasta
-rw-r--r-- 1 bach bach 13827 2009-02-22 21:23 msd_in.sanger.fasta.qual
-rw-r--r-- 1 bach bach   120 2009-02-22 21:27 msd_straindata_in.txt</screen>
	<para>
	  The file <filename>msd_straindata_in.txt</filename> contains
	  key-value pairs that assign reads to strains and looks like this
	  (the gnlti* entries are read names):
	</para>
	<screen>
<prompt>bach@arcadia:$</prompt> <userinput>cat msd_straindata_in.txt</userinput>
gnlti136478626 tom
gnlti136479357 tom
gnlti136479063 tom
gnlti136478624 jerry
gnlti136479522 jerry
gnlti136477918 jerry</screen>
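	<para>
	  Before starting the multi-step run, it can help to confirm how
	  many reads are assigned to each strain. The following is a sketch
	  using standard awk and sort, assuming the file is tab-delimited as
	  described for mira above; the function
	  <literal>count_reads_per_strain</literal> is a hypothetical helper
	  and not part of MIRA. Lines that do not have exactly two columns
	  are silently ignored.
	</para>
	<programlisting language="shell">
# Hypothetical helper, not part of MIRA: print each strain name and the
# number of reads assigned to it, one strain per line, sorted by name.
# Usage: count_reads_per_strain msd_straindata_in.txt
count_reads_per_strain() {
  awk -F'\t' 'NF == 2 { n[$2]++ }
              END { for (s in n) printf "%s\t%d\n", s, n[s] }' "$1" | sort
}</programlisting>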
	<para>
	  To assemble, use this:
	</para>
	<screen>
<prompt>bach@arcadia:$</prompt> <userinput>miraSearchESTSNPs
  --project=msd
  --job=denovo,accurate,sanger,esps1
  &gt;&amp;log_assembly_esps1.txt</userinput></screen>
	<para>
	  Note that the results of this first step are in sub-directories
	  prefixed with "step1".
	</para>
	<para>
	  When the first step has finished, continue with this (note that no
	  "--project" is given here):
	</para>
	<screen>
<prompt>bach@arcadia:$</prompt> <userinput>miraSearchESTSNPs
  --job=denovo,accurate,esps2
  &gt;&amp;log_assembly_esps2.txt</userinput></screen>
	<para>
	  Note that the results of this second step are in sub-directories
	  prefixed with "tom", "jerry" and "remain". You will find in each
	  directory the clean transcripts from every strain/organism.
	</para>
	<para>
	  To see which SNPs exist between "tom" and "jerry", launch the
	  third step:
	</para>
	<screen>
<prompt>bach@arcadia:$</prompt> <userinput>miraSearchESTSNPs
  --job=denovo,accurate,esps3
  &gt;&amp;log_assembly_esps3.txt</userinput></screen>
	<para>
	  Note that the results of this third step are in sub-directories
	  prefixed with "step3".
	</para>
	<para>
	  In the <filename>step3_d_results</filename> directory for example,
	  you can transform the CAF file into a gap4 database and then look at
	  the SNPs searching for the tags SROr, SIOr and SAOr.
	</para>
      </sect3>
    </sect2>
  </sect1>
