I'm going to thank and for most of the methods that I'm going to explain in this thread. So you have a list of books you need to buy for college. There are some really great resources for finding them online, but I'm going to run through the easiest methods for searching all of them here.
First, let's start off by using the method that will allow you to find the vast majority of books. It tends to be easier if you can search multiple sites at once, and this is the easiest way to do it. There was a search engine that made 3 years ago, but by now most of the sites are either dead or have changed the url. I decided to make a new one, this time after reviewing 100+ sites and adding to working ones to the custom search engine.
Sites it indexes:. Also search on since it's not showing up in the search results for some odd reason. This should be able to find most of your books. Unfortunately, not every book ever published is going to be on the internet. Trying 'ext:pdf (title of book)' into Google might also yield some results, just be careful what you click on. So, in order to find the ones you can't find, there a few other ways to find them. You can download ebooks via irc.
The easiest way to search is on a search engine, like this one:. Then follow the last part of 's post: IRC If you don't know how to use IRC to download books, check out this link or read this: IRC You obviously need an IRC client installed, a quick google search should find you a bunch of decent free ones. Load the client and then type: /server irc.irchighway.net This will connect you to the irchighway server.
Once you're on there, type: /join #ebooks This will take you into the ebooks channel, which has more than 2.2Terabytes of ebooks for free download. Once you're in, all you have to do to find what you want is to simply type: @search 'Author or book title'. Removing the inverted commas.
You'll get a message saying that your search has been accepted. A few seconds later a searchbot will offer you a file, so accept the download. That will send you a.txt file that's zipped. Extract that and you'll get a list of files.
I searched for Lian Hearn and got a list of 147 files. Ive heard some allusion to the idea of chopping off the binding of a book and using some kind of large automated scanner.
I imagined that some of those huge office printers or perhaps the newer ones have the ability to take a stack of papers in a little drawer (prob similar to the place you put paper for printing) then the scanner does it page by page. Not positive though. Just found a video of it: also came across this: though im sure this is much more involved.
The main thing weighing on my mind is how to even get started building a robotic arm. I mean too weak and it gets out of place and ends up swinging at the air for the last 200 pages, too strong and thing the could hurt someone or break soemthing let alone destroy the book. Still- i may try to make one.
Background Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent’s PGM, Pacific Biosciences’ RS and the Illumina MiSeq.
Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy. Results Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate.
Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform. Sequencing technology is evolving rapidly and during the course of 2011 several new sequencing platforms were released.
Of note were the Ion Torrent Personal Genome Machine (PGM) and the Pacific Biosciences (PacBio) RS that are based on revolutionary new technologies. The Ion Torrent PGM “harnesses the power of semiconductor technology” detecting the protons released as nucleotides are incorporated during synthesis. DNA fragments with specific adapter sequences are linked to and then clonally amplified by emulsion PCR on the surface of 3-micron diameter beads, known as Ion Sphere Particles. The templated beads are loaded into proton-sensing wells that are fabricated on a silicon wafer and sequencing is primed from a specific location in the adapter sequence. As sequencing proceeds, each of the four bases is introduced sequentially. If bases of that type are incorporated, protons are released and a signal is detected proportional to the number of bases incorporated.
PacBio have developed a process enabling single molecule real time (SMRT) sequencing. Here, DNA polymerase molecules, bound to a DNA template, are attached to the bottom of 50 nm-wide wells termed zero-mode waveguides (ZMWs). Each polymerase is allowed to carry out second strand DNA synthesis in the presence of γ-phosphate fluorescently labeled nucleotides.
The width of the ZMW is such that light cannot propagate through the waveguide, but energy can penetrate a short distance and excite the fluorophores attached to those nucleotides that are in the vicinity of the polymerase at the bottom of the well. As each base is incorporated, a distinctive pulse of fluorescence is detected in real time. In recent years, the sequencing industry has been dominated by Illumina, who have adopted a sequencing-by-synthesis approach , utilizing fluorescently labeled reversible-terminator nucleotides, on clonally amplified DNA templates immobilized to an acrylamide coating on the surface of a glass flowcell. The Illumina Genome Analyzer and more recently the HiSeq 2000 have set the standard for high throughput massively parallel sequencing, but in 2011 Illumina released a lower throughput fast-turnaround instrument, the MiSeq, aimed at smaller laboratories and the clinical diagnostic market. Sequence generation Platform specific libraries were constructed for a set of microbial genomes Bordetella pertussis (67.7% GC, with some regions in excess of 90% GC content), Salmonella Pullorum (52% GC), Staphylococcus aureus (33% GC) and Plasmodium falciparum (19.3% GC, with some regions close to 0% GC content). We routinely use these to test new sequencing technologies, as together their sequences represent the range of genomic landscapes that one might encounter.
PCR-free Illumina libraries were uniquely barcoded, pooled and run on a MiSeq flowcell with paired 150 base reads plus a 6-base index read and also on a single lane of an Illumina HiSeq with paired 75 base reads plus an 8-base index read (Additional file: Table S1). Illumina libraries prepared with amplification using Kapa HiFi polymerase were run on a single lane of an Illumina GA IIx with paired 76 base reads plus an 8-base index read and on a MiSeq flowcell with paired 150 base reads plus a 6-base index read. PCR-free libraries represent an improvement over the standard Illumina library preparation method as they result in more even sequence coverage and are included here alongside libraries prepared with PCR in order to enable comparison to PacBio which has an amplification free workflow. Ion Torrent libraries were each run on a single 316 chip for a 65 cycles generating mean read lengths of 112–124 bases (Additional file: Table S2). Standard PacBio libraries, with an average of 2 kb inserts, were run individually over multiple SMRT cells, each using C1 chemistry, and providing ≥20x sequence coverage data for each genome (Additional file: Table S3).
The datasets generated were mapped to the corresponding reference genome as described in Methods. For a fair comparison, all sequence datasets were randomly down-sampled (normalized) to contain reads representing a 15x average genome coverage. Workflow All the platforms have library preparation protocols that involve fragmenting genomic DNA and attaching specific adapter sequences. Typically this takes somewhere between 4 and 8 hours for one sample.
In addition, the Ion Torrent template preparation has a two hour emulsion PCR and a template bead enrichment step. In the battle to become the platform with the fastest turnaround time, all the manufacturers are seeking to streamline library preparation protocols. Life Technologies have developed the Ion Xpress Fragment Library Kit that has an enzymatic “Fragmentase” formulation for shearing starting DNA, thereby avoiding the labour of physical shearing and potentially enabling complete library automation.
We tested this kit on our four genomes alongside the standard library kit with physical shearing and found both to give equal genomic representation (see Additional file: Figure S1 for results obtained with P. Illumina purchased Epicentre in order to package the Nextera technology with the MiSeq. Nextera uses a transposon to shear genomic DNA and simultaneously introduce adapter sequences. The Nextera method can produce sequencing ready DNA in around 90 minutes and gave us remarkably even genome representation (Additional file: Figures S2 and Additional file: Figure S3) with B.
Pertussis and S. Aureus, but produced a very biased sequence dataset from the extremely AT-rich P.
Falciparum genome. Genome coverage and GC bias. Figure 1 Genome coverage plots for 15x depth randomly downsampled sequence coverage from the sequencing platforms tested. A) The percentage of the B. Pertussis genome covered at different read depths; B) The number of bases covered at different depths for B. Pertussis; C) The percentage of the S. Aureus genome covered at different read depths; D) The number of bases covered at different depths for S.
Aureus; E) The percentage of the P. Falciparum genome covered at different read depths; and F) The number of bases covered at different depths for P. In the context of the GC-rich genome of B. Pertussis, most platforms gave similar uniformity of sequence coverage, with the Ion Torrent data giving slightly more uneven coverage. Aureus genome the PGM performed better. The PGM gave very biased coverage when sequencing the extremely AT-rich P. Falciparum genome (Figure ).
This affect was also evident when we plotted coverage depth against GC content (Additional file: Figure S4). Whilst the PacBio platform gave a sequence dataset with quite even coverage on GC and extremely AT-rich contexts, it did demonstrate slight but noticeable unevenness of coverage and bias towards GC-rich sequences with the S.
Aureus genome. With the GC-neutral S. Pullorum genome all platforms gave equal coverage with unbiased GC representation (data not shown). The most dramatic observation from our results was the severe bias seen when sequencing the extremely AT-rich genome of P. Falciparum on the PGM. The result of this was deeper than expected coverage of the GC-rich var and subtelomeric regions and poor coverage within introns and AT-rich exonic segments (Figure ), with approximately 30% of the genome having no sequence coverage whatsoever.
This bias was observed with libraries prepared using both enzymatic and physical shearing (Additional file: Figure S1). Figure 2 Artemis genome browser screenshots illustrating the variation in sequence coverage of a selected region of P. Falciparum chromosome 11, with 15x depth of randomly normalized sequence from the platforms tested. In each window, the top graph shows the percentage GC content at each position, with the numbers on the right denoting the minimum, average and maximum values. The middle graph in each window is a coverage plot for the dataset from each instrument; the colour code is shown above graph a).
Each of the middle graphs shows the depth of reads mapped at each position, and below that in B-D are the coordinates of the selected region in the genome with gene models on the (+) strand above and (−) strand below. A) View of the first 200 kb of chromosome 11. Graphs are smoothed with window size of 1000. A heatmap of the errors, normalized by the amount of mapping reads is included just below the GC content graph (PacBio top line, PGM middle and MiSeq bottom).
B) Coverage over region of extreme GC content, ranging from 70% to 0%. C) Coverage over the gene PF3D71103500. D) Example of intergenic region between genes PF3D71104200 and PF3D71104300. The window size of B, C and D is 50 bp. In a recent study to investigate the optimal enzyme for next generation library preparation , we found that the enzyme used for fragment amplification during next generation library preparation can have a significant influence on bias.
We found the enzyme Kapa HiFi amplifies fragments with the least bias, giving even coverage, close to that obtained without amplification. Since the PGM has two amplification steps, one during library preparation and the other emulsion PCR (emPCR) for template amplification, we reasoned that this might be the cause of the observed bias. Substituting the supplied Platinum Taq enzyme with Kapa HiFi for the nick translation and amplification step during library preparation profoundly reduced the observed bias (Figure ). We were unable to further improve this by use of Kapa HiFi for the emPCR (results not shown). Figure 3 The effect of substituting Platinum HiFi PCR supermix with Kapa HiFi in the PGM library prep amplification step. A) The percentage of the P.
Falciparum genome covered at different read depths. The blue line shows the data obtained with the recommended Platinum enzyme and the green line with Kapa HiFi.
The red line depicts ideal coverage behavior. B) The number of bases covered at different depths. C) Sequence representation vs.
GC-content plots. Of the four genomes sequenced, the P. Falciparum genome is the largest and most complex and contains a significant quantity of repetitive sequences. Falciparum to analyse the effect of read length versus mappability.
As the PacBio pipeline doesn’t generate a mapping quality value and to ensure a fair comparison, we remapped the reads of all technologies using the k-mer based mapper, SMALT , and then analysed coverage across the P. Falciparum genome (Additional file: Table S4). This data confirms the poor performance of Ion Torrent on the P. Falciparum genome, as only 65% of the genome is covered with high quality (Q20) reads compared to 98-99% for the other platforms.
Whilst the mean mapped readlength of the PacBio reads with this genome was 1336 bases, average subread length (the length of sequence covering the genome) is significantly less (645 bases). The short average subread length is due to preferential loading of short fragment constructs in the library and the effect of lag time (non-imaged bases) after sequencing initiation, the latter resulting in sequences near the beginning of library constructs not being reported. As the median length of the PacBio subreads for this data set are just 600 bases, we compared their coverage with an equivalent amount of in silico filtered reads of 620 bases. This led to a very small decrease in the percentage of bases covered. Using paired reads on the Illumina MiSeq, however, gave a strong positive effect, with 1.1% more coverage being observed from paired-end reads compared to single-end reads. We observed error rates of below 0.4% for the Illumina platforms, 1.78% for Ion Torrent and 13% for PacBio sequencing (Table ).
The number of error-free reads, without a single mismatch or indel, was 76.45%, 15.92% and 0% for, MiSeq, Ion Torrent and PacBio, respectively. The error heatmap in Figure shows that the PacBio errors are distributed evenly over the chromosome. We manually inspected the regions where Ion Torrent and Illumina generated more errors. Illumina produced errors after long ( 20-base) homopolymer tracts (Figure ). Figure 4 Illustration of platform-specific errors.
The panels show Artemis BAM views with reads (horizontal bars) mapping to defined regions of chromosome 11 of P. Falciparum from PacBio (P; top), Ion Torrent (I; middle) and MiSeq (M; bottom). Red vertical dashes are 1 base differences to the reference and white points are indels.
A) Illustration of errors in Illumina data after a long homopolymer tract. Ion torrent data has a drop of coverage and multiple indels are visible in PacBio data. B) Example of errors associated with short homopolymer tracts. Multiple insertions are visible in the PacBio Data, deletions are observed in the PGM data and the MiSeq sequences read generally correct through the homopolymer tract. C) Example of strand specific deletions (red circles) observed in Ion Torrent data.
Also evident in the MiSeq data, were strand errors due to the GGC motif. Following the finding that the motif GGC generates strand-specific errors, we analyzed this phenomenon in the MiSeq data for P.
Falciparum (Additional file: Table S5). We observed that the error is mostly generated by GC-rich motifs, principally GGCGGG.
Explain Pain Video
We found no evidence for an error if the triplet after the GGC is AT-rich. Other MiSeq datasets also showed this artifact (data not shown).
In addition to this being a strand-specific issue, it appears that this is a read-specific phenomenon. Whilst there is a quality drop in the first read following these GC-rich motifs, there is a striking loss of quality in read 2, where the reads have nearly half the mean quality value compared to the read 1 reads for GC-rich triplets that follow the GGC motif. We could observe this low quality in read 2 in all our analysed Illumina lanes. For AT-rich motifs the ratio is nearly 1 (1.03). Ion Torrent didn’t generate reads at all for long ( 14-base) homopolymer tracts, and could not predict the correct number of bases in homopolymers 8 bases long.
Very few errors were observed following short homopolymer stretches in the MiSeq data (Figure ). Additionally, we observed strand-specific errors in the PGM data but were unable to associate these with any obvious motif (Figure ). In order to determine whether or not the higher error rates observed with the PGM and PacBio affected their ability to call SNPs, we aligned the reads from the S. Aureus genome, for which all platforms gave good sequence representation, against the reference genome of the closely related strain USA300FPR3757 , and compared the SNPs called against those obtained by aligning the reference sequences of the two genomes (Figure and Additional file: Table S6).
In order to create a fair comparison we initially used the same randomly normalized 15x datasets used in our analysis of genome coverage, which according to the literature is sufficient to accurately call heterozygous variants but found that that was insufficient for the PacBio datasets where a 190x coverage was used. Figure 5 Accuracy of SNP detection from the S. Aureus datasets generated from each platform, compared against the reference genome of its close relative S. Aureus USA300FPR3757. Both the Torrent server variant calling pipeline and SAMtools were used for Ion Torrent data; SAMtools was used for Illumina data and SMRT portal pipeline for PacBio data.
A) The percentage of SNPs detected using each platform overall (blue bar), and outside of repeats, indels and mobile genetic elements (red bar). B) The number of incorrect SNP calls for each platform overall (blue bar), and outside of repeats, indels and mobile genetic elements (red bar). Overall the rate of SNP calling was slightly higher for the Ion Torrent data than for Illumina data (chi square p value 3.15E-08), with approximately 82% of SNPs being correctly called for the PGM and 68-76% of the SNPs being detected from the Illumina data (Figure ). Conversely, the rate of false SNP calls was higher with Ion Torrent data than for Illumina data (Figure ). SNP calling from PacBio data proved more problematic, as existing tools are optimized for short-read data and not for high error-rate long-read data. We were reliant on SNPs called by the SMRT portal pipeline for this analysis.
Our results showed that SNP detection from PacBio data was not as accurate as that from the other platforms, with overall only 71% of SNPs being detected and 2876 SNPs being falsely called (Additional file: Table S6). Amongst the datasets obtained from the Illumina sequencers, the percentage of correct SNP calls was higher for the MiSeq (76%) than the GAIIx (70%) data than for that obtained from the HiSeq (69%), despite the same libraries being run on both MiSeq and HiSeq. The use of Nextera library preparation gave similar results with 76% of SNPs being correctly called. It should be noted that we found the inbuilt automatic variant calling inadequate on both MiSeq and PGM, with MiSeq reporter calling just 6.6% of variants and Torrent suite 1.5.1 calling only 1.4% of variants. A key feature of these new platforms is their speed.
Decreasing run time has clear advantages particularly within the clinical sequencing arena, but poses challenges in itself. Whilst manufacturers may state library prep times on the order of a couple of hours, these times don’t include upfront QC and library QC and quantification. Also, typical library prep times quoted usually apply to processing of only one sample; i.e., pipetting time is largely ignored.
Purchasers of sequencing instruments will want to keep them running at full utilization, in order to maximize their investment and will also want to pool multiple samples on single runs for economic reasons. To obtain maximum throughput, users must consider the whole process, potentially investing in ancillary equipment and robotics to create an automated pipeline for the preparation of large numbers of samples. To process large numbers of samples quickly, a facility’s instrument base must be large enough to avoid sample backlogs. With this in mind, manufacturers are seeking to develop more streamlined sample-prep protocols that will facilitate timely sample loading.
Here we have tested two such developments: enzymatic fragmentation and the Nextera technique. We conclude that these methods can be very useful, but users must carefully evaluate the methods they use for their particular applications and for use with genomes of extreme base composition to avoid bias. Whilst the data generated using the Ion Torrent PGM platform has a higher raw error rate (1.8%) than Illumina data (. Genomic DNA P. Falciparum 3D7 genomic DNA was a gift from Prof Chris Newbold, University of Oxford, UK.
Bordetella pertussis ST24 genomic DNA was a gift from Craig Cummings, Stanford University School of Medicine, CA. Staphylococcus aureus TW20 genomic DNA was a gift from Jodi Lindsay, St George’s Hospital Medical School, University of London. Pullorum S449/87 genomic DNA was prepared at the Wellcome Trust Sanger Institute, UK. Illumina library construction DNA (0.5 μg in 120 μl of 10 mM Tris–HCl pH8.5) was sheared in an AFA microtube using a Covaris S2 device (Covaris Inc.) with the following settings: Duty cycle 20, Intensity 5, cycles/burst 200, 45 seconds. Sheared DNA was purified by binding to an equal volume of Ampure beads (Beckman Coulter Inc.) and eluted in 32 μl of 10 mM Tris–HCl, pH8.5.
End-repair, A-tailing and paired-end adapter ligation were performed (as per the protocols supplied by Illumina, Inc. Using reagents from New England Biolabs- NEB) with purification using a 1.5:1 ratio of standard Ampure to sample between each enzymatic reaction. PCR-free libraries were constructed according to Kozarewa et al. After ligation, excess adapters and adapter dimers were removed using two Ampure clean-ups, first with a 1.5:1 ratio of standard Ampure to sample, followed by a 0.7:1 ratio of Ampure beads.
PCR free libraries were then used as is. Libraries prepared with amplification were diluted to 2 ng/μl and 1 μl was used as template for PCR amplification with Kapa HiFi 2 x mastermix (KK2601, Kapa Biosystems). PCR reactions were performed in 0.2 μl thin-wall microtubes on an MJ tetrad thermal cycler with the following conditions: 94°C −2 minutes; 14 cycles of 94°C −20 seconds, 65°C −30 seconds, 72°C −30 seconds; 72°C - 3 minutes with 200nM final concentration of standard PE1.0 and modified multiplexing PE2.0 primers. After PCR, excess primers and any primer dimer were removed using two Ampure clean-ups, with a 1.5:1 ratio of Ampure then with a 0.8:1 ratio of Ampure beads. All libraries were quantified by real-time PCR using the SYBR Fast Illumina Library Quantification Kit (Kapa Biosystems) and pooled so as to give equal genome coverage from each library.
Illumina sequencing Each multiplexed library pool was sequenced on: i) an Illumina GAIIx instrument for 76 cycles from each end plus an 8 base-index sequence read, using version 2 chemistry, ii) an Illumina MiSeq for 151 cycles from each end plus an 8 base-index sequence read, iii) an Illumina HiSeq 2000 instrument for 75 cycles from each end plus an 8 base-index sequence read, using version 3 chemistry. Summary sequencing statistics are given in Additional file: Table S1. All runs had error rates, and associated sequence quality, that surpassed the minimum Illumina specifications. Ion torrent library preparation sequencing For the B. Pertussis, S.
Aureus and P. Falciparum genomes, library preparation was carried out using the Ion Xpress™ Fragment Library Kit, with 100 ng of DNA.
Adapter ligation, size selection, nick repair and amplification (8 cycles for B. Pertussis and S. Aureus, 6 cycles for P. Falciparum) were performed as described in the Ion Torrent protocol associated with the kit (Ion Xpress™ Fragment Library Kit - Part Number 4469142 Rev. Pullorum genome, library preparation was undertaken using the Ion Fragment Library Kit with 5 μg of DNA. The DNA was fragmented by adaptive focused acoustics using a Covaris S2 (Covaris Inc.) with AFA tubes as described in the protocol (Part Number 4467320 Rev. End repair, adapter ligation, nick repair and amplification (8 cycles) were also performed as described in the Ion Torrent protocol.
Size selection was performed using the LabChip XT (Caliper LifeSciences) and the LabChip XT DNA750 Assay Kit (Caliper LifeSciences), with collection between 175 bp and 220 bp. The Agilent 2100 Bioanalyzer (Agilent Technologies) and the associated High Sensitivity DNA kit (Agilent Technologies) were used to determine quality and concentration of the libraries. The amount of library required for template preparation was calculated using the Template Dilution Factor calculation described in the protocol. Emulsion PCR and enrichment steps were carried out using the Ion Xpress™ Template Kit and associated protocol (Part Number 4469004 Rev. Ion Sphere Particle quality assessment was carried out as outlined in an earlier version of this protocol (Part Number 4467389 Rev. B) for all samples because no benefit was seen with using the Ion Sphere Quality Control Kit as recommended in the later version of the protocol. The oligos used for this analysis were purchased from IDT (Integrated DNA Technologies Inc.).
Assessment of the Ion Sphere Particle quality was undertaken between the emulsion PCR and enrichment steps only. Ion torrent sequencing Sequencing was undertaken using 316 chips in all cases and barcoding was not used for these samples. The Ion Sequencing Kit v2.0 was used for all sequencing reactions, following the recommended protocol (Part Number 4469714 Rev. B) and Torrent Suite 1.5 was used for all analyses.
Summary sequencing statistics are given in Additional file: Table S2. PacBio library construction DNA (2.0-10 μg in 200 μl 10 mM Tris–HCl pH8.5) was sheared in an AFA clear mini-tube using a Covaris S2 device (Covaris Inc.) with the following settings: Duty cycle 20, Intensity 0.1, cycles/burst 1000, 600 seconds. Sheared DNA was purified by binding to 0.6X volume of pre-washed AMPure XP beads (Beckman Coulter Inc.), as per PacBio protocol 000-710-821-DRAFT (five times in purified water, one time in EB, reconstituted in original supernatant) and eluted in EB to 60 ng/μl. The sheared DNA was quantified on an Agilent 2100 Bioanalyzer using the 7500 kit. 1 μg of sheared DNA was end-repaired using the PacBio DNA Template Prep Kit 1.0 (Part Number 001-322-716) and incubated for 15 min at 25°C prior to another 0.6X AMPure XP clean up, eluting in 30 μl EB.
Blunt adapters were ligated before exonuclease incubation was carried out in order to remove all un-ligated adapters and DNA. Finally, two 0.6X AMPure bead clean ups are performed - removing enzymes and adapter dimers – the final “SMRT bells” being eluted in 10 μl EB. Final quantification was carried out on an Agilent 2100 Bioanalyzer with 1 μl of library. Using the SMRT bell concentration (ng/μl) and insert size previously determined, the PacBio-provided calculator was used to calculate the amounts of primer and polymerase used for the binding reaction. Using the PacBio DNA/Polymerase Binding Kit 1.0 (Part Number 001-359-802), primers are annealed and the proprietary polymerase is bound forming the “Binding Complex”. The Binding Complex can be stored as a long-term storage mix at −20°C or diluted for immediate sequencing. The quantity of SMRT bell determines whether a long-term storage mix can be used.
In this instance, there was ample genomic DNA from the four test genomes to allow long-term storage. PacBio sequencing Long-term storage mixes were diluted to the required concentration and volume with the provided dilution buffer and loaded into 96-well plates. These are loaded onto the instrument, along with DNA Sequencing Kit 1.0 (Part Number 001-379-044) and a SMRT Cell 8Pac. In all sequencing runs, 2x45 min movies were captured for each SMRT Cell loaded with a single binding complex. Primary filtering analysis was performed on the Blade Center server provided with the RS instrument, before this data was transferred off the Blade Center for secondary analysis in SMRT Portal using the SMRT analysis pipeline version 1.2.0.1.81002.
Summary sequencing statistics are given in Additional file: Table S3. Reference genomes Each reference genome was created using capillary sequence data with manual finishing and are available to download from. The methods used to sequence the genomes of P. Falciparum and S. Aureus TW20 have been published.
Data processing After sequencing, reads were mapped to each genome reference sequence using the manufacturers’ alignment tools, tmap for PGM and blasr for PacBio ( ). BWA was used for mapping reads from the Illumina GAIIx, MiSeq and HiSeq.
SAMtools was then used to generate pileup and coverage information from the mapping output. Genome coverage We counted the number of bases in the genome that were not covered by any reads (Coverage = 0) and those with less than 5x read coverage (Coverage.