Perl Program To Calculate GC Content
What is GC content? GC content is usually expressed as a percentage and is sometimes called the G+C ratio or GC-ratio. It is calculated as Count(G + C) / Count(A + T + G + C) × 100%. The GC content calculation algorithm has been integrated into our Codon Optimization Software, which serves our protein expression services.
Bedtools: a powerful toolset for genome arithmetic. Collectively, the bedtools utilities are a Swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic, that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle.
The software available on the FTP site also includes a Perl script that is needed to unjustify FASTA files that are to be used by PatMatch. This simple script takes a FASTA file, with a single sequence or multiple sequences, as input and outputs a file with each individual sequence on a single line.
(10 pts) Write a Perl program that calculates the AT and GC content (i.e. the percentage of G and C, and the percentage of A and T) in a given sequence. You can make up your own dummy sequence and store it as a scalar variable for this question (no need to take it from a file here).
GC Content Calculator. The program calculates the GC content of a given DNA/RNA sequence.
GC content is a very interesting property of DNA sequences because it is correlated with repeats and gene deserts. A simple way to calculate GC content is to divide the number of G and C letters by the total number of nucleotides in the sequence. Let’s assume that you start with a string $sequence.
The WRONG way in which I initially did this was to convert the string to an array of letters, as shown here:
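A minimal sketch of what such an array-based version might look like. This is not the original listing, and the sub name gc_content_array is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: GC fraction computed by first splitting the
# whole string into an array of single letters.
sub gc_content_array {
    my ($sequence) = @_;
    my @letters = split //, $sequence;    # one array element per base
    return 0 unless @letters;             # avoid dividing by zero
    my $gc = grep { /[GC]/i } @letters;   # grep in scalar context returns a count
    return $gc / @letters;                # @letters in numeric context is its length
}

print gc_content_array("ATGCGC"), "\n";
```

For a short string this is fine; the cost only shows up when the sequence is millions of bases long.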
This is a very inefficient way of calculating the GC content, because arrays in Perl are quite expensive in terms of memory: every one-letter element is stored as a full scalar value. The result was that I ran out of memory quite quickly.
I found a more efficient approach by using the substr function, looping through the whole sequence, taking one base at a time. However, according to a colleague, Andy Jenkinson, it contains some bugs:
The reasons for being wrong, Andy argues, are that “it ignores the first character of the sequence because the substr function is zero-index based. The rounding at the end using \S{6} also only works where there are >=6 characters in the resulting fraction – so a string like ‘ATCG’ has a GC content of 0.5, but will appear to your application as zero. If you need to do this, you should use \S{0,6}.”
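A sketch of a substr loop with both points addressed. It is not the original #METHOD 2 listing: the sub name gc_content_substr is hypothetical, and rounding is done with sprintf/printf rather than the regex Andy mentions, since a format string does not depend on how many characters the fraction has:

```perl
use strict;
use warnings;

# Hypothetical helper: walk the sequence one base at a time with substr.
sub gc_content_substr {
    my ($sequence) = @_;
    my $length = length $sequence;
    return 0 unless $length;    # avoid dividing by zero
    my $gc = 0;
    # Start at 0: substr offsets are zero-based, so starting at 1
    # would skip the first base.
    for (my $i = 0; $i < $length; $i++) {
        my $base = uc substr($sequence, $i, 1);
        $gc++ if $base eq 'G' || $base eq 'C';
    }
    return $gc / $length;
}

# Round to six decimals; this works for short fractions such as 0.5 too.
printf "%.6f\n", gc_content_substr("ATCG");    # prints 0.500000
```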
In addition to this, he adds that whilst it solves the memory issue, one might also consider a much more CPU-friendly and simpler implementation:
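One common CPU-friendly way to count characters in Perl is the tr/// operator. Whether this matches #METHOD 3 exactly is an assumption, and the sub name gc_content_tr is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: count G and C with tr///, which scans the string
# in one pass and returns the number of matching characters.
sub gc_content_tr {
    my ($sequence) = @_;
    my $length = length $sequence;
    return 0 unless $length;               # avoid dividing by zero
    # With an empty replacement list and no /d modifier, tr only counts;
    # the string itself is left unchanged.
    my $gc = ($sequence =~ tr/GCgc//);
    return $gc / $length;
}

printf "%.2f\n", gc_content_tr("ATCG");    # prints 0.50
```

No array is built and there is no per-base Perl-level loop, which is why this style tends to be both lighter on memory and faster.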
He ran a test simulation on human chromosome 1 (247 million characters): #METHOD 3 took 12 seconds, compared with 111 seconds for #METHOD 2, with the same memory footprint. Here is the source code for Andy’s simulation:
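A rough sketch of how such a timing comparison could be set up. This is not Andy's code: the test data is a made-up random sequence of one million bases rather than chromosome 1, the sub names are hypothetical, and the timings will differ from the figures quoted above:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Made-up test data: one million random bases (not chromosome 1).
my @bases    = qw(A C G T);
my $sequence = join '', map { $bases[ int rand 4 ] } 1 .. 1_000_000;

# Character-by-character counting with substr.
sub gc_by_substr {
    my ($seq) = @_;
    my $gc = 0;
    for (my $i = 0; $i < length($seq); $i++) {
        my $base = substr($seq, $i, 1);
        $gc++ if $base eq 'G' || $base eq 'C';
    }
    return $gc / length($seq);
}

# Single-pass counting with tr///.
sub gc_by_tr {
    my ($seq) = @_;
    my $gc = ($seq =~ tr/GC//);
    return $gc / length($seq);
}

# Time each method once on the same input.
for my $method (['substr', \&gc_by_substr], ['tr', \&gc_by_tr]) {
    my ($name, $code) = @$method;
    my $start   = time();
    my $gc      = $code->($sequence);
    my $elapsed = time() - $start;
    printf "%-6s GC=%.4f  %.3f s\n", $name, $gc, $elapsed;
}
```

Both methods should report the same GC value; only the elapsed time differs.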
I have not had time to test #METHOD 3 yet, but I hope this last addition helps people.
Happy coding!
In this part of the Perl Tutorial we are going to talk about the for loop in Perl. Some people also call it the C-style for loop, but this construct is actually available in many programming languages.
Perl for loop
The for keyword in Perl can work in two different ways. It can work just as a foreach loop works, and it can act as a 3-part C-style for loop. It is called C-style though it is available in many languages.
I'll describe how this works although I prefer to write the foreach style loop as described in the section about perl arrays.
The two keywords for and foreach can be used as synonyms. Perl will work out which meaning you had in mind.
The C-style for loop has 3 parts in the controlling section. In general it looks like for (INITIALIZE; TEST; STEP) { BODY }, though you can omit any of the 4 parts.
As an example see this code:
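A minimal sketch of such a loop, counting from 0 to 9. The @numbers array is only there for illustration, so the result can be printed on one line:

```perl
use strict;
use warnings;

my @numbers;
# INITIALIZE: my $i = 0    TEST: $i < 10    STEP: $i++
for (my $i = 0; $i < 10; $i++) {
    push @numbers, $i;    # BODY
}
print "@numbers\n";       # prints 0 1 2 3 4 5 6 7 8 9
```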
The INITIALIZE part will be executed once when the execution reaches that point.
Then, immediately after that, the TEST part is executed. If this is false, the whole loop is skipped. If the TEST part is true then the BODY is executed, followed by the STEP part.
(For the real meaning of TRUE and FALSE, check the boolean values in Perl.)
Then comes the TEST again, and it goes on and on, as long as the TEST evaluates to some true value. So it looks like this:
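The order of execution can be traced with an equivalent while loop that uses a continue block for the STEP part. This is a sketch for illustration; the @trace array and the do { } wrapper around the TEST exist only to record when each part runs:

```perl
use strict;
use warnings;

my @trace;

my $i = 0;
push @trace, 'INITIALIZE';

# Roughly equivalent to: for (my $i = 0; $i < 2; $i++) { BODY }
while (do { push @trace, 'TEST'; $i < 2 }) {
    push @trace, 'BODY';
}
continue {
    push @trace, 'STEP';
    $i++;
}

print join(' ', @trace), "\n";
# prints INITIALIZE TEST BODY STEP TEST BODY STEP TEST
```

Note that TEST shows up three times while BODY and STEP show up twice each: the final, failing TEST is what ends the loop.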
foreach
The above loop, going from 0 to 9, can also be written as a foreach loop, and I think the intention is much clearer:
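A minimal sketch of the foreach form, using the range operator to produce the numbers 0 to 9 (the @numbers array is only there for illustration):

```perl
use strict;
use warnings;

my @numbers;
foreach my $i (0 .. 9) {
    push @numbers, $i;
}
print "@numbers\n";    # prints 0 1 2 3 4 5 6 7 8 9
```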
As I wrote, the two are actually synonyms, so some people use the for keyword but write a foreach style loop like this:
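A minimal sketch of the same iteration spelled with the for keyword (again, @numbers is only there for illustration):

```perl
use strict;
use warnings;

my @numbers;
for my $i (0 .. 9) {    # the for keyword, foreach-style iteration over a list
    push @numbers, $i;
}
print "@numbers\n";     # prints 0 1 2 3 4 5 6 7 8 9
```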
The parts of the perl for loop
INITIALIZE is of course to initialize some variable. It is executed exactly once.
TEST is some kind of boolean expression that tests if the loop should stop or if it should go on. It is executed at least once. TEST is executed one more time than either BODY or STEP are.
BODY is a set of statements; usually that's what we want to do repeated times, though in some cases an empty BODY can also make sense. Well, probably all those cases can be rewritten in some nice way.
STEP is another set of actions, usually used to increment or decrement some kind of an index. This too can be left empty if, for example, we make that change inside the BODY.
Infinite loop
You can write an infinite loop using the for loop:
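A sketch of the for form with all three control parts left empty. The counter and the last statement are added here only so that the example terminates when run:

```perl
use strict;
use warnings;

my $count = 0;
for (;;) {                  # no INITIALIZE, no TEST, no STEP: loops forever
    $count++;
    last if $count >= 3;    # without this line the loop would never end
}
print "$count\n";           # prints 3
```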
People usually write it with a while statement such as:
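A sketch of the while form. As before, the counter and the last statement are there only so that the example terminates when run:

```perl
use strict;
use warnings;

my $count = 0;
while (1) {                 # the condition is always true
    $count++;
    last if $count >= 3;    # without this line the loop would never end
}
print "$count\n";           # prints 3
```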
It is described in the part about the while loop in perl.
perldoc
You can find the official description of the for-loop in the perlsyn section of the Perl documentation.
Published on 2013-03-26