<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Catalyst Art &#187; Perl</title>
	<atom:link href="http://CatalystArt.reststop.com/category/perl/feed/" rel="self" type="application/rss+xml" />
	<link>http://CatalystArt.reststop.com</link>
	<description>great software for cool systems</description>
	<lastBuildDate>Mon, 23 Aug 2010 22:49:52 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Why Use Perl?</title>
		<link>http://CatalystArt.reststop.com/2010/02/22/why-use-perl/</link>
		<comments>http://CatalystArt.reststop.com/2010/02/22/why-use-perl/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 08:12:12 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Perl]]></category>

		<guid isPermaLink="false">http://CatalystArt.reststop.com/?p=43</guid>
		<description><![CDATA[Perl: Strengths and Weaknesses
The reasons to use Perl are many, but there are also many roadblocks that have kept it out of production use at many companies.  Operations and system support teams were likely the first people to use Perl in production.  Here&#8217;s why:

Defined Support Structure vs. Volunteer Support
Management Approval or Lack Thereof
Rapid [...]]]></description>
			<content:encoded><![CDATA[<h3>Perl: Strengths and Weaknesses</h3>
<p>The reasons to use Perl are many, but there are also many roadblocks that have kept it out of production use at many companies.  Operations and system support teams were likely the first people to use Perl in production.  Here&#8217;s why:</p>
<ul>
<li>Defined Support Structure vs. Volunteer Support</li>
<li>Management Approval or Lack Thereof</li>
<li>Rapid Prototyping or Long Development Cycles</li>
<li>Powerful Capabilities and Libraries</li>
</ul>
<h3>Defined Support Structure vs. Volunteer Support</h3>
<p>Several years ago, I worked for a large multi-national corporation. They had a policy that every product we used had to have a support contract and an escalation path. The problem with Perl was that it was free software, supported by the author and other volunteers on the internet.</p>
<p>
Those of us on various support teams had learned about Perl and saw its potential.  Compared to C programming or sh and csh scripting, Perl was easier to use, easier to debug, and very powerful. However, our management still balked at allowing Perl to be used in any production capacity. We had service level agreements with our customers and needed to have a similar agreement with our suppliers.
</p>
<p>
As the popularity of Perl increased, more programmers kept asking to use it for production scripts.  Those who had to pass rigorous QA scrutiny were often denied its use, and given the choice of using sh, csh, C or Pascal.  Those of us who worked in small teams, however, had free rein to choose our own language, as long as at least two members of the team could provide support.
</p>
<h3>Management Approval or Lack Thereof</h3>
<p>Finally, one manager wrote a business case showing the benefits of using Perl, weighing a support contract against the volunteer support of Perl.  He showed that even with a 24 x 7 support contract, software vendors rarely provided a quick solution, but rather a work-around until their development teams could replicate and solve the problem. In many cases we were dead in the water until at least some work-around was created. With Perl, it was determined that not only was the volunteer support team quicker to come up with a solution, but several work-arounds were usually proposed within hours by different volunteers.</p>
<h3>Rapid Prototyping or Long Development Cycles</h3>
<p>Perl won out for support scripting, but full-blown applications were still written in C or Concurrent Pascal. The next big step was the time it took for development to bring a new release of their products into production. I was given a set of production tools, written in C, and a blank slate to migrate users from a mainframe platform onto Unix.</p>
<p>
Here is where Perl was able to shine.  Using a combination of sh, csh and Perl scripts, I was able to create a complete user environment in a couple of months, working alone or with a partner.  Compare that against more than three years for six to twelve development teams working full time. With Perl we could prototype a design and implement it, tweaking it for changes to procedures in days or hours.  After we went into production, nearly every support script was written in Perl.  A big change from three years earlier.  The development department, however, continued to work in C for at least another 10 years.
</p>
<h3>Less Expensive Than C</h3>
<p>To attest to Perl&#8217;s flexibility and power, I was tasked with rewriting an accounting data collection tool.  The original application was written in assembly language and used some arcane data communication protocols, including sending all data in ones&#8217; complement mode (all 0 bits were sent as a 1, and all 1 bits sent as a 0), and only allowing a single connection with a special binary data handshake. Due to development and QA cycles, I was told I needed to provide the final data in IBM labeled magnetic-tape image format.</p>
<p>
Perl came to the rescue. I wrote one program to connect to the accounting server and collect the data, and a second program to read the collected data and format it as an IBM labeled tape image.  The project took three months to write and document, two weeks to debug the data formats, and two weeks to pass the development department&#8217;s QA. Writing the project in Perl saved $1.5 million in one-time development costs and more than $350,000 in annual production and maintenance costs.
</p>
<h3>User Contributed Library: CPAN</h3>
<p>Perl&#8217;s other strength is that there are literally thousands of <a href="http://www.cpan.org">user contributed library modules</a> and subroutines to do nearly every common programming task. The <a href="http://www.cpan.org">Comprehensive Perl Archive Network (CPAN)</a> has been online since October 1995.  As of February 2010, there are 17,477 modules authored by 7,994 different contributors.<br />
The popularity of the language has grown from the early days. Perl has been used as the glue between server applications and the web, with its own module, mod_perl, for<br />
the widely used <a href="http://webdesign.about.com/od/apache/Apache_HTTP_Web_Server.htm">Apache web server</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://CatalystArt.reststop.com/2010/02/22/why-use-perl/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Forgotten &#8216;tr&#8217; Function</title>
		<link>http://CatalystArt.reststop.com/2010/02/22/the-forgotten-tr-function/</link>
		<comments>http://CatalystArt.reststop.com/2010/02/22/the-forgotten-tr-function/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 08:10:28 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Perl]]></category>

		<guid isPermaLink="false">http://CatalystArt.reststop.com/?p=41</guid>
		<description><![CDATA[
One of the most forgotten functions in Perl is the &#8216;tr&#8216; or translate function.
I say forgotten, because other than changing the case of letters from upper to lower, most programmers don&#8217;t think much of the &#8216;tr&#8216; function. By definition:

tr/SEARCHLIST/REPLACEMENTLIST/[c][d][s]
y/SEARCHLIST/REPLACEMENTLIST/[c][d][s]

This function translates all occurrences of the characters found in the search list to the corresponding character [...]]]></description>
			<content:encoded><![CDATA[<p>
One of the most forgotten functions in Perl is the &#8216;<em>tr</em>&#8217; or translate function.<br />
I say forgotten, because other than changing the case of letters from upper to lower, most programmers don&#8217;t think much of the &#8216;<em>tr</em>&#8217; function. By definition:</p>
<blockquote><p>
tr/SEARCHLIST/REPLACEMENTLIST/[c][d][s]<br />
y/SEARCHLIST/REPLACEMENTLIST/[c][d][s]
</p></blockquote>
<p>This function translates all occurrences of the characters found in the search list to the corresponding character in the replacement list.
</p>
<p>
In general use we see:</p>
<blockquote><p>
$value =~ tr/A-Z/a-z/;
</p></blockquote>
<p>which simply translates all upper-case characters in <em>$value</em> into lower-case characters. In each field, a range of characters is specified: &#8220;A-Z&#8221; for the SEARCHLIST and &#8220;a-z&#8221; for the REPLACEMENTLIST.
</p>
<h3>Direct or Complement?</h3>
<p>
What is often overlooked or forgotten is that lists can be specified directly or as a complement. In this instance &#8220;<em>complement</em>&#8221; means everything except. So for instance /A-Za-z0-9/ refers to all alphanumeric characters. As a complement it means all non-alphanumeric characters.  So, for example, if you want to disallow punctuation and non-printing characters, you can translate them to spaces or even just delete them.</p>
<blockquote><p>
$letters =~ tr/A-Za-z0-9/ /c;<br />
$letters =~ tr/A-Za-z0-9//cd;<br />
$letters =~ tr/A-Za-z0-9/ /cs;
</p></blockquote>
<p>The first example translates all non-alphanumerics to spaces, and the second example deletes them from the variable <em>$letters</em>. The third example adds the squeeze option (s), so that a run of consecutive characters that all translate to the same character is compressed into a single occurrence of that character; here, runs of spaces become a single space.  If $letters contained the string &#8220;ABC::D,E,(F+G)/H&#8221;, the first example would produce &#8220;ABC space space D space E space space F space G space space H&#8221;. With the (s) option, each occurrence of &#8220;space space&#8221; becomes a single &#8220;space&#8221;.
</p>
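<p>
The three variations can be compared side by side. This is only a sketch: the helper name <em>clean</em> and its mode names are made up for illustration, and each call works on a copy of the input so the original string is left alone.
</p>

```perl
use strict;
use warnings;

# Hypothetical helper: apply one of the three translations to a copy of the text.
sub clean {
    my ($text, $mode) = @_;
    if    ( $mode eq 'space' )   { $text =~ tr/A-Za-z0-9/ /c;  }  # complement: replace
    elsif ( $mode eq 'delete' )  { $text =~ tr/A-Za-z0-9//cd;  }  # complement + delete
    elsif ( $mode eq 'squeeze' ) { $text =~ tr/A-Za-z0-9/ /cs; }  # complement + squeeze
    return $text;
}

my $input = 'ABC::D,E,(F+G)/H';
foreach my $mode ( qw(space delete squeeze) ) {
    # Brackets make the spaces visible in the output.
    print "$mode: [", clean($input, $mode), "]\n";
}
# space:   [ABC  D E  F G  H]
# delete:  [ABCDEFGH]
# squeeze: [ABC D E F G H]
```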
<h3>Counting Characters</h3>
<p>
One other forgotten use of the &#8216;tr&#8217; function is that it also returns a value, which is the count of characters it translated.  Thus, it can be used as a quick count, even if it translates a given character into itself.</p>
<blockquote><p>
$plus = tr/+/+/;<br />
$letters = tr/A-Za-z/A-Za-z/;
</p></blockquote>
<p>The first example counts the number of plus (+) characters in the default string &#8220;<em>$_</em>&#8221;, and the second example counts the number of letters in &#8220;<em>$_</em>&#8221;.  Please notice the difference in these examples.  The first set of examples is using &#8220;=~&#8221; which says to apply the function to the variable on the left of the &#8220;=~&#8221;.  The last set of examples uses a simple &#8220;=&#8221; to assign the count of translated characters to the variables &#8220;<em>$plus</em>&#8221; and &#8220;<em>$letters</em>&#8221;.</p>
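<p>
Counting and binding can also be combined, so that the count comes from a named variable rather than from the default string. A minimal sketch, assuming two hypothetical helper names; it relies on the documented behavior that an empty replacement list (without the d option) leaves the string unchanged.
</p>

```perl
use strict;
use warnings;

# Hypothetical helper: count the plus signs in a string.
sub count_plus {
    my ($text) = @_;
    # Empty replacement list and no /d: nothing is changed, only counted.
    my $count = ( $text =~ tr/+// );
    return $count;
}

# Hypothetical helper: count the ASCII letters in a string.
sub count_letters {
    my ($text) = @_;
    my $count = ( $text =~ tr/A-Za-z// );
    return $count;
}

print count_plus('1+2+3'), "\n";       # 2
print count_letters('a1b2c3'), "\n";   # 3
```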
]]></content:encoded>
			<wfw:commentRss>http://CatalystArt.reststop.com/2010/02/22/the-forgotten-tr-function/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Real World Perl</title>
		<link>http://CatalystArt.reststop.com/2010/02/12/28/</link>
		<comments>http://CatalystArt.reststop.com/2010/02/12/28/#comments</comments>
		<pubDate>Fri, 12 Feb 2010 20:40:27 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Perl]]></category>

		<guid isPermaLink="false">http://CatalystArt.reststop.com/?p=28</guid>
		<description><![CDATA[Real World Perl I

Those of you who have ever taken a class in programming, know that many of the class assignments seem to have no real-world applications. The assignment below is one that solved a real problem quickly and easily.

The Problem

We need to create a list of words for a computer game. The words should [...]]]></description>
			<content:encoded><![CDATA[<h3>Real World Perl I</h3>
<p>
Those of you who have ever taken a class in programming, know that many of the class assignments seem to have no real-world applications. The assignment below is one that solved a real problem quickly and easily.
</p>
<h3>The Problem</h3>
<p>
We need to create a list of words for a computer game. The words should be grouped by the number of letters in each word. For our purposes, we wanted to collect words that are 4, 5 and 6 letters long. These will be the legal words the player may use to guess the secret word.
</p>
<h3>The Task</h3>
<ul>
<li><a href="#wordlist">Locate a dictionary of words</a></li>
<li><a href="#wordlength">Separate the words by length</a></li>
<li><a href="#savewords">Save words that are 4, 5 or 6 letters long in separate files</a></li>
<li><a href="#runningprogram">Running the completed program</a></li>
<li><a href="#reusingprogram">Reusing the Program</a></li>
</ul>
<p><a name="wordlist"></a></p>
<h3>Where to Get a List of Words</h3>
<p>
On many modern computer systems, such as Unix, Linux, or Mac OS X, a common set of system files comes with the computer. One of these is a dictionary that can be found at <em>/usr/dict/words</em> or <em>/usr/share/dict/words</em>. It is a simple list of English words of varying length sorted in alphabetical order. On some systems the words may be in other languages, but English is the language used on many Unix-like systems.
</p>
<p><a name="wordlength"></a></p>
<h3>Find the Word Length</h3>
<p>
In Perl, as in many computer languages, text words are a collection of characters stored in memory sequentially and often referred to as a string of characters or, simply, a string.  As it turns out, Perl has a handy function, <em>length</em>, which returns the length of a string. So, to find the length of a particular word, we need to read the list of words and get the length of each word. If it matches our criteria of 4, 5 or 6 letters long, we want to remember it.
</p>
<h3>Starting the Program</h3>
<p>
The first thing we need to do is to read the dictionary file and get the word we want to examine.  This will be a loop, which reads all the words in the file.
</p>
<blockquote><p><strong><br />
<code><br />
while ( $line = &lt;STDIN&gt; ) {<br />
}<br />
</code><br />
</strong></p></blockquote>
<p>
The <em>while</em> statement checks a condition and then executes the statements (if any) that appear between the open brace &#8220;{&#8221; and the close brace &#8220;}&#8221;. In this example, there are no statements to execute.  The condition is the result of the assignment of data from the file handle called &#8220;STDIN&#8221;. The STDIN file handle is connected to standard input on a Unix system and is the standard place to get input for the majority of programs on Unix systems.  As this is the most common use, Perl provides a shorthand notation, &#8220;&lt;&gt;&#8221;, which reads from any files named on the command line and falls back to standard input when none are given:
</p>
<blockquote><p><strong><br />
<code><br />
while ( $line = &lt;&gt; ) {<br />
}<br />
</code><br />
</strong></p></blockquote>
<h3>Getting the Word</h3>
<p>
In this example, we now have a line of text in our variable, <em>$line</em>. This is not a word.  We know the input file contains a list of words, one per line. However, when we read a line of input, the text still ends with a special end of line character (EOL) or sequence of characters. (See <em>End of Line</em> in the Glossary section for more details). Perl provides two functions to deal with this, <em>chop</em> and <em>chomp</em> (added in Perl5). The <em>chop</em> function removes the last character from the end of a string of characters, returning the deleted character as the value of the function. The <em>chomp</em> function is more selective (and safer than chop) as it only removes the last character if it is an end of line character or sequence.
</p>
<p>
Our program now becomes:
</p>
<blockquote><p><strong><br />
<code>
<pre>
while ( $line = &lt;&gt; ) {
    chomp( $word = $line );
}
</pre>
<p></code><br />
</strong></p></blockquote>
<p>
Note that the chomp function surrounds the assignment of <em>$line</em> to <em>$word</em>. This is very important. We want to remove the last character if it is an EOL character, and we want the variable <em>$word</em> to contain the result.  In effect, we are actually assigning a copy of the contents of <em>$line</em> to <em>$word</em>, and then calling the <em>chomp</em> function on <em>$word</em>.</p>
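<p>
The difference between the two functions, and the copy-then-chomp idiom, can be seen in a short sketch. The helper name <em>strip_eol</em> and the sample word are assumptions made up for this illustration.
</p>

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the line without its trailing newline.
sub strip_eol {
    my ($line) = @_;
    chomp( my $word = $line );   # copy first, then chomp the copy
    return $word;
}

my $line = "fire\n";
my $word = strip_eol($line);
# $line still ends with its newline; $word does not.
print length($line), " ", length($word), "\n";   # 5 4

# chomp leaves a string without a newline alone; chop always removes a character.
my $chomped = $word;
chomp($chomped);
my $chopped = $word;
chop($chopped);
print "$chomped $chopped\n";                     # fire fir
```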
<h3>Checking the Length</h3>
<p>
Again, we only want to keep words that are 4, 5 or 6 letters long. The next step is to compare the length of the word against our criteria. However, notice we want words of three different lengths.  One of the most powerful features of Unix systems is reusability. Instead of writing special code to check for a range of word lengths, and saving the words in separate files, it is better to write code which checks one length and stores those that match in one file, and then reuse the same code with a different length and a different file. So let us work with words of length 4 to start.
</p>
<blockquote><p><strong><br />
<code>
<pre>
while ( $line = &lt;&gt; ) {
    chomp( $word = $line );
    if ( length($word) == 4 ) {
        # save the word
    }
}
</pre>
<p></code><br />
</strong></p></blockquote>
<p><a name="savewords"></a></p>
<h3>Save the Words</h3>
<p>
Now for the next length, we modify the code to use a variable. Instead of hardcoding the length, we can set the length we want to check and then change the length for each run of the program. Since we are expecting to change the length with each run of the program, we can save the words we want by sending them to the standard output or simply printing them using <em>print</em>. Notice we are adding a newline &#8220;\n&#8221; when we print it. Of course, if you are using a non-Unix system, Perl should use the correct end of line character(s).
</p>
<blockquote><p><strong><br />
<code>
<pre>
$len = 4;
while ( $line = &lt;&gt; ) {
    chomp( $word = $line );
    if ( length($word) == $len ) {
        print $word, "\n";
    }
}
</pre>
<p></code><br />
</strong></p></blockquote>
<p>
The program is now complete.
</p>
<hr />
<h3>Real World Perl  II</h3>
<p><a name="runningprogram"></a></p>
<h3>Running the Program</h3>
<p>
We are working on a program which reads a list of words, one word per line, and creates a new list of words that are 4, 5, or 6 letters long.  The program is complete and can be easily run from the command line by typing:
</p>
<blockquote><p><strong><br />
<code><br />
perl myprogram &lt; /usr/share/dict/words<br />
</code><br />
</strong></p></blockquote>
<p>
The results will appear on the terminal, in the terminal window, or in your console log, depending on what system you are running the program on.  If you want to control where the output goes, type instead:
</p>
<blockquote><p><strong><br />
<code><br />
 perl myprogram &lt; /usr/share/dict/words &gt; myoutput<br />
</code><br />
</strong></p></blockquote>
<p>
The Unix command line uses the less than &#8220;&lt;&#8221; and greater than &#8220;&gt;&#8221; symbols to direct or redirect input and output respectively to a program.  This allows the user to specify the input and output files for a particular program when the program is run, instead of having to hard-code the names in the program each time.  This feature allows programs to be reused with different data without any modifications required to the program.
</p>
<p><a name="reusingprogram"></a></p>
<h3>Making the Program Executable and Reusable</h3>
<p>
You will notice that there wasn&#8217;t any cryptic Unix-like script identification in my program.  That&#8217;s because Perl doesn&#8217;t require it to run a program.  However, if you want to execute the program directly by double clicking or using the program name as a command from the command line, you have to add the script identifier line at the beginning.  This line varies on some systems and some implementations, but the normal line would be:
</p>
<blockquote><p><strong><br />
<code><br />
#! /usr/bin/perl<br />
</code><br />
</strong></p></blockquote>
<p>
This tells Unix-like and some other systems where to find the perl runtime program on your computer. To make this program a little more usable, I like to use two of Perl&#8217;s option switches: &#8220;-s&#8221;, especially while I&#8217;m testing my programs out, and &#8220;-w&#8221;, which turns on warnings to keep me honest and to catch questionable code. The &#8220;-s&#8221; option lets me define and set variables on the command line.
</p>
<h3>Completed Program</h3>
<p>
My completed program:
</p>
<blockquote><p><strong><br />
<code>
<pre>
#! /usr/bin/perl -ws
if ( ! defined $len ) {
    $len = 4;
}
while ( $line = &lt;&gt; ) {
    chomp( $word = $line );
    if ( length($word) == $len ) {
        print $word, "\n";
    }
}
</pre>
<p></code><br />
</strong></p></blockquote>
<p>
The changed program checks to see if the <em>$len</em> variable is defined, and if not, sets the default value to 4.  To execute the program, I set the permissions on the file, myprogram, to be executable with the Unix command: <em>chmod</em>
</p>
<blockquote><p><strong><br />
<code><br />
chmod +x myprogram<br />
</code><br />
</strong></p></blockquote>
<p>
Then, I enter the commands:
</p>
<blockquote><p><strong><br />
<code><br />
myprogram -len=4 &lt; /usr/share/dict/words &gt; myoutput-4<br />
myprogram -len=5 &lt; /usr/share/dict/words &gt; myoutput-5<br />
myprogram -len=6 &lt; /usr/share/dict/words &gt; myoutput-6<br />
</code><br />
</strong></p></blockquote>
<p>
and I have three files, myoutput-4, myoutput-5, and myoutput-6, containing words that are 4, 5 and 6 letters long, respectively.  And the best feature? If I want a list of 3 letter words or 10 letter words, I just execute the program again with a different option.
</p>
<hr />
<h3>Real World Perl  III</h3>
<p><a name="extracredit" /></p>
<h3>Extra Credit</h3>
<p>
Now that we have our lists of words separated by word length, there&#8217;s the matter of creating a separate list of secret words from the larger lists of words.  What we want to allow in the secret words is only those words which are all different letters with no duplicates in the same word.  For example, FIRE, APHID or CHROME are acceptable, but FREE, SPOOK, and RECORD are unacceptable.
</p>
<h3>The Task</h3>
<ul>
<li><a href="#description">Describe the Problem</a></li>
<li><a href="#findduplicate">How to Find Duplicate Letters</a></li>
<li><a href="#usingtables">Arrays or Hash Tables</a></li>
</ul>
<p><a name="description"></a></p>
<h3>Describe the Problem</h3>
<p>
This is easy. Take a word and count its letters. Check to see if there are any duplicate letters. If we find a duplicate letter, the word cannot be used. Simple for you or me to do. What about a computer program?
</p>
<p><a name="findduplicate"></a></p>
<h3>How to Find Duplicate Letters</h3>
<p>
One method is for a program to compare every letter against every other letter. The brute force way of doing this is to take a letter and compare it against all other letters, skipping the comparison against itself. A slightly faster method would be to compare each letter against only the letters that follow it, since the preceding letters have already been compared to it. A third method is to identify each letter by number and enter the numbers into a table; if the same number is entered more than once, you have a duplicate. For short words, either of the first two methods works well and is easy to program.  However, the third option works with words of any length and is the easiest to program.
</p>
<h3>Compare Every Letter Against Every Other</h3>
<p>
The first method is a simple set of two loops, comparing each letter against every other letter. This actually compares each letter against every other letter twice.
</p>
<blockquote><p><strong><br />
<code>
<pre>
01	$duplicate = 0;
02	foreach $i ( 0..(length($word)-1) ) {
03		foreach $j ( 0..(length($word)-1) ) {
04			next if ( $j == $i );
05			if ( substr($word,$i,1) eq substr($word,$j,1) ) {
06				$duplicate = 1;
07			}
08		}
09	}
</pre>
<p></code><br />
</strong></p></blockquote>
<p>
First we set the variable <em>$duplicate</em> to 0 to indicate not found, and in line 06 we set it to 1 if a duplicate is found. That&#8217;s the easy part.  We introduce the <em>foreach</em> construct for a loop using index <em>$i</em>, which iterates over the range or set of elements following.  Perl allows you to specify a range of 1 to 5 as (1..5), or n to m as ( n..m ). Arrays and strings are zero (0) based, so we need to compare from letter 0 to letter length-1.  Next we do the same for an inner loop using index <em>$j</em>. Then inside the inner loop we introduce the <em>next</em> statement. What <em>next</em> does is skip the rest of the current iteration of a loop and move on to the following one. In this instance we call next if the two indices are equal, thus avoiding comparing a character with itself.  The comparison is made by extracting each of the two characters we want to compare with the <em>substr</em> (sub-string) function.
</p>
<h3>Compare Every Letter Against Every Other, Once</h3>
<blockquote><p><strong><br />
<code>
<pre>
01	$duplicate = 0;
02	foreach $i ( 0..(length($word)-1) ) {
03		foreach $j ( ($i+1)..(length($word)-1) ) {
04			if ( substr($word,$i,1) eq substr($word,$j,1) ) {
05				$duplicate = 1;
06			}
07		}
08	}
</pre>
<p></code><br />
</strong></p></blockquote>
<p>
This method is shorter by one whole line!  Because we start the inner loop one character past the current index, we never compare a letter with itself or compare the same pair of letters twice, so we don&#8217;t need to check whether the indices match.  Otherwise the rest of the method is identical to the first. One thing to watch out for is to not change the indices inside either loop, as this could cause unpredictable results.
</p>
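<p>
Because the flag never changes back once a duplicate is found, the comparison can also stop at the first match. A minimal sketch of the second method, assuming the check is wrapped in a subroutine; the name <em>has_duplicate</em> is hypothetical and not part of the original program.
</p>

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if any letter occurs more than once, else 0.
sub has_duplicate {
    my ($word) = @_;
    my $last = length($word) - 1;
    foreach my $i ( 0 .. $last ) {
        # Only look at the letters after position $i.
        foreach my $j ( $i + 1 .. $last ) {
            # Returning here ends both loops at the first duplicate.
            return 1 if substr($word, $i, 1) eq substr($word, $j, 1);
        }
    }
    return 0;
}

foreach my $word ( qw(FIRE FREE CHROME RECORD) ) {
    print "$word: ", ( has_duplicate($word) ? "duplicate" : "ok" ), "\n";
}
```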
<p><a name="usingtables"></a></p>
<h3>Arrays or Hash Tables</h3>
<ul>
<li><a href="#usingarray">Using an Array</a></li>
<li><a href="#usinghashtable">Using a Hash Table</a></li>
<li><a href="#differences">Differences Between Arrays And Hash Tables</a></li>
</ul>
<p>
Perl comes with several types of data structures, including two built-in types of tables. The first is called an array, and contains consecutive entries. An array is a simple list whose elements are numbered consecutively, starting at index 0. Older versions of Perl let the user change that starting index, but the feature has long been discouraged, so it&#8217;s best to leave it at 0.
</p>
<p><a name="usingarray"></a></p>
<h3>Using an Array</h3>
<blockquote><p><strong><br />
<code>
<pre>
01	$duplicate = 0;
02	@letters = ();
03	foreach $i ( (ord("A")-ord("A"))..(ord("Z")-ord("A")) ) { $letters[$i] = 0; }
04	foreach $i ( 0..(length($word)-1) ) {
05		$letter = substr($word,$i,1);
06		if ( $letters[ord($letter)-ord("A")] == 1 ) { $duplicate = 1; }
07		$letters[ord($letter)-ord("A")] = 1;
08	}
</pre>
<p></code><br />
</strong></p></blockquote>
<p>
Whoa!  Wait, it just looks unusual.  Some of what you see is advanced, but the rest is just documenting what is going on instead of hiding the details.  First, we create an empty array in line 02.  To document what we are doing, we set the first 26 elements of the array to 0.  This corresponds to the range 0..25 as defined by ord("A")-ord("A") (0), to ord("Z")-ord("A") (25).  The <em>ord</em> function returns the numeric character code of a letter; subtracting the strings directly would not work, because Perl treats a non-numeric string such as "A" as 0 in arithmetic.  This might not work as expected in other alphabets, but works fine for ASCII letters.  We are assuming our word list is all upper case for this example.
</p>
<p>
Line 04 begins the real work with a simple loop once per letter in the word. We grab the letter and then use its character code as the index, offset by subtracting the code for "A" again so that "A" will be 0, "B" will be 1, etc. If the array value has been set to 1, we know we've been here before with this letter and can say we found a duplicate. Otherwise we set the array value to 1.  If we complete the loop without a collision, we did not find any duplicates.
</p>
<p><a name="usinghashtable"></a></p>
<h3>Using a Hash Table</h3>
<p>
The other kind of table in Perl is called a hash table.  This is because it is a special database structure where elements are allocated to positions based on a numerical calculation. This calculation is called a hashing function or just a hash for short. It sounds complicated, but it isn't as far as we're concerned.  Think of it as magic that separates sets of items to keep them separate but easily accessible.  I like to tell people that hash tables allow you to store things as a collection without having to know the exact order they are in the database.  The hash is the index that lets you locate the specific item without having to compare the value against all the others in order to retrieve the information you want.
</p>
<blockquote>
<strong>
<code>
<pre>
01	$duplicate = 0;
02	%letters = ();
03	foreach $i ( 0..(length($word)-1) ) {
04		$letter = substr($word,$i,1);
05		if ( exists $letters{$letter} ) { $duplicate = 1; }
06		$letters{$letter} = 1;
07	}
</pre>
<p></code><br />
</strong></p></blockquote>
<p>
Line 02 defines the hash table.  It looks almost like an array. An array uses the prefix character '@' and a hash table uses the prefix character '%'. In line 05 we check to see if we have already set a value for this letter.  If we have, we found a duplicate.  If not, we proceed to the next line which sets the value for this letter to 1. The rest of the code is identical to that used by an array.
</p>
<p><a name="differences"></a></p>
<h3>Differences Between  Arrays and Hash Tables</h3>
<p>
One major difference you see is that I did not have to pre-initialize the hash table. The other is that individual elements are represented using open brace "{" and close brace "}" instead of open bracket "[" and close bracket "]". The most important change is that we do not have to work or think of elements beginning at an index of 0, or n, or any start point.  You can put two items into a hash table with widely different index values and it will take up the same amount of storage, two entries.
</p>
<p>
The syntax is different, but the hash table method is identical to the array method. It is a little cleaner to code and to maintain later on. Another good thing about hash tables is that they can hold a very large number of items and still get to the necessary data quickly.  This means little when you use it for a 5 letter table, but when you are working with 500,000 items it is significant.
</p>
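<p>
The hash table method also fits naturally into a small reusable subroutine. This is a sketch rather than the program used in this article: the name <em>has_duplicate_letter</em> is hypothetical, the word is split into letters instead of using <em>substr</em>, and the table stores a count per letter instead of a fixed 1.
</p>

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if any letter repeats (ignoring case), else 0.
sub has_duplicate_letter {
    my ($word) = @_;
    my %seen;                       # letter => number of times seen so far
    foreach my $letter ( split //, uc $word ) {
        # Post-increment yields the old count: 0 (false) on the first visit.
        return 1 if $seen{$letter}++;
    }
    return 0;
}

foreach my $word ( qw(fire Free aphid spook) ) {
    print "$word: ", ( has_duplicate_letter($word) ? "duplicate" : "ok" ), "\n";
}
```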
<hr />
<h3>Real World Perl  IV </h3>
<h3>Putting It All Together</h3>
<p>
To recap, we've been working on a program to read a list of words and pull out the words that are a specified length.  For my purposes they were 4, 5 and 6 characters long. As an extra credit assignment, we wanted to extract those words which contained no duplicate characters.  For example, FREE is not acceptable because it has the letter "E" twice, but FIRE is acceptable.
</p>
<blockquote><p><strong><br />
<code>
<pre>
#! /usr/bin/perl -ws
if ( ! defined $len ) { $len = 4; }
while ( $line = &lt;&gt; ) {
    chomp( $word = $line );
    if ( length($word) == $len ) {
        $word =~ tr/a-z/A-Z/;
        if ( defined $secret ) {
            $duplicate = 0;
            %letters = ();
            foreach $i ( 0..(length($word)-1) ) {
                $letter = substr($word,$i,1);
                if ( exists $letters{$letter} ) {
                    $duplicate = 1;
                }
                $letters{$letter} = 1;
            }
            if ( $duplicate != 1 ) {
                print $word, "\n";
            }
        }
        else {
            print $word, "\n";
        }
    }
}
</pre>
<p></code><br />
</strong></p></blockquote>
<p>
In the earlier examples on how to detect duplicate letters, we also made an assumption that the words in the dictionary file were upper case.  In our final version above, we added the <em>tr</em> statement, which replaces any occurrences of the first set or range of characters with those in the second set or range. This translation converts any lower case letters into upper case letters; (a-z) becomes (A-Z).
</p>
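<p>
As a minimal sketch of that translation (the sample words are made up for illustration), note that <em>tr</em> works on <em>$_</em> unless it is bound to a variable with <em>=~</em>:
</p>
<blockquote><p><strong><br />
<code>
<pre>
#!/usr/bin/perl
use strict;
use warnings;

my $word = "fire";          # sample word, chosen only for this example
$word =~ tr/a-z/A-Z/;       # bind tr to $word; a bare tr would act on $_
print $word, "\n";          # prints FIRE

# The built-in uc function gives the same result for plain ASCII letters.
print uc("free"), "\n";     # prints FREE
</pre>
<p></code><br />
</strong></p></blockquote>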
<p>
Instead of writing two completely different programs, I decided to combine them, so I could use the original set of words and the same program to produce the list of usable guess words and the usable list of secret words.
</p>
<h3>Running the Program</h3>
<p>
To run the program, we simply use commands similar to the original program:
</p>
<blockquote><p><strong><br />
<code><br />
myprogram -len=4 &lt; /usr/share/dict/words &gt; myoutput-4<br />
myprogram -len=4 -secret &lt; /usr/share/dict/words &gt; secret-4<br />
myprogram -len=5 &lt; /usr/share/dict/words &gt; myoutput-5<br />
myprogram -len=5 -secret &lt; /usr/share/dict/words &gt; secret-5<br />
myprogram -len=6 &lt; /usr/share/dict/words &gt; myoutput-6<br />
myprogram -len=6 -secret &lt; /usr/share/dict/words &gt; secret-6<br />
</code><br />
</strong></p></blockquote>
<p>
The <em>-secret</em> option to myprogram defines the variable <em>$secret</em>, and the program merely tests whether <em>$secret</em> is defined before executing the duplicate letters test. If the variable is not defined, the test is not made and any word matching the word length is written to the output.  If the variable is defined, the test is made, and only those words with unique, non-duplicating letters are written to the output file.
</p>
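<p>
The option handling comes from perl's <em>-s</em> switch. Here is a small sketch of that mechanism on its own; the file name <em>switchdemo</em> and the messages it prints are made up for this example:
</p>
<blockquote><p><strong><br />
<code>
<pre>
#!/usr/bin/perl -s
# With -s, an option such as -len=5 sets $len to 5,
# and a bare option such as -secret sets $secret to 1.
$len = 4 if ! defined $len;
print "len is $len\n";
if ( defined $secret ) { print "secret test is on\n"; }
else                   { print "secret test is off\n"; }
</pre>
<p></code><br />
</strong></p></blockquote>
<p>
Running <em>perl -s switchdemo -len=5 -secret</em> should print "len is 5" and "secret test is on". With no options at all it falls back to a length of 4 and reports that the secret test is off.
</p>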
<h3>Personal Comments</h3>
<p>
My own implementation of this program was slightly more cryptic. It combined some elements, such as using "++" to increment a counter instead of setting it to 1 and using the substring value of <em>$word</em> directly instead of a separate <em>$letter</em> variable, and it added a test to eliminate entries in the various dictionaries I used that began with a number.  I also prefer the constructs:
</p>
<blockquote><p><strong><br />
<code><br />
statement if condition;<br />
$variable = value if $field == $test;<br />
</code><br />
</strong></p></blockquote>
<p>
instead of
</p>
<blockquote><p><strong><br />
<code><br />
if ( condition ) { statement; }<br />
if ( $field == $test ) { $variable = value; }<br />
</code><br />
</strong></p></blockquote>
<p>
But that is a personal preference, and it can actually make the code harder to read if you are not used to those conventions or that style. Just for reference, my original program is included below:
</p>
<blockquote><p><strong><br />
<code>
<pre>
#!  /usr/bin/perl -sw
$len = 5 if ! defined( $len );
 #print "Len = $len\n" if $debug;
while ( $line = &lt;&gt; ) {
    $line =~ s/\r\n//g;
    $f = substr($line,0,1);
    #	print "F $f\n" if $debug;
    next if (($f ge "0") &amp;&amp; ($f le "9"));
    chomp($line);
    #	print "Line: $line ",length($line)," $len\n" if $debug;
    if ( length($line) == $len ) {
        %chars = ( );
        for $i (0..length($line)-1) {
        #		print "&lt;",substr($line,$i,1),"&gt; " if $debug;
            last if $chars{substr($line,$i,1)}++;
        }
        #		print "\n" if $debug;
        $ch = keys %chars;
        print $line,"\n" if $ch == $len;
    }
}
</pre>
<p></code><br />
</strong></p></blockquote>
<p>
I also tend to leave debugging code in many of my quick programs, but I comment it out when I'm finished, just so perl doesn't try to interpret it unnecessarily. The final dictionary used for this project contained over 17,000 5-letter words and over 7,000 secret words. In my recreation of this project I also used two constructs that we did not discuss. I use global substitution to remove some line endings, and instead of keeping a separate $duplicate variable, I check the number of unique characters found in the word by comparing the number of keys in my hash table against the number of characters in the word.
</p>
<blockquote><p><strong><br />
<code><br />
$line =~ s/\r\n//g;<br />
$ch = keys %chars;<br />
print $line,"\n" if $ch == $len;<br />
</code><br />
</strong></p></blockquote>
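<p>
The key-counting idea can be sketched on its own. This is an illustration rather than the program above: the helper name <em>has_unique_letters</em> and the sample words are made up for the example.
</p>
<blockquote><p><strong><br />
<code>
<pre>
#!/usr/bin/perl
use strict;
use warnings;

# Return true when no character repeats in $word.
sub has_unique_letters {
    my ($word) = @_;
    my %chars;
    $chars{$_} = 1 for split //, $word;   # one key per distinct character
    my $count = keys %chars;              # keys in scalar context gives the number of keys
    return $count == length $word;
}

for my $word ( "FIRE", "FREE" ) {
    print $word, has_unique_letters($word) ? " is usable\n" : " has a repeated letter\n";
}
</pre>
<p></code><br />
</strong></p></blockquote>
<p>
FIRE has four letters and four distinct keys, so it passes. FREE has four letters but only three keys, so it is rejected.
</p>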
<p>
Just remember, "There's More Than One Way To Do It!"
</p>
<hr />
<h3>Real World Perl  IV </h3>
<h3>Glossary</h3>
<h3>Array</h3>
<p>
A sequential list of items.  In Perl the items can be individual numbers, strings, or even references to other arrays, hash tables, and user-defined class objects. The main structure of an array is simply a list which can be sequentially read or written or indexed directly to the nth element.
</p>
<h3>Hash Table</h3>
<p>
An in-memory data structure which acts like a simple lookup table, allowing quick access to specific data indexed by keys instead of by sequential position. Retrieval is usually faster than searching an Array because a calculation performed on the key (the hash function) locates the data element directly instead of searching sequentially through a list.  An array can be used as the underlying storage for a hash table.  That is how hash tables were originally created in other languages. Perl has them built into the syntax of the language.
</p>
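<p>
A short sketch of the difference, using a made-up word list: the array has to be scanned item by item, while the hash goes straight to the key.
</p>
<blockquote><p><strong><br />
<code>
<pre>
#!/usr/bin/perl
use strict;
use warnings;

my @words = qw(FIRE TREE WORD);          # sample data for illustration only

# Array: grep walks the whole list and counts the matches.
my $found_in_array = grep { $_ eq "WORD" } @words;

# Hash: build the lookup table once, then test a key directly.
my %is_word;
$is_word{$_} = 1 for @words;
my $found_in_hash = exists $is_word{"WORD"};

print "array scan found it\n"  if $found_in_array;
print "hash lookup found it\n" if $found_in_hash;
</pre>
<p></code><br />
</strong></p></blockquote>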
<h3>EOL, End of Line</h3>
<p>
The end of line character (EOL) is different on different computers. Unix systems use the ASCII character LF (line feed), also known as NL (new line) and depicted as "\n". MS-DOS and Windows use two characters, a carriage return followed by a line feed, written CR+LF or CRLF. Mac OS X uses LF like other Unix systems, although some programs still write the single carriage return (CR) used by earlier versions of Mac OS. In the 1982 movie, <em>Tron,</em> the MCP or Master Control Program ended all of its conversations with the phrase "End of Line".
</p>
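<p>
As a sketch of how these characters show up in a Perl program (the sample strings are invented for the example), the following prints the two character codes and strips whichever of the three line endings is present:
</p>
<blockquote><p><strong><br />
<code>
<pre>
#!/usr/bin/perl
use strict;
use warnings;

# Character codes of the two control characters (13 and 10 on ASCII systems).
printf "CR is %d, LF is %d\n", ord("\r"), ord("\n");

# Remove a trailing CRLF, LF, or CR, whichever is present.
for my $line ( "unix\n", "dos\r\n", "oldmac\r" ) {
    ( my $clean = $line ) =~ s/\r?\n\z|\r\z//;
    print "[$clean]\n";
}
</pre>
<p></code><br />
</strong></p></blockquote>
<p>
A plain <em>chomp</em> removes only the trailing "\n" by default, which is why a file written on another system can leave a stray CR at the end of each line.
</p>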
<h3>CR, carriage return</h3>
<p>
The carriage return character is named after the function on typewriters where the operator pushed a lever to return the carriage to its starting position so the operator could enter another line of text.  In ASCII, this is the character represented by octal '015', decimal 13, or hexadecimal 0x0D. It can be entered on most keyboards as control-M, ^M, or CTRL-M by holding down the control key and the M key simultaneously. However, some systems will treat this as the RETURN key and convert it to newline or the correct EOL sequence for that system. This is the end of line character on Apple Macintosh versions of Mac OS prior to Mac OS X (10.0).
</p>
<h3>CRLF, CR+LF, carriage-return line-feed</h3>
<p>
The combination of the characters CR (carriage return) and LF (linefeed) appearing together sequentially is called a CRLF. This is the end of line sequence of characters for many computer systems including MS-DOS, PC-DOS, TOPS-10, TOPS-20, and Windows. Some other systems use LF followed by CR, or LFCR, but it is rare.
</p>
<h3>LF, linefeed, line feed</h3>
<p>
The line feed character is named after the function that feeds the paper up one line while maintaining the current horizontal position on the paper. This function exists on manual typewriters and many printing devices. In ASCII, this is the character represented by octal '012', decimal 10, or hexadecimal 0x0A. It can be entered directly on many keyboards as control-J, ^J or CTRL-J by holding down the control key and the J key simultaneously. This is the end of line character on most variants of Unix or Linux.
</p>
<h3>NL, newline, new line</h3>
<p>
The newline character takes its name from the line feed function, but in computer terms it also resets the carriage or other mechanism to be ready for the next new line of text. In ASCII, this is the character represented by octal '012', decimal 10, or hexadecimal 0x0A. This is the end of line character on most variants of Unix or Linux.</p>
]]></content:encoded>
			<wfw:commentRss>http://CatalystArt.reststop.com/2010/02/12/28/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
