Don Arndt is the Reference/Electronic Resources Librarian for Biddle Law Library at the University of Pennsylvania. In his spare time, he’s the VP/President-Elect of the Greater Philadelphia Law Library Association. This article addresses some of the technical skills he’ll be teaching as part of Genie Tyburski’s workshop, “The ‘Compleat’ Internet Researcher,” at the AALL Annual Meeting in Baltimore this summer.
No, you don’t. However, certain basic skills should be common to anyone who does research on the Internet, or who uses the Internet as a communication medium. More and more, this describes all library and information professionals. Besides being important and marketable skills, they will make your use of the Internet easier. Even though Internet tools are increasingly user-friendly, strict rules still govern the processes of moving various data formats from point A to point B on the Internet and making use of that data when it gets there. For instance, FTP’ing a WordPerfect 5.1 document as a plain ASCII text file rather than as a binary document will result in a puddle of unrecognizable garbage at the receiving end, instead of the useful document you thought you were getting. And then, even if the data is transferred correctly, often it isn’t useful immediately after downloading because it needs some sort of post-transfer processing (decoding, uncompressing, or making use of some helper application in order to view/manipulate it).
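To see why the transfer mode matters, here is a small Python sketch. It is not a real FTP client: it assumes a simplified model in which an ASCII-mode transfer onto a DOS or Windows machine turns every line-feed byte into a carriage return plus line feed, and the function names and sample bytes are made up for illustration.

```python
def binary_mode_transfer(payload: bytes) -> bytes:
    """Binary mode: every byte arrives exactly as it was sent."""
    return bytes(payload)


def ascii_mode_transfer(payload: bytes) -> bytes:
    """Simplified ASCII mode (an assumption, not a full FTP model):
    each line-feed byte is rewritten as carriage return + line feed."""
    return payload.replace(b"\n", b"\r\n")


# Stand-in for a word processing file: one of every possible byte value.
document = bytes(range(256))

as_binary = binary_mode_transfer(document)
as_ascii = ascii_mode_transfer(document)

print("binary copy intact:", as_binary == document)
print("ASCII copy intact: ", as_ascii == document)
print("sizes:", len(document), len(as_binary), len(as_ascii))
```

Plain text survives this kind of line-ending translation because it only touches the ends of lines. In a binary file the same byte value can sit anywhere, so the copy no longer matches the original.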
Sounds scary, doesn’t it? Part of the confusion, and (dare I say it?) fun of this milieu is that there are so many ways to move data between hither and yon. There’s the World Wide Web of course, and “file transfer protocol” (FTP), e-mail, Usenet Newsgroups, gopher, Web-fax, shared applications and whiteboard space, direct-dial telecommunications (Kermit and so on) connecting you to your computer at work or to a bulletin board … let’s see, did I miss any? Sure, no question. That’s because new stuff comes out every day. Fortunately, you don’t have to be on the cusp of this minute’s cutting-edge technological marvel to make use of most of these functions. FTP, for example, has been around a long time; it’s not going anywhere and nobody’s asking it to. So while it’s true that you could use the latest, greatest FTP client software running on a screaming barnburner of a whiz-bang machine to grab that neat document from half a world away, it’s equally true that you could grab it using five-year-old FTP software running on a dinosaur machine. No matter your setup or background, the best way to learn is by doing. With that, let’s jump right in and get an overview of some of the basic technical skills that are essential for the Internet researcher.
Techie Quick Start
This is a quick and dirty approach to each of the following subjects. It’s enough to get you started if, like me, you’re a Windows 3.x user. If you use another operating system, or would like to go into the subject in greater detail, click on the “more info” link for each topic.
Compression/Decompression and Archiving
Compression is the process of making files smaller. This can be useful for archival purposes since compressed files use less storage space, or for transfer purposes since compressed files use less bandwidth and take less time to download. Decompression is the process of expanding compressed files to make them usable again.
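For readers who like to see the round trip in action, here is a minimal Python sketch using the standard zlib module. The sample text is made up, and zlib is only one of many compression schemes; it stands in here for whatever program you actually use.

```python
import zlib

# Repetitive sample data (made up for illustration); repetition compresses well.
original = b"Compression makes files smaller. " * 200

compressed = zlib.compress(original)    # make it smaller
restored = zlib.decompress(compressed)  # expand it to make it usable again

print("original size:  ", len(original), "bytes")
print("compressed size:", len(compressed), "bytes")
print("round trip is exact:", restored == original)
```

How much a file shrinks depends on its contents. Highly repetitive data like this sample compresses far better than data that is already compressed.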
Archiving is the process of taking multiple files and packing them together into a single file. When that archive file is later opened and unpacked, the original files will reveal themselves again.
Compressed files may be archived and archived files may be compressed, and the two processes often go together. How to recognize these files depends both on the software used to archive/compress them and on the operating system for which that software was built, as well as whether the file was designed to be self-extracting or in need of an external piece of software to do the unpacking/decompressing (usually the same software that did the packing/compressing in the first place).
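The ZIP format is one example that handles both jobs at once. The following Python sketch uses the standard zipfile module to pack two files into a single compressed archive held in memory and then unpack them again. The file names and contents are invented for illustration.

```python
import io
import zipfile

# Two small made-up files to pack together.
files = {
    "memo.txt": b"Meeting moved to Thursday.\n" * 50,
    "notes.txt": b"Remember to renew the database subscription.\n" * 50,
}

# Archive and compress in one step, writing to memory instead of disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, data in files.items():
        archive.writestr(name, data)

# Open the archive again and unpack the original files.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    unpacked = {name: archive.read(name) for name in names}

print("files in archive:", names)
print("contents recovered:", unpacked == files)
```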
Usually, archived/compressed files from the Windows operating system are recognizable by a .zip extension to the file name. These are referred to as “zipped” files, and while they will always be compressed, you have to “unzip” them to find out whether they’ve been archived as well since zipping can accomplish both in one step. One popular Windows program for compressing and decompressing files is WinZip.
WinZip can also produce self-extracting (“self-unzipping?”) files, recognizable by the .exe extension. Of course, this can present a bit of a recognition problem in its own right since so many of the program files in the Microsoft family of operating systems use the .exe extension as well. Oh well, confusion is part of life. One tip: be wary of viruses in files with the .exe extension unless they come from a reliable source. Because they are executable programs, they could erase your hard drive when run, or cause other havoc.
Here’s a quick tutorial on how to set up and use WinZip.
Click here for more information on compression and decompression generally.
Let’s start off by talking about the two types of encoded files you’re liable to encounter on the Internet. The first kind is a security lock, to keep prying eyes off a sensitive document for instance; the second kind is simply a way to make certain kinds of documents easily transportable over the Internet via e-mail and Usenet Newsgroups. To avoid confusion, we’ll call the first kind “Encrypted”, and the second kind “Encoded”.
We’re not going to spend much time on this, except to make you aware of the concept and how to find out more. Since the Internet is such an insecure environment, one way to protect the privacy of your information is to turn it into gobbledygook which only you can decipher. In other words, the idea behind encryption for security purposes is that it doesn’t matter if others obtain your encrypted document because they won’t be able to read it without cracking the code first. And cracking 128-bit encryption, say, can be tough, even if you have access to a Cray supercomputer to do the crunching. One good, widely used and free program available for this purpose is Pretty Good Privacy (PGP). To find out more about it, check out this site: http://www.yahoo.com/Computers and Internet/Security and Encryption/PGP _ Pretty Good Privacy/
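To put a rough number on “tough,” here is a back-of-the-envelope calculation in Python. The guessing speed of one trillion keys per second is purely an assumption chosen for illustration, not a measurement of any real machine, and the sketch considers only brute-force guessing, not cleverer attacks.

```python
KEY_BITS = 128
GUESSES_PER_SECOND = 10**12  # assumption: a hypothetical, very fast machine
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Every extra bit doubles the number of possible keys.
total_keys = 2**KEY_BITS

# On average an attacker finds the key after trying half of them.
average_years = (total_keys / 2) / GUESSES_PER_SECOND / SECONDS_PER_YEAR

print(f"possible keys: {total_keys:.3e}")
print(f"average years to guess the key: {average_years:.3e}")
```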
Simply put, an encoded file is a binary file which has been converted into a text file so that it can be shared over the Internet via a text-only medium, such as e-mail or a Usenet Newsgroup posting. A binary file is anything (e.g., pictures, sound files, spreadsheets and word processing documents) which is not plain, simple ASCII text. Sometimes the distinction is tricky. For instance, this HTML page is plain ASCII text, but a textual document in WordPerfect 5.1 format is a binary document. Decoding is what you do to turn an encoded file back into a usable binary file at the receiving end. Binaries can be reliably transported via FTP (file transfer protocol), or via one of the many communications programs such as Kermit, as long as your program is set to do a “binary transfer” and not an “ASCII transfer”. Yet it’s still nice to be able to send a friend (colleague, boss, client…) a binary file directly by e-mail, or to be able to make use of it should they send one to your e-mailbox.
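Here is a minimal Python sketch of that encode/decode round trip, using uuencoding as the example scheme. It relies on the standard binascii module, which encodes at most 45 bytes per line. The helper names and the sample data are made up, and the sketch leaves out the header and trailer lines a real uuencoded message carries.

```python
import binascii


def encode_to_text(data: bytes) -> bytes:
    """Turn binary data into uuencoded lines of plain ASCII text."""
    chunks = (data[i:i + 45] for i in range(0, len(data), 45))
    return b"".join(binascii.b2a_uu(chunk) for chunk in chunks)


def decode_from_text(text: bytes) -> bytes:
    """Turn uuencoded lines back into the original binary data."""
    lines = text.splitlines(keepends=True)
    return b"".join(binascii.a2b_uu(line) for line in lines)


# Stand-in for a picture or word processing file: every possible byte value.
binary_file = bytes(range(256))

encoded = encode_to_text(binary_file)
decoded = decode_from_text(encoded)

print(encoded.decode("ascii"))
print("round trip is exact:", decoded == binary_file)
```

The encoded form is larger than the original, which is the price of squeezing arbitrary bytes into characters that a text-only medium can carry.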
The easiest way to recognize an encoded file, when one shows up in your mailbox or in a newsgroup you subscribe to, is by looking at the text of the message. If the first screen looks something like this:
section 1 of uuencode 4.13 of file DILBERT.GIF by R.E.M.
begin 644 DILBERT.GIF
M2PVKU*5_&ANDR,]8;8AZ:P.IV(YV;@"IV@"HV0>DU`"EUHAR:0"BTW5V=K15J
then that’s a pretty good clue that the file is encoded. In addition, somewhere near the top of the message there will be a line identifying the file. In this case it’s a very small sliver of a uuencoded GIF image. If you were to save the message, decode it, and open it in a suitable viewer, you would see a cartoon picture of Dilbert and his lovable mutt Dogbert.
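The same recognition trick can be sketched in a few lines of Python. The function below looks for a uuencode “begin” line and reports the permission mode and file name it finds. The function name is made up, and the check is deliberately simple: it assumes the header has the form shown in the sample above.

```python
import re

# A uuencode header line: "begin", a three- or four-digit octal mode, a file name.
BEGIN_LINE = re.compile(r"^begin ([0-7]{3,4}) (.+)$")


def find_uuencode_header(message: str):
    """Return (mode, filename) if the message contains a uuencode
    "begin" line, or None if it does not look uuencoded."""
    for line in message.splitlines():
        match = BEGIN_LINE.match(line.strip())
        if match:
            return match.group(1), match.group(2)
    return None


sample = (
    "section 1 of uuencode 4.13 of file DILBERT.GIF by R.E.M.\n"
    "begin 644 DILBERT.GIF\n"
)

print(find_uuencode_header(sample))
print(find_uuencode_header("Just an ordinary note.\n"))
```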
One good program for encoding and decoding files is Wincode version 2.7.3a.
Proprietary File Formats
These are file formats that can’t be handled by your Web browser on its own, without calling up the assistance of another viewer or “helper application” (a.k.a., “plug-in”). Examples of proprietary file formats abound: word processing and spreadsheet files, Adobe PDF (portable document format) and PostScript files, video clips, sound files, vector graphics, etc. If your browser encounters such a troublesome file format and isn’t able to display it on its own, it calls up the correct helper app automatically, assuming that the helper app is loaded on your computer and that your browser’s been configured correctly and knows what helper app to look for.
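The lookup a browser performs can be pictured with a short Python sketch. It uses the standard mimetypes module to guess a file’s type from its name, then consults a table of helper applications. The table, the helper names, and the function name are all hypothetical, standing in for whatever is configured on your own machine.

```python
import mimetypes

# Types this imaginary browser can display on its own (an assumption).
BROWSER_NATIVE = {"text/html", "text/plain", "image/gif", "image/jpeg"}

# Hypothetical helper application table: file type -> program to launch.
HELPER_APPS = {
    "application/pdf": "PDF viewer",
    "application/postscript": "PostScript viewer",
}


def choose_handler(filename: str) -> str:
    """Decide what should open the file, based on its guessed type."""
    file_type, _encoding = mimetypes.guess_type(filename)
    if file_type in BROWSER_NATIVE:
        return "browser"
    if file_type in HELPER_APPS:
        return HELPER_APPS[file_type]
    # No helper configured: the safest fallback is to save the file.
    return "save to disk"


for name in ("index.html", "brief.pdf", "mystery.xyzzy"):
    print(name, "->", choose_handler(name))
```

If the helper table has no entry for a file type, the sketch falls back to saving the file, which mirrors what happens when a browser has not been told which helper app to look for.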