![Virus, Adware, Spyware, Worms, Trojan Horses, etc. [Virus]](/images/iconVirus.png)
![Computers [Computer]](/images/iconComputer.png)
Apparently someone somewhere out there believes that Java is somehow inherently insecure. I haven't met this person myself, but I read about it in a newsgroup.
> I like the idea of storage in a free form un-categorized way, and the website convinced me to download it (Parsnips).
>
> I then saw that Java was required, and that is a big turnoff. I have used some Java apps before, and found it to be problematic, and I'm not entirely sure that Java is actually safe to have on my PC, so I stopped downloading.
I'm very surprised by this stance, because it seems to imply that if the same program were not written using Java-related technologies, it would somehow be more secure. I generally hold the opposite view, and I'd like to explain why here. Note that this explanation is designed to be very non-technical, so I've oversimplified a lot of issues. That is to say, if you know a lot about computers, you'll probably spot some technical errors here, but I'm sure you'll agree that the basic concept is essentially correct. This text is meant to explain, to a non-technical audience, why Java is generally more secure than normal "native" programs.
First of all, a very brief intro on how computers work. Programs are typically stored as a series of instructions. You can download these instructions just like you can download any other type of file from the Internet and save these instructions on your hard disk. When you double click on a program file, your operating system (e.g. Windows XP, MacOS X, Linux, whatever) will notice that this is a program file, and start executing the instructions one by one.
In general, the computer doesn't really know what you want to do, and so will blindly follow whatever the instructions say. If the instructions tell the computer to play some music through the speakers, the computer will do so. If the instructions tell the computer to delete all the files on the disk, it will also do so. So generally speaking, whenever you download a program file off the Internet and run it, you're taking a risk.
With Java, it's different. To understand this though, we have to understand that the word "Java" refers to a lot of things. Two particular concepts will interest us here; the first is "Java" the runtime environment; the second is "Java" the programming language.
When you download a Java program, the instructions are encoded in a format called "bytecode". Most computers cannot natively understand the bytecode format, and so the operating system can't actually decipher what all the instructions say and execute them. That's why you need to have the Java Runtime Environment (JRE) installed to actually use your Java program. The JRE takes the bytecode and translates it into a format that your computer can actually understand so that it can actually run the program.
One thing the JRE does while translating the bytecode is actually analyze it to make sure it's safe! When you load an applet on a webpage, for example, the JRE will look at the bytecode, and if it sees an instruction that's trying to read files on your computer, it'll actually prevent that instruction from running. When you run a normal computer program, there's nothing to stop it from randomly reading, renaming, moving or deleting your files. That's one reason why Java programs are inherently more secure than normal programs.
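To give a rough picture of what that looks like in practice (the class name and file path below are made up for illustration), here's the kind of code the sandbox stops. When a Java program running as an applet tries to open a file, the JRE's security manager steps in before the file is ever touched:

```java
import java.io.FileReader;

public class NosyApplet {
    public void snoop() throws Exception {
        // In a sandboxed environment such as an applet, the JRE installs a
        // SecurityManager. The FileReader constructor asks that manager for
        // permission behind the scenes, and the manager throws a
        // SecurityException instead of letting the file be opened.
        FileReader secrets = new FileReader("/home/user/secret-diary.txt");
    }
}
```

The important part is that the check happens inside the JRE, not inside the applet's own code, so a malicious applet can't simply skip it.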
So the JRE will try to stop malicious Java applets from damaging your system or compromising any secret data stored on there. That's great. But when you look at all the computer security issues that make it to the news, only a few of them are about malicious programs. A lot of the time, the problem was a coding mistake in a benign program that allowed crackers to break into the system. That is to say, the programmers who wrote the program had good intentions, but the programs they wrote accidentally had security flaws in them that, when installed on a computer, created a backdoor through which crackers could come in and start messing with your system. Does Java protect against that?
Actually, it does. Java the programming language has a lot of facilities that catch the most common security mistakes programmers make, and it simply won't let them create a program containing these security flaws.
The number one most common programming mistake (prior to Java) which resulted in a security flaw is the buffer overflow. There are several ways a buffer overflow can pop up in non-Java programs. I'll explain each one, and show how Java prevents programmers from making these mistakes.
The first form of buffer overflow takes place in "strings". "Strings" are just how text is stored in computer memory. Back before Java was invented, a lot of people wrote programs in C. In that language, you typically had to specify the maximum length a string could be, so that the computer could reserve enough memory to store it. For example, you might want to write a program that asks what the user's name is, and since this is textual data (as opposed to numerical data), you'd have to store the user's name in a string. You might assume that no user will ever have a name longer than 200 characters, so you write your program to allocate a string of size 200.
Now a malicious hacker comes along and finds out that there's a limit of 200 characters on the string. What he does is purposely enter a name much longer than 200 characters. What happens then? Well, remember how a computer works: when you run a program, the computer reads the set of instructions that make up that program into memory and then starts executing the instructions one by one. So now imagine the computer is executing the "ask the user for his or her name and store it in a string of size 200" instruction. The hacker provides a name of, say, 500 characters. The computer will try to store the 500-character string into a memory slot that only has space for 200 characters. The remaining 300 characters will actually overwrite the instructions that lie in memory! So when the computer tries to execute the next instruction it sees, it'll actually execute whatever instructions were in the 300 remaining characters that the hacker provided. As a very simplified example, I could enter my name as "Nebuuuuuuuuu[... imagine there were 200 'u's here]delete all files", and the "delete all files" instruction would overwrite whatever the next instruction was in memory. When the computer stores this string and tries to read the next instruction, it sees "delete all files" and happily does so.
So how does Java fix this problem? By not having the programmer specify a maximum string length at all! Java is actually smart enough to detect the length of the string that the user entered, and allocate exactly enough space to store it. So even if the hacker tried entering a name a million characters long, Java would allocate enough space to store it without overwriting any instructions.
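Here's a minimal sketch of the "ask for a name" example in Java (the class name is made up for illustration). Notice that nowhere do we tell Java how long the name is allowed to be:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class AskName {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        System.out.print("What is your name? ");
        // Java allocates exactly enough memory to hold whatever comes back,
        // whether it's 5 characters or 5 million. There is no fixed-size
        // buffer sitting next to the program's instructions to overflow.
        String name = in.readLine();
        System.out.println("Hello, " + name);
    }
}
```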
An example of this particular buffer overflow attack happened relatively recently with the popular Winamp mp3 music player. In mp3 files, in addition to the music, you could also store information such as the name of the artist and the name of the song. A clever hacker figured out that if you made the name of the artist long enough, you could actually overwrite the instructions that Winamp would execute, so that playing a carefully constructed mp3 file in Winamp could, for example, delete the files on your computer. This flaw was fixed in Winamp version 5.01, but it never would have been an issue if Winamp had been written in Java!
Buffer overflow can also happen with "arrays". Arrays are just a collection of similar objects. For example, if you were writing a program that kept track of names, addresses and phone numbers for an address book, you might store the entries of the address book in an array. Like strings, you also have to specify the size of an array. For example, you might specify that you want the computer to allocate enough memory for an array to store 10 contacts. What would happen if the user wanted to store more than 10 contacts in her address book? Traditionally, the program would then allocate a new array, perhaps of size 20, and then copy all the contacts from the old array (of size 10) into the new one, and then delete the old array, so it would give the illusion that the array had "grown" to allow for more data. The problem with this is that you now have to keep track of the size of the array, which adds extra complexity. The reason you need to do this is because if you ask the computer to access the 30th item in the array, even though the array is only of size 20, the computer will happily do so.
Reading the 30th item in a size 20 array would typically just give you nonsensical data. Writing into the 30th item was a much more serious problem. Like with the string-based buffer overflow, you'd be overwriting actual program instructions. Java fixes this by keeping track of the size of arrays automatically for you. If the programmer tries to access an entry outside of the bounds of the array, Java will stop that operation from executing and report the problem to the programmer (in the form of an "Array Index Out Of Bounds Exception").
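As a quick sketch (the variable names and sizes are made up), here are both halves of that story in Java: the "grow the array by copying" trick, and what happens when you step outside the array's bounds:

```java
public class AddressBook {
    public static void main(String[] args) {
        String[] contacts = new String[10];   // room for 10 contacts

        // The traditional "grow the array" trick: allocate a bigger array,
        // copy the old entries over, and let the old array be thrown away.
        String[] bigger = new String[20];
        System.arraycopy(contacts, 0, bigger, 0, contacts.length);
        contacts = bigger;

        // Java always knows the array's true size (contacts.length), so
        // stepping outside it is caught immediately rather than silently
        // reading or overwriting whatever happens to sit next to the array
        // in memory.
        String oops = contacts[30];  // throws ArrayIndexOutOfBoundsException
    }
}
```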
Another security-related programming mistake has to do with "pointers". Pointers are somewhat similar to the concept of "shortcuts" in Windows. Let's say you have a movie file of your pet dog dancing around in a circle called "dog-dance.avi", and it's in your folder "movies\family\". You also have a folder called "movies\funny\", and you occasionally forget where you've placed that "dog-dance" movie, so you look in "movies\funny\" and see that it's not there. You then remember that it's in "movies\family\", so you are finally able to view it. You figure it'd be more convenient if you could access the movie file from BOTH locations, as the movie is both related to your family (if you consider pets as being part of the family) and funny. One way to solve this problem is to actually copy the movie file to both locations; however, if the movie file is large, this will waste a lot of disk space, as you now need to store the file twice. A better solution is to create a "shortcut" in the "movies\funny\" directory that points to the "movies\family\dog-dance.avi" movie. The shortcut is relatively small because it doesn't actually contain the full movie; all it contains is the location of the real movie file. But you can double click on the shortcut or double click on the original movie and the effect is the same: your dancing dog movie plays. "Pointers" are basically the same concept in programming: they're a piece of memory that points to the location of some chunk of data.
When the concept of pointers was first invented, everyone thought it was great, because it allowed a lot more flexibility when writing programs. However, people very quickly discovered that writing programs using pointers is difficult. This doesn't have anything to do with the skill level of the programmer; from a theoretical computer science point of view, you can prove fewer properties of a program, and analyze it less thoroughly, when it uses pointers. In other words, pointers provably make programs harder to reason about.
Instead of pointers, Java uses what it calls "object references", which provide most of the flexibility of pointers without the problems associated with them. One flexible aspect of pointers that isn't present in object references is "pointer arithmetic". If you are a perfect, infallible programmer, you can perform clever tricks using pointer arithmetic, but in general, most programmers just get themselves into very deep trouble when they use it. It turns out that almost anything you can accomplish with pointer arithmetic, you can also accomplish with object references; some things take a bit more work with object references, but they are safer to do. In general, this is regarded as a good tradeoff: it means a bit more work for the programmers, but the software is a lot more stable and secure. Personally, after using object references and not needing to worry about whether I had "dangling pointers" (pointers which point to random locations in memory) anywhere, I never wanted to go back to programming with pointers.
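Here's a small illustration of the difference, with made-up variable names. In C you could add numbers to a pointer and wander off into arbitrary memory; in Java the equivalent code simply doesn't compile, and the worst a reference can do is be null:

```java
public class References {
    public static void main(String[] args) {
        Object movie = new Object();
        Object shortcut = movie;   // an object reference: both point at the same object

        // shortcut++;             // won't compile: there is no pointer arithmetic
        //                         // on references, so you can't wander off into
        //                         // arbitrary memory the way a C pointer can

        shortcut = null;           // a reference can be null, but never "dangling"
        String s = shortcut.toString();  // throws NullPointerException, instead of
                                         // quietly reading some random spot in memory
    }
}
```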
One last common programming mistake I'd like to talk about is memory leaks. In languages like C, when you were done with a piece of data, you had to manually deallocate it to free up the memory it was using. For example, when you allocated that string of size 200 and were done with it, you were supposed to deallocate it. Unfortunately, like programming with pointers, this turned out to be extremely hard to do in practice. As a result, a lot of data would never get deallocated, and so the memory would stay marked as "reserved" even though it wasn't being used anywhere. The longer your program ran, the more often this would happen, until typically your computer would run out of memory and crash (this was fairly typical back in the days of Windows 95, for example).
Eventually people figured out how to make the computer keep track of which parts of memory were in use, and to deallocate the data automatically. This technology is generally referred to as "garbage collection" (the name comes from the fact that there's a sub-program going around looking for "garbage" data and collecting it to free up space for new data), and Java has garbage collection built in. It's still possible to have memory leaks in Java programs, but they occur much less frequently than in programs without garbage collection.
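For example (a deliberately wasteful sketch with made-up names): in C, a loop like this would leak a little memory on every pass unless the programmer remembered to free each buffer; in Java, the garbage collector quietly reclaims each buffer once nothing refers to it anymore:

```java
public class NoLeak {
    public static void main(String[] args) {
        for (int i = 0; i < 1000000; i++) {
            byte[] buffer = new byte[1024];  // allocate a kilobyte of working space
            buffer[0] = 42;                  // ... do something with it ...
        }   // at the end of each pass, nothing refers to the buffer anymore,
            // so the garbage collector is free to reclaim it; there is no
            // "free" or "deallocate" call for the programmer to forget
    }
}
```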
Java can also be more secure for pragmatic reasons. Now that Linux and Macintosh are both gaining market share, many commercial products are being ported (or re-written) to be able to support these platforms in addition to Windows. Traditionally, a company would need to hire a team to write the main code, a team to write the Windows-specific code, a team to write the Linux-specific code, and a team to write the Apple-specific code. That's a lot of teams! Unless you're a really big corporation like Microsoft, each team will probably be under-staffed, which typically results in looming deadlines and cut corners. Each team is already busy enough just getting the product to do what it's advertised as being capable of; they hardly have time to ensure that it's secure (or even bug free).
Java was designed in such a way that you only need to write the code once, and the program will behave the same on all platforms. This is possible because the JRE (remember the JRE? the Java Runtime Environment?) translates the Java code into the instruction set that the computer it's running on can understand. The company only has to write the Java code once. The Windows JRE will translate that code into a format that Windows can understand; the Macintosh JRE will translate the code into a format that Macintosh can understand; and similarly for Linux or any other operating system. The best part is that Sun has already released JREs for all these platforms, and has even released the technical specifications for how a JRE works, so that if you're using a really obscure platform (perhaps you want to manufacture a "smart fridge" which can run Java programs whenever it detects that the milk is running low), you could make your own JRE (i.e. you're not out of luck if Sun refuses to write a JRE for your platform, or even if Sun suddenly goes out of business).
So not only can you take the Windows-specific people and the Mac-specific people and put them on your main-code team, thus ensuring a higher quality, less buggy program, but you can then take your Linux team and assign them instead to analyze the security implications of the program's design! Actually having people focused on security is probably the biggest factor in whether a product will be secure or not. When a company uses Java, they're saving themselves a lot of time and work, which can then be spent on enhancing their products both functionally and from a security point of view.
In summary, when Sun designed Java, they took into consideration a lot of security concerns (because they knew from the start that Java code would mostly be transferred over the Internet). Java, in a sense, does a lot of "babysitting" on the bytecode to ensure that it is safe to run and free of many common programming errors; native programs don't have any babysitting done, and so can do virtually any nasty thing to your computer. It's because of this extra layer of protection that I would be much more ready to trust a program written in Java than a native program, all other factors being equal. As an aside, one of the reasons Java programs used to run slowly is that all this extra babysitting work had to be done, thus using up more computer processing power. However, Sun has invested a lot of effort into speeding up Java so that its speed is now comparable to that of native programs, so even this is no longer an issue.
Sun is in fact continuously improving Java to make Java programs more robust and secure. Towards the end of September 2004, Sun released "Java 5", which added, among other new features, "generics". Generics address yet another very common programming mistake: "casting exceptions". To put it simply, a casting exception occurs when the programmer thinks the chunk of data he's working with is of one type (for example, textual) when actually it's of another type (perhaps numerical). At my job, for example, I took some old Java code and updated it to use generics, and it instantly allowed us to spot a dozen or so bugs in our code. We were performing some "casts", which is basically a way of saying "we think this piece of data is of this particular type". When I added generics, the Java platform was able to tell us: "No, you have a contradiction here. According to this 'generics' information, it's impossible for this piece of data to be of this type."
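Here's a tiny before-and-after sketch of what that looked like (the class and variable names are made up). Without generics, the bad cast compiles happily and only blows up when the program runs; with generics, the same mistake is caught before the program ever runs:

```java
import java.util.ArrayList;
import java.util.List;

public class Casting {
    public static void main(String[] args) {
        // Before generics: the list will hold anything, so the mistake below
        // isn't noticed until runtime, as a ClassCastException.
        List names = new ArrayList();
        names.add(new Integer(42));             // oops, that's a number, not text
        String first = (String) names.get(0);   // compiles, but fails when it runs

        // With Java 5 generics: the list declares what it holds, and the
        // compiler flags the contradiction before the program is ever run.
        List<String> safeNames = new ArrayList<String>();
        // safeNames.add(new Integer(42));      // won't compile: Integer is not a String
        safeNames.add("Nebu");
        String safeFirst = safeNames.get(0);    // no cast needed at all
    }
}
```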
For further reading, here's Sun's (fairly technical) document on what security features are present in Java, and here is an independent analysis of security features in Java which essentially concludes that installing any software at all poses a security risk and that Java is no exception, but that "[f]or a system such as a home computer, many people are likely to find that the benefits of Java outweigh the risks."