| Thinking in Java source ref: work2.html |
Creating a good input/output (I/O) system is one of the more difficult
tasks for the language designer.
This is evidenced by the number of different approaches. The challenge seems to be in
covering all eventualities. Not only are there different sources and sinks of I/O that you
want to communicate with (files, the console, network connections, etc.), but you need to
talk to them in a wide variety of ways (sequential, random-access, buffered, binary,
character, by lines, by words, etc.).
The Java library designers attacked this problem by creating lots of classes. In fact,
there are so many classes for Java's I/O system that it can be intimidating at first
(ironically, the Java I/O design actually prevents an explosion of classes). There was
also a significant change in the I/O library after Java 1.0, when the original byte-oriented library was supplemented with char-oriented,
Unicode-based I/O classes. In JDK 1.4, the nio classes (for "new I/O," a
name we'll still be using years from now) were added for improved performance and
functionality. As a result, there are a fair number of classes to learn before you
understand enough of Java's I/O picture that you can use it properly. In addition,
it's rather important to understand the evolution history of the I/O library, even if
your first reaction is "Don't bother me with history, just show me how to use
it!" The problem is that without the historical perspective, you will rapidly become
confused by some of the classes and about when you should and shouldn't use them.
This chapter will give you an introduction to the variety of I/O classes in the
standard Java library and how to use them.
Before getting into
the classes that actually read and write data to streams, we'll look at a utility
provided with the library to assist you in handling file directory issues.
The File class has a deceiving name; you might think it refers to a file, but it
doesn't. It can represent either the name of a particular file or the names
of a set of files in a directory. If it's a set of files, you can ask for that set
using the list( ) method, which returns an array of String. It makes
sense to return an array rather than one of the flexible container classes, because the
number of elements is fixed, and if you want a different directory listing, you just
create a different File object. In fact, "FilePath" would have been a
better name for the class. This section shows an example of the use of this class,
including the associated FilenameFilter interface.
Suppose you'd like to see a
directory listing. The File object can be listed in two ways. If you call list( )
with no arguments, you'll get the full list that the File object contains.
However, if you want a restricted list (for example, if you want all of the files with
an extension of .java), then you use a "directory filter," which is a
class that tells how to select the File objects for display.
Here's the code for the example. Note that the result has been effortlessly sorted
(alphabetically) using the java.util.Arrays.sort( ) method and the AlphabeticComparator
defined in Chapter 11:
//: c12:DirList.java
// Displays directory listing using regular expressions.
// {Args: "D.*\.java"}
import java.io.*;
import java.util.*;
import java.util.regex.*;
import com.bruceeckel.util.*;
public class DirList {
  public static void main(String[] args) {
    File path = new File(".");
    String[] list;
    if(args.length == 0)
      list = path.list();
    else
      list = path.list(new DirFilter(args[0]));
    Arrays.sort(list, new AlphabeticComparator());
    for(int i = 0; i < list.length; i++)
      System.out.println(list[i]);
  }
}
class DirFilter implements FilenameFilter {
  private Pattern pattern;
  public DirFilter(String regex) {
    pattern = Pattern.compile(regex);
  }
  public boolean accept(File dir, String name) {
    // Strip path information, search for regex:
    return pattern.matcher(
      new File(name).getName()).matches();
  }
} ///:~
The DirFilter class implements the interface FilenameFilter.
It's useful to see how simple the FilenameFilter interface is:
public interface FilenameFilter {
  boolean accept(File dir, String name);
}
It says that all this type of object does is provide a method called accept( ).
The whole reason behind the creation of this class is to provide the accept( )
method to the list( ) method so that list( ) can "call
back" accept( ) to determine which file names should be included in the
list. Thus, this structure is often referred to as a callback.
More specifically, this is an example of the Strategy pattern, because list( )
implements basic functionality, and you provide the Strategy in the form of a FilenameFilter
in order to complete the algorithm necessary for list( ) to provide its
service. Because list( ) takes a FilenameFilter object as its argument,
you can pass an object of any class that implements FilenameFilter to
choose (even at run time) how the list( ) method will behave. The purpose of a
callback is to provide flexibility in the behavior of code.
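To make the callback concrete, here is a minimal sketch of a FilenameFilter strategy that records every name list( ) hands to it. It assumes Java 7 or later, and the names CallbackDemo and RecordingFilter, like the file names, are hypothetical. It shows that list( ) calls accept( ) once per directory entry and keeps only the names the strategy approves.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CallbackDemo {
    // Hypothetical strategy: records every name list() asks about and
    // accepts only names ending with the given suffix.
    static class RecordingFilter implements FilenameFilter {
        final List<String> seen = new ArrayList<>();
        private final String suffix;

        RecordingFilter(String suffix) { this.suffix = suffix; }

        @Override
        public boolean accept(File dir, String name) {
            seen.add(name);               // list() calls back here once per entry
            return name.endsWith(suffix); // the selection decision belongs to the strategy
        }
    }

    // Builds a scratch directory holding three empty files (hypothetical names),
    // all registered for deletion when the JVM exits.
    static File makeScratchDir() {
        try {
            File dir = Files.createTempDirectory("callback").toFile();
            dir.deleteOnExit(); // registered first, so it is deleted after its files
            for (String name : new String[] {"A.java", "B.txt", "C.java"}) {
                File f = new File(dir, name);
                f.createNewFile();
                f.deleteOnExit();
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File dir = makeScratchDir();
        RecordingFilter javaOnly = new RecordingFilter(".java");
        String[] accepted = dir.list(javaOnly);
        Arrays.sort(accepted);
        System.out.println("accept() was called " + javaOnly.seen.size() + " times");
        System.out.println("accepted: " + Arrays.toString(accepted));
    }
}
```

Swapping in a different FilenameFilter changes what list( ) returns without touching list( ) itself, which is the point of the Strategy arrangement.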
DirFilter shows that just because an interface contains only a set of
methods, you're not restricted to writing only those methods. (You must at least
provide definitions for all the methods in an interface, however.) In this case, DirFilter
also defines a constructor.
The accept( ) method must accept a File object representing the
directory that a particular file is found in, and a String containing the name of
that file. You might choose to use or ignore either of these arguments, but you will
probably at least use the file name. Remember that the list( ) method is
calling accept( ) for each of the file names in the directory object to see
which ones should be included; this is indicated by the boolean result returned by accept( ).
To make sure the element you're working with is only the file name and contains no
path information, all you have to do is take the String object and create a File
object out of it, then call getName( ), which strips away all the path
information (in a platform-independent way). Then accept( ) uses a regular
expression matcher object to see if the regular expression regex matches the
name of the file. The list( ) method then returns an array containing only the
names for which accept( ) returned true.
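The path stripping and matching done inside accept( ) can be checked in isolation. This sketch (the class name NameMatch is hypothetical) reproduces that logic. Note that Matcher.matches( ) requires the entire name to match the pattern, not just a prefix.

```java
import java.io.File;
import java.util.regex.Pattern;

public class NameMatch {
    // Same steps as DirFilter.accept(): strip any path with getName(),
    // then require the whole remaining name to match the regex.
    static boolean matchesName(String regex, String name) {
        return Pattern.compile(regex).matcher(new File(name).getName()).matches();
    }

    public static void main(String[] args) {
        String sep = File.separator; // platform-specific separator, e.g. "/" or "\"
        System.out.println(new File("src" + sep + "DirList.java").getName());   // DirList.java
        System.out.println(matchesName("D.*\\.java", "src" + sep + "DirList.java")); // true
        System.out.println(matchesName("D.*\\.java", "DirList.class"));             // false
        System.out.println(matchesName("D.*", "xDirList.java"));                    // false
    }
}
```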
This example is ideal for rewriting using an anonymous inner class (described in
Chapter 8). As a first cut, a method filter( ) is created that returns a reference to a FilenameFilter:
//: c12:DirList2.java
// Uses anonymous inner classes.
// {Args: "D.*\.java"}
import java.io.*;
import java.util.*;
import java.util.regex.*;
import com.bruceeckel.util.*;
public class DirList2 {
  public static FilenameFilter filter(final String regex) {
    // Creation of anonymous inner class:
    return new FilenameFilter() {
      private Pattern pattern = Pattern.compile(regex);
      public boolean accept(File dir, String name) {
        return pattern.matcher(
          new File(name).getName()).matches();
      }
    }; // End of anonymous inner class
  }
  public static void main(String[] args) {
    File path = new File(".");
    String[] list;
    if(args.length == 0)
      list = path.list();
    else
      list = path.list(filter(args[0]));
    Arrays.sort(list, new AlphabeticComparator());
    for(int i = 0; i < list.length; i++)
      System.out.println(list[i]);
  }
} ///:~
Note that the argument to filter( ) must be final. This is required by the anonymous inner class so that
it can use an object from outside its scope.
This design is an improvement because the FilenameFilter class is now tightly
bound to DirList2. However, you can take this approach one step further and define
the anonymous inner class as an argument to list( ), in which case it's
even smaller:
//: c12:DirList3.java
// Building the anonymous inner class "in-place."
// {Args: "D.*\.java"}
import java.io.*;
import java.util.*;
import java.util.regex.*;
import com.bruceeckel.util.*;
public class DirList3 {
  public static void main(final String[] args) {
    File path = new File(".");
    String[] list;
    if(args.length == 0)
      list = path.list();
    else
      list = path.list(new FilenameFilter() {
        private Pattern pattern = Pattern.compile(args[0]);
        public boolean accept(File dir, String name) {
          return pattern.matcher(
            new File(name).getName()).matches();
        }
      });
    Arrays.sort(list, new AlphabeticComparator());
    for(int i = 0; i < list.length; i++)
      System.out.println(list[i]);
  }
} ///:~
The argument to main( ) is now final, since the anonymous inner
class uses args[0] directly.
This shows you how anonymous inner classes allow the creation of specific, one-off
classes to solve problems. One benefit of this approach is that it keeps the code that
solves a particular problem isolated together in one spot. On the other hand, it is not
always as easy to read, so you must use it judiciously.
The File class is more than just a representation for an existing file or
directory. You can also use a File object to create a new directory or an entire
directory path if it doesn't exist. You can also look at the characteristics of files
(size, last modification date, read/write), see whether a File object
represents a file or a directory, and delete a file. This program shows some of the other
methods available with the File class (see the HTML documentation from java.sun.com
for the full set):
//: c12:MakeDirectories.java
// Demonstrates the use of the File class to
// create directories and manipulate files.
// {Args: MakeDirectoriesTest}
import com.bruceeckel.simpletest.*;
import java.io.*;
public class MakeDirectories {
  private static Test monitor = new Test();
  private static void usage() {
    System.err.println(
      "Usage:MakeDirectories path1 ...\n" +
      "Creates each path\n" +
      "Usage:MakeDirectories -d path1 ...\n" +
      "Deletes each path\n" +
      "Usage:MakeDirectories -r path1 path2\n" +
      "Renames from path1 to path2");
    System.exit(1);
  }
  private static void fileData(File f) {
    System.out.println(
      "Absolute path: " + f.getAbsolutePath() +
      "\n Can read: " + f.canRead() +
      "\n Can write: " + f.canWrite() +
      "\n getName: " + f.getName() +
      "\n getParent: " + f.getParent() +
      "\n getPath: " + f.getPath() +
      "\n length: " + f.length() +
      "\n lastModified: " + f.lastModified());
    if(f.isFile())
      System.out.println("It's a file");
    else if(f.isDirectory())
      System.out.println("It's a directory");
  }
  public static void main(String[] args) {
    if(args.length < 1) usage();
    if(args[0].equals("-r")) {
      if(args.length != 3) usage();
      File
        old = new File(args[1]),
        rname = new File(args[2]);
      old.renameTo(rname);
      fileData(old);
      fileData(rname);
      return; // Exit main
    }
    int count = 0;
    boolean del = false;
    if(args[0].equals("-d")) {
      count++;
      del = true;
    }
    count--;
    while(++count < args.length) {
      File f = new File(args[count]);
      if(f.exists()) {
        System.out.println(f + " exists");
        if(del) {
          System.out.println("deleting..." + f);
          f.delete();
        }
      }
      else { // Doesn't exist
        if(!del) {
          f.mkdirs();
          System.out.println("created " + f);
        }
      }
      fileData(f);
    }
    if(args.length == 1 &&
      args[0].equals("MakeDirectoriesTest"))
      monitor.expect(new String[] {
        "%% (MakeDirectoriesTest exists"+
        "|created MakeDirectoriesTest)",
        "%% Absolute path: "
        + "\\S+MakeDirectoriesTest",
        "%% Can read: (true|false)",
        "%% Can write: (true|false)",
        " getName: MakeDirectoriesTest",
        " getParent: null",
        " getPath: MakeDirectoriesTest",
        "%% length: \\d+",
        "%% lastModified: \\d+",
        "It's a directory"
      });
  }
} ///:~
In fileData( ) you can see various file investigation methods used to
display information about the file or directory path.
The first method that's exercised by main( ) is renameTo( ), which allows you to rename (or move) a file
to an entirely new path represented by the argument, which is another File object.
This also works with directories of any length.
If you experiment with the preceding program, you'll find that you can make a
directory path of any complexity, because mkdirs( )
will do all the work for you.
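Here is a small sketch that exercises the same File methods in a scratch directory, so it can't disturb real files. It assumes Java 7 or later for Files.createTempDirectory( ), and the class name FileOpsSketch is hypothetical. It creates a nested path with a single mkdirs( ) call, renames the innermost directory with renameTo( ), and then deletes everything. Both renameTo( ) and delete( ) report failure through their boolean results rather than by throwing, so the sketch collects those results.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

public class FileOpsSketch {
    // Creates root/a/b/c with mkdirs(), renames c to c2, checks the result,
    // then removes everything again. Returns the success flag of each step.
    static boolean[] run() {
        try {
            File root = Files.createTempDirectory("fileops").toFile();
            File deep = new File(root, "a" + File.separator + "b" + File.separator + "c");
            boolean made = deep.mkdirs();            // creates a, a/b and a/b/c in one call
            File renamed = new File(deep.getParentFile(), "c2");
            boolean moved = deep.renameTo(renamed);  // failure is reported via the result
            boolean checked = !deep.exists() && renamed.isDirectory();
            // delete() removes only empty directories, so work from the inside out
            File b = renamed.getParentFile();
            File a = b.getParentFile();
            boolean cleaned = renamed.delete() && b.delete() && a.delete() && root.delete();
            return new boolean[] {made, moved, checked, cleaned};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] steps = {"mkdirs", "renameTo", "check", "delete"};
        boolean[] ok = run();
        for (int i = 0; i < steps.length; i++)
            System.out.println(steps[i] + ": " + ok[i]);
    }
}
```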
I/O libraries often use the abstraction of a stream, which represents any data source
or sink as an object capable of producing or receiving pieces of data. The stream hides
the details of what happens to the data inside the actual I/O device.
The Java library classes for I/O are divided by input and output, as you can see by
looking at the class hierarchy in the JDK documentation. By inheritance, everything
derived from the InputStream or Reader classes has basic methods called read( )
for reading a single byte (or, for a Reader, a single char) or an array of them. Likewise, everything derived from OutputStream
or Writer classes has basic methods called write( ) for writing a
single byte (or char) or an array of them. However, you won't generally use these methods; they
exist so that other classes can use them, and these other classes provide a more useful
interface. Thus, you'll rarely create your stream object by using a single class, but
instead will layer multiple objects together to provide your desired functionality. The
fact that you create more than one object to create a single resulting stream is the
primary reason that Java's stream library is confusing.
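As an illustration of layering, this sketch stacks three objects to get one useful stream. A ByteArrayInputStream provides the bytes, an InputStreamReader turns the bytes into chars, and a BufferedReader adds buffering and readLine( ). It assumes Java 7 or later, and the class name LayeringSketch is hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LayeringSketch {
    static List<String> readLines(byte[] data) {
        // Layer 1: byte source. Layer 2: bytes -> chars. Layer 3: buffering + readLine().
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(
                     new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null)
                lines.add(line);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "first\nsecond\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readLines(data)); // [first, second]
    }
}
```

Only the outermost object (the BufferedReader) is used directly, and closing it also closes the layers beneath it.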
It's helpful to categorize the classes by their functionality. In Java 1.0, the
library designers started by deciding that all classes that had anything to do with input
would be inherited from InputStream, and all classes that were associated with
output would be inherited from OutputStream.
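InputStream's job is to represent classes that produce input from different
sources. These sources can be an array of bytes; a String object; a file; a "pipe,"
which works like a physical pipe (you put things in at one end and they come out the
other); or a sequence of other streams, so you can collect them together into a single stream.
Each of these has an associated subclass of InputStream. In addition, the FilterInputStream
is also a type of InputStream, to provide a base class for "decorator"
classes that attach attributes or useful interfaces to input streams. This is discussed
later.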
Table 12-1. Types of InputStream
| Class | Function | Constructor arguments | How to use it |
|---|---|---|---|
| ByteArrayInputStream | Allows a buffer in memory to be used as an InputStream. | The buffer from which to extract the bytes. | As a source of data: Connect it to a FilterInputStream object to provide a useful interface. |
| StringBufferInputStream | Converts a String into an InputStream. | A String. (Deprecated since JDK 1.1 because it does not properly convert characters into bytes; StringReader is preferred.) | As a source of data: Connect it to a FilterInputStream object to provide a useful interface. |
| FileInputStream | For reading information from a file. | A String representing the file name, or a File or FileDescriptor object. | As a source of data: Connect it to a FilterInputStream object to provide a useful interface. |
| PipedInputStream | Produces the data that's being written to the associated PipedOutputStream. Implements the "piping" concept. | PipedOutputStream | As a source of data in multithreading: Connect it to a FilterInputStream object to provide a useful interface. |
| SequenceInputStream | Converts two or more InputStream objects into a single InputStream. | Two InputStream objects or an Enumeration for a container of InputStream objects. | As a source of data: Connect it to a FilterInputStream object to provide a useful interface. |
| FilterInputStream | Base class for decorators that provide useful functionality to the other InputStream classes. See Table 12-3. | See Table 12-3. | See Table 12-3. |
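SequenceInputStream from Table 12-1 can be sketched like this. Two in-memory byte sources are presented as one stream, and read( ) moves on to the second source only when the first is exhausted. It assumes Java 7 or later, and the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class SequenceSketch {
    // Concatenates two byte sources into one InputStream and drains it.
    static String concat(String a, String b) {
        InputStream first = new ByteArrayInputStream(a.getBytes(StandardCharsets.UTF_8));
        InputStream second = new ByteArrayInputStream(b.getBytes(StandardCharsets.UTF_8));
        try (InputStream both = new SequenceInputStream(first, second)) {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            int c;
            while ((c = both.read()) != -1) // -1 only after the last source is exhausted
                sink.write(c);
            return new String(sink.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(concat("Hello, ", "world"));
    }
}
```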
The OutputStream category includes the classes that
decide where your output will go: an array of bytes (no String, however;
presumably, you can create one using the array of bytes), a file, or a "pipe."
In addition, the FilterOutputStream provides a base class for
"decorator" classes that attach attributes or useful interfaces to output
streams. This is discussed later.
Table 12-2. Types of OutputStream
| Class | Function | Constructor arguments | How to use it |
|---|---|---|---|
| ByteArrayOutputStream | Creates a buffer in memory. All the data that you send to the stream is placed in this buffer. | Optional initial size of the buffer. | To designate the destination of your data: Connect it to a FilterOutputStream object to provide a useful interface. |
| FileOutputStream | For sending information to a file. | A String representing the file name, or a File or FileDescriptor object. | To designate the destination of your data: Connect it to a FilterOutputStream object to provide a useful interface. |
| PipedOutputStream | Any information you write to this automatically ends up as input for the associated PipedInputStream. Implements the "piping" concept. | PipedInputStream | To designate the destination of your data for multithreading: Connect it to a FilterOutputStream object to provide a useful interface. |
| FilterOutputStream | Base class for decorators that provide useful functionality to the other OutputStream classes. See Table 12-4. | See Table 12-4. | See Table 12-4. |
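The piping concept in Tables 12-1 and 12-2 needs two threads, because a read from a pipe blocks until data arrives. This is a minimal sketch that assumes Java 8 or later for the lambda, and the class name PipeSketch is hypothetical. Closing the PipedOutputStream is what lets the reader see the end of the stream (-1).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class PipeSketch {
    // A writer thread pushes bytes into the PipedOutputStream; the calling
    // thread reads the same bytes out of the connected PipedInputStream.
    static String sendThroughPipe(String message) {
        try (PipedInputStream in = new PipedInputStream()) {
            PipedOutputStream out = new PipedOutputStream(in); // connects the pair
            Thread writer = new Thread(() -> {
                try (PipedOutputStream o = out) { // closing signals end of data
                    o.write(message.getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            writer.start();
            ByteArrayOutputStream received = new ByteArrayOutputStream();
            int c;
            while ((c = in.read()) != -1) // blocks until the writer supplies data
                received.write(c);
            writer.join();
            return new String(received.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendThroughPipe("hello through a pipe"));
    }
}
```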
The use of layered objects to dynamically and transparently add responsibilities to
individual objects is referred to as the Decorator
pattern. (Patterns[61] are the subject of Thinking
in Patterns (with Java) at www.BruceEckel.com.) The decorator pattern specifies
that all objects that wrap around your initial object have the same interface. This makes
the basic use of the decorators transparent: you send the same message to an object
whether it has been decorated or not. This is the reason for the existence of the
"filter" classes in the Java I/O library: the "filter" class
is the base class for all the decorators. (A decorator must have the same interface as the
object it decorates, but the decorator can also extend the interface, which occurs in
several of the "filter" classes.)
Decorators are often used when simple subclassing results in a large number of classes
in order to satisfy every possible combination that is needed, so many classes that it
becomes impractical. The Java I/O library requires many different combinations of
features, and this is the justification for using the decorator pattern.[62] There is a drawback to the decorator pattern,
however. Decorators give you much more flexibility while you're writing a program
(since you can easily mix and match attributes), but they add complexity to your code. The
reason that the Java I/O library is awkward to use is that you must create many
classes (the "core" I/O type plus all the decorators) in order to get
the single I/O object that you want.
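To see why a decorator must share the interface of what it wraps, here is a sketch of a hypothetical decorator built on FilterInputStream. UpperCaseInputStream is not part of the JDK. Because the decorator is itself an InputStream, it can be stacked on top of a BufferedInputStream, which in turn wraps the actual byte source. The sketch assumes Java 7 or later.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class DecoratorSketch {
    // Hypothetical decorator: same InputStream interface as what it wraps,
    // but ASCII lowercase letters come out in uppercase.
    static class UpperCaseInputStream extends FilterInputStream {
        UpperCaseInputStream(InputStream in) { super(in); }

        private static int upper(int c) {
            return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
        }

        @Override
        public int read() throws IOException {
            int c = super.read();
            return c == -1 ? -1 : upper(c);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            int n = super.read(b, off, len);
            for (int i = off; i < off + n; i++) // loop is skipped when n is -1
                b[i] = (byte) upper(b[i] & 0xFF);
            return n;
        }
    }

    // Stacks two decorators on a byte-array source and drains the result.
    static String shout(String text) {
        byte[] data = text.getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new UpperCaseInputStream(
                 new BufferedInputStream(new ByteArrayInputStream(data)))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4]; // small buffer forces several bulk reads
            int n;
            while ((n = in.read(buf)) != -1)
                out.write(buf, 0, n);
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(shout("decorate me 42")); // DECORATE ME 42
    }
}
```

FilterInputStream.read(byte[]) delegates to read(byte[], int, int), so overriding the single-byte and the ranged read( ) methods is enough to cover all three read( ) forms.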
The classes that provide the decorator interface to control a particular InputStream
or OutputStream are the FilterInputStream and FilterOutputStream,
which don't have very intuitive names. FilterInputStream and FilterOutputStream
are derived from the base classes of the I/O library, InputStream and OutputStream,
which is the key requirement of the decorator (so that it provides the common interface to
all the objects that are being decorated).
The FilterInputStream classes accomplish two significantly different things. DataInputStream
allows you to read different types of primitive data as well as String objects.
(All the methods start with "read," such as readByte( ), readFloat( ),
etc.) This, along with its companion DataOutputStream, allows you to move primitive
data from one place to another via a stream. These "places" are determined by
the classes in Table 12-1.
The remaining classes modify the way an InputStream behaves internally: whether
it's buffered or unbuffered, whether it keeps track of the lines it's reading
(allowing you to ask for line numbers or set the line number), and whether you can push
back a single character. The last two classes look a lot like support for building a
compiler (that is, they were probably added to support the construction of the Java
compiler), so you probably won't use them in general programming.
You'll need to buffer your input almost every time, regardless of the I/O device
you're connecting to, so it would have made more sense for the I/O library to make a
special case (or simply a method call) for unbuffered input rather than buffered input.
Table 12-3. Types of FilterInputStream
| Class | Function | Constructor arguments | How to use it |
|---|---|---|---|
| DataInputStream | Used in concert with DataOutputStream, so you can read primitives (int, char, long, etc.) from a stream in a portable fashion. | InputStream | Contains a full interface to allow you to read primitive types. |
| BufferedInputStream | Use this to prevent a physical read every time you want more data. You're saying "Use a buffer." | InputStream, with optional buffer size. | This doesn't provide an interface per se, just a requirement that a buffer be used. Attach an interface object. |
| LineNumberInputStream | Keeps track of line numbers in the input stream; you can call getLineNumber( ) and setLineNumber(int). | InputStream | This just adds line numbering, so you'll probably attach an interface object. |
| PushbackInputStream | Has a one-byte pushback buffer so that you can push back the last character read. | InputStream | Generally used in the scanner for a compiler and probably included because the Java compiler needed it. You probably won't use this. |
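PushbackInputStream's one-byte pushback is exactly what a simple scanner needs when it reads one byte too far. This is a hypothetical tokenizer, written as a sketch that assumes Java 7 or later; it is not the Java compiler's actual code. It reads digits until it sees a non-digit, then pushes that byte back with unread( ) so the next token starts with it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PushbackSketch {
    // Splits input such as "12+345" into number tokens and single-char tokens.
    static List<String> tokenize(String text) {
        try (PushbackInputStream in = new PushbackInputStream(
                 new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII)))) {
            List<String> tokens = new ArrayList<>();
            int c;
            while ((c = in.read()) != -1) {
                if (Character.isDigit(c)) {
                    StringBuilder num = new StringBuilder();
                    while (c != -1 && Character.isDigit(c)) {
                        num.append((char) c);
                        c = in.read();
                    }
                    if (c != -1)
                        in.unread(c); // give back the byte that ended the number
                    tokens.add(num.toString());
                } else {
                    tokens.add(String.valueOf((char) c));
                }
            }
            return tokens;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12+345"));
    }
}
```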
The complement to DataInputStream is DataOutputStream, which formats each
of the primitive types and String objects onto a stream in such a way that any DataInputStream,
on any machine, can read them. All the methods start with "write," such as writeByte( ),
writeFloat( ), etc.
The original intent of PrintStream was to print all of the primitive data types
and String objects in a viewable format. This is different from DataOutputStream,
whose goal is to put data elements on a stream in a way that DataInputStream can
portably reconstruct them.
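The difference between storage and display shows up directly in the bytes each class produces. In this sketch, DataOutputStream.writeInt( ) always emits four binary bytes, which DataInputStream.readInt( ) reconstructs, while PrintStream.print( ) emits the readable digits. It assumes Java 7 or later, and DataVsPrint is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;

public class DataVsPrint {
    // Storage: a fixed-size, machine-independent binary encoding.
    static byte[] asData(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Display: the human-readable digits.
    static byte[] asText(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(bytes, true, "US-ASCII")) {
            out.print(value);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // DataInputStream reconstructs the int from its binary form.
    static int readBack(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int value = 1000000;
        System.out.println("DataOutputStream bytes: " + asData(value).length); // 4
        System.out.println("PrintStream bytes:      " + asText(value).length); // 7
        System.out.println("DataInputStream reads:  " + readBack(asData(value)));
    }
}
```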
The two important methods in PrintStream are print( ) and println( ),
which are overloaded to print all the various types. The difference between print( )
and println( ) is that the latter adds a newline when it's done.
PrintStream can be problematic because it traps all IOExceptions. (You
must explicitly test the error status with checkError( ), which returns true
if an error has occurred.) Also, PrintStream doesn't internationalize properly
and doesn't handle line breaks in a platform-independent way. (These problems are
solved with PrintWriter, described later.)
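The trapped exceptions are easy to miss. This sketch uses a hypothetical FailingStream whose every write throws IOException. The print( ) call still returns normally, and only checkError( ) reveals that something went wrong.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

public class CheckErrorSketch {
    // Hypothetical sink simulating a failing device: every write throws.
    static class FailingStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("device unavailable");
        }
    }

    // Prints through a PrintStream and reports whether it recorded an error.
    static boolean printAndCheck(OutputStream sink, String text) {
        PrintStream out = new PrintStream(sink);
        out.print(text);         // never throws IOException, even if the sink fails
        return out.checkError(); // flushes, then reports the internal error flag
    }

    public static void main(String[] args) {
        System.out.println(printAndCheck(new ByteArrayOutputStream(), "ok")); // false
        System.out.println(printAndCheck(new FailingStream(), "lost"));       // true
    }
}
```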
BufferedOutputStream is a modifier and tells the stream to use buffering so you
don't get a physical write every time you write to the stream. You'll probably
always want to use this when doing output.
Table 12-4. Types of FilterOutputStream
| Class | Function | Constructor arguments | How to use it |
|---|---|---|---|
| DataOutputStream | Used in concert with DataInputStream so you can write primitives (int, char, long, etc.) to a stream in a portable fashion. | OutputStream | Contains a full interface to allow you to write primitive types. |
| PrintStream | For producing formatted output. While DataOutputStream handles the storage of data, PrintStream handles display. | OutputStream, with optional boolean indicating that the buffer is flushed with every newline. | Should be the "final" wrapping for your OutputStream object. You'll probably use this a lot. |
| BufferedOutputStream | Use this to prevent a physical write every time you send a piece of data. You're saying "Use a buffer." You can call flush( ) to flush the buffer. | OutputStream, with optional buffer size. | This doesn't provide an interface per se, just a requirement that a buffer be used. Attach an interface object. |
Java 1.1 made some significant modifications to the fundamental
I/O stream library. When you see the Reader
and Writer classes, your first
thought (like mine) might be that these were meant to replace the InputStream and OutputStream
classes. But that's not the case. Although some aspects of the original streams
library are deprecated (if you use them you will receive a warning from the compiler), the
InputStream and OutputStream classes still provide valuable functionality in
the form of byte-oriented I/O, whereas the Reader and Writer classes
provide Unicode-compliant, character-based I/O. In addition, there are times when you must
use classes from the byte hierarchy in combination with classes in the character hierarchy.
For this there are "adapter" classes: InputStreamReader converts an InputStream to a Reader,
and OutputStreamWriter converts an OutputStream to a Writer.
The most important reason for the Reader and Writer hierarchies is for
internationalization. The old I/O stream hierarchy supports only 8-bit byte streams and
doesn't handle 16-bit Unicode characters well. Since Unicode is used for
internationalization (and Java's native char
is 16-bit Unicode), the Reader
and Writer hierarchies were added to support Unicode in all I/O operations. In
addition, the new libraries are designed for faster operations than the old.
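The byte/char distinction becomes visible when the same UTF-8 data is read both ways. This sketch counts 12 bytes through an InputStream but 10 chars through an InputStreamReader, because the two accented letters each take two bytes in UTF-8. It assumes Java 7 or later for StandardCharsets, and the class name is hypothetical. The text is written with \u escapes so the source file stays ASCII.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BytesVsChars {
    // What an InputStream sees: individual bytes.
    static int countBytes(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int n = 0;
        while (in.read() != -1)
            n++;
        return n;
    }

    // What a Reader sees: decoded 16-bit chars.
    static int countChars(byte[] utf8) {
        try (Reader in = new InputStreamReader(
                 new ByteArrayInputStream(utf8), StandardCharsets.UTF_8)) {
            int n = 0;
            while (in.read() != -1)
                n++;
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "na\u00efve caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes: " + countBytes(data)); // 12
        System.out.println("chars: " + countChars(data)); // 10
    }
}
```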
As is the practice in this book, I will attempt to provide an overview of the classes,
but assume that you will use the JDK documentation to determine all the details, such as
the exhaustive list of methods.
Almost all of the original Java I/O stream classes have corresponding Reader and
Writer classes to provide native Unicode manipulation. However, there are some
places where the byte-oriented InputStreams and OutputStreams are the
correct solution; in particular, the java.util.zip libraries are byte-oriented
rather than char-oriented. So the most sensible approach to take is to try
to use the Reader and Writer classes whenever you can, and you'll
discover the situations when you have to use the byte-oriented libraries, because
your code won't compile.
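Because java.util.zip is byte-oriented, character data has to cross over through the adapter classes. This sketch writes text through an OutputStreamWriter into a GZIPOutputStream, then reads it back through a GZIPInputStream wrapped in an InputStreamReader. It assumes Java 7 or later, and ZipBridgeSketch is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ZipBridgeSketch {
    // chars -> bytes (OutputStreamWriter) -> compressed bytes (GZIPOutputStream)
    static byte[] compress(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(
                 new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // closing the Writer finished the GZIP data
    }

    // compressed bytes -> bytes (GZIPInputStream) -> chars (InputStreamReader)
    static String decompress(byte[] gz) {
        try (Reader in = new InputStreamReader(
                 new GZIPInputStream(new ByteArrayInputStream(gz)), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[256];
            int n;
            while ((n = in.read(buf)) != -1)
                sb.append(buf, 0, n);
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] gz = compress("hello, zip");
        System.out.println(gz.length + " compressed bytes -> " + decompress(gz));
    }
}
```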
Here is a table that shows the correspondence between the sources and sinks of
information (that is, where the data physically comes from or goes to) in the two
hierarchies.
| Sources & sinks: Java 1.0 class | Corresponding Java 1.1 class |
|---|---|
| InputStream | Reader |
| OutputStream | Writer |
| FileInputStream | FileReader |
| FileOutputStream | FileWriter |
| StringBufferInputStream | StringReader |
| (no corresponding class) | StringWriter |
| ByteArrayInputStream | CharArrayReader |
| ByteArrayOutputStream | CharArrayWriter |
| PipedInputStream | PipedReader |
| PipedOutputStream | PipedWriter |
In general, you'll find that the interfaces for the two
different hierarchies are similar if not identical.
For InputStreams and OutputStreams, streams were adapted for particular
needs using "decorator" subclasses of FilterInputStream and FilterOutputStream.
The Reader and Writer class hierarchies continue the use of this
idea, but not exactly.
In the following table, the correspondence is a rougher approximation than in the
previous table. The difference is because of the class organization; although BufferedOutputStream
is a subclass of FilterOutputStream, BufferedWriter is not a subclass
of FilterWriter (which, even though it is abstract, has no subclasses and so
appears to have been put in either as a placeholder or simply so you wouldn't wonder
where it was). However, the interfaces to the classes are quite a close match.
| Filters: Java 1.0 class | Corresponding Java 1.1 class |
|---|---|
| FilterInputStream | FilterReader |
| FilterOutputStream | FilterWriter (abstract class with no subclasses) |
| BufferedInputStream | BufferedReader |
| BufferedOutputStream | BufferedWriter |
| DataInputStream | Use DataInputStream |
| PrintStream | PrintWriter |
| LineNumberInputStream | LineNumberReader |
| StreamTokenizer | StreamTokenizer |
| PushbackInputStream | PushbackReader |
There's one direction that's quite clear: Whenever you
want to use readLine( ), you shouldn't do it with a DataInputStream
(this is met with a deprecation message at compile time), but instead use a BufferedReader.
Other than this, DataInputStream is still a "preferred" member of the I/O
library.
To make the transition to using a PrintWriter easier, it has constructors that
take any OutputStream object as well as Writer objects. However, PrintWriter
has no more support for formatting than PrintStream does; the interfaces are
virtually the same.
The PrintWriter constructor also has an option to perform automatic flushing,
which happens after every println( ) if the constructor flag is set.
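The effect of the autoflush flag can be observed by checking how many bytes have reached the underlying stream right after println( ). This sketch (hypothetical AutoFlushSketch) assumes, as in current JDK implementations, that PrintWriter(OutputStream, boolean) buffers internally. Under that assumption, nothing arrives without autoflush until flush( ) or close( ) is called.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

public class AutoFlushSketch {
    // Returns how many bytes have reached the sink immediately after println().
    static int bytesAfterPrintln(boolean autoFlush) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(sink, autoFlush);
        out.println("hello");
        int arrived = sink.size(); // measured before any explicit flush() or close()
        out.close();               // close() flushes whatever is still buffered
        return arrived;
    }

    public static void main(String[] args) {
        System.out.println("without autoflush: " + bytesAfterPrintln(false));
        System.out.println("with autoflush:    " + bytesAfterPrintln(true));
    }
}
```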
Some classes were left unchanged between Java 1.0 and Java 1.1:
| Unchanged classes |
|---|
| DataOutputStream |
| File |
| RandomAccessFile |
| SequenceInputStream |
DataOutputStream, in particular, is used without change,
so for storing and retrieving data in a transportable format, you use the InputStream
and OutputStream hierarchies.
RandomAccessFile is used for
files containing records of known size so that you can move from one record to another
using seek( ), then read or
change the records. The records don't have to be the same size; you just have to be
able to determine how big they are and where they are placed in the file.
At first it's a little bit hard to believe that RandomAccessFile is not
part of the InputStream or OutputStream hierarchy. However, it has no
association with those hierarchies other than that it happens to implement the DataInput and DataOutput interfaces (which are also implemented by DataInputStream
and DataOutputStream). It doesn't even use any of the functionality of the
existing InputStream or OutputStream classes; it's a completely
separate class, written from scratch, with all of its own (mostly native) methods. The
reason for this may be that RandomAccessFile has essentially different behavior
than the other I/O types, since you can move forward and backward within a file. In any
event, it stands alone, as a direct descendant of Object.
Essentially, a RandomAccessFile works like a DataInputStream pasted
together with a DataOutputStream, along with the methods getFilePointer( )
to find out where you are in the file, seek( ) to move to a new point in the
file, and length( ) to determine the current size of the file. In addition,
the constructors require a second argument (similar to the mode argument of fopen( ) in C)
indicating whether you are just randomly reading ("r") or reading and
writing ("rw"). There's no support for write-only files, which could
suggest that RandomAccessFile might have worked well if it were inherited from DataInputStream.
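The record-oriented use of seek( ) can be sketched as follows. It assumes Java 7 or later, and RandomAccessSketch and its methods are hypothetical. Each double written by writeDouble( ) takes 8 bytes, so record i starts at offset i * 8. The overwrite( ) method returns getFilePointer( ) to show where the write left the file pointer.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

public class RandomAccessSketch {
    static final int DOUBLE_SIZE = 8; // bytes per double written by writeDouble()

    // Creates a temporary file holding count doubles: 0 * 1.414, 1 * 1.414, ...
    static File writeDoubles(int count) {
        try {
            File f = File.createTempFile("rtest", ".dat");
            f.deleteOnExit();
            try (RandomAccessFile rf = new RandomAccessFile(f, "rw")) {
                for (int i = 0; i < count; i++)
                    rf.writeDouble(i * 1.414);
            }
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Jumps straight to record 'index', replaces it, and reports the new position.
    static long overwrite(File f, int index, double value) {
        try (RandomAccessFile rf = new RandomAccessFile(f, "rw")) {
            rf.seek((long) index * DOUBLE_SIZE);
            rf.writeDouble(value);
            return rf.getFilePointer();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Opens the file read-only ("r") and reads record 'index'.
    static double readAt(File f, int index) {
        try (RandomAccessFile rf = new RandomAccessFile(f, "r")) {
            rf.seek((long) index * DOUBLE_SIZE);
            return rf.readDouble();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File f = writeDoubles(10);
        System.out.println("pointer after overwrite: " + overwrite(f, 5, 47.0001));
        for (int i = 0; i < 10; i++)
            System.out.println("Value " + i + ": " + readAt(f, i));
        System.out.println("length: " + f.length());
    }
}
```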
The seeking methods are available only in RandomAccessFile, which works for
files only. BufferedInputStream does allow you to mark( ) a position (whose value is held in a single
internal variable) and reset( )
to that position, but this is limited and not very useful.
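For comparison, this sketch shows roughly how far mark( ) and reset( ) go. There is one remembered position, and it is only guaranteed to stay valid while you read no more bytes than the limit passed to mark( ). The class name is hypothetical, and the sketch assumes Java 7 or later.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class MarkResetSketch {
    // Peeks at the first two bytes, then rewinds so they can be read again.
    static int[] peekThenRead(byte[] data) {
        try (BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            in.mark(2);            // remember this position; guaranteed for 2 more bytes
            int a = in.read();
            int b = in.read();
            in.reset();            // back to the single marked position
            int again = in.read(); // the same byte as 'a'
            return new int[] {a, b, again};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(peekThenRead(new byte[] {10, 20, 30})));
    }
}
```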
Most, if not all, of the RandomAccessFile functionality is superseded in JDK 1.4
with the nio memory-mapped files, which will be described later in this
chapter.
Although you can combine the I/O stream classes in many different ways, youll
probably just use a few combinations. The following example can be used as a basic
reference; it shows the creation and use of typical I/O configurations. Note that each
configuration begins with a commented number and title that corresponds to the heading for
the appropriate explanation that follows in the text.
//: c12:IOStreamDemo.java
// Typical I/O stream configurations.
// {RunByHand}
// {Clean: IODemo.out,Data.txt,rtest.dat}
import com.bruceeckel.simpletest.*;
import java.io.*;
public class IOStreamDemo {
  private static Test monitor = new Test();
  // Throw exceptions to console:
  public static void main(String[] args)
  throws IOException {
    // 1. Reading input by lines:
    BufferedReader in = new BufferedReader(
      new FileReader("IOStreamDemo.java"));
    String s, s2 = new String();
    while((s = in.readLine())!= null)
      s2 += s + "\n";
    in.close();
    // 1b. Reading standard input:
    BufferedReader stdin = new BufferedReader(
      new InputStreamReader(System.in));
    System.out.print("Enter a line:");
    System.out.println(stdin.readLine());
    // 2. Input from memory
    StringReader in2 = new StringReader(s2);
    int c;
    while((c = in2.read()) != -1)
      System.out.print((char)c);
    // 3. Formatted memory input
    try {
      DataInputStream in3 = new DataInputStream(
        new ByteArrayInputStream(s2.getBytes()));
      while(true)
        System.out.print((char)in3.readByte());
    } catch(EOFException e) {
      System.err.println("End of stream");
    }
    // 4. File output
    try {
      BufferedReader in4 = new BufferedReader(
        new StringReader(s2));
      PrintWriter out1 = new PrintWriter(
        new BufferedWriter(new FileWriter("IODemo.out")));
      int lineCount = 1;
      while((s = in4.readLine()) != null )
        out1.println(lineCount++ + ": " + s);
      out1.close();
    } catch(EOFException e) {
      System.err.println("End of stream");
    }
    // 5. Storing & recovering data
    try {
      DataOutputStream out2 = new DataOutputStream(
        new BufferedOutputStream(
          new FileOutputStream("Data.txt")));
      out2.writeDouble(3.14159);
      out2.writeUTF("That was pi");
      out2.writeDouble(1.41413);
      out2.writeUTF("Square root of 2");
      out2.close();
      DataInputStream in5 = new DataInputStream(
        new BufferedInputStream(
          new FileInputStream("Data.txt")));
      // Must use DataInputStream for data:
      System.out.println(in5.readDouble());
      // Only readUTF() will recover the
      // Java-UTF String properly:
      System.out.println(in5.readUTF());
      // Read the following double and String:
      System.out.println(in5.readDouble());
      System.out.println(in5.readUTF());
    } catch(EOFException e) {
      throw new RuntimeException(e);
    }
    // 6. Reading/writing random access files
    RandomAccessFile rf =
      new RandomAccessFile("rtest.dat", "rw");
    for(int i = 0; i < 10; i++)
      rf.writeDouble(i*1.414);
    rf.close();
    rf = new RandomAccessFile("rtest.dat", "rw");
    rf.seek(5*8);
    rf.writeDouble(47.0001);
    rf.close();
    rf = new RandomAccessFile("rtest.dat", "r");
    for(int i = 0; i < 10; i++)
      System.out.println("Value " + i + ": " +
        rf.readDouble());
    rf.close();
    monitor.expect("IOStreamDemo.out");
  }
} ///:~
Here are the descriptions for the numbered sections of the program:
Parts 1 through 4 demonstrate the creation and use of input streams. Part 4 also shows
the simple use of an output stream.
In part 1, to open a file for character input, you use a FileReader with a String or a File object
as the file name. For speed, you'll want that file to be buffered, so you give the
resulting reference to the constructor for a BufferedReader.
Since BufferedReader also provides the readLine( ) method, this is your
final object and the interface you read from. When you reach the end of the file, readLine( )
returns null, so that is used to break out of the while loop.
The String s2 is used to accumulate the entire contents of the file (including
newlines that must be added since readLine( ) strips them off). s2 is
then used in the later portions of this program. Finally, close( ) is called
to close the file. In principle, close( ) may also be called when finalize( )
runs, but there is no guarantee that finalization happens at all, even as the
program exits, so the only safe approach is to explicitly call close( )
for files.
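The advice to call close( ) explicitly is easiest to follow with a try/finally block, which closes the file even when readLine( ) throws. The following is a minimal sketch rather than the book's own helper: the class ReadAll and its readAll( ) method are hypothetical names, and the method accepts any Reader so that it can be tried with a StringReader in place of a FileReader.

```java
import java.io.*;

public class ReadAll {
  // Reads every line from source, re-adding the newline that
  // readLine() strips, and always closes the reader.
  public static String readAll(Reader source) throws IOException {
    BufferedReader in = new BufferedReader(source);
    try {
      StringBuffer sb = new StringBuffer();
      String s;
      while((s = in.readLine()) != null)
        sb.append(s).append("\n");
      return sb.toString();
    } finally {
      in.close(); // Runs whether or not an exception was thrown
    }
  }
  public static void main(String[] args) throws IOException {
    // With a real file you would pass new FileReader(fileName):
    System.out.print(readAll(new StringReader("first\nsecond")));
  }
}
```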
Section 1b shows how you can wrap System.in
for reading console input. System.in is an InputStream, and BufferedReader
needs a Reader argument, so InputStreamReader is brought in to perform the
adaptation.
Part 2 takes the String s2 that now contains the entire contents of the
file and uses it to create a StringReader.
Then read( ) is used to read each character one at a time and send it out to
the console. Note that read( ) returns the next character as an int (or -1 at the end of the input), and thus
it must be cast to a char to print properly.
Part 3 reads formatted data, for which you use a DataInputStream, a byte-oriented I/O class
(rather than char-oriented). Thus you must use all InputStream classes
rather than Reader classes. Of course, you can read anything (such as a file) as
bytes using InputStream classes, but here a String is used. To convert the String
to an array of bytes, which is what is appropriate for a ByteArrayInputStream, String
has a getBytes( ) method to do the job. At that point,
you have an appropriate InputStream to hand to DataInputStream.
If you read the characters from a DataInputStream one byte at a time using readByte( ),
any byte value is a legitimate result, so the return value cannot be used to detect the
end of input. Instead, you can use the available( )
method to find out how many more bytes are available. Here's an example that
shows how to read a file one byte at a time:
//: c12:TestEOF.java
// Testing for end of file while reading a byte at a time.
import java.io.*;
public class TestEOF {
  // Throw exceptions to console:
  public static void main(String[] args)
  throws IOException {
    DataInputStream in = new DataInputStream(
      new BufferedInputStream(
        new FileInputStream("TestEOF.java")));
    while(in.available() != 0)
      System.out.print((char)in.readByte());
  }
} ///:~
Note that available( ) works differently depending on what sort of medium
you're reading from; it reports the number of bytes that can be read without
blocking. With a file, this typically means the rest of the file, but with a different kind of
stream this might not be true, so use it thoughtfully.
You could also detect the end of input
in cases like these by catching an exception, as part 3 of IOStreamDemo.java does. However, the use of exceptions for control
flow is considered a misuse of that feature.
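A third approach, not used in the examples above, relies on the single-byte read( ) method that every InputStream has, including DataInputStream. Unlike readByte( ), it returns the byte as an int from 0 to 255 and reserves -1 for the end of the stream, so no byte value can be mistaken for the end. This is a sketch under that assumption; the class ReadToEnd is hypothetical, and a ByteArrayInputStream stands in for the file.

```java
import java.io.*;

public class ReadToEnd {
  // Copies every byte of the stream into a String, one byte at a time.
  // read() returns 0..255 for data and -1 only at end of stream, so
  // the -1 sentinel cannot be confused with a real byte value.
  public static String readBytes(InputStream source) throws IOException {
    DataInputStream in = new DataInputStream(new BufferedInputStream(source));
    StringBuffer sb = new StringBuffer();
    try {
      int b;
      while((b = in.read()) != -1)
        sb.append((char)b);
    } finally {
      in.close();
    }
    return sb.toString();
  }
  public static void main(String[] args) throws IOException {
    byte[] data = "abc".getBytes("US-ASCII");
    System.out.println(readBytes(new ByteArrayInputStream(data)));
  }
}
```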
Part 4 shows how to write data to a file. First, a FileWriter is created to connect to the file. You'll
virtually always want to buffer the output by wrapping it in a BufferedWriter (try removing this wrapping to see the impact
on the performance; buffering tends to dramatically increase performance of I/O
operations). Then, for the formatting, it's turned into a PrintWriter. The data file created this way is readable as an
ordinary text file.
As the lines are written to the file, line numbers are added. Note that LineNumberInputStream
is not used, because it's a silly class (deprecated since JDK 1.1) and you don't need it. As shown
here, it's trivial to keep track of your own line numbers.
When the input stream is exhausted, readLine( ) returns null.
You'll see an explicit close( ) for out1, because if you
don't call close( ) for all your output files, you might discover that
the buffers don't get flushed, so the files are incomplete.
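The effect of close( ) can be observed without touching the file system by putting a StringWriter where part 4 puts the FileWriter. In this sketch (NumberLines and numberLines( ) are hypothetical names), nothing reaches the StringWriter before close( ), because a short text fits entirely in the BufferedWriter's buffer. After close( ), the numbered lines are all there.

```java
import java.io.*;

public class NumberLines {
  // Returns {textSeenBeforeClose, textSeenAfterClose} for the
  // line-numbered version of text, written through the same
  // decorator chain used in part 4 (but onto a StringWriter).
  public static String[] numberLines(String text) throws IOException {
    StringWriter target = new StringWriter();
    BufferedReader in = new BufferedReader(new StringReader(text));
    PrintWriter out = new PrintWriter(new BufferedWriter(target));
    int lineCount = 1;
    String s;
    while((s = in.readLine()) != null)
      out.println(lineCount++ + ": " + s);
    String beforeClose = target.toString(); // Still in the buffer
    out.close(); // Flushes the BufferedWriter, then closes target
    in.close();
    return new String[] { beforeClose, target.toString() };
  }
  public static void main(String[] args) throws IOException {
    String[] result = numberLines("alpha\nbeta");
    System.out.println("Before close: \"" + result[0] + "\"");
    System.out.print("After close:\n" + result[1]);
  }
}
```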
The two primary kinds of output streams are separated by the way they write data; one
writes it for human consumption, and the other writes it to be reacquired by a DataInputStream. The RandomAccessFile stands alone,
although its data format is compatible with the DataInputStream and DataOutputStream.
A PrintWriter formats data so
that it's readable by a human. However, to output data for recovery by another
stream, you use a DataOutputStream to write the data and a DataInputStream
to recover the data, as part 5 does. Of course, these streams could be anything, but here a file is used,
buffered for both reading and writing. DataOutputStream and DataInputStream
are byte-oriented and thus require InputStreams and OutputStreams.
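Part 5's write-then-read sequence can also be tried entirely in memory. This sketch (DataRoundTrip and its methods are hypothetical) replaces FileOutputStream and FileInputStream with ByteArrayOutputStream and ByteArrayInputStream. The values must be read back in the order, and with the types, used to write them.

```java
import java.io.*;

public class DataRoundTrip {
  // Writes a double followed by a String in DataOutputStream's format.
  public static byte[] encode(double d, String label) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(
      new BufferedOutputStream(bytes));
    out.writeDouble(d);   // 8 bytes
    out.writeUTF(label);  // 2-byte length, then the encoded characters
    out.close();          // Flushes the BufferedOutputStream into bytes
    return bytes.toByteArray();
  }
  // Reads the values back in the same order and with the same types.
  public static String decode(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(
      new BufferedInputStream(new ByteArrayInputStream(data)));
    try {
      double d = in.readDouble();
      String label = in.readUTF();
      return d + ": " + label;
    } finally {
      in.close();
    }
  }
  public static void main(String[] args) throws IOException {
    byte[] data = encode(3.14159, "That was pi");
    System.out.println(data.length + " bytes");
    System.out.println(decode(data));
  }
}
```

The 21 bytes are 8 for the double, 2 for the UTF length prefix, and 11 for the ASCII characters of "That was pi".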
If you use a DataOutputStream to write the data, then Java guarantees that you
can accurately recover the data using a DataInputStream, regardless of which
platforms write and read the data. This is incredibly valuable, as anyone knows
who has spent time worrying about platform-specific data issues. That problem vanishes if
you have Java on both platforms.[63]
When using a DataOutputStream, the only reliable way to write
a String so that it can be recovered by a DataInputStream is to use UTF-8
encoding, accomplished in part 5 of the example using writeUTF( ) and readUTF( ).
UTF-8 is a way of encoding Unicode text, while Java's own char type stores every character in two bytes. If you're
working with ASCII or mostly ASCII characters (which occupy only seven bits), two bytes per character is a
tremendous waste of space and/or bandwidth, so UTF-8 encodes ASCII characters in a single
byte, and non-ASCII characters in two or three bytes. In addition, writeUTF( ) stores the length of the
encoded string, in bytes, in the first two bytes. However, writeUTF( ) and readUTF( )
use a special variation of UTF-8 for Java (which is completely described in the JDK
documentation for those methods), so if you read a string written with writeUTF( )
using a non-Java program, you must write special code in order to read the string
properly.
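The length prefix and one of the Java-specific differences can be seen by dumping the bytes writeUTF( ) produces. In this sketch (UTFBytes is a hypothetical name), "Hi" becomes a two-byte length of 2 followed by one byte per character, e with an acute accent (U+00E9) takes two bytes, and the NUL character U+0000 is written as the two bytes c0 80 rather than the single zero byte that standard UTF-8 would use.

```java
import java.io.*;

public class UTFBytes {
  // Returns the exact bytes writeUTF() produces for s.
  public static byte[] utfBytes(String s) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeUTF(s);
    out.close();
    return bytes.toByteArray();
  }
  // Formats bytes as space-separated two-digit hex values.
  public static String hex(byte[] data) {
    StringBuffer sb = new StringBuffer();
    for(int i = 0; i < data.length; i++) {
      if(i > 0) sb.append(' ');
      String h = Integer.toHexString(data[i] & 0xFF);
      if(h.length() < 2) sb.append('0');
      sb.append(h);
    }
    return sb.toString();
  }
  public static void main(String[] args) throws IOException {
    System.out.println(hex(utfBytes("Hi")));     // 00 02 48 69
    System.out.println(hex(utfBytes("\u00e9"))); // 00 02 c3 a9
    System.out.println(hex(utfBytes("\u0000"))); // 00 02 c0 80
  }
}
```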
With writeUTF( ) and readUTF( ), you can intermingle Strings
and other types of data using a DataOutputStream with the knowledge that the Strings
will be properly stored as Unicode, and will be easily recoverable with a DataInputStream.
writeDouble( ) stores
the double number to the stream, and the complementary readDouble( ) recovers it (there are similar methods for
reading and writing the other types). But for any of the reading methods to work
correctly, you must know the exact placement of the data item in the stream, since it
would be equally possible to read the stored double as a simple sequence of bytes,
or as a char, etc. So you must either have a fixed format for the data in the file,
or extra information must be stored in the file that you parse to determine where the data
is located. Note that object serialization (described later in this chapter) may be an
easier way to store and retrieve complex data structures.
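To see why the reader must know the layout, write one int and read its four bytes back as two shorts. Nothing in the stream records that the data was an int, so both reads succeed and simply return different values. This is a minimal sketch (MisreadData is a hypothetical name). Because DataOutputStream writes the high-order byte first, 65538 (0x00010002) comes back as 1 and 2:

```java
import java.io.*;

public class MisreadData {
  // Writes one int, then reinterprets its four bytes as two shorts.
  public static short[] intAsShorts(int value) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeInt(value); // 4 bytes, high-order byte first
    out.close();
    DataInputStream in = new DataInputStream(
      new ByteArrayInputStream(bytes.toByteArray()));
    short high = in.readShort(); // First two bytes
    short low = in.readShort();  // Last two bytes
    in.close();
    return new short[] { high, low };
  }
  public static void main(String[] args) throws IOException {
    short[] s = intAsShorts(65538);
    System.out.println(s[0] + " " + s[1]); // 1 2
  }
}
```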
As previously noted, the RandomAccessFile used in part 6 is almost totally isolated from the
rest of the I/O hierarchy, save for the fact that it implements the DataInput and DataOutput
interfaces. So you cannot combine it with any of the aspects of the InputStream and
OutputStream subclasses. Even though it might make sense to treat a ByteArrayInputStream
as a random-access element, you can use RandomAccessFile only to open a file. You
must assume a RandomAccessFile is properly buffered since you cannot add that.
The one option you have is the mode in the second constructor argument: you can open a RandomAccessFile
to read ("r") or read and write ("rw"). JDK 1.4 also added the modes "rws" and "rwd", which
additionally require updates to be written synchronously to the underlying storage device.
Using a RandomAccessFile is like using a combined DataInputStream and DataOutputStream
(because it implements the equivalent interfaces). In addition, you can see that seek( ) is used to move about in
the file and change one of the values.
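Because writeDouble( ) always writes 8 bytes, the i-th double starts at byte i * 8, which is the 5*8 passed to seek( ) in part 6. This sketch turns that arithmetic into a hypothetical patch( ) method. It uses a temporary file from File.createTempFile( ) instead of rtest.dat, and one RandomAccessFile opened in "rw" mode for all three steps.

```java
import java.io.*;

public class PatchRecord {
  static final int RECORD_SIZE = 8; // Bytes per double

  // Writes count doubles (i * 1.5), overwrites the one at index,
  // then reads every value back from the start of the file.
  public static double[] patch(File f, int count, int index,
      double newValue) throws IOException {
    RandomAccessFile rf = new RandomAccessFile(f, "rw");
    try {
      for(int i = 0; i < count; i++)
        rf.writeDouble(i * 1.5);
      rf.seek((long)index * RECORD_SIZE); // Jump straight to the record
      rf.writeDouble(newValue);
      double[] result = new double[count];
      rf.seek(0);
      for(int i = 0; i < count; i++)
        result[i] = rf.readDouble();
      return result;
    } finally {
      rf.close();
    }
  }
  public static void main(String[] args) throws IOException {
    File f = File.createTempFile("records", ".dat");
    f.deleteOnExit();
    double[] values = patch(f, 10, 5, 47.0001);
    for(int i = 0; i < values.length; i++)
      System.out.println("Value " + i + ": " + values[i]);
  }
}
```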
With the advent of new I/O in JDK 1.4, you may want to consider using memory-mapped
files instead of RandomAccessFile.
The PipedInputStream, PipedOutputStream,
PipedReader, and PipedWriter have been mentioned only briefly in this
chapter. This is not to suggest that they aren't useful, but their value is not
apparent until you begin to understand multithreading, since the piped streams are used to
communicate between threads. This is covered along with an example in Chapter 13.
A very common programming task is to read a file into memory, modify it, and then write
it out again. One of the problems with the Java I/O library is that it requires you to
write quite a bit of code in order to perform these common operations; there are no
basic helper functions to do them for you. What's worse, the decorators make it rather
hard to remember how to open files. Thus, it makes sense to add helper classes to your
library that will easily perform these basic tasks for you. Here's one that contains static
methods to read and write text files as a single string. In addition, you can create a TextFile
class that holds the lines of the file in an ArrayList (so you have all the ArrayList
functionality available while manipulating the file contents):
//: com:bruceeckel:util:TextFile.java
// Static functions for reading and writing text files as
// a single string, and treating a file as an ArrayList.
// {Clean: test.txt test2.txt}
package com.bruceeckel.util;
import java.io.*;
import java.util.*;
public class TextFile extends ArrayList {
  // Tools to read and write files as single strings:
  public static String
  read(String fileName) throws IOException {
    StringBuffer sb = new StringBuffer();
    BufferedReader in =
      new BufferedReader(new FileReader(fileName));
    String s;
    while((s = in.readLine()) != null) {
      sb.append(s);
      sb.append("\n");
    }
    in.close();
    return sb.toString();
  }
  public static void
  write(String fileName, String text) throws IOException {
    PrintWriter out = new PrintWriter(
      new BufferedWriter(new FileWriter(fileName)));
    out.print(text);
    out.close();
  }
  public TextFile(String fileName) throws IOException {
    super(Arrays.asList(read(fileName).split("\n")));
  }
  public void write(String fileName) throws IOException {
    PrintWriter out = new PrintWriter(
      new BufferedWriter(new FileWriter(fileName)));
    for(int i = 0; i < size(); i++)
      out.println(get(i));
    out.close();
  }
  // Simple test:
  public static void main(String[] args) throws Exception {
    String file = read("TextFile.java");
    write("test.txt", file);
    TextFile text = new TextFile("test.txt");
    text.write("test2.txt");
  }
} ///:~
All methods simply pass IOExceptions out to the caller. read( )
appends each line to a StringBuffer (for efficiency) followed by a newline, because
that is stripped out during reading. Then it returns a String containing the whole
file. write( ) opens and writes the text to the file. Both methods remember to
close( ) the file when they are done.
The constructor uses the read( ) method to turn the file into a String,
then uses String.split( ) to divide the result into lines along newline
boundaries (if you use this class a lot, you may want to rewrite this constructor to
improve efficiency). Alas, there is no corresponding join method in JDK 1.4 (String.join( ) arrived only in Java 8), so the non-static
write( ) method must write the lines out by hand.
In main( ), a basic test is performed to ensure that the methods work.
Although this is a small amount of code, using it can save a lot of time and make your
life easier, as you'll see in some of the examples later in this chapter.
The term standard I/O refers to the
Unix concept (which is reproduced in some form in Windows and many other operating
systems) of a single stream of information that is used by a program. All the
program's input can come from standard input, all its output can go to standard
output, and all of its error messages can be sent to standard error. The value
of standard I/O is that programs can easily be chained together, and one program's
standard output can become the standard input for another program. This is a powerful
tool.
Following the standard I/O model, Java has System.in, System.out, and System.err.
Throughout this book, you've seen how to write to standard output using System.out,
which is already prewrapped as a PrintStream object. System.err is likewise
a PrintStream, but System.in is a raw InputStream with no wrapping.
This means that although you can use System.out and System.err right away, System.in
must be wrapped before you can read from it.
Typically, you'll want to read input a line at a time using readLine( ),
so you'll want to wrap System.in in a BufferedReader. To do this, you
must convert System.in to a Reader using InputStreamReader.
Here's an example that simply echoes each line that you type in:
//: c12:Echo.java
// How to read from standard input.
// {RunByHand}
import java.io.*;
public class Echo {
  public static void main(String[] args)
  throws IOException {
    BufferedReader in = new BufferedReader(
      new InputStreamReader(System.in));
    String s;
    while((s = in.readLine()) != null && s.length() != 0)
      System.out.println(s);
    // An empty line or Ctrl-Z terminates the program
  }
} ///:~
The reason for the exception specification is that readLine( ) can throw an IOException. Note that System.in
should usually be buffered, as with most streams.
System.out is a PrintStream, which is an OutputStream.
PrintWriter has a constructor that takes an OutputStream as an argument. Thus,
if you want, you can convert System.out into a PrintWriter using that
constructor:
//: c12:ChangeSystemOut.java
// Turn System.out into a PrintWriter.
import com.bruceeckel.simpletest.*;
import java.io.*;
public class ChangeSystemOut {
  private static Test monitor = new Test();
  public static void main(String[] args) {
    PrintWriter out = new PrintWriter(System.out, true);
    out.println("Hello, world");
    monitor.expect(new String[] {
      "Hello, world"
    });
  }
} ///:~
It's important to use the two-argument version of the PrintWriter
constructor and to set the second argument to true in order to enable automatic
flushing; otherwise, you may not see the output.
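The difference the second argument makes can be measured by pointing the PrintWriter at a ByteArrayOutputStream instead of System.out. In this sketch (AutoFlushCheck is hypothetical), the stream is still empty immediately after println( ) when autoflush is off, because the text is sitting in the PrintWriter's internal buffer. With autoflush on, println( ) pushes it through.

```java
import java.io.*;

public class AutoFlushCheck {
  // Returns how many bytes reached the target stream right after println().
  public static int bytesAfterPrintln(boolean autoFlush) {
    ByteArrayOutputStream target = new ByteArrayOutputStream();
    PrintWriter out = new PrintWriter(target, autoFlush);
    out.println("Hello, world");
    int seen = target.size(); // Measured before any close() or flush()
    out.close();
    return seen;
  }
  public static void main(String[] args) {
    System.out.println("autoflush off: " + bytesAfterPrintln(false)); // 0
    System.out.println("autoflush on: " + bytesAfterPrintln(true));
  }
}
```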
The Java System class allows you
to redirect the standard input, output, and error I/O streams using simple static method
calls:
setIn(InputStream)
setOut(PrintStream)
setErr(PrintStream)
Redirecting output is especially useful if you suddenly start creating a large amount
of output on your screen, and it's scrolling past faster than you can read it.[64] Redirecting input is valuable for a command-line
program in which you want to test a particular user-input sequence repeatedly. Here's
a simple example that shows the use of these methods:
//: c12:Redirecting.java
// Demonstrates standard I/O redirection.
// {Clean: test.out}
import java.io.*;
public class Redirecting {
  // Throw exceptions to console:
  public static void main(String[] args)
  throws IOException {
    PrintStream console = System.out;
    BufferedInputStream in = new BufferedInputStream(
      new FileInputStream("Redirecting.java"));
    PrintStream out = new PrintStream(
      new BufferedOutputStream(
        new FileOutputStream("test.out")));
    System.setIn(in);
    System.setOut(out);
    System.setErr(out);
    BufferedReader br = new BufferedReader(
      new InputStreamReader(System.in));
    String s;
    while((s = br.readLine()) != null)
      System.out.println(s);
    out.close(); // Remember this!
    System.setOut(console);
  }
} ///:~
This program attaches standard input to a file and redirects standard output and
standard error to another file.
I/O redirection manipulates streams of bytes, not streams of characters; thus, InputStreams
and OutputStreams are used rather than Readers and Writers.
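The same setOut( ) call can capture a program's console output as a String, for example in a test. This sketch (CaptureOutput and capture( ) are hypothetical) swaps in a PrintStream on a ByteArrayOutputStream. It restores the original System.out in a finally clause, so the console comes back even if the task throws.

```java
import java.io.*;

public class CaptureOutput {
  // Runs task with System.out redirected, returning what it printed.
  public static String capture(Runnable task) {
    PrintStream console = System.out;
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    PrintStream capture = new PrintStream(buffer, true);
    System.setOut(capture);
    try {
      task.run();
    } finally {
      System.setOut(console); // Always restore the real console
      capture.close();        // Flushes anything still pending
    }
    return buffer.toString();
  }
  public static void main(String[] args) {
    String text = capture(new Runnable() {
      public void run() { System.out.print("captured"); }
    });
    System.out.println("Got: " + text);
  }
}
```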
The Java
new I/O library, introduced in JDK 1.4 in the java.nio.* packages, has
one goal: speed. In fact, the old I/O packages have been reimplemented using nio in order to take advantage of this speed increase, so you will
benefit even if you don't explicitly write code with nio. The speed increase
occurs both in file I/O, which is explored here,[65]
and in network I/O, which is covered in Thinking
in Enterprise Java.
The speed comes from using structures that are closer to the operating system's
way of performing I/O: channels and buffers. You could think of it as a coal
mine; the channel is the mine containing the seam of coal (the data), and the buffer is
the cart that you send into the mine. The cart comes back full of coal, and you get the
coal from the cart. That is, you don't interact directly with the channel; you
interact with the buffer and send the buffer into the channel. The channel either pulls
data from the buffer, or puts data into the buffer.
The only kind of buffer that communicates directly with a channel is
a ByteBuffer, that is, a buffer that holds raw bytes.
If you look at the JDK documentation for java.nio.ByteBuffer, you'll see that
it's fairly basic: You create one by telling it how much storage to allocate, and
there is a selection of methods to put and get data, in either raw byte form or as
primitive data types. But there's no way to put or get an object, or even a String.
It's fairly low-level, precisely because this makes a more efficient mapping with
most operating systems.
Three of the classes in the old I/O have been modified so that they produce
a FileChannel: FileInputStream, FileOutputStream,
and, for both reading and writing, RandomAccessFile. Notice that these are the byte
manipulation streams, in keeping with the low-level nature of nio. The Reader
and Writer character-mode classes do not produce channels, but the class
java.nio.channels.Channels has utility methods to produce Readers and Writers
from channels.
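As a sketch of those utility methods, the following wraps an InputStream as a channel with Channels.newChannel( ), then turns the channel back into a Reader with Channels.newReader( ). The second argument to newReader( ) names the character set used to decode the bytes. The class ChannelToReader is hypothetical, and a ByteArrayInputStream stands in for a file.

```java
import java.io.*;
import java.nio.channels.*;

public class ChannelToReader {
  // Reads the first line of UTF-8 text that arrives through a channel.
  public static String firstLine(byte[] utf8Data) throws IOException {
    ReadableByteChannel channel =
      Channels.newChannel(new ByteArrayInputStream(utf8Data));
    BufferedReader in = new BufferedReader(
      Channels.newReader(channel, "UTF-8"));
    try {
      return in.readLine();
    } finally {
      in.close();
    }
  }
  public static void main(String[] args) throws IOException {
    byte[] data = "first line\nsecond line".getBytes("UTF-8");
    System.out.println(firstLine(data));
  }
}
```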
Here's a simple example that exercises all three types of stream to produce
channels that are writeable, read/writeable, and readable:
//: c12:GetChannel.java
// Getting channels from streams
// {Clean: data.txt}
import java.io.*;
import java.nio.*;
import java.nio.channels.*;
public class GetChannel {
  private static final int BSIZE = 1024;
  public static void main(String[] args) throws Exception {
    // Write a file:
    FileChannel fc =
      new FileOutputStream("data.txt").getChannel();
    fc.write(ByteBuffer.wrap("Some text ".getBytes()));
    fc.close();
    // Add to the end of the file:
    fc =
      new RandomAccessFile("data.txt", "rw").getChannel();
    fc.position(fc.size()); // Move to the end
    fc.write(ByteBuffer.wrap("Some more".getBytes()));
    fc.close();
    // Read the file:
    fc = new FileInputStream("data.txt").getChannel();
    ByteBuffer buff = ByteBuffer.allocate(BSIZE);
    fc.read(buff);
    buff.flip();
    while(buff.hasRemaining())
      System.out.print((char)buff.get());
  }
} ///:~
For any of the stream classes shown here, getChannel( )
will produce a FileChannel. A channel is fairly basic: You can hand it a ByteBuffer
for reading or writing, and you can lock regions of the file for exclusive access (this
will be described later).
One way to put bytes into a ByteBuffer is to stuff them in directly using one of
the put methods, to put one or more bytes, or values of primitive types.
However, as seen here, you can also wrap an existing byte array in a ByteBuffer
using the wrap( ) method. When you do this, the underlying array is not
copied, but instead is used as the storage for the generated ByteBuffer. We say
that the ByteBuffer is backed by the array.
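The backing relationship is easy to confirm: array( ) hands back the very same array, and a write through either the array or the buffer is visible through the other. A minimal sketch (BackedBuffer is a hypothetical name):

```java
import java.nio.*;

public class BackedBuffer {
  // Returns {value read through the buffer after changing the array,
  //          value read from the array after changing the buffer}.
  // data must hold at least two bytes.
  public static int[] showSharing(byte[] data) {
    ByteBuffer bb = ByteBuffer.wrap(data);
    data[0] = 42;               // Write through the array...
    int fromBuffer = bb.get(0); // ...and read it through the buffer
    bb.put(1, (byte)7);         // Write through the buffer...
    int fromArray = data[1];    // ...and read it from the array
    return new int[] { fromBuffer, fromArray };
  }
  public static void main(String[] args) {
    byte[] data = { 1, 2, 3 };
    System.out.println(ByteBuffer.wrap(data).array() == data); // true
    int[] r = showSharing(data);
    System.out.println(r[0] + " " + r[1]); // 42 7
  }
}
```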
The data.txt file is reopened using a RandomAccessFile.
Notice that you can move the FileChannel around in the file; here, it is moved to
the end so that additional writes will be appended.
For read-only access, you must explicitly allocate a ByteBuffer using the static
allocate( ) method. The goal of nio is to
rapidly move large amounts of data, so the size of the ByteBuffer should be
significant; in fact, the 1K used here is probably quite a bit smaller than you'd
normally want to use (you'll have to experiment with your working application to find
the best size).
It's also possible to go for even more speed by using allocateDirect( )
instead of allocate( ) to produce a direct buffer that may have an
even higher coupling with the operating system. However, the overhead in such an
allocation is greater, and the actual implementation varies from one operating system to
another, so again, you must experiment with your working application to discover whether
direct buffers will buy you any advantage in speed.
Once you call read( ) to tell the FileChannel
to store bytes into the ByteBuffer, you must call flip( )
on the buffer to tell it to get ready to have its bytes extracted (yes, this seems a bit
crude, but remember that it's very low-level and is done for maximum speed). And if
we were to use the buffer for further read( ) operations, we'd also have
to call clear( ) to prepare it for each read( ).
You can see this in a simple file copying program:
//: c12:ChannelCopy.java
// Copying a file using channels and buffers
// {Args: ChannelCopy.java test.txt}
// {Clean: test.txt}
import java.io.*;
import java.nio.*;
import java.nio.channels.*;
public class ChannelCopy {
  private static final int BSIZE = 1024;
  public static void main(String[] args) throws Exception {
    if(args.length != 2) {
      System.out.println("arguments: sourcefile destfile");
      System.exit(1);
    }
    FileChannel
      in = new FileInputStream(args[0]).getChannel(),
      out = new FileOutputStream(args[1]).getChannel();
    ByteBuffer buffer = ByteBuffer.allocate(BSIZE);
    while(in.read(buffer) != -1) {
      buffer.flip(); // Prepare for writing
      out.write(buffer);
      buffer.clear(); // Prepare for reading
    }
  }
} ///:~
You can see that one FileChannel is opened for reading, and one for writing. A ByteBuffer
is allocated, and when FileChannel.read( ) returns -1 (a holdover, no
doubt, from Unix and C), it means that you've reached the end of the input. After
each read( ), which puts data into the buffer, flip( ) prepares
the buffer so that its information can be extracted by the write( ).
After the write( ), the information is still in the buffer, and clear( )
resets all the internal pointers so that it's ready to accept data during another read( ).
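The pointers that flip( ) and clear( ) reset are the buffer's position and limit, and printing them after each step makes the copy loop easier to follow. In this sketch (FlipClear is hypothetical), the put( ) calls stand in for a channel's read( ) and the get( ) calls for write( ). Note that clear( ) only resets the indexes; it does not erase the bytes.

```java
import java.nio.*;

public class FlipClear {
  // Describes a buffer's indexes as "position/limit".
  static String state(ByteBuffer bb) {
    return bb.position() + "/" + bb.limit();
  }
  public static String[] trace() {
    ByteBuffer bb = ByteBuffer.allocate(8);
    String[] steps = new String[4];
    bb.put((byte)'a').put((byte)'b').put((byte)'c'); // Like read()
    steps[0] = state(bb); // 3/8: position advanced past the data
    bb.flip();            // limit = old position, position = 0
    steps[1] = state(bb); // 0/3
    while(bb.hasRemaining())
      bb.get();           // Like write(): drains up to the limit
    steps[2] = state(bb); // 3/3
    bb.clear();           // position = 0, limit = capacity
    steps[3] = state(bb); // 0/8
    return steps;
  }
  public static void main(String[] args) {
    String[] labels = { "after put", "after flip", "after get", "after clear" };
    String[] steps = trace();
    for(int i = 0; i < steps.length; i++)
      System.out.println(labels[i] + ": " + steps[i]);
  }
}
```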
ChannelCopy.java is not the ideal way to handle this kind of operation, however.
Special methods transferTo( ) and transferFrom( )
allow you to connect one channel directly to another:
//: c12:TransferTo.java
// Using transferTo() between channels
// {Args: TransferTo.java TransferTo.txt}
// {Clean: TransferTo.txt}
import java.io.*;
import java.nio.*;
import java.nio.channels.*;
public class TransferTo {
  public static void main(String[] args) throws Exception {
    if(args.length != 2) {
      System.out.println("arguments: sourcefile destfile");
      System.exit(1);
    }
    FileChannel
      in = new FileInputStream(args[0]).getChannel(),
      out = new FileOutputStream(args[1]).getChannel();
    in.transferTo(0, in.size(), out);
    // Or:
    // out.transferFrom(in, 0, in.size());
  }
} ///:~
You won't do this kind of thing very often, but it's good to know about.
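One caveat: according to the JDK documentation, a single transferTo( ) call may move fewer bytes than requested, and it returns the number actually transferred. A more defensive copy loops until the whole file has been moved. This sketch (TransferAll and copy( ) are hypothetical) does that and closes both channels in finally clauses:

```java
import java.io.*;
import java.nio.channels.*;

public class TransferAll {
  // Copies source to dest with transferTo(), looping because a single
  // call is allowed to move only part of the requested range.
  public static long copy(File source, File dest) throws IOException {
    FileChannel in = new FileInputStream(source).getChannel();
    try {
      FileChannel out = new FileOutputStream(dest).getChannel();
      try {
        long size = in.size();
        long position = 0;
        while(position < size)
          position += in.transferTo(position, size - position, out);
        return position;
      } finally {
        out.close();
      }
    } finally {
      in.close();
    }
  }
  public static void main(String[] args) throws IOException {
    if(args.length != 2) {
      System.out.println("arguments: sourcefile destfile");
      System.exit(1);
    }
    long n = copy(new File(args[0]), new File(args[1]));
    System.out.println(n + " bytes copied");
  }
}
```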
If you look back at GetChannel.java, you'll notice that, to print the
information in the file, we are pulling the data out one byte at a time and casting
each byte to a char. This seems a bit primitive. If you look at the java.nio.CharBuffer
class, you'll see that it has a toString( ) method that says:
"Returns a string containing the characters in this buffer." Since a ByteBuffer
can be viewed as a CharBuffer with the asCharBuffer( )
method, why not use that? As you can see from the first line in the expect( )
statement below, this doesn't work out:
//: c12:BufferToText.java
// Converting text to and from ByteBuffers
// {Clean: data2.txt}
import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.nio.charset.*;
import com.bruceeckel.simpletest.*;
public class BufferToText {
  private static Test monitor = new Test();
  private static final int BSIZE = 1024;
  public static void main(String[] args) throws Exception {
    FileChannel fc =
      new FileOutputStream("data2.txt").getChannel();
    fc.write(ByteBuffer.wrap("Some text".getBytes()));
    fc.close();
    fc = new FileInputStream("data2.txt").getChannel();
    ByteBuffer buff = ByteBuffer.allocate(BSIZE);
    fc.read(buff);
    buff.flip();
    // Doesn't work:
    System.out.println(buff.asCharBuffer());
    // Decode using this system's default Charset:
    buff.rewind();
    String encoding = System.getProperty("file.encoding");
    System.out.println("Decoded using " + encoding + ": "
      + Charset.forName(encoding).decode(buff));
    // Or, we could encode with something that will print:
    fc = new FileOutputStream("data2.txt").getChannel();
    fc.write(ByteBuffer.wrap(
      "Some text".getBytes("UTF-16BE")));
    fc.close();
    // Now try reading again:
    fc = new FileInputStream("data2.txt").getChannel();
    buff.clear();
    fc.read(buff);
    buff.flip();
    System.out.println(buff.asCharBuffer());
    // Use a CharBuffer to write through:
    fc = new FileOutputStream("data2.txt").getChannel();
    buff = ByteBuffer.allocate(24); // More than needed
    buff.asCharBuffer().put("Some text");
    fc.write(buff);
    fc.close();
    // Read and display:
    fc = new FileInputStream("data2.txt").getChannel();
    buff.clear();
    fc.read(buff);
    buff.flip();
    System.out.println(buff.asCharBuffer());
    monitor.expect(new String[] {
      "????",
      "%% Decoded using [A-Za-z0-9_\\-]+: Some text",
      "Some text",
      "Some text\0\0\0"
    });
  }
} ///:~
The buffer contains plain bytes, and to turn these into characters we must either encode
them as we put them in (so that they will be meaningful when they come out) or decode
them as they come out of the buffer. This can be accomplished using the java.nio.charset.Charset
class, which provides tools for encoding into and decoding from many different character sets:
//: c12:AvailableCharSets.java
// Displays Charsets and aliases
import java.nio.charset.*;
import java.util.*;
import com.bruceeckel.simpletest.*;
public class AvailableCharSets {
  private static Test monitor = new Test();
  public static void main(String[] args) {
    Map charSets = Charset.availableCharsets();
    Iterator it = charSets.keySet().iterator();
    while(it.hasNext()) {
      String csName = (String)it.next();
      System.out.print(csName);
      Iterator aliases = ((Charset)charSets.get(csName))
        .aliases().iterator();
      if(aliases.hasNext())
        System.out.print(": ");
      while(aliases.hasNext()) {
        System.out.print(aliases.next());
        if(aliases.hasNext())
          System.out.print(", ");
      }
      System.out.println();
    }
    monitor.expect(new String[] {
      "Big5: csBig5",
      "Big5-HKSCS: big5-hkscs, Big5_HKSCS, big5hkscs",
      "EUC-CN",
      "EUC-JP: eucjis, x-eucjp, csEUCPkdFmtjapanese, " +
        "eucjp, Extended_UNIX_Code_Packed_Format_for" +
        "_Japanese, x-euc-jp, euc_jp",
      "euc-jp-linux: euc_jp_linux",
      "EUC-KR: ksc5601, 5601, ksc5601_1987, ksc_5601, " +
        "ksc5601-1987, euc_kr, ks_c_5601-1987, " +
        "euckr, csEUCKR",
      "EUC-TW: cns11643, euc_tw, euctw",
      "GB18030: gb18030-2000",
      "GBK: GBK",
      "ISCII91: iscii, ST_SEV_358-88, iso-ir-153, " +
        "csISO153GOST1976874",
      "ISO-2022-CN-CNS: ISO2022CN_CNS",
      "ISO-2022-CN-GB: ISO2022CN_GB",
      "ISO-2022-KR: ISO2022KR, csISO2022KR",
      "ISO-8859-1: iso-ir-100, 8859_1, ISO_8859-1, " +
        "ISO8859_1, 819, csISOLatin1, IBM-819, " +
        "ISO_8859-1:1987, latin1, cp819, ISO8859-1, " +
        "IBM819, ISO_8859_1, l1",
      "ISO-8859-13",
      "ISO-8859-15: 8859_15, csISOlatin9, IBM923, cp923," +
        " 923, L9, IBM-923, ISO8859-15, LATIN9, " +
        "ISO_8859-15, LATIN0, csISOlatin0, " +
        "ISO8859_15_FDIS, ISO-8859-15",
      "ISO-8859-2", "ISO-8859-3", "ISO-8859-4",
      "ISO-8859-5", "ISO-8859-6", "ISO-8859-7",
      "ISO-8859-8", "ISO-8859-9",
      "JIS0201: X0201, JIS_X0201, csHalfWidthKatakana",
      "JIS0208: JIS_C6626-1983, csISO87JISX0208, x0208, " +
        "JIS_X0208-1983, iso-ir-87",
      "JIS0212: jis_x0212-1990, x0212, iso-ir-159, " +
        "csISO159JISC02121990",
      "Johab: ms1361, ksc5601_1992, ksc5601-1992",
      "KOI8-R",
      "Shift_JIS: shift-jis, x-sjis, ms_kanji, " +
        "shift_jis, csShiftJIS, sjis, pck",
      "TIS-620",
      "US-ASCII: IBM367, ISO646-US, ANSI_X3.4-1986, " +
        "cp367, ASCII, iso_646.irv:1983, 646, us, iso-ir-6,"+
        " csASCII, ANSI_X3.4-1968, ISO_646.irv:1991",
      "UTF-16: UTF_16",
      "UTF-16BE: X-UTF-16BE, UTF_16BE, ISO-10646-UCS-2",
      "UTF-16LE: UTF_16LE, X-UTF-16LE",
      "UTF-8: UTF8", "windows-1250", "windows-1251",
      "windows-1252: cp1252",
      "windows-1253", "windows-1254", "windows-1255",
      "windows-1256", "windows-1257", "windows-1258",
      "windows-936: ms936, ms_936",
      "windows-949: ms_949, ms949", "windows-950: ms950",
    });
  }
} ///:~
So, returning to BufferToText.java, if you rewind( )
the buffer (to go back to the beginning of the data) and then use that platform's
default character set to decode( ) the data, the
resulting CharBuffer will print to the console just fine. To discover the default
character set, use System.getProperty("file.encoding"), which produces
the string that names the character set. Passing this to Charset.forName( )
produces the Charset object that can be used to decode the string.
Another alternative is to encode the text using a
character set that will result in something printable when the file is read, as you see in
the third part of BufferToText.java. Here, getBytes("UTF-16BE") is used to encode the text that is written into
the file, and when it is read, all you have to do is convert it to a CharBuffer,
and it produces the expected text.
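Charset can also perform both conversions directly: encode( ) turns a String into a ByteBuffer, and decode( ) turns a ByteBuffer back into a CharBuffer. This sketch (CharsetRoundTrip is hypothetical) round-trips text through a named character set. It also shows why the UTF-16BE bytes in BufferToText.java can be read with asCharBuffer( ). A ByteBuffer is big-endian unless told otherwise, so its char view reads each pair of bytes as one big-endian char, which matches the UTF-16BE layout for characters like these.

```java
import java.nio.*;
import java.nio.charset.*;

public class CharsetRoundTrip {
  // Encodes text with the named Charset, then decodes it again.
  public static String roundTrip(String text, String charsetName) {
    Charset cs = Charset.forName(charsetName);
    ByteBuffer bytes = cs.encode(text); // chars -> bytes
    return cs.decode(bytes).toString(); // bytes -> chars
  }
  // Views UTF-16BE bytes as chars without decoding them.
  public static String viewAsChars(String text) {
    ByteBuffer bytes = Charset.forName("UTF-16BE").encode(text);
    return bytes.asCharBuffer().toString();
  }
  public static void main(String[] args) {
    System.out.println(roundTrip("Some text", "UTF-8"));
    System.out.println(viewAsChars("Some text"));
  }
}
```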
Finally, in BufferToText.java, you see what happens if you write to the ByteBuffer through a CharBuffer
(you'll learn more about this later). Note that 24 bytes are allocated for the ByteBuffer.
Since each char requires two bytes, this is enough for 12 chars, but
"Some text" only has 9. The remaining zero bytes still appear in the
representation of the CharBuffer produced by its toString( ), as you
can see in the output.
Although a ByteBuffer only holds bytes, it contains methods to produce each of
the different types of primitive values from the bytes it contains. This example shows the
insertion and extraction of various values using these methods:
//: c12:GetData.java
// Getting different representations from a ByteBuffer
import java.nio.*;
import com.bruceeckel.simpletest.*;
public class GetData {
  private static Test monitor = new Test();
  private static final int BSIZE = 1024;
  public static void main(String[] args) {
    ByteBuffer bb = ByteBuffer.allocate(BSIZE);
    // Allocation automatically zeroes the ByteBuffer:
    int i = 0;
    while(i++ < bb.limit())
      if(bb.get() != 0)
        System.out.println("nonzero");
    System.out.println("i = " + i);
    bb.rewind();
    // Store and read a char array:
    bb.asCharBuffer().put("Howdy!");
    char c;
    while((c = bb.getChar()) != 0)
      System.out.print(c + " ");
    System.out.println();
    bb.rewind();
    // Store and read a short:
    bb.asShortBuffer().put((short)471142);
    System.out.println(bb.getShort());
    bb.rewind();
    // Store and read an int:
    bb.asIntBuffer().put(99471142);
    System.out.println(bb.getInt());
    bb.rewind();
    // Store and read a long:
    bb.asLongBuffer().put(99471142);
    System.out.println(bb.getLong());
    bb.rewind();
    // Store and read a float:
    bb.asFloatBuffer().put(99471142);
    System.out.println(bb.getFloat());
    bb.rewind();
    // Store and read a double:
    bb.asDoubleBuffer().put(99471142);
    System.out.println(bb.getDouble());
    bb.rewind();
    monitor.expect(new String[] {
      "i = 1025",
      "H o w d y ! ",
      "12390", // Truncation changes the value
      "99471142",
      "99471142",
      "9.9471144E7",
      "9.9471142E7"
    });
  }
} ///:~
After a ByteBuffer is allocated, its values are checked to see whether buffer
allocation automatically zeroes the contents, and it does. All 1,024 values are
checked (up to the limit( ) of the buffer), and all are zero.
The easiest way to insert primitive values into a ByteBuffer is to get the
appropriate view on that buffer using asCharBuffer( ), asShortBuffer( ),
etc., and then to use that view's put( ) method. You can see this is the
process used for each of the primitive data types. The only one of these that is a little
odd is the put( ) for the ShortBuffer, which requires a cast (note that
the cast truncates and changes the resulting value). None of the other view buffers
requires a cast in its put( ) method.
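You can check the two surprising values in GetData's output by hand. The short cast keeps only the low 16 bits of 471142. The float view can store an integer of this size only to the nearest multiple of 8, because a float has a 24-bit significand. The double view has enough precision to hold 99471142 exactly:

```latex
\begin{aligned}
471142 &= 7 \times 65536 + 12390 \;\Rightarrow\; (\text{short})\,471142 = 12390 \quad (12390 < 2^{15}) \\
2^{26} \le 99471142 < 2^{27} &\;\Rightarrow\; \text{float spacing} = 2^{26-23} = 8 \\
99471142 / 8 = 12433892.75 &\;\Rightarrow\; 12433893 \times 8 = 99471144 = 9.9471144 \times 10^{7}
\end{aligned}
```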
A view buffer allows you to look at an underlying ByteBuffer through
the window of a particular primitive type. The ByteBuffer is still the actual
storage that's backing the view, so any changes you make to the view are
reflected in modifications to the data in the ByteBuffer. As seen in the previous
example, this allows you to conveniently insert primitive types into a ByteBuffer.
A view also allows you to read primitive values from a ByteBuffer, either one at a
time (as ByteBuffer allows) or in batches (into arrays). Here's an example
that manipulates ints in a ByteBuffer via an IntBuffer:
//: c12:IntBufferDemo.java
// Manipulating ints in a ByteBuffer with an IntBuffer
import java.nio.*;
import com.bruceeckel.simpletest.*;
import com.bruceeckel.util.*;
public class IntBufferDemo {
private static Test monitor = new Test();
private static final int BSIZE = 1024;
public static void main(String[] args) {
ByteBuffer bb = ByteBuffer.allocate(BSIZE);
IntBuffer ib = bb.asIntBuffer();
// Store an array of int:
ib.put(new int[] { 11, 42, 47, 99, 143, 811, 1016 });
// Absolute location read and write:
System.out.println(ib.get(3));
ib.put(3, 1811);
ib.rewind();
while(ib.hasRemaining()) {
int i = ib.get();
if(i == 0) break; // Else we'll get the entire buffer
System.out.println(i);
}
monitor.expect(new String[] {
"99",
"11",
"42",
"47",
"1811",
"143",
"811",
"1016"
});
}
} ///:~
The overloaded put( ) method is first used to store an array of int.
The following get( ) and put( ) method calls directly access an int
location in the underlying ByteBuffer. Note that these absolute location accesses
are available for primitive types by talking directly to a ByteBuffer, as well.
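One detail matters when you mix the two kinds of absolute access. An IntBuffer view indexes in ints, but the ByteBuffer's own methods such as getInt(int) index in bytes. The sketch below writes through int index 3 of a view and reads the same value back at byte offset 12. The class and method names are illustrative and do not come from the book.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

// Illustrative sketch: view-buffer indexes count elements, ByteBuffer indexes count bytes.
class AbsoluteAccess {
    static final int BSIZE = 1024;

    // Writes 1811 at int index 3 through a view, then reads it directly from the ByteBuffer.
    static int readBackViaBytes() {
        ByteBuffer bb = ByteBuffer.allocate(BSIZE);
        IntBuffer ib = bb.asIntBuffer();
        ib.put(3, 1811);             // absolute put on the view: int index 3
        return bb.getInt(3 * 4);     // same storage: 4 bytes per int, so byte offset 12
    }

    // Absolute accesses leave the buffer's position unchanged.
    static int positionAfterAbsoluteAccess() {
        ByteBuffer bb = ByteBuffer.allocate(BSIZE);
        bb.putInt(8, 42);
        bb.getInt(8);
        return bb.position();
    }

    public static void main(String[] args) {
        System.out.println(readBackViaBytes());
        System.out.println(positionAfterAbsoluteAccess());
    }
}
```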
Once the underlying ByteBuffer is filled with ints or some other
primitive type via a view buffer, then that ByteBuffer can be written directly to a
channel. You can just as easily read from a channel and use a view buffer to convert
everything to a particular type of primitive. Here's an example that interprets the
same sequence of bytes as short, int, float, long, and double
by producing different view buffers on the same ByteBuffer:
//: c12:ViewBuffers.java
import java.nio.*;
import com.bruceeckel.simpletest.*;
public class ViewBuffers {
private static Test monitor = new Test();
public static void main(String[] args) {
ByteBuffer bb = ByteBuffer.wrap(
new byte[]{ 0, 0, 0, 0, 0, 0, 0, 'a' });
bb.rewind();
System.out.println("Byte Buffer");
while(bb.hasRemaining())
System.out.println(bb.position()+ " -> " + bb.get());
CharBuffer cb =
((ByteBuffer)bb.rewind()).asCharBuffer();
System.out.println("Char Buffer");
while(cb.hasRemaining())
System.out.println(cb.position()+ " -> " + cb.get());
FloatBuffer fb =
((ByteBuffer)bb.rewind()).asFloatBuffer();
System.out.println("Float Buffer");
while(fb.hasRemaining())
System.out.println(fb.position()+ " -> " + fb.get());
IntBuffer ib =
((ByteBuffer)bb.rewind()).asIntBuffer();
System.out.println("Int Buffer");
while(ib.hasRemaining())
System.out.println(ib.position()+ " -> " + ib.get());
LongBuffer lb =
((ByteBuffer)bb.rewind()).asLongBuffer();
System.out.println("Long Buffer");
while(lb.hasRemaining())
System.out.println(lb.position()+ " -> " + lb.get());
ShortBuffer sb =
((ByteBuffer)bb.rewind()).asShortBuffer();
System.out.println("Short Buffer");
while(sb.hasRemaining())
System.out.println(sb.position()+ " -> " + sb.get());
DoubleBuffer db =
((ByteBuffer)bb.rewind()).asDoubleBuffer();
System.out.println("Double Buffer");
while(db.hasRemaining())
System.out.println(db.position()+ " -> " + db.get());
monitor.expect(new String[] {
"Byte Buffer",
"0 -> 0",
"1 -> 0",
"2 -> 0",
"3 -> 0",
"4 -> 0",
"5 -> 0",
"6 -> 0",
"7 -> 97",
"Char Buffer",
"0 -> \0",
"1 -> \0",
"2 -> \0",
"3 -> a",
"Float Buffer",
"0 -> 0.0",
"1 -> 1.36E-43",
"Int Buffer",
"0 -> 0",
"1 -> 97",
"Long Buffer",
"0 -> 97",
"Short Buffer",
"0 -> 0",
"1 -> 0",
"2 -> 0",
"3 -> 97",
"Double Buffer",
"0 -> 4.8E-322"
});
}
} ///:~
The ByteBuffer is produced by wrapping an eight-byte array, which is
then displayed via view buffers of all the different primitive types. You can see in the
following diagram the way the data appears differently when read from the different types
of buffers:

This corresponds to the output from the program.
Different machines may use different byte-ordering approaches to store data. Big
endian places the most significant byte in the lowest memory address, and
little endian places the most significant byte in the highest memory address.
When storing a quantity that is greater than one byte, like int, float, etc., you
may need to consider the byte ordering. A ByteBuffer stores data in big endian
form by default, and data sent over a network conventionally uses big endian order
(often called network byte order). You can change the endianness of a ByteBuffer
using order( ) with an argument of ByteOrder.BIG_ENDIAN or ByteOrder.LITTLE_ENDIAN.
Consider a ByteBuffer containing the two bytes 00000000 and 01100001 (the values 0 and 97), in that order:

If you read the data as a short (ByteBuffer.asShortBuffer( )), you
will get the number 97 (00000000 01100001), but if you change to little endian, you will
get the number 24832 (01100001 00000000).
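You can check the two-byte example directly. The sketch below wraps the bytes 0 and 97 and reads them as a short under each byte order. A view buffer uses the byte order that its ByteBuffer has at the moment the view is created, so order( ) is called before asShortBuffer( ). The class name is illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch: the same two bytes read as a short under each byte order.
class ShortEndian {
    static short readShort(ByteOrder order) {
        ByteBuffer bb = ByteBuffer.wrap(new byte[] { 0, 97 });
        bb.order(order);                  // must be set before the view is created
        return bb.asShortBuffer().get();  // the view inherits the current order
    }

    public static void main(String[] args) {
        System.out.println(readShort(ByteOrder.BIG_ENDIAN));    // 97
        System.out.println(readShort(ByteOrder.LITTLE_ENDIAN)); // 24832
    }
}
```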
Here's an example that shows how byte ordering is changed in characters depending
on the endian setting:
//: c12:Endians.java
// Endian differences and data storage.
import java.nio.*;
import com.bruceeckel.simpletest.*;
import com.bruceeckel.util.*;
public class Endians {
private static Test monitor = new Test();
public static void main(String[] args) {
ByteBuffer bb = ByteBuffer.wrap(new byte[12]);
bb.asCharBuffer().put("abcdef");
System.out.println(Arrays2.toString(bb.array()));
bb.rewind();
bb.order(ByteOrder.BIG_ENDIAN);
bb.asCharBuffer().put("abcdef");
System.out.println(Arrays2.toString(bb.array()));
bb.rewind();
bb.order(ByteOrder.LITTLE_ENDIAN);
bb.asCharBuffer().put("abcdef");
System.out.println(Arrays2.toString(bb.array()));
monitor.expect(new String[]{
"[0, 97, 0, 98, 0, 99, 0, 100, 0, 101, 0, 102]",
"[0, 97, 0, 98, 0, 99, 0, 100, 0, 101, 0, 102]",
"[97, 0, 98, 0, 99, 0, 100, 0, 101, 0, 102, 0]"
});
}
} ///:~
The ByteBuffer is given enough space (12 bytes) to hold the six chars of "abcdef"
as an external array, so that the array( ) method can be called to display
the underlying bytes. The array( ) method is optional, and you can
only call it on a buffer that is backed by an array; otherwise, you'll get an UnsupportedOperationException.
The string "abcdef" is inserted into the ByteBuffer via a CharBuffer view.
When the underlying bytes are displayed, you can see that the default ordering is the same
as the subsequent big endian order, whereas the little endian order swaps the bytes.
The diagram here illustrates the relationships between the nio classes, so that
you can see how to move and convert data. For example, if you wish to write a byte
array to a file, then you wrap the byte array using the ByteBuffer.wrap( )
method, open a channel on the FileOutputStream using the getChannel( ) method,
and then write data into the FileChannel from this ByteBuffer.

Note that ByteBuffer is the only way to move data in and out of channels, and
that you can only create a standalone primitive-typed buffer, or get one from a ByteBuffer
using an "as" method. That is, you cannot convert a primitive-typed buffer to
a ByteBuffer. However, since you are able to move primitive data into and out of a ByteBuffer
via a view buffer, this is not really a restriction.
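Here is a minimal sketch of the route described above for writing a byte array to a file: wrap the array with ByteBuffer.wrap( ), get a FileChannel from a FileOutputStream with getChannel( ), and call write( ). The class and file names are illustrative. The loop is a defensive idiom in case write( ) does not drain the buffer in one call.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: write a byte array to a file through a FileChannel.
class WriteBytes {
    // Returns the number of bytes written to fileName.
    static int write(String fileName, byte[] data) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(fileName)) {
            FileChannel fc = fos.getChannel();       // closed along with fos
            ByteBuffer buff = ByteBuffer.wrap(data); // position 0, limit data.length
            int total = 0;
            while (buff.hasRemaining())              // keep writing until drained
                total += fc.write(buff);
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Some text".getBytes(StandardCharsets.US_ASCII);
        System.out.println(write("data.bin", data) + " bytes written");
    }
}
```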
A Buffer consists of data and four indexes to access and manipulate this data
efficiently: mark, position, limit, and capacity. There are
methods to set and reset these indexes and to query their values.
| Method | Description |
|---|---|
| capacity( ) | Returns the buffer's capacity. |
| clear( ) | Clears the buffer: sets position to zero and limit to capacity (the data itself is not erased). You call this method to overwrite an existing buffer. |
| flip( ) | Sets limit to position and position to zero. This method is used to prepare the buffer for a read after data has been written into it. |
| limit( ) | Returns the value of limit. |
| limit(int lim) | Sets the value of limit. |
| mark( ) | Sets mark at position. |
| position( ) | Returns the value of position. |
| position(int pos) | Sets the value of position. |
| remaining( ) | Returns (limit - position). |
| hasRemaining( ) | Returns true if there are any elements between position and limit. |
Methods that insert and extract data from the buffer update these
indexes to reflect the changes.
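The sketch below traces how a relative put( ), then flip( ), then clear( ) move position and limit on a small CharBuffer. It records the indexes as "position/limit" after each step. The class name is illustrative.

```java
import java.nio.CharBuffer;

// Illustrative sketch: how put( ), flip( ), and clear( ) move position and limit.
class FlipDemo {
    static String trace() {
        CharBuffer cb = CharBuffer.allocate(8);   // position 0, limit 8, capacity 8
        StringBuilder sb = new StringBuilder();
        cb.put("abc");                            // relative put advances position to 3
        sb.append("put: ").append(cb.position()).append('/').append(cb.limit());
        cb.flip();                                // limit = 3, position = 0
        sb.append(", flip: ").append(cb.position()).append('/').append(cb.limit());
        sb.append(", remaining: ").append(cb.remaining());
        sb.append(", contents: ").append(cb.toString()); // chars from position to limit
        cb.clear();                               // position = 0, limit = capacity; data stays
        sb.append(", clear: ").append(cb.position()).append('/').append(cb.limit());
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(trace());
    }
}
```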
This example uses a very simple algorithm (swapping adjacent characters) to scramble
and unscramble characters in a CharBuffer:
//: c12:UsingBuffers.java
import java.nio.*;
import com.bruceeckel.simpletest.*;
public class UsingBuffers {
private static Test monitor = new Test();
private static void symmetricScramble(CharBuffer buffer){
while(buffer.hasRemaining()) {
buffer.mark();
char c1 = buffer.get();
char c2 = buffer.get();
buffer.reset();
buffer.put(c2).put(c1);
}
}
public static void main(String[] args) {
char[] data = "UsingBuffers".toCharArray();
ByteBuffer bb = ByteBuffer.allocate(data.length * 2);
CharBuffer cb = bb.asCharBuffer();
cb.put(data);
System.out.println(cb.rewind());
symmetricScramble(cb);
System.out.println(cb.rewind());
symmetricScramble(cb);
System.out.println(cb.rewind());
monitor.expect(new String[] {
"UsingBuffers",
"sUniBgfuefsr",
"UsingBuffers"
});
}
} ///:~
Although you could produce a CharBuffer directly by calling wrap( ) with
a char array, an underlying ByteBuffer is allocated instead, and a CharBuffer
is produced as a view on the ByteBuffer. This emphasizes the fact that the
goal is always to manipulate a ByteBuffer, since that is what interacts with a
channel.
Here's what the buffer looks like after the put( ) and the rewind( ) that follows it in main( ):

The position points to the first element in the buffer, and the capacity
and limit both point just past the last element (index 12).
In symmetricScramble( ), the while loop iterates until position is
equal to limit. The position of the buffer changes when a relative get( )
or put( ) method is called on it. You can also call absolute get( )
and put( ) methods that include an index argument, which is the location where
the get( ) or put( ) takes place. These methods do not modify the
value of the buffer's position.
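The difference between relative and absolute access shows up in a few calls. In this illustrative sketch, a relative get( ) moves position from 0 to 1. The absolute get(int) and put(int, char) calls read and write other indexes without moving it.

```java
import java.nio.CharBuffer;

// Illustrative sketch: relative get( ) moves position; absolute get/put do not.
class AbsoluteVsRelative {
    static String demo() {
        CharBuffer cb = CharBuffer.wrap("abcd".toCharArray());
        char first = cb.get();    // relative: reads index 0, position becomes 1
        char last = cb.get(3);    // absolute: reads index 3, position stays 1
        cb.put(0, 'z');           // absolute write at index 0, position still 1
        return "" + first + last + " position=" + cb.position()
            + " array=" + new String(cb.array());
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```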
When control enters the while loop, the value of mark is set by the mark( )
call. The state of the buffer is then:

The two relative get( ) calls save the value of the first two characters in
variables c1 and c2. After these two calls, the buffer looks like this:

To perform the swap, we need to write c2 at position = 0 and c1 at
position = 1. We can either use the absolute put method to achieve this, or set the
value of position to mark, which is what reset( ) does:

The two put( ) methods write c2 and then c1:

During the next iteration of the loop, mark is set to the current value of position:

The process continues until the entire buffer is traversed. At the end of the while
loop, position is at the end of the buffer. If you print the buffer, only the
characters between the position and limit are printed. Thus, if you want to
show the entire contents of the buffer you must set position to the start of the
buffer using rewind( ). Here is the state of the buffer after the rewind( )
call (the value of mark becomes undefined):

When the function symmetricScramble( ) is called again, the CharBuffer undergoes
the same process and is restored to its original state.
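For the swap step, the text also mentions using absolute put instead of mark( ) and reset( ). This sketch takes that route. It is an illustrative alternative, not the book's version, and its names are hypothetical. Because position never moves, toString( ) shows the whole buffer without a rewind( ). With an odd number of chars, the last char is left in place, whereas the mark/reset version would run past the end.

```java
import java.nio.CharBuffer;

// Illustrative alternative to symmetricScramble( ) using only absolute get/put.
class AbsoluteScramble {
    static void symmetricScramble(CharBuffer buffer) {
        for (int i = buffer.position(); i + 1 < buffer.limit(); i += 2) {
            char c1 = buffer.get(i);           // absolute reads: position unchanged
            buffer.put(i, buffer.get(i + 1));  // write c2 at i
            buffer.put(i + 1, c1);             // write c1 at i + 1
        }
    }

    static String scramble(String s) {
        CharBuffer cb = CharBuffer.wrap(s.toCharArray());
        symmetricScramble(cb);
        return cb.toString();  // position is still 0, so this covers every char
    }

    public static void main(String[] args) {
        String once = scramble("UsingBuffers");
        System.out.println(once);
        System.out.println(scramble(once));
    }
}
```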
Memory-mapped files allow you to create
and modify files that are too big to bring into memory. With a memory-mapped file, you can
pretend that the entire file is in memory and that you can access it by simply treating it
as a very large array. This approach greatly simplifies the code you write in order to
modify the file. Here's a small example:
//: c12:LargeMappedFiles.java
// Creating a very large file using mapping.
// {RunByHand}
// {Clean: test.dat}
import java.io.*;
import java.nio.*;
import java.nio.channels.*;
public class LargeMappedFiles {
static int length = 0x8000000; // 128 MB
public static void main(String[] args) throws Exception {
MappedByteBuffer out =
new RandomAccessFile("test.dat", "rw").getChannel()
.map(FileChannel.MapMode.READ_WRITE, 0, length);
for(int i = 0; i < length; i++)
out.put((byte)'x');
System.out.println("Finished writing");
for(int i = length/2; i < length/2 + 6; i++)
System.out.print((char)out.get(i));
}
} ///:~
To do both writing and reading, we start with a RandomAccessFile, get a channel
for that file, and then call map( ) to produce a MappedByteBuffer,
which is a particular kind of direct buffer. Note that you must specify the starting point
and the length of the region that you want to map in the file; this means that you have
the option to map smaller regions of a large file.
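Here is a minimal sketch of mapping only part of a file. It builds a small zero-filled file, maps size bytes starting at offset, and fills just that region. It then reads the file back with ordinary RandomAccessFile I/O to count how many bytes changed. The class name, method name and sizes are illustrative.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Illustrative sketch: map a region of a file rather than the whole file.
class MapRegion {
    // Returns how many 'y' bytes the file holds after filling the mapped region.
    static int fillRegion(String fileName, int fileLength, long offset, int size)
            throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "rw")) {
            raf.setLength(0);
            raf.write(new byte[fileLength]);        // explicit zero-filled contents
            FileChannel fc = raf.getChannel();
            MappedByteBuffer region =
                fc.map(FileChannel.MapMode.READ_WRITE, offset, size);
            while (region.hasRemaining())
                region.put((byte) 'y');             // touches only the mapped bytes
            region.force();                         // flush the changes to the file
            byte[] all = new byte[fileLength];
            raf.seek(0);
            raf.readFully(all);                     // read back through stream-style I/O
            int count = 0;
            for (byte b : all)
                if (b == 'y') count++;
            return count;
        }
    }
}
```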
MappedByteBuffer is inherited from ByteBuffer, so
it has all of ByteBuffer's methods. Only the very simple uses of put( )
and get( ) are shown here, but you can also use methods such as asCharBuffer( ).
The file created with the preceding program is 128 MB long, which may be more
memory than your OS will let the program use at once. The file appears to be accessible all at once because
only portions of it are brought into memory, and other parts are swapped out. This way a
very large file (up to 2 GB per mapped region) can easily be modified. Note that the file-mapping facilities
of the underlying operating system are used to maximize performance.
Although the performance of old stream I/O has been improved by
implementing it with nio, mapped file access tends to be dramatically faster. This
program does a simple performance comparison:
//: c12:MappedIO.java
// {Clean: temp.tmp}
import java.io.*;
import java.nio.*;
import java.nio.channels.*;
public class MappedIO {
private static int numOfInts = 4000000;
private static int numOfUbuffInts = 200000;
private abstract static class Tester {
private String name;
public Tester(String name) { this.name = name; }
public long runTest() {
System.out.print(name + ": ");
try {
long startTime = System.currentTimeMillis();
test();
long endTime = System.currentTimeMillis();
return (endTime - startTime);
} catch (IOException e) {
throw new RuntimeException(e);
}
}
public abstract void test() throws IOException;
}
private static Tester[] tests = {
new Tester("Stream Write") {
public void test() throws IOException {
DataOutputStream dos = new DataOutputStream(
new BufferedOutputStream(
new FileOutputStream(new File("temp.tmp"))));
for(int i = 0; i < numOfInts; i++)
dos.writeInt(i);
dos.close();
}
},
new Tester("Mapped Write") {
public void test() throws IOException {
FileChannel fc =
new RandomAccessFile("temp.tmp", "rw")
.getChannel();
IntBuffer ib = fc.map(
FileChannel.MapMode.READ_WRITE, 0, fc.size())
.asIntBuffer();
for(int i = 0; i < numOfInts; i++)
ib.put(i);
fc.close();
}
},
new Tester("Stream Read") {
public void test() throws IOException {
DataInputStream dis = new DataInputStream(
new BufferedInputStream(
new FileInputStream("temp.tmp")));
for(int i = 0; i < numOfInts; i++)
dis.readInt();
dis.close();
}
},
new Tester("Mapped Read") {
public void test() throws IOException {
FileChannel fc = new FileInputStream(
new File("temp.tmp")).getChannel();
IntBuffer ib = fc.map(
FileChannel.MapMode.READ_ONLY, 0, fc.size())
.asIntBuffer();
while(ib.hasRemaining())
ib.get();
fc.close();
}
},
new Tester("Stream Read/Write") {
public void test() throws IOException {
RandomAccessFile raf = new RandomAccessFile(
new File("temp.tmp"), "rw");
raf.writeInt(1);
for(int i = 0; i < numOfUbuffInts; i++) {
raf.seek(raf.length() - 4);
raf.writeInt(raf.readInt());
}
raf.close();
}
},
new Tester("Mapped Read/Write") {
public void test() throws IOException {
FileChannel fc = new RandomAccessFile(
new File("temp.tmp"), "rw").getChannel();
IntBuffer ib = fc.map(
FileChannel.MapMode.READ_WRITE, 0, fc.size())
.asIntBuffer();
ib.put(0);
for(int i = 1; i < numOfUbuffInts; i++)
ib.put(ib.get(i - 1));
fc.close();
}
}
};
public static void main(String[] args) {
for(int i = 0; i < tests.length; i++)
System.out.println(tests[i].runTest());
}
} ///:~
As seen in earlier examples in this book, runTest( ) is the Template Method that provides the
testing framework for various implementations of test( ) defined in anonymous
inner subclasses. Each of these subclasses performs one kind of test, so the test( )
methods also give you a prototype for performing the various I/O activities.
Although a mapped write would seem to use a FileOutputStream, all output in file
mapping must use a RandomAccessFile, just as read/write does in the preceding code.
Here's the output from one run (times are in milliseconds):
Stream Write: 1719
Mapped Write: 359
Stream Read: 750
Mapped Read: 125
Stream Read/Write: 5188
Mapped Read/Write: 16
Note that the test( ) methods include the time for initialization of the
various I/O objects, so even though the setup for mapped files can be expensive, the
overall gain compared to stream I/O is significant.
File locking, introduced in JDK 1.4,
allows you to synchronize access to a file as a shared resource. Unlike ordinary thread
synchronization, the two threads that contend for the same file may be in different JVMs, or one may be a Java thread and
the other some native thread in the operating system. The file locks are visible to other
operating system processes because Java file locking maps directly to the native operating
system locking facility.
Here is a simple example of file locking.
//: c12:FileLocking.java
// {Clean: file.txt}
import java.io.FileOutputStream;
import java.nio.channels.*;
public class FileLocking {
public static void main(String[] args) throws Exception {
FileOutputStream fos= new FileOutputStream("file.txt");
FileLock fl = fos.getChannel().tryLock();
if(fl != null) {
System.out.println("Locked File");
Thread.sleep(100);
fl.release();
System.out.println("Released Lock");
}
fos.close();
}
} ///:~
You get a FileLock on the entire file by calling either tryLock( ) or lock( )
on a FileChannel. (SocketChannel, DatagramChannel, and ServerSocketChannel
do not need locking since they are inherently single-process entities; you don't
generally share a network socket between two processes.) tryLock( ) is
non-blocking. It tries to grab the lock, but if it cannot (when some other process already
holds an overlapping lock and it is not shared), it simply returns null. lock( )
blocks until the lock is acquired, or the thread that invoked lock( ) is
interrupted, or the channel on which the lock( ) method is called is closed. A
lock is released using FileLock.release( ).
It is also possible to lock a part of the file by using
tryLock(long position, long size, boolean shared)
or
lock(long position, long size, boolean shared)
which locks size bytes starting at position (the region from position to position + size). The third argument specifies whether
this lock is shared.
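The sketch below locks a region with tryLock(position, size, shared) and reports what it got. FileLock's position( ) and size( ) describe the locked region, and isShared( ) reports the kind of lock actually granted. The class and method names are illustrative.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

// Illustrative sketch: lock part of a file and describe the lock obtained.
class RegionLock {
    static String lockRegion(String fileName, long position, long size, boolean shared)
            throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "rw")) {
            FileChannel fc = raf.getChannel();
            FileLock fl = fc.tryLock(position, size, shared);
            if (fl == null)
                return "not acquired";   // another process holds an overlapping lock
            try {
                // If shared locks are unsupported, a shared request comes back exclusive.
                return fl.position() + ".." + (fl.position() + fl.size())
                    + (fl.isShared() ? " shared" : " exclusive");
            } finally {
                fl.release();
            }
        }
    }
}
```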
Although the zero-argument locking methods adapt to changes in the size of a file,
locks with a fixed size do not change if the file size changes. If a lock is acquired for
a region from position to position+size and the file increases beyond position+size,
then the section beyond position+size is not locked. The zero-argument locking
methods lock the entire file, even if it grows.
Support for exclusive or shared locks must be provided by the underlying operating
system. If the operating system does not support shared locks and a request is made for
one, an exclusive lock is used instead. The type of lock (shared or exclusive) can be
queried using FileLock.isShared( ).
As mentioned earlier, file mapping is typically used for very large files. One thing
that you may need to do with such a large file is to lock portions of it so that other
processes may modify unlocked parts of the file. This is something that happens, for
example, with a database, so that it can be available to many users at once.
Here's an example that has two threads, each of which locks a distinct portion of
a file:
//: c12:LockingMappedFiles.java
// Locking portions of a mapped file.
// {RunByHand}
// {Clean: test.dat}
import java.io.*;
import java.nio.*;
import java.nio.channels.*;
public class LockingMappedFiles {
static final int LENGTH = 0x8000000; // 128 MB
static FileChannel fc;
public static void main(String[] args) throws Exception {
fc =
new RandomAccessFile("test.dat", "rw").getChannel();
MappedByteBuffer out =
fc.map(FileChannel.MapMode.READ_WRITE, 0, LENGTH);
for(int i = 0; i < LENGTH; i++)
out.put((byte)'x');
new LockAndModify(out, 0, 0 + LENGTH/3);
new LockAndModify(out, LENGTH/2, LENGTH/2 + LENGTH/4);
}
private static class LockAndModify extends Thread {
private ByteBuffer buff;
private int start, end;
LockAndModify(ByteBuffer mbb, int start, int end) {
this.start = start;
this.end = end;
mbb.limit(end);
mbb.position(start);
buff = mbb.slice();
start();
}
public void run() {
try {
// Exclusive lock with no overlap
// (the second argument is the region's size, not its end):
FileLock fl = fc.lock(start, end - start, false);
System.out.println("Locked: "+ start +" to "+ end);
// Perform modification:
while(buff.position() < buff.limit() - 1)
buff.put((byte)(buff.get() + 1));
fl.release();
System.out.println("Released: "+start+" to "+ end);
} catch(IOException e) {
throw new RuntimeException(e);
}
}
}
} ///:~
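The LockAndModify thread class sets up the buffer region and creates a slice( )
to be modified, and in run( ), the lock is acquired on the file channel (you
can't acquire a lock on the buffer, only on the channel). The call to lock( )
is very similar to acquiring a threading lock on an object: you now have a
critical section with exclusive access to that portion of the file. Keep in mind, though,
that file locks are held on behalf of the entire JVM. They are not meant to coordinate
threads within one JVM, and trying to lock a region that overlaps a lock already held in
the same JVM throws an OverlappingFileLockException. The two regions here do not overlap.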
The locks are automatically released when the JVM exits or when the channel on which they were
acquired is closed, but you can also explicitly call release( ) on the FileLock
object, as shown here.
The Java I/O library contains classes
to support reading and writing streams in a compressed format. These are wrapped around
existing I/O classes to provide compression functionality.
These classes are not derived from the Reader and Writer classes, but
instead are part of the InputStream and OutputStream hierarchies. This is
because the compression library works with bytes, not characters. However, you might
sometimes be forced to mix the two types of streams. (Remember that you can use InputStreamReader
and OutputStreamWriter to provide easy conversion between one type and another.)
| Compression class | Function |
|---|---|
| CheckedInputStream | getChecksum( ) produces a checksum for any InputStream (not just decompression). |
| CheckedOutputStream | getChecksum( ) produces a checksum for any OutputStream (not just compression). |
| DeflaterOutputStream | Base class for compression classes. |
| ZipOutputStream | A DeflaterOutputStream that compresses data into the Zip file format. |
| GZIPOutputStream | A DeflaterOutputStream that compresses data into the GZIP file format. |
| InflaterInputStream | Base class for decompression classes. |
| ZipInputStream | An InflaterInputStream that decompresses data that has been stored in the Zip file format. |
| GZIPInputStream | An InflaterInputStream that decompresses data that has been stored in the GZIP file format. |
Although there are many compression algorithms, Zip and GZIP are
possibly the most commonly used. Thus you can easily manipulate your compressed data with
the many tools available for reading and writing these formats.
The GZIP interface is simple and thus is probably more appropriate when you have a
single stream of data that you want to compress (rather than a container of dissimilar
pieces of data). Here's an example that compresses a single file:
//: c12:GZIPcompress.java
// {Args: GZIPcompress.java}
// {Clean: test.gz}
import com.bruceeckel.simpletest.*;
import java.io.*;
import java.util.zip.*;
public class GZIPcompress {
private static Test monitor = new Test();
// Throw exceptions to console:
public static void main(String[] args)
throws IOException {
if(args.length == 0) {
System.out.println(
"Usage: \nGZIPcompress file\n" +
"\tUses GZIP compression to compress " +
"the file to test.gz");
System.exit(1);
}
BufferedReader in = new BufferedReader(
new FileReader(args[0]));
BufferedOutputStream out = new BufferedOutputStream(
new GZIPOutputStream(
new FileOutputStream("test.gz")));
System.out.println("Writing file");
int c;
while((c = in.read()) != -1)
out.write(c);
in.close();
out.close();
System.out.println("Reading file");
BufferedReader in2 = new BufferedReader(
new InputStreamReader(new GZIPInputStream(
new FileInputStream("test.gz"))));
String s;
while((s = in2.readLine()) != null)
System.out.println(s);
monitor.expect(new String[] {
"Writing file",
"Reading file"
}, args[0]);
}
} ///:~
The use of the compression classes is straightforward; you simply wrap your output
stream in a GZIPOutputStream or ZipOutputStream, and your input stream in a GZIPInputStream
or ZipInputStream. All else is ordinary I/O reading and writing. This is an example
of mixing the char-oriented streams with the byte-oriented streams; in
uses the Reader classes, whereas GZIPOutputStream's constructor can
accept only an OutputStream object, not a Writer object. When the file is
reopened for reading, the GZIPInputStream is converted to a Reader.
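The same wrapping works on any byte stream, not just files. This sketch compresses a byte array into memory with GZIPOutputStream and restores it with GZIPInputStream. The class and method names are illustrative. Closing the GZIPOutputStream is what writes the GZIP trailer, so the compressed bytes are taken only after the try block ends.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative sketch: GZIP round trip entirely in memory.
class GzipRoundTrip {
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        }                                   // close() finishes the GZIP trailer
        return bytes.toByteArray();
    }

    static byte[] decompress(byte[] gzipped) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            byte[] buf = new byte[512];
            int n;
            while ((n = gz.read(buf)) != -1)
                out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```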
The library that supports the Zip format is much more extensive. With it you can easily
store multiple files, and there's even a separate class to make the process of
reading a Zip file easy. The library uses the standard Zip format so that it works
seamlessly with all the tools currently downloadable on the Internet. The following
example has the same form as the previous example, but it handles as many command-line
arguments as you want. In addition, it shows the use of the Checksum
classes to calculate and verify the checksum for the file. There are two Checksum
types: Adler32 (which is faster) and CRC32
(which is slower but slightly more accurate).
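Both checksum classes implement the java.util.zip.Checksum interface, so you can write a helper once against that interface. This sketch reproduces two widely published check values: CRC32 of "123456789" is 0xCBF43926, and Adler32 of "Wikipedia" is 0x11E60398. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Illustrative sketch: one helper works with any Checksum implementation.
class Checksums {
    static long checksum(Checksum c, String text) {
        byte[] data = text.getBytes(StandardCharsets.US_ASCII);
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        System.out.printf("CRC32:   %08x%n", checksum(new CRC32(), "123456789"));
        System.out.printf("Adler32: %08x%n", checksum(new Adler32(), "Wikipedia"));
    }
}
```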
//: c12:ZipCompress.java
// Uses Zip compression to compress any
// number of files given on the command line.
// {Args: ZipCompress.java}
// {Clean: test.zip}
import com.bruceeckel.simpletest.*;
import java.io.*;
import java.util.*;
import java.util.zip.*;
public class ZipCompress {
private static Test monitor = new Test();
// Throw exceptions to console:
public static void main(String[] args)
throws IOException {
FileOutputStream f = new FileOutputStream("test.zip");
CheckedOutputStream csum =
new CheckedOutputStream(f, new Adler32());
ZipOutputStream zos = new ZipOutputStream(csum);
BufferedOutputStream out =
new BufferedOutputStream(zos);
zos.setComment("A test of Java Zipping");
// No corresponding getComment(), though.
for(int i = 0; i < args.length; i++) {
System.out.println("Writing file " + args[i]);
BufferedReader in =
new BufferedReader(new FileReader(args[i]));
zos.putNextEntry(new ZipEntry(args[i]));
int c;
while((c = in.read()) != -1)
out.write(c);
in.close();
}
out.close();
// Checksum valid only after the file has been closed!
System.out.println("Checksum: " +
csum.getChecksum().getValue());
// Now extract the files:
System.out.println("Reading file");
FileInputStream fi = new FileInputStream("test.zip");
CheckedInputStream csumi =
new CheckedInputStream(fi, new Adler32());
ZipInputStream in2 = new ZipInputStream(csumi);
BufferedInputStream bis = new BufferedInputStream(in2);
ZipEntry ze;
while((ze = in2.getNextEntry()) != null) {
System.out.println("Reading file " + ze);
int x;
while((x = bis.read()) != -1)
System.out.write(x);
}
if(args.length == 1)
monitor.expect(new String[] {
"Writing file " + args[0],
"%% Checksum: \\d+",
"Reading file",
"Reading file " + args[0]}, args[0]);
System.out.println("Checksum: " +
csumi.getChecksum().getValue());
bis.close();
// Alternative way to open and read zip files:
ZipFile zf = new ZipFile("test.zip");
Enumeration e = zf.entries();
while(e.hasMoreElements()) {
ZipEntry ze2 = (ZipEntry)e.nextElement();
System.out.println("File: " + ze2);
// ... and extract the data as before
}
if(args.length == 1)
monitor.expect(new String[] {
"%% Checksum: \\d+",
"File: " + args[0]
});
}
} ///:~
For each file to add to the archive, you must call putNextEntry( ) and pass
it a ZipEntry object. The ZipEntry
object contains an extensive interface that allows you to get and set all the data
available on that particular entry in your Zip file: name, compressed and uncompressed
sizes, date, CRC checksum, extra field data, comment, compression method, and whether
it's a directory entry. However, even though the Zip format has a way to set a
password, this is not supported in Java's Zip library. And although CheckedInputStream
and CheckedOutputStream support both Adler32 and CRC32 checksums, the
ZipEntry class supports only an interface for CRC. This is a restriction of the
underlying Zip format, but it might limit you from using the faster Adler32.
To extract files, ZipInputStream has a getNextEntry( ) method that
returns the next ZipEntry if there is one. As a more succinct alternative, you can
read the file using a ZipFile object, which has a method entries( ) to
return an Enumeration of the ZipEntries.
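The putNextEntry( ) and getNextEntry( ) protocol can be seen on its own, without files or checksums. This sketch writes several named entries into an in-memory archive and reads them back in order. The class name, method names and entry names are illustrative. For each entry, ZipInputStream.read( ) returns -1 at the end of that entry, not at the end of the archive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Illustrative sketch: multiple entries in one Zip archive, in memory.
class ZipRoundTrip {
    static byte[] zip(String[] names, String[] contents) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (int i = 0; i < names.length; i++) {
                zos.putNextEntry(new ZipEntry(names[i]));  // start a new entry
                zos.write(contents[i].getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Returns "name=content" for each entry, in archive order.
    static List<String> unzip(byte[] archive) throws IOException {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zis =
                 new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry ze;
            while ((ze = zis.getNextEntry()) != null) {
                ByteArrayOutputStream entry = new ByteArrayOutputStream();
                byte[] buf = new byte[512];
                int n;
                while ((n = zis.read(buf)) != -1)   // -1 at end of this entry
                    entry.write(buf, 0, n);
                result.add(ze.getName() + "="
                    + new String(entry.toByteArray(), StandardCharsets.UTF_8));
            }
        }
        return result;
    }
}
```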
In order to read the checksum, you must somehow have access to the associated Checksum
object. Here, a reference to the CheckedOutputStream and CheckedInputStream
objects is retained, but you could also just hold onto a reference to the Checksum
object.
A baffling method in Zip streams is setComment( ). As shown in ZipCompress.java,
you can set a comment when you're writing a file, but there's no way to recover
the comment in the ZipInputStream. Comments appear to be supported fully on an
entry-by-entry basis only via ZipEntry.
Of course, you are not limited to files when using the GZIP or Zip
libraries; you can compress anything, including data to be sent through a network
connection.
The Zip format is also used in the JAR (Java ARchive) file format, which is a way to
collect a group of files into a single compressed file, just like Zip. However, like
everything else in Java, JAR files are cross-platform, so you don't need to worry
about platform issues. You can also include audio and image files as well as class files.
JAR files are particularly helpful when you deal with the
Internet. Before JAR files, your Web browser would have to make repeated requests of a Web
server in order to download all of the files that make up an applet. In addition, each of
these files was uncompressed. By combining all of the files for a particular applet into a
single JAR file, only one server request is necessary and the transfer is faster because
of compression. And each entry in a JAR file can be digitally signed for security (see
Chapter 14 for an example of signing).
A JAR file consists of a single file containing a collection of zipped files along with
a manifest that describes them. (You can create your own manifest file;
otherwise, the jar program will do it for you.) You can
find out more about JAR manifests in the JDK documentation.
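A JAR file is a Zip file that carries its manifest in an entry named META-INF/MANIFEST.MF. This sketch builds a small JAR in memory with java.util.jar.JarOutputStream, then reads it back two ways. The plain Zip API lists the manifest as an ordinary entry, and JarInputStream parses it. It is not how the jar tool itself is implemented. The class name and the hello.txt entry are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Illustrative sketch: a JAR is a Zip archive plus a manifest entry.
class JarSketch {
    static byte[] makeJar() throws IOException {
        Manifest man = new Manifest();
        man.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // This constructor writes the manifest as the META-INF/MANIFEST.MF entry.
        try (JarOutputStream jos = new JarOutputStream(bytes, man)) {
            jos.putNextEntry(new ZipEntry("hello.txt"));    // illustrative content
            jos.write("hi".getBytes(StandardCharsets.US_ASCII));
            jos.closeEntry();
        }
        return bytes.toByteArray();
    }

    // The plain Zip API sees the manifest as an ordinary entry.
    static String firstEntryName(byte[] jar) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(jar))) {
            ZipEntry first = zis.getNextEntry();
            return first == null ? null : first.getName();
        }
    }

    // JarInputStream parses the manifest for you.
    static String manifestVersion(byte[] jar) throws IOException {
        try (JarInputStream jis = new JarInputStream(new ByteArrayInputStream(jar))) {
            Manifest man = jis.getManifest();
            return man == null ? null
                : man.getMainAttributes().getValue(Attributes.Name.MANIFEST_VERSION);
        }
    }
}
```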
The jar utility that comes with Sun's JDK automatically compresses the
files of your choice. You invoke it on the command line:
jar [options] destination [manifest] inputfile(s)
The options are simply a collection of letters (no hyphen or any other indicator is
necessary). Unix/Linux users will note the similarity to the tar options. These
are:
| Option | Meaning |
|---|---|
| c | Creates a new or empty archive. |
| t | Lists the table of contents. |
| x | Extracts all files. |
| x file | Extracts the named file. |
| f | Says: "I'm going to give you the name of the file." If you don't use this, jar assumes that its input will come from standard input, or, if it is creating a file, its output will go to standard output. |
| m | Says that the first argument will be the name of the user-created manifest file. |
| v | Generates verbose output describing what jar is doing. |
| 0 | Only stores the files; doesn't compress them. |
| M | Doesn't automatically create a manifest file. |
If a subdirectory is included in the files to be put into the JAR
file, that subdirectory is automatically added, including all of its subdirectories, etc.
Path information is also preserved.
Here are some typical ways to invoke jar:
jar cf myJarFile.jar *.class
This creates a JAR file called myJarFile.jar that contains all of the class
files in the current directory, along with an automatically generated manifest file.
jar cmf myJarFile.jar myManifestFile.mf *.class
Like the previous example, but adding a user-created manifest file called myManifestFile.mf.
jar tf myJarFile.jar
Produces a table of contents of the files in myJarFile.jar.
jar tvf myJarFile.jar
Adds the verbose flag to give more detailed information about the files in myJarFile.jar.
jar cvf myApp.jar audio classes image
Assuming audio, classes, and image are subdirectories, this
combines all of the subdirectories into the file myApp.jar. The verbose
flag is also included to give extra feedback while the jar program is working.
Any JAR file can be placed in your CLASSPATH, whether its entries are compressed or
merely stored with the 0 (zero) option. For example (using the Windows path separator):
CLASSPATH="lib1.jar;lib2.jar;"
Then Java can search lib1.jar and lib2.jar for class files.
The jar tool isn't as full-featured as a zip utility. For example, you
can't move files into a JAR file, erasing them as they are moved (although the u option
does let you add files to, or update files in, an existing JAR file instead of creating
one from scratch). However, a JAR file created on one platform will be transparently
readable by the jar tool on any other platform (a problem that sometimes plagues zip
utilities).
As you will see in Chapter 14, JAR files are also used to package JavaBeans.
Java's object serialization allows you to take any object that implements the Serializable
interface and turn it into a sequence of bytes that can later be fully restored to
regenerate the original object. This is even true across a network, which means that the
serialization mechanism automatically compensates for differences in operating systems.
That is, you can create an object on a Windows machine, serialize it, and send it across
the network to a Unix machine, where it will be correctly reconstructed. You don't
have to worry about the data representations on the different machines, the byte ordering,
or any other details.
By itself, object serialization is interesting because it allows you to implement lightweight
persistence. Remember that persistence means that an object's lifetime is not
determined by whether a program is executing; the object lives in between
invocations of the program. By taking a serializable object and writing it to disk, then
restoring that object when the program is reinvoked, you're able to produce the
effect of persistence. The reason it's called "lightweight" is that you
can't simply define an object using some kind of "persistent" keyword and
let the system take care of the details (although this might happen in the future).
Instead, you must explicitly serialize and deserialize the objects in your program. If you
need a more serious persistence mechanism, consider Java Data Objects (JDO) or a
tool like Hibernate (http://hibernate.sourceforge.net). For details, see
Thinking in Enterprise Java, downloadable from www.BruceEckel.com.
Object serialization was added to the language to support two major features.
Java's Remote Method Invocation (RMI) allows objects that live on other
machines to behave as if they live on your machine. When sending messages to remote
objects, object serialization is necessary to transport the arguments and return values.
RMI is discussed in Thinking in Enterprise Java.
Object serialization is also necessary for JavaBeans, described in Chapter 14. When a
Bean is used, its state information is generally configured at design time. This state
information must be stored and later recovered when the program is started; object
serialization performs this task.
Serializing an object is quite simple as long as the object implements the Serializable
interface (this is a tagging interface and has no methods). When serialization was added
to the language, many standard library classes were changed to make them serializable,
including all of the wrappers for the primitive types, all of the container classes, and
many others. Even Class objects can be serialized.
To serialize an object, you create some sort of OutputStream object and then
wrap it inside an ObjectOutputStream
object. At this point you need only call writeObject( ),
and your object is serialized and sent to the OutputStream. To reverse the process,
you wrap an InputStream inside an ObjectInputStream and call readObject( ). What comes back
is, as usual, a reference to an upcast Object, so you must downcast to set things
straight.
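Here is a minimal sketch of that round trip through an in-memory buffer, assuming Java 8 or later. SerialPoint and RoundTrip are hypothetical names. The helpers rethrow checked exceptions as unchecked ones only to keep the example short.

```java
import java.io.*;

// A made-up Serializable class used only for this sketch.
class SerialPoint implements Serializable {
  final int x, y;
  SerialPoint(int x, int y) { this.x = x; this.y = y; }
  public String toString() { return "(" + x + ", " + y + ")"; }
}

class RoundTrip {
  // Wrap an OutputStream (an in-memory buffer here) in an ObjectOutputStream.
  static byte[] serialize(Object obj) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
      out.writeObject(obj);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }

  // Wrap an InputStream in an ObjectInputStream; readObject() returns Object.
  static Object deserialize(byte[] bytes) {
    try (ObjectInputStream in =
           new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Downcast to set things straight:
    SerialPoint p = (SerialPoint) deserialize(serialize(new SerialPoint(3, 4)));
    System.out.println(p); // (3, 4)
  }
}
```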
A particularly clever aspect of object serialization is that it not only saves an image
of your object, but it also follows all the references contained in your object and saves those
objects, and follows all the references in each of those objects, etc. This is
sometimes referred to as the "web of objects" that a single object can be
connected to, and it includes arrays of references to objects as well as member objects.
If you had to maintain your own object serialization scheme, maintaining the code to
follow all these links would be a bit mind-boggling. However, Java object serialization
seems to pull it off flawlessly, no doubt using an optimized algorithm that traverses the
web of objects. The following example tests the serialization mechanism by making a
"worm" of linked objects, each of which has a link to the next segment in the
worm as well as an array of references to objects of a different class, Data:
//: c12:Worm.java
// Demonstrates object serialization.
// {Clean: worm.out}
import java.io.*;
import java.util.*;
class Data implements Serializable {
private int n;
public Data(int n) { this.n = n; }
public String toString() { return Integer.toString(n); }
}
public class Worm implements Serializable {
private static Random rand = new Random();
private Data[] d = {
new Data(rand.nextInt(10)),
new Data(rand.nextInt(10)),
new Data(rand.nextInt(10))
};
private Worm next;
private char c;
// Value of i == number of segments
public Worm(int i, char x) {
System.out.println("Worm constructor: " + i);
c = x;
if(--i > 0)
next = new Worm(i, (char)(x + 1));
}
public Worm() {
System.out.println("Default constructor");
}
public String toString() {
String s = ":" + c + "(";
for(int i = 0; i < d.length; i++)
s += d[i];
s += ")";
if(next != null)
s += next;
return s;
}
// Throw exceptions to console:
public static void main(String[] args)
throws ClassNotFoundException, IOException {
Worm w = new Worm(6, 'a');
System.out.println("w = " + w);
ObjectOutputStream out = new ObjectOutputStream(
new FileOutputStream("worm.out"));
out.writeObject("Worm storage\n");
out.writeObject(w);
out.close(); // Also flushes output
ObjectInputStream in = new ObjectInputStream(
new FileInputStream("worm.out"));
String s = (String)in.readObject();
Worm w2 = (Worm)in.readObject();
System.out.println(s + "w2 = " + w2);
ByteArrayOutputStream bout =
new ByteArrayOutputStream();
ObjectOutputStream out2 = new ObjectOutputStream(bout);
out2.writeObject("Worm storage\n");
out2.writeObject(w);
out2.flush();
ObjectInputStream in2 = new ObjectInputStream(
new ByteArrayInputStream(bout.toByteArray()));
s = (String)in2.readObject();
Worm w3 = (Worm)in2.readObject();
System.out.println(s + "w3 = " + w3);
}
} ///:~
To make things interesting, the array of Data objects inside Worm is
initialized with random numbers. (This way you don't suspect the compiler of keeping
some kind of meta-information.) Each Worm segment is labeled with a char
that's automatically generated in the process of recursively generating the linked
list of Worms. When you create a Worm, you tell the constructor how long you
want it to be. To make the next reference, it calls the Worm constructor
with a length of one less, etc. The final next reference is left as null,
indicating the end of the Worm.
The point of all this was to make something reasonably complex that couldn't
easily be serialized. The act of serializing, however, is quite simple. Once the ObjectOutputStream
is created from some other stream, writeObject( ) serializes the object.
Notice the call to writeObject( ) for a String, as well. You can also
write all the primitive data types using the same methods as DataOutputStream (they
share the same interface, DataOutput).
There are two separate code sections that look similar. The first writes and reads a
file and the second, for variety, writes and reads a byte array (through a
ByteArrayOutputStream and a ByteArrayInputStream). You can read and write an object
using serialization to any InputStream or OutputStream, including, as you can see in
Thinking in Enterprise Java, a network. The output from one run was:
Worm constructor: 6
Worm constructor: 5
Worm constructor: 4
Worm constructor: 3
Worm constructor: 2
Worm constructor: 1
w = :a(414):b(276):c(773):d(870):e(210):f(279)
Worm storage
w2 = :a(414):b(276):c(773):d(870):e(210):f(279)
Worm storage
w3 = :a(414):b(276):c(773):d(870):e(210):f(279)
You can see that the deserialized object really does contain all of the links that were
in the original object.
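Note that no constructor of the deserialized class, not even the default constructor, is
called when a Serializable object is deserialized. The object's fields are restored by
recovering data from the InputStream. The one exception is the no-arg constructor of the
nearest superclass that is not Serializable, which does run; for Worm, that is just the
constructor of Object.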
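The exception for a non-Serializable superclass is easy to observe. In this sketch (Base, Derived, and ConstructorCheck are hypothetical classes), Base is not Serializable. Its no-arg constructor therefore runs again during deserialization, while Derived's constructor does not. Java 8 or later is assumed.

```java
import java.io.*;

// Base is NOT Serializable, so it needs a no-arg constructor that
// Derived can access; that constructor runs during deserialization.
class Base {
  static int baseCount = 0;
  Base() { baseCount++; }
}

class Derived extends Base implements Serializable {
  static int derivedCount = 0;
  int value;
  Derived(int value) { derivedCount++; this.value = value; }
}

class ConstructorCheck {
  static Derived copy(Derived d) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
      out.writeObject(d);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    try (ObjectInputStream in = new ObjectInputStream(
           new ByteArrayInputStream(buf.toByteArray()))) {
      return (Derived) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Derived restored = copy(new Derived(7));
    System.out.println("Base() calls: " + Base.baseCount);          // 2
    System.out.println("Derived() calls: " + Derived.derivedCount); // 1
    System.out.println("value: " + restored.value);                 // 7
  }
}
```

Base( ) runs twice: once for new and once during deserialization. Derived( ) runs only for new, yet value still comes back as 7 from the stream.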
Object serialization is byte-oriented, and thus uses the InputStream and OutputStream
hierarchies.
You might wonder what's necessary for an object to be recovered from its
serialized state. For example, suppose you serialize an object and send it as a file or
through a network to another machine. Could a program on the other machine reconstruct the
object using only the contents of the file?
The best way to answer this question is (as usual) by performing an experiment. The
following file goes in the subdirectory for this chapter:
//: c12:Alien.java
// A serializable class.
import java.io.*;
public class Alien implements Serializable {} ///:~
The file that creates and serializes an Alien object goes in the same directory:
//: c12:FreezeAlien.java
// Create a serialized output file.
// {Clean: X.file}
import java.io.*;
public class FreezeAlien {
// Throw exceptions to console:
public static void main(String[] args) throws Exception {
ObjectOutput out = new ObjectOutputStream(
new FileOutputStream("X.file"));
Alien zorcon = new Alien();
out.writeObject(zorcon);
out.close(); // Also flushes output
}
} ///:~
Rather than catching and handling exceptions, this program takes the quick-and-dirty
approach of passing the exceptions out of main( ), so they'll be reported
on the console.
Once the program is compiled and run, it produces a file called X.file in the c12
directory. The following code is in a subdirectory called xfiles:
//: c12:xfiles:ThawAlien.java
// Try to recover a serialized file without the
// class of object that's stored in that file.
// {ThrowsException}
import java.io.*;
public class ThawAlien {
public static void main(String[] args) throws Exception {
ObjectInputStream in = new ObjectInputStream(
new FileInputStream(new File("..", "X.file")));
Object mystery = in.readObject();
System.out.println(mystery.getClass());
}
} ///:~
Simply reading in the object mystery, so that you can call getClass( ) on it, requires the Class
object for Alien; the JVM cannot find Alien.class (unless it happens to be
in the classpath, which it shouldn't be in this example). You'll get a ClassNotFoundException.
(Once again, all evidence of alien life vanishes before proof of its existence can be
verified!) The JVM must be able to find the associated .class file.
As you can see, the default serialization mechanism is trivial to use. But what if you
have special needs? Perhaps you have special security issues and you don't want to
serialize portions of your object, or perhaps it just doesn't make sense for one
subobject to be serialized if that part needs to be created anew when the object is
recovered.
You can control the process of serialization by implementing the Externalizable interface instead of the Serializable interface. The Externalizable interface
extends the Serializable interface and adds two methods, writeExternal( ) and readExternal( ), that are automatically called for your
object during serialization and deserialization so that you can perform your special
operations.
The following example shows simple implementations of the Externalizable
interface methods. Note that Blip1 and Blip2 are nearly identical except for
a subtle difference (see if you can discover it by looking at the code):
//: c12:Blips.java
// Simple use of Externalizable & a pitfall.
// {Clean: Blips.out}
import com.bruceeckel.simpletest.*;
import java.io.*;
import java.util.*;
class Blip1 implements Externalizable {
public Blip1() {
System.out.println("Blip1 Constructor");
}
public void writeExternal(ObjectOutput out)
throws IOException {
System.out.println("Blip1.writeExternal");
}
public void readExternal(ObjectInput in)
throws IOException, ClassNotFoundException {
System.out.println("Blip1.readExternal");
}
}
class Blip2 implements Externalizable {
Blip2() {
System.out.println("Blip2 Constructor");
}
public void writeExternal(ObjectOutput out)
throws IOException {
System.out.println("Blip2.writeExternal");
}
public void readExternal(ObjectInput in)
throws IOException, ClassNotFoundException {
System.out.println("Blip2.readExternal");
}
}
public class Blips {
private static Test monitor = new Test();
// Throw exceptions to console:
public static void main(String[] args)
throws IOException, ClassNotFoundException {
System.out.println("Constructing objects:");
Blip1 b1 = new Blip1();
Blip2 b2 = new Blip2();
ObjectOutputStream o = new ObjectOutputStream(
new FileOutputStream("Blips.out"));
System.out.println("Saving objects:");
o.writeObject(b1);
o.writeObject(b2);
o.close();
// Now get them back:
ObjectInputStream in = new ObjectInputStream(
new FileInputStream("Blips.out"));
System.out.println("Recovering b1:");
b1 = (Blip1)in.readObject();
// OOPS! Throws an exception:
//! System.out.println("Recovering b2:");
//! b2 = (Blip2)in.readObject();
monitor.expect(new String[] {
"Constructing objects:",
"Blip1 Constructor",
"Blip2 Constructor",
"Saving objects:",
"Blip1.writeExternal",
"Blip2.writeExternal",
"Recovering b1:",
"Blip1 Constructor",
"Blip1.readExternal"
});
}
} ///:~
The reason that the Blip2 object is not recovered is that trying to do so causes
an exception (an InvalidClassException). Can you see the difference between Blip1 and
Blip2? The constructor for Blip1 is public, while the constructor for Blip2 is
not, and that causes the exception upon recovery. Try making Blip2's
constructor public and removing the //! comments to see the correct results.
When b1 is recovered, the Blip1 default constructor is called. This is
different from recovering a Serializable object, in which the object is constructed
entirely from its stored bits, with no constructor calls. With an Externalizable
object, all the normal default construction behavior occurs (including the initializations
at the point of field definition), and then readExternal( ) is called.
You need to be aware of this (in particular, the fact that all the default
construction always takes place) to produce the correct behavior in your Externalizable
objects.
Here's an example that shows what you must do to fully store and retrieve an Externalizable
object:
//: c12:Blip3.java
// Reconstructing an externalizable object.
import com.bruceeckel.simpletest.*;
import java.io.*;
import java.util.*;
public class Blip3 implements Externalizable {
private static Test monitor = new Test();
private int i;
private String s; // No initialization
public Blip3() {
System.out.println("Blip3 Constructor");
// s, i not initialized
}
public Blip3(String x, int a) {
System.out.println("Blip3(String x, int a)");
s = x;
i = a;
// s & i initialized only in nondefault constructor.
}
public String toString() { return s + i; }
public void writeExternal(ObjectOutput out)
throws IOException {
System.out.println("Blip3.writeExternal");
// You must do this:
out.writeObject(s);
out.writeInt(i);
}
public void readExternal(ObjectInput in)
throws IOException, ClassNotFoundException {
System.out.println("Blip3.readExternal");
// You must do this:
s = (String)in.readObject();
i = in.readInt();
}
public static void main(String[] args)
throws IOException, ClassNotFoundException {
System.out.println("Constructing objects:");
Blip3 b3 = new Blip3("A String ", 47);
System.out.println(b3);
ObjectOutputStream o = new ObjectOutputStream(
new FileOutputStream("Blip3.out"));
System.out.println("Saving object:");
o.writeObject(b3);
o.close();
// Now get it back:
ObjectInputStream in = new ObjectInputStream(
new FileInputStream("Blip3.out"));
System.out.println("Recovering b3:");
b3 = (Blip3)in.readObject();
System.out.println(b3);
monitor.expect(new String[] {
"Constructing objects:",
"Blip3(String x, int a)",
"A String 47",
"Saving object:",
"Blip3.writeExternal",
"Recovering b3:",
"Blip3 Constructor",
"Blip3.readExternal",
"A String 47"
});
}
} ///:~
The fields s and i are initialized only in the second constructor, but
not in the default constructor. This means that if you don't initialize s and i
in readExternal( ), s will be null and i will be zero
(since the storage for the object gets wiped to zero in the first step of object
creation). If you comment out the two lines of code following the phrases "You must
do this:" and run the program, you'll see that when the object is recovered, s
is null and i is zero.
If you are inheriting from an Externalizable object, you'll typically call
the base-class versions of writeExternal( ) and readExternal( ) to
provide proper storage and retrieval of the base-class components.
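Here is a minimal sketch of that pattern using the hypothetical classes Shape2D and Circle. Each subclass writes the base-class part first by calling super.writeExternal( ). It reads that part back in the same order with super.readExternal( ). Each class also keeps a public no-arg constructor, since that is what runs on recovery.

```java
import java.io.*;

// Made-up hierarchy for this sketch.
class Shape2D implements Externalizable {
  protected String name;
  public Shape2D() {} // required for Externalizable recovery
  Shape2D(String name) { this.name = name; }
  public void writeExternal(ObjectOutput out) throws IOException {
    out.writeObject(name);
  }
  public void readExternal(ObjectInput in)
      throws IOException, ClassNotFoundException {
    name = (String) in.readObject();
  }
}

class Circle extends Shape2D {
  private double radius;
  public Circle() {} // also needed by the subclass
  Circle(String name, double radius) { super(name); this.radius = radius; }
  @Override public void writeExternal(ObjectOutput out) throws IOException {
    super.writeExternal(out); // base-class part first
    out.writeDouble(radius);  // then this class's part
  }
  @Override public void readExternal(ObjectInput in)
      throws IOException, ClassNotFoundException {
    super.readExternal(in);   // read back in the same order
    radius = in.readDouble();
  }
  String describe() { return name + " r=" + radius; }
}

class ExternalizableInheritance {
  static Object copy(Object obj) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
      out.writeObject(obj);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    try (ObjectInputStream in = new ObjectInputStream(
           new ByteArrayInputStream(buf.toByteArray()))) {
      return in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Circle c = (Circle) copy(new Circle("wheel", 2.5));
    System.out.println(c.describe()); // wheel r=2.5
  }
}
```

If Circle's methods left out the super calls, name would come back as null, because Externalizable stores nothing automatically.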
So to make things work correctly you must not only write the important data from the
object during the writeExternal( ) method (there is no default behavior that
writes any of the member objects for an Externalizable object), but you must also
recover that data in the readExternal( ) method. This can be a bit confusing
at first because the default construction behavior for an Externalizable object can
make it seem like some kind of storage and retrieval takes place automatically. It does
not.
When you're controlling serialization, there might be a particular subobject that
you don't want Java's serialization mechanism to automatically save and restore.
This is commonly the case if that subobject represents sensitive information that you
don't want to serialize, such as a password. Even if that information is private in
the object, once it has been serialized, it's possible for someone to access it by
reading a file or intercepting a network transmission.
One way to prevent sensitive parts of your object from being serialized is to implement
your class as Externalizable, as shown previously. Then nothing is automatically
serialized, and you can explicitly serialize only the necessary parts inside writeExternal( ).
If you're working with a Serializable object, however, all serialization
happens automatically. To control this, you can turn off serialization on a field-by-field
basis using the transient keyword, which says, "Don't bother saving or restoring this;
I'll take care of it."
For example, consider a Logon object that keeps information about a particular
login session. Suppose that, once you verify the login, you want to store the data, but
without the password. The easiest way to do this is by implementing Serializable
and marking the password field as transient. Here's what it looks like:
//: c12:Logon.java
// Demonstrates the "transient" keyword.
// {Clean: Logon.out}
import java.io.*;
import java.util.*;
public class Logon implements Serializable {
private Date date = new Date();
private String username;
private transient String password;
public Logon(String name, String pwd) {
username = name;
password = pwd;
}
public String toString() {
String pwd = (password == null) ? "(n/a)" : password;
return "logon info: \n username: " + username +
"\n date: " + date + "\n password: " + pwd;
}
public static void main(String[] args) throws Exception {
Logon a = new Logon("Hulk", "myLittlePony");
System.out.println( "logon a = " + a);
ObjectOutputStream o = new ObjectOutputStream(
new FileOutputStream("Logon.out"));
o.writeObject(a);
o.close();
Thread.sleep(1000); // Delay for 1 second
// Now get them back:
ObjectInputStream in = new ObjectInputStream(
new FileInputStream("Logon.out"));
System.out.println("Recovering object at "+new Date());
a = (Logon)in.readObject();
System.out.println("logon a = " + a);
}
} ///:~
You can see that the date and username fields are ordinary (not transient),
and thus are automatically serialized. However, the password is transient,
so it is not stored to disk; also, the serialization mechanism makes no attempt to recover
it. The output is:
logon a = logon info:
 username: Hulk
 date: Mon Oct 21 12:10:13 MDT 2002
 password: myLittlePony
Recovering object at Mon Oct 21 12:10:14 MDT 2002
logon a = logon info:
 username: Hulk
 date: Mon Oct 21 12:10:13 MDT 2002
 password: (n/a)
When the object is recovered, the password field is null. Note that toString( )
checks for a null value of password and substitutes "(n/a)". The check isn't needed
to avoid an exception: when the overloaded + operator assembles a String and
encounters a null reference, it simply inserts the text "null". The check just makes
the output more readable.
You can also see that the date field is stored to and recovered from disk and
not generated anew.
Since Externalizable objects do not store any of their fields by default, the transient
keyword is for use with Serializable objects only.
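A related detail is worth seeing directly. A Serializable class's own constructor does not run during deserialization, and neither do its field initializers. So a transient field comes back with its default value (0, false, or null), even if it is initialized at its point of definition. The Session and TransientDefaults classes below are hypothetical.

```java
import java.io.*;

// Made-up class for this sketch.
class Session implements Serializable {
  String user;
  transient int retries = 3;         // initializer runs only via the constructor
  transient String token = "secret";
  Session(String user) { this.user = user; }
}

class TransientDefaults {
  static Session copy(Session s) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
      out.writeObject(s);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    try (ObjectInputStream in = new ObjectInputStream(
           new ByteArrayInputStream(buf.toByteArray()))) {
      return (Session) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Session back = copy(new Session("Hulk"));
    System.out.println(back.user + " " + back.retries + " " + back.token);
    // Hulk 0 null
  }
}
```

If a transient field needs a meaningful value after recovery, you must restore it yourself. One place to do that is a readObject( ) method in the class.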
If you're not keen on implementing the Externalizable interface,
there's another approach. You can implement the Serializable interface and add
(notice I say "add" and not "override" or "implement")
methods called writeObject( ) and readObject( ) that will
automatically be called when the object is serialized and deserialized, respectively. That
is, if you provide these two methods, they will be used instead of the default
serialization.
The methods must have these exact signatures:
private void writeObject(ObjectOutputStream stream) throws IOException;
private void readObject(ObjectInputStream stream) throws IOException, ClassNotFoundException;
From a design standpoint, things get really weird here. First of all, you might think
that because these methods are not part of a base class or the Serializable
interface, they ought to be defined in their own interface(s). But notice that they are
defined as private, which means they are to be called only by other members of this
class. However, you don't actually call them from other members of this class. Instead,
the writeObject( ) and readObject( ) methods of the ObjectOutputStream
and ObjectInputStream objects call your object's writeObject( )
and readObject( ) methods. (Notice my tremendous restraint in not launching
into a long diatribe about using the same method names here. In a word: confusing.) You
might wonder how the ObjectOutputStream and ObjectInputStream objects have
access to private methods of your class. The answer is reflection, which the
serialization machinery uses to look up these methods and bypass the normal access check
before invoking them.
In any event, anything defined in an interface is automatically public, so
if writeObject( ) and readObject( ) must be private, then
they can't be part of an interface. Since you must follow the signatures
exactly, the effect is the same as if you're implementing an interface.
It would appear that when you call ObjectOutputStream.writeObject( ), the Serializable
object that you pass to it is interrogated (using reflection, no doubt) to see if it
implements its own writeObject( ). If so, the normal serialization process is
skipped and your object's writeObject( ) is called instead. The same sort of situation
exists for readObject( ).
There's one other twist. Inside your writeObject( ), you can choose to
perform the default writeObject( ) action by calling defaultWriteObject( ).
Likewise, inside readObject( ) you can call defaultReadObject( ).
Here is a simple example that demonstrates how you can control the storage and retrieval
of a Serializable object:
//: c12:SerialCtl.java
// Controlling serialization by adding your own
// writeObject() and readObject() methods.
import com.bruceeckel.simpletest.*;
import java.io.*;
public class SerialCtl implements Serializable {
private static Test monitor = new Test();
private String a;
private transient String b;
public SerialCtl(String aa, String bb) {
a = "Not Transient: " + aa;
b = "Transient: " + bb;
}
public String toString() { return a + "\n" + b; }
private void writeObject(ObjectOutputStream stream)
throws IOException {
stream.defaultWriteObject();
stream.writeObject(b);
}
private void readObject(ObjectInputStream stream)
throws IOException, ClassNotFoundException {
stream.defaultReadObject();
b = (String)stream.readObject();
}
public static void main(String[] args)
throws IOException, ClassNotFoundException {
SerialCtl sc = new SerialCtl("Test1", "Test2");
System.out.println("Before:\n" + sc);
ByteArrayOutputStream buf= new ByteArrayOutputStream();
ObjectOutputStream o = new ObjectOutputStream(buf);
o.writeObject(sc);
// Now get it back:
ObjectInputStream in = new ObjectInputStream(
new ByteArrayInputStream(buf.toByteArray()));
SerialCtl sc2 = (SerialCtl)in.readObject();
System.out.println("After:\n" + sc2);
monitor.expect(new String[] {
"Before:",
"Not Transient: Test1",
"Transient: Test2",
"After:",
"Not Transient: Test1",
"Transient: Test2"
});
}
} ///:~
In this example, one String field is ordinary and the other is transient,
to prove that the non-transient field is saved by the defaultWriteObject( ) method and the transient
field is saved and restored explicitly. The fields are initialized inside the constructor
rather than at the point of definition to prove that they are not being initialized by
some automatic mechanism during deserialization.
If you are going to use the default mechanism to write the non-transient parts
of your object, you must call defaultWriteObject( ) as the first operation in writeObject( ),
and defaultReadObject( ) as
the first operation in readObject( ). These are strange method calls. It would
appear, for example, that you are calling defaultWriteObject( ) for an ObjectOutputStream
and passing it no arguments, and yet it somehow turns around and knows the reference to
your object and how to write all the non-transient parts. Spooky.
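One common use of this mechanism is rebuilding transient state after the default fields have been read. The hypothetical Rect class below defines only readObject( ). A class is allowed to define just one of the two methods, and because Rect has no writeObject( ), the default mechanism writes its non-transient fields. Following the advice above, readObject( ) calls defaultReadObject( ) first. It then checks the restored values and recomputes the transient area field.

```java
import java.io.*;

// Made-up class: area is derived from width and height, so it is
// transient and recomputed after the default fields have been read.
class Rect implements Serializable {
  private int width, height;
  private transient int area;
  Rect(int width, int height) {
    this.width = width;
    this.height = height;
    area = width * height;
  }
  int area() { return area; }
  private void readObject(ObjectInputStream stream)
      throws IOException, ClassNotFoundException {
    stream.defaultReadObject(); // first: restores width and height
    if (width < 0 || height < 0)
      throw new InvalidObjectException("negative dimension");
    area = width * height;      // rebuild the transient field
  }
}

class RectDemo {
  static Rect copy(Rect r) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
      out.writeObject(r);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    try (ObjectInputStream in = new ObjectInputStream(
           new ByteArrayInputStream(buf.toByteArray()))) {
      return (Rect) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(copy(new Rect(3, 4)).area()); // 12
  }
}
```

Without the readObject( ) method, area would come back as 0 even though width and height are restored.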
The storage and retrieval of the transient objects uses more familiar code. And
yet, think about what happens here. In main( ), a SerialCtl object is
created, and then it's serialized to an ObjectOutputStream. (Notice in this
case that a buffer is used instead of a file; it's all the same to the ObjectOutputStream.)
The serialization occurs in the line:
o.writeObject(sc);
The writeObject( ) method must be examining sc to see if it has its
own writeObject( ) method. (Not by checking the interface, since there
isn't one, or the class type, but by actually hunting for the method using
reflection.) If it does, it uses that. A similar approach holds true for readObject( ).
Perhaps this was the only practical way that they could solve the problem, but it's
certainly strange.
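To make that lookup concrete, here is a sketch that imitates it with the reflection API. It checks the rules the hook method must satisfy: named writeObject, private, not static, returning void, and taking one ObjectOutputStream. HookFinder, WithHook, WithoutHook, and HookDemo are hypothetical names. This is an illustration, not the JDK's actual implementation.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Imitates the check described in the text; NOT the JDK's own code.
class HookFinder {
  static boolean hasWriteObjectHook(Class<?> cl) {
    try {
      // getDeclaredMethod() also finds private methods declared in cl.
      Method m = cl.getDeclaredMethod("writeObject", ObjectOutputStream.class);
      int mods = m.getModifiers();
      return m.getReturnType() == void.class
          && Modifier.isPrivate(mods)
          && !Modifier.isStatic(mods);
    } catch (NoSuchMethodException e) {
      return false; // cl declares no such method
    }
  }
}

// Made-up classes to try the helper on.
class WithHook implements Serializable {
  private void writeObject(ObjectOutputStream s) throws IOException {
    s.defaultWriteObject();
  }
}

class WithoutHook implements Serializable {}

class HookDemo {
  public static void main(String[] args) {
    System.out.println(HookFinder.hasWriteObjectHook(WithHook.class));    // true
    System.out.println(HookFinder.hasWriteObjectHook(WithoutHook.class)); // false
  }
}
```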
It's possible that you might want to change the version of a serializable class (objects
of the original class might be stored in a database, for example). This is supported, but
you'll probably do it only in special cases, and it requires an extra depth of
understanding that we will not attempt to achieve here. The JDK documents downloadable
from java.sun.com cover this topic quite thoroughly.
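Without going into versioning in depth, one piece you will meet right away is the serialVersionUID field. Suppose the UID recorded in the stream doesn't match the UID of the class available on the reading side. In that case, readObject( ) throws an InvalidClassException. The sketch below declares one UID explicitly and uses ObjectStreamClass.lookup( ) to inspect the UID of each class. The class names are hypothetical.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Made-up classes for this sketch.
class Versioned implements Serializable {
  // An explicit UID stays the same when you make compatible changes,
  // such as adding a field.
  private static final long serialVersionUID = 1L;
  int count;
}

class Unversioned implements Serializable {
  int count; // no explicit UID: one is computed from the class's structure
}

class VersionCheck {
  public static void main(String[] args) {
    System.out.println(
      ObjectStreamClass.lookup(Versioned.class).getSerialVersionUID()); // 1
    System.out.println(
      ObjectStreamClass.lookup(Unversioned.class).getSerialVersionUID());
  }
}
```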
You will also notice in the JDK documentation many comments that begin with:
"Warning: Serialized objects of this class will not be compatible with future
Swing releases. The current serialization support is appropriate for short term storage or
RMI between applications ..."
This is because the versioning mechanism is too simple to work reliably in all
situations, especially with JavaBeans. They're working on a correction for the
design, and that's what the warning is about.
It's quite appealing to use serialization technology to
store some of the state of your program so that you can easily restore the program to the
current state later. But before you can do this, some questions must be answered. What
happens if you serialize two objects that both have a reference to a third object? When
you restore those two objects from their serialized state, do you get only one occurrence
of the third object? What if you serialize your two objects to separate files and
deserialize them in different parts of your code?
Here's an example that shows the problem:
//: c12:MyWorld.java
import java.io.*;
import java.util.*;
class House implements Serializable {}
class Animal implements Serializable {
private String name;
private House preferredHouse;
Animal(String nm, House h) {
name = nm;
preferredHouse = h;
}
public String toString() {
return name + "[" + super.toString() +
"], " + preferredHouse + "\n";
}
}
public class MyWorld {
public static void main(String[] args)
throws IOException, ClassNotFoundException {
House house = new House();
List animals = new ArrayList();
animals.add(new Animal("Bosco the dog", house));
animals.add(new Animal("Ralph the hamster", house));
animals.add(new Animal("Fronk the cat", house));
System.out.println("animals: " + animals);
ByteArrayOutputStream buf1 =
new ByteArrayOutputStream();
ObjectOutputStream o1 = new ObjectOutputStream(buf1);
o1.writeObject(animals);
o1.writeObject(animals); // Write a 2nd set
// Write to a different stream:
ByteArrayOutputStream buf2 =
new ByteArrayOutputStream();
ObjectOutputStream o2 = new ObjectOutputStream(buf2);
o2.writeObject(animals);
// Now get them back:
ObjectInputStream in1 = new ObjectInputStream(
new ByteArrayInputStream(buf1.toByteArray()));
ObjectInputStream in2 = new ObjectInputStream(
new ByteArrayInputStream(buf2.toByteArray()));
List
animals1 = (List)in1.readObject(),
animals2 = (List)in1.readObject(),
animals3 = (List)in2.readObject();
System.out.println("animals1: " + animals1);
System.out.println("animals2: " + animals2);
System.out.println("animals3: " + animals3);
}
} ///:~
One thing that's interesting here is that it's possible to use object
serialization to and from a byte array as a way of doing a "deep copy" of any
object that's Serializable. (A deep copy means that you're duplicating
the entire web of objects, rather than just the basic object and its references.) Object
copying is covered in depth in Appendix A.
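Here is a minimal sketch of such a deep copy. It assumes every object reachable from the argument is Serializable. Otherwise serialization fails with a NotSerializableException, which this sketch wraps in an UncheckedIOException. DeepCopy is a hypothetical name. The example copies a list that holds two references to one array. That lets you check that the copy contains one new shared array, not the original.

```java
import java.io.*;
import java.util.ArrayList;

class DeepCopy {
  // Serializes obj to a byte array and reads it straight back.
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T copy(T obj) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
      out.writeObject(obj);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    try (ObjectInputStream in = new ObjectInputStream(
           new ByteArrayInputStream(buf.toByteArray()))) {
      return (T) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    int[] shared = {1, 2, 3};
    ArrayList<int[]> original = new ArrayList<>();
    original.add(shared);
    original.add(shared);         // two references to the same array
    ArrayList<int[]> copy = copy(original);
    copy.get(0)[0] = 99;
    System.out.println(original.get(0)[0]); // 1: original untouched
    System.out.println(copy.get(1)[0]);     // 99: aliasing preserved in the copy
  }
}
```

Because both references are written in a single writeObject( ) call, the copy preserves the aliasing, just as animals1 did in MyWorld.java.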
Animal objects contain fields of type House. In main( ), a List
of these Animals is created and it is serialized twice to one stream and then again
to a separate stream. When these are deserialized and printed, you see the following
results for one run (the objects will be in different memory locations each run):
animals: [Bosco the dog[Animal@1cde100], House@16f0472
, Ralph the hamster[Animal@18d107f], House@16f0472
, Fronk the cat[Animal@360be0], House@16f0472
]
animals1: [Bosco the dog[Animal@e86da0], House@1754ad2
, Ralph the hamster[Animal@1833955], House@1754ad2
, Fronk the cat[Animal@291aff], House@1754ad2
]
animals2: [Bosco the dog[Animal@e86da0], House@1754ad2
, Ralph the hamster[Animal@1833955], House@1754ad2
, Fronk the cat[Animal@291aff], House@1754ad2
]
animals3: [Bosco the dog[Animal@ab95e6], House@fe64b9
, Ralph the hamster[Animal@186db54], House@fe64b9
, Fronk the cat[Animal@a97b0b], House@fe64b9
]
Of course you expect that the deserialized objects have different addresses from their
originals. But notice that in animals1 and animals2, the same addresses
appear, including the references to the House object that both share. On the other
hand, when animals3 is recovered, the system has no way of knowing that the objects
in this other stream are aliases of the objects in the first stream, so it makes a
completely different web of objects.
As long as you're serializing everything to a single stream, you'll be able
to recover the same web of objects that you wrote, with no accidental duplication of
objects. Of course, you can change the state of your objects in between the time you write
the first and the last, but that's your responsibility; the objects will be written
in whatever state they are in (and with whatever connections they have to other objects)
at the time you serialize them.
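The following sketch isolates this behavior. A hypothetical Pet class stands in for Animal, and a StringBuilder stands in for House. Instead of printing addresses, it checks reference identity (==). Two pets read back from one stream still share a single house. Pets read back from separate streams each get their own copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SharedRefs {
  // Hypothetical stand-in for Animal: several Pets can share one house.
  static class Pet implements Serializable {
    final StringBuilder house;
    Pet(StringBuilder house) { this.house = house; }
  }

  // Writes all the given objects, in order, into ONE stream and returns
  // a reader positioned at the start of that stream.
  static ObjectInputStream roundTrip(Object... objs) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    ObjectOutputStream out = new ObjectOutputStream(bytes);
    for (Object o : objs)
      out.writeObject(o);
    out.close();
    return new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
  }

  // Returns { shared after one stream?, shared after two streams? }
  public static boolean[] check() {
    try {
      StringBuilder house = new StringBuilder("House");
      Pet a = new Pet(house), b = new Pet(house);
      ObjectInputStream one = roundTrip(a, b);
      Pet a1 = (Pet) one.readObject(), b1 = (Pet) one.readObject();
      Pet a2 = (Pet) roundTrip(a).readObject();
      Pet b2 = (Pet) roundTrip(b).readObject();
      return new boolean[] { a1.house == b1.house, a2.house == b2.house };
    } catch (IOException | ClassNotFoundException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    boolean[] r = check();
    System.out.println("one stream shares the house: " + r[0]);  // true
    System.out.println("two streams share the house: " + r[1]);  // false
  }
}
```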
The safest thing to do if you want to save the state of a system is to serialize as an
atomic operation. If you serialize some things, do some other work, and
serialize some more, etc., then you will not be storing the system safely. Instead, put
all the objects that comprise the state of your system in a single container and simply
write that container out in one operation. Then you can restore it with a single method
call as well.
The following example is an imaginary computer-aided design (CAD) system that
demonstrates the approach. In addition, it throws in the issue of static fields; if
you look at the JDK documentation you'll see that Class is Serializable,
so it should be easy to store the static fields by simply serializing the Class object. That seems like a
sensible approach, anyway.
//: c12:CADState.java
// Saving and restoring the state of a pretend CAD system.
// {Clean: CADState.out}
//package c12;
import java.io.*;
import java.util.*;
abstract class Shape implements Serializable {
public static final int RED = 1, BLUE = 2, GREEN = 3;
private int xPos, yPos, dimension;
private static Random r = new Random();
private static int counter = 0;
public abstract void setColor(int newColor);
public abstract int getColor();
public Shape(int xVal, int yVal, int dim) {
xPos = xVal;
yPos = yVal;
dimension = dim;
}
public String toString() {
return getClass() +
"color[" + getColor() + "] xPos[" + xPos +
"] yPos[" + yPos + "] dim[" + dimension + "]\n";
}
public static Shape randomFactory() {
int xVal = r.nextInt(100);
int yVal = r.nextInt(100);
int dim = r.nextInt(100);
switch(counter++ % 3) {
default:
case 0: return new Circle(xVal, yVal, dim);
case 1: return new Square(xVal, yVal, dim);
case 2: return new Line(xVal, yVal, dim);
}
}
}
class Circle extends Shape {
private static int color = RED;
public Circle(int xVal, int yVal, int dim) {
super(xVal, yVal, dim);
}
public void setColor(int newColor) { color = newColor; }
public int getColor() { return color; }
}
class Square extends Shape {
private static int color;
public Square(int xVal, int yVal, int dim) {
super(xVal, yVal, dim);
color = RED;
}
public void setColor(int newColor) { color = newColor; }
public int getColor() { return color; }
}
class Line extends Shape {
private static int color = RED;
public static void
serializeStaticState(ObjectOutputStream os)
throws IOException { os.writeInt(color); }
public static void
deserializeStaticState(ObjectInputStream os)
throws IOException { color = os.readInt(); }
public Line(int xVal, int yVal, int dim) {
super(xVal, yVal, dim);
}
public void setColor(int newColor) { color = newColor; }
public int getColor() { return color; }
}
public class CADState {
public static void main(String[] args) throws Exception {
List shapeTypes, shapes;
if(args.length == 0) {
shapeTypes = new ArrayList();
shapes = new ArrayList();
// Add references to the class objects:
shapeTypes.add(Circle.class);
shapeTypes.add(Square.class);
shapeTypes.add(Line.class);
// Make some shapes:
for(int i = 0; i < 10; i++)
shapes.add(Shape.randomFactory());
// Set all the static colors to GREEN:
for(int i = 0; i < 10; i++)
((Shape)shapes.get(i)).setColor(Shape.GREEN);
// Save the state vector:
ObjectOutputStream out = new ObjectOutputStream(
new FileOutputStream("CADState.out"));
out.writeObject(shapeTypes);
Line.serializeStaticState(out);
out.writeObject(shapes);
} else { // There's a command-line argument
ObjectInputStream in = new ObjectInputStream(
new FileInputStream(args[0]));
// Read in the same order they were written:
shapeTypes = (List)in.readObject();
Line.deserializeStaticState(in);
shapes = (List)in.readObject();
}
// Display the shapes:
System.out.println(shapes);
}
} ///:~
The Shape class implements Serializable,
so anything that is inherited from Shape is automatically Serializable as
well. Each Shape contains data, and each derived Shape class contains a static
field that determines the color of all of those types of Shapes. (Placing a static
field in the base class would result in only one field, since static fields are not
duplicated in derived classes.) Methods in the base class can be overridden to set the
color for the various types (static methods are not dynamically bound, so these are
normal methods). The randomFactory( ) method creates a different Shape
each time you call it, using random values for the Shape data.
Circle and Square are straightforward extensions of Shape; the
only difference is that Circle initializes color at the point of definition
and Square initializes it in the constructor. We'll leave the discussion of Line
for later.
In main( ), one ArrayList is used to hold the Class objects
and the other to hold the shapes. If you don't provide a command-line argument, the shapeTypes
ArrayList is created and the Class objects are added, and then the shapes
ArrayList is created and Shape objects are added. Next, all the static
color values are set to GREEN, and everything is serialized to the file CADState.out.
If you provide a command-line argument (presumably CADState.out), that file is
opened and used to restore the state of the program. In both situations, the resulting ArrayList
of Shapes is printed. The results from one run are:
$ java CADState
[class Circlecolor[3] xPos[71] yPos[82] dim[44]
, class Squarecolor[3] xPos[98] yPos[21] dim[49]
, class Linecolor[3] xPos[16] yPos[80] dim[37]
, class Circlecolor[3] xPos[51] yPos[74] dim[7]
, class Squarecolor[3] xPos[7] yPos[78] dim[98]
, class Linecolor[3] xPos[38] yPos[79] dim[93]
, class Circlecolor[3] xPos[84] yPos[12] dim[62]
, class Squarecolor[3] xPos[16] yPos[51] dim[94]
, class Linecolor[3] xPos[51] yPos[0] dim[73]
, class Circlecolor[3] xPos[47] yPos[6] dim[49]
]
$ java CADState CADState.out
[class Circlecolor[1] xPos[71] yPos[82] dim[44]
, class Squarecolor[0] xPos[98] yPos[21] dim[49]
, class Linecolor[3] xPos[16] yPos[80] dim[37]
, class Circlecolor[1] xPos[51] yPos[74] dim[7]
, class Squarecolor[0] xPos[7] yPos[78] dim[98]
, class Linecolor[3] xPos[38] yPos[79] dim[93]
, class Circlecolor[1] xPos[84] yPos[12] dim[62]
, class Squarecolor[0] xPos[16] yPos[51] dim[94]
, class Linecolor[3] xPos[51] yPos[0] dim[73]
, class Circlecolor[1] xPos[47] yPos[6] dim[49]
]
You can see that the values of xPos, yPos, and dim were all stored
and recovered successfully, but there's something wrong with the retrieval of the static
information. It's all 3 going in, but it doesn't come out that way. Circles
have a value of 1 (RED, the value given at the point of definition), and Squares have a value
of 0 (remember, they are initialized in the constructor, and deserialization doesn't call the
Square constructor). It's as if the statics
didn't get serialized at all! That's right: even though class Class
is Serializable, it doesn't do what you expect. So if you want to serialize statics,
you must do it yourself.
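Here is a minimal sketch of doing it yourself. It follows the same pattern as the serializeStaticState( ) and deserializeStaticState( ) methods that CADState.java gives Line. The Dot class is hypothetical. A single program can't easily restart the JVM, so resetting the static field by hand stands in for running the program a second time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class StaticState {
  // Hypothetical shape-like class with a per-class static color.
  static class Dot implements Serializable {
    static int color = 1; // RED
    int x;
    Dot(int x) { this.x = x; }
    static void serializeStaticState(ObjectOutputStream os) throws IOException {
      os.writeInt(color);
    }
    static void deserializeStaticState(ObjectInputStream is) throws IOException {
      color = is.readInt();
    }
  }

  // Saves a Dot while color is GREEN (3), resets color to simulate a fresh
  // JVM, restores everything, and returns the color that results.
  public static int roundTrip(boolean saveStatics) {
    try {
      Dot.color = 3;
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      ObjectOutputStream out = new ObjectOutputStream(bytes);
      if (saveStatics)
        Dot.serializeStaticState(out);
      out.writeObject(new Dot(42)); // instance data (x) is saved either way
      out.close();
      Dot.color = 1; // what a newly loaded class would start with
      ObjectInputStream in = new ObjectInputStream(
          new ByteArrayInputStream(bytes.toByteArray()));
      if (saveStatics)
        Dot.deserializeStaticState(in); // read in the same order as written
      Dot restored = (Dot) in.readObject();
      if (restored.x != 42)
        throw new IllegalStateException("instance data lost");
      return Dot.color;
    } catch (IOException | ClassNotFoundException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("without helpers: " + roundTrip(false)); // 1
    System.out.println("with helpers:    " + roundTrip(true));  // 3
  }
}
```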
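This is what the serializeStaticState( ) and deserializeStaticState( )
static methods in Line are for. You can see that they are explicitly called
as part of the storage and retrieval process. (Note that the order of writing to the
serialized file and reading back from it must be maintained.) Thus, to make CADState.java
run correctly, you must give Circle and Square their own serializeStaticState( ) and
deserializeStaticState( ) methods and call them, in the same order, during storage and
retrieval. Since serializing the Class objects accomplishes nothing, the shapeTypes List and
the code related to it can also be removed.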
Another issue you might have to think about is security, since serialization also saves
private data. If you have a security issue, those fields should be marked as transient.
But then you have to design a secure way to store that information so that when you do a
restore you can reset those private variables.
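Here is a minimal sketch of what transient does, using a hypothetical Login class. The transient password is skipped when the object is written. After deserialization it holds its default value (null), so the program has to supply it again by some other, secure means.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {
  // Hypothetical class: the password must never reach the byte stream.
  public static class Login implements Serializable {
    public String user;
    public transient String password; // skipped by default serialization
    public Login(String user, String password) {
      this.user = user;
      this.password = password;
    }
  }

  // Serializes a Login to memory and reads it back.
  public static Login roundTrip(Login login) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      ObjectOutputStream out = new ObjectOutputStream(bytes);
      out.writeObject(login);
      out.close();
      ObjectInputStream in = new ObjectInputStream(
          new ByteArrayInputStream(bytes.toByteArray()));
      return (Login) in.readObject();
    } catch (IOException | ClassNotFoundException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    Login restored = roundTrip(new Login("dorothy", "ruby"));
    System.out.println(restored.user);     // dorothy
    System.out.println(restored.password); // null: must be reset another way
  }
}
```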
JDK 1.4 introduced the Preferences API,
which is much closer to persistence than object serialization because it automatically
stores and retrieves your information. However, its use is restricted to small and limited
data sets: you can only hold primitives, Strings, and byte arrays, and the length of each
stored String can't be longer than 8K (not tiny, but you don't want to
build anything serious with it, either). As the name suggests, the Preferences API is
designed to store and retrieve user preferences and program-configuration settings.
Preferences are key-value sets (like Maps) stored in a hierarchy of nodes.
Although the node hierarchy can be used to create complicated structures, it's
typical to create a single node named after your class and store the information there.
Here's a simple example:
//: c12:PreferencesDemo.java
import java.util.prefs.*;
import java.util.*;
public class PreferencesDemo {
public static void main(String[] args) throws Exception {
Preferences prefs = Preferences
.userNodeForPackage(PreferencesDemo.class);
prefs.put("Location", "Oz");
prefs.put("Footwear", "Ruby Slippers");
prefs.putInt("Companions", 4);
prefs.putBoolean("Are there witches?", true);
int usageCount = prefs.getInt("UsageCount", 0);
usageCount++;
prefs.putInt("UsageCount", usageCount);
Iterator it = Arrays.asList(prefs.keys()).iterator();
while(it.hasNext()) {
String key = it.next().toString();
System.out.println(key + ": "+ prefs.get(key, null));
}
// You must always provide a default value:
System.out.println(
"How many companions does Dorothy have? " +
prefs.getInt("Companions", 0));
}
} ///:~
Here, userNodeForPackage( ) is used, but you could
also choose systemNodeForPackage( ); the choice is
somewhat arbitrary, but the idea is that "user" is for individual user
preferences, and "system" is for general installation configuration. Since main( )
is static, PreferencesDemo.class is used to identify the node, but inside a
non-static method, you'll usually use getClass( ). You don't need to
use the current class as the node identifier, but that's the usual practice.
Once you create the node, it's available for either loading or reading data. This
example loads the node with various types of items and then gets the keys( ).
These come back as a String[], which you might not expect if you're used to keys( )
in the collections library. Here, they're converted to a List that is used to
produce an Iterator for printing the keys and values. Notice the second argument to
get( ). This is the default value that is produced if there isn't any
entry for that key value. While iterating through a set of keys, you always know
there's an entry, so using null as the default is safe, but normally
you'll be fetching a named key, as in:
prefs.getInt("Companions", 0);
In the normal case, you'll want to provide a reasonable default value. In fact, a
typical idiom is seen in the lines:
int usageCount = prefs.getInt("UsageCount", 0);
usageCount++;
prefs.putInt("UsageCount", usageCount);
This way, the first time you run the program, the UsageCount will be zero, but
on subsequent invocations it will be nonzero.
When you run PreferencesDemo.java, you'll see that the UsageCount
does indeed increment every time you run the program, but where is the data stored?
No file appears in the local directory after the program is run the first time. The
Preferences API uses appropriate system resources to accomplish its task, and these will
vary depending on the OS. In Windows, the registry is used (since it's already a
hierarchy of nodes with key-value pairs). But the whole point is that the information is
magically stored for you so that you don't have to worry about how it works from one
system to another.
There's more to the Preferences API than shown here. Consult the JDK
documentation, which is fairly understandable, for further details.
To finish this chapter, we'll look at regular
expressions, which were added in JDK 1.4 but have been integral to standard Unix
utilities like sed and awk, and languages like Python and Perl (some would argue that they
are the predominant reason for Perl's success). Technically, these are string
manipulation tools (previously delegated to the String, StringBuffer, and StringTokenizer
classes in Java), but they are typically used in conjunction with I/O, so it's not
too far-fetched to include them here.[66]
Regular expressions are powerful and flexible text-processing tools. They allow you to
specify, programmatically, complex patterns of text that can be discovered in an input
string. Once you discover these patterns, you can then react to them any way you want.
Although the syntax of regular expressions can be intimidating at first, they provide a
compact and dynamic language that can be employed to solve all sorts of string processing,
matching and selection, editing, and verification problems in a completely general way.
You can begin learning regular expressions with a useful subset of the possible
constructs. A complete list of constructs for building regular expressions can be found in
the javadocs for the Pattern class in the java.util.regex package.
| Characters | Meaning |
| B | The specific character B |
| \xhh | Character with hex value 0xhh |
| \uhhhh | The Unicode character with hex representation 0xhhhh |
| \t | Tab |
| \n | Newline |
| \r | Carriage return |
| \f | Form feed |
| \e | Escape |
The power of regular expressions begins to appear when defining
character classes. Here are some typical ways to create character classes, and some
predefined classes:
| Character Classes | Meaning |
| . | Any character (except line terminators, unless DOTALL mode is enabled) |
| [abc] | Any of the characters a, b, or c (same as a|b|c) |
| [^abc] | Any character except a, b, and c (negation) |
| [a-zA-Z] | Any character a through z or A through Z (range) |
| [abc[hij]] | Any of a, b, c, h, i, j (same as a|b|c|h|i|j) (union) |
| [a-z&&[hij]] | Either h, i, or j (intersection) |
| \s | A whitespace character (space, tab, newline, vertical tab, form feed, carriage return) |
| \S | A non-whitespace character ([^\s]) |
| \d | A numeric digit [0-9] |
| \D | A non-digit [^0-9] |
| \w | A word character [a-zA-Z_0-9] |
| \W | A non-word character [^\w] |
If you have any experience with regular expressions in other
languages, you'll immediately notice a difference in the way backslashes are handled.
In other languages, \\ means "I want to insert a plain old
(literal) backslash in the regular expression. Don't give it any special
meaning." In Java, \\ means "I'm inserting a regular
expression backslash, so the following character has special meaning." (The doubling is
needed because the Java string literal consumes one level of backslashes before the regular
expression engine ever sees the string.) For example,
if you want to indicate one or more word characters, your regular expression string will
be \\w+. If you want to insert a literal backslash, you say \\\\.
However, things like newlines and tabs just use a single backslash: \n\t.
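Here is a small sketch of these escaping rules. It uses String's built-in matches( ), replaceAll( ), and split( ) convenience methods (JDK 1.4 and later). The class name is arbitrary.

```java
public class Backslashes {
  public static void main(String[] args) {
    // The literal "\\w+" reaches the regex engine as \w+ :
    // one or more word characters.
    System.out.println("hello_42".matches("\\w+"));          // true
    // The literal "\\\\" reaches the engine as \\ , which matches
    // exactly one literal backslash in the input.
    System.out.println("C:\\dir".replaceAll("\\\\", "/"));   // C:/dir
    // \n and \t are ordinary Java escapes: the regex sees the actual
    // newline and tab characters inside the character class.
    System.out.println("a\tb\nc".split("[\n\t]").length);    // 3
  }
}
```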
What's shown here is only a sampling; you'll want to have the java.util.regex.Pattern
JDK documentation page bookmarked or on your Start menu so you can easily
access all the possible regular expression patterns.
| Logical Operators | Meaning |
| XY | X followed by Y |
| X|Y | X or Y |
| (X) | A capturing group. You can refer to the ith captured group later in the expression with \i |
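| Boundary Matchers | Meaning |
| ^ | Beginning of a line (by default, only the beginning of the input; see MULTILINE mode) |
| $ | End of a line (by default, only the end of the input; see MULTILINE mode) |
| \b | Word boundary |
| \B | Non-word boundary |
| \G | End of the previous match |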
As an example, each of the following represents a valid regular
expression, and all will successfully match the character sequence "Rudolph":
Rudolph
[rR]udolph
[rR][aeiou][a-z]ol.*
R.*
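A quick way to confirm this is to test each expression against the input. This sketch uses the static Pattern.matches( ) method. That method succeeds only if the entire input matches, so the test also shows that each expression covers all of "Rudolph" rather than just part of it. The class name is arbitrary.

```java
import java.util.regex.Pattern;

public class Rudolph {
  public static void main(String[] args) {
    String[] regexes = {
      "Rudolph", "[rR]udolph", "[rR][aeiou][a-z]ol.*", "R.*"
    };
    for (String regex : regexes) {
      // Pattern.matches() requires the whole input to match
      System.out.println(regex + " -> " + Pattern.matches(regex, "Rudolph"));
    }
  }
}
```

Each line prints `true`.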
A quantifier describes the way that a pattern absorbs
input text:
| Greedy | Reluctant | Possessive | Matches |
| X? | X?? | X?+ | X, one or none |
| X* | X*? | X*+ | X, zero or more |
| X+ | X+? | X++ | X, one or more |
| X{n} | X{n}? | X{n}+ | X, exactly n times |
| X{n,} | X{n,}? | X{n,}+ | X, at least n times |
| X{n,m} | X{n,m}? | X{n,m}+ | X, at least n but not more than m times |
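The three columns differ in how a quantifier gives up input when the rest of the pattern fails to match:

- A greedy quantifier matches as much as it can, then backs off.
- A reluctant one matches as little as it can, then grows.
- A possessive one matches as much as it can and never backs off.

The following sketch shows all three on the same input. The findAll( ) helper is a hypothetical convenience that collects every find( ) result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Quantifiers {
  // Hypothetical helper: collects every successive find() match.
  public static List<String> findAll(String regex, String input) {
    List<String> found = new ArrayList<String>();
    Matcher m = Pattern.compile(regex).matcher(input);
    while (m.find())
      found.add(m.group());
    return found;
  }

  public static void main(String[] args) {
    String input = "xfooxxxxxxfoo";
    // Greedy: .* grabs everything, then backs off until "foo" fits
    System.out.println(findAll(".*foo", input));  // [xfooxxxxxxfoo]
    // Reluctant: .*? takes as little as possible, so it stops early
    System.out.println(findAll(".*?foo", input)); // [xfoo, xxxxxxfoo]
    // Possessive: .*+ never gives characters back, so "foo" never fits
    System.out.println(findAll(".*+foo", input)); // []
  }
}
```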
You should be very aware that the expression X will
often need to be surrounded in parentheses for it to work the way you desire. For example:
abc+
might seem like it would match the sequence abc one or more times, and if
you apply it to the input string abcabcabc, you will in fact get three
matches. However, the expression actually says "match ab followed
by one or more occurrences of c." To match the entire string
abc one or more times, you must say:
(abc)+
You can easily be fooled when using regular expressions; it's a new language, on
top of Java.
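The difference between abc+ and (abc)+ is easy to check in code. This sketch collects every find( ) result with a small, hypothetical findAll( ) helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupedQuantifier {
  // Hypothetical helper: collects every successive find() match.
  public static List<String> findAll(String regex, String input) {
    List<String> found = new ArrayList<String>();
    Matcher m = Pattern.compile(regex).matcher(input);
    while (m.find())
      found.add(m.group());
    return found;
  }

  public static void main(String[] args) {
    // + applies only to c: three separate "abc" matches
    System.out.println(findAll("abc+", "abcabcabc"));   // [abc, abc, abc]
    // + applies to the whole group: one long match
    System.out.println(findAll("(abc)+", "abcabcabc")); // [abcabcabc]
    // The repeated c's show what abc+ really means
    System.out.println(findAll("abc+", "abccc"));       // [abccc]
  }
}
```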
JDK 1.4 defines a new interface called CharSequence,
which establishes a definition of a character sequence abstracted from the String
or StringBuffer classes:
interface CharSequence {
char charAt(int i);
int length();
CharSequence subSequence(int start, int end);
String toString();
}
The String, StringBuffer, and CharBuffer classes have been
modified to implement this new CharSequence interface. Many regular expression
operations take CharSequence arguments.
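Because the regular expression classes accept a CharSequence, one compiled Pattern can be applied to any of these types without first converting to a String. In this sketch, countVowels( ) is a hypothetical helper. CharBuffer.wrap( ) is the java.nio method that produces a CharBuffer over an existing character sequence.

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharSequenceDemo {
  private static final Pattern VOWEL = Pattern.compile("[aeiou]");

  // Hypothetical helper: accepts any CharSequence (String,
  // StringBuffer, CharBuffer, ...) and counts lowercase vowels.
  public static int countVowels(CharSequence text) {
    Matcher m = VOWEL.matcher(text);
    int count = 0;
    while (m.find())
      count++;
    return count;
  }

  public static void main(String[] args) {
    System.out.println(countVowels("education"));                   // 5
    System.out.println(countVowels(new StringBuffer("education"))); // 5
    System.out.println(countVowels(CharBuffer.wrap("education")));  // 5
  }
}
```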
As a first example, the following class can be used to test regular expressions against
an input string. The first argument is the input string to match against, followed by one
or more regular expressions to be applied to the input. Under Unix/Linux, quote the regular
expressions on the command line so the shell doesn't interpret characters such as *, (, and ) itself.
This program can be useful in testing regular expressions as you construct them to see
that they produce your intended matching behavior.
//: c12:TestRegularExpression.java
// Allows you to easily try out regular expressions.
// {Args: abcabcabcdefabc "abc+" "(abc)+" "(abc){2,}" }
import java.util.regex.*;
public class TestRegularExpression {
public static void main(String[] args) {
if(args.length < 2) {
System.out.println("Usage:\n" +
"java TestRegularExpression " +
"characterSequence regularExpression+");
System.exit(0);
}
System.out.println("Input: \"" + args[0] + "\"");
for(int i = 1; i < args.length; i++) {
System.out.println(
"Regular expression: \"" + args[i] + "\"");
Pattern p = Pattern.compile(args[i]);
Matcher m = p.matcher(args[0]);
while(m.find()) {
System.out.println("Match \"" + m.group() +
"\" at positions " +
m.start() + "-" + (m.end() - 1));
}
}
}
} ///:~
Regular expressions are implemented in Java through the Pattern and Matcher classes in the package java.util.regex. A Pattern
object represents a compiled version of a regular expression. The static compile( )
method compiles a regular expression string into a Pattern object. As seen in the
preceding example, you can use the matcher( ) method and the input string to
produce a Matcher object from the compiled Pattern object. Pattern
also has a
static boolean ( regex, input)
for quickly discerning if regex can be found in input, and a split( )
method that produces an array of String that has been broken around matches of the regex.
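Here is a sketch of these Pattern shortcuts. It uses the JDK's static Pattern.matches(String regex, CharSequence input), which compiles and applies a regular expression in one call. Like Matcher's matches( ), it succeeds only when the entire input matches. To test whether a regular expression merely occurs somewhere in the input, you still need a Matcher and find( ). The class name and input are arbitrary.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PatternShortcuts {
  public static void main(String[] args) {
    String input = "one, two,three";
    // Whole-input test: true only if the ENTIRE input matches
    System.out.println(Pattern.matches("[a-z, ]+", input)); // true
    System.out.println(Pattern.matches("two", input));      // false
    // "Occurs anywhere" needs a Matcher and find()
    System.out.println(Pattern.compile("two").matcher(input).find()); // true
    // split() breaks the input around each match of the regex
    System.out.println(Arrays.toString(
        Pattern.compile(",\\s*").split(input)));             // [one, two, three]
  }
}
```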
A Matcher object is generated by calling Pattern.matcher( ) with the
input string as an argument. The Matcher object is then used to access the results,
using methods to evaluate the success or failure of different types of matches:
boolean matches()
boolean lookingAt()
boolean find()
boolean find(int start)
The matches( ) method is successful if the pattern matches the entire input
string, while lookingAt( ) is successful if the pattern matches the beginning of the
input string; unlike matches( ), it doesn't require the match to extend to the end.
Matcher.find( ) can be used to discover multiple pattern matches in the CharSequence
to which it is applied. For example:
//: c12:FindDemo.java
import java.util.regex.*;
import com.bruceeckel.simpletest.*;
import java.util.*;
public class FindDemo {
private static Test monitor = new Test();
public static void main(String[] args) {
Matcher m = Pattern.compile("\\w+")
.matcher("Evening is full of the linnet's wings");
while(m.find())
System.out.println(m.group());
int i = 0;
while(m.find(i)) {
System.out.print(m.group() + " ");
i++;
}
monitor.expect(new String[] {
"Evening",
"is",
"full",
"of",
"the",
"linnet",
"s",
"wings",
"Evening vening ening ning ing ng g is is s full " +
"full ull ll l of of f the the he e linnet linnet " +
"innet nnet net et t s s wings wings ings ngs gs s "
});
}
} ///:~
The pattern \\w+ indicates one or more word characters,
so it will simply split up the input into words. find( ) is like an iterator,
moving forward through the input string. However, the second version of find( )
can be given an integer argument that tells it the character position for the beginning of
the search; this version resets the search position to the value of the argument, as
you can see from the output.
Groups are regular expressions set off by parentheses that can
be called up later with their group number. Group zero indicates the whole expression
match, group one is the first parenthesized group, etc. Thus in
A(B(C))D
there are three groups: Group 0 is ABCD, group 1 is BC, and group 2 is C.
The Matcher object has methods to give you information about groups:
public int groupCount( ) returns the number of groups in this matcher's
pattern. Group zero is not included in this count.
public String group( ) returns group zero (the entire match) from the
previous match operation (find( ), for example).
public String group(int i) returns the given group number during the previous
match operation. If the match was successful, but the group specified failed to match any
part of the input string, then null is returned.
public int start(int group) returns the start index of the group found in the
previous match operation.
public int end(int group) returns the index of the last character, plus one, of
the group found in the previous match operation.
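To see these methods applied to the A(B(C))D example, the sketch below searches a slightly longer input so that the start and end indexes are not all zero. Groups are numbered by counting opening parentheses from left to right. The class name and input are arbitrary.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbers {
  public static void main(String[] args) {
    // Group 1 opens at "(B", group 2 at "(C"
    Matcher m = Pattern.compile("A(B(C))D").matcher("xxABCDxx");
    if (m.find()) {
      System.out.println("groupCount() = " + m.groupCount());
      for (int i = 0; i <= m.groupCount(); i++)
        System.out.println("group " + i + ": " + m.group(i)
            + " start = " + m.start(i) + " end = " + m.end(i));
    }
  }
}
```

Running it prints `groupCount() = 2`, followed by ABCD (start 2, end 6), BC (start 3, end 5), and C (start 4, end 5). In every case end( ) is one past the last matched character.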
Here's an example of regular expression groups:
//: c12:Groups.java
import java.util.regex.*;
import com.bruceeckel.simpletest.*;
public class Groups {
private static Test monitor = new Test();
static public final String poem =
"'Twas brillig, and the slithy toves\n" +
"Did gyre and gimble in the wabe.\n" +
"All mimsy were the borogoves,\n" +
"And the mome raths outgrabe.\n\n" +
"Beware the Jabberwock, my son,\n" +
"The jaws that bite, the claws that catch.\n" +
"Beware the Jubjub bird, and shun\n" +
"The frumious Bandersnatch.";
public static void main(String[] args) {
Matcher m =
Pattern.compile("(?m)(\\S+)\\s+((\\S+)\\s+(\\S+))$")
.matcher(poem);
while(m.find()) {
for(int j = 0; j <= m.groupCount(); j++)
System.out.print("[" + m.group(j) + "]");
System.out.println();
}
monitor.expect(new String[]{
"[the slithy toves]" +
"[the][slithy toves][slithy][toves]",
"[in the wabe.][in][the wabe.][the][wabe.]",
"[were the borogoves,]" +
"[were][the borogoves,][the][borogoves,]",
"[mome raths outgrabe.]" +
"[mome][raths outgrabe.][raths][outgrabe.]",
"[Jabberwock, my son,]" +
"[Jabberwock,][my son,][my][son,]",
"[claws that catch.]" +
"[claws][that catch.][that][catch.]",
"[bird, and shun][bird,][and shun][and][shun]",
"[The frumious Bandersnatch.][The]" +
"[frumious Bandersnatch.][frumious][Bandersnatch.]"
});
}
} ///:~
The poem is the first part of Lewis Carroll's "Jabberwocky," from Through
the Looking Glass. You can see that the regular expression pattern has a number of
parenthesized groups, consisting of one or more non-whitespace characters (\S+)
followed by one or more whitespace characters (\s+). The goal is to
capture the last three words on each line; the end of a line is delimited by $.
However, the normal behavior is to match $ with the end of the entire
input sequence, so we must explicitly tell the regular expression to pay attention to
newlines within the input. This is accomplished with the (?m) pattern
flag at the beginning of the sequence (pattern flags will be shown shortly).
Following a successful matching operation, start( ) returns the start index
of the previous match, and end( ) returns the index of the last character
matched, plus one. Invoking either start( ) or end( ) following an
unsuccessful matching operation (or prior to a matching operation being attempted)
produces an IllegalStateException. The following program also demonstrates matches( )
and lookingAt( ):
//: c12:StartEnd.java
import java.util.regex.*;
import com.bruceeckel.simpletest.*;
public class StartEnd {
private static Test monitor = new Test();
public static void main(String[] args) {
String[] input = new String[] {
"Java has regular expressions in 1.4",
"regular expressions now expressing in Java",
"Java represses oracular expressions"
};
Pattern
p1 = Pattern.compile("re\\w*"),
p2 = Pattern.compile("Java.*");
for(int i = 0; i < input.length; i++) {
System.out.println("input " + i + ": " + input[i]);
Matcher
m1 = p1.matcher(input[i]),
m2 = p2.matcher(input[i]);
while(m1.find())
System.out.println("m1.find() '" + m1.group() +
"' start = "+ m1.start() + " end = " + m1.end());
while(m2.find())
System.out.println("m2.find() '" + m2.group() +
"' start = "+ m2.start() + " end = " + m2.end());
if(m1.lookingAt()) // No reset() necessary
System.out.println("m1.lookingAt() start = "
+ m1.start() + " end = " + m1.end());
if(m2.lookingAt())
System.out.println("m2.lookingAt() start = "
+ m2.start() + " end = " + m2.end());
if(m1.matches()) // No reset() necessary
System.out.println("m1.matches() start = "
+ m1.start() + " end = " + m1.end());
if(m2.matches())
System.out.println("m2.matches() start = "
+ m2.start() + " end = " + m2.end());
}
monitor.expect(new String[] {
"input 0: Java has regular expressions in 1.4",
"m1.find() 'regular' start = 9 end = 16",
"m1.find() 'ressions' start = 20 end = 28",
"m2.find() 'Java has regular expressions in 1.4'" +
" start = 0 end = 35",
"m2.lookingAt() start = 0 end = 35",
"m2.matches() start = 0 end = 35",
"input 1: regular expressions now " +
"expressing in Java",
"m1.find() 'regular' start = 0 end = 7",
"m1.find() 'ressions' start = 11 end = 19",
"m1.find() 'ressing' start = 27 end = 34",
"m2.find() 'Java' start = 38 end = 42",
"m1.lookingAt() start = 0 end = 7",
"input 2: Java represses oracular expressions",
"m1.find() 'represses' start = 5 end = 14",
"m1.find() 'ressions' start = 27 end = 35",
"m2.find() 'Java represses oracular expressions' " +
"start = 0 end = 35",
"m2.lookingAt() start = 0 end = 35",
"m2.matches() start = 0 end = 35"
});
}
} ///:~
Notice that find( ) will locate the regular expression anywhere in the
input, but lookingAt( ) and matches( ) only succeed if the regular
expression starts matching at the very beginning of the input. While matches( )
only succeeds if the entire input matches the regular expression, lookingAt( )[67] succeeds if only the first part of the input
matches.
An alternative compile( ) method accepts flags that affect the behavior of
regular expression matching:
Pattern Pattern.compile(String regex, int flag)
where flag is drawn from among the following Pattern class constants:
| Compile Flag | Effect |
| Pattern.CANON_EQ | Two characters will be considered to match if, and only if, their full canonical decompositions match. The expression a\u030A, for example, will match the string å when this flag is specified. By default, matching does not take canonical equivalence into account. |
| Pattern.CASE_INSENSITIVE (?i) | This flag allows your pattern to match without regard to case (upper or lower). By default, case-insensitive matching assumes that only characters in the US-ASCII character set are being matched; Unicode-aware case-insensitive matching can be enabled by specifying the UNICODE_CASE flag in conjunction with this flag. |
| Pattern.COMMENTS (?x) | In this mode, whitespace is ignored, and embedded comments starting with # are ignored until the end of a line. Comments mode can also be enabled via the embedded flag expression (?x). |
| Pattern.DOTALL (?s) | In dotall mode, the expression . matches any character, including a line terminator. By default, the . expression does not match line terminators. |
| Pattern.MULTILINE (?m) | In multiline mode, the expressions ^ and $ match the beginning and ending of a line, respectively. ^ also matches the beginning of the input string, and $ also matches the end of the input string. By default, these expressions only match at the beginning and the end of the entire input string. |
| Pattern.UNICODE_CASE (?u) | When this flag is specified, case-insensitive matching, when enabled by the CASE_INSENSITIVE flag, is done in a manner consistent with the Unicode Standard. By default, case-insensitive matching assumes that only characters in the US-ASCII character set are being matched. |
| Pattern.UNIX_LINES (?d) | In this mode, only the \n line terminator is recognized in the behavior of ., ^, and $. |
Particularly useful among these flags are Pattern.CASE_INSENSITIVE,
Pattern.MULTILINE, and Pattern.COMMENTS (which is helpful for clarity and/or
documentation). Note that the behavior of most of the flags can also be obtained by
inserting the parenthesized characters shown next to each flag name in the table into your
regular expression, preceding the place where you want the mode to take effect.
You can combine the effect of these and other flags through an "OR" (|)
operation:
//: c12:ReFlags.java
import java.util.regex.*;
import com.bruceeckel.simpletest.*;
public class ReFlags {
private static Test monitor = new Test();
public static void main(String[] args) {
Pattern p = Pattern.compile("^java",
Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
Matcher m = p.matcher(
"java has regex\nJava has regex\n" +
"JAVA has pretty good regular expressions\n" +
"Regular expressions are in Java");
while(m.find())
System.out.println(m.group());
monitor.expect(new String[] {
"java",
"Java",
"JAVA"
});
}
} ///:~
This creates a pattern that will match lines starting with "java,"
"Java," "JAVA," etc., and attempt a match for each line within a
multiline set (matches starting at the beginning of the character sequence and following
each line terminator within the character sequence). Note that the group( )
method only produces the matched portion.
Splitting divides an input string into an array of String objects, delimited by
the regular expression.
String[] split(CharSequence charseq)
String[] split(CharSequence charseq, int limit)
This is a quick and handy way of breaking up input text over a common boundary:
//: c12:SplitDemo.java
import java.util.regex.*;
import com.bruceeckel.simpletest.*;
import java.util.*;
public class SplitDemo {
private static Test monitor = new Test();
public static void main(String[] args) {
String input =
"This!!unusual use!!of exclamation!!points";
System.out.println(Arrays.asList(
Pattern.compile("!!").split(input)));
// Only do the first three:
System.out.println(Arrays.asList(
Pattern.compile("!!").split(input, 3)));
System.out.println(Arrays.asList(
"Aha! String has a split() built in!".split(" ")));
monitor.expect(new String[] {
"[This, unusual use, of exclamation, points]",
"[This, unusual use, of exclamation!!points]",
"[Aha!, String, has, a, split(), built, in!]"
});
}
} ///:~
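The second form of split( ) limits the number of pieces produced. With a limit of 3, the pattern is applied at most twice, so the result has at most three elements, and the last one holds the rest of the input, unsplit.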
Notice that regular expressions are so valuable that some operations have also been
added to the String class, including split( ) (shown here), matches( ),
replaceFirst( ), and replaceAll( ). These behave like their Pattern
and Matcher counterparts.
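Here is a short sketch of the String versions. Each one is shorthand for compiling a Pattern and using it once. For example, s.replaceAll(regex, r) produces the same result as Pattern.compile(regex).matcher(s).replaceAll(r). The input string is made up for illustration.

```java
import java.util.Arrays;

public class StringRegex {
  public static void main(String[] args) {
    String s = "cat 1, hat 22, mat 333";
    // matches() tests the whole string, not a substring
    System.out.println(s.matches(".*hat.*"));              // true
    System.out.println(s.matches("hat"));                  // false
    System.out.println(s.replaceFirst("\\d+", "#"));       // cat #, hat 22, mat 333
    System.out.println(s.replaceAll("\\d+", "#"));         // cat #, hat #, mat #
    System.out.println(Arrays.toString(s.split(",\\s*"))); // [cat 1, hat 22, mat 333]
  }
}
```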
Regular expressions become especially useful when you begin replacing text. Here are
the available methods:
replaceFirst(String replacement) replaces the first matching part of the input
string with replacement.
replaceAll(String replacement) replaces every matching part of the input string
with replacement.
appendReplacement(StringBuffer sbuf, String replacement) performs step-by-step
replacements into sbuf, rather than replacing only the first one or all of them, as
in replaceFirst( ) and replaceAll( ), respectively. This is a very
important method, because it allows you to call methods and perform other processing in
order to produce replacement (replaceFirst( ) and replaceAll( )
can only put in fixed strings, plus group references such as $1). With this method, you can programmatically pick
apart the groups and create powerful replacements.
appendTail(StringBuffer sbuf) is invoked after one or more
invocations of the appendReplacement( ) method in order to copy the remainder
of the input string.
Here's an example that shows the use of all the replace operations. In addition,
the block of commented text at the beginning is extracted and processed with regular
expressions for use as input in the rest of the example:
//: c12:TheReplacements.java
import java.util.regex.*;
import java.io.*;
import com.bruceeckel.util.*;
import com.bruceeckel.simpletest.*;
/*! Here's a block of text to use as input to
    the regular expression matcher. Note that we'll
    first extract the block of text by looking for
    the special delimiters, then process the
    extracted block. !*/
public class TheReplacements {
  private static Test monitor = new Test();
  public static void main(String[] args) throws Exception {
    String s = TextFile.read("TheReplacements.java");
    // Match the specially-commented block of text above:
    Matcher mInput =
      Pattern.compile("/\\*!(.*)!\\*/", Pattern.DOTALL)
        .matcher(s);
    if(mInput.find())
      s = mInput.group(1); // Captured by parentheses
    // Replace two or more spaces with a single space:
    s = s.replaceAll(" {2,}", " ");
    // Replace one or more spaces at the beginning of each
    // line with no spaces. Must enable MULTILINE mode:
    s = s.replaceAll("(?m)^ +", "");
    System.out.println(s);
    s = s.replaceFirst("[aeiou]", "(VOWEL1)");
    StringBuffer sbuf = new StringBuffer();
    Pattern p = Pattern.compile("[aeiou]");
    Matcher m = p.matcher(s);
    // Process the find information as you
    // perform the replacements:
    while(m.find())
      m.appendReplacement(sbuf, m.group().toUpperCase());
    // Put in the remainder of the text:
    m.appendTail(sbuf);
    System.out.println(sbuf);
    monitor.expect(new String[]{
      "Here's a block of text to use as input to",
      "the regular expression matcher. Note that we'll",
      "first extract the block of text by looking for",
      "the special delimiters, then process the",
      "extracted block. ",
      "H(VOWEL1)rE's A blOck Of tExt tO UsE As InpUt tO",
      "thE rEgUlAr ExprEssIOn mAtchEr. NOtE thAt wE'll",
      "fIrst ExtrAct thE blOck Of tExt by lOOkIng fOr",
      "thE spEcIAl dElImItErs, thEn prOcEss thE",
      "ExtrActEd blOck. "
    });
  }
} ///:~
The file is opened and read using the TextFile.read( ) method introduced
earlier in this chapter. mInput is created to match all the text (notice the
grouping parentheses) between /*! and !*/; the Pattern.DOTALL flag allows . to
match line terminators, so the match can span several lines. Then,
runs of two or more spaces are reduced to a single space, and any spaces at the beginning of each
line are removed (in order to do this on all lines and not just the beginning of the input,
multiline mode must be enabled, which the code does with the embedded (?m) flag). These two replacements are performed with the equivalent
(but more convenient, in this case) replaceAll( ) that's part of String.
Note that since each replacement is only used once in the program, there's no extra
cost to doing it this way rather than precompiling it as a Pattern.
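If the same replacement is applied many times, for example to every line of a large file, the trade-off changes: String.replaceAll( ) compiles its regular expression on every call. A minimal sketch of the precompiled alternative, assuming a hypothetical helper named collapse( ), keeps the Pattern in a static field and creates only a new Matcher per input:

```java
import java.util.regex.*;

public class PrecompiledReplace {
  // Compiled once, when the class is loaded. Calling
  // line.replaceAll(" {2,}", " ") instead would compile the
  // pattern again on every call.
  private static final Pattern SPACES = Pattern.compile(" {2,}");

  // Replaces each run of two or more spaces with a single space.
  public static String collapse(String line) {
    return SPACES.matcher(line).replaceAll(" ");
  }
  public static void main(String[] args) {
    String[] lines = { "a  b", "c   d   e", "f g" };
    for (int i = 0; i < lines.length; i++)
      System.out.println(collapse(lines[i]));
  }
}
```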
replaceFirst( ) only performs the first replacement that it finds. In
addition, the replacement strings in replaceFirst( ) and replaceAll( )
are fixed (the same string is used for every match), so if you want to perform some processing on each replacement they
don't help. In that case, you need to use appendReplacement( ), which
allows you to write any amount of code in the process of performing the replacement. In
the preceding example, a group( ) is selected and processed (in this
situation, the vowel found by the regular expression is converted to uppercase) as the
resulting sbuf is being built. Normally, you would step through and perform all the
replacements and then call appendTail( ), but if you wanted to simulate replaceFirst( )
(or replace n), you would just perform the replacement once (or n times) and then call appendTail( )
to put the rest into sbuf.
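Here is one way the "replace n" idea might look, as a sketch rather than a library method: the hypothetical replaceN( ) helper stops calling appendReplacement( ) after n matches and then lets appendTail( ) copy everything that follows. If there are no matches at all, appendTail( ) simply copies the whole input.

```java
import java.util.regex.*;

public class ReplaceN {
  // Hypothetical helper (not part of java.util.regex):
  // replaces at most n matches of regex in input, leaving the rest untouched.
  public static String replaceN(String input, String regex,
                                String replacement, int n) {
    Matcher m = Pattern.compile(regex).matcher(input);
    StringBuffer sbuf = new StringBuffer();
    int count = 0;
    // Stop after n replacements, or earlier if the matches run out:
    while (count < n && m.find()) {
      m.appendReplacement(sbuf, replacement);
      count++;
    }
    // Copy everything after the last replacement (or the whole input if none):
    m.appendTail(sbuf);
    return sbuf.toString();
  }
  public static void main(String[] args) {
    System.out.println(replaceN("a1b2c3d4", "\\d", "#", 2)); // a#b#c3d4
    System.out.println(replaceN("a1b2c3d4", "\\d", "#", 1)); // a#b2c3d4
  }
}
```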
appendReplacement( ) also allows you to refer to captured groups directly
in the replacement string by saying $g where g is the group
number. However, this is for simpler processing and wouldn't give you the desired
results in the preceding program.
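For simpler cases, the $g form is enough. This sketch (hypothetical class GroupReferences) reorders dates written as yyyy-mm-dd into dd/mm/yyyy by referring to the three captured groups from the replacement string passed to appendReplacement( ):

```java
import java.util.regex.*;

public class GroupReferences {
  // Rewrites dates from yyyy-mm-dd to dd/mm/yyyy. "$1", "$2" and "$3"
  // in the replacement string refer to the three captured groups.
  public static String toDayFirst(String text) {
    Matcher m = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})").matcher(text);
    StringBuffer sbuf = new StringBuffer();
    while (m.find())
      m.appendReplacement(sbuf, "$3/$2/$1");
    m.appendTail(sbuf);
    return sbuf.toString();
  }
  public static void main(String[] args) {
    // Prints: Released 31/12/2002, patched 15/01/2003
    System.out.println(toDayFirst("Released 2002-12-31, patched 2003-01-15"));
  }
}
```

The same replacement string would also work with String.replaceAll( ), which accepts the same $g syntax. Because $ and \ are special in replacement strings, a literal dollar sign must be written as \$ (Java 5 added Matcher.quoteReplacement( ) to do this escaping for you).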
An existing Matcher object can be applied to a new character sequence using the reset( )
methods:
//: c12:Resetting.java
import java.util.regex.*;
import java.io.*;
import com.bruceeckel.simpletest.*;
public class Resetting {
  private static Test monitor = new Test();
  public static void main(String[] args) throws Exception {
    Matcher m = Pattern.compile("[frb][aiu][gx]")
      .matcher("fix the rug with bags");
    while(m.find())
      System.out.println(m.group());
    m.reset("fix the rig with rags");
    while(m.find())
      System.out.println(m.group());
    monitor.expect(new String[]{
      "fix",
      "rug",
      "bag",
      "fix",
      "rig",
      "rag"
    });
  }
} ///:~
reset( ) without any arguments sets the Matcher to the beginning of
the current sequence.
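As a small illustration of the no-argument form, the following sketch (class and method names are illustrative) finds two matches, calls reset( ), and then finds the first match again, because the search restarts at the beginning of the same input:

```java
import java.util.regex.*;

public class ResetNoArgs {
  // Returns the text of the next match, or null if there is none.
  static String next(Matcher m) {
    return m.find() ? m.group() : null;
  }
  // Finds two matches, rewinds with reset(), then finds one more.
  public static String demo(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    String a = next(m);
    String b = next(m);
    m.reset();  // no arguments: same input, search starts over at the beginning
    String c = next(m);
    return a + "," + b + "," + c;
  }
  public static void main(String[] args) {
    System.out.println(demo("[frb][aiu][gx]", "fix the rug with bags")); // fix,rug,fix
  }
}
```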
Most of the examples so far have shown regular expressions applied to static strings.
The following example shows one way to apply regular expressions to search for matches in
a file. Inspired by Unix's grep, JGrep.java takes two arguments: a
filename and the regular expression that you want to match. For each match, the output
shows the line number, the matched text, and the match position within the line.
//: c12:JGrep.java
// A very simple version of the "grep" program.
// {Args: JGrep.java "\\b[Ssct]\\w+"}
import java.io.*;
import java.util.regex.*;
import java.util.*;
import com.bruceeckel.util.*;
public class JGrep {
  public static void main(String[] args) throws Exception {
    if(args.length < 2) {
      System.out.println("Usage: java JGrep file regex");
      System.exit(0);
    }
    Pattern p = Pattern.compile(args[1]);
    // Iterate through the lines of the input file:
    ListIterator it = new TextFile(args[0]).listIterator();
    while(it.hasNext()) {
      Matcher m = p.matcher((String)it.next());
      while(m.find())
        System.out.println(it.nextIndex() + ": " +
          m.group() + ": " + m.start());
    }
  }
} ///:~
The file is opened as a TextFile object (these were introduced earlier in this
chapter). Since a TextFile holds the lines of the file in an ArrayList,
a ListIterator can be produced from it. The result is an iterator that will
allow you to move through the lines of the file (forward and backward).
Each input line is used to produce a Matcher, and the result is scanned with find( ).
Note that ListIterator.nextIndex( ) supplies the line numbers: after next( ) has returned
a line, nextIndex( ) equals that line's one-based position in the file.
The test arguments open the JGrep.java file to read as input, and search for
words that begin with S, s, c, or t (the pattern \b[Ssct]\w+ also requires at least one
more word character after that first letter).
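JGrep depends on the TextFile utility from this chapter. If you want the same output without it, one possible sketch reads lines from any Reader through a BufferedReader and counts them itself; it also reuses a single Matcher with reset( ) instead of creating a new one per line. The class name ReaderGrep is made up, and the code assumes Java 5 or later for generics and the for-each loop:

```java
import java.io.*;
import java.util.*;
import java.util.regex.*;

public class ReaderGrep {
  // Returns one "lineNumber: match: start" entry per match, in the same
  // format as JGrep, but reading lines from any Reader.
  public static List<String> grep(Reader source, String regex) throws IOException {
    Matcher m = Pattern.compile(regex).matcher(""); // reused via reset(line)
    List<String> results = new ArrayList<String>();
    BufferedReader in = new BufferedReader(source);
    String line;
    int lineNumber = 0;
    while ((line = in.readLine()) != null) {
      lineNumber++;  // one-based, like it.nextIndex() in JGrep
      m.reset(line);
      while (m.find())
        results.add(lineNumber + ": " + m.group() + ": " + m.start());
    }
    return results;
  }
  public static void main(String[] args) throws IOException {
    if (args.length < 2) {
      System.out.println("Usage: java ReaderGrep file regex");
      return;
    }
    Reader r = new FileReader(args[0]);
    try {
      for (String hit : grep(r, args[1]))
        System.out.println(hit);
    } finally {
      r.close();
    }
  }
}
```

Passing a StringReader instead of a FileReader makes the method easy to try on a literal string.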
The new capabilities provided with regular expressions might prompt you to wonder
whether the original StringTokenizer class is still
necessary. Before JDK 1.4, the way to split a string into parts was to
tokenize it with StringTokenizer. But now it's much easier and
more succinct to do the same thing with regular expressions:
//: c12:ReplacingStringTokenizer.java
import java.util.regex.*;
import com.bruceeckel.simpletest.*;
import java.util.*;
public class ReplacingStringTokenizer {
  private static Test monitor = new Test();
  public static void main(String[] args) {
    String input = "But I'm not dead yet! I feel happy!";
    StringTokenizer stoke = new StringTokenizer(input);
    while(stoke.hasMoreTokens())
      System.out.println(stoke.nextToken());
    System.out.println(Arrays.asList(input.split(" ")));
    monitor.expect(new String[] {
      "But",
      "I'm",
      "not",
      "dead",
      "yet!",
      "I",
      "feel",
      "happy!",
      "[But, I'm, not, dead, yet!, I, feel, happy!]"
    });
  }
} ///:~
With regular expressions, you can also split a string into parts using more complex
patterns, something that's much more difficult with StringTokenizer. It
seems safe to say that regular expressions replace any tokenizing classes in earlier
versions of Java.
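As an example of a delimiter that StringTokenizer cannot express, this sketch (hypothetical class PatternSplit) splits a list on a comma, a semicolon, or the whole word "and", swallowing any surrounding whitespace. StringTokenizer treats its delimiter argument as a set of individual characters, so it has no way to recognize a multi-character word as a separator.

```java
import java.util.*;

public class PatternSplit {
  // Splits on ",", ";" or the word "and" (the \b anchors keep words
  // such as "sandwich" intact), absorbing whitespace on either side.
  public static String[] items(String list) {
    return list.split("\\s*(,|;|\\band\\b)\\s*");
  }
  public static void main(String[] args) {
    // Prints: [apples, pears, plums, figs]
    System.out.println(Arrays.asList(items("apples, pears;plums and figs")));
    // Prints: [bread, sandwiches]
    System.out.println(Arrays.asList(items("bread and sandwiches")));
  }
}
```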
You can learn much more about regular expressions in Mastering Regular Expressions,
2nd Edition, by Jeffrey E. F. Friedl (O'Reilly, 2002).
The Java I/O stream library does satisfy the basic requirements: you can perform
reading and writing with the console, a file, a block of memory, or even across the
Internet. With inheritance, you can create new types of input and output objects. And you
can even add a simple extensibility to the kinds of objects a stream will accept by
redefining the toString( ) method that's automatically called when you
pass an object to a method such as print( ) that accepts an Object, or when you
concatenate the object with a String (Java's limited automatic type conversion).
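The toString( ) point can be seen in a short sketch. The Point class here is hypothetical; PrintWriter.print(Object) converts its argument with String.valueOf( ), which calls toString( ), and string concatenation does the same:

```java
import java.io.*;

public class ToStringOutput {
  // Hypothetical value class used only for this illustration.
  static class Point {
    private final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    // Called automatically by print(Object) and by string concatenation.
    public String toString() { return "Point(" + x + ", " + y + ")"; }
  }
  // Writes two Points to an in-memory stream and returns what was written.
  public static String render(int x, int y) {
    StringWriter sw = new StringWriter();
    PrintWriter out = new PrintWriter(sw);
    out.print(new Point(x, y));         // print(Object) -> String.valueOf() -> toString()
    out.print(" / " + new Point(y, x)); // concatenation also calls toString()
    out.flush();
    return sw.toString();
  }
  public static void main(String[] args) {
    System.out.println(render(3, 4)); // Point(3, 4) / Point(4, 3)
  }
}
```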
There are questions left unanswered by the documentation and design of the I/O stream
library. For example, it would have been nice if you could say that you want an exception
thrown if you try to overwrite a file when opening it for output; some programming
systems allow you to specify that you want to open an output file, but only if it
doesn't already exist. In Java, it appears that you are supposed to use a File
object to determine whether a file exists, because if you open it as a FileOutputStream
or FileWriter, it will always get overwritten (unless you use the constructor that opens
it in append mode).
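Java does offer a partial answer to the "only if it doesn't already exist" question: File.createNewFile( ) creates an empty file and returns false if a file with that name is already there, and the JDK documentation describes the existence check and the creation as a single atomic operation. The following sketch (hypothetical class CreateIfAbsent and file name) builds on that; it assumes that silently refusing to write is the desired behavior, rather than throwing an exception:

```java
import java.io.*;

public class CreateIfAbsent {
  // Writes text to f only if f did not already exist.
  // Returns true if the file was created and written,
  // false if a file with that name was already there.
  public static boolean writeIfAbsent(File f, String text) throws IOException {
    // createNewFile() atomically checks for the file and creates it
    // (empty) if absent; it returns false if the file already exists.
    if (!f.createNewFile())
      return false;
    Writer w = new FileWriter(f);
    try {
      w.write(text);
    } finally {
      w.close();
    }
    return true;
  }
  public static void main(String[] args) throws IOException {
    File f = new File("output.txt"); // hypothetical file name
    System.out.println(writeIfAbsent(f, "first"));
    System.out.println(writeIfAbsent(f, "second")); // existing file is left alone
  }
}
```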
The I/O stream library brings up mixed feelings; it does much of the job and it's
portable. But if you don't already understand the decorator pattern, the design is
not intuitive, so there's extra overhead in learning and teaching it. It's also
incomplete; for example, I shouldn't have to write utilities like TextFile,
and there's no support for the kind of output formatting that virtually every other
language's I/O package supports.
However, once you do understand the decorator pattern and begin using the
library in situations that require its flexibility, you can begin to benefit from this
design, at which point its cost in extra lines of code may not bother you as much.
If you do not find what you're looking for in this chapter (which has only been an
introduction and is not meant to be comprehensive), you can find in-depth coverage in Java
I/O, by Elliotte Rusty Harold (O'Reilly, 1999).
Solutions to selected exercises can be found in the electronic document The Thinking
in Java Annotated Solution Guide, available for a small fee from www.BruceEckel.com.
^Java
\Breg.*
n.w\s+h(a|i)s
s?
s*
s+
s{4}
s{1.}
s{0,3}
(?i)((^[aeiou])|(\s+[aeiou]))\w+?[aeiou]\b
to
"Arline ate eight apples and one orange while Anita hadn't any"
String[] filenames = new File(".").list();
[61] Design Patterns, Erich Gamma et al., Addison-Wesley, 1995.
[62] It's not clear that this was a good design decision, especially compared to the
simplicity of I/O libraries in other languages. But it's the justification for the decision.
[63] XML is another
way to solve the problem of moving data across different computing platforms, and does not
depend on having Java on all platforms. JDK 1.4 contains XML tools in javax.xml.*
libraries. These are covered in Thinking in Enterprise Java, at www.MindView.net.
[64] Chapter 13 shows
an even more convenient solution for this: a GUI program with a scrolling text area.
[65] Chintan Thakker
contributed to this section.
[66] A chapter
dedicated to strings will have to wait until the 4th edition. Mike Shea
contributed to this section.
[67] I have no idea how they came up with this method name, or what it's supposed to
refer to. But it's reassuring to know that whoever comes up with nonintuitive method names
is still employed at Sun. And that their apparent policy of not reviewing code designs is
still in place. Sorry for the sarcasm, but this kind of thing gets tiresome after a few years.