
Java Tutorial 9 - File IO

In addition to redirection using standard IO streams, Java provides file streams, data streams, pipe streams and object streams. These require access to the java.io package using the instruction import java.io.*. Complex structures such as records and trees can be stored in files using object streams.

File Management

File management is the way in which files are monitored and controlled for standard I/O access. The File class provides a constructor to create a file handle: an object representing a file or directory path (constructing it does not open or create the file). This file handle is then used by File class methods to access the properties of a specific file, or passed to file stream constructors to open files. The File class also provides the appropriate platform-dependent directory and file separator symbol (slash or backslash) as either File.separator (a String) or File.separatorChar (a char). Simple File constructors either use hardcoded names or pass a value from the command line, such as:

File simple=new File("sample.dat"); // simple name in current dir
File path=new File("subfolder/sample.dat"); // using relative path
File hard=new File(File.separator + "sample.dat"); // in root dir
File soft=new File(args[0]); // entered as part of command line

Accessor methods: getAbsolutePath(), getCanonicalPath(), getName(), getPath(), getParent(), lastModified(), length(), list() [returns array of String], listFiles() [returns array of File objects].

Mutator methods: delete(), deleteOnExit(), mkdir(), mkdirs(), renameTo(), setLastModified(), setReadOnly().

Boolean methods: canRead(), canWrite(), exists(), isAbsolute(), isDirectory(), isFile(), isHidden(). (compareTo() is a comparison method rather than a boolean one: it returns an int that orders two pathnames.)

Here is a very useful routine to establish the current directory path:
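A minimal sketch of such a routine (not necessarily the original one; the class name CurrentDir is hypothetical) uses two standard calls: the user.dir system property, and getCanonicalPath() applied to a File for ".".

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper class: two common ways to find the current directory.
class CurrentDir {
    // The "user.dir" system property holds the working directory at JVM startup.
    static String fromProperty() {
        return System.getProperty("user.dir");
    }

    // Resolving "." through a File object names the same directory;
    // getCanonicalPath() removes "." and ".." components from the result.
    static String fromFile() throws IOException {
        return new File(".").getCanonicalPath();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("user.dir  : " + fromProperty());
        System.out.println("canonical : " + fromFile());
    }
}
```

Both normally name the directory the program was started from, although symbolic links can make the two strings differ.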

To limit what is returned by the list() method, apply a filter using the FilenameFilter interface. accept() is the only method the interface declares. A program showing the use of FilenameFilter is:
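A minimal sketch, assuming we want only the names that end with a given extension (the class ExtensionFilter and the default ".java" extension are hypothetical choices for illustration):

```java
import java.io.File;
import java.io.FilenameFilter;

// Hypothetical filter: keeps directory entries whose names end with an extension.
class ExtensionFilter implements FilenameFilter {
    private final String extension;   // e.g. ".java" (example value only)

    ExtensionFilter(String extension) {
        this.extension = extension.toLowerCase();
    }

    // accept() is called once per directory entry; return true to keep the name.
    @Override
    public boolean accept(File dir, String name) {
        return name.toLowerCase().endsWith(extension);
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        String ext = args.length > 1 ? args[1] : ".java";
        String[] names = dir.list(new ExtensionFilter(ext));
        if (names == null) {   // list() returns null if dir is not a readable directory
            System.out.println(dir + " is not a readable directory");
            return;
        }
        for (String name : names) {
            System.out.println(name);
        }
    }
}
```

Run it as java ExtensionFilter somedir .txt. Note that list() returns null rather than an empty array when the File is not a readable directory, so the result should be checked before use.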

Note: More sophisticated GUI techniques for selecting files include the Swing JFileChooser class and its AWT cousin FileDialog. These classes also support file filters, along with checks for a file's existence and accessibility.

File Streams

File streams are primitive streams whose sources or destinations are files. Both byte [8-bit] (FileInputStream / FileOutputStream) and character [16-bit] (FileReader / FileWriter) quantities can be used. Streams are opened when constructed. The constructors are:

FileInputStream(fileObj)                  FileReader(fileObj)
FileInputStream(filePath)                 FileReader(filePath)
FileOutputStream(fileObj[,append_flag])   FileWriter(fileObj[,append_flag])
FileOutputStream(filePath[,append_flag])  FileWriter(filePath[,append_flag])
InputStreamReader(System.in) /*read from console*/

Note: Use character streams for text in new code! They convert between bytes and 16-bit Unicode characters, which allows Unicode material to be processed correctly.

Note: Always close any opened output file to make sure that the file buffers are completely written.

Note: The read() method returns an int even when character streams are used, so that it can return -1 to signal end of file in addition to every valid byte or character value.

copybyte.java uses primitive 8-bit file streams to read and copy bytes to a new file. Set DEBUG=true to get a diagnostic byte/ASCII screen dump. Use copybyte as a starting point for other utilities by placing code between the read() and write() methods. However, there are more efficient ways of handling most types of data: file streams should be wrapped and buffered in data streams. Refer to fileCopy() for a very efficient network file duplication method.
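A minimal sketch in the spirit of copybyte (not the original copybyte.java; the class name CopyBytes is hypothetical). It reads with FileInputStream and writes with FileOutputStream one byte at a time, and uses try-with-resources (Java 7+) so both files are closed, and the output fully written, even if an exception occurs:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical class: byte-by-byte copy with raw 8-bit file streams.
class CopyBytes {
    static final boolean DEBUG = false;   // true prints each byte as hex and as a character

    static long copy(String source, String target) throws IOException {
        long count = 0;
        // try-with-resources closes both streams automatically
        try (FileInputStream in = new FileInputStream(source);
             FileOutputStream out = new FileOutputStream(target)) {
            int b;   // read() returns an int: 0..255, or -1 at end of file
            while ((b = in.read()) != -1) {
                if (DEBUG) {
                    System.out.printf("%02x %c%n", b, (b >= 32 && b < 127) ? (char) b : '.');
                }
                // place per-byte processing here, between read() and write()
                out.write(b);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java CopyBytes source target");
            return;
        }
        System.out.println(copy(args[0], args[1]) + " bytes copied");
    }
}
```

Because every read() and write() goes straight to the file, this is slow on large files; wrapping the streams in BufferedInputStream and BufferedOutputStream is the usual remedy.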

Data Streams

Data streams are streams whose sources and destinations are other streams. They are known as wrappers because they wrap the primitive file stream object inside a more powerful one. Buffered wrappers read or write a whole block of data at a time rather than passing a single 8/16-bit quantity to the underlying stream on every call. The basic buffered streams are BufferedInputStream(), BufferedOutputStream(), BufferedReader() and BufferedWriter().

DataInputStream() and DataOutputStream() streams can also be used to read/write primitive data types. Some useful methods are read(), readXxx(), write() and writeXxx(), where Xxx is a primitive type name such as Int or Double (e.g. readInt(), writeDouble()).
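A minimal sketch of a wrapped data stream chain (the class DataRoundTrip, the file name student.dat and the field values are hypothetical). Values must be read back with the matching readXxx() methods in the same order they were written:

```java
import java.io.*;

// Hypothetical class: write primitives through DataOutputStream, read them back.
class DataRoundTrip {
    static void write(File file, int id, double score, String name) throws IOException {
        // Chain: DataOutputStream -> BufferedOutputStream -> FileOutputStream
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeInt(id);        // 4 bytes
            out.writeDouble(score);  // 8 bytes
            out.writeUTF(name);      // 2-byte length, then modified UTF-8 bytes
        }
    }

    static String read(File file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            int id = in.readInt();          // same order as written
            double score = in.readDouble();
            String name = in.readUTF();
            return id + " " + score + " " + name;
        }
    }

    public static void main(String[] args) throws IOException {
        File file = new File("student.dat");   // hypothetical file name
        write(file, 42, 87.5, "Ada");
        System.out.println(read(file));        // prints: 42 87.5 Ada
    }
}
```

The resulting file holds 17 bytes: 4 for the int, 8 for the double and 5 for the writeUTF() string (a 2-byte length followed by the 3 characters).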

Note: Writing files in binary rather than character format removes the temptation to edit stored data directly (for example, with a text editor).

Note: Java does not provide an EOF() method as some other languages do! At end of file, read() returns the integer -1 and readLine() returns null. The DataInputStream readXxx() methods (readInt(), readDouble(), etc.) cannot signal end of file through a return value, so they throw an EOFException instead; use an exception handler to catch it and handle the end of file explicitly.
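A minimal sketch of the EOFException technique, assuming a file that contains only ints written by writeInt() (the class SumInts and the file name numbers.dat are hypothetical):

```java
import java.io.*;

// Hypothetical class: sum every int in a file of unknown length.
class SumInts {
    static void write(File file, int... values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            for (int v : values) out.writeInt(v);
        }
    }

    static long sum(File file) throws IOException {
        long total = 0;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            while (true) {
                total += in.readInt();   // throws EOFException when no int is left
            }
        } catch (EOFException e) {
            // normal end of data: fall through and return the total
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        File file = new File("numbers.dat");   // hypothetical file name
        write(file, 1, 2, 3, 4, 5);
        System.out.println(sum(file));         // prints: 15
    }
}
```

One caveat: readInt() also throws EOFException if the file ends partway through an int, so this loop silently ignores a truncated final record.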

copyline.java uses buffered 16-bit character streams to read and copy lines to a new file. Many utilities rely on text files, which are often best handled one line at a time. Set STRIP=true to remove blank lines from the file. Use copyline as a starting point for your own utility by altering the lines between the readLine() and write() methods.
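A minimal sketch in the spirit of copyline (not the original copyline.java; the class name CopyLines is hypothetical). Treating whitespace-only lines as blank when STRIP is set is an assumption of this sketch:

```java
import java.io.*;

// Hypothetical class: line-by-line copy with buffered character streams.
class CopyLines {
    static final boolean STRIP = true;   // true drops blank (and whitespace-only) lines

    static int copy(String source, String target) throws IOException {
        int written = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(source));
             BufferedWriter out = new BufferedWriter(new FileWriter(target))) {
            String line;
            while ((line = in.readLine()) != null) {   // readLine() returns null at end of file
                if (STRIP && line.trim().isEmpty()) {
                    continue;
                }
                // place per-line processing here, between readLine() and write()
                out.write(line);
                out.newLine();   // readLine() strips the terminator, so add the platform's one
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java CopyLines source target");
            return;
        }
        System.out.println(copy(args[0], args[1]) + " lines written");
    }
}
```

FileReader and FileWriter use the platform's default character encoding, which is usually fine for files created on the same machine.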

Stream Tokenization

The StreamTokenizer class can be used to read tokens directly from a file stream! This makes some utilities more efficient because they can work with individual tokens rather than lines of text (i.e. the input has already been parsed). The constructor takes a Reader object, such as a FileReader, as its parameter. resetSyntax() makes every character ordinary so that custom delimiters can then be defined. Whitespace is defined with the whitespaceChars(low, hi) method, and valid word characters with the wordChars(low, hi) method. Unfortunately this selection by character range limits the usefulness of the class! The eolIsSignificant(true) method allows newlines to be detected. ttype contains the type of the token scanned by nextToken(), and the token itself sits in either nval (numeric) or sval (string). copytoken.java uses a 16-bit (character) token stream to read and copy tokens to a new file.
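A minimal sketch in the spirit of copytoken (not the original copytoken.java; the class name CopyTokens is hypothetical) that writes each token on its own line. It calls ordinaryChar('/') because the default StreamTokenizer syntax treats '/' as the start of a comment, which would silently drop the rest of a line such as </p> in HTML:

```java
import java.io.*;

// Hypothetical class: copy tokens, one per output line.
class CopyTokens {
    static int copy(Reader source, Writer target) throws IOException {
        StreamTokenizer st = new StreamTokenizer(source);
        st.ordinaryChar('/');   // default syntax treats '/' as a comment start
        int count = 0;
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            String token;
            if (st.ttype == StreamTokenizer.TT_WORD) {
                token = st.sval;                              // word token
            } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                token = String.valueOf(st.nval);              // numbers are parsed as double
            } else if (st.ttype == '"' || st.ttype == '\'') {
                token = (char) st.ttype + st.sval + (char) st.ttype;   // quoted string
            } else {
                token = String.valueOf((char) st.ttype);      // single ordinary character
            }
            target.write(token);
            target.write(System.lineSeparator());
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java CopyTokens source target");
            return;
        }
        try (Reader in = new BufferedReader(new FileReader(args[0]));
             Writer out = new BufferedWriter(new FileWriter(args[1]))) {
            System.out.println(copy(in, out) + " tokens copied");
        }
    }
}
```

With the default syntax numbers are parsed as doubles, so the token 42 is written as 42.0; call resetSyntax() and define your own word and whitespace ranges if that is not wanted.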

Note: Tokenization discards whitespace, which makes it a great method for compressing HTML source files into a server copy. Use copytoken as a starting point and add your own utility between the read and write operations. One easy project to start with is wc (the Unix word count utility).

Scanner class objects (java.util.Scanner) are constructed with a stream as their parameter (e.g. Scanner(inStream) or Scanner(System.in)). Scanner has the methods next() and nextXxx() [where Xxx is a primitive type such as Int], hasNext(), hasNextXxx(), useDelimiter(regExp), useRadix(int) [defaults to base 10], findInLine(regExp) and skip(). If nextXxx() gets a token that does not match the Xxx type, it throws an InputMismatchException.

Scanner input = new Scanner(System.in);           // set up input stream (import java.util.Scanner)
System.out.println("Enter a positive integer");   // prompt user
int num = input.nextInt();                        // fetch the user's response
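A minimal sketch in the spirit of the oddeven project (not the original; the class OddEven is hypothetical) that checks hasNextInt() before calling nextInt(), so non-numeric input is skipped instead of raising an InputMismatchException:

```java
import java.util.Scanner;

// Hypothetical class: count odd and even integers, skipping other tokens.
class OddEven {
    // Returns {oddCount, evenCount}.
    static int[] classify(Scanner input) {
        int odd = 0, even = 0;
        while (input.hasNext()) {
            if (input.hasNextInt()) {   // check first to avoid InputMismatchException
                int n = input.nextInt();
                if (n % 2 == 0) even++; else odd++;
            } else {
                input.next();           // discard a token that is not an int
            }
        }
        return new int[] {odd, even};
    }

    public static void main(String[] args) {
        Scanner input = new Scanner(System.in);
        System.out.println("Enter integers (end input with Ctrl-D, or Ctrl-Z on Windows):");
        int[] counts = classify(input);
        System.out.println(counts[0] + " odd, " + counts[1] + " even");
    }
}
```

Because classify() takes a Scanner, it can be tried without the keyboard: a Scanner can also be constructed from a String, e.g. classify(new Scanner("1 2 x 3")).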

Random Access Files

Random access files allow files to be accessed at a specific point in the file. They can also be opened in read/write mode, which allows updating of an existing file. The constructor is RandomAccessFile(File fileObject, String mode), where mode is normally "r" (read only) or "rw" (read/write); a String file name may be used in place of the File object. The seek(long position) method moves the file position pointer, which also advances automatically on every read or write. The getFilePointer() method returns the current file position pointer. The file size can be adjusted with setLength(). Normal I/O methods such as readInt() and writeInt() are used for access.
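A minimal sketch of seek()-based updating, assuming a file of fixed-size records where each record is a single 4-byte int (the class RandomUpdate and the file name records.dat are hypothetical):

```java
import java.io.*;

// Hypothetical class: fixed-size int records, updated in place with seek().
class RandomUpdate {
    static final int RECORD_SIZE = 4;   // each record is one int in this sketch

    static void create(File file, int[] values) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);                       // discard any old contents
            for (int v : values) raf.writeInt(v);   // pointer advances 4 bytes per write
        }
    }

    static void update(File file, int index, int newValue) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.seek((long) index * RECORD_SIZE);   // jump straight to the record
            raf.writeInt(newValue);                 // overwrite it in place
        }
    }

    static int read(File file, int index) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek((long) index * RECORD_SIZE);
            return raf.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = new File("records.dat");   // hypothetical file name
        create(file, new int[] {10, 20, 30, 40});
        update(file, 2, 99);
        System.out.println(read(file, 2) + ", length " + file.length());   // prints: 99, length 16
    }
}
```

The key design choice is the fixed record size: it lets the byte offset of record i be computed as i * RECORD_SIZE, which is what makes direct access to any record possible.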

copyrandom.java is a working random access file IO system that uses byte streams to read and copy binary data to a new file. Note that it illustrates file creation but does not demonstrate either access at specific points in the file or updating of a file.

mirror.java shows a very simple use of the seek() method. The source file is read backwards and each byte is written to a new file. This is one of the simplest forms of encryption possible (really just scrambling, since it is trivially reversed).
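A minimal sketch in the spirit of mirror (not the original mirror.java; the class name Mirror is hypothetical). It seeks to each position from the last byte back to the first:

```java
import java.io.*;

// Hypothetical class: write a file's bytes to a new file in reverse order.
class Mirror {
    static void reverse(String source, String target) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(source, "r");
             OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
            for (long pos = in.length() - 1; pos >= 0; pos--) {
                in.seek(pos);           // move to the next byte, counting from the end
                out.write(in.read());   // read() returns 0..255 here since pos < length
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java Mirror source target");
            return;
        }
        reverse(args[0], args[1]);
    }
}
```

Running the program on its own output restores the original file. Seeking once per byte is slow for large files; reading blocks and reversing them in memory would be faster.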

RandomAccess.java is a more complete example that uses a graphical user interface (GUI) to alter file contents. Since it extends the GenericApplication class, that file must be compiled first. Once both files are compiled, test with java RandomAccess xxx where xxx is the filename.

I/O Projects

school uses standard IO and data stream buffering to get a student's gender and age, then generates a gender- and age-specific response corresponding to the normal school level.

oddeven uses a Scanner class object to fetch numbers from the standard input stream.

TextIO is a reusable class with methods to open files as buffered data streams, close files, and read/write text character by character or line by line. The project tests your knowledge of basic file IO as well as how to encapsulate an object. Guard against closing unopened files. Add a readTag() method that returns the next HTML tag string (text enclosed in angle brackets). To give a test driver program something to do, write a worker class that removes comment lines from a file; this requires the indexOf() and substring() methods of the String class. Write a second driver program that fetches tags from the sample file to test the readTag() method.

longest uses command line glob expansion to run a batch operation: a method that identifies the longest line in each file matched by the glob. It demonstrates reuse of the TextIO class.

NameReader is a small project that demonstrates reuse of the TextIO class project as well as simple control and string work. It forms a good review of Java topics to this point. NameReader reads lines of firstname surname from a file and writes a file of names based on one of three options. Option 1 writes every line. Option 2 writes only the lines where the firstname and surname share the same initial (e.g. Adam Ant). Option 3 writes only the lines where the surname starts with a specific initial given on the command line.

wordCount2 adds file IO to the previous word counting project. Reuse wordCount1 and add the TextIO class to the workspace. A GUI will be added as part of a case study. Note: wcPlus adds the advantages of stream tokenization, dynamic arrays (collections) and multiple file analysis.

XCheck2 adds basic file IO to the previous HTML analysis project. Reuse XCheck1 and add the TextIO class to the workspace. A GUI will be added as part of a case study.

Tutorial Source Code

Obtain source for Concord, concordance, copybyte, mirror, longest, oddeven, RandomAccess, school, textIO, wordCount2, wcPlus, XCheck2, etc. here.



JR's HomePage | Comments [jatutor9.htm:2014 04 04]