Read a File Into a String Java
Reading files in Java is the cause of a lot of confusion. There are multiple ways of accomplishing the same task and it's often not clear which file reading method is best to use. Something that's quick and dirty for a small example file might not be the best method to use when you need to read a very large file. Something that worked in an earlier Java version might not be the preferred method anymore.
This article aims to be the definitive guide for reading files in Java 7, 8 and 9. I'm going to cover all the ways you can read files in Java. Too often, you'll read an article that tells you one way to read a file, only to discover later there are other ways to do that. I'm actually going to cover 15 different ways to read a file in Java. I'm going to cover reading files in multiple ways with the core Java libraries as well as two third party libraries.
But that's not all – what good is knowing how to do something in multiple ways if you don't know which way is best for your situation?
I also put each of these methods to a real performance test and document the results. That way, you will have some hard data to know the performance metrics of each method.
Methodology
JDK Versions
Java code samples don't live in isolation, especially when it comes to Java I/O, as the API keeps evolving. All code for this article has been tested on:
- Java SE 7 (jdk1.7.0_80)
- Java SE 8 (jdk1.8.0_162)
- Java SE 9 (jdk-9.0.4)
When there is an incompatibility, it will be stated in that section. Otherwise, the code works unaltered for different Java versions. The main incompatibility is the use of lambda expressions, which were introduced in Java 8.
Java File Reading Libraries
There are multiple ways of reading from files in Java. This article aims to be a comprehensive collection of all the different methods. I will cover:
- java.io.FileReader.read()
- java.io.BufferedReader.readLine()
- java.io.FileInputStream.read()
- java.io.BufferedInputStream.read()
- java.nio.file.Files.readAllBytes()
- java.nio.file.Files.readAllLines()
- java.nio.file.Files.lines()
- java.util.Scanner.nextLine()
- org.apache.commons.io.FileUtils.readLines() – Apache Commons
- com.google.common.io.Files.readLines() – Google Guava
Closing File Resources
Prior to JDK7, when opening a file in Java, all file resources would need to be manually closed using a try-catch-finally block. JDK7 introduced the try-with-resources statement, which simplifies the process of closing streams. You no longer need to write explicit code to close streams because the stream is closed automatically for you, whether an exception occurred or not. All examples used in this article use the try-with-resources statement for importing, loading, parsing and closing files.
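To make the difference concrete, here is a minimal sketch that puts the two styles side by side. The class TryWithResourcesSketch and its TrackingReader helper are hypothetical names made up for this illustration; the helper only exists so you can check that close() really was called in both cases.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class TryWithResourcesSketch {

    /** Hypothetical helper: a reader that remembers whether close() was called. */
    static class TrackingReader extends StringReader {
        boolean closed = false;

        TrackingReader(String text) {
            super(text);
        }

        @Override
        public void close() {
            closed = true;
            super.close();
        }
    }

    /** Pre-JDK7 style: the reader is closed by hand in a finally block. */
    static int readFirstCharOldStyle(Reader reader) throws IOException {
        try {
            return reader.read();
        } finally {
            reader.close();
        }
    }

    /** JDK7+ style: try-with-resources closes the reader when the block exits. */
    static int readFirstChar(Reader reader) throws IOException {
        try (Reader r = reader) {
            return r.read();
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingReader oldStyle = new TrackingReader("hello");
        TrackingReader newStyle = new TrackingReader("hello");

        System.out.println((char) readFirstCharOldStyle(oldStyle));
        System.out.println((char) readFirstChar(newStyle));
        System.out.println("closed by finally block: " + oldStyle.closed);
        System.out.println("closed by try-with-resources: " + newStyle.closed);
    }
}
```

Both methods close the reader even if read() throws; the second one simply needs less code to do it.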
File Location
All examples will read test files from C:\temp.
Encoding
Character encoding is not explicitly saved with text files so Java makes assumptions about the encoding when reading files. Usually, the assumption is correct but sometimes you want to be explicit when instructing your programs to read from files. When the encoding isn't correct, you'll see funny characters appear when reading files.
All examples for reading text files use two encoding variations:
the default system encoding, where no encoding is specified, and an explicit encoding set to UTF-8.
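As a small illustration of where those funny characters come from, the sketch below decodes the same bytes with two different charsets. The class name EncodingMismatchSketch and the sample word are made up for this example; no file is involved, so it only shows the decoding step.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchSketch {

    /** Decodes raw bytes into text using the given charset. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an accented e; in UTF-8 the accented letter takes two bytes.
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Correct charset: the two bytes become one character again.
        String right = decode(utf8Bytes, StandardCharsets.UTF_8);

        // Wrong charset: each of the two bytes is turned into its own character.
        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println("bytes: " + utf8Bytes.length);
        System.out.println("decoded as UTF-8 has " + right.length() + " characters");
        System.out.println("decoded as ISO-8859-1 has " + wrong.length() + " characters");
    }
}
```

The wrongly decoded string is one character longer than the original, which is the kind of mismatch that shows up as garbage on screen.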
Download Code
All code files are available from Github.
Code Quality and Code Encapsulation
There is a difference between writing code for your personal or work project and writing code to explain and teach concepts.
If I was writing this code for my own project, I would use proper object-oriented principles like encapsulation, abstraction, polymorphism, etc. But I wanted to make each example stand alone and easily understood, which meant that some of the code has been copied from one example to the next. I did this on purpose because I didn't want the reader to have to figure out all the encapsulation and object structures I so cleverly created. That would take away from the examples.
For the same reason, I chose NOT to write these examples with a unit testing framework like JUnit or TestNG because that's not the purpose of this article. That would add another library for the reader to understand that has nothing to do with reading files in Java. That's why all the examples are written inline inside the main method, without extra methods or classes.
My main purpose is to make the examples as easy to understand as possible and I believe that having extra unit testing and encapsulation code will not help with this. That doesn't mean that's how I would encourage you to write your own personal code. It's just the way I chose to write the examples in this article to make them easier to understand.
Exception Handling
All examples declare any checked exceptions in the throws clause of the method declaration.
The purpose of this article is to show all the different ways to read from files in Java – it's not meant to show how to handle exceptions, which will be very specific to your situation.
So instead of creating unhelpful try catch blocks that just print exception stack traces and clutter up the code, all examples will declare any checked exception in the calling method. This will make the code cleaner and easier to understand without sacrificing any functionality.
Future Updates
As Java file reading evolves, I will be updating this article with any required changes.
File Reading Methods
I organized the file reading methods into three groups:
- Classic I/O classes that have been part of Java since before JDK 1.7. This includes the java.io and java.util packages.
- New Java I/O classes that have been part of Java since JDK1.7. This covers the java.nio.file.Files class.
- Third party I/O classes from the Apache Commons and Google Guava projects.
Classic I/O – Reading Text
1a) FileReader – Default Encoding
FileReader reads in one character at a time, without any buffering. It's meant for reading text files. It uses the default character encoding on your system, so I have provided examples for both the default case, as well as specifying the encoding explicitly.
import java.io.FileReader;
import java.io.IOException;

public class ReadFile_FileReader_Read {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";

        try (FileReader fileReader = new FileReader(fileName)) {
            int singleCharInt;
            char singleChar;
            // read() returns -1 once the end of the file is reached
            while ((singleCharInt = fileReader.read()) != -1) {
                singleChar = (char) singleCharInt;
                // display one character at a time
                System.out.print(singleChar);
            }
        }
    }
}
1b) FileReader – Explicit Encoding (InputStreamReader)
In the Java versions covered here, it's not possible to set the encoding explicitly on a FileReader, so you have to use the parent class, InputStreamReader, and wrap it around a FileInputStream:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadFile_FileReader_Read_Encoding {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        FileInputStream fileInputStream = new FileInputStream(fileName);

        // specify UTF-8 encoding explicitly
        try (InputStreamReader inputStreamReader =
                new InputStreamReader(fileInputStream, "UTF-8")) {
            int singleCharInt;
            char singleChar;
            while ((singleCharInt = inputStreamReader.read()) != -1) {
                singleChar = (char) singleCharInt;
                System.out.print(singleChar); // display one character at a time
            }
        }
    }
}
2a) BufferedReader – Default Encoding
BufferedReader reads an entire line at a time, instead of one character at a time like FileReader. It's meant for reading text files.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadFile_BufferedReader_ReadLine {
    public static void main(String[] args) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        FileReader fileReader = new FileReader(fileName);

        try (BufferedReader bufferedReader = new BufferedReader(fileReader)) {
            String line;
            // readLine() returns null once the end of the file is reached
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
2b) BufferedReader – Explicit Encoding
In a similar manner to how we set encoding explicitly for FileReader, we need to create FileInputStream, wrap it inside InputStreamReader with an explicit encoding and pass that to BufferedReader:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadFile_BufferedReader_ReadLine_Encoding {
    public static void main(String[] args) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";

        FileInputStream fileInputStream = new FileInputStream(fileName);
        // specify UTF-8 encoding explicitly
        InputStreamReader inputStreamReader = new InputStreamReader(fileInputStream, "UTF-8");

        try (BufferedReader bufferedReader = new BufferedReader(inputStreamReader)) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
Classic I/O – Reading Bytes
1) FileInputStream
FileInputStream reads in one byte at a time, without any buffering. While it's meant for reading binary files such as images or audio files, it can still be used to read text files. It's similar to reading with FileReader in that each read returns an integer, but here the integer is a single byte rather than a character. Casting that int to a char only shows the right character when every character fits in one byte, as with ASCII text.
FileInputStream works on raw bytes and applies no character encoding itself, so there is no separate default encoding and explicit encoding variant of this example.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

public class ReadFile_FileInputStream_Read {
    public static void main(String[] pArgs) throws FileNotFoundException, IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        try (FileInputStream fileInputStream = new FileInputStream(file)) {
            int singleCharInt;
            char singleChar;

            // read() returns one byte (0-255), or -1 at the end of the file
            while ((singleCharInt = fileInputStream.read()) != -1) {
                singleChar = (char) singleCharInt;
                System.out.print(singleChar);
            }
        }
    }
}
2) BufferedInputStream
BufferedInputStream reads a set of bytes all at once into an internal byte array buffer. The buffer size can be set explicitly or left at the default, which is what we'll demonstrate in our example. The default buffer size is 8KB (8192 bytes) in the OpenJDK implementation. All performance tests used the default buffer size; the stream refills the buffer automatically whenever it runs out of data.
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

public class ReadFile_BufferedInputStream_Read {
    public static void main(String[] pArgs) throws FileNotFoundException, IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);
        FileInputStream fileInputStream = new FileInputStream(file);

        try (BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream)) {
            int singleCharInt;
            char singleChar;
            // each read() is served from the internal buffer, not from disk
            while ((singleCharInt = bufferedInputStream.read()) != -1) {
                singleChar = (char) singleCharInt;
                System.out.print(singleChar);
            }
        }
    }
}
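The example above uses the default buffer and still calls read() once per byte. As a rough sketch of the other option mentioned in the text, the code below sets the buffer size explicitly and pulls bytes out in chunks. The class name BufferedChunkReadSketch, the countBytes method and the 16KB size are arbitrary choices for illustration and were not part of the performance tests.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedChunkReadSketch {

    /** Reads the whole file in chunks and returns how many bytes were read. */
    static long countBytes(String fileName, int bufferSize) throws IOException {
        // The second constructor argument sets the internal buffer size explicitly.
        try (InputStream in = new BufferedInputStream(new FileInputStream(fileName), bufferSize)) {
            byte[] chunk = new byte[bufferSize];
            long total = 0;
            int bytesRead;
            // read(byte[]) fills as much of the array as it can and returns -1 at end of file
            while ((bytesRead = in.read(chunk)) != -1) {
                total += bytesRead;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the sample files used elsewhere in the article.
        Path sample = Files.createTempFile("sample", ".bin");
        Files.write(sample, new byte[50_000]);

        System.out.println("bytes read: " + countBytes(sample.toString(), 16 * 1024));
        Files.delete(sample);
    }
}
```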
New I/O – Reading Text
1a) Files.readAllLines() – Default Encoding
The Files class is part of the new Java I/O classes introduced in jdk1.7. It only has static utility methods for working with files and directories.
The readAllLines(Path) overload that takes no charset was introduced in jdk1.8, so this example will not work in Java 7. Note that this overload decodes the file as UTF-8 rather than with the system default encoding.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

public class ReadFile_Files_ReadAllLines {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        // the whole file is loaded into memory as a list of lines
        List<String> fileLinesList = Files.readAllLines(file.toPath());
        for (String line : fileLinesList) {
            System.out.println(line);
        }
    }
}
1b) Files.readAllLines() – Explicit Encoding
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

public class ReadFile_Files_ReadAllLines_Encoding {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        // use UTF-8 encoding
        List<String> fileLinesList = Files.readAllLines(file.toPath(), StandardCharsets.UTF_8);

        for (String line : fileLinesList) {
            System.out.println(line);
        }
    }
}
2a) Files.lines() – Default Encoding
This code was tested to work in Java 8 and 9. It didn't run on Java 7 because Files.lines() and lambda expressions were both introduced in Java 8. The lines(Path) overload without a charset decodes the file as UTF-8.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ReadFile_Files_Lines {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        // the stream holds the file open, so it is closed with try-with-resources
        try (Stream<String> linesStream = Files.lines(file.toPath())) {
            linesStream.forEach(line -> {
                System.out.println(line);
            });
        }
    }
}
2b) Files.lines() – Explicit Encoding
Just like in the previous example, this code was tested and works in Java 8 and 9 but not in Java 7.
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ReadFile_Files_Lines_Encoding {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        // use UTF-8 encoding
        try (Stream<String> linesStream = Files.lines(file.toPath(), StandardCharsets.UTF_8)) {
            linesStream.forEach(line -> {
                System.out.println(line);
            });
        }
    }
}
3a) Scanner – Default Encoding
The Scanner class has been part of java.util since JDK 1.5 and can be used to read from files or from the console (user input).
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadFile_Scanner_NextLine {
    public static void main(String[] pArgs) throws FileNotFoundException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        try (Scanner scanner = new Scanner(file)) {
            String line;
            // hasNextLine() returns false once the end of the file is reached
            while (scanner.hasNextLine()) {
                line = scanner.nextLine();
                System.out.println(line);
            }
        }
    }
}
3b) Scanner – Explicit Encoding
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadFile_Scanner_NextLine_Encoding {
    public static void main(String[] pArgs) throws FileNotFoundException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        // use UTF-8 encoding
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            String line;
            while (scanner.hasNextLine()) {
                line = scanner.nextLine();
                System.out.println(line);
            }
        }
    }
}
New I/O – Reading Bytes
Files.readAllBytes()
Even though the documentation for this method states that "it is not intended for reading in large files", I found this to be the absolute best performing file reading method, even on files as large as 1GB.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ReadFile_Files_ReadAllBytes {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        // the whole file is loaded into memory as a byte array
        byte[] fileBytes = Files.readAllBytes(file.toPath());
        char singleChar;
        for (byte b : fileBytes) {
            // casting a byte to a char is only correct for ASCII text
            singleChar = (char) b;
            System.out.print(singleChar);
        }
    }
}
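Since readAllBytes() already hands you the whole file, turning it into a single String is one extra step: decode the bytes with an explicit charset instead of casting them one at a time. The sketch below is one way to do that; the class ReadFileIntoStringSketch and the method readFileToString are names invented for this illustration, and UTF-8 is an assumption about how the file was saved.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ReadFileIntoStringSketch {

    /** Reads the entire file into memory and decodes it as UTF-8 text. */
    static String readFileToString(String fileName) throws IOException {
        byte[] fileBytes = Files.readAllBytes(Paths.get(fileName));
        // Decoding all bytes at once also handles multi-byte characters correctly.
        return new String(fileBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the sample files used elsewhere in the article.
        Path sample = Files.createTempFile("sample", ".txt");
        Files.write(sample, "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));

        String contents = readFileToString(sample.toString());
        System.out.print(contents);
        Files.delete(sample);
    }
}
```

Like readAllBytes() itself, this keeps the whole file in memory, so it suits files that comfortably fit in the heap.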
3rd Party I/O – Reading Text
Commons – FileUtils.readLines()
Apache Commons IO is an open source Java library that comes with utility classes for reading and writing text and binary files. I listed it in this article because it can be used instead of the built in Java libraries. The class we're using is FileUtils.
For this article, version 2.6 was used, which is compatible with JDK 1.7+.
Note that you need to explicitly specify the encoding and that the method for using the default encoding has been deprecated.
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.commons.io.FileUtils;

public class ReadFile_Commons_FileUtils_ReadLines {
    public static void main(String[] pArgs) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        // the encoding is passed explicitly
        List<String> fileLinesList = FileUtils.readLines(file, "UTF-8");
        for (String line : fileLinesList) {
            System.out.println(line);
        }
    }
}
Guava – Files.readLines()
Google Guava is an open source library that comes with utility classes for common tasks like collections handling, cache management, IO operations and string processing.
I listed it in this article because it can be used instead of the built in Java libraries and I wanted to compare its performance with the Java built in libraries.
For this article, version 23.0 was used.
I'm not going to examine all the different ways to read files with Guava, since this article is not meant for that. For a more detailed look at all the different ways to read and write files with Guava, have a look at Baeldung's in depth article.
When reading a file, Guava requires that the character encoding be set explicitly, just like Apache Commons.
Compatibility note: This code was tested successfully on Java 8 and 9. I couldn't get it to work on Java 7 and kept getting an "Unsupported major.minor version 52.0" error. Guava has a separate API doc for Java 7 which uses a slightly different version of the Files.readLines() method. I thought I could get it to work but I kept getting that error.
import java.io.File;
import java.io.IOException;
import java.util.List;

import com.google.common.base.Charsets;
import com.google.common.io.Files;

public class ReadFile_Guava_Files_ReadLines {
    public static void main(String[] args) throws IOException {
        String fileName = "c:\\temp\\sample-10KB.txt";
        File file = new File(fileName);

        // note: this is Guava's Files class, not java.nio.file.Files
        List<String> fileLinesList = Files.readLines(file, Charsets.UTF_8);
        for (String line : fileLinesList) {
            System.out.println(line);
        }
    }
}
Performance Testing
Since there are so many ways to read from a file in Java, a natural question is "What file reading method is the best for my situation?" So I decided to test each of these methods against each other using sample data files of different sizes and timing the results.
Each code sample from this article reads the contents of the file and then prints them to the console (System.out). However, during the performance tests the System.out line was commented out since it would seriously slow down the performance of each method.
Each performance test measures the time it takes to read in the file – line by line, character by character, or byte by byte without displaying anything to the console. I ran each test 5-10 times and took the average so as not to let any outliers influence each test. I also ran the default encoding version of each file reading method – i.e. I didn't specify the encoding explicitly.
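For readers who want to repeat this kind of measurement, here is a rough sketch of a timing loop that averages several runs. It is not the harness used for the numbers in this article; the class ReadTimingSketch, the FileReadTask interface and the averageMillis method are hypothetical names, and a serious benchmark would also need warm-up runs and a tool such as JMH. It uses a lambda, so it needs Java 8 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadTimingSketch {

    /** Hypothetical callback: one file reading method under test. */
    interface FileReadTask {
        void read(Path file) throws IOException;
    }

    /** Runs the task the given number of times and returns the average time in milliseconds. */
    static double averageMillis(FileReadTask task, Path file, int runs) throws IOException {
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.read(file);
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) runs / 1_000_000.0;
    }

    public static void main(String[] args) throws IOException {
        // A small temporary file stands in for the real test files.
        Path sample = Files.createTempFile("sample", ".txt");
        Files.write(sample, "some sample text\n".getBytes(StandardCharsets.UTF_8));

        // Time Files.readAllBytes() over 10 runs, without printing the contents.
        double average = averageMillis(file -> Files.readAllBytes(file), sample, 10);
        System.out.println("average ms: " + average);
        Files.delete(sample);
    }
}
```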
Dev Setup
The dev environment used for these tests:
- Intel Core i7-3615QM @ 2.3 GHz, 8GB RAM
- Windows 8 x64
- Eclipse IDE for Java Developers, Oxygen.2 Release (4.7.2)
- Java SE 9 (jdk-9.0.4)
Data Files
GitHub doesn't allow pushing files larger than 100 MB, so I couldn't find a practical way to store my large test files to allow others to replicate my tests. So instead of storing them, I'm providing the tools I used to generate them so you can create test files that are similar in size to mine. Obviously they won't be the same, but you'll generate files that are similar in size to the ones I used in my performance tests.
Random String Generator was used to generate sample text, and then I just copy-pasted to create larger versions of the file. When the file started getting too large to manage inside a text editor, I had to use the command line to merge multiple text files into a larger text file:
copy *.txt sample-1GB.txt
I created the following 7 data file sizes to test each file reading method across a range of file sizes:
- 1KB
- 10KB
- 100KB
- 1MB
- 10MB
- 100MB
- 1GB
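If you would rather not copy-paste and merge files by hand, a test file of a given size can also be generated from Java. The sketch below is an alternative to the approach described above, not the tool used for this article; the class TestFileGeneratorSketch, the generate method and the 80-character line length are assumptions chosen for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

class TestFileGeneratorSketch {

    /** Length of each generated line in bytes, including the trailing newline. */
    static final int LINE_LENGTH = 80;

    /** Writes exactly sizeInBytes bytes of random lowercase ASCII text to the target file. */
    static void generate(Path target, long sizeInBytes, long seed) throws IOException {
        Random random = new Random(seed);
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.US_ASCII)) {
            long written = 0;
            while (written < sizeInBytes) {
                // The last line is shortened so the file ends at the requested size.
                int lineLength = (int) Math.min(LINE_LENGTH, sizeInBytes - written);
                for (int i = 0; i < lineLength - 1; i++) {
                    writer.write('a' + random.nextInt(26));
                }
                writer.write('\n');
                written += lineLength;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("sample-10KB", ".txt");
        generate(target, 10 * 1024, 42L);
        System.out.println(target + " has " + Files.size(target) + " bytes");
    }
}
```

Because only ASCII letters and newlines are written, one character is one byte, which keeps the size arithmetic simple.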
Performance Summary
There were some surprises and some expected results from the performance tests.
As expected, the worst performers were the methods that read in a file character by character or byte by byte. But what surprised me was that the native Java IO libraries outperformed both 3rd party libraries – Apache Commons IO and Google Guava.
What's more – both Google Guava and Apache Commons IO threw a java.lang.OutOfMemoryError when trying to read in the 1 GB test file. This also happened with the Files.readAllLines(Path) method but the remaining 7 methods were able to read in all test files, including the 1GB test file.
The following table summarizes the average time (in milliseconds) each file reading method took to complete. I highlighted the top three methods in green, the average performing methods in yellow and the worst performing methods in red:
The following chart summarizes the above table but with the following changes:
- I removed java.io.FileInputStream.read() from the chart because its performance was so bad it would skew the entire chart and you wouldn't see the other lines properly
- I summarized the data from 1KB to 1MB because after that, the chart would get too skewed with so many under performers and also some methods threw a java.lang.OutOfMemoryError at 1GB
The Winners
The new Java I/O libraries (java.nio) had the best overall winner (java.nio.file.Files.readAllBytes()) but it was followed closely behind by BufferedReader.readLine() which was also a proven top performer across the board. The other excellent performer was java.nio.file.Files.lines(Path) which had slightly worse numbers for smaller test files but really excelled with the larger test files.
The absolute fastest file reader across all data tests was java.nio.file.Files.readAllBytes(Path). It was consistently the fastest and even reading a 1GB file only took about one second.
The following chart compares performance for a 100KB test file:
You can see that the lowest times were for Files.readAllBytes(), BufferedInputStream.read() and BufferedReader.readLine().
The following chart compares performance for reading a 10MB file. I didn't bother including the bar for FileInputStream.read() because the performance was so bad it would skew the entire chart and you couldn't tell how the other methods performed relative to each other:
Files.readAllBytes() really outperforms all other methods and BufferedReader.readLine() is a distant second.
The Losers
As expected, the absolute worst performer was java.io.FileInputStream.read() which was orders of magnitude slower than its rivals for most tests. FileReader.read() was also a poor performer for the same reason – reading files byte by byte (or character by character) instead of with buffers drastically degrades performance.
Both the Apache Commons IO FileUtils.readLines() and Guava Files.readLines() crashed with an OutOfMemoryError when trying to read the 1GB test file and they were about average in performance for the remaining test files.
java.nio.Files.readAllLines() also crashed when trying to read the 1GB test file but it performed quite well for smaller file sizes.
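The three methods that ran out of memory have one thing in common: they return a List holding every line of the file at once. When a file is too large for that, the lines can be processed one at a time instead. The sketch below shows the idea with Files.lines(); the class StreamingLineCountSketch and the countLines method are hypothetical names, and counting lines is only a stand-in for whatever per-line work you need.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

class StreamingLineCountSketch {

    /** Counts the lines of a file without keeping all of them in memory. */
    static long countLines(Path file) throws IOException {
        // Files.lines() reads lazily; the stream must be closed to release the file.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.count();
        }
    }

    public static void main(String[] args) throws IOException {
        // A small temporary file stands in for the large test files.
        Path sample = Files.createTempFile("sample", ".txt");
        Files.write(sample, Arrays.asList("one", "two", "three"), StandardCharsets.UTF_8);

        System.out.println("lines: " + countLines(sample));
        Files.delete(sample);
    }
}
```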
Performance Rankings
Here's a ranked list of how well each file reading method did, in terms of speed and handling of large files, as well as compatibility with different Java versions.
| Rank | File Reading Method |
|---|---|
| 1 | java.nio.file.Files.readAllBytes() |
| 2 | java.io.BufferedReader.readLine() |
| 3 | java.nio.file.Files.lines() |
| 4 | java.io.BufferedInputStream.read() |
| 5 | java.util.Scanner.nextLine() |
| 6 | java.nio.file.Files.readAllLines() |
| 7 | org.apache.commons.io.FileUtils.readLines() |
| 8 | com.google.common.io.Files.readLines() |
| 9 | java.io.FileReader.read() |
| 10 | java.io.FileInputStream.read() |
Conclusion
I tried to present a comprehensive set of methods for reading files in Java, both text and binary. We looked at 15 different ways of reading files in Java and we ran performance tests to see which methods are the fastest.
The new Java IO library (java.nio) proved to be a great performer but so was the classic BufferedReader.
Source: https://funnelgarden.com/java_read_file/