Introduction

csvBeans is an open source library that maps CSV files to Java beans. It resembles other mapping tools for XML (XMLBeans) or databases (Hibernate), except that csvBeans is dedicated to CSV files. This document is a quick start tutorial that covers the basics of csvBeans. After reading it, you should be able to install the library and use it in your own projects.

Installation

The source code of the latest version of csvBeans is available here. You can download it and unzip it into a directory, for example c:\csvBeans. You will find the source code in the src/java directory and the unit tests in the src/tests directory.

The distribution contains the latest version of the library as a jar file. To use it, add it to your classpath. The lib/runtime directory contains the required external libraries, which must also be present in your classpath.

If you use a JDK version earlier than 1.4, you should also include the XML libraries from the lib/runtime/jdk1.3 directory.

If you want to extend the library, you will need Ant to rebuild the jar file. Once Ant is installed and its bin directory is in your path, you can run the ant command at the root of the csvBeans distribution directory:


c:\csvBeans>ant package

	

This command generates the csvBeans jar file.

If you need to run the unit tests, an Ant target is also available. Just run the following command:

	
c:\csvBeans>ant test

	

You can also work with the CVS version of the library which contains the features that will appear in the next version of csvBeans. Here is the CVS command to use:


cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/csvbeans login 

	

In this case, you will probably want to download Maven 1.0, since the project build is based on it.

Usage

What is CSV?

CSV stands for "Comma Separated Values". A CSV record is a line of fields in which each field is separated from the next by a comma. CSV data is useful when working with database applications or when applications exchange data in this format.

csvBeans was initially developed to handle CSV data. Since version 0.7, however, it has also been able to manage fixed-length data files, which are pure flat files with no separator between the fields.
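
To make the difference with separated data concrete, here is a minimal sketch of how a fixed-length line can be cut into fields by position. It is plain Java and does not use csvBeans. The class name, the slice method and the 10/12/3 layout are assumptions made for this illustration only.

```java
import java.util.ArrayList;
import java.util.List;

final class FixedLengthSketch {
    /** Cuts a line into consecutive fields of the given widths and trims the padding. */
    static List<String> slice(String line, int[] widths) {
        List<String> fields = new ArrayList<>();
        int pos = 0;
        for (int width : widths) {
            int end = Math.min(pos + width, line.length());
            // A line that is too short yields empty trailing fields instead of an exception.
            fields.add(pos < end ? line.substring(pos, end).trim() : "");
            pos += width;
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical layout: id on 10 characters, name on 12, quantity on 3.
        String line = String.format("%-10s%-12s%3s", "ID0001", "Product 1", "5");
        System.out.println(slice(line, new int[] {10, 12, 3}));
    }
}
```

With no separator in the data, the field widths carry all the structure, which is why such files need a length for every field.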

Define your CSV structure

In order to use the csvBeans library, you will need to write specifications files that describe the structure of your CSV files. The source code related to the following example can be found in the samples directory of the distribution.

Suppose that we need to parse a CSV file that contains a list of products, one product per line. Each line is split into fields separated by a ; character. The structure of a line can be described by the following table:

Field             Maximum length  Required
Product Id        10 char.        true
Product Name      100 char.       true
Product Quantity  3 char.         true

An example of CSV lines following this description is:

ID0001;Product 1;5
ID0002;Product 2;2
ID0003;Product 3;0
ID0004;Product 4;7
		

A line structure is called a record in csvBeans.
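
As an illustration of what a record looks like once it is cut into fields, the following sketch splits one of the lines above on the ; separator. It uses only the JDK and is not how csvBeans itself is implemented. The class and method names are hypothetical, and quoted or escaped separators are deliberately not handled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class RecordSplitSketch {
    /** Splits a line on the separator and keeps empty fields, including trailing ones. */
    static List<String> split(String line, char separator) {
        // Pattern.quote keeps separators such as '|' or '.' from being read as regex syntax.
        // The -1 limit keeps trailing empty fields.
        return Arrays.asList(line.split(Pattern.quote(String.valueOf(separator)), -1));
    }

    public static void main(String[] args) {
        System.out.println(split("ID0001;Product 1;5", ';'));
    }
}
```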

Define the XML specifications

The specifications file is an XML file that follows a schema (csvbeans.xsd) provided with the library. In order to define the structure of our CSV products file, we can write the following specifications file:

		
<?xml version="1.0" ?>
<property name="separator" value=";" />
<record start="# Products" className="org.csvbeans.samples.quickstart.Product">
  <field name="id" maxLen="10" required="true" />
  <field name="name" maxLen="100" required="true" />
  <field name="quantity" maxLen="3" required="true" />
</record>
		
		

For the most part, the XML is easy to understand and can be guessed from the previous table.
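
To show how little there is in a record definition, the sketch below reads the field elements of a record with the standard JDK DOM parser and lists their attributes. It is not the csvBeans SpecificationsFileParser and performs no schema validation. The class name, the describeFields method and the name:maxLen:required output format are assumptions for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class RecordSpecSketch {
    /** Returns "name:maxLen:required" for each field element found in a record definition. */
    static List<String> describeFields(String recordXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(recordXml.getBytes(StandardCharsets.UTF_8)));
            NodeList fields = doc.getDocumentElement().getElementsByTagName("field");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                result.add(field.getAttribute("name") + ":" + field.getAttribute("maxLen")
                        + ":" + field.getAttribute("required"));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a readable record definition", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<record start=\"# Products\" className=\"org.csvbeans.samples.quickstart.Product\">"
                + "<field name=\"id\" maxLen=\"10\" required=\"true\" />"
                + "<field name=\"quantity\" maxLen=\"3\" required=\"true\" />"
                + "</record>";
        System.out.println(describeFields(xml));
    }
}
```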

The property element can be used to customize the library. Here, we use the separator property, which defines the separator between field values in the CSV file. When this property is not defined in the specifications file, the library assumes that the separator is ; by default.

A CSV line structure is described by the record element. This element has two attributes: start and className. The start attribute value defines a section in the CSV file. A section is a set of lines that describe several objects of the same class. In our example, the # Products section is followed by several products, one per line. Sometimes you will have to parse or build CSV files with no defined section (that is, files that contain only CSV lines). In this case, use the noStartTag property instead of the start attribute of the record element:


<property name="noStartTag" value="true" />

		

This definition of a section is specific to the parser you use in csvBeans. If you create your own parser (or builder), you can define sections another way, or not use them at all. Here is an example of a valid CSV file:

# Products
ID0001;Product 1;5
ID0002;Product 2;2
ID0003;Product 3;0
ID0004;Product 4;7
		

To sum up, the # Products line marks the beginning of the section and each following line defines a product. There is no limit on the number of sections your file can contain. For example, your CSV file may contain products and clients:

# Products
ID0001;Product 1;5
ID0002;Product 2;2
ID0003;Product 3;0
ID0004;Product 4;7
# Clients
1;Client 1; Address 1; Zip 1; Town 1; Country 1; Tel 1
2;Client 2; Address 2; Zip 2; Town 2; Country 2; Tel 2
...and so on
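
The idea of sections can be sketched in a few lines of plain Java: every data line belongs to the most recent start tag seen. This is only an illustration of the concept, not the csvBeans parser. The class name and the group method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SectionSketch {
    /** Groups data lines under the most recent line that matches a known start tag. */
    static Map<String, List<String>> group(List<String> lines, List<String> startTags) {
        Map<String, List<String>> sections = new LinkedHashMap<>();
        List<String> current = null;
        for (String line : lines) {
            String trimmed = line.trim();
            if (startTags.contains(trimmed)) {
                // A start tag opens (or reopens) its section.
                current = sections.computeIfAbsent(trimmed, tag -> new ArrayList<>());
            } else if (current != null && !trimmed.isEmpty()) {
                current.add(line);
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "# Products", "ID0001;Product 1;5", "ID0002;Product 2;2",
                "# Clients", "1;Client 1; Address 1; Zip 1; Town 1; Country 1; Tel 1");
        System.out.println(group(lines, Arrays.asList("# Products", "# Clients")));
    }
}
```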
		

The className attribute defines the Java bean to associate with the fields of a line. When a line is parsed, a Java object is created and initialized with the field values. When a line is generated, it is built from the Java object. Therefore, the same specifications file can be used for both parsing and generating CSV files.

A field element describes the structure of a CSV field. Its only required attribute is name, which must match an accessible property of the Java class declared in the className attribute of the enclosing XML element (the record element in the preceding example).

The field element supports several attributes, of which only name is required. Here is a short description of the attributes used in the specification above:

Attribute name  Type                  Required  Default  Description
name            string                Yes       -        The field name. It must match a JavaBean property of the bean associated with the field.
maxLen          integer               No        10       The maximum length of the value contained in the field.
required        boolean (true/false)  No        false    Specifies whether the field must contain a value.
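
The two constraints in this table can be read as a simple check on each field value. The sketch below shows that reading in plain Java. How csvBeans itself checks and reports violations is not described in this tutorial, so the class name, the check method and the messages are assumptions.

```java
final class FieldCheckSketch {
    /** Returns null when the value is acceptable, otherwise a short reason. */
    static String check(String value, int maxLen, boolean required) {
        boolean empty = value == null || value.isEmpty();
        if (empty) {
            return required ? "missing required value" : null;
        }
        if (value.length() > maxLen) {
            return "longer than " + maxLen + " characters";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(check("ID0001", 10, true)); // accepted
        System.out.println(check("", 10, true));       // rejected: required
        System.out.println(check("1234", 3, true));    // rejected: too long
    }
}
```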

In our example, here is the class to handle our CSV lines:

package org.csvbeans.samples.quickstart;

public class Product {
  private String id;
  private String name;
  private int quantity;

  public Product() {}

  public String getId() { return id; }
  public void setId(String id) { this.id = id; }

  public String getName() { return name; }
  public void setName(String name) { this.name = name; }

  public int getQuantity() { return quantity; }
  public void setQuantity(int quantity) { this.quantity = quantity; }
}
		

There is no need to implement an interface or extend a specific class: your Java classes are independent of the csvBeans library. Only accessors (getters) are needed if you want to generate CSV files, and only mutators (setters) if you need to parse them.
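
The reason setters are enough for parsing can be illustrated with the standard java.beans API, which discovers properties from accessor and mutator names. The sketch below is a guess at the general technique, not the csvBeans implementation. The Item bean and the setProperty helper are hypothetical, and only String and int properties are handled.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;

final class BeanFillSketch {
    /** Plain bean used only for this illustration. It must be public to be introspected. */
    public static class Item {
        private String name;
        private int quantity;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getQuantity() { return quantity; }
        public void setQuantity(int quantity) { this.quantity = quantity; }
    }

    /** Sets the named property through its setter, converting the text to int when needed. */
    static void setProperty(Object bean, String property, String text) {
        try {
            for (PropertyDescriptor pd : Introspector.getBeanInfo(bean.getClass()).getPropertyDescriptors()) {
                Method setter = pd.getWriteMethod();
                if (pd.getName().equals(property) && setter != null) {
                    Object value = pd.getPropertyType() == int.class ? (Object) Integer.valueOf(text) : text;
                    setter.invoke(bean, value);
                    return;
                }
            }
        } catch (ReflectiveOperationException | IntrospectionException e) {
            throw new IllegalStateException(e);
        }
        throw new IllegalArgumentException("No writable property: " + property);
    }

    public static void main(String[] args) {
        Item item = new Item();
        setProperty(item, "name", "Product 1");
        setProperty(item, "quantity", "5");
        System.out.println(item.getName() + " x " + item.getQuantity());
    }
}
```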

The field element is suited to simple Java types (String, integers, booleans and so on), but your POJO can be more complex. Suppose that the quantity information is sent in two fields: one for the value and one for the units. We could have two attributes in our Product class, but let us say that our model represents the quantity with another POJO:

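package org.csvbeans.samples.quickstart;

// Quantity.java
public class Quantity {
  private int value;
  private String units;

  // No-argument constructor, as in the Product bean
  public Quantity() {}

  // Convenience constructor for creating a quantity by hand
  public Quantity(int value, String units) {
    this.value = value;
    this.units = units;
  }

  public int getValue() { return value; }
  public void setValue(int value) { this.value = value; }

  public String getUnits() { return units; }
  public void setUnits(String units) { this.units = units; }
}

// Product.java (same package)
public class Product {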
  private String id;
  private String name;
  private Quantity quantity;

  public Product() {}

  public String getId() { return id; }
  public void setId(String id) { this.id = id; }

  public String getName() { return name; }
  public void setName(String name) { this.name = name; }

  public Quantity getQuantity() { return quantity; }
  public void setQuantity(Quantity quantity) { this.quantity = quantity; }
}
		

The quantity property of the Product class cannot be described with the field element. csvBeans provides another XML element, called bean, to handle this situation. It can be used as in the following example:


<record start="# Products" className="org.csvbeans.samples.quickstart.Product">
  <field name="id" maxLen="10" required="true" />
  <field name="name" maxLen="100" required="true" />
  <bean name="quantity" className="org.csvbeans.samples.quickstart.Quantity">
    <field name="value" maxLen="3" required="true" />
    <field name="units" maxLen="4" required="true" />
  </bean>
</record>

		

Here is a CSV file that follows this specification:

# Products
ID0001;Product 1;5;g
ID0002;Product 2;2;L
ID0003;Product 3;0;kg
ID0004;Product 4;7;bo
		

The bean element can itself contain several field or bean elements, so it is easy to build complex bean definitions. The main attributes of the bean element are described in the following table:

Attribute name  Type                  Required  Default  Description
name            string                Yes       -        The JavaBean property that holds the bean value. It is similar to the name attribute of the field element.
className       string                Yes       -        The type of the JavaBean property.

Please refer to the Tag reference section to see the other attributes of this element.
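
To make the field order explicit, here is a hand-written sketch of the mapping that the nested definition above describes: the fields of a line are consumed from left to right, and the last two go into the nested bean. It is plain Java written for illustration, with simplified classes that use public fields instead of accessors. The fromLine method is hypothetical and does not reflect how csvBeans performs the mapping.

```java
final class NestedBeanSketch {
    /** Simplified stand-ins for the Quantity and Product beans. */
    static final class Quantity { int value; String units; }
    static final class Product { String id; String name; Quantity quantity; }

    /** Consumes the fields left to right: id, name, then the two fields of the nested bean. */
    static Product fromLine(String line) {
        String[] fields = line.split(";", -1);
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields: " + line);
        }
        Product product = new Product();
        product.id = fields[0];
        product.name = fields[1];
        product.quantity = new Quantity();
        product.quantity.value = Integer.parseInt(fields[2].trim());
        product.quantity.units = fields[3];
        return product;
    }

    public static void main(String[] args) {
        Product p = fromLine("ID0003;Product 3;0;kg");
        System.out.println(p.id + " -> " + p.quantity.value + " " + p.quantity.units);
    }
}
```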

Choosing the strategy

There are two ways to create a CSV parser or CSV builder:

  • In the specifications file
  • In your program

I prefer the first solution since it keeps your code independent of the strategy implementation.

  • In the specifications file

csvBeans ships with several implementations of CSV parsers and builders, each suited to a particular kind of CSV file. Moreover, the library is designed to be open, so that other parsers and builders can be added. For this reason, a parser or a builder is called a strategy in csvBeans. Strategies are represented by two interfaces, ParsingStrategy and BuildingStrategy, and each parser and each builder implements the corresponding interface.

You can specify in the XML file which strategies you want to use in your program thanks to the strategy element:


<strategy>
  <parser className="org.csvbeans.parsers.CSVParser" />
  <builder className="org.csvbeans.builders.CSVBuilder" />
</strategy>

	
The parser element defines which CSV parsing strategy to use, and the builder element which CSV building strategy. The CSVParser and CSVBuilder classes are the most common choices for parsing and building CSV files. In your code, you get the strategy from the specifications file:

// specs is an instance of the CSVSpecifications class (see below)
ParsingStrategy parser = specs.getParsingStrategy();
// then do the parsing

As you can see, the code refers to the parser interface and not to its implementation, which is usually a good thing for unit testing with mock objects.
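
The general mechanism behind a className attribute can be sketched with standard reflection: a class name read from configuration is instantiated and then used only through an interface. This is an assumption about the technique, not a description of the csvBeans internals. LineParser, SemicolonParser and create are hypothetical names.

```java
final class StrategySketch {
    /** Hypothetical stand-in for a parsing strategy interface. */
    public interface LineParser {
        String[] parse(String line);
    }

    /** One implementation. Others can be added without touching the calling code. */
    public static final class SemicolonParser implements LineParser {
        public SemicolonParser() {}
        public String[] parse(String line) { return line.split(";", -1); }
    }

    /** Instantiates the class named in configuration and exposes it by interface only. */
    static LineParser create(String className) {
        try {
            return (LineParser) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot create strategy " + className, e);
        }
    }

    public static void main(String[] args) {
        // In a real setup the class name would come from the configuration file.
        LineParser parser = create(SemicolonParser.class.getName());
        System.out.println(parser.parse("ID0001;Product 1;5").length);
    }
}
```

Because the calling code only sees the interface, a test can pass in a mock implementation without changing anything else.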

  • In your code

If you need to create the parser in your code, instantiate it with new and provide it with the parsed specifications:

// specs is an instance of the CSVSpecifications class (see below)
CSVParser parser = new CSVParser();
parser.setSpecifications(specs);
parser.setProperties(specs.getParserProperties());
// then do the parsing
	

As you can see, you have to initialize the properties of the parser (or builder) by hand, so this is not the recommended way.

Create the specifications file object

To parse the XML specifications file, you can use the SpecificationsFileParser class to obtain an in-memory representation of the file:

SpecificationsFileParser specsParser = new SpecificationsFileParser();
CSVSpecifications specs = specsParser.parse(new FileInputStream("samples/org/csvbeans/samples/quickstart/mapping.xml"));
	

If an error occurs during parsing, a SpecificationsFileException is thrown. XML validation is enabled by default. If you need to disable it, call the setValidate method before parsing:

SpecificationsFileParser specsParser = new SpecificationsFileParser();
specsParser.setValidate(false);
CSVSpecifications specs = specsParser.parse(new FileInputStream("samples/org/csvbeans/samples/quickstart/mapping.xml"));
	

Parsing the CSV file

To parse a CSV file, use the parse method of the ParsingStrategy interface. If you look at its definition, you will see that it parses a LinesReader object rather than a file directly. There are two reasons for this: it makes unit testing easier, and it leaves room for a future extension of the library in which the CSV lines come from a source other than a file, such as the network. Supporting such a source would only require another implementation of the LinesReader interface.
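
The benefit of reading from an abstraction instead of a file can be shown with a small sketch. The LineSource interface below is a hypothetical stand-in: the real method names of LinesReader are not given in this tutorial. The adapter accepts any java.io.Reader, so the same consuming code works with a file, a socket or an in-memory string in a unit test.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class LineSourceSketch {
    /** Hypothetical line source. Returns null when no line is left. */
    interface LineSource {
        String nextLine();
    }

    /** Adapts any Reader (file, socket, in-memory string) to the line source. */
    static LineSource from(Reader reader) {
        BufferedReader buffered = new BufferedReader(reader);
        return () -> {
            try {
                return buffered.readLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    /** Consuming code does not need to know where the lines come from. */
    static int countLines(LineSource source) {
        int count = 0;
        while (source.nextLine() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // An in-memory source is enough for a unit test.
        System.out.println(countLines(from(new StringReader("# Products\nID0001;Product 1;5\n"))));
    }
}
```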

Currently, there is only one implementation of this interface: InputStreamLinesReader. You can use it to parse CSV files:

ParsingStrategy parser = specs.getParsingStrategy();
parser.parse(new InputStreamLinesReader(new FileInputStream("samples/org/csvbeans/samples/quickstart/fileToParse.txt")));
	

If there is an error in the CSV file, a ParsingException is thrown. This exception can be used to retrieve the CSV line that failed. If you need to parse several files, you can create new parsers (currently, this is what the getParsingStrategy method does), but you should parse the XML specifications file only once since this operation is time-consuming.

By default, the parser stores the Java beans built from the CSV file in a Map object. When you need to parse big CSV files, you should consider using a listener which will receive each Java object when built instead of storing them in memory. Take a look at the CSVParserListener Javadoc for more info (there is also a sample in the distribution).
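
The listener approach can be sketched as a callback that receives each record as soon as it is read, so that nothing accumulates in memory. The sketch below uses java.util.function.Consumer as a stand-in. It is not the CSVParserListener interface, whose methods are documented in its Javadoc, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

final class ListenerSketch {
    /** Pushes each record to the listener instead of collecting them, and returns the record count. */
    static int parse(Iterable<String> lines, Consumer<String[]> listener) {
        int count = 0;
        for (String line : lines) {
            if (line.startsWith("#") || line.trim().isEmpty()) {
                continue; // skip section tags and blank lines
            }
            listener.accept(line.split(";", -1));
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("# Products", "ID0001;Product 1;5", "ID0002;Product 2;2");
        int[] total = {0};
        // The listener keeps only a running total, not the records themselves.
        int records = parse(lines, fields -> total[0] += Integer.parseInt(fields[2]));
        System.out.println(records + " records, total quantity " + total[0]);
    }
}
```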

Getting the built Java objects from the parser is easy:

List products = parser.getBeans("# Products");
for (Iterator it = products.iterator(); it.hasNext(); ) {
  Product product = (Product) it.next();
  System.out.println(product);
}
	

The getBeans method takes the name of the section in the CSV file and returns the list of matching beans.

Build a CSV file

To generate a CSV file, we can use the same specifications file as for parsing. In your Java code, set the attributes of the objects you want to put in your CSV file:

Product p1 = new Product();
p1.setId("P0001");
p1.setName("Product 1");
p1.setQuantity(new Quantity(10, "kg"));

Product p2 = new Product();
p2.setId("P0002");
p2.setName("Product 2");
p2.setQuantity(new Quantity(5, "bo"));

List products = new ArrayList();
products.add(p1);
products.add(p2);
BuildingStrategy builder = specs.getBuildingStrategy();
builder.addBeans("# Products", products);

builder.build(new OutputStreamLinesWriter(new FileOutputStream("builtFile.txt")));
	

To generate a CSV file, we create the beans, put them into a List object and provide the list to the builder. The build method mirrors the parse method of the parsing strategy: it takes an implementation of the LinesWriter interface as its parameter. csvBeans ships with one implementation of this interface, OutputStreamLinesWriter, which is used above. The code generates the following CSV file:

# Products
P0001;Product 1;10;kg
P0002;Product 2;5;bo
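
The shape of this output can be reproduced with a short sketch: the section tag comes first, followed by one separator-joined line per record. This is plain Java for illustration and not the CSVBuilder implementation. The class name and the build method are hypothetical, and the field values are passed as strings rather than read from beans.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BuildSketch {
    /** Writes the section tag, then one separator-joined line per record. */
    static List<String> build(String startTag, List<String[]> records, String separator) {
        List<String> lines = new ArrayList<>();
        lines.add(startTag);
        for (String[] fields : records) {
            lines.add(String.join(separator, fields));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
                new String[] {"P0001", "Product 1", "10", "kg"},
                new String[] {"P0002", "Product 2", "5", "bo"});
        for (String line : build("# Products", records, ";")) {
            System.out.println(line);
        }
    }
}
```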
	

Conclusion

We have only seen a subset of what csvBeans can do. You should now take a look at the Tag reference section for a more detailed view of what can be specified in the specifications file.