Introduction

csvBeans is a Java framework that helps developers map Java beans to CSV files. It makes both parsing and building CSV files easier, and it can simplify your application code: you work only with your Java model, and csvBeans maps your domain objects to and from CSV files.

Line Set

csvBeans 0.7 introduced the notion of a line set. A line set is the set of lines that together make up one record. In classic CSV files, one line is one record, but some files spread the information about a record over several lines.

Here is an example to make this clearer:

# Products
1;Product1;12
2;Product2;14
3;Product3;17

In the source above, each line can be mapped to a Product object. Therefore, the line set is composed of one line only. In the following source:

# Products
1;Product1
1;12
2;Product2
2;14
3;Product3
3;17

two lines are needed to build each Product object: the line set is composed of two lines.
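As a rough illustration of the idea (not csvBeans code), the sketch below groups consecutive data lines into line sets of a fixed size. The class and method names are hypothetical, and it assumes that every record uses the same number of lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of csvBeans: groups consecutive lines into line sets.
final class LineSetSketch {

    /** Splits the data lines into consecutive groups of linesPerRecord lines. */
    static List<List<String>> group(List<String> lines, int linesPerRecord) {
        if (linesPerRecord < 1) {
            throw new IllegalArgumentException("linesPerRecord must be >= 1");
        }
        if (lines.size() % linesPerRecord != 0) {
            throw new IllegalArgumentException("incomplete line set at end of input");
        }
        List<List<String>> lineSets = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerRecord) {
            // copy the sub-list so each line set is independent of the input list
            lineSets.add(new ArrayList<>(lines.subList(i, i + linesPerRecord)));
        }
        return lineSets;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1;Product1", "1;12", "2;Product2", "2;14", "3;Product3", "3;17");
        // each printed list holds the two lines needed to build one Product
        for (List<String> lineSet : group(lines, 2)) {
            System.out.println(lineSet);
        }
    }
}
```

With a size of 1, the same helper degenerates to the classic case where one line is one record.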


The specifications file

To map your Java beans to CSV files, you need to define a specifications file. This file is in XML format and must contain all the mapping information. If you have already used frameworks like Hibernate, you will find the csvBeans specifications file format easy to understand.

csvBeans is shipped with an XSD file that defines the format the specifications file must follow to be valid. Its format is described in the tag reference section.

The parsers

CSVParser

CSVParser is the first CSV file parser I created to map the lines of a CSV file to Java objects. It can parse files whose format follows these rules:

  • A line marks the beginning of a section associated with a Java bean
  • It is followed by several lines, each of which maps to one Java bean

Here is an example of such a file:


[Section 1]
12;0;;56
23;;ACK;JI

[Section 2]
1;21
3;47
5;80

[Section 3]
a;1;2;b
c;r;;
d;4;3;f
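
To make the layout concrete, here is a minimal hand-written sketch that splits such a file into its sections. It is only an illustration of the format: with csvBeans, CSVParser does this work for you. The class name is hypothetical, and the sketch assumes that every section header starts with "[" and that ";" is the separator.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the file layout only; not csvBeans code.
final class SectionLayoutSketch {

    /** Groups data lines under the most recent "[...]" section header. */
    static Map<String, List<String[]>> splitSections(List<String> lines) {
        Map<String, List<String[]>> sections = new LinkedHashMap<>();
        List<String[]> current = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // blank lines only separate sections
            }
            if (line.startsWith("[")) {
                // a header line opens a new section
                current = new ArrayList<>();
                sections.put(line, current);
            } else if (current != null) {
                // limit -1 keeps trailing empty fields such as in "c;r;;"
                current.add(line.split(";", -1));
            }
        }
        return sections;
    }
}
```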

      

A specifications file that uses this parser for this kind of CSV file can look like this:


<csvbeans>
  <strategy>
    <parser className="org.csvbeans.parsers.CSVParser" />
  </strategy>
  <record start="[Section 1]" className="org.csvbeans.samples.strategy.Section1Bean">
    <field name="column1" maxLen="2" required="true">
      <validators>
        <validator class="org.csvbeans.validators.DigitValidator" />
      </validators>
    </field>
    <field name="column2" maxLen="1">
      <validators>
        <validator class="org.csvbeans.validators.IntegerRangeValidator">
          <property name="minimumValue" value="0" />
          <property name="maximumValue" value="1" />
        </validator>
      </validators>
    </field>
    <bean name="sectionBean" className="org.csvbeans.samples.strategy.SectionBean">
      <field name="value1" maxLen="10" />
      <field name="value2" maxLen="5" />
    </bean>
  </record>
  <record start="[Section 2]" className="org.csvbeans.samples.strategy.Section2Bean">
    <field name="column1" maxLen="1">
      <validators>
        <validator class="org.csvbeans.validators.DigitValidator" />
      </validators>
    </field>
    <field name="column2" maxLen="2" />
  </record>
  <record start="[Section 3]" className="org.csvbeans.samples.strategy.Section3Bean">
    <bean name="sectionBean" className="org.csvbeans.samples.strategy.SectionBean">
      <field name="value1" maxLen="10" />
      <field name="value2" maxLen="5" />
    </bean>
    <field name="column3" maxLen="1">
      <validators>
        <validator class="org.csvbeans.validators.DigitValidator" />
      </validators>
    </field>
    <field name="column4" maxLen="1">
      <authorized-values>
        <value>b</value>
        <value>f</value>
      </authorized-values>
    </field>
  </record>
</csvbeans>

      

Each section can be associated with a Java bean:


public class Section1Bean {
  private Integer column1;
  private Integer column2;
  private SectionBean sectionBean = new SectionBean();

  // getters and setters
}

public class Section2Bean {
  private Integer column1;
  private String column2;

  // getters and setters
}

public class Section3Bean {
  private SectionBean sectionBean = new SectionBean();
  private Integer column3;
  private String column4;

  // getters and setters
}

      

And you can use the parser this way:


// Load and parse the XML specifications file
CSVSpecificationsFile specs = new CSVSpecificationsFile(new FileInputStream("specifications.xml"));
specs.parse();
// The parsing strategy is the one declared in the specifications file
ParsingStrategy parser = specs.getParsingStrategy();
parser.parse(new InputStreamLinesReader(new FileInputStream("sample1.txt")));
// Retrieve the beans created for a section by its start tag
List objectsOfSection1 = parser.getBeans("[Section 1]");
for (Iterator it = objectsOfSection1.iterator(); it.hasNext(); ) {
  System.out.println(it.next());
}
// ...

      

As you can see, defining the parsing strategy within the specifications file makes your code independent of the parser implementation (there is no reference to the CSVParser class). I recommend following this rule to avoid such dependencies.

Validators

You may have noticed the validators element in the previous specifications file. This XML element associates a field with a list of validators that check the values of the field.

The original idea of validators came from Oscar Gonzalez, CTO of SameLAN.

In practice, a validator is a class that implements the org.csvbeans.validators.Validator interface. csvBeans comes with a set of validators, but you can define your own and use them in the specifications file.

There is an abstract class named FieldValidator that you can extend when developing your validators.
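
The sketch below shows the general shape of a range check comparable in spirit to the IntegerRangeValidator configured earlier. It does not use the csvBeans API: the ValueCheck interface and the RangeCheck class are hypothetical stand-ins, and the real Validator interface and FieldValidator class may declare different methods.

```java
// Hypothetical stand-in for a validator contract; the real
// org.csvbeans.validators.Validator interface may declare different methods.
interface ValueCheck {
    /** Returns null when the value is valid, otherwise an error message. */
    String check(String fieldName, String value);
}

// Sketch of an integer range check with inclusive bounds.
final class RangeCheck implements ValueCheck {
    private final int minimumValue;
    private final int maximumValue;

    RangeCheck(int minimumValue, int maximumValue) {
        this.minimumValue = minimumValue;
        this.maximumValue = maximumValue;
    }

    @Override
    public String check(String fieldName, String value) {
        try {
            int n = Integer.parseInt(value.trim());
            if (n < minimumValue || n > maximumValue) {
                return fieldName + ": " + n + " is outside ["
                        + minimumValue + ", " + maximumValue + "]";
            }
            return null; // value is within the range
        } catch (NumberFormatException e) {
            return fieldName + ": '" + value + "' is not an integer";
        }
    }
}
```

The two constructor arguments play the role of the minimumValue and maximumValue properties set in the specifications file.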

Context management

If you take a look at the Validator interface, you will see the init method that takes a ContextManager object as a parameter.

A context can be seen as a list of mapped properties that you can use in order to exchange data between your validators.

The original idea of contexts came from Oscar Gonzalez, CTO of SameLAN.

There are two kinds of contexts: a record context and a session context. The following schema shows when they are created and initialized so you can have an idea of how to use them:

[Diagram not available: life cycle of the parsing]

Using a listener

In the preceding section, we have seen that the parsed objects are stored in a map and that you retrieve them as a list. This can consume a lot of memory when you have to parse huge CSV files.

To solve this problem, csvBeans provides a listener interface that you can implement in order to read the objects during the parsing. This is quite similar to SAX parsing in XML.

To create your listener, you need to implement the CSVParserListener interface. Here are the methods that you need to implement:

Method    Description
onStart   Called just before parsing begins. Perform your initialization in this method.
onBean    Called each time a bean has been created by the parser. The bean is initialized with all the content found in the CSV lines associated with it.
onError   Called when the parser generates an error. If the continueParsingIfErrors property is set to true, it is called each time an error occurs.
onEnd     Called at the end of the parsing. This is where you free your resources if needed.

In the samples directory, you will find a listener implementation that generates SQL insert statements from a CSV file containing the data. It can be used when you have a flat file of table data (generated by SQL Server, for example) and you need to import this data into another database that cannot import flat files (Oracle, for example).
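
The following sketch illustrates the callback style with a listener that produces SQL insert statements. It is not the sample shipped with csvBeans: BeanListenerSketch only mirrors the four method names of the table above, and its signatures, the driver class, and the table name are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback contract that only mirrors the four method names listed above;
// the real CSVParserListener signatures may differ.
interface BeanListenerSketch {
    void onStart();
    void onBean(String[] fields);
    void onError(String message);
    void onEnd();
}

// Turns each record into an SQL INSERT statement as soon as it is read.
final class SqlInsertListenerSketch implements BeanListenerSketch {
    private final String table;
    // Collected here only to keep the sketch small; a real listener would
    // write each statement to a file or a database connection right away.
    final List<String> statements = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    SqlInsertListenerSketch(String table) {
        this.table = table;
    }

    @Override public void onStart() { statements.clear(); errors.clear(); }

    @Override public void onBean(String[] fields) {
        StringBuilder sql = new StringBuilder("INSERT INTO " + table + " VALUES (");
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sql.append(", ");
            }
            // double the single quotes so the value is a valid SQL string literal
            sql.append('\'').append(fields[i].replace("'", "''")).append('\'');
        }
        statements.add(sql.append(");").toString());
    }

    @Override public void onError(String message) { errors.add(message); }

    @Override public void onEnd() { /* release resources here if needed */ }
}

final class StreamingDriverSketch {
    /** Pushes records to the listener one at a time instead of storing them. */
    static void parse(List<String> lines, int expectedFields, BeanListenerSketch listener) {
        listener.onStart();
        for (String line : lines) {
            String[] fields = line.split(";", -1);
            if (fields.length != expectedFields) {
                listener.onError("unexpected field count: " + line);
            } else {
                listener.onBean(fields);
            }
        }
        listener.onEnd();
    }
}
```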

CSVParser2

CSVParser2 is another parsing strategy shipped with csvBeans. This parser can be used when the start tag is at the beginning of each record instead of once at a given line. Here is a CSV example:


tag1;a;b
tag1;c;d
tag1;e;f
tag2;1;2;3
tag2;4;5;6

    

To specify that the start tag is at the beginning of the record, you only need to specify that you want to use the CSVParser2 strategy. It can be done for example in the XML specifications file:


<csvbeans>
 <strategy>
   <parser className="org.csvbeans.parsers.CSVParser2" />
 </strategy>
 <record className="BeanTag1" start="tag1">
   <field name="field1" maxLen="10" />
   <field name="field2" maxLen="2" required="true" />
 </record>
 <record className="BeanTag2" start="tag2">
   <field name="field1" maxLen="3" digit="true" />
   <field name="field2" maxLen="2" digit="true" />
   <field name="field3" maxLen="1" digit="true" />
 </record>
</csvbeans>

    

You can see that there is not much difference between this mapping file and the preceding ones: only the parsing strategy has been changed to adapt the application to the new CSV file format.
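
As an illustration of this layout, the sketch below dispatches each line on the tag found in its first field. It is a hand-written approximation rather than the CSVParser2 implementation; the class name is hypothetical and ";" is assumed to be the separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of tag-prefixed records; not csvBeans code.
final class TagPrefixSketch {

    /** Groups records by the tag in the first field; the tag itself is dropped. */
    static Map<String, List<List<String>>> groupByTag(List<String> lines) {
        Map<String, List<List<String>>> byTag = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(";", -1);
            // everything after the tag is the record content
            List<String> fields = new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
            byTag.computeIfAbsent(parts[0], tag -> new ArrayList<>()).add(fields);
        }
        return byTag;
    }
}
```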

CSVParser3

CSVParser3 extends the CSVParser2 strategy. This parsing strategy offers the ability to parse CSV files where a bean owns lists of beans that are defined in following lines. Here is a CSV example of this kind of file:


tag1;a;b
tag2;1;2;3
tag2;4;5;6
tag1;c;d
tag2;7;8;9
tag1;e;f

    

In this example, we want the beans identified by the tag1 tag to contain the beans identified by the tag2 tag. Here is a way to define our beans:


public class BeanTag1 {
  private String field1;
  private String field2;
  private List beans = new ArrayList();

  // ... getters and setters
}

public class BeanTag2 {
  private Integer field1;
  private Integer field2;
  private Integer field3;

  // ... getters and setters
}

    

To define that the lines that follow the tag1 line must be included in the list, we can use the children XML element in our mapping file:


<csvbeans>
 <strategy>
   <parser className="org.csvbeans.parsers.CSVParser3" />
 </strategy>
 <record className="org.csvbeans.samples.strategy.BeanTag1" start="tag1">
   <field name="field1" maxLen="10" />
   <field name="field2" maxLen="2" required="true" />
   <children>
     <child ref="tag2" name="beans" />
   </children>
 </record>
 <record className="org.csvbeans.samples.strategy.BeanTag2" start="tag2">
   <field name="field1" maxLen="3" digit="true" />
   <field name="field2" maxLen="2" digit="true" />
   <field name="field3" maxLen="1" digit="true" />
 </record>
</csvbeans>

    

The child element references the start tag of the beans to be added: all the tag2 lines that follow a tag1 line will be converted into BeanTag2 objects and added to the beans list of the BeanTag1 object.
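
The nesting rule can be sketched by hand as follows: each tag2 line is attached to the most recent tag1 line. This is an approximation of the behaviour described above, not the CSVParser3 code. The class names are hypothetical, and rejecting a tag2 line that precedes any tag1 line is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of parent and child records; not csvBeans code.
final class ParentChildSketch {

    static final class Parent {
        final String field1;
        final String field2;
        final List<int[]> beans = new ArrayList<>(); // children read from tag2 lines

        Parent(String field1, String field2) {
            this.field1 = field1;
            this.field2 = field2;
        }
    }

    /** Attaches every tag2 line to the tag1 line that precedes it. */
    static List<Parent> parse(List<String> lines) {
        List<Parent> parents = new ArrayList<>();
        Parent current = null;
        for (String line : lines) {
            String[] f = line.split(";", -1);
            if (f[0].equals("tag1")) {
                current = new Parent(f[1], f[2]);
                parents.add(current);
            } else if (f[0].equals("tag2")) {
                if (current == null) {
                    throw new IllegalStateException("tag2 line before any tag1 line");
                }
                current.beans.add(new int[] {
                        Integer.parseInt(f[1]), Integer.parseInt(f[2]), Integer.parseInt(f[3]) });
            }
        }
        return parents;
    }
}
```

Applied to the sample file, the first parent receives two children, the second one child, and the third none.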

The builders

CSVBuilder

The CSVBuilder building strategy works with the same kind of files as CSVParser, except that it builds the files instead of parsing them.

You define the building strategy in the XML specifications file or you can create it in your code. In the latter case, you will need to perform the strategy initialization by yourself which means:

  • Instantiate the strategy
  • Give it a reference to an already parsed specifications file
  • Give it a reference to internal properties if needed
  • Give it a reference to a messages bundle
Here is an example:

      CSVBuilder strategy = new CSVBuilder();
      strategy.setSpecifications(specifications);
      strategy.setProperties(new HashMap());
      strategy.setMessageSource(new ResourceBundleMessageSource());
      

Then, you can use the builder. If you define the builder in the XML file, all the initialization is performed internally by csvBeans; therefore, this is the preferred way to define the strategy:


<csvbeans>
  <strategy>
    <builder className="org.csvbeans.builders.CSVBuilder" />
  </strategy>

  <!-- Records mappings -->
</csvbeans>

      

Once you have obtained the builder from the specifications, you can feed it with your beans:


BuildingStrategy builder = specs.getBuildingStrategy();
builder.addBeans("# MyBeans", beans);

      

To generate the CSV, just create a LinesWriter object (the library provides an OutputStreamLinesWriter implementation that writes the CSV to an output stream, such as a file) and use the build method:

builder.build(new OutputStreamLinesWriter(new FileOutputStream("result.csv")));
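
To give an idea of the output, here is a hand-written sketch of what building one section amounts to: a start tag line followed by one separated line per record. It does not reproduce the CSVBuilder implementation. The class name is hypothetical, and writing the start tag as the first line is an assumption based on the sample files shown earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the generated layout; not csvBeans code.
final class SectionWriterSketch {

    /** Returns the start tag line followed by one joined line per record. */
    static List<String> buildSection(String startTag, List<String[]> records, String separator) {
        List<String> out = new ArrayList<>();
        out.add(startTag);
        for (String[] record : records) {
            out.add(String.join(separator, record));
        }
        return out;
    }
}
```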

CSVBuilder3

CSVBuilder3 was developed to meet some special requirements of a project I worked on. I am not sure that it will stay in future releases of csvBeans, since it will probably be refactored.

It can be used to build files similar to those parsed by a CSVParser3 object.

Converters

csvBeans 0.7 introduced a new and important feature: converters. The initial idea, provided by Christian Menz, was to offer a way to encode and decode values when building or parsing a CSV file, in order to isolate "infrastructure" operations (like encryption/decryption). I found this so powerful that I decided to replace some operations that were previously driven by field attributes with their own converters.

Therefore, padding and date operations are now supplied by PaddingConverter and DateConverter. The XML syntax looks more complex, but it lets you combine several kinds of operations and define your own.

The use of a converter is easy: just define the class of the converter to use in your specifications file:

<csvbeans>
  <converters>
    <converter id="converter1" className="org.csvbeans.converters.HexConverter" />
    <converter id="dateConverter" className="org.csvbeans.converters.DateConverter">
      <property name="pattern" value="MM-dd-yyyy" />
    </converter>
  </converters>
  <record className="org.csvbeans.beans.Bean1" start="tag">
    <field name="field1" maxLen="10">
      <converter ref="converter1" />
    </field>
    <field name="field2" maxLen="2">
      <converter id="converter2" className="org.csvbeans.converters.Base64Converter" />
    </field>
    <field name="someDate">
      <converter ref="dateConverter" />
    </field>
  </record>
</csvbeans>

The converter tag is used to define a converter. The id attribute must be unique within the XML file and identifies the converter, so that it can be referenced from field elements. The className attribute defines the class of the converter: csvBeans ships with some simple converters that you can use. The converter element can contain nested property elements to customize the converter if needed.

You can see in the preceding sample that the converter tag can be defined globally or within a field element. To refer to a global converter, use the ref attribute of the converter element.

The converter must implement the Converter interface. The encode method is called when the CSV file is built and the decode method is called when it is parsed. You can of course create your own converters.
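
The encode/decode pairing can be sketched as follows. TextConverterSketch is a hypothetical stand-in whose method signatures are assumptions; the real Converter interface may use other parameter or return types. The two implementations only illustrate the kind of work a hex or Base64 converter does, using standard JDK classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical contract; the real csvBeans Converter interface may differ.
interface TextConverterSketch {
    String encode(String value); // applied when building the file
    String decode(String value); // applied when parsing the file
}

final class Base64Sketch implements TextConverterSketch {
    @Override
    public String encode(String value) {
        return Base64.getEncoder().encodeToString(value.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public String decode(String value) {
        return new String(Base64.getDecoder().decode(value), StandardCharsets.UTF_8);
    }
}

final class HexSketch implements TextConverterSketch {
    @Override
    public String encode(String value) {
        StringBuilder hex = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            // two lowercase hex digits per byte
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    @Override
    public String decode(String value) {
        byte[] bytes = new byte[value.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(value.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

In both cases decode is the exact inverse of encode, which is what allows the same specification to be used for parsing and for building.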

You must use the DateConverter when you need to parse or build a date field. The date attribute of the field element is no longer supported.
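
To show what a pattern such as MM-dd-yyyy means in practice, here is a small sketch of pattern-based date conversion. It uses java.time from the JDK and a hypothetical class name; how DateConverter is actually implemented is not stated here, so this is only an assumption about its general behaviour.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of pattern-based date conversion; not csvBeans code.
final class DatePatternSketch {
    private final DateTimeFormatter formatter;

    DatePatternSketch(String pattern) {
        this.formatter = DateTimeFormatter.ofPattern(pattern);
    }

    /** Date to text, as needed when building a file. */
    String encode(LocalDate date) {
        return formatter.format(date);
    }

    /** Text to date, as needed when parsing a file. */
    LocalDate decode(String text) {
        return LocalDate.parse(text, formatter);
    }
}
```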

Working with Fixed length files

csvBeans was first created to handle CSV files, but since version 0.7 it can also work with files whose field values are not separated by a character.

The original idea comes from Oscar Gonzalez, of SameLAN, S.L. Soluciones Tecnológicas, who pointed out that a mapping tool able to handle such flat files would be helpful, since many projects work with that kind of file.

A fixed length file is a file where the fields are specified with their positions. For example:


# Tag1
11222333
44555666
77888999

  		

Each line is composed of three field values. csvBeans can map such lines to POJOs even though there is no separator. To parse (or build) such files, your record specification will look like the following:


<record start="# Tag1" className="org.csvbeans.samples.fixedlength.Bean1">
  <field name="field1" startPosition="1" endPosition="2" />
  <field name="field2" startPosition="3" endPosition="5" />
  <bean name="bean" className="org.csvbeans.samples.fixedlength.Bean2">
    <field name="field1" startPosition="6" endPosition="8" />
  </bean>
</record>
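
The position arithmetic can be sketched as follows, assuming that startPosition and endPosition are 1-based and inclusive, which is what the sample data suggests (positions 1 to 2, 3 to 5 and 6 to 8 on lines of eight characters). The class name is hypothetical.

```java
// Hypothetical illustration of fixed-length field extraction; not csvBeans code.
final class FixedLengthSketch {

    /** Extracts a field using 1-based, inclusive start and end positions. */
    static String field(String line, int startPosition, int endPosition) {
        if (startPosition < 1 || endPosition < startPosition || endPosition > line.length()) {
            throw new IllegalArgumentException(
                    "invalid positions " + startPosition + ".." + endPosition);
        }
        // substring uses a 0-based inclusive start and an exclusive end
        return line.substring(startPosition - 1, endPosition);
    }
}
```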

Moreover, csvBeans can handle flat files where a bean is defined by multiple lines instead of a unique one. Suppose that we have the following data:


# Tag2
123456
76543223
234499
89938090
433146
99880347

In this example, each record needs two lines to be built. Here is a specification to describe this. Note the usage of the lineSetPosition attribute:


<record start="# Tag2" className="org.csvbeans.samples.fixedlength.Bean1">
  <field name="field1" startPosition="1" length="2" lineSetPosition="1"/>
  <field name="field2" startPosition="1" length="3" lineSetPosition="2" />
  <bean name="bean" className="org.csvbeans.samples.fixedlength.Bean2">
    <field name="field1" startPosition="4" length="5" lineSetPosition="2"/>
    <field name="field2" startPosition="3" length="4" lineSetPosition="1"/>
  </bean>
</record>

Given two lines, the field2 property of the Bean1 object is obtained by reading the first three characters of the second line.
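
The same lookup can be written by hand as a short sketch, assuming that lineSetPosition and startPosition are both 1-based. The class name is hypothetical.

```java
import java.util.List;

// Hypothetical illustration of reading a field from a multi-line record; not csvBeans code.
final class LineSetFieldSketch {

    /** Reads length characters from the given line of a line set; positions are 1-based. */
    static String field(List<String> lineSet, int lineSetPosition, int startPosition, int length) {
        String line = lineSet.get(lineSetPosition - 1);
        int begin = startPosition - 1;
        return line.substring(begin, begin + length);
    }
}
```

For the first record of the sample data (123456 and 76543223), the four fields of the specification evaluate to 12, 765, 43223 and 3456.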

Extending csvBeans

csvBeans was designed so that it is easy to extend and to add new parsers and builders. Therefore, the library defines many interfaces and comes with implementations of those interfaces that can be swapped for new ones.

Those interfaces can be found in the org.csvbeans.interfaces package and its subpackages.

An example of how csvBeans has been extended is the fixed length files parser and builder included in the library. At first, I wanted to introduce new implementations of the ParsingStrategy and BuildingStrategy interfaces. That could have been a solution, but in my opinion a better one was to provide new implementations of the LineParser and LineBuilder interfaces.

Indeed, they are easier to implement and to test. Moreover, they are easy to associate with the CSVParser and CSVBuilder classes in the specifications file through an XML attribute.

Tips and tricks

Changing the CSV separator

Just set the separator property in the XML mapping file:


<csvbeans>
  <property name="separator" value="|" />
  <!-- -->
</csvbeans>
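
If you ever split such lines yourself in Java, remember that String.split takes a regular expression and that "|" is a regex metacharacter. The sketch below, with a hypothetical class name, quotes the separator first; it says nothing about how csvBeans splits lines internally.

```java
import java.util.regex.Pattern;

// Hypothetical helper showing a literal split on an arbitrary separator; not csvBeans code.
final class SeparatorSketch {

    /** Splits a line on a literal separator and keeps trailing empty fields. */
    static String[] split(String line, String separator) {
        // Pattern.quote makes "|" match literally instead of meaning alternation
        return line.split(Pattern.quote(separator), -1);
    }
}
```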

    

Parsing a CSV file that has no start tag

Just set the noStartTag property to true in the specifications file:


<csvbeans>
  <property name="noStartTag" value="true" />
  <!-- -->
</csvbeans>

    

When using a listener, reduce memory usage

When you parse a file with the supplied parsers, the created beans are stored within the parser by default, so that you can retrieve them with the getBeans method. If you use a listener, you probably do not need getBeans, and therefore do not need the parser to store the beans. You can turn the storage off by setting the enableBeanStorage property to false:


<csvbeans>
  <property name="enableBeanStorage" value="false" />
  <!-- -->
</csvbeans>

    

Internationalize the library error messages

The error messages of the library are now externalized in resource properties files. These files can be found in the jar file and are named messages_xx.properties, where xx is the language of the messages.

There is also a messages.properties file which contains the default messages when no properties file has been found for your language. You can add your own messages files or replace an existing one by adding it in the classpath.

Just copy an existing messages file and rename it with the ISO code of your language.
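
The naming scheme matches the one used by the standard java.util.ResourceBundle lookup. Assuming that csvBeans resolves its messages in this standard way (an assumption, not something stated above), the sketch below lists the file names that would be tried for a given locale, from most to least specific. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

// Hypothetical helper listing standard ResourceBundle candidate files; not csvBeans code.
final class MessageFileNamesSketch {

    /** Returns the properties file names tried for a locale, most specific first. */
    static List<String> candidateFiles(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        List<String> files = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            // the root locale maps to the plain base name, i.e. the default file
            files.add(control.toBundleName(baseName, candidate) + ".properties");
        }
        return files;
    }
}
```

For a French locale without a country, the candidates are messages_fr.properties followed by the default messages.properties.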