JDBC Importer

JDBC Importer Tutorial 8 : One Table, CSV Delimited, New Database-Aware Column Translator

Please make sure you have the appropriate libraries in your classpath (including the JDBC driver used to connect to your database) before starting the tutorials.

In this tutorial, you'll learn the basics of creating a new Database-Aware Column Translator and running the import with it. The table that will contain the rows imported is called employee and it has the following columns:

employee
Name              Type
id                number(6)
firstname         varchar(10)
lastname          varchar(10)
jobdescription    varchar(10)
managerid         number(6)
startdate         date
salary            number(9,2)
department        number(6)

The column translator will use the following table to translate a value from the input file:

department
Name              Type
id                number(6)
name              varchar(20)
office            varchar(20)

Make sure that these tables are created in the database into which you'll be importing data. You can find the Oracle creation script in the samples directory under the filename : 'tutorial8/createtable_ora.sql'.

Now that the database is set up, you can examine the architecture to see how the Column Translator is used during the import.

Architecture Background

The Column Translator is used during the import for converting the column value read from the file into a different value.

JDBCImporter requires that each Database-Aware Column Translator be a JavaBean-like object. All '<property>' tags defined inside the '<translator>' tag will be passed to the appropriate set method. A DbColumnTranslator must implement three methods : setup( Connection ), getValue( ColumnDef, ColumnValue ) and cleanup().
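
To make the JavaBean convention concrete, here is a minimal sketch of how a property name from a '<property>' tag could be mapped to a set method through reflection. This is an assumption about the general technique, not JDBCImporter's actual implementation; the classes ExampleTranslatorBean and PropertyInjectionSketch are hypothetical names used only for illustration.

```java
import java.lang.reflect.Method;
import java.util.Map;

// Hypothetical stand-in for a translator bean; only the setter matters here.
class ExampleTranslatorBean {
    private String lookupTable;

    public void setLookupTable(String table) {
        this.lookupTable = table;
    }

    public String getLookupTable() {
        return lookupTable;
    }
}

// Sketch of JavaBean-style property injection (not JDBCImporter's own code).
final class PropertyInjectionSketch {

    // Turns a property name such as "lookupTable" into "setLookupTable".
    static String setterName(String property) {
        return "set" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
    }

    // Calls the matching single-String setter for every name/value pair.
    static void applyProperties(Object bean, Map<String, String> properties) {
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            try {
                Method setter = bean.getClass()
                        .getMethod(setterName(entry.getKey()), String.class);
                setter.invoke(bean, entry.getValue());
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException(
                        "No usable setter for property '" + entry.getKey() + "'", e);
            }
        }
    }
}
```

With this sketch, a property named 'lookupTable' with the value 'department' results in a call to setLookupTable( "department" ) on the bean.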

Custom Column Translator

The Column Translator that you will be creating will convert the string representing the name of the department into the id of the department.

The first thing to do is create the class DbLookupTableColumnTranslator that implements the DbColumnTranslator interface.

import java.sql.Connection;
import java.sql.SQLException;

import net.sourceforge.jdbcimporter.ColumnDef;
import net.sourceforge.jdbcimporter.ColumnValue;
import net.sourceforge.jdbcimporter.DbColumnTranslator;

public class DbLookupTableColumnTranslator implements DbColumnTranslator
{

	public void setup( Connection con ) throws SQLException
	{
		// TODO Auto-generated method stub

	}

	public void cleanup( ) throws SQLException
	{
		// TODO Auto-generated method stub

	}

	public ColumnValue getValue( ColumnDef column, ColumnValue columnValue )
	{
		// TODO Auto-generated method stub
		return null;
	}

}
				
Initial Code

The Column Translator needs three properties : the lookup table, the column name of the source value (the value in the file) and the column name of the lookup value (the value that will be stored in the database).

...

public class DbLookupTableColumnTranslator implements DbColumnTranslator
{

	protected String lookupTable;
	protected String lookupColumn;
	protected String sourceColumn;
	
	public void setLookupTable( String table )
	{
		this.lookupTable = table;
	}

	public void setLookupColumn( String column )
	{
		this.lookupColumn = column;
	}
			
	public void setSourceColumn( String column )
	{
		this.sourceColumn = column;
	}	
}
				
Properties

These properties will be set before any of the three methods are called. The setup method will initialize a PreparedStatement to retrieve a value from the lookup table. The cleanup method will close the PreparedStatement.

import java.sql.PreparedStatement;
...

public class DbLookupTableColumnTranslator implements DbColumnTranslator
{
...
	protected PreparedStatement stmt;
	
...
	public void setup( Connection con ) throws SQLException
	{
		stmt = con.prepareStatement( "SELECT "+lookupColumn+" FROM "+
				lookupTable+ " WHERE "+sourceColumn+" = ?");
	}

	public void cleanup( ) throws SQLException
	{
		stmt.close();
	}
...
}		
				
Setup And Cleanup
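
To see which statement setup prepares, the following sketch repeats the same string concatenation in a small helper. The class LookupQuerySketch is a hypothetical name and not part of JDBC Importer; the property values are the ones used later in this tutorial.

```java
// Sketch: repeats the string concatenation from setup() so the SQL can be inspected.
final class LookupQuerySketch {

    // Builds the parameterized lookup query; only the source value is a bind parameter.
    static String buildQuery(String lookupTable, String lookupColumn, String sourceColumn) {
        return "SELECT " + lookupColumn + " FROM " + lookupTable
                + " WHERE " + sourceColumn + " = ?";
    }
}
```

For lookupTable = 'department', lookupColumn = 'id' and sourceColumn = 'name', the helper returns 'SELECT id FROM department WHERE name = ?'. Because the table and column names are concatenated into the SQL text rather than bound as parameters, they should only come from a trusted import definition.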

The Column Translator now needs to lookup the value based on the source value read from the file:

import java.sql.ResultSet;
...

public class DbLookupTableColumnTranslator implements DbColumnTranslator 
{
...
	protected PreparedStatement stmt;
	protected JDBCParameterHelper helper = new JDBCParameterHelper();
	
...
	public ColumnValue getValue( ColumnDef column, ColumnValue columnValue )
	{
		// Stays empty when the lookup finds no matching row.
		ColumnValue returnValue = new ColumnValue();
		try
		{
			// Bind the value read from the file (assumed to be a string).
			stmt.setString( 1, columnValue.getString() );
			ResultSet resultSet = stmt.executeQuery();
			try
			{
				// Only the first matching row is used.
				if ( resultSet.next() )
				{
					returnValue = helper.getColumn( resultSet, 1, column );
				}
			}
			finally
			{
				// Close the result set even if reading the value fails.
				resultSet.close();
			}
		}
		catch ( SQLException e )
		{
			e.printStackTrace();
		}
		return returnValue;
	}
...
}
				
getValue Implementation

The DbLookupTableColumnTranslator sets the first parameter of the PreparedStatement to the value read from the file (it assumes that the column is a string). It then executes the query and returns the first value of the result set. If the query returns no rows, then an empty ColumnValue is returned.

This ends the tutorial for creating the custom Database-Aware Column Translator. The full source code of the DbLookupTableColumnTranslator is found under the package 'samples.columntranslator'. What follows are the instructions on how to use the custom Column Translator during the import. If you have read through the first tutorial then you may wish to skip to the list of columns in the entity definition section. The other sections are the same as in the first tutorial.
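
The lookup behaviour described above (first match wins, no match yields an empty result) can be tried without a database. The sketch below is an in-memory stand-in: the class LookupSemanticsSketch is hypothetical, it uses null in place of an empty ColumnValue, and the department rows are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for the lookup table; no database or JDBC Importer classes involved.
final class LookupSemanticsSketch {

    // Maps a source value (department name) to the ids of all matching rows, in insert order.
    private final Map<String, List<Integer>> rows = new HashMap<>();

    void addRow(String name, int id) {
        rows.computeIfAbsent(name, key -> new ArrayList<>()).add(id);
    }

    // Returns the id of the first matching row, or null when no row matches
    // (null plays the role of the empty ColumnValue here).
    Integer translate(String name) {
        List<Integer> matches = rows.get(name);
        return (matches == null || matches.isEmpty()) ? null : matches.get(0);
    }
}
```

If two rows share the same name, only the first one is used, which mirrors the single call to resultSet.next() in getValue.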

Import Config XML

Now that the database is set up, you can examine the import XML config file that will be used (in the samples directory under the filename : 'tutorial8/import.xml'). The file begins with the standard XML document declaration followed by the '<import>' tag. This tag indicates that there is an import to be processed.

Seven attributes can be specified on the '<import>' tag: 'log', 'bad', 'commitCount', 'batchCount', 'preSQLFile', 'postSQLFile' and 'trimValues'. The 'log' attribute specifies a filename into which JDBCImporter writes all audit, error, and warning messages that occur during the import. The 'bad' attribute specifies a filename into which JDBCImporter writes data that was not properly imported into the database. The 'commitCount' attribute specifies how many rows to import before calling commit on the JDBC Connection. The 'batchCount' attribute specifies how many rows to import before calling executeBatch on the import engine (when the JDBC driver supports batch mode). By default, the 'commitCount' and 'batchCount' attributes are set to 1, auto commit is turned on and batch mode is not used. The 'preSQLFile' and the 'postSQLFile' attributes specify filenames that contain SQL statements to be executed before and after the import, respectively. The 'trimValues' attribute specifies whether string values read from the Delimiter Parser are trimmed (i.e. leading and trailing whitespace is removed). By default, it is set to false.

There are two parts inside the '<import>' tag that define how and where the data is imported: the connection definition and the entity definitions.

Connection Definition

The connection definition begins with the '<connection>' tag and contains the information needed to connect to the database. In this tutorial, you will be using the JDBC DriverManager to initialize a connection to the database. To indicate this, the 'type' attribute's value, inside the '<connection>' tag, is 'jdbc'. The specific connection information is found inside the '<connection>' tag as '<property>' tags. A '<property>' tag has two attributes: 'name' specifies the name of the property and 'value' specifies the string value of the property. For the JDBC DriverManager, you will need to specify the following information: the driver class name (with the property name 'driver'), the connection url (with the property name 'url'), the username (with the property name 'username') and the password (with the property name 'password'). The following is an example of the connection definition :

 <connection type="jdbc"> 
    <property name="driver" value="oracle.jdbc.driver.OracleDriver"/> 
    <property name="url" value="jdbc:oracle:thin:@localhost:1521:orcl"/> 
    <property name="username" value="scott"/> 
    <property name="password" value="tiger"/> 
 </connection> 
Sample XML for Connection Definition
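
For readers unfamiliar with the JDBC DriverManager, the sketch below shows how the four properties could be turned into a connection using the standard java.sql API. It is an illustration under assumptions, not JDBCImporter's own code; the class JdbcConnectionSketch and its methods are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: opens a connection from the four properties of a 'jdbc' connection definition.
final class JdbcConnectionSketch {

    static final String[] REQUIRED = { "driver", "url", "username", "password" };

    // Lists the required property names that are absent, in declaration order.
    static List<String> missingProperties(Map<String, String> properties) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!properties.containsKey(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    // Loads the driver class and asks the DriverManager for a connection.
    static Connection open(Map<String, String> properties)
            throws ClassNotFoundException, SQLException {
        List<String> missing = missingProperties(properties);
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("Missing connection properties: " + missing);
        }
        Class.forName(properties.get("driver"));
        return DriverManager.getConnection(
                properties.get("url"),
                properties.get("username"),
                properties.get("password"));
    }
}
```

Calling open only succeeds when the JDBC driver jar is in the classpath and the database is reachable.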

Entity Definition

Since you will be importing data into one table, there will be only one entity definition. In general, you will need an entity definition for each table into which you will be importing data. Remember to specify the entity definitions in the order that the import should occur. For example, if table 'ingredient' depends on table 'recipe' (i.e. has a foreign key), the entity definition of table 'recipe' should be placed before the entity definition of table 'ingredient'. Every entity definition begins with the '<entity>' tag.

The 'table' attribute must contain the name of the table. Optionally, you can further specify the table by providing values for the 'schema' and the 'catalog' attributes.

To specify a custom import engine to process the entity, you may add the 'engine' attribute, whose value is the classname of the import engine. In this tutorial, you will be using the default import engine.

The 'source' attribute must contain the data file location. Looking at the sample data (found under 'samples/tutorial8/employee.csv'), you will see that there are 8 columns separated by the ',' character.

There are three parts inside the '<entity>' tag : the delimiter parser definition, row translator definition, and the list of columns found in the data file.

Delimiter Parser

The delimiter parser definition begins with the '<delimiter>' tag and contains the information needed to parse the input file into a set of rows that will be imported into the table. In this tutorial, you will be using the CSV Delimiter Parser. To indicate this, the 'type' attribute's value, inside the '<delimiter>' tag, is 'csv'. The specific Delimiter Parser information is found inside the '<delimiter>' tag. For the CSV Delimiter Parser, you will need to specify the following information (as '<property>' tags): the string that delimits a column (in the property named 'columnDelimiter'), the string that encloses a column (optional, in the property named 'enclosedDelimiter') and whether the string that encloses a column is optional (in the property named 'enclosedOptional', which must have a value of 'true' or 'false'). Since the data file has only a column delimiter (',' is the string separating the columns), the Delimiter Parser definition will look like this :

  <delimiter type="csv"> 
    <property name="columnDelimiter" value=","/> 
  </delimiter> 
Sample XML for CSV Delimiter Parser
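
As a rough illustration of what the 'columnDelimiter' property controls, the sketch below splits one line into column values. It is not the CSV Delimiter Parser itself: the class ColumnSplitSketch is hypothetical, it ignores the 'enclosedDelimiter' and 'enclosedOptional' properties, and the row in the usage note is made up rather than taken from 'employee.csv'.

```java
import java.util.regex.Pattern;

// Sketch: splits one input line on a literal column delimiter.
final class ColumnSplitSketch {

    // Pattern.quote treats the delimiter as plain text rather than a regular expression;
    // the limit of -1 keeps trailing empty columns instead of dropping them.
    static String[] splitColumns(String line, String columnDelimiter) {
        return line.split(Pattern.quote(columnDelimiter), -1);
    }
}
```

For example, splitting the made-up row '1,John,Smith,Manager,,2001-01-15,50000.00,Sales' on ',' yields 8 values, the fifth of which is an empty string.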

Row Translator

The row translator definition is optional and begins with the '<translator>' tag. It contains the information needed to translate each row's values and may add or remove column values or skip the whole row. In this tutorial, you will not be using a row translator. Therefore the '<translator>' tag does not appear as a direct child of the '<entity>' tag.

List of Columns

The final portion of the entity definition is the list of columns that are to be imported from the input file into the database. The columns should be listed in the same order as they appear in the input file. Each column is defined inside the '<column>' tag. The name of the column must appear in the 'name' attribute of the '<column>' tag. Optionally, the java.sql.Type may be specified in the 'SQLType' attribute of the '<column>' tag. You will be letting the JDBC Importer figure out most of the column types (except for dates) in the database, so the 'SQLType' attribute is omitted except for the 'startdate' column. Since you will be using the custom Column Translator for the 'department' column, you will have to specify the '<translator>' tag inside the '<column>' tag. You must choose an identifier for the custom Column Translator (e.g. 'tutorial_lookup') and set the 'type' attribute's value, inside the '<translator>' tag, to that identifier. The specific Column Translator information is found inside the '<translator>' tag. For the DB Lookup Table Column Translator, there are three properties that need to be set : the lookup table = 'department', the source column = 'name' and the lookup column = 'id'. Here is an example of how the list of columns is defined in the import definition:

  <column name="id"></column>
  <column name="firstname"></column>
  <column name="lastname"></column>
  <column name="jobdescription"></column>
  <column name="managerid"></column>
  <column name="startdate" SQLType="DATE"></column>
  <column name="salary"></column>
  <column name="department">
    <translator type="tutorial_lookup"> 
      <property name="lookupTable" value="department"/> 
      <property name="sourceColumn" value="name"/> 
      <property name="lookupColumn" value="id"/> 
    </translator> 
  </column>
Sample XML for List of Columns

Running the Import

By now, the import definition should look like this (with your appropriate connection information):

<import log="import.log" bad="import.bad"> 
  <connection type="jdbc"> 
     <property name="driver" value="oracle.jdbc.driver.OracleDriver"/> 
     <property name="url" value="jdbc:oracle:thin:@localhost:1521:orcl"/> 
     <property name="username" value="scott"/> 
     <property name="password" value="tiger"/> 
  </connection> 
  <entity table="employee" source="employee.csv">
    <delimiter type="csv"> 
      <property name="columnDelimiter" value=","/> 
    </delimiter> 
    <column name="id"></column>
    <column name="firstname"></column>
    <column name="lastname"></column>
    <column name="jobdescription"></column>
    <column name="managerid"></column>
    <column name="startdate" SQLType="DATE"></column>
    <column name="salary"></column>
    <column name="department">
      <translator type="tutorial_lookup"> 
        <property name="lookupTable" value="department"/> 
        <property name="sourceColumn" value="name"/> 
        <property name="lookupColumn" value="id"/> 
      </translator> 
    </column>
  </entity> 
</import> 
Sample XML for Tutorial 8

Since you are using the custom Column Translator (DB Lookup Table Column Translator) you will have to create a property file with one entry that maps the identifier to the full name of the class that implements the DbColumnTranslator interface. The entry's key should start with 'columntranslator.' (this indicates that the custom component is a Column Translator). It should look like this:

columntranslator.tutorial_lookup=samples.columntranslator.DbLookupTableColumnTranslator
				
Property File Entry for Custom Column Translator
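
The mapping from identifier to class name can be checked with the standard java.util.Properties class. The sketch below is an illustration under assumptions, not how JDBCImporter necessarily reads the file: the class TranslatorRegistrySketch is hypothetical, and the identifier 'tutorial_lookup' is the one used in the '<translator>' tag of the import definition.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch: resolves a Column Translator identifier to a class name via a property file entry.
final class TranslatorRegistrySketch {

    // Prefix that marks an entry as a custom Column Translator.
    static final String PREFIX = "columntranslator.";

    // Parses property file text (the same syntax as a .properties file on disk).
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    // Returns the class name registered for the identifier, or null if there is no entry.
    static String resolveClassName(Properties properties, String identifier) {
        return properties.getProperty(PREFIX + identifier);
    }
}
```

An identifier that has no matching 'columntranslator.' entry resolves to null in this sketch, so the key in the property file must match the 'type' attribute exactly.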

You will also have to include an extra jar file in the classpath before you can use the custom Column Translator (the jar file 'jdbcimporter-samples.jar' under the directory 'lib' contains the custom Column Translator).

You can run the import by issuing the following command (assuming that the import definition and property file are in the current directory and are called 'import.xml' and 'custom.properties', respectively):

java net.sourceforge.jdbcimporter.Importer import.xml custom.properties

If all goes well, the two log files should be created. The normal log file should contain an informational message indicating that all rows were imported. The bad log file should contain a heading for the import table.