JDBC Importer

JDBC Importer Tutorial 11 : One Table, Excel File (New Binary Delimiter Parser)

Please make sure you have the appropriate libraries in your classpath (including the JDBC driver used to connect to your database) before starting the tutorials.

In this tutorial, you'll learn the basics of creating a new Binary Delimiter Parser and running the import with it. The table that will contain the rows imported is called employee and it has the following columns:

employee
Name	Type
id	number(6)
firstname	varchar(10)
lastname	varchar(10)
jobdescription	varchar(10)
managerid	number(6)
startdate	date
salary	number(9,2)
department	number(6)

Make sure that these table(s) are created in the database that you'll be importing data. You can find the oracle creation script in the samples directory under the filename : 'tutorial11/createtable_ora.sql'.

Now that the database is setup, you can examine the architecture to see how the Binary Delimiter Parser is used during the import.

Architecture Background

The Binary Delimiter Parser similar to the Delimiter Parser. It is used during the import for two functions: splitting the binary file into multiple rows that will be imported (the getNextRow() method), and splitting each row into multiple columns (the getValues() method).

The JDBCImporter requires that each Binary Delimiter Parser be a Java Bean like object. All <property> tags defined inside the '<delimiter>' will be passed to the appropriate set method. For example, with the CSV Delimiter Parser, you define a <property> tag with the name 'columnDelimiter' inside the '<delimiter>' tag. The JDBCImporter looks for the 'setCSVDelimiter' method in the CSVDelimtierParser.java and calls it with the property value.

After the Binary Delimiter Parser is created, the JDBCImporter will set the input reader (the setInputStream() method) before calling the two methods. The JDBCImporter will call the getNextRow() (followed by getValues()) until getNextRow() returns null. The JDBCImporter will also call the getRowAsString() to convert the row into a string that can be logged if the import fails.

Custom Delimiter Parser

Here are the requirements for the Excel Binary Delimiter Parser :

Any Excel spreadsheet can be imported

The first thing to do is create the class ExcelBinaryDelimiterParser that implements the BinaryDelimiterParser interface.

package samples.delimiterparser;

import java.io.IOException;
import java.io.InputStream;

import net.sourceforge.jdbcimporter.BinaryDelimiterParser;
import net.sourceforge.jdbcimporter.ColumnValue;
import net.sourceforge.jdbcimporter.MalformedDataException;

public class ExcelBinaryDelimiterParser implements BinaryDelimiterParser {

	public Object getNextRow() throws IOException 
	{
		// TODO Auto-generated method stub
		return null;
	}

	public String getRowAsString(Object row) 
	{
		// TODO Auto-generated method stub
		return null;
	}

	public ColumnValue[] getValues(Object row) throws MalformedDataException 
	{
		// TODO Auto-generated method stub
		return null;
	}

	public void setInputStream(InputStream in) 
	{
		// TODO Auto-generated method stub
	}

}

Initial Code

The user will have to configure the Excel Binary Delimiter Parser with the sheet and the columns that will be read to import each row. So, the ExcelBinaryDelimiterParser class needs to have a method to set the index of the sheet and another method to set the array of columns.

...

public class ExcelBinaryDelimiterParser implements DelimiterParser
{
  ...
	
	int sheet = 0;
	int[] columns = new int[0];
	
	public void setSheet( int sheet )
	{
		this.sheet = sheet;
	}
	
	public void setColumns( int[] columns )
	{
		this.columns = columns;
	}
}

XSLFile property

You can now define the XML in the import config file that will be used to construct and configure the XML Delimiter Parser. The XML will contain two property elements: 'sheet' that will contain the index of the sheet and 'columns' that will contain the array of columns. Here is an example:

  <delimiter type="tutorial_excel"> 
    <property name="sheet" value="0"/> 
    <property name="columns" value="0,1,2,3,4,5,6,7"/> 
  </delimiter>

Sample XML for Excel Binary Delimiter Parser

Now that the Excel Binary Delimiter Parser is properly setup, a third party library called 'Java Excel API' will be used to read the Excel spreadsheet and implement the two methods. The first method 'getNextRow()' is implemented by opening the input stream as a Java Excel API Workbook. If the current row is less then the maximum row, each column is validated to be not null. The row is returned as an Integer:

import java.io.IOException;
import java.io.InputStream;

import jxl.Cell;
import jxl.CellType;
import jxl.DateCell;
import jxl.LabelCell;
import jxl.NumberCell;
import jxl.Sheet;
import jxl.Workbook;
import jxl.read.biff.BiffException;
import net.sourceforge.jdbcimporter.BinaryDelimiterParser;
import net.sourceforge.jdbcimporter.ColumnValue;
import net.sourceforge.jdbcimporter.MalformedDataException;

public class ExcelBinaryDelimiterParser implements BinaryDelimiterParser {

	Workbook  curWorkbook   = null;
	Sheet     curSheet      = null;
	int         curRow      = 0;
	int         curMaxRow   = 0;
	
...
		
	public Object getNextRow() throws IOException 
	{
		if ( curWorkbook == null )
		{
			try
			{
				curWorkbook = Workbook.getWorkbook(curIn);
				curSheet = curWorkbook.getSheet(sheet);
				curRow   = 0;
				curMaxRow = curSheet.getRows();
			}
			catch ( BiffException e )
			{
				throw new IOException( e.getMessage() );
			}
		}
		
		if ( curRow >= curMaxRow )
		{
			return null;
		}
		
		boolean isNull = true;
		
		for ( int i = 0; i < columns.length; i++ )
		{
			int col = columns[i];
			if ( curSheet.getCell(col,curRow).getContents() != null )
			{
				isNull = false;
				break;
			}
		}
		
		if ( isNull )
		{
			return null;
		}
		Integer returnInteger = new Integer(curRow);
		curRow++;
		return returnInteger;
	}
	
...

}

getNextRow Implementation

The second method 'getValues()' will take the Integer and create a ColumnValue[] array by reading Excel column values of that row.

...
public class ExcelBinaryDelimiterParser implements DelimiterParser
{
...	
  public ColumnValue[] getValues(String nextRow) throws MalformedDataException
  {
		Integer rowInt = (Integer) row;
		int nextRow = rowInt.intValue();
		ColumnValue[] vals = new ColumnValue[columns.length];
		for ( int i = 0; i < columns.length; i++ )
		{
			int col = columns[i];
			Cell cell = curSheet.getCell(col,nextRow);
			vals[i] = new ColumnValue();
			if (cell.getType() == CellType.LABEL)
			{
				vals[i].setString( ((LabelCell) cell).getString() );
			}
			else if (cell.getType() == CellType.NUMBER)
			{
			    vals[i].setObject( new Double( ((NumberCell) cell).getValue() ) );
			}
			else if (cell.getType() == CellType.DATE)
			{
				vals[i].setObject( ((DateCell) cell).getDate() );
			}
		}
		return vals;
  }
...
}

getValues Implementation

Finally, the two last methods need to be implemented along with the 'finalize' method to clean up the resources:

...
public class ExcelBinaryDelimiterParser implements DelimiterParser
{
...
	InputStream curIn       = null;
...	
	public void finalize()
	{
		if ( curWorkbook != null )
		{
			curWorkbook.close();
			curWorkbook = null;
			curSheet = null;
		}
	}

	public String getRowAsString(Object row) {
		Integer rowInt = (Integer) row;
		int nextRow = rowInt.intValue();
		StringBuffer buf = new StringBuffer("");
		ColumnValue[] vals = new ColumnValue[columns.length];
		for ( int i = 0; i < columns.length; i++ )
		{
			int col = columns[i];
			Cell cell = curSheet.getCell(col,nextRow);
			if ( i > 0 )
			{
				buf.append(",");
			}
			if ( cell.getContents() != null )
			{
				buf.append( cell.getContents() );
			}
		}
		return buf.toString();
	}

	public void setInputStream(InputStream in) 
	{
		if ( curWorkbook != null )
		{
			curWorkbook.close();
			curWorkbook = null;
			curSheet = null;
		}
		curIn = in;
	}
	
...
}

Other Method Implementations

You now need to convert the comma-separated list of columns into an array of integers. Following the Java Bean specification, you will have to create one class: 'ExcelBinaryDelimiterParserBeanInfo'. The 'ExcelBinaryDelimiterParserBeanInfo' simply tells the JDBCImporter that there is a property called 'columns' and that it should use the 'IntegerArrayEditor' to convert the string representation into an array of integers. Here is the class :

package samples.delimiterparser;

import java.beans.BeanDescriptor;
import java.beans.PropertyDescriptor;
import java.beans.SimpleBeanInfo;

import net.sourceforge.jdbcimporter.util.IntegerArrayEditor;

public class ExcelBinaryDelimiterParserBeanInfo extends SimpleBeanInfo {

	PropertyDescriptor[] descriptors;
	BeanDescriptor descriptor;
	
	public ExcelBinaryDelimiterParserBeanInfo()
		throws java.beans.IntrospectionException
	{
		try
		{
			descriptors = new PropertyDescriptor[2];
			descriptors[0] = new PropertyDescriptor( "sheet",   ExcelBinaryDelimiterParser.class, null, "setSheet" );
			descriptors[1] = new PropertyDescriptor( "columns", ExcelBinaryDelimiterParser.class, null, "setColumns" );
			descriptors[1].setPropertyEditorClass( IntegerArrayEditor.class );
		}
		catch ( java.beans.IntrospectionException e )
		{
			e.printStackTrace();
			throw e;
		}
		descriptor = new BeanDescriptor(ExcelBinaryDelimiterParser.class);
	}
	
	public BeanDescriptor getBeanDescriptor()
	{
		return descriptor;
	}

	public PropertyDescriptor[] getPropertyDescriptors()
	{
		return descriptors;
	}
}

Bean Info for Excel Delimiter Parser

This ends the tutorial for creating the custom Binary Delimiter Parser. The full source code of the Excel Binary Delimiter Parser is found under the package samples.delimiterparser. What follows now is the instructions on how to use the custom Binary Delimiter Parser during the import. If you have read through the first tutorial then you may wish to skip to the delimiter parser definition section. The other sections are the same as the first tutorial.

Import Config XML

Now that the database is setup, you can examine the import XML config file that will be used (in the samples directory under the filename : 'tutorial11/import.xml'). The file begins with the standard XML document declaration followed by the '<import>' tag. This tag indicates that there is an import to be processed. There are seven attributes specified on the '<import>' tag: the 'log' attribute, the 'bad' attribute, the 'commitCount', the 'batchCount' attribute, the 'preSQLFile' attribute, the 'postSQLFile' attribute and the 'trimValues' attribute. The 'log' attribute specifies a filename into which JDBCImporter writes all audit, error, and warnings that occur during the import. The 'bad' attribute specifies a filename into which JDBCImporter writes data that was not properly imported into the database. The 'commitCount' attribute specifies how many rows to import before calling commit on the JDBC Connection. The 'batchCount' attribute specifies how many rows to import before calling executeBatch on the import engine (when the JDBC driver supports batch mode). By default, the 'commitCount' and 'batchCount' attributes are set 1, auto commit is turned on and batch mode is not used. The 'preSQLFile' and the 'postSQLFile' attributes specify filenames that contain sql statements to be executed before and after the import , respectively. The 'trimValues' attribute specifies whether strings values read from the Delimiter Parser are trimmed (ie. remove leading and trailing whitespace). By default, it is set to false.

There are two parts inside the '<import>' tag that define how and where the data is imported: the connection definition and the entity definitions.

Connection Definition

The connection definition begins with '<connection>' tag and contains the information needed to connect to the database. In this tutorial, you will be using the JDBC DriverManager to initialize a connection to the database. To indicate this, the 'type' attribute's value, inside the '<connection>' tag, is 'jdbc'. The specific connection information is found inside the '<connection>' tag as '<property>' tags. A '<property>' tag has two attributes: 'name' specifies the name of the property and 'value' specifies the string value of the property. For the JDBC DriverManager, you will need to specify the following information: the driver class name (with the property name 'driver'), the connection url (with the property name 'url'), the username (with the property name 'username'), the password (with the property name 'password'). The following is an example of the connection definition :

 <connection type="jdbc"> 
    <property name="driver" value="oracle.jdbc.driver.OracleDriver"/> 
    <property name="url" value="jdbc:oracle:thin:@localhost:1521:orcl"/> 
    <property name="username" value="scott"/> 
    <property name="password" value="tiger"/> 
 </connection>

Sample XML for Connection Definition

Entity Definition

Since you will be importing data into one table, there will be only one entity definition.In general, you will need an entity definition for each table that you will be importing data. Remember to specify the entity definitions in the order that the import should occur. For example, if table 'ingredient' depends on table 'recipe' (ie. has a foreign key), the entity definition of table 'recipe' should be placed before the entity definition of table 'ingredient'. Every entity definition begins with '<entity>' tag.

The 'table' attribute must contain the name of the table. Optionally, you can further specify the table by providing values for the 'schema' and the 'catalog' attributes.

To specify a custom import engine to process the entity, you may add the 'engine' attribute, whose value is the classname of the import engine.In this tutorial, you will be using the default import engine.

The 'source' attribute must contain the data file location. From looking at the sample data (found under 'samples/tutorial11/employee.xls'), you will see that the file is an Excel file.

There are three parts inside the '<entity>' tag : the delimiter parser definition, row translator definition, and the list of columns found in the data file.

Delimiter Parser

The delimiter parser definition begins with the '<delimiter>' tag and contains the information needed to parse the input file into a set of rows that will be imported into the table.In this tutorial, you will be using the custom Binary Delimiter Parser 'Excel Binary Delimiter Parser'. To indicate this, you must choose an identifier for the custom binary delimiter parser (ex. 'tutorial_excel') and set the 'type' attribute's value, inside the '<delimiter>' tag, to that identifier. The specific binary delimiter parser information is found inside the '<delimiter>' tag. For the Binary Delimiter Parser, you will need to specify the following information (as '<property>' tags): the array of columns (in the property named 'columns' ). Each row's columns in the Excel spreadsheet is in the same order as the database's columns to be imported, so the delimiter parser definition will look like this :

  <delimiter type="tutorial_excel"> 
	   <property name="columns" value="0,1,2,3,4,5,6,7"/> 
  </delimiter>

Sample XML for Excel Binary Delimiter Parser

Row Translator

The row translator definition is optional and begins with the '<translator>' tag. It contains the information needed to translate each row's values and may add, remove column values or skip the whole row. In this tutorial, you will not be using a row translator. Therefore the '<translator>' does not appear as a child inside the '<entity>' tag.

List of Columns

The final portion of the entity definition is the list of columns that are to be imported from the input file into the database. The list of columns should be the same order as they appear in the input file. Each column is defined inside the '<column>' tag. The name of the column must appear in the 'name' attribute of the '<column>' tag. Optionally, the java.sql.Type may be specified in the 'SQLType' attribute of the '<column>'.You will be letting the JDBC Importer figure out most of the column types (except for dates) in the database, so the 'SQLType' attribute is omitted except for the 'startdate' column. Here is an example of how the list of columns are defined in the import definition:

   <column name="id"></column>
   <column name="firstname"></column>
   <column name="lastname"></column>

Sample XML for List of Columns

Running the Import

By now, the import definition should look like this (with your appropriate connection information):

<import log="import.log bad="import.bad"> 
	 <connection type="jdbc"> 
		   <property name="driver" value="oracle.jdbc.driver.OracleDriver"/> 
		   <property name="url" value="jdbc:oracle:thin:@localhost:1521:orcl"/> 
		   <property name="username" value="scott"/> 
		   <property name="password" value="tiger"/> 
	 </connection> 
	 <entity table="employee" source="employee.xls">
		  <delimiter type="tutorial_excel"> 
			   <property name="columns" value="0,1,2,3,4,5,6,7"/> 
		  </delimiter> 
		  <column name="id"></column>
		  <column name="firstname"></column>
		  <column name="lastname"></column>
		  <column name="jobdescription"></column>
		  <column name="managerid"></column>
		  <column name="startdate" SQLType="DATE"></column>
		  <column name="salary"></column>
		  <column name="department"></column>
	 </entity> 
</import>

Sample XML for Tutorial 11

Since you are using the custom Binary Delimiter Parser (Binary Delimiter Parser), you will have to create a property file with one entry that maps the identifier to the full name of the class that implements the Binary Delimiter Parser interface. The entry's key should start with 'delimiter.' (this indicates that the custom component is a delimiter parser or binary delimiter parser). It should look like this:

delimiter.tutorial_excel=samples.delimiterparser.ExcelBinaryDelimiterParser

Property File Entry for XML Delimiter Parser

You will also have to include the extra jar files in the classpath before you can use the custom Delimiter Parser : the jar file 'jdbcimporter-samples.jar' under the directory 'lib' contains the custom Binary Delimiter Parser) and the jxl.jar found in the distribution of Java Excel API (http://www.andykhan.com/jexcelapi/).

You can run the import by issuing the following command (assuming that the import definition and property file is in the current directory and is called 'import.xml' and 'custom.properties', respectively):

java net.sourceforge.jdbcimporter.Importer import.xml custom.properties

If all goes well then the two log files should be created. In the normal log file there should be an informational message indicating that all rows were imported. In the bad log file there should be a heading for the import table.