JDBC Importer

Data Generator Logo

Data Generator Tutorial 4 : One Table, XML File, New Column Value Generator

Introduction

Please make sure you have the appropriate libraries in your classpath (including the JDBC driver if you are connecting to a database) before starting the tutorials.

In this tutorial, you'll learn the basics of creating a Column Value Generator (that requires values generated for another column) and running the data generation with it. The table, whose data will be generated, is called employee and it has the following columns :

employee
NameType
idnumber(6)
firstnamevarchar(10)
lastnamevarchar(10)
jobdescriptionvarchar(10)
manageridnumber(6)
startdatedate
salarynumber(9,2)
departmentnumber(6)

If you are letting Data Generator figure out the SQL Types or have a source database table then make sure that these table(s) are created in the database. You can find the oracle creation script in the samples directory under the filename : 'tutorial4/createtable_ora.sql'.

Now that the database is setup, you can examine the architecture to see how the Column Value Generator is used during the data generation.

The Column Value Generator is used during the generation for one function : generating random values for a column.

The DataGenerator requires that each Column Value Generator be a Java Bean like object. All <property> tags defined inside the '<generator>' will be passed to the appropriate set method. For example, with the Number Range Value Generator, you define a property called 'start' inside the '<delimiter>' tag. The DataGenerator looks for the 'setStart' method in the NumberRangeValueGenerator.java and calls it with the property value.

After the Column Value Generator is created, the DataGenerator will set the column definition (the setColumnDef() method) and ask for a list of columns whose values the Column Value Generator depends on (the getDependencies() method). If the Column Value Generator depends on other column values then the DataGenerator calls one of two methods: the setDependentValue() method or the setDependentValues() method. After all the dependencies are resolved, the DataGenerator retrieves the column values one at a time (the getNextColumnValue() method).

Custom Column Value Generator

The Custom Column Value Generator that you will be creating will be similar to the Dependent Choice Value Generator found under the package net.sourceforge.datagenerator.generator. Here are the requirements for the Optional Dependent Choice Value Generator :

The first thing to do is create the class OptionalDependentChoiceValueGenerator that extends the ColumnValueGeneratorBase abstract class.

import java.util.Random;

import net.sourceforge.datagenerator.ColumnValueGeneratorBase;

public class OptionalDependentChoiceValueGenerator extends ColumnValueGeneratorBase
{
  public String[] getDependencies( )
  {
    return new String[0];
  }

  public ColumnValue getNextColumnValue( int row, Random r )
  {
    return null;
  }
}
				
Initial Code

You would like the user to configure the dependent column name and the percentage of generated values that will be null/empty. So, the OptionalDependentChoiceValueGenerator class needs to have two methods : setDependency and setOptionalPercentage. In addition, the OptionalDependentChoiceValueGenerator will implement the getDependencies method to return an array whose element is the dependent column name. This will notify the DataGenerator to set the values of the dependent column before calling the getNextColumnValue method.

...

public class OptionalDependentChoiceValueGenerator extends ColumnValueGeneratorBase
{
  ...
  int    optionalPercentage = 0;
  String dependentField     = null;
   
  public void setDependency( String field )
  {
    dependentField = field;
  }

  public void setOptionalPercentage( int percentage )
  {
    this.optionalPercentage = percentage;
  }

  public String[] getDependencies() 
  {
    return new String[] { dependentField };
  }
}
				
dependency, optionalDependency properties

You can define the XML in the generate config file that will be used to construct and configure the Optional Dependent Choice Value Generator. The XML will contain two property elements named 'dependency' and optionalPercentage. Here is an example:

 <generator type="tutorial_optionalDependentChoice"> 
    <property name="dependency" value="employee.id"/> 
    <property name="optionalPercentage" value="10"/> 
 </generator> 
Sample XML for Optional Dependent Choice Value Generator

You can now implement the getNextColumnValue() method that will first check if the random int falls into the optional percentage. If it does, the column value generator will return an empty ColumnValue otherwise it will choose a value from the dependent columns' values.

public class OptionalDependentChoiceValueGenerator extends ColumnValueGeneratorBase
{
  ...
  public ColumnValue getNextColumnValue( int row, Random r )
  {
    ColumnValue result = new ColumnValue();
    int randomInt = r.nextInt();
    if ( optionalPercentage != 0 && Math.abs( randomInt ) % 100 <= optionalPercentage )
    {
      return result;
    }
    
    String[] dependency = ( String[] ) dependentValues.get( dependentField );
    if ( dependency == null )
    {
      return result;
    }
    if ( dependency.length > 1 )
    {
      int nextIndex = Math.abs( r.nextInt() % dependency.length );
      result.setString( dependency[nextIndex] );
    }
    else if ( dependency.length == 1)
    {
      result.setString( dependency[0] );
    }
    return result;
  }
}
				
getNextColumnValue Implementation

This ends the tutorial for creating the custom Column Value Generator. The full source code of the Optional Dependent Choice Value Generator is found under the package samples.columngenerator. What follows now is the instructions on how to use the custom Column Value Generator during the data generation. If you have read through the first tutorial then you may wish to skip to the list of columns section. The other sections are the same as the first tutorial.

Data Generate Config XML

You can examine the generate XML config file that will be used (in the samples directory under the filename : 'tutorial4/generate.xml'). The file begins with the standard XML document declaration followed by the '<generate>' tag. This tag indicates that there is a set of data to be generated. There is one attribute specified on the '<generate>' tag: the 'log' attribute. The 'log' attribute specifies a filename into which Data Generator writes all audit, error, and warnings that occur during the generation of data.

There are two parts inside the '<generate>' tag that define how and where the data is generated: the connection definition and the entity definitions.

Connection Definition

The connection definition begins with '<connection>' tag and contains the information needed to connect to the database. If you are explicitly specifying the SQL types for each column and there are no source database tables then this element is optional. In this tutorial, you will be using the JDBC DriverManager to initialize a connection to the database. To indicate this, the 'type' attribute's value, inside the '<connection>' tag, is 'jdbc'. The specific connection information is found inside the '<connection>' tag as '<property>' tags. A '<property>' tag has two attributes: 'name' specifies the name of the property and 'value' specifies the string value of the property. For the JDBC DriverManager, you will need to specify the following information: the driver class name (with the property name 'driver'), the connection url (with the property name 'url'), the username (with the property name 'username'), the password (with the property name 'password'). The following is an example of the connection definition :

 <connection type="jdbc"> 
    <property name="driver" value="oracle.jdbc.driver.OracleDriver"/> 
    <property name="url" value="jdbc:oracle:thin:@localhost:1521:orcl"/> 
    <property name="username" value="scott"/> 
    <property name="password" value="tiger"/> 
 </connection> 
Sample XML for Connection Def

Entity Definition

Since you will be generating data from one table, there will be only one entity definition. In general, you will need an entity definition for each table that you will be generating data or using as source data. Remember to specify the entity definitions in the order that the data generation should occur to improve performance. For example, if table 'ingredient' depends on table 'recipe' (ie. has a foreign key), the entity definition of table 'recipe' should be placed before the entity definition of table 'ingredient'. Every entity definition begins with '<entity>' tag. The difference between specifying a table for data generation and a table for source data is the attributes of the '<entity>' tag. When defining a table for data generation, the 'count' and the 'target' attributes are required. When defining a table for source data, the 'source' attribute is required.

The 'table' attribute must contain the name of the table. Optionally, you can further specify the table by providing values for the 'schema' and the 'catalog' attributes.

The 'count' attribute must contain the number of rows to generate.

The 'target' attribute must contain the data file location.You will be generating data into a xml file (called 'employee_generate.xml').

There are two parts inside the '<entity>' tag : the delimiter formatter definition and the list of columns that will be generated.

Delimiter Formatter

The delimiter formatter definition begins with the '<delimiter>' tag and contains the information needed to format a set of rows, that will be generated, into the output file.In this tutorial, you will be using the custom Delimiter Formatter 'XMLDelimiterFormatter' that was created in JDBCExporter tutorial 4. To indicate this, you must choose an identifier for the custom Delimiter Formatter (ex. 'tutorial_xml') and set the 'type' attribute's value, inside the '<delimiter>' tag, to that identifier. The specific delimiter formatter information is found inside the '<delimiter>' tag.

For the XML Delimiter Formatter, you will need to specify the following information (as '<property>' tags): the XSL filename (in the property named 'XSLFile' ). An XSL has been created for this tutorial (its name is employee_generate.xsl), so the delimiter parser definition will look like this :

  <delimiter type="tutorial_xml"> 
    <property name="XSLFile" value="employee_generate.xsl"/> 
  </delimiter> 
Sample XML for XML Delimiter Formatter

List of Columns

The final portion of the entity definition is the list of columns that will be generated or used as source values. The list of columns should be the same order as they will appear in the output file. Each column is defined inside the '<column>' tag. The name of the column must appear in the 'name' attribute of the '<column>' tag. Optionally, the java.sql.Type may be specified in the 'SQLType' attribute of the '<column>'.You will be letting the Data Generator figure out most of the column types (except for dates) in the database, so the 'SQLType' attribute is omitted except for the 'startdate' column.

Inside the '<column>' tag there will be '<generator>' tag that describes how data will be generated. You can indicate which column generator to use by specifying the 'type' attribute inside the '<generator>' tags. Each column generator can be further configured by using <property> tags.

For example, the choice column generator (represented by the type 'choice') requires that a property 'choices' be specified as a comma-separated list of possible values. The number sequence column generator requires the starting number (specified in the 'start' property) and has an optional property for specifying the increment (specified in the 'increment').

In this tutorial, you will be using the custom Column Value Generator 'OptionalDependentChoiceValueGenerator' when generating values for the managerid column. To indicate this, you must choose an identifier for the custom Column Value Generator (ex. 'tutorial_optionalDependentChoice' ) and set the 'type' attribute's value, inside the '<generator>' tag, to that identifier. The specific column value generator information is found inside the '<generator>' tag.

Here is an example of how the list of columns are defined in the data generate definition:

  <column name="id">
	 <generator type="numberSequence"> 
		   <property name="start" value="1"/> 
	 </generator> 
</column>
  <column name="managerid">
	 <generator type="tutorial_optionalDependentChoice"> 
		   <property name="dependency" value="employee.id"/> 
		   <property name="optionalPercentage" value="10"/> 
	 </generator> 
</column>
Sample XML for List of Columns

Running the Data Generation

By now, the generate definition should look like this (with your appropriate connection information):

<generate log="generate.log > 
  <connection type="jdbc"> 
     <property name="driver" value="oracle.jdbc.driver.OracleDriver"/> 
     <property name="url" value="jdbc:oracle:thin:@localhost:1521:orcl"/> 
     <property name="username" value="scott"/> 
     <property name="password" value="tiger"/> 
  </connection> 
  <entity table="employee" target="employee_generate.xml">count="75">
  <delimiter type="tutorial_xml"> 
    <property name="XSLFile" value="employee_generate.xsl"/> 
  </delimiter> 
	  <column name="id">
		 <generator type="numberSequence"> 
			   <property name="start" value="1"/> 
		 </generator> 
	</column>
	  <column name="firstname">
		 <generator type="choice"> 
			   <property name="choices" value="Chris,John,Mary,Sam,Mark,Matthew,Elizabeth,Jessica,Heather,Tabitha,Ervin,Eva"/> 
		 </generator> 
	</column>
	  <column name="lastname">
		 <generator type="choice"> 
			   <property name="choices" value="Smith,Jackson,Whitney,Coleman,Sheffield,Morris,Preston,Lynch,Starke,Hill,Simpson"/> 
		 </generator> 
	</column>	
	  <column name="jobdescription">
		 <generator type="choice"> 
			   <property name="choices" value="Developer,QA,Sales,Marketing"/> 
		 </generator> 
	</column>
	  <column name="managerid">
		 <generator type="tutorial_optionalDependentChoice"> 
			   <property name="dependency" value="employee.id"/> 
			   <property name="optionalPercentage" value="10"/> 
		 </generator> 
	</column>
	  <column name="startdate" SQLType="DATE">
		 <generator type="dateRange"> 
			   <property name="start" value="2000-01-01"/> 
			   <property name="end" value="2004-01-01"/> 
		 </generator> 
	</column>
	  <column name="salary">
		 <generator type="numberRange"> 
			   <property name="start" value="30000"/> 
			   <property name="end" value="90000"/> 
			   <property name="increment" value="1000"/> 
		 </generator> 
	</column>
	  <column name="department">
		 <generator type="numberRange"> 
			   <property name="start" value="1"/> 
			   <property name="end" value="9"/> 
			   <property name="increment" value="1"/> 
		 </generator> 
	</column>
  </entity> 
</generate>    
Sample XML for Tutorial 4

Since you are using the custom Delimiter Formatter (XML Delimiter Formatter) and the custom Column Value Generator (Optional Dependent Choice Value Generator), you will have to create a property file with two entries that map the identifiers to the full name of the classes that implements the DelimiterFormatter and ColumnValueGenerator interfaces. The first entry's key should start with 'delimiter.' (this indicates that the custom component is a Delimiter Formatter) and the second entry's key should start with 'generator' (this indicates that the custom component is a Column Value Generator). It should look like this:

delimiter.tutorial_xml=samples.delimiterformatter.XMLDelimiterFormatter
generator.tutorial_optionalDependentChoice=samples.columngenerator.OptionalDependentChoiceValueGenerator
				
Property File Entries for XML Delimiter Formatter and Optional Dependent Choice Value Generator

You will also have to include an extra jar file in the classpath before you can use the custom Delimiter Formatter and the custom Column Value Generator (the jar file 'jdbcimporter-samples.jar' under the directory 'lib' contains the custom Delimiter Formatter and the custom Column Value Generator).

You can run the data generation by issuing the following command (assuming that the data generation definition and property file is in the current directory and is called 'generate.xml' and 'custom_generate.properties', respectively):

java net.sourceforge.datagenerator.DataGenerator generate.xml custom_generate.properties

If all goes well then the log file and the 'employee_generate.xml' file should be created. In the log file there should be an informational message indicating that all rows were generated.