JDBC Importer

Data Generator Logo

Data Generator Tutorial 5 : One Table, One Source File, CSV Delimited File

Introduction

Please make sure you have the appropriate libraries in your classpath (including the JDBC driver if you are connecting to a database) before starting the tutorials.

In this tutorial, you'll learn how to use a source table (whose data is in a file) as a dependent column value. The table, whose data will be generated, is called employee and it has the following columns :

employee
NameType
idnumber(6)
firstnamevarchar(10)
lastnamevarchar(10)
jobdescriptionvarchar(10)
manageridnumber(6)
startdatedate
salarynumber(9,2)
departmentnumber(6)

The table that contains the source table is called department and it has the following columns :

department
NameType
idnumber(6)
namevarchar(20)
officevarchar(20)

If you are letting Data Generator figure out the SQL Types or have a source database table then make sure that these table(s) are created in the database. You can find the oracle creation script in the samples directory under the filename : 'tutorial5/createtable_ora.sql'.

You can examine the generate XML config file that will be used (in the samples directory under the filename : 'tutorial5/generate.xml'). The file begins with the standard XML document declaration followed by the '<generate>' tag. This tag indicates that there is a set of data to be generated. There is one attribute specified on the '<generate>' tag: the 'log' attribute. The 'log' attribute specifies a filename into which Data Generator writes all audit, error, and warnings that occur during the generation of data.

There are two parts inside the '<generate>' tag that define how and where the data is generated: the connection definition and the entity definitions.

Connection Definition

The connection definition begins with '<connection>' tag and contains the information needed to connect to the database. If you are explicitly specifying the SQL types for each column and there are no source database tables then this element is optional. In this tutorial, you will be using the JDBC DriverManager to initialize a connection to the database. To indicate this, the 'type' attribute's value, inside the '<connection>' tag, is 'jdbc'. The specific connection information is found inside the '<connection>' tag as '<property>' tags. A '<property>' tag has two attributes: 'name' specifies the name of the property and 'value' specifies the string value of the property. For the JDBC DriverManager, you will need to specify the following information: the driver class name (with the property name 'driver'), the connection url (with the property name 'url'), the username (with the property name 'username'), the password (with the property name 'password'). The following is an example of the connection definition :

 <connection type="jdbc"> 
    <property name="driver" value="oracle.jdbc.driver.OracleDriver"/> 
    <property name="url" value="jdbc:oracle:thin:@localhost:1521:orcl"/> 
    <property name="username" value="scott"/> 
    <property name="password" value="tiger"/> 
 </connection> 
Sample XML for Connection Def

Entity Definition

Since you will be generating data from one table and one source table, there will be only two entity definitions. In general, you will need an entity definition for each table that you will be generating data or using as source data. Remember to specify the entity definitions in the order that the data generation should occur to improve performance. For example, if table 'ingredient' depends on table 'recipe' (ie. has a foreign key), the entity definition of table 'recipe' should be placed before the entity definition of table 'ingredient'. Every entity definition begins with '<entity>' tag. The difference between specifying a table for data generation and a table for source data is the attributes of the '<entity>' tag. When defining a table for data generation, the 'count' and the 'target' attributes are required. When defining a table for source data, the 'source' attribute is required.. In this section, you'll be defining the table for data generation. A following section describes the entity definition for the source table.

The 'table' attribute must contain the name of the table. Optionally, you can further specify the table by providing values for the 'schema' and the 'catalog' attributes.

The 'count' attribute must contain the number of rows to generate.

The 'target' attribute must contain the data file location.You will be generating data into a csv file (called 'employee_generate.csv'). The csv file will have 8 columns separated by the ',' character (similar to the file found under 'samples/tutorial1/employee.csv').

There are two parts inside the '<entity>' tag : the delimiter formatter definition and the list of columns that will be generated.

Delimiter Formatter

The delimiter formatter definition begins with the '<delimiter>' tag and contains the information needed to format a set of rows, that will be generated, into the output file. In this tutorial, you will be using the CSV Delimiter Formatter. To indicate this, the 'type' attribute's value, inside the '<delimiter>' tag, is 'csv'. The specific delimiter formatter information is found inside the '<delimiter>' tag.

For the CSV Delimiter Formatter, you will need to specify the following information (as '<property>' tags): the string that delimits a column (in the property named 'columnDelimiter'), the string that encloses a column (optional, in the property named 'enclosedDelimiter'), whether the string that encloses a column is optional (in the property named 'enclosedOptional', it must have a value of 'true' or 'false'). Since, the data file will only have a column delimiter (',' is the string separating the columns), the delimiter formatter definition will look like this :

  <delimiter type="csv"> 
    <property name="columnDelimiter" value=","/> 
  </delimiter> 
Sample XML for CSV Delimiter Formatter

List of Columns

The final portion of the entity definition is the list of columns that will be generated or used as source values. The list of columns should be the same order as they will appear in the output file. Each column is defined inside the '<column>' tag. The name of the column must appear in the 'name' attribute of the '<column>' tag. Optionally, the java.sql.Type may be specified in the 'SQLType' attribute of the '<column>'.You will be letting the Data Generator figure out most of the column types (except for dates) in the database, so the 'SQLType' attribute is omitted except for the 'startdate' column.

Inside the '<column>' tag there will be '<generator>' tag that describes how data will be generated. You can indicate which column generator to use by specifying the 'type' attribute inside the '<generator>' tags. Each column generator can be further configured by using <property> tags.

For example, the choice column generator (represented by the type 'choice') requires that a property 'choices' be specified as a comma-separated list of possible values. The number sequence column generator requires the starting number (specified in the 'start' property) and has an optional property for specifying the increment (specified in the 'increment').

Here is an example of how the list of columns are defined in the data generate definition:

  <column name="id">
	 <generator type="numberSequence"> 
		   <property name="start" value="1"/> 
	 </generator> 
</column>
  <column name="firstname">
	 <generator type="choice"> 
		   <property name="choices" value="Chris,John,Mary,Sam,Mark,Matthew,Elizabeth,Jessica,Heather,Tabitha,Ervin,Eva"/> 
	 </generator> 
</column>
  <column name="lastname">
	 <generator type="choice"> 
		   <property name="choices" value="Smith,Jackson,Whitney,Coleman,Sheffield,Morris,Preston,Lynch,Starke,Hill,Simpson"/> 
	 </generator> 
</column>	
Sample XML for List of Columns

Source Entity Definition

The next entity definition will define a source table. There are two types of source data for data generation : data from a file or data from a database. In this tutorial, you will be using source data from a file.

The 'source' attribute specifies the file containing the source data or 'db' to indicate the database contains the source data.You will be using a file called 'department.csv' for source data. The file is a csv delimited file and you will have to specify a delimiter parser that parses the data from the file.

There are two parts inside the '<entity>' tag : the delimiter parser definition and the list of columns that will be generated. The delimiter parser definition is optional when defining a entity whose source data is in a database.

Delimiter Parser

The delimiter parser definition begins with the '<delimiter>' tag and contains the information needed to parse the input file into a set of rows that will used as source data. In this tutorial, you will be using the CSV Delimiter Formatter. To indicate this, the 'type' attribute's value, inside the '<delimiter>' tag, is 'csv'. The specific delimiter formatter information is found inside the '<delimiter>' tag.

For the CSV Delimiter Formatter, you will need to specify the following information (as '<property>' tags): the string that delimits a column (in the property named 'columnDelimiter'), the string that encloses a column (optional, in the property named 'enclosedDelimiter'), whether the string that encloses a column is optional (in the property named 'enclosedOptional', it must have a value of 'true' or 'false'). Since, the data file will only have a column delimiter (',' is the string separating the columns), the delimiter formatter definition will look like this :

  <delimiter type="csv"> 
    <property name="columnDelimiter" value=","/> 
  </delimiter> 
Sample XML for CSV Delimiter Parser

List of Columns

The final portion of the entity definition is the list of columns that are to be used as source data. The list of columns should be the same order as they appear in the input file. Each column is defined inside the '<column>' tag. The name of the column must appear in the 'name' attribute of the '<column>' tag. Optionally, the java.sql.Type may be specified in the 'SQLType' attribute of the '<column>'.You will be letting the Data Generator figure out most of the column types (except for dates) in the database, so the 'SQLType' attribute is omitted except for the 'startdate' column.

Here is an example of how the list of columns are defined for a source table in the data generate definition:

  <column name="id">
</column>
  <column name="name">
</column>
  <column name="office">
</column>	
Sample XML for List of Columns

Now that the entity definition for the source table has been defined, you can reference any column inside the first entity definition. For this tutorial, you will be using the 'department.id' column as source data for the 'employee.department' column.

Running the Data Generation

By now, the generate definition should look like this (with your appropriate connection information):

<generate log="generate.log > 
  <connection type="jdbc"> 
     <property name="driver" value="oracle.jdbc.driver.OracleDriver"/> 
     <property name="url" value="jdbc:oracle:thin:@localhost:1521:orcl"/> 
     <property name="username" value="scott"/> 
     <property name="password" value="tiger"/> 
  </connection> 
  <entity table="employee" target="employee_generate.csv">count="75">
    <delimiter type="csv"> 
      <property name="columnDelimiter" value=","/> 
    </delimiter> 
	  <column name="id">
		 <generator type="numberSequence"> 
			   <property name="start" value="1"/> 
		 </generator> 
	</column>
	  <column name="firstname">
		 <generator type="choice"> 
			   <property name="choices" value="Chris,John,Mary,Sam,Mark,Matthew,Elizabeth,Jessica,Heather,Tabitha,Ervin,Eva"/> 
		 </generator> 
	</column>
	  <column name="lastname">
		 <generator type="choice"> 
			   <property name="choices" value="Smith,Jackson,Whitney,Coleman,Sheffield,Morris,Preston,Lynch,Starke,Hill,Simpson"/> 
		 </generator> 
	</column>	
	  <column name="jobdescription">
		 <generator type="choice"> 
			   <property name="choices" value="Developer,QA,Sales,Marketing"/> 
		 </generator> 
	</column>
	  <column name="managerid">
		 <generator type="numberRange"> 
			   <property name="start" value="30000"/> 
			   <property name="end" value="90000"/> 
			   <property name="increment" value="1"/> 
		 </generator> 
	</column>
	  <column name="startdate" SQLType="DATE">
		 <generator type="dateRange"> 
			   <property name="start" value="2000-01-01"/> 
			   <property name="end" value="2004-01-01"/> 
		 </generator> 
	</column>
	  <column name="salary">
		 <generator type="numberRange"> 
			   <property name="start" value="30000"/> 
			   <property name="end" value="90000"/> 
			   <property name="increment" value="1000"/> 
		 </generator> 
	</column>
	  <column name="department">
		 <generator type="dependentChoice"> 
			   <property name="dependency" value="department.id"/> 
		 </generator> 
	</column>
  </entity> 
  <entity table="department" source="department.csv">
    <delimiter type="csv"> 
      <property name="columnDelimiter" value=","/> 
    </delimiter> 
    <column name="id">
  </column>
    <column name="name">
  </column>
    <column name="office">
  </column>
  </entity> 
</generate>    
Sample XML for Tutorial 5

Since there are no custom connections, delimiters or translators, you can run the data generation by issuing the following command (assuming that the data generation definition is in the current directory and is called 'generate.xml'):

java net.sourceforge.datagenerator.DataGenerator generate.xml

If all goes well then the log file and the 'employee_generate.csv' file should be created. In the log file there should be an informational message indicating that all rows were generated.