Massive data import! - Part 1

September 01, 2011Fabrice Bernhard2 min read

We often have to face the problem of importing data off an Excel file with thousands lines.
PHP is not suited for this task, it’s slow and there is a high risk for the import to crash due to "memory limit" or some other annoying stuff like that!
So instead we chose a better way by using pure SQL which is much faster at this kind of operation.

At first, you must convert your Excel file to CSV (Excel does it very well). Be careful to choose the right field separator: I generally use "~" because there is little chance of finding this character in your written data.

Steps:

  • Create a temporary table that matches exactly the structure of the Excel file
  • Fill the temporary table with the CSV file
  • Run SQL queries to fill your database

Practical example:

Suppose we have an Excel file containing thousands of users that must be dispatched to several tables depending on their type.

CSV file sample:

    User 1~user1@theodo.fr~0987564321~user~~~
    User 2~user2@theodo.fr~0134256789~user~~~
    User 3~user3@theodo.fr~0128971271~user~~~
    Agent 1~agent1@company.com~0486282688~agent~Company 1~Role 1~0987654321
    Agent 2~agent2@company.com~0176254621~agent~Company 2~Role 2~0445664332
    User 4~user4@company.com~0456789856~user~~~

1. Create the temporary table

We will create a table contain the following fields:

  • name
  • email
  • phone
  • type
  • company_name
  • agent_role
  • company_phone

DROP TABLE IF EXISTS user_tmp; CREATE TABLE user_tmp ( name varchar(127), email varchar(127), phone varchar(20), type varchar(20), company_name varchar(127), agent_role varchar(127), company_phone varchar(20), id int(11) NOT NULL auto_increment, PRIMARY KEY (`id`), UNIQUE KEY `IDX_ATTRIBUTE_VALUE` (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

2. Fill the temporary table

Import your CSV file into the temporary table:

LOAD DATA LOCAL INFILE 'PATH\_TO\_YOUR\_CSV\_FILE/users.csv'
    INTO TABLE user\_tmp CHARACTER SET 'utf8' FIELDS TERMINATED BY '~' LINES TERMINATED BY '\\n';

3. Fill your own tables

Suppose you have the following two tables:

User

  • name
  • phone
  • email

Agent

  • name
  • phone
  • email
  • company_name
  • role
  • company_phone

Insert data with SQL queries:

INSERT INTO user (name, phone, email) SELECT name, phone, email FROM user_tmp WHERE type = 'user';

INSERT INTO agent (name, phone, email, company_name, role, company_phone) SELECT name, phone, email, company_name, agent_role, company_phone FROM user_tmp WHERE type = 'agent';

All done! Your tables are complete.

This is a simple example, you can use this method to make more complex data imports (with joins). All you need to do is to adapt your SQL queries.

Here we have seen how we can leverage something fast but apparently limited (LOAD DATA) and make it powerful, by using a temporary table and SQL requests inserting data into the actual tables.

F

Fabrice Bernhard

Web Developer at Theodo