Blog

import old database in new schema with mysql and rails

import old database in new schema with mysql and rails

In this tutorial I will show you how you could import an old database into a new schema with mysql and rails. I will explain the process with an example scenario.

Scenario

Imagine you have an old php project with an old database schema and you want to completely refactor this project. Firstly you want to use rails instead of php and create a new rails project with the standard rails conventions. Secondly you want to restructure the old database into a new scheme without losing data and make it more "railsish". You just have a mysql dump of your old database.

Old database:
You have a mysql database with a few tables:

  • "Person" with: pid (bigint 20), uid (varchar 8), firstname (varchar 40), lastname (varchar (50), mail (varchar 40), isfemale (tinyint 4)
  • "Work" with: wid (bigint 20), headline (varchar 250), text_facts (text), text_long_facts(text),  pub_date (date), hyperlink (varchar 200)

Now you want to change this scheme to the following one:

New database:
Your new database contains the following tables:

  • "users" with: id (int 11), first_name ( varchar 255), last_name (varchar 255), email (varchar 255), isfemale (tinyint 1)
  • "projects" with: id (int 11), title (varchar 255), teaser (text), description (text), published_at (date), link (varchar 255)

Now you have good the background information, let us move one step further:

 

Model creation

You have to create migrations for your new models. Just open a terminal and generate them with the normal rails generators:

rails g model User first_name:string last_name:string email:string isfemale:boolean

rails g model Project title:string teaser:text description:text published_at:date, link:string

Do not forget to  migrate your database:

rake db:migrate

Now you have generated your models and you can add some validations. In this scneario I add the "presence: true" validation to every model attribute.

 

Importer Structure

You have a mysql dump and want to import this dump into the new schema. I will now explain how you can do that through a Raketask.

You need a Importer class and a normal Raketask. For a better structure I will split up the Importer class into different modules. The structure of the whole importer looks like as follows:

  • /lib/tasks/project_name.rake: this file contains the Raketasks
  • /lib/tasks/importer/importer.rb: Importer class which includes the different sub-importer modules and all necessary extension files.
  • /lib/tasks/importer/users_importer.rb: sub-importer module which contains the users importer stuff.
  • /lib/tasks/importer/projects_importer.rb: sub-importer module which contains the projects importer code.

Subsequently I want explain each of this classes and modules. Let's start with the Raketask file

 

.rake-File

namespace :project_name do
  require Rails.root + "lib/tasks/importer"

  desc "Import old database, usage: rake project_name:import['old_database_name']"
  task :import, :oldDatabase, needs::environment do |t, args|
    args.with_defaults(oldDatabase: "import")

    oldDatabaseName = args.oldDatabse
    newDatabaseName = YAML::load(IO.read(Rails.root.join("config/database.yml")))[Rails.env]["database"]

    importer = Importer.new newDatabaseName, oldDatabaseName
    importer.execute
  end
end

This is the .rake-File and contains just one normal Raketask. This Raketask requires the name of the old database. For example you create a new database called "old_database" with phpmyadmin or a tool like that. After that you have to import the mysql dump from your old database in the currently new created one. The next step is to call the Raketask in your terminal: "rake project_name:import['old_database']. This will trigger the import. At this moment the import will fail, because you do not have neither the Importer class nor the instance method "execute" called at the end of the Raketask. So let's move further to the Importer class.

 

importer.rb

require File.dirname(__FILE__) +  "/users_importer"
require File.dirname(__FILE__) +  "/projects_importer"

class Importer

  # include sub-modules
  include UsersImporter
  include ProjectsImporter

  # initializer for a new importer
  def initialize new, old
    # feedback for the programmer
    puts "Importing from #{old} to {new}"

    @newDb = new
    @oldDb = old
  end

  # execute the import
  def execute
    
    # if you do not like to import data, if the new database
    # already contains data, just a security hint..
    if User.count > 0
      raise "Import aborted! There already are users in the database."
    end
    
    if Project.count > 0
     ..and so on..
    #####
    
    # call sub-importer modules
    import_users
    import_projects
  end

  # later in the import process you have to switch beween
  # the old and the new database.

  # use new database (= switch to new database)
  def use_new_database
    ActiveRecord::Base.connection.execute("use #{@newDb}")
  end

  # use old database (= switch to old database)
  def use_old_database
    ActiveRecord::Base.connection.execute("use #{@oldDb}")
  end

end

The importer.rb contains the Importer class with a normal initializer and an execute method. This method just calls all sub-module importer methods to completely start the import. At this moment it will not work, because the sub-modules do not exist yet. But in the end everthing will work together very good :-). Furthermore the Importer class has two helper methods to switch between the old and the new database.

Do not forget to include the different sub-modules in the Importer class, otherwise it will not work and you will get include errors and undefined methods.

For convenience you can write following mysql extensions and put them in the importer.rb File under the Importer class, but not in the Importer class.

# custom mysql row to facilite access
class Row
  def initialize fields, values
    @fields = fields
    @values = values
  end

  def get field
    @values[@fields.index(field)]
  end
end

# Add get_row method to Mysql2::Result class
class Mysql2::Result
  def get_row index
    Row.new self.fields, self.to_a[index].to_a
  end
end

Now let's start with the first import script:

 

users_importer.rb

module UsersImporter

  # import users
  def import_users
    puts "Importing users..."
    use_old_database
    users = ActiveRecord::Base.connection.execute('
      SELECT pid, uid, firstname, lastname, mail, isfemale FROM Person
      ')
    
    use_new_database
    for i in 0...users.count do
      row = users.get_row i
      
      user = User.where(first_name: row.get("firstname"), 
                        last_name: row.get("lastname"),
                        email: row.get("mail"),
                        isfemale: row.get("isfemale"))
      unless user
        user = User.new(first_name: row.get("firstname"), 
                        last_name: row.get("lastname"),
                        email: row.get("mail"),
                        isfemale: row.get("isfemale"))
      begin
        user.save!
      rescue Exception => e
        puts "Failed to save #{row.get("firstname")} #{row.get("lastname"): #{e.message}"
      end
    end
  end

end

This sub-module contains the users importer. You can see that you have to use pure Mysql for the queries in the old database. That's because you do not have a model and cannot use the ActiveRecord ORM.

 

projects_importer.rb

module ProjectsImporter

  # import projects
  def import_projects
    puts "Importing projects..."
    use_old_database
    projects = ActiveRecord::Base.connection.execute('
      SELECT wid, headline, text_facts, text_long_facts, pub_date, hyperlink FROM work
      ')
    
    use_new_database
    for i in 0...projects.count do
      row = projects.get_row i
      
      project = Project.where(title: row.get("headline"), 
                        teaser: row.get("text_facts"),
                        published_at: row.get("pub_date"))
      unless project
        project = Project.new(title: row.get("headline"), 
                              teaser: row.get("text_facts"),
                              description: row.get("text_long_facts"),
                              published_at: row.get("pub_date"),
                              link: row.get("hyperlink"))
      begin
        project.save!
      rescue Exception => e
        puts "Failed to save #{row.get("title")} #{row.get("published_at"): #{e.message}"
      end
    end
  end

end

The projects importer works exactly like the users importer. There is one method which handles the import.

 

Conclusion

Now everything should work and you are done. I know that my example import scripts are not really heavy to program, but I just wanted to explain how you could structure such an importer. If you have a database with a lot of joins and a tricky schema your import script will be much much more complicated. But with this importer-structure you just have to add a new module which handels the import part, call the method from the module in the Importer class and you are done.

You always should keep a very close eye on the topic "Testing". You should test your importer script before starting an import on your production system, that is very important! Otherwise you can destroy your production database and that never is funny ;-)

Feel free to comment on this blogpost, I will be happy about that! If you are interested in a more difficult importer script do not hestitate to ask me.

21.03.2012
Matthias Frick
Ruby on Rails
6 Kommentare

Über den Autor

Matthias Frick
Matthias Frick, MSc.

Er ist ein langjähriger Ruby-on-Rails Entwickler und leitet das Unternehmen Frick-Web.

6 Kommentare zu "import old database in new schema with mysql and rails"

  1. Aaron Spiker
    Aaron Spiker 27.03.2012
    Your importer has a nice structure, thanks for sharing!
  2. servin
    servin 29.03.2012
    Good structure
  3. Gregory
    Gregory 28.08.2013
    Very nice article - thanks a lot! One nitpick, though, - it took me a while to figure out that use_import and use_target are supposed to be use_old_database and use_new_database, respectively. Am I correct or is it just me misinterpreting the code? Great structure, anyway!
  4. Matthias Frick
    Matthias Frick 28.08.2013
    @Gregory: Thanks for the hint! I never saw that before :-)..was a copy an paste error.. now it is fixed.
  5. Yerassyl
    Yerassyl 12.06.2015
    Why I get an error when run some rake command after adding .rake file: TypeError: no implicit conversion of Pathname into String
  6. Matthias Frick
    Matthias Frick 12.06.2015
    On which line do you get the error?

Kommentar verfassen