Data Migration: Process, Types, and Golden Rules to Follow
In our daily lives, moving information from one location to another is no more than a simple copy-and-paste operation. Everything gets far more complicated when it comes to transferring millions of data units into a new system.
However, many companies treat even a massive data migration as a low-level, two-click task. Such an initial underestimation translates to spending extra time and money. Recent studies revealed that 55 percent of data migration projects went over budget and 62 percent proved harder than expected or actually failed.
How to avoid falling into the same trap? The answer lies in understanding the essentials of the data migration process, from its triggers to final phases.
If you are already familiar with theoretical aspects of the problem, you may jump to the section Data Migration Process where we give practical recommendations. Otherwise, let’s start from the most basic question: What is data migration?
What is data migration?
In general terms, data migration is the transfer of existing historical data to a new storage, system, or file format. This process is not as simple as it may sound. It involves a lot of preparation and post-migration activities including planning, creating backups, quality testing, and validation of results. The migration ends only when the old system, database, or environment is shut down.
Usually, data migration comes as a part of a larger project such as
- legacy software modernization or replacement,
- the expansion of system and storage capacities,
- the introduction of an additional system working alongside the existing application,
- the shift to a centralized database to eliminate data silos and achieve interoperability,
- moving IT infrastructure to the cloud, or
- merger and acquisition (M&A) activities when IT landscapes must be consolidated into a single system.
Data migration is sometimes confused with other processes involving massive data movements. Before we go any further, it’s important to clear up the differences between data migration, data integration, and data replication.
Data migration vs data integration
Unlike migration, which deals with the company’s internal information, integration is about combining data from multiple sources outside and inside the company into a single view. It is an essential element of the data management strategy that enables connectivity between systems and gives access to the content across a wide array of subjects. Consolidated datasets are a prerequisite for accurate analysis, extracting business insights, and reporting.
Data migration is a one-way journey that ends once all the information is transported to a target location. Integration, by contrast, can be a continuous process that involves streaming real-time data and sharing information across systems.
Data migration vs data replication
In data migration, after the data is completely transferred to a new location, you eventually abandon the old system or database. In replication, you periodically transport data to a target location, without deleting or discarding its source. So, replication has a starting point but no defined completion time.
Data replication can be a part of the data integration process. Also, it may turn into data migration — provided that the source storage is decommissioned.
Now, we’ll discuss only data migration — a one-time and one-way process of moving to a new house, leaving an old one empty.
Main types of data migration
There are six commonly used types of data migration. However, this division is not strict. A particular data transfer may belong, for example, to both database and cloud migration or involve application and database migration at the same time.
Storage migration
Storage migration occurs when a business acquires modern technologies and discards out-of-date equipment. This entails the transportation of data from one physical medium to another or from a physical to a virtual environment. Examples of such migrations are when you move data
- from paper to digital documents,
- from hard disk drives (HDDs) to faster and more durable solid-state drives (SSDs), or
- from mainframe computers to cloud storage.
The primary reason for this shift is a pressing need for technology upgrades rather than a lack of storage space. When it comes to large-scale systems, the migration process can take years. For example, Sabre, the second-largest global distribution system (GDS), has been moving its software and data from mainframe computers to virtual servers for over a decade. Its Migration Period is expected to be entirely completed in 2023.
Database migration
A database is not just a place to store data. It provides a structure to organize information in a specific way and is typically controlled via a database management system (DBMS).
So, most of the time, database migration means
- an upgrade to the latest version of DBMS (so-called homogeneous migration), or
- a switch to a new DBMS from a different provider, for example, from MySQL to PostgreSQL or from Oracle to MSSQL (so-called heterogeneous migration).
The latter case is tougher than the former, especially if target and source databases support different data structures. The task becomes even more challenging when you have to move data from legacy databases such as Adabas, IMS, or IDMS.
Application migration
When a company changes an enterprise software vendor (for instance, a hotel implements a new property management system or a hospital replaces its legacy EHR system), it has to move data from one computing environment to another. The key challenge here is that old and new infrastructures may have unique data models and work with different data formats.
Data center migration
A data center is a physical infrastructure used by organizations to keep their critical applications and data. Put more precisely, it’s that very dark room with servers, networks, switches, and other IT equipment. So, data center migration can mean different things: from relocating existing computers and wires to other premises to moving all digital assets, including data and business applications, to new servers and storage.
Business process migration
This type of migration is driven by mergers and acquisitions, business optimization, or reorganization to address competitive challenges or enter new markets. All these changes may require the transfer of business applications and databases with data on customers, products, and operations to the new environment.
Cloud migration
Cloud migration is a popular term that embraces all the above-mentioned cases if they involve moving data from on-premises to the cloud or between different cloud environments. Gartner expects that by 2024 the cloud will attract over 45 percent of IT spending and dominate ever-growing numbers of IT decisions.
Depending on volumes of data and differences between source and target locations, migration can take from some 30 minutes to months and even years. The complexity of the project and the cost of downtime will define how exactly to carry out the process.
Approaches to data migration
Choosing the right approach to migration is the first step to ensure that the project will run smoothly, with no severe delays.
Big bang data migration
Advantages: less costly, less complex, takes less time, all changes happen at once
Disadvantages: a high risk of expensive failure, requires downtime
In a big bang scenario, you move all data assets from source to target environment in one operation, within a relatively short time window.
Systems are down and unavailable for users for as long as data moves and undergoes transformations to meet the requirements of a target infrastructure. The migration is typically executed during a public holiday or weekend when customers presumably don’t use the application.
The big bang approach allows you to complete migration in the shortest possible time and saves the hassle of working across the old and new systems simultaneously. However, in the era of Big Data, even midsize companies accumulate huge volumes of information while the throughput of networks and API gateways is not endless. This constraint must be considered from the start.
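A back-of-the-envelope calculation can show early on whether a big bang window is realistic. The sketch below assumes a constant link speed and a rough efficiency factor that stands in for protocol and transformation overhead. The function name, the default factor, and the sample numbers are illustrative assumptions, not measurements.

```python
def estimate_transfer_hours(data_gb: float, throughput_mbps: float,
                            efficiency: float = 0.7) -> float:
    """Estimate the hours needed to move `data_gb` gigabytes over a link.

    throughput_mbps: nominal link speed in megabits per second.
    efficiency: assumed share of that speed usable for the transfer (0 < e <= 1).
    """
    if data_gb < 0 or throughput_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (throughput_mbps * efficiency)
    return seconds / 3600


# Hypothetical case: 5 TB over a 1 Gbps link at 70 percent efficiency.
hours = estimate_transfer_hours(5000, 1000, 0.7)
print(f"{hours:.1f} hours")  # roughly 15.9 hours, before any retries or validation
```

If the estimate does not fit comfortably into a weekend, that is an early signal to consider a phased approach.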
Verdict. The big bang approach fits small companies or businesses working with small amounts of data. It doesn’t work for mission-critical applications that must be available 24/7.
Trickle data migration
Advantages: less prone to unexpected failures, zero downtime required
Disadvantages: more expensive, takes more time, needs extra effort and resources to keep two systems running
Also known as a phased or iterative migration, this approach brings Agile experience to data transfer. It breaks down the entire process into sub-migrations, each with its own goals, timelines, scope, and quality checks.
Trickle migration involves parallel running of the old and new systems and transferring data in small increments. As a result, you take advantage of zero downtime and your customers are happy because of the 24/7 application availability.
On the downside, the iterative strategy takes much more time and adds complexity to the project. Your migration team must track which data has already been transported and ensure that users can switch between two systems to access the required information.
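Tracking which data has already been moved can be done with a checkpoint stored next to the migrated rows. The following is a minimal sketch that uses SQLite as a stand-in for both systems. The customers table, the migration_checkpoint table, and the assumption that ids are positive and only grow are all hypothetical.

```python
import sqlite3


def migrate_batch(source, target, batch_size=2):
    """Copy the next batch of rows past the checkpoint; return rows copied."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)"
    )
    target.execute(
        "CREATE TABLE IF NOT EXISTS migration_checkpoint (last_id INTEGER NOT NULL)"
    )
    row = target.execute("SELECT last_id FROM migration_checkpoint").fetchone()
    last_id = row[0] if row else 0  # nothing migrated yet

    rows = source.execute(
        "SELECT id, name FROM customers WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    ).fetchall()
    if not rows:
        return 0

    # Data and checkpoint are committed together, or rolled back together.
    with target:
        target.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", rows)
        target.execute("DELETE FROM migration_checkpoint")
        target.execute(
            "INSERT INTO migration_checkpoint (last_id) VALUES (?)", (rows[-1][0],)
        )
    return len(rows)
```

Calling migrate_batch in a loop until it returns 0 moves the whole table in small increments, and an interrupted run resumes from the last committed checkpoint.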
Another way to perform trickle migration is to keep the old application entirely operational until the end of the migration. As a result, your clients will use the old system as usual and switch to the new application only when all data is successfully loaded to the target environment.
However, this scenario doesn’t make things easier for your engineers. They have to make sure that data is synchronized in real time across two platforms once it is created or changed. In other words, any changes in the source system must trigger updates in the target system.
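One common way to propagate changes is to log every write in the source and replay the log against the target. The sketch below uses SQLite triggers and a polling function, so it is near real time at best. The customers table, the change_log table, and the function names are hypothetical, and the sketch assumes primary keys are never updated.

```python
import sqlite3

CHANGE_TRACKING_SQL = """
CREATE TABLE IF NOT EXISTS change_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER NOT NULL
);
CREATE TRIGGER IF NOT EXISTS customers_ins AFTER INSERT ON customers
BEGIN INSERT INTO change_log (customer_id) VALUES (NEW.id); END;
CREATE TRIGGER IF NOT EXISTS customers_upd AFTER UPDATE ON customers
BEGIN INSERT INTO change_log (customer_id) VALUES (NEW.id); END;
CREATE TRIGGER IF NOT EXISTS customers_del AFTER DELETE ON customers
BEGIN INSERT INTO change_log (customer_id) VALUES (OLD.id); END;
"""


def enable_change_tracking(source):
    """Install triggers that record the id of every changed customer row."""
    source.executescript(CHANGE_TRACKING_SQL)


def apply_changes(source, target, after_seq=0):
    """Replay changes logged after `after_seq`; return the last seq applied."""
    changes = source.execute(
        "SELECT seq, customer_id FROM change_log WHERE seq > ? ORDER BY seq",
        (after_seq,),
    ).fetchall()
    with target:  # apply the whole batch atomically
        for seq, customer_id in changes:
            # Copy the current state of the row, so replaying twice is harmless.
            row = source.execute(
                "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
            ).fetchone()
            if row is None:
                target.execute("DELETE FROM customers WHERE id = ?", (customer_id,))
            else:
                target.execute(
                    "INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)", row
                )
            after_seq = seq
    return after_seq
```

The returned sequence number is meant to be stored and passed to the next call, so each run picks up only the changes made since the previous one.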
Verdict. Trickle migration is the right choice for medium and large enterprises that can’t afford long downtime but have enough expertise to face technological challenges.
Data migration process
No matter the approach, the data migration project goes through the same key phases, namely
- planning,
- data auditing and profiling,
- data backup,
- migration design,
- execution,
- testing, and
- post-migration audit.
Below, we’ll outline what you should do at each phase to transfer your data to a new location without losses, extensive delays, or ruinous budget overruns.
Planning: create a data migration plan and stick to it
Data migration is a complex process, and it starts with the evaluation of existing data assets and careful designing of a migration plan. The planning stage can be divided into four steps.
Step 1 — refine the scope. The key goal of this step is to filter out any excess data and to define the smallest amount of information required to run the system effectively. So, you need to perform a high-level analysis of source and target systems, in consultation with data users who will be directly impacted by the upcoming changes.
Step 2 — assess source and target systems. A migration plan should include a thorough assessment of the current system’s operational requirements and how they can be adapted to the new environment.
Step 3 — set data standards. This will allow your team to spot problem areas across each phase of the migration process and avoid unexpected issues at the post-migration stage.
Step 4: estimate budget and set realistic timelines. After the scope is refined and systems are evaluated, it’s easier to select the approach (big bang or trickle), estimate resources needed for the project, and set schedules and deadlines. According to Oracle estimations, an enterprise-scale data migration project lasts six months to two years on average.
Data auditing and profiling: employ digital tools
This stage is for examining and cleansing the full scope of data to be migrated. It aims at detecting possible conflicts, identifying data quality issues, and eliminating duplications and anomalies prior to the migration.
Auditing and profiling are tedious, time-consuming, and labor-intensive activities, so in large projects, automation tools should be employed. Among popular solutions are Open Studio for Data Quality, Data Ladder, SAS Data Quality, Informatica Data Quality, and IBM InfoSphere QualityStage, to name a few.
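For small datasets, or to get a feel for what the dedicated tools report, a basic profile can be computed in a few lines. This sketch only counts duplicate keys and missing required values. The record layout and field names are hypothetical, and real profiling tools check far more (formats, ranges, referential integrity).

```python
from collections import Counter


def profile_records(records, key_field, required_fields):
    """Return duplicate key values and per-field counts of missing values."""
    key_counts = Counter(r.get(key_field) for r in records)
    duplicates = sorted(
        k for k, n in key_counts.items() if k is not None and n > 1
    )
    # A value counts as missing when it is absent, None, or an empty string.
    missing = {
        field: sum(1 for r in records if r.get(field) in (None, ""))
        for field in required_fields
    }
    return {"total": len(records), "duplicate_keys": duplicates, "missing": missing}


# Hypothetical sample: one duplicated email, two missing names.
sample = [
    {"email": "a@example.com", "name": "Ann"},
    {"email": "a@example.com", "name": ""},
    {"email": "b@example.com", "name": None},
]
report = profile_records(sample, key_field="email", required_fields=["email", "name"])
```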
Data backup: protect your content before moving it
Technically, this stage is not mandatory. However, best practices of data migration dictate the creation of a full backup of the content you plan to move — before executing the actual migration. As a result, you’ll get an extra layer of protection in the event of unexpected migration failures and data losses.
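How a backup is taken depends on the DBMS, and most systems ship their own tooling for it. As an illustration only, the sketch below uses SQLite and the backup method of Python's built-in sqlite3 module, then runs an integrity check on the copy. The function name is hypothetical.

```python
import sqlite3


def backup_database(source, backup):
    """Copy every page of `source` into `backup` and verify the copy.

    Both arguments are open sqlite3 connections. Pending changes in `source`
    should be committed before calling this function.
    """
    source.backup(backup)  # available in Python 3.7 and later
    status = backup.execute("PRAGMA integrity_check").fetchone()[0]
    if status != "ok":
        raise RuntimeError(f"backup failed the integrity check: {status}")
    return status
```

Whatever the tooling, a backup that has never been verified or test-restored offers much less protection than one that has.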
Migration design: hire an ETL specialist
The migration design specifies migration and testing rules, clarifies acceptance criteria, and assigns roles and responsibilities across the migration team members.
Though several technologies can be used for data migration, extract, transform, and load (ETL) is the preferred one. It makes sense to hire an ETL developer — or a dedicated software engineer with deep expertise in ETL processes, especially if your project deals with large data volumes and complex data flow.
At this phase, ETL developers or data engineers create scripts for data transition or choose and customize third-party ETL tools. An integral part of ETL is data mapping. In the ideal scenario, it involves not only an ETL developer, but also a system analyst knowing both source and target system, and a business analyst who understands the value of data to be moved.
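In its simplest form, data mapping is a table that says which source field feeds each target field and how the value is transformed on the way. The sketch below expresses such a table as a dictionary. All field names, the source date format, and the transformations are hypothetical examples, not taken from any real system.

```python
from datetime import datetime

# target field -> (source field, transformation applied to the source value)
FIELD_MAP = {
    "full_name": ("CUST_NM", str.strip),
    "email": ("EMAIL_ADDR", lambda v: v.strip().lower()),
    "signup_date": (
        "CREATED",
        # Assumed source format DD/MM/YYYY, converted to ISO 8601.
        lambda v: datetime.strptime(v, "%d/%m/%Y").date().isoformat(),
    ),
}


def map_record(source_row, field_map=FIELD_MAP):
    """Build a target record from a source record using the mapping table."""
    target_row = {}
    for target_field, (source_field, transform) in field_map.items():
        if source_field not in source_row:
            raise KeyError(f"source field {source_field!r} is missing")
        target_row[target_field] = transform(source_row[source_field])
    return target_row


legacy = {"CUST_NM": "  Ann Lee ", "EMAIL_ADDR": "Ann@Example.COM ", "CREATED": "05/03/2021"}
mapped = map_record(legacy)
```

Keeping the mapping as data rather than scattered code makes it easier for the system analyst and business analyst to review it alongside the ETL developer.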
The duration of this stage depends mainly on the time needed to write scripts for ETL procedures or to acquire appropriate automation tools. If all required software is in place and you only have to customize it, migration design will take a few weeks. Otherwise, it may span a few months.
Execution: focus on business goals and customer satisfaction
This is when migration — or data extraction, transformation, and loading — actually happens. In the big bang scenario, it will last no more than a couple of days. Alternatively, if data is transferred in trickles, execution will take much longer but, as we mentioned before, with zero downtime and the lowest possible risk of critical failures.
If you’ve chosen a phased approach, make sure that migration activities don’t hinder usual system operations. Besides, your migration team must communicate with business units to refine when each sub-migration is to be rolled out and to which group of users.
Data migration testing: check data quality across phases
In fact, testing is not a separate phase, as it is carried out across the design, execution, and post-migration phases. If you have taken a trickle approach, you should test each portion of migrated data to fix problems in a timely manner.
Frequent testing ensures the safe transit of data elements and their high quality and congruence with requirements when entering the target infrastructure. You may learn more about the details of testing the ETL process from our dedicated article.
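A simple reconciliation test compares row counts and a content hash between source and target. The sketch below again uses SQLite as a stand-in and assumes that both tables have the same columns in the same order and that values are stored with the same types. The function names are hypothetical.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table, key_column):
    """Return (row_count, sha256 hex digest) over rows ordered by key_column."""
    # Identifiers are interpolated into SQL, so pass only trusted names.
    digest = hashlib.sha256()
    count = 0
    for row in conn.execute(f'SELECT * FROM "{table}" ORDER BY "{key_column}"'):
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def tables_match(source, target, table, key_column):
    """True when both tables have the same row count and identical content."""
    return table_fingerprint(source, table, key_column) == table_fingerprint(
        target, table, key_column
    )
```

In a trickle migration, a check like this can run after every increment, limited to the rows moved in that increment.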
Post-migration audit: validate results with key clients
Before launching migrated data in production, results should be validated with key business users. This stage ensures that information has been correctly transported and logged. After a post-migration audit, the old system can be retired.
Golden rules of data migration
While each data migration project is unique and presents its own challenges, some common golden rules may help companies safely transit their valuable data assets, avoiding critical delays.
- Use data migration as an opportunity to reveal and fix data quality issues. Set high standards to improve data and metadata as you migrate them.
- Hire data migration specialists and assign a dedicated migration team to run the project.
- Minimize the amount of data to be migrated.
- Profile all source data before writing mapping scripts.
- Allocate considerable time to the design phase as it has a high impact on project success.
- Don’t be in a hurry to switch off the old platform. Sometimes, the first attempt at data migration fails, demanding rollback and another try.
Data migration is often viewed as a necessary evil rather than a value-adding process. And this seems to be the root of many if not all difficulties. Considering migration an important innovation project worthy of special focus is half the battle won.
Originally published at AltexSoft tech blog “Data Migration: Process, Types, and Golden Rules to Follow”