Managing Inconsistent Data from Multiple APIs
Updated: 13th October 2025
Tags: programming database
When working with multiple data sources, a common approach is to save the data as-is and move on. However, you often need to edit, unify, or supplement that information, and doing it incorrectly can create major headaches.
In our case, we use data from two APIs, each stored in its own table:
items_from_api1items_from_api2
Each source may return slightly different versions of the same items. In the main products table, we store item_id_from_api1 and item_id_from_api2 to link each product to its original source data.
Initial Attempt: Unifying Data Early
I initially tried to unify item_name directly by adding a same_as field in the source tables:
id | item_name | same_as | created_at
When a slightly different spelling appeared, I would reference the main item’s id in same_as. A similar approach was used for items_from_api2.
This caused problems: some items had legacy or alternate names, and forcing them into a unified same_as mapping caused misidentifications. The key lesson: never modify source data to unify it—always store it exactly as it comes from the API.
Protecting Source Data
Sometimes we need to edit or add related items in the main products table, such as linking additional item_ids. However, we never change products columns item_id_from_api1 or item_id_from_api2, because doing so would break the link to the original API data and its relationships.
The Local Mapping Solution
To solve this, we implemented a local mapping system:
- Created a new table
items_from_localand added a columnitem_id_from_localin the mainproductstable. - Added a
local_idcolumn to bothitems_from_api1anditems_from_api2, mapping each API item to its local equivalent. - Ran a script that updates
item_id_from_localin the main table based on thelocal_idmappings.
When a new item appears in an API table, the script:
- Creates a new row in
items_from_local. - Sets the
local_idin the API table to the new local row’s ID. - Updates
item_id_from_localin the main table.
This system allows both programmatic updates and manual adjustments, while preserving the integrity of the original API data. Admins can edit, add, or delete entries in the local table. All changes are logged internally. Users can filter or search using item_id_from_local without worrying about inconsistent or misspelled source items.
Benefits of This Approach
- Source data is preserved: API tables are never modified except for storing
local_id. - Unified reference:
items_from_localmaps all variants of the same item. - Safe editing: Admins can modify
item_id_from_localwithout risking source integrity. - Reliable filtering: Users can filter or search using a single unified column.
- Robust to inconsistencies: Handles misspellings, legacy names, and API variations cleanly.
Diagram (Conceptual Flow)
┌─────────────────────────┐
│ items_from_api1 │
│ (API 1 Source) │
├─────────────────────────┤
│ id │ name │ local_id │
│ 1 │ Item 1 │ 1 │
│ 2 │ Item 2 │ 2 │
│ 3 │ Item 3 │ 3 │
│ 4 │ Item 2` │ 2 │
└─────────────────────────┘
│
▼
┌─────────────────┐
│ items_from_local│
│ (Unified Table) │
├─────────────────┤
│ id │ name │
│ 1 │ Item 1 │
│ 2 │ Item 2 │
│ 3 │ Item 3 │
└─────────────────┘
▲
│
┌─────────────────────────┐
│ items_from_api2 │
│ (API 2 Source) │
├─────────────────────────┤
│ id │ name │ local_id │
│ 1 │ Itemm 1 │ 1 │
│ 2 │ Itemm 2 │ 2 │
│ 3 │ Itemm 3 │ 3 │
│ 4 │ Itemm 2` │ 2 │
└─────────────────────────┘
│
▼
┌─────────────────┐
│ products │
│ (Main Table) │
├─────────────────┤
│ id │ item_id_from_api1 │ item_id_from_api2 │ item_id_from_local │
│ 1 │ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │ 2 │
│ 3 │ 3 │ 3 │ 3 │
│ 4 │ 4 │ 4 │ 2 │
└─────────────────┘