[En] Backend-side translations

Sep 11, 2023

Preface
Example
- “Food cuisines” in Talabat.
- Pre-translations ERD.
- Pre-translations API.
Scope
- Read-side only.
- Enum-like database tables.
ERD
- Article.
- Weakness.
API design
- Final API design.
- Different original/default languages.
API design - other options for `lang` input
- Custom header. E.g., `X-lang`.
- Accept-Language header.
API design - other options
- Returning all translations for a food cuisine.
- Returning all `TextContent`s, with their translations.
C# automatic implementation with EF Core & AutoMapper
- Problem.
- Solution - AutoMapper custom converter.
- Solution - AutoInclude.
Final words

Preface

Anyone who’s worked on the mobile side, or even the frontend, has a standard solution for handling translations.

For example, in Flutter, you’d have a `translations` folder, along with, e.g., `en.json` and `ar.json`, and so on.

But what about the backend?

What if you have dynamic data that you want to store on the backend side such that you don’t need to upload a new app version — and wait weeks or months until users update, every time you want to edit this dynamic data?

Example

(1) “Food cuisines” in Talabat.

When you search for a restaurant in Talabat, you have the option of filtering by “food cuisine”.

Most probably, or certainly, the list of food cuisines is coming from the backend and needs to be in multiple languages.

(2) Pre-translations ERD

Before the ERD supporting translations, the database could’ve looked like this:

| Id | Name       |
|----|------------|
| 1  | "Egyptian" |
| 2  | "Syrian"   |
| 3  | "Bakery"   |

(3) Pre-translations API

Before the API supporting translations, the API could’ve looked like this:

GET /api/v1/food-cuisines
Response:
[
    {
        "Id": 1,
        "Name": "Egyptian"
    },
    {
        "Id": 2,
        "Name": "Syrian"
    },
    {
        "Id": 3,
        "Name": "Bakery"
    }
]

(Of course, this is only an example. I don’t know what Talabat ERD or API looks or looked like)

Scope

(1) Read-side only.

This post will focus only on the read-side of things.

That is, I won’t talk about creating, updating, or deleting food cuisines and their translations.

(2) Enum-like database tables.

The focus of this post is also enum-like database tables. E.g., currencies, and so on.

I’m not sure if the same design would apply to typical database tables that hold user-generated data. But the design took into consideration only enum-like database tables.

Also, that’s why full CRUD is not included. Table data, in my case, was seeded with a migration script (regardless of whether this is optimal or not), and thus I didn’t need to think of CRUD APIs.

ERD

(1) Article.

I don’t have anything to add beside what’s in this article.

I only skimmed the article. But when I was designing the ERD, I remember going through the same designs mentioned, and ending up also with the last one, ‘Translation subschema’.

(2) Weakness.

I actually have one thing to add:

Note that this ERD has one weakness: It allows for having a `Translation` with the same language as the original or default language of a text content, creating a possible conflict.

For example, it’s easy to disallow multiple translations for the same language by adding a composite unique constraint on `Translations(TextContextId, LanguageId)`:

Won't be allowed:

| Id | TextContentId            | LanguageId | Translation    |
|----|--------------------------|------------|----------------|
| 1  | 1 (chinese_food_cuisine) | 1 (en)     | "Chinese"      |
| 2  | 1 (chinese_food_cuisine) | 2 (ar)     | "صيني"         |
| 3  | 1 (chinese_food_cuisine) | 1 (en)     | "Chinese food" |

However, there’s no way to prevent this:

| Id | OriginalText   | OriginalLanguageId |
|----|----------------|--------------------|
| 1  | "Chinese food" | 1 (en)             |


| Id | TextContentId            | LanguageId | Translation    |
|----|--------------------------|------------|----------------|
| 1  | 1 (chinese_food_cuisine) | 1 (en)     | "Chinese"      |
| 2  | 1 (chinese_food_cuisine) | 2 (ar)     | "صيني"         |

In the case above, there’s a conflict in case of English: Should it be “Chinese” or “Chinese food”?

I didn’t pursue things further to see how I’d update the ERD to prevent this case.

This is just something to be aware of when seeding the data, to make sure there’s no translation for a text content with the same language as the original-text language.

API design

(1) Final API design

To cut things short, I went with this API design:

GET /api/v1/food-cuisines[?lang=en|ar|...]

GET /api/v1/food-cuisines?lang=en (or without `?lang=en`)
Response:
[
    {
        "Id": 1,
        "Name": "Egyptian"
    },
    {
        "Id": 2,
        "Name": "Syrian"
    },
    {
        "Id": 3,
        "Name": "Bakery"
    }
]

GET /api/v1/food-cuisines?lang=ar
Response:
[
    {
        "Id": 1,
        "Name": "مصري"
    },
    {
        "Id": 2,
        "Name": "سوري"
    },
    {
        "Id": 3,
        "Name": "مخبز"
    }
]

(2) Different original/default languages.

If `lang` query param is missing, and if the food cuisines each has a different original/default language, then it’s possible that the list of food cuisines returned by the API do not have the same language, which is not a good experience. E.g.,:

GET /api/v1/food-cuisines
Response:
[
    {
        "Id": 1,
        "Name": "Egyptian"
    },
    {
        "Id": 2,
        "Name": "سوري"
    },
    {
        "Id": 3,
        "Name": "Bakery"
    }
]

One possible, simple solution is to just make sure that all food cuisines, or all text contents in general, have the same original/default language in the database, when seeding the data.

For example, to make all rows in `TextContents` table have the ID of English in `OriginalLanguageId`, and to have the English text as the `OriginalText`.

There are other, possibly better, solutions, of course. E.g., to default the `lang` query param to `en` (or something else), when missing.

API design - other options for `lang` input

(1) Custom header. E.g., `X-lang`.

I have no opinion on custom header vs query param for the `lang` input.

I went with query param by deferring to authority: I checked Talabat’s API, and it seems to use the query-param option.

Anything to break the analysis paralysis 😁.

(2) Accept-Language header.

This is quite the controversial point.

On one hand, it’s a standard. On the other, (a) it has bad sentiments on the web due to device fingerprinting (although can be counter-argued against), and (b) being a full-featured tool for content negotiation, it could be more complicated than you need and you might end up being not compliant (not sure, though).

I preferred simplicity, and that’s why I went for the custom query param or header.

If things get complicated in the future, one could use `Accept-Language` header or take inspirations from its design.

API design - other options

I don’t think either of these makes sense, at least for the problem at hand. But mentioning them for completeness:

(1) Returning all translations for a food cuisine.

Instead of taking in a language input, you could’ve just returned all translations. E.g.,:

GET /api/v1/food-cuisines
Response:
[
    {
        "Id": 1,
        "Name": { "en": "Egyptian", "ar": "مصري" }
    },
    {
        "Id": 2,
        "Name": { "en": "Syrian", "ar": "سوري" }
    },
    {
        "Id": 3,
        "Name": { "en": "Bakery", "ar": "مخبز" }
    }
]

(2) Returning all `TextContent`s, with their translations.

You could’ve also created a separate endpoint that returns what would the be the equivalent of `en.json` and `ar.json` in Flutter, and call it on app start.

I see totally no reason to go with this option, though, for real :D

C# automatic implementation with EF Core & AutoMapper

I hope a video on this next week isA, although possibly in Arabic. But in the meantime:

(1) Problem.

Implementing the API design above, of taking in `lang=en|ar`, and returning the corresponding translation, would pollute `FoodCuisinesService` with extra logic it’s not interested in.

Even worse, this logic would need to be duplicated everywhere we’re retrieving a `FoodCuisine`, or for new tables that have translations as well.

(2) Solution - AutoMapper custom converter.

AutoMapper’s `ITypeConverter` can use dependency injection and access the current http context.

So, the solution is to create an `ITypeConverter<TextContent, string>` that would convert a `TextContent` (see ERD above) to a string, based on the `lang` query param.

E.g., the type converter would convert:

// Entity classes - pseudocode

class FoodCuisine {
  long Id;
  long NameId;
  TextContent Name;
}

class TextContent {
  long Id;
  long OriginalLanguageId;
  string OriginalText;
  List<Translation> Translations;
}

class Language {...}
class Translation {...}

…into:

record FoodCuisineDTO(long Id, string Name);

…picking the right string for `Name` from the `TextContent.OriginalText` and `TextContent.Translations`, based on the `lang` query param in the current http context.

(3) Solution - AutoInclude.

We didn’t go through the implementation of the custom converter.

But one note would be that we’d better add `AutoInclude` configuration for `FoodCuisines.TextContent`, `TextContent.OriginalLanguage`, `TextContent.Translations`, and `Translation.OriginalLanguage` in our context’s `OnModelCreating`.

This part should be clearer if I end up making the video next week, isA.

Final words

If you’ve faced this problem and designed or implemented it differently, or have any advice or opinion in general, I’d love to hear it!

Thank you for reading :)

Hisham Ramadan

Sep 11, 2023Edited

I work for Talabat and I can relate to this problem :)

I’d say the “cuisines” example & even the whole article you mentioned above is based on 3 assumptions. First, the amount of data is small & static. Second, the data is not frequently updated. Third, We need to support so many languages.

Let’s be pragmatic, how many times anyone wanted to build a software that needs to support more than 3-4 languages? I don’t think it’s a big number. Adding to that, what if the data is huge and denormalization is required to reduce number of unnecessary joins.

I haven’t tried the second approach (to have single column with json format of all translations) but I thought of it as the best approach here, for two simple reasons:

1. Doesn’t require db schema change, which is the drawback of the first approach, whenever you have a new language to add, or when you need say 3 fields, you won’t need to add 3*NumberOfLangauges columns.

2. Doesn’t require at least two unnecessary joins just to fetch a translation of just one field. What if I have 5 fields? Shall I do 10 joins in a single query?

The drawback mentioned in the article is not actually a drawback. The most famous relational database engines now support json fields efficiently. If you relying on an old db version, that’s another problem :)

Expand full comment

1 reply by Hossam El-Deen

1 more comment...

Hossam El-Deen