# Beyond Dictionary — Architecture & Design Document

**Project:** Cretorial Beyond Dictionary | 400K Words
**Version:** 1.0
**Last Updated:** 2026-05-02
**Reference Word:** `pride` (see [pride.json](pride.json))

---

## 1. Vision

Build the world's most comprehensive, modular, and creative language reference, covering 400,000 words, with entries that can run to 20,000 words or more across 32+ specialized sections (definition, pronunciation, cultural meaning, age-tuned interpretation, creative usage, and role-specific applications for writers, marketers, designers, voice artists, educators, and others).

The system must be:
- **Independent at the section level** — every section is a standalone, citable, embeddable unit
- **Connected** — sections cross-link within a word and across words
- **Extensible** — new sections can be added years later with zero schema migration
- **Composable** — sections can be used alone, grouped, or assembled into custom views
- **Render-ready** — content carries its own display hints

---

## 2. Project Scale & Cost

### 2.1 Per-Word Entry Cost

| Component | Lean (India team / AI-assisted) | Premium (global team) |
|---|---|---|
| Lead writer/researcher (20K words) | ₹40,000 / $500 | ₹2,00,000 / $2,500 |
| Subject experts (linguist, cultural, NLP, voice) | ₹30,000 / $375 | ₹1,50,000 / $1,800 |
| Editor + copyedit + fact-check | ₹15,000 / $180 | ₹60,000 / $750 |
| Translation / multilingual + Hinglish | ₹20,000 / $250 | ₹80,000 / $1,000 |
| Illustration / visual design | ₹15,000 / $180 | ₹1,00,000 / $1,200 |
| Audio / voice artist samples | ₹10,000 / $120 | ₹50,000 / $600 |
| JSON/HTML/DOCX/XML structuring | ₹10,000 / $120 | ₹40,000 / $500 |
| **Per-word total** | **~₹1,40,000 / $1,725** | **~₹6,80,000 / $8,350** |

With AI-assisted drafting + human editing: **~₹40,000–70,000 ($500–850) per entry**.

### 2.2 Full-Project Budget (400K words)

Pure human authoring is economically unviable: at the lean rate of ~₹1,40,000 per entry, 400,000 entries would cost about ₹5,600 crore (~$690M). The recommended **3-tier model**:

| Tier | Words | Approach | Cost/entry | Total |
|---|---|---|---|---|
| Hero | 1,000 | Full human craft | ₹1L+ | ₹10 cr / ~$1.2M |
| Standard | 10,000 | AI-drafted, human-edited | ₹3,000 | ₹3 cr / ~$360K |
| Long-tail | 389,000 | AI-generated + spot QA | ₹50 | ₹2 cr / ~$240K |

**Realistic full-project budget: ₹15–20 crore (~$1.8M–2.4M)** spread over 2–3 years, plus ₹1–2 crore for platform/tech.
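The tier arithmetic can be checked with a short sketch. The figures are copied from the table above; treating "₹1L+" as exactly ₹1,00,000 is an assumption, so the Hero line is a lower bound, and the variable names are illustrative only.

```javascript
// Tier sizes and per-entry costs from the 3-tier table (rupees).
// "₹1L+" is taken as exactly 100000, which makes Hero a lower bound.
const TIERS = [
  { name: "Hero", entries: 1000, costPerEntry: 100000 },
  { name: "Standard", entries: 10000, costPerEntry: 3000 },
  { name: "Long-tail", entries: 389000, costPerEntry: 50 },
];

const RUPEES_PER_CRORE = 10000000;

// Adds a rupee total to each tier.
function tierTotals(tiers) {
  return tiers.map((t) => ({ ...t, rupees: t.entries * t.costPerEntry }));
}

const totals = tierTotals(TIERS);
const totalEntries = totals.reduce((sum, t) => sum + t.entries, 0);
const totalRupees = totals.reduce((sum, t) => sum + t.rupees, 0);

for (const t of totals) {
  console.log(`${t.name}: ₹${t.rupees / RUPEES_PER_CRORE} crore`);
}
console.log(`Entries: ${totalEntries}, content total: ₹${totalRupees / RUPEES_PER_CRORE} crore`);
```

The three tiers cover all 400,000 entries and sum to just under ₹15 crore of content cost, which is the low end of the stated ₹15–20 crore range before platform and tech spend.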

---

## 3. Storage Architecture — Two Elasticsearch Indices

### 3.1 Why Two Indices

| Index | Purpose | Row Granularity | Approx. Row Count |
|---|---|---|---|
| `bd_words` | Lightweight manifest, table of contents, global metadata | One row per word | 400,000 |
| `bd_sections` | All actual content — every section is its own row | One row per section per word | ~13,000,000 |

This split delivers:
- Fast word lookup and autocomplete (only manifests scanned)
- Independent section publishing, versioning, and editing
- Cross-word queries by section type / theme / audience
- Independent translation workflow per section
- Per-section quality scoring

### 3.2 ER Diagram

```
              ┌─────────────────────────────────┐
              │  bd_words: { id: "pride" }      │  ← manifest only
              │  toc: [definition, pronoun...]  │
              └────────────┬────────────────────┘
                           │ references via wordId
       ┌───────────────────┼─────────────────────┐
       ▼                   ▼                     ▼
┌──────────────┐   ┌──────────────┐    ┌──────────────────┐
│ pride::      │   │ pride::      │    │ pride::          │
│ definition   │◄──┤ culturalMean │───►│ forWriters       │
└──────────────┘   └──────────────┘    └──────────────────┘
       ▲                                        │
       │ relations.seeAlso                      │ relations.relatedTo
       │ (cross-word link)                      ▼
┌──────────────┐                       ┌──────────────────┐
│ honor::      │                       │ pride::          │
│ definition   │                       │ creativePhrases  │
└──────────────┘                       └──────────────────┘
```

---

## 4. The Section Envelope — Universal Contract

Every section, regardless of type, wraps its content in the same envelope. **This contract is what makes sections interchangeable, independently versioned, and renderable by a single frontend pattern.**

### 4.1 Envelope Structure

```json
{
  "id": "pride::definition",
  "wordId": "pride",
  "word": "pride",
  "sectionType": "definition",
  "sectionSlug": "definition",
  "version": "1.0",
  "status": "published",
  "language": "en",
  "order": 3,

  "title": "Definition",
  "summary": "Pride is a feeling of deep satisfaction, self-worth, or esteem.",
  "icon": "book-open",
  "color": "#6B4FBB",

  "audience": ["student", "writer", "general"],
  "ageGroup": ["all"],
  "tags": ["core", "meaning"],
  "themes": ["identity", "emotion"],

  "render": {
    "layout": "card",
    "component": "DefinitionCard",
    "variant": "with-examples",
    "width": "full",
    "props": { "showNuances": true, "showExamples": true }
  },

  "relations": {
    "dependsOn":  [],
    "relatedTo":  ["pride::synonymsAntonyms", "pride::culturalMeaning"],
    "seeAlso":    ["honor::definition", "dignity::definition"],
    "partOf":     ["bundle:core"],
    "crossWord":  ["honor", "dignity", "ego"]
  },

  "body": {
    "primary": "Pride is a feeling of deep satisfaction...",
    "coreMeaning": "At its essence, pride reflects how we value ourselves...",
    "nuances": { "positive": [...], "neutral": [...], "negative": [...] },
    "examples": [...]
  },

  "blocks": [
    { "blockId": "b1", "blockType": "paragraph", "text": "..." },
    { "blockId": "b2", "blockType": "quote", "text": "...", "data": { "author": "..." } }
  ],

  "media": [
    { "mediaType": "image", "url": "...", "alt": "...", "caption": "..." },
    { "mediaType": "audio", "url": "...", "caption": "British pronunciation" }
  ],

  "searchText": "pride dignity honor self-respect ego ...",
  "qualityScore": 0.92,

  "createdAt": "2026-01-28",
  "updatedAt": "2026-05-02",
  "publishedAt": "2026-02-01",
  "author": "team-pride",
  "reviewers": ["editor1", "linguist2"]
}
```

### 4.2 Why Each Field Exists

| Field | Purpose |
|---|---|
| `id` | Globally unique. Format: `{wordId}::{sectionType}`. Citable as a URL. |
| `wordId`, `word` | Group all sections of one word. |
| `sectionType` | What kind of section this is. New types can be added freely. |
| `version`, `status` | Independent publishing lifecycle per section. |
| `language` | Each section can ship in different languages at different times. |
| `order` | Display order on the word page. |
| `title`, `summary`, `icon`, `color` | Card preview without loading the full body. |
| `audience`, `ageGroup`, `tags`, `themes` | Cross-cutting groupings for custom views. |
| `render.component` | Tells the frontend which renderer to use. |
| `render.props` | Per-render configuration (passed to the component). |
| `relations.*` | Link graph. Independent rows, connected via IDs. |
| `body` | The actual content — shape varies by section type (dynamic mapping). |
| `blocks` | Optional rich-content blocks (paragraph, quote, list, table). |
| `media` | Images, audio, video. |
| `qualityScore` | Per-section quality metric. |
| `raw` | Original JSON, queryable via flattened type. |
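A minimal sketch of how an ingest step might check a row against this contract before indexing. The document does not say which envelope fields are mandatory, so the `REQUIRED_FIELDS` list and the function name `validateEnvelope` are assumptions; only the `{wordId}::{sectionType}` ID format comes from the table above.

```javascript
// Fields this sketch treats as mandatory. The source does not state which
// envelope fields are required, so this list is an assumption.
const REQUIRED_FIELDS = ["id", "wordId", "sectionType", "status", "language", "body"];

// Returns a list of human-readable problems; an empty list means the row passes.
function validateEnvelope(section) {
  const errors = [];
  for (const field of REQUIRED_FIELDS) {
    if (section[field] === undefined || section[field] === null) {
      errors.push(`missing field: ${field}`);
    }
  }
  // The id format {wordId}::{sectionType} comes from the field table.
  const expectedId = `${section.wordId}::${section.sectionType}`;
  if (section.id !== undefined && section.id !== null && section.id !== expectedId) {
    errors.push(`id should be "${expectedId}", got "${section.id}"`);
  }
  // body is mapped as an object, so arrays and scalars are rejected here.
  const body = section.body;
  if (body !== undefined && body !== null && (typeof body !== "object" || Array.isArray(body))) {
    errors.push("body must be a JSON object");
  }
  return errors;
}
```

Running the check before the bulk request keeps malformed rows out of the index instead of relying on Elasticsearch to reject them.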

---

## 5. Five Grouping Dimensions

A single section row can simultaneously belong to multiple groups. None require schema changes.

| Grouping | Field | Example Query |
|---|---|---|
| By word | `wordId` | All sections of *pride* |
| By section type | `sectionType` | All `culturalMeaning` sections across 400K words |
| By audience | `audience[]` | Everything for marketers |
| By theme | `themes[]` | Everything tagged `identity` |
| By explicit bundle | `relations.partOf[]` | Curated set "PrideMonth2026" |
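Because each dimension is an ordinary keyword field, the groupings can be combined in one query. The sketch below builds such a search body; the short dimension names (`word`, `theme`, `bundle`) and the `groupingQuery` helper are hypothetical, while the field names are the ones in the table.

```javascript
// Maps each grouping dimension from the table to its envelope field.
const GROUPING_FIELDS = {
  word: "wordId",
  sectionType: "sectionType",
  audience: "audience",
  theme: "themes",
  bundle: "relations.partOf",
};

// Builds a search body that filters on any combination of dimensions.
function groupingQuery(groups, { publishedOnly = true } = {}) {
  const filter = Object.entries(groups).map(([dimension, value]) => {
    const field = GROUPING_FIELDS[dimension];
    if (!field) throw new Error(`unknown grouping dimension: ${dimension}`);
    // Arrays become a `terms` filter (match any); single values a `term` filter.
    return Array.isArray(value)
      ? { terms: { [field]: value } }
      : { term: { [field]: value } };
  });
  if (publishedOnly) filter.push({ term: { status: "published" } });
  return { query: { bool: { filter } } };
}

// Example: published marketer-facing sections on the "identity" theme.
const marketerIdentity = groupingQuery({ audience: "marketer", theme: "identity" });
```

The filters sit in `bool.filter` because they are exact matches that do not need relevance scoring.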

---

## 6. Adding New Sections Later — Zero-Migration Pattern

The mapping has these escape hatches built in:

```json
"dynamic": "true"                                    ← new top-level fields auto-map
"body":   { "type": "object", "dynamic": "true" }   ← any body shape works
"render.props": { "type": "flattened" }             ← any UI props
"relations.partOf": { "type": "keyword" }           ← any group name
"sectionType":     { "type": "keyword" }            ← any new section name
```

### 6.1 Three Steps to Add a New Section Type

**Step 1 — Write the row.** Example: a new `forTeachers` section.

```json
{
  "id": "pride::forTeachers",
  "wordId": "pride",
  "sectionType": "forTeachers",
  "title": "For Teachers",
  "audience": ["teacher", "educator"],
  "ageGroup": ["middle-school", "high-school"],
  "render": { "component": "TeacherToolkit", "layout": "tabs" },
  "relations": {
    "dependsOn": ["pride::definition"],
    "relatedTo": ["pride::culturalMeaning"],
    "partOf":    ["bundle:education-pack-2026"]
  },
  "body": {
    "lessonPlan": { ... },
    "discussionPrompts": [ ... ],
    "activities": [ ... ],
    "rubric": { ... }
  }
}
```

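**Step 2 — Bulk index** via `POST /bd_sections/_bulk`. Elasticsearch auto-maps any new fields, so no `PUT /bd_sections/_mapping` call is needed. The one constraint is that a new `body` field must not reuse an already-mapped path with a different type (for example, `body.examples` as strings in one section type and as objects in another): such documents are rejected, and every new path counts toward the 5,000-field limit.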
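A minimal sketch of preparing the request body for this step. The `_bulk` endpoint takes newline-delimited JSON (sent with `Content-Type: application/x-ndjson`): one action line followed by one source line per document, with a trailing newline. The helper names `toBulkBody` and `bulkFailures` are hypothetical.

```javascript
// Serializes section rows into the newline-delimited body that _bulk expects.
function toBulkBody(sections) {
  const lines = [];
  for (const section of sections) {
    // The index name is omitted because the request targets /bd_sections/_bulk.
    lines.push(JSON.stringify({ index: { _id: section.id } }));
    lines.push(JSON.stringify(section));
  }
  // The bulk API requires the body to end with a newline.
  return lines.join("\n") + "\n";
}

// Extracts per-document failures from a parsed _bulk response.
function bulkFailures(response) {
  if (!response.errors) return [];
  return response.items
    .filter((item) => item.index && item.index.error)
    .map((item) => ({ id: item.index._id, error: item.index.error }));
}
```

Checking `bulkFailures` after each request matters here because a bulk call can succeed overall while individual rows are rejected, for example on a mapping type conflict.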

**Step 3 — Register the renderer** in the frontend:

```js
RENDERERS.TeacherToolkit = TeacherToolkitComponent;
```

That's the entire change. Old rows untouched. Old word pages keep working.

### 6.2 Future Sections That Will Just Work

| Future section | `sectionType` | New renderer |
|---|---|---|
| Lesson plans | `forTeachers` | `TeacherToolkit` |
| AI prompt library | `aiPromptLibrary` | `PromptGrid` |
| Podcast scripts | `forPodcasters` | `ScriptViewer` |
| Sign language | `signLanguage` | `VideoPlayer` |
| Brand voice | `forBrands` | `BrandCard` |
| Therapy / counseling | `forTherapists` | `TherapyGuide` |
| Legal usage | `forLawyers` | `CaseLawList` |
| Quizzes | `quizzes` | `QuizPlayer` |
| AR/VR experience | `immersive` | `ARLauncher` |
| Community stories | `communityStories` | `StoryFeed` |

---

## 7. Query Patterns

### 7.1 Show full word page

```json
GET /bd_sections/_search
{
  "query": { "bool": { "filter": [
    { "term": { "wordId": "pride" } },
    { "term": { "status": "published" } },
    { "term": { "language": "en" } }
  ]}},
  "sort": [{ "order": "asc" }],
  "size": 100
}
```

### 7.2 Show one specific section

```json
GET /bd_sections/_doc/pride::definition
```

### 7.3 Compare across words

```json
GET /bd_sections/_search
{
  "query": { "bool": { "filter": [
    { "term":  { "sectionType": "culturalMeaning" } },
    { "terms": { "wordId": ["pride", "honor", "dignity"] } }
  ]}}
}
```

### 7.4 Build "for marketers" view across all 400K words

```json
GET /bd_sections/_search
{
  "query": { "bool": { "filter": [
    { "term": { "audience": "marketer" } },
    { "term": { "status": "published" } }
  ]}}
}
```
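A search returns only 10 hits by default, and `from` + `size` paging stops at 10,000 results unless `index.max_result_window` is raised, so a view spanning all 400K words has to be paged. The sketch below uses `search_after` with a deterministic sort. The `search` parameter is a hypothetical async function that sends a body to `/bd_sections/_search` and resolves to the parsed response; sorting on `wordId` then `id` is an assumption about the desired order.

```javascript
// Deterministic sort so search_after can resume exactly where a page ended.
// Both fields are keyword fields in the section mapping.
const PAGE_SORT = [{ wordId: "asc" }, { id: "asc" }];

// Builds the body for one page; lastHit is the final hit of the previous page.
function pageBody(query, pageSize, lastHit) {
  const body = { query, sort: PAGE_SORT, size: pageSize };
  // Each hit carries its sort values; passing them back requests the next page.
  if (lastHit) body.search_after = lastHit.sort;
  return body;
}

// Walks every page and returns all hits. `search` is a hypothetical transport.
async function fetchAll(search, query, pageSize = 500) {
  const hits = [];
  let lastHit = null;
  while (true) {
    const response = await search(pageBody(query, pageSize, lastHit));
    const page = response.hits.hits;
    hits.push(...page);
    // A short (or empty) page means there is nothing left to fetch.
    if (page.length < pageSize) return hits;
    lastHit = page[page.length - 1];
  }
}
```

For a user-facing page, one `pageBody` call per scroll step is usually enough; `fetchAll` suits exports and batch jobs.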

### 7.5 Cross-word theme page (e.g. "Identity")

```json
GET /bd_sections/_search
{ "query": { "term": { "themes": "identity" } } }
```

### 7.6 Curated bundle

```json
GET /bd_sections/_search
{ "query": { "term": { "relations.partOf": "bundle:education-pack-2026" } } }
```

### 7.7 Resolve related sections

Read `relations.relatedTo[]` from a section, then bulk fetch by IDs:

```json
GET /bd_sections/_mget
{ "ids": ["pride::synonymsAntonyms", "pride::culturalMeaning"] }
```
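The two-step lookup above can be sketched as a pair of helpers: one collects the linked IDs from a section, the other splits the `_mget` response into found and missing documents. The helper names are hypothetical, and skipping `partOf` and `crossWord` is an assumption based on the envelope example, where those lists hold bundle names and word IDs rather than section IDs.

```javascript
// Relation lists that hold section IDs. partOf and crossWord are skipped
// because they hold bundle names and word IDs rather than section IDs.
const SECTION_LINK_KEYS = ["dependsOn", "relatedTo", "seeAlso"];

// Builds the _mget request body with de-duplicated IDs.
function mgetBodyFor(section, keys = SECTION_LINK_KEYS) {
  const relations = section.relations || {};
  const ids = new Set();
  for (const key of keys) {
    for (const id of relations[key] || []) ids.add(id);
  }
  return { ids: [...ids] };
}

// _mget returns one entry per requested ID, with found: false for missing docs.
function splitMgetResponse(response) {
  const found = {};
  const missing = [];
  for (const doc of response.docs) {
    if (doc.found) found[doc._id] = doc._source;
    else missing.push(doc._id);
  }
  return { found, missing };
}
```

Tracking `missing` lets the page hide or flag links to sections that are referenced but not yet indexed, such as a `seeAlso` target for a word still in draft.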

---

## 8. Rendering Layer

### 8.1 Section Renderer Registry (Frontend)

```js
const RENDERERS = {
  DefinitionCard,       // sectionType: definition
  PronunciationPlayer,  // sectionType: pronunciation (audio)
  AccentTable,          // sectionType: accents
  ExamplesCarousel,     // sectionType: examples
  TimelineView,         // sectionType: historyTrivia
  HashtagCloud,         // sectionType: hashtags
  WriterToolkit,        // sectionType: forWriters
  IndustryGrid,         // sectionType: industryApplications
  TeacherToolkit,       // sectionType: forTeachers (added later)
  // ...new ones added without touching anything else
};

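function Section({ section }) {
  const Component = RENDERERS[section.render?.component];
  // Skip sections whose renderer is not registered yet instead of crashing the page.
  if (!Component) return null;
  return <Component {...section.body} {...section.render.props} />;
}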
```

### 8.2 Section Group Registry (Static JSON, content-only)

Organizes section types into logical UI tabs. **Updating it requires no Elasticsearch change.**

```json
{
  "core":      ["overview", "definition", "pronunciation", "synonymsAntonyms"],
  "culture":   ["culturalMeaning", "globalWisdom", "historyTrivia", "ageMeaning"],
  "creators":  ["forWriters", "forDesigners", "voiceArtist", "forMarketers"],
  "play":      ["funPlay", "creativePhrases", "creativeSentences", "hashtags"],
  "future":    ["forTeachers", "aiPromptLibrary", "forPodcasters"]
}
```
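One way the frontend might apply this registry is to bucket a word's sections into tabs after the search returns. In the sketch below the registry is copied from the JSON above; the `groupIntoTabs` name and the `other` tab for unregistered section types are assumptions.

```javascript
// Copy of the section group registry shown above.
const SECTION_GROUPS = {
  core: ["overview", "definition", "pronunciation", "synonymsAntonyms"],
  culture: ["culturalMeaning", "globalWisdom", "historyTrivia", "ageMeaning"],
  creators: ["forWriters", "forDesigners", "voiceArtist", "forMarketers"],
  play: ["funPlay", "creativePhrases", "creativeSentences", "hashtags"],
  future: ["forTeachers", "aiPromptLibrary", "forPodcasters"],
};

function groupIntoTabs(sections, groups = SECTION_GROUPS) {
  // Reverse index: sectionType -> group name.
  const groupOf = {};
  for (const [group, types] of Object.entries(groups)) {
    for (const type of types) groupOf[type] = group;
  }
  const tabs = {};
  for (const section of sections) {
    // Types missing from the registry land in an "other" tab (an assumption).
    const group = groupOf[section.sectionType] || "other";
    (tabs[group] = tabs[group] || []).push(section);
  }
  // Keep each tab in the display order carried by the envelope.
  for (const list of Object.values(tabs)) list.sort((a, b) => a.order - b.order);
  return tabs;
}
```

With a catch-all tab, a newly indexed section type still shows up on the page before anyone edits the registry.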

---

## 9. Mapping: `bd_words` (Manifest Index)

Full mapping in [es-mapping.json](es-mapping.json).

### 9.1 Field Reference

| Field | Type | Purpose |
|---|---|---|
| `word` | keyword + text + suggest | Lookup, full-text, autocomplete |
| `wordNormalized` | keyword | Lowercased, accent-stripped form |
| `language`, `languages[]` | keyword | Filter by language |
| `version` | keyword | Document version |
| `status` | keyword | `draft` / `review` / `published` |
| `tier` | keyword | `hero` / `standard` / `longtail` |
| `tags[]`, `hashtags[]` | keyword | Faceted filters |
| `wordCount`, `sectionCount`, `completeness` | numeric | Quality metrics |
| `overview`, `definition`, `pronunciation`, `synonymsAntonyms`, `industryApplications` | object/nested | Cached top-level summaries for word cards |
| `multilingual`, `hinglish` | dynamic object | Language variants |
| `sections` | dynamic object | TOC / per-section status map |
| `searchBlob` | text | Single-field full-text fallback |
| `raw` | flattened | Original JSON, queryable |

### 9.2 Settings

- 3 shards, 1 replica
- Custom analyzers: `content_analyzer` (English with stemming + stop words), `ngram_analyzer` (autocomplete), `hinglish_analyzer`
- `index.mapping.total_fields.limit` raised to 5,000 to handle dynamic expansion

---

## 10. Mapping: `bd_sections` (Section Index)

Full mapping in [es-mapping-sections.json](es-mapping-sections.json).

### 10.1 Field Reference

| Field | Type | Purpose |
|---|---|---|
| `id` | keyword | Primary key. Format: `{wordId}::{sectionType}` |
| `wordId`, `word` | keyword | Group all sections of one word |
| `sectionType` | keyword | `definition` / `pronunciation` / `forWriters` / `forTeachers` ... |
| `sectionSlug` | keyword | URL-friendly slug |
| `version`, `status`, `language` | keyword | Independent lifecycle per section |
| `order` | integer | Display order on word page |
| `title`, `summary`, `icon`, `color` | mixed | Card preview |
| `audience[]`, `ageGroup[]`, `tags[]`, `themes[]` | keyword | Cross-cutting groupings |
| `render.layout`, `render.component`, `render.variant`, `render.width` | keyword | UI hints |
| `render.props` | flattened | Renderer-specific configuration |
| `relations.dependsOn[]`, `relatedTo[]`, `seeAlso[]`, `partOf[]`, `crossWord[]` | keyword | Link graph |
| `body` | dynamic object | Section content — shape varies by `sectionType` |
| `blocks[]` | nested | Rich content blocks (paragraph, quote, list, table) |
| `media[]` | nested | Images, audio, video |
| `searchText` | text | Concatenated full-text for relevance search |
| `qualityScore` | float | Per-section quality metric |
| `author`, `reviewers[]` | keyword | Editorial workflow |
| `createdAt`, `updatedAt`, `publishedAt` | date | Lifecycle timestamps |
| `raw` | flattened | Original JSON, queryable |

### 10.2 Settings

- 3 shards, 1 replica
- Single `content_analyzer` (English with stemming + stop words)
- Dynamic template: any new string field → text + keyword sub-field
- `index.mapping.total_fields.limit` raised to 5,000

---

## 11. Index Creation

```bash
curl -X PUT "http://localhost:9200/bd_words" \
  -H 'Content-Type: application/json' \
  -d @es-mapping.json

curl -X PUT "http://localhost:9200/bd_sections" \
  -H 'Content-Type: application/json' \
  -d @es-mapping-sections.json
```

---

## 12. Example: One Word, One Manifest Row, 32 Section Rows

For *pride*, indexing produces:

- 1 row in `bd_words` with `id: pride`
- 32 rows in `bd_sections`:
  - `pride::overview`
  - `pride::definition`
  - `pride::pronunciation`
  - `pride::artOfSpeech`
  - `pride::ageMeaning`
  - `pride::culturalMeaning`
  - `pride::funPlay`
  - `pride::historyTrivia`
  - `pride::wordTransformation`
  - `pride::synonymsAntonyms`
  - `pride::hashtags`
  - `pride::creativePhrases`
  - `pride::creativeSentences`
  - `pride::conversationalClips`
  - `pride::globalWisdom`
  - `pride::cultureStoryStudy`
  - `pride::sentenceTypesByAge`
  - `pride::clarifications`
  - `pride::forWriters`
  - `pride::atWork`
  - `pride::forMarketers`
  - `pride::forDesigners`
  - `pride::voiceArtist`
  - `pride::nlpExpert`
  - `pride::multilingual`
  - `pride::hinglish`
  - `pride::quickReference`
  - `pride::llosMetadata`
  - `pride::navigation`
  - `pride::industryApplications`
  - `pride::interactions`
  - `pride::meta`
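The explosion of one word file into a manifest row plus section rows could look roughly like the sketch below. The structure of `pride.json` is not shown in this document, so the sketch assumes a top-level object whose keys are section types in display order. The `explodeWord` helper, the `{ value: ... }` wrapper for non-object sections, and the whitespace-based word count are all assumptions; the `toc` field follows the ER diagram.

```javascript
// Collects every string leaf in a JSON value, depth-first.
function collectStrings(value, out = []) {
  if (typeof value === "string") out.push(value);
  else if (Array.isArray(value)) value.forEach((v) => collectStrings(v, out));
  else if (value && typeof value === "object") {
    Object.values(value).forEach((v) => collectStrings(v, out));
  }
  return out;
}

// Counts whitespace-separated tokens.
const countWords = (text) => (text.match(/\S+/g) || []).length;

// Assumes wordJson maps sectionType -> section content, in display order.
function explodeWord(wordId, wordJson) {
  const sections = Object.entries(wordJson).map(([sectionType, content], index) => {
    const isObject = content !== null && typeof content === "object" && !Array.isArray(content);
    return {
      id: `${wordId}::${sectionType}`,
      wordId,
      word: wordId,
      sectionType,
      order: index + 1,
      // body is mapped as an object, so arrays and scalars are wrapped.
      body: isObject ? content : { value: content },
      searchText: collectStrings(content).join(" "),
    };
  });
  const manifest = {
    id: wordId,
    word: wordId,
    sectionCount: sections.length,
    wordCount: sections.reduce((sum, s) => sum + countWords(s.searchText), 0),
    // Table of contents: section types in display order.
    toc: sections.map((s) => s.sectionType),
  };
  return { manifest, sections };
}
```

The manifest goes to `bd_words` and the section rows go to `bd_sections`, so a 32-section word file yields 33 documents in total.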

---

## 13. Why This Architecture Is "World-Best Dictionary" Quality

1. **Every section is a citable URL** — `/word/pride/section/definition` works directly.
2. **Sections embed anywhere** — blog, app, voice assistant, AI agent.
3. **Per-section quality scoring** — publish a word with 10 great sections and add the rest later.
4. **Per-section translation** — a section ships in English while its Hindi version is still in review.
5. **Cross-word theme browsing** — readers explore by *concept*, not just word.
6. **AI-friendly** — agents fetch one section at a time, cheap and precise.
7. **Future-proof** — add `forTeachers`, `forPodcasters`, `aiPromptLibrary` whenever, no migration.
8. **Independent editorial workflow** — each section has its own author, reviewers, status, version.
9. **Flexible composition** — assemble custom views by audience, theme, age group, or curated bundle.
10. **Rendering decoupled from content** — `render.component` lets the UI evolve independently from the data.
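The citable URL in point 1 can be derived from the section ID. This sketch assumes the URL slug equals `sectionType`, as it does for `pride::definition`, where `sectionSlug` and `sectionType` match; both function names are hypothetical.

```javascript
// Splits a section ID of the form {wordId}::{sectionType}.
function parseSectionId(id) {
  const separator = id.indexOf("::");
  // Reject IDs with no separator, an empty word, or an empty section type.
  if (separator <= 0 || separator + 2 >= id.length) {
    throw new Error(`not a section id: ${id}`);
  }
  return { wordId: id.slice(0, separator), sectionType: id.slice(separator + 2) };
}

// Builds the citable path, percent-encoding each path segment.
function sectionPath(id) {
  const { wordId, sectionType } = parseSectionId(id);
  return `/word/${encodeURIComponent(wordId)}/section/${encodeURIComponent(sectionType)}`;
}
```

Encoding each segment keeps the path valid for headwords that contain spaces or non-ASCII characters.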

---

## 14. Recommended Next Steps

1. Provision Elasticsearch cluster (3-node minimum for production).
2. Create both indices using the mappings.
3. Write the **ingest script** that explodes a word's JSON (like [pride.json](pride.json)) into 1 manifest row + N section rows. The script computes `id`, `searchText`, `wordCount`, `sectionCount`, and assigns `order`.
4. Build the frontend section renderer registry with the first 32 renderers.
5. Define the section group registry JSON (Section 8.2) for tab organization.
6. Set up an editorial workflow tool (each section can be drafted, reviewed, and published independently).
7. Begin pilot: index 10 hero words across all sections to validate end-to-end.
8. Define quality scoring rubric per `sectionType`.
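9. Set up a reindex strategy with index aliases (physical index `bd_sections_v1` behind the alias `bd_sections`) for safe future migrations. An alias cannot share a name with an existing index, so this means creating the versioned index rather than a concrete index named `bd_sections` as in Section 11.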
10. Plan multilingual rollout — start with English + Hindi + Hinglish for hero words.
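For step 9, a migration typically copies documents into a new versioned index and then repoints the alias. The sketch below only builds the request bodies for `POST /_reindex` and `POST /_aliases`. The `bd_sections_v2` name and both helper names are hypothetical.

```javascript
// Body for POST /_reindex: copy every document from one index into another.
function reindexBody(fromIndex, toIndex) {
  return { source: { index: fromIndex }, dest: { index: toIndex } };
}

// Body for POST /_aliases. All actions in one request are applied atomically,
// so readers never see the alias pointing at no index or at both indices.
function aliasSwapBody(alias, fromIndex, toIndex) {
  return {
    actions: [
      { remove: { index: fromIndex, alias } },
      { add: { index: toIndex, alias } },
    ],
  };
}

// Example: move the bd_sections alias from v1 to a hypothetical v2 index.
const copy = reindexBody("bd_sections_v1", "bd_sections_v2");
const swap = aliasSwapBody("bd_sections", "bd_sections_v1", "bd_sections_v2");
```

Because every query in this document addresses `bd_sections` by name, the swap needs no client change.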

---

## 15. Continuity With Previous Design

The previous iteration of this system used a flat row pattern:

```
(word, section, sequence, generated_response, word_unique_key, row_unique_key)
```

…where `generated_response` could hold any JSON shape (or any other serialization).

**The new design preserves this pattern in full and adds structure around it.**

### 15.1 Field Migration Map

| Previous field | New field | Notes |
|---|---|---|
| `word` | `wordId` (and `word`) | Groups all rows belonging to a word |
| `section` | `sectionType` | Names the section (e.g. `definition`, `forWriters`) |
| `sequence` | `order` | Display ordering within the word page |
| `generated_response` (any JSON) | `body` (dynamic object) + `raw` (flattened) | Any JSON shape works — no schema migration needed |
| `word_unique_key` | `bd_words._id` (= the word slug) | Primary key per word manifest |
| `row_unique_key` | `id` = `"{wordId}::{sectionType}"` | Primary key per section row |
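The migration map can be sketched as a single conversion function. It assumes the old `word` value is already usable as a slug and that `generated_response` arrives either as a parsed value or as a JSON string. The `migrateRow` name and the `{ value: ... }` wrapper for non-object payloads are assumptions made so that `body` is always a JSON object.

```javascript
// Parses generated_response when it is a JSON string; other values pass through.
function parseResponse(value) {
  if (typeof value !== "string") return value;
  try {
    return JSON.parse(value);
  } catch {
    return value; // not JSON: keep the original serialization as text
  }
}

// Converts one row of the previous flat pattern into the new envelope fields.
function migrateRow(row) {
  const parsed = parseResponse(row.generated_response);
  const isObject = parsed !== null && typeof parsed === "object" && !Array.isArray(parsed);
  // Wrapping non-object payloads under `value` is an assumption of this sketch.
  const body = isObject ? parsed : { value: parsed };
  return {
    id: `${row.word}::${row.section}`,
    wordId: row.word,
    word: row.word,
    sectionType: row.section,
    order: row.sequence,
    body,
    raw: body,
  };
}
```

Fields with no counterpart in the old pattern, such as `status`, `audience`, and `render`, are left for the editorial workflow to fill in.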

### 15.2 "Any JSON Works" Is Preserved

Two fields together accept any JSON shape:

| Field | Type | Behavior |
|---|---|---|
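| `body` | `object`, `dynamic: true` | New keys auto-map on first use and become queryable; each path keeps the type it was first mapped with |
| `raw` | `flattened` | Stores the entire original JSON under one mapped field; leaf values are indexed as keywords, so it supports exact-match queries rather than analyzed full-text search |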

This means `generated_response` from the old system maps cleanly into `body` + `raw`. Any new section type (lesson plan, podcast script, AR config, quiz definition, signed-language video metadata) drops in without changing the mapping.

### 15.3 What's Genuinely New

The new design adds — **on top of** the previous flat pattern — these capabilities:

| Addition | What it gives you |
|---|---|
| `status`, `version`, `language` | Independent publishing lifecycle per section |
| `audience[]`, `themes[]`, `tags[]`, `ageGroup[]` | Cross-cutting groupings for custom views |
| `relations.*` | Link graph between sections, within and across words |
| `render.*` | Display hints — content carries its own UI metadata |
| `blocks[]`, `media[]` | Optional rich content + media handled as first-class data |
| `qualityScore`, `author`, `reviewers[]` | Per-section editorial workflow |
| `searchText` | Single-field full-text search |
| `bd_words` manifest index | Fast lookup, autocomplete, faceted filters at scale |

### 15.4 Bottom Line

You keep the same simple primitives — **word, section, sequence, response JSON, unique keys** — and gain independent versioning, cross-cutting groupings, link graph, render hints, and editorial workflow without giving up the "save any JSON" flexibility you already had.

---

## 16. Files in This Project

| File | Purpose |
|---|---|
| [pride.json](pride.json) | Reference word entry — 32 sections, ~7,256 words |
| [pride-book.html](pride-book.html) | HTML rendering of the *pride* entry |
| [Pride Beyond Dictionary Book Reference.docx](Pride%20Beyond%20Dictionary%20Book%20Reference.docx) | DOCX reference |
| [Pride Beyond Dictionary Book Reference.xml](Pride%20Beyond%20Dictionary%20Book%20Reference.xml) | XML reference |
| [llos_word_reference_example_pride.md](llos_word_reference_example_pride.md) | Markdown spec |
| [pride_detailed_set.txt](pride_detailed_set.txt) | Detailed content set |
| [es-mapping.json](es-mapping.json) | `bd_words` index mapping |
| [es-mapping-sections.json](es-mapping-sections.json) | `bd_sections` index mapping |
| [ARCHITECTURE.md](ARCHITECTURE.md) | This document |
