Module C: Semantic Mapping & Linkage

Example User Question for Pihu (Reminder):

"Compare the overall liking of SLHB5 and SLHB8 from the Signature Latte Cold V2 study. Did SLHB8 meet the benchmark, and what were the main aroma notes for it among millennials?"

How Module C helps answer our example question:

Product Identification:
- Pihu needs to map "SLHB5" and "SLHB8" to their actual product_ids within the "Signature Latte Cold V2 study" (let's assume its collaboration_id is COLLAB_SLC_V2_JAN24).
- It also needs to recognize "Signature Latte Cold V2 study" as a user-friendly name for COLLAB_SLC_V2_JAN24.
Attribute-to-Question Mapping:
- To get "overall liking," Pihu needs to know which question_id (QID) in the COLLAB_SLC_V2_JAN24 study corresponds to the canonical attribute ATTR_LIKING_OVERALL_001 (from Module A) and which database column holds the score.
- Similarly, for "main aroma notes," Pihu needs to find the QID of the aroma CATA question in that study and the column that stores the selected options (selected_option_text).
Demographic Filter Mapping:
- To filter for "millennials," Pihu needs to know how the term "millennials" maps to a specific filter condition on the respondent demographics table (e.g., generation column = 'Millennials' or 'Gen Y', or a birth year range).
Historical Linking (if the question involved trends):
- If the question were "How does SLHB8 compare to its previous version?", Module C would link SLHB8 in COLLAB_SLC_V2_JAN24 to its V1 counterpart in an earlier study. Our current example does not ask for this, but it is a key function of this module.
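Historical linking relies on PRODUCT_CONCEPT_HISTORICAL entries (defined in the JSON structure later in this module). The sketch below shows one way Pihu could find the previous version of a product. This document gives no V1 data, so every V1 value here is a labeled placeholder, and the assumption that instances are stored oldest to newest is ours, not part of the template.

```python
from typing import Optional

# Hypothetical PRODUCT_CONCEPT_HISTORICAL entry. Field names follow the
# Module C template; every V1 value below is a placeholder, not real data.
HISTORICAL_MAPPING = {
    "mappingType": "PRODUCT_CONCEPT_HISTORICAL",
    "productConceptId": "CONCEPT_SLC_VARIANT_B_PLACEHOLDER",
    "productConceptName": "Signature Latte Cold - Variant B",
    "instances": [
        {
            "instanceCollaborationId": "COLLAB_SLC_V1_PLACEHOLDER",
            "instanceDatabaseProductId": "V1_PRODUCT_ID_PLACEHOLDER",
            "instanceUserFacingName": "Sample B (V1)",
            "versionNotes": "V1 - Original recipe",
        },
        {
            "instanceCollaborationId": "COLLAB_SLC_V2_JAN24",
            "instanceDatabaseProductId": "1497",
            "instanceUserFacingName": "Sample B8",
            "versionNotes": "V2 - Test variant B",
        },
    ],
}


def previous_instance(mapping: dict, collab_id: str, product_id: str) -> Optional[dict]:
    """Return the instance listed immediately before (collab_id, product_id).

    Assumes `instances` is ordered oldest to newest; the template does not
    state this, so a real store may need an explicit date or version field.
    """
    instances = mapping["instances"]
    for i, inst in enumerate(instances):
        if (inst["instanceCollaborationId"] == collab_id
                and inst["instanceDatabaseProductId"] == product_id):
            return instances[i - 1] if i > 0 else None
    return None  # product is not an instance of this concept


prev = previous_instance(HISTORICAL_MAPPING, "COLLAB_SLC_V2_JAN24", "1497")
print(prev["instanceCollaborationId"])  # COLLAB_SLC_V1_PLACEHOLDER
```

Returning the whole instance (rather than just an ID) gives Pihu both the earlier collaboration ID and product ID, which it needs to run the attribute lookups against the older study.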

Goal: This module acts as Pihu's "GPS" for navigating TagTaste's data. It links:

- User-friendly product names (like "SLHB5") and study names (like "Signature Latte Cold V2 study") to their actual IDs in the database.
- General sensory concepts (like "overall liking" or "aroma notes" from Module A) to the specific questions (QIDs) and response options used to measure them in each particular study.
- User-friendly demographic terms (like "millennials") to the way this information is stored and can be filtered in the database.

This is crucial because product codes, question IDs, and even the exact list of aroma notes can change from one study to another. Module C keeps track of these specifics.

1) Text-Based Structure with Actual Examples (Relevant to the User Question)

A. Product Mapping (Per Collaboration) (Tells Pihu what "SLHB5", "SLHB8", and "Signature Latte Cold V2 study" actually mean in database terms)

Mapping Type: PER_COLLABORATION_ALIAS
Collaboration ID (Database ID for the study): COLLAB_SLC_V2_JAN24
User-Facing Collaboration Name(s) (How users might refer to the study):
- Signature Latte Cold V2 study
- SigLatte V2 Jan 2024
Status: Active
Products Tested in this Collaboration:
- Product 1:
  - Database Product ID (used in raw data for this study): 1497
  - User-Facing Name: Sample B8
  - Internal Code / Alias 1: SLHB8
  - Description: Test variant B of Signature Latte Cold, Version 2.
- Product 2:
  - Database Product ID (used in raw data for this study): 1498
  - User-Facing Name: Sample B5
  - Internal Code / Alias 1: SLHB5
  - Description: Test variant A of Signature Latte Cold, Version 2.
Notes for Pihu: For queries mentioning "Signature Latte Cold V2 study" (or its aliases), use collaboration ID COLLAB_SLC_V2_JAN24. Within this study, SLHB5 is product 1498, and SLHB8 is product 1497.
Created By: project_manager_jane
Created At: 2024-01-15T09:00:00Z
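The alias lookup described in the notes above can be sketched as follows. This is a minimal illustration, not Pihu's actual resolver: the case-insensitive, whitespace-collapsing match rule and the function names are assumptions. Product lookup is deliberately scoped to one collaboration, because codes such as SLHB8 map to study-specific database IDs.

```python
from typing import Optional

# Trimmed copy of the PER_COLLABORATION_ALIAS entry for COLLAB_SLC_V2_JAN24.
PRODUCT_MAPPINGS = [
    {
        "collaborationId": "COLLAB_SLC_V2_JAN24",
        "userFacingCollaborationNames": ["Signature Latte Cold V2 study", "SigLatte V2 Jan 2024"],
        "productsInCollaboration": [
            {"databaseProductIdForCollab": "1497", "userFacingName": "Sample B8",
             "internalCode": "SLHB8", "aliases": ["Signature Latte Cold - Variant B"]},
            {"databaseProductIdForCollab": "1498", "userFacingName": "Sample B5",
             "internalCode": "SLHB5", "aliases": ["Signature Latte Cold - Variant A"]},
        ],
    }
]


def _norm(text: str) -> str:
    # Assumed matching rule: ignore case and collapse whitespace.
    return " ".join(text.split()).casefold()


def resolve_collaboration(mappings: list, name: str) -> Optional[dict]:
    """Return the mapping whose ID or user-facing names match `name`."""
    target = _norm(name)
    for m in mappings:
        names = [m["collaborationId"], *m["userFacingCollaborationNames"]]
        if any(_norm(n) == target for n in names):
            return m
    return None


def resolve_product(mapping: dict, name: str) -> Optional[str]:
    """Return the database product ID for `name`, scoped to one collaboration."""
    target = _norm(name)
    for p in mapping["productsInCollaboration"]:
        # internalCode is nullable in the template, so guard against None.
        names = [p["userFacingName"], p.get("internalCode") or "", *(p.get("aliases") or [])]
        if any(n and _norm(n) == target for n in names):
            return p["databaseProductIdForCollab"]
    return None


study = resolve_collaboration(PRODUCT_MAPPINGS, "signature latte cold v2 study")
print(study["collaborationId"])         # COLLAB_SLC_V2_JAN24
print(resolve_product(study, "SLHB8"))  # 1497
print(resolve_product(study, "slhb5"))  # 1498
```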

B. Attribute-to-Question Mapping (Per Attribute, Per Collaboration) (Tells Pihu how "overall liking" and "aroma notes" were measured in the "Signature Latte Cold V2 study")

Mapping for: Overall Product Liking

Mapping ID: AQM_LIKING_OVERALL_SLCV2
Canonical Attribute ID (from Module A): ATTR_LIKING_OVERALL_001 (Overall Product Liking)
Status: Active
Details for this specific study:
- Collaboration ID: COLLAB_SLC_V2_JAN24
- Database Question ID (QID in raw data): 43449
- Question Text (as asked in this study): Overall, how much do you LIKE or DISLIKE this product?
- Section Name (in this study's questionnaire): PRODUCT EXPERIENCE
- Scale Type Used: 9-point Hedonic
- Response Value Column (in database where score is stored): option_value_numeric
- Response Options (Scale Anchors for this QID, if different from Module A standard, or for clarity):
  - 1: Dislike Extremely
  - 5: Neither Like nor Dislike
  - 9: Like Extremely
  (Note: If these anchors are identical to the standard anchors for ATTR_LIKING_OVERALL_001 in Module A, this sub-field is optional, but listing them explicitly is usually clearer.)
- Is this the Primary Way "Overall Liking" was measured in this study? Yes
Created By: data_manager_doe
Created At: 2024-01-16T10:00:00Z
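To show how this mapping is used, the sketch below finds the primary question for ATTR_LIKING_OVERALL_001 in a given study and averages the mapped score column for one product. The raw-data row keys (collaboration_id, question_id, product_id) are assumptions about the response table, and the rows are toy values. Checking SLHB8 against its benchmark would also need the benchmark value from Module A, which this section does not define, so the sketch stops at the mean scores.

```python
from statistics import mean
from typing import Optional

# Trimmed copy of AQM_LIKING_OVERALL_SLCV2.
LIKING_AQM = {
    "canonicalAttributeId": "ATTR_LIKING_OVERALL_001",
    "mappingsPerCollaborationInstance": [
        {
            "collaborationId": "COLLAB_SLC_V2_JAN24",
            "databaseQuestionId": "43449",
            "responseValueColumn": "option_value_numeric",
            "isPrimaryMeasure": True,
        }
    ],
}


def primary_measure(aqm: dict, collab_id: str) -> Optional[dict]:
    """Find the primary question used for this attribute in one study."""
    for inst in aqm["mappingsPerCollaborationInstance"]:
        if inst["collaborationId"] == collab_id and inst["isPrimaryMeasure"]:
            return inst
    return None


def mean_score(rows: list, aqm: dict, collab_id: str, product_id: str) -> Optional[float]:
    """Average the mapped score column for one product (row keys are assumed)."""
    inst = primary_measure(aqm, collab_id)
    if inst is None:
        raise LookupError(f"{aqm['canonicalAttributeId']} has no primary measure in {collab_id}")
    column = inst["responseValueColumn"]
    scores = [
        r[column]
        for r in rows
        if r["collaboration_id"] == collab_id
        and r["question_id"] == inst["databaseQuestionId"]
        and r["product_id"] == product_id
        and r.get(column) is not None
    ]
    return mean(scores) if scores else None


# Toy rows for illustration only; these are not study results.
rows = [
    {"collaboration_id": "COLLAB_SLC_V2_JAN24", "question_id": "43449", "product_id": "1497", "option_value_numeric": 7},
    {"collaboration_id": "COLLAB_SLC_V2_JAN24", "question_id": "43449", "product_id": "1497", "option_value_numeric": 8},
    {"collaboration_id": "COLLAB_SLC_V2_JAN24", "question_id": "43449", "product_id": "1498", "option_value_numeric": 6},
]
print(mean_score(rows, LIKING_AQM, "COLLAB_SLC_V2_JAN24", "1497"))  # 7.5
```

Reading the QID and column from the mapping, rather than hard-coding 43449 and option_value_numeric, is what lets the same function work for other studies once their mappings are added.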

Mapping for: Aroma Notes (CATA)

Mapping ID: AQM_AROMA_NOTES_CATA_SLCV2
Canonical Attribute ID (from Module A): ATTR_AROMA_NOTE_CATA_001 (Aroma Note CATA)
Status: Active
Details for this specific study:
- Collaboration ID: COLLAB_SLC_V2_JAN24
- Database Question ID (QID in raw data): 43417
- Question Text (as asked in this study): Which of these sensations or feelings are more pronounced than the others in this Coffee? (Assumption: this is the question that presented the aroma CATA list.)
- Section Name (in this study's questionnaire): AROMA
- Scale Type Used: CATA (Check-All-That-Apply) / Multiple Choice
- Response Value Column (where selected aroma terms are stored): selected_option_text (verify against this study's raw data; the column may instead be named option_value_text)
- Response Options (The actual list of aroma notes presented to panelists for QID 43417 in this study):
  - Coffee-Roast
  - Dairy
  - Sugary
  - Cocoa
  - Toasted
  - Nutty
  - Ash/Smokey
  - Over Roasted Coffee
  - Burnt
  - Smokey
- Is this the Primary Way "Aroma Notes" were captured in this study? Yes
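Notes for Pihu: To find the "main aroma notes" for SLHB8 in COLLAB_SLC_V2_JAN24, take the responses to QID 43417 (AROMA section) for product 1497, apply any demographic filter (e.g., millennials), and count how often each listed Response Option was selected. The most frequently selected options are the main aroma notes.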
Created By: data_manager_doe
Created At: 2024-01-16T10:05:00Z
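The counting step described in the notes above might look like this. It is a sketch that assumes long-format response rows (one row per selected option) with the row keys shown; the function name and toy rows are illustrative. Values that are not in the mapped option list are returned separately, so a stale or incomplete responseOptions list shows up instead of being silently counted.

```python
from collections import Counter

# Response options for QID 43417 in COLLAB_SLC_V2_JAN24, as listed in the mapping.
AROMA_OPTIONS = ["Coffee-Roast", "Dairy", "Sugary", "Cocoa", "Toasted",
                 "Nutty", "Ash/Smokey", "Over Roasted Coffee", "Burnt", "Smokey"]


def cata_counts(rows, qid, product_id, options, column="selected_option_text"):
    """Count selections per listed option for one product and one CATA question.

    Assumes one row per selected option. Values not in `options` are counted
    separately so mapping/data mismatches are visible.
    """
    allowed = set(options)
    counts, unexpected = Counter(), Counter()
    for r in rows:
        if r["question_id"] != qid or r["product_id"] != product_id:
            continue
        value = r[column]
        if value in allowed:
            counts[value] += 1
        else:
            unexpected[value] += 1
    return counts, unexpected


# Toy rows for illustration only; these are not study results.
rows = [
    {"question_id": "43417", "product_id": "1497", "selected_option_text": "Coffee-Roast"},
    {"question_id": "43417", "product_id": "1497", "selected_option_text": "Dairy"},
    {"question_id": "43417", "product_id": "1497", "selected_option_text": "Coffee-Roast"},
    {"question_id": "43417", "product_id": "1498", "selected_option_text": "Burnt"},
    {"question_id": "43417", "product_id": "1497", "selected_option_text": "Caramel"},
]
counts, unexpected = cata_counts(rows, "43417", "1497", AROMA_OPTIONS)
print(counts.most_common(2))  # [('Coffee-Roast', 2), ('Dairy', 1)]
print(dict(unexpected))       # {'Caramel': 1}
```

Raw counts are enough to rank notes within one product; comparing products with different panel sizes would usually call for dividing by the number of respondents who evaluated each product.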

C. Demographic Segment Mapping (Tells Pihu how to filter data for "millennials")

Segment Map ID: DSM_GENERATION_MILLENNIALS_001
User Term (what the user says): Millennials
Status: Active
Description: Defines the Millennial / Gen Y cohort.
Database Filter Logic (how to find them in the data):
- Table to Filter: Respondents
- Column in Table: generation
- Operator: EQUALS
- Value in Database: Gen Y (This must match exactly how it's stored, e.g., 'Gen Y', 'Millennial', or a birth year range)
- Value Data Type: STRING
Notes for Pihu: When a user query includes "millennials", apply a filter where the 'generation' column in the Respondents table is 'Gen Y'.
Created By: analyst_smith
Created At: 2023-05-10T11:00:00Z
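The filter logic above can be turned into a SQL WHERE fragment. The sketch below is one way to do it, under stated assumptions: values become "?" placeholders (the qmark style used by Python's sqlite3 module), column names are checked against a whitelist because they cannot be parameterized, and the IN, BETWEEN, NOT_EQUALS and LESS_THAN operator names are hypothetical extensions (only EQUALS and GREATER_THAN appear in this module).

```python
import sqlite3

# Module C operator names mapped to SQL. Only EQUALS and GREATER_THAN appear
# in the template; the others are assumed extensions.
SQL_OPS = {"EQUALS": "=", "NOT_EQUALS": "<>", "GREATER_THAN": ">", "LESS_THAN": "<"}


def build_where(filter_logic: dict, allowed_columns: set) -> tuple:
    """Turn a databaseFilterLogic object into (sql_fragment, params)."""
    joiner = filter_logic["condition"]
    if joiner not in ("AND", "OR"):
        raise ValueError(f"Unsupported condition: {joiner}")
    if not filter_logic["rules"]:
        raise ValueError("A filter needs at least one rule")
    clauses, params = [], []
    for rule in filter_logic["rules"]:
        column, op, value = rule["respondentDataTableColumn"], rule["operator"], rule["value"]
        if column not in allowed_columns:
            raise ValueError(f"Column not allowed: {column}")
        if op == "IN":  # assumed operator for array values
            clauses.append(f"{column} IN ({', '.join('?' for _ in value)})")
            params.extend(value)
        elif op == "BETWEEN":  # assumed operator, e.g. for a birth-year range
            clauses.append(f"{column} BETWEEN ? AND ?")
            params.extend(value)
        elif op in SQL_OPS:
            clauses.append(f"{column} {SQL_OPS[op]} ?")
            params.append(value)
        else:
            raise ValueError(f"Unsupported operator: {op}")
    # Parentheses keep OR-joined rules intact when embedded in a larger query.
    return "(" + f" {joiner} ".join(clauses) + ")", params


millennials = {
    "condition": "AND",
    "rules": [{"respondentDataTableColumn": "generation", "operator": "EQUALS",
               "value": "Gen Y", "valueDataType": "STRING"}],
}
where, params = build_where(millennials, {"generation", "birth_year"})
print(where, params)  # (generation = ?) ['Gen Y']

# Usage against a toy in-memory Respondents table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Respondents (respondent_id INTEGER, generation TEXT)")
conn.executemany("INSERT INTO Respondents VALUES (?, ?)", [(1, "Gen Y"), (2, "Gen Z"), (3, "Gen Y")])
query = f"SELECT respondent_id FROM Respondents WHERE {where} ORDER BY respondent_id"
ids = [row[0] for row in conn.execute(query, params)]
print(ids)  # [1, 3]
```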

2) Actual Data Example in JSON (for Module C - focusing on relevant mappings)

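This JSON file would store all of these mappings. The // comments below are annotations for readers: standard JSON does not allow comments, so they must be stripped (or the file kept in a comment-tolerant format such as JSONC) before a strict JSON parser can load it.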

{
  "moduleName": "Semantic Mapping & Linkage",
  "version": "1.2",
  "lastUpdated": "2024-01-22T14:30:00Z",
  "mappings": {
    "productMappings": [
      {
        "mappingType": "PER_COLLABORATION_ALIAS",
        "collaborationId": "COLLAB_SLC_V2_JAN24",
        "userFacingCollaborationNames": ["Signature Latte Cold V2 study", "SigLatte V2 Jan 2024"],
        "productsInCollaboration": [
          {
            "databaseProductIdForCollab": "1497",
            "userFacingName": "Sample B8",
            "internalCode": "SLHB8",
            "aliases": ["Signature Latte Cold - Variant B"],
            "description": "Test variant B of Signature Latte Cold, V2."
          },
          {
            "databaseProductIdForCollab": "1498",
            "userFacingName": "Sample B5",
            "internalCode": "SLHB5",
            "aliases": ["Signature Latte Cold - Variant A"],
            "description": "Test variant A of Signature Latte Cold, V2."
          }
        ],
        "status": "Active",
        "notesForPihu": "Maps user-friendly names to DB IDs.",
        "createdBy": "project_manager_jane",
        "createdAt": "2024-01-15T09:00:00Z",
        "lastModifiedBy": "project_manager_jane",
        "lastModifiedAt": "2024-01-15T09:00:00Z"
      }
      // Add other product mappings for other studies, or PRODUCT_CONCEPT_HISTORICAL mappings as needed.
    ],
    "attributeQuestionMappings": [
      {
        "mappingId": "AQM_LIKING_OVERALL_SLCV2",
        "canonicalAttributeId": "ATTR_LIKING_OVERALL_001",
        "status": "Active",
        "description": "Mapping for 'Overall Product Liking' in SLC V2 study.",
        "mappingsPerCollaborationInstance": [
          {
            "collaborationId": "COLLAB_SLC_V2_JAN24",
            "databaseQuestionId": "43449",
            "questionTextAsInCollab": "Overall, how much do you LIKE or DISLIKE this product?",
            "sectionNameAsInCollab": "PRODUCT EXPERIENCE",
            "scaleTypeUsed": "9-point Hedonic",
            "responseValueColumn": "option_value_numeric",
            "responseOptions": [ // Explicitly listing anchors for clarity, though could be optional if standard
                {"value": 1, "label": "Dislike Extremely"},
                {"value": 5, "label": "Neither Like nor Dislike"},
                {"value": 9, "label": "Like Extremely"}
            ],
            "isPrimaryMeasure": true
          }
          // Add mappings for how ATTR_LIKING_OVERALL_001 was measured in OTHER studies here
        ],
        "notesForPihu": "Use QID 43449 and option_value_numeric for overall liking in COLLAB_SLC_V2_JAN24.",
        "createdBy": "data_manager_doe",
        "createdAt": "2024-01-16T10:00:00Z",
        "lastModifiedBy": "data_manager_doe",
        "lastModifiedAt": "2024-01-16T10:00:00Z"
      },
      {
        "mappingId": "AQM_AROMA_NOTES_CATA_SLCV2",
        "canonicalAttributeId": "ATTR_AROMA_NOTE_CATA_001",
        "status": "Active",
        "description": "Mapping for Aroma Notes (CATA) in SLC V2 study.",
        "mappingsPerCollaborationInstance": [
          {
            "collaborationId": "COLLAB_SLC_V2_JAN24",
            "databaseQuestionId": "43417",
            "questionTextAsInCollab": "Which of these sensations or feelings are more pronounced than the others in this Coffee?",
            "sectionNameAsInCollab": "AROMA",
            "scaleTypeUsed": "CATA / Multiple Choice",
            "responseValueColumn": "selected_option_text",
            "responseOptions": [
                "Coffee-Roast", "Dairy", "Sugary", "Cocoa", "Toasted",
                "Nutty", "Ash/Smokey", "Over Roasted Coffee", "Burnt", "Smokey"
                // This list MUST match the actual options for QID 43417 in this study
            ],
            "isPrimaryMeasure": true
          }
          // Add mappings for how ATTR_AROMA_NOTE_CATA_001 was measured in OTHER studies here
        ],
        "notesForPihu": "For main aroma notes in COLLAB_SLC_V2_JAN24, count frequencies of these responseOptions for QID 43417.",
        "createdBy": "data_manager_doe",
        "createdAt": "2024-01-16T10:05:00Z",
        "lastModifiedBy": "data_manager_doe",
        "lastModifiedAt": "2024-01-22T14:30:00Z" // Updated timestamp
      }
    ],
    "demographicSegmentMappings": [
      {
        "segmentMapId": "DSM_GENERATION_MILLENNIALS_001",
        "userTerm": "Millennials",
        "status": "Active",
        "description": "Defines Millennials (Gen Y).",
        "databaseFilterLogic": {
          "condition": "AND",
          "rules": [
            {
              "respondentDataTableColumn": "generation",
              "operator": "EQUALS",
              "value": "Gen Y",
              "valueDataType": "STRING"
            }
          ]
        },
        "notesForPihu": "Filter for Millennials/Gen Y respondents.",
        "createdBy": "analyst_smith",
        "createdAt": "2023-05-10T11:00:00Z",
        "lastModifiedBy": "analyst_smith",
        "lastModifiedAt": "2023-05-10T11:00:00Z"
      }
      // Add other demographic mappings (Gen Z, Experts, Delhi NCR users, etc.)
    ]
  }
}
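Because standard JSON has no comment syntax, Python's json.loads rejects a file like the one above as written. The sketch below is a small loader that removes // comments outside string literals before parsing; the function names are hypothetical. It handles only the // style used in this document and does not handle /* */ blocks or trailing commas, so a dedicated JSONC/JSON5 parser may be preferable in practice.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that occur outside JSON string literals."""
    out, i, in_string, escaped = [], 0, False, False
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = len(text) if end == -1 else end  # keep the newline itself
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_module_c(text: str) -> dict:
    """Parse a Module C mapping file that may contain // comments."""
    return json.loads(strip_line_comments(text))


sample = '''{
  "notesForPihu": "See https://example.com // not a comment",
  "responseOptions": [ // anchors listed for clarity
    "Coffee-Roast", "Dairy" // must match QID 43417
  ]
}'''
data = load_module_c(sample)
print(data["notesForPihu"])     # See https://example.com // not a comment
print(data["responseOptions"])  # ['Coffee-Roast', 'Dairy']
```

Tracking whether the scanner is inside a string is what keeps URLs and other "//" text in values intact; a plain regular expression over each line would cut them off.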

3) JSON Structure (for Module C - Simplified Comments)

{
  "moduleName": "Semantic Mapping & Linkage",
  "version": "1.0",
  "lastUpdated": "YYYY-MM-DDTHH:MM:SSZ",
  "mappings": {
    "productMappings": [
      {
        "mappingType": "ENUM_STRING", // "PER_COLLABORATION_ALIAS" or "PRODUCT_CONCEPT_HISTORICAL"
        // --- For PER_COLLABORATION_ALIAS ---
        "collaborationId": "NULLABLE_STRING_COLLAB_ID", // DB ID of the study
        "userFacingCollaborationNames": ["STRING"],     // How users might name the study
        "productsInCollaboration": [
          {
            "databaseProductIdForCollab": "STRING_PRODUCT_ID_IN_DB", // Product's ID in this specific study's data
            "userFacingName": "STRING",                             // Name used in this study
            "internalCode": "NULLABLE_STRING",                      // e.g., SLHB5, SLHB8
            "aliases": ["STRING_ALIAS_1"],                          // Other names for this product
            "description": "NULLABLE_STRING"                        // Short description of this product variant
          }
        ],
        // --- For PRODUCT_CONCEPT_HISTORICAL ---
        "productConceptId": "NULLABLE_STRING_UNIQUE_ID_CONCEPT", // Links the same product idea across different studies/versions
        "productConceptName": "NULLABLE_STRING",
        "instances": [ // Each time this product concept was tested
          {
            "instanceCollaborationId": "STRING_COLLAB_ID",
            "instanceDatabaseProductId": "STRING_PRODUCT_ID_IN_DB",
            "instanceUserFacingName": "STRING",
            "versionNotes": "TEXT_AREA_STRING" // e.g., "V1 - Original recipe"
          }
        ],
        // --- Common fields ---
        "status": "ENUM_STRING", // "Active", "Archived"
        "notesForPihu": "TEXT_AREA_STRING",
        "createdBy": "STRING_USER_ID",
        "createdAt": "YYYY-MM-DDTHH:MM:SSZ",
        "lastModifiedBy": "STRING_USER_ID",
        "lastModifiedAt": "YYYY-MM-DDTHH:MM:SSZ"
      }
    ],
    "attributeQuestionMappings": [ // Links general attributes (from Module A) to specific questions in studies
      {
        "mappingId": "STRING_UNIQUE_ID_AQM",
        "canonicalAttributeId": "STRING_UNIQUE_ID_ATTR", // The general attribute ID from Module A
        "status": "ENUM_STRING",
        "description": "TEXT_AREA_STRING",
        "mappingsPerCollaborationInstance": [ // How this attribute was measured in each specific study
          {
            "collaborationId": "STRING_COLLAB_ID",            // The study ID
            "databaseQuestionId": "STRING_QID_IN_DB",         // The Question ID from the raw data for this study
            "questionTextAsInCollab": "TEXT_AREA_STRING",     // The exact question wording in this study
            "sectionNameAsInCollab": "NULLABLE_STRING",       // e.g., "AROMA", "TASTE"
            "scaleTypeUsed": "STRING",                        // e.g., "9-point Hedonic", "CATA"
            "responseValueColumn": "STRING_DB_COLUMN_NAME",   // DB column where the answer/score is stored
            "responseOptions": [                              // List of the actual choices/anchors for this QID in this study
                // For CATA/MCQ: "Option 1", "Option 2"
                // For Scales (if specific): { "value": 1, "label": "Anchor 1" }
            ],
            "isPrimaryMeasure": "BOOLEAN"                     // Was this the main way this attribute was measured?
          }
        ],
        "notesForPihu": "TEXT_AREA_STRING",
        "createdBy": "STRING_USER_ID",
        "createdAt": "YYYY-MM-DDTHH:MM:SSZ",
        "lastModifiedBy": "STRING_USER_ID",
        "lastModifiedAt": "YYYY-MM-DDTHH:MM:SSZ"
      }
    ],
    "demographicSegmentMappings": [ // Links user terms for groups (e.g., "Millennials") to DB filters
      {
        "segmentMapId": "STRING_UNIQUE_ID_DSM",
        "userTerm": "STRING", // What the user says, e.g., "Millennials"
        "status": "ENUM_STRING",
        "description": "TEXT_AREA_STRING",
        "databaseFilterLogic": { // How to filter the respondent data
          "condition": "ENUM_STRING_LOGICAL", // "AND", "OR"
          "rules": [
            {
              "respondentDataTableColumn": "STRING_DB_COLUMN_NAME", // Column in respondent demographics table
              "operator": "ENUM_STRING_OPERATOR",                   // e.g., "EQUALS", "GREATER_THAN"
              "value": "ANY_PRIMITIVE_OR_ARRAY",                    // Value to filter by
              "valueDataType": "ENUM_STRING_DATATYPE"               // "STRING", "NUMBER"
            }
          ]
        },
        "notesForPihu": "TEXT_AREA_STRING",
        "createdBy": "STRING_USER_ID",
        "createdAt": "YYYY-MM-DDTHH:MM:SSZ",
        "lastModifiedBy": "STRING_USER_ID",
        "lastModifiedAt": "YYYY-MM-DDTHH:MM:SSZ"
      }
    ]
  }
}
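Because one productMappings entry can take two shapes depending on mappingType, a light validation pass can catch half-filled entries before Pihu relies on them. The sketch below encodes an assumed reading of the structure above (which fields each type needs, which common fields are mandatory) plus one extra rule that seems worth enforcing: within a single study, a name must not point at two different product IDs. The rules, function name and placeholder values are illustrative, not a defined Module C schema.

```python
REQUIRED_BY_TYPE = {
    "PER_COLLABORATION_ALIAS": ("collaborationId", "productsInCollaboration"),
    "PRODUCT_CONCEPT_HISTORICAL": ("productConceptId", "instances"),
}
COMMON_REQUIRED = ("status", "createdBy", "createdAt")
STATUSES = ("Active", "Archived")


def validate_product_mapping(entry: dict) -> list:
    """Return a list of problems found; an empty list means these checks passed."""
    mapping_type = entry.get("mappingType")
    if mapping_type not in REQUIRED_BY_TYPE:
        return [f"unknown mappingType: {mapping_type!r}"]
    problems = [
        f"{mapping_type} entry is missing {field}"
        for field in REQUIRED_BY_TYPE[mapping_type] + COMMON_REQUIRED
        if entry.get(field) in (None, "", [])
    ]
    status = entry.get("status")
    if status and status not in STATUSES:
        problems.append(f"unexpected status: {status!r}")
    if mapping_type == "PER_COLLABORATION_ALIAS":
        # A name must not point at two different products within one study.
        seen = {}
        for p in entry.get("productsInCollaboration") or []:
            pid = p.get("databaseProductIdForCollab")
            for name in [p.get("userFacingName"), p.get("internalCode"), *(p.get("aliases") or [])]:
                if not name:
                    continue
                key = name.casefold()
                if key in seen and seen[key] != pid:
                    problems.append(f"name {name!r} maps to both {seen[key]} and {pid}")
                seen[key] = pid
    return problems


good = {
    "mappingType": "PER_COLLABORATION_ALIAS", "collaborationId": "COLLAB_SLC_V2_JAN24",
    "productsInCollaboration": [
        {"databaseProductIdForCollab": "1497", "userFacingName": "Sample B8", "internalCode": "SLHB8"},
        {"databaseProductIdForCollab": "1498", "userFacingName": "Sample B5", "internalCode": "SLHB5"},
    ],
    "status": "Active", "createdBy": "project_manager_jane", "createdAt": "2024-01-15T09:00:00Z",
}
print(validate_product_mapping(good))  # []

# Placeholder entry with a missing field and an unknown status.
bad = {"mappingType": "PRODUCT_CONCEPT_HISTORICAL", "productConceptId": "CONCEPT_PLACEHOLDER",
       "status": "Draft", "createdBy": "analyst_placeholder", "createdAt": "2024-01-01T00:00:00Z"}
print(validate_product_mapping(bad))
```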

Next Steps for Your Team (for Module C):

Project/Data Management Team (Crucial Task):
- For every new collaboration/study that Pihu needs to access:
  - Create/update a PER_COLLABORATION_ALIAS entry. This involves listing all products tested, their database IDs for that specific study, and all known user-facing names or codes (like SLHB5, Sample B8).
  - For every canonical attribute (from Module A) measured in that study, create/update an attributeQuestionMappings entry. This means finding the exact databaseQuestionId (QID), the sectionNameAsInCollab, and the responseValueColumn used in that study's raw data to measure that attribute. This is the most labor-intensive step, and Pihu cannot answer attribute questions correctly without it.
- If historical tracking is needed, populate PRODUCT_CONCEPT_HISTORICAL entries.
Sensory/Analytical Team:
- Define the demographicSegmentMappings. How do terms like "Millennials," "Gen Z," "Experts," etc., translate into filters on your respondent database?
Technical Team:
- Ensure the Pihu "Data Linking Agent" can efficiently query this module to:
  - Resolve product names/aliases and collaboration names to their respective database IDs.
  - Find the correct QIDs and data columns for requested attributes within a specific collaboration.
  - Construct SQL WHERE clauses based on demographic segment definitions.
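As a rough picture of how the Data Linking Agent might combine the three lookups, the sketch below assembles one parameterized query for the example question's aroma part (main aroma notes for SLHB8 among millennials). The Responses table, its columns and the join key are hypothetical, since this module does not describe the raw-data schema; the resolved values (COLLAB_SLC_V2_JAN24, product 1497, QID 43417, selected_option_text, generation = 'Gen Y') come from the mappings above. The segment fragment is assumed to have been built safely from a databaseFilterLogic entry, with its values passed as parameters.

```python
import sqlite3

# Hypothetical raw-data schema: a Responses table keyed by collaboration,
# product, question and respondent, joined to the Respondents table.
ALLOWED_VALUE_COLUMNS = {"selected_option_text", "option_value_numeric"}


def cata_query(collab_id, product_id, qid, value_column, segment_sql, segment_params):
    """Assemble a parameterized frequency query from values resolved by Module C."""
    if value_column not in ALLOWED_VALUE_COLUMNS:
        raise ValueError(f"Unexpected response column: {value_column}")
    sql = (
        f"SELECT r.{value_column} AS note, COUNT(*) AS n "
        "FROM Responses AS r JOIN Respondents AS p ON p.respondent_id = r.respondent_id "
        "WHERE r.collaboration_id = ? AND r.product_id = ? AND r.question_id = ? "
        f"AND {segment_sql} "
        f"GROUP BY r.{value_column} ORDER BY n DESC, note"
    )
    return sql, [collab_id, product_id, qid, *segment_params]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Respondents (respondent_id INTEGER, generation TEXT)")
    conn.execute("CREATE TABLE Responses (collaboration_id TEXT, product_id TEXT, "
                 "question_id TEXT, respondent_id INTEGER, selected_option_text TEXT)")
    conn.executemany("INSERT INTO Respondents VALUES (?, ?)",
                     [(1, "Gen Y"), (2, "Gen Y"), (3, "Gen Z")])
    c = "COLLAB_SLC_V2_JAN24"
    # Toy rows for illustration only; these are not study results.
    conn.executemany("INSERT INTO Responses VALUES (?, ?, ?, ?, ?)", [
        (c, "1497", "43417", 1, "Coffee-Roast"),
        (c, "1497", "43417", 2, "Coffee-Roast"),
        (c, "1497", "43417", 2, "Dairy"),
        (c, "1497", "43417", 3, "Burnt"),
        (c, "1498", "43417", 1, "Nutty"),
    ])
    # Values the product, attribute and segment mappings resolve to.
    sql, params = cata_query(c, "1497", "43417", "selected_option_text",
                             "(p.generation = ?)", ["Gen Y"])
    return conn.execute(sql, params).fetchall()


print(demo())  # [('Coffee-Roast', 2), ('Dairy', 1)]
```

Keeping every data value as a query parameter and whitelisting the one interpolated column name means a bad mapping entry fails loudly instead of producing malformed or unsafe SQL.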