Graded Quiz: Advanced Multimodal Applications :Build Multimodal Generative AI Applications (BM RAG and Agentic AI Professional Certificate) Answers 2025
Question 1
Which model/platform should TechFood Inc. use for identifying nutritional content from images?
❌ BERT on TensorFlow
✅ Llama 4 Maverick model on IBM watsonx
❌ ResNet50 on PyTorch
❌ GPT-3 on OpenAI
Explanation:
Llama 4 Maverick is a multimodal vision-capable model suitable for analyzing food images and providing nutritional insights.
Question 2
Crucial step to prepare images for a multimodal text-vision model:
❌ Compress images
❌ Enhance resolution
✅ Convert images into base64-encoded strings
❌ Store images in database
Explanation:
Most multimodal APIs accept images as base64 strings for transmission and processing.
Question 3
Technique for retrieving visually similar images:
❌ Histogram comparison
✅ Cosine similarity for feature vectors
❌ Euclidean distance on raw pixels
❌ Fourier transform
Explanation:
Feature embeddings + cosine similarity is the standard approach for image similarity search.
Question 4
Essential component for integrating text + images:
❌ Visual prioritization
✅ Integrated understanding across modalities
❌ Text then image
❌ Separate processing only
Explanation:
Multimodal AI uses fusion layers to combine image + text representations into a unified meaning.
Question 5
Crucial step before building a Flask multimodal interface:
✅ Importing necessary Python libraries
❌ Designing UI
❌ Testing functionality
❌ Configuring server
Explanation:
Environment preparation begins with loading all required libraries so the app can run.
Question 6
Critical step to ensure the model can process images:
❌ Enhance resolution
❌ Store images in a DB
❌ Use grayscale
✅ Convert images into AI-readable formats
Explanation:
Images must be encoded (often as base64, tensors, or bytes) before entering the model pipeline.
Question 7
Why is MM-RAG retrieval based on file names flawed?
❌ File names converted to vectors
✅ Retrieval should be based on semantic similarity between embeddings, not file names
❌ File names must match
❌ File names have hidden metadata
Explanation:
RAG retrieval works on semantic embeddings, while filenames carry no semantic meaning.
🧾 Summary Table
| Q No. | Correct Answer | Key Concept |
|---|---|---|
| 1 | Llama 4 Maverick on watsonx | Vision-LLM for nutrition |
| 2 | Convert to base64 | Image prep for multimodal models |
| 3 | Cosine similarity | Image similarity search |
| 4 | Integrated multimodal understanding | Text-image fusion |
| 5 | Import libraries | Flask environment setup |
| 6 | Convert to AI-readable formats | Image preprocessing |
| 7 | Semantic similarity > filenames | MM-RAG limitations |