Skip to content

Graded Quiz: Advanced Multimodal Applications :Build Multimodal Generative AI Applications (BM RAG and Agentic AI Professional Certificate) Answers 2025

Question 1

Which model/platform should TechFood Inc. use for identifying nutritional content from images?

❌ BERT on TensorFlow
Llama 4 Maverick model on IBM watsonx
❌ ResNet50 on PyTorch
❌ GPT-3 on OpenAI

Explanation:
Llama 4 Maverick is a multimodal vision-capable model suitable for analyzing food images and providing nutritional insights.


Question 2

Crucial step to prepare images for a multimodal text-vision model:

❌ Compress images
❌ Enhance resolution
Convert images into base64-encoded strings
❌ Store images in database

Explanation:
Most multimodal APIs accept images as base64 strings for transmission and processing.


Question 3

Technique for retrieving visually similar images:

❌ Histogram comparison
Cosine similarity for feature vectors
❌ Euclidean distance on raw pixels
❌ Fourier transform

Explanation:
Feature embeddings + cosine similarity is the standard approach for image similarity search.


Question 4

Essential component for integrating text + images:

❌ Visual prioritization
Integrated understanding across modalities
❌ Text then image
❌ Separate processing only

Explanation:
Multimodal AI uses fusion layers to combine image + text representations into a unified meaning.


Question 5

Crucial step before building a Flask multimodal interface:

Importing necessary Python libraries
❌ Designing UI
❌ Testing functionality
❌ Configuring server

Explanation:
Environment preparation begins with loading all required libraries so the app can run.


Question 6

Critical step to ensure the model can process images:

❌ Enhance resolution
❌ Store images in a DB
❌ Use grayscale
Convert images into AI-readable formats

Explanation:
Images must be encoded (often as base64, tensors, or bytes) before entering the model pipeline.


Question 7

Why is MM-RAG retrieval based on file names flawed?

❌ File names converted to vectors
Retrieval should be based on semantic similarity between embeddings, not file names
❌ File names must match
❌ File names have hidden metadata

Explanation:
RAG retrieval works on semantic embeddings, while filenames carry no semantic meaning.


🧾 Summary Table

Q No. Correct Answer Key Concept
1 Llama 4 Maverick on watsonx Vision-LLM for nutrition
2 Convert to base64 Image prep for multimodal models
3 Cosine similarity Image similarity search
4 Integrated multimodal understanding Text-image fusion
5 Import libraries Flask environment setup
6 Convert to AI-readable formats Image preprocessing
7 Semantic similarity > filenames MM-RAG limitations