Introduction to Grafana Loki


Overview

Welcome to **Lab 1** in the AIOps in Practice: Logging & Alerting at Scale course! This introductory lab gives you the foundational knowledge you need to understand Grafana Loki, a modern log aggregation system that is changing how organizations handle logging at scale.

In this lab, you'll learn about Loki's architecture, understand why it can be roughly 10x more cost-effective than traditional solutions like Elasticsearch, and discover how it integrates with the Grafana observability ecosystem.

By the end of this lab, you'll understand the design decisions that let teams identify root causes quickly and build powerful log-based dashboards.

What You'll Learn

πŸ“š Course Foundations

  • Overview of the course structure
  • Understanding the scope and learning objectives
  • How Loki fits into modern observability stacks
  • The journey from basic logging to intelligent alerting

🎯 Loki Fundamentals

  • What Loki is and the problems it solves
  • Why Loki is more cost-effective than traditional logging solutions
  • How label-based indexing works
  • When to use Loki vs. other logging systems

πŸ—οΈ Architecture Deep Dive

  • Log collection agents: Promtail, Alloy, Fluentd, Logstash
  • How Loki processes and labels logs
  • Storage backends: filesystem vs. object storage
  • Querying with LogQL
  • Integration with Grafana

πŸš€ Installation Options

  • Binary installation for local development
  • Docker deployment for testing environments
  • Kubernetes/Helm deployment for production
  • Configuration best practices

The Lab Journey

This lab follows a structured learning path designed to build your knowledge progressively:

  • Section 1: Course Introduction (10 min) - Understand the course objectives, structure, and what you'll achieve by the end
  • Section 2: Understanding Loki (10 min) - Learn what Loki is, its benefits, and how it compares to traditional logging solutions
  • Section 3: Architecture Deep Dive (15 min) - Explore Loki's architecture from log collection to querying and visualization
  • Section 4: Installation Overview (10 min) - Review different deployment options and configuration approaches

This introduction has given you the foundational knowledge to understand Loki. Now it's time to get hands-on:

Lab 1: Centralized Logging with Grafana Loki

You'll deploy the complete Loki stack using Docker Compose:

  • Deploy Loki, Grafana Alloy, and Grafana
  • Configure log collection from Docker containers
  • Send logs from a sample application
  • Query logs through Grafana

Lab 2: Querying Logs with LogQL

Master Loki's query language:

  • Write label selectors
  • Filter and search log content
  • Parse structured logs (JSON)
  • Extract fields and create metrics from logs
  • Build log-based dashboards

Lab 3: Correlating Metrics and Logs

Build unified observability:

  • Link Prometheus metrics with Loki logs
  • Create drill-down dashboards
  • Correlate performance metrics with log events
  • Implement log-metric correlation

Lab 4: Intelligent Alerting with Alertmanager

Create sophisticated alerting:

  • Alert on log patterns
  • Combine metric and log alerts
  • Configure alert routing
  • Reduce alert fatigue with intelligent grouping

Prerequisites

To succeed in this course, you should have:

Required:

  • Basic Linux/Unix knowledge: Comfortable with command-line operations
  • Container basics: Understanding of Docker concepts
  • YAML familiarity: Ability to read and edit configuration files

Recommended:

  • Prometheus experience: Understanding of metrics and PromQL
  • Grafana experience: Familiarity with dashboards and queries
  • Kubernetes basics: Helpful for understanding deployment patterns

Not Required:

  • Previous logging experience
  • Elasticsearch knowledge
  • Programming skills


Let's Get Started!

You've now completed the introduction to Grafana Loki! You understand:

  • What Loki is and why it's valuable
  • How its architecture enables cost-effective logging
  • Different deployment and configuration options
  • How it fits into the broader observability ecosystem

Real-World Context

Why This Matters

In modern cloud-native environments, logging is not just about collecting dataβ€”it's about doing so efficiently and cost-effectively at scale. Traditional logging solutions present significant challenges:

The Elasticsearch Problem:

  • Indexes every word in every log line
  • Requires expensive infrastructure
  • Complex to configure and maintain
  • Often needs dedicated personnel
  • Costs can spiral out of control at scale

The Loki Solution:

  • Indexes only labels (metadata), not content
  • Runs on commodity hardware or object storage
  • Simple configuration similar to Prometheus
  • Manages itself with minimal overhead
  • 10x cheaper at scale

Real-World Impact

Consider a representative scenario:

  • Startup with 100 microservices: Switched from Elasticsearch to Loki
  • Result: Reduced logging infrastructure costs from $10,000/month to $1,000/month
  • Benefit: Same query performance, simpler operations, freed up engineering time

This is the power of Loki's design philosophy: simplicity and cost-effectiveness without compromising functionality.

Who Uses Loki?

Grafana Loki has been adopted by organizations worldwide:

  • Cloud-Native Startups: Building on Kubernetes from day one
  • Fortune 500 Companies: Migrating from expensive proprietary solutions
  • Financial Services: Meeting compliance requirements with open-source tools
  • E-Commerce Platforms: Handling massive log volumes during peak traffic
  • SaaS Providers: Multi-tenant logging with strong isolation
  • Gaming Companies: Real-time log analysis for player experience

As a Cloud Native Computing Foundation (CNCF) project, Loki has massive industry backing and is considered a critical component of modern observability stacks.

Key Concepts Explained

The Three Pillars of Observability

Modern observability relies on three data types working together:

1. Metrics (Prometheus)

  • Numerical measurements over time
  • Examples: CPU usage, request rate, error rate
  • Best for: Trends, alerts, capacity planning
  • Limitation: Shows "what" but not "why"

2. Logs (Loki)

  • Detailed event records
  • Examples: Error messages, request traces, debug info
  • Best for: Troubleshooting, debugging, audit trails
  • Limitation: High volume, expensive to store traditionally

3. Traces (Tempo)

  • Request flow across services
  • Examples: Distributed transaction paths
  • Best for: Understanding service dependencies, latency analysis
  • Limitation: Requires instrumentation

Together, they provide complete visibility: Metrics alert you to problems, logs explain what went wrong, and traces show how requests flowed through your system.

Label-Based Indexing

This is Loki's key innovation and why it's so cost-effective:

Traditional Approach (Elasticsearch):

Log: "User john@example.com failed login attempt at 2024-01-15 10:30:45"

Indexed Terms:
- User
- john
- example
- com
- failed
- login
- attempt
- at
- 2024
- 01
- 15
- 10
- 30
- 45

Result: Every word indexed, massive storage overhead

Loki's Approach:

Labels: {app="auth-service", level="error", environment="production"}
Log Content: "User john@example.com failed login attempt at 2024-01-15 10:30:45"

Indexed: Only the labels
Stored: Compressed log content in chunks

Result: Minimal index size, lower costs
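The difference in index size can be sketched with a short Python snippet. This is a simplified model, not Loki's actual implementation: tokenizing the full line stands in for full-text indexing, while the label-only index stays small and constant no matter what the log line contains.

```python
import re

log_line = "User john@example.com failed login attempt at 2024-01-15 10:30:45"
labels = {"app": "auth-service", "level": "error", "environment": "production"}

# Full-text approach: every token in the line becomes an index entry.
fulltext_terms = set(re.findall(r"\w+", log_line))

# Loki-style approach: only the label pairs are indexed; the line itself
# is stored compressed in a chunk and never tokenized.
label_terms = set(labels.items())

print(len(fulltext_terms))  # one index entry per unique word in the line
print(len(label_terms))     # a fixed, small number per stream
```

For this single line the full-text model already produces 14 index entries versus 3 label pairs, and the gap widens with every log line, since new lines add new terms but rarely new labels.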

Query Flow:

  1. Select logs by labels: {app="auth-service", level="error"}
  2. Filter content: |= "failed login"
  3. Loki searches only relevant log chunks
  4. Fast results, low costs
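The first two steps above compose into a single LogQL expression. The following Python helper is illustrative only (it is not part of any Loki client library); it simply builds the query string that Grafana or a script would send to Loki's HTTP API:

```python
def build_logql(labels, contains=None):
    """Build a LogQL query: a label selector plus an optional line filter."""
    selector = "{" + ", ".join(f'{k}="{v}"' for k, v in sorted(labels.items())) + "}"
    return f'{selector} |= "{contains}"' if contains else selector

query = build_logql({"app": "auth-service", "level": "error"}, contains="failed login")
print(query)  # {app="auth-service", level="error"} |= "failed login"
```

Loki uses the label selector to narrow the search to a few streams before scanning any log content, which is why the filter stays fast even over large volumes.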

Log Streams

Loki organizes logs into streams based on label combinations:

Stream 1: {app="frontend", env="prod", region="us-east"}
β”œβ”€β”€ Log 1: "Request received"
β”œβ”€β”€ Log 2: "User authenticated"
└── Log 3: "Response sent"

Stream 2: {app="backend", env="prod", region="us-east"}
β”œβ”€β”€ Log 1: "Database query executed"
β”œβ”€β”€ Log 2: "Cache hit"
└── Log 3: "Response generated"

Stream 3: {app="frontend", env="dev", region="us-west"}
β”œβ”€β”€ Log 1: "Debug: variable value"
└── Log 2: "Warning: slow response"

Each unique label combination creates a separate stream. This organization:

  • Enables efficient storage compression
  • Allows fast label-based queries
  • Keeps related logs together
  • Supports time-based searching within streams
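The stream model above can be sketched in a few lines of Python (a conceptual illustration over hypothetical log records, not Loki internals): logs that share the same label combination land in the same stream.

```python
from collections import defaultdict

# Hypothetical log records: (labels, line)
records = [
    ({"app": "frontend", "env": "prod", "region": "us-east"}, "Request received"),
    ({"app": "frontend", "env": "prod", "region": "us-east"}, "Response sent"),
    ({"app": "backend", "env": "prod", "region": "us-east"}, "Cache hit"),
    ({"app": "frontend", "env": "dev", "region": "us-west"}, "Warning: slow response"),
]

streams = defaultdict(list)
for labels, line in records:
    # The sorted label pairs uniquely identify the stream.
    stream_key = tuple(sorted(labels.items()))
    streams[stream_key].append(line)

print(len(streams))  # three unique label combinations -> three streams
```

Because each stream is stored and compressed as a unit, keeping related logs in the same stream is what makes both compression and label-based lookups efficient.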

Cardinality Management

Cardinality refers to the number of unique label combinations. Managing it is critical:

Low Cardinality (Good):

Labels: {app="web", env="prod"}
Streams: ~10 unique combinations

Result: Efficient, fast queries

High Cardinality (Bad):

Labels: {app="web", user_id="12345", request_id="abc-xyz", timestamp="..."}
Streams: Millions of unique combinations

Result: Performance degradation, high costs

Best Practice: Use labels for dimensions you want to filter by, not for unique values like user IDs or request IDs. Put those in the log content instead.
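The cardinality difference is easy to quantify. This small Python sketch (with illustrative numbers) shows how adding a per-request label multiplies the stream count:

```python
import itertools

apps = ["web", "api"]
envs = ["prod", "staging"]

# Low cardinality: streams = the product of a few bounded label values.
low = len(list(itertools.product(apps, envs)))

# High cardinality: adding a per-request label multiplies stream count by
# the number of unique requests -- effectively unbounded in production.
request_ids = [f"req-{i}" for i in range(10_000)]
high = len(list(itertools.product(apps, envs, request_ids)))

print(low)   # 4 streams
print(high)  # 40000 streams
```

Every extra high-cardinality label multiplies, rather than adds to, the number of streams, which is why a single `request_id` label can overwhelm an otherwise healthy deployment.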


QA

Q: What are the three pillars of observability? A: Metrics, Logs, and Traces

Q: What makes Loki more cost-effective than traditional logging systems like Elasticsearch? A: It indexes only labels (metadata), not full log content

Q: When would you use logs instead of metrics? A: To find the exact error message for a failed request

Q: Which of the following labels would cause high cardinality and hurt Loki's performance? A: {user_id="12345", session_id="abc-xyz"}

Q: In Loki, logs are organized into streams. What defines a unique stream? A: The unique combination of label values

Q: Which of the following can be used as a log collection agent with Loki? A: Promtail, Alloy, Fluentd, Logstash

Q: What does Loki index from incoming logs? A: Only the labels (metadata) attached to logs

Q: Which storage backend is recommended for production Loki deployments? A: Object storage (S3, GCS, Azure Blob)

Q: What is the name of Loki's query language? A: LogQL

Q: Which deployment method is recommended for production environments? A: Kubernetes Helm Charts

Q: What is Loki's default HTTP listen port? A: 3100

Q: Which Grafana feature allows you to jump from a metric spike to related logs? A: Data links / Drill-down

Q: What is the primary Grafana interface for ad-hoc log queries and exploration? A: Explore


🎯 Why Label Design Matters

Proper label design is critical for Loki's performance and cost-effectiveness. Labels should have low cardinality (limited number of unique values) to keep indexing efficient.

βœ… Good Labels (Low Cardinality)

  • app="frontend" - Application name
  • env="production" - Environment
  • level="error" - Log level
  • region="us-east" - Region/cluster

❌ Bad Labels (High Cardinality)

  • user_id="12345" - Unique per user
  • request_id="abc-xyz" - Unique per request
  • ip_address="1.2.3.4" - Unique per client

Architecture Components

The Complete Stack

A typical Loki deployment consists of:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   Grafana                       β”‚
β”‚         (Visualization & Querying)              β”‚
β”‚                                                 β”‚
β”‚  - Explore interface for ad-hoc queries        β”‚
β”‚  - Dashboards with log panels                  β”‚
β”‚  - Alerting based on log queries               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”‚
                 β”‚ Query via HTTP API
                 β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                 Loki Server                     β”‚
β”‚           (Query & Ingestion)                   β”‚
β”‚                                                 β”‚
β”‚  Components:                                    β”‚
β”‚  - Distributor: Receives logs                  β”‚
β”‚  - Ingester: Processes and buffers             β”‚
β”‚  - Querier: Handles queries                    β”‚
β”‚  - Query Frontend: Splits/caches queries       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”‚
                 β”‚ Store chunks & index
                 β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Storage Backend                    β”‚
β”‚    (Object Store or Filesystem)                 β”‚
β”‚                                                 β”‚
β”‚  Options:                                       β”‚
β”‚  - Amazon S3                                    β”‚
β”‚  - Google Cloud Storage                         β”‚
β”‚  - Azure Blob Storage                           β”‚
β”‚  - Local filesystem (dev only)                  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–²β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”‚
                 β”‚ Send logs with labels
                 β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚            Log Collection Agents                β”‚
β”‚  (Promtail, Alloy, Fluentd, Logstash)         β”‚
β”‚                                                 β”‚
β”‚  Responsibilities:                              β”‚
β”‚  - Discover log sources                         β”‚
β”‚  - Extract labels                               β”‚
β”‚  - Parse log formats                            β”‚
β”‚  - Forward to Loki                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–²β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”‚
                 β”‚ Read logs
                 β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚             Applications                        β”‚
β”‚  (Containers, VMs, Serverless)                 β”‚
β”‚                                                 β”‚
β”‚  Log outputs:                                   β”‚
β”‚  - stdout/stderr                                β”‚
β”‚  - Log files                                    β”‚
β”‚  - Systemd journals                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Component Roles

Applications: Generate logs to stdout, files, or journals

Collection Agents: Discover, label, and forward logs

  • Promtail: Loki-native, lightweight
  • Grafana Alloy: Unified agent for metrics, logs, traces
  • Fluentd: Extensive plugin ecosystem
  • Logstash: Elastic stack compatibility

Loki Server: Receives, processes, stores, and serves logs

  • Distributor: Load balances incoming logs
  • Ingester: Chunks and compresses logs
  • Querier: Executes LogQL queries
  • Query Frontend: Optimizes query performance

Storage: Persists log data and indexes

  • Object Storage: Recommended for production (S3, GCS, Azure)
  • Filesystem: Simple for development

Grafana: User interface for querying and visualization

  • Explore: Ad-hoc log queries and analysis
  • Dashboards: Persistent visualizations
  • Alerting: Notifications based on log patterns
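To make the agent-to-server hop concrete, here is a minimal Python sketch of the JSON payload a collection agent POSTs to Loki's push endpoint (`/loki/api/v1/push`). The payload shape follows Loki's documented HTTP API; the labels and log lines are hypothetical, and the actual HTTP call is omitted.

```python
import json
import time

def make_push_payload(labels, lines):
    """Build the JSON body for Loki's POST /loki/api/v1/push endpoint."""
    now_ns = str(time.time_ns())  # Loki expects nanosecond timestamps as strings
    return {
        "streams": [
            {
                "stream": labels,  # the stream's label set
                "values": [[now_ns, line] for line in lines],
            }
        ]
    }

payload = make_push_payload(
    {"app": "auth-service", "level": "error"},
    ["User login failed", "Retry scheduled"],
)
print(json.dumps(payload)[:60])
```

Every agent listed above (Promtail, Alloy, Fluentd, Logstash) ultimately produces requests of this shape: a label set identifying the stream plus timestamped lines.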

Installation Approaches

1. Binary Installation

Best for: Learning, local development, quick testing

Pros:

  • Fast setup
  • No containers required
  • Easy to debug
  • Minimal dependencies

Cons:

  • Not production-ready
  • Manual updates
  • Single-machine only

Use Case: "I want to learn Loki basics on my laptop"

2. Docker Compose

Best for: Development, testing, demos

Pros:

  • Easy multi-component deployment
  • Reproducible environments
  • Simple networking
  • Version control for config

Cons:

  • Not highly available
  • Limited scalability
  • Requires Docker knowledge

Use Case: "I want to test Loki with my application locally"

3. Kubernetes (Helm)

Best for: Production deployments

Pros:

  • High availability
  • Auto-scaling
  • Native cloud integration
  • Battle-tested configurations

Cons:

  • Complex setup
  • Requires Kubernetes expertise
  • More infrastructure overhead

Use Case: "I need production-grade logging for my microservices"

Configuration Essentials

Key Configuration Sections

Every Loki deployment requires configuration for:

  1. Server Settings

server:
  http_listen_port: 3100    # API and UI
  grpc_listen_port: 9096    # Internal communication

  2. Storage Configuration

storage_config:
  aws:
    s3: s3://region/bucket    # Production: object storage
  filesystem:
    directory: /tmp/loki      # Development only

  3. Schema Configuration

schema_config:
  configs:
    - from: 2020-10-24        # Schema start date
      store: boltdb-shipper   # Index storage
      object_store: s3        # Chunk storage
      schema: v11             # Schema version

  4. Retention

limits_config:
  retention_period: 744h      # 31 days
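Pulling the four sections together, a minimal single-binary development config might look like the following. This is a hedged sketch for local filesystem storage; production deployments should use object storage and values appropriate to their environment.

```yaml
# Minimal local/dev Loki config -- illustrative, not production-ready
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

limits_config:
  retention_period: 744h   # 31 days
```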

Integration with Grafana

Why Grafana + Loki?

Loki and Grafana are designed to work together seamlessly:

Unified Observability:

  • Metrics from Prometheus
  • Logs from Loki
  • Traces from Tempo
  • All in one interface

Contextual Navigation:

  • Click on a metric spike β†’ jump to logs
  • See a log error β†’ view related metrics
  • Trace a request β†’ see all associated logs

Powerful Visualization:

  • Log panels in dashboards
  • Live tailing
  • Log context (surrounding lines)
  • Template variables for filtering

Setting Up Loki in Grafana

  1. Navigate to Configuration β†’ Data Sources
  2. Click "Add data source"
  3. Select "Loki"
  4. Enter URL: http://loki:3100
  5. Save & Test
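Instead of clicking through the UI, the same data source can be provisioned as code. A hedged example of a Grafana provisioning file (conventionally placed under `/etc/grafana/provisioning/datasources/`; the filename is arbitrary):

```yaml
# loki-datasource.yaml -- provisions the Loki data source at Grafana startup
apiVersion: 1

datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: false
```

Provisioning keeps the data source definition in version control, which matches the Docker Compose and Helm workflows described earlier.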

Explore Interface

Grafana's Explore interface is where you'll spend most of your time:

  • Label Browser: Discover available labels
  • Query Builder: Construct queries visually
  • Log Browser: View results in context
  • Time Controls: Adjust search window
  • Live Tail: Real-time log streaming

What You've Learned

  • What Loki is and why it's 10x more cost-effective
  • How label-based indexing works
  • Loki's architecture and components
  • Label design best practices (low cardinality)
  • Deployment options for different environments
  • Integration with Grafana for unified observability