
Python Package Performance Profiling Guide

A comprehensive guide to profiling and optimizing Python packages for real-world performance. Learn how to measure, analyze, and improve your package’s startup time, import overhead, and runtime performance.

Why Profile Your Python Package?

Performance matters. Users notice when your CLI takes 3 seconds to start or when your library adds 500ms to import time. This guide covers practical techniques to identify and fix performance bottlenecks.

Measuring Import Time

Python’s import system can be surprisingly slow. Here’s how to measure it (run these in a fresh interpreter; a module already in sys.modules imports instantly):

Using time.perf_counter()

import time

t0 = time.perf_counter()
import your_module
import_time = (time.perf_counter() - t0) * 1000
print(f'Import time: {import_time:.0f}ms')

Using Python’s -X importtime

python -X importtime -c "import your_module" 2>&1 | head -20

This shows cumulative and self time for each import, helping identify slow dependencies.

Using cProfile for Function-Level Analysis

cProfile is Python’s built-in profiler. Use it to find hot functions:

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()

# Your code here
result = your_function()

profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(20)  # Top 20 functions

Key Metrics

  • cumtime (sort key 'cumulative'): total time spent in the function including all subcalls
  • tottime: time spent in the function itself, excluding subcalls
  • ncalls: how many times the function was called

Separating Network from Compute

For packages that make API calls, separate network latency from local computation:

import time

t0 = time.perf_counter()
# Import phase
import openai
t_import = time.perf_counter()

# Init phase
client = openai.OpenAI()
t_init = time.perf_counter()

# Network phase
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hi"}]
)
t_network = time.perf_counter()

print(f'Import:  {(t_import - t0) * 1000:.0f}ms')
print(f'Init:    {(t_init - t_import) * 1000:.0f}ms')
print(f'Network: {(t_network - t_init) * 1000:.0f}ms')
print(f'Total:   {(t_network - t0) * 1000:.0f}ms')

Cold vs Warm Runs

Always measure both cold (first run) and warm (subsequent) performance:

import subprocess
import statistics
import time

def measure_cold_start(command, runs=3):
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        subprocess.run(command, capture_output=True)
        times.append((time.perf_counter() - t0) * 1000)
    return statistics.mean(times), statistics.stdev(times)

avg, std = measure_cold_start(['python', '-c', 'import your_module'])
print(f'Cold start: {avg:.0f}ms (±{std:.0f}ms)')
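
For the warm half, time repeated calls inside one process, where imports and caches have already been paid for. A minimal sketch using the standard library's timeit, reusing the placeholder names your_module/your_function from above:

import timeit

# The import happens in setup, so only the call itself is timed
warm_s = timeit.timeit(
    "your_function()",
    setup="from your_module import your_function",
    number=100,
)
print(f'Warm call: {warm_s / 100 * 1000:.2f}ms')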

Identifying Eager Imports

Eager imports at module level are the #1 cause of slow startup. Look for:

# BAD: Eager import at module level
from heavy_dependency import HeavyClass

# GOOD: Lazy import when needed
def get_heavy_class():
    from heavy_dependency import HeavyClass
    return HeavyClass

Using importlib.util.find_spec

Check if a module is available without importing it:

import importlib.util

# Fast availability check (no import)
HEAVY_AVAILABLE = importlib.util.find_spec("heavy_module") is not None

# Lazy import helper
def _get_heavy_module():
    if HEAVY_AVAILABLE:
        import heavy_module
        return heavy_module
    return None

Lazy Loading with __getattr__

Python 3.7+ supports module-level __getattr__ for lazy loading:

# In your_package/__init__.py
_lazy_cache = {}

def __getattr__(name):
    if name in _lazy_cache:
        return _lazy_cache[name]
    
    if name == "HeavyClass":
        from .heavy_module import HeavyClass
        _lazy_cache[name] = HeavyClass
        return HeavyClass
    
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
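
With this in place, from your_package import HeavyClass still works unchanged, but heavy_module is only imported the first time the attribute is actually accessed.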

Creating Timeline Diagrams

Visualize execution phases with ASCII timeline diagrams:

def create_timeline(phases):
    """
    phases: list of (name, duration_ms) tuples
    """
    total = sum(d for _, d in phases)
    scale = 50.0 / total
    
    # Top line
    line = "ENTER "
    for name, ms in phases:
        width = max(8, int(ms * scale))
        line += "─" * width
    line += "► END"
    print(line)
    
    # Phase names
    line = "      "
    for name, ms in phases:
        width = max(8, int(ms * scale))
        line += "│" + name.center(width - 1)
    line += "│"
    print(line)
    
    print(f"{'':>50} TOTAL: {total:.0f}ms")

Comparing SDK vs Wrapper Performance

When building wrappers around SDKs, measure the overhead:

# Baseline: Raw SDK
sdk_time = measure_sdk_call()

# Your wrapper
wrapper_time = measure_wrapper_call()

overhead = wrapper_time - sdk_time
print(f'Wrapper overhead: {overhead:.0f}ms ({overhead/sdk_time*100:.1f}%)')
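
The measure_sdk_call and measure_wrapper_call helpers above are placeholders; a minimal generic version might look like this (the callable is whatever SDK or wrapper call you are comparing):

import time
import statistics

def measure_call(fn, runs=5):
    """Return the mean wall-clock time of fn() in milliseconds."""
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1000)
    return statistics.mean(times)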

Target: Keep wrapper overhead under 5% of SDK time.

CLI vs Python API Performance

CLI tools have additional overhead from subprocess spawning:

# Python API (faster)
from your_package import YourClass
result = YourClass().run()

# CLI (slower due to subprocess spawn)
import subprocess
subprocess.run(['your-cli', 'command'])

Typical CLI overhead: 100-300ms for subprocess spawn + Python startup.

Caching Strategies

Module-Level Caching

class MyClass:
    _cached_client = None
    
    @classmethod
    def get_client(cls):
        if cls._cached_client is None:
            cls._cached_client = ExpensiveClient()
        return cls._cached_client
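
The same pattern can be written with the standard library; a sketch using functools.lru_cache, where ExpensiveClient is a placeholder for your costly constructor:

from functools import lru_cache

@lru_cache(maxsize=1)
def get_client():
    # Constructed once on first call, then served from the cache
    return ExpensiveClient()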

Configuration Caching

_config_applied = False

def apply_config():
    global _config_applied
    if _config_applied:
        return
    # Expensive configuration
    _config_applied = True

Profiling Pitfalls

1. Measuring in Development Mode

Debug mode, assertions, and development dependencies add overhead. Profile in production-like conditions.

2. Ignoring Variance

Always run multiple iterations and report standard deviation:

times = [measure() for _ in range(10)]
print(f'{statistics.mean(times):.0f}ms (±{statistics.stdev(times):.0f}ms)')

3. Profiler Overhead

cProfile adds roughly 10-20% overhead, so its absolute numbers skew high. For accurate wall-clock timing, use time.perf_counter() instead.

4. Network Variance

API calls have high variance. Separate network timing from local computation.

Performance Targets

Reasonable targets for Python packages:

Metric                     Target
CLI --help                 < 500ms
Package import             < 100ms
Wrapper overhead vs SDK    < 5%
Profiling overhead         < 5%

Summary

Key techniques for Python package performance:

  1. Measure first: Use time.perf_counter() and cProfile
  2. Separate phases: Import, init, network, execution
  3. Lazy load: Use __getattr__ and importlib.util.find_spec
  4. Cache wisely: Module-level caching for expensive operations
  5. Multiple runs: Report mean and standard deviation
  6. Timeline diagrams: Visualize where time is spent

Performance optimization is iterative. Measure, identify bottlenecks, fix, and measure again.


Creating Agent Skills: From Basic to Advanced

Agent Skills are a powerful way to extend AI agents with specialized capabilities. This tutorial walks you through creating skills at three complexity levels, from basic instructions to full-featured skills with scripts and resources.

What are Agent Skills?

Agent Skills are folders containing instructions, scripts, and resources that AI agents can discover and use to perform specific tasks more reliably. They follow the open agentskills.io specification and use progressive disclosure to manage context efficiently.

Skill Directory Structure

my-skill/
├── SKILL.md           # Required: instructions + metadata
├── scripts/           # Optional: executable code
├── references/        # Optional: documentation
└── assets/            # Optional: templates, resources

Level 1: Basic Skill (SKILL.md Only)

The simplest skill contains just a SKILL.md file with YAML frontmatter and instructions.

Example: Code Review Skill

Create the directory structure:

mkdir -p ~/.praison/skills/code-review

Create ~/.praison/skills/code-review/SKILL.md:

---
name: code-review
description: Review code for bugs, security issues, and best practices. Use when asked to review, audit, or check code quality.
---

# Code Review Skill

## When to Use
Activate this skill when the user asks to:
- Review code for bugs or issues
- Check code quality
- Audit security vulnerabilities
- Suggest improvements

## Review Checklist

### 1. Security
- [ ] No hardcoded secrets or API keys
- [ ] Input validation present
- [ ] SQL injection prevention
- [ ] XSS protection for web code

### 2. Code Quality
- [ ] Functions are single-purpose
- [ ] Variable names are descriptive
- [ ] No code duplication
- [ ] Error handling is comprehensive

### 3. Performance
- [ ] No unnecessary loops
- [ ] Efficient data structures used
- [ ] Database queries optimized

## Output Format
Provide findings in this format:
1. **Critical Issues** - Must fix before deployment
2. **Warnings** - Should address soon
3. **Suggestions** - Nice to have improvements

Using the Basic Skill

from praisonaiagents import Agent

# Agent automatically discovers skills from default directories
agent = Agent(
    instructions="You are a code assistant",
    skills_dirs=["~/.praison/skills"]
)

# The skill is available when relevant
response = agent.chat("Review this Python function for issues: def login(user, pwd): return db.query(f'SELECT * FROM users WHERE name={user}')")
print(response)

Level 2: Skill with Script

Add executable scripts for deterministic operations that benefit from code execution rather than LLM generation.

Example: Data Validation Skill

Create the directory structure:

mkdir -p ~/.praison/skills/data-validator/scripts

Create ~/.praison/skills/data-validator/SKILL.md:

---
name: data-validator
description: Validate CSV and JSON data files for schema compliance, data types, and required fields. Use when asked to validate, check, or verify data files.
---

# Data Validation Skill

## When to Use
Activate this skill when the user needs to:
- Validate CSV or JSON files
- Check data types and formats
- Verify required fields exist
- Find data quality issues

## Available Scripts

### validate_csv.py
Validates CSV files against expected schema.

Usage:
```bash
python scripts/validate_csv.py <file.csv> [--schema schema.json]
```

### validate_json.py
Validates JSON files against JSON Schema.

Usage:
```bash
python scripts/validate_json.py <file.json> --schema schema.json
```

## Workflow
1. Identify the file type (CSV or JSON)
2. Run the appropriate validation script
3. Report any validation errors found
4. Suggest fixes for common issues

Create ~/.praison/skills/data-validator/scripts/validate_csv.py:

#!/usr/bin/env python3
"""CSV Validation Script for Data Validator Skill."""

import csv
import sys
import json
from pathlib import Path

def validate_csv(filepath: str, schema_path: str = None) -> dict:
    """Validate a CSV file and return results."""
    results = {
        "valid": True,
        "errors": [],
        "warnings": [],
        "stats": {}
    }
    
    try:
        with open(filepath, 'r', newline='', encoding='utf-8') as f:
            reader = csv.DictReader(f)
            headers = reader.fieldnames or []
            
            results["stats"]["columns"] = len(headers)
            results["stats"]["headers"] = headers
            
            row_count = 0
            empty_cells = 0
            
            for i, row in enumerate(reader, start=2):
                row_count += 1
                for col, value in row.items():
                    if value is None or value.strip() == '':
                        empty_cells += 1
                        results["warnings"].append(
                            f"Row {i}, Column '{col}': Empty value"
                        )
            
            results["stats"]["rows"] = row_count
            results["stats"]["empty_cells"] = empty_cells
            
            # Schema validation if provided
            if schema_path and Path(schema_path).exists():
                with open(schema_path) as sf:
                    schema = json.load(sf)
                    required = schema.get("required_columns", [])
                    for col in required:
                        if col not in headers:
                            results["valid"] = False
                            results["errors"].append(
                                f"Missing required column: {col}"
                            )
                            
    except FileNotFoundError:
        results["valid"] = False
        results["errors"].append(f"File not found: {filepath}")
    except csv.Error as e:
        results["valid"] = False
        results["errors"].append(f"CSV parsing error: {e}")
    except Exception as e:
        results["valid"] = False
        results["errors"].append(f"Unexpected error: {e}")
    
    return results

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: validate_csv.py <file.csv> [--schema schema.json]")
        sys.exit(1)
    
    filepath = sys.argv[1]
    schema = None
    
    if "--schema" in sys.argv:
        idx = sys.argv.index("--schema")
        if idx + 1 < len(sys.argv):
            schema = sys.argv[idx + 1]
    
    results = validate_csv(filepath, schema)
    print(json.dumps(results, indent=2))
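
An example schema file for the --schema option; the script above only reads the required_columns key:

{
  "required_columns": ["id", "name", "email"]
}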

Using the Script-Based Skill

from praisonaiagents import Agent
from praisonaiagents.skills import SkillManager

# Load skill and access scripts
manager = SkillManager()
manager.discover()
manager.activate("data-validator")

skill = manager.get_skill("data-validator")
scripts = manager.load_resources("data-validator")

print(f"Available scripts: {list(scripts.get('scripts', {}).keys())}")

# Agent can execute scripts via code execution tool
from praisonai.code import execute_command

result = execute_command(
    f"python {skill.properties.path}/scripts/validate_csv.py data.csv"
)
print(result['stdout'])

Level 3: Full-Featured Skill

A complete skill with all optional directories: scripts for execution, references for documentation, and assets for templates.

Example: Report Generator Skill

Create the full directory structure:

mkdir -p ~/.praison/skills/report-generator/{scripts,references,assets}

Create ~/.praison/skills/report-generator/SKILL.md:

---
name: report-generator
description: Generate professional reports in multiple formats (PDF, HTML, Markdown). Use when asked to create reports, summaries, or documentation from data.
license: Apache-2.0
compatibility: Requires Python 3.8+ with reportlab and jinja2
metadata:
  author: PraisonAI
  version: "1.0"
---

# Report Generator Skill

## Overview
This skill generates professional reports from data in multiple formats.

## When to Use
Activate when the user needs to:
- Generate PDF reports from data
- Create HTML documentation
- Build formatted Markdown reports
- Use branded report templates

## Available Scripts

### generate_report.py
Main report generation script.

```bash
python scripts/generate_report.py --input data.json --output report.pdf --template default
```

Options:
- `--input`: Input data file (JSON or CSV)
- `--output`: Output file path
- `--format`: Output format (pdf, html, md)
- `--template`: Template name from assets/

## References
- See [TEMPLATES.md](references/TEMPLATES.md) for template customization
- See [STYLING.md](references/STYLING.md) for styling options

## Assets
- `assets/default_template.html` - Default HTML template
- `assets/report_styles.css` - CSS styles for reports
- `assets/logo.png` - Default logo for headers

Create ~/.praison/skills/report-generator/scripts/generate_report.py:

#!/usr/bin/env python3
"""Report Generation Script."""

import argparse
import json
from datetime import datetime

def generate_markdown_report(data: dict, template_path: str = None) -> str:
    """Generate a Markdown report from data."""
    title = data.get("title", "Report")
    sections = data.get("sections", [])
    
    lines = [
        f"# {title}",
        "",
        f"*Generated: {datetime.now().strftime('%Y-%m-%d %H:%M')}*",
        "",
        "---",
        ""
    ]
    
    for section in sections:
        lines.append(f"## {section.get('heading', 'Section')}")
        lines.append("")
        lines.append(section.get('content', ''))
        lines.append("")
        
        if 'items' in section:
            for item in section['items']:
                lines.append(f"- {item}")
            lines.append("")
    
    return "\n".join(lines)

def main():
    parser = argparse.ArgumentParser(description="Generate reports")
    parser.add_argument("--input", required=True, help="Input JSON file")
    parser.add_argument("--output", required=True, help="Output file")
    parser.add_argument("--format", default="md", choices=["md", "html", "pdf"])
    parser.add_argument("--template", default="default")
    
    args = parser.parse_args()
    
    with open(args.input) as f:
        data = json.load(f)
    
    if args.format == "md":
        content = generate_markdown_report(data)
        with open(args.output, 'w') as f:
            f.write(content)
        print(f"Report generated: {args.output}")
    else:
        print(f"Format {args.format} requires additional dependencies")

if __name__ == "__main__":
    main()
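
An example input file matching the keys the script reads (title, sections with heading/content, and optional items):

{
  "title": "Weekly Status Report",
  "sections": [
    {
      "heading": "Summary",
      "content": "All milestones on track.",
      "items": ["API migration complete", "Docs updated"]
    }
  ]
}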

Create ~/.praison/skills/report-generator/references/TEMPLATES.md:

# Template Customization Guide

## Available Templates

### default
The default template provides a clean, professional layout suitable for most reports.

### executive
A condensed template for executive summaries with key metrics highlighted.

### technical
Detailed template with code blocks, tables, and technical formatting.

## Creating Custom Templates

1. Copy an existing template from `assets/`
2. Modify the HTML/CSS as needed
3. Reference your template with `--template your-template`

## Template Variables

Templates support these variables:
- `{{title}}` - Report title
- `{{date}}` - Generation date
- `{{sections}}` - Content sections
- `{{logo}}` - Logo path

Create ~/.praison/skills/report-generator/references/STYLING.md:

# Styling Guide

## Color Schemes

### Professional (Default)
- Primary: #2563eb (Blue)
- Secondary: #64748b (Slate)
- Accent: #10b981 (Emerald)

### Corporate
- Primary: #1e293b (Dark)
- Secondary: #475569 (Gray)
- Accent: #f59e0b (Amber)

## Typography

- Headings: Inter, system-ui
- Body: Georgia, serif
- Code: JetBrains Mono, monospace

## Customization

Edit `assets/report_styles.css` to customize:
- Colors and fonts
- Spacing and margins
- Header/footer layouts

Create ~/.praison/skills/report-generator/assets/default_template.html:

<!DOCTYPE html>
<html>
<head>
    <title>{{title}}</title>
    <link rel="stylesheet" href="report_styles.css">
</head>
<body>
    <header>
        <img src="{{logo}}" alt="Logo" class="logo">
        <h1>{{title}}</h1>
        <p class="date">{{date}}</p>
    </header>
    <main>
        {{content}}
    </main>
    <footer>
        <p>Generated by Report Generator Skill</p>
    </footer>
</body>
</html>

Using the Full-Featured Skill

from praisonaiagents import Agent
from praisonaiagents.skills import SkillManager, SkillLoader

# Initialize and discover skills
manager = SkillManager()
manager.discover()

# Get skill info
skill = manager.get_skill("report-generator")
print(f"Skill: {skill.properties.name}")
print(f"Path: {skill.properties.path}")

# Load all resources (Level 3)
loader = SkillLoader()
loaded = loader.load(str(skill.properties.path), activate=True)
loader.load_all_resources(loaded)

# Access different resource types
print(f"\nScripts: {list(loaded.get_scripts().keys())}")
print(f"References: {list(loaded.get_references().keys())}")
print(f"Assets: {list(loaded.get_assets().keys())}")

# Use with an agent
agent = Agent(
    instructions="You are a report generation assistant",
    skills=["~/.praison/skills/report-generator"]
)

# Get the skills XML for system prompt
skills_xml = agent.get_skills_prompt()
print(f"\nSkills XML:\n{skills_xml}")

Progressive Disclosure in Action

Skills use three levels of loading to manage context efficiently:

Level            What's Loaded                     When                  Token Cost
1. Metadata      name, description                 At startup            ~100 tokens
2. Instructions  Full SKILL.md body                When skill triggered  <5k tokens
3. Resources     scripts/, references/, assets/    As needed             Variable

The same levels map directly onto the SkillLoader API:

from praisonaiagents.skills import SkillLoader

loader = SkillLoader()

# Level 1: Metadata only (~100 tokens)
skill = loader.load_metadata("~/.praison/skills/report-generator")
print(f"Name: {skill.metadata.name}")
print(f"Description: {skill.metadata.description}")

# Level 2: Full instructions (<5k tokens)
loader.activate(skill)
print(f"Instructions loaded: {len(skill.instructions)} chars")

# Level 3: Resources (as needed)
loader.load_scripts(skill)      # Load executable scripts
loader.load_references(skill)   # Load documentation
loader.load_assets(skill)       # Load templates/resources

Built-in Code Execution

PraisonAI includes built-in code execution tools that work seamlessly with skill scripts:

from praisonai.code import execute_command, run_python

# Execute a skill script
result = execute_command(
    "python ~/.praison/skills/data-validator/scripts/validate_csv.py data.csv"
)
print(result['stdout'])

# Run Python code directly
result = run_python("""
import json
data = {"status": "success", "count": 42}
print(json.dumps(data))
""")
print(result['stdout'])

CLI Commands

PraisonAI provides CLI commands for skill management:

# List all discovered skills
praisonai skills list

# Validate a skill directory
praisonai skills validate --path ~/.praison/skills/report-generator

# Create a new skill from template
praisonai skills create --name my-new-skill

# Generate XML prompt for skills
praisonai skills prompt

Best Practices

  • Keep SKILL.md lean - Move detailed documentation to references/
  • Use scripts for determinism - Code execution is more reliable than LLM generation for specific operations
  • Be specific in descriptions - Clear descriptions help agents know when to use skills
  • Follow naming conventions - Lowercase, hyphens, max 64 characters
  • Test your skills - Use praisonai skills validate before deployment

Conclusion

Agent Skills provide a powerful, standardized way to extend AI agents with specialized capabilities. Whether you need simple instructions or complex workflows with scripts and resources, the progressive disclosure model ensures efficient context usage while enabling sophisticated functionality.

PraisonAI's implementation is fully compliant with the agentskills.io specification, making your skills portable across compatible agent products.


How to Convert a Python Package to Support MCP (Model Context Protocol)

Introduction to MCP

The Model Context Protocol (MCP) is an open standard created by Anthropic that allows AI applications to connect to external tools and data sources in a standardized way. Converting your Python package to support MCP enables it to be used with Claude Desktop, Cursor, VS Code, and other MCP-compatible AI tools.

Prerequisites

  • Python 3.10 or higher
  • uv package manager (recommended) or pip
  • Basic understanding of async Python

Step 1: Install the MCP Python SDK

The official Python SDK makes it easy to create MCP servers:

# Using uv (recommended)
uv add mcp

# Or using pip
pip install mcp

Step 2: Create Your MCP Server

Use FastMCP to quickly create an MCP server that exposes your package’s functionality as tools:

from mcp.server.fastmcp import FastMCP

# Create an MCP server
mcp = FastMCP("My Package Server")

# Expose a function as a tool
@mcp.tool()
def my_function(param1: str, param2: int = 10) -> str:
    """Description of what this tool does."""
    # Your existing package logic here
    return f"Result: {param1}, {param2}"

# Expose data as a resource
@mcp.resource("data://{item_id}")
def get_data(item_id: str) -> str:
    """Get data by ID."""
    return f"Data for {item_id}"

# Run the server
if __name__ == "__main__":
    mcp.run()  # Uses stdio transport by default

Step 3: Add Entry Point to Your Package

Update your pyproject.toml to include an entry point:

[project.scripts]
my-package-mcp = "my_package.mcp_server:main"
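
For this entry point to resolve, the server module needs a main() function; a minimal sketch, assuming the FastMCP instance from Step 2 lives in my_package/mcp_server.py:

# my_package/mcp_server.py
def main():
    mcp.run()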

Step 4: Configure for Claude Desktop

Add your server to Claude Desktop’s configuration file (claude_desktop_config.json):

{
  "mcpServers": {
    "my-package": {
      "command": "uvx",
      "args": ["my-package-mcp"]
    }
  }
}

Advanced: HTTP Transport

For remote access, use the Streamable HTTP transport. Host and port are configured on the FastMCP instance rather than in run():

mcp = FastMCP("My Package Server", host="0.0.0.0", port=8000)

if __name__ == "__main__":
    mcp.run(transport="streamable-http")

Key Concepts

  • Tools: Functions that perform actions (like POST endpoints)
  • Resources: Read-only data sources (like GET endpoints)
  • Prompts: Reusable templates for LLM interactions
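
Prompts are registered the same way as tools and resources; a minimal sketch of a prompt template using FastMCP's prompt decorator:

@mcp.prompt()
def summarize(text: str) -> str:
    """Reusable prompt for summarizing text."""
    return f"Please summarize the following text:\n\n{text}"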


Django Q Beginners Guide

pip install django-q
# settings.py
INSTALLED_APPS = (
    # other apps
    'django_q',
)
python manage.py migrate
python manage.py qcluster
# myapp/services.py

from time import sleep

def sleep_and_print(secs):
    sleep(secs)
    print("Task ran!")

Without Async

# myapp/views.py

from django.http import JsonResponse
from time import sleep

def index(request):
    json_payload = {
        "message": "Hello world!"
    }
    sleep(10)
    return JsonResponse(json_payload)

With Async (using Django Q)

# myapp/views.py
from django.http import JsonResponse
from django_q.tasks import async_task


def index(request):
    json_payload = {"message": "hello world!"}
    # enqueue the task
    async_task("myapp.services.sleep_and_print", 10)
    return JsonResponse(json_payload)
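
async_task returns a task id, which can be used to fetch the task's return value once a worker has processed it. A minimal sketch using django-q's result helper (sleep_and_print returns None, so result() is mainly useful for tasks that return data):

from django_q.tasks import async_task, result

task_id = async_task("myapp.services.sleep_and_print", 10)
# Later, after a qcluster worker has run the task:
value = result(task_id)  # None until the task has completed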

Regex Extract Python

import re

s = """
some line abc
some other line
name my_user_name is valid
some more lines"""

p = re.compile("name (.*) is valid")
result = p.search(s)
print(result.group(1))  # prints: my_user_name

Conda Python Environments

1. Creating Environments

conda create --name py2 python=2.7
conda create --name py3 python=3.7

2. Switch to Python 2

source activate py2

Deactivate Conda Python 2

source deactivate

3. Switch to Python 3

source activate py3

Deactivate Conda Python 3

source deactivate

Run shell script in Apache Airflow

Create the script.py inside the ../airflow/dags folder

from datetime import timedelta

import airflow
from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator

args = {
    'owner': 'airflow',
    'start_date': airflow.utils.dates.days_ago(2),
}

dag = DAG(
    dag_id='dagid1',
    default_args=args,
    schedule_interval='0 0 * * *',
    dagrun_timeout=timedelta(minutes=60),
)

create_command = """
echo 'Hello world';
echo 'Welcome';"""

run_this = BashOperator(
    task_id='taskid1',
    bash_command=create_command,
    dag=dag,
)
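
Tasks can be chained with Airflow's bitshift operators; for example, a second BashOperator that runs only after the first succeeds (a sketch extending the DAG above):

run_after = BashOperator(
    task_id='taskid2',
    bash_command="echo 'Done';",
    dag=dag,
)

run_this >> run_after  # taskid2 runs after taskid1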

Apache Airflow Beginners guide

export AIRFLOW_HOME=~/airflow

# install from pypi using pip
pip install apache-airflow

# initialize the database
airflow initdb

# start the web server, default port is 8080
airflow webserver -p 8080

# start the scheduler
airflow scheduler

# visit localhost:8080 in the browser and enable the example dag in the home page

Nginx reverse proxy

Airflow reverse proxy without rewrite (Airflow itself must be configured to serve under the prefix, e.g. base_url in airflow.cfg):

server {
  listen 80;
  server_name lab.example.com;

  location /myorg/airflow/ {
      proxy_pass http://localhost:8080;
      proxy_set_header Host $host;
      proxy_redirect off;
      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "upgrade";
  }
}

Airflow reverse proxy with rewrite

server {
    listen 80;
    server_name lab.example.com;

    location /myorg/workflow/ {
        rewrite ^/myorg/workflow/(.*)$ /$1 break;  # strip the /myorg/workflow prefix before proxying
        proxy_pass http://localhost:5555;
        proxy_set_header Host $host;
        proxy_redirect off;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

Python Virtual Environment Wrapper

$ pip install virtualenvwrapper
$ which virtualenvwrapper.sh
/usr/local/bin/virtualenvwrapper.sh

Adding path

$ export WORKON_HOME=$HOME/.virtualenvs   # Optional
$ export PROJECT_HOME=$HOME/projects      # Optional
$ source /usr/local/bin/virtualenvwrapper.sh

Reloading Shell

$ source ~/.bashrc
$ echo $WORKON_HOME
/Users/mervinpraison/.virtualenvs

Using the Wrapper Functions

$ mkvirtualenv my-new-project
(my-new-project) $

To stop using the environment

(my-new-project) $ deactivate
$
$ workon
my-new-project

To activate the environment

$ workon my-new-project

Using the -p parameter to choose the Python version

$ virtualenv -p $(which python3) blog_virtualenv