vendor_report/README.md
2025-11-06 20:50:19 +04:00

15 KiB

Vendor Report Generator

A Python tool that generates comprehensive vendor punchlist reports from Excel files. The tool processes Excel data, normalizes vendor information, calculates metrics, and generates both JSON and interactive HTML reports.

📘 For Taskboard Integration: See TASKBOARD_INTEGRATION_CONTEXT.md for detailed context and integration possibilities.

Features

  • Direct Excel Processing: Reads Excel files directly using pandas
  • Data Normalization: Automatically normalizes vendor names, statuses, and priorities
  • 24-Hour Updates: Tracks items added, closed, or changed to monitor status in the last 24 hours (based on Baltimore/Eastern timezone)
  • Priority Tracking: Groups items by priority levels (Very High, High, Medium, Low)
  • Oldest Unaddressed Items: Identifies and highlights the oldest 3 unaddressed items per vendor
  • Interactive HTML Reports: Generates searchable, filterable HTML reports with tabs and filters
  • JSON Export: Exports structured JSON data for further processing
  • SharePoint Integration: Automatically download Excel files from SharePoint
  • Scheduled Generation: Automatically generate reports on a schedule (interval or cron)
  • Web API: REST API for on-demand report generation

Requirements

  • Python 3.8 or higher
  • Dependencies listed in requirements.txt

Installation

  1. Clone the repository:

    git clone https://gitea.lci.ge/ilia.gurielidze/vendor_report.git
    cd vendor_report
    
  2. Create a virtual environment (recommended):

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    

Setup

  1. Prepare your Excel files:

    • Place your Excel files (.xlsx or .xls) in the reports/ directory
    • Ensure your Excel files have the following columns (in order):
      • Column 0: Punchlist Name
      • Column 1: Vendor
      • Column 2: Priority
      • Column 3: Description
      • Column 4: Date Identified
      • Column 5: Status Updates
      • Column 6: Issue Image
      • Column 7: Status
      • Column 8: Date Completed (optional)
  2. Create necessary directories (if they don't exist):

    mkdir -p reports output
    

Usage

Basic Usage

Generate a report from Excel files in the reports/ directory:

python3 report_generator.py

This will:

  • Process all Excel files in the reports/ directory
  • Generate a JSON report at output/report.json
  • Generate an HTML report at output/report.html
  • Save preprocessed data to output/preprocessed_data.txt

Command-Line Options

python3 report_generator.py [OPTIONS]

Options:

  • --reports-dir DIR: Directory containing Excel files (default: reports)
  • --output FILE: Output JSON file path (default: output/report.json)
  • --verbose: Print verbose output (default: True)

Examples:

# Use a custom reports directory
python3 report_generator.py --reports-dir /path/to/excel/files

# Specify custom output file
python3 report_generator.py --output /path/to/output/report.json

# Combine options
python3 report_generator.py --reports-dir my_reports --output my_output/report.json

Programmatic Usage

You can also use the report generator in your own Python scripts:

from report_generator import generate_report

# Generate report with default settings
report_data = generate_report()

# Or with custom settings
report_data = generate_report(
    reports_dir="my_reports",
    output_file="my_output/report.json",
    verbose=True
)

# report_data is a dictionary containing the full report structure
print(f"Processed {len(report_data['vendors'])} vendors")

Report Structure

JSON Report Structure

The generated JSON report follows this structure:

{
  "report_generated_at": "2025-11-05T22:00:00",
  "vendors": [
    {
      "vendor_name": "VendorName",
      "total_items": 10,
      "closed_count": 5,
      "open_count": 3,
      "monitor_count": 2,
      "updates_24h": {
        "added": [...],
        "closed": [...],
        "changed_to_monitor": [...]
      },
      "oldest_unaddressed": [...],
      "very_high_priority_items": [...],
      "high_priority_items": [...],
      "closed_items": [...],
      "monitor_items": [...],
      "open_items": [...]
    }
  ],
  "summary": {
    "total_vendors": 5,
    "total_items": 50,
    "total_closed": 25,
    "total_open": 15,
    "total_monitor": 10
  }
}

HTML Report Features

The HTML report includes:

  • Summary Cards: Overview statistics at the top
  • Vendor Tabs: Quick navigation between vendors
  • Status Tabs: Filter by status (All, Yesterday's Updates, Oldest Unaddressed, Closed, Monitor, Open)
  • Search & Filters:
    • Search by item name or description
    • Filter by vendor, status, or priority
  • Quick Filters:
    • Show only vendors with yesterday's updates
    • Show only vendors with oldest unaddressed items
    • Show all vendors
  • Interactive Elements: Click tabs to switch views, use filters to narrow down results

Data Processing Details

Vendor Name Normalization

The tool automatically normalizes vendor names:

  • Handles case variations (e.g., "autstand" → "Autstand")
  • Preserves intentional capitalization (e.g., "AutStand" stays as-is)
  • Normalizes combined vendors (e.g., "Autstand/Beumer")
  • Handles vendors in parentheses (e.g., "MFO (Amazon)")

Status Normalization

Statuses are normalized to:

  • Complete: Items with status containing "complete" or "complette"
  • Monitor: Items with status containing "monitor" or "montor"
  • Incomplete: All other items (default)

Priority Classification

Priorities are classified as:

  • Very High: Priority contains "(1) Very High" or "Very High"
  • High: Priority contains "(2) High" or "High" (but not "Very High")
  • Medium: Priority contains "(3) Medium" or "Medium"
  • Low: Priority contains "(4) Low" or "Low"

24-Hour Window Calculation

The tool uses Baltimore/Eastern timezone (America/New_York) for calculating 24-hour updates:

  • Items are considered "added in last 24h" if their date_identified falls on yesterday's date
  • Items are considered "closed in last 24h" if their date_completed falls on yesterday's date
  • Items are considered "changed to monitor" if their status is Monitor and the date falls within the 24-hour window

Output Files

After running the generator, you'll find:

  • output/report.json: Structured JSON report data
  • output/report.html: Interactive HTML report (open in browser)
  • output/preprocessed_data.txt: Human-readable preprocessed data (for debugging)

Project Structure

vendor_report/
├── report_generator.py      # Main report generation script
├── data_preprocessor.py     # Excel data preprocessing and normalization
├── html_generator.py        # HTML report generation
├── models.py                # Pydantic data models
├── excel_to_text.py         # Utility for Excel to text conversion
├── sharepoint_downloader.py # SharePoint file downloader
├── scheduler.py             # Scheduled report generation
├── api_server.py            # REST API for on-demand reports
├── web_ui.py                # Web UI for easy access
├── config.py                # Configuration management
├── config.yaml.template     # Configuration template
├── requirements.txt         # Python dependencies
├── reports/                 # Directory for input Excel files
├── output/                  # Directory for generated reports
└── README.md               # This file

Troubleshooting

No Excel files found

Ensure your Excel files are in the reports/ directory and have .xlsx or .xls extensions.

Date parsing errors

The tool supports common date formats:

  • MM/DD/YY (e.g., 10/14/25)
  • MM/DD/YYYY (e.g., 10/14/2025)
  • YYYY-MM-DD (e.g., 2025-10-17)
  • YYYY-MM-DD HH:MM:SS (e.g., 2025-10-17 00:00:00)

Permission errors

If you encounter permission errors, ensure you have write access to the output/ directory.

Missing dependencies

If you get import errors, ensure all dependencies are installed:

pip install -r requirements.txt

Timezone Notes

The tool uses Baltimore/Eastern timezone (America/New_York) for all date calculations. This ensures consistent 24-hour window calculations regardless of where the script is run. All dates are stored as timezone-aware datetime objects.

SharePoint Integration

The application can automatically download Excel files from SharePoint before generating reports. This is useful when your source data is stored in SharePoint.

Setup SharePoint Integration

  1. Create a configuration file:

    cp config.yaml.template config.yaml
    
  2. Edit config.yaml and configure SharePoint settings:

    sharepoint:
      enabled: true
      site_url: "https://yourcompany.sharepoint.com/sites/YourSite"
      folder_path: "/Shared Documents/Reports"
      local_dir: "reports"
      use_app_authentication: true  # Recommended for automation
      client_id: "your-azure-ad-client-id"
      client_secret: "your-azure-ad-client-secret"
    
  3. Authentication Options:

    Option A: App Authentication (Recommended)

    • Register an app in Azure AD
    • Grant SharePoint permissions (Sites.Read.All or Sites.ReadWrite.All)
    • Use client_id and client_secret in config
    • Set use_app_authentication: true

    Option B: User Authentication

    • Use your SharePoint username and password
    • Set username and password in config
    • Set use_app_authentication: false
  4. Test SharePoint download:

    python sharepoint_downloader.py
    

Manual SharePoint Download

Download files from SharePoint without generating a report:

python sharepoint_downloader.py

Scheduled Report Generation

The application can automatically generate reports on a schedule, optionally downloading from SharePoint first.

Setup Scheduling

  1. Edit config.yaml:

    scheduler:
      enabled: true
      schedule_type: "interval"  # or "cron"
      interval_hours: 24  # Generate every 24 hours
      # OR use cron expression:
      # cron_expression: "0 8 * * *"  # 8 AM daily
      timezone: "America/New_York"
    
  2. Start the scheduler:

    python scheduler.py
    

    The scheduler will run continuously and generate reports according to your schedule.

  3. Schedule Types:

    • interval: Generate report every N hours
    • cron: Use cron expression for precise scheduling (e.g., "0 8 * * *" for 8 AM daily)
    • once: Run once immediately (for testing)

Running Scheduler as a Service

Linux (systemd):

# Create service file: /etc/systemd/system/vendor-report-scheduler.service
[Unit]
Description=Vendor Report Scheduler
After=network.target

[Service]
Type=simple
User=your-user
WorkingDirectory=/path/to/vendor_report
ExecStart=/usr/bin/python3 /path/to/vendor_report/scheduler.py
Restart=always

[Install]
WantedBy=multi-user.target

# Enable and start
sudo systemctl enable vendor-report-scheduler
sudo systemctl start vendor-report-scheduler

Windows (Task Scheduler):

  • Create a scheduled task that runs python scheduler.py at startup or on a schedule

Web UI & On-Demand Report Generation

The application includes both a Web UI and a REST API for generating reports on demand.

A simple, user-friendly web interface for generating reports without using the terminal.

  1. Start the Web UI server:

    python web_ui.py
    
  2. Open in browser:

    http://localhost:8080
    
  3. Features:

    • One-click report generation
    • Download from SharePoint & generate (single button)
    • View generated reports
    • View service status
    • View configuration
    • No terminal knowledge required!

REST API

The application also includes a REST API for integration with other systems or manual triggers.

Setup API Server

  1. Edit config.yaml:

    api:
      enabled: true
      host: "0.0.0.0"
      port: 8080
      api_key: "your-secret-api-key"  # Optional, for authentication
    
  2. Start the Web UI (recommended):

    python web_ui.py
    

    Then open http://localhost:8080 in your browser.

    OR start the API server (for programmatic access):

    python api_server.py
    
  3. Generate report via API:

    # Without authentication
    curl -X POST http://localhost:8080/api/generate \
      -H "Content-Type: application/json" \
      -d '{"download_from_sharepoint": true}'
    
    # With API key authentication
    curl -X POST http://localhost:8080/api/generate \
      -H "Content-Type: application/json" \
      -H "X-API-Key: your-secret-api-key" \
      -d '{"download_from_sharepoint": true}'
    

API Endpoints

  • POST /api/generate: Generate report on demand

    • Request body (optional):
      {
        "download_from_sharepoint": true,
        "reports_dir": "reports",
        "output_file": "output/report.json"
      }
      
  • GET /api/status: Get service status and configuration

  • GET /health: Health check endpoint

Example: Integration with Webhook

You can trigger report generation from SharePoint webhooks, Power Automate, or any HTTP client:

import requests

response = requests.post(
    'http://your-server:8080/api/generate',
    json={'download_from_sharepoint': True},
    headers={'X-API-Key': 'your-api-key'}
)
print(response.json())

Configuration

The application uses a YAML configuration file (config.yaml) for all settings. You can also use environment variables:

Environment Variables

# SharePoint
export SHAREPOINT_ENABLED=true
export SHAREPOINT_SITE_URL="https://yourcompany.sharepoint.com/sites/YourSite"
export SHAREPOINT_FOLDER_PATH="/Shared Documents/Reports"
export SHAREPOINT_CLIENT_ID="your-client-id"
export SHAREPOINT_CLIENT_SECRET="your-client-secret"
export SHAREPOINT_USE_APP_AUTH=true

# Scheduler
export SCHEDULER_ENABLED=true
export SCHEDULER_INTERVAL_HOURS=24

# API
export API_ENABLED=true
export API_PORT=8080
export API_KEY="your-api-key"

Complete Workflow Example

Here's a complete example setup for automated SharePoint → Report generation:

  1. Setup configuration (config.yaml):

    sharepoint:
      enabled: true
      site_url: "https://company.sharepoint.com/sites/Reports"
      folder_path: "/Shared Documents/Vendor Reports"
      use_app_authentication: true
      client_id: "your-client-id"
      client_secret: "your-client-secret"
    
    scheduler:
      enabled: true
      schedule_type: "cron"
      cron_expression: "0 8 * * *"  # 8 AM daily
      timezone: "America/New_York"
    
    report:
      output_dir: "output"
      reports_dir: "reports"
    
  2. Start scheduler:

    python scheduler.py
    
  3. The scheduler will:

    • Download latest Excel files from SharePoint at 8 AM daily
    • Generate reports automatically
    • Save to output/report.json and output/report.html

License

[Add your license information here]

Contributing

[Add contribution guidelines if applicable]

Support

For issues or questions, please contact [your contact information or issue tracker URL].