
Vendor Report Generator

A Python tool that generates comprehensive vendor punchlist reports from Excel files. The tool processes Excel data, normalizes vendor information, calculates metrics, and generates both JSON and interactive HTML reports.

Features

  • Direct Excel Processing: Reads Excel files directly using pandas
  • Data Normalization: Automatically normalizes vendor names, statuses, and priorities
  • 24-Hour Updates: Tracks items added, closed, or changed to monitor status in the last 24 hours (based on Baltimore/Eastern timezone)
  • Priority Tracking: Groups items by priority levels (Very High, High, Medium, Low)
  • Oldest Unaddressed Items: Identifies and highlights the oldest 3 unaddressed items per vendor
  • Interactive HTML Reports: Generates searchable, filterable HTML reports with tabs and filters
  • JSON Export: Exports structured JSON data for further processing

Requirements

  • Python 3.8 or higher
  • Dependencies listed in requirements.txt

Installation

  1. Clone the repository:

    git clone https://gitea.lci.ge/ilia.gurielidze/vendor_report.git
    cd vendor_report
    
  2. Create a virtual environment (recommended):

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    

Setup

  1. Prepare your Excel files:

    • Place your Excel files (.xlsx or .xls) in the reports/ directory
    • Ensure your Excel files have the following columns (in order):
      • Column 0: Punchlist Name
      • Column 1: Vendor
      • Column 2: Priority
      • Column 3: Description
      • Column 4: Date Identified
      • Column 5: Status Updates
      • Column 6: Issue Image
      • Column 7: Status
      • Column 8: Date Completed (optional)
  2. Create necessary directories (if they don't exist):

    mkdir -p reports output
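
The positional column layout listed above could be mapped to named fields along the following lines. This is a minimal sketch, not the tool's actual code: the field names and the `row_to_item` helper are assumptions made for illustration and are not taken from data_preprocessor.py.

```python
from typing import Any, Dict, Sequence

# Hypothetical field names, one per column position listed above.
COLUMN_NAMES = [
    "punchlist_name",   # Column 0
    "vendor",           # Column 1
    "priority",         # Column 2
    "description",      # Column 3
    "date_identified",  # Column 4
    "status_updates",   # Column 5
    "issue_image",      # Column 6
    "status",           # Column 7
    "date_completed",   # Column 8 (optional)
]


def row_to_item(row: Sequence[Any]) -> Dict[str, Any]:
    """Map one spreadsheet row to a dict keyed by field name.

    Trailing columns missing from the row (e.g. Date Completed) become None.
    """
    padded = list(row) + [None] * (len(COLUMN_NAMES) - len(row))
    return dict(zip(COLUMN_NAMES, padded))
```

With pandas, rows could be fed to such a helper from `df.itertuples(index=False)`.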
    

Usage

Basic Usage

Generate a report from Excel files in the reports/ directory:

python3 report_generator.py

This will:

  • Process all Excel files in the reports/ directory
  • Generate a JSON report at output/report.json
  • Generate an HTML report at output/report.html
  • Save preprocessed data to output/preprocessed_data.txt

Command-Line Options

python3 report_generator.py [OPTIONS]

Options:

  • --reports-dir DIR: Directory containing Excel files (default: reports)
  • --output FILE: Output JSON file path (default: output/report.json)
  • --verbose: Print verbose output (default: True)

Examples:

# Use a custom reports directory
python3 report_generator.py --reports-dir /path/to/excel/files

# Specify custom output file
python3 report_generator.py --output /path/to/output/report.json

# Combine options
python3 report_generator.py --reports-dir my_reports --output my_output/report.json

Programmatic Usage

You can also use the report generator in your own Python scripts:

from report_generator import generate_report

# Generate report with default settings
report_data = generate_report()

# Or with custom settings
report_data = generate_report(
    reports_dir="my_reports",
    output_file="my_output/report.json",
    verbose=True
)

# report_data is a dictionary containing the full report structure
print(f"Processed {len(report_data['vendors'])} vendors")

Report Structure

JSON Report Structure

The generated JSON report follows this structure:

{
  "report_generated_at": "2025-11-05T22:00:00",
  "vendors": [
    {
      "vendor_name": "VendorName",
      "total_items": 10,
      "closed_count": 5,
      "open_count": 3,
      "monitor_count": 2,
      "updates_24h": {
        "added": [...],
        "closed": [...],
        "changed_to_monitor": [...]
      },
      "oldest_unaddressed": [...],
      "very_high_priority_items": [...],
      "high_priority_items": [...],
      "closed_items": [...],
      "monitor_items": [...],
      "open_items": [...]
    }
  ],
  "summary": {
    "total_vendors": 5,
    "total_items": 50,
    "total_closed": 25,
    "total_open": 15,
    "total_monitor": 10
  }
}
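
As a sketch of the "further processing" the JSON export is meant for, the snippet below computes a per-vendor closure rate from data in the shape shown above. The `vendor_closure_rates` helper and the inline sample are illustrative assumptions; only the key names come from the structure above.

```python
import json

# Small sample in the shape shown above (values are illustrative only).
sample = """
{
  "report_generated_at": "2025-11-05T22:00:00",
  "vendors": [
    {"vendor_name": "VendorName", "total_items": 10,
     "closed_count": 5, "open_count": 3, "monitor_count": 2}
  ],
  "summary": {"total_vendors": 1, "total_items": 10,
              "total_closed": 5, "total_open": 3, "total_monitor": 2}
}
"""


def vendor_closure_rates(report: dict) -> dict:
    """Return {vendor_name: fraction of items closed} for each vendor."""
    rates = {}
    for vendor in report["vendors"]:
        total = vendor["total_items"]
        # Guard against vendors with no items to avoid division by zero.
        rates[vendor["vendor_name"]] = vendor["closed_count"] / total if total else 0.0
    return rates


report = json.loads(sample)
rates = vendor_closure_rates(report)
```

For a real run, the sample string would be replaced by the contents of output/report.json, loaded with `json.load` inside a `with open(...)` block.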

HTML Report Features

The HTML report includes:

  • Summary Cards: Overview statistics at the top
  • Vendor Tabs: Quick navigation between vendors
  • Status Tabs: Filter by status (All, Yesterday's Updates, Oldest Unaddressed, Closed, Monitor, Open)
  • Search & Filters:
    • Search by item name or description
    • Filter by vendor, status, or priority
  • Quick Filters:
    • Show only vendors with yesterday's updates
    • Show only vendors with oldest unaddressed items
    • Show all vendors
  • Interactive Elements: Click tabs to switch views, use filters to narrow down results

Data Processing Details

Vendor Name Normalization

The tool automatically normalizes vendor names:

  • Handles case variations (e.g., "autstand" → "Autstand")
  • Preserves intentional capitalization (e.g., "AutStand" stays as-is)
  • Normalizes combined vendors (e.g., "Autstand/Beumer")
  • Handles vendors in parentheses (e.g., "MFO (Amazon)")
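
The rules above might be approximated as follows. This is a hedged sketch under stated assumptions: the `normalize_vendor` name is hypothetical, and the rule "capitalize only words that are entirely lowercase" is one possible reading of the bullets, not necessarily what data_preprocessor.py does.

```python
import re


def normalize_vendor(raw: str) -> str:
    """Normalize a vendor name (illustrative rules only)."""
    # Collapse whitespace runs and trim the ends.
    name = re.sub(r"\s+", " ", raw).strip()
    # Remove spaces around "/" so "Autstand / Beumer" becomes "Autstand/Beumer".
    name = re.sub(r"\s*/\s*", "/", name)

    def fix_word(match):
        word = match.group(0)
        # Only all-lowercase words are capitalized; mixed or upper case
        # ("AutStand", "MFO") is treated as intentional and kept.
        return word.capitalize() if word.islower() else word

    return re.sub(r"[A-Za-z]+", fix_word, name)
```

Because each alphabetic word is handled separately, combined vendors and names in parentheses go through the same rule without special cases.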

Status Normalization

Statuses are normalized to:

  • Complete: Items with status containing "complete" or "complette" (but not "incomplete", which is checked first to avoid a substring match)
  • Monitor: Items with status containing "monitor" or "montor"
  • Incomplete: All other items (default)
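
A minimal sketch of this mapping is shown below. The `normalize_status` name is an assumption for illustration. The key detail is the order of the checks: "incomplete" contains "complete" as a substring, so it has to be tested first.

```python
def normalize_status(raw) -> str:
    """Map a free-text status to Complete, Monitor, or Incomplete."""
    text = str(raw or "").strip().lower()
    # "incomplete" contains "complete", so it must be tested first.
    if "incomplete" in text:
        return "Incomplete"
    # Accept the known misspelling "complette" as well.
    if "complete" in text or "complette" in text:
        return "Complete"
    # Accept the known misspelling "montor" as well.
    if "monitor" in text or "montor" in text:
        return "Monitor"
    # Anything else, including blank cells, defaults to Incomplete.
    return "Incomplete"
```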

Priority Classification

Priorities are classified as:

  • Very High: Priority contains "(1) Very High" or "Very High"
  • High: Priority contains "(2) High" or "High" (but not "Very High")
  • Medium: Priority contains "(3) Medium" or "Medium"
  • Low: Priority contains "(4) Low" or "Low"
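
The classification above can be sketched as an ordered substring check. The `classify_priority` name and the `None` result for unrecognized values are assumptions; the source does not say how unmatched priorities are handled.

```python
from typing import Optional


def classify_priority(raw) -> Optional[str]:
    """Return the priority label found in `raw`, or None if none matches."""
    text = str(raw or "").lower()
    # Order matters: "very high" contains "high", so test it first.
    for label in ("Very High", "High", "Medium", "Low"):
        if label.lower() in text:
            return label
    return None
```

Matching on the label text alone covers both the numbered form, such as "(2) High", and the bare form, such as "High".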

24-Hour Window Calculation

The tool uses Baltimore/Eastern timezone (America/New_York) for calculating 24-hour updates:

  • Items are considered "added in last 24h" if their date_identified falls on yesterday's date
  • Items are considered "closed in last 24h" if their date_completed falls on yesterday's date
  • Items are considered "changed to monitor" if their status is Monitor and the date falls within the 24-hour window
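
The "falls on yesterday's date" check could look roughly like this. It is a sketch only: the `is_yesterday_eastern` helper is hypothetical, and it uses the standard-library `zoneinfo` module, which requires Python 3.9 or later (on Python 3.8 a backport such as backports.zoneinfo would be needed).

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

EASTERN = ZoneInfo("America/New_York")


def is_yesterday_eastern(moment: datetime, now: datetime) -> bool:
    """True if `moment` falls on the calendar day before `now`, in Eastern time.

    Both arguments must be timezone-aware.
    """
    # Convert both instants to Eastern time before comparing calendar dates.
    yesterday = now.astimezone(EASTERN).date() - timedelta(days=1)
    return moment.astimezone(EASTERN).date() == yesterday
```

Passing `now` explicitly keeps the check deterministic, which makes it straightforward to test regardless of the machine's local timezone.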

Output Files

After running the generator, you'll find:

  • output/report.json: Structured JSON report data
  • output/report.html: Interactive HTML report (open in browser)
  • output/preprocessed_data.txt: Human-readable preprocessed data (for debugging)

Project Structure

vendor_report/
├── report_generator.py      # Main report generation script
├── data_preprocessor.py     # Excel data preprocessing and normalization
├── html_generator.py        # HTML report generation
├── models.py                # Pydantic data models
├── excel_to_text.py         # Utility for Excel to text conversion
├── requirements.txt         # Python dependencies
├── reports/                 # Directory for input Excel files
├── output/                  # Directory for generated reports
└── README.md                # This file

Troubleshooting

No Excel files found

Ensure your Excel files are in the reports/ directory and have .xlsx or .xls extensions.

Date parsing errors

If dates fail to parse, check that they use one of the supported formats:

  • MM/DD/YY (e.g., 10/14/25)
  • MM/DD/YYYY (e.g., 10/14/2025)
  • YYYY-MM-DD (e.g., 2025-10-17)
  • YYYY-MM-DD HH:MM:SS (e.g., 2025-10-17 00:00:00)
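
To check whether a value matches one of the formats listed above, a small standalone parser can help. This is a sketch, not the tool's own parser: `parse_date` and `DATE_FORMATS` are hypothetical names, and the result is a naive datetime with no timezone attached.

```python
from datetime import datetime
from typing import Optional

# strptime patterns for the four formats listed above.
DATE_FORMATS = (
    "%m/%d/%y",           # 10/14/25
    "%m/%d/%Y",           # 10/14/2025
    "%Y-%m-%d",           # 2025-10-17
    "%Y-%m-%d %H:%M:%S",  # 2025-10-17 00:00:00
)


def parse_date(text: str) -> Optional[datetime]:
    """Try each supported format in turn; return None if none matches."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    return None
```

A value for which this returns `None` is in none of the listed formats and is a likely source of a parsing error.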

Permission errors

If you encounter permission errors, ensure you have write access to the output/ directory.

Missing dependencies

If you get import errors, ensure all dependencies are installed:

pip install -r requirements.txt

Timezone Notes

The tool uses Baltimore/Eastern timezone (America/New_York) for all date calculations. This ensures consistent 24-hour window calculations regardless of where the script is run. All dates are stored as timezone-aware datetime objects.

License

[Add your license information here]

Contributing

[Add contribution guidelines if applicable]

Support

For issues or questions, please contact [your contact information or issue tracker URL].
