Add comprehensive README with setup and usage instructions

2025-11-05 22:43:06 +04:00 · 2025-11-05 22:43:06 +04:00 · f294ac155e
commit f294ac155e
parent 8f217a87c2
1 changed files with 270 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,270 @@
 # Vendor Report Generator
 A Python tool that generates comprehensive vendor punchlist reports from Excel files. The tool processes Excel data, normalizes vendor information, calculates metrics, and generates both JSON and interactive HTML reports.
 ## Features
 - **Direct Excel Processing**: Reads Excel files directly using pandas (no LLM required)
 - **Data Normalization**: Automatically normalizes vendor names, statuses, and priorities
 - **24-Hour Updates**: Tracks items added, closed, or changed to monitor status in the last 24 hours (based on Baltimore/Eastern timezone)
 - **Priority Tracking**: Groups items by priority levels (Very High, High, Medium, Low)
 - **Oldest Unaddressed Items**: Identifies and highlights the three oldest unaddressed items per vendor
 - **Interactive HTML Reports**: Generates searchable, filterable HTML reports with tabs and filters
 - **JSON Export**: Exports structured JSON data for further processing
 ## Requirements
 - Python 3.8 or higher
 - Dependencies listed in `requirements.txt`
 ## Installation
 1. **Clone the repository**:
   ```bash
   git clone https://gitea.lci.ge/ilia.gurielidze/vendor_report.git
   cd vendor_report
   ```
 2. **Create a virtual environment** (recommended):
   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```
 3. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```
 ## Setup
 1. **Prepare your Excel files**:
   - Place your Excel files (`.xlsx` or `.xls`) in the `reports/` directory
   - Ensure your Excel files have the following columns (in order):
     - Column 0: Punchlist Name
     - Column 1: Vendor
     - Column 2: Priority
     - Column 3: Description
     - Column 4: Date Identified
     - Column 5: Status Updates
     - Column 6: Issue Image
     - Column 7: Status
     - Column 8: Date Completed (optional)
 2. **Create necessary directories** (if they don't exist):
   ```bash
   mkdir -p reports output
   ```
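The column order above can be expressed as a positional mapping. This is a minimal sketch rather than the tool's actual code: `COLUMN_NAMES` and `row_to_record` are hypothetical names, and the snake_case field names are assumptions chosen to match the `date_identified` and `date_completed` fields mentioned elsewhere in this README.

```python
# Hypothetical field names for the nine positional columns listed above.
COLUMN_NAMES = [
    "punchlist_name",   # Column 0
    "vendor",           # Column 1
    "priority",         # Column 2
    "description",      # Column 3
    "date_identified",  # Column 4
    "status_updates",   # Column 5
    "issue_image",      # Column 6
    "status",           # Column 7
    "date_completed",   # Column 8 (optional)
]


def row_to_record(row):
    """Map one spreadsheet row (a sequence of cell values) to a dict by position."""
    required = len(COLUMN_NAMES) - 1  # every column except the optional last one
    if len(row) < required:
        raise ValueError(f"expected at least {required} columns, got {len(row)}")
    # Pad a missing optional "Date Completed" cell with None.
    padded = list(row) + [None] * (len(COLUMN_NAMES) - len(row))
    return dict(zip(COLUMN_NAMES, padded))
```

Because the mapping is positional, a spreadsheet with reordered columns would be read incorrectly without raising an error, which is why the order matters.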
 ## Usage
 ### Basic Usage
 Generate a report from Excel files in the `reports/` directory:
 ```bash
 python3 report_generator.py
 ```
 This will:
 - Process all Excel files in the `reports/` directory
 - Generate a JSON report at `output/report.json`
 - Generate an HTML report at `output/report.html`
 - Save preprocessed data to `output/preprocessed_data.txt`
 ### Command-Line Options
 ```bash
 python3 report_generator.py [OPTIONS]
 ```
 **Options**:
 - `--reports-dir DIR`: Directory containing Excel files (default: `reports`)
 - `--output FILE`: Output JSON file path (default: `output/report.json`)
 - `--verbose`: Print verbose output (default: True)
 **Examples**:
 ```bash
 # Use a custom reports directory
 python3 report_generator.py --reports-dir /path/to/excel/files
 # Specify custom output file
 python3 report_generator.py --output /path/to/output/report.json
 # Combine options
 python3 report_generator.py --reports-dir my_reports --output my_output/report.json
 ```
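For readers who want to see how these options could be declared, here is a hedged sketch using the standard `argparse` module. `build_parser` is a hypothetical name, and treating `--verbose` as a boolean flag that is already on by default is an assumption based on the "(default: True)" note above; the real script may define its options differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Generate vendor punchlist reports")
    parser.add_argument("--reports-dir", default="reports",
                        help="Directory containing Excel files")
    parser.add_argument("--output", default="output/report.json",
                        help="Output JSON file path")
    # Assumption: --verbose is a boolean flag that is already on by default.
    parser.add_argument("--verbose", action="store_true", default=True,
                        help="Print verbose output")
    return parser


args = build_parser().parse_args(["--reports-dir", "my_reports"])
print(args.reports_dir, args.output, args.verbose)
```

Note that `argparse` exposes `--reports-dir` as the attribute `reports_dir`, replacing the hyphen with an underscore.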
 ### Programmatic Usage
 You can also use the report generator in your own Python scripts:
 ```python
 from report_generator import generate_report
 # Generate report with default settings
 report_data = generate_report()
 # Or with custom settings
 report_data = generate_report(
    reports_dir="my_reports",
    output_file="my_output/report.json",
    verbose=True
 )
 # report_data is a dictionary containing the full report structure
 print(f"Processed {len(report_data['vendors'])} vendors")
 ```
 ## Report Structure
 ### JSON Report Structure
 The generated JSON report follows this structure:
 ```json
 {
  "report_generated_at": "2025-11-05T22:00:00",
  "vendors": [
    {
      "vendor_name": "VendorName",
      "total_items": 10,
      "closed_count": 5,
      "open_count": 3,
      "monitor_count": 2,
      "updates_24h": {
        "added": [...],
        "closed": [...],
        "changed_to_monitor": [...]
      },
      "oldest_unaddressed": [...],
      "very_high_priority_items": [...],
      "high_priority_items": [...],
      "closed_items": [...],
      "monitor_items": [...],
      "open_items": [...]
    }
  ],
  "summary": {
    "total_vendors": 5,
    "total_items": 50,
    "total_closed": 25,
    "total_open": 15,
    "total_monitor": 10
  }
 }
 ```
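A consumer of the JSON report may want to sanity-check it before further processing. The sketch below assumes that each `summary` total is the sum of the corresponding per-vendor count, which is consistent with the example numbers above but is not stated explicitly. `check_summary` is a hypothetical helper, not part of the tool.

```python
def check_summary(report):
    """Return a list of mismatches between per-vendor counts and the summary."""
    vendors = report["vendors"]
    # Assumption: summary totals are plain sums over the vendor entries.
    expected = {
        "total_vendors": len(vendors),
        "total_items": sum(v["total_items"] for v in vendors),
        "total_closed": sum(v["closed_count"] for v in vendors),
        "total_open": sum(v["open_count"] for v in vendors),
        "total_monitor": sum(v["monitor_count"] for v in vendors),
    }
    summary = report["summary"]
    return [
        f"{key}: summary={summary.get(key)!r}, computed={value}"
        for key, value in expected.items()
        if summary.get(key) != value
    ]


report = {
    "vendors": [
        {"total_items": 10, "closed_count": 5, "open_count": 3, "monitor_count": 2},
    ],
    "summary": {"total_vendors": 1, "total_items": 10, "total_closed": 5,
                "total_open": 3, "total_monitor": 2},
}
print(check_summary(report))  # an empty list means the totals agree
```

In practice the `report` dictionary would come from `json.load` on `output/report.json` or from the return value of `generate_report()`.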
 ### HTML Report Features
 The HTML report includes:
 - **Summary Cards**: Overview statistics at the top
 - **Vendor Tabs**: Quick navigation between vendors
 - **Status Tabs**: Filter by status (All, Yesterday's Updates, Oldest Unaddressed, Closed, Monitor, Open)
 - **Search & Filters**: 
  - Search by item name or description
  - Filter by vendor, status, or priority
 - **Quick Filters**: 
  - Show only vendors with yesterday's updates
  - Show only vendors with oldest unaddressed items
  - Show all vendors
 - **Interactive Elements**: Click tabs to switch views, use filters to narrow down results
 ## Data Processing Details
 ### Vendor Name Normalization
 The tool automatically normalizes vendor names:
 - Handles case variations (e.g., "autstand" → "Autstand")
 - Preserves intentional capitalization (e.g., "AutStand" stays as-is)
 - Normalizes combined vendors (e.g., "Autstand/Beumer")
 - Handles vendors in parentheses (e.g., "MFO (Amazon)")
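One way to get the behavior described above is to capitalize only words that are entirely lowercase, leaving any word that already contains a capital letter untouched. This is a sketch under that assumption; `normalize_vendor` is a hypothetical name and the tool's actual rules may differ, for example in how all-uppercase names are treated.

```python
import re


def normalize_vendor(name):
    """Capitalize all-lowercase words; leave words with existing capitals alone."""
    # Collapse runs of whitespace and drop spaces around "/" in combined vendors.
    cleaned = re.sub(r"\s+", " ", name.strip())
    cleaned = re.sub(r"\s*/\s*", "/", cleaned)

    def fix_word(match):
        word = match.group(0)
        return word.capitalize() if word.islower() else word

    # Only alphabetic runs are touched, so "/" and parentheses pass through.
    return re.sub(r"[A-Za-z]+", fix_word, cleaned)


for raw in ["autstand", "AutStand", " autstand / beumer ", "MFO (amazon)"]:
    print(repr(raw), "->", repr(normalize_vendor(raw)))
```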
 ### Status Normalization
 Statuses are normalized to:
 - **Complete**: Items whose status contains "complete" or the misspelling "complette"
 - **Monitor**: Items whose status contains "monitor" or the misspelling "montor"
 - **Incomplete**: All other items (default)
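The substring rules above can be sketched as follows. One subtlety is that the word "incomplete" itself contains "complete", so this sketch checks for it first; whether the tool does the same is an assumption, as is the case-insensitive matching. `normalize_status` is a hypothetical name.

```python
def normalize_status(raw):
    """Map a free-text status cell to Complete, Monitor, or Incomplete."""
    text = str(raw or "").strip().lower()
    # Assumption: test "incomplete" first, because it contains "complete".
    if "incomplete" in text:
        return "Incomplete"
    if "complete" in text or "complette" in text:
        return "Complete"
    if "monitor" in text or "montor" in text:
        return "Monitor"
    # Everything else, including empty cells, falls back to the default.
    return "Incomplete"
```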
 ### Priority Classification
 Priorities are classified as:
 - **Very High**: Priority contains "(1) Very High" or "Very High"
 - **High**: Priority contains "(2) High" or "High" (but not "Very High")
 - **Medium**: Priority contains "(3) Medium" or "Medium"
 - **Low**: Priority contains "(4) Low" or "Low"
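Because "Very High" contains "High", the order of the checks matters. The following sketch tests the labels from most to least specific; `classify_priority` is a hypothetical name, and both the case-insensitive matching and the `None` result for unrecognized values are assumptions.

```python
def classify_priority(raw):
    """Return the first matching priority label, or None if nothing matches."""
    text = str(raw or "").lower()
    # Order matters: "very high" must be tested before "high".
    for label in ("Very High", "High", "Medium", "Low"):
        if label.lower() in text:
            return label
    return None  # Assumption: unrecognized priorities are left unclassified.
```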
 ### 24-Hour Window Calculation
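 The tool uses **Baltimore/Eastern timezone (America/New_York)** for calculating 24-hour updates. For added and closed items, the window is the previous calendar day in that timezone rather than a rolling 24 hours: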
 - Items are considered "added in last 24h" if their `date_identified` falls on yesterday's date
 - Items are considered "closed in last 24h" if their `date_completed` falls on yesterday's date
 - Items are considered "changed to monitor" if their status is Monitor and the date falls within the 24-hour window
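The "yesterday's date" rule for added and closed items can be sketched with the standard `zoneinfo` module. This is an illustration, not the tool's code: `yesterday_eastern` and `fell_on_yesterday` are hypothetical names, and `zoneinfo` requires Python 3.9 or later, so on Python 3.8 a different timezone library would be needed.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library from Python 3.9

EASTERN = ZoneInfo("America/New_York")


def yesterday_eastern(now=None):
    """Return yesterday's calendar date as seen in America/New_York."""
    # `now` should be timezone-aware; it defaults to the current UTC time.
    now = now or datetime.now(timezone.utc)
    return now.astimezone(EASTERN).date() - timedelta(days=1)


def fell_on_yesterday(value, now=None):
    """True if a date (or aware datetime) falls on yesterday's Eastern date."""
    if isinstance(value, datetime):
        value = value.astimezone(EASTERN).date()
    return value == yesterday_eastern(now)
```

Passing `now` explicitly keeps the check deterministic in tests. For example, 03:00 UTC on 2025-11-06 is still the evening of 2025-11-05 in New York, so "yesterday" is 2025-11-04.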
 ## Output Files
 After running the generator, you'll find:
 - `output/report.json`: Structured JSON report data
 - `output/report.html`: Interactive HTML report (open in browser)
 - `output/preprocessed_data.txt`: Human-readable preprocessed data (for debugging)
 ## Project Structure
 ```
 vendor_report/
 ├── report_generator.py      # Main report generation script
 ├── data_preprocessor.py     # Excel data preprocessing and normalization
 ├── html_generator.py        # HTML report generation
 ├── models.py                # Pydantic data models
 ├── excel_to_text.py         # Utility for Excel to text conversion
 ├── requirements.txt         # Python dependencies
 ├── reports/                 # Directory for input Excel files
 ├── output/                  # Directory for generated reports
 └── README.md                # This file
 ```
 ## Troubleshooting
 ### No Excel files found
 Ensure your Excel files are in the `reports/` directory and have `.xlsx` or `.xls` extensions.
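To check what the generator is likely to pick up, you can list the candidate files yourself. This is a sketch with a hypothetical helper, `find_excel_files`; matching extensions case-insensitively and looking only at the top level of the directory are assumptions about the tool's behavior.

```python
from pathlib import Path


def find_excel_files(reports_dir="reports"):
    """Return .xlsx and .xls files directly inside reports_dir, sorted by name."""
    folder = Path(reports_dir)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in {".xlsx", ".xls"}
    )


for path in find_excel_files("reports"):
    print(path.name)
```

If nothing is printed, the directory is missing, empty, or contains files with other extensions.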
 ### Date parsing errors
 The tool supports common date formats:
 - `MM/DD/YY` (e.g., `10/14/25`)
 - `MM/DD/YYYY` (e.g., `10/14/2025`)
 - `YYYY-MM-DD` (e.g., `2025-10-17`)
 - `YYYY-MM-DD HH:MM:SS` (e.g., `2025-10-17 00:00:00`)
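If a date cell fails to parse, you can test the value against the formats listed above. The sketch below tries each format in turn with `datetime.strptime`; `DATE_FORMATS` and `parse_date` are hypothetical names, and returning `None` for unparseable input is an assumption rather than the tool's documented behavior.

```python
from datetime import datetime

# strptime equivalents of the four documented formats.
DATE_FORMATS = ("%m/%d/%y", "%m/%d/%Y", "%Y-%m-%d", "%Y-%m-%d %H:%M:%S")


def parse_date(text):
    """Try each documented format in turn; return None if none matches."""
    cleaned = str(text).strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # wrong format, try the next one
    return None
```

The result is a naive `datetime`; attaching the America/New_York timezone would be a separate step.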
 ### Permission errors
 If you encounter permission errors, ensure you have write access to the `output/` directory.
 ### Missing dependencies
 If you get import errors, ensure all dependencies are installed:
 ```bash
 pip install -r requirements.txt
 ```
 ## Timezone Notes
 The tool uses **Baltimore/Eastern timezone (America/New_York)** for all date calculations. This ensures consistent 24-hour window calculations regardless of where the script is run. All dates are stored as timezone-aware datetime objects.
 ## License
 [Add your license information here]
 ## Contributing
 [Add contribution guidelines if applicable]
 ## Support
 For issues or questions, please contact [your contact information or issue tracker URL].