Add comprehensive README with setup and usage instructions

2025-11-05 22:43:06 +04:00 · f294ac155e
commit f294ac155e
parent 8f217a87c2
1 changed file with 270 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,270 @@
+# Vendor Report Generator
+
+A Python tool that builds vendor punchlist reports from Excel files. It reads the Excel data, normalizes vendor information, calculates metrics, and writes both a JSON report and an interactive HTML report.
+
+## Features
+
+- **Direct Excel Processing**: Reads Excel files directly using pandas (no LLM required)
+- **Data Normalization**: Automatically normalizes vendor names, statuses, and priorities
+- **24-Hour Updates**: Tracks items added, closed, or moved to Monitor status in the last 24 hours (evaluated in the Baltimore/Eastern timezone, `America/New_York`)
+- **Priority Tracking**: Groups items by priority levels (Very High, High, Medium, Low)
+- **Oldest Unaddressed Items**: Identifies and highlights the oldest 3 unaddressed items per vendor
+- **Interactive HTML Reports**: Generates searchable, filterable HTML reports with tabs and filters
+- **JSON Export**: Exports structured JSON data for further processing
+
+## Requirements
+
+- Python 3.8 or higher
+- Dependencies listed in `requirements.txt`
+
+## Installation
+
+1. **Clone the repository**:
+   ```bash
+   git clone https://gitea.lci.ge/ilia.gurielidze/vendor_report.git
+   cd vendor_report
+   ```
+
+2. **Create a virtual environment** (recommended):
+   ```bash
+   python3 -m venv venv
+   source venv/bin/activate  # On Windows: venv\Scripts\activate
+   ```
+
+3. **Install dependencies**:
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+## Setup
+
+1. **Prepare your Excel files**:
+   - Place your Excel files (`.xlsx` or `.xls`) in the `reports/` directory
+   - Ensure your Excel files have the following columns, in this order (positions are zero-based):
+     - Column 0: Punchlist Name
+     - Column 1: Vendor
+     - Column 2: Priority
+     - Column 3: Description
+     - Column 4: Date Identified
+     - Column 5: Status Updates
+     - Column 6: Issue Image
+     - Column 7: Status
+     - Column 8: Date Completed (optional)
+
+2. **Create necessary directories** (if they don't exist):
+   ```bash
+   mkdir -p reports output
+   ```
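+
+The column order in step 1 is positional, so a row can be mapped to named fields by index. The following is a minimal sketch of that mapping, not the tool's actual code: `COLUMNS` and `row_to_record` are hypothetical names, and the snake_case field names (other than `date_identified` and `date_completed`) are assumptions made for illustration.
+
+```python
+from typing import Dict, Optional, Sequence
+
+# Column order taken from the Setup list; the field names are illustrative.
+COLUMNS = (
+    "punchlist_name", "vendor", "priority", "description", "date_identified",
+    "status_updates", "issue_image", "status", "date_completed",
+)
+
+
+def row_to_record(row: Sequence[object]) -> Dict[str, Optional[object]]:
+    """Map one positional spreadsheet row to named fields."""
+    required = len(COLUMNS) - 1  # Date Completed (column 8) is optional
+    if len(row) < required:
+        raise ValueError(f"expected at least {required} columns, got {len(row)}")
+    # Pad with None so a row without Date Completed still maps cleanly.
+    padded = list(row[:len(COLUMNS)]) + [None] * (len(COLUMNS) - len(row))
+    return dict(zip(COLUMNS, padded))
+
+
+record = row_to_record(
+    ["PL-1", "Autstand", "(2) High", "Loose belt", "10/14/25", "", "", "Incomplete"]
+)
+print(record["vendor"], record["date_completed"])
+```
+
+A row with fewer than eight cells raises `ValueError` in this sketch, which makes a misordered or truncated sheet fail early.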
+
+## Usage
+
+### Basic Usage
+
+Generate a report from Excel files in the `reports/` directory:
+
+```bash
+python3 report_generator.py
+```
+
+This will:
+- Process all Excel files in the `reports/` directory
+- Generate a JSON report at `output/report.json`
+- Generate an HTML report at `output/report.html`
+- Save preprocessed data to `output/preprocessed_data.txt`
+
+### Command-Line Options
+
+```bash
+python3 report_generator.py [OPTIONS]
+```
+
+**Options**:
+- `--reports-dir DIR`: Directory containing Excel files (default: `reports`)
+- `--output FILE`: Output JSON file path (default: `output/report.json`)
+- `--verbose`: Print verbose output (default: True)
+
+**Examples**:
+
+```bash
+# Use a custom reports directory
+python3 report_generator.py --reports-dir /path/to/excel/files
+
+# Specify custom output file
+python3 report_generator.py --output /path/to/output/report.json
+
+# Combine options
+python3 report_generator.py --reports-dir my_reports --output my_output/report.json
+```
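+
+For reference, the options above could be declared with `argparse` roughly as follows. This is a sketch under assumptions, not the parser in `report_generator.py`: `build_parser` is a hypothetical name, and only the flag names and defaults come from the option list.
+
+```python
+import argparse
+
+
+def build_parser() -> argparse.ArgumentParser:
+    """Declare the three documented options with their documented defaults."""
+    parser = argparse.ArgumentParser(description="Generate vendor punchlist reports")
+    parser.add_argument("--reports-dir", default="reports", metavar="DIR",
+                        help="Directory containing Excel files")
+    parser.add_argument("--output", default="output/report.json", metavar="FILE",
+                        help="Output JSON file path")
+    # With default=True, passing --verbose does not change the value.
+    parser.add_argument("--verbose", action="store_true", default=True,
+                        help="Print verbose output")
+    return parser
+
+
+args = build_parser().parse_args(["--reports-dir", "my_reports"])
+print(args.reports_dir, args.output, args.verbose)
+```
+
+If the real parser is defined this way, `--verbose` is effectively always on, because the flag and its default yield the same value.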
+
+### Programmatic Usage
+
+You can also use the report generator in your own Python scripts:
+
+```python
+from report_generator import generate_report
+
+# Generate report with default settings
+report_data = generate_report()
+
+# Or with custom settings
+report_data = generate_report(
+    reports_dir="my_reports",
+    output_file="my_output/report.json",
+    verbose=True
+)
+
+# report_data is a dictionary containing the full report structure
+print(f"Processed {len(report_data['vendors'])} vendors")
+```
+
+## Report Structure
+
+### JSON Report Structure
+
+The generated JSON report follows this structure:
+
+```json
+{
+  "report_generated_at": "2025-11-05T22:00:00",
+  "vendors": [
+    {
+      "vendor_name": "VendorName",
+      "total_items": 10,
+      "closed_count": 5,
+      "open_count": 3,
+      "monitor_count": 2,
+      "updates_24h": {
+        "added": [...],
+        "closed": [...],
+        "changed_to_monitor": [...]
+      },
+      "oldest_unaddressed": [...],
+      "very_high_priority_items": [...],
+      "high_priority_items": [...],
+      "closed_items": [...],
+      "monitor_items": [...],
+      "open_items": [...]
+    }
+  ],
+  "summary": {
+    "total_vendors": 5,
+    "total_items": 50,
+    "total_closed": 25,
+    "total_open": 15,
+    "total_monitor": 10
+  }
+}
+```
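+
+The `summary` block can be cross-checked against the per-vendor counts. The sketch below assumes each summary total is the plain sum of the matching vendor field, which the structure suggests but does not state; `summarize` is a hypothetical helper, and the vendor entries are made-up sample data.
+
+```python
+from typing import Dict, List
+
+
+def summarize(vendors: List[dict]) -> Dict[str, int]:
+    """Recompute the summary totals from the per-vendor counts."""
+    return {
+        "total_vendors": len(vendors),
+        "total_items": sum(v["total_items"] for v in vendors),
+        "total_closed": sum(v["closed_count"] for v in vendors),
+        "total_open": sum(v["open_count"] for v in vendors),
+        "total_monitor": sum(v["monitor_count"] for v in vendors),
+    }
+
+
+# Made-up sample entries using the field names from the structure above.
+vendors = [
+    {"vendor_name": "VendorA", "total_items": 10,
+     "closed_count": 5, "open_count": 3, "monitor_count": 2},
+    {"vendor_name": "VendorB", "total_items": 4,
+     "closed_count": 1, "open_count": 3, "monitor_count": 0},
+]
+summary = summarize(vendors)
+print(summary)
+```
+
+Comparing this result with the `summary` object loaded from `output/report.json` is one way to spot a report that was edited or truncated by hand.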
+
+### HTML Report Features
+
+The HTML report includes:
+
+- **Summary Cards**: Overview statistics at the top
+- **Vendor Tabs**: Quick navigation between vendors
+- **Status Tabs**: Filter by status (All, Yesterday's Updates, Oldest Unaddressed, Closed, Monitor, Open)
+- **Search & Filters**: 
+  - Search by item name or description
+  - Filter by vendor, status, or priority
+- **Quick Filters**: 
+  - Show only vendors with yesterday's updates
+  - Show only vendors with oldest unaddressed items
+  - Show all vendors
+- **Interactive Elements**: Click tabs to switch views, use filters to narrow down results
+
+## Data Processing Details
+
+### Vendor Name Normalization
+
+The tool automatically normalizes vendor names:
+- Handles case variations (e.g., "autstand" → "Autstand")
+- Preserves intentional capitalization (e.g., "AutStand" stays as-is)
+- Normalizes combined vendors (e.g., "Autstand/Beumer")
+- Handles vendors in parentheses (e.g., "MFO (Amazon)")
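+
+The rules above can be approximated in a few lines. This is an illustrative sketch only: `normalize_vendor` is a hypothetical name, and treating an all-lowercase name as a case variation while leaving any mixed-case name untouched is an assumption about how "intentional capitalization" is detected.
+
+```python
+import re
+
+
+def normalize_vendor(raw: str) -> str:
+    """Normalize spacing and casing of a vendor name (illustrative rules)."""
+    name = re.sub(r"\s+", " ", raw).strip()
+    # Keep exactly one space before "(" and none before ")".
+    name = re.sub(r"\s*\(\s*", " (", name)
+    name = re.sub(r"\s*\)", ")", name)
+    # Treat "A/B" as a combined vendor and normalize each side separately.
+    parts = [part.strip() for part in name.split("/")]
+    # Only all-lowercase parts are re-cased; mixed case is left as written.
+    fixed = [part.capitalize() if part.islower() else part for part in parts]
+    return "/".join(fixed).strip()
+
+
+for raw in ("autstand", "AutStand", "autstand / beumer", "MFO(Amazon)"):
+    print(repr(raw), "->", repr(normalize_vendor(raw)))
+```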
+
+### Status Normalization
+
+Statuses are normalized to:
+- **Complete**: Items whose status contains "complete" (or the misspelling "complette")
+- **Monitor**: Items whose status contains "monitor" (or the misspelling "montor")
+- **Incomplete**: All other items (default)
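+
+One pitfall of substring matching is that "incomplete" itself contains "complete". The sketch below checks for "incomplete" first to avoid misclassifying it; whether the tool guards against this is not stated here, and `normalize_status` is a hypothetical name.
+
+```python
+def normalize_status(raw: object) -> str:
+    """Map a free-text status to Complete, Monitor, or Incomplete."""
+    text = str(raw or "").strip().lower()
+    # Guard first: "incomplete" would otherwise match the "complete" test.
+    if "incomplete" in text:
+        return "Incomplete"
+    if "complete" in text or "complette" in text:
+        return "Complete"
+    if "monitor" in text or "montor" in text:
+        return "Monitor"
+    # Everything else, including empty cells, falls back to the default.
+    return "Incomplete"
+
+
+for raw in ("Complete", "complette", "Montor", "Incomplete", None):
+    print(repr(raw), "->", normalize_status(raw))
+```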
+
+### Priority Classification
+
+Priorities are classified as:
+- **Very High**: Priority contains "(1) Very High" or "Very High"
+- **High**: Priority contains "(2) High" or "High" (but not "Very High")
+- **Medium**: Priority contains "(3) Medium" or "Medium"
+- **Low**: Priority contains "(4) Low" or "Low"
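+
+Because "Very High" contains "High", the order of the checks matters. A minimal sketch of the classification, assuming case-insensitive substring matching and using the hypothetical name `classify_priority`:
+
+```python
+from typing import Optional
+
+# Ordered so that "Very High" is tested before "High".
+PRIORITY_LABELS = ("Very High", "High", "Medium", "Low")
+
+
+def classify_priority(raw: object) -> Optional[str]:
+    """Return the first priority label found in the text, or None."""
+    text = str(raw or "").lower()
+    for label in PRIORITY_LABELS:
+        if label.lower() in text:
+            return label
+    return None
+
+
+for raw in ("(1) Very High", "(2) High", "medium", "(4) Low", ""):
+    print(repr(raw), "->", classify_priority(raw))
+```
+
+Returning `None` for unrecognized values is a choice made for this sketch; the tool may handle them differently.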
+
+### 24-Hour Window Calculation
+
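+The tool uses **Baltimore/Eastern timezone (America/New_York)** to calculate 24-hour updates. For added and closed items, the window is yesterday's calendar date in that timezone rather than a rolling 24 hours: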
+- Items are considered "added in last 24h" if their `date_identified` falls on yesterday's date
+- Items are considered "closed in last 24h" if their `date_completed` falls on yesterday's date
+- Items are considered "changed to monitor" if their status is Monitor and the date falls within the 24-hour window
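+
+The "yesterday's date" test for added and closed items can be sketched as below. `today_eastern` and `is_yesterday` are hypothetical helpers, not functions from this project. The sketch uses the standard-library `zoneinfo` module, which requires Python 3.9 or later; on Python 3.8 a backport or another timezone library would be needed.
+
+```python
+from datetime import date, datetime, timedelta
+from zoneinfo import ZoneInfo  # standard library from Python 3.9
+
+
+def today_eastern() -> date:
+    """Current calendar date in America/New_York, wherever the script runs."""
+    return datetime.now(ZoneInfo("America/New_York")).date()
+
+
+def is_yesterday(value: date, today: date) -> bool:
+    """True if value is the calendar day immediately before today."""
+    return value == today - timedelta(days=1)
+
+
+# Fixed dates keep the example deterministic.
+print(is_yesterday(date(2025, 11, 4), today=date(2025, 11, 5)))
+print(is_yesterday(date(2025, 11, 5), today=date(2025, 11, 5)))
+```
+
+In real use, `today` would come from `today_eastern()`, so the result does not depend on the local timezone of the machine.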
+
+## Output Files
+
+After running the generator, you'll find:
+
+- `output/report.json`: Structured JSON report data
+- `output/report.html`: Interactive HTML report (open in browser)
+- `output/preprocessed_data.txt`: Human-readable preprocessed data (for debugging)
+
+## Project Structure
+
+```
+vendor_report/
+├── report_generator.py      # Main report generation script
+├── data_preprocessor.py     # Excel data preprocessing and normalization
+├── html_generator.py        # HTML report generation
+├── models.py                # Pydantic data models
+├── excel_to_text.py         # Utility for Excel to text conversion
+├── requirements.txt         # Python dependencies
+├── reports/                 # Directory for input Excel files
+├── output/                  # Directory for generated reports
+└── README.md                # This file
+```
+
+## Troubleshooting
+
+### No Excel files found
+
+Ensure your Excel files are in the `reports/` directory and have `.xlsx` or `.xls` extensions.
+
+### Date parsing errors
+
+If dates fail to parse, check that they use one of the supported formats:
+- `MM/DD/YY` (e.g., `10/14/25`)
+- `MM/DD/YYYY` (e.g., `10/14/2025`)
+- `YYYY-MM-DD` (e.g., `2025-10-17`)
+- `YYYY-MM-DD HH:MM:SS` (e.g., `2025-10-17 00:00:00`)
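+
+To check whether a value matches one of these formats, each can be tried in turn with `datetime.strptime`. This is a sketch, not the tool's parser: `parse_date` and `DATE_FORMATS` are hypothetical names, and the result is a naive `datetime` with no timezone attached.
+
+```python
+from datetime import datetime
+from typing import Optional
+
+# strptime patterns for the four formats listed above.
+DATE_FORMATS = ("%m/%d/%y", "%m/%d/%Y", "%Y-%m-%d", "%Y-%m-%d %H:%M:%S")
+
+
+def parse_date(text: str) -> Optional[datetime]:
+    """Return the first successful parse, or None if no format matches."""
+    for fmt in DATE_FORMATS:
+        try:
+            return datetime.strptime(text.strip(), fmt)
+        except ValueError:
+            continue  # try the next format
+    return None
+
+
+for sample in ("10/14/25", "10/14/2025", "2025-10-17", "2025-10-17 00:00:00", "17.10.2025"):
+    print(repr(sample), "->", parse_date(sample))
+```
+
+A value that prints `None` here uses a format outside the list and is a likely cause of a parsing error.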
+
+### Permission errors
+
+If you encounter permission errors, ensure you have write access to the `output/` directory.
+
+### Missing dependencies
+
+If you get import errors, ensure all dependencies are installed:
+```bash
+pip install -r requirements.txt
+```
+
+## Timezone Notes
+
+The tool uses **Baltimore/Eastern timezone (America/New_York)** for all date calculations. This ensures consistent 24-hour window calculations regardless of where the script is run. All dates are stored as timezone-aware datetime objects.
+
+## License
+
+[Add your license information here]
+
+## Contributing
+
+[Add contribution guidelines if applicable]
+
+## Support
+
+For issues or questions, please contact [your contact information or issue tracker URL].
+