About Categenie

Intelligent data cleaning for attribute-value standardization

Our Mission

Transform messy, inconsistent attribute data into clean, structured, and validated information that is ready for analysis and downstream applications.

How It Works

1. Remove Prefixes

Strip "rp_" and other prefixes from values
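A minimal sketch of prefix stripping in Python, assuming a configurable tuple of known prefixes. `remove_prefixes` and `PREFIXES` are hypothetical names, and only "rp_" comes from the step above. Matching is case-insensitive and strips at most one prefix.

```python
# Hypothetical prefix list: "rp_" is named in the step above; any others
# would be configured per dataset.
PREFIXES = ("rp_",)


def remove_prefixes(value: str, prefixes: tuple = PREFIXES) -> str:
    """Strip the first matching prefix (case-insensitive) and trim whitespace."""
    cleaned = value.strip()
    for prefix in prefixes:
        if cleaned.lower().startswith(prefix.lower()):
            return cleaned[len(prefix):].strip()
    return cleaned


print(remove_prefixes("rp_12 in"))  # 12 in
print(remove_prefixes("Steel"))     # Steel
```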

2. Numerical Without Units

Extract plain numbers, decimals, and fractions
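One way to recognize bare numbers, assuming values arrive as strings; `parse_unitless_number` is a hypothetical helper. The standard-library `fractions.Fraction` parses integers, decimals, and simple fractions such as "3/8" exactly before the result is converted to `float`.

```python
import re
from fractions import Fraction
from typing import Optional

# An optional sign followed by an integer, a decimal, or a simple fraction.
_UNITLESS = re.compile(r"^[+-]?(?:\d+(?:\.\d+)?|\.\d+|\d+/\d+)$")


def parse_unitless_number(value: str) -> Optional[float]:
    """Hypothetical helper: return the value as a float, or None if it is not a bare number."""
    text = value.strip()
    if not _UNITLESS.match(text):
        return None  # has units or other text: left for later steps
    try:
        return float(Fraction(text))
    except ZeroDivisionError:
        return None  # e.g. "3/0"


print(parse_unitless_number("3/8"))    # 0.375
print(parse_unitless_number("12 in"))  # None
```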

3. Numerical With Units

Validate and standardize unit measurements
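A sketch of unit validation and standardization, assuming a hand-maintained alias table. `UNIT_ALIASES` and `parse_with_unit` are hypothetical, and the table is illustrative rather than Categenie's actual unit list. Unrecognized units return `None` so they can be routed to review instead of guessed.

```python
import re
from typing import Optional, Tuple

# Hypothetical alias table mapping raw spellings to one canonical unit.
UNIT_ALIASES = {
    "in": "in", "inch": "in", "inches": "in", '"': "in",
    "mm": "mm", "millimeter": "mm", "millimeters": "mm",
    "lb": "lb", "lbs": "lb", "pound": "lb", "pounds": "lb",
}

# A number, optional whitespace, then a unit token (letters or an inch mark).
_VALUE_UNIT = re.compile(r'^\s*([+-]?\d+(?:\.\d+)?)\s*([A-Za-z"]+\.?)\s*$')


def parse_with_unit(value: str) -> Optional[Tuple[float, str]]:
    """Return (number, canonical_unit), or None if the unit is not recognized."""
    match = _VALUE_UNIT.match(value)
    if not match:
        return None
    number, raw_unit = match.groups()
    unit = UNIT_ALIASES.get(raw_unit.lower().rstrip("."))
    if unit is None:
        return None  # unknown unit: a candidate for "Needs Review"
    return float(number), unit


print(parse_with_unit("12 Inches"))   # (12.0, 'in')
print(parse_with_unit("5 furlongs"))  # None
```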

4. Non-Numerical Values

Classify values as categorical or free-text (varchar) data
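Categorical versus free-text classification could be approached per column with a cardinality heuristic. This sketch assumes that approach; `classify_text_column` and both thresholds are hypothetical and would need tuning on real data.

```python
from typing import List


def classify_text_column(values: List[str], max_unique: int = 20,
                         max_unique_ratio: float = 0.5) -> str:
    """Label a column of non-numeric values as "categorical" or "varchar".

    Hypothetical heuristic: a small set of distinct values that repeat across
    rows suggests a fixed vocabulary; otherwise treat the column as free text.
    """
    cleaned = [v.strip().lower() for v in values if v and v.strip()]
    if not cleaned:
        return "varchar"
    distinct = set(cleaned)
    if len(distinct) <= max_unique and len(distinct) / len(cleaned) <= max_unique_ratio:
        return "categorical"
    return "varchar"


print(classify_text_column(["Steel", "steel", "Brass", "Steel", "Brass", "Aluminum"]))  # categorical
print(classify_text_column(["Fits most sinks", "Includes mounting hardware"]))  # varchar
```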

5. Thread Sizes

Parse thread specifications (e.g., 1/2"-13)
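A sketch for imperial thread callouts like the example above, assuming the form diameter, optional inch mark, dash, threads per inch; `parse_thread_size` and its output keys are hypothetical. Metric callouts (e.g., M8 x 1.25) would need a separate pattern.

```python
import re
from fractions import Fraction
from typing import Optional

# Imperial callouts such as 1/2"-13 or 3/8-16: a fractional or decimal diameter,
# an optional inch mark, a dash, then the threads-per-inch count.
_THREAD = re.compile(r'^\s*(\d+/\d+|\d+(?:\.\d+)?)\s*"?\s*-\s*(\d+)\s*$')


def parse_thread_size(value: str) -> Optional[dict]:
    """Hypothetical helper: return the diameter in inches and threads per inch, or None."""
    match = _THREAD.match(value)
    if not match:
        return None
    diameter, tpi = match.groups()
    try:
        diameter_in = float(Fraction(diameter))
    except ZeroDivisionError:
        return None
    return {"diameter_in": diameter_in, "threads_per_inch": int(tpi)}


print(parse_thread_size('1/2"-13'))  # {'diameter_in': 0.5, 'threads_per_inch': 13}
```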

6. Dual Units

Handle compound units (e.g., cu in, degree F)
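Multi-word units can be matched against a lookup table after normalizing case and whitespace. `DUAL_UNITS`, its canonical symbols, and `parse_dual_unit` are hypothetical choices for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical table of two-word units and the canonical symbol for each.
DUAL_UNITS = {
    "cu in": "in^3",
    "cubic inch": "in^3",
    "sq ft": "ft^2",
    "degree f": "degF",
    "deg f": "degF",
}


def parse_dual_unit(value: str) -> Optional[Tuple[float, str]]:
    """Return (number, canonical_unit) when the unit is a known two-word unit."""
    match = re.match(r"^\s*([+-]?\d+(?:\.\d+)?)\s+(.+?)\s*$", value)
    if not match:
        return None
    number, unit_text = match.groups()
    # Lowercase and collapse internal whitespace so "Degree  F" matches "degree f".
    unit = DUAL_UNITS.get(" ".join(unit_text.lower().split()))
    return (float(number), unit) if unit else None


print(parse_dual_unit("250 cu in"))  # (250.0, 'in^3')
```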

7. Mixed Fractions

Convert mixed numbers (e.g., 2 3/4 in)
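Mixed numbers can be converted exactly with `fractions.Fraction` before producing a decimal. This sketch assumes a whole number, a space, and a simple fraction, followed by an optional unit; `parse_mixed_fraction` is a hypothetical name.

```python
import re
from fractions import Fraction
from typing import Optional, Tuple

# Whole number, space, numerator/denominator, then an optional unit.
_MIXED = re.compile(r'^\s*(\d+)\s+(\d+)/(\d+)\s*([A-Za-z"]*)\s*$')


def parse_mixed_fraction(value: str) -> Optional[Tuple[float, str]]:
    """Hypothetical helper: convert "2 3/4 in" to (2.75, "in")."""
    match = _MIXED.match(value)
    if not match:
        return None
    whole, numerator, denominator, unit = match.groups()
    if int(denominator) == 0:
        return None
    total = int(whole) + Fraction(int(numerator), int(denominator))
    return float(total), unit


print(parse_mixed_fraction("2 3/4 in"))  # (2.75, 'in')
```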

8. Dimensions

Split L x W x H into separate values
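A sketch of splitting a dimension string, assuming the order is length, width, height and that a single unit follows the last number; `split_dimensions` and its output keys are hypothetical. Per-value units such as "12 in x 8 in x 4 in" are not handled here.

```python
import re
from typing import Optional

# Three numbers separated by "x" or "X", then an optional shared unit.
_DIMENSIONS = re.compile(
    r'^\s*(\d+(?:\.\d+)?)\s*[xX]\s*(\d+(?:\.\d+)?)\s*[xX]\s*(\d+(?:\.\d+)?)\s*([A-Za-z"]*)\s*$'
)


def split_dimensions(value: str) -> Optional[dict]:
    """Hypothetical helper: split "L x W x H unit" into separate values."""
    match = _DIMENSIONS.match(value)
    if not match:
        return None
    length, width, height, unit = match.groups()
    return {"length": float(length), "width": float(width),
            "height": float(height), "unit": unit}


print(split_dimensions("12 x 8 x 4 in"))
# {'length': 12.0, 'width': 8.0, 'height': 4.0, 'unit': 'in'}
```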

9. "To" Ranges

Parse range values (e.g., 100 to 500 km)
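A sketch of "to" range parsing, assuming the unit may appear on either endpoint or both; `parse_to_range` is hypothetical. Mismatched units and reversed bounds return `None` as a conservative choice, leaving them for review rather than converting them.

```python
import re
from typing import Optional, Tuple

# "<low> [unit] to <high> [unit]"; the unit may appear on either side or both.
_TO_RANGE = re.compile(
    r'^\s*(\d+(?:\.\d+)?)\s*([A-Za-z"]*)\s+to\s+(\d+(?:\.\d+)?)\s*([A-Za-z"]*)\s*$',
    re.IGNORECASE,
)


def parse_to_range(value: str) -> Optional[Tuple[float, float, str]]:
    """Hypothetical helper: return (low, high, unit) for values like "100 to 500 km"."""
    match = _TO_RANGE.match(value)
    if not match:
        return None
    low, low_unit, high, high_unit = match.groups()
    if low_unit and high_unit and low_unit.lower() != high_unit.lower():
        return None  # mismatched units: leave for review
    low_value, high_value = float(low), float(high)
    if low_value > high_value:
        return None  # reversed range: leave for review
    return low_value, high_value, high_unit or low_unit


print(parse_to_range("100 to 500 km"))  # (100.0, 500.0, 'km')
```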

10. "+/-" Ranges

Handle tolerance values (e.g., +/- 0.05)
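A sketch of tolerance parsing that accepts either a bare tolerance or a nominal value with one; `parse_tolerance` and its output keys are hypothetical. When a nominal value is present, the tolerance is expanded into explicit minimum and maximum limits.

```python
import re
from typing import Optional

# Optional nominal value, then "+/-", the tolerance, and an optional unit.
_TOLERANCE = re.compile(
    r'^\s*(?:(\d+(?:\.\d+)?)\s*)?\+/-\s*(\d+(?:\.\d+)?)\s*([A-Za-z"]*)\s*$'
)


def parse_tolerance(value: str) -> Optional[dict]:
    """Hypothetical helper: parse "5 +/- 0.05 in" or a bare "+/- 0.05"."""
    match = _TOLERANCE.match(value)
    if not match:
        return None
    nominal, tolerance, unit = match.groups()
    tol = float(tolerance)
    result = {"tolerance": tol, "unit": unit}
    if nominal is not None:
        nom = float(nominal)
        # Expand the tolerance into explicit lower and upper limits.
        result.update(nominal=nom, min=nom - tol, max=nom + tol)
    return result


print(parse_tolerance("+/- 0.05"))  # {'tolerance': 0.05, 'unit': ''}
```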

11. Moderation Filter

Flag ambiguous values for review
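A minimal moderation filter could attach human-readable reasons to values that match ambiguity rules. The rules, their wording, and `moderation_flags` are hypothetical examples, not Categenie's actual criteria.

```python
import re
from typing import List

# Hypothetical ambiguity rules; each pairs a pattern with the reason recorded.
AMBIGUITY_RULES = [
    (re.compile(r"\?"), "contains a question mark"),
    (re.compile(r"\b(?:approx|approximately|about|varies|tbd)\b|~", re.IGNORECASE),
     "approximate or undetermined value"),
    (re.compile(r"\bor\b", re.IGNORECASE), "multiple alternative values"),
]


def moderation_flags(value: str) -> List[str]:
    """Return every reason the value looks ambiguous; an empty list means no flags."""
    text = value.strip()
    if not text:
        return ["empty value"]
    return [reason for pattern, reason in AMBIGUITY_RULES if pattern.search(text)]


print(moderation_flags("approx. 5 lb"))  # ['approximate or undetermined value']
```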

What You Get

Cleaned Data
  • Standardized units
  • Validated formats
  • Structured data types
  • Human-readable display values
  • Ready for immediate use

Needs Review
  • Invalid units flagged
  • Parsing failures identified
  • Edge cases highlighted
  • Ambiguous patterns marked
  • Full context preserved
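The two outputs can be sketched as a routing step that sends each record to either the cleaned set or the review set, keeping the original fields on both sides. `route_records`, `KNOWN_UNITS`, and the record layout are hypothetical; in a pandas pipeline the same split could be done with a boolean mask over a DataFrame, while plain lists keep this sketch self-contained.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical allowlist; a real pipeline would reuse its full unit table.
KNOWN_UNITS = {"in", "mm", "lb"}
_VALUE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def route_records(records: List[Dict[str, str]]) -> Tuple[List[dict], List[dict]]:
    """Split records into (cleaned, needs_review); every output row keeps its original fields."""
    cleaned, needs_review = [], []
    for record in records:
        match = _VALUE.match(record["value"])
        if not match:
            needs_review.append({**record, "reason": "parsing failure"})
            continue
        number, unit = float(match.group(1)), match.group(2).lower()
        if unit not in KNOWN_UNITS:
            needs_review.append({**record, "reason": f"invalid unit: {match.group(2)}"})
            continue
        cleaned.append({**record, "number": number, "unit": unit,
                        "display": f"{number:g} {unit}"})  # human-readable form
    return cleaned, needs_review


records = [
    {"attribute": "Length", "value": "12 in"},
    {"attribute": "Weight", "value": "3 stone"},
    {"attribute": "Width", "value": "wide"},
]
cleaned, review = route_records(records)
print(cleaned[0]["display"])  # 12 in
for row in review:
    print(row["value"], "->", row["reason"])
```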

Technology

Built with Python, Flask, and pandas, Categenie uses pattern matching and validation to clean and structure your data automatically.

Python • Flask • pandas