View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0001125 | libextractor | plugins | public | 2006-07-02 05:50 | 2007-01-01 18:48 |
| Reporter | cyberix | Assigned To | Christian Grothoff | ||
| Priority | normal | Severity | feature | Reproducibility | N/A |
| Status | closed | Resolution | fixed | ||
| Product Version | Git master | ||||
| Summary | 0001125: deserialize - split keywords at potential serial numbering | ||||
| Description | After the split extractor has done its regular splitting, call it again with the digits (1234567890) as split points. This lets the user find all files that follow a serial-numbering naming convention, which is widely used for pictures. We should end up with something like the following keywords for the file mozarella_cheese12.png: split - cheese12; split - mozarella; deserial - cheese; deserial - mozarella_cheese; filename - mozarella_cheese12.png | ||||
| Tags | No tags attached. | ||||

Sounds to me like an extra extractor, "duplicate and remove numbers" (or other tokens as specified by user configuration). Makes sense, should be implemented.

It might also be nice to add non-alphanumeric characters such as - _ ( ! ; < # and so on.

The specific set of characters should be configured via extractor options.

Fixed a bug in splitextractor. As a result, the above behavior can now be achieved using: -l "libextractor_filename:-libextractor_split:-libextractor_split(0123456789)". I'll make this the default in config-client.scm.
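
To illustrate the two-pass idea discussed above (regular splitting first, then a second split with the digits as split points), here is a minimal Python sketch. It is not the plugin's code: the function names, the default separator set `"_. "`, the minimum length of 5 (chosen here only so that fragments such as `png` and `.png` are dropped), and the `deserial` label taken from the reporter's proposal are all assumptions for illustration.

```python
DIGITS = "0123456789"
DEFAULT_SEPARATORS = "_. "  # assumption: not the plugin's actual default set


def split_keyword(keyword, separators, min_length=5):
    """Cut `keyword` at every character found in `separators`.

    Returns the pieces that differ from the input and have at least
    `min_length` characters (hypothetical threshold).
    """
    pieces = []
    current = []
    for ch in keyword:
        if ch in separators:
            if current:
                pieces.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        pieces.append("".join(current))
    return [p for p in pieces if p != keyword and len(p) >= min_length]


def keywords_for(filename):
    """Return (type, keyword) pairs: filename, regular split, digit split."""
    result = [("filename", filename)]
    split_words = split_keyword(filename, DEFAULT_SEPARATORS)
    result += [("split", word) for word in split_words]

    # Second pass: split the filename and the first-pass keywords at digits.
    seen = {word for _, word in result}
    for source in [filename] + split_words:
        for word in split_keyword(source, DIGITS):
            if word not in seen:
                seen.add(word)
                result.append(("deserial", word))
    return result


if __name__ == "__main__":
    for kind, word in keywords_for("mozarella_cheese12.png"):
        print(kind, "-", word)
```

Under these assumptions, the example file name from the description yields the same set of keywords the reporter listed. Passing a different separator string to `split_keyword` corresponds to the suggestion that the character set be configurable.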
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2006-07-02 05:50 | cyberix | New Issue | |
| 2006-12-01 21:47 | Christian Grothoff | Note Added: 0002803 | |
| 2006-12-01 21:48 | Christian Grothoff | Status | new => confirmed |
| 2006-12-17 09:15 | milan | Note Added: 0002822 | |
| 2006-12-28 17:13 | Christian Grothoff | Note Added: 0002841 | |
| 2006-12-28 19:22 | Christian Grothoff | Status | confirmed => assigned |
| 2006-12-28 19:22 | Christian Grothoff | Assigned To | => Christian Grothoff |
| 2006-12-28 19:23 | Christian Grothoff | Note Added: 0002842 | |
| 2006-12-28 19:24 | Christian Grothoff | Status | assigned => resolved |
| 2006-12-28 19:24 | Christian Grothoff | Resolution | open => fixed |
| 2007-01-01 18:48 | Christian Grothoff | Status | resolved => closed |