View Issue Details

ID: 0001125
Project: libextractor
Category: plugins
View Status: public
Last Update: 2007-01-01 18:48
Reporter: cyberix
Assigned To: Christian Grothoff
Priority: normal
Severity: feature
Reproducibility: N/A
Status: closed
Resolution: fixed
Product Version: Git master
Summary: 0001125: deserialize - split keywords at potential serial numbering
Description: After the split extractor has done its regular splitting, run it again with the digits (1234567890) as split points. This lets the user find all files that follow a serial-numbering naming convention, which is widely used for pictures. We should end up with something like:

Keywords for file mozarella_cheese12.png:
split - cheese12
split - mozarella
deserial - cheese
deserial - mozarella_cheese
filename - mozarella_cheese12.png
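
The two passes requested above can be sketched roughly as follows. This is only an illustration in Python, not libextractor's C implementation: the names `split_keyword`, `extract_keywords` and `DEFAULT_SEPARATORS`, and the separator set itself, are assumptions made for the example. The sketch also emits extension fragments such as `png`, which the list above leaves out.

```python
import re

# Hypothetical separator set for the regular split pass (an assumption,
# not the set the real split extractor uses).
DEFAULT_SEPARATORS = " _-."
DIGITS = "0123456789"


def split_keyword(keyword, separators):
    """Split `keyword` at any run of characters from `separators`.

    Returns the non-empty pieces, skipping a piece identical to the input
    (that is, when no split point was found).
    """
    pattern = "[" + re.escape(separators) + "]+"
    return [p for p in re.split(pattern, keyword) if p and p != keyword]


def extract_keywords(filename):
    """Return (type, value) pairs: filename, regular split, then digit split."""
    keywords = [("filename", filename)]

    # Pass 1: regular splitting of the file name.
    for piece in split_keyword(filename, DEFAULT_SEPARATORS):
        keywords.append(("split", piece))

    # Pass 2: split every keyword found so far at digits ("deserial").
    for _, value in list(keywords):
        for piece in split_keyword(value, DIGITS):
            if all(piece != existing for _, existing in keywords):
                keywords.append(("deserial", piece))

    return keywords


if __name__ == "__main__":
    for kind, value in extract_keywords("mozarella_cheese12.png"):
        print(f"{kind} - {value}")
```

Passing the separator set as a parameter keeps the second pass generic, so the same function could later split on other user-chosen characters.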
Tags: No tags attached.

Activities

Christian Grothoff

2006-12-01 21:47

manager   ~0002803

Sounds to me like an extra extractor "duplicate and remove numbers" (or other tokens as specified by user configuration). Makes sense, should be implemented.

milan

2006-12-17 09:15

reporter   ~0002822

It might also be nice to add non-alphanumeric characters such as - _ ( ! ; < # and so on.

Christian Grothoff

2006-12-28 17:13

manager   ~0002841

The specific set of characters should be configured via extractor options.

Christian Grothoff

2006-12-28 19:23

manager   ~0002842

Fixed a bug in splitextractor. As a result, the above behavior can now be achieved using:

-l "libextractor_filename:-libextractor_split:-libextractor_split(0123456789)"
 
I'll make this the default in config-client.scm.
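
To make the structure of the library string above easier to read, here is a small Python sketch that breaks it into its parts. It assumes that entries are separated by colons, that a leading `-` is a per-entry prefix flag, and that text in parentheses is the option string handed to the plugin (here, the digits to split on). The function name `parse_plugin_spec` and the dictionary keys are hypothetical; this is not how libextractor itself parses the string.

```python
import re

SPEC = "libextractor_filename:-libextractor_split:-libextractor_split(0123456789)"


def parse_plugin_spec(spec):
    """Break a colon-separated library string into per-plugin entries.

    Assumption: option strings contain no ':' characters, so a plain
    split on ':' is sufficient for this illustration.
    """
    plugins = []
    for entry in spec.split(":"):
        # Optional '-' prefix, plugin name, optional "(options)" suffix.
        match = re.fullmatch(r"(-?)([^()]+)(?:\((.*)\))?", entry)
        if match is None:
            raise ValueError(f"unrecognised entry: {entry!r}")
        dash, name, options = match.groups()
        plugins.append(
            {"name": name, "dash_prefix": dash == "-", "options": options}
        )
    return plugins


if __name__ == "__main__":
    for plugin in parse_plugin_spec(SPEC):
        print(plugin)
```

Read this way, the string names the filename plugin once and the split plugin twice: once without options and once with the digits as its option string.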

Issue History

Date Modified Username Field Change
2006-07-02 05:50 cyberix New Issue
2006-12-01 21:47 Christian Grothoff Note Added: 0002803
2006-12-01 21:48 Christian Grothoff Status new => confirmed
2006-12-17 09:15 milan Note Added: 0002822
2006-12-28 17:13 Christian Grothoff Note Added: 0002841
2006-12-28 19:22 Christian Grothoff Status confirmed => assigned
2006-12-28 19:22 Christian Grothoff Assigned To => Christian Grothoff
2006-12-28 19:23 Christian Grothoff Note Added: 0002842
2006-12-28 19:24 Christian Grothoff Status assigned => resolved
2006-12-28 19:24 Christian Grothoff Resolution open => fixed
2007-01-01 18:48 Christian Grothoff Status resolved => closed