Introduction



Textanz is a text analysis tool that calculates the frequencies of repeated phrases and wordforms in a document. The information obtained from Textanz is useful for:


Textanz is not a text editor and does not intend to compete with the many excellent, well-known editors created for specific formats. Instead, Textanz attempts to recognise known document types and extract plain text from them for analysis. The extraction is delegated to the Apache Tika suite of parsers. The Tika project page links to the list of supported formats, so you can always check whether a particular exotic format can be read by Textanz. All popular formats are supported: HTML, XML, RTF, PDF, MS Office, OpenOffice.


Installation

Once you have downloaded textanz.zip, follow these steps:
If everything is correct, the Textanz main window should open.

Terms and principles


Word. Textanz does not use dictionaries or language-specific rules. Any sequence of alphanumeric characters is treated as a word. Examples of words:

book
WinXP
2011
12Ae8N$$$

A phrase is any combination of N words (N > 0) that does not contain phrase delimiters. Delimiters are the characters that normally terminate a sentence: period, question mark, exclamation mark. All other word delimiters (semicolon, comma, etc.) are ignored. Differences in spacing and line breaks do not matter either. The only thing that matters to Textanz is the sequence of words. Example of two equal phrases:

Don't worry be happy.

Don't  worry,
be
happy!
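
The comparison above can be sketched in a few lines of Python. This is only an illustration of the rules described here, not Textanz's actual implementation: the regular expressions and the function name split_into_phrases are assumptions made for this example.

```python
import re

# Assumption: a "word" is a maximal run of letters and digits.
WORD_RE = re.compile(r"[^\W_]+")
# Phrase delimiters named in the text: period, question mark, exclamation mark.
PHRASE_DELIMITERS_RE = re.compile(r"[.?!]")


def split_into_phrases(text):
    """Return one tuple of words per delimiter-bounded fragment of the text."""
    phrases = []
    for fragment in PHRASE_DELIMITERS_RE.split(text):
        words = WORD_RE.findall(fragment)
        if words:  # skip empty fragments, e.g. after the final delimiter
            phrases.append(tuple(words))
    return phrases


first = split_into_phrases("Don't worry be happy.")
second = split_into_phrases("Don't  worry,\nbe\nhappy!")
print(first == second)  # prints True
```

Commas, extra spaces and line breaks disappear during word extraction, so both inputs reduce to the same sequence of words.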

Wordform. Since Textanz does not know the morphology of any language, any contiguous part of a word is considered a wordform. The text can be in a natural language, a fantasy language, a programming language, or just a set of digits.
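
As a rough illustration of this definition, the sketch below lists every contiguous part of a single word. The function name wordforms is hypothetical, and treating the whole word as one of its own wordforms is an assumption of this example.

```python
def wordforms(word):
    """Return every contiguous, non-empty part of a word (repeats included)."""
    return [
        word[start:end]
        for start in range(len(word))
        for end in range(start + 1, len(word) + 1)
    ]


print(wordforms("abc"))  # prints ['a', 'ab', 'abc', 'b', 'bc', 'c']
```

A word of n characters yields n * (n + 1) / 2 parts, which is why no dictionary or language knowledge is needed.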

Frequency is the number of occurrences of a particular phrase, word or wordform in the text. Textanz reports all frequencies greater than 1, i.e. any fragment that occurs at least twice will be found.
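
A minimal sketch of phrase frequency counting under these definitions follows. It is a naive illustration, not the algorithm Textanz uses: the function name phrase_frequencies, the regular expressions and the case-sensitive comparison are all assumptions of this example.

```python
import re
from collections import Counter

# Assumption: a "word" is a maximal run of letters and digits.
WORD_RE = re.compile(r"[^\W_]+")
# Phrase delimiters: period, question mark, exclamation mark.
PHRASE_DELIMITERS_RE = re.compile(r"[.?!]")


def phrase_frequencies(text):
    """Count every word sequence inside each sentence; keep those seen twice or more."""
    counts = Counter()
    for fragment in PHRASE_DELIMITERS_RE.split(text):
        words = WORD_RE.findall(fragment)
        for start in range(len(words)):
            for end in range(start + 1, len(words) + 1):
                counts[tuple(words[start:end])] += 1
    # Frequencies greater than 1 only, as described above.
    return {" ".join(phrase): n for phrase, n in counts.items() if n > 1}


print(phrase_frequencies("Be happy. Be happy, be calm!"))
# prints {'Be': 2, 'Be happy': 2, 'happy': 2}
```

Because comparison in this sketch is case-sensitive, the lowercase "be" is counted separately from "Be" and does not reach a frequency of 2.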


Loading the text

Use the "Text" menu or the toolbar buttons to load text for analysis. Textanz offers 3 ways of loading:
Please note that extraction from a complex document may cause some delay before you see the text in Textanz. Additional time is required to download a document from a remote URL. The text appears in the right pane. Whenever the text tab is not empty, the "Calculate" menu items are enabled and the text can be analysed for frequencies.
You can open multiple documents in Textanz. Each document is opened in a separate tab.


Calculating frequencies and seeing results

Use the "Calculate" menu or the toolbar buttons to start the frequency calculation for phrases or wordforms. Textanz displays a progress indicator during the calculation, and the "Cancel" action is available to interrupt the process. Once the calculation is finished, the program populates the frequency table with the results.
If multiple text tabs are open, Textanz calculates frequencies across all loaded documents.

The phrase frequencies table contains the following columns:
"Phrase" - the phrase itself 
"Frequency" - number of occurrences
"Length" - number of words in phrase

The wordform frequencies table contains the following columns:
"Wordform" - the wordform
"Frequency" - number of occurrences
"Length" - number of characters in wordform

By default, phrase frequency records are sorted by phrase length, then by frequency. Wordforms are sorted by frequency, then by length. Column titles in both tables act as buttons that reorder the records in ascending or descending order by the corresponding column.
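
The two default orderings can be sketched as multi-key sorts. The row layout (text, frequency, length) and the function names are hypothetical, and the descending direction is an assumption, since the direction of the default order is not stated here.

```python
def default_phrase_order(rows):
    """Sort (phrase, frequency, length) rows by length, then by frequency."""
    # Assumption: largest values first.
    return sorted(rows, key=lambda row: (row[2], row[1]), reverse=True)


def default_wordform_order(rows):
    """Sort (wordform, frequency, length) rows by frequency, then by length."""
    # Assumption: largest values first.
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)


phrases = [("be happy", 3, 2), ("don t worry be happy", 2, 5), ("worry be", 4, 2)]
for row in default_phrase_order(phrases):
    print(row)
```

Clicking a column title would correspond to sorting by that single column and flipping the reverse flag on each click.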

The filter box above the table lets you limit the displayed phrases or wordforms to only those containing the typed string.
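
In effect, the filter is a substring test on the first column. A minimal sketch follows, with a hypothetical function name and row layout; whether the real filter ignores letter case is not stated here, so this example assumes a case-sensitive match.

```python
def filter_rows(rows, typed):
    """Keep only the rows whose phrase or wordform contains the typed string."""
    if not typed:
        return list(rows)  # an empty filter shows everything
    # Assumption: case-sensitive substring match on the first column.
    return [row for row in rows if typed in row[0]]


rows = [("be happy", 3, 2), ("worry be", 4, 2), ("happy", 5, 1)]
print(filter_rows(rows, "happy"))  # prints [('be happy', 3, 2), ('happy', 5, 1)]
```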

To see the occurrences of a phrase or word in the source text, select the table row and use the "Highlight positions" menu item or toolbar button, or double-click the row. You can select multiple rows by holding Ctrl or Shift while clicking, and then highlight the positions for all of them at once. Navigation markers on the left edge of the pane work as hyperlinks that scroll the corresponding position into view. Please note that if the calculation was made for multiple texts, a particular text tab may contain no occurrences of some word or phrase. In that case, the "NO OCCURRENCES" message is displayed.
The "Highlight positions" and "Clear highlighting" commands apply to the selected text tab only. These functions are not available if the text was loaded into the tab after the calculation; a new calculation is required to add results for this text.

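Locating the positions to highlight amounts to searching the text for the words of a phrase while tolerating any spacing or minor punctuation between them. The sketch below is one way to do that with a regular expression. The function name find_occurrences and the exact pattern are assumptions of this example, not a description of how Textanz works internally.

```python
import re


def find_occurrences(text, phrase):
    """Return (start, end) offsets of every occurrence of the phrase in the text."""
    words = re.findall(r"[^\W_]+", phrase)
    if not words:
        return []
    # Between words, allow anything except letters, digits and the
    # sentence terminators (period, question mark, exclamation mark).
    separator = r"(?:[^\w.?!]|_)+"
    pattern = (
        r"(?<![^\W_])"  # do not start in the middle of a word
        + separator.join(re.escape(word) for word in words)
        + r"(?![^\W_])"  # do not end in the middle of a word
    )
    return [match.span() for match in re.finditer(pattern, text)]


text = "Don't worry, be happy! Don't  worry\nbe happy."
for start, end in find_occurrences(text, "worry be happy"):
    print(start, end, repr(text[start:end]))
```

An empty result corresponds to the case where a text tab contains no occurrences of the selected row.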
Exporting calculation results


Textanz offers 3 forms of export to external files:
Invoking any export action opens the standard "Save as" dialog with the corresponding default file extension. A dialog message confirms the successful export upon completion.


Configuration


The Configuration dialog is available via the "Calculation".."Configuration" menu item or the toolbar button. It contains the following tabs:

General: configuration parameters for all types of calculation:
Phrases: parameters specific to phrase frequency:
Wordforms: parameters specific to wordform frequency:
Language: parameters specific to the text language

The following buttons can be used in the Configuration dialog:
Apply - settings are used in the current session but not saved. The next time you launch Textanz, the previous configuration is restored.
Save - settings are saved to the configuration files and used in all subsequent sessions.
Close - closes the configuration dialog.

Textanz saves its configuration in the /conf subdirectory of TEXTANZ_HOME, or in the user home directory of your operating system if TEXTANZ_HOME is not set.
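
If you need to find the configuration files from a script, the lookup rule above can be sketched as follows. The function name config_directory is hypothetical, and the example only mirrors the rule as stated here; the names of the configuration files themselves are not covered.

```python
import os
from pathlib import Path


def config_directory(env=None):
    """Return the directory where the configuration is expected to be stored."""
    env = os.environ if env is None else env
    textanz_home = env.get("TEXTANZ_HOME")
    if textanz_home:
        # TEXTANZ_HOME is set: use its conf subdirectory.
        return Path(textanz_home) / "conf"
    # TEXTANZ_HOME is not set: fall back to the user home directory.
    return Path.home()


print(config_directory())
```

Passing a plain dictionary as env makes it easy to check both branches without changing the real environment.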

Registration


Textanz is distributed as trialware. This means that after the 30-day trial period, the user must purchase a license to continue using the tool for either personal or professional purposes. We believe that you are interested in the further evolution of Textanz and respect the work already done.
Registered users get future versions of Textanz at no additional charge. The "Check for updates" command opens a web page with the Textanz change log. You are always welcome to send comments, suggestions and update requests to info@textanz.com.