
Fieldwork: Software

Sebastian Nordhoff
Universiteit van Amsterdam


Workshop on Fieldwork, Universiteit van Amsterdam
Second Session: Software
4 May 2007


Intro

- How to work with the wonderful digital data your devices produce
- What to do when and in which order
- How to do it


From hardware storage to web publication



The Problem

Prevention of data loss



- labeling
- workflow
- uncompressed is better than compressed
- backup


Accessibility and longevity



choose open formats


- open formats are documented and will be readable 20 years from now
- proprietary formats are not documented and only known to their inventors


KISS: Keep It Simple, Stupid
- choose the simplest file format that suits your needs
- very often *txt is as good as *doc or *odt
- *csv is often as good as *xls or *ods, but easier to convert into other formats

Open and closed formats



meta051214.xls


meta051214.csv


File formats

Good formats (open and documented)


- you can be sure that 100 years from now people will be able to make sense of your data
- sound: *wav *mp3 *flac *ogg
- video: *avi *mpeg
- text: *txt, *rtf, *odt (*html *xml)
- use Unicode. Everywhere.
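As a minimal sketch of what "use Unicode everywhere" can mean in practice, the following Python function re-encodes a text file from a legacy encoding to UTF-8. The function name and the default source encoding (`latin-1`) are assumptions for illustration; the slides do not name a tool or an encoding.

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "latin-1") -> None:
    """Re-encode a text file from a legacy encoding to UTF-8.

    source_encoding is an assumption: you need to know (or find out)
    which encoding the old file really uses.
    """
    text = src.read_text(encoding=source_encoding)
    dst.write_text(text, encoding="utf-8")
```

Writing to a new file rather than overwriting the source keeps the original untouched in case the guessed encoding turns out to be wrong.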


Bad formats (proprietary and non-documented)


- There is a good chance that no one will understand your data 100 years from now
- sound: *wma, MiniDisc
- video: *wmv, *rm
- text: *doc


Capturing

- Capturing is the process of transferring data from a storage medium to the computer
- For audio, solid-state recorders make capturing obsolete
- One can just drag and drop the file from the storage medium onto the computer


Capturing

- Video capturing takes real time
- 1 h of *avi video takes 15 GB of storage space
- I use Premiere, but there are surely cheaper alternatives
- Do not use Windows Movie Maker: it gives you *wmv (evil format)


Exporting and conversion



- Video files tend to be very large
- Export the audio, which is easier to handle
- Convert *avi video to *mpeg with the Tsunami encoder; this takes about real time, depending on the processor
- *mpeg files are 20 times smaller than *avi files
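The storage figures from the slides (15 GB per hour of *avi, *mpeg about 20 times smaller) can be turned into a rough planning estimate. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and real file sizes depend on codec and settings.

```python
AVI_GB_PER_HOUR = 15.0      # figure quoted in the slides
MPEG_SHRINK_FACTOR = 20.0   # "*mpeg files are 20 times smaller than *avi files"


def storage_estimate_gb(hours: float) -> dict[str, float]:
    """Estimate disk space in GB for a given number of hours of video."""
    avi = hours * AVI_GB_PER_HOUR
    return {"avi": avi, "mpeg": avi / MPEG_SHRINK_FACTOR}


print(storage_estimate_gb(10))  # {'avi': 150.0, 'mpeg': 7.5}
```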


Exporting and conversion



- *wav can be converted into *mp3 or *ogg
- conversion is quite fast
- it reduces sound quality
- sound quality is only an issue for phoneticians


Editing

You are a linguist, not a film director


- You generally do not want to edit video
- but there is specialized software available for that, which will overwhelm you with its functions


Editing

You are a linguist, not a rock star


- You generally do not want to edit audio
- Editing audio is easier than editing video
- Some functions are trivial, e.g. normalize


Cutting

- 1 h of sound takes a loooong time to transcribe
- better to have smaller files
- these must be logically self-contained
- transcriber has a very nice function to cut a large file into smaller ones
- write down master file and time code!
- You can also cut video
- cutting *avi and *wav is easy; cutting compressed formats like *mp3 or *mpg is more difficult
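To illustrate why cutting uncompressed audio is easy, here is a minimal sketch that copies a time span out of a *wav file using Python's standard `wave` module. It is not how transcriber does it; the function name and parameters are made up for illustration, and it assumes plain PCM *wav input.

```python
import wave
from pathlib import Path


def cut_wav(master: Path, start_s: float, end_s: float, out: Path) -> None:
    """Copy the span from start_s to end_s (in seconds) of a PCM *wav file."""
    with wave.open(str(master), "rb") as src:
        rate = src.getframerate()
        first = int(start_s * rate)          # first frame of the excerpt
        count = int(end_s * rate) - first    # number of frames to copy
        src.setpos(first)
        frames = src.readframes(count)
        with wave.open(str(out), "wb") as dst:
            # Keep the audio parameters of the master file unchanged.
            dst.setnchannels(src.getnchannels())
            dst.setsampwidth(src.getsampwidth())
            dst.setframerate(rate)
            dst.writeframes(frames)
```

Because the samples are stored one after another, a time code maps directly to a frame position. The start and end values passed in are exactly the master file time codes worth writing down.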


Backup

- Back up early, back up often
- Back up audio, video, transcriptions and your Toolbox lexicon
- an empty DVD costs 1 EUR, a working hour of yours costs at least 10 EUR
- a DVD contains at least 10 working hours of yours
- would you rather lose 1 EUR or 100 EUR?


Backup

decentralized storage to prevent data loss



- I have a working copy on my laptop and a backup on an external drive
- As soon as new data reaches 4 GB, I burn two DVDs: one stays with me, the other one goes to NL by mail
- that makes 4 copies altogether
- the question is not "Am I paranoid?"
- the question is "Am I paranoid enough?"
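Keeping several copies only helps if each copy is intact. As a sketch (not part of the workflow described in the slides), the following Python code copies a file to several target directories and compares SHA-256 checksums afterwards. The function names and the checksum step are assumptions added for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, target_dirs: list[Path]) -> list[Path]:
    """Copy source into every target directory and check each copy."""
    expected = sha256_of(source)
    copies = []
    for directory in target_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copy = directory / source.name
        shutil.copy2(source, copy)  # copy2 also preserves timestamps
        if sha256_of(copy) != expected:
            raise IOError(f"copy in {directory} does not match the original")
        copies.append(copy)
    return copies
```

The target directories could be an external drive and a folder staged for DVD burning; which locations to use is up to you.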


Acoustic analysis

- Praat can do all kinds of acoustic analyses
- oscillogram, spectrogram, vowel formants, pitch extraction, beats
- too complex to treat here
- developed at the UvA, so you are at the source of knowledge


Praat

Transcribing

- The program transcriber is used for, well, transcribing
- first run: segment. Playback: continuous
- second run: transcribe. Playback: loop


Filenames

- Store information about who spoke when, to whom, about what, where, and in which language
- do it consistently
- labeling of files:

P      YYMMDD      GEN        99
K      061231      nar        02
Kandy  31.12.2006  narrative  2nd


- more or less information is also possible, depending on aims
- very long names are not a good idea
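A consistent labeling scheme pays off because labels can then be read by a program. The sketch below parses a label built on the pattern shown above. It assumes the four parts are written without separators (e.g. `K061231nar02`), that the year is 2000 or later, and it uses small lookup tables containing only the codes that appear in the slides; all names in the code are made up for illustration.

```python
import re
from datetime import date

# Hypothetical code tables: only "K" and "nar" appear in the slides.
PLACES = {"K": "Kandy"}
GENRES = {"nar": "narrative"}

# place letter, date as YYMMDD, three-letter genre code, two-digit number
LABEL = re.compile(
    r"^(?P<place>[A-Z])(?P<date>\d{6})(?P<genre>[a-z]{3})(?P<number>\d{2})$"
)


def parse_label(label: str) -> dict:
    """Split a label such as 'K061231nar02' into its parts."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"label does not follow the scheme: {label!r}")
    yy, mm, dd = (int(match["date"][i:i + 2]) for i in (0, 2, 4))
    return {
        "place": PLACES.get(match["place"], match["place"]),
        "date": date(2000 + yy, mm, dd),  # assumes recordings from 2000 onwards
        "genre": GENRES.get(match["genre"], match["genre"]),
        "number": int(match["number"]),
    }
```

For `K061231nar02` this yields Kandy, 31 December 2006, narrative, number 2, matching the example row in the table.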


Metadata

- file types, file quality
- recording quality
- master source
- speakers
  - name
  - age
  - language proficiencies
  - contact details


how to store metadata


- easy: spreadsheet (Excel, OpenOffice Calc)
- more complex: IMDI-Editor
- most complex: database
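For the spreadsheet option, a plain *csv file is often enough and can be opened by any spreadsheet program. The following sketch writes metadata records with Python's standard `csv` module. The column names are assumptions derived from the categories listed above, not a fixed standard.

```python
import csv
import io

# Hypothetical column names based on the metadata categories in the slides.
FIELDS = [
    "file", "file_type", "recording_quality",
    "master_source", "speaker", "age", "languages",
]


def metadata_to_csv(records: list[dict]) -> str:
    """Serialize metadata records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

The returned text can be written to a file with UTF-8 encoding. Contact details are left out of the columns here on purpose, since they may be better kept in a separate, non-public file.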


Annotating

- The transcribed file is imported into Toolbox
- Toolbox has semi-automatic interlinear glossing
- The dictionary is built incrementally
- ambiguity resolution
- word formulas
- the resulting file can be exported to a variety of formats


Special characters

- Special characters like IPA can be inserted with Keyman for Windows
- Keyman does not work well with transcriber
- transcriber lets you define special keys but forgets them when it closes down #%%#


Special characters

- typing <E> for ə is much faster than typing <ctrl+shift+i><e><=><ctrl+shift+o>
- Search and replace afterwards
- be consistent
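The search-and-replace step can be scripted so that it is applied the same way every time. The sketch below is one possible way to do it in Python; the mapping of the `<E>` shortcut to schwa (ə) is an assumption, and the table is meant to be extended with your own shortcuts.

```python
# Hypothetical shortcut table; extend it with your own placeholders.
REPLACEMENTS = {"<E>": "ə"}


def expand_placeholders(text: str, table: dict[str, str] = REPLACEMENTS) -> str:
    """Replace every typed shortcut with its special character."""
    for placeholder, character in table.items():
        text = text.replace(placeholder, character)
    return text
```

This only works if the shortcuts are used consistently and never occur as ordinary text in the transcription.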


Custom conversion
- Well-defined open formats can be converted into other well-defined open formats
- :"Dutch": is easy to convert to <Language><Name>Dutch</Name></Language>

meta051214.xls   meta051214.csv

You might want to learn Perl (Practical Extraction and Report Language)
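A conversion of this kind can be sketched in any scripting language. The example below uses Python rather than Perl, purely as an illustration. It reads CSV text with a `Name` column and wraps each value in `<Language><Name>…</Name></Language>`. The `Name` column header and the `<Languages>` wrapper element are assumptions, since the slides only show the inner element.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text: str) -> str:
    """Convert CSV rows with a 'Name' column into Language elements."""
    root = ET.Element("Languages")  # hypothetical wrapper element
    for row in csv.DictReader(io.StringIO(csv_text)):
        language = ET.SubElement(root, "Language")
        ET.SubElement(language, "Name").text = row["Name"]
    return ET.tostring(root, encoding="unicode")


print(csv_to_xml("Name\nDutch\n"))
# <Languages><Language><Name>Dutch</Name></Language></Languages>
```

Using a CSV parser and an XML library instead of hand-written string handling means quoting and special characters such as `&` are taken care of.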

Enhancing

- import Toolbox into ELAN
- align with audio and video
- produce subtitles


ELAN

Publishing

GALOES is (will be) a grammar authoring platform that lets you create a web grammar with
- hyperlinks
- full text search
- multimedia support
- easy retrieval of source texts


Software

Capturing
Premiere: www.adobe.com/products/premiere
There is lots of other, cheaper software, but I have not tested it and am unable to give any recommendations.


Editing
Audacity: audacity.sourceforge.net; Professional: Cubase, Soundforge

Converting video
Tsunami: www.tmpgenc.net

Cutting
transcriber: trans.sourceforge.net


Software
Transcribing
transcriber: trans.sourceforge.net

Annotating
Toolbox: www.sil.org/computing/toolbox

Conversion
Perl: www.perl.org/

Special Characters
Keyman: www.tavultesoft.com/keyman

Acoustic analysis
Praat: www.fon.hum.uva.nl/praat

Enhancing
ELAN: www.lat-mpi.eu/tools/elan

Publishing
GALOES: http://84.16.245.50/slmwiki

Web sites

Best practices: www.e-meld.org/school
Tools: www.lingweb.eva.mpg.de/eldtools/tools.htm
Language Archive Newsletter: www.mpi.nl/LAN
Link list: www.hrelp.org/languages/resources/orel
