ACL Workshop on Computation and Written Language (CAWL)

Most work on NLP focuses on language in its canonical written form. This has often led researchers to ignore the differences between written and spoken language or, worse, to conflate the two. Examples of such conflation are statements like “Chinese is a logographic language” or “Persian is a right-to-left language”, variants of which appear frequently in the ACL Anthology. These statements confuse properties of a language with properties of its writing system. Ignoring the differences between written and spoken language leads, among other things, to conflating distinct words that are spelled the same (e.g., English bass), or to treating a single word with multiple spellings as several different words (e.g., Japanese umai ‘tasty’, which can be written 旨い, うまい, ウマい, or 美味い).

Furthermore, methods for dealing with written-language issues (e.g., various kinds of normalization or conversion) or for recognizing text input (e.g., OCR, handwriting recognition, or text entry methods) are often regarded as precursors to NLP rather than as fundamental parts of the enterprise, even though most NLP methods rely centrally on representations derived from text rather than from (spoken) language. This general neglect of writing has led much of the research on these topics to appear outside ACL venues, in conferences or journals of neighboring fields such as speech technology (e.g., text normalization) or human-computer interaction (e.g., text entry).

We will invite submissions on the relationship between written and spoken language, the properties of written language, the ways in which writing systems encode language, and applications specifically focused on characteristics of writing systems.

Schedule

9:00-9:05	Organizers	Opening remarks
9:05-9:15	Position paper: Kyle Gorman and Richard Sproat	Myths about writing systems in speech & language technology
9:15-10:15	Invited speaker: Mark Aronoff	Paradise lost: how the alphabet fell from perfection
10:15-10:30	Manex Agirrezabal, Sidsel Boldsen, and Nora Hollenstein	The hidden folk: linguistic properties encoded in multilingual contextual character representations
10:30-11:00	Coffee break
11:00-11:20	Christian Gold, Ronja Laarmann-Quante, and Torsten Zesch	Preserving the authenticity of handwritten learner language: annotation guidelines for creating transcripts retaining orthographic features
11:20-11:40	Kurt Micallef, Fadhl Eryani, Nizar Habash, Houda Bouamor, and Claudia Borg	Exploring the impact of transliteration on NLP performance: Treating Maltese as an Arabic dialect
11:40-12:00	Elizabeth Nielsen, Christo Kirov, and Brian Roark	Distinguishing Romanized Hindi from Romanized Urdu
12:00-1:30	Lunch break
1:30-2:30	Invited speaker: Amalia Gnanadesikan	How linguistic are writing systems?
2:30-2:45	Yuying Ren	Back-transliteration of English loanwords in Japanese
2:45-3:05	Wen Zhang	Pronunciation ambiguities in Japanese kanji
3:05-3:25	Shigeki Karita, Richard Sproat, and Haruko Ishikawa	Lenient evaluation of Japanese speech recognition: modeling naturally occurring spelling inconsistency
3:25-4:00	Coffee break
4:00-4:20	Logan Born, M. Monroe, Kathryn Kelley, and Anoop Sarkar	Disambiguating numeral sequences to decipher ancient accounting corpora
4:20-4:40	Fabio Tamburini	Decipherment of lost ancient scripts as combinatorial optimisation using coupled simulated annealing
4:40-5:00	Logan Born, M. Monroe, Kathryn Kelley, and Anoop Sarkar	Learning the character inventories of undeciphered scripts using unsupervised deep clustering
5:00-5:15	Noah Hermalin	A mutual information-based approach to quantifying logography in Japanese and Sumerian
5:15-5:30	Organizers	Closing remarks

Organization

Program Committee:

Manex Agirrezabal, University of Copenhagen, Denmark
Sina Ahmadi, George Mason University, USA
Cecilia Alm, Rochester Institute of Technology, USA
Steven Bedrick, Oregon Health & Science University, USA
Taylor Berg-Kirkpatrick, UC San Diego, USA
Dan Garrette, Google, USA
Alexander Gutkin, Google, UK
Nizar Habash, NYU Abu Dhabi, United Arab Emirates
Yannis Haralambous, IMT Atlantique & CNRS Lab-STICC, France
Cassandra Jacobs, University at Buffalo, USA
Martin Jansche, Amazon, UK
George Kiraz, Princeton University, USA
Christo Kirov, Google, USA
Grzegorz Kondrak, University of Alberta, Canada
Yang Li, Northwestern Polytechnical University, China
Constantine Lignos, Brandeis University, USA
Zoey Liu, University of Florida, USA
Gerald Penn, University of Toronto, Canada
Yuval Pinter, Ben-Gurion University of the Negev, Israel
William Poser, independent scholar, Canada
Emily Prud’hommeaux, Boston College, USA
Shruti Rijhwani, Carnegie Mellon University, USA
Maria Ryskina, MIT, USA
Lane Schwartz, University of Alaska Fairbanks, USA
Djamé Seddah, Sorbonne University & Inria, France
Shuming Shi, Tencent, China
David Smith, Northeastern University, USA
Kumiko Tanaka-Ishii, University of Tokyo, Japan
Annalu Waller, University of Dundee, UK