By Cedrick Fairon, Sebastien Paumier
This article presents a corpus of 30,000 French SMS that is unique in its quality and size, and in the fact that the messages were translated into “standard” French. It also describes the collection process and the translation process in detail.
Sociologists and linguists have begun to describe how language adapts to new written forms such as chat, forums and SMS, and how users play with these forms to “make sense” faster with fewer words and characters.
The shortage of reference corpora, especially for SMS, which are difficult to collect, has made it hard for researchers to study these new forms of written language. Earlier collections were typically carried out by students, and the messages were copied manually from phone screens.
This approach has two important limitations:
i. the corpora come from a restricted group of SMS users
ii. manual copying introduces typing mistakes or voluntary corrections
“Give your SMS to Science”
An SMS collection campaign was organized in the French-speaking part of Belgium:
- to facilitate data collection, a toll-free short code was set up
- a call for participation was broadcast
- participants were invited to send copies of their SMS and to fill in an online sociolinguistic questionnaire
- from October to December 2004, more than 75,000 SMS were sent by more than 3,200 people
- 2,500 people, aged 12 to 65, answered the questionnaire (1,200 men and 1,500 women)
Goal: to build a reference corpus that provides a solid base for linguistic studies
Preprocessing the corpus
73,127 raw SMS were received.
Two preprocessing operations were applied:
1. reassembling messages longer than 160 characters that had been split into several SMS, and removing unusable SMS (non-French SMS, graphical SMS, duplicates, etc.)
2. removing personal information (anonymization)
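The two preprocessing operations can be illustrated with a minimal Python sketch. The article does not say how split segments were matched or how personal data were detected, so everything here is an assumption: the hypothetical `RawSms` record, the rule that a message whose length is a multiple of 160 characters is continued by the next message from the same sender within `MAX_GAP_SECONDS`, exact-duplicate removal, and a regular expression that masks phone numbers only. Filtering out non-French or graphical SMS, and removing names or addresses, would need further steps not shown here.

```python
import re
from dataclasses import dataclass

# Hypothetical parameters: the article does not describe its reassembly rule.
SEGMENT_LEN = 160       # a "full" segment is assumed to be exactly 160 characters
MAX_GAP_SECONDS = 60    # segments of one message are assumed to arrive within a minute

# Hypothetical pattern: masks digit runs that look like phone numbers, nothing else.
PHONE_RE = re.compile(r"\+?\d[\d ./-]{7,}\d")


@dataclass
class RawSms:
    sender: str      # contributor's GSM number
    timestamp: int   # reception time in seconds
    text: str


def reassemble(messages: list[RawSms]) -> list[RawSms]:
    """Append a message to the previous one from the same sender if that one was full."""
    merged: list[RawSms] = []
    for sms in sorted(messages, key=lambda m: (m.sender, m.timestamp)):
        prev = merged[-1] if merged else None
        if (prev is not None
                and prev.sender == sms.sender
                and sms.timestamp - prev.timestamp <= MAX_GAP_SECONDS
                and prev.text and len(prev.text) % SEGMENT_LEN == 0):
            merged[-1] = RawSms(prev.sender, sms.timestamp, prev.text + sms.text)
        else:
            merged.append(sms)
    return merged


def deduplicate(messages: list[RawSms]) -> list[RawSms]:
    """Drop exact duplicates (same sender, same text), keeping the first occurrence."""
    seen: set[tuple[str, str]] = set()
    unique = []
    for sms in messages:
        key = (sms.sender, sms.text)
        if key not in seen:
            seen.add(key)
            unique.append(sms)
    return unique


def anonymize(messages: list[RawSms]) -> list[tuple[int, str]]:
    """Replace each GSM number by a user index and mask phone numbers in the text."""
    user_ids: dict[str, int] = {}
    result = []
    for sms in messages:
        user = user_ids.setdefault(sms.sender, len(user_ids) + 1)
        result.append((user, PHONE_RE.sub("<PHONE>", sms.text)))
    return result


def preprocess(raw: list[RawSms]) -> list[tuple[int, str]]:
    return anonymize(deduplicate(reassemble(raw)))
```

With this heuristic, a 160-character message followed ten seconds later by "bc" from the same sender becomes one 162-character message, and a number such as "0475 12 34 56" inside a text is replaced by `<PHONE>`. A real pipeline would more likely rely on segment headers from the SMS gateway than on timing.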
Translating the corpus
Why translate? (Motivations)
- the corpus was “translated” (or “transliterated”) into “standardized” French, yielding a so-called bilingual corpus
- each SMS is stored together with its translation into standardized French
1. readability: SMS are hard to read because of missing spaces, mixed upper- and lower-case letters, non-standard abbreviations and text transformations, the codes, usages and habits of SMS writers, and sequences of errors
2. usability: the translation makes the messages easier to explore
Translation protocol
Each entry in the translation table records the following fields:
1. IdSMS: index of the SMS in the database
2. User: a number standing for the sender's GSM number
3. Sex: the sender's sex, used to check gender agreement, in particular for past participles
4. Flag: annotations on the message
5. Message: the original SMS (already anonymized)
6. Trans.: the translation into “standard” French
Two general rules:
1. the original SMS is never modified
2. the protocol is strictly observed, for both the original messages and their “standard” French translations
Specific rules cover foreign words, punctuation marks, mathematical symbols, abbreviations, smileys, spaces and new lines, acronyms and sigla, letter repetitions, phonetic transformations, onomatopoeia and interjections, proper names, numbers, neologisms, obvious errors, unexpected or incomprehensible symbols, character case, typing errors, and missing words.
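The record fields and the rule that the original SMS is never modified map naturally onto an immutable record. The sketch below is a hypothetical Python representation, not the authors' actual database schema: the field names, the `"M"`/`"F"` encoding of Sex, the free-text Flag, and the sample message are illustrative assumptions.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)  # frozen: the original message cannot be edited in place
class SmsRecord:
    id_sms: int      # IdSMS: index of the SMS in the database
    user: int        # User: number standing for the sender's GSM number
    sex: str         # Sex: "M"/"F" here (assumed encoding), used to check agreement
    flag: str        # Flag: message annotations (free text, an assumption)
    message: str     # Message: original SMS, already anonymized
    trans: str = ""  # Trans.: translation into "standard" French

    def with_translation(self, text: str) -> "SmsRecord":
        """Return a new record carrying the translation; `message` is copied unchanged."""
        return replace(self, trans=text)


rec = SmsRecord(id_sms=1, user=42, sex="F", flag="", message="slt cv?")
done = rec.with_translation("Salut, ça va ?")
print(done.message, "->", done.trans)   # slt cv? -> Salut, ça va ?

try:
    rec.message = "salut ça va"  # general rule 1: the original SMS is never modified
except FrozenInstanceError:
    print("original message is read-only")
```

Making the record frozen turns the first general rule into something the program enforces: adding a translation produces a new record instead of overwriting the original text.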
The corpus
In the end, 30,000 SMS were translated:
1. randomly selected messages from 1,736 authors who filled in a sociolinguistic profile
2. to avoid any bias, 11% of the SMS, from 799 authors, have no associated profile
Publication: on CD-ROM
The corpus is distributed as a database linked to a graphical interface for searching and sorting the original and translated messages as well as the author profiles.
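To make the search-and-sort idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names, the author attributes (sex, age) and the three sample messages are invented for illustration; the actual CD-ROM database and interface may be organized quite differently. Because each SMS is stored next to its translation, a query on the standard French form also retrieves non-standard spellings.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical schema: one table of author profiles, one of SMS with translations."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE author (user_id INTEGER PRIMARY KEY, sex TEXT, age INTEGER);
        CREATE TABLE sms (
            id_sms  INTEGER PRIMARY KEY,
            user_id INTEGER REFERENCES author(user_id),
            message TEXT,   -- original SMS
            trans   TEXT    -- "standard" French translation
        );
    """)
    con.executemany("INSERT INTO author VALUES (?, ?, ?)",
                    [(1, "F", 34), (2, "M", 17)])
    con.executemany("INSERT INTO sms VALUES (?, ?, ?, ?)", [
        (1, 1, "slt cv?", "Salut, ça va ?"),
        (2, 2, "jsui en retar", "Je suis en retard"),
        (3, 2, "a2m1", "À demain"),
    ])
    return con


def search(con: sqlite3.Connection, needle: str) -> list[tuple]:
    """Match the original or the translation; sort youngest author first, then by id."""
    pattern = f"%{needle}%"
    return con.execute("""
        SELECT s.id_sms, a.sex, a.age, s.message, s.trans
        FROM sms AS s JOIN author AS a ON a.user_id = s.user_id
        WHERE s.message LIKE ? OR s.trans LIKE ?
        ORDER BY a.age, s.id_sms
    """, (pattern, pattern)).fetchall()


con = build_demo_db()
for row in search(con, "retard"):
    print(row)
```

Here `search(con, "retard")` returns message 2 even though the original SMS spells the word *retar*, because the match is made on the translation.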
Conclusion
This SMS corpus is unique in its size and accuracy, the number of contributors and the amount of metadata. It has also been translated manually, yielding a bilingual corpus that pairs each SMS variant with its standard French equivalent.
The translation opens new perspectives for the study of SMS language and adds considerable value to the corpus.
http://www.sms4science.org/userfiles/A%20translated%20corpus.pdf
Monday, April 12, 2010
Thursday, April 8, 2010
Generation Txt? The sociolinguistics of young people’s text-messaging

By Crispin Thurlow
This article examines how the ‘net generation’ uses mobile phone text-messaging (SMS), analyzing its linguistic forms and communicative functions as a novel, creative means of enhancing and supporting intimate relationships and existing social networks.
Main interpretations and preliminary discussion:
1. message length: text messages are longer than online chat messages.
☞ SMS → a more interactive form of written discourse than CMC (computer-mediated communication)
☞ Mobile phone text-message → ‘predictive text’
2. ‘New’ linguistic forms
① shortenings, contractions, G-clippings and other clippings
② acronyms & initialisms
③ letter/number homophones
④ ‘misspellings’ and typos
⑤ non-conventional spellings
⑥ accent stylizations with capitalization (prosodic & personal style)
Primary functional orientation of each message:
1. informational-practical orientation
2. informational-relational orientation
3. practical arrangement orientation
4. social arrangement orientation
5. salutary orientation
6. friendship maintenance orientation
7. romantic orientation
8. sexual orientation
9. chain messages
In the results, the functional categories with high intimacy and a strong relational orientation (categories 4, 5, 6, 7 and 8) accounted for a larger share of the messages.
Text-messaging also serves other functions for its users, such as expressing humor, taboo, relative licentiousness or flame potential, as well as hyper-coordination and co-presence.
The communication imperative:
- Young people prefer text-messaging because it is an unobtrusive and relatively inexpensive mode of communication.
- Young people's text-messaging is becoming increasingly dialogic, like online chat.
Four gratifications for young people (compared with other CMC such as email):
① high transportability
② reasonable affordability (price)
③ good adaptability (voice)
④ general suitability (it is quiet and discreet)
→ need for intimacy and social intercourse; ‘technologies of sociability’
The language of SMS:
‘Re-inventing the (English) language?’
The linguistic and communicative practices of text-messaging emerge from a particular combination of:
① technological affordances (abilities of technology)
② contextual variables
③ interpersonal priorities
The sociolinguistic maxims of SMS
① brevity & speed
② paralinguistic restitution
③ phonological approximation
Non-standardness in SMS
① ‘new’ linguistic form → incomprehensible
② impenetrability & exclusivity of SMS language
④ quantity & manner:
i. abbreviation
ii. non-conventional spelling
iii. phonological approximation
→ young people ‘write it as if saying it’
On this view, young people's text messages are intelligible and appropriate to their overall communicative function.
In conclusion
1. text-messaging → ‘folded into the warp and woof of life’
2. new linguistic practices → adaptive & addictive
3. young people's text-messaging manipulates conventional practices with linguistic creativity and communicative competence in the service of intimacy and social intercourse