United States Navy researchers want to build a global social media archive containing 350 billion digital data records as part of ongoing research at the Naval Postgraduate School in Monterey, CA, conducted through the school's Department of Defense Analysis.
As detailed in its synopsis, the military research project aims to provide "enhanced understanding of fundamental social dynamics, to model the evolution of linguistic communities, and emerging modes of collective expression, over time and across countries."
The U.S. Navy intends to go through social media records spanning at least July 1, 2014, to December 31, 2016, with the data to be collected from a single social media platform and consisting of "all publicly available messages, comments, or posts transmitted on the platform over the specified time period."

Archive to contain records from 200M users in 100 countries
Messages from 200 million unique users in at least 100 countries will be added to the Navy's global social media archive, and no single country may account for more than 30% of the users whose data is collected.
In addition, the archive "must include messages written in at least 60 languages, with at least 50% of the messages written in non-English languages."
However, as the project summary also mentions, the collected data "must consist exclusively of publicly available information," with no private information to be crawled or added to the database.
The remaining minimum requirements for the 350 billion records to be collected for the archive are as follows:
• Each record in the archive must provide the full text of a social media post, unaltered from its original content and formatting, with all publicly available meta-data, including country, language, hashtags, location, handle, timestamp, and URLs, that were associated with the original posting.
• All records must include the time and date at which each message was sent and the public user handle associated with the message.
• Approximate location information, providing self-reported user hometowns or other publicly available geo-location information, must be included for at least 20% of the records.
The research project's synopsis also says that the data will be used for pedagogical purposes, to provide "students new opportunities for thesis research and the development of 'big data' analytic skills."
The military research team wants to "acquire a large-scale global historical archive of social media data, providing the full text of all public social media posts, across all countries and languages covered by the social media platform."
"Social media data allows us for the first time, to measure how colloquial expressions and slang evolve over time, across a diverse array of human societies, so that we can begin to understand how and why communities come to be formed around certain forms of discourse rather than others," T. Camber Warren, the main researcher assigned to the project, told Bloomberg.