Facebook is investing significant money and time in augmented reality glasses, including its Ray-Ban collaboration. Right now, these gadgets can only record and share imagery, but a new research project led by the company's AI team suggests the scope of its ambitions. It imagines AI systems that constantly analyze people's lives using first-person video — recording what they see, do, and hear in order to help them with everyday tasks.
The researchers outline a series of skills they want these systems to develop, including "episodic memory" and "audio-visual diarization." No AI system can currently tackle these tasks reliably, and the company stresses that this is a research project rather than a commercial development.
Facebook sees functionality like this as the future of AR computing, but such ambitions carry huge privacy implications. Privacy experts are already worried about how AR glasses let wearers covertly record members of the public. Those concerns will only grow if future versions of the hardware not only record footage but also analyze and transcribe it.
The company's research project is named Ego4D, which refers to the analysis of first-person, or "egocentric," video. It has two major components: an open dataset of egocentric video and a series of benchmarks that the company thinks AI systems should be able to tackle in the future.
To collect the data, the company partnered with 12 universities worldwide. Some 855 participants in nine countries recorded roughly 3,205 hours of footage. The universities, rather than the company, were responsible for collecting the data.
Some participants were paid. They wore AR glasses and GoPro cameras to record video of unscripted activity, ranging from construction work to baking to playing with pets and socializing with friends. The universities de-identified all footage, blurring the faces of bystanders and removing personally identifiable information. In both scale and diversity, the data is the first of its kind.
The second component of Ego4D is a series of benchmarks, or tasks, that the company wants researchers worldwide to try to solve using AI systems trained on its dataset. These include social interaction (who is speaking with whom), audio-visual diarization (what is said, and when), object and hand manipulation (what is being done), and forecasting (what a person will do next). Right now, AI systems would find any of these problems extremely challenging.
Facebook believes that systems trained on Ego4D might one day be used not only in wearable cameras but also in home assistant robots, which likewise rely on first-person cameras to navigate the world around them. The tasks seem achievable, but many will be worried by the company's interest in this area. Its record on privacy is terrible, including data leaks and a $5 billion fine from the FTC. Notably, the benchmarks in the Ego4D project include no prominent privacy safeguards.
According to a company spokesperson, privacy safeguards are expected to be added further down the line: to the extent that companies use this dataset and the benchmarks to develop commercial applications, the expectation is that safeguards will be developed for those applications.