Is your Data Science team increasing your cyber exposure?

With cyber attacks and data breaches making the headlines almost every week, it is surprising to see how little people know about cyber risks. Many of these breaches are the result of poor security measures, human error, or both: systems are not updated fast enough, and sensitive data is stored unencrypted or is encrypted with easy-to-break algorithms. To better understand the magnitude of the problem, see the 2018 Verizon Data Breach Investigations Report.
 
If the problem starts with poor IT practices, it definitely does not end there. Phishing and ransomware campaigns target unsuspecting employees, using deception and social engineering methods. Some companies have invested in training, but this knowledge stays relatively abstract, and most employees don’t apply it to their everyday work and life. For example, they exchange data in Excel spreadsheets by email, without checking whether the data contains sensitive information and without encrypting it.
 
If you think the current situation is bad, consider your data science team. Their daily activities require them to handle large amounts of data. This data is usually stored in systems that are managed by your IT organization. The data science team either gets direct access to it or receives an export of it in a text file. In either case, the data is prepared for modeling and ends up unencrypted on a file system somewhere, outside any IT control or security measures. To compound the problem, this data is sometimes sent by email.
 
To change this sad state of affairs, companies need to create a culture where everyone is conscious of cyber exposure and knows what to do on a daily basis to mitigate it.
 
Secure your IT infrastructure
It might be time to change the way you think about security. The traditional perimeter security model relies on network segmentation as the primary mechanism for protecting sensitive resources: devices inside your firewall are assumed to be more trustworthy than the ones outside of it. After the Operation Aurora attacks in 2009, Google developed a zero trust security model called BeyondCorp. In this model, applications are made accessible over the internet, and access rights are managed at the user and device levels rather than by network location.
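To make the idea of user- and device-level access more concrete, here is a minimal sketch in Python. It is not BeyondCorp's implementation; the `User` and `Device` classes, the `RESOURCE_GROUPS` policy table, and the device checks are all hypothetical names chosen for illustration. The point it tries to show is that the decision depends on who the user is and on the state of the device, and not on whether the request comes from inside the firewall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    device_id: str
    managed: bool         # assumption: device is enrolled in the company inventory
    disk_encrypted: bool  # assumption: full-disk encryption is turned on
    patched: bool         # assumption: OS security updates are current


@dataclass(frozen=True)
class User:
    username: str
    groups: frozenset


# Hypothetical policy table: resource name -> groups allowed to reach it.
RESOURCE_GROUPS = {
    "modeling-datasets": frozenset({"data-science"}),
}


def is_access_allowed(user, device, resource, source_ip=None):
    """Decide access from user and device attributes only.

    source_ip is accepted but deliberately ignored: in this sketch,
    network location grants no additional trust.
    """
    device_ok = device.managed and device.disk_encrypted and device.patched
    allowed_groups = RESOURCE_GROUPS.get(resource, frozenset())
    user_ok = bool(user.groups & allowed_groups)
    return device_ok and user_ok
```

With this sketch, a data scientist on a managed, encrypted, patched laptop is allowed through from any network, while the same person on an unmanaged device is refused even from an office IP address.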
 
Decide with IT where files should be stored and determine the best encryption solutions
Agreeing with IT on where modeling datasets are to be stored is a first step towards better managing your cyber risks. It is impossible for IT to secure data they don’t even know exists.
 
Remove any sensitive information
Employees need to know the different types of sensitive information. The main ones are:
 
Personally Identifiable Information (PII) is information that can be used on its own or with other information to identify, contact, or locate a single person, or to identify an individual in context.
 
Protected Health Information (PHI), under US law, is any information about health status, provision of health care, or payment for health care that is created or collected by a Covered Entity (or a Business Associate of a Covered Entity) and can be linked to a specific individual. This is interpreted rather broadly and includes any part of a patient’s medical record or payment history.
 
Payment Card Information (PCI) is information related to payment cards, such as credit card numbers.
 
Companies usually also hold more specialized sensitive information of their own and need to make it known to their employees. Access to that information needs to be restricted, and it should never be stored in an unencrypted database or file.
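As an illustration of removing sensitive information before modeling, the sketch below drops columns that hold sensitive values and replaces a record identifier with a keyed hash, so that rows can still be joined without exposing the original identifier. The column names in `SENSITIVE_COLUMNS`, the `customer_id` field, and the function names are assumptions made for this example; a real list has to come from your own data inventory. Note that this is pseudonymization rather than full anonymization: anyone who holds the key can recompute the tokens.

```python
import hashlib
import hmac

# Hypothetical column names; replace with the fields your company has
# classified as sensitive.
SENSITIVE_COLUMNS = {"name", "email", "card_number"}


def pseudonymize(value: str, key: bytes) -> str:
    """Return a keyed SHA-256 hash (HMAC) of the value as a hex string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_sensitive(rows, key, id_column="customer_id"):
    """Drop sensitive columns and replace the identifier with a token."""
    cleaned = []
    for row in rows:
        out = {k: v for k, v in row.items() if k not in SENSITIVE_COLUMNS}
        if id_column in out:
            out[id_column] = pseudonymize(str(out[id_column]), key)
        cleaned.append(out)
    return cleaned
```

The same identifier always maps to the same token under a given key, so the cleaned dataset remains usable for modeling. The key itself is sensitive and should be kept away from the dataset.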
 
Make sure your modeling datasets are stored encrypted and decrypted only in memory (never stored unencrypted on the file system)
The first stage of any data science project is to prepare data for modeling. The result of this stage is a modeling dataset that is usually stored in a database or a file. Even when all sensitive information has been removed and the data anonymized, it is good practice to encrypt this modeling dataset. Any analysis should decrypt it on the fly, in memory. This way the data is never stored unencrypted anywhere.
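One possible way to do this in Python is sketched below. It assumes the third-party `cryptography` package is installed and uses its Fernet recipe for symmetric encryption; the function names and the CSV format are choices made for this example, not a recommendation from your IT team. The dataset is written to disk only in encrypted form and is decrypted into memory when it is read.

```python
import csv
import io

from cryptography.fernet import Fernet  # third-party: pip install cryptography


def write_encrypted(path, csv_text, key):
    """Encrypt CSV text and write only the ciphertext to disk."""
    token = Fernet(key).encrypt(csv_text.encode("utf-8"))
    with open(path, "wb") as f:
        f.write(token)


def read_encrypted(path, key):
    """Read the ciphertext and decrypt it in memory; no plaintext file is created."""
    with open(path, "rb") as f:
        token = f.read()
    plaintext = Fernet(key).decrypt(token).decode("utf-8")
    return list(csv.DictReader(io.StringIO(plaintext)))
```

A key can be created with `Fernet.generate_key()`. Where that key lives matters as much as the encryption itself: storing it next to the encrypted file defeats the purpose, so agree with IT on a key store.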
 
Avoid sending data by email
Email is an insecure medium and should never be used to exchange unencrypted data. Emails can be spoofed very easily, which makes them a very efficient tool for phishing campaigns. Digitally signing your emails is a way to authenticate them.
 
Always use highly secure encryption tools if you have to send data by email
There are secure solutions that allow you to encrypt attachments. Most of them are seamless to use.
 
Implementing these very simple steps would go a long way toward reducing companies’ cyber exposure. Security is everyone’s responsibility and should not be taken lightly.