Application of machine learning for real-time evaluation of salinity (or TDS) in drinking water using photonic sensors

The world faces an unprecedented problem in safeguarding the 0.4 % of water that is potable, which is depleting day by day. A literature survey shows that the refractive index (RI) of water changes with salinity or total dissolved solids (TDS). In this paper we propose an automatic system for real-time evaluation of salinity or TDS in drinking water. A photonic crystal (PhC) based ring resonator sensor has been designed and simulated using the MEEP (MIT Electromagnetic Equation Propagation) tool and the finite difference time domain (FDTD) algorithm. The modelled sensor is highly sensitive to changes in the RI of a water sample. The work includes a real-time natural sequence follower, a machine learning algorithm of the naive Bayesian type: a sequence of statistical algorithms implemented in MATLAB that analyses the sample water with reference to training data. A Raspberry Pi device is interfaced to provide an easy display of the result of the water analysis. The main purpose of the sensor and its interface is to check whether the salinity or TDS in drinking water is less than 1000 ppm. If it is greater than or equal to 2000 ppm, the display shows “High Salinity/TDS Observed”, and if it is less than or equal to 1000 ppm, the display shows “Low salinity/TDS Observed”. The proposed sensor is highly sensitive and can detect changes in TDS level caused by any substance dissolved in water.


Introduction
Drinking water (or potable water) is water considered safe enough for humans to consume or to use for domestic and medical purposes with a low risk of immediate or long-term harm. In most countries, the salinity of drinking water is restricted to less than 1000 ppm. Salinity is the measure of the concentration of salts in water. A greater concentration of salts not only affects the taste of the water, but also causes health hazards. TDS include inorganic salts and organic matter dissolved in water, and a TDS level between 300 and 600 mg L⁻¹ is considered to be good (Fawell et al., 1996). Hence water needs to be evaluated before it is allowed to be consumed (Walker and Newman, 2011). TDS is a water quality parameter that can be measured by water purity measuring devices.
There are several methods of measurement used for drinking water; however, we have studied the following methods for measuring water purity.

Electrical conductivity (EC) method for measurement of TDS in water
The electrical conductivity method is basically used in conventional TDS measurement devices. In this type of TDS meter, voltage is applied between two or more electrodes.
Positively charged ions like sodium (Na⁺), calcium (Ca²⁺), and magnesium (Mg²⁺) will get attracted towards the negatively charged electrode. Negatively charged ions like chloride (Cl⁻), sulfate (SO₄²⁻), and bicarbonate (HCO₃⁻) will get attracted towards the positively charged electrode. A moving charge produces an electrical current. Neutral molecules remain unaffected by the electrical attraction of the electrodes. The meter then measures the generated current.
The measured current is a function of the following constituents of water under investigation.
- Quantity and types of ions actually present in the sample water.
- Ions with higher charges tend to have higher conductivity.
- Larger ions have lower conductivity because their size produces a "drag" effect.
- Conductivity of ions in water depends upon temperature.
TDS meters internally convert the measured current into parts per million (ppm). Such devices, however, have limitations as detailed below.
- Because most devices use the conductivity method, they do not measure all dissolved solids, such as sugar, alcohol, organic contents, silica, ammonia, carbon dioxide, iron oxide, dissolved bacteria and viruses.
- Different units of measurement are used even though all are referred to as ppm (parts per million).
- The meters come with a factory calibration; the meter may sometimes need to be recalibrated using a standard solution.
- A pen-type TDS meter is specific to one type of dissolved solid solution and must not be transferred from one type of dissolved solid solution or sample to the next, as this may result in serious errors. This is because TDS meters are calibrated by correlating the conductivity of the solution with the ppm of dissolved solids, and this correlation varies considerably from one species of dissolved solid to another.
- When TDS meters are not carefully calibrated, it is not clear whether they refer to the ppm of sodium chloride equivalents or to something else, such as potassium chloride (KCl).
- To compensate for the temperature effect, ATC (automatic temperature compensation) must be part of the device to provide a value that is "corrected" to a standard temperature (25 °C).
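The conversion and compensation steps described above can be illustrated with a minimal sketch. It assumes a linear temperature model with a coefficient of 2 % per °C and conversion factors of 0.5 and 0.7; these numbers, and the function names, are illustrative assumptions and are not taken from any particular meter.

```python
def compensate_to_25c(ec_measured_us_cm, temp_c, alpha=0.02):
    """Refer a conductivity reading (µS/cm) to 25 °C.

    Assumes a linear temperature dependence; alpha (fraction per °C)
    is an illustrative value, not a calibrated constant.
    """
    return ec_measured_us_cm / (1.0 + alpha * (temp_c - 25.0))


def tds_ppm(ec25_us_cm, factor):
    """Convert compensated conductivity to a TDS estimate in ppm.

    The factor depends on which dissolved solid the meter was
    calibrated against, which is why readings are not transferable.
    """
    return factor * ec25_us_cm


# A reading of 1100 µS/cm taken at 30 °C, referred back to 25 °C.
ec25 = compensate_to_25c(1100.0, 30.0)

# The same conductivity gives different ppm values under two
# hypothetical calibrations.
tds_a = tds_ppm(ec25, factor=0.5)
tds_b = tds_ppm(ec25, factor=0.7)
print(round(ec25, 3), round(tds_a, 3), round(tds_b, 3))
```

The two results differ by 40 % for the same sample, which is the calibration ambiguity noted in the list of limitations.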

Optical refractive index method
Light passing through water bends at an angle that depends on the effective RI of the water, which is set by the dissolved elements. Liquid refractometry is therefore useful for detecting variation in the salinity/TDS of water.
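The dependence of the bending angle on RI follows Snell's law, n1 sin θ1 = n2 sin θ2. The sketch below is only an illustration: the two RI values and the 45° incidence angle are assumed for the example, and the function name is hypothetical.

```python
import math


def refraction_angle_deg(n1, n2, incidence_deg):
    """Refraction angle from Snell's law, n1*sin(t1) = n2*sin(t2)."""
    s = n1 * math.sin(math.radians(incidence_deg)) / n2
    if abs(s) > 1.0:
        raise ValueError("total internal reflection: no refracted ray")
    return math.degrees(math.asin(s))


# Light entering from air (n = 1.0) at 45 degrees into two liquids
# whose RI differs by 0.008 (illustrative values).
theta_low_ri = refraction_angle_deg(1.0, 1.332, 45.0)
theta_high_ri = refraction_angle_deg(1.0, 1.340, 45.0)
shift = theta_low_ri - theta_high_ri
print(theta_low_ri, theta_high_ri, shift)
```

The higher-RI liquid bends the ray more strongly, and the angular shift is a fraction of a degree, which is why a sensitive readout is needed to resolve small salinity changes.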
The proposed device in this work is based on the detection of variation of the effective RI of water because of the TDS. In subsequent sections we have detailed the working of the proposed device. The following are the advantages of the PhC sensor-based device presented in the current work.
- Immune to electromagnetic interference, as the measurement is based on RI change, not on electrical conductivity.
- Higher sensitivity, a compact sensing unit (nearly the size of a coin), and higher safety in hazardous environments.
- The possibility of processing the signal at large distances from the sensor with little degradation.
- A non-invasive method: no probe is inserted into the water, so there is no material influence on the sample water.
- Requires simple circuitry and highly accurate methods of sensing.
- The inorganic compounds NaCl and KCl can be differentiated on the basis of their respective refractive indices.
The sensing element of the proposed sensor is designed using PhCs. PhCs are periodic structures with a band gap that restricts the propagation of a specific frequency range of light. This property enables one to control light and produce effects that are impossible with conventional optics.

pH sensing method
From the literature survey it has been observed that pH sensors are electrochemical devices used for the detection of hydrogen ions. The pH factor is used to measure the acidity or alkalinity of water. The pH value is determined by the combination of all the acids and bases present, but it is also influenced by the buffering capacity of the water and by temperature. The main limitation of this method is that the pH change due to a particular acid or base cannot be measured, so the method is less suited to detection of general water quality. There are several other methods of study, like CaCO₃, CEC and SAR. The CO₃ of water is evaluated in conjunction with bicarbonates for several important evaluations such as alkalinity, the sodium adsorption ratio (SAR) (Hossain et al., 2016), adjusted sodium adsorption ratio (SAR adj.), and residual sodium carbonate (RSC). Carbonates will not be a significant component of water at a pH below 8.0, and will likely dominate at a pH above 10.3.

Light propagation in PhC
EM propagation through a medium is dependent on the permittivity of the medium. It is furthermore dependent on the RI of the material. As the RI changes, the permittivity changes; as a result, the EM propagation is also affected.
The propagation of light in the PhC (sensing element) is described by Maxwell's electromagnetic (EM) equations, in which E(r, t) is the time-varying electric field, H(r, t) is the time-varying magnetic field, ε(r) is the permittivity, µ(r) is the permeability, and σ(r) is the conductivity of the medium.
For an EM wave propagating in any medium, these equations are written component-wise in the Cartesian co-ordinate system for the electric and magnetic fields. The Bloch-Floquet theorem states that an EM wave propagating in a varying dielectric structure is modulated by the periodicity of the structure. The dielectric structure repeats with period p, the period of the crystal, and the EM field (electric or magnetic, as a function of r and t) takes the corresponding Bloch form, in which K is the propagation constant and a is the period of the crystal. The proposed sensor uses the FDTD (finite difference time domain) algorithm, which solves Maxwell's EM equations (Yee, 1996). For the proposed structure, a Gaussian pulse is used as a source, the fields are updated at each point of the Yee grid according to the finite difference Maxwell curl equations, and the obtained output samples are normalized with respect to the input signal.
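As a reading aid, the relations named in this section can be written in their standard textbook form. This is a sketch under assumptions: the symbol ψ for the generic field, the one-dimensional statement of periodicity along x, and the absence of equation numbers are choices made here for illustration and are not necessarily the authors' notation.

```latex
% Maxwell curl equations in a medium with permittivity, permeability and conductivity
\nabla \times \mathbf{E}(\mathbf{r},t) = -\mu(\mathbf{r})\,\frac{\partial \mathbf{H}(\mathbf{r},t)}{\partial t},
\qquad
\nabla \times \mathbf{H}(\mathbf{r},t) = \varepsilon(\mathbf{r})\,\frac{\partial \mathbf{E}(\mathbf{r},t)}{\partial t} + \sigma(\mathbf{r})\,\mathbf{E}(\mathbf{r},t)

% Periodicity of the dielectric structure with period p (one-dimensional form)
\varepsilon(x + p) = \varepsilon(x)

% Bloch-Floquet form of the field with propagation constant K and period a
\psi(x + a, t) = e^{iKa}\,\psi(x, t)
```

The FDTD method discretizes the two curl equations on the Yee grid and steps them forward in time.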
www.drink-water-eng-sci.net/9/37/2016/ Drink. Water Eng. Sci., 9, 37-45, 2016
In a PhC, the RI is periodically modulated, with a periodicity of the order of the wavelength. PhCs are periodic structures of dielectric material which allow the propagation of a certain frequency range of light (Joannopoulos et al., 1995, 1997) and stop others (forbidden band gap). This unique behaviour of a PhC is used to control the propagation of light (Meade et al., 1992). The deviation of light in a lattice structure can be controlled by defect engineering. Equation (9), obtained by solving Maxwell's electromagnetic equations, explains the movement of light in a PhC, and Eq. (10) relates the permittivity to the RI:
$$\varepsilon = n^{2} \qquad (10)$$
where n is the refractive index. As in Eq. (9), the permittivity of a medium (ε) changes as the angular frequency of resonance (ω) changes. Equation (10) shows that ε is dependent on RI and is the basis for using a PhC as a sensor (Liu and Salemink, 2012). Methods like the photonic band gap method, the effective RI method, spectroscopy, and optical imaging are available (Fan et al., 2008). Since the input variations are very small, the sensitivity of these methods is low (Nguyen et al., 2011).
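Equation (9) is referred to but its form is not shown in this text. Assuming it is the standard frequency-domain master equation for the magnetic field in a PhC, it and the small-signal consequence of Eq. (10) can be sketched as follows; the value n ≈ 1.333 for water and the step Δn = 10⁻⁵ are illustrative.

```latex
% Assumed form of the master equation linking permittivity and resonance frequency
\nabla \times \left( \frac{1}{\varepsilon(\mathbf{r})}\, \nabla \times \mathbf{H}(\mathbf{r}) \right)
  = \left( \frac{\omega}{c} \right)^{2} \mathbf{H}(\mathbf{r})

% Differentiating \varepsilon = n^2 for a small RI change
\Delta\varepsilon \approx 2 n\, \Delta n = 2 \times 1.333 \times 10^{-5} \approx 2.7 \times 10^{-5}
```

An RI change of the order of 10⁻⁵ therefore perturbs the permittivity by a comparable amount, and through the eigenvalue (ω/c)² this shifts the resonance frequency that the sensor reads out.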
The sensors are designed and simulated with the MEEP tool, an FDTD simulation software package for modelling electromagnetic systems. To compute the transmission flux at each frequency ω, the continuous electromagnetic field is sampled in a finite volume of space, and the flux is determined by Eq. (11).
To calculate P(ω), the following steps are used in the MEEP tool.
1. Compute the integral of the Poynting vector P(t) for each time.
2. Fourier transform the result of step 1.
3. Compute flux at the specified regions and frequencies.
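The flux calculation can be sketched in plain Python for a single monitor point. MEEP's documentation describes accumulating the Fourier transforms of the E and H fields at the monitor and forming the flux from the transformed fields, so the sketch follows that order rather than transforming P(t) directly. Everything else is an assumption made for illustration: the one-point monitor, H taken proportional to E, the pulse duration, and the function names.

```python
import cmath
import math


def dft_at(samples, dt, freqs):
    """Discrete-time Fourier transform of a sampled signal at chosen frequencies."""
    return [
        sum(s * cmath.exp(2j * math.pi * f * k * dt) for k, s in enumerate(samples)) * dt
        for f in freqs
    ]


def flux_spectrum(e_t, h_t, dt, freqs, area=1.0):
    """P(f) = area * Re(conj(E(f)) * H(f)) for fields sampled at one monitor point."""
    e_f = dft_at(e_t, dt, freqs)
    h_f = dft_at(h_t, dt, freqs)
    return [area * (e.conjugate() * h).real for e, h in zip(e_f, h_f)]


# Gaussian pulse with centre frequency 0.295 (the value listed in the design
# specification); the temporal width sigma_t is an assumed value.
dt, n_steps, fcen, sigma_t, t0 = 0.1, 1200, 0.295, 10.0, 60.0
times = [k * dt for k in range(n_steps)]
e_in = [math.exp(-((t - t0) ** 2) / (2 * sigma_t ** 2)) * math.cos(2 * math.pi * fcen * (t - t0))
        for t in times]
h_in = list(e_in)                      # assumption: unit wave impedance
e_out = [0.5 * v for v in e_in]        # hypothetical structure passing half the amplitude
h_out = [0.5 * v for v in h_in]

freqs = [0.245 + 0.005 * i for i in range(21)]
p_in = flux_spectrum(e_in, h_in, dt, freqs)
p_out = flux_spectrum(e_out, h_out, dt, freqs)

# Normalize the output with respect to the input signal.
transmission = [po / pi for po, pi in zip(p_out, p_in)]
peak_index = max(range(len(freqs)), key=p_in.__getitem__)
print(freqs[peak_index], transmission[peak_index])
```

In a real simulation the output fields would come from the structure loaded with the water sample, and the shift of the transmission peak between samples is what carries the salinity information.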

Machine learning algorithm
Machine learning is an automated process in which future performance improves on the basis of learning from the past.
The key element is to devise learning algorithms that learn automatically with minimum human intervention.
A machine learning algorithm allows the developed application to come up with its own assessment based on supplied training data (Haung et al., 2010).
The naive Bayes algorithm is a classification technique based on the Bayes theorem with an assumption of independence among predictors (Rish, 2001). The naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. Despite its simplicity, naive Bayes can do better than other existing classification methods. The Bayes theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x) and P(x|c). This is defined in Eq. (13):
$$P(c|x) = P(x_1|c) \times P(x_2|c) \times \dots \times P(x_n|c) \times P(c) \qquad (13)$$
P(c|x) is the posterior probability of a class (c, target) given the predictor (x, attributes). P(c) is the prior probability of a class. P(x|c) is the likelihood, which is the probability of a predictor given the class.
P(x) is the prior probability of a predictor. Depending on various attributes, the algorithm based on the naive Bayes theorem predicts the probability of different classes. This algorithm is used to solve problems with multiple classes.
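A small worked example may help to show how the product in Eq. (13) is used. The class names and probability values below are invented for illustration. The product is divided by its sum over classes, which plays the role of P(x), so that the posteriors add up to one.

```python
def naive_bayes_posteriors(priors, likelihoods):
    """Posterior P(c|x) for each class from P(c) and the per-feature P(x_i|c).

    priors:      {class: P(c)}
    likelihoods: {class: [P(x_1|c), P(x_2|c), ...]}
    """
    scores = {}
    for c, prior in priors.items():
        score = prior
        for p in likelihoods[c]:
            score *= p                 # product of Eq. (13)
        scores[c] = score
    total = sum(scores.values())       # equals P(x) under the naive assumption
    return {c: s / total for c, s in scores.items()}


# Hypothetical two-class problem with two observed features.
priors = {"low": 0.5, "high": 0.5}
likelihoods = {"low": [0.8, 0.6], "high": [0.2, 0.3]}

posteriors = naive_bayes_posteriors(priors, likelihoods)
predicted = max(posteriors, key=posteriors.get)
print(posteriors, predicted)
```

Because P(x) is the same for every class, the class with the largest unnormalized product is also the class with the largest posterior.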

Methodology
A Gaussian light pulse is considered as the source for the simulation (Oskooi et al., 2010). The simulated data and the available reference data (training data) are given as input to the MATLAB program. The output results are displayed on an LCD screen along with a voice message, using the Raspberry Pi kit.

Sensor design
The objective is to design a two-dimensional PhC-based sensor (Akahane et al., 2003) for water analysis. The refractive indices of water with different salinity/TDS were used and simulations were carried out for the variations in properties of the sample for each constituent (Sharan et al., 2013; Lavanya et al., 2014). A shift in output transmitted power and frequency is observed. Figure 3 shows the PhC-based sensor design and light propagation.
The design specifications are itemized in the numbered list that follows the caption of Fig. 6. This is a generic, highly sensitive optical sensor for continuous real-time detection covering the full spectrum of possible chemical contaminants, organics and turbidity. It is low cost and low maintenance because it requires no consumables. The sensor measures RI changes in water, using the Mach-Zehnder interferometry (MZI) principle. Any substance, when dissolved in water, will change the RI of the water, and every substance has a unique RI. Dissolved particles in water result in a combined RI called the effective RI, to which any dissolved substance contributes. A change in the composition of water will therefore result in a change in the effective RI. The proposed sensor can detect this change in RI irrespective of the nature of contamination, whether inorganic, organic or other. A brief comparison of the detection capability of the proposed sensor with conventional sensors is shown in Table 1.
As can be concluded from the results in Table 2:
- The RI changes by the order of 10⁻⁵ with a change in the % salinity of water. The salinity variation is influenced by the TDS in water. The proposed method can detect an RI change of the order of 10⁻⁵.
- The proposed sensor is highly sensitive and can detect variation of salinity (TDS) in the range of 0.01 to 30 % with an accuracy of 0.04 %.
Drinking water always contains inorganic salts, organic matter and particles. Particles larger than a few micrometres in size make the greatest contribution to the RI seen by the photonic sensor, while salts, which are normally a few nanometres in size, make a much smaller contribution. As such, even a trace amount of particles and organic matter in the measured water can greatly influence the RI. This may limit the application of the PhC sensor for measuring the TDS in drinking water. However, in the proposed PhC sensor we have addressed this by considering the following.
1. Any substance, when dissolved in water, will change the RI of the water. The change in RI is proportional to the concentration and the RI of the substance, so the relationship between RI and concentration is linear. This linearity is maintained when a substance is dissolved in water containing various elements, provided that there is no chemical interaction between the added substance and the elements already present in the initial water solution. So even a small concentration of inorganic salt will affect the effective RI of the water. From the literature (Deosarkar et al., 2012) we have found that the RI of a 10 % (v/v) ethanol and water mixture is 1.332, whereas the RI of KCl solutions in a 10 % (v/v) ethanol and water mixture at 303.15 K is 1.340. So there is a distinct change in RI (0.008) because of the inorganic salt KCl in the mixture.
2. The change in RI because of KCl is of the order of 10⁻³ and the studied sensor has an accuracy of the order of 10⁻⁵. Hence the PhC sensor will be able to overcome the limitation of detecting the lower contributions of inorganic salts to the effective RI.
3. To ensure that the changes in RI due to inorganic salts are detected, the designed machine learning system matches the signature of the water constituent detected. Essentially each element of TDS in water will correspond to a unique peak frequency of light when passed through water (Sharan et al., 2013). The unique transmission spectrum of each water constituent is considered the signature of the respective element.
4. In the machine learning process each signature of TDS (organic, inorganic and others) is stored in the database as a reference signature. During detection of TDS, if the signature matches the stored value, then the presence of the element is confirmed.
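The signature lookup in points 3 and 4 can be sketched as a nearest-match search over stored peak frequencies. This is a simplified stand-in, not the authors' implementation: the reference frequencies, the tolerance and the function name are hypothetical, and a full signature would be the whole transmission spectrum rather than a single peak.

```python
def match_signature(peak_freq, references, tolerance):
    """Return the label of the closest stored peak frequency.

    Returns None when no stored signature lies within the tolerance,
    i.e. the constituent is not confirmed.
    """
    best_label, best_distance = None, None
    for label, ref_freq in references.items():
        distance = abs(peak_freq - ref_freq)
        if best_distance is None or distance < best_distance:
            best_label, best_distance = label, distance
    if best_distance is not None and best_distance <= tolerance:
        return best_label
    return None


# Hypothetical reference database of peak frequencies (normalized units).
references = {"NaCl": 0.2950, "KCl": 0.2962, "sugar": 0.2975}

confirmed = match_signature(0.2961, references, tolerance=0.0003)
unknown = match_signature(0.2990, references, tolerance=0.0003)
print(confirmed, unknown)
```

The tolerance should be chosen from the frequency resolution of the simulated spectrum, so that neighbouring signatures do not overlap.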
Figure 5c is a replica of Fig. 5a or b.The only distinguishing factor for the current scenario is the fact that the settling time at the tail and at the horn are squeezed but remain stable for a very long range of frequency of the applied intensity of light.This signature curve of the transmission spectrum is incorporated into the database of the application.

Machine learning application design and development
The naive Bayes classifier algorithm is developed as a MATLAB-based desktop application (Garg, 2013). The workflow of the system is shown in the flowchart in Fig. 7.

Application of machine learning output and results
The MATLAB-based application was used to detect, analyse and classify the simulated outputs of the PhC-based sensor and to evaluate the ppm level of salinity/TDS in drinking water. The algorithm selects the class with the highest posterior probability and assigns it to the test data. The accuracy of the algorithm is obtained by comparing the class assigned to the test data with the actual class of the test data: it is the number of correct classifications made divided by the total number of classifications made.
The simulated salinity/TDS results and the training data are read by the developed application from the selected USB drives (Fig. 8a). Based on the salinity check, the observed result is shown on the display of the Raspberry Pi. If it is greater than or equal to 2000 ppm, the display shows "High Salinity/TDS Observed" (Fig. 8b), and if it is less than or equal to 1000 ppm, the display shows "Low salinity/TDS Observed" (Fig. 8c).
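The display rule can be written as a short function. The two thresholds and message strings are those stated above; the behaviour between 1000 and 2000 ppm is not specified in the text, so returning no message for that range is an assumption of this sketch, as are the names used.

```python
HIGH_PPM = 2000   # at or above this value the "high" message is shown
LOW_PPM = 1000    # at or below this value the "low" message is shown


def display_message(ppm):
    """Map a salinity/TDS estimate in ppm to the message shown on the display."""
    if ppm >= HIGH_PPM:
        return "High Salinity/TDS Observed"
    if ppm <= LOW_PPM:
        return "Low salinity/TDS Observed"
    # Assumption: no message is defined for values between the two thresholds.
    return None


for reading in (500, 1500, 35000):
    print(reading, display_message(reading))
```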

Conclusions
This paper presents the design and implementation of an automatic system that can be used for real-time evaluation of potable water. The developed system includes a PhC-based ring resonator sensing application interfaced with an LCD display. The results show that the sensor performs well, as it can detect an RI change of the order of 10⁻⁵ in drinking water. Even a 0.04 % change in the salinity of water can be detected. The application is based on the statistical algorithm implemented. Further interfacing has been done using the Raspberry Pi device to provide an easy display of the ppm level of salinity/TDS in water. This application is more accurate and provides more continuous measurement than traditional methods. Because a machine learning algorithm is used, the accuracy can be further enhanced by a further sub-classification of TDS.
As future work this approach can be extended to detect whether water can be used for other purposes like farming and industrial use.
Edited by: R. Shang Reviewed by: M. Mokarram and one anonymous referee

Figure 1. Anions (green) with a negative charge get attracted toward the positively charged electrode; cations (red) with a positive charge get attracted toward the negatively charged electrode. Neutral molecules (blue) remain without any electrical influence.

Figure 2. Evaluation of salinity/TDS in water.

Figure 3. Design of the two-dimensional PhC line defect.

Figure 4. Salinity vs. RI of water. The figure shows variation of the RI with a change in the salinity of water.

Figure 5. Transmitted spectrum for (a) 500 ppm saline water, (b) 2000 ppm saline water, and (c) 35 000 ppm saline water. The above figures depict transmission spectra with distinct shifts in peak frequencies for different salinities. In (a) a nascent curve of the transmission spectrum until 20 % of the outcome is excited immediately, with a dip following. These curves indicate that the light intensity drops abruptly around 25 % of the frequency range of the input light wave. These lines of frequency again tend to achieve maxima and do so at exactly 50 % of the frequency spectrum. This is an indication of the highest possible absorption of the intensity of an input light wave, which may be a cause of polarization in the vicinity of the waveguide of the proposed structure. This velocity of intensity increase will again tend to become sluggish and abruptly embraces an exponential decay, for which the trapping of light in the waveguide begins to throw off a certain frequency of harmonic wave that tends to create a disruptive interference of the travelling pulse of Gaussian mode. In (b) we can observe that the salinity of water, being an analyte as compared against (a), is increased in concentration by 300 %. Here the entire spectrum is exactly the reciprocal of (a) in that the dip has happened in the first phase of the frequency shift, while here the same has happened in the second. Also, as against (a), the light intensity abruptly decreases before the center of the frequency spectrum is reached. Thereby the absorption and reflection that have taken place before the identification of light intensity at the output become noticeable; only after the frequency of the spectra is over the central frequency of operation will the light intensity become noticeable twice. The curve remains nascent for around 30 % of the applied frequency and vigorously excites until 60 %. This excitement is immediately damped with a scattering time of under 0.5 units of intensity and remains nascent throughout. This signature curve of the transmission spectrum is incorporated into the database of the application.

Figure 6. Transmitted spectrum for water with various salinity levels. The figure shows the overlapping of all the previous spectra to highlight the shift in the frequency and amplitude.

1. Rods in air configuration
2. Lattice constant: "a" = 1
3. Rod's radius r = 0.2 µm
4. The silicon slab's dielectric constant "ε" = 12
5. Dielectric constant of the sample used for simulation in place of air
6. Light source type used: Gaussian pulse (centre frequency 0.295 and width 0.1)
7. Wavelength of light: 1350 nm
8. Height of rods considered as infinity

Sensor simulation result analysis

Figure 7. Workflow of the developed system.

Table 1. Different types of sensors and target substances for detection.

Table 2. Variation of RI with % of salinity of water.
The classifier is designed to use unconditional data provided by the user and is generalized to read any dataset with unconditional data. A Microsoft Excel file with categorical feature values (non-numerical continuous data) is used as the input file. The system reads two input files (.xlsx files) which contain the data set provided by the user: one file contains the training set and the other the test set. Using the training set, the prior probabilities of each class are calculated. Using a single instance from the test set, the conditional probabilities for each feature value are calculated. These values are then used to calculate the posterior probabilities for each class. The class with the highest posterior probability is assigned as the class for that test instance. This process is repeated for each instance in the test set. The accuracy of the algorithm is calculated by comparing the class values that are assigned with the original class values.
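The workflow described above (priors from the training set, conditional probabilities per feature value, highest posterior, accuracy on the test set) can be sketched in Python rather than MATLAB. This is a minimal illustration under stated assumptions: the data are hypothetical in-memory tuples instead of .xlsx files, the feature names are invented, and add-one (Laplace) smoothing is added so that unseen feature values do not force a zero probability; the source does not say whether smoothing was used.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count classes and per-class feature values in the training set."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, class) -> value counts
    values_seen = defaultdict(set)        # feature index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Assign the class with the highest (unnormalized) posterior."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total                                   # prior P(c)
        for i, value in enumerate(row):
            # Conditional P(x_i|c) with add-one smoothing (an assumption).
            score *= (value_counts[(i, c)][value] + 1) / (n_c + len(values_seen[i]))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def accuracy(model, rows, labels):
    """Correct classifications divided by total classifications."""
    correct = sum(predict(model, r) == y for r, y in zip(rows, labels))
    return correct / len(labels)


# Hypothetical categorical features: (peak shift, transmitted amplitude).
train_rows = [("small", "high"), ("small", "high"), ("small", "low"),
              ("large", "low"), ("large", "low"), ("large", "high")]
train_labels = ["low", "low", "low", "high", "high", "high"]
test_rows = [("small", "high"), ("large", "low")]
test_labels = ["low", "high"]

model = train(train_rows, train_labels)
test_accuracy = accuracy(model, test_rows, test_labels)
print(predict(model, ("small", "high")), test_accuracy)
```

With real data, the two spreadsheets would be loaded into the same row and label lists before calling `train` and `accuracy`.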