To enhance the driving experience and prevent road accidents, the push for autonomous vehicles is only growing. One example of this push is Advanced Driver Assistance Systems (ADAS), a set of technologies already implemented in many vehicles that includes adaptive cruise control, GPS/traffic warnings, automated braking, and lane-keep assist.
Lane detection is a key area in this surge toward autonomous driving and was the topic we chose to focus on. Specifically, we were curious to see whether there was a robust solution for lane detection in adverse visibility conditions, i.e. conditions where illumination or inclement weather may impede the driver from clearly seeing the lane and road. For the purpose of this project, we focused on illumination problems (e.g. shadows, sun glare) using solely camera (RGB) information.
We first researched how basic lane detection works using RGB images and found a common, general pipeline across the different resources that works well on simple images. The sequence of steps is summarized below (a minimal code sketch of this basic pipeline follows the list):
Color selection. Given an RGB image as input, the goal of this step is to convert the image to grayscale. One option is to average the RGB channels. Another is to first move to a different color space, such as HSL (hue, saturation, lightness) or HSV (hue, saturation, value). Depending on the input image, these alternative representations can make the lanes stand out more, which in turn leads to better separation between lanes and road after the conversion to grayscale.
Blurring. Blur the image before searching for edges in order to reduce noise.
Canny edge detection. This is a widely used technique that extracts structural information from an image using its gradient information. It typically applies a hysteresis procedure with two thresholds: gradients below the lower threshold are not edges, gradients above the higher threshold are strong edges, and gradients in between are weak edges. Weak edges are kept only if they are connected to a strong edge.
Region of interest (ROI). To further reduce the amount of data to be processed, and to avoid obvious non-lanes, only search in a specified region of the image (for example, the bottom half of the image).
Hough transform (HT). HT uses the edge map produced by the Canny edge detector to find candidate lane lines.
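For reference, below is a minimal sketch of this basic pipeline using OpenCV. The threshold values, ROI fraction, and Hough parameters are illustrative assumptions rather than tuned settings.

```python
import cv2
import numpy as np

def basic_lane_pipeline(bgr_image):
    """Minimal grayscale -> blur -> Canny -> ROI -> Hough pipeline (illustrative parameters)."""
    # Color selection: a simple grayscale conversion of the BGR input.
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)

    # Blurring: suppress noise before edge detection.
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)

    # Canny edge detection with fixed hysteresis thresholds (assumed values).
    edges = cv2.Canny(blurred, 50, 150)

    # Region of interest: keep only the bottom half of the edge map.
    h, w = edges.shape
    mask = np.zeros_like(edges)
    mask[h // 2:, :] = 255
    roi_edges = cv2.bitwise_and(edges, mask)

    # Probabilistic Hough transform to extract candidate lane segments.
    lines = cv2.HoughLinesP(roi_edges, rho=1, theta=np.pi / 180, threshold=30,
                            minLineLength=20, maxLineGap=100)
    return lines
```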
Following these steps gave good results on "highway" data (i.e. images in which the road and lanes are obvious), but the approach is not robust enough to find lanes perturbed by changes in illumination (or any other type of noise).
Shadows are difficult because they create strong edges of their own that compete with the lane edges. If the Canny parameters are too restrictive, few lane edges are found; if they are too loose, the lane edges are buried among shadow edges. In either case, it is nearly impossible to detect the lane. We clearly need a more robust solution that can adapt to changing levels of illumination.
To combat illumination changes, we followed the approach used in "Gradient-Enhancing Conversion for Illumination-Robust Lane Detection" [1]. Two key ideas were proposed: gradient-enhancing conversion and adaptive Canny edge detection.
We also added in lane continuity from "Vision-based lane departure detection system in urban traffic scenes" [2]. Additionally, we developed some custom lane filters of our own to improve results.
Below is a summary of our pipeline:
Label Training Data. First note that the only possible lane colors in our images are yellow and white. So for training, we hand-labeled the first 5 images in our dataset in two ways: yellow-lane vs. road and white-lane vs. road. In both cases, we defined the road to be any pixel that was not a lane (after removing from the training set an upper portion of the image that is clearly neither lane nor road). Labeling was performed by creating image masks for yellow lanes, white lanes, and roads (see images below).
Original input image.
Mask separating white lane (in white) from road (in black).
Mask separating yellow lane (in yellow) from road (in black).
Linear Discriminant Analysis (LDA). Given our training data, the goal of this step is to find weight vectors (i.e. gradient-enhancing conversion vectors) that maximize the separation between the classes. This step outputs two weight vectors: one that best separates yellow lanes from the road and one that best separates white lanes from the road.
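As a rough sketch of this step (run once for white lanes and once for yellow lanes), the two-class Fisher/LDA direction can be computed directly from the labeled pixels. The function and variable names below are our own, and the exact formulation in [1] may differ in details.

```python
import numpy as np

def lda_conversion_vector(lane_pixels, road_pixels):
    """Two-class Fisher/LDA direction separating lane pixels from road pixels.

    lane_pixels, road_pixels: (N, 3) arrays of R, G, B values taken from the labeled masks.
    Returns a length-3 weight vector (a gradient-enhancing conversion vector).
    """
    m_lane = lane_pixels.mean(axis=0)
    m_road = road_pixels.mean(axis=0)

    # Pooled within-class scatter matrix.
    s_w = np.cov(lane_pixels, rowvar=False) + np.cov(road_pixels, rowvar=False)

    # Fisher discriminant direction: S_w^{-1} (m_lane - m_road).
    w = np.linalg.solve(s_w, m_lane - m_road)

    # The scale of w is arbitrary, so normalize to unit length.
    return w / np.linalg.norm(w)
```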
Gradient Enhancing Conversion. Given the two weight vectors, we apply each one to the original image. This returns a white-lane gradient enhanced image and a yellow-lane gradient enhanced image.
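Applying a conversion vector then amounts to a per-pixel dot product with the RGB values, rescaled to a displayable range (again only a sketch, under the same assumptions as above):

```python
import numpy as np

def gradient_enhance(rgb_image, w):
    """Project each RGB pixel onto the LDA weight vector to get a gradient-enhanced gray image."""
    gray = rgb_image.astype(np.float64) @ w              # per-pixel dot product with the weights
    gray -= gray.min()                                    # shift and rescale to 0-255 for later steps
    gray = (255.0 * gray / max(gray.max(), 1e-9)).astype(np.uint8)
    return gray
```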
Adaptive Canny Edge Detection.
As in standard Canny edge detection, we use the hysteresis procedure with a low and a high threshold. Because we know the locations of the lane pixels and their intensities from the training frames, we can use that information to set the Canny thresholds adaptively for each frame.
We perform this process for both the gradient-enhanced images (white and yellow), and OR their edge images together to get a final edge image for the current frame.
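Below is a sketch of how such adaptive thresholds might be set; the specific rule (placing the high threshold at the lane/road intensity gap and the low threshold at half of it) is our simplification, not the exact rule from [1].

```python
import cv2
import numpy as np

def adaptive_canny(enhanced_gray, lane_mask, road_mask):
    """Canny with hysteresis thresholds derived from the known lane/road intensities.

    lane_mask and road_mask are boolean masks from the labeled training data.
    The threshold rule here is an assumption made for illustration.
    """
    lane_level = np.median(enhanced_gray[lane_mask])
    road_level = np.median(enhanced_gray[road_mask])
    high = max(int(abs(lane_level - road_level)), 1)
    low = max(high // 2, 1)
    return cv2.Canny(enhanced_gray, low, high)

# Combining the white-lane and yellow-lane edge maps for the current frame
# (white_enh, yellow_enh and the masks are placeholder variable names):
# edges = cv2.bitwise_or(adaptive_canny(white_enh, white_lane_mask, road_mask),
#                        adaptive_canny(yellow_enh, yellow_lane_mask, road_mask))
```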
Hough Transform. Once we have the edge image, we can perform the Hough transform to find possible lane lines. Given that the lanes appear in the lower part of the image, we can exclude edges occurring in the top part (we exclude the top 40% of the image).
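A small sketch of this step, using the 40% cutoff from above and otherwise assumed Hough parameters:

```python
import cv2
import numpy as np

def hough_on_lower_region(edges, exclude_top_frac=0.40):
    """Zero out the top portion of the edge map, then run the probabilistic Hough transform."""
    masked = edges.copy()
    cutoff = int(exclude_top_frac * masked.shape[0])
    masked[:cutoff, :] = 0                      # edges in the top 40% cannot be lanes
    return cv2.HoughLinesP(masked, rho=1, theta=np.pi / 180, threshold=30,
                           minLineLength=20, maxLineGap=100)
```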
Lane Continuity and Other Filters. Ideally we would have exactly one Hough line for each lane. When we have multiple Hough lines (as we do in this example), we need to filter them down to just two predicted lines.
Lane Continuity: [2] proposed using lane continuity to avoid predicting traffic markings in urban environments as lanes. The method works well under illumination variation and shadows too, since shadows act like markings on the road that create spurious edges. The idea is simple: a Hough lane prediction must pass through both a "near" region (a bottom portion of the image) and a "far" region (the upper portion of the image). If it does not, it is very likely not a lane line and can be removed as a prediction. We defined the near region as the bottom 30% of the image and the far region as the upper 70%.
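A sketch of the continuity check, assuming each line is given as (x1, y1, x2, y2) image coordinates and using the 30%/70% split described above:

```python
def passes_lane_continuity(line, img_height, near_frac=0.30):
    """Keep a Hough line only if it spans both the near (bottom 30%) and far (top 70%) regions."""
    x1, y1, x2, y2 = line
    near_boundary = img_height * (1.0 - near_frac)   # rows below this are the "near" region
    y_top, y_bottom = min(y1, y2), max(y1, y2)
    in_near = y_bottom >= near_boundary               # reaches into the bottom 30%
    in_far = y_top < near_boundary                     # reaches into the upper 70%
    return in_near and in_far
```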
Custom Lane Filters: We sometimes need additional lane filters in order to reduce the number of Hough lines further (in the example shown, lane continuity does not remove any of the lines). We introduce two other filters that effectively reduce the number of Hough lines (a code sketch of both follows the list):
Lines with very small slopes (nearly horizontal lines) are unlikely to be lanes, so we can filter those out.
We divide the remaining lines into those of positive and negative slope and keep one from each group. Lanes tend to start near the bottom of the image, so we keep the line from each group that starts closer to the bottom of the image.
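A sketch of both filters, assuming each line is an (x1, y1, x2, y2) tuple (e.g. lines[:, 0] from OpenCV's HoughLinesP); the minimum-slope cutoff is an illustrative value:

```python
import numpy as np

def filter_hough_lines(lines, min_abs_slope=0.3):
    """Drop near-horizontal lines, then keep one line of each slope sign,
    preferring the line whose lowest endpoint is closest to the bottom of the image."""
    candidates = []
    for x1, y1, x2, y2 in lines:
        if x2 == x1:
            slope = np.inf                      # vertical line: definitely not near-horizontal
        else:
            slope = (y2 - y1) / float(x2 - x1)
        if abs(slope) >= min_abs_slope:         # discard near-horizontal lines
            candidates.append((slope, (x1, y1, x2, y2)))

    kept = []
    for sign in (-1, 1):
        group = [c for c in candidates if np.sign(c[0]) == sign]
        if group:
            # Larger y means closer to the bottom of the image.
            kept.append(max(group, key=lambda c: max(c[1][1], c[1][3]))[1])
    return kept
```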
Initial Region of Interest. Once we have two Hough lines, we create a "region of interest", or mask, from which to extract lane edges. We keep edges within a set width around each Hough line, and within a set distance along it, since those edges are believed to be lane edges.
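One simple way to build such a mask (our illustration, not necessarily how [1] does it) is to draw each Hough line with a large thickness and intersect it with the edge map; the band width below is an assumed value:

```python
import cv2
import numpy as np

def edges_near_lines(edges, lines, band_width=20):
    """Keep only edge pixels lying within a band of roughly band_width pixels around each Hough line."""
    mask = np.zeros_like(edges)
    for x1, y1, x2, y2 in lines:
        # Drawing the line with a large thickness produces the band-shaped region of interest.
        cv2.line(mask, (x1, y1), (x2, y2), color=255, thickness=band_width)
    return cv2.bitwise_and(edges, mask)
```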
Curve Fit. Now that we have extracted the lane edges, we fit a curve to them. This allows the algorithm to handle cases where the lane starts to curve more effectively.
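A sketch of the curve fit; the second-order polynomial and the choice to fit x as a function of y are our assumptions:

```python
import numpy as np

def fit_lane_curve(lane_edge_mask, order=2):
    """Fit x = f(y) through the edge pixels of one lane (second-order polynomial assumed)."""
    ys, xs = np.nonzero(lane_edge_mask)        # row, column coordinates of the lane's edge pixels
    coeffs = np.polyfit(ys, xs, order)          # fit x as a polynomial in y
    return np.poly1d(coeffs)                    # callable curve: x = curve(y)
```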
Lane Detected Image. Now we can declare the two fitted curves as the lane predictions for this image.
Update Training Data. We then update the training data for future iterations of the algorithm. We pop off the training data from 5 frames ago and add in the information from the current frame. Each pixel is a piece of training data, with its R,G, and B intensities, as well as its label (lane or road).
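A sketch of the sliding training window; the container and function names are our own:

```python
from collections import deque

# Sliding window over the 5 most recent frames of labeled pixel data (R, G, B plus lane/road label).
training_window = deque(maxlen=5)

def update_training_data(frame_rgb, lane_mask, road_mask):
    """Add the current frame's lane/road pixels; the data from 5 frames ago is popped automatically."""
    lane_rgb = frame_rgb[lane_mask].reshape(-1, 3)
    road_rgb = frame_rgb[road_mask].reshape(-1, 3)
    training_window.append((lane_rgb, road_rgb))
```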
Implementation
Data
The data used for our implementation was from Udacity's Open Source Self-Driving Car repository. The dataset we used contained a total of 5648 images and contained various lighting conditions (e.g., daytime, shadows). More details about the data can be found here.
Repository
Our downloadable source and executable code can be found here.
Results
After running our pipeline on the Udacity dataset, we randomly sampled 500 images to compute accuracy measurements. [1] computed "accuracy" using a term they refer to as the detection rate: they manually counted the number of correctly detected lanes, where a prediction is correct if the predicted lane lies on the true lane markings and its curvature is in the same direction. Although this is a sound measurement, we wanted to be more comprehensive in our analysis. We therefore define true positives, false positives, and false negatives as follows:
True positive: correctly predicts the lane (i.e. the predicted lane marking lies on a true lane marking and curves in the same direction).
False positive: incorrectly predicts that there is a lane (e.g. predicts the grass on the side of the road is a lane).
False negative: incorrectly predicts that there is no lane when, indeed, there is one.
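Counts like these are commonly aggregated into precision and recall; a small snippet for reference (this aggregation is standard practice, not necessarily the exact statistic reported below):

```python
def summarize_counts(tp, fp, fn):
    """Standard precision/recall summaries of manually counted lane detections."""
    precision = tp / float(tp + fp)   # fraction of predicted lanes that were real lanes
    recall = tp / float(tp + fn)      # fraction of real lanes that were detected
    return precision, recall
```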
Note that each frame may contain one or more true lane markings. Therefore, after manually labelling the 500 frames with true positive, false positive, and/or false negative labels, our final accuracy measurements were the following:
There were several roadblocks we faced with this project. For example, it was quite difficult to find sequential data for this task in the first place, and the data we ended up using was not as high-resolution as we would have liked. In terms of implementation, our pipeline initially output many candidate lanes, so we incorporated several of our own filters (e.g. eliminating near-horizontal lines) as well as another paper's proposed lane-continuity method [2]. Engineering this project took significant effort, since we had no open-source code to work from for this approach and the paper was vague on some implementation details.
With all that being said, we believe we obtained results comparable to [1], given that their measurement is more open to interpretation (according to [1], their average detection rate on daytime data was 94.79%). They also used a different dataset, to which we did not have access and which likely had a different distribution. Looking ahead, it would be interesting to see whether our results change dramatically if we were to add temporal information in training; for example, we could give more weight to more recent frames. Our method finds weights via LDA, i.e. a simple linear combination of the R, G, and B features that best separates the class labels; a recurrent neural network over this temporal information might be more beneficial (albeit at a cost in runtime).
Another future direction would be to test on more adverse visibility conditions. As previously mentioned, it was difficult to find sequential data in the first place, but we would like to see how well our pipeline works in situations with heavy rain or snow. Lastly, in practice, we would ideally combine other sensor information for lane detection: LiDAR (light detection and ranging), GPS, and vehicle sensor data used in combination with this approach would likely produce even better results.
References
[1] H. Yoo, U. Yang, and K. Sohn. Gradient-enhancing conversion for illumination-robust lane detection. IEEE Transactions on Intelligent Transportation Systems, 14(3):1083-1094, Sept 2013.
[2] Y. C. Leng and C. L. Chen. Vision-based lane departure detection system in urban traffic scenes. In 2010 11th International Conference on Control Automation Robotics & Vision, pages 1875-1880, Dec 2010.