How to Build Your First Predictive Model in Seconds with InfluxDB and Loud ML
Session date: 2018-04-17 08:00:00 (Pacific Time)
In this webinar, Sébastien Leger from Loud ML will share with you the power of using unsupervised learning frameworks to gain deep insights into your InfluxData time series data (application and performance metrics, network flows, and financial or transactional data). He will then show you how to configure, model, and dig into the modeled times series data using the Loud ML API and your existing InfluxDB databases.
Watch the Webinar
Watch the webinar “How to Build Your First Predictive Model in Seconds with InfluxDB and Loud ML” by filling out the form and clicking on the download button on the right. This will open the recording.
Here is an unedited transcript of the webinar “How to Build Your First Predictive Model in Seconds with InfluxDB and Loud ML”. It is provided for those who prefer reading to watching the webinar. Please note that the transcript is raw. We apologize for any transcribing errors.
• Chris Churilo: Director Product Marketing, InfluxData
• Sébastien Leger: Founder and CEO, Loud ML
Chris Churilo 00:00:02.472 Good day, everybody. Hope everyone’s doing well today. We have a really great webinar today. We actually always have a great webinar, but today I’m really excited. We’ll get started in just one minute. In the meantime, I’ll just cover some housekeeping items. If you have any questions during the presentation, please feel free to type them in either the Q&A or the Chat Panel. And if you really, really, really want to speak your questions, just raise your hand and I can un-mute you and you can talk to Sébastien directly. In addition, as always, this session is being recorded. After I do the edit, I’ll post it. Usually you’ll get the email first thing tomorrow morning, but I usually end up posting the recording within a couple of hours. So if you go back to the link, you’ll see that the page changes from the registration page to the recording, and you’ll be able to take a listen to it again. Also, we have trainings on Thursdays. This week’s training is an introduction to InfluxDB and Telegraf. I do have some more advanced trainings coming up. Just take a look at the event calendar at influxdata.com/events. In addition, we have InfluxDays London coming up on June 14th, and our friend Sébastien, who’s doing the presentation today, will actually be there. So if you’re really excited about what you see today, and you should be, then come and join us at InfluxDays and chat with him and the team directly.
Chris Churilo 00:01:39.746 The reason I’m so excited about what Sébastien’s done is that it’s probably one of our first plugins into InfluxDB, and it just makes the Time Series Database that much more powerful. I’m not going to steal all the thunder. I want to make sure that Sébastien can share all the really cool stuff. So with that, I’d like to introduce you to our friend Sébastien, who’s actually from a company called REDMINT, but they have built a product called Loud ML. In today’s webinar, he’s going to describe what Loud ML is all about and then conclude with a demo. If you have any questions, please feel free to put them into the chat or the Q&A panel, and if there’s a natural pause in the presentation, we’ll take a look at those questions and get them answered. I will definitely get all the questions answered before the end of the presentation. So with that, let me make sure that I un-mute Sébastien so he can get started, and then I’ll just hand the ball to you if you can put yourself into presentation mode.
Sébastien Leger 00:02:46.002 I’ll try that. Hi guys. Thank you very much Chris for this introduction. How do I start the presentation mode then?
Chris Churilo 00:02:55.576 Yeah, so you go to slideshow in the menu. You know the PowerPoint presentation?
Sébastien Leger 00:03:05.606 Yep.
Chris Churilo 00:03:09.209 Yeah, you can do that. Perfect. I will go on mute.
Sébastien Leger 00:03:14.319 Okay, if that’s better. So hi guys. Very excited to be broadcasting with you today, because we’d like to show you the work we’ve done with InfluxDB and how you can achieve machine learning in a very easy way. So let’s get started. I want to make sure that everybody is fully up to speed with machine learning and why it is so hyped today. We can read a lot about that everywhere in the news. For example, if you’re reading Wired, you could see machine learning defined as a way to analyze your data and find patterns in your data instead of relying on rules. Another definition, which you might have found if you’re reading the Economist, is that you find patterns and are able to make predictions without doing some kind of traditional programming. So these are really fine definitions. But the one I like the most is from Andrew Ng, who was chief scientist at Baidu. He says that AI is really the new electricity. Sometimes we refer to AI, sometimes we mention machine learning, but really what everyone is talking about today is deep learning. And why it is the new electricity is because eventually, in a matter of months or years, everyone will be using it. That’s why Andrew is referring to AI as the new electricity, and it’s really a very elegant definition. So now let’s focus much more on time series data, because that’s obviously what you guys have if you have a number of InfluxDB instances running.
Sébastien Leger 00:05:20.485 The questions you want to solve with machine learning might fall into, I would say, two categories. The first one is when you’re looking at a single series and you have a history of this data. It could be months of data, maybe a year of data, maybe more. So you would say, “Okay, I have all this history. Is it useful in any way to forecast how the trend is going to be in the next minutes, days and more?” So that’s really a forecasting question. It’s the first type of question you can have with machine learning and time series, and machine learning is really a powerful way to answer it. On the right-hand side, another set of questions you might have is when you have really lots and lots of different series. Maybe you have hundreds of thousands of different series, so it’s becoming quite difficult to look at each and every one of them individually. When you have too much data like that, you want to create clusters in order to compare your data. Maybe you find that some of your data is kind of bluish, so you put it in one group, and another set of data is kind of greenish. You want to reduce the data to a number of groups and clusters so you can manage it much more easily.
Sébastien Leger 00:06:52.599 And by the way, what you have on screen now is something you can see if you visit Italy. It’s a piece of art that is called the Triptych Electric, and as you can see, there’s also a pattern in this data. So we have these two sets of questions, and they apply really well today to many, many case studies and use cases. On the left-hand side, looking at single series and how they trend over time is very useful for health checks. If you want to monitor and supervise, for example, your e-commerce site, you can expect the number of users to be pretty much the same at a given time of day from one day to the next, but it’s not going to be the same at night. So if you want to track the number of active users, you want to ask yourself the question, “Is this number of users what you should expect for this time of day?” So it’s really dependent on the time of day.
Sébastien Leger 00:08:08.598 You can also, in the same fashion, if you have an e-commerce site, look at the purchases. If the number of purchases at a given time of day is not matching your expectations, then maybe there is an issue someone should look into. Still within forecasting problems, there is capacity planning, such as forecasting disk usage and when the disks are going to be full. There are also very colorful use cases in retail. Take Amazon, say, as the number one retailer: what they want is to minimize inventory. So if you can forecast your demand in retail, you will be able to solve this question in a very efficient way. On the right-hand side, when you have multiple series and you want to compare them with clusters, it’s going to enable you to drill down into much more advanced use cases. These are use cases that we find in cybersecurity, for example. Fraud detection is one of the use cases that we’ve been working on a lot. You can also do predictive maintenance in IoT in a really elegant way. You can also drill down, if you’d like, into recommendation engines, such as giving advice on your website to different types of users based on the history of what they’ve done. That’s also something you can solve with clustering based on your time series data. So these are just examples of the use cases I have in mind.
Sébastien Leger 00:09:56.552 So maybe you guys have other use cases as well. So the question might be, what’s a good use case for machine learning and I think—at the end of the day, it’s about predictability. So anytime you find that there must be a pattern then you want to find some predictability, it’s a good use case. So, just a cartoon here on screen to show you that eventually, you have to feed the dog every day, right? And it’s quite predictable. So we say first class of questions looking into individual series. You can guess the next values in a measurement. You can find daily, nightly, weekly patterns. And also eventually, you can find—you can do anomaly detection and that’s quite easy. Forecasting a set of values, looking into the actual observations, and looking into the differences gives you pretty straightforward anomaly detection.
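To make the "forecast, observe, look at the difference" idea concrete, here is a minimal Python sketch. It is not taken from the webinar or from Loud ML: the naive per-slot average stands in for a trained model, and the function names and numbers are made up for illustration.

```python
from statistics import mean


def seasonal_forecast(history, period):
    """Forecast one period ahead as the mean of each slot across past periods."""
    slots = [[] for _ in range(period)]
    for i, value in enumerate(history):
        slots[i % period].append(value)
    return [mean(slot) for slot in slots]


def residuals(observed, predicted):
    """Absolute difference between what was seen and what was expected."""
    return [abs(o - p) for o, p in zip(observed, predicted)]


# Two "days" of four slots each, with the same daily shape (made-up numbers).
history = [10, 50, 80, 20, 12, 48, 82, 18]
forecast = seasonal_forecast(history, period=4)

# A third day in which the third slot collapses.
today = [11, 49, 5, 19]
diffs = residuals(today, forecast)

# The slot with the largest gap is the candidate anomaly.
worst_slot = max(range(len(diffs)), key=diffs.__getitem__)
```

A real model would learn the daily shape rather than average it, but the anomaly step is the same: compare each observation with its forecast and look for the largest gaps.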
Sébastien Leger 00:11:05.510 So these are really powerful techniques, but they’re not going to work if you have random data, and I’m not assuming that anybody has random data here. The second set of questions, we said, is clustering the data in order to reduce the number of time series. You might have a very large number of time series, and you can reduce them into different groups, different classes. It also enables you to do anomaly detection in a very efficient way, by looking at how the individual series move across the different classes over time. So if anomaly detection is what you’re interested in, time is always the most important dimension for that kind of analysis.
Sébastien Leger 00:12:05.840 To achieve that today, you will find that there are four challenges. The first one is a matter of availability, because finding the skills in machine learning is still pretty hard. If you guys have those skills, then you are on the good side and that’s good for you. But for companies and for employers, it’s still a big challenge to find the skills and to build up the team. The next one is, I would say, affordability. If you want to start from scratch and do everything by yourself, you will end up with a project that’s probably 6 to 18 months. It also involves trial and error. When you do machine learning, the first solution that you try is not going to work on your first attempt. So you have to try a number of different solutions in order to find the highest accuracy and train the best models. That involves time, and it’s difficult to reduce this time. The third point is reliability. At the end of the day, it’s only software, and software, as we know, is always full of mistakes. So reliability is pretty important. And last but not least, I would say, is trust, because today there is an ongoing debate about AI and how people should use it in the future. If you’re interested in that, I invite you to take a look on YouTube at the discussions that occurred at South by Southwest this year.
Sébastien Leger 00:14:02.789 So it’s for all these challenges that we created Loud ML. We said availability, affordability, and reliability. The first one is availability, and the first good news is that you guys can already download the software. We have a free Loud ML community version. We want everyone to have access to these technologies because, at the end of the day, we said this is the new electricity, so everyone must have it. We’ll see the free download at the end and I’ll give you the link for that. The second point is affordability. Obviously, for all the use cases we’ve seen before, if you want to start from scratch, it’s going to cost you a lot. What we find in 2018 is a trend on the market where a number of machine learning frameworks are more and more available, and that also brings down the cost. So it makes more sense to try and use what’s available on the market.
Sébastien Leger 00:15:17.395 The final point is reliability. This is software, this is what we do, and we want to release the best products to you guys. So of course we apply all the best practices and do what we can to deliver software that is free of any defect. Now, the second good news today is that it really works out of the box with InfluxDB, in a very similar fashion to how Kapacitor works today, I would say. First of all, Loud ML is going to read and extract data from your databases in order to train your machine learning models and then output predictions. When we output predictions, we’ll be writing the points into new measurements in the same database we are pulling the data from. We’ll see in a few minutes how this goes. And I forgot to mention notifications: if you want to run anomaly detection, obviously you will get notifications in real time.
Sébastien Leger 00:16:32.154 Now let’s have a look inside. It’s not a secret that it’s based on TensorFlow. We find that TensorFlow is a really good framework, so we decided to use it, and we think it’s a future-proof framework. On the left-hand side, there is what we call the data source. It’s a back end that allows us to read and write data from, for example, InfluxDB. At the top, we also use a number of configuration files, of course, and we’ll see that in a moment with the demo. And then at the bottom of the slide, you can see what we call the API, and this is something we are going to release in a matter of weeks now. We find that for anomaly notification, when you want to take action on an anomaly, there are hundreds and hundreds of ways to do that. You can send, for example, an email to somebody, you can send an SMS to somebody, or you can use some kind of REST API or SNMP traps to control a piece of equipment. There are hundreds and hundreds of ways, and the best way we can provide you the flexibility and the freedom to do what you have to do is to open this API. So it’s going to be GPL v2, and you will be able to share your plugins with the community.
Sébastien Leger 00:18:07.376 The first steps are really easy. You can already download the software for Debian 9 and Red Hat 7. We don’t have the Docker image yet, but somebody from the early adopter community is already helping us on that, so it’s going to be available in a matter of weeks as well. Just a few words here: we said the community version is free. This allows you to do predictive analytics, which is when you have a single time series and you are looking at the forecast, and it allows you to run up to three different machine learning models. If you want to move up to the paid version, it’s going to be a subscription license with more machine learning models that you will be able to define and use. It also gives you the ability to answer the clustering questions in the use cases we’ve seen previously on the slides.
Sébastien Leger 00:19:15.542 Okay, so I believe now we can move to the demos. I’m going to try to—maybe, Chris, I have to stop the screen-sharing and share in another screen now?
Chris Churilo 00:19:27.941 Yep.
Chris Churilo 00:19:40.061 While we’re waiting for Sébastien to get set up there, if you have any questions, go ahead and put them in the Q&A or the chat. I think Sébastien did a really good job of describing that.
Chris Churilo 00:20:05.121 Okay, we see—looks like we’re okay.
Sébastien Leger 00:20:10.515 Okay, can you see that?
Chris Churilo 00:20:12.209 Yep.
Sébastien Leger 00:20:13.212 Yeah. Cool. So what I have here—so I have a—I’m running—of course I’m running InfluxDB, so you can see the usual Chronograf here. So I have a dashboard with data that is—hits on the website so every time there’s a hit on the website, you’ll find a measurement with the time of the hit and also duration on the page. Duration—so maybe if I’m going here in Data Explorer I’d find—okay, I have an HTTP measurement with hits, and these hits they provide the duration spent on the page and also the URL of the page. Really basic structure for this data and so coming back on the dashboard, hits on my websites. Okay, what’s my time range here? Okay, so that’s 23rd of January to 26th of January. Maybe I’m going to change this a little.
Sébastien Leger 00:21:55.630 Okay. So the top graph you can see is the average page view duration. It’s in milliseconds, so we can see that on average it’s about 60 seconds. It’s increasing during the day and decreasing during the night, somewhat what we would expect, and there’s a nice pattern we can see. On the bottom graph, you have the page hits. They’re also pretty regular: increasing after six in the morning, with a small drop at lunchtime, then increasing again to almost 2,000 visitors, and then decreasing at night, of course. So there’s obviously a significant pattern in this data as well, but here there is a missing spot. You can see in this data set that the page count drops really low on the 24th of January. So let’s see what we can do with that. Okay. So Loud ML is already installed on my system. Here you can see I’m running version 1.2. There are two packages. This is the core package, and this is the data source package for InfluxDB. It’s already installed.
Sébastien Leger 00:23:48.291 The first step is to have a look at the configuration. It’s only one file. Sorry, I was a bit fast. It’s /etc/loudml/config.yml. It’s a single file that defines where to find the databases. I have defined two data sources here. Both of them use the same InfluxDB instance, running on my localhost, and they use different databases. The one we’ll have a look at here is the HTTP database for the web traffic, and that’s pretty much all you have to define. After you’ve done that, you restart the daemon and check that it’s running. Good. There you are. So after you’ve done that, you have a running set-up with Loud ML, but obviously we have to define some things. We’ll define a machine learning model for page hits, and it’s a single file that we have to write to say how we want to forecast the data. We’ll define here a time series model, the first type of model we’ve seen in the presentation. We’ll give it a name. That’s important. Then there is the bucket interval, because we are making aggregations. Here we are splitting the data into 30-minute intervals. The next parameter is the span. Here it’s six. That’s the number of buckets that we find relevant to make predictions. So if the bucket interval is 30 minutes, then 6 will make it 3 hours. Looking back into the past, the last three hours will have an impact on the output predictions.
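The relationship between bucket interval and span can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above (30 minutes times 6 gives 3 hours); how Loud ML consumes the buckets internally is not described in the webinar, so the sliding-window helper and the sample values are assumptions.

```python
from datetime import timedelta

BUCKET_INTERVAL = timedelta(minutes=30)  # size of one aggregation bucket
SPAN = 6                                 # number of past buckets used per prediction


def lookback(bucket_interval, span):
    """How far into the past one prediction looks."""
    return bucket_interval * span


def sliding_windows(buckets, span):
    """Pair each run of `span` consecutive bucket values with the value that follows."""
    return [
        (buckets[i:i + span], buckets[i + span])
        for i in range(len(buckets) - span)
    ]


# Eight made-up bucket values (for example, page hits per 30 minutes).
bucket_values = [5, 7, 9, 12, 15, 14, 13, 10]
windows = sliding_windows(bucket_values, SPAN)
```

With eight buckets and a span of six, only two (inputs, target) pairs come out, which hints at why a short history leaves little to learn from.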
Sébastien Leger 00:26:20.467 The next parameter, using the daytime, means that we have a dataset where the time of the event is significant. Obviously, if we have web traffic and we get a thousand users at 2 PM, it’s not the same as getting a thousand users at 2 AM, because we usually don’t have this kind of traffic in the morning. Using the daytime ensures that this is taken into account in the output predictions. I’m going to skip the other parameters and dive into the features section. What this is really is a histogram, and we have to define what we want to output as predictions. Here we’ll define two outputs. The first one is the number of page hits. So we’ll say this is our feature number one. It’s for the measurement that is called hits, and we are going to count the page URLs. The second feature is the average page duration. It also uses the measurement called hits, and we are taking the average of the total duration. This is how you would define your first machine learning model with Loud ML. Then what I’m going to show you here is the CLI. The CLI is going to allow you to create your models, eventually maybe develop them, train them, and output predictions. So the first step is to create.
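The two features described above, a count of hits and an average duration per bucket, amount to a simple aggregation. The Python sketch below shows that aggregation on three made-up hits. It does not use Loud ML or InfluxDB, and the tuple layout `(timestamp, url, duration_ms)` is an assumption chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_start(ts, interval):
    """Floor a timestamp to the start of its bucket, counted from midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = ts - midnight
    return midnight + (elapsed // interval) * interval


def aggregate_hits(hits, interval):
    """Return {bucket_start: (hit_count, avg_duration_ms)}."""
    grouped = defaultdict(list)
    for ts, _url, duration_ms in hits:
        # Counting URLs is the same as counting hits: one URL per hit.
        grouped[bucket_start(ts, interval)].append(duration_ms)
    return {
        start: (len(durations), sum(durations) / len(durations))
        for start, durations in sorted(grouped.items())
    }


hits = [
    (datetime(2018, 1, 23, 14, 5), "/home", 40000),
    (datetime(2018, 1, 23, 14, 20), "/pricing", 80000),
    (datetime(2018, 1, 23, 14, 45), "/home", 60000),
]
features = aggregate_hits(hits, timedelta(minutes=30))
```

Each bucket ends up with one value per feature, which is the shape a time series model would be trained on.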
Sébastien Leger 00:28:33.597 Okay, successful. So now the training. Training is maybe the most important step. If we look back at the data that is inside InfluxDB, we have lots of data for January. So what I’m going to do is use data from January and say, “I want to train this page hits model using the data from the 1st of January to the 21st of January.” It usually takes a few seconds to a few minutes, depending on the amount of data you have. Okay. Here we go. So this tells you and me the accuracy of the training. Here it’s 95%. That’s not too bad. So now we have a model that is trained, which means it has learned the shape of your data. Since it knows the shape, it’s going to be able to forecast the next values and to run the anomaly detection as well.
Sébastien Leger 00:29:56.884 The first thing I want to show you is predictions. It’s a similar CLI, so here it’s Loud ML predict and the name of the model. You don’t have to do that in real time. Obviously you can do it in real time, but you can also ask this question for data that you have in the past. So here I’m going to say, “I want to know what the model thinks the right values should be between the 24th of January and the 27th of January, and I want to save the data, so output the predictions back into InfluxDB.” This is what it tells you in the end: I’m saving the predictions to the data source, and the data source is InfluxDB in this case. So if I refresh the screen, you’ll now find in green the output of the prediction for the time range that I used on the CLI. Let’s have a look at the way it’s defined. In Chronograf, if you are already familiar with it, there is the query language. The first tab I have here is the original data, and on the right tab I’m using the output prediction. As you can see, I’m using data from the HTTP database and the prediction page hits model, which is a new measurement saving the output of the prediction. There is a small detail here, but it’s important. Remember that we defined the bucket interval to be 30 minutes. So when you are making these graphs, you have to say, “Here, I want to group by 30-minute intervals.” The same applies on the left tab, because otherwise it’s going to make aggregations that just don’t make sense.
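The grouping detail can be sketched as a small Python helper that builds an InfluxQL query string. `GROUP BY time(30m)` is standard InfluxQL. The field names `url` and `count_hits` and the measurement name `prediction_page_hits` are assumptions for illustration and are not confirmed by the webinar.

```python
def grouped_query(aggregate, field, measurement, start, end, interval="30m"):
    """Build an InfluxQL query that aggregates one field per time bucket."""
    return (
        f'SELECT {aggregate}("{field}") FROM "{measurement}" '
        f"WHERE time >= '{start}' AND time < '{end}' "
        f"GROUP BY time({interval})"
    )


START = "2018-01-24T00:00:00Z"
END = "2018-01-27T00:00:00Z"

# Original data: count the hits in each 30-minute bucket.
observed = grouped_query("count", "url", "hits", START, END)

# Predicted data (hypothetical measurement and field names), same grouping.
predicted = grouped_query("mean", "count_hits", "prediction_page_hits", START, END)
```

Using the same interval for both queries keeps the observed and predicted curves aligned bucket for bucket, which is the point made above about matching the model’s bucket interval.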
Sébastien Leger 00:32:36.021 And if we go back inside the Data Explorer tab, you see that in the HTTP database we have the original measurement, the hits, and the new measurement. It’s called prediction followed by the name of your model, and it has the features that we’ve defined inside the model. Going back to the charts. Okay. So in this data set, there was a significant drop on the 24th around 12 or 1 PM. Okay, we can see it here. 1 PM, it’s in blue. It’s difficult to read. So the original data reads 20 and the output of the prediction here at 1 PM says it should be 116,000. So that’s a huge difference, and it leads us to anomaly detection. So I’m going to go back here. I can run the same command again, but with a different option: minus a (-a) is for anomaly detection.
Sébastien Leger 00:34:29.733 What it does here is give you exactly the same output as we can see on the screen: the values that are observed and predicted for the different time slots. We find here that at 12 there’s an anomaly on the page hits. The page hits are down to 1 when they should be 1,500. So here we flag an anomaly and we say, “Okay, there’s a score with that.” The score is something between zero and a hundred. A hundred means it’s really high, so you have to decide where you set your threshold. I think it was 50 when we defined it. Yeah, you can see here, our threshold is 50. So anything with a score higher than 50 will be flagged as an anomaly. Okay. And we have a 77 here. That’s a pretty high score. So that’s what the framework of [inaudible] to do with time series. You can see that there is a forecasting ability and also straightforward anomaly detection built on top of that. You can do that in real time, and you can also do it on a set of data that you already have from the past. When I mentioned trial and error previously on the slides, it means that you would have to do that a couple of times in order to find the right settings. You have to play around with the settings to find what really is best for your data set. But I hope that we’ve made that as easy as it can get.
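The score-and-threshold step can be illustrated with a short Python sketch. The webinar does not say how Loud ML computes its score, so the relative-gap formula below is purely an assumption and will not reproduce the 77 seen in the demo. Only the 0 to 100 range and the threshold of 50 come from the talk.

```python
THRESHOLD = 50  # scores above this value are flagged, as in the demo


def anomaly_score(observed, predicted):
    """Relative gap between observation and prediction, scaled to 0..100.

    Hypothetical formula, used only to illustrate thresholding.
    """
    scale = max(abs(observed), abs(predicted))
    if scale == 0:
        return 0.0
    return min(100.0, 100.0 * abs(observed - predicted) / scale)


def is_anomaly(observed, predicted, threshold=THRESHOLD):
    """Flag a time slot when its score exceeds the threshold."""
    return anomaly_score(observed, predicted) > threshold


# Page hits down to 1 when about 1,500 were expected: flagged.
drop_flagged = is_anomaly(1, 1500)

# A small deviation from the forecast: not flagged.
normal_flagged = is_anomaly(1450, 1500)
```

Raising the threshold trades missed anomalies for fewer false alarms, which is part of the trial and error mentioned above.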
Sébastien Leger 00:36:59.879 Finally, there’s something else I’d like to show as well. We mentioned real time. If you want to start the real-time predictions, there’s only one command to run. That’s using the REST API, so we’re going to say, “This model has to start.” After you’ve done that, Loud ML is going to wake up at regular intervals and output predictions, pretty much like Kapacitor is doing for you already. So this saves you the pain. You don’t actually have to use the CLI. After you’ve enabled that, you’ll find the model just outputs its predictions and you don’t have to touch anything. This is something you can check here, with this endpoint, which tells you the machine learning jobs that are running in the background. So that’s the job running, but it says, “Unfortunately, there’s no data for the 17th of April.” So it’s complaining, but you can see it’s running. And to stop that, it’s pretty much the same endpoint URL with stop at the end. So that’s it for the web traffic use case and the anomaly detection.
Sébastien Leger 00:38:57.277 What we mentioned in the slides is the availability of plugins that you will be able to write in order to take action on the anomalies. We are still writing documentation for that, so it’s going to be released in a couple of weeks. If you just connect to your Twitter account, you’ll find the news there. I’ll make sure of that. Also, leaving the web traffic use case for a few minutes, I want to show you what a retail model could be. Let’s say you are Amazon and you have a number of items that you sell. Maybe you are selling pineapples and lemons. You can imagine that in your InfluxDB database, every time someone purchases something, you store a transaction. A very easy way to forecast the demand for this retail use case would be to define features like this. Feature number one is our first kind of article, the items that are being purchased. We would say we’re only interested here in pineapples, and we want to make a sum of the quantities that are being purchased. The second feature is exactly the same. I’m sorry, just a small mistake. Exactly the same, with lemons, and you can go on and on. Define as many features as you need to forecast the next purchases and the next volume of items that will be purchased. And here, for the bucket interval, you can say a 24-hour interval. That should be good enough. And a span of seven days, maybe, because I’m really focused on fresh fruits and vegetables and you won’t keep them more than seven days. You need to go back to the shop afterwards.
Sébastien Leger 00:41:22.939 So I hope it’s been useful to you guys, and don’t forget that you can try it for free. Here is the download URL. It’s LoudML.com/download. Remember that it’s currently available for Debian 9 and CentOS/Red Hat 7, and in a matter of weeks you will also get the Docker image. Thank you very much, all of you, and now I believe we can answer some questions.
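The retail features described above boil down to one daily sum of quantities per item. The Python sketch below shows that aggregation on made-up transactions. The tuple layout `(day, item, quantity)` and the item names are assumptions, and no Loud ML or InfluxDB call is involved.

```python
from collections import defaultdict
from datetime import date


def daily_demand(transactions, items):
    """Sum the purchased quantity per day for each item of interest."""
    totals = {item: defaultdict(int) for item in items}
    for day, item, quantity in transactions:
        if item in totals:  # items without a feature are ignored
            totals[item][day] += quantity
    return {item: dict(sorted(per_day.items())) for item, per_day in totals.items()}


transactions = [
    (date(2018, 4, 16), "pineapple", 3),
    (date(2018, 4, 16), "lemon", 10),
    (date(2018, 4, 16), "pineapple", 2),
    (date(2018, 4, 17), "lemon", 4),
    (date(2018, 4, 17), "banana", 6),  # no feature defined for bananas
]

# One feature per item, each with a 24-hour bucket.
demand = daily_demand(transactions, ["pineapple", "lemon"])
```

With a span of seven, a model would look at the last seven of these daily totals per item to forecast the next day’s demand.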
Chris Churilo 00:42:05.184 Yeah. Thank you so much. That was really great. If people do have questions, if you guys have any questions, just type them into the Q&A and Chat Panel. We’ll stay here for a few minutes. I do have a question, as you were setting things up, what is—you talked about the accuracy of the model. That was like a number that was I think spit out when you actually applied this to time series data. What does that actually mean, and what should we look for in that number?
Sébastien Leger 00:42:39.637 Okay, so you should definitely look for the highest numbers. What is believed to be a good number in machine learning is something above 95%. When you can achieve this number, it’s really good. What it means is this. Here we are dealing with numbers, right? So if I go back to my screen, okay. We’re dealing with numbers, and you can see there is a small distance between the predictions and the actual values. If you just do a subtraction of the two, this is the error. So this is what the accuracy means: it’s a measurement of the error, and the smaller the error, the higher the accuracy of your predictions.
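The link between error and accuracy can be shown with a small Python sketch. The exact formula Loud ML reports is not given in the webinar, so the definition below (mean absolute error relative to the range of observed values) is an assumption chosen only to show that a smaller error yields a higher percentage.

```python
def accuracy_percent(observed, predicted):
    """Hypothetical accuracy: 100 * (1 - mean absolute error / observed range)."""
    errors = [abs(o - p) for o, p in zip(observed, predicted)]
    mae = sum(errors) / len(errors)
    value_range = max(observed) - min(observed)
    if value_range == 0:
        # A flat series: either the prediction is exact or it is not.
        return 100.0 if mae == 0 else 0.0
    return max(0.0, 100.0 * (1 - mae / value_range))


observed = [100, 200, 300, 400]

# Predictions that stay close to the observed values.
good = accuracy_percent(observed, [110, 190, 310, 390])

# A flat prediction that ignores the trend.
bad = accuracy_percent(observed, [200, 200, 200, 200])
```

Here the close predictions are off by 10 on average over a range of 300, which gives roughly 96.7%, while the flat prediction scores much lower.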
Chris Churilo 00:43:40.311 Oh. That makes sense. Thank you.
Sébastien Leger 00:43:42.432 And if you get to 97 or 98%, then you have something that fits your data really well. I haven’t seen a data set that brings you this, but you should always try to achieve the highest score. I mean, this is what we try to achieve eventually.
Chris Churilo 00:44:04.394 Excellent. So you and I were talking that you actually released at Mobile World Congress in February and already shared it with a number of InfluxDB users. Maybe as we’re waiting for questions, you can just share with us some of the feedback that they gave you about Loud ML.
Sébastien Leger 00:44:24.180 The feedback is that there’s already a community of InfluxDB users who are getting their hands on it, and thank you very much. So there’s already good feedback from your community, and what we’d like to do, and what we are doing today, is give everyone who is using InfluxDB the opportunity to get their hands on the software. We have reached out to, I would say, 150 users already, but there might be many more. So by all means, guys, have a look and let us know if you want more. So, yeah. Definitely some really positive feedback after the Mobile World Congress. We are also receiving feature requests, some stuff to add to the roadmap because we have a detailed account. This is something we are very careful about. We pay close attention to what people would like to see in addition to what’s already been released, and I think that’s the right way to drive things forward.
Chris Churilo 00:45:49.438 All right. We do have a question from our participants. So the question is: “What is a typical way to increase accuracy for the training? When I played Loud ML I was not able to get accuracy more than 10%.”
Sébastien Leger 00:46:03.975 Okay. So a few tricks. What are the few tricks you can do? Sometimes when you have a low accuracy, it could be that you just don’t have enough data. So just a quick example. If I go back to the training, on the command line. Here, what I used just before was training between the 1st of January and the 21st of January. Now I reduce that training to between the 1st and the 7th. So this is sending a query and getting the data from InfluxDB. This gets how many, can I see that in the docs? So this gets 288 data points. And you can see the accuracy is already dropping. If I want to reduce that again…
Sébastien Leger 00:47:19.000 And it’s dropping. So this is really the first thing to check: do you have enough data for training? The next trick is here. This is called max evaluations. I just set it to one because for the webinar we don’t have much time, but if you have more time and you can wait for better accuracy, you can increase that. You can say 10, you can say hundreds. What it’s going to do is optimize the parameters even more, so it’s not going to do 1 training, it’s going to do 10 trainings or as many as you want to define. And then it’s going to pick the best one of them. So if you want to read about that, this is called hyper-parameter optimization. So this is when you already have a fit with the first training. How do you improve this fit through additional trainings?
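The idea of running several trainings and keeping the best one can be sketched as a small random search. This is a minimal illustration under stated assumptions, not Loud ML's optimizer: the "model" is a toy moving-average forecaster, its only hyper-parameter is a hypothetical `window` size, and `best_of` and `forecast_error` are names invented for this example.

```python
import random


def forecast_error(series, window):
    """Mean absolute error when each point is predicted as the
    average of the `window` points that precede it."""
    errors = []
    for i in range(window, len(series)):
        prediction = sum(series[i - window:i]) / window
        errors.append(abs(series[i] - prediction))
    return sum(errors) / len(errors)


def best_of(series, max_evals, seed=0):
    """Try `max_evals` randomly chosen window sizes and keep the one
    with the lowest error. Returns a (window, error) tuple."""
    rng = random.Random(seed)
    best = None
    for _ in range(max_evals):
        window = rng.randint(1, 12)  # candidate hyper-parameter value
        error = forecast_error(series, window)
        if best is None or error < best[1]:
            best = (window, error)
    return best


# A repeating pattern with period 6, standing in for real metric data.
series = [float((i % 6) * 2) for i in range(60)]

one_eval = best_of(series, max_evals=1)
ten_evals = best_of(series, max_evals=10)
print("1 evaluation:  ", one_eval)
print("10 evaluations:", ten_evals)
```

Because both runs share the same seed, the 10-evaluation run starts from the same candidate as the single run and can only match or improve on it. The cost is proportional: ten evaluations take roughly ten times as long as one.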
Sébastien Leger 00:48:47.275 And finally, I would say the third piece of advice would be, I’m sorry. Sorry. So, yeah. The third piece of advice is to play with this. What’s the right bucket interval? What’s the right span? There are no obvious answers for that. So you might have to play around a bit to find the right values that fit your data. How do you divide the time slots? Is it a one-minute time slot? Is it a 30-minute time slot? So this can also have a great effect on the training results. Span as well. So this, as we mentioned in the presentation, is part of the trial and error that pretty much any machine learning project has to go through. But I hope we can make that as easy as possible through Loud ML.
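To make the effect of these two settings concrete, here is a short Python sketch. It rests on assumptions that the transcript does not confirm: that the bucket interval is the width of each aggregated time slot, that the span is the number of consecutive buckets the model sees before predicting the next one, and that the demo used 30-minute buckets (one setting consistent with the 288 data points mentioned for the 1st to the 7th). The year in the dates and the helper names are placeholders.

```python
from datetime import datetime, timedelta


def bucket_count(start, end, bucket_interval):
    """Number of whole buckets between start (inclusive) and end (exclusive)."""
    return (end - start) // bucket_interval


def make_windows(values, span):
    """Pair each run of `span` consecutive bucket values with the value
    of the bucket that follows it (the training target)."""
    return [
        (values[i:i + span], values[i + span])
        for i in range(len(values) - span)
    ]


half_hour = timedelta(minutes=30)

# Six days of 30-minute buckets versus twenty days of the same.
short_range = bucket_count(datetime(2018, 1, 1), datetime(2018, 1, 7), half_hour)
long_range = bucket_count(datetime(2018, 1, 1), datetime(2018, 1, 21), half_hour)
print(short_range, long_range)

# A shorter bucket interval yields more points from the same date range.
one_minute = bucket_count(datetime(2018, 1, 1), datetime(2018, 1, 7), timedelta(minutes=1))
print(one_minute)

# With span=2, five buckets give three training examples.
windows = make_windows([1.0, 2.0, 3.0, 4.0, 5.0], span=2)
print(windows)
```

A larger span gives each example more context but leaves fewer examples from the same data, which is one reason these values usually need some trial and error.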
Chris Churilo 00:50:15.890 I think those were good pieces of advice and it makes a lot of sense. And, oh, look and our friend who asked the question also said, “Very helpful. Thank you.”
Sébastien Leger 00:50:24.926 Thank you very much. You’re welcome.
Chris Churilo 00:50:27.139 So, we’ll just keep the lines open for just maybe two more minutes. And if you do have any questions, please throw them into the Q&A or chat. If you have questions after today’s presentation, just shoot me an email. You guys have my email with the meeting invite and I will forward on to Sebastian and so that he can answer them for you. And I have a feeling that a lot of people, once they start playing around with it is when I think a lot more questions will come. Because you did a very excellent—
Sébastien Leger 00:50:55.156 You can ask for—yeah, and you can definitely ask your questions on GitHub. I mean the team is always having a look at that, so if we can answer these questions we will do so on GitHub. Well, so please, by all means, you can raise your hands and raise your questions on GitHub.
Chris Churilo 00:51:17.742 Fabulous. And then don’t forget, there’s—you can train three series with the community version, so I recommend trying it out, playing with it. It’s a really nice way to just kind of get your toe in the water with machine learning. Start asking some questions about the models that it’s creating for you. And if you happen to be in London on the 14th of June, please join us for InfluxDays. Sebastian and crew will be there so you can even talk to them in a lot more detail. Probably by then, you’ve played around with it so you might have a lot more questions. So it looks like no questions today, but I think—oh, here we go. Every time I’m [laughter]—about to close. All right. Another question. Quick question. “In the documentation, you have only one metric. So metric:average. Where can we find more information about what is available and what also is used behind?”
Sébastien Leger 00:52:17.844 Okay. So the metric here, that’s a good point. We’ll try to enhance the documentation for that. So the metric here can be any metric that InfluxDB already supports. It’s maybe 10 or 12 of them. So there’s mean, max, average, count, standard deviation, spread. Chris, help me [laughter]. Yeah, sure, if this can help, it could be any type in the documentation, and you can type them in just as you would type them in inside InfluxDB. It’s an aggregation function, and we’re really mapping the values to the capabilities that are supported by the database.
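Since the metric maps onto an InfluxDB aggregation function, the kind of query involved can be sketched as an InfluxQL string built in Python. The function names in the set below are real InfluxQL aggregations and selectors; everything else (the `build_query` helper, the measurement `cpu`, the field `usage`, and the dates) is hypothetical, and the transcript does not show the query Loud ML actually sends or how it spells metric names in its own configuration.

```python
# A subset of InfluxQL aggregation and selector functions.
AGGREGATIONS = {
    "count", "mean", "median", "mode", "spread", "stddev", "sum", "min", "max",
}


def build_query(measurement, field, metric, start, end, bucket_interval):
    """Build an InfluxQL query that aggregates one field into time buckets.
    Illustrative only: identifiers are not escaped, so do not pass untrusted input."""
    if metric not in AGGREGATIONS:
        raise ValueError(f"unsupported aggregation: {metric}")
    return (
        f'SELECT {metric}("{field}") FROM "{measurement}" '
        f"WHERE time >= '{start}' AND time < '{end}' "
        f"GROUP BY time({bucket_interval})"
    )


query = build_query(
    measurement="cpu",            # hypothetical measurement name
    field="usage",                # hypothetical field name
    metric="mean",
    start="2018-01-01T00:00:00Z",
    end="2018-01-07T00:00:00Z",
    bucket_interval="30m",
)
print(query)
```

Swapping `metric="mean"` for `"max"` or `"stddev"` changes only the aggregation applied to each bucket; the time range and bucket interval stay the same.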
Chris Churilo 00:53:15.150 That makes sense. All right, I’m going to leave it open. Oh, here we go. So the follow-up question is, “So this is basic what InfluxData supports. Yes. I thought this was from Influx. Thank you.” So yeah, I mean, the cool thing about what Sebastian and team have done is that they’ve integrated these two products together so beautifully that it does feel like it is Influx. That’s why I’m so excited about it. So you get to basically use what you’ve already created in InfluxDB, and what you’re already familiar with, when it comes to looking at averages or min, max, etc. And then you can just apply these trainings to your various time series and bring the results back into Chronograf so you can look at them visually, etc. So that’s why I’m really excited by this.
Sébastien Leger 00:54:11.966 And if you guys want to come to London for InfluxDays, we’ll be proud to be there and really pleased to discuss this topic with you. We are working so hard right now. There’s much more stuff, and much more cool stuff, coming for InfluxDays that I’m not going to talk about today. Obviously. So that’s a teaser for the event. You can always register for the event and we’ll show you more stuff there.
Chris Churilo 00:54:46.721 Okay. Excellent. So I am going to close the—no I’m not. So you got a final comment from one of our attendees. “Just want to say a huge thank-you for this webinar. I got lots of my questions answered and it would be nice if you input this on YouTube.” So I’ll send this link to the—video out to everybody on the call. We typically send the links out the day after. And then this will get posted. In fact, just hang on, I might as well just grab the URL for you guys right now.
Chris Churilo 00:55:38.289 Okay. So I’ll throw this into the Chat Panel. So here’s the URL. If you go to that URL right now, it says something like: “The webinar recording is coming soon.” I just have it set on a timer to give me some time to edit. It’ll actually switch to show the video in just a couple of hours. So it’s the same URL that you used to register, but you will also get the email tomorrow that will give you that link in case you forget. But it’s definitely worth playing around. Take another listen to this and then try the community version and play around with it. I think you’ll find that it’s going to be a lot of fun for you. And now I’m very intrigued about what’s going to happen at InfluxDays. So we will make sure that we keep you guys posted on whatever cool stuff Sebastian and team have up their sleeve. So thanks, everybody, for participating today. And thank you so much to our fabulous speaker today. And…
Sébastien Leger 00:56:46.556 Thank you—Thank you, Chris, so much and thank you, guys.
Chris Churilo 00:56:50.067 Yes, and like I said, we will see you guys again. Bye-bye.
Sébastien Leger 00:56:57.108 Have a great day. Thank you.